HiCOPS

Computational framework for scalable acceleration of database peptide search on supercomputers

Getting Started

Follow the steps below to get started with HiCOPS.

Setup

Set up the peptide database, the experimental MS/MS dataset, and HiCOPS instrumentation using the instructions below.

Setup Database

Get the desired protein sequence database from UniProt/Swiss-Prot. Digest it into a peptide sequence database using the Digestor tool available with OpenMS (preferred) or the Protein Digestion Simulator. Make sure that the generated peptide sequence database is in FASTA format. Example command for the OpenMS Digestor tool:

$ Digestor.exe -in <proteome.fasta> -out <digested.fasta> \
  -out_type fasta -threads 8 -missed_cleavages 2 -enzyme Trypsin \
  -min_length 6 -max_length 46 -FASTA:ID both -FASTA:description remove

Now use the db_prep tool to separate the digested database into coarse-grained peptide sequence clusters. The tool writes its output as ./<output>/<len>.pep files. Read more about the usage of the db_prep tool here.
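Before running db_prep, it can be worth confirming that the digested database respects the length bounds passed to Digestor (6 to 46 in the example command). The sketch below counts peptides per length in a FASTA file. The function name peptide_length_histogram and the file name digested.fasta are illustrative only; the helper is not part of HiCOPS or OpenMS, and it assumes that a sequence may be wrapped over several lines.

```shell
# Print "<length> <count>" for every peptide length found in a FASTA file.
# Hypothetical helper, not part of HiCOPS or OpenMS.
peptide_length_histogram() {
    awk '
        # Header line: record the previous sequence, then start a new one.
        /^>/ { if (seq != "") count[length(seq)]++; seq = ""; next }
        # Sequence line: strip whitespace and append to the current sequence.
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END {
            if (seq != "") count[length(seq)]++
            for (len in count) print len, count[len]
        }
    ' "$1" | sort -n
}

# Example (assumed file name):
# peptide_length_histogram digested.fasta
```

Any length reported outside the configured range suggests that the digestion options were not applied as intended.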

Setup MS/MS Dataset

HiCOPS currently supports only the MS2 format for experimental MS/MS data. Please convert all experimental MS/MS data files into this format using the raw2ms2 command line tool available with HiCOPS. Read more about the usage of the raw2ms2 tool here.

Setup Instrumentation

Optional: If HiCOPS instrumentation was enabled during build, it can be configured and modified using the following environment variables. See how to enable HiCOPS instrumentation in the Install document.

Variable | Description
TIMEMORY_ENABLED | Enable or disable the Timemory instrumentation interface. Set to: ON (default), OFF. Requires the USE_TIMEMORY=ON option in the install step.
HICOPS_MPIP_INSTR | Enable MPI data_tracker instrumentation. Set to: ON (default), OFF. Requires the USE_TIMEMORY=ON and USE_MPIP_LIBRARY=ON options in the install step.
HICOPS_INST_COMPONENTS | Append to the instrumentation components. Set as: HICOPS_INST_COMPONENTS="ci,.." where ci are Timemory components.
HICOPS_PAPI_EVENTS | Edit the hardware counters that are instrumented. Set as: HICOPS_PAPI_EVENTS="hi,.." where hi are PAPI counters.

See the list of all available Timemory components here. By default, HICOPS_PAPI_EVENTS contains the following hardware counters:

HICOPS_PAPI_EVENTS="PAPI_TOT_INS, PAPI_TOT_CYC, PAPI_L3_TCM, \
PAPI_L2_TCA, PAPI_L3_TCA, PAPI_MEM_WCY, PAPI_RES_STL, \
PAPI_STL_CCY, PAPI_BR_CN, PAPI_BR_PRC, PAPI_FUL_ICY"

To see which hardware counters are available on your system and their description, use the papi_avail or timemory-avail tool. Refer to the PAPI documentation here for more information.

NOTE: If a PAPI counter is not available on the system but is added to the HICOPS_PAPI_EVENTS anyway, the profiler will not instrument any of the counters in the list regardless of their availability.
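Because a single unavailable counter disables the whole list, it may help to check HICOPS_PAPI_EVENTS before a run. The following sketch reports every requested counter that is missing from a list of available counter names read from standard input. The function name missing_papi_events is hypothetical and not part of HiCOPS, Timemory, or PAPI. The commented usage line assumes that papi_avail -a prints available preset events with the event name in the first column; verify this against the output on your system.

```shell
# Print every counter from a comma-separated list that is NOT present in
# the list of available counter names read from standard input.
# Hypothetical helper; not part of HiCOPS, Timemory, or PAPI.
missing_papi_events() {
    events=$1
    avail=$(cat)    # available counter names, one per line
    # Split on commas, then strip spaces, tabs, and continuation backslashes.
    printf '%s\n' "$events" | tr ',' '\n' | tr -d ' \t\\' | while read -r ev; do
        [ -z "$ev" ] && continue
        # Exact, fixed-string match against the available names.
        printf '%s\n' "$avail" | grep -qxF "$ev" || printf '%s\n' "$ev"
    done
}

# Assumed usage (output format of papi_avail may differ on your system):
# papi_avail -a | awk '/^PAPI_/ {print $1}' | missing_papi_events "$HICOPS_PAPI_EVENTS"
```

An empty result means every requested counter was found in the supplied list; any printed name should be removed from HICOPS_PAPI_EVENTS.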

Run HiCOPS

Follow the instructions relevant to your compute environment to run HiCOPS. Compute environments fall into two categories:

XSEDE Comet

If you are running on the XSEDE Comet environment, skip the rest of this document and follow the instructions here.

Any Other System

Follow the instructions below if you are running on any system other than XSEDE Comet.

Generate Params

i. Ensure that the hicops-core library path is added to LD_LIBRARY_PATH.

# append hicops-core lib path to LD_LIBRARY_PATH.
$ export LD_LIBRARY_PATH=$HICOPS_INSTALL/lib:$LD_LIBRARY_PATH

ii. Generate a template HiCOPS runtime parameters file using the hicops_config tool located at $HICOPS_INSTALL/bin.

# run hicops_config with -g
$ $HICOPS_INSTALL/bin/hicops_config -g
# generated: ./sampleparams.txt

iii. Edit the generated sampleparams.txt file and set up HiCOPS' runtime parameters, database path, and data paths.

iv. Generate the HiCOPS runtime parameters file (uparams.txt) using hicops_config as:

# run hicops_config with sampleparams.txt
$ $HICOPS_INSTALL/bin/hicops_config ./sampleparams.txt

# generated: uparams.txt

Execute

v. Run HiCOPS with uparams.txt as the input argument, with or without MPI depending on the HiCOPS install options. Use the resource manager (SLURM, LSF) if working on a managed cluster system.

Note: Configure the mpirun options as follows: set binding level to socket and binding policy to scatter.

Note: We highly recommend running HiCOPS through batch job submission (sbatch) instead of srun. Make sure to follow the relevant Hybrid MPI/OpenMP batch submission template when doing so.

# without MPI
$ $HICOPS_INSTALL/bin/hicops $HICOPS_INSTALL/bin/uparams.txt

# SLURM without MPI
$ srun [OPTIONS] $HICOPS_INSTALL/bin/hicops $HICOPS_INSTALL/bin/uparams.txt

# with MPI
$ mpirun -np [N] [OPTIONS] $HICOPS_INSTALL/bin/hicops \
  $HICOPS_INSTALL/bin/uparams.txt

# SLURM with MPI
$ srun [OPTIONS] mpirun -np [N] [OPTIONS] $HICOPS_INSTALL/bin/hicops \
  $HICOPS_INSTALL/bin/uparams.txt
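The batch-submission and binding advice in the notes for this step can be illustrated with a small generator that prints a SLURM batch script. This is only a sketch under stated assumptions: the helper name hicops_sbatch_script and all resource values are hypothetical, the two MV2_CPU_BINDING_* variables apply only if the MPI library is MVAPICH2 (other MPI implementations express socket-level scatter binding differently), and the result does not replace the Hybrid MPI/OpenMP template recommended for your system.

```shell
# Print a SLURM batch script for an MPI run of HiCOPS.
# Hypothetical helper; all resource values are placeholders to adapt.
# Usage: hicops_sbatch_script NODES TASKS_PER_NODE CPUS_PER_TASK > hicops.sbatch
hicops_sbatch_script() {
    nodes=$1; tasks_per_node=$2; cpus_per_task=$3
    ntasks=$((nodes * tasks_per_node))    # total number of MPI processes
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=hicops
#SBATCH --nodes=${nodes}
#SBATCH --ntasks-per-node=${tasks_per_node}
#SBATCH --cpus-per-task=${cpus_per_task}

export LD_LIBRARY_PATH=\$HICOPS_INSTALL/lib:\$LD_LIBRARY_PATH

# Assumption: MVAPICH2. Bind at socket level with the scatter policy.
export MV2_CPU_BINDING_LEVEL=socket
export MV2_CPU_BINDING_POLICY=scatter

mpirun -np ${ntasks} \$HICOPS_INSTALL/bin/hicops \$HICOPS_INSTALL/bin/uparams.txt
EOF
}
```

Redirect the output to a file, review it, and submit it with sbatch. The escaped dollar signs keep $HICOPS_INSTALL unexpanded so that it is resolved when the job runs.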

vi. After HiCOPS execution is complete, run the psm2excel tool with the workspace output directory (set in the sampleparams.txt file) as its argument.

# psm2excel
$ $HICOPS_INSTALL/tools/psm2excel [/path/to/hicops/workspace/output]

# psm2excel with SLURM
$ srun [OPTIONS] --nodes=1 $HICOPS_INSTALL/tools/psm2excel -i \
  [/path/to/hicops/workspace/output]

vii. Repeat Steps iii. to vi. when you modify parameters in the sampleparams.txt.
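Since steps iv to vi must be repeated after every parameter change, they can be chained in a small wrapper that stops at the first failure. The sketch below is a convenience under stated assumptions, not a HiCOPS tool: run_hicops_pipeline, _hicops_run, DRY_RUN, and the argument order are hypothetical. It uses the non-MPI invocation and the psm2excel form without -i from the commands above; substitute the mpirun or srun form that matches your system, and pass the location where hicops_config actually writes uparams.txt.

```shell
# Run a command, or only print it when DRY_RUN=1.
_hicops_run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run steps iv to vi in order, stopping at the first failure.
# Hypothetical wrapper, not part of HiCOPS.
# Usage: run_hicops_pipeline SAMPLEPARAMS UPARAMS WORKSPACE_OUTPUT
run_hicops_pipeline() {
    sampleparams=$1; uparams=$2; workspace=$3
    _hicops_run "$HICOPS_INSTALL/bin/hicops_config" "$sampleparams" &&   # step iv
    _hicops_run "$HICOPS_INSTALL/bin/hicops" "$uparams" &&               # step v
    _hicops_run "$HICOPS_INSTALL/tools/psm2excel" "$workspace"           # step vi
}
```

Setting DRY_RUN=1 before the call prints the three commands without executing them, which is a cheap way to check the paths before submitting a long search.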

Precautions

Please read and follow the precautions below to avoid errors.