Purple - picking unique relevant peptides for viral experiments  1.0
A bioinformatics project developed by MF1 at the RKI, Berlin, Germany.
╔═══╦╗░╔╦═══╦═══╦╗░░╔═══╗
║╔═╗║║░║║╔═╗║╔═╗║║░░║╔══╝
║╚═╝║║░║║╚═╝║╚═╝║║░░║╚══╗
║╔══╣║░║║╔╗╔╣╔══╣║░╔╣╔══╝
║║░░║╚═╝║║║╚╣║░░║╚═╝║╚══╗
╚╝░░╚═══╩╝╚═╩╝░░╚═══╩═══╝

Picking Unique Relevant Peptides for viraL Experiments

Version: 0.3.1

Description

Recent outbreaks of the Ebola and Zika viruses demonstrate that viruses remain a critical research topic. In viral diagnostics, mass spectrometry is catching up with the established ELISA and PCR methods. Targeted mass spectrometry-based proteomic analysis is well suited to viral detection because it is highly sensitive to low-abundance peptides. Developing a targeted assay, however, requires a preselection of marker peptides. For this purpose, the tool Purple was designed in this thesis. Based on user input, Purple identifies peptides that are representative of a species or strain of interest. Target peptides are compared with background peptides, and a target peptide must be unique not only with respect to identical background sequences but also with respect to highly similar ones, which is necessary in biological applications such as viruses with a high mutation rate. In addition, predicting the detectability of the recommended peptides was tested in a proof of concept. The software suggests a sufficient number of peptides that differ strongly from a given background and can be detected. By simplifying targeted assay design, Purple can be used in targeted proteomics applications for robust, time-efficient and high-throughput diagnostic testing in viral contexts.
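
The comparison between target and background peptides can be illustrated with a minimal sketch. It is not Purple's actual implementation: the function names are hypothetical, `difflib.SequenceMatcher` stands in for whatever inexact matching Purple performs, and it assumes that a peptide counts as unique when its best background similarity (on a 0 to 100 scale) stays below the threshold.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two peptide strings.

    Assumption: SequenceMatcher is only a stand-in for Purple's scoring.
    """
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


def pick_unique_peptides(targets, background, threshold=70.0):
    """Keep target peptides whose best background match scores below threshold."""
    background = set(background)
    picked = []
    for peptide in targets:
        if peptide in background:
            continue  # identical sequence in the background: not unique
        # Best (highest) similarity to any background peptide.
        best = max((similarity(peptide, b) for b in background), default=0.0)
        if best < threshold:
            picked.append((peptide, best))
    return picked


# Toy data, invented for illustration only.
target_peptides = ["AAAAAAAA", "WWWWHHHH", "MKTLLVAG"]
background_peptides = ["AAAAAAAA", "MKTLLVAC", "GGGGSSSS"]
result = pick_unique_peptides(target_peptides, background_peptides)
print(result)
```

In this toy run, `AAAAAAAA` is dropped as an exact background match and `MKTLLVAG` is dropped because it differs from `MKTLLVAC` by a single residue, so only `WWWWHHHH` remains.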

Requirements

  • Python 3.4+
    • tqdm
    • biopython

How to use Purple

  1. Download the latest version from the releases page and extract it.
  2. Edit the config file src/config.yml

| Parameter | Description | Example | Default |
|---|---|---|---|
| target | List of targets to find unique peptides | [Hepatitis B, Hepatitis A] | No default |
| threshold | Threshold to filter matches | Values between 0 and 100 | 70 |
| update_DB | Build a database or use old one | True or False | False |
| path_DB | Path to folder with fasta files | C:/myFASTAs/ | ../res/DB/ |
| path_output | Path to output folder to store results | C:/results/ | ../output/ |
| targetFile | File name of the fasta with target entries | target.fasta | |
| i_am_not_sure_about_target | Option to check targets before matching peptides | True or False | True |
| max_len_peptides | Maximum length of peptides | Positive numerical values | 25 |
| min_len_peptides | Minimum length of peptides | Positive numerical values | 5 |
| removeFragments | Option to remove proteins with "(Fragments)" in the header | True or False | No default |
| print_peptides | Print peptides at the end | True or False | False |
| comment | Comments for the log book | Text or numbers | no comment |
| do_genome_mapping | Do additional genome mapping afterwards | True or False | False |

  3. Run Purple in the console. Python is required.

    ```bash
    python Purple_Main.py --config config.yml
    ```

  4. Open the results in the output folder (output)
    • Peptide: Unique peptide.
    • Score: Score of the inexact matching for each peptide.
    • Occurrences: Number of occurrences for each peptide.
    • Species: Species of the peptide.
    • Protein name: Names of the proteins containing this peptide.
    • Description: Complete header of the proteins listed in protein name.
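
The defaults and value ranges in the parameter table can be summarised as a small validation sketch. This is an illustration, not Purple's own config handling: `DEFAULTS`, `REQUIRED` and `resolve_config` are hypothetical names, the settings are passed as a plain dict rather than read from `config.yml`, and `targetFile` is left out because the table does not state a clear default for it.

```python
# Defaults copied from the parameter table above.
DEFAULTS = {
    "threshold": 70,
    "update_DB": False,
    "path_DB": "../res/DB/",
    "path_output": "../output/",
    "i_am_not_sure_about_target": True,
    "max_len_peptides": 25,
    "min_len_peptides": 5,
    "print_peptides": False,
    "comment": "no comment",
    "do_genome_mapping": False,
}

# Parameters listed with "No default" must be supplied by the user.
REQUIRED = ("target", "removeFragments")


def resolve_config(user_values: dict) -> dict:
    """Merge user settings over the defaults and check the documented ranges."""
    missing = [key for key in REQUIRED if key not in user_values]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    config = {**DEFAULTS, **user_values}
    if not 0 <= config["threshold"] <= 100:
        raise ValueError("threshold must lie between 0 and 100")
    if not 0 < config["min_len_peptides"] <= config["max_len_peptides"]:
        raise ValueError("peptide lengths must be positive and min <= max")
    return config


config = resolve_config(
    {"target": ["Hepatitis B", "Hepatitis A"], "removeFragments": True}
)
print(config["threshold"], config["path_output"])
```

Checking the settings before a run gives an early, readable error for a missing target or an out-of-range threshold.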

How to use Purple portable

  1. Download the latest portable version from the releases page and extract it.
  2. Edit the config file config/config.yml and specify database folder and target.

| Parameter | Description | Example | Default |
|---|---|---|---|
| target | List of targets to find unique peptides | [Hepatitis B, Hepatitis A] | No default |
| threshold | Threshold to filter matches | Values between 0 and 100 | 70 |
| update_DB | Build a database or use old one | True or False | False |
| path_DB | Path to folder with fasta files | C:/myFASTAs/ | ../res/DB/ |
| path_output | Path to output folder to store results | C:/results/ | ../output/ |
| targetFile | File name of the fasta with target entries | target.fasta | |
| i_am_not_sure_about_target | Option to check targets before matching peptides | True or False | True |
| max_len_peptides | Maximum length of peptides | Positive numerical values | 25 |
| min_len_peptides | Minimum length of peptides | Positive numerical values | 5 |
| removeFragments | Option to remove proteins with "(Fragments)" in the header | True or False | No default |
| print_peptides | Print peptides at the end | True or False | False |
| comment | Comments for the log book | Text or numbers | no comment |
| do_genome_mapping | Do additional genome mapping afterwards | True or False | False |

  3. Run Purple portable by double-clicking Purple_Main.exe in the main folder (Python is not required) or run it from the command line:

    ```bash
    Purple_Main.exe
    ```

  4. Open the results in the output folder (output)
    • Peptide: Unique peptide.
    • Score: Score of the inexact matching for each peptide.
    • Occurrences: Number of occurrences for each peptide.
    • Species: Species of the peptide.
    • Protein name: Names of the proteins containing this peptide.
    • Description: Complete header of the proteins listed in protein name.
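
As a rough illustration of what the `removeFragments`, `min_len_peptides` and `max_len_peptides` options describe, the sketch below filters protein headers and peptides. The helper names are hypothetical, the length bounds are assumed to be inclusive, and Purple's real filtering may differ.

```python
def keep_protein(header: str, remove_fragments: bool) -> bool:
    """Return False for headers containing "(Fragments)" when the option is on."""
    return not (remove_fragments and "(Fragments)" in header)


def filter_by_length(peptides, min_len=5, max_len=25):
    """Keep peptides with min_len <= length <= max_len (inclusive bounds assumed)."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Toy headers and peptides, invented for illustration only.
headers = ["Capsid protein", "Polymerase (Fragments)"]
kept_headers = [h for h in headers if keep_protein(h, remove_fragments=True)]
kept_peptides = filter_by_length(["MKT", "MKTLLVAG", "A" * 30])
print(kept_headers, kept_peptides)
```

With the default lengths of 5 and 25, the three-residue and thirty-residue peptides are discarded and only `MKTLLVAG` is kept.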

Workflow

[Figure: workflow diagram]

Author: Johanna Lechner