PepShop Input Documentation

A. Query Prohormone Database

The PepShop database can be searched by UniProt accession number, gene symbol, organism name (7 species), exact amino acid sequence, or peptide monoisotopic mass with an adjustable mass tolerance.

1. Accession Number

The primary accession number from one of the following databases: UniProt (Examples: P22005, P01193), NCBI-RefSeq (Examples: NP_079142.2, NP_891558.1), UniGene (Examples: Mm.210541, Rn.141298, Hs.1897), Mouse Genome Informatics (Examples: MGI:108058, MGI:2675256), or Entrez Gene (Examples: 223780, 4879, 24602).

2. Prohormone Symbol

The prohormone symbol (Examples: "ADML", "COLI").

3. Species

PepShop supports prohormone and peptide identification for species with sequenced genomes. Information is currently available for human, mouse, rat, rhesus monkey, cattle, dog and pig. Select the organism name from the pull-down menu.

4. Monoisotopic Mass

The monoisotopic mass of the peptide, with an adjustable mass tolerance (±).
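A mass query of this kind can be sketched as follows. This is a minimal illustration, not PepShop's implementation: the function names are hypothetical, the tolerance is assumed to be in Da, and the residue masses are standard monoisotopic values rounded to five decimals.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O accounts for the free N- and C-termini


def monoisotopic_mass(sequence):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def within_tolerance(query_mass, peptide_mass, tol):
    """True when peptide_mass lies inside query_mass ± tol (assumed Da)."""
    return abs(peptide_mass - query_mass) <= tol


mass = monoisotopic_mass("YGGFM")
print(round(mass, 3))                      # about 573.226
print(within_tolerance(573.2, mass, 0.5))  # True
```

A wider tolerance returns more candidate peptides; a narrower one requires a more accurate query mass.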

5. Prohormone Sequence

The full or partial exact amino acid sequence of a prohormone or peptide. Two input formats are supported: FASTA format, or the bare sequence without a FASTA header (Examples: "YGGFM", "KYVMGHFRWD"). Only one sequence can be submitted at a time.
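The two accepted input forms can be normalized with a small helper like the one below. It is a sketch under stated assumptions: the function name and error messages are hypothetical, and PepShop's own validation rules may differ.

```python
def read_single_sequence(text):
    """Return the residues from one FASTA record or from a bare sequence.

    Raises ValueError when more than one FASTA record is supplied or when
    the input contains anything other than letters.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    headers = [line for line in lines if line.startswith(">")]
    if len(headers) > 1:
        raise ValueError("only one sequence can be submitted at a time")
    # Join the non-header lines and drop any internal whitespace.
    residues = "".join(line for line in lines if not line.startswith(">"))
    residues = "".join(residues.split()).upper()
    if not residues.isalpha():
        raise ValueError("sequence must contain only amino acid letters")
    return residues


print(read_single_sequence(">example peptide\nYGG\nFM"))  # YGGFM
print(read_single_sequence("kyvmghfrwd"))                 # KYVMGHFRWD
```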

B. Proteomics MS/MS Tools

1. MS/MS spectrum search

PepShop searches user-provided MS/MS spectra against the in-house SwePep repository of annotated neuropeptides and the PepShop peptide databases using three open-source database search algorithms: Crux, X!Tandem, and OMSSA. The search parameters are summarized below.

Description of parameters:

Search title

Optional text that is printed at the top of the output page.

Peptide database

Select the PepShop peptide sequence database to be searched. The curated PepShop peptide databases contain experimentally proven neuropeptides provided by SwePep, UniProt and NeuroPred. Peptide databases are currently available for human, mouse, rat, cow, pig, dog and rhesus monkey.

MS/MS spectrum data

MS/MS spectral data can be pasted into the submission box or uploaded as a file selected with the browse button. The submission box takes priority over the file upload. For OMSSA and X!Tandem, MS/MS spectral data can be provided in Mascot Generic Format (MGF) (example). Crux requires MS/MS spectral data in MS2 format (example).
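For orientation, an MGF file wraps each spectrum in BEGIN IONS / END IONS lines, with KEY=value parameters followed by "m/z intensity" peak lines. The sketch below reads that layout. The parser name is hypothetical, the spectrum values are made up for illustration, and every peak line is assumed to carry both an m/z and an intensity.

```python
EXAMPLE_MGF = """\
BEGIN IONS
TITLE=example spectrum
PEPMASS=574.233
CHARGE=1+
150.058 120.0
221.092 85.5
425.182 40.2
END IONS
"""


def parse_mgf(text):
    """Return a list of spectra, each with 'params' and 'peaks' entries."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Parameter line such as PEPMASS=574.233
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: m/z followed by intensity
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


spectra = parse_mgf(EXAMPLE_MGF)
print(spectra[0]["params"]["PEPMASS"], len(spectra[0]["peaks"]))
```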

Measurement Errors (±)

The precursor mass tolerance is the error window on experimental precursor mass values. The fragment mass tolerance is the error window on fragment ion mass values. The available units are described in the table.

Unit	Description
Da	absolute units of Da
ppm	fraction expressed as parts per million
mass	m+h value is converted to mass
mz	tolerance as mass-to-charge ratio
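The difference between the Da and ppm units can be illustrated with a short calculation. This sketch covers only those two units; the function name is hypothetical.

```python
def tolerance_window(mass, tol, unit):
    """Return the (low, high) mass window for a tolerance in 'Da' or 'ppm'."""
    if unit == "Da":
        delta = tol                 # absolute window, same at every mass
    elif unit == "ppm":
        delta = mass * tol / 1e6    # relative window, scales with mass
    else:
        raise ValueError("unit must be 'Da' or 'ppm'")
    return mass - delta, mass + delta


print(tolerance_window(500.0, 0.5, "Da"))   # (499.5, 500.5)
print(tolerance_window(1000.0, 10, "ppm"))  # roughly (999.99, 1000.01)
```

A 10 ppm tolerance corresponds to 0.01 Da at a mass of 1000 Da and to 0.02 Da at 2000 Da, whereas a Da tolerance is the same width at every mass.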

Isotopic mass type

Provide whether the precursor and fragment ion mass values are monoisotopic or average. Default is monoisotopic.

Maximum precursor charge

The maximum precursor charge state to consider in an MS/MS ion search.

Minimum number of m/z peaks

The minimum number of MS/MS peaks required for a spectrum to be considered. This parameter ensures that a spectrum contains enough peaks for a useful interpretation.

Minimum parent m+h

The minimum parent ion m+h value (the singly protonated mass) for a spectrum to be considered. This parameter is used to exclude noise peaks in the parent ion spectrum.

Minimum fragment m/z

The minimum fragment m/z peak to be considered. The fragment ion mass spectra produced by peptides contain peaks that are sequence-specific and peaks that are caused by individual amino acid residues. The peaks caused by individual amino acid residues tend to be relatively small (m/z < 200) and do not affect the outcome of a search.

Minimum peptide length

The minimum length of the peptides to place in the index. Default value is 6.

Total spectrum peaks

The maximum number of peaks to be used from a spectrum. Only the most intense peaks, up to this number, are used in the scoring process.
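The combined effect of the minimum fragment m/z, total spectrum peaks and minimum number of m/z peaks settings can be sketched as below. The function name, the default values and the order in which the filters are applied are assumptions made for illustration; the search engines may apply them differently.

```python
def prepare_spectrum(peaks, min_fragment_mz=150.0, total_peaks=50, min_peaks=15):
    """Filter (m/z, intensity) peaks; return None if too few peaks remain."""
    # Drop low-mass peaks below the minimum fragment m/z.
    kept = [(mz, inten) for mz, inten in peaks if mz >= min_fragment_mz]
    # Keep only the most intense peaks, up to total_peaks.
    kept = sorted(kept, key=lambda peak: peak[1], reverse=True)[:total_peaks]
    # Reject the spectrum when it has too few peaks to interpret.
    if len(kept) < min_peaks:
        return None
    return sorted(kept)  # back in ascending m/z order


peaks = [(100.0, 5.0), (200.0, 1.0), (300.0, 9.0), (400.0, 3.0)]
print(prepare_spectrum(peaks, min_fragment_mz=150.0, total_peaks=2, min_peaks=2))
# [(300.0, 9.0), (400.0, 3.0)]
```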

Minimum ion count

The minimum number of ions required for a peptide to be scored.

Maximum ion charge

The maximum fragment ion charge state. The possible values are 1, 2, 3 or peptide. Default value is peptide.

Ions to search

The ion series to search in the spectrum. Default values are the b and y ion series. Other ion series can also be specified: a, c, x and z.

Refine model

The refinement module improves the speed and accuracy of peptide identification.

Point mutations

Point mutations are part of the model refinement process of X!Tandem. When this is set to yes, X!Tandem tests the selected sequences for the possibility of a point mutation in the generated peptides.

Maximum valid expectation value

The highest expectation value allowed for reported peptides. All peptides with an expectation value less than this value are considered statistically significant and are recorded.

Results, output

This parameter determines the type of results to report. It has three possible values: all, valid and stochastic. When set to 'all', results for all of the spectra are reported. When set to 'valid', only results with an expectation value less than the maximum valid expectation value are reported. When set to 'stochastic', X!Tandem reports only results with an expectation value greater than the maximum valid expectation value.
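The three output modes amount to a filter on the expectation value. A minimal sketch, in which the function name, the hit dictionaries and the threshold of 0.1 are illustrative assumptions:

```python
def select_results(hits, mode="valid", max_valid_expect=0.1):
    """Filter hits (dicts with an 'expect' key) by the chosen output mode."""
    if mode == "all":
        return list(hits)
    if mode == "valid":
        return [hit for hit in hits if hit["expect"] < max_valid_expect]
    if mode == "stochastic":
        return [hit for hit in hits if hit["expect"] > max_valid_expect]
    raise ValueError("mode must be 'all', 'valid' or 'stochastic'")


hits = [
    {"peptide": "YGGFM", "expect": 0.001},
    {"peptide": "KYVMGHFRWD", "expect": 2.5},
]
print([hit["peptide"] for hit in select_results(hits, "valid")])
# ['YGGFM']
print([hit["peptide"] for hit in select_results(hits, "stochastic")])
# ['KYVMGHFRWD']
```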

Minimum number of m/z matches

The minimum number of m/z matches a sequence library peptide must have for the hit to the peptide to be recorded.

Maximum number of ions per series, sp

The maximum number of ions in each series being searched. Default is 100. Set it to '0' to search all ions.

Minimum number of m/z matches, hm

The minimum number of m/z matches a sequence library peptide must have for the hit to the peptide to be recorded.

Intense peaks matched, ht

The number of m/z values corresponding to the most intense peaks that must include one match to the theoretical peptide.

Maximum hits reported

The maximum number of peptides reported per spectrum.

2. Predict Fragment Ions from Sequence

Given a peptide sequence, this tool predicts the corresponding fragment ion series and their mass-to-charge (m/z) values according to the selected options. A single sequence can be submitted, as shown in the example. The options are listed in the following table.

Precursor charge state
Mass type	monoisotopic average
Ion types	a ions b ions c ions x ions y ions z ions
Neutral mass losses	Ammonia loss (-17Da) Water loss (-18Da)

Description of parameters:

Precursor charge state:: The charge state of the peptide. The program predicts precursor and fragment ions up to this charge state, so the value selected here is also the maximum charge state of the fragment ion series.
Mass type:: Which isotopes to use in calculating the precursor and fragment ion mass-to-charge (m/z) values. Default is monoisotopic.
Ion types:: Which fragment ion-series to predict. Default values are b and y ions. The other ion series can also be selected: a, c, x and z.
Neutral mass losses:: PepShop reports loss of single water or ammonia molecules from the selected ion series depending upon amino acid composition of ions. The water losing amino acids are Ser, Thr, Glu and Asp. The ammonia losing amino acids are Arg, Lys, Gln and Asn.
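The prediction described above can be sketched for the default b and y ion series. This is an illustration under stated assumptions, not PepShop's code: the function names are hypothetical, only monoisotopic masses and the b and y series are covered, and the residue masses are standard values rounded to five decimals.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056
AMMONIA = 17.02655


def fragment_mz(sequence, ion_type="b", charge=1):
    """Return [(label, m/z)] for the b or y series of an unmodified peptide."""
    seq = sequence.upper()
    ions = []
    for i in range(1, len(seq)):
        if ion_type == "b":    # N-terminal fragment: first i residues
            neutral = sum(RESIDUE_MASS[aa] for aa in seq[:i])
        elif ion_type == "y":  # C-terminal fragment: last i residues plus H2O
            neutral = sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER
        else:
            raise ValueError("only 'b' and 'y' are covered in this sketch")
        ions.append((f"{ion_type}{i}", (neutral + charge * PROTON) / charge))
    return ions


def possible_neutral_losses(fragment):
    """Neutral losses allowed by composition: water for S/T/E/D, ammonia for R/K/Q/N."""
    losses = {}
    if set(fragment.upper()) & set("STED"):
        losses["water"] = WATER
    if set(fragment.upper()) & set("RKQN"):
        losses["ammonia"] = AMMONIA
    return losses


for label, mz in fragment_mz("YGGFM", "y"):
    print(label, round(mz, 3))
```

For a charge state above 1, call `fragment_mz` once per charge from 1 up to the precursor charge state; a neutral-loss peak then lies at the ion m/z minus the loss mass divided by the charge.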

C. Bioinformatic Tools

PepShop provides bioinformatics tools for prediction of cleavage sites in prohormone sequences (NeuroPred), single sequence search (BLASTP) and multiple sequence alignment (MUSCLE).

The basic use of these tools requires:

1. Entering one or more sequences in a FASTA format using one of two methods:
    1. The sequence submission box.
    2. A file selected using the file upload box.
2. Optional parameter selection for BLASTP and NeuroPred:
    1. NeuroPred: One or more models in the Model Selection box; Post-Translational Modifications (PTMs).
    2. BLASTP: One database from Database Selection box.
3. Run the tool.

1. Sequence Submission

A sequence is the only required input for running these tools. Sequences in FASTA format may be entered into the sequence submission box or uploaded from a text file; multiple sequences can be submitted at the same time. Sequences in the submission box take priority over those uploaded from a text file: if both are provided, only the sequences in the submission box are used.

2. NeuroPred

NeuroPred predicts cleavage sites in the submitted sequences using the Known Motif model (default selection) or logistic regression models trained on experimentally verified cleavage information. In addition, the masses of the predicted peptides are calculated, taking common neuropeptide post-translational modifications into account.

a). Model Selection

The following cleavage prediction models are available. The first model is an empirical model, whereas all other models are binary logistic regression models.

Known Motif:: This model, reported by Southey et al. (2006a), is based on the occurrence of the following motifs in the sequence: xxKR, xxKK, xxRR, RxxR and RxxK, where x is any amino acid and cleavage occurs after the right-most or C-terminal amino acid. When any of these motifs is present, the cleavage probability is set to 0.88 and when two of these motifs are present (e.g. RxKR, which is composed of the motifs xxKR and RxxR), the cleavage probability is set to 0.997. This is the default model.
Mollusc:: The Complex Model, reported by Hummon et al. (2003), trained on precursor sequences from the mollusc, Aplysia californica.
Mammalian:: Model reported by Amare et al. (2006), trained on published mammalian precursor sequences from cow, human, mouse, pig and rat.
Insect:: A joint model derived from the models trained by Southey et al. (2008) on published precursor sequences from the honey bee, Apis mellifera and fruit fly, Drosophila melanogaster.
Mollusc_Basic:: Model reported by Hummon et al. (2003), trained on precursor sequences from the mollusc, Aplysia californica. This model has fewer terms than the Mollusc model and also had poorer predictive ability than the Mollusc model (Hummon et al. 2003).
Apis:: Model reported by Southey et al. (2008), trained on published precursor sequences from the honey bee, Apis mellifera, reported by Hummon et al. (2006).
Drosophila:: Model reported by Southey et al. (2008), trained on precursor sequences from the fruit fly, Drosophila melanogaster.
Bovine:: A model trained on published cattle, Bos taurus, precursor sequences by Tegge et al. (2007, 2008).
Human:: A model trained on published human, Homo sapiens, precursor sequences by Tegge et al. (2007, 2008).
Mouse:: A model trained on published mouse, Mus musculus, precursor sequences by Tegge et al. (2007, 2008).
Rat:: A model trained on published rat, Rattus norvegicus, precursor sequences by Tegge et al. (2007, 2008).
Any Basic Site:: Simply considers any basic amino acid (Arginine or Lysine) as cleaved. However, when the amino acid combinations listed in the known motif model (e.g. xxKR, xxKK, xxRR, RxxR and RxxK) are present, only the last amino acid in the motif is considered as cleaved.
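The Known Motif rule described in the list above can be sketched as follows. This is an illustration of the rule as stated in this document, not NeuroPred's code: the function name is hypothetical, and a site is assumed to be scored only when the full four-residue motif fits in the sequence.

```python
def known_motif_cleavages(sequence):
    """Return [(position, probability)] for predicted cleavage sites.

    position is the 1-based index of the residue after which cleavage occurs.
    """
    seq = sequence.upper()
    sites = []
    for i in range(3, len(seq)):
        matches = 0
        # xxKR, xxKK, xxRR: dibasic pair ending at residue i
        if seq[i - 1:i + 1] in ("KR", "KK", "RR"):
            matches += 1
        # RxxR, RxxK: arginine three residues before a basic residue
        if seq[i - 3] == "R" and seq[i] in "RK":
            matches += 1
        if matches == 1:
            sites.append((i + 1, 0.88))
        elif matches == 2:
            sites.append((i + 1, 0.997))
    return sites


print(known_motif_cleavages("AAKRGG"))  # [(4, 0.88)]
print(known_motif_cleavages("RAKRGG"))  # [(4, 0.997)]
```

In the second call, the site matches both xxKR and RxxR, so the higher probability is assigned.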

b). Selection of Post-Translational Modifications

Prohormone precursors undergo extensive modification before active neuropeptides and hormones are obtained. In addition to cleavage at basic sites and immediate removal of C-terminal basic residues (Trim C-terminal K and R), several other modifications may be present. The most common PTMs are amidation (of C-terminal glycine) and pyroglutamylation (cyclization of N-terminal glutamate or glutamine). Sulfation of tyrosine (Tyr-Sulfation) and acetylation are also common, although somewhat less frequent; these four PTMs are therefore grouped together as the most common PTMs, while rarer PTMs are grouped separately.

Disulfide bond formation between two cysteine residues, which results in a mass loss of 2 Da, is a common PTM in neuropeptides. However, it is difficult to predict whether a disulfide bond forms between two peptides or within a peptide containing two or more cysteines, and it is difficult to identify the cysteine pairs involved in bond formation. For these reasons, disulfide bond formation is not modeled in NeuroPred. Web-based tools such as Cyspred can be consulted to identify cysteines likely to form disulfide bonds. The available PTMs are listed in the table below.

Post-Translational Modifications (PTMs)

Trim C-terminal K and R
Most Common PTMs	Less Common PTMs
Amidation	O-linked Glycosylation of S	N-linked Glycosylation of S	Bromination of W
Pyroglutamination	O-linked Glycosylation of T	N-linked Glycosylation of T	Methylation of E
Acetylation	Dipeptidase	Hydroxylation of P	Methylation of H
Sulfation of Y	Carboxylation of E	Phosphorylation of S	Methylation of K
	DiAcetylation	Phosphorylation of T	Methylation of R
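The effect of a PTM on a predicted peptide mass can be sketched as a monoisotopic mass shift added to the unmodified mass. The values below are standard monoisotopic shifts for a few of the listed PTMs; the dictionary keys and function name are hypothetical, and this is not a description of how NeuroPred computes its masses.

```python
# Monoisotopic mass shifts in Da.
PTM_MASS_SHIFT = {
    # Relative to the free-acid peptide left after the C-terminal glycine
    # has been removed: the C-terminal -OH is replaced by -NH2.
    "amidation": -0.984016,
    "pyroglutamate_from_Q": -17.026549,  # loss of NH3 from N-terminal Gln
    "pyroglutamate_from_E": -18.010565,  # loss of H2O from N-terminal Glu
    "acetylation": 42.010565,
    "sulfation": 79.956815,
    "phosphorylation": 79.966331,
    "methylation": 14.015650,
}


def modified_mass(unmodified_mass, ptms):
    """Add the shift of each named PTM to an unmodified monoisotopic mass."""
    return unmodified_mass + sum(PTM_MASS_SHIFT[name] for name in ptms)


print(round(modified_mass(1000.0, ["acetylation", "amidation"]), 4))
```

Sulfation and phosphorylation differ by only about 0.0095 Da, so distinguishing them by mass alone requires a tight mass tolerance.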

3. Blastp

Standard protein-protein BLAST (blastp) attempts to identify the submitted amino acid sequences and other similar sequences in the selected protein database. It finds local regions of similarity using pairwise sequence alignment. When sequence similarity spans the whole sequence, blastp also reports a global alignment, which is the preferred result for prohormone identification purposes. The following databases are available in PepShop for blastp: RefSeq-mammalian, RefSeq-invertebrate, RefSeq-othervertebrates, UniProt and PepShop prohormone databases.

4. Muscle

Multiple Sequence Alignment (MSA) of two or more submitted sequences can be performed using Muscle. To get alignment, click on

D. References

Amare, A., Hummon, A.B., Southey, B.R., Zimmerman, T.A., Rodriguez-Zas, S.L., Sweedler, J.V., Bridging neuropeptidomics and genomics with bioinformatics: prediction of mammalian neuropeptide prohormone processing. J. Proteome Res. 2006, 5, 1162-1167.

Hummon, A.B., Hummon, N.P., Corbin, R.W., Li, L.J., Vilim, F.S., Weiss, K.R., Sweedler, J.V., From precursor to final peptides: a statistical sequence-based approach to predicting prohormone processing. J. Proteome Res. 2003, 2, 650-656.

Hummon, A.B., Richmond, T.A., Verleyen, P., Baggerman, G., Huybrechts, J., Ewing, M.A., Vierstraete, E., Rodriguez-Zas, S.L., Schoofs, L., Robinson, G.E., Sweedler, J.V., From the Genome to the Proteome: Uncovering Peptides in the Apis Brain. Science 2006, 314, 647-649.

Southey, B.R., Rodriguez-Zas, S.L., Sweedler, J.V., Prediction of neuropeptide prohormone cleavages with application to RFamides. Peptides 2006a, 27, 1087-1098.

Southey, B.R., Amare, A., Zimmerman, T.A., Rodriguez-Zas, S.L., Sweedler, J.V., NeuroPred: a tool to predict cleavage sites in neuropeptide precursors and provide the masses of the resulting peptides. Nucleic Acids Res. 2006b, 34 (Web Server issue), W267-272.

Tegge, A.N., Southey, B.R., Sweedler, J.V., Rodriguez-Zas, S.L., Enhanced Prediction of Cleavage in Bovine Precursor Sequences. Lecture Notes in Computer Science, Bioinformatics Research and Applications, Vol. 4463, pp. 350-360, 2007, Springer.

Southey, B.R., Hummon, A.B., Richmond, T.A., Sweedler, J.V., Rodriguez-Zas, S.L., Prediction of neuropeptide cleavage sites in insects. Bioinformatics 2008, 24, 815-825.

Tegge, A.N., Southey, B.R., Sweedler, J.V., Rodriguez-Zas, S.L., Comparative Analysis of Neuropeptide Cleavage Sites in Human, Mouse, Rat, and Cattle. Mamm. Genome 2008, 19(2), 106-120.

Questions or comments: Sandra Rodriguez Zas (rodrgzzs@illinois.edu)
