NeuroPred: Output Documentation


A. Overview of NeuroPred Outputs

B. Cleavage Prediction Diagram

C. Predicted Cleavage Results

D. Model Accuracy Statistics Output

1. Results for Individual Sequences

2. Results Across All Sequences

a. Sequence Description

b. Model Accuracy Statistics

c. Area Under the ROC curve (AUC)

E. Obtain Mass of Predicted Peptides Output

F. References


A. Overview of NeuroPred Outputs

This document describes the outputs of NeuroPred. Southey et al. (2006b) described, with example usage, an earlier interface to NeuroPred (2006 NAR) that differs slightly from this version. A newer version of NeuroPred, Test 2009, provides similar output but can also use the artificial neural network models described in Tegge et al. (2008) and Southey et al. (2008).

The output has six major components. Not every component appears in every run, because the components shown depend on the Output Selection Task and on certain options selected:

  1. An error message may be displayed for selected errors such as sequence format errors or invalid values for the settings. Whenever possible, NeuroPred will continue to perform the requested tasks using the default values.
  2. Navigation links are provided at the top of the output page to facilitate access to the various components of the output page depending on the Output Selection Task selected.
  3. The Cleavage Prediction Diagram is provided in both interfaces for every Output Selection Task except the Print Probabilities of Basic Sites only task.
  4. The Predicted Cleavage Results table is optional in the Simplified Options Interface and always provided in the Advanced Options Interface. It is provided for every Output Selection Task except the Print Probabilities of Basic Sites only task.
  5. The Model Accuracy Statistics Output is provided only for the Model Accuracy Statistics task, in both interfaces.
  6. The Mass Prediction Output is provided only for the Obtain Mass of Predicted Peptides task, in both interfaces.

B. Cleavage Prediction Diagram

The Cleavage Prediction Diagram is provided for all Output Selection Tasks except the Print Probabilities of Basic Sites only task. Each sequence entered is automatically converted to upper case and split into groups of at most 50 amino acids, and the groups are presented in sequence order. The first line of each group, labeled Sequence in the first column, shows the sequence in up to five blocks of at most ten amino acids each. Immediately below it is one line for each selected model, followed by a final Consensus line that summarizes all models. This layout is repeated until the whole sequence has been shown.

On the model and Consensus lines, a run of "s" characters marks the signal sequence, as determined by either the global default value or sequence-specific values. On each model line, a site whose cleavage probability for that model exceeded the threshold probability is marked with the letter "C" below the site, and a non-cleaved site is marked with a period ("."). The Consensus line shows "C" at a site if at least one model predicted cleavage and "." if no model predicted cleavage.

By default, the rules of Amare et al. (2006) and Southey et al. (2008) are applied and the resulting redundant sites are marked with "r". This symbol does not appear when the Ignore processing rules option in the Advanced Options Interface is set to Yes.

The Cleavage Prediction Diagram below, and all other examples provided herein, are generated using the Human Proglucagon Sequence and the default NeuroPred settings with the Known Motif and Mammalian models selected.


      Cleavage Prediction Diagram
      Sequence    MKSIYFVAGL FVMLVQGSWQ RSLQDTEEKS RSFSASQADP LSDPDQMNED
      Known Motif ssssssssss sssss..... .......... .......... ..........
      Mammal      ssssssssss sssss..... C......... .......... ..........
      Consensus   ssssssssss sssss..... C......... .......... ..........
      Sequence    KRHSQGTFTS DYSKYLDSRR AQDFVQWLMN TKRNRNNIAK RHDEFERHAE
      Known Motif rC........ ........rC .......... .rC......r C.........
      Mammal      rC........ ........rC .......... .rC......r C.....C...
      Consensus   rC........ ........rC .......... .rC......r C.....C...
      Sequence    GTFTSDVSSY LEGQAAKEFI AWLVKGRGRR DFPEEVAIVE ELGRRHADGS
      Known Motif .......... .......... ......r.rC .......... ...rC.....
      Mammal      .......... .......... ......r.rC .......... ...rC.....
      Consensus   .......... .......... ......r.rC .......... ...rC.....
      Sequence    FSDEMNTILD NLAARDFINW LIQTKITDR
      Known Motif .......... .......... .........
      Mammal      .......... .......... .........
      Consensus   .......... .......... .........

    When the Model Accuracy Statistics task is selected and has valid input, the default Cleavage Prediction Diagram gains an additional row titled Known Cuts. This row presents the known cleavage information, showing non-cleaved sites as zeros and cleaved sites as "C".
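The layout rules described in this section can be sketched in Python. This is only an illustration and not NeuroPred's code: the function name `diagram_rows`, its parameters, and the threshold of 0.5 are assumptions made for the example, and the sketch does not apply the processing rules (so no "r" marks) or the Known Cuts row.

```python
def diagram_rows(sequence, model_probs, signal_length=0, threshold=0.5):
    """Build (label, text) rows in the layout of the Cleavage Prediction Diagram.

    model_probs maps a model name to {1-based position: cleavage probability}.
    All names and the 0.5 threshold are assumptions for this sketch.
    """
    seq = sequence.upper()  # input is converted to upper case
    n = len(seq)

    # One mark per residue and model: 's' signal, 'C' cleaved, '.' not cleaved.
    marks = {}
    for model, probs in model_probs.items():
        line = []
        for pos in range(1, n + 1):
            if pos <= signal_length:
                line.append("s")
            elif probs.get(pos, 0.0) > threshold:
                line.append("C")
            else:
                line.append(".")
        marks[model] = "".join(line)

    # Consensus: 'C' if at least one model predicted cleavage at the site.
    consensus = "".join(
        "s" if i < signal_length
        else "C" if any(m[i] == "C" for m in marks.values())
        else "."
        for i in range(n)
    )

    def blocks(text):
        # Up to five blocks of at most ten characters each.
        return " ".join(text[i:i + 10] for i in range(0, len(text), 10))

    rows = []
    for start in range(0, n, 50):  # groups of at most 50 amino acids
        end = start + 50
        rows.append(("Sequence", blocks(seq[start:end])))
        for model, line in marks.items():
            rows.append((model, blocks(line[start:end])))
        rows.append(("Consensus", blocks(consensus[start:end])))
    return rows


# First 50 residues of the example sequence, entered in lower case.
example = "MKSIYFVAGLFVMLVQGSWQRSLQDTEEKSRSFSASQADPLSDPDQMNED".lower()
probs = {"Known Motif": {}, "Mammal": {21: 0.6267, 29: 0.0122, 31: 0.2655}}
for label, text in diagram_rows(example, probs, signal_length=15):
    print(f"{label:<12}{text}")
```

With these inputs the four printed rows match the first group of the diagram above.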


C. Predicted Cleavage Results

If Display Cleavage Probabilities is set to 'Yes' in the Simplified Options Interface, or if the Advanced Options Interface is used, the Predicted Cleavage Results table is provided for all valid sequences submitted. This table reports cleavage results for any site across all selected models where at least one model predicted cleavage. Under the Model Accuracy Statistics task, with valid input, the cleavage information is provided for all known cleavages reported by the user. The Predicted Cleavage Results table for the Human Proglucagon sequence using the default NeuroPred settings with the Known Motif and Mammalian models selected is shown below:

      Predicted Cleavage Results
      Site  Known Cleavage  Model        Cleavage Probability  CI Lower Bound  CI Upper Bound
      R21   N/A             Known Motif  0.0000                0.0000          0.0000
                            Mammal       0.6267                0.3601          0.8335
      K29   N/A             Known Motif  0.0000                0.0000          0.0000
                            Mammal       0.0122                0.0037          0.0395
      R31   N/A             Known Motif  0.0000                0.0000          0.0000
                            Mammal       0.2655                0.1736          0.3834
      R52   N/A             Known Motif  0.8808                0.8808          0.8808
                            Mammal       0.9957                0.9786          0.9992
      K64   N/A             Known Motif  0.0000                0.0000          0.0000
                            Mammal       0.0018                0.0007          0.0047
      R70   N/A             Known Motif  0.8808                0.8808          0.8808
                            Mammal       0.5257                0.4347          0.6149
      R83   N/A             Known Motif  0.8808                0.8808          0.8808
                            Mammal       0.9105                0.8655          0.9415
      R85   N/A             Known Motif  0.0000                0.0000          0.0000
                            Mammal       0.0496                0.0388          0.0631
      R91   N/A             Known Motif  0.8808                0.8808          0.8808
                            Mammal       0.9105                0.8655          0.9415
      R97   N/A             Known Motif  0.0000                0.0000          0.0000
                            Mammal       0.5434                0.3089          0.7601
      K117  N/A             Known Motif  0.0000                0.0000          0.0000
                            Mammal       0.0018                0.0007          0.0047
      K125  N/A             Known Motif  0.0000                0.0000          0.0000
                            Mammal       0.0104                0.0024          0.0437
      R130  N/A             Known Motif  0.9975                0.9975          0.9975
                            Mammal       0.9201                0.8575          0.9566
      R145  N/A             Known Motif  0.8808                0.8808          0.8808
                            Mammal       0.5257                0.4347          0.6149
      R165  N/A             Known Motif  0.0000                0.0000          0.0000
                            Mammal       0.0496                0.0388          0.0631
      K175  N/A             Known Motif  0.0000                0.0000          0.0000
                            Mammal       0.0018                0.0007          0.0047

      Description of Predicted Cleavage Results columns

      Site:
      Identity and location of the cleaved site where R or K denote Arginine and Lysine, respectively, and the number is the position of the amino acid from the start of the submitted sequence (i.e. including the signal peptide sequence).
      Known Cleavage:
      Denotes prior knowledge of cleavage. For the Predict Cleavage Sites Only and Obtain Mass of Predicted peptides tasks, N/A denotes not applicable because any known cleavage information that is entered is ignored. For the Model Accuracy Statistics task, with valid input, the "Known Cleavage" column will read "True" for known cleaved sites or "False" for known non-cleaved sites.
      Model:
      Selected prediction model or models.
      Cleavage Probability:
      The predicted cleavage probability at that site for each model selected.
      CI Lower Bound:
      Lower bound of the confidence interval for the predicted cleavage probability.
      CI Upper Bound:
      Upper bound of the confidence interval for the predicted cleavage probability.
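The Site labels follow directly from the submitted sequence. The sketch below shows the numbering convention only, counting positions from 1 and including the signal peptide; the helper name `basic_site_labels` is hypothetical, and the sketch lists every Lys or Arg without applying any processing rules or signal peptide handling.

```python
def basic_site_labels(sequence):
    """Label each Lys (K) or Arg (R) with its 1-based position in the full sequence."""
    return [
        f"{aa}{pos}"
        for pos, aa in enumerate(sequence.upper(), start=1)
        if aa in "KR"
    ]


# First 52 residues of the example sequence.
print(basic_site_labels("MKSIYFVAGLFVMLVQGSWQRSLQDTEEKSRSFSASQADPLSDPDQMNEDKR"))
```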

D. Model Accuracy Statistics Output

The Model Accuracy Statistics task provides a series of outputs for each sequence and a summary across all sequences. By default, the model accuracy statistics are calculated only for basic amino acids, following the processing rules of Amare et al. 2006 and Southey et al. 2008. Consequently, different results will occur under the Advanced Options Interface when changing these two options from the default values:

  • Ignore processing rules option: Under the Advanced Options Interface, if Yes is selected, all of the 'redundant' sites in the sequence will be used to compute the different statistics provided.
  • Use basic sites for accuracy statistics option: If No is selected under the Advanced Options Interface, the complete sequence, including all non-basic sites that are usually considered uncleaved, will be used to compute the different statistics provided.

1. Results for Individual Sequences

    For each sequence submitted, the Individual Sequence Model Accuracy Statistics table provides the number of correct and incorrect predictions calculated by each selected model at threshold probabilities incremented from 0.1 to 0.9 by 0.1 units.

        Individual Sequence Model Accuracy Statistics
        Statistic        Model        0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
        True Positives   Known Motif  5    5    5    5    5    5    5    5    1
                         Mammal       6    6    6    6    6    4    4    4    4
        True Negatives   Known Motif  9    9    9    9    9    9    9    9    10
                         Mammal       7    7    8    8    8    9    10   10   10
        False Positives  Known Motif  1    1    1    1    1    1    1    1    0
                         Mammal       3    3    2    2    2    1    0    0    0
        False Negatives  Known Motif  1    1    1    1    1    1    1    1    5
                         Mammal       0    0    0    0    0    2    2    2    2

        Description of the columns for the Individual Sequence Model Accuracy Statistics table

        Statistic:

        True Positives:
        Number of sites correctly predicted to be cleaved.
        True Negatives:
        Number of sites correctly predicted to be non-cleaved.
        False Positives:
        Number of sites incorrectly predicted to be cleaved.
        False Negatives:
        Number of sites incorrectly predicted to be non-cleaved.
        Model:
        Selected prediction model or models.
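The four counts can be illustrated with a short Python sketch. The data below are made up for the example (they are not the proglucagon results), the function name `confusion_counts` is hypothetical, and treating a site as predicted cleaved when its probability strictly exceeds the threshold is an assumption based on the wording used for the Cleavage Prediction Diagram.

```python
# Threshold probabilities 0.1, 0.2, ..., 0.9.
THRESHOLDS = [round(0.1 * k, 1) for k in range(1, 10)]


def confusion_counts(known, probs, threshold):
    """Return (TP, TN, FP, FN) for one model at one threshold.

    known: list of booleans, True for a known cleaved site.
    probs: predicted cleavage probabilities for the same sites.
    """
    tp = tn = fp = fn = 0
    for is_cleaved, p in zip(known, probs):
        predicted = p > threshold  # assumed rule: probability exceeds threshold
        if predicted and is_cleaved:
            tp += 1
        elif predicted and not is_cleaved:
            fp += 1
        elif is_cleaved:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


# Hypothetical toy data: four sites, two of them known to be cleaved.
known = [True, True, False, False]
probs = [0.95, 0.55, 0.65, 0.05]
for t in THRESHOLDS:
    print(t, confusion_counts(known, probs, t))
```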

2. Results Across All Sequences

Three tables of accuracy statistics are calculated using the information from all sequences:

a. Sequence Description

      Provides a summary across all the sequences entered.

        Sequence Description
        Number of precursors entered 1
        Number of sites processed 16
        Number of known cleaved sites 6
        Number of known non-cleaved sites 10
        Prevalence 0.3750

        Description of the columns for the Sequence Description table

        Number of precursors entered:
        Number of precursor sequences that were recognized by NeuroPred.
        Number of sites processed:
        The total number of sites processed across all precursor sequences entered.
        Number of known cleaved sites:
        Total number of cleaved sites across all precursor sequences.
        Number of known non-cleaved sites:
        Total number of non-cleaved sites across all precursor sequences.
        Prevalence:
        The total number of known cleaved sites divided by the total number of sites processed.


b. Model Accuracy Statistics

      The following statistics are calculated for each selected model at threshold probabilities incremented from 0.1 to 0.9 across all the submitted sequences.

        Model Accuracy Statistics
        Statistic                    Model        0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
        True Positives               Known Motif  5       5       5       5       5       5       5       5       1
                                     Mammal       6       6       6       6       6       4       4       4       4
        True Negatives               Known Motif  9       9       9       9       9       9       9       9       10
                                     Mammal       7       7       8       8       8       9       10      10      10
        False Positives              Known Motif  1       1       1       1       1       1       1       1       0
                                     Mammal       3       3       2       2       2       1       0       0       0
        False Negatives              Known Motif  1       1       1       1       1       1       1       1       5
                                     Mammal       0       0       0       0       0       2       2       2       2
        Correct Classification Rate  Known Motif  0.8750  0.8750  0.8750  0.8750  0.8750  0.8750  0.8750  0.8750  0.6875
                                     Mammal       0.8125  0.8125  0.8750  0.8750  0.8750  0.8125  0.8750  0.8750  0.8750
        Sensitivity                  Known Motif  0.8333  0.8333  0.8333  0.8333  0.8333  0.8333  0.8333  0.8333  0.1667
                                     Mammal       1.0000  1.0000  1.0000  1.0000  1.0000  0.6667  0.6667  0.6667  0.6667
        Specificity                  Known Motif  0.9000  0.9000  0.9000  0.9000  0.9000  0.9000  0.9000  0.9000  1.0000
                                     Mammal       0.7000  0.7000  0.8000  0.8000  0.8000  0.9000  1.0000  1.0000  1.0000
        Positive Precision           Known Motif  0.8333  0.8333  0.8333  0.8333  0.8333  0.8333  0.8333  0.8333  1.0000
                                     Mammal       0.6667  0.6667  0.7500  0.7500  0.7500  0.8000  1.0000  1.0000  1.0000
        Negative Precision           Known Motif  0.9000  0.9000  0.9000  0.9000  0.9000  0.9000  0.9000  0.9000  0.6667
                                     Mammal       1.0000  1.0000  1.0000  1.0000  1.0000  0.8182  0.8333  0.8333  0.8333
        Correlation                  Known Motif  0.7333  0.7333  0.7333  0.7333  0.7333  0.7333  0.7333  0.7333  0.3333
                                     Mammal       0.6831  0.6831  0.7746  0.7746  0.7746  0.5919  0.7454  0.7454  0.7454

        Description of the columns for the Model Accuracy Statistics table

        Statistic:

        True Positives:
        Number of sites correctly predicted to be cleaved.
        True Negatives:
        Number of sites correctly predicted to be non-cleaved.
        False Positives:
        Number of sites incorrectly predicted to be cleaved.
        False Negatives:
        Number of sites incorrectly predicted to be non-cleaved.
        Correct Classification Rate:
        Number of correctly predicted sites divided by the number of sites.
        Sensitivity:
        Number of true positives divided by the number of known cleaved sites.
        Specificity:
        Number of true negatives divided by the number of known non-cleaved sites.
        Positive precision:
        Proportion of sites that are predicted to be cleaved that are true positives.
        Negative precision:
        Proportion of sites that are not predicted to be cleaved that are true negatives.
        Correlation:
        Matthews correlation coefficient between observed and predicted cleavage.
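The statistics defined above can be computed from the four counts as in the following Python sketch. The function name `accuracy_statistics` and the dictionary keys are hypothetical, and returning NaN when a denominator is zero is a choice made for this example; how NeuroPred handles that case is not described here.

```python
import math


def accuracy_statistics(tp, tn, fp, fn):
    """Compute the Model Accuracy Statistics from the four counts."""

    def ratio(num, den):
        # NaN for an undefined ratio is an assumption of this sketch.
        return num / den if den else float("nan")

    total = tp + tn + fp + fn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "prevalence": ratio(tp + fn, total),
        "correct_classification_rate": ratio(tp + tn, total),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "positive_precision": ratio(tp, tp + fp),
        "negative_precision": ratio(tn, tn + fn),
        "correlation": ratio(tp * tn - fp * fn, mcc_den),  # Matthews correlation
    }


# Counts for the Mammal model at threshold 0.6 in the example table.
for name, value in accuracy_statistics(tp=4, tn=9, fp=1, fn=2).items():
    print(f"{name}: {value:.4f}")
```

For the counts of the Mammal model at threshold 0.6 (4, 9, 1, 2), the results agree with the corresponding column of the example table, and the prevalence matches the 0.3750 reported in the Sequence Description table.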


c. Area Under the ROC curve (AUC)

The area under the receiver operating characteristic (ROC) curve (AUC) is reported for each user-selected model. The AUC lies between 0 and 1 and summarizes how well a model separates cleaved from non-cleaved sites across threshold probabilities; values greater than 0.8 indicate excellent performance and values under 0.7 indicate poor performance.

        Area under the ROC curve
        Model AUC
        Known Motif 0.8750
        Mammal 0.9500

        Description of the columns for the Area under the ROC curve table

        Model:
        Selected prediction model or models.
        AUC:
        Area under the receiver operating characteristic (ROC) curve.
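One way to obtain an AUC from the per-threshold counts is the trapezoidal rule over the ROC points (false positive rate, sensitivity), with the end points (0, 0) and (1, 1) added. The Python sketch below takes that approach. Whether NeuroPred computes the AUC in exactly this way is an assumption, and the function name `trapezoid_auc` is hypothetical, but the calculation does reproduce both values in the example table from the counts in the Model Accuracy Statistics table.

```python
def trapezoid_auc(counts):
    """Area under the ROC curve from per-threshold (TP, TN, FP, FN) counts."""
    points = {(0.0, 0.0), (1.0, 1.0)}  # end points of the ROC curve
    for tp, tn, fp, fn in counts:
        fpr = fp / (fp + tn)  # 1 - specificity
        tpr = tp / (tp + fn)  # sensitivity
        points.add((fpr, tpr))
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between adjacent points
    return area


# Counts at thresholds 0.1 to 0.9, copied from the Model Accuracy Statistics table.
known_motif = [(5, 9, 1, 1)] * 8 + [(1, 10, 0, 5)]
mammal = list(zip(
    [6, 6, 6, 6, 6, 4, 4, 4, 4],     # true positives
    [7, 7, 8, 8, 8, 9, 10, 10, 10],  # true negatives
    [3, 3, 2, 2, 2, 1, 0, 0, 0],     # false positives
    [0, 0, 0, 0, 0, 2, 2, 2, 2],     # false negatives
))
print(f"Known Motif {trapezoid_auc(known_motif):.4f}")
print(f"Mammal      {trapezoid_auc(mammal):.4f}")
```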


E. Obtain Mass of Predicted Peptides Output

When the Obtain Mass of Predicted peptides task is selected, the original sequence is cleaved based on the predicted values for each model. The resulting peptides are extended by an order of 2 by default, or by the Degree of peptide extension value selected in the Advanced Options Interface (described in the Degree of peptide extension option), before the selected post-translational modifications (PTMs) are applied. Only peptides to which the selected PTMs have been successfully applied are reported, with the modification shown in small blue brackets (e.g. DFPEEVAIVEEL[Amide] denotes amidation of the peptide DFPEEVAIVEELG). The average and monoisotopic masses of the predicted peptides are calculated at all stages, so that masses are available for every combination of PTMs in case some PTMs are absent. Note that the standard mass or molecular weight, not the MH+ or M+H mass, is calculated. The results are presented in the Mass of Predicted Peptides table:

      Mass of Predicted Peptides
      Abb. Peptide NCut CCut PTM applied Predicted Aver. Mass Predicted Mono. Mass Peptide sequence
      Q16_R21 Signal Peptidase Mammal Cleaved 760.808200 760.366770 QGSWQR
      Q16_Q20 Signal Peptidase Mammal TrimKR 604.620700 604.265660 QGSWQ
      S22_R52 Mammal Known Motif, Mammal Cleaved 3512.677900 3510.585600 SLQDTEEKSRSFSASQADPLSDPDQMNEDKR
      S22_D50 Mammal Known Motif, Mammal TrimKR 3228.316300 3226.389530 SLQDTEEKSRSFSASQADPLSDPDQMNED
      H53_R70 Known Motif, Mammal Known Motif, Mammal Cleaved 2148.277000 2147.008320 HSQGTFTSDYSKYLDSRR
      H53_S68 Known Motif, Mammal Known Motif, Mammal TrimKR 1835.902000 1834.806100 HSQGTFTSDYSKYLDS
      A71_R83 Known Motif, Mammal Known Motif, Mammal Cleaved 1636.889700 1635.824270 AQDFVQWLMNTKR
      A71_T81 Known Motif, Mammal Known Motif, Mammal TrimKR 1352.528100 1351.628200 AQDFVQWLMNT
      N84_R91 Known Motif, Mammal Known Motif, Mammal Cleaved 985.114700 984.562840 NRNNIAKR
      N84_A89 Known Motif, Mammal Known Motif, Mammal TrimKR 700.753100 700.366770 NRNNIA
      H92_R97 Known Motif, Mammal Mammal Cleaved 831.840800 831.356250 HDEFER
      H92_E96 Known Motif, Mammal Mammal TrimKR 675.653300 675.255140 HDEFE
      H98_R130 Mammal Known Motif, Mammal Cleaved 3668.087000 3665.875340 HAEGTFTSDVSSYLEGQAAKEFIAWLVKGRGRR
      H98_G128 Mammal Known Motif, Mammal TrimKR 3355.712000 3353.673120 HAEGTFTSDVSSYLEGQAAKEFIAWLVKGRG
      D131_R145 Known Motif, Mammal Known Motif, Mammal Cleaved 1758.949600 1757.899900 DFPEEVAIVEELGRR
      D131_G143 Known Motif, Mammal Known Motif, Mammal TrimKR 1446.574600 1445.697680 DFPEEVAIVEELG
      H146_R179 Known Motif, Mammal Sequence End Cleaved 3922.341600 3919.921380 HADGSFSDEMNTILDNLAARDFINWLIQTKITDR
      H146_D178 Known Motif, Mammal Sequence End TrimKR 3766.154100 3763.820270 HADGSFSDEMNTILDNLAARDFINWLIQTKITD
      H98_R127 Known Motif, Mammal Sequence End TrimKR+ Amidation 3297.675400 3295.669520 HAEGTFTSDVSSYLEGQAAKEFIAWLVKGR[Amide]
      D131_L142 Known Motif, Mammal Sequence End TrimKR+ Amidation 1388.538000 1387.694080 DFPEEVAIVEEL[Amide]
      Q16_R21 Known Motif, Mammal Sequence End Cleaved+ QPyroglutamination 743.777600 743.340170 [p-]QGSWQR
      Q16_Q20 Known Motif, Mammal Sequence End TrimKR+ QPyroglutamination 587.590100 587.239060 [p-]QGSWQ
      S22_R52 Known Motif, Mammal Sequence End Cleaved+ Acetylation 3554.715200 3552.596200 [Ac-]SLQDTEEKSRSFSASQADPLSDPDQMNEDKR
      S22_D50 Known Motif, Mammal Sequence End TrimKR+ Acetylation 3270.353600 3268.400130 [Ac-]SLQDTEEKSRSFSASQADPLSDPDQMNED
      A71_R83 Known Motif, Mammal Sequence End Cleaved+ Acetylation 1678.927000 1677.834870 [Ac-]AQDFVQWLMNTKR
      A71_T81 Known Motif, Mammal Sequence End TrimKR+ Acetylation 1394.565400 1393.638800 [Ac-]AQDFVQWLMNT
      H53_R70 Known Motif, Mammal Sequence End Cleaved+ YSulfation 1x 2228.341200 2226.965120 HSQGTFTSDY[SO3H]SKYLDSRR
      H53_S68 Known Motif, Mammal Sequence End TrimKR+ YSulfation 1x 1915.966200 1914.762900 HSQGTFTSDY[SO3H]SKYLDS
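The Abb. Peptide labels of the "Cleaved" and "TrimKR" rows can be reproduced with a short Python sketch. The helper names `cleave`, `trim_kr`, and `label` are hypothetical, the cut sites are the sites where at least one model predicted cleavage in the example, and the sketch ignores peptide extension and all other PTMs.

```python
SEQUENCE = (
    "MKSIYFVAGLFVMLVQGSWQRSLQDTEEKSRSFSASQADPLSDPDQMNED"
    "KRHSQGTFTSDYSKYLDSRRAQDFVQWLMNTKRNRNNIAKRHDEFERHAE"
    "GTFTSDVSSYLEGQAAKEFIAWLVKGRGRRDFPEEVAIVEELGRRHADGS"
    "FSDEMNTILDNLAARDFINWLIQTKITDR"
)


def cleave(sequence, cut_sites, signal_length=0):
    """Return 1-based (start, end) ranges after cutting on the C-terminal side of each site."""
    inner = sorted(s for s in cut_sites if signal_length < s < len(sequence))
    bounds = [signal_length] + inner + [len(sequence)]
    return [(start + 1, end) for start, end in zip(bounds, bounds[1:])]


def trim_kr(sequence, start, end):
    """Remove all C-terminal K and R residues from the peptide start..end."""
    while end > start and sequence[end - 1] in "KR":
        end -= 1
    return start, end


def label(sequence, start, end):
    """Abbreviated peptide name, e.g. Q16_R21."""
    return f"{sequence[start - 1]}{start}_{sequence[end - 1]}{end}"


cuts = [21, 52, 70, 83, 91, 97, 130, 145]
for start, end in cleave(SEQUENCE, cuts, signal_length=15):
    t_start, t_end = trim_kr(SEQUENCE, start, end)
    print(label(SEQUENCE, start, end), label(SEQUENCE, t_start, t_end),
          SEQUENCE[t_start - 1:t_end])
```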

      Description of the columns for the Mass of Predicted Peptides table

      Abb. Peptide:
      Abbreviated Peptide where the start and the end of the peptide is provided by a single letter amino acid code and location within the sequence.
      NCut:
      The model or models that predicted cleavage at the site corresponding to the start, or N-terminal end, of the peptide. Signal Peptidase or Sequence Start denotes the start of the first peptide, depending on whether or not a signal peptide is indicated.
      CCut:
      The model or models that predicted cleavage at the site corresponding to the end, or C-terminal end, of the peptide. Sequence End denotes the end of the sequence.
      PTM Applied:
      Lists the PTMs that have been applied to the peptide. Most PTM designations are self-explanatory; however, "Cleaved" denotes that the peptide has only been cleaved, "Extended" denotes that adjacent peptides have been joined, and "TrimKR" indicates that all the C-terminal K and R amino acids have been removed after cleavage. When a PTM is applied multiple times, only the mass with all occurrences applied is reported, and the "PTM Applied" column gives the number of occurrences followed by the times symbol ("x") (e.g., Acetylation 3x).
      Predicted Aver. Mass:
      Predicted average mass of the peptide including any PTMs.
      Predicted Mono. Mass:
      Predicted monoisotopic mass of the peptide including any PTMs.
      Peptide Sequence:
      Complete sequence of the peptide. Any post-translational modifications that have been successfully applied are reported using small blue brackets (e.g. DFPEEVAIVEEL[Amide] denotes amidation of the peptide DFPEEVAIVEELG).
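The masses of unmodified peptides can be approximated by summing residue masses and adding one water molecule, as in the Python sketch below. The residue masses are commonly used reference values, not constants taken from NeuroPred, so the last decimal places may differ from the table; the function name `peptide_mass` is hypothetical, and no PTMs are handled.

```python
# Monoisotopic and average residue masses in Da (standard reference values;
# the constants NeuroPred uses are not given in this document).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MONO = 18.01056
WATER_AVG = 18.01528


def peptide_mass(peptide, residue_masses, water):
    """Neutral mass (not MH+) of an unmodified peptide: residues plus one water."""
    return sum(residue_masses[aa] for aa in peptide) + water


for peptide in ("QGSWQR", "QGSWQ", "HDEFE"):
    print(peptide,
          round(peptide_mass(peptide, AVG, WATER_AVG), 4),
          round(peptide_mass(peptide, MONO, WATER_MONO), 4))
```

For the three short peptides used here, the results agree with the Cleaved and TrimKR rows of the table to within about 0.01 Da.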


F. References

Amare, A., Hummon, A.B., Southey, B.R., Zimmerman, T.A., Rodriguez-Zas, S.L., Sweedler, J.V., Bridging neuropeptidomics and genomics with bioinformatics: prediction of mammalian neuropeptide prohormone processing. J. Proteome Res., 2006, 5, 1162-1167.

Southey, B.R., Amare, A., Zimmerman, T.A., Rodriguez-Zas, S.L., Sweedler, J.V., NeuroPred: a tool to predict cleavage sites in neuropeptide precursors and provide the masses of the resulting peptides. Nucleic Acids Res., 2006b, 34 (Web Server issue), W267-272.

Tegge, A.N., Southey, B.R., Sweedler, J.V., Rodriguez-Zas, S.L., Comparative Analysis of Neuropeptide Cleavage Sites in Human, Mouse, Rat, and Cattle. Mamm. Genome, 2008, 19(2), 106-120.

Southey, B.R., Hummon, A.B., Richmond, T.A., Sweedler, J.V., Rodriguez-Zas, S.L., Prediction of neuropeptide cleavage sites in insects. Bioinformatics, 2008, 24, 815-825.
