NeuroPred: Output Documentation

A. Overview of NeuroPred Outputs

B. Cleavage Prediction Diagram

C. Predicted Cleavage Results

D. Model Accuracy Statistics Output

1. Results for Individual Sequences

2. Results Across All Sequences

a. Sequence Description

b. Model Accuracy Statistics

c. Area Under the ROC curve (AUC)

E. Obtain Mass of Predicted Peptides Output

F. References

A. Overview of NeuroPred Outputs

This document describes the outputs of NeuroPred. Southey et al. (2006b) described, with example usage, an earlier interface to NeuroPred (2006 NAR) that varies slightly from this version. A newer version of NeuroPred, Test 2009, provides similar output but can use the artificial neural network models described in Tegge et al. 2008 and Southey et al. 2008.
The output has six major components. Not all of them appear in every run, because the components provided depend on both the Output Selection Task and the options selected:

An error message may be displayed for selected errors such as sequence format errors or invalid values for the settings. Whenever possible, NeuroPred will continue to perform the requested tasks using the default values.

Navigation links are provided at the top of the output page to facilitate access to the various components of the output page depending on the Output Selection Task selected.

Cleavage Prediction Diagram is always provided for both interfaces and is provided for every Output Selection Task except for the Print Probabilities of Basic Sites only task.

Predicted Cleavage Results is optionally selected in the Simplified Options Interface and always provided in the Advanced Options Interface. This output is provided for every Output Selection Task except for the Print Probabilities of Basic Sites only task.

Model Accuracy Statistics Output is only provided for the Model Accuracy Statistics task and always provided for both interfaces.

Mass Prediction Output is only provided for the Obtain Mass of Predicted Peptides task and always provided for both interfaces.

B. Cleavage Prediction Diagram

The cleavage prediction diagram is provided for all Output Selection Tasks except the Print Probabilities of Basic Sites only task. Each sequence entered is automatically converted to upper case and split into groups of at most 50 amino acids. Each group is presented in sequence order as follows. The first line, labeled Sequence in the first column, shows the sequence in up to five blocks of at most ten amino acids each. Immediately below it is one line for each selected model, and the final line is a consensus report across all models. This pattern repeats until the complete sequence has been shown.

On the model and consensus lines, a series of "s" marks the signal sequence, as determined by either the global default value or sequence-specific values. For each model, sites where the cleavage probability for that model exceeded the threshold probability are marked with the letter "C" below the site, while non-cleaved sites are marked with a period ".". The consensus line shows "C" at a site if at least one model predicted cleavage and "." if no model predicted cleavage. By default, the rules of Amare et al. 2006 and Southey et al. 2008 are implemented and the resulting redundant sites are marked with an "r". This symbol does not appear when the processing rules are ignored, that is, when the Ignore processing rules option in the Advanced Options Interface is set to Yes.

The Cleavage Prediction Diagram below, and all other examples provided herein, are generated using the Human Proglucagon Sequence and the default NeuroPred settings with the Known Motif and Mammalian models selected.

**Cleavage Prediction Diagram**
Sequence	MKSIYFVAGL FVMLVQGSWQ RSLQDTEEKS RSFSASQADP LSDPDQMNED
Known Motif	ssssssssss sssss..... .......... .......... ..........
Mammal	ssssssssss sssss..... C......... .......... ..........
Consensus	ssssssssss sssss..... C......... .......... ..........
Sequence	KRHSQGTFTS DYSKYLDSRR AQDFVQWLMN TKRNRNNIAK RHDEFERHAE
Known Motif	rC........ ........rC .......... .rC......r C.........
Mammal	rC........ ........rC .......... .rC......r C.....C...
Consensus	rC........ ........rC .......... .rC......r C.....C...
Sequence	GTFTSDVSSY LEGQAAKEFI AWLVKGRGRR DFPEEVAIVE ELGRRHADGS
Known Motif	.......... .......... ......r.rC .......... ...rC.....
Mammal	.......... .......... ......r.rC .......... ...rC.....
Consensus	.......... .......... ......r.rC .......... ...rC.....
Sequence	FSDEMNTILD NLAARDFINW LIQTKITDR
Known Motif	.......... .......... .........
Mammal	.......... .......... .........
Consensus	.......... .......... .........

When the Model Accuracy Statistics task is selected and has valid input, the default Cleavage Prediction diagram is modified to include the known cleavage information by inserting a row titled Known Cuts presenting the non-cleaved sites as zeros and cleaved sites as C's.
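
As an illustration only, the basic layout of the diagram can be sketched in Python. This is not NeuroPred's code: the function names (`in_blocks`, `mark_line`, `format_diagram`) and the `model_cuts` input, which maps each model name to the set of 1-based positions it predicts as cleaved, are assumptions made for this sketch. The "r" marking of redundant sites and the Known Cuts row are omitted.

```python
def in_blocks(text, block_width=10):
    """Split a string into space-separated blocks of at most block_width characters."""
    return " ".join(text[i:i + block_width] for i in range(0, len(text), block_width))


def mark_line(length, cuts, signal_length):
    """Build one marker line: 's' for signal positions, 'C' for cleaved sites, '.' otherwise."""
    marks = []
    for position in range(1, length + 1):  # positions are 1-based
        if position <= signal_length:
            marks.append("s")
        elif position in cuts:
            marks.append("C")
        else:
            marks.append(".")
    return "".join(marks)


def format_diagram(sequence, model_cuts, signal_length=0, group_width=50):
    """Return the diagram as a list of (label, text) rows."""
    sequence = sequence.upper()
    lines = {name: mark_line(len(sequence), cuts, signal_length)
             for name, cuts in model_cuts.items()}
    # Consensus: cleaved if at least one model predicted cleavage at the site.
    consensus_cuts = set().union(*model_cuts.values())
    lines["Consensus"] = mark_line(len(sequence), consensus_cuts, signal_length)

    rows = []
    for start in range(0, len(sequence), group_width):
        stop = start + group_width
        rows.append(("Sequence", in_blocks(sequence[start:stop])))
        for name, marks in lines.items():
            rows.append((name, in_blocks(marks[start:stop])))
    return rows


# First 50 residues of the human proglucagon example, with a 15-residue signal sequence.
rows = format_diagram(
    "MKSIYFVAGLFVMLVQGSWQRSLQDTEEKSRSFSASQADPLSDPDQMNED",
    {"Known Motif": set(), "Mammal": {21}},
    signal_length=15,
)
for label, text in rows:
    print(f"{label}\t{text}")
```

With these inputs the four rows match the first group of the example diagram above.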

C. Predicted Cleavage Results

If Display Cleavage Probabilities is set to 'Yes' in the Simplified Options Interface, or if the Advanced Options Interface is used, the Predicted Cleavage Results table is provided for all valid sequences submitted. This table reports cleavage results across all selected models for any site where at least one model predicted cleavage. Under the Model Accuracy Statistics task, with valid input, the cleavage information is provided for all known cleavages reported by the user. The Predicted Cleavage Results table for the Human Proglucagon sequence, using the default NeuroPred settings with the Known Motif and Mammalian models selected, is shown below:

**Predicted Cleavage Results**
Site	Known Cleavage	Model	Cleavage Probability	CI Lower Bound	CI Upper Bound
R21	N/A	Known Motif	0.0000	0.0000	0.0000
R21	N/A	Mammal	0.6267	0.3601	0.8335
K29	N/A	Known Motif	0.0000	0.0000	0.0000
K29	N/A	Mammal	0.0122	0.0037	0.0395
R31	N/A	Known Motif	0.0000	0.0000	0.0000
R31	N/A	Mammal	0.2655	0.1736	0.3834
R52	N/A	Known Motif	0.8808	0.8808	0.8808
R52	N/A	Mammal	0.9957	0.9786	0.9992
K64	N/A	Known Motif	0.0000	0.0000	0.0000
K64	N/A	Mammal	0.0018	0.0007	0.0047
R70	N/A	Known Motif	0.8808	0.8808	0.8808
R70	N/A	Mammal	0.5257	0.4347	0.6149
R83	N/A	Known Motif	0.8808	0.8808	0.8808
R83	N/A	Mammal	0.9105	0.8655	0.9415
R85	N/A	Known Motif	0.0000	0.0000	0.0000
R85	N/A	Mammal	0.0496	0.0388	0.0631
R91	N/A	Known Motif	0.8808	0.8808	0.8808
R91	N/A	Mammal	0.9105	0.8655	0.9415
R97	N/A	Known Motif	0.0000	0.0000	0.0000
R97	N/A	Mammal	0.5434	0.3089	0.7601
K117	N/A	Known Motif	0.0000	0.0000	0.0000
K117	N/A	Mammal	0.0018	0.0007	0.0047
K125	N/A	Known Motif	0.0000	0.0000	0.0000
K125	N/A	Mammal	0.0104	0.0024	0.0437
R130	N/A	Known Motif	0.9975	0.9975	0.9975
R130	N/A	Mammal	0.9201	0.8575	0.9566
R145	N/A	Known Motif	0.8808	0.8808	0.8808
R145	N/A	Mammal	0.5257	0.4347	0.6149
R165	N/A	Known Motif	0.0000	0.0000	0.0000
R165	N/A	Mammal	0.0496	0.0388	0.0631
K175	N/A	Known Motif	0.0000	0.0000	0.0000
K175	N/A	Mammal	0.0018	0.0007	0.0047

Description of Predicted Cleavage Results columns

Site:: Identity and location of the cleaved site where R or K denote Arginine and Lysine, respectively, and the number is the position of the amino acid from the start of the submitted sequence (i.e. including the signal peptide sequence).
Known Cleavage:: Denotes prior knowledge of cleavage. For the Predict Cleavage Sites Only and Obtain Mass of Predicted peptides tasks, N/A denotes not applicable because any known cleavage information that is entered is ignored. For the Model Accuracy Statistics task, with valid input, the "Known Cleavage" column will read "True" for known cleaved sites or "False" for known non-cleaved sites.
Model:: Selected prediction model or models.
Cleavage Probability:: The predicted cleavage probability at that site for each model selected.
CI Lower Bound:: Lower bound of the confidence interval for the predicted cleavage probability.
CI Upper Bound:: Upper bound of the confidence interval for the predicted cleavage probability.
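
For readers who want to post-process this table, the following Python sketch parses tab-separated rows and lists, for each site, the models whose cleavage probability exceeds a threshold. The function names (`parse_rows`, `predicted_by`) are hypothetical, and the threshold of 0.5 is an assumption of this sketch, since the threshold probability in use is not stated in this section.

```python
from collections import defaultdict

# A few rows copied from the Predicted Cleavage Results table (tab-separated).
ROWS = "\n".join([
    "R21\tN/A\tKnown Motif\t0.0000\t0.0000\t0.0000",
    "R21\tN/A\tMammal\t0.6267\t0.3601\t0.8335",
    "K29\tN/A\tKnown Motif\t0.0000\t0.0000\t0.0000",
    "K29\tN/A\tMammal\t0.0122\t0.0037\t0.0395",
    "R52\tN/A\tKnown Motif\t0.8808\t0.8808\t0.8808",
    "R52\tN/A\tMammal\t0.9957\t0.9786\t0.9992",
])


def parse_rows(text):
    """Turn tab-separated table rows into a list of dictionaries."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        site, known, model, probability, lower, upper = line.split("\t")
        records.append({
            "site": site,
            "known": known,
            "model": model,
            "probability": float(probability),
            "ci": (float(lower), float(upper)),
        })
    return records


def predicted_by(records, threshold=0.5):
    """Map each site to the models whose probability exceeds the threshold."""
    calls = defaultdict(list)
    for record in records:
        models = calls[record["site"]]  # creates the entry even if no model predicts cleavage
        if record["probability"] > threshold:  # assumed threshold, see lead-in
            models.append(record["model"])
    return dict(calls)


calls = predicted_by(parse_rows(ROWS))
print(calls)
```

A site with a non-empty list corresponds to a "C" on the consensus line of the diagram under the same threshold assumption.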

D. Model Accuracy Statistics Output

The Model Accuracy Statistics task provides a series of outputs for each sequence and a summary across all sequences. By default, the model accuracy statistics are calculated only for basic amino acids, following the processing rules of Amare et al. 2006 and Southey et al. 2008. Consequently, different results will occur under the Advanced Options Interface when changing these two options from the default values:

Ignore processing rules option: Under the Advanced Options Interface, if Yes is selected, all of the 'redundant' sites in the sequence will be used to compute the different statistics provided.

Use basic sites for accuracy statistics option: If No is selected under the Advanced Options Interface, the complete sequence, including all non-basic sites that are usually considered uncleaved, will be used to compute the different statistics provided.

1. Results for Individual Sequences

For each sequence submitted, the Individual Sequence Model Accuracy Statistics table provides the number of correct and incorrect predictions calculated by each selected model at threshold probabilities incremented from 0.1 to 0.9 by 0.1 units.

**Individual Sequence Model Accuracy Statistics**
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
True Positives	Known Motif	5	5	5	5	5	5	5	5	1
True Positives	Mammal	6	6	6	6	6	4	4	4	4
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
True Negatives	Known Motif	9	9	9	9	9	9	9	9	10
True Negatives	Mammal	7	7	8	8	8	9	10	10	10
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
False Positives	Known Motif	1	1	1	1	1	1	1	1	0
False Positives	Mammal	3	3	2	2	2	1	0	0	0
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
False Negatives	Known Motif	1	1	1	1	1	1	1	1	5
False Negatives	Mammal	0	0	0	0	0	2	2	2	2

Description of the columns for the Individual Sequence Model Accuracy Statistics table

Statistic:: Count reported in the row: True Positives, True Negatives, False Positives, or False Negatives.

Model:: Selected prediction model or models.
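
The counts in this table can be reproduced in principle by comparing each predicted probability with each threshold. The Python sketch below shows one way to do this; it is not NeuroPred's code, the function names are hypothetical, and the probabilities and known cleavage labels are made-up toy values rather than the proglucagon data.

```python
THRESHOLDS = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9


def confusion_counts(probabilities, known_cleaved, threshold):
    """Count correct and incorrect predictions at a single threshold."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for probability, known in zip(probabilities, known_cleaved):
        predicted = probability > threshold  # cleavage is predicted above the threshold
        if predicted and known:
            counts["TP"] += 1
        elif predicted and not known:
            counts["FP"] += 1
        elif known:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def counts_by_threshold(probabilities, known_cleaved):
    """Return the four counts for every threshold from 0.1 to 0.9."""
    return {t: confusion_counts(probabilities, known_cleaved, t) for t in THRESHOLDS}


# Toy data (hypothetical): four sites, two known cleaved and two known non-cleaved.
toy_probabilities = [0.95, 0.55, 0.05, 0.35]
toy_known = [True, True, False, False]
table = counts_by_threshold(toy_probabilities, toy_known)
for threshold, counts in table.items():
    print(threshold, counts)
```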

2. Results Across All Sequences

Three tables of accuracy statistics are calculated using the information from all sequences:

a. Sequence Description

Provides a summary across all the sequences entered.

**Sequence Description**
Number of precursors entered	1
Number of sites processed	16
Number of known cleaved sites	6
Number of known non-cleaved sites	10
Prevalence	0.3750

Description of the columns for the Sequence Description table

Number of precursors entered:: Number of precursor sequences that were recognized by NeuroPred.
Number of sites processed:: The total number of sites processed across all precursor sequences entered.
Number of known cleaved sites:: Total number of cleaved sites across all precursor sequences.
Number of known non-cleaved sites:: Total number of non-cleaved sites across all precursor sequences.
Prevalence:: The total number of known cleaved sites divided by the total number of sites processed.

b. Model Accuracy Statistics

The following statistics are calculated for each selected model at threshold probabilities incremented from 0.1 to 0.9 across all the submitted sequences.

**Model Accuracy Statistics**
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
True Positives	Known Motif	5	5	5	5	5	5	5	5	1
True Positives	Mammal	6	6	6	6	6	4	4	4	4
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
True Negatives	Known Motif	9	9	9	9	9	9	9	9	10
True Negatives	Mammal	7	7	8	8	8	9	10	10	10
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
False Positives	Known Motif	1	1	1	1	1	1	1	1	0
False Positives	Mammal	3	3	2	2	2	1	0	0	0
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
False Negatives	Known Motif	1	1	1	1	1	1	1	1	5
False Negatives	Mammal	0	0	0	0	0	2	2	2	2
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
Correct Classification Rate	Known Motif	0.8750	0.8750	0.8750	0.8750	0.8750	0.8750	0.8750	0.8750	0.6875
Correct Classification Rate	Mammal	0.8125	0.8125	0.8750	0.8750	0.8750	0.8125	0.8750	0.8750	0.8750
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
Sensitivity	Known Motif	0.8333	0.8333	0.8333	0.8333	0.8333	0.8333	0.8333	0.8333	0.1667
Sensitivity	Mammal	1.0000	1.0000	1.0000	1.0000	1.0000	0.6667	0.6667	0.6667	0.6667
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
Specificity	Known Motif	0.9000	0.9000	0.9000	0.9000	0.9000	0.9000	0.9000	0.9000	1.0000
Specificity	Mammal	0.7000	0.7000	0.8000	0.8000	0.8000	0.9000	1.0000	1.0000	1.0000
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
Positive Precision	Known Motif	0.8333	0.8333	0.8333	0.8333	0.8333	0.8333	0.8333	0.8333	1.0000
Positive Precision	Mammal	0.6667	0.6667	0.7500	0.7500	0.7500	0.8000	1.0000	1.0000	1.0000
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
Negative Precision	Known Motif	0.9000	0.9000	0.9000	0.9000	0.9000	0.9000	0.9000	0.9000	0.6667
Negative Precision	Mammal	1.0000	1.0000	1.0000	1.0000	1.0000	0.8182	0.8333	0.8333	0.8333
Statistic	Model	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
Correlation	Known Motif	0.7333	0.7333	0.7333	0.7333	0.7333	0.7333	0.7333	0.7333	0.3333
Correlation	Mammal	0.6831	0.6831	0.7746	0.7746	0.7746	0.5919	0.7454	0.7454	0.7454

Description of the columns for the Model Accuracy Statistics table

Statistic:
True Positives:: Number of sites correctly predicted to be cleaved.
True Negatives:: Number of sites correctly predicted to be non-cleaved.
False Positives:: Number of sites incorrectly predicted to be cleaved.
False Negatives:: Number of sites incorrectly predicted to be non-cleaved.
Correct Classification Rate:: Number of correctly predicted sites divided by the number of sites.
Sensitivity:: Number of true positives divided by the number of known cleaved sites.
Specificity:: Number of true negatives divided by the number of known non-cleaved sites.
Positive Precision:: Proportion of sites predicted to be cleaved that are true positives.
Negative Precision:: Proportion of sites not predicted to be cleaved that are true negatives.
Correlation:: Matthews correlation coefficient between observed and predicted cleavage.
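
The derived statistics follow directly from the four counts. The Python sketch below computes them from the definitions given here, using the standard formula for the Matthews correlation coefficient. The function name `accuracy_statistics` is hypothetical, and returning NaN when a denominator is zero is an assumption of this sketch; how NeuroPred handles that case is not stated.

```python
import math


def accuracy_statistics(tp, tn, fp, fn):
    """Compute the derived statistics from true/false positive/negative counts."""

    def ratio(numerator, denominator):
        # Assumption: report NaN when a statistic is undefined (zero denominator).
        return numerator / denominator if denominator else float("nan")

    total = tp + tn + fp + fn
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Correct Classification Rate": ratio(tp + tn, total),
        "Sensitivity": ratio(tp, tp + fn),
        "Specificity": ratio(tn, tn + fp),
        "Positive Precision": ratio(tp, tp + fp),
        "Negative Precision": ratio(tn, tn + fn),
        "Correlation": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Known Motif model at threshold 0.1 in the example table: TP=5, TN=9, FP=1, FN=1.
stats = accuracy_statistics(tp=5, tn=9, fp=1, fn=1)
for name, value in stats.items():
    print(f"{name}\t{value:.4f}")
```

For these counts the sketch gives 0.8750, 0.8333, 0.9000, 0.8333, 0.9000 and 0.7333, which agree with the Known Motif column at threshold 0.1 in the example table.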

c. Area Under the ROC curve (AUC)

The area under the receiver operating characteristic (ROC) curve (AUC) is reported for each user-selected model. The AUC indicates the percentage of correct decisions: values greater than 0.8 indicate excellent performance and values under 0.7 indicate poor performance.

**Area under the ROC curve**
Model	AUC
Known Motif	0.8750
Mammal	0.9500

Description of the columns for the Area under the ROC curve table

Model:: Selected prediction model or models.
AUC:: Area under the receiver operating characteristic (ROC) curve.
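
One common way to compute an AUC is the rank-based form: the proportion of (cleaved, non-cleaved) site pairs in which the cleaved site receives the higher probability, counting ties as one half. The Python sketch below uses that form. It is an assumption that this matches NeuroPred's calculation, which is not described here, and the function name and toy data are hypothetical.

```python
def area_under_roc(probabilities, known_cleaved):
    """Rank-based AUC: share of positive/negative pairs ordered correctly (ties count 0.5)."""
    positives = [p for p, known in zip(probabilities, known_cleaved) if known]
    negatives = [p for p, known in zip(probabilities, known_cleaved) if not known]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one cleaved and one non-cleaved site")

    score = 0.0
    for positive in positives:
        for negative in negatives:
            if positive > negative:
                score += 1.0
            elif positive == negative:
                score += 0.5
    return score / (len(positives) * len(negatives))


# Toy data (hypothetical): two known cleaved sites and two known non-cleaved sites.
toy_auc = area_under_roc([0.9, 0.4, 0.6, 0.1], [True, True, False, False])
print(f"{toy_auc:.4f}")
```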

E. Obtain Mass of Predicted Peptides Output

When the Obtain Mass of Predicted Peptides task is selected, the original sequence is cleaved based on the predicted values for each model. The resulting peptides are extended by an order of 2 by default, or by the Degree of peptide extension value selected in the Advanced Options Interface (described in the Degree of peptide extension option), before the selected post-translational modifications (PTMs) are applied. PTMs that have been successfully applied are reported in small blue brackets (e.g. DFPEEVAIVEEL[Amide] denotes amidation of the peptide DFPEEVAIVEELG). The average and monoisotopic masses of the predicted peptides are calculated at all stages, so that masses are available for every combination of PTMs in case some PTMs are absent. Note that the standard mass or molecular weight, not the MH+ or M+H mass, is calculated. The results are presented in the Mass of Predicted Peptides table:

**Mass of Predicted Peptides**
Abb. Peptide	NCut	CCut	PTM applied	Predicted Aver. Mass	Predicted Mono. Mass	Peptide sequence
Q16_R21	Signal Peptidase	Mammal	Cleaved	760.808200	760.366770	QGSWQR
Q16_Q20	Signal Peptidase	Mammal	TrimKR	604.620700	604.265660	QGSWQ
S22_R52	Mammal	Known Motif, Mammal	Cleaved	3512.677900	3510.585600	SLQDTEEKSRSFSASQADPLSDPDQMNEDKR
S22_D50	Mammal	Known Motif, Mammal	TrimKR	3228.316300	3226.389530	SLQDTEEKSRSFSASQADPLSDPDQMNED
H53_R70	Known Motif, Mammal	Known Motif, Mammal	Cleaved	2148.277000	2147.008320	HSQGTFTSDYSKYLDSRR
H53_S68	Known Motif, Mammal	Known Motif, Mammal	TrimKR	1835.902000	1834.806100	HSQGTFTSDYSKYLDS
A71_R83	Known Motif, Mammal	Known Motif, Mammal	Cleaved	1636.889700	1635.824270	AQDFVQWLMNTKR
A71_T81	Known Motif, Mammal	Known Motif, Mammal	TrimKR	1352.528100	1351.628200	AQDFVQWLMNT
N84_R91	Known Motif, Mammal	Known Motif, Mammal	Cleaved	985.114700	984.562840	NRNNIAKR
N84_A89	Known Motif, Mammal	Known Motif, Mammal	TrimKR	700.753100	700.366770	NRNNIA
H92_R97	Known Motif, Mammal	Mammal	Cleaved	831.840800	831.356250	HDEFER
H92_E96	Known Motif, Mammal	Mammal	TrimKR	675.653300	675.255140	HDEFE
H98_R130	Mammal	Known Motif, Mammal	Cleaved	3668.087000	3665.875340	HAEGTFTSDVSSYLEGQAAKEFIAWLVKGRGRR
H98_G128	Mammal	Known Motif, Mammal	TrimKR	3355.712000	3353.673120	HAEGTFTSDVSSYLEGQAAKEFIAWLVKGRG
D131_R145	Known Motif, Mammal	Known Motif, Mammal	Cleaved	1758.949600	1757.899900	DFPEEVAIVEELGRR
D131_G143	Known Motif, Mammal	Known Motif, Mammal	TrimKR	1446.574600	1445.697680	DFPEEVAIVEELG
H146_R179	Known Motif, Mammal	Sequence End	Cleaved	3922.341600	3919.921380	HADGSFSDEMNTILDNLAARDFINWLIQTKITDR
H146_D178	Known Motif, Mammal	Sequence End	TrimKR	3766.154100	3763.820270	HADGSFSDEMNTILDNLAARDFINWLIQTKITD
H98_R127	Known Motif, Mammal	Sequence End	TrimKR+ Amidation	3297.675400	3295.669520	HAEGTFTSDVSSYLEGQAAKEFIAWLVKGR[Amide]
D131_L142	Known Motif, Mammal	Sequence End	TrimKR+ Amidation	1388.538000	1387.694080	DFPEEVAIVEEL[Amide]
Q16_R21	Known Motif, Mammal	Sequence End	Cleaved+ QPyroglutamination	743.777600	743.340170	[p-]QGSWQR
Q16_Q20	Known Motif, Mammal	Sequence End	TrimKR+ QPyroglutamination	587.590100	587.239060	[p-]QGSWQ
S22_R52	Known Motif, Mammal	Sequence End	Cleaved+ Acetylation	3554.715200	3552.596200	[Ac-]SLQDTEEKSRSFSASQADPLSDPDQMNEDKR
S22_D50	Known Motif, Mammal	Sequence End	TrimKR+ Acetylation	3270.353600	3268.400130	[Ac-]SLQDTEEKSRSFSASQADPLSDPDQMNED
A71_R83	Known Motif, Mammal	Sequence End	Cleaved+ Acetylation	1678.927000	1677.834870	[Ac-]AQDFVQWLMNTKR
A71_T81	Known Motif, Mammal	Sequence End	TrimKR+ Acetylation	1394.565400	1393.638800	[Ac-]AQDFVQWLMNT
H53_R70	Known Motif, Mammal	Sequence End	Cleaved+ YSulfation 1x	2228.341200	2226.965120	HSQGTFTSDY[SO3H]SKYLDSRR
H53_S68	Known Motif, Mammal	Sequence End	TrimKR+ YSulfation 1x	1915.966200	1914.762900	HSQGTFTSDY[SO3H]SKYLDS
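
To illustrate what a standard (neutral) peptide mass means, the Python sketch below sums amino acid residue masses and adds one water molecule, without adding a proton as an MH+ value would. The constants are widely used residue masses and are an assumption of this sketch: NeuroPred's own constants are not given in this document, so results can differ from the table above in the later decimal places. PTMs are not handled, and the name `peptide_mass` is hypothetical.

```python
# Residue masses in daltons (assumed constants, not taken from NeuroPred).
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
AVERAGE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MONOISOTOPIC = 18.01056
WATER_AVERAGE = 18.01528


def peptide_mass(sequence, monoisotopic=True):
    """Neutral mass of an unmodified peptide: sum of residue masses plus one water."""
    residues = MONOISOTOPIC if monoisotopic else AVERAGE
    water = WATER_MONOISOTOPIC if monoisotopic else WATER_AVERAGE
    return sum(residues[amino_acid] for amino_acid in sequence.upper()) + water


# Peptide Q16_Q20 from the table above.
print(f"{peptide_mass('QGSWQ', monoisotopic=False):.4f}")
print(f"{peptide_mass('QGSWQ', monoisotopic=True):.4f}")
```

For QGSWQ and NRNNIA the sketch agrees with the tabulated average and monoisotopic masses to within about 0.01 Da.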

Description of the columns for the Mass of Predicted Peptides table

Abb. Peptide:: Abbreviated peptide name, in which the start and the end of the peptide are each given by the single-letter amino acid code and its location within the sequence.
NCut:: The model or models that predicted cleavage at the site corresponding to the start (N-terminal end) of the peptide. Signal Peptidase or Sequence Start denotes the start of the first peptide, depending on whether or not a signal peptide is indicated.
CCut:: The model or models that predicted cleavage at the site corresponding to the end (C-terminal end) of the peptide. Sequence End denotes the end of the sequence.
PTM Applied:: Provides which PTMs have been applied to the peptide. Most PTM designations are self-explanatory; however, "Cleaved" denotes that the peptide has only been cleaved, "Extended" denotes that adjacent peptides have been joined, and "TrimKR" indicates that all the C-terminal K and R amino acids have been removed after cleavage. When a PTM is applied multiple times, only the mass with all occurrences is reported and the frequency is also reported in the "PTM Applied" column by indicating the number of PTM occurrences, followed by the times symbol ("x"), (e.g., Acetylation 3x).
Predicted Aver. Mass:: Predicted average mass of the peptide including any PTMs.
Predicted Mono. Mass:: Predicted monoisotopic mass of the peptide including any PTMs.
Peptide Sequence:: Complete sequence of the peptide. Any post-translational modifications that have been successfully applied are reported using small blue brackets (e.g. DFPEEVAIVEEL[Amide] denotes amidation of the peptide DFPEEVAIVEELG).
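
The abbreviated peptide names and the TrimKR step can be illustrated with a short Python sketch. The function names `abbreviate` and `trim_kr` are hypothetical, and the sketch only mirrors the column descriptions above; it is not NeuroPred's implementation.

```python
def abbreviate(peptide, start):
    """Name a peptide as <first residue><start>_<last residue><end>, using 1-based positions."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    end = start + len(peptide) - 1
    return f"{peptide[0]}{start}_{peptide[-1]}{end}"


def trim_kr(peptide):
    """Remove all C-terminal K and R residues from a cleaved peptide."""
    return peptide.rstrip("KR")


# Peptide H98_R130 from the Mass of Predicted Peptides table.
cleaved = "HAEGTFTSDVSSYLEGQAAKEFIAWLVKGRGRR"
trimmed = trim_kr(cleaved)
print(abbreviate(cleaved, 98), cleaved)
print(abbreviate(trimmed, 98), trimmed)
```

Only the trailing run of K and R residues is removed, so the internal K and R residues of H98_R130 are kept and the trimmed peptide is H98_G128, as in the table.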

F. References

Amare, A., Hummon, A.B., Southey, B.R., Zimmerman, T.A., Rodriguez-Zas, S.L., Sweedler, J.V., Bridging neuropeptidomics and genomics with bioinformatics: prediction of mammalian neuropeptide prohormone processing. J. Proteome Res., 2006, 5, 1162-1167.

Southey, B.R., Amare, A., Zimmerman, T.A., Rodriguez-Zas, S.L., Sweedler, J.V., NeuroPred: a tool to predict cleavage sites in neuropeptide precursors and provide the masses of the resulting peptides. Nucleic Acids Res., 2006b, 34 (Web Server issue), W267-272.

Tegge, A.N., Southey, B.R., Sweedler, J.V., Rodriguez-Zas, S.L., Comparative Analysis of Neuropeptide Cleavage Sites in Human, Mouse, Rat, and Cattle. Mamm. Genome, 2008, 19(2), 106-120.

Southey, B.R., Hummon, A.B., Richmond, T.A., Sweedler, J.V., Rodriguez-Zas, S.L., Prediction of neuropeptide cleavage sites in insects. Bioinformatics, 2008, 24, 815-825.
