#### Cleavage Prediction with Logistic Regression

The prediction that a site will or will not be cleaved is based on which amino acids occur at the locations surrounding the site of interest. Cleavage of a site can be treated as a binary outcome because there are only two possibilities: cleaved and not cleaved. Thus, we are interested in the probability that a site will be cleaved given the amino acids that surround it.

A common approach to predicting binary outcomes is logistic regression, used here to describe the relationship between cleavage at a site and the amino acids that surround that site. An advantage of logistic regression is that the predicted probability is always constrained to lie between zero and one, so negative probabilities and probabilities greater than one never occur.

Logistic regression models the logit of the cleavage probability at a site as a linear function of the amino acids at the different locations that surround the site:

$$\operatorname{logit}(\pi) = \log\left(\frac{\pi}{1-\pi}\right) = \log\left(\pi\,(1-\pi)^{-1}\right) = Xb$$

Where:
• π = cleavage probability.
• log = the natural logarithm.
• Xb represents the combined influence of the specific amino acids present at each location surrounding the site on the cleavage probability of a site.
• X = the design vector for each amino acid that is possible at each location surrounding the site.
• b = the vector of regression coefficients for each amino acid that is possible at each location surrounding the site.
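
As an illustration of the terms defined above, the following is a minimal sketch of how a design vector X and the linear predictor Xb might be built. It assumes a 0/1 indicator (one-hot) encoding per position and amino acid, a 20-residue alphabet, a leading intercept term, a two-residue window, and made-up coefficient values. The helper names `design_vector` and `linear_predictor` are hypothetical, and none of these choices are taken from the NeuroPred models themselves.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


def design_vector(window):
    """One-hot encode a window of residues around a site.

    Assumption: one 0/1 indicator per (position, amino acid) pair,
    preceded by a constant 1 for an intercept term.
    """
    x = [1.0]  # intercept (an assumption; not stated in the text)
    for residue in window:
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unknown residue: {residue!r}")
        x.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return x


def linear_predictor(x, b):
    """Compute Xb as the dot product of design vector and coefficients."""
    if len(x) != len(b):
        raise ValueError("x and b must have the same length")
    return sum(xi * bi for xi, bi in zip(x, b))


# Illustrative coefficients only: all zero except an intercept and two
# hypothetical effects (K at the first position, R at the second).
n_positions = 2
b = [0.0] * (1 + n_positions * len(AMINO_ACIDS))
b[0] = -2.0                                                   # intercept
b[1 + 0 * len(AMINO_ACIDS) + AMINO_ACIDS.index("K")] = 1.5   # K, position 1
b[1 + 1 * len(AMINO_ACIDS) + AMINO_ACIDS.index("R")] = 2.5   # R, position 2

x = design_vector("KR")
xb = linear_predictor(x, b)
print(xb)  # -2.0 + 1.5 + 2.5 = 2.0
```

Because each position contributes exactly one nonzero indicator, Xb reduces to the intercept plus the coefficient of the amino acid actually observed at each location.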

Given the specific amino acids present at each location surrounding the site, the cleavage probability of a site can be calculated as:

$$\pi = \exp(Xb)\,\bigl(1+\exp(Xb)\bigr)^{-1} = \bigl(1+\exp(-Xb)\bigr)^{-1} = \frac{1}{1+\exp(-Xb)}$$

Where:
• exp = the exponential function.
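
The conversion from Xb to a cleavage probability can be sketched in a few lines. The function name `cleavage_probability` is hypothetical, and the example values of Xb are arbitrary. The branch on the sign of Xb is an implementation choice made here to avoid floating-point overflow, and both branches are algebraically the same formula given above.

```python
import math


def cleavage_probability(xb):
    """Inverse logit: pi = 1 / (1 + exp(-Xb))."""
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    # For negative Xb use the equivalent form exp(Xb) / (1 + exp(Xb)),
    # so exp() is never called with a large positive argument.
    e = math.exp(xb)
    return e / (1.0 + e)


# Arbitrary example values of the linear predictor Xb.
for xb in (-5.0, 0.0, 2.0):
    print(xb, round(cleavage_probability(xb), 4))
```

An Xb of zero corresponds to a probability of exactly 0.5. Positive values push the probability toward one and negative values push it toward zero, and the result never leaves the interval from zero to one.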