BLAST, PSI-BLAST, and Homology
Are two similar sequences homologous (i.e., is their similarity due to shared ancestry)?
One way to quantify the similarity between two sequences is to:
- compare the actual sequences and calculate the alignment score
- randomize (scramble) one (or both) of the sequences and calculate the alignment score for the randomized sequences
- repeat the randomization step at least 100 times
- describe the distribution of the randomized alignment scores
- do a statistical test to determine whether the score obtained for the real sequences is significantly better than the scores for the randomized sequences
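The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how PRSS is implemented: the scoring scheme (match = 2, mismatch = -1, linear gap penalty = -2) and the function names are assumptions made for this example, whereas real programs use substitution matrices and affine gap penalties. The empirical P-value at the end is one simple choice of statistical test.

```python
import random


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (toy scoring, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def shuffled_scores(a, b, n_shuffles=100, seed=0):
    """Score sequence a against n_shuffles scrambled copies of sequence b."""
    rng = random.Random(seed)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps the composition, destroys the order
        scores.append(local_alignment_score(a, "".join(letters)))
    return scores


def empirical_p_value(real_score, scores):
    """Fraction of randomized scores at least as good as the real score.

    The +1 terms keep the estimate away from exactly zero.
    """
    at_least_as_good = sum(1 for s in scores if s >= real_score)
    return (at_least_as_good + 1) / (len(scores) + 1)
```

Typical use would be to call `local_alignment_score(seq_a, seq_b)` for the real pair, `shuffled_scores(seq_a, seq_b)` for the randomized distribution, and then compare the two.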
To illustrate the assessment of similarity/homology, we will use PRSS, a program
from Pearson's FASTA package.
This and many other programs by Bill Pearson are available from his web
page at ftp://ftp.virginia.edu/pub/fasta/.
A web version is available here.
Example: Sequences are here (fl), here (B), here (A) and here (A2).
P-values give the probability of finding a match of this quality or better by chance. P-values lie in [0,1]; E-values (the expected number of such matches) lie in [0,infinity).
For small values, E is approximately equal to P.
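Why E and P agree for small values can be seen from the relationship commonly used in BLAST-style statistics, P = 1 - exp(-E), which assumes that the number of chance matches follows a Poisson distribution. The short sketch below converts between the two; the function names are made up for this example.

```python
import math


def p_from_e(e_value):
    """P = 1 - exp(-E), assuming a Poisson number of chance matches."""
    # expm1 keeps precision when E is very small.
    return -math.expm1(-e_value)


def e_from_p(p_value):
    """Inverse relationship: E = -ln(1 - P), defined for 0 <= P < 1."""
    return -math.log1p(-p_value)
```

For E = 0.001 the two numbers are nearly indistinguishable, while for large E the P-value saturates just below 1 and the E-value keeps growing.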
Z-values give the distance between the actual alignment score and the mean of the scores for the randomized
sequences expressed as multiples of the standard deviation calculated for the randomized scores.
For example: a z-value of 3 means that the actual alignment score is 3 standard deviations better than the average for
the randomized sequences. Z-values > 3 are usually considered as suggestive of homology, z-values > 5 are considered
as sufficient demonstration.
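As a minimal sketch of this definition, the function below computes a z-value from an actual score and a list of scores for randomized sequences. The function name is hypothetical and the sample standard deviation is used as an assumption; the numbers in the usage note are invented purely to show the arithmetic.

```python
import statistics


def z_value(actual_score, randomized_scores):
    """Distance of the actual score from the randomized mean, in standard deviations."""
    mean = statistics.mean(randomized_scores)
    sd = statistics.stdev(randomized_scores)  # sample standard deviation
    if sd == 0:
        raise ValueError("randomized scores do not vary; z-value is undefined")
    return (actual_score - mean) / sd
```

With invented randomized scores of 18, 20 and 22 (mean 20, standard deviation 2), an actual score of 50 lies 15 standard deviations above the mean, so `z_value(50, [18, 20, 22])` gives 15.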
If you can demonstrate significant similarity using either randomization or an
unweighted BLAST search, your sequences are homologous (i.e., related by common
ancestry). Convergent evolution has not been shown to lead to sequence similarities
detectable by these means (see above; this might not be true for scores in PSI-BLAST).
If the actual alignment score is more than three standard deviations (of the randomized
sequences) better than the mean for the randomized sequences, the two sequences are
homologous (i.e. related by common ancestry).
BUT:
Failure to detect significant similarity only shows our inability to detect
homology; it does not prove that the sequences are not homologous.
Example:
- Jim Knox (MCB-UConn) has studied many proteins involved in bacterial cell wall biosynthesis and
antibiotic binding, synthesis or destruction. Many of these proteins have identical 3-D structures and
can therefore be assumed to be homologous; however, the above tests fail to detect these homologies (for
example, enzymes with GRASP nucleotide binding sites are depicted
here).