The output of PRSS for two related sequences
(V-ATPase A-subunits from fly and Giardia) is given here:
giardiaA.txt, 654 aa vs DrosA.txt
s-w est
< 38 0 0:
40 0 0:
42 0 0:
44 1 1:*
46 5 3:==*==
48 7 5:====*==
50 12 8:=======*====
52 11 9:========*==
54 10 10:=========*
56 10 10:=========*
58 9 10:=========*
60 4 8:====   *
62 4 7:====  *
64 7 6:=====*=
66 3 5:=== *
68 4 4:===*
70 3 3:==*
72 2 2:=*
74 1 2:=*
76 0 1:*
78 1 1:*
80 2 1:*=
82 1 1:*
84 0 0:
86 0 0:
88 0 0:
90 0 0:
92 1 0:=
94 2 0:==
96 0 0:
98 0 0:
100 0 0:
102 0 0:
>104 0 0: O
61400 residues in 100 sequences,
BLOSUM50 matrix, gap penalties: -12,-2
unshuffled s-w score: 1861; shuffled score range: 45 - 96
Lambda: 0.1405 K: 0.0059872; P(1861)= 7.1435e-111
For 100 sequences, a score >=1861 is expected 7.14e-109 times
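The stars (*) denote the distribution fitted to the randomized (shuffled) data,
and the O marks the bin containing the unshuffled score (for a score of 1861, the overflow bin >104).
The “probability” of obtaining the actual alignment score, or a better one, by chance
is calculated from this fitted distribution.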
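How a fitted curve yields both the stars and the probability can be sketched in Python. This is a minimal sketch, assuming the fitted distribution has the standard Karlin-Altschul extreme-value form P(S >= x) = 1 - exp(-K·m·n·e^(-λx)), with m and n taken as the two sequence lengths, and assuming each histogram row labeled x counts scores in [x, x+2). The function names are hypothetical and not part of the FASTA package; PRSS's exact computation may differ.

```python
import math

# Sketch, assuming the Karlin-Altschul extreme-value form
#   P(S >= x) = 1 - exp(-K * m * n * exp(-lambda * x)).
# Function names are illustrative, not part of the FASTA package.

def evd_sf(score: float, lam: float, k: float, m: int, n: int) -> float:
    """Probability that a chance alignment scores at least `score`."""
    x = k * m * n * math.exp(-lam * score)  # expected number of chance hits >= score
    return -math.expm1(-x)                   # equals 1 - exp(-x), accurate for tiny x

def expected_bin_count(lo: float, width: float, n_shuffles: int,
                       lam: float, k: float, m: int, n: int) -> float:
    """Expected number of shuffled scores in [lo, lo + width), i.e. what a '*' marks."""
    p_bin = evd_sf(lo, lam, k, m, n) - evd_sf(lo + width, lam, k, m, n)
    return n_shuffles * p_bin

if __name__ == "__main__":
    # Printed values from the 100-shuffle fly vs. Giardia run;
    # n = 61400 residues / 100 sequences = 614 (length of the shuffled sequence).
    lam, k, m, n = 0.1405, 0.0059872, 654, 614
    for lo in range(44, 66, 2):
        print(f"{lo:3d} {expected_bin_count(lo, 2, 100, lam, k, m, n):5.1f}")
    print(f"P(1861) = {evd_sf(1861, lam, k, m, n):.3e}")
```

Under these assumptions the per-bin values land close to the est column above. The recomputed P(1861) is near, but not identical to, the printed 7.1435e-111, partly because λ is printed to only four digits and is multiplied by a score as large as 1861.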
The histogram improves when more shuffling rounds are included (here 1000 instead of 100);
the estimated P changes, but the bottom line stays the same:
giardiaa.txt, 654 aa vs drosA.txt
// stuff deleted
96 0 1:*
98 0 0:
100 1 0:=
102 0 0:
104 0 0:
106 0 0:
108 0 0:
>110 0 0: O
614000 residues in 1000 sequences,
BLOSUM50 matrix, gap penalties: -12,-2
unshuffled s-w score: 1861; shuffled score range: 42 - 102
Lambda: 0.15491 K: 0.012682; P(1861)= 3.4764e-122
For 1000 sequences, a score >=1861 is expected 3.48e-119 times
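Lambda and K are re-estimated from the shuffled scores in every run, which is why they differ between the 100- and 1000-shuffle runs. As an illustration only, the sketch below fits an extreme-value (Gumbel) distribution by the method of moments: for a Gumbel distribution the standard deviation is π/(λ√6) and the mean is u + γ/λ (γ ≈ 0.5772, Euler's constant), and K follows from u = ln(K·m·n)/λ. This is not necessarily PRSS's own estimator, the helper names are hypothetical, and the scores are synthetic draws from a known distribution rather than real Smith-Waterman scores of shuffled sequences.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def fit_evd_moments(scores, m: int, n: int):
    """Method-of-moments fit of lambda and K to shuffled-sequence scores."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6))   # Gumbel: sd = pi / (lambda * sqrt(6))
    u = mean - EULER_GAMMA / lam          # Gumbel: mean = u + gamma / lambda
    k = math.exp(lam * u) / (m * n)       # location u = ln(K * m * n) / lambda
    return lam, k

def synthetic_scores(count: int, lam: float, u: float, rng: random.Random):
    """Hypothetical stand-in for shuffle-and-align: Gumbel-distributed scores."""
    out = []
    for _ in range(count):
        p = rng.random()
        while p == 0.0:                   # random() may return 0.0; avoid log(0)
            p = rng.random()
        out.append(u - math.log(-math.log(p)) / lam)  # inverse Gumbel CDF
    return out

if __name__ == "__main__":
    rng = random.Random(1)
    m, n = 654, 614
    for rounds in (100, 1000, 10000):
        lam, k = fit_evd_moments(synthetic_scores(rounds, 0.15, 55.0, rng), m, n)
        print(f"{rounds:6d} shuffles: lambda={lam:.4f}  K={k:.3g}")
```

Estimates from more samples scatter less around the true values (here λ = 0.15), which is the sense in which more shuffling rounds give a better fit.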
Comparing two less related sequences (an ATPase involved in protein export versus the V-ATPase A-subunit),
one obtains:
giardiaa.txt, 654 aa vs flii.txt
s-w est
< 36 0 0:
38 0 0:
40 0 0:
42 2 1:*=
44 4 3:==*=
46 12 5:====*=======
48 3 7:===   *
50 7 9:======= *
52 10 10:=========*
54 10 10:=========*
56 9 9:========*
58 7 9:======= *
60 7 7:======*
62 8 6:=====*==
64 7 5:====*==
66 3 4:===*
68 4 3:==*=
70 0 3:  *
72 1 2:=*
74 2 2:=*
76 0 1:*
78 0 1:*
80 1 1:*
82 1 1:*
84 1 0:=
86 0 0:
88 0 0:
90 0 0:
92 0 0:
94 1 0:=
96 0 0:
98 0 0:
100 0 0:
102 0 0:
>104 0 0: O
43400 residues in 100 sequences,
BLOSUM50 matrix, gap penalties: -12,-2
unshuffled s-w score: 269; shuffled score range: 43 - 96
Lambda: 0.13635 K: 0.0055017; P(269)= 1.9668e-13
For 100 sequences, a score >=269 is expected 1.97e-11 times
And for sequences that are unrelated, or whose relationship is not detected,
the output looks as follows:
test2.txt, 565 aa vs flii.txt
s-w est
< 36 0 0:
38 0 0:
40 0 0:
42 3 1:*==
44 8 4:===*====
46 6 7:======*
48 13 11:==========*==
50 11 12:===========*
52 9 13:=========   *
54 15 12:===========*=== O
56 8 10:======== *
58 6 8:====== *
60 6 6:=====*
62 7 5:====*==
64 2 3:==*
66 3 2:=*=
68 2 2:=*
70 0 1:*
72 0 1:*
74 0 1:*
76 0 0:
78 0 0:
80 0 0:
82 0 0:
84 0 0:
86 0 0:
88 1 0:=
90 0 0:
92 0 0:
94 0 0:
96 0 0:
> 98 0 0:
43400 residues in 100 sequences,
BLOSUM50 matrix, gap penalties: -12,-2
unshuffled s-w score: 54; shuffled score range: 43 - 90
Lambda: 0.17371 K: 0.032327; P(54)= 0.5179
For 100 sequences, a score >=54 is expected 52 times