PUZZLE 4.0.2

Type of analysis: tree reconstruction
Parameter estimation: approximate (faster)
Parameter estimation uses: neighbor-joining tree (for substitution process and rate variation)

Standard errors (S.E.) are obtained by the curvature method.
The lower and upper bounds of an approximate 95% confidence interval
for parameter or branch length x are x-1.96*S.E. and x+1.96*S.E.


SEQUENCE ALIGNMENT

Input data: 18 sequences with 631 amino acid sites
Number of constant sites: 28 (= 4.4% of all sites)


SUBSTITUTION PROCESS

Model of substitution: JTT (Jones et al. 1992)
Amino acid frequencies (estimated from data set):

 pi(A) =  10.9%
 pi(R) =   5.6%
 pi(N) =   3.8%
 pi(D) =   6.1%
 pi(C) =   1.1%
 pi(Q) =   2.8%
 pi(E) =   7.8%
 pi(G) =   7.2%
 pi(H) =   2.9%
 pi(I) =   7.4%
 pi(L) =   8.5%
 pi(K) =   5.9%
 pi(M) =   1.9%
 pi(F) =   3.2%
 pi(P) =   3.3%
 pi(S) =   5.3%
 pi(T) =   5.4%
 pi(W) =   0.5%
 pi(Y) =   2.3%
 pi(V) =   8.1%


RATE HETEROGENEITY

Model of rate heterogeneity: uniform rate


SEQUENCES IN INPUT ORDER

              5% chi-square test   p-value
 Microcysti        passed          76.48%   [21]
 Anabaena          passed          51.69%   [21]
 Lycopersic        passed           8.74%   [16]
 Escherichi        passed          62.63%   [30]
 Bacillus          passed          92.88%   [27]
 Lactococcu        passed          11.18%   [26]
 Methanococ        failed           0.00%   [19]
 Archaeoglo        passed          10.24%   [36]
 Methanobac        passed          58.09%   [49]
 Enterobact        failed           0.00%   [30]
 Azotobacte        failed           0.02%   [24]
 Rhodobacte        failed           0.00%   [19]
 Frankia           failed           0.00%   [36]
 Aquifex           failed           1.85%   [38]
 Deinocococ        passed          17.94%   [12]
 Thermus           passed          11.87%   [12]
 Sacchmt           passed           8.79%   [20]
 Sacchcyt          passed          15.96%   [24]

The chi-square test compares the amino acid composition of each sequence
to the frequency distribution assumed in the maximum likelihood model.
The number in square brackets indicates how often each sequence is
involved in one of the 115 completely unresolved quartets of the
quartet puzzling tree search.


IDENTICAL SEQUENCES

The sequences in each of the following groups are all identical. To speed
up computation please remove all but one of each group from the data set.

 All sequences are unique.


MAXIMUM LIKELIHOOD DISTANCES

Maximum likelihood distances are computed using the selected model of
substitution and rate heterogeneity.

  18
Microcysti  0.00000  0.25710  0.73325  0.85251  0.89803  1.00397  1.00710  1.18943  1.18579  1.90378  1.43924  1.73049  1.51592  1.73901  2.02385  1.85475  1.97440  1.91073
Anabaena    0.25710  0.00000  0.74451  0.86529  0.83549  0.98458  1.02310  1.16864  1.21251  1.80102  1.53382  1.68494  1.57361  1.78054  2.03677  1.80486  1.90645  1.86889
Lycopersic  0.73325  0.74451  0.00000  1.05616  0.99435  1.00344  1.13758  1.20268  1.26966  1.88130  1.47993  1.67110  1.56612  2.22720  2.19326  2.04210  2.12451  2.07584
Escherichi  0.85251  0.86529  1.05616  0.00000  1.07888  1.14293  1.21198  1.37743  1.46605  1.93158  1.63988  1.72480  1.64976  1.88424  2.14058  1.98227  2.10296  2.09448
Bacillus    0.89803  0.83549  0.99435  1.07888  0.00000  0.81627  1.13460  1.22363  1.21453  1.87583  1.59631  1.63376  1.41305  1.98419  2.07637  1.89149  2.06128  2.02385
Lactococcu  1.00397  0.98458  1.00344  1.14293  0.81627  0.00000  1.13164  1.28466  1.29266  1.96296  1.70580  1.58592  1.62123  1.87400  2.10692  1.87551  1.98533  1.98620
Methanococ  1.00710  1.02310  1.13758  1.21198  1.13460  1.13164  0.00000  0.71364  0.82837  1.76596  1.47462  1.65468  1.39369  1.55562  1.66896  1.46532  1.68152  1.62790
Archaeoglo  1.18943  1.16864  1.20268  1.37743  1.22363  1.28466  0.71364  0.00000  0.98653  1.87501  1.59404  1.57877  1.51489  1.76506  1.74833  1.50531  1.76468  1.76290
Methanobac  1.18579  1.21251  1.26966  1.46605  1.21453  1.29266  0.82837  0.98653  0.00000  1.61202  1.34243  1.72847  1.30313  1.57938  1.73929  1.52817  1.68849  1.65603
Enterobact  1.90378  1.80102  1.88130  1.93158  1.87583  1.96296  1.76596  1.87501  1.61202  0.00000  1.09119  1.37637  1.13940  2.45541  2.04586  2.03438  2.35770  2.31420
Azotobacte  1.43924  1.53382  1.47993  1.63988  1.59631  1.70580  1.47462  1.59404  1.34243  1.09119  0.00000  1.00608  0.98038  2.18707  1.99251  1.97092  2.05232  2.06017
Rhodobacte  1.73049  1.68494  1.67110  1.72480  1.63376  1.58592  1.65468  1.57877  1.72847  1.37637  1.00608  0.00000  0.93970  2.17641  2.13047  2.16137  2.41398  2.40091
Frankia     1.51592  1.57361  1.56612  1.64976  1.41305  1.62123  1.39369  1.51489  1.30313  1.13940  0.98038  0.93970  0.00000  2.15899  2.00706  1.98547  2.15262  2.11385
Aquifex     1.73901  1.78054  2.22720  1.88424  1.98419  1.87400  1.55562  1.76506  1.57938  2.45541  2.18707  2.17641  2.15899  0.00000  2.01536  1.90659  2.05165  2.08314
Deinocococ  2.02385  2.03677  2.19326  2.14058  2.07637  2.10692  1.66896  1.74833  1.73929  2.04586  1.99251  2.13047  2.00706  2.01536  0.00000  0.49739  0.94336  0.95713
Thermus     1.85475  1.80486  2.04210  1.98227  1.89149  1.87551  1.46532  1.50531  1.52817  2.03438  1.97092  2.16137  1.98547  1.90659  0.49739  0.00000  0.73310  0.73587
Sacchmt     1.97440  1.90645  2.12451  2.10296  2.06128  1.98533  1.68152  1.76468  1.68849  2.35770  2.05232  2.41398  2.15262  2.05165  0.94336  0.73310  0.00000  0.10351
Sacchcyt    1.91073  1.86889  2.07584  2.09448  2.02385  1.98620  1.62790  1.76290  1.65603  2.31420  2.06017  2.40091  2.11385  2.08314  0.95713  0.73587  0.10351  0.00000

Average distance (over all possible pairs of sequences):  1.58556


TREE SEARCH

Quartet puzzling is used to choose from the possible tree topologies
and to simultaneously infer support values for internal branches.

Number of puzzling steps: 1000
Analysed quartets: 3060
Unresolved quartets: 115 (= 3.8%)

Quartet trees are based on approximate maximum likelihood values
using the selected model of substitution and rate heterogeneity.


QUARTET PUZZLING TREE

Support for the internal branches of the unrooted quartet puzzling
tree topology is shown in percent.

This quartet puzzling tree is not completely resolved!

                            :---Bacillus
            :-------------99:
            :               :---Lactococcu
            :
            :               :---Rhodobacte
            :           :-96:
            :           :   :---Frankia
            :   :-----96:
        :-92:   :       :   :---Enterobact
        :   :   :       :-92:
        :   :   :           :---Azotobacte
        :   :   :
        :   :   :           :---Deinocococ
        :   :   :       :-99:
        :   :   :       :   :---Thermus
        :   :-99:   :-96:
        :       :   :   :   :---Sacchmt
        :       :-88:   :-99:
    :-94:       :   :       :---Sacchcyt
    :   :       :   :
    :   :       :   :-----------Aquifex
    :   :       :
    :   :       :           :---Methanococ
    :   :       :       :-74:
:-99:   :       :-----55:   :---Archaeoglo
:   :   :               :
:   :   :               :-------Methanobac
:   :   :
:   :   :-----------------------Escherichi
:   :
:   :---------------------------Lycopersic
:
:-------------------------------Anabaena
:
:-------------------------------Microcysti


Quartet puzzling tree (in CLUSTAL W notation):

(Microcysti,((((Bacillus,Lactococcu)99,(((Rhodobacte,Frankia)96,
(Enterobact,Azotobacte)92)96,(((Deinocococ,Thermus)99,(Sacchmt,
Sacchcyt)99)96,Aquifex)88,((Methanococ,Archaeoglo)74,Methanobac)55)99)92,
Escherichi)94,Lycopersic)99,Anabaena);


BIPARTITIONS

The following bipartitions occurred at least once in all intermediate
trees that have been generated in the 1000 puzzling steps:

Bipartitions included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 ****..**** ********  :  993
 **........ ........  :  993
 ********** ****..**  :  992
 ******.... ........  :  992
 ********** ******..  :  986
 ********** ****....  :  960
 *********. ...*****  :  959
 ********** *..*****  :  959
 ***....... ........  :  936
 ****...... ........  :  923
 *********. .*******  :  919
 ********** ***.....  :  880
 ******..** ********  :  737
 ******...* ********  :  546

Bipartitions not included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 *********. ........  :  407
 ******...* ***.....  :  334
 ********.. ........  :  302
 ******.*.* ********  :  257
 ********.* ***.....  :  186
 ********.. ...*****  :  109
 **.*...... ........  :   61
 ********** ...*****  :   56
 ******.... ...*****  :   51
 ***.**.... ........  :   49
 ******..** ***.....  :   38
 *********. ...*....  :   36
 ******...* ***.****  :   26
 ********.* *..*****  :   25
 ***...**** ********  :   25
 ********** ***.**..  :   22
 *********. *..*****  :   21
 *********. .*.*****  :   20
 *********. ..******  :   20
 ********.* ****....  :   13

(67 other less frequent bipartitions not shown)


MAXIMUM LIKELIHOOD BRANCH LENGTHS ON QUARTET PUZZLING TREE (NO CLOCK)

Branch lengths are computed using the selected model of
substitution and rate heterogeneity.

                :----5 Bacillus
           :---19
           :    :------6 Lactococcu
        :-30
        :  :                    :------12 Rhodobacte
        :  :                :--20
        :  :                :   :-----13 Frankia
        :  :      :--------22
        :  :      :         :   :--------10 Enterobact
        :  :      :         :--21
        :  :      :             :-----11 Azotobacte
        :  :-----29
        :         :                   :----15 Deinocococ
        :         :              :---23
        :         :              :    :--16 Thermus
        :         :    :--------25
        :         :    :         :      :-17 Sacchmt
        :         :    :         :-----24
        :         :    :                :-18 Sacchcyt
        :         :---26
        :         :    :-----------14 Aquifex
        :         :
        :         :      :---7 Methanococ
        :         :  :--27
        :         :  :   :------8 Archaeoglo
        :         :-28
        :            :------9 Methanobac
    :--31
    :   :------4 Escherichi
:--32
:   :------3 Lycopersic
:
:--2 Anabaena
:
:--1 Microcysti


             branch  length     S.E.   branch  length     S.E.
 Microcysti       1  0.13239  0.01984      19  0.18596  0.03264
 Anabaena         2  0.12642  0.01956      20  0.16718  0.04519
 Lycopersic       3  0.49016  0.04217      21  0.12764  0.04325
 Escherichi       4  0.54189  0.04675      22  0.67198  0.07494
 Bacillus         5  0.34491  0.03783      23  0.21337  0.04412
 Lactococcu       6  0.46783  0.04376      24  0.43105  0.05324
 Methanococ       7  0.26890  0.03481      25  0.70421  0.08266
 Archaeoglo       8  0.47465  0.04405      26  0.26413  0.05859
 Methanobac       9  0.53488  0.05464      27  0.10524  0.03213
 Enterobact      10  0.70687  0.06989      28  0.08719  0.03213
 Azotobacte      11  0.40463  0.05389      29  0.38025  0.04689
 Rhodobacte      12  0.53956  0.06087      30  0.04684  0.02503
 Frankia         13  0.42866  0.05541      31  0.10733  0.02558
 Aquifex         14  0.98908  0.08534      32  0.17363  0.02749
 Deinocococ      15  0.36043  0.04204
 Thermus         16  0.16417  0.03195
 Sacchmt         17  0.03629  0.01395     13 iterations until convergence
 Sacchcyt        18  0.06797  0.01550     log L: -15836.38


Quartet puzzling tree with maximum likelihood branch lengths
(in CLUSTAL W notation):

(Microcysti:0.13239,((((Bacillus:0.34491,Lactococcu:0.46783)99:0.18596,
(((Rhodobacte:0.53956,Frankia:0.42866)96:0.16718,(Enterobact:0.70687,
Azotobacte:0.40463)92:0.12764)96:0.67198,(((Deinocococ:0.36043,
Thermus:0.16417)99:0.21337,(Sacchmt:0.03629,Sacchcyt:0.06797)99:0.43105)
96:0.70421,Aquifex:0.98908)88:0.26413,((Methanococ:0.26890,Archaeoglo:0.47465)
74:0.10524,Methanobac:0.53488)55:0.08719)99:0.38025)92:0.04684,Escherichi:0.54189)
94:0.10733,Lycopersic:0.49016)99:0.17363,Anabaena:0.12642);


TIME STAMP

Date and time: Thu Mar 30 20:36:52 2000
Runtime: 559 seconds (= 9.3 minutes = 0.2 hours)
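

NOTE ON CONFIDENCE INTERVALS (not part of the PUZZLE output)

The report header gives the rule x-1.96*S.E. and x+1.96*S.E. for an
approximate 95% confidence interval. The Python sketch below applies that
rule to three (branch length, S.E.) pairs copied from the branch length
table above, and rechecks the quartet count from the TREE SEARCH section
on the assumption that every 4-subset of the 18 sequences is analysed.
The function name confidence_interval and the dictionary branches are
illustrative names chosen here, not anything PUZZLE defines.

```python
from math import comb

Z_95 = 1.96  # factor quoted in the report header


def confidence_interval(x, se, z=Z_95):
    """Return (lower, upper) = (x - z*S.E., x + z*S.E.)."""
    return x - z * se, x + z * se


# (branch length, S.E.) pairs copied from the branch length table.
# The dictionary keys are labels invented for this sketch.
branches = {
    "Aquifex (14)": (0.98908, 0.08534),
    "Sacchmt (17)": (0.03629, 0.01395),
    "internal (30)": (0.04684, 0.02503),
}

for name, (length, se) in branches.items():
    lower, upper = confidence_interval(length, se)
    # A non-positive lower bound means the interval includes zero length.
    note = "  (interval includes zero)" if lower <= 0.0 else ""
    print(f"{name:14s} {lower:9.5f} .. {upper:8.5f}{note}")

# Assumption: all 4-subsets of the 18 input sequences are analysed.
total_quartets = comb(18, 4)
unresolved_pct = 100.0 * 115 / total_quartets
print(f"{total_quartets} quartets, {unresolved_pct:.1f}% unresolved")
```

Under this approximation the interval for internal branch 30 extends
below zero, so that branch is not clearly distinguishable from zero
length even though its quartet puzzling support is 92%.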