PUZZLE 4.0.2

Type of analysis: tree reconstruction
Parameter estimation: approximate (faster)
Parameter estimation uses: neighbor-joining tree (for substitution process and rate variation)

Standard errors (S.E.) are obtained by the curvature method.
The lower and upper bounds of an approximate 95% confidence interval
for parameter or branch length x are x-1.96*S.E. and x+1.96*S.E.


SEQUENCE ALIGNMENT

Input data: 19 sequences with 298 amino acid sites
Number of constant sites: 10 (= 3.4% of all sites)


SUBSTITUTION PROCESS

Model of substitution: JTT (Jones et al. 1992)
Amino acid frequencies (estimated from data set):

 pi(A) =   9.2%
 pi(R) =   8.2%
 pi(N) =   3.6%
 pi(D) =   4.1%
 pi(C) =   0.2%
 pi(Q) =   4.3%
 pi(E) =  11.0%
 pi(G) =   4.6%
 pi(H) =   0.9%
 pi(I) =   6.8%
 pi(L) =  10.4%
 pi(K) =   9.1%
 pi(M) =   3.1%
 pi(F) =   3.2%
 pi(P) =   2.5%
 pi(S) =   4.5%
 pi(T) =   5.3%
 pi(W) =   0.1%
 pi(Y) =   2.2%
 pi(V) =   6.7%


RATE HETEROGENEITY

Model of rate heterogeneity: Gamma distributed rates
Gamma distribution parameter alpha (estimated from data set): 1.18 (S.E. 0.09)
Number of Gamma rate categories: 8

Rates and their respective probabilities used in the likelihood function:

 Category  Relative rate  Probability
  1         0.0986         0.1250
  2         0.2676         0.1250
  3         0.4469         0.1250
  4         0.6517         0.1250
  5         0.8996         0.1250
  6         1.2229         0.1250
  7         1.7025         0.1250
  8         2.7103         0.1250

Categories 1-8 approximate a continuous Gamma-distribution with expectation 1
and variance 0.85.
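The interval rule and the rate table above can be checked by hand. The following is a minimal sketch, not part of the PUZZLE output: it applies x-1.96*S.E. and x+1.96*S.E. to the reported alpha, and confirms that the eight equally probable rate categories average to roughly 1. The variance line assumes the usual mean-1 Gamma parameterization, in which the variance equals 1/alpha; the helper name `confidence_interval` is hypothetical.

```python
# Values copied from the report; nothing here is computed by PUZZLE itself.
alpha, alpha_se = 1.18, 0.09
rates = [0.0986, 0.2676, 0.4469, 0.6517, 0.8996, 1.2229, 1.7025, 2.7103]
probs = [0.1250] * len(rates)


def confidence_interval(x, se, z=1.96):
    """Approximate 95% interval as described in the report header."""
    return (x - z * se, x + z * se)


ci_low, ci_high = confidence_interval(alpha, alpha_se)

# Expected rate over the discrete categories (should be close to 1).
mean_rate = sum(r * p for r, p in zip(rates, probs))

# Assumption: for a Gamma distribution with mean 1, variance = 1 / alpha.
gamma_variance = 1.0 / alpha

print(f"alpha 95% CI: {ci_low:.4f} .. {ci_high:.4f}")
print(f"mean relative rate: {mean_rate:.4f}")
print(f"variance implied by alpha: {gamma_variance:.2f}")
```

The same helper can be applied to any branch length and S.E. pair from the branch length table further down.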
Combination of categories that contributes the most to the likelihood
(computation done without clock assumption assuming quartet-puzzling tree):

 1 1 1 1 7 1 1 1 1 1 3 6 3 7 8 7 6 6 8 3 5 1 1 1 2 3 2 5 7 2
 2 6 2 4 3 3 1 5 5 1 3 3 1 1 1 5 1 2 2 2 1 4 3 2 1 5 7 2 5 7
 4 3 7 7 4 2 8 5 3 5 7 5 5 4 4 1 5 7 3 3 6 6 2 7 5 4 5 2 7 5
 3 3 7 5 6 3 4 3 3 6 8 6 5 8 5 2 5 6 5 6 5 2 2 4 1 1 8 1 1 7
 4 6 8 8 8 8 6 5 8 1 4 8 6 5 3 3 3 6 7 6 3 2 4 6 4 3 6 3 4 6
 6 2 6 5 5 3 4 6 4 3 5 2 1 3 3 1 2 4 4 5 5 1 4 4 1 1 2 5 1 2
 1 1 1 1 1 1 1 3 4 1 1 1 3 4 5 5 2 5 4 3 1 7 4 3 1 2 1 3 2 1
 1 3 5 6 1 1 1 4 3 2 7 6 5 4 8 6 8 7 8 7 8 8 8 8 7 8 8 8 8 7
 8 8 4 8 6 8 8 8 8 7 6 8 8 8 8 8 8 8 7 8 3 1 1 3 1 1 1 8 8 8
 7 8 8 3 8 8 8 8 1 1 1 1 1 1 1 1 1 8 8 8 8 8 8 8 5 6 5 8


SEQUENCES IN INPUT ORDER

            5% chi-square test   p-value
Deinococcu      failed             0.27%  [109]
Thermus         failed             1.03%  [103]
Pyrococcus      passed            53.01%  [117]
Desulfuroc      passed            74.02%  [120]
Methanococ      passed            94.15%  [112]
Methanobac      passed            47.80%  [130]
Methanosar      passed            95.88%  [106]
Archaeoglo      passed            33.90%  [163]
Oryctolagu      passed            99.78%  [69]
Homosapien      passed            99.83%  [70]
Bostaurus       passed            99.85%  [73]
Celegans        passed            64.88%  [66]
Saccharomy      passed            82.11%  [55]
Candida         passed            36.72%  [68]
Neurospora      passed            66.68%  [83]
Schizosacc      passed            21.14%  [65]
Sulfolobus      failed             0.70%  [223]
Enterococc      passed            42.00%  [211]
Treponema       failed             0.00%  [157]

The chi-square test compares the amino acid composition of each sequence
to the frequency distribution assumed in the maximum likelihood model.

Warning: Result of chi-square test may not be valid because of small
maximum likelihood frequencies and short sequence length!

The number in square brackets indicates how often each sequence is
involved in one of the 525 completely unresolved quartets of the
quartet puzzling tree search.


IDENTICAL SEQUENCES

The sequences in each of the following groups are all identical. To speed
up computation please remove all but one of each group from the data set.

 All sequences are unique.

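The bracketed counts can be cross-checked against the quartet totals. The sketch below is an illustration added to the report, not PUZZLE output: every quartet contains four sequences, so the bracketed numbers should sum to four times the number of unresolved quartets, and the number of possible quartets for 19 sequences is the binomial coefficient C(19, 4), which the TREE SEARCH section reports as the number of analysed quartets.

```python
from math import comb

n_sequences = 19
unresolved_quartets = 525

# Bracketed values copied from the SEQUENCES IN INPUT ORDER table.
involvement = {
    "Deinococcu": 109, "Thermus": 103, "Pyrococcus": 117, "Desulfuroc": 120,
    "Methanococ": 112, "Methanobac": 130, "Methanosar": 106, "Archaeoglo": 163,
    "Oryctolagu": 69, "Homosapien": 70, "Bostaurus": 73, "Celegans": 66,
    "Saccharomy": 55, "Candida": 68, "Neurospora": 83, "Schizosacc": 65,
    "Sulfolobus": 223, "Enterococc": 211, "Treponema": 157,
}

# Number of distinct four-sequence subsets.
total_quartets = comb(n_sequences, 4)

# Share of quartets that stayed completely unresolved, in percent.
unresolved_share = 100 * unresolved_quartets / total_quartets

# Each unresolved quartet is counted once for each of its four members.
total_involvement = sum(involvement.values())

print(total_quartets, round(unresolved_share, 1), total_involvement)
```

Sequences with high bracketed counts (Sulfolobus, Enterococc, Treponema, Archaeoglo) are the ones that appear most often in unresolved quartets.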

MAXIMUM LIKELIHOOD DISTANCES

Maximum likelihood distances are computed using the selected model of
substitution and rate heterogeneity.

  19
Deinococcu  0.00000  1.30333  2.63634  3.07238  2.72397  3.44798  3.02503
            3.26283  3.97649  3.94138  4.03932  3.63080  3.55359  3.88733
            3.82511  4.46848  4.70740  4.50359  3.62528
Thermus     1.30333  0.00000  2.25730  2.35346  2.33561  2.53312  2.37504
            2.60277  3.14726  3.19492  3.26796  3.35298  3.80945  3.38112
            3.74110  3.58336  3.46868  3.71815  2.68366
Pyrococcus  2.63634  2.25730  0.00000  0.38048  0.99509  1.17677  1.51929
            1.52459  3.13740  3.10192  3.17229  2.97268  3.52022  3.12931
            3.17129  2.96015  3.51782  2.44580  3.14524
Desulfuroc  3.07238  2.35346  0.38048  0.00000  1.04484  1.27138  1.57372
            1.56204  3.20443  3.16715  3.24108  2.87491  3.49943  3.19021
            3.13553  2.92405  3.49268  2.33948  3.34207
Methanococ  2.72397  2.33561  0.99509  1.04484  0.00000  0.77782  1.33976
            1.43311  2.73505  2.77448  2.85837  2.79597  3.08563  2.79921
            2.88790  3.08480  3.01164  2.47840  3.38428
Methanobac  3.44798  2.53312  1.17677  1.27138  0.77782  0.00000  1.73081
            1.56769  2.82428  2.86641  2.93367  2.64911  2.95834  2.75964
            3.16943  2.62703  3.11381  2.23077  3.33402
Methanosar  3.02503  2.37504  1.51929  1.57372  1.33976  1.73081  0.00000
            1.08171  3.91106  3.93416  4.03406  3.80407  3.99233  3.99066
            3.76400  3.81889  3.03597  2.86165  2.94787
Archaeoglo  3.26283  2.60277  1.52459  1.56204  1.43311  1.56769  1.08171
            0.00000  3.24417  3.20541  3.28228  3.07992  3.43537  3.34415
            3.34736  3.35522  3.36653  2.53300  3.20889
Oryctolagu  3.97649  3.14726  3.13740  3.20443  2.73505  2.82428  3.91106
            3.24417  0.00000  0.00818  0.01647  0.45311  0.97865  0.98058
            1.06681  1.22898  4.83261  3.88773  4.75321
Homosapien  3.94138  3.19492  3.10192  3.16715  2.77448  2.86641  3.93416
            3.20541  0.00818  0.00000  0.01234  0.44750  0.96882  0.97257
            1.06045  1.22030  4.82172  3.90941  4.76187
Bostaurus   4.03932  3.26796  3.17229  3.24108  2.85837  2.93367  4.03406
            3.28228  0.01647  0.01234  0.00000  0.45861  0.97567  0.99614
            1.06380  1.22218  4.91284  3.99512  4.89006
Celegans    3.63080  3.35298  2.97268  2.87491  2.79597  2.64911  3.80407
            3.07992  0.45311  0.44750  0.45861  0.00000  1.08127  1.14623
            1.16426  1.26725  5.15302  4.29271  4.44284
Saccharomy  3.55359  3.80945  3.52022  3.49943  3.08563  2.95834  3.99233
            3.43537  0.97865  0.96882  0.97567  1.08127  0.00000  0.33599
            0.48445  0.82862  5.04507  3.91388  4.18899
Candida     3.88733  3.38112  3.12931  3.19021  2.79921  2.75964  3.99066
            3.34415  0.98058  0.97257  0.99614  1.14623  0.33599  0.00000
            0.69414  0.87732  4.51022  4.04042  4.40705
Neurospora  3.82511  3.74110  3.17129  3.13553  2.88790  3.16943  3.76400
            3.34736  1.06681  1.06045  1.06380  1.16426  0.48445  0.69414
            0.00000  0.96686  4.57334  4.39877  4.55808
Schizosacc  4.46848  3.58336  2.96015  2.92405  3.08480  2.62703  3.81889
            3.35522  1.22898  1.22030  1.22218  1.26725  0.82862  0.87732
            0.96686  0.00000  3.93122  3.82746  4.70313
Sulfolobus  4.70740  3.46868  3.51782  3.49268  3.01164  3.11381  3.03597
            3.36653  4.83261  4.82172  4.91284  5.15302  5.04507  4.51022
            4.57334  3.93122  0.00000  4.68392  3.78345
Enterococc  4.50359  3.71815  2.44580  2.33948  2.47840  2.23077  2.86165
            2.53300  3.88773  3.90941  3.99512  4.29271  3.91388  4.04042
            4.39877  3.82746  4.68392  0.00000  3.86966
Treponema   3.62528  2.68366  3.14524  3.34207  3.38428  3.33402  2.94787
            3.20889  4.75321  4.76187  4.89006  4.44284  4.18899  4.40705
            4.55808  4.70313  3.78345  3.86966  0.00000

Average distance (over all possible pairs of sequences): 2.83056


TREE SEARCH

Quartet puzzling is used to choose from the possible tree topologies
and to simultaneously infer support values for internal branches.

Number of puzzling steps: 1000
Analysed quartets: 3876
Unresolved quartets: 525 (= 13.5%)

Quartet trees are based on approximate maximum likelihood values
using the selected model of substitution and rate heterogeneity.


QUARTET PUZZLING TREE

Support for the internal branches of the unrooted quartet puzzling
tree topology is shown in percent.

This quartet puzzling tree is not completely resolved!


                        :---Sulfolobus
    :-----------------78:
    :                   :---Treponema
    :
    :                   :---Homosapien
    :               :100:
    :           :100:   :---Bostaurus
    :           :   :
    :       :-99:   :-------Oryctolagu
    :       :   :
:-92:       :   :-----------Celegans
:   :       :
:   :   :-99:           :---Saccharomy
:   :   :   :       :-72:
:   :   :   :   :-71:   :---Candida
:   :   :   :   :   :
:   :   :   :-91:   :-------Neurospora
:   :   :       :
:   :   :       :-----------Schizosacc
:   :-65:
:       :               :---Methanosar
:       :           :-90:
:       :           :   :---Archaeoglo
:       :           :
:       :           :   :---Pyrococcus
:       :           :-89:
:       :---------53:   :---Desulfuroc
:                   :
:                   :   :---Methanococ
:                   :-62:
:                   :   :---Methanobac
:                   :
:                   :-------Enterococc
:
:---------------------------Thermus
:
:---------------------------Deinococcu


Quartet puzzling tree (in CLUSTAL W notation):

(Deinococcu,((Sulfolobus,Treponema)78,(((((Homosapien,Bostaurus)100,
Oryctolagu)100,Celegans)99,(((Saccharomy,Candida)72,Neurospora)71,
Schizosacc)91)99,((Methanosar,Archaeoglo)90,(Pyrococcus,
Desulfuroc)89,(Methanococ,Methanobac)62,Enterococc)53)65)92,
Thermus);


BIPARTITIONS

The following bipartitions occurred at least once in all intermediate
trees that have been generated in the 1000 puzzling steps:

Bipartitions included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 ********.. .******** : 1000
 *********. .******** : 999
 ********.. ..******* : 993
 ********.. ......*** : 993
 **........ ......... : 915
 ********** **....*** : 907
 ******..** ********* : 901
 **..****** ********* : 890
 ********** ******.*. : 783
 ********** **..***** : 723
 ********** **...**** : 706
 **........ ......*.* : 653
 ****..**** ********* : 617
 **......** *******.* : 531

Bipartitions not included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 ****..**** *******.* : 364
 **....**** *******.* : 293
 **....**** ********* : 283
 *****.**** *******.* : 264
 **..**..** ********* : 214
 ********** **..*.*** : 197
 **........ ........* : 183
 **....**.. ......*.* : 174
 **......** ********* : 164
 **......** ******... : 164
 ********** ***.*.*** : 151
 ********** **.*.**** : 136
 ******..** *******.* : 136
 **...***** ********* : 112
 ******..** ******.*. : 98
 ********.. .....**** : 82
 *.******** ******.*. : 81
 ********** ****..*** : 76
 **..****** *******.* : 73
 ****....** *******.* : 67

(131 other less frequent bipartitions not shown)


MAXIMUM LIKELIHOOD BRANCH LENGTHS ON QUARTET PUZZLING TREE (NO CLOCK)

Branch lengths are computed using the selected model of
substitution and rate heterogeneity.

          :------------17 Sulfolobus
       :--20
       :  :-----------19 Treponema
:------33
:      :                       :-10 Homosapien
:      :                     :-21
:      :                     : :-11 Bostaurus
:      :                  :--22
:      :                  :  :-9 Oryctolagu
:      :               :--23
:      :               :  :--12 Celegans
:      :  :------------27
:      :  :            :      :-13 Saccharomy
:      :  :            :    :-24
:      :  :            :    : :--14 Candida
:      :  :            :  :-25
:      :  :            :  : :--15 Neurospora
:      :  :            :--26
:      :  :               :---16 Schizosacc
:      :--32
:         :        :----7 Methanosar
:         :   :----28
:         :   :    :----8 Archaeoglo
:         :   :
:             :   :--3 Pyrococcus
:             :---29
:             :   :--4 Desulfuroc
:          ---31
:             :  :--5 Methanococ
:             :--30
:             :  :----6 Methanobac
:             :
:             :--------------18 Enterococc
:
:---2 Thermus
:
:------1 Deinococcu

            branch   length     S.E.   branch   length     S.E.
Deinococcu       1  1.08801  0.17176       20  0.36240  0.21397
Thermus          2  0.42222  0.13156       21  0.00001  9.00000
Pyrococcus       3  0.20236  0.05578       22  0.23281  0.05311
Desulfuroc       4  0.28238  0.06146       23  0.34923  0.09006
Methanococ       5  0.34484  0.08216       24  0.11000  0.04274
Methanobac       6  0.64626  0.10806       25  0.15170  0.05661
Methanosar       7  0.68589  0.13062       26  0.27551  0.08745
Archaeoglo       8  0.72231  0.13570       27  2.16016  0.33795
Oryctolagu       9  0.00667  0.00596       28  0.64613  0.13840
Homosapien      10  0.00298  0.00443       29  0.49857  0.10718
Bostaurus       11  0.01241  0.00765       30  0.25441  0.08515
Celegans        12  0.26784  0.05539       31  0.56753  0.18576
Saccharomy      13  0.11289  0.03290       32  0.27596  0.16681
Candida         14  0.25498  0.04430       33  0.98561  0.22009
Neurospora      15  0.36816  0.05926
Schizosacc      16  0.53741  0.08010
Sulfolobus      17  2.30918  0.37713
Enterococc      18  2.51694  0.32424       16 iterations until convergence
Treponema       19  2.10641  0.34789       log L: -7238.52

WARNING --- at least one branch length is close to an internal boundary!


Quartet puzzling tree with maximum likelihood branch lengths
(in CLUSTAL W notation):

(Deinococcu:1.08801,((Sulfolobus:2.30918,Treponema:2.10641)78:0.36240,
(((((Homosapien:0.00298,Bostaurus:0.01241)100:0.00001,Oryctolagu:0.00667)
100:0.23281,Celegans:0.26784)99:0.34923,(((Saccharomy:0.11289,Candida:0.25498)
72:0.11000,Neurospora:0.36816)71:0.15170,Schizosacc:0.53741)91:0.27551)
99:2.16016,((Methanosar:0.68589,Archaeoglo:0.72231)90:0.64613,(Pyrococcus:0.20236,
Desulfuroc:0.28238)89:0.49857,(Methanococ:0.34484,Methanobac:0.64626)
62:0.25441,Enterococc:2.51694)53:0.56753)65:0.27596)92:0.98561,Thermus:0.42222);


TIME STAMP

Date and time: Fri Jun 18 11:40:41 1999
Runtime: 2297 seconds (= 38.3 minutes = 0.6 hours)
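
The bracket notation above can be read back with a few lines of code. This is a minimal sketch added for illustration, not part of the PUZZLE output: it assumes a plain regular expression is enough for this particular string (taxon names made of letters only, support values written directly after a closing parenthesis) and is not a general Newick parser. It extracts the taxon names and branch lengths and sums the lengths into a total tree length.

```python
import re

# Tree string copied from the report, joined into one line.
newick = (
    "(Deinococcu:1.08801,((Sulfolobus:2.30918,Treponema:2.10641)78:0.36240,"
    "(((((Homosapien:0.00298,Bostaurus:0.01241)100:0.00001,Oryctolagu:0.00667)"
    "100:0.23281,Celegans:0.26784)99:0.34923,(((Saccharomy:0.11289,Candida:0.25498)"
    "72:0.11000,Neurospora:0.36816)71:0.15170,Schizosacc:0.53741)91:0.27551)"
    "99:2.16016,((Methanosar:0.68589,Archaeoglo:0.72231)90:0.64613,"
    "(Pyrococcus:0.20236,Desulfuroc:0.28238)89:0.49857,"
    "(Methanococ:0.34484,Methanobac:0.64626)62:0.25441,Enterococc:2.51694)"
    "53:0.56753)65:0.27596)92:0.98561,Thermus:0.42222);"
)

# Every branch length follows a colon, for leaves and internal branches alike.
branch_lengths = [float(x) for x in re.findall(r":(\d+\.\d+)", newick)]

# Taxon names start right after "(" or "," and end at the colon.
taxa = re.findall(r"[(,]([A-Za-z]+):", newick)

# Total tree length: sum over all external and internal branches.
tree_length = sum(branch_lengths)

print(len(taxa), "taxa,", len(branch_lengths), "branches")
print("total tree length:", round(tree_length, 5))
```

The branch count should match the 33 numbered branches in the table (19 external plus 14 internal).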