PUZZLE 4.0.2

Type of analysis: tree reconstruction
Parameter estimation: approximate (faster)
Parameter estimation uses: neighbor-joining tree (for substitution process and rate variation)

Standard errors (S.E.) are obtained by the curvature method.
The lower and upper bounds of an approximate 95% confidence interval
for parameter or branch length x are x-1.96*S.E. and x+1.96*S.E.


SEQUENCE ALIGNMENT

Input data: 15 sequences with 1130 amino acid sites
Number of constant sites: 12 (= 1.1% of all sites)


SUBSTITUTION PROCESS

Model of substitution: JTT (Jones et al. 1992)
Amino acid frequencies (estimated from data set):

 pi(A) =   5.8%
 pi(R) =   5.1%
 pi(N) =   4.9%
 pi(D) =   5.7%
 pi(C) =   1.3%
 pi(Q) =   3.1%
 pi(E) =   7.7%
 pi(G) =   6.0%
 pi(H) =   3.3%
 pi(I) =   8.5%
 pi(L) =   9.0%
 pi(K) =   6.8%
 pi(M) =   2.7%
 pi(F) =   3.7%
 pi(P) =   4.5%
 pi(S) =   5.7%
 pi(T) =   4.9%
 pi(W) =   0.7%
 pi(Y) =   3.6%
 pi(V) =   6.9%


RATE HETEROGENEITY

Model of rate heterogeneity: Gamma distributed rates
Gamma distribution parameter alpha (estimated from data set): 1.27 (S.E. 0.07)
Number of Gamma rate categories: 8

Rates and their respective probabilities used in the likelihood function:

 Category  Relative rate  Probability
  1         0.1119         0.1250
  2         0.2870         0.1250
  3         0.4670         0.1250
  4         0.6693         0.1250
  5         0.9114         0.1250
  6         1.2242         0.1250
  7         1.6848         0.1250
  8         2.6444         0.1250

Categories 1-8 approximate a continuous Gamma-distribution with expectation 1
and variance 0.79.

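
The confidence-interval rule and the rate table above can be checked by hand. The following is a minimal Python sketch, not part of PUZZLE: the names `confidence_interval`, `rates`, and so on are illustrative, and the formula 1/alpha for the variance is the textbook result for a Gamma distribution with expectation 1 and shape alpha, which is assumed to be the parameterization meant here.

```python
# Values copied from the RATE HETEROGENEITY section of the report.
alpha, alpha_se = 1.27, 0.07
rates = [0.1119, 0.2870, 0.4670, 0.6693, 0.9114, 1.2242, 1.6848, 2.6444]
prob = 1.0 / len(rates)  # every category has probability 0.1250


def confidence_interval(x, se, z=1.96):
    """Approximate 95% interval as stated in the report: x -/+ 1.96 * S.E."""
    return x - z * se, x + z * se


# Mean and variance of the 8-category discrete approximation.
mean_rate = sum(prob * r for r in rates)
discrete_var = sum(prob * (r - mean_rate) ** 2 for r in rates)

# Variance of a continuous Gamma distribution with mean 1 and shape alpha.
continuous_var = 1.0 / alpha

low, high = confidence_interval(alpha, alpha_se)

print(f"alpha 95% CI: {low:.4f} .. {high:.4f}")
print(f"mean of category rates: {mean_rate:.4f}")
print(f"variance (continuous, 1/alpha): {continuous_var:.2f}")
print(f"variance (discrete categories): {discrete_var:.2f}")
```

The continuous variance reproduces the 0.79 quoted in the report. The discrete categories have a somewhat smaller variance, as expected when each category is represented by a single rate.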
Combination of categories that contributes the most to the likelihood (computation done without clock assumption assuming quartet-puzzling tree): 8 8 8 8 8 8 8 3 8 8 6 8 7 8 1 8 7 1 3 8 8 5 8 6 8 8 1 8 7 3 6 4 8 8 5 4 1 1 1 1 4 7 1 3 1 3 4 1 1 4 8 8 5 8 8 1 7 7 3 4 8 4 1 1 4 4 3 1 1 1 4 3 1 7 1 5 8 4 1 8 8 3 3 8 2 8 7 8 1 8 5 3 1 2 4 4 1 7 1 3 2 5 8 2 2 8 8 6 6 1 2 5 2 2 4 7 2 3 1 6 2 1 1 3 8 2 8 2 8 5 8 2 8 6 8 6 7 2 3 1 4 8 2 2 2 3 2 5 6 2 7 3 3 8 8 1 2 8 8 3 8 8 4 7 8 4 5 7 7 4 4 7 5 6 3 8 6 3 5 5 8 8 8 8 8 8 8 8 8 5 6 5 7 6 4 1 2 6 3 1 1 2 2 1 3 4 8 4 4 3 5 5 4 3 3 2 5 1 3 1 3 2 3 3 8 8 4 8 8 8 7 4 7 2 6 7 3 5 4 3 5 5 8 6 2 8 2 5 5 2 1 3 2 1 2 1 1 1 1 1 1 1 6 1 3 2 1 6 2 5 4 6 5 5 3 5 3 3 2 2 3 1 4 2 1 2 2 1 3 7 3 2 1 5 1 7 4 2 2 4 7 7 7 8 6 6 6 8 8 1 8 1 8 8 8 8 8 1 8 8 1 1 8 8 8 8 8 8 1 8 8 8 7 6 6 8 4 4 4 6 7 2 2 5 6 5 4 4 3 2 4 5 3 3 3 4 3 2 1 6 5 3 6 4 2 4 6 5 5 4 2 2 1 1 1 1 1 1 1 1 2 3 3 5 4 4 3 7 2 5 7 4 2 3 2 1 1 1 4 2 5 6 8 2 1 1 1 6 6 1 5 4 7 6 5 4 1 1 5 1 3 1 1 2 1 1 1 1 2 3 8 2 4 5 2 5 6 1 6 6 4 1 6 5 2 6 4 7 2 4 5 2 3 8 4 1 1 2 2 1 1 1 2 1 1 2 1 1 1 1 1 3 2 4 2 1 2 2 3 6 6 4 2 8 4 8 8 7 8 8 5 4 1 2 2 3 1 1 2 1 2 8 3 1 1 4 3 2 3 3 2 5 2 6 2 2 6 7 3 2 5 5 3 6 7 8 5 8 1 1 1 6 3 5 6 5 8 7 3 7 4 7 8 5 3 7 4 7 4 5 5 8 3 8 8 8 1 1 3 1 2 3 1 1 1 1 1 2 4 2 1 3 2 3 5 2 3 1 7 2 6 4 2 4 4 1 7 3 2 2 2 1 1 1 1 2 1 1 2 1 2 6 3 3 7 1 5 6 5 2 7 7 8 8 8 5 6 5 6 8 6 3 6 2 4 3 6 1 8 3 2 4 1 1 1 1 1 2 1 5 5 3 1 5 5 2 2 4 7 4 8 6 3 1 2 3 4 2 3 4 2 1 1 1 5 8 5 3 7 3 2 2 5 5 2 8 5 7 6 7 8 8 3 6 8 1 2 8 2 7 8 5 3 4 5 2 7 8 3 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 1 8 8 8 8 8 8 8 8 8 8 8 8 8 4 2 5 2 4 4 5 4 6 3 8 5 7 4 5 5 7 6 1 6 7 4 3 8 8 8 8 1 8 5 5 8 4 1 6 8 8 8 8 1 8 8 8 8 8 3 8 8 8 4 8 8 8 8 1 3 8 8 8 8 2 8 7 5 8 8 7 7 8 8 7 8 8 7 4 4 5 5 5 8 2 1 5 5 8 6 6 8 8 6 1 8 8 3 2 4 6 8 8 7 3 3 7 5 5 7 6 7 5 7 1 8 3 6 8 8 8 8 8 8 4 8 6 8 5 6 8 6 4 7 4 4 3 8 3 8 5 8 8 8 7 8 8 5 3 8 8 6 3 8 8 8 8 4 8 3 6 8 8 8 8 7 4 7 4 8 4 8 2 8 4 6 8 7 8 8 8 1 8 2 8 4 5 6 8 8 3 8 6 2 4 6 8 3 4 5 6 4 
6 5 5 6 8 5 6 2 7 5 6 6 8 5 8 7 5 5 6 6 6 8 7 8 6 8 4 7 8 8 8 8 7 8 8 8 8 6 8 6 4 7 4 6 7 8 8 8 7 8 8 7 5 4 4 6 8 7 8 8 7 1 3 6 3 8 8 7 8 8 7 5 7 8 8 8 8 5 5 8 6 8 8 8 3 7 5 5 6 8 2 5 7 4 8 8 8 7 8 3 8 8 1 7 8 2 5 8 6 2 8 8 8 8 8 8 8 8 7 8 5 8 4 8 8 7 8 3 6 3 8 8 8 7 8 7 8 6 8 7 2 4 8 8 7 8 8 3 7 1 8 6 8 2 8 8 8 6 8 8 8 7 8 8 3 8 8 8 8 6 6 1 3 8 8 5 8 8 2 8 2 8 5 8 6 8 1 1 8 1 8 8 8 8 1 8 8 8 8 8 1 8 1


SEQUENCES IN INPUT ORDER

              5% chi-square test  p-value
 Methanococ        passed           9.18%  [66]
 Pyrococcus        failed           3.11%  [37]
 Methanoco2        passed           7.15%  [42]
 Methanobac        passed          44.00%  [46]
 Archaeoglo        passed           6.13%  [35]
 Bostaurus         passed          95.89%  [25]
 Homosapien        passed          98.21%  [28]
 Schizosacc        passed          40.88%  [30]
 Saccharomy        passed          11.96%  [24]
 Methanoco3        passed          58.74%  [71]
 Archaeogl2        passed           7.94%  [39]
 Vibrio            failed           0.00%  [65]
 Deinococcu        failed           0.85%  [36]
 Arabidopsi        failed           0.01%  [14]
 Plasmodium        failed           0.00%  [14]

The chi-square test compares the amino acid composition of each sequence
to the frequency distribution assumed in the maximum likelihood model.
The number in square brackets indicates how often each sequence is
involved in one of the 143 completely unresolved quartets of the
quartet puzzling tree search.


IDENTICAL SEQUENCES

The sequences in each of the following groups are all identical. To speed
up computation please remove all but one of each group from the data set.

 Bostaurus, Homosapien.


MAXIMUM LIKELIHOOD DISTANCES

Maximum likelihood distances are computed using the selected model of
substitution and rate heterogeneity.

  15
Methanococ  0.00000  2.11592  2.11640  2.17057  2.31878
            2.55410  2.66609  2.83141  3.15383  2.03583
            2.00512  3.35192  3.70995  3.28945  3.38501
Pyrococcus  2.11592  0.00000  0.74543  0.83494  0.87139
            2.47183  2.25086  2.72157  2.77429  1.88941
            2.24739  2.89952  2.49029  2.75415  4.32155
Methanoco2  2.11640  0.74543  0.00000  0.80440  0.87461
            2.36547  2.22585  2.79317  2.70750  1.67402
            2.40047  2.91932  2.83359  2.54805  4.52644
Methanobac  2.17057  0.83494  0.80440  0.00000  0.80005
            2.47534  2.36099  2.87744  2.75551  1.78275
            2.25365  3.20476  2.90155  2.73000  3.99778
Archaeoglo  2.31878  0.87139  0.87461  0.80005  0.00000
            2.48309  2.40855  2.95787  2.82522  1.87455
            2.11352  2.86959  3.00043  2.87344  4.41076
Bostaurus   2.55410  2.47183  2.36547  2.47534  2.48309
            0.00000  0.00000  1.14045  1.33685  2.32135
            2.84844  3.36935  3.41398  3.64963  4.25356
Homosapien  2.66609  2.25086  2.22585  2.36099  2.40855
            0.00000  0.00000  0.84316  0.92911  2.31720
            2.83395  3.35264  3.41343  2.57889  2.47006
Schizosacc  2.83141  2.72157  2.79317  2.87744  2.95787
            1.14045  0.84316  0.00000  0.91533  2.54050
            2.91982  3.72445  3.68539  4.03895  5.02556
Saccharomy  3.15383  2.77429  2.70750  2.75551  2.82522
            1.33685  0.92911  0.91533  0.00000  2.69419
            3.02400  3.71790  3.39282  4.20893  4.68039
Methanoco3  2.03583  1.88941  1.67402  1.78275  1.87455
            2.32135  2.31720  2.54050  2.69419  0.00000
            2.39407  2.83818  3.25703  2.66340  3.00673
Archaeogl2  2.00512  2.24739  2.40047  2.25365  2.11352
            2.84844  2.83395  2.91982  3.02400  2.39407
            0.00000  3.59653  4.35932  3.63187  4.13689
Vibrio      3.35192  2.89952  2.91932  3.20476  2.86959
            3.36935  3.35264  3.72445  3.71790  2.83818
            3.59653  0.00000  2.43236  3.52040  5.08737
Deinococcu  3.70995  2.49029  2.83359  2.90155  3.00043
            3.41398  3.41343  3.68539  3.39282  3.25703
            4.35932  2.43236  0.00000  3.67529  3.67533
Arabidopsi  3.28945  2.75415  2.54805  2.73000  2.87344
            3.64963  2.57889  4.03895  4.20893  2.66340
            3.63187  3.52040  3.67529  0.00000  4.00750
Plasmodium  3.38501  4.32155  4.52644  3.99778  4.41076
            4.25356  2.47006  5.02556  4.68039  3.00673
            4.13689  5.08737  3.67533  4.00750  0.00000

Average distance (over all possible
pairs of sequences): 2.75742


TREE SEARCH

Quartet puzzling is used to choose from the possible tree topologies
and to simultaneously infer support values for internal branches.

Number of puzzling steps: 1000
Analysed quartets: 1365
Unresolved quartets: 143 (= 10.5%)

Quartet trees are based on approximate maximum likelihood values
using the selected model of substitution and rate heterogeneity.


QUARTET PUZZLING TREE

Support for the internal branches of the unrooted quartet puzzling
tree topology is shown in percent.

This quartet puzzling tree is not completely resolved!

                    :---Vibrio
        :---------97:
        :           :---Deinococcu
        :
        :           :---Methanobac
        :       :-75:
        :       :   :---Archaeoglo
        :-----91:
        :       :-------Pyrococcus
    :-54:       :
    :   :       :-------Methanoco2
    :   :
    :   :           :---Schizosacc
    :   :       :-99:
    :   :       :   :---Saccharomy
    :   :   :-94:
:-73:   :   :   :   :---Bostaurus
:   :   :   :   :-97:
:   :   :-78:       :---Homosapien
:   :       :
:   :       :       :---Arabidopsi
:   :       :-----89:
:   :               :---Plasmodium
:   :
:   :-------------------Methanoco3
:
:-----------------------Archaeogl2
:
:-----------------------Methanococ

Quartet puzzling tree (in CLUSTAL W notation):

(Methanococ,(((Vibrio,Deinococcu)97,((Methanobac,Archaeoglo)75,
Pyrococcus,Methanoco2)91,(((Schizosacc,Saccharomy)99,(Bostaurus,
Homosapien)97)94,(Arabidopsi,Plasmodium)89)78)54,Methanoco3)73,
Archaeogl2);


BIPARTITIONS

The following bipartitions occurred at least once in all intermediate
trees that have been generated in the 1000 puzzling steps:

Bipartitions included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 *******..* *****  :  992
 *****..*** *****  :  972
 ********** *..**  :  968
 *****....* *****  :  941
 *....***** *****  :  913
 ********** ***..  :  886
 *****....* ***..  :  776
 ***..***** *****  :  754
 *......... *....  :  728
 *........* *....  :  537

Bipartitions not included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 *..******* *****  :  434
 *****....* *....  :  424
 **...***** *****  :  279
 *........* .....  :  236
 *.*..***** *****  :  218
 *****..... *....  :  194
 *....***** *..**  :  182
 *********. *..**  :  165
 *........* ***..  :  150
 *****....* ***.*  :  93
 *....****. *..**  :  76
 **..****** *****  :  73
 *.*.****** *****  :  68
 *****..... ***..  :  66
 *....****. *****  :  66
 *...****** *****  :  53
 *....***** ***..  :  47
 *......... ***..  :  42
 *****....* *..**  :  37
 ********** *....  :  35

(99 other less frequent bipartitions not shown)


MAXIMUM LIKELIHOOD BRANCH LENGTHS ON QUARTET PUZZLING TREE (NO CLOCK)

Branch lengths are computed using the selected model of
substitution and rate heterogeneity.

                   :-----------12 Vibrio
         :--------16
         :         :------------13 Deinococcu
      :-24
      :  :         :---4 Methanobac
      :  :      :-17
      :  :      :  :---5 Archaeoglo
      :  :-----18
      :  :      :---2 Pyrococcus
      :  :      :
      :  :      :---3 Methanoco2
      :  :
      :  :                   :--8 Schizosacc
      :  :              :---19
      :  :              :    :----9 Saccharomy
      :  :       :-----21
      :  :       :      :   :-6 Bostaurus
      :  :       :      :--20
      :  :       :          :-7 Homosapien
      :  :------23
      :          :         :--------14 Arabidopsi
      :          :--------22
      :                    :--------------15 Plasmodium
:----25
:     :--------10 Methanoco3
:
:---------11 Archaeogl2
:
:-------1 Methanococ

           branch  length     S.E.   branch  length     S.E.
Methanococ      1  1.25043  0.16769      16  1.37334  0.26537
Pyrococcus      2  0.45263  0.04956      17  0.15030  0.03705
Methanoco2      3  0.47099  0.05018      18  0.91101  0.13432
Methanobac      4  0.45596  0.04984      19  0.54779  0.06829
Archaeoglo      5  0.56614  0.05595      20  0.32439  0.06273
Bostaurus       6  0.00001  0.00028      21  0.92969  0.16984
Homosapien      7  0.00001  0.00071      22  1.50132  0.22906
Schizosacc      8  0.37036  0.04838      23  1.11719  0.18478
Saccharomy      9  0.60190  0.05683      24  0.09109  0.09509
Methanoco3     10  1.34685  0.16272      25  0.63473  0.13787
Archaeogl2     11  1.68126  0.19911
Vibrio         12  1.99199  0.28141
Deinococcu     13  2.12742  0.35979
Arabidopsi     14  1.42879  0.20238     12 iterations until convergence
Plasmodium     15  2.63223  0.25499     log L: -18709.43

WARNING --- at least one branch length is close to an internal boundary!

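
The header of this report states that an approximate 95% confidence interval is x-1.96*S.E. to x+1.96*S.E. Applied to the branch length table above, that rule shows which branches are hard to distinguish from zero length. The following is a minimal Python sketch with a handful of rows copied from the table; the names `branches` and `interval` are illustrative, and reading "interval reaches zero" as "not clearly different from zero" is an interpretation, not something PUZZLE reports.

```python
# branch number -> (length, S.E.), copied from the branch length table.
branches = {
    6: (0.00001, 0.00028),   # Bostaurus
    7: (0.00001, 0.00071),   # Homosapien
    17: (0.15030, 0.03705),  # internal branch
    24: (0.09109, 0.09509),  # internal branch
    25: (0.63473, 0.13787),  # internal branch
}


def interval(length, se, z=1.96):
    """Approximate 95% interval: length -/+ 1.96 * S.E."""
    return length - z * se, length + z * se


# Branches whose lower bound is at or below zero.
reaches_zero = [
    number
    for number, (length, se) in branches.items()
    if interval(length, se)[0] <= 0.0
]

for number, (length, se) in branches.items():
    low, high = interval(length, se)
    print(f"branch {number:2d}: {length:.5f}  [{low:.5f}, {high:.5f}]")
print("intervals reaching zero:", reaches_zero)
```

Branch lengths cannot be negative, so a negative lower bound only indicates that the estimate sits near the zero boundary.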
Quartet puzzling tree with maximum likelihood branch lengths
(in CLUSTAL W notation):

(Methanococ:1.25043,(((Vibrio:1.99199,Deinococcu:2.12742)97:1.37334,
((Methanobac:0.45596,Archaeoglo:0.56614)75:0.15030,Pyrococcus:0.45263,
Methanoco2:0.47099)91:0.91101,(((Schizosacc:0.37036,Saccharomy:0.60190)
99:0.54779,(Bostaurus:0.00001,Homosapien:0.00001)97:0.32439)94:0.92969,
(Arabidopsi:1.42879,Plasmodium:2.63223)89:1.50132)78:1.11719)54:0.09109,
Methanoco3:1.34685)73:0.63473,Archaeogl2:1.68126);


TIME STAMP

Date and time: Fri Jun 18 10:45:12 1999
Runtime: 2605 seconds (= 43.4 minutes = 0.7 hours)
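
The tree in CLUSTAL W notation above carries both the support values and the branch lengths, so it can be read back without a dedicated tree library. The following is a minimal Python sketch using regular expressions; it assumes leaf names start with a letter and that support values directly follow a closing parenthesis, which holds for this tree but is not a general Newick parser. The pattern and variable names are illustrative.

```python
import re

# Tree string copied from the report, with the line breaks removed.
newick = (
    "(Methanococ:1.25043,(((Vibrio:1.99199,Deinococcu:2.12742)97:1.37334,"
    "((Methanobac:0.45596,Archaeoglo:0.56614)75:0.15030,Pyrococcus:0.45263,"
    "Methanoco2:0.47099)91:0.91101,(((Schizosacc:0.37036,Saccharomy:0.60190)"
    "99:0.54779,(Bostaurus:0.00001,Homosapien:0.00001)97:0.32439)94:0.92969,"
    "(Arabidopsi:1.42879,Plasmodium:2.63223)89:1.50132)78:1.11719)54:0.09109,"
    "Methanoco3:1.34685)73:0.63473,Archaeogl2:1.68126);"
)

# Leaves look like "Name:length"; names begin with a letter.
leaf_pattern = re.compile(r"([A-Za-z][A-Za-z0-9]*):([0-9.]+)")
# Internal branches look like ")support:length".
internal_pattern = re.compile(r"\)(\d+):([0-9.]+)")

leaves = {name: float(length) for name, length in leaf_pattern.findall(newick)}
internal = [(int(s), float(length)) for s, length in internal_pattern.findall(newick)]

tree_length = sum(leaves.values()) + sum(length for _, length in internal)
weakest = min(internal)  # (support, length) of the least supported branch

print(f"{len(leaves)} leaves, {len(internal)} internal branches")
print(f"total tree length: {tree_length:.5f}")
print(f"weakest internal branch: support {weakest[0]}%, length {weakest[1]}")
```

The counts should agree with the 15 sequences and the 10 bipartitions listed as included in the quartet puzzling tree.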