PUZZLE 4.0.2

Type of analysis: tree reconstruction
Parameter estimation: approximate (faster)
Parameter estimation uses: neighbor-joining tree (for substitution process and rate variation)

Standard errors (S.E.) are obtained by the curvature method.
The lower and upper bounds of an approximate 95% confidence interval
for parameter or branch length x are x-1.96*S.E. and x+1.96*S.E.


SEQUENCE ALIGNMENT

Input data: 15 sequences with 901 amino acid sites
Number of constant sites: 3 (= 0.3% of all sites)


SUBSTITUTION PROCESS

Model of substitution: JTT (Jones et al. 1992)
Amino acid frequencies (estimated from data set):

 pi(A) =   5.3%
 pi(R) =   6.6%
 pi(N) =   3.3%
 pi(D) =   5.6%
 pi(C) =   1.4%
 pi(Q) =   2.3%
 pi(E) =   8.6%
 pi(G) =   6.7%
 pi(H) =   1.7%
 pi(I) =   7.3%
 pi(L) =  10.0%
 pi(K) =   6.9%
 pi(M) =   2.1%
 pi(F) =   4.6%
 pi(P) =   5.4%
 pi(S) =   5.4%
 pi(T) =   4.7%
 pi(W) =   1.0%
 pi(Y) =   3.9%
 pi(V) =   7.1%


RATE HETEROGENEITY

Model of rate heterogeneity: Gamma distributed rates
Gamma distribution parameter alpha (estimated from data set): 2.10 (S.E. 0.18)
Number of Gamma rate categories: 8

Rates and their respective probabilities used in the likelihood function:

 Category  Relative rate  Probability
  1         0.2150         0.1250
  2         0.4134         0.1250
  3         0.5873         0.1250
  4         0.7668         0.1250
  5         0.9689         0.1250
  6         1.2176         0.1250
  7         1.5677         0.1250
  8         2.2633         0.1250

Categories 1-8 approximate a continuous Gamma-distribution with expectation 1
and variance 0.48.
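The figures in this section can be cross-checked by hand. The sketch below is not part of the PUZZLE output. It is a minimal Python illustration that assumes the reported variance is that of a continuous Gamma distribution with shape alpha and mean 1 (that is, 1/alpha), and it applies the x-1.96*S.E. and x+1.96*S.E. rule from the report header to alpha. The function name `confidence_interval` and all variable names are hypothetical.

```python
# Values copied from the RATE HETEROGENEITY section of the report.
alpha = 2.10       # estimated Gamma shape parameter
alpha_se = 0.18    # its standard error (curvature method)
rates = [0.2150, 0.4134, 0.5873, 0.7668, 0.9689, 1.2176, 1.5677, 2.2633]
prob = 0.1250      # each of the 8 categories has the same probability


def confidence_interval(x, se, z=1.96):
    """Approximate 95% interval (x - z*S.E., x + z*S.E.) as described in the report."""
    return x - z * se, x + z * se


# Expectation of the 8-category approximation; should be close to 1.
mean_rate = sum(prob * r for r in rates)

# Assumption: variance of a continuous Gamma with shape alpha and mean 1.
gamma_variance = 1.0 / alpha

# Variance of the discrete categories themselves; binning the rates
# into 8 classes makes this smaller than the continuous value.
discrete_variance = sum(prob * (r - mean_rate) ** 2 for r in rates)

alpha_low, alpha_high = confidence_interval(alpha, alpha_se)

print(f"mean relative rate:          {mean_rate:.4f}")
print(f"continuous Gamma variance:   {gamma_variance:.2f}")
print(f"variance of the 8 categories: {discrete_variance:.2f}")
print(f"approx. 95% CI for alpha:    {alpha_low:.4f} to {alpha_high:.4f}")
```

Under these assumptions the mean of the eight relative rates comes out at 1 and 1/alpha rounds to the 0.48 quoted in the report.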
Combination of categories that contributes the most to the likelihood (computation done without clock assumption assuming quartet-puzzling tree): 8 8 8 1 8 8 8 8 8 6 8 8 8 8 8 6 5 5 6 7 3 8 8 6 8 3 8 8 7 2 8 3 7 8 6 7 7 8 5 3 4 1 4 5 4 1 3 7 3 7 7 2 1 8 2 5 2 3 1 2 6 3 6 4 6 5 8 3 3 8 3 7 5 2 8 7 7 2 5 4 4 6 7 8 1 8 8 8 4 5 4 1 8 4 6 5 3 5 2 5 2 4 6 4 5 4 8 4 3 2 2 4 2 2 1 3 5 3 2 7 3 3 4 3 4 5 4 5 2 4 8 8 7 4 1 7 4 8 8 4 6 4 1 7 8 3 2 4 4 3 3 1 1 2 3 3 1 3 1 2 8 4 4 8 1 1 7 6 5 4 1 6 1 5 3 1 1 2 1 8 8 5 3 7 3 4 4 2 7 7 6 7 4 8 8 7 8 8 1 7 7 8 3 8 5 3 8 4 4 7 2 2 5 1 2 6 4 3 7 8 8 6 6 8 6 8 7 2 5 6 6 8 6 8 7 4 5 7 8 4 6 3 2 8 8 1 8 8 8 8 2 8 3 5 8 7 7 8 1 2 2 5 6 6 6 4 4 6 4 4 7 2 4 2 1 1 4 1 1 1 3 3 3 1 2 1 1 3 2 2 8 6 4 2 1 6 7 1 8 5 7 8 2 4 3 6 4 5 2 3 8 6 7 6 8 3 3 6 4 4 2 3 1 2 3 2 1 5 4 3 3 6 8 6 7 5 4 8 3 1 8 2 8 4 8 2 1 2 4 8 3 7 2 8 6 6 6 4 2 1 8 8 8 7 6 4 2 1 1 4 1 3 1 3 2 3 4 3 2 2 6 4 1 2 8 6 2 8 8 3 2 3 1 3 1 3 1 1 2 1 4 2 2 1 5 7 2 2 1 4 1 1 5 2 3 3 7 5 1 3 3 4 1 3 4 8 1 5 8 8 7 5 1 3 3 1 1 1 3 2 5 1 2 1 1 8 2 1 4 3 8 2 3 5 2 6 8 5 2 5 5 7 8 8 8 7 2 6 8 8 8 3 2 3 2 1 2 1 5 2 4 1 1 1 3 1 6 3 3 7 8 6 6 6 6 7 6 8 8 7 6 4 8 1 5 7 8 8 4 4 8 8 3 2 5 7 4 7 7 8 2 8 4 5 8 8 5 8 5 8 8 4 3 6 1 5 3 6 8 8 8 4 7 7 5 7 6 8 3 3 6 6 5 7 6 7 8 8 8 8 1 8 8 4 8 5 7 4 4 8 8 8 7 8 7 8 8 8 8 8 3 8 2 8 5 8 8 8 4 8 8 8 8 8 6 1 8 8 8 8 8 8 6 8 8 8 8 4 6 7 8 6 8 3 3 1 4 7 3 8 8 2 5 5 1 3 8 1 2 1 6 8 8 3 1 8 4 4 8 8 7 5 1 5 8 1 8 6 4 2 7 2 5 6 1 2 8 8 8 8 8 8 1 8 3 8 8 8 8 8 8 8 8 8 8 8 1 8 8 8 8 8 1 8 1 1 8 5 8 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 1 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 1 8 8 8 8 8 8 8 8 8 8 8 8 8 8 1 8 8 1 8 8 8 8 8 8 8 8 8 8 8 1 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 1 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 1 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 SEQUENCES IN INPUT ORDER 5% chi-square test p-value 

 Methanobac   failed    0.36%   [45]
 Methanococ   passed    7.12%   [30]
 Methanoco2   failed    0.02%   [68]
 Methanoba2   passed   45.56%   [34]
 Pyrococcus   passed   77.71%   [33]
 Deinococcu   passed   60.37%   [61]
 Archaeoglo   passed   42.55%   [49]
 Methanoco3   failed    0.02%   [60]
 Thermotoga   passed   78.45%   [56]
 Synechocys   failed    0.00%   [51]
 Methanoba3   failed    0.00%   [60]
 Synechocy2   failed    0.00%   [49]
 Thermotog2   passed   13.50%   [81]
 Thermotog3   failed    2.81%   [89]
 Thermotog4   passed    8.06%   [50]

The chi-square test compares the amino acid composition of each sequence
to the frequency distribution assumed in the maximum likelihood model.

The number in square brackets indicates how often each sequence is
involved in one of the 204 completely unresolved quartets of the
quartet puzzling tree search.


IDENTICAL SEQUENCES

The sequences in each of the following groups are all identical. To speed
up computation please remove all but one of each group from the data set.

 All sequences are unique.


MAXIMUM LIKELIHOOD DISTANCES

Maximum likelihood distances are computed using the selected model of
substitution and rate heterogeneity.

  15
Methanobac  0.00000  1.17693  3.15224  3.51857  3.89573  3.48473  5.12660  4.42893  4.50729  5.15505  4.51144  5.58335  4.90276  5.61178  6.56371
Methanococ  1.17693  0.00000  2.92616  3.56851  3.80980  3.70000  5.58599  4.41008  4.49913  4.33680  5.15755  5.90970  4.98280  5.82423  5.98663
Methanoco2  3.15224  2.92616  0.00000  3.45246  3.59367  3.75731  4.68667  4.09834  3.99831  4.55068  4.81152  4.47536  4.79688  5.73113  6.15525
Methanoba2  3.51857  3.56851  3.45246  0.00000  1.59476  3.98080  5.31709  4.85372  4.27759  5.04451  5.66323  6.40958  5.23539  5.97241  6.74721
Pyrococcus  3.89573  3.80980  3.59367  1.59476  0.00000  3.83925  5.96825  5.05368  4.84938  5.06787  5.51079  5.74353  4.76537  7.25692  5.76212
Deinococcu  3.48473  3.70000  3.75731  3.98080  3.83925  0.00000  3.02593  3.14875  3.01406  3.00927  2.85026  3.03825  3.54433  3.72318  4.42641
Archaeoglo  5.12660  5.58599  4.68667  5.31709  5.96825  3.02593  0.00000  1.75166  2.17302  2.83779  1.97748  2.96012  2.80035  3.45681  6.14691
Methanoco3  4.42893  4.41008  4.09834  4.85372  5.05368  3.14875  1.75166  0.00000  2.31207  2.94894  2.22798  3.23583  2.77748  3.79654  6.04120
Thermotoga  4.50729  4.49913  3.99831  4.27759  4.84938  3.01406  2.17302  2.31207  0.00000  1.55141  2.29120  2.66364  3.16637  3.33318  5.94620
Synechocys  5.15505  4.33680  4.55068  5.04451  5.06787  3.00927  2.83779  2.94894  1.55141  0.00000  2.66867  2.36331  3.34496  3.69532  6.57357
Methanoba3  4.51144  5.15755  4.81152  5.66323  5.51079  2.85026  1.97748  2.22798  2.29120  2.66867  0.00000  2.79965  3.00007  3.53257  6.71886
Synechocy2  5.58335  5.90970  4.47536  6.40958  5.74353  3.03825  2.96012  3.23583  2.66364  2.36331  2.79965  0.00000  3.80415  3.23835  5.58719
Thermotog2  4.90276  4.98280  4.79688  5.23539  4.76537  3.54433  2.80035  2.77748  3.16637  3.34496  3.00007  3.80415  0.00000  3.86786  6.18964
Thermotog3  5.61178  5.82423  5.73113  5.97241  7.25692  3.72318  3.45681  3.79654  3.33318  3.69532  3.53257  3.23835  3.86786  0.00000  7.20808
Thermotog4  6.56371  5.98663  6.15525  6.74721  5.76212  4.42641  6.14691  6.04120  5.94620  6.57357  6.71886  5.58719  6.18964  7.20808  0.00000

Average distance (over all
possible pairs of sequences): 4.24863


TREE SEARCH

Quartet puzzling is used to choose from the possible tree topologies
and to simultaneously infer support values for internal branches.

Number of puzzling steps: 1000
Analysed quartets: 1365
Unresolved quartets: 204 (= 14.9%)

Quartet trees are based on approximate maximum likelihood values
using the selected model of substitution and rate heterogeneity.


QUARTET PUZZLING TREE

Support for the internal branches of the unrooted quartet puzzling
tree topology is shown in percent.

This quartet puzzling tree is completely resolved.


                            :---Methanoba2
        :-----------------98:
        :                   :---Pyrococcus
        :
        :                   :---Thermotoga
        :               :-74:
        :               :   :---Synechocys
        :       :-----57:
    :-58:       :       :   :---Synechocy2
    :   :       :       :-58:
    :   :       :           :---Thermotog3
    :   :   :-96:
    :   :   :   :           :---Archaeoglo
    :   :   :   :       :-51:
    :   :   :   :   :-59:   :---Methanoco3
    :   :   :   :   :   :
:-94:   :-98:   :-50:   :-------Methanoba3
:   :       :       :
:   :       :       :-----------Thermotog2
:   :       :
:   :       :               :---Deinococcu
:   :       :-------------68:
:   :                       :---Thermotog4
:   :
:   :---------------------------Methanoco2
:
:-------------------------------Methanococ
:
:-------------------------------Methanobac


Quartet puzzling tree (in CLUSTAL W notation):

(Methanobac,(((Methanoba2,Pyrococcus)98,((((Thermotoga,Synechocys)74,
(Synechocy2,Thermotog3)58)57,(((Archaeoglo,Methanoco3)51,
Methanoba3)59,Thermotog2)50)96,(Deinococcu,Thermotog4)68)98)58,
Methanoco2)94,Methanococ);


BIPARTITIONS

The following bipartitions occurred at least once in all intermediate
trees that have been generated in the 1000 puzzling steps:

Bipartitions included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 ***..***** *****  :  980
 *****..... .....  :  978
 ******.... ....*  :  958
 **........ .....  :  935
 ********.. *****  :  744
 *****.**** ****.  :  682
 ******..** .****  :  587
 ********** *.*.*  :  581
 ***....... .....  :  581
 ********.. *.*.*  :  566
 ******..** *****  :  509
 ******..** .*.**  :  504

Bipartitions not included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 ******.*** .****  :  358
 ********.. *.***  :  337
 **.**..... .....  :  288
 ******.... ..*.*  :  286
 *******.** **.**  :  233
 ******..** **.**  :  193
 *****..... ....*  :  173
 *********. *.***  :  149
 ******.... .....  :  144
 **...***** *****  :  130
 *********. *.*.*  :  100
 ******..** ....*  :   96
 ******..** .*..*  :   81
 ******.... .*.**  :   65
 ********.. *...*  :   58
 ********** **..*  :   54
 *..******* *****  :   51
 ******..** ..*.*  :   50
 ******.... .****  :   48
 ******..** .**.*  :   47

(81 other less frequent bipartitions not shown)


MAXIMUM LIKELIHOOD BRANCH LENGTHS ON QUARTET PUZZLING TREE (NO CLOCK)

Branch lengths are computed using the selected model of
substitution and rate heterogeneity.


             :---4 Methanoba2
       :-----16
       :     :----5 Pyrococcus
     :-26
     : :               :---9 Thermotoga
     : :            :--17
     : :            :  :----10 Synechocys
     : :          :-19
     : :          : :  :-----12 Synechocy2
     : :          : :--18
     : :          :    :--------14 Thermotog3
     : :      :---23
     : :      :   :     :----7 Archaeoglo
     : :      :   :   :-20
     : :      :   :   : :----8 Methanoco3
     : :      :   : :-21
     : :      :   : : :-----11 Methanoba3
     : :      :   :-22
     : :      :     :-------13 Thermotog2
     : :------25
     :        : :-------6 Deinococcu
     :        :-24
     :          :----------------15 Thermotog4
:----27
:    :------3 Methanoco2
:
:--2 Methanococ
:
:---1 Methanobac


       branch    length     S.E. branch    length     S.E.
Methanobac   1   0.66818  0.08733     16   1.24403  0.19966
Methanococ   2   0.53814  0.08281     17   0.40451  0.08748
Methanoco2   3   1.59672  0.20582     18   0.36482  0.11102
Methanoba2   4   0.77273  0.11181     19   0.17341  0.07802
Pyrococcus   5   0.90409  0.11542     20   0.26134  0.07905
Deinococcu   6   1.76053  0.25911     21   0.12688  0.07029
Archaeoglo   7   0.83776  0.09998     22   0.17661  0.07753
Methanoco3   8   1.02464  0.11150     23   0.56682  0.17367
Thermotoga   9   0.62673  0.07797     24   0.00001  0.00039
Synechocys  10   0.91450  0.08930     25   1.47920  0.23757
Methanoba3  11   1.11350  0.11445     26   0.13908  0.12943
Synechocy2  12   1.25785  0.14132     27   1.03965  0.17723
Thermotog2  13   1.87668  0.17411
Thermotog3  14   2.10101  0.19255     12 iterations until convergence
Thermotog4  15   4.18533  0.46681     log L: -18259.45

WARNING --- at least one branch length is close to an internal boundary!


Quartet puzzling tree with maximum likelihood branch lengths
(in CLUSTAL W notation):

(Methanobac:0.66818,(((Methanoba2:0.77273,Pyrococcus:0.90409)98:1.24403,
((((Thermotoga:0.62673,Synechocys:0.91450)74:0.40451,(Synechocy2:1.25785,
Thermotog3:2.10101)58:0.36482)57:0.17341,(((Archaeoglo:0.83776,
Methanoco3:1.02464)51:0.26134,Methanoba3:1.11350)59:0.12688,Thermotog2:1.87668)
50:0.17661)96:0.56682,(Deinococcu:1.76053,Thermotog4:4.18533)68:0.00001)
98:1.47920)58:0.13908,Methanoco2:1.59672)94:1.03965,Methanococ:0.53814);


TIME STAMP

Date and time: Thu Jun 17 12:47:54 1999
Runtime: 1816 seconds (= 30.3 minutes = 0.5 hours)
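
The report header gives the rule for an approximate 95% confidence interval (x-1.96*S.E. to x+1.96*S.E.), which can be applied to the internal branch lengths in the table above. The following Python sketch is not part of the PUZZLE output. It copies the values for internal branches 16 to 27 and lists those whose interval reaches zero or below. The names `internal_branches`, `confidence_interval` and `branches_spanning_zero` are hypothetical, and reading such branches as weakly supported is an interpretation, not something the report states.

```python
# Internal branches 16-27 as (ML branch length, S.E.), copied from the report table.
internal_branches = {
    16: (1.24403, 0.19966),
    17: (0.40451, 0.08748),
    18: (0.36482, 0.11102),
    19: (0.17341, 0.07802),
    20: (0.26134, 0.07905),
    21: (0.12688, 0.07029),
    22: (0.17661, 0.07753),
    23: (0.56682, 0.17367),
    24: (0.00001, 0.00039),
    25: (1.47920, 0.23757),
    26: (0.13908, 0.12943),
    27: (1.03965, 0.17723),
}


def confidence_interval(length, se, z=1.96):
    """Approximate 95% interval (length - z*S.E., length + z*S.E.)."""
    return length - z * se, length + z * se


def branches_spanning_zero(branches):
    """Return branch numbers whose lower confidence bound is <= 0, in ascending order."""
    flagged = []
    for number in sorted(branches):
        length, se = branches[number]
        lower, _upper = confidence_interval(length, se)
        if lower <= 0.0:
            flagged.append(number)
    return flagged


flagged = branches_spanning_zero(internal_branches)
for number in flagged:
    length, se = internal_branches[number]
    lower, upper = confidence_interval(length, se)
    print(f"branch {number}: {length:.5f} (95% CI {lower:.5f} to {upper:.5f})")
```

With the values above this flags branches 21, 24 and 26. Branch 24, with an estimate of 0.00001, is presumably the one the WARNING line refers to.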