PUZZLE 4.0.2

Type of analysis: tree reconstruction
Parameter estimation: approximate (faster)
Parameter estimation uses: neighbor-joining tree (for substitution process and rate variation)

Standard errors (S.E.) are obtained by the curvature method.
The lower and upper bounds of an approximate 95% confidence
interval for parameter or branch length x are x-1.96*S.E.
and x+1.96*S.E.


SEQUENCE ALIGNMENT

Input data: 22 sequences with 699 amino acid sites
Number of constant sites: 4 (= 0.6% of all sites)


SUBSTITUTION PROCESS

Model of substitution: JTT (Jones et al. 1992)
Amino acid frequencies (estimated from data set):

pi(A) =   9.1%
pi(R) =   3.8%
pi(N) =   4.5%
pi(D) =   5.7%
pi(C) =   1.0%
pi(Q) =   2.2%
pi(E) =   7.9%
pi(G) =   8.6%
pi(H) =   1.5%
pi(I) =   7.0%
pi(L) =   7.1%
pi(K) =   7.9%
pi(M) =   2.8%
pi(F) =   3.7%
pi(P) =   3.4%
pi(S) =   4.5%
pi(T) =   5.5%
pi(W) =   0.7%
pi(Y) =   3.5%
pi(V) =   9.5%


RATE HETEROGENEITY

Model of rate heterogeneity: Gamma distributed rates
Gamma distribution parameter alpha (estimated from data set): 1.96 (S.E. 0.14)
Number of Gamma rate categories: 8

Rates and their respective probabilities used in the likelihood function:

Category  Relative rate  Probability
   1         0.2055        0.1250
   2         0.4030        0.1250
   3         0.5781        0.1250
   4         0.7598        0.1250
   5         0.9652        0.1250
   6         1.2189        0.1250
   7         1.5774        0.1250
   8         2.2921        0.1250

Categories 1-8 approximate a continuous Gamma-distribution with expectation 1
and variance 0.51.
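The confidence-interval rule and the Gamma rate categories reported above can be cross-checked with a few lines of Python. This is only a sketch: the function names are illustrative and not part of PUZZLE, the numbers are copied from the report, and the continuous variance assumes a mean-1 Gamma distribution (shape = rate = alpha), for which the variance is 1/alpha.

```python
# Values copied from the report; all names here are illustrative, not PUZZLE's.
ALPHA, ALPHA_SE = 1.96, 0.14
RATES = [0.2055, 0.4030, 0.5781, 0.7598, 0.9652, 1.2189, 1.5774, 2.2921]
PROB = 0.1250  # each of the 8 categories has the same probability


def approx_ci95(x, se):
    """Approximate 95% interval as stated in the report: x -/+ 1.96*S.E."""
    half_width = 1.96 * se
    return x - half_width, x + half_width


def discrete_moments(rates, prob):
    """Mean and variance of the equal-probability discrete rate categories."""
    mean = sum(prob * r for r in rates)
    var = sum(prob * (r - mean) ** 2 for r in rates)
    return mean, var


lo, hi = approx_ci95(ALPHA, ALPHA_SE)
mean, var = discrete_moments(RATES, PROB)

# Assumption: mean-1 Gamma with shape = rate = alpha, so variance = 1/alpha.
continuous_var = 1.0 / ALPHA

print(f"alpha 95% interval: {lo:.4f} to {hi:.4f}")
print(f"discrete categories: mean {mean:.4f}, variance {var:.4f}")
print(f"continuous Gamma variance (1/alpha): {continuous_var:.2f}")
```

The category mean should come out very close to 1, in line with the stated expectation. The variance of the eight discrete rates is expected to be somewhat smaller than the continuous value, since each category collapses a range of rates into a single number.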

Combination of categories that contributes the most to the likelihood
(computation done without clock assumption assuming quartet-puzzling tree):

8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 1 8 8 8 8 8 8 8 8 7 8 5
8 8 2 8 1 8 8 5 2 3 8 3 8 8 8 8 1 8 8 1 8 8 8 8 1 8 8 8 8 8
8 8 8 7 4 8 4 2 2 1 2 1 3 4 1 1 1 3 3 2 1 3 6 4 5 4 1 8 8 6
7 4 8 6 2 6 2 5 3 4 4 6 5 2 1 4 3 6 1 2 2 1 2 8 4 6 3 8 2 7
6 8 6 6 4 6 5 6 5 5 2 8 8 8 8 8 3 8 1 4 8 2 8 4 8 5 7 5 1 1
5 1 8 3 6 3 7 1 8 7 2 1 8 7 6 3 5 2 8 5 8 6 7 7 1 1 6 5 8 8
7 5 1 1 3 1 3 6 1 1 1 8 3 8 2 5 8 2 7 3 2 2 8 8 6 8 6 8 8 8
8 5 6 8 8 8 8 1 8 8 4 4 5 2 1 6 4 6 8 4 5 8 2 6 4 8 8 3 7 7
8 8 8 7 8 8 8 8 5 4 5 1 2 1 1 1 4 1 3 1 1 3 1 4 2 1 2 5 8 6
7 2 4 6 1 2 3 3 2 7 6 2 4 5 4 7 7 8 5 1 7 4 4 2 8 7 4 4 6 5
3 7 7 7 2 1 5 5 6 5 3 5 4 1 6 7 5 7 2 5 8 5 7 5 4 1 7 8 1 5
2 4 6 4 8 8 8 6 1 1 3 1 2 5 3 2 1 4 4 1 1 2 5 4 5 3 5 8 7 8
7 4 6 7 8 4 1 2 1 8 1 2 4 7 2 5 1 1 8 6 8 8 8 8 8 8 6 3 1 4
1 4 1 1 3 1 5 4 8 5 8 5 3 8 8 8 7 8 8 8 8 8 2 3 2 1 4 2 1 4
4 6 2 3 2 3 3 8 1 6 7 1 8 8 8 6 8 5 8 8 3 5 3 3 3 4 5 3 7 4
6 3 7 6 6 3 3 2 1 4 2 4 8 8 1 5 7 8 3 8 7 5 8 5 5 8 8 8 2 8
5 4 8 4 5 2 3 3 6 8 1 7 8 2 8 6 2 3 3 5 4 7 8 4 8 4 2 4 1 2
2 5 5 1 6 7 8 6 4 3 6 4 2 4 7 5 1 5 1 1 4 8 8 4 2 5 7 5 2 5
8 8 2 6 4 3 4 2 4 6 2 5 2 5 5 8 2 6 6 6 6 3 2 7 8 1 8 7 7 7
8 8 8 8 8 8 8 8 8 1 8 8 8 8 8 8 8 8 8 8 1 2 1 8 8 2 8 8 8 8
8 1 8 8 8 8 1 8 8 8 8 2 1 8 1 8 1 8 8 8 8 8 8 8 8 8 1 8 8 8
8 8 1 2 2 2 8 8 1 8 8 8 8 1 8 8 8 8 1 8 8 8 8 8 1 8 8 2 8 1
8 8 8 8 8 8 1 8 1


SEQUENCES IN INPUT ORDER

            5% chi-square test  p-value
Brassica         failed           0.53%  [68]
Archaeoglo       failed           2.23%  [42]
Archaeogl2       passed          35.83%  [51]
Brachyspir       passed          98.20%  [51]
Serpulina4       passed          93.80%  [54]
Serpulina3       passed          94.59%  [55]
Serpulina7       passed          97.22%  [43]
Serpulina        passed          81.86%  [57]
Brachyspi2       passed          95.20%  [69]
Mycoplasma       failed           3.56%  [53]
Enterococc       passed          21.83%  [46]
Streptococ       passed          62.13%  [60]
Treponema        passed          37.37%  [50]
Enterococ2       passed          73.79%  [37]
Methanobac       failed           0.02%  [52]
Methanococ       passed           8.31%  [50]
Archaeogl3       passed          37.20%  [93]
Archaeogl4       passed          23.72%  [78]
Borrelia         failed           0.00%  [47]
Staphyloco       passed          15.75%  [62]
Archaeogl5       passed           8.38%  [101]
Deinococcu       failed           0.00%  [101]

The chi-square test compares the amino acid composition of each sequence
to the frequency distribution assumed in the maximum likelihood model.
The number in square brackets indicates how often each sequence is
involved in one of the 330 completely unresolved quartets of the
quartet puzzling tree search.


IDENTICAL SEQUENCES

The sequences in each of the following groups are all identical. To speed
up computation please remove all but one of each group from the data set.

All sequences are unique.


MAXIMUM LIKELIHOOD DISTANCES

Maximum likelihood distances are computed using the selected model of
substitution and rate heterogeneity.

22
Brassica    0.00000 5.72643 5.68818 4.77798 4.97284 5.15737 5.26194 4.86517 5.35363 5.73652 5.33914
            5.04106 5.18005 6.18977 4.45054 4.88474 4.82329 4.89387 5.24370 5.22828 4.72371 2.97660
Archaeoglo  5.72643 0.00000 1.02986 1.87760 1.93435 1.94423 1.84493 1.89593 1.98482 2.44321 1.94930
            2.21065 2.21003 1.98846 1.85968 1.81775 1.93472 1.43361 1.58034 2.44839 3.40001 2.26408
Archaeogl2  5.68818 1.02986 0.00000 2.09737 2.09220 2.06788 1.98212 2.09252 2.05185 2.78513 2.45319
            2.29793 2.34742 2.27274 1.86110 1.87629 2.32533 1.92641 1.83643 2.60262 3.34408 2.10537
Brachyspir  4.77798 1.87760 2.09737 0.00000 0.05637 0.06816 0.09948 0.08072 0.10560 1.31044 1.09831
            1.28129 1.28309 1.74892 2.40079 2.26795 2.17665 2.17244 1.69402 2.75737 3.34949 2.56060
Serpulina4  4.97284 1.93435 2.09220 0.05637 0.00000 0.08583 0.10463 0.10645 0.11178 1.28457 1.09545
            1.24746 1.26474 1.67845 2.38202 2.19801 2.27274 2.15922 1.68916 2.87456 3.42501 2.61193
Serpulina3  5.15737 1.94423 2.06788 0.06816 0.08583 0.00000 0.09583 0.08486 0.10308 1.35357 1.15035
            1.28548 1.34001 1.70947 2.40197 2.20831 2.31602 2.28042 1.70369 2.83872 3.39483 2.59064
Serpulina7  5.26194 1.84493 1.98212 0.09948 0.10463 0.09583 0.00000 0.10728 0.09692 1.28876 1.09525
            1.27703 1.34278 1.72898 2.38622 2.18593 2.29382 2.16635 1.62439 2.84341 3.34068 2.69218
Serpulina   4.86517 1.89593 2.09252 0.08072 0.10645 0.08486 0.10728 0.00000 0.11732 1.28092 1.08209
            1.28303 1.27637 1.69530 2.45372 2.30278 2.25636 2.25004 1.69041 2.90118 3.23967 2.63626
Brachyspi2  5.35363 1.98482 2.05185 0.10560 0.11178 0.10308 0.09692 0.11732 0.00000 1.28638 1.06261
            1.31447 1.37485 1.70613 2.39081 2.25856 2.26864 2.27294 1.71905 2.93114 3.36340 2.52425
Mycoplasma  5.73652 2.44321 2.78513 1.31044 1.28457 1.35357 1.28876 1.28092 1.28638 0.00000 1.34377
            1.63858 1.63125 1.80528 2.88309 2.96938 2.55382 2.87386 2.12632 3.15388 4.52888 2.62056
Enterococc  5.33914 1.94930 2.45319 1.09831 1.09545 1.15035 1.09525 1.08209 1.06261 1.34377 0.00000
            1.30058 1.35676 1.27403 2.24000 2.41163 2.12088 2.11826 1.73703 2.68329 3.58655 2.42026
Streptococ  5.04106 2.21065 2.29793 1.28129 1.24746 1.28548 1.27703 1.28303 1.31447 1.63858 1.30058
            0.00000 0.70385 1.75220 2.47805 2.38584 2.63132 2.71668 2.13463 2.64567 3.45781 2.75699
Treponema   5.18005 2.21003 2.34742 1.28309 1.26474 1.34001 1.34278 1.27637 1.37485 1.63125 1.35676
            0.70385 0.00000 1.80688 2.82132 2.52707 2.65441 2.50138 2.00485 2.84852 3.60905 2.53375
Enterococ2  6.18977 1.98846 2.27274 1.74892 1.67845 1.70947 1.72898 1.69530 1.70613 1.80528 1.27403
            1.75220 1.80688 0.00000 2.55593 2.75319 2.68012 2.25731 2.05347 2.81565 3.84860 2.66806
Methanobac  4.45054 1.85968 1.86110 2.40079 2.38202 2.40197 2.38622 2.45372 2.39081 2.88309 2.24000
            2.47805 2.82132 2.55593 0.00000 0.71750 1.88552 1.62655 2.10906 2.29856 2.68562 1.98170
Methanococ  4.88474 1.81775 1.87629 2.26795 2.19801 2.20831 2.18593 2.30278 2.25856 2.96938 2.41163
            2.38584 2.52707 2.75319 0.71750 0.00000 2.03156 1.48332 1.85900 2.45557 2.56272 2.20938
Archaeogl3  4.82329 1.93472 2.32533 2.17665 2.27274 2.31602 2.29382 2.25636 2.26864 2.55382 2.12088
            2.63132 2.65441 2.68012 1.88552 2.03156 0.00000 2.02013 2.22056 3.41046 3.09750 2.41932
Archaeogl4  4.89387 1.43361 1.92641 2.17244 2.15922 2.28042 2.16635 2.25004 2.27294 2.87386 2.11826
            2.71668 2.50138 2.25731 1.62655 1.48332 2.02013 0.00000 1.75086 2.36960 3.05127 2.27539
Borrelia    5.24370 1.58034 1.83643 1.69402 1.68916 1.70369 1.62439 1.69041 1.71905 2.12632 1.73703
            2.13463 2.00485 2.05347 2.10906 1.85900 2.22056 1.75086 0.00000 1.96764 3.38470 1.97920
Staphyloco  5.22828 2.44839 2.60262 2.75737 2.87456 2.83872 2.84341 2.90118 2.93114 3.15388 2.68329
            2.64567 2.84852 2.81565 2.29856 2.45557 3.41046 2.36960 1.96764 0.00000 3.72167 2.06028
Archaeogl5  4.72371 3.40001 3.34408 3.34949 3.42501 3.39483 3.34068 3.23967 3.36340 4.52888 3.58655
            3.45781 3.60905 3.84860 2.68562 2.56272 3.09750 3.05127 3.38470 3.72167 0.00000 3.21773
Deinococcu  2.97660 2.26408 2.10537 2.56060 2.61193 2.59064 2.69218 2.63626 2.52425 2.62056 2.42026
            2.75699 2.53375 2.66806 1.98170 2.20938 2.41932 2.27539 1.97920 2.06028 3.21773 0.00000

Average distance (over all possible pairs of sequences): 2.31683


TREE SEARCH

Quartet puzzling is used to choose from the possible tree topologies
and to simultaneously infer support values for internal branches.

Number of puzzling steps: 1000
Analysed quartets: 7315
Unresolved quartets: 330 (= 4.5%)

Quartet trees are based on approximate maximum likelihood values
using the selected model of substitution and rate heterogeneity.


QUARTET PUZZLING TREE

Support for the internal branches of the unrooted quartet puzzling
tree topology is shown in percent.

This quartet puzzling tree is not completely resolved!

            :---Archaeoglo
:---------98:
:           :---Archaeogl2
:
:           :---Streptococ
:   :-----99:
:   :       :---Treponema
:   :
:   :       :---Serpulina7
:   :   :-81:
:   :   :   :---Brachyspi2
:   :   :
:   :   :-------Brachyspir
:-98:-98:
:   :   :-------Serpulina4
:   :   :
:   :   :-------Serpulina3
:   :   :
:   :   :-------Serpulina
:   :
:   :       :---Enterococc
:   :   :-67:
:   :-52:   :---Enterococ2
:       :
:       :-------Mycoplasma
:
:           :---Methanobac
:       :-70:
:-----59:   :---Methanococ
:       :
:       :-------Archaeogl4
:
:---------------Archaeogl3
:
:---------------Borrelia
:
:---------------Staphyloco
:
:---------------Archaeogl5
:
:---------------Deinococcu
:
:---------------Brassica

Quartet puzzling tree (in CLUSTAL W notation):

(Brassica,(Archaeoglo,Archaeogl2)98,((Streptococ,Treponema)99,
((Serpulina7,Brachyspi2)81,Brachyspir,Serpulina4,Serpulina3,
Serpulina)98,((Enterococc,Enterococ2)67,Mycoplasma)52)98,
((Methanobac,Methanococ)70,Archaeogl4)59,Archaeogl3,Borrelia,
Staphyloco,Archaeogl5,Deinococcu);


BIPARTITIONS

The following bipartitions occurred at least once in all intermediate
trees that have been generated in the 1000 puzzling steps:

Bipartitions included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

********** *..******* **  :  992
*..******* ********** **  :  977
***......* ********** **  :  976
***....... ....****** **  :  975
******.*.* ********** **  :  808
********** ****..**** **  :  704
********** .**.****** **  :  672
********** ****..*.** **  :  591
*********. .**.****** **  :  518

Bipartitions not included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

***...*.** ********** **  :  493
***..***** ********** **  :  490
*......... .......... .*  :  459
***....... ....****.* **  :  436
*****.*.** ********** **  :  367
********** ********.. **  :  359
***....... *..******* **  :  323
***....... ********** **  :  311
***......* *..******* **  :  293
*......... ....****.* **  :  291
*......... .........* .*  :  284
*......... .......... **  :  282
*********. .********* **  :  267
***.*.**** ********** **  :  261
***...**** ********** **  :  261
***....... .**.****** **  :  261
*......... .......... *.  :  254
********** *****.*.** **  :  249
********** *********. *.  :  242
***.*.*.** ********** **  :  231

(218 other less frequent bipartitions not shown)


MAXIMUM LIKELIHOOD BRANCH LENGTHS ON QUARTET PUZZLING TREE (NO CLOCK)

Branch lengths are computed using the selected model of
substitution and rate heterogeneity.

    :--2 Archaeoglo
:--23
:   :---3 Archaeogl2
:
:          :--12 Streptococ
:     :---24
:     :    :--13 Treponema
:----29
:     :       :-7 Serpulina7
:     :    :-25
:     :    :  :-9 Brachyspi2
:     :    :
:     :    :-4 Brachyspir
:     :---26
:     :    :-5 Serpulina4
:     :    :
:     :    :-6 Serpulina3
:     :    :
:     :    :-8 Serpulina
:     :
:     :     :--11 Enterococc
:     :  :-27
:     :  :  :----14 Enterococ2
:     :-28
:        :----10 Mycoplasma
:
:        :--15 Methanobac
:   :---30
:   :    :--16 Methanococ
:--31
:   :----18 Archaeogl4
:
:------17 Archaeogl3
:
:----19 Borrelia
:
:-------20 Staphyloco
:
:-----------21 Archaeogl5
:
:-------22 Deinococcu
:
:------------------1 Brassica

            branch   length     S.E.    branch   length     S.E.
Brassica         1  4.08327  0.42139        23  0.44487  0.06560
Archaeoglo       2  0.46355  0.06054        24  0.49172  0.06337
Archaeogl2       3  0.61061  0.06825        25  0.02513  0.00863
Brachyspir       4  0.03450  0.00998        26  0.50316  0.05973
Serpulina4       5  0.06198  0.01329        27  0.16450  0.04375
Serpulina3       6  0.04454  0.01125        28  0.06362  0.03962
Serpulina7       7  0.04441  0.01125        29  0.83002  0.08847
Serpulina        8  0.05369  0.01238        30  0.48355  0.07067
Brachyspi2       9  0.05518  0.01255        31  0.25244  0.05624
Mycoplasma      10  0.86753  0.07986
Enterococc      11  0.39794  0.05337
Streptococ      12  0.34338  0.04738
Treponema       13  0.39542  0.04971
Enterococ2      14  0.89490  0.08328
Methanobac      15  0.41816  0.05223
Methanococ      16  0.33059  0.04836
Archaeogl3      17  1.37098  0.12229
Archaeogl4      18  0.75985  0.08031
Borrelia        19  0.85414  0.08394
Staphyloco      20  1.51415  0.13256
Archaeogl5      21  2.41187  0.21317    12 iterations until convergence
Deinococcu      22  1.42995  0.17035    log L: -16701.80

Quartet puzzling tree with maximum likelihood branch lengths
(in CLUSTAL W notation):

(Brassica:4.08327,(Archaeoglo:0.46355,Archaeogl2:0.61061)98:0.44487,
((Streptococ:0.34338,Treponema:0.39542)99:0.49172,((Serpulina7:0.04441,
Brachyspi2:0.05518)81:0.02513,Brachyspir:0.03450,Serpulina4:0.06198,
Serpulina3:0.04454,Serpulina:0.05369)98:0.50316,((Enterococc:0.39794,
Enterococ2:0.89490)67:0.16450,Mycoplasma:0.86753)52:0.06362)98:0.83002,
((Methanobac:0.41816,Methanococ:0.33059)70:0.48355,Archaeogl4:0.75985)
59:0.25244,Archaeogl3:1.37098,Borrelia:0.85414,Staphyloco:1.51415,
Archaeogl5:2.41187,Deinococcu:1.42995);


TIME STAMP

Date and time: Thu Jun 17 19:27:37 1999
Runtime: 6377 seconds (= 106.3 minutes = 1.8 hours)
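
The tree string with branch lengths reported above can be read back with a small amount of Python. This is a minimal sketch, not a general Newick parser: it uses regular expressions that rely on the taxon names in this report starting with a letter and on internal branches being labelled with a numeric support value. All function and variable names are illustrative, and `approx_ci95` simply applies the x-1.96*S.E., x+1.96*S.E. rule stated at the top of the report.

```python
import re

# Tree string copied from the report, with line breaks removed.
NEWICK = (
    "(Brassica:4.08327,(Archaeoglo:0.46355,Archaeogl2:0.61061)98:0.44487,"
    "((Streptococ:0.34338,Treponema:0.39542)99:0.49172,((Serpulina7:0.04441,"
    "Brachyspi2:0.05518)81:0.02513,Brachyspir:0.03450,Serpulina4:0.06198,"
    "Serpulina3:0.04454,Serpulina:0.05369)98:0.50316,((Enterococc:0.39794,"
    "Enterococ2:0.89490)67:0.16450,Mycoplasma:0.86753)52:0.06362)98:0.83002,"
    "((Methanobac:0.41816,Methanococ:0.33059)70:0.48355,Archaeogl4:0.75985)"
    "59:0.25244,Archaeogl3:1.37098,Borrelia:0.85414,Staphyloco:1.51415,"
    "Archaeogl5:2.41187,Deinococcu:1.42995);"
)

# A terminal branch is "<name>:<length>"; names here start with a letter,
# so "98:0.44487" (support value plus internal branch length) is not matched.
LEAF = re.compile(r"([A-Za-z][A-Za-z0-9]*):(\d+\.\d+)")
# Any branch length, terminal or internal.
ANY_LENGTH = re.compile(r":(\d+\.\d+)")


def leaf_lengths(newick):
    """Map each sequence name to its terminal branch length."""
    return {name: float(value) for name, value in LEAF.findall(newick)}


def total_tree_length(newick):
    """Sum of all branch lengths in the tree string."""
    return sum(float(value) for value in ANY_LENGTH.findall(newick))


def approx_ci95(x, se):
    """Approximate 95% interval: x - 1.96*S.E. to x + 1.96*S.E."""
    half_width = 1.96 * se
    return x - half_width, x + half_width


leaves = leaf_lengths(NEWICK)
longest = max(leaves, key=leaves.get)
print(f"{len(leaves)} terminal branches, longest: {longest} ({leaves[longest]})")
print(f"total tree length: {total_tree_length(NEWICK):.5f}")

# S.E. for branch 1 (Brassica) taken from the branch-length table.
print("Brassica 95%% interval: %.5f to %.5f" % approx_ci95(leaves["Brassica"], 0.42139))
```

For anything beyond this report, a dedicated tree-parsing library would be a safer choice than regular expressions, since the Newick format allows quoted names and comments that this sketch ignores.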