PUZZLE 4.0.2

Type of analysis: tree reconstruction
Parameter estimation: approximate (faster)
Parameter estimation uses: neighbor-joining tree (for substitution process and rate variation)

Standard errors (S.E.) are obtained by the curvature method.
The lower and upper bounds of an approximate 95% confidence interval
for parameter or branch length x are x-1.96*S.E. and x+1.96*S.E.


SEQUENCE ALIGNMENT

Input data: 22 sequences with 942 amino acid sites
Number of constant sites: 8 (= 0.8% of all sites)


SUBSTITUTION PROCESS

Model of substitution: JTT (Jones et al. 1992)
Amino acid frequencies (estimated from data set):

 pi(A) =   8.2%
 pi(R) =   4.5%
 pi(N) =   3.8%
 pi(D) =   6.5%
 pi(C) =   1.0%
 pi(Q) =   2.9%
 pi(E) =   7.6%
 pi(G) =   8.6%
 pi(H) =   2.7%
 pi(I) =   6.5%
 pi(L) =   9.0%
 pi(K) =   5.5%
 pi(M) =   1.9%
 pi(F) =   3.8%
 pi(P) =   5.2%
 pi(S) =   6.1%
 pi(T) =   4.7%
 pi(W) =   1.3%
 pi(Y) =   2.9%
 pi(V) =   7.1%


RATE HETEROGENEITY

Model of rate heterogeneity: Gamma distributed rates
Gamma distribution parameter alpha (estimated from data set): 2.11 (S.E. 0.19)
Number of Gamma rate categories: 8

Rates and their respective probabilities used in the likelihood function:

 Category  Relative rate  Probability
  1         0.2229         0.1250
  2         0.4219         0.1250
  3         0.5949         0.1250
  4         0.7725         0.1250
  5         0.9718         0.1250
  6         1.2163         0.1250
  7         1.5597         0.1250
  8         2.2398         0.1250

Categories 1-8 approximate a continuous Gamma-distribution with expectation 1
and variance 0.47.

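The interval rule and the Gamma summary reported above can be checked by hand. The following is a minimal sketch, not part of the PUZZLE output: the function name `confidence_interval` is hypothetical, and the numbers are copied from this report. It assumes the usual parameterisation of the rate distribution (shape alpha, mean 1), under which the variance is 1/alpha.

```python
# Illustrative check of values reported above; helper names are hypothetical.

def confidence_interval(x, se, z=1.96):
    """Approximate 95% interval (x - z*S.E., x + z*S.E.)."""
    return (x - z * se, x + z * se)

# Gamma shape parameter and its standard error, copied from the report.
alpha, alpha_se = 2.11, 0.19
lower, upper = confidence_interval(alpha, alpha_se)

# Assumption: Gamma distribution with shape alpha and mean 1, so variance = 1/alpha.
gamma_variance = 1.0 / alpha

# Relative rates of the 8 equally probable categories, copied from the table.
rates = [0.2229, 0.4219, 0.5949, 0.7725, 0.9718, 1.2163, 1.5597, 2.2398]
mean_rate = sum(rates) / len(rates)

print(f"alpha 95% interval: {lower:.4f} to {upper:.4f}")
print(f"variance of continuous Gamma (1/alpha): {gamma_variance:.2f}")
print(f"mean of category rates: {mean_rate:.4f}")
```

The mean of the eight category rates comes out very close to 1, in line with the stated expectation; the reported variance of 0.47 corresponds to the continuous distribution rather than to the eight discrete categories.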
Combination of categories that contributes the most to the likelihood (computation done without clock assumption assuming quartet-puzzling tree): 1 1 1 1 1 2 1 1 2 1 1 1 2 1 1 1 7 1 1 1 1 1 1 2 1 1 1 1 1 1 2 2 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 2 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 2 1 1 1 1 1 1 2 1 1 2 2 1 1 1 2 1 1 1 7 1 1 2 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 2 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1 2 1 1 2 1 1 2 2 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 2 1 1 1 1 2 2 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 2 1 1 2 2 7 1 1 1 2 1 2 2 1 1 1 1 2 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 2 1 1 2 1 1 2 1 7 1 2 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 7 1 1 1 1 2 2 1 2 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2 2 1 2 2 1 1 2 1 2 1 2 7 1 1 1 1 2 2 8 8 8 1 8 4 5 8 8 2 8 8 8 8 2 8 8 2 4 8 8 6 5 7 5 8 5 8 4 5 8 7 6 7 7 5 4 6 5 3 4 7 1 2 7 2 5 1 2 2 8 8 2 8 8 8 4 8 7 6 8 3 7 3 8 8 4 8 5 5 1 8 7 7 7 8 4 2 8 8 8 7 8 5 5 5 4 4 8 5 5 8 7 8 8 8 8 7 7 3 3 4 2 5 6 7 8 1 3 4 5 4 7 5 4 3 2 3 3 2 3 1 4 1 1 1 1 4 3 5 8 5 5 1 6 2 4 1 1 8 4 2 2 3 2 1 1 1 4 3 2 1 1 1 2 2 1 1 1 1 2 4 2 2 4 4 4 1 4 4 3 6 4 3 8 6 6 4 2 5 2 3 2 6 5 2 4 1 2 1 1 1 3 2 2 8 4 2 7 5 6 4 6 4 2 1 6 5 4 6 4 7 6 3 7 2 2 5 4 7 2 2 1 1 3 2 5 3 3 5 2 3 7 4 4 5 4 3 4 6 2 2 8 8 4 8 8 2 8 1 8 3 6 5 5 2 7 2 3 2 6 7 3 3 2 2 4 2 3 4 7 4 3 7 7 6 5 3 8 1 4 8 7 8 8 8 8 6 8 8 5 8 6 7 8 8 8 8 8 8 4 6 5 8 2 8 2 7 1 7 4 4 1 3 2 8 2 5 8 3 8 2 2 6 4 8 7 6 7 5 3 8 4 6 7 7 6 7 5 7 5 8 7 5 5 8 8 6 7 6 8 8 2 8 7 6 7 3 8 2 5 4 3 5 5 2 4 4 6 8 3 3 4 4 2 5 6 1 3 2 3 5 5 7 8 6 6 7 1 7 1 4 8 6 3 2 4 4 3 7 3 7 2 5 7 8 5 8 7 4 4 7 7 8 7 8 6 7 7 8 8 7 3 8 6 8 8 8 8 6 8 8 5 8 8 6 2 5 6 4 4 8 5 4 6 4 4 8 4 3 6 7 2 5 4 7 5 6 3 8 7 3 8 7 8 5 2 3 2 2 1 2 3 4 4 4 3 5 8 4 7 4 8 5 4 3 6 5 3 1 8 3 7 7 5 4 4 1 1 2 4 1 1 4 5 8 2 6 3 3 8 6 6 4 5 2 3 3 3 3 4 7 5 2 5 8 
8 6 6 3 8 7 1 1 1 1 1 1 1 2 1


SEQUENCES IN INPUT ORDER

              5% chi-square test  p-value
 Bacillus4        passed           67.61%  [252]
 Bacillus         failed            0.00%  [241]
 Bacillus2        passed           97.39%  [226]
 Lactococcu       failed            0.51%  [169]
 Lactobacil       failed            0.07%  [188]
 Archaeoglo       passed           62.35%  [194]
 Thermotoga       passed           57.82%  [227]
 Ecoli2           failed            0.97%  [227]
 Haemophilu       passed           67.62%  [237]
 Bordetella       failed            2.12%  [211]
 Mycobacter       failed            0.00%  [172]
 Mycobacte2       failed            0.00%  [172]
 Schizosacc       passed           35.74%  [148]
 Saccharomy       failed            4.74%  [100]
 Celegans         passed           75.06%  [189]
 Deinococcu       failed            1.02%  [214]
 Chlamydoph       failed            0.04%  [267]
 Bacillus3        passed           28.14%  [359]
 Saccharom2       failed            0.00%  [222]
 Listeria         failed            0.26%  [365]
 Ecoli            failed            4.19%  [408]
 Pyrococcus       failed            0.49%  [368]

The chi-square test compares the amino acid composition of each sequence
to the frequency distribution assumed in the maximum likelihood model.

The number in square brackets indicates how often each sequence is
involved in one of the 1289 completely unresolved quartets of the
quartet puzzling tree search.


IDENTICAL SEQUENCES

The sequences in each of the following groups are all identical. To speed
up computation please remove all but one of each group from the data set.

 All sequences are unique.


MAXIMUM LIKELIHOOD DISTANCES

Maximum likelihood distances are computed using the selected model of
substitution and rate heterogeneity.

  22
Bacillus4   0.00000  3.96523  4.24399  4.03481  4.73263  3.48951  3.29380  4.48460  4.28685
            4.07264  4.58700  4.94711  6.11667  6.05218  4.95084  4.85649  5.14940  2.80053
            5.62937  3.26967  3.92624  4.98914
Bacillus    3.96523  0.00000  0.75279  1.44101  1.56855  3.50051  3.57881  3.97644  3.91049
            4.98791  4.96229  4.93314  5.44465  6.06242  4.79540  3.99153  6.05105  5.23505
            5.58967  3.83575  4.42440  4.72146
Bacillus2   4.24399  0.75279  0.00000  1.39834  1.69437  3.84670  3.77254  3.95552  3.98232
            5.29393  4.73465  4.66157  4.98106  6.22613  5.25441  5.12427  5.61193  4.96127
            6.02283  4.35114  4.41493  5.57676
Lactococcu  4.03481  1.44101  1.39834  0.00000  1.19359  3.41775  3.39827  3.91865  4.29903
            4.86322  5.45979  5.63595  7.05172  7.31396  5.33835  4.57725  6.31218  5.15762
            6.73578  3.95787  4.97127  5.79450
Lactobacil  4.73263  1.56855  1.69437  1.19359  0.00000  3.43941  3.45143  4.23202  4.50731
            5.11291  6.07604  6.15371  6.38560  6.98403  5.73038  5.76105  6.64413  5.30582
            6.01536  4.00757  4.74291  5.44625
Archaeoglo  3.48951  3.50051  3.84670  3.41775  3.43941  0.00000  0.93021  4.06017  3.69097
            4.48167  3.92585  3.89301  4.86535  4.54979  4.31656  4.35658  5.25493  4.75461
            4.81530  3.38942  4.14664  5.31303
Thermotoga  3.29380  3.57881  3.77254  3.39827  3.45143  0.93021  0.00000  3.73107  4.09851
            3.90019  4.35819  4.50848  4.23834  4.10973  3.87797  4.12421  5.12903  4.62746
            4.87378  2.81107  4.02035  5.22026
Ecoli2      4.48460  3.97644  3.95552  3.91865  4.23202  4.06017  3.73107  0.00000  0.59929
            0.78261  3.74813  3.93543  4.20176  4.71690  4.55219  3.25250  5.11027  4.56233
            4.53734  3.26957  3.54212  4.15203
Haemophilu  4.28685  3.91049  3.98232  4.29903  4.50731  3.69097  4.09851  0.59929  0.00000
            0.93588  4.32347  4.24767  4.07965  4.73424  4.97243  3.77463  5.50817  4.33283
            4.83485  3.11759  3.75660  3.90609
Bordetella  4.07264  4.98791  5.29393  4.86322  5.11291  4.48167  3.90019  0.78261  0.93588
            0.00000  4.12234  4.16030  4.89163  5.03027  4.77266  3.37906  5.25252  4.79616
            4.56294  3.95563  3.59873  3.90923
Mycobacter  4.58700  4.96229  4.73465  5.45979  6.07604  3.92585  4.35819  3.74813  4.32347
            4.12234  0.00000  0.15425  2.58293  2.71017  2.60288  2.38937  3.27621  5.93278
            3.91642  4.63282  4.88311  6.32729
Mycobacte2  4.94711  4.93314  4.66157  5.63595  6.15371  3.89301  4.50848  3.93543  4.24767
            4.16030  0.15425  0.00000  2.65044  2.80664  2.53295  2.65326  3.23996  5.89442
            3.78841  4.82746  4.91334  5.83594
Schizosacc  6.11667  5.44465  4.98106  7.05172  6.38560  4.86535  4.23834  4.20176  4.07965
            4.89163  2.58293  2.65044  0.00000  0.82212  1.08925  2.44206  2.96487  6.00698
            2.36001  4.67732  5.08017  6.94463
Saccharomy  6.05218  6.06242  6.22613  7.31396  6.98403  4.54979  4.10973  4.71690  4.73424
            5.03027  2.71017  2.80664  0.82212  0.00000  1.22213  3.24469  3.45151  5.56073
            2.59949  4.63752  6.19739  7.26597
Celegans    4.95084  4.79540  5.25441  5.33835  5.73038  4.31656  3.87797  4.55219  4.97243
            4.77266  2.60288  2.53295  1.08925  1.22213  0.00000  2.71306  3.22623  4.61072
            2.60857  3.88653  5.24713  5.62339
Deinococcu  4.85649  3.99153  5.12427  4.57725  5.76105  4.35658  4.12421  3.25250  3.77463
            3.37906  2.38937  2.65326  2.44206  3.24469  2.71306  0.00000  3.66026  4.10606
            4.17474  4.18069  3.85099  4.86462
Chlamydoph  5.14940  6.05105  5.61193  6.31218  6.64413  5.25493  5.12903  5.11027  5.50817
            5.25252  3.27621  3.23996  2.96487  3.45151  3.22623  3.66026  0.00000  6.25298
            3.60922  5.13422  5.55920  6.00945
Bacillus3   2.80053  5.23505  4.96127  5.15762  5.30582  4.75461  4.62746  4.56233  4.33283
            4.79616  5.93278  5.89442  6.00698  5.56073  4.61072  4.10606  6.25298  0.00000
            5.55246  4.15241  4.31026  5.59350
Saccharom2  5.62937  5.58967  6.02283  6.73578  6.01536  4.81530  4.87378  4.53734  4.83485
            4.56294  3.91642  3.78841  2.36001  2.59949  2.60857  4.17474  3.60922  5.55246
            0.00000  3.97211  4.78061  5.28097
Listeria    3.26967  3.83575  4.35114  3.95787  4.00757  3.38942  2.81107  3.26957  3.11759
            3.95563  4.63282  4.82746  4.67732  4.63752  3.88653  4.18069  5.13422  4.15241
            3.97211  0.00000  3.27426  4.34631
Ecoli       3.92624  4.42440  4.41493  4.97127  4.74291  4.14664  4.02035  3.54212  3.75660
            3.59873  4.88311  4.91334  5.08017  6.19739  5.24713  3.85099  5.55920  4.31026
            4.78061  3.27426  0.00000  4.38124
Pyrococcus  4.98914  4.72146  5.57676  5.79450  5.44625  5.31303  5.22026  4.15203  3.90609
            3.90923  6.32729  5.83594  6.94463  7.26597  5.62339  4.86462  6.00945  5.59350
            5.28097  4.34631  4.38124  0.00000

Average distance (over all possible pairs of sequences): 4.32585


TREE SEARCH

Quartet puzzling is used to choose from the possible tree topologies
and to simultaneously infer support values for internal branches.

Number of puzzling steps: 1000
Analysed quartets: 7315
Unresolved quartets: 1289 (= 17.6%)

Quartet trees are based on approximate maximum likelihood values
using the selected model of substitution and rate heterogeneity.


QUARTET PUZZLING TREE

Support for the internal branches of the unrooted quartet puzzling
tree topology is shown in percent.

This quartet puzzling tree is not completely resolved!


                        :---Mycobacter
        :-------------79:
        :               :---Mycobacte2
        :
        :               :---Schizosacc
        :           :-95:
        :       :-90:   :---Saccharomy
    :-93:       :   :
    :   :   :-59:   :-------Celegans
    :   :   :   :
    :   :-72:   :-----------Saccharom2
    :   :   :
    :   :   :---------------Chlamydoph
    :   :
    :   :-------------------Deinococcu
    :
    :                   :---Archaeoglo
    :           :-----96:
    :           :       :---Thermotoga
    :           :
    :---------66:       :---Bacillus
:-90:           :   :-94:
:   :           :   :   :---Bacillus2
:   :           :-83:
:   :               :   :---Lactococcu
:   :               :-94:
:   :                   :---Lactobacil
:   :
:   :                   :---Listeria
:   :-----------------63:
:   :                   :---Ecoli
:   :
:   :                   :---Ecoli2
:   :               :-89:
:   :           :-64:   :---Haemophilu
:   :           :   :
:   :---------59:   :-------Bordetella
:               :
:               :-----------Pyrococcus
:
:---------------------------Bacillus3
:
:---------------------------Bacillus4


Quartet puzzling tree (in CLUSTAL W notation):

(Bacillus4,(((Mycobacter,Mycobacte2)79,((((Schizosacc,Saccharomy)95,
Celegans)90,Saccharom2)59,Chlamydoph)72,Deinococcu)93,(
(Archaeoglo,Thermotoga)96,((Bacillus,Bacillus2)94,(Lactococcu,
Lactobacil)94)83)66,(Listeria,Ecoli)63,(((Ecoli2,Haemophilu)89,
Bordetella)64,Pyrococcus)59)90,Bacillus3);


BIPARTITIONS

The following bipartitions occurred at least once in all intermediate
trees that have been generated in the 1000 puzzling steps:

Bipartitions included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 *****..*** ********** **  :  956
 ********** **..****** **  :  948
 *..******* ********** **  :  940
 ***..***** ********** **  :  937
 ********** .......*.* **  :  926
 ********** **...***** **  :  900
 *......... .......*.. ..  :  899
 *******..* ********** **  :  891
 *....***** ********** **  :  828
 ********** ..******** **  :  790
 ********** **...*.*.* **  :  718
 *......*** ********** **  :  656
 *******... ********** **  :  644
 ********** *********. .*  :  628
 ********** **...***.* **  :  591
 *******... ********** *.  :  588

Bipartitions not included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 ********** .....*.*.* **  :  498
 ********** ..***.**** **  :  452
 *......... .......*.* *.  :  327
 ********** ******.*.* **  :  320
 *......*** .......*.* **  :  236
 *******... .......*.* *.  :  221
 *******... *********. ..  :  218
 *********. ********** *.  :  200
 ********** .****.**** **  :  197
 *******... .......*.. ..  :  189
 *......... .......*.* ..  :  158
 ***....*** ********** **  :  151
 *******... ********** ..  :  123
 ********** *********. ..  :  111
 ********** ..****.*** **  :  109
 *....**... .......*.. ..  :  107
 ********** ********** ..  :  106
 *......... .......*.* **  :   85
 ********** ..***..*** **  :   81
 *******... .......*.* ..  :   78

(257 other less frequent bipartitions not shown)


MAXIMUM LIKELIHOOD BRANCH LENGTHS ON QUARTET PUZZLING TREE (NO CLOCK)

Branch lengths are computed using the selected model of
substitution and rate heterogeneity.


                    :-11 Mycobacter
             :-----23
             :      :-12 Mycobacte2
      :-----28
      :      :             :--13 Schizosacc
      :      :         :--24
      :      :         :   :--14 Saccharomy
      :      :     :--25
      :      :     :   :--15 Celegans
      :      :  :-26
      :      :  :  :------19 Saccharom2
      :      :-27
      :      :  :--------17 Chlamydoph
      :      :
      :      :------16 Deinococcu
      :
      :         :--6 Archaeoglo
      :   :----29
      :   :     :--7 Thermotoga
      :--33
      :   :           :--2 Bacillus
      :   :       :--30
      :   :       :   :--3 Bacillus2
      :   :------32
      :           :  :--4 Lactococcu
      :           :-31
      :              :---5 Lactobacil
:----38
:     :  :------20 Listeria
:     :-34
:     :  :--------21 Ecoli
:     :
:     :             :-8 Ecoli2
:     :          :-35
:     :          :  :--9 Haemophilu
:     :    :----36
:     :    :     :--10 Bordetella
:     :---37
:          :----------22 Pyrococcus
:
:-------18 Bacillus3
:
:----1 Bacillus4


            branch   length     S.E.    branch   length     S.E.
Bacillus4        1  0.98087  0.15579        23  1.36609  0.15018
Bacillus         2  0.34124  0.04982        24  0.29867  0.05798
Bacillus2        3  0.39832  0.05129        25  0.37923  0.08719
Lactococcu       4  0.46943  0.06092        26  0.20000  0.08867
Lactobacil       5  0.64709  0.06962        27  0.17584  0.09546
Archaeoglo       6  0.50792  0.07069        28  1.27972  0.18189
Thermotoga       7  0.43068  0.06834        29  0.86248  0.14437
Ecoli2           8  0.24424  0.04068        30  0.43627  0.06975
Haemophilu       9  0.37301  0.04755        31  0.16865  0.05933
Bordetella      10  0.52041  0.06135        32  1.56538  0.19461
Mycobacter      11  0.04612  0.01776        33  0.44569  0.12881
Mycobacte2      12  0.10632  0.01961        34  0.18195  0.12061
Schizosacc      13  0.39021  0.05431        35  0.07369  0.04797
Saccharomy      14  0.42139  0.05381        36  0.96324  0.17552
Celegans        15  0.44760  0.06158        37  0.78862  0.17751
Deinococcu      16  1.59992  0.21367        38  0.98466  0.17349
Chlamydoph      17  1.98911  0.19018
Bacillus3       18  1.73984  0.20019
Saccharom2      19  1.55793  0.15187
Listeria        20  1.38198  0.16992
Ecoli           21  1.93315  0.21073      16 iterations until convergence
Pyrococcus      22  2.62860  0.29164      log L: -21953.86


Quartet puzzling tree with maximum likelihood branch lengths
(in CLUSTAL W notation):

(Bacillus4:0.98087,(((Mycobacter:0.04612,Mycobacte2:0.10632)79:1.36609,
((((Schizosacc:0.39021,Saccharomy:0.42139)95:0.29867,Celegans:0.44760)
90:0.37923,Saccharom2:1.55793)59:0.20000,Chlamydoph:1.98911)72:0.17584,
Deinococcu:1.59992)93:1.27972,((Archaeoglo:0.50792,Thermotoga:0.43068)
96:0.86248,((Bacillus:0.34124,Bacillus2:0.39832)94:0.43627,(Lactococcu:0.46943,
Lactobacil:0.64709)94:0.16865)83:1.56538)66:0.44569,(Listeria:1.38198,
Ecoli:1.93315)63:0.18195,(((Ecoli2:0.24424,Haemophilu:0.37301)89:0.07369,
Bordetella:0.52041)64:0.96324,Pyrococcus:2.62860)59:0.78862)90:0.98466,
Bacillus3:1.73984);


TIME STAMP

Date and time: Fri Jun 18 21:36:17 1999
Runtime: 6892 seconds (= 114.9 minutes = 1.9 hours)
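
Several summary figures in this report can be recomputed from the raw counts it lists. The sketch below is an illustration only and is not produced by PUZZLE. The helper `support_percent` is a hypothetical name, and its round-half-up rule is an assumption that happens to reproduce the support values printed on the tree. All input numbers are copied from the report.

```python
from math import comb

n_sequences = 22
puzzling_steps = 1000
unresolved = 1289

# Every set of 4 sequences out of 22 forms one quartet.
n_quartets = comb(n_sequences, 4)
unresolved_pct = 100 * unresolved / n_quartets


def support_percent(times_seen, steps=puzzling_steps):
    """Percentage of puzzling steps containing a bipartition.

    Assumption: rounded half up to a whole percent.
    """
    return (100 * times_seen + steps // 2) // steps


# "Times seen" for the bipartitions included in the quartet puzzling tree.
counts = [956, 948, 940, 937, 926, 900, 899, 891,
          828, 790, 718, 656, 644, 628, 591, 588]
supports = [support_percent(c) for c in counts]

# Support labels read off the quartet puzzling tree in CLUSTAL W notation.
tree_labels = [79, 95, 90, 59, 72, 93, 96, 94, 94, 83, 66, 63, 89, 64, 59, 90]

runtime_s = 6892
print(n_quartets, f"{unresolved_pct:.1f}%")
print(sorted(supports) == sorted(tree_labels))
print(f"{runtime_s / 60:.1f} minutes = {runtime_s / 3600:.1f} hours")
```

Under these assumptions the quartet count, the unresolved share, the sixteen support values, and the runtime conversion all agree with the figures printed in the report.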