PUZZLE 4.0.2

Type of analysis: tree reconstruction
Parameter estimation: approximate (faster)
Parameter estimation uses: neighbor-joining tree (for substitution process and rate variation)

Standard errors (S.E.) are obtained by the curvature method.
The lower and upper bounds of an approximate 95% confidence interval
for parameter or branch length x are x-1.96*S.E. and x+1.96*S.E.


SEQUENCE ALIGNMENT

Input data: 12 sequences with 682 amino acid sites
Number of constant sites: 12 (= 1.8% of all sites)


SUBSTITUTION PROCESS

Model of substitution: JTT (Jones et al. 1992)
Amino acid frequencies (estimated from data set):

 pi(A) =   8.7%
 pi(R) =   5.9%
 pi(N) =   3.6%
 pi(D) =   6.0%
 pi(C) =   1.6%
 pi(Q) =   4.1%
 pi(E) =   5.1%
 pi(G) =   8.6%
 pi(H) =   2.9%
 pi(I) =   5.2%
 pi(L) =   8.5%
 pi(K) =   3.6%
 pi(M) =   1.8%
 pi(F) =   3.7%
 pi(P) =   5.8%
 pi(S) =   8.6%
 pi(T) =   5.5%
 pi(W) =   1.5%
 pi(Y) =   3.4%
 pi(V) =   6.1%


RATE HETEROGENEITY

Model of rate heterogeneity: Gamma distributed rates
Gamma distribution parameter alpha (estimated from data set): 1.48 (S.E. 0.04)
Number of Gamma rate categories: 8

Rates and their respective probabilities used in the likelihood function:

 Category  Relative rate  Probability
  1         0.1422         0.1250
  2         0.3279         0.1250
  3         0.5079         0.1250
  4         0.7039         0.1250
  5         0.9334         0.1250
  6         1.2248         0.1250
  7         1.6471         0.1250
  8         2.5129         0.1250

Categories 1-8 approximate a continuous Gamma-distribution with expectation 1
and variance 0.68.
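The figures above can be cross-checked by hand. The following Python sketch is not part of PUZZLE; the variable and function names are illustrative assumptions. It applies the stated 95% interval rule to alpha, averages the eight category rates, and evaluates 1/alpha, which is the variance of a continuous Gamma distribution with expectation 1 and shape alpha.

```python
# Values copied from the RATE HETEROGENEITY section of this report.
alpha, alpha_se = 1.48, 0.04
rates = [0.1422, 0.3279, 0.5079, 0.7039, 0.9334, 1.2248, 1.6471, 2.5129]
prob = 1.0 / len(rates)  # every category has probability 0.1250


def confidence_interval(x, se, z=1.96):
    """Approximate 95% interval: x - z*S.E. to x + z*S.E."""
    return x - z * se, x + z * se


alpha_low, alpha_high = confidence_interval(alpha, alpha_se)

# Mean relative rate over the categories; expected to be close to 1.
mean_rate = sum(r * prob for r in rates)

# Variance of a continuous Gamma distribution with mean 1 and shape alpha.
gamma_variance = 1.0 / alpha

print(f"alpha 95% CI: {alpha_low:.4f} to {alpha_high:.4f}")
print(f"mean category rate: {mean_rate:.4f}")
print(f"continuous Gamma variance: {gamma_variance:.2f}")
```

The variance quoted in the report matches the continuous distribution (1/alpha); the eight discrete categories only approximate it.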
Combination of categories that contributes the most to the likelihood
(computation done without clock assumption assuming quartet-puzzling tree):

8 3 3 8 5 8 8 8 7 8 8 8 8 8 8 8 3 6 8 8 4 8 1 5 7 8 8 8 6 8 8 7 8 1 3 8 8 8 8 8 8 8 1 3 8 8 8 8 8 8 8 8 8 8 8 8 8 3 3 5 8 8 3 7 1 8 8 8 8 1 8 8 8 1 8 8 6 1 1 8 8 7 5 8 6 6 8 7 7 3 7 3 8 8 1 8 8 7 8 8 8 8 6 1 7 8 6 5 7 4 3 4 2 7 1 4 4 2 7 5 6 6 2 1 2 3 1 3 1 4 2 1 7 8 8 8 8 5 3 1 3 3 5 4 4 1 1 1 2 2 2 1 2 3 6 4 1 5 8 7 3 3 7 5 2 1 1 1 6 8 5 6 1 3 1 4 3 4 1 1 3 2 4 1 3 2 2 2 1 1 4 1 2 2 5 1 1 2 1 1 5 2 6 3 5 7 3 8 1 3 3 7 3 1 3 3 1 1 7 5 2 3 4 1 2 1 6 8 2 8 2 8 2 8 8 8 2 1 8 8 2 8 8 6 2 2 7 8 5 2 3 3 2 1 2 6 5 1 4 1 2 3 8 8 1 2 1 2 1 1 1 2 3 2 2 5 3 7 8 7 6 2 4 7 2 6 4 5 3 6 4 2 2 2 4 2 3 1 1 7 3 1 2 6 4 6 2 2 1 2 2 2 5 4 1 1 3 5 6 6 1 4 6 1 7 6 6 6 5 1 7 7 1 2 7 6 1 1 3 5 1 3 3 1 1 1 2 4 8 3 6 4 7 2 1 4 4 7 8 1 5 8 8 5 3 4 8 8 8 3 8 5 6 6 8 8 4 6 6 8 7 7 8 6 8 8 4 1 1 4 1 8 3 7 6 8 2 8 8 1 8 8 8 8 8 3 4 6 7 8 3 7 2 6 3 1 8 8 8 8 8 8 7 5 4 6 4 1 5 8 8 8 7 6 1 8 8 1 8 8 8 8 8 1 8 8 4 1 7 8 8 8 8 8 8 7 5 1 1 8 7 3 5 3 3 2 3 1 1 1 5 2 1 3 5 1 1 4 8 3 2 1 1 1 1 2 2 6 3 2 5 2 2 2 4 3 1 6 4 5 4 3 5 8 6 8 8 8 8 1 8 8 8 1 8 1 8 8 8 8 8 6 6 8 8 8 1 8 8 8 8 8 1 8 1 8 8 8 8 8 8 6 8 1 8 8 8 8 6 8 1 8 8 1 7 8 7 6 8 1 8 8 8 7 6 6 8 5 3 1 4 8 4 5 1 6 1 5 4 2 4 3 3 6 1 1 7 1 5 3 7 8 4 3 7 2 6 3 7 8 2 6 6 4 8 4 7 8 1 8 7 8 6 2 8 7 5 8 1 8 6 1 4 1 1 4 1 1 6 5 8 7 8 5 6 4 8 8 4 5 7 6 3 8 8 8 8 8 8 5 7 7 8 8 8 8 8 8 8 8 4 8 8 5 8 5 4 8 5 8 1 6 8 1 1 8 1 5 4 1 7 1


SEQUENCES IN INPUT ORDER

              5% chi-square test   p-value
 Emericella   passed               86.75%   [3]
 Schizosacc   passed               43.91%   [14]
 Ascobolus    passed               14.62%   [9]
 Schizosac2   passed               25.36%   [10]
 Saccharomy   passed               18.48%   [5]
 Mycobacter   failed                0.01%   [2]
 Mycobacte2   failed                1.67%   [2]
 Corynebact   passed               54.96%   [5]
 Haemophilu   failed                0.43%   [3]
 Leptospira   passed               46.78%   [6]
 Acremonium   passed               57.82%   [7]
 Deinococcu   failed                0.00%   [2]

The chi-square test compares the amino acid composition of each sequence to the frequency
distribution assumed in the maximum likelihood model.

The number in square brackets indicates how often each sequence is involved in one of the
17 completely unresolved quartets of the quartet puzzling tree search.


IDENTICAL SEQUENCES

The sequences in each of the following groups are all identical. To speed up
computation please remove all but one of each group from the data set.

 All sequences are unique.


MAXIMUM LIKELIHOOD DISTANCES

Maximum likelihood distances are computed using the selected model of
substitution and rate heterogeneity.

  12
Emericella  0.00000 0.89136 2.23526 2.42569 2.29186 2.12958 2.22050 2.05328 1.87777 1.73618 2.64961 7.08600
Schizosacc  0.89136 0.00000 2.31632 2.44461 2.48691 2.25984 2.33683 2.09036 2.05932 2.20187 2.74069 6.69395
Ascobolus   2.23526 2.31632 0.00000 0.81537 0.89059 1.97950 1.96036 1.80975 1.89040 1.93072 1.19308 8.99838
Schizosac2  2.42569 2.44461 0.81537 0.00000 0.99382 2.24983 2.12429 1.90703 2.01816 2.00990 1.21063 8.99838
Saccharomy  2.29186 2.48691 0.89059 0.99382 0.00000 2.03466 2.08891 1.84931 1.92029 1.88087 1.34454 8.99849
Mycobacter  2.12958 2.25984 1.97950 2.24983 2.03466 0.00000 0.16796 1.06872 2.16401 2.11198 2.01691 6.65312
Mycobacte2  2.22050 2.33683 1.96036 2.12429 2.08891 0.16796 0.00000 1.02651 2.07532 2.06717 1.94742 7.39097
Corynebact  2.05328 2.09036 1.80975 1.90703 1.84931 1.06872 1.02651 0.00000 2.05830 1.81727 1.81361 7.55402
Haemophilu  1.87777 2.05932 1.89040 2.01816 1.92029 2.16401 2.07532 2.05830 0.00000 1.34980 2.09703 7.80267
Leptospira  1.73618 2.20187 1.93072 2.00990 1.88087 2.11198 2.06717 1.81727 1.34980 0.00000 2.22831 8.99896
Acremonium  2.64961 2.74069 1.19308 1.21063 1.34454 2.01691 1.94742 1.81361 2.09703 2.22831 0.00000 8.99832
Deinococcu  7.08600 6.69395 8.99838 8.99838 8.99849 6.65312 7.39097 7.55402 7.80267 8.99896 8.99832 0.00000

Average distance (over all possible pairs of sequences):  2.90505


TREE SEARCH

Quartet puzzling is used to choose from the possible tree topologies and to simultaneously
infer support values for internal branches.

Number of puzzling steps: 1000
Analysed quartets: 495
Unresolved quartets: 17 (= 3.4%)

Quartet trees are based on approximate maximum likelihood values
using the selected model of substitution and rate heterogeneity.


QUARTET PUZZLING TREE

Support for the internal branches of the unrooted quartet puzzling
tree topology is shown in percent.

This quartet puzzling tree is not completely resolved!

                    :---Ascobolus
                :-87:
            :-92:   :---Schizosac2
            :   :
        :100:   :-------Saccharomy
        :   :
        :   :-----------Acremonium
        :
        :           :---Haemophilu
    :-68:---------93:
    :   :           :---Leptospira
    :   :
    :   :           :---Mycobacter
:-64:   :       :100:
:   :   :-----78:   :---Mycobacte2
:   :           :
:   :           :-------Corynebact
:   :
:   :-------------------Deinococcu
:
:-----------------------Schizosacc
:
:-----------------------Emericella

Quartet puzzling tree (in CLUSTAL W notation):

(Emericella,(((((Ascobolus,Schizosac2)87,Saccharomy)92,Acremonium)100,
(Haemophilu,Leptospira)93,((Mycobacter,Mycobacte2)100,Corynebact)78)68,
Deinococcu)64,Schizosacc);


BIPARTITIONS

The following bipartitions occurred at least once in all intermediate
trees that have been generated in the 1000 puzzling steps:

Bipartitions included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 **...***** .*  :  999
 *****..*** **  :  998
 ********.. **  :  933
 **...***** **  :  924
 **..****** **  :  871
 *****...** **  :  783
 **........ .*  :  675
 **........ ..  :  639

Bipartitions not included in the quartet puzzling tree:
(bipartition with sequences in input order : number of times seen)

 **......** .*  :  428
 **...***.. .*  :  389
 *.******** *.  :  312
 *****...** *.  :  197
 **......** ..  :  196
 **...**... .*  :  115
 **.*.***** **  :  112
 *****..*** *.  :  102
 *.......** ..  :   55
 **..****** .*  :   52
 **...**.** .*  :   36
 **...****. .*  :   34
 *****..... **  :   24
 **.......* .*  :   20
 *.***...** *.  :   16
 ***.****** .*  :   13
 *.***..*** *.  :   11
 ***..***** **  :   10
 ****.***** .*  :   10
 **.......* ..  :    7

(15 other less frequent bipartitions not shown)


MAXIMUM LIKELIHOOD BRANCH LENGTHS ON QUARTET PUZZLING TREE (NO CLOCK)

Branch lengths are computed using the selected model of
substitution and rate heterogeneity.

                    :--3 Ascobolus
                 :-13
                 :  :--4 Schizosac2
              :-14
              :  :--5 Saccharomy
         :---15
         :    :---11 Acremonium
   :----19
   :     :  :---9 Haemophilu
   :     :-16
   :     :  :---10 Leptospira
   :     :
   :     :        :-6 Mycobacter
   :     :   :---17
   :     :   :    :-7 Mycobacte2
   :     :--18
   :         :--8 Corynebact
:-20
:  :---------------------------12 Deinococcu
:
:---2 Schizosacc
:
:-1 Emericella

         branch  length     S.E.   branch  length     S.E.
Emericella    1  0.25773  0.06109      13  0.03319  0.03902
Schizosacc    2  0.68003  0.07132      14  0.24675  0.06726
Ascobolus     3  0.37543  0.05042      15  0.74046  0.11001
Schizosac2    4  0.52434  0.05771      16  0.33693  0.08815
Saccharomy    5  0.60680  0.06387      17  0.70396  0.09408
Mycobacter    6  0.13951  0.02675      18  0.62105  0.10519
Mycobacte2    7  0.05884  0.02351      19  1.11265  0.13535
Corynebact    8  0.52062  0.08360      20  0.02121  0.12852
Haemophilu    9  0.94098  0.11870
Leptospira   10  0.77780  0.10665
Acremonium   11  0.77443  0.09168  51 iterations until convergence
Deinococcu   12  8.99867  1.47695  log L: -10296.61

WARNING --- at least one branch length is close to an internal boundary!

Quartet puzzling tree with maximum likelihood branch lengths
(in CLUSTAL W notation):

(Emericella:0.25773,(((((Ascobolus:0.37543,Schizosac2:0.52434)87:0.03319,
Saccharomy:0.60680)92:0.24675,Acremonium:0.77443)100:0.74046,(Haemophilu:0.94098,
Leptospira:0.77780)93:0.33693,((Mycobacter:0.13951,Mycobacte2:0.05884)
100:0.70396,Corynebact:0.52062)78:0.62105)68:1.11265,Deinococcu:8.99867)
64:0.02121,Schizosacc:0.68003);


TIME STAMP

Date and time: Fri Jun 18 15:27:51 1999
Runtime: 1262 seconds (= 21.0 minutes = 0.4 hours)
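
The 95% interval rule stated at the top of this report (x-1.96*S.E. to x+1.96*S.E.) can be applied to the branch-length table above. The Python sketch below is illustrative only and not part of PUZZLE; the function names and the choice of branches are assumptions made for the example. It flags branches whose approximate interval includes zero, one informal way to read the table.

```python
# (label, branch length, S.E.) copied from the branch-length table of this
# report; only a subset of branches is listed to keep the example short.
branches = [
    ("Mycobacte2 (7)", 0.05884, 0.02351),
    ("Deinococcu (12)", 8.99867, 1.47695),
    ("internal 13", 0.03319, 0.03902),
    ("internal 20", 0.02121, 0.12852),
]


def confidence_interval(x, se, z=1.96):
    """Approximate 95% interval: x - z*S.E. to x + z*S.E."""
    return x - z * se, x + z * se


def includes_zero(x, se):
    """True if the lower bound of the approximate interval is <= 0."""
    low, _ = confidence_interval(x, se)
    return low <= 0.0


for name, length, se in branches:
    low, high = confidence_interval(length, se)
    note = "  (interval includes 0)" if includes_zero(length, se) else ""
    print(f"{name:16s} {low:9.5f} to {high:9.5f}{note}")
```

For the values listed, internal branches 13 and 20 have intervals that include zero, while the two terminal branches do not.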