Part I. The graphic overview shows that no hit matches the whole length of the query sequence. Instead, one protein (the best hit, top row) matches the middle of the query, while the rest of the top hits match only the beginning and the end of the query sequence. (The gray shaded area in the middle corresponds to "no match at all".) The latter matches (the ones that align to the beginning and the end) are from PROBABLE TRANSCRIPTION TERMINATION FACTOR RHO. The middle portion, matched by the top hit, corresponds to part of the clpP subunit of Clp protease in Chlamydomonas eugametos:

Query: 184 KCLTSDFHTVLTTRGFIPIADVTLDDKVAVLDNNTGDMSYQNPQKVHKYDYDGPMYDVKT 243
           +CLTSD HTVLTTRG+IPIADVTLDDKVAVLDNNTG+MSYQNPQKVHKYDY+GPMY+VKT
Sbjct: 447 ECLTSD-HTVLTTRGWIPIADVTLDDKVAVLDNNTGEMSYQNPQKVHKYDYEGPMYEVKT 505

Query: 244 AGVELFVTPNHRMYVXXXXXXXXQNGYNLVEASSIFGKKVRYKNDAIWNKTDYQFILPDT 303
           AGV+LFVTPNHRMYV           YNLVEASSIFGKKVRYKNDAIWNKTDYQFILP+T
Sbjct: 506 AGVDLFVTPNHRMYV-NTTNNTTNQNYNLVEASSIFGKKVRYKNDAIWNKTDYQFILPET 564

Query: 304 ATLTGHTNKISSTPAIQPDMNAFL--FGLWIANG---KIADKTAENNQQKQRWKVILTQV 358
           ATLTGHTNKISSTPAIQP+MNA+L  FGLWIANG   KIA+KTAENNQQKQR+KVILTQV
Sbjct: 565 ATLTGHTNKISSTPAIQPEMNAWLTFFGLWIANGHTTKIAEKTAENNQQKQRYKVILTQV 624

Query: 359 KEDVCEIIEQTLNKLGFNFIRSGKDYTIENKQLFSYLNPFENGALNKYLPDFA-DLTT-- 415
           KEDVC+IIEQTLNKLGFNFIRSGKDYTIENKQL+SYLNPF+NGALNKYLPD+  +L++
Sbjct: 625 KEDVCDIIEQTLNKLGFNFIRSGKDYTIENKQLWSYLNPFDNGALNKYLPDWVWELSSQQ 684

Query: 416 CKIL-----------TKASNKAYYFSTSERFANDVSRLALSASHAGTTSTGGLDAAPSNL 464
           CKIL           TK  +  +YFSTSERFANDVSRLAL   HAGTTST  L+AAPSNL
Sbjct: 685 CKILLNSLCLGNCLFTKNDDTLHYFSTSERFANDVSRLAL---HAGTTSTIQLEAAPSNL 741

Query: 465 YDTIIGLPTMERVTEYRVIVNQSSFFSYSTDKTTVLNLS--------AQSALSLEQNSQK 516
           YDTIIGLP     T +RVI+NQSSF+SYSTDK++ LNLS        AQSAL+LEQNSQK
Sbjct: 742 YDTIIGLPVEVNTTLWRVIINQSSFYSYSTDKSSALNLSNNVACYVNAQSALTLEQNSQK 801

Query: 517 INKNTLVLTKNNVKSQT-HSQRAERWDTALLTQKELDNSLNHDILINKNPGTSQLECVVN 575
           INKNTLVLTKNNVKSQT HSQRAER DTALLTQKELDNSLNH+ILINKNPGTSQLECVVN
Sbjct: 802 INKNTLVLTKNNVKSQTMHSQRAERVDTALLTQKELDNSLNHEILINKNPGTSQLECVVN 861

Query: 576 PEVNNTSTNDRFVYWKGPVYSLTGPNNVFYVQR-GKAVWTHNT 617
           PEVNNTSTNDRFVY+KGPVY LTGPNNVFYVQR GKAVWT N+
Sbjct: 862 PEVNNTSTNDRFVYYKGPVYCLTGPNNVFYVQRNGKAVWTGNS 904
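As a rough check on how strong the middle match is, the identical columns of an aligned block can be counted directly. The following is a minimal sketch added for illustration and is not part of the BLAST output: the function name percent_identity is hypothetical, and the two strings are the first aligned block of the top hit (Query 184-243, Sbjct 447-505).

```python
def percent_identity(query: str, sbjct: str) -> tuple[int, int, float]:
    """Return (identical columns, aligned columns, percent identity)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    # A column counts as identical only if both residues match and neither is a gap.
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identical, len(query), 100.0 * identical / len(query)


# First aligned block of the top hit (Query 184-243 vs. Sbjct 447-505).
QUERY = "KCLTSDFHTVLTTRGFIPIADVTLDDKVAVLDNNTGDMSYQNPQKVHKYDYDGPMYDVKT"
SBJCT = "ECLTSD-HTVLTTRGWIPIADVTLDDKVAVLDNNTGEMSYQNPQKVHKYDYEGPMYEVKT"

if __name__ == "__main__":
    ident, cols, pct = percent_identity(QUERY, SBJCT)
    print(f"{ident}/{cols} identical columns ({pct:.1f}%)")
```

Note that BLAST reports identities over the whole alignment; this sketch only looks at a single block, so the number is indicative rather than a replacement for the BLAST summary line.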
As an alternative, a PSI-BLAST search could be performed to detect the
function of the middle part (different inteins will be returned as the
results of the search).
The resulting phylogenetic reconstruction (clustalx) shows different
clusters for proteins with different functions (A/V-ATPase catalytic
subunits in red, F-ATPase catalytic subunits in blue, ..., rho termination
factors in turquoise). The extein from the unknown ORF groups within the Rho termination factors, and hence is most probably an ortholog of the Rho termination factor.
Part II. Again, the bacteria do not form a single group. Another possible analysis is a bootstrap analysis using protein parsimony as implemented in PHYLIP. Again, most of the eukaryotes were deleted, as were the bacterial sequences that constitute long branches in the above analyses. All gaps in the input file were replaced by "?". The following programs were used in sequence: SEQBOOT, PROTPARS, CONSENSE. Below is a section copied from the outfile from CONSENSE:

Consensus tree program, version 3.6a2.1

Extended majority rule consensus tree

                                                          +------Desulfuroc
                                                   +100.0-|
                                                   |      +------Thermococc
                                            +100.0-|
                                            |      |      +------hPyrococcu
                                            |      +-99.0-|
                                     +-64.5-|             +------aPyrococcu
                                     |      |
                                     |      |      +-------------Aeropyrum
                                     |      +100.0-|
                              +-61.9-|             |      +------sSulfolobu
                              |      |             +100.0-|
                              |      |                    +------aSulfolobu
                              |      |
                              |      |                    +------Methanobac
                              |      +---------------74.2-|
                       +-45.5-|                           +------jMethanoco
                       |      |
                       |      |                           +------Thermoplas
                       |      |      +--------------100.0-|
                       |      |      |                    +------Ferroplasm
                       |      |      |
                       |      |      |                    +------pnStreptoc
                       |      +-60.2-|             +100.0-|
                       |             |             |      +------pyStreptoc
                +-91.3-|             |      +-99.0-|
                |      |             |      |      |      +------hEnterococ
                |      |             +100.0-|      +100.0-|
                |      |                    |             +------fEnterococ
                |      |                    |
                |      |                    +--------------------Clostridiu
                |      |
                |      |                                  +------Halobacter
                |      |                    +-------100.0-|
         +100.0-|      |                    |             +------Haloferax
         |      |      +---------------71.1-|
         |      |                           |             +------MMethanosa
         |      |                           |      +100.0-|
         |      |                           +-43.8-|      +------BMethanosa
         |      |                                  |
  +------|      |                                  +-------------Archaeoglo
  |      |      |
  |      |      |                                         +------Deinococcu
  |      |      +-----------------------------------100.0-|
  |      |                                                +------Thermus
  |      |
  |      +-------------------------------------------------------Trichomona
  |
  +--------------------------------------------------------------Giardia

  remember: this is an unrooted tree!
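The gap replacement mentioned for the PHYLIP input ("All gaps in the input file were replaced by '?'") can be scripted. The following is only a sketch under stated assumptions, not the procedure actually used: it assumes a sequential PHYLIP file with one line per sequence and the name in the first 10 columns, and the function name replace_gaps is hypothetical.

```python
def replace_gaps(phylip_lines, name_width=10, gap="-", missing="?"):
    """Replace gap characters by '?' in the sequence part of each line.

    Assumes sequential PHYLIP input: a header line, then one line per
    sequence whose first `name_width` columns hold the name.
    """
    out = []
    for i, line in enumerate(phylip_lines):
        if i == 0:
            # Header line (number of sequences, number of sites): keep as-is.
            out.append(line)
            continue
        # Split off the name field so a '-' inside a name is left untouched.
        name, seq = line[:name_width], line[name_width:]
        out.append(name + seq.replace(gap, missing))
    return out


if __name__ == "__main__":
    example = [" 2 8", "Thermus   MK--LV-A", "Giardia   MKTSLVQA"]
    print("\n".join(replace_gaps(example)))
```

An interleaved file would need extra handling, because its continuation blocks carry no name field.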
Try #1: First we try the Deinococcus/Thermus group as distinct from pyStreptoc, pnStreptoc, hEnterococ, fEnterococ, Clostridium:

LIKELIHOOD MAPPING ANALYSIS

Number of quartets: 260 (all possible)
Quartet trees are based on approximate maximum likelihood values
using the selected model of substitution and rate heterogeneity.

Sequences are grouped in 4 clusters.

Cluster a: 2 sequences
  Deinococcu Thermus

Cluster b: 13 sequences
  Thermococc Desulfuroc aPyrococcu hPyrococcu jMethanoco Methanobac
  Ferroplasm Thermoplas BMethanosa MMethanosa Haloferax Halobacter
  Archaeoglo

Cluster c: 5 sequences
  pyStreptoc pnStreptoc hEnterococ fEnterococ Clostridiu

Cluster d: 2 sequences
  Trichomona Giardia

Quartets of sequences used in the likelihood mapping analysis are generated
by drawing one sequence from each of the clusters a, b, c, and d.

LIKELIHOOD MAPPING STATISTICS

Occupancies of the three areas 1, 2, 3:

              (a,b)-(c,d)
                   /\
                  /  \
                 /    \
                /   1  \
               / \    / \
              /   \  /   \
             /     \/     \
            /  3   :   2   \
           /       :        \
          /__________________\
     (a,d)-(b,c)        (a,c)-(b,d)

Number of quartets in region 1: 42 (= 16.2%)
Number of quartets in region 2: 8 (= 3.1%)
Number of quartets in region 3: 210 (= 80.8%)

Occupancies of the seven areas 1, 2, 3, 4, 5, 6, 7:

              (a,b)-(c,d)
                   /\
                  /  \
                 /  1 \
                / \  / \
               /   /\   \
              / 6 /  \ 4 \
             /   /  7 \   \
            / \ /______\ / \
           / 3  :   5  :  2 \
          /__________________\
     (a,d)-(b,c)        (a,c)-(b,d)

Number of quartets in region 1: 39 (= 15.0%)    left: 24    right: 15
Number of quartets in region 2: 5 (= 1.9%)      bottom: 4   top: 1
Number of quartets in region 3: 199 (= 76.5%)   bottom: 76  top: 123
Number of quartets in region 4: 4 (= 1.5%)      bottom: 2   top: 2
Number of quartets in region 5: 3 (= 1.2%)      left: 3     right: 0
Number of quartets in region 6: 6 (= 2.3%)      bottom: 5   top: 1
Number of quartets in region 7: 4 (= 1.5%)

Only 3.1% of the quartets support a single bacterial group (region 2 of the three-area analysis), and only 1.9% do so strongly (region 2 of the seven-area analysis). In contrast, the split of the bacterial sequences into two groups is strongly supported by 199 quartets (= 76.5%).
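The quartet count and the occupancy percentages reported for Try #1 follow from simple arithmetic: one sequence is drawn from each of the four clusters, so the number of quartets is the product of the cluster sizes, and each percentage is a region count divided by that total. The sketch below reproduces this bookkeeping; the function names are hypothetical and it does not recompute any likelihoods.

```python
from math import prod


def num_quartets(cluster_sizes):
    """Quartets formed by drawing one sequence from each of four clusters."""
    if len(cluster_sizes) != 4:
        raise ValueError("expected exactly 4 clusters")
    return prod(cluster_sizes)


def occupancies(counts):
    """Map region -> percentage of all quartets, rounded to one decimal."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


if __name__ == "__main__":
    # Cluster sizes a, b, c, d from Try #1.
    print(num_quartets([2, 13, 5, 2]))
    # Region counts of the three-area analysis from Try #1.
    print(occupancies({1: 42, 2: 8, 3: 210}))
```

The same check applies to the seven-area table: the seven region counts must also sum to the total number of quartets.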
Try #2: The clusters are chosen to ask the question: does 1Treponema
group with the Crenarchaeotic sequences?

LIKELIHOOD MAPPING ANALYSIS

Number of quartets: 260 (all possible)
Quartet trees are based on approximate maximum likelihood values
using the selected model of substitution and rate heterogeneity.

Sequences are grouped in 4 clusters.

Cluster a: 1 sequences
  1Treponema

Cluster b: 13 sequences
  Thermococc Desulfuroc aPyrococcu hPyrococcu jMethanoco Methanobac
  Ferroplasm Thermoplas BMethanosa MMethanosa Haloferax Halobacter
  Archaeoglo

Cluster c: 5 sequences
  pyStreptoc pnStreptoc hEnterococ fEnterococ Clostridiu

Cluster d: 4 sequences
  aSulfolobu sSulfolobu Aeropyrum Pyrobaculu

Quartets of sequences used in the likelihood mapping analysis are generated
by drawing one sequence from each of the clusters a, b, c, and d.

LIKELIHOOD MAPPING STATISTICS

Occupancies of the three areas 1, 2, 3:

              (a,b)-(c,d)
                   /\
                  /  \
                 /    \
                /   1  \
               / \    / \
              /   \  /   \
             /     \/     \
            /  3   :   2   \
           /       :        \
          /__________________\
     (a,d)-(b,c)        (a,c)-(b,d)

Number of quartets in region 1: 0 (= 0.0%)
Number of quartets in region 2: 28 (= 10.8%)
Number of quartets in region 3: 232 (= 89.2%)

Occupancies of the seven areas 1, 2, 3, 4, 5, 6, 7:

              (a,b)-(c,d)
                   /\
                  /  \
                 /  1 \
                / \  / \
               /   /\   \
              / 6 /  \ 4 \
             /   /  7 \   \
            / \ /______\ / \
           / 3  :   5  :  2 \
          /__________________\
     (a,d)-(b,c)        (a,c)-(b,d)

Number of quartets in region 1: 0 (= 0.0%)      left: 0     right: 0
Number of quartets in region 2: 20 (= 7.7%)     bottom: 20  top: 0
Number of quartets in region 3: 218 (= 83.8%)   bottom: 171 top: 47
Number of quartets in region 4: 0 (= 0.0%)      bottom: 0   top: 0
Number of quartets in region 5: 22 (= 8.5%)     left: 14    right: 8
Number of quartets in region 6: 0 (= 0.0%)      bottom: 0   top: 0
Number of quartets in region 7: 0 (= 0.0%)

7.7% of the quartets (region 2 of the seven-area analysis) group 1Treponema with the other bacteria. Removing long sequences from cluster b might improve the result further.

Based on the results of the ML mapping analyses, especially try #1, the tentative
conclusion is that the bacterial A-type ATPase catalytic sequences do
not form a single group. Within the Eukaryotes and within the Archaea, the phylogeny of the V/A-ATPases agrees very nicely with the evolution of other conserved molecular markers (rRNA, elongation factors). The bacterial F-ATPases were also found to agree with rRNA to a surprising extent. In contrast, the A/V-ATPase subunits found in some bacteria have a phylogeny that does not agree well with other markers; one indication is that these bacterial subunits do not form a strongly supported group compared to the archaeal and eukaryal homologs.

The presence of archaeal-type ATPases in bacteria can be explained in two
ways:

The analyses performed indicate that horizontal gene transfer impacted the distribution of A-ATPases among bacteria, but more analyses need to be done to test different alternatives. For example, at present it remains unclear whether the A-ATPases in the Deinococcaceae and the Borrelia and Chlamydia ATPases resulted from a single transfer event. Methods one might use include ML ratio tests and the calculation of posterior probabilities.