!! Work on your student project !!

    • Compile a data set

    • Align the sequences and modify the alignment by hand, if necessary. Consider also generating a version of your alignment in which gaps are replaced by missing-data characters, and one in which all gap-containing columns are removed.

    • Reconstruct phylogenies using at least three different approaches.

    • Which substitution model best describes the evolution of your data (with or without among-site rate variation, ASRV)?

    • Possible extension:

    • How do the analyses of your data compare to those of the small subunit ribosomal RNA? 16S/18S rRNA has become the gold standard among molecular markers. It is rather easy to compile a 16S rRNA dataset from the same set of species that are represented in your dataset (see below).
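The two alignment variants mentioned in the alignment step above (gaps recoded as missing data, and gap-containing columns removed) can be produced with a few lines of Python. This is a minimal sketch, assuming the alignment is in FASTA format, that "-" marks a gap, and that "?" is accepted as the missing-data character by the program you use afterwards; the function names and the toy sequences are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def gaps_to_missing(records, gap="-", missing="?"):
    """Return a copy of the alignment with every gap recoded as missing data."""
    return [(name, seq.replace(gap, missing)) for name, seq in records]


def drop_gap_columns(records, gap="-"):
    """Return a copy of the alignment without any column that contains a gap."""
    seqs = [seq for _, seq in records]
    if not seqs:
        return []
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences are not aligned (lengths differ)")
    # Keep only the column indices where no sequence has a gap.
    keep = [i for i in range(len(seqs[0])) if all(s[i] != gap for s in seqs)]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


# Toy alignment (made-up sequences, for illustration only).
toy = """>seq1
ACG-TA
>seq2
A-GCTA
>seq3
ACGCTA
"""

aln = read_fasta(toy)
recoded = gaps_to_missing(aln)
trimmed = drop_gap_columns(aln)

for name, seq in trimmed:
    print(">" + name)
    print(seq)
```

For your own data, read the alignment file into a string first (for example with open(...).read()) and write each variant to a separate file, so that you can run the same phylogenetic analysis on both.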

 

If you run out of things to do on your student project:

TREE-PUZZLE (cont.)

Perform the analysis listed under (B) in class 11 and repeat the analysis with 16S rRNAs.

The dataset rdp.fa contains the aligned ribosomal RNA sequences (retrieved from the RDP database) from the following prokaryotes:

ARCHAEA:

Sulfolobus acidocaldarius (70 °C),
Sulfolobus solfataricus (70-85 °C),
Archaeoglobus fulgidus (83 °C),
Methanosarcina barkeri (30-37 °C),
Methanosarcina mazeii (37 °C),
Methanococcus jannaschii (80 °C),
Haloferax volcanii (37 °C),
Halobacterium salinarium (ca. 37 °C),
Methanobacterium thermoautotrophicum (60-65 °C),
Desulfurococcus sp. (85-90 °C),
Thermococcus sp. (75+ °C),
Aeropyrum pernix (90 °C),
Thermoplasma acidophilum (55-60 °C),
Pyrococcus abyssi and Pyrococcus horikoshii (95 °C)

BACTERIA:

Enterococcus hirae (37 °C),
Borrelia burgdorferi (33-37 °C),
Thermus thermophilus (70-80 °C),
Deinococcus radiodurans (30 °C),
Chlamydia trachomatis,
Chlamydophila pneumoniae.

Does the environmental temperature influence the amount of among-site rate variation (ASRV)? To address this question, we can estimate the shape parameter of the Gamma distribution (which measures the amount of ASRV; a smaller value indicates stronger rate variation) separately for subsets of sequences from thermophilic and mesophilic organisms.
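To get a feeling for what the shape parameter means, the sketch below approximates the relative rates of 8 equally probable rate categories for a Gamma distribution with mean 1 and different shape values. It is only an illustration of what the shape parameter controls, not a reimplementation of how TREE-PUZZLE estimates it: the rates are obtained by simple random sampling, and the function name, the number of draws, and the example shape values are arbitrary choices.

```python
import random


def category_rates(alpha, n_categories=8, n_draws=80_000, seed=1):
    """Approximate the mean relative rate of each equal-probability category
    of a Gamma distribution with shape alpha and mean 1 (Monte Carlo)."""
    rng = random.Random(seed)
    # gammavariate(shape, scale): scale = 1/alpha gives a mean of 1.
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_draws))
    size = n_draws // n_categories
    # Average the sorted draws within each of the n_categories equal-sized bins.
    return [sum(draws[k * size:(k + 1) * size]) / size
            for k in range(n_categories)]


# Example shape values (arbitrary): strong, intermediate, and weak ASRV.
rates_by_alpha = {alpha: category_rates(alpha) for alpha in (0.3, 1.0, 5.0)}

for alpha, rates in rates_by_alpha.items():
    print("alpha =", alpha, [round(r, 2) for r in rates])
```

With a small shape value most sites fall into categories with rates close to zero while a few sites evolve very fast; with a large shape value all categories have rates close to 1, i.e. there is little among-site rate variation.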

Using ClustalX, generate three data sets:

    • 4-7 prokaryotes with a growth temperature above 50 °C

    • a comparable** group of 4-7 prokaryotes with a growth temperature below 50 °C

    • a data set containing both of the above groups

** Calculate a tree (a puzzle distance matrix followed by NEIGHBOR would be fine) and choose the sequences for your thermophile and mesophile subsets so that the relationships within the two subsets are similar.
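As a starting point for choosing the subsets, the species list can be split at the 50 °C threshold. The sketch below is a hedged example: the temperatures are copied from the list above as (low, high) ranges, species without a listed temperature are left unclassified rather than guessed, and the names will most likely have to be adapted to the sequence names actually used in rdp.fa. The final choice of 4-7 comparable species per group still has to be made by hand from the tree.

```python
# Growth temperatures in °C as (low, high), copied from the species list.
# None means that no temperature is listed for that species.
GROWTH_TEMP = {
    "Sulfolobus acidocaldarius": (70, 70),
    "Sulfolobus solfataricus": (70, 85),
    "Archaeoglobus fulgidus": (83, 83),
    "Methanosarcina barkeri": (30, 37),
    "Methanosarcina mazeii": (37, 37),
    "Methanococcus jannaschii": (80, 80),
    "Haloferax volcanii": (37, 37),
    "Halobacterium salinarium": (37, 37),      # listed as "ca. 37"
    "Methanobacterium thermoautotrophicum": (60, 65),
    "Desulfurococcus sp.": (85, 90),
    "Thermococcus sp.": (75, 75),              # listed as "75+"
    "Aeropyrum pernix": (90, 90),
    "Thermoplasma acidophilum": (55, 60),
    "Pyrococcus abyssi": (95, 95),
    "Pyrococcus horikoshii": (95, 95),
    "Enterococcus hirae": (37, 37),
    "Borrelia burgdorferi": (33, 37),
    "Thermus thermophilus": (70, 80),
    "Deinococcus radiodurans": (30, 30),
    "Chlamydia trachomatis": None,
    "Chlamydophila pneumoniae": None,
}


def split_by_temperature(temps, threshold=50):
    """Split species into (above, below, unclassified) relative to threshold."""
    above, below, unclassified = [], [], []
    for species, temp_range in temps.items():
        if temp_range is None:
            unclassified.append(species)      # no temperature listed
        elif temp_range[0] > threshold:
            above.append(species)             # whole range above threshold
        elif temp_range[1] < threshold:
            below.append(species)             # whole range below threshold
        else:
            unclassified.append(species)      # range straddles the threshold
    return above, below, unclassified


thermophiles, mesophiles, unclassified = split_by_temperature(GROWTH_TEMP)
print("above 50 °C:", len(thermophiles), thermophiles)
print("below 50 °C:", len(mesophiles), mesophiles)
print("not classified:", unclassified)
```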

Analyse all three data sets using TREE-PUZZLE. Use the default options, but select a model of rate heterogeneity described by a Gamma distribution with 8 rate categories, and have TREE-PUZZLE estimate the shape parameter.

Is the among-site rate variation different for the two sets of species? Discuss your findings.

Do your results for the rRNA dataset differ from the results you obtained using the ATPase dataset?

Work on your own dataset (student project)!!!