!! Work on your student project !!
- Compile a data set.
- Align the sequences. Modify the alignment by hand, if necessary.
  You might consider generating a version of your alignment with gaps
  replaced by missing characters, and one in which all gap-containing
  columns are removed.
- Reconstruct phylogenies using at least three different approaches.
- Which substitution model best describes the evolution of your data
  (with or without ASRV)?
- Possible extension:
  - How do the analyses of your data compare to those of the small
    subunit ribosomal RNA? 16S/18S rRNA has become the gold standard
    for molecular markers. It is rather easy to compile datasets of
    16S rRNA from the same set of species that are represented in
    your dataset (see below).
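The two alternative alignment versions mentioned in the alignment step (gaps recoded as missing data, and gap-containing columns removed) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alignment is held as a list of (name, sequence) pairs, `-` is the gap symbol and `?` the missing-data symbol, and the taxon names and sequences are made up.

```python
def gaps_to_missing(records, gap="-", missing="?"):
    """Return a copy of the alignment with every gap recoded as missing data."""
    return [(name, seq.replace(gap, missing)) for name, seq in records]


def remove_gap_columns(records, gap="-"):
    """Return a copy of the alignment without any column that contains a gap."""
    seqs = [seq for _, seq in records]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned (equal length)")
    # Keep a column only if no sequence has a gap at that position.
    keep = [i for i in range(len(seqs[0])) if all(s[i] != gap for s in seqs)]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


# Hypothetical toy alignment, not taken from any course dataset.
alignment = [("taxonA", "ATG-CA"), ("taxonB", "AT-GCA"), ("taxonC", "ATGGCA")]
with_missing = gaps_to_missing(alignment)
without_gap_columns = remove_gap_columns(alignment)
```

Comparing trees built from both versions shows how much the treatment of gaps affects your result.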
If you run out of things to do on your student project:
TREE-PUZZLE (cont.)
Perform the analysis listed under (B) in class 11 and repeat the analysis
with 16S rRNAs.
The dataset rdp.fa contains the aligned ribosomal
RNA sequences (retrieved from the RDP
database) of the following prokaryotes:
ARCHAEA:
Sulfolobus acidocaldarius (70 °C),
Sulfolobus solfataricus (70-85 °C),
Archaeoglobus fulgidus (83 °C),
Methanosarcina barkeri (30-37 °C),
Methanosarcina mazeii (37 °C),
Methanococcus jannaschii (80 °C),
Haloferax volcanii (37 °C),
Halobacterium salinarium (ca. 37 °C),
Methanobacterium thermoautotrophicum (60-65 °C),
Desulfurococcus sp. (85-90 °C),
Thermococcus sp. (75+ °C),
Aeropyrum pernix (90 °C),
Thermoplasma acidophilum (55-60 °C),
Pyrococcus abyssi and Pyrococcus horikoshii (95 °C)
BACTERIA:
Enterococcus hirae (37 °C),
Borrelia burgdorferi (33-37 °C),
Thermus thermophilus (70-80 °C),
Deinococcus radiodurans (30 °C),
Chlamydia trachomatis,
Chlamydophila pneumoniae.
Does the environmental temperature influence the amount
of among-site rate variation (ASRV)? To address this question we can estimate
the shape parameter of the Gamma distribution (a measure of ASRV; smaller
values indicate stronger rate variation) separately for subsets of sequences
from thermophilic and mesophilic organisms.
Using ClustalX, generate three data sets:
- 4-7 prokaryotes with a growth temperature above 50 °C
- a comparable** group of 4-7 prokaryotes with a growth temperature below
  50 °C
- a data set containing both of the above groups.
** Calculate a tree (a puzzle distance matrix followed by NEIGHBOR would be
fine) and choose the sequences for your thermophile and mesophile subsets so
that the two subsets have similar relationships.
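The exercise asks for ClustalX, but the selection of subsets can also be sketched in Python. The following is a minimal sketch, not the course's method: it assumes rdp.fa is a FASTA file whose header lines contain the organism names (the actual headers are not shown here), and the sequences in the example are invented. Columns that contain only gaps in the chosen subset are dropped, since they carry no information for that subset.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def select_taxa(records, wanted):
    """Keep records whose name contains any of the wanted substrings."""
    wanted = [w.lower() for w in wanted]
    return [(n, s) for n, s in records if any(w in n.lower() for w in wanted)]


def drop_all_gap_columns(records, gap="-"):
    """Remove columns that are gaps in every remaining sequence."""
    if not records:
        return []
    seqs = [s for _, s in records]
    keep = [i for i in range(len(seqs[0])) if any(s[i] != gap for s in seqs)]
    return [(n, "".join(s[i] for i in keep)) for n, s in records]


# Hypothetical headers and sequences standing in for the content of rdp.fa.
example = """>Aeropyrum_pernix
AC-GU-
>Thermus_thermophilus
AC-GUA
>Enterococcus_hirae
ACAG-A
"""
records = parse_fasta(example)
thermophiles = drop_all_gap_columns(select_taxa(records, ["aeropyrum", "thermus"]))
```

The resulting subset would still have to be saved in a format that TREE-PUZZLE reads (PHYLIP), for example by loading it into ClustalX as the exercise suggests.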
Analyse all three of these data sets using TREE-PUZZLE. Use the default
options, but select a rate heterogeneity model described by a Gamma distribution
with 8 rate categories, and have TREE-PUZZLE estimate the shape parameter.
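To get a feeling for what the shape parameter means, the sketch below approximates the relative rates of 8 equally probable Gamma categories for two shape values. It is an illustration only and makes no claim about how TREE-PUZZLE computes its categories: the rates are estimated by simple random sampling, and the function name, number of draws, and seed are arbitrary choices made for this example.

```python
import random


def discrete_gamma_rates(alpha, categories=8, draws=80000, seed=1):
    """Approximate the mean relative rate of each equal-probability category."""
    rng = random.Random(seed)
    # A Gamma distribution with shape alpha and scale 1/alpha has mean 1.
    sample = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(draws))
    size = draws // categories  # any remainder of the sample is ignored
    rates = [sum(sample[i * size:(i + 1) * size]) / size for i in range(categories)]
    mean = sum(rates) / categories
    # Rescale so that the average rate over all categories is exactly 1.
    return [r / mean for r in rates]


strong_asrv = discrete_gamma_rates(0.3)  # small shape parameter
weak_asrv = discrete_gamma_rates(3.0)    # large shape parameter
print([round(r, 2) for r in strong_asrv])
print([round(r, 2) for r in weak_asrv])
```

With a small shape parameter most categories evolve very slowly and a few very fast; with a large shape parameter the category rates lie much closer to 1. Keep this in mind when comparing the estimates for your three data sets.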
Is the among-site rate variation different for the two
sets of species? Discuss your findings.
Do your results for the rRNA dataset differ from the
results you obtained using the ATPase dataset?
Work on your own dataset (student project)!!!