Assignments:
1) Align the sequences contained in the file testseq1.txt using clustalw1.7 and the default parameters.
2) Do you get the same result with clustalx? (Save the alignment on your computer; you'll need it again for #5.)
3) Do you know the reason for the long region in the yeast ATPases that has no counterpart in the other vacuolar/archaeal ATPase catalytic subunits? (Hint: Medline, Entrez, BLAST)
4) In either clustalw or clustalx, what happens when you change the gap penalty parameters away from their default values?
Can you find a parameter combination such that the inserted sequence present in the two yeast sequences is no longer recognizable in the alignment?
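
A toy illustration of why an insert can vanish from an alignment once gaps become expensive. This is a minimal sketch and not the clustalw algorithm: it assumes a single linear gap penalty, a plain match/mismatch score, and two invented strings, whereas clustalw uses separate gap opening and gap extension penalties together with amino acid substitution matrices. The function name global_align and its parameters are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment with a linear gap penalty (toy model)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


if __name__ == "__main__":
    # Invented example strings; the second is the first shifted by one position.
    for gap in (-1, -10):
        top, bottom, total = global_align("ACGTACGT", "CGTACGTA", gap=gap)
        print("gap penalty", gap, "score", total)
        print(top)
        print(bottom)
```

With the mild penalty the shift is resolved by opening gaps at the ends; with the harsh penalty the program prefers a gap-free alignment full of mismatches. The same trade-off is what you explore with the clustal gap parameters.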
5) Calculate a phylogenetic bootstrap consensus tree from the alignment calculated under #2 and look at it in treeview. Does this tree conform to your expectation based on what you know about organismal evolution? (Saccharomyces, Candida and Neurospora are fungi, Drosophila is a fly, Acetabularia is a green alga, Daucus is the genus name of the carrot, Sulfolobus is an archaeon, and Borrelia is a bacterium.)
Does the tree change when you exclude positions with gaps? When you add a correction for multiple substitutions?
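
The two options asked about above can be pictured with a short sketch. It assumes that "exclude positions with gaps" means dropping every alignment column in which any sequence has a gap, and that the correction for multiple substitutions in proteins is Kimura's empirical formula d = -ln(1 - p - 0.2 p^2), where p is the observed fraction of differing sites. The function names and the three short sequences are invented for illustration; the clustal programs do this internally.

```python
import math


def strip_gap_columns(seqs):
    """Keep only the alignment columns in which no sequence has a gap ('-')."""
    kept = [col for col in zip(*seqs) if "-" not in col]
    return ["".join(col[i] for col in kept) for i in range(len(seqs))]


def p_distance(a, b):
    """Observed fraction of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_protein_distance(p):
    """Kimura's empirical correction (assumed here); only meaningful for moderate p."""
    return -math.log(1.0 - p - 0.2 * p * p)


if __name__ == "__main__":
    alignment = ["AC-GT", "A-CGT", "ACCGA"]  # invented toy alignment
    print(strip_gap_columns(alignment))
    p = p_distance(alignment[0], alignment[2])
    print("observed:", p, "corrected:", kimura_protein_distance(p))
```

The corrected distance is always larger than the observed one, and the difference grows with divergence, which is why long branches are the ones most affected by the correction.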
6) Repeat your exploration of the different gap penalty parameters using the more divergent sequences in testseq2.txt. Can you find parameter choices that reveal the so-called non-homologous region in the catalytic V-ATPase subunits (i.e., a region of 100 aa, about 150 aa from the amino-terminal end, that is absent in the other ATPase subunits)? (Test at least four combinations of pairwise and multiple alignment parameters in total.)
Can you get rid of the amino acids sprinkled into the corresponding gap?
Are the nucleotide binding site motifs GGxxxxxGxxGxGKTV (in the V-ATPase A-subunits) properly aligned between the different types?
Which sequence in the V-ATPase B subunits aligns with this motif?
Do the ATPase subunits and the rho termination factors (ttf in testseq2.txt) share other motifs besides the GKT region?
Calculate a bootstrap consensus tree for this dataset and inspect it with treeview. Try to root the tree with the ttf and fl subunits. Which of the bifurcations in this tree are certain to represent gene duplications, and which are likely to represent speciation events?
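
Checking by eye whether the GGxxxxxGxxGxGKTV motif lines up across sequences is tedious in a long alignment. The sketch below is one possible helper, under these assumptions: 'x' in the motif stands for any residue, aligned sequences use '-' for gaps, and the two example strings are invented rather than taken from testseq2.txt. The function names are hypothetical.

```python
import re


def motif_to_regex(motif):
    """Turn a motif such as 'GGxxxxxGxxGxGKTV' ('x' = any residue) into a compiled regex."""
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in motif))


def motif_columns(aligned_seq, motif):
    """Return (start, end) alignment columns of the first motif hit (0-based, end exclusive).

    Gaps are removed before searching, then the hit is mapped back to
    alignment columns. Returns None if the motif does not occur.
    """
    cols = [i for i, ch in enumerate(aligned_seq) if ch != "-"]
    ungapped = "".join(aligned_seq[i] for i in cols)
    hit = motif_to_regex(motif).search(ungapped)
    if hit is None:
        return None
    return cols[hit.start()], cols[hit.end() - 1] + 1


if __name__ == "__main__":
    motif = "GGxxxxxGxxGxGKTV"
    # Invented aligned sequences for illustration only.
    aligned = {
        "seqA": "MK--GGAAAAAGAAGAGKTVLL",
        "seqB": "MKGGAAA--AAGAAGAGKTVLL",
    }
    for name, seq in aligned.items():
        print(name, motif_columns(seq, motif))
```

If the reported column ranges differ between sequences, the motif is not aligned consistently; identical end columns with different start columns, as in this toy case, point to a gap placed inside the motif.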
7) If you have time left, decide on a topic (protein or nucleic acid family) for your student project.