Assignments:

1) Align the sequences contained in the file testseq1.txt using clustalw1.7 and the default parameters.

2) Do you get the same result with clustalx? (Save the alignment on your computer – you’ll need it again for #5.)

3) What is the reason for the long region in the yeast ATPases that has no counterpart in the other vacuolar/archaeal ATPase catalytic subunits? (Use Medline, Entrez, or BLAST to find out.)

4) In either clustalw or clustalx, what happens when you change the gap penalty parameters away from their default values?

Can you find a parameter combination such that the inserted sequence present in the two yeast sequences is no longer recognizable in the alignment?
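One way to explore the gap penalties systematically is to generate the command lines for a small grid of settings instead of clicking through the menus. The Python sketch below only builds and prints the commands; it does not run them. It assumes a command-line ClustalW that accepts the `-INFILE`, `-ALIGN`, `-PWGAPOPEN`, `-PWGAPEXT`, `-GAPOPEN`, `-GAPEXT` and `-OUTFILE` options (check the help output of your installation, since version 1.7 may spell them differently). The penalty values, the executable name `clustalw`, and the output file names are arbitrary illustrative choices, not recommended settings.

```python
from itertools import product

# Hypothetical (gap opening, gap extension) penalties to compare.
PAIRWISE = [(10.0, 0.1), (2.0, 0.05)]   # pairwise alignment stage
MULTIPLE = [(10.0, 0.2), (2.0, 0.05)]   # multiple alignment stage

def build_commands(infile, pairwise=PAIRWISE, multiple=MULTIPLE, exe="clustalw"):
    """Return one ClustalW command (as an argument list) per parameter combination."""
    commands = []
    combos = product(pairwise, multiple)
    for n, ((pwo, pwe), (go, ge)) in enumerate(combos, start=1):
        commands.append([
            exe,                        # assumed executable name
            f"-INFILE={infile}",
            "-ALIGN",
            f"-PWGAPOPEN={pwo}",
            f"-PWGAPEXT={pwe}",
            f"-GAPOPEN={go}",
            f"-GAPEXT={ge}",
            f"-OUTFILE=run{n}.aln",     # hypothetical output name, one per run
        ])
    return commands

# Print the commands so they can be inspected or pasted into a shell.
for cmd in build_commands("testseq1.txt"):
    print(" ".join(cmd))
```

Writing each run to its own output file makes it easy to compare the alignments side by side and to see at which settings the yeast insert stops standing out.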

5) Calculate a phylogenetic bootstrap consensus tree from the alignment calculated under #2 and look at it in treeview. Does this tree conform to your expectation based on what you know about organismal evolution? (Saccharomyces, Candida and Neurospora are fungi, Drosophila is a fly, Acetabularia is a green alga, Daucus is the carrot genus, Sulfolobus is an archaeon, and Borrelia is a bacterium.)
Does the tree change when you exclude positions with gaps? When you add a correction for multiple substitutions?
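To make the two options in #5 concrete, the minimal Python sketch below removes every alignment column that contains a gap and then applies Kimura's empirical correction for multiple substitutions, K = -ln(1 - D - D^2/5), to the observed fraction of differing sites D. This is the formula commonly documented for ClustalW's protein distances; treat that attribution as an assumption to verify in the program's help. The toy sequences and function names are made up for illustration.

```python
import math

def strip_gap_columns(seqs, gap="-"):
    """Return the aligned sequences with every gap-containing column removed."""
    keep = [i for i in range(len(seqs[0])) if all(s[i] != gap for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]

def p_distance(a, b):
    """Observed fraction of differing sites between two equal-length sequences."""
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)

def kimura_protein(d):
    """Kimura's empirical correction; only defined while 1 - d - d*d/5 > 0
    (roughly d < 0.85)."""
    return -math.log(1.0 - d - d * d / 5.0)

# Toy alignment (hypothetical data): column 4 contains gaps.
aligned = ["MKT-AGLV", "MRTEAGIV", "MKT-SGLV"]
trimmed = strip_gap_columns(aligned)
d = p_distance(trimmed[0], trimmed[1])
print(trimmed)
print("observed distance:", round(d, 3))
print("corrected distance:", round(kimura_protein(d), 3))
```

The corrected distance is always larger than the observed one, and the difference grows with divergence, which is why the correction can change branch lengths and sometimes the topology for distant sequences.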

6) Repeat your exploration of the different gap penalty parameters using the more divergent sequences in testseq2.txt. Can you find parameter choices that reveal the so-called non-homologous region in the catalytic V-ATPase subunits (i.e. a region of 100 aa, about 150 aa from the amino-terminal end, that is absent in the other ATPase subunits)? (Test at least four combinations of pairwise and multiple alignment parameters.) Can you get rid of the amino acids sprinkled into the corresponding gap?

Are the nucleotide binding site motifs GGxxxxxGxxGxGKTV (in the V-ATPase A-subunits) properly aligned between the different types? 

Which sequence in the V-ATPase B subunits aligns to this motif?

Do the ATPase subunits and the rho termination factors (ttf in testseq2.txt) share other motifs besides the GKT region?

Calculate a bootstrap consensus tree for this dataset and inspect it with treeview. Try to root the tree with the ttf and fl subunits. Which of the bifurcations in this tree are certain to represent gene duplications, and which are likely to represent speciation events?
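For the motif question in #6, checking by eye whether GGxxxxxGxxGxGKTV lines up across a long alignment is error-prone. The Python sketch below shows one possible check: it searches each sequence (with gaps removed) for the motif, treating x as any residue, and reports the alignment column where the match starts. If all sequences that carry the motif report the same column, the motif is aligned. The sequence names and toy rows are invented; reading a real alignment file is left out.

```python
import re

# GGxxxxxGxxGxGKTV, with x standing for any residue.
MOTIF = re.compile(r"GG.{5}G.{2}G.GKTV")

def motif_columns(aligned, gap="-"):
    """Map each sequence name to the 0-based alignment column where the
    motif starts, or None if the sequence does not contain the motif."""
    result = {}
    for name, row in aligned.items():
        # Alignment column of every non-gap residue, in order.
        cols = [i for i, ch in enumerate(row) if ch != gap]
        ungapped = "".join(row[i] for i in cols)
        hit = MOTIF.search(ungapped)
        result[name] = cols[hit.start()] if hit else None
    return result

# Toy alignment (hypothetical data): seqC is shifted by one column.
toy = {
    "seqA": "--MAGGAAAAAGAAGAGKTVL",
    "seqB": "MKA-GGSSSSSGSSGSGKTVI",
    "seqC": "MK--AGGTTTTTGTTGTGKTV",
}

starts = motif_columns(toy)
print(starts)
found = {c for c in starts.values() if c is not None}
print("motif aligned:", len(found) == 1)
```

The sketch only compares start columns, so a gap inserted inside the motif after a correctly aligned start would go unnoticed. Sequences without an exact match, such as the V-ATPase B subunits are expected to be, come back as None and have to be inspected at the column reported for the A subunits.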

7) If you have time left, decide on a topic (protein or nucleic acid family) for your student project.