MCB 3421 Assignment 9

Your name:
Your email address:

A note about the "all on one line" problem you may come across when working with files

If you work on a Windows or Mac computer, your text editor (e.g., Microsoft Word) can usually read text files from other systems and translate the end-of-line characters correctly. A frequent problem, however, is that the end-of-line character differs between systems: classic Mac files end each line with a carriage return (CR), UNIX files (including Darwin, the system underlying OS X) with a line feed (LF), and Windows files with both (CR LF). A UNIX application like clustalx expects the UNIX end-of-line character; if the file uses classic Mac end-of-line characters, everything appears on a single line.

How do I convert between Unix and Windows text files?
How do I convert between Unix and Mac OS X text files?

Also, most versions of Microsoft Word let you select the end-of-line coding when you save a document as a text file; the default is usually the Windows setting.
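If no converter is at hand, the conversion described above can also be done with a few lines of Python. This is only a minimal sketch: the function names and the file paths you pass in are illustrative and not part of the assignment, and the file is assumed to be small enough to read into memory.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CR LF) and classic Mac (CR) line endings to UNIX (LF)."""
    # Replace CR LF first so the lone-CR pass does not double the line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path in binary mode and write a UNIX-style copy to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(normalize_newlines(data))
```

Calling convert_file with the name of the problem file and a new output name leaves the original untouched, so you can compare the two in your editor.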

Part 1: Introduction to seaview

Seaview is already installed on the Windows computers in Whetten 300A. Find it in the All Programs menu, or use the search (magnifying glass icon) to locate "seaview".

ONLY IF HAVING TROUBLE: One can download seaview from here. Right-click on "MS Windows", then Save target as... to the Desktop. Double-click the "seaview4.exe" file on the Desktop to extract the contents to a seaview4 folder. In this seaview4 folder is the "seaview.exe" program. (There is also a seaview64bits.exe if your computer supports it.)

We will be using this sequence file. The sequences in this file are named according to the following schema:

A denotes the catalytic ATP binding subunits of the vacuolar proton pumping ATPase that is found on membranes of the eukaryotic endomembrane system and of the archaeal type A-ATPsynthase
- these were called A-subunits, because they are the largest subunits of the water soluble head group of these ATPase/ATPsynthases

bet denotes the catalytic ATP binding subunits of the bacterial type F-ATPsynthase (also present in mitochondria and plastids)
- these were called beta subunits because they are the second largest subunit in the head group

B denotes the non-catalytic ADP binding subunits of the vacuolar proton pumping V-ATPase that is found on membranes of the eukaryotic endomembrane system and of the archaeal type A-ATPsynthase
- these were called B-subunits, because they are the second largest subunits in the head group of these ATPase/ATPsynthases

alp denotes the non-catalytic ADP binding subunits of the bacterial type F-ATPsynthase
- these were called alpha subunits because they are the largest subunit in the head group

fl denotes proteins that, if mutated, prevent the assembly of the bacterial flagellum

The subunit designation is followed by the genus name:
Mus musculus - house mouse, an animal (mammal)
Arabidopsis thaliana - thale cress, a flowering plant (model organism for botanists)
Neurospora crassa - red bread mold, an ascus-forming fungus (model organism for geneticists)
Ecoli - Escherichia coli, a Gram-negative bacterium (model organism for biology)
Aquifex aeolicus - an extremely thermophilic bacterium (model organism for astrobiologists)
Methanosarcina barkeri - a euryarchaeote
Sulfolobus acidocaldarius - a crenarchaeote
Plasmodium, Trypanosoma - flagellated protozoa

Save the above sequence file to the Desktop. Then, from seaview, File — Open, and open the sequence file.

Alignment

Check the setting in Align — Alignment options. It should be set to "clustalo" by default. (The only other option is "muscle".)

Align — Align all

This should only take a few seconds. Then click "OK".

Maximize the window and scroll to position 412. Most of the ATPase subunits have a "canonical" motif (G.....GKT) characteristic of many nucleotide binding sites. Which sequence replaces this motif in the B subunits of the vacuolar type ATPases?
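If you want to double-check by script which sequences carry the motif, a regular expression search is one option. The sketch below is not part of seaview: the function name and the dictionary of sequences are hypothetical, and the default pattern is copied verbatim from the text above. The nucleotide binding (Walker A) motif is often written as G-x(4)-GK-[TS], so you may need to adjust the number of dots or the last residue to match what you see in the alignment.

```python
import re


def find_motif(name_to_seq, pattern="G.....GKT"):
    """Return {name: [1-based start positions]} of the motif in each sequence.

    Gap characters are removed first, so positions refer to the individual
    (ungapped) sequence, not to the alignment column. Matches are
    non-overlapping.
    """
    motif = re.compile(pattern)
    hits = {}
    for name, seq in name_to_seq.items():
        ungapped = seq.replace("-", "").upper()
        hits[name] = [m.start() + 1 for m in motif.finditer(ungapped)]
    return hits
```

Sequences that return an empty list are the ones to inspect by eye in the alignment window.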

You can save the alignment in many different formats (PHYLIP, FASTA, NEXUS, and MSF) (File Menu -> Save as).

Change the alignment program to "muscle" (Align — Alignment options). Align — Align all.

Were the intein / extein junctions retained?

Building trees

Using Trees -> Distance methods, calculate four neighbor joining trees as follows:

"BioNJ", Distance "Observed" (observed means no correction for multiple substitutions), ignore all gap sites UNCHECKED, Bootstrap CHECKED, 100 replicates
"BioNJ", Distance "Observed", ignore all gap sites CHECKED, Bootstrap CHECKED, 100 replicates
"BioNJ", Distance "Poisson", ignore all gap sites UNCHECKED, Bootstrap CHECKED, 100 replicates
"BioNJ", Distance "Poisson", ignore all gap sites CHECKED, Bootstrap CHECKED, 100 replicates

Within each tree window, explore the "Re-root" option. (After choosing a new root, switch back to the "Full" option to make the display neater.)
Where do you think the root of the tree might be located?
Which subunits are paralogs (i.e., evolved from a gene duplication), and which are probably orthologs (i.e., homologs related by a speciation, not a gene duplication event)?
In particular, are the beta and A subunits orthologs or paralogs?
Which of the bifurcations correspond to gene duplications?

Explore the "Swap" option. Does this change the tree?

Explore the "Subtree" option. Can you manage to draw a tree from which only the flagellar assembly ATPase are excluded?

Compare the neighbor joining trees calculated using the different options.
What are the differences? (Note: The scale bar indicates the average number of substitutions per site.)
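To see why the options give different branch lengths, it can help to compute the two distances by hand for a pair of aligned sequences. The sketch below uses the textbook definitions: the observed distance p is the fraction of differing sites, and the Poisson correction is d = -ln(1 - p). It assumes that "ignore all gap sites" removes every alignment column containing a gap before distances are calculated, and that otherwise gapped sites are skipped pair by pair; that is an assumption about seaview's behavior, and the function names are illustrative.

```python
import math

GAP = "-"


def drop_gap_columns(seqs):
    """Remove every column in which at least one sequence has a gap."""
    keep = [i for i in range(len(seqs[0])) if all(s[i] != GAP for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def observed_distance(a, b):
    """Fraction of differing sites, skipping sites gapped in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = observed_distance(a, b)
    if p >= 1.0:
        raise ValueError("p = 1: the Poisson distance is undefined")
    return -math.log(1.0 - p)
```

Because -ln(1 - p) grows faster than p, the correction stretches long branches more than short ones, which is one of the differences to look for between the trees.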

For a neighbor joining tree calculation that worked well, check the "Br support" box in the tree window. Inspect the resulting tree. Given the support values, can you be sure that the flagellar assembly ATPase subunit diverged before the duplication that gave rise to the catalytic and non-catalytic subunits?

Save the tree from the tree window using File -> "Save rooted tree". Open the tree from the bootstrap analysis in FigTree (it should already be installed on the lab computers). Explore different options to display and edit the tree (re-root, collapse, and/or color different clades). The header allows you to select nodes/clades or taxa. Once you have made a selection, the available options are enabled in the header (e.g., reroot becomes available after you select a "node" (really a branch)). Note that the automated coloring according to support values only works if the support is given as a fraction of 1. The menus in the bar on the left let you set font sizes, line widths, etc. If you arrive at a nice depiction, export the tree as PDF (in the File menu), and send a copy to gogarten@uconn.edu .
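If your saved tree stores bootstrap support as percentages (e.g., 87 rather than 0.87), one way to get values given as a fraction of 1 is to rewrite the tree file before opening it in FigTree. The sketch below is a hypothetical helper, not a seaview or FigTree feature. It assumes the support values are written as node labels directly after a closing parenthesis in the Newick string, and that taxon names do not themselves contain parentheses. Run it only once per file, since it divides every such label by 100.

```python
import re

# Assumption: support values are node labels that directly follow ")".
_SUPPORT = re.compile(r"\)(\d+(?:\.\d+)?)")


def percent_to_fraction(newick: str) -> str:
    """Rewrite node labels such as ")87" as ")0.87".

    Branch lengths follow a colon, not a parenthesis, so they are untouched.
    """
    def repl(match):
        return ")" + format(float(match.group(1)) / 100.0, ".2f")

    return _SUPPORT.sub(repl, newick)
```

Read the saved tree file into a string, pass it through percent_to_fraction, and write the result to a new file for FigTree.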

Part 2: More alignments -- and combining alignments

The following is a list of intein-containing yeast V-ATPase catalytic subunits -- CLICK HERE --. Download these sequences (Send to: pulldown menu, select File and then FASTA format), and then drag the file into the seaview alignment window. Align them using clustalo (set the option in the Align menu under Alignment options, then start the alignment by selecting Align all).

Scanning through the alignment, can you predict which part of the sequences corresponds to the ATPase subunit, and which to the intein? (If you click on a residue, seaview shows the position in the alignment and the position in the individual sequence.) Redo the alignment in muscle. Do you see any differences between the muscle and the clustalo alignments? (Yes/No; if yes, from about where to where, and in which sequence?)

Open a second seaview window and align the intein-containing sequences (the sequence names start with gi....) with the ones from the S. cerevisiae ATPaseSU file. --HERE are all the ATPase SU + the intein-containing A SU-- --Here are only the yeast A-SU --.

Seaview interface hints: To select a sequence click on the name. To select more than one sequence click and drag. To select non-adjacent sequences just click on each one. A second click will deselect the sequence. If you want to clear all your selections to start over, double click on any name. It is often necessary to move sequences in an alignment to put them next to one another for comparison. To move a sequence in SeaView, select the sequence by clicking on it. Move your cursor to the location where you want it to go and control click (hold down the control key and click). This will move your selected sequence to the new location. If you want to move a block of sequences select the block by dragging and do the same thing.

Explore different options to align the sequences with and without inteins. Define two groups of sequences (those with and without inteins) and explore a profile alignment between the two (select each group first and align within the group, then between the two groups). Try both muscle and clustalo as the alignment algorithm for the profile alignment. Is the intein clearly recognized as an inserted region?
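One rough way to check whether the intein shows up as a single inserted region is to look for the longest stretch of columns in which an intein-free reference has gaps while an intein-containing sequence has residues. The sketch below assumes you have copied two aligned rows out of the alignment as strings of equal length; the function name is illustrative.

```python
def longest_insertion(reference, query, gap="-"):
    """Return (start, end, length) of the longest run of columns in which the
    reference has a gap and the query has a residue. Columns are 1-based;
    (0, 0, 0) means no such column exists."""
    best = (0, 0, 0)
    run_start = None
    for i, (r, q) in enumerate(zip(reference, query), start=1):
        if r == gap and q != gap:
            if run_start is None:
                run_start = i
            length = i - run_start + 1
            if length > best[2]:
                best = (run_start, i, length)
        else:
            run_start = None
    return best
```

If the alignment program recognized the intein cleanly, the reported run should be close to the full intein length; several shorter runs suggest the insertion was broken up.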

In an alignment containing only the A-subunits with inteins, and one A-SU from yeast without intein as reference (do it yourself, or here), use GBLOCKS to define conserved sites: In the Sites menu, select Create set, select GBLOCKS, and select "allow for gap positions within the final blocks". Which parts of the intein are flagged as reliably aligned (you could compare this to the conserved blocks listed in InBase here)? What do you expect to happen if you use GBLOCKS on the larger dataset? In particular, what would happen to the intein sequences? (Try it out if the answer is not obvious!)

Different alignment programs create different alignments: clustal and muscle are rather similar, whereas PRANK produces alignments of a different flavor. Try an alignment of the yeast V-ATPase subunits with intein. (webPrank does not like the multiple fasta file created above; this one works.) If this takes too long, the resulting file is here. (Load the aligned multiple sequence fasta file into seaview or jalview.) How do the muscle and the PRANK alignments differ? What are the overall lengths of the alignments? Why might the PRANK alignment be advantageous for some downstream applications?
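The overall alignment lengths can be read off in seaview, but if you saved the alignments as aligned FASTA files you can also compare them with a short script. This is a minimal sketch with a hand-written FASTA reader; the function names are illustrative and the file names you pass in are your own.

```python
def read_fasta(path):
    """Return {name: sequence} from a (possibly multi-line) FASTA file."""
    seqs, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header line as the name.
                name = line[1:].split()[0]
                seqs[name] = []
            elif name is not None:
                seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def alignment_summary(seqs):
    """Return (alignment length, fraction of gap characters)."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; is the file aligned?")
    length = lengths.pop()
    gaps = sum(s.count("-") for s in seqs.values())
    return length, gaps / (length * len(seqs))
```

Running alignment_summary on the muscle and PRANK files side by side gives the two lengths and gap fractions to discuss in your answer.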

If you have time, repeat the alignment with MAFFT (online version here). This is fast. How does the resulting alignment compare to the muscle and PRANK alignments?

Some programs estimate the robustness of aligned regions. Guidance from Tal Pupko's group at TAU produces alignments with good estimates of the reliability of individual alignment columns (the output from Guidance for the yeast V-ATPases with inteins alignment is here). For your future work, you might want to keep this service in mind. You can also use GBLOCKS in seaview (see above) to select sites that are reliably aligned, but this may be too conservative.

Finished?

Check the appropriate radio button below before pressing the submit button:

Send email to your instructor (and yourself) upon submit
Send email to yourself only upon submit (as a backup)
Show summary upon submit but do not send email to anyone.