MCB 3421 Assignment 9

Your name:
Your email address:

A note about the "all on one line" problem you may come across when working with files

If you work on a Windows or Mac computer, your text editor (e.g., Microsoft Word) can usually read text files from other systems and translate the end-of-line characters correctly. A frequent problem is that Mac-format and UNIX-format text files (UNIX here includes Darwin, the system underlying OS X) use different end-of-line characters: classic Mac files end each line with a carriage return (CR), UNIX files with a line feed (LF), and Windows files with both (CR LF). A UNIX application like clustalx expects the UNIX end-of-line character; if the file uses Mac end-of-line characters instead, the program sees the whole file as a single line.

How do I convert between Unix and Windows text files?
How do I convert between Unix and Mac OS X text files?

Also, most versions of Microsoft Word let you choose the end-of-line encoding when you save a document as a text file; the default is usually the Windows setting.
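
If you prefer a script to an editor, line endings can also be converted with a few lines of Python. The following is a minimal sketch, not part of the assignment: the function names `normalize_newlines` and `convert_file` are made up for illustration, and the file is handled as raw bytes so that no end-of-line translation happens behind your back.

```python
def normalize_newlines(data: bytes, newline: bytes = b"\n") -> bytes:
    """Convert Windows (CR LF) and classic Mac (CR) line endings to `newline`."""
    # Replace CR LF first so that it is not turned into two line breaks.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", newline)


def convert_file(src: str, dst: str, newline: bytes = b"\n") -> None:
    """Read `src` as raw bytes and write a copy with uniform line endings to `dst`."""
    with open(src, "rb") as handle:
        data = handle.read()
    with open(dst, "wb") as handle:
        handle.write(normalize_newlines(data, newline))


# A two-sequence FASTA snippet with classic Mac (CR) line endings, invented for illustration.
mac_style = b">seq1\rMKV\r>seq2\rMRV\r"
print(normalize_newlines(mac_style))
```

Passing `newline=b"\r\n"` produces a Windows-style file instead of a UNIX-style one.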

Part 1: Introduction to seaview (30 minutes)

Seaview is already installed on the Windows computers in Whetten 300A. Find it in the All Programs menu, or use the search (magnifying glass icon) to locate "seaview".

ONLY IF HAVING TROUBLE: One can download seaview from here. Right-click on "MS Windows", then Save target as... to the Desktop. Double-click the "seaview4.exe" file on the Desktop to extract the contents to a seaview4 folder. In this seaview4 folder is the "seaview.exe" program. (There is also a seaview64bits.exe if your computer supports it.)

We will be using this sequence file. The sequences in this file are annotated as follows:

Save the above sequence file to the Desktop. Then, from seaview, File — Open, and open the sequence file. Check the setting in Align — Alignment options. It should be set to "clustalo" by default. (The only other option is "muscle".)

Align — Align all

This should only take a few seconds. Then click "OK".

Maximize the window and scroll to position 401. Most of the ATPase subunits have a "canonical" motif (G.....GKT) characteristic of many nucleotide binding sites. With which sequence has this motif been replaced in the B subunits of the vacuolar type ATPases?
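
Besides scrolling through the alignment by eye, you can search the sequences for such a motif with a regular expression. The sketch below is only an illustration: the function name `find_motif` and the toy sequences are invented, and the pattern uses the commonly cited Walker A spacing (four variable residues between the two glycines, followed by K and then S or T), which is an assumption here. Change the pattern if the motif in your alignment is spaced differently.

```python
import re

# Walker A consensus as commonly cited: G, four arbitrary residues, G, K, then S or T.
# This spacing is an assumption; edit the pattern to match the motif in your alignment.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_motif(aligned_seqs, pattern=WALKER_A):
    """Return {name: [(start, matched_text), ...]} using 1-based ungapped positions."""
    hits = {}
    for name, aligned in aligned_seqs.items():
        ungapped = aligned.replace("-", "")  # alignment gaps are not residues
        hits[name] = [(m.start() + 1, m.group()) for m in pattern.finditer(ungapped)]
    return hits


# Toy sequences, invented for illustration only.
toy = {
    "with_motif": "MKL--GAFGCGKTVL",
    "without_motif": "MKLAAGAFGCAAAVL",
}
for name, found in find_motif(toy).items():
    print(name, found if found else "motif not found")
```

Positions are reported in the individual (ungapped) sequence, not in the alignment.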

Save the alignment in different formats (PHYLIP, FASTA, NEXUS, and MSF) (File Menu -> Save as). Using a text editor (MS Word) and a non-proportional font (e.g. Courier), inspect the different formats.

NOTE:
*.MSF is read by GCG; PHYLIP (*.phy) is the "new" PHYLIP interleaved format; NEXUS is used by PAUP and MrBayes.

File — New window. This time change the alignment program to "muscle" (Align — Alignment options). Align — Align all.

Were the intein / extein junctions retained?



If you have time: Some programs require specialized formats. You can use a text editor like MS Word to get your alignment into the desired format, but it is much easier to start from a format close to the desired one. Hint: Often you do not want to use the complete alignment, but only those portions that are sufficiently conserved. You can take a file in clustal format (*.aln) and delete columns with a text editor (in MS Word on a PC, holding down the Alt key before clicking the mouse switches to column mode; to see the alignment, you may need to decrease the font size and select a non-proportional font). Although the lines in the resulting alignment have different lengths, seaview reads the aligned sequences correctly, and you can save the shortened sequences in any format you want. One problem is that on a Mac you have to convert back and forth between Mac and UNIX format. The seaview program makes it easier to select positions for further analysis.
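
Deleting columns can also be scripted. The sketch below assumes the alignment was saved from seaview in FASTA format (not clustal format) and keeps only the column ranges you list. The helper names `read_fasta`, `keep_columns`, and `write_fasta` are invented for this example, and the two toy sequences are made up.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered {name: sequence} dict."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def keep_columns(seqs, ranges):
    """Keep only the given 1-based, inclusive column ranges of an alignment."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:
        raise ValueError("sequences are not aligned: lengths differ")
    return {
        name: "".join(seq[start - 1:end] for start, end in ranges)
        for name, seq in seqs.items()
    }


def write_fasta(seqs, width=60):
    """Format {name: sequence} as FASTA text with wrapped lines."""
    lines = []
    for name, seq in seqs.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy alignment, invented for illustration: keep columns 1-3 and 7-8.
toy = ">s1\nMKV-LAGT\n>s2\nMRVALA-T\n"
trimmed = keep_columns(read_fasta(toy), [(1, 3), (7, 8)])
print(write_fasta(trimmed))
```

The resulting FASTA text can be saved to a file and opened in seaview to export it in another format.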

Part 2: More alignments -- and combining alignments

The following is a list of intein-containing yeast V-ATPase catalytic subunits -- CLICK HERE --. Download these sequences (in the "send to" pulldown menu, select file and then FASTA format), and then drag the file into the seaview alignment window. Align them into a multiple sequence alignment using clustalo (set the option in the Align menu under Alignment options, then start the alignment by selecting Align all).

Scanning through the alignment, can you predict which part of the sequences corresponds to the ATPase subunit, and which to the intein? (If you click on a residue, seaview tells you the position in the alignment and the position in the individual sequence.) Redo the alignment in muscle. Do you see any differences between the muscle and the clustalo alignment? (Yes/No; if yes, from about where to where, and in which sequence?)
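
The two positions seaview reports differ because gaps count as alignment columns but not as residues. The following sketch shows the bookkeeping; the function names `column_to_residue` and `residue_to_column` are invented for illustration and say nothing about how seaview does it internally.

```python
def column_to_residue(aligned_seq, column, gap_chars="-."):
    """Map a 1-based alignment column to the 1-based residue number, or None at a gap."""
    if not 1 <= column <= len(aligned_seq):
        raise IndexError("column outside the alignment")
    if aligned_seq[column - 1] in gap_chars:
        return None
    # Count the residues (non-gap characters) up to and including this column.
    return sum(1 for ch in aligned_seq[:column] if ch not in gap_chars)


def residue_to_column(aligned_seq, residue, gap_chars="-."):
    """Map a 1-based residue number back to its 1-based alignment column."""
    seen = 0
    for column, ch in enumerate(aligned_seq, start=1):
        if ch not in gap_chars:
            seen += 1
            if seen == residue:
                return column
    raise IndexError("sequence has fewer residues than requested")


# Toy aligned sequence, invented for illustration.
row = "MK--VLG"
print(column_to_residue(row, 5))   # the V in column 5 is the third residue
print(residue_to_column(row, 3))   # and the third residue sits in column 5
```

This is useful when you want to report where a difference between two alignments starts and ends in a particular sequence.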

Open a second seaview window and align the intein-containing sequences (the sequence names start with gi....) with the ones from the S. cerevisiae ATPaseSU file. --HERE are all the ATPase SU + the intein-containing A SU-- --Here are only the yeast A-SU --.

Seaview interface hints: To select a sequence click on the name. To select more than one sequence click and drag. To select non-adjacent sequences just click on each one. A second click will deselect the sequence. If you want to clear all your selections to start over, double click on any name. It is often necessary to move sequences in an alignment to put them next to one another for comparison. To move a sequence in SeaView, select the sequence by clicking on it. Move your cursor to the location where you want it to go and control click (hold down the control key and click). This will move your selected sequence to the new location. If you want to move a block of sequences select the block by dragging and do the same thing.

Explore different options to align the sequences with and without inteins. Define two groups of sequences (those with and without inteins) and explore a profile alignment between the two (select each group first and align within the group, then between the two groups). Try both muscle and clustalo as the alignment algorithm for the profile alignment. Is the intein clearly recognized as an inserted region?

In an alignment containing only the A-subunits with inteins, and one A-SU from yeast without intein as reference (do it yourself, or here), use gblocks to define conserved sites: In the Sites menu, select create set, select GBLOCKS, select "allow for gap positions within the final blocks". Which parts of the intein are flagged as reliably aligned (you could compare this to the conserved blocks listed in inbase here)? What do you expect to happen if you use GBLOCKS on the larger dataset? In particular, what would happen to the intein sequences? (Try it out if the answer is not obvious!)
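
To get a feeling for how site selection reacts to gaps, you can experiment with a much simpler rule than the one GBLOCKS uses. The sketch below is NOT the Gblocks algorithm: it just keeps columns in which the most common residue reaches a chosen fraction and the gap fraction stays below a chosen limit. The function names, thresholds, and toy alignment are all invented for illustration.

```python
from collections import Counter


def conserved_columns(seqs, min_identity=0.5, max_gap_fraction=0.0):
    """Return 1-based columns whose most common residue reaches `min_identity`.

    This is a deliberately simple stand-in, not the Gblocks algorithm.
    """
    aligned = list(seqs.values())
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must all have the same aligned length")
    kept = []
    for index, column in enumerate(zip(*aligned), start=1):
        gaps = column.count("-")
        if gaps / len(column) > max_gap_fraction:
            continue  # too many gaps in this column
        residues = [ch for ch in column if ch != "-"]
        if not residues:
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(column) >= min_identity:
            kept.append(index)
    return kept


def to_blocks(columns):
    """Collapse a sorted list of columns into (start, end) blocks."""
    blocks = []
    for col in columns:
        if blocks and col == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], col)
        else:
            blocks.append((col, col))
    return blocks


# Toy alignment, invented for illustration.
toy = {"s1": "MKVLA-GT", "s2": "MKILA-GT", "s3": "MRVLAQGT"}
print(to_blocks(conserved_columns(toy, min_identity=1.0)))
```

Raising `max_gap_fraction` lets you see how the selected blocks change when columns with gaps are tolerated.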

Different alignment programs create different alignments: clustal and muscle are rather similar, whereas PRANK produces alignments of a different flavor. Try an alignment of the yeast V-ATPase subunits with intein. (webPrank does not like the multiple fasta file created above; this one works.) If this takes too long, the resulting file is here. (Load the aligned multiple sequence fasta file into seaview or jalview.) How do the muscle and the prank alignments differ? What are the overall lengths of the alignments? Why might the PRANK alignment be advantageous for some downstream applications?
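
If you want numbers to go with your visual comparison, alignment length and gap content are easy to compute once the aligned sequences are loaded as strings. A minimal sketch follows; the function name `alignment_summary` and both toy alignments are invented for illustration and do not come from any of the programs above.

```python
def alignment_summary(seqs):
    """Return the number of sequences, alignment length, and overall gap fraction."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    length = lengths.pop()
    total = length * len(seqs)
    gaps = sum(s.count("-") for s in seqs.values())
    return {
        "sequences": len(seqs),
        "length": length,
        "gap_fraction": gaps / total if total else 0.0,
    }


# Two toy alignments of the same two sequences, invented for illustration.
compact = {"s1": "MKVLAGT", "s2": "MKVQAGT"}
spread = {"s1": "MKVL-AGT", "s2": "MKV-QAGT"}
print(alignment_summary(compact))
print(alignment_summary(spread))
```

The same two sequences can give a longer alignment with more gaps when residues are placed in separate columns instead of being matched to each other.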

If you have time, repeat the alignment with MAFFT (online version here). This is fast. How does the resulting alignment compare to the muscle and PRANK alignments?

Some programs let you estimate the robustness of aligned regions. Guidance, from Tal Pupko's group at TAU, produces alignments with good estimates of the reliability of individual alignment columns (the output from guidance for the ATPases alignment is here and here). For your future work, you might want to keep this service in mind. You can also use GBLOCKS in seaview (see above) to select sites that are reliably aligned, but this may be too conservative.

Finished?

Check the appropriate radio button below before pressing the submit button:

Send email to your instructor (and yourself) upon submit
Send email to yourself only upon submit (as a backup)
Show summary upon submit but do not send email to anyone.