CLASS 11. Multiple Sequence Alignments

HOMEWORK PROBLEM SET #2 is posted to Moodle. Due in class on FRIDAY, FEBRUARY 5.

INSTRUCTIONS:

For each exercise, provide the search query used and keep your answers brief. Email me your answers by Sunday, 11:59 PM AST, at the latest.

Use "CLASS 11 EXERCISE" as the message subject, and type your answers directly into the email body (i.e., no document attachments, please). Make sure that the first line of your message is your NAME.

  1. Download ClustalX v. 2.0.12 using the appropriate link in the yellow box. If you are using a class workstation, install the program on your "M" drive (e.g., into the "M:/ClustalX/" directory and NOT into the "Program Files" directory).

  2. We will be using this sequence file, which contains ATPase subunits (SUs) and homologs from several organisms in FASTA format. The sequences in this file are annotated as follows:

    - Archaeal/vacuolar ATPase A subunits, names start with A (catalytic SUs)
    - Archaeal/vacuolar ATPase B subunits, names start with B (non-catalytic or regulatory SUs)
    - Bacterial flagellar assembly ATPase subunits, names start with F
    - Bacterial (mitochondrial) F-ATPase beta subunits, names start with beta (catalytic SUs)
    - Bacterial (mitochondrial) F-ATPase alpha subunits, names start with alpha (non-catalytic or regulatory SUs)
    - Bacterial rho transcription termination factors, names start with ttf

    Start ClustalX by double clicking on the clustalx icon.

    Using the FILE menu, load the sequence file.

    Once you have loaded the sequences, calculate an alignment (Alignment menu -> "Do Complete Alignment"). The alignment will take several minutes.

    Maximize the window and scroll to position 300. Most of the ATPase subunits have a "canonical" motif (G.....GKT) characteristic of many nucleotide-binding sites. Which sequence replaces this motif in the B subunits of the vacuolar-type ATPases?
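If you want to double-check what you see by eye, the motif search can be sketched in Python. This is only a minimal illustration: it assumes the alignment has been saved as gapped FASTA text, it takes the pattern exactly as written above (G, five arbitrary residues, GKT), and the helper names `parse_fasta` and `find_motif` are made up for this example rather than part of ClustalX.

```python
import re

# Motif as written in the exercise: G, five arbitrary residues, then GKT.
MOTIF = re.compile(r"G.{5}GKT")

def parse_fasta(text):
    """Return a dict mapping each sequence name to its sequence (gaps kept)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the sequence name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def find_motif(aligned_seq, pattern=MOTIF):
    """Search the ungapped sequence for the motif.

    Returns (1-based alignment column of the first motif residue, matched text),
    or None if the motif is absent.
    """
    # Alignment column index of every non-gap residue ("-" and "." count as gaps).
    columns = [i for i, c in enumerate(aligned_seq) if c not in "-."]
    ungapped = "".join(aligned_seq[i] for i in columns).upper()
    match = pattern.search(ungapped)
    if match is None:
        return None
    return columns[match.start()] + 1, match.group()
```

Calling `find_motif` on every sequence returned by `parse_fasta` lists the sequences that carry the motif and those that return `None`; the latter are the ones to inspect in the ClustalX window.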

  3. Save the alignment in different formats (PHYLIP, FASTA, NEXUS, and MSF) (File menu -> "Save sequences as..."). Using a text editor (such as MS Word) and a fixed-width (non-proportional) font (e.g., Courier), inspect the different formats. We will use some of these formats later in the course for phylogenetic analyses.
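To see why a fixed-width font matters, it can help to build one of these formats by hand. The sketch below writes a simple sequential PHYLIP-style layout: a header line with the number of sequences and the number of columns, followed by one line per sequence with the name padded or truncated to 10 characters. The function name `to_phylip` is hypothetical, and the file ClustalX writes may be laid out differently (for example, interleaved in blocks).

```python
def to_phylip(records):
    """Format aligned sequences in a simple sequential PHYLIP-style layout.

    records: list of (name, aligned_sequence) tuples, all of equal length.
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    ncol = lengths.pop()
    # Header: number of sequences, then number of alignment columns.
    lines = [f" {len(records)} {ncol}"]
    for name, seq in records:
        # Names occupy exactly 10 characters; longer names are truncated.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"
```

Note that truncating names to 10 characters can make two different sequences look identical, so it is worth checking the names in the saved PHYLIP file before using it for phylogenetic analyses.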

  4. The following is a list of intein-containing yeast V-ATPase catalytic subunits -- CLICK HERE --. Download these sequences in FASTA format, and then align them in ClustalX as you did in exercise 2. Scanning through the alignment, can you predict which part of the sequences corresponds to the ATPase subunit, and which to the intein?

  5. Align the intein-containing sequences with the ones from the ATPaseSU file used in exercise 2. --HERE-- is a combined sequence file. Try both a profile alignment and a simple alignment of all of the sequences. In each case, is the intein clearly recognized as an inserted region?

  6. Select all sequences. In the "Alignment" menu, select the sub-menu "Alignment Parameters" and then select "Reset All Gaps Before Alignment". Change the alignment parameters (e.g., reduce the gap opening penalty in both the pairwise and the multiple alignment parameter settings). Is the intein still aligned to a "gaps-only" region in the other ATPase subunits?
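One way to make "gaps-only region" concrete is to scan the aligned sequences that lack an intein and report the runs of columns in which every one of them has a gap. The following is a rough sketch under the assumption that gaps are written as "-" and that you pass in only the intein-free sequences; the function name `gap_only_runs` and the `min_len` parameter are illustrative choices.

```python
def gap_only_runs(reference_seqs, min_len=1):
    """Return (start, end) column ranges, 1-based and inclusive, in which
    every reference sequence has a gap. Runs shorter than min_len are skipped.
    """
    ncol = len(reference_seqs[0])
    runs = []
    start = None  # 0-based column where the current run began, if any
    for col in range(ncol):
        all_gaps = all(seq[col] == "-" for seq in reference_seqs)
        if all_gaps and start is None:
            start = col
        elif not all_gaps and start is not None:
            if col - start >= min_len:
                runs.append((start + 1, col))
            start = None
    # Close a run that extends to the last column.
    if start is not None and ncol - start >= min_len:
        runs.append((start + 1, ncol))
    return runs
```

If the intein is aligned as a clean insertion, you would expect one long run covering it; after lowering the gap penalties, the same scan shows whether that run has broken up into smaller pieces.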

  7. COBALT is a new online alignment program from NCBI. Align the combined data set in COBALT and compare the result with the alignment from ClustalX.
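Besides comparing the two alignments by eye, you can put a rough number on their agreement. A common idea, sketched below, is to list every pair of residues that share a column in one alignment and ask what fraction of those pairs also share a column in the other. This assumes both alignments contain the same sequences under the same names, with gaps written as "-"; the function names are made up for this example, and neither ClustalX nor COBALT reports this value.

```python
from itertools import combinations

def aligned_pairs(alignment):
    """alignment: dict mapping name -> aligned sequence.

    Returns the set of residue pairs that share a column. Each residue is
    identified as (name, index in the ungapped sequence).
    """
    names = sorted(alignment)
    next_index = {n: 0 for n in names}
    ncol = len(alignment[names[0]])
    pairs = set()
    for col in range(ncol):
        present = []
        for n in names:
            if alignment[n][col] != "-":
                present.append((n, next_index[n]))
                next_index[n] += 1
        # Names are visited in sorted order, so each pair has a fixed orientation.
        pairs.update(combinations(present, 2))
    return pairs

def shared_pair_fraction(aln_a, aln_b):
    """Fraction of residue pairs aligned in aln_a that are also aligned in aln_b."""
    pairs_a = aligned_pairs(aln_a)
    pairs_b = aligned_pairs(aln_b)
    if not pairs_a:
        return 0.0
    return len(pairs_a & pairs_b) / len(pairs_a)
```

A value of 1.0 means every residue pair aligned by the first program is aligned the same way by the second; lower values point to regions, such as the intein boundaries, where the two programs disagree.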