Assignment for Class 14

Your name:
Your email address:

If you want to work from home: ClustalX for PCs can be found in this Zip file. For Mac OS X, follow the installation instructions given here.

If you work under OS X, your text editor (MS Word) can read files with either kind of end-of-line character and translates them correctly. A frequent problem is that the end-of-line characters used by Mac and by UNIX (including Darwin, the system that OS X runs on) differ: classic Mac files end each line with a carriage return (\r), UNIX files with a line feed (\n). A UNIX application like clustalx expects the UNIX end-of-line character; if the file uses Mac end-of-line characters, everything appears on a single line. If this happens, you can use the following terminal command (the terminal is a window that opens a command-line interface to your operating system; see here for a summary of frequently used UNIX commands):

tr '\r' '\n' < macfile.txt > unixfile.txt
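To see which end-of-line characters a file actually contains before converting it, you can dump its bytes. The following is a minimal sketch; the file names demo_mac.txt and demo_unix.txt are made up for illustration, and the tiny test file is created on the spot.

```shell
# Create a tiny demo file with classic Mac (CR-only) line endings.
# demo_mac.txt and demo_unix.txt are hypothetical names for this illustration.
printf 'line one\rline two\r' > demo_mac.txt

# od -c prints each byte; carriage returns show up as \r, line feeds as \n.
od -c demo_mac.txt

# wc -l counts line feed characters, so a CR-only file reports 0 lines.
wc -l < demo_mac.txt

# After the translation the same content reports 2 lines.
tr '\r' '\n' < demo_mac.txt > demo_unix.txt
wc -l < demo_unix.txt
```

If od shows \r where you expected \n, the file needs the tr translation shown above.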

  1. (15 minutes) Getting to know the Darwin operating system.
    The file we will work on is here. Copy this file onto the desktop (click on the link, but keep the mouse button pressed, then select "save link to ..."), or into the directory you want to work in. Open this file in MS Word. In which format are the sequences?


    Open the terminal application (if it is not on the dock already, you find it in the application folder, within the utilities subfolder).
    Type pwd, then press <return>. (Things you have to type in the terminal window are in Courier font. If there is a > at the beginning of the line, it represents the system prompt; do not type it.)
    pwd is a command that returns (prints) the current working directory. What is your current working directory?

    Type ls (short for list) to see the contents of your working directory.
    "cd name" changes your working directory to the directory called name. Change your working directory to the directory that contains the ATPases file. If you put the file onto your desktop, typing cd Desktop should work. Typing cd .. moves you up one step in the directory hierarchy, i.e. out of a subfolder.
    Type ls to see a listing of your current working directory.
    Type more ATPaseSU.MAC.fa .
    Does the file content look normal?
    What character do you see where you would have expected a new line to start?
    Press the spacebar to scroll to the end of the file.

    Type
    tr '\r' '\n' < ATPaseSU.MAC.fa > ATPaseSU.unix.fa

    (This translates the Mac end-of-line characters into UNIX end-of-line characters.)
    (If this does not work on your system, try /usr/bin/tr '\015' '\n' < ATPaseSU.MAC.fa > ATPaseSU.unix.fa ; \015 is the octal code for the carriage return.)

    Type more ATPaseSU.unix.fa .
    Does the file look normal?
    Try to open both versions in MSWord.

    Note: if you frequently use UNIX applications on a Mac under OS X, you should get the ConvertNewlines application (it is installed on the Macs in the computer lab). Find the program, then drag the file you want to convert onto its icon.
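One way to confirm that the conversion in this exercise worked is to count the carriage returns before and after. The sketch below uses a hypothetical helper, count_cr, and demo file names that are not part of the assignment; substitute your own files when you try it.

```shell
# count_cr FILE: print how many carriage return (\r) bytes FILE contains.
# (Hypothetical helper written for this illustration.)
count_cr() {
    # tr -cd deletes every byte except \r; wc -c counts what is left.
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Build a small stand-in file with Mac (CR-only) line endings.
printf 'first line\rsecond line\rthird line\r' > demo.MAC.txt

# Same translation as in the exercise.
tr '\r' '\n' < demo.MAC.txt > demo.unix.txt

count_cr demo.MAC.txt    # prints 3
count_cr demo.unix.txt   # prints 0
```

Because tr replaces each character with exactly one other character, the two files also have the same size in bytes.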

  2. (20 minutes) Using the graphical user interface (GUI), locate the clustalx application (in mcb221, in the clustal folder). Start the program by double-clicking the clustalx.app icon. Using the File menu, try to open both versions of the sequence file in clustalx.
    Once you have loaded the sequences, calculate an alignment (Alignment menu -> Do complete Alignment now). Maximize the window and scroll to position 300. Most of the ATPase subunits have a "canonical" motif (G.....GKT) characteristic of many nucleotide binding sites. Which sequence replaces this motif in the B subunits of the vacuolar type ATPases?

    Save the alignment in different formats (PHYLIP, FASTA, NEXUS, and MSF) (File menu -> Save sequences as). Using a text editor (MS Word) and a non-proportional font (e.g. Courier), inspect the different formats.

    NOTE :
    *.MSF is read by GCG; PHYLIP (*.phy) is the "new" PHYLIP interleaved format; NEXUS is used by PAUP and MrBayes.

    If you have time: You can use the input/output options to reformat an alignment. Some programs require specialized formats. You can use a text editor like MS Word to get your alignment into the desired format, but things are much easier if you start out with a format close to the desired one. Hint: Often you do not want to use the complete alignment, but only those portions that are sufficiently conserved. You can take a file in clustal format (*.aln) and delete columns with a text editor. In MS Word on a PC, holding down the Alt key before clicking the mouse switches to column mode; to see the alignment, you often need to decrease the font size and select a non-proportional font in your editor. Although the lines in the resulting alignment have different lengths, clustal reads the aligned sequences correctly, and you can output the shortened sequences in any format you want. The drawback is that on a Mac you have to convert back and forth between Mac and UNIX end-of-line formats.
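Converting back and forth between the two end-of-line formats, as mentioned in the hint for exercise 2, just means swapping the two arguments of tr. The following is a minimal sketch; the demo file names and the two-line stand-in content are made up for illustration and are not real clustal output.

```shell
# A small stand-in file with UNIX (LF) line endings.
# All demo.* file names are hypothetical.
printf 'header line\nseqA  MKT\n' > demo.unix.aln

# UNIX -> Mac: turn every line feed into a carriage return.
tr '\n' '\r' < demo.unix.aln > demo.mac.aln

# Mac -> UNIX: the translation used earlier in this assignment.
tr '\r' '\n' < demo.mac.aln > demo.roundtrip.aln

# cmp is silent and returns success when the two files are byte-identical.
cmp demo.unix.aln demo.roundtrip.aln && echo "round trip OK"
```

Edit the Mac version in the text editor, then translate it back to the UNIX version before loading it into clustalx again.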

  3. (15 minutes) Jalview is an excellent Java applet for inspecting and editing multiple sequence alignments. It also allows inspection of protein space for the aligned sequences, which works surprisingly well. The Jalview homepage contains a lot of additional information.
    Start Jalview as a Java Web Start application (if the window does not appear after a few seconds, check the dock for the Java icon and click on it).
    Load the ATPaseSU.unix.aln or ATPaseSU.unix.fa file (sequences need to be aligned!) into Jalview (File menu -> load local file).
    Explore the different coloring options (Colour menu). Which one seems to work best, i.e. is most meaningful? (Scroll through the alignment to a more conserved region.)




    Note: You can edit the alignment by pointing at an amino acid residue and dragging it to the right or left. Try it, but leave the sequences in an aligned state before you move on.

    CALCULATE an AVERAGE DISTANCE TREE USING PID
    Click somewhere in the resulting tree to color groups of related sequences in the same color.

    CALCULATE the PRINCIPAL COMPONENT ANALYSIS.
    In a principal component analysis, the new dimensions are calculated as linear combinations of the original dimensions, so that the greatest variance by any projection of the data set comes to lie on the first axis, the second greatest on the second axis, and so on for the following dimensions. Can you find a higher dimension that breaks up the vacuolar ATPase A subunits? (Their names start with A.)
    Which of the A subunit sequences cluster together, if you use this dimension (2, 3 and 5 worked for me)?
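The average distance tree in exercise 3 is built from PID (percent identity) values between pairs of aligned sequences. The sketch below shows one simple way to compute such a value for two aligned sequences of equal length. The helper name pid is hypothetical, and skipping every column that contains a gap is an assumption; Jalview's own definition may treat gaps differently.

```shell
# pid SEQ1 SEQ2: percent identity of two aligned sequences of equal length.
# Assumption: columns where either sequence has a gap (-) are not counted.
pid() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = length(a); same = 0; compared = 0
        for (i = 1; i <= n; i++) {
            x = substr(a, i, 1); y = substr(b, i, 1)
            if (x == "-" || y == "-") continue   # skip gapped columns
            compared++
            if (x == y) same++
        }
        if (compared == 0) { print "0.0"; exit }
        printf "%.1f\n", 100 * same / compared
    }'
}

# Three of the four gap-free columns match.
pid "GKT-A" "GKSLA"    # prints 75.0
```

Sequences with a high PID end up close together in the tree, which is why clicking in the tree colors groups of related sequences.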



Finished?

Check the appropriate radio button below before pressing the submit button:

Send email to your instructor (and yourself) upon submit
Send email to yourself only upon submit (as a backup)
Show summary upon submit but do not send email to anyone.