CLASS 26. Phylogenetic Trees in ClustalX. Tree Visualization.

INSTRUCTIONS:

For each exercise, provide the search query used and keep your answers brief. Email me the answers by Sunday, 11:59 PM AST at the latest.

Use "CLASS 26 EXERCISE" as the message subject, and type your answers directly into the email body (i.e., no document attachments please). Make sure that the first line of your message is your NAME.

PART 1.

  1. Download this file onto your computer. Download the program ClustalX (if you have not kept it on the M drive), start it, and load the sequences into ClustalX (File > Load sequences).
    The sequences are named according to the following schema:

    A denotes the catalytic ATP binding subunits of the vacuolar proton pumping ATPase that is found on membranes of the eukaryotic endomembrane system and of the archaeal type A-ATPsynthase
    - these were called A-subunits, because they are the largest subunits of the water soluble head group of these ATPase/ATPsynthases

    bet denotes the catalytic ATP binding subunits of the bacterial type F-ATPsynthase (also present in mitochondria and plastids)
    - these were called beta subunits because they are the second largest subunit in the head group

    B denotes the non-catalytic ADP binding subunits of the vacuolar proton pumping V-ATPase that is found on membranes of the eukaryotic endomembrane system and of the archaeal type A-ATPsynthase
    - these were called B-subunits, because they are the second largest subunits in the head group of these ATPase/ATPsynthases

    alp denotes the non-catalytic ADP binding subunits of the bacterial type F-ATPsynthase
    - these were called alpha subunits because they are the largest subunit in the head group

    fl denotes proteins that if mutated prevent the assembly of the bacterial flagella

    The subunit designation is followed by the genus name:

    Mus musculus - house mouse, an animal (mammal)
    Arabidopsis thaliana - thale cress, a flowering plant (model organism for botanists)
    Neurospora crassa - baker's mold (model organism for geneticists)
    Ecoli - Escherichia coli, a Gram-negative bacterium (model organism for microbiologists)
    Aquifex aeolicus - an extremely thermophilic bacterium (model organism for astrobiologists)
    Methanosarcina barkeri - a euryarchaeote
    Sulfolobus acidocaldarius - a crenarchaeote
    Plasmodium, Trypanosoma - flagellated protozoa

  2. Align the sequences using the default options (Alignment > Do complete alignment)

  3. Calculate neighbor-joining trees (Trees > Draw N-J tree) using each of the following option combinations (don't forget to give different names to your trees!):

    Exclude positions with gaps (unchecked) - correct for multiple substitutions (unchecked)
    Exclude positions with gaps (unchecked) - correct for multiple substitutions (checked)
    Exclude positions with gaps (checked) - correct for multiple substitutions (unchecked)
    Exclude positions with gaps (checked) - correct for multiple substitutions (checked)

    Save the trees into different files using names that allow you to remember which options you used - the absence of a logfile is one of the drawbacks of ClustalX.
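
To get a feeling for what the two checkboxes do to the distances that feed the neighbor-joining algorithm, here is a minimal Python sketch. It rests on several assumptions that are not stated in this handout: that "exclude positions with gaps" drops every alignment column in which any sequence has a gap, that without this option gapped positions are only skipped within each pair, and that the protein correction is Kimura's empirical formula d = -ln(1 - p - 0.2 p^2). The toy alignment and all function names are invented for illustration; this is not ClustalX's own code.

```python
import math
from itertools import product

GAP = "-"

def gap_free_columns(alignment):
    """Indices of alignment columns in which no sequence has a gap."""
    length = len(next(iter(alignment.values())))
    return [i for i in range(length)
            if all(seq[i] != GAP for seq in alignment.values())]

def p_distance(a, b, columns):
    """Fraction of differing residues over `columns`.

    Columns where either of the two sequences has a gap are skipped.
    """
    pairs = [(a[i], b[i]) for i in columns if a[i] != GAP and b[i] != GAP]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)

def kimura_protein(p):
    """Kimura's empirical correction: d = -ln(1 - p - 0.2 * p**2)."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("sequences too divergent for this formula")
    return -math.log(inner)

def distance(alignment, name1, name2, exclude_gaps, correct):
    """Pairwise distance under one combination of the two options."""
    length = len(alignment[name1])
    columns = gap_free_columns(alignment) if exclude_gaps else range(length)
    p = p_distance(alignment[name1], alignment[name2], columns)
    return kimura_protein(p) if correct else p

# Invented toy alignment, not taken from the exercise files.
alignment = {
    "A_Mus":    "MKTAYIAKQR",
    "A_Arab":   "MKTAY-AKQR",
    "fl_Ecoli": "MR-GFLS-QK",
}

# Print the A_Mus / fl_Ecoli distance for all four option combinations.
for exclude_gaps, correct in product([False, True], repeat=2):
    d = distance(alignment, "A_Mus", "fl_Ecoli", exclude_gaps, correct)
    print(f"exclude_gaps={exclude_gaps!s:5} correct={correct!s:5} d={d:.3f}")
```

Note that a gap in a third sequence (here A_Arab) changes the A_Mus / fl_Ecoli distance only when gapped columns are excluded, and that the correction inflates large distances much more than small ones.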

  4. Download NJPlot. Using NJPlot, for now load the tree file that was calculated including positions with gaps and without correction for multiple substitutions. (ClustalX generates two types of tree files: *.dnd is the tree that is used to guide the alignment; *.ph is the neighbor-joining tree in Newick format. The *.ph trees are the ones you want to explore.)
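
If you are curious what a Newick file contains, the following Python sketch reads a Newick string and lists its leaf names. It is a simplified reader written for illustration only: it drops branch lengths and internal node labels, does not handle quoted names or comments, and the example tree string is invented rather than produced by ClustalX.

```python
def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names.

    Branch lengths and internal node labels are skipped.
    """
    s = "".join(text.split()).rstrip(";")  # remove whitespace and final ";"
    pos = 0

    def skip_annotation():
        # Skip a label and/or ":length" up to the next structural character.
        nonlocal pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # consume ")"
            skip_annotation()
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]
        skip_annotation()
        return name

    return node()

def leaves(tree):
    """Flatten a nested-tuple tree into a list of leaf names."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]

# Invented example in the multi-line layout that tree files often use.
example = """(
(A_Mus:0.10,
A_Arab:0.20)
:0.05,
bet_Ecoli:0.30,
fl_Ecoli:0.90);"""

tree = parse_newick(example)
print(tree)
print(leaves(tree))
```

Each pair of parentheses corresponds to one internal node of the tree; the number after a colon is the length of the branch leading to that node or leaf.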

  5. In njplot explore the new outgroup option.
    Where do you think the root of the tree might be located?
    Which subunits are paralogs, which are probably orthologous? In particular, are the beta and A subunits orthologs or paralogs?
    Which of the bifurcations correspond to gene duplications?

  6. In njplot explore the swap nodes option. Does this change the tree?
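
One way to check whether two drawings show the same tree is to compare the groups of leaves they define instead of the order in which the leaves are drawn. The Python sketch below does this for trees written as nested tuples; the trees and leaf names are invented for illustration, and the comparison treats each tree as rooted the way it is written (choosing a different outgroup changes these groups even though the unrooted tree stays the same).

```python
def leaf_set(tree, collected):
    """Return the set of leaves below `tree`, recording every internal group."""
    if isinstance(tree, str):
        return frozenset([tree])
    members = frozenset().union(*(leaf_set(child, collected) for child in tree))
    collected.add(members)
    return members

def clades(tree):
    """Set of leaf groups (clades) defined by the internal nodes of `tree`."""
    collected = set()
    leaf_set(tree, collected)
    return collected

# Invented example trees, written as nested tuples of leaf names.
original = (("A_Mus", "A_Arab"), ("bet_Ecoli", "bet_Aquifex"), "fl_Ecoli")
# Same tree with the children of every node drawn in a different order.
swapped = ("fl_Ecoli", ("bet_Aquifex", "bet_Ecoli"), ("A_Arab", "A_Mus"))
# A tree that really groups the leaves differently.
regrouped = (("A_Mus", "bet_Ecoli"), ("A_Arab", "bet_Aquifex"), "fl_Ecoli")

print(clades(original) == clades(swapped))
print(clades(original) == clades(regrouped))
```

The same comparison can be used in exercise 8 to decide whether two of your neighbor-joining trees differ in their branching order or only in how they are drawn.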

  7. In njplot explore the subtree option. Can you manage to draw a tree from which only the flagellar assembly ATPase are excluded?

  8. Compare the neighbor joining trees that clustalx calculated using the different options.
    What are the differences?

PART 2.

  10. Load the sequences contained in testseq1.txt (V/A-ATPase catalytic subunits) into clustalx. Align the sequences (note the inteins in two of the sequences) and calculate a neighbor joining tree. Load the tree into NJPlot. Use Sulfolobus and Thermococcus as the outgroup. Does the tree correspond to your expectations?

    Sulfolobus and Thermococcus are Archaea, Borrelia is a spirochete (bacterium), Acetabularia is a green alga, Daucus is a flowering plant (carrot), Candida and Saccharomyces are yeasts, Neurospora is another fungus (not a yeast though), Drosophila is a fruit fly and Trypanosomes are protists.

  11. The sequences in testseq1.txt are quite similar to one another. To test the effect of long branches, a homologous but only distantly related sequence was added to this file (the ATPase involved in flagellar assembly from Salmonella). The resulting file is testseq1b.txt.
    Align the sequences and calculate neighbor-joining trees for this file using all four combinations of the two options (gaps / no gaps, with and without correction for multiple substitutions). Which of the resulting trees appears to best reflect the actual evolution? Give a justification for your choice. What might be the reason that the other options worked less well?

  12. What do you expect to happen when you replace the Salmonella sequence with a completely (?) unrelated sequence (testseq1c.txt)? Is your expectation confirmed?