Assignment #10

Your name:
Your email address:

Note: To do these exercises you need to install ClustalX, Phylip, and NJplot on your computer.

Download ClustalX from HERE. Drag the ClustalX icon to your Desktop. (Original source is here, for PCs and Macs.)

Download Phylip from HERE. Drag the "phylip-3.69" folder to your Desktop. (Original source is here, for PCs and Macs.)

Download NJplot from HERE (original link is here for PCs and Macs).

Download Figtree from here. (The latest versions for different operating systems are here)

Download Seaview. The latest versions of the seaview program available for different platforms are here.

Download TreeViewX from HERE (original link is here for PCs and Macs).

If you're doing this exercise from home, you may need WinZip or StuffIt Expander installed to unpack the downloads (read the help manuals to learn how to use these programs). UConn's ftp server has copies of these programs in the restricted folder (only accessible from within UConn).

Exercise 1: 30 minutes: Using ClustalX to perform bootstrap analyses using neighbor joining

Load these sequences into ClustalX, align them using the default options. From the menu:

File -- Load Sequences

Alignment -- Do Complete Alignment

In the Trees menu select "correct for multiple substitutions" and "exclude positions with gaps" (checkmarks should be visible next to both in the menu before you start the bootstrap analysis).
[If you plan to inspect the tree in TreeView or FigTree: in the output format options, click on "branch" and select "node" from the menu. This changes where the bootstrap values are written, as labels on the branches or as labels on the nodes. The latter is philosophically not really correct (these are unrooted trees), but it is the way that many programs read the tree files.]

Calculate "Bootstrap NJ tree". The results will be written into a tree file called something.phb (another tree, calculated during the alignment, has the extension .dnd).

Inspect the tree using njplot. Root the tree as appropriate and turn on the display of bootstrap support values.

Which branches have only low bootstrap support? Are any of these findings surprising? What might be the reason?

1B. Using Phylip to perform bootstrap analyses using parsimony

[comment: You could perform the same analysis in seaview using a GUI; however, it is a good idea to go through this step by step, because you may get a better understanding of how the non-parametric bootstrap works.]

While still in your ClustalX alignment:

File -- Save Sequences as...

tick PHYLIP format

This will generate a file ending in ".phy" with your clustal alignment in PHYLIP format.

Edit this ATPaseSU.phy file, changing all gap "-" characters to missing data "?" characters. (In MSWord or TextWrangler)
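
If you would rather script this replacement than do it by hand, the following is a minimal Python sketch. It assumes the file uses the PHYLIP layout in which the first line is the header and the sequence names occupy the first 10 columns of the first block; the function name gaps_to_missing and the name_width parameter are made up for this illustration. Restricting the replacement to the sequence columns keeps any "-" inside a sequence name intact.

```python
def gaps_to_missing(phylip_text, name_width=10):
    """Replace '-' with '?' in the sequence data of a PHYLIP alignment.

    Assumption: line 1 is the header "ntaxa nsites", and the next ntaxa
    non-empty lines start with a name padded to name_width columns.
    """
    lines = phylip_text.splitlines()
    n_taxa = int(lines[0].split()[0])
    out = [lines[0]]                 # header is copied unchanged
    named_remaining = n_taxa         # lines that still carry a name
    for line in lines[1:]:
        if named_remaining > 0 and line.strip():
            name, seq = line[:name_width], line[name_width:]
            out.append(name + seq.replace("-", "?"))
            named_remaining -= 1
        else:
            # later interleaved blocks (and blank lines) hold no names
            out.append(line.replace("-", "?"))
    return "\n".join(out) + "\n"
```

To apply it, read ATPaseSU.phy into a string, pass it through gaps_to_missing, and write the result back to a file of your choice.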

Copy this ATPaseSU.phy file into the phylip-3.69/exe folder

Go into the "phylip-3.69" folder on your Desktop, then into "exe"

Run "seqboot"

seqboot: can't find input file "infile"
Please enter a new file name> ATPaseSU.phy
Y
Random number seed (must be odd)?
an-odd-number, e.g. 5 (do NOT type "an-odd-number")

completed replicate number   10
completed replicate number   20
completed replicate number   30
completed replicate number   40
completed replicate number   50
completed replicate number   60
completed replicate number   70
completed replicate number   80
completed replicate number   90
completed replicate number  100

Output written to file "outfile"

Done.

Rename "outfile" to ATPaseSUboot.phy
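
To make concrete what this resampling step does, here is a minimal Python sketch of the non-parametric bootstrap: each replicate is built by drawing alignment columns with replacement until it has as many columns as the original alignment. The dictionary layout (name to aligned sequence) and the function names are assumptions for illustration; this is not seqboot's actual code.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so that results are reproducible.
    """
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices with replacement; the same indices are
    # used for every sequence so that whole columns are resampled.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates, seed):
    """Return a list of n_replicates resampled alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

Some columns appear several times in a replicate and others not at all, which is why the trees inferred from different replicates can differ.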

Run "protpars"

protpars: can't find input file "infile"
Please enter a new file name> ATPaseSUboot.phy
j  (Randomize input order of sequences)
Random number seed (must be odd)?
an-odd-number, e.g. 5 (do NOT type "an-odd-number")
Number of times to jumble?
1
m  (Analyze multiple data sets)
Multiple data sets or multiple weights? (type D or W)
d
How many data sets?
100
Random number seed (must be odd)?
1
Number of times to jumble?
1
y  (go!)

Rename outtree to ATPaseSUboot.tre

Run "consense"

consense: can't find input tree file "intree"
Please enter a new file name> ATPaseSUboot.tre 

consense: the file "outfile" that you wanted to
     use as output file already exists.
     Do you want to Replace it, Append to it,
     write to a new File, or Quit?
     (please type R, A, F, or Q) 
R

Are these settings correct? (type Y or the letter for one to change)
y

Consensus tree written to file "outtree"

Output written to file "outfile"

Done.

Rename outtree to ATPaseSUconsensus.tre
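
The support values you will see on the consensus tree reflect how many of the 100 replicate trees contain a given grouping. A minimal Python sketch of that bookkeeping follows. It assumes that each tree is given as nested tuples of taxon names rather than as Newick text, and the function names are hypothetical; it illustrates the counting only and is not how consense is implemented.

```python
from collections import Counter


def leaves(tree):
    """Return the set of taxon names below a node (a name or a tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial splits of a tree, treated as unrooted.

    Each split is stored as the side that does NOT contain the
    alphabetically first taxon, so both orientations compare equal.
    """
    all_taxa = leaves(tree)
    anchor = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_taxa - clade if anchor in clade else clade
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def split_support(trees):
    """Return the percentage of trees in which each split occurs."""
    counts = Counter(s for t in trees for s in splits(t))
    return {s: 100.0 * c / len(trees) for s, c in counts.items()}
```

For example, if three of four replicate trees group A with B and one groups A with C, the split AB|CD gets 75% support and AC|BD gets 25%.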

Inspect the tree using njplot. Root the tree as appropriate and turn on the display of bootstrap support values.

Which branches have only low bootstrap support? Very briefly, how does this compare to the previous tree made with ClustalX using a distance-based analysis?

1C. Using PhyML to perform bootstrap analyses using maximum likelihood

Sadly, this would take too long. We will skip on to the second exercise, but should you be interested:

HERE is a tree from a maximum-likelihood (ML) reconstruction using PhyML Online.

 

Exercise 2: 30 minutes

Long Branch Attraction (LBA) is a serious problem in phylogenetic reconstruction. LBA denotes the tendency of long branches to be grouped together, often with significant support, even though the organisms representing the long branches do not share a more recent common ancestor. The support is usually measured through bootstrap support values for the different trees. We have simulated the evolution of 4 sequences (named A, B, C, D) according to the following tree:

Files containing these sequences in multiple-sequence FASTA format were generated and named according to the length x chosen for the two long branches (all branch lengths are in substitutions per site). For the simulation we assumed that the among-site rate variation can be described by a gamma distribution with a shape parameter of 1 (which is equal to an exponential distribution).
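
As an illustration of this rate model, the sketch below draws per-site relative rates from a gamma distribution with shape 1. The scale is set to 1/shape so that the mean rate is 1; that scaling is an assumption made here, since the text above only specifies the shape. The code uses random.gammavariate from Python's standard library, and the function name site_rates is hypothetical.

```python
import random


def site_rates(n_sites, shape, rng):
    """Draw n_sites relative rates from a gamma distribution with mean 1.

    random.gammavariate takes (shape, scale); with scale = 1/shape the
    mean is 1. For shape = 1 this is an exponential distribution.
    """
    scale = 1.0 / shape
    return [rng.gammavariate(shape, scale) for _ in range(n_sites)]


if __name__ == "__main__":
    rates = site_rates(1000, 1.0, random.Random(1))
    slow = sum(r < 0.1 for r in rates)
    fast = sum(r > 3.0 for r in rates)
    print("sites with rate < 0.1:", slow)
    print("sites with rate > 3.0:", fast)
```

With a shape of 1, many sites evolve much more slowly than average while a few evolve several times faster, so a branch that is long on average still contains sites that are nearly unchanged.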

These files are HERE (open the folder, then ctrl click on the individual files to save them into the "phylip-3.69/exe" folder).

Your task is to explore the sensitivity of different phylogenetic reconstruction algorithms towards LBA. At the minimum you should use protein parsimony and one protein distance matrix approach. In this case we know that the sequences are aligned as given; however, to explore the effect that the alignment algorithm has on LBA, you can align them before phylogenetic reconstruction. To keep track of things, name the files accordingly.

NOTE I: If you want to explore the effect of alignment, it might be a good idea to use seaview with muscle as the alignment program; clustalx takes a very long time, especially for the more divergent sequences. You should do at least one of the exercises using the phylip programs directly; for the remaining ones, you could use the GUI provided in seaview.

Note II: You can divide the labor with your neighbor, distributing different sequences to different students.

We will use the programs as implemented in SEAVIEW.

2A: To test parsimony, choose the files with x = 0.1, 0.3, 1, 3, 10.

How long are the sequences before and after alignment with muscle (the default in the align menu of seaview)?

For each of the datasets, use the tree menu in seaview: select parsimony, uncheck "ignore all gap sites", check "gaps as unknown states", and check "bootstrap with 100 replicates". (Note: If you are interested in the best parsimony tree, use the original dataset (not bootstrapped) and randomize the input order for several independent heuristic searches. In a bootstrap analysis, repeated heuristic searches for each dataset are not worth the time.)

In the following box list the files that you chose, aligned or as provided, the bootstrap support for the correct tree, and the support for the LBA tree:

2B: Explore a distance-matrix-based approach with respect to LBA. Depending on the settings, distance methods might be less sensitive to LBA. x = 0.3, 3, 30, 300 are good choices to explore.

In the following box list the parameters you selected in seaview, the files that you chose (aligned or as provided), and for each file indicate the bootstrap support for the correct tree, and the support for the LBA tree:

 

2C (optional) Explore how the sensitivity of Protdist towards LBA depends on the correction for multiple substitutions.

In the following box, list the parameters you chose for protdist and the files that you chose, indicate whether you aligned them or used them as provided, and for each file give the bootstrap support for the correct tree and the support for the LBA tree:
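
To see why the correction for multiple substitutions matters, the following minimal Python sketch compares the uncorrected proportion of differing sites (the p-distance) with two textbook corrections: the Poisson correction, d = -ln(1 - p), and Kimura's empirical formula for proteins, d = -ln(1 - p - 0.2 p^2). These formulas are given for illustration only; they are not necessarily what protdist computes under the model you select, and the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping gaps and missing data."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b)
             if a not in "-?" and b not in "-?"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p):
    """Poisson-corrected distance; undefined for p >= 1."""
    if p >= 1.0:
        raise ValueError("distance undefined for p >= 1")
    return -math.log(1.0 - p)


def kimura_protein_distance(p):
    """Kimura's empirical correction; undefined for large p (about 0.85)."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("distance undefined for this p")
    return -math.log(inner)
```

The corrected distances grow faster than p as sequences diverge. Without correction, the lengths of long branches are underestimated, which is one reason uncorrected distances can be more prone to LBA.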

 

 
