Assignment #13 for Class 37

Your name:
Your email address:

Note: To do these exercises you need to install ClustalX, PHYLIP, and NJplot on your computer (TreeViewX is optional).

Download ClustalX from HERE. Drag the ClustalX icon to your Desktop. (Original source is here, for PCs and Macs.)

Download Phylip from HERE. Drag the "phylip-3.68" folder to your Desktop. (Original source is here, for PCs and Macs.)

Download NJplot from HERE (original link is here for PCs and Macs).

Download TreeViewX from HERE (original link is here for PCs and Macs).

If you're doing this exercise from home, you may need to have WinZip or StuffIt Expander installed (read the help manuals to learn how to use these programs). UConn's ftp server has copies of these programs in the restricted folder (only accessible from within UConn).

Exercise 1: 30 minutes

1A. Using ClustalX to perform bootstrap analyses using neighbor joining

Load these sequences into ClustalX and align them using the default options. From the menu:

File -- Load Sequences

Alignment -- Do Complete Alignment

In the Trees menu, select "correct for multiple substitutions" and "exclude positions with gaps" (check marks should be visible next to both in the menu before you start the bootstrap analysis).
[If you plan to inspect the tree in TreeView: in the output format options, click on "branch" and select "node" from the menu. This setting controls whether the bootstrap values are written as labels on the branches or as labels on the nodes. Node labels are philosophically not quite correct (these are unrooted trees), but this is how many programs read tree files.]

Calculate "Bootstrap NJ tree". The results will be written to a tree file called something.phb (the guide tree calculated during the alignment is a different tree, with the extension .dnd).
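For each bootstrap replicate, a tree is built by neighbor joining (Saitou and Nei 1987). The Python sketch below shows only that tree-building step on a small, made-up, additive distance matrix. It is not ClustalX's code, it ignores the distance correction and gap options set in the Trees menu, and the function name and the internal node labels node1, node2, ... are hypothetical.

```python
def neighbor_joining(taxa, dist):
    """Return the edges (node, node, length) of an unrooted neighbor-joining tree.

    taxa: list of leaf names; dist: dict frozenset({x, y}) -> distance.
    Internal nodes get the hypothetical labels node1, node2, ...
    """
    d = dict(dist)  # working copy; new internal nodes are added to it
    nodes = list(taxa)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r(i): sum of distances from i to every other active node
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        best = None
        for a in range(n):
            for b in range(a + 1, n):
                i, j = nodes[a], nodes[b]
                q = (n - 2) * d[frozenset((i, j))] - r[i] - r[j]
                if best is None or q < best[0]:  # ties: first pair found wins
                    best = (q, i, j)
        _, i, j = best
        dij = d[frozenset((i, j))]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch from i to new node
        counter += 1
        u = f"node{counter}"
        edges += [(i, u, li), (j, u, dij - li)]
        # distance from the new node u to every remaining node k
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Illustrative (made-up) additive distance matrix for five taxa
taxa = ["a", "b", "c", "d", "e"]
rows = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
example = {frozenset((taxa[i], taxa[j])): rows[i][j]
           for i in range(5) for j in range(i + 1, 5)}
for edge in neighbor_joining(taxa, example):
    print(edge)
```

Because this matrix is additive, the recovered branch lengths reproduce every input distance; with real (bootstrapped) data the distances are not additive and NJ only approximates them.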

Inspect the tree using NJplot. Root the tree as appropriate and turn on the display of bootstrap support values.

Which branches have only low bootstrap support? Are any of these findings surprising? What might be the reason?

1B. Using Phylip to perform bootstrap analyses using parsimony

While still in your ClustalX alignment:

File -- Save Sequences as...

tick PHYLIP format

This will generate a file ending in ".phy" with your clustal alignment in PHYLIP format.

Edit this ATPaseSU.phy file, changing all gap "-" characters in the sequences (not in the taxon names) to missing-data "?" characters.
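If you prefer not to edit the file by hand, a short Python sketch can do the substitution. It assumes a strict PHYLIP layout (a header line "ntax nchar", then one line per taxon in the first block with the name in a 10-character field, optionally followed by interleaved blocks without names), so hyphens inside taxon names are left alone. Check that your ClustalX output matches this layout before relying on it; the function name gaps_to_missing is hypothetical.

```python
def gaps_to_missing(phylip_text: str) -> str:
    """Replace '-' with '?' in the sequence data of a strict PHYLIP alignment.

    Assumes the first ntax non-blank lines after the header each begin with a
    10-character name field, and that later (interleaved) lines carry no names.
    """
    lines = phylip_text.splitlines()
    ntax = int(lines[0].split()[0])  # header: number of taxa, number of sites
    out = [lines[0]]
    named = 0  # how many name-carrying lines have been seen so far
    for line in lines[1:]:
        if named < ntax and line.strip():
            # keep the 10-character name field untouched
            out.append(line[:10] + line[10:].replace("-", "?"))
            named += 1
        else:
            out.append(line.replace("-", "?"))
    return "\n".join(out) + "\n"


# Usage (overwrites the file in place):
# with open("ATPaseSU.phy") as fh:
#     text = fh.read()
# with open("ATPaseSU.phy", "w") as fh:
#     fh.write(gaps_to_missing(text))
```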

Copy this ATPaseSU.phy file into the phylip-3.68/exe folder

Go into the "phylip-3.68" folder on your Desktop, then into "exe"

Run "seqboot"

seqboot: can't find input file "infile"
Please enter a new file name> ATPaseSU.phy
Y
Random number seed (must be odd)?
an-odd-number

completed replicate number   10
completed replicate number   20
completed replicate number   30
completed replicate number   40
completed replicate number   50
completed replicate number   60
completed replicate number   70
completed replicate number   80
completed replicate number   90
completed replicate number  100

Output written to file "outfile"

Done.

Rename "outfile" to ATPaseSUboot.phy
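SEQBOOT's default method is the bootstrap: each replicate has as many columns as the original alignment, drawn at random with replacement, so some columns appear several times and others not at all. A minimal Python sketch of that resampling follows; the function name and the toy alignment are hypothetical, and Python's generator does not need the odd seed that PHYLIP asks for.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed):
    """Yield bootstrap pseudo-alignments by resampling columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (all the same length).
    """
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(n_replicates):
        # draw `length` column indices with replacement
        cols = [rng.randrange(length) for _ in range(length)]
        # every taxon uses the same columns, so aligned sites stay aligned
        yield {name: "".join(alignment[name][c] for c in cols) for name in names}


toy = {"A": "MKVLAT", "B": "MKVLST", "C": "MRILAT"}
reps = list(bootstrap_replicates(toy, 100, seed=7))
print(reps[0])
```

The key design point is that whole columns are resampled, not individual residues, so each replicate is still a valid alignment of the same taxa.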

Run "protpars"

protpars: can't find input file "infile"
Please enter a new file name> ATPaseSUboot.phy
j  (Randomize input order of sequences)
Random number seed (must be odd)?
an-odd-number
Number of times to jumble?
1
m  (Analyze multiple data sets)
Multiple data sets or multiple weights? (type D or W)
d
How many data sets?
100
Random number seed (must be odd)?
1
Number of times to jumble?
1
y  (go!)

Rename outtree to ATPaseSUboot.tre
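To see what a parsimony score is, the Python sketch below counts the minimum number of changes on a given tree with Fitch's algorithm, treating every amino-acid change as cost 1 and "?" as compatible with any residue. This is a simplification: PROTPARS uses a more elaborate cost scheme that takes the genetic code into account, so its scores will not match this sketch. The tree encoding (nested pairs) and the function names are hypothetical.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def fitch_site(tree, states):
    """Return (possible_states, changes) for one column on a binary tree.

    tree: nested 2-tuples of taxon names, e.g. (("A", "B"), ("C", "D")).
    states: dict mapping taxon name -> residue in this column.
    """
    if isinstance(tree, str):  # a leaf
        residue = states[tree]
        # missing data ('?') is compatible with every amino acid
        return (set(AMINO_ACIDS) if residue == "?" else {residue}), 0
    left, right = tree
    s1, c1 = fitch_site(left, states)
    s2, c2 = fitch_site(right, states)
    common = s1 & s2
    if common:  # children can agree: no extra change needed
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # disagreement costs one change


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


aln = {"A": "MKV", "B": "MKI", "C": "MRI", "D": "MRV"}
print(parsimony_score((("A", "B"), ("C", "D")), aln))
print(parsimony_score((("A", "C"), ("B", "D")), aln))
```

The most parsimonious tree is the one with the lowest score; PROTPARS searches over tree topologies for it, which this sketch does not attempt.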

Run "consense"

consense: can't find input tree file "intree"
Please enter a new file name> ATPaseSUboot.tre 

consense: the file "outfile" that you wanted to
     use as output file already exists.
     Do you want to Replace it, Append to it,
     write to a new File, or Quit?
     (please type R, A, F, or Q) 
R

Are these settings correct? (type Y or the letter for one to change)
y

Consensus tree written to file "outtree"

Output written to file "outfile"

Done.

Rename outtree to ATPaseSUconsensus.tre
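The bootstrap support of a group is the percentage of replicate trees that contain the corresponding split (bipartition of the taxa). The Python sketch below counts splits in a list of Newick trees and keeps those found in more than 50% of them. It is a simplification of CONSENSE, whose default is the extended majority-rule consensus: this sketch discards any bracketed comments (such as tree weights) and counts every tree once, and it does not assemble the consensus tree itself. The function names are hypothetical.

```python
import re
from collections import Counter


def clades(newick):
    """Return (all taxa, leaf sets of internal nodes) for one Newick tree."""
    text = re.sub(r"\[[^\]]*\]", "", newick)  # drop [comments], e.g. tree weights
    text = re.sub(r":[^,();]*", "", text)     # drop branch lengths
    stack, groups, name = [set()], [], ""
    closed = False  # True right after ')': a name there is an internal label
    for ch in text:
        if ch in "(),;":
            if name.strip() and not closed:
                stack[-1].add(name.strip())
            name, closed = "", ch == ")"
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                group = stack.pop()
                groups.append(frozenset(group))
                stack[-1] |= group
        else:
            name += ch
    return frozenset(stack[0]), groups


def majority_rule_splits(newick_trees):
    """Percentage of trees containing each split, keeping splits seen in > 50%."""
    counts = Counter()
    for tree in newick_trees:
        taxa, groups = clades(tree)
        ref = min(taxa)  # describe each split by the side WITHOUT this taxon
        splits = set()
        for g in groups:
            side = taxa - g if ref in g else g
            if 2 <= len(side) <= len(taxa) - 2:  # skip trivial splits
                splits.add(side)
        counts.update(splits)  # each split counted at most once per tree
    n = len(newick_trees)
    return {split: 100.0 * c / n for split, c in counts.items() if c > n / 2}


# Usage with a PHYLIP tree file (trees may wrap across lines, so split on ';'):
# with open("ATPaseSUboot.tre") as fh:
#     trees = [t + ";" for t in fh.read().split(";") if t.strip()]
trees = ["((A,B),(C,D));", "((A,B),C,D);", "((A,C),(B,D));", "(A,(B,(C,D)));"]
print(majority_rule_splits(trees))
```

In the toy example the split AB|CD (reported as its C,D side) appears in three of the four trees, i.e. 75% support.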

Inspect the tree using NJplot. Root the trees as appropriate and turn on the display of bootstrap support values.

Which branches have only low bootstrap support? Very briefly, how does this compare to the previous tree made with ClustalX using a distance-based analysis?

1C. Using PhyML to perform bootstrap analyses using maximum likelihood

Unfortunately, this would take too long, so we will skip ahead to the second exercise. Should you be interested:

HERE is a tree from a maximum-likelihood (ML) reconstruction using PhyML Online

 

Exercise 2: 30 minutes

Long Branch Attraction (LBA) is a serious problem in phylogenetic reconstruction. LBA refers to the tendency of long branches to be grouped together with significant support, even though the organisms on those long branches do not share a more recent common ancestor. Support is usually measured as bootstrap support values for the different trees. We have simulated the evolution of four sequences (named A, B, C, D) according to the following tree:

Files containing these sequences in multiple-sequence FASTA format were generated and named according to the length chosen for the two long branches (all lengths are in substitutions per site). For the simulation we assumed that among-site rate variation follows a gamma distribution with a shape parameter of 1 (which is equivalent to an exponential distribution).

These files are HERE
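To get a feel for what a gamma shape parameter of 1 implies, the Python sketch below draws one relative rate per site from a gamma distribution scaled to mean 1 (scale = 1/shape). With shape 1 this is the exponential distribution: many sites evolve slowly and a few evolve much faster than average. It only illustrates the rate distribution; the program actually used for the simulation is not specified here, and the function name is hypothetical.

```python
import random


def site_rates(n_sites, shape, seed):
    """Draw one relative substitution rate per site from Gamma(shape, 1/shape).

    The mean is shape * (1 / shape) = 1; shape = 1 gives an exponential
    distribution. A site with rate r on a branch of length t is expected
    to undergo r * t substitutions.
    """
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta): alpha = shape, beta = scale
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_sites)]


rates = site_rates(100000, 1.0, seed=1)
print("mean rate:", sum(rates) / len(rates))
print("fraction of sites with rate < 0.1:", sum(r < 0.1 for r in rates) / len(rates))
```

For an exponential distribution with mean 1, the expected fraction of sites with rate below 0.1 is 1 - exp(-0.1), roughly 9.5%.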

Your task is to explore the sensitivity of different phylogenetic reconstruction algorithms to LBA. At a minimum, you should use protein parsimony and one protein distance-matrix approach. In this case we know that the sequences are correctly aligned as given, so you can simply load them into ClustalX and save them as PHYLIP-formatted files. Alternatively, you could realign them to explore the effect that the alignment algorithm has on LBA. To keep track of things, name the files accordingly.
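Because these sequences are already aligned, the FASTA-to-PHYLIP conversion can also be done without ClustalX. The Python sketch below writes strict PHYLIP with names truncated or padded to 10 characters and each sequence on a single line; with only one block, the file reads the same whether a PHYLIP program expects interleaved or sequential input. The function name is hypothetical, and names that become identical after truncation to 10 characters would need to be renamed by hand.

```python
def fasta_to_phylip(fasta_text: str) -> str:
    """Convert an aligned multi-FASTA string to single-block strict PHYLIP."""
    records = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif line and name is not None:
            records[name].append(line)  # sequences may span several lines
    seqs = {n: "".join(parts) for n, parts in records.items()}
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; align them first")
    out = [f"{len(seqs)} {lengths.pop()}"]  # header: ntax nchar
    for n, s in seqs.items():
        # strict PHYLIP: name truncated/padded to exactly 10 characters
        out.append(f"{n[:10]:<10}{s}")
    return "\n".join(out) + "\n"


print(fasta_to_phylip(">A\nMKV\nLA\n>B\nMKILA\n"))
```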

We will use programs from PHYLIP for this exercise. PHYLIP is a collection of programs for phylogenetic analyses written by Joe Felsenstein. The programs are freely available (including source code) and run on a variety of operating systems. The package is modular: different programs create bootstrap samples, calculate distance matrices, calculate trees from distance matrices (Fitch and Neighbor), calculate consensus trees, etc. By default, the programs read their input from a file named infile (or intree, for programs that read trees); if that file is not present, they ask you for a file name. (Note for your future work with these programs: by default PHYLIP treats a gap as a 21st character state. If you want gaps treated as missing data, replace the gap symbol with "?".) If you want to use one of the programs on your own, read the excellent manuals that come with the software.

Save the file with the aligned sequences in the directory where the PHYLIP executables are stored (within the "phylip-3.68/exe" folder).

2A. To test parsimony, choose the files with x = 0.1, 0.3, 1, and 3, where x is the length of the two long branches. Generate the corresponding files in PHYLIP format using ClustalX (or ClustalW).

How long are the sequences (if you aligned them, how long are they after the alignment)?

Use SEQBOOT from the PHYLIP package to generate 100 bootstrap samples for each file. By default, SEQBOOT writes all samples into a single multiple-sequence file named outfile. Rename it to something that reminds you of its contents (e.g., 0_1.phy.boot).

Use PROTPARS from the PHYLIP package to evaluate the 100 bootstrap samples (select option M for multiple data sets). The program writes the trees calculated from each sample, in parenthesis (Newick) notation, to a file named outtree. Rename this file (e.g., 0_1.boot.protparstrees).

Use the program CONSENSE from the PHYLIP package to calculate the consensus tree and its bootstrap support values.

In the following box, list the files you chose (and whether you aligned them or used them as provided), the bootstrap support for the correct tree, and the support for the LBA tree:

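2B. Explore a distance-matrix-based approach with respect to LBA. Depending on the settings, distance methods may be less sensitive to LBA. Good values of x to explore are 0.3, 3, 30, and 300.
To do the analyses in PHYLIP, do the following:
Convert the sequences to PHYLIP format using ClustalW (convert only, or align the sequences first; see above). Then, as in 2A, use SEQBOOT to generate 100 bootstrap samples, PROTDIST (option M) to calculate a distance matrix for each sample, NEIGHBOR or FITCH (option M) to calculate a tree from each matrix, and CONSENSE to calculate the consensus tree and its bootstrap support values.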
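As a baseline for the PROTDIST analyses, the simplest protein distance is the uncorrected p-distance: the fraction of compared sites at which two sequences differ. The Python sketch below computes it for every pair, skipping columns where either sequence has "-" or "?". It is not one of PROTDIST's models; those (for example the Kimura formula or the Jones-Taylor-Thornton matrix) correct for multiple substitutions, which the p-distance does not, so p-distances between long branches are underestimated. The function name is hypothetical.

```python
from itertools import combinations


def p_distances(alignment):
    """Uncorrected p-distance for every pair of sequences in an alignment.

    alignment: dict mapping taxon name -> aligned sequence.
    Columns where either sequence has '-' or '?' are skipped for that pair.
    """
    result = {}
    for a, b in combinations(alignment, 2):
        pairs = [(x, y) for x, y in zip(alignment[a], alignment[b])
                 if x not in "-?" and y not in "-?"]
        if not pairs:
            raise ValueError(f"no comparable sites for {a} and {b}")
        diffs = sum(x != y for x, y in pairs)
        result[(a, b)] = diffs / len(pairs)
    return result


print(p_distances({"A": "MKVL", "B": "MKIL", "C": "MR-L"}))
```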

In the following box, list the parameters you chose for PROTDIST, the files you chose (aligned or as provided), and, for each file, the bootstrap support for the correct tree and the support for the LBA tree:

 

2C. (optional) Explore how the sensitivity of PROTDIST to LBA depends on the correction for multiple substitutions.
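One of the models PROTDIST offers is the Kimura formula, Kimura's (1983) approximate correction for multiple substitutions, d = -ln(1 - p - 0.2 p^2), where p is the observed fraction of differing sites. The Python sketch below implements that formula to show how the correction grows as p increases, which is why corrected distances stretch long branches back out. It is a sketch of the published formula, not PROTDIST's code, and the function name is hypothetical.

```python
import math


def kimura_protein_distance(p):
    """Kimura (1983) protein distance: d = -ln(1 - p - 0.2 * p**2).

    The formula is undefined once 1 - p - 0.2 p^2 <= 0, i.e. p >= about 0.854.
    """
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError(f"p = {p} is too large for the Kimura correction")
    return -math.log(arg)


for p in (0.1, 0.3, 0.5, 0.7):
    print(f"p = {p:.1f}  corrected d = {kimura_protein_distance(p):.3f}")
```

For small p the corrected and uncorrected values are close, but near saturation the correction becomes very large, and beyond p of about 0.854 no finite distance can be computed at all.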

In the following box, give the parameters you chose for PROTDIST, the files you chose, whether you aligned them or used them as provided, and, for each file, the bootstrap support for the correct tree and the support for the LBA tree:

 

 
