Assignment for exercises #11

Your name:
Your email address:

Exercise 1: 15 minutes: Using Clustalx to perform bootstrap analyses using neighbor joining

We will use the same datasets as in the previous exercise (here) and (here). Load these sequences into ClustalX and align them using the default options. In the Trees menu, select "correct for multiple substitutions" and "exclude positions with gaps" (check marks should be visible next to both items in the menu before you start the bootstrap analysis).
[If you plan to inspect the trees in TreeView, go to the output format options, click on "branch" and select "node" from the menu. This setting determines whether bootstrap values are written as labels on the branches or as labels on the nodes. Node labels are philosophically not really correct (these are unrooted trees, and bootstrap values describe branches), but this is the format that many programs expect when reading tree files.]

Calculate "Bootstrap NJ tree". The results are written to a tree file with the extension .phb (another tree, calculated during the alignment, is saved with the extension .dnd).

Inspect the trees using NJplot. Root the trees as appropriate and select the option to display bootstrap support values.

Which branches have only low bootstrap support? Are any of these findings surprising? What might be the reason?

 

Exercise 2: 20 minutes

Long Branch Attraction (LBA) is a serious problem in phylogenetic reconstruction. LBA refers to the tendency of long branches to be grouped together, often with significant support, even though the organisms represented by these long branches do not share a more recent common ancestor. Support is usually measured as bootstrap support values for the different trees. We have simulated the evolution of 4 sequences (named A, B, C, D) according to the following tree:

Files containing these sequences in multiple-sequence FASTA format were generated and named according to the length chosen for the two long branches (all branch lengths are in substitutions per site). For the simulation, we assumed that among-site rate variation follows a gamma distribution with a shape parameter of 1 (equivalent to an exponential distribution).
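For reference, the gamma density for the relative rate r at a site, with shape alpha and rate beta, is shown below. Assuming the common convention beta = alpha (so that the mean rate is 1), a shape of 1 reduces it to the exponential density, which is why the two descriptions above are equivalent. The exact parameterization used by the simulation software is an assumption here.

```latex
f(r;\alpha,\beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, r^{\alpha-1} e^{-\beta r},
\qquad \alpha = \beta = 1 \;\Rightarrow\; f(r) = \frac{1}{\Gamma(1)}\, r^{0} e^{-r} = e^{-r}
```

Smaller alpha values would give stronger rate variation (many nearly invariant sites and a few fast ones).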

These files are at http://carrot.mcb.uconn.edu/mcb221_2007/simsequences/

Your task is to explore the sensitivity of different phylogenetic reconstruction algorithms towards LBA. At a minimum, you should use protein parsimony and one protein distance matrix approach. In this case we know that the sequences are already aligned as given, i.e., you can simply load them into ClustalX and save them as PHYLIP-formatted files. Alternatively, you could explore the effect that the alignment algorithm has on LBA. To keep track of things, name the files accordingly.

We will use programs from PHYLIP for this exercise. PHYLIP is a collection of programs for phylogenetic analysis written by Joe Felsenstein. The programs are freely available (including source code) and can be used on a variety of operating systems. The programs are modular: separate programs exist to create bootstrap samples, calculate distance matrices, calculate trees from distance matrices (FITCH and NEIGHBOR), calculate consensus trees, etc. All programs read their input from files named infile (or intree, for programs that read trees); otherwise, the user must provide the file name. (Note for your future work with these programs: PHYLIP by default treats gaps as a 21st character state. If you want gaps treated as missing data, you need to replace the gap symbol with "?".) If you want to use one of the programs on your own, read the excellent manuals that come with the software.
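The gap-to-"?" replacement mentioned above can be scripted. The following is a minimal sketch, assuming a strict, sequential PHYLIP file in which each taxon occupies exactly one line (a name padded to 10 characters, followed by the sequence); interleaved files would need different handling. The function name is hypothetical and not part of PHYLIP.

```python
def gaps_to_missing_phylip(text: str, name_width: int = 10) -> str:
    """Replace '-' gap characters with '?' (missing data) in a strict,
    sequential PHYLIP alignment with one taxon per line.

    The header line (number of taxa, number of sites) and the taxon
    names (first name_width characters) are left unchanged, so a '-'
    inside a name is preserved.
    """
    lines = text.splitlines()
    out = [lines[0]]  # header: "<taxa> <sites>"
    for line in lines[1:]:
        if not line.strip():
            out.append(line)  # keep blank lines as they are
            continue
        name, seq = line[:name_width], line[name_width:]
        out.append(name + seq.replace("-", "?"))
    return "\n".join(out) + "\n"
```

Only the sequence part of each line is touched, which is why the name is split off by position rather than by whitespace.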

Save the file with the aligned sequences in the directory where the PHYLIP executables are stored (within the mcb221 folder).

2A: To test parsimony, choose the files with x = 0.1, 0.3, 1, and 3. Generate the corresponding files in PHYLIP format using ClustalX (or ClustalW).
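If you would rather script the conversion than use ClustalX, the following is a minimal sketch that reads a multiple-sequence FASTA file and writes strict sequential PHYLIP (names truncated or padded to 10 characters). The function names are hypothetical; this only converts and does not align, so it refuses input whose sequences differ in length.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse multiple-sequence FASTA into {name: sequence}.
    The name is the first word after '>'."""
    seqs: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line  # sequences may span several lines
    return seqs


def to_phylip(seqs: dict[str, str]) -> str:
    """Format aligned sequences as strict sequential PHYLIP."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; align them first")
    n_sites = lengths.pop()
    lines = [f"{len(seqs)} {n_sites}"]  # header: taxa, sites
    for name, seq in seqs.items():
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"
```

The second number in the header line is the alignment length, which also answers the question about sequence length.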

How long are the sequences (if you aligned them, how long are they after the alignment)?

Use SEQBOOT from the PHYLIP package to generate 100 bootstrap samples for each file. By default, PHYLIP writes these into a single file containing multiple data sets. This file is named outfile. Rename it to something that reminds you of its contents (e.g., 0_1.phy.boot).
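Conceptually, each bootstrap sample is built by drawing alignment columns with replacement until the sample has as many columns as the original. The following is a minimal sketch of that idea, assuming the alignment is held as a {name: sequence} dictionary; it is not SEQBOOT's implementation, and the function name and seed parameter are hypothetical.

```python
import random
from typing import Optional


def bootstrap_replicates(seqs: dict[str, str], n_reps: int,
                         seed: Optional[int] = None) -> list[dict[str, str]]:
    """Nonparametric bootstrap: resample alignment columns with replacement.

    Every replicate keeps all taxa and has as many columns as the original.
    The same column indices are used for every taxon, so each sampled
    column stays intact across sequences.
    """
    rng = random.Random(seed)
    names = list(seqs)
    n_sites = len(seqs[names[0]])
    replicates = []
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicates.append({name: "".join(seqs[name][c] for c in cols)
                           for name in names})
    return replicates
```

Because some columns are drawn several times and others not at all, each replicate weights the sites differently; the variation among the resulting trees is what the bootstrap support values summarize.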

Use PROTPARS from the PHYLIP package to evaluate the 100 bootstrap samples (select option M for multiple data sets). The program writes the trees calculated from each sample, in parenthetical (Newick) notation, to a file named outtree. Rename this file (e.g., 0_1.boot.protparstrees).

Use the program CONSENSE from the PHYLIP package to calculate the consensus tree and its bootstrap support values.
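With only four taxa, each unrooted tree has a single internal branch, so bootstrap support reduces to counting how often each of the three possible splits (AB|CD, AC|BD, AD|BC) occurs among the bootstrap trees. The following is a minimal sketch for cross-checking the CONSENSE output, assuming Newick trees for exactly four taxa without internal node labels. It simply discards bracketed comments such as tree weights, so it will not match CONSENSE exactly if PROTPARS reports ties. All function names are hypothetical.

```python
import re
from collections import Counter


def read_trees(text: str) -> list[str]:
    """Split a tree file containing one or more ';'-terminated Newick trees."""
    return [t.strip() + ";" for t in text.split(";") if t.strip()]


def quartet_split(newick: str) -> frozenset:
    """Return the internal split of an unrooted four-taxon tree as a
    frozenset of two frozensets, e.g. {{A, B}, {C, D}}."""
    s = re.sub(r"\[[^\]]*\]", "", newick)  # drop [comments] such as weights
    s = re.sub(r":[0-9.eE+-]+", "", s)     # drop branch lengths
    s = s.strip().rstrip(";")
    taxa = set(re.findall(r"[^(),\s]+", s))
    if len(taxa) != 4:
        raise ValueError(f"expected 4 taxa, got {sorted(taxa)}")
    # Innermost parenthesized groups contain no nested parentheses.
    for group in re.findall(r"\(([^()]*)\)", s):
        members = {t.strip() for t in group.split(",") if t.strip()}
        if len(members) == 2:
            return frozenset({frozenset(members), frozenset(taxa - members)})
    raise ValueError("no two-taxon group found (star tree?)")


def split_support(trees: list[str]) -> dict[frozenset, float]:
    """Percentage of trees that contain each four-taxon split."""
    counts = Counter(quartet_split(t) for t in trees)
    return {split: 100.0 * n / len(trees) for split, n in counts.items()}
```

Which split corresponds to the correct tree and which to the LBA tree depends on which pair of sequences sits on the long branches in the simulation tree.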

In the following box, list the files that you chose (aligned or as provided), the bootstrap support for the correct tree, and the support for the LBA tree:

2B: Explore a distance-matrix-based approach with respect to LBA. Depending on the settings, distance methods may be less sensitive to LBA. x = 0.3, 3, 30, and 300 are good choices to explore.
To do the analyses in PHYLIP, do the following:
Convert the sequences to PHYLIP format using ClustalW (convert only, and/or align the sequences; see above).
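NEIGHBOR builds trees from the distance matrix by neighbor joining. With four taxa, the first pair that neighbor joining merges already fixes the unrooted topology (the pair i, j defines the split ij|kl). The following is a minimal sketch of only that selection step, using the standard Q criterion; it is not PHYLIP's code, and the function name and example matrices are hypothetical.

```python
from itertools import combinations


def nj_first_pair(names: list[str], d: list[list[float]]) -> tuple[str, str]:
    """Return the pair joined first by neighbor joining: the pair (i, j)
    minimizing Q(i, j) = (n - 2) * d(i, j) - R(i) - R(j), where R(i) is
    the sum of row i of the distance matrix d."""
    n = len(names)
    row = [sum(r) for r in d]
    best = min(combinations(range(n), 2),
               key=lambda ij: (n - 2) * d[ij[0]][ij[1]] - row[ij[0]] - row[ij[1]])
    return names[best[0]], names[best[1]]


names = ["A", "B", "C", "D"]
# Exact (additive) distances for a hypothetical tree ((A,B),(C,D)) in which
# A and C sit on long branches of length 1.0 and all other branches are 0.1.
d_long = [[0.0, 1.1, 2.1, 1.2],
          [1.1, 0.0, 1.2, 0.3],
          [2.1, 1.2, 0.0, 1.1],
          [1.2, 0.3, 1.1, 0.0]]
print(nj_first_pair(names, d_long))
```

With these exact distances, neighbor joining groups A with B (or, equivalently for the split, C with D) despite the long branches. In real analyses, distance methods typically fall into LBA when the estimated distances for the long branches are underestimated, which is why the correction for multiple substitutions matters.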

In the following box, list the parameters you chose for PROTDIST, the files that you chose (aligned or as provided), and for each file indicate the bootstrap support for the correct tree and the support for the LBA tree:

 

2C (optional): Explore how the sensitivity of PROTDIST towards LBA depends on the correction for multiple substitutions.
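To see what a correction for multiple substitutions does, compare the uncorrected proportion of differing sites (p) with Kimura's approximate protein distance, d = -ln(1 - p - 0.2 p^2), which is the basis of PROTDIST's Kimura option. The following is a minimal sketch; the model-based options in PROTDIST (e.g., JTT, PMB, Dayhoff PAM) are not reproduced, and the function names are hypothetical. Gap and "?" positions are skipped when counting differences, which is an assumption of this sketch.

```python
import math


def p_distance(s1: str, s2: str) -> float:
    """Uncorrected distance: fraction of compared sites that differ.
    Positions with '-' or '?' in either sequence are skipped."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a not in "-?" and b not in "-?"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def kimura_protein(p: float) -> float:
    """Kimura's (1983) approximate correction for multiple substitutions:
    d = -ln(1 - p - 0.2 p^2). Returns infinity when the argument of the
    logarithm is not positive (sequences too divergent)."""
    x = 1.0 - p - 0.2 * p * p
    if x <= 0:
        return math.inf
    return -math.log(x)


for p in (0.1, 0.3, 0.5, 0.7, 0.85):
    print(f"p = {p:.2f}  corrected d = {kimura_protein(p):.3f}")
```

The corrected distance grows much faster than p at high divergence, so it stretches long branches that the uncorrected distance compresses; how this changes the support for the LBA tree is what 2C asks you to explore.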

In the following box, list the parameters you chose for PROTDIST and the files that you chose, indicate whether you aligned them or used them as provided, and for each file give the bootstrap support for the correct tree and the support for the LBA tree:

 

 
