The
sequences in testseq4 are small ribosomal RNAs from bacteria and mitochondria,
with two archaeal sequences that one might want to use as an outgroup.
The rRNA in animal mitochondria evolves much faster
than the rRNA in bacteria and in plant mitochondria.
This dataset is designed to test and explore the long branch attraction
artifact.
- Run DNAPARS on the original dataset (use
default options). The results will be written to the file named "outfile".
!!! Rename it !!! Otherwise the next
PHYLIP program you run will overwrite it.
- Explore the different options. (If in doubt, check the manual.)
Why might it be useful
to jumble the input order and run repeated analyses?
What is the number
of steps in the most parsimonious tree?
Do your analyses
always get the same tree with the same number of steps?
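Because every run writes to "outfile" again, repeated analyses with different jumble settings need a renaming step in between. The following is a minimal sketch of such a step, not part of PHYLIP itself. The function name `keep_output` and the label scheme are made up for illustration, and it assumes that the tree-building programs also write a tree file named "outtree" next to "outfile".

```python
from pathlib import Path


def keep_output(label, workdir=".", names=("outfile", "outtree")):
    """Rename PHYLIP output files so the next run cannot overwrite them.

    `label` is a hypothetical tag such as "dnapars_run1"; each file found
    is renamed to "<name>.<label>". Returns the list of new file names.
    """
    workdir = Path(workdir)
    renamed = []
    for name in names:
        src = workdir / name
        if not src.exists():
            continue  # this program did not write that file
        dst = workdir / f"{name}.{label}"
        if dst.exists():
            # Refuse to clobber results from an earlier run.
            raise FileExistsError(f"{dst} already exists; pick another label")
        src.rename(dst)
        renamed.append(dst.name)
    return renamed
```

Calling `keep_output("dnapars_run1")` after the first run and `keep_output("dnapars_run2")` after the second keeps both results side by side for comparison.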
- How are gaps treated
by this program? Does this make sense?
How could you change
this? Do it. (If you need a hint, take a peek here.)
Run DNAPARS on the
modified alignment. Do you get a different tree?
Does it correspond more closely
to your expectation?
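One possible way to modify the alignment is sketched below. It assumes (please confirm this in the manual) that DNAPARS counts the gap character "-" as an additional character state, whereas "?" is read as unknown data. The function name `gaps_to_missing` is hypothetical. It also assumes a PHYLIP file in which the taxon names occupy the first 10 columns of the first block of lines, i.e. interleaved format or sequential format with one line per sequence.

```python
def gaps_to_missing(phylip_text, name_width=10):
    """Replace '-' by '?' in the sequence part of a PHYLIP alignment.

    Taxon names (first `name_width` columns of the first ntaxa data
    lines) are left untouched, so a name such as "Taxon-A" survives.
    """
    lines = phylip_text.splitlines()
    ntaxa = int(lines[0].split()[0])  # header line: "<ntaxa> <nsites>"
    out = [lines[0]]
    named_left = ntaxa  # data lines that still start with a name
    for line in lines[1:]:
        if named_left > 0 and line.strip():
            name, seq = line[:name_width], line[name_width:]
            out.append(name + seq.replace("-", "?"))
            named_left -= 1
        else:
            # Blank line or a later interleaved block without names.
            out.append(line.replace("-", "?"))
    return "\n".join(out) + "\n"
```

A typical use would read the original alignment with `open(...).read()`, pass the text through `gaps_to_missing`, and write the result to a new file, leaving the original untouched.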
- Open one of the trees you calculated in a text editor, and copy the
tree into treeview (via [ctrl]-C [ctrl]-V). Edit the tree in treeview,
so that it corresponds to expectation (i.e., the mitochondrial sequences
should all group together). Paste this tree, and the trees that resulted
from #3 and #1 into a single textfile. The first line should be a "3"
(i.e., the number of trees), and the trees should be separated from one another
by ";". If this is too cryptic, look at this
file for an example. If you generate your user trees with treeview,
one unfortunate difference in handling names is the treatment of names
that contain numbers. Treeview adds ' (single quotes) around
these names. For example, a name like Dros_4 turns into 'Dros_4'. The
problem is that DNAPARS will not find the sequence 'Dros_4', only Dros_4.
You need to remove the ' (quotes) from the edited trees in your text
editor.
Save your trees (feel free to use more than one additional tree) as a
file called intree, or give it any name you like.
Run DNAPARS on testseq4c.phy
(see question #3). Select the usertree option, and when prompted, enter
the name of your usertree file.
Are the trees in
the usertree file considered to be significantly different? (The answer
should be in outfile.)
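The user tree file described above (tree count on the first line, trees terminated by ";", no quotes around names) can also be assembled with a few lines of Python instead of a text editor. This is only a sketch: the function name `build_usertree_file` is made up, and it assumes that no taxon name contains a space or a single quote of its own.

```python
def build_usertree_file(trees):
    """Return the text of a user tree file for a list of Newick strings.

    Each tree is put on one line, quotes added by the tree editor are
    removed, and every tree ends with exactly one ';'.
    """
    cleaned = []
    for tree in trees:
        tree = "".join(tree.split())   # drop line breaks and spaces
        tree = tree.replace("'", "")   # 'Dros_4' -> Dros_4
        tree = tree.rstrip(";")        # avoid a doubled terminator
        cleaned.append(tree + ";")
    # First line: the number of trees that follow.
    return f"{len(cleaned)}\n" + "\n".join(cleaned) + "\n"
```

Writing the returned text to a file named intree (or any other name) gives a file that can be entered when the usertree option prompts for it.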
- Run DNADIST on the original
dataset. Use the default values and repeat the analysis with at least
two other distance measures (option D -- don't worry, we
will talk about these in more detail; for now all you need to know is
that logdet should be insensitive to compositional bias, and that Jukes-Cantor
is a simpler model of substitution than the Kimura two
parameter model, which is simpler than F84). If you are sure that
this will not crash the program, you can append the distance matrices to
a single file.
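To make "simpler model" concrete, here is a sketch of the textbook closed-form distances for two aligned sequences: Jukes-Cantor uses only the fraction of differing sites, while the Kimura two parameter formula separates transitions from transversions. This is not DNADIST's own code, and DNADIST's numbers may differ because of its estimation procedure and option settings. All function names are made up, and sites with gaps or ambiguity codes are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_fractions(seq1, seq2):
    """Return (P, Q): fractions of transition and transversion sites."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


def jukes_cantor(seq1, seq2):
    """d = -3/4 * ln(1 - 4p/3), with p the fraction of differing sites."""
    P, Q = ts_tv_fractions(seq1, seq2)
    return -0.75 * math.log(1 - 4 * (P + Q) / 3)


def kimura_2p(seq1, seq2):
    """d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    P, Q = ts_tv_fractions(seq1, seq2)
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))
```

Both functions raise a `ValueError` when the sequences are so divergent that the logarithm is undefined, which is the situation in which a distance cannot be estimated under the model.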
- Use FITCH to calculate trees from the distance
matrices you calculated in the previous step. Turn on global rearrangements
[option G] and randomization of input order [option J]. (If you appended
the matrices to a single file, you need to use the M option.) Use the
default options for everything else.
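If you do append the matrices, something like the following sketch could do it. The function name `append_matrices` and the file names in the usage note are hypothetical. The function returns the number of matrices it wrote, which is presumably the number to enter when the M option asks how many data sets the file contains.

```python
from pathlib import Path


def append_matrices(matrix_files, combined):
    """Concatenate renamed DNADIST output files into one file.

    Returns the number of matrices written to `combined`.
    """
    parts = []
    for name in matrix_files:
        text = Path(name).read_text()
        if not text.strip():
            raise ValueError(f"{name} is empty")
        # Make sure each matrix ends with a newline before the next starts.
        parts.append(text if text.endswith("\n") else text + "\n")
    Path(combined).write_text("".join(parts))
    return len(parts)
```

For example, `append_matrices(["dist_f84", "dist_k2p", "dist_jc"], "dist_all")` would combine three renamed outfiles into dist_all.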
- Use NEIGHBOR to calculate trees from the
distance matrices.
Are the trees from
Fitch and Neighbor different?
Which of the trees
comes closest to your expectation? Why? (i.e., describe the part of your
expectation that is met by the tree.)
- Run SEQBOOT to generate a file with 100 bootstrap
replicates (use default options). (Remember: Rename the outfile!)
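For orientation, the idea behind a bootstrap replicate is sketched below: alignment columns are drawn with replacement until the replicate is as long as the original alignment. This illustrates the general technique only and is not SEQBOOT's code. The function name and the dictionary representation of the alignment are assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps taxon name -> aligned sequence; `rng` is a
    random.Random instance, so that replicates are reproducible.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same length")
    # Draw column indices with replacement; the same columns are used
    # for every taxon, so each column stays intact.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][c] for c in columns)
        for name in names
    }
```

Calling the function 100 times with the same `random.Random(seed)` object would give 100 replicates, each of which is then analyzed like the original dataset.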
- Run DNADIST on the bootstrapped dataset one
time with Jukes-Cantor (JC) distances [option D]. In order for the program
to read all the bootstrapped samples, you have to tell the program to analyze
multiple datasets [option M]. The output of the program is the set of distance
matrices in the file named "outfile". Again, do not forget to rename
it! (OK, this was the last warning you get on the renaming business.)
- Use NEIGHBOR to calculate trees from the distance
matrices you calculated in the previous step. Turn on randomization of input
order [option J with one replicate]. And do not forget to switch on [option
M] for multiple dataset analysis.
- Use CONSENSE to calculate a consensus tree
from the trees generated in the last step.
- Explore the tree in TreeView.
Which branches are
strongly supported? Are there any surprises?