CLASS 27. ASRV. Consensus Trees. Exploration of Tree Space. Parsimony. PHYLIP.

HOMEWORK READING:

Ch. 10,11 (textbook);

Ch. 8 (supp. textbook);

Yang, 1996 (Moodle);

Among-Site Rate Variation (ASRV)

Different regions of a DNA or protein sequence may change at different rates. There are two ways to deal with this problem mathematically:

  • Allow only a fraction of sites to vary: X% of alignment sites are free to vary, while the remaining (100-X)% are invariant (their probability of change is zero).
  • Use a Gamma distribution to describe how rates vary across sites:

Fig. Gamma distribution.


In phylogenetics, the Gamma distribution is usually approximated by a discrete distribution (i.e., several rate categories are used, usually 4 or 8). If a Gamma distribution is used in the model, this is indicated by the symbol Γ (e.g., 'HKY85+Γ'). For more information, see the review by Z. Yang (1996), posted to Moodle.
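
The discrete approximation can be sketched in a few lines of Python. This is only an illustration: it estimates the mean rate of each equal-probability category by sampling, which is an assumption of this sketch and not the exact calculation used by phylogenetics programs. The function name and its parameters are made up for the example.

```python
import random

def discrete_gamma_rates(alpha, k=4, n_samples=100_000, seed=1):
    """Approximate the mean rate of k equal-probability Gamma categories.

    Monte Carlo sketch: draw rates, sort them, cut them into k equal
    chunks, and average each chunk.
    """
    rng = random.Random(seed)
    # shape = alpha and scale = 1/alpha give a distribution with mean rate 1
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_samples))
    size = n_samples // k
    rates = [sum(draws[i * size:(i + 1) * size]) / size for i in range(k)]
    # rescale so the average rate over categories is exactly 1
    mean = sum(rates) / k
    return [r / mean for r in rates]

# Small alpha: strong rate variation; large alpha: rates closer to equal
for alpha in (0.5, 2.0):
    print(alpha, [round(r, 3) for r in discrete_gamma_rates(alpha)])
```

With a small shape parameter, most categories get rates well below 1 and one category gets a high rate; as the shape parameter grows, the categories move toward 1.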

Fig. Estimated shape parameters for various genes (from Yang, 1996).


ML methods that model among-site rate variation handle the long branch attraction (LBA) artifact better than parsimony and distance analyses (data)

Consensus Trees

  • Strict Consensus Tree: only those clusters present in all of the trees will be present in the consensus tree.
  • Majority Rule Consensus Tree: only those clusters present in more than a specified fraction (>0.50) of the trees will be present in the consensus tree (the type of consensus tree most often used in bootstrap analyses).
Fig. 7.6 [Source]. Consensus trees show features that are consistent between trees.


Fig. 7.5 [Source]. Collapsing branches not supported by a large fraction of bootstrap samples.
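
Both consensus rules amount to counting how often each cluster appears. The sketch below assumes each tree has already been reduced to a set of clusters (each cluster a set of taxon names); the function name and the three toy trees are hypothetical, and building the consensus tree itself from the retained clusters is not shown.

```python
from collections import Counter

def consensus_clusters(trees, min_fraction=0.5, strict=False):
    """Return the clusters kept by a strict or majority-rule consensus.

    trees: list of sets of frozensets (one set of clusters per tree).
    """
    counts = Counter(cluster for tree in trees for cluster in tree)
    n = len(trees)
    if strict:
        # keep only clusters found in every tree
        return {c for c, k in counts.items() if k == n}
    # keep clusters found in more than min_fraction of the trees
    return {c for c, k in counts.items() if k / n > min_fraction}

# Three toy rooted trees on taxa A-E, written as their non-trivial clusters
trees = [
    {frozenset("AB"), frozenset("ABC"), frozenset("DE")},
    {frozenset("AB"), frozenset("ABD"), frozenset("CE")},
    {frozenset("AB"), frozenset("ABC"), frozenset("DE")},
]
strict = consensus_clusters(trees, strict=True)   # only the AB cluster
majority = consensus_clusters(trees)              # AB, ABC and DE
print(sorted("".join(sorted(c)) for c in strict))
print(sorted("".join(sorted(c)) for c in majority))
```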


Exploring Tree Space

Tree space is vast; therefore, if you have more than 20 sequences, it is not feasible to find the optimal tree through an exhaustive search of tree space.
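
To get a feel for the size of tree space: the number of unrooted bifurcating trees for n sequences is (2n-5)!! = 3 × 5 × ... × (2n-5). A short sketch (the function name is chosen for this example):

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating tree topologies: (2n-5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # multiply the odd numbers 3, 5, ..., 2n-5
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

For 10 sequences this is already about two million trees, and for 20 sequences it is on the order of 10^20.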

Heuristic Strategy:

  • Start with a tree (often a neighbor-joining (NJ) tree; it can also be a randomly chosen tree)
  • Rearrange the tree and assess whether the new tree is better (according to some "score" based on the tree's relation to the data)
  • Continue rearranging the tree, keeping only better rearrangements ("hill climbing")
Fig. 8.10 [Source]. Tree Landscape Illustration.
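
The hill-climbing loop itself is short. In the sketch below the function name, the neighbor function, and the scores are all hypothetical: the three four-taxon topologies and their scores are invented for illustration, and a lower score is treated as better (as in parsimony).

```python
def hill_climb(start, neighbors, score):
    """Greedy search: move to the best neighbor until no neighbor improves."""
    current, current_score = start, score(start)
    while True:
        best = min(neighbors(current), key=score, default=None)
        if best is None or score(best) >= current_score:
            return current, current_score   # no better rearrangement found
        current, current_score = best, score(best)

# Invented scores for the three possible unrooted trees of four taxa
toy_scores = {
    "((A,B),(C,D))": 7,
    "((A,C),(B,D))": 5,
    "((A,D),(B,C))": 6,
}

def toy_neighbors(tree):
    # with four taxa, every other topology is one rearrangement away
    return [t for t in toy_scores if t != tree]

best_tree, best_score = hill_climb("((A,B),(C,D))", toy_neighbors, toy_scores.get)
print(best_tree, best_score)
```

On real data the search can stop on a local optimum, which is why programs often repeat it from several starting trees.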


How do trees get rearranged (illustration):

  • Nearest Neighbor Interchange (NNI)
  • Subtree Pruning and Regrafting (SPR)
  • Tree Bisection and Reconnection (TBR)
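
NNI is the simplest of the three to write down: an internal branch has two subtrees on each side, and swapping one subtree across the branch gives two alternative arrangements. A minimal sketch, with subtrees represented as nested tuples (a representation assumed for this example only):

```python
def nni_neighbors(a, b, c, d):
    """Given an internal branch separating (a, b) from (c, d),
    return the two alternative arrangements produced by NNI."""
    return [((a, c), (b, d)), ((a, d), (b, c))]

# Subtrees can be single taxa or larger nested groups
print(nni_neighbors("A", "B", "C", "D"))
print(nni_neighbors(("A", "B"), "C", "D", ("E", "F")))
```

SPR and TBR cut a branch and reattach the pieces elsewhere, so they reach more trees per step than NNI at a higher computational cost.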

Parsimony

The data for parsimony are the individual sites in the alignment: the number of changes required at each site is added up over all sites to give the score of a tree (unweighted parsimony). The most parsimonious tree is the tree that requires the smallest number of evolutionary changes (i.e., has the smallest score).

Fig. 8.15 [Source]. Weight Matrices for Parsimony.


Fig. 8.13 [Source]. Example of most parsimonious assignment of ancestral states for a 6-taxon tree.
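
One standard way to count the minimum number of changes for unweighted parsimony is Fitch's algorithm (not named in these notes; it is used here as an illustration). The sketch assumes a rooted binary tree written as nested tuples and uses the four example sequences given later in these notes; function names are chosen for this example.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, number of changes) for one site.

    tree: nested 2-tuples with leaf names as strings
    states: dict mapping leaf name -> character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # no shared state: one extra change is needed on this node's branches
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, alignment):
    """Sum the per-site change counts over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

alignment = {"s1": "AATGC", "s2": "AATCA", "s3": "GATCA", "s4": "GACCC"}
for tree in ((("s1", "s2"), ("s3", "s4")),
             (("s1", "s3"), ("s2", "s4")),
             (("s1", "s4"), ("s2", "s3"))):
    print(tree, parsimony_score(tree, alignment))   # scores: 5, 6, 5
```

For this alignment two of the three topologies tie with 5 changes, so parsimony alone does not single out one tree.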


Sites can be:

  • informative (sites that discriminate alternative trees)
  • uninformative (sites that do not discriminate alternative trees)

The concept of informative and uninformative sites applies only to parsimony analyses.

Example:

		sequence1 AATGC
		sequence2 AATCA
		sequence3 GATCA
		sequence4 GACCC
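
The sites in this example can be classified with the usual rule of thumb: a site is parsimony-informative if at least two different characters each occur in at least two sequences. A short sketch (the function name is chosen for this example):

```python
from collections import Counter

def is_informative(column):
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2

sequences = ["AATGC", "AATCA", "GATCA", "GACCC"]
columns = ["".join(col) for col in zip(*sequences)]
for position, column in enumerate(columns, start=1):
    label = "informative" if is_informative(column) else "uninformative"
    print(position, column, label)
```

Here sites 1 (AAGG) and 5 (CAAC) are informative. Site 2 is constant, and sites 3 and 4 each have a single differing sequence, so they require the same number of changes on every tree.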
		

Objections to parsimony:

  • Why should evolution be most parsimonious?
  • Parsimony ignores branch lengths
  • Parsimony is prone to long branch attraction (LBA).

PHYLIP



PHYLIP (the PHYLogeny Inference Package) is an open-source package of programs for inferring phylogenies (evolutionary trees), distributed and maintained by Joe Felsenstein and collaborators.
  • PHYLIP is the oldest widely distributed phylogenetics package. It has been in distribution since October 1980 and has over 28,000 registered users.
  • Contains 35 command-line programs (modules)
  • All programs are controlled through a text menu
  • Output is written to files with fixed names such as "outfile" and "outtree".
  • PHYLIP is very well documented: official and 3rd party manuals.