CLASS 18. Cladistics. Types of Homology. Mutations and Substitutions.

HOMEWORK READING:

Ch.10 (textbook);

Section 7.1, 7.2, 8.1 (supp.textbook);

Fitch, 2000 (Moodle)

Terminology from Cladistics

Cladistics is a form of biological systematics that classifies species of organisms into hierarchical monophyletic groups [Wikipedia]. Willi Hennig is considered to be the father of cladistics. The goal of cladistics is a natural taxonomy, i.e. a classification based on evolutionary relationships. [More on Biological Classification].

The term cladogram refers to a strictly bifurcating diagram, where each clade is defined by a common ancestor that only gives rise to members of this clade. I.e., a clade is monophyletic (derived from one ancestor) as opposed to polyphyletic (derived from many ancestors).

Fig. Differences between monophyletic and non-monophyletic groups


A clade is recognized and defined by shared derived characters (= synapomorphies [derived from the Greek words syn = with, in company with, together with; apo = away from; and morphe = shape]). Shared primitive characters (= symplesiomorphies) do not define a clade.

  • Example of synapomorphy that defines Diptera clade: Halteres

To use these terms you need to have polarized characters; for most molecular characters you don't know which state is primitive and which is derived.

Related terms:

autapomorphy = a derived character that is present in only one group; an autapomorphic character does not tell us anything about the relationship of the group that has this character to other groups.

homoplasy = a derived character state that arose two or more times independently (convergent evolution). Note that the characters in question might still be homologous (e.g. a position in a sequence alignment, or front limbs that turned into wings in both birds and bats).

paraphyletic = a taxonomic group that is defined by a common ancestor; however, the common ancestor of this group also has descendants that do not belong to this taxonomic group. Many systematists despise paraphyletic groups (and consider them to be polyphyletic). Examples of paraphyletic groups are reptiles and protists. Many consider the archaea to be paraphyletic as well.

Fig. Example of a paraphyletic group
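
The definitions above can be checked mechanically on a small tree. The following is a minimal sketch, assuming a tree written as nested Python tuples; the toy topology and the names `TREE`, `clades`, `is_monophyletic`, and `smallest_clade` are illustrative assumptions, not part of the course material. A group is treated as monophyletic when it is exactly the leaf set of one clade.

```python
# Hypothetical, simplified topology used only for illustration:
# leaves are strings, internal nodes are tuples of children.
TREE = ("mammal", (("lizard", "snake"), ("crocodile", "bird")))


def leaves(node):
    """Return the set of leaf names found under a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Yield the leaf set of every clade (subtree) in the tree."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if `group` is exactly the leaf set of one clade."""
    return frozenset(group) in set(clades(tree))


def smallest_clade(tree, group):
    """Leaf set of the smallest clade that contains all of `group`."""
    group = frozenset(group)
    return min((c for c in clades(tree) if group <= c), key=len)


reptiles = {"lizard", "snake", "crocodile"}
print(is_monophyletic(TREE, {"crocodile", "bird"}))  # True
print(is_monophyletic(TREE, reptiles))               # False
print(sorted(smallest_clade(TREE, reptiles) - reptiles))  # ['bird']
```

In this toy tree the smallest clade containing the three reptiles also contains the bird, which is why the reptile group fails the monophyly test. Distinguishing paraphyletic from polyphyletic groups would additionally require information about the ancestor, which this sketch does not model.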


Gene Trees vs. Species Trees

Why could a gene tree be different from the species tree?

  • Lack of resolution
  • Lineage sorting
  • Gene duplications/gene loss
  • Missing Data
  • Horizontal gene transfer

    Fig. Polytomy


    Fig. Illustration of lineage sorting (after Fig. 2.24 in Page and Holmes, Molecular Evolution, Blackwell, 1998)


    Fig. 7.10 [Source]. The evolutionary history of a gene that has undergone two independent duplication events


    Fig. 7.11 [Source]. The effects of gene loss and missing gene data on phylogenetic trees


    Fig. Example of horizontal gene transfer to cyanobacteria: threonyl tRNA synthetase.
    (Fig. 5 in Zhaxybayeva et al, 2006)


    Types of Homology

    Homology: Two sequences are homologous if both descend from a common ancestral molecule.

    Evolution of protein families: Homology (shared ancestry) versus Analogy (convergent evolution)

    Orthology: bifurcation in molecular tree reflects speciation
    Paralogy: bifurcation in molecular tree reflects gene duplication
    Xenology: gene was obtained by organism through horizontal transfer
    Synology: genes ended up in one organism through fusion of lineages.

    The "-logs" are often spelled with a "-ue" ending, e.g., "orthologues".

    Orthologs: bifurcation in molecular tree reflects speciation. These are the molecules people interested in the taxonomic classification of organisms want to study.

    Paralogs: bifurcation in molecular tree reflects gene duplication. The study of paralogs and their distribution in genomes provides clues on the way genomes evolved. Gene and genome duplication have emerged as the most important pathway to molecular innovation, including the evolution of developmental pathways.

    Xenologs: gene was obtained by organism through horizontal transfer. The classic examples of xenologs are antibiotic resistance genes, but the history of many other molecules also fits into this category: inteins, self-splicing introns, transposable elements, ion pumps, other transporters, etc.

    Synologs: genes ended up in one organism through fusion of lineages. The paradigmatic examples are genes that were transferred into the eukaryotic cell together with the endosymbionts that evolved into mitochondria and plastids.
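
The distinction between orthologs and paralogs can be illustrated on a small gene tree. The sketch below uses a species-overlap heuristic, which is an assumption introduced here and not a method described in these notes: a bifurcation is labelled a duplication if the same species appears on both sides of it, and a speciation otherwise. The tree, the `species_gene` leaf naming, and all function names are hypothetical. Xenologs and synologs cannot be detected this way.

```python
from itertools import product

# Hypothetical gene tree: leaves are "species_gene" strings,
# internal nodes are (left, right) tuples (strictly bifurcating).
GENE_TREE = (("human_A", "mouse_A"), ("human_B", "mouse_B"))


def species_of(leaf):
    """Extract the species label from a 'species_gene' leaf name."""
    return leaf.split("_")[0]


def leaf_names(node):
    """Return all leaf names under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaf_names(child)]


def classify_pairs(node, out=None):
    """Label every pair of genes by the event at their last common node."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    left, right = node
    l_leaves, r_leaves = leaf_names(left), leaf_names(right)
    shared = {species_of(x) for x in l_leaves} & {species_of(x) for x in r_leaves}
    # Same species on both sides: assume the split was a gene duplication.
    relation = "paralogs" if shared else "orthologs"
    for a, b in product(l_leaves, r_leaves):
        out[frozenset((a, b))] = relation
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


pairs = classify_pairs(GENE_TREE)
print(pairs[frozenset(("human_A", "mouse_A"))])  # orthologs
print(pairs[frozenset(("human_A", "mouse_B"))])  # paralogs
```

Note that `human_A` and `mouse_B` come out as paralogs even though they sit in different species, because the bifurcation that separates them reflects the duplication, not the speciation.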

    Mutations and Substitutions

    Mutations are errors in DNA replication or DNA repair. A point mutation is a mutation that affects a single nucleotide. Types of change caused by mutation:

    • Replacement of one nucleotide by another (substitution)
    • Recombination
    • Deletion
    • Insertion
    • Inversion

    Nucleotide substitutions can be further classified into:

    • transitions (substitution between A and G [purines] or between C and T [pyrimidines])
    • transversions (substitutions leading to change from purine to pyrimidine and vice versa)
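
The transition/transversion rule above can be written as a short function. This is a minimal sketch; the function names `substitution_type` and `count_ts_tv` are illustrative assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a, b):
    """Classify a pair of nucleotides as identical, transition or transversion."""
    a, b = a.upper(), b.upper()
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if a == b:
        return "identical"
    # Both purines or both pyrimidines: the change stays within one class.
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_ts_tv(seq1, seq2):
    """Count (transitions, transversions) between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    kinds = [substitution_type(a, b) for a, b in zip(seq1, seq2)]
    return kinds.count("transition"), kinds.count("transversion")


print(substitution_type("A", "G"))   # transition
print(substitution_type("A", "C"))   # transversion
print(count_ts_tv("AACC", "GTCT"))   # (2, 1)
```

Each base has one possible transition and two possible transversions, so of the 12 possible directed changes, 4 are transitions and 8 are transversions.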

    If nucleotide substitutions occur in protein-coding genes they can be:

    • synonymous, or silent (cause no amino acid change)
    • non-synonymous (otherwise), which are further classified into missense and nonsense substitutions.
    (see Genetic Code).
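
The classification above can be sketched in code by translating the codon before and after the substitution. The example assumes the standard genetic code and an original codon that encodes an amino acid; the names `CODON_TABLE` and `classify_substitution` are illustrative.

```python
# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon_before, codon_after):
    """Classify a codon change as synonymous, missense or nonsense."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    if before == "*":
        # Assumption of this sketch: only changes in sense codons are handled.
        raise ValueError("original codon is a stop codon")
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "missense"


print(classify_substitution("CTT", "CTC"))  # synonymous (Leu -> Leu)
print(classify_substitution("GAA", "GTA"))  # missense   (Glu -> Val)
print(classify_substitution("TGG", "TGA"))  # nonsense   (Trp -> stop)
```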

    Deletions and insertions in protein-coding genes may result in frameshift mutations.
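
Whether an insertion or deletion shifts the reading frame depends on its length: only lengths that are not a multiple of three change the frame. A minimal sketch, using a made-up coding sequence and hypothetical helper names:

```python
def codons(seq):
    """Split a coding sequence into complete codons, starting at position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def is_frameshift(indel_length):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


original = "ATGGCCAAATTT"                    # made-up example sequence
deleted_one = original[:3] + original[4:]    # remove 1 nucleotide
deleted_three = original[:3] + original[6:]  # remove one whole codon

print(codons(original))       # ['ATG', 'GCC', 'AAA', 'TTT']
print(codons(deleted_one))    # ['ATG', 'CCA', 'AAT']
print(codons(deleted_three))  # ['ATG', 'AAA', 'TTT']
```

After the single-nucleotide deletion every downstream codon changes, whereas the three-nucleotide deletion removes one codon and leaves the rest of the frame intact.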

    Measuring Genetic Change

    Given two aligned sequences, one question we can ask is "How much evolutionary change has occurred between these sequences?"

    The simplest measure of evolutionary distance is the so-called p-distance:

    p = D/L,

    where D is the number of sites at which the sequences differ and L is the alignment length.
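
As a minimal sketch of the p-distance formula, assuming two aligned sequences of equal length and no special treatment of gaps (the function name `p_distance` is an illustrative choice):

```python
def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ (p = D/L)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    differences = sum(a != b for a, b in zip(seq1, seq2))  # D
    return differences / len(seq1)                         # D / L


print(p_distance("ACGTACGT", "ACGTTCGA"))  # 0.25 (2 differences in 8 sites)
```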

    Just counting the number of sites at which the two sequences are different is too simplistic:

    Fig. Kinds of nucleotide substitutions (after Fig. 5.9 in Page and Holmes, Molecular Evolution, Blackwell, 1998).


    Fig. Observed vs. expected number of DNA substitutions. As time since divergence increases, multiple substitutions start to occur at the same site, making the number of visible substitutions smaller than the actual number. Eventually, after a very long time, there will have been substitutions at every site. Two random sequences with equal base frequencies will differ on average at 3/4 of their sites. A correction is required to compensate for the difference between the observed and expected number of substitutions.
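
The 3/4 figure follows from the chance that two independently drawn bases match, which is the sum of the squared base frequencies (4 × 1/16 = 1/4 for equal frequencies). The sketch below computes this value and checks it against a small simulation; the function names and the fixed random seed are illustrative assumptions.

```python
import random


def expected_mismatch(freqs):
    """Expected fraction of differing sites between two random sequences."""
    return 1.0 - sum(f * f for f in freqs)


def simulated_mismatch(length, seed=0):
    """Fraction of differing sites between two random equal-frequency sequences."""
    rng = random.Random(seed)
    seq1 = rng.choices("ACGT", k=length)
    seq2 = rng.choices("ACGT", k=length)
    return sum(a != b for a, b in zip(seq1, seq2)) / length


print(expected_mismatch([0.25, 0.25, 0.25, 0.25]))  # 0.75
print(simulated_mismatch(100_000))                  # close to 0.75
```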


    Poisson Distance Correction

    Let's assume that the number of substitutions at a site follows a Poisson distribution and that the rate of substitution r is uniform per site per time unit. After time t, the average number of substitutions per site will be rt.

    The probability of n substitutions occurring at a site over time period t is $f(rt, n) = e^{-rt}(rt)^n / n!$.

    Let's look at two sequences that diverged from their common ancestor time t ago. The probability of n = 0 substitutions occurring at a site is $e^{-rt}$ (see formula above) for each sequence and $e^{-2rt}$ for both of them. The fraction of sites at which the two sequences are still identical is 1 - p, where p is the p-distance, so $e^{-2rt} = 1 - p$.

    Under the assumption that the two sequences evolved independently from their ancestor, the evolutionary distance between them is $d = 2rt$. Therefore $1 - p = e^{-d}$, and $d = -\ln(1 - p)$.

    Fig. p-distance vs. Poisson-corrected p-distance
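
The Poisson probability and the resulting distance correction can be sketched as follows. The function names `poisson_probability` and `poisson_distance` are illustrative assumptions; the correction is undefined for p = 1, so the sketch rejects that input.

```python
import math


def poisson_probability(mean, n):
    """Probability of exactly n substitutions when the expected number is `mean` (= rt)."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def poisson_distance(p):
    """Poisson-corrected distance d = -ln(1 - p) for a p-distance p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log1p(-p)  # log1p(-p) = ln(1 - p), accurate for small p


# Tabulate the correction: d stays close to p for small p and grows faster later.
for p in (0.05, 0.1, 0.3, 0.5, 0.7):
    print(f"p = {p:.2f}  d = {poisson_distance(p):.3f}")
```

For small p the corrected distance is almost equal to p, and the gap widens as p grows, which is the pattern the figure above describes.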