Open seaview, and load
Note that these sequences are already aligned.
We want to calculate two likelihood values using the Trees > PhyML menu in seaview:
Collaborate with your neighbor to get likelihoods for LG and LG+Gamma (do one each).
The log-likelihood score is shown in the tree-building text window in seaview. Once the computation is complete, it appears as "Log likelihood of the current tree:".
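If you save the text of the tree-building window to a file, you can also pull the score out programmatically. The Python sketch below assumes the line looks exactly as quoted above. `extract_log_likelihood` is a hypothetical helper written for this handout and is not part of seaview or PhyML.

```python
import re

# Matches e.g. "Log likelihood of the current tree: -5123.456789." and captures the number.
LNL_PATTERN = re.compile(
    r"Log likelihood of the current tree:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def extract_log_likelihood(output_text: str) -> float:
    """Return the log-likelihood reported in text copied from the PhyML window.

    If the text holds several such lines (e.g. output of more than one run
    pasted together), the last one is returned.
    """
    matches = LNL_PATTERN.findall(output_text)
    if not matches:
        raise ValueError("no 'Log likelihood of the current tree:' line found")
    return float(matches[-1])


if __name__ == "__main__":
    # Hypothetical text; the value is made up, not a result for this alignment.
    pasted = "Log likelihood of the current tree: -5123.456789.\n"
    print(extract_log_likelihood(pasted))
```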
One important condition that must be fulfilled before a Likelihood Ratio Test (LRT) can be used to compare two models is that the models are "nested". This means that the simpler model must be a constrained version of the parameter-rich model. For example, the equal-rates LG model is LG+Gamma constrained so that all sites evolve at the same rate.
The likelihood ratio test is performed by doubling the difference in log-likelihood scores and comparing this test statistic with the critical value from a chi-squared distribution having degrees of freedom equal to the difference in the number of estimated parameters in the two models.
Because of its extra parameters, the parameter-rich model will always fit at least as well as the simpler model and will therefore have the higher log-likelihood, so the difference should be a positive number.
In this case there is 1 degree of freedom: the gamma shape parameter.
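The arithmetic of the test fits in a few lines of Python. This is a minimal illustration and is not part of seaview or PhyML. The helper names are hypothetical, and the critical values are the standard 5% points of the chi-squared table. The example log-likelihoods are made up rather than taken from this alignment.

```python
# Upper 5% critical values of the chi-squared distribution, from standard tables.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def lrt_statistic(lnl_simple: float, lnl_rich: float) -> float:
    """Return 2 * (lnL_rich - lnL_simple), the likelihood ratio test statistic."""
    stat = 2.0 * (lnl_rich - lnl_simple)
    if stat < 0.0:
        # For nested models the richer one should fit at least as well; a negative
        # value suggests the scores were swapped or an optimization did not converge.
        raise ValueError("the parameter-rich model has the lower log-likelihood")
    return stat


def lrt_at_5_percent(lnl_simple: float, lnl_rich: float, df: int = 1) -> tuple[float, bool]:
    """Return (statistic, True if the richer model is significantly better at alpha = 0.05)."""
    stat = lrt_statistic(lnl_simple, lnl_rich)
    return stat, stat > CHI2_CRITICAL_05[df]


if __name__ == "__main__":
    # Made-up scores, NOT results for this alignment: lnL(LG) and lnL(LG+Gamma).
    stat, significant = lrt_at_5_percent(-5200.0, -5150.0, df=1)
    print(stat, significant)
```

With these made-up scores the statistic is 2 × 50 = 100, far above 3.841, so the richer model would be judged significantly better.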
Use an online chi-square calculator to determine the significance of the test.
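You can also compute the p-value (the upper-tail probability of the chi-squared distribution) with the Python standard library alone, without an online calculator. The sketch below evaluates the regularized incomplete gamma function using the textbook split between a power series and a continued fraction, which is the approach described in Numerical Recipes. `chi2_sf` is a hypothetical name. For real work a vetted routine such as SciPy's `scipy.stats.chi2.sf` is preferable.

```python
import math

_EPS = 1e-14       # relative convergence tolerance
_TINY = 1e-300     # guards against division by zero in the continued fraction
_MAX_ITER = 1000


def _lower_gamma_series(s: float, z: float) -> float:
    """Regularized lower incomplete gamma P(s, z) by power series (used when z < s + 1)."""
    ap = s
    term = total = 1.0 / s
    for _ in range(_MAX_ITER):
        ap += 1.0
        term *= z / ap
        total += term
        if abs(term) < abs(total) * _EPS:
            break
    else:
        raise RuntimeError("series did not converge")
    return total * math.exp(-z + s * math.log(z) - math.lgamma(s))


def _upper_gamma_cf(s: float, z: float) -> float:
    """Regularized upper incomplete gamma Q(s, z) by continued fraction (used when z >= s + 1)."""
    b = z + 1.0 - s
    c = 1.0 / _TINY
    d = 1.0 / b
    h = d
    for i in range(1, _MAX_ITER):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        if abs(d) < _TINY:
            d = _TINY
        c = b + an / c
        if abs(c) < _TINY:
            c = _TINY
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    else:
        raise RuntimeError("continued fraction did not converge")
    return math.exp(-z + s * math.log(z) - math.lgamma(s)) * h


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-squared(df), i.e. the p-value of the test statistic x."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    if x <= 0.0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    if z < s + 1.0:
        return 1.0 - _lower_gamma_series(s, z)
    return _upper_gamma_cf(s, z)
```

For example, `chi2_sf(3.841, 1)` is about 0.05, matching the 5% critical value for 1 degree of freedom.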
Does the LG+Gamma model explain the data significantly better than an equal-rates LG model? Why?
Open seaview, and load
Note that these sequences are already aligned (and there are no gaps!).
We want to calculate two likelihood values using PhyML:
Again, collaborate with your neighbor to get likelihoods for WAG and WAG+Gamma.
Does the WAG+Gamma model explain the data significantly better than an equal-rates WAG model? Why?
to Bayesian Analyses
In class we will use MrBayes 3.2 as installed on the bioinformatics cluster. To start the program, move to the directory where your sequence data are located and type mb.
Intro Slides are here
The goal of this exercise is to
learn how to use MrBayes to reconstruct phylogenies.
by example: Identification of sites under positive selection in a protein
Professor Walter M. Fitch and assistant research biologist Robin M. Bush
of UCI's Department of Ecology and Evolutionary Biology, working with researchers
at the Centers for Disease Control and Prevention, studied the evolution of a
prevalent form of the influenza A virus during an 11-year period from 1986 to
1997. They discovered that viruses having mutations in certain parts of an important
viral surface protein were more likely than other strains to spawn future influenza
lineages. Human susceptibility to infection depends on immunity gained during
past bouts of influenza; thus, new viral mutations are required for new epidemics
to occur. Knowing which currently circulating mutant strains are more likely to
have successful offspring potentially may help in vaccine strain selection. The
researchers' findings appear in the Dec. 3 issue of Science magazine.
Fitch and his fellow researchers followed the evolutionary pattern of the influenza
virus, one that involves a never-ending battle between the virus and its host.
The human body fights the invading virus by making antibodies against it. The
antibodies recognize the shape of proteins on the viral surface. Previous infections
only prepare the body to fight viruses with recognizable shapes. Thus, only those
viruses that have undergone mutations that change their shape can cause disease.
Over time, new strains of the virus continually emerge, spread and produce offspring
lineages that undergo further mutations. This process is called antigenic drift.
"The cycle goes on and on: new antibodies, new mutants," Fitch said.
Their research into the virus's genetic data focused on the evolution of the hemagglutinin
gene, the gene that codes for the major influenza surface protein. Fitch and fellow
researchers constructed "family trees" for viral strains from 11 consecutive flu
seasons. Each branch on the tree represents a new mutant strain of the virus.
They found that the viral strains undergoing the greatest number of amino acid
changes in specified positions of the hemagglutinin gene were most closely related
to future influenza lineages in nine of the 11 flu seasons tested.
By studying the family trees of various flu strains, Fitch said, researchers can attempt to
predict the evolution of an influenza virus and thus potentially aid in the development
of more effective influenza vaccines.
The research team is currently expanding
its work to include all three groups of circulating influenza viruses, hoping
that contrasting their evolutionary strategies may lend more insight into the
evolution of influenza.
Along with Fitch and Bush, Catherine A. Bender,
Kanta Subbarao and Nancy J. Cox of the Centers for Disease Control and Prevention
participated in the study.
The goal of this exercise
is to detect sites in hemagglutinin that are under positive selection.
Because the analysis takes a very long time to run (several days), here are the saved
results of the MrBayes run:
The original data file is flu_data.paup. The dataset was obtained from an
article by Yang et al. (2000). The file used for MrBayes is
The MrBayes block used to obtain the results above is:
Selecting nucmodel=codon
with Omegavar=Ny98 specifies a model in which, for
every codon, the ratio of the rate of non-synonymous to synonymous substitutions
is considered. This ratio is called omega. The Ny98 model considers three different
omegas: one equal to 1 (no selection; these sites are neutral), a second with omega
< 1 (these sites are under purifying selection), and a third with omega > 1
(these sites are under positive, or diversifying, selection). (A limitation of
this model is that only three distinct omegas are estimated, together with, for
each site, the probability of falling into each of these three classes. If the omega > 1
is estimated to be very large because one site has a large omega, other sites
might not have a high probability of sharing that omega, even though they might
also be under positive selection. As a result, the site with the largest omega is
identified with confidence, while the others receive more moderate probabilities of
being under positive selection.)
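The per-site class probabilities can be made concrete with Bayes' rule for a single site. The minimal Python sketch below assumes that the three class proportions, the three omegas, and the site's likelihood under each omega are already known. All numbers are hypothetical and the function names are illustrative only. MrBayes does not expose such functions, and it samples the omegas and proportions by MCMC rather than holding them fixed as done here.

```python
# Ny98 site classes, in the order used below.
CLASS_NAMES = ("purifying (omega < 1)", "neutral (omega = 1)", "positive (omega > 1)")


def class_posteriors(proportions, site_likelihoods):
    """Posterior probability that a single site falls into each omega class.

    proportions      -- prior weight of each class (should sum to 1)
    site_likelihoods -- likelihood of this site's codon data under each class's omega
    Bayes' rule: P(class k | site) = p_k * L_k / sum_j p_j * L_j
    """
    if len(proportions) != 3 or len(site_likelihoods) != 3:
        raise ValueError("Ny98 has exactly three omega classes")
    joint = [p * lik for p, lik in zip(proportions, site_likelihoods)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("site likelihoods must not all be zero")
    return [j / total for j in joint]


def posterior_mean_omega(posteriors, omegas):
    """Omega for the site averaged over the three classes, weighted by posterior probability."""
    return sum(p * w for p, w in zip(posteriors, omegas))


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not estimates from the flu data.
    proportions = (0.6, 0.3, 0.1)
    omegas = (0.1, 1.0, 5.0)
    site_lik = (1e-5, 2e-5, 8e-4)
    post = class_posteriors(proportions, site_lik)
    for name, p in zip(CLASS_NAMES, post):
        print(f"{name}: {p:.3f}")
    print(f"posterior mean omega: {posterior_mean_omega(post, omegas):.2f}")
```

The parenthetical limitation can be read off this formula. If the positive-class omega is pushed very high, a site whose omega is only moderately above 1 can have a lower likelihood under that class than under the neutral class. Its posterior probability of positive selection then drops.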
Note: Version 2.0
of MrBayes has a model that estimates omega for each site individually;
the newer version does not offer it, so the Ny98 model described above is used here.
Type logout to release the compute node from the queue.
If you encountered problems in your session, check the queue for abandoned sessions using the command qstat.
If there are abandoned sessions under your account, delete them from the queue by typing qdel job-ID; for example, "qdel 40000" deletes job 40000.