MCB 372 / EEB 372 Computer Methods in Molecular Evolution
Instructors:
Peter Gogarten (Office TLS 73, phone 486-4061, gogarten@uconnvm.uconn.edu)
Paul Lewis (Office TLS 166A, phone 486-2069, plewis@uconnvm.uconn.edu)
Change meeting times?
Usually:
Mondays: demonstration and lecture
Wednesdays: exercises and assignments
Requirements and basis for grading:
ASSIGNMENTS - email results to the instructor or hand in a hard copy before the next Wednesday
Independent Student Project: COMPILE AND ANALYSE A DATASET OF YOUR CHOICE; hand in a WRITTEN summary of the project
Two written exams (in-class and take-home portion)
HOMEWORK AND INDEPENDENT STUDENT PROJECT ARE REQUIRED OF EVERY PARTICIPANT, REGARDLESS OF STATUS (credit, audit, or "sit in")
Class notes and homework assignments will be available through the WWW
(point your WWW browser to http://www.sp.uconn.edu/~gogarten/MCB372/).
Examples for Independent Student Projects:
Evolution of RubisCO and other enzymes of the Calvin cycle (archaeal genome)
Evolution of fructose bisphosphate synthesizing enzymes (ATP and PP using enzymes)
Evolution of TUBULINS and TUBULIN-related eubacterial and archaeal proteins
Histones and histone homologues in archaea and eubacteria
Actin & intermediate filaments
Aminoacyl-tRNA synthetases
H+ATPase subunits (in particular homologues to sub I, sub A and B, and proteolipid)
P-ATPases: relation among different ion specificities, homologues, orthologues to archaeal P-types?
Inteins: vertical versus horizontal transmission
The course will cover the following topics:
· databank searches, BLAST and iterative BLAST searches
· sequence alignment, sequence alignment using 3-D information
· statistical analyses of sequence data
· phylogenetic reconstruction using parsimony, distance matrix, maximum likelihood, and other methods
· likelihood ratio tests for model selection
· assessment of confidence using maximum likelihood mapping, bootstrapping, Bremer support/decay index, Bayesian posterior probabilities, and split decomposition
Recommended reading
GENERAL INFORMATION
Most of the programs that run locally should go in the class folder labeled 372 (CREATE THIS FOLDER NOW).
After class you should back up your personal files to a ZIP disk or to your UNIX account.
CLASS ACCOUNTS for UNIX (sheets handed out, don't lose passwords);
recommended: get student accounts for both mainframe and sp
!!!Warning!!!
The course is designed to teach practical aspects of molecular data analyses with respect to molecular evolution.
The course does not provide an introduction to molecular evolution, nor is it a place for extended discussions of the virtues of cladistic analyses.
Students lacking background in molecular evolution should be aware that they will need to do extensive reading of molecular evolution textbooks in addition to the coursework.
Sites for databank searches and retrieval:
(build your own tools page e.g.)
This is probably the only site you'll ever need for databank searches:
http://www.ncbi.nlm.nih.gov/
The NCBI maintains several databanks. The entries in each databank are pre-linked to other entries in the same databank and to entries in the other databanks:
Medline (PubMed)
Protein
Nucleotide
Structure
Genome
The new interface is great for working with many sequences, and once you get used to it, it can do the same things as the old Medline advanced search.
Other Webpages
http://www.ebi.ac.uk/ The European homologue/analogue to NCBI. I use this site only for their excellent software archive.
The ribosomal databank project is here http://rdpwww.life.uiuc.edu/index2.html,
the old site is here: http://rdpwww.life.uiuc.edu/index2.html
Other sites worth visiting:
http://www.tigr.org/tigr_home/index.html Home of several "completed" genome projects
http://genome-www.stanford.edu/ Yeast and Arabidopsis genome projects
ENTREZ
Medline - DNA - Protein - genome databanks
Everything is already cross-linked between the three databanks.
"Homologous" sequences and papers (!) are one click away (related sequence button).
But CROSSLINKS are updated only slowly.
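The cross-links described above can also be followed from a script rather than by clicking. The following is a minimal sketch, assuming NCBI's E-utilities web interface (`esearch.fcgi` and `elink.fcgi`), which is not part of the original class notes. It only builds the query URLs and does not contact the server; the helper names, the search term, and the record ID used in the usage note are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumption: base address of NCBI's E-utilities (not mentioned in the class notes).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build a URL that searches one Entrez databank (e.g. 'protein') for a term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?" + urlencode(params)


def elink_url(dbfrom, db, uid):
    """Build a URL that asks for records in `db` linked to record `uid` in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return f"{EUTILS_BASE}/elink.fcgi?" + urlencode(params)
```

For example, `esearch_url("protein", "tubulin AND archaea[Organism]")` gives a search URL for the protein databank, and `elink_url("protein", "pubmed", "12345")` gives a URL that requests the papers linked to a protein record (the ID 12345 is made up for illustration).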
In addition to using the prelinked relationships, you can search for similar sequences using BLAST.
There are several options; for now we will only use the gapped BLAST search.
On the advanced BLAST search page you can select five different BLAST programs, which perform the following searches:
BLASTP compares an amino acid query sequence against a protein sequence database;
BLASTN compares a nucleotide query sequence against a nucleotide sequence database;
BLASTX compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database;
TBLASTN compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands);
TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
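To make "six-frame translation" concrete, here is a minimal sketch of what BLASTX, TBLASTN, and TBLASTX do conceptually before comparing sequences: the nucleotide sequence is read in three frames on the given strand and three frames on the reverse complement. This is an illustration only, not the BLAST implementation. It assumes the standard genetic code and unambiguous A/C/G/T input; the function names and the frame labels (+1 to +3, -1 to -3) are chosen here for illustration.

```python
from itertools import product

# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are shown as `*`. Only one of the six frames normally corresponds to the real protein, which is why the translated searches compare all six against the database.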
DATABANKS: if one does not have a specific aim in mind, nr (= "non-redundant"; you wish) is the best choice.
In assignment #12, check the different alignment options on the format page.