MCB 372 / EEB 372 Computer Methods in Molecular Evolution

Instructors: Peter Gogarten (Office TLS 73, phone 486-4061, gogarten@uconnvm.uconn.edu)

                    Paul Lewis (Office TLS 166A , phone 486-2069, plewis@uconnvm.uconn.edu)

 

Change meeting times? 

 

Usually:

Mondays: demonstration and lecture

Wednesdays: exercises and assignments

Requirements and basis for grading:

ASSIGNMENTS - email results to instructor or hand in hard copy before next Wednesday

Independent Student Project:
COMPILE AND ANALYSE A DATASET OF YOUR CHOICE;
Hand in WRITTEN summary of project

Two written exams (in-class and take-home portion)

HOMEWORK AND INDEPENDENT STUDENT PROJECT ARE REQUIRED OF EVERY PARTICIPANT, REGARDLESS OF STATUS (credit, audit, or "sit in")

Class notes and homework assignments will be available through the www.
         (point your www browser to http://www.sp.uconn.edu/~gogarten/MCB372/ )

Examples for Independent Student Projects:

          Evolution of RubisCO and other enzymes of the Calvin cycle (archaeal genome)

          Evolution of Fructose bis Phosphate synthesizing enzymes (ATP and PP using enzymes)

          Evolution of TUBULINS and TUBULIN related eubacterial and archaeal proteins

          Histones and histone homologues in archaea and eubacteria

          Actin & intermediate filaments

          Aminoacyl tRNA synthetases

          H+ATPase subunits
                    (in particular homologues to sub I, Sub A and B and proteolipid)

          P-ATPases: relation among different ion specificities,
                              homologues, orthologues to archaeal P-types?

          Inteins: vertical versus horizontal transmission

The course will cover the following topics:

· databank searches, blast and iterative blast searches

· sequence alignment, sequence alignment using 3-D information,

· statistical analyses of sequence data,

· phylogenetic reconstruction using parsimony, distance matrix, maximum likelihood, and other methods

· likelihood ratio tests for model selection

· Assessment of confidence using maximum likelihood mapping, bootstrapping, Bremer support/decay index, Bayesian posterior probabilities, and split decomposition

         

Recommended reading

a) Roderic D. M. Page, Edward C. Holmes: Molecular Evolution : A Phylogenetic Approach
Blackwell Science Inc; ISBN: 0865428891 -  currently on back order. 
This book gives an excellent introduction to terms, methods, and problems in molecular evolution.  It does not go into much detail on individual algorithms, but it provides a very readable overview. 

b) Li: Molecular Evolution – well written, gives good background, especially with respect to the biological aspects of evolution. 

c) Graur and Li: Fundamentals of Molecular Evolution, Second Edition – more data on algorithms, but fewer examples

d) Hillis, Moritz, and Mable: Molecular Systematics, Second Edition – a lot of data and information on how to make trees. It is not easy reading and is recommended only for looking things up in more detail. 

 

GENERAL INFORMATION

Most of the programs that run locally should go to the class folder labeled 372 (CREATE THIS FOLDER NOW).

After class you should back up your personal files to a ZIP disk or to your UNIX account.

CLASS ACCOUNTS for UNIX (sheets handed out, don't lose passwords),
recommended: get Student accounts for both mainframe and sp

!!!Warning!!!

The course is designed to teach practical aspects of molecular data analyses with respect to molecular evolution.  
The course does not provide an introduction to molecular evolution, nor is it a place for extended discussions of the virtues of cladistic analyses.   Students lacking background in molecular evolution should be aware that they will need to do extensive reading of molecular evolution textbooks in addition to the coursework. 

 

Sites for databank searches and retrieval:
        (e.g., build your own tools page)

This is probably the only site you'll ever need for databank searches:
       *******
http://www.ncbi.nlm.nih.gov/ ******

The NCBI maintains several databanks.  The entries in each databank are pre-linked to other entries in the same databank and to entries in the other databanks:

Medline (PubMed)
Protein
Nucleotide
Structure
Genome

The new interface is great for working with many sequences, and after some getting used to, it can do the same things as the old medline advanced search.

Other Webpages

http://www.ebi.ac.uk/   The European homologue/analogue to NCBI.  I use this site only for its excellent software archive.

The Ribosomal Database Project is here: http://rdpwww.life.uiuc.edu/index2.html,
         the old site is here: http://rdpwww.life.uiuc.edu/index2.html

Other sites worth visiting:

http://www.tigr.org/tigr_home/index.html

                    Home of several "completed" genome projects

http://genome-www.stanford.edu/

                    YEAST and Arabidopsis genome projects

 

ENTREZ

Medline - DNA - Protein – genome data banks

Everything is already cross-linked between the three databanks.

"Homologous" sequences and papers (!) one click away (related sequence button)

But CROSSLINKS are updated only slowly

 

In addition to using the pre-linked relationships, you can search for similar sequences using

BLAST

There are several options; for now we will use only the gapped BLAST search.

On the advanced BLAST search page you can select among five different BLAST programs, which perform the following searches:

BLASTP compares an amino acid query sequence against a
         protein sequence database;

BLASTN compares a nucleotide query sequence against a
         nucleotide sequence database;

BLASTX compares the six-frame conceptual translation
         products of a nucleotide query sequence (both
         strands) against a protein sequence database;

TBLASTN compares a protein query sequence against a
         nucleotide sequence database dynamically
         translated in all six reading frames (both
         strands);

TBLASTX compares the six-frame translations of a nucleotide
         query sequence against the six-frame translations
         of a nucleotide sequence database.
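
The five programs differ only in the type of the query, the type of the database, and which side gets translated. The short Python sketch below is one way to picture this; it is not part of BLAST or of any NCBI tool. The names choose_blast_program, translate, and six_frame_translations are hypothetical helpers made up for illustration, and the translation assumes the standard genetic code and an unambiguous DNA sequence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def choose_blast_program(query_type, database_type, translate_both=False):
    """Hypothetical helper: name the BLAST program for a query/database pair."""
    if translate_both:
        # TBLASTX translates a nucleotide query AND a nucleotide database.
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("TBLASTX needs a nucleotide query and database")
        return "TBLASTX"
    programs = {
        ("protein", "protein"): "BLASTP",
        ("nucleotide", "nucleotide"): "BLASTN",
        ("nucleotide", "protein"): "BLASTX",   # query is translated
        ("protein", "nucleotide"): "TBLASTN",  # database is translated
    }
    return programs[(query_type, database_type)]


def translate(dna):
    """Translate one reading frame; trailing bases short of a codon are dropped."""
    usable = len(dna) - len(dna) % 3
    # Codons with characters other than A, C, G, T are reported as 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))


def six_frame_translations(dna):
    """Conceptual translation in three frames on each of the two strands."""
    dna = dna.upper()
    reverse_strand = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_strand[offset:])
    return frames
```

For example, choose_blast_program("nucleotide", "protein") returns "BLASTX", and six_frame_translations("ATGGCCTAA")["+1"] gives "MA*", where "*" marks a stop codon. The six strings returned for a query are what BLASTX conceptually compares against a protein database.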

DATABANKS: if one does not have a specific aim in mind, nr (= non-redundant, you wish) is the best choice.

In assignment #12 check the different alignment options on the format page

Go to Assignment #1: