MCB 372 / EEB 372 Computer Methods in Molecular Evolution

MCB 371 / EEB 371 Current Topics in Molecular Evolution

Organizational Meeting and Class 1

Instructors: Peter Gogarten (Office TLS 73, phone 486-4061, gogarten@uconn.edu)
Paul Lewis (Office TLS 166A, phone 486-2069, paul.lewis@uconn.edu)
Teaching Assistant: Olga Zhaxybayeva (Office TLS 75, phone 486-3686, olga.zh@uconn.edu)

Usually:

Mondays: demonstrations, 20-minute student presentation, discussions, and lecture
Wednesdays: exercises and assignments

Requirements and basis for grading:

  1. 20% participation and assignments:
    email results to the instructor/TA or hand in a hard copy before the next Wednesday
  2. 20% Independent Student Project:
    COMPILE AND ANALYSE A DATASET OF YOUR CHOICE;
    hand in a WRITTEN summary of the project and give a 15-minute oral summary during the last week of classes
  3. 20% each: two written exams (in-class and take-home portions, or take-home portion only)
  4. 20% one 15-minute presentation in the Monday section.

    HOMEWORK, INDEPENDENT STUDENT PROJECT, AND PRESENTATION ARE REQUIRED OF EVERY PARTICIPANT, REGARDLESS OF STATUS (credit, audit, or "sit in")
Class notes and homework assignments will be available on the web at http://www.sp.uconn.edu/~gogarten/MCB372/ or at http://carrot.mcb.uconn.edu/mcb372/

Examples for Independent Student Projects:

  • Evolution of RubisCO and other enzymes of the Calvin cycle (archaeal genome)
  • Evolution of fructose-bisphosphate-synthesizing enzymes (ATP- and PP-using enzymes)
  • Evolution of TUBULINS and TUBULIN related eubacterial and archaeal proteins
  • Histones and histone homologues in archaea and eubacteria
  • Actin & intermediate filaments
  • Aminoacyl-tRNA synthetases
  • H+-ATPase subunits (in particular homologues to subunit I, subunits A and B, and the proteolipid)
  • P-ATPases: Relation among different ion specificities, homologues, orthologues to archaeal P-types?
  • Inteins: vertical versus horizontal transmission
  • Origin of the photosynthetic machinery
  • Evolution of phycobilisomes
  • Evolution of cyanobacteria

Please don't hesitate to discuss your choice of project with the instructors or with the TA.
The earlier you decide on a topic the better!

The course will cover the following topics:

  • databank searches, BLAST and iterative BLAST searches,
  • sequence alignment, sequence alignment using 3-D information,
  • statistical analyses of sequence data,
  • phylogenetic reconstruction using parsimony, distance-matrix, maximum-likelihood, Bayesian-inference, and other methods
  • likelihood ratio tests for model selection
  • Assessment of confidence using maximum likelihood mapping, bootstrapping, Bremer support/decay index, Bayesian posterior probabilities, and split decomposition

    The following programs will be used:

    NCBI webpages, ClustalX, Swiss-PdbViewer, Phylip, TREE-PUZZLE, PAUP*, MrBayes

GENERAL INFORMATION

After class you should back up your personal files to a ZIP disk or to a mainframe or UNIX account.

Having an account on a UNIX machine (e.g., SP) might be a good idea, but is not required.

!!!Warning!!!

The course is designed to provide hands-on experience in molecular data analyses with respect to molecular evolution. Past experience with unprepared students compelled us to offer this course in conjunction with a lecture/seminar-style course that explores the theoretical foundations of the different applications. Every student is required to show up for both the lab exercises and the lectures and seminars!
The course does not provide an overview of computer applications in molecular biology in general, nor does it attempt to provide an overview of bioinformatics. The focus is on different aspects of molecular evolution. While discussions among students are encouraged, this is not a seminar to expound extensively on the virtues of cladistic analyses.

Students will need to spend time in the computer lab in addition to the scheduled class period, and students who have had no prior exposure to methods used in phylogenetic reconstruction will need to spend some time reading and studying.


Sites for databank searches and retrieval:
        (build your own tools page with the programs you use frequently)

This is probably the only site you'll ever need for databank searches:
http://www.ncbi.nlm.nih.gov/

The NCBI maintains several databanks. The entries in each databank are pre-linked to other entries in the same databank and to entries in the other databanks.

Medline (PubMed), including books
Protein
Nucleotide
Structure
Genome
Taxonomy

Other Webpages

http://www.ebi.ac.uk/  
The European homologue/analogue to NCBI. I use this site for its excellent software archive.

http://rdpwww.life.uiuc.edu/index2.html
    The US Ribosomal Database Project (RDP); the old version is here: http://rdpwww.life.uiuc.edu/index2.html

http://www-rrna.uia.ac.be/rrna/index.html
     The European ribosomal RNA databank

http://www.jgi.doe.gov/JGI_microbial/html/index.html
    Microbial genomes at the DOE Joint Genome Institute

http://www.tigr.org/tigr_home/index.html
        Home of several "completed" genome projects

http://genome-www.stanford.edu/
        YEAST and Arabidopsis genome projects

http://www.ncbi.nlm.nih.gov:80/PMGifs/Genomes/micr.html
        List of completed genomes at the NCBI


ENTREZ

Medline, DNA, protein, and genome databanks, protein structures, books

Everything is already cross-linked between the databanks.

"Homologous" sequences and related papers (!) are only one click away (the "related sequences" and "related Medline" buttons).

Warning: Sometimes CROSS-LINKS are updated only slowly. Links from papers to sequences often never make it into the databanks.

In addition to using the prelinked relationships you can search for similar sequences at the NCBI's site; however, often this is not necessary.

A sample GenBank-formatted entry is here. Explore the meaning of the different links.
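The course retrieves entries through the NCBI web pages. As an alternative sketch (not part of the course material), the same GenBank-formatted record can also be requested through NCBI's E-utilities efetch service. The function below only builds the request URL, assuming efetch's `db`, `id`, `rettype`, and `retmode` parameters; the function name `genbank_fetch_url` and the accession `ABC123` used in the usage line are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities efetch service (returns full records).
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fetch_url(accession: str, db: str = "nucleotide") -> str:
    """Build (but do not send) an efetch URL for a flat-file record.

    For db="nucleotide", rettype="gb" requests the GenBank flat file;
    for db="protein", rettype="gp" requests the GenPept flat file.
    retmode="text" asks for plain text rather than XML.
    """
    params = [
        ("db", db),
        ("id", accession),
        ("rettype", "gb" if db == "nucleotide" else "gp"),
        ("retmode", "text"),
    ]
    return EFETCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # "ABC123" is a hypothetical accession; substitute a real one.
    print(genbank_fetch_url("ABC123"))
```

Pasting the printed URL into a browser, or opening it with `urllib.request.urlopen`, should return the record as plain text. NCBI asks scripted users to identify themselves (`tool` and `email` parameters) and to keep the request rate low.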

Other formats that are frequently used and notes on the different alphabets are here.
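FASTA is probably the most common of these other formats. A minimal Python sketch, assuming plain FASTA text (header lines beginning with `>`, sequence on the following lines), reads the records into a dictionary and makes a crude nucleotide-versus-protein guess from the alphabet used. The function names and the character set `NUCLEOTIDE_CHARS` are illustrative assumptions, not a standard.

```python
from typing import Dict, List, Optional

# Characters accepted as nucleotide codes in this sketch: the four DNA bases,
# U for RNA, N for "any base", and "-" for alignment gaps. The other IUPAC
# ambiguity codes (R, Y, K, M, ...) are deliberately NOT included.
NUCLEOTIDE_CHARS = set("ACGTUN-")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    A record starts with a line beginning with '>'; its sequence may be split
    over any number of following lines. Blank lines and any lines before the
    first header are ignored; a repeated header overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def guess_alphabet(seq: str) -> str:
    """Return 'nucleotide' if every character is in NUCLEOTIDE_CHARS, else 'protein'."""
    return "nucleotide" if set(seq.upper()) <= NUCLEOTIDE_CHARS else "protein"


if __name__ == "__main__":
    example = ">seq1 demo\nACGT\nacgu\n>seq2\nMKV-LA\n"
    for name, seq in parse_fasta(example).items():
        print(name, seq, guess_alphabet(seq))
```

The alphabet test is deliberately naive: a short peptide made only of A, C, G, T, and N would be called "nucleotide", and a nucleotide sequence containing ambiguity codes such as R or Y would be called "protein".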

Assignments #1