MCB 372 - CLASS 12

Questions and comments regarding last Wednesday's class

Sequences with gaps deleted are here; the same set without prokaryotes is here.

Results from the distance matrix calculation are here.

Neighbor joining tree from these distances w/o prokaryotes is here; with prokaryotic sequences is here

Eight usertrees (somewhat modified according to justified expectation) are here

The evaluation of the usertrees is here (note that trees 6, 7 and 8 are unresolved).

A test for the molecular clock using the ML (likelihood) ratio test is here.
Note that the topology in the usertree within the plants is probably incorrect (several branches are close to 0).
Note that the current official version of puzzle does not let you place the root in a user-defined location (the option is there, but the root goes on the branch leading to the first species). The analysis given here as an example was calculated with a version of puzzle provided by one of the authors (Heiko Schmidt).
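As a rough illustration of what the clock test above computes, here is a minimal Python sketch of a likelihood ratio test. It assumes the usual setup: the clock-constrained (rooted) tree is nested in the unconstrained (unrooted) tree, so for n taxa the models differ by n - 2 free parameters. The function name `clock_lrt` and the log-likelihood values are hypothetical and are not taken from the puzzle output linked above.

```python
def clock_lrt(lnl_no_clock, lnl_clock, n_taxa):
    """Return (test statistic, degrees of freedom) for a molecular clock LRT.

    lnl_no_clock: log-likelihood of the tree without the clock constraint
    lnl_clock:    log-likelihood of the same topology with the clock enforced
    n_taxa:       number of sequences in the tree
    """
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    # Twice the log-likelihood difference between the nested models.
    statistic = 2.0 * (lnl_no_clock - lnl_clock)
    # An unrooted tree has 2n - 3 branch lengths, a clock tree has n - 1
    # node heights, so the difference is n - 2 parameters.
    degrees_of_freedom = n_taxa - 2
    return statistic, degrees_of_freedom


# Hypothetical log-likelihoods for a 10-taxon data set.
stat, df = clock_lrt(-5230.4, -5241.9, n_taxa=10)
print(f"2*delta(lnL) = {stat:.2f} with {df} degrees of freedom")
```

The statistic is then compared with a chi-square distribution with the given degrees of freedom; a value above the chosen critical value rejects the clock.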

Student presentations #4 and #5:

Among Site Rate Variation
Estimating number of substitutions

 

Application of ML mapping to comparative Genome analyses

A recent article on the use of ML mapping in comparative genome analyses is here. (Go through Figs. 1, 2, 3, 4 and 7, and Tab. 4.)

 

Automation of Repetitive Tasks

SEALS demo (the SSH client PuTTY is available here or here).

 

Protein Data Bank at Research Collaboratory for Structural Bioinformatics (RCSB)

The Protein Data Bank (PDB) is a public collection of three-dimensional structures of macromolecules and macromolecular complexes, determined experimentally by X-ray crystallographers and NMR spectroscopists.

There are three ways to search the PDB:

  • using the PDB ID codes
  • using SearchLite
  • using SearchFields.

A PDB ID code is a unique four-character alphanumeric code assigned to every structure in the databank. Each character is either a digit (0-9) or an uppercase letter (A-Z). PDB ID codes are often cited in articles as a reference to the structure of a biomolecule.
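The rule just described can be checked in code. The sketch below is only an illustration of that rule (exactly four characters, each a digit or an uppercase letter); the name `is_pdb_id` is made up for this example, and the databank may apply further restrictions that are not checked here.

```python
import re

# Four characters, each a digit 0-9 or an uppercase letter A-Z.
PDB_ID_PATTERN = re.compile(r"[0-9A-Z]{4}")


def is_pdb_id(code):
    """Return True if `code` has the form of a PDB ID as described above."""
    return PDB_ID_PATTERN.fullmatch(code) is not None


for candidate in ["4HHB", "4hhb", "4HHB1", "4H-B"]:
    print(candidate, is_pdb_id(candidate))
```

Codes copied from articles are sometimes written in lowercase, so it can be convenient to call `code.upper()` before checking.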

SearchLite lets you search the databank using keywords that describe the biomolecule you are interested in. The full list of attributes and examples of search queries are given here.

SearchFields lets you build sophisticated, customized queries. For example, you can use a primary sequence in FASTA format to search for structures.

The data in the Protein Data Bank are stored in a special format, the so-called PDB format. If you are interested in the format specification, you can click here. This is the format that SPDBV (and other visualization programs) reads. It contains the position of every atom in a molecule, as well as some auxiliary information such as citations and comments.
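To give a feel for how atom positions are stored, here is a minimal Python sketch that reads one coordinate record. PDB files use fixed columns, so fields are taken by position rather than by splitting on spaces. The names `Atom` and `parse_atom_line` and the coordinate values in the example record are made up for illustration; only a subset of the fields is read, and the format specification remains the authoritative reference.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line):
    """Parse one ATOM or HETATM record; return None for other record types."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    # Python slices are 0-based, so columns 7-11 become line[6:11], and so on.
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


# A hypothetical record, assembled piece by piece to show the column layout.
example = (
    "ATOM  "      # columns 1-6: record name
    "    1"       # columns 7-11: atom serial number
    " "           # column 12: unused
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location indicator
    "MET"         # columns 18-20: residue name
    " "           # column 21: unused
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and padding
    "  27.340"    # columns 31-38: x coordinate (angstroms)
    "  24.430"    # columns 39-46: y coordinate
    "   2.614"    # columns 47-54: z coordinate
)

print(parse_atom_line(example))
```

Lines that do not start with ATOM or HETATM (for example, the citation and comment records mentioned above) are skipped by returning `None`.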