CLASS 6. Introducing Annotathon.
What is Annotathon?
Annotathon Rule Book
pp. 12-32, 53-55 (in "The New Science of Metagenomics" book, Moodle)
Gross 2007 (Moodle)
WATCH VIDEO: Global Ocean Sampling expedition
The goal is to gather as much information about a piece of DNA sequence as possible:
- ORF finding
- Homolog hunting: BLAST
- Multiple sequence alignments
- Phylogenetic analysis
- Ontologies (molecular function and biological process)
- Conserved domains
- Taxonomic classification
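To make the first step of the list above concrete, here is a minimal sketch of ORF finding in Python. It is only an illustration, not the procedure prescribed by the Annotathon Rule Book. The function names (`reverse_complement`, `find_orfs`) and the `min_len` parameter are hypothetical. The sketch assumes ATG as the only start codon and TAA/TAG/TGA as stop codons, and it ignores partial ORFs that run off the end of a read, which are common in short metagenomic reads.

```python
# Minimal ORF-finding sketch (assumptions: ATG start only, standard stop
# codons, complete ORFs only; partial ORFs truncated by the read are ignored).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=90):
    """Scan all six reading frames and return complete ORFs.

    Each result is (strand, frame, start, end, orf_sequence). Coordinates are
    0-based, end-exclusive, and for the "-" strand they refer to the
    reverse-complemented sequence, not the original read.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG seen since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon
                    if end - start >= min_len:
                        orfs.append((strand, frame, start, end, s[start:end]))
                    start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence made up for illustration; real reads are far longer.
    toy = "CCATGAAATTTGGGTAACC"
    for orf in find_orfs(toy, min_len=9):
        print(orf)
```

The default `min_len` of 90 nucleotides is an arbitrary cutoff chosen for this sketch. Each ORF found this way would then be carried into the later steps (BLAST, alignment, and so on).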
Global Ocean Sampling Expedition
In Spring 2003, an oceanographic expedition aboard the Sorcerer II was led by J. Craig Venter. Ocean surface waters were sampled along the voyage.
The goal of the project was to explore the diversity of microbes in the ocean without culturing them first (99% of microorganisms cannot be cultured in the laboratory, an observation known as "the great plate count anomaly" [Staley and Konopka, 1985]). The results were published in 2007 in a special issue of PLoS Biology. In total, 6.25 Gbp of sequence data was generated from the 0.1- to 0.8-µm size fraction (this fraction represents mostly prokaryotic [Bacteria and Archaea] populations), and it remains the largest single data set of environmental sequences. The data consists of individual pieces (reads, on average 822 bp long) and is not curated. The data is hosted in its own database, CAMERA.
Some tidbits about the data set:
- very large (approximately twice the size of the human genome)
- data from different organisms is lumped together
- individual reads are mostly too short to contain whole genes
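As a rough sanity check on the numbers above, the short calculation below estimates how many reads the data set contains and how it compares to the human genome. The human genome size of about 3.1 Gbp is an assumption added here and is not taken from these notes. The variable names are illustrative only.

```python
# Back-of-the-envelope check of the GOS data set size.
TOTAL_BP = 6.25e9        # total sequence generated, from the notes (6.25 Gbp)
MEAN_READ_BP = 822       # average read length, from the notes
HUMAN_GENOME_BP = 3.1e9  # ASSUMPTION: approximate haploid human genome size

approx_reads = TOTAL_BP / MEAN_READ_BP          # estimated number of reads
ratio_to_human = TOTAL_BP / HUMAN_GENOME_BP     # size relative to human genome

print(f"Estimated number of reads: {approx_reads:,.0f}")
print(f"Size relative to the human genome: {ratio_to_human:.1f}x")
```

Under these assumptions the estimate comes to roughly 7.6 million reads, which is consistent with the "approximately twice the human genome" figure.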
Data from the Global Ocean Sampling Expedition (GOS) is an example of a metagenomic data set. "Meta" is Greek for "transcendent". In metagenomics, communities of microbes are studied through direct sampling of the environment, transcending the study of individual organisms and circumventing the unculturability of most organisms (see homework reading for more details).