CLASS 6. Introducing Annotathon.

What is Annotathon?

HOMEWORK READING:
Annotathon Rule Book

pp. 12-32, 53-55 (in "The New Science of Metagenomics" book, Moodle)

Gross 2007 (Moodle)

WATCH VIDEO: Global Ocean Sampling expedition

Annotathon is a training environment in which we will apply some of the tools we learn in class to real data in need of annotation (see below). Our team name is "MountA Mounties". Use the TEAM CODE provided in class to register (if you miss this class, e-mail me for the magic word).

The goal is to gather as much information as possible about a piece of DNA sequence. The annotation steps include:

  • ORF finding
  • Homolog hunting: BLAST
  • Multiple sequence alignments
  • Phylogenetic analysis
  • Ontologies (molecular function and biological process)
  • Conserved domains
  • Taxonomic Classification
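As a rough illustration of the first step in the list above (ORF finding), here is a minimal Python sketch. It is not the tool Annotathon itself uses. It assumes the standard genetic code, defines an ORF as an ATG start codon followed in-frame by a stop codon, and scans all six reading frames. The names find_orfs, reverse_complement and min_codons are hypothetical, chosen only for this example.

```python
# Minimal six-frame ORF scan (illustrative sketch, not Annotathon's own method).
# Assumptions: standard genetic code; an ORF runs from ATG to the first
# in-frame stop codon; the sequence contains only A, C, G, T.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return a list of (strand, frame, start, end, orf_sequence) tuples.

    start and end are 0-based offsets on the strand that was scanned;
    end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step through the strand one codon at a time in this frame.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # close it and keep scanning
    return orfs


toy_read = "CCATGAAATTTTAGGG"  # made-up sequence, not real GOS data
print(find_orfs(toy_read))
# [('+', 2, 2, 14, 'ATGAAATTTTAG')]
```

Note that this sketch only reports complete ORFs (start and stop both present). Short environmental reads often contain only part of a gene, so a real analysis would also need to consider ORFs that run off either end of the read.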

Global Ocean Sampling Expedition

In Spring 2003, J. Craig Venter led an oceanographic expedition aboard the Sorcerer II.
Fig. Sorcerer II. [Source]


Ocean surface waters were sampled along the voyage:

Fig. Sampling Sites. "A total of 41 different samples were taken from a wide variety of aquatic habitats collected over 8,000 km." (Figure 1 from Rusch et al. 2007. PMID: 17355176)

The goal of the project was to explore the diversity of microbes in the ocean without culturing them first (an estimated 99% of microorganisms cannot be cultured in the laboratory, an observation known as "the great plate count anomaly" [Staley and Konopka, 1985]). The results were published in 2007 in a special issue of PLoS Biology. In total, 6.25 Gbp of sequence data was generated from the 0.1- to 0.8-µm size fraction (this fraction represents mostly prokaryotic [Bacteria and Archaea] populations), and at the time of publication it was the largest single data set of environmental sequences. The data consists of individual pieces (reads, each on average 822 bp long) and is not curated. The data is hosted in its own database, CAMERA.

Some tidbits about the data set:
  1. very large (approximately twice the size of the human genome)
  2. data from different organisms is lumped together
  3. individual reads are mostly too short to contain whole genes
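The numbers behind these points can be checked with a few lines of Python. This is a back-of-the-envelope sketch only: the total (6.25 Gbp) and the mean read length (822 bp) come from the text above, while the human genome size (about 3.1 Gbp) and the typical prokaryotic gene length (about 1,000 bp) are rough assumptions added for illustration.

```python
# Back-of-the-envelope arithmetic for the GOS data set.
TOTAL_BP = 6.25e9          # from the text: 6.25 Gbp of sequence
MEAN_READ_BP = 822         # from the text: average read length

# Assumptions (not from the text), approximate values only:
HUMAN_GENOME_BP = 3.1e9    # haploid human genome, roughly 3.1 Gbp
TYPICAL_GENE_BP = 1000     # rough rule of thumb for a prokaryotic gene

approx_reads = TOTAL_BP / MEAN_READ_BP
genome_ratio = TOTAL_BP / HUMAN_GENOME_BP
read_to_gene = MEAN_READ_BP / TYPICAL_GENE_BP

print(f"Approximate number of reads: {approx_reads / 1e6:.1f} million")
print(f"Data set size relative to human genome: {genome_ratio:.1f}x")
print(f"Mean read length as a fraction of a typical gene: {read_to_gene:.2f}")
```

Under these assumptions the data set works out to several million reads, about twice the human genome in total length, with an average read somewhat shorter than a typical gene. Since a read rarely starts exactly at the beginning of a gene, most reads capture only a gene fragment.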

Data from the Global Ocean Sampling Expedition (GOS) is an example of a metagenomic data set. The Greek prefix "meta" is used here in the sense of "transcendent". In metagenomics, communities of microbes are studied through direct sampling of the environment, transcending the need to study individual organisms and circumventing the unculturability of most organisms (see homework reading for more details).