(20 minutes) Gene Identification Exercise A (Prokaryotic genomic DNA)
For the following sequences from a prokaryote (the archaeon Thermoplasma acidophilum), identify possible Open Reading Frames (ORFs) using ORF-finder at the NCBI: http://www.ncbi.nlm.nih.gov/gorf/gorf.html
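The idea behind an ORF finder can be sketched in a few lines of Python. The function below (`find_orfs`, a hypothetical name) scans all six reading frames for stretches that begin with ATG and end at the first in-frame stop codon. It is a minimal sketch under stated assumptions, not how the NCBI tool is implemented. It uses the standard genetic code and treats only ATG as a start codon, although prokaryotes also start genes at GTG and TTG. It also applies an arbitrary default minimum length of 100 codons.

```python
# Minimal six-frame ORF scan (ATG ... first in-frame stop), standard code.
# Assumption: only ATG is treated as a start codon; GTG/TTG starts are ignored.

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Yield (strand, frame, start, end) for each ATG-to-stop ORF.

    Coordinates are 0-based, half-open, relative to the strand being
    scanned, and include the stop codon; min_codons counts the stop too.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # i stops at len(s) - 3, so s[i:i + 3] is always a full codon.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF; inner ATGs are skipped
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        yield strand, frame, start, end
                    start = None
```

For example, `list(find_orfs("ATGAAATAG", min_codons=1))` returns `[("+", 0, 0, 9)]`, which is one three-codon ORF on the forward strand in frame 0.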
Go to the IMG server. In this exercise, we will compare portions of genomes from different bacterial species to check whether they all share the same order of the genes encoding the ATP synthase subunits. First, click Find Genes in the toolbar. In the Keyword window, type ATP synthase and select Thermotoga maritima in the organism list. The search results list all subunits of the complete ATP synthase (except the flagellum-specific ATP synthase, which is part of the bacterial flagellum assembly).
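Once you have the gene lists from IMG, checking whether two genomes keep the same gene order comes down to comparing the genes they share, in the order they appear. The Python sketch below assumes each genome is given as a list of unique gene names in chromosomal order. The function name and the gene orders in the example are hypothetical placeholders, not data from IMG. A fully reversed block is counted as the same order, because an operon can sit on either strand.

```python
def compare_gene_order(genome_a, genome_b):
    """Compare the relative order of the genes two genomes share.

    Each argument is a list of unique gene names in chromosomal order.
    Returns (shared genes in genome_a's order, True if genome_b has the
    same order, either directly or fully reversed as on the other strand).
    """
    in_a, in_b = set(genome_a), set(genome_b)
    shared_a = [g for g in genome_a if g in in_b]
    shared_b = [g for g in genome_b if g in in_a]
    same = shared_a == shared_b or shared_a == shared_b[::-1]
    return shared_a, same


# Hypothetical gene orders, for illustration only (not IMG results).
order_1 = ["atpB", "atpE", "atpF", "atpH", "atpA", "atpG", "atpD", "atpC"]
order_2 = order_1[::-1]  # same block, opposite strand
order_3 = ["atpB", "atpE", "atpF", "atpA", "atpH", "atpG", "atpD", "atpC"]

print(compare_gene_order(order_1, order_2)[1])  # True
print(compare_gene_order(order_1, order_3)[1])  # False
```

Genes present in only one genome are ignored, so a missing subunit does not by itself break the comparison. Whether that is the right choice depends on the question you are asking.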
4) Background:
Finding Open Reading Frames (ORFs) and coding sequences (CDSs) is a good example of where first-principles approaches fail:
In eukaryotes, the coding sequence is often interrupted by introns. Genes are transcribed into RNA, and the spliceosome then removes the introns from this RNA and religates the exon portions. In Arabidopsis, the splice-site consensus is as follows (from www.arabidopsis.org/info/splice_site.pdf):
(This table summarizes the sequences surrounding the intron splice sites in the plant Arabidopsis. For example, at 52.9% of the intron-exon boundaries (bottom part), the first base of the exon is a G, and at 40.5% the next nucleotide is a T.)
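One common way to turn a table like this into a score is a log-odds position weight matrix. The Python sketch below uses only the two frequencies quoted in the caption (G at 52.9%, then T at 40.5%). Everything else rests on labeled assumptions. First, the remaining probability at each position is split evenly among the other three bases. Second, positions are independent. Third, the background composition is a uniform 25% per base, which is a simplification. `pwm_column` and `log_odds_score` are hypothetical names.

```python
import math


def pwm_column(consensus_base: str, freq: float) -> dict:
    """Build one weight-matrix column: the consensus base gets `freq` and,
    as an assumption, the other three bases share the remainder equally."""
    rest = (1.0 - freq) / 3
    return {b: (freq if b == consensus_base else rest) for b in "ACGT"}


# First two exon bases after the intron-exon boundary, from the caption.
PWM = [pwm_column("G", 0.529), pwm_column("T", 0.405)]


def log_odds_score(site: str, pwm=PWM, background: float = 0.25) -> float:
    """Sum over positions of log2(P(base | splice site) / P(base | background)),
    treating positions as independent."""
    return sum(math.log2(col[base] / background)
               for col, base in zip(pwm, site.upper()))


print(round(log_odds_score("GT"), 2))  # 1.78
```

A positive score means the bases look more like the consensus than like background. "AA", for instance, scores below zero. Even the best-scoring pair, however, is only about 1.8 bits above chance.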
Given the many introns known in Arabidopsis, and the fact that many of the spliceosomal RNAs have been sequenced, one might expect to be able to recognize with high reliability which parts of a given sequence are coding. The following exercises will demonstrate that this is not the case.
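A quick way to see why splice signals alone are not enough is to count how many stretches of sequence merely satisfy the canonical GT...AG intron rule. The Python sketch below enumerates every such candidate within a length range. The function name and the default length bounds of 60 to 1000 bp are hypothetical choices, not values from the source. It then runs the function on a random sequence that contains no genes at all.

```python
import random


def candidate_introns(seq: str, min_len: int = 60, max_len: int = 1000):
    """Return every (start, end) pair, 0-based and half-open, such that
    seq[start:end] begins with GT, ends with AG and has a length between
    min_len and max_len (inclusive)."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptor_ends = [j + 2 for j in range(len(seq) - 1) if seq[j:j + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptor_ends
            if min_len <= a - d <= max_len]


# A random 2 kb "genomic" sequence: by construction it contains no real genes.
random.seed(1)
genomic = "".join(random.choice("ACGT") for _ in range(2000))
print(len(candidate_introns(genomic)))
```

In random sequence, GT and AG each occur about once every 16 bases, so the count is large even though none of these candidates is a real intron. Gene finders therefore have to combine splice-site signals with other evidence.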
This sequence is a fragment of genomic DNA from the plant Arabidopsis thaliana.
1) Use GENSCAN at:
http://genes.mit.edu/GENSCAN.html
or at
http://genome.dkfz-heidelberg.de/cgi-bin/GENSCAN/genscan.cgi
to predict the exons and introns in this piece of genomic DNA.
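To see how a set of predicted exons becomes a predicted peptide, here is a minimal Python sketch. It joins the exon segments and translates the result with the standard genetic code. It assumes a plus-strand gene and 0-based, half-open `(start, end)` exon coordinates listed in transcript order. Convert from whatever coordinate convention the predictor reports. `splice_and_translate` is a hypothetical name.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), amino acids in TCAG x TCAG x TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def splice_and_translate(genomic: str, exons) -> str:
    """Concatenate the exon segments of `genomic` and translate the CDS.

    Translation stops at the first stop codon; codons containing
    non-ACGT characters are translated as 'X'.
    """
    cds = "".join(genomic[s:e] for s, e in exons).upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

Consider a made-up gene. `splice_and_translate("ATGGCCGTAAGTTTTAGAAATAA", [(0, 6), (17, 23)])` returns `"MAK"`. Treating the whole sequence as a single exon, `[(0, 23)]`, returns `"MAVSFRN"` instead. One missed intron changes the predicted peptide from that point on.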
Alternatively, you can use either DOTLET or Blast2seq to see more clearly where the missed exons are. Pick one of the two programs (take your favorite :)) and compare the predicted peptide sequence with this one (the protein sequence translated from the cDNA).
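The principle behind a dot plot can be sketched in Python. A dot marks each pair of positions where a short word of the first sequence exactly matches a word of the second. This is a deliberately simplified, exact-match version that uses a hypothetical window of 3 residues. DOTLET and Blast2seq work differently: they score similarity with substitution matrices rather than requiring identical words. The peptides in the example are made up.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 3):
    """Return (i, j) pairs where seq_a[i:i+window] == seq_b[j:j+window]."""
    words_b = {}
    for j in range(len(seq_b) - window + 1):
        words_b.setdefault(seq_b[j:j + window], []).append(j)
    return [(i, j)
            for i in range(len(seq_a) - window + 1)
            for j in words_b.get(seq_a[i:i + window], [])]


def render(seq_a: str, seq_b: str, window: int = 3) -> str:
    """Text dot plot: one row per residue of seq_a, one column per residue of seq_b."""
    dots = set(dot_plot(seq_a, seq_b, window))
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        for i in range(len(seq_a)))


# Hypothetical peptides: the "predicted" one lacks the segment QRST.
cdna_protein = "MEKLVWQRSTAGHPDN"
predicted = "MEKLVWAGHPDN"
print(render(cdna_protein, predicted))
```

Rows follow the cDNA-derived protein and columns follow the predicted peptide. The diagonal of dots breaks where the predicted peptide is missing residues and then resumes on a parallel, offset diagonal. That offset is the visual signature of a missed exon.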