Background:
The difficulty of finding Open Reading Frames (ORFs) and coding sequences (CDS) is a nice example of how first-principles approaches can fail:
In higher eukaryotes the coding sequence is often interrupted by introns. Genes are transcribed into RNA; spliceosomes then remove the introns from the RNA and join the exons back together. In Arabidopsis the splice site consensus is as follows (from www.arabidopsis.org/info/splice_site.pdf):
(This table summarizes the sequences surrounding the intron splice sites in the plant Arabidopsis. For example, in 52.9% of the intron-exon boundaries (bottom part) the first base of the exon is a G, and in 40.5% the next nucleotide is a T.)
Because so many introns are known in Arabidopsis, and because many of the spliceosomal RNAs have been sequenced, one might expect that the coding parts of any given sequence could be recognized with high reliability. The following exercises will demonstrate that this is not the case.
This sequence is a fragment of genomic DNA from the plant Arabidopsis thaliana.
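To get a feeling for why the consensus alone is not enough, the sketch below lists every stretch of a sequence that merely starts with GT and ends with AG, the two dinucleotides that bound most spliceosomal introns. It is only an illustration and not how GENSCAN works; the function names and the `min_len` threshold are assumptions made for this example.

```python
import re

def candidate_splice_sites(seq):
    """Return 0-based positions of every GT (possible donor) and AG (possible acceptor)."""
    seq = seq.upper()
    # Zero-width lookahead so that overlapping occurrences are also reported.
    donors = [m.start() for m in re.finditer(r"(?=GT)", seq)]
    acceptors = [m.start() for m in re.finditer(r"(?=AG)", seq)]
    return donors, acceptors

def candidate_introns(seq, min_len=60):
    """Return (start, end) slices that begin with GT, end with AG and span >= min_len bases.

    min_len is an assumed, illustrative cutoff, not a value taken from the exercise.
    """
    donors, acceptors = candidate_splice_sites(seq)
    return [(d, a + 2) for d in donors for a in acceptors if a + 2 - d >= min_len]

if __name__ == "__main__":
    toy = "AAGTAAAGTT"  # toy sequence, not real Arabidopsis DNA
    print(candidate_introns(toy, min_len=4))
```

On a genomic fragment of a few thousand bases the number of such candidates is typically far larger than the number of real introns, which is why gene finders have to combine splice-site signals with other evidence.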
1) Use GENSCAN at:
http://genes.mit.edu/GENSCAN.html
or at
http://genome.dkfz-heidelberg.de/cgi-bin/GENSCAN/genscan.cgi
to predict exons and introns encoded on this piece of genomic DNA.
Alternatively, you can use either DOTLET or Blast2seq to better visualize where the missed exons are. Pick one of these two programs (take your favorite one :)) and compare the predicted peptide sequence with this one (protein sequence translated from cDNA).
1. Open an account at the Biology WorkBench. The Biology WorkBench is a web-based tool for biologists. It lets you search many popular protein and nucleic acid sequence databases. Database searching is integrated with access to a wide variety of analysis and modeling tools, all within a point-and-click interface that eliminates file format compatibility problems.
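The idea behind a dot plot can be sketched in a few lines. The example below marks every position where a short window of one peptide is identical to a window of the other; a missed exon then shows up as a stretch of the cDNA-derived protein with no diagonal. This is a simplification: DOTLET scores windows with a substitution matrix rather than requiring exact identity, and the function names and window size here are assumptions.

```python
def dot_plot(seq_a, seq_b, window=3):
    """Return (i, j) pairs where seq_a[i:i+window] equals seq_b[j:j+window]."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                hits.append((i, j))
    return hits

def render(seq_a, seq_b, window=3):
    """Draw the dot plot as text: one row per window of seq_a, '*' marks a match."""
    hits = set(dot_plot(seq_a, seq_b, window))
    rows = []
    for i in range(len(seq_a) - window + 1):
        rows.append("".join("*" if (i, j) in hits else "."
                            for j in range(len(seq_b) - window + 1)))
    return "\n".join(rows)

if __name__ == "__main__":
    predicted = "MKTAY"    # toy peptides, not the sequences from the exercise
    from_cdna = "MKTQAY"
    print(render(predicted, from_cdna, window=2))
```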
2. Import the sequence into the workbench
Bacteriorhodopsin is a membrane protein whose crystal structure is known. If you want to explore the structure, go to the PDB (here). Search for 1JGJ and take a quick look using the Java QuickPDB applet. Don't spend too much time here; we'll do more protein structures later in this course.
3. Perform some analyses with the bacteriorhodopsin sequence
The sequence here is a piece of genomic DNA from an archaeon that encodes part of an ATP synthase operon. Pretend that you want to amplify the fragment encoding the A-subunit (= catalytic subunit) with two primers, and that you subsequently want to clone this fragment into an expression vector. Which restriction sites could you incorporate at the ends of your primers? The idea is to digest the amplified product with the two restriction enzymes so that it can be cloned into the vector in a known orientation (i.e., the enzymes you select should NOT cut within the amplified fragment). The expression vector has the following unique cloning sites (in order; the first is closest to the promoter): BamHI, KpnI, HindIII, PstI and SmaI. Which of these could you use in your primers?
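Once you know the sequence of the fragment you want to amplify, the check for internal restriction sites can also be done with a short script. The sketch below uses the standard recognition sequences of the five enzymes; because all five sites are palindromic, scanning one strand is sufficient. The function name and the toy fragment are assumptions for illustration only.

```python
# Recognition sequences (5'->3') of the vector's unique cloning sites,
# listed in the order given in the exercise (first is closest to the promoter).
SITES = {
    "BamHI": "GGATCC",
    "KpnI": "GGTACC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "SmaI": "CCCGGG",
}

def usable_enzymes(fragment):
    """Return the enzymes whose recognition site does NOT occur in the fragment."""
    fragment = fragment.upper()
    # Every site above is its own reverse complement, so one strand is enough.
    return [name for name, site in SITES.items() if site not in fragment]

if __name__ == "__main__":
    toy_fragment = "ATGGGATCCAAAGCTTTAA"  # toy DNA, not the ATP synthase sequence
    print(usable_enzymes(toy_fragment))
```

Paste the A-subunit coding region in place of the toy fragment; any enzyme the function returns does not cut inside the fragment and is therefore a candidate for your primers.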
To answer the question you first need to find the region of the DNA that encodes the A-subunit. One way is to use NCBI's ORF Finder. You could also use programs from the WorkBench.
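What an ORF search does can be sketched as follows: scan all six reading frames for stretches that run from an ATG to the next in-frame stop codon and report the longest ones. This is a minimal illustration, not the NCBI implementation; the function names and the `min_codons` cutoff are assumptions, and only ATG is treated as a start codon although prokaryotic genes sometimes use alternative starts such as GTG or TTG.

```python
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def orfs_in_strand(seq, min_codons=100):
    """Yield (start, end) slices of ATG...stop ORFs in the three frames of one strand."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG after the previous stop
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3         # end includes the stop codon
                start = None

def find_orfs(seq, min_codons=100):
    """Return (strand, start, end) for all ORFs, longest first.

    Coordinates on the '-' strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    found = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    found += [("-", s, e) for s, e in orfs_in_strand(reverse_complement(seq), min_codons)]
    return sorted(found, key=lambda orf: orf[2] - orf[1], reverse=True)

if __name__ == "__main__":
    toy = "CCATGAAATTTTAGCC"  # toy DNA, not the sequence from the exercise
    print(find_orfs(toy, min_codons=3))
```

The longest ORF is not automatically the A-subunit gene; comparing the translated ORFs against known ATP synthase subunits is still needed to decide which one it is.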