Assignment 9

Your name:
Your email address:

Background:

Problems in finding Open Reading Frames (ORFs) and coding sequences (CDS) provide a good example of how first-principles approaches can fail:

In higher eukaryotes the coding sequence is often interrupted by introns. Genes are transcribed into RNA; spliceosomes then remove the introns from the RNA and join the exons back together. In Arabidopsis the splice site consensus is as follows (from www.arabidopsis.org/info/splice_site.pdf):

(This table summarizes the sequences surrounding the intron splice sites in the plant Arabidopsis. For example, in 52.9% of the intron-exon boundaries (bottom part) the first base of the exon is a G, and in 40.5% the next nucleotide is a T.)

Given the many introns known in Arabidopsis, and the fact that many of the spliceosomal RNAs have been sequenced, one might expect that the coding parts of any given sequence could be recognized with high reliability. The following exercises will demonstrate that this is not the case.
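To see why the consensus alone is a weak predictor, it can help to count how often the minimal signal occurs by chance. The sketch below assumes the canonical rule that introns begin with GT and end with AG, and lists every candidate pair in a DNA string. The function names and the `min_len` cutoff are illustrative assumptions, not part of GENSCAN or of this assignment.

```python
import re


def candidate_splice_sites(seq):
    """Return start positions of all GT (donor) and AG (acceptor) dinucleotides."""
    seq = seq.upper()
    # Lookahead patterns so that overlapping occurrences are all reported.
    donors = [m.start() for m in re.finditer(r"(?=GT)", seq)]
    acceptors = [m.start() for m in re.finditer(r"(?=AG)", seq)]
    return donors, acceptors


def candidate_introns(seq, min_len=60):
    """Return (start, end) slices that begin with GT and end with AG.

    min_len is an assumed lower bound on intron length, used only to
    discard implausibly short candidates.
    """
    donors, acceptors = candidate_splice_sites(seq)
    return [
        (d, a + 2)
        for d in donors
        for a in acceptors
        if a + 2 - d >= min_len
    ]


if __name__ == "__main__":
    toy = "AAGTAAAGTT"  # toy sequence, not the assignment sequence
    print(candidate_splice_sites(toy))
    print(candidate_introns(toy, min_len=4))
```

On a real genomic fragment of a few kilobases the number of GT...AG pairs is typically very large compared with the handful of true introns, which is why gene finders combine splice signals with other evidence such as codon usage.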

Part A: ORF predictions using GENSCAN (20 minutes)

This sequence is a fragment of genomic DNA from the plant Arabidopsis thaliana.

1) Use GENSCAN at:

http://genes.mit.edu/GENSCAN.html

or at

http://genome.dkfz-heidelberg.de/cgi-bin/GENSCAN/genscan.cgi

to predict exons and introns encoded on this piece of genomic DNA.

Paste sequence into the sequence window
Select Arabidopsis or plant as organism
Run GENSCAN
Inspect the graphic output (.pdf file)
Copy the predicted peptide and use it in a blastp search against Swiss-Prot.
Inspect the alignments between target and query sequence. Can you locate the exons that were missed by GENSCAN?

Alternatively, you can use either DOTLET or Blast2seq to better visualize where the missed exons are. Pick whichever of the two programs you prefer and compare the predicted peptide sequence with this one (protein sequence translated from cDNA).
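The idea behind such a comparison can be sketched in a few lines. The example below is much simpler than what DOTLET or Blast2seq do (it uses exact word matches rather than a scoring matrix), and the function names and the `word` parameter are assumptions made for illustration. Stretches of the cDNA-derived protein that no word of the predicted peptide matches are candidates for missed exons.

```python
def dot_matrix(seq_a, seq_b, word=3):
    """Return (i, j) pairs where seq_a[i:i+word] == seq_b[j:j+word]."""
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    hits = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            hits.append((i, j))
    return hits


def uncovered_positions(hits, length_b, word=3):
    """Return positions of seq_b that are not covered by any matching word."""
    covered = set()
    for _, j in hits:
        covered.update(range(j, j + word))
    return [j for j in range(length_b) if j not in covered]


if __name__ == "__main__":
    # Toy peptides: the reference carries an extra "WWWW" stretch that
    # stands in for an exon missing from the prediction.
    predicted = "MKTAYIAKQR"
    reference = "MKTAYWWWWIAKQR"
    hits = dot_matrix(predicted, reference)
    print(hits)
    print(uncovered_positions(hits, len(reference)))
```

Plotting the hits as dots (predicted peptide on one axis, cDNA-derived protein on the other) gives a diagonal with breaks; each break along the reference axis corresponds to a stretch the prediction lacks.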

Part B: prediction of transmembrane sequences in a protein (15 minutes)

1. Open an account at the Biology WorkBench. The Biology WorkBench is a web-based tool for biologists. The WorkBench allows biologists to search many popular protein and nucleic acid sequence databases. Database searching is integrated with access to a wide variety of analysis and modeling tools, all within a point and click interface that eliminates file format compatibility problems.

2. Import the sequence into the workbench

Log into the workbench
Select PROTEIN tools. (You can toggle between three styles of buttons and menus by clicking on the upper Biology WorkBench banner.)
Select the program that lets you add a protein sequence (you need to click on Run to start the program)
Upload the bacteriorhodopsin sequence (gi: 15826249) into the appropriate form field (copy and paste the entire file), and add a label name. Click Update, and then Save.

Bacteriorhodopsin is a membrane protein whose crystal structure is known. If you want to explore the structure, go to the PDB (here). Search for 1JGJ and take a quick look using the Java QuickPDB applet. Don't spend too much time here; we'll do more with protein structures later in this course.

3. Perform some analyses with the bacteriorhodopsin sequence

In the Biology WorkBench, select the bacteriorhodopsin molecule and run the programs GREASE and TMHMM on this sequence (both programs predict the membrane-spanning helices of a sequence). Compare the results with the annotation in the GenBank-formatted file. How well does the prediction of membrane-spanning helices work?
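For intuition about what a hydropathy-based prediction does, here is a minimal sliding-window sketch using the Kyte-Doolittle scale. It assumes that a profile of this kind is what GREASE plots; the window of 19 residues and the threshold of 1.6 are commonly quoted values and are treated here as assumptions, as are the function names. TMHMM uses a hidden Markov model and is not reproduced here.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every window of the given length."""
    # Unknown symbols (e.g. X) are scored as 0.0; this is an assumption.
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Return (start, end) slices covered by consecutive windows >= threshold."""
    profile = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, value in enumerate(profile):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            # The last qualifying window started at i - 1.
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(seq)))
    return segments


if __name__ == "__main__":
    toy = "D" * 10 + "L" * 25 + "D" * 10  # toy sequence, not bacteriorhodopsin
    print(hydrophobic_segments(toy))
```

Because each reported segment is the union of whole windows, its boundaries are blurred by up to a window length, which is one reason predicted helix ends rarely line up exactly with the annotated ones.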

Part C: Restriction site analyses (only if time permits)

The sequence here is a piece of genomic DNA from an archaeon that encodes part of an ATP synthase operon. Pretend that you want to amplify the fragment encoding the A-subunit (= catalytic subunit) with two primers, and that you subsequently want to clone this fragment into an expression vector. Which restriction sites could you incorporate at the ends of your primers? The idea is to digest the amplified product with the two restriction enzymes, and thus be able to clone the amplification product into the vector in a known orientation (i.e., the enzymes you select should NOT cut the amplified fragment).
The expression vector has the following unique cloning sites (in order; the first is closest to the promoter): BamHI, KpnI, HindIII, PstI and SmaI. Which of these could you utilize in your primers?

To answer the question you first need to find the region of the DNA that encodes the A-subunit. One way is to use the NCBI's ORF finder. You could also use programs from the workbench.
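What an ORF search does can be sketched as follows. This is a simplified illustration, not the algorithm used by NCBI's ORF finder: it only accepts ATG as a start codon (archaea also use alternative starts such as GTG and TTG), it ignores ORFs that run off the end of the sequence, and the function names and the `min_codons` default are assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons=100):
    """Return (start, end) slices of ATG...stop ORFs in the three frames."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                # Count codons before the stop; keep only long ORFs.
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found


def find_orfs(seq, min_codons=100):
    """Return (strand, start, end) on forward-strand coordinates."""
    n = len(seq)
    result = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    for s, e in orfs_in_strand(reverse_complement(seq), min_codons):
        result.append(("-", n - e, n - s))
    return result


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGCC"  # toy sequence, not the assignment sequence
    print(find_orfs(toy, min_codons=3))
```

The returned coordinates are 0-based slices, so add 1 to the start when comparing them with the 1-based positions reported by web tools.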

Load the sequence into ORF finder (here) and into the workbench. Determine from where to where the A-subunit is encoded and take a note.
Perform a restriction site analysis (Nucleic Acid Tools, TACG program)
Enter the appropriate parameters in the form (think twice), and run the program
Try to find the pertinent information in the output
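The check behind the steps above can also be sketched directly: an enzyme is usable in a primer only if its recognition site does not occur inside the fragment to be amplified. The recognition sequences below are the standard ones for the five vector sites; all five are palindromic, so searching one strand is sufficient. The function names and the toy fragment are assumptions for illustration.

```python
# Recognition sequences of the vector's unique cloning sites, in vector order.
SITES = {
    "BamHI": "GGATCC",
    "KpnI": "GGTACC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "SmaI": "CCCGGG",
}


def site_positions(fragment, site):
    """Return every start position of a recognition site in the fragment."""
    fragment = fragment.upper()
    positions = []
    pos = fragment.find(site)
    while pos != -1:
        positions.append(pos)
        pos = fragment.find(site, pos + 1)
    return positions


def usable_enzymes(fragment):
    """Return the enzymes whose sites do not occur in the fragment."""
    return [
        name for name, site in SITES.items()
        if not site_positions(fragment, site)
    ]


if __name__ == "__main__":
    # Toy fragment containing a BamHI and a PstI site (not the real sequence).
    toy = "ATGGGATCCAAACTGCAGTAA"
    print(usable_enzymes(toy))
```

Apply the check only to the region you determined for the A-subunit, not to the whole genomic sequence. For directional cloning, pick two different non-cutting enzymes and keep their order in the vector in mind; note also that SmaI leaves blunt ends.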
