Your name: Your email address:
1. (20 minutes) Go to the ENTREZ genome section. Select the genome from Aeropyrum pernix. (Click on genome under the appropriate domain listed in the bar on the right; in the table, select the link to the right of the species name. Selecting the species name itself will bring you back to the taxonomy browser.)
Explore the different genome views:
2. (20 minutes) Select a microbial genome and a question to address using TAX PLOT (see below for inspiration). (In this plot each circle represents a single protein from the query genome, plotted by its BLAST scores to the highest scoring protein from each of the selected organisms. THINK ABOUT THIS FOR A SECOND before you start clicking. What does it mean if an ORF is plotted close to one of the axes? Symmetrical hits are shown as diamonds. Click on the protein(s) of interest or enter a query string to see the homologs in the two chosen organisms.) Select two reference genomes appropriate for your question (see below for examples). You can change the zoom by clicking on the graphic. Select a different function, then click compare. A) Your question: B) Your genome: C) Your two reference genomes: D) Which candidate genes did you find?:
For example:
What do the two coordinates represent? What are the individual dots? If substitutions were fixed in the different genes in a clock-like fashion, and if there were no Horizontal Gene Transfer, where would all the ORFs end up?
3. (20 minutes) We already used the geneplot program at the NCBI during an earlier class. This program is accessible from each genome listed at http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi (click on the NCnnnnn link), or directly at http://www.ncbi.nlm.nih.gov/sutils/geneplot.cgi?tax1=224326&tax2=290434 .
The latest incarnation of this program only plots the best reciprocal hits in the two genomes. This is useful if one wants to compare two closely related genomes and is only interested in genes that did not undergo gene duplication. However, if one is interested in duplications of parts of genomes, or if one is hunting for repetitive elements, the restriction to mutual top-scoring hits is not advisable. The EMU server < http://emu.imb.uq.edu.au/ >, at present at the University of Queensland in Brisbane, Australia, maintained by Robert Beiko and Mark Ragan and initiated by Robert Charlebois, provides many interesting tools for comparative genomics. Among others, it allows you to calculate geneplots in which every BLAST hit is plotted, and the quality of each BLAST hit is given on a scale from 0 (bad) to 1 (identical sequences). You can use this number to encode the intensity of the point plotted (see here for an example comparing two Pyrococcus genomes). Sadly, we don't have a plotting program available on the iMacs that could easily handle the output generated by the EMU gene plot program < http://emu.imb.uq.edu.au/bioinf10.php >. This program produces a long list in which the 1st number gives the number of the ORF in genome 1 (query), the 2nd number gives the ORF number of the match, and the 3rd number gives a measure of the quality of the match (for more info read the "more information ...").
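To make the difference between the two kinds of plot concrete, here is a minimal Python sketch, not the NCBI or EMU implementation. It assumes hits are given as (query ORF, matching ORF, quality) triples like the EMU list described above; the function names and the example numbers are made up for illustration.

```python
def best_hits(hits):
    """Map each query ORF to its single highest-scoring matching ORF."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up example: ORF 2 of genome A matches two copies (ORFs 5 and 9) in genome B.
a_vs_b = [(1, 1, 0.95), (2, 5, 0.90), (2, 9, 0.85), (3, 7, 0.40)]
b_vs_a = [(1, 1, 0.95), (5, 2, 0.90), (9, 2, 0.85), (7, 3, 0.40)]

rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
all_pairs = sorted((query, subject) for query, subject, _ in a_vs_b)

print("reciprocal best hits:", rbh)
print("all hits:            ", all_pairs)
```

In this toy data the second copy, pair (2, 9), appears only in the list of all hits. That is the kind of point a plot restricted to reciprocal best hits leaves out.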
Select two similar genomes (Borrelia garinii and B. burgdorferi provide an excellent example).
First, do the comparison using the NCBI gene plot (leave the window open, we will use it later to help identify genes), then repeat the analysis on the EMU site. The output consists of a long list of numbers. (How exactly this looks depends on your browser. IE5 looks best; Safari and Firefox are thrown off by some of the HTML code. If you use Firefox, copy the whole list and paste it into MS Word, replace "," with "^t" and "<br>" with "^p", delete all spaces (use EDIT - Replace all), and save the file as a text file.)
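If you prefer a script to the Word search-and-replace, the same clean-up can be sketched in Python. This is only a sketch under assumptions: it supposes the copied text consists of records separated by "<br>", with three comma-separated numbers per record and integer ORF numbers, as the replacement steps above imply. The sample string and function names are invented.

```python
def parse_emu_output(raw):
    """Turn '1, 1, 1.0<br>2, 5, 0.9<br>' into a list of (orf1, orf2, quality)."""
    rows = []
    # Delete spaces and stray line breaks, then split into records on "<br>".
    cleaned = raw.replace(" ", "").replace("\n", "")
    for record in cleaned.split("<br>"):
        fields = record.split(",")
        if len(fields) < 3:
            continue  # skip empty or incomplete records
        rows.append((int(fields[0]), int(fields[1]), float(fields[2])))
    return rows


def to_tab_delimited(rows):
    """One hit per line, columns separated by tabs (ready for Excel import)."""
    return "\n".join(f"{orf1}\t{orf2}\t{quality}" for orf1, orf2, quality in rows)


# Invented sample in the assumed format.
sample = "1, 1, 1.0<br>2, 5, 0.9<br>2, 9, 0.35<br>"
rows = parse_emu_output(sample)
print(to_tab_delimited(rows))
```

Writing the returned string to a text file gives the same tab-delimited file that the Word procedure produces.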
Open MS Excel and create a new, empty workbook. Go to DATA - IMPORT EXTERNAL DATA - IMPORT TEXT FILE and select the file you just modified in Word.
Once you have the data in Excel, highlight the first three columns and SORT them based on the entries in the 3rd column (in the DATA menu select SORT ...) in descending order (i.e. the best hits, with values closest to 1, are on top). Select the top rows of the first two columns and generate a scatter plot. (In the case of the two Borrelia genomes, plotting everything between 1 and 0.3 in the third column works well.)
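The sort-and-cutoff step can also be sketched in Python, which makes it easy to try several cutoffs. This is a sketch with invented example data; the function names are hypothetical, and the default cutoff of 0.3 is simply the value suggested above for the Borrelia genomes.

```python
def select_hits(rows, cutoff=0.3):
    """Sort hits so the best quality comes first, then keep those >= cutoff."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    return [row for row in ranked if row[2] >= cutoff]


def scatter_coordinates(rows):
    """Split hits into x values (ORF in genome 1) and y values (ORF in genome 2)."""
    xs = [orf1 for orf1, _, _ in rows]
    ys = [orf2 for _, orf2, _ in rows]
    return xs, ys


# Invented example hits: (ORF in genome 1, ORF in genome 2, quality).
rows = [(3, 7, 0.2), (1, 1, 1.0), (2, 9, 0.35), (2, 5, 0.9)]

kept = select_hits(rows, cutoff=0.3)
xs, ys = scatter_coordinates(kept)
print(kept)
print(xs, ys)
```

The two coordinate lists can be handed to any plotting tool. Rerunning with a different cutoff shows how many weak hits each threshold adds to the plot.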
Compare your Excel-generated plot with the one from NCBI's geneplot.
What do the genes that are repeated in Borrelia burgdorferi encode?
How many repeat units did you detect?
What cutoff was most effective?
If you have time, try to find out (using the NCBI webpage for the Borrelia genomes) where these genes are located in the Borrelia genomes.
Other things to try:
If you compare a genome against itself (for example because you want to find repetitive elements), EMU omits the match with self (i.e. you don't see the central diagonal).