CLASS 20. Genome-wide Online Analyses.

INSTRUCTIONS:

For each exercise, provide the search query you used and keep your answers brief. Email me the answers by Sunday 11:59 PM AST at the latest.

Use "CLASS 20 EXERCISE" as the message subject, and type your answers directly into the email body (i.e., no document attachments, please). Make sure that the first line of your message is your NAME.

  1. Gene Synteny. In this exercise, we will focus on a service developed by the Joint Genome Institute (JGI). The Integrated Microbial Genomes (IMG) server allows you to search, select, and compare portions of fully sequenced or partially completed genome sequences.

    Go to the IMG server. We will compare portions of genomes from different bacterial species to verify if they all have the same order for the genes encoding the ATP synthase subunits.

    Click on the Find Genes toolbar. In the Keyword window, type in "ATP synthase" and select Thermotoga maritima in the organism list. The resulting search displays all subunits that are part of the complete ATP synthase (except the Flagellum-specific ATP synthase which is part of the bacterial flagellum assembly).

    In the Gene object ID column, click on the number corresponding to the beta (b) chain. This will lead you to a page containing information about that particular ORF.

    In the "Evidence for function Prediction" box, you can see a graphical representation of the genome where the ATP synthase beta chain is located (red ORF). You can put your mouse cursor over the ORF to display the identity of the gene. (You need to wait until the page is completely loaded!) Look at the ORFs located around the beta chain gene. Where do you think the operon containing the ATP synthase begins and ends?

    Now, click on "Ortholog Neighborhood Viewer" (since we haven't selected any other organism, the next page will display a default list of different species). Is the subunit order of the ATP synthase conserved in the other bacterial species? List a few of the differences you can find. [If you do not know what species you are looking at, check them out at NCBI's Taxonomy Database]

  2. Genome Dot Plots. The NCBI provides a facility (called GenePlot) for pairwise genome comparison. This is similar to DotLet, but the units of comparison are complete Open Reading Frames. A circle is placed in the plot if gene "x" from genome A has gene "y" as its top-scoring BLAST hit in genome B, AND gene "x" is in turn the top-scoring BLAST hit of gene "y" in genome A (symmetrical best hits, sometimes called reciprocal best hits).

    First, compare the two Leptospira interrogans serovars from the drop-down lists (serovar Copenhageni and serovar Lai) and create the Genome Plot by pressing the 'Compare Selected Pair' button. What can you conclude about the differences between the genomes by looking at the window on the left?

    Select two closely related species and create the Genome Plot (if the two species you chose do not give any interesting results, retry with other species). Which species did you compare? What interesting feature did you find when comparing those two genomes? Try to use the zoom-in feature in the right window to figure out what an interesting sequence encodes.
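
The symmetrical-best-hit criterion described in exercise 2 can be sketched in a few lines of Python. This is only an illustration of the idea, not how GenePlot itself is implemented. The function names, the (query, subject, bitscore) tuple layout, and the toy gene names and scores are all assumptions made up for this example; real input would come from two all-against-all BLAST runs (genome A against B, and B against A).

```python
def best_hits(hits):
    """Map each query gene to its single top-scoring subject gene.

    hits: iterable of (query, subject, bitscore) tuples (assumed layout).
    On a tied score, the hit seen first is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (x, y) pairs where y is the best hit of x in genome B
    AND x is the best hit of y in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((x, y) for x, y in best_ab.items() if best_ba.get(y) == x)


# Made-up toy data: genes a1..a3 in genome A, b1..b3 in genome B.
a_vs_b = [
    ("a1", "b1", 250.0),
    ("a1", "b2", 90.0),
    ("a2", "b2", 180.0),
    ("a3", "b2", 120.0),  # a3's best hit is b2, but b2 prefers a2
]
b_vs_a = [
    ("b1", "a1", 245.0),
    ("b2", "a2", 175.0),
    ("b2", "a3", 110.0),
    ("b3", "a3", 60.0),
]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data, gene a3 gets no circle: its best hit is b2, but the best hit of b2 is a2, so the pair is not symmetrical. Each pair that survives would correspond to one circle in the plot, positioned by the coordinates of the two genes in their respective genomes.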

  3. NCBI's Genome Resources Walk. Go to the ENTREZ genome section. Select the genome of Aeropyrum pernix. (Click on the major organism group, in this case Archaea, and then click on the accession number NC_000854. Selecting the species name itself will bring you back to the taxonomy browser.)

    Explore the different genome views:

    A) Select Protein coding genes in the feature table, scroll down to an entry that is not labeled as a hypothetical protein, and explore the different links in the protein's row.

    B) Select structural RNAs in the feature table. How many 16S rRNA and how many 5S rRNA coding genes are described for this genome?

    C) Click somewhere on the circular map of the genome. In the diagram to the left of the circle (which represents the zoomed-in view), click on one of the ORFs (what do the colors stand for?) and go to the gene's database record. In the Blink (= BLAST link) link, what do the different colors represent in the symbolic alignment on the left-hand side of the table? (If you picked something that doesn't have any matches, go back and select an ORF that is colored.) In the Blink results, what are the scores linked to? [Hint: some info is available at Blink Help.]

    D) Explore the "Sequence Viewer presentation" of the selected genome section.

    E) Select TaxMap. The window that opens has an interactive graphic that displays all ORFs as dots, colored according to the domain to which the highest-scoring BLAST hit belongs. (What do yellow, pink and blue represent?) Click on one of the pink dots. In the table that will be displayed below the graphic, click on the number to the left of the pink letter E. Does the top hit represent a significant hit? (Hint: click on the score.)

    You can change the bitscore cutoff by clicking inside the distribution curve. (DO NOT CLICK REPEATEDLY. It takes some time for the page to refresh.) Click once, approximately in the middle of the graphic. How many pink dots are left?
    Click on the right-hand number in the "Best" column, in the eubacteria row. What types of proteins do you find listed?