!!!! Wednesday's class will meet in the Kresge library (TLS, 2nd floor) !!!!
A few comments on the exercises from class 3 are here.

Today's topic is the use of genome data.

In Exercises 1 and 2 we will first use a query sequence (an intein; for more info on inteins go here), PSI-BLAST, and the so-called non-redundant database to calculate a position-specific scoring matrix (PSSM) (= Exercise 1).

We will then use this matrix to search different genomes for sequences that may be inteins (= Exercise 2). Finally, we will use Blink and BLAST searches with the sequences that the PSSM retrieved to verify whether each target sequence indeed represents an intein.

In a third exercise we will screen a bacterial genome for genes that are candidates for interdomain or interphylum horizontal gene transfer.
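
To make the idea of a PSSM more concrete, here is a minimal Python sketch of how a position-specific scoring matrix can be derived from aligned sequences and then slid along a target sequence. It is not what PSI-BLAST actually does internally: PSI-BLAST weights the sequences, uses realistic background frequencies and substitution-matrix-based pseudocounts. The sketch assumes a uniform background, a flat pseudocount, and a made-up toy alignment (`TOY_ALIGNMENT`); all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Made-up toy alignment, for illustration only (not real intein data).
TOY_ALIGNMENT = ["CVDGDT", "CLSGDT", "CIDGET", "CVAGDT"]


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*aligned_seqs):
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_window(pssm, window):
    """Sum the column scores for an ungapped window (unknown residues score 0)."""
    return sum(col.get(res, 0.0) for col, res in zip(pssm, window))


def best_window(pssm, sequence):
    """Return (start index, score) of the best-scoring ungapped window."""
    width = len(pssm)
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: score_window(pssm, sequence[i:i + width]))
    return best, score_window(pssm, sequence[best:best + width])


if __name__ == "__main__":
    pssm = build_pssm(TOY_ALIGNMENT)
    start, score = best_window(pssm, "MKKCVDGDTLL")
    print(f"best window starts at index {start}, score {score:.2f}")
```

Conserved columns (such as the leading C in the toy alignment) get high scores for the conserved residue and slightly negative scores for everything else, which is why a profile can find distant relatives that a single query sequence would miss.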

Short lecture and demonstrations:

The NCBI provides several different interfaces to browse through and analyse genomes. For example, in the Borrelia genome, if you click on “complete genome”, you get a graphical representation; further clicks move you down through several levels to the nucleotide and encoded amino acid sequences.  If you click on an ORF, you retrieve the sequence followed by the output of a BLAST search of this sequence against the nr database.  The graphic representation shows you which part of the ORF generated the match. If you click on the number that represents the score, you open a new window with the alignment (again with nice graphics included).  If you click on the number that identifies the matching sequence, a window with that sequence in GenBank (gb) format opens up.  If the ORF is part of a cluster of putatively orthologous genes, you can get information on the cluster by clicking on the COG number.

From the Borrelia genome page, you can go to tables listing all ORFs, or to the taxtable, which provides an interesting nearest-neighbor coloring of the genome.  It is noteworthy that many of the pink dots are endonucleases.  Also, there are many transporters among the odd-colored genes.

In an attempt to capture some phylogenetic information in BLAST comparisons, Olendzenski et al. pioneered an approach that uses multiple reference genomes to screen for putatively horizontally transferred genes (see Fig. 4). A similar approach, but using only two instead of three reference genomes, is implemented in the TAX PLOT program at the NCBI's genome page (see the examples given in question 3 below!).

You pick one genome to analyze and two reference genomes. The program returns a plot in which every ORF of the selected genome is a point in a coordinate system, where the two coordinates are the ORF's highest alignment scores against the two reference genomes.

In this example the selected genome was from Borrelia burgdorferi. The list of selected genes is below:

Definition                                                                                Blast2Seq  GenBank   Blink
V-type ATPase, subunit B (atpB) [Borrelia burgdorferi]                                               15594439  =>
aa V-TYPE ATP SYNTHASE BETA CHAIN (V-TYPE A                                               722        12585403  =>
aa ATP synthase F1 alpha subunit [Aquifex a                                               261        15606090  =>

V-type ATPase, subunit A (atpA) [Borrelia burgdorferi]                                               15594440  =>
aa H+-transporting ATP synthase, subunit A                                                1051       11498766  =>
aa ATP synthase F1 beta subunit [Aquifex ae                                               221        15607015  =>

prolyl-tRNA synthetase (proS) [Borrelia burgdorferi]                                                 15594747  =>
aa prolyl-tRNA synthetase (proS) [Archaeogl                                               655        11499201  =>
aa proline-tRNA synthetase [Aquifex aeolicu                                               167        15605873  =>

phenylalanyl-tRNA synthetase, beta subunit (pheT) [Borrelia burgdorferi]                             15594859  =>
aa phenylalanyl-tRNA synthetase, subunit be                                               709        11499019  =>
aa phenylalanyl-tRNA synthetase beta subuni                                               153        15606806  =>

chemotaxis histidine kinase (cheA-1) [Borrelia burgdorferi]                                          15594912  =>
aa chemotaxis histidine kinase (cheA) [Arch                                               798        11498645  =>
aa histidine kinase sensor protein [Aquifex                                               86         15605839  =>

methionyl-tRNA synthetase (metG) [Borrelia burgdorferi]                                              15594932  =>
aa methionyl-tRNA synthetase (metS) [Archae                                               873        11499048  =>
aa methionyl-tRNA synthetase alpha subunit                                                436        15606482  =>

spermidine/putrescine ABC transporter, ATP-binding protein (potA) [Borrelia burgdorferi]             15594987  =>
aa spermidine/putrescine ABC transporter, A                                               678        11499200  =>
aa ABC transporter [Aquifex aeolicus]                                                     325        15607081  =>

lysyl-tRNA synthetase [Borrelia burgdorferi]                                                         15595004  =>
aa lysyl-tRNA synthetase (lysS) [Archaeoglo                                               642        11498815  =>
aa cysteinyl-tRNA synthetase [Aquifex aeoli                                               92         15606347  =>
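
The kind of screening that such a plot supports can be sketched in a few lines of Python. The example below takes the two scores listed for each gene in the table above and keeps the genes that lie far from the diagonal, i.e. whose best score against the first reference is much higher than against the second. This is only an illustration, not how TAX PLOT itself selects genes: the names `OrfScores` and `off_diagonal` are hypothetical, the cutoff `min_ratio=3.0` is an arbitrary assumption, and the assignment of the first hit in each group to the archaeal reference is inferred from the (partly truncated) hit names.

```python
from typing import NamedTuple


class OrfScores(NamedTuple):
    gene: str
    score_ref1: int  # first hit listed per gene (assumed archaeal reference)
    score_ref2: int  # second hit listed per gene (Aquifex aeolicus)


# Scores copied from the table above.
ORFS = [
    OrfScores("V-type ATPase, subunit B (atpB)", 722, 261),
    OrfScores("V-type ATPase, subunit A (atpA)", 1051, 221),
    OrfScores("prolyl-tRNA synthetase (proS)", 655, 167),
    OrfScores("phenylalanyl-tRNA synthetase, beta subunit (pheT)", 709, 153),
    OrfScores("chemotaxis histidine kinase (cheA-1)", 798, 86),
    OrfScores("methionyl-tRNA synthetase (metG)", 873, 436),
    OrfScores("spermidine/putrescine ABC transporter (potA)", 678, 325),
    OrfScores("lysyl-tRNA synthetase", 642, 92),
]


def ratio(orf):
    """Score ratio ref1/ref2; the max() guards against a zero score."""
    return orf.score_ref1 / max(orf.score_ref2, 1)


def off_diagonal(orfs, min_ratio=3.0):
    """Return ORFs scoring at least min_ratio times higher against ref1
    than against ref2, sorted from most to least skewed."""
    skewed = [o for o in orfs if o.score_ref1 >= min_ratio * o.score_ref2]
    return sorted(skewed, key=ratio, reverse=True)


if __name__ == "__main__":
    for orf in off_diagonal(ORFS):
        print(f"{orf.gene:52s} {orf.score_ref1:5d} {orf.score_ref2:5d} "
              f"ratio={ratio(orf):.1f}")
```

A high ratio only flags a candidate; each gene still needs a closer look (for example with Blink) before it is interpreted as a horizontal transfer.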

Assignment #3:

[Links from this page open in a separate window]

  1. Use Internet Explorer for this exercise. Do a PSI-BLAST search for 3 iterations with the following sequence:
     >gi|7436316|pir||D75028 Pab VMA intein
     CVDGDTLVLTKEFGLIKIKDLYKILDGKGKKTVNGNEEWTELERPITLYGYKDGKIVEIKATHVYKGFS
     AGMIEIRTRTGRKIKVTPIHKLFTGRVTKNGLEIREVMAKDLKKGDRIIVAKKIDGGERVKLNIRVEQKR
     GKKIRIPDVLDEKLAEFLGYLIADGTLKPRTVAIYNNDESLLRRANELANELFNIEGKIVKGRTVKALLI
     HSKALVEFFSKLGVPRNKKARTWKVPKELLISEPEVVKAFIKAYIMCDGYYDENKGEIEIVTASEEAAYG
     FSYLLAKLGIYAIIREKIIGDKVYYRVVISGESNLEKLGIERVGRGYTSYDIVPVEVEELYNALGRPYAE
     LKRAGIEIHNYLSGENMSYEMFRKFAKFVGMEEIAENHLTHVLFDEIVEIRYISEGQEVYDVTTETHNFI
     GGNMPTLLHNT  
    What types of enzymes do you get as hits? Do you notice anything strange about the search results? Save the PSSM (Position Specific Scoring Matrix, or profile) from your search on the 4th iteration. To do that, choose PSSM from the pull-down menu under Format options and click the "Format!" button. After the search is done, you should get a strange-looking mixture of alphanumeric symbols in your browser window. This is the PSSM. Save the PSSM to disk as a text file, or keep this browser window open. We are going to use this profile in question 2.

  2. Now we will use the PSSM to BLAST the completed genomes. Go to the Microbial Genomes Genomic BLAST page (let it load completely before choosing any options). Paste the intein sequence into the query sequence box, and change the Query and Database entries to "Protein". Choose one of the following genomes as the database:
    • Pyrobaculum aerophilum
    • Aeropyrum pernix
    • Sulfolobus tokodaii
    • Archaeoglobus fulgidus
    • Methanothermobacter thermautotrophicus
    • Thermoplasma volcanium
    • Methanococcus jannaschii
    • Saccharomyces cerevisiae (this genome is on the Other eukaryotes Genomic BLAST page, but the user interface is the same)

    After that, click the "Adv. BLAST" button. This will redirect you to the BLAST search window. Paste your PSSM from question 1 into the PSSM box (under Options). What are the results of your search? Did you get any significant matches? What are they? If you have significant matches, does the match extend over the full lengths of both the query and the subject sequences? Use Blink to investigate whether the hits are indeed inteins.
    What is your conclusion? In your answer indicate
     - genomes searched,
     - number of significant matches found,
     - the E-values of these matches, and
     - the identity of these matches
         (i.e., are these probable inteins, or are they likely to be something else?).


  3. Select a microbial genome, and a question to address. Select two reference genomes appropriate for your question.

    Your question:

    Your genome:

    Your reference genomes:

    Which candidate genes did you find?:

    For example:

    • If you ask which genes in Treponema pallidum are candidates for having been transferred from the archaeal domain into this genome, go to the NCBI genomes page, select the Treponema pallidum genome (here), and then select TAX PLOT (here). To look for genes transferred from the archaea, you need to select one bacterial genome (a deep-branching one would be nice, if there is such a thing) and one archaeal genome. Aquifex and Archaeoglobus would be suitable.
    • If you look for halobacterial (archaeal) genes in cyanobacteria, you could select B. subtilis (B. = Bacillus) and H. NRC1 (H. = Halobacterium, which is an archaeon, not a bacterium!) as reference genomes and the genome from Synechocystis sp. PCC6803 as the genome to analyse.
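
For question 2, the check of whether a match extends over the full lengths of both query and subject can be written down as a small Python sketch. An intein sits inside a host protein, so a hit that covers most of the intein query but only an internal stretch of a much longer subject is at least consistent with an embedded intein (Blink still has to confirm it). Everything here is an illustrative assumption: the `Hit` fields, the cutoffs `max_evalue` and `min_cov`, and the example coordinates are made up and do not come from a real search.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    q_start: int   # alignment start on the query (1-based, inclusive)
    q_end: int     # alignment end on the query
    q_len: int     # full length of the query
    s_start: int   # alignment start on the subject
    s_end: int     # alignment end on the subject
    s_len: int     # full length of the subject
    evalue: float


def coverage(start, end, length):
    """Fraction of a sequence spanned by the alignment."""
    return (end - start + 1) / length


def describe(hit, max_evalue=1e-4, min_cov=0.8):
    """Classify a hit by significance and by query/subject coverage."""
    if hit.evalue > max_evalue:
        return "not significant"
    q_cov = coverage(hit.q_start, hit.q_end, hit.q_len)
    s_cov = coverage(hit.s_start, hit.s_end, hit.s_len)
    if q_cov >= min_cov and s_cov >= min_cov:
        return "full-length match"
    if q_cov >= min_cov:
        return "query embedded in a longer subject"
    return "partial match"


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    example = Hit(q_start=1, q_end=420, q_len=430,
                  s_start=250, s_end=672, s_len=1020, evalue=1e-50)
    print(describe(example))
```

Reporting both coverages next to the E-value makes it easier to separate hits to free-standing homologs (for example homing endonucleases) from hits that lie inside a larger host protein.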