Entrez Exercise

Your name:
Your email address:

  1. You heard that an interesting paper by a person named Doolittle on horizontal gene transfer between bacteria and mammals was recently published in Science. Can you find the title using Entrez? (Hint: keywords to try might be "evolution", "Doolittle", and so on. Do not forget to use "Preview/Index" to limit your search results.) Paste the title here:

    Paste your query here:

    Go to the online article in Science. Note that this is an enhanced article (with HyperNotes, HN) and that the HyperNotes include a link to the Gogarten Lab.

  2. Using the "Preview/Index" function in Entrez PubMed, search for articles authored by Woese CR in 1987. Select the third article (the one on microsporidia). Click on related articles. Sort the results (i.e., the related articles) by publication date (pulldown menu). If nothing happens, click the display button. When was the most recent related paper published?
    When was the oldest related paper published?
    As an aside, compare the titles of the recent articles with the title of the original one:


  3. Find the paper co-authored by Senejani and Gogarten. What is the topic of the paper?


  4. Display the abstract of this paper and click on the book link (at the top right of the abstract). Items in the abstract that are covered in the reference book turn into hyperlinks. If you need more information on any of the items, follow these links. Follow the next-to-last link (Phylogenetic), select the "Molecular Biology of the Cell" book, and click on the first link. According to the diagram, which bacteria gave rise to the plastids of red and green algae? Do you believe this?


  5. To what domain, kingdom and family does Thermoplasma belong? (Use the Taxonomy link)


  6. How many protein sequences are available for Thermoplasma acidophilum, and how many are available for the genus Thermoplasma? (In the taxonomy browser, go to Thermoplasma and check "protein" in the header.)

  7. Using the "Preview/Index" function in Entrez, search for articles authored by Gogarten JP and Taiz L. Paste your query here:

    Select one paper and explore the different display options. Note: if you don't select anything, the display command works on the whole list. The MEDLINE format is a great way to import literature citations into EndNote.

  8. How many nucleotide and protein links do you find associated with ALL of these papers?
    Display and select the nucleotide links, then click the "Add to Clipboard" button. This adds the entries to a clipboard maintained on NCBI's computer, NOT on your local PC. You can accumulate items on this NCBI clipboard and then display all of them in a chosen format.
  9. Do all of the protein links refer to different sequences? (Inspect two that sound similar.) How come?

  10. Pick one protein sequence and display it in the different available formats. Which of the formats provides you with the most information about the sequence?

    Don't forget to check out the FASTA format. This format is easy to generate and is most widely used to get data into and out of programs.

  11. From the protein links select a protein sequence from carrots. Click on related sequences. This will provide a list of all sequences related to the one protein sequence you had selected. How many related sequences are retrieved?
  12. Do all these proteins perform a similar function? What might be the common denominator?

  13. Go to the clipboard. This should give you the nucleotide sequences referred to in the papers by Taiz, Gogarten et al. that you saved earlier. Display one nucleotide sequence. Then click on related sequences (i.e., related to this one sequence). How many do you find?


  14. Why is there such a difference between the number of sequences related to a nucleotide sequence, and the number related to a protein sequence?

  15. Open one of the vacuolar ATPase sequences (protein) and click on the BLink link. Note the color coding of the taxonomic relations at the top and the color-coded schematic alignments below. You can also obtain trees depicting the relationships of the organisms that have genes matching your sequence. The symbolic alignment is particularly helpful if your protein consists of many different domains (go here for a striking example). When you click on the first number next to the symbolic alignment (SCORE column), you open a page with a pairwise alignment. The other links open the entry of the target sequence and the BLink of the target sequence.


  16. Challenge: How many different archaeal RubisCO (= ribulose bisphosphate carboxylase oxygenase = rbcL = ribulose bisphosphate carboxylase large subunit) encoding genes can you find in the protein database? Pretend that you are only interested in RubisCO genes in Archaea, NOT in bacterial RubisCOs. (Archaea and Bacteria are the two domains of prokaryotes.) Clear your clipboard at the NCBI. Start by selecting "protein" in Entrez. Explore different search strategies (names, fields, enzyme and substrate names, ...). Save positives to the clipboard. If you later go to the clipboard, you can retrieve related sequences. Remember, nobody claimed that this is a perfect world. It certainly is not easy to formulate a good search strategy. If you don't know whether an organism is an archaeon, click on the taxonomy link associated with most sequences. How many different archaea that have a RubisCO homologue can you find? How many of these have more than one RubisCO gene?
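
If you want to keep a record of the search strategies you try, it can help to assemble each query from (term, field) pairs instead of typing it by hand. The Python sketch below is one way to do that; it is not part of the Entrez web interface. The helper names build_term and esearch_url are made up for this illustration, and the field tags used (Author, dp, Organism, All Fields) are assumptions that you should check against the field list offered by Preview/Index. The URL points at NCBI's E-utilities ESearch service; the code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(clauses, operator="AND"):
    """Join (value, field) pairs into an Entrez-style query string.

    build_term is a hypothetical helper for this exercise, not an NCBI function.
    """
    tagged = [f"{value}[{field}]" for value, field in clauses]
    return f" {operator} ".join(tagged)


def esearch_url(db, term, retmax=20):
    """Return an ESearch URL for the database and query (no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


# Item 2: articles authored by Woese CR in 1987 (field tags are assumptions).
woese_1987 = build_term([("Woese CR", "Author"), ("1987", "dp")])

# Item 16: one possible starting point for the archaeal RubisCO search.
archaeal_rubisco = build_term([("rbcL", "All Fields"), ("Archaea", "Organism")])

print(woese_1987)
print(esearch_url("pubmed", woese_1987))
print(archaeal_rubisco)
print(esearch_url("protein", archaeal_rubisco))
```

Pasting a printed query string into the Entrez search box should give the same hits as building the query through Preview/Index, provided the field tags are valid for the selected database.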

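As item 10 notes, the FASTA format is easy to generate: a record is a header line beginning with ">" followed by one or more lines of sequence. The Python sketch below reads and writes such records. The names parse_fasta and format_fasta are illustrative, the 60-character line width is an arbitrary choice, and the example records are invented rather than taken from any database.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(header, sequence, width=60):
    """Write one record as FASTA text, wrapping the sequence at `width` characters."""
    lines = [">" + header]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Invented example records; these are not real database entries.
example = """>seq1 made-up protein fragment
MKTAYIAKQR
QISFVKSHFS
>seq2 made-up nucleotide fragment
ATGGCGTACC
"""

for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```

Because the sequence may be split over any number of lines, a reader has to join the lines that follow each header, which is what parse_fasta does.
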
Finished?

Check the appropriate radio button below before pressing the submit button:

Send email to your instructor (and yourself) upon submit
Send email to yourself only upon submit (as a backup)
Show summary upon submit but do not send email to anyone