MCB 3421 Computer Lab Assignment 4 (Databank searches, part A: ENTREZ)

Your name:
Your email address:

1. (less than 20 minutes)
Use PubMed in NCBI's Entrez to find an article written by Carl R. Woese (famous scientist, co-discoverer of the Archaea) and published in the journal Proceedings of the National Academy of Sciences of the United States of America, with the words "primary kingdoms" in the title. Try to use Boolean operators and field tags; if you cannot recall the tags, use the Preview/Index tool under advanced search (link at the top right). If you are not sure how a journal or author name is spelled in the databank, type the beginning of the name and then use the "show index list" link in the advanced search.
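
As a sketch of how Boolean operators and field tags combine, the snippet below assembles a query string from (term, tag) pairs. The helper name `build_query` and the example values are hypothetical and deliberately unrelated to the exercise; `[au]`, `[ta]` and `[ti]` are PubMed's short tags for author, journal and title.

```python
def build_query(clauses, operator="AND"):
    """Join (term, tag) pairs into one PubMed-style query string."""
    parts = []
    for term, tag in clauses:
        # Quote multi-word terms so they are searched as a phrase.
        if " " in term:
            term = f'"{term}"'
        parts.append(f"{term}[{tag}]")
    return f" {operator} ".join(parts)


# Hypothetical example values, not the answer to the exercise.
query = build_query([
    ("Smith J", "au"),        # author
    ("Nature", "ta"),         # journal
    ("gene transfer", "ti"),  # words in the title
])
print(query)
```

The resulting string can be pasted into the PubMed search box; swapping in your own terms and tags is the actual exercise.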

What query found the 1977 article?

How many related citations (link in the right-hand bar) are linked to this article?
(In the window that shows the title, authors and abstract, click the link labeled "related citations" in the fourth table on the right-hand side; the number appears in the header "Results 1 to 20 of xxx".)
When was the most recent of these published? (Hint: in the Display settings pull-down menu, set the "Sort by" option to Pub Date.)
(Aside: If you wonder why "prokaryotes" is the wrong choice when an exam asks for the names of the three domains of life, check the 2009 opinion piece by Norm Pace. A more readable and complete summary of the history is in Jan Sapp's article.)

2. (ca. 7 minutes) (Note: NCBI's pull-down menus do not always work well in Safari -- use Firefox)
In NCBI's PubMed, find the earliest paper co-authored by Senejani and Gogarten. What is the topic of the paper?

To learn about inteins, select Books as the target database (pull-down menu on the left, below the black bar) and search for intein homing. Then click on the < op results in this book> link for the first book, select chapter 11.3, and scroll down to the images on homing and splicing (section 11.3.4), which are somewhat informative. For more information, check Wikipedia on inteins.

3. (ca. 5 minutes)
Dr. Gogarten seems obsessed with an important protein called ATP synthase. Is he interested in anything else? How many articles has he published that are NOT related to ATP synthase OR ATPase?
What query did you assemble?
How many articles did you find?
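
Because PubMed applies Boolean operators from left to right, the excluded terms need to be grouped in parentheses; otherwise `A NOT B OR C` is read as `(A NOT B) OR C`. A minimal sketch of that grouping, using a hypothetical helper `exclude_topics` and placeholder terms rather than the ones from the question:

```python
def exclude_topics(base, topics):
    """Return `base NOT (t1 OR t2 ...)` with the excluded topics grouped."""
    if not topics:
        return base
    # Parentheses make the OR bind before the NOT is applied.
    group = " OR ".join(topics)
    return f"{base} NOT ({group})"


# Placeholder author and topics, for illustration only.
query = exclude_topics("Author X[au]", ["topic one", "topic two"])
print(query)
```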

4. (10 minutes)
For a scientist of your choice (e.g., your advisor, or someone who publishes in your field of interest), use ISI's Web of Science database to search for articles that cite this author. First, search for the author, go through the list of articles, and select the ones that were actually written by the person you are looking for (tip: display 50 articles per page, then use select page). Second, add the selected articles to the marked list. When done, click on the citation report link.
What was the H-index of the scientist? (In your own words, what does this number mean?)
Which was the most cited article? How often was it cited?
Pick one frequently cited article. When was the article published, and when was the most recent citation?
Did you find any interesting article?
Was this article available online?
Repeat the exercise using Google Scholar. By default Google lists articles in order of how often they were cited (the rich get richer principle). Do you identify the same article? Was the number of citations similar or identical?
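
For reference, the H-index shown in a citation report is usually defined as the largest number h such that h of the author's articles have each been cited at least h times. A small sketch of that calculation, with made-up citation counts and a hypothetical function name:

```python
def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` articles all have >= rank citations
        else:
            break      # counts only decrease from here on
    return h


# Made-up citation counts for six articles.
example_counts = [25, 8, 5, 3, 3, 1]
print(h_index(example_counts))  # -> 3
```

Web of Science and Google Scholar index different sets of citing documents, so the same calculation can give different values in the two databases.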

5. (5 minutes)
Using PubMed, search for articles co-authored by Taiz and Gogarten.

a) How many articles did you retrieve?

b) Using the "find related data" pull-down menu in the right-hand bar, display all Nucleotide Links.
For one of the sequences from carrot, select related sequences.
How many related sequences does the nucleotide sequence have?

c) For one of the sequences from carrot, click the link to the GenBank file, then under CDS, click on the link to the protein sequence (following protein ID:).
Once the protein sequence is loaded, click on related sequences (in the right-hand column).
How many related sequences does the protein sequence have?

What might explain the difference?

Go to the protein encoded by gi|167559 (167560). How many related sequences (link in right hand bar) does this entry have?

Do all sequences appear to be homologs of one another?

Go to the BLink (under related information; more information on the Blast link is here). How are the archaeal and bacterial homologs annotated (pick one typical annotation line)? Does the annotation change when you move to the end of the lists? Give some examples:

6. (10 minutes)
Using Entrez, search Protein (use drop-down box to select the Protein database) for 19888400 (this is a gi number, see historical note)

Click on the BLink link in the lower part of the right-hand column (under related information).
Do you notice anything interesting about the alignments? (Note the last paragraph in the documentation here.)

At the top of the BLink alignment columns are small brown lines that point to conserved domains. Follow a few of these links. What does this tell you about the central portion of the sequence?

7. (8 minutes)
To what domain (superkingdom), phylum (kingdom), and family does Thermoplasma belong? (Use the Taxonomy Search. Click on the "Thermoplasma" link returned by the search. In the line labeled lineage, hovering the mouse pointer over a name shows which taxonomic category it belongs to.)

How many protein and genome sequences are available for the genus Thermoplasma? (In the taxonomy browser go to Thermoplasma and check protein and genome in the header, then click on <Display>)


Check the appropriate radio button below before pressing the submit button:

Send email to your instructor (and yourself) upon submit
Send email to yourself only upon submit (as a backup)
Show summary upon submit but do not send email to anyone