MCB 3421 Computer Lab Assignment 4 (Databank searches, part A: ENTREZ)

Your name:
Your email address:

Remember to request an account on the bioinformatics cluster. Do this now if you do not have one already. Go to and
use the first pull-down menu to sign up for a class account ...

1. (less than 20 minutes)
Use PubMed in NCBI's Entrez to find an article written by Carl R. Woese (famous scientist, co-discoverer of the Archaea), published in the journal Proceedings of the National Academy of Sciences of the United States of America, with the words primary kingdoms in the title of the paper. Try to use Boolean operators and field tags; if you cannot recall the tags, use the Preview/Index tool under Advanced (link at the top, under the search bar on the right). Also, if you are not sure how a journal or author name is spelled in the databank, type the start of the name and then use the "show index list" link in the advanced search.
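
As a reminder of how field tags and Boolean operators fit together, here is a minimal Python sketch that assembles a fielded PubMed query and the matching E-utilities ESearch URL. The author, journal, and title words in the usage lines are made-up placeholders (not the answer to this question), and the helper names fielded_query and esearch_url are hypothetical. The tags [Author], [Journal], [Title], and [dp] are standard PubMed field tags; the script only builds strings and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fielded_query(author=None, journal=None, title_words=None, year=None):
    """Join the supplied pieces with AND, attaching a PubMed field tag to each."""
    parts = []
    if author:
        parts.append(f"{author}[Author]")
    if journal:
        # Quote the journal name so it is searched as one phrase.
        parts.append(f'"{journal}"[Journal]')
    if title_words:
        parts.extend(f"{word}[Title]" for word in title_words)
    if year:
        parts.append(f"{year}[dp]")  # dp = date of publication
    return " AND ".join(parts)


def esearch_url(term, db="pubmed"):
    """Return the ESearch URL for a query term (nothing is sent over the network)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


# Placeholder values for illustration only.
query = fielded_query(author="Doe J", journal="Nature",
                      title_words=["ribosomal", "phylogeny"])
print(query)
print(esearch_url(query))
```

The printed query string can be pasted directly into the PubMed search bar; the URL form is what a script would request.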

What query found the 1977 article?

How many similar articles are linked to this article? (Use the link in the right-hand bar under Similar articles and click on "see all".
The number is in the header: Results 1 to 20 of xxx)
When was the most recent one published? (Hint: In the "Sort by" pull-down menu, set the option to Pub Date.)
(Aside: If you wonder why prokaryotes is the wrong choice when asked in an exam about the names of the three domains of life, check the 2009 opinion piece from Norm Pace. A more readable and complete summary of the history is in Jan Sapp's article at )
How many articles available at BioMed Central cite the article by Carl Woese on the primary kingdoms?

2. (ca. 7 minutes) (Note: If NCBI's pull-down menus do not always work in your browser, try to use Firefox)
In NCBI's PubMed, find the earliest paper co-authored by Senejani and Gogarten. What is the topic of the paper?

To learn about inteins, select Books as the target database to search (pull-down menu on the left, below the black bar) and search for intein homing. Then click on the < Top results in this book> link for the first book, select chapter 11.3, and scroll down to the images on homing and splicing (section 11.3.4, Figs 11.32 and 11.33) -- somewhat informative. For more information, check Wikipedia on inteins.

3. (ca. 5 minutes)
Dr. Gogarten seems obsessed by an important protein called ATP synthase. Is he interested in anything else? How many articles has he published that are NOT related to the ATP synthase OR ATPase?
What query did you assemble?
How many articles did you find?
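
A note on grouping: PubMed's help describes Boolean operators as being processed from left to right, so parentheses decide what a NOT actually excludes. The sketch below illustrates this with a hypothetical helper, exclude_topics, and placeholder search terms; the small sets of record numbers are made up and only model the logic locally.

```python
def exclude_topics(base, excluded):
    """Return 'base NOT (t1 OR t2 ...)', quoting multi-word topics as phrases."""
    quoted = [f'"{topic}"' if " " in topic else topic for topic in excluded]
    return f"{base} NOT ({' OR '.join(quoted)})"


# Placeholder author and topics, for illustration only.
query = exclude_topics("Doe J[Author]", ["topic one", "topicTwo"])
print(query)

# Model the Boolean logic with made-up record numbers.
author = {1, 2, 3, 4}   # records by the author
topic_a = {2, 5}        # records matching the first topic
topic_b = {3, 6}        # records matching the second topic

# "author NOT a OR b" read left to right: (author NOT a) OR b
left_to_right = (author - topic_a) | topic_b
# "author NOT (a OR b)": exclude both topics from the author's records
grouped = author - (topic_a | topic_b)

print(sorted(left_to_right))
print(sorted(grouped))
```

Without the parentheses, the second topic is added back in (including records by other authors) instead of being excluded.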

4. (10 minutes)
For a scientist of your choice (e.g., your advisor, or someone who publishes in your field of interest), use ISI's Web of Science database to search for articles that cite this author. First, search for the author, go through the list of articles, and select the ones that were actually written by the person you are looking for (tip: display 50 articles per page, then use select page). Second, add the selected articles to the marked list. When done, click on the citation report link (on top of the table on the right side).
What was the H-index of the scientist? (in your own words, what does this number mean?)
Which was the most cited article? How often was it cited?
Pick one frequently cited article. When was the article published, and when was the most recent citation?
Did you find an interesting article?             
Was this article available online?      
Repeat the exercise using Google Scholar. By default, Google lists articles in order of how often they were cited (the rich get richer principle). Do you identify the same article? Was the number of citations similar or identical?
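
For reference when interpreting the citation report: the H-index is commonly defined as the largest number h such that the author has h articles with at least h citations each. The following short Python sketch computes it from a list of per-article citation counts; the function name h_index and the example counts are made up for illustration and are not taken from any real citation report.

```python
def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` articles all have >= rank citations
        else:
            break         # later articles have even fewer citations
    return h


# Made-up citation counts for six articles.
print(h_index([25, 8, 5, 3, 3, 1]))  # prints 3
```

In this example three articles have at least 3 citations each, but there are not four articles with at least 4, so the value is 3. Because Web of Science and Google Scholar count citations from different sources, the same calculation can give different values in the two databases.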

NOTE: If you do a lot of writing in science, you want to use bibliography software. EndNote is OK, but it costs money and struggles with updates to the operating system. Zotero is free and works with any version of MS Word or OpenOffice (I use it via Firefox). It imports citations from PubMed into your reference library. Mendeley is similar (and compatible with Zotero), but is slower to be updated.

5. (5 minutes)
Using PubMed, search for articles co-authored by Taiz and Gogarten.

a) How many articles did you retrieve?

b) Using the "find related data" pull-down menu in the right bar, display all Nucleotide Links.
How many related sequences do the articles have?

c) For the sequences from carrot, click the link to the GenBank file, then click on CDS and observe what happens to the nucleotide sequence. Then click on the link to the protein sequence (following protein ID:).
Once the protein sequence is loaded, click on related sequences (in the right-hand column).
How many related sequences does the protein sequence have?

Could all of these sequences be homologs?


For the sequences from Nicotiana sylvestris, click on the link to identical sequences. How many sequences are listed under this entry?

Go to the BLink for the Daucus carota catalytic subunit (under related information. More information on the Blast link is here). How are the archaeal and bacterial homologs listed? As what are the archaeal homologs annotated? (Pick one typical annotation line.)

Go to the homologous protein from Arabidopsis. Observe the different parts of the GenPept file. Then click on the gene link under related information on the right.

Where in the genome is this gene located? (Mouse over the green arrows in the middle.)
If this does not work, or if you want to try different ways to explore the gene neighborhood, click on map-viewer under related information. On the left-hand side are the zoom in and out levers and a link to Maps and Options. If the display is not useful, click on Maps and Options, and make sure that gene and RefSeq transcripts are added to the display.
Where is the gene located? (Chromosome, nucleotides)

6. (10 minutes)
Using Entrez, search Protein (use drop-down box to select the Protein database) for 19888400 (this is a gi number, see historical note)

Click on the BLink link in the lower part of the right-hand column (under related information).
Do you notice anything interesting about the alignments? (note the last paragraph in the documentation here )

At the top of the BLink alignment columns are little brown lines that point towards conserved domains. Follow a few of these links. What does this tell you about the central portion of the sequence?

7. (8 minutes)
To what domain (superkingdom), phylum (kingdom), and family does Thermoplasma belong? (Use the Taxonomy Search. Click on the "Thermoplasma" link that is returned as the result of the search. In the line labeled lineage, if you hover the mouse pointer over the names, it tells you which taxonomic category you are pointing at.)

How many protein and genome sequences are available for the genus Thermoplasma? (In the taxonomy browser go to Thermoplasma and check protein and genome in the header, then click on <Display>)

Try to figure out what you could do to download all the protein sequences that are annotated as being encoded by a Thermoplasma genome.
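
One possible scripted route (a sketch, not the only answer) is NCBI's E-utilities: ESearch with usehistory=y stores the result set on NCBI's history server, and EFetch can then retrieve the stored records as FASTA in batches. The Python below only builds the URLs and parses an ESearch reply; the search term Thermoplasma[Organism] is an assumption about how to select the sequences, the sample XML reply is made up for illustration, and the function names are hypothetical. Nothing is downloaded when the script runs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="protein"):
    """ESearch URL that stores the hits on the history server (usehistory=y)."""
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def parse_esearch(xml_text):
    """Extract the hit count, WebEnv, and QueryKey from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return (int(root.findtext("Count")),
            root.findtext("WebEnv"),
            root.findtext("QueryKey"))


def efetch_url(webenv, query_key, retstart=0, retmax=500, db="protein"):
    """EFetch URL for one batch of FASTA records from the stored result set."""
    params = {"db": db, "query_key": query_key, "WebEnv": webenv,
              "rettype": "fasta", "retmode": "text",
              "retstart": retstart, "retmax": retmax}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


# Assumed search term; adjust it to match what the question asks for.
print(esearch_url("Thermoplasma[Organism]"))

# Made-up ESearch reply, standing in for what NCBI would return.
sample = ("<eSearchResult><Count>1234</Count><RetMax>0</RetMax>"
          "<RetStart>0</RetStart><QueryKey>1</QueryKey>"
          "<WebEnv>MCID_example</WebEnv></eSearchResult>")
count, webenv, key = parse_esearch(sample)

# One EFetch URL per batch of 500 records.
batches = [efetch_url(webenv, key, retstart=start)
           for start in range(0, count, 500)]
print(len(batches))  # prints 3
```

To actually download, each URL would be requested in turn (for example with urllib.request.urlopen) and the text appended to a FASTA file. The same result can be reached by hand in the browser with the "Send to" file option on the protein results page.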


Check the appropriate radio button below before pressing the submit button:

Send email to your instructor (and yourself) upon submit
Send email to yourself only upon submit (as a backup)
Show summary upon submit but do not send email to anyone