MCB 3421 Computer Lab Assignment 4 (Databank searches, part A: ENTREZ)

Your name:
Your email address:

Remember to request an account on the bioinformatics cluster. If you do not already have one, do this now: go to http://bioinformatics.uconn.edu/contact-us/ and
use the first pull-down menu to sign up for a class account ...

1. (less than 20 minutes)
Use PubMed in NCBI's Entrez to find an article written by Carl R. Woese (famous scientist, co-discoverer of the Archaea), published in the journal Proceedings of the National Academy of Sciences of the United States of America, with the words "primary kingdoms" in the title of the paper. Try to use Boolean operators and field tags; if you cannot recall the tags, use the Preview/Index tool under Advanced (link at the top, under the search bar, on the right). Also, if you are not sure how a journal or author is spelled in the databank, start typing the name and then use the "show index list" link in the advanced search.
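
As a reminder of how field tags and Boolean operators fit together, here is a minimal Python sketch that assembles a query string. The helper names (tagged, combine) and the author, journal, and title values are made-up placeholders, not the answer to this question; [au], [ta], and [ti] are the PubMed tags for author, journal, and title.

```python
def tagged(term: str, tag: str) -> str:
    """Return one search term limited to a single field, e.g. "Doe J"[au]."""
    return f'"{term}"[{tag}]'


def combine(operator: str, *clauses: str) -> str:
    """Join clauses with one Boolean operator and wrap them in parentheses."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR, or NOT")
    return "(" + f" {operator} ".join(clauses) + ")"


# Placeholder values for illustration only (hypothetical author and title).
query = combine(
    "AND",
    tagged("Doe J", "au"),          # [au] = author
    tagged("Nature", "ta"),         # [ta] = journal title
    tagged("gene transfer", "ti"),  # [ti] = words in the title
)
print(query)
```

The printed string can be pasted into the PubMed search bar; swap in the author, journal, and title words you actually need.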

What query found the 1977 article?


How many similar articles (link in right hand bar, under Similar articles, click on see all) are linked to this article?
(The number is in the header: Items 1 to 20 of xxx)

When was the most recent one published? (Hint: In the "Sort by" pull-down menu, set the option to Pub Date.)

(Aside: If you wonder why prokaryotes is the wrong choice when asked in an exam about the names of the three domains of life, check the 2009 opinion piece from Norm Pace. A more readable and complete summary of the history is in Jan Sapp's article at http://www.ncbi.nlm.nih.gov/pubmed/18053933 )
How many articles available at BioMed Central cite the article by Carl Woese on the primary kingdoms?

2. (ca. 7 minutes) (Note: If NCBI's pull-down menus do not always work in your browser, try to use Firefox)
In NCBI's PubMed, find the earliest paper co-authored by Senejani and Gogarten. What is the topic of the paper?

To learn about inteins, select Books as the target database to search (pull-down menu on the left, below the black bar) and search for intein homing. Then click on the < Top results in this book> link for the first book, select chapter 11.3, and scroll down to the images on homing and splicing (section 11.3.4., Figs 11.32 and 11.33), which are somewhat informative. For more information, check Wikipedia on inteins.

3. (ca. 5 minutes)
Dr. Gogarten seems obsessed by an important protein called ATP synthase. Is he interested in anything else? How many articles has he published that are NOT related to the ATP synthase OR ATPase?
What query did you assemble?
How many articles did you find?
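
If the effect of NOT and of parentheses is unclear, the following sketch may help. It models search results as Python sets; the article IDs and set names are invented for illustration. It assumes that, without parentheses, the operators are applied from left to right.

```python
# Hypothetical article IDs; a real search would return PubMed IDs.
by_author = {1, 2, 3, 4, 5, 6}
mentions_term_a = {1, 2, 7}
mentions_term_b = {2, 3, 8}

# author NOT (term_a OR term_b): drop anything that matches either term.
excluded = mentions_term_a | mentions_term_b
remaining = by_author - excluded
print(sorted(remaining))

# (author NOT term_a) OR term_b: what you get when the parentheses are
# left out and the operators are applied left to right.
left_to_right = (by_author - mentions_term_a) | mentions_term_b
print(sorted(left_to_right))
```

The second result is larger and contains articles that are not by the author at all, which is why the grouping matters.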

4. (10 minutes)
For a scientist of your choice (e.g., your advisor, or someone who publishes in your field of interest), use ISI's Web of Science database to search for articles that cite this author. First, search for the author, go through the list of articles, and select the ones that are actually written by the person you are looking for (tip: display 50 articles per page, then use select page). Second, add the selected articles to the marked list. When done, click on the citation report link (on top of the table on the right side).
What was the H-index of the scientist? In your own words, what does this number mean?
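
To check the number reported in the citation report by hand, here is a minimal sketch that assumes the usual definition of the H-index: the largest h such that h articles have at least h citations each. The function name and the citation counts are invented for illustration.

```python
def h_index(citations):
    """Return the largest h such that h articles have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited article first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` articles all have >= rank citations
        else:
            break      # later articles have even fewer citations
    return h


# Hypothetical citation counts for five articles.
print(h_index([10, 8, 5, 4, 3]))
```

Note that one very highly cited article barely moves the value: the index only grows when more articles cross the threshold.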

Which was the most cited article? How often was it cited?

Pick one frequently cited article. When was the article published, and when was the most recent citation?

Did you find an interesting article?    
        
Was this article available online?      


Repeat the exercise using Google Scholar. By default, Google lists articles in order of how often they were cited (the rich get richer principle). Do you identify the same article? Was the number of citations similar/identical?

NOTE: If you do a lot of writing in science, you want to use bibliography software. Endnote is ok, but costs money and struggles with updates to the operating system. Zotero and Mendeley are free. Zotero works with any version of MS Word or Open Office and incorporates citations from PubMed into your reference library. Mendeley is similar (and compatible with Zotero), but is slower to be updated.

5. (15 minutes)
Using PubMed, search for articles co-authored by Taiz and Gogarten.

a) How many articles did you retrieve?

b) Using the "find related data" pull-down menu in the right bar, display all Nucleotide links.
How many related sequences do the articles have?

c) For the sequences from carrot, click the link to the GenBank file, then click on CDS and observe what happens to the nucleotide sequence. Then click on the link to the protein sequence (following protein ID:).
Once the protein sequence is loaded, run BLAST (link in the right-hand column). In the BLAST form, under database select UniProtKB/Swiss-Prot; under algorithm parameters set max target sequences to 20000 and the E-value threshold to 0.001 (the E-value of a match is the number of matches of this quality or better that are expected by chance alone in a database of this size). Place a check mark in the field at the bottom of the form labeled "Show results in a new window". Execute the search.
How many matching sequences does the protein sequence have?

Could all of these sequences be homologs?
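
As background for the E-value threshold used in the search above, here is a small sketch relating the bit score, the E-value, and the probability of a chance match. It assumes the standard relation E = m * n * 2^(-S'), where S' is the bit score and m and n are the (effective) query and database lengths, and P = 1 - exp(-E) for the probability of at least one chance hit. The function names and the example numbers are made up.

```python
import math


def expected_hits(bit_score, query_length, database_length):
    """E-value: expected number of chance hits scoring at least bit_score."""
    return query_length * database_length * 2.0 ** (-bit_score)


def p_at_least_one(e_value):
    """Probability of one or more chance hits, P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


# Hypothetical search: 600-residue query, 200 million residues in the database.
e = expected_hits(bit_score=50, query_length=600, database_length=200_000_000)
print(f"E = {e:.2e}, P = {p_at_least_one(e):.2e}")

# For small E the two numbers nearly coincide; for large E they do not.
print(p_at_least_one(0.001), p_at_least_one(10))
```

This is why an E-value can exceed 1 while a probability cannot, and why the two are interchangeable only for small values such as the 0.001 threshold.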

 

Select the tblastn search algorithm on top (it searches the six-frame translation of the nucleotide database), and select the Human Ref Seq gene sequences as the database (same E-value threshold).
Do you obtain a match? How is it annotated?
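
To see what "six-frame translation" means in the tblastn step above, here is a minimal sketch that translates a DNA string in all three reading frames on both strands. It assumes the standard genetic code and is only an illustration, not how BLAST is implemented; the function names and the short test sequence are made up.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```

A protein query can therefore match a nucleotide entry even when the matching region is not annotated as a coding sequence.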

Return to the mRNA sequence of the carrot V-ATPase catalytic subunit (here). In the right column select BLAST. For the target database select Human Ref Seq gene. Execute the search. Do you obtain a match? How is it annotated?

What could be the reason for the discrepancy?

 

Go to the homologous protein from Arabidopsis http://www.ncbi.nlm.nih.gov/protein/O23654.1. Observe the different parts of the GenPept file. Then click on the Gene link under Related information on the right.

Where in the genome is this gene located? (Chromosome, nucleotides from ... to)(scroll down, mouse over the green arrows in the middle - wait).

6. (10 minutes)
Using Entrez, search Protein (use drop-down box to select the Protein database) for 19888400 (this is a gi number, see historical note)

Click on the Similar protein sequences using SmartBlast link in the lower part of the right-hand column.
Do you notice anything interesting about the alignments?

7. (8 minutes)
To what domain (super kingdom), phylum (kingdom), and family does Thermoplasma belong? (Use the Taxonomy Search. Click on the "Thermoplasma" link that is returned as the result of the search. In the line labeled lineage, if you hover the mouse pointer over the names, it tells you which taxonomic category you are pointing at.)

How many protein and genome sequences are available for the genus Thermoplasma? (In the taxonomy browser go to Thermoplasma and check protein and genome in the header, then click on <Display>)

Try to figure out what you could do to download all the protein sequences that are annotated as being encoded by a Thermoplasma genome.
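
One possible route, not necessarily the one intended here, is NCBI's programmatic interface to Entrez (E-utilities). The sketch below only builds the request URLs and does not contact NCBI. The helper names are made up; the WebEnv and query_key values are placeholders that would come from the esearch reply, and the search term Thermoplasma[Organism] is an assumption about how to restrict the search to the genus.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term):
    """URL that runs a search and stores the hits on NCBI's history server."""
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_fasta_url(db, web_env, query_key, retstart=0, retmax=500):
    """URL that downloads one batch of the stored hits in FASTA format."""
    params = {
        "db": db,
        "WebEnv": web_env,        # taken from the esearch reply
        "query_key": query_key,   # taken from the esearch reply
        "rettype": "fasta",
        "retmode": "text",
        "retstart": retstart,     # index of the first record in this batch
        "retmax": retmax,         # number of records per batch
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


search = esearch_url("protein", "Thermoplasma[Organism]")
# Placeholder history values; real ones are returned by the search request.
fetch = efetch_fasta_url("protein", web_env="PLACEHOLDER", query_key=1)
print(search)
print(fetch)
```

Fetching in batches (increasing retstart by retmax each time) keeps individual requests small; the web interface's "Send to" file download is an alternative that needs no code.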


Finished?

Check the appropriate radio button below before pressing the submit button:

Send email to your instructor (and yourself) upon submit
Send email to yourself only upon submit (as a backup)
Show summary upon submit but do not send email to anyone