Assignment for Class 34

Your name:
Your email address:

 

Assignments:

1. (2 Minutes) Using PRSS, determine whether there is significant similarity between the proteins with gi numbers 145722 (D-Ala D-Ala ligase) and 121663 (Glutathione synthetase). Select "Accession/GI number" from the drop-down box, which has "FASTA format" selected by default.

What is the P-Value of the comparison?

2. (10 minutes) Do a PSI-BLAST search with the Glutathione synthetase as the query (use swissprot as the database). On the Format page, set the E-value cut-off for inclusion in the next round ("PSI-BLAST Threshold") to 0.0001 and change the maximum number of target sequences to 10000. Note: By default, PSI-BLAST switches back and forth between the format and the result windows. DO NOT CLICK the "Run PSI_Blast iteration X" button repeatedly. Click it once, then open the Format window.

After how many iterations (do not run more than 5 iterations!) do you start to pick up carbamoyl phosphate synthetases and D-Ala D-Ala ligase?

Which other types of enzyme are included among the hits?


Note: If this takes a long time, collaborate with your neighbor. One of you could do task #2, the other task #3.


3. (10 minutes) Do a PSI-BLAST search with the D-Ala D-Ala ligase as the query (swissprot as the database). After how many iterations (do not run more than 5 iterations!) do you start to pick up carbamoyl phosphate synthetases?

Which other types of enzyme are included among the hits?


What might be the reason for the different results obtained in tasks 2 and 3?


4. (14 Minutes) Do a PSI-BLAST search (use SwissProt as the database) for 3 iterations with the following sequence:

>Pab_VMA intein from gi|7436316|pir||D75028
CVDGDTLVLTKEFGLIKIKDLYKILDGKGKKTVNGNEEWTELERPITLYGYKDGKIVEIKATHVYKGFS
AGMIEIRTRTGRKIKVTPIHKLFTGRVTKNGLEIREVMAKDLKKGDRIIVAKKIDGGERVKLNIRVEQKR
GKKIRIPDVLDEKLAEFLGYLIADGTLKPRTVAIYNNDESLLRRANELANELFNIEGKIVKGRTVKALLI
HSKALVEFFSKLGVPRNKKARTWKVPKELLISEPEVVKAFIKAYIMCDGYYDENKGEIEIVTASEEAAYG
FSYLLAKLGIYAIIREKIIGDKVYYRVVISGESNLEKLGIERVGRGYTSYDIVPVEVEELYNALGRPYAE
LKRAGIEIHNYLSGENMSYEMFRKFAKFVGMEEIAENHLTHVLFDEIVEIRYISEGQEVYDVTTETHNFIGG
NMPTLLHNT

What types of enzymes do you get as hits?

Do you notice anything strange about the search results?

5. Using PSSMs

Here is a FASTA formatted file containing annotated IS605 transposase protein sequences from Frankia genomes.
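Before building the PSSM, it can help to check how many sequences the FASTA file contains. A minimal sketch, assuming the file has been saved as ThreeFrankiaIS605.faa in the current directory (the name used in the commands further down); count_fasta_records is a hypothetical helper name, not part of BLAST:

```shell
# Count FASTA records by counting header lines (lines that start with ">").
# count_fasta_records is a hypothetical helper, not a BLAST tool.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example call (assumes the file is in the current directory):
# count_fasta_records ThreeFrankiaIS605.faa
```

The same call on ThreeFrankia.faa gives the number of annotated proteins in the database, which is a useful baseline when judging how many matches are reported.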

We will use it to build a PSSM for this protein family, and then compare (mainly quantitatively) two searches:

  1. a blastp search of Frankia genomes for ORFs with significant matches to the sequences in the FASTA file
  2. a PSI-tblastn search of Frankia genomes for significant matches to this PSSM

To do this, we will use the cluster.

A FASTA file of the proteins in all 3 Frankia genomes is here - ThreeFrankia.faa

A FASTA file of the nucleotide sequences of all 3 Frankia genomes is here - ThreeFrankia.fna

Note that these (currently 3) Frankia genomes were retrieved from Entrez Genome (using the F links in the right-hand column). In this case, we retrieved the *.faa (protein) files and the *.fna (genome nucleotide) files.

blastpgp -d nr -i ThreeFrankiaIS605.faa -j 2 -C IS605Check.chk -e 1e-5 -o blast.out -a 2

This takes quite a while, so grab the checkpoint file from here - IS605Check.chk

Options for blastpgp here

formatdb -p T -i ThreeFrankia.faa -o T

formatdb -p F -i ThreeFrankia.fna -o T

blastall -p blastp -i ThreeFrankiaIS605.faa -d ThreeFrankia.faa -e 1e-5 -m 8 -o blastp.out -a 2

blastpgp -d ThreeFrankia.faa -i ThreeFrankiaIS605.faa -a 2 -R IS605Check.chk -o PSIblastP.out

blastall -p psitblastn -i ThreeFrankiaIS605.faa -d ThreeFrankia.fna -R IS605Check.chk -e 1e-5 -m 8 -o psitblastn.out -a 2
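The -m 8 option writes one tab-separated line per match with twelve columns: query id, subject id, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, and bit score. As a rough sketch, the output can be re-filtered at a stricter E-value without rerunning the search; filter_by_evalue is a hypothetical helper name, and the threshold in the example is only an illustration:

```shell
# Keep only matches whose E-value (column 11) is at or below a threshold,
# and print query id, subject id, E-value and bit score.
# filter_by_evalue is a hypothetical helper, not a BLAST tool.
filter_by_evalue() {
    awk -F '\t' -v OFS='\t' -v max="$2" \
        '($11 + 0) <= (max + 0) { print $1, $2, $11, $12 }' "$1"
}

# Example call (assumes blastp.out was produced with -m 8):
# filter_by_evalue blastp.out 1e-20
```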

Counting the number of lines in a file:

wc -l blastp.out

wc -l psitblastn.out
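Note that wc -l counts every reported alignment, so one subject sequence matched by several queries is counted several times. A possible complement, assuming the -m 8 tabular output produced above, is to look at the subject ids in column 2; both function names here are hypothetical helpers:

```shell
# Number of distinct subject sequences (column 2) in -m 8 tabular output.
# tr strips the padding that some wc implementations add.
count_unique_subjects() {
    cut -f 2 "$1" | sort -u | wc -l | tr -d ' '
}

# Number of matches per subject sequence, most frequently hit first.
hits_per_subject() {
    cut -f 2 "$1" | sort | uniq -c | sort -rn
}

# Example calls (assume both output files exist in the current directory):
# count_unique_subjects blastp.out
# hits_per_subject psitblastn.out
```

For psitblastn.out the subjects are the nucleotide sequences in ThreeFrankia.fna, so many matches are expected to share a subject id; there the per-subject counts are likely more informative than the number of distinct subjects.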

How does the number of blastp matches compare to the number of PSI-BLAST and PSI-tblastn matches?

If there is a significant difference in the number of matches, can you think of a reason why this could happen?

Also see MCB 372 PSI-Blast exercise

 

 
