Assignment 8

Dotlet Exercises

Your name:
Your email address:

The Swiss Institute for Bioinformatics provides a Java applet that performs interactive dot plots. It is called Dotlet (if the site doesn't load, try here, here, here, or here [you might need to change the Java security settings to allow a particular site to run a script; see here for instructions. Under MacOSX, you might also need to go to the general security settings in System Preferences and click on the "open anyhow" button]). The main use of dot plots is to detect domains, duplications, insertions, deletions, and, if you work at the DNA level, inversions (excellent illustrations of the use of dot plots are given on the examples page).

Comparing yeast ATPase catalytic subunit with yeast HO endonuclease
Go to the applet and input the sequences:
Sce_VMA.fa (the vacuolar ATPase catalytic subunit from yeast),
SceHO.fa (the mating type switching HO endonuclease from yeast),
vma1Neurospora.fa (the vacuolar ATPase catalytic subunit from Neurospora crassa) and
Sce_intein.fa.
Careful: once you leave the webpage, the back arrow will return you to the applet, but you will have to input the sequences again (so make sure that your applet is in a separate browser window).
Also, when you input sequences, make sure you paste the sequence only, without a sequence description line. Give each sequence a name that allows you to recognize which sequence is which (e.g. Yeast_vma1, YeastHO, Neurospora_vma1, Yeast_intein).

Select the Neurospora A-subunit (vma1Neurospora.fa) and the yeast subunit with intein (Sce_VMA.fa). Select a window size between 9 and 15 and click "compute". The program will compare every window of the chosen size in one sequence to all the possible windows in the other sequence. On the right you see a histogram that describes how often window pairs with the indicated score occurred. The sliding bars below and above the histogram let you select the colors with which matches are depicted. (I like black for matches and white for mismatches better than the default.)
Note: the sequences may be longer than fits into the display window. You can either use the levers on top and at the left side to move the display window down and/or to the right, or you can select a compression using the pull-down menu labeled 1:1 (1:3 or 1:4 usually work).
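To make the window comparison described above more concrete, here is a minimal Python sketch. It is not Dotlet's code: the function names are made up for illustration, and it scores a window pair simply by counting identical residues, whereas Dotlet scores with a substitution matrix. The histogram function corresponds to the score histogram on the right of the applet, and the threshold plays the role of the sliding bars.

```python
from collections import Counter


def window_scores(seq_a, seq_b, window=9):
    """Score every window of seq_a against every window of seq_b.

    Assumption: the score is the number of identical residues in the
    two windows (Dotlet itself uses a scoring matrix instead).
    Returns a dict mapping (start_in_a, start_in_b) to the score.
    """
    scores = {}
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            scores[(i, j)] = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
    return scores


def score_histogram(scores):
    """Count how often each window score occurred."""
    return Counter(scores.values())


def dots(scores, threshold):
    """Return the window pairs that would be drawn as matches."""
    return sorted(pos for pos, s in scores.items() if s >= threshold)
```

Calling dots(window_scores(a, b, window=9), threshold=7) on two related sequences should give runs of positions where i - j stays constant; those runs are the diagonals you see in the plot.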

If you click on the dot plot panel, the alignment window at the bottom aligns the two sequences accordingly. You can fine-tune the alignment using the arrows.

Which sequence positions (from ... to....) in the yeast sequence represent the intein?
If you compare the HO endonuclease (sex change enzyme) (SceHO.fa) to the intein (Sce_intein.fa), does the complete intein sequence match something in the HO endonuclease? Is there a part of the HO endonuclease sequence that might correspond to an extein?
Comparison of a nucleotide sequence with introns vs. the protein sequence it encodes
Dot plots have many different applications. One of them is to analyze and visualize the intron-exon structure of genes. In Dotlet, if you use a nucleotide sequence for the first sequence and a protein sequence for the second, the program will compare the translation in all three frames to the protein sequence. Load the following two sequences into Dotlet:

A) The genomic sequence from Arabidopsis thaliana containing the gene encoding the vacuolar ATPase (arab.fa). The given sequence is the reverse complement of a sequence that is part of chromosome 1.

B) The protein sequence as translated from the cDNA sequence, as given in GI 3334404.
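If you want to see what "translation in all three frames" means, the following Python sketch translates a DNA string in the three forward frames using the standard genetic code. The function name is made up for this example and this is not how Dotlet is implemented; codons containing anything other than A, C, G or T are shown as X.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frames(dna):
    """Translate a DNA sequence in the three forward reading frames.

    Returns a list of three protein strings (frame offsets 0, 1, 2).
    Stop codons are written as '*', unknown codons as 'X'.
    """
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames
```

An exon shows up as a diagonal in whichever of the three translations matches the protein. If an intron's length is not a multiple of three, the next exon will match in a different frame.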

How many exons are in the gene?
Are neighboring exon sequences always in the same reading frame? (Use the mouse pointer to place the blue cross-hairs on the diagonal and then use the arrow key until one of the three frames matches the protein sequence.) Try this for a couple of exons.
Repetitive proteins in Dotlet
Using Dotlet, load GI 15668394 and GI 19887539 (again, omit the labels from the sequences, but give them names so you can recognize them :)).
Compare the Methanocaldococcus protein against itself. Do you see any repetitive units? How many?
Does the choice of scoring matrix make a difference?
Compare the Methanopyrus sequence against the one from Methanocaldococcus. How many equivalents to the single repeat unit in Methanocaldococcus do you find?
How many repeats do you identify when you compare the Methanopyrus sequence against itself?
Compare the two sequences using Pairwise Blast. Which program should you use? What is the effect of turning the filter on or off?

Part 2: Jalview

Jalview is a Java application to inspect and edit multiple sequence alignments. It also allows inspection of protein space for the aligned sequences. This works surprisingly well. The Jalview Homepage contains a lot of additional information.

Go to the Jalview applet page and start either the Jalview desktop (link on top) or a Jalview applet (links in the middle of the page).
If this does not work, download this file, unarchive it, and double-click on Jalview.app.
Preferably, you want to load the Jalview desktop, but the Jalview lite version works just as well, except that sequence input is more difficult (delete the sequences from the example, then add the sequences from the file).

Close the windows that may have opened as a demonstration, except for the multiple sequence alignment window.

Load the sequences from the ATP-ase Subunit alignment into Jalview (either load from file, if the desktop application runs, or paste into the input window and select new window after you are done pasting).

Explore the different coloring options (COLOUR menu). Which one seems to work best, i.e. is most meaningful? (Scroll through the alignment to a more conserved region.)

Note: You can change/edit the alignment by clicking on an amino acid residue and dragging it to the right or left using the arrow keys. Try it, but leave the sequences in an aligned state before you move on. (If this doesn't work, press F2 and try again.)

Select all sequences. CALCULATE an AVERAGE DISTANCE TREE USING % identity.
Click somewhere in the resulting tree to color groups of related sequences in the same color. You can right-click (or command-click) on a node to change the color for a group of sequences.
Choose a color scheme that colors all subunits of the same type in the same color.
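For those curious about what an average distance tree built from % identity involves, here is a rough Python sketch. It is an illustration only, not Jalview's implementation: the function names are made up, columns where either sequence has a gap are simply skipped (an assumption; Jalview may treat gaps differently), and the tree is returned as nested tuples without branch lengths.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identical residues over columns where neither row is a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def average_distance_tree(aligned):
    """Average-linkage clustering on the distance 100 - % identity.

    aligned: dict mapping sequence name -> aligned sequence.
    Returns the tree as nested tuples of names.
    """
    dist = {
        frozenset((p, q)): 100.0 - percent_identity(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    }

    def average(pair):
        # Mean distance over all member pairs of the two clusters.
        (_, m1), (_, m2) = pair
        total = sum(dist[frozenset((p, q))] for p in m1 for q in m2)
        return total / (len(m1) * len(m2))

    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in aligned]
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=average)
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]
```

The two sequences with the highest % identity are joined first, then the next closest pair of clusters, and so on. This is why subunits of the same type end up in the same subtree.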

CALCULATE the PRINCIPAL COMPONENT ANALYSIS.
In a principal component analysis, the new dimensions are calculated as linear combinations of the original dimensions, so that the greatest variance of any projection of the data set comes to lie on the first axis, the second greatest on the second axis, and so on for the following dimensions. Can you find a higher dimension that breaks up the vacuolar ATPase A subunits? (Their names start with A.)
Which of the A subunit sequences cluster together, if you use this dimension (1, 2 and 4 worked for me)?
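The idea of a new axis that captures the greatest variance can be tried out on plain numbers. The Python sketch below finds only the first principal component of a small table of numeric data points, using power iteration on the covariance matrix. It is a generic illustration with a made-up function name; Jalview derives its input from the alignment in its own way, which this sketch does not reproduce.

```python
def first_principal_component(rows, iterations=200):
    """Return (axis, scores) for the first principal component.

    rows: list of equal-length numeric tuples (one per data point).
    axis: unit vector, a linear combination of the original dimensions.
    scores: position of each data point along that axis.
    Assumption: the start vector is not orthogonal to the true axis.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    axis = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance matrix and renormalize.
        w = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break  # no variance left in this direction
        axis = [x / norm for x in w]
    scores = [sum(c[j] * axis[j] for j in range(d)) for c in centered]
    return axis, scores
```

For points lying on the line y = 2x, the returned axis points along (1, 2), and the scores order the points along that line. Higher dimensions would be found the same way after removing the variance already explained by the first axis.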
