Assignment for Class 32

Your name:
Your email address:

Exercise 1: 15 minutes: Using Clustalx to perform bootstrap analyses using neighbor joining

We will use the same datasets as in the previous exercise (here) and (here). Load these sequences into ClustalX and align them using the default options.
In the Trees menu, select "correct for multiple substitutions" and "exclude positions with gaps" (checkmarks should be visible in the menu before you start the bootstrap analysis).
[If you plan to inspect the trees in treeview, in the output format options, click on branch and select node from the menu (this changes where the bootstrap values are written, as labels to the branch or as labels to the nodes - the latter is philosophically not really correct (these are unrooted trees) but is the way that many programs read the tree files).]
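
To give a feeling for what "correct for multiple substitutions" does, here is a minimal sketch. It assumes protein sequences and Kimura's empirical correction, which is the formula commonly described for the Clustal programs; the function name and the cutoff at 0.75 are choices made for this illustration, not something taken from the ClustalX menus.

```python
import math

def kimura_protein_distance(p):
    """Estimate substitutions per site from the observed fraction p of differing sites.

    Assumption: Kimura's empirical formula d = -ln(1 - p - 0.2 * p**2).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("this sketch only handles 0 <= p < 0.75")
    return -math.log(1.0 - p - 0.2 * p * p)

# The correction grows with the observed difference, because more of the
# substitutions are hidden by later substitutions at the same site.
for p in (0.05, 0.20, 0.50):
    print(f"observed {p:.2f} -> corrected {kimura_protein_distance(p):.3f}")
```

For closely related sequences the corrected and observed values are nearly the same; for divergent sequences the corrected distance is clearly larger.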

Calculate "Bootstrap NJ tree". The results will be written into a tree file called something.phb (another tree, calculated during the alignment, has the extension dnd).

Inspect the trees using njplot. Root the trees as appropriate and check the option to display bootstrap support values.

Which branches have only low bootstrap support? Are any of these findings surprising? What might be the reason?

Exercise 2 : 20 minutes

There are several programs that allow the inspection and manipulation of 3-D structural protein data. In this course we use the Swiss Protein Data Bank Viewer (SPDBV).

SPDBV is an excellent choice, because it also provides an interface to the Swiss Protein databank modeling software.

The SPDBV program is available free of charge at the expasy homepage.

There are several excellent online tutorials available for learning to use SPDBV:

A basic tutorial is at

http://www.usm.maine.edu/~rhodes/SPVTut/index.html ,

And a course on structure, spdbv, and modeling is at

http://www.expasy.ch/swissmod/course/course-index.htm

The exercise in this section is taken with slight modifications from Gale Rhodes's basic tutorial; many of the exercises in the following sections parallel exercises in the basic tutorial.

You can retrieve pdb files from the NCBI, or from the protein structure data bank at Rutgers University. (To do so, search for the file, click the explore link, and right click on the link that indicates download uncompressed pdb file.) (The ones used in the course are also available here.)

Do the following:

copy all files listed here onto your computer (go to the listing, ctrl click on the name and save to a folder on your computer).

Locate the folder that contains the program SPDBV.

Start SPDBV through double clicking on the icon

load 1HEW.pdb

click on the three cursor control buttons and rotate/move/enlarge the picture of the lysozyme molecule

click on the center molecule button to the left of the cursor buttons

click on the page icon (you may need to expand the menu window to see the page icon) and scan through the pdb file

open the control panel (in the WIND-menu).

open the alignment window (in the WIND-menu)

select all

in the WIND-menu, display Ramachandran plot

in the control panel, select different residues (click on the 'h' and 's' in the first column, then hit return; return makes the selected residues visible, otherwise the visible and the selected residues can differ!). How does the display change in the Ramachandran plot? In the main window? (For more info on the Ramachandran plot see http://www.bmb.uga.edu/wampler/tutorial/prot2.html)

select all

Explore different coloring (CPK, secondary structure, accessibility) and display options (show CA trace only, show oxygen, ...)

REMARK: If you do serious work, save your work periodically; sometimes it is impossible to recover from inadvertent mouse clicks.

Remark2: There is a difference between select (the residue turns red in the control panel) and actually seeing the residue in the main window. If you hit return the selected residues become visible.

Select (point the cursor over the NAG201...at the bottom of the control panel) the NAG inhibitor (shift click adds to the selection).

Color CPK

Invert selection (in the SELECT menu)

Color secondary structure

Invert selection

Tools compute H-Bonds

alt click "side" column in the control panel to turn the sidechain display off

Select only the NAG inhibitor

select Neighbors of selected aa - check the "add to selection" button

hit return

click on the "side" header in control panel (acts only on selected residues)

select group properties Non-polar aa

click on the header COL in the display panel and select a blue color to color the hydrophobic residues blue

Are there "blue" residues interacting with the N-Acetyl glucosamines? How come?

Can you locate which one of the tryptophan residues sits under the second of the N-Acetyl glucosamines?

Play around; if in doubt, use the ? button.

The worst that can happen is that you'll have to restart your computer.

Open the alignment window and display the complete lysozyme molecule. Observe the color change in the structure that happens when you move the mouse over the sequence in the alignment window.

The resulting display after some beautifications might look like this:

yellow: the NAG inhibitor;
blue: residues in the binding pocket that are non-polar, depicted as space filling balls;
red: other amino acids in the binding pocket;
gray: the rest of the Lysozyme molecule, but only the backbone.

Trouble shooting: In case your cartoon (ribbon) display does not look nice:
1st: in the Control panel window, check that the coloring commands are set to apply to the ribbon.
2nd: under preferences, select ribbon, and place a check mark in the field "render as solid ribbon"

Other things to try:
   3D rendition (in the display menu),
   slab view (shift and mouse forward/backward move the slab through the molecule, shift and mouse left/right change the slab size),
   explore the makeup of the pdb file (text icon below the cursor control),
   have a look at the opening control window (upper left icon below the cursor control).
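
When you scan through the pdb file you will notice that the coordinate records are laid out in fixed columns. The following minimal sketch reads the main fields of an ATOM or HETATM line by column position. The function name and the sample line (serial number, coordinates) are made up for illustration and are not taken from 1HEW.pdb.

```python
def parse_atom_line(line):
    """Parse one ATOM or HETATM record using the fixed PDB column layout."""
    if not line.startswith(("ATOM", "HETATM")):
        return None  # header, remark, or other record types are skipped
    return {
        "record": line[0:6].strip(),    # columns 1-6
        "atom": line[12:16].strip(),    # columns 13-16
        "residue": line[17:20].strip(), # columns 18-20
        "chain": line[21],              # column 22
        "resnum": int(line[22:26]),     # columns 23-26
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }

# Hypothetical record for an atom of a NAG residue, built column by column.
sample = "HETATM%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
    1001, " C1", "NAG", "A", 201, 12.345, 23.456, 34.567)

print(parse_atom_line(sample))
print(parse_atom_line("REMARK   this line is not a coordinate record"))
```

Applied to every line of a real file, the same function would let you count atoms per residue or collect the coordinates of the inhibitor.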

If you right click on a residue in either the alignment window or the control window, the display centers on this residue.

shift and mouse click adds residues to the list of selected residues (works in either window)

Can you obtain a figure similar to the one below?

Go to the control panel click on the little black triangle to the right of the col column and select color ribbon, then secondary structure in the color menu. Display ribbon in the control panel, remove the other displays .....

Exercise 3: 20 minutes

Aligning F-ATPase alpha and beta subunits

Start SPdbV

Open 1bmf.pdb

Color Chain

Change color chainD to grey/blue (click in control panel on D in first column to select chain D, right click on COL, select color)

Scroll down the control panel and select all ATP analogs (press shift key then click to add to selection)

right click on COL in heading and select red color

Read the pdb file to get info on which chain is which

select chain F (including nuc) and save selected residues as betaTP.

select chain A (including nuc) and save selected residues as alphaE.

After playing with the F1-ATPase, close this file and open betaTP AND alphaE.

Open layer info (Wind menu)

select and display only the nucleotides

There are different ways to align 3-D structures. One way is to select 3 corresponding points in each of the two structures. To do so you can use the substrate molecule.

Using the mov check off in the Layer Info Window, reorient the two ANPs so that they are in a similar orientation (but not overlapping).

Click on the align button with the 3 green and 3 red dots. Notice the red instructions that appear in the header next to the pdb-page icon. Follow these instructions using three corresponding atoms.

SHIFT DISPLAY CA chain only (Shift makes the commands act on both layers)

Using the mov checks in the Layer Info, move the two chains next to each other.

What do you think about the result?

Another way to align structures is to use the magic fit in the tools command. Do this and run improve fit (again, notice the red info in the header)

Click on alpha in Layer info to make the alpha subunit the active layer

Color CPK

Make the beta subunit the active layer

COLOR rms. The further the atoms in the beta subunit are away from the alpha subunit, the longer the wavelength of the color they are given.

DISPLAY Show alignment window - gives you the aligned sequences.

Which part of the molecule appears most different between the Alpha vs. the Beta subunit?

Is the Walker motif (G--G--GKT) well aligned in the structure-based amino acid alignment (check the alignment window)?
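
If you want to locate the motif in a sequence before looking for it in the alignment window, a pattern search is enough. The sketch below assumes the commonly cited Walker A consensus G-x-x-x-x-G-K-[S/T], which is a looser pattern than the notation used in the question above; the two sequences are short illustrative fragments, not the full subunits.

```python
import re

# Assumption: Walker A (P-loop) consensus GxxxxGK[ST].
WALKER_A = re.compile(r"G.{4}GK[ST]")

def find_walker_a(sequence):
    """Return (1-based start position, matched residues) for each motif hit."""
    return [(m.start() + 1, m.group()) for m in WALKER_A.finditer(sequence.upper())]

# Illustrative fragments with flanking residues invented for the example.
print(find_walker_a("AAGGAGVGKTAA"))
print(find_walker_a("MMGDRQTGKTSS"))
print(find_walker_a("ACDEFG"))
```

The reported positions can then be compared between the two subunits in the alignment window.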

If you have time, repeat the exercise for the three beta subunits to observe the structural changes the beta subunit (chains D, E and F) undergoes in the catalytic cycle. Ctrl TAB lets you cycle through the three layers.
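
The RMS coloring in this exercise is based on how far corresponding atoms are apart after the fit. The following minimal sketch shows the arithmetic, assuming the two coordinate lists are already superimposed and listed in the same order; the function names and the toy coordinates are invented for illustration, and how SPDBV maps the values to colors is not shown.

```python
import math

def per_atom_deviation(coords_a, coords_b):
    """Distance between each pair of corresponding atoms (same order in both lists)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]

def rmsd(coords_a, coords_b):
    """Root mean square deviation over all corresponding atom pairs."""
    deviations = per_atom_deviation(coords_a, coords_b)
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

# Toy CA coordinates for two already superimposed chains.
alpha = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
beta = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0), (2.0, 1.0, 0.0)]

print(per_atom_deviation(alpha, beta))
print(rmsd(alpha, beta))
```

The per-atom values are what a color scale would be applied to; the single RMSD number summarizes the whole fit.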

If you are interested in exploring the PHYLIP package in more detail:

Exercise 4.

PHYLIP is a collection of programs for phylogenetic analyses written by Joe Felsenstein. The programs are freely available (including source code) and can be used on a variety of operating systems. The programs are modular: different modules exist to create bootstrap samples, calculate distance matrices, calculate trees from the distance matrices (Fitch and Neighbor), calculate consensus trees, etc. All programs either use files called infile or intree, or the user needs to provide the file name. We will use the sequences from the exercise above. The file is here. (Note for your future work with this program that PHYLIP by default treats a gap as a 21st character state. If you want to treat gaps as missing data, you need to replace the gap symbols with "?".) If you want to use one of the programs on your own, you need to read the excellent manuals that come with the software.
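
Replacing the gap symbols by hand is tedious for larger alignments. The sketch below does it for a PHYLIP-style text, assuming the simplest layout: a header line, then one line per sequence with the name in the first 10 columns. Interleaved files need more care. The function name and the tiny example alignment are invented for illustration.

```python
def gaps_to_missing(phylip_text, name_width=10):
    """Replace '-' with '?' in the sequence part of each line, leaving names untouched."""
    lines = phylip_text.splitlines()
    if not lines:
        return ""
    converted = [lines[0]]  # header line with number of sequences and sites
    for line in lines[1:]:
        name, sequence = line[:name_width], line[name_width:]
        converted.append(name + sequence.replace("-", "?"))
    return "\n".join(converted) + "\n"

example = " 2 6\nYeast_A   MK-LV-\nE_coli-1  M--LVA\n"
print(gaps_to_missing(example))
```

Note that the hyphen in the second name is kept, because only the columns after the name are changed.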

Save the file with the aligned sequences in the directory where the PHYLIP executables are stored (within the mcb221 folder).

To calculate a phylogenetic tree from the aligned sequences using protein parsimony, double click the icon protpars.app. This should open a terminal window and launch the protpars program. When prompted, enter the name of the sequence file (testseq1b.phy). Read through the menu options, but do not change them. Enter Y to start the program. The results will be written into two files, outtree and outfile. Outtree can be opened with njplot; outfile is a text file. Inspect both files. (Note: PHYLIP by default always uses the two file names outfile and outtree. Rename your files so that you remember what is in them.)

Where does the Salmonella sequence go? (This is as expected; parsimony analyses are very sensitive to long branch attraction (LBA).)
How are the fungal sequences resolved? (What does this tell us about parsimony and missing data?)

To calculate a distance matrix from the aligned sequences, we can use the program PROTDIST. You start it by double clicking on PROTDIST.app. Protdist lets you select among many different models to calculate distances between two sequences. We will select a model that considers that different sites change with different probability by choosing option G. After we start the program by entering Y, we are prompted to enter 1/shape parameter; we choose one (1), which corresponds to an exponential distribution.
We rename the outfile into dist.txt.
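
Why does a shape parameter of one correspond to an exponential distribution? The minimal sketch below compares the gamma density (with the mean rate fixed at 1) to the exponential density. It only illustrates the two distributions; the function names are invented, and how PROTDIST uses the parameter internally is not shown.

```python
import math

def gamma_pdf(x, shape, mean=1.0):
    """Density of a gamma distribution with the given shape and mean."""
    scale = mean / shape
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

def exponential_pdf(x, mean=1.0):
    """Density of an exponential distribution with the given mean."""
    return math.exp(-x / mean) / mean

# With shape 1 the two densities agree; with shape 0.5 many more sites are
# nearly invariant (high density near zero) and a few change very fast.
for rate in (0.1, 1.0, 3.0):
    print(rate, gamma_pdf(rate, 1.0), exponential_pdf(rate), gamma_pdf(rate, 0.5))
```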

PHYLIP has several programs to calculate trees from distance matrices. The fastest is Neighbor (the same algorithm as used in clustalw). Neighbor joining is an algorithmic tree building method (not much liked by many evolutionary biologists, because it does not search for the tree that best fulfills an optimality criterion). Using Neighbor.app, calculate neighbor joining trees from dist.txt. Rename the outtree into Neighbor1.ph.
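
To see what "algorithmic" means here, consider a single step of neighbor joining: from the distance matrix, the method computes a corrected value Q for every pair and joins the pair with the smallest Q. The sketch below shows only this selection step, not the full tree construction; the function name and the small matrix are chosen for illustration.

```python
def nj_pair_to_join(dist):
    """Return the pair (i, j) with the smallest Q value and that Q value.

    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q

# Small symmetric example matrix for five taxa (indices 0 to 4).
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

print(nj_pair_to_join(distances))
```

Note that the pair joined first (taxa 0 and 1) is not the pair with the smallest raw distance (taxa 3 and 4); the row sums correct for taxa that are far from everything else.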

FITCH.app is a program that tries to find the tree that fits the distance matrix with the least amount of error (with the defaults it uses a weighted least-squares error, placing a higher weight on shorter distances). Especially for trees with many branches this gives superior results compared to NEIGHBOR, but it can take a long time. Start FITCH.app, load dist.txt, select option G (global rearrangement) and enter Y to start the program. Rename the outtree FITCH.ph. Inspect the tree with njplot. What is different compared to Neighbor1.ph? Which one seems more realistic?
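
The weighted least-squares error can be written down in a few lines. The sketch below assumes the Fitch-Margoliash weighting, in which each squared difference between the observed distance and the distance measured along the tree is divided by the observed distance raised to a power (2 by default). It only evaluates the criterion for given distances, it does not search for a tree, and the function name and toy matrices are invented for illustration.

```python
def weighted_least_squares(observed, fitted, power=2.0):
    """Sum over all pairs of (observed - fitted)**2 / observed**power."""
    total = 0.0
    n = len(observed)
    for i in range(n):
        for j in range(i + 1, n):
            total += (observed[i][j] - fitted[i][j]) ** 2 / observed[i][j] ** power
    return total

# Observed distances and distances measured along a candidate tree (toy values).
observed = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
on_tree = [[0, 2, 3], [2, 0, 5], [3, 5, 0]]

print(weighted_least_squares(observed, on_tree))           # weighted
print(weighted_least_squares(observed, on_tree, power=0))  # unweighted
```

With power 2 an error of a given size counts more when the observed distance is short, which is the higher weight on shorter distances mentioned above.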

In the analyses using PROTDIST, we found that the two yeasts did not group together. What could one do to improve the analyses?

PHYLIP contains many modular programs. The program seqboot lets you create pseudosamples from a set of aligned sequences. The other programs have the option to analyze multiple data sets; their output then consists of multiple distance matrices and, ultimately, of multiple trees. The program consense can calculate a consensus from these multiple trees. The bootstrap support values indicate in how many of the pseudosamples the different bipartitions (= splits or branches) were recovered.
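
What is a pseudosample? In a bootstrap, the columns of the alignment are drawn at random with replacement until the pseudosample has as many columns as the original. The minimal sketch below illustrates the idea only; it is not the seqboot implementation, and the function name and the tiny alignment are invented for illustration.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Draw alignment columns with replacement; every sequence gets the same columns."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}

toy_alignment = {"seqA": "ACDEFGHI", "seqB": "ACDEYGHI", "seqC": "TCDEFGHV"}
rng = random.Random(1)  # fixed seed so the example is repeatable

for replicate in range(3):
    print(bootstrap_alignment(toy_alignment, rng))
```

Some columns appear several times in a pseudosample and others not at all; a branch that depends on only a few columns will therefore be missing from many of the resulting trees.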

For a distance matrix analysis, the programs would be joined together in the following order: seqboot, protdist, fitch, consense, njplot. For a parsimony analysis: seqboot, protpars, consense, njplot.

Using the same dataset, do a bootstrap analysis using 100 bootstrapped samples followed by parsimony and/or distance matrix analysis (use the exponential distribution to correct for among site rate variation and Fitch.app to calculate the trees. Remember to tell the programs to analyze multiple data sets). Summarize the obtained trees using consense.app. The results are written into outfile.

Which analyses did you perform? Which aspects of the phylogeny were well supported, which were not?

Optional :

If you have more time to spare and you are up for a challenge, take a look at the nucleosome. Right click here and save as pdb file. Open it from within spdbv. You might want to do some of the future exercises with the nucleosome in addition to the ATPases, so save the pdb file where you can find it again.

Align all the histones from the nucleosome to one reference histone and color by RMS:

The result might look something like this:

The picture shows a structure alignment of the 8 histones (2 each) that are part of the nucleosome. All the histones were colored regarding the match to H2A, except H2A, which was colored according to its match to H3. Coloring option RMS - shorter wavelengths - better match

Below same as last figure, but histones are depicted side by side :

Below are two views of the complete nucleosome. Histones H2A are depicted as space filling balls and RMS colored regarding their match to H3. The rest of the molecule is colored according to chain.
