Pairwise and Multiple Sequence Alignments

Alignment of two sequences (pairwise alignment):
A pairwise sequence alignment is an optimal line-up of two sequences. (Note: although any two sequences can be aligned, it only makes sense to align homologous sequences.)

Alignment programs align two sequences by introducing gaps. A gap represents either an insertion or a deletion in one of the sequences, which is why gaps are also called indels. Introducing a gap is penalized. The idea is to find the least costly solution in terms of substitutions and gaps.
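The idea of a least costly solution can be illustrated with a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach). The scoring values (match = 1, mismatch = -1, gap = -2) and the function name are assumptions chosen for illustration. Real alignment programs use substitution matrices and usually distinguish between opening and extending a gap.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative scores)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def pair(i, j):
        # Score of placing a[i-1] opposite b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # match or substitution
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best_score, row1, row2 = global_alignment("ACGT", "AGT")
print(best_score)
print(row1)
print(row2)
```

With these scores, aligning ACGT with AGT costs one gap and gains three matches. Raising the gap penalty makes the program prefer substitutions over indels, and lowering it has the opposite effect.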

The Swiss Institute of Bioinformatics provides a Java applet, called Dotlet, that performs interactive dot plots. The main use of dot plots is to detect domains, duplications, insertions, deletions, and, if you work at the DNA level, inversions (check the examples on the help pages of the Dotlet application).
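To show what a dot plot contains, here is a minimal text-based sketch. It is not how Dotlet works internally: it marks a cell only when two windows are exactly identical, which is a simplifying assumption, and the function name and window parameter are made up for this example.

```python
def dot_plot(seq_x, seq_y, window=1):
    """Return the rows of a text dot plot.

    Columns run along seq_x and rows along seq_y. A '*' marks positions
    where the windows starting there are identical; '.' marks the rest.
    """
    rows = []
    for i in range(len(seq_y) - window + 1):
        row = ""
        for j in range(len(seq_x) - window + 1):
            same = seq_y[i:i + window] == seq_x[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


for line in dot_plot("ACGTACGT", "ACGT", window=2):
    print(line)
```

A sequence compared with itself gives an unbroken main diagonal. A duplication shows up as additional parallel diagonals, and an insertion or deletion shifts the diagonal sideways.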

A multiple sequence alignment is an alignment that contains more than two sequences.

One of the most widely used alignment programs is ClustalW (Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22, 4673-4680).

ClustalW runs on all of the most popular platforms (different flavors of Unix, Mac, PC) and is freely available.

ClustalX is the graphical user interface (GUI) to ClustalW. ClustalX is available for different platforms at the EBI's FTP site.

Clustal reads and writes several formats used by different programs. It also reads sequences that are already aligned. One of the commonly used input formats is FASTA. By default, the aligned sequences are saved in Clustal's own format (.aln extension).
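For readers unfamiliar with FASTA, each record starts with a header line beginning with ">", followed by one or more lines of sequence. The sketch below reads such text into a dictionary. The function name and the example sequences are made up for illustration, and the parser ignores details such as comment lines that some FASTA variants allow.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {name: sequence}.

    The name is the first word of the header line (after the '>').
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line)    # sequence may span several lines
    return {key: "".join(parts) for key, parts in records.items()}


example = """>seq1 hypothetical example protein
MKTAYIAK
QRQISFVK
>seq2
MKTAYLAK
"""
sequences = parse_fasta(example)
print(sequences)
```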

ClustalW is also available on many web servers. One of these is part of the BCM Search Launcher.

To align sequences Clustal performs the following steps:

1) Pairwise distance calculation
2) Clustering analysis of the sequences
3) Iterated alignment of the two most similar sequences or groups of sequences.
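The first step above, and the choice of which pair to align first, can be sketched as follows. This is a simplified illustration and not Clustal's actual procedure: it assumes the sequences already have equal length and uses the fraction of differing positions as the distance, whereas Clustal derives its distances from pairwise alignments. All names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


def closest_pair(seqs):
    """Return the most similar pair of names and the full distance table."""
    distances = {
        (p, q): p_distance(seqs[p], seqs[q])
        for p, q in combinations(sorted(seqs), 2)
    }
    best = min(distances, key=distances.get)
    return best, distances


seqs = {"a": "ACGT", "b": "ACGA", "c": "TTTT"}
best, distances = closest_pair(seqs)
print(best, distances[best])
```

In a progressive alignment the closest pair is aligned first. The result is then treated as a group, and the remaining sequences or groups are added in the order given by the clustering.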

The second step is the most important one. The relationships found here strongly bias the final alignment: the better your guide tree, the better your final alignment. You can load a guide tree into Clustal. This tree will then be used instead of the neighbor-joining tree that ClustalW calculates by default. (The guide tree needs to be in normal parenthesis notation, also known as Newick format, WITH branch lengths.)
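Parenthesis notation with branch lengths looks like ((seqA:0.1,seqB:0.2):0.05,seqC:0.3); where each name or group is followed by a colon and its branch length. The sketch below builds such a string from a nested Python structure. The tree representation, the names, and the branch lengths are all made up for illustration, and the exact requirements of your Clustal version (for example, rooted versus unrooted trees) should be checked against its documentation.

```python
def to_newick(node):
    """Convert a nested tree to parenthesis notation with branch lengths.

    A leaf is a string (the sequence name). An internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return node
    children = ",".join(
        f"{to_newick(child)}:{length}" for child, length in node
    )
    return f"({children})"


# Hypothetical guide tree: seqA and seqB are grouped first, then seqC joins.
tree = [
    ([("seqA", 0.1), ("seqB", 0.2)], 0.05),
    ("seqC", 0.3),
]
newick = to_newick(tree) + ";"   # the notation ends with a semicolon
print(newick)
```

The sequence names in the tree must match the names in the sequence file, otherwise the program cannot map the tree onto the sequences.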

An excellent and very readable description of alignments is available from the Bioinformatics Course at the U. of Chicago.