SEALS Demo Exercise

SEALS (A System for Easy Analysis of Lots of Sequences) is a software package designed for large-scale data analysis in bioinformatics. Through a friendly command-line interface, SEALS lets you automate many repetitive tasks, as demonstrated below.

General:

1. Switch to the tcsh shell

   tcsh

2. Activate SEALS

   activate_seals

A. Single Sequence Manipulation

  1. BLAST search with a query protein my_protein.txt against nr:
    blastall -i my_protein.txt -d /flower/db/nr -p blastp -o results.br -I T
  2. Prune BLAST search results with an E-value cutoff of 10^-50 (1e-50):
    blast2blast results.br -ecut e-50 -save
  3. Extract GI numbers from BLAST search results:
    blast2gi results_mod.br -save
  4. Extract sequences from GenBank using the GI list obtained in the previous step:
    gi2genbank results_mod.gi -timeout 50 -tries 0 -append
  5. Convert the GenBank-formatted dataset to FASTA format with a custom definition line:
    feature2fasta results_mod.gb -defline '$gi|$organism|$definition' -save
  6. Check for identical FASTA records:
    fauniq results_mod.fa -basis=sequence -save
  7. Create alignments:
    clustalw results_mod_uniq.fa -align -type=protein
  8. Create NJ trees:
    clustalw results_mod_uniq.aln -tree -kimura -tossgaps
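
To make step 6 more concrete, the sketch below shows what removing identical FASTA records "on the basis of sequence" amounts to: two records count as duplicates when their sequences match, even if their definition lines differ. This is only an illustration and not how fauniq is implemented; the function name uniq_by_sequence and the sample records are hypothetical, and the sketch is written for POSIX sh with awk rather than tcsh.

```shell
#!/bin/sh
# Illustration only (not fauniq's implementation): keep the first FASTA
# record seen for each distinct sequence. uniq_by_sequence is a
# hypothetical helper; it reads FASTA from stdin or from file arguments.

uniq_by_sequence() {
    awk '
        # Print the buffered record unless its sequence was already seen.
        function emit() {
            if (header != "" && !(seq in seen)) {
                seen[seq] = 1
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }  # new definition line
             { seq = seq $0 }                         # join wrapped sequence lines
        END  { emit() }
    ' "$@"
}

# Made-up sample data: records 1 and 3 carry the same sequence,
# wrapped differently and with different definition lines.
uniq_by_sequence <<'EOF'
>1|Homo sapiens|protein A
MKTAYIAK
QRQISFVK
>2|Mus musculus|protein B
MKTAYIAKQRQ
>3|Pan troglodytes|protein A copy
MKTAYIAKQRQISFVK
EOF
```

Run as written, the sketch prints records 1 and 2 only, each with its sequence joined onto a single line. Record 3 is dropped because its sequence matches record 1.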


B. Multiple Sequence Manipulation

  1. BLAST search with query proteins my_proteins.txt against nr:
    blastall -i my_proteins.txt -d /flower/db/nr -p blastp -o results.br -I T
  2. Prune BLAST search results with an E-value cutoff of 10^-50 (1e-50):
    blast2blast results.br -ecut e-50 -save -delete
  3. Shatter the combined BLAST results into separate files, one per search, then remove the combined file:
    shatterblast results_mod.br -word 2
    rm results_mod.br
  4. Extract GI numbers from BLAST search results:
    blast2gi *.br -save
  5. Extract sequences from GenBank using the GI lists obtained in the previous step:
    gi2genbank *.gi -timeout 50 -tries 0 -append
  6. Convert the GenBank-formatted datasets to FASTA format with a custom definition line:
    feature2fasta *.gb -defline '$gi|$organism|$definition' -save
  7. Check for identical FASTA records:
    fauniq *.fa -basis sequence -save
  8. Create alignments:
    foreach_file *_uniq.fa -command='clustalw -infile={} -align -type=protein';
    
  9. Create NJ trees:
    foreach_file *.aln -command='clustalw -infile={} -tree -kimura -tossgaps';
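
For comparison, steps 8 and 9 can also be written as ordinary shell loops that call clustalw once per file. The sketch below is an assumption-laden equivalent, not a description of how foreach_file works internally. It uses POSIX sh syntax (tcsh would use foreach ... end instead), and the helper names run, align_all and tree_all, as well as the DRY_RUN switch, are hypothetical.

```shell
#!/bin/sh
# Sketch: plain sh loops that mirror the two foreach_file steps.
# Hypothetical helpers; not part of SEALS.

# With DRY_RUN=1 (the default here) each command is printed instead of
# executed, so the sketch can be tried without clustalw installed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One alignment per FASTA file (same clustalw options as step 8).
align_all() {
    for f in "$@"; do
        [ -e "$f" ] || continue   # skip a glob pattern that matched nothing
        run clustalw -infile="$f" -align -type=protein
    done
}

# One NJ tree per alignment file (same clustalw options as step 9).
tree_all() {
    for f in "$@"; do
        [ -e "$f" ] || continue
        run clustalw -infile="$f" -tree -kimura -tossgaps
    done
}

align_all *_uniq.fa
# In a dry run no .aln files are created, so this prints nothing
# unless alignments already exist in the directory.
tree_all *.aln
```

Setting DRY_RUN=0 before running the script executes the clustalw commands instead of printing them.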

All of these commands, separated by semicolons, can be placed in a file (a script file, e.g. script.sh), and then all you have to type is

./script.sh
(!!!)
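
Running a script as ./script.sh requires the file to be executable. The sketch below walks through creating a script, setting the execute bit, and running it. The echo commands are placeholders standing in for the SEALS commands from sections A and B, and the temporary directory and the #!/bin/sh interpreter line are assumptions made for the demonstration.

```shell
#!/bin/sh
# Sketch: create a script file, make it executable, and run it.
# The echo lines are placeholders for real commands.

workdir=$(mktemp -d)          # scratch directory so nothing is overwritten
script="$workdir/script.sh"

cat > "$script" <<'EOF'
#!/bin/sh
echo "step 1"; echo "step 2";
echo "step 3"
EOF

chmod +x "$script"            # without this, ./script.sh is refused with "Permission denied"

# Run it the way the text describes, from the directory that holds it.
(cd "$workdir" && ./script.sh)
```

The first line of the script file names the interpreter that runs it. Commands can be separated by semicolons or simply placed on separate lines.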