DNA and Protein Sequence Notations

Standard sets of one-letter symbols are used to abbreviate nucleotides and amino acids. The symbols may be written in either upper or lower case. The nucleotide symbols, listed first, include ambiguity codes, so that a nucleic acid sequence can be entered with full account taken of any uncertainty about the base at a position.

         Symbol   Meaning
         ------   -------
            A       Adenine
            G       Guanine
            C       Cytosine
            T       Thymine
            U       Uracil
            Y       pYrimidine  (C or T)
            R       puRine      (A or G)
            W       "Weak"      (A or T)
            S       "Strong"    (C or G)
            K       "Keto"      (T or G)
            M       "aMino"     (C or A)
            B       not A       (C or G or T)
            D       not C       (A or G or T)
            H       not G       (A or C or T)
            V       not T       (A or C or G)
          X,N,?     unknown     (A or C or G or T)
            O       deletion
            -       deletion
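
The nucleotide table above translates naturally into a lookup structure. The following Python sketch is illustrative only: the names IUPAC_NUCLEOTIDES, expand_symbol and compatible are assumptions made for this example and do not come from any particular program. It also assumes that a deletion symbol should expand to no base at all, and it keeps U distinct from T.

```python
# Illustrative lookup table built from the nucleotide symbol table above.
# All names here are hypothetical; they are not taken from any program.
IUPAC_NUCLEOTIDES = {
    "A": "A", "G": "G", "C": "C", "T": "T", "U": "U",
    "Y": "CT", "R": "AG", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "X": "ACGT", "N": "ACGT", "?": "ACGT",
    "O": "", "-": "",  # deletion: assumed to mean "no base present"
}


def expand_symbol(symbol):
    """Return the set of bases a symbol may stand for (case-insensitive)."""
    return set(IUPAC_NUCLEOTIDES[symbol.upper()])


def compatible(a, b):
    """Return True if two symbols could denote the same base."""
    return bool(expand_symbol(a) & expand_symbol(b))
```

For instance, expand_symbol("r") returns the set containing A and G, compatible("Y", "R") is False because a pyrimidine can never be a purine, and compatible("n", "T") is True.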

Protein sequences are written in the one-letter code used by the late Margaret Dayhoff's group in the Atlas of Protein Sequence and Structure. The symbols are as follows:

               Symbol               Stands for
               ------               ----------
                 A                     ala
                 B                     asx
                 C                     cys
                 D                     asp
                 E                     glu
                 F                     phe
                 G                     gly
                 H                     his
                 I                     ileu
                 J                  (not used)
                 K                     lys
                 L                     leu
                 M                     met
                 N                     asn
                 O                  (not used)
                 P                     pro
                 Q                     gln
                 R                     arg
                 S                     ser
                 T                     thr
                 U                  (not used)
                 V                     val
                 W                     trp
                 X             unknown amino acid
                 Y                     tyr
                 Z                     glx
                 *                nonsense (stop)
                 ?        unknown amino acid or deletion
                 -                   deletion

Here "nonsense" means a nonsense (chain termination) codon, and "unknown" means an amino acid whose identity has not been determined. The state "asx" means "either asn or asp", and the state "glx" means "either gln or glu". The state "deletion" means that alignment studies indicate that a deletion has occurred in the ancestry of this position, so that it is no longer present.
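
As a rough illustration of how the table above might be used, the following Python sketch spells out a one-letter protein string. The names ONE_TO_THREE and spell_out are hypothetical, and rejecting the symbols marked "(not used)" with an error is an assumption of this sketch rather than the documented behaviour of any program.

```python
# Hypothetical helper: one-letter protein symbols mapped to the names used
# in the table above. J, O and U are left out because the table marks them
# as not used.
ONE_TO_THREE = {
    "A": "ala", "B": "asx", "C": "cys", "D": "asp", "E": "glu",
    "F": "phe", "G": "gly", "H": "his", "I": "ileu", "K": "lys",
    "L": "leu", "M": "met", "N": "asn", "P": "pro", "Q": "gln",
    "R": "arg", "S": "ser", "T": "thr", "V": "val", "W": "trp",
    "X": "unknown", "Y": "tyr", "Z": "glx",
    "*": "stop", "?": "unknown or deletion", "-": "deletion",
}


def spell_out(sequence):
    """Translate a one-letter protein string into a list of names.

    Raises ValueError for any symbol that is not in the table.
    """
    names = []
    for position, symbol in enumerate(sequence, start=1):
        try:
            names.append(ONE_TO_THREE[symbol.upper()])
        except KeyError:
            raise ValueError(
                f"unrecognized symbol {symbol!r} at position {position}"
            ) from None
    return names
```

With these definitions, spell_out("MkB-*") returns ["met", "lys", "asx", "deletion", "stop"], while spell_out("MJ") raises ValueError.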

Here are the same one-letter codes tabulated the other way 'round:

              Amino acid               One-letter code
              ----------               ---------------
                ala                           A
                arg                           R
                asn                           N
                asp                           D
                asx                           B
                cys                           C
                gln                           Q
                glu                           E
                gly                           G
                glx                           Z
                his                           H
                ileu                          I
                leu                           L
                lys                           K
                met                           M
                phe                           F
                pro                           P
                ser                           S
                thr                           T
                trp                           W
                tyr                           Y
                val                           V
                deletion                      -
                nonsense (stop)               *
                unknown amino acid            X
                unknown (incl. deletion)      ?
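
The reversed table can be sketched in Python in the same way. The names THREE_TO_ONE and abbreviate are hypothetical, and accepting "ile" as an alias for "ileu" is an assumption added for convenience; it does not appear in the table above.

```python
# Hypothetical reverse lookup: amino acid names to one-letter codes,
# following the reversed table above.
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "asx": "B",
    "cys": "C", "gln": "Q", "glu": "E", "gly": "G", "glx": "Z",
    "his": "H", "ileu": "I", "leu": "L", "lys": "K", "met": "M",
    "phe": "F", "pro": "P", "ser": "S", "thr": "T", "trp": "W",
    "tyr": "Y", "val": "V",
}
# Assumption of this sketch: also accept the shorter spelling "ile".
THREE_TO_ONE["ile"] = "I"


def abbreviate(names):
    """Join a sequence of amino acid names into a one-letter string."""
    return "".join(THREE_TO_ONE[name.lower()] for name in names)
```

For example, abbreviate(["Met", "ileu", "ile", "glx"]) returns "MIIZ".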

Note: notations for two additional "exotic" amino acids have more recently been added to the list:

               Symbol               Stands for
               ------               ----------
                 O                     pyl (pyrrolysine)
                 U                     sec (selenocysteine)

However, most programs do not (yet) recognize these notations; the main table above still lists O and U as not used.
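
Because many programs may not accept these two symbols, it can be useful to locate them before passing a protein sequence on. The Python sketch below is one possible approach; the names EXOTIC, find_exotic and mask_exotic are hypothetical, and replacing the symbols with X (unknown amino acid) is an assumption of this example, not a rule of the notation.

```python
# Hypothetical pre-processing step for protein sequences that may contain
# the two newer symbols. Masking with "X" is an assumption of this sketch.
EXOTIC = {"O": "pyrrolysine", "U": "selenocysteine"}


def find_exotic(sequence):
    """Return (position, symbol, name) for each O or U, counting from 1."""
    return [
        (position, symbol, EXOTIC[symbol.upper()])
        for position, symbol in enumerate(sequence, start=1)
        if symbol.upper() in EXOTIC
    ]


def mask_exotic(sequence, replacement="X"):
    """Replace every O and U (either case) with the replacement symbol."""
    return "".join(
        replacement if symbol.upper() in EXOTIC else symbol
        for symbol in sequence
    )
```

For example, find_exotic("MUKo") returns [(2, "U", "selenocysteine"), (4, "o", "pyrrolysine")], and mask_exotic("MUKo") returns "MXKX". Masking discards information, so whether it is appropriate depends on the analysis being run.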