Reverse complement

This page provides a web service that computes the reverse complement of a given DNA or RNA sequence, along with a brief explanation. The reverse complement is the partner sequence with which the given sequence would pair with the highest possible affinity.

Reverse complement tool: paste a sequence and press the button. Letter case is preserved. Note that applying the operation twice returns the initial sequence. Depending on your Java configuration, pasting may not work in the applet. In that case, please use our JavaScript version for longer sequences.

In double-stranded DNA and RNA, adenine (A) pairs with thymine (T) in DNA or with uracil (U) in RNA, and guanine (G) pairs with cytosine (C).[1] In the complementary sequence, each nucleotide is replaced by its partner. Nucleotide sequences also have a direction, and in double-stranded DNA or RNA the paired strands run in opposite directions (they are antiparallel). Hence the complement must also be reversed, so that it is read in the same 5' to 3' convention as the original.
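The two steps described above (substitute each nucleotide by its partner, then reverse) can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind this page; the function name `reverse_complement` and the `rna` flag are chosen here for the example, and only the four unambiguous letters of each alphabet are handled.

```python
# Complement tables for DNA and RNA letters, covering both cases so that
# the letter case of the input is preserved.
_DNA = str.maketrans("ACGTacgt", "TGCAtgca")
_RNA = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the reverse complement of a DNA (default) or RNA sequence."""
    table = _RNA if rna else _DNA
    # Substitute each nucleotide by its partner, then reverse the result.
    return seq.translate(table)[::-1]


print(reverse_complement("ATGCcg"))          # cgGCAT
print(reverse_complement("AUGC", rna=True))  # GCAU
```

Applying the function twice gives back the input, because both the substitution and the reversal undo themselves.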


While in the real world DNA is normally double-stranded (paired with its reverse complement), RNA is usually single-stranded. However, RNA does contain short double-stranded regions, which are largely responsible for the molecule folding into the required spatial shape. Such regions can be predicted with the Nussinov algorithm and represented using dot-bracket notation.
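As a rough sketch of how such double-stranded regions can be predicted, the following Python function implements a simplified Nussinov-style dynamic programme that maximises the number of nested base pairs and prints the result in dot-bracket notation. Several things here are assumptions made for the example: only the A-U and G-C pairs are allowed (no G-U wobble pairs), a hairpin loop must contain at least `min_loop` unpaired bases, and the function name `nussinov` is illustrative.

```python
def nussinov(seq: str, min_loop: int = 3) -> str:
    """Return a dot-bracket structure with the maximum number of base pairs."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    # best[i][j] holds the maximum number of pairs within seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in pairs:
                    left = best[i + 1][k - 1]
                    right = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + left + right)
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in pairs:
                left = best[i + 1][k - 1]
                right = best[k + 1][j] if k < j else 0
                if best[i][j] == 1 + left + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


print(nussinov("GGGAAAUCCC"))  # (((....)))
```

In the output, matching brackets mark paired positions and dots mark unpaired ones. The paired stretches GGG and CCC are reverse complements of each other, which is what lets the strand fold back on itself.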


RNA double-stranded structure ("hairpin") in which the stretch GUAC pairs with the stretch CAUG, read in the opposite direction on the antiparallel strand.


DNA can also be single-stranded. During the initial stage of protein synthesis (transcription of genomic information in the nucleus), DNA pairs with RNA. If the molecules are long enough, they will usually stick together even if only some of the nucleotides pair. G and C interact more strongly than A and T (three hydrogen bonds rather than two), so in some contexts they may be understood as "more reverse complement" of each other.
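To make the difference between G-C and A-T pairing concrete, the short Python sketch below counts the hydrogen bonds that a sequence would form with its perfect reverse complement, using two bonds for each A-T (or A-U) pair and three for each G-C pair. The function name is illustrative, and the count is only a crude proxy: real duplex stability also depends on other effects such as base stacking.

```python
# Hydrogen bonds per Watson-Crick pair: two for A-T (or A-U), three for G-C.
BONDS = {"A": 2, "T": 2, "U": 2, "G": 3, "C": 3}


def duplex_hydrogen_bonds(seq: str) -> int:
    """Count hydrogen bonds between seq and its perfect reverse complement."""
    return sum(BONDS[base] for base in seq.upper())


print(duplex_hydrogen_bonds("ATAT"))  # 8
print(duplex_hydrogen_bonds("GCGC"))  # 12
```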

Converting ambiguity codes

Certain tasks also require reverse-complementing ambiguity codes, each of which stands for more than one possible nucleotide. Such codes may arise from sequencing errors, may indicate that a particular enzyme (such as a restriction enzyme) accepts several alternatives at a given position, or may result from a consensus search across different organisms. Normally purine (R: A or G) is complemented into pyrimidine (Y: C or T), and amino (M: A or C) into keto (K: G or T). Strong (S: C or G, three hydrogen bonds) and weak (W: A or T, two hydrogen bonds) are not swapped, because a nucleotide and its complement necessarily use the same number of bonds to form the pair. Codes that specifically exclude one nucleotide are complemented into the codes that exclude the complementary nucleotide: B (not A) becomes V (not T), D (not C) becomes H (not G), and vice versa.

See [2] for the typical conversion rules.
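The rules described in this section can be written as a lookup table. The Python sketch below uses the standard IUPAC nucleotide codes. The function name is illustrative, and one design choice is an assumption of this example: U is complemented to A, so the output uses the DNA alphabet and an input containing U is not recovered exactly by applying the function twice.

```python
# IUPAC nucleotide codes and their complements, position by position:
# R<->Y, K<->M, B<->V, D<->H are swapped; S, W and N map to themselves.
_CODES = "ACGTURYKMSWBDHVN"
_COMPLEMENT = "TGCAAYRMKSWVHDBN"
_TABLE = str.maketrans(_CODES + _CODES.lower(),
                       _COMPLEMENT + _COMPLEMENT.lower())


def reverse_complement_iupac(seq: str) -> str:
    """Reverse-complement a sequence that may contain ambiguity codes."""
    return seq.translate(_TABLE)[::-1]


print(reverse_complement_iupac("ACRYN"))   # NRYGT
print(reverse_complement_iupac("GTYRAC"))  # GTYRAC
```

The second example is a site that reads the same on both strands even though it contains ambiguity codes, which is typical of restriction enzyme recognition sequences.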

Implementation

This algorithm is relatively easy to implement unless performance matters (real-world sequences that require processing can be very long). Developers regard reversing the sequence as an interesting task when it is not done in the straightforward native way.[3]
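As one example of a less straightforward approach, the Python sketch below reverse-complements a sequence in place in a `bytearray`, walking inwards from both ends so that no second copy of a very long sequence is needed. The function name is illustrative, and only unambiguous ASCII DNA letters are handled. It is meant to show the technique rather than to be fast: in CPython, the built-in `translate` followed by slicing is likely to run faster, at the cost of allocating new buffers.

```python
# 256-entry byte table: A<->T and C<->G in both cases, other bytes unchanged.
_TABLE = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")


def reverse_complement_in_place(buf: bytearray) -> None:
    """Reverse-complement an ASCII DNA sequence without allocating a copy."""
    i, j = 0, len(buf) - 1
    while i < j:
        # Swap the two ends, complementing each byte on the way.
        buf[i], buf[j] = _TABLE[buf[j]], _TABLE[buf[i]]
        i += 1
        j -= 1
    if i == j:
        # Odd length: the middle base stays in place but is still complemented.
        buf[i] = _TABLE[buf[i]]


seq = bytearray(b"ATGCc")
reverse_complement_in_place(seq)
print(seq.decode("ascii"))  # gGCAT
```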

Biotechnicians frequently need to perform this operation when designing primers, short DNA sequences that must bind to a known subsequence of the larger DNA under investigation. Reverse complement is also usually implemented (directly or indirectly) in tools that work with genomic DNA, because it is usually not known in advance which of the two DNA strands contains the sequence being searched for.
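The strand problem mentioned above can be illustrated with a small Python sketch that looks for a query on both strands by also searching for its reverse complement. The function name and the `+`/`-` strand labels are assumptions of this example, not the interface of any particular tool; real search tools use indexed or approximate matching rather than plain substring search.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def find_on_both_strands(genome: str, query: str) -> list[tuple[int, str]]:
    """Return (position, strand) for each match of query on either strand.

    Positions are 0-based offsets on the given (forward) strand.
    """
    genome, query = genome.upper(), query.upper()
    rc_query = query.translate(_COMP)[::-1]
    hits = []
    for strand, pattern in (("+", query), ("-", rc_query)):
        start = genome.find(pattern)
        while start != -1:
            hits.append((start, strand))
            # Continue one base further so overlapping matches are found too.
            start = genome.find(pattern, start + 1)
    return sorted(hits)


print(find_on_both_strands("AAGGTCCCGACCTT", "GGTC"))  # [(2, '+'), (8, '-')]
```

A hit on the `-` strand means that the forward strand contains the reverse complement of the query at that offset, so the query itself lies on the opposite strand.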

References

  1. Watson J.D., Crick F.H.C. (1953) Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. Nature 171, 737-738.
  2. Nucleotide conversion table
  3. Advogato.org discussion about reversing of the sequence