Translation (biology)

This page provides a web service that translates a nucleotide sequence into protein, along with a brief explanation. In biology, translation is the conversion of the genetic code (a nucleotide sequence) into the amino acid sequence of the new protein being built.

Translating applet (the sample sequence is a fragment of melittin, a component of bee venom). An x in the output marks a stop codon, where protein synthesis is terminated.
Translation is not the only step in protein synthesis; many other important steps are performed both before and after it to produce a protein from the genetic code. It is, however, an essential one.

Genetic code

The mRNA (nucleotide sequence) contains only four major nucleotides (A, G, U, C), but this sequence must be translated into a peptide sequence containing as many as 20 different amino acids. Hence more than one nucleotide per amino acid is required: two would give only 16 combinations, so three are used. Three nucleotides provide 64 combinations (called codons). The majority of these map to an amino acid, so most amino acids can be encoded by more than one codon. Three codons (UAA, UAG, UGA) do not encode any amino acid and serve as terminators (end marks) where the protein sequence should end (the RNA continues past the termination point, sometimes even starting a new protein). The first two nucleotides are the more important ones; a change in the third frequently does not change the encoded amino acid.
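The codon arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the code behind the applet on this page: the names `CODON_TABLE` and `translate` are made up for the example, and the stop codons are written as x to match the applet's output convention.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
# "x" marks a stop codon (the convention used by this page's applet).
BASES = "UCAG"
AMINO = "FFLLSSSSYYxxCCxWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

STOP_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "x")


def translate(rna):
    """Translate an RNA string codon by codon, starting at its first base.

    Incomplete trailing codons are ignored; stop codons appear as "x".
    """
    rna = rna.upper()
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


# Two-base prefixes for which the third base does not matter at all.
FIXED_PREFIXES = [
    p for p in ("".join(t) for t in product(BASES, repeat=2))
    if len({CODON_TABLE[p + b] for b in BASES}) == 1
]

print(len(CODON_TABLE))        # 64 codons in total
print(STOP_CODONS)             # ['UAA', 'UAG', 'UGA']
print(len(FIXED_PREFIXES))     # 8 of the 16 prefixes fix the amino acid
print(translate("AUGGGAUAA"))  # MGx
```

For half of the sixteen possible two-base prefixes the third nucleotide is entirely redundant, which is one way to see why changing it so often leaves the amino acid unchanged.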

The RNA to amino acid translation table (the genetic code) is surprisingly universal; humans, mushrooms and plants all use the same conversion table. Only a few very specific systems, often ones that already live as parts of other cells (mitochondria, for instance), use a different code, and even then it is highly similar. The genetic code is "locked" because any change in it would cause incorrect translation of the hundreds or thousands of proteins in which the changed codon is used, and such a cell would not survive. However, a cell may not be able to produce large amounts of a protein in which one or more amino acids are encoded by codons that are rarely used for these amino acids in the cell's other genes. Genetic engineers need to pay attention to this when they move a gene between two very different organisms (such as human and bacteria).
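To see which codons a gene actually prefers for each amino acid, one can simply count them. The sketch below is a minimal illustration under stated assumptions: `codon_usage` is a hypothetical helper name, the input is assumed to be an in-frame RNA coding sequence, and the example sequence is invented rather than taken from a real gene.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in U, C, A, G order; "x" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYYxxCCxWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(rna):
    """Count, for each amino acid, how often each of its codons is used.

    The sequence is read in frame from its first base; an incomplete
    trailing codon is ignored.
    """
    rna = rna.upper()
    usage = defaultdict(Counter)
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        usage[CODON_TABLE[codon]][codon] += 1
    return usage


# Invented example: leucine (L) is written four times, lysine (K) twice.
usage = codon_usage("CUGCUGCUACUGAAAAAG")
for amino_acid, counts in sorted(usage.items()):
    print(amino_acid, dict(counts))
```

Comparing such counts between the source organism and the host gives a rough idea of whether a transferred gene relies on codons that the host rarely uses.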

Translation frame

Insertion or deletion of a single nucleotide results in completely different amino acids being encoded after the point of mutation. Such mutations (known as frameshift mutations) do occur and usually render the protein fully unusable for the cell. Hence it is important to "synchronize" the starting point of the synthesis. The codon AUG (which also codes for the amino acid methionine) serves as the start marker; synthesis begins where this codon is found.
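The effect of a frameshift, and the role of AUG as the synchronization point, can be illustrated with a short sketch. The function names `translate` and `translate_from_start` and both example sequences are invented for this illustration. Real start-site selection in a cell involves more than finding the first AUG, so this is a deliberate simplification.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "x" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYYxxCCxWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna):
    """Translate every complete codon from the first base; stops show as x."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def translate_from_start(rna):
    """Translate from the first AUG up to (not including) the first stop.

    Simplifying assumption: the first AUG found is taken as the start.
    Returns an empty string when the sequence contains no AUG.
    """
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = translate(rna[start:])
    return protein.split("x")[0]


original = "AUGGCCAAGUUUUAA"
mutated = original[:3] + original[4:]  # delete the single nucleotide at index 3

print(translate(original))  # MAKFx
print(translate(mutated))   # MPSF  (every codon after the deletion changed)
print(translate_from_start("GGAUGGCCAAGUAAGG"))  # MAK
```

A single missing nucleotide changes every codon downstream of it, and in this example even the stop codon is lost from the reading frame.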


The RNA that participates in translation is already single stranded. However, DNA, the initial source of the genetic information, is usually double stranded, and RNA could potentially be copied (transcribed) from either strand (chain). Transcribing from the "wrong" strand would yield the reverse complement sequence, which encodes a completely different protein. Normally only one DNA strand is transcribed for a particular gene.
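The relation between the two strands is easy to express in code. In this minimal sketch the function names `reverse_complement` and `transcribe` and the example sequence are assumptions made for illustration. The mRNA is modeled as a copy of the coding strand with T replaced by U, which is the usual textbook shortcut.

```python
# Maps each DNA base to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(coding_strand):
    """Return the mRNA matching a coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


coding = "ATGGCCAAGTAA"  # invented example sequence

print(transcribe(coding))                      # AUGGCCAAGUAA
print(reverse_complement(coding))              # TTACTTGGCCAT
print(transcribe(reverse_complement(coding)))  # UUACUUGGCCAU
```

The last line is what would be produced if the other strand were treated as the coding one: a quite different RNA, and therefore a different protein.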

As a result, a given DNA sequence can be converted into a protein sequence in six possible ways (3 frame positions × 2 DNA strands). Hence tools that search for a given protein sequence in sequenced genomic DNA (like BLAST) must actually perform six searches rather than one.
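A six-frame translation can be sketched as follows. This is an illustration of the idea only, not how BLAST or the applet on this page is implemented. The function names, the frame labels (+1 to +3 and -1 to -3) and the example sequence are assumptions, and the input is expected to contain only A, C, G and T.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "x" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYYxxCCxWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(rna):
    """Translate every complete codon from the first base; stops show as x."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def reverse_complement(dna):
    """Return the opposite DNA strand, read in its own direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_translation(dna):
    """Translate a DNA sequence in all 3 frames of both strands."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        rna = seq.replace("T", "U")
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(rna[offset:])
    return frames


for label, protein in six_frame_translation("ATGGCCAAGTAA").items():
    print(label, protein)
```

A search tool can then compare the query protein against each of the six translations in turn.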

See also