Nussinov algorithm

This is an advanced topic. It assumes that you are familiar with dynamic programming tables.

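The Nussinov algorithm[1] predicts a possible RNA secondary structure (folding) by discovering parts of the molecule that have complementary sequences. Unlike search algorithms, which compare different sequences, this method works with a single sequence. It fills a dynamic programming table that compares the sequence against itself; however, the matching scores are set so that two letters are treated as "matching" if the corresponding nucleotides would pair. Hence A "matches" U (T when the sequence is written in the DNA alphabet) and G "matches" C, but no nucleotide matches itself, as two identical nucleotides do not pair. Writing N(i,j) for the maximum number of base pairs that can form between positions i and j (with N(i,j) = 0 whenever j ≤ i), the algorithm is formally written as

$$N(i,j) = \max \begin{cases} N(i+1,j) \\ N(i,j-1) \\ N(i+1,j-1) + w(i,j) \\ \max_{i<k<j} \left[ N(i,k) + N(k+1,j) \right] \end{cases}$$

Here w(i,j) = 1 if the characters at positions i and j are complementary (AU, UA, GC, CG, or AT and TA in the DNA alphabet); otherwise this function is 0.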
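The recurrence can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes uppercase input, writes N(i, j) for the maximum number of nested pairs between positions i and j, accepts both U and T as the partner of A, leaves out G-U wobble pairs, and enforces no minimum hairpin-loop length. The names `PAIRS`, `w` and `nussinov_table` are hypothetical.

```python
# Complementary pairs scored by w(i, j). A pairs with U (RNA) or T (DNA alphabet);
# G-U wobble pairs are deliberately left out, as in the text.
PAIRS = {("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def w(a: str, b: str) -> int:
    """Pairing score: 1 if nucleotides a and b are complementary, else 0."""
    return 1 if (a, b) in PAIRS else 0


def nussinov_table(seq: str) -> list[list[int]]:
    """Fill N, where N[i][j] is the maximum number of nested pairs in seq[i..j].

    Entries with j <= i stay 0 (a single nucleotide or an empty stretch
    cannot form a pair). No minimum hairpin-loop length is enforced.
    """
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    for span in range(1, n):  # span = j - i, shortest stretches first
        for i in range(n - span):
            j = i + span
            best = max(
                N[i + 1][j],                          # i left unpaired
                N[i][j - 1],                          # j left unpaired
                N[i + 1][j - 1] + w(seq[i], seq[j]),  # i pairs with j
            )
            for k in range(i + 1, j):                 # split into two substructures
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N


if __name__ == "__main__":
    table = nussinov_table("GGGAAAUCC")
    print(table[0][-1])  # 3
```

The answer for the whole sequence sits in the top-right cell, N[0][n-1]. Filling the table by increasing span guarantees that every cell the recurrence reads has already been computed.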

Another, possibly simpler, set of formulas is given in [2] and referred to as the Nussinov-Jacobson algorithm:

with initialization

The Nussinov algorithm only maximizes the number of base pairs, so it does not necessarily generate the most stable structure and may produce scattered pairs that are not biologically reasonable[3]. It is too simple to be accurate, but it has been a stepping stone for more complex algorithms.

Pseudoknots

The scores computed by the algorithm above reveal complementary stretches but do not, by themselves, say how the RNA folds. The usually expected (nested) structure implies that, of any two independent paired regions, one either completely precedes the other or is fully nested within it[4]. To emphasize this, RNA folding is described using the dot-bracket notation: every pairing nucleotide is marked with ( and its matching counterpart with ), and unpaired nucleotides are marked with a dot. The parentheses in the resulting expression must be balanced, for instance:

  • ....(((...)))......(((((......)))))...... - two independent
  • ....(((........(((.....)))...)))......... - one inside another
  • ....(((........(((.....)))..((((...))))....))) - two inside one
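The balance requirement can be checked with a stack, which at the same time recovers which positions pair with which. The following is a minimal Python sketch; `parse_dot_bracket` is a hypothetical helper, not part of any library. It returns the base pairs encoded by a dot-bracket string and rejects unbalanced input.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the base pairs (i, j), i < j, encoded by a dot-bracket string.

    '(' opens a pair, ')' closes the most recently opened one, '.' is unpaired.
    Raises ValueError if the parentheses are unbalanced.
    """
    open_positions: list[int] = []  # stack of still-unmatched '(' positions
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            open_positions.append(pos)
        elif ch == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((open_positions.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(parse_dot_bracket("..((...))."))  # [(2, 8), (3, 7)]
```

Because each ) closes the most recently opened (, any balanced string decodes to pairs that are either consecutive or nested, never crossing.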

Using this rule, it is possible to write a wrapping algorithm that in many cases correctly predicts RNA folding. If the rule is violated, the RNA is said to form a "pseudoknot". Pseudoknots do exist in nature, and more complex algorithms are required to discover them.
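Whether a set of base pairs violates the nesting rule can be tested directly: two pairs (i, j) and (k, l) with i < k cross, and so form a pseudoknot, exactly when i < k < j < l. A dot-bracket string with a single bracket type cannot express crossings, so this minimal Python sketch works on a list of position pairs instead. The helper names `crossing_pairs` and `has_pseudoknot` are hypothetical.

```python
from itertools import combinations


def crossing_pairs(
    pairs: list[tuple[int, int]],
) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return every two base pairs that cross, i.e. (i, j), (k, l) with i < k < j < l.

    Consecutive pairs (i < j < k < l) and nested pairs (i < k < l < j) are
    allowed; crossing pairs are what the nesting rule forbids.
    """
    # Store each pair as (smaller, larger) position and sort by the first position.
    normalized = sorted((min(a, b), max(a, b)) for a, b in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def has_pseudoknot(pairs: list[tuple[int, int]]) -> bool:
    """True if at least two of the given base pairs cross."""
    return bool(crossing_pairs(pairs))


if __name__ == "__main__":
    print(has_pseudoknot([(4, 12), (19, 34)]))  # False: one after the other
    print(has_pseudoknot([(0, 30), (10, 20)]))  # False: one nested in the other
    print(has_pseudoknot([(0, 20), (10, 30)]))  # True: the pairs cross
```

Checking all pairs of pairs is quadratic, which is fine for illustration; a stack-based scan would be faster for long sequences.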

Alternative sequences

A real-world RNA may have multiple alternative ways of folding, in which one or another subset of the detected complementary stretches is involved. As a rule, RNA folds in the most "thermodynamically stable" way; this usually means that the fold with the largest total number of paired nucleotides prevails over the others.
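The rule of thumb above can be illustrated by comparing alternative folds of the same sequence written in dot-bracket notation. This is a minimal Python sketch with hypothetical helper names; counting paired nucleotides is only the crude approximation described in the text, not a free-energy calculation, and the input strings are assumed to be balanced and of equal length.

```python
def paired_count(structure: str) -> int:
    """Number of paired nucleotides in a dot-bracket string (two per base pair)."""
    return structure.count("(") + structure.count(")")


def most_paired(alternatives: list[str]) -> str:
    """Return the alternative fold with the most paired nucleotides.

    Ties go to the first alternative listed, because max() returns the
    first maximal item it encounters.
    """
    return max(alternatives, key=paired_count)


if __name__ == "__main__":
    folds = ["(((....)))......", "(((....)))((..))", "..((((....)))).."]
    print(most_paired(folds))  # (((....)))((..))
```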


References

  1. I519 Introduction to Bioinformatics, Lecture 9, Sep 24, 2007. Yuzhen Ye, School of Informatics, Indiana University.
  2. Nussinov algorithm lecture by Ruth Nussinov.
  3. RNA Secondary Structure. Case Western Reserve University encyclopedia.
  4. RNA folding.

See also