Have an Android phone?

Try our new game!

Dynamic programming table

This is an advanced topic. It assumes that your are familiar with Sequence alignment

To use the applet, please set the two sequences with Set button and then use > to advance by one step. At the end of the algorithm, the found alignment path is highlighted in green.

Dynamic programming table is the sequence alignment alorithm that forms the base of many more complex algorithms in this group [1].

The dynamic programming approach uses rectangular table, where one sequence makes the horizontal edge and another makes the vertical edge. Any cell in the table contains the edit distance of the substrings that start at zero position (top left corner) and end at the current column/row in the table. The the path from the top left corner to the bottom right corner that follows the lowest possible values (shortest edit distance) is the found alignment of the two sequences, using this algorithm. The simple (naive) attempt to fill in the table would be to loop over all cells and try to guess the edit distance every time from scratch. This is not efficient enough to be usable.

The table can be filled in more efficiently if we start from the top bottom corner and and proceed gradually increasing both horizontal and vertical position. To value for the cell at any position can be easily found from just from the match score (comparing characters that are indexed in the two strings by the current row and column), to that the minimal value from the three adjacent cells (top, left and top left) should be added. The top left cell extends the direct comparison trace (as both indexes advance by one) and in this case only mismatch penalty (if any) is added.

Picking the values from the top or left cells represent turns that come from insertions and deletions, as in this case only one of the indexes advances. In this case apart the mismatch score (if any), the gap penalty must be additionally added. The gap penalty is often fixed but for easier understanding in the demo it is defined separately for any location inside the string (insertion is just a gap in another string), as a growing sequence.

In this simple algorithm we just need a loop that visits every cell in the table to get a full picture of the similarity between the two sequences. More advanced algorithms try to complete the task faster by computing only part of this table close to the expected optimal trace of alignment. This is possible because the penalty score only grows as we move outside the green valley of similarity, and the valley should start at the top left corner. Hence it is possible to attempt to trace the valley, computing only values near the current minimum.

The "classic" dynamic programming table treats all mismatches as equivalent. If some letters should be treated as more similar (similar amino acids in a protein sequence, for instance), a score comparison table can be implemented. By implementing this table, we get the Needleman & Wunsch algorithm.

References

  1. 1 Dan Gusfield (1997). Algorithms of on Strings, Trees and Sequences. Cambridge University Press, ISBM 0 521 58519 8, 534p.