Four Russians (algorithm)

This is an advanced topic. It assumes that your are familiar with Dynamic programming table

The Four Russians algorithm provides increased performance of the sequence alignment by partitioning dynamic programming table into multiple blocks and filling in the table one block at time rather than one cell at time[1]. The idea is to spend only and not per block. To do this, blocks are precomputed in advance and during the filling in step these precomputed values are reused. The theoretical limit of this algorith is

It is possible to pre-compute blocks because the value of every cell in the table only depends on three adjacent cells (top, left and top left) and the two characters, one in the pattern and one in the text string. If blocks partially overlap, adjacent blocks sharing one row and one column, the all content of the block is a function of its first row, first column and the substrings of text and pattern that are along the edges of the block.

Division into blocks is not efficient enough on its own to increase the performance. However the algorithm uses one more optimization, the offsets. In any row, column or diagonal of the simple dynamic programming table for edit distance (insertion, mismach and deletion penalties are equal), the values of the two adjacent cells differ at most by one. Using this lemma, the values in a single row of the block can be encoded specifying the absolute value for only the first cell in a row. For other cells, it is enough to specify the difference (offset) of the each successive cell to its left neighbour. The first member belongs to the left or top row and is actually the input of the block function; need not be stored. Only the difference rows need to be stored and there are much less of them than it would be vectors to enumerate all possible absolute values for the row.

The Four Russians algorithm is often viewed as a theoretical contribution and is seldom implemented exactly as described in the original paper. Instead, the basic ideas of precomputed blocks and offsets are commonly reused.

Only one of the four authors is actually Russian[2] .

References

  1. 1 Alzarov V.L, Dinic E.A, Kronrod M.A, Faradzev I.A. (1970). On economic construction of the transitive closure of a directed graph. Dokl Acad Nauk SSSR 194 487-488.
  2. 2 Dan Gusfield (1997). Algorithms of on Strings, Trees and Sequences. Cambridge University Press, ISBM 0 521 58519 8, 534p.