Have an Android phone?

Try our new game!

Closest pair of points

This article describes the algorithm of efficient finding the pair of closest points in the given set of points, including the interactive visualization.

Demonstration of the algorithm. With four points, the are following steps: a) division into two sets of two points; b) check of both sets (with two points in the set, the distance needs to be just measured); c) yellow rectangles appear in the left side indicating the search boundaries; the points on the right side that are touched by these rectangles must be checked if they do not make the closest pair with the corresponding points on the right side; d) all helper markers are cleared, showing the solution. You can place points by the mouse click or add four random points at once with the button.
The trivial solution of this problem would be to loop over all points in the set, then for each point loop over the same set again, checking this way all possible pairs and then selecting the pair with the closest distance. In total, n*(n-1) points must be compared (the points needs not be checked against itself). With big number of points this soon becomes impractical. Naive solution requires time where n is the number of points and d is the number of dimensions[1][2]

As soon as we know the distance between the two arbitrary points, we do not need to check any pair for that we known that its points are located more far apart than this known distance. The approximate distances can be estimated if we sort our points along x or y axis. The difference between one of the coordinates is always less than the full distance between the points, Using the sorted list rather than arbitrary set can significantly reduce the number of pairs that must be checked, as we can break the search as soon as the difference along the sorted axis exceeds the distance between the points in the closest pair we already know. This optimization is very obvious and sufficient by itself for the single dimensional space.

Deeper optimization that also works with two or three dimensions and allows to find the closest pair in O(n log n) time requires to use recursive division and partitioning that is somewhat similar to Quicksort algorithm[3].

The algorithm

  • Sort the points along one of the coordinates (x coordinate, for instance).
  • Split the set of points into two equal-sized subsets by a vertical line x = x_mid.
  • Apply the previous step recursively in the left and right subsets. This gives the left-side and right-side minimal distances l_min and r_min respectively
  • Find the minimal distance among the pair of points in which one point lies on the left of the dividing vertical and the second point lies to the right (m_min). This value is either returned to the upper level of recursion or (at the highest level) is the minimal distance between the points.

The last step is the most obvious at the deepest level of recursion, when there is only one point to the right and one point to the left (so just the distance between them needs to be checked). If not, the distance ranges are computed for the left side points. Points that are on the right side yet fall into these ranges are checked against the corresponding points on the left side for the possible closest pair. As the list is sorted by the x coordinate, the set of points between the two vertical lines can be iterated without looking at the rest of the points.

If there is an odd number of points, it is not possible to divide them into the two sets of equal size. In such cases one of the sides can contain one point less; the algorithm steps still can be correctly applied. Subset that contains only one point can return infinity as its value.

When the splitting is applied recursively, the left subset spans from left boundary till the middle line of the current subset and the right subset spans from the middle line till the right boundary of the current subset.

If we need to known the points and not just the distance between them, m_min, l_min and r_min must be structures that remember the points of the pair that they represent.

Visualization

Our visualization allows to set points with the mouse and then apply algorithm at once or step by step. It shows the current boundaries of the partitioned interval as the vertical black lines and the middle line as the red line with the red triangle above. the boundaries and the middle line that are current (belong to the current level of recursion) are highlighted in bold (thicker). After recursively descending first to the right and then to the left regions, the yellow rectangles around the points on the left side show how far the machine should search. Only the points the belong to the right side and are inside the yellow rectangle must be taken into consideration.

There is a proven theorem that at most six neighbours need to be checked if they do not make the closest pair with the given point,


References

  1. 1 http://www.cs.ucsb.edu/~suri/cs235/ClosestPair.pdf
  2. 2 http://www.facweb.iitkgp.ernet.in/~arijit/courses/autumn2005/close.pdf
  3. 3 http://en.wikipedia.org/wiki/Closest_pair_of_points_problem