[go: nahoru, domu]

Jump to content

Interval tree: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Breuwi (talk | contribs)
Enhance intro
Breuwi (talk | contribs)
Rewrite (previous version was wrong) as expansion. I used "_" instead subscript because it interfered with the flow of the text.
Line 1: Line 1:
In [[computer science]], the '''interval tree''' is an [[ordered tree data structure|ordered tree]] [[data structure]] to hold intervals. Specifically, it allows one to efficiently find all intervals or that overlap with any given interval or point. It is often used for windowing queries, for example, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene.
In [[computer science]], the '''interval tree''' is an [[ordered tree data structure|ordered tree]] [[data structure]] to hold intervals. Specifically, it allows one to efficiently find all intervals or that overlap with any given interval or point. It is often used for windowing queries, for example, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene.


In a simple case, the intervals do not overlap and they can be inserted into a simple binary tree and queried in O(log ''n'') time. However, with arbitrarily overlapping intervals, there is no way to compare two intervals for insertion into the tree since orderings sorted by the beginning points or the ending points may be different. It might be tempting to build two parallel trees, one ordered by the beginning point, and one ordered by the ending point of each interval. We can discard half of each tree in O(log ''n'') time, but the results must be merged, requiring O(''n'') time. This gives us queries in O(''n'' + log ''n'') = O(''n''), no better than brute-force!
An interval tree is a simple ordered tree, for example a [[binary search tree]] or [[self-balancing binary search tree]], where the tree is ordered by the 'low' values of the intervals, and an extra annotation is added to every node recording the maximum high value of both its subtrees. It is simple to maintain this attribute in only O(''h'') steps during each addition or removal of a node, where ''h'' is the height of the node added or removed in the tree, by updating all ancestors of the node from the bottom up.


Interval trees solve this problem. Queries require O(log ''n'' + ''m'') time, with ''n'' being the total number of intervals and ''m'' being the number of reported results. Construction requires O(''n'' log ''n'') time, and storage requires O(''n'') space.
Now, it's known that two intervals ''A'' and ''B'' overlap only when both ''A''.low ≤ ''B''.high and ''A''.high ≥ ''B''.low. When searching the trees for nodes overlapping with a given interval, you can immediately skip:
* all nodes to the right of nodes whose low value is past the end of the given interval.
* all nodes that have their maximum 'high' value below the start of the given interval.


== Constructing an Interval Tree in One Dimension ==
This is easily extended to higher dimensions, by cycling through the dimensions at each level of the tree. For example, for two dimensions, the odd levels of the tree might contain ranges for the ''x'' coordinate, while the even levels contain ranges for the ''y'' coordinate.

Given a set of ''n'' intervals on the number line, we want to construct a data structure so that we can efficiently retrieve all intervals overlapping another interval or point.

We start by taking the entire range of all the intervals and dividing it in half at ''x_center'' (in practice, ''x_center'' should be picked to keep the tree relatively balanced). This gives three sets of intervals, those completely to the left of ''x_center'' which we'll call ''S_left'', those completely to the right of ''x_center'' which we'll call ''S_right'', and those overlapping ''x_center'' which we'll call ''S_center''.

The intervals in ''S_left'' and ''S_right'' are recursively divided in the same manner until there are no intervals left.

The points in S_center that overlap the center point are stored in a separate data structure linked to the node in the interval tree. This data structure consists of two lists, one containing all the intervals sorted by their beginning points, and another containing all the intervals sorted by their ending points.

The result is a binary tree with each node storing:
* A center point
* A pointer to another node containing all intervals completely to the left of the center point
* A pointer to another node containing all intervals completely to the right of the center point
* All intervals overlapping the center point sorted by their beginning point
* All intervals overlapping the center point sorted by their ending point

== Querying an Interval Tree in One Dimension ==

Given the data structure constructed above, we receive queries consisting of ranges or points, and return all the ranges in the original set overlapping this input.

=== Intersecting with an Interval ===

First, we can reduce the case where an interval ''R'' is given as input to the simpler case where a single point is given as input. We first find all ranges with beginning or end points inside the input interval ''R'' using a separately constructed tree. In the one-dimensional case, we can use a simple tree containing all the beginning and ending points in the interval set, each with a pointer to its corresponding interval.

A binary search in O(log ''n'') time for the beginning and end of R reveals the minimum and maximum points to consider. Each point within this range references an interval that overlaps our range and is added to the result list. Care must be taken to avoid duplicates, since an interval might begin and end within ''R''. This can be done using a binary flag on each interval to mark whether or not it has been added to the result set.

The only intervals not yet considered are those overlapping ''R'' that do not have a point inside ''R'', in other words, intervals that enclose it. To find these, we pick any point inside ''R'' and use the algorithm below to find all intervals intersecting that point (again, being careful to remove duplicates).

=== Intersecting with a Point ===

The task is to find all intervals in the tree that overlap a given point ''x''. The tree is walked with a similar recursive algorithm as would be used to traverse a traditional binary tree, but with extra affordance for the intervals overlapping the "center" point at each node.

For each tree node, ''x'' is compared to ''x_center'', the midpoint used in node construction above. If ''x'' is less than ''x_center'', the leftmost set of intervals, ''S_left'', is considered. If ''x'' is greater than ''x_center'', the rightmost set of intervals, ''S_right'', is considered.

As each node is processed as we traverse the tree from the root to a leaf, the ranges in its ''S_center'' are processed. If ''x'' is less than ''x_center'', we know that all intervals in ''S_center'' end after ''x'', or they could not also overlap ''x_center''. Therefore, we need only find those intervals in ''S_center'' that begin before ''x''. We can consult the lists of ''S_center'' that have already been constructed. Since we only care about the interval beginnings in this scenario, we can consult the list sorted by beginnings and quickly find ''x'' with a binary search.

All ranges from the beginning of the list to that found point overlap ''x'' because they begin before ''x'' (as we proved through the binary search) and end after ''x'' (as we know because they overlap ''x_center'' which is larger than ''x''.

Likewise, if ''x'' is greater than ''x_center'', we know that all intervals in ''S_center'' must begin before ''x'', so we find those intervals that end after ''x'' using the list sorted by interval endings.

If ''x'' exactly matches ''x_center'', all intervals in ''S_center'' can be added to the results without further processing and tree traversal can be stopped.

== Interval Trees in Higher Dimensions ==

The interval tree data structure can be generalized to a higher dimension ''N'' with identical query and construction time and O(''n'' log ''n'') space.

First, a range tree in ''N'' dimensions is constructed that allows efficient retrieval of all intervals with endpoints inside the query region ''R''. Once the corresponding ranges are found, the only thing that is left are those ranges that enclose the region in some dimension. To find these overlaps, N range trees are created, and one axis intersecting ''R'' is queried for each. For example, in two dimensions, the bottom of the square ''R'' (or any other horizontal like intersecting it) would be queried against the interval tree constructed for the vertical axis. Likewise, the left (or any other vertical line intersecting R) would be queried against the interval tree constructed on the vertical axis.

Each range tree also needs an addition for higher dimensions. At each node we traverse in the tree, ''x'' is compared with ''S_center'' to find overlaps. Instead of two sorted lists of points as was used in the one-dimensional case, a range tree is constructed. This allows efficient retrieval of all points in ''S_center'' that overlap region ''R''.


== References ==
== References ==
Line 15: Line 62:


[[Category:Trees (structure)]]
[[Category:Trees (structure)]]


{{comp-stub}}

Revision as of 06:44, 23 September 2006

In computer science, the interval tree is an ordered tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals or that overlap with any given interval or point. It is often used for windowing queries, for example, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene.

In a simple case, the intervals do not overlap and they can be inserted into a simple binary tree and queried in O(log n) time. However, with arbitrarily overlapping intervals, there is no way to compare two intervals for insertion into the tree since orderings sorted by the beginning points or the ending points may be different. It might be tempting to build two parallel trees, one ordered by the beginning point, and one ordered by the ending point of each interval. We can discard half of each tree in O(log n) time, but the results must be merged, requiring O(n) time. This gives us queries in O(n + log n) = O(n), no better than brute-force!

Interval trees solve this problem. Queries require O(log n + m) time, with n being the total number of intervals and m being the number of reported results. Construction requires O(n log n) time, and storage requires O(n) space.

Constructing an Interval Tree in One Dimension

Given a set of n intervals on the number line, we want to construct a data structure so that we can efficiently retrieve all intervals overlapping another interval or point.

We start by taking the entire range of all the intervals and dividing it in half at x_center (in practice, x_center should be picked to keep the tree relatively balanced). This gives three sets of intervals, those completely to the left of x_center which we'll call S_left, those completely to the right of x_center which we'll call S_right, and those overlapping x_center which we'll call S_center.

The intervals in S_left and S_right are recursively divided in the same manner until there are no intervals left.

The points in S_center that overlap the center point are stored in a separate data structure linked to the node in the interval tree. This data structure consists of two lists, one containing all the intervals sorted by their beginning points, and another containing all the intervals sorted by their ending points.

The result is a binary tree with each node storing:

  • A center point
  • A pointer to another node containing all intervals completely to the left of the center point
  • A pointer to another node containing all intervals completely to the right of the center point
  • All intervals overlapping the center point sorted by their beginning point
  • All intervals overlapping the center point sorted by their ending point

Querying an Interval Tree in One Dimension

Given the data structure constructed above, we receive queries consisting of ranges or points, and return all the ranges in the original set overlapping this input.

Intersecting with an Interval

First, we can reduce the case where an interval R is given as input to the simpler case where a single point is given as input. We first find all ranges with beginning or end points inside the input interval R using a separately constructed tree. In the one-dimensional case, we can use a simple tree containing all the beginning and ending points in the interval set, each with a pointer to its corresponding interval.

A binary search in O(log n) time for the beginning and end of R reveals the minimum and maximum points to consider. Each point within this range references an interval that overlaps our range and is added to the result list. Care must be taken to avoid duplicates, since an interval might begin and end within R. This can be done using a binary flag on each interval to mark whether or not it has been added to the result set.

The only intervals not yet considered are those overlapping R that do not have a point inside R, in other words, intervals that enclose it. To find these, we pick any point inside R and use the algorithm below to find all intervals intersecting that point (again, being careful to remove duplicates).

Intersecting with a Point

The task is to find all intervals in the tree that overlap a given point x. The tree is walked with a similar recursive algorithm as would be used to traverse a traditional binary tree, but with extra affordance for the intervals overlapping the "center" point at each node.

For each tree node, x is compared to x_center, the midpoint used in node construction above. If x is less than x_center, the leftmost set of intervals, S_left, is considered. If x is greater than x_center, the rightmost set of intervals, S_right, is considered.

As each node is processed as we traverse the tree from the root to a leaf, the ranges in its S_center are processed. If x is less than x_center, we know that all intervals in S_center end after x, or they could not also overlap x_center. Therefore, we need only find those intervals in S_center that begin before x. We can consult the lists of S_center that have already been constructed. Since we only care about the interval beginnings in this scenario, we can consult the list sorted by beginnings and quickly find x with a binary search.

All ranges from the beginning of the list to that found point overlap x because they begin before x (as we proved through the binary search) and end after x (as we know because they overlap x_center which is larger than x.

Likewise, if x is greater than x_center, we know that all intervals in S_center must begin before x, so we find those intervals that end after x using the list sorted by interval endings.

If x exactly matches x_center, all intervals in S_center can be added to the results without further processing and tree traversal can be stopped.

Interval Trees in Higher Dimensions

The interval tree data structure can be generalized to a higher dimension N with identical query and construction time and O(n log n) space.

First, a range tree in N dimensions is constructed that allows efficient retrieval of all intervals with endpoints inside the query region R. Once the corresponding ranges are found, the only thing that is left are those ranges that enclose the region in some dimension. To find these overlaps, N range trees are created, and one axis intersecting R is queried for each. For example, in two dimensions, the bottom of the square R (or any other horizontal like intersecting it) would be queried against the interval tree constructed for the vertical axis. Likewise, the left (or any other vertical line intersecting R) would be queried against the interval tree constructed on the vertical axis.

Each range tree also needs an addition for higher dimensions. At each node we traverse in the tree, x is compared with S_center to find overlaps. Instead of two sorted lists of points as was used in the one-dimensional case, a range tree is constructed. This allows efficient retrieval of all points in S_center that overlap region R.

References