Full article: IB-CBB: an improved spatial index considering intersection based on clipped bounding boxes

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

ABSTRACT

Efficiently querying multiple spatial datasets is a challenging task in geoscience. The majority of spatial processing techniques use minimum bounding box (MBB) to approximate neighbouring spatial objects and and place them adjacent in the spatial index. However, due to the existence of redundant space in MBB of these methods, this problem can significantly reduce the query efficiency. In this paper, we propose a novel two-stage adaptive method of clipping the bounding box in spatial query, called IB-CBB (Intersection Based Clipped Bounding Boxes). The first stage employs a clipped bounding box, which records the redundant spatial spaces within the bounding box of the spatial index by calculating the clip points. As a result, the computational complexity of indexed child nodes in the query process is reduced. The second stage optimizes the above query algorithm by judging the intersection of the query box and the MBB of index node, significantly reducing the query time. Experiments demonstrate that IB-CBB outperforms the baseline method in terms of reducing the computational time.

KEYWORDS:

1. Introduction

With the explosive growth of spatial data, effectively organizing and managing spatial data has become progressively significant. A spatial index is an important method to organize and retrieve spatial data. The core idea of most spatial indexes is to represent spatial objects or adjacent spatial objects with minimal bounding box (MBB) which only requires multiple 2D points. For example, in the R-tree and its variant trees, adjacent spatial objects are grouped together to form a higher-level node which represents the MBB of these spatial objects. The advantages of MBB lie in not only being compact for spatial data storage but also in the ability to easily calculate and filter spatial data whose MBB does not intersected with the query box. The querying efficiency of spatial index relies on the quality of spatial data partitioning, which is measured by the overlap and coverage of the resultant MBBs. With the rapid growth of the spatial data scale, two problems arise with the use of MBB: firstly, heavily intersecting MBBs might cause poor data partitioning precision, which makes it necessary to search multiple paths of the tree during spatial queries; secondly, the probability of the query box intersecting with an MBB might be increased by the redundant coverage, not the practical objects inside MBB. Thus, it is necessary to check several paths in the tree in the querying phase. Moreover, minimizing the coverage of MBBs is also critical for enhancing the performance of the spatial index.

In existing studies, many researchers proposed different optimized methods. Extensive researches have been conducted on how to minimize the coverage of MBB in the R-tree variant (Evagorou and Heinis Citation2021; Langendoen, Glasbergen, and Daudjee Citation2021). For instance, spatial objects can be fitted with polygons, or even circles and circular cones. However, due to the extensive redundant space in MBB of these methods, this problem has not been well solved. They generally have the following shortcomings: first, the structure used to represent spatial objects is too complex; Second, the operation of space intersection query is complicated. Third, irregular areas still create a lot of redundant space. The best of all is Sidlauskas’s proposal (Sidlauskas et al. Citation2018) of the clipped bounding box (CBB) for optimizing spatial index, which clips off the corners of MBBs to get the simple and non-convex polygons. CBB removes the dead space of spatial objects and stores clip points in a small, auxiliary data structure, which can avoid the complexities of representation and data intersecting calculation.

However, the current CBB technology is mainly oriented to the spatial retrieval requirements of 2D spatial data, and there is limited research on three-dimensional (3D) data. In this paper, we propose a novel two-stage adaptive method of clipping the bounding box in spatial query, called IB-CBB (Intersection Based Clipped Bounding Boxes), so that the performance optimization of the method can be remarkably enhanced in 3D space. In the flow of IB-CBB, the first stage employs a clipped bounding box, which records the redundant spatial locations in the bounding box of the spatial index by calculating the clip points, thereby reducing the child node calculation in the query process. The second stage optimizes the above query algorithm by judging the intersection of the query box and the MBB of index nodes, greatly reducing the query time. If there is no special explanation later, ‘index node’ refers to the MBB of the index node.

Due to the fact that CBB-based spatial indexes compare all clip points during query processing, this leads to in poor query performance when the query box covers a large area. Furthermore, when extending to 3D space indexing, CBB fails to consider the situation of adjacent vertices. The clipped space is not the most optimized. Therefore, the main contributions of this paper are as follows.

An improved R-tree index is proposed, in which the space of the bounding box in the index node is optimized to reduce unnecessary computation of child nodes during query processing.
The paper discusses in detail the intersection of the query box and the bounding box of the index node in different dimensions, providing a basis for optimizing the subsequent algorithmic judgements in queries.
An IB-CBB based query algorithm is proposed, which avoids redundant comparison of clip points during query processing and further optimizes the efficiency of spatio-temporal range queries.

The remainder of the paper is organized as follows. We begin by reviewing related work and current technologies. Then, we introduce the basic idea of a multi-dimensional spatial index structure based on a clipped bounding box. Following this, we propose an improved method, which is an intersection-based clipped bounding box. Experiments are conducted to validate our proposed methods. Finally, we summarize our conclusions and present ideas for future research.

2. Related work

In traditional multi-dimensional indexes, there are notable differences in the treatment of spatial and temporal dimensions. Spatial and temporal dimensions are treated as distinct dimensions in the data space. The fundamental concept of multi-dimensional indexing is to build a separate index for each dimension. This approach involves initially creating a spatial index at a specific time, such as an R-tree (Guttman Citation1984), Quad-tree (Esche et al. Citation2012) or $k^{2}$ -tree (Brisaboa, Ladra, and Navarro Citation2009), and follows the chronological order to construct corresponding spatial index at different timestamps. Subsequently, each timestamp’s spatial index is organized using a one-dimensional index, such as a B-tree (Graefe and Kuno Citation2011). The aim is to share invariant data within each spatial index to conserve storage space. Common approaches include MR-tree (Kim and Yun Citation2004), HR-tree (Li and Tang Citation2010), MV3R-tree (Tao and Papadias Citation2001), and SMO index (Romero, Brisaboa, and Rodrguez Citation2012). For instance, when dealing with spatial trajectory data, the temporal aspect is prioritized over the spatial aspect. As a result, multi-dimensional structure indexes offer high time query efficiency (Chakka et al. Citation2003). Given the frequent changes in data, each spatial index must accommodate these changes in trajectory data and be continuously updated, leading to a high overall maintenance cost for the index.

However, there is a lack of discussion on how to effectively handle both spatial and temporal dimensions in this research (Mahmood, Punni, and Aref Citation2019). Several multi-dimensional index structures have been proposed, such as 3DR-tree (Jun, Hong, and Yu Citation2003), TB-tree (Pfoser et al. Citation2000), STR-tree (Li and Tang Citation2011), PH-tree (Zäschke, Zimmerli, and Norrie Citation2014), SICC (S. Wang, Maier, and Ooi Citation2016). These indexes offer the advantage of adapting their structure to the distribution of the data. Nevertheless, many of these methods suffer from large redundancy in space and overlook the efficiency of index construction (L. Wang and Xu Citation2010).

3. Multi-dimensional spatial index based on CBB

3.1. General concept of clipped bounding boxes

The concept of Clipped Bounding Boxes (CBB) was proposed by Sidlauskas (Sidlauskas et al. Citation2018) as a solution to the issue of ‘dead space’ in MBB. As illustrated in , dead space refers to the regions within MBBs that do not contain any practical objects. CBB employs a core idea similar to the ‘skyline’ algorithm, where a set of auxiliary points are stored in each leaf node of the index structure to record redundant spaces. This approach is characterized by its compact storage structure and ease of implementation. Furthermore, CBB exhibits outstanding performance in calculating intersection testing, which significantly reduces I/O operations and saves query time. Notably, this method is highly versatile as it does not require any modifications to the construction and structure of the spatial index, other than the addition of an extra set of data. Consequently, CBB can be extensively applied to various MBB-based spatial indexes, particularly in R-tree and its variant trees.

Figure 1. Clipped bounding boxes in spatial index.

3.1.1. Vertexes and direction of MBB

In the context of MBBs, it is easy to observe that the redundant space is primarily concentrated near each vertex of the bounding box. This region is relatively straightforward to compute and optimize. Therefore, the focus of the optimization method should be on trimming the redundant space near each vertex. During the computation process, it is necessary to separately identify the clipped space for each vertex. In order to distinguish the vertices, the subscript $d$ of $n$ dimensions is utilized to denote the direction of each vertex in the bounding box.

(1)

d = d_{1} d_{2} d_{3} \dots d_{n}, d_{i} \in \{0, 1\}

(1)

The subscript of $d$ denote each dimension, and the 3D spatial index is represented by $d_{1} d_{2} d_{3}$ . The values 0 and 1 are used to denote the positive and negative directions of a dimension, respectively.

As shown in , if $R$ is used to denote the smallest bounding box of a 2D spatial index, then $R_{00}$ and $R_{11}$ denote the lower-left vertex and upper-right vertex of this rectangular bounding box $R$ , respectively. In addition, to distinguish the individual vertices of the child nodes, the same method is used to label them, and the child nodes are denoted by $O_{i}$ .

3.1.2. Dominance

The concept of ‘dominance’ is the core of the algorithm in this paper. When a point $p$ is more qualified than another point $q$ in all aspects, it is said that $p$ dominates $q$ . This concept was first proposed by Borzsony (Borzsony, Kossmann, and Stocker Citation2001) and has been widely studied, and is a very important operator in this optimization method. For the convenience of subsequent expression calculations, we use it to indicate that $p$ dominates $q$ . Since the distance between points is mainly considered in the calculation of the clipping space, and since the vertices as reference points need to be calculated separately, it is necessary to pay attention to the direction of the domination in each calculation, and the subscript is used to indicate the dominated reference point $t$ :

(2)

\begin{matrix} p ≺ q, iff : | p [i] - t [i] \leq q [i] - t [i] | & for all i = 1, \dots, n \end{matrix}

(2)

Here $| |$ denotes the absolute value of the distance and $i$ denotes the individual dimension. $p$ dominates $q$ only if, in each dimension, $p$ is closer to the reference point $t$ than $q$ . For example, in , if $R_{00}$ is used as the reference and $O 2_{00}$ dominates $O 1_{11}$ , it is expressed as

(3)

O 2_{00} ≺_{R_{00}} O 1_{11}

(3)

3.1.3. Skyline points

For each vertex of MBB, by the concept of domination, we can create a skyline point dataset for each vertex, i.e. for every vertex, the set of all points that are not dominated by any point in the MBB, denoted as

(4)

S k_{t} (R) = \{p \in R : \exists t o \wd \hss / \hss \box q \in R, q ≺_{t} p\}

(4)

Skyline points are shown as the blue dots (denoted by letter ‘O’) in , and the direction $d$ of the subscript is the direction of the vertices to which they belong. Calculating skyline points for each vertex separately involves all the child nodes in the MBB $R$ . This step may seem to be very computationally intensive. However, it is not necessary to involve all the vertices in each calculation. In fact, it is not necessary to involve all the vertices in each computation because a vertex in a child node must dominate all the other vertices in that child node when computing the dominance relationship.

(5)

O i_{d} ≺_{R_{d}} P, ∃P \in Oi | P \neq O i_{d}

(5)

Here, $R_{d}$ is the reference point, and $P$ denotes any point in $Oi$ that is not a vertex $Oid$ .

As long as the direction of the reference point is specified, the child node can directly choose a vertex of the same direction to represent the entire child node, because that vertex must dominate all the points within that child node. In this way, when computing skyline points for each vertex, the number of points actually involved in the calculation of the dominance relationship is the total number of child nodes. The computation of skyline has been fully investigated in Son’s work (Son et al. Citation2017), and there are still subsequent scholars who have proposed optimization methods for the computation of the skyline in different studies [11] (Kalyvas and Tzouramanis, Citation2017). In this paper, the number of points involved in the computation of skyline is only the maximum number of child nodes stored, which usually does not exceed 32, so the simplest loop comparison method (naive nested loops) (Son et al. Citation2017) can be used.

As shown in Algorithm 1, line 2 sets a flag with an initial value of 1. Lines 4–6 indicate that if a child node is dominated, and the flag value changes to 0 and exits the loop at that level. If the loop is completed and the flag value is still 1, the point is not dominated by any other point and meets the skyline points requirement and is output as the result.

Table

Display Table

3.1.4. Stairline points

As the vertex set of the smallest bounding box, skyline points can be used as clip points. That is, the points in skyline are combined with the corresponding vertices, and the generated rectangle is the clipped space, such as the point $O 2_{00}$ combined with $R_{00}$ in . However, it can be easily seen that skyline points are not the optimal solution for clipping. There is still a large space of the area that can be clipped, and even skyline points in many cases, such as $O 1_{11}$ , $O 2_{11}$ are not meaningfully clipped.

Sidlauskas et al. (Citation2018) proposed the concept of stairline points. After finding the skyline points for each vertex, they are combined pairwise. The values in each dimension that are farthest from the reference point are identified to obtain a point with a larger clipping area. The clipping space is shown in . When there are numerous points, some unqualified points may be dominated by the skyline points. Therefore, all the merged stairline points need to be filtered. This is achieved by determining whether these points will be dominated by the previously obtained skyline points and removing the dominated points. The remaining points are the desired ones, which are stored in an R-tree structure.

To facilitate the representation of the merged method, define the operator:

(6)

R_{d} (p, q) = l

(6)

The initial phase involves identifying the direction of the reference point. For each dimension, if the direction is 0, the larger value of the two points in that dimension is utilized in the calculation. Conversely, if the direction is 1, the smaller value is taken into account. Sidlauskas (Sidlauskas et al. Citation2018) defines stairline points as follows.

(7)

S t_{t} (R) = \{l = R_{d} (p, q) | \forall p, q \in Sk (g), q_{n} ≺_{t} l\}

(7)

Here, $p, q$ are two points in skyline points. The combination of these two points generates a new point $l$ . It is necessary to ensure that the newly generated $l$ points will not be dominated by any points in skyline points, otherwise $l$ may act as a clip point, potentially truncating the space of child nodes.

As depicted in Algorithm 2, the pseudocode in lines 1–3 represents the combination of any two skyline points. Line 4 completes the removal of duplicate points. Lines 5–8 removes the points dominated by skyline points, with the remaining points output as the result. Although it is more straightforward to calculate on a spatial index by directly connecting adjacent skyline points in pairs along one axis, it cannot be directly applied in the spatial-temporal index due to the lack of sorting along each axis. Additionally, the combined points may be dominated by other skyline points. Therefore a filtering step is needed to remove such dominated points.

Table

Display Table

Take as an example, $R_{11}$ is the reference vertex. Initially we calculate the skyline points in the direction of $R_{11}$ , considering only the three points $O 1_{11}$ , $O 2_{11}$ , $O 3_{11}$ that represent the 3 child nodes. Then according to the definition of domination, $O 3_{11}$ is dominated by $O 2_{11}$ . Thus, only $O 1_{11}$ and $O 2_{11}$ are the skyline points of $R_{11}$ . These points are then combined according to their coordinates to find the farthest point $r_{11}$ from $R_{11}$ , which is the stairline points. At the same time, we can also find out the stairline points in several other directions, as shown by the red point in .

3.1.5. Storage and application of clipped points

In the IB-CBB method, the original structure of the index is not altered. Instead, only the stairline points and corresponding direction $d$ need to be recorded in each leaf node. The auxiliary points should be stored in an appropriate amount. Excessive auxiliary points can lead to complex calculations, while storing too few points can result in insufficient space for clipping. If the stairline point has a small clipping area, it may not need to be stored, such as $r_{01}$ in .

When querying data, the query box is first compared with the MBB. If they intersect, the auxiliary points of the record are further compared to determine whether the data within the MBB intersects with the query box. If the clip point can dominate the points opposite to its direction, it indicates that the child nodes (actual data) within the node do not intersect with the query box. The query box does not need to continue comparing with each child node one by one. Taking the query box in as an example, it intersects with the bounding box of $R$ , and then calculates the relationship between the stairline points and the diagonal points of the query box. Among them, $q_{11}$ is dominated by $r_{00} 2$ , which means that the query box does not intersect with the objects within $R$ . This avoids further comparison to all objects within $R$ .

shows the comparison of the query process between our method and traditional method. We add a step to determine whether the vertices of the query box will be dominated by the corresponding clip points. When the redundant space in the spatial index is large, the probability of the query box falling into the redundant space and being dominated by the clip points in the new step increases. Although the determination process increases the computation time partially, the step of comparing the query box with the child nodes one by one is eliminated, thereby reducing data I/O and computation time. However, the disadvantage is that if the query box does not fall into the redundant space in any node, it may increase the computational workload. By comparing different R-trees, different data, and different numbers of auxiliary points stored, Sidlauskas’ experiments prove that the optimization method has significant performance improvement for spatial index, not only for intersection test, but also for insertion, deletion, spatial join and other data processing.

Figure 2. Comparison of query process.

3.2. Multi-dimensional index based on CBB

Our method is focused on extending the spatial dimension from 2D to 3D. This approach can be applied to any spatial dimension and can also handle elevation or attribute data as the third dimension.

3.2.1. Applied to higher-dimensional index

The method can clearly be extended to higher-dimensional spaces. Take as an example, where the R-tree is built and the MBB is clipped using the CBB method. The cubes of different colours are drawn according to the three stairline points of $R_{100}$ . However, it is obvious from the figure that some stairline points are not optimal solutions. There are still points that can be used to clip more space. In the following section, we explain the reason why these newly generated points are not optimal and introduce our method to find the optimal clip points in the spatial index. After comparing this method with the CBB method applied to the most widely used R-trees, we find that the performance improvement of this method is significant.

Figure 3. Clipping bounding box in spatial query.

3.2.2. Limitation of CBB

Firstly, the CBB method fails to consider the three adjacent vertices when determining the clip point of a vertex. This is reasonable in the spatial index of a 2D space, as demonstrated in . Because there must be spatial data that dominates its neighbouring vertices, otherwise, the boundary of the MBB would become smaller.

In contrast, in 3D or multi-dimensional space, the neighbouring vertices of a vertex must be taken into account. For example, in , there is no spatial data between vertices $R_{100}$ , $R_{101}$ and their adjacent vertices. The points generated by the adjacent vertices and other skyline points may become potential clip points.

Figure 4. Expanding in 3D.

Secondly, it is necessary to analyse the redundant space in 3D space. Assuming that there are many obstacle points in 3D space, starting from a certain point, expand it in the positive direction of three dimensions simultaneously. When this continuously expanding cube touches an obstacle point, the expansion in that dimension stops. However, it can still continue to expand in other dimensions until encounter obstacle points in all dimensions. In simple terms, it is to find the largest cube in 3D space that does not contain any obstacle points. In 3D space, an obstacle point can only prevent the expansion of a cube in one dimension.

The CBB method only uses two obstacle points to limit the cube. It is obvious that the cube is not optimal. The cube can still continue to expand in another dimension, meaning that there may be better points than those found by the CBB method that result in larger clipping spaces. Taking the cube in as an example, it can still continue to expand in two additional dimensions.

3.2.3. Expanded clipping method

Skyline points in high-dimensional space are determined based on the directions of each vertex. There are many methods for finding skyline points in high-dimensional space, which can be referred to in (Son et al. Citation2017). However, considering that leaf nodes of R-tree store only a few of child nodes, the simplest naive nested loops method can be sufficient.

Next, similar to the CBB method, stairline points are obtained respectively, and filtered by skyline points. It should be noted that the newly-generated point must have a dimension value consistent with a skyline point (A) and the other two dimension values consistent with another skyline point (B). However, it is important to note that the values of the new point in three dimensions cannot be merely consistent with one of the skyline points, or else the point will definitely be dominated by another point. The stairline points as shown in are merged by two skyline points( $O 2_{100}$ , $O 3_{100}$ ), and its x and y values are taken from $O 2_{100}$ , the z value is taken from $O 3_{100}$ .

One dimension expansion can only be limited by one obstacle point, one dimension (z) is limited by $O 3_{100}$ , while two dimensions (x, y) are limited by $O 2_{100}$ , which evidently is the key to find the optimal solution.

For $O 3_{100}$ , the z value can be considered as a fixed point to the limited z dimension. In that case, only the x and y dimensions should be considered. In the following process, we will neither calculate the skyline points where the z value is farther than the fixed point from the corner, nor take the fixed point into account. Instead, only x value and y value should be considered, then the problem is simplified to a 2D space as shown in . Distinctly, the clipped space can be still expanded toward the x or y direction. It is worth noting that it can only be expanded in one direction, because if it expands simultaneously in both directions, it will inevitably contain $O 2_{100}$ , which means that the actual existing object $O 2$ represented by $O 2_{100}$ will be divided into redundant space, which is what we try to avoid.

Figure 5. Expanded method seen from the z axis.

According to , the point $N 2$ might block the clipped space from expanding towards their direction, while the point $N 1$ in the MBB might block the expansion in the x direction. Two new clipped points $N 1$ and $N 2$ are the optimal solutions we need.

In a word, two new points with larger clipping spaces can be derived from each stairline point generated by CBB method.

4. Multi-dimensional spatial query optimization based on IB-CBB

The CBB method involves comparing all the clip points during the query process, resulting in lower query performance when the query box covers a large area. To address this issue, this paper proposes an optimized method for queries, namely IB-CBB, which specifically targets the case where the query box intersects with the node’s bounding box. By reducing the number of comparisons in the CBB method, the IB-CBB method improves query performance.

4.1. Impact of query range on optimization methods

During the query process, the amount of computation that can be optimized tends to a fixed value, which is only related to the experimental data and the construction method of the spatial index. Therefore, as the size of the query bounding box increases, the overall computational load also increases. This is due to an increase in the number of comparisons between the auxiliary points and the query box, while the optimizable computational load remains constant. At this point, the performance of the optimized index may show a downward trend.

To validate this conclusion, we conducted tests on the dataset par03 (Beckmann and Seeger Citation2008), which contains rectangular objects with significant variations in size and shape. We built both the R-tree (Guttman Citation1984) and the CBB index, and selected 10,000 bounding boxes randomly for queries under the two indexes. The total query time was recorded. The number of query results was altered by modifying the size of the query bounding box. As the query range expanded, the query efficiency of the spatial index was shown in . The data labels represent the actual query time, and the curve represents the ratio of the query time of the CBB method to that of the R-tree method.

Figure 6. Performance comparison of different query ranges.

As can be seen from , when the query range is larger, the time consumption of the CBB method actually increases. Especially when the amount of data contained in the query bounding box exceeds 10,000, the CBB method needs to compare a large number of auxiliary clip points with the query bounding box, which leads to a decrease in query performance. To solve this problem, we propose an IB-CBB query optimization algorithm.

4.2. Intersection-based clipping bounding boxes

When the query box is significantly larger than the index node, the latter is entirely contained by the former. In this case, it becomes unnecessary to compare with the auxiliary clip points stored within the node, neither with each child node like the traditional method. Rather, all data in the node can be directly output as the result without further computation, thereby significantly reducing computational time.

Additionally, when the query box is much smaller than the index node or they have similar sizes, it is necessary to compare all auxiliary points with the query bounding box. This step can still be further optimized. Since auxiliary points have direction and should only be compared with corresponding query box vertices, we can choose the required auxiliary points for comparison based on the intersection between the query box and the index node. Although this involves more computation for determining intersections, it saves unnecessary calculations for auxiliary points.

Different intersection cases between query boxes and index nodes can lead to variations in subsequent processing. Therefore, we propose an IB-CBB query optimization algorithm that includes an additional step to determine the intersection case. This allows for further optimization of queries based on the classification of intersections.

4.2.1. Intersection of query box and bounding box

In real-world data, the intersection of bounding boxes can be diverse. To facilitate subsequent query computations, we categorize them into the following types for discussion.

4.2.1.1. 1D intersection

There are four intersection cases of two line segments at the 1D level as shown in the following .

Figure 7. 1D intersection.

Where the upper line segment represents the query line segment, and the lower one represents the node line segment. There exists a difference in direction, with the negative direction to the left and the positive direction to the right. Therefore, four numbers from 0–3 can be used to represent four different intersection cases in one dimension. Based on this, the intersection can be represented in higher dimensions and used for subsequent computations.

4.2.1.2. 2D space intersection

shows that the query box intersects only one vertex of the index node, so it only needs to be compared with the corresponding vertex.

Figure 8. 2D intersection.

shows that the query box intersects with one edge of the index node, and the auxiliary points of the two adjacent vertices are likely to dominate the query box, so it needs to be compared with the corresponding two vertices.

shows that the query box passes through the index node, so there is a high probability that it intersects with the data inside the node. Therefore, there is no need for further comparison, and it can directly compare the child nodes. Of course, it is also possible that the part passed by the query box does not have any data in the node, but this situation cannot be optimized by the CBB method either.

shows that the query box is contained within the index node, so all vertices must be compared.

shows that the index node is contained within the query box, so all data in the node must belong to the query result, so it can directly output the result.

We use a 2-digit number to represent the intersection situation, where each digit represents a dimension. The 2D intersection situation is shown in the .

Table 1. 2D intersection representation.

Download CSV Display Table

The first digit represents the intersection of the Y-axis, and the second digit represents the intersection of the X-axis. Each digit has four possible cases. When combined, they represent the 16 possible intersections in 2D space. For example, 00 indicates that the query box intersects with the index node on both the X-axis and Y-axis in the left direction. 20 indicates that the query box intersects with the index node on the X-axis in the left direction, while being contained within the index node on the Y-axis.

4.2.1.3. 3D space(spatial) intersection

The intersection in three dimensions is more complex, with a total of 64 possible cases. It should be noted that three dimensions can be applied to time, elevation, or other attributes, and have a wide range of applications. Of course, this method can also be extended to higher dimensions.

Six intersection cases are shown in .

Figure 9. 3D (spatial) intersection.

shows that the query cube only intersects with the vertices of the node cube, so it only needs to be compared with the corresponding vertices.

shows that the query cube intersects with an edge of the node cube or passes through it. Both vertices of the edge are likely to dominate the query cube, so it needs to be compared twice.

shows that the query cube contains one face of the node cube, and there is a high probability of intersecting with the data inside the node. Therefore, there is no need for further comparison with the auxiliary points, and it can directly compare the child nodes.

shows that the query cube intersects with one face of the node cube. The four vertices on this face are likely to dominate the query cube, so it needs to be compared four times.

shows that the node cube is contained within the query cube. In this case, all data in the node must belong to the query result, so all child nodes can be directly output as results.

shows that the query cube is contained within the node cube. In this situation, only a full comparison of all vertices is possible.

We use a 3-digit number to represent the intersection situation, where each digit represents a dimension. The 3D intersection situation is shown in the .

Table 2. 3D(spatial) intersection.

Download CSV Display Table

Each bit represents the intersection of the Z,Y,X axes respectively, and combined together they represent 64 intersection cases in 3D space. For example, 001 indicates that both the Z and Y axes intersect with the edge of the node cube in left direction, and the X axis in right direction. 201 indicates that query cube is contained by the node in Z axis, while the Y axis and the X axis intersects with the edge of the node cube, as shown in in .

4.3. Optimization of query process

The query process of the optimization method is shown in . When the query box intersects with the index node, the intersection step is added. When the intersection result is ‘intersection cases’, the corresponding clip point is compared with the query box to determine the dominant relationship. When the intersection result is ‘without calculation’, the step of comparing with child nodes one by one is carried out. Last, when the result of intersection is ‘output directly’, it will output results. When all nodes are calculated, the query ends.

Figure 10. Flowchart of query optimization.

The maximum and minimum values of each dimension of the bounding box $R$ . Since this step is judged after the intersection of the query box and the node wrapper box, they must intersect partially in each dimension, which can be referred to the four cases of the one-dimensional intersection diagram. For the $ith$ dimension, if $Q . \min [i] <= R . \min [i]$ and they intersect, the intersection result must be 0 or 3. If $Q . \min [i] > R . \min [i]$ , the intersection result is 1 or 2, if the $Q . \max [i]$ ‘s value is small, the intersection is 2, otherwise it is 1.

The method for determining the intersection situation is shown in Algorithm 3. The input includes dimension n, range of the query box $Q$ and the index node $R$ . It is assumed that the query box intersects with the index node before this step. For the $i$ -th dimension, if $Q . \min [i] <= R . \min [i]$ and they intersect, then the intersection result must be either 0 or 3. If $Q . \max [i]$ is greater, then the intersection result is 0. Otherwise, it is 3. If $Q . \min [i] > R . \min [i]$ , then the intersection result can be either 1 or 2. If $Q . \max [i]$ is smaller, then the intersection result is 2. Otherwise, it is 1.

Table

Display Table

5. Experimental evaluation

In this section, we conduct rigorous experiments to assess the efficiency of our method. All experiments are executed using the C++ programming language on a server with an Intel(R) Xeon(R) CPU E5–2620 V4 @2.10 GHz processor and 512 GB of memory.

We compare the original method with the improved method for the 3D spatial clipping method and apply it to the existing multi-dimensional index standard (Benchmark) (Beckmann and Seeger Citation2008). This standard is often used in spatial index research. For example, the most commonly used spatial index in current popular spatial databases and academia for performance comparison is the R±tree (Sellis, Roussopoulos, and Faloutsos Citation1987) and R*-tree (Beckmann et al. Citation1990). The standard not only provides the source code of four common R-trees and the code of common spatial data manipulation, but also, most importantly, provides 28 reliable datasets. So that many researchers nowadays, when optimizing spatial indexes, borrow the source code and apply the improved methods to the provided datasets for comparison tests.

Datasets:

abs03: this dataset is generated by equally spaced squares of equal size, which is uniformly distributed in the space.
rea03: this dataset is a 3D dataset which consists 11,958,999 points extracted from the original file of a biological dataset.
par03: all rectangular objects of this dataset have a large variation in size and shape.
abs02: this dataset consists 1,048,576 3D boxes.
par02: this dataset is the 2D version of abs03.

We choose datasets with different data distribution density and dimensions. abs03 is used for testing regular-sized, uniformly distributed spatial data. rea03 is used for testing 3D spatial data. par03 is used for testing spatial data with irregular sizes and non-uniform distribution.

In the experiment, both real and synthetic data were utilized. We have made efforts to select experimental data that are representative in terms of their type, distribution, and scale. However, the reproducibility of the results may not be guaranteed for all datasets, but we believe that the majority of datasets should be covered.

5.1. Evaluation of spatial index based on CBB

5.1.1. Clipped volume comparison

In , we reproduce the CBB method. The percentages of the clipped volume to the total node volume are shown in green. The X-axis represents the datasets and the number of auxiliary points stored in each node, and Y-axis represents the percentage and two spatial indexes. Next, we improve the CBB method by adding adjacent non-dominated corner vertices as skyline points for each vertex. On this basis, the expanding method is adopted and the increasing percentage of the two improvements is shown in green and yellow respectively.

Figure 11. Datasets and number of clip points used in R-tree.

Figure 12. Datasets and number of clip points used in R*-tree.

The effect of clipping area after using IB-CBB method is shown in . The X-axis represents the maximum number of auxiliary points stored per node. The Y-axis represents the proportion of the clipping space to the total space in this paper. CORNER and EXPAND represent the proportions of the clipping area to the total space after the first and second optimizations of the method in this paper, respectively.

The experiment sets a 5% threshold for clip points. It means that points with a cropping ratio less than 5% will not be recorded. This is because a very small cropping ratio does not significantly improve index performance but consumes storage space and computational resources. To study the relationship between the number of stored clip points $k$ and the clipping space, We vary the number from 1 to 32. When the value of $k$ exceeds 8, more clip points do not significantly increase the clipping space. Moreover, increasing the number of cropping points leads to an increase in the number of comparisons during query processing. Therefore, the value of k should not be too large.

From the bar graphs, the volume of the expanding method is obviously prominent by clipping most of the dead space from the node volume in 3D space, even approximately to 3 times the volume of the CBB method. Much of dead space in spatial index can be clipped away effectively.

5.1.2. Improvement in range query performance

By using the par03 (Beckmann and Seeger Citation2008) dataset, we test different sizes of query boxes and compare the number of accessed nodes. The X-axis represents the query data and the number of clip points stored in each node, and the Y-axis represents the percentage of the accessed nodes of the optimized index to the original index. The query data QR0, QR2, QR3 are from Sidlauskas’s work (Sidlauskas et al. Citation2018), retrieve approximately 1, 100 and 1000 objects, respectively, per query. As can be seen in , when retrieved results or query boxes are smaller, the performance of reducing node accesses is more obvious, even reducing 40% in some cases. Adding the number of clip points can also reduce the accessed nodes, but it will lead to extra calculations which may lower the query efficiency.

Figure 13. Node accesses in clipped index w.R.t unmodified(100%).

5.2. Evaluation of IB-CBB

5.2.1. Spatial index

We built R-tree, CBB, and IB-CBB indexes on the dataset abs02 and par02 (Beckmann and Seeger Citation2008). The R-tree was used as the baseline method. The query performance comparison of three indexing methods under different datasets is shown in . As can be seen from the figure, when the query range is small, CBB has the highest efficiency. However, IB-CBB has a higher query efficiency than R-tree, with an optimization performance improvement of up to 20%. When the query range is large, the advantage of IB-CBB becomes prominent.

Figure 14. Spatial index performance in different datasets.

5.2.2. Spatial-temporal index

In the 3D aspect, spatial indexes are build for the datasets abs03 and par03 (Beckmann and Seeger Citation2008) respectively.

The query times of three spatial indexes for different datasets are compared as shown in the by changing the size of the query box. We can see the similar results like the . The optimization performance reaches 30%.

Figure 15. Spatial index performance in different datasets.

6. Conclusions

This paper first analyzes the issues encountered with the CBB method during the query process. Subsequently, an improved strategy is proposed to address these issues. This strategy reduces unnecessary computation of child nodes in the CBB query process by storing additional auxiliary points and marking redundant space in the bounding box. The method is then extended from 2D to 3D space. An intersection-based CBB method (IB-CBB) is proposed, by extending the auxiliary points in each dimension. This approach can find the optimal solution to clip redundant space and effectively reduce the number of node accesses during the query process, further improving the query performance of multi-dimensional indexes. Experimental results show that the IB-CBB method can clip the MBB area about three times larger than the CBB method, reducing node computation by 40% and improving query performance by up to 30%.

The IB-CBB method can enhance the query efficiency based on R-tree index, however, due to its structure based on R-tree index, the optimal performance improvement is still related to the depth of the tree. In the future, it is worth considering further optimizing the index by combining time and attribute features. Moreover, the method used to calculate the clip points has similarities with the visibility analysis method in 3D space, which could be explored for applications like 3D visibility analysis or web-based 3D data loading.

Author contributions

Conceptualization, Wei Xiong; methodology, Ye Wu, Ruiqing li; software, Jingzhi Cao and Ruiqing li; formal analysis, Wei Xiong, Jingzhi Cao and Ruiqing Li; resources, Jingzhi Cao, Ruiqing li, Wei Xiong; writing – original draft preparation, Jingzhi Cao, Ye Wu; writing – review and editing, Wei Xiong; visualization, Ye Wu, Jingzhi Cao; project administration, Jingzhi Cao; funding acquisition, Wei Xiong. All authors have read and agreed to the published version of the manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

Benchmark Dataset:https://www.mathematik.uni-marburg.de/seeger/rrstar/index.html

Additional information

Funding

The work was supported by the National Natural Science Foundation of China [41871284]; Natural Science Foundation of Hunan Province [2020JJ4663].

References

Beckmann, N., H. P. Kriegel, R. Schneider, and B. Seeger. 1990. “The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles.” Proceedings of the 1990 ACM SIGMOD international conference on Management of data, June, 1990, Atlantic City, New Jersey, USA, 322–331. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/93597.98741.
Google Scholar
Beckmann, N., and B. Seeger. 2008. A Benchmark for Multidimensional Index Structures. Accessed March 22, 2024. https://www.mathematik.uni-marburg.de/~seeger/rrstar/index.html.
Google Scholar
Borzsony, S., D. Kossmann, and K. Stocker. 2001. “The skyline operator.” Proceedings 17th International Conference on Data Engineering, April 2-6, 2001, Heidelberg, Germany, 421–430. Los Alamitos, CA, USA: IEEE Computer Society. https://doi.org/10.1109/ICDE.2001.914855.
Google Scholar
Brisaboa, N. R., S. Ladra, and G. Navarro. 2009. ”k2-Trees for Compact Web Graph Representation.” String Processing and Information Retrieval, August 25-27, 2009, Saariselkä, Finland, 18–30. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-03784-9_3.
Google Scholar
Chakka, V. P., A. Everspaugh, and J. M. Patel. 2003. Proceedings of First Biennial Conference on Innovative Data Systems Research, January 5–8, 2003, Asilomar, CA, USA. http://www-db.cs.wisc.edu/cidr/cidr2003/program/p15.pdf.
Google Scholar
Esche, M., A. Glantz, A. Krutz, M. Tok, and T. Sikora. 2012. “Quadtree-based temporal trajectory filtering.” 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 1553–1556. IEEE Xplore. https://doi.org/10.1109/ICIP.2012.6467169.
Google Scholar
Evagorou, G., and T. Heinis. 2021. “Mambo-indexing dead space to accelerate spatial queries.” 33rd International Conference on Scientific and Statistical Database Management, Tampa, FL, USA, 73–84. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3468791.3468804.
Google Scholar
Graefe, G., and H. Kuno. 2011. “Modern b-tree techniques.”2011 IEEE 27th International Conference on Data Engineering, Hannover, Germany, 1370–1373. IEEE Xplore. https://doi.org/10.1109/ICDE.2011.5767956.
Google Scholar
Guttman, A. 1984. “R-Trees: A Dynamic Index Structure for Spatial Searching.” Proceedings of the 1984 ACM SIGMOD international conference on Management of data, Boston, Massachusetts, 47–57. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/971697.602266.
Google Scholar
Jun, B., B. Hong, and B. Yu. 2003. “Dynamic Splitting Policies of the Adaptive 3DR-Tree for Indexing Continuously Moving Objects.” International Conference on Database and Expert Systems Applications, September 1-5, 2003, Prague, Czech Republic, 308–317. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-540-45227-0_31.
Google Scholar
Kalyvas, C., and T. Tzouramanis. 2017. “A Survey of Skyline Query Processing.” https://arxiv.org/abs/1704.01788.
Google Scholar
Kim, K. C., and S. W. Yun. 2004. Web and Wireless Geographical Information Systems, November 26-27, 2004, Goyang, Korea, 167–180. Berlin, Heidelberg: Springer. https://doi.org/10.1007/11427865_13.
Google Scholar
Langendoen, K., B. Glasbergen, and K. Daudjee. 2021. “NIR-Tree: A Non-Intersecting R-Tree.” 33rd International Conference on Scientific and Statistical Database Management, July 6-7, 2021, Tampa, FL, USA, 157–168. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3468791.3468818.
Google Scholar
Li, G., and J. Tang. 2010. “A New HR-Tree Index Based on Hash Address.” 2010 2nd International Conference on Signal Processing Systems, July 5-7, 2010, Dalian, China, V3-35-V3–38. IEEE Xplore. https://doi.org/10.1109/ICSPS.2010.5555818.
Google Scholar
Li, G., and J. Tang. 2011. “A New R-Tree Spatial Index Based on Space Grid Coordinate Division.” Proceedings of the 2011, International Conference on Informatics, Cybernetics, and Computer Engineering (ICCE2011), November 19-20, 2011, Melbourne, Australia, 133–140. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-25188-7_16.
Google Scholar
Mahmood, A. R., S. Punni, and W. G. Aref. 2019. “Spatio-Temporal Access Methods: A Survey (2010-2017).” GeoInformatica 23 (1): 1–36. https://doi.org/10.1007/s10707-018-0329-2.
Web of Science ®Google Scholar
Pfoser, D., C. S. Jensen, and Y. Theodoridis. 2000. Novel Approaches in Query Processing for Moving Object Trajectories Proceedings of the 26th International Conference on Very Large Data Bases, September 10-14, 2000, 395–406. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Google Scholar
Romero, M., N. Brisaboa, and M. A. Rodrguez. 2012. “The Smo-Index: A Succinct Moving Object Structure for Timestamp and Interval Queries.” Proceedings of the 20th International Conference on Advances in Geographic Information Systems, November 6-9, 2012, Redondo Beach, California, 498–501. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2424321.2424399.
Google Scholar
Sellis, T., N. Roussopoulos, and C. Faloutsos. 1987. The R+ -Tree: A Dynamic Index for Multi-Dimensional Objects. Technica Report. http://hdl.handle.net/1903/4541.
Google Scholar
Sidlauskas, D., S. Chester, E. T. Zacharatou, and A. Ailamaki. 2018. “Improving spatial data processing by clipping minimum bounding boxes.” 2018 IEEE 34th International Conference on Data Engineering (ICDE), April 16-19, 2018, Paris, France, 425–436. IEEE Xplore. https://doi.org/10.1109/ICDE.2018.00046.
Google Scholar
Son, W., F. Stehn, C. Knauer, and H. K. Ahn. 2017. “Top-k Manhattan spatial skyline queries.” Information Processing Letters 123:27–35. https://doi.org/10.1016/j.ipl.2017.03.003.
Web of Science ®Google Scholar
Tao, Y., and D. Papadias. 2001. “The MV3R-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries.” Proceedings of Very Large Data Bases Conference (VLDB), September 11-14, 2001, Roma, Italy, 431–440. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Google Scholar
Wang, S., D. Maier, and B. C. Ooi. 2016. “Fast and Adaptive Indexing of Multi-Dimensional Observational Data.” Proceedings of the VLDB Endowment 9 (14): 1683. https://doi.org/10.14778/3007328.3007334.
Google Scholar
Wang, L., and Q. Xu. 2010. “Gps-Free Localization Algorithm for Wireless Sensor Networks.” Sensors 10 (6): 5899–5926. https://doi.org/10.3390/s100605899.
PubMed Web of Science ®Google Scholar
Zäschke, T., C. Zimmerli, and M. C. Norrie. 2014. “The Ph-Tree: A Space-Efficient Storage Structure and Multi-Dimensional Index.” Proceedings of the 2014 ACM SIGMOD international conference on Management of data, June 22-27, 2014, Snowbird, Utah, USA, 397–408. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2588555.2588564.
Google Scholar

IB-CBB: an improved spatial index considering intersection based on clipped bounding boxes

ABSTRACT

1. Introduction

2. Related work

3. Multi-dimensional spatial index based on CBB

3.1. General concept of clipped bounding boxes

3.1.1. Vertexes and direction of MBB

3.1.2. Dominance

3.1.3. Skyline points

3.1.4. Stairline points

3.1.5. Storage and application of clipped points

3.2. Multi-dimensional index based on CBB

3.2.1. Applied to higher-dimensional index

3.2.2. Limitation of CBB

3.2.3. Expanded clipping method

4. Multi-dimensional spatial query optimization based on IB-CBB

4.1. Impact of query range on optimization methods

4.2. Intersection-based clipping bounding boxes

4.2.1. Intersection of query box and bounding box

4.2.1.1. 1D intersection

4.2.1.2. 2D space intersection

Table 1. 2D intersection representation.

4.2.1.3. 3D space(spatial) intersection

Table 2. 3D(spatial) intersection.

4.3. Optimization of query process

5. Experimental evaluation

5.1. Evaluation of spatial index based on CBB

5.1.1. Clipped volume comparison

5.1.2. Improvement in range query performance

5.2. Evaluation of IB-CBB

5.2.1. Spatial index

5.2.2. Spatial-temporal index

6. Conclusions

Author contributions

Disclosure statement

Data availability statement

Additional information

Funding

References

Related research

To cite this article:

Download citation

Your download is now in progress and you may close this window

Login or register to access this feature

Information for

Open access

Opportunities

Help and information

Keep up to date