Theoretically Optimal and Empirically Efficient R-trees with Strong Parallelizability



Jianzhong Qi, Yufei Tao, Yanchuan Chang, Rui Zhang
School of Computing and Information Systems, The University of Melbourne
Department of Computer Science and Engineering, Chinese University of Hong Kong

ABSTRACT

The massive amount of data and large variety of data distributions in the big data era call for access methods that are efficient in both query processing and index bulk-loading, and over both practical and worst-case workloads. To address this need, we revisit a classic multidimensional access method, the R-tree. We propose a novel R-tree packing strategy that produces R-trees with an asymptotically optimal I/O complexity for window queries in the worst case. Our experiments show that the R-trees produced by the proposed strategy are highly efficient on real and synthetic data of different distributions. The proposed strategy is also simple to parallelize, since it relies only on sorting. We propose a parallel algorithm for R-tree bulk-loading based on the proposed packing strategy, and analyze its performance under the massively parallel communication model. Experimental results confirm the efficiency and scalability of the parallel algorithm over large data sets.

PVLDB Reference Format:
Jianzhong Qi, Yufei Tao, Yanchuan Chang, and Rui Zhang. Theoretically Optimal and Empirically Efficient R-trees with Strong Parallelizability. PVLDB, (): -, 0. DOI: https://doi.org/0./.

1. INTRODUCTION

Spatial databases have been traditionally used in geographic information systems, computer-aided design, multimedia data management, and medical studies. They are becoming ubiquitous with the proliferation of location-based services such as digital mapping, augmented reality gaming, geosocial networking, and targeted advertising. For example, in mapping services such as Google Maps, the "search this area" functionality supports querying places of interest (POIs) such as shops within a given view area (cf. Fig. 1a).
In a popular augmented reality game, Pokémon GO [], every player has an avatar placed in the game map based on the player's geographical location. The players can interact with gaming objects (e.g., "pokémons") in the game view through their avatars (cf. Fig. 1b). Managing POIs or gaming objects in a view, which usually has a rectangular window shape, is a typical application of spatial databases.

[Figure 1: Window queries in real applications. (a) Shops (blue and red dots) in a map view; (b) Gaming objects in a game view.]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at the International Conference on Very Large Data Bases, held in August in Rio de Janeiro, Brazil. Proceedings of the VLDB Endowment. Copyright VLDB Endowment. DOI: https://doi.org/0./.

In these applications, there may be hundreds of millions of spatial objects (e.g., shops, restaurants, pokémons, etc.) with a variety of distributions to be managed. Meanwhile, there may be millions of service requests from users, e.g., Google Maps is answering millions of queries per day [], and Pokémon GO is attracting millions of active users []. Reporting POIs or pokémons in a given area in real time under such settings poses significant challenges.

Spatial indices are important techniques to address such challenges. They offer fast retrieval of spatial objects. We revisit a classic spatial index, the R-tree [9]. Our aim is to achieve an R-tree structure that is efficient in both window query processing and tree bulk-loading, and over both practical and worst-case workloads.
R-trees have attracted extensive research interest [,,, 0,, ] and industrial applications [, ]. An R-tree is a balanced tree structure designed for external-memory-based spatial object indexing. Every node in an R-tree may contain multiple entries. In the leaf nodes, the entries are minimum bounding rectangles (MBRs) of the data objects (and pointers to them); in the inner nodes, the entries are MBRs of and pointers to the child nodes. An R-tree node usually corresponds to a disk block, the size of which constrains the capacity of the node, i.e., the maximum number of entries per node, denoted as $B$. Given an R-tree, a window query returns all the data objects (e.g., POIs or pokémons) indexed in the tree that are within or intersect a given query window, which is usually a rectangular region of interest.

R-trees have good query efficiency in practice when they are constructed with carefully crafted heuristics [,,, ]. However, all these heuristics suffer from the limitation that they do not produce an R-tree with attractive performance guarantees in the worst case. The Priority R-tree (PR-tree) [] is an R-tree structure that has a theoretical query performance guarantee. It answers a window query with $O((n/B)^{1-1/d} + k/B)$ I/Os in the worst case, which is known to be asymptotically optimal []. Here, $n$, $d$, and $k$ denote the data set size, the dimensionality, and the output size, i.e., the number of objects satisfying the query, respectively. The PR-tree is designed for rectangles. As a follow-up study shows [0], the PR-tree may not have satisfactory empirical performance on data objects of a small size (e.g., point data) or queries with small query windows; it also suffers in bulk-loading performance; and the tree construction is difficult to parallelize.

We re-examine the construction of R-trees and aim for high window query efficiency over point data, which is a common way of representing locations on digital maps. Spatial objects with extents can also be efficiently transformed into points for query processing, as a recent study [] shows.
We target application scenarios such as digital mapping, where the data sets are relatively static and can be batch-updated. We construct R-trees that are query cost optimal for static data. Our R-trees handle dynamic data updates efficiently as well, although the optimality may be compromised. We also offer efficient techniques for periodical R-tree rebuilding to preserve the query cost optimality.

We propose an R-tree packing strategy that creates R-trees with the worst-case optimal window query I/O cost $O((n/B)^{1-1/d} + k/B)$. This strategy has a simple procedure and the R-trees produced have high practical query efficiency. A key step we take before packing the data points is to map them into a rank space such that their coordinates are mapped to their ranks in each dimension. Ties in one dimension are broken by the coordinates in the other dimension. As a result, we obtain data points with no repeated coordinates in either dimension. We then simply pack every $B$ data points into a leaf node (except possibly the last leaf node) of an R-tree in the ascending order of the Z-order values of the data points in the rank space. The Z-order is an ordering created by the Z-curve [], which is a common type of space-filling curve (SFC). The inner nodes of the R-tree are created by packing every $B$ child nodes into a parent node (except possibly the last node in each level), again in the ascending order of the Z-curve values and recursively from the bottom to the top of the tree. An inner node entry stores a pointer to a child node and its MBR.

Since our R-tree packing strategy relies only on sorting, it takes $O((n/B) \log_{M/B}(n/B))$ I/Os to bulk-load an R-tree, where $M$ is the size of the memory. A key advantage of this strategy is that it is highly parallelizable, which is an important feature in the big data era. Bulk-loading an R-tree with this strategy well suits the popular massively parallel communication (MPC) model [], which lays the foundation for designing algorithms for MapReduce systems [].
We propose a parallel bulk-loading algorithm that takes $O(\log_s n)$ rounds of computation, where $s = n/g$ and $g$ is the number of machines participating in the parallel algorithm. The (parallel) running time of our algorithm is $O((n \log n)/g)$, while the total time (summed over all machines) is $O(n \log n)$. For modern machines, $s$ is large, e.g., on the order of millions, allowing the proposed algorithm to bulk-load an R-tree with a very large number of data points in just a few rounds of computation.

While the rank space has been used by the computational geometry community to develop theoretical bounds [, ], we observe for the first time that rank-space conversion can be leveraged to build a worst-case optimal structure for window queries. Furthermore, it is perhaps surprising that we are able to achieve the purpose by combining the rank space with an SFC, because SFC-based indexes were previously thought to have poor worst-case query costs. Indeed, as shown in [], if an SFC is used directly (i.e., in the original data space) for indexing, there exist window queries which retrieve few points, but have I/O costs linear in the data set size. In fact, even analyzing the query cost of an SFC-based index is non-trivial. The limited literature on this topic [, 9, ] has focused on the average query cost, which is analyzed indirectly by studying the clustering behavior of SFCs.

In summary, this paper makes the following contributions:

1. We propose the first SFC-based packing strategy that creates R-trees with a worst-case optimal window query I/O cost.

2. The proposed packing strategy suggests a simple R-tree bulk-loading algorithm that relies only on sorting. We propose such an algorithm under the massively parallel communication model (which thus works on MapReduce systems) with attractive performance guarantees.

3. We perform extensive experiments on both real and synthetic data.
The results confirm the superiority of the proposed R-tree packing strategy: on real data, the query I/O cost of the R-trees that we construct is lower than that of PR-trees [] and similar to that of STR-trees [], which are a classic type of sorting-based bulk-loaded R-trees; on highly skewed synthetic data, the query I/O cost of the R-trees that we construct is lower than that of both PR-trees and STR-trees. The proposed bulk-loading algorithm also outperforms the PR-tree bulk-loading algorithm in running time over large data sets with millions of data points.

The rest of the paper is organized as follows: Section 2 discusses related work. Section 3 details the proposed R-tree packing strategy and the worst-case window query I/O costs. Section 4 describes the proposed parallel R-tree bulk-loading algorithm. Section 5 discusses data update handling. Section 6 studies the empirical performance of the proposed algorithms. Section 7 concludes the paper.

2. RELATED WORK

We review studies on spatial queries and access methods, with a focus on R-trees.

Spatial queries and access methods. We focus on the window query (rectangular range query), which is a basic type of spatial query []. A window query returns all data objects that satisfy a certain predicate with a given query window, i.e., a (hyper)rectangular region of interest. Common query predicates include containment and intersection, which require the data objects to be fully contained in or intersect the query window, respectively. A straightforward window query algorithm sequentially checks every data object and returns an object if it satisfies the query predicate. This algorithm takes $O(n/B)$ I/Os regardless of data distribution and output size. Spatial indices have been used to obtain higher query efficiency. We focus on the R-tree index [9]. For a comprehensive review of spatial indices, interested readers are referred to [].

R-trees. As discussed earlier, the R-tree is a balanced tree structure. The maximum number of entries per tree node (node capacity) $B$ is constrained by the block size, while the minimum number of entries per tree node (except the root node) is $\Omega(B)$. The root node needs to contain at least 2 entries unless it is also a leaf node. Thus, the height of an R-tree indexing $n$ objects is bounded by $O(\log_B n)$. A window query is processed by a top-down traversal over the nodes of an R-tree whose MBRs satisfy the query. When the leaf nodes are reached, data objects in them satisfying the query are returned. A series of studies [, 9,, 0, ] propose heuristics to optimize the node MBRs during dynamic data insertion. The R*-tree [], for example, considers the MBR overlaps and region perimeters to decide the node into which a new object should be inserted.

R-tree packing and bulk-loading. A different stream of research considers how to construct an R-tree by packing data objects into the leaf nodes directly rather than inserting them individually.
The entire R-tree is bulk-loaded in a bottom-up fashion. Most R-tree packing algorithms [, 0,,, ] rely on some ordering of the data objects and hence have an I/O cost of $O((n/B) \log_{M/B}(n/B))$, which is the cost for sorting $n$ objects (recall that $M$ is the number of objects allowed in the main memory). Specifically, Roussopoulos and Leifker [] sort the data objects by their x-coordinates and pack every $B$ objects into a leaf node. Leutenegger et al. [] first sort the data objects by their x-coordinates, and then partition the data into $\sqrt{n/B}$ subsets. Objects in each subset are sorted by their y-coordinates and packed into the leaf nodes. Other studies use the Hilbert ordering [, 0, ]. R-trees bulk-loaded based on Hilbert ordering have good window query performance on nicely distributed data [0]. However, these R-trees do not have worst-case optimal query performance.

There are also top-down bulk-loading algorithms, e.g., Top-down Greedy Split (TGS) []. TGS partitions the data set into two subsets repeatedly until $B$ approximately equisized subsets have been obtained. The MBRs of these $B$ subsets form entries of the root. Each partition uses a cut orthogonal to an axis that yields two subsets with the minimum sum of costs, where the cost is based on a user-defined function, e.g., the area of the MBR of a subset. There are $O(B)$ candidate cuts, where the hidden constant lies in the different cuts in different dimensions and on different orderings (e.g., lower x corner, center, etc.). In each dimension and with a particular ordering, the $i$-th cut puts $i \cdot n/B$ objects in one subset and the rest in the other subset. TGS has been shown to produce R-trees with good query efficiency, but it has a high worst-case I/O cost, $O(n \log_B n)$, for R-tree construction. This is because it needs to scan the data set $B$ times to create the $B$ partitions of a node (assuming that the orderings used for partitioning have been precomputed).
If viewed from a recursive binary partition perspective, the I/O cost of TGS is effectively $O((n/B) \log n)$ [].

Agarwal et al. [] propose an algorithm to bulk-load a Box-tree, which can be converted to an R-tree that obtains a worst-case query I/O cost of $O((n/B)^{1-1/d} + k \log_B n)$. This work is more of theoretical interest. No implementation or experimental results have been given for the algorithm.

The PR-tree [] is an R-tree that offers a worst-case window query I/O cost of $O((n/B)^{1-1/d} + k/B)$, which is asymptotically optimal []. A PR-tree is created from a pseudo-PR-tree, which is an unbalanced tree built in a top-down fashion. To create a pseudo-PR-tree, the data set is partitioned into six partitions to form the child nodes of the root. Four of the partitions contain $B$ objects each, which are objects with the smallest lower x-coordinates, the smallest lower y-coordinates, the largest upper x-coordinates, and the largest upper y-coordinates, respectively. The remaining two partitions are two equisized partitions of the remaining objects, which are then recursively partitioned to form subtrees of the root. When a pseudo-PR-tree is created, its leaf nodes are used as the leaf nodes of a PR-tree. The MBRs of the leaf nodes are used to create another pseudo-PR-tree, the leaf nodes of which are used as the parent nodes of the leaf nodes of the PR-tree. A PR-tree is then built bottom-up with $O((n/B) \log n)$ I/Os. Arge et al. [] further propose a bulk-loading strategy which lowers the I/O cost to $O((n/B) \log_{M/B}(n/B))$. The main issues of the PR-tree are that it lacks practical efficiency in bulk-loading and in answering queries with small query windows [0].

We also note that other spatial indices such as kd-trees [], O-trees [], and cross-trees [] can offer a worst-case optimal query I/O performance. Compared with R-trees created by our packing strategy, kd-trees are more difficult to bulk-load in parallel. In the MPC model, Agarwal et al.
[] propose a randomized algorithm that can bulk-load a kd-tree with $O(\mathrm{poly}\log_s n)$ rounds of computation. In contrast, we can bulk-load an R-tree with $O(\log_s n)$ rounds of computation, which is lower, and our bulk-loading algorithm is deterministic. As for O-trees, they do not belong to the R-tree family. They combine multiple auxiliary structures to ensure their theoretical guarantees. This approach is mainly of theoretical interest, and in practice is expensive in both space consumption and query cost (even though it is asymptotically optimal in the worst case). Cross-trees share a similar issue in their practical query performance []. These indices are not discussed further.

Parallel R-tree management. Parallelism has been exploited to scale R-trees to large data sets and user groups. An early study [] considers storing an R-tree on a multi-disk system. It stores a newly created tree node in the disk that contains the most dissimilar nodes to optimize the system throughput. A few studies [,, ] assume a shared-nothing (client-server) architecture for distributed R-tree storage and query processing. Koudas et al. [] store the inner nodes on a server and the leaf nodes on clients. Schnitzer and Leutenegger [] further create local R-trees

on clients for higher query efficiency. Mondal et al. [] study load balancing for R-trees in shared-nothing systems.

The studies above do not focus on parallel R-tree bulk-loading. Papadopoulos and Manolopoulos [] propose a generic procedure for parallel spatial index bulk-loading. They use sampling to estimate the data distribution, which helps partition the data space into regions. Data objects in different regions are assigned to different clients for local index building. A global index is built on the server, which serves as a coordinator for query processing. A more recent study [0] bulk-loads an R-tree with the MapReduce framework level by level, where each level takes one MapReduce round. The study also uses an SFC for object ordering. However, its focus is on deciding the split points among tree nodes. The tree nodes can contain varying numbers of entries, which leads to non-optimal bulk-loading costs. Similar ideas have been used on GPUs [] without a cost analysis.

3. R-TREE PACKING

We consider a set $P$ of $n$ data points in a $d$-dimensional Euclidean space. For ease of presentation, we use $d = 2$ in the following discussion, although our approach also applies to any $d > 2$. We focus on window query processing. Given a rectangle $q$, a window query reports all the points in $P \cap q$. We list the frequently used symbols in Table 1.

Table 1: Frequently Used Symbols

  Symbol   Description
  P        A data set
  p        A data point
  n        The cardinality of P
  d        The dimensionality of P
  q        A window query
  k        The answer set size of a window query
  B        The node capacity of an R-tree
  h        The height of an R-tree
  g        The number of machines in a cluster
  s        The number of data points allowed in a machine

3.1 Mapping to Rank Space

[Figure 2: Mapping to rank space. Left panel: original space; right panel: rank space.]

Before creating an index structure over $P$, we first map the data points into a 2-dimensional rank space as follows.
In each dimension of the original data space, we sort the data points by their coordinates and use the ranks as the coordinates in the corresponding dimension of the rank space. Rank ties in a dimension are broken by the coordinates in the other dimension of the original space. We assume that no two data points have the same coordinates in both dimensions. Define by $[n]$ the integer domain $[0, n-1]$. After the mapping, $P$ becomes a set of $n$ 2-dimensional points in $[n]^2$ such that no two points share the same x- or y-coordinate.

Figure 2 illustrates the mapping with a set of $n = 8$ points. The coordinates of the points in the rank space are their ranks in the original space. For example, one point has the smallest x-coordinate and the second largest y-coordinate in the original space. Thus, its x-coordinate and y-coordinate in the rank space are 0 and 6, respectively. Two other points both have the second smallest x-coordinate in the original space. In the rank space, the one with the smaller y-coordinate in the original space has an x-coordinate of 1, while the other has an x-coordinate of 2.

A query rectangle $q = [x_{e1}, x_{e2}] \times [y_{e1}, y_{e2}]$ in the original space is mapped to a rectangle $q = [x_1, x_2] \times [y_1, y_2]$ in $[n]^2$. Here, $x_1$ ($y_1$) is the smallest rank of the data points whose x-coordinates (y-coordinates) in the original space are greater than or equal to $x_{e1}$ ($y_{e1}$); $x_2$ ($y_2$) is the largest rank of the data points whose x-coordinates (y-coordinates) in the original space are smaller than or equal to $x_{e2}$ ($y_{e2}$). In Fig. 2, the solid rectangle represents a query rectangle $q$. In the original space, the query range $[x_{e1}, x_{e2}]$ spans several data points in the x-dimension; it is mapped to the interval between the smallest and the largest rank among these points. The query range $[y_{e1}, y_{e2}]$ is mapped in the same way. Note that the query mapping does not introduce false positives in the query answer because the data points do not share the same coordinate in either dimension in the rank space.
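The mapping described above can be sketched as follows. This is a minimal in-memory illustration, not the authors' implementation: the function names `to_rank_space`, `map_range`, and `map_query` are hypothetical, and sorted Python lists stand in for the B-trees that a disk-based system would use to look up ranks.

```python
from bisect import bisect_left, bisect_right

def to_rank_space(points):
    """Map 2-D points to rank space: each coordinate becomes its rank.

    Ties in one dimension are broken by the coordinate in the other
    dimension, so no two mapped points share an x- or y-rank
    (the input points are assumed to be pairwise distinct).
    """
    n = len(points)
    by_x = sorted(range(n), key=lambda i: (points[i][0], points[i][1]))
    by_y = sorted(range(n), key=lambda i: (points[i][1], points[i][0]))
    rank_x = [0] * n
    rank_y = [0] * n
    for r, i in enumerate(by_x):
        rank_x[i] = r
    for r, i in enumerate(by_y):
        rank_y[i] = r
    return [(rank_x[i], rank_y[i]) for i in range(n)]

def map_range(sorted_coords, lo, hi):
    """Map an original-space range [lo, hi] to a rank range (r1, r2).

    r1 is the smallest rank whose coordinate is >= lo, and r2 is the
    largest rank whose coordinate is <= hi. The range is empty if r1 > r2.
    """
    r1 = bisect_left(sorted_coords, lo)
    r2 = bisect_right(sorted_coords, hi) - 1
    return r1, r2

def map_query(points, x_lo, x_hi, y_lo, y_hi):
    """Map a query rectangle from the original space to the rank space."""
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    return map_range(xs, x_lo, x_hi), map_range(ys, y_lo, y_hi)

# Hypothetical data: two points share x = 2.0 and two share y = 3.0.
pts = [(1.0, 5.0), (2.0, 1.0), (2.0, 3.0), (4.0, 3.0)]
print(to_rank_space(pts))                  # [(0, 3), (1, 0), (2, 1), (3, 2)]
print(map_query(pts, 2.0, 4.0, 2.0, 3.0))  # ((1, 3), (1, 2))
```

In this toy input, the two points with x = 2.0 receive x-ranks 1 and 2 according to their y-coordinates, and the mapped query selects exactly the points that the original query selects.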
Our goal is to store $P$ in a structure so that all window queries can be answered efficiently in the worst case. Without loss of generality, we consider that $n$ is a power of 2.

3.2 Tree Structure and Packing Strategy

Our structure is simply an R-tree where the leaf nodes are obtained by packing points in ascending order of their Z-order [] values. Other space-filling curves such as Hilbert curves can also be used (as will be discussed in Section 3.4), but the Z-order is used for illustration.

For each point $p \in P$, we compute its Z-order value $Z(p)$ in $[n^2]$. Suppose that $p = (x, y)$, where $x = \alpha_1 \alpha_2 \ldots \alpha_l$ and $y = \beta_1 \beta_2 \ldots \beta_l$ in binary form, where $l = \log_2 n$. Then, $Z(p) = \beta_1 \alpha_1 \beta_2 \alpha_2 \ldots \beta_l \alpha_l$. We sort all the points of $P$ by their Z-order values, and cut the sorted list into subsequences, each of which has length $B$, except possibly the last subsequence, where $B$ is a parameter that controls how many points fit in a leaf node of an R-tree. Each leaf node includes the points in a subsequence. The inner nodes of the R-tree are created by packing every $B$ child nodes into a parent node (except possibly the last node in each level) recursively from the bottom to the top of the tree. This process resembles how a B-tree is created, except that an inner node entry stores a pointer to a child node and its MBR instead of a key value. This creates our target R-tree.

[Figure 3: R-tree packing. (a) Data points in the rank space with the Z-curve and Z-order values; (b) MBRs of the nodes; (c) the R-tree T.]

Figure 3 illustrates the R-tree packing strategy. The rank space can be seen as an $8 \times 8$ grid. A Z-curve (the dotted polyline) is drawn across the rank space. The order in which a cell is reached by the curve is the Z-order value of the data point in the cell, e.g., in Fig. 3a, one point is in the second cell reached by the curve; its Z-order value is 1, which is labeled in parentheses next to it (same for the other points). The eight data points are sorted by their Z-order values. We use $B = 2$ in this example. The eight data points are packed, two at a time in the sorted order, into four leaf nodes $N_1$, $N_2$, $N_3$, and $N_4$. These four leaf nodes are then packed in the order of the Z-order values of the data points stored in them, resulting in two inner nodes $N_5 = \{N_1, N_2\}$ and $N_6 = \{N_3, N_4\}$. A root node is further created to point to $N_5$ and $N_6$. Figures 3b and 3c show the MBRs and the R-tree $T$ created.

Algorithm 1: Build-R-tree
Input: P = {p_1, p_2, ..., p_n}: a d-dimensional database; B: the capacity of a tree node.
Output: T: an R-tree over P.
 1  Map P into the rank space;
 2  for each p_i in P in the rank space do
 3      Compute Z-order value of p_i;
 4  Sort P in ascending order of the Z-order values;
 5  Let Q be an empty queue;
 6  for every B data points in the sorted P do
 7      Create a leaf node N to store the B data points;
 8      Q.enqueue(<N, 1>);
 9  while Q.size() > 1 do
10      Dequeue the first B nodes of the same level t from Q;
11      Create a node N to store MBRs (pointers) of the nodes;
12      Q.enqueue(<N, t + 1>);
13  Let T point to the last node in Q;
14  return T;

Algorithm 1 summarizes the proposed R-tree packing strategy with the help of an auxiliary queue $Q$. The queue stores 2-tuples in the form of $\langle N, t \rangle$, where $N$ and $t$ are a tree node and its level in the tree, respectively. The packing strategy takes (i) two sorts on the data points to map them into the rank space (Line 1), (ii) a linear scan on the data points to compute their Z-order values (Lines 2 and 3), (iii) another sort on the Z-order values (Line 4), and (iv) another linear scan on the data points (Lines 6 to 8) and $\log_B n$ rounds of linear scans on the MBRs of tree nodes for packing and loading an R-tree (Lines 9 to 12).
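The packing procedure of Algorithm 1 can be sketched in memory as below. This is an illustrative sketch under stated assumptions rather than the authors' code: nodes are plain dictionaries, the input is assumed to be already in rank space with $n \geq 1$ and $B \geq 2$, the level-by-level loop replaces the auxiliary queue, and the names `z_value`, `build_rtree`, and `window_query` are hypothetical. The search routine is the conventional top-down R-tree traversal.

```python
def z_value(x, y, bits):
    """Interleave the bits of x and y; each y bit precedes the matching x bit."""
    z = 0
    for i in range(bits - 1, -1, -1):  # most significant bit first
        z = (z << 1) | ((y >> i) & 1)
        z = (z << 1) | ((x >> i) & 1)
    return z

def mbr(rects):
    """Minimum bounding rectangle of a list of (x1, y1, x2, y2) rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def build_rtree(rank_points, B):
    """Pack rank-space points into an R-tree and return its root node."""
    n = len(rank_points)
    if n == 0 or B < 2:
        raise ValueError("need at least one point and B >= 2")
    bits = max(1, (n - 1).bit_length())
    pts = sorted(rank_points, key=lambda p: z_value(p[0], p[1], bits))
    # Leaf level: every B consecutive points in Z-order form one leaf.
    level = []
    for i in range(0, n, B):
        chunk = pts[i:i + B]
        level.append({"leaf": True, "children": chunk,
                      "mbr": mbr([(x, y, x, y) for x, y in chunk])})
    # Inner levels: every B consecutive nodes form one parent.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), B):
            chunk = level[i:i + B]
            parents.append({"leaf": False, "children": chunk,
                            "mbr": mbr([c["mbr"] for c in chunk])})
        level = parents
    return level[0]

def window_query(node, q):
    """Report points inside q = (x1, y1, x2, y2), inclusive, in rank space."""
    x1, y1, x2, y2 = q
    a1, b1, a2, b2 = node["mbr"]
    if a2 < x1 or a1 > x2 or b2 < y1 or b1 > y2:
        return []  # the node's MBR does not intersect the query window
    if node["leaf"]:
        return [(x, y) for x, y in node["children"]
                if x1 <= x <= x2 and y1 <= y <= y2]
    out = []
    for child in node["children"]:
        out.extend(window_query(child, q))
    return out

# Hypothetical rank-space input with 8 points and B = 2.
points = [(0, 6), (1, 2), (2, 0), (3, 4), (4, 7), (5, 1), (6, 5), (7, 3)]
root = build_rtree(points, B=2)
print(sorted(window_query(root, (1, 1, 5, 4))))  # [(1, 2), (3, 4), (5, 1)]
```

With 8 points and $B = 2$, the sketch produces four leaves, two inner nodes, and a root, matching the shape of the example in the text (the point coordinates here are made up, not those of the figure).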
Together, this packing strategy takes $O((n/B) \log_{M/B}(n/B))$ I/Os to bulk-load an R-tree (the CPU time is $O(n \log n)$, noticing that the Z-order value of a point can be calculated in $O(\log n)$ time). Sorting and linear scans can be easily parallelized. This suggests a simple parallel R-tree bulk-loading algorithm where everything boils down to sorting. We present such an algorithm in Section 4.

3.3 Window Query Processing

When a window query is issued, we first map it to the rank space following the procedure described in Section 3.1. To facilitate fast mapping, we create two B-trees to store pairs of point coordinates in the original space and the corresponding coordinates in the rank space, one for each dimension. Query mapping using B-trees takes $O(\log_B n)$ I/Os. The mapped query is then answered by our R-tree in the same way as for a conventional R-tree. We omit the pseudo-code of the query algorithm for conciseness. As an example, in Fig. 3c, we show the search paths for processing query $q$ in gray.

3.3.1 Query Cost

We prove that our R-tree answers a window query with $O((n/B)^{1-1/d} + k/B)$ I/Os in the worst case, where $k$ is the number of points reported. This query complexity is known to be asymptotically optimal [, 0].

Let $h \le \lceil \log_B n \rceil$ be the height of the tree. Label the levels of the tree as $1, 2, \ldots, h$ bottom up. Consider any level $t \in [1, h]$. Let $l$ be any vertical line in $[n]^2$. We prove the following lemma, which is sufficient for establishing our claim.

Lemma 1. The line $l$ intersects the MBRs of $O(\sqrt{n/B^t})$ nodes at level $t$.

Proof. Intuitively, the MBR of a node intersects a line $l$ when it covers data points on both sides of $l$ (e.g., in Fig. 4a, a leaf node contains two points on opposite sides of the dashed line $l$) or data points on $l$. Such a node corresponds to a Z-curve segment that crosses or ends at $l$. Since different nodes correspond to non-overlapping curve segments (because the data points are packed by ascending Z-order values), we derive the number of nodes intersecting $l$ via the number of times that the Z-curve crosses $l$.
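As an informal sanity check of the statement of Lemma 1 at the leaf level ($t = 1$), the sketch below packs randomly generated rank-space points in Z-order and counts, over all vertical lines, the largest number of leaf MBRs that a single line intersects. The data (a random permutation), the parameter values, and the function names are assumptions made for illustration; an experiment on random data does not replace the worst-case proof.

```python
import math
import random

def z_value(x, y, bits):
    """Interleave the bits of x and y; each y bit precedes the matching x bit."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((y >> i) & 1)
        z = (z << 1) | ((x >> i) & 1)
    return z

def leaf_x_ranges(points, B):
    """x-ranges of the leaf MBRs obtained by packing B points at a time in Z-order."""
    bits = max(1, (len(points) - 1).bit_length())
    pts = sorted(points, key=lambda p: z_value(p[0], p[1], bits))
    ranges = []
    for i in range(0, len(pts), B):
        xs = [p[0] for p in pts[i:i + B]]
        ranges.append((min(xs), max(xs)))
    return ranges

def max_leaves_cut_by_vertical_line(points, B):
    """Largest number of leaf MBRs intersected by any vertical line x = c."""
    n = len(points)
    diff = [0] * (n + 1)  # difference array over the integer x positions
    for lo, hi in leaf_x_ranges(points, B):
        diff[lo] += 1
        diff[hi + 1] -= 1
    best = cur = 0
    for x in range(n):
        cur += diff[x]
        best = max(best, cur)
    return best

random.seed(0)
n, B = 4096, 16                      # hypothetical sizes; n is a power of 2
ys = list(range(n))
random.shuffle(ys)
points = list(zip(range(n), ys))     # distinct x- and y-ranks, as in rank space
worst = max_leaves_cut_by_vertical_line(points, B)
print("leaves:", n // B, "sqrt(n/B):", math.sqrt(n / B), "max cut by a line:", worst)
```

Only integer positions of the line need to be checked, because every leaf x-range has integer endpoints, so any line between two integer positions intersects a subset of the MBRs intersected by the line at the integer position to its left.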
Let $m$ be the smallest power of 2 larger than or equal to $\sqrt{nB^t}$. Divide $[n]^2$ into an $(n/m) \times (n/m)$ grid denoted as $G$, where each cell has $m^2$ locations in $[n]^2$. Note that the Z-curve traverses all the locations in a cell before moving to another, i.e., it never comes back to the same cell.

[Figure 4: Window query I/O cost. (a) A vertical line l and the column C of grid G that contains it; (b) a query rectangle q and the column C of grid G.]

We use Fig. 4a to illustrate the proof for the case where $t = 1$, i.e., at the leaf node level. It shows the four leaf nodes $N_1$, $N_2$, $N_3$, and $N_4$ (the dashed rectangles) of an R-tree constructed. We have $\sqrt{nB^t} = \sqrt{8 \cdot 2} = 4$, which means $m = 4$. The rank space is divided into an $(8/4) \times (8/4) = 2 \times 2$ grid, which is represented by the black solid line grid in the figure. The Z-curve enters and leaves each cell once, e.g., for the top-left cell, the Z-curve enters at its bottom-left corner and leaves from its top-right corner.

Let $C = [a, b] \times [n]$ be the column of $G$ that contains line $l$. In Fig. 4a, the vertical dashed line represents $l$, which is in column $C = [0, 3] \times [8]$ highlighted in gray. Define the line $x = a$ as the left boundary of $C$, and the line $x = b$ as the right boundary.

Let $u$ be a node whose MBR intersects $l$. Define $X(u)$ as the x-range of the MBR of $u$. For example, one leaf node in Fig. 4a intersects the line, and its x-range starts at 0. Such $u$ can be one of the following types:

Type 1: $a \in X(u)$ or $b \in X(u)$, i.e., $u$ overlaps a boundary of column $C$.

Type 2: $X(u) \subseteq [a, b]$, i.e., $u$ is inside column $C$.

We prove that there are at most $2n/m \le 2\sqrt{n/B^t}$ nodes of Type 1 and $O(1 + m/B^t) = O(\sqrt{n/B^t})$ nodes of Type 2, which completes the proof of the lemma:

Type 1: Note that the Z-curve crosses the left boundary (i.e., enters column $C$) $n/m$ times. This is because there are $n/m$ cells of $G$ in each column, and the curve enumerates all the locations of a cell of $G$ before moving to another. In Fig. 4a, there are $n/m = 8/4 = 2$ cells in column $C$. The curve enters these two cells once each. Since the data points are sorted and packed into nodes by their curve values, there are at most $n/m$ nodes that contain data points on both sides of the left boundary. Otherwise, some of these nodes must have overlapping curve values, which is against our packing strategy. The same applies to the right boundary. Thus, there are at most $2n/m$ nodes of Type 1. In the figure, one leaf node overlaps the right boundary of the top cell of column $C$. It contains two data points on the two sides of the right boundary of this cell. The curve segment between them crosses the right boundary of the cell. Since the curve only leaves the cell once, there cannot be another node that also overlaps the right boundary of the cell. Otherwise, the curve segments corresponding to the two nodes must overlap, which violates our packing strategy.

Type 2: When $u$ is in column $C$, the x-coordinate of any data point in the subtree of the node is in the range of $[a, b]$. There are $b - a + 1 = m$ distinct x-coordinates in the range, implying at most $m$ data points in the range (recall that all points have distinct x-coordinates). Each node at level $t$ can index $B^t$ data points. Thus, there are $O(1 + m/B^t)$ nodes of Type 2. In Fig. 4a, $b - a + 1 = 3 - 0 + 1 = 4$. The four data points in the gray column can form at most $m/B^t = 4/2 = 2$ nodes fully contained in the column, although there is just one such node in this example. If one particular point outside the column did not exist, the packing would shift and two more points in the column would form another node fully contained in the column.

Discussion. For an R-tree with a height $h \le \lceil \log_B n \rceil$, line $l$ intersects the MBRs of $O(\sqrt{n/B}) + O(\sqrt{n/B^2}) + \cdots + O(\sqrt{n/B^h}) = O(\sqrt{n/B})$ nodes.

The above proof assumed $n$ being a power of 2 and $d = 2$. When $n$ is not a power of 2, let $\rho = \lceil \log_2 n \rceil$. We enlarge the rank space to $[2^\rho]^2$. Line $l$ intersects $O(\sqrt{2^\rho/B})$ nodes in the enlarged rank space. We have $O(\sqrt{2^\rho/B}) = O(\sqrt{n/B})$ because $2^\rho < 2n$. The above argument can also be generalized to an arbitrary fixed dimensionality $d$ to prove that our query cost is bounded by $O((n/B)^{1-1/d} + k/B)$ in the worst case. This will be proven in Section 3.3.2, which proves an even stronger result that subsumes the aforementioned bound as a special case.

3.3.2 Query Sensitive Bound in Arbitrary Dimensionality

Consider a query rectangle $q = [x_1, y_1] \times [x_2, y_2] \times \cdots \times [x_d, y_d]$ in $[n]^d$, where $d$ is an arbitrary fixed dimensionality. For each $i \in [1, d]$, set $\lambda_i = y_i - x_i + 1$, and define $Z_i = \{1, 2, \ldots, d\} \setminus \{i\}$, namely, $Z_i$ includes all the integers from 1 to $d$ except $i$.
We will prove a stronger version of our previous lemma: our structure answers the query in

$$O\left(\log_B n + \Lambda^{1/d} / B^{1-1/d} + k/B\right) \qquad (1)$$

I/Os, where $\Lambda = \sum_{i=1}^{d} \prod_{j \in Z_i} \lambda_j$. In this bound, $\log_B n$, $\Lambda^{1/d}/B^{1-1/d}$, and $k/B$ denote the costs to map the query to rank space (query $q$ here is already mapped), to find the nodes intersecting the query boundary, and to output the points inside the query, respectively. The bound looks a bit unusual, so it helps to look at some special cases: for $d = 2$, the query cost is $O(\log_B n + \sqrt{(\lambda_1 + \lambda_2)/B} + k/B)$, while for $d = 3$, the cost becomes $O(\log_B n + (\lambda_1\lambda_2 + \lambda_1\lambda_3 + \lambda_2\lambda_3)^{1/3}/B^{2/3} + k/B)$.

Since $\lambda_i \le n$ for all $i \in [1, d]$, it always holds that $\prod_{j \in Z_i} \lambda_j \le n^{d-1}$ and $\Lambda \le d \cdot n^{d-1}$. Thus, Equation (1) is bounded by $O((d \cdot n^{d-1})^{1/d}/B^{1-1/d} + k/B) = O((n/B)^{1-1/d} + k/B)$. In other words, Equation (1) is never worse than the (query insensitive) bound established in Section 3.3.1, but could be substantially better when $q$ is small.

We say that the MBR of a node partially intersects $q$ if it has a non-empty intersection with $q$, but is not contained by $q$. We will prove the following lemma, which is sufficient for establishing our claim.

Lemma 2. The query rectangle $q$ partially intersects the MBRs of $O(1 + \Lambda^{1/d}/(B^t)^{1-1/d})$ nodes at level $t$.
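To make the query-sensitive term in Equation (1) concrete, the following sketch evaluates $\Lambda$ and $\Lambda^{1/d}/B^{1-1/d}$ for given side lengths $\lambda_1, \ldots, \lambda_d$ and compares the result with the query-insensitive term $(n/B)^{1-1/d}$. The function names and the sample values of $n$, $B$, and the side lengths are hypothetical, and constant factors hidden by the $O(\cdot)$ notation are ignored.

```python
from math import prod

def big_lambda(lams):
    """Lambda: the sum over i of the product of all lambda_j with j != i."""
    d = len(lams)
    return sum(prod(lams[j] for j in range(d) if j != i) for i in range(d))

def boundary_term(lams, B):
    """Query-sensitive boundary term Lambda^(1/d) / B^(1 - 1/d) of Equation (1)."""
    d = len(lams)
    return big_lambda(lams) ** (1.0 / d) / B ** (1.0 - 1.0 / d)

def insensitive_term(n, d, B):
    """Query-insensitive term (n/B)^(1 - 1/d)."""
    return (n / B) ** (1.0 - 1.0 / d)

# d = 2: Lambda = lambda_1 + lambda_2
print(big_lambda([3, 5]))        # 8
# d = 3: Lambda = l1*l2 + l1*l3 + l2*l3
print(big_lambda([2, 3, 4]))     # 26

# Hypothetical setting: a small window in a large 2-D rank space.
n, B = 1 << 20, 128
print(boundary_term([100, 100], B), insensitive_term(n, 2, B))
```

For a window of side length 100 in each dimension, the boundary term is far below the query-insensitive term, which illustrates why the refined bound can be substantially better for small queries.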

Proof. Let $m$ be the smallest power of 2 at least $(\Lambda B^t)^{1/d}$. Divide $[n]^d$ into an $\underbrace{(n/m) \times (n/m) \times \cdots \times (n/m)}_{d}$ grid G, where each cell has $m^d$ locations in $[n]^d$ (the cell's projection on each dimension covers $m$ values). For each $i \in [1, d]$, define a dimension-$i$ column of G as the maximal set of cells in G that have the same projection on dimension $i$. Grid G has $n/m$ dimension-$i$ columns, each of which is a $d$-dimensional rectangle in $[n]^d$ that covers the entire range $[n]$ on every dimension $j \ne i$.

We use Fig. b to illustrate the proof for $d = 2$ at the leaf node level. The query $q$ is the solid line rectangle, and $m = (\Lambda B^t)^{1/d} = \sqrt{(\lambda_1 + \lambda_2) B^t}$. The rank space is divided into the grid G represented by the black solid lines. Grid G has $n/m = 2$ dimension-1 (x-dimension) columns, i.e., the two vertical columns.

A node whose MBR partially intersects $q$ must intersect one of the $2d$ boundary faces of $q$ (e.g., edges of $q$ in Fig. b). We will prove that there can be at most $O(1 + m/B^t + \prod_{j \in Z_i} \lambda_j / m^{d-1})$ nodes intersecting each of the two faces of $q$ perpendicular to dimension $i$. Summing this up on all $d$ dimensions gives an upper bound on the total number of nodes that partially intersect $q$:

$$\sum_{i=1}^{d} O\left(1 + m/B^t + \prod_{j \in Z_i} \lambda_j / m^{d-1}\right) = O\left(d + d \cdot m/B^t + \Lambda/m^{d-1}\right) = O\left(1 + \Lambda^{1/d}/(B^t)^{1-1/d}\right)$$

as desired in the lemma.

Due to symmetry, it suffices to consider the face of $q$ that corresponds to $x_1$ (i.e., perpendicular to dimension 1); we refer to it as the dimension-1 left face of $q$. Let C be the dimension-1 column of G that covers this face; C is a rectangle that can be written as $[a, b] \times \underbrace{[n] \times [n] \times \cdots \times [n]}_{d-1}$ for some $a, b$ satisfying $b - a + 1 = m$, with $[a, b]$ aligned to the grid G. In Fig. b, dimension 1 is the x-dimension, and C is the gray column. Define the left boundary (or right boundary) of C to be the set of points in $[n]^d$ with coordinate $a$ (or $b$, respectively) on dimension 1.
Let $u$ be a level-$t$ node with an MBR intersecting the dimension-1 left face of $q$, and $X(u)$ be the projection of the MBR of $u$ on dimension 1; e.g., for a node N in the figure that intersects the left edge of $q$, $X(N)$ is the x-range of its MBR. Such $u$ can be of the following types:

Type 1: $a \in X(u)$ or $b \in X(u)$.
Type 2: $X(u) \subseteq [a, b]$.

Next, we analyze the number of nodes for each type.

Type 1: The Z-curve crosses the left boundary of C at most $O(\prod_{j=2}^{d} \lceil \lambda_j/m \rceil) = O(1 + \lambda_2 \lambda_3 \cdots \lambda_d / m^{d-1})$ times within the dimension-1 left face of $q$. This is because there are $O(\prod_{j=2}^{d} \lceil \lambda_j/m \rceil)$ cells of C within the range of $q$ in dimensions 2 to $d$ (e.g., in Fig. b, where $d = 2$, there are at most $1 + \lambda_2/m$ cells of C within the dimension-2 range of $q$), and the curve enumerates all the locations of a cell before moving to another cell. By the reasoning explained in the proof of the previous lemma, there are at most $O(\prod_{j=2}^{d} \lceil \lambda_j/m \rceil)$ nodes containing data points on both sides of the left boundary. The same applies to the right boundary. Therefore, the number of Type-1 nodes is $O(1 + \lambda_2 \lambda_3 \cdots \lambda_d / m^{d-1})$.

Type 2: All the $B^t$ points in the subtree of node $u$ must have x-coordinates between $a$ and $b$. There can be only $b - a + 1 = m$ such points (recall that all points have distinct coordinates in each dimension), implying at most $O(1 + m/B^t)$ such nodes.

It thus follows that the dimension-1 left face of $q$ intersects the MBRs of $O(1 + m/B^t + \prod_{j \in Z_1} \lambda_j / m^{d-1})$ nodes at level $t$. This completes the proof.

Extending to Other Space Filling Curves

Although we have used the Z-curve as the representative SFC, the only property that we require from the Z-curve is the following quad-tree recursive pattern. Divide the data space $[n]^d$ (where $n$ is a power of 2) into $2^d$ rectangles of the same size, i.e., each rectangle is a $d$-dimensional square with a projection length of $n/2$ on each dimension (recall how the root of a $d$-dimensional quad-tree would partition the space). For example, in the two-dimensional Fig. a, grid G is divided into $2^2 = 4$ squares (cells) of equal edge length.
The quad-tree recursive pattern says that the SFC must first enumerate all the points within a rectangle before starting to enumerate the points of another. In Fig. a, the Z-curve enumerates the points of the bottom-left cell before moving to the bottom-right cell. The pattern must be followed recursively within each rectangle, by treating it as a smaller data space $[n/2]^d$. All our proofs hold verbatim on any SFCs (e.g., the Hilbert curve) that obey this pattern.

PARALLEL R-TREE BULK LOADING

Next, we present a parallel R-tree bulk-loading algorithm based on our packing strategy. A straightforward parallel algorithm that bulk-loads an R-tree level by level requires $O(\log_B n)$ rounds of parallel computation. We show how to reduce the number of rounds to $O(\log_s n)$ without sacrificing the computation time. Here, $s$ denotes the number of data points that a machine participating in the parallel algorithm can handle. Modern machines can easily handle millions of data points, so $\log_s n$ is typically bounded by a constant.

The key idea of the proposed algorithm is to distribute the data points (or MBRs of tree nodes) in a way that the machines can bulk-load $O(\log_B s)$ levels of the final R-tree in each round. Then, $O(\log_s n)$ such rounds suffice to build the entire R-tree of $\log_B s \cdot \log_s n = \log_B n$ levels. To bulk-load $\log_B s$ levels in each round, a machine is assigned a subset of the data points (MBRs) that forms a few R-tree branches of $\log_B s$ levels independently from the data assigned to the other machines. This is feasible because we can assign data points to the machines in their sorted order for packing independently.

Parallel Computation Model

Without relying on a particular parallel platform such as Apache Hadoop, we design the parallel bulk-loading algorithm based on a generalized parallel model named the massively parallel communication (MPC) model [,, ]. Popular parallel frameworks such as MapReduce [] and Spark [9] are typical examples of this model. The MPC model makes the following assumptions.
Let $n$ be the input size, $g$ be the number of machines, and $s = n/g$.

In each round of parallel computation, every machine receives some data from other machines, performs computation, and sends some data to other machines. We consider only algorithms that require a machine to receive/send $O(s)$ words of information in each round (with the terminology of [], these algorithms must have load $O(s)$).

MPC algorithms are measured by: (i) the number of computation rounds $R$, (ii) the (parallel) running time $T$, and (iii) the total amount of computation $W$. The running time sums up the maximum computation cost of a single machine in each round; the total amount of computation sums up the computation costs of all machines in all rounds. Let $t_{M_i,r}$ be the time complexity of machine $M_i$ in round $r$. Then,

$$T = \sum_{r=1}^{R} \max_{i \in 1..g} t_{M_i,r}, \qquad W = \sum_{r=1}^{R} \sum_{i=1}^{g} t_{M_i,r}$$

For the purpose of building an R-tree, $W$ should not exceed the time complexity $O(n \log n)$ for a single-machine implementation of the proposed packing strategy; $T$ should be $O((n \log n)/g)$ to achieve a speedup of $g$ with $g$ machines.

A primitive operation we need is sorting. In the MPC model, sorting $n$ elements (initially evenly distributed on the $g$ machines) can be done in $O(\log_s n)$ rounds, $O((n \log n)/g)$ running time, and $O(n \log n)$ total amount of computation [] (see Tao et al. [] for a simple algorithm when $s \ge g \ln(g n)$ holds). Mapping $n$ data points to the rank space and sorting by Z-order values thus can be done in $O(\log_s n)$ rounds. This process takes $O((n \log n)/g)$ running time and $O(n \log n)$ total amount of computation. We focus on packing the sorted data points to form an R-tree next.

[Figure: Parallel R-tree bulk-loading]

Distributed Packing

Every round bulk-loads $\Theta(\log_B s)$ levels of the target R-tree. In the first round, $O(s)$ consecutive data points are assigned to a machine by the ascending order of their Z-order values, where an R-tree of $\Theta(\log_B s)$ levels is bulk-loaded locally. This creates $O(n/s)$ R-trees.
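The first round described above can be sketched on a single process as follows. This is a minimal two-dimensional illustration, not the authors' implementation: all function names are hypothetical, ties in the rank-space mapping are broken by input position (an assumption), each "machine" is just a slice of the sorted list, and only the MBRs are tracked rather than full tree nodes. It assumes a fanout of at least 2.

```python
def to_rank_space(points):
    """Map 2-D points to ranks 0..n-1 per dimension (ties broken by input position)."""
    n = len(points)
    ranks = [[0, 0] for _ in range(n)]
    for dim in (0, 1):
        order = sorted(range(n), key=lambda i: (points[i][dim], i))
        for rank, i in enumerate(order):
            ranks[i][dim] = rank
    return [tuple(r) for r in ranks]


def z_value(x, y, bits):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z


def union(boxes):
    """Smallest rectangle (x_lo, y_lo, x_hi, y_hi) enclosing all given rectangles."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def pack(boxes, B):
    """Pack B consecutive entries per node, level by level; return the root MBR (B >= 2)."""
    level = boxes
    while len(level) > 1:
        level = [union(level[i:i + B]) for i in range(0, len(level), B)]
    return level[0]


def first_round(points, s, B):
    """Simulate round one: sort by Z-order in rank space, give s points to each machine."""
    ranked = to_rank_space(points)
    bits = max(1, (len(points) - 1).bit_length())
    ranked.sort(key=lambda p: z_value(p[0], p[1], bits))
    chunks = [ranked[i:i + s] for i in range(0, len(ranked), s)]
    # Each machine packs its own chunk independently and reports one root MBR.
    return [pack([(x, y, x, y) for x, y in chunk], B) for chunk in chunks]
```

Because every slice is consecutive in Z-order, the machines never need to exchange points during packing. A later round can be simulated by calling `pack` on the returned root MBRs.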
A second round bulk-loads the next $\Theta(\log_B s)$ levels of the target R-tree over the root MBRs of those $O(n/s)$ R-trees. For this purpose, $O(1 + g/s)$ machines are used, each assigned $O(s)$ root MBRs; this results in $O(n/s^2)$ tree roots. The above process repeats until the MBRs can all be bulk-loaded in a single machine (the number of participating machines decreases by a factor of $\Theta(s)$ each time, while each such machine is always assigned $O(s)$ MBRs). A total of $O(\log_B n / \log_B s) = O(\log_s n)$ rounds are incurred, where $O(s \log_s n) = O((n \log_s n)/g)$ running time and $O(n)$ total amount of computation are taken to compute the MBRs.

The figure illustrates the rounds for a small example that needs two rounds in total. In the first round, every machine is assigned $s$ data points and bulk-loads an R-tree of $\log_B s$ levels locally. The MBRs of the roots of these local R-trees are bulk-loaded by a single machine in the second round. We omit the pseudo-code of the parallel bulk-loading algorithm as it is similar to the single-machine algorithm presented earlier, except that now a machine handles $O(s)$ data points instead of $n$, and the loop that bulk-loads the R-tree is broken into rounds.

UPDATE HANDLING

Our R-tree structure can also allow dynamic data updates. An object to be removed can be simply deleted from our R-tree in the same way as for a conventional R-tree. An object to be inserted needs to be first mapped to the rank space in a similar way to that of query window mapping. Let $(x_e, y_e)$ be the coordinates of the object to be inserted in the original space. If $x_e$ ($y_e$) equals the x-coordinates (y-coordinates) of some existing data points, $x_e$ ($y_e$) is simply mapped to the largest x-rank (y-rank) of those data points. Otherwise, $x_e$ ($y_e$) is mapped to two x-coordinates (y-coordinates) in the rank space.
Here, the first of the two is the smallest rank of the data points whose x-coordinates (y-coordinates) in the original space are greater than $x_e$ ($y_e$); the second is the largest rank of the data points whose x-coordinates (y-coordinates) in the original space are smaller than $x_e$ ($y_e$). The resulting two x-coordinates and two y-coordinates may form four points in the rank space, one for each combination. To ensure no false negatives in query processing, the object needs to be mapped to all these points. Then, it can be inserted into our R-tree in the same way as for a conventional R-tree.

Allowing dynamic data updates compromises the optimal worst-case I/O cost. A periodic rebuild of the R-tree is needed for optimal performance. We target applications where the data set is relatively static, e.g., buildings in a map tend not to change every second. The parallel bulk-loading algorithm proposed above can be used to periodically rebuild the R-tree to cope with data updates. As our experiments show, an R-tree over hundreds of millions of data objects can be built within minutes. This should satisfy the targeted applications.

EXPERIMENTS

This section reports experimental results comparing the bulk-loading and window query performance of the R-tree constructed by the proposed strategy with those of the STR-tree [], Hilbert R-tree [0], TGS R-tree [], and PR-tree [], which have been described earlier. We denote the baseline algorithms by STR, HR, TGS, and PR, respectively. We denote the proposed algorithm by ZR because it builds an R-tree based on Z-order values. As discussed above, the proposed packing strategy is also applicable to other space-filling curves such as the Hilbert curve. To demonstrate this applicability, we further implement an R-tree based on the proposed packing strategy where the data points are sorted and packed by their Hilbert order values in the rank space. We denote this R-tree by HRR. This R-tree shares a similar structure with the Hilbert R-tree, except that the data points are mapped to the rank space before they are packed. Note that the query cost bounds derived above hold for this tree. Following previous studies [,, 0, ], we focus on the I/O cost of the algorithms.

Table: Parameters and Their Settings
Parameter                Setting
Distribution             Uniform, Skew, Cluster
n                        0.M, M, M, 0M, 0M
α                        ,,,, 9
Query window area (%)    0.00, 0.0, 0.0, 0., 0.,,

[Figure panels: (a) Tiger data, (b) Skew data]

Experimental Setup

The window query experiments are run on a -bit machine running Ubuntu.0 with a.0 GHz Intel i CPU, GB memory, and a 00 GB hard disk. We use Ke Yi's single-machine implementation [] of the Hilbert R-tree, TGS R-tree, and PR-tree, which uses the TPIE library [], a C++ library that provides APIs for implementing external memory algorithms and data structures. For ease of comparison, we also implement a single-machine version of the STR-tree and the proposed HRR and ZR R-trees using TPIE. In all the R-tree structures, we use 0 bytes for each entry in a node. For an inner node entry, these 0 bytes include bytes for the coordinates ( bytes each) of an MBR and bytes for a pointer to the disk block storing the corresponding child node. For a leaf node entry, these 0 bytes include bytes for an ID of a data point and bytes for the coordinates, also in the form of an MBR for ease of implementation. We use a block size of KB, and the maximum fanout of a node is 0.

The bulk-loading experiments are run on the single machine described above and on a cluster. The parallel bulk-loading algorithms are implemented with Scala and run on Apache Spark..0-SNAPSHOT, which also supports the MapReduce model but is more efficient than Hadoop MapReduce.
We use a virtual-node cluster created from an academic computing cloud (Nectar []) running on OpenStack. Each virtual-node has GB memory and cores running at. GHz. One of the nodes acts as the master and the other nodes act as slaves. Each core simulates a worker machine, and hence there are 0 worker machines in total, i.e., g = 0. The network bandwidth is up to 00 Mbps. We use Apache Hadoop..0 with Yarn as the resource manager.

We use both real and synthetic data sets. The real data set contains,,9 rectangles (90 MB in size) representing geographical features in eastern states of the USA extracted from the TIGER/Line 00SE data [9]. We use the centers of the rectangles as our data points. We denote this data set as Tiger. Figure a illustrates the data set.

Synthetic data sets are generated with a space domain of where the data set cardinality ranges from 0. to 0 million. We generate three groups of synthetic data sets, denoted as Uniform, Skew, and Cluster, respectively. The Uniform data sets contain data points with uniform distribution. The Skew and Cluster data sets are generated following the PR-tree paper []. A Skew data set is generated from a Uniform data set by raising the y-coordinates to a power, i.e., the coordinates of a randomly generated data point are converted from $(x, y)$ to $(x, y^\alpha)$ for a skewness parameter $\alpha$. Figure b illustrates a Skew data set where α = 9. The Cluster data set contains 0,000 clusters with centers evenly distributed on a horizontal line. Each cluster has,000 points following a uniform distribution in a square around the center. Figure c illustrates a portion of the Cluster data set with four clusters.

[Figure panel: (c) Cluster data. Figure: Experimental data]

We vary query window size, data set size, and data skewness in the experiments. The experimental parameters are summarized in the parameter table, where default values are in bold.

Results

We present the experimental results in this subsection.

Window Query Processing

We start with the window query performance of the R-trees. We generate 00 queries at random locations in each experiment. For ease of comparison, we follow previous studies [,, 0, ] and report the average I/O cost per query relative to the output size. Let the number of blocks read for a query be $I$ and the output size be $k/B$. We report $I/(k/B)$. Note that $I/(k/B) \ge 1$, i.e., we need to at least read all the blocks containing the data points in the query answer. A smaller value of $I/(k/B)$ is preferable.

[Figure: Query I/O cost on Uniform data (varying query window size). Axes: query window size (%) vs. I/O cost; series: HR, HRR, PR, STR, TGS, ZR]

Varying query window size. We first vary the area of the query window from 0.00% to % of the data space. The figure shows the query I/O cost relative to the output size $k/B$ over 0 million uniform data points. A general observation is that the relative query costs of the R-trees created by the various packing strategies decrease as the query window area increases. This is because a larger query window overlapping a tree node has a better chance of overlapping the data points in this node, i.e., a lower percentage of the query I/Os are extra I/Os that do not contribute to the output. Meanwhile, the R-trees HRR and ZR created by the proposed packing strategy have the smallest query I/O costs. The advantage is more significant with smaller query windows, e.g., when the query window area is 0.00% of the data space, the query I/O costs of HRR and ZR are 0% and % lower than that of PR, respectively (note the logarithmic scale). HRR and ZR also outperform HR, STR, and TGS. This demonstrates that HRR and ZR not only have an asymptotically optimal cost in the worst case but also perform well in other cases. HRR in particular outperforms HR by up to % when the query window area is 0.00%. This advantage is attributed to the rank space mapping before packing the data points. The proposed packing strategy effectively incorporates the designs of both HR and STR, which have reported good performance on non-extreme data.

We also notice that HRR outperforms ZR. This suggests that when packing data points in the same rank space, the Hilbert curve yields a better packed R-tree than the Z-curve does. This result is consistent with an earlier study [] that compares the query performance of R-trees packed with the Hilbert curve and the Z-curve in the same Euclidean space.

To help further understand the benefit of the proposed packing strategy, we list the average output size ($k/B$) per query as follows. For the seven different query window areas tested (i.e., from 0.00% to % of the data space size), the output sizes are.9, 9.,., 9.9,., 9.0, and.0, respectively. Based on these output sizes and the relative query I/O costs shown in the figure, we can derive the absolute query I/O costs of the different R-trees.
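The conversion between relative and absolute I/O cost used here is a single multiplication by the output size. A minimal sketch follows; the function names and the numbers in the usage lines are hypothetical and are not measurements from the paper.

```python
def relative_io(blocks_read, k, B):
    """I / (k/B): blocks read per block of output (1.0 would mean no wasted reads)."""
    return blocks_read / (k / B)


def absolute_io(relative_cost, output_blocks):
    """Recover the number of blocks read from a relative cost and the output size k/B."""
    return relative_cost * output_blocks


if __name__ == "__main__":
    output_blocks = 40.0                  # hypothetical average output size k/B
    baseline, improved = 1.5, 1.25        # hypothetical relative costs of two trees
    saved = absolute_io(baseline, output_blocks) - absolute_io(improved, output_blocks)
    print(saved)                          # blocks saved per query
```

A small gap in relative cost therefore translates into more saved blocks per query as the output size grows.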
For example, at a query window area of %, the relative query I/O costs of HRR, PR, and ZR are.09,., and., which correspond to 9.9 (.0 ×.09), 0.0 (.0 ×.), and 0.09 (.0 ×.) I/Os, respectively. This means that HRR and ZR have. and. fewer I/Os than PR, respectively. While these numbers might seem small, they are improvements per query. For target applications such as digital mapping, there can be millions of user queries to be processed at the same time. The accumulated benefit of HRR and ZR over such a large number of queries is non-trivial.

Varying data set cardinality. Next, we vary the data set cardinality from 0. to 0 million while keeping the query window size at 0.0% of the data space (output size ranging from 0.9 to 9.). We see from the corresponding figure that HRR and ZR outperform PR consistently, reducing the relative query I/O costs by about 0% and 0%, respectively. To be fair, the PR-tree is designed for rectangles. It may not be optimal on point data, for which HRR and ZR are designed. HR and STR have query performance closer to that of HRR and ZR because they share a similar design with HRR and ZR. However, ZR still has a lower cost when the data set cardinality exceeds one million, while HRR has the lowest cost consistently. This again demonstrates the advantage of using the rank space for indexing. The performance of TGS fluctuates the most. It relies on a heuristic optimization function when packing the data points, i.e., minimizing the area of the MBRs, which may not be optimal for all cases.

[Figure 9: Query I/O cost on Skew data. Axes: skewness vs. I/O cost; series: HR, HRR, PR, STR, TGS, ZR]

Varying skewness. Figure 9 shows the query I/O performance on Skew data where the skewness parameter α is varied from to 9 (output size ranging from 9. to.). As the skewness increases, the relative I/O costs of all the R-trees increase except for the TGS R-tree.
This is natural as more skewed data makes it more difficult to reduce overlaps between the MBRs of different tree nodes, leading to more blocks being accessed for query processing. TGS is an exception. Its MBR-minimizing optimization function leads to better packing for more skewed data. Regardless of this, HRR and ZR still outperform TGS and the other packing strategies. Their relative query I/O costs are up to % and 0% lower than those of PR, respectively. STR is again the closest because it shares a similar packing strategy with ZR.

[Figure: Query I/O cost on Uniform data (varying data set cardinality). Axes: data set size (million) vs. I/O cost; series: HR, HRR, PR, STR, TGS, ZR]

[Figure 0: Query I/O cost on Tiger data. Axes: query window size (%) vs. I/O cost; series: HR, HRR, PR, STR, TGS, ZR]
