Running head General Balanced Trees Author's address Arne Andersson Department of Computer Science Lund University BOX 118 S-1 00 Lund Sweden

Size: px

Start display at page:

Download "Running head General Balanced Trees Author's address Arne Andersson Department of Computer Science Lund University BOX 118 S-1 00 Lund Sweden"

Alisha Hopkins
5 years ago
Views:

1 General Balanced Trees Arne Andersson Department of Computer Science Lund University Sweden 1

2 Running head General Balanced Trees Author's address Arne Andersson Department of Computer Science Lund University BOX 118 S-1 00 Lund Sweden

3 Abstract We show that, in order to achieve eæcient maintenance of a balanced binary search tree, no shape restriction other than a logarithmic height is required. The obtained class of trees, general balanced trees, may be maintained at a logarithmic amortized cost with no balance information stored in the nodes. Thus, in the case when amortized bounds are suæcient, there is no need for sophisticated balance criteria. The maintenance algorithms use partial rebuilding. This is important for certain applications, and has previously been used with weight-balanced trees. We show that the amortized cost incurred by general balanced trees is lower than what has been shown for weight-balanced trees. 3

4 1 Introduction One of the fundamental data structures in computer science is the binary search tree. New methods to maintain data in search trees have been developed and thoroughly analyzed all through the history of the discipline. Attention has mainly been focused on trees with bounded height or balanced trees. The reason for this is obvious since worst-case access time is proportional to the height of a tree. The traditional way to maintain balance at a low cost is by means of some more os less sophisticated balance criterion. To illustrate the large variation in the world of balance criteria some examples are given below. We use jt j to denote the number of leaves èweightè of the tree T. AVL-trees, introduced by Adelson-Velskii and Landis ë1ë, are deæned by a balance criterion requiring the heights of the two subtrees of each node to diæer by at most 1. AVL-trees have a maximum height of 1:44 log jt j. Symmetric binary B-trees, or SBB-trees, were introduced by Bayer ë6ë. The very same class of trees often occurs under the name red-black trees, due to Guibas and Sedgewick ë14ë. The edges in an SBB-tree are of two types: horizontal and vertical. Two adjacent edges are never both horizontal and the number of vertical edges on the path from the root to a leaf is the same for all leaves. This criterion guarantees a maximum height oflogjt j. Weight-balanced trees were introduced by Nievergelt and Reingold ë0ë. For each nodein the tree, the ratio of the weights of its two subtrees is restricted. Given a parameter æ, 0 æé1=, for each nodev in the tree, the quotient between the weight of èi.e. the number of leaves inè v's smallest subtree and the weight of v itself must be at least æ. The maximum height ofaweight-balanced tree is log jt j log 1 1,æ æ-balanced trees were introduced by Olivie ë1ë. These trees are path-balanced. Given a parameter æ, 0éæé1, for each node in the tree, the quotient between the lengths of the shortest and longest outgoing paths must be at least æ. This guarantees a maximum height of log jt j æ. kíneighbour trees were introduced by Maurer, Ottmann, and Six ë18ë. They are unarybinary trees where all leaves have the same depth. The number of binary nodes between two unary nodes at the same level is at least k. A k-neighbour tree has a height ofatmost log jt j. logè,1=èk+1èè For each of these classes of trees, computing the maximum height from the balance criterion is a nontrivial exercise. Other examples of balanced trees are HBèkè-trees ë1ë, one-sided height-balanced trees ë15, 16ë, and SBBèkè-trees ë4ë. Note that the splay trees, introduced by Sleator and Tarjan ë4ë, are not strictly balanced since the height of a splay tree T may be æèjt jè. A natural question that arises from the study of balanced trees is whether we really need to make a detour over those balance criteria when the only thing we want èmostlyè is a logarithmic height. The disadvantage of using a sophisticated criterion is illustrated by. 4

5 Figure 1: A well-balanced tree? the tree in Figure 1. This tree has the smallest possible height with respect to its weight; however, according to the balance criteria mentioned above it is not well-balanced at all. In this article we show that, as long as we are concerned with the amortized cost of maintenance, we can replace the criteria above by a weak global criterion; we just have to specify the relation between the size and the maximum height of the tree. Not only is the balance criterion simple; the maintenance algorithms are also simple and they can be implemented without keeping any balance information in the nodes. This class of trees, called general balanced trees, is maintained by partial rebuilding. The method of partial rebuilding has been used by Overmars and van Leeuwen ë, 3ë to maintain weight-balanced trees. It can also be used to maintain a modiæed version of æ-balanced trees ë3ë. Partial rebuilding is an attractive method in the sense that it is useful not only for ordinary binary search trees, but also for more complicated data structures, such as multidimensional search trees ë7ë, where other balancing methods do not work. Making a careful analysis we are able to show a lower maintenance cost of general balanced trees than what has been shown for weight-balanced trees. Thus, we improve the method of partial rebuilding in terms of maintenance cost. A preliminary version of this article is published in Proceedings of Workshop on Algorithms and Data Structures, WADS '89, Ottawa ëë. An independent article has also been presented by Galperin and Rivest at the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'93 ë13ë. Preliminaries In the following we do not distinguish between nodes and subtrees and by the subtree v we mean the subtree rooted at the node v and by T we mean the entire tree èor the root of the treeè. We deal with binary search trees where each internal node contains one element and each leaf is empty. The number of edges on the path from the root T to the node v is the depth of v. The largest number of edges from a node v to a leaf is the height of v, denoted hèvè. The weight of v equals the number of leaves in v and is denoted. Note that atreecontaining n internal nodes èor elementsè has aweight of n +1. 5

6 Let v H and v L denote the node v's highest and lowest subtrees respectively, ties are broken arbitrarily. As a measure of the shape of the subtree v we use the weight diæerence at v, æèvè, deæned as æèvè = Max è0; jv H j,jv L j,1è è1è As the basic restructuring operation, our algorithms use a procedure for rebuilding a tree to perfect balance, as deæned below. Deænition 1 A binary tree v is perfectly balanced if and only if and æèvè = 0 and v's both subtrees are perfectly balanced trees. In the following we assume that rebuilding a èsub-ètree v takes Oèè time. Examples of linear algorithms for balancing trees can be found in the literature ë10, 11, 13, 17, 5ë. We also assume that updates are performed by adding or removing nodes at the lowest level of the tree. For each nodev, an update below v changes the value of æèvè by at most one. Hence, at least æèvè updates have been made in the subtree v since the last time it was perfectly balanced. 3 Main result The main idea in maintaining a general balanced tree is to let the tree take any shape as long as its height does not exceed dc log jt je for some constant cé1. When this criterion is violated, the height can be decreased by partial rebuilding at a low amortized cost. This is due to the following observation on the shape of an unbalanced tree: Lemma 1 Let T be a binary tree, hèt è é dc log jt je. Let v be the lowest node on èany ofè T 's longest pathèsè such that hèvè é dc log e. Then, æèvè é 1,1=c, 1,1 Proof: Since v is the lowest node satisfying hèvè é dc log e we know that hèv H è dc log jv H je. Hence, èè This gives that dc log e é hèvè =hèv H è+1dc log jv H je +1 dc log e é dc log jv H je +1 jv H j é,1=c è3è è4è From the fact that jv L j =,jv H j we can compute the value of æèvè: æèvè =jv H j,è,jv H jè, 1 =jv H j,,1 é 1,1=c, 1,1: è5è By a straightforward application of Lemma 1 we can maintain a balanced tree eæciently during insertions. 6

7 Theorem 1 If no deletions are made, a binary search tree T with maximum height dc log jt je, c é 1, can be maintained at an amortized cost of Oèlog jt jè per insertion. No balance information is needed in the nodes, only one global integer is needed. Proof: We let the global integer contain the value of jt j. During insertion, we keep track of the depth of the insertion path. Whenever the depth of a new leaf exceeds dc log jt je we back up along the path until we ænd the lowest node v, hèvè é dc log e. We make a partial rebuilding at v. In order to locate the node v, we need to keep track of the heights and weights of the visited nodes as we follow the path upwards. The weights are computed by explicitly counting the nodes in all subtrees along the path. The total cost of this counting is Oèè. After the rebuilding, hèvè = dlog e. This implies that the height of v, and therefore also the height of T, does not exceed the height before the insertion. Hence, if hèt è dc log jt je held before the insertion, it will also hold after. Since the height of an empty tree is zero, the height condition holds by induction. The cost of the rebalancing, including the cost of locating v, is Oèè. According to Lemma 1, æèvè has been changed æèè times since the last time v was involved in a rebuilding. Hence, by reserving a constant amount of extra time each time the weight diæerence is changed at a node, enough time will be saved to cover the cost of rebuilding. Since an update aæects the weight diæerence of Oèlog nè nodes, the amortized cost of insertions will be Oèlog nè. In order to handle deletions eæciently, we make a simple extension to the algorithm above; when enough deletions have been made to cover the cost, we rebuild the entire tree to perfect balance. Theorem Given constants cé1, and b é 0, a balanced tree T with maximum height dc log jt j + be may be maintained without any balance information stored in the nodes, using two global integers, at an amortized cost of Oèlog jt jè per update. Proof: We let the two global integers contain jt j, thenumberofleaves in jt j, and dèt è, the number of deletions made since the last time T was globally rebuilt. Updates are performed in the following way: Insertion: If the depth of the new leaf exceeds dc logèjt j +dèt èèe we back up along the insertion path until we ænd the lowest node v, hèvè é dc log e, where a partial rebuilding is made. Deletion: dèt è increases by one.if dèt è è b=c, 1èjT j we rebuild T to perfect balance and set dèt è=0. A deletion does not increase hèt è. Furthermore, after an insertion we are guaranteed that hèt è dc logèjt j +dètèèe, provided that this relation holds before the insertion. Hence, by induction, hèt è dclogèjt j +dètèèe. Since dèt è é è b=c, 1èjT j, we conclude that hèt è dc logèjt j +dèt èèe l c log è b=c èjt jm = dc log jt j + be: è6è Rebuildings are made on two occasions: when the number of deletions get too large and when the tree gets too high during insertion. The amortized cost for the ærst type of 7

8 M D Q K P T H S X J R V Z U Figure : A GBè1:è-tree which requires rebalancing. rebuilding is constant for each deletion. For the second case we use the same argument as in the proof of Theorem 1. By reserving a constant time each time an update èinsertion or deletionè changes the weight diæerence of a node, we save enough time to cover partial rebuildings. Since the trees maintained in Theorems 1 and are allowed to take any shape as long as their height is low enough, we call them general balanced trees. We use the notation GB-trees or GBècè-trees, where c is the height constant used in the theorems. èthe constant b in Theorem is omitted in this notation.è Example: Figure shows a GBè1:è-tree maintained as in the proof of Theorem. We assume that two deletions have been made since the last global rebuilding and that node U is being inserted. The path to U is too long since hèt è = 6 é d1: logè15 + èe = d1: logèjt j +dètèèe. We have to make a partial rebuilding at one of the nodes on that path. Making a depth-ærst search we ænd that hèqè = 5 é d1: log 10e = d1:logjqje. A partial rebuilding is made at Q and the insertion is completed. The resulting tree is shown in Figure 3. 4 Analysis II: the constant factor Above we proved our main result; a GB-tree can be maintained at Oèlog nè amortized cost per update. In this section, we make a more detailed study. The purpose of this study is to show a better constant factor for GB-trees than what has previously been shown for weight-balanced trees. When analysing the constant factor, by cost we mean the amount of restructuring 8

9 M D T K R X H Q S V Z J P U Figure 3: The tree in Figure after a partial rebuilding. work needed per update. To be more precise, we let the cost of a partial rebuilding equal the number of internal nodes involved. Thus, the cost of a partial rebuilding at node v is,1. Other costs, such as counting sizes of trees, creating new nodes or removing nodes, etc are ignored. The comparison with weight-balanced trees is made in Section 5. Although this was not made in Theorem, the result of Lemma 1 allows us to compute a constant factor, telling an upper bound on the amount of rebalancing work spent per update. We chose to express this work in terms of the number of internal nodes involved in a rebuilding. Thus, the cost of a partial rebuilding at node v is,1. According to Lemma 1, when a partial rebuilding is to be made at node v, we have that æèvè é 1,1=c, 1,1: 1 Thus, the amortized cost of changing the æ-value of a node during an update is. 1,1=c,1 Since an update is made below at most dc log jt je nodes, the amortized cost per update dc log jt je in the entire tree is. 1,1=c,1 The improved analysis is based on the fact that when a node becomes unbalanced there is not only a diæerence in weight between its two subtrees but also a certain imbalance at the lower levels of the tree. In order to measure this imbalance we deæne a function èvè asfollows. èvè = è 0 if v is a leaf æèvè+èv H è+èv L è otherwise, where æèvè is deæned as in the previous section. The imbalance along the longest path from a node v can be expressed using the function æèvè, deæned below. æèvè = è 0 if v is a leaf æèvè+æèv H è otherwise. 9 è7è è8è è9è

10 From the deænitions follows that æèvè æèvè èvè: è10è Below, in Lemma 5, we compute the value of èvè whenv is about to be rebuilt. In order to do that, we ærst show which conæguration gives the lowest possible value of æèvè in Lemmas 3 and 4. We say that a node v has minimal weight if decreasing its weight by one without changing its height would make it out of balance, i.e. if hèvè é dc logè,1èe. Lemma If u has minimal or less weight and u H ju H j,ju L j0. has minimal or larger weight, then Proof: The conditions for the lemma give hèuè é dc logèjuj,1èe and hèu H è dclog ju H je: Hence, dc logèjuj,1èe éhèuè =hèu H è+1dclog ju H je +1: Due to the strict inequality, we can remove the ceilings. Hence, c logèjuj,1è éclog ju H j +1: è11è è1è è13è è14è This gives that ju H j é,1=c èjuj,1è é èjuj,1è= è15è and hence ju H j,ju L j =ju H j,juj é èjuj,1è=,juj é,1: è16è The lemma follows since ju H j,ju L j is an integer. Lemma 3 Let v be a node that is about to be rebuilt during an insertion. Furthermore, assume that is æxed. Then, the smallest possible value of æèvè occurs when each descendant w 6= v along the longest path from v satisæes at least one of the following: æ w has minimal weight; æ jw L j =1. Proof: We prove the lemma by acontradiction. Assume that the smallest value of æèvè occurs only when there exists some nodes on v's longest path which does not satisfy any of the two statements above. Let w be the highest such node and u be the parent of w. That is, w is the same node as u H. The node u will either be v itself, or it may have minimal weight or an empty subtree. Thus, u has minimal or less weight, w has more than minimal weight, and w L é 1. Consider moving a node from w L to u L. Since jw L j é 1, there is a node that can be moved. After the move, w will have minimal or larger weight, while u's weight will be 10

11 the same as before. The overall condition, that a rebuilding is to be made at v, is not aæected by the move. Lemma implies that ju H j,ju L j 0 after the move. This, in turn, implies that ju H j,ju L j was at least before the move. Thus, the move caused æèuè to decrease by at least 1, while æèwè has changed by at most 1. Altogether, the move will not cause æèuè, and hence æèvè, to increase. We can now continue moving nodes until w satisæes the two conditions in the lemma. This gives the contradiction. Lemma 4 For each node v with an empty subtree or minimal weight the following is true: æèvè é 1,1=c, 1,3 Proof: If v has an empty subtree then we are done since æèvè =,1. Otherwise, since v has its smallest possible weight we have è17è hèvè é dc logè,1èe è18è This and the fact that dc log jv H je hèv H è implies that dc log jv H je +1 hèv H è+1=hèvè é dc logè,1èe: è19è Thus, c log jv H j +1éclogè,1è 1=c jv H j é,1 jv H j é,1=c è,1è é,1=c,1 è0è From this follows that æèvè =jv H j,è,jv H jè, 1 =jv H j,,1 é,1=c,1,,1 1,1=c, 1,3: è1è Given the result of Lemma 3 and 4 we can compute the value of èvè whenv is about to be rebuilt. Lemma 5 Let v be a node where a rebuilding is to be made during an insertion. Then, èvè é, 1=c æ è,1è, 3dc log e èè 1=c, 1 11

12 Proof: Let v d be v's dth descendant on the longest path from v. We have that hèv d è+d =hèvè; hèvè é dc log e; and, for d 1, hèv d è dclog jv d je: Combining Eqs. è3è, è4è, and è5è gives è3è è4è è5è dc log jv d je + d hèv d è+d =hèvè é dc log e d=c jv d j é jv d j é,d=c è6è Assume, hypothetically, that v has a minimal value of æ. Combining Lemmas 3 and 4 with Eq. è6è gives that fore every d 1: æèv d è é 1,1=c, 1,d=c,3 è7è For d = 0, the same inequality holds as shown by Lemma 1. Adding the weight diæerences on v's longest path gives èvè æèvè é X dc log e,1 d=0 æèv d è dc log e,1 X 1,1=c, 1 = 1,1=c, 1 æ d=0,d=c, 1,,dc log e=c dc log e,1 X d=0,3dc log e 1,,1=c è, 1=c 1=c, 1 æ 1 1!,,3dc log e =, 1=c 1=c æ è,1è, 3dc log e: è8è, 1 Note that the hypothesis on æèvè is not required. Finally, we are able to give an upper bound on the amount of restructuring work needed to maintain a general balanced tree. In the analysis we assume that the cost of rebuilding a subtree v equals the number of elements in v, which is,1. Theorem 3 Provided that the cost of rebuilding a subtree v is,1 and that cé1 and b é 0, a binary search tree T of height at most dc log jt j + be can be maintained at an amortized restructuring cost of 1=c,1 c log jt j + oèlog jt jè per update., 1=c 3 1

13 Proof: We use the algorithm from Theorem. First, we note that the rebuildings made to compensate for deletions èi.e. when dèt è becomes too largeè only require a constant amortized cost per deletion; the constant depends on the relation between b and c. We ignore the cost for these rebuildings since it is included in the oèlog nè term in the amortized update cost. We associate the tree T with a potential function æèt è. The function æ is chosen such that each update èbefore any rebuildingè increases its value by an amount that is at most 1=c,1 c log jt j + oèlog jt jè. Each rebuilding decreases æèt èby an amount that covers the, 1=c cost of the rebuilding. By showing that the potential is always positive or zero, we prove that the amortized cost of restructuring is at most 1=c,1 c log jt j + oèlog jt jè., 1=c Before we give thepotential function, we would like to make an addition to the deænition of v H and v L from Section. Previously, wehave only used v H and v L in association with nodes that are about to be involved in a partial rebuilding. In order to make our potential argument hold also for nodes that are not about to be involved in a rebuilding, we let the potential function look into the future. For each node v, we let v H be that of v's subtrees which is going to be highest the next time v is involved in a rebuilding, regardless of which subtree is currently the highest. If v will never be involved in any future rebuilding, or if both subtrees will have the same height, we take v H as the right subtree. This deænition implies that we will not be able to compute the exact value of our potential function at a certain moment. However, at each update, we can still compute upper and lower bounds on the change in potential, which is enough for our purposes. In particular, this deænition implies that a partial rebuilding at the node v will not aæect the values of w L and w H for any node w outside the subtree v. The potential æèt è is chosen as where the function èvè is chosen as =c, 1=c æèt è= 1=c, 1 æ èt è+4 1=c, è, 1=c è èt è è9è èvè = 8 é : 0 if v isaleaf æèvèædc log e + èv H è+èv L è otherwise. Note that for a perfectly balanced tree, æèt è = 0. A single update causes the value of dc log e èt è to be changed by at most for each ancestor v of the insertedèdeleted node. By deænition of weight and height we have that hèvè + 1,which implies that è30è dc log e dc logèhèvè+1èe hèvè+1 è31è Thus, the ancestor at height h causes an increase of èt èby at most an update causes the following change in èt è: æèvè = O dc log jt j+b+1e X dc logèh +1èe h=1 log dc log jt je h dc logèh+1èe h+1. Therefore, = oèlog jt jè è3è 13

14 During an update the value of èt èischanged by at most the number of ancestors of the insertedèdeleted node. The number of ancestors is at most dc log jt j + be. Altogether we get an increase of æ by =c, 1=c ææèt è= 1=c, 1 æ æèt è+4, 1=c è, 1=c è æèt è 1=c, 1 c log jt j + oèlog jt jè è33è, 1=c per update. Left to show is that the restructuring cost is covered by the potential function æ. Since the cost of rebuilding trees of constant size can be included in the oèlognè term in the theorem, we may w.l.o.g. assume that 4 é æ 1=c : è34è, 1=c Using the value of æèvè from Lemma 1 in the deænition of èvè, it follows that when v is about to be rebuilt, æèvè ædcælog e èvè è, 1=c é,1 1=c! =, 1=c 1=c ædc æ log e, dc æ log e æ dc æ log e : è35è After a rebuilding at the node v æèvè = 0. Furthermore, for each node w in T which is not part of the subtree v, the values of w H and w L remain unchanged after the rebuilding èdue to the redeænition made aboveè. Hence, Eq. è9è implies a decrease of T 's potential such that =c, 1=c ææèt è=æèvè = 1=c, 1 æ èvè+4, 1=c è, 1=c è èvè: è36è Taking èvè from Lemma 5 and èvè from Eq. 35 we get è! æèvè é 1=c, 1, 1=c, 1=c 1=c, 1 æ è,1è, 3dc log e + 4 è æ =c, 1=c, 1=c è, 1=c è dc log e, 1=c =,1, 1=c, 1, 1=c æ 3dc log e + 1=c, 1, æ 4dc log 4 æ e, 1=c =,1+ 1=c, 1, 1=c, 4 æ 14 =c, 1=c è, 1=c è =c, 1=c è, 1=c è 1 æ dc log e! dc log e æ 1 A ædc log e: è37è

15 Eq. è34è implies that the last term is positive. Hence ææèt è é,1, which implies that the decrease in potential covers the cost of rebuilding at v. The proof follows from the fact that our potential function æ is always nonnegative. 5 A comparison with weight-balanced trees As mentioned in the introduction, the method of partial rebuilding is useful not only for ordinary binary search trees, but also for more complicated data structures, such as multidimensional search trees, where other balancing methods do not work. Previously, the only classes of balanced tree suitable for partial rebuilding have been the weightbalanced trees ë19,, 3ë and a modiæed version of æ-balanced trees ë3ë. Since general balanced trees are also maintained by partial rebuilding, it is natural to compare the maintenance costs. Since æ-balanced trees have the same maintenance cost as weightbalanced trees, we exclude them from our comparison. In order to compare the structures we chose their balance criteria in such a way that the maximum heights of the trees are the same. Given a constant æ, 0 éæé1=, each node v in a weight-balanced tree has to fulæll jv's smallest subtreej æ è38è The maximum height of the tree is & log jt j log 1 1,æ ' : è39è A maximum height of dc log jt je for weight-balanced trees is achieved by choosing æ = 1,,1=c. When anodev becomes out of balance, we have that ëë æèvè è1, æè,1= 1,1=c, 1,1: è40è Thus, if the cost of rebuilding a subtree v is,1, the amortized cost of an update 1 is at most for each ancestor of the insertedèdeleted node. Since an update is 1,1=c,1 made below at most dc log jt je nodes, the amortized cost per update in the entire tree is dc log jt je at most. èa more detailed study of weight-balanced trees maintained by partial 1,1=c,1 rebuilding can be found in the literature ëëè. It should be noted that this bound on maintaining weight-balanced trees is the same as the ærst bound given for GB-trees at the beginning of Section 4. If we compare the cost of GBècè-trees ètheorem 3è with the cost of weight-balanced trees we get = cost of general balanced trees cost of weight-balanced trees 1=c,1 dc log jt je + oèlog jt jè, 1=c dc log jt je 1,1=c,1 15

16 1=c, 1, 1=c æ 1,1=c, 1 =1,,1=c é 1 è41è Thus, the upper bound on the restructuring cost of maintaining balanced trees by partial rebuilding has been reduced by a factor at least. 6 Application to multidimensional search trees In cases when rotations are costly, such as when maintaining k-d trees ë7ë, the method of partial rebuilding oæers a powerful alternative. This has been shown for weight-balanced trees ë19,, 3ë and the same asymptotic bounds can be derived for general balanced trees. Also here, we achieve the advantage of less stored information and a lower constant factor. A detailed study of these matters has been made by Mark Overmars ëë. For the sake of completeness, we just mention that if thecost of rebalancing a subtree v is OèP èèè, the amortized cost of an update will be P ènè O log n n. For example, applied on k-d-trees, we get an amortized update cost of Oèlog nè. 7 Conclusions Introducing general balanced trees, we have shown that there is no need for sophisticated balance criteria in order to maintain balance eæciently. The presented trees use a natural and attractive maintenance strategy; rebalancing is not made until it is really needed. Baer and Schwab ë5ë made an attempt to use a global balance criterion and make restructurings only when this criterion is violated. The amortized cost for their method is æènè per operation. Therefore, they concluded that the best balancing methods are the ones that involve the strictest balance criteria. Here we have shown that, choosing carefully where to make rebuilding, a tree with a global balance criterion can be eæciently maintained. By comparing general balanced trees with weight balanced trees, we have also shown that a restricted balance criterion is not necessarily the best. Another advantage with the general balanced trees is that they can be easily maintained without any extra information stored in the nodes. As shown by Brown ë8, 9ë, the explicitly stored balance information may in some classes of balanced trees be eliminated by coding the information through the location of empty pointers. However, the information is still stored, although implicitly. The splay tree presented by Sleator and Tarjan ë4ë does not require any balance information stored in the nodes. However, the height of a splay tree is not guaranteed to be Oèlog nè. The logarithmic cost for searching in a splay tree is amortized, while we obtain a logarithmic worst-case cost. In summary, the discovery of general balanced trees ælls a gap in the well-studied area of binary search trees, showing that what may be the simplest balance criterion works surprisingly well. We believe that these trees oæer a competitive alternative to other balanced tree structures, both from a theoretical and practical point ofview. 16

17 Finally, an open problem: In this article we have given a better upper bound on the constant factor for GB-trees than what has been shown for weight-balanced trees. We conjecture that the constant factor is indeed better for GB-trees and we leave it as an open problem to prove this. Acknowledgements Many thanks to Amir Ben-Amram for spotting two æaws in the proof of Theorem 3 and for a large number of improving comments during the revising process. The comments by Kerstin Andersson, Svante Carlsson, Johan Hastad, Thomas Ottmann, and Ola Petersson have also improved the presentation of this work. References ë1ë G. M. Adelson-Velskii and E. M. Landis. An algorithm for the organization of information. Dokladi Akademia Nauk SSSR, 146èè:159í16, 196. ëë A. Andersson. Improving partial rebuilding by using simple balance criteria. In Proc. Workshop on Algorithms and Data Structures, Springer Verlag, pages 393í40, ë3ë A. Andersson. Maintaining æ-balanced trees by partial rebuilding. International Journal of Computer Mathmathics, 38:37í48, ë4ë A. Andersson, Ch. Icking, R. Klein, and Th. Ottmann. Binary search trees of almost optimal height. Acta Informatica, 8:165í178, ë5ë J-L. Baer and B. Schwab. A comparison of tree-balancing algorithms. Communications of the ACM, 0è5è:3í330, ë6ë R. Bayer. Symmetric binary B-trees: Data structure and maintenance algorithms. Acta Informatica, 1è4è:90í306, 197. ë7ë J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18è9è:509í517, ë8ë M. R. Brown. A storage scheme for height-balanced trees. Information Processing Letters, 7è5è:31í3, ë9ë M. R. Brown. Addendum to ëa storage scheme for height-balanced trees". Information Processing Letters, 8è3è:154í156, ë10ë H. Chang and S. S. Iynegar. Eæcient algorithms to globally balance a binary search tree. Communications of the ACM, 7è7è:695í70, ë11ë A. C. Day. Balancing a binary tree. Computer Journal, 19è4è:360í361, ë1ë C. C. Foster. A generalization of AVL-trees. Communications of the ACM, 16è8è:513í 517,

18 ë13ë I Galperin and R. L. Rivest. Scapegoat trees. In Proceedings of The Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 165í174, ë14ë L. J. Guibas and R. Sedgewick. Adichromatic framework for balanced trees. In Proc. 19th Ann. IEEE Symp. on Foundations of Computer Science, pages 8í1, ë15ë D. E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley, Reading, Massachusetts, ISBN X. ë16ë S. R. Kosaraju. Insertion and deletion in one-sided height-balanced trees. Communications of the ACM, 1:6í7, ë17ë W. A. Martin and D. N. Ness. Optimizing binary trees grown with a sorting algorithm. Communications of the ACM, 15èè:88í93, 197. ë18ë H. A. Maurer, Th. Ottmann, and H. W. Six. Implementing dictionaries using binary trees of very small height. Information Processing Letters, 5è1è:11í14, ë19ë K. Mehlhorn. Data Structures and Algorithms 1: Sorting and Searching. Springer- Verlag, ISBN X. ë0ë J. Nievergelt and E. M. Reingold. Binary trees of bounded balance. SIAM Journal on Computing, è1è:33í43, ë1ë H. J. Olivie. A Study of Balanced Binary Trees and Balanced One-Two-Trees. Ph.D. Thesis, Dept of Mathematics, University of Antwerp, ëë M. H. Overmars. The Design of Dynamic Data Structures, volume 156 of Lecture Notes in Computer Science. Springer Verlag, ISBN X. ë3ë M. H. Overmars and J. van Leeuwen. Dynamic multi-dimensional data structures based on quad- and k-d trees. Acta Informatica, 17:67í85, 198. ë4ë D. D. Sleator and R. E. Tarjan. Self-adjusting binary search trees. Journal of the ACM, 3è3è:65í686, ë5ë Q. F. Stout and B. L. Warren. Tree rebalancing in optimal time and space. Communications of the ACM, 9è9è:90í908,

The Complexity of Constructing Evolutionary Trees Using Experiments

The Complexity of Constructing Evolutionary Trees Using Experiments Gerth Stlting Brodal 1,, Rolf Fagerberg 1,, Christian N. S. Pedersen 1,, and Anna Östlin2, 1 BRICS, Department of Computer Science, University