Illinois Institute of Technology, Department of Computer Science
Splay Trees
CS 535 Design and Analysis of Algorithms, Fall Semester, 2018

Splay trees are a powerful form of (lexicographic) balanced binary trees devised by Sleator and Tarjan [1]. Splay trees are self-adjusting, so that frequently accessed items drift toward the root. Every time we access the tree, we reorganize it through a sequence of rotations. This reorganization may be expensive on a one-time basis, but analysis shows that the basic operations require $O(\log n)$ amortized time. We show that $m$ consecutive operations on a splay tree with $n$ nodes require total time $O((m+n)\log n + m)$. Moreover, splay trees do not need any extra information in each node, unlike red-black trees.

1 Splaying

The main operation in splay trees is the splay operation. Splaying on a node x moves x to the root of the tree by a sequence of rotations, each of which moves it up two levels at a time. If x is an even number of levels from the root, these two-level rotations bring x directly to the root. If x is an odd number of levels from the root, these rotations bring it up to one level below the root. At that point we apply either a right rotation (called a ZIG) or a left rotation (called a ZAG) to bring x up to the root:

[Figure: the ZIG (right rotation) and ZAG (left rotation) steps.]

Thus, if our node is a left child, we ZIG it to the root. Similarly, if it is a right child, we ZAG it to the root.

To move the node two levels up the tree, consider its position relative to its grandparent. If x is the leftmost grandchild, we apply a ZIG-ZIG (that is, two ZIGs in a row) to the appropriate subtree. If it is the rightmost grandchild, we apply a ZAG-ZAG.

[Figure: the ZAG-ZAG and ZIG-ZIG steps.]
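The single rotations and the ZIG-ZIG step can be sketched in Python as follows. This assumes a bare node class; the names `Node`, `zig`, `zag`, `zig_zig`, and `keys_preorder` are our own hypothetical helpers and do not come from the notes. ZIG is a right rotation at the parent and ZAG is a left rotation. `zig_zig` composes two ZIGs and rotates the grandparent edge first, which is the order Sleator and Tarjan use.

```python
class Node:
    """A bare binary-search-tree node: no balance or color field is needed."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def zig(y):
    """Right rotation at y: its left child x becomes the subtree root."""
    x = y.left
    y.left = x.right          # x's right subtree moves under y
    x.right = y
    return x


def zag(y):
    """Left rotation at y: its right child x becomes the subtree root."""
    x = y.right
    y.right = x.left          # x's left subtree moves under y
    x.left = y
    return x


def zig_zig(z):
    """Two ZIGs lifting x = z.left.left by two levels.
    The grandparent edge (z, y) is rotated first, then the edge (y, x)."""
    y = zig(z)                # y replaces z as the subtree root
    return zig(y)             # then x replaces y


def keys_preorder(t):
    """Preorder key list, to make the resulting shape visible."""
    if t is None:
        return []
    return [t.key] + keys_preorder(t.left) + keys_preorder(t.right)


path = Node(3, Node(2, Node(1)))      # 1 is the leftmost grandchild of 3
print(keys_preorder(zig_zig(path)))   # [1, 2, 3]
```

On the left path 3, 2, 1, this leaves 1 at the root, with 2 and then 3 hanging to its right. Doing the two ZIGs in the other order (parent edge first, as in simple move-to-root) would instead put 3 as the right child of 1, with 2 as the left child of 3.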
Similarly, if x is the right child of a left parent, we apply a ZIG-ZAG:

[Figure: the ZIG-ZAG step.]

That is, we first rotate left at y (the ZAG) and then rotate right at z (the ZIG). If x is the left child of a right parent, we apply a ZAG-ZIG operation:

[Figure: the ZAG-ZIG step.]

That is, we first rotate right at y (the ZIG) and then rotate left at z (the ZAG).

Using these operations, splaying on a node of depth d requires d rotations. We make a rotation our basic unit of time, since a rotation can be implemented in constant time. Thus, we can say that we need time d to splay on a node at depth d.

Consider the following example, where we splay on the node 5. Note that the subtree rooted at 5 moves steadily up the tree.

[Figure: splaying on node 5. A ZAG-ZAG step followed by a ZIG-ZAG step brings 5 from the bottom of a long path to the root, with 2 and 6 as its children.]

The tree goes from being a long, thin list to being shorter and bushier. This is the general behavior under splaying: paths in the tree tend to shorten considerably. Play around with the demonstration at the following URL
to get an idea of how well the splay operation keeps a tree balanced as it undergoes searches, insertions, and deletions: http://www.ibr.cs.tu-bs.de/courses/ss98/audii/applets/st/splatree-example.html

2 Amortized Cost of Splaying

Since splaying is the primary operation, we now analyze its amortized cost. To do so, we define a potential function. Suppose we have a splay tree T. Then:
1. Assign each node x in T a positive weight w(x).
2. Let the size of node x, denoted S(x), be the sum of the weights of all the nodes in the subtree rooted at x.
3. Define the rank $r(x) = \lg S(x)$.

Define the potential function of the tree T to be
$$\Phi(T) = \sum_{x \in T} r(x).$$

For example, let the weight function be w(x) = 1 for each (internal) node x. The weights of leaves, which are really null pointers, are 0. Consider the tree

[Figure: an example tree on the nodes 1 through 5, rooted at 5.]

The size of node 4, for example, is w(4) + w(3) = 1 + 1 = 2; thus, $r(4) = \lg 2 = 1$. We can compute the rank of each internal node in the same way to get the potential of the tree:
$$\Phi(T) = r(1)+r(2)+r(3)+r(4)+r(5) = \lg 1+\lg 3+\lg 1+\lg 2+\lg 5 = \lg 30 \approx 4.91.$$

We now use the tools developed during our study of amortized analysis. Recall that

AMORTIZED COST = ACTUAL COST + change in potential,

so that
$$\hat{c}_i = c_i + \Phi(T_i) - \Phi(T_{i-1})$$
and hence
$$\sum_{i=1}^{n} \hat{c}_i = \sum_{i=1}^{n} c_i + \Phi(T_n) - \Phi(T_0).$$
Here $T_i$ is the tree after the ith operation. $\hat{c}_i$ is the amortized cost of the ith operation, that is, what you would charge a customer of your data-structure business. $c_i$ is the actual cost of that operation, that is, what you have to pay the graduate student to do the work. In other words, over a sequence of operations,

AMORTIZED COST(sequence) = ACTUAL COST(sequence) + $\Phi_{\text{end}} - \Phi_{\text{start}}$,

or

ACTUAL COST(sequence) = AMORTIZED COST(sequence) $- \Phi_{\text{end}} + \Phi_{\text{start}}$.   (1)

Because rotations take O(1) time, we define time to mean the number of rotations needed in an operation on a tree.

Access Lemma. The amortized time to splay at a node x in a tree with root t is at most
$$3[r(t) - r(x)] + 1 = O\left(1 + \log\frac{S(t)}{S(x)}\right).$$

We prove this lemma by analyzing the various possible steps involved in splaying. Let $r'(\cdot)$ be the rank of a node after a splay step, and let $r(\cdot)$ be its rank before the step. We show that the amortized cost of any step of the splay operation is at most $3[r'(x) - r(x)]$. The one exception is the extra rotation needed at the root if x is initially at odd depth. When we perform all the steps of a splay operation together, the $3[r'(x) - r(x)]$ terms telescope, because the rank of x after one step is its rank before the next. Since x ends at the root with size S(t), this gives an amortized cost of
$$\sum \text{amortized costs} \le 1 + \sum 3[r'(x) - r(x)] = 1 + 3[r_{\text{final}}(x) - r_{\text{initial}}(x)] = 1 + 3[\lg S(t) - \lg S(x)] = O\left(1 + \log\frac{S(t)}{S(x)}\right),$$
as claimed in the lemma. We now consider the various cases.

Case 1: One rotation

This case occurs as the last step when the splayed node is at an odd depth from the root. The actual cost of one rotation is just one time unit. Thus, we only need to analyze the change in potential. For the ZIG rotation

[Figure: a ZIG rotation of x with its parent y, with subtrees A, B, and C.]

the potentials of subtrees A, B, and C are unaffected, because their internal node structure does not change. Thus we need to be concerned only with the change in the ranks of x and y. Denote by r(x) and
r(y) the ranks of x and y, respectively, before the ZIG, and denote by r'(x) and r'(y) their ranks after the ZIG. The change in the potential function caused by a ZIG is
$$\Delta\Phi = r'(x) + r'(y) - r(x) - r(y).$$
Clearly $r'(y) \le r(y)$: before the ZIG, y overlooks subtrees A, B, and C and node x, but afterwards it overlooks only subtrees B and C. Thus, we can bound the potential difference by
$$\Delta\Phi = r'(x) - r(x) + r'(y) - r(y) \le r'(x) - r(x).$$
Now we can bound the amortized cost:
$$\text{AMORTIZED COST} = \text{ACTUAL COST} + \Delta\Phi = 1 + \Delta\Phi \le 1 + r'(x) - r(x) \le 1 + 3[r'(x) - r(x)].$$
The last inequality holds because $r'(x) \ge r(x)$ (after the ZIG, x overlooks the whole tree). It is a weak statement, since the factor 3 is unnecessary, but it is the form needed for the telescoping described above. The amortized cost of a ZAG operation follows in the same way with the appropriate relabelings.

Case 2: ZIG-ZIG

The actual cost of a ZIG-ZIG operation is two time units, one for each rotation. We now need to compute the change in the potential function. Recall the definition of a ZIG-ZIG:

[Figure: a ZIG-ZIG step on x, its parent y, and its grandparent z, with subtrees A, B, C, and D.]

As in the previous case, the potentials of subtrees A, B, C, and D are unaffected by the operation. Thus, we have
$$\Delta\Phi = r'(x) + r'(y) + r'(z) - r(x) - r(y) - r(z). \qquad (2)$$
To bring this $\Delta\Phi$ into the desired form, we use a few relationships:
- $r'(x) = r(z)$, because the rotated x occupies precisely the position of the old z.
- $r(y) \ge r(x)$, because y overlooks x in the original tree.
- $r'(y) \le r'(x)$, because x overlooks y after the ZIG-ZIG.

Thus, (2) becomes
$$\Delta\Phi \le r'(x) + r'(z) - 2r(x),$$
giving an amortized cost of
$$\text{AMORTIZED COST} \le 2 + r'(x) + r'(z) - 2r(x).$$
We want to show that
$$\text{AMORTIZED COST} \le 3[r'(x) - r(x)].$$
This would follow from
$$2 + r'(x) + r'(z) - 2r(x) \le 3[r'(x) - r(x)],$$
or equivalently
$$-2r'(x) + r(x) + r'(z) + 2 \le 0.$$
So it suffices to prove that
$$-2r'(x) + r(x) + r'(z) \le -2.$$
Let us analyze the left-hand side of this last inequality:
$$-2r'(x) + r(x) + r'(z) = [r(x) - r'(x)] + [r'(z) - r'(x)] = \lg\frac{S(x)}{S'(x)} + \lg\frac{S'(z)}{S'(x)}. \qquad (3)$$
Here $S'(\cdot)$ is the size of a node after the ZIG-ZIG operation, and $S(\cdot)$ is its size before the operation. Define
$$a = \frac{S(x)}{S'(x)} \quad\text{and}\quad b = \frac{S'(z)}{S'(x)},$$
so that (3) becomes $\lg a + \lg b$. Clearly a > 0 and b > 0.

Moreover, $S(x) + S'(z) \le S'(x)$. Before the ZIG-ZIG, x overlooks subtrees A and B. After it, z overlooks subtrees C and D, while x overlooks all of A, B, C, and D, plus y and z. Thus,
$$\frac{S(x)}{S'(x)} + \frac{S'(z)}{S'(x)} \le 1,$$
and hence $a + b \le 1$. The inequality can be strict, because the weight of y is not counted on the left.

So we have a > 0, b > 0, and $a + b \le 1$. Because the logarithm is concave, elementary calculus shows that in this region $\lg a + \lg b$ reaches a maximum value of $-2$ at a = b = 1/2. Thus, $\lg a + \lg b \le -2$. Substituting back, we get
$$-2r'(x) + r(x) + r'(z) \le -2,$$
which is what we needed to show. Therefore
$$\text{AMORTIZED COST} \le 3[r'(x) - r(x)]$$
for the ZIG-ZIG operation, as claimed. By relabeling, the ZAG-ZAG operation has the same amortized time.
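The step $\lg a + \lg b \le -2$ is used in Case 2 and again in Case 3. It can also be checked without calculus. The following short derivation uses the arithmetic-geometric mean inequality; this argument is our substitution, not the one in the notes:

```latex
\begin{align*}
\lg a + \lg b &= \lg(ab) \\
  &\le \lg\left(\frac{a+b}{2}\right)^{2}
     && \text{(AM--GM: } \sqrt{ab} \le \tfrac{a+b}{2} \text{ for } a, b > 0\text{)} \\
  &\le \lg\left(\frac{1}{2}\right)^{2}
     && \text{(} a + b \le 1 \text{ and } \lg \text{ is increasing)} \\
  &= -2,
\end{align*}
```

with equality exactly when a = b = 1/2.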
Case 3: ZIG-ZAG

This case is similar to the ZIG-ZIG case we just analyzed. First, recall the definition of a ZIG-ZAG:

[Figure: a ZIG-ZAG step on x, its parent y, and its grandparent z.]

As in the previous case, the actual cost of a ZIG-ZAG is 2 rotations, and we have to compute the change in the potential function,
$$\Delta\Phi = r'(x) + r'(y) + r'(z) - r(x) - r(y) - r(z).$$
Again, $r'(x) = r(z)$. Furthermore, $r(y) \ge r(x)$, because y is above x in the original tree. Thus we have
$$\text{AMORTIZED COST} = 2 + r'(x) + r'(y) + r'(z) - r(x) - r(y) - r(z) \le 2 + r'(y) + r'(z) - 2r(x).$$
As in the previous case, we wish to show that
$$\text{AMORTIZED COST} \le 3[r'(x) - r(x)].$$
We will in fact show the stronger inequality
$$2 + r'(y) + r'(z) - 2r(x) \le 2[r'(x) - r(x)],$$
which implies the desired bound of $3[r'(x) - r(x)]$ because $r'(x) \ge r(x)$. This inequality rearranges to
$$2r'(x) - r'(y) - r'(z) \ge 2.$$
After the ZIG-ZAG, y and z are the two children of x, so $S'(y) + S'(z) \le S'(x)$. Hence
$$\frac{S'(y)}{S'(x)} + \frac{S'(z)}{S'(x)} \le 1.$$
Define
$$a = \frac{S'(y)}{S'(x)} \quad\text{and}\quad b = \frac{S'(z)}{S'(x)}.$$
As before, $\lg a + \lg b \le -2$, so that
$$2r'(x) - r'(y) - r'(z) \ge 2,$$
and the stated bound on the amortized cost of the ZIG-ZAG step follows. The ZAG-ZIG case follows in exactly the same way, which completes the proof of the Access Lemma.

The double-rotation steps are essential to this calculation, because they do not carry the +1 term that single rotations carry. That +1 term would destroy the telescoping: if every step were a single rotation, the +1 terms would add up to the depth of x.
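The following Python sketch checks the Access Lemma numerically on one example. It rests on two assumptions:
- Unit weights, w(x) = 1, as in the potential example of Section 2.
- The Section 1 example is read as the path 6, 2, 3, 4, 5 (2 is the left child of 6, and each later node is the right child of the previous one). This is our reconstruction of the figure.

`splay` follows the bottom-up procedure of Section 1: two-level steps, then a final single rotation at the root if the depth is odd. It counts each rotation as one time unit. `size_and_potential` computes $\Phi(T) = \sum \lg S(x)$. The names `rotate_up`, `splay`, and `size_and_potential` are our own hypothetical helpers.

```python
import math


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_up(x, p):
    """One rotation: child x takes the place of its parent p."""
    if p.left is x:                       # ZIG (right rotation at p)
        p.left, x.right = x.right, p
    else:                                 # ZAG (left rotation at p)
        p.right, x.left = x.left, p


def splay(root, key):
    """Bottom-up splay of the node holding key; returns (new_root, rotations)."""
    path, t = [], root
    while t is not None:                  # ordinary BST search, recording the path
        path.append(t)
        if key == t.key:
            break
        t = t.left if key < t.key else t.right
    if not path:
        return root, 0
    x, rotations = path.pop(), 0
    while path:
        p = path.pop()
        if not path:                      # odd depth: final ZIG or ZAG at the root
            rotate_up(x, p)
            return x, rotations + 1
        g = path.pop()
        if (g.left is p) == (p.left is x):    # ZIG-ZIG or ZAG-ZAG
            rotate_up(p, g)                   # grandparent edge first
            rotate_up(x, p)
        else:                                 # ZIG-ZAG or ZAG-ZIG
            rotate_up(x, p)
            if g.left is p:                   # x now hangs where p hung
                g.left = x
            else:
                g.right = x
            rotate_up(x, g)
        rotations += 2
        if path:                              # reattach x under g's old parent
            if path[-1].left is g:
                path[-1].left = x
            else:
                path[-1].right = x
    return x, rotations


def size_and_potential(t):
    """Unit weights: return (S(t), sum of lg S(x) over the subtree of t)."""
    if t is None:
        return 0, 0.0
    left_size, left_phi = size_and_potential(t.left)
    right_size, right_phi = size_and_potential(t.right)
    size = left_size + right_size + 1
    return size, left_phi + right_phi + math.log2(size)


# Assumed reading of the Section 1 example: 6 -> 2 (left) -> 3 -> 4 -> 5 (rights).
root = Node(6, Node(2, None, Node(3, None, Node(4, None, Node(5)))))
n, phi_before = size_and_potential(root)       # n = 5, Phi = lg 120
root, rotations = splay(root, 5)                # node 5 sits at depth 4
_, phi_after = size_and_potential(root)        # Phi = lg 30
amortized = rotations + phi_after - phi_before  # actual cost + change in potential
bound = 3 * (math.log2(n) - math.log2(1)) + 1   # 3[r(t) - r(x)] + 1 with S(x) = 1
print(rotations, round(amortized, 6), round(bound, 3))
```

Here the amortized cost is 4 + lg 30 − lg 120 = 2. That is well below the lemma's bound of 3 lg 5 + 1 ≈ 7.97, and the splay leaves 5 at the root with children 2 and 6.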
3 Balance Theorem

We can now determine the actual time needed for multiple accesses to the tree.

Balance Theorem. The total access time for m accesses of a tree with n items is $O((m+n)\log n + m)$.

The proof follows from our potential function and the Access Lemma, using (1). Define the weight of each node x to be $w(x) = 1/n$. From the Access Lemma, the amortized cost of the m accesses is
$$m \cdot O\left(1 + \log\frac{S(t)}{S(x)}\right) = m \cdot O\left(1 + \log\frac{1}{1/n}\right) = m \cdot O(1 + \log n).$$
The first equality holds because t overlooks all the other nodes, so its size is $n \cdot (1/n) = 1$, while $S(x) \ge w(x) = 1/n$. The greatest possible starting potential is
$$\Phi_{\text{start}} \le \sum_{i=1}^{n} \lg 1 = 0,$$
because no vertex can have size greater than 1 (the size of the whole tree is 1). On the other hand, the smallest possible ending potential is
$$\Phi_{\text{end}} \ge \sum_{i=1}^{n} \lg\frac{1}{n} = -n\lg n,$$
because every vertex has size at least its own weight of 1/n. Putting all of the pieces together,
$$\text{ACTUAL COST} \le m \cdot O(1 + \log n) + 0 - (-n\lg n) = O((m+n)\log n + m),$$
as the theorem states.

4 Static Optimality Theorem

We can refine the Balance Theorem if we know something about the access frequencies of the tree nodes.

Static Optimality Theorem. If each item is accessed at least once, the total time for m accesses in a tree with n nodes is
$$O\left(m + \sum_{i=1}^{n} q_i \log\frac{m}{q_i}\right),$$
where $q_i$ is the number of times item i is accessed, so that $\sum_{i=1}^{n} q_i = m$.
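The following Python sketch evaluates the quantity inside the Static Optimality bound, $m + \sum_i q_i \lg(m/q_i)$, for a given list of access counts. `static_optimality_sum` is a hypothetical helper. The theorem hides constant factors, so the numbers only indicate relative size; they are not rotation counts.

```python
import math


def static_optimality_sum(freqs):
    """Return m + sum_i q_i * lg(m / q_i) for access counts q_i (each >= 1)."""
    if any(q < 1 for q in freqs):
        raise ValueError("each item must be accessed at least once")
    m = sum(freqs)
    return m + sum(q * math.log2(m / q) for q in freqs)


# Skewed access counts: m = 8 accesses spread over 4 items.
skewed = static_optimality_sum([4, 2, 1, 1])    # 8 + (4*1 + 2*2 + 1*3 + 1*3) = 22
# Uniform access counts q_i = m/n reduce the sum to m + m lg n.
uniform = static_optimality_sum([2, 2, 2, 2])   # 8 + 8 * lg 4 = 24
print(skewed, uniform)
```

With uniform counts the sum collapses to $m + m\lg n$, which matches the $m \cdot O(1 + \log n)$ term of the Balance Theorem. Skewed counts give a smaller value, which reflects the frequently accessed items drifting toward the root.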
The proof follows the same pattern as the Balance Theorem, but uses the weight function $w(i) = q_i/m$, so that the weights sum to 1. The amortized cost of an access of item i is then at most
$$O\left(1 + \log\frac{S(t)}{S(i)}\right) = O\left(1 + \log\frac{m}{q_i}\right).$$
The net potential drop is at most $\sum_{i=1}^{n} \lg(m/q_i)$. Because each $q_i \ge 1$, this is at most $\sum_{i=1}^{n} q_i \lg(m/q_i)$.

The Static Optimality Theorem is remarkable: the bound is within a constant multiple of the entropy, which is the information-theoretic lower bound on the access time of any static binary search tree. (The same holds for the Balance Theorem.) Thus splay trees are within a constant multiple of the lower bound for the problem. Moreover, they are as good as finger trees, which keep fingers pointing to the most frequently accessed items.

5 Operations on Splay Trees

With the splay operation and its analysis in hand, it is not too difficult to implement and analyze operations such as search, insert, and delete on splay trees. First, however, we introduce three new tree operations.

Access. Access takes an item i as input. It searches the tree for the node containing i, then splays on that node, moving it to the root. If no such item is found, we splay on the last non-null node examined in the binary search. That node is either $i^-$ or $i^+$, the predecessor or successor of i, respectively.

Join. The join operation takes two trees T1 and T2, where every item in T2 is greater than every item in T1, and returns a single tree T containing the items of both. To join splay trees, we first find the largest item i of T1 by following non-null right pointers from the root. We then splay at that last node, i. The splay puts i at the root of T1, and since i is the largest element, its right subtree is empty. We replace that empty subtree with T2:

[Figure: join. After splaying at i, the largest item of T1, T2 becomes the right subtree of i.]

Split. This is the reverse of join. It takes a tree T and an item i as input and creates two trees T1 and T2. All items in T1 are smaller than (or equal to) i, and all items in T2 are greater than (or equal to) i.
To split a splay tree, we access i and then cut one of the root's branches. Which branch we cut depends on whether the root is greater or less than i; if the root is i itself, either branch may be chosen.
[Figure: split. Splay at i, then detach one of the root's subtrees to obtain T1 and T2.]

If i is not in the tree, the root after the splay at i is either $i^-$ or $i^+$, the lexicographic predecessor or successor of i, respectively.

Now it is easy to implement the familiar operations of insert and delete:

Insert. Insert takes a tree T and an item i (assumed not to be in the tree) and inserts a node containing i into T. To insert, we perform split(T, i). We then make a new tree whose root contains i and whose left and right branches are the trees T1 and T2 returned by split.

[Figure: insert. Split T at i, then connect T1 and T2 as the subtrees of a new root i.]

Note that insert relies on the fact that split works even when i is not present in the tree.

Delete. Delete takes a tree T and an item i in the tree and removes i from T. To delete, we again perform split(T, i), remove i, and join the resulting subtrees.

[Figure: delete. Splay at i, remove i, and join its subtrees T1 and T2.]

Delete can also be done by searching for the node x containing i, whose parent is y. We replace x as a child of y by the join of x's left and right subtrees, and then splay on y.

All these operations have a logarithmic amortized time bound. The following table gives the amortized times for the splay tree operations as a function of W, the total weight of the items in the tree(s). Here $i^+$ and $i^-$ denote the successor and predecessor of i in the tree. If $i^-$ or $i^+$ is undefined, then $w(i^-) = \infty$ or $w(i^+) = \infty$, respectively.
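The operations of this section can be sketched in Python as follows, assuming distinct, comparable keys and a bare node class. `access` records the search path and then performs the bottom-up splay of Section 1; this path-stack implementation is our choice. `join`, `split`, `insert`, and `delete` follow the descriptions above. When i is present, this `split` keeps i at the root of T1, which is one of the two choices the notes allow; `delete` then discards that root and joins what remains. All function names are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_up(x, p):
    """Child x takes p's place; the caller repoints p's old parent to x."""
    if p.left is x:                           # ZIG: right rotation at p
        p.left, x.right = x.right, p
    else:                                     # ZAG: left rotation at p
        p.right, x.left = x.left, p


def access(root, key):
    """Splay key to the root; if absent, splay the last non-null node on the
    search path, which is the predecessor or successor of key."""
    path, t = [], root
    while t is not None:                      # ordinary BST search
        path.append(t)
        if key == t.key:
            break
        t = t.left if key < t.key else t.right
    if not path:
        return root                           # empty tree
    x = path.pop()
    while path:
        p = path.pop()
        if not path:                          # p is the root: single ZIG or ZAG
            rotate_up(x, p)
            break
        g = path.pop()
        if (g.left is p) == (p.left is x):    # ZIG-ZIG or ZAG-ZAG
            rotate_up(p, g)                   # grandparent edge first
            rotate_up(x, p)
        else:                                 # ZIG-ZAG or ZAG-ZIG
            rotate_up(x, p)
            if g.left is p:
                g.left = x
            else:
                g.right = x
            rotate_up(x, g)
        if path:                              # hang x where g used to hang
            if path[-1].left is g:
                path[-1].left = x
            else:
                path[-1].right = x
    return x


def join(t1, t2):
    """Every key in t1 must be smaller than every key in t2."""
    if t1 is None:
        return t2
    largest = t1
    while largest.right is not None:          # follow right pointers
        largest = largest.right
    t1 = access(t1, largest.key)              # the maximum is now the root
    t1.right = t2                             # its right subtree was empty
    return t1


def split(t, key):
    """Return (t1, t2) with keys <= key in t1 and keys > key in t2."""
    t = access(t, key)
    if t is None:
        return None, None
    if t.key <= key:                          # cut the root's right branch
        t2, t.right = t.right, None
        return t, t2
    t1, t.left = t.left, None                 # otherwise cut the left branch
    return t1, t


def insert(t, key):
    """Assumes key is not already present, as the notes do."""
    t1, t2 = split(t, key)
    return Node(key, t1, t2)


def delete(t, key):
    t1, t2 = split(t, key)
    if t1 is not None and t1.key == key:      # key was splayed to t1's root
        t1 = t1.left                          # drop that node
    return join(t1, t2)


def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

For example, inserting 5, 2, 8, 1, 9, 3 in that order leaves 3 at the root, since each insert makes the new key the root. The recursive `inorder` helper is only for inspection; on very deep trees it can hit Python's recursion limit.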
Operation         Amortized cost
access(i, T)      $3\lg\frac{W}{w(i)} + 1$, if i is in T
access(i, T)      $3\lg\frac{W}{\min\{w(i^-),\, w(i^+)\}} + 1$, if i is not in T
join(T1, T2)      $3\lg\frac{W}{w(i)} + O(1)$, where i is the last item in T1
split(i, T)       $3\lg\frac{W}{w(i)} + O(1)$, if i is in T
split(i, T)       $3\lg\frac{W}{\min\{w(i^-),\, w(i^+)\}} + O(1)$, if i is not in T
insert(i, T)      $3\lg\frac{W - w(i)}{\min\{w(i^-),\, w(i^+)\}} + \lg\frac{W}{w(i)} + O(1)$
delete(i, T)      $3\lg\frac{W}{w(i)} + 3\lg\frac{W - w(i)}{w(i^-)} + O(1)$

The bounds for access and split follow directly from the Access Lemma. The other bounds follow from analyzing the change in potential. Take join as an example. First we access the largest value i in T1, which costs at most $3\lg(S(T_1)/w(i)) + 1$ amortized time. The link itself requires only O(1) additional work, but linking T1 and T2 increases the potential, so we must account for that too. The only node whose size changes is the root of T1, which becomes the root of the entire tree. Because the total weight is $W = S(T_1) + S(T_2)$, the change in potential at the new root i is at most
$$\lg W - \lg S(T_1) = \lg\frac{W}{S(T_1)}.$$
Combining these terms gives the desired bound:
$$3\lg\frac{S(T_1)}{w(i)} + O(1) + \lg\frac{W}{S(T_1)} = 2\lg\frac{S(T_1)}{w(i)} + \lg\frac{W}{w(i)} + O(1) \le 3\lg\frac{W}{w(i)} + O(1).$$
The bounds for insert and delete are proven in a similar manner.

Be warned: the constants hidden in the O notation are large for this data structure, so it may not be practical in real life.

Reference

[1] Daniel D. Sleator and Robert E. Tarjan, "Self-Adjusting Binary Search Trees," J. ACM, vol. 32 (1985), pp. 652–686.