CSE 5311 Notes 5: Trees. (Last updated 6/4/13 4:12 PM)

What is the optimal way to organize a static tree for searching?

An optimal (static) binary search tree is significantly more complicated to construct than an optimal list.

1. Assume access probabilities are known:

   keys are $K_1 < K_2 < \cdots < K_n$
   $p_i$ = probability of request for $K_i$
   $q_i$ = probability of request with $K_i <$ request $< K_{i+1}$
   $q_0$ = probability of request $< K_1$
   $q_n$ = probability of request $> K_n$

2. Assume that levels are numbered with root at level 0. Minimize the expected number of comparisons to complete a search:

   $$\sum_{j=1}^{n} p_j \left(\mathrm{KeyLevel}(j) + 1\right) + \sum_{j=0}^{n} q_j \,\mathrm{MissLevel}(j)$$

3. Example tree:

   [Figure: binary search tree on keys $K_1, \ldots, K_5$ with levels numbered from 0; keys are annotated with $p_1, \ldots, p_5$ and the miss positions with $q_0, \ldots, q_5$.]

4. Solution is by dynamic programming:

   Principle of optimality - solution is not optimal unless the subtrees are optimal.
   Base case - empty tree, costs nothing to search.

[Figure: subtree for $c(i,j)$, containing $p_{i+1}, \ldots, p_j$ and $q_i, \ldots, q_j$.]

$c(i,j)$ = cost of subtree with keys $K_{i+1}, \ldots, K_j$
$c(i,j)$ always includes exactly $p_{i+1}, \ldots, p_j$ and $q_i, \ldots, q_j$
$c(i,i) = 0$   Base case: no keys, just misses for $q_i$ (request between $K_i$ and $K_{i+1}$)

Recurrence for finding optimal subtree:

$$c(i,j) = w(i,j) + \min_{i < k \le j} \left( c(i,k-1) + c(k,j) \right)$$

tries every possible root ($k$) for the subtree with keys $K_{i+1}, \ldots, K_j$.

$w(i,j) = p_{i+1} + \cdots + p_j + q_i + \cdots + q_j$ accounts for adding another probe for all keys in the subtree:

   Left: $p_{i+1} + \cdots + p_{k-1} + q_i + \cdots + q_{k-1}$
   Right: $p_{k+1} + \cdots + p_j + q_k + \cdots + q_j$
   Root: $p_k$

[Figure: root $K_k$ with left subtree $c(i,k-1)$ and right subtree $c(k,j)$.]

5. Implementation: The k-family is all cases for $c(i,i+k)$. k-families are computed in ascending order from 1 to n. Suppose n = 5:

   0        1        2        3        4        5
   c(0,0)   c(0,1)   c(0,2)   c(0,3)   c(0,4)   c(0,5)
   c(1,1)   c(1,2)   c(1,3)   c(1,4)   c(1,5)
   c(2,2)   c(2,3)   c(2,4)   c(2,5)
   c(3,3)   c(3,4)   c(3,5)
   c(4,4)   c(4,5)
   c(5,5)

Complexity: $O(n^2)$ space is obvious. $O(n^3)$ time from:

$$\sum_{k=1}^{n} k\,(n+1-k)$$

where $k$ is the number of roots for each $c(i,i+k)$ in family $k$ and $n+1-k$ is the number of $c(i,i+k)$ cases.

6. Traceback - besides having the minimum value for each $c(i,j)$, it is necessary to save the subscript for the optimal root for $c(i,j)$ as r[i][j].

This also leads to Knuth's improvement:

Theorem: The root for the optimal tree $c(i,j)$ must have a key with subscript no less than the key subscript for the root of the optimal tree for $c(i,j-1)$ and no greater than the key subscript for the root of the optimal tree $c(i+1,j)$. (These roots are computed in the preceding family.)

Proof:

1. Consider adding $p_j$ and $q_j$ to the tree for $c(i,j-1)$. The optimal tree for $c(i,j)$ must keep the same key at the root or use one further to the right.

   [Figure: tree on keys $K_{i+1}, \ldots, K_{j-1}$.]

2. Consider adding $p_{i+1}$ and $q_i$ to the tree for $c(i+1,j)$. The optimal tree for $c(i,j)$ must keep the same key at the root or use one further to the left.

   [Figure: tree on keys $K_{i+2}, \ldots, K_j$.]
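As a check on the cubic time bound stated under Complexity, the sum over families has a simple closed form. The derivation below is a worked aside that uses only the standard formulas for $\sum k$ and $\sum k^2$:

```latex
\begin{align*}
\sum_{k=1}^{n} k\,(n+1-k)
  &= (n+1)\sum_{k=1}^{n} k \;-\; \sum_{k=1}^{n} k^2 \\
  &= \frac{n(n+1)^2}{2} - \frac{n(n+1)(2n+1)}{6} \\
  &= \frac{n(n+1)(n+2)}{6} \;=\; O(n^3)
\end{align*}
```

For n = 5 this gives 35 root trials (5 + 8 + 9 + 8 + 5 over families 1 through 5).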

7. Analysis of Knuth's improvement.

Each $c(i,j)$ case for a k-family will vary in the number of roots to try, but overall time is reduced to $O(n^2)$ by using a telescoping sum:

$$\sum_{k=2}^{n} \sum_{i=0}^{n-k} \left( r[i+1][i+k] - r[i][i+k-1] + 1 \right) = \sum_{k=2}^{n} \left( r[n-k+1][n] - r[0][k-1] + n - k + 1 \right)$$

$$\le \sum_{k=2}^{n} \left( n - 0 + n - k + 1 \right) = \sum_{k=2}^{n} \left( 2n - k + 1 \right) = O(n^2)$$

For a fixed k, the inner sum telescopes:

     r[1][k]     - r[0][k-1]   + 1
   + r[2][1+k]   - r[1][k]     + 1
   + r[3][2+k]   - r[2][1+k]   + 1
   + ...
   + r[n-k+1][n] - r[n-k][n-1] + 1

Example instance:

   n=7;
   q[0]=0.06;
   p[1]=0.04; q[1]=0.06;
   p[2]=0.06; q[2]=0.06;
   p[3]=0.08; q[3]=0.06;
   p[4]=0.02; q[4]=0.05;
   p[5]=0.10; q[5]=0.05;
   p[6]=0.12; q[6]=0.05;
   p[7]=0.14; q[7]=0.05;
   for (i=1;i<=n;i++)
     key[i]=i;

w[0][0]=0.060000 w[0][1]=0.160000 w[0][2]=0.280000 w[0][3]=0.420000 w[0][4]=0.490000 w[0][5]=0.640000 w[0][6]=0.810000 w[0][7]=1.000000
w[1][1]=0.060000 w[1][2]=0.180000 w[1][3]=0.320000 w[1][4]=0.390000 w[1][5]=0.540000 w[1][6]=0.710000 w[1][7]=0.900000
w[2][2]=0.060000 w[2][3]=0.200000 w[2][4]=0.270000 w[2][5]=0.420000 w[2][6]=0.590000 w[2][7]=0.780000
w[3][3]=0.060000 w[3][4]=0.130000 w[3][5]=0.280000 w[3][6]=0.450000 w[3][7]=0.640000
w[4][4]=0.050000 w[4][5]=0.200000 w[4][6]=0.370000 w[4][7]=0.560000
w[5][5]=0.050000 w[5][6]=0.220000 w[5][7]=0.410000
w[6][6]=0.050000 w[6][7]=0.240000
w[7][7]=0.050000

Counts - root trick 44 without root trick 77
Average probe length is 2.680000

trees in parenthesized prefix

c(0,0) cost 0.000000
c(1,1) cost 0.000000
c(2,2) cost 0.000000
c(3,3) cost 0.000000
c(4,4) cost 0.000000
c(5,5) cost 0.000000
c(6,6) cost 0.000000
c(7,7) cost 0.000000

c(0,1) cost 0.160000 1
c(1,2) cost 0.180000 2
c(2,3) cost 0.200000 3
c(3,4) cost 0.130000 4
c(4,5) cost 0.200000 5
c(5,6) cost 0.220000 6
c(6,7) cost 0.240000 7

c(0,2) cost 0.440000 2(1,)
c(1,3) cost 0.500000 3(2,)
c(2,4) cost 0.400000 3(,4)
c(3,5) cost 0.410000 5(4,)
c(4,6) cost 0.570000 6(5,)
c(5,7) cost 0.630000 7(6,)

c(0,3) cost 0.780000 2(1,3)
c(1,4) cost 0.700000 3(2,4)
c(2,5) cost 0.820000 4(3,5)
c(3,6) cost 0.800000 5(4,6)
c(4,7) cost 1.000000 6(5,7)

c(0,4) cost 1.050000 2(1,3(,4))
c(1,5) cost 1.130000 3(2,5(4,))
c(2,6) cost 1.210000 5(3(,4),6)
c(3,7) cost 1.290000 6(5(4,),7)

c(0,5) cost 1.490000 3(2(1,),5(4,))
c(1,6) cost 1.630000 5(3(2,4),6)
c(2,7) cost 1.810000 5(3(,4),7(6,))

c(0,6) cost 2.050000 3(2(1,),5(4,6))
c(1,7) cost 2.230000 5(3(2,4),7(6,))

c(0,7) cost 2.680000 5(2(1,3(,4)),7(6,))

3: c(0,2) + c(3,7) + w[0][7] = 0.44 + 1.29 + 1.0 = 2.73
4: c(0,3) + c(4,7) + w[0][7] = 0.78 + 1.0 + 1.0 = 2.78
5: c(0,4) + c(5,7) + w[0][7] = 1.05 + 0.63 + 1.0 = 2.68

[Figure: the optimal tree 5(2(1,3(,4)),7(6,)) on keys 1 through 7.]
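The family-by-family computation, with and without Knuth's root restriction, can be sketched in C as follows. This is a minimal sketch, not the program that produced the output above: the names `optimal_bst`, `MAXN`, and `roots_tried` are hypothetical, and it assumes integer weights (for example the probabilities scaled by 100) so that ties between candidate roots are compared exactly.

```c
#include <assert.h>
#include <limits.h>

#define MAXN 64   /* hypothetical capacity limit for this sketch */

/* Tables are indexed [0..n][0..n]; only entries with i <= j are used. */
static long w[MAXN + 1][MAXN + 1];  /* w(i,j): sum of p[i+1..j] and q[i..j] */
static long c[MAXN + 1][MAXN + 1];  /* c(i,j): optimal cost for keys i+1..j */
static int  r[MAXN + 1][MAXN + 1];  /* r[i][j]: optimal root for c(i,j)     */

/*
 * Returns c(0,n) for integer weights p[1..n] and q[0..n] (p[0] is unused).
 * If use_root_trick is nonzero, roots for c(i,j) are tried only in the
 * range r[i][j-1] .. r[i+1][j]; otherwise every root i+1 .. j is tried.
 * *roots_tried receives the number of candidate roots examined in
 * families 2..n (family 1 has one possible root and is filled directly).
 */
long optimal_bst(int n, const long *p, const long *q,
                 int use_root_trick, long *roots_tried)
{
    long tried = 0;

    assert(n >= 1 && n <= MAXN);

    for (int i = 0; i <= n; i++) {
        w[i][i] = q[i];
        c[i][i] = 0;                      /* base case: empty tree */
        for (int j = i + 1; j <= n; j++)
            w[i][j] = w[i][j - 1] + p[j] + q[j];
    }
    for (int i = 0; i < n; i++) {         /* family 1: the only key is the root */
        c[i][i + 1] = w[i][i + 1];
        r[i][i + 1] = i + 1;
    }
    for (int k = 2; k <= n; k++) {        /* families in ascending order */
        for (int i = 0; i + k <= n; i++) {
            int j = i + k;
            int lo = use_root_trick ? r[i][j - 1] : i + 1;
            int hi = use_root_trick ? r[i + 1][j] : j;
            long best = LONG_MAX;

            for (int root = lo; root <= hi; root++) {
                long cost = c[i][root - 1] + c[root][j];
                tried++;
                if (cost < best) {        /* strict '<' keeps the leftmost minimum */
                    best = cost;
                    r[i][j] = root;
                }
            }
            c[i][j] = w[i][j] + best;
        }
    }
    if (roots_tried)
        *roots_tried = tried;
    return c[0][n];
}
```

Under these assumptions, calling it on the n = 7 instance with p = {4, 6, 8, 2, 10, 12, 14} and q = {6, 6, 6, 6, 5, 5, 5, 5} gives a cost of 268 (that is, 2.68) either way, trying 77 roots without the restriction and 44 with it.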

SPLAY TREES

Self-adjusting counterpart to AVL and red-black trees

Advantages - 1) no balance bits, 2) some help with locality of reference, 3) amortized complexity is same as AVL and red-black trees

Disadvantage - worst-case for operation is O(n)

Algorithms are based on rotations to splay the last node processed (x) to root position. In the cases below, y is the parent of x and z is the grandparent of x.

Zig-Zig:
   1. Single right rotation at z.
   2. Single right rotation at y.
   (+ symmetric case)

Zig-Zag:
   Double right rotation at z.
   (+ symmetric case)

Zig: Applies ONLY at the root
   Single right rotation at y.
   (+ symmetric case)

Insertion: Attach new leaf and then splay to root.
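The three cases can be sketched in C as a bottom-up splay. This is a minimal sketch that assumes each node stores a parent pointer (the notes do not specify a representation); `Node`, `rotate_up`, and `splay` are hypothetical names. A single helper rotates a node above its parent, which covers both the right rotations described above and their symmetric left rotations.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int key;
    struct Node *left, *right, *parent;
} Node;

/* Rotate x above its parent y: a right rotation at y if x is a left
   child, a left rotation at y otherwise. BST order is preserved. */
static void rotate_up(Node *x)
{
    Node *y = x->parent;
    Node *g = y->parent;              /* grandparent, may be NULL */

    if (y->left == x) {               /* right rotation at y */
        y->left = x->right;
        if (x->right) x->right->parent = y;
        x->right = y;
    } else {                          /* left rotation at y */
        y->right = x->left;
        if (x->left) x->left->parent = y;
        x->left = y;
    }
    y->parent = x;
    x->parent = g;
    if (g) {
        if (g->left == y) g->left = x;
        else              g->right = x;
    }
}

/* Splay x to the root position and return it (the new root). */
Node *splay(Node *x)
{
    assert(x != NULL);
    while (x->parent) {
        Node *y = x->parent;
        Node *z = y->parent;

        if (!z) {
            rotate_up(x);             /* zig: y is the root */
        } else if ((z->left == y) == (y->left == x)) {
            rotate_up(y);             /* zig-zig: rotation at z first ... */
            rotate_up(x);             /* ... then rotation at y */
        } else {
            rotate_up(x);             /* zig-zag: double rotation at z */
            rotate_up(x);
        }
    }
    return x;
}
```

With this sketch, insertion would attach the new leaf as in an ordinary BST and then call `splay` on it.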

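Deletion:

   1. Access the node to delete, including splay to root.
   2. Access its predecessor in the left subtree and splay it to the root of the left subtree.
   3. Take the right subtree of the deleted node and make it the right subtree of the predecessor.

Amortized Analysis of Splaying for Retrieval (aside):

Actual cost (rotations) is 2 for zig-zig and zig-zag, but 1 for zig.

   $S(x)$ = number of nodes in subtree with x as root ("size")
   $r(x) = \lg S(x)$ ("rank")
   $\Phi(T) = \sum_{x \in T} r(x)$

Examples:

[Figure: two five-node trees annotated with ranks (leaves have rank 0). First example: ranks 2.32 = lg 5 and 1.58 = lg 3, Φ = 3.9. Second example: ranks 2.32 = lg 5, 2 = lg 4, 1.58 = lg 3, 1 = lg 2, Φ = 6.9.]

Now suppose that the leaf in the second example is retrieved. Two zig-zigs occur.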

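[Figure: the second example before splaying (ranks 2.32 = lg 5, 2 = lg 4, 1.58 = lg 3, 1 = lg 2, Φ = 6.9) and after the two zig-zigs (ranks 2.32 = lg 5, 2 = lg 4, 1 = lg 2, Φ = 5.32).]

$$\sum \hat{C}_i = \sum C_i + \Phi(\text{After}) - \Phi(\text{Before}) = 4 + 5.32 - 6.9 = 2.42 \le 1 + 3\lg n = 7.96$$

Another example of splaying. There will be a zig-zag and a zig.

[Figure: three nine-node trees (nodes A through I) annotated with ranks: before splaying (Φ = 9.17), after the zig-zag (Φ = 9.17), and after the zig (Φ = 9.98).]

$$\sum \hat{C}_i = \sum C_i + \Phi(\text{After}) - \Phi(\text{Before}) = 3 + 9.98 - 9.17 = 3.81 \le 1 + 3\lg n = 10.51$$

Compute amortized complexity of individual steps and then the complete splaying sequence:

Lemma: If $\alpha > 0$, $\beta > 0$, $\alpha + \beta \le 1$, then $\lg\alpha + \lg\beta \le -2$.

Proof: $\lg\alpha + \lg\beta = \lg\alpha\beta$. $\alpha\beta$ is maximized when $\alpha = \beta = \frac{1}{2}$, so $\max(\lg\alpha + \lg\beta) = -2$.

Access Lemma: Suppose

   1) x is the node being splayed
   2) the subtree rooted by x has $S_{i-1}(x)$ and $r_{i-1}(x)$ before the ith step and $S_i(x)$ and $r_i(x)$ after the ith step

then $\hat{C}_i \le 3r_i(x) - 3r_{i-1}(x)$, except the last step which has $\hat{C}_i \le 1 + 3r_i(x) - 3r_{i-1}(x)$.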

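Proof: Proceeds by considering each of the three cases for splaying (before the step, y is the parent of x and z is the parent of y):

Zig-Zig:

   $\hat{C}_i = C_i + \Phi(T_i) - \Phi(T_{i-1})$
   $= 2 + r_i(x) + r_i(y) + r_i(z) - r_{i-1}(x) - r_{i-1}(y) - r_{i-1}(z)$   (Potential changes only in this subtree)
   $= 2 + r_i(y) + r_i(z) - r_{i-1}(x) - r_{i-1}(y)$   ($r_i(x) = r_{i-1}(z)$)
   $\le 2 + r_i(x) + r_i(z) - 2r_{i-1}(x)$   (*)   ($r_{i-1}(x) \le r_{i-1}(y)$ and $r_i(y) \le r_i(x)$)

Let $\alpha = \frac{S_{i-1}(x)}{S_i(x)}$, $\beta = \frac{S_i(z)}{S_i(x)}$, $\alpha > 0$, $\beta > 0$, $\alpha + \beta = \frac{S_{i-1}(x) + S_i(z)}{S_i(x)} \le 1$. (y is absent from the numerator.)

Lemma conditions are satisfied, so $\lg\alpha + \lg\beta \le -2$. Applying logs to $\alpha$ and $\beta$ gives:

   $r_{i-1}(x) + r_i(z) - 2r_i(x) \le -2$

which may be rearranged as:

   $0 \le 2r_i(x) - r_{i-1}(x) - r_i(z) - 2$

Add this to (*) to obtain:

   $\hat{C}_i \le 3r_i(x) - 3r_{i-1}(x)$

Zig-Zag: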

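   $\hat{C}_i = C_i + \Phi(T_i) - \Phi(T_{i-1})$
   $= 2 + r_i(x) + r_i(y) + r_i(z) - r_{i-1}(x) - r_{i-1}(y) - r_{i-1}(z)$   (Potential changes only in this subtree)
   $= 2 + r_i(y) + r_i(z) - r_{i-1}(x) - r_{i-1}(y)$   ($r_i(x) = r_{i-1}(z)$)
   $\le 2 + r_i(y) + r_i(z) - 2r_{i-1}(x)$   (**)   ($r_{i-1}(x) \le r_{i-1}(y)$)

Lemma may be applied by observing that $S_i(y) + S_i(z) \le S_i(x)$ and thus $\frac{S_i(y)}{S_i(x)} + \frac{S_i(z)}{S_i(x)} \le 1$. By the lemma,

   $\lg\frac{S_i(y)}{S_i(x)} + \lg\frac{S_i(z)}{S_i(x)} \le -2$
   $r_i(y) + r_i(z) - 2r_i(x) \le -2$
   $r_i(y) + r_i(z) \le 2r_i(x) - 2$

which can substitute into (**):

   $\hat{C}_i \le 2 + 2r_i(x) - 2 - 2r_{i-1}(x) = 2r_i(x) - 2r_{i-1}(x) \le 3r_i(x) - 3r_{i-1}(x)$   (Since $r_{i-1}(x) \le r_i(x)$)

Zig:

   $\hat{C}_i = C_i + \Phi(T_i) - \Phi(T_{i-1})$
   $= 1 + r_i(x) + r_i(y) - r_{i-1}(x) - r_{i-1}(y)$   (Potential changes only in this subtree)
   $= 1 + r_i(y) - r_{i-1}(x)$   ($r_{i-1}(y) = r_i(x)$)
   $\le 1 + r_i(x) - r_{i-1}(x)$   ($r_i(y) \le r_i(x)$)
   $\le 1 + 3r_i(x) - 3r_{i-1}(x)$   ($r_{i-1}(x) \le r_i(x)$)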
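As a sanity check, the Access Lemma can be evaluated on the earlier five-node example, in which the leaf of a chain is splayed by two zig-zigs. The intermediate potential $\Phi(T_1)$ is not given in the notes; it is computed here on the assumption that after the first zig-zig the subtree sizes are still 5, 4, 3, 2, 1, so this is a worked illustration rather than part of the original derivation.

```latex
\begin{align*}
\Phi(T_0) &= \lg 5 + \lg 4 + \lg 3 + \lg 2 + \lg 1 = \lg 120 \approx 6.91 \\
\Phi(T_1) &= \lg 5 + \lg 4 + \lg 3 + \lg 2 + \lg 1 \approx 6.91 \\
\Phi(T_2) &= \lg 5 + \lg 4 + \lg 2 + \lg 1 + \lg 1 = \lg 40 \approx 5.32 \\
\hat{C}_1 &= 2 + \Phi(T_1) - \Phi(T_0) = 2
           \;\le\; 3r_1(x) - 3r_0(x) = 3\lg 3 \approx 4.75 \\
\hat{C}_2 &= 2 + \Phi(T_2) - \Phi(T_1) \approx 0.42
           \;\le\; 3r_2(x) - 3r_1(x) = 3\lg\tfrac{5}{3} \approx 2.21 \\
\hat{C}_1 + \hat{C}_2 &\approx 2.42 \;\le\; 3\lg 5 \approx 6.97
\end{align*}
```

Both steps are zig-zigs, so the extra 1 allowed for a final zig is not needed in this case.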

Bound on total amortized cost for an entire splay sequence:

   $\sum_{i=1}^{m} \hat{C}_i = \sum_{i=1}^{m-1} \hat{C}_i + \hat{C}_m$
   $\le \sum_{i=1}^{m-1} \left( 3r_i(x) - 3r_{i-1}(x) \right) + 1 + 3r_m(x) - 3r_{m-1}(x)$
   $= 3r_{m-1}(x) - 3r_0(x) + 1 + 3r_m(x) - 3r_{m-1}(x)$
   $= 1 + 3r_m(x) - 3r_0(x)$
   $\le 1 + 3r_m(x)$
   $= 1 + 3\lg n$   since x is the root after the final rotation

Asides:

If each node is assigned a positive weight and the size of node x, S(x), is the sum of the weights in the subtree, other results may be shown such as:

Static Optimality: Splay trees (online) perform within a constant factor of the optimal (static) binary search tree (offline) for a sequence of requests.

But the elusive goal remains:

Dynamic Optimality Conjecture: Splay trees (online) perform within a constant factor of the optimal (dynamic) binary search tree (offline) for a sequence of requests.

In principle, this may be attacked like the proof that MTF is dynamically optimal. Many difficulties arise...

Potential Function: Minimum number of rotations to transform a BST into another BST?

Transformation: Suppose a BST with n nodes has keys 1 ... n. Construct the regular (n+2)-gon with vertices 0, 1, 2, ..., n+1. If the BST has a subtree with root i, minimum key min(i), and maximum key max(i), then include edges {i, min(i) - 1} and {i, max(i) + 1}. Also, include edge {0, n+1}. These edges give a triangulated polygon.

(A bound of O(log log n) has been shown: http://erikdemaine.org/papers/tango_fos00/)

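A BST rotation corresponds to flipping the diagonal of a quadrilateral:

[Figure: BSTs on keys 1 through 5 drawn with the corresponding triangulations of the polygon with vertices 0 through 6; the trees are related by "Rotate Left" and "Rotate Right".]

May also rotate the original BST left at 1 and right at 5.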
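The transformation from a BST to a triangulated polygon can be sketched in C as follows. This is a minimal sketch under stated assumptions: the keys are exactly 1..n, the caller supplies an array with room for at least 2n + 1 edges, and `BNode`, `Edge`, `subtree_edges`, and `bst_to_triangulation` are hypothetical names. The edge list includes the polygon's sides as well as its diagonals.

```c
#include <assert.h>
#include <stddef.h>

typedef struct BNode {
    int key;                          /* keys are assumed to be exactly 1..n */
    struct BNode *left, *right;
} BNode;

typedef struct { int u, v; } Edge;

/* Appends the two edges for every node of the subtree rooted at t to
   edges[] and reports the subtree's minimum and maximum keys. */
static void subtree_edges(const BNode *t, Edge *edges, int *count,
                          int *min_key, int *max_key)
{
    int lo = t->key, hi = t->key, a, b;

    if (t->left)  { subtree_edges(t->left,  edges, count, &a, &b); lo = a; }
    if (t->right) { subtree_edges(t->right, edges, count, &a, &b); hi = b; }
    edges[(*count)++] = (Edge){ t->key, lo - 1 };   /* {i, min(i) - 1} */
    edges[(*count)++] = (Edge){ t->key, hi + 1 };   /* {i, max(i) + 1} */
    *min_key = lo;
    *max_key = hi;
}

/* Fills edges[] for a BST with keys 1..n and returns the number of
   edges written, which is 2n + 1. */
int bst_to_triangulation(const BNode *root, int n, Edge *edges)
{
    int count = 0, lo, hi;

    assert(root != NULL && n >= 1);
    subtree_edges(root, edges, &count, &lo, &hi);
    edges[count++] = (Edge){ 0, n + 1 };            /* edge {0, n+1} */
    return count;
}
```

For the three-key tree with root 2, the sketch yields the five sides of the pentagon on vertices 0 through 4 plus the diagonals {2, 0} and {2, 4}.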

CSE 5311 Notes 6: Skip Lists (Last updated 6//1 :1 PM)

A randomized data structure with performance characteristics similar to treaps, i.e. another alternative to balanced binary search trees.

Paper available at ftp://ftp.cs.umd.edu/pub/skiplists/skiplists.pdf.

Advantages:

   Simple algorithms, especially for ordered retrieval.
   Highly likely to perform as well as balanced trees.
   Less space: No balance bits. Can be tuned to use fewer pointers per node (on average).

Disadvantages:

   Doesn't handle extreme changes in number of records well.
   Bad worst case (similar to hashing), but depends on random generator rather than data.
   Burden on memory management due to varying size nodes.

Design Decision: Probability $0 < p \le 0.5$ (0.5 typical) for determining number of pointers in a new node:

   $p^{k-1}$ = probability of a node having at least k pointers
   $\frac{1}{1-p}$ = expected number of pointers (used) in each node (geometric sum)

Characteristics:

   Low p: Fewer pointers in structure. Increased search time (always expected to be logarithmic).
   High p: More pointers in structure. Decreased search time.
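Choosing the number of pointers for a new node can be sketched in C as repeated coin flips with success probability p. This is a minimal sketch, not the code from the skip list paper: `random_level` and the `max_level` cap are hypothetical, and the standard `rand()` stands in for whatever random generator an implementation would actually use.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a pointer count k in 1..max_level such that, before the cap
   applies, a node has at least k pointers with probability p^(k-1). */
int random_level(double p, int max_level)
{
    int level = 1;

    assert(p > 0.0 && p <= 0.5 && max_level >= 1);
    /* Each successful flip (probability p) adds one more pointer. */
    while (level < max_level && rand() / (RAND_MAX + 1.0) < p)
        level++;
    return level;
}
```

With p = 0.5 the average result over many calls should be close to 1/(1 - p) = 2 pointers per node, and a lower p gives fewer pointers per node on average.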