Skip lists: A randomized dictionary


Discrete Math for Bioinformatics WS 11/12, by A. Bockmayr / K. Reinert, 31. Oktober 2011

The exposition is based on the following sources, which are all recommended reading:

1. Pugh: Skip lists: a probabilistic alternative to balanced trees. Proceedings WADS, LNCS 382, 1989, pp. 437-449
2. Sedgewick: Algorithms in C++, 2002, Pearson (Chapter 13.5)
3. Lecture script by Michiel Smid, Saarland University
4. Motwani, Raghavan: Randomized Algorithms, Chapters 8.3 and 4.1
5. Kleinberg, Tardos: Algorithm Design, Chapter 13.9

Introduction

Here is a little refresher of sum formulas you will need:

$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$

and for $0 < r < 1$:

$\sum_{k=0}^{n} r^k = \frac{1-r^{n+1}}{1-r} \qquad \text{and} \qquad \sum_{k \ge 0} r^k = \frac{1}{1-r}.$

We consider the so-called dictionary problem. Given a set S of n real numbers, store them in a data structure such that the following three operations can be performed efficiently:

Search(x): Given the real number x, report the maximal element of $S \cup \{-\infty\}$ that is at most equal to x.

Insert(x): Given a real number x, insert it into the data structure.

Delete(x): Given a real number x, delete it from the data structure.

The standard data structure for this problem is the balanced binary tree. It supports all the above operations in worst-case time $O(\log n)$ and uses $O(n)$ space. Well-known classes of balanced trees are, for example, AVL trees, BB[α] trees, and red-black trees. In order to maintain their worst-case time behaviour, all these data structures need more or less elaborate rebalancing operations, which make an implementation non-trivial and in turn often prevent the best practical running times.

In this lecture we introduce an alternative, randomized data structure: the skip list. It uses linear space in expectation and supports the above dictionary operations in expected time $O(\log n)$, with high probability. Why do we do this? We will see that the data structure is conceptually much simpler and more elegant than balanced trees. In exchange, we trade a worst-case running time for an expected running time. However, the analysis will show that skip lists behave very well and are very fast in practice (the difference is similar to that between deterministic merge sort and randomized quicksort).

The goal of this lecture is to (1) introduce you to the data structure, (2) show you how to analyze the randomized running time, and (3) introduce you to tail estimates using Chernoff bounds.

Skip lists

Throughout the lecture we assume that we can generate random, independent bits in unit time. Let S be a set of n real numbers. We construct a sequence of sets $S_1, S_2, \ldots$ as follows:

1. For each element $x \in S$, flip a coin until a zero comes up.
2. For each $i \ge 1$, $S_i$ is the set of elements of S for which we flipped the coin at least i times.

Let h be the number of sets that are constructed. Then it is clear that

$\emptyset = S_h \subseteq S_{h-1} \subseteq S_{h-2} \subseteq \cdots \subseteq S_2 \subseteq S_1 = S.$

The skip list for S then consists of the following:

1. For each $1 \le i \le h$, the elements of $S_i \cup \{-\infty\}$ are stored in a sorted linked list $L_i$.
2. For each $1 < i \le h$, there is a pointer from each $x \in L_i$ to its occurrence in $L_{i-1}$.

Here is an example. Suppose $S = \{1,2,5,7,8,9,11,12,14,17,19,20\}$. Flipping coins might lead to $S_1 = S$, $S_2 = \{1,2,5,8,11,17,20\}$, $S_3 = \{2,5,11,20\}$, $S_4 = \{11\}$, and $S_5 = \emptyset$.
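To make the construction concrete, here is a small Python sketch (our own illustration, not code from the lecture; the names `pick_level` and `build_level_sets` are invented). It assigns each key a level by flipping fair coins and collects the resulting sets $S_i$:

```python
import random

def pick_level():
    """Flip a fair coin until a zero comes up; return the number of flips.
    Pr(level = k) = (1/2)^k, a geometric distribution with p = 1/2."""
    level = 1
    while random.random() < 0.5:   # a "one": keep flipping
        level += 1
    return level

def build_level_sets(S):
    """Return the nonempty sets S_1 >= S_2 >= ... as sorted lists;
    S_i holds the keys whose coin was flipped at least i times."""
    levels = {x: pick_level() for x in S}
    top = max(levels.values())
    return [sorted(x for x in S if levels[x] >= i) for i in range(1, top + 1)]

if __name__ == "__main__":
    S = {1, 2, 5, 7, 8, 9, 11, 12, 14, 17, 19, 20}
    for i, S_i in enumerate(build_level_sets(S), start=1):
        print(f"S_{i} = {S_i}")
```

Running it on the example set S above yields a random family of sets with the same shape as the one shown.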

Searching skip lists

We can now implement the search for x as follows:

1. Let $y_h$ be the only element of $L_h$ (namely $-\infty$).
2. For $i = h, h-1, \ldots, 2$:
   (a) Follow the pointer from $y_i$ in $L_i$ to its occurrence in $L_{i-1}$.
   (b) Starting at this occurrence, walk to the right along $L_{i-1}$ until an element is reached that is larger than x or the end of $L_{i-1}$ is reached. Let $y_{i-1}$ be the last encountered element of $L_{i-1}$ that is at most equal to x.
3. Output $y_1$.

The following figure illustrates the search step (we search for element 10). [Figure omitted.]

It is not hard to imagine how the insertion and deletion operations work on a skip list.

Inserting into a skip list

To insert an element x into the dictionary we proceed as follows:

1. Run the search algorithm for x. Let $y_1, y_2, \ldots, y_h$ be the elements of $L_1, L_2, \ldots, L_h$ that are computed while searching. If $x = y_1$, then $x \in S$ and nothing has to be done. Hence assume that $x \ne y_1$.
2. Flip a coin until a zero comes up. Let $\ell$ be the number of coin flips.
3. For each $1 \le i \le \min(\ell, h)$, add x to the list $L_i$ immediately after $y_i$.
4. If $\ell \ge h$, create new lists $L_{h+1}, \ldots, L_{\ell+1}$ storing the sets $S_{h+1} \cup \{-\infty\}, \ldots, S_{\ell+1} \cup \{-\infty\}$, where each of these sets contains x except for $S_{\ell+1}$, which is empty.
5. For each $1 < i \le \ell$, give x in $L_i$ a pointer to its occurrence in $L_{i-1}$.
6. If $\ell \ge h$, then for each $h+1 \le i \le \ell+1$, give $-\infty$ in $L_i$ a pointer to its occurrence in $L_{i-1}$.
7. Set $h = \max(h, \ell+1)$.
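Search and insertion can be seen together in the following minimal Python sketch. It is a hedged illustration rather than the lecture's reference implementation, and it uses one common design choice: instead of h separate linked lists connected by down pointers, each key gets a single node with a tower of forward pointers, one per list it occurs in, which is equivalent to the structure described above. All names (`SkipList`, `Node`, `NEG_INF`) are ours.

```python
import random

NEG_INF = float("-inf")

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height    # next[i]: successor in list L_{i+1}

class SkipList:
    """One node per key; its tower of forward pointers plays the role of
    the lists L_1, ..., L_h with down pointers."""

    def __init__(self):
        self.head = Node(NEG_INF, 1)   # the -infinity sentinel, in every L_i

    @staticmethod
    def _coin_flips():
        """Flips until the first zero: geometric with p = 1/2."""
        n = 1
        while random.random() < 0.5:
            n += 1
        return n

    def search(self, x):
        """Return the maximal stored key that is at most x (the element y_1)."""
        y = self.head
        for i in reversed(range(len(self.head.next))):  # levels h, ..., 1
            while y.next[i] is not None and y.next[i].key <= x:
                y = y.next[i]
        return y.key

    def insert(self, x):
        # Rerun the search, remembering y_i = last element <= x on each level.
        y = self.head
        preds = [None] * len(self.head.next)
        for i in reversed(range(len(self.head.next))):
            while y.next[i] is not None and y.next[i].key <= x:
                y = y.next[i]
            preds[i] = y
        if y.key == x:                      # x = y_1: already present
            return
        ell = self._coin_flips()
        grow = ell - len(self.head.next)
        if grow > 0:                        # lists L_{h+1}, ..., L_ell are new;
            self.head.next += [None] * grow # the empty L_{ell+1} stays implicit
            preds += [self.head] * grow
        node = Node(x, ell)
        for i in range(ell):                # splice x in right after y_i
            node.next[i] = preds[i].next[i]
            preds[i].next[i] = node

if __name__ == "__main__":
    sl = SkipList()
    for k in [1, 2, 5, 7, 8, 9, 11, 12, 14, 17, 19, 20]:
        sl.insert(k)
    print(sl.search(10))   # prints 9, the largest key <= 10
```

Deletion would simply unlink a node from every list it occurs in, mirroring the algorithm that follows.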

Deleting from a skip list

1. Run the search algorithm for x. Let $y_1, y_2, \ldots, y_h$ be the elements of $L_1, L_2, \ldots, L_h$ that are computed while searching. If $x \ne y_1$, then $x \notin S$ and nothing has to be done. Hence assume that $x = y_1$.
2. For each $1 \le i \le h$ such that $x = y_i$, delete $y_i$ from the list $L_i$.
3. For $i = h, h-1, \ldots$: if $L_{i-1}$ only stores $-\infty$, delete the list $L_i$ and set $h = h - 1$.

Why are skip lists efficient? The intuition

We have seen that most of what we do in skip lists is searching. The rebalancing is done by throwing a coin a few times and making local changes along the search path. How expensive is the search? It is the sum, over all levels, of the path length traversed at that level. We expect there to be about $\log n$ levels. At each level we walk to the right; for a fixed level, however, we do not expect to walk far, since a long walk would imply that many consecutive elements are absent from the level above. Hence we expect to spend a constant amount of time at each level, which adds up to a total search time of $O(\log n)$. We will now prove this more formally.

Why are skip lists efficient? The proofs

The size of a skip list and the running times of the search and update algorithms are random variables. We will prove that their expected values are bounded by $O(n)$ and $O(\log n)$, respectively.

Recall that h denotes the number of sets $S_i$ that result from our probabilistic construction. How can we derive an upper bound for h? Let x be an element of S and let $h(x)$ be the number of sets $S_i$ that contain x. Then $h(x)$ is a random variable distributed according to a geometric distribution with $p = 1/2$. Hence $\Pr(h(x) = k) = (1/2)^k$ and $E(h(x)) = 2$. That means if we look at a specific element, we only expect it to be in $S_1$ and $S_2$.

Clearly $h = 1 + \max\{h(x) : x \in S\}$. From $E(h(x)) = 2$ for any $x \in S$, however, we cannot conclude that the expected value of h is three. We can estimate $E(h)$ as follows. Again consider a fixed $x \in S$. For any $k \ge 1$, $h(x) \ge k$ if and only if the first $k-1$ coin flips produced a one. That is, $\Pr(h(x) \ge k) = (1/2)^{k-1}$. In addition, it is clear that $h \ge k+1$ if and only if there is an $x \in S$ with $h(x) \ge k$. Hence

$\Pr(h \ge k+1) \le \sum_{x \in S} \Pr(h(x) \ge k) = n \cdot 2^{1-k}.$

This estimate does not make sense for $k < 1 + \log n$. For those values of k we can use the trivial upper bound $\Pr(h \ge k+1) \le 1$. Then $E(h)$ equals:

$\sum_{k \ge 0} \Pr(h \ge k+1) = \sum_{k=0}^{\log n} \Pr(h \ge k+1) + \sum_{k > \log n} \Pr(h \ge k+1).$

(Exercise: prove the first equality, that is, $E(X) = \sum_{k \ge 1} \Pr(X \ge k)$ for a random variable X that takes values in $\{0,1,2,\ldots\}$.)

The first summation on the right-hand side is at most $1 + \log n$. The second sum can be bounded from above by:

$\sum_{k > \log n} n \cdot 2^{1-k} = n \cdot 2^{-\log n} \sum_{j \ge 0} 2^{-j} = \frac{n}{n} \cdot 2 = 2.$

Hence we have proven that $E(h) \le 3 + \log n$.
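As a quick sanity check of this bound, the following simulation sketch (our addition, not part of the lecture) samples the heights $h(x)$ directly as geometric random variables and compares the empirical mean of $h = 1 + \max_x h(x)$ with $3 + \log_2 n$:

```python
import math
import random

def geometric():
    """Coin flips until the first zero: Pr(k) = (1/2)^k, mean 2."""
    k = 1
    while random.random() < 0.5:
        k += 1
    return k

def height(n):
    """h = 1 + maximum of n independent geometric heights h(x)."""
    return 1 + max(geometric() for _ in range(n))

if __name__ == "__main__":
    n, trials = 1000, 2000
    avg = sum(height(n) for _ in range(trials)) / trials
    print(f"empirical E(h) ~ {avg:.2f}  vs  bound 3 + log2(n) = {3 + math.log2(n):.2f}")
```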

The expected size of a skip list can easily be computed. Let M denote the total size of the sets $S_1, S_2, \ldots, S_h$. Then $M = \sum_{x \in S} h(x)$, and by linearity of expectation:

$E(M) = \sum_{x \in S} E(h(x)) = \sum_{x \in S} 2 = 2n.$

If $M'$ denotes the total number of nodes in a skip list, then $M'$ is equal to M plus h (one $-\infty$ sentinel per list). Hence

$E(M') = E(M + h) = E(M) + E(h) \le 2n + 3 + \log n.$

What is left to do is to estimate the search costs. Let x be a real number and let $C_i$ denote the number of elements in the list $L_i$ that are inspected when searching for x. (We do not count the element of $L_i$ at which the algorithm starts walking to the right. Hence, $C_i$ counts comparisons between x and elements of S.) The search cost is then proportional to $\sum_{i=1}^{h} (1 + C_i)$.

Again we cannot simply use linearity of expectation, since h is a random variable. Again the trick is to fix an integer A and analyze the search cost up to level A and above level A separately (and differently).

We first estimate the search cost above level A, i.e., the total cost in the lists $L_{A+1}, L_{A+2}, \ldots, L_h$. Since this cost is at most equal to the total size of these lists, its expected value is at most equal to the expected value of $M_A := \sum_{i=A+1}^{h} |L_i|$. How do we estimate this value? We first note that the lists $L_i$, $A+1 \le i \le h$, form a skip list for $S_{A+1}$. Hence we have:

$E(M_A) = \sum_{k} E(M_A \mid |S_{A+1}| = k) \cdot \Pr(|S_{A+1}| = k),$

where $E(M_A \mid |S_{A+1}| = k)$ is the expected size of a skip list with k elements. We have already seen that this is $O(k)$. Hence we only need to compute $\Pr(|S_{A+1}| = k)$. Since $|S_{A+1}| = k$ if and only if exactly k of the n elements of S reach level $A+1$ (each independently with probability $(1/2)^A$), we have:

$\Pr(|S_{A+1}| = k) = \binom{n}{k} \left(\frac{1}{2^A}\right)^k \left(1 - \frac{1}{2^A}\right)^{n-k}.$

Setting $p = 2^{-A}$, we infer that the expected value of $M_A$ is proportional to:

$\sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} = np\,(p + (1-p))^{n-1} = np.$

Hence the expected search cost above level A is bounded by $O(n/2^A)$.
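A quick numerical check of this identity (our addition): evaluating the binomial mean directly and comparing it with $np = n/2^A$, here for the hypothetical values $n = 20$, $A = 3$:

```python
from math import comb

n, A = 20, 3
p = 2 ** (-A)                      # probability of reaching level A+1
mean = sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))
print(mean, n * p)                 # both print 2.5: the binomial mean is np
```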

Next we estimate the expected search cost in the lists $L_1, L_2, \ldots, L_A$. Recall that $C_i$ is the number of elements of $L_i$ inspected when searching for x. We again use conditional expectation. Let $\ell_i(x)$ be the number of elements in $L_i$ that are at most equal to x. Then

$E(C_i) = \sum_{k} E(C_i \mid \ell_i(x) = k) \cdot \Pr(\ell_i(x) = k).$

Assume that $\ell_i(x) = k$. Also assume that there is an element in $L_i$ that is larger than x. Then $C_i = j$ if and only if the largest $j-1$ elements of $L_i$ that are at most equal to x do not appear in $L_{i+1}$, but the element that immediately precedes these $j-1$ elements does appear in $L_{i+1}$. Hence

$\Pr(C_i = j \mid \ell_i(x) = k) \le (1/2)^{j-1}, \qquad 0 \le j \le k.$

This inequality also holds if x is at least equal to the maximal element of $L_i$. From this we obtain:

$E(C_i \mid \ell_i(x) = k) = \sum_{j=0}^{k} j \cdot \Pr(C_i = j \mid \ell_i(x) = k) \le \sum_{j=0}^{k} \frac{j}{2^{j-1}} \le 4.$

(Exercise. Hint: write the sum $\sum_{j \ge 0} j x^{j-1}$ as the derivative of $\sum_{j \ge 0} x^j$, then bound.)

This, in turn, implies that

$E(C_i) \le \sum_{k} 4 \cdot \Pr(\ell_i(x) = k) = 4.$

It follows that the expected search cost up to level A is proportional to:

$E\left(\sum_{i=1}^{A} (1 + C_i)\right) = \sum_{i=1}^{A} (1 + E(C_i)) \le 5A.$

Summarizing, we have shown that the expected search time for an element x is bounded by

$O\left(\frac{n}{2^A} + A\right).$

Setting $A = \log n$, we obtain the required bound of $O(\log n)$.

Tail estimates: Chernoff bounds

So far we proved bounds on the expected size, search time, and update time of a skip list. In this section we consider so-called tail estimates. That is, we estimate the probability that the actual search time deviates significantly from its expected value. For example, assume for a moment that the constant in the $O(\log n)$ term for the search time is one. Then we want to estimate the probability that the actual search time is at least $t \log n$. We could derive an estimate using Markov's inequality.

Lemma 1. Let X be a random variable that takes non-negative values, and let $\mu$ be the expected value of X. Then for any $t > 0$,

$\Pr(X \ge t\mu) \le \frac{1}{t}.$

Proof: Let $s = t\mu$. Then

$\mu = \sum_{x} x \Pr(X = x) \ge \sum_{x \ge s} x \Pr(X = x) \ge \sum_{x \ge s} s \Pr(X = x) = s \Pr(X \ge s).$

Hence the probability that the actual search time is at least $t \log n$ is less than or equal to $1/t$. This is not very impressive: the probability that the search time is more than 100 times its expected value is at most 1/100. So if this bound were tight, one search in a hundred would take more than 100 times the time of the average search.

In this section we will see that Chernoff bounds give a much tighter estimate. We will prove that the probability that the search time exceeds $t \log n$ is at most $n^{-t/8}$ for $t \ge 5$. Hence, in a skip list of 1000 elements, the probability that the search time is more than 100 times its expected value is about $10^{-38}$, which in practice means it will never occur. (Even for $t = 50$ the bound is still about $10^{-19}$, and for $t = 10$ the probability is still only about $2 \cdot 10^{-4}$.)

Markov's inequality holds for any non-negative random variable. The Chernoff technique applies to random variables X that can be written as a sum $\sum_{i=1}^{n} X_i$ of mutually independent random variables $X_i$. (Variables are called mutually independent if their joint density function is the product of the individual density functions. Beware that mutual independence is different from pairwise independence! (Exercise.)) In such cases much better bounds can be obtained.

So let $X_1, X_2, X_3, \ldots, X_n$ be a sequence of mutually independent random variables and let $X = \sum_{i=1}^{n} X_i$. The moment generating function (mgf) of a (discrete) random variable Y is defined as

$m_Y(\lambda) = E(e^{\lambda Y}) = \sum_{y} e^{\lambda y} \Pr(Y = y).$

As the name suggests, this function is used to easily generate the moments of the random variable Y. Clearly $m_Y(0) = 1$, and it is easy to show that $\mu = m_Y'(0)$ and $\sigma^2 = m_Y''(0) - \mu^2$ (exercise). In the case of X, which is a sum of independent variables, $m_X(\lambda) = \prod_{i=1}^{n} m_{X_i}(\lambda)$, and of course the mean value of X, the derivative of the mgf at position 0, is simply the sum of the means of the $X_i$. Written out:

$E(e^{\lambda X}) = E(e^{\lambda(X_1 + \cdots + X_n)}) = \prod_{i=1}^{n} E(e^{\lambda X_i}).$

Now let $s > 0$ and $\lambda > 0$. Since $X \ge s$ if and only if $e^{\lambda X} \ge e^{\lambda s}$, we have $\Pr(X \ge s) = \Pr(e^{\lambda X} \ge e^{\lambda s})$. By applying Markov's inequality to the non-negative random variable $e^{\lambda X}$, we get

$\Pr(X \ge s) = \Pr(e^{\lambda X} \ge e^{\lambda s}) \le e^{-\lambda s} E(e^{\lambda X}).$

This yields:

$\Pr(X \ge s) \le e^{-\lambda s} \prod_{i=1}^{n} E(e^{\lambda X_i}), \qquad \text{for } s > 0 \text{ and } \lambda > 0.$

This is the basic inequality we work with. To estimate $\Pr(X \ge s)$ we need bounds on $E(e^{\lambda X_i})$. Of course, those bounds depend on the probability distribution of the $X_i$.

We will now illustrate the technique using the geometric distribution with parameter $p = 1/2$. Let T be the number of flips we need until a one comes up in a series of coin flips. Then $\Pr(T = k) = (1/2)^k$ for $k \ge 1$ and $E(T) = 2$. Now assume we are interested in $T_n$, the number of flips we need until we obtain a one exactly n times (i.e., $T = T_1$). If we define the random variable $X_i$ as the number of flips between the $(i-1)$-st one (excluding) and the i-th one (including), then each $X_i$ is distributed according to a geometric distribution. (This is also called the memoryless property of the geometric or exponential distribution.) Then $T_n = \sum_{i=1}^{n} X_i$, where each $X_i$ is distributed according to a geometric distribution, the expected value is $E(T_n) = 2n$, and Markov's inequality gives $\Pr(T_n \ge (2+t)n) \le \frac{2}{2+t}$.

For $0 < \lambda < \log 2$ we have

$E(e^{\lambda X_i}) = \sum_{k \ge 1} e^{\lambda k} \Pr(X_i = k) = \sum_{k \ge 1} \left(\frac{e^{\lambda}}{2}\right)^k = \frac{e^{\lambda}}{2 - e^{\lambda}}.$
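Before optimizing over $\lambda$ analytically (next), the basic inequality can already be evaluated numerically. The sketch below is our own illustration, not part of the lecture; the helper names `mgf_geometric`, `chernoff_bound`, and `sample_T` are invented. It minimizes $e^{-\lambda s}\,(e^{\lambda}/(2-e^{\lambda}))^n$ over a grid of $\lambda \in (0, \log 2)$ with $s = (2+t)n$, and compares the result with Markov's bound $2/(2+t)$ and the empirical tail of $T_n$:

```python
import math
import random

def mgf_geometric(lam):
    """E(e^{lambda X}) for X geometric with p = 1/2; valid for lambda < log 2."""
    return math.exp(lam) / (2.0 - math.exp(lam))

def chernoff_bound(n, t, steps=10_000):
    """min over lambda in (0, log 2) of e^{-lambda s} * mgf^n, with s = (2+t)n."""
    s = (2 + t) * n
    best = 1.0
    for i in range(1, steps):
        lam = math.log(2) * i / steps
        best = min(best, math.exp(-lam * s) * mgf_geometric(lam) ** n)
    return best

def sample_T(n):
    """Flips until the n-th one: a sum of n geometric(1/2) variables."""
    flips = ones = 0
    while ones < n:
        flips += 1
        ones += random.random() < 0.5
    return flips

if __name__ == "__main__":
    n, t, trials = 20, 1, 100_000
    s = (2 + t) * n
    empirical = sum(sample_T(n) >= s for _ in range(trials)) / trials
    print(f"empirical Pr(T_n >= {s}) ~ {empirical:.2e}")
    print(f"Markov:   {2 / (2 + t):.2e}")
    print(f"Chernoff: {chernoff_bound(n, t):.2e}")
```

Already for $n = 20$ and $t = 1$, the numerically optimized Chernoff bound is orders of magnitude below Markov's $2/3$.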

We now apply our basic inequality with $s = (2+t)n$, where $t > 0$, and get

$\Pr(T_n \ge (2+t)n) \le e^{-\lambda(2+t)n} \left(\frac{e^{\lambda}}{2 - e^{\lambda}}\right)^n = \left(\frac{e^{-\lambda(1+t)}}{2 - e^{\lambda}}\right)^n.$

Now we choose $\lambda$ such that the term on the right-hand side is minimized (exercise) and find $\lambda = \log\left(1 + \frac{t}{2+t}\right)$. Hence we have

$\Pr(T_n \ge (2+t)n) \le \left((1 + t/2)\left(1 - \frac{t}{2+2t}\right)^{1+t}\right)^n.$

Since $1 - x \le e^{-x}$ for all x, we have

$\left(1 - \frac{t}{2+2t}\right)^{1+t} \le \left(e^{-\frac{t}{2+2t}}\right)^{1+t} = e^{-t/2}.$

Moreover, $1 + t/2 \le e^{t/4}$ for $t \ge 3$. This proves that for $t \ge 3$

$\Pr(T_n \ge (2+t)n) \le \left(e^{t/4} \cdot e^{-t/2}\right)^n = e^{-tn/4}.$

Compare this with the bound obtained from Markov's inequality (which was $\frac{2}{2+t}$)!

We can subsume our findings in the following theorem:

Theorem 2. Let $X_1, X_2, \ldots, X_n$ be mutually independent random variables, and assume that each $X_i$ is distributed according to a geometric distribution with $p = 1/2$. Let $T_n = \sum_{i=1}^{n} X_i$. Then $E(T_n) = 2n$, and for any $t \ge 3$:

$\Pr(T_n \ge (2+t)n) \le e^{-tn/4}.$

Corollary 3. Let $c \ge 1$ be a constant and let m be a positive integer. Further, let $n = c \ln m$. Then for any $s \ge 5$ it holds that

$\Pr(T_n \ge sn) \le m^{-(s-2)c/4}.$
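Corollary 3 follows from Theorem 2 by direct substitution; here is the short derivation (our addition), with $t = s - 2$ and $n = c \ln m$:

```latex
% Derivation of Corollary 3 from Theorem 2:
% set t = s - 2 (so s >= 5 gives t >= 3) and n = c ln m.
\begin{align*}
\Pr(T_n \ge sn) &= \Pr\bigl(T_n \ge (2 + (s-2))\,n\bigr) \\
                &\le e^{-(s-2)n/4}          && \text{(Theorem 2 with } t = s-2 \ge 3) \\
                &= e^{-(s-2)\,c \ln(m)/4}   && (n = c \ln m) \\
                &= m^{-(s-2)c/4}.
\end{align*}
```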