Skip lists: A randomized dictionary


Discrete Math for Bioinformatics WS 11/12, by A. Bockmayr / K. Reinert, 31. Oktober 2011

The exposition is based on the following sources, which are all recommended reading:

1. Pugh: Skip lists: a probabilistic alternative to balanced trees. Proceedings WADS, LNCS 382, 1989, pp. 437-449
2. Sedgewick: Algorithms in C++, 2002, Pearson (Chapter 13.5)
3. Lecture script by Michiel Smid, Saarland University
4. Motwani, Raghavan: Randomized Algorithms, Chapters 8.3 and 4.1
5. Kleinberg, Tardos: Algorithm Design, Chapter 13.9

Introduction

Here is a little refresher of sum formulas you will need:

$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$

and for $0 < r < 1$:

$\sum_{k=0}^{n} r^k = \frac{1-r^{n+1}}{1-r} \qquad \text{and} \qquad \sum_{k \ge 0} r^k = \frac{1}{1-r}.$

We consider the so-called dictionary problem. Given a set S of n real numbers, store them in a data structure such that the following three operations can be performed efficiently:

Search(x): Given the real number x, report the maximal element of $S \cup \{-\infty\}$ that is at most equal to x.

Insert(x): Given a real number x, insert it into the data structure.

Delete(x): Given a real number x, delete it from the data structure.

The standard data structure for this problem is the balanced binary tree. It supports all the above operations in worst-case time $O(\log n)$ and uses $O(n)$ space. Well-known classes of balanced trees are, for example, AVL trees, BB[α] trees, and red-black trees. In order to maintain their worst-case time behaviour, all these data structures need more or less elaborate rebalancing operations, which make an implementation non-trivial and in turn often prevent the best practical running times.

In this lecture we introduce an alternative, randomized data structure: the skip list. It uses linear space in expectation and supports the above dictionary operations in expected time $O(\log n)$, with high probability. Why do we do this? We will see that the data structure is conceptually much simpler and more elegant than balanced trees. In exchange, we trade a worst-case running time for an expected running time. However, the analysis will show that skip lists behave very well and are very fast in practice (the difference is similar to that between deterministic merge sort and randomized quicksort).

The goal of this lecture is to (1) introduce you to the data structure, (2) show you how to analyze the randomized running time, and (3) introduce you to tail estimates using Chernoff bounds.

Skip lists

Throughout the lecture we assume that we can generate random, independent bits in unit time. Let S be a set of n real numbers. We construct a sequence of sets $S_1, S_2, \ldots$ as follows:

1. For each element $x \in S$, flip a coin until a zero comes up.
2. For each $i \ge 1$, $S_i$ is the set of elements of S for which we flipped the coin at least i times.

Let h be the number of sets that are constructed. Then it is clear that

$\emptyset = S_h \subseteq S_{h-1} \subseteq S_{h-2} \subseteq \cdots \subseteq S_2 \subseteq S_1 = S.$

The skip list for S then consists of the following:

1. For each $1 \le i \le h$, the elements of $S_i \cup \{-\infty\}$ are stored in a sorted linked list $L_i$.
2. For each $1 < i \le h$, there is a pointer from each $x \in L_i$ to its occurrence in $L_{i-1}$.

Here is an example. Suppose $S = \{1,2,5,7,8,9,11,12,14,17,19,20\}$. Flipping coins might lead to $S_1 = S$, $S_2 = \{1,2,5,8,11,17,20\}$, $S_3 = \{2,5,11,20\}$, $S_4 = \{11\}$, and $S_5 = \emptyset$.
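To make the construction concrete, here is a small Python sketch (our own illustration, not code from the lecture; the names `pick_level` and `build_level_sets` are invented). It assigns each key a level by flipping fair coins and collects the resulting sets $S_i$:

```python
import random

def pick_level():
    """Flip a fair coin until a zero comes up; return the number of flips.
    Pr(level = k) = (1/2)^k, a geometric distribution with p = 1/2."""
    level = 1
    while random.random() < 0.5:   # a "one": keep flipping
        level += 1
    return level

def build_level_sets(S):
    """Return the nonempty sets S_1 >= S_2 >= ... as sorted lists;
    S_i holds the keys whose coin was flipped at least i times."""
    levels = {x: pick_level() for x in S}
    top = max(levels.values())
    return [sorted(x for x in S if levels[x] >= i) for i in range(1, top + 1)]

if __name__ == "__main__":
    S = {1, 2, 5, 7, 8, 9, 11, 12, 14, 17, 19, 20}
    for i, S_i in enumerate(build_level_sets(S), start=1):
        print(f"S_{i} = {S_i}")
```

Running it on the example set S above yields a random family of sets with the same shape as the one shown.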

Searching skip lists

We can now implement the search for x as follows:

1. Let $y_h$ be the only element of $L_h$ (namely $-\infty$).
2. For $i = h, h-1, \ldots, 2$:
   (a) Follow the pointer from $y_i$ in $L_i$ to its occurrence in $L_{i-1}$.
   (b) Starting at this occurrence, walk to the right along $L_{i-1}$ until an element is reached that is larger than x or the end of $L_{i-1}$ is reached. Let $y_{i-1}$ be the last encountered element of $L_{i-1}$ that is at most equal to x.
3. Output $y_1$.

The following figure illustrates the search step (we search for element 10). [Figure omitted.]

It is not hard to imagine how the insertion and deletion operations work on a skip list.

Inserting into a skip list

To insert an element x into the dictionary we proceed as follows:

1. Run the search algorithm for x. Let $y_1, y_2, \ldots, y_h$ be the elements of $L_1, L_2, \ldots, L_h$ that are computed while searching. If $x = y_1$, then $x \in S$ and nothing has to be done. Hence assume that $x \ne y_1$.
2. Flip a coin until a zero comes up. Let $\ell$ be the number of coin flips.
3. For each $1 \le i \le \min(\ell, h)$, add x to the list $L_i$ immediately after $y_i$.
4. If $\ell \ge h$, create new lists $L_{h+1}, \ldots, L_{\ell+1}$ storing the sets $S_{h+1} \cup \{-\infty\}, \ldots, S_{\ell+1} \cup \{-\infty\}$, where each of these sets contains x except for $S_{\ell+1}$, which is empty.
5. For each $1 < i \le \ell$, give x in $L_i$ a pointer to its occurrence in $L_{i-1}$.
6. If $\ell \ge h$, then for each $h+1 \le i \le \ell+1$, give $-\infty$ in $L_i$ a pointer to its occurrence in $L_{i-1}$.
7. Set $h = \max(h, \ell+1)$.
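Search and insertion can be seen together in the following minimal Python sketch. It is a hedged illustration rather than the lecture's reference implementation, and it uses one common design choice: instead of h separate linked lists connected by down pointers, each key gets a single node with a tower of forward pointers, one per list it occurs in, which is equivalent to the structure described above. All names (`SkipList`, `Node`, `NEG_INF`) are ours.

```python
import random

NEG_INF = float("-inf")

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height    # next[i]: successor in list L_{i+1}

class SkipList:
    """One node per key; its tower of forward pointers plays the role of
    the lists L_1, ..., L_h with down pointers."""

    def __init__(self):
        self.head = Node(NEG_INF, 1)   # the -infinity sentinel, in every L_i

    @staticmethod
    def _coin_flips():
        """Flips until the first zero: geometric with p = 1/2."""
        n = 1
        while random.random() < 0.5:
            n += 1
        return n

    def search(self, x):
        """Return the maximal stored key that is at most x (the element y_1)."""
        y = self.head
        for i in reversed(range(len(self.head.next))):  # levels h, ..., 1
            while y.next[i] is not None and y.next[i].key <= x:
                y = y.next[i]
        return y.key

    def insert(self, x):
        # Rerun the search, remembering y_i = last element <= x on each level.
        y = self.head
        preds = [None] * len(self.head.next)
        for i in reversed(range(len(self.head.next))):
            while y.next[i] is not None and y.next[i].key <= x:
                y = y.next[i]
            preds[i] = y
        if y.key == x:                      # x = y_1: already present
            return
        ell = self._coin_flips()
        grow = ell - len(self.head.next)
        if grow > 0:                        # lists L_{h+1}, ..., L_ell are new;
            self.head.next += [None] * grow # the empty L_{ell+1} stays implicit
            preds += [self.head] * grow
        node = Node(x, ell)
        for i in range(ell):                # splice x in right after y_i
            node.next[i] = preds[i].next[i]
            preds[i].next[i] = node

if __name__ == "__main__":
    sl = SkipList()
    for k in [1, 2, 5, 7, 8, 9, 11, 12, 14, 17, 19, 20]:
        sl.insert(k)
    print(sl.search(10))   # prints 9, the largest key <= 10
```

Deletion would simply unlink a node from every list it occurs in, mirroring the algorithm that follows.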

Deleting from a skip list

1. Run the search algorithm for x. Let $y_1, y_2, \ldots, y_h$ be the elements of $L_1, L_2, \ldots, L_h$ that are computed while searching. If $x \ne y_1$, then $x \notin S$ and nothing has to be done. Hence assume that $x = y_1$.
2. For each $1 \le i \le h$ such that $x = y_i$, delete $y_i$ from the list $L_i$.
3. For $i = h, h-1, \ldots$: if $L_{i-1}$ only stores $-\infty$, delete the list $L_i$ and set $h = h - 1$.

Why are skip lists efficient? The intuition

We have seen that most of what we do in skip lists is searching. The rebalancing is done by throwing a coin a few times and making local changes along the search path. How expensive is the search? It is the sum, over all levels, of the path length traversed at that level. We expect there to be about $\log n$ levels. At each level we walk to the right; for a fixed level, however, we do not expect to walk far, since a long walk would imply that many consecutive elements are absent from the level above. Hence we expect to spend a constant amount of time at each level, which adds up to a total search time of $O(\log n)$. We will now prove this more formally.

Why are skip lists efficient? The proofs

The size of a skip list and the running times of the search and update algorithms are random variables. We will prove that their expected values are bounded by $O(n)$ and $O(\log n)$, respectively.

Recall that h denotes the number of sets $S_i$ that result from our probabilistic construction. How can we derive an upper bound for h? Let x be an element of S and let $h(x)$ be the number of sets $S_i$ that contain x. Then $h(x)$ is a random variable distributed according to a geometric distribution with $p = 1/2$. Hence $\Pr(h(x) = k) = (1/2)^k$ and $E(h(x)) = 2$. That means if we look at a specific element, we only expect it to be in $S_1$ and $S_2$.

Clearly $h = 1 + \max\{h(x) : x \in S\}$. From $E(h(x)) = 2$ for any $x \in S$, however, we cannot conclude that the expected value of h is three. We can estimate $E(h)$ as follows. Again consider a fixed $x \in S$. For any $k \ge 1$, $h(x) \ge k$ if and only if the first $k-1$ coin flips produced a one. That is, $\Pr(h(x) \ge k) = (1/2)^{k-1}$. In addition, it is clear that $h \ge k+1$ if and only if there is an $x \in S$ with $h(x) \ge k$. Hence

$\Pr(h \ge k+1) \le \sum_{x \in S} \Pr(h(x) \ge k) = n \cdot 2^{1-k}.$

This estimate does not make sense for $k < 1 + \log n$. For those values of k we can use the trivial upper bound $\Pr(h \ge k+1) \le 1$. Then $E(h)$ equals:

$\sum_{k \ge 0} \Pr(h \ge k+1) = \sum_{k=0}^{\log n} \Pr(h \ge k+1) + \sum_{k > \log n} \Pr(h \ge k+1).$

(Exercise: prove the first equality, that is, $E(X) = \sum_{k \ge 1} \Pr(X \ge k)$ for a random variable X that takes values in $\{0,1,2,\ldots\}$.)

The first summation on the right-hand side is at most $1 + \log n$. The second sum can be bounded from above by:

$\sum_{k > \log n} n \cdot 2^{1-k} = n \cdot 2^{-\log n} \sum_{j \ge 0} 2^{-j} = \frac{n}{n} \cdot 2 = 2.$

Hence we have proven that $E(h) \le 3 + \log n$.
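As a quick sanity check of this bound, the following simulation sketch (our addition, not part of the lecture) samples the heights $h(x)$ directly as geometric random variables and compares the empirical mean of $h = 1 + \max_x h(x)$ with $3 + \log_2 n$:

```python
import math
import random

def geometric():
    """Coin flips until the first zero: Pr(k) = (1/2)^k, mean 2."""
    k = 1
    while random.random() < 0.5:
        k += 1
    return k

def height(n):
    """h = 1 + maximum of n independent geometric heights h(x)."""
    return 1 + max(geometric() for _ in range(n))

if __name__ == "__main__":
    n, trials = 1000, 2000
    avg = sum(height(n) for _ in range(trials)) / trials
    print(f"empirical E(h) ~ {avg:.2f}  vs  bound 3 + log2(n) = {3 + math.log2(n):.2f}")
```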

The expected size of a skip list can easily be computed. Let M denote the total size of the sets $S_1, S_2, \ldots, S_h$. Then $M = \sum_{x \in S} h(x)$, and by linearity of expectation:

$E(M) = \sum_{x \in S} E(h(x)) = \sum_{x \in S} 2 = 2n.$

If $M'$ denotes the total number of nodes in a skip list, then $M'$ is equal to M plus h (one $-\infty$ sentinel per list). Hence

$E(M') = E(M + h) = E(M) + E(h) \le 2n + 3 + \log n.$

What is left to do is to estimate the search costs. Let x be a real number and let $C_i$ denote the number of elements in the list $L_i$ that are inspected when searching for x. (We do not count the element of $L_i$ at which the algorithm starts walking to the right. Hence, $C_i$ counts comparisons between x and elements of S.) The search cost is then proportional to $\sum_{i=1}^{h} (1 + C_i)$.

Again we cannot simply use linearity of expectation, since h is a random variable. Again the trick is to fix an integer A and analyze the search cost up to level A and above level A separately (and differently).

We first estimate the search cost above level A, i.e., the total cost in the lists $L_{A+1}, L_{A+2}, \ldots, L_h$. Since this cost is at most equal to the total size of these lists, its expected value is at most equal to the expected value of $M_A := \sum_{i=A+1}^{h} |L_i|$. How do we estimate this value? We first note that the lists $L_i$, $A+1 \le i \le h$, form a skip list for $S_{A+1}$. Hence we have:

$E(M_A) = \sum_{k} E(M_A \mid |S_{A+1}| = k) \cdot \Pr(|S_{A+1}| = k),$

where $E(M_A \mid |S_{A+1}| = k)$ is the expected size of a skip list with k elements. We have already seen that this is $O(k)$. Hence we only need to compute $\Pr(|S_{A+1}| = k)$. Since $|S_{A+1}| = k$ if and only if exactly k of the n elements of S reach level $A+1$ (each independently with probability $(1/2)^A$), we have:

$\Pr(|S_{A+1}| = k) = \binom{n}{k} \left(\frac{1}{2^A}\right)^k \left(1 - \frac{1}{2^A}\right)^{n-k}.$

Setting $p = 2^{-A}$, we infer that the expected value of $M_A$ is proportional to:

$\sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} = np\,(p + (1-p))^{n-1} = np.$

Hence the expected search cost above level A is bounded by $O(n/2^A)$.
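A quick numerical check of this identity (our addition): evaluating the binomial mean directly and comparing it with $np = n/2^A$, here for the hypothetical values $n = 20$, $A = 3$:

```python
from math import comb

n, A = 20, 3
p = 2 ** (-A)                      # probability of reaching level A+1
mean = sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))
print(mean, n * p)                 # both print 2.5: the binomial mean is np
```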

Next we estimate the expected search cost in the lists $L_1, L_2, \ldots, L_A$. Recall that $C_i$ is the number of elements of $L_i$ inspected when searching for x. We again use conditional expectation. Let $\ell_i(x)$ be the number of elements in $L_i$ that are at most equal to x. Then

$E(C_i) = \sum_{k} E(C_i \mid \ell_i(x) = k) \cdot \Pr(\ell_i(x) = k).$

Assume that $\ell_i(x) = k$. Also assume that there is an element in $L_i$ that is larger than x. Then $C_i = j$ if and only if the largest $j-1$ elements of $L_i$ that are at most equal to x do not appear in $L_{i+1}$, but the element that immediately precedes these $j-1$ elements does appear in $L_{i+1}$. Hence

$\Pr(C_i = j \mid \ell_i(x) = k) \le (1/2)^{j-1}, \qquad 0 \le j \le k.$

This inequality also holds if x is at least equal to the maximal element of $L_i$. From this we obtain:

$E(C_i \mid \ell_i(x) = k) = \sum_{j=0}^{k} j \cdot \Pr(C_i = j \mid \ell_i(x) = k) \le \sum_{j=0}^{k} \frac{j}{2^{j-1}} \le 4.$

(Exercise. Hint: write the sum $\sum_{j \ge 0} j x^{j-1}$ as the derivative of $\sum_{j \ge 0} x^j$, then bound.)

This, in turn, implies that

$E(C_i) \le \sum_{k} 4 \cdot \Pr(\ell_i(x) = k) = 4.$

It follows that the expected search cost up to level A is proportional to:

$E\left(\sum_{i=1}^{A} (1 + C_i)\right) = \sum_{i=1}^{A} (1 + E(C_i)) \le 5A.$

Summarizing, we have shown that the expected search time for an element x is bounded by

$O\left(\frac{n}{2^A} + A\right).$

Setting $A = \log n$, we obtain the required bound of $O(\log n)$.

Tail estimates: Chernoff bounds

So far we proved bounds on the expected size, search time, and update time of a skip list. In this section we consider so-called tail estimates. That is, we estimate the probability that the actual search time deviates significantly from its expected value. For example, assume for a moment that the constant in the $O(\log n)$ term for the search time is one. Then we want to estimate the probability that the actual search time is at least $t \log n$. We could derive an estimate using Markov's inequality.

Lemma 1. Let X be a random variable that takes non-negative values, and let $\mu$ be the expected value of X. Then for any $t > 0$,

$\Pr(X \ge t\mu) \le \frac{1}{t}.$

Proof: Let $s = t\mu$. Then

$\mu = \sum_{x} x \Pr(X = x) \ge \sum_{x \ge s} x \Pr(X = x) \ge \sum_{x \ge s} s \Pr(X = x) = s \Pr(X \ge s).$

Hence the probability that the actual search time is at least $t \log n$ is less than or equal to $1/t$. This is not very impressive: the probability that the search time is more than 100 times its expected value is at most 1/100. So if this bound were tight, one search in a hundred would take more than 100 times the time of the average search.

In this section we will see that Chernoff bounds give a much tighter estimate. We will prove that the probability that the search time exceeds $t \log n$ is at most $n^{-t/8}$ for $t \ge 5$. Hence, in a skip list of 1000 elements, the probability that the search time is more than 100 times its expected value is about $10^{-38}$, which in practice means it will never occur. (Even for $t = 50$ the bound is still about $10^{-19}$, and for $t = 10$ the probability is still only about $2 \cdot 10^{-4}$.)

Markov's inequality holds for any non-negative random variable. The Chernoff technique applies to random variables X that can be written as a sum $\sum_{i=1}^{n} X_i$ of mutually independent random variables $X_i$. (Variables are called mutually independent if their joint density function is the product of the individual density functions. Beware that mutual independence is different from pairwise independence! (Exercise.)) In such cases much better bounds can be obtained.

So let $X_1, X_2, X_3, \ldots, X_n$ be a sequence of mutually independent random variables and let $X = \sum_{i=1}^{n} X_i$. The moment generating function (mgf) of a (discrete) random variable Y is defined as

$m_Y(\lambda) = E(e^{\lambda Y}) = \sum_{y} e^{\lambda y} \Pr(Y = y).$

As the name suggests, this function is used to easily generate the moments of the random variable Y. Clearly $m_Y(0) = 1$, and it is easy to show that $\mu = m_Y'(0)$ and $\sigma^2 = m_Y''(0) - \mu^2$ (exercise). In the case of X, which is a sum of independent variables, $m_X(\lambda) = \prod_{i=1}^{n} m_{X_i}(\lambda)$, and of course the mean value of X, the derivative of the mgf at position 0, is simply the sum of the means of the $X_i$. Written out:

$E(e^{\lambda X}) = E(e^{\lambda(X_1 + \cdots + X_n)}) = \prod_{i=1}^{n} E(e^{\lambda X_i}).$

Now let $s > 0$ and $\lambda > 0$. Since $X \ge s$ if and only if $e^{\lambda X} \ge e^{\lambda s}$, we have $\Pr(X \ge s) = \Pr(e^{\lambda X} \ge e^{\lambda s})$. By applying Markov's inequality to the non-negative random variable $e^{\lambda X}$, we get

$\Pr(X \ge s) = \Pr(e^{\lambda X} \ge e^{\lambda s}) \le e^{-\lambda s} E(e^{\lambda X}).$

This yields:

$\Pr(X \ge s) \le e^{-\lambda s} \prod_{i=1}^{n} E(e^{\lambda X_i}), \qquad \text{for } s > 0 \text{ and } \lambda > 0.$

This is the basic inequality we work with. To estimate $\Pr(X \ge s)$ we need bounds on $E(e^{\lambda X_i})$. Of course, those bounds depend on the probability distribution of the $X_i$.

We will now illustrate the technique using the geometric distribution with parameter $p = 1/2$. Let T be the number of flips we need until a one comes up in a series of coin flips. Then $\Pr(T = k) = (1/2)^k$ for $k \ge 1$ and $E(T) = 2$. Now assume we are interested in $T_n$, the number of flips we need until we obtain a one exactly n times (i.e., $T = T_1$). If we define the random variable $X_i$ as the number of flips between the $(i-1)$-st one (excluding) and the i-th one (including), then each $X_i$ is distributed according to a geometric distribution. (This is also called the memoryless property of the geometric or exponential distribution.) Then $T_n = \sum_{i=1}^{n} X_i$, where each $X_i$ is distributed according to a geometric distribution, the expected value is $E(T_n) = 2n$, and Markov's inequality gives $\Pr(T_n \ge (2+t)n) \le \frac{2}{2+t}$.

For $0 < \lambda < \log 2$ we have

$E(e^{\lambda X_i}) = \sum_{k \ge 1} e^{\lambda k} \Pr(X_i = k) = \sum_{k \ge 1} \left(\frac{e^{\lambda}}{2}\right)^k = \frac{e^{\lambda}}{2 - e^{\lambda}}.$
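Before optimizing over $\lambda$ analytically (next), the basic inequality can already be evaluated numerically. The sketch below is our own illustration, not part of the lecture; the helper names `mgf_geometric`, `chernoff_bound`, and `sample_T` are invented. It minimizes $e^{-\lambda s}\,(e^{\lambda}/(2-e^{\lambda}))^n$ over a grid of $\lambda \in (0, \log 2)$ with $s = (2+t)n$, and compares the result with Markov's bound $2/(2+t)$ and the empirical tail of $T_n$:

```python
import math
import random

def mgf_geometric(lam):
    """E(e^{lambda X}) for X geometric with p = 1/2; valid for lambda < log 2."""
    return math.exp(lam) / (2.0 - math.exp(lam))

def chernoff_bound(n, t, steps=10_000):
    """min over lambda in (0, log 2) of e^{-lambda s} * mgf^n, with s = (2+t)n."""
    s = (2 + t) * n
    best = 1.0
    for i in range(1, steps):
        lam = math.log(2) * i / steps
        best = min(best, math.exp(-lam * s) * mgf_geometric(lam) ** n)
    return best

def sample_T(n):
    """Flips until the n-th one: a sum of n geometric(1/2) variables."""
    flips = ones = 0
    while ones < n:
        flips += 1
        ones += random.random() < 0.5
    return flips

if __name__ == "__main__":
    n, t, trials = 20, 1, 100_000
    s = (2 + t) * n
    empirical = sum(sample_T(n) >= s for _ in range(trials)) / trials
    print(f"empirical Pr(T_n >= {s}) ~ {empirical:.2e}")
    print(f"Markov:   {2 / (2 + t):.2e}")
    print(f"Chernoff: {chernoff_bound(n, t):.2e}")
```

Already for $n = 20$ and $t = 1$, the numerically optimized Chernoff bound is orders of magnitude below Markov's $2/3$.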

We now apply our basic inequality with $s = (2+t)n$, where $t > 0$, and get

$\Pr(T_n \ge (2+t)n) \le e^{-\lambda(2+t)n} \left(\frac{e^{\lambda}}{2 - e^{\lambda}}\right)^n = \left(\frac{e^{-\lambda(1+t)}}{2 - e^{\lambda}}\right)^n.$

Now we choose $\lambda$ such that the term on the right-hand side is minimized (exercise) and find $\lambda = \log\left(1 + \frac{t}{2+t}\right)$. Hence we have

$\Pr(T_n \ge (2+t)n) \le \left((1 + t/2)\left(1 - \frac{t}{2+2t}\right)^{1+t}\right)^n.$

Since $1 - x \le e^{-x}$ for all x, we have

$\left(1 - \frac{t}{2+2t}\right)^{1+t} \le \left(e^{-\frac{t}{2+2t}}\right)^{1+t} = e^{-t/2}.$

Moreover, $1 + t/2 \le e^{t/4}$ for $t \ge 3$. This proves that for $t \ge 3$

$\Pr(T_n \ge (2+t)n) \le \left(e^{t/4} \cdot e^{-t/2}\right)^n = e^{-tn/4}.$

Compare this with the bound obtained from Markov's inequality (which was $\frac{2}{2+t}$)!

We can subsume our findings in the following theorem:

Theorem 2. Let $X_1, X_2, \ldots, X_n$ be mutually independent random variables, and assume that each $X_i$ is distributed according to a geometric distribution with $p = 1/2$. Let $T_n = \sum_{i=1}^{n} X_i$. Then $E(T_n) = 2n$, and for any $t \ge 3$:

$\Pr(T_n \ge (2+t)n) \le e^{-tn/4}.$

Corollary 3. Let $c \ge 1$ be a constant and let m be a positive integer. Further, let $n = c \ln m$. Then for any $s \ge 5$ it holds that

$\Pr(T_n \ge sn) \le m^{-(s-2)c/4}.$
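Corollary 3 follows from Theorem 2 by direct substitution; here is the short derivation (our addition), with $t = s - 2$ and $n = c \ln m$:

```latex
% Derivation of Corollary 3 from Theorem 2:
% set t = s - 2 (so s >= 5 gives t >= 3) and n = c ln m.
\begin{align*}
\Pr(T_n \ge sn) &= \Pr\bigl(T_n \ge (2 + (s-2))\,n\bigr) \\
                &\le e^{-(s-2)n/4}          && \text{(Theorem 2 with } t = s-2 \ge 3) \\
                &= e^{-(s-2)\,c \ln(m)/4}   && (n = c \ln m) \\
                &= m^{-(s-2)c/4}.
\end{align*}
```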