Lecture 6: Coupon Collector's problem

Randomized Algorithms
Lecture 6: Coupon Collector's problem
Sotiris Nikoletseas, Professor
CEID - ETY Course, 2017-2018

Variance: key features

Definition: Var(X) = E[(X − µ)²] = Σ_x (x − µ)² · Pr{X = x}, where µ = E[X] = Σ_x x · Pr{X = x}.
We call standard deviation of X the quantity σ = sqrt(Var(X)).

Basic properties:
(i) Var(X) = E[X²] − E²[X]
(ii) Var(cX) = c² · Var(X), where c is a constant.
(iii) Var(X + c) = Var(X), where c is a constant.

Proof of (i):
Var(X) = E[(X − µ)²] = E[X² − 2µX + µ²] = E[X²] + E[−2µX] + E[µ²] = E[X²] − 2µE[X] + µ² = E[X²] − µ²
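
The defining identity and property (i) are easy to check numerically. Below is a minimal Python sketch (not part of the original slides) that compares the two expressions for the variance of a fair six-sided die; the pmf is only an illustrative choice.

```python
# Minimal sketch (illustrative, not from the slides): check property (i),
# Var(X) = E[X^2] - E^2[X], against the definition for a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}                      # Pr{X = x}

mu = sum(x * p for x, p in pmf.items())                    # E[X]
ex2 = sum(x * x * p for x, p in pmf.items())               # E[X^2]
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())   # E[(X - mu)^2]
var_i = ex2 - mu ** 2                                      # property (i)

print(mu, var_def, var_i)   # 3.5  2.9166...  2.9166...
```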

On the Additivity of Variance

In general the variance of a sum of random variables is not equal to the sum of their variances. However, variances do add for independent variables (i.e. mutually independent variables). Actually, pairwise independence suffices.

Conditional distributions

Let X, Y be discrete random variables. Their joint probability density function is f(x, y) = Pr{(X = x) ∩ (Y = y)}.
Clearly f_1(x) = Pr{X = x} = Σ_y f(x, y) and f_2(y) = Pr{Y = y} = Σ_x f(x, y).
Also, the conditional probability density function is:
f(x | y) = Pr{X = x | Y = y} = Pr{(X = x) ∩ (Y = y)} / Pr{Y = y} = f(x, y) / f_2(y) = f(x, y) / Σ_x f(x, y)
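
As a concrete illustration, the sketch below (not from the slides; the joint table is made up) computes the marginals f_1, f_2 and the conditional f(x | y) exactly as defined above.

```python
# Minimal sketch (the joint table is a made-up example): marginals and the
# conditional pmf f(x | y) = f(x, y) / f_2(y) for two discrete variables.
from collections import defaultdict

f = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}   # f(x, y) = Pr{X=x, Y=y}

f1 = defaultdict(float)   # f_1(x) = sum over y of f(x, y)
f2 = defaultdict(float)   # f_2(y) = sum over x of f(x, y)
for (x, y), p in f.items():
    f1[x] += p
    f2[y] += p

def cond(x, y):
    """f(x | y) = Pr{X = x | Y = y}."""
    return f[(x, y)] / f2[y]

print(dict(f1))      # {0: 0.3, 1: 0.7}
print(dict(f2))      # {0: 0.4, 1: 0.6}
print(cond(0, 1))    # 0.2 / 0.6 = 0.333...
```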

Pairwise independence

Let X_1, X_2, ..., X_n be random variables. These are called pairwise independent iff for all i ≠ j it is
Pr{X_i = x | X_j = y} = Pr{X_i = x}, for all x, y.
Equivalently, Pr{(X_i = x) ∩ (X_j = y)} = Pr{X_i = x} · Pr{X_j = y}, for all x, y.
Generalizing, the collection is k-wise independent iff, for every subset I ⊆ {1, 2, ..., n} with |I| < k, for every set of values {a_i}, every b, and every j ∉ I, it is
Pr{X_j = b | ∩_{i∈I} (X_i = a_i)} = Pr{X_j = b}

Mutual (or "full") independence

The random variables X_1, X_2, ..., X_n are mutually independent iff for any subset X_{i_1}, X_{i_2}, ..., X_{i_k} (2 ≤ k ≤ n) of them, it is
Pr{(X_{i_1} = x_1) ∩ (X_{i_2} = x_2) ∩ ... ∩ (X_{i_k} = x_k)} = Pr{X_{i_1} = x_1} · Pr{X_{i_2} = x_2} ··· Pr{X_{i_k} = x_k}

Example (for n = 3). Let A_1, A_2, A_3 be 3 events. They are mutually independent iff all four equalities hold:
Pr{A_1 ∩ A_2} = Pr{A_1} · Pr{A_2}   (1)
Pr{A_2 ∩ A_3} = Pr{A_2} · Pr{A_3}   (2)
Pr{A_1 ∩ A_3} = Pr{A_1} · Pr{A_3}   (3)
Pr{A_1 ∩ A_2 ∩ A_3} = Pr{A_1} · Pr{A_2} · Pr{A_3}   (4)
They are called pairwise independent if (1), (2), (3) hold.
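
The standard construction below (a Python sketch, not from the slides) makes the distinction concrete: two independent fair bits and their XOR satisfy equalities (1)-(3) but violate (4), so they are pairwise but not mutually independent.

```python
# Minimal sketch (standard textbook construction, not from the slides):
# X1, X2 independent fair bits, X3 = X1 XOR X2, and events A_i = {X_i = 0}.
# Equalities (1)-(3) hold, equality (4) fails.
from itertools import product

outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]  # each w.p. 1/4

def pr(event):
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

A = [lambda o, i=i: o[i] == 0 for i in range(3)]

for i, j in ((0, 1), (1, 2), (0, 2)):            # checks (1), (2), (3)
    print(pr(lambda o: A[i](o) and A[j](o)), "=", pr(A[i]) * pr(A[j]))   # 0.25 = 0.25

# check (4): joint probability is 0.25, but the product of the three is 0.125
print(pr(lambda o: all(a(o) for a in A)), "!=", pr(A[0]) * pr(A[1]) * pr(A[2]))
```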

The Coupon Collector's problem

There are n distinct coupons and at each trial a coupon is chosen uniformly at random, independently of previous trials. Let m be the number of trials.
Goal: establish relationships between the number m of trials and the probability of having chosen each one of the n coupons at least once.
Note: the problem is similar to occupancy (number of balls needed so that no bin is empty).
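
A direct simulation of the process is a handy reference for the estimates that follow. The sketch below (Python, not from the slides; collect is just an illustrative helper name) draws coupons uniformly until all n types have appeared.

```python
# Minimal sketch (illustrative, not from the slides): simulate one run of the
# coupon collector process and count the trials needed to see all n coupons.
import random

def collect(n):
    """Number of uniform draws from {0, ..., n-1} until every coupon type is seen."""
    seen, trials = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        trials += 1
    return trials

print(collect(100))   # typically around 100 * ln(100), i.e. roughly 460
```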

The expected number of trials needed (I)

Let X be the number of trials (a random variable) needed to collect all n coupons at least once each. Let C_1, C_2, ..., C_X be the sequence of trials, where C_i ∈ {1, ..., n} denotes the coupon type chosen at trial i.
We call the i-th trial a success if the coupon type chosen at C_i was not drawn in any of the first i − 1 trials (obviously C_1 and C_X are always successes).
We divide the sequence of trials into epochs, where epoch i begins with the trial following the i-th success and ends with the trial at which the (i + 1)-st success takes place.
Let the r.v. X_i (0 ≤ i ≤ n − 1) be the number of trials in the i-th epoch.
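
To make the epoch decomposition concrete, the sketch below (not from the slides) records the epoch lengths X_0, ..., X_{n−1} of one simulated run; their sum is the total number of trials X.

```python
# Minimal sketch (illustrative, not from the slides): split one run into epochs,
# where an epoch ends at the trial on which a new coupon type first appears.
import random

def epoch_lengths(n):
    seen, lengths, count = set(), [], 0
    while len(seen) < n:
        count += 1
        c = random.randrange(n)
        if c not in seen:          # a success ends the current epoch
            seen.add(c)
            lengths.append(count)  # lengths[i] is X_i
            count = 0
    return lengths

lengths = epoch_lengths(100)
print(len(lengths), sum(lengths))  # n epochs; the sum is the total number of trials X
```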

The expected number of trials needed (II)

Clearly, X = Σ_{i=0}^{n−1} X_i.
Let p_i be the probability of success at any trial of the i-th epoch. This is the probability of choosing one of the n − i remaining coupon types, so: p_i = (n − i)/n.
Clearly, X_i follows a geometric distribution with parameter p_i, so E[X_i] = 1/p_i and Var(X_i) = (1 − p_i)/p_i².
By linearity of expectation:
E[X] = E[Σ_{i=0}^{n−1} X_i] = Σ_{i=0}^{n−1} E[X_i] = Σ_{i=0}^{n−1} n/(n − i) = n · Σ_{i=1}^{n} 1/i = n · H_n
But H_n = ln n + Θ(1), so E[X] = n ln n + Θ(n).
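
The exact value n · H_n and the approximation n ln n are easy to tabulate; the short sketch below (not from the slides) shows that their gap grows only linearly in n.

```python
# Minimal sketch (illustrative, not from the slides): exact E[X] = n * H_n
# versus the approximation n ln n; the difference is Theta(n).
import math

for n in (10, 100, 1000, 10000):
    h_n = sum(1 / i for i in range(1, n + 1))
    print(n, n * h_n, n * math.log(n))
```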

The variance of the number of needed trials

Since the X_i's are independent, we have:
Var(X) = Σ_{i=0}^{n−1} Var(X_i) = Σ_{i=0}^{n−1} (1 − p_i)/p_i² = Σ_{i=0}^{n−1} n·i/(n − i)² = Σ_{i=1}^{n} n(n − i)/i² ≤ n² · Σ_{i=1}^{n} 1/i²
Since lim_{n→∞} Σ_{i=1}^{n} 1/i² = π²/6, we get Var(X) ≈ (π²/6) · n².

Concentration around the expectation
The Chebyshev inequality does not provide a strong result. For β > 1:
Pr{X > βn ln n} = Pr{X − n ln n > (β − 1) n ln n} ≤ Pr{|X − n ln n| > (β − 1) n ln n} ≤ Var(X) / ((β − 1)² n² ln² n) = O(n² / (n² ln² n)) = O(1/ln² n)
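
The numbers behind this observation can be computed directly. The sketch below (not from the slides) evaluates the exact variance sum, its asymptotic value (π²/6)n², and the resulting Chebyshev bound for β = 2; the bound is only of order 1/ln² n.

```python
# Minimal sketch (illustrative, not from the slides): exact variance sum,
# its asymptotic value (pi^2/6) n^2, and the weak Chebyshev tail bound.
import math

n, beta = 1000, 2.0
p = [(n - i) / n for i in range(n)]                 # p_i = (n - i)/n
var_exact = sum((1 - pi) / pi ** 2 for pi in p)     # sum of Var(X_i)
var_asym = math.pi ** 2 / 6 * n * n

cheb = var_exact / ((beta - 1) ** 2 * (n * math.log(n)) ** 2)
print(var_exact, var_asym, cheb)   # the Chebyshev bound is only about 0.03 here
```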

Stronger concentration around the expectation

Let E_i^r be the event "coupon type i is not collected during the first r trials". Then
Pr{E_i^r} = (1 − 1/n)^r ≤ e^{−r/n}
For r = βn ln n we get Pr{E_i^r} ≤ e^{−β ln n} = n^{−β}.
By the union bound we have Pr{X > r} = Pr{∪_{i=1}^{n} E_i^r} (i.e. at least one coupon is not selected), so
Pr{X > r} ≤ Σ_{i=1}^{n} Pr{E_i^r} ≤ n · n^{−β} = n^{−(β−1)} = n^{−ε}, where ε = β − 1 > 0.
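
For concreteness, the sketch below (not from the slides) evaluates the three quantities of this argument for n = 1000 and β = 2: the exact Pr{E_i^r}, its upper bound n^{−β}, and the union bound n^{−(β−1)}.

```python
# Minimal sketch (illustrative, not from the slides): the quantities in the
# union-bound argument for n = 1000, beta = 2, r = beta * n * ln n.
import math

n, beta = 1000, 2.0
r = beta * n * math.log(n)

p_single = (1 - 1 / n) ** r        # Pr{E_i^r} = (1 - 1/n)^r
p_bound = math.exp(-r / n)         # e^{-r/n} = n^{-beta} = 1e-6
union_bound = n * p_bound          # n^{-(beta - 1)} = 1e-3

print(p_single, p_bound, union_bound)
```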

Sharper concentration around the mean - a heuristic argument

Binomial distribution (# of successes in n independent trials, each one with success probability p): X ~ B(n, p)
Pr{X = k} = C(n, k) · p^k · (1 − p)^{n−k}   (k = 0, 1, 2, ..., n)
E(X) = np, Var(X) = np(1 − p)

Poisson distribution: X ~ P(λ)
Pr{X = x} = e^{−λ} · λ^x / x!   (x = 0, 1, ...)
E(X) = Var(X) = λ

Approximation: It is B(n, p) ≈ P(λ), where λ = np. For large n, the approximation of the binomial by the Poisson is good.
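
The quality of the approximation is easy to inspect numerically. The sketch below (not from the slides; the parameter choices are only illustrative) prints B(n, p) and P(np) side by side for a small p.

```python
# Minimal sketch (illustrative, not from the slides): compare the binomial
# B(n, p) with the Poisson P(lambda = n p) pointwise, for small p.
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p = 500, 0.01                     # lambda = n p = 5
for k in range(10):
    print(k, round(binom_pmf(n, p, k), 5), round(poisson_pmf(n * p, k), 5))
```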

Towards the sharp concentration result

Let N_i^r = the number of times coupon i is chosen during the first r trials. Then E_i^r is equivalent to the event {N_i^r = 0}.
Clearly N_i^r ~ B(r, 1/n), thus Pr{N_i^r = x} = C(r, x) · (1/n)^x · (1 − 1/n)^{r−x}.
Let λ be a positive real number. A r.v. Y is P(λ) iff Pr{Y = y} = e^{−λ} · λ^y / y!.
As said, for suitably small λ = r/n and as n grows large, P(r/n) is a good approximation of B(r, 1/n). Thus
Pr{E_i^r} = Pr{N_i^r = 0} ≈ e^{−λ} · λ^0 / 0! = e^{−λ} = e^{−r/n}   (fact 1)

An informal argument on independence

We will now claim that the events E_i^r (1 ≤ i ≤ n) are almost independent (although it is obvious that there is some dependence between them; but we are anyway heading towards a heuristic).

Claim 1. For 1 ≤ i ≤ n, and any set of indices {j_1, ..., j_k} not containing i,
Pr{E_i^r | ∩_{l=1}^{k} E_{j_l}^r} ≈ Pr{E_i^r}

Proof:
Pr{E_i^r | ∩_{l=1}^{k} E_{j_l}^r} = Pr{E_i^r ∩ (∩_{l=1}^{k} E_{j_l}^r)} / Pr{∩_{l=1}^{k} E_{j_l}^r} = (1 − (k+1)/n)^r / (1 − k/n)^r ≈ e^{−r(k+1)/n} / e^{−rk/n} = e^{−r/n} ≈ Pr{E_i^r}

An approximation of the probability

Because of fact 1 and Claim 1, we have:
Pr{∩_{i=1}^{n} ¬E_i^m} ≈ (1 − e^{−m/n})^n ≈ e^{−n·e^{−m/n}}
For m = n(ln n + c) = n ln n + cn, for any constant c ∈ R, we then get
Pr{X > m = n ln n + cn} = Pr{∪_{i=1}^{n} E_i^m} ≈ 1 − e^{−n·e^{−m/n}} = 1 − e^{−e^{−c}}
The above probability:
- is close to 0, for large positive c
- is close to 1, for large negative c
Thus the probability of having collected all n coupons rapidly changes from nearly 0 to almost 1 in a small interval centered around n ln n (!)

The rigorous result

Theorem: Let X be the r.v. counting the number of trials for having collected each one of the n coupons at least once. Then, for any constant c ∈ R and m = n(ln n + c), it is
lim_{n→∞} Pr{X > m} = 1 − e^{−e^{−c}}

Note 1. The proof uses the Boole-Bonferroni inequalities for inclusion-exclusion in the probability of a union of events.
Note 2. The power of the Poisson heuristic is that it gives a quick, approximative estimation of probabilities and offers some intuitive insight towards the accurate behaviour of the involved quantities.
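
A quick simulation (Python sketch, not from the slides; n = 200 is far from the limit, so agreement is only approximate) compares the empirical tail probability with the limit 1 − e^{−e^{−c}} of the theorem.

```python
# Minimal sketch (illustrative, not from the slides): empirical Pr{X > n(ln n + c)}
# versus the limiting value 1 - e^{-e^{-c}} from the theorem.
import math
import random

def collect(n):
    """Same simulation helper as sketched earlier: trials to collect all n coupons."""
    seen, trials = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        trials += 1
    return trials

n, runs = 200, 3000
samples = [collect(n) for _ in range(runs)]
for c in (-2, -1, 0, 1, 2):
    m = n * (math.log(n) + c)
    empirical = sum(x > m for x in samples) / runs
    limit = 1 - math.exp(-math.exp(-c))
    print(f"c={c:+d}  empirical={empirical:.3f}  limit={limit:.3f}")
```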