An Introduction to Randomized Algorithms

The focus of this lecture is to study a randomized algorithm for quick sort, analyze it using probabilistic recurrence relations, and also provide more general tools for the analysis of randomized algorithms. For a quick overview of probability and the terms associated with it, the reader is advised to see the Appendix.

1 Randomized QuickSort

In this section, we present the classic quick sort algorithm and compute the expected running time of the algorithm. We assume that the elements of the set are all distinct. Below is the randomized quick sort algorithm.

Algorithm RandQuickSort(S)
    Choose a pivot element x_i u.a.r. from S = {x_1, x_2, ..., x_n}
    Split the set S into two subsets S_1 = {x_j | x_j < x_i} and S_2 = {x_j | x_j > x_i} by comparing each x_j with the chosen x_i
    Recurse on the sets S_1 and S_2
    Output the sorted set S_1, then x_i, and then the sorted set S_2.
end Algorithm

Analysis: Let T(n) be the number of steps taken by the RandQuickSort algorithm on a set of size n. Note that the maximum value of T(n) occurs when the pivot element x_i is the largest/smallest element of the remaining set during each recursive call of the algorithm. In this case, T(n) = n + (n−1) + ... + 1 = O(n²). This value of T(n) is reached with a very low probability of (1/n) · (1/(n−1)) · ... · (1/2) = 1/n!. Also, the best case occurs when the pivot element splits the set S into two equal-sized subsets, and then T(n) = O(n ln n). This implies that T(n) has a distribution between O(n ln n) and O(n²).

Now we derive the expected value of T(n). Note that if the i-th smallest element is chosen as the pivot element, then S_1 and S_2 will be of sizes i − 1 and n − i respectively, and this choice has probability 1/n. The recurrence relation for T(n) is:

    T(n) = n + T(X) + T(n − 1 − X)    (1)

where Pr[X = i] = 1/n for i ∈ {0, 1, ..., n−1}. Hence, Pr[n − 1 − X = i] = Pr[X = n − 1 − i] = 1/n. However, it is incorrect to write T(n − 1 − X) = T(X). Taking expectations on both sides of (1),

    E[T(n)] = n + (1/n) Σ_{i=0}^{n−1} E[T(i)] + (1/n) Σ_{j=0}^{n−1} E[T(j)] = n + (2/n) Σ_{i=0}^{n−1} E[T(i)].

Let f(i) = E[T(i)]. Then, f(n) = n + (2/n) Σ_{i=0}^{n−1} f(i).
Simplifying (multiplying both sides by n),

    n f(n) = n² + 2(f(1) + f(2) + ... + f(n−1))    (2)

Substituting n − 1 for n in (2),

    (n−1) f(n−1) = (n−1)² + 2(f(1) + f(2) + ... + f(n−2))    (3)

Subtracting (3) from (2), we get n f(n) − (n−1) f(n−1) = (2n − 1) + 2 f(n−1), or f(n) = ((n+1)/n) f(n−1) + (2n−1)/n ≤ ((n+1)/n) f(n−1) + 2. We prove by induction that f(n) ≤ 2n ln n.
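The RandQuickSort procedure above can be sketched in Python as follows. This is a minimal illustration rather than the lecture's own code: the function name, the list-based splitting, and the use of random.choice for the u.a.r. pivot are our choices.

```python
import random

def rand_quick_sort(s):
    """Sort a list of distinct elements by recursive random pivoting."""
    if len(s) <= 1:
        return s
    pivot = random.choice(s)            # choose x_i u.a.r. from S
    s1 = [x for x in s if x < pivot]    # S_1 = {x_j : x_j < x_i}
    s2 = [x for x in s if x > pivot]    # S_2 = {x_j : x_j > x_i}
    # output sorted S_1, then the pivot, then sorted S_2
    return rand_quick_sort(s1) + [pivot] + rand_quick_sort(s2)
```

Each call builds S_1 and S_2 by comparing every remaining element with the pivot, which is the linear term of the recurrence T(n) = n + T(X) + T(n − 1 − X).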
1.0.1 Claim: f(n) ≤ 2n ln n

Proof: Induction basis: for n = 1, the claim holds. Let the claim hold for all values up to n − 1. Then,

    f(n) ≤ ((n+1)/n) f(n−1) + 2
         ≤ ((n+1)/n) · 2(n−1) ln(n−1) + 2        (by the induction hypothesis)
         = (2(n²−1)/n) ln(n−1) + 2
         = (2(n²−1)/n) (ln n + ln(1 − 1/n)) + 2.

We make use of the standard inequality stated below.

Fact 1.1 1 + x ≤ e^x for all x ∈ ℝ. For example, 1 + 0.2 ≤ e^{0.2} and 1 − 0.2 ≤ e^{−0.2}.

By Fact 1.1, ln(1 − 1/n) ≤ −1/n, and hence

    f(n) ≤ (2(n²−1)/n) (ln n − 1/n) + 2
         = 2n ln n − (2/n) ln n − 2 + 2/n² + 2
         ≤ 2n ln n,

establishing the inductive step.

Hence, the expected running time of the randomized quick sort algorithm is O(n ln n). But one of the limitations of the recurrence relation approach is that we do not know how the running time of the algorithm is spread around its expected value. Can this analysis be extended to answer questions such as: with what probability does the algorithm RandQuickSort need more than 12n ln n time steps? Later on, we apply a different technique and establish that this probability is very small. Similarly, for the case of the Maximum algorithm, by solving the recurrence relation for the expected running time, we will not be able to answer questions of the form: what is the probability that the run time is greater than 3n? Answers to such questions indicate how closely the random variable is spread around its expected value, or how varied its distribution is. To be able to answer such queries, we study tail inequalities in the next section.

2 Tail Inequalities

In this section, we study three ways to estimate the tail probabilities of random variables. It will be noted that the more information we know about the random variable, the better the estimate we can derive for a given tail probability.

2.1 Markov Inequality

Theorem 2.1 (Markov Inequality) If X is a non-negative valued random variable with expectation µ, then Pr[X ≥ cµ] ≤ 1/c.

Proof. By definition,

    µ = Σ_a a · Pr[X = a] = Σ_{a < cµ} a · Pr[X = a] + Σ_{a ≥ cµ} a · Pr[X = a]
      ≥ 0 + Σ_{a ≥ cµ} cµ · Pr[X = a]        (as X is non-negative valued)
      = cµ Σ_{a ≥ cµ} Pr[X = a] = cµ · Pr[X ≥ cµ].

Hence, Pr[X ≥ cµ] ≤ µ/(cµ) = 1/c.

Knowledge of the standard deviation of the random variable X would most often give a better bound.
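As a quick numerical illustration of Theorem 2.1 (a sketch with parameters of our own choosing: the function name, a fair die as the non-negative random variable, and c = 1.5 are all assumptions, not from the notes), one can compare an empirical tail probability against the 1/c bound:

```python
import random

def markov_bound_check(trials=100_000, c=1.5, seed=1):
    """Compare the empirical tail Pr[X >= c*mu] of a fair die roll
    against Markov's bound 1/c; X is non-negative, as the theorem requires."""
    rng = random.Random(seed)
    samples = [rng.randint(1, 6) for _ in range(trials)]
    mu = sum(samples) / trials                            # empirical mean, ~3.5
    tail = sum(1 for x in samples if x >= c * mu) / trials
    return tail, 1.0 / c
```

For a die, c = 1.5 places the threshold near 5.25, so the empirical tail is roughly the probability of rolling a 6, well below the bound 2/3; Markov is loose, but it asks for nothing beyond non-negativity and the mean.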
2.2 Chebyshev Inequality

We first define the terms standard deviation and variance of a random variable X.

Definition 2.2 Let X be a random variable with expectation µ. The variance of X, denoted var(X), is defined as var(X) = E[(X − µ)²]. The standard deviation of X, denoted σ_X, is defined as σ_X = √(var(X)).

Note that by definition, var(X) = E[(X − µ)²] = E[X² − 2Xµ + µ²] = E[X²] − µ². The second equality follows from the linearity of expectation.
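The identity var(X) = E[X²] − µ² can be checked exactly for a small discrete random variable. In this sketch the function name and the fair-die example are our own; exact rational arithmetic avoids any floating-point doubt.

```python
from fractions import Fraction

def die_variance():
    """Check var(X) = E[(X - mu)^2] = E[X^2] - mu^2 for a fair six-sided die."""
    outcomes = range(1, 7)
    p = Fraction(1, 6)
    mu = sum(p * x for x in outcomes)                   # E[X] = 7/2
    e_x2 = sum(p * x * x for x in outcomes)             # E[X^2] = 91/6
    var_direct = sum(p * (x - mu) ** 2 for x in outcomes)
    var_identity = e_x2 - mu ** 2
    return var_direct, var_identity
```

Both expressions evaluate to 35/12, confirming the identity for this example.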
Theorem 2.3 (Chebyshev Inequality) Let X be a random variable with expectation µ_X and standard deviation σ_X. Then, Pr[|X − µ_X| ≥ cσ_X] ≤ 1/c².

Proof. Let the random variable Y = (X − µ_X)². Then E[Y] = E[(X − µ_X)²] = σ_X² by definition, and Y is a non-negative valued random variable. Now, Pr[|X − µ_X| ≥ cσ_X] = Pr[(X − µ_X)² ≥ c²σ_X²] = Pr[Y ≥ c²σ_X²]. Applying the Markov inequality to the random variable Y, Pr[Y ≥ c²σ_X²] = Pr[Y ≥ c²µ_Y] ≤ 1/c². Hence the theorem.

2.3 Chernoff Bounds

The tail estimates given by Theorem 2.1 and Theorem 2.3 work for random variables in general. But if the random variable X can be expressed as a sum of independent random variables, each of which is 0-1 valued, then we can obtain very tight bounds on the tail estimates. This is expressed in the following theorem, and the bounds are commonly called Chernoff bounds.

Theorem 2.4 (Chernoff Bounds) Let X be a random variable defined as X = X_1 + X_2 + ... + X_n, where each X_i, 1 ≤ i ≤ n, is a 0-1 valued random variable and all the X_i are independent. Also, let E[X] = µ and Pr[X_i = 1] = p_i, 1 ≤ i ≤ n. Then, for any δ > 0,

    Pr[X ≥ µ(1 + δ)] ≤ ( e^δ / (1+δ)^{(1+δ)} )^µ.

Proof. The normal strategy employed to prove tail estimates of sums of independent random variables is to make use of exponential moments. While proving the Chebyshev inequality (Theorem 2.3), we made use of the second-order moment. It can be observed that using higher-order moments would generally improve the bound on the tail inequality. But using exponential moments results in a vast improvement in the bound, as we shall see later in some examples. Observe that

    µ = Σ_{i=1}^{n} p_i    (4)

by the definition of X and using linearity of expectation. Let Y_i = e^{t X_i} for some parameter t to be chosen later. Note that the Y_i are also independent, as the X_i are. Define the random variable Y = Y_1 Y_2 ... Y_n. Then,

    E[Y_i] = E[e^{t X_i}] = p_i e^t + (1 − p_i) e^0 = 1 − p_i + p_i e^t    (5)

    E[Y] = E[Y_1 Y_2 ... Y_n] = Π_{i=1}^{n} E[Y_i] = Π_{i=1}^{n} (1 − p_i + p_i e^t)    (6)

where the second equality follows from the independence of the Y_i and the last equality follows from (5).
Now,

    Pr[X ≥ µ(1 + δ)] = Pr[Y ≥ e^{tµ(1+δ)}] ≤ E[Y] / e^{tµ(1+δ)} = Π_{i=1}^{n} (1 − p_i + p_i e^t) / e^{tµ(1+δ)}    (7)

by using the Markov inequality from Theorem 2.1 and equation (6). We now make use of the inequality from Fact 1.1, namely 1 − p_i + p_i e^t = 1 + p_i(e^t − 1) ≤ e^{p_i(e^t − 1)}, in equation (7), which then reduces to

    Pr[X ≥ µ(1 + δ)] ≤ Π_{i=1}^{n} e^{p_i(e^t − 1)} / e^{tµ(1+δ)} = e^{µ(e^t − 1)} / e^{tµ(1+δ)} = e^{µ[(e^t − 1) − t(1+δ)]}    (8)

where the second equality follows from equation (4). In (8), we can choose a value of t that minimizes the probability estimate. To find the minimum, let f(t) = ln e^{µ[(e^t − 1) − t(1+δ)]} = µ(e^t − 1) − tµ(1 + δ). Differentiating f(t) with respect to t and equating it to zero gives us µe^t − µ(1 + δ) = 0, or t = ln(1 + δ). Using this value of t in (8),

    Pr[X ≥ µ(1 + δ)] ≤ e^{µ((1+δ) − 1)} / (1+δ)^{µ(1+δ)} = e^{µδ} / (1+δ)^{µ(1+δ)} = ( e^δ / (1+δ)^{(1+δ)} )^µ.    (9)

That completes the proof.

But the form of the inequality in equation (9) is not very convenient to handle. In addition, this form is hard to invert, i.e., given the probability bound, choose an appropriate δ. Instead, we most often use the following form.
Theorem 2.5 Let X be defined as in Theorem 2.4. Then,

    Pr[X ≥ (1 + δ)µ] ≤ e^{−µδ²/4}    if δ ≤ 1,
    Pr[X ≥ (1 + δ)µ] ≤ e^{−µδ ln δ}    if δ > 1.

2.4 Application of Tail Inequalities

We consider the use of tail inequalities on two problems.

2.4.1 n Balls and n Bins

Consider throwing n balls independently and uniformly at random into n bins. We are interested in the probability that bin 1 contains more than 4 balls. Define 0-1 valued random variables X_i, 1 ≤ i ≤ n, as X_i = 1 if ball i falls into bin 1 and X_i = 0 otherwise. By uniformity, Pr[X_i = 1] = 1/n. Define the random variable X = X_1 + X_2 + ... + X_n. Thus X denotes the number of balls that fall into bin 1. By linearity of expectation, E[X] = E[Σ_i X_i] = Σ_i E[X_i] = n · (1/n) = 1. Using the Markov inequality from Theorem 2.1, we get

    Pr[X ≥ 4] ≤ 1/4    (10)

Before using the Chebyshev inequality (Theorem 2.3), we first compute the standard deviation of X as follows.

    var(X_i) = E[X_i²] − E[X_i]² = 1/n − 1/n²    (11)

    var(X) = Σ_{i=1}^{n} var(X_i) = n (1/n − 1/n²) = 1 − 1/n    (12)

where the first equality in (12) follows from the independence of the X_i. Hence,

    σ_X = √(1 − 1/n)    (13)

Applying the Chebyshev inequality to the random variable X,

    Pr[X ≥ 4] = Pr[X − 1 ≥ 3] ≤ Pr[|X − 1| ≥ 3] ≤ (1 − 1/n)/9 ≤ 1/9    (14)

Using Chernoff bounds from Theorem 2.4,

    Pr[X ≥ 4] = Pr[X ≥ (1 + 3) · 1] ≤ e^{−1 · 3 ln 3} = 1/27    (15)

where we used the simpler form of Chernoff bounds as stated in Theorem 2.5. Now, comparing equations (10), (14) and (15), we can see that the Chebyshev inequality gives a better bound than the Markov inequality, and a much better bound is obtained using Chernoff bounds.

As another example, let us consider the probability that bin 1 has more than 1 + 10 ln n balls. Using the Markov inequality, we get Pr[X ≥ 1 + 10 ln n] ≤ 1/(1 + 10 ln n). Using the Chebyshev inequality, we get Pr[X ≥ 1 + 10 ln n] ≤ Pr[|X − 1| ≥ 10 ln n] ≤ (1 − 1/n)/(100 ln² n) ≤ 1/(100 ln² n). Whereas, using Chernoff bounds, we get Pr[X ≥ 1 + 10 ln n] ≤ e^{−10 ln n · ln(10 ln n)} = (10 ln n)^{−10 ln n} ≪ 1/n. With a tighter analysis using Chernoff bounds, it can be shown that for the case of n balls and n bins, the probability that bin 1 has more than c ln n / ln ln n balls, for some constant c, is very small, say 1/n².
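A small simulation can be set alongside the three bounds (10), (14) and (15). This is a hypothetical harness, not part of the notes; the function name and the parameter choices (n, the number of trials, and the seed) are arbitrary assumptions.

```python
import math
import random

def bin1_tail_estimate(n=200, trials=5_000, seed=7):
    """Throw n balls u.a.r. into n bins; estimate Pr[bin 1 gets >= 4 balls]
    and return it with the Markov, Chebyshev, and Chernoff bounds (mu = 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        balls_in_bin1 = sum(1 for _ in range(n) if rng.randrange(n) == 0)
        if balls_in_bin1 >= 4:
            hits += 1
    empirical = hits / trials
    markov = 1 / 4                               # (10): Pr[X >= 4] <= 1/4
    chebyshev = (1 - 1 / n) / 9                  # (14): Pr[|X - 1| >= 3] <= (1 - 1/n)/9
    chernoff = math.exp(-3 * math.log(3))        # (15): e^{-mu*delta*ln(delta)} = 1/27
    return empirical, markov, chebyshev, chernoff
```

In a typical run the empirical tail is on the order of a couple of percent, so even the Chernoff bound 1/27 ≈ 0.037, the tightest of the three, is still conservative.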
2.4.2 n ln n Balls and n Bins

In this scenario we have n ln n balls and n bins, where balls are thrown into the bins independently and uniformly at random. Define the random variables X_i, 1 ≤ i ≤ n ln n, and X as above. We have Pr[X_i = 1] = 1/n and E[X] = ln n. Let us estimate the probability that bin 1 has more than 10 ln n balls. Using the Markov inequality, we get that Pr[X ≥ 10 ln n] ≤ 1/10. Instead, using Chernoff bounds, we get Pr[X ≥ 10 ln n] = Pr[X ≥ (1 + 9) ln n] ≤ e^{−9 ln n · ln 9} ≈ 1/n^{20}, which is exponentially small. In general, it can be observed that if the expectation of the random variable is small, then we pay a higher penalty to derive a w.h.p. bound.

A Probability Theory

We start by defining probability and then introduce some well-known inequalities that we often use. Let Ω be an arbitrary set, called the sample space. We start by defining a σ-field, also sometimes called a σ-algebra.

Definition A.1 (σ-field) A collection F of subsets of Ω is called a σ-field if it satisfies:
1. Ω ∈ F,
2. A ∈ F implies A^c ∈ F, and
3. for any countable sequence A_1, A_2, ..., if A_1, A_2, ... ∈ F then A_1 ∪ A_2 ∪ ... ∈ F.

Definition A.2 A set function Pr on a σ-field F of subsets of Ω such that Pr : F → [0, 1] is called a probability measure if it satisfies:
1. 0 ≤ Pr(A) ≤ 1 for all A ∈ F,
2. Pr(Ω) = 1, and
3. if A_1, A_2, ... is a disjoint sequence of sets in F, then Pr(∪_i A_i) = Σ_i Pr(A_i).

The triple (Ω, F, Pr) is often called a probability space. For equivalent and alternate definitions, examples, and a more complete introduction, we refer the reader to standard textbooks on probability [?]. In the following, if no probability space is mentioned, then any space (Ω, F, Pr) can be taken. We often use the following inequality, called Boole's inequality, which is part of the more general Boole-Bonferroni inequalities [?]; it is also sometimes referred to as the union bound, as it provides a bound on the probability of a union of events, and as the (finite) sub-additivity property of the probability measure.
Proposition A.3 (Boole's inequality) For any arbitrary events A_1, A_2, ..., A_n,

    Pr(∪_{i=1}^{n} A_i) ≤ Σ_{i=1}^{n} Pr(A_i)

The notion of independence is an important concept in the study of probability.

Definition A.4 (Independence) A collection of events {A_i : i ∈ I} is said to be independent if for all S ⊆ I, Pr(∩_{i∈S} A_i) = Π_{i∈S} Pr(A_i).

We now define a random variable, which is any measurable function from Ω to ℝ. Let R denote the standard Borel σ-field associated with ℝ, which is the σ-field generated by the left-open intervals of ℝ [?].

Definition A.5 (Random Variable) Given a probability space (Ω, F, Pr), a mapping X : Ω → ℝ is called a random variable if it satisfies the condition that X^{−1}(R) ∈ F for every R ∈ R.
We write {X ≤ x} for the set {ω ∈ Ω | X(ω) ≤ x}, for x ∈ ℝ, and also write Pr(X ≤ x) for the probability of the above event. A similar definition can be made representing the set {ω ∈ Ω | X(ω) = x} as {X = x}. The notion of independence also extends to random variables. Two random variables X and Y are said to be independent if the events {X ≤ x} and {Y ≤ y} are independent for all x, y ∈ ℝ. The definition extends to multiple random variables just as in Definition A.4. Associated with any random variable is a distribution function, defined as follows.

Definition A.6 (Distribution function) The distribution function F_X : ℝ → [0, 1] of a random variable X is defined as F_X(x) = Pr(X ≤ x).

A random variable X is said to be a discrete random variable if the range of X is a finite or countably infinite subset of ℝ. For discrete random variables, the following definition can be given for the density of a random variable.

Definition A.7 (Density) Given a random variable X, the density function f_X : ℝ → [0, 1] of X is defined as f_X(x) = Pr(X = x).

The above definition can be extended to all types of random variables with proper care. In the rest of this section, we focus on discrete random variables only, and hence the definitions are made for the case of discrete random variables. With proper care, however, the definitions can be extended [?]. An important quantity of interest for a random variable is its expectation.

Definition A.8 (Expectation) Given a probability space (Ω, F, Pr) and a random variable X, the expectation of X, denoted E[X], is defined as E[X] = Σ_{x ∈ ℝ} x · Pr[X = x], with the convention that 0 · ∞ = ∞ · 0 = 0.
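Definition A.8 specializes directly to finite discrete random variables. The helper below (a sketch with names of our own choosing; not part of the notes) computes E[X] from a {value: probability} table, using exact rationals:

```python
from fractions import Fraction

def expectation(density):
    """E[X] = sum over x of x * Pr[X = x], per Definition A.8,
    for a discrete random variable given as a {value: probability} table."""
    assert sum(density.values()) == 1    # the density must sum to 1
    return sum(x * p for x, p in density.items())

# Example: a fair six-sided die has E[X] = 7/2.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
```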