
CS271 Randomness & Computation, Spring 2018
Instructor: Alistair Sinclair

Lecture 14: March 1

Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They may be distributed outside this class only with the permission of the Instructor.

So far we have talked about the expectation of random variables, and moved on to the variance or the second moment. However, neither of these methods is strong enough to produce the tight bounds needed for some applications. We have already stated that if all you are given is the variance of a r.v., then Chebyshev's Inequality is essentially the best you can do. Thus, more information is needed to achieve substantially tighter bounds. We begin with the case in which the r.v. is the sum of a sequence of independent r.v.'s; this case arises very frequently in applications.

Chernoff/Hoeffding Bounds

These methods are usually referred to as "Chernoff bounds", but should probably be referred to as "Hoeffding bounds". Chernoff established the bounds for the domain of independent coin flips [C52], but it was Hoeffding who extended them to the general case [H63]. The method of proof used here is due to Bernstein.

Let $X_1, \ldots, X_n$ be i.i.d. random variables with $E[X_i] = \mu_i$ and $\mathrm{Var}[X_i] = \sigma_i^2$, with all $\mu_i$ and all $\sigma_i$ equal. Let $X = \sum_{i=1}^n X_i$. What can we say about the distribution of $X$?

For illustration, suppose each $X_i$ is a flip of the same (possibly biased) coin. Identifying Heads with 1 and Tails with 0, we are interested in the distribution of the total number of heads, $X$, in $n$ flips.

Let
$$\mu = E[X] = \sum_{i=1}^n \mu_i$$
by linearity of expectation. Let
$$\sigma^2 = \mathrm{Var}[X] = \sum_{i=1}^n \sigma_i^2$$
by linearity of the variance of independent random variables.

The Central Limit Theorem states that as $n \to \infty$, $\frac{X - \mu}{\sigma}$ approaches a standard normal distribution $N(0, 1)$. Thus, as $n \to \infty$, for any fixed $\beta > 0$ we have
$$\Pr[|X - \mu| > \beta\sigma] \to \frac{2}{\sqrt{2\pi}} \int_\beta^\infty e^{-t^2/2}\,dt \approx \frac{2}{\sqrt{2\pi}\,\beta}\, e^{-\beta^2/2}.$$
Note that the probability of a $\beta$-deviation from the mean (in units of the standard deviation $\sigma$) decays exponentially with $\beta$.
Unfortunately, there are two deficiencies in this bound for typical combinatorial and computational applications:

- The above result is asymptotic only, and says nothing about the rate of convergence (i.e., the behavior for finite $n$).
- The above result applies only to deviations on the order of the standard deviation, while we would like to be able to talk about arbitrary deviations.
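To make the first deficiency concrete, the following is a minimal sketch (not part of the original notes) that compares the exact two-sided tail of a Binomial(n, p) variable with the limiting normal tail for a few finite values of n. The function names and the choice p = 1/2, β = 2 are illustrative assumptions.

```python
import math

def binom_two_sided_tail(n, p, beta):
    """Exact Pr[|X - mu| > beta*sigma] for X ~ Binomial(n, p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k - mu) > beta * sigma
    )

def normal_two_sided_tail(beta):
    """Limiting value Pr[|Z| > beta] for Z ~ N(0, 1)."""
    return math.erfc(beta / math.sqrt(2))

if __name__ == "__main__":
    beta = 2.0  # illustrative choice, not from the notes
    for n in (10, 100, 1000):
        print(n, binom_two_sided_tail(n, 0.5, beta), normal_two_sided_tail(beta))
```

The gap between the two columns at small n is exactly what the CLT statement does not quantify.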

Chernoff/Hoeffding bounds deal with both of these deficiencies. Here is the most general (though not often used) form of the bounds:

Theorem 14.1 Let $X_1, \ldots, X_n$ be independent 0-1 r.v.'s with $E[X_i] = p_i$ (not necessarily equal). Let $X = \sum_{i=1}^n X_i$, $\mu = E[X] = \sum_{i=1}^n p_i$, and $p = \mu/n$ (the average of the $p_i$).

1. $\Pr[X \ge \mu + \lambda] \le \exp\{-n H_p(p + \lambda/n)\}$ for $0 < \lambda < n - \mu$;
2. $\Pr[X \le \mu - \lambda] \le \exp\{-n H_{1-p}(1 - p + \lambda/n)\}$ for $0 < \lambda < \mu$,

where $H_p(x) = x \ln\left(\frac{x}{p}\right) + (1 - x) \ln\left(\frac{1 - x}{1 - p}\right)$ is the relative entropy of $x$ with respect to $p$.

The theorem gives bounds on the probabilities of deviating by $\lambda$ above or below the mean. Note that the relative entropy is always non-negative (and is zero iff $x = p$). The relative entropy $H_p(x)$ can be viewed as a measure of distance between the two-point distributions $(p, 1 - p)$ and $(x, 1 - x)$. Thus the quantity appearing in the exponent of the tail probability in the theorem can be interpreted as the distance between the true two-point distribution $(p, 1 - p)$ and the distribution $(p + \lambda/n, 1 - p - \lambda/n)$, in which the mean has been shifted to the tail value $p + \lambda/n$.

Proof: The bounds in 1 and 2 are symmetrical (replace $X$ by $n - X$ to get 2 from 1). Therefore, we need only prove 1.

Let $m = \mu + \lambda > \mu$. We are trying to estimate $\Pr[X \ge m]$. The general steps we will use to obtain the bound (and which can be used in other, similar scenarios) are as follows:

- Exponentiate both sides of the inequality $X \ge m$, with a free parameter $t$. (This is like taking a Laplace transform.)
- Apply Markov's Inequality.
- Use the independence of the $X_i$ to transform the expectation of a sum into a product of expectations.
- Plug in specific information about the distribution of each $X_i$ to evaluate the expectations.
- Use concavity to bound the resulting product as a function of $p$ rather than of the individual $p_i$.
- Minimize the final bound as a function of $t$, using basic calculus.
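Here is the derivation:

$$\begin{aligned}
\Pr[X \ge m] &= \Pr[e^{tX} \ge e^{tm}] && \text{for any } t > 0 \text{ (to be chosen later)} \\
&\le e^{-tm}\, E[e^{tX}] && \text{by Markov's inequality} \\
&= e^{-tm} \prod_{i=1}^n E[e^{tX_i}] && \text{by independence of the } X_i \\
&= e^{-tm} \prod_{i=1}^n (e^t p_i + 1 - p_i) && \text{from the distributions of the } X_i \quad (14.1) \\
&\le e^{-tm} (e^t p + 1 - p)^n && \text{by the Arithmetic Mean/Geometric Mean inequality}
\end{aligned}$$

More generally, we can view the last step as an application of concavity of the function $\gamma(z) = \ln((e^t - 1)z + 1)$.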
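The chain of inequalities above can be checked numerically. The sketch below (an illustration, not part of the original notes) computes the exact tail of a sum of independent Bernoulli variables by dynamic programming and compares it with the product bound from step (14.1) and with the averaged bound obtained after the AM/GM step. The function names and the sample values of the $p_i$, $m$ and $t$ are assumptions chosen for the example.

```python
import math

def exact_upper_tail(ps, m):
    """Exact Pr[X >= m] for X a sum of independent Bernoulli(p_i) variables."""
    dist = [1.0]  # dist[k] = Pr[partial sum == k]
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)
            new[k + 1] += q * p
        dist = new
    return sum(dist[m:])

def product_bound(ps, m, t):
    """e^{-tm} * prod_i (e^t p_i + 1 - p_i): the bound at step (14.1)."""
    return math.exp(-t * m) * math.prod(math.exp(t) * p + 1 - p for p in ps)

def averaged_bound(ps, m, t):
    """e^{-tm} * (e^t p + 1 - p)^n, with p the average of the p_i."""
    n = len(ps)
    p = sum(ps) / n
    return math.exp(-t * m) * (math.exp(t) * p + 1 - p) ** n

if __name__ == "__main__":
    ps = [0.1, 0.3, 0.5, 0.7] * 5  # illustrative: n = 20, mu = 8
    m = 12
    print("exact tail:", exact_upper_tail(ps, m))
    for t in (0.25, 0.5, 1.0):
        print(t, product_bound(ps, m, t), averaged_bound(ps, m, t))
```

Every choice of $t > 0$ gives a valid bound; the remaining step of the proof is only about picking the best one.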

Finally, we need to optimize this bound over $t$. Rewriting the final expression above as $\exp\{n \ln(p e^t + (1 - p)) - tm\}$ and differentiating w.r.t. $t$, we find that the minimum is attained when $e^t = \frac{m(1 - p)}{(n - m)p}$ (and note that this is indeed $> 1$, so $t > 0$ as required). Plugging this value in, we find that
$$\Pr[X \ge m] \le \exp\left\{ n \ln\left( \frac{m(1 - p)}{n - m} + 1 - p \right) - m \ln\left( \frac{m(1 - p)}{(n - m)p} \right) \right\}.$$
At this point, we need only massage this formula into the right form to get our result. It equals
$$\begin{aligned}
&\exp\left\{ n \ln\left( \frac{1 - p}{1 - m/n} \right) - m \ln\left( \frac{(1 - p)\, m/n}{(1 - m/n)\, p} \right) \right\} \\
&\quad = \exp\left\{ n \left( \left(1 - \frac{m}{n}\right) \ln\left( \frac{1 - p}{1 - m/n} \right) + \frac{m}{n} \ln\left( \frac{p}{m/n} \right) \right) \right\} \\
&\quad = \exp\left\{ -n H_p\left(p + \frac{\lambda}{n}\right) \right\}
\end{aligned}$$
by the definition of entropy and the substitution $m = \mu + \lambda$, which is equivalent to $\frac{m}{n} = \frac{\mu + \lambda}{n} = p + \frac{\lambda}{n}$. This proves part 1 of Theorem 14.1, and hence also part 2 by symmetry.

Note that the above theorem and proof apply equally to independent r.v.'s $X_i$ taking any values in the interval $[0, 1]$ (not just the values 0 and 1). The only step in the proof where we used the distributions of the $X_i$ was step (14.1), and one can check that this still holds (with $=$ replaced by $\le$) for any $X_i$ on $[0, 1]$ (i.e., 0-1 is the worst case, for given mean $p_i$). Similarly, an analogous bound holds when the $X_i$ live on any bounded intervals $[a_i, b_i]$ (and then the quantities $|a_i - b_i|$ will appear in the bound: exercise). In fact, one can obtain similar bounds even when the $X_i$ are unbounded, provided their distributions fall off quickly enough, as in the case (e.g.) of a geometric random variable.

Some Corollaries

We now give some more useful versions of the Chernoff bound, all obtained from the above basic form.

Corollary 14.2
$$\left.\begin{array}{l} \Pr[X \le \mu - \lambda] \\ \Pr[X \ge \mu + \lambda] \end{array}\right\} \le \exp\left( -\frac{2\lambda^2}{n} \right).$$

Proof: (Sketch.) We consider only the upper tail; the lower tail follows by symmetry. Writing $z = \lambda/n$, and comparing the exponent in part 1 of Theorem 14.1 with our target value $\frac{2\lambda^2}{n}$, we need to show that
$$f(z) \equiv (p + z) \ln\left( \frac{p + z}{p} \right) + (1 - p - z) \ln\left( \frac{1 - p - z}{1 - p} \right) - 2z^2 \ge 0$$
on the interval $0 \le z \le 1 - p$. To do this we note that $f(0) = 0$, so it is enough to show that $f(z)$ is non-decreasing on this interval. This can be verified by looking at the first and second derivatives of $f$.
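The details are left as an exercise.

We now prove the following alternative corollary, due to Angluin and Valiant [AV79], which is never much worse and is much sharper in cases where $\mu \ll n$ (e.g., $\mu = \Theta(\log n)$).

Corollary 14.3 For $0 < \beta < 1$ we have
$$\Pr[X \le (1 - \beta)\mu] \le \exp\{-\mu(\beta + (1 - \beta) \ln(1 - \beta))\} \le \exp\left( -\frac{\beta^2 \mu}{2} \right).$$
And for $\beta > 0$ we have
$$\Pr[X \ge (1 + \beta)\mu] \le \exp\{-\mu(-\beta + (1 + \beta) \ln(1 + \beta))\} \le \begin{cases} \exp\left( -\frac{\beta^2 \mu}{2 + \beta} \right) & \beta > 0 \\ \exp\left( -\frac{\beta^2 \mu}{3} \right) & 0 < \beta \le 1 \end{cases}$$

Proof: We first prove the bound on the lower tail. Plugging in $\lambda = \beta\mu = \beta p n$ into part 2 of Theorem 14.1 gives
$$\begin{aligned}
\Pr[X \le (1 - \beta)\mu] &\le \exp\{-n H_{1-p}(1 - p + \beta p)\} && (14.2) \\
&= \exp\left\{ n \left( (1 - p + \beta p) \ln\left( \frac{1 - p}{1 - p + \beta p} \right) + (p - \beta p) \ln\left( \frac{p}{p - \beta p} \right) \right) \right\} \\
&\le \exp\left\{ n \left( -\beta p + p(1 - \beta) \ln\left( \frac{1}{1 - \beta} \right) \right) \right\} && (14.3) \\
&= \exp\{-\mu(\beta + (1 - \beta) \ln(1 - \beta))\} \\
&\le \exp\left\{ -\mu \left( \beta - \beta + \frac{\beta^2}{2} \right) \right\} && (14.4) \\
&= \exp\left\{ -\frac{\mu \beta^2}{2} \right\}.
\end{aligned}$$
The inequality in (14.2) comes directly from Theorem 14.1. Inequality (14.3) comes from the fact that
$$\ln\left( \frac{1 - p}{1 - p + \beta p} \right) = \ln\left( 1 - \frac{\beta p}{1 - p + \beta p} \right) \le -\frac{\beta p}{1 - p + \beta p},$$
using $\ln(1 - x) < -x$. Finally, inequality (14.4) is due to the fact that $(1 - x) \ln(1 - x) \ge -x + x^2/2$ for $0 < x < 1$.

The proof for the upper tail is very similar. The first step is to plug in $\lambda = \beta\mu$ into part 1 of Theorem 14.1 and follow the same steps as above. The only difference is that this time we use the inequality $\ln(1 + x) < x$ to get the first bound in the corollary, and then the inequality $\ln(1 + \beta) > \frac{\beta}{1 + \beta/2}$ to get the final form. The details are left as an exercise.

Exercise: Is this bound ever worse than that of Corollary 14.2? If so, by how much?

Exercise: Verify that Theorem 14.1 (and therefore also the above Corollaries) hold when the $X_i$ are not necessarily $\{0, 1\}$-valued but can take any values in the range $[0, 1]$, subject only to their expectations being $p_i$. [Hint: Show that, if $Y$ is any r.v. taking values in $[0, 1]$, and $Z$ is a $\{0, 1\}$-valued r.v. with $EZ = EY$, then for any convex function $f$ we have $E(f(Y)) \le E(f(Z))$. Apply this fact to the function $f(x) = e^{tx}$ at an appropriate point in the proof of Theorem 14.1.]

Exercise: Suppose that the r.v.'s $X_i$ are independent and $X_i$ takes values in the interval $[a_i, b_i]$. Prove the following generalization of Corollary 14.2:
$$\left.\begin{array}{l} \Pr[X \le \mu - \lambda] \\ \Pr[X \ge \mu + \lambda] \end{array}\right\} \le \exp\left( -\frac{2\lambda^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
[Hint: You will need to go back and tweak the proof of Theorem 14.1. Is there a simpler scaling proof when all the intervals $[a_i, b_i]$ are the same?]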
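As a rough numerical illustration of how the three upper-tail bounds compare (a sketch under stated assumptions, not part of the original notes), the code below evaluates the relative-entropy bound of Theorem 14.1, the bound of Corollary 14.2, and the first Angluin/Valiant bound of Corollary 14.3 for identically distributed coin flips, alongside the exact binomial tail. The function names and the parameter choices are illustrative.

```python
import math

def rel_entropy(x, p):
    """H_p(x) = x ln(x/p) + (1-x) ln((1-x)/(1-p)), with 0 ln 0 taken as 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(x, p) + term(1 - x, 1 - p)

def entropy_bound(n, p, lam):
    """Theorem 14.1, part 1: exp(-n H_p(p + lam/n))."""
    return math.exp(-n * rel_entropy(p + lam / n, p))

def hoeffding_bound(n, lam):
    """Corollary 14.2: exp(-2 lam^2 / n)."""
    return math.exp(-2 * lam**2 / n)

def angluin_valiant_bound(n, p, lam):
    """Corollary 14.3, upper tail: exp(-beta^2 mu / (2 + beta)), beta = lam/mu."""
    mu = n * p
    beta = lam / mu
    return math.exp(-(beta**2) * mu / (2 + beta))

def exact_upper_tail(n, p, m):
    """Exact Pr[X >= m] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))

if __name__ == "__main__":
    # Illustrative parameters: a small-mean case (mu << n) and a fair-coin case.
    for n, p, lam in ((1000, 0.01, 10), (1000, 0.5, 50)):
        m = round(n * p + lam)
        print(n, p, lam,
              exact_upper_tail(n, p, m),
              entropy_bound(n, p, lam),
              hoeffding_bound(n, lam),
              angluin_valiant_bound(n, p, lam))
```

In the small-mean case the Angluin/Valiant form is far tighter than Corollary 14.2, while for the fair coin the ordering of the two corollaries reverses; the relative-entropy bound is at least as good as both in each case.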

Simple Examples

Example 14.4 Fair coin flips. Given a coin where each flip has a probability $p_i = p = 1/2$ of coming up heads, Corollary 14.2 gives the following bound on the probability that the number of heads $X$ differs from the expected number $\mu = n/2$ by more than $\lambda$:
$$\Pr[|X - \mu| \ge \lambda] \le 2e^{-2\lambda^2/n}.$$
Note that the standard deviation is $\sigma = \sqrt{np(1 - p)} = \sqrt{n}/2$. So for $\lambda = \beta\sigma$ the deviation probability is at most $2e^{-\beta^2/2}$. This is similar to the asymptotic bound obtained from the CLT (but is valid for all $n$).

Example 14.5 Biased coin flips. In this example, the coin flips still all have the same probability, but now the probability of coming up heads is $p_i = p = 3/4$. We can then ask for the value of
$$\Pr[\text{at most } 1/2 \text{ of the flips come up heads}].$$
The expected number of heads is $\mu = 3n/4$. Thus, using Corollary 14.3 with $\beta = 1/3$ we get
$$\begin{aligned}
\Pr[X \le n/2] &= \Pr[X \le \mu - n/4] && (14.5) \\
&= \Pr[X \le (1 - 1/3)\mu] && (14.6) \\
&\le \exp\left( -\frac{(1/3)^2 \cdot (3n/4)}{2} \right) && (14.7) \\
&= e^{-n/24}. && (14.8)
\end{aligned}$$
(A similar bound, with a slightly better constant in the exponent, is obtained by using Corollary 14.2.)

Recall that we have already used a bound of this form twice in the course: in boosting the success probability of two-sided error randomized algorithms (BPP algorithms), and in the analysis of the median trick for fully polynomial randomized approximation schemes.

Randomized Routing

We will now see an example of a randomized algorithm whose analysis makes use of the above Chernoff bounds. The problem is defined as follows. Consider the network defined by the $n$-dimensional hypercube: i.e., the vertices of the network are the strings in $\{0, 1\}^n$, and edges connect pairs of vertices that differ in exactly one bit. We shall think of each edge as consisting of two links, one in each direction. Let $N = 2^n$, the number of vertices. Now let $\pi$ be any permutation on the vertices of the cube. The goal is to send one packet from each $i$ to its corresponding $\pi(i)$, for all $i$ simultaneously.

This problem can be seen as a building block for more realistic routing applications. A strategy for routing permutations on a graph can give useful inspiration for solving similar problems on real networks.
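The hypercube network just described is easy to build explicitly for small $n$. The following sketch (an illustration with hypothetical function names, not part of the original notes) represents each vertex as an integer whose binary digits are the bit string, counts the directed links, and uses breadth-first search to confirm that no vertex is more than $n$ hops from a given source.

```python
from collections import deque

def neighbors(v, n):
    """Vertices adjacent to v in the n-dimensional hypercube (flip one bit)."""
    return [v ^ (1 << b) for b in range(n)]

def count_directed_links(n):
    """Each undirected edge counts as two links, one in each direction."""
    return sum(len(neighbors(v, n)) for v in range(2 ** n))

def eccentricity(src, n):
    """Largest shortest-path distance from src, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in neighbors(u, n):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())

if __name__ == "__main__":
    n = 4  # illustrative dimension
    print("vertices:", 2 ** n)
    print("directed links:", count_directed_links(n))
    print("farthest vertex from 0:", eccentricity(0, n))
```

For dimension $n$ this gives $N = 2^n$ vertices and $Nn$ directed links, the count used later when averaging path lengths over edges.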
We will use a synchronous model, i.e., the routing occurs in discrete time steps, and in each time step, one packet is allowed to travel along each (directed) edge. If more than one packet wishes to traverse a given edge in the same time step, all but one of these packets are held in a queue at the edge. We assume any fair queueing discipline (e.g., FIFO). The goal is to minimize the total time before all packets have reached their destinations. A priori, a packet only has to travel $O(n)$ steps (the diameter of the cube). However, due to the potential for congestion on the edges, it is possible that the process will take much longer than this as packets get delayed in the queues.

[Figure 14.1: Randomized routing in a hypercube.]

Definition 14.6 An oblivious strategy is one in which the route chosen for each packet does not depend on the routes of the other packets. That is, the path from $i$ to $\pi(i)$ is a function of $i$ and $\pi(i)$ only.

An oblivious routing strategy is thus one in which there is no global synchronization, a realistic constraint if we are interested in real-world problems.

Theorem 14.7 [KKT90] For any deterministic, oblivious routing strategy on the hypercube, there exists a permutation that requires $\Omega(\sqrt{N/n}) = \Omega(\sqrt{2^n/n})$ steps.

This is quite an undesirable worst case. Fortunately, randomization can provide a dramatic improvement on this lower bound.

Theorem 14.8 [VB81] There exists a randomized, oblivious routing strategy that terminates in $O(n)$ steps with very high probability.

We now sketch this randomized strategy, which consists of two phases.

- In phase 1, each packet $i$ is routed to $\delta(i)$, where the destination $\delta(i)$ is chosen u.a.r.
- In phase 2, each packet is routed from $\delta(i)$ to its desired final destination, $\pi(i)$.

In both phases, we use bit-fixing paths to route the packets. In the bit-fixing path from vertex $x$ to vertex $y$ of the cube, we flip each bit $x_i$ to $y_i$ (if necessary) in left-to-right order. For example, a bit-fixing path from $x = \ldots$ to $y = \ldots$ in the hypercube $\{0, 1\}^7$ is $x = \ldots = y$. Bit-fixing paths are always shortest paths. Note that $\delta$ is not required to be a permutation (i.e., different packets may share the same intermediate destination), so this strategy is oblivious.

This strategy breaks the symmetry in the problem by simply choosing a random intermediate destination for each packet. This makes it impossible for an adversary with knowledge of the strategy to engineer a bad permutation. In our analysis, we will see that each phase of this algorithm takes only $O(n)$ steps with high probability. In order to do this, we will take a union bound over all $2^n$ packets, so we will need an exponentially small probability of a single packet taking a long time. This is where Chernoff/Hoeffding bounds will be required.
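The bit-fixing rule and phase 1 of the strategy can be sketched in a few lines. The code below is an illustrative simulation, not the authors' implementation: vertices are bit strings, each packet picks a uniformly random intermediate destination, and each directed edge forwards one queued packet per time step in FIFO order. The function names, the FIFO tie-breaking among simultaneous arrivals, and the sample strings in the usage lines are all assumptions made for the example.

```python
import random

def bit_fixing_path(x, y):
    """Vertices visited when fixing the bits of x to those of y, left to right."""
    assert len(x) == len(y)
    path = [x]
    cur = list(x)
    for i, (a, b) in enumerate(zip(x, y)):
        if a != b:
            cur[i] = b
            path.append("".join(cur))
    return path

def simulate_phase1(n, rng):
    """Route one packet from every vertex to a random destination.

    Returns the number of synchronous time steps until all packets arrive.
    """
    N = 2 ** n
    routes = []
    for i in range(N):
        src = format(i, f"0{n}b")
        dst = format(rng.randrange(N), f"0{n}b")
        verts = bit_fixing_path(src, dst)
        routes.append(list(zip(verts, verts[1:])))  # directed edges on the route
    pos = [0] * N   # index of the next edge each packet must traverse
    queues = {}     # directed edge -> FIFO list of waiting packets
    for j, route in enumerate(routes):
        if route:
            queues.setdefault(route[0], []).append(j)
    t = 0
    while any(queues.values()):
        t += 1
        arrivals = []
        for q in queues.values():
            if q:
                j = q.pop(0)  # one packet per directed edge per step
                pos[j] += 1
                if pos[j] < len(routes[j]):
                    arrivals.append((routes[j][pos[j]], j))
        # Packets join their next queue only after the step has finished.
        for edge, j in arrivals:
            queues.setdefault(edge, []).append(j)
    return t

if __name__ == "__main__":
    print(bit_fixing_path("0101", "1100"))  # illustrative strings
    n = 8
    print("phase 1 finished after", simulate_phase1(n, random.Random(0)), "steps")
```

Running the simulation for a few seeds and dimensions gives a feel for how small the queueing delays are in practice compared with the path length $n$.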

Proof: Phase 1 starts from a fixed source $i$ and routes to a random destination $\delta(i)$. Phase 2 starts from a random source $\delta(i)$ and routes to a fixed destination $\pi(i)$. By symmetry, it suffices to prove that phase 1 terminates in $O(n)$ steps w.h.p.

Let $D(i)$ be the delay suffered by packet $i$; then the total time taken is at most $n + \max_i D(i)$. We will soon prove:
$$\forall i \quad \Pr[D(i) > cn] \le e^{-2n}. \qquad (14.9)$$
Taking a union bound, it follows that
$$\Pr[\exists i : D(i) > cn] \le 2^n e^{-2n} < 2^{-n}.$$
It remains to prove (14.9). For a single packet traveling from $i$ to $\delta(i)$, let $P_i = (e_1, e_2, \ldots, e_k)$ be the sequence of edges on the route taken by packet $i$. Let $S(i) = \{j \ne i : P_j \cap P_i \ne \emptyset\}$, the set of packets whose routes intersect $P_i$. The guts of the proof lie in the following claim.

Claim 14.9 $D(i) \le |S(i)|$.

Proof: First note that, in the hypercube, when two bit-fixing paths diverge they will not come together again; i.e., routes which intersect will intersect only in one contiguous segment. With this observation, we can now charge each unit of delay for packet $i$ to a distinct member of $S(i)$.

[Figure 14.2: Routes $P_i$ and $P_j$ in the hypercube.]

Definition 14.10 Let $j$ be a packet in $S(i) \cup \{i\}$. The lag of packet $j$ at the start of time step $t$ is $t - l$, where $e_l$ is the next edge (on $P_i$) that packet $j$ wants to traverse.

Note that the lag of packet $i$ proceeds through the sequence of values $0, 1, \ldots, D(i)$. Consider the time step $t$ when the lag goes from $L$ to $L + 1$; suppose this happens when $i$ is waiting to traverse edge $e_l$. Then $i$ must be held up in the queue at $e_l$ (else its lag would not increase), so there exists at least one other packet at $e_l$ with lag $L$, and this packet actually moves at step $t$.

Now consider the last time at which there exists any packet with lag $L$: say this is time $t'$. Then some such packet must leave path $P_i$ (or reach its destination) at time $t'$, for at any step among all packets with a given lag at a given edge, at least one must move (and it would retain the same lag if it remained on path $P_i$). So we may charge this unit of delay to that packet.
Each packet is charged at most once, because it is charged only when it leaves $P_i$ (which, by the observation at the start of the proof, happens only once).

Proof: (of inequality (14.9)) Define
$$H_{ij} = \begin{cases} 1 & \text{if } P_i \cap P_j \ne \emptyset \\ 0 & \text{otherwise.} \end{cases}$$

By Claim 14.9, $D(i) \le \sum_{j \ne i} H_{ij}$. And since the $H_{ij}$ are independent we can use a Chernoff bound to bound the tail probability of this sum. First, we need a small detour to bound the mean, $\mu = E[\sum_{j \ne i} H_{ij}]$ (since the expectations of the $H_{ij}$ are tricky to get hold of). Define
$$T(e_l) = \#\text{ of paths } P_j \text{ that pass through edge } e_l \text{ (on path } P_i\text{)}.$$
Then,
$$E[T(e_l)] = \frac{E[\text{total length of all paths}]}{\#\text{ directed edges}} = \frac{N(n/2)}{Nn} = 1/2,$$
so the expected number of paths that intersect $P_i$ is (by symmetry)
$$E\Big[\sum_{j \ne i} H_{ij}\Big] \le k/2 \le n/2. \qquad (14.10)$$
Note that we used the fact that $E[T(e_l)]$ does not depend on $l$, which follows by symmetry. (One might think this is false because we chose a particular bit ordering. However the ordering does not matter because for every source and every $i$, with probability 1/2 we choose a sink such that the bit-fixing algorithm flips the $i$-th bit.)

We can now apply the Chernoff bound (Angluin/Valiant version):
$$\Pr[D(i) \ge (1 + \beta)\mu] \le \exp\left( -\frac{\beta^2}{2 + \beta}\, \mu \right),$$
where $\mu = E[\sum_{j \ne i} H_{ij}]$ as above; the bound applies to the sum $\sum_{j \ne i} H_{ij}$, which dominates $D(i)$. It is easy to check (exercise!) that, for any fixed value of the tail $(1 + \beta)\mu$, the above bound is worst when $\mu$ is maximized. Thus we may plug in our upper bound $\mu \le n/2$ and conclude that
$$\Pr\left[D(i) \ge (1 + \beta)\frac{n}{2}\right] \le \exp\left( -\frac{\beta^2}{2 + \beta} \cdot \frac{n}{2} \right).$$
Taking $\beta = 6$,
$$\Pr\left[D(i) \ge \frac{7n}{2}\right] \le \exp\left( -\frac{36n}{16} \right) = \exp\left( -\frac{9n}{4} \right) \le \exp(-2n).$$
Thus, using a union bound as indicated earlier, we see that all packets reach their destinations in Phase 1 in at most $n + \frac{7n}{2} = \frac{9n}{2}$ steps w.h.p. The complete algorithm (phases 1 and 2) thus terminates in $9n$ steps. (To keep the phases separate, we assume that all packets wait until time $\frac{9n}{2}$ before beginning Phase 2.) This completes the proof that the randomized oblivious strategy routes any permutation in $O(n)$ steps w.h.p.

References

[AV79] D. Angluin and L.G. Valiant, "Fast probabilistic algorithms for Hamiltonian circuits and matchings," Journal of Computer and System Sciences, No. 19, 1979.

[C52] H. Chernoff, "A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations," Annals of Mathematical Statistics, 23, 1952.

[H63] W. Hoeffding, "Probability Inequalities for Sums of Bounded Random Variables," American Statistical Association Journal, 1963.

[KKT90] C. Kaklamanis, D. Krizanc and A. Tsantilas, "Tight Bounds for Oblivious Routing in the Hypercube," Proceedings of the Symposium on Parallel Algorithms and Architecture, 1990.

[VB81] L.G. Valiant and G.J. Brebner, "Universal schemes for parallel communication," Proceedings of the 13th annual ACM Symposium on Theory of Computing, 1981.
