Math 216A Notes, Week 5


Scribe: Ayastassia Sebolt

Disclaimer: These notes are not nearly as polished (and quite possibly not nearly as correct) as a published paper. Please use them at your own risk.

1. Thresholds for Random Graphs

As with our last example from last week, we will be looking at applications of Chebyshev's Inequality.

Theorem 1. If $X$ is any random variable with mean $\mu$ and variance $\sigma^2$, then
$$P(|X - \mu| \geq \lambda \sigma) \leq \frac{1}{\lambda^2}.$$

This can be useful when we are trying to show that a variable is likely to be large. We first show that it is large on average (its expectation is large), then show that it is usually close to its expectation (the variance is small).

For example, consider the following model of Random Graphs (the so-called Erdős–Rényi model): We have $n$ vertices (where $n$ should be thought of as large). Every edge between two vertices appears with probability $p$, which might depend on $n$, and all edges are independent. Intuitively, $p$ should be thought of here as a weighting scheme: as $p$ increases from 0 to 1, this model gives more and more weight to graphs with more and more edges.

Definition 1. A threshold for an event is some probability $p_0$ so that if $p \ll p_0$, then the event happens with probability $o(1)$, and if $p \gg p_0$, the probability of the event is $1 - o(1)$.

Here we use the notation $f \ll g$ and $g \gg f$ to mean that $f/g$ tends to 0 as $n$ tends to infinity.

1.1. A Trivial Example. The threshold for having an edge is $1/n^2$. If $p \ll 1/n^2$, the expected number of edges, $p \binom{n}{2}$, tends to 0, so by Markov's inequality there is almost surely not an edge. If $p \gg 1/n^2$, then
$$P(\text{no edges}) = (1-p)^{\binom{n}{2}} \leq e^{-p \binom{n}{2}} = o(1).$$

1.2. A More Complicated Example (Counting Triangles). Consider the threshold for the event of having at least one triangle in the graph (so we want three vertices that are all connected to each other). The expected number of triangles is $\binom{n}{3} p^3$. So:

If $p \ll 1/n$, the expected number of triangles tends to 0.
If $p \gg 1/n$, the expected number of triangles tends to infinity.

We would like to use this to say that $1/n$ is a threshold for the appearance of triangles in $G$. If we only consider expectations, we can only get halfway there. It is true that if $X$ is any variable such that $E(X)$ tends to 0, then with high probability $X = 0$ (this is just Markov's inequality). However, it is not always true that if $E(X)$ tends to infinity then with high probability $X > 0$, since it could be that the high expectation is because $X$ is extremely large a small fraction of the time. (The homework gives an example of how this could happen if we replace triangles by a different graph.)
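To see the threshold behavior concretely, here is a minimal Monte Carlo sketch (our own illustration, not from the notes; the sample sizes and the helper name `has_triangle` are arbitrary choices) that estimates the probability that $G(n, p)$ contains a triangle for $p$ below, at, and above $1/n$:

```python
# Hypothetical illustration: estimate P(G(n, p) contains a triangle)
# for p = c/n with c below, at, and above the threshold constant.
import itertools
import random

def has_triangle(n, p):
    """Sample G(n, p) once and report whether it contains a triangle."""
    adj = [[False] * n for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if random.random() < p:
            adj[u][v] = adj[v][u] = True
    return any(adj[a][b] and adj[b][c] and adj[a][c]
               for a, b, c in itertools.combinations(range(n), 3))

n, trials = 60, 200
for c in (0.2, 1.0, 5.0):
    p = c / n
    freq = sum(has_triangle(n, p) for _ in range(trials)) / trials
    print(f"p = {c:.1f}/n: empirical P(triangle) = {freq:.2f}")
```

For $c$ well below 1 the empirical frequency should be near 0, and for $c$ well above 1 it should be near 1, matching the two expectation computations above.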

Here we would like to apply Chebyshev's inequality to $X$, where $X$ is the number of triangles. In order to do this, we need to compute the variance of $X$. Note that we can think of $X$ as $\sum_i x_i$, where the index $i$ ranges over all subsets of three vertices, and the variable $x_i$ is equal to 1 if the subset spans a triangle and 0 otherwise. In this form we can write
$$\mathrm{Var}(X) = E\left(\Big(\sum_i (x_i - E(x_i))\Big)^2\right) = \sum_i \sum_j \mathrm{Covar}(x_i, x_j),$$
where
$$\mathrm{Covar}(x_i, x_j) := E\big((x_i - E(x_i))(x_j - E(x_j))\big) = E(x_i x_j) - E(x_i) E(x_j)$$
is the covariance of $x_i$ and $x_j$. Since the $x_i$ are non-negative indicator variables, the covariance is always at most equal to $E(x_i x_j)$.

In our case we need to compute the sum of the covariance over all pairs of potential triangles. If we denote the vertex sets of the triangles by $S_1$ and $S_2$, we can break this up into four parts depending on the intersection of $S_1$ and $S_2$.

(1) If $|S_1 \cap S_2| = 0$, the covariance is 0, because the appearances of the triangles are independent.

(2) If $|S_1 \cap S_2| = 1$, the covariance is still 0 because the triangles in question still do not share an edge.

(3) If $|S_1 \cap S_2| = 2$, then two vertices are shared, and hence one edge is shared. So for both triangles to be present five edges must be present. This means $P(S_1, S_2 \text{ both there}) = p^5$, which also gives an upper bound on the covariance. The number of such pairs $(S_1, S_2)$ is at most $n^4$ (there's 4 vertices involved in the configuration), so the total contribution of such pairs to the variance is at most $n^4 p^5$.

(4) If $S_1 = S_2$, then the covariance is at most $p^3$ and the number of such pairs is at most $n^3$, so the total contribution here is at most $n^3 p^3$.

Adding up over all the above cases, we have that the variance is at most $n^4 p^5 + n^3 p^3$. By Chebyshev's inequality, we can now say
$$P(\text{no triangles}) \leq P(|X - E(X)| > E(X)) \leq \frac{\sigma^2}{\mu^2} = \frac{n^4 p^5 + n^3 p^3}{(p^3 n^3)^2} \approx \frac{1}{n^2 p} + \frac{1}{n^3 p^3}.$$
If $p \gg 1/n$, this tends to 0.

Remark 1. In a sense you can think of what we're doing as breaking up the sum defining the variance into a sum over intersecting pairs and a sum over non-intersecting pairs. The first sum is small just because there aren't too many intersecting pairs, while the second sum is zero because non-intersecting triangles are pairwise independent.
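The case analysis above is easy to get wrong, so here is a quick numerical sanity check (our own, not part of the notes; the parameters are arbitrary): we estimate $\mathrm{Var}(X)$ for the triangle count empirically and compare it against the bound $n^4 p^5 + n^3 p^3$.

```python
# Hypothetical sanity check: empirical variance of the triangle count
# versus the upper bound n^4 p^5 + n^3 p^3 derived above.
import itertools
import random
import statistics

def triangle_count(n, p):
    adj = [[False] * n for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if random.random() < p:
            adj[u][v] = adj[v][u] = True
    return sum(adj[a][b] and adj[b][c] and adj[a][c]
               for a, b, c in itertools.combinations(range(n), 3))

n, p, trials = 30, 0.1, 1000
samples = [triangle_count(n, p) for _ in range(trials)]
print("empirical Var(X):", statistics.variance(samples))
print("bound n^4 p^5 + n^3 p^3:", n**4 * p**5 + n**3 * p**3)
```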

2. Back to the Ménage Problem

Here's a problem we considered before: How many permutations of $1, \ldots, n$ have $\sigma(i) \not\equiv i \pmod{n}$ and $\sigma(i) \not\equiv i+1 \pmod{n}$ for all $i$?

For any $i$, if we choose $\sigma$ at random, the probability that $i$ is bad is $2/n$. Since these events are in a sense nearly independent for large $n$, we may also guess that the probability the bad events do not occur is approximately $(1 - 2/n)^n \approx e^{-2}$.

Here's one step towards making that intuition rigorous. Let $x_i$ be the indicator function of the bad event in which $\sigma(i) = i$ or $i+1$. As above, we know that $E(x_i) = 2/n$. Imagine for now that the $x_i$ were independent. We would have
$$E\big((x_1 + \cdots + x_n)^2\big) = \sum_i \sum_j E(x_i x_j) = n \cdot \frac{2}{n} + n(n-1) \cdot \frac{4}{n^2} = 6 + o(1).$$
In the last line we broke our sum up according to different cases:

Case (1): If $i = j$, we have $E(x_i x_j) = E(x_i) = \frac{2}{n}$. This happens for $n$ terms of our sum.

Case (2): If $i \neq j$, then $E(x_i x_j) = \frac{2}{n} \cdot \frac{2}{n} = \frac{4}{n^2}$, since $x_i$ and $x_j$ are assumed independent. There are $n(n-1)$ terms with this property.

We now turn to the computation of the actual second moment. As before, we write
$$E\big((x_1 + \cdots + x_n)^2\big) = \sum_i \sum_j E(x_i x_j),$$
but now $x_i$ and $x_j$ are not independent. We compute $E(x_i x_j)$ in three cases:

Case (1): If $|i - j| > 1$, $E(x_i x_j) = \frac{4}{n(n-1)}$.

Case (2): If $|i - j| = 1$, $E(x_i x_j) = \frac{3}{n(n-1)}$. (3 because one of the four ways in which both $x_i$ and $x_j$ could be bad involves both $i$ and $j$ mapped to the same place by $\sigma$.)

Case (3): If $i = j$, $E(x_i x_j) = \frac{2}{n}$.

We therefore have (adding up over the three cases in reverse order)
$$E\big((x_1 + \cdots + x_n)^2\big) = \sum_i \sum_j E(x_i x_j) = n \cdot \frac{2}{n} + 2n \cdot \frac{3}{n(n-1)} + (n^2 - 3n) \cdot \frac{4}{n(n-1)} = 6 + o(1).$$
So the second moment is asymptotically the same as it would be if the $x_i$ were independent. It is possible to show by a similar argument that the same is true for $E((x_1 + \cdots + x_n)^k)$ for any $k$, which turns out to be enough to make the nearly independent intuition rigorous.

Remark 2. The splitting up into cases here was similar to how it was for counting triangles. Almost all of the pairs have $|i - j| > 1$, in which case we have $E(x_i x_j) \approx E(x_i) E(x_j)$. There are a few pairs with $|i - j| = 1$ where the covariance is larger, but the number of such pairs is so small that the net contribution to the variance is negligible.
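As a quick empirical check (our own sketch, not from the notes; the trial counts are arbitrary), we can sample random permutations and estimate $E((x_1 + \cdots + x_n)^2)$ directly; it should approach 6 as $n$ grows:

```python
# Hypothetical check of the second-moment computation: for a uniform random
# permutation sigma of {0, ..., n-1}, let X count indices i with
# sigma(i) in {i, i+1 mod n}; then E(X^2) should tend to 6.
import random

def second_moment(n, trials=5000):
    total = 0
    for _ in range(trials):
        sigma = list(range(n))
        random.shuffle(sigma)
        x = sum(1 for i in range(n) if sigma[i] in (i, (i + 1) % n))
        total += x * x
    return total / trials

for n in (10, 100, 1000):
    print(n, second_moment(n))
```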

3. Exponential Moments and the Chernoff Bound

In a sense, Chebyshev's inequality is as tight as we can hope for. If, for example, $P(X = \sigma) = P(X = -\sigma) = \frac{1}{2}$ and $\lambda = 1$, then we certainly can't hope to say anything stronger than $P(|X| \geq \sigma) \leq 1$.

But there are also many situations where it isn't close to being as good as reality. For example, suppose that $x_i = \pm 1$ each with probability $1/2$, and $X = \sum x_i$. We have $\mathrm{Var}(X) = n$, so Chebyshev's Inequality gives
$$P(X > \lambda \sqrt{n}) \leq \frac{1}{\lambda^2}.$$
However, in reality, $X$ is asymptotically normal, and the probability decays like $e^{-\lambda^2/2}$. In this case, Chebyshev is very far off from the correct bound for large $\lambda$. The key difference between this and the tight example above is that here the sum is made up of a lot of independent variables which are not too large. There will usually be a lot of cancellation, and we want to exploit this to get a better tail bound.

Here's a more general framework. Let $x_1, \ldots, x_n$ be independent random variables with $|x_i - E(x_i)| \leq 1$, and let $X = x_1 + \cdots + x_n$. Let $\sigma^2$ be the variance of $X$. We want to find a tail bound on $P(|X - E(X)| \geq \lambda \sigma)$. For simplicity, let us assume that $E(x_i) = 0$ (we can always subtract a constant from each $x_i$ to make this true without affecting our bound).

Before, our bound from Chebyshev's inequality was proven by using an argument along the lines of
$$P(|X - E(X)| > \lambda) = P\big((X - E(X))^2 > \lambda^2\big) = P(X^2 > \lambda^2) \leq \frac{E(X^2)}{\lambda^2},$$
where the last inequality holds by Markov's inequality. To get a better bound, we'll apply a similar argument to a steeper function. Let $t$ be a parameter to be chosen later (we will eventually optimize over $t$). By Markov's inequality, we have
$$P(X > \lambda) = P(e^{tX} > e^{t\lambda}) \leq \frac{E(e^{tX})}{e^{t\lambda}}.$$
We now exploit that our $x_i$ are independent to write
$$E(e^{tX}) = E(e^{tx_1 + tx_2 + \cdots + tx_n}) = \prod_i E(e^{tx_i}).$$
(This trick is why the Fourier Transform, or Characteristic Function as some call it, is useful in probability.)

At this point, we still need to find some bound on $E(e^{tx_i})$. Assume for now that $t \leq 1$. Then $|tx_i| \leq 1$ because of our assumption that the $x_i$ are bounded. It is a quick calculus exercise to show that for $|u| \leq 1$ we have $e^u \leq 1 + u + u^2$. This means
$$E(e^{tx_i}) \leq E(1 + tx_i + t^2 x_i^2) = 1 + \mathrm{Var}(x_i) t^2 \leq e^{t^2 \mathrm{Var}(x_i)}.$$
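To see how much the exponential-moment method gains over Chebyshev, here is a small numerical comparison (our own illustration, not from the notes): for $X$ a sum of many independent $\pm 1$ variables with $\sigma = \sqrt{n}$, we tabulate the Chebyshev bound $1/\lambda^2$, the bound $e^{-\lambda^2/4}$ we are about to derive, and the Gaussian tail that $X/\sigma$ actually approaches.

```python
# Hypothetical comparison of tail bounds for X = sum of n independent +-1 steps
# (so X/sqrt(n) is approximately standard normal for large n).
import math

for lam in (2.0, 4.0, 8.0):
    chebyshev = 1 / lam**2                        # Chebyshev: 1/lambda^2
    chernoff = math.exp(-lam**2 / 4)              # Chernoff regime: e^{-lambda^2/4}
    normal = 0.5 * math.erfc(lam / math.sqrt(2))  # limiting Gaussian tail
    print(f"lambda = {lam}: Chebyshev {chebyshev:.1e}, "
          f"Chernoff {chernoff:.1e}, normal tail {normal:.1e}")
```

Already at $\lambda = 8$ the Chebyshev bound is around $10^{-2}$ while the exponential bound is around $10^{-7}$.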

Multiplying over all $i$, we have
$$E(e^{tX}) = \prod_i E(e^{tx_i}) \leq \prod_i e^{t^2 \mathrm{Var}(x_i)} = \exp\Big(t^2 \sum_i \mathrm{Var}(x_i)\Big) = e^{t^2 \sigma^2}.$$
Hence, $P(X > \lambda) \leq \frac{E(e^{tX})}{e^{t\lambda}} \leq \frac{e^{t^2 \sigma^2}}{e^{t\lambda}}$. Now, if we relabel our variables slightly,
$$P(X > \lambda \sigma) \leq e^{t^2 \sigma^2} e^{-t\lambda\sigma}.$$
To get the best bound possible, we would like to take $t = \frac{\lambda}{2\sigma}$. However, we earlier made the assumption $t \leq 1$. Splitting up into two cases, our values of $t$ give us the following:

(1) If $\lambda \leq 2\sigma$, take $t = \frac{\lambda}{2\sigma}$. This gives $P(X > \lambda\sigma) \leq e^{-\lambda^2/4}$.

(2) If $\lambda > 2\sigma$, take $t = 1$. This gives $P(X > \lambda\sigma) \leq e^{-\lambda\sigma + \sigma^2} \leq e^{-\lambda\sigma/2}$.

An identical argument gives us a bound on the probability that $X$ is very small. Combining the two, we obtain

Theorem 2 (Chernoff's Inequality). Let $x_1, \ldots, x_n$ be independent random variables satisfying $|x_i - E(x_i)| \leq 1$. Let $X = x_1 + \cdots + x_n$ have variance $\sigma^2$. Then
$$P(|X - E(X)| \geq \lambda\sigma) \leq 2\max\left\{e^{-\lambda^2/4},\ e^{-\lambda\sigma/2}\right\}.$$

One particular special case here is useful. Suppose that each $x_i$ is the indicator function of some event with probability $p_i$, meaning that $x_i = 1$ with probability $p_i$ and $x_i = 0$ otherwise. Then
$$E(x_i) = p_i, \qquad \mathrm{Var}(x_i) = p_i(1 - p_i).$$
Adding this up over all $i$, we get $E(X) \geq \mathrm{Var}(X)$. Now take $\lambda = \frac{\epsilon E(X)}{\sigma}$. Then Chernoff's bound becomes
$$P(|X - E(X)| > \epsilon E(X)) \leq 2\max\left\{e^{-\epsilon^2 E(X)/4},\ e^{-\epsilon E(X)/2}\right\}.$$
If $E(X)$ is very large, this probability is automatically small. In particular, if the $p_i$ are bounded away from 0, this probability is exponentially small in $n$. This is great for applications when we want to take a union bound over a very large number of events.

4. Balance in Random Graphs

Consider a graph on $n$ vertices where every edge is independently present/absent with probability $0.5$ and $n$ is large (that is to say, a graph chosen uniformly from all graphs on $n$ vertices). We would like to say that the edges of this graph are spread out evenly. Here are three senses in which we could make such a claim:

(1) Every vertex has degree between $\frac{n}{2} - 2\sqrt{n \log n}$ and $\frac{n}{2} + 2\sqrt{n \log n}$. (So each vertex has about the same degree.)

(2) If we split the vertices into two equal sized subsets $X$ and $Y$, the number of edges between $X$ and $Y$ is between $\frac{n^2}{8} - n^{3/2}$ and $\frac{n^2}{8} + n^{3/2}$. (Most subsets of vertices have about the same number of edges connecting them.)

(3) For any two disjoint subsets $X$ and $Y$, the number of edges between $X$ and $Y$ is between $\frac{|X||Y|}{2} - n^{3/2}$ and $\frac{|X||Y|}{2} + n^{3/2}$.

We can show each of these without too much effort by the Chernoff bound and the union bound. For example, consider (1). Each vertex $v$ has $n - 1$ possible edges. The degree of $v$ can be thought of as $\sum_i x_i$, where $x_i = 1$ if edge $i$ from $v$ is present. So $\deg(v)$ has mean $\frac{n-1}{2}$ and variance $\sigma^2 = \frac{n-1}{4}$. By Chernoff,
$$P\left(\left|\deg(v) - \tfrac{n-1}{2}\right| > \lambda\sigma\right) \leq 2\max\big(e^{-\lambda^2/4}, e^{-\lambda\sigma/2}\big).$$
Now if we take $\lambda = 4\sqrt{\log n}$, the right hand side becomes $2n^{-4}$. So any given vertex has abnormal degree with probability at most $2n^{-4}$, so the probability that some vertex has abnormal degree is at most $2n^{-3}$.

For (2), we do exactly the same thing. For any particular split, the number of edges has mean $\frac{n^2}{8}$ and variance $\frac{n^2}{16}$. So $\sigma = \frac{n}{4}$, and by Chernoff,
$$P\left(\left|\text{number of edges} - \tfrac{n^2}{8}\right| > \lambda \tfrac{n}{4}\right) \leq 2\max\big(e^{-\lambda^2/4}, e^{-\lambda n/8}\big).$$
The number of splits is $\binom{n}{n/2} < 2^n$. So we choose $\lambda$ to make the Chernoff bound smaller than $2^{-n}$. It turns out if we take $\lambda = 4\sqrt{n}$, the bound on the probability any one split fails becomes $2e^{-4n}$. So the probability that some split fails is at most $2^n \cdot 2e^{-4n} \leq e^{-2n}$. The proof for (3) is similar.

Remark 3. Note that for all of these our methods were somewhat similar: count the number of bad events that we want to avoid, then pick $\lambda$ in the Chernoff bound so as to make the probability of each bad event much smaller than 1 over the number of bad events.
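Here is a simulation sketch (ours, not from the notes; the value $n = 500$ is arbitrary) checking claim (1): in one sample of the random graph, every degree should land well inside the window $n/2 \pm 2\sqrt{n \log n}$.

```python
# Hypothetical check of claim (1): sample a uniform random graph and compare
# the extreme degrees with the window n/2 +- 2*sqrt(n log n).
import itertools
import math
import random

n = 500
deg = [0] * n
for u, v in itertools.combinations(range(n), 2):
    if random.random() < 0.5:
        deg[u] += 1
        deg[v] += 1

dev = 2 * math.sqrt(n * math.log(n))
print(f"window: [{n/2 - dev:.0f}, {n/2 + dev:.0f}]")
print(f"actual degree range: [{min(deg)}, {max(deg)}]")
```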

4.1. Imbalance in General Graphs

One question we might ask is whether it is possible, by clever construction, to come up with a graph that's more evenly spread out than the random graph (in the sense of (3)). It's certainly reasonable to think so: in the balls in bins example it took us $n \log n$ balls to fill all the bins by random dropping, but there's an obvious way of filling all the bins with only $n$ balls. As it turns out, the random graph is actually the best possible here, up to possibly the constant in front of the $n^{3/2}$. We'll only sketch the proof of a weaker version of this where we assume that the graph is regular (which intuitively should help us spread edges out more evenly).

Theorem 3. There is a $\delta > 0$ such that for large enough even $n$, any $\frac{n}{2}$-regular graph on $n$ vertices has two subsets $X$ and $Y$ such that $E(X, Y) \geq \frac{|X||Y|}{2} + \delta n^{3/2}$.

Proof (sketch). Let $X$ be a subset of the vertices of the graph chosen randomly, with $P(x \in X) = 0.01$ for each vertex $x$ independently.

Definition 2. We will call a vertex $v$ heavy if:
(1) $v \notin X$;
(2) the number of neighbors of $v$ in $X$ is at least $\frac{|X|}{2} + \frac{\sqrt{n}}{10000}$.

We will take $Y$ to be the set of heavy vertices. It can be verified directly that a given $v$ is heavy with probability at least $1/10000$, so $E(|Y|) \geq n/10000$. On the other hand, we know that $|Y|$ is at most $n$. So we have
$$E(|Y|) \leq n \cdot P\big(|Y| > n/20000\big) + \frac{n}{20000} \cdot P\big(|Y| \leq n/20000\big) \leq n \cdot P\big(|Y| > n/20000\big) + \frac{n}{20000}.$$
Comparing, $P(|Y| > n/20000) \geq \frac{1}{20000}$. This means there must be a choice of $X$ for which $|Y|$ is at least $n/20000$. But then by construction
$$E(X, Y) - \frac{|X||Y|}{2} = \sum_{y \in Y} \left(E(X, \{y\}) - \frac{|X|}{2}\right) \geq |Y| \cdot \frac{\sqrt{n}}{10000} \geq \delta n^{3/2}.$$

5. Asymptotic Bases

Definition 3. A set $A$ is an asymptotic basis of order $k$ if all sufficiently large $n$ are the sum of at most $k$ elements of $A$.

Example 1: Lagrange's 4 square theorem (every positive integer is the sum of at most 4 squares) means that $\{1, 4, 9, 16, \ldots\}$ is a basis of order 4.

Example 2: Goldbach's Conjecture would imply that the primes $\{2, 3, 5, 7, 11, 13, \ldots\}$ are a basis of order 3.

Example 3: $\{1, 2, 4, 8, 16, 32, \ldots\}$ is not a basis for any $k$, since a number with $k + 1$ digits of 1 in its binary representation cannot be written as the sum of $k$ powers of 2.

Question (Sidon, 1930s): How thin can a basis of positive integers be?

We'll focus on the case $k = 2$, and start with a couple of cheap bounds. Let $r_A(n)$ be the number of representations of $n$ as $a_1 + a_2$, for $a_1, a_2 \in A$. We first note that if $n = a_1 + a_2 \leq N$, certainly $a_1 \leq N$ and $a_2 \leq N$. This means
$$\sum_{n=1}^{N} r_A(n) \leq |A \cap \{1, \ldots, N\}|^2.$$
Any basis satisfies $r_A(n) \geq 1$ for large $n$, which by the above implies $|A \cap \{1, \ldots, N\}| \geq (1 - o(1))\sqrt{N}$. Conversely, if $a_1 \leq N$ and $a_2 \leq N$, then $a_1 + a_2 \leq 2N$. This implies
$$|A \cap \{1, \ldots, N\}|^2 \leq \sum_{n=1}^{2N} r_A(n).$$
So if $r_A(n)$ is small (not too much larger than 1), then $|A \cap \{1, \ldots, N\}| \approx \sqrt{N}$.
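To make the definition of $r_A(n)$ concrete, here is a tiny computation (our own example, not from the notes) counting ordered representations when $A$ is the set of squares:

```python
# Hypothetical example: r_A(n) for A = the squares, counting ordered
# representations n = a1 + a2 with a1, a2 in A.
A = {k * k for k in range(1, 100)}

def r(n, A):
    return sum(1 for a in A if a < n and (n - a) in A)

for n in (2, 5, 13, 25, 50, 65):
    print(n, r(n, A))
```

For instance $r_A(50) = 3$ here, coming from $1 + 49$, $49 + 1$, and $25 + 25$.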

Theorem 4 (Erdős, 1956). There is an $A$ such that $r_A(n) = \Theta(\log n)$, meaning that there are constants $c_1$ and $c_2$ such that for large enough $n$ we have
$$c_1 \log(n) \leq r_A(n) \leq c_2 \log(n).$$

Proof. By the above, we expect that $|A \cap \{1, \ldots, N\}|$ should be about $\sqrt{N \log N}$. So we'd like to consider a set formed by taking $x \in A$ randomly and independently with probability $\sqrt{\frac{\log(x)}{x}}$. Well, we can't quite do that, because probabilities are at most 1. In actuality, we'll take
$$P(n \in A) = \min\left(C \sqrt{\frac{\log(n)}{n}},\ 1\right),$$
where $C$ is a large constant.

Then $r_A(n)$ is the sum of indicator events $x_i$ of the form $\{i \in A \text{ and } n - i \in A\}$. We aim to show that roughly $\log n$ of these events hold. To do this, we can safely ignore (or even adjust the probability of) any constant number of events. The events we'll fix correspond to the representations $n = 1 + (n-1)$ and $n = n/2 + n/2$, along with any event where $P(i \in A) = 1$. For the remaining events, we have
$$P(x_i) = C^2 \sqrt{\frac{\log(i)}{i}} \sqrt{\frac{\log(n-i)}{n-i}}.$$
Therefore
$$E(r_A(n)) = \sum_i C^2 \sqrt{\frac{\log(i)\log(n-i)}{i(n-i)}} + O(1).$$
Here, an upper bound for $E(r_A(n))$ is
$$E(r_A(n)) \leq C^2 \log(n) \sum_i \frac{1}{\sqrt{i(n-i)}} + O(1) \leq 3 C^2 \log(n),$$
since the sum $\sum_i \frac{1}{\sqrt{i(n-i)}}$ approximates $\int_0^1 \frac{dx}{\sqrt{x(1-x)}}$ and so is bounded by a constant. And a lower bound for $E(r_A(n))$ is
$$E(r_A(n)) \geq \sum_{i=n/4}^{3n/4} C^2 \sqrt{\frac{\log(i)\log(n-i)}{i(n-i)}} \geq \frac{n}{2} \cdot C^2 \log(n/4) \cdot \frac{2}{n} = C^2 \log(n)(1 + o(1)) \geq \frac{C^2}{3} \log(n).$$
Taking $C = 9$, we see that for large $n$,
$$27 \log(n) \leq E(r_A(n)) \leq 243 \log(n).$$
Now for a fixed $n$, the events for each difference $i$ are independent. Now we may use the Chernoff bound to see that
$$P\big(|r_A(n) - E(r_A(n))| \geq 26 \log(n)\big) \leq e^{-3\log(n)} = n^{-3}.$$
So for sufficiently large $n$, the probability that $n$ doesn't satisfy $\log(n) \leq r_A(n) \leq 269 \log(n)$ is at most $n^{-3}$. This has a finite sum over all $n$, so by Borel–Cantelli we have with probability 1 that the number of $n$ which fail to satisfy this is finite.
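The construction in the proof is easy to simulate. Below is a sketch (our own, not from the notes; the constant $C = 3$ and the range $N$ are arbitrary choices, and a single run just gives a feel for the concentration) that samples $A$ with $P(n \in A) = \min(C\sqrt{\log(n)/n}, 1)$ and prints $r_A(n)$ next to $\log(n)$:

```python
# Hypothetical simulation of the Erdos random basis: include n in A with
# probability min(C * sqrt(log(n)/n), 1), then compare r_A(n) with log(n).
import math
import random

C, N = 3.0, 200_000
A = {n for n in range(2, N)
     if random.random() < min(C * math.sqrt(math.log(n) / n), 1.0)}

def r(n):
    # ordered representations n = a1 + a2 with both parts in A
    return sum(1 for a in A if a < n and (n - a) in A)

for n in (1_000, 10_000, 100_000):
    print(f"n = {n}: r_A(n) = {r(n)}, log(n) = {math.log(n):.1f}")
```

Across the tested values of $n$, the counts $r_A(n)$ should stay within a constant factor of $\log(n)$, which is the content of the theorem.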

It's possible, but more complicated, to extend this argument to cover the case of larger $k$ (this was first done by Erdős and Tetali). The key wrinkle that needs to be overcome is that we no longer have independence. For example, if $n = 10$ the representations $5 + 4 + 1$ and $5 + 3 + 2$ are coupled by the presence of 5 in both of them.

5.1. Two Open Problems

Here are two infamous conjectures due to Erdős and Turán:

1. You cannot have a basis for which $r_A(n) = o(\log(n))$.
2. There is no constant $C$ for which there exists a basis satisfying $r_A(n) \leq C$.

The second conjecture is of course weaker than the first.