Math 216A Notes, Week 5


Scribe: Ayastassia Sebolt

Disclaimer: These notes are not nearly as polished (and quite possibly not nearly as correct) as a published paper. Please use them at your own risk.

1. Thresholds for Random Graphs

As with our last example from last week, we will be looking at applications of Chebyshev's Inequality.

Theorem 1. If $X$ is any random variable with mean $\mu$ and variance $\sigma^2$, then
$$P(|X - \mu| \geq \lambda \sigma) \leq \frac{1}{\lambda^2}.$$

This can be useful when we are trying to show that a variable is likely to be large. We first show that it is large on average (its expectation is large), then show that it is usually close to its expectation (the variance is small).

For example, consider the following model of Random Graphs (the so-called Erdős–Rényi model): We have $n$ vertices (where $n$ should be thought of as large). Every edge between two vertices appears with probability $p$, which might depend on $n$, and all edges are independent. Intuitively, $p$ should be thought of here as a weighting scheme: as $p$ increases from 0 to 1, this model gives more and more weight to graphs with more and more edges.

Definition 1. A threshold for an event is some probability $p_0$ so that if $p \ll p_0$, then the event happens with probability $o(1)$, and if $p \gg p_0$, the probability of the event is $1 - o(1)$.

Here we use the notation $f \ll g$ and $g \gg f$ to mean that $f/g$ tends to 0 as $n$ tends to infinity.

1.1. A Trivial Example. The threshold for having an edge is $1/n^2$. If $p \ll 1/n^2$, the expected number of edges, $p \binom{n}{2}$, tends to 0, so by Markov's inequality there is almost surely not an edge. If $p \gg 1/n^2$, then
$$P(\text{no edges}) = (1-p)^{\binom{n}{2}} \leq e^{-p \binom{n}{2}} = o(1).$$

1.2. A More Complicated Example (Counting Triangles). Consider the threshold for the event of having at least one triangle in the graph (so we want three vertices that are all connected to each other). The expected number of triangles is $\binom{n}{3} p^3$. So:

If $p \ll 1/n$, the expected number of triangles tends to 0.
If $p \gg 1/n$, the expected number of triangles tends to infinity.

We would like to use this to say that $1/n$ is a threshold for the appearance of triangles in $G$. If we only consider expectations, we can only get halfway there. It is true that if $X$ is any variable such that $E(X)$ tends to 0, then with high probability $X = 0$ (this is just Markov's inequality). However, it is not always true that if $E(X)$ tends to infinity then with high probability $X > 0$, since it could be that the high expectation is because $X$ is extremely large a small fraction of the time. (The homework gives an example of how this could happen if we replace triangles by a different graph.)
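To see the threshold behavior concretely, here is a minimal Monte Carlo sketch (our own illustration, not from the notes; the sample sizes and the helper name `has_triangle` are arbitrary choices) that estimates the probability that $G(n, p)$ contains a triangle for $p$ below, at, and above $1/n$:

```python
# Hypothetical illustration: estimate P(G(n, p) contains a triangle)
# for p = c/n with c below, at, and above the threshold constant.
import itertools
import random

def has_triangle(n, p):
    """Sample G(n, p) once and report whether it contains a triangle."""
    adj = [[False] * n for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if random.random() < p:
            adj[u][v] = adj[v][u] = True
    return any(adj[a][b] and adj[b][c] and adj[a][c]
               for a, b, c in itertools.combinations(range(n), 3))

n, trials = 60, 200
for c in (0.2, 1.0, 5.0):
    p = c / n
    freq = sum(has_triangle(n, p) for _ in range(trials)) / trials
    print(f"p = {c:.1f}/n: empirical P(triangle) = {freq:.2f}")
```

For $c$ well below 1 the empirical frequency should be near 0, and for $c$ well above 1 it should be near 1, matching the two expectation computations above.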

Here we would like to apply Chebyshev's inequality to $X$, where $X$ is the number of triangles. In order to do this, we need to compute the variance of $X$. Note that we can think of $X$ as $\sum_i x_i$, where the index $i$ ranges over all subsets of three vertices, and the variable $x_i$ is equal to 1 if the subset spans a triangle and 0 otherwise. In this form we can write
$$\mathrm{Var}(X) = E\left(\Big(\sum_i (x_i - E(x_i))\Big)^2\right) = \sum_i \sum_j \mathrm{Covar}(x_i, x_j),$$
where
$$\mathrm{Covar}(x_i, x_j) := E\big((x_i - E(x_i))(x_j - E(x_j))\big) = E(x_i x_j) - E(x_i) E(x_j)$$
is the covariance of $x_i$ and $x_j$. Since the $x_i$ are non-negative indicator variables, the covariance is always at most equal to $E(x_i x_j)$.

In our case we need to compute the sum of the covariance over all pairs of potential triangles. If we denote the vertex sets of the triangles by $S_1$ and $S_2$, we can break this up into four parts depending on the intersection of $S_1$ and $S_2$.

(1) If $|S_1 \cap S_2| = 0$, the covariance is 0, because the appearances of the triangles are independent.

(2) If $|S_1 \cap S_2| = 1$, the covariance is still 0 because the triangles in question still do not share an edge.

(3) If $|S_1 \cap S_2| = 2$, then two vertices are shared, and hence one edge is shared. So for both triangles to be present five edges must be present. This means $P(S_1, S_2 \text{ both there}) = p^5$, which also gives an upper bound on the covariance. The number of such pairs $(S_1, S_2)$ is at most $n^4$ (there's 4 vertices involved in the configuration), so the total contribution of such pairs to the variance is at most $n^4 p^5$.

(4) If $S_1 = S_2$, then the covariance is at most $p^3$ and the number of such pairs is at most $n^3$, so the total contribution here is at most $n^3 p^3$.

Adding up over all the above cases, we have that the variance is at most $n^4 p^5 + n^3 p^3$. By Chebyshev's inequality, we can now say
$$P(\text{no triangles}) \leq P(|X - E(X)| > E(X)) \leq \frac{\sigma^2}{\mu^2} = \frac{n^4 p^5 + n^3 p^3}{(p^3 n^3)^2} \approx \frac{1}{n^2 p} + \frac{1}{n^3 p^3}.$$
If $p \gg 1/n$, this tends to 0.

Remark 1. In a sense you can think of what we're doing as breaking up the sum defining the variance into a sum over intersecting pairs and a sum over non-intersecting pairs. The first sum is small just because there aren't too many intersecting pairs, while the second sum is zero because non-intersecting triangles are pairwise independent.
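The case analysis above is easy to get wrong, so here is a quick numerical sanity check (our own, not part of the notes; the parameters are arbitrary): we estimate $\mathrm{Var}(X)$ for the triangle count empirically and compare it against the bound $n^4 p^5 + n^3 p^3$.

```python
# Hypothetical sanity check: empirical variance of the triangle count
# versus the upper bound n^4 p^5 + n^3 p^3 derived above.
import itertools
import random
import statistics

def triangle_count(n, p):
    adj = [[False] * n for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if random.random() < p:
            adj[u][v] = adj[v][u] = True
    return sum(adj[a][b] and adj[b][c] and adj[a][c]
               for a, b, c in itertools.combinations(range(n), 3))

n, p, trials = 30, 0.1, 1000
samples = [triangle_count(n, p) for _ in range(trials)]
print("empirical Var(X):", statistics.variance(samples))
print("bound n^4 p^5 + n^3 p^3:", n**4 * p**5 + n**3 * p**3)
```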

2. Back to the Ménage Problem

Here's a problem we considered before: How many permutations of $1, \ldots, n$ have $\sigma(i) \not\equiv i \pmod{n}$ and $\sigma(i) \not\equiv i+1 \pmod{n}$ for all $i$?

For any $i$, if we choose $\sigma$ at random, the probability that $i$ is bad is $2/n$. Since these events are in a sense nearly independent for large $n$, we may also guess that the probability the bad events do not occur is approximately $(1 - 2/n)^n \approx e^{-2}$.

Here's one step towards making that intuition rigorous. Let $x_i$ be the indicator function of the bad event in which $\sigma(i) = i$ or $i+1$. As above, we know that $E(x_i) = 2/n$. Imagine for now that the $x_i$ were independent. We would have
$$E\big((x_1 + \cdots + x_n)^2\big) = \sum_i \sum_j E(x_i x_j) = n \cdot \frac{2}{n} + n(n-1) \cdot \frac{4}{n^2} = 6 + o(1).$$
In the last line we broke our sum up according to different cases:

Case (1): If $i = j$, we have $E(x_i x_j) = E(x_i) = \frac{2}{n}$. This happens for $n$ terms of our sum.

Case (2): If $i \neq j$, then $E(x_i x_j) = \frac{2}{n} \cdot \frac{2}{n} = \frac{4}{n^2}$, since $x_i$ and $x_j$ are assumed independent. There are $n(n-1)$ terms with this property.

We now turn to the computation of the actual second moment. As before, we write
$$E\big((x_1 + \cdots + x_n)^2\big) = \sum_i \sum_j E(x_i x_j),$$
but now $x_i$ and $x_j$ are not independent. We compute $E(x_i x_j)$ in three cases:

Case (1): If $|i - j| > 1$, $E(x_i x_j) = \frac{4}{n(n-1)}$.

Case (2): If $|i - j| = 1$, $E(x_i x_j) = \frac{3}{n(n-1)}$. (3 because one of the four ways in which both $x_i$ and $x_j$ could be bad involves both $i$ and $j$ mapped to the same place by $\sigma$.)

Case (3): If $i = j$, $E(x_i x_j) = \frac{2}{n}$.

We therefore have (adding up over the three cases in reverse order)
$$E\big((x_1 + \cdots + x_n)^2\big) = \sum_i \sum_j E(x_i x_j) = n \cdot \frac{2}{n} + 2n \cdot \frac{3}{n(n-1)} + (n^2 - 3n) \cdot \frac{4}{n(n-1)} = 6 + o(1).$$
So the second moment is asymptotically the same as it would be if the $x_i$ were independent. It is possible to show by a similar argument that the same is true for $E((x_1 + \cdots + x_n)^k)$ for any $k$, which turns out to be enough to make the nearly independent intuition rigorous.

Remark 2. The splitting up into cases here was similar to how it was for counting triangles. Almost all of the pairs have $|i - j| > 1$, in which case we have $E(x_i x_j) \approx E(x_i) E(x_j)$. There are a few pairs with $|i - j| = 1$ where the covariance is larger, but the number of such pairs is so small that the net contribution to the variance is negligible.
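As a quick empirical check (our own sketch, not from the notes; the trial counts are arbitrary), we can sample random permutations and estimate $E((x_1 + \cdots + x_n)^2)$ directly; it should approach 6 as $n$ grows:

```python
# Hypothetical check of the second-moment computation: for a uniform random
# permutation sigma of {0, ..., n-1}, let X count indices i with
# sigma(i) in {i, i+1 mod n}; then E(X^2) should tend to 6.
import random

def second_moment(n, trials=5000):
    total = 0
    for _ in range(trials):
        sigma = list(range(n))
        random.shuffle(sigma)
        x = sum(1 for i in range(n) if sigma[i] in (i, (i + 1) % n))
        total += x * x
    return total / trials

for n in (10, 100, 1000):
    print(n, second_moment(n))
```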

3. Exponential Moments and the Chernoff Bound

In a sense, Chebyshev's inequality is as tight as we can hope for. If, for example, $P(X = \sigma) = P(X = -\sigma) = \frac{1}{2}$ and $\lambda = 1$, then we certainly can't hope to say anything stronger than $P(|X| \geq \sigma) \leq 1$.

But there are also many situations where it isn't close to being as good as reality. For example, suppose that $x_i = \pm 1$ each with probability $1/2$, and $X = \sum x_i$. We have $\mathrm{Var}(X) = n$, so Chebyshev's Inequality gives
$$P(X > \lambda \sqrt{n}) \leq \frac{1}{\lambda^2}.$$
However, in reality, $X$ is asymptotically normal, and the probability decays like $e^{-\lambda^2/2}$. In this case, Chebyshev is very far off from the correct bound for large $\lambda$. The key difference between this and the tight example above is that here the sum is made up of a lot of independent variables which are not too large. There will usually be a lot of cancellation, and we want to exploit this to get a better tail bound.

Here's a more general framework. Let $x_1, \ldots, x_n$ be independent random variables with $|x_i - E(x_i)| \leq 1$, and let $X = x_1 + \cdots + x_n$. Let $\sigma^2$ be the variance of $X$. We want to find a tail bound on $P(|X - E(X)| \geq \lambda \sigma)$. For simplicity, let us assume that $E(x_i) = 0$ (we can always subtract a constant from each $x_i$ to make this true without affecting our bound).

Before, our bound from Chebyshev's inequality was proven by using an argument along the lines of
$$P(|X - E(X)| > \lambda) = P\big((X - E(X))^2 > \lambda^2\big) = P(X^2 > \lambda^2) \leq \frac{E(X^2)}{\lambda^2},$$
where the last inequality holds by Markov's inequality. To get a better bound, we'll apply a similar argument to a steeper function. Let $t$ be a parameter to be chosen later (we will eventually optimize over $t$). By Markov's inequality, we have
$$P(X > \lambda) = P(e^{tX} > e^{t\lambda}) \leq \frac{E(e^{tX})}{e^{t\lambda}}.$$
We now exploit that our $x_i$ are independent to write
$$E(e^{tX}) = E(e^{tx_1 + tx_2 + \cdots + tx_n}) = \prod_i E(e^{tx_i}).$$
(This trick is why the Fourier Transform, or Characteristic Function as some call it, is useful in probability.)

At this point, we still need to find some bound on $E(e^{tx_i})$. Assume for now that $t \leq 1$. Then $|tx_i| \leq 1$ because of our assumption that the $x_i$ are bounded. It is a quick calculus exercise to show that for $|u| \leq 1$ we have $e^u \leq 1 + u + u^2$. This means
$$E(e^{tx_i}) \leq E(1 + tx_i + t^2 x_i^2) = 1 + \mathrm{Var}(x_i) t^2 \leq e^{t^2 \mathrm{Var}(x_i)}.$$
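To see how much the exponential-moment method gains over Chebyshev, here is a small numerical comparison (our own illustration, not from the notes): for $X$ a sum of many independent $\pm 1$ variables with $\sigma = \sqrt{n}$, we tabulate the Chebyshev bound $1/\lambda^2$, the bound $e^{-\lambda^2/4}$ we are about to derive, and the Gaussian tail that $X/\sigma$ actually approaches.

```python
# Hypothetical comparison of tail bounds for X = sum of n independent +-1 steps
# (so X/sqrt(n) is approximately standard normal for large n).
import math

for lam in (2.0, 4.0, 8.0):
    chebyshev = 1 / lam**2                        # Chebyshev: 1/lambda^2
    chernoff = math.exp(-lam**2 / 4)              # Chernoff regime: e^{-lambda^2/4}
    normal = 0.5 * math.erfc(lam / math.sqrt(2))  # limiting Gaussian tail
    print(f"lambda = {lam}: Chebyshev {chebyshev:.1e}, "
          f"Chernoff {chernoff:.1e}, normal tail {normal:.1e}")
```

Already at $\lambda = 8$ the Chebyshev bound is around $10^{-2}$ while the exponential bound is around $10^{-7}$.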

Multiplying over all $i$, we have
$$E(e^{tX}) = \prod_i E(e^{tx_i}) \leq \prod_i e^{t^2 \mathrm{Var}(x_i)} = \exp\Big(t^2 \sum_i \mathrm{Var}(x_i)\Big) = e^{t^2 \sigma^2}.$$
Hence, $P(X > \lambda) \leq \frac{E(e^{tX})}{e^{t\lambda}} \leq \frac{e^{t^2 \sigma^2}}{e^{t\lambda}}$. Now, if we relabel our variables slightly,
$$P(X > \lambda \sigma) \leq e^{t^2 \sigma^2} e^{-t\lambda\sigma}.$$
To get the best bound possible, we would like to take $t = \frac{\lambda}{2\sigma}$. However, we earlier made the assumption $t \leq 1$. Splitting up into two cases, our values of $t$ give us the following:

(1) If $\lambda \leq 2\sigma$, take $t = \frac{\lambda}{2\sigma}$. This gives $P(X > \lambda\sigma) \leq e^{-\lambda^2/4}$.

(2) If $\lambda > 2\sigma$, take $t = 1$. This gives $P(X > \lambda\sigma) \leq e^{-\lambda\sigma + \sigma^2} \leq e^{-\lambda\sigma/2}$.

An identical argument gives us a bound on the probability that $X$ is very small. Combining the two, we obtain

Theorem 2 (Chernoff's Inequality). Let $x_1, \ldots, x_n$ be independent random variables satisfying $|x_i - E(x_i)| \leq 1$. Let $X = x_1 + \cdots + x_n$ have variance $\sigma^2$. Then
$$P(|X - E(X)| \geq \lambda\sigma) \leq 2\max\left\{e^{-\lambda^2/4},\ e^{-\lambda\sigma/2}\right\}.$$

One particular special case here is useful. Suppose that each $x_i$ is the indicator function of some event with probability $p_i$, meaning that $x_i = 1$ with probability $p_i$ and $x_i = 0$ otherwise. Then
$$E(x_i) = p_i, \qquad \mathrm{Var}(x_i) = p_i(1 - p_i).$$
Adding this up over all $i$, we get $E(X) \geq \mathrm{Var}(X)$. Now take $\lambda = \frac{\epsilon E(X)}{\sigma}$. Then Chernoff's bound becomes
$$P(|X - E(X)| > \epsilon E(X)) \leq 2\max\left\{e^{-\epsilon^2 E(X)/4},\ e^{-\epsilon E(X)/2}\right\}.$$
If $E(X)$ is very large, this probability is automatically small. In particular, if the $p_i$ are bounded away from 0, this probability is exponentially small in $n$. This is great for applications when we want to take a union bound over a very large number of events.

4. Balance in Random Graphs

Consider a graph on $n$ vertices where every edge is independently present/absent with probability $0.5$ and $n$ is large (that is to say, a graph chosen uniformly from all graphs on $n$ vertices). We would like to say that the edges of this graph are spread out evenly. Here are three senses in which we could make such a claim:

(1) Every vertex has degree between $\frac{n}{2} - 2\sqrt{n \log n}$ and $\frac{n}{2} + 2\sqrt{n \log n}$. (So each vertex has about the same degree.)

(2) If we split the vertices into two equal sized subsets $X$ and $Y$, the number of edges between $X$ and $Y$ is between $\frac{n^2}{8} - n^{3/2}$ and $\frac{n^2}{8} + n^{3/2}$. (Most subsets of vertices have about the same number of edges connecting them.)

(3) For any two disjoint subsets $X$ and $Y$, the number of edges between $X$ and $Y$ is between $\frac{|X||Y|}{2} - n^{3/2}$ and $\frac{|X||Y|}{2} + n^{3/2}$.

We can show each of these without too much effort by the Chernoff bound and the union bound. For example, consider (1). Each vertex $v$ has $n - 1$ possible edges. The degree of $v$ can be thought of as $\sum_i x_i$, where $x_i = 1$ if edge $i$ from $v$ is present. So $\deg(v)$ has mean $\frac{n-1}{2}$ and variance $\sigma^2 = \frac{n-1}{4}$. By Chernoff,
$$P\left(\left|\deg(v) - \tfrac{n-1}{2}\right| > \lambda\sigma\right) \leq 2\max\big(e^{-\lambda^2/4}, e^{-\lambda\sigma/2}\big).$$
Now if we take $\lambda = 4\sqrt{\log n}$, the right hand side becomes $2n^{-4}$. So any given vertex has abnormal degree with probability at most $2n^{-4}$, so the probability that some vertex has abnormal degree is at most $2n^{-3}$.

For (2), we do exactly the same thing. For any particular split, the number of edges has mean $\frac{n^2}{8}$ and variance $\frac{n^2}{16}$. So $\sigma = \frac{n}{4}$, and by Chernoff,
$$P\left(\left|\text{number of edges} - \tfrac{n^2}{8}\right| > \lambda \tfrac{n}{4}\right) \leq 2\max\big(e^{-\lambda^2/4}, e^{-\lambda n/8}\big).$$
The number of splits is $\binom{n}{n/2} < 2^n$. So we choose $\lambda$ to make the Chernoff bound smaller than $2^{-n}$. It turns out if we take $\lambda = 4\sqrt{n}$, the bound on the probability any one split fails becomes $2e^{-4n}$. So the probability that some split fails is at most $2^n \cdot 2e^{-4n} \leq e^{-2n}$. The proof for (3) is similar.

Remark 3. Note that for all of these our methods were somewhat similar: count the number of bad events that we want to avoid, then pick $\lambda$ in the Chernoff bound so as to make the probability of each bad event much smaller than 1 over the number of bad events.
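Here is a simulation sketch (ours, not from the notes; the value $n = 500$ is arbitrary) checking claim (1): in one sample of the random graph, every degree should land well inside the window $n/2 \pm 2\sqrt{n \log n}$.

```python
# Hypothetical check of claim (1): sample a uniform random graph and compare
# the extreme degrees with the window n/2 +- 2*sqrt(n log n).
import itertools
import math
import random

n = 500
deg = [0] * n
for u, v in itertools.combinations(range(n), 2):
    if random.random() < 0.5:
        deg[u] += 1
        deg[v] += 1

dev = 2 * math.sqrt(n * math.log(n))
print(f"window: [{n/2 - dev:.0f}, {n/2 + dev:.0f}]")
print(f"actual degree range: [{min(deg)}, {max(deg)}]")
```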

4.1. Imbalance in General Graphs

One question we might ask is whether it is possible, by clever construction, to come up with a graph that's more evenly spread out than the random graph (in the sense of (3)). It's certainly reasonable to think so: in the balls in bins example it took us $n \log n$ balls to fill all the bins by random dropping, but there's an obvious way of filling all the bins with only $n$ balls. As it turns out, the random graph is actually the best possible here, up to possibly the constant in front of the $n^{3/2}$. We'll only sketch the proof of a weaker version of this where we assume that the graph is regular (which intuitively should help us spread edges out more evenly).

Theorem 3. There is a $\delta > 0$ such that for large enough even $n$, any $\frac{n}{2}$-regular graph on $n$ vertices has two subsets $X$ and $Y$ such that $E(X, Y) \geq \frac{|X||Y|}{2} + \delta n^{3/2}$.

Proof (sketch). Let $X$ be a subset of the vertices of the graph chosen randomly, with $P(x \in X) = 0.01$ for each vertex $x$ independently.

Definition 2. We will call a vertex $v$ heavy if:
(1) $v \notin X$;
(2) the number of neighbors of $v$ in $X$ is at least $\frac{|X|}{2} + \frac{\sqrt{n}}{10000}$.

We will take $Y$ to be the set of heavy vertices. It can be verified directly that a given $v$ is heavy with probability at least $1/10000$, so $E(|Y|) \geq n/10000$. On the other hand, we know that $|Y|$ is at most $n$. So we have
$$E(|Y|) \leq n \cdot P\big(|Y| > n/20000\big) + \frac{n}{20000} \cdot P\big(|Y| \leq n/20000\big) \leq n \cdot P\big(|Y| > n/20000\big) + \frac{n}{20000}.$$
Comparing, $P(|Y| > n/20000) \geq \frac{1}{20000}$. This means there must be a choice of $X$ for which $|Y|$ is at least $n/20000$. But then by construction
$$E(X, Y) - \frac{|X||Y|}{2} = \sum_{y \in Y} \left(E(X, \{y\}) - \frac{|X|}{2}\right) \geq |Y| \cdot \frac{\sqrt{n}}{10000} \geq \delta n^{3/2}.$$

5. Asymptotic Bases

Definition 3. A set $A$ is an asymptotic basis of order $k$ if all sufficiently large $n$ are the sum of at most $k$ elements of $A$.

Example 1: Lagrange's 4 square theorem (every positive integer is the sum of at most 4 squares) means that $\{1, 4, 9, 16, \ldots\}$ is a basis of order 4.

Example 2: Goldbach's Conjecture would imply that the primes $\{2, 3, 5, 7, 11, 13, \ldots\}$ are a basis of order 3.

Example 3: $\{1, 2, 4, 8, 16, 32, \ldots\}$ is not a basis for any $k$, since a number with $k + 1$ digits of 1 in its binary representation cannot be written as the sum of $k$ powers of 2.

Question (Sidon, 1930s): How thin can a basis of positive integers be?

We'll focus on the case $k = 2$, and start with a couple of cheap bounds. Let $r_A(n)$ be the number of representations of $n$ as $a_1 + a_2$, for $a_1, a_2 \in A$. We first note that if $n = a_1 + a_2 \leq N$, certainly $a_1 \leq N$ and $a_2 \leq N$. This means
$$\sum_{n=1}^{N} r_A(n) \leq |A \cap \{1, \ldots, N\}|^2.$$
Any basis satisfies $r_A(n) \geq 1$ for large $n$, which by the above implies $|A \cap \{1, \ldots, N\}| \geq (1 - o(1))\sqrt{N}$. Conversely, if $a_1 \leq N$ and $a_2 \leq N$, then $a_1 + a_2 \leq 2N$. This implies
$$|A \cap \{1, \ldots, N\}|^2 \leq \sum_{n=1}^{2N} r_A(n).$$
So if $r_A(n)$ is small (not too much larger than 1), then $|A \cap \{1, \ldots, N\}| \approx \sqrt{N}$.
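To make the definition of $r_A(n)$ concrete, here is a tiny computation (our own example, not from the notes) counting ordered representations when $A$ is the set of squares:

```python
# Hypothetical example: r_A(n) for A = the squares, counting ordered
# representations n = a1 + a2 with a1, a2 in A.
A = {k * k for k in range(1, 100)}

def r(n, A):
    return sum(1 for a in A if a < n and (n - a) in A)

for n in (2, 5, 13, 25, 50, 65):
    print(n, r(n, A))
```

For instance $r_A(50) = 3$ here, coming from $1 + 49$, $49 + 1$, and $25 + 25$.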

Theorem 4 (Erdős, 1956). There is an $A$ such that $r_A(n) = \Theta(\log n)$, meaning that there are constants $c_1$ and $c_2$ such that for large enough $n$ we have
$$c_1 \log(n) \leq r_A(n) \leq c_2 \log(n).$$

Proof. By the above, we expect that $|A \cap \{1, \ldots, N\}|$ should be about $\sqrt{N \log N}$. So we'd like to consider a set formed by taking $x \in A$ randomly and independently with probability $\sqrt{\frac{\log(x)}{x}}$. Well, we can't quite do that, because probabilities are at most 1. In actuality, we'll take
$$P(n \in A) = \min\left(C \sqrt{\frac{\log(n)}{n}},\ 1\right),$$
where $C$ is a large constant.

Then $r_A(n)$ is the sum of indicator events $x_i$ of the form $\{i \in A \text{ and } n - i \in A\}$. We aim to show that roughly $\log n$ of these events hold. To do this, we can safely ignore (or even adjust the probability of) any constant number of events. The events we'll fix correspond to the representations $n = 1 + (n-1)$ and $n = n/2 + n/2$, along with any event where $P(i \in A) = 1$. For the remaining events, we have
$$P(x_i) = C^2 \sqrt{\frac{\log(i)}{i}} \sqrt{\frac{\log(n-i)}{n-i}}.$$
Therefore
$$E(r_A(n)) = \sum_i C^2 \sqrt{\frac{\log(i)\log(n-i)}{i(n-i)}} + O(1).$$
Here, an upper bound for $E(r_A(n))$ is
$$E(r_A(n)) \leq C^2 \log(n) \sum_i \frac{1}{\sqrt{i(n-i)}} + O(1) \leq 3 C^2 \log(n),$$
since the sum $\sum_i \frac{1}{\sqrt{i(n-i)}}$ approximates $\int_0^1 \frac{dx}{\sqrt{x(1-x)}}$ and so is bounded by a constant. And a lower bound for $E(r_A(n))$ is
$$E(r_A(n)) \geq \sum_{i=n/4}^{3n/4} C^2 \sqrt{\frac{\log(i)\log(n-i)}{i(n-i)}} \geq \frac{n}{2} \cdot C^2 \log(n/4) \cdot \frac{2}{n} = C^2 \log(n)(1 + o(1)) \geq \frac{C^2}{3} \log(n).$$
Taking $C = 9$, we see that for large $n$,
$$27 \log(n) \leq E(r_A(n)) \leq 243 \log(n).$$
Now for a fixed $n$, the events for each difference $i$ are independent. Now we may use the Chernoff bound to see that
$$P\big(|r_A(n) - E(r_A(n))| \geq 26 \log(n)\big) \leq e^{-3\log(n)} = n^{-3}.$$
So for sufficiently large $n$, the probability that $n$ doesn't satisfy $\log(n) \leq r_A(n) \leq 269 \log(n)$ is at most $n^{-3}$. This has a finite sum over all $n$, so by Borel–Cantelli we have with probability 1 that the number of $n$ which fail to satisfy this is finite.
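The construction in the proof is easy to simulate. Below is a sketch (our own, not from the notes; the constant $C = 3$ and the range $N$ are arbitrary choices, and a single run just gives a feel for the concentration) that samples $A$ with $P(n \in A) = \min(C\sqrt{\log(n)/n}, 1)$ and prints $r_A(n)$ next to $\log(n)$:

```python
# Hypothetical simulation of the Erdos random basis: include n in A with
# probability min(C * sqrt(log(n)/n), 1), then compare r_A(n) with log(n).
import math
import random

C, N = 3.0, 200_000
A = {n for n in range(2, N)
     if random.random() < min(C * math.sqrt(math.log(n) / n), 1.0)}

def r(n):
    # ordered representations n = a1 + a2 with both parts in A
    return sum(1 for a in A if a < n and (n - a) in A)

for n in (1_000, 10_000, 100_000):
    print(f"n = {n}: r_A(n) = {r(n)}, log(n) = {math.log(n):.1f}")
```

Across the tested values of $n$, the counts $r_A(n)$ should stay within a constant factor of $\log(n)$, which is the content of the theorem.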

It's possible, but more complicated, to extend this argument to cover the case of larger $k$ (this was first done by Erdős and Tetali). The key wrinkle that needs to be overcome is that we no longer have independence. For example, if $n = 10$ the representations $5 + 4 + 1$ and $5 + 3 + 2$ are coupled by the presence of 5 in both of them.

5.1. Two Open Problems

Here are two infamous conjectures due to Erdős and Turán:

1. You cannot have a basis for which $r_A(n) = o(\log(n))$.
2. There is no constant $C$ for which there exists a basis satisfying $r_A(n) \leq C$.

The second conjecture is of course weaker than the first.