Notes 6 : First and second moment methods


Math 733-734: Theory of Probability
Lecturer: Sebastien Roch
References: [Roc, Sections 2.1-2.3].

Recall:

THM 6.1 (Markov's inequality) Let X be a non-negative random variable. Then, for all b > 0,

    P[X ≥ b] ≤ EX / b.   (1)

THM 6.2 (Chebyshev's inequality) Let X be a random variable with E[X^2] < +∞. Then, for all β > 0,

    P[|X − EX| > β] ≤ Var[X] / β^2.   (2)

1 First moment method

Recall that the expectation of a random variable has an elementary, yet handy, property: linearity. If random variables X_1, ..., X_n defined on a joint probability space have finite first moments, then

    E[X_1 + ... + X_n] = E[X_1] + ... + E[X_n],   (3)

without any further assumption. In particular, linearity holds whether or not the X_i's are independent.

1.1 The probabilistic method

A key technique of probabilistic combinatorics is the so-called probabilistic method. The idea is that one can establish the existence of an object satisfying a certain property without having to construct one explicitly. Instead one argues that a randomly chosen object exhibits the given property with positive probability. The following obvious observation, sometimes referred to as the first moment principle, plays a key role in this context.
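The recalled bounds are easy to test numerically. The following minimal sketch (an illustration, not part of the notes; the Exponential(1) distribution, sample size, and thresholds are arbitrary choices) compares (1) and (2) with empirical frequencies.

```python
# Empirical sanity check of Markov's inequality (1) and Chebyshev's inequality (2).
# Illustration only: the Exponential(1) distribution and all constants are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=10**6)   # EX = 1, Var[X] = 1
mean, var = X.mean(), X.var()

for b in (2.0, 5.0, 10.0):
    print(f"P[X >= {b:>4}] ~ {(X >= b).mean():.4f}   Markov bound: {mean / b:.4f}")

for beta in (2.0, 5.0):
    tail = (np.abs(X - mean) > beta).mean()
    print(f"P[|X - EX| > {beta}] ~ {tail:.4f}   Chebyshev bound: {var / beta**2:.4f}")
```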

THM 6.3 (First moment principle) Let X be a random variable with finite expectation. Then, for any µ ∈ R,

    EX ≤ µ  ⟹  P[X ≤ µ] > 0.

Proof: We argue by contradiction: assume EX ≤ µ and P[X ≤ µ] = 0. Write

    {X ≤ µ} = ∩_{n ≥ 1} {X < µ + 1/n}.

By monotonicity, P[X < µ + 1/n] → P[X ≤ µ] = 0, so for any ε ∈ (0, 1) we have P[X < µ + 1/n] < ε for n large enough. Hence

    µ ≥ EX = E[X; X < µ + 1/n] + E[X; X ≥ µ + 1/n]
           ≥ µ P[X < µ + 1/n] + (µ + 1/n)(1 − P[X < µ + 1/n])
           = µ + (1/n)(1 − P[X < µ + 1/n])
           > µ,

a contradiction. (The second inequality uses that X > µ almost surely, which follows from P[X ≤ µ] = 0.)

The power of this principle is easier to appreciate with an example.

EX 6.4 (Balancing vectors) Let v_1, ..., v_n be arbitrary unit vectors in R^n. How small can we make the norm of the combination x_1 v_1 + ... + x_n v_n by appropriately choosing x_1, ..., x_n ∈ {−1, +1}? We claim that it can be as small as √n, for any collection of v_i's. At first sight, this may appear to be a complicated geometry problem. But the proof is trivial once one thinks of choosing the x_i's at random. Let X_1, ..., X_n be independent random variables uniformly distributed in {−1, +1}. Then

    E ||X_1 v_1 + ... + X_n v_n||^2 = E[ Σ_{i,j} X_i X_j (v_i · v_j) ]
                                    = Σ_{i,j} E[X_i X_j (v_i · v_j)]   (4)
                                    = Σ_{i,j} (v_i · v_j) E[X_i X_j]
                                    = Σ_i ||v_i||^2 = n,

where we used the linearity of expectation in (4), and then the fact that E[X_i X_j] = 0 for i ≠ j while E[X_i^2] = 1. But note that a discrete random variable Z = ||X_1 v_1 + ... + X_n v_n||^2 with expectation EZ = n must take a value ≤ n with positive probability by the first moment principle (THM 6.3). In other words, there must be a choice of X_i's such that Z ≤ n, that is, ||X_1 v_1 + ... + X_n v_n|| ≤ √n. That proves the claim.
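EX 6.4 lends itself to a quick Monte Carlo check. The sketch below (an illustration, not part of the notes; the dimension, the random choice of unit vectors, and the number of sign assignments are arbitrary) verifies that the average of ||Σ_i X_i v_i||^2 is close to n, and that random search over signs finds a combination of squared norm at most n, i.e., of norm at most √n.

```python
# Monte Carlo illustration of EX 6.4 (balancing vectors).
# Illustration only: n and the number of random sign assignments are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n = 20
# n arbitrary unit vectors in R^n (rows): drawn at random here, then normalized.
V = rng.normal(size=(n, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)

sq_norms = []
for _ in range(5000):
    x = rng.choice([-1.0, 1.0], size=n)          # uniform random signs X_1, ..., X_n
    sq_norms.append(np.linalg.norm(x @ V) ** 2)  # ||X_1 v_1 + ... + X_n v_n||^2

print("average of ||sum_i X_i v_i||^2 :", round(float(np.mean(sq_norms)), 2), " (n =", n, ")")
print("smallest value found           :", round(float(np.min(sq_norms)), 2),
      " -- THM 6.3 guarantees some sign choice achieves <= n")
```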

1.2 Union bound

Markov's inequality (THM 6.1) can be interpreted as a quantitative version of the first moment principle (THM 6.3). In this context, it is often stated in the following special form.

THM 6.5 (First moment method) If X is a non-negative, integer-valued random variable, then

    P[X > 0] ≤ EX.   (5)

Proof: Take b = 1 in Markov's inequality (THM 6.1).

In words, THM 6.3 implies that, if a non-negative integer-valued random variable X has expectation smaller than 1, then its value is 0 with positive probability. THM 6.5 adds: if X has small expectation, then its value is 0 with large probability. This simple fact is typically used in the following manner: one wants to show that a certain bad event does not occur with probability approaching 1; the random variable X then counts the number of such bad events. See the examples below. In that case, X is a sum of indicators and THM 6.5 reduces to the standard union bound, also known as Boole's inequality.

COR 6.6 Let B_m = A_1 ∪ ... ∪ A_m, where A_1, ..., A_m is a collection of events. Then, letting

    µ_m := Σ_i P[A_i],

we have P[B_m] ≤ µ_m. In particular, if µ_m → 0 then P[B_m] → 0.

Proof: This is of course a fundamental property of probability measures. (Or take X = Σ_i 1_{A_i} in THM 6.5.)

Applications of THM 6.3 and THM 6.5 in the probabilistic method are referred to as the first moment method. We give another example in the next section.

1.3 Random permutations: longest increasing subsequence

In this section, we bound the expected length of a longest increasing subsequence in a random permutation. Let σ_n be a uniformly random permutation of [n] := {1, ..., n} and let L_n be the length of a longest increasing subsequence of σ_n.

CLAIM 6.7  E L_n = Θ(√n).
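Before turning to the proof, CLAIM 6.7 is easy to probe by simulation. The sketch below (an illustration, not part of the notes; the values of n and the number of trials are arbitrary) computes L_n by patience sorting for uniform random permutations; the observed ratio E L_n / √n indeed falls between the lower bound 1 and the upper bound e established in the proof.

```python
# Simulation of the longest increasing subsequence length L_n of a uniform
# random permutation, using patience sorting (O(n log n) per permutation).
# Illustration only: the values of n and the number of trials are arbitrary.
import bisect
import numpy as np

def lis_length(perm):
    """Length of a longest increasing subsequence of the sequence `perm`."""
    # piles[k] = smallest possible last value of an increasing subsequence of length k + 1
    piles = []
    for x in perm:
        k = bisect.bisect_left(piles, x)
        if k == len(piles):
            piles.append(x)
        else:
            piles[k] = x
    return len(piles)

rng = np.random.default_rng(2)
for n in (100, 400, 1600):
    trials = [lis_length(rng.permutation(n)) for _ in range(200)]
    avg = float(np.mean(trials))
    print(f"n = {n:5d}   E[L_n] ~ {avg:7.2f}   sqrt(n) = {np.sqrt(n):7.2f}   ratio = {avg / np.sqrt(n):.2f}")
```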

Proof: We first prove that

    lim sup_n E L_n / √n ≤ e,

which implies half of the claim. Bounding the expectation of L_n is not straightforward, as it is the expectation of a maximum. A natural way to proceed is to find a value l for which P[L_n ≥ l] is small. More formally, we bound the expectation as follows:

    E L_n ≤ l P[L_n < l] + n P[L_n ≥ l] ≤ l + n P[L_n ≥ l],   (6)

for an l chosen below. To bound the probability on the r.h.s., we appeal to the first moment method by letting X_n be the number of increasing subsequences of length l. We also use the indicator trick, i.e., we think of X_n as a sum of indicators over subsequences (not necessarily increasing) of length l. There are (n choose l) such subsequences, each of which is increasing with probability 1/l!. Note that these subsequences are not independent. Nevertheless, by the linearity of expectation and the first moment method,

    P[L_n ≥ l] = P[X_n > 0] ≤ E X_n = (1/l!) (n choose l) ≤ n^l / (l!)^2 ≤ n^l / [l/e]^{2l} = (e√n / l)^{2l},

where we used the standard bound l! ≥ (l/e)^l on factorials. Note that, in order for this bound to go to 0, we need l > e√n. The first claim follows by taking l = (1 + δ) e√n in (6), for an arbitrarily small δ > 0.

For the other half of the claim, we show that E L_n ≥ √n. This part does not rely on the first moment method (and may be skipped). We seek a lower bound on the expected length of a longest increasing subsequence. The proof uses the following two ideas. First, observe that there is a natural symmetry between the lengths of the longest increasing and decreasing subsequences: they are identically distributed. Moreover, if a permutation has a short longest increasing subsequence, then intuitively it must have a long decreasing subsequence, and vice versa. Combining these two observations gives a lower bound on the expectation of L_n. Formally, let D_n be the length of a longest decreasing subsequence. By symmetry and the arithmetic mean-geometric mean inequality, note that

    E L_n = E[(L_n + D_n)/2] ≥ E[√(L_n D_n)].

We show that L_n D_n ≥ n, which proves the claim (since then E L_n ≥ E[√(L_n D_n)] ≥ √n). We use a clever combinatorial argument. Let L_n^(k) be the length of a longest increasing subsequence ending at position k, and similarly for D_n^(k). It suffices to show that the pairs (L_n^(k), D_n^(k)), 1 ≤ k ≤ n, are distinct. Indeed, noting that L_n^(k) ≤ L_n and D_n^(k) ≤ D_n, each such pair lies in [L_n] × [D_n], a set of size L_n D_n; if the n pairs are distinct, then L_n D_n must be at least n. Let 1 ≤ j < k ≤ n. If σ_n(k) > σ_n(j), then L_n^(k) > L_n^(j), as one sees by appending σ_n(k) to the subsequence ending at position j achieving L_n^(j). The opposite holds for the decreasing case, which implies that (L_n^(j), D_n^(j)) and (L_n^(k), D_n^(k)) must be distinct. This combinatorial argument is known as the Erdös-Szekeres theorem. That concludes the proof of the second claim.

2 Second moment method

The first moment method gives an upper bound on the probability that a non-negative, integer-valued random variable is positive, provided its expectation is small enough. In this section we seek a lower bound on that probability. We first note that a large expectation does not suffice in general. Say X_n is n^2 with probability 1/n, and 0 otherwise. Then E X_n = n → +∞, yet P[X_n > 0] → 0. That is, although the expectation diverges, the probability that X_n is positive can be arbitrarily small. So we turn to the second moment. Intuitively, the basis for the so-called second moment method is that, if the expectation of X_n is large and its variance is relatively small, then we can bound the probability that X_n is close to 0. As we will see in applications, the first and second moment methods often work hand-in-hand.

2.1 Paley-Zygmund inequality

As an immediate corollary of Chebyshev's inequality (THM 6.2), we get a first version of the so-called second moment method: if the standard deviation of X is less than its expectation, then the probability that X is 0 is bounded away from 1. Formally, let X be a non-negative, integer-valued random variable (not identically zero). Then

    P[X > 0] ≥ 1 − Var[X] / (EX)^2.   (7)

Indeed, by (2),

    P[X = 0] ≤ P[|X − EX| ≥ EX] ≤ Var[X] / (EX)^2.
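For concreteness, the bound (7) can be evaluated on the counterexample above. The following few lines (an illustration, not part of the notes; the values of n are arbitrary) confirm that (7) is vacuous there, consistent with P[X_n > 0] → 0 despite the diverging mean.

```python
# The example X_n = n^2 with probability 1/n (and 0 otherwise): the expectation
# diverges, yet P[X_n > 0] = 1/n -> 0, and bound (7) is vacuous because the
# variance dwarfs (EX_n)^2. Illustration only; the values of n are arbitrary.
for n in (10, 100, 1000):
    p_pos = 1 / n                 # P[X_n > 0]
    EX = n ** 2 * (1 / n)         # = n
    EX2 = n ** 4 * (1 / n)        # = n^3
    var = EX2 - EX ** 2           # = n^3 - n^2
    bound7 = 1 - var / EX ** 2    # bound (7): 1 - Var[X]/(EX)^2
    print(f"n={n:5d}  EX={EX:8.0f}  P[X>0]={p_pos:.4f}  (7) gives {bound7:8.1f}  (vacuous when negative)")
```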

The following tail inequality, a simple application of Cauchy-Schwarz, leads to an improved version of the second moment method.

THM 6.8 (Paley-Zygmund inequality) Let X be a non-negative random variable. For all 0 < θ < 1,

    P[X ≥ θ EX] ≥ (1 − θ)^2 (EX)^2 / E[X^2].   (8)

Proof: We have

    EX = E[X 1_{X < θEX}] + E[X 1_{X ≥ θEX}] ≤ θ EX + √(E[X^2]) √(P[X ≥ θEX]),

where we used Cauchy-Schwarz on the second term. Rearranging gives the result.

As an immediate application:

THM 6.9 (Second moment method) Let X be a non-negative random variable (not identically zero). Then

    P[X > 0] ≥ (EX)^2 / E[X^2].   (9)

Proof: Take θ ↓ 0 in (8).

Since

    (EX)^2 / E[X^2] = 1 − Var[X] / ((EX)^2 + Var[X]) ≥ 1 − Var[X] / (EX)^2,

we see that (9) is stronger than (7). We typically apply the second moment method to a sequence of random variables (X_n). The previous theorem gives a uniform lower bound on the probability of {X_n > 0} when E[X_n^2] ≤ C (E[X_n])^2 for some C > 0. Just like the first moment method, the second moment method is often applied to a sum of indicators.

COR 6.10 Let B_m = A_1 ∪ ... ∪ A_m, where A_1, ..., A_m is a collection of events. Write i ∼ j if i ≠ j and A_i and A_j are not independent. Then, letting

    µ_m := Σ_i P[A_i],   γ_m := Σ_{i ∼ j} P[A_i ∩ A_j],

where the second sum is over ordered pairs, we have lim inf_m P[B_m] > 0 whenever µ_m → +∞ and γ_m ≤ C µ_m^2 for some C > 0. If moreover γ_m = o(µ_m^2), then lim_m P[B_m] = 1.
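Before turning to the proof of COR 6.10, here is a quick numerical check of (8) and (9) (an illustration, not part of the notes): it compares both bounds with exact probabilities for a Binomial variable; the Binomial(20, 0.1) choice and the values of θ are arbitrary.

```python
# Numerical check of the Paley-Zygmund inequality (8) and the second moment
# method (9) against exact probabilities for X ~ Binomial(20, 0.1).
# Illustration only: the Binomial parameters and the values of theta are arbitrary.
from math import comb

n, p = 20, 0.1
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
EX = n * p
EX2 = sum(k * k * pmf[k] for k in range(n + 1))

def tail(t):
    """Exact P[X >= t]."""
    return sum(pmf[k] for k in range(n + 1) if k >= t)

print(f"P[X > 0] = {tail(1):.4f}   (9) gives {EX**2 / EX2:.4f}")
for theta in (0.25, 0.5, 0.75):
    pz = (1 - theta)**2 * EX**2 / EX2
    print(f"P[X >= {theta} * EX] = {tail(theta * EX):.4f}   (8) gives {pz:.4f}")
```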

Proof: Take X := Σ_i 1_{A_i} in the second moment method (THM 6.9). Note that

    Var[X] = Σ_i Var[1_{A_i}] + Σ_{i ≠ j} Cov[1_{A_i}, 1_{A_j}],

where

    Var[1_{A_i}] = E[(1_{A_i})^2] − (E[1_{A_i}])^2 ≤ P[A_i],

and, if A_i and A_j are independent,

    Cov[1_{A_i}, 1_{A_j}] = 0,

whereas, if i ∼ j,

    Cov[1_{A_i}, 1_{A_j}] = E[1_{A_i} 1_{A_j}] − E[1_{A_i}] E[1_{A_j}] ≤ P[A_i ∩ A_j].

Hence

    Var[X] / (EX)^2 ≤ (µ_m + γ_m) / µ_m^2 = 1/µ_m + γ_m/µ_m^2.

Noting

    (EX)^2 / E[X^2] = (EX)^2 / ((EX)^2 + Var[X]) = 1 / (1 + Var[X]/(EX)^2),

and applying THM 6.9 gives the result.

We give an application of the second moment method in the next section.

2.2 Erdös-Rényi random graph: small subgraphs

We start with some definitions.

Definitions. An undirected graph (or graph for short) is a pair G = (V, E) where V is the set of vertices (or nodes or sites) and

    E ⊆ {{u, v} : u, v ∈ V}

is the set of edges (or bonds). A subgraph of G = (V, E) is a graph G' = (V', E') with V' ⊆ V and E' ⊆ E. A subgraph containing all possible non-loop edges between its vertices is called a complete subgraph or clique.

We consider here random graphs. The Erdös-Rényi random graph is defined as follows.

DEF 6.11 (Erdös-Rényi graphs) Let V = [n] and p ∈ [0, 1]. The Erdös-Rényi graph G = (V, E) on n vertices with density p is defined as follows: for each pair x ≠ y in V, the edge {x, y} is in E with probability p, independently of all other edges. We write G ∼ G_{n,p} and we denote the corresponding measure by P_{n,p}.
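To make DEF 6.11 concrete, the following sketch (an illustration, not part of the notes; n, the values of p, and the number of samples are arbitrary) samples G_{n,p} and counts its 4-cliques by brute force; it can be used to probe the threshold phenomenon discussed next.

```python
# Sampling an Erdos-Renyi graph G_{n,p} and counting its 4-cliques by brute force.
# Illustration only: n and p are arbitrary, and the O(n^4) count is fine for small n.
from itertools import combinations
import random

def sample_gnp(n, p, rng):
    """Adjacency sets of G ~ G_{n,p}: each pair {x, y} is an edge with prob. p, independently."""
    adj = {v: set() for v in range(n)}
    for x, y in combinations(range(n), 2):
        if rng.random() < p:
            adj[x].add(y)
            adj[y].add(x)
    return adj

def count_4cliques(adj):
    """Number of 4-subsets of vertices inducing a complete subgraph."""
    return sum(
        1
        for s in combinations(adj, 4)
        if all(v in adj[u] for u, v in combinations(s, 2))
    )

rng = random.Random(3)
n = 30
# Probe densities at multiples of the scale n^{-2/3} (see CLAIM 6.12 below).
for p in (n ** (-2 / 3), 2 * n ** (-2 / 3), 4 * n ** (-2 / 3)):
    counts = [count_4cliques(sample_gnp(n, p, rng)) for _ in range(20)]
    print(f"n={n}, p={p:.3f}: average number of 4-cliques ~ {sum(counts) / len(counts):.2f}")
```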

Threshold phenomena are common in random graphs. Formally, a threshold function for a graph property P is a function r(n) such that

    lim_n P_{n,p_n}[G_n has property P] = 0, if p_n ≪ r(n),
    lim_n P_{n,p_n}[G_n has property P] = 1, if p_n ≫ r(n),

where, under P_{n,p_n}, G_n ∼ G_{n,p_n} is an Erdös-Rényi graph with n vertices and density p_n. In this section, we first illustrate this definition on the clique number.

Cliques. Let ω(G) be the clique number of a graph G, i.e., the size of its largest clique.

CLAIM 6.12 The property ω(G) ≥ 4 has threshold function n^{−2/3}.

Proof: Let X_n be the number of 4-cliques in the Erdös-Rényi graph G_n ∼ G_{n,p_n}. Then, noting that there are (4 choose 2) = 6 edges in a 4-clique,

    E_{n,p_n}[X_n] = (n choose 4) p_n^6 = Θ(n^4 p_n^6),

which goes to 0 when p_n ≪ n^{−2/3}. Hence the first moment method (THM 6.5) gives one direction.

For the other direction, we apply the second moment method for sums of indicators, COR 6.10. For an enumeration S_1, ..., S_m of the 4-tuples of vertices in G_n, let A_1, ..., A_m be the events that the corresponding 4-cliques are present. By the calculation above we have µ_m = Θ(n^4 p_n^6), which goes to +∞ when p_n ≫ n^{−2/3}. Also µ_m^2 = Θ(n^8 p_n^{12}), so it suffices to show that γ_m = o(n^8 p_n^{12}). Note that two 4-cliques with disjoint edge sets (but possibly sharing one vertex) are independent. Suppose S_i and S_j share 3 vertices. Then

    P_{n,p_n}[A_i | A_j] = p_n^3,

as the event A_j implies that all edges between three of the vertices in S_i are present, and there are 3 edges between the remaining vertex and the rest of S_i.

Similarly, if |S_i ∩ S_j| = 2, then

    P_{n,p_n}[A_i | A_j] = p_n^5.

Putting these together, we get

    γ_m = Σ_{i ∼ j} P_{n,p_n}[A_j] P_{n,p_n}[A_i | A_j]
        ≤ (n choose 4) p_n^6 [ (4 choose 3)(n − 4) p_n^3 + (4 choose 2)(n − 4 choose 2) p_n^5 ]
        = O(n^5 p_n^9) + O(n^6 p_n^{11})
        = O(n^8 p_n^{12} / (n^3 p_n^3)) + O(n^8 p_n^{12} / (n^2 p_n))
        = o(n^8 p_n^{12}) = o(µ_m^2),

where we used that p_n ≫ n^{−2/3} (so that, for example, n^3 p_n^3 ≫ 1). COR 6.10 gives the result.

Roughly speaking, the first and second moments suffice to pinpoint the threshold in this case because the indicators in X_n are mostly pairwise independent and, as a result, the sum is concentrated around its mean.

References

[Roc] Sebastien Roch. Modern Discrete Probability: An Essential Toolkit. Book in preparation. http://www.math.wisc.edu/~roch/mdp/.