Lecture 3: Tail bounds, Probabilistic Method

Today we will see what is known as the probabilistic method for showing the existence of combinatorial objects. We also review basic concentration inequalities. As an example illustrating all of these, we will give a randomized construction of an Ω(log n) integrality gap instance for the Set Cover problem.

Disclaimer: These lecture notes are informal in nature and are not thoroughly proofread. In case you find a serious error, please send email to the instructor pointing it out.

The Probabilistic Method

A simple puzzle in graph theory is the following: in any graph on 6 vertices, show that there is either a clique or an independent set of size 3. (A clique is a set of vertices in which every pair of vertices has an edge, and an independent set is a set in which no two vertices have an edge.)

This is a simple example of the kind of question studied in the area of Ramsey theory. One of the basic results there says that for any integer k, there is an N(k) such that in any graph on N(k) vertices, there is either a clique or an independent set of size k. The first proof also gave an upper bound of 2^{2k} on N(k). The natural question is whether one can obtain a much better bound (one that is not exponential in k, for example). Can we give a lower bound on N(k) that is exponential in k?

Giving a lower bound on N(k) simply means constructing a graph on a large number of vertices and proving that it has neither a clique nor an independent set of size k. This question remained open until Erdős came up with a remarkable way of resolving it: he proved that a random graph on roughly 2^{k/2} vertices has neither a clique nor an independent set of size k with non-zero probability. Thus it follows that there exist graphs on roughly 2^{k/2} vertices without a clique or an independent set of size k, implying N(k) ≥ 2^{k/2}. Note that the method is non-constructive! We cannot, in the end, come up with an explicit family of such graphs for every k.

The proof is actually rather simple. Consider the following random process:

1. Start with n vertices (n will be specified below).

2. For every pair i, j, add the edge {i, j} independently with probability 1/2.

The result of this process is a graph G.
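For small k, the construction above can also be checked empirically by brute force. The following is a minimal sketch (not part of the original notes; the function names and the choices of k, n, and the number of trials are mine):

    # Illustrative sketch only: sample the random graph and brute-force all k-subsets.
    import itertools
    import random

    def random_graph(n, p=0.5):
        """Edge set of G(n, p): each pair {i, j} is included independently with probability p."""
        return {(i, j) for i, j in itertools.combinations(range(n), 2) if random.random() < p}

    def has_clique_or_independent_set(edges, n, k):
        """Does some k-subset of vertices induce a clique or an independent set?"""
        for S in itertools.combinations(range(n), k):
            inside = sum(1 for pair in itertools.combinations(S, 2) if pair in edges)
            if inside == k * (k - 1) // 2 or inside == 0:   # all pairs present, or none
                return True
        return False

    k = 10
    n = 2 ** (k // 2 - 1)          # n = 2^{k/2 - 1} = 16, as in Theorem 1 below
    trials = 20
    bad = sum(has_clique_or_independent_set(random_graph(n), n, k) for _ in range(trials))
    print(f"{bad} of {trials} sampled graphs had a clique or independent set of size {k}")

With these parameters the count is almost always 0, which matches the union-bound calculation in the proof below.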

Theorem 1. Let k > 2 be any integer, and suppose n ≤ 2^{k/2 − 1}. Then with high probability, the graph G produced above has neither a clique nor an independent set of size k.

Proof. The proof is rather simple. Fix a set S of k vertices. What is the probability that they form either a clique or an independent set? It is precisely

    2 · 2^{−\binom{k}{2}},

because there are 2^{\binom{k}{2}} possibilities for the edges between the k vertices, all equally likely, and precisely two of them correspond to a clique and an independent set. Now we can take a union bound over all sets S of size k, and conclude that the probability that at least one of them is either a clique or an independent set is at most

    \binom{n}{k} · 2 · 2^{−\binom{k}{2}} < 2^{k \log_2 n + 1 − \binom{k}{2}}

(we used \binom{n}{k} < n^k). For this quantity to be < 1, we must have

    k \log_2 n + 1 < \binom{k}{2} = k(k − 1)/2.

From our choice of n, we have \log_2 n = k/2 − 1, and it is easy to check that the above holds for any k > 2. This completes the proof, and thus shows that N(k) > 2^{k/2 − 1}.

The probabilistic method. The general technique of proving the existence of combinatorial objects with certain properties, by proving that a random object has these properties with non-zero probability, came to be called the probabilistic method. It is used in a variety of applications, from dimension reduction and sketching, to low-congestion routings in communication networks, to Ramsey theory.

One interesting thing to note is that the computation above (with a slightly smaller n) shows that, in fact, a random graph has neither a clique nor an independent set of size k with extremely high probability. But even so, constructing an explicit graph with this property for every k is rather tricky! This is a fairly general phenomenon with probabilistic constructions: we know that random objects possess certain properties with very high probability, but we cannot pin down an explicit object with the property. The phrase "finding hay in a haystack" was coined to describe this seemingly strange situation.

Concentration Inequalities

A concentration inequality, or a tail bound, for a random variable is an inequality which says that the probability that the random variable deviates too much from its expectation is small. For a well-written introduction, please refer to the survey of Chung and Lu: https://projecteuclid.org/euclid.im/1175266369

Pay particular attention to the simplest case, namely the sum of independent Bernoulli trials: let X_1, ..., X_n be independent coin tosses, i.e., random variables that take value 0 with probability 1/2 and value 1 with probability 1/2, and consider the sum X = \sum_{i=1}^n X_i. Recall also the formal definition of independence of a collection of random variables: 0/1 random variables X_1, ..., X_n are said to be independent if for any subset of the variables X_{i_1}, X_{i_2}, ..., X_{i_k} and any assignment (a_1, ..., a_k) ∈ {0, 1}^k, we have

    Pr[X_{i_1} = a_1 ∧ X_{i_2} = a_2 ∧ ... ∧ X_{i_k} = a_k] = \prod_{j=1}^{k} Pr[X_{i_j} = a_j].
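As a quick illustration (not from the notes; the parameters n, the trial count, and the thresholds are arbitrary choices), the following sketch samples such a sum X many times and tabulates how often it strays far from its mean n/2:

    # Illustrative sketch: X is a sum of n fair 0/1 coins; deviations from n/2 much
    # larger than sqrt(n) are rare. (Parameters below are arbitrary.)
    import random

    n, trials = 2500, 1000
    devs = [sum(random.randint(0, 1) for _ in range(n)) - n / 2 for _ in range(trials)]

    print("average squared deviation:", sum(d * d for d in devs) / trials)   # close to n/4
    for C in (1, 2, 3, 4):
        frac = sum(abs(d) > C * (n ** 0.5) / 2 for d in devs) / trials
        print(f"fraction of trials with |X - n/2| > {C} * sqrt(n)/2: {frac}")

The average squared deviation comes out close to n/4, which is exactly the variance computed next, and the fraction of large deviations falls off very quickly with C, as the Chernoff bound predicts.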

The random variable Y = X − E[X] has a simple interpretation (after scaling by a factor of 2) as the position at time n of the natural random walk on the line: we start at the origin, and at every time step, we toss a fair coin and move left by one unit if it comes up heads, or right by one unit if tails. To get a sense of how large we expect the magnitude of Y to be, a natural measure is the variance, i.e., E[Y^2]. Let us evaluate this quantity. To do so, first note that Y = Y_1 + ... + Y_n, where Y_i = X_i − E[X_i] = X_i − 1/2. We also recall linearity of expectation, a fundamental tool in analyzing probabilistic processes: for any collection of random variables Z_1, Z_2, ..., Z_n (they need not be independent!), we have

    E[\sum_i Z_i] = \sum_i E[Z_i].

We can use this to conclude that

    E[Y^2] = E[(\sum_i Y_i)^2] = E[\sum_i Y_i^2 + \sum_{i ≠ j} Y_i Y_j] = \sum_i E[Y_i^2] + \sum_{i ≠ j} E[Y_i Y_j].

Now for any i ≠ j, it is easy to see that E[Y_i Y_j] = 0 (convince yourself of this if you don't see it immediately). Further, Y_i^2 equals 1/4 with probability 1, so E[Y_i^2] = 1/4. These observations imply that E[Y^2] = n/4.

So, typically, we expect the magnitude of Y to be about \sqrt{n}. What is the probability that it is much larger, say more than C\sqrt{n} for some large constant C? This is precisely the question answered by Chernoff bounds; it turns out to be < e^{−C^2}. This also hints at the general format of most tail inequalities, which is the following: let X be a random variable that is an aggregate of several independent random variables, without depending too strongly on any one of them. Then the probability that X is more than C standard deviations away from its mean is exponentially small in C. (Recall that the standard deviation is simply the square root of the variance; it is the typical magnitude of Y above.)

The first 10 or so pages of the Chung-Lu survey give several examples of such theorems. Apart from this, a great reference is Terry Tao's blog post: https://terrytao.wordpress.com/2010/01/03/254a-notes-1-concentration-of-measure/

Set Cover: An Integrality Gap

Recall the set cover problem: we have topics T_1, ..., T_n and people P_1, ..., P_m, and each person is an expert on a subset of the topics. The goal is to pick the smallest number of people among whom there is an expert on every topic T_i. The ILP we considered last time is the following. For every person i, we have a 0/1 variable x_i that indicates whether the person is picked. We then have the constraint, for each topic T_j,

    \sum_{i : P_i is an expert on T_j} x_i ≥ 1.

(As we did last time, we will write i ∈ T_j to denote that i is an expert on T_j.) The goal is to minimize \sum_i x_i.
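On a tiny instance, the ILP can simply be brute-forced. Here is a sketch (not from the notes; the instance and names such as min_cover_size are made up for illustration) that finds the smallest group of people covering all topics:

    # Illustrative sketch: brute-force optimum of the set cover ILP on a tiny instance.
    from itertools import combinations

    def min_cover_size(experts, topics):
        """Smallest set of people whose combined expertise covers all topics."""
        people = list(experts)
        for k in range(1, len(people) + 1):
            for group in combinations(people, k):
                covered = set().union(*(experts[p] for p in group))
                if covered >= topics:          # superset check: every topic covered
                    return k, group
        return None

    # Hypothetical instance: 3 topics, 4 people.
    experts = {"P1": {"T1"}, "P2": {"T2", "T3"}, "P3": {"T1", "T3"}, "P4": {"T2"}}
    print(min_cover_size(experts, {"T1", "T2", "T3"}))   # e.g. (2, ('P1', 'P2'))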

The LP relaxation replaces x_i ∈ {0, 1} with 0 ≤ x_i ≤ 1. We now describe an instance in which the value of the objective (\sum_i x_i) is O(1) for the LP, but is at least Ω(log n) for the ILP. The instance is very simple: take the number of people m to be n as well, and for each topic T_j and person i, make i an expert on T_j with probability 1/2, independently of all other pairs i, j. What we show is the following:

Theorem 2. Let I be the instance of set cover generated by the above probabilistic process. With probability at least 3/4, it satisfies the following two properties:

1. The optimal solution to the ILP is ≥ (1/4) log_2 n.

2. There exists a (fractional) solution to the LP with objective value 4.

Let us now prove the theorem. We show that each of the two conditions holds with probability ≥ 7/8; the theorem then follows by a union bound over the two failure events.

The first condition. Intuitively, the reason this condition holds is the following. Suppose we choose any set of k people. For a topic j, the probability that none of the chosen people is an expert on T_j is precisely 2^{−k}. Thus the expected number of topics that are not covered by any of the chosen people is n · 2^{−k} (this is exactly like tossing n coins, each having probability 2^{−k} of falling heads, and asking for the expected number of heads). As long as this number is > 1, we expect there to be an uncovered topic, so we should need k ≥ log_2 n in order to have a valid cover.

The reasoning above is heuristic for two reasons: first, it was only in expectation, and second, it was for a fixed set of people. We now make the argument formal. Fix some k-subset of people S. The probability that S fails to cover a particular topic T_j is 2^{−k}, as we have seen. These events for different j are independent, because of the way we picked the instance. Hence the probability that S covers all n topics is equal to

    (1 − 2^{−k})^n,

which is ≤ e^{−n/2^k}. Now, what is the probability that there exists a k-element subset of the people that covers all the topics? This is tricky to compute exactly, but we can upper bound it by \binom{n}{k} times the probability that any given S covers all the topics (union bound). From the calculation above, this is at most

    \binom{n}{k} e^{−n/2^k}.

To make this < 1/8, it suffices to have 8 \binom{n}{k} < e^{n/2^k}. We claim that this holds whenever k ≤ (1/4) log_2 n (for large enough n). The LHS is upper bounded by 8 n^k, which is 8 e^{k \ln n}, or O(e^{\log^2 n}). The RHS is at least e^{n^{3/4}}, since n/2^k ≥ n/n^{1/4} = n^{3/4}. Since n^{3/4} ≫ \log^2 n (for n large), the claim follows. Finally, note that if some set of fewer than (1/4) log_2 n people covered all the topics, then so would some set of size exactly (1/4) log_2 n, so it suffices to consider this single value of k. Thus the probability that there exists a set of at most (1/4) log_2 n people that covers all the topics is < 1/8.
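To get a feel for how strong the bound \binom{n}{k} e^{−n/2^k} is at k = (1/4) log_2 n, here is a small numerical check (not part of the notes; the values of n are arbitrary), done in log space to avoid floating-point underflow:

    # Illustrative check: ln( binom(n, k) * exp(-n / 2^k) ) for k = (1/4) log2 n.
    import math

    for n in (2 ** 12, 2 ** 16, 2 ** 20):
        k = int(0.25 * math.log2(n))
        log_bound = math.log(math.comb(n, k)) - n / 2 ** k
        print(f"n = 2^{int(math.log2(n))}, k = {k}, ln(bound) = {log_bound:.1f} "
              f"(need < ln(1/8) = {math.log(1 / 8):.2f})")

The bound is not merely below 1/8 but astronomically small, consistent with the fact that n^{3/4} dominates \log^2 n.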

The second condition. Consider the solution x_i = 4/n for all i; since m = n, its objective value is n · (4/n) = 4. Now consider the constraint corresponding to some topic T_j:

    \sum_{i ∈ T_j} x_i ≥ 1.    (1)

To show that (1) holds, we only need to prove that for every j, the size of the set {i : i ∈ T_j} is at least n/4. From our construction, the expected size of this set is n/2. What is the probability that it is < n/4? We can use Chernoff bounds to show that this probability is at most e^{−Ω(n)}. For n large enough, this quantity is < 1/(8n), so by a union bound over the n topics, the probability that the constraint fails for some j is < 1/8. Thus the probability that none of the constraints fail is at least 7/8, which is what we wanted to prove.

This shows that the LP relaxation for the Set Cover problem has an integrality gap of Ω(log n).
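Finally, the whole construction can be simulated. The sketch below (illustration only; the choice m = n = 512 and the other parameters are arbitrary, and the greedy cover only gives an upper bound on the ILP optimum) generates the random instance, checks that x_i = 4/n is LP-feasible, and reports the size of a greedy integral cover:

    # Illustrative simulation of the integrality gap instance (m = n people, n topics).
    import math
    import random

    n = 512
    # experts[j] = set of people who are experts on topic T_j (each pair with probability 1/2).
    experts = [{i for i in range(n) if random.random() < 0.5} for _ in range(n)]

    # Condition 2: x_i = 4/n (objective value 4) is feasible iff every topic has >= n/4 experts.
    min_experts = min(len(s) for s in experts)
    print("x_i = 4/n feasible:", min_experts >= n / 4, " smallest expert set:", min_experts)

    # Condition 1 says the ILP optimum is at least (1/4) log2 n; the greedy cover below is
    # only an upper bound on the optimum, but it comes out around log2 n, consistent with that.
    uncovered, greedy = set(range(n)), 0
    while uncovered:
        best = max(range(n), key=lambda i: sum(i in experts[j] for j in uncovered))
        newly = {j for j in uncovered if best in experts[j]}
        if not newly:                # essentially impossible for this n, but stay safe
            break
        uncovered -= newly
        greedy += 1
    print("greedy cover size:", greedy, " (1/4) log2 n =", 0.25 * math.log2(n))

In typical runs, every topic is known by roughly n/2 people (so the fractional solution is feasible), while the greedy cover uses about log_2 n ≈ 9 people, comfortably above (1/4) log_2 n.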