Tail Inequalities

William Hunt
Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV
William.Hunt@mail.wvu.edu

1 Introduction

In this chapter, we are interested in obtaining significantly stronger and sharper ways of making the statement "with high probability." Using Markov's inequality alone just doesn't give us a strong enough statement. Chernoff bounds are another kind of tail bound: they bound the total amount of probability mass of a random variable Y that lies far from its mean. The Chernoff bound applies to the class of random variables that are a sum of independent indicator random variables. Occupancy problems for balls in bins involve exactly such random variables, so Chernoff bounds apply to them.

2 Poisson Trials

The Chernoff bound works for random variables that are a sum of independent indicator variables (Bernoulli trials); the trials need not share the same success probability. Let X_1, X_2, ..., X_n be independent Bernoulli trials such that

    Pr[X_i = 1] = p_i   and   Pr[X_i = 0] = 1 - p_i.

Sums of independent Bernoulli trials are called Poisson trials; thus X = X_1 + X_2 + ... + X_n is a Poisson trial. All our bounds apply to the special case in which the X_i are Bernoulli trials with identical probabilities, so that X has the binomial distribution.

For any δ > 0, we want to answer the following questions about how far X deviates from its expected value µ = E[X], so that we can find a bound on the tail probability.

1. What is the probability that X exceeds (1 + δ)µ? That is, what is Pr[X > (1 + δ)µ]?

2. How large should δ be so that this probability is less than ε? That is, when is Pr[X > (1 + δ)µ] < ε?
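As a quick illustration (not part of the original notes), the following Python sketch estimates the first tail probability by simulation; the values n = 100, p_i = 0.3 for every i, and δ = 0.5 are arbitrary choices made only for this example.

    import numpy as np

    # Arbitrary illustrative parameters: n identical trials with success probability p.
    n, p, delta = 100, 0.3, 0.5
    experiments = 100_000

    rng = np.random.default_rng(0)
    # Each row is one experiment: n independent Bernoulli(p) indicator variables X_1, ..., X_n.
    successes = rng.random((experiments, n)) < p
    X = successes.sum(axis=1)          # X = X_1 + ... + X_n in each experiment
    mu = n * p                         # E[X]

    # Empirical estimate of Pr[X > (1 + delta) * mu]
    print((X > (1 + delta) * mu).mean())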

3 Chernoff Bound

To answer the type of question useful in analyzing an algorithm, we can use the Chernoff bound technique, which seeks a bound on the tail probabilities.

Definition: 3.1 For a random variable X, E[e^{tX}] is the moment generating function of X.

We are going to focus on sums of independent random variables; in particular, we are going to look at bounds on the sum of independent Poisson trials. The idea of the Chernoff bound technique is to take the moment generating function of X and apply Markov's inequality to it.

Theorem: 3.1 Let X_1, X_2, ..., X_n be independent Poisson trials such that, for 1 ≤ i ≤ n, Pr[X_i = 1] = p_i with 0 < p_i < 1. Then, for X = Σ_{i=1}^{n} X_i, µ = E[X] = Σ_{i=1}^{n} p_i, and any δ > 0,

    Pr[X > (1 + δ)µ] < [ e^δ / (1 + δ)^{1+δ} ]^µ.

Proof: For any positive real number t,

    Pr[X > (1 + δ)µ] = Pr[e^{tX} > e^{t(1+δ)µ}].

Applying Markov's inequality to the right-hand side, we get

    Pr[e^{tX} > e^{t(1+δ)µ}] < E[e^{tX}] / e^{t(1+δ)µ}.

The numerator is easy to calculate: the sum in the exponent becomes a product of exponentials, and since the X_i are independent, the expectation of the product is the product of the expectations. So the numerator is

    E[e^{tX}] = E[e^{t Σ_{i=1}^{n} X_i}] = E[ Π_{i=1}^{n} e^{t X_i} ] = Π_{i=1}^{n} E[e^{t X_i}].

The random variable e^{t X_i} takes the value e^t with probability p_i and the value 1 with probability 1 - p_i. Therefore its expected value is p_i e^t + (1 - p_i) · 1, making the numerator

    Π_{i=1}^{n} (p_i e^t + 1 - p_i) = Π_{i=1}^{n} (1 + p_i (e^t - 1)).

Using the inequality 1 + x < e^x with x = p_i (e^t - 1), the right-hand side becomes

    Pr[e^{tX} > e^{t(1+δ)µ}] < Π_{i=1}^{n} e^{p_i (e^t - 1)} / e^{t(1+δ)µ}
                             = e^{Σ_{i=1}^{n} p_i (e^t - 1)} / e^{t(1+δ)µ}
                             = e^{(e^t - 1)µ} / e^{t(1+δ)µ}.

Now we choose the t that minimizes the right-hand side:

    t = ln(1 + δ).

Substituting this t into the inequality gives us the theorem.

Definition: 3.2 The upper tail bound function for the sum of Poisson trials is

    F^+(µ, δ) = [ e^δ / (1 + δ)^{1+δ} ]^µ,   so that   Pr[X > (1 + δ)µ] < F^+(µ, δ).
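As a numerical sanity check (a sketch only, not from the original notes; µ = 30 and δ = 0.5 are arbitrary values), one can evaluate the pre-optimization bound e^{(e^t - 1)µ} / e^{t(1+δ)µ} on a grid of t values and confirm that its minimum is attained near t = ln(1 + δ), where it coincides with F^+(µ, δ).

    import numpy as np

    mu, delta = 30.0, 0.5              # arbitrary illustrative values

    def pre_optimization_bound(t):
        # e^{(e^t - 1) mu} / e^{t (1 + delta) mu}, the bound before choosing the best t
        return np.exp((np.exp(t) - 1.0) * mu - t * (1.0 + delta) * mu)

    ts = np.linspace(0.01, 1.0, 2000)
    best_t = ts[np.argmin(pre_optimization_bound(ts))]

    F_plus = (np.exp(delta) / (1.0 + delta) ** (1.0 + delta)) ** mu
    print(best_t, np.log(1.0 + delta))                          # minimizer is close to ln(1 + delta)
    print(pre_optimization_bound(np.log(1.0 + delta)), F_plus)  # and the two expressions agree there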

We can use a similar argument to find a lower tail bound function for the sum of Poisson trials.

Theorem: 3.2 Let X_1, X_2, ..., X_n be independent Poisson trials such that, for 1 ≤ i ≤ n, Pr[X_i = 1] = p_i with 0 < p_i < 1. Then, for X = Σ_{i=1}^{n} X_i, µ = E[X] = Σ_{i=1}^{n} p_i, and any δ > 0,

    Pr[X < (1 - δ)µ] < e^{-µδ²/2}.

Proof: For any positive real number t,

    Pr[X < (1 - δ)µ] = Pr[-X > -(1 - δ)µ] = Pr[e^{-tX} > e^{-t(1-δ)µ}].

Applying Markov's inequality to the right-hand side, we get

    Pr[e^{-tX} > e^{-t(1-δ)µ}] < E[e^{-tX}] / e^{-t(1-δ)µ}.

Following the same steps as in the upper bound proof, the numerator becomes

    E[e^{-tX}] = E[e^{-t Σ_{i=1}^{n} X_i}] = E[ Π_{i=1}^{n} e^{-t X_i} ] = Π_{i=1}^{n} E[e^{-t X_i}].

The random variable e^{-t X_i} takes the value e^{-t} with probability p_i and the value 1 with probability 1 - p_i. Therefore its expected value is p_i e^{-t} + (1 - p_i) · 1, making the numerator

    Π_{i=1}^{n} (p_i e^{-t} + 1 - p_i) = Π_{i=1}^{n} (1 + p_i (e^{-t} - 1)).

Using the inequality 1 + x < e^x with x = p_i (e^{-t} - 1), the right-hand side becomes

    Pr[e^{-tX} > e^{-t(1-δ)µ}] < Π_{i=1}^{n} e^{p_i (e^{-t} - 1)} / e^{-t(1-δ)µ}
                               = e^{Σ_{i=1}^{n} p_i (e^{-t} - 1)} / e^{-t(1-δ)µ}
                               = e^{(e^{-t} - 1)µ} / e^{-t(1-δ)µ}.

Now we choose t to be

    t = ln(1 / (1 - δ)).

Substituting this t into the inequality gives us

    Pr[X < (1 - δ)µ] < [ e^{-δ} / (1 - δ)^{1-δ} ]^µ.

This can be simplified: for δ ∈ (0, 1], the Maclaurin expansion of ln(1 - δ) shows that (1 - δ)^{1-δ} > e^{-δ + δ²/2}, and substituting this gives the result of the theorem.

Definition: 3.3 The lower tail bound function for the sum of Poisson trials is

    F^-(µ, δ) = e^{-µδ²/2},   so that   Pr[X < (1 - δ)µ] < F^-(µ, δ).
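A similar simulation sketch (again an illustration, not from the notes; n = 200, p_i = 1/2 for every i, and δ = 0.2 are arbitrary values) compares the empirical lower tail with F^-(µ, δ).

    import numpy as np

    n, p, delta = 200, 0.5, 0.2        # arbitrary illustrative values
    mu = n * p

    rng = np.random.default_rng(1)
    X = (rng.random((100_000, n)) < p).sum(axis=1)

    empirical = (X < (1 - delta) * mu).mean()
    F_minus = np.exp(-mu * delta ** 2 / 2)
    print(empirical, F_minus)          # the empirical tail probability should lie below the bound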

Example (1): Suppose that sports commentators say that WVU wins each game it plays with probability 1/3. Assuming that the outcomes of the games are independent of each other, what is an upper bound on the probability that WVU has a winning season in a season that lasts n games?

We can characterize this event as Pr[X > n/2], that is, the probability that more than half the games are won. Using the fact that the probability of winning one game is 1/3, the expected number of wins in n games is

    µ = E[X] = n/3.

We now need to rewrite the probability in terms of δ and µ. In other words, we need to rewrite X > n/2 in the form X > (1 + δ)µ. So we just do simple algebra:

    (1 + δ)µ = n/2
    (1 + δ)(n/3) = n/2
    1 + δ = 3/2
    δ = 1/2.

Now we have the parameters we need and can plug into the upper bound formula:

    Pr[X > n/2] < F^+(n/3, 1/2) = [ e^{1/2} / (3/2)^{3/2} ]^{n/3} < (0.965)^n.

Is this good or bad? As n increases, this bound drops off very rapidly: the longer the season WVU plays, the less likely it is to have a winning season.

Example (2): Suppose WVU now wins each game with probability 3/4. What is the probability that it suffers a losing season, assuming that the outcomes of its games are independent of one another?

We can use the formula for the lower bound, again by finding δ and µ first:

    (1 - δ)µ = n/2
    (1 - δ)(3n/4) = n/2
    1 - δ = 2/3
    δ = 1/3.

    Pr[X < n/2] < F^-(3n/4, 1/3) = e^{-(3n/4)(1/3)²/2} < (0.9592)^n.

This probability is also exponentially small in n.
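The two season bounds can also be checked by simulation; the sketch below (not part of the original notes; n = 30 games is an arbitrary choice) compares simulated seasons against (0.965)^n and (0.9592)^n.

    import numpy as np

    n = 30                             # arbitrary season length
    seasons = 200_000
    rng = np.random.default_rng(2)

    # Example 1: win probability 1/3; the bound on Pr[winning season] is (0.965)^n.
    wins = (rng.random((seasons, n)) < 1 / 3).sum(axis=1)
    print((wins > n / 2).mean(), 0.965 ** n)

    # Example 2: win probability 3/4; the bound on Pr[losing season] is (0.9592)^n.
    wins = (rng.random((seasons, n)) < 3 / 4).sum(axis=1)
    print((wins < n / 2).mean(), 0.9592 ** n)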

4 Deviation by δ

To answer the second type of question (for example: how large does δ need to be so that Pr[X > (1 + δ)µ] ≤ ε?), we can use the following definitions.

Definition: 4.1 For any positive µ and ε, ∆^+(µ, ε) is the value of δ that satisfies

    F^+(µ, ∆^+(µ, ε)) = ε.

Definition: 4.2 For any positive µ and ε, ∆^-(µ, ε) is the value of δ that satisfies

    F^-(µ, ∆^-(µ, ε)) = ε,   that is,   ∆^-(µ, ε) = √( (2/µ) ln(1/ε) ).

These definitions give the deviation needed to bring the upper or lower tail probability down to ε; the answer depends only on µ and ε, and not otherwise on n or the individual p_i.

Example (3): Suppose each trial succeeds with probability p_i = 3/4 and we want to find a δ such that Pr[X < (1 - δ)µ] ≤ n^{-5}. We can plug into the ∆^-(µ, ε) formula:

    ∆^-(3n/4, n^{-5}) = √( 10 ln n / (3n/4) ) = √( 13.333 ln n / n ).

This is very small, so we do not need to go very far below the expectation.

Example (4): Suppose, with p_i = 3/4 as before, we want to find a δ such that Pr[X < (1 - δ)µ] ≤ e^{-1.5n}. We can plug into the ∆^-(µ, ε) formula:

    ∆^-(3n/4, e^{-1.5n}) = √( 3n / (3n/4) ) = 2.

This tells us nothing, since for deviations below the expectation, values of δ bigger than 1 cannot occur.

Example (5): Suppose we have n balls and n bins, and each ball is thrown into a bin independently and uniformly at random. What is a number m such that the first bin is unlikely to have more than m balls in it? Let Y be a random variable denoting the number of balls in the first bin; we want to find m such that Pr[Y > m] ≤ n^{-2}. If we consider the Bernoulli trials of whether or not the i-th ball falls into the first bin, each of these events occurs with p_i = 1/n. Thus the expected value is the sum of these probabilities, so µ = 1. We can express m in terms of ∆^+ as follows:

    m = (1 + δ)µ = 1 + ∆^+(1, n^{-2}).
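∆^-(µ, ε) has the closed form above, but ∆^+(µ, ε) must be found numerically by solving F^+(µ, δ) = ε for δ. The sketch below (an illustration, not from the original notes; the choice n = 1000 and the bisection settings are arbitrary) does this in log space and applies it to Example 5.

    import math

    def log_F_plus(mu, delta):
        # log of F^+(mu, delta) = [e^delta / (1 + delta)^(1 + delta)]^mu, computed in log space
        return mu * (delta - (1 + delta) * math.log(1 + delta))

    def delta_plus(mu, eps, hi=1e6, iters=200):
        # F^+ is decreasing in delta, so bisect for the delta with F^+(mu, delta) = eps.
        lo, target = 0.0, math.log(eps)
        for _ in range(iters):
            mid = (lo + hi) / 2
            if log_F_plus(mu, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Example 5: n balls into n bins, mu = 1, eps = 1/n^2, so m = 1 + Delta^+(1, 1/n^2).
    n = 1000
    d = delta_plus(1.0, 1.0 / n ** 2)
    print(d, 1 + d)          # for n = 1000 this gives delta of roughly 8.9, so m of roughly 10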

5 Set Balancing

The set balancing problem can be characterized as follows: given an n x n matrix A, all of whose entries are either 0 or 1, find a column vector b, all of whose entries are -1 or +1, such that ||A b||_∞ is minimized.

5.1 Norm of a vector

Definition: 5.1 The norm of a vector x, written ||x||, is a non-negative real number satisfying the following conditions:

    ||x|| ≥ 0, and ||x|| = 0 iff x = 0
    ||c x|| = |c| ||x||
    ||x + y|| ≤ ||x|| + ||y||

The p-norm is

    ||x||_p = ( Σ_{i=1}^{n} |x_i|^p )^{1/p}.

If p = 2, this is the Euclidean norm (our well-known formula for distance). If p = ∞, the norm is ||x||_∞ = max_{1≤i≤n} |x_i|.

5.2 Set Balancing Solution

We can analyze the following solution to the set balancing problem: choose a vector b uniformly at random from {-1, +1}^n, each entry being -1 or +1 independently with probability 1/2. This choice completely ignores the given matrix A.

If the above statement is our algorithm, we can now ask a few questions.

1. What is the expected value of the inner product of any row a_i of A with our randomly chosen b?

    E[a_i · b] = Σ_{j=1}^{n} a_{ij} E[b_j] = 0.

2. How far from 0 does a_i · b deviate? We can show that Pr[a_i · b < -√(4n ln n)] ≤ n^{-2}, using F^- for the right-hand side. Similarly, we can show that Pr[a_i · b > √(4n ln n)] ≤ n^{-2}.

3. Suppose we say that a bad event occurs for the i-th row if the absolute value of the inner product of the i-th row of A with b exceeds √(4n ln n). What is the probability that a bad event occurs for the i-th row? Since the two tail events are mutually exclusive, we can sum their probabilities to show that

    Pr[ |a_i · b| > √(4n ln n) ] ≤ 2/n².

4. What is the probability that any bad event occurs? There is one possible bad event for each of the n rows of A, and each occurs with probability at most 2/n². The probability of the union of these bad events is no more than the sum of their probabilities, which is at most n · (2/n²) = 2/n.

Result: 5.1 With probability at least 1 - 2/n, the randomly chosen vector b satisfies ||A b||_∞ ≤ √(4n ln n).
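To see the analysis in action, the following sketch (illustrative only, not part of the original notes; the dimension n = 500 and the random 0/1 matrix are arbitrary choices) draws a random b and compares ||A b||_∞ with the threshold √(4 n ln n).

    import numpy as np

    n = 500                                  # arbitrary illustrative dimension
    rng = np.random.default_rng(3)

    A = rng.integers(0, 2, size=(n, n))      # n x n matrix with 0/1 entries
    b = rng.choice([-1, 1], size=n)          # each entry -1 or +1 with probability 1/2

    discrepancy = np.abs(A @ b).max()        # ||A b||_infinity
    threshold = np.sqrt(4 * n * np.log(n))   # sqrt(4 n ln n)
    print(discrepancy, threshold)            # with probability >= 1 - 2/n, discrepancy <= threshold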

3. Suppose we say that a bad event occurs if the absolute value of the inner product of the i th row of A with b exceeds 4 n ln n. What is the probability that a bad event occurs for the i th row? Since both events are disjoint and mutually exclusive, we can sum the probabilities to show that Pr[ a i b > 4 n ln n] 2 n. 2 4. What is the probability that a bad event occurs? There are n possible bad events for each row in A. Since any one of them can occur at most 2 n 2. The union of these bad events is no more than the sum of their probabilities, which is 2 n. Result: 5. The probability that we find a vector, b, for which A b is minimized is 2 n 2 8