Lecture 2: Concentration Bounds

CSE 521: Design and Analysis of Algorithms I                                Spring 2016
Lecture 2: Concentration Bounds
Lecturer: Shayan Oveis Gharan        March 30th        Scribe: Syuzanna Sargsyan

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

Laws of large numbers imply that for a sequence of i.i.d. random variables X_1, X_2, ... with mean µ, the sample average (1/n)(X_1 + X_2 + ... + X_n) converges to µ as n goes to infinity. Concentration bounds provide a quantitative distance between the sample average and the expectation. In this lecture we review several of these fundamental inequalities. In the next few lectures we will see applications of these inequalities in designing randomized algorithms.

Let D be a distribution. Suppose we want to estimate the mean E[X] of D and we only have access to independent samples X ~ D. One way to estimate the mean is to independently draw n samples X_1, X_2, ..., X_n from the distribution and return the empirical mean (1/n) Σ_i X_i. By the law of large numbers the empirical mean converges to E[X] as n → ∞. In this lecture we will prove bounds on the number of samples one needs to obtain an estimate of the mean within ε additive error.

2.1  Markov's Inequality

Markov's Inequality: For any nonnegative random variable (r.v.) X and any number k > 0,

    P[X ≥ k] ≤ E[X]/k.

Proof (for a discrete nonnegative X).

    E[X] = Σ_i i · P[X = i] ≥ Σ_{i ≥ k} i · P[X = i] ≥ k · Σ_{i ≥ k} P[X = i] = k · P[X ≥ k],

and dividing both sides by k proves the claim.

For example, for k = (3/2)·E[X], we can write

    P[X ≥ (3/2)·E[X]] ≤ E[X] / ((3/2)·E[X]) = 2/3.                                (2.1)

Example: Suppose the average grade of CSE 521 is 2.0 (out of 4.0). Give a lower bound on the fraction of students who received a grade of at most 3.0. We assume that a grade can be any real number between 0.0 and 4.0. In this example E[X] = 2.0. Taking k = 3.0 = (3/2)·E[X], Markov's inequality says at most 2/3 of the students received a grade above 3.0, so at least 1/3 of the students received a grade of at most 3.0.
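As a quick numerical illustration (added to these notes, not part of the original lecture), the following Python sketch draws grades from one hypothetical distribution with mean 2.0 and compares the empirical fraction of grades of at least 3.0 against the Markov bound of 2/3.

    import random

    def markov_bound(mean, k):
        # Markov's inequality: P[X >= k] <= E[X]/k for a nonnegative X.
        return mean / k

    def empirical_tail(samples, k):
        # Fraction of samples that are at least k.
        return sum(1 for x in samples if x >= k) / len(samples)

    random.seed(0)
    # One hypothetical grade distribution on [0.0, 4.0] with mean 2.0:
    # grade 4.0 with probability 1/2, grade 0.0 otherwise.
    grades = [4.0 if random.random() < 0.5 else 0.0 for _ in range(100_000)]

    print("empirical P[grade >= 3.0]:", empirical_tail(grades, 3.0))  # about 0.5
    print("Markov bound E[X]/k:      ", markov_bound(2.0, 3.0))       # 2/3

Any other distribution with mean 2.0 supported on [0.0, 4.0] would work as well; the empirical tail can never exceed the Markov bound.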

It turns out that if the only thing we know about X is its expectation, then Markov's inequality is essentially the best bound we can hope for. For a tight example consider the following scenario: assume k ≥ E[X], let ε > 0 be very close to 0, and let

    X = { k + ε    with probability E[X]/(k + ε),
        { 0        with probability 1 − E[X]/(k + ε).

Then X is nonnegative with the given expectation, and P[X ≥ k] = E[X]/(k + ε), which approaches the Markov bound E[X]/k as ε → 0.

Application 1. We use Markov's inequality to prove an upper bound on the number of fixed points of a random permutation. Recall that a permutation is a one-to-one and onto map σ : {1, 2, ..., n} → {1, 2, ..., n}. We say i is a fixed point of σ iff σ(i) = i.

Claim 2.1. With probability at least 1 − 1/k, a uniformly random permutation σ has at most k fixed points.

Proof. The trick is to define the right random variable and then use Markov's inequality. Define X_i = I{σ(i) = i} and X = Σ_{i=1}^n X_i. Observe that X is the number of fixed points of σ. We can write down the expectation of X using linearity of expectation:

    E[X] = Σ_{i=1}^n E[X_i] = Σ_{i=1}^n P[σ(i) = i] = n · (1/n) = 1.

The second equality uses the fact that the expectation of an indicator random variable is equal to the probability of the corresponding event. The last equality holds since σ is a uniform permutation, i.e., P[σ(i) = i] = 1/n for every i. Thus, by Markov's inequality, P[X > k] ≤ P[X ≥ k] ≤ 1/k, so with probability at least 1 − 1/k we have X ≤ k.
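The claim is easy to check numerically. The following sketch (an illustration added here, not from the lecture) samples uniformly random permutations, verifies that the average number of fixed points is about 1, and compares the empirical tail P[X ≥ k] to the Markov bound 1/k.

    import random

    def num_fixed_points(n):
        # Sample a uniformly random permutation of {0, ..., n-1} and count its fixed points.
        perm = list(range(n))
        random.shuffle(perm)
        return sum(1 for i, v in enumerate(perm) if i == v)

    random.seed(0)
    n, k, trials = 100, 3, 50_000
    counts = [num_fixed_points(n) for _ in range(trials)]

    print("empirical E[X]:     ", sum(counts) / trials)                       # about 1
    print("empirical P[X >= k]:", sum(1 for c in counts if c >= k) / trials)  # about 0.08
    print("Markov bound 1/k:   ", 1 / k)                                      # 0.333...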

2.2  Chebyshev's Inequality

Recall the definition of the variance:

    Var(X) := E[(X − E[X])²] = E[X² + (E[X])² − 2X·E[X]] = E[X²] + (E[X])² − 2E[X]·E[X] = E[X²] − (E[X])².        (2.2)

The second and the third equalities follow from the linearity of expectation. Note that since (X − E[X])² is a nonnegative random variable, E[X²] ≥ (E[X])². The standard deviation of a random variable X is defined as σ(X) := √Var(X).

Chebyshev's inequality: For any random variable X and any ε > 0,

    P[|X − E[X]| ≥ ε] ≤ Var(X)/ε²,

or equivalently, for any number k > 0,

    P[|X − E[X]| ≥ kσ] ≤ 1/k².

We can read the above inequality as follows: for any random variable X, with probability at least 8/9 (roughly 90%), X is within three standard deviations of its expectation.

Proof. Let Y := (X − E[X])² ≥ 0. By Markov's inequality,

    P[Y ≥ ε²] ≤ E[Y]/ε².

By the definition of Y, E[Y] = Var(X), so

    P[(X − E[X])² ≥ ε²] ≤ Var(X)/ε²,

or, equivalently,

    P[|X − E[X]| ≥ ε] ≤ Var(X)/ε².

Next, we describe two applications of Chebyshev's inequality.

Application 2. Polling. Consider a large set of individuals each voting 0 or 1 on a presidential candidate, and let p be the expected vote, i.e., the fraction of 1-votes. We will see that using only O(1/ε²) independent samples from the set we can estimate p within an ε-additive error. Let X_1, X_2, ..., X_n be the votes of n independently chosen individuals in this society. Observe that, for each i,

    X_i = { 1    with probability p,
          { 0    with probability 1 − p.

Define the r.v. X = (1/n) Σ_{i=1}^n X_i. Obviously, E[X] = (1/n) Σ_i E[X_i] = p. We use Chebyshev's inequality to show that for n = O(1/ε²), with high probability X is within an additive distance ε of p. To use Chebyshev's inequality, we first need to upper bound the variance of X. We use the following lemma to calculate the variance of a sum of pairwise independent random variables.

Lemma 2.2. Let X_1, X_2, ..., X_n be pairwise independent random variables; this means that for any i ≠ j, E[X_i X_j] = E[X_i] E[X_j]. For X = X_1 + ... + X_n, we have

    Var(X) = Σ_{i=1}^n Var(X_i).

Proof. By (2.2),

    Var(X) = E[X²] − (E[X])² = Σ_{i,j} E[X_i X_j] − Σ_{i,j} E[X_i] E[X_j],

where the second equality follows by linearity of expectation. By the pairwise independence property, for any i ≠ j, E[X_i X_j] = E[X_i] E[X_j], so all cross terms cancel. Therefore, the above expression simplifies to

    Var(X) = Σ_i E[X_i²] − Σ_i (E[X_i])² = Σ_i Var(X_i).

In the polling example, we can write

    Var(X) = Var((1/n) Σ_i X_i) = (1/n²) Σ_i Var(X_i).

Recall that X_i is a Bernoulli random variable with parameter p. We have Var(X_i) = E[X_i²] − E[X_i]². Obviously, E[X_i] = p. In addition, E[X_i²] = 1²·p + 0²·(1 − p) = p. So Var(X_i) = p − p² ≤ 1/4, and

    Var(X) ≤ n/(4n²) = 1/(4n).

Now, by Chebyshev's inequality,

    P[|X − p| ≥ ε] ≤ Var(X)/ε² ≤ 1/(4nε²).

This means that for n = 3/ε², X approximates p within an additive error of ε with probability more than 90%.
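As a sanity check added to these notes, the sketch below repeats the polling experiment with n = 3/ε² samples (the true mean p = 0.6 and accuracy ε = 0.05 are hypothetical choices) and measures how often the empirical mean misses p by more than ε; Chebyshev promises this happens at most about 1/12 of the time.

    import random

    def poll(p, n):
        # Empirical mean of n independent Bernoulli(p) votes.
        return sum(1 if random.random() < p else 0 for _ in range(n)) / n

    random.seed(0)
    p, eps = 0.6, 0.05          # hypothetical true mean and target accuracy
    n = round(3 / eps**2)       # sample size from Chebyshev: n = 3/eps^2 = 1200
    trials = 2_000

    misses = sum(1 for _ in range(trials) if abs(poll(p, n) - p) > eps)
    print("failure rate over", trials, "trials:", misses / trials)  # far below 1/12 in practice

The empirical failure rate is much smaller than 1/12 because Chebyshev's inequality is quite loose here; the Chernoff-type bounds below explain why.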

Application 3. Birthday Paradox. Let X_1, ..., X_n ∈ {1, 2, ..., N} be chosen independently and uniformly at random. How large should n be to get a collision, i.e., to get X_i = X_j for some i ≠ j? We show that if n ≤ √N then with probability at least 1/2 there is no collision, and if n ≥ C·√N then with probability at least roughly 1 − 2/C² there is a collision.

Define the r.v. Y_ij = I(X_i = X_j) and let Y = Σ_{i<j} Y_ij. Note that the Y_ij's are dependent random variables, but they are pairwise independent. This crucial fact allows us to use Lemma 2.2 to calculate the variance of Y. Observe that Y is an integral random variable which counts the number of collisions, so we are interested in P[Y ≥ 1]. We start by calculating the first moment of Y:

    E[Y] = Σ_{i<j} E[Y_ij] = Σ_{i<j} P[Y_ij = 1] = (n choose 2)/N ≤ n²/(2N).

By Markov's inequality,

    P[Y ≥ 1] ≤ E[Y] ≤ n²/(2N).

Therefore, if n ≤ √N, with probability at least 1/2 there is no collision.

Now, let us study the case where n ≫ √N. Here, we use Chebyshev's inequality. First, observe that since Y is an integral random variable,

    P[Y = 0] ≤ P[|Y − E[Y]| ≥ E[Y]].

By Chebyshev's inequality,

    P[|Y − E[Y]| ≥ E[Y]] ≤ Var(Y)/(E[Y])².

Therefore,

    P[Y = 0] ≤ Var(Y)/(E[Y])².

Using the pairwise independence of the Y_ij's, we get

    Var(Y) = Σ_{i<j} Var(Y_ij) = Σ_{i<j} (1/N − 1/N²) ≤ (n choose 2)/N.

Therefore,

    P[Y = 0] ≤ Var(Y)/(E[Y])² ≤ ((n choose 2)/N) / ((n choose 2)/N)² = N/(n choose 2) = 2N/(n(n−1)).

So, for n ≥ C·√N, there is a collision with probability at least 1 − 2/C² up to lower order terms, since 2N/(n(n−1)) ≈ 2N/n² ≤ 2/C².
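The √N threshold is easy to see empirically. The following sketch, added here as an illustration (N = 365 is just a convenient hypothetical choice), estimates the collision probability for n around √N and for n a few times larger.

    import math
    import random

    def has_collision(n, N):
        # Draw n values uniformly from {1, ..., N}; report whether any two coincide.
        draws = [random.randint(1, N) for _ in range(n)]
        return len(set(draws)) < n

    def collision_prob(n, N, trials=20_000):
        return sum(has_collision(n, N) for _ in range(trials)) / trials

    random.seed(0)
    N = 365
    root = int(math.sqrt(N))   # about 19
    for n in (root // 2, root, 3 * root):
        print("n =", n, " empirical collision probability =", round(collision_prob(n, N), 3))

Well below √N collisions are rare, around √N they occur with constant probability, and a few multiples of √N already make them nearly certain.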

2.3  Chernoff Bounds

Central limit theorems in their general form state that for a sequence of i.i.d. random variables X_1, X_2, ... with bounded mean µ and variance σ²,

    (1/√n) · Σ_{i=1}^n (X_i − µ)  →  N(0, σ²)    in distribution.

Chernoff type bounds provide a quantitative bound on this convergence. Recall that Chebyshev's bound implies that the probability that a r.v. X is at distance kσ from its mean is at most 1/k². Roughly speaking, Chernoff type bounds imply that for a suitable r.v. X this probability is at most exp(−Ω(k)). We start by describing Hoeffding's bound.

Hoeffding's Inequality: Let X_1, ..., X_n be a sequence of independent random variables where for each i, a_i ≤ X_i ≤ b_i. Then,

    P[ |(1/n) Σ_i X_i − E[(1/n) Σ_i X_i]| ≥ ε ] ≤ 2 exp( −2n²ε² / Σ_i (b_i − a_i)² ).

In the polling example we had X_i ∈ {0, 1} for each i, and X_1, ..., X_n are independent random variables with E[X_i] = p. Therefore, by Hoeffding's inequality we get

    P[ |(1/n) Σ_i X_i − p| ≥ ε ] ≤ 2 exp( −2n²ε² / n ) = 2 exp(−2nε²).

So, for any δ > 0, (1/n) Σ_i X_i is within additive error ε of p with probability at least 1 − δ as long as n ≥ ln(2/δ)/(2ε²).
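To see concretely how much Hoeffding improves on Chebyshev for polling, the short sketch below (added to these notes; ε = 0.01 and δ = 0.01 are hypothetical values) computes the sample sizes the two bounds require: Chebyshev's 1/(4nε²) ≤ δ gives n ≥ 1/(4δε²), while Hoeffding gives n ≥ ln(2/δ)/(2ε²).

    import math

    def chebyshev_samples(eps, delta):
        # From P[|mean - p| >= eps] <= 1/(4 n eps^2): require 1/(4 n eps^2) <= delta.
        return math.ceil(1 / (4 * delta * eps**2))

    def hoeffding_samples(eps, delta):
        # From P[|mean - p| >= eps] <= 2 exp(-2 n eps^2): require 2 exp(-2 n eps^2) <= delta.
        return math.ceil(math.log(2 / delta) / (2 * eps**2))

    eps, delta = 0.01, 0.01
    print("Chebyshev needs n >=", chebyshev_samples(eps, delta))  # roughly 250,000
    print("Hoeffding needs n >=", hoeffding_samples(eps, delta))  # roughly 26,500

The point is the dependence on the failure probability: 1/δ for Chebyshev versus log(1/δ) for Hoeffding.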

Application 4. Unbiased random walk on a line. Consider a particle which does an unbiased random walk on the real line: it starts at zero, and in each time step it moves one step ahead or one step back, i.e., from position i it goes to i + 1 with probability 1/2 and to i − 1 with the remaining probability. We want to see how far from the origin the particle is at time n. We can model this with a sequence X_1, ..., X_n of independent random variables where for each i,

    X_i = { +1    with probability 1/2,
          { −1    with probability 1/2.

Let X = X_1 + X_2 + ... + X_n. We want to prove an upper bound on |X|. Since E[X] = 0, and each X_i lies in [−1, 1] so that Σ_i (b_i − a_i)² = 4n, Hoeffding's inequality gives

    P[ |X/n − 0| ≥ ε ] ≤ 2 exp( −2n²ε² / 4n ) = 2 exp(−nε²/2).

So if we take ε = 2√(log(n)/n), the right hand side becomes 2 exp(−2 log n) = 2/n², which is at most 1/n for n ≥ 2. In other words, with probability at least 1 − 1/n we have |X|/n ≤ 2√(log(n)/n), or equivalently |X| ≤ 2√(n·log(n)). That is, with high probability the particle is within distance O(√(n log n)) of the origin. In the next lecture, we show that w.h.p. the particle has distance at least Ω(√n) from the origin.
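As a final illustration added to these notes, the sketch below runs many independent walks and checks how rarely the endpoint exceeds the 2√(n log n) threshold; the Hoeffding bound above predicts a failure probability of at most 2/n².

    import math
    import random

    def random_walk_endpoint(n):
        # Sum of n independent +/-1 steps, each direction with probability 1/2.
        return sum(1 if random.random() < 0.5 else -1 for _ in range(n))

    random.seed(0)
    n, trials = 10_000, 1_000
    threshold = 2 * math.sqrt(n * math.log(n))

    exceed = sum(1 for _ in range(trials) if abs(random_walk_endpoint(n)) > threshold)
    print("threshold 2*sqrt(n log n):  ", round(threshold, 1))   # about 607
    print("fraction of walks beyond it:", exceed / trials)       # essentially 0
    print("Hoeffding bound 2/n^2:      ", 2 / n**2)              # 2e-08

Typical endpoints are on the order of √n ≈ 100 here, well inside the threshold, consistent with the lower bound promised for the next lecture.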