Concentration inequalities and tail bounds


Ami Richardson
6 years ago

John Duchi

Outline
I. Basics and motivation
   1. Law of large numbers
   2. Markov inequality
   3. Chernoff bounds
II. Sub-Gaussian random variables
   1. Definitions
   2. Examples
   3. Hoeffding inequalities
III. Sub-exponential random variables
   1. Definitions
   2. Examples
   3. Chernoff/Bernstein bounds

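Motivation
Often in this class, the goal is to argue that a sequence of random variables (or vectors) $X_1, X_2, \ldots$ satisfies
$$\frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{p} \mathbb{E}[X].$$
Law of large numbers: if the $X_i$ are i.i.d. copies of $X$ and $\mathbb{E}[\|X\|] < \infty$, then
$$P\Big(\lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^n X_i \neq \mathbb{E}[X]\Big) = 0.$$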

Markov inequalities
Theorem (Markov's inequality). Let $X$ be a non-negative random variable. Then for all $t > 0$,
$$P(X \ge t) \le \frac{\mathbb{E}[X]}{t}.$$
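For completeness, here is the usual one-line argument behind Markov's inequality (a standard proof sketch, not transcribed from the slides). For $t > 0$, use $X \ge 0$ for the first step and $X \ge t$ on the event $\{X \ge t\}$ for the second:

```latex
\mathbb{E}[X] \;\ge\; \mathbb{E}\big[X \,\mathbf{1}\{X \ge t\}\big] \;\ge\; t\, \mathbb{E}\big[\mathbf{1}\{X \ge t\}\big] \;=\; t\, P(X \ge t).
```

Dividing by $t$ gives the theorem. Applying the same argument to the non-negative variable $(X - \mathbb{E}[X])^2$ at level $t^2$ yields Chebyshev's inequality.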

Chebyshev inequalities
Theorem (Chebyshev's inequality). Let $X$ be a real-valued random variable with $\mathbb{E}[X^2] < \infty$. Then for all $t > 0$,
$$P(|X - \mathbb{E}[X]| \ge t) \le \frac{\mathbb{E}[(X - \mathbb{E}[X])^2]}{t^2} = \frac{\mathrm{Var}(X)}{t^2}.$$
Example: i.i.d. sampling.
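For the i.i.d. sampling example, Chebyshev applied to the sample mean $\bar X_n$ (whose variance is $\mathrm{Var}(X)/n$) gives $P(|\bar X_n - \mathbb{E}[X]| \ge t) \le \mathrm{Var}(X)/(n t^2)$. The sketch below turns this into a sample-size rule; the function names are hypothetical helpers, not from the slides.

```python
import math


def chebyshev_mean_bound(var: float, n: int, t: float) -> float:
    """Chebyshev bound on P(|sample mean - E[X]| >= t) for n i.i.d. draws.

    The sample mean has variance var / n, so the bound is var / (n * t**2),
    capped at 1 because it bounds a probability.
    """
    return min(1.0, var / (n * t**2))


def chebyshev_sample_size(var: float, t: float, delta: float) -> int:
    """Smallest n for which var / (n * t**2) <= delta."""
    return math.ceil(var / (delta * t**2))


if __name__ == "__main__":
    n = chebyshev_sample_size(var=1.0, t=0.5, delta=0.25)
    print(n, chebyshev_mean_bound(1.0, n, 0.5))  # 16 0.25
```

The required sample size grows like $1/\delta$ in the failure probability $\delta$.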

Chernoff bounds
Moment generating function: for a random variable $X$, the MGF is
$$M_X(\lambda) := \mathbb{E}[e^{\lambda X}].$$
Example: normally distributed random variables.
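For the normal example, a worked computation (the standard one, by completing the square in the exponent): if $X \sim N(\mu, \sigma^2)$, then

```latex
M_X(\lambda)
 = \int_{-\infty}^{\infty} e^{\lambda x} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \, dx
 = e^{\lambda\mu + \frac{\lambda^2\sigma^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu - \lambda\sigma^2)^2}{2\sigma^2}} \, dx
 = \exp\Big(\lambda\mu + \frac{\lambda^2\sigma^2}{2}\Big).
```

The middle step uses $\lambda x - \frac{(x-\mu)^2}{2\sigma^2} = -\frac{(x - \mu - \lambda\sigma^2)^2}{2\sigma^2} + \lambda\mu + \frac{\lambda^2\sigma^2}{2}$, and the remaining integral is that of a normal density.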

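Chernoff bounds
Theorem (Chernoff bound). For any random variable $X$ and $t \ge 0$,
$$P(X - \mathbb{E}[X] \ge t) \le \inf_{\lambda \ge 0} M_{X - \mathbb{E}[X]}(\lambda)\, e^{-\lambda t} = \inf_{\lambda \ge 0} \mathbb{E}\big[e^{\lambda (X - \mathbb{E}[X])}\big] e^{-\lambda t}.$$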
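When the infimum has no convenient closed form, it can be evaluated numerically. A minimal sketch, assuming the caller supplies the log-MGF of $X - \mathbb{E}[X]$ and a search cap `lam_max` (both the cap and the use of ternary search are our choices, not the slides'). Because log-MGFs are convex, $\lambda \mapsto \log M(\lambda) - \lambda t$ is convex and ternary search finds its minimum.

```python
import math
from typing import Callable


def chernoff_bound(log_mgf: Callable[[float], float], t: float,
                   lam_max: float, iters: int = 200) -> float:
    """Evaluate inf over 0 <= lam <= lam_max of exp(log_mgf(lam) - lam * t).

    log_mgf is the log-MGF of the centered variable X - E[X]. It is convex,
    so the exponent is convex in lam and ternary search finds its minimum.
    lam = 0 is always feasible (exponent 0), so the result is <= 1.
    """
    def exponent(lam: float) -> float:
        return log_mgf(lam) - lam * t

    lo, hi = 0.0, lam_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if exponent(m1) <= exponent(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return math.exp(min(exponent(lo), exponent(hi), 0.0))


if __name__ == "__main__":
    sigma_sq, t = 1.0, 2.0
    # Gaussian: log M(lam) = lam^2 sigma^2 / 2, closed form exp(-t^2 / (2 sigma^2)).
    numeric = chernoff_bound(lambda lam: lam**2 * sigma_sq / 2, t, lam_max=10.0)
    print(numeric, math.exp(-t**2 / (2 * sigma_sq)))
```

For the Gaussian the optimum is $\lambda = t/\sigma^2$, so `lam_max` must exceed it; for variables whose MGF is finite only near zero, `lam_max` should stay inside that range.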

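Sub-Gaussian random variables
Definition (sub-Gaussianity). A mean-zero random variable $X$ is $\sigma^2$-sub-gaussian if for all $\lambda \in \mathbb{R}$,
$$\mathbb{E}\big[e^{\lambda X}\big] \le \exp\Big(\frac{\lambda^2 \sigma^2}{2}\Big).$$
Example: $X \sim N(0, \sigma^2)$.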
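A second example, added here for illustration: a Rademacher variable ($\pm 1$ with probability $1/2$ each) has $\mathbb{E}[e^{\lambda X}] = \cosh\lambda = \sum_k \lambda^{2k}/(2k)! \le \sum_k (\lambda^2/2)^k/k! = e^{\lambda^2/2}$, since $(2k)! \ge 2^k k!$, so it is $1$-sub-gaussian. The sketch checks this numerically on a grid; the helper names are hypothetical.

```python
import math


def rademacher_mgf(lam: float) -> float:
    # X = +1 or -1 with probability 1/2 each, so E[e^{lam X}] = cosh(lam).
    return math.cosh(lam)


def subgaussian_mgf_bound(lam: float, sigma_sq: float) -> float:
    # Right-hand side of the sub-gaussian definition: exp(lam^2 sigma^2 / 2).
    return math.exp(lam**2 * sigma_sq / 2)


# Check the definition with sigma^2 = 1 for lam = -5.00, -4.99, ..., 5.00.
grid = [k / 100 for k in range(-500, 501)]
assert all(rademacher_mgf(lam) <= subgaussian_mgf_bound(lam, 1.0) for lam in grid)
```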

Properties of sub-gaussians
Proposition (sums of sub-gaussians). Let $X_i$, $i = 1, \ldots, n$, be independent, mean-zero, $\sigma_i^2$-sub-gaussian. Then $\sum_{i=1}^n X_i$ is $\sum_{i=1}^n \sigma_i^2$-sub-gaussian.
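The proposition is a direct consequence of independence; a short derivation, using only the sub-gaussian definition (the first equality is where independence enters):

```latex
\mathbb{E}\Big[e^{\lambda \sum_{i=1}^n X_i}\Big]
 = \prod_{i=1}^n \mathbb{E}\big[e^{\lambda X_i}\big]
 \le \prod_{i=1}^n \exp\Big(\frac{\lambda^2 \sigma_i^2}{2}\Big)
 = \exp\Big(\frac{\lambda^2 \sum_{i=1}^n \sigma_i^2}{2}\Big)
 \qquad \text{for all } \lambda \in \mathbb{R}.
```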

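Concentration inequalities
Theorem. Let $X$ be $\sigma^2$-sub-gaussian. Then for $t \ge 0$,
$$P(X - \mathbb{E}[X] \ge t) \le \exp\Big(-\frac{t^2}{2\sigma^2}\Big), \qquad P(X - \mathbb{E}[X] \le -t) \le \exp\Big(-\frac{t^2}{2\sigma^2}\Big).$$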
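The theorem is the Chernoff bound with the sub-gaussian MGF bound plugged in; the exponent $\frac{\lambda^2\sigma^2}{2} - \lambda t$ is minimized at $\lambda = t/\sigma^2$:

```latex
P(X - \mathbb{E}[X] \ge t)
 \le \inf_{\lambda \ge 0} \exp\Big(\frac{\lambda^2 \sigma^2}{2} - \lambda t\Big)
 = \exp\Big(-\frac{t^2}{2\sigma^2}\Big).
```

The lower tail follows by applying the same argument to $-X$, which is also $\sigma^2$-sub-gaussian because the definition is symmetric in $\lambda$.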

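Concentration: convergence of an independent sum
Corollary. Let $X_i$ be independent, mean-zero, $\sigma_i^2$-sub-gaussian. Then for $t \ge 0$,
$$P\Big(\frac{1}{n}\sum_{i=1}^n X_i \ge t\Big) \le \exp\Big(-\frac{n t^2}{2 \cdot \frac{1}{n}\sum_{i=1}^n \sigma_i^2}\Big).$$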
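A sketch of how the corollary is used to size a sample, assuming all $\sigma_i^2$ equal a common $\sigma^2$ (our simplification) so that $n \ge 2\sigma^2 \log(1/\delta)/t^2$ suffices for a one-sided failure probability $\delta$. Function names are hypothetical.

```python
import math


def subgaussian_mean_tail(sigmas_sq: list[float], t: float) -> float:
    """Corollary bound on P((1/n) sum X_i >= t) for independent, mean-zero,
    sigma_i^2-sub-gaussian X_i: exp(-n t^2 / (2 * average of sigma_i^2))."""
    n = len(sigmas_sq)
    avg = sum(sigmas_sq) / n
    return math.exp(-n * t**2 / (2 * avg))


def subgaussian_sample_size(sigma_sq: float, t: float, delta: float) -> int:
    """Smallest n with exp(-n t^2 / (2 sigma^2)) <= delta when every sigma_i^2 = sigma_sq."""
    return math.ceil(2 * sigma_sq * math.log(1 / delta) / t**2)


if __name__ == "__main__":
    n = subgaussian_sample_size(sigma_sq=1.0, t=0.1, delta=0.05)
    print(n, subgaussian_mean_tail([1.0] * n, 0.1))
```

Compared with a Chebyshev-based rule, the dependence on the failure probability improves from $1/\delta$ to $\log(1/\delta)$.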

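Example: bounded random variables
Proposition. Let $X \in [a, b]$ with $\mathbb{E}[X] = 0$. Then for all $\lambda \in \mathbb{R}$,
$$\mathbb{E}[e^{\lambda X}] \le \exp\Big(\frac{\lambda^2 (b - a)^2}{8}\Big),$$
that is, $X$ is $\frac{(b - a)^2}{4}$-sub-gaussian.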
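A numerical sanity check of the proposition for one family, assuming a centered Bernoulli variable $X = B - p$ with $B \sim \mathrm{Bernoulli}(p)$, which lies in $[-p, 1-p]$ so that $b - a = 1$. The grid and the tiny relative slack (for floating-point rounding) are our choices.

```python
import math


def centered_bernoulli_mgf(lam: float, p: float) -> float:
    # X = B - p takes value 1 - p with probability p and -p with probability 1 - p.
    return p * math.exp(lam * (1 - p)) + (1 - p) * math.exp(-lam * p)


def bounded_mgf_bound(lam: float, a: float, b: float) -> float:
    # Right-hand side of the proposition: exp(lam^2 (b - a)^2 / 8).
    return math.exp(lam**2 * (b - a) ** 2 / 8)


# Check lam in [-10, 10] (step 0.05) for several p; 1e-12 absorbs rounding near lam = 0.
for p in [0.05, 0.25, 0.5, 0.75, 0.95]:
    for k in range(-200, 201):
        lam = k / 20
        assert centered_bernoulli_mgf(lam, p) <= bounded_mgf_bound(lam, -p, 1 - p) * (1 + 1e-12)
```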

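Maxima of sub-gaussian random variables (in expectation)
Let $X_1, \ldots, X_n$ be mean-zero $\sigma^2$-sub-gaussian (not necessarily independent). Then
$$\mathbb{E}\Big[\max_{j \le n} X_j\Big] \le \sqrt{2\sigma^2 \log n}.$$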
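A sketch of the standard argument for the expectation bound, for mean-zero $\sigma^2$-sub-gaussian $X_1, \ldots, X_n$ with $n \ge 2$ and any $\lambda > 0$: Jensen's inequality, then $e^{\lambda \max_j X_j} \le \sum_j e^{\lambda X_j}$, then the sub-gaussian MGF bound (no independence is used):

```latex
\mathbb{E}\Big[\max_{j \le n} X_j\Big]
 \le \frac{1}{\lambda} \log \mathbb{E}\Big[e^{\lambda \max_{j} X_j}\Big]
 \le \frac{1}{\lambda} \log \sum_{j=1}^n \mathbb{E}\big[e^{\lambda X_j}\big]
 \le \frac{\log n}{\lambda} + \frac{\lambda \sigma^2}{2}.
```

Choosing $\lambda = \sqrt{2 \log n / \sigma^2}$ makes both terms equal to $\sigma\sqrt{(\log n)/2}$, giving $\sqrt{2\sigma^2 \log n}$.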

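Maxima of sub-gaussian random variables (in probability)
Let $X_1, \ldots, X_n$ be mean-zero $\sigma^2$-sub-gaussian (not necessarily independent). Then for $t \ge 0$,
$$P\Big(\max_{j \le n} X_j \ge \sqrt{2\sigma^2(\log n + t)}\Big) \le e^{-t}.$$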
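A Monte Carlo illustration of both maxima bounds, assuming i.i.d. $N(0, 1)$ variables (so $\sigma^2 = 1$); the sample sizes, seed, and helper names are arbitrary choices for the sketch. The probability bound itself follows from a union bound: $n \exp(-(\log n + t)) = e^{-t}$.

```python
import math
import random


def simulate_gaussian_maxima(n: int, trials: int, seed: int = 0) -> list[float]:
    # Each trial draws n i.i.d. N(0, 1) variables (1-sub-gaussian) and keeps the maximum.
    rng = random.Random(seed)
    return [max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)]


def check_maxima_bounds(n: int, trials: int, t: float, seed: int = 0):
    maxima = simulate_gaussian_maxima(n, trials, seed)
    mean_max = sum(maxima) / trials
    expectation_bound = math.sqrt(2 * math.log(n))  # sigma^2 = 1
    threshold = math.sqrt(2 * (math.log(n) + t))
    tail_freq = sum(m >= threshold for m in maxima) / trials
    return mean_max, expectation_bound, tail_freq, math.exp(-t)


if __name__ == "__main__":
    mean_max, e_bound, tail_freq, p_bound = check_maxima_bounds(100, 2000, 1.0)
    print(f"mean of max: {mean_max:.3f} <= {e_bound:.3f}")
    print(f"tail frequency: {tail_freq:.4f} <= {p_bound:.4f}")
```

For independent Gaussians both bounds hold with room to spare at moderate $n$; what they capture is the $\sqrt{\log n}$ growth of the maximum.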

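Hoeffding's inequality
If the $X_i$ are independent and bounded in $[a_i, b_i]$, then for $t \ge 0$,
$$P\Big(\frac{1}{n}\sum_{i=1}^n (X_i - \mathbb{E}[X_i]) \ge t\Big) \le \exp\Big(-\frac{2 n t^2}{\frac{1}{n}\sum_{i=1}^n (b_i - a_i)^2}\Big),$$
$$P\Big(\frac{1}{n}\sum_{i=1}^n (X_i - \mathbb{E}[X_i]) \le -t\Big) \le \exp\Big(-\frac{2 n t^2}{\frac{1}{n}\sum_{i=1}^n (b_i - a_i)^2}\Big).$$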
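A Monte Carlo sketch comparing the upper-tail bound with simulation, assuming fair coin flips $X_i \sim \mathrm{Bernoulli}(1/2)$ in $[0, 1]$ (so every $(b_i - a_i)^2 = 1$). The parameters, seed, and helper names are illustrative choices.

```python
import math
import random


def hoeffding_bound(n: int, t: float, widths_sq_avg: float) -> float:
    # exp(-2 n t^2 / ((1/n) sum (b_i - a_i)^2)); widths_sq_avg is that average.
    return math.exp(-2 * n * t**2 / widths_sq_avg)


def empirical_upper_tail(n: int, t: float, p: float, trials: int, seed: int = 0) -> float:
    # Fraction of trials where the mean of n Bernoulli(p) draws exceeds p by at least t.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        hits += mean - p >= t - 1e-12  # slack guards against floating-point rounding
    return hits / trials


if __name__ == "__main__":
    n, t, p = 100, 0.1, 0.5
    bound = hoeffding_bound(n, t, widths_sq_avg=1.0)
    freq = empirical_upper_tail(n, t, p, trials=5000)
    print(f"empirical {freq:.4f} <= Hoeffding bound {bound:.4f}")
```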

Equivalent definitions of sub-gaussianity
Theorem. The following are equivalent (up to constants):
(i) $\mathbb{E}[\exp(X^2/\sigma^2)] \le e$;
(ii) $\mathbb{E}[|X|^k]^{1/k} \le \sigma\sqrt{k}$ for all $k \ge 1$;
(iii) $P(|X| \ge t) \le \exp(-t^2/\sigma^2)$ for all $t \ge 0$.
If in addition $X$ is mean-zero, then (i) to (iii) above are also equivalent to
(iv) $X$ is $\sigma^2$-sub-gaussian.
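As a sanity check of condition (ii) with $\sigma = 1$, the sketch below uses the standard Gaussian absolute-moment formula $\mathbb{E}|Z|^k = 2^{k/2}\Gamma(\tfrac{k+1}{2})/\sqrt{\pi}$ and verifies $\mathbb{E}[|Z|^k]^{1/k} \le \sqrt{k}$ for $k$ from 1 to 50. The helper name is hypothetical.

```python
import math


def gaussian_abs_moment_norm(k: int) -> float:
    # For Z ~ N(0, 1): E|Z|^k = 2^{k/2} Gamma((k + 1) / 2) / sqrt(pi).
    # Work in logs (math.lgamma) to avoid overflow for large k, then take the k-th root.
    log_moment = 0.5 * k * math.log(2) + math.lgamma((k + 1) / 2) - 0.5 * math.log(math.pi)
    return math.exp(log_moment / k)


# Condition (ii) with sigma = 1: E[|Z|^k]^{1/k} / sqrt(k) <= 1.
ratios = [gaussian_abs_moment_norm(k) / math.sqrt(k) for k in range(1, 51)]
assert all(r <= 1.0 for r in ratios)
```

The ratios settle near $1/\sqrt{e}$ for large $k$, consistent with the $\sqrt{k}$ growth in (ii).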

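Sub-exponential random variables
Definition (sub-exponential). A mean-zero random variable $X$ is $(\tau^2, b)$-sub-exponential if
$$\mathbb{E}[\exp(\lambda X)] \le \exp\Big(\frac{\lambda^2 \tau^2}{2}\Big) \quad \text{for } |\lambda| \le \frac{1}{b}.$$
Example: exponential random variable, with density $p(x) = e^{-x}$ for $x \ge 0$.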
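A worked computation for the exponential example, applied to the centered variable $X - 1$ (since $\mathbb{E}[X] = 1$):

```latex
\mathbb{E}\big[e^{\lambda (X - 1)}\big]
 = e^{-\lambda} \int_0^\infty e^{\lambda x} e^{-x} \, dx
 = \begin{cases} \dfrac{e^{-\lambda}}{1 - \lambda} & \lambda < 1, \\[4pt] +\infty & \lambda \ge 1, \end{cases}
\qquad
\log \frac{e^{-\lambda}}{1 - \lambda} = \sum_{k \ge 2} \frac{\lambda^k}{k}
 \le \lambda^2 \sum_{k \ge 2} \frac{2^{2-k}}{k} = 4\Big(\log 2 - \frac{1}{2}\Big)\lambda^2 \le \lambda^2
 \quad \text{for } |\lambda| \le \tfrac{1}{2}.
```

The MGF is infinite for $\lambda \ge 1$, so $X - 1$ is not sub-gaussian, but the bound on the right shows it is $(2, 2)$-sub-exponential.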

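Sub-exponential random variables
Example: $\chi^2$ random variable. Let $Z \sim N(0, \sigma^2)$ and $X = Z^2$. Then
$$\mathbb{E}[e^{\lambda X}] = \frac{1}{[1 - 2\lambda\sigma^2]_+^{1/2}},$$
interpreted as $+\infty$ when $\lambda \ge 1/(2\sigma^2)$, so the MGF is finite only for $\lambda < 1/(2\sigma^2)$.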
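A numerical check that the centered $\chi^2$ variable is sub-exponential, assuming $\sigma = 1$ and the parameter choice $(\tau^2, b) = (4, 4)$ (one valid choice, not taken from the slides). From the MGF above, $\log \mathbb{E}[e^{\lambda(Z^2 - 1)}] = -\lambda - \tfrac{1}{2}\log(1 - 2\lambda)$ for $\lambda < 1/2$, and the definition asks for this to be at most $\lambda^2\tau^2/2 = 2\lambda^2$ when $|\lambda| \le 1/4$.

```python
import math


def centered_chi2_log_mgf(lam: float) -> float:
    # log E[exp(lam (Z^2 - 1))] for Z ~ N(0, 1), valid for lam < 1/2:
    # -lam - (1/2) log(1 - 2 lam); log1p keeps accuracy for small lam.
    return -lam - 0.5 * math.log1p(-2 * lam)


# Sub-exponential condition with (tau^2, b) = (4, 4): log MGF <= 2 lam^2 for |lam| <= 1/4.
grid = [k / 400 for k in range(-100, 101)]
assert all(centered_chi2_log_mgf(lam) <= 2 * lam**2 for lam in grid)
```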

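Concentration of sub-exponentials
Theorem. Let $X$ be $(\tau^2, b)$-sub-exponential. Then
$$P(X \ge \mathbb{E}[X] + t) \le \begin{cases} e^{-t^2/(2\tau^2)} & \text{if } 0 \le t \le \tau^2/b, \\ e^{-t/(2b)} & \text{if } t \ge \tau^2/b \end{cases} \;=\; \max\Big\{e^{-t^2/(2\tau^2)},\, e^{-t/(2b)}\Big\}.$$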
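A direct transcription of the two-regime bound as a function (the name `subexp_tail` is hypothetical). At the crossover $t = \tau^2/b$ both branches equal $e^{-\tau^2/(2b^2)}$, which is why the piecewise form coincides with the maximum of the two exponentials.

```python
import math


def subexp_tail(t: float, tau_sq: float, b: float) -> float:
    """Upper bound on P(X >= E[X] + t) for a (tau^2, b)-sub-exponential X, t >= 0."""
    if t <= tau_sq / b:
        return math.exp(-t**2 / (2 * tau_sq))  # Gaussian-like regime
    return math.exp(-t / (2 * b))  # exponential regime


if __name__ == "__main__":
    for t in [0.5, 2.0, 8.0]:
        print(t, subexp_tail(t, tau_sq=4.0, b=2.0))
```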

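Sums of sub-exponential random variables
Let $X_i$ be independent $(\tau_i^2, b_i)$-sub-exponential random variables. Then $\sum_{i=1}^n X_i$ is $\big(\sum_{i=1}^n \tau_i^2, b_*\big)$-sub-exponential, where $b_* = \max_i b_i$.
Corollary: if the $X_i$ satisfy the above, then
$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^n (X_i - \mathbb{E}[X_i])\Big| \ge t\Big) \le 2\exp\Big(-\min\Big\{\frac{n t^2}{2 \cdot \frac{1}{n}\sum_{i=1}^n \tau_i^2},\, \frac{n t}{2 b_*}\Big\}\Big).$$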
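A sketch that computes the parameters of the sum and the two-sided corollary bound; the helper names and the cap at 1 are our choices. For small $t$ the Gaussian-like term sets the rate, and for large $t$ the linear term does.

```python
import math


def sum_subexp_params(taus_sq: list[float], bs: list[float]) -> tuple[float, float]:
    # Independent (tau_i^2, b_i)-sub-exponential summands: the sum is
    # (sum tau_i^2, max b_i)-sub-exponential.
    return sum(taus_sq), max(bs)


def subexp_mean_two_sided(taus_sq: list[float], bs: list[float], t: float) -> float:
    # 2 exp(-min{ n t^2 / (2 * (1/n) sum tau_i^2), n t / (2 b_*) }), capped at 1.
    n = len(taus_sq)
    tau_sum, b_star = sum_subexp_params(taus_sq, bs)
    exponent = min(n * t**2 / (2 * tau_sum / n), n * t / (2 * b_star))
    return min(1.0, 2 * math.exp(-exponent))


if __name__ == "__main__":
    taus_sq, bs = [1.0] * 100, [1.0] * 100
    for t in [0.5, 2.0]:
        print(t, subexp_mean_two_sided(taus_sq, bs, t))
```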

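Bernstein conditions and sub-exponentials
Suppose $X$ is mean-zero with $|\mathbb{E}[X^k]| \le \frac{1}{2} k!\, \sigma^2 b^{k-2}$ for $k = 2, 3, \ldots$. Then for $|\lambda| < 1/b$,
$$\mathbb{E}[e^{\lambda X}] \le \exp\Big(\frac{\lambda^2 \sigma^2}{2(1 - b|\lambda|)}\Big).$$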
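A proof sketch (the standard power-series argument, not transcribed from the slides): expand the exponential, use $\mathbb{E}[X] = 0$, bound each term with the Bernstein condition, sum the geometric series for $|\lambda| < 1/b$, and finish with $1 + x \le e^x$:

```latex
\mathbb{E}[e^{\lambda X}]
 = 1 + \sum_{k \ge 2} \frac{\lambda^k \mathbb{E}[X^k]}{k!}
 \le 1 + \frac{\lambda^2 \sigma^2}{2} \sum_{k \ge 2} (b|\lambda|)^{k-2}
 = 1 + \frac{\lambda^2 \sigma^2}{2(1 - b|\lambda|)}
 \le \exp\Big(\frac{\lambda^2 \sigma^2}{2(1 - b|\lambda|)}\Big).
```

For $|\lambda| \le 1/(2b)$ the denominator satisfies $1 - b|\lambda| \ge 1/2$, so the bound is at most $\exp(\lambda^2 \sigma^2)$, i.e. $X$ is $(2\sigma^2, 2b)$-sub-exponential.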

Johnson-Lindenstrauss and high-dimensional embedding
Question: let $u_1, \ldots, u_m \in \mathbb{R}^d$ be arbitrary. Can we find a mapping $F : \mathbb{R}^d \to \mathbb{R}^n$, $n \ll d$, such that
$$(1 - \epsilon)\|u_i - u_j\|_2^2 \le \|F(u_i) - F(u_j)\|_2^2 \le (1 + \epsilon)\|u_i - u_j\|_2^2 \quad \text{for all } i, j?$$
Theorem (Johnson-Lindenstrauss embedding). For $n \gtrsim \frac{1}{\epsilon^2}\log m$, such a mapping exists.
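A minimal sketch of the usual randomized construction, assuming (as in the standard proof; the matrix is not spelled out in these slides) $F(u) = Xu/\sqrt{n}$ with $X \in \mathbb{R}^{n \times d}$ having i.i.d. $N(0, 1)$ entries. The target dimension $n = \lceil 16 \log m / \epsilon^2 \rceil$ comes from a union bound over pairs applied to the tail bound $2\exp(-nt^2/8)$ used in the proof. Since each draw then succeeds with positive probability, the sketch redraws $X$ until every pair is preserved; this retry loop and all function names are our illustrative choices.

```python
import math
import random


def jl_dimension(m: int, eps: float) -> int:
    # Union bound over the m(m - 1)/2 pairs with tail bound 2 exp(-n eps^2 / 8), eps in (0, 1]:
    # failure probability < m^2 exp(-n eps^2 / 8) <= 1 once n >= 16 ln(m) / eps^2.
    return max(1, math.ceil(16 * math.log(m) / eps**2))


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_map(points, n, rng):
    # F(u) = X u / sqrt(n), where X is n x d with i.i.d. N(0, 1) entries (assumed construction).
    d = len(points[0])
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    scale = 1.0 / math.sqrt(n)
    return [[scale * sum(x * u for x, u in zip(row, p)) for row in X] for p in points]


def preserves_distances(points, images, eps) -> bool:
    # True if every pairwise squared distance is kept within a factor in [1 - eps, 1 + eps].
    m = len(points)
    for i in range(m):
        for j in range(i + 1, m):
            orig = sq_dist(points[i], points[j])
            new = sq_dist(images[i], images[j])
            if not (1 - eps) * orig <= new <= (1 + eps) * orig:
                return False
    return True


def jl_embed(points, eps, seed=0, max_tries=50):
    # Redraw the random matrix until the embedding preserves all pairwise distances.
    rng = random.Random(seed)
    n = jl_dimension(len(points), eps)
    for _ in range(max_tries):
        images = gaussian_map(points, n, rng)
        if preserves_distances(points, images, eps):
            return images, n
    raise RuntimeError("no valid embedding found; increase max_tries")


if __name__ == "__main__":
    data_rng = random.Random(1)
    pts = [[data_rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(10)]
    images, n = jl_embed(pts, eps=0.5)
    print(f"embedded {len(pts)} points from d=500 into n={n} dimensions")
```

The map is linear, so $F(u_i) - F(u_j) = X(u_i - u_j)/\sqrt{n}$, and only the differences matter; the target dimension depends on $m$ and $\epsilon$ but not on $d$.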

Proof of Johnson-Lindenstrauss (continued)
$$P\Big(\Big|\frac{\|X u\|_2^2}{n \|u\|_2^2} - 1\Big| \ge t\Big) \le 2\exp\Big(-\frac{n t^2}{8}\Big) \quad \text{for } t \in [0, 1].$$
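One way to finish the argument from this tail bound, assuming $F(u) = Xu/\sqrt{n}$ and $\epsilon \in (0, 1]$: apply the bound with $t = \epsilon$ to each difference $u = u_i - u_j$ (by linearity, $\|F(u_i) - F(u_j)\|_2^2 / \|u_i - u_j\|_2^2 = \|Xu\|_2^2/(n\|u\|_2^2)$) and take a union bound over the $\binom{m}{2}$ pairs:

```latex
P\big(\text{some pair } i < j \text{ violates the two-sided bound}\big)
 \le \binom{m}{2} \cdot 2\exp\Big(-\frac{n\epsilon^2}{8}\Big)
 < m^2 \exp\Big(-\frac{n\epsilon^2}{8}\Big)
 \le 1
 \quad \text{whenever } n \ge \frac{16 \log m}{\epsilon^2}.
```

So a random $X$ works with positive probability, and a valid mapping exists for $n \gtrsim \epsilon^{-2}\log m$, as the theorem states.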

