STAT 200C: High-dimensional Statistics


1 STAT 200C: High-dimensional Statistics. Arash A. Amini. May 30, 2018. 1 / 59

2 Classical case: n ≫ d. Asymptotic assumption: d is fixed and n → ∞. Basic tools: LLN and CLT.
High-dimensional setting: n ≍ d, e.g. n/d → γ, or even d ≫ n, e.g. gene-expression data with only 50 samples. Classical methods fail.
E.g., linear regression y = Xβ + ε, where ε ∼ N(0, σ² I_n):
β̂_OLS = argmin_{β ∈ R^d} ‖y − Xβ‖²₂.
We have MSE(β̂_OLS) = O(σ²d/n).
Solution: assume some underlying low-dimensional structure (e.g. sparsity). 2 / 59
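To see the σ²d/n scaling of the OLS error concretely, here is a minimal numpy sketch (not from the slides; the dimensions, noise level, and replicate counts are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 20, 1.0
beta = rng.normal(size=d)

for n in [50, 200, 800, 3200]:
    errs = []
    for _ in range(200):
        X = rng.normal(size=(n, d))
        y = X @ beta + sigma * rng.normal(size=n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimate
        errs.append(np.sum((beta_hat - beta) ** 2))
    # empirical MSE vs. the sigma^2 d / n rate from the slide
    print(n, np.mean(errs), sigma**2 * d / n)
```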

3 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 3 / 59

4 Concentration inequalities
Main tools for dealing with high-dimensional randomness. Non-asymptotic versions of the CLT.
General form: P(|X − EX| > t) ≤ something small.
Classical examples: Markov and Chebyshev inequalities.
Markov: assume X ≥ 0; then P(X ≥ t) ≤ EX / t.
Chebyshev: assume EX² < ∞, and let µ = EX. Then P(|X − µ| ≥ t) ≤ var(X) / t².
Stronger assumption: E|X|^k < ∞. Then P(|X − µ| ≥ t) ≤ E|X − µ|^k / t^k. 4 / 59
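As a quick numerical illustration (my addition, using an arbitrary Exponential(1) variable with EX = var(X) = 1), the Markov and Chebyshev bounds can be compared against empirical tail probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=10**6)      # EX = 1, var(X) = 1
for t in [2.0, 4.0, 8.0]:
    emp = np.mean(x >= t)                        # empirical P(X >= t)
    markov = 1.0 / t                             # Markov: EX / t
    cheby = 1.0 / (t - 1.0) ** 2                 # Chebyshev: var(X) / (t - EX)^2
    print(t, emp, markov, cheby)
```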

5 Concentration inequalities
Example 1. X_1, ..., X_n ∼ Ber(1/2) i.i.d. and S_n = Σ_{i=1}^n X_i. Then, by the CLT,
Z_n := (S_n − n/2) / √(n/4) →d N(0, 1).
Letting g ∼ N(0, 1),
P(S_n ≥ n/2 + √(n/4) t) ≈ P(g ≥ t) ≤ (1/2) e^{−t²/2}.
Letting t = α√n,
P(S_n ≥ (n/2)(1 + α)) ⪅ (1/2) e^{−nα²/2}.
Problem: the approximation is not tight in general. 5 / 59
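A small simulation (a sketch with arbitrary n, t values, and replicate count) comparing the empirical tail of S_n with the Gaussian-approximation bound (1/2)e^{−t²/2} used above:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 1000, 10**5
S = rng.binomial(n, 0.5, size=reps)              # S_n for Ber(1/2) variables
for t in [1.0, 2.0, 3.0]:
    emp = np.mean(S >= n / 2 + np.sqrt(n / 4) * t)
    gauss = 0.5 * np.exp(-t**2 / 2)              # bound on P(g >= t)
    print(t, emp, gauss)
```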

6 Theorem 1 (Berry-Esseen CLT)
Under the assumptions of the CLT, with ρ = E|X_1 − µ|³ / σ³,
|P(Z_n ≤ t) − P(g ≤ t)| ≤ ρ / √n, for all t.
The bound is tight, since P(S_n = n/2) = binom(n, n/2) 2^{−n} ≳ n^{−1/2} for the Bernoulli example.
Conclusion: the approximation error is O(n^{−1/2}), which is a lot larger than the exponential bound O(e^{−nα²/2}) that we want to establish.
Solution: directly obtain concentration inequalities, often using the Chernoff bounding technique: for any λ > 0,
P(Z_n ≥ t) = P(e^{λZ_n} ≥ e^{λt}) ≤ E e^{λZ_n} / e^{λt}, t ∈ R.
Leads to the study of the MGF of random variables. 6 / 59

7 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 7 / 59

8 Sub-Gaussian concentration
Definition 1. A zero-mean random variable X is sub-Gaussian if, for some σ > 0,
E e^{λX} ≤ e^{σ²λ²/2}, for all λ ∈ R. (1)
A general random variable is sub-Gaussian if X − EX is sub-Gaussian.
X ∼ N(0, σ²) satisfies (1) with equality.
A Rademacher variable (also called symmetric Bernoulli), P(X = ±1) = 1/2, is sub-Gaussian:
E e^{λX} = cosh(λ) ≤ e^{λ²/2}.
Any bounded RV is sub-Gaussian: X ∈ [a, b] a.s. implies (1) with σ = (b − a)/2. 8 / 59
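A one-line numerical check (my addition) of the Rademacher MGF bound cosh(λ) ≤ e^{λ²/2} over a grid of λ values:

```python
import numpy as np

lam = np.linspace(-10, 10, 2001)
# cosh(lambda) = E exp(lambda X) for Rademacher X; compare with exp(lambda^2 / 2)
print(np.all(np.cosh(lam) <= np.exp(lam**2 / 2)))   # prints True
```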

9 Proposition 1
Assume that X is zero-mean sub-Gaussian satisfying (1). Then,
P(X ≥ t) ≤ exp(−t² / (2σ²)), for all t ≥ 0.
The same bound holds with X replaced by −X.
Proof: Chernoff bound:
P(X ≥ t) ≤ inf_{λ>0} [e^{−λt} E e^{λX}] = inf_{λ>0} exp(−λt + λ²σ²/2).
A union bound gives the two-sided bound: P(|X| ≥ t) ≤ 2 exp(−t² / (2σ²)).
What if µ := EX ≠ 0? Apply to X − µ:
P(|X − µ| ≥ t) ≤ 2 exp(−t² / (2σ²)). 9 / 59

10 Proposition 2
Assume that {X_i} are independent, zero-mean sub-Gaussian with parameters {σ_i}. Then S_n = Σ_i X_i is sub-Gaussian with parameter σ := (Σ_i σ_i²)^{1/2}.
The sub-Gaussian parameter squared behaves like the variance.
Proof: E e^{λS_n} = Π_i E e^{λX_i}. 10 / 59

11 Theorem 2 (Hoeffding)
Assume that {X_i} are independent, zero-mean sub-Gaussian with parameters {σ_i}. Then, letting σ² := Σ_i σ_i²,
P(Σ_i X_i ≥ t) ≤ exp(−t² / (2σ²)), t ≥ 0.
The same bound holds with Σ_i X_i replaced by −Σ_i X_i.
Alternative form: assume there are n variables, and let σ̄² := (1/n) Σ_{i=1}^n σ_i² and X̄_n := (1/n) Σ_{i=1}^n X_i. Then,
P(X̄_n ≥ t) ≤ exp(−n t² / (2σ̄²)), t ≥ 0.
Example: X_i i.i.d. Rademacher, so that σ̄ = σ_i = 1. 11 / 59
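For the Rademacher example, the following sketch (arbitrary n, t values, and replicate count) compares the empirical tail of X̄_n with Hoeffding's bound e^{−nt²/2}:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 10**5
X = rng.choice([-1.0, 1.0], size=(reps, n))      # i.i.d. Rademacher variables
xbar = X.mean(axis=1)
for t in [0.1, 0.2, 0.3]:
    print(t, np.mean(xbar >= t), np.exp(-n * t**2 / 2))
```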

12 Equivalent characterizations of sub-Gaussianity
For a RV X, the following are equivalent (HDP, Prop. 2.5.2):
1. The tails of X satisfy P(|X| ≥ t) ≤ 2 exp(−t² / K_1²), for all t ≥ 0.
2. The moments of X satisfy ‖X‖_p = (E|X|^p)^{1/p} ≤ K_2 √p, for all p ≥ 1.
3. The MGF of X² satisfies E exp(λ²X²) ≤ exp(K_3² λ²), for all |λ| ≤ 1/K_3.
4. The MGF of X² is bounded at some point: E exp(X² / K_4²) ≤ 2.
Assuming EX = 0, the above are equivalent to:
5. The MGF of X satisfies E exp(λX) ≤ exp(K_5² λ²), for all λ ∈ R. 12 / 59

13 Sub-Gaussian norm
The sub-Gaussian norm is the smallest K_4 in property 4, i.e.,
‖X‖_{ψ2} = inf{ t > 0 : E exp(X²/t²) ≤ 2 }.
X is sub-Gaussian iff ‖X‖_{ψ2} < ∞. ‖·‖_{ψ2} is a proper norm on the space of sub-Gaussian RVs.
Every sub-Gaussian variable satisfies the following bounds:
P(|X| ≥ t) ≤ 2 exp(−c t² / ‖X‖²_{ψ2}), for all t ≥ 0.
‖X‖_p ≤ C ‖X‖_{ψ2} √p, for all p ≥ 1.
E exp(X² / ‖X‖²_{ψ2}) ≤ 2.
When EX = 0, E exp(λX) ≤ exp(C λ² ‖X‖²_{ψ2}) for all λ ∈ R.
Here C, c > 0 are universal constants. 13 / 59

14 Some consequences
Recall what a universal/numerical/absolute constant means.
The sub-Gaussian norm is within a constant factor of the sub-Gaussian parameter σ: for numerical constants c_1, c_2 > 0,
c_1 ‖X‖_{ψ2} ≤ σ(X) ≤ c_2 ‖X‖_{ψ2}.
Easy to see that ‖X‖_{ψ2} ≲ ‖X‖_∞. (Bounded variables are sub-Gaussian.)
a ≲ b means a ≤ Cb for some universal constant C.
Lemma 1 (Centering). If X is sub-Gaussian, then X − EX is sub-Gaussian too and ‖X − EX‖_{ψ2} ≤ C ‖X‖_{ψ2}, where C is a universal constant.
Proof: ‖EX‖_{ψ2} ≲ |EX| ≤ E|X| = ‖X‖_1 ≲ ‖X‖_{ψ2}.
Note: ‖X − EX‖_{ψ2} could be much smaller than ‖X‖_{ψ2}. 14 / 59

15 Alternative forms
Alternative form of Proposition 2:
Proposition 3 (HDP 2.6.1). Assume that {X_i} are independent, zero-mean sub-Gaussian RVs. Then Σ_i X_i is also sub-Gaussian and
‖Σ_i X_i‖²_{ψ2} ≤ C Σ_i ‖X_i‖²_{ψ2},
where C is an absolute constant. 15 / 59

16 Alternative form of Theorem 2:
Theorem 3 (Hoeffding). Assume that {X_i} are independent, zero-mean sub-Gaussian RVs. Then,
P(|Σ_i X_i| ≥ t) ≤ 2 exp(−c t² / Σ_i ‖X_i‖²_{ψ2}), t ≥ 0,
where c > 0 is some universal constant. 16 / 59

17 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 17 / 59

18 Sub-exponential concentration
Definition 2. A zero-mean random variable X is sub-exponential if, for some ν, α > 0,
E e^{λX} ≤ e^{ν²λ²/2}, for all |λ| < 1/α. (2)
A general random variable is sub-exponential if X − EX is sub-exponential.
If Z ∼ N(0, 1), then Z² is sub-exponential:
E e^{λ(Z²−1)} = e^{−λ} / √(1 − 2λ) for λ < 1/2, and = ∞ for λ ≥ 1/2.
We have E e^{λ(Z²−1)} ≤ e^{4λ²/2} for |λ| < 1/4, hence sub-exponential with parameters (2, 4).
The tails of Z² − 1 are heavier than Gaussian. 18 / 59
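The closed form E e^{λ(Z²−1)} = e^{−λ}/√(1 − 2λ) and the sub-exponential bound e^{2λ²} for |λ| ≤ 1/4 can be checked numerically; this is a sketch using Monte Carlo for the left-hand side:

```python
import numpy as np

rng = np.random.default_rng(4)
z = rng.normal(size=10**6)
for lam in [-0.2, 0.1, 0.24]:
    mc = np.mean(np.exp(lam * (z**2 - 1)))        # Monte Carlo estimate of the MGF
    exact = np.exp(-lam) / np.sqrt(1 - 2 * lam)   # closed form, valid for lam < 1/2
    bound = np.exp(2 * lam**2)                    # sub-exponential bound for |lam| <= 1/4
    print(lam, mc, exact, bound)
```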

19 Proposition 4
Assume that X is zero-mean sub-exponential satisfying (2). Then,
P(X ≥ t) ≤ exp(−(1/2) min{t²/ν², t/α}), for all t ≥ 0.
The same bound holds with X replaced by −X.
Proof: Chernoff bound:
P(X ≥ t) ≤ inf_{λ ≥ 0} [e^{−λt} E e^{λX}] ≤ inf_{0 ≤ λ < 1/α} exp(−λt + λ²ν²/2).
Let f(λ) = −λt + λ²ν²/2. The minimizer of f over R is λ = t/ν². 19 / 59

20 Hence the minimizer of f over [0, 1/α) is
λ* = t/ν² if t/ν² < 1/α, and λ* = 1/α if t/ν² ≥ 1/α,
and the minimum is
f(λ*) = −t²/(2ν²) if t < ν²/α, and f(λ*) = −t/α + ν²/(2α²) ≤ −t/(2α) if t ≥ ν²/α.
Thus, f(λ*) ≤ −min{t²/(2ν²), t/(2α)} = −(1/2) min{t²/ν², t/α}. 20 / 59

21 Bernstein inequality for sub-exponential RVs
Theorem 4 (Bernstein). Assume that {X_i} are independent, zero-mean sub-exponential RVs with parameters (ν_i, α_i). Let ν := (Σ_i ν_i²)^{1/2} and α := max_i α_i. Then Σ_i X_i is sub-exponential with parameters (ν, α), and
P(Σ_i X_i ≥ t) ≤ exp(−(1/2) min{t²/ν², t/α}).
Proof: We have E e^{λX_i} ≤ e^{λ²ν_i²/2}, for all |λ| < 1/max_i α_i.
Let S_n = Σ_i X_i. By independence,
E e^{λS_n} = Π_i E e^{λX_i} ≤ e^{λ² Σ_i ν_i² / 2}, for all |λ| < 1/max_i α_i.
The tail bound follows from Proposition 4. 21 / 59

22 Equivalent characterizations of sub-exponential RVs
For a RV X, the following are equivalent (HDP, Prop. 2.7.1):
1. The tails of X satisfy P(|X| ≥ t) ≤ 2 exp(−t/K_1), for all t ≥ 0.
2. The moments of X satisfy ‖X‖_p = (E|X|^p)^{1/p} ≤ K_2 p, for all p ≥ 1.
3. The MGF of |X| satisfies E exp(λ|X|) ≤ exp(K_3 λ), for all 0 ≤ λ ≤ 1/K_3.
4. The MGF of |X| is bounded at some point: E exp(|X|/K_4) ≤ 2.
Assuming EX = 0, the above are equivalent to:
5. The MGF of X satisfies E exp(λX) ≤ exp(K_5² λ²), for all |λ| ≤ 1/K_5. 22 / 59

23 Equivalent characterizations of sub-Gaussianity
For a RV X, the following are equivalent (HDP, Prop. 2.5.2):
1. The tails of X satisfy P(|X| ≥ t) ≤ 2 exp(−t²/K_1²), for all t ≥ 0.
2. The moments of X satisfy ‖X‖_p = (E|X|^p)^{1/p} ≤ K_2 √p, for all p ≥ 1.
3. The MGF of X² satisfies E exp(λ²X²) ≤ exp(K_3² λ²), for all |λ| ≤ 1/K_3.
4. The MGF of X² is bounded at some point: E exp(X²/K_4²) ≤ 2.
Assuming EX = 0, the above are equivalent to:
5. The MGF of X satisfies E exp(λX) ≤ exp(K_5² λ²), for all λ ∈ R. 23 / 59

24 Sub-exponential norm
The sub-exponential norm is the smallest K_4 in property 4, i.e.,
‖X‖_{ψ1} = inf{ t > 0 : E exp(|X|/t) ≤ 2 }.
X is sub-exponential iff ‖X‖_{ψ1} < ∞. ‖·‖_{ψ1} is a proper norm on the space of sub-exponential RVs.
Every sub-exponential variable satisfies the following bounds:
P(|X| ≥ t) ≤ 2 exp(−c t / ‖X‖_{ψ1}), for all t ≥ 0.
‖X‖_p ≤ C ‖X‖_{ψ1} p, for all p ≥ 1.
E exp(|X| / ‖X‖_{ψ1}) ≤ 2.
When EX = 0, E exp(λX) ≤ exp(C λ² ‖X‖²_{ψ1}) for all |λ| ≤ 1/‖X‖_{ψ1}.
Here C, c > 0 are universal constants. 24 / 59

25 Lemma 2
A random variable X is sub-Gaussian if and only if X² is sub-exponential; in fact, ‖X²‖_{ψ1} = ‖X‖²_{ψ2}.
Proof: Immediate from the definitions.
Lemma 3
If X and Y are sub-Gaussian, then XY is sub-exponential, and ‖XY‖_{ψ1} ≤ ‖X‖_{ψ2} ‖Y‖_{ψ2}.
Proof: Assume ‖X‖_{ψ2} = ‖Y‖_{ψ2} = 1, WLOG. Apply Young's inequality ab ≤ (a² + b²)/2 (for all a, b ∈ R) twice:
E e^{|XY|} ≤ E e^{(X² + Y²)/2} = E[e^{X²/2} e^{Y²/2}] ≤ (1/2) E[e^{X²} + e^{Y²}] ≤ 2. 25 / 59

26 Alternative form of Theorem 4:
Theorem 5 (Bernstein). Assume that {X_i} are independent, zero-mean sub-exponential RVs. Then,
P(|Σ_i X_i| ≥ t) ≤ 2 exp(−c min{ t² / Σ_i ‖X_i‖²_{ψ1}, t / max_i ‖X_i‖_{ψ1} }), t ≥ 0,
where c > 0 is some universal constant.
Corollary 1 (Bernstein). Assume that {X_i} are independent, zero-mean sub-exponential RVs with ‖X_i‖_{ψ1} ≤ K for all i. Then,
P(|(1/n) Σ_{i=1}^n X_i| ≥ t) ≤ 2 exp(−c n min{ t²/K², t/K }), t ≥ 0. 26 / 59

27 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 27 / 59

28 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 28 / 59

29 Concentration of χ² RVs I
Example 2. Let Y ∼ χ²_n, i.e., Y = Σ_{i=1}^n Z_i² where the Z_i are i.i.d. N(0, 1).
The Z_i² are sub-exponential with parameters (2, 4). Then Y is sub-exponential with parameters (2√n, 4), and we obtain
P(|Y − EY| ≥ t) ≤ 2 exp(−(1/2) min{ t²/(4n), t/4 }),
or, replacing t with nt,
P(|(1/n) Σ_{i=1}^n Z_i² − 1| ≥ t) ≤ 2 exp(−(n/8) min{t², t}), t ≥ 0. 29 / 59

30 Concentration of χ² RVs II
In particular,
P(|(1/n) Σ_{i=1}^n Z_i² − 1| ≥ t) ≤ 2 e^{−nt²/8}, t ∈ [0, 1].
Second approach, ignoring constants: We have
‖Z_i² − 1‖_{ψ1} ≤ C ‖Z_i²‖_{ψ1} = C ‖Z_i‖²_{ψ2} = C.
Applying Corollary 1 with K = C,
P(|(1/n) Σ_{i=1}^n Z_i² − 1| ≥ t) ≤ 2 exp(−c n min{ t²/C², t/C }) ≤ 2 exp(−c_2 n min{t², t}), t ≥ 0,
where c_2 = c min{1/C², 1/C}. 30 / 59
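A short simulation (my own sketch, with an arbitrary n) of the χ² concentration bound P(|(1/n)ΣZ_i² − 1| ≥ t) ≤ 2e^{−nt²/8} for t ∈ [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 100, 10**5
Z = rng.normal(size=(reps, n))
avg = (Z**2).mean(axis=1)                         # (1/n) sum of Z_i^2
for t in [0.2, 0.4, 0.6]:
    print(t, np.mean(np.abs(avg - 1) >= t), 2 * np.exp(-n * t**2 / 8))
```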

31 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 31 / 59

32 Random projection for dimension reduction
Suppose that we have data points {u_1, ..., u_N} ⊂ R^d. We want to project them down to a lower-dimensional space R^m (m ≪ d) such that the pairwise distances ‖u_i − u_j‖ are approximately preserved.
This can be done by a linear random projection X : R^d → R^m, which can be viewed as a random matrix X ∈ R^{m×d}.
Lemma 4 (Johnson-Lindenstrauss embedding). Let X := (1/√m) Z ∈ R^{m×d}, where Z has i.i.d. N(0, 1) entries. Consider any collection of points {u_1, ..., u_N} ⊂ R^d. Take ε, δ ∈ (0, 1) and assume that
m ≥ (16/δ²) log(N/√ε).
Then, with probability at least 1 − ε,
(1 − δ) ‖u_i − u_j‖²₂ ≤ ‖Xu_i − Xu_j‖²₂ ≤ (1 + δ) ‖u_i − u_j‖²₂, for all i ≠ j. 32 / 59

33 Proof
Fix u ∈ R^d and let
Y := ‖Zu‖²₂ / ‖u‖²₂ = Σ_{i=1}^m ⟨z_i, u/‖u‖₂⟩²,
where z_i^T is the ith row of Z. Then Y ∼ χ²_m.
Recalling X = Z/√m, for all δ ∈ (0, 1),
P(|‖Xu‖²₂ / ‖u‖²₂ − 1| ≥ δ) = P(|Y/m − 1| ≥ δ) ≤ 2 e^{−mδ²/8}.
Applying this to u = u_i − u_j, for any fixed pair (i, j), we have
P(|‖X(u_i − u_j)‖²₂ / ‖u_i − u_j‖²₂ − 1| ≥ δ) ≤ 2 e^{−mδ²/8}. 33 / 59

34 Apply a further union bound over all pairs i ≠ j:
P(|‖X(u_i − u_j)‖²₂ / ‖u_i − u_j‖²₂ − 1| ≥ δ, for some i ≠ j) ≤ 2 binom(N, 2) e^{−mδ²/8}.
Since 2 binom(N, 2) ≤ N², the result follows by solving the following for m:
N² e^{−mδ²/8} ≤ ε. 34 / 59
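A minimal numpy sketch of the Johnson-Lindenstrauss projection X = Z/√m applied to a random point set; the values of N, d, m, and δ below are arbitrary illustrative choices:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
N, d, m, delta = 50, 1000, 400, 0.5
U = rng.normal(size=(N, d))                       # data points u_1, ..., u_N
Z = rng.normal(size=(m, d))
X = Z / np.sqrt(m)                                # JL projection matrix
V = U @ X.T                                       # projected points in R^m

ratios = []
for i, j in combinations(range(N), 2):
    ratios.append(np.sum((V[i] - V[j])**2) / np.sum((U[i] - U[j])**2))
ratios = np.array(ratios)
# all pairwise squared-distance ratios should lie in [1 - delta, 1 + delta]
print(ratios.min(), ratios.max(),
      np.all((ratios >= 1 - delta) & (ratios <= 1 + delta)))
```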

35 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 35 / 59

36 ℓ2 norm of sub-Gaussian vectors
Here ‖X‖₂ = (Σ_{i=1}^n X_i²)^{1/2}.
Proposition 5 (Concentration of norm, HDP 3.1.1). Let X = (X_1, ..., X_n) ∈ R^n be a random vector with independent, sub-Gaussian coordinates X_i that satisfy EX_i² = 1. Then,
‖ ‖X‖₂ − √n ‖_{ψ2} ≤ CK²,
where K = max_i ‖X_i‖_{ψ2} and C is an absolute constant.
The result says that the norm is highly concentrated around √n: ‖X‖₂ ≈ √n in high dimensions (n large).
Assuming K = O(1), it shows that w.h.p. ‖X‖₂ = √n + O(1). More precisely, w.p. at least 1 − 2e^{−c_1 v²}, we have
√n − K²v ≤ ‖X‖₂ ≤ √n + K²v. 36 / 59

37 Simple argument: Assuming sd(X_1²) = O(1),
E‖X‖²₂ = n, var(‖X‖²₂) = n var(X_1²), sd(‖X‖²₂) = √n sd(X_1²).
So ‖X‖²₂ ≈ n ± O(√n), and hence ‖X‖₂ ≈ √(n ± O(√n)) = √n ± O(1), the latter shown by a Taylor expansion. 37 / 59

38 Proof of Proposition 5: Argue that we can take K ≥ 1.
Since X_i is sub-Gaussian, X_i² is sub-exponential, and
‖X_i² − 1‖_{ψ1} ≤ C ‖X_i²‖_{ψ1} = C ‖X_i‖²_{ψ2} ≤ CK².
Applying Bernstein's inequality (Corollary 1), for any u ≥ 0,
P(| ‖X‖²₂/n − 1 | ≥ u) ≤ 2 exp(−(c_1 n / K⁴) min{u², u}),
where we used K⁴ ≥ K² and absorbed C into c_1.
Using the inequality |z − 1| ≥ δ ⟹ |z² − 1| ≥ max{δ, δ²} (valid for z ≥ 0),
P(| ‖X‖₂/√n − 1 | ≥ δ) ≤ P(| ‖X‖²₂/n − 1 | ≥ max{δ, δ²}) ≤ 2 exp(−(c_1 n / K⁴) δ²),
since with f(u) = min{u², u} and g(δ) = max{δ, δ²} we have f(g(δ)) = δ² for all δ ≥ 0.
The change of variable δ = t/√n gives the result. 38 / 59
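Simulation (a sketch with an arbitrary n) illustrating that ‖X‖₂ concentrates around √n with O(1) fluctuations when the coordinates are standard Gaussian:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 10**4, 1000
X = rng.normal(size=(reps, n))                    # coordinates with E X_i^2 = 1
dev = np.linalg.norm(X, axis=1) - np.sqrt(n)
# the norm is of order sqrt(n) = 100, but its fluctuations are O(1)
print(np.sqrt(n), dev.mean(), dev.std(), np.abs(dev).max())
```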

39 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 39 / 59

40 ℓ∞ norm of sub-Gaussian vectors
For any vector x ∈ R^n, the ℓ∞ norm is ‖x‖_∞ = max_{i=1,...,n} |x_i|.
Lemma 5. Let X = (X_1, ..., X_n) ∈ R^n be a random vector with zero-mean, independent, sub-Gaussian coordinates X_i with parameters σ_i. Then, for any γ ≥ 0,
P(‖X‖_∞ ≥ σ √(2(1 + γ) log n)) ≤ 2 n^{−γ},
where σ = max_i σ_i.
Proof: We have P(|X_i| ≥ t) ≤ 2 exp(−t²/(2σ²)), hence
P(max_i |X_i| ≥ t) ≤ 2n exp(−t²/(2σ²)) = 2 n^{−γ},
taking t = √(2σ²(1 + γ) log n). 40 / 59

41 Theorem 6
Assume {X_i}_{i=1}^n are zero-mean RVs, sub-Gaussian with parameter σ. Then,
E[max_{i=1,...,n} X_i] ≤ σ √(2 log n), n ≥ 1.
Proof of Theorem 6: Jensen's inequality applied to e^{λZ}, where Z = max_i X_i. 41 / 59
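A quick Monte Carlo comparison (my addition, σ = 1) of E[max_i X_i] for standard Gaussians against the bound √(2 log n):

```python
import numpy as np

rng = np.random.default_rng(8)
for n in [10, 100, 1000, 10000]:
    X = rng.normal(size=(2000, n))
    emp = X.max(axis=1).mean()                    # Monte Carlo estimate of E[max_i X_i]
    print(n, emp, np.sqrt(2 * np.log(n)))
```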

42 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 42 / 59

43 Theorem 7 (Azuma-Hoeffding)
Assume that X = (X_1, ..., X_n) has independent coordinates, and let Z = f(X). Let us write E_i[Z] = E[Z | X_1, ..., X_i] and let
Δ_i := E_i[Z] − E_{i−1}[Z].
Assume that
E_{i−1}[e^{λΔ_i}] ≤ e^{σ_i²λ²/2}, for all λ ∈ R, (3)
almost surely, for all i = 1, ..., n. Then Z − EZ is sub-Gaussian with parameter σ = (Σ_{i=1}^n σ_i²)^{1/2}. In particular, we have the tail bound
P(|Z − EZ| ≥ t) ≤ 2 exp(−t²/(2σ²)).
{Δ_i} is called Doob's martingale difference sequence. It is a martingale difference sequence since E_{i−1}[Δ_i] = 0. 43 / 59

44 Proof
Let S_j := Σ_{i=1}^j Δ_i, which is a function of X_i, i ≤ j, only. Noting that E_n[Z] = Z and E_0[Z] = EZ, we have
S_n = Σ_{i=1}^n Δ_i = Z − EZ.
By properties of conditional expectation, and assumption (3),
E_{n−1}[e^{λS_n}] = e^{λS_{n−1}} E_{n−1}[e^{λΔ_n}] ≤ e^{λS_{n−1}} e^{σ_n²λ²/2}.
Taking E_{n−2} of both sides:
E_{n−2}[e^{λS_n}] ≤ e^{σ_n²λ²/2} E_{n−2}[e^{λS_{n−1}}] ≤ e^{λS_{n−2}} e^{(σ_n² + σ_{n−1}²)λ²/2}.
Repeating the process, we get E_0[e^{λS_n}] ≤ exp((Σ_{i=1}^n σ_i²) λ²/2). 44 / 59

45 Bounded difference inequality
The conditional sub-Gaussian assumption holds under the bounded difference property:
|f(x_1, ..., x_{i−1}, x_i, x_{i+1}, ..., x_n) − f(x_1, ..., x_{i−1}, x_i', x_{i+1}, ..., x_n)| ≤ L_i (4)
for all x_1, ..., x_n, x_i' ∈ X and all i ∈ [n], for some constants (L_1, ..., L_n).
Theorem 8 (Bounded difference). Assume that X = (X_1, ..., X_n) has independent coordinates, and assume that f : X^n → R satisfies the bounded difference property (4). Then,
P(|f(X) − Ef(X)| ≥ t) ≤ 2 exp(−2t² / Σ_{i=1}^n L_i²), t ≥ 0. 45 / 59

46 Proof (naive bound)
We have
Δ_i = E_i[Z] − E_{i−1}[E_i[Z]] = g_i(X_1, ..., X_i) − E_{i−1}[g_i(X_1, ..., X_i)].
Let X_i' be an independent copy of X_i. Conditioned on X_1, ..., X_{i−1}, we are effectively looking at
g_i(x_1, ..., x_{i−1}, X_i) − E[g_i(x_1, ..., x_{i−1}, X_i')],
due to the independence of {X_1, ..., X_i, X_i'}. Thus |Δ_i| ≤ L_i conditioned on X_1, ..., X_{i−1}. That is,
E_{i−1}[e^{λΔ_i}] ≤ e^{σ_i²λ²/2}, where σ_i² = (2L_i)²/4 = L_i². 46 / 59

47 Proof (better bound)
One can show that Δ_i ∈ I_i where |I_i| ≤ L_i, improving the constant by a factor of 4.
Conditioned on X_1, ..., X_{i−1}, we are effectively looking at Δ_i = g_i(x_1, ..., x_{i−1}, X_i) − µ_i, where µ_i is a constant (only a function of x_1, ..., x_{i−1}). Then Δ_i + µ_i ∈ [a_i, b_i], where
a_i = inf_x g_i(x_1, ..., x_{i−1}, x), b_i = sup_x g_i(x_1, ..., x_{i−1}, x).
We have (need to argue that g_i satisfies bounded differences)
b_i − a_i = sup_{x,y} [g_i(x_1, ..., x_{i−1}, x) − g_i(x_1, ..., x_{i−1}, y)] ≤ L_i.
Thus E_{i−1}[e^{λΔ_i}] ≤ e^{σ_i²λ²/2}, where σ_i² = (b_i − a_i)²/4 ≤ L_i²/4. 47 / 59

48 The role of independence in the second argument is subtle. The only place we used independence is to argue that E_i[Z] satisfies the bounded difference property for all i: we argue that E_i[Z] = g_i(X_1, ..., X_i), which is where independence is used. Then g_i, by its definition and Jensen's inequality, satisfies the bounded difference property. 48 / 59

49 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 49 / 59

50 Example: (bounded) U-statistics
Let g : R² → R be a symmetric function and X_1, ..., X_n an i.i.d. sequence. Then
U := (1 / binom(n, 2)) Σ_{i<j} g(X_i, X_j)
is called a U-statistic (of order 2).
U is not a sum of independent variables; e.g., n = 3 gives
U = (1/3) (g(X_1, X_2) + g(X_1, X_3) + g(X_2, X_3)),
but the dependence between the terms is relatively weak (made precise shortly).
For example, g(x, y) = (x − y)²/2 gives an unbiased estimator of the variance. (Exercise; see the sketch below.) 50 / 59
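A small check (my addition, with arbitrary data) that the kernel g(x, y) = (x − y)²/2 yields a U-statistic equal to the usual unbiased sample variance:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(9)
x = rng.normal(loc=3.0, scale=2.0, size=30)
n = len(x)
# U-statistic of order 2 with kernel g(x, y) = (x - y)^2 / 2
U = sum(0.5 * (x[i] - x[j])**2 for i, j in combinations(range(n), 2)) / (n * (n - 1) / 2)
print(U, x.var(ddof=1))                           # the two values agree exactly
```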

51 Assume that g is bounded, i.e.,
‖g‖_∞ := sup_{x,y} |g(x, y)| ≤ b,
i.e., |g(x, y)| ≤ b for all x, y ∈ R.
Writing U = f(X_1, ..., X_n), we observe that if x and x' differ only in coordinate k (for fixed k),
|f(x) − f(x')| ≤ (1 / binom(n, 2)) Σ_{i ≠ k} |g(x_i, x_k) − g(x_i, x_k')| ≤ (n − 1) 2b / (n(n − 1)/2) = 4b/n,
thus f has bounded differences with parameters L_k = 4b/n. Applying Theorem 8,
P(|U − EU| ≥ t) ≤ 2 e^{−nt²/(8b²)}, t ≥ 0. 51 / 59

52 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 52 / 59

53 Clique number of Erdős-Rényi graphs
Let G be an undirected graph on n nodes. A clique in G is a complete (induced) subgraph. The clique number of G, denoted ω(G), is the size of the largest clique(s).
For two graphs G and G' that differ in at most one edge, |ω(G) − ω(G')| ≤ 1. Thus E(G) ↦ ω(G) has the bounded difference property with L = 1.
Let G be an Erdős-Rényi random graph: edges are drawn independently with probability p. Then, with m = binom(n, 2),
P(|ω(G) − E ω(G)| ≥ δ) ≤ 2 e^{−2δ²/m},
or, setting ω̄(G) = ω(G)/m,
P(|ω̄(G) − E ω̄(G)| ≥ δ) ≤ 2 e^{−2mδ²}. 53 / 59
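A quick empirical look at clique-number concentration (a sketch; it assumes the networkx package is available, and n, p, and the number of replicates are arbitrary small values so the clique enumeration stays fast):

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(10)
n, p, reps = 40, 0.5, 100
omegas = []
for _ in range(reps):
    G = nx.gnp_random_graph(n, p, seed=int(rng.integers(10**9)))
    omegas.append(max(len(c) for c in nx.find_cliques(G)))   # clique number omega(G)
omegas = np.array(omegas)
# omega(G) varies over a very small range, far tighter than the O(sqrt(m)) fluctuation
# guaranteed by the bounded difference bound with m = n(n-1)/2
print(omegas.min(), omegas.max(), omegas.mean(), omegas.std())
```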

54 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 54 / 59

55 Lipschitz functions of a standard Gaussian vector
A function f : R^n → R is L-Lipschitz w.r.t. ‖·‖₂ if
|f(x) − f(y)| ≤ L ‖x − y‖₂, for all x, y ∈ R^n.
Theorem 9 (Gaussian concentration). Let X ∼ N(0, I_n) be a standard Gaussian vector and assume that f : R^n → R is L-Lipschitz w.r.t. the Euclidean norm. Then,
P(|f(X) − E[f(X)]| ≥ t) ≤ 2 exp(−t²/(2L²)), t ≥ 0. (5)
In other words, f(X) is sub-Gaussian with parameter L.
Deep result, no easy proof! Has far-reaching consequences.
One-sided bounds hold with the prefactor 2 removed. 55 / 59

56 Example: χ² and norm concentrations revisited
Let X ∼ N(0, I_n) and consider the function f(x) = ‖x‖₂/√n. f is L-Lipschitz with L = 1/√n. Hence,
P(‖X‖₂/√n ≥ E‖X‖₂/√n + t) ≤ e^{−nt²/2}, t ≥ 0.
Since E‖X‖₂ ≤ √n (why?), we have
P(‖X‖₂/√n ≥ 1 + t) ≤ e^{−nt²/2}, t ≥ 0.
For t ∈ [0, 1], (1 + t)² ≤ 1 + 3t, hence
P(‖X‖₂²/n ≥ 1 + 3t) ≤ e^{−nt²/2}, t ∈ [0, 1],
or, setting 3t = δ,
P(‖X‖₂²/n ≥ 1 + δ) ≤ e^{−nδ²/18}, δ ∈ [0, 3]. 56 / 59

57 Example: order statistics
Let X ∼ N(0, I_n), and let f(x) = x_{(k)} be the kth order statistic: for x ∈ R^n,
x_{(1)} ≥ x_{(2)} ≥ ... ≥ x_{(n)}.
For any x, y ∈ R^n, we have
|x_{(k)} − y_{(k)}| ≤ ‖x − y‖₂,
hence f is 1-Lipschitz. (Exercise) It follows that
P(|X_{(k)} − EX_{(k)}| ≥ t) ≤ 2 e^{−t²/2}, t ≥ 0.
In particular, if X_i are i.i.d. N(0, 1), i = 1, ..., n, then
P(|max_{i=1,...,n} X_i − E[max_{i=1,...,n} X_i]| ≥ t) ≤ 2 e^{−t²/2}, t ≥ 0. 57 / 59

58 Example: singular values
Consider a matrix X ∈ R^{n×d} where n > d. Let σ_1(X) ≥ σ_2(X) ≥ ... ≥ σ_d(X) be the (ordered) singular values of X. By Weyl's theorem, for any X, Y ∈ R^{n×d},
|σ_k(X) − σ_k(Y)| ≤ ‖X − Y‖_op ≤ ‖X − Y‖_F.
(Note that this is a generalization of the order-statistics inequality.) Thus X ↦ σ_k(X) is 1-Lipschitz:
Proposition 6. Let X ∈ R^{n×d} be a random matrix with i.i.d. N(0, 1) entries. Then,
P(|σ_k(X) − E[σ_k(X)]| ≥ δ) ≤ 2 e^{−δ²/2}, δ ≥ 0.
It remains to characterize E[σ_k(X)]. For an overview of matrix norms, see matrix norms.pdf. 58 / 59
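Simulation (a sketch with illustrative n, d) of the concentration of the largest singular value of an i.i.d. N(0, 1) matrix; Proposition 6 says its fluctuations are O(1) regardless of the matrix size, and its mean is close to √n + √d (a standard fact, not derived on this slide):

```python
import numpy as np

rng = np.random.default_rng(11)
n, d, reps = 500, 100, 200
s1 = []
for _ in range(reps):
    X = rng.normal(size=(n, d))
    s1.append(np.linalg.svd(X, compute_uv=False)[0])   # largest singular value
s1 = np.array(s1)
print(s1.mean(), np.sqrt(n) + np.sqrt(d), s1.std())
```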

59 Table of Contents
1 Concentration inequalities: Sub-Gaussian concentration (Hoeffding inequality); Sub-exponential concentration (Bernstein inequality); Applications of Bernstein inequality: χ² concentration, Johnson-Lindenstrauss embedding, ℓ2 norm concentration, ℓ∞ norm; Bounded difference inequality (Azuma-Hoeffding); Concentration of (bounded) U-statistics; Concentration of clique numbers; Gaussian concentration; Gaussian chaos (Hanson-Wright inequality). 59 / 59
