STAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics
Arash A. Amini
May 30, 2018
Classical case: $n \gg d$. Asymptotic assumption: $d$ is fixed and $n \to \infty$. Basic tools: LLN and CLT.

High-dimensional setting: $n \asymp d$, e.g. $n/d \to \gamma$, or even $d \gg n$, e.g. gene-expression data with only 50 samples. Classical methods fail. E.g., linear regression
$$y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n), \qquad \hat\beta_{\mathrm{OLS}} = \operatorname*{argmin}_{\beta \in \mathbb{R}^d} \|y - X\beta\|_2^2.$$
We have $\mathrm{MSE}(\hat\beta_{\mathrm{OLS}}) = O(\sigma^2 d/n)$.

Solution: assume some underlying low-dimensional structure (e.g. sparsity).
Table of Contents

1. Concentration inequalities
   - Sub-Gaussian concentration (Hoeffding inequality)
   - Sub-exponential concentration (Bernstein inequality)
   - Applications of Bernstein inequality: $\chi^2$ concentration, Johnson–Lindenstrauss embedding, $\ell_2$ norm concentration, $\ell_\infty$ norm
   - Bounded difference inequality (Azuma–Hoeffding): concentration of (bounded) U-statistics, concentration of clique numbers
   - Gaussian concentration
   - Gaussian chaos (Hanson–Wright inequality)
Concentration inequalities

Main tools in dealing with high-dimensional randomness; non-asymptotic versions of the CLT. General form:
$$P(|X - \mathbb{E}X| > t) \le \text{something small}.$$
Classical examples, the Markov and Chebyshev inequalities:
- Markov: assume $X \ge 0$; then $P(X \ge t) \le \mathbb{E}X/t$.
- Chebyshev: assume $\mathbb{E}X^2 < \infty$, and let $\mu = \mathbb{E}X$. Then $P(|X - \mu| \ge t) \le \operatorname{var}(X)/t^2$.
- Stronger assumption: $\mathbb{E}|X|^k < \infty$. Then $P(|X - \mu| \ge t) \le \mathbb{E}|X - \mu|^k / t^k$.
Example 1. Let $X_1, \dots, X_n \sim \mathrm{Ber}(1/2)$ and $S_n = \sum_{i=1}^n X_i$. Then, by the CLT,
$$Z_n := \frac{S_n - n/2}{\sqrt{n/4}} \xrightarrow{d} N(0,1).$$
Letting $g \sim N(0,1)$,
$$P\Big(S_n \ge \frac{n}{2} + t\sqrt{\frac{n}{4}}\Big) \approx P(g \ge t) \le \frac12 e^{-t^2/2}.$$
Letting $t = \alpha\sqrt{n}$,
$$P\Big(S_n \ge \frac{n}{2}(1+\alpha)\Big) \lessapprox \frac12 e^{-n\alpha^2/2}.$$
Problem: the approximation is not tight in general.
Theorem 1 (Berry–Esseen CLT). Under the assumptions of the CLT, with $\rho = \mathbb{E}|X_1 - \mu|^3/\sigma^3$,
$$\sup_t \big|P(Z_n \le t) - P(g \le t)\big| \le \frac{\rho}{\sqrt{n}}.$$
The bound is tight, since $P(S_n = n/2) = \binom{n}{n/2} 2^{-n} \asymp n^{-1/2}$ for the Bernoulli example. Conclusion: the approximation error is $O(n^{-1/2})$, which is a lot larger than the exponential bound $O(e^{-n\alpha^2/2})$ that we want to establish.

Solution: directly obtain concentration inequalities, often using the Chernoff bounding technique: for any $\lambda > 0$,
$$P(Z_n \ge t) = P(e^{\lambda Z_n} \ge e^{\lambda t}) \le \frac{\mathbb{E}e^{\lambda Z_n}}{e^{\lambda t}}, \quad t \in \mathbb{R}.$$
This leads to the study of the MGF of random variables.
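As a quick numerical illustration (not from the slides), the sketch below compares the exact binomial tail in the Bernoulli example with the Chernoff–Hoeffding bound $e^{-n\alpha^2/2}$; the particular values of $n$ and $\alpha$ are arbitrary choices.

```python
import math

def binom_tail(n, k0):
    """Exact P(S_n >= k0) for S_n ~ Bin(n, 1/2)."""
    return sum(math.comb(n, k) for k in range(k0, n + 1)) / 2**n

n, alpha = 200, 0.3
k0 = math.ceil(0.5 * n * (1 + alpha))      # threshold (n/2)(1 + alpha)
exact = binom_tail(n, k0)
chernoff = math.exp(-n * alpha**2 / 2)     # Hoeffding/Chernoff bound

print(f"exact tail = {exact:.3e}, Chernoff bound = {chernoff:.3e}")
# The bound is valid non-asymptotically, unlike the CLT approximation.
assert exact <= chernoff
```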
Sub-Gaussian concentration

Definition 1. A zero-mean random variable $X$ is sub-Gaussian if, for some $\sigma > 0$,
$$\mathbb{E}e^{\lambda X} \le e^{\sigma^2\lambda^2/2}, \quad \text{for all } \lambda \in \mathbb{R}. \tag{1}$$
A general random variable is sub-Gaussian if $X - \mathbb{E}X$ is sub-Gaussian.

- $X \sim N(0, \sigma^2)$ satisfies (1) with equality.
- A Rademacher variable (also called symmetric Bernoulli), $P(X = \pm 1) = \frac12$, is sub-Gaussian: $\mathbb{E}e^{\lambda X} = \cosh(\lambda) \le e^{\lambda^2/2}$.
- Any bounded RV is sub-Gaussian: if $X \in [a,b]$ a.s., then (1) holds with $\sigma = \frac{b-a}{2}$.
Proposition 1. Assume that $X$ is zero-mean sub-Gaussian satisfying (1). Then
$$P(X \ge t) \le \exp\Big(-\frac{t^2}{2\sigma^2}\Big), \quad \text{for all } t \ge 0.$$
The same bound holds with $X$ replaced by $-X$.

Proof: Chernoff bound:
$$P(X \ge t) \le \inf_{\lambda>0}\big[e^{-\lambda t}\,\mathbb{E}e^{\lambda X}\big] \le \inf_{\lambda>0}\exp\Big(-\lambda t + \frac{\lambda^2\sigma^2}{2}\Big) = \exp\Big(-\frac{t^2}{2\sigma^2}\Big).$$
A union bound gives the two-sided bound $P(|X| \ge t) \le 2\exp(-\frac{t^2}{2\sigma^2})$.

What if $\mu := \mathbb{E}X \ne 0$? Apply the result to $X - \mu$:
$$P(|X - \mu| \ge t) \le 2\exp\Big(-\frac{t^2}{2\sigma^2}\Big).$$
Proposition 2. Assume that $\{X_i\}$ are independent, zero-mean sub-Gaussian with parameters $\{\sigma_i\}$. Then $S_n = \sum_i X_i$ is sub-Gaussian with parameter $\sigma := \sqrt{\sum_i \sigma_i^2}$.

The sub-Gaussian parameter squared behaves like the variance.

Proof: $\mathbb{E}e^{\lambda S_n} = \prod_i \mathbb{E}e^{\lambda X_i}$.
Theorem 2 (Hoeffding). Assume that $\{X_i\}$ are independent, zero-mean sub-Gaussian with parameters $\{\sigma_i\}$. Then, letting $\sigma^2 := \sum_i \sigma_i^2$,
$$P\Big(\sum_i X_i \ge t\Big) \le \exp\Big(-\frac{t^2}{2\sigma^2}\Big), \quad t \ge 0.$$
The same bound holds with $X_i$ replaced by $-X_i$.

Alternative form: assume there are $n$ variables, and let $\bar\sigma^2 := \frac1n\sum_{i=1}^n \sigma_i^2$ and $\bar X_n := \frac1n\sum_{i=1}^n X_i$. Then
$$P(\bar X_n \ge t) \le \exp\Big(-\frac{n t^2}{2\bar\sigma^2}\Big), \quad t \ge 0.$$
Example: $X_i$ iid Rademacher, so that $\bar\sigma = \sigma_i = 1$ and $P(\bar X_n \ge t) \le e^{-nt^2/2}$.
Equivalent characterizations of sub-Gaussianity

For a RV $X$, the following are equivalent (HDP, Prop. 2.5.2):
1. The tails of $X$ satisfy $P(|X| \ge t) \le 2\exp(-t^2/K_1^2)$ for all $t \ge 0$.
2. The moments of $X$ satisfy $\|X\|_p = (\mathbb{E}|X|^p)^{1/p} \le K_2\sqrt{p}$ for all $p \ge 1$.
3. The MGF of $X^2$ satisfies $\mathbb{E}\exp(\lambda^2 X^2) \le \exp(K_3^2\lambda^2)$ for all $|\lambda| \le \frac{1}{K_3}$.
4. The MGF of $X^2$ is bounded at some point: $\mathbb{E}\exp(X^2/K_4^2) \le 2$.
Assuming $\mathbb{E}X = 0$, the above are equivalent to:
5. The MGF of $X$ satisfies $\mathbb{E}\exp(\lambda X) \le \exp(K_5^2\lambda^2)$ for all $\lambda \in \mathbb{R}$.
Sub-Gaussian norm

The sub-Gaussian norm is the smallest $K_4$ in property 4, i.e.,
$$\|X\|_{\psi_2} = \inf\big\{t > 0 : \mathbb{E}\exp(X^2/t^2) \le 2\big\}.$$
$X$ is sub-Gaussian iff $\|X\|_{\psi_2} < \infty$, and $\|\cdot\|_{\psi_2}$ is a proper norm on the space of sub-Gaussian RVs.

Every sub-Gaussian variable satisfies the following bounds:
- $P(|X| \ge t) \le 2\exp(-ct^2/\|X\|_{\psi_2}^2)$ for all $t \ge 0$.
- $\|X\|_p \le C\|X\|_{\psi_2}\sqrt{p}$ for all $p \ge 1$.
- $\mathbb{E}\exp(X^2/\|X\|_{\psi_2}^2) \le 2$.
- When $\mathbb{E}X = 0$, $\mathbb{E}\exp(\lambda X) \le \exp(C\lambda^2\|X\|_{\psi_2}^2)$ for all $\lambda \in \mathbb{R}$,
for some universal constants $C, c > 0$.
Some consequences

Recall what a universal/numerical/absolute constant means. The sub-Gaussian norm is within a constant factor of the sub-Gaussian parameter $\sigma$: for numerical constants $c_1, c_2 > 0$,
$$c_1\|X\|_{\psi_2} \le \sigma(X) \le c_2\|X\|_{\psi_2}.$$
Easy to see that $\|X\|_{\psi_2} \lesssim \|X\|_\infty$ (bounded variables are sub-Gaussian). Here $a \lesssim b$ means $a \le Cb$ for some universal constant $C$.

Lemma 1 (Centering). If $X$ is sub-Gaussian, then $X - \mathbb{E}X$ is sub-Gaussian too, and $\|X - \mathbb{E}X\|_{\psi_2} \le C\|X\|_{\psi_2}$, where $C$ is a universal constant.

Proof: $\|\mathbb{E}X\|_{\psi_2} \lesssim |\mathbb{E}X| \le \mathbb{E}|X| = \|X\|_1 \lesssim \|X\|_{\psi_2}$.

Note: $\|X - \mathbb{E}X\|_{\psi_2}$ could be much smaller than $\|X\|_{\psi_2}$.
Alternative forms

Alternative form of Proposition 2:

Proposition 3 (HDP 2.6.1). Assume that $\{X_i\}$ are independent, zero-mean sub-Gaussian RVs. Then $\sum_i X_i$ is also sub-Gaussian, and
$$\Big\|\sum_i X_i\Big\|_{\psi_2}^2 \le C\sum_i \|X_i\|_{\psi_2}^2,$$
where $C$ is an absolute constant.
Alternative form of Theorem 2:

Theorem 3 (Hoeffding). Assume that $\{X_i\}$ are independent, zero-mean sub-Gaussian RVs. Then
$$P\Big(\Big|\sum_i X_i\Big| \ge t\Big) \le 2\exp\Big(-\frac{c\,t^2}{\sum_i \|X_i\|_{\psi_2}^2}\Big), \quad t \ge 0,$$
where $c > 0$ is some universal constant.
Sub-exponential concentration

Definition 2. A zero-mean random variable $X$ is sub-exponential if, for some $\nu, \alpha > 0$,
$$\mathbb{E}e^{\lambda X} \le e^{\nu^2\lambda^2/2}, \quad \text{for all } |\lambda| < \frac{1}{\alpha}. \tag{2}$$
A general random variable is sub-exponential if $X - \mathbb{E}X$ is sub-exponential.

If $Z \sim N(0,1)$, then $Z^2$ is sub-exponential:
$$\mathbb{E}e^{\lambda(Z^2-1)} = \begin{cases} \dfrac{e^{-\lambda}}{\sqrt{1-2\lambda}} & \lambda < 1/2, \\ \infty & \lambda \ge 1/2. \end{cases}$$
We have $\mathbb{E}e^{\lambda(Z^2-1)} \le e^{4\lambda^2/2}$ for $|\lambda| < 1/4$, hence $Z^2$ is sub-exponential with parameters $(\nu, \alpha) = (2, 4)$. The tails of $Z^2 - 1$ are heavier than Gaussian.
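The claimed MGF bound can be checked numerically. The sketch below (an illustration, not part of the slides) evaluates the closed-form MGF $\mathbb{E}e^{\lambda(Z^2-1)} = e^{-\lambda}/\sqrt{1-2\lambda}$ on a grid and verifies the sub-exponential bound $e^{2\lambda^2}$ for $|\lambda| < 1/4$:

```python
import math

def mgf_centered_chisq1(lam):
    """Closed-form E exp(lam*(Z^2 - 1)) for Z ~ N(0,1), valid for lam < 1/2."""
    assert lam < 0.5
    return math.exp(-lam) / math.sqrt(1 - 2 * lam)

# Verify E e^{lam(Z^2-1)} <= e^{2 lam^2} on a grid covering |lam| <= 0.24 < 1/4.
for i in range(-24, 25):
    lam = i / 100
    assert mgf_centered_chisq1(lam) <= math.exp(2 * lam**2) + 1e-12
print("sub-exponential MGF bound verified on the grid")
```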
Proposition 4. Assume that $X$ is zero-mean sub-exponential satisfying (2). Then
$$P(X \ge t) \le \exp\Big(-\frac12\min\Big\{\frac{t^2}{\nu^2}, \frac{t}{\alpha}\Big\}\Big), \quad \text{for all } t \ge 0.$$
The same bound holds with $X$ replaced by $-X$.

Proof: Chernoff bound:
$$P(X \ge t) \le \inf_{\lambda \ge 0}\big[e^{-\lambda t}\,\mathbb{E}e^{\lambda X}\big] \le \inf_{0 \le \lambda < 1/\alpha}\exp\Big(-\lambda t + \frac{\lambda^2\nu^2}{2}\Big).$$
Let $f(\lambda) = -\lambda t + \lambda^2\nu^2/2$. The minimizer of $f$ over $\mathbb{R}$ is $\lambda^* = t/\nu^2$.
Hence the minimizer of $f$ over $[0, 1/\alpha)$ is
$$\lambda^* = \begin{cases} t/\nu^2 & t/\nu^2 < 1/\alpha, \\ 1/\alpha & t/\nu^2 \ge 1/\alpha, \end{cases}$$
and the minimum is
$$f(\lambda^*) = \begin{cases} -\dfrac{t^2}{2\nu^2} & t < \nu^2/\alpha, \\[2pt] -\dfrac{t}{\alpha} + \dfrac{\nu^2}{2\alpha^2} \le -\dfrac{t}{2\alpha} & t \ge \nu^2/\alpha. \end{cases}$$
Thus,
$$f(\lambda^*) \le -\frac12\min\Big\{\frac{t^2}{\nu^2}, \frac{t}{\alpha}\Big\}.$$
Bernstein inequality for sub-exponential RVs

Theorem 4 (Bernstein). Assume that $\{X_i\}$ are independent, zero-mean sub-exponential RVs with parameters $(\nu_i, \alpha_i)$. Let $\nu := (\sum_i \nu_i^2)^{1/2}$ and $\alpha := \max_i \alpha_i$. Then $\sum_i X_i$ is sub-exponential with parameters $(\nu, \alpha)$, and
$$P\Big(\sum_i X_i \ge t\Big) \le \exp\Big(-\frac12\min\Big\{\frac{t^2}{\nu^2}, \frac{t}{\alpha}\Big\}\Big).$$

Proof: We have $\mathbb{E}e^{\lambda X_i} \le e^{\lambda^2\nu_i^2/2}$ for all $|\lambda| < \frac{1}{\max_i \alpha_i}$. Let $S_n = \sum_i X_i$. By independence,
$$\mathbb{E}e^{\lambda S_n} = \prod_i \mathbb{E}e^{\lambda X_i} \le e^{\lambda^2\sum_i \nu_i^2/2}, \quad \text{for all } |\lambda| < \frac{1}{\max_i \alpha_i}.$$
The tail bound follows from Proposition 4.
Equivalent characterizations of sub-exponential RVs

For a RV $X$, the following are equivalent (HDP, Prop. 2.7.1):
1. The tails of $X$ satisfy $P(|X| \ge t) \le 2\exp(-t/K_1)$ for all $t \ge 0$.
2. The moments of $X$ satisfy $\|X\|_p = (\mathbb{E}|X|^p)^{1/p} \le K_2\,p$ for all $p \ge 1$.
3. The MGF of $|X|$ satisfies $\mathbb{E}\exp(\lambda|X|) \le \exp(K_3\lambda)$ for all $0 \le \lambda \le \frac{1}{K_3}$.
4. The MGF of $|X|$ is bounded at some point: $\mathbb{E}\exp(|X|/K_4) \le 2$.
Assuming $\mathbb{E}X = 0$, the above are equivalent to:
5. The MGF of $X$ satisfies $\mathbb{E}\exp(\lambda X) \le \exp(K_5^2\lambda^2)$ for all $|\lambda| \le \frac{1}{K_5}$.
Sub-exponential norm

The sub-exponential norm is the smallest $K_4$ in property 4, i.e.,
$$\|X\|_{\psi_1} = \inf\big\{t > 0 : \mathbb{E}\exp(|X|/t) \le 2\big\}.$$
$X$ is sub-exponential iff $\|X\|_{\psi_1} < \infty$, and $\|\cdot\|_{\psi_1}$ is a proper norm on the space of sub-exponential RVs.

Every sub-exponential variable satisfies the following bounds:
- $P(|X| \ge t) \le 2\exp(-ct/\|X\|_{\psi_1})$ for all $t \ge 0$.
- $\|X\|_p \le C\|X\|_{\psi_1}\,p$ for all $p \ge 1$.
- $\mathbb{E}\exp(|X|/\|X\|_{\psi_1}) \le 2$.
- When $\mathbb{E}X = 0$, $\mathbb{E}\exp(\lambda X) \le \exp(C\lambda^2\|X\|_{\psi_1}^2)$ for all $|\lambda| \le 1/\|X\|_{\psi_1}$,
for some universal constants $C, c > 0$.
Lemma 2. A random variable $X$ is sub-Gaussian if and only if $X^2$ is sub-exponential; in fact, $\|X^2\|_{\psi_1} = \|X\|_{\psi_2}^2$.

Proof: Immediate from the definitions.

Lemma 3. If $X$ and $Y$ are sub-Gaussian, then $XY$ is sub-exponential, and
$$\|XY\|_{\psi_1} \le \|X\|_{\psi_2}\|Y\|_{\psi_2}.$$

Proof: WLOG assume $\|X\|_{\psi_2} = \|Y\|_{\psi_2} = 1$. Apply Young's inequality $ab \le (a^2+b^2)/2$, valid for all $a, b \in \mathbb{R}$, twice:
$$\mathbb{E}e^{|XY|} \le \mathbb{E}e^{(X^2+Y^2)/2} = \mathbb{E}\big[e^{X^2/2}e^{Y^2/2}\big] \le \frac12\mathbb{E}\big[e^{X^2} + e^{Y^2}\big] \le \frac12(2+2) = 2.$$
Alternative form of Theorem 4:

Theorem 5 (Bernstein). Assume that $\{X_i\}$ are independent, zero-mean sub-exponential RVs. Then
$$P\Big(\Big|\sum_i X_i\Big| \ge t\Big) \le 2\exp\Big[-c\min\Big(\frac{t^2}{\sum_i \|X_i\|_{\psi_1}^2}, \frac{t}{\max_i \|X_i\|_{\psi_1}}\Big)\Big], \quad t \ge 0,$$
where $c > 0$ is some universal constant.

Corollary 1 (Bernstein). Assume that $\{X_i\}$ are independent, zero-mean sub-exponential RVs with $\|X_i\|_{\psi_1} \le K$ for all $i$. Then
$$P\Big(\Big|\frac1n\sum_{i=1}^n X_i\Big| \ge t\Big) \le 2\exp\Big[-c\,n\min\Big(\frac{t^2}{K^2}, \frac{t}{K}\Big)\Big], \quad t \ge 0.$$
Concentration of $\chi^2$ RVs I

Example 2. Let $Y \sim \chi^2_n$, i.e., $Y = \sum_{i=1}^n Z_i^2$ where $Z_i \overset{iid}{\sim} N(0,1)$. Each $Z_i^2$ is sub-exponential with parameters $(2, 4)$. Then $Y$ is sub-exponential with parameters $(2\sqrt{n}, 4)$, and we obtain
$$P(|Y - \mathbb{E}Y| \ge t) \le 2\exp\Big[-\frac12\min\Big(\frac{t^2}{4n}, \frac{t}{4}\Big)\Big],$$
or, replacing $t$ with $nt$,
$$P\Big(\Big|\frac1n\sum_{i=1}^n Z_i^2 - 1\Big| \ge t\Big) \le 2\exp\Big[-\frac{n}{8}\min(t^2, t)\Big], \quad t \ge 0.$$
Concentration of $\chi^2$ RVs II

In particular,
$$P\Big(\Big|\frac1n\sum_{i=1}^n Z_i^2 - 1\Big| \ge t\Big) \le 2e^{-nt^2/8}, \quad t \in [0,1].$$

Second approach, ignoring constants: we have $\|Z_i^2 - 1\|_{\psi_1} \le C\|Z_i^2\|_{\psi_1} = C\|Z_i\|_{\psi_2}^2 = C'$. Applying Corollary 1 with $K = C'$,
$$P\Big(\Big|\frac1n\sum_{i=1}^n Z_i^2 - 1\Big| \ge t\Big) \le 2\exp\Big[-c\,n\min\Big(\frac{t^2}{C'^2}, \frac{t}{C'}\Big)\Big] \le 2\exp\big[-c_2\,n\min(t^2, t)\big], \quad t \ge 0,$$
where $c_2 = c\min(1/C'^2, 1/C')$.
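A Monte Carlo check of the $t \in [0,1]$ bound (illustration only; the sample sizes, $t$, and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, trials = 100, 0.5, 20000

# Empirical frequency of the deviation event {|mean(Z_i^2) - 1| >= t}.
Z = rng.standard_normal((trials, n))
dev = np.abs((Z**2).mean(axis=1) - 1)
freq = (dev >= t).mean()

bound = 2 * np.exp(-n * t**2 / 8)
print(f"empirical = {freq:.4f}, bound = {bound:.4f}")
assert freq <= bound   # the theoretical bound dominates the empirical frequency
```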
Random projection for dimension reduction

Suppose that we have data points $\{u_1, \dots, u_N\} \subset \mathbb{R}^d$. We want to project them down to a lower-dimensional space $\mathbb{R}^m$ ($m \ll d$) such that pairwise distances $\|u_i - u_j\|$ are approximately preserved. This can be done by a linear random projection $X : \mathbb{R}^d \to \mathbb{R}^m$, which can be viewed as a random matrix $X \in \mathbb{R}^{m\times d}$.

Lemma 4 (Johnson–Lindenstrauss embedding). Let $X := \frac{1}{\sqrt{m}}Z \in \mathbb{R}^{m\times d}$, where $Z$ has iid $N(0,1)$ entries. Consider any collection of points $\{u_1, \dots, u_N\} \subset \mathbb{R}^d$. Take $\varepsilon, \delta \in (0,1)$ and assume that
$$m \ge \frac{16}{\delta^2}\log\Big(\frac{N}{\varepsilon}\Big).$$
Then, with probability at least $1 - \varepsilon$,
$$(1-\delta)\|u_i - u_j\|_2^2 \le \|Xu_i - Xu_j\|_2^2 \le (1+\delta)\|u_i - u_j\|_2^2, \quad \text{for all } i \ne j.$$
Proof. Fix $u \in \mathbb{R}^d$ and let
$$Y := \frac{\|Zu\|_2^2}{\|u\|_2^2} = \sum_{i=1}^m \Big\langle z_i, \frac{u}{\|u\|_2}\Big\rangle^2,$$
where $z_i^T$ is the $i$th row of $Z$. Then $Y \sim \chi^2_m$. Recalling $X = Z/\sqrt{m}$, for all $\delta \in (0,1)$,
$$P\Big(\Big|\frac{\|Xu\|_2^2}{\|u\|_2^2} - 1\Big| \ge \delta\Big) = P\Big(\Big|\frac{Y}{m} - 1\Big| \ge \delta\Big) \le 2e^{-m\delta^2/8}.$$
Applying this to $u = u_i - u_j$, for any fixed pair $(i,j)$, we have
$$P\Big(\Big|\frac{\|X(u_i - u_j)\|_2^2}{\|u_i - u_j\|_2^2} - 1\Big| \ge \delta\Big) \le 2e^{-m\delta^2/8}.$$
Apply a further union bound over all pairs $i \ne j$:
$$P\Big(\Big|\frac{\|X(u_i - u_j)\|_2^2}{\|u_i - u_j\|_2^2} - 1\Big| \ge \delta, \ \text{for some } i \ne j\Big) \le 2\binom{N}{2}e^{-m\delta^2/8}.$$
Since $2\binom{N}{2} \le N^2$, the result follows by solving $N^2 e^{-m\delta^2/8} \le \varepsilon$ for $m$.
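A minimal implementation sketch of the embedding in Lemma 4 (the helper name `jl_project`, the seed, and the test sizes are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def jl_project(points, eps, delta):
    """Project rows of `points` (N x d) to R^m with m = ceil((16/delta^2) log(N/eps)),
    using X = Z/sqrt(m) with iid N(0,1) entries, as in Lemma 4."""
    N, d = points.shape
    m = int(np.ceil(16 / delta**2 * np.log(N / eps)))
    Z = rng.standard_normal((m, d))
    return points @ (Z / np.sqrt(m)).T, m

points = rng.standard_normal((20, 1000))        # N = 20 points in R^1000
proj, m = jl_project(points, eps=0.01, delta=0.5)
print("projected dimension m =", m)

# Pairwise squared-distance distortions; within (1 ± delta) w.p. >= 1 - eps.
ratios = [np.sum((proj[i] - proj[j])**2) / np.sum((points[i] - points[j])**2)
          for i in range(len(points)) for j in range(i + 1, len(points))]
print("distortion range:", min(ratios), max(ratios))
```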
$\ell_2$ norm of sub-Gaussian vectors

Here $\|X\|_2 = (\sum_{i=1}^n X_i^2)^{1/2}$.

Proposition 5 (Concentration of norm, HDP 3.1.1). Let $X = (X_1, \dots, X_n) \in \mathbb{R}^n$ be a random vector with independent, sub-Gaussian coordinates $X_i$ that satisfy $\mathbb{E}X_i^2 = 1$. Then
$$\big\|\,\|X\|_2 - \sqrt{n}\,\big\|_{\psi_2} \le CK^2,$$
where $K = \max_i \|X_i\|_{\psi_2}$ and $C$ is an absolute constant.

The result says that the norm is highly concentrated around $\sqrt{n}$: $\|X\|_2 \approx \sqrt{n}$ in high dimensions ($n$ large). Assuming $K = O(1)$, it shows that w.h.p. $\|X\|_2 = \sqrt{n} + O(1)$. More precisely, w.p. at least $1 - 2e^{-c_1 v^2}$, we have
$$\sqrt{n} - K^2 v \le \|X\|_2 \le \sqrt{n} + K^2 v.$$
Simple argument: assuming $\operatorname{sd}(X_1^2) = O(1)$,
$$\mathbb{E}\|X\|_2^2 = n, \qquad \operatorname{var}(\|X\|_2^2) = n\operatorname{var}(X_1^2), \qquad \operatorname{sd}(\|X\|_2^2) = \sqrt{n}\,\operatorname{sd}(X_1^2),$$
so $\|X\|_2^2 = n \pm O(\sqrt{n})$, hence $\|X\|_2 = \sqrt{n \pm O(\sqrt{n})} = \sqrt{n} \pm O(1)$; the latter can be shown by a Taylor expansion.
Proof of Proposition 5: Argue that we can take $K \ge 1$. Since $X_i$ is sub-Gaussian, $X_i^2$ is sub-exponential, and
$$\|X_i^2 - 1\|_{\psi_1} \le C\|X_i^2\|_{\psi_1} = C\|X_i\|_{\psi_2}^2 \le CK^2.$$
Applying Bernstein's inequality (Corollary 1), for any $u \ge 0$,
$$P\Big(\Big|\frac{\|X\|_2^2}{n} - 1\Big| \ge u\Big) \le 2\exp\Big(-\frac{c_1 n}{K^4}\min(u^2, u)\Big),$$
where we used $K^4 \ge K^2$ and absorbed $C$ into $c_1$. Using the inequality (valid for $z \ge 0$)
$$|z - 1| \ge \delta \implies |z^2 - 1| \ge \max(\delta, \delta^2),$$
we get
$$P\Big(\Big|\frac{\|X\|_2}{\sqrt{n}} - 1\Big| \ge \delta\Big) \le P\Big(\Big|\frac{\|X\|_2^2}{n} - 1\Big| \ge \max(\delta, \delta^2)\Big) \le 2\exp\Big(-\frac{c_1 n}{K^4}\delta^2\Big),$$
since with $f(u) = \min(u^2, u)$ and $g(\delta) = \max(\delta, \delta^2)$ we have $f(g(\delta)) = \delta^2$ for all $\delta \ge 0$. The change of variable $\delta = t/\sqrt{n}$ gives the result.
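The $\|X\|_2 = \sqrt{n} + O(1)$ behavior is easy to see empirically. A quick simulation sketch for the Gaussian case (seed, dimensions, and the deviation threshold are arbitrary choices; the threshold is generous enough that the asserted event holds with overwhelming probability):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 10000, 500

# Norms of standard Gaussian vectors in R^n: tightly concentrated around sqrt(n).
norms = np.linalg.norm(rng.standard_normal((trials, n)), axis=1)
print("sqrt(n) =", np.sqrt(n), " sample mean =", norms.mean(), " sample sd =", norms.std())

# Fluctuations are O(1) even though sqrt(n) = 100.
assert abs(norms.mean() - np.sqrt(n)) < 1
assert np.all(np.abs(norms - np.sqrt(n)) < 6)
```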
$\ell_\infty$ norm of sub-Gaussian vectors

For any vector $X \in \mathbb{R}^n$, the $\ell_\infty$ norm is $\|X\|_\infty = \max_{i=1,\dots,n}|X_i|$.

Lemma 5. Let $X = (X_1, \dots, X_n) \in \mathbb{R}^n$ be a random vector with zero-mean, independent, sub-Gaussian coordinates $X_i$ with parameters $\sigma_i$. Then, for any $\gamma \ge 0$,
$$P\big(\|X\|_\infty \ge \sigma\sqrt{2(1+\gamma)\log n}\big) \le 2n^{-\gamma},$$
where $\sigma = \max_i \sigma_i$.

Proof: We have $P(|X_i| \ge t) \le 2\exp(-t^2/2\sigma^2)$, hence
$$P\big(\max_i |X_i| \ge t\big) \le 2n\exp\Big(-\frac{t^2}{2\sigma^2}\Big) = 2n^{-\gamma},$$
taking $t = \sqrt{2\sigma^2(1+\gamma)\log n}$.
Theorem 6. Assume $\{X_i\}_{i=1}^n$ are zero-mean RVs, sub-Gaussian with parameter $\sigma$. Then
$$\mathbb{E}\Big[\max_{i=1,\dots,n} X_i\Big] \le \sigma\sqrt{2\log n}, \quad n \ge 1.$$
Proof of Theorem 6: Jensen's inequality applied to $e^{\lambda Z}$, where $Z = \max_i X_i$.
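A simulation sketch of Theorem 6 in the standard Gaussian case $\sigma = 1$ (seed and sizes are arbitrary choices; the expected maximum of $n$ iid $N(0,1)$'s sits visibly below the $\sqrt{2\log n}$ bound):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 1000, 2000

# Empirical E[max_i X_i] for n iid N(0,1), versus the bound sqrt(2 log n).
maxima = rng.standard_normal((trials, n)).max(axis=1)
bound = np.sqrt(2 * np.log(n))
print(f"empirical mean of max = {maxima.mean():.3f}, bound = {bound:.3f}")
assert maxima.mean() <= bound
```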
Theorem 7 (Azuma–Hoeffding). Assume that $X = (X_1, \dots, X_n)$ has independent coordinates, and let $Z = f(X)$. Write $\mathbb{E}_i[Z] = \mathbb{E}[Z \mid X_1, \dots, X_i]$, and let
$$\Delta_i := \mathbb{E}_i[Z] - \mathbb{E}_{i-1}[Z].$$
Assume that
$$\mathbb{E}_{i-1}\big[e^{\lambda\Delta_i}\big] \le e^{\sigma_i^2\lambda^2/2}, \quad \lambda \in \mathbb{R}, \tag{3}$$
almost surely, for all $i = 1, \dots, n$. Then $Z - \mathbb{E}Z$ is sub-Gaussian with parameter $\sigma = \sqrt{\sum_{i=1}^n \sigma_i^2}$. In particular, we have the tail bound
$$P(|Z - \mathbb{E}Z| \ge t) \le 2\exp\Big(-\frac{t^2}{2\sigma^2}\Big).$$
$\{\Delta_i\}$ is called Doob's martingale difference sequence. It is a martingale difference sequence since $\mathbb{E}_{i-1}[\Delta_i] = 0$.
Proof. Let $S_j := \sum_{i=1}^j \Delta_i$, which is only a function of $X_i$, $i \le j$. Noting that $\mathbb{E}_n[Z] = Z$ and $\mathbb{E}_0[Z] = \mathbb{E}Z$, we have
$$S_n = \sum_{i=1}^n \Delta_i = Z - \mathbb{E}Z.$$
By properties of conditional expectation and assumption (3),
$$\mathbb{E}_{n-1}[e^{\lambda S_n}] = e^{\lambda S_{n-1}}\,\mathbb{E}_{n-1}[e^{\lambda\Delta_n}] \le e^{\lambda S_{n-1}}e^{\sigma_n^2\lambda^2/2}.$$
Taking $\mathbb{E}_{n-2}$ of both sides:
$$\mathbb{E}_{n-2}[e^{\lambda S_n}] \le e^{\sigma_n^2\lambda^2/2}\,\mathbb{E}_{n-2}[e^{\lambda S_{n-1}}] \le e^{\lambda S_{n-2}}e^{(\sigma_n^2 + \sigma_{n-1}^2)\lambda^2/2}.$$
Repeating the process, we get $\mathbb{E}_0[e^{\lambda S_n}] \le \exp\big((\sum_{i=1}^n \sigma_i^2)\lambda^2/2\big)$.
Bounded difference inequality

The conditional sub-Gaussian assumption holds under the bounded difference property:
$$\big|f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_n) - f(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \dots, x_n)\big| \le L_i \tag{4}$$
for all $x_1, \dots, x_n, x_i' \in \mathcal{X}$ and all $i \in [n]$, for some constants $(L_1, \dots, L_n)$.

Theorem 8 (Bounded difference). Assume that $X = (X_1, \dots, X_n)$ has independent coordinates, and assume that $f : \mathcal{X}^n \to \mathbb{R}$ satisfies the bounded difference property (4). Then
$$P\big(|f(X) - \mathbb{E}f(X)| \ge t\big) \le 2\exp\Big(-\frac{2t^2}{\sum_{i=1}^n L_i^2}\Big), \quad t \ge 0.$$
Proof (naive bound). We have
$$\Delta_i = \mathbb{E}_i[Z] - \mathbb{E}_{i-1}[\mathbb{E}_i[Z]] = g_i(X_1, \dots, X_i) - \mathbb{E}_{i-1}[g_i(X_1, \dots, X_i)].$$
Let $X_i'$ be an independent copy of $X_i$. Conditioned on $X_1, \dots, X_{i-1}$, we are effectively looking at
$$g_i(x_1, \dots, x_{i-1}, X_i) - \mathbb{E}[g_i(x_1, \dots, x_{i-1}, X_i')],$$
due to the independence of $\{X_1, \dots, X_i, X_i'\}$. Thus $|\Delta_i| \le L_i$ conditioned on $X_1, \dots, X_{i-1}$. That is, $\mathbb{E}_{i-1}[e^{\lambda\Delta_i}] \le e^{\sigma_i^2\lambda^2/2}$, where $\sigma_i^2 = (2L_i)^2/4 = L_i^2$.
Proof (better bound). One can show that $\Delta_i \in I_i$ where $|I_i| \le L_i$, improving the parameter $\sigma_i^2$ by a factor of 4. Conditioned on $X_1, \dots, X_{i-1}$, we are effectively looking at
$$\Delta_i = g_i(x_1, \dots, x_{i-1}, X_i) - \mu_i,$$
where $\mu_i$ is a constant (only a function of $x_1, \dots, x_{i-1}$). Then $\Delta_i + \mu_i \in [a_i, b_i]$, where
$$a_i = \inf_x g_i(x_1, \dots, x_{i-1}, x), \qquad b_i = \sup_x g_i(x_1, \dots, x_{i-1}, x).$$
We have (one needs to argue that $g_i$ satisfies the bounded difference property)
$$b_i - a_i = \sup_{x,y}\big[g_i(x_1, \dots, x_{i-1}, x) - g_i(x_1, \dots, x_{i-1}, y)\big] \le L_i.$$
Thus $\mathbb{E}_{i-1}[e^{\lambda\Delta_i}] \le e^{\sigma_i^2\lambda^2/2}$, where $\sigma_i^2 = (b_i - a_i)^2/4 \le L_i^2/4$.
The role of independence in the second argument is subtle. The only place we used independence is to argue that $\mathbb{E}_i[Z]$ satisfies the bounded difference property for all $i$. We argue that $\mathbb{E}_i[Z] = g_i(X_1, \dots, X_i)$, which is where we use independence. Then $g_i$, by definition and Jensen's inequality, satisfies the bounded difference property.
Example: (bounded) U-statistics

Let $g : \mathbb{R}^2 \to \mathbb{R}$ be a symmetric function and $X_1, \dots, X_n$ an iid sequence. Then
$$U := \frac{1}{\binom{n}{2}}\sum_{i<j} g(X_i, X_j)$$
is called a U-statistic (of order 2). $U$ is not a sum of independent variables, e.g., $n = 3$ gives
$$U = \frac13\big(g(X_1, X_2) + g(X_1, X_3) + g(X_2, X_3)\big),$$
but the dependence between terms is relatively weak (made precise shortly). For example, $g(x, y) = \frac12(x - y)^2$ gives an unbiased estimator of the variance. (Exercise)
Assume that $g$ is bounded, i.e., $\|g\|_\infty := \sup_{x,y}|g(x,y)| \le b$, meaning $|g(x,y)| \le b$ for all $x, y \in \mathbb{R}$. Writing $U = f(X_1, \dots, X_n)$, we observe that (for fixed $k$, with $x^{\setminus k}$ denoting $x$ with its $k$th coordinate replaced by $x_k'$)
$$|f(x) - f(x^{\setminus k})| \le \frac{1}{\binom{n}{2}}\sum_{i \ne k}\big|g(x_i, x_k) - g(x_i, x_k')\big| \le \frac{(n-1)\,2b}{n(n-1)/2} = \frac{4b}{n},$$
thus $f$ has bounded differences with parameters $L_k = 4b/n$. Applying Theorem 8,
$$P\big(|U - \mathbb{E}U| \ge t\big) \le 2e^{-nt^2/8b^2}, \quad t \ge 0.$$
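The variance-kernel example can be verified exactly: with $g(x,y) = \frac12(x-y)^2$, the order-2 U-statistic coincides with the unbiased sample variance (divisor $n-1$). A sketch (the helper name `u_statistic` is our own):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)

def u_statistic(x, g):
    """Order-2 U-statistic: average of g over all pairs i < j."""
    pairs = list(combinations(range(len(x)), 2))
    return sum(g(x[i], x[j]) for i, j in pairs) / len(pairs)

x = rng.standard_normal(50)
g = lambda a, b: 0.5 * (a - b)**2          # the variance kernel from the example
U = u_statistic(x, g)

# Identity: this U-statistic equals the unbiased sample variance exactly.
print(U, np.var(x, ddof=1))
assert abs(U - np.var(x, ddof=1)) < 1e-10
```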
Clique number of Erdős–Rényi graphs

Let $G$ be an undirected graph on $n$ nodes. A clique in $G$ is a complete (induced) subgraph. The clique number of $G$, denoted $\omega(G)$, is the size of the largest clique(s). For two graphs $G$ and $G'$ that differ in at most one edge,
$$|\omega(G) - \omega(G')| \le 1.$$
Thus, as a function of the edge indicators $E(G)$, $\omega(G)$ has the bounded difference property with $L_i = 1$. Let $G$ be an Erdős–Rényi random graph: edges are independently drawn with probability $p$. Then, with $m = \binom{n}{2}$,
$$P\big(|\omega(G) - \mathbb{E}\omega(G)| \ge \delta\big) \le 2e^{-2\delta^2/m},$$
or, setting $\bar\omega(G) = \omega(G)/m$,
$$P\big(|\bar\omega(G) - \mathbb{E}\bar\omega(G)| \ge \delta\big) \le 2e^{-2m\delta^2}.$$
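For intuition, $\omega(G)$ can be computed by brute force on small graphs (this sketch is exponential-time and for illustration only; the helper name is our own):

```python
import itertools

def clique_number(n, edges):
    """Brute-force omega(G) for a graph on nodes 0..n-1 (only feasible for small n)."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for nodes in itertools.combinations(range(n), size):
            # A clique: every pair of the chosen nodes must be an edge.
            if all(frozenset(p) in edge_set for p in itertools.combinations(nodes, 2)):
                return size
    return 0

# Sanity checks: complete graph K5, empty graph, and a triangle with an extra node.
k5_edges = list(itertools.combinations(range(5), 2))
print(clique_number(5, k5_edges),        # 5
      clique_number(5, []),              # 1 (a single node is a clique)
      clique_number(4, [(0, 1), (1, 2), (0, 2)]))  # 3
```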
Lipschitz functions of a standard Gaussian vector

A function $f : \mathbb{R}^n \to \mathbb{R}$ is $L$-Lipschitz w.r.t. $\|\cdot\|_2$ if
$$|f(x) - f(y)| \le L\|x - y\|_2, \quad \forall x, y \in \mathbb{R}^n.$$

Theorem 9 (Gaussian concentration). Let $X \sim N(0, I_n)$ be a standard Gaussian vector and assume that $f : \mathbb{R}^n \to \mathbb{R}$ is $L$-Lipschitz w.r.t. the Euclidean norm. Then
$$P\big(|f(X) - \mathbb{E}[f(X)]| \ge t\big) \le 2\exp\Big(-\frac{t^2}{2L^2}\Big), \quad t \ge 0. \tag{5}$$
In other words, $f(X)$ is sub-Gaussian with parameter $L$. A deep result with no easy proof! It has far-reaching consequences. One-sided bounds hold with the prefactor 2 removed.
Example: $\chi^2$ and norm concentration revisited

Let $X \sim N(0, I_n)$ and consider the function $f(x) = \|x\|_2/\sqrt{n}$. $f$ is $L$-Lipschitz with $L = 1/\sqrt{n}$. Hence,
$$P\Big(\Big|\frac{\|X\|_2}{\sqrt{n}} - \mathbb{E}\frac{\|X\|_2}{\sqrt{n}}\Big| \ge t\Big) \le 2e^{-nt^2/2}, \quad t \ge 0.$$
Since $\mathbb{E}\|X\|_2 \le \sqrt{n}$ (why?), we have
$$P\Big(\frac{\|X\|_2}{\sqrt{n}} \ge 1 + t\Big) \le e^{-nt^2/2}, \quad t \ge 0.$$
For $t \in [0,1]$, $(1+t)^2 \le 1 + 3t$, hence
$$P\Big(\frac{\|X\|_2^2}{n} \ge 1 + 3t\Big) \le e^{-nt^2/2}, \quad t \in [0,1],$$
or, setting $3t = \delta$,
$$P\Big(\frac{\|X\|_2^2}{n} \ge 1 + \delta\Big) \le e^{-n\delta^2/18}, \quad \delta \in [0,3].$$
Example: order statistics

Let $X \sim N(0, I_n)$, and let $f(x) = x_{(k)}$ be the $k$th order statistic: for $x \in \mathbb{R}^n$,
$$x_{(1)} \ge x_{(2)} \ge \dots \ge x_{(n)}.$$
For any $x, y \in \mathbb{R}^n$, we have $|x_{(k)} - y_{(k)}| \le \|x - y\|_2$, hence $f$ is 1-Lipschitz. (Exercise) It follows that
$$P\big(|X_{(k)} - \mathbb{E}X_{(k)}| \ge t\big) \le 2e^{-t^2/2}, \quad t \ge 0.$$
In particular, if $X_i \overset{iid}{\sim} N(0,1)$, $i = 1, \dots, n$, then
$$P\Big(\Big|\max_{i=1,\dots,n} X_i - \mathbb{E}\Big[\max_{i=1,\dots,n} X_i\Big]\Big| \ge t\Big) \le 2e^{-t^2/2}, \quad t \ge 0.$$
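The 1-Lipschitz property of order statistics is easy to probe numerically (a sketch with arbitrary seed and sizes; the inequality itself is the one stated above, which holds for every $k$ simultaneously):

```python
import numpy as np

rng = np.random.default_rng(5)

# Check |x_(k) - y_(k)| <= ||x - y||_2 for every k, over random pairs of vectors.
for _ in range(200):
    x = rng.standard_normal(30)
    y = rng.standard_normal(30)
    diff = np.abs(np.sort(x) - np.sort(y))       # |x_(k) - y_(k)| for all k
    assert diff.max() <= np.linalg.norm(x - y) + 1e-12
print("order statistics are 1-Lipschitz on all sampled pairs")
```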
Example: singular values

Consider a matrix $X \in \mathbb{R}^{n\times d}$ where $n > d$. Let $\sigma_1(X) \ge \sigma_2(X) \ge \dots \ge \sigma_d(X)$ be the (ordered) singular values of $X$. By Weyl's theorem, for any $X, Y \in \mathbb{R}^{n\times d}$,
$$|\sigma_k(X) - \sigma_k(Y)| \le \|X - Y\|_{\mathrm{op}} \le \|X - Y\|_F.$$
(Note that this is a generalization of the order-statistics inequality.) Thus $X \mapsto \sigma_k(X)$ is 1-Lipschitz.

Proposition 6. Let $X \in \mathbb{R}^{n\times d}$ be a random matrix with iid $N(0,1)$ entries. Then
$$P\big(|\sigma_k(X) - \mathbb{E}[\sigma_k(X)]| \ge \delta\big) \le 2e^{-\delta^2/2}, \quad \delta \ge 0.$$
It remains to characterize $\mathbb{E}[\sigma_k(X)]$. For an overview of matrix norms, see matrix norms.pdf.
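Weyl's inequality for singular values can likewise be checked numerically (a sketch with arbitrary seed and matrix sizes; `np.linalg.norm(A, 2)` is the operator norm and `'fro'` the Frobenius norm):

```python
import numpy as np

rng = np.random.default_rng(6)

# Check |sigma_k(X) - sigma_k(Y)| <= ||X - Y||_op <= ||X - Y||_F for every k.
for _ in range(100):
    X = rng.standard_normal((8, 5))
    Y = rng.standard_normal((8, 5))
    sx = np.linalg.svd(X, compute_uv=False)   # singular values, sorted decreasing
    sy = np.linalg.svd(Y, compute_uv=False)
    op = np.linalg.norm(X - Y, 2)
    fro = np.linalg.norm(X - Y, 'fro')
    assert np.all(np.abs(sx - sy) <= op + 1e-10)
    assert op <= fro + 1e-10
print("Weyl's inequality verified on all sampled pairs")
```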
More informationErgodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.
Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions
More information1 Dimension Reduction in Euclidean Space
CSIS0351/8601: Randomized Algorithms Lecture 6: Johnson-Lindenstrauss Lemma: Dimension Reduction Lecturer: Hubert Chan Date: 10 Oct 011 hese lecture notes are supplementary materials for the lectures.
More informationIf g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get
18:2 1/24/2 TOPIC. Inequalities; measures of spread. This lecture explores the implications of Jensen s inequality for g-means in general, and for harmonic, geometric, arithmetic, and related means in
More informationRandomized Algorithms Week 2: Tail Inequalities
Randomized Algorithms Week 2: Tail Inequalities Rao Kosaraju In this section, we study three ways to estimate the tail probabilities of random variables. Please note that the more information we know about
More informationGaussian vectors and central limit theorem
Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables
More informationThe Canonical Gaussian Measure on R
The Canonical Gaussian Measure on R 1. Introduction The main goal of this course is to study Gaussian measures. The simplest example of a Gaussian measure is the canonical Gaussian measure P on R where
More informationMultivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma
Multivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma Suppose again we have n sample points x,..., x n R p. The data-point x i R p can be thought of as the i-th row X i of an n p-dimensional
More informationChapter 7. Basic Probability Theory
Chapter 7. Basic Probability Theory I-Liang Chern October 20, 2016 1 / 49 What s kind of matrices satisfying RIP Random matrices with iid Gaussian entries iid Bernoulli entries (+/ 1) iid subgaussian entries
More informationProving the central limit theorem
SOR3012: Stochastic Processes Proving the central limit theorem Gareth Tribello March 3, 2019 1 Purpose In the lectures and exercises we have learnt about the law of large numbers and the central limit
More informations k k! E[Xk ], s <s 0.
Chapter Moments and tails M X (s) =E e sx, defined for all s R where it is finite, which includes at least s =0. If M X (s) is defined on ( s 0,s 0 ) for some s 0 > 0 then X has finite moments of all orders
More informationLecture 1: August 28
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random
More informationGeneralization theory
Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Stochastic Convergence Barnabás Póczos Motivation 2 What have we seen so far? Several algorithms that seem to work fine on training datasets: Linear regression
More informationLecture 13 October 6, Covering Numbers and Maurey s Empirical Method
CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 13 October 6, 2016 Scribe: Kiyeon Jeon and Loc Hoang 1 Overview In the last lecture we covered the lower bound for p th moment (p > 2) and
More informationn! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2
Order statistics Ex. 4.1 (*. Let independent variables X 1,..., X n have U(0, 1 distribution. Show that for every x (0, 1, we have P ( X (1 < x 1 and P ( X (n > x 1 as n. Ex. 4.2 (**. By using induction
More informationStochastic Models (Lecture #4)
Stochastic Models (Lecture #4) Thomas Verdebout Université libre de Bruxelles (ULB) Today Today, our goal will be to discuss limits of sequences of rv, and to study famous limiting results. Convergence
More informationOn the Bennett-Hoeffding inequality
On the Bennett-Hoeffding inequality of Iosif 1,2,3 1 Department of Mathematical Sciences Michigan Technological University 2 Supported by NSF grant DMS-0805946 3 Paper available at http://arxiv.org/abs/0902.4058
More informationLearning Theory. Sridhar Mahadevan. University of Massachusetts. p. 1/38
Learning Theory Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts p. 1/38 Topics Probability theory meet machine learning Concentration inequalities: Chebyshev, Chernoff, Hoeffding, and
More informationInference for High Dimensional Robust Regression
Department of Statistics UC Berkeley Stanford-Berkeley Joint Colloquium, 2015 Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example Table of Contents 1 Background 2 Main Results 3 OLS:
More informationLecture 4. P r[x > ce[x]] 1/c. = ap r[x = a] + a>ce[x] P r[x = a]
U.C. Berkeley CS273: Parallel and Distributed Theory Lecture 4 Professor Satish Rao September 7, 2010 Lecturer: Satish Rao Last revised September 13, 2010 Lecture 4 1 Deviation bounds. Deviation bounds
More informationLIST OF FORMULAS FOR STK1100 AND STK1110
LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function
More informationSmall Ball Probability, Arithmetic Structure and Random Matrices
Small Ball Probability, Arithmetic Structure and Random Matrices Roman Vershynin University of California, Davis April 23, 2008 Distance Problems How far is a random vector X from a given subspace H in
More informationStatistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it
Statistics 300B Winter 08 Final Exam Due 4 Hours after receiving it Directions: This test is open book and open internet, but must be done without consulting other students. Any consultation of other students
More informationLecture 18: March 15
CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 18: March 15 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They may
More informationRandom Methods for Linear Algebra
Gittens gittens@acm.caltech.edu Applied and Computational Mathematics California Institue of Technology October 2, 2009 Outline The Johnson-Lindenstrauss Transform 1 The Johnson-Lindenstrauss Transform
More informationMoments and tails. Chapter 2
Chapter Moments and tails M X (s) =E e sx, defined for all s R where it is finite, which includes at least s =0. If M X (s) is defined on ( s 0,s 0 ) for some s 0 > 0 then X has finite moments of all orders
More informationLecture 2: Review of Basic Probability Theory
ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationConcentration Inequalities for Random Matrices
Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify the asymptotic
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix
More informationRegression and Statistical Inference
Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF
More informationn! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2
Order statistics Ex. 4. (*. Let independent variables X,..., X n have U(0, distribution. Show that for every x (0,, we have P ( X ( < x and P ( X (n > x as n. Ex. 4.2 (**. By using induction or otherwise,
More information8 Laws of large numbers
8 Laws of large numbers 8.1 Introduction We first start with the idea of standardizing a random variable. Let X be a random variable with mean µ and variance σ 2. Then Z = (X µ)/σ will be a random variable
More informationNotes 1 : Measure-theoretic foundations I
Notes 1 : Measure-theoretic foundations I Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Section 1.0-1.8, 2.1-2.3, 3.1-3.11], [Fel68, Sections 7.2, 8.1, 9.6], [Dur10,
More informationMatrix concentration inequalities
ELE 538B: Mathematics of High-Dimensional Data Matrix concentration inequalities Yuxin Chen Princeton University, Fall 2018 Recap: matrix Bernstein inequality Consider a sequence of independent random
More informationLimiting Distributions
We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the two fundamental results
More informationTom Salisbury
MATH 2030 3.00MW Elementary Probability Course Notes Part V: Independence of Random Variables, Law of Large Numbers, Central Limit Theorem, Poisson distribution Geometric & Exponential distributions Tom
More informationWeek 9 The Central Limit Theorem and Estimation Concepts
Week 9 and Estimation Concepts Week 9 and Estimation Concepts Week 9 Objectives 1 The Law of Large Numbers and the concept of consistency of averages are introduced. The condition of existence of the population
More information11.1 Set Cover ILP formulation of set cover Deterministic rounding
CS787: Advanced Algorithms Lecture 11: Randomized Rounding, Concentration Bounds In this lecture we will see some more examples of approximation algorithms based on LP relaxations. This time we will use
More informationTail and Concentration Inequalities
CSE 694: Probabilistic Analysis and Randomized Algorithms Lecturer: Hung Q. Ngo SUNY at Buffalo, Spring 2011 Last update: February 19, 2011 Tail and Concentration Ineualities From here on, we use 1 A to
More informationAppendix B: Inequalities Involving Random Variables and Their Expectations
Chapter Fourteen Appendix B: Inequalities Involving Random Variables and Their Expectations In this appendix we present specific properties of the expectation (additional to just the integral of measurable
More informationPractice Problem - Skewness of Bernoulli Random Variable. Lecture 7: Joint Distributions and the Law of Large Numbers. Joint Distributions - Example
A little more E(X Practice Problem - Skewness of Bernoulli Random Variable Lecture 7: and the Law of Large Numbers Sta30/Mth30 Colin Rundel February 7, 014 Let X Bern(p We have shown that E(X = p Var(X
More informationEntropy and Ergodic Theory Lecture 15: A first look at concentration
Entropy and Ergodic Theory Lecture 15: A first look at concentration 1 Introduction to concentration Let X 1, X 2,... be i.i.d. R-valued RVs with common distribution µ, and suppose for simplicity that
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationRandom matrices: Distribution of the least singular value (via Property Testing)
Random matrices: Distribution of the least singular value (via Property Testing) Van H. Vu Department of Mathematics Rutgers vanvu@math.rutgers.edu (joint work with T. Tao, UCLA) 1 Let ξ be a real or complex-valued
More informationLecture 11. Multivariate Normal theory
10. Lecture 11. Multivariate Normal theory Lecture 11. Multivariate Normal theory 1 (1 1) 11. Multivariate Normal theory 11.1. Properties of means and covariances of vectors Properties of means and covariances
More informationLecture 4: Inequalities and Asymptotic Estimates
CSE 713: Random Graphs and Applications SUNY at Buffalo, Fall 003 Lecturer: Hung Q. Ngo Scribe: Hung Q. Ngo Lecture 4: Inequalities and Asymptotic Estimates We draw materials from [, 5, 8 10, 17, 18].
More informationPart IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015
Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.
More information1. Stochastic Processes and filtrations
1. Stochastic Processes and 1. Stoch. pr., A stochastic process (X t ) t T is a collection of random variables on (Ω, F) with values in a measurable space (S, S), i.e., for all t, In our case X t : Ω S
More informationLecture 9: March 26, 2014
COMS 6998-3: Sub-Linear Algorithms in Learning and Testing Lecturer: Rocco Servedio Lecture 9: March 26, 204 Spring 204 Scriber: Keith Nichols Overview. Last Time Finished analysis of O ( n ɛ ) -query
More informationSTA 4322 Exam I Name: Introduction to Statistics Theory
STA 4322 Exam I Name: Introduction to Statistics Theory Fall 2013 UF-ID: Instructions: There are 100 total points. You must show your work to receive credit. Read each part of each question carefully.
More informationSDS : Theoretical Statistics
SDS 384 11: Theoretical Statistics Lecture 1: Introduction Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin https://psarkar.github.io/teaching Manegerial Stuff
More informationarxiv: v1 [math.pr] 11 Feb 2019
A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm arxiv:190.03736v1 math.pr] 11 Feb 019 Chi Jin University of California, Berkeley chijin@cs.berkeley.edu Rong Ge Duke
More informationExercises Measure Theoretic Probability
Exercises Measure Theoretic Probability 2002-2003 Week 1 1. Prove the folloing statements. (a) The intersection of an arbitrary family of d-systems is again a d- system. (b) The intersection of an arbitrary
More information1 Exercises for lecture 1
1 Exercises for lecture 1 Exercise 1 a) Show that if F is symmetric with respect to µ, and E( X )
More informationLecture 6 September 13, 2016
CS 395T: Sublinear Algorithms Fall 206 Prof. Eric Price Lecture 6 September 3, 206 Scribe: Shanshan Wu, Yitao Chen Overview Recap of last lecture. We talked about Johnson-Lindenstrauss (JL) lemma [JL84]
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Multi-armed bandit algorithms. Concentration inequalities. P(X ǫ) exp( ψ (ǫ))). Cumulant generating function bounds. Hoeffding
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationRandom Graphs. EECS 126 (UC Berkeley) Spring 2019
Random Graphs EECS 126 (UC Bereley) Spring 2019 1 Introduction In this note, we will briefly introduce the subject of random graphs, also nown as Erdös-Rényi random graphs. Given a positive integer n and
More informationModule 3. Function of a Random Variable and its distribution
Module 3 Function of a Random Variable and its distribution 1. Function of a Random Variable Let Ω, F, be a probability space and let be random variable defined on Ω, F,. Further let h: R R be a given
More informationProbability and Statistics Notes
Probability and Statistics Notes Chapter Five Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Five Notes Spring 2011 1 / 37 Outline 1
More informationSelf-normalized Cramér-Type Large Deviations for Independent Random Variables
Self-normalized Cramér-Type Large Deviations for Independent Random Variables Qi-Man Shao National University of Singapore and University of Oregon qmshao@darkwing.uoregon.edu 1. Introduction Let X, X
More informationECE534, Spring 2018: Solutions for Problem Set #3
ECE534, Spring 08: Solutions for Problem Set #3 Jointly Gaussian Random Variables and MMSE Estimation Suppose that X, Y are jointly Gaussian random variables with µ X = µ Y = 0 and σ X = σ Y = Let their
More informationLarge Sample Theory. Consider a sequence of random variables Z 1, Z 2,..., Z n. Convergence in probability: Z n
Large Sample Theory In statistics, we are interested in the properties of particular random variables (or estimators ), which are functions of our data. In ymptotic analysis, we focus on describing the
More informationLecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales.
Lecture 2 1 Martingales We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. 1.1 Doob s inequality We have the following maximal
More informationStein s Method: Distributional Approximation and Concentration of Measure
Stein s Method: Distributional Approximation and Concentration of Measure Larry Goldstein University of Southern California 36 th Midwest Probability Colloquium, 2014 Concentration of Measure Distributional
More information