Concentration inequalities and the entropy method
1 Concentration inequalities and the entropy method. Gábor Lugosi, ICREA and Pompeu Fabra University, Barcelona.
2-3 what is concentration? We are interested in bounding random fluctuations of functions of many independent random variables. $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. How large are typical deviations of $Z$ from $\mathbb{E}Z$? We seek upper bounds for
$P\{Z > \mathbb{E}Z + t\} \quad \text{and} \quad P\{Z < \mathbb{E}Z - t\}$
for $t > 0$.
4 various approaches
- martingales (Yurinskii, 1974; Milman and Schechtman, 1986; Shamir and Spencer, 1987; McDiarmid, 1989, 1998);
- information theoretic and transportation methods (Ahlswede, Gács, and Körner, 1976; Marton 1986, 1996, 1997; Dembo 1997);
- Talagrand's induction method, 1996;
- logarithmic Sobolev inequalities (Ledoux 1996, Massart 1998, Boucheron, Lugosi, Massart 1999, 2001).
5-7 chernoff bounds By Markov's inequality, if $\lambda > 0$,
$P\{Z - \mathbb{E}Z > t\} = P\{e^{\lambda(Z - \mathbb{E}Z)} > e^{\lambda t}\} \le \frac{\mathbb{E}e^{\lambda(Z - \mathbb{E}Z)}}{e^{\lambda t}}.$
Next derive bounds for the moment generating function $\mathbb{E}e^{\lambda(Z - \mathbb{E}Z)}$ and optimize $\lambda$. If $Z = \sum_{i=1}^n X_i$ is a sum of independent random variables,
$\mathbb{E}e^{\lambda Z} = \mathbb{E}\prod_{i=1}^n e^{\lambda X_i} = \prod_{i=1}^n \mathbb{E}e^{\lambda X_i}$
by independence. It suffices to find bounds for $\mathbb{E}e^{\lambda X_i}$. [Photos: Serguei Bernstein and Herman Chernoff]
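The "optimize $\lambda$" step is routine but worth recording once; the following computation (our addition, not on the slides) applies whenever the moment generating function admits a sub-Gaussian bound $\mathbb{E}e^{\lambda(Z - \mathbb{E}Z)} \le e^{\lambda^2 v/2}$:
$P\{Z - \mathbb{E}Z > t\} \le \inf_{\lambda > 0} e^{-\lambda t}\,\mathbb{E}e^{\lambda(Z - \mathbb{E}Z)} \le \inf_{\lambda > 0} e^{\lambda^2 v/2 - \lambda t} = e^{-t^2/(2v)},$
the infimum being attained at $\lambda = t/v$. With Hoeffding's lemma on the next slide, the normalized sum $\frac{1}{n}\sum_{i=1}^n X_i$ satisfies this with $v = 1/(4n)$, which is where the $e^{-2nt^2}$ bound comes from.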
8-9 hoeffding's inequality If $X_1, \dots, X_n \in [0, 1]$, then
$\mathbb{E}e^{\lambda(X_i - \mathbb{E}X_i)} \le e^{\lambda^2/8}.$
We obtain
$P\left\{\left|\frac{1}{n}\sum_{i=1}^n X_i - \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^n X_i\right]\right| > t\right\} \le 2e^{-2nt^2}.$
[Photo: Wassily Hoeffding]
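As a quick numerical sanity check (ours, not part of the talk; the parameter choices are arbitrary), the sketch below compares the empirical two-sided tail of a Bernoulli sample mean against the Hoeffding bound $2e^{-2nt^2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, p, t = 100, 200_000, 0.3, 0.1

# Sample means of n Bernoulli(p) variables; E[mean] = p exactly.
means = rng.binomial(n, p, size=trials) / n
empirical = np.mean(np.abs(means - p) > t)

hoeffding = 2 * np.exp(-2 * n * t**2)
print(f"empirical tail ~ {empirical:.4f}, Hoeffding bound = {hoeffding:.4f}")
```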
10-12 martingale representation $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Denote $\mathbb{E}_i[\cdot] = \mathbb{E}[\cdot \mid X_1, \dots, X_i]$. Thus $\mathbb{E}_0 Z = \mathbb{E}Z$ and $\mathbb{E}_n Z = Z$. Writing $\Delta_i = \mathbb{E}_i Z - \mathbb{E}_{i-1} Z$, we have
$Z - \mathbb{E}Z = \sum_{i=1}^n \Delta_i.$
This is the Doob martingale representation of $Z$. [Photo: Joseph Leo Doob]
13-14 martingale representation: the variance
$\mathrm{Var}(Z) = \mathbb{E}\left(\sum_{i=1}^n \Delta_i\right)^2 = \sum_{i=1}^n \mathbb{E}\left[\Delta_i^2\right] + 2\sum_{j > i} \mathbb{E}\left[\Delta_i \Delta_j\right].$
Now if $j > i$, $\mathbb{E}_i \Delta_j = 0$, so
$\mathbb{E}\left[\Delta_i \Delta_j\right] = \mathbb{E}\left[\Delta_i\, \mathbb{E}_i \Delta_j\right] = 0.$
We obtain
$\mathrm{Var}(Z) = \sum_{i=1}^n \mathbb{E}\left[\Delta_i^2\right].$
From this, using independence, it is easy to derive the Efron-Stein inequality.
15-16 efron-stein inequality (1981) Let $X_1, \dots, X_n$ be independent random variables taking values in $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Then
$\mathrm{Var}(Z) \le \mathbb{E}\sum_{i=1}^n \left(Z - \mathbb{E}^{(i)} Z\right)^2 = \mathbb{E}\sum_{i=1}^n \mathrm{Var}^{(i)}(Z),$
where $\mathbb{E}^{(i)} Z$ is expectation with respect to the $i$-th variable $X_i$ only. We obtain more useful forms by using that $\mathrm{Var}(X) = \frac{1}{2}\mathbb{E}(X - X')^2$ and $\mathrm{Var}(X) \le \mathbb{E}(X - a)^2$ for any constant $a$.
17-18 efron-stein inequality (1981) If $X_1', \dots, X_n'$ are independent copies of $X_1, \dots, X_n$, and $Z_i' = f(X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$, then
$\mathrm{Var}(Z) \le \frac{1}{2}\,\mathbb{E}\left[\sum_{i=1}^n (Z - Z_i')^2\right].$
$Z$ is concentrated if it doesn't depend too much on any of its variables. If $Z = \sum_{i=1}^n X_i$ then we have an equality. Sums are the least concentrated of all functions!
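To see the inequality at work, here is a small Monte Carlo sketch (our illustration; the choice $Z = \max_i X_i$ and all variable names are ours) comparing $\mathrm{Var}(Z)$ with the Efron-Stein bound:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 10, 100_000

X = rng.random((trials, n))    # X_1, ..., X_n ~ Uniform(0, 1)
Xp = rng.random((trials, n))   # independent copies X_1', ..., X_n'
Z = X.max(axis=1)

# Accumulate sum_i (Z - Z_i')^2, where Z_i' recomputes the max with
# coordinate i replaced by its independent copy.
es_sum = np.zeros(trials)
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = Xp[:, i]
    es_sum += (Z - Xi.max(axis=1)) ** 2

print(f"Var(Z)            ~ {Z.var():.5f}")
print(f"Efron-Stein bound ~ {0.5 * es_sum.mean():.5f}")
```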
19 efron-stein inequality (1981) If for some arbitrary functions $f_i$,
$Z_i = f_i(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n),$
then
$\mathrm{Var}(Z) \le \mathbb{E}\left[\sum_{i=1}^n (Z - Z_i)^2\right].$
20 efron, stein, and steele [Photos: Bradley Efron, Charles Stein, Mike Steele]
21-22 example: kernel density estimation Let $X_1, \dots, X_n$ be i.i.d. real samples drawn according to some density $\phi$. The kernel density estimate is
$\phi_n(x) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x - X_i}{h}\right),$
where $h > 0$ and $K$ is a nonnegative kernel with $\int K = 1$. The $L_1$ error is
$Z = f(X_1, \dots, X_n) = \int \left|\phi(x) - \phi_n(x)\right| dx.$
It is easy to see that
$\left|f(x_1, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n)\right| \le \frac{1}{nh}\int \left|K\left(\frac{x - x_i}{h}\right) - K\left(\frac{x - x_i'}{h}\right)\right| dx \le \frac{2}{n}$
(Devroye, 1991), so we get $\mathrm{Var}(Z) \le \frac{2}{n}$.
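A direct simulation (ours; it assumes scipy is available, and the Gaussian kernel, bandwidth, and grid are arbitrary choices) illustrates the $2/n$ variance bound for the $L_1$ error:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, h, reps = 200, 0.3, 300
grid = np.linspace(-6, 6, 1201)
dx = grid[1] - grid[0]
phi = norm.pdf(grid)  # true density: standard normal

def l1_error(sample):
    # Gaussian-kernel density estimate on the grid, then a Riemann
    # approximation of the L1 distance to the true density.
    phi_n = norm.pdf((grid[:, None] - sample[None, :]) / h).sum(axis=1) / (n * h)
    return np.sum(np.abs(phi - phi_n)) * dx

Z = np.array([l1_error(rng.standard_normal(n)) for _ in range(reps)])
print(f"Var(Z) ~ {Z.var():.2e}, Efron-Stein bound 2/n = {2/n:.2e}")
```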
23 [Photo: Luc Devroye on Taksim Square, 2013]
24-25 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$,
$\sum_{i=1}^n \left(f(x) - f_i(x^{(i)})\right)^2 \le a f(x) + b.$
Then $\mathrm{Var}(f(X)) \le a\,\mathbb{E}f(X) + b$.
26-28 self-bounding functions If
$0 \le f(x) - f_i(x^{(i)}) \le 1 \quad \text{and} \quad \sum_{i=1}^n \left(f(x) - f_i(x^{(i)})\right) \le f(x),$
then $f$ is self-bounding and $\mathrm{Var}(f(X)) \le \mathbb{E}f(X)$. Rademacher averages, random VC dimension, random VC entropy, and the longest increasing subsequence in a random permutation are all examples of self-bounding functions. Configuration functions.
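Another standard configuration-function example (our illustration, not on the slide) is the number of distinct values among $x_1, \dots, x_n$: removing $x_i$ lowers the count by 1 exactly when $x_i$ appears only once, so both self-bounding conditions hold. A brute-force check:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12

def f(vals):
    # Number of distinct values.
    return len(set(vals))

for _ in range(1000):
    x = list(rng.integers(0, 6, size=n))
    fx = f(x)
    # f_i(x^{(i)}): the same count with coordinate i dropped.
    diffs = [fx - f(x[:i] + x[i + 1:]) for i in range(n)]
    assert all(d in (0, 1) for d in diffs)  # 0 <= f - f_i <= 1
    assert sum(diffs) <= fx                 # sum_i (f - f_i) <= f
print("self-bounding conditions hold on all sampled points")
```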
29 beyond the variance $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Recall the Doob martingale representation:
$Z - \mathbb{E}Z = \sum_{i=1}^n \Delta_i, \quad \text{where } \Delta_i = \mathbb{E}_i Z - \mathbb{E}_{i-1} Z, \text{ with } \mathbb{E}_i[\cdot] = \mathbb{E}[\cdot \mid X_1, \dots, X_i].$
To get exponential inequalities, we bound the moment generating function $\mathbb{E}e^{\lambda(Z - \mathbb{E}Z)}$.
30 azuma's inequality Suppose that the martingale differences are bounded: $|\Delta_i| \le c_i$. Then
$\mathbb{E}e^{\lambda(Z - \mathbb{E}Z)} = \mathbb{E}e^{\lambda \sum_{i=1}^n \Delta_i} = \mathbb{E}\left[e^{\lambda \sum_{i=1}^{n-1} \Delta_i}\, \mathbb{E}_{n-1} e^{\lambda \Delta_n}\right] \le \mathbb{E}\left[e^{\lambda \sum_{i=1}^{n-1} \Delta_i}\right] e^{\lambda^2 c_n^2/2} \quad \text{(by Hoeffding)}$
$\le \cdots \le e^{\lambda^2 \left(\sum_{i=1}^n c_i^2\right)/2}.$
This is the Azuma-Hoeffding inequality for sums of bounded martingale differences.
31-33 bounded differences inequality If $Z = f(X_1, \dots, X_n)$ and $f$ is such that
$\left|f(x_1, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n)\right| \le c_i,$
then the martingale differences are bounded. Bounded differences inequality: if $X_1, \dots, X_n$ are independent, then
$P\{|Z - \mathbb{E}Z| > t\} \le 2e^{-2t^2/\sum_{i=1}^n c_i^2}.$
Also known as McDiarmid's inequality. [Photo: Colin McDiarmid]
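A standard application (our example): the Kolmogorov-Smirnov statistic $\sup_x |F_n(x) - F(x)|$ changes by at most $c_i = 1/n$ when one sample point is replaced, so the inequality gives a $2e^{-2nt^2}$ tail around its mean. A quick simulation:

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials, t = 100, 50_000, 0.08

def ks_stat(u):
    # sup_x |F_n(x) - x| for Uniform(0,1) data, evaluated at the jumps.
    u = np.sort(u)
    i = np.arange(1, n + 1)
    return np.max(np.maximum(i / n - u, u - (i - 1) / n))

Z = np.array([ks_stat(rng.random(n)) for _ in range(trials)])
empirical = np.mean(np.abs(Z - Z.mean()) > t)
bound = 2 * np.exp(-2 * n * t**2)  # sum_i c_i^2 = n * (1/n)^2 = 1/n
print(f"empirical tail ~ {empirical:.4f}, bounded differences bound = {bound:.4f}")
```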
34 bounded differences inequality Easy to use. Distribution free. Often close to optimal. Does not exploit variance information. Often too rigid. Other methods are necessary.
35 shannon entropy If $X, Y$ are random variables taking values in a set of size $N$,
$H(X) = -\sum_x p(x) \log p(x),$
$H(X \mid Y) = H(X, Y) - H(Y) = -\sum_{x,y} p(x, y) \log p(x \mid y),$
$H(X) \le \log N \quad \text{and} \quad H(X \mid Y) \le H(X).$
[Photo: Claude Shannon]
36-37 han's inequality If $X = (X_1, \dots, X_n)$ and $X^{(i)} = (X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$, then
$\sum_{i=1}^n \left(H(X) - H(X^{(i)})\right) \le H(X).$
Proof:
$H(X) = H(X^{(i)}) + H(X_i \mid X^{(i)}) \le H(X^{(i)}) + H(X_i \mid X_1, \dots, X_{i-1}).$
Since $\sum_{i=1}^n H(X_i \mid X_1, \dots, X_{i-1}) = H(X)$, summing the inequality, we get
$(n-1)H(X) \le \sum_{i=1}^n H(X^{(i)}).$
[Photo: Te Sun Han]
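Han's inequality is easy to verify numerically; the following sketch (ours) checks it on a random joint distribution over $\{0,1\}^3$:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
p = rng.random(2 ** n)
p /= p.sum()              # random joint distribution on {0,1}^3
p = p.reshape((2,) * n)

def H(q):
    q = q.ravel()
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

HX = H(p)
# H(X^{(i)}) is the entropy of the marginal with coordinate i summed out.
lhs = sum(HX - H(p.sum(axis=i)) for i in range(n))
print(f"sum_i (H(X) - H(X^(i))) = {lhs:.4f} <= H(X) = {HX:.4f}")
```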
38 combinatorial entropies: an example Let $X_1, \dots, X_n$ be independent points in the plane (of arbitrary distribution!). Let $N$ be the number of subsets of points that are in convex position. Then
$\mathrm{Var}(\log_2 N) \le \mathbb{E}\log_2 N.$
39-41 proof By Efron-Stein, it suffices to prove that $f$ is self-bounding:
$0 \le f_n(x) - f_{n-1}(x^{(i)}) \le 1 \quad \text{and} \quad \sum_{i=1}^n \left(f_n(x) - f_{n-1}(x^{(i)})\right) \le f_n(x).$
The first property is obvious; we only need to prove the second. This is a deterministic property, so fix the points.
42-44 proof Among all sets in convex position, draw one uniformly at random. Define $Y_i$ as the indicator that $x_i$ is in the chosen set. Then
$H(Y) = H(Y_1, \dots, Y_n) = \log_2 N = f_n(x).$
Also, $H(Y^{(i)}) \le f_{n-1}(x^{(i)})$, so by Han's inequality,
$\sum_{i=1}^n \left(f_n(x) - f_{n-1}(x^{(i)})\right) \le \sum_{i=1}^n \left(H(Y) - H(Y^{(i)})\right) \le H(Y) = f_n(x).$
45-46 han's inequality for relative entropies Let $P = P_1 \otimes \cdots \otimes P_n$ be a product distribution and $Q$ an arbitrary distribution on $\mathcal{X}^n$. By Han's inequality,
$D(Q \| P) \ge \frac{1}{n-1} \sum_{i=1}^n D(Q^{(i)} \| P^{(i)}).$
47-50 subadditivity of entropy The entropy of a random variable $Z \ge 0$ is
$\mathrm{Ent}(Z) = \mathbb{E}\Phi(Z) - \Phi(\mathbb{E}Z), \quad \text{where } \Phi(x) = x \log x.$
By Jensen's inequality, $\mathrm{Ent}(Z) \ge 0$. Let $X_1, \dots, X_n$ be independent and let $Z = f(X_1, \dots, X_n)$, where $f \ge 0$. $\mathrm{Ent}(Z)$ is the relative entropy between the distribution induced by $Z$ on $\mathcal{X}^n$ and the distribution of $X = (X_1, \dots, X_n)$. Denote
$\mathrm{Ent}^{(i)}(Z) = \mathbb{E}^{(i)}\Phi(Z) - \Phi(\mathbb{E}^{(i)} Z).$
Then by Han's inequality,
$\mathrm{Ent}(Z) \le \mathbb{E}\sum_{i=1}^n \mathrm{Ent}^{(i)}(Z).$
51 a logarithmic sobolev inequality on the hypercube Let $X = (X_1, \dots, X_n)$ be uniformly distributed over $\{-1, 1\}^n$. If $f : \{-1, 1\}^n \to \mathbb{R}$ and $Z = f(X)$,
$\mathrm{Ent}(Z^2) \le \frac{1}{2}\,\mathbb{E}\sum_{i=1}^n \left(Z - \tilde{Z}_i\right)^2,$
where $\tilde{Z}_i = f(X_1, \dots, -X_i, \dots, X_n)$ is obtained by flipping the $i$-th coordinate. The proof uses subadditivity of the entropy and calculus for the case $n = 1$. Implies Efron-Stein.
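This can be checked by brute-force enumeration; the sketch below (ours) verifies the inequality for a random function on $\{-1,1\}^3$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(6)
n = 3
cube = list(itertools.product([-1, 1], repeat=n))
f = {x: rng.normal() for x in cube}  # a random function on {-1,1}^3

# Ent(Z^2) = E[Z^2 log Z^2] - E[Z^2] log E[Z^2] under the uniform measure.
z2 = np.array([f[x] ** 2 for x in cube])
ent = np.mean(z2 * np.log(z2)) - np.mean(z2) * np.log(np.mean(z2))

def flip(x, i):
    # Flip the sign of the i-th coordinate.
    y = list(x)
    y[i] = -y[i]
    return tuple(y)

rhs = 0.5 * np.mean([sum((f[x] - f[flip(x, i)]) ** 2 for i in range(n))
                     for x in cube])
print(f"Ent(Z^2) = {ent:.4f} <= RHS = {rhs:.4f}")
```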
52 [Photo: Sergei Lvovich Sobolev]
53 herbst's argument: exponential concentration If $f : \{-1, 1\}^n \to \mathbb{R}$, the log-Sobolev inequality may be used with $g(x) = e^{\lambda f(x)/2}$, where $\lambda \in \mathbb{R}$. If $F(\lambda) = \mathbb{E}e^{\lambda Z}$ is the moment generating function of $Z = f(X)$,
$\mathrm{Ent}(g(X)^2) = \lambda\mathbb{E}\left[Z e^{\lambda Z}\right] - \mathbb{E}\left[e^{\lambda Z}\right] \log \mathbb{E}\left[e^{\lambda Z}\right] = \lambda F'(\lambda) - F(\lambda) \log F(\lambda).$
Differential inequalities are obtained for $F(\lambda)$.
54-55 herbst's argument As an example, suppose $f$ is such that $\sum_{i=1}^n (Z - \tilde{Z}_i)_+^2 \le v$. Then by the log-Sobolev inequality,
$\lambda F'(\lambda) - F(\lambda) \log F(\lambda) \le \frac{v\lambda^2}{4} F(\lambda).$
If $G(\lambda) = \log F(\lambda)$, this becomes
$\left(\frac{G(\lambda)}{\lambda}\right)' \le \frac{v}{4}.$
This can be integrated: $G(\lambda) \le \lambda\mathbb{E}Z + \lambda^2 v/4$, so
$F(\lambda) \le e^{\lambda\mathbb{E}Z + \lambda^2 v/4}.$
This implies
$P\{Z > \mathbb{E}Z + t\} \le e^{-t^2/v}.$
Stronger than the bounded differences inequality!
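The integration step compresses a short computation; spelled out (our addition), dividing the differential inequality by $\lambda^2 F(\lambda)$ gives
$\frac{G'(\lambda)}{\lambda} - \frac{G(\lambda)}{\lambda^2} = \left(\frac{G(\lambda)}{\lambda}\right)' \le \frac{v}{4}.$
Since $G(\lambda)/\lambda \to \mathbb{E}Z$ as $\lambda \downarrow 0$, integrating from $0$ to $\lambda$ yields $G(\lambda)/\lambda \le \mathbb{E}Z + \lambda v/4$, and the Chernoff bound with $\lambda = 2t/v$ gives $P\{Z > \mathbb{E}Z + t\} \le e^{-t^2/v}$.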
56-57 gaussian log-sobolev inequality Let $X = (X_1, \dots, X_n)$ be a vector of i.i.d. standard normal random variables. If $f : \mathbb{R}^n \to \mathbb{R}$ and $Z = f(X)$,
$\mathrm{Ent}(Z^2) \le 2\mathbb{E}\left[\|\nabla f(X)\|^2\right]$
(Gross, 1975). Proof sketch: by the subadditivity of entropy, it suffices to prove it for $n = 1$. Approximate $Z = f(X)$ by
$f\left(\frac{1}{\sqrt{m}}\sum_{i=1}^m \varepsilon_i\right),$
where the $\varepsilon_i$ are i.i.d. Rademacher random variables. Use the log-Sobolev inequality of the hypercube and the central limit theorem.
58 gaussian concentration inequality Herbst's argument may now be repeated. Suppose $f$ is Lipschitz: for all $x, y \in \mathbb{R}^n$,
$|f(x) - f(y)| \le L\|x - y\|.$
Then, for all $t > 0$,
$P\{f(X) - \mathbb{E}f(X) \ge t\} \le e^{-t^2/(2L^2)}$
(Tsirelson, Ibragimov, and Sudakov, 1976).
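For example (our sketch; the empirical mean stands in for $\mathbb{E}f(X)$), $f(x) = \|x\|$ is $1$-Lipschitz, so its upper tail should be bounded by $e^{-t^2/2}$ in any dimension:

```python
import numpy as np

rng = np.random.default_rng(7)
n, trials, t = 50, 100_000, 1.0

# f(x) = ||x|| is 1-Lipschitz, so P{f(X) - Ef(X) >= t} <= exp(-t^2/2).
Z = np.linalg.norm(rng.standard_normal((trials, n)), axis=1)
empirical = np.mean(Z - Z.mean() >= t)  # Z.mean() approximates Ef(X)
print(f"empirical tail ~ {empirical:.5f}, Gaussian bound = {np.exp(-t**2 / 2):.5f}")
```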
59 beyond bernoulli and gaussian: the entropy method For general distributions, logarithmic Sobolev inequalities are not available. Solution: modified logarithmic Sobolev inequalities. Suppose $X_1, \dots, X_n$ are independent. Let $Z = f(X_1, \dots, X_n)$ and $Z_i = f_i(X^{(i)}) = f_i(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$. Let $\phi(x) = e^x - x - 1$. Then for all $\lambda \in \mathbb{R}$,
$\lambda\mathbb{E}\left[Z e^{\lambda Z}\right] - \mathbb{E}\left[e^{\lambda Z}\right] \log \mathbb{E}\left[e^{\lambda Z}\right] \le \sum_{i=1}^n \mathbb{E}\left[e^{\lambda Z} \phi\left(-\lambda(Z - Z_i)\right)\right].$
[Photo: Michel Ledoux]
60-61 the entropy method Define $Z_i = \inf_{x_i'} f(X_1, \dots, x_i', \dots, X_n)$ and suppose
$\sum_{i=1}^n (Z - Z_i)^2 \le v.$
Then for all $t > 0$,
$P\{Z - \mathbb{E}Z > t\} \le e^{-t^2/(2v)}.$
This implies the bounded differences inequality and much more.
62 example: the largest eigenvalue of a symmetric matrix Let $A = (X_{i,j})_{n \times n}$ be symmetric, with the $X_{i,j}$, $i \le j$, independent and $|X_{i,j}| \le 1$. Let
$Z = \lambda_1 = \sup_{u : \|u\| = 1} u^T A u,$
and suppose $v$ is such that $Z = v^T A v$. Let $A_{i,j}'$ be obtained by replacing $X_{i,j}$ by $x_{i,j}'$. Then
$(Z - Z_{i,j}')_+ \le \left(v^T A v - v^T A_{i,j}' v\right)\mathbb{1}_{Z > Z_{i,j}'} = \left(v^T (A - A_{i,j}') v\right)\mathbb{1}_{Z > Z_{i,j}'} \le 2\left(v_i v_j (X_{i,j} - x_{i,j}')\right)_+ \le 4|v_i v_j|.$
Therefore,
$\sum_{1 \le i \le j \le n} (Z - Z_{i,j}')_+^2 \le \sum_{1 \le i \le j \le n} 16 v_i^2 v_j^2 \le 16\left(\sum_{i=1}^n v_i^2\right)^2 = 16.$
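With $v = 16$ the entropy method thus gives $P\{Z > \mathbb{E}Z + t\} \le e^{-t^2/32}$, uniformly in $n$. A simulation (ours; the empirical mean stands in for $\mathbb{E}Z$) for random symmetric sign matrices:

```python
import numpy as np

rng = np.random.default_rng(8)
n, trials, t = 30, 20_000, 2.0

def lam_max():
    # Symmetric matrix with independent +/-1 entries on and above the diagonal.
    U = rng.choice([-1.0, 1.0], size=(n, n))
    A = np.triu(U) + np.triu(U, 1).T
    return np.linalg.eigvalsh(A)[-1]

Z = np.array([lam_max() for _ in range(trials)])
empirical = np.mean(Z - Z.mean() > t)
print(f"empirical tail ~ {empirical:.5f}, entropy-method bound = {np.exp(-t**2 / 32):.5f}")
```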
63 self-bounding functions Suppose $Z$ satisfies $0 \le Z - Z_i \le 1$ and $\sum_{i=1}^n (Z - Z_i) \le Z$. Recall that $\mathrm{Var}(Z) \le \mathbb{E}Z$. We have much more:
$P\{Z > \mathbb{E}Z + t\} \le e^{-t^2/(2\mathbb{E}Z + 2t/3)} \quad \text{and} \quad P\{Z < \mathbb{E}Z - t\} \le e^{-t^2/(2\mathbb{E}Z)}.$
64-65 exponential efron-stein inequality Define
$V_+ = \mathbb{E}'\left[\sum_{i=1}^n (Z - Z_i')_+^2\right] \quad \text{and} \quad V_- = \mathbb{E}'\left[\sum_{i=1}^n (Z - Z_i')_-^2\right],$
where $\mathbb{E}'$ denotes expectation with respect to the independent copies $X_1', \dots, X_n'$ only. By Efron-Stein,
$\mathrm{Var}(Z) \le \mathbb{E}V_+ \quad \text{and} \quad \mathrm{Var}(Z) \le \mathbb{E}V_-.$
The following exponential versions hold for all $\lambda, \theta > 0$ with $\lambda\theta < 1$:
$\log \mathbb{E}e^{\lambda(Z - \mathbb{E}Z)} \le \frac{\lambda\theta}{1 - \lambda\theta} \log \mathbb{E}e^{\lambda V_+/\theta}.$
If also $Z_i' - Z \le 1$ for every $i$, then for all $\lambda \in (0, 1/2)$,
$\log \mathbb{E}e^{-\lambda(Z - \mathbb{E}Z)} \le \frac{2\lambda}{1 - 2\lambda} \log \mathbb{E}e^{\lambda V_-}.$
66-68 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$,
$\sum_{i=1}^n \left(f(x) - f_i(x^{(i)})\right)^2 \le a f(x) + b.$
Then
$P\{Z \ge \mathbb{E}Z + t\} \le \exp\left(-\frac{t^2}{2(a\mathbb{E}Z + b + at/2)}\right).$
If, in addition, $f(x) - f_i(x^{(i)}) \le 1$, then for $0 < t \le \mathbb{E}Z$,
$P\{Z \le \mathbb{E}Z - t\} \le \exp\left(-\frac{t^2}{2(a\mathbb{E}Z + b + c_- t)}\right),$
where $c_- = (3a - 1)_-/6$ (so $c_- = 0$ when $a \ge 1/3$).
69-71 the isoperimetric view Let $X = (X_1, \dots, X_n)$ have independent components, taking values in $\mathcal{X}^n$. Let $A \subset \mathcal{X}^n$. The Hamming distance of $X$ to $A$ is
$d(X, A) = \min_{y \in A} d(X, y) = \min_{y \in A} \sum_{i=1}^n \mathbb{1}_{X_i \ne y_i}.$
Then
$P\left\{d(X, A) \ge t + \sqrt{\frac{n}{2}\log\frac{1}{P\{A\}}}\right\} \le e^{-2t^2/n}.$
Proof: by the bounded differences inequality,
$P\{\mathbb{E}d(X, A) - d(X, A) > t\} \le e^{-2t^2/n}.$
Taking $t = \mathbb{E}d(X, A)$, and noting that $d(X, A) = 0$ on $A$ so that $P\{A\}$ is at most the probability on the left, we get
$\mathbb{E}d(X, A) \le \sqrt{\frac{n}{2}\log\frac{1}{P\{A\}}}.$
By the bounded differences inequality again,
$P\left\{d(X, A) \ge t + \sqrt{\frac{n}{2}\log\frac{1}{P\{A\}}}\right\} \le e^{-2t^2/n}.$
[Photo: Michel Talagrand]
72 talagrand's convex distance The weighted Hamming distance is
$d_\alpha(x, A) = \inf_{y \in A} d_\alpha(x, y) = \inf_{y \in A} \sum_{i : x_i \ne y_i} \alpha_i,$
where $\alpha = (\alpha_1, \dots, \alpha_n)$. The same argument as before gives
$P\left\{d_\alpha(X, A) \ge t + \sqrt{\frac{\|\alpha\|^2}{2}\log\frac{1}{P\{A\}}}\right\} \le e^{-2t^2/\|\alpha\|^2}.$
This implies
$\sup_{\alpha : \|\alpha\| = 1} \min\left(P\{A\},\, P\{d_\alpha(X, A) \ge t\}\right) \le e^{-t^2/2}.$
73-75 convex distance inequality The convex distance is
$d_T(x, A) = \sup_{\alpha \in [0, \infty)^n : \|\alpha\| = 1} d_\alpha(x, A).$
Talagrand's convex distance inequality:
$P\{A\}\, P\{d_T(X, A) \ge t\} \le e^{-t^2/4}.$
Follows from the fact that $d_T(X, A)^2$ is $(4, 0)$ weakly self-bounding (by a saddle point representation of $d_T$). Talagrand's original proof was different. It can also be recovered from Marton's transportation inequality.
76-77 convex lipschitz functions For $A \subset [0, 1]^n$ and $x \in [0, 1]^n$, define
$D(x, A) = \inf_{y \in A} \|x - y\|.$
If $A$ is convex, then $D(x, A) \le d_T(x, A)$. Proof:
$D(x, A) = \inf_{\nu \in \mathcal{M}(A)} \|x - \mathbb{E}_\nu Y\| \quad \text{(since } A \text{ is convex)}$
$\le \inf_{\nu \in \mathcal{M}(A)} \sqrt{\sum_{j=1}^n \left(\mathbb{E}_\nu \mathbb{1}_{x_j \ne Y_j}\right)^2} \quad \text{(since } x_j, Y_j \in [0, 1])$
$= \inf_{\nu \in \mathcal{M}(A)} \sup_{\alpha : \|\alpha\| \le 1} \sum_{j=1}^n \alpha_j\, \mathbb{E}_\nu \mathbb{1}_{x_j \ne Y_j} \quad \text{(by Cauchy-Schwarz)}$
$= d_T(x, A) \quad \text{(by a minimax theorem)}.$
78-79 convex lipschitz functions Let $X = (X_1, \dots, X_n)$ have independent components taking values in $[0, 1]$. Let $f : [0, 1]^n \to \mathbb{R}$ be quasi-convex such that $|f(x) - f(y)| \le \|x - y\|$. Then
$P\{f(X) > \mathrm{M}f(X) + t\} \le 2e^{-t^2/4} \quad \text{and} \quad P\{f(X) < \mathrm{M}f(X) - t\} \le 2e^{-t^2/4},$
where $\mathrm{M}f(X)$ denotes a median of $f(X)$. Proof: let $A_s = \{x : f(x) \le s\} \subset [0, 1]^n$; $A_s$ is convex since $f$ is quasi-convex. Since $f$ is Lipschitz,
$f(x) \le s + D(x, A_s) \le s + d_T(x, A_s),$
so by the convex distance inequality,
$P\{f(X) \ge s + t\}\, P\{f(X) \le s\} \le e^{-t^2/4}.$
Take $s = \mathrm{M}f(X)$ for the upper tail and $s = \mathrm{M}f(X) - t$ for the lower tail.
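As a sanity check (ours; the Euclidean norm is convex, hence quasi-convex, and $1$-Lipschitz, and the parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
n, trials, t = 50, 100_000, 2.0

# f(x) = ||x|| on [0,1]^n, concentrated around its median.
Z = np.linalg.norm(rng.random((trials, n)), axis=1)
med = np.median(Z)
upper = np.mean(Z > med + t)
lower = np.mean(Z < med - t)
print(f"upper ~ {upper:.5f}, lower ~ {lower:.5f}, bound = {2 * np.exp(-t**2 / 4):.5f}")
```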
80-82 Φ entropies For a convex function $\Phi$ on $[0, \infty)$, the $\Phi$-entropy of $Z \ge 0$ is
$H_\Phi(Z) = \mathbb{E}\Phi(Z) - \Phi(\mathbb{E}Z).$
$H_\Phi$ is subadditive:
$H_\Phi(Z) \le \sum_{i=1}^n \mathbb{E}H_\Phi^{(i)}(Z)$
if (and only if) $\Phi$ is twice differentiable on $(0, \infty)$, and either $\Phi$ is affine or $\Phi''$ is strictly positive and $1/\Phi''$ is concave. $\Phi(x) = x^2$ corresponds to Efron-Stein. $\Phi(x) = x\log x$ is subadditivity of entropy. We may consider $\Phi(x) = x^p$ for $p \in (1, 2]$.
83-84 generalized efron-stein Define
$Z_i' = f(X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n), \quad V_+ = \sum_{i=1}^n (Z - Z_i')_+^2.$
For $q \ge 2$ and $q/2 \le \alpha \le q - 1$,
$\mathbb{E}\left[(Z - \mathbb{E}Z)_+^q\right] \le \mathbb{E}\left[(Z - \mathbb{E}Z)_+^\alpha\right]^{q/\alpha} + \alpha(q - \alpha)\,\mathbb{E}\left[V_+ (Z - \mathbb{E}Z)_+^{q-2}\right],$
and similarly for $\mathbb{E}\left[(Z - \mathbb{E}Z)_-^q\right]$.
85-87 moment inequalities We may solve the recursions for $q \ge 2$. If $V_+ \le c$ for some constant $c \ge 0$, then for all integers $q \ge 2$,
$\left(\mathbb{E}\left[(Z - \mathbb{E}Z)_+^q\right]\right)^{1/q} \le \sqrt{Kqc},$
where $K = 1/(e - \sqrt{e}) < 0.94$. More generally,
$\left(\mathbb{E}\left[(Z - \mathbb{E}Z)_+^q\right]\right)^{1/q} \le \sqrt{Kq}\left(\mathbb{E}\left[V_+^{q/2}\right]\right)^{1/q}.$