Concentration inequalities and the entropy method


1 Concentration inequalities and the entropy method Gábor Lugosi ICREA and Pompeu Fabra University Barcelona

2 what is concentration? We are interested in bounding random fluctuations of functions of many independent random variables.

3 what is concentration? We are interested in bounding random fluctuations of functions of many independent random variables. $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. How large are typical deviations of $Z$ from $\mathbf{E}Z$? We seek upper bounds for $\mathbf{P}\{Z > \mathbf{E}Z + t\}$ and $\mathbf{P}\{Z < \mathbf{E}Z - t\}$ for $t > 0$.

4 various approaches - martingales (Yurinskii, 1974; Milman and Schechtman, 1986; Shamir and Spencer, 1987; McDiarmid, 1989, 1998); - information theoretic and transportation methods (Ahlswede, Gács, and Körner, 1976; Marton 1986, 1996, 1997; Dembo 1997); - Talagrand's induction method, 1996; - logarithmic Sobolev inequalities (Ledoux 1996, Massart 1998, Boucheron, Lugosi, Massart 1999, 2001).

5 chernoff bounds By Markov's inequality, if $\lambda > 0$, $\mathbf{P}\{Z - \mathbf{E}Z > t\} = \mathbf{P}\left\{e^{\lambda(Z - \mathbf{E}Z)} > e^{\lambda t}\right\} \le \frac{\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}}{e^{\lambda t}}$. Next derive bounds for the moment generating function $\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}$ and optimize $\lambda$.

6 chernoff bounds By Markov's inequality, if $\lambda > 0$, $\mathbf{P}\{Z - \mathbf{E}Z > t\} = \mathbf{P}\left\{e^{\lambda(Z - \mathbf{E}Z)} > e^{\lambda t}\right\} \le \frac{\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}}{e^{\lambda t}}$. Next derive bounds for the moment generating function $\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}$ and optimize $\lambda$. If $Z = \sum_{i=1}^n X_i$ is a sum of independent random variables, $\mathbf{E} e^{\lambda Z} = \mathbf{E} \prod_{i=1}^n e^{\lambda X_i} = \prod_{i=1}^n \mathbf{E} e^{\lambda X_i}$ by independence. It suffices to find bounds for $\mathbf{E} e^{\lambda X_i}$.

7 chernoff bounds By Markov's inequality, if $\lambda > 0$, $\mathbf{P}\{Z - \mathbf{E}Z > t\} = \mathbf{P}\left\{e^{\lambda(Z - \mathbf{E}Z)} > e^{\lambda t}\right\} \le \frac{\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}}{e^{\lambda t}}$. Next derive bounds for the moment generating function $\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}$ and optimize $\lambda$. If $Z = \sum_{i=1}^n X_i$ is a sum of independent random variables, $\mathbf{E} e^{\lambda Z} = \mathbf{E} \prod_{i=1}^n e^{\lambda X_i} = \prod_{i=1}^n \mathbf{E} e^{\lambda X_i}$ by independence. It suffices to find bounds for $\mathbf{E} e^{\lambda X_i}$. Serguei Bernstein (1880-1968) Herman Chernoff (1923- )
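
A minimal numerical sketch of the Chernoff method (added for illustration, not part of the slides; Python with numpy and scipy is assumed, and the Bernoulli example and all parameters are choices of this note): minimize $e^{-\lambda t}\prod_{i=1}^n \mathbf{E}e^{\lambda(X_i-\mathbf{E}X_i)}$ over a grid of $\lambda$ values and compare with the exact Binomial tail.

```python
import numpy as np
from scipy.stats import binom

n, p, t = 100, 0.5, 10.0   # X_i ~ Bernoulli(p); t is the deviation of the sum above its mean

def mgf_centered(lam):
    # E exp(lambda (X_i - p)) for a Bernoulli(p) variable (closed form)
    return np.exp(-lam * p) * (1 - p + p * np.exp(lam))

# Chernoff bound: minimize exp(-lambda t) * prod_i E exp(lambda (X_i - p)) over lambda > 0
lambdas = np.linspace(1e-4, 5.0, 2000)
chernoff = np.min(np.exp(-lambdas * t) * mgf_centered(lambdas) ** n)

exact = binom.sf(n * p + t, n, p)   # exact P{ sum_i X_i - n p > t }
print(f"Chernoff bound: {chernoff:.3e}   exact tail: {exact:.3e}")
```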

8 hoeffding's inequality If $X_1, \dots, X_n \in [0, 1]$, then $\mathbf{E} e^{\lambda(X_i - \mathbf{E}X_i)} \le e^{\lambda^2/8}$.

9 hoeffding's inequality If $X_1, \dots, X_n \in [0, 1]$, then $\mathbf{E} e^{\lambda(X_i - \mathbf{E}X_i)} \le e^{\lambda^2/8}$. We obtain $\mathbf{P}\left\{ \left| \frac{1}{n}\sum_{i=1}^n X_i - \mathbf{E}\left[\frac{1}{n}\sum_{i=1}^n X_i\right] \right| > t \right\} \le 2 e^{-2nt^2}$. Wassily Hoeffding (1914-1991)
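
A quick Monte Carlo sanity check (illustration only, not from the slides; numpy assumed, and the choice of Uniform[0,1] variables, $n$, and $t$ is arbitrary): the empirical two-sided tail of the sample mean is compared with the bound $2e^{-2nt^2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, t = 50, 100_000, 0.1

# means of n i.i.d. Uniform[0,1] variables; each mean has expectation 1/2
means = rng.random((reps, n)).mean(axis=1)
empirical = np.mean(np.abs(means - 0.5) > t)
bound = 2 * np.exp(-2 * n * t ** 2)
print(f"empirical tail: {empirical:.4f}   Hoeffding bound: {bound:.4f}")
```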

10 martingale representation $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Denote $\mathbf{E}_i[\cdot] = \mathbf{E}[\cdot \mid X_1, \dots, X_i]$. Thus, $\mathbf{E}_0 Z = \mathbf{E}Z$ and $\mathbf{E}_n Z = Z$.

11 martingale representation $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Denote $\mathbf{E}_i[\cdot] = \mathbf{E}[\cdot \mid X_1, \dots, X_i]$. Thus, $\mathbf{E}_0 Z = \mathbf{E}Z$ and $\mathbf{E}_n Z = Z$. Writing $\Delta_i = \mathbf{E}_i Z - \mathbf{E}_{i-1} Z$, we have $Z - \mathbf{E}Z = \sum_{i=1}^n \Delta_i$. This is the Doob martingale representation of $Z$.

12 martingale representation $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Denote $\mathbf{E}_i[\cdot] = \mathbf{E}[\cdot \mid X_1, \dots, X_i]$. Thus, $\mathbf{E}_0 Z = \mathbf{E}Z$ and $\mathbf{E}_n Z = Z$. Writing $\Delta_i = \mathbf{E}_i Z - \mathbf{E}_{i-1} Z$, we have $Z - \mathbf{E}Z = \sum_{i=1}^n \Delta_i$. This is the Doob martingale representation of $Z$. Joseph Leo Doob (1910-2004)

13 martingale representation: the variance $\mathrm{Var}(Z) = \mathbf{E}\left(\sum_{i=1}^n \Delta_i\right)^2 = \sum_{i=1}^n \mathbf{E}\left[\Delta_i^2\right] + 2\sum_{j > i} \mathbf{E}\left[\Delta_i \Delta_j\right]$. Now if $j > i$, $\mathbf{E}_i \Delta_j = 0$, so $\mathbf{E}\left[\Delta_i \Delta_j\right] = \mathbf{E}\left[\Delta_i\, \mathbf{E}_i \Delta_j\right] = 0$. We obtain $\mathrm{Var}(Z) = \sum_{i=1}^n \mathbf{E}\left[\Delta_i^2\right]$.

14 martingale representation: the variance $\mathrm{Var}(Z) = \mathbf{E}\left(\sum_{i=1}^n \Delta_i\right)^2 = \sum_{i=1}^n \mathbf{E}\left[\Delta_i^2\right] + 2\sum_{j > i} \mathbf{E}\left[\Delta_i \Delta_j\right]$. Now if $j > i$, $\mathbf{E}_i \Delta_j = 0$, so $\mathbf{E}\left[\Delta_i \Delta_j\right] = \mathbf{E}\left[\Delta_i\, \mathbf{E}_i \Delta_j\right] = 0$. We obtain $\mathrm{Var}(Z) = \sum_{i=1}^n \mathbf{E}\left[\Delta_i^2\right]$. From this, using independence, it is easy to derive the Efron-Stein inequality.

15 efron-stein inequality (1981) Let $X_1, \dots, X_n$ be independent random variables taking values in $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Then $\mathrm{Var}(Z) \le \mathbf{E}\sum_{i=1}^n \left(Z - \mathbf{E}^{(i)} Z\right)^2 = \mathbf{E}\sum_{i=1}^n \mathrm{Var}^{(i)}(Z)$, where $\mathbf{E}^{(i)} Z$ denotes expectation with respect to the $i$-th variable $X_i$ only.

16 efron-stein inequality (1981) Let $X_1, \dots, X_n$ be independent random variables taking values in $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Then $\mathrm{Var}(Z) \le \mathbf{E}\sum_{i=1}^n \left(Z - \mathbf{E}^{(i)} Z\right)^2 = \mathbf{E}\sum_{i=1}^n \mathrm{Var}^{(i)}(Z)$, where $\mathbf{E}^{(i)} Z$ denotes expectation with respect to the $i$-th variable $X_i$ only. We obtain more useful forms by using that $\mathrm{Var}(X) = \frac{1}{2}\mathbf{E}(X - X')^2$ and $\mathrm{Var}(X) \le \mathbf{E}(X - a)^2$ for any constant $a$.

17 efron-stein inequality (1981) If $X_1', \dots, X_n'$ are independent copies of $X_1, \dots, X_n$, and $Z_i' = f(X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$, then $\mathrm{Var}(Z) \le \frac{1}{2}\mathbf{E}\left[\sum_{i=1}^n (Z - Z_i')^2\right]$. $Z$ is concentrated if it doesn't depend too much on any of its variables.

18 efron-stein inequality (1981) If $X_1', \dots, X_n'$ are independent copies of $X_1, \dots, X_n$, and $Z_i' = f(X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$, then $\mathrm{Var}(Z) \le \frac{1}{2}\mathbf{E}\left[\sum_{i=1}^n (Z - Z_i')^2\right]$. $Z$ is concentrated if it doesn't depend too much on any of its variables. If $Z = \sum_{i=1}^n X_i$ then we have an equality. Sums are the least concentrated of all functions!
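
The Efron-Stein bound is easy to estimate by Monte Carlo. The sketch below (illustrative, not from the slides; numpy assumed, and the choice $Z = \max_i X_i$ with uniform coordinates is arbitrary) compares the estimated bound $\frac{1}{2}\mathbf{E}\sum_i (Z - Z_i')^2$ with the empirical variance of $Z$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 100_000

X = rng.random((reps, n))    # X_1, ..., X_n
Xp = rng.random((reps, n))   # independent copies X'_1, ..., X'_n
Z = X.max(axis=1)

# accumulate sum_i (Z - Z'_i)^2 where Z'_i replaces the i-th coordinate by its copy
es = np.zeros(reps)
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = Xp[:, i]
    es += (Z - Xi.max(axis=1)) ** 2

print(f"Var(Z) ~ {Z.var():.5f}   Efron-Stein bound ~ {0.5 * es.mean():.5f}")
```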

19 efron-stein inequality (1981) If $Z_i = f_i(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$ for some arbitrary functions $f_i$, then $\mathrm{Var}(Z) \le \mathbf{E}\left[\sum_{i=1}^n (Z - Z_i)^2\right]$.

20 efron, stein, and steele Bradley Efron Charles Stein Mike Steele

21 example: kernel density estimation Let $X_1, \dots, X_n$ be i.i.d. real samples drawn according to some density $\phi$. The kernel density estimate is $\phi_n(x) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x - X_i}{h}\right)$, where $h > 0$ and $K$ is a nonnegative kernel with $\int K = 1$. The $L_1$ error is $Z = f(X_1, \dots, X_n) = \int \left|\phi(x) - \phi_n(x)\right| dx$.

22 example: kernel density estimation Let $X_1, \dots, X_n$ be i.i.d. real samples drawn according to some density $\phi$. The kernel density estimate is $\phi_n(x) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x - X_i}{h}\right)$, where $h > 0$ and $K$ is a nonnegative kernel with $\int K = 1$. The $L_1$ error is $Z = f(X_1, \dots, X_n) = \int \left|\phi(x) - \phi_n(x)\right| dx$. It is easy to see that $\left|f(x_1, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n)\right| \le \frac{1}{nh}\int \left| K\left(\frac{x - x_i}{h}\right) - K\left(\frac{x - x_i'}{h}\right) \right| dx \le \frac{2}{n}$ (Devroye, 1991), so we get $\mathrm{Var}(Z) \le \frac{2}{n}$.
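
The bound $\mathrm{Var}(Z) \le 2/n$ can be checked by simulation (a rough sketch, not from the slides; numpy assumed, with a Gaussian kernel, a standard normal $\phi$, and the $L_1$ integral approximated on a grid, all choices of this note).

```python
import numpy as np

rng = np.random.default_rng(2)
n, h, reps = 200, 0.3, 500
grid = np.linspace(-5, 5, 1001)
dx = grid[1] - grid[0]
true_density = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)   # phi = N(0,1)

def l1_error(sample):
    # kernel density estimate with Gaussian kernel and bandwidth h, evaluated on the grid
    diffs = (grid[None, :] - sample[:, None]) / h
    kde = np.exp(-diffs ** 2 / 2).sum(axis=0) / (n * h * np.sqrt(2 * np.pi))
    return np.sum(np.abs(true_density - kde)) * dx            # Riemann sum for the L1 error

Z = np.array([l1_error(rng.standard_normal(n)) for _ in range(reps)])
print(f"empirical Var(Z): {Z.var():.2e}   bound 2/n: {2 / n:.2e}")
```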

23 Luc Devroye on Taksim square (2013).

24 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$.

25 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$. Then $\mathrm{Var}(f(X)) \le a\,\mathbf{E}f(X) + b$.

26 self-bounding functions If $0 \le f(x) - f_i(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right) \le f(x)$, then $f$ is self-bounding and $\mathrm{Var}(f(X)) \le \mathbf{E}f(X)$.

27 self-bounding functions If $0 \le f(x) - f_i(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right) \le f(x)$, then $f$ is self-bounding and $\mathrm{Var}(f(X)) \le \mathbf{E}f(X)$. Rademacher averages, random VC dimension, random VC entropy, and the longest increasing subsequence in a random permutation are all examples of self-bounding functions.

28 self-bounding functions If $0 \le f(x) - f_i(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right) \le f(x)$, then $f$ is self-bounding and $\mathrm{Var}(f(X)) \le \mathbf{E}f(X)$. Rademacher averages, random VC dimension, random VC entropy, and the longest increasing subsequence in a random permutation are all examples of self-bounding functions. Configuration functions.
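
For example, the length of the longest increasing subsequence of an i.i.d. uniform sample (whose relative order is a uniform random permutation) is self-bounding: removing a point decreases the length by at most 1, and only points lying on one fixed maximum increasing subsequence contribute, so $\mathrm{Var}(Z) \le \mathbf{E}Z$. A simulation sketch (not from the slides; numpy assumed, parameters arbitrary):

```python
import bisect
import numpy as np

def lis_length(x):
    # patience sorting: length of the longest increasing subsequence of x
    tails = []
    for v in x:
        j = bisect.bisect_left(tails, v)
        if j == len(tails):
            tails.append(v)
        else:
            tails[j] = v
    return len(tails)

rng = np.random.default_rng(3)
n, reps = 200, 20_000
Z = np.array([lis_length(rng.random(n)) for _ in range(reps)])
print(f"E Z ~ {Z.mean():.2f}   Var(Z) ~ {Z.var():.2f}   (self-bounding: Var(Z) <= E Z)")
```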

29 beyond the variance $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Recall the Doob martingale representation: $Z - \mathbf{E}Z = \sum_{i=1}^n \Delta_i$ where $\Delta_i = \mathbf{E}_i Z - \mathbf{E}_{i-1} Z$, with $\mathbf{E}_i[\cdot] = \mathbf{E}[\cdot \mid X_1, \dots, X_i]$. To get exponential inequalities, we bound the moment generating function $\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}$.

30 azuma's inequality Suppose that the martingale differences are bounded: $|\Delta_i| \le c_i$. Then $\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)} = \mathbf{E} e^{\lambda \sum_{i=1}^n \Delta_i} = \mathbf{E}\left[ \mathbf{E}_{n-1} e^{\lambda \sum_{i=1}^{n-1} \Delta_i + \lambda \Delta_n} \right] = \mathbf{E}\left[ e^{\lambda \sum_{i=1}^{n-1} \Delta_i}\, \mathbf{E}_{n-1} e^{\lambda \Delta_n} \right] \le \mathbf{E}\left[ e^{\lambda \sum_{i=1}^{n-1} \Delta_i} \right] e^{\lambda^2 c_n^2/2}$ (by Hoeffding) $\le \dots \le e^{\lambda^2 \left(\sum_{i=1}^n c_i^2\right)/2}$. This is the Azuma-Hoeffding inequality for sums of bounded martingale differences.

31 bounded differences inequality If $Z = f(X_1, \dots, X_n)$ and $f$ is such that $\left| f(x_1, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n) \right| \le c_i$, then the martingale differences are bounded.

32 bounded differences inequality If $Z = f(X_1, \dots, X_n)$ and $f$ is such that $\left| f(x_1, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n) \right| \le c_i$, then the martingale differences are bounded. Bounded differences inequality: if $X_1, \dots, X_n$ are independent, then $\mathbf{P}\{|Z - \mathbf{E}Z| > t\} \le 2 e^{-2t^2/\sum_{i=1}^n c_i^2}$.

33 Colin McDiarmid bounded differences inequality If $Z = f(X_1, \dots, X_n)$ and $f$ is such that $\left| f(x_1, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n) \right| \le c_i$, then the martingale differences are bounded. Bounded differences inequality: if $X_1, \dots, X_n$ are independent, then $\mathbf{P}\{|Z - \mathbf{E}Z| > t\} \le 2 e^{-2t^2/\sum_{i=1}^n c_i^2}$. McDiarmid's inequality.
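
As a concrete illustration (a sketch not in the slides; numpy assumed, and the balls-in-bins example is a standard choice of this note, not taken from the deck): let $Z$ be the number of empty bins when $n$ balls are thrown independently into $n$ bins. Moving a single ball changes $Z$ by at most 1, so $c_i = 1$ and the inequality gives $\mathbf{P}\{|Z - \mathbf{E}Z| > t\} \le 2e^{-2t^2/n}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, t = 100, 100_000, 10

# Z = number of empty bins; changing one ball's bin changes Z by at most 1 (c_i = 1)
balls = rng.integers(0, n, size=(reps, n))
Z = np.array([n - np.unique(row).size for row in balls])

empirical = np.mean(np.abs(Z - Z.mean()) > t)   # EZ replaced by the empirical mean
bound = 2 * np.exp(-2 * t ** 2 / n)
print(f"empirical tail: {empirical:.4f}   bounded differences bound: {bound:.4f}")
```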

34 bounded differences inequality Easy to use. Distribution free. Often close to optimal. Does not exploit variance information. Often too rigid. Other methods are necessary.

35 shannon entropy If $X, Y$ are random variables taking values in a set of size $N$, $H(X) = -\sum_x p(x) \log p(x)$ and $H(X \mid Y) = H(X, Y) - H(Y) = -\sum_{x,y} p(x, y) \log p(x \mid y)$. Then $H(X) \le \log N$ and $H(X \mid Y) \le H(X)$. Claude Shannon (1916-2001)

36 han's inequality If $X = (X_1, \dots, X_n)$ and $X^{(i)} = (X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$, then $\sum_{i=1}^n \left( H(X) - H(X^{(i)}) \right) \le H(X)$. Te Sun Han

37 han's inequality If $X = (X_1, \dots, X_n)$ and $X^{(i)} = (X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$, then $\sum_{i=1}^n \left( H(X) - H(X^{(i)}) \right) \le H(X)$. Proof: $H(X) = H(X^{(i)}) + H(X_i \mid X^{(i)}) \le H(X^{(i)}) + H(X_i \mid X_1, \dots, X_{i-1})$. Since $\sum_{i=1}^n H(X_i \mid X_1, \dots, X_{i-1}) = H(X)$, summing the inequality over $i$, we get $(n-1) H(X) \le \sum_{i=1}^n H(X^{(i)})$. Te Sun Han
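
Han's inequality can be verified directly on a small joint distribution (an illustrative check, not from the slides; numpy assumed, with a random pmf on $\{0,1\}^3$ chosen arbitrarily).

```python
import numpy as np

rng = np.random.default_rng(5)
p = rng.random((2, 2, 2))
p /= p.sum()                  # a random joint pmf on {0,1}^3

def H(q):
    q = q.ravel()
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

n = p.ndim
# H(X^(i)) is the entropy of the marginal obtained by summing out coordinate i
lhs = sum(H(p) - H(p.sum(axis=i)) for i in range(n))
print(f"sum_i [H(X) - H(X^(i))] = {lhs:.4f}  <=  H(X) = {H(p):.4f}")
```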

38 combinatorial entropies: an example Let $X_1, \dots, X_n$ be independent points in the plane (of arbitrary distribution!). Let $N$ be the number of subsets of points that are in convex position. Then $\mathrm{Var}(\log_2 N) \le \mathbf{E} \log_2 N$.

39 proof By Efron-Stein, it suffices to prove that $f$ is self-bounding: $0 \le f_n(x) - f_{n-1}(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f_n(x) - f_{n-1}(x^{(i)}) \right) \le f_n(x)$.

40 proof By Efron-Stein, it suffices to prove that $f$ is self-bounding: $0 \le f_n(x) - f_{n-1}(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f_n(x) - f_{n-1}(x^{(i)}) \right) \le f_n(x)$. The first property is obvious; we only need to prove the second.

41 proof By Efron-Stein, it suffices to prove that $f$ is self-bounding: $0 \le f_n(x) - f_{n-1}(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f_n(x) - f_{n-1}(x^{(i)}) \right) \le f_n(x)$. The first property is obvious; we only need to prove the second. This is a deterministic property, so fix the points.

42 proof Among all sets in convex position, draw one uniformly at random. Define $Y_i$ as the indicator that $x_i$ is in the chosen set.

43 proof Among all sets in convex position, draw one uniformly at random. Define $Y_i$ as the indicator that $x_i$ is in the chosen set. $H(Y) = H(Y_1, \dots, Y_n) = \log_2 N = f_n(x)$. Also, $H(Y^{(i)}) \le f_{n-1}(x^{(i)})$.

44 proof Among all sets in convex position, draw one uniformly at random. Define $Y_i$ as the indicator that $x_i$ is in the chosen set. $H(Y) = H(Y_1, \dots, Y_n) = \log_2 N = f_n(x)$. Also, $H(Y^{(i)}) \le f_{n-1}(x^{(i)})$, so by Han's inequality, $\sum_{i=1}^n \left( f_n(x) - f_{n-1}(x^{(i)}) \right) \le \sum_{i=1}^n \left( H(Y) - H(Y^{(i)}) \right) \le H(Y) = f_n(x)$.

45 Han's inequality for relative entropies Let $P = P_1 \otimes \cdots \otimes P_n$ be a product distribution and $Q$ an arbitrary distribution on $\mathcal{X}^n$.

46 Han's inequality for relative entropies Let $P = P_1 \otimes \cdots \otimes P_n$ be a product distribution and $Q$ an arbitrary distribution on $\mathcal{X}^n$. By Han's inequality, $D(Q \| P) \ge \frac{1}{n-1} \sum_{i=1}^n D(Q^{(i)} \| P^{(i)})$.

47 subadditivity of entropy The entropy of a random variable $Z \ge 0$ is $\mathrm{Ent}(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$ where $\Phi(x) = x \log x$. By Jensen's inequality, $\mathrm{Ent}(Z) \ge 0$.

48 subadditivity of entropy The entropy of a random variable $Z \ge 0$ is $\mathrm{Ent}(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$ where $\Phi(x) = x \log x$. By Jensen's inequality, $\mathrm{Ent}(Z) \ge 0$. Let $X_1, \dots, X_n$ be independent and let $Z = f(X_1, \dots, X_n)$, where $f \ge 0$.

49 subadditivity of entropy The entropy of a random variable $Z \ge 0$ is $\mathrm{Ent}(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$ where $\Phi(x) = x \log x$. By Jensen's inequality, $\mathrm{Ent}(Z) \ge 0$. Let $X_1, \dots, X_n$ be independent and let $Z = f(X_1, \dots, X_n)$, where $f \ge 0$. $\mathrm{Ent}(Z)$ is the relative entropy between the distribution induced by $Z$ on $\mathcal{X}^n$ and the distribution of $X = (X_1, \dots, X_n)$.

50 subadditivity of entropy The entropy of a random variable $Z \ge 0$ is $\mathrm{Ent}(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$ where $\Phi(x) = x \log x$. By Jensen's inequality, $\mathrm{Ent}(Z) \ge 0$. Let $X_1, \dots, X_n$ be independent and let $Z = f(X_1, \dots, X_n)$, where $f \ge 0$. $\mathrm{Ent}(Z)$ is the relative entropy between the distribution induced by $Z$ on $\mathcal{X}^n$ and the distribution of $X = (X_1, \dots, X_n)$. Denote $\mathrm{Ent}^{(i)}(Z) = \mathbf{E}^{(i)}\Phi(Z) - \Phi(\mathbf{E}^{(i)} Z)$. Then by Han's inequality, $\mathrm{Ent}(Z) \le \mathbf{E}\sum_{i=1}^n \mathrm{Ent}^{(i)}(Z)$.
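
The sub-additivity of $\mathrm{Ent}$ can be checked by brute force on a small product space (a sketch, not from the slides; numpy assumed, with an arbitrary positive $f$ on $\{0,1\}^3$ and arbitrary marginals chosen for illustration).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
probs = [np.array([0.3, 0.7]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]
f = rng.random((2,) * n) + 0.1                           # an arbitrary positive f on {0,1}^3

def phi(x):
    return x * np.log(x)

joint = np.einsum('i,j,k->ijk', *probs)                  # product distribution of (X_1, X_2, X_3)
ent = np.sum(joint * phi(f)) - phi(np.sum(joint * f))    # Ent(Z) with Z = f(X)

rhs = 0.0
for i in range(n):
    ef = np.tensordot(f, probs[i], axes=([i], [0]))          # E^(i) Z as a function of x^(i)
    ephi = np.tensordot(phi(f), probs[i], axes=([i], [0]))   # E^(i) Phi(Z)
    marg = joint.sum(axis=i)                                 # distribution of X^(i)
    rhs += np.sum(marg * (ephi - phi(ef)))                   # E[ Ent^(i)(Z) ]

print(f"Ent(Z) = {ent:.5f}  <=  E sum_i Ent^(i)(Z) = {rhs:.5f}")
```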

51 a logarithmic sobolev inequality on the hypercube Let $X = (X_1, \dots, X_n)$ be uniformly distributed over $\{-1, 1\}^n$. If $f : \{-1, 1\}^n \to \mathbb{R}$ and $Z = f(X)$, $\mathrm{Ent}(Z^2) \le \frac{1}{2}\mathbf{E}\sum_{i=1}^n (Z - Z_i')^2$. The proof uses subadditivity of the entropy and calculus for the case $n = 1$. Implies Efron-Stein.

52 Sergei Lvovich Sobolev (1908-1989)

53 herbst's argument: exponential concentration If $f : \{-1, 1\}^n \to \mathbb{R}$, the log-Sobolev inequality may be used with $g(x) = e^{\lambda f(x)/2}$ where $\lambda \in \mathbb{R}$. If $F(\lambda) = \mathbf{E} e^{\lambda Z}$ is the moment generating function of $Z = f(X)$, $\mathrm{Ent}(g(X)^2) = \lambda\mathbf{E}\left[Z e^{\lambda Z}\right] - \mathbf{E}\left[e^{\lambda Z}\right] \log \mathbf{E}\left[e^{\lambda Z}\right] = \lambda F'(\lambda) - F(\lambda) \log F(\lambda)$. Differential inequalities are obtained for $F(\lambda)$.

54 herbst's argument As an example, suppose $f$ is such that $\sum_{i=1}^n (Z - Z_i')_+^2 \le v$. Then by the log-Sobolev inequality, $\lambda F'(\lambda) - F(\lambda) \log F(\lambda) \le \frac{v \lambda^2}{4} F(\lambda)$. If $G(\lambda) = \log F(\lambda)$, this becomes $\left( \frac{G(\lambda)}{\lambda} \right)' \le \frac{v}{4}$. This can be integrated: $G(\lambda) \le \lambda \mathbf{E}Z + \lambda^2 v/4$, so $F(\lambda) \le e^{\lambda \mathbf{E}Z + \lambda^2 v/4}$. This implies $\mathbf{P}\{Z > \mathbf{E}Z + t\} \le e^{-t^2/v}$.

55 herbst's argument As an example, suppose $f$ is such that $\sum_{i=1}^n (Z - Z_i')_+^2 \le v$. Then by the log-Sobolev inequality, $\lambda F'(\lambda) - F(\lambda) \log F(\lambda) \le \frac{v \lambda^2}{4} F(\lambda)$. If $G(\lambda) = \log F(\lambda)$, this becomes $\left( \frac{G(\lambda)}{\lambda} \right)' \le \frac{v}{4}$. This can be integrated: $G(\lambda) \le \lambda \mathbf{E}Z + \lambda^2 v/4$, so $F(\lambda) \le e^{\lambda \mathbf{E}Z + \lambda^2 v/4}$. This implies $\mathbf{P}\{Z > \mathbf{E}Z + t\} \le e^{-t^2/v}$. Stronger than the bounded differences inequality!

56 gaussian log-sobolev inequality Let $X = (X_1, \dots, X_n)$ be a vector of i.i.d. standard normal random variables. If $f : \mathbb{R}^n \to \mathbb{R}$ and $Z = f(X)$, $\mathrm{Ent}(Z^2) \le 2\mathbf{E}\left[ \|\nabla f(X)\|^2 \right]$ (Gross, 1975).

57 gaussian log-sobolev inequality Let $X = (X_1, \dots, X_n)$ be a vector of i.i.d. standard normal random variables. If $f : \mathbb{R}^n \to \mathbb{R}$ and $Z = f(X)$, $\mathrm{Ent}(Z^2) \le 2\mathbf{E}\left[ \|\nabla f(X)\|^2 \right]$ (Gross, 1975). Proof sketch: By the subadditivity of entropy, it suffices to prove it for $n = 1$. Approximate $Z = f(X)$ by $f\left( \frac{1}{\sqrt{m}} \sum_{i=1}^m \varepsilon_i \right)$ where the $\varepsilon_i$ are i.i.d. Rademacher random variables. Use the log-Sobolev inequality of the hypercube and the central limit theorem.

58 gaussian concentration inequality Herbst's argument may now be repeated: Suppose $f$ is Lipschitz: for all $x, y \in \mathbb{R}^n$, $|f(x) - f(y)| \le L \|x - y\|$. Then, for all $t > 0$, $\mathbf{P}\{f(X) - \mathbf{E}f(X) \ge t\} \le e^{-t^2/(2L^2)}$ (Tsirelson, Ibragimov, and Sudakov, 1976).
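
Illustration (a sketch, not from the slides; numpy assumed, with $f(x) = \|x\|$, which is $1$-Lipschitz, and arbitrary $n$, $t$): the simulated upper tail of $f(X) - \mathbf{E}f(X)$ is compared with $e^{-t^2/2}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, t = 50, 100_000, 1.5

# f(x) = ||x|| is 1-Lipschitz; X is standard normal in R^n
Z = np.linalg.norm(rng.standard_normal((reps, n)), axis=1)
empirical = np.mean(Z - Z.mean() >= t)    # E f(X) replaced by the empirical mean
bound = np.exp(-t ** 2 / 2)
print(f"empirical tail: {empirical:.5f}   Gaussian concentration bound: {bound:.4f}")
```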

59 beyond bernoulli and gaussian: the entropy method For general distributions, logarithmic Sobolev inequalities are not available. Solution: modified logarithmic Sobolev inequalities. Suppose $X_1, \dots, X_n$ are independent. Let $Z = f(X_1, \dots, X_n)$ and $Z_i = f_i(X^{(i)}) = f_i(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$. Let $\phi(x) = e^x - x - 1$. Then for all $\lambda \in \mathbb{R}$, $\lambda\mathbf{E}\left[Z e^{\lambda Z}\right] - \mathbf{E}\left[e^{\lambda Z}\right] \log \mathbf{E}\left[e^{\lambda Z}\right] \le \sum_{i=1}^n \mathbf{E}\left[ e^{\lambda Z} \phi\left( -\lambda (Z - Z_i) \right) \right]$. Michel Ledoux

60 the entropy method Define $Z_i = \inf_{x_i'} f(X_1, \dots, x_i', \dots, X_n)$ and suppose $\sum_{i=1}^n (Z - Z_i)^2 \le v$. Then for all $t > 0$, $\mathbf{P}\{Z - \mathbf{E}Z > t\} \le e^{-t^2/(2v)}$.

61 the entropy method Define $Z_i = \inf_{x_i'} f(X_1, \dots, x_i', \dots, X_n)$ and suppose $\sum_{i=1}^n (Z - Z_i)^2 \le v$. Then for all $t > 0$, $\mathbf{P}\{Z - \mathbf{E}Z > t\} \le e^{-t^2/(2v)}$. This implies the bounded differences inequality and much more.

62 example: the largest eigenvalue of a symmetric matrix Let $A = (X_{i,j})_{n \times n}$ be symmetric, the $X_{i,j}$ independent ($i \le j$) with $|X_{i,j}| \le 1$. Let $Z = \lambda_1 = \sup_{u : \|u\| = 1} u^T A u$, and suppose $v$ is such that $Z = v^T A v$. $A_{i,j}'$ is obtained by replacing $X_{i,j}$ by $X_{i,j}'$. Then $(Z - Z_{i,j}')_+ \le \left( v^T A v - v^T A_{i,j}' v \right) \mathbf{1}_{Z > Z_{i,j}'} = \left( v^T (A - A_{i,j}') v \right) \mathbf{1}_{Z > Z_{i,j}'} \le 2 \left| v_i v_j (X_{i,j} - X_{i,j}') \right| \le 4 |v_i v_j|$. Therefore, $\sum_{1 \le i \le j \le n} (Z - Z_{i,j}')_+^2 \le 16 \sum_{1 \le i \le j \le n} v_i^2 v_j^2 \le 16 \left( \sum_{i=1}^n v_i^2 \right)^2 = 16$.
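
Combined with Efron-Stein, the computation above gives $\mathrm{Var}(\lambda_1) \le 16$ whatever the dimension. A simulation sketch (not from the slides; numpy assumed, with i.i.d. $\pm 1$ entries chosen for illustration) shows the actual fluctuations are far smaller.

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 100, 2_000

lam1 = np.empty(reps)
for r in range(reps):
    # symmetric matrix with independent +/-1 entries on and above the diagonal
    U = np.triu(rng.choice([-1.0, 1.0], size=(n, n)))
    A = U + np.triu(U, 1).T
    lam1[r] = np.linalg.eigvalsh(A)[-1]   # largest eigenvalue (eigvalsh sorts ascending)

print(f"E lambda_1 ~ {lam1.mean():.2f}   Var(lambda_1) ~ {lam1.var():.3f}   (Efron-Stein bound: 16)")
```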

63 self-bounding functions Suppose $Z$ satisfies $0 \le Z - Z_i \le 1$ and $\sum_{i=1}^n (Z - Z_i) \le Z$. Recall that $\mathrm{Var}(Z) \le \mathbf{E}Z$. We have much more: $\mathbf{P}\{Z > \mathbf{E}Z + t\} \le e^{-t^2/(2\mathbf{E}Z + 2t/3)}$ and $\mathbf{P}\{Z < \mathbf{E}Z - t\} \le e^{-t^2/(2\mathbf{E}Z)}$.

64 exponential efron-stein inequality Define $V_+ = \sum_{i=1}^n \mathbf{E}'\left[ (Z - Z_i')_+^2 \right]$ and $V_- = \sum_{i=1}^n \mathbf{E}'\left[ (Z - Z_i')_-^2 \right]$, where $\mathbf{E}'$ denotes expectation with respect to the $X_i'$ only. By Efron-Stein, $\mathrm{Var}(Z) \le \mathbf{E}V_+$ and $\mathrm{Var}(Z) \le \mathbf{E}V_-$.

65 exponential efron-stein inequality Define $V_+ = \sum_{i=1}^n \mathbf{E}'\left[ (Z - Z_i')_+^2 \right]$ and $V_- = \sum_{i=1}^n \mathbf{E}'\left[ (Z - Z_i')_-^2 \right]$, where $\mathbf{E}'$ denotes expectation with respect to the $X_i'$ only. By Efron-Stein, $\mathrm{Var}(Z) \le \mathbf{E}V_+$ and $\mathrm{Var}(Z) \le \mathbf{E}V_-$. The following exponential versions hold for all $\lambda, \theta > 0$ with $\lambda\theta < 1$: $\log \mathbf{E} e^{\lambda(Z - \mathbf{E}Z)} \le \frac{\lambda\theta}{1 - \lambda\theta} \log \mathbf{E} e^{\lambda V_+/\theta}$. If also $Z_i' - Z \le 1$ for every $i$, then for all $\lambda \in (0, 1/2)$, $\log \mathbf{E} e^{\lambda(Z - \mathbf{E}Z)} \le \frac{2\lambda}{1 - 2\lambda} \log \mathbf{E} e^{\lambda V_-}$.

66 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$.

67 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$. Then $\mathbf{P}\{Z \ge \mathbf{E}Z + t\} \le \exp\left( -\frac{t^2}{2\left(a \mathbf{E}Z + b + at/2\right)} \right)$.

68 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$. Then $\mathbf{P}\{Z \ge \mathbf{E}Z + t\} \le \exp\left( -\frac{t^2}{2\left(a \mathbf{E}Z + b + at/2\right)} \right)$. If, in addition, $f(x) - f_i(x^{(i)}) \le 1$, then for $0 < t \le \mathbf{E}Z$, $\mathbf{P}\{Z \le \mathbf{E}Z - t\} \le \exp\left( -\frac{t^2}{2\left(a \mathbf{E}Z + b + c\, t\right)} \right)$, where $c = (3a - 1)/6$.

69 the isoperimetric view Let $X = (X_1, \dots, X_n)$ have independent components, taking values in $\mathcal{X}^n$. Let $A \subset \mathcal{X}^n$. The Hamming distance of $X$ to $A$ is $d(X, A) = \min_{y \in A} d(X, y) = \min_{y \in A} \sum_{i=1}^n \mathbf{1}_{X_i \ne y_i}$. Michel Talagrand

70 the isoperimetric view Let $X = (X_1, \dots, X_n)$ have independent components, taking values in $\mathcal{X}^n$. Let $A \subset \mathcal{X}^n$. The Hamming distance of $X$ to $A$ is $d(X, A) = \min_{y \in A} d(X, y) = \min_{y \in A} \sum_{i=1}^n \mathbf{1}_{X_i \ne y_i}$. Michel Talagrand $\mathbf{P}\left\{ d(X, A) \ge t + \sqrt{\frac{n}{2} \log \frac{1}{\mathbf{P}[A]}} \right\} \le e^{-2t^2/n}$.

71 the isoperimetric view Proof: By the bounded differences inequality, $\mathbf{P}\{\mathbf{E} d(X, A) - d(X, A) \ge t\} \le e^{-2t^2/n}$. Taking $t = \mathbf{E} d(X, A)$, we get $\mathbf{E} d(X, A) \le \sqrt{\frac{n}{2} \log \frac{1}{\mathbf{P}\{A\}}}$. By the bounded differences inequality again, $\mathbf{P}\left\{ d(X, A) \ge t + \sqrt{\frac{n}{2} \log \frac{1}{\mathbf{P}\{A\}}} \right\} \le e^{-2t^2/n}$.
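
A concrete instance (a sketch, not from the slides; numpy assumed, with Bernoulli(1/2) coordinates and the half-space $A = \{x : \sum_i x_i \ge n/2\}$ chosen for illustration): here $d(X, A)$ is simply the number of coordinates that must be flipped to reach $A$.

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps, t = 100, 200_000, 8

X = rng.integers(0, 2, size=(reps, n))
S = X.sum(axis=1)
thresh = int(np.ceil(n / 2))
d = np.maximum(0, thresh - S)          # Hamming distance to A = {x : sum_i x_i >= n/2}
pA = np.mean(S >= thresh)              # empirical P{A}

rhs = t + np.sqrt(n / 2 * np.log(1 / pA))
empirical = np.mean(d >= rhs)
bound = np.exp(-2 * t ** 2 / n)
print(f"empirical P{{d(X,A) >= t + sqrt((n/2) log(1/P(A)))}}: {empirical:.5f}   bound: {bound:.4f}")
```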

72 talagrand's convex distance The weighted Hamming distance is $d_\alpha(x, A) = \inf_{y \in A} d_\alpha(x, y) = \inf_{y \in A} \sum_{i : x_i \ne y_i} \alpha_i$, where $\alpha = (\alpha_1, \dots, \alpha_n)$. The same argument as before gives $\mathbf{P}\left\{ d_\alpha(X, A) \ge t + \sqrt{\frac{\|\alpha\|^2}{2} \log \frac{1}{\mathbf{P}\{A\}}} \right\} \le e^{-2t^2/\|\alpha\|^2}$. This implies $\sup_{\alpha : \|\alpha\| = 1} \min\left( \mathbf{P}\{A\}, \mathbf{P}\{d_\alpha(X, A) \ge t\} \right) \le e^{-t^2/2}$.

73 convex distance inequality convex distance: $d_T(x, A) = \sup_{\alpha \in [0, \infty)^n : \|\alpha\| = 1} d_\alpha(x, A)$.

74 convex distance inequality convex distance: $d_T(x, A) = \sup_{\alpha \in [0, \infty)^n : \|\alpha\| = 1} d_\alpha(x, A)$. Talagrand's convex distance inequality: $\mathbf{P}\{A\}\, \mathbf{P}\{d_T(X, A) \ge t\} \le e^{-t^2/4}$.

75 convex distance inequality convex distance: $d_T(x, A) = \sup_{\alpha \in [0, \infty)^n : \|\alpha\| = 1} d_\alpha(x, A)$. Talagrand's convex distance inequality: $\mathbf{P}\{A\}\, \mathbf{P}\{d_T(X, A) \ge t\} \le e^{-t^2/4}$. Follows from the fact that $d_T(X, A)^2$ is $(4, 0)$ weakly self-bounding (by a saddle point representation of $d_T$). Talagrand's original proof was different. It can also be recovered from Marton's transportation inequality.

76 convex lipschitz functions For $A \subset [0, 1]^n$ and $x \in [0, 1]^n$, define $D(x, A) = \inf_{y \in A} \|x - y\|$. If $A$ is convex, then $D(x, A) \le d_T(x, A)$.

77 convex lipschitz functions For $A \subset [0, 1]^n$ and $x \in [0, 1]^n$, define $D(x, A) = \inf_{y \in A} \|x - y\|$. If $A$ is convex, then $D(x, A) \le d_T(x, A)$. Proof (here $\mathcal{M}(A)$ denotes the set of probability measures on $A$ and $Y \sim \nu$): $D(x, A) = \inf_{\nu \in \mathcal{M}(A)} \|x - \mathbf{E}_\nu Y\|$ (since $A$ is convex) $\le \inf_{\nu \in \mathcal{M}(A)} \sqrt{\sum_{j=1}^n \left( \mathbf{E}_\nu \mathbf{1}_{x_j \ne Y_j} \right)^2}$ (since $x_j, Y_j \in [0, 1]$) $= \inf_{\nu \in \mathcal{M}(A)} \sup_{\alpha : \|\alpha\| \le 1} \sum_{j=1}^n \alpha_j \mathbf{E}_\nu \mathbf{1}_{x_j \ne Y_j}$ (by Cauchy-Schwarz) $= \sup_{\alpha : \|\alpha\| \le 1} \inf_{\nu \in \mathcal{M}(A)} \sum_{j=1}^n \alpha_j \mathbf{E}_\nu \mathbf{1}_{x_j \ne Y_j} = d_T(x, A)$ (by minimax theorem).

78 convex lipschitz functions Let $X = (X_1, \dots, X_n)$ have independent components taking values in $[0, 1]$. Let $f : [0, 1]^n \to \mathbb{R}$ be quasi-convex such that $|f(x) - f(y)| \le \|x - y\|$. Then $\mathbf{P}\{f(X) > \mathbf{M}f(X) + t\} \le 2 e^{-t^2/4}$ and $\mathbf{P}\{f(X) < \mathbf{M}f(X) - t\} \le 2 e^{-t^2/4}$, where $\mathbf{M}f(X)$ denotes a median of $f(X)$.

79 convex lipschitz functions Let $X = (X_1, \dots, X_n)$ have independent components taking values in $[0, 1]$. Let $f : [0, 1]^n \to \mathbb{R}$ be quasi-convex such that $|f(x) - f(y)| \le \|x - y\|$. Then $\mathbf{P}\{f(X) > \mathbf{M}f(X) + t\} \le 2 e^{-t^2/4}$ and $\mathbf{P}\{f(X) < \mathbf{M}f(X) - t\} \le 2 e^{-t^2/4}$. Proof: Let $A_s = \{x : f(x) \le s\} \subset [0, 1]^n$. $A_s$ is convex. Since $f$ is Lipschitz, $f(x) \le s + D(x, A_s) \le s + d_T(x, A_s)$. By the convex distance inequality, $\mathbf{P}\{f(X) \ge s + t\}\, \mathbf{P}\{f(X) \le s\} \le e^{-t^2/4}$. Take $s = \mathbf{M}f(X)$ for the upper tail and $s = \mathbf{M}f(X) - t$ for the lower tail.
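
Illustration (a sketch, not from the slides; numpy assumed, with $f(x) = \|x\|$, which is convex and $1$-Lipschitz on $[0,1]^n$, and arbitrary parameters): the tails around the median obey the $2e^{-t^2/4}$ bound; since the bound is dimension-free it is quite conservative for this particular $f$.

```python
import numpy as np

rng = np.random.default_rng(10)
n, reps, t = 100, 100_000, 2.0

# f(x) = ||x|| is convex (hence quasi-convex) and 1-Lipschitz on [0,1]^n
Z = np.linalg.norm(rng.random((reps, n)), axis=1)
med = np.median(Z)
upper = np.mean(Z > med + t)
lower = np.mean(Z < med - t)
bound = 2 * np.exp(-t ** 2 / 4)
print(f"upper tail: {upper:.6f}   lower tail: {lower:.6f}   bound: {bound:.4f}")
```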

80 Φ entropies For a convex function $\Phi$ on $[0, \infty)$, the $\Phi$-entropy of $Z \ge 0$ is $H_\Phi(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$.

81 Φ entropies For a convex function $\Phi$ on $[0, \infty)$, the $\Phi$-entropy of $Z \ge 0$ is $H_\Phi(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$. $H_\Phi$ is subadditive: $H_\Phi(Z) \le \sum_{i=1}^n \mathbf{E} H_\Phi^{(i)}(Z)$ if (and only if) $\Phi$ is twice differentiable on $(0, \infty)$, and either $\Phi$ is affine or $\Phi''$ is strictly positive and $1/\Phi''$ is concave.

82 Φ entropies For a convex function $\Phi$ on $[0, \infty)$, the $\Phi$-entropy of $Z \ge 0$ is $H_\Phi(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$. $H_\Phi$ is subadditive: $H_\Phi(Z) \le \sum_{i=1}^n \mathbf{E} H_\Phi^{(i)}(Z)$ if (and only if) $\Phi$ is twice differentiable on $(0, \infty)$, and either $\Phi$ is affine or $\Phi''$ is strictly positive and $1/\Phi''$ is concave. $\Phi(x) = x^2$ corresponds to Efron-Stein; $x \log x$ is subadditivity of entropy. We may consider $\Phi(x) = x^p$ for $p \in (1, 2]$.

83 generalized efron-stein Define $Z_i' = f(X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$ and $V_+ = \sum_{i=1}^n (Z - Z_i')_+^2$.

84 generalized efron-stein Define $Z_i' = f(X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$ and $V_+ = \sum_{i=1}^n (Z - Z_i')_+^2$. For $q \ge 2$ and $q/2 \le \alpha \le q - 1$, $\mathbf{E}\left[ (Z - \mathbf{E}Z)_+^q \right] \le \mathbf{E}\left[ (Z - \mathbf{E}Z)_+^\alpha \right]^{q/\alpha} + \alpha (q - \alpha)\, \mathbf{E}\left[ V_+ (Z - \mathbf{E}Z)_+^{q-2} \right]$, and similarly for $\mathbf{E}\left[ (Z - \mathbf{E}Z)_-^q \right]$.

85 moment inequalities We may solve the recursions for $q \ge 2$.

86 moment inequalities We may solve the recursions for $q \ge 2$. If $V_+ \le c$ for some constant $c \ge 0$, then for all integers $q \ge 2$, $\left( \mathbf{E}\left[ (Z - \mathbf{E}Z)_+^q \right] \right)^{1/q} \le \sqrt{Kqc}$, where $K = 1/(e - \sqrt{e}) < 0.935$.

87 moment inequalities We may solve the recursions for $q \ge 2$. If $V_+ \le c$ for some constant $c \ge 0$, then for all integers $q \ge 2$, $\left( \mathbf{E}\left[ (Z - \mathbf{E}Z)_+^q \right] \right)^{1/q} \le \sqrt{Kqc}$, where $K = 1/(e - \sqrt{e}) < 0.935$. More generally, $\left( \mathbf{E}\left[ (Z - \mathbf{E}Z)_+^q \right] \right)^{1/q} \le \sqrt{Kq}\, \left( \mathbf{E}\left[ V_+^{q/2} \right] \right)^{1/q}$.

88
