Concentration inequalities and the entropy method


1 Concentration inequalities and the entropy method Gábor Lugosi ICREA and Pompeu Fabra University Barcelona

2 what is concentration? We are interested in bounding random fluctuations of functions of many independent random variables.

3 what is concentration? We are interested in bounding random fluctuations of functions of many independent random variables. $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. How large are typical deviations of $Z$ from $\mathbf{E}Z$? We seek upper bounds for $\mathbf{P}\{Z > \mathbf{E}Z + t\}$ and $\mathbf{P}\{Z < \mathbf{E}Z - t\}$ for $t > 0$.

4 various approaches - martingales (Yurinskii, 1974; Milman and Schechtman, 1986; Shamir and Spencer, 1987; McDiarmid, 1989, 1998); - information theoretic and transportation methods (Ahlswede, Gács, and Körner, 1976; Marton 1986, 1996, 1997; Dembo 1997); - Talagrand's induction method, 1996; - logarithmic Sobolev inequalities (Ledoux 1996, Massart 1998, Boucheron, Lugosi, Massart 1999, 2001).

5 chernoff bounds By Markov's inequality, if $\lambda > 0$, $\mathbf{P}\{Z - \mathbf{E}Z > t\} = \mathbf{P}\left\{e^{\lambda(Z - \mathbf{E}Z)} > e^{\lambda t}\right\} \le \frac{\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}}{e^{\lambda t}}$. Next derive bounds for the moment generating function $\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}$ and optimize $\lambda$.

6 chernoff bounds By Markov's inequality, if $\lambda > 0$, $\mathbf{P}\{Z - \mathbf{E}Z > t\} = \mathbf{P}\left\{e^{\lambda(Z - \mathbf{E}Z)} > e^{\lambda t}\right\} \le \frac{\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}}{e^{\lambda t}}$. Next derive bounds for the moment generating function $\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}$ and optimize $\lambda$. If $Z = \sum_{i=1}^n X_i$ is a sum of independent random variables, $\mathbf{E} e^{\lambda Z} = \mathbf{E} \prod_{i=1}^n e^{\lambda X_i} = \prod_{i=1}^n \mathbf{E} e^{\lambda X_i}$ by independence. It suffices to find bounds for $\mathbf{E} e^{\lambda X_i}$.

7 chernoff bounds By Markov's inequality, if $\lambda > 0$, $\mathbf{P}\{Z - \mathbf{E}Z > t\} = \mathbf{P}\left\{e^{\lambda(Z - \mathbf{E}Z)} > e^{\lambda t}\right\} \le \frac{\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}}{e^{\lambda t}}$. Next derive bounds for the moment generating function $\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}$ and optimize $\lambda$. If $Z = \sum_{i=1}^n X_i$ is a sum of independent random variables, $\mathbf{E} e^{\lambda Z} = \mathbf{E} \prod_{i=1}^n e^{\lambda X_i} = \prod_{i=1}^n \mathbf{E} e^{\lambda X_i}$ by independence. It suffices to find bounds for $\mathbf{E} e^{\lambda X_i}$. Serguei Bernstein (1880-1968) Herman Chernoff (1923- )
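
A minimal numerical sketch of the Chernoff method (added for illustration, not part of the slides; Python with numpy and scipy is assumed, and the Bernoulli example and all parameters are choices of this note): minimize $e^{-\lambda t}\prod_{i=1}^n \mathbf{E}e^{\lambda(X_i-\mathbf{E}X_i)}$ over a grid of $\lambda$ values and compare with the exact Binomial tail.

```python
import numpy as np
from scipy.stats import binom

n, p, t = 100, 0.5, 10.0   # X_i ~ Bernoulli(p); t is the deviation of the sum above its mean

def mgf_centered(lam):
    # E exp(lambda (X_i - p)) for a Bernoulli(p) variable (closed form)
    return np.exp(-lam * p) * (1 - p + p * np.exp(lam))

# Chernoff bound: minimize exp(-lambda t) * prod_i E exp(lambda (X_i - p)) over lambda > 0
lambdas = np.linspace(1e-4, 5.0, 2000)
chernoff = np.min(np.exp(-lambdas * t) * mgf_centered(lambdas) ** n)

exact = binom.sf(n * p + t, n, p)   # exact P{ sum_i X_i - n p > t }
print(f"Chernoff bound: {chernoff:.3e}   exact tail: {exact:.3e}")
```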

8 hoeffding's inequality If $X_1, \dots, X_n \in [0, 1]$, then $\mathbf{E} e^{\lambda(X_i - \mathbf{E}X_i)} \le e^{\lambda^2/8}$.

9 hoeffding's inequality If $X_1, \dots, X_n \in [0, 1]$, then $\mathbf{E} e^{\lambda(X_i - \mathbf{E}X_i)} \le e^{\lambda^2/8}$. We obtain $\mathbf{P}\left\{ \left| \frac{1}{n}\sum_{i=1}^n X_i - \mathbf{E}\left[\frac{1}{n}\sum_{i=1}^n X_i\right] \right| > t \right\} \le 2 e^{-2nt^2}$. Wassily Hoeffding (1914-1991)
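
A quick Monte Carlo sanity check (illustration only, not from the slides; numpy assumed, and the choice of Uniform[0,1] variables, $n$, and $t$ is arbitrary): the empirical two-sided tail of the sample mean is compared with the bound $2e^{-2nt^2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, t = 50, 100_000, 0.1

# means of n i.i.d. Uniform[0,1] variables; each mean has expectation 1/2
means = rng.random((reps, n)).mean(axis=1)
empirical = np.mean(np.abs(means - 0.5) > t)
bound = 2 * np.exp(-2 * n * t ** 2)
print(f"empirical tail: {empirical:.4f}   Hoeffding bound: {bound:.4f}")
```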

10 martingale representation $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Denote $\mathbf{E}_i[\cdot] = \mathbf{E}[\cdot \mid X_1, \dots, X_i]$. Thus, $\mathbf{E}_0 Z = \mathbf{E}Z$ and $\mathbf{E}_n Z = Z$.

11 martingale representation $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Denote $\mathbf{E}_i[\cdot] = \mathbf{E}[\cdot \mid X_1, \dots, X_i]$. Thus, $\mathbf{E}_0 Z = \mathbf{E}Z$ and $\mathbf{E}_n Z = Z$. Writing $\Delta_i = \mathbf{E}_i Z - \mathbf{E}_{i-1} Z$, we have $Z - \mathbf{E}Z = \sum_{i=1}^n \Delta_i$. This is the Doob martingale representation of $Z$.

12 martingale representation $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Denote $\mathbf{E}_i[\cdot] = \mathbf{E}[\cdot \mid X_1, \dots, X_i]$. Thus, $\mathbf{E}_0 Z = \mathbf{E}Z$ and $\mathbf{E}_n Z = Z$. Writing $\Delta_i = \mathbf{E}_i Z - \mathbf{E}_{i-1} Z$, we have $Z - \mathbf{E}Z = \sum_{i=1}^n \Delta_i$. This is the Doob martingale representation of $Z$. Joseph Leo Doob (1910-2004)

13 martingale representation: the variance $\mathrm{Var}(Z) = \mathbf{E}\left(\sum_{i=1}^n \Delta_i\right)^2 = \sum_{i=1}^n \mathbf{E}\left[\Delta_i^2\right] + 2\sum_{j > i} \mathbf{E}\left[\Delta_i \Delta_j\right]$. Now if $j > i$, $\mathbf{E}_i \Delta_j = 0$, so $\mathbf{E}\left[\Delta_i \Delta_j\right] = \mathbf{E}\left[\Delta_i\, \mathbf{E}_i \Delta_j\right] = 0$. We obtain $\mathrm{Var}(Z) = \sum_{i=1}^n \mathbf{E}\left[\Delta_i^2\right]$.

14 martingale representation: the variance $\mathrm{Var}(Z) = \mathbf{E}\left(\sum_{i=1}^n \Delta_i\right)^2 = \sum_{i=1}^n \mathbf{E}\left[\Delta_i^2\right] + 2\sum_{j > i} \mathbf{E}\left[\Delta_i \Delta_j\right]$. Now if $j > i$, $\mathbf{E}_i \Delta_j = 0$, so $\mathbf{E}\left[\Delta_i \Delta_j\right] = \mathbf{E}\left[\Delta_i\, \mathbf{E}_i \Delta_j\right] = 0$. We obtain $\mathrm{Var}(Z) = \sum_{i=1}^n \mathbf{E}\left[\Delta_i^2\right]$. From this, using independence, it is easy to derive the Efron-Stein inequality.

15 efron-stein inequality (1981) Let $X_1, \dots, X_n$ be independent random variables taking values in $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Then $\mathrm{Var}(Z) \le \mathbf{E}\sum_{i=1}^n \left(Z - \mathbf{E}^{(i)} Z\right)^2 = \mathbf{E}\sum_{i=1}^n \mathrm{Var}^{(i)}(Z)$, where $\mathbf{E}^{(i)} Z$ denotes expectation with respect to the $i$-th variable $X_i$ only.

16 efron-stein inequality (1981) Let $X_1, \dots, X_n$ be independent random variables taking values in $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Then $\mathrm{Var}(Z) \le \mathbf{E}\sum_{i=1}^n \left(Z - \mathbf{E}^{(i)} Z\right)^2 = \mathbf{E}\sum_{i=1}^n \mathrm{Var}^{(i)}(Z)$, where $\mathbf{E}^{(i)} Z$ denotes expectation with respect to the $i$-th variable $X_i$ only. We obtain more useful forms by using that $\mathrm{Var}(X) = \frac{1}{2}\mathbf{E}(X - X')^2$ and $\mathrm{Var}(X) \le \mathbf{E}(X - a)^2$ for any constant $a$.

17 efron-stein inequality (1981) If $X_1', \dots, X_n'$ are independent copies of $X_1, \dots, X_n$, and $Z_i' = f(X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$, then $\mathrm{Var}(Z) \le \frac{1}{2}\mathbf{E}\left[\sum_{i=1}^n (Z - Z_i')^2\right]$. $Z$ is concentrated if it doesn't depend too much on any of its variables.

18 efron-stein inequality (1981) If $X_1', \dots, X_n'$ are independent copies of $X_1, \dots, X_n$, and $Z_i' = f(X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$, then $\mathrm{Var}(Z) \le \frac{1}{2}\mathbf{E}\left[\sum_{i=1}^n (Z - Z_i')^2\right]$. $Z$ is concentrated if it doesn't depend too much on any of its variables. If $Z = \sum_{i=1}^n X_i$ then we have an equality. Sums are the least concentrated of all functions!
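
The Efron-Stein bound is easy to estimate by Monte Carlo. The sketch below (illustrative, not from the slides; numpy assumed, and the choice $Z = \max_i X_i$ with uniform coordinates is arbitrary) compares the estimated bound $\frac{1}{2}\mathbf{E}\sum_i (Z - Z_i')^2$ with the empirical variance of $Z$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 100_000

X = rng.random((reps, n))    # X_1, ..., X_n
Xp = rng.random((reps, n))   # independent copies X'_1, ..., X'_n
Z = X.max(axis=1)

# accumulate sum_i (Z - Z'_i)^2 where Z'_i replaces the i-th coordinate by its copy
es = np.zeros(reps)
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = Xp[:, i]
    es += (Z - Xi.max(axis=1)) ** 2

print(f"Var(Z) ~ {Z.var():.5f}   Efron-Stein bound ~ {0.5 * es.mean():.5f}")
```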

19 efron-stein inequality (1981) If $Z_i = f_i(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$ for some arbitrary functions $f_i$, then $\mathrm{Var}(Z) \le \mathbf{E}\left[\sum_{i=1}^n (Z - Z_i)^2\right]$.

20 efron, stein, and steele Bradley Efron Charles Stein Mike Steele

21 example: kernel density estimation Let $X_1, \dots, X_n$ be i.i.d. real samples drawn according to some density $\phi$. The kernel density estimate is $\phi_n(x) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x - X_i}{h}\right)$, where $h > 0$ and $K$ is a nonnegative kernel with $\int K = 1$. The $L_1$ error is $Z = f(X_1, \dots, X_n) = \int \left|\phi(x) - \phi_n(x)\right| dx$.

22 example: kernel density estimation Let $X_1, \dots, X_n$ be i.i.d. real samples drawn according to some density $\phi$. The kernel density estimate is $\phi_n(x) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x - X_i}{h}\right)$, where $h > 0$ and $K$ is a nonnegative kernel with $\int K = 1$. The $L_1$ error is $Z = f(X_1, \dots, X_n) = \int \left|\phi(x) - \phi_n(x)\right| dx$. It is easy to see that $\left|f(x_1, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n)\right| \le \frac{1}{nh}\int \left| K\left(\frac{x - x_i}{h}\right) - K\left(\frac{x - x_i'}{h}\right) \right| dx \le \frac{2}{n}$ (Devroye, 1991), so we get $\mathrm{Var}(Z) \le \frac{2}{n}$.
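
The bound $\mathrm{Var}(Z) \le 2/n$ can be checked by simulation (a rough sketch, not from the slides; numpy assumed, with a Gaussian kernel, a standard normal $\phi$, and the $L_1$ integral approximated on a grid, all choices of this note).

```python
import numpy as np

rng = np.random.default_rng(2)
n, h, reps = 200, 0.3, 500
grid = np.linspace(-5, 5, 1001)
dx = grid[1] - grid[0]
true_density = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)   # phi = N(0,1)

def l1_error(sample):
    # kernel density estimate with Gaussian kernel and bandwidth h, evaluated on the grid
    diffs = (grid[None, :] - sample[:, None]) / h
    kde = np.exp(-diffs ** 2 / 2).sum(axis=0) / (n * h * np.sqrt(2 * np.pi))
    return np.sum(np.abs(true_density - kde)) * dx            # Riemann sum for the L1 error

Z = np.array([l1_error(rng.standard_normal(n)) for _ in range(reps)])
print(f"empirical Var(Z): {Z.var():.2e}   bound 2/n: {2 / n:.2e}")
```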

23 Luc Devroye on Taksim square (2013).

24 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$.

25 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$. Then $\mathrm{Var}(f(X)) \le a\,\mathbf{E}f(X) + b$.

26 self-bounding functions If $0 \le f(x) - f_i(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right) \le f(x)$, then $f$ is self-bounding and $\mathrm{Var}(f(X)) \le \mathbf{E}f(X)$.

27 self-bounding functions If $0 \le f(x) - f_i(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right) \le f(x)$, then $f$ is self-bounding and $\mathrm{Var}(f(X)) \le \mathbf{E}f(X)$. Rademacher averages, random VC dimension, random VC entropy, and the longest increasing subsequence in a random permutation are all examples of self-bounding functions.

28 self-bounding functions If $0 \le f(x) - f_i(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right) \le f(x)$, then $f$ is self-bounding and $\mathrm{Var}(f(X)) \le \mathbf{E}f(X)$. Rademacher averages, random VC dimension, random VC entropy, and the longest increasing subsequence in a random permutation are all examples of self-bounding functions. Configuration functions.
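
For example, the length of the longest increasing subsequence of an i.i.d. uniform sample (whose relative order is a uniform random permutation) is self-bounding: removing a point decreases the length by at most 1, and only points lying on one fixed maximum increasing subsequence contribute, so $\mathrm{Var}(Z) \le \mathbf{E}Z$. A simulation sketch (not from the slides; numpy assumed, parameters arbitrary):

```python
import bisect
import numpy as np

def lis_length(x):
    # patience sorting: length of the longest increasing subsequence of x
    tails = []
    for v in x:
        j = bisect.bisect_left(tails, v)
        if j == len(tails):
            tails.append(v)
        else:
            tails[j] = v
    return len(tails)

rng = np.random.default_rng(3)
n, reps = 200, 20_000
Z = np.array([lis_length(rng.random(n)) for _ in range(reps)])
print(f"E Z ~ {Z.mean():.2f}   Var(Z) ~ {Z.var():.2f}   (self-bounding: Var(Z) <= E Z)")
```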

29 beyond the variance $X_1, \dots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \dots, X_n)$. Recall the Doob martingale representation: $Z - \mathbf{E}Z = \sum_{i=1}^n \Delta_i$ where $\Delta_i = \mathbf{E}_i Z - \mathbf{E}_{i-1} Z$, with $\mathbf{E}_i[\cdot] = \mathbf{E}[\cdot \mid X_1, \dots, X_i]$. To get exponential inequalities, we bound the moment generating function $\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)}$.

30 azuma's inequality Suppose that the martingale differences are bounded: $|\Delta_i| \le c_i$. Then $\mathbf{E} e^{\lambda(Z - \mathbf{E}Z)} = \mathbf{E} e^{\lambda \sum_{i=1}^n \Delta_i} = \mathbf{E}\left[ \mathbf{E}_{n-1} e^{\lambda \sum_{i=1}^{n-1} \Delta_i + \lambda \Delta_n} \right] = \mathbf{E}\left[ e^{\lambda \sum_{i=1}^{n-1} \Delta_i}\, \mathbf{E}_{n-1} e^{\lambda \Delta_n} \right] \le \mathbf{E}\left[ e^{\lambda \sum_{i=1}^{n-1} \Delta_i} \right] e^{\lambda^2 c_n^2/2}$ (by Hoeffding) $\le \dots \le e^{\lambda^2 \left(\sum_{i=1}^n c_i^2\right)/2}$. This is the Azuma-Hoeffding inequality for sums of bounded martingale differences.

31 bounded differences inequality If $Z = f(X_1, \dots, X_n)$ and $f$ is such that $\left| f(x_1, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n) \right| \le c_i$, then the martingale differences are bounded.

32 bounded differences inequality If $Z = f(X_1, \dots, X_n)$ and $f$ is such that $\left| f(x_1, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n) \right| \le c_i$, then the martingale differences are bounded. Bounded differences inequality: if $X_1, \dots, X_n$ are independent, then $\mathbf{P}\{|Z - \mathbf{E}Z| > t\} \le 2 e^{-2t^2/\sum_{i=1}^n c_i^2}$.

33 Colin McDiarmid bounded differences inequality If $Z = f(X_1, \dots, X_n)$ and $f$ is such that $\left| f(x_1, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n) \right| \le c_i$, then the martingale differences are bounded. Bounded differences inequality: if $X_1, \dots, X_n$ are independent, then $\mathbf{P}\{|Z - \mathbf{E}Z| > t\} \le 2 e^{-2t^2/\sum_{i=1}^n c_i^2}$. McDiarmid's inequality.
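
As a concrete illustration (a sketch not in the slides; numpy assumed, and the balls-in-bins example is a standard choice of this note, not taken from the deck): let $Z$ be the number of empty bins when $n$ balls are thrown independently into $n$ bins. Moving a single ball changes $Z$ by at most 1, so $c_i = 1$ and the inequality gives $\mathbf{P}\{|Z - \mathbf{E}Z| > t\} \le 2e^{-2t^2/n}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, t = 100, 100_000, 10

# Z = number of empty bins; changing one ball's bin changes Z by at most 1 (c_i = 1)
balls = rng.integers(0, n, size=(reps, n))
Z = np.array([n - np.unique(row).size for row in balls])

empirical = np.mean(np.abs(Z - Z.mean()) > t)   # EZ replaced by the empirical mean
bound = 2 * np.exp(-2 * t ** 2 / n)
print(f"empirical tail: {empirical:.4f}   bounded differences bound: {bound:.4f}")
```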

34 bounded differences inequality Easy to use. Distribution free. Often close to optimal. Does not exploit variance information. Often too rigid. Other methods are necessary.

35 shannon entropy If $X, Y$ are random variables taking values in a set of size $N$, $H(X) = -\sum_x p(x) \log p(x)$ and $H(X \mid Y) = H(X, Y) - H(Y) = -\sum_{x,y} p(x, y) \log p(x \mid y)$. Then $H(X) \le \log N$ and $H(X \mid Y) \le H(X)$. Claude Shannon (1916-2001)

36 han's inequality If $X = (X_1, \dots, X_n)$ and $X^{(i)} = (X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$, then $\sum_{i=1}^n \left( H(X) - H(X^{(i)}) \right) \le H(X)$. Te Sun Han

37 han's inequality If $X = (X_1, \dots, X_n)$ and $X^{(i)} = (X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$, then $\sum_{i=1}^n \left( H(X) - H(X^{(i)}) \right) \le H(X)$. Proof: $H(X) = H(X^{(i)}) + H(X_i \mid X^{(i)}) \le H(X^{(i)}) + H(X_i \mid X_1, \dots, X_{i-1})$. Since $\sum_{i=1}^n H(X_i \mid X_1, \dots, X_{i-1}) = H(X)$, summing the inequality over $i$, we get $(n-1) H(X) \le \sum_{i=1}^n H(X^{(i)})$. Te Sun Han
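
Han's inequality can be verified directly on a small joint distribution (an illustrative check, not from the slides; numpy assumed, with a random pmf on $\{0,1\}^3$ chosen arbitrarily).

```python
import numpy as np

rng = np.random.default_rng(5)
p = rng.random((2, 2, 2))
p /= p.sum()                  # a random joint pmf on {0,1}^3

def H(q):
    q = q.ravel()
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

n = p.ndim
# H(X^(i)) is the entropy of the marginal obtained by summing out coordinate i
lhs = sum(H(p) - H(p.sum(axis=i)) for i in range(n))
print(f"sum_i [H(X) - H(X^(i))] = {lhs:.4f}  <=  H(X) = {H(p):.4f}")
```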

38 combinatorial entropies: an example Let $X_1, \dots, X_n$ be independent points in the plane (of arbitrary distribution!). Let $N$ be the number of subsets of points that are in convex position. Then $\mathrm{Var}(\log_2 N) \le \mathbf{E} \log_2 N$.

39 proof By Efron-Stein, it suffices to prove that $f$ is self-bounding: $0 \le f_n(x) - f_{n-1}(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f_n(x) - f_{n-1}(x^{(i)}) \right) \le f_n(x)$.

40 proof By Efron-Stein, it suffices to prove that $f$ is self-bounding: $0 \le f_n(x) - f_{n-1}(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f_n(x) - f_{n-1}(x^{(i)}) \right) \le f_n(x)$. The first property is obvious; we only need to prove the second.

41 proof By Efron-Stein, it suffices to prove that $f$ is self-bounding: $0 \le f_n(x) - f_{n-1}(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f_n(x) - f_{n-1}(x^{(i)}) \right) \le f_n(x)$. The first property is obvious; we only need to prove the second. This is a deterministic property, so fix the points.

42 proof Among all sets in convex position, draw one uniformly at random. Define $Y_i$ as the indicator that $x_i$ is in the chosen set.

43 proof Among all sets in convex position, draw one uniformly at random. Define $Y_i$ as the indicator that $x_i$ is in the chosen set. $H(Y) = H(Y_1, \dots, Y_n) = \log_2 N = f_n(x)$. Also, $H(Y^{(i)}) \le f_{n-1}(x^{(i)})$.

44 proof Among all sets in convex position, draw one uniformly at random. Define $Y_i$ as the indicator that $x_i$ is in the chosen set. $H(Y) = H(Y_1, \dots, Y_n) = \log_2 N = f_n(x)$. Also, $H(Y^{(i)}) \le f_{n-1}(x^{(i)})$, so by Han's inequality, $\sum_{i=1}^n \left( f_n(x) - f_{n-1}(x^{(i)}) \right) \le \sum_{i=1}^n \left( H(Y) - H(Y^{(i)}) \right) \le H(Y) = f_n(x)$.

45 Han's inequality for relative entropies Let $P = P_1 \otimes \cdots \otimes P_n$ be a product distribution and $Q$ an arbitrary distribution on $\mathcal{X}^n$.

46 Han's inequality for relative entropies Let $P = P_1 \otimes \cdots \otimes P_n$ be a product distribution and $Q$ an arbitrary distribution on $\mathcal{X}^n$. By Han's inequality, $D(Q \| P) \ge \frac{1}{n-1} \sum_{i=1}^n D(Q^{(i)} \| P^{(i)})$.

47 subadditivity of entropy The entropy of a random variable $Z \ge 0$ is $\mathrm{Ent}(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$ where $\Phi(x) = x \log x$. By Jensen's inequality, $\mathrm{Ent}(Z) \ge 0$.

48 subadditivity of entropy The entropy of a random variable $Z \ge 0$ is $\mathrm{Ent}(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$ where $\Phi(x) = x \log x$. By Jensen's inequality, $\mathrm{Ent}(Z) \ge 0$. Let $X_1, \dots, X_n$ be independent and let $Z = f(X_1, \dots, X_n)$, where $f \ge 0$.

49 subadditivity of entropy The entropy of a random variable $Z \ge 0$ is $\mathrm{Ent}(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$ where $\Phi(x) = x \log x$. By Jensen's inequality, $\mathrm{Ent}(Z) \ge 0$. Let $X_1, \dots, X_n$ be independent and let $Z = f(X_1, \dots, X_n)$, where $f \ge 0$. $\mathrm{Ent}(Z)$ is the relative entropy between the distribution induced by $Z$ on $\mathcal{X}^n$ and the distribution of $X = (X_1, \dots, X_n)$.

50 subadditivity of entropy The entropy of a random variable $Z \ge 0$ is $\mathrm{Ent}(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$ where $\Phi(x) = x \log x$. By Jensen's inequality, $\mathrm{Ent}(Z) \ge 0$. Let $X_1, \dots, X_n$ be independent and let $Z = f(X_1, \dots, X_n)$, where $f \ge 0$. $\mathrm{Ent}(Z)$ is the relative entropy between the distribution induced by $Z$ on $\mathcal{X}^n$ and the distribution of $X = (X_1, \dots, X_n)$. Denote $\mathrm{Ent}^{(i)}(Z) = \mathbf{E}^{(i)}\Phi(Z) - \Phi(\mathbf{E}^{(i)} Z)$. Then by Han's inequality, $\mathrm{Ent}(Z) \le \mathbf{E}\sum_{i=1}^n \mathrm{Ent}^{(i)}(Z)$.
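
The sub-additivity of $\mathrm{Ent}$ can be checked by brute force on a small product space (a sketch, not from the slides; numpy assumed, with an arbitrary positive $f$ on $\{0,1\}^3$ and arbitrary marginals chosen for illustration).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
probs = [np.array([0.3, 0.7]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]
f = rng.random((2,) * n) + 0.1                           # an arbitrary positive f on {0,1}^3

def phi(x):
    return x * np.log(x)

joint = np.einsum('i,j,k->ijk', *probs)                  # product distribution of (X_1, X_2, X_3)
ent = np.sum(joint * phi(f)) - phi(np.sum(joint * f))    # Ent(Z) with Z = f(X)

rhs = 0.0
for i in range(n):
    ef = np.tensordot(f, probs[i], axes=([i], [0]))          # E^(i) Z as a function of x^(i)
    ephi = np.tensordot(phi(f), probs[i], axes=([i], [0]))   # E^(i) Phi(Z)
    marg = joint.sum(axis=i)                                 # distribution of X^(i)
    rhs += np.sum(marg * (ephi - phi(ef)))                   # E[ Ent^(i)(Z) ]

print(f"Ent(Z) = {ent:.5f}  <=  E sum_i Ent^(i)(Z) = {rhs:.5f}")
```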

51 a logarithmic sobolev inequality on the hypercube Let $X = (X_1, \dots, X_n)$ be uniformly distributed over $\{-1, 1\}^n$. If $f : \{-1, 1\}^n \to \mathbb{R}$ and $Z = f(X)$, $\mathrm{Ent}(Z^2) \le \frac{1}{2}\mathbf{E}\sum_{i=1}^n (Z - Z_i')^2$. The proof uses subadditivity of the entropy and calculus for the case $n = 1$. Implies Efron-Stein.

52 Sergei Lvovich Sobolev (1908-1989)

53 herbst's argument: exponential concentration If $f : \{-1, 1\}^n \to \mathbb{R}$, the log-Sobolev inequality may be used with $g(x) = e^{\lambda f(x)/2}$ where $\lambda \in \mathbb{R}$. If $F(\lambda) = \mathbf{E} e^{\lambda Z}$ is the moment generating function of $Z = f(X)$, $\mathrm{Ent}(g(X)^2) = \lambda\mathbf{E}\left[Z e^{\lambda Z}\right] - \mathbf{E}\left[e^{\lambda Z}\right] \log \mathbf{E}\left[e^{\lambda Z}\right] = \lambda F'(\lambda) - F(\lambda) \log F(\lambda)$. Differential inequalities are obtained for $F(\lambda)$.

54 herbst's argument As an example, suppose $f$ is such that $\sum_{i=1}^n (Z - Z_i')_+^2 \le v$. Then by the log-Sobolev inequality, $\lambda F'(\lambda) - F(\lambda) \log F(\lambda) \le \frac{v \lambda^2}{4} F(\lambda)$. If $G(\lambda) = \log F(\lambda)$, this becomes $\left( \frac{G(\lambda)}{\lambda} \right)' \le \frac{v}{4}$. This can be integrated: $G(\lambda) \le \lambda \mathbf{E}Z + \lambda^2 v/4$, so $F(\lambda) \le e^{\lambda \mathbf{E}Z + \lambda^2 v/4}$. This implies $\mathbf{P}\{Z > \mathbf{E}Z + t\} \le e^{-t^2/v}$.

55 herbst's argument As an example, suppose $f$ is such that $\sum_{i=1}^n (Z - Z_i')_+^2 \le v$. Then by the log-Sobolev inequality, $\lambda F'(\lambda) - F(\lambda) \log F(\lambda) \le \frac{v \lambda^2}{4} F(\lambda)$. If $G(\lambda) = \log F(\lambda)$, this becomes $\left( \frac{G(\lambda)}{\lambda} \right)' \le \frac{v}{4}$. This can be integrated: $G(\lambda) \le \lambda \mathbf{E}Z + \lambda^2 v/4$, so $F(\lambda) \le e^{\lambda \mathbf{E}Z + \lambda^2 v/4}$. This implies $\mathbf{P}\{Z > \mathbf{E}Z + t\} \le e^{-t^2/v}$. Stronger than the bounded differences inequality!

56 gaussian log-sobolev inequality Let $X = (X_1, \dots, X_n)$ be a vector of i.i.d. standard normal random variables. If $f : \mathbb{R}^n \to \mathbb{R}$ and $Z = f(X)$, $\mathrm{Ent}(Z^2) \le 2\mathbf{E}\left[ \|\nabla f(X)\|^2 \right]$ (Gross, 1975).

57 gaussian log-sobolev inequality Let $X = (X_1, \dots, X_n)$ be a vector of i.i.d. standard normal random variables. If $f : \mathbb{R}^n \to \mathbb{R}$ and $Z = f(X)$, $\mathrm{Ent}(Z^2) \le 2\mathbf{E}\left[ \|\nabla f(X)\|^2 \right]$ (Gross, 1975). Proof sketch: By the subadditivity of entropy, it suffices to prove it for $n = 1$. Approximate $Z = f(X)$ by $f\left( \frac{1}{\sqrt{m}} \sum_{i=1}^m \varepsilon_i \right)$ where the $\varepsilon_i$ are i.i.d. Rademacher random variables. Use the log-Sobolev inequality of the hypercube and the central limit theorem.

58 gaussian concentration inequality Herbst's argument may now be repeated: Suppose $f$ is Lipschitz: for all $x, y \in \mathbb{R}^n$, $|f(x) - f(y)| \le L \|x - y\|$. Then, for all $t > 0$, $\mathbf{P}\{f(X) - \mathbf{E}f(X) \ge t\} \le e^{-t^2/(2L^2)}$ (Tsirelson, Ibragimov, and Sudakov, 1976).
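
Illustration (a sketch, not from the slides; numpy assumed, with $f(x) = \|x\|$, which is $1$-Lipschitz, and arbitrary $n$, $t$): the simulated upper tail of $f(X) - \mathbf{E}f(X)$ is compared with $e^{-t^2/2}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, t = 50, 100_000, 1.5

# f(x) = ||x|| is 1-Lipschitz; X is standard normal in R^n
Z = np.linalg.norm(rng.standard_normal((reps, n)), axis=1)
empirical = np.mean(Z - Z.mean() >= t)    # E f(X) replaced by the empirical mean
bound = np.exp(-t ** 2 / 2)
print(f"empirical tail: {empirical:.5f}   Gaussian concentration bound: {bound:.4f}")
```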

59 beyond bernoulli and gaussian: the entropy method For general distributions, logarithmic Sobolev inequalities are not available. Solution: modified logarithmic Sobolev inequalities. Suppose $X_1, \dots, X_n$ are independent. Let $Z = f(X_1, \dots, X_n)$ and $Z_i = f_i(X^{(i)}) = f_i(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$. Let $\phi(x) = e^x - x - 1$. Then for all $\lambda \in \mathbb{R}$, $\lambda\mathbf{E}\left[Z e^{\lambda Z}\right] - \mathbf{E}\left[e^{\lambda Z}\right] \log \mathbf{E}\left[e^{\lambda Z}\right] \le \sum_{i=1}^n \mathbf{E}\left[ e^{\lambda Z} \phi\left( -\lambda (Z - Z_i) \right) \right]$. Michel Ledoux

60 the entropy method Define $Z_i = \inf_{x_i'} f(X_1, \dots, x_i', \dots, X_n)$ and suppose $\sum_{i=1}^n (Z - Z_i)^2 \le v$. Then for all $t > 0$, $\mathbf{P}\{Z - \mathbf{E}Z > t\} \le e^{-t^2/(2v)}$.

61 the entropy method Define $Z_i = \inf_{x_i'} f(X_1, \dots, x_i', \dots, X_n)$ and suppose $\sum_{i=1}^n (Z - Z_i)^2 \le v$. Then for all $t > 0$, $\mathbf{P}\{Z - \mathbf{E}Z > t\} \le e^{-t^2/(2v)}$. This implies the bounded differences inequality and much more.

62 example: the largest eigenvalue of a symmetric matrix Let $A = (X_{i,j})_{n \times n}$ be symmetric, the $X_{i,j}$ independent ($i \le j$) with $|X_{i,j}| \le 1$. Let $Z = \lambda_1 = \sup_{u : \|u\| = 1} u^T A u$, and suppose $v$ is such that $Z = v^T A v$. $A_{i,j}'$ is obtained by replacing $X_{i,j}$ by $X_{i,j}'$. Then $(Z - Z_{i,j}')_+ \le \left( v^T A v - v^T A_{i,j}' v \right) \mathbf{1}_{Z > Z_{i,j}'} = \left( v^T (A - A_{i,j}') v \right) \mathbf{1}_{Z > Z_{i,j}'} \le 2 \left| v_i v_j (X_{i,j} - X_{i,j}') \right| \le 4 |v_i v_j|$. Therefore, $\sum_{1 \le i \le j \le n} (Z - Z_{i,j}')_+^2 \le 16 \sum_{1 \le i \le j \le n} v_i^2 v_j^2 \le 16 \left( \sum_{i=1}^n v_i^2 \right)^2 = 16$.
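
Combined with Efron-Stein, the computation above gives $\mathrm{Var}(\lambda_1) \le 16$ whatever the dimension. A simulation sketch (not from the slides; numpy assumed, with i.i.d. $\pm 1$ entries chosen for illustration) shows the actual fluctuations are far smaller.

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 100, 2_000

lam1 = np.empty(reps)
for r in range(reps):
    # symmetric matrix with independent +/-1 entries on and above the diagonal
    U = np.triu(rng.choice([-1.0, 1.0], size=(n, n)))
    A = U + np.triu(U, 1).T
    lam1[r] = np.linalg.eigvalsh(A)[-1]   # largest eigenvalue (eigvalsh sorts ascending)

print(f"E lambda_1 ~ {lam1.mean():.2f}   Var(lambda_1) ~ {lam1.var():.3f}   (Efron-Stein bound: 16)")
```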

63 self-bounding functions Suppose $Z$ satisfies $0 \le Z - Z_i \le 1$ and $\sum_{i=1}^n (Z - Z_i) \le Z$. Recall that $\mathrm{Var}(Z) \le \mathbf{E}Z$. We have much more: $\mathbf{P}\{Z > \mathbf{E}Z + t\} \le e^{-t^2/(2\mathbf{E}Z + 2t/3)}$ and $\mathbf{P}\{Z < \mathbf{E}Z - t\} \le e^{-t^2/(2\mathbf{E}Z)}$.

64 exponential efron-stein inequality Define $V_+ = \sum_{i=1}^n \mathbf{E}'\left[ (Z - Z_i')_+^2 \right]$ and $V_- = \sum_{i=1}^n \mathbf{E}'\left[ (Z - Z_i')_-^2 \right]$, where $\mathbf{E}'$ denotes expectation with respect to the $X_i'$ only. By Efron-Stein, $\mathrm{Var}(Z) \le \mathbf{E}V_+$ and $\mathrm{Var}(Z) \le \mathbf{E}V_-$.

65 exponential efron-stein inequality Define $V_+ = \sum_{i=1}^n \mathbf{E}'\left[ (Z - Z_i')_+^2 \right]$ and $V_- = \sum_{i=1}^n \mathbf{E}'\left[ (Z - Z_i')_-^2 \right]$, where $\mathbf{E}'$ denotes expectation with respect to the $X_i'$ only. By Efron-Stein, $\mathrm{Var}(Z) \le \mathbf{E}V_+$ and $\mathrm{Var}(Z) \le \mathbf{E}V_-$. The following exponential versions hold for all $\lambda, \theta > 0$ with $\lambda\theta < 1$: $\log \mathbf{E} e^{\lambda(Z - \mathbf{E}Z)} \le \frac{\lambda\theta}{1 - \lambda\theta} \log \mathbf{E} e^{\lambda V_+/\theta}$. If also $Z_i' - Z \le 1$ for every $i$, then for all $\lambda \in (0, 1/2)$, $\log \mathbf{E} e^{\lambda(Z - \mathbf{E}Z)} \le \frac{2\lambda}{1 - 2\lambda} \log \mathbf{E} e^{\lambda V_-}$.

66 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$.

67 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$. Then $\mathbf{P}\{Z \ge \mathbf{E}Z + t\} \le \exp\left( -\frac{t^2}{2\left(a \mathbf{E}Z + b + at/2\right)} \right)$.

68 weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$. Then $\mathbf{P}\{Z \ge \mathbf{E}Z + t\} \le \exp\left( -\frac{t^2}{2\left(a \mathbf{E}Z + b + at/2\right)} \right)$. If, in addition, $f(x) - f_i(x^{(i)}) \le 1$, then for $0 < t \le \mathbf{E}Z$, $\mathbf{P}\{Z \le \mathbf{E}Z - t\} \le \exp\left( -\frac{t^2}{2\left(a \mathbf{E}Z + b + c\, t\right)} \right)$, where $c = (3a - 1)/6$.

69 the isoperimetric view Let $X = (X_1, \dots, X_n)$ have independent components, taking values in $\mathcal{X}^n$. Let $A \subset \mathcal{X}^n$. The Hamming distance of $X$ to $A$ is $d(X, A) = \min_{y \in A} d(X, y) = \min_{y \in A} \sum_{i=1}^n \mathbf{1}_{X_i \ne y_i}$. Michel Talagrand

70 the isoperimetric view Let $X = (X_1, \dots, X_n)$ have independent components, taking values in $\mathcal{X}^n$. Let $A \subset \mathcal{X}^n$. The Hamming distance of $X$ to $A$ is $d(X, A) = \min_{y \in A} d(X, y) = \min_{y \in A} \sum_{i=1}^n \mathbf{1}_{X_i \ne y_i}$. Michel Talagrand $\mathbf{P}\left\{ d(X, A) \ge t + \sqrt{\frac{n}{2} \log \frac{1}{\mathbf{P}[A]}} \right\} \le e^{-2t^2/n}$.

71 the isoperimetric view Proof: By the bounded differences inequality, $\mathbf{P}\{\mathbf{E} d(X, A) - d(X, A) \ge t\} \le e^{-2t^2/n}$. Taking $t = \mathbf{E} d(X, A)$, we get $\mathbf{E} d(X, A) \le \sqrt{\frac{n}{2} \log \frac{1}{\mathbf{P}\{A\}}}$. By the bounded differences inequality again, $\mathbf{P}\left\{ d(X, A) \ge t + \sqrt{\frac{n}{2} \log \frac{1}{\mathbf{P}\{A\}}} \right\} \le e^{-2t^2/n}$.
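
A concrete instance (a sketch, not from the slides; numpy assumed, with Bernoulli(1/2) coordinates and the half-space $A = \{x : \sum_i x_i \ge n/2\}$ chosen for illustration): here $d(X, A)$ is simply the number of coordinates that must be flipped to reach $A$.

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps, t = 100, 200_000, 8

X = rng.integers(0, 2, size=(reps, n))
S = X.sum(axis=1)
thresh = int(np.ceil(n / 2))
d = np.maximum(0, thresh - S)          # Hamming distance to A = {x : sum_i x_i >= n/2}
pA = np.mean(S >= thresh)              # empirical P{A}

rhs = t + np.sqrt(n / 2 * np.log(1 / pA))
empirical = np.mean(d >= rhs)
bound = np.exp(-2 * t ** 2 / n)
print(f"empirical P{{d(X,A) >= t + sqrt((n/2) log(1/P(A)))}}: {empirical:.5f}   bound: {bound:.4f}")
```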

72 talagrand's convex distance The weighted Hamming distance is $d_\alpha(x, A) = \inf_{y \in A} d_\alpha(x, y) = \inf_{y \in A} \sum_{i : x_i \ne y_i} \alpha_i$, where $\alpha = (\alpha_1, \dots, \alpha_n)$. The same argument as before gives $\mathbf{P}\left\{ d_\alpha(X, A) \ge t + \sqrt{\frac{\|\alpha\|^2}{2} \log \frac{1}{\mathbf{P}\{A\}}} \right\} \le e^{-2t^2/\|\alpha\|^2}$. This implies $\sup_{\alpha : \|\alpha\| = 1} \min\left( \mathbf{P}\{A\}, \mathbf{P}\{d_\alpha(X, A) \ge t\} \right) \le e^{-t^2/2}$.

73 convex distance inequality convex distance: $d_T(x, A) = \sup_{\alpha \in [0, \infty)^n : \|\alpha\| = 1} d_\alpha(x, A)$.

74 convex distance inequality convex distance: $d_T(x, A) = \sup_{\alpha \in [0, \infty)^n : \|\alpha\| = 1} d_\alpha(x, A)$. Talagrand's convex distance inequality: $\mathbf{P}\{A\}\, \mathbf{P}\{d_T(X, A) \ge t\} \le e^{-t^2/4}$.

75 convex distance inequality convex distance: $d_T(x, A) = \sup_{\alpha \in [0, \infty)^n : \|\alpha\| = 1} d_\alpha(x, A)$. Talagrand's convex distance inequality: $\mathbf{P}\{A\}\, \mathbf{P}\{d_T(X, A) \ge t\} \le e^{-t^2/4}$. Follows from the fact that $d_T(X, A)^2$ is $(4, 0)$ weakly self-bounding (by a saddle point representation of $d_T$). Talagrand's original proof was different. It can also be recovered from Marton's transportation inequality.

76 convex lipschitz functions For $A \subset [0, 1]^n$ and $x \in [0, 1]^n$, define $D(x, A) = \inf_{y \in A} \|x - y\|$. If $A$ is convex, then $D(x, A) \le d_T(x, A)$.

77 convex lipschitz functions For $A \subset [0, 1]^n$ and $x \in [0, 1]^n$, define $D(x, A) = \inf_{y \in A} \|x - y\|$. If $A$ is convex, then $D(x, A) \le d_T(x, A)$. Proof (here $\mathcal{M}(A)$ denotes the set of probability measures on $A$ and $Y \sim \nu$): $D(x, A) = \inf_{\nu \in \mathcal{M}(A)} \|x - \mathbf{E}_\nu Y\|$ (since $A$ is convex) $\le \inf_{\nu \in \mathcal{M}(A)} \sqrt{\sum_{j=1}^n \left( \mathbf{E}_\nu \mathbf{1}_{x_j \ne Y_j} \right)^2}$ (since $x_j, Y_j \in [0, 1]$) $= \inf_{\nu \in \mathcal{M}(A)} \sup_{\alpha : \|\alpha\| \le 1} \sum_{j=1}^n \alpha_j \mathbf{E}_\nu \mathbf{1}_{x_j \ne Y_j}$ (by Cauchy-Schwarz) $= \sup_{\alpha : \|\alpha\| \le 1} \inf_{\nu \in \mathcal{M}(A)} \sum_{j=1}^n \alpha_j \mathbf{E}_\nu \mathbf{1}_{x_j \ne Y_j} = d_T(x, A)$ (by minimax theorem).

78 convex lipschitz functions Let $X = (X_1, \dots, X_n)$ have independent components taking values in $[0, 1]$. Let $f : [0, 1]^n \to \mathbb{R}$ be quasi-convex such that $|f(x) - f(y)| \le \|x - y\|$. Then $\mathbf{P}\{f(X) > \mathbf{M}f(X) + t\} \le 2 e^{-t^2/4}$ and $\mathbf{P}\{f(X) < \mathbf{M}f(X) - t\} \le 2 e^{-t^2/4}$, where $\mathbf{M}f(X)$ denotes a median of $f(X)$.

79 convex lipschitz functions Let $X = (X_1, \dots, X_n)$ have independent components taking values in $[0, 1]$. Let $f : [0, 1]^n \to \mathbb{R}$ be quasi-convex such that $|f(x) - f(y)| \le \|x - y\|$. Then $\mathbf{P}\{f(X) > \mathbf{M}f(X) + t\} \le 2 e^{-t^2/4}$ and $\mathbf{P}\{f(X) < \mathbf{M}f(X) - t\} \le 2 e^{-t^2/4}$. Proof: Let $A_s = \{x : f(x) \le s\} \subset [0, 1]^n$. $A_s$ is convex. Since $f$ is Lipschitz, $f(x) \le s + D(x, A_s) \le s + d_T(x, A_s)$. By the convex distance inequality, $\mathbf{P}\{f(X) \ge s + t\}\, \mathbf{P}\{f(X) \le s\} \le e^{-t^2/4}$. Take $s = \mathbf{M}f(X)$ for the upper tail and $s = \mathbf{M}f(X) - t$ for the lower tail.
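
Illustration (a sketch, not from the slides; numpy assumed, with $f(x) = \|x\|$, which is convex and $1$-Lipschitz on $[0,1]^n$, and arbitrary parameters): the tails around the median obey the $2e^{-t^2/4}$ bound; since the bound is dimension-free it is quite conservative for this particular $f$.

```python
import numpy as np

rng = np.random.default_rng(10)
n, reps, t = 100, 100_000, 2.0

# f(x) = ||x|| is convex (hence quasi-convex) and 1-Lipschitz on [0,1]^n
Z = np.linalg.norm(rng.random((reps, n)), axis=1)
med = np.median(Z)
upper = np.mean(Z > med + t)
lower = np.mean(Z < med - t)
bound = 2 * np.exp(-t ** 2 / 4)
print(f"upper tail: {upper:.6f}   lower tail: {lower:.6f}   bound: {bound:.4f}")
```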

80 Φ entropies For a convex function $\Phi$ on $[0, \infty)$, the $\Phi$-entropy of $Z \ge 0$ is $H_\Phi(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$.

81 Φ entropies For a convex function $\Phi$ on $[0, \infty)$, the $\Phi$-entropy of $Z \ge 0$ is $H_\Phi(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$. $H_\Phi$ is subadditive: $H_\Phi(Z) \le \sum_{i=1}^n \mathbf{E} H_\Phi^{(i)}(Z)$ if (and only if) $\Phi$ is twice differentiable on $(0, \infty)$, and either $\Phi$ is affine or $\Phi''$ is strictly positive and $1/\Phi''$ is concave.

82 Φ entropies For a convex function $\Phi$ on $[0, \infty)$, the $\Phi$-entropy of $Z \ge 0$ is $H_\Phi(Z) = \mathbf{E}\Phi(Z) - \Phi(\mathbf{E}Z)$. $H_\Phi$ is subadditive: $H_\Phi(Z) \le \sum_{i=1}^n \mathbf{E} H_\Phi^{(i)}(Z)$ if (and only if) $\Phi$ is twice differentiable on $(0, \infty)$, and either $\Phi$ is affine or $\Phi''$ is strictly positive and $1/\Phi''$ is concave. $\Phi(x) = x^2$ corresponds to Efron-Stein; $x \log x$ is subadditivity of entropy. We may consider $\Phi(x) = x^p$ for $p \in (1, 2]$.

83 generalized efron-stein Define $Z_i' = f(X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$ and $V_+ = \sum_{i=1}^n (Z - Z_i')_+^2$.

84 generalized efron-stein Define $Z_i' = f(X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$ and $V_+ = \sum_{i=1}^n (Z - Z_i')_+^2$. For $q \ge 2$ and $q/2 \le \alpha \le q - 1$, $\mathbf{E}\left[ (Z - \mathbf{E}Z)_+^q \right] \le \mathbf{E}\left[ (Z - \mathbf{E}Z)_+^\alpha \right]^{q/\alpha} + \alpha (q - \alpha)\, \mathbf{E}\left[ V_+ (Z - \mathbf{E}Z)_+^{q-2} \right]$, and similarly for $\mathbf{E}\left[ (Z - \mathbf{E}Z)_-^q \right]$.

85 moment inequalities We may solve the recursions for $q \ge 2$.

86 moment inequalities We may solve the recursions for $q \ge 2$. If $V_+ \le c$ for some constant $c \ge 0$, then for all integers $q \ge 2$, $\left( \mathbf{E}\left[ (Z - \mathbf{E}Z)_+^q \right] \right)^{1/q} \le \sqrt{Kqc}$, where $K = 1/(e - \sqrt{e}) < 0.935$.

87 moment inequalities We may solve the recursions for $q \ge 2$. If $V_+ \le c$ for some constant $c \ge 0$, then for all integers $q \ge 2$, $\left( \mathbf{E}\left[ (Z - \mathbf{E}Z)_+^q \right] \right)^{1/q} \le \sqrt{Kqc}$, where $K = 1/(e - \sqrt{e}) < 0.935$. More generally, $\left( \mathbf{E}\left[ (Z - \mathbf{E}Z)_+^q \right] \right)^{1/q} \le \sqrt{Kq}\, \left( \mathbf{E}\left[ V_+^{q/2} \right] \right)^{1/q}$.

88
