Concentration inequalities and the entropy method


Concentration inequalities and the entropy method. Gábor Lugosi, ICREA and Pompeu Fabra University, Barcelona.

what is concentration? We are interested in bounding random fluctuations of functions of many independent random variables. $X_1, \ldots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \ldots, X_n)$. How large are typical deviations of $Z$ from $EZ$? We seek upper bounds for $P\{Z > EZ + t\}$ and $P\{Z < EZ - t\}$ for $t > 0$.

various approaches
- martingales (Yurinskii, 1974; Milman and Schechtman, 1986; Shamir and Spencer, 1987; McDiarmid, 1989, 1998);
- information-theoretic and transportation methods (Ahlswede, Gács, and Körner, 1976; Marton 1986, 1996, 1997; Dembo 1997);
- Talagrand's induction method, 1996;
- logarithmic Sobolev inequalities (Ledoux 1996, Massart 1998, Boucheron, Lugosi, Massart 1999, 2001).

chernoff bounds By Markov's inequality, if $\lambda > 0$, $P\{Z - EZ > t\} = P\{e^{\lambda(Z-EZ)} > e^{\lambda t}\} \le \frac{E e^{\lambda(Z-EZ)}}{e^{\lambda t}}$. Next derive bounds for the moment generating function $E e^{\lambda(Z-EZ)}$ and optimize $\lambda$. If $Z = \sum_{i=1}^n X_i$ is a sum of independent random variables, $E e^{\lambda Z} = E \prod_{i=1}^n e^{\lambda X_i} = \prod_{i=1}^n E e^{\lambda X_i}$ by independence. It suffices to find bounds for $E e^{\lambda X_i}$. Serguei Bernstein (1880-1968), Herman Chernoff (1923- ).
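
To make the recipe concrete, here is a minimal Python sketch (my own illustration, not from the talk): for a sum of independent Bernoulli($p$) variables each factor $E e^{\lambda X_i}$ is explicit, and the bound is optimized over $\lambda$ on a grid. The parameters $n$, $p$, $t$ and the grid are arbitrary choices.

```python
import numpy as np

def chernoff_upper_tail(n, p, t, lambdas=np.linspace(1e-3, 5.0, 2000)):
    """Bound P{Z >= EZ + t} for Z = sum of n i.i.d. Bernoulli(p) variables.

    Uses P{Z - EZ >= t} <= exp(-lambda*(n*p + t)) * (E exp(lambda*X_1))^n
    and optimizes the free parameter lambda over a grid.
    """
    mgf_one = 1 - p + p * np.exp(lambdas)            # E exp(lambda * X_i)
    log_bound = -lambdas * (n * p + t) + n * np.log(mgf_one)
    return np.exp(log_bound.min())

# Compare the optimized Chernoff bound with a Monte Carlo estimate.
rng = np.random.default_rng(0)
n, p, t = 1000, 0.3, 40
Z = rng.binomial(n, p, size=200_000)
print("empirical P{Z >= EZ + t}:", np.mean(Z >= n * p + t))
print("Chernoff bound:          ", chernoff_upper_tail(n, p, t))
```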

hoeffding's inequality If $X_1, \ldots, X_n \in [0,1]$, then $E e^{\lambda(X_i - EX_i)} \le e^{\lambda^2/8}$. We obtain $P\left\{ \left| \frac{1}{n}\sum_{i=1}^n X_i - E\left[\frac{1}{n}\sum_{i=1}^n X_i\right] \right| > t \right\} \le 2 e^{-2nt^2}$. Wassily Hoeffding (1914-1991).
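
A quick numerical sanity check (my own sketch, not part of the lecture): simulate means of $n$ independent uniform $[0,1]$ variables and compare the observed deviation probabilities with the bound $2e^{-2nt^2}$. The uniform distribution, $n$, and the values of $t$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 100_000
X = rng.uniform(0.0, 1.0, size=(reps, n))
dev = np.abs(X.mean(axis=1) - 0.5)   # |(1/n) sum X_i - E[(1/n) sum X_i]|

for t in (0.05, 0.08, 0.10):
    empirical = np.mean(dev > t)
    hoeffding = 2 * np.exp(-2 * n * t**2)
    print(f"t={t:.2f}  empirical={empirical:.5f}  bound={hoeffding:.5f}")
```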

martingale representation $X_1, \ldots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \ldots, X_n)$. Denote $E_i[\cdot] = E[\cdot \mid X_1, \ldots, X_i]$. Thus, $E_0 Z = EZ$ and $E_n Z = Z$. Writing $\Delta_i = E_i Z - E_{i-1} Z$, we have $Z - EZ = \sum_{i=1}^n \Delta_i$. This is the Doob martingale representation of $Z$. Joseph Leo Doob (1910-2004).

martingale representation: the variance $\mathrm{Var}(Z) = E\left(\sum_{i=1}^n \Delta_i\right)^2 = \sum_{i=1}^n E\left[\Delta_i^2\right] + 2 \sum_{j>i} E\left[\Delta_i \Delta_j\right]$. Now if $j > i$, $E_i \Delta_j = 0$, so $E\left[\Delta_i \Delta_j\right] = E\left[\Delta_i E_i \Delta_j\right] = 0$. We obtain $\mathrm{Var}(Z) = \sum_{i=1}^n E\left[\Delta_i^2\right]$. From this, using independence, it is easy to derive the Efron-Stein inequality.

efron-stein inequality (1981) Let $X_1, \ldots, X_n$ be independent random variables taking values in $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \ldots, X_n)$. Then $\mathrm{Var}(Z) \le E \sum_{i=1}^n (Z - E^{(i)} Z)^2 = E \sum_{i=1}^n \mathrm{Var}^{(i)}(Z)$, where $E^{(i)} Z$ is expectation with respect to the $i$-th variable $X_i$ only. We obtain more useful forms by using that $\mathrm{Var}(X) = \frac{1}{2} E(X - X')^2$ and $\mathrm{Var}(X) \le E(X - a)^2$ for any constant $a$.

efron-stein inequality (1981) If $X'_1, \ldots, X'_n$ are independent copies of $X_1, \ldots, X_n$, and $Z'_i = f(X_1, \ldots, X_{i-1}, X'_i, X_{i+1}, \ldots, X_n)$, then $\mathrm{Var}(Z) \le \frac{1}{2} E\left[\sum_{i=1}^n (Z - Z'_i)^2\right]$. $Z$ is concentrated if it doesn't depend too much on any of its variables. If $Z = \sum_{i=1}^n X_i$ then we have equality. Sums are the least concentrated of all functions!
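
As an illustration of the inequality (my own sketch, not from the slides), the following estimates both sides for $Z = \max_i X_i$ with uniform $X_i$: the variance of $Z$ and the Efron-Stein bound $\frac{1}{2} E \sum_{i=1}^n (Z - Z'_i)^2$. The choice of function and the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 20, 50_000

X = rng.uniform(size=(reps, n))
Xprime = rng.uniform(size=(reps, n))   # independent copies X'_i
Z = X.max(axis=1)

# Z'_i: replace the i-th coordinate by its independent copy.
es_sum = np.zeros(reps)
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = Xprime[:, i]
    Zi = Xi.max(axis=1)
    es_sum += (Z - Zi) ** 2

print("Var(Z)                        :", Z.var())
print("Efron-Stein bound (1/2)E sum  :", 0.5 * es_sum.mean())
```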

efron-stein inequality (1981) If for some arbitrary functions $f_i$, $Z_i = f_i(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$, then $\mathrm{Var}(Z) \le E\left[\sum_{i=1}^n (Z - Z_i)^2\right]$.

efron, stein, and steele Bradley Efron Charles Stein Mike Steele

example: kernel density estimation Let $X_1, \ldots, X_n$ be i.i.d. real samples drawn according to some density $\phi$. The kernel density estimate is $\phi_n(x) = \frac{1}{nh} \sum_{i=1}^n K\left(\frac{x - X_i}{h}\right)$, where $h > 0$, and $K$ is a nonnegative kernel with $\int K = 1$. The $L_1$ error is $Z = f(X_1, \ldots, X_n) = \int |\phi(x) - \phi_n(x)| \, dx$. It is easy to see that $|f(x_1, \ldots, x_n) - f(x_1, \ldots, x'_i, \ldots, x_n)| \le \frac{1}{nh} \int \left| K\left(\frac{x - x_i}{h}\right) - K\left(\frac{x - x'_i}{h}\right) \right| dx \le \frac{2}{n}$ (Devroye, 1991), so we get $\mathrm{Var}(Z) \le \frac{2}{n}$.

Luc Devroye on Taksim square (2013).
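
A rough numerical illustration of the kernel density example (my own sketch, with arbitrary assumptions: standard normal data, a Gaussian kernel, and the $L_1$ error approximated on a fixed grid): the simulated variance of $Z$ is compared with the bound $2/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, h, reps = 100, 0.3, 2000
grid = np.linspace(-5, 5, 400)
dx = grid[1] - grid[0]
true_density = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)

def l1_error(sample):
    # Kernel density estimate phi_n on the grid, with a Gaussian kernel K.
    diffs = (grid[None, :] - sample[:, None]) / h
    phi_n = np.exp(-diffs**2 / 2).sum(axis=0) / (n * h * np.sqrt(2 * np.pi))
    return np.sum(np.abs(true_density - phi_n)) * dx   # grid approximation of the integral

Z = np.array([l1_error(rng.normal(size=n)) for _ in range(reps)])
print("Var(Z) =", Z.var())
print("2 / n  =", 2 / n)
```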

weakly self-bounding functions $f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a,b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$. Then $\mathrm{Var}(f(X)) \le a E f(X) + b$.

self-bounding functions If $0 \le f(x) - f_i(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right) \le f(x)$, then $f$ is self-bounding and $\mathrm{Var}(f(X)) \le E f(X)$. Rademacher averages, random VC dimension, random VC entropy, and the longest increasing subsequence in a random permutation are all examples of self-bounding functions. Configuration functions.

beyond the variance $X_1, \ldots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \ldots, X_n)$. Recall the Doob martingale representation: $Z - EZ = \sum_{i=1}^n \Delta_i$ where $\Delta_i = E_i Z - E_{i-1} Z$, with $E_i[\cdot] = E[\cdot \mid X_1, \ldots, X_i]$. To get exponential inequalities, we bound the moment generating function $E e^{\lambda(Z - EZ)}$.

azuma's inequality Suppose that the martingale differences are bounded: $|\Delta_i| \le c_i$. Then $E e^{\lambda(Z - EZ)} = E e^{\lambda \sum_{i=1}^n \Delta_i} = E\left[ e^{\lambda \sum_{i=1}^{n-1} \Delta_i} \, E_{n-1} e^{\lambda \Delta_n} \right] \le E\left[ e^{\lambda \sum_{i=1}^{n-1} \Delta_i} \right] e^{\lambda^2 c_n^2/2}$ (by Hoeffding) $\le \cdots \le e^{\lambda^2 \left( \sum_{i=1}^n c_i^2 \right)/2}$. This is the Azuma-Hoeffding inequality for sums of bounded martingale differences.

bounded differences inequality If $Z = f(X_1, \ldots, X_n)$ and $f$ is such that $|f(x_1, \ldots, x_n) - f(x_1, \ldots, x'_i, \ldots, x_n)| \le c_i$, then the martingale differences are bounded. Bounded differences inequality: if $X_1, \ldots, X_n$ are independent, then $P\{|Z - EZ| > t\} \le 2 e^{-2t^2 / \sum_{i=1}^n c_i^2}$. Also known as McDiarmid's inequality. Colin McDiarmid.
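
A worked example of the bounded differences inequality (my own illustration): if $Z$ is the number of empty bins when $m$ balls are thrown independently into $n$ bins, moving a single ball changes $Z$ by at most $1$, so with $c_i = 1$ McDiarmid's inequality gives $P\{|Z - EZ| > t\} \le 2 e^{-2t^2/m}$. The sketch below checks this by simulation.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, reps = 500, 500, 50_000        # m balls, n bins

balls = rng.integers(0, n, size=(reps, m))
empty = np.array([n - np.unique(row).size for row in balls])   # Z per experiment
EZ = empty.mean()

for t in (10, 15, 20):
    empirical = np.mean(np.abs(empty - EZ) > t)
    mcdiarmid = 2 * np.exp(-2 * t**2 / m)
    print(f"t={t}  empirical={empirical:.5f}  bound={mcdiarmid:.5f}")
```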

bounded differences inequality Easy to use. Distribution free. Often close to optimal. Does not exploit variance information. Often too rigid. Other methods are necessary.

shannon entropy If $X, Y$ are random variables taking values in a set of size $N$, $H(X) = -\sum_x p(x) \log p(x)$ and $H(X \mid Y) = H(X, Y) - H(Y) = -\sum_{x,y} p(x,y) \log p(x \mid y)$. Then $H(X) \le \log N$ and $H(X \mid Y) \le H(X)$. Claude Shannon (1916-2001).

han's inequality If $X = (X_1, \ldots, X_n)$ and $X^{(i)} = (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$, then $\sum_{i=1}^n \left( H(X) - H(X^{(i)}) \right) \le H(X)$. Proof: $H(X) = H(X^{(i)}) + H(X_i \mid X^{(i)}) \le H(X^{(i)}) + H(X_i \mid X_1, \ldots, X_{i-1})$. Since $\sum_{i=1}^n H(X_i \mid X_1, \ldots, X_{i-1}) = H(X)$, summing the inequality, we get $(n-1) H(X) \le \sum_{i=1}^n H(X^{(i)})$. Te Sun Han.
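
Han's inequality can be checked directly on a small example (my own sketch): draw a random joint distribution of three binary variables, compute $H(X)$ and the leave-one-out entropies $H(X^{(i)})$, and verify $\sum_{i=1}^n (H(X) - H(X^{(i)})) \le H(X)$.

```python
import numpy as np

rng = np.random.default_rng(5)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Random joint pmf of (X1, X2, X3), each variable binary.
p = rng.random((2, 2, 2))
p /= p.sum()

H_joint = entropy(p.ravel())
# Marginal of X^(i): sum the joint pmf over the i-th coordinate.
H_leave_one_out = [entropy(p.sum(axis=i).ravel()) for i in range(3)]

lhs = sum(H_joint - H for H in H_leave_one_out)
print("sum_i (H(X) - H(X^(i))) =", lhs)
print("H(X)                    =", H_joint)
print("Han's inequality holds: ", lhs <= H_joint + 1e-12)
```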

combinatorial entropies an example Let $X_1, \ldots, X_n$ be independent points in the plane (of arbitrary distribution!). Let $N$ be the number of subsets of points that are in convex position. Then $\mathrm{Var}(\log_2 N) \le E \log_2 N$.

proof By Efron-Stein, it suffices to prove that $f$ is self-bounding: $0 \le f_n(x) - f_{n-1}(x^{(i)}) \le 1$ and $\sum_{i=1}^n \left( f_n(x) - f_{n-1}(x^{(i)}) \right) \le f_n(x)$. The first property is obvious; we only need to prove the second. This is a deterministic property, so fix the points.

proof Among all sets in convex position, draw one uniformly at random. Define $Y_i$ as the indicator that $x_i$ is in the chosen set. Then $H(Y) = H(Y_1, \ldots, Y_n) = \log_2 N = f_n(x)$. Also, $H(Y^{(i)}) \le f_{n-1}(x^{(i)})$, so by Han's inequality, $\sum_{i=1}^n \left( f_n(x) - f_{n-1}(x^{(i)}) \right) \le \sum_{i=1}^n \left( H(Y) - H(Y^{(i)}) \right) \le H(Y) = f_n(x)$.

Han's inequality for relative entropies Let $P = P_1 \otimes \cdots \otimes P_n$ be a product distribution and $Q$ an arbitrary distribution on $\mathcal{X}^n$. By Han's inequality, $D(Q \| P) \ge \frac{1}{n-1} \sum_{i=1}^n D(Q^{(i)} \| P^{(i)})$.

subadditivity of entropy The entropy of a random variable $Z \ge 0$ is $\mathrm{Ent}(Z) = E\Phi(Z) - \Phi(EZ)$ where $\Phi(x) = x \log x$. By Jensen's inequality, $\mathrm{Ent}(Z) \ge 0$. Let $X_1, \ldots, X_n$ be independent and let $Z = f(X_1, \ldots, X_n)$, where $f \ge 0$. $\mathrm{Ent}(Z)$ is the relative entropy between the distribution induced by $Z$ on $\mathcal{X}^n$ and the distribution of $X = (X_1, \ldots, X_n)$. Denote $\mathrm{Ent}^{(i)}(Z) = E^{(i)}\Phi(Z) - \Phi(E^{(i)} Z)$. Then by Han's inequality, $\mathrm{Ent}(Z) \le E \sum_{i=1}^n \mathrm{Ent}^{(i)}(Z)$.

a logarithmic sobolev inequality on the hypercube Let $X = (X_1, \ldots, X_n)$ be uniformly distributed over $\{-1,1\}^n$. If $f : \{-1,1\}^n \to \mathbb{R}$ and $Z = f(X)$, then $\mathrm{Ent}(Z^2) \le \frac{1}{2} E \sum_{i=1}^n (Z - Z_i)^2$, where $Z_i = f(X_1, \ldots, -X_i, \ldots, X_n)$ is obtained by flipping the sign of the $i$-th coordinate. The proof uses subadditivity of the entropy and calculus for the case $n = 1$. Implies Efron-Stein.

Sergei Lvovich Sobolev (1908 1989)
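
For small $n$, the log-Sobolev inequality on the hypercube can be verified by exhaustive enumeration. The sketch below (my own illustration, with $Z_i$ obtained by flipping the $i$-th coordinate, as in the statement above) checks $\mathrm{Ent}(Z^2) \le \frac{1}{2} E \sum_{i=1}^n (Z - Z_i)^2$ for a randomly chosen $f$ on $\{-1,1\}^n$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(6)
n = 4
cube = np.array(list(itertools.product([-1, 1], repeat=n)))   # all 2^n points
f = rng.normal(size=len(cube)) + 2.0                          # an arbitrary function on the cube
index = {tuple(x): k for k, x in enumerate(cube)}

def ent(w):
    # Ent(W) = E[W log W] - E[W] log E[W] for W >= 0, under the uniform measure.
    return np.mean(w * np.log(w)) - np.mean(w) * np.log(np.mean(w))

# Right-hand side: (1/2) E sum_i (f(X) - f(X with i-th coordinate flipped))^2.
rhs = 0.0
for i in range(n):
    flipped = cube.copy()
    flipped[:, i] *= -1
    fi = f[[index[tuple(x)] for x in flipped]]
    rhs += np.mean((f - fi) ** 2)
rhs *= 0.5

print("Ent(Z^2)  =", ent(f**2))
print("RHS       =", rhs)
print("LSI holds:", ent(f**2) <= rhs + 1e-12)
```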

herbst's argument: exponential concentration If $f : \{-1,1\}^n \to \mathbb{R}$, the log-Sobolev inequality may be used with $g(x) = e^{\lambda f(x)/2}$ where $\lambda \in \mathbb{R}$. If $F(\lambda) = E e^{\lambda Z}$ is the moment generating function of $Z = f(X)$, then $\mathrm{Ent}(g(X)^2) = \lambda E\left[ Z e^{\lambda Z} \right] - E\left[ e^{\lambda Z} \right] \log E\left[ e^{\lambda Z} \right] = \lambda F'(\lambda) - F(\lambda) \log F(\lambda)$. Differential inequalities are obtained for $F(\lambda)$.

herbst's argument As an example, suppose $f$ is such that $\sum_{i=1}^n (Z - Z_i)_+^2 \le v$. Then by the log-Sobolev inequality, $\lambda F'(\lambda) - F(\lambda) \log F(\lambda) \le \frac{v \lambda^2}{4} F(\lambda)$. If $G(\lambda) = \log F(\lambda)$, this becomes $\left( \frac{G(\lambda)}{\lambda} \right)' \le \frac{v}{4}$. This can be integrated: $G(\lambda) \le \lambda EZ + \lambda^2 v/4$, so $F(\lambda) \le e^{\lambda EZ + \lambda^2 v/4}$. This implies $P\{Z > EZ + t\} \le e^{-t^2/v}$. Stronger than the bounded differences inequality!

gaussian log-sobolev inequality Let $X = (X_1, \ldots, X_n)$ be a vector of i.i.d. standard normal random variables. If $f : \mathbb{R}^n \to \mathbb{R}$ and $Z = f(X)$, then $\mathrm{Ent}(Z^2) \le 2 E\left[ \|\nabla f(X)\|^2 \right]$ (Gross, 1975). Proof sketch: by the subadditivity of entropy, it suffices to prove it for $n = 1$. Approximate $Z = f(X)$ by $f\left( \frac{1}{\sqrt{m}} \sum_{i=1}^m \varepsilon_i \right)$, where the $\varepsilon_i$ are i.i.d. Rademacher random variables. Use the log-Sobolev inequality of the hypercube and the central limit theorem.

gaussian concentration inequality Herbst's argument may now be repeated: suppose $f$ is Lipschitz: for all $x, y \in \mathbb{R}^n$, $|f(x) - f(y)| \le L \|x - y\|$. Then, for all $t > 0$, $P\{f(X) \ge Ef(X) + t\} \le e^{-t^2/(2L^2)}$ (Tsirelson, Ibragimov, and Sudakov, 1976).
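
A quick simulation of the Gaussian concentration inequality (my own sketch): $f(x) = \max_i x_i$ is Lipschitz with constant $L = 1$, so $P\{f(X) \ge Ef(X) + t\} \le e^{-t^2/2}$. The dimension and sample sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 50, 200_000
X = rng.standard_normal(size=(reps, n))
Z = X.max(axis=1)                 # f(x) = max_i x_i is 1-Lipschitz
EZ = Z.mean()

for t in (0.5, 1.0, 1.5):
    empirical = np.mean(Z >= EZ + t)
    bound = np.exp(-t**2 / 2)
    print(f"t={t}  empirical={empirical:.5f}  bound={bound:.5f}")
```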

beyond bernoulli and gaussian: the entropy method For general distributions, logarithmic Sobolev inequalities are not available. Solution: modified logarithmic Sobolev inequalities. Suppose $X_1, \ldots, X_n$ are independent. Let $Z = f(X_1, \ldots, X_n)$ and $Z_i = f_i(X^{(i)}) = f_i(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$. Let $\phi(x) = e^x - x - 1$. Then for all $\lambda \in \mathbb{R}$, $\lambda E\left[ Z e^{\lambda Z} \right] - E\left[ e^{\lambda Z} \right] \log E\left[ e^{\lambda Z} \right] \le \sum_{i=1}^n E\left[ e^{\lambda Z} \phi\left( -\lambda (Z - Z_i) \right) \right]$. Michel Ledoux.

the entropy method Define $Z_i = \inf_{x'_i} f(X_1, \ldots, x'_i, \ldots, X_n)$ and suppose $\sum_{i=1}^n (Z - Z_i)^2 \le v$. Then for all $t > 0$, $P\{Z - EZ > t\} \le e^{-t^2/(2v)}$. This implies the bounded differences inequality and much more.

example: the largest eigenvalue of a symmetric matrix Let $A = (X_{i,j})_{n \times n}$ be symmetric, the $X_{i,j}$ independent ($i \le j$) with $|X_{i,j}| \le 1$. Let $Z = \lambda_1 = \sup_{u : \|u\| = 1} u^T A u$, and suppose $v$ is a unit vector such that $Z = v^T A v$. Let $A'_{i,j}$ be the matrix obtained by replacing $X_{i,j}$ by the independent copy $X'_{i,j}$, and let $Z'_{i,j}$ be the corresponding largest eigenvalue. Then $(Z - Z'_{i,j})_+ \le \left( v^T A v - v^T A'_{i,j} v \right) \mathbf{1}_{Z > Z'_{i,j}} = \left( v^T (A - A'_{i,j}) v \right) \mathbf{1}_{Z > Z'_{i,j}} \le 2\left( v_i v_j (X_{i,j} - X'_{i,j}) \right)_+ \le 4 |v_i v_j|$. Therefore, $\sum_{1 \le i \le j \le n} (Z - Z'_{i,j})_+^2 \le \sum_{1 \le i \le j \le n} 16\, v_i^2 v_j^2 \le 16 \left( \sum_{i=1}^n v_i^2 \right)^2 = 16$.
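
A simulation of the eigenvalue example (my own sketch with arbitrary choices: Rademacher entries and $n = 100$): by Efron-Stein, $\sum_{i \le j} (Z - Z'_{i,j})_+^2 \le 16$ gives $\mathrm{Var}(\lambda_1) \le 16$ regardless of $n$, while $E\lambda_1$ grows roughly like $2\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 100, 1000
lambda1 = np.empty(reps)

for r in range(reps):
    # Symmetric matrix with independent Rademacher (+-1) entries for i <= j.
    upper = rng.choice([-1.0, 1.0], size=(n, n))
    A = np.triu(upper) + np.triu(upper, 1).T
    lambda1[r] = np.linalg.eigvalsh(A)[-1]   # largest eigenvalue

print("E lambda_1    approx", lambda1.mean())   # roughly 2*sqrt(n)
print("Var(lambda_1) approx", lambda1.var())    # Efron-Stein guarantees <= 16
```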

self-bounding functions Suppose $Z$ satisfies $0 \le Z - Z_i \le 1$ and $\sum_{i=1}^n (Z - Z_i) \le Z$. Recall that $\mathrm{Var}(Z) \le EZ$. We have much more: $P\{Z > EZ + t\} \le e^{-t^2/(2EZ + 2t/3)}$ and $P\{Z < EZ - t\} \le e^{-t^2/(2EZ)}$.

exponential efron-stein inequality Define $V_+ = \sum_{i=1}^n E'\left[ (Z - Z'_i)_+^2 \right]$ and $V_- = \sum_{i=1}^n E'\left[ (Z - Z'_i)_-^2 \right]$, where $E'$ denotes expectation with respect to the independent copies $X'_1, \ldots, X'_n$ only. By Efron-Stein, $\mathrm{Var}(Z) \le E V_+$ and $\mathrm{Var}(Z) \le E V_-$. The following exponential versions hold for all $\lambda, \theta > 0$ with $\lambda\theta < 1$: $\log E e^{\lambda(Z - EZ)} \le \frac{\lambda\theta}{1 - \lambda\theta} \log E e^{\lambda V_+/\theta}$. If also $Z'_i - Z \le 1$ for every $i$, then for all $\lambda \in (0, 1/2)$, $\log E e^{\lambda(Z - EZ)} \le \frac{2\lambda}{1 - 2\lambda} \log E e^{\lambda V_-}$.

weakly self-bounding functions $f : \mathcal{X}^n \to [0,\infty)$ is weakly $(a,b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0,\infty)$ such that for all $x \in \mathcal{X}^n$, $\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b$. Then $P\{Z \ge EZ + t\} \le \exp\left( -\frac{t^2}{2(a EZ + b + at/2)} \right)$. If, in addition, $f(x) - f_i(x^{(i)}) \le 1$, then for $0 < t \le EZ$, $P\{Z \le EZ - t\} \le \exp\left( -\frac{t^2}{2(a EZ + b + c_- t)} \right)$, where $c_- = (1 - 3a)_+/6$ (so $c_- = 0$ when $a \ge 1/3$).

the isoperimetric view Let $X = (X_1, \ldots, X_n)$ have independent components, taking values in $\mathcal{X}^n$. Let $A \subset \mathcal{X}^n$. The Hamming distance of $X$ to $A$ is $d(X, A) = \min_{y \in A} d(X, y) = \min_{y \in A} \sum_{i=1}^n \mathbf{1}_{X_i \ne y_i}$. Michel Talagrand. Then $P\left\{ d(X, A) \ge t + \sqrt{\tfrac{n}{2} \log \tfrac{1}{P[A]}} \right\} \le e^{-2t^2/n}$.

the isoperimetric view Proof: By the bounded differences inequality, $P\{E d(X, A) - d(X, A) > t\} \le e^{-2t^2/n}$. Taking $t = E d(X, A)$, we get $E d(X, A) \le \sqrt{\tfrac{n}{2} \log \tfrac{1}{P\{A\}}}$. By the bounded differences inequality again, $P\left\{ d(X, A) \ge t + \sqrt{\tfrac{n}{2} \log \tfrac{1}{P\{A\}}} \right\} \le e^{-2t^2/n}$.

talagrand's convex distance The weighted Hamming distance is $d_\alpha(x, A) = \inf_{y \in A} d_\alpha(x, y) = \inf_{y \in A} \sum_{i : x_i \ne y_i} |\alpha_i|$, where $\alpha = (\alpha_1, \ldots, \alpha_n)$. The same argument as before gives $P\left\{ d_\alpha(X, A) \ge t + \sqrt{\tfrac{\|\alpha\|^2}{2} \log \tfrac{1}{P\{A\}}} \right\} \le e^{-2t^2/\|\alpha\|^2}$. This implies $\sup_{\alpha : \|\alpha\| = 1} \min\left( P\{A\},\; P\{d_\alpha(X, A) \ge t\} \right) \le e^{-t^2/2}$.

convex distance inequality Convex distance: $d_T(x, A) = \sup_{\alpha \in [0,\infty)^n : \|\alpha\| = 1} d_\alpha(x, A)$. Talagrand's convex distance inequality: $P\{A\}\, P\{d_T(X, A) \ge t\} \le e^{-t^2/4}$. Follows from the fact that $d_T(X, A)^2$ is $(4, 0)$ weakly self-bounding (by a saddle point representation of $d_T$). Talagrand's original proof was different. It can also be recovered from Marton's transportation inequality.

convex lipschitz functions For $A \subset [0,1]^n$ and $x \in [0,1]^n$, define $D(x, A) = \inf_{y \in A} \|x - y\|$. If $A$ is convex, then $D(x, A) \le d_T(x, A)$. Proof: $D(x, A) = \inf_{\nu \in \mathcal{M}(A)} \|x - E_\nu Y\|$ (since $A$ is convex) $\le \inf_{\nu \in \mathcal{M}(A)} \sqrt{\sum_{j=1}^n \left( E_\nu \mathbf{1}_{x_j \ne Y_j} \right)^2}$ (since $x_j, Y_j \in [0,1]$) $= \inf_{\nu \in \mathcal{M}(A)} \sup_{\alpha : \|\alpha\| \le 1} \sum_{j=1}^n \alpha_j E_\nu \mathbf{1}_{x_j \ne Y_j}$ (by Cauchy-Schwarz) $= d_T(x, A)$ (by the minimax theorem).

convex lipschitz functions Let $X = (X_1, \ldots, X_n)$ have independent components taking values in $[0,1]$. Let $f : [0,1]^n \to \mathbb{R}$ be quasi-convex such that $|f(x) - f(y)| \le \|x - y\|$. Then $P\{f(X) > Mf(X) + t\} \le 2 e^{-t^2/4}$ and $P\{f(X) < Mf(X) - t\} \le 2 e^{-t^2/4}$, where $Mf(X)$ denotes a median of $f(X)$. Proof: Let $A_s = \{x : f(x) \le s\} \subset [0,1]^n$. $A_s$ is convex. Since $f$ is Lipschitz, $f(x) \le s + D(x, A_s) \le s + d_T(x, A_s)$. By the convex distance inequality, $P\{f(X) \ge s + t\}\, P\{f(X) \le s\} \le e^{-t^2/4}$. Take $s = Mf(X)$ for the upper tail and $s = Mf(X) - t$ for the lower tail.
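
To illustrate (my own sketch, not from the talk): $f(x) = \|x\|$ is convex, hence quasi-convex, and $1$-Lipschitz on $[0,1]^n$, so both tails around the median are bounded by $2e^{-t^2/4}$.

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps = 100, 200_000
X = rng.uniform(size=(reps, n))
Z = np.linalg.norm(X, axis=1)     # convex and 1-Lipschitz in x
M = np.median(Z)

for t in (0.5, 1.0, 1.5):
    upper = np.mean(Z > M + t)
    lower = np.mean(Z < M - t)
    bound = 2 * np.exp(-t**2 / 4)
    print(f"t={t}  upper={upper:.5f}  lower={lower:.5f}  bound={bound:.5f}")
```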

Φ entropies For a convex function $\Phi$ on $[0, \infty)$, the $\Phi$-entropy of $Z \ge 0$ is $H_\Phi(Z) = E\Phi(Z) - \Phi(EZ)$. $H_\Phi$ is subadditive: $H_\Phi(Z) \le \sum_{i=1}^n E H_\Phi^{(i)}(Z)$ if (and only if) $\Phi$ is twice differentiable on $(0, \infty)$, and either $\Phi$ is affine or $\Phi''$ is strictly positive and $1/\Phi''$ is concave. $\Phi(x) = x^2$ corresponds to Efron-Stein. $x \log x$ is subadditivity of entropy. We may consider $\Phi(x) = x^p$ for $p \in (1, 2]$.

generalized efron-stein Define $Z'_i = f(X_1, \ldots, X_{i-1}, X'_i, X_{i+1}, \ldots, X_n)$ and $V_+ = \sum_{i=1}^n (Z - Z'_i)_+^2$. For $q \ge 2$ and $q/2 \le \alpha \le q - 1$, $E\left[ (Z - EZ)_+^q \right] \le \left( E\left[ (Z - EZ)_+^\alpha \right] \right)^{q/\alpha} + \alpha (q - \alpha)\, E\left[ V_+ (Z - EZ)_+^{q-2} \right]$, and similarly for $E\left[ (Z - EZ)_-^q \right]$.

moment inequalities We may solve the recursions, for $q \ge 2$. If $V_+ \le c$ for some constant $c \ge 0$, then for all integers $q \ge 2$, $\left( E\left[ (Z - EZ)_+^q \right] \right)^{1/q} \le \sqrt{K q c}$, where $K = 1/(e - \sqrt{e}) < 0.935$. More generally, $\left( E\left[ (Z - EZ)_+^q \right] \right)^{1/q} \le \sqrt{1.6\, q \left( E\left[ V_+^{q/2} \right] \right)^{1/q}}$.