Stein Couplings for Concentration of Measure


Stein Couplings for Concentration of Measure
Jay Bartroff, Subhankar Ghosh, Larry Goldstein and Ümit Işlak
University of Southern California
[arXiv:0906.3886] [arXiv:1304.5001] [arXiv:1402.6769]
Borchard Symposium, June/July 2014

Concentration of Measure

Distributional tail bounds can be provided in cases where exact computation is intractable. Concentration of measure results can provide exponentially decaying bounds with explicit constants.

Bounded Difference Inequality

If $Y = f(X_1, \ldots, X_n)$ with $X_1, \ldots, X_n$ independent, and for every $i = 1, \ldots, n$ the differences of the function $f : \Omega^n \to \mathbb{R}$,

\[ \sup_{x_i, x_i'} \left| f(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \right|, \]

are bounded by $c_i$, then

\[ P(|Y - E[Y]| \ge t) \le 2 \exp\left( -\frac{t^2}{2 \sum_{k=1}^n c_k^2} \right). \]
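As a quick illustration (a minimal simulation sketch, not from the talk), take $f$ to be the sum of $n$ independent Bernoulli(1/2) variables, so each $c_i = 1$ and the bound reads $P(|Y - E[Y]| \ge t) \le 2\exp(-t^2/(2n))$:

```python
import numpy as np

# Minimal sketch: Y = sum of n independent Bernoulli(1/2) coordinates,
# so each bounded-difference constant is c_i = 1.
rng = np.random.default_rng(0)
n, reps, t = 100, 200_000, 15

Y = rng.integers(0, 2, size=(reps, n)).sum(axis=1)
empirical = np.mean(np.abs(Y - n / 2) >= t)
bound = 2 * np.exp(-t**2 / (2 * n))
print(f"empirical tail {empirical:.4f} <= bound {bound:.4f}")
```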

Self Bounding Functions

The function $f(x)$, $x = (x_1, \ldots, x_n)$, is $(a, b)$ self bounding if there exist functions $f_i(x^i)$, $x^i = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$, such that

\[ \sum_{i=1}^n \left( f(x) - f_i(x^i) \right) \le a f(x) + b \quad \text{and} \quad 0 \le f(x) - f_i(x^i) \le 1 \quad \text{for all } x. \]

Self Bounding Functions

For, say, the upper tail, with $c = (3a-1)/6$ and $Y = f(X_1, \ldots, X_n)$ with $X_1, \ldots, X_n$ independent, for all $t \ge 0$,

\[ P(Y - E[Y] \ge t) \le \exp\left( -\frac{t^2}{2(aE[Y] + b + c_+ t)} \right), \qquad c_+ = \max(c, 0). \]

The mean in the denominator can be very competitive with the factor $\sum_{i=1}^n c_i^2$ in the bounded difference inequality. If $(a, b) = (1, 0)$, say, the denominator of the exponent is $2(E[Y] + t/3)$, and as $t \to \infty$ the rate is $\exp(-3t/2)$.
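To see the advantage numerically (an illustration under my own choice of parameters, not from the talk), compare the two bounds for a $(1,0)$ self bounding $Y$ built from $n = 1000$ coordinates, each with $c_i = 1$, but with small mean $E[Y] = 10$:

```python
import numpy as np

# The (1,0) self-bounding bound uses E[Y]; bounded difference uses n.
mean_Y, n = 10.0, 1000
for t in [5, 10, 20]:
    sb = np.exp(-t**2 / (2 * (mean_Y + t / 3)))
    bd = 2 * np.exp(-t**2 / (2 * n))
    print(f"t={t:2d}: self-bounding {sb:.3e}  vs  bounded difference {bd:.3e}")
```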

Use of Stein's Method Couplings

Stein's method was developed for evaluating the quality of distributional approximations through the use of characterizing equations. Implementation of the method often involves coupling constructions, with the quality of the resulting bounds reflecting the closeness of the coupling. Such couplings can be thought of as a type of distributional perturbation that measures dependence. Concentration of measure should hold when the perturbation is small.

Stein's Method and Concentration Inequalities

Raič (2007) applies the Stein equation to obtain Cramér-type moderate deviations relative to the normal for some graph related statistics. Chatterjee (2007) derives tail bounds for Hoeffding's combinatorial CLT and the net magnetization in the Curie-Weiss model from statistical physics, based on Stein's exchangeable pair coupling. Goldstein and Ghosh (2011) show that a bounded size bias coupling implies concentration. Chen and Röllin (2010) consider general Stein couplings, of which the exchangeable pair and size bias (but not zero bias) are special cases:

\[ E[G f(W') - G f(W)] = E[W f(W)]. \]

Paulin, Mackey and Tropp (2012, 2013) extend the exchangeable pair method to random matrices.

Exchangeable Pair Couplings

Let $(X, X')$ be exchangeable, with $F(X, X') = -F(X', X)$,

\[ E[F(X, X') \mid X] = f(X) \quad \text{and} \quad \frac{1}{2} E\left[ \left| (f(X) - f(X')) F(X, X') \right| \,\middle|\, X \right] \le c. \]

Then $Y = f(X)$ satisfies

\[ P(|Y| \ge t) \le 2 \exp\left( -\frac{t^2}{2c} \right). \]

No independence assumption.
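For a concrete instance (my own toy example under the stated assumptions), take $Y = W$, the sum of $n$ independent $\pm 1$ signs, and the pair that resamples one uniformly chosen coordinate. Then $F(X, X') = n(W - W')$ is anti-symmetric with $E[F(X, X') \mid X] = W$, and $\frac{1}{2} E[|(W - W') F(X, X')| \mid X] = n =: c$, recovering the Hoeffding-type bound $P(|W| \ge t) \le 2\exp(-t^2/(2n))$:

```python
import numpy as np

# Exchangeable pair sketch for W = sum of n independent +/-1 signs;
# the constant works out to c = n, giving the Hoeffding-type bound.
rng = np.random.default_rng(1)
n, reps, t = 100, 200_000, 25
W = rng.choice([-1, 1], size=(reps, n)).sum(axis=1)
print(f"empirical {np.mean(np.abs(W) >= t):.5f} <= "
      f"bound {2 * np.exp(-t**2 / (2 * n)):.5f}")
```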

Curie Weiss Model

Consider a graph on $n$ vertices $V$ with symmetric neighborhoods $N_j$, Hamiltonian

\[ H_h(\sigma) = -\frac{1}{2n} \sum_{j \in V} \sum_{k \in N_j} \sigma_j \sigma_k - h \sum_{i \in V} \sigma_i, \]

and the measure on spins $\sigma = (\sigma_i)_{i \in V}$, $\sigma_i \in \{-1, 1\}$,

\[ p_{\beta,h}(\sigma) = Z_{\beta,h}^{-1} e^{-\beta H_h(\sigma)}. \]

Interested in the average net magnetization

\[ m = \frac{1}{n} \sum_{i \in V} \sigma_i. \]

Consider the complete graph.

Curie Weiss Concentration

Choose a vertex $V$ uniformly and sample $\sigma_V'$ from the conditional distribution of $\sigma_V$ given $\sigma_j$, $j \in N_V$. This yields an exchangeable pair allowing the result above to imply, taking $h = 0$ for simplicity,

\[ P\left( \left| m - \tanh(\beta m) \right| \ge \frac{\beta}{n} + t \right) \le 2 e^{-nt^2/(4+4\beta)}. \]

The magnetization $m$ is concentrated about the roots of the equation $x = \tanh(\beta x)$.
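The following Glauber dynamics sketch (my construction: complete graph, $h = 0$, and a fixed burn-in assumed long enough to approximate the Gibbs measure) illustrates the concentration for $\beta > 1$, where $m$ settles near a nonzero root of $x = \tanh(\beta x)$:

```python
import numpy as np

# Glauber dynamics for the complete-graph Curie-Weiss model at h = 0.
rng = np.random.default_rng(2)
n, beta, sweeps = 500, 1.5, 200

sigma = rng.choice([-1, 1], size=n)
S = sigma.sum()
for _ in range(sweeps * n):
    i = rng.integers(n)
    local = beta * (S - sigma[i]) / n          # conditional field at site i
    new = 1 if rng.random() < 1 / (1 + np.exp(-2 * local)) else -1
    S += new - sigma[i]
    sigma[i] = new

m = S / n
print(f"m = {m:+.3f}, |m - tanh(beta*m)| = {abs(m - np.tanh(beta * m)):.4f}")
```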

Size Bias Couplings

For a nonnegative random variable $Y$ with finite nonzero mean $\mu$, we say that $Y^s$ has the $Y$-size bias distribution if

\[ E[Y g(Y)] = \mu E[g(Y^s)] \quad \text{for all } g. \]

Size biasing may appear, undesirably, in sampling. For sums of independent variables, size biasing a single summand size biases the sum. The closeness of a coupling of a sum $Y$ to $Y^s$ is a type of perturbation that measures the dependence in the summands of $Y$. If $X$ is a nontrivial indicator variable then $X^s = 1$.
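A standard concrete case (a numerical check, not from the talk): if $Y \sim \mathrm{Poisson}(\mu)$ then $Y^s = Y + 1$ has the $Y$-size bias distribution, so the identity can be verified by simulation, and the coupling is bounded with $c = 1$:

```python
import numpy as np

# Check E[Y g(Y)] = mu * E[g(Y + 1)] for Y ~ Poisson(mu).
rng = np.random.default_rng(3)
mu, reps = 3.0, 1_000_000
Y = rng.poisson(mu, size=reps)
g = lambda y: np.sin(y)               # any bounded test function
print(f"E[Y g(Y)]    = {np.mean(Y * g(Y)):.4f}")
print(f"mu E[g(Y+1)] = {mu * np.mean(g(Y + 1)):.4f}")
```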

Bounded Size Bias Coupling implies Concentration

Let $Y$ be a nonnegative random variable with finite positive mean $\mu$. Suppose there exists a coupling of $Y$ to a variable $Y^s$ having the $Y$-size bias distribution that satisfies $Y^s \le Y + c$ for some $c > 0$ with probability one. Then

\[ P(Y - \mu \ge t) \le b(t; \mu, c) \ \text{for all } t \ge 0, \quad \text{and} \quad P(Y - \mu \le -t) \le b(-t; \mu, c) \ \text{for all } 0 \le t \le \mu, \]

where

\[ b(t; \mu, c) = \left( \frac{\mu}{\mu + t} \right)^{(t+\mu)/c} e^{t/c}. \]

Ghosh and Goldstein (2011); improvement by Arratia and Baxendale (2013). Poisson behavior: rate $\exp(-t \log t)$ as $t \to \infty$.
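Since the Poisson($\mu$) coupling above has $c = 1$, its upper tail must sit below $b(t; \mu, 1)$; indeed $b(t; \mu, 1)$ is exactly the classical Poisson Chernoff bound. A quick check (a sketch, using scipy for the exact tail):

```python
import numpy as np
from scipy import stats

def b(t, mu, c):
    return (mu / (mu + t)) ** ((t + mu) / c) * np.exp(t / c)

mu = 5.0
for t in [2.0, 5.0, 10.0]:
    tail = stats.poisson.sf(mu + t - 1, mu)   # P(Y >= mu + t); mu + t integral here
    print(f"t={t:4.1f}: P(Y - mu >= t) = {tail:.3e} <= b(t) = {b(t, mu, 1.0):.3e}")
```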

Bounded Coupling Concentration Inequality

For the right tail, say, using that for $x \ge 0$ the function $h(x) = (1+x)\log(1+x) - x$ obeys the bound $h(x) \ge \frac{x^2}{2(1+x/3)}$, we have

\[ P(Y - \mu \ge t) \le \exp\left( -\frac{t^2}{2c(\mu + t/3)} \right). \]

Proof of Upper Tail Bound

For $\theta \ge 0$,

\[ e^{\theta Y^s} = e^{\theta(Y + Y^s - Y)} \le e^{c\theta} e^{\theta Y}. \tag{1} \]

With $m_{Y^s}(\theta) = E e^{\theta Y^s}$, and similarly for $m_Y(\theta)$,

\[ \mu m_{Y^s}(\theta) = \mu E e^{\theta Y^s} = E[Y e^{\theta Y}] = m_Y'(\theta), \]

so multiplying (1) by $\mu$ and taking expectations yields

\[ m_Y'(\theta) \le \mu e^{c\theta} m_Y(\theta). \]

Integration yields

\[ m_Y(\theta) \le \exp\left( \frac{\mu}{c} \left( e^{c\theta} - 1 \right) \right), \]

and the bound is obtained upon choosing $\theta = \log(t/\mu)/c$ in

\[ P(Y \ge t) = P(e^{\theta Y - \theta t} \ge 1) \le e^{-\theta t + \frac{\mu}{c}(e^{c\theta} - 1)}. \]
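Carrying out the final substitution explicitly (a worked step, for completeness): with $\theta = \log(t/\mu)/c$ for $t > \mu$,

\[ P(Y \ge t) \le \exp\left( -\frac{t}{c} \log\frac{t}{\mu} + \frac{\mu}{c}\left( \frac{t}{\mu} - 1 \right) \right) = \left( \frac{\mu}{t} \right)^{t/c} e^{(t-\mu)/c} = b(t - \mu; \mu, c). \]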

Size Biasing Sum of Exchangeable Indicators

Suppose $X$ is a sum of nontrivial exchangeable indicator variables $X_1, \ldots, X_n$, and that for $i \in \{1, \ldots, n\}$ the variables $X_1^i, \ldots, X_n^i$ have joint distribution

\[ \mathcal{L}(X_1^i, \ldots, X_n^i) = \mathcal{L}(X_1, \ldots, X_n \mid X_i = 1). \]

Then

\[ X^i = \sum_{j=1}^n X_j^i \]

has the $X$-size bias distribution $X^s$, as does the mixture $X^I$ when $I$ is a random index with values in $\{1, \ldots, n\}$, independent of all other variables. In more generality, pick the index $I$ with probability $P(I = i)$ proportional to $EX_i$.
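For i.i.d. Bernoulli($p$) indicators (independent, hence exchangeable), conditioning on $X_i = 1$ simply sets coordinate $i$ to one, so $X^s = X - X_I + 1$ for a uniform index $I$. A simulation sketch checking the size bias identity:

```python
import numpy as np

# Size bias a sum of i.i.d. Bernoulli(p) indicators: X^s = X - X_I + 1.
rng = np.random.default_rng(4)
n, p, reps = 20, 0.3, 500_000
X = rng.random((reps, n)) < p
S = X.sum(axis=1)
I = rng.integers(n, size=reps)
Ss = S - X[np.arange(reps), I] + 1        # the size-biased sum; note Ss <= S + 1
g = lambda s: np.exp(-s / 5)              # any test function
mu = n * p
print(f"E[X g(X)]    = {np.mean(S * g(S)):.4f}")
print(f"mu E[g(X^s)] = {mu * np.mean(g(Ss)):.4f}")
```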

Applications

1. The number of local maxima of a random function on a graph
2. The number of vertices in an Erdős-Rényi graph exceeding pre-set thresholds
3. The d-way covered volume of a collection of m balls placed uniformly over a volume m subset of $\mathbb{R}^p$

Local Maxima on Graphs

Let $G = (V, E)$ be a given graph, and for every $v \in V$ let $V_v \subseteq V$ be the neighbors of $v$, with $v \in V_v$. Let $\{C_g, g \in V\}$ be a collection of independent and identically distributed continuous random variables, and let $X_v$ be the indicator that vertex $v$ corresponds to a local maximum value with respect to the neighborhood $V_v$, that is,

\[ X_v(C_w, w \in V_v) = \mathbf{1}\left( C_v > C_w \text{ for all } w \in V_v \setminus \{v\} \right), \quad v \in V. \]

The sum

\[ Y = \sum_{v \in V} X_v \]

is the number of local maxima on $G$.
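A concrete instance (my choice of graph, for illustration): on the cycle $C_n$, where $V_v = \{v-1, v, v+1\}$, each vertex is a local maximum with probability $1/3$, so $E[Y] = n/3$:

```python
import numpy as np

# Count local maxima of i.i.d. continuous values on the cycle graph C_n.
rng = np.random.default_rng(5)
n, reps = 100, 100_000
C = rng.random((reps, n))
is_max = (C > np.roll(C, 1, axis=1)) & (C > np.roll(C, -1, axis=1))
Y = is_max.sum(axis=1)
print(f"mean Y = {Y.mean():.2f} (theory {n/3:.2f}), sd = {Y.std():.2f}")
```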

Size Biasing $\{X_v, v \in V\}$

If $X_v = 1$, that is, if $v$ is already a local maximum, let $X^v = X$. Otherwise, interchange the value $C_v$ at $v$ with the value $C_w$ at the vertex $w$ that achieves the maximum of $C_u$ over $u \in V_v$, and let $X^v$ be the indicators of local maxima on this new configuration. Then $Y^s$, the number of local maxima on $X^I$, where $I$ is chosen proportional to $EX_v$, has the $Y$-size bias distribution. We have $Y^s \le Y + c$ where

\[ c = \max_{v \in V} \max_{w \in V_v} |V_w|. \]

Self Bounding and Configuration Functions

The collection of sets $\Pi_k \subseteq \Omega^k$, $k = 0, \ldots, n$, is hereditary if $(x_1, \ldots, x_k) \in \Pi_k$ implies $(x_{i_1}, \ldots, x_{i_j}) \in \Pi_j$ for any $1 \le i_1 < \cdots < i_j \le k$. Let $f : \Omega^n \to \mathbb{R}$ be the function that assigns to $x \in \Omega^n$ the size $k$ of the largest subsequence of $x$ that lies in $\Pi_k$. With $f_i(x)$ the function $f$ evaluated on $x$ after removing its $i$-th coordinate, we have

\[ 0 \le f(x) - f_i(x) \le 1 \quad \text{and} \quad \sum_{i=1}^n \left( f(x) - f_i(x) \right) \le f(x), \]

as removing a single coordinate from $x$ reduces $f$ by at most one, and there are at most $f(x) = k$ important coordinates. Hence configuration functions are $(a, b) = (1, 0)$ self bounding.

Self Bounding Functions

The number of local maxima is a configuration function, with $(x_{i_1}, \ldots, x_{i_j}) \in \Pi_j$ when the vertices indexed by $i_1, \ldots, i_j$ are local maxima; hence the number of local maxima $Y$ is a self bounding function. Hence $Y$ satisfies the concentration bound

\[ P(Y - E[Y] \ge t) \le \exp\left( -\frac{t^2}{2(E[Y] + t/3)} \right). \]

The size bias bound is of Poisson type, with tail rate $\exp(-t \log t)$.

Multinomial Occupancy Models

Let $M_\alpha$ be the degree of vertex $\alpha \in [m]$ in an Erdős-Rényi random graph. Then

\[ Y_{ge} = \sum_{\alpha \in [m]} \mathbf{1}(M_\alpha \ge d_\alpha) \]

obeys the concentration bound $b(t; \mu, c)$ with $c = \sup_{\alpha \in [m]} d_\alpha + 1$. Unbounded couplings can be constructed more easily than bounded ones, for instance by giving the chosen vertex $\alpha$ a number of edges drawn from the conditional distribution given $M_\alpha \ge d_\alpha$. A coupling bounded by $\sup_{\alpha \in [m]} d_\alpha$ may be constructed by adding edges, or not, sequentially, to the chosen vertex, with probabilities depending on its degree. Degree distributions are log concave.
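A simulation sketch of $Y_{ge}$ (an illustrative special case of my own, with a common threshold $d_\alpha = d$ for all vertices):

```python
import numpy as np

# Y_ge for G(m, p): count vertices of degree at least d.
rng = np.random.default_rng(6)
m, p, d, reps = 200, 0.05, 15, 2_000

Y = np.empty(reps)
for r in range(reps):
    U = rng.random((m, m)) < p
    A = np.triu(U, 1)                      # upper triangle: independent edges
    deg = A.sum(axis=0) + A.sum(axis=1)    # degree of each vertex
    Y[r] = np.sum(deg >= d)
print(f"E[Y_ge] ~ {Y.mean():.2f}, sd {Y.std():.2f}; "
      f"coupling constant c = d + 1 = {d + 1}")
```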

Multinomial Occupancy Models

Similar remarks apply to

\[ Y_{ne} = \sum_{\alpha \in [m]} \mathbf{1}(M_\alpha \le d_\alpha). \]

For some models, not here but e.g. multinomial urn occupancy, the indicators of $Y_{ge}$ are negatively associated, though not those of $Y_{ne}$.

The d-way Covered Volume of m Balls in $\mathbb{R}^p$

Let $X_1, \ldots, X_m$ be uniform and independent over the torus $C_n = [0, n^{1/p})^p$, with unit balls $B_1, \ldots, B_m$ placed at these centers. Then deviations of $t$ or more from the mean of the $d$-way covered volume

\[ V_d = \mathrm{Vol}\left( \bigcup_{r \subseteq [m],\, |r| = d} \ \bigcap_{\alpha \in r} B_\alpha \right) \]

are bounded by $b(t; \mu, c)$ with $c = d\pi_p$.

Zero Bias Coupling

For a mean zero, variance $\sigma^2$ random variable $Y$, we say $Y^*$ has the $Y$-zero bias distribution when

\[ E[Y f(Y)] = \sigma^2 E[f'(Y^*)] \quad \text{for all smooth } f. \]

Restatement of Stein's lemma: $Y$ is normal if and only if $Y^* =_d Y$. If $Y$ and $Y^*$ can be coupled on the same space such that $|Y^* - Y| \le c$ a.s., then under a mild MGF assumption

\[ P(Y \ge t) \le \exp\left( -\frac{t^2}{2(\sigma^2 + ct)} \right), \]

and with $4\sigma^2 + Ct$ in the denominator under similar conditions.
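A standard example of the zero bias identity (not from the talk): if $Y$ is uniform on $\{-1, +1\}$, with mean zero and variance one, then $Y^*$ is Uniform$(-1, 1)$:

```python
import numpy as np

# Check E[Y f(Y)] = sigma^2 E[f'(Y*)] for f(y) = y^3, so f'(y) = 3 y^2.
rng = np.random.default_rng(7)
reps = 1_000_000
Y = rng.choice([-1.0, 1.0], size=reps)
Ystar = rng.uniform(-1.0, 1.0, size=reps)     # the zero bias distribution of Y
print(f"E[Y f(Y)]       = {np.mean(Y * Y**3):.4f}")      # equals E[Y^4] = 1
print(f"s^2 E[f'(Y*)]   = {np.mean(3 * Ystar**2):.4f}")  # sigma^2 = 1
```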

Combinatorial CLT

Zero bias coupling can produce bounds for Hoeffding's statistic $Y = \sum_{i=1}^n a_{i\pi(i)}$ when $\pi$ is chosen uniformly over the symmetric group $S_n$, and when its distribution is constant over cycle type. Permutations $\pi$ chosen uniformly from the involutions, $\pi^2 = \mathrm{id}$, without fixed points arise in matched pairs experiments.
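A simulation sketch of Hoeffding's statistic (my own illustration), confirming the mean $\mu_A = \frac{1}{n} \sum_{i,j} a_{ij}$:

```python
import numpy as np

# Y = sum_i a_{i, pi(i)} for a uniform random permutation pi.
rng = np.random.default_rng(8)
n, reps = 50, 100_000
a = rng.random((n, n))
rows = np.arange(n)
Y = np.array([a[rows, rng.permutation(n)].sum() for _ in range(reps)])
mu_A = a.sum() / n                      # E[Y] = (1/n) sum_{ij} a_{ij}
print(f"simulated mean {Y.mean():.3f} vs mu_A {mu_A:.3f}, sd {Y.std():.3f}")
```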

Combinatorial CLT, Exchangeable Pair Coupling

Under the assumption that $0 \le a_{ij} \le 1$, using the exchangeable pair Chatterjee produces the bound

\[ P(|Y - \mu_A| \ge t) \le 2 \exp\left( -\frac{t^2}{4\mu_A + 2t} \right), \]

while under this condition the zero bias bound gives

\[ P(|Y - \mu_A| \ge t) \le 2 \exp\left( -\frac{t^2}{2\sigma_A^2 + 16t} \right), \]

which is smaller whenever $t \le (2\mu_A - \sigma_A^2)/7$, holding asymptotically everywhere if the $a_{ij}$ are i.i.d., say, as then $E\sigma_A^2 < E\mu_A$.

Matrix Concentration Inequalities

Applications in high dimensional statistics: variable selection and the matrix completion problem.

Matrix Concentration Inequalities

Paulin, Mackey and Tropp: Stein pair kernel coupling. Take $(Z, Z')$ exchangeable and $X \in \mathbb{H}^{d \times d}$ such that $X = \phi(Z)$ and $X' = \phi(Z')$, and an anti-symmetric kernel function $K$ such that

\[ E[K(Z, Z') \mid Z] = X. \]

With

\[ V_X = \frac{1}{2} E[(X - X')^2 \mid Z] \quad \text{and} \quad V_K = \frac{1}{2} E[K(Z, Z')^2 \mid Z], \]

if there exist $s, c, v$ such that $V_X \preceq s^{-1}(cX + vI)$ and $V_K \preceq s(cX + vI)$, then one has bounds such as

\[ P(\lambda_{\max}(X) \ge t) \le d \exp\left( -\frac{t^2}{2v + 2ct} \right). \]

Matrix Concentration by Size Bias

For $X$ a non-negative random variable with finite mean, we say $X^s$ has the $X$-size bias distribution when

\[ E[X f(X)] = E[X] E[f(X^s)]. \]

Matrix Concentration by Size Bias

For $X$ a positive definite random matrix with finite mean, we say $X^s$ has the $X$-size bias distribution when

\[ \mathrm{tr}\left( E[X f(X)] \right) = \mathrm{tr}\left( E[X] E[f(X^s)] \right). \]

For a product $X = \gamma A$ with $\gamma$ a non-negative scalar random variable and $A$ a fixed positive definite matrix, $X^s = \gamma^s A$.
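A numerical check of the trace identity in the product case (my construction: $\gamma \sim \mathrm{Poisson}(\mu)$, so $\gamma^s = \gamma + 1$, and $f(M) = M^2$):

```python
import numpy as np

# X = gamma * A with gamma ~ Poisson(mu); X^s = (gamma + 1) A.
# Verify tr(E[X f(X)]) = tr(E[X] E[f(X^s)]) for f(M) = M @ M.
rng = np.random.default_rng(9)
mu, reps = 2.0, 1_000_000
A = np.array([[2.0, 0.5], [0.5, 1.0]])                # fixed positive definite
g = rng.poisson(mu, size=reps).astype(float)
lhs = np.trace(A @ A @ A) * np.mean(g**3)             # tr E[X f(X)]
rhs = np.trace(mu * A @ A @ A) * np.mean((g + 1)**2)  # tr(E[X] E[f(X^s)])
print(f"lhs = {lhs:.1f}, rhs = {rhs:.1f}")
```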

Size Bias Matrix Concentration

If $X = \sum_{k=1}^n Y_k$ with $Y_1, \ldots, Y_n$ independent, then

\[ \mathrm{tr}\, E[X f(X)] = \sum_{k=1}^n \mathrm{tr}\, E[Y_k f(X)] = \sum_{k=1}^n \mathrm{tr}\left( E[Y_k] E[f(X^{(k)})] \right). \]

One may bound this by

\[ \sum_{k=1}^n \lambda_{\max}(E[Y_k]) \, \mathrm{tr}\, E[f(X^{(k)})], \]

but doing so will produce a constant in the bound of value $\sum_{k=1}^n \lambda_{\max}(EY_k)$ rather than $\lambda_{\max}(EX)$.

Summary

Concentration of measure results can provide exponential tail bounds on complicated distributions. Most concentration of measure results require independence. Size bias and zero bias couplings, or perturbations, measure departures from independence. Bounded couplings imply concentration of measure (and central limit behavior). Unbounded couplings can also be handled under special conditions, e.g., the number of isolated vertices in the Erdős-Rényi random graph (Ghosh, Goldstein and Raič).