Stein's Method Applied to Some Statistical Problems
1 Stein's Method Applied to Some Statistical Problems
Jay Bartroff, Borchard Colloquium 2017
Jay Bartroff (USC) Stein's for Stats 4.Jul.17 1 / 36
2 Outline of this talk
1. Stein's Method
2. Bounds to the normal for group sequential statistics with covariates: produce explicit distributional bounds to the limiting normal distribution for the repeatedly computed MLE (θ̂_1, θ̂_2, ..., θ̂_K) of a parameter vector θ ∈ R^p in group sequential trials
3. Concentration inequalities for occupancy models with log-concave marginals: how to get bounded, size biased couplings for certain multivariate occupancy models, then use these to get concentration inequalities
Joint work with Larry Goldstein (USC) and Ümit Işlak (Boğaziçi University, Istanbul)
6 Charles Stein
Stein's paradox: X is inadmissible for θ in N_p(θ, I) for p ≥ 3
James-Stein shrinkage estimator
Stein's unbiased risk estimator for MSE
Stein's Lemma (1): covariance estimation
Stein's Lemma (2): sequential sample size tails
Stein's method for distributional approximation
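As a quick illustration of the shrinkage idea, here is a minimal sketch of the (positive-part) James-Stein estimator for a single observation X ~ N_p(θ, σ²I); the function name and the assumption that σ² is known are mine, not from the talk.

```python
def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein shrinkage estimate of theta from a single
    observation x ~ N_p(theta, sigma2 * I), p >= 3.  Shrinks x toward the
    origin by a data-dependent factor; dominates the MLE x in total MSE."""
    p = len(x)
    norm2 = sum(xi * xi for xi in x)
    # shrinkage factor 1 - (p-2)*sigma2/||x||^2, truncated at 0
    shrink = max(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return [shrink * xi for xi in x]

print(james_stein([2.0, -1.0, 3.0, 0.5]))
```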
7 Application 1: Group sequential analysis
9 But there are other books on this subject...
10 Setup
Response Y_i ∈ R of the ith patient depends on a known covariate vector x_i and an unknown parameter vector θ ∈ R^p
Primary goal: test a null hypothesis about θ, e.g.,
H_0: θ = 0
H_0: θ_j ≤ 0
H_0: a^t θ = b, for some vector a and scalar b
Secondary goals: compute p-values or confidence regions for θ at the end of the study
13 Setup: Group sequential analysis
For efficiency, ethical, practical, and financial reasons, group sequential analysis has become the standard in trials
A group sequential trial with at most K groups:
Group 1: Y_1, ..., Y_{n_1}
Group 2: Y_{n_1+1}, ..., Y_{n_2}
...
Group K: Y_{n_{K-1}+1}, ..., Y_{n_K}
Group sequential has been the dominant format for clinical trials since the Beta-Blocker Heart Attack Trial (BHAT, JAMA '82):
Randomized trial of propranolol for heart attack survivors
3837 patients randomized
Started June 1978, planned as a 4-year study, terminated 8 months early due to the observed benefit of propranolol
16 Setup: Group sequential analysis
Stopping rule related to H_0; likelihood ratio, t-, F-, and χ²-tests are common
Of the form: stop and reject H_0 at stage
min{k ≤ K : T(Y_1, ..., Y_{n_k}) ≥ C_k}
for some statistic T(Y_1, ..., Y_{n_k}), often a function of the MLE θ̂_k = θ̂_k(Y_1, ..., Y_{n_k})
The joint distribution of θ̂_1, θ̂_2, ..., θ̂_K is needed to:
choose the critical values C_k
compute a p-value at the end of the study
give a confidence region for θ at the end of the study
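A stopping rule of this form can be sketched numerically. In this toy simulation the statistic T_k is a z-statistic on mean-zero data and the constant (Pocock-style) boundary value is purely illustrative, not from the talk.

```python
import math
import random

def gs_stop(T_stats, C):
    """First stage k (1-indexed) with |T_k| >= C_k, else None
    (i.e., continue through the final analysis without rejecting)."""
    for k, (t, c) in enumerate(zip(T_stats, C), start=1):
        if abs(t) >= c:
            return k
    return None

# Simulate K = 3 looks at a z-statistic under H_0 (mean-zero data)
random.seed(1)
n = [20, 40, 60]                                # cumulative sample sizes n_1, n_2, n_3
y = [random.gauss(0, 1) for _ in range(n[-1])]
T = [sum(y[:nk]) / math.sqrt(nk) for nk in n]   # T_k = S_{n_k} / sqrt(n_k)
C = [2.29, 2.29, 2.29]                          # illustrative constant boundary
print(gs_stop(T, C))
```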
19 Background: Group sequential analysis
Jennison & Turnbull (JASA '97): asymptotic multivariate normal distribution of (θ̂_1, θ̂_2, ..., θ̂_K) in a regression setup Y_i ~ind f_i(Y_i; x_i, θ), f_i "nice"
Asymptotics: n_k − n_{k−1} → ∞ for all k, K fixed
E(θ̂_k) ≈ θ
Independent increments: Cov(θ̂_{k_1}, θ̂_{k_2}) = Var(θ̂_{k_2}) for any k_1 ≤ k_2
Folk Theorem: the normal limit was widely (over-)used (software packages, etc.) before the Jennison & Turnbull paper
Commonly heard: "Once n is 5 or so the normal limit kicks in!"
22 Background: Group sequential analysis
Jennison & Turnbull (JASA '97), independent increments:
Suppose Cov(θ̂_{k_1}, θ̂_{k_2}) = Var(θ̂_{k_2}) for any k_1 ≤ k_2. Then for
H_0: a^t θ = 0,  T_k = (a^t θ̂_k) I_k,  I_k = [Var(a^t θ̂_k)]^{−1},
Cov(T_{k_1}, T_{k_2}) = I_{k_1} I_{k_2} a^t Cov(θ̂_{k_1}, θ̂_{k_2}) a
= I_{k_1} I_{k_2} a^t Var(θ̂_{k_2}) a
= I_{k_1} I_{k_2} Var(a^t θ̂_{k_2})
= I_{k_1} I_{k_2} I_{k_2}^{−1}
= I_{k_1} = Var(T_{k_1})
⇒ Cov(T_{k_1}, T_{k_2} − T_{k_1}) = 0
⇒ T_1, T_2 − T_1, ..., T_K − T_{K−1} are asymptotically independent normals
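The independent-increments identity can be checked by simulation in the simplest case, where θ̂_k is the sample mean of the first n_k normal observations and T_k reduces to a cumulative sum; this toy Monte Carlo setup is mine, not from the talk.

```python
import random

random.seed(0)
n1, n2, sigma2, reps = 50, 100, 1.0, 20000
T1s, T2s = [], []
for _ in range(reps):
    y = [random.gauss(0, 1) for _ in range(n2)]
    th1 = sum(y[:n1]) / n1          # MLE (sample mean) after group 1
    th2 = sum(y) / n2               # MLE after group 2
    T1s.append(th1 * n1 / sigma2)   # T_k = theta_hat_k * I_k, I_k = n_k / sigma^2
    T2s.append(th2 * n2 / sigma2)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

# Independent increments: Cov(T_1, T_2 - T_1) ~ 0, while Var(T_1) = I_1 = 50
inc = [t2 - t1 for t1, t2 in zip(T1s, T2s)]
print(round(cov(T1s, inc), 2), round(cov(T1s, T1s), 1))
```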
26 New Contributions
Extension to the group sequential setting of a Berry-Esseen bound for the multivariate normal limit, for smooth functions
Anastasiou & Reinert '17: Bounds with explicit constants for bounded Wasserstein distance for the scalar MLE (K = 1 analysis), Bernoulli.
Anastasiou & Ley '17: Bounds for the asymptotic normality of the maximum likelihood estimator using the Delta method, ALEA.
Anastasiou '17: Bounds for the normal approximation of the maximum likelihood estimator from m-dependent random variables, Statistics & Probability Letters.
Anastasiou & Gaunt '16+: Multivariate normal approximation of the maximum likelihood estimator via the delta method, ArXiv:
Anastasiou '15+: Assessing the multivariate normal approximation of the maximum likelihood estimator from high-dimensional, heterogeneous data, ArXiv:
29 New Contributions, Cont'd
Relaxing the independence assumption: assume the log-likelihood of Y^(k) := (Y_{n_{k−1}+1}, ..., Y_{n_k}) is of the form
Σ_{i ∈ G_k} log f_i(Y_i, θ) + g_k(Y^(k), θ)
for well-behaved functions f_i, g_k
g_k ≡ 0 gives Jennison & Turnbull's independent setting
Some generalized linear mixed models (GLMMs) with random stage effect U_k take this form
U_k = effect due to lab, monitoring board, cohort, etc.
Penalized quasi-likelihood (Breslow & Clayton, JASA '93)
33 GLMM Example: Poisson regression
Let f_μ = Po(μ) density. For Y_i in the kth stage,
Y_i | U_k ~ind f_{μ_i} where μ_i = exp(β^t x_i + U_k),  {U_k} iid ~ h_λ,  θ = (β, λ).
Then the log-likelihood is
Σ_{k=1}^K log ∫ Π_{i ∈ G_k} f_{μ_i}(Y_i) h_λ(u_k) du_k = Σ_{k=1}^K [ Σ_{i ∈ G_k} log f_{μ̄_i}(Y_i) + g_k(Y^(k), θ) ]
where μ̄_i = exp(β^t x_i).
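The integral over the random stage effect can be made concrete in a scalar toy version. Assuming U_k ~ N(0, λ) and a single covariate (both assumptions mine, as are the function names), the marginal log-likelihood of one group can be computed by brute-force quadrature; production code would use adaptive Gauss-Hermite quadrature instead.

```python
import math

def group_loglik(y, x, beta, lam, grid=200, width=6.0):
    """Marginal log-likelihood of one group in a scalar Poisson GLMM:
    Y_i | U ~ Poisson(exp(beta * x_i + U)), U ~ N(0, lam),
    integrating U out on a midpoint grid covering +/- width std devs."""
    sd = math.sqrt(lam)
    du = 2 * width * sd / grid
    total = 0.0
    for j in range(grid):
        u = -width * sd + (j + 0.5) * du
        # log of N(0, lam) density at u
        log_joint = -u * u / (2 * lam) - math.log(sd * math.sqrt(2 * math.pi))
        # plus the conditional Poisson log-likelihood given U = u
        for yi, xi in zip(y, x):
            mu = math.exp(beta * xi + u)
            log_joint += yi * math.log(mu) - mu - math.lgamma(yi + 1)
        total += math.exp(log_joint) * du
    return math.log(total)

print(round(group_loglik([1, 2], [0.0, 0.5], 0.3, 1.0), 4))
```

As lam → 0 the random effect vanishes and this approaches the independent Poisson log-likelihood with μ̄_i = exp(β x_i).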
37 Stein's Method for MVN Approximation
Generator approach: Barbour '90, Götze '91
Size biasing: Goldstein & Rinott '96, Rinott & Rotar '96
Zero biasing: Goldstein & Reinert '05
Exchangeable pair: Chatterjee & Meckes '08, Reinert & Röllin '09
Stein couplings: Fang & Röllin '15
Theorem (Reinert & Röllin '09): If (W, W′) is an exchangeable pair in R^q with EW = 0, EWW^t = Σ positive definite, and E(W′ − W | W) = −ΛW + R with Λ invertible, then for any 3-times differentiable h : R^q → R,
|Eh(W) − Eh(Σ^{1/2} Z)| ≤ a|h|_2 + b|h|_3 + c(|h|_1 + (q/2)‖Σ‖^{1/2}|h|_2)
for certain explicit constants a, b, c.
39 Bounds to the normal for θ̂^K := (θ̂_1, θ̂_2, ..., θ̂_K)
Approach: apply the Reinert & Röllin '09 result with W = score function increments to get smooth-function bounds to the normal.
Result: In the group sequential setup above, if the Y_i are independent or follow GLMMs with the log-likelihood of the kth group data Y^(k) = (Y_{n_{k−1}+1}, ..., Y_{n_k}) of the form Σ_{i ∈ G_k} log f_i(Y_i, θ) + g_k(Y^(k), θ), then under regularity conditions on the f_i and g_k there are constants a, b, c, d such that
|Eh(J^{1/2}(θ̂^K − θ^K)) − Eh(Z)| ≤ aK²‖J^{−1/2}‖²|h|_2 + bK³‖J^{−1/2}‖³|h|_3 + cK‖J^{−1/2}‖(|h|_1 + (pK/2)‖Σ^{1/2}J^{−1/2}‖|h|_2) + d.
41 Comments on result
Bound:
|Eh(J^{1/2}(θ̂^K − θ^K)) − Eh(Z)| ≤ aK²‖J^{−1/2}‖²|h|_2 + bK³‖J^{−1/2}‖³|h|_3 + cK‖J^{−1/2}‖(|h|_1 + (pK/2)‖Σ^{1/2}J^{−1/2}‖|h|_2) + d.
The a, b, c terms come directly from the Reinert & Röllin '09 bound
The c term involves Var(R) in E(W′ − W | W) = −ΛW + R and vanishes in the independent case
The d term comes from Taylor series remainders
Rate O(1/√n_K) under the usual asymptotic n_k − n_{k−1} ∼ n_K γ_k, γ_k ∈ (0, 1)
43 Sketch of Proof — Independent Case
Score statistic: S_i(θ) = (∂/∂θ) log f_i(Y_i, θ) ∈ R^p,
W = (Σ_{i ∈ G_1} S_i(θ), ..., Σ_{i ∈ G_K} S_i(θ)) ∈ R^q, where q = pK.
Fisher information: J_i(θ) = −E[(∂/∂θ^t) S_i(θ)] ∈ R^{p×p},
J(θ_1, ..., θ_K) = diag(Σ_{i=1}^{n_1} J_i(θ_1), ..., Σ_{i=1}^{n_K} J_i(θ_K)) ∈ R^{q×q},
Σ := Var(W) = diag(Σ_{i ∈ G_1} J_i(θ), ..., Σ_{i ∈ G_K} J_i(θ)) ∈ R^{q×q}.
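For concreteness, in the Poisson regression example the per-observation score and information have closed forms (the helper names are mine); this small sketch shows the ingredients from which W and J are built.

```python
import math

def score_i(yi, xi, beta):
    """Score S_i(beta) = (y_i - mu_i) x_i with mu_i = exp(beta^t x_i):
    the gradient of the Poisson log-likelihood with log link."""
    mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
    return [(yi - mu) * x for x in xi]

def fisher_i(xi, beta):
    """Per-observation Fisher information J_i(beta) = mu_i x_i x_i^t."""
    mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
    return [[mu * a * b for b in xi] for a in xi]

# W stacks the groupwise sums of scores; e.g. for one group of two subjects:
group = [(3, [1.0, 2.0]), (0, [1.0, -1.0])]   # (y_i, x_i) pairs
beta = [0.0, 0.0]
W_block = [sum(s) for s in zip(*(score_i(y, x, beta) for y, x in group))]
print(W_block)  # -> [1.0, 5.0]
```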
46 Sketch of Proof: Exchangeable pair — Independent Case
1. Choose i ∈ {1, ..., n_K} uniformly, independent of all else
2. Replace Y_i by an independent copy Y_i′ (keeping x_i); call the result W′
(W, W′) exchangeable
(W, W′) satisfy the linearity condition E(W′ − W | W) = −n_K^{−1} W, which is easy to check entry-wise
50 Sketch of Proof: Relating θ̂^K to W — Independent Case
By a standard Taylor series, θ̂^K − θ^K = J(θ̃^K)^{−1} S, where
S = (Σ_{i=1}^{n_1} S_i(θ_1), ..., Σ_{i=1}^{n_K} S_i(θ_K)) ∈ R^q
and θ̃^K ∈ R^q lies on the line segment connecting θ̂^K and θ^K. Then
|Eh(J^{1/2}(θ̂^K − θ^K)) − Eh(Z)| ≤ |Eh(J^{−1/2}S) − Eh(Z)| + |Eh(J^{1/2}J(θ̃^K)^{−1}S) − Eh(J^{−1/2}S)|
52 Sketch of Proof: Relating θ̂^K to W — Independent Case
Using S = AW, where A is the block lower-triangular matrix
A = [ 1_p 0_p ... 0_p ; 1_p 1_p ... 0_p ; ... ; 1_p 1_p ... 1_p ]
with 1_p, 0_p ∈ R^{p×p} the identity and zero matrices:
The 1st term is |Eh(J^{−1/2}S) − Eh(Z)| = |E h̃(W) − E h̃(Σ^{1/2}Z)| where h̃(w) = h(J^{−1/2}Aw); then apply Reinert-Röllin and simplify.
The 2nd term is bounded by Taylor series arguments.
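The block matrix A can be sketched directly (helper names mine): it maps the vector W of groupwise score increments to the vector S of cumulative score sums.

```python
def block_cumsum_matrix(K, p):
    """A in R^{pK x pK}: block lower-triangular matrix of p x p identity
    blocks, so S = A W turns groupwise increments into cumulative sums."""
    q = p * K
    A = [[0.0] * q for _ in range(q)]
    for kr in range(K):           # block row
        for kc in range(kr + 1):  # block columns 0..kr get an identity block
            for j in range(p):
                A[kr * p + j][kc * p + j] = 1.0
    return A

def matvec(A, w):
    return [sum(a * x for a, x in zip(row, w)) for row in A]

# Example: K = 3 groups, p = 2 parameters; W stacks the group score sums
W = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
A = block_cumsum_matrix(3, 2)
print(matvec(A, W))  # coordinate-wise cumulative sums: [1, 2, 4, 6, 9, 12]
```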
54 Sketch of Proof: Exchangeable pair — GLMM Case
1. Choose i ∈ {1, ..., n_K} uniformly, independent of Y_1, ..., Y_{n_K}
2. If i is in the kth group, replace Y_i by an independent copy Y_i′ with mean φ(β^t x_i + U_k), φ^{−1} = link function (same covariates x_i, same group effect U_k); call the result W′
(W, W′) exchangeable
(W, W′) satisfy the linearity condition E(W′ − W | W) = −n_K^{−1} W + R, where R = R(g_1, ..., g_K)
59 Concentration and Coupling
60 Application 2: Concentration inequalities for occupancy models with log-concave marginals
Main idea: how to get bounded, size biased couplings for certain multivariate occupancy models, then use methods pioneered by Ghosh and Goldstein '11 to get concentration inequalities
Concentration inequalities, e.g.,
P(Y − μ ≥ t) ≤ exp(−t² / (2cμ + ct))
Widely used in:
high dimensional statistics
machine learning
random matrix theory
applications: wireless communications, physics, ... (see Raginsky & Sason '15)
62 Setup
Occupancy model M = (M_α); M_α may be
the degree count of vertex α in an Erdős-Rényi random graph
the # of balls in box α in a multinomial model
the # of balls of color α in a sample from an urn of colored balls
We consider statistics like
Y_ge = Σ_{α=1}^m 1{M_α ≥ d},  Y_eq = Σ_{α=1}^m 1{M_α = d}
and their weighted versions
Y_ge = Σ_{α=1}^m w_α 1{M_α ≥ d_α},  Y_eq = Σ_{α=1}^m w_α 1{M_α = d_α}
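In the multinomial (balls-in-boxes) case these statistics are straightforward to compute; a minimal unweighted sketch, with the parameter values chosen arbitrarily:

```python
import random

random.seed(7)
m, n, d = 10, 40, 3   # m boxes, n balls, occupancy threshold d
counts = [0] * m
for _ in range(n):                      # multinomial occupancy model:
    counts[random.randrange(m)] += 1    # each ball lands uniformly in one box

Y_ge = sum(1 for c in counts if c >= d)  # boxes with at least d balls
Y_eq = sum(1 for c in counts if c == d)  # boxes with exactly d balls
print(Y_ge, Y_eq)
```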
66 Some Methods for Concentration Inequalities
McDiarmid's bounded difference inequality: Y a function with bounded differences of independent inputs
Negative association: e.g., Dubhashi & Ranjan '98
Certifiable functions: Y a certifiable function of independent inputs; controlling a large enough subset of the inputs certifies the value of the function; e.g., McDiarmid & Reed '06
Bounded size bias couplings
70 Bounded Size Bias Couplings
If there is a coupling Y^s of Y with the Y-size bias distribution, i.e.,
E[Y f(Y)] = μ E[f(Y^s)] for all f,
and Y^s ≤ Y + c for some c > 0 with probability one, then
max{P(Y − μ ≤ −t), P(Y − μ ≥ t)} ≤ b_{μ,c}(t).
Ghosh & Goldstein '11: for all t > 0,
P(Y − μ ≤ −t) ≤ exp(−t² / (2cμ)),  P(Y − μ ≥ t) ≤ exp(−t² / (2cμ + ct))
b exponential as t → ∞.
Arratia & Baxendale '13:
b_{μ,c}(t) = exp(−(μ/c) h(t/μ)),  h(x) = (1 + x) log(1 + x) − x
b Poisson as t → ∞.
73 Main Result
M = (M_α)_{α∈[m]}, each M_α lattice log-concave;
  Y_ge = Σ_{α∈[m]} w_α 1{M_α ≥ d_α},  Y_ne = Σ_{α∈[m]} w_α 1{M_α ≠ d_α}.
Main Result (in words)
1. If M is bounded from below and can be closely coupled to a version M′ having the same distribution as M conditional on M′_α = M_α + 1, then there is a bounded size biased coupling Y^s_ge ≤ Y_ge + C and the above concentration inequalities hold.
2. If M is non-degenerate at (d_α) and can be closely coupled to a version M′ having the same distribution as M conditional on M′_α ≠ d_α, then there is a bounded size biased coupling Y^s_ne ≤ Y_ne + C and the above concentration inequalities hold.
76 Main Result: A few more details on Part 1
M = f(U), where U is some collection of random variables and f is measurable.
"Closely coupled" means: given U_k ~ L(V_k) := L(U | M_α ≥ k), there is a coupling U_k⁺ and a constant B such that
  L(U_k⁺ | U_k) = L(V_k | M⁺_{k,α} = M_{k,α} + 1)  and  Y⁺_{k,ge,−α} ≤ Y_{k,ge,−α} + B,
where Y_{k,ge,−α} = Σ_{β≠α} 1(M_{k,β} ≥ d_β).
The constant is C = w̄(B·d̄ + 1), where w̄ = max w_α and d̄ = max d_α.
Part 2 is similar.
80 Main Result: Main Ingredient in Proof
Incrementing Lemma: If M is lattice log-concave then there is π(x, d) ∈ [0, 1] such that if M′ ~ L(M | M ≥ d) and B | M′ ~ Bern(π(M′, d)), then M′ + B ~ L(M | M ≥ d + 1).
- Extension of Goldstein & Penrose '10 for M Binomial, d = 0
- Analogous versions for L(M | M ≤ d) → L(M | M ≤ d − 1) and L(M) → L(M | M ≠ d), where → means "coupled to"
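The lemma's hypothesis, lattice log-concavity, just means p_k² ≥ p_{k−1} p_{k+1} on an interval of integer support. A minimal check sketch (helper names are mine) verifies this for the Binomial case covered by Goldstein & Penrose '10 and shows a pmf that fails it:

```python
import math

def is_lattice_log_concave(pmf):
    """Check p_k^2 >= p_{k-1} p_{k+1} for a pmf given as a list on 0..n."""
    return all(
        pmf[k] ** 2 >= pmf[k - 1] * pmf[k + 1] - 1e-15
        for k in range(1, len(pmf) - 1)
    )

def binomial_pmf(n, p):
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

# Binomial is lattice log-concave, so the Incrementing Lemma applies to it.
assert is_lattice_log_concave(binomial_pmf(20, 0.3))
# A two-point mixture with a gap in its support is not log-concave.
assert not is_lattice_log_concave([0.5, 0.0, 0.5])
```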
82 Example 1: Erdős–Rényi random graph
- m vertices; independent edges, edge {α, β} present with probability p_{α,β} = p_{β,α} ∈ [0, 1)
Constructing U_k⁺ from U_k:
1. Select a non-neighbor β of α with probability proportional to p_{α,β}/(1 − p_{α,β})
2. Add the edge connecting β to α
This affects at most 1 other vertex, so B = 1 and Y^s_ge ≤ Y_ge + w̄(d̄ + 1).
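A small simulation sketch of the coupling step above (illustrative parameters; with constant edge probability p, the p/(1 − p) weights are uniform, so a uniform choice stands in for the weighted selection): adding one edge changes the degree of only one vertex besides α, which is exactly the B = 1 claim.

```python
import random

random.seed(1)
m, p, d = 30, 0.2, 3

# Sample an Erdos-Renyi graph as a set of sorted vertex pairs; deg[v] = degree of v.
edges = {(i, j) for i in range(m) for j in range(i + 1, m) if random.random() < p}
deg = [sum(v in e for e in edges) for v in range(m)]

def y_minus_alpha(degrees, alpha):
    """Y_{ge,-alpha}: number of vertices beta != alpha with degree >= d."""
    return sum(1 for v, dv in enumerate(degrees) if v != alpha and dv >= d)

alpha = 0
non_neighbors = [b for b in range(1, m) if (0, b) not in edges]
assert non_neighbors  # with p = 0.2, vertex 0 is almost surely not fully connected

beta = random.choice(non_neighbors)  # uniform stand-in for the p/(1-p) weighting
before = y_minus_alpha(deg, alpha)
deg[alpha] += 1
deg[beta] += 1
after = y_minus_alpha(deg, alpha)

# Only beta's degree changed among vertices other than alpha, so B = 1.
assert after <= before + 1
```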
85 Example 1: Erdős–Rényi random graph
Applying this with d_α = d and w_α = 1,
  P(Y_ge − µ ≤ −t) ≤ exp(−t²/(2(d + 1)µ)) ≤ exp(−t²/(2(d + 1)m)).
Compare with McDiarmid's bounded difference inequality: Y_ge = f(X_1, ..., X_{m(m−1)/2}) with X_i = 1{edge between vertex pair i} and
  sup_{X_i, X_i′} |f(X_1, ..., X_i, ..., X_{m(m−1)/2}) − f(X_1, ..., X_i′, ..., X_{m(m−1)/2})| ≤ 2,
so
  P(Y_ge − µ ≤ −t) ≤ exp(−t²/(4m(m − 1))).
The new bound is an improvement for m > 2d + 3.
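A quick arithmetic sanity check of the comparison above (helper names are mine): a smaller denominator in the exponent means a stronger bound, and comparing 2(d + 1)m against 4m(m − 1) directly, the size-bias bound already wins well before m = 2d + 3, so the slide's threshold is comfortably sufficient.

```python
def new_denom(m, d):
    """Exponent denominator 2(d+1)m from the size-bias bound (using mu <= m)."""
    return 2 * (d + 1) * m

def mcdiarmid_denom(m, d):
    """Exponent denominator 4m(m-1) from the bounded difference bound."""
    return 4 * m * (m - 1)

d = 5
wins = [m for m in range(3, 100) if new_denom(m, d) < mcdiarmid_denom(m, d)]
# The crossover is at or before the slide's threshold m > 2d + 3.
assert wins and min(wins) <= 2 * d + 3
```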
90 Example 2: Multinomial Counts
- n balls dropped independently into m boxes
- Applications in species trapping, linguistics, ...
- # empty boxes proved asymptotically normal by Weiss '58, Rényi '62 in the uniform case
- Englund '81: L∞ bound for # of empty cells, uniform case
- Dubhashi & Ranjan '98: concentration inequality via negative association
- Penrose '09: L∞ bound for # of isolated balls, uniform and non-uniform cases
- Bartroff & Goldstein '13: L∞ bound for all d ≥ 2, uniform case
97 Example 2: Multinomial Counts
p_{α,j} = probability that ball j ∈ [n] falls in box α ∈ [m];
  M_α = # balls in box α = Σ_{j∈[n]} 1{ball j falls in box α}.
Constructing U_k⁺ from U_k: choose a ball j not in box α with probability proportional to p_{α,j}/(1 − p_{α,j}) and add it to box α.
Then Y^s_{ge,−α} ≤ Y_{ge,−α}, so B = 0 and thus Y^s_ge ≤ Y_ge + w̄.
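With B = 0 the coupling constant is c = w̄ = 1 (unit weights), and the left-tail bound exp(−t²/(2cµ)) can be sanity-checked by Monte Carlo. A minimal sketch for the uniform case, with illustrative parameters of my choosing:

```python
import math
import random

random.seed(2)
n, m, d, trials = 200, 50, 2, 4000

def y_ge_sample():
    """One draw of Y_ge = # boxes holding at least d balls (uniform boxes, w = 1)."""
    counts = [0] * m
    for _ in range(n):
        counts[random.randrange(m)] += 1
    return sum(c >= d for c in counts)

samples = [y_ge_sample() for _ in range(trials)]
mu = sum(samples) / trials

# B = 0 gives c = w_bar = 1 in the left-tail bound exp(-t^2 / (2 c mu)).
t = 8.0
bound = math.exp(-t ** 2 / (2 * mu))
emp = sum(s - mu <= -t for s in samples) / trials
assert emp <= bound  # Monte Carlo sanity check, not a proof
```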
100 Example 3: Multivariate Hypergeometric Sampling
- Urn with n = Σ_{α∈[m]} n_α colored balls, n_α balls of color α
- Sample of size s drawn without replacement
- M_α = # balls in the sample of color α
- Applications in sampling (and subsampling) theory, gambling, coupon-collector problems
Constructing U_k⁺ from U_k: select a non-α-colored ball j in the sample with probability proportional to (n_{α(j)}/n)/(1 − n_{α(j)}/n) and replace it with an α-colored ball, where α(j) = color of ball j.
Then Y^s_{ge,−α} ≤ Y_{ge,−α}, so B = 0 and thus Y^s_ge ≤ Y_ge + w̄.
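As in Example 2, B = 0 gives c = w̄ = 1, and the resulting left-tail bound can be checked by simulation. A minimal sketch with an illustrative three-color urn of my choosing (not from the talk):

```python
import math
import random

random.seed(3)
# Urn: n_alpha balls of colors 0, 1, 2 (n = 100 total); sample s without replacement.
colors = [0] * 40 + [1] * 30 + [2] * 30
s, d, trials = 25, 5, 3000

def one_y_ge():
    draw = random.sample(colors, s)                # without-replacement sample
    counts = [draw.count(a) for a in range(3)]     # M_alpha for each color
    return sum(c >= d for c in counts)             # Y_ge with w_alpha = 1

samples = [one_y_ge() for _ in range(trials)]
mu = sum(samples) / trials

# B = 0 here too, so c = w_bar = 1 in the left-tail bound.
t = 2.0
bound = math.exp(-t ** 2 / (2 * mu))
emp = sum(y - mu <= -t for y in samples) / trials
assert emp <= bound  # Monte Carlo sanity check
```

Since the three counts sum to s = 25, some color always has at least 9 ≥ d balls, so Y_ge ≥ 1 in every draw.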
106 Summary
Stein's method applied to produce
- explicit bounds to the limiting normal distribution for the repeatedly computed MLE (θ̂_1, θ̂_2, ..., θ̂_K) of a parameter vector in group sequential trials
- concentration inequalities for a class of occupancy models with log-concave marginals
Many unanswered questions in statistics are possibly susceptible to Stein's method:
- concentration inequalities for heavy-tailed distributions
- convergence of empirical measures and dimension reduction methods: projections of empirical measures onto subspaces, high-dimensional PCA
- other problems in sequential analysis: sequentially stopped test statistics, stopping rules for high-dimensional MCMC
Thank You!
Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)
More informationMathematical statistics
October 1 st, 2018 Lecture 11: Sufficient statistic Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation
More informationEconomics 583: Econometric Theory I A Primer on Asymptotics
Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss
More information11 Survival Analysis and Empirical Likelihood
11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with
More informationMultilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2
Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do
More information4. Comparison of Two (K) Samples
4. Comparison of Two (K) Samples K=2 Problem: compare the survival distributions between two groups. E: comparing treatments on patients with a particular disease. Z: Treatment indicator, i.e. Z = 1 for
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More informationFerromagnets and the classical Heisenberg model. Kay Kirkpatrick, UIUC
Ferromagnets and the classical Heisenberg model Kay Kirkpatrick, UIUC Ferromagnets and the classical Heisenberg model: asymptotics for a mean-field phase transition Kay Kirkpatrick, Urbana-Champaign June
More informationMethods of evaluating estimators and best unbiased estimators Hamid R. Rabiee
Stochastic Processes Methods of evaluating estimators and best unbiased estimators Hamid R. Rabiee 1 Outline Methods of Mean Squared Error Bias and Unbiasedness Best Unbiased Estimators CR-Bound for variance
More informationSTA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources
STA 732: Inference Notes 10. Parameter Estimation from a Decision Theoretic Angle Other resources 1 Statistical rules, loss and risk We saw that a major focus of classical statistics is comparing various
More information5 Introduction to the Theory of Order Statistics and Rank Statistics
5 Introduction to the Theory of Order Statistics and Rank Statistics This section will contain a summary of important definitions and theorems that will be useful for understanding the theory of order
More information1.1 Basis of Statistical Decision Theory
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 1: Introduction Lecturer: Yihong Wu Scribe: AmirEmad Ghassami, Jan 21, 2016 [Ed. Jan 31] Outline: Introduction of
More informationHANDBOOK OF APPLICABLE MATHEMATICS
HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester
More informationComparing two independent samples
In many applications it is necessary to compare two competing methods (for example, to compare treatment effects of a standard drug and an experimental drug). To compare two methods from statistical point
More informationNetworks: Lectures 9 & 10 Random graphs
Networks: Lectures 9 & 10 Random graphs Heather A Harrington Mathematical Institute University of Oxford HT 2017 What you re in for Week 1: Introduction and basic concepts Week 2: Small worlds Week 3:
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationLecture 1: August 28
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random
More informationSTEIN S METHOD, SEMICIRCLE DISTRIBUTION, AND REDUCED DECOMPOSITIONS OF THE LONGEST ELEMENT IN THE SYMMETRIC GROUP
STIN S MTHOD, SMICIRCL DISTRIBUTION, AND RDUCD DCOMPOSITIONS OF TH LONGST LMNT IN TH SYMMTRIC GROUP JASON FULMAN AND LARRY GOLDSTIN Abstract. Consider a uniformly chosen random reduced decomposition of
More informationDistributed Statistical Estimation and Rates of Convergence in Normal Approximation
Distributed Statistical Estimation and Rates of Convergence in Normal Approximation Stas Minsker (joint with Nate Strawn) Department of Mathematics, USC July 3, 2017 Colloquium on Concentration inequalities,
More informationModel-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate
Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationStatistical Inference On the High-dimensional Gaussian Covarianc
Statistical Inference On the High-dimensional Gaussian Covariance Matrix Department of Mathematical Sciences, Clemson University June 6, 2011 Outline Introduction Problem Setup Statistical Inference High-Dimensional
More informationQualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf
Part 1: Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More information3.0.1 Multivariate version and tensor product of experiments
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 3: Minimax risk of GLM and four extensions Lecturer: Yihong Wu Scribe: Ashok Vardhan, Jan 28, 2016 [Ed. Mar 24]
More informationAdaptive Designs: Why, How and When?
Adaptive Designs: Why, How and When? Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj ISBS Conference Shanghai, July 2008 1 Adaptive designs:
More informationLawrence D. Brown* and Daniel McCarthy*
Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals
More informationA Few Special Distributions and Their Properties
A Few Special Distributions and Their Properties Econ 690 Purdue University Justin L. Tobias (Purdue) Distributional Catalog 1 / 20 Special Distributions and Their Associated Properties 1 Uniform Distribution
More informationPART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics
Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability
More informationFirst Year Examination Department of Statistics, University of Florida
First Year Examination Department of Statistics, University of Florida August 19, 010, 8:00 am - 1:00 noon Instructions: 1. You have four hours to answer questions in this examination.. You must show your
More informationMultivariate Analysis and Likelihood Inference
Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density
More informationRandom matrices: Distribution of the least singular value (via Property Testing)
Random matrices: Distribution of the least singular value (via Property Testing) Van H. Vu Department of Mathematics Rutgers vanvu@math.rutgers.edu (joint work with T. Tao, UCLA) 1 Let ξ be a real or complex-valued
More informationStatistics 135 Fall 2008 Final Exam
Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations
More informationThe Design of a Survival Study
The Design of a Survival Study The design of survival studies are usually based on the logrank test, and sometimes assumes the exponential distribution. As in standard designs, the power depends on The
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationEstimating network degree distributions from sampled networks: An inverse problem
Estimating network degree distributions from sampled networks: An inverse problem Eric D. Kolaczyk Dept of Mathematics and Statistics, Boston University kolaczyk@bu.edu Introduction: Networks and Degree
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationIndependent Increments in Group Sequential Tests: A Review
Independent Increments in Group Sequential Tests: A Review KyungMann Kim kmkim@biostat.wisc.edu University of Wisconsin-Madison, Madison, WI, USA July 13, 2013 Outline Early Sequential Analysis Independent
More informationFinal Exam. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given.
1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given. (a) If X and Y are independent, Corr(X, Y ) = 0. (b) (c) (d) (e) A consistent estimator must be asymptotically
More informationPart IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015
Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)
More informationSTAT 461/561- Assignments, Year 2015
STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and
More informationA Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression
A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent
More informationStatistical Methods in HYDROLOGY CHARLES T. HAAN. The Iowa State University Press / Ames
Statistical Methods in HYDROLOGY CHARLES T. HAAN The Iowa State University Press / Ames Univariate BASIC Table of Contents PREFACE xiii ACKNOWLEDGEMENTS xv 1 INTRODUCTION 1 2 PROBABILITY AND PROBABILITY
More informationGeneralized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.
Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint
More information