Introduction to Self-normalized Limit Theory


Qi-Man Shao, The Chinese University of Hong Kong. E-mail: qmshao@cuhk.edu.hk

Outline:
- What is self-normalization? Why?
- Classical limit theorems
- Self-normalized large deviations
- Self-normalized moderate deviations
- Self-normalized Cramér-type moderate deviations

Main references:
[1] V. H. de la Peña, T. L. Lai and Q. M. Shao (2009). Self-Normalized Processes: Limit Theory and Statistical Applications. Springer, Heidelberg.
[2] Q. M. Shao and W. X. Zhou (2012). Cramér type moderate deviation theorems for self-normalized processes.

1. Introduction

Let $X, X_1, X_2, \dots, X_n$ be independent identically distributed (i.i.d.) random variables and let $S_n = \sum_{i=1}^n X_i$, $V_n^2 = \sum_{i=1}^n X_i^2$. Assume $EX = 0$ and $\sigma^2 = EX^2 < \infty$.

Standardized sum: $S_n/(\sigma \sqrt{n})$

Self-normalized sum: $S_n/V_n$
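One immediate contrast between the two sums: the self-normalized sum is always bounded, since the Cauchy-Schwarz inequality gives $|S_n| \le \sqrt{n}\, V_n$, with no moment assumption at all. A minimal numeric sketch of this (my own illustration; the Cauchy sample is an arbitrary choice to stress that even $EX^2 = \infty$ is allowed):

```python
import math
import random

random.seed(1)
n = 200
# Deliberately heavy-tailed draws (standard Cauchy via inverse CDF):
# sigma^2 = EX^2 is infinite, so S_n/(sigma*sqrt(n)) is not even defined.
xs = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

s_n = sum(xs)                              # S_n
v_n = math.sqrt(sum(x * x for x in xs))    # V_n
r = s_n / v_n                              # self-normalized sum S_n / V_n

# Cauchy-Schwarz: |S_n| <= sqrt(n) * V_n, hence |S_n/V_n| <= sqrt(n) always.
assert abs(r) <= math.sqrt(n)
```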

Why are we interested in the self-normalized sum?
- It is of interest from probability theory alone.
- It has a close relation with Studentized statistics. Let $H_n = H_n(\theta, \lambda)$ be a sequence of statistics under consideration, where $\theta$ contains the parameters of interest and $\lambda$ is a vector of unknown nuisance parameters. It is common practice to estimate $\lambda$ first from the data, say by $\hat{\lambda}$, and then substitute $\hat{\lambda}$ into $H_n$, which naturally yields the Studentized statistic $\hat{H}_n = H_n(\theta, \hat{\lambda})$. Typical examples include:
  - Student's t-statistic
  - Hotelling's $T^2$ statistic
  - Studentized U-statistics
  - The largest eigenvalue of sample correlation matrices
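The relation with Student's t-statistic is in fact an exact algebraic identity: writing $R_n = S_n/V_n$, one has $t_n = R_n \big( (n-1)/(n - R_n^2) \big)^{1/2}$, so tail events for $t_n$ and for $S_n/V_n$ coincide up to a deterministic change of level. A quick numeric check of the identity (a sketch with arbitrary simulated data):

```python
import math
import random

random.seed(7)
n = 30
xs = [random.gauss(0.0, 1.0) for _ in range(n)]

# Direct one-sample t-statistic: t_n = sqrt(n) * xbar / s.
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance
t_direct = math.sqrt(n) * xbar / math.sqrt(s2)

# Via the self-normalized sum R_n = S_n / V_n.
s_n = sum(xs)
v_n = math.sqrt(sum(x * x for x in xs))
r = s_n / v_n
t_from_r = r * math.sqrt((n - 1) / (n - r * r))

# The two expressions agree up to rounding error.
assert abs(t_direct - t_from_r) < 1e-10
```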

2. Classical limit theorems

Let $X, X_1, X_2, \dots, X_n$ be independent identically distributed (i.i.d.) random variables and let $S_n = \sum_{i=1}^n X_i$.

Law of large numbers: if $EX = \mu$, then $S_n/n \to \mu$ a.s.

Law of the iterated logarithm: if $EX = 0$ and $0 < \sigma^2 = EX^2 < \infty$, then
$$\limsup_{n \to \infty} \frac{S_n}{\sigma \sqrt{n}\, (2 \log\log n)^{1/2}} = 1 \quad \text{a.s.}$$

The central limit theorem: If $EX = 0$ and $\sigma^2 = EX^2 < \infty$, then
$$\frac{S_n}{\sigma \sqrt{n}} \xrightarrow{d} N(0,1).$$
If $EX = 0$ and $EX^2 I\{|X| \le x\}$ is slowly varying, then there exist $a_n$ and $b_n$ such that
$$\frac{1}{a_n}(S_n - b_n) \xrightarrow{d} N(0,1).$$
Uniform Berry-Esseen bound:
$$\sup_x \big| P(S_n \le \sigma \sqrt{n}\, x) - \Phi(x) \big| \le \frac{0.7975\, E|X|^3}{\sqrt{n}\, \sigma^3}.$$
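Since the constant 0.7975 and the third moment are explicit, the uniform bound can be checked exactly in a toy case. A sketch (my own illustration, not from the slides) for Rademacher $X$, where $E|X|^3 = \sigma^3 = 1$ and $S_n$ has a shifted binomial distribution:

```python
import math

def kolmogorov_distance_rademacher(n):
    """Exact sup_x |P(S_n <= sigma*sqrt(n)*x) - Phi(x)| for X = +/-1 w.p. 1/2."""
    def phi(z):  # standard normal CDF via the error function
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # S_n = 2K - n with K ~ Binomial(n, 1/2); atoms at s = 2k - n.
    cdf, d = 0.0, 0.0
    for k in range(n + 1):
        s = 2 * k - n
        left = cdf                             # P(S_n < s)
        cdf += math.comb(n, k) / 2.0 ** n      # P(S_n <= s)
        z = s / math.sqrt(n)
        # The sup over x is attained at the atoms, from one side or the other.
        d = max(d, abs(left - phi(z)), abs(cdf - phi(z)))
    return d

n = 100
d = kolmogorov_distance_rademacher(n)
bound = 0.7975 / math.sqrt(n)   # 0.7975 * E|X|^3 / (sqrt(n) * sigma^3)
assert d <= bound
```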

Non-uniform Berry-Esseen bound:
$$\big| P(S_n \le \sigma \sqrt{n}\, x) - \Phi(x) \big| \le \frac{C\, E|X|^3}{(1 + |x|^3) \sqrt{n}\, \sigma^3}.$$

Cramér-Chernoff large deviation: If $Ee^{t_0 X} < \infty$ for some $t_0 > 0$, then for every $x > EX$,
$$P\Big(\frac{S_n}{n} \ge x\Big)^{1/n} \to \inf_{t \ge 0} e^{-tx} Ee^{tX}.$$
Cramér's moderate deviation: Assume $EX = 0$ and $\sigma^2 = EX^2 < \infty$. If $Ee^{t_0 |X_1|^{1/2}} < \infty$ for some $t_0 > 0$, then
$$\frac{P(S_n \ge \sigma \sqrt{n}\, x)}{1 - \Phi(x)} \to 1$$
uniformly in $0 \le x \le o(n^{1/6})$. If $Ee^{t_0 |X_1|} < \infty$ for some $t_0 > 0$, then for $x \ge 0$ and $x = o(n^{1/2})$,
$$\frac{P(S_n \ge \sigma \sqrt{n}\, x)}{1 - \Phi(x)} = \exp\Big\{ \frac{x^3}{\sqrt{n}}\, \lambda\Big(\frac{x}{\sqrt{n}}\Big) \Big\} \Big( 1 + O\Big(\frac{1+x}{\sqrt{n}}\Big) \Big),$$
where $\lambda(t)$ is Cramér's series.
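For concreteness, the Chernoff infimum can be evaluated in a case with a closed form: for $X \sim N(0,1)$, $Ee^{tX} = e^{t^2/2}$, so $\inf_{t \ge 0} e^{-tx} Ee^{tX} = e^{-x^2/2}$, attained at $t = x$. A minimal numeric sketch of the optimization (my own example, not from the slides):

```python
import math

def chernoff_rate_normal(x, t_max=10.0, steps=200000):
    """Grid-search inf_{t>=0} exp(-t*x) * E exp(t*X) for X ~ N(0,1),
    using the closed-form MGF E exp(t*X) = exp(t^2 / 2)."""
    best = 1.0  # value at t = 0
    for i in range(1, steps + 1):
        t = t_max * i / steps
        best = min(best, math.exp(-t * x + t * t / 2.0))
    return best

x = 1.5
# The grid minimum matches the analytic rate exp(-x^2/2) at t = x.
assert abs(chernoff_rate_normal(x) - math.exp(-x * x / 2.0)) < 1e-6
```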

Conclusion: classical limit theorems
- The normalizing constants are deterministic.
- Moment conditions play a crucial role.

3. Self-normalized limit theorems: a brief review

Self-normalized sum: $S_n/V_n$, where $V_n^2 = \sum_{i=1}^n X_i^2$.

Do the classical limit theorems remain valid for $S_n/V_n$?

3.1 The central limit theorem: If $EX = 0$ and $EX^2 < \infty$, then $S_n/V_n \xrightarrow{d} N(0,1)$.

Giné-Götze-Mason (1997): $S_n/V_n \xrightarrow{d} N(0,1)$ if and only if $EX = 0$ and $EX^2 I\{|X| \le x\}$ is slowly varying.
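The slowly varying condition covers distributions with infinite variance: for symmetric $X$ with $P(|X| > x) = x^{-2}$ for $x \ge 1$, one has $EX^2 I\{|X| \le x\} \sim 2 \log x$, which is slowly varying, so $S_n/V_n$ is still asymptotically $N(0,1)$. A rough Monte Carlo sketch (sample sizes and tolerances are my own arbitrary choices):

```python
import math
import random

random.seed(0)

def heavy_sample():
    """Symmetric X with P(|X| > x) = x^{-2} for x >= 1: infinite variance,
    but truncated second moment ~ 2 log x is slowly varying."""
    u = random.random()
    mag = 1.0 / math.sqrt(u) if u > 0.0 else 1.0
    return mag if random.random() < 0.5 else -mag

n, reps = 100, 4000
ratios = []
for _ in range(reps):
    xs = [heavy_sample() for _ in range(n)]
    s = sum(xs)
    v = math.sqrt(sum(x * x for x in xs))
    ratios.append(s / v)

frac_neg = sum(1 for r in ratios if r <= 0.0) / reps       # near 1/2 by symmetry
frac_mid = sum(1 for r in ratios if abs(r) <= 1.0) / reps  # near Phi(1)-Phi(-1)
assert 0.45 < frac_neg < 0.55
assert 0.5 < frac_mid < 0.85
```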

Bentkus-Götze (1996): If $EX = 0$, $\sigma^2 = EX^2$ and $E|X|^3 < \infty$, then
$$\sup_x \big| P(S_n/V_n \le x) - \Phi(x) \big| \le C n^{-1/2} E|X|^3/\sigma^3.$$

3.2 Self-normalized limiting distribution for $X$ in the domain of attraction of a stable law (DASL)

Logan-Mallows-Rice-Shepp (1973): If $X \in \mathrm{DASL}(\alpha)$, then the limiting density function $p(x)$ of $S_n/V_n$ exists and satisfies
$$p(x) \sim \frac{1}{\alpha} \Big( \frac{2}{\pi} \Big)^{1/2} \tau_\alpha\, e^{-x^2 \tau_\alpha / 2}.$$
Conjecture: $\tau_\alpha$ is the solution of
$$c_1 D_\alpha(-\tau) + c_2 D_\alpha(\tau) = 0 \quad \text{if } \alpha \ne 1, \qquad e^{-\tau^2/2} - \tau \int_0^\tau e^{-x^2/2}\, dx = 0 \quad \text{if } \alpha = 1,$$
where $D_\alpha(x)$ is the parabolic cylinder function.

3.3 Self-normalized law of the iterated logarithm (Griffin and Kuelbs (1989))

(a) If $EX = 0$ and $EX^2 I\{|X| \le x\}$ is slowly varying, then
$$\limsup_{n \to \infty} \frac{S_n}{V_n (2 \log\log n)^{1/2}} = 1 \quad \text{a.s.}$$
(b) If $X$ is symmetric and $P(X \ge x) = l(x) x^{-\alpha}$, $0 < \alpha < 2$, where $l(x)$ is a slowly varying function, then there exists $0 < C_\alpha < \infty$ such that
$$\limsup_{n \to \infty} \frac{S_n}{V_n (\log\log n)^{1/2}} = C_\alpha \quad \text{a.s.}$$
However, for any deterministic sequence $a_n$,
$$\limsup_{n \to \infty} \frac{S_n}{a_n} = 0 \ \text{a.s.} \quad \text{or} \quad = \infty \ \text{a.s.}$$

4. Self-normalized large deviations

Question: Is there a self-normalized large deviation result without assuming any moment condition? Answer: YES!

Observe that for $x > 0$,
$$P(S_n/V_n^2 \ge x) = P(S_n \ge x V_n^2) = P\Big( \sum_{i=1}^n (X_i - x X_i^2) \ge 0 \Big).$$
Assume that $EX = 0$ or $EX^2 = \infty$. We have $E(X - xX^2) < 0$ and $Ee^{t(X - xX^2)} < \infty$ for $t \ge 0$. Therefore, by the Chernoff large deviation,
$$P(S_n/V_n^2 \ge x)^{1/n} \to \inf_{t \ge 0} Ee^{t(X - xX^2)}.$$
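For Rademacher $X$ ($P(X = \pm 1) = 1/2$) the rate above has a closed form one can derive by hand: $X^2 \equiv 1$, so $Ee^{t(X - xX^2)} = e^{-tx} \cosh t$, minimized at $t = \operatorname{arctanh}(x)$ with value $e^{-x \operatorname{arctanh}(x)}/\sqrt{1 - x^2}$ for $0 < x < 1$. A numeric sketch comparing a grid search with this closed form (my own illustration):

```python
import math

def selfnorm_rate_rademacher(x, t_max=10.0, steps=100000):
    """Grid-search inf_{t>=0} E exp(t*(X - x*X^2)) for Rademacher X.
    Since X^2 = 1, E exp(t*(X - x)) = exp(-t*x) * cosh(t)."""
    best = 1.0  # value at t = 0
    for i in range(1, steps + 1):
        t = t_max * i / steps
        best = min(best, math.exp(-t * x) * math.cosh(t))
    return best

x = 0.5
closed_form = math.exp(-x * math.atanh(x)) / math.sqrt(1.0 - x * x)
assert abs(selfnorm_rate_rademacher(x) - closed_form) < 1e-6
```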

Question: What if the normalizing constant is $V_n \sqrt{n}$?

It is well-known that $ab \le (a^2 + b^2)/2$. Indeed, we have
$$ab = \frac{1}{2} \inf_{c > 0} \big( c a^2 + b^2/c \big)$$
for $a, b > 0$. Thus
$$V_n \sqrt{n} = \frac{1}{2} \inf_{c > 0} \big( V_n^2/c + c n \big).$$

and
$$P\Big( \frac{S_n}{V_n \sqrt{n}} \ge x \Big) = P(S_n \ge x V_n \sqrt{n}) = P\Big( S_n \ge \frac{x}{2} \inf_{c > 0} (V_n^2/c + c n) \Big) = P\Big( \sup_{c > 0} \Big( c S_n - \frac{x}{2}(V_n^2 + c^2 n) \Big) \ge 0 \Big) = P\Big( \sup_{c > 0} \sum_{i=1}^n \Big( c X_i - \frac{x}{2}(X_i^2 + c^2) \Big) \ge 0 \Big).$$
Theorem 1.1 [Shao (1997)] If $EX = 0$ or $EX^2 = \infty$, then for every $x > 0$,
$$P(S_n/V_n \ge x n^{1/2})^{1/n} \to \sup_{c \ge 0} \inf_{t \ge 0} Ee^{t(cX - x(X^2 + c^2)/2)}.$$

5. Self-normalized moderate deviations

Theorem 1.2 [Shao (1997)]
(a) If $EX = 0$ and $EX^2 I\{|X| \le x\}$ is slowly varying, then
$$\ln P(S_n/V_n > x_n) \sim -x_n^2/2$$
for $x_n \to \infty$ and $x_n = o(\sqrt{n})$; i.e., for every $\varepsilon > 0$, eventually
$$e^{-(1+\varepsilon) x_n^2/2} \le P(S_n/V_n > x_n) \le e^{-(1-\varepsilon) x_n^2/2}.$$
(b) Let $X$ be a symmetric random variable satisfying $P(X \ge x) = l(x) x^{-\alpha}$, $0 < \alpha < 2$, where $l(x)$ is a slowly varying function. Then for $x_n \to \infty$, $x_n = o(\sqrt{n})$,
$$\ln P\Big( \frac{S_n}{V_n} \ge x_n \Big) \sim -\beta_\alpha x_n^2,$$
where $\beta_\alpha$ is the solution of
$$\int_0^\infty \frac{2 - e^{2x - x^2/\beta} - e^{-2x - x^2/\beta}}{x^{\alpha+1}}\, dx = 0. \quad (1.1)$$

6. Self-normalized Cramér type moderate deviations

Theorem 1.3 (Shao (1999)) If $EX = 0$ and $E|X|^3 < \infty$, then
$$\frac{P(S_n/V_n \ge x)}{1 - \Phi(x)} \to 1$$
uniformly in $0 \le x \le o(n^{1/6})$.

Theorem 1.4 [Jing-Shao-Wang (2003)] If $EX = 0$ and $E|X|^3 < \infty$, then
$$\frac{P(S_n/V_n \ge x)}{1 - \Phi(x)} = 1 + O(1) \frac{(1+x)^3 E|X|^3}{\sqrt{n}\, \sigma^3}$$
and
$$\big| P(S_n/V_n \ge x) - (1 - \Phi(x)) \big| \le \frac{A (1+x)^3 e^{-x^2/2} E|X|^3}{n^{1/2} \sigma^3}$$
for $0 \le x \le n^{1/6} \sigma / (E|X|^3)^{1/3}$.
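A rough Monte Carlo illustration of Theorem 1.3 (a sketch only; the distribution, sample size, replication count, and tolerance below are my own arbitrary choices, not from the slides):

```python
import math
import random

random.seed(42)

def tail_ratio(n, x, reps=10000):
    """Estimate P(S_n/V_n >= x) / (1 - Phi(x)) by simulation for
    symmetric uniform X, so that EX = 0 and E|X|^3 < infinity."""
    hits = 0
    for _ in range(reps):
        xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
        s = sum(xs)
        v = math.sqrt(sum(t * t for t in xs))
        if s >= x * v:
            hits += 1
    p_hat = hits / reps
    tail = 0.5 * (1.0 - math.erf(x / math.sqrt(2.0)))   # 1 - Phi(x)
    return p_hat / tail

r = tail_ratio(n=50, x=1.0)
# For x well inside the o(n^{1/6}) range the ratio should already be near 1.
assert 0.7 < r < 1.3
```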

7. Self-normalized Cramér moderate deviations for independent random variables

Let $X_1, X_2, \dots, X_n$ be independent random variables with $EX_i = 0$ and $EX_i^2 < \infty$. Put
$$S_n = \sum_{i=1}^n X_i, \quad V_n^2 = \sum_{i=1}^n X_i^2, \quad B_n^2 = ES_n^2 = \sum_{i=1}^n EX_i^2,$$
and for $x > 0$,
$$\Delta_{n,x} = \frac{(1+x)^2}{B_n^2} \sum_{i=1}^n EX_i^2 I\{|X_i| > B_n/(1+x)\} + \frac{(1+x)^3}{B_n^3} \sum_{i=1}^n E|X_i|^3 I\{|X_i| \le B_n/(1+x)\}.$$
Theorem 1.5 [Jing-Shao-Wang (2003)] There is an absolute constant $A\ (>1)$ such that
$$\frac{P(S_n \ge x V_n)}{1 - \Phi(x)} = e^{O(1) \Delta_{n,x}}$$
for all $x \ge 0$ satisfying
$$x^2 \max_{1 \le i \le n} EX_i^2 \le B_n^2 \quad \text{and} \quad \Delta_{n,x} \le (1+x)^2/A,$$
where $|O(1)| \le A$.

Theorem 1.6 [Jing-Shao-Wang (2003)]
$$\big| P(S_n/V_n \ge x) - (1 - \Phi(x)) \big| \le \frac{A (1+x)^3 e^{-x^2/2}}{B_n^3} \sum_{i=1}^n E|X_i|^3$$
for
$$0 \le x \le B_n \Big/ \Big( \sum_{i=1}^n E|X_i|^3 \Big)^{1/3}.$$

8. Cramér moderate deviations for self-normalized processes

8.1 A general result

Let $\xi_1, \dots, \xi_n$ be independent random variables satisfying
$$E\xi_i = 0, \ 1 \le i \le n, \qquad \sum_{i=1}^n E\xi_i^2 = 1. \quad (1.2)$$
Assume the nonlinear process of interest can be decomposed as a standardized partial sum of $\{\xi_i\}_{i=1}^n$, say $W_n$, plus a remainder, say $D_{n,1}$, while its self-normalized version can be written as
$$T_n = \frac{W_n + D_{n,1}}{V_n (1 + D_{n,2})^{1/2}}, \quad (1.3)$$
where
$$W_n = \sum_{i=1}^n \xi_i, \quad V_n = \Big( \sum_{i=1}^n \xi_i^2 \Big)^{1/2},$$
and $D_{n,1}, D_{n,2}$ are measurable functions of $\{\xi_i\}_{i=1}^n$. Examples satisfying (1.3) include the t-statistic, Studentized U-statistics and L-statistics. In this section, we establish a general Cramér type moderate deviation theorem for self-normalized processes $T_n$ of the form (1.3).

For $1 \le i \le n$ and $x \ge 0$, let
$$\delta_{i,x} = (1+x)^2 E[\xi_i^2 I\{(1+x)|\xi_i| > 1\}] + (1+x)^3 E[|\xi_i|^3 I\{(1+x)|\xi_i| \le 1\}],$$
$$\Delta_{n,x} = \sum_{i=1}^n \delta_{i,x} \quad \text{and} \quad I_{n,x} = E[e^{x W_n - x^2 V_n^2/2}] = \prod_{i=1}^n E[e^{x \xi_i - x^2 \xi_i^2/2}].$$
Let $D_{n,1}^{(i)}$ and $D_{n,2}^{(i)}$, for each $1 \le i \le n$, be arbitrary measurable functions of $\{\xi_j\}_{j=1, j \ne i}^n$, so that $\{D_{n,1}^{(i)}, D_{n,2}^{(i)}\}$ and $\xi_i$ are independent. Set also for $x > 0$
$$R_{n,x} := I_{n,x}^{-1} \Big\{ E\big[ (x |D_{n,1}| + x^2 |D_{n,2}|) e^{\sum_{j=1}^n (x \xi_j - x^2 \xi_j^2/2)} \big] + \sum_{i=1}^n E\big[ \min(x |\xi_i|, 1) \big( |D_{n,1} - D_{n,1}^{(i)}| + x |D_{n,2} - D_{n,2}^{(i)}| \big) e^{\sum_{j \ne i} (x \xi_j - x^2 \xi_j^2/2)} \big] \Big\},$$
where $\sum_{j \ne i} = \sum_{j=1, j \ne i}^n$.

Theorem 1.7 (Shao and Zhou (2012)) Let $T_n$ be defined as in (1.3). Then there is an absolute constant $A\ (>1)$ such that
$$P(T_n \ge x) \ge \big( 1 - \Phi(x) \big) \exp\{O(1) \Delta_{n,x}\} \big( 1 - A R_{n,x} \big) \quad (1.4)$$
and
$$P(T_n \ge x) \le \big( 1 - \Phi(x) \big) \exp\{O(1) \Delta_{n,x}\} \big( 1 + A R_{n,x} \big) + P\{ |D_{n,1}|/V_n > 1/(4x) \} + P\{ |D_{n,2}| > 1/(4x^2) \} \quad (1.5)$$
for all $x > 1$ satisfying
$$\max_{1 \le i \le n} \delta_{i,x} \le 1 \quad (1.6)$$
and
$$\Delta_{n,x} \le (1+x)^2/A, \quad (1.7)$$
where $|O(1)| \le A$.

8.2 Studentized U-statistics

Let $X_1, X_2, \dots, X_n$ be a sequence of i.i.d. random variables and let $h : \mathbb{R}^m \to \mathbb{R}$ be a Borel measurable symmetric function of $m$ variables, where $2 \le m < n/2$. Consider Hoeffding's U-statistic with a kernel $h$ of degree $m$ given by
$$U_n = \binom{n}{m}^{-1} \sum_{1 \le i_1 < \dots < i_m \le n} h(X_{i_1}, \dots, X_{i_m}), \quad (1.8)$$
which is an unbiased estimator of $\theta = Eh(X_1, \dots, X_m)$. The U-statistic is a basic statistic and its asymptotic properties have been extensively studied in the literature. Let $g(x) = Eh(x, X_2, \dots, X_m)$ for $x \in \mathbb{R}$ and $\sigma^2 = \mathrm{Var}(g(X_1))$. The standardized (non-degenerate) U-statistic is
$$Z_n = \frac{\sqrt{n}}{m \sigma} (U_n - \theta), \quad (1.9)$$
where $\sigma > 0$ and $m$ is fixed.

Since $\sigma$ is usually unknown, consider the following Studentized U-statistic
$$T_n = \frac{\sqrt{n}}{m s_1} (U_n - \theta), \quad (1.10)$$
where $s_1^2$ is the leave-one-out jackknife estimator of $\sigma^2$ given by
$$s_1^2 = \frac{n-1}{(n-m)^2} \sum_{i=1}^n (q_i - U_n)^2 \quad \text{with} \quad (1.11)$$
$$q_i = \binom{n-1}{m-1}^{-1} \sum_{(l_1, \dots, l_{m-1}) \in C_{m-1,i}} h(X_i, X_{l_1}, \dots, X_{l_{m-1}})$$
and $C_{m-1,i} = \{ (l_1, \dots, l_{m-1}) : 1 \le l_1 < \dots < l_{m-1} \le n,\ l_j \ne i \text{ for } 1 \le j \le m-1 \}$.

As a direct but non-trivial consequence of Theorem 1.7, we can establish a sharp Cramér type moderate deviation theorem for the Studentized U-statistic $T_n$ as follows.

Theorem 1.8 (Shao and Zhou (2012)) Let $2 < p \le 3$ and assume $0 < \sigma_p := (E|g(X_1) - \theta|^p)^{1/p} < \infty$.

Suppose that there are constants $c_0 \ge 1$ and $\tau \ge 0$ such that
$$(h(x_1, \dots, x_m) - \theta)^2 \le c_0 \Big( \tau \sigma^2 + \sum_{i=1}^m (g(x_i) - \theta)^2 \Big). \quad (1.12)$$
Then there exists a constant $A > 1$ depending only on $m$ such that
$$\frac{P(T_n \ge x)}{1 - \Phi(x)} = 1 + O(1) \Big\{ \frac{\sigma_p^p (1+x)^p}{\sigma^p n^{(p-2)/2}} + c_0 (1+\tau) \frac{(1+x)^3}{n^{1/2}} \Big\} \quad (1.13)$$
for any $0 < x < \frac{1}{A} \min\{ \sigma n^{(p-2)/(2p)}/\sigma_p,\ n^{1/6}/(c_0(1+\tau))^{1/6} \}$, where $|O(1)| \le A$. In particular, we have
$$\frac{P(T_n \ge x)}{1 - \Phi(x)} \to 1 \quad (1.14)$$
uniformly in $0 \le x \le o(n^{(p-2)/(2p)})$.

Clearly, condition (1.12) is satisfied by the t-statistic ($h(x_1, x_2) = (x_1 + x_2)/2$ with $c_0 = 2$ and $\tau = 0$), the sample variance ($h(x_1, x_2) = (x_1 - x_2)^2/2$, $c_0 = 10$, $\tau = \theta^2/\sigma^2$), Gini's mean difference ($h(x_1, x_2) = |x_1 - x_2|$, $c_0 = 8$, $\tau = \theta^2/\sigma^2$) and the one-sample Wilcoxon statistic ($h(x_1, x_2) = 1\{x_1 + x_2 \le 0\}$, $c_0 = 1$, $\tau = 1/\sigma^2$). It would be interesting to know whether condition (1.12) can be weakened, but it seems impossible to remove it completely.
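As a concrete check of the t-statistic case: with $m = 2$ and kernel $h(x_1, x_2) = (x_1 + x_2)/2$, the Studentized U-statistic (1.10)-(1.11) reduces exactly to Student's one-sample t-statistic, since $U_n = \bar{X}$ and the jackknife estimator satisfies $m^2 s_1^2 = s^2$. A numeric sketch verifying this (my own illustration with arbitrary data):

```python
import math
import random

random.seed(3)
n, theta = 12, 0.0
xs = [random.gauss(0.0, 1.0) for _ in range(n)]

# Studentized U-statistic (1.10)-(1.11) with m = 2, h(x, y) = (x + y)/2.
m = 2
u_n = sum((xs[i] + xs[j]) / 2.0 for i in range(n)
          for j in range(i + 1, n)) / math.comb(n, m)
q = [sum((xs[i] + xs[l]) / 2.0 for l in range(n) if l != i) / (n - 1)
     for i in range(n)]                                  # q_i in (1.11)
s1_sq = (n - 1) / (n - m) ** 2 * sum((qi - u_n) ** 2 for qi in q)
t_u = math.sqrt(n) * (u_n - theta) / (m * math.sqrt(s1_sq))

# Classical one-sample t-statistic.
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
t_classic = math.sqrt(n) * xbar / math.sqrt(s2)

assert abs(t_u - t_classic) < 1e-10
```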

9. Miscellanies
- Horváth-Shao (1996): Large deviations for $S_n / \max_{1 \le i \le n} |X_i|$
- Dembo-Shao (1998): Self-normalized moderate and large deviations in $\mathbb{R}^d$
- Giné-Mason (2001): Sub-Gaussian property for $X$ in the Feller class
- Bercu-Gassiat-Rio (2002): Concentration inequalities, large and moderate deviations for self-normalized empirical processes
- De la Peña-Klass-Lai (2004): Self-normalized processes: exponential inequalities, moment and limit theorems
- Jing-Shao-Zhou (2004): Self-normalized saddlepoint approximations
- Jing-Shao-Wang (2003): The Studentized bootstrap

- He and Shao (1996): Bahadur efficiency of Studentized score tests
- He and Shao (1996, 2000): Bahadur representations for M-estimators
- Horváth and Shao (1996): Change-point analysis
- Chen and Shao (1997, 2000): Monte Carlo methods in Bayesian computation
- Liu and Shao (2013): Hotelling's $T^2$ statistic

Conclusion: self-normalized limit theorems
- Require little or no moment assumptions
- Results are more elegant and neater than many classical limit theorems
- Provide much wider applicability to other fields, and to statistics in particular