Introduction to Self-normalized Limit Theory
Qi-Man Shao
The Chinese University of Hong Kong
E-mail: qmshao@cuhk.edu.hk
Outline
- What is self-normalization? Why?
- Classical limit theorems
- Self-normalized large deviations
- Self-normalized moderate deviations
- Self-normalized Cramér type moderate deviations
Main references
[1] V.H. de la Peña, T.L. Lai and Q.M. Shao (2009). Self-normalized Processes: Limit Theory and Statistical Applications. Springer, Heidelberg.
[2] Q.M. Shao and W.X. Zhou (2012). Cramér type moderate deviation theorems for self-normalized processes.
1. Introduction
Let X, X_1, X_2, ..., X_n be independent identically distributed (i.i.d.) random variables and let
  S_n = Σ_{i=1}^n X_i,  V_n^2 = Σ_{i=1}^n X_i^2.
Assume EX = 0 and σ^2 = EX^2 < ∞.
Standardized sum: S_n/(σ√n)
Self-normalized sum: S_n/V_n
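As a quick numerical illustration (my own example, not from the slides): the self-normalized sum stays tight even when EX^2 = ∞, since by the Cauchy-Schwarz inequality |S_n| ≤ √n V_n always holds, while the standardized sum has no finite-variance scaling for heavy-tailed increments. A minimal sketch with standard Cauchy increments:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000

# Standard Cauchy increments: EX^2 = infinity, so the standardized
# sum S_n/sqrt(n) has no finite-variance scaling, but the
# self-normalized sum S_n/V_n is still tight.
X = rng.standard_cauchy((reps, n))
S = X.sum(axis=1)
V = np.sqrt((X ** 2).sum(axis=1))
T = S / V

# Deterministic bound from Cauchy-Schwarz: |S_n| <= sqrt(n) V_n.
assert np.all(np.abs(T) <= np.sqrt(n))
# In practice S_n/V_n concentrates on a few units, far below sqrt(n).
assert np.max(np.abs(T)) < 8
```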
Why are we interested in the self-normalized sum?
- It is of interest in probability theory in its own right.
- It is closely related to Studentized statistics. Let H_n = H_n(θ, λ) be a sequence of statistics under consideration, where θ contains the parameters of interest and λ is a vector of unknown nuisance parameters. It is common practice to estimate λ first from the data, say by ˆλ, and then substitute ˆλ into H_n, which naturally yields a Studentized statistic Ĥ_n = H_n(θ, ˆλ). Typical examples include:
  - Student's t-statistic
  - Hotelling's T^2 statistic
  - Studentized U-statistics
  - The largest eigenvalue of sample correlation matrices
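The link to Student's t-statistic can be made exact. For the one-sample statistic t_n = √n X̄/s (s the sample standard deviation), a short calculation gives t_n = u √((n−1)/(n − u^2)) with u = S_n/V_n, a monotone one-to-one map, so tail probabilities for t_n and for the self-normalized sum translate into each other. A quick numerical check of this standard identity:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=30)
n = len(x)

# One-sample t-statistic for testing mean zero.
t = np.sqrt(n) * x.mean() / x.std(ddof=1)

# Self-normalized sum u = S_n / V_n.
u = x.sum() / np.sqrt((x ** 2).sum())

# Monotone one-to-one correspondence between the two statistics.
assert np.isclose(t, u * np.sqrt((n - 1) / (n - u ** 2)))
```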
2. Classical limit theorems
Let X, X_1, X_2, ..., X_n be independent identically distributed (i.i.d.) random variables and let S_n = Σ_{i=1}^n X_i.
Law of large numbers: if EX = µ, then
  S_n/n → µ a.s.
Law of the iterated logarithm: if EX = 0 and 0 < σ^2 = EX^2 < ∞, then
  lim sup_{n→∞} S_n/(σ (2n log log n)^{1/2}) = 1 a.s.
The central limit theorem: if EX = 0 and σ^2 = EX^2 < ∞, then
  S_n/(σ√n) →d N(0, 1).
If EX = 0 and EX^2 I{|X| ≤ x} is slowly varying, then there exist a_n and b_n such that
  (S_n − b_n)/a_n →d N(0, 1).
Uniform Berry-Esseen bound:
  sup_x |P(S_n/(σ√n) ≤ x) − Φ(x)| ≤ 0.7975 E|X|^3/(√n σ^3).
Non-uniform Berry-Esseen bound:
  |P(S_n/(σ√n) ≤ x) − Φ(x)| ≤ C E|X|^3/((1 + |x|^3) √n σ^3).
Cramér-Chernoff large deviation: if Ee^{t_0 X} < ∞ for some t_0 > 0, then for x > EX,
  P(S_n ≥ n x)^{1/n} → inf_{t≥0} e^{−tx} Ee^{tX}.
Cramér's moderate deviation: assume EX = 0 and σ^2 = EX^2 < ∞. If Ee^{t_0 |X|^{1/2}} < ∞ for some t_0 > 0, then
  P(S_n/(σ√n) ≥ x) / (1 − Φ(x)) → 1
uniformly in 0 ≤ x ≤ o(n^{1/6}). If Ee^{t_0 |X|} < ∞ for some t_0 > 0, then for x ≥ 0 and x = o(n^{1/2}),
  P(S_n/(σ√n) ≥ x) / (1 − Φ(x)) = exp{ (x^3/√n) λ(x/√n) } (1 + O((1 + x)/√n)),
where λ(t) is Cramér's series.
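For X ~ N(0,1) the Cramér-Chernoff rate has a closed form (Ee^{tX} = e^{t^2/2}, so inf_{t≥0} e^{−tx} Ee^{tX} = e^{−x^2/2}, attained at t = x); a small sketch checking the numerical minimization against it:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = 1.7  # any level x > EX = 0

# Minimize t -> e^{-tx} E e^{tX} = e^{-tx + t^2/2} over t >= 0.
res = minimize_scalar(lambda t: np.exp(-t * x + t ** 2 / 2),
                      bounds=(0.0, 10.0), method="bounded")

assert np.isclose(res.x, x, atol=1e-4)           # minimizer t = x
assert np.isclose(res.fun, np.exp(-x ** 2 / 2))  # rate e^{-x^2/2}
```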
Conclusion: classical limit theorems
- The normalizing constants are deterministic
- Moment conditions play a crucial role
3. Self-normalized limit theorems: a brief review
Self-normalized sum: S_n/V_n, where V_n^2 = Σ_{i=1}^n X_i^2.
Do the classical limit theorems remain valid for S_n/V_n?
3.1 The central limit theorem: if EX = 0 and EX^2 < ∞, then
  S_n/V_n →d N(0, 1).
Giné-Götze-Mason (1995): EX = 0 and EX^2 I{|X| ≤ x} slowly varying ⇒ S_n/V_n →d N(0, 1).
Bentkus-Götze (1996): if EX = 0, σ^2 = EX^2 and E|X|^3 < ∞, then
  sup_x |P(S_n/V_n ≤ x) − Φ(x)| ≤ C n^{−1/2} E|X|^3/σ^3.
3.2 Self-normalized limit distribution for X ∈ DASL
Logan-Mallows-Rice-Shepp (1973): if X ∈ DASL(α), then the limiting density function p(x) of S_n/V_n exists and satisfies
  p(x) ∼ (1/α) (2τ_α/π)^{1/2} e^{−x^2 τ_α/2}.
Conjecture: τ_α is the solution of
  c_1 D_α(−τ) + c_2 D_α(τ) = 0   if α ≠ 1,
  e^{−τ^2/2} − τ ∫_0^τ e^{−x^2/2} dx = 0   if α = 1,
where D_α(x) is the parabolic cylinder function.
3.3 Self-normalized law of the iterated logarithm (Griffin and Kuelbs (1989))
(a) If EX = 0 and EX^2 I{|X| ≤ x} is slowly varying, then
  lim sup_{n→∞} S_n/(V_n (2 log log n)^{1/2}) = 1 a.s.
(b) If X is symmetric and P(X ≥ x) = l(x) x^{−α}, 0 < α < 2, where l(x) is a slowly varying function, then there exists 0 < C_α < ∞ such that
  lim sup_{n→∞} S_n/(V_n (log log n)^{1/2}) = C_α a.s.
However, for any sequence of constants a_n,
  lim sup_{n→∞} S_n/a_n = 0 a.s. or = ∞ a.s.
4. Self-normalized large deviations
Question: is there a self-normalized large deviation without assuming any moment conditions?
Answer: YES!
Observe that for x > 0,
  P(S_n/V_n^2 ≥ x) = P(S_n ≥ x V_n^2) = P( Σ_{i=1}^n (X_i − x X_i^2) ≥ 0 ).
Assume that EX = 0 or EX^2 = ∞. We have EX − x EX^2 < 0 and Ee^{t(X − x X^2)} < ∞ for t ≥ 0. Therefore, by the Chernoff large deviation,
  P(S_n/V_n^2 ≥ x)^{1/n} → inf_{t≥0} Ee^{t(X − x X^2)}.
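The rate inf_{t≥0} Ee^{t(X − xX^2)} is an ordinary numerical computation even for the standard Cauchy distribution, which has no moments at all; a sketch (the level x = 0.5 is an arbitrary choice of mine) evaluating it by quadrature and checking the Chernoff upper bound by simulation:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

x = 0.5  # level in P(S_n / V_n^2 >= x)

def mgf(t):
    # E exp(t(X - x X^2)) for standard Cauchy X; the exponent tends to
    # -infinity quadratically as |s| grows, so the integral is finite for t > 0.
    f = lambda s: np.exp(t * (s - x * s ** 2)) / (np.pi * (1 + s ** 2))
    return quad(f, -np.inf, np.inf)[0]

rate = minimize_scalar(mgf, bounds=(1e-6, 20.0), method="bounded").fun
assert 0 < rate < 1  # genuine exponential decay, despite EX^2 = infinity

# Chernoff: P(S_n >= x V_n^2) <= rate^n; check by Monte Carlo.
rng = np.random.default_rng(2)
n, reps = 10, 100000
X = rng.standard_cauchy((reps, n))
emp = np.mean(X.sum(axis=1) >= x * (X ** 2).sum(axis=1))
assert emp <= rate ** n + 0.01
```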
Question: what if the normalizing constant is V_n √n?
It is well-known that ab ≤ (a^2 + b^2)/2. Indeed, we have
  ab = (1/2) inf_{c>0} (c a^2 + b^2/c)
for a, b > 0. Thus
  V_n √n = (1/2) inf_{c>0} (V_n^2/c + c n).
and
  P(S_n/(V_n √n) ≥ x) = P(S_n ≥ x V_n √n)
    = P( S_n ≥ x (1/2) inf_{c>0} (V_n^2/c + c n) )
    = P( sup_{c>0} { c S_n − (x/2)(V_n^2 + c^2 n) } ≥ 0 )
    = P( sup_{c>0} Σ_{i=1}^n { c X_i − (x/2)(X_i^2 + c^2) } ≥ 0 ).
Theorem 1.1 [Shao (1997)] If EX = 0 or EX^2 = ∞, then for x > 0,
  P(S_n/V_n ≥ x n^{1/2})^{1/n} → sup_{c≥0} inf_{t≥0} Ee^{t(cX − x(X^2 + c^2)/2)}.
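For concreteness (my own check, with X ~ N(0,1) so that the inner expectation has the closed form Ee^{aX − bX^2} = (1+2b)^{−1/2} e^{a^2/(2(1+2b))} for 2b > −1), the sup-inf in Theorem 1.1 can be evaluated numerically:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = 0.8  # level in P(S_n >= x V_n n^{1/2})

def inner(t, c):
    # E exp(t(cX - x(X^2 + c^2)/2)) for X ~ N(0,1),
    # via E e^{aX - bX^2} = (1+2b)^{-1/2} e^{a^2/(2(1+2b))}, a = tc, b = tx/2.
    a, b = t * c, t * x / 2
    return np.exp(-t * x * c ** 2 / 2 + a ** 2 / (2 * (1 + 2 * b))) / np.sqrt(1 + 2 * b)

def inf_t(c):
    return minimize_scalar(lambda t: inner(t, c),
                           bounds=(0.0, 50.0), method="bounded").fun

rate = -minimize_scalar(lambda c: -inf_t(c),
                        bounds=(0.0, 10.0), method="bounded").fun

assert 0 < rate < 1
# For symmetric X, P(S_n >= x V_n sqrt(n)) <= e^{-n x^2/2},
# so the rate cannot exceed e^{-x^2/2}.
assert rate <= np.exp(-x ** 2 / 2) + 1e-8
```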
5. Self-normalized moderate deviations
Theorem 1.2 [Shao (1997)]
(a) If EX = 0 and EX^2 I{|X| ≤ x} is slowly varying, then
  ln P(S_n/V_n ≥ x_n) ∼ −x_n^2/2
for x_n → ∞ with x_n = o(√n); that is, for every ε > 0, eventually
  e^{−(1+ε) x_n^2/2} ≤ P(S_n/V_n ≥ x_n) ≤ e^{−(1−ε) x_n^2/2}.
(b) Let X be a symmetric random variable satisfying P(X ≥ x) = l(x) x^{−α}, 0 < α < 2, where l(x) is a slowly varying function. Then for x_n → ∞ with x_n = o(√n),
  ln P(S_n/V_n ≥ x_n) ∼ −β_α x_n^2,
where β_α is the solution of
  ∫_0^∞ (2 − e^{2x − x^2/β} − e^{−2x − x^2/β}) / x^{α+1} dx = 0.   (1.1)
6. Self-normalized Cramér type moderate deviations
Theorem 1.3 (Shao (1999)) If EX = 0 and E|X|^3 < ∞, then
  P(S_n/V_n ≥ x) / (1 − Φ(x)) → 1
uniformly in 0 ≤ x ≤ o(n^{1/6}).
Theorem 1.4 [Jing-Shao-Wang (2003)] If EX = 0 and E|X|^3 < ∞, then
  P(S_n/V_n ≥ x) / (1 − Φ(x)) = 1 + O(1) (1 + x)^3 E|X|^3/(√n σ^3)
for x ≥ 0, and
  |P(S_n/V_n ≥ x) − (1 − Φ(x))| ≤ A (1 + x)^3 e^{−x^2/2} n^{−1/2} E|X|^3/σ^3
for 0 ≤ x ≤ n^{1/6} σ/(E|X|^3)^{1/3}.
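For i.i.d. N(0,1) increments this can be checked exactly: {S_n/V_n ≥ x} = {T ≥ x√((n−1)/(n−x^2))} where T has a t-distribution with n−1 degrees of freedom (the monotone map between the t-statistic and S_n/V_n), so the self-normalized tail probability has a closed form. A sketch of the ratio appearing in Theorem 1.3:

```python
import numpy as np
from scipy import stats

x = 2.0
for n in (20, 200, 2000):
    # Exact self-normalized tail probability for normal increments.
    p = stats.t.sf(x * np.sqrt((n - 1) / (n - x ** 2)), df=n - 1)
    ratio = p / stats.norm.sf(x)
    print(n, ratio)  # the ratio tends to 1 as n grows

assert abs(ratio - 1) < 0.05  # n = 2000, well inside the o(n^{1/6}) range
```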
7. Self-normalized Cramér moderate deviations for independent random variables
Let X_1, X_2, ..., X_n be independent random variables with EX_i = 0 and EX_i^2 < ∞. Put
  S_n = Σ_{i=1}^n X_i,  V_n^2 = Σ_{i=1}^n X_i^2,  B_n^2 = ES_n^2 = Σ_{i=1}^n EX_i^2,
and for x > 0,
  Δ_{n,x} = (1+x)^2/B_n^2 Σ_{i=1}^n EX_i^2 I{|X_i| > B_n/(1+x)}
          + (1+x)^3/B_n^3 Σ_{i=1}^n E|X_i|^3 I{|X_i| ≤ B_n/(1+x)}.
Theorem 1.5 [Jing-Shao-Wang (2003)] There is an absolute constant A (> 1) such that
  P(S_n ≥ x V_n) / (1 − Φ(x)) = e^{O(1) Δ_{n,x}}
for all x ≥ 0 satisfying
  x^2 max_{1≤i≤n} EX_i^2 ≤ B_n^2  and  Δ_{n,x} ≤ (1+x)^2/A,
where |O(1)| ≤ A.
Theorem 1.6 [Jing-Shao-Wang (2003)]
  |P(S_n/V_n ≥ x) − (1 − Φ(x))| ≤ A (1 + x)^3 e^{−x^2/2} B_n^{−3} Σ_{i=1}^n E|X_i|^3
for
  0 ≤ x ≤ B_n / ( Σ_{i=1}^n E|X_i|^3 )^{1/3}.
8. Cramér moderate deviations for self-normalized processes
8.1 A general result
Let ξ_1, ..., ξ_n be independent random variables satisfying
  Eξ_i = 0, 1 ≤ i ≤ n,  Σ_{i=1}^n Eξ_i^2 = 1.   (1.2)
Assume the nonlinear process of interest can be decomposed as a standardized partial sum of {ξ_i}_{i=1}^n, say W_n, plus a remainder, say D_{n,1}, while its self-normalized version can be written as
  T_n = (W_n + D_{n,1}) / (V_n (1 + D_{n,2})^{1/2}),   (1.3)
where
  W_n = Σ_{i=1}^n ξ_i,  V_n = ( Σ_{i=1}^n ξ_i^2 )^{1/2},
and D_{n,1}, D_{n,2} are measurable functions of {ξ_i}_{i=1}^n. Examples satisfying (1.3) include the t-statistic, Studentized U-statistics and L-statistics. In this section we establish a general Cramér type moderate deviation theorem for self-normalized processes T_n of the form (1.3). For 1 ≤ i ≤ n and x ≥ 0, let
  δ_{i,x} = (1+x)^2 E[ξ_i^2 I{(1+x)|ξ_i| > 1}] + (1+x)^3 E[|ξ_i|^3 I{(1+x)|ξ_i| ≤ 1}],
  Δ_{n,x} = Σ_{i=1}^n δ_{i,x}  and  I_{n,x} = E[e^{x W_n − x^2 V_n^2/2}] = Π_{i=1}^n E[e^{x ξ_i − x^2 ξ_i^2/2}].
Let D_{n,1}^{(i)} and D_{n,2}^{(i)}, for each 1 ≤ i ≤ n, be arbitrary measurable functions of {ξ_j}_{j=1, j≠i}^n, so that {D_{n,1}^{(i)}, D_{n,2}^{(i)}} and ξ_i are independent. Set also for x > 0
  R_{n,x} := I_{n,x}^{−1} { E[ (x|D_{n,1}| + x^2|D_{n,2}|) e^{Σ_{j=1}^n (x ξ_j − x^2 ξ_j^2/2)} ]
    + Σ_{i=1}^n E[ min(x|ξ_i|, 1) ( |D_{n,1} − D_{n,1}^{(i)}| + x|D_{n,2} − D_{n,2}^{(i)}| ) e^{Σ_{j≠i} (x ξ_j − x^2 ξ_j^2/2)} ] },
where Σ_{j≠i} = Σ_{j=1, j≠i}^n.
Theorem 1.7 (Shao and Zhou (2012)) Let T_n be defined by (1.3). Then there is an absolute constant A (> 1) such that
  P(T_n ≥ x) / (1 − Φ(x)) ≥ exp{O(1) Δ_{n,x}} (1 − A R_{n,x})   (1.4)
and
  P(T_n ≥ x) ≤ (1 − Φ(x)) exp{O(1) Δ_{n,x}} (1 + A R_{n,x})
    + P( |D_{n,1}|/V_n > 1/(4x) ) + P( |D_{n,2}| > 1/(4x^2) )   (1.5)
for all x > 1 satisfying
  max_{1≤i≤n} δ_{i,x} ≤ 1   (1.6)
and
  Δ_{n,x} ≤ (1+x)^2/A,   (1.7)
where |O(1)| ≤ A.
8.2 Studentized U-statistics
Let X_1, X_2, ..., X_n be a sequence of i.i.d. random variables and let h : R^m → R be a Borel measurable symmetric function of m variables, where 2 ≤ m < n/2. Consider Hoeffding's U-statistic with kernel h of degree m,
  U_n = (n choose m)^{−1} Σ_{1≤i_1<...<i_m≤n} h(X_{i_1}, ..., X_{i_m}),   (1.8)
which is an unbiased estimate of θ = Eh(X_1, ..., X_m). The U-statistic is a basic statistic and its asymptotic properties have been extensively studied in the literature. Let
  g(x) = Eh(x, X_2, ..., X_m), x ∈ R,  and  σ^2 = Var(g(X_1)).
The standardized (non-degenerate) U-statistic is
  Z_n = (√n/(m σ)) (U_n − θ),   (1.9)
where σ > 0 and m is fixed.
Since σ is usually unknown, consider the following Studentized U-statistic:
  T_n = (√n/(m s_1)) (U_n − θ),   (1.10)
where s_1^2 is the leave-one-out jackknife estimator of σ^2 given by
  s_1^2 = (n−1)/(n−m)^2 Σ_{i=1}^n (q_i − U_n)^2   (1.11)
with
  q_i = (n−1 choose m−1)^{−1} Σ_{(l_1,...,l_{m−1}) ∈ C_{m−1,i}} h(X_i, X_{l_1}, ..., X_{l_{m−1}})
and C_{m−1,i} = {(l_1, ..., l_{m−1}) : 1 ≤ l_1 < ... < l_{m−1} ≤ n, l_j ≠ i for 1 ≤ j ≤ m−1}.
As a direct but non-trivial consequence of Theorem 1.7, we can establish a sharp Cramér type moderate deviation theorem for the Studentized U-statistic T_n as follows.
Theorem 1.8 (Shao and Zhou (2012)) Let 2 < p ≤ 3 and assume 0 < σ_p := (E|g(X_1) − θ|^p)^{1/p} < ∞.
Suppose that there are constants c_0 ≥ 1 and τ ≥ 0 such that
  (h(x_1, ..., x_m) − θ)^2 ≤ c_0 ( τ σ^2 + Σ_{i=1}^m (g(x_i) − θ)^2 ).   (1.12)
Then there exists a constant A > 1 depending only on m such that
  P(T_n ≥ x) / (1 − Φ(x)) = 1 + O(1) { σ_p^p (1+x)^p / (σ^p n^{(p−2)/2}) + c_0 (1+τ) (1+x)^3 / n^{1/2} }   (1.13)
for any 0 < x ≤ (1/A) min{ σ n^{(p−2)/(2p)}/σ_p, n^{1/6}/(c_0 (1+τ))^{1/6} }, where |O(1)| ≤ A. In particular, we have
  P(T_n ≥ x) / (1 − Φ(x)) → 1   (1.14)
uniformly in 0 ≤ x ≤ o(n^{(p−2)/(2p)}).
Clearly, condition (1.12) is satisfied for the t-statistic (h(x_1, x_2) = (x_1 + x_2)/2, with c_0 = 2 and τ = 0), the sample variance (h(x_1, x_2) = (x_1 − x_2)^2/2, c_0 = 10, τ = θ^2/σ^2), Gini's mean difference (h(x_1, x_2) = |x_1 − x_2|, c_0 = 8, τ = θ^2/σ^2) and the one-sample Wilcoxon statistic (h(x_1, x_2) = 1{x_1 + x_2 ≤ 0}, c_0 = 1, τ = 1/σ^2). It would be interesting to know whether condition (1.12) can be weakened, but it seems impossible to remove it completely.
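A direct implementation of (1.8)-(1.11) is straightforward; the sketch below builds T_n for an arbitrary kernel and then checks it on the t-statistic kernel h(x_1, x_2) = (x_1 + x_2)/2, for which T_n collapses to the classical one-sample t-statistic:

```python
import numpy as np
from itertools import combinations

def studentized_u(x, h, m, theta=0.0):
    """Studentized U-statistic T_n = sqrt(n)(U_n - theta)/(m s_1),
    with the leave-one-out jackknife variance estimate (1.11)."""
    n = len(x)
    vals = {c: h(*x[list(c)]) for c in combinations(range(n), m)}
    U = np.mean(list(vals.values()))
    # q_i: average of the kernel over all m-subsets containing index i.
    q = np.array([np.mean([v for c, v in vals.items() if i in c])
                  for i in range(n)])
    s1_sq = (n - 1) / (n - m) ** 2 * np.sum((q - U) ** 2)
    return np.sqrt(n) * (U - theta) / (m * np.sqrt(s1_sq))

rng = np.random.default_rng(3)
x = rng.normal(size=15)

# With h(a, b) = (a + b)/2, U_n is the sample mean and T_n reduces
# to the classical one-sample t-statistic sqrt(n) * mean / sd.
t_u = studentized_u(x, lambda a, b: (a + b) / 2, m=2)
t_classic = np.sqrt(len(x)) * x.mean() / x.std(ddof=1)
assert np.isclose(t_u, t_classic)
```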
9. Miscellanies
- Horváth-Shao (1996): large deviations for S_n / max_{1≤i≤n} |X_i|
- Dembo-Shao (1998): self-normalized moderate and large deviations in R^d
- Giné-Mason (2001): sub-Gaussian property for X in the Feller class
- Bercu-Gassiat-Rio (2002): concentration inequalities, large and moderate deviations for self-normalized empirical processes
- De la Peña-Klass-Lai (2004): self-normalized processes: exponential inequalities, moment and limit theorems
- Jing-Shao-Zhou (2004): self-normalized saddlepoint approximations
- Jing-Shao-Wang (2003): the Studentized bootstrap
- He and Shao (1996): Bahadur efficiency of Studentized score tests
- He and Shao (1996, 2000): Bahadur representations for M-estimators
- Horváth and Shao (1996): change point analysis
- Chen and Shao (1997, 2000): Monte Carlo methods in Bayesian computation
- Liu and Shao (2013): Hotelling's T^2 statistic
Conclusion: self-normalized limit theorems
- Require little or no moment assumptions
- Results are more elegant and neater than many classical limit theorems
- Provide much wider applicability to other fields, and to statistics in particular