Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics. Wen-Xin Zhou


Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics. Wen-Xin Zhou, Department of Mathematics and Statistics, University of Melbourne. Joint work with Prof. Qi-Man Shao. Program on Self-Normalized Asymptotic Theory in Probability, Statistics and Econometrics, IMS (NUS), 21 May 2014.

Outline
1. Mann-Whitney U-test: motivating examples
2. Studentized U-statistics
3. Cramér-type moderate deviations for Studentized U-statistics
4. Refined moderate deviations for the two-sample t-statistic
5. Applications

1. Mann-Whitney U-test: Motivating examples

Assume that we observe two independent samples $X, X_1, \dots, X_m \overset{\text{i.i.d.}}{\sim} P$ and $Y, Y_1, \dots, Y_n \overset{\text{i.i.d.}}{\sim} Q$, where both $P$ and $Q$ are continuous distribution functions. The Mann-Whitney U-test statistic is given by

$U_{m,n} = \frac{1}{mn} \sum_{i=1}^m \sum_{j=1}^n I\{X_i \le Y_j\}.$

Without the continuity assumption, we could simply use

$\tilde U_{m,n} = \frac{1}{mn} \sum_{i=1}^m \sum_{j=1}^n \Bigl( I\{X_i < Y_j\} + \frac{1}{2}\, I\{X_i = Y_j\} - \frac{1}{2} \Bigr),$

which takes into account the possibility of having ties.
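As a concrete illustration, the tie-corrected statistic can be computed directly from its definition. Below is a minimal O(mn) sketch; the function name and the uncentered convention (it returns the estimated fraction of pairs with x below y, counting ties as one half, rather than the centered version) are our own choices:

```python
def mann_whitney_u(xs, ys):
    """Tie-corrected Mann-Whitney statistic: the fraction of pairs (X_i, Y_j)
    with X_i < Y_j, counting ties X_i = Y_j as 1/2 each.
    Subtracting 1/2 from the result gives a statistic centered at 0 under H0."""
    m, n = len(xs), len(ys)
    total = 0.0
    for x in xs:
        for y in ys:
            if x < y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (m * n)

print(mann_whitney_u([1, 2, 3], [2, 3, 4]))  # 7/9: most pairs have x < y
```

For two identical samples the statistic equals 1/2, the value targeted by the null hypothesis.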

The Mann-Whitney (M-W) test was originally introduced to test the null hypothesis $H_0: P = Q$. Alternatively, the M-W test has been widely used for testing the equality of means or medians, as a nonparametric counterpart of the t-test.

Advantages: easy to implement, good efficiency, robustness against parametric assumptions, etc.

Disadvantages (Chung and Romano, 2011): the M-W test is only valid if the fundamental assumption of identical distributions holds; that is, $P = Q$. Why?

Misapplication of the M-W test (a toy example): assume $P = N(\log 2, 1)$ and $Q = \mathrm{Exp}(1)$, so that $\mathrm{Median}(P) = \mathrm{Median}(Q) = \log 2$. With $\alpha = 0.05$, Monte Carlo simulation shows that the rejection probability of the two-sided M-W test is 0.2843. In fact, the M-W test only captures divergence from $P(X \le Y) = \frac{1}{2}$; and for $X \sim N(\log 2, 1)$, $Y \sim \mathrm{Exp}(1)$, we have $P(X < Y) = 0.5569$, i.e. $P(X > Y) = 0.4431$.
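The toy example is easy to reproduce by simulation. The sketch below (sample size and seed are arbitrary choices of ours) estimates $P(X < Y)$ for $X \sim N(\log 2, 1)$ and $Y \sim \mathrm{Exp}(1)$, two distributions with the same median $\log 2$:

```python
import math
import random

random.seed(7)

# X ~ N(log 2, 1) and Y ~ Exp(1) share the median log 2, yet P(X < Y) != 1/2,
# which is the quantity the Mann-Whitney test is actually sensitive to.
reps = 200_000
hits = sum(
    random.gauss(math.log(2), 1.0) < random.expovariate(1.0)
    for _ in range(reps)
)
print(hits / reps)  # about 0.557, not 1/2 (so P(X > Y) is about 0.443)
```

Despite the equal medians, the probability is well away from 1/2, which is why the M-W test over-rejects here.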

A real data example (testing equality of means): Sutter (2009, AER) performed the M-W U-test to examine the effects of salient group membership on individual behavior, and compared them to those of team decision making. The average investments in PAY-COMM (payoff commonality), with 18 observations, and MESSAGE (exchange of messages), with 24 observations, are compared. Estimated densities (kernel method) for PAY-COMM and MESSAGE, denoted by P and Q respectively, are plotted in the figure below.

Using the M-W test, Sutter rejects the null hypothesis that the average investments in PAY-COMM and MESSAGE are the same at the 10% significance level, with a p-value of 0.069. At the conventional 5% significance level, however, the M-W test would have failed to reject the null hypothesis. Using the Studentized permutation t-test instead yields a p-value of 0.042 and rejects the null hypothesis at the 5% significance level (Chung and Romano, 2011).

A third example (testing equality of distributions): Plott and Zeiler (2005, AER) used the M-W U-test to examine the null hypothesis that willingness to pay (WTP) and willingness to accept (WTA) are drawn from the same distribution. Estimated densities, P for WTP and Q for WTA, are plotted below.

The M-W test yields a z-value of 1.738 (p-value = 0.0821), failing to reject the null hypothesis. This is not a surprising outcome: when testing equality of distributions, it is preferable to use a statistic that captures differences between the entire distributions, such as the Kolmogorov-Smirnov or the Cramér-von Mises statistic, rather than one assessing divergence in a particular parameter of the distributions. In the same example, the Cramér-von Mises statistic yields a p-value of 0.0546.

Conclusion: the Mann-Whitney U-test has been misused in many experimental applications. As opposed to testing equality of medians, means or distributions, what the M-W test is actually testing is

$H_0: P(X \le Y) = \tfrac{1}{2}$ versus $H_1: P(X \le Y) > \tfrac{1}{2}$, or
$H_0: P(X \le Y) \le \tfrac{1}{2}$ versus $H_1: P(X \le Y) > \tfrac{1}{2}$.

Testing $H_0$ arises in many applications, including:
- testing whether the physiological performance of an active drug is better than that under the control treatment;
- testing the effects of a policy, such as unemployment insurance or a vocational training program, on the level of unemployment.

Self-normalization: even when testing the null $H_0: P(X \le Y) = \frac{1}{2}$, the standard Mann-Whitney test is invalid unless it is appropriately Studentized.

Recall that $X_1, \dots, X_m \overset{\text{i.i.d.}}{\sim} P_X$ and $Y_1, \dots, Y_n \overset{\text{i.i.d.}}{\sim} Q_Y$. Let $\min(m, n) \to \infty$ with $\frac{m}{m+n} \to \rho \in (0, 1)$. Then the distribution of $\sqrt{m}\,(U_{m,n} - \theta)$ is asymptotically normal with mean 0 and variance

$\mathrm{Var}\bigl(\bar Q_Y(X_i)\bigr) + \frac{\rho}{1-\rho}\, \mathrm{Var}\bigl(P_X(Y_j)\bigr),$

where $\bar Q_Y(y) = 1 - Q_Y(y)$. In other words, $\sqrt{m}\,(U_{m,n} - \theta)$ is not asymptotically pivotal: its limiting variance depends on the unknown $P_X$ and $Q_Y$.

2. Studentized U-statistics

Let $X, X_1, \dots, X_n$ be i.i.d. random variables taking values in a measurable space $(\mathcal{X}, \mathscr{X})$, and consider a U-statistic of the form

$U_n = \binom{n}{d}^{-1} \sum_{1 \le i_1 < \cdots < i_d \le n} h(X_{i_1}, \dots, X_{i_d})$

with a symmetric kernel $h: \mathcal{X}^d \to \mathbb{R}$, where $1 \le d \le n/2$. Write

$\theta = E\, h(X_1, \dots, X_d), \qquad h_1(x) = E\bigl(h(X_1, \dots, X_d) - \theta \mid X_d = x\bigr).$

If $\sigma^2 := \mathrm{Var}(h_1(X)) > 0$, the standardized non-degenerate U-statistic is given by

$Z_n = \frac{\sqrt{n}}{d\,\sigma}\,(U_n - \theta).$

Because $\sigma^2$ is always unknown, we are interested in the following Studentized (self-normalized) U-statistic:

$\hat U_n = \frac{\sqrt{n}}{d\,\hat\sigma}\,(U_n - \theta),$

where $\hat\sigma^2$ denotes the leave-one-out jackknife estimator of $\sigma^2$; that is,

$\hat\sigma^2 = \frac{n-1}{(n-d)^2} \sum_{i=1}^n (q_i - U_n)^2, \quad \text{where} \quad q_i = \binom{n-1}{d-1}^{-1} \sum_{\substack{1 \le i_1 < \cdots < i_{d-1} \le n \\ i_j \ne i,\; j = 1, \dots, d-1}} h(X_i, X_{i_1}, \dots, X_{i_{d-1}}).$

Examples of kernels:
- t-statistic: $h(x_1, x_2) = \frac{1}{2}(x_1 + x_2)$
- Sample variance: $h(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$
- Gini's mean difference: $h(x_1, x_2) = |x_1 - x_2|$
- One-sample Wilcoxon statistic: $h(x_1, x_2) = I\{x_1 + x_2 \le 0\}$
- Kendall's $\tau$: $h(x_1, x_2) = 2\, I\{(x_2^{(2)} - x_1^{(2)})(x_2^{(1)} - x_1^{(1)}) > 0\} - 1$
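To make the jackknife recipe concrete, here is a small sketch for Gini's mean difference (kernel $h(x_1, x_2) = |x_1 - x_2|$, $d = 2$); the function names are ours, and no attempt is made at efficiency:

```python
import math

def gini_u_and_jackknife(xs):
    """U_n for the kernel h(x1, x2) = |x1 - x2| (d = 2), together with the
    leave-one-out jackknife estimate
        sigma_hat^2 = (n-1)/(n-d)^2 * sum_i (q_i - U_n)^2,
    where q_i averages h(x_i, x_j) over the n-1 indices j != i."""
    n, d = len(xs), 2
    u_n = sum(abs(xs[i] - xs[j]) for i in range(n) for j in range(i + 1, n)) \
        / (n * (n - 1) / 2)
    q = [sum(abs(xs[i] - xs[j]) for j in range(n) if j != i) / (n - 1)
         for i in range(n)]
    var_hat = (n - 1) / (n - d) ** 2 * sum((qi - u_n) ** 2 for qi in q)
    return u_n, var_hat

def studentized_u(xs, theta):
    """hat U_n = sqrt(n) * (U_n - theta) / (d * sigma_hat), with d = 2."""
    u_n, var_hat = gini_u_and_jackknife(xs)
    return math.sqrt(len(xs)) * (u_n - theta) / (2 * math.sqrt(var_hat))

print(gini_u_and_jackknife([0, 1, 2]))  # (4/3, 1/3)
```

For `xs = [0, 1, 2]` one can check by hand that $U_n = 4/3$, $q = (3/2, 1, 3/2)$, and $\hat\sigma^2 = 1/3$.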

The construction of U-statistics arises most naturally from a paired comparison experiment based on independent random variables $(X_i, Y_i)$, $i \ge 1$. Denote by $(F, G)$ a pair of probability measures and assume $X \sim F$, $Y \sim G$.

Consider independent samples $X_1, \dots, X_m$ from $F$ and $Y_1, \dots, Y_n$ from $G$. Let $h(x_1, \dots, x_d, y_1, \dots, y_s)$ be a kernel which is symmetric under independent permutations of $x_1, \dots, x_d$ and of $y_1, \dots, y_s$. The corresponding U-statistic is

$U_{m,n} = \binom{m}{d}^{-1} \binom{n}{s}^{-1} \sum_{1 \le i_1 < \cdots < i_d \le m} \; \sum_{1 \le j_1 < \cdots < j_s \le n} h(X_{i_1}, \dots, X_{i_d}, Y_{j_1}, \dots, Y_{j_s}),$

which is an unbiased estimate of $\theta = E\, h(X_1, \dots, X_d, Y_1, \dots, Y_s)$.

Examples

A two-sample comparison of means: let $h(x, y) = x - y$, a kernel of degree $(d, s) = (1, 1)$. Then $\theta = EX - EY$ and the corresponding U-statistic is

$U_{m,n} = \frac{1}{mn} \sum_{i=1}^m \sum_{j=1}^n (X_i - Y_j) = \bar X_m - \bar Y_n.$

The Wilcoxon (1945) and Mann-Whitney (1947) two-sample test: let the kernel be $h(x, y) = I\{x \le y\}$, with $\theta = P(X \le Y)$.

The Lehmann statistic (1951), the Kochar statistic (1979), etc.

In the non-degenerate case where $\sigma_1^2 = \mathrm{Var}(h_1(X)) > 0$ and $\sigma_2^2 = \mathrm{Var}(h_2(Y)) > 0$, with

$h_1(x) = E\bigl(h(X_1, \dots, X_d, Y_1, \dots, Y_s) - \theta \mid X_1 = x\bigr),$
$h_2(y) = E\bigl(h(X_1, \dots, X_d, Y_1, \dots, Y_s) - \theta \mid Y_1 = y\bigr),$

the Studentized two-sample U-statistic is given by

$\hat U_{m,n} = \hat\sigma_{m,n}^{-1}\,(U_{m,n} - \theta) \quad \text{with} \quad \hat\sigma_{m,n}^2 = d^2 \hat\sigma_1^2/m + s^2 \hat\sigma_2^2/n,$

where

$\hat\sigma_1^2 = \frac{1}{m-1} \sum_{i=1}^m \Bigl(q_i - \frac{1}{m} \sum_{i=1}^m q_i\Bigr)^2, \qquad \hat\sigma_2^2 = \frac{1}{n-1} \sum_{j=1}^n \Bigl(p_j - \frac{1}{n} \sum_{j=1}^n p_j\Bigr)^2,$

$q_i = \binom{m-1}{d-1}^{-1} \binom{n}{s}^{-1} \sum h(X_i, X_{i_1}, \dots, X_{i_{d-1}}, Y_{j_1}, \dots, Y_{j_s}),$
$p_j = \binom{m}{d}^{-1} \binom{n-1}{s-1}^{-1} \sum h(X_{i_1}, \dots, X_{i_d}, Y_j, Y_{j_1}, \dots, Y_{j_{s-1}}).$
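Specializing to the Mann-Whitney kernel $h(x, y) = I\{x \le y\}$ with $d = s = 1$, the $q_i$ become row means and the $p_j$ column means of the indicator matrix, and the Studentized statistic takes a very simple form. A minimal sketch (the function name is ours):

```python
import math

def studentized_mw(xs, ys, theta=0.5):
    """Studentized Mann-Whitney statistic for the kernel h(x, y) = I{x <= y}
    with d = s = 1: hat U = (U - theta) / sqrt(s1^2/m + s2^2/n),
    where s1^2, s2^2 are the sample variances of the q_i and p_j."""
    m, n = len(xs), len(ys)
    h = [[1.0 if x <= y else 0.0 for y in ys] for x in xs]
    u = sum(map(sum, h)) / (m * n)
    q = [sum(row) / n for row in h]                             # q_i: row means
    p = [sum(h[i][j] for i in range(m)) / m for j in range(n)]  # p_j: column means
    s1 = sum((qi - u) ** 2 for qi in q) / (m - 1)
    s2 = sum((pj - u) ** 2 for pj in p) / (n - 1)
    return (u - theta) / math.sqrt(s1 / m + s2 / n)
```

Note that for $d = s = 1$ the averages of the $q_i$ and of the $p_j$ both equal $U_{m,n}$, so the sample variances above match $\hat\sigma_1^2$ and $\hat\sigma_2^2$ as defined on the slide.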

3. Cramér-type moderate deviations for Studentized U-statistics

One-sample case:

Theorem (Shao and Z., 2012). Assume that $v_p := (E|h_1(X)|^p)^{1/p} < \infty$ for some $2 < p \le 3$, and that

$(h(x_1, \dots, x_d) - \theta)^2 \le c_0 \Bigl(\kappa \sigma^2 + \sum_{j=1}^d h_1^2(x_j)\Bigr) \qquad (1)$

for some constants $c_0 \ge 1$ and $\kappa \ge 0$. Then there exists a positive constant $C$ depending only on $d$ such that

$P(\hat U_n \ge x)/(1 - \Phi(x)) = 1 + O(1) \Bigl(\frac{v_p^p (1+x)^p}{\sigma^p n^{p/2-1}} + \frac{a_d (1+x)^3}{n}\Bigr)$

holds uniformly for $0 \le x \le C^{-1} \min\bigl((\sigma/v_p)\, n^{1/2-1/p}, (n/a_d)^{1/6}\bigr)$, where $a_d = c_0(\kappa + d)$ and $|O(1)| \le C$.

Condition (1) on the kernel function is satisfied by the class of bounded kernels, such as the one-sample Wilcoxon statistic, Kendall's tau, Spearman's rho, etc. More importantly, it extends the boundedness assumption to more general settings.

Analogously, in the two-sample case we require that

$(h(x_1, \dots, x_d, y_1, \dots, y_s) - \theta)^2 \le c_0 \Bigl(\kappa \sigma^2 + \sum_{i=1}^d h_1^2(x_i) + \sum_{j=1}^s h_2^2(y_j)\Bigr) \qquad (2)$

where $\sigma^2 = \sigma_1^2 + \sigma_2^2 = \mathrm{Var}(h_1(X)) + \mathrm{Var}(h_2(Y))$.

Two-sample case:

Theorem (Shao and Z., 2014). Assume that $v_{1,p} = (E|h_1(X)|^p)^{1/p}$ and $v_{2,p} = (E|h_2(Y)|^p)^{1/p}$ are finite for some $2 < p \le 3$, and that condition (2) holds. Then there exists a positive constant $C$ depending only on $(d, s)$ such that

$P(\hat U_{m,n} \ge x)/(1 - \Phi(x)) = 1 + O(1)\, R_{m,n}(x)$

holds uniformly for $0 \le x \le C^{-1} A_{m,n}$, where

$A_{m,n} = \min\Bigl((\sigma_1/v_{1,p})\, m^{1/2-1/p},\; (\sigma_2/v_{2,p})\, n^{1/2-1/p},\; a_{d,s}^{-1/6} \bigl(mn/(m+n)\bigr)^{1/6}\Bigr),$

$R_{m,n}(x) = (v_{1,p}/\sigma_1)^p\, \frac{(1+x)^p}{m^{p/2-1}} + (v_{2,p}/\sigma_2)^p\, \frac{(1+x)^p}{n^{p/2-1}} + a_{d,s}\, (1+x)^3\, \frac{m+n}{mn},$

with $a_{d,s} = c_0(\kappa + d + s)$ and $|O(1)| \le C$.

Corollary (moderate deviation for Studentized two-sample U-statistics with first-order accuracy). Assume w.l.o.g. that $n = \min(m, n)$, that $E h_1^2(X) > 0$ and $E h_2^2(Y) > 0$, and that $E|h_1(X)|^{2+\delta}$ and $E|h_2(Y)|^{2+\delta}$ are finite for some $0 < \delta \le 1$. Then

$P(\hat U_{m,n} \ge x)/(1 - \Phi(x)) = 1 + O(1)(1+x)^{2+\delta}\, n^{-\delta/2}$

holds uniformly over $x \in [0, o(n^{\delta/(4+2\delta)}))$.

This result clarifies the dependence between the range of uniform convergence of the relative error in the central limit theorem and the required (heavy-tailed) moment conditions.

Question: under higher-order moment conditions, say $\delta > 1$, is it possible to obtain a better approximation (second-order accuracy) to the tail probability $P(\hat U_{m,n} \ge x)$ for $c_1 n^{1/6} \le x \le c_2 n^{1/2}$?

Self-normalized moderate deviations for independent r.v.'s: let $X_1, X_2, \dots$ be independent random variables with $EX_i = 0$. Write

$S_n = \sum_{i=1}^n X_i, \qquad V_n^2 = \sum_{i=1}^n X_i^2.$

Self-normalized moderate deviation results describe the rate of convergence of the relative error of $P(S_n/V_n \ge x)$ to $1 - \Phi(x)$. The corresponding results with first- and second-order accuracy were established in Jing, Shao and Wang (2003) (first-order accuracy under finite third moments) and Wang (2005, 2011) (second-order accuracy under finite fourth moments).

Write

$B_n^2 = \sum_{i=1}^n EX_i^2, \qquad L_{kn} = B_n^{-k} \sum_{i=1}^n E|X_i|^k, \quad k \ge 3.$

Theorem (Jing, Shao and Wang, 2003). If $X_1, X_2, \dots$ are independent r.v.'s with $EX_i = 0$ and $0 < E|X_i|^3 < \infty$, then

$P(S_n/V_n \ge x)/(1 - \Phi(x)) = 1 + O(1)(1+x)^3 L_{3n}$

for $0 \le x \le L_{3n}^{-1/3}$, where $O(1)$ is bounded by an absolute constant.
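The first-order result is easy to probe numerically. The sketch below (the sample size, the level $x$, and the choice of standard normal summands are arbitrary choices of ours) compares the simulated tail of $S_n/V_n$ with $1 - \Phi(x)$:

```python
import math
import random

random.seed(1)

def self_normalized_tail_ratio(n, x, reps=50_000):
    """Monte Carlo estimate of P(S_n/V_n >= x) / (1 - Phi(x))
    for i.i.d. standard normal summands."""
    hits = 0
    for _ in range(reps):
        xs = [random.gauss(0.0, 1.0) for _ in range(n)]
        s = sum(xs)
        v = math.sqrt(sum(xi * xi for xi in xs))
        if s / v >= x:
            hits += 1
    normal_tail = 0.5 * math.erfc(x / math.sqrt(2))  # 1 - Phi(x)
    return (hits / reps) / normal_tail

print(self_normalized_tail_ratio(30, 2.0))  # close to 1, as the theorem predicts
```

For heavier-tailed summands with finite third moment the ratio remains close to 1 over the range $0 \le x \le L_{3n}^{-1/3}$, which is the content of the theorem.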

Theorem (Wang, 2011). If $X_1, X_2, \dots$ are independent r.v.'s with $EX_i = 0$ and $0 < EX_i^4 < \infty$, then

$P(S_n/V_n \ge x)/(1 - \Phi(x)) = \exp\Bigl(-\frac{x^3}{3 B_n^3} \sum_{i=1}^n EX_i^3\Bigr) \Bigl(1 + O(1)\bigl((1+x) L_{3n} + (1+x)^4 L_{4n}\bigr)\Bigr)$

for $0 \le x \le C^{-1} L_{4n}^{-1/4}$, where $O(1)$ is bounded by an absolute constant.

Question: does a similar expansion hold for more general Studentized nonlinear statistics, such as Hoeffding's class of U-statistics after suitable Studentization?

4. Two-sample t-statistic: A first attempt

As a prototypical example of a two-sample U-statistic, the two-sample t-statistic is of significant interest due to its wide applicability.

Advantages:
- A high degree of robustness against heavy-tailed data. The robustness of the t-statistic is useful in high-dimensional data analysis under a sparsity assumption on the signal of interest (Delaigle, Hall and Jin, 2011).
- When dealing with two experimental groups, typically assumed to be independent, in scientifically controlled experiments, the two-sample t-statistic is one of the most commonly used statistics for hypothesis testing and for constructing confidence intervals for the difference between the means of the two groups.

Let $X_1, \dots, X_m$ be a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$, and let $Y_1, \dots, Y_n$ be a random sample from another population with mean $\mu_2$ and variance $\sigma_2^2$. Assume that the two random samples are drawn independently. The two-sample t-statistic is defined as

$T_{m,n} = \frac{\bar X_m - \bar Y_n}{\sqrt{\hat\sigma_1^2/m + \hat\sigma_2^2/n}}, \quad \text{where} \quad \hat\sigma_1^2 = \frac{1}{m-1} \sum_{i=1}^m (X_i - \bar X_m)^2, \quad \hat\sigma_2^2 = \frac{1}{n-1} \sum_{j=1}^n (Y_j - \bar Y_n)^2.$
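This is the familiar unpooled (Welch-type) statistic; a direct sketch of the definition:

```python
import math

def two_sample_t(xs, ys):
    """T_{m,n} = (Xbar_m - Ybar_n) / sqrt(s1^2/m + s2^2/n), with the
    unbiased sample variances s1^2, s2^2 (no pooling)."""
    m, n = len(xs), len(ys)
    xbar, ybar = sum(xs) / m, sum(ys) / n
    s1 = sum((x - xbar) ** 2 for x in xs) / (m - 1)
    s2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    return (xbar - ybar) / math.sqrt(s1 / m + s2 / n)

print(two_sample_t([1, 2, 3], [2, 4, 6]))  # -2 / sqrt(5/3), about -1.549
```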

Corollary. Assume that $\mu_1 = \mu_2$, and $E|X_1|^{2+\delta} < \infty$, $E|Y_1|^{2+\delta} < \infty$ for some $0 < \delta \le 1$. Then

$P(T_{m,n} \ge x)/(1 - \Phi(x)) = 1 + O(1)(1+x)^{2+\delta} \bigl((v_{1,2+\delta}/\sigma_1)^{2+\delta}\, m^{-\delta/2} + (v_{2,2+\delta}/\sigma_2)^{2+\delta}\, n^{-\delta/2}\bigr)$

holds uniformly for $0 \le x \le C^{-1} \min\bigl((\sigma_1/v_{1,2+\delta})\, m^{\delta/(4+2\delta)}, (\sigma_2/v_{2,2+\delta})\, n^{\delta/(4+2\delta)}\bigr)$, where $v_{1,s} = (E|X_1 - \mu_1|^s)^{1/s}$, $v_{2,s} = (E|Y_1 - \mu_2|^s)^{1/s}$ for all $s > 2$, and $|O(1)| \le C$.

Corollary. Assume that $\mu_1 = \mu_2$ and $E|X_1|^{2+\delta}, E|Y_1|^{2+\delta} < \infty$ for some $1 < \delta \le 2$, and write $\gamma_1 = E(X_1 - \mu_1)^3$, $\gamma_2 = E(Y_1 - \mu_2)^3$. Then

$P(T_{m,n} \ge x)/(1 - \Phi(x)) = \exp\Bigl(-\frac{x^3 (\gamma_1/m^2 - \gamma_2/n^2)}{3\, (\sigma_1^2/m + \sigma_2^2/n)^{3/2}}\Bigr) \bigl(1 + O(1)\, R_{m,n}(x)\bigr)$

(the combination $\gamma_1/m^2 - \gamma_2/n^2$ is the third cumulant of $\bar X_m - \bar Y_n$) holds uniformly for $0 \le x \le C^{-1} A_{m,n}$, where

$A_{m,n} = \min\bigl((\sigma_1/v_{1,2+\delta})\, m^{\delta/(4+2\delta)}, (\sigma_2/v_{2,2+\delta})\, n^{\delta/(4+2\delta)}\bigr),$

$R_{m,n}(x) = (v_{1,3}/\sigma_1)^3 (1+x)\, m^{-1/2} + (v_{1,2+\delta}/\sigma_1)^{2+\delta} (1+x)^{2+\delta}\, m^{-\delta/2} + (v_{2,3}/\sigma_2)^3 (1+x)\, n^{-1/2} + (v_{2,2+\delta}/\sigma_2)^{2+\delta} (1+x)^{2+\delta}\, n^{-\delta/2},$

$v_{1,s} = (E|X_1 - \mu_1|^s)^{1/s}$, $v_{2,s} = (E|Y_1 - \mu_2|^s)^{1/s}$ for $s > 2$, and $|O(1)| \le C$.

Question: do similar moderate deviation results with second-order accuracy hold for general Studentized U-statistics?

Standardized non-degenerate (one-sample) U-statistics (Borovskikh and Weber, 2001): recall that $X_1, X_2, \dots$ are i.i.d. r.v.'s with distribution $F$. Assume that $E \exp(c\, |h(X_1, \dots, X_d)|) < \infty$ for some $c > 0$; then

$P\bigl(\sqrt{n}\,(U_n - \theta)/(d\sigma) \ge x\bigr)/(1 - \Phi(x)) = \exp\Bigl(\frac{x^3}{\sqrt{n}}\, \lambda_F\Bigl(\frac{x}{\sqrt{n}}\Bigr)\Bigr) \Bigl(1 + O\Bigl(\frac{1+x}{\sqrt{n}}\Bigr)\Bigr)$

holds for $x = o(\sqrt{n})$, with the Cramér series $\lambda_F(u) = \lambda_{0,F} + \lambda_{1,F}\, u + \lambda_{2,F}\, u^2 + \cdots$, which is convergent for small $|u|$.

5. Applications

Multiple hypothesis testing in high dimensions: analysis of gene expression microarray data, where the purpose is to examine whether each gene in isolation behaves differently in a control group versus an experimental group. The statistical model is

$X_{g,i} = \mu_{g,1} + \varepsilon_{g,i}, \quad i = 1, \dots, m; \qquad Y_{g,j} = \mu_{g,2} + \omega_{g,j}, \quad j = 1, \dots, n,$

for $g = 1, \dots, G$, where $g$ indexes the genes, $i$ and $j$ index the arrays, and $\mu_{g,1}$, $\mu_{g,2}$ are the mean effects for the $g$th gene from the first and the second group. For each $g$, $\varepsilon_{g,1}, \dots, \varepsilon_{g,m}$ (resp. $\omega_{g,1}, \dots, \omega_{g,n}$) are independent r.v.'s with mean zero and variance $\sigma_{g,1}^2 > 0$ (resp. $\sigma_{g,2}^2 > 0$).

For the $g$th marginal test, when the population variances $\sigma_{g,1}^2$ and $\sigma_{g,2}^2$ are unequal, the two-sample t-statistic is most commonly used to test the null $H_{g,0}: \mu_{g,1} = \mu_{g,2}$ against the alternative $H_{g,1}: \mu_{g,1} \ne \mu_{g,2}$. Nonparametric alternative: the Studentized Mann-Whitney U-test.
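A minimal end-to-end sketch of such a marginal testing pipeline on synthetic data, using normal calibration of the p-values and the Benjamini-Hochberg step-up procedure for FDR control (all sizes, seeds, and effect sizes below are illustrative choices of ours, not taken from the slides):

```python
import math
import random

random.seed(0)

def welch_t(xs, ys):
    """Two-sample t-statistic with unpooled variances."""
    m, n = len(xs), len(ys)
    xbar, ybar = sum(xs) / m, sum(ys) / n
    s1 = sum((x - xbar) ** 2 for x in xs) / (m - 1)
    s2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    return (xbar - ybar) / math.sqrt(s1 / m + s2 / n)

def normal_two_sided_p(t):
    # Normal calibration of the marginal p-value; the moderate deviation
    # theory above quantifies its relative error in the tails.
    return math.erfc(abs(t) / math.sqrt(2))

def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    order = sorted(range(len(pvals)), key=lambda g: pvals[g])
    G = len(pvals)
    k = 0
    for rank, g in enumerate(order, start=1):
        if pvals[g] <= alpha * rank / G:
            k = rank
    return set(order[:k])

# Synthetic toy data: 200 "genes", the first 20 with a true mean shift.
m, n, G = 18, 24, 200
pvals = []
for g in range(G):
    shift = 2.0 if g < 20 else 0.0
    xs = [shift + random.gauss(0, 1) for _ in range(m)]
    ys = [random.gauss(0, 1) for _ in range(n)]
    pvals.append(normal_two_sided_p(welch_t(xs, ys)))
rejected = benjamini_hochberg(pvals, alpha=0.05)
print(len(rejected))  # roughly the 20 shifted genes, plus few false positives
```

With these sample sizes the normal calibration is slightly anti-conservative relative to the exact t distribution; the bootstrap corrections discussed below address exactly this kind of tail inaccuracy when G is large.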

Write $\mathcal{H}_1 = \{g = 1, \dots, G : \mu_{g,1} \ne \mu_{g,2}\}$ and assume that $\lim_{G \to \infty} G^{-1} |\mathcal{H}_1| = \pi_1 \in [0, 1)$. Let $\alpha \in (0, 1)$ be fixed.

Non-sparse setting ($0 < \pi_1 < 1$): Cao and Kosorok (2011) proposed a method to compute critical values directly for rejection regions to control the FDR, for heavy-tailed data (finiteness of fourth moments) and when $\log G = o(n^{1/3})$.

Sparse setting ($|\mathcal{H}_1| \sim G^\eta$, $0 < \eta < 1$): adapting the regularized bootstrap correction method proposed by Liu and Shao (2013) to the current problem using two-sample t-statistics, the bootstrap calibration is accurate for heavy-tailed data (finiteness of sixth moments), provided that $\log G = o(n^{1/2})$ with $n = \min(m, n)$.

Real data example: the above procedures can be applied to the analysis of a leukemia cancer data set (Golub et al., 1999) in order to identify differentially expressed genes between ALL (acute lymphoblastic leukemia) and AML (acute myeloid leukemia). The raw data consist of $G = 7129$ genes, with 47 samples in class ALL and 25 in class AML.

Testing many moment inequalities (Chernozhukov, Chetverikov and Kato, 2013): let $X_1, \dots, X_m$ (resp. $Y_1, \dots, Y_n$) be i.i.d. random vectors in $\mathbb{R}^p$, where $X_i = (X_{i1}, \dots, X_{ip})^T$ (resp. $Y_j = (Y_{j1}, \dots, Y_{jp})^T$). Consider the null hypothesis

$H_0: EX_{1l} \le EY_{1l} \text{ for all } l \in [p] \quad \text{versus} \quad H_1: EX_{1l} > EY_{1l} \text{ for some } l \in [p],$

or

$H_0: P(X_{1l} \le Y_{1l}) \le \tfrac{1}{2} \text{ for all } l \in [p] \quad \text{versus} \quad H_1: P(X_{1l} \le Y_{1l}) > \tfrac{1}{2} \text{ for some } l \in [p],$

where $[p] = \{1, \dots, p\}$.
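For intuition, the max-type statistic typically used for such problems can be sketched as follows. Note that the calibration of its critical value (via Gaussian or bootstrap approximation), which is the substance of Chernozhukov et al., is omitted here, and all names and sizes are our own:

```python
import math
import random

random.seed(3)

def max_t_statistic(X, Y):
    """Max over coordinates l of the marginal two-sample t-statistics,
    a natural test statistic for H0: E X_{1l} <= E Y_{1l} for all l.
    Large values are evidence against H0."""
    p = len(X[0])
    stats = []
    for l in range(p):
        xs = [row[l] for row in X]
        ys = [row[l] for row in Y]
        m, n = len(xs), len(ys)
        xbar, ybar = sum(xs) / m, sum(ys) / n
        s1 = sum((x - xbar) ** 2 for x in xs) / (m - 1)
        s2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)
        stats.append((xbar - ybar) / math.sqrt(s1 / m + s2 / n))
    return max(stats)

# Toy data satisfying H0 (both samples standard normal in every coordinate).
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(40)]
Y = [[random.gauss(0, 1) for _ in range(5)] for _ in range(40)]
print(max_t_statistic(X, Y))
```

The Studentized Mann-Whitney version replaces each marginal t-statistic by the Studentized U-statistic for the kernel $I\{x \le y\}$.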

References

1. CHERNOZHUKOV, V., CHETVERIKOV, D. and KATO, K. (2013). Testing many moment inequalities. arXiv:1312.7614.
2. CHUNG, E. and ROMANO, J. (2011). Asymptotically valid and exact permutation tests based on two-sample U-statistics. Technical report.
3. JING, B.-Y., SHAO, Q.-M. and WANG, Q. (2003). Self-normalized Cramér-type large deviations for independent random variables. Ann. Probab. 31 2167-2215.
4. SHAO, Q.-M. and ZHOU, W.-X. (2012). Cramér type moderate deviation theorems for self-normalized processes. arXiv:1405.1218.
5. SHAO, Q.-M. and ZHOU, W.-X. (2014). Cramér type moderate deviations for Studentized two-sample U-statistics with applications. Technical report.