Empirical Likelihood Tests for High-dimensional Data


Empirical Likelihood Tests for High-dimensional Data
Department of Statistics and Actuarial Science, University of Waterloo, Canada
ICSA Canada Chapter 2013 Symposium, Toronto, August 2-3, 2013
Based on joint work with Liang Peng and Rongmao Zhang


Introduction. In this talk we discuss empirical likelihood ratio tests for high-dimensional data. Let $X_i = (X_{i1}, \ldots, X_{ip})^T$, $i = 1, 2, \ldots, n$, be iid random vectors with mean $\mu = (\mu_1, \ldots, \mu_p)^T$ and covariance $\Sigma = (\sigma_{ij})_{1 \le i,j \le p}$. Here $p$ may depend on $n$. If $p$ is fixed, this is the traditional statistical setting; if $p \to \infty$, it is the high-dimensional setting.

Typical testing questions: testing $\mu = \mu_0$ (one-sample mean test); two-sample mean testing; testing $\Sigma = \Sigma_0$ (covariance matrix test); two-sample covariance matrix testing. We focus on testing covariance matrices.

Problem setup. Let $X_i = (X_{i1}, \ldots, X_{ip})^T$, $i = 1, 2, \ldots, n$, be iid random vectors with mean $\mu = (\mu_1, \ldots, \mu_p)^T$ and covariance $\Sigma = (\sigma_{ij})_{1 \le i,j \le p}$.

Testing the covariance matrix: $H_0: \Sigma = \Sigma_0$ against $H_1: \Sigma \ne \Sigma_0$. (1)

Testing bandedness: $H_0: \sigma_{ij} = 0$ for all $|i - j| \ge \tau$. (2)

Non-parametric; no information about sparsity is assumed.

Literature. Testing (1) for fixed $p$: the traditional likelihood ratio test; scaled distance measure tests (John (1971, 1972) and Nagao (1973)).

Testing (1) for divergent $p$: Ledoit and Wolf (2002) for normal $X_i$; Chen, Zhang and Zhong (2010) for a factor model. Specific models are imposed, restrictions are put on $p$, and $p$ has to go to infinity as $n \to \infty$.

Testing (2) for divergent $p$: Cai and Jiang (2011). The test statistic (the coherence) converges slowly; normality is assumed; restrictions are put on $p$ and $\tau$.

Testing (2) for divergent $p$: Qiu and Chen (2012). Similar to Chen, Zhang and Zhong (2010): specific models and restrictions.

Our goal: build a test statistic that works for both (1) and (2); loosen the conditions on $p$; and get rid of specific model assumptions.

First we assume $\mu$ is known; the case when $\mu$ is unknown is very similar.

Basic observation. $\Sigma = \Sigma_0$ is equivalent to $D^2 := \|\Sigma - \Sigma_0\|_F^2 = \mathrm{tr}((\Sigma - \Sigma_0)^2) = 0$. We can construct our test based on an estimator of $D^2$.

A natural estimator. For $i = 1, \ldots, n$, define the $p \times p$ matrix $Y_i = (X_i - \mu)(X_i - \mu)^T$ and the estimator $e(\Gamma) = \mathrm{tr}((Y_1 - \Gamma)(Y_2 - \Gamma))$. Then $E[Y_1] = \Sigma$ and $E[e(\Sigma_0)] = D^2$, so $E[e(\Sigma_0)] = 0$ is equivalent to $\Sigma = \Sigma_0$.
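As a quick illustration of this estimator, here is a minimal sketch in Python (the simulation settings are our own, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
mu = np.zeros(p)
Sigma0 = np.eye(p)

def e_stat(X1, X2, mu, Gamma):
    """e(Gamma) = tr((Y1 - Gamma)(Y2 - Gamma)) with Y_i = (X_i - mu)(X_i - mu)^T.
    By independence of Y1 and Y2, E[e(Gamma)] = tr((Sigma - Gamma)^2)."""
    Y1 = np.outer(X1 - mu, X1 - mu)
    Y2 = np.outer(X2 - mu, X2 - mu)
    return np.trace((Y1 - Gamma) @ (Y2 - Gamma))

# Under H0 (Sigma = Sigma0) the statistic averages to roughly 0 over replications
vals = [e_stat(*rng.multivariate_normal(mu, Sigma0, size=2), mu, Sigma0)
        for _ in range(5000)]
print(np.mean(vals))  # close to 0
```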

We need independent copies of $(Y_1, Y_2)$. Splitting the sample: let $N = \lfloor n/2 \rfloor$ and, for $i = 1, 2, \ldots, N$, define $e_i(\Sigma) = \mathrm{tr}((Y_i - \Sigma)(Y_{N+i} - \Sigma))$.

It is very difficult to estimate the variance of $e_i$. The empirical likelihood method automatically captures the asymptotic variance.
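The split-sample statistics are mechanical to compute; a hypothetical helper (the function name is ours) might look like:

```python
import numpy as np

def split_e(X, mu, Sigma):
    """Pair Y_i with Y_{N+i}, N = floor(n/2), to obtain iid copies
    e_i(Sigma) = tr((Y_i - Sigma)(Y_{N+i} - Sigma))."""
    n = X.shape[0]
    N = n // 2
    # Y_i = (X_i - mu)(X_i - mu)^T for every observation
    Y = np.einsum('ij,ik->ijk', X - mu, X - mu)
    return np.array([np.trace((Y[i] - Sigma) @ (Y[N + i] - Sigma))
                     for i in range(N)])

rng = np.random.default_rng(1)
p = 4
X = rng.standard_normal((20, p))
e = split_e(X, np.zeros(p), np.eye(p))
print(e.shape)  # (10,)
```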

Define the empirical likelihood ratio function with the constraint (estimating equation) $E[e_1(\Sigma)] = 0$:

$$L_0(\Sigma) = \sup\Big\{ \prod_{i=1}^N (N p_i) : \sum_{i=1}^N p_i = 1, \ \sum_{i=1}^N p_i e_i(\Sigma) = 0, \ p_i \ge 0 \Big\}.$$

Under some regularity conditions and $H_0$, $-2 \log L_0(\Sigma_0)$ converges weakly to $\chi^2_1$. This seems good, but...

Shortfall of the test based on $L_0$. When $\|\Sigma - \Sigma_0\|_F^2$ is small, $E[e_1(\Sigma_0)]$ will be very close to 0 (at the rate $\|\Sigma - \Sigma_0\|_F^2$). In this case the test based on $L_0$ has poor power; we will see this later in the power analysis.
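For a scalar constraint, $L_0$ has the standard Lagrange-multiplier solution $p_i = 1/(N(1 + \lambda e_i))$, with $\lambda$ solving $\sum_i e_i/(1 + \lambda e_i) = 0$. A sketch of the computation (our own code, assuming the inputs are the split-sample $e_i$ values):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_logratio_1d(e):
    """-2 log L_0: one-constraint empirical likelihood for E[e_1] = 0,
    via the standard Lagrange-multiplier form."""
    e = np.asarray(e, float)
    if e.min() >= 0.0 or e.max() <= 0.0:   # 0 outside the convex hull of the e_i
        return np.inf
    lo = -1.0 / e.max() + 1e-10            # keep every weight 1 + lam*e_i positive
    hi = -1.0 / e.min() - 1e-10
    lam = brentq(lambda l: np.sum(e / (1.0 + l * e)), lo, hi)
    return 2.0 * np.sum(np.log1p(lam * e))

rng = np.random.default_rng(2)
stat = el_logratio_1d(rng.standard_normal(100))
print(stat, chi2.sf(stat, df=1))           # calibrate against chi^2_1
```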

Secondary constraint. We add one more constraint, which is easier to break under $H_1$. The choice of the second linear constraint can be arbitrary. With no prior information, the statistics $v_i(\Sigma) = 1_p^T (Y_i + Y_{N+i} - 2\Sigma) 1_p$ can be used in a constraint $E[v_1(\Sigma_0)] = 0$.

We define the empirical likelihood function with two constraints as

$$L_1(\Sigma_0) = \sup\Big\{ \prod_{i=1}^N (N p_i) : \sum_{i=1}^N p_i = 1, \ \sum_{i=1}^N p_i \begin{pmatrix} e_i(\Sigma_0) \\ v_i(\Sigma_0) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \ p_i \ge 0 \Big\}.$$

Theorem 1. Suppose $e_1(\Sigma_0)$ and $v_1(\Sigma_0)$ satisfy the regularity condition (P). Then under $H_0$, $-2 \log L_1(\Sigma_0)$ converges in distribution to $\chi^2_2$ as $n \to \infty$.
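With two constraints the Lagrange multiplier $\lambda \in \mathbb{R}^2$ has no closed form; a Newton-iteration sketch (our own code, without the convex-hull and step-halving safeguards a production implementation would need):

```python
import numpy as np
from scipy.stats import chi2

def el_logratio_2d(e, v, iters=100, tol=1e-12):
    """-2 log L_1: empirical likelihood with the two constraints
    E[e_1] = 0 and E[v_1] = 0, solving for the Lagrange multiplier
    lam with p_i = 1/(N(1 + lam^T t_i)) by Newton's method."""
    t = np.column_stack([e, v])                 # t_i = (e_i, v_i)^T
    lam = np.zeros(2)
    for _ in range(iters):
        w = 1.0 + t @ lam                       # 1 + lam^T t_i
        g = (t / w[:, None]).sum(axis=0)        # target: g(lam) = 0
        J = -np.einsum('ij,ik,i->jk', t, t, 1.0 / w**2)   # Jacobian of g
        step = np.linalg.solve(J, -g)
        lam += step
        if np.linalg.norm(step) < tol:
            break
    return 2.0 * np.sum(np.log1p(t @ lam))

rng = np.random.default_rng(3)
stat = el_logratio_2d(rng.standard_normal(200), rng.standard_normal(200))
print(stat, chi2.sf(stat, df=2))                # calibrate against chi^2_2
```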

CLT condition (similar to the Lyapunov condition). (P) We say a statistic $T$ with sample size $n$ satisfies condition (P) if $E T^2 > 0$ and, for some $\delta > 0$,

$$\frac{E|T|^{2+\delta}}{(E T^2)^{1+\delta/2}} = o\big(n^{(\delta + \min(\delta, 2))/4}\big).$$

For example, if $E(T^4)/(E(T^2))^2 = o(n)$, then $T$ satisfies (P) with $\delta = 2$.

Remark 1. To prove Theorem 1, it suffices to show that condition (P) guarantees that the sample $t_i = \big( e_i(\Sigma_0)/\sqrt{\mathrm{Var}(e_i(\Sigma_0))}, \ v_i(\Sigma_0)/\sqrt{\mathrm{Var}(v_i(\Sigma_0))} \big)^T$ satisfies the CLT and $t_i^2$ satisfies the LLN, with a controlled maximum.

Remark 2. When $\mu$ is unknown, simply replace $\mu$ in $Y_i$ by the sample mean; the theorem still holds with one extra moment condition.

Remark 3. In the factor model considered by Chen, Zhang and Zhong (2010), $e_1(\Sigma_0)$ and $v_1(\Sigma_0)$ satisfy (P). With this model, our test allows $p$ to diverge arbitrarily fast or stay finite.

Here we consider $\mu$ known. The problem is testing bandedness: $H_0: \sigma_{ij} = 0$ for all $|i - j| \ge \tau$. (3)

We are interested in the information in the entries of $\Sigma$ outside the band (the black squares in the figure) and will ignore the within-band entries (the stars). Basic observation: $H_0$ is equivalent to the black squares of $\Sigma$ all being 0.

Define the $\tau$-off-diagonal upper triangular matrix $M^{(\tau)}$ of a matrix $M$ by

$$(M^{(\tau)})_{ij} = \begin{cases} M_{ij}, & j \ge i + \tau; \\ 0, & j < i + \tau. \end{cases}$$

Then $H_0$ is equivalent to $\mathrm{tr}((\Sigma^{(\tau)})^T \Sigma^{(\tau)}) = 0$.

For $i = 1, \ldots, N$, let $e_i = \mathrm{tr}((Y_i^{(\tau)})^T Y_{N+i}^{(\tau)})$ and $v_i = 1_p^T (Y_i^{(\tau)} + Y_{N+i}^{(\tau)}) 1_p$. We define the empirical likelihood function as

$$L_2 = \sup\Big\{ \prod_{i=1}^N (N p_i) : \sum_{i=1}^N p_i = 1, \ \sum_{i=1}^N p_i \begin{pmatrix} e_i \\ v_i \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \ p_i \ge 0 \Big\}.$$

Here we omit the $\Sigma_0$ in $e_i$ and $v_i$.
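The banded statistics reduce to elementwise sums over the off-band region; a small sketch (helper names are ours):

```python
import numpy as np

def off_band(M, tau):
    """(M^(tau))_ij = M_ij if j >= i + tau, else 0."""
    return np.triu(M, k=tau)

def e_band(Yi, Yj, tau):
    """e_i = tr((Y_i^(tau))^T Y_{N+i}^(tau)): elementwise product summed
    over the off-band region."""
    return np.sum(off_band(Yi, tau) * off_band(Yj, tau))

def v_band(Yi, Yj, tau):
    """v_i = 1_p^T (Y_i^(tau) + Y_{N+i}^(tau)) 1_p: all off-band entries summed."""
    return np.sum(off_band(Yi, tau) + off_band(Yj, tau))

M = np.arange(16.0).reshape(4, 4)
print(off_band(M, 2))   # only entries with j - i >= 2 survive
```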

Theorem 2. Suppose that $e_1$ and $v_1$ satisfy (P). Then under $H_0$ in (3), $-2 \log L_2$ converges in distribution to $\chi^2_2$ as $n \to \infty$.

The method can be used to test some other structures; one interesting application is testing an assumption or estimate of sparsity.

Remark 4. (1) In the Gaussian model used by Cai and Jiang (2011), $e_1$ and $v_1$ satisfy (P) provided that $\tau = o\big( \sum_{1 \le i,j \le p} \sigma_{ij} \big/ \big( \sum_{1 \le i,j \le p} \sigma_{ij} \big)^{1/2} \big)$.

(2) With moment or boundedness conditions, the Gaussian assumption can be removed.

Power analysis. Denote $\pi_{11} = E(e_1(\Sigma)^2)$, $\pi_{22} = E(v_1(\Sigma)^2)$, $\zeta_1 = \mathrm{tr}((\Sigma - \Sigma_0)^2)/\sqrt{\pi_{11}}$ and $\zeta_2 = 2 \cdot 1_p^T(\Sigma - \Sigma_0)1_p/\sqrt{\pi_{22}}$. For most models we discuss,

$$\zeta_1 = O\Big( \frac{\mathrm{tr}((\Sigma - \Sigma_0)^2)}{\mathrm{tr}(\Sigma^2)} \Big) \quad \text{and} \quad \zeta_2 = O\Big( \frac{1_p^T(\Sigma - \Sigma_0)1_p}{1_p^T \Sigma^2 1_p} \Big).$$

Theorem 3. In addition to the conditions of Theorem 1, if $H_1: \Sigma \ne \Sigma_0$ holds, then

$$P\{-2 \log L_1(\Sigma_0) > \xi_{1-\alpha}\} = P\{\chi^2_{2,\nu} > \xi_{1-\alpha}\} + o(1)$$

for any level $\alpha$ as $n \to \infty$, where $\chi^2_{2,\nu}$ is a noncentral chi-square random variable with two degrees of freedom and noncentrality parameter $\nu = N(\zeta_1^2 + \zeta_2^2)$.
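Theorem 3 gives a plug-in power approximation that can be evaluated directly (a sketch using SciPy's noncentral chi-square; the function name is ours):

```python
import numpy as np
from scipy.stats import chi2, ncx2

def el_power(N, zeta1, zeta2, alpha=0.05):
    """Approximate power from Theorem 3: P(chi^2_{2,nu} > xi_{1-alpha})
    with noncentrality nu = N*(zeta1^2 + zeta2^2)."""
    xi = chi2.ppf(1.0 - alpha, df=2)        # level-alpha chi^2_2 critical value
    nu = N * (zeta1**2 + zeta2**2)
    return ncx2.sf(xi, df=2, nc=nu)

print(el_power(100, 0.3, 0.0))              # power grows with N*(zeta1^2 + zeta2^2)
```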

Remark 5. From the above power analysis, the new test rejects the null hypothesis with probability tending to one when $\max(\sqrt{n}\,\zeta_1, \sqrt{n}\,|\zeta_2|) \to \infty$. Note that the test of Chen, Zhang and Zhong (2010) requires $\sqrt{n}\,\zeta_1 \to \infty$, so our test may have better or worse power in different settings. The same results hold for the test in Theorem 2.

It is clear that our test is powerful when $\Sigma - \Sigma_0$ is dense and not powerful when $\Sigma - \Sigma_0$ is sparse. With only the first constraint $E(e_1(\Sigma_0)) = 0$, the test (which requires $\sqrt{n}\,\zeta_1 \to \infty$) has worse power than the test in Chen, Zhang and Zhong (2010). The test in Cai and Jiang (2011) is good when $\Sigma - \Sigma_0$ is sparse but powerless when $\Sigma - \Sigma_0$ is dense, since its power depends on $\|\Sigma - \Sigma_0\|_{\max}$.

Simulations. We assume a dense model and a local alternative. We compare with Chen, Zhang and Zhong (2010) for testing covariance matrices and with Cai and Jiang (2011) for testing bandedness. The ELT has biased size for small $n$, so we also give a bootstrap-calibrated version of the ELT (BCEL).
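One generic way to calibrate by bootstrap (a sketch, not necessarily the exact scheme used in the talk): recentre the split-sample statistics so the bootstrap world satisfies $H_0$, then resample pairs with replacement:

```python
import numpy as np

def bootstrap_critical_value(e, v, el_stat, B=500, alpha=0.05, seed=0):
    """Bootstrap-calibrated critical value for an EL-type statistic.
    Recentring (e_i, v_i) at their sample means imposes the null in the
    bootstrap world; the (1 - alpha) empirical quantile of the resampled
    statistics replaces the chi^2_2 quantile."""
    rng = np.random.default_rng(seed)
    e0, v0 = e - e.mean(), v - v.mean()
    n = len(e0)
    stats = [el_stat(e0[idx], v0[idx])
             for idx in (rng.integers(0, n, size=n) for _ in range(B))]
    return np.quantile(stats, 1.0 - alpha)

# Usage: reject H0 when the observed -2 log L_1 exceeds this bootstrap
# critical value instead of the chi^2_2 quantile.
```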

Table: Testing covariance matrices (nominal level 0.05)

                        δ = 0                      δ = 1
(n, p)        EL     BCEL    CZZ         EL     BCEL    CZZ
(50, 25)    0.127   0.054   0.053      0.296   0.118   0.219
(50, 50)    0.148   0.065   0.067      0.324   0.136   0.216
(50, 100)   0.138   0.068   0.038      0.317   0.125   0.212
(50, 200)   0.168   0.081   0.041      0.310   0.113   0.221
(50, 400)   0.151   0.071   0.045      0.342   0.145   0.242
(50, 800)   0.154   0.064   0.041      0.337   0.137   0.219
(200, 25)   0.065   0.048   0.052      0.348   0.305   0.179
(200, 50)   0.058   0.052   0.041      0.336   0.298   0.162
(200, 100)  0.068   0.054   0.059      0.353   0.319   0.179
(200, 200)  0.056   0.051   0.058      0.358   0.322   0.155
(200, 400)  0.069   0.064   0.051      0.374   0.343   0.180
(200, 800)  0.058   0.047   0.050      0.366   0.338   0.182

Table: Testing bandedness (nominal level 0.05)

                        δ = 0                      δ = 1
(n, p)        EL     BCEL     CJ         EL     BCEL     CJ
(50, 25)    0.118   0.036   0.015      0.272   0.093   0.017
(50, 50)    0.124   0.049   0.010      0.266   0.097   0.018
(50, 100)   0.126   0.057   0.005      0.268   0.099   0.004
(50, 200)   0.128   0.058   0.003      0.268   0.100   0.001
(50, 400)   0.113   0.053   0.002      0.282   0.121   0.001
(50, 800)   0.128   0.062   0.001      0.281   0.109   0.000
(200, 25)   0.078   0.062   0.019      0.288   0.253   0.034
(200, 50)   0.074   0.059   0.033      0.323   0.286   0.020
(200, 100)  0.057   0.053   0.019      0.332   0.304   0.044
(200, 200)  0.066   0.046   0.024      0.293   0.263   0.032
(200, 400)  0.061   0.052   0.020      0.336   0.304   0.016
(200, 800)  0.053   0.046   0.026      0.317   0.297   0.025

Conclusion. The new technique works for non-parametric models; allows arbitrary $p$; requires only moment conditions; avoids estimating the asymptotic variance; always has the limiting distribution $\chi^2_2$; and can be applied to testing a sample mean, two-sample means, and two-sample covariance matrices under the high-dimensional framework.

Shortfalls: the number of observations is effectively reduced by half; the power is good in the dense setting but not in the sparse setting; and the optimal choice of the second constraint is unknown.

References I
Cai, T. and Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Statist. 39, 1496-1525.
Chen, S., Zhang, L. and Zhong, P. (2010). Tests for high-dimensional covariance matrices. J. Amer. Statist. Assoc. 105, 810-819.
Ledoit, O. and Wolf, M. (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann. Statist. 30, 1081-1102.

References II
Nagao, H. (1973). On some test criteria for covariance matrix. Ann. Statist. 1, 700-709.
Qin, J. and Lawless, J. F. (1994). Empirical likelihood and general estimating equations. Ann. Statist. 22, 300-325.
Qiu, Y. and Chen, S. (2012). Test for bandedness of high-dimensional covariance matrices and bandwidth estimation. Ann. Statist. 40, 1285-1314.

Thank you!