Estimation of the large covariance matrix with two-step monotone missing data

Similar documents
ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression

arxiv: v2 [stat.me] 3 Nov 2014

State Estimation with ARMarkov Models

A New Asymmetric Interaction Ridge (AIR) Regression Method

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process

Chapter 3. GMM: Selected Topics

Hotelling s Two- Sample T 2

4. Score normalization technical details We now discuss the technical details of the score normalization method.

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

Introduction to Probability and Statistics

arxiv: v1 [physics.data-an] 26 Oct 2012

On split sample and randomized confidence intervals for binomial proportions

Expected probabilities of misclassification in linear discriminant analysis based on 2-Step monotone missing samples

Asymptotically Optimal Simulation Allocation under Dependent Sampling

Generalized Coiflets: A New Family of Orthonormal Wavelets

Positive decomposition of transfer functions with multiple poles

An Investigation on the Numerical Ill-conditioning of Hybrid State Estimators

Approximating min-max k-clustering

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test)

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS

Additive results for the generalized Drazin inverse in a Banach algebra

General Linear Model Introduction, Classes of Linear models and Estimation

STABILITY ANALYSIS TOOL FOR TUNING UNCONSTRAINED DECENTRALIZED MODEL PREDICTIVE CONTROLLERS

SOME TESTS CONCERNING THE COVARIANCE MATRIX IN HIGH DIMENSIONAL DATA

On Wald-Type Optimal Stopping for Brownian Motion

COMMUNICATION BETWEEN SHAREHOLDERS 1

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data

Estimating function analysis for a class of Tweedie regression models

Ž. Ž. Ž. 2 QUADRATIC AND INVERSE REGRESSIONS FOR WISHART DISTRIBUTIONS 1

Supplementary Materials for Robust Estimation of the False Discovery Rate

Improved Bounds on Bell Numbers and on Moments of Sums of Random Variables

MULTIVARIATE SHEWHART QUALITY CONTROL FOR STANDARD DEVIATION

Testing Block-Diagonal Covariance Structure for High-Dimensional Data

Linear diophantine equations for discrete tomography

Sign Tests for Weak Principal Directions

A multiple testing approach to the regularisation of large sample correlation matrices

VIBRATION ANALYSIS OF BEAMS WITH MULTIPLE CONSTRAINED LAYER DAMPING PATCHES

Unobservable Selection and Coefficient Stability: Theory and Evidence

Covariance Matrix Estimation for Reinforcement Learning

Asymptotic non-null distributions of test statistics for redundancy in high-dimensional canonical correlation analysis

Yixi Shi. Jose Blanchet. IEOR Department Columbia University New York, NY 10027, USA. IEOR Department Columbia University New York, NY 10027, USA

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V.

An Analysis of Reliable Classifiers through ROC Isometrics

AN OPTIMAL CONTROL CHART FOR NON-NORMAL PROCESSES

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

arxiv: v2 [stat.me] 27 Apr 2018

High-Dimensional Asymptotic Behaviors of Differences between the Log-Determinants of Two Wishart Matrices

ESTIMATION OF THE RECIPROCAL OF THE MEAN OF THE INVERSE GAUSSIAN DISTRIBUTION WITH PRIOR INFORMATION

Sums of independent random variables

The spectrum of kernel random matrices

An Improved Generalized Estimation Procedure of Current Population Mean in Two-Occasion Successive Sampling

Information collection on a graph

Information collection on a graph

On the sphericity test with large-dimensional observations. Citation Electronic Journal of Statistics, 2013, v. 7, p

ON UNIFORM BOUNDEDNESS OF DYADIC AVERAGING OPERATORS IN SPACES OF HARDY-SOBOLEV TYPE. 1. Introduction

On the asymptotic sizes of subset Anderson-Rubin and Lagrange multiplier tests in linear instrumental variables regression

Applicable Analysis and Discrete Mathematics available online at HENSEL CODES OF SQUARE ROOTS OF P-ADIC NUMBERS

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning

Maxisets for μ-thresholding rules

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

The power performance of fixed-t panel unit root tests allowing for structural breaks in their deterministic components

MULTIVARIATE STATISTICAL PROCESS OF HOTELLING S T CONTROL CHARTS PROCEDURES WITH INDUSTRIAL APPLICATION

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

Asymptotic Analysis of the Squared Estimation Error in Misspecified Factor Models

Using a Computational Intelligence Hybrid Approach to Recognize the Faults of Variance Shifts for a Manufacturing Process

On the Maximum and Minimum of Multivariate Pareto Random Variables

Proximal methods for the latent group lasso penalty

Spectral Analysis by Stationary Time Series Modeling

Minimax Design of Nonnegative Finite Impulse Response Filters

Positivity, local smoothing and Harnack inequalities for very fast diffusion equations

Location of solutions for quasi-linear elliptic equations with general gradient dependence

Some Measures of Agreement Between Close Partitions

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

BEST CONSTANT IN POINCARÉ INEQUALITIES WITH TRACES: A FREE DISCONTINUITY APPROACH

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

An Inverse Problem for Two Spectra of Complex Finite Jacobi Matrices

Approximation of the Euclidean Distance by Chamfer Distances

2 K. ENTACHER 2 Generalized Haar function systems In the following we x an arbitrary integer base b 2. For the notations and denitions of generalized

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

CONVOLVED SUBSAMPLING ESTIMATION WITH APPLICATIONS TO BLOCK BOOTSTRAP

On-Line Appendix. Matching on the Estimated Propensity Score (Abadie and Imbens, 2015)

Adaptive Estimation of the Regression Discontinuity Model

MATH 2710: NOTES FOR ANALYSIS

Radial Basis Function Networks: Algorithms

Ratio Estimators in Simple Random Sampling Using Information on Auxiliary Attribute

arxiv: v4 [math.nt] 11 Oct 2017

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Various Proofs for the Decrease Monotonicity of the Schatten s Power Norm, Various Families of R n Norms and Some Open Problems

Probability Estimates for Multi-class Classification by Pairwise Coupling

Published: 14 October 2013

Robust Beamforming via Matrix Completion

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model

Quantitative estimates of propagation of chaos for stochastic systems with W 1, kernels

Some results of convex programming complexity

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014

Understanding and Using Availability

216 S. Chandrasearan and I.C.F. Isen Our results dier from those of Sun [14] in two asects: we assume that comuted eigenvalues or singular values are

substantial literature on emirical likelihood indicating that it is widely viewed as a desirable and natural aroach to statistical inference in a vari

Transcription:

Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo University of Science 2 Deartment of Health Sciences, Oita University of Nursing and Health Sciences 3 Deartment of Mathematics, KTH Royal Institute of Technology We suggest a shrinkage based technique for estimating covariance matrix in the high-dimensional normal model with missing data. Our asymtotic framework allows the dimensionality grow to infinity together with the samle size, N, and extends the methodology of Ledoit and Wolf 2004 to the case of 2-ste monotone missing data. Two new shrinkage tye estimators are derived and their dominance roerties over the Ledoit and Wolf 2004 estimator are shown under the exected quadratic loss. We erform a simulation study and conclude that the roosed estimators are successful for a range of missing data scenarios. Key Words Monotone missing data; High-dimensional estimation; Shrinkage tye estimation. Mathematics Subject Classification 62H2; 62F2.

Introduction Estimation of the covariance matrix Σ in high dimensions has been imortant for many statistical rocedures such as rincial comonent analysis, linear regression and discriminant analysis. Some of these rocedures require estimation of the covariance structure and eigenvalues of Σ while the others need the inverse. It has long been known that standard estimators such as maximum likelihood or least squares develoed for large samle sizes N and relatively small number of variables do not rovide a satisfactory solution in high dimensions. Therefore modification of these estimators is of crucial imortance for high-dimensional roblems. During recent years, many methods for imroving the estimators of Σ and its inverse in the context of high dimensional data, i.e. if the number of variables is much larger than N, have been develoed; see e.g., Kubokawa and Srivastava 2008 and Ledoit and Wolf 2004. However, all of these methods have been focused on the case where all data is observed. Since in ractice data sets often contain missing atterns, it is imortant to develo aroaches for high dimensional covariance estimation in resence of missing data. In this study, we focus on a articular tye of statistical missing data, a monotone missing scheme with missing values atterns occurring comletely at random. In other words, we assume that the underlying missing-data mechanism is ignorable under arametric estimation. This tye of missing data roblem has been extensively studied in the statistical literature; see e.g. Anderson 957, Bhargava 975. The samle scheme treated in them is called a k-ste monotone missing data. In articular, in Anderson and Olkin 985, the MLE of Σ for the case of 2-ste monotone missing data was derived exlicitly. Further, Kanda and Fujikoshi 998 studied asymtotic roerties of MLE of Σ constructed by 2-ste monotone missing data and extend these results to the general case of k-ste monotone missing data. However, in high dimensional situations, the MLE of Σ is known to erform oorly even for the case of comlete data, since it is usually imossible to collect 2

enough observations to fulfill n condition. In this aer, we resent a solution to this roblem by extending the idea of Ledoit and Wolf 2004 to the case of 2-ste monotone missing data and suggest a new estimator of the covariance matrix that is adjusted to high dimensionality in combination with missing data. The rest of the aer is organized as follows. In Section 2, we set u 2-ste monotone missing data scheme and derive some auxiliary unbiased estimators, which then will be used in Section 3 where we resent our new shrinkage estimators. Section 4 reorts numerical studies on the risk erformance of the suggested estimators and comarison with results by Ledoit and Wolf 2004. We conclude in Section 5 and resent the roof of key results in Aendix A.2, A.3 and A.4. 2 Auxiliary unbiased estimators and their asymtotic roerty under the 2-ste monotone missing data To secify the roblem considered here, let x be the -dimensional random vector distributed as N µ, Σ, where µ is mean vector and Σ is covariance matrix. Assume that x can be decomosed as x, x 2, where x and x 2 are and 2 -dimensional vectors, resectively. Suose now that we have N indeendent observations on the full set of variables, x, and N 2 indeendent observations on x. We introduce some notation for the samle means and covariance matrices. Let x denote the samle, mean of x based on the N observations, and x = x, x 2 x i : i, i =, 2. Let further x 2 denote the -dimensional samle mean vector for based on the N 2 observations. Throughout this aer, we use the letter j only as running suffix for samle observations. With above notations, the 2-ste monotone missing data scheme, described in Shutoh et al. 20 can be resented by x, x 2,..., x N, x 2, x 2 2,..., x 2 N 2, where x j j =,..., N denotes the -dimensional samle vector and x 2 j j = 3

,..., N 2 denotes the -dimensional samle vector. The grahical reresentation of this scheme is given in Figure.. Please insert Figure. around here. Further the samle covariance matrices based on the N and N 2 observations are exressed as S = N x j x x j x, S 2 = n n 2 j= resectively, where n i = N i, i =, 2. N 2 j= x 2 j x 2 x 2 Let the artitions of µ, Σ and S corresonding to ones of x be µ Σ Σ µ =, Σ = 2, S S S 2 = µ 2 Σ 2 Σ 22 S 2 S. 22, j x 2, resectively. Let ˆµ and ˆΣ denote the MLE of µ and Σ, resectively, and assume that ˆµ and ˆΣ can be artitioned in the same way as µ and Σ. From Anderson and Olkin 985, we can exress the MLE s ˆµ and ˆΣ as follows where ˆµ = N N x + N 2 x 2, ˆµ 2 = x 2 ˆΣ ˆΣ 2 x ˆµ, ˆΣ = W + W 2, ˆΣ2 = N ˆΣ W W 2, ˆΣ 22 = N W 22 + ˆΣ 2 ˆΣ ˆΣ 2, N = N + N 2, W = n S, W 2 = n 2 S 2 + N N 2 N W W 2 W = W 2 W 22 x x 2 x x 2,, W 22 = W 22 W 2 W W 2. In Lemma 2. stated below, we roose two auxiliary unbiased estimators of Σ under 2-ste monotone missing data. Lemma 2.. Let ˆΣ and ˆΣ 2 be defined as ˆΣ, ˆΣ,2 ˆΣ2, ˆΣ2,2 ˆΣ =, ˆΣ2 =, ˆΣ,2 ˆΣ,22 ˆΣ 2,2 ˆΣ2,22 4

where ˆΣ, = W + W 2, ˆΣ,2 = N ˆΣ W W 2, ˆΣ,2 = ˆΣ,2, ˆΣ,22 = b W 22 + ˆΣ ˆΣ 2 ˆΣ2, ˆΣ 2, = W + W 2, ˆΣ2,2 = N ˆΣ 2,2 = ˆΣ 2,2, ˆΣ2,22 = N W 22, N W 2, and Then b = N N N 2. N N 2 E[ˆΣ ] = E[ˆΣ 2 ] = Σ. Proof : See Aendix A.2. Due to the unbiasedness roerty, both estimators, ˆΣ and ˆΣ 2 are exected to erform better than S which is based on the comlete data art only. The following lemma rovides exact results on the relative erformance accuracy of these three estimators in terms of exected quadratic loss. Lemma 2.2. Let β 0 = a 2 + a 2, n n where a i = tr Σ i / and Σ i denotes i-th ower of the matrix Σ for i =, 2. Then, i E[ S Σ 2 F ] = β 0, ii E[ ˆΣ Σ 2 F ] = β 0 m, iii E[ ˆΣ 2 Σ 2 F ] = β 0 m 2, where 2 F denotes the normalized Frobenious norm, m = N { 2 a 2 tr Σ 2 22 tr Σ 22 2 tr Σ tr Σ 22 + a 2 k k 2 2k 3 nn + tr Σ 22 Σ 2 Σ Σ 2 + tr Σ 22 tr Σ 2 Σ } Σ 2, m 2 = N 2 a 2 + a 2 nn, 5

and k = k 2 = k 3 = 3r + 4r 2 2r 3 r 2 4 r + 6r 2 r 2 + 3 4r r 2 n n 2 r n r 3 }, 2. n { r 4r + 7r 2 4r 3 2rr 3 2 + 2rr 4 2 + 8r3 r 2 4rr 2 2 5r 2 + 6r 5 + 8r2 r 2 2r r 2 2r + 7 n n 2 + 2r { 2 r 2 3 r r } n r 3n, 2.2 n 3 r n, 2.3 r = n, r 2 = n n. Proof : See Aendix A.3. It follows from 2.-2.3 that k i = + On for all i, assuming that is fixed and n is relatively large. Then the following relationshi between m and m 2 holds m 2 = m + N 2 trσ2 Σ Σ 2 2 + Σ 2Σ 2 nn + tr Σ tr Σ 2 Σ Σ 2 2 + o, n + tr Σ 2Σ Σ 2 2 which imlies that ˆΣ is more accurate estimator than ˆΣ 2 in a large samle case. However, ˆΣ will be worse than S since k, k 2 and k 3 grow when /N. On the other hand, since ˆΣ 2 does not deend on the relationshi between and N, its accuracy is better than S. By this reason, it can be said that ˆΣ 2 is a flexibly accurate estimator. 3 The shrinkage estimator for high-dimensional data with 2-ste monotone missing values On the comlete samles, Ledoit and Wolf 2004 rovide a shrinkage estimator for Σ given by ˆΣ f = ˆβ f ˆδ f ˆµ f I + ˆα f ˆδ f S, 6

where ˆα f = â 2 â 2, ˆβf = n â 2, ˆδ f = â 2 + â 2 n, ˆµ f = â. In this study, we extend this aroach to the case of 2-ste monotone missing data using auxiliary estimators ˆΣ and ˆΣ 2 derived in Section 2. Ledoit and Wolf tye estimator is derived by the linear combination Σ = ρ I + ρ 2 ˆΣb of the identity matrix I and the unbiased estimator ˆΣ b = ˆΣ or ˆΣ 2 whose exected quadratic loss is minimum. Observed that ˆΣ = ˆΣ f if ˆΣ b = S. Following the technique of Ledoit and Wolf 2004, consider now the otimization roblem min E[ Σ Σ 2 F ] s.t. Σ = ρ I + ρ 2 ˆΣb, 3. ρ, ρ 2 where the coefficients ρ and ρ 2 are non-random. The solution of 3. is given by Σ = β δ µi + α δ ˆΣ b, where µ = tr Σ/, α = Σ µi 2 F, β = E[ ˆΣ b Σ 2 F ], δ = E[ ˆΣ b µi 2 F ]. With this notations we obtain µ = a, α = a 2 a 2. 3. The suggested estimator based on ˆΣ A crucial ste of our estimation is to get the exlicit exressions of E[ ˆΣ Σ 2 F ] and E[ ˆΣ µi 2 F ] in terms of a i and a 22 i consider the following conditions which are functions of tr Σ 22. We C0 : n, n 2,,, C : n c 0 [0,, 0 < lim a i = lim tr Σ i / <, i =,..., 4. Using Lemma A., under the conditions C0 and C, n i c i 0,, i =, 2. E[ ˆΣ Σ 2 F ] = β + o, E[ ˆΣ µi 2 F ] = δ + o, 7

where β = a 2 m n, δ = a 2 + a 2 m n, m = m a, a 22 = N { 2 a 2 2k 3 a 22 a + 2k 3 k 2 a 22 2 nn and a 22 = tr Σ 22 /. Thus, the asymtotically otimal linear combination of the identity matrix I and ˆΣ is given by Σ = β δ µi + α δ ˆΣ. } Observe however that Σ deends on unknown arameters. Therefore, the next ste is to relace these arameters their consistent estimators, and show that the asymtotic roerties of the resulting estimator are unchanged. Under the conditions C0 and C, the consistent estimators of α, β and δ are given by ˆα = â 2 â 2, ˆβ = n â 2 ˆm, ˆδ = /n â 2 + â 2 ˆm, 3.2 where ˆm = m â, â 22, and â, â 2 and â 22 are consistent estimators of the corresonding arameters â = â 22 tr S, â 2 = n 2 n n + 2 tr S 2 tr S 2, n = n tr S 22 n. 3.3 Now, by combining 3.2 and 3.3, we define the our estimator as ˆΣ = ˆβ ˆδ ˆµI + ˆαˆδ ˆΣ. One can see that ˆΣ has the same asymtotic roerties as Σ : ˆΣ Σ 2 F P 0 and E[ ˆΣ Σ 2 F ] E[ Σ Σ 2 F ] under conditions C0 and C. Consider further the relationshi between ˆΣ and ˆΣ f by Ledoit and Wolf 2004 when using comlete data art only. The following roosition rovides conditions for the estimator ˆΣ f to dominate ˆΣ f under the exected quadratic loss. 8

Proosition 3.. Under the conditions C and n, n 2,, with /n 0, /n i c i 0,, i =, 2 and n i /n + n 2 c i 0,, i = 3, 4 E[ ˆΣ f Σ 2 F ] E[ ˆΣ Σ 2 F ] c c 4 α δ 0 δ a 2 a 22 2. Proof : See Aendix A.4. 3.2 The suggested estimator based on ˆΣ 2 The dominance roerty of ˆΣ over ˆΣ f follows from the condition /n 0, which is very strong. Furthermore, if > n, ˆΣ is not alicable due to singularity of ˆΣ. In this subsection, we get another condition for the estimator of Σ to dominate ˆΣ f in the case of > n. Using the technique of subsection 3. and by considering C3 : n, n 2,,, n c 0 [0,, n i /n + n 2 c i 0,, i = 3, 4, n i c i 0,, i =, 2, we obtain E[ ˆΣ 2 Σ 2 F ] = β 2 + o, E[ ˆΣ 2 µi 2 F ] = δ 2 + o, where β 2 = n a 2 m 2, δ 2 = a 2 + a 2 m n 2, m 2 = m 2a = N 2 a 2 nn and a = tr Σ /. Observe that C3 resents more flexible asymtotic framework than C0. Now, the asymtotically otimal linear combination of the identity matrix I and ˆΣ 2 is given by Σ 2 = β 2 δ 2 µi + α δ 2 ˆΣ2. Of this, we roose our estimator of the form ˆΣ 2 = ˆβ 2 ˆδ 2 ˆµI + ˆαˆδ2 ˆΣ2, 9

where ˆα = â 2 â 2, ˆβ2 = n â 2 m 2â, ˆδ 2 = /n â 2 + â 2 m 2â and â = tr S /. By the same arguments as in subsection 3., Σ 2 and ˆΣ 2 have the same asymtotic roerties because ˆΣ 2 Σ 2 2 F conditions C and C3. P 0 and E[ ˆΣ 2 Σ 2 F ] E[ Σ 2 Σ 2 F ] under The dominance roerty of ˆΣ 2 over ˆΣ f follows from the following roosition. Proosition 3.2. Under the conditions C and C3, E[ ˆΣ f Σ 2 F ] E[ ˆΣ 2 Σ 2 F ] c c 4 α δ 0 δ 2 a 2. Proof : The roof of this roosition which we omit, is straightforward alication of the technique derived in Aendix A.4. 4 Numerical results Using Monte Carlo simulations, we comare erformance of our new estimators ˆΣ and ˆΣ 2 to the shrinkage estimator by Ledoit and Wolf 2004. The simulation exeriment is lanned as follows. The data are generated from the normal model with zero mean vector and covariance matrix Σ = diagσ, σ 2,..., σ ρ i j diagσ, σ 2,..., σ, where σ i is a random variable following continuous uniform distribution on the interval 0, 5. For all simulations we assume N = N 2 for convenience. We set = 00, 50, 200, and for each one consider three different values of ρ = 0.2, 0.5, 0.8 to vary the covariance arameters. The first setting reresents the case when missing art of the data asymtotically exceeds the comlete art. We fix = 0, the size of comlete art, and assume that N = N 2 = 50. This imlies that 2, the dimension of the missing art grows with. 0

In the second setting, we vary the roortion of the data that are missing. To ut it more recisely, for each = 00, 50, 200, we set = [q ], for q = 0.5, 0.7, 0.9, and Σ = diagσ, σ 2,..., σ 0.8 i j diagσ, σ 2,..., σ, where σ i is a random variable having continuous uniform distribution on the interval 0, 5. Here, [a] denotes the integer art of a. Within the first setting, three following estimators are evaluated: ˆΣ f, ˆΣ, ˆΣ 2, whereas in the second setting we only focus on ˆΣ f and ˆΣ 2. To evaluate erformance accuracy we use the loss function based on normalized Frobenius norm, LΣ, = Σ 2 F, and define the risk function as the exected loss, RΣ, = E[LΣ, ], when estimating Σ by. We asses the relative risk rate as follows RR = RΣ, /RΣ, S, where S is used reference estimator for all comarisons. For all the settings, the value of RR is comuted by 00,000 relications. Tables 4.-4.3 reort the risk rates for all three estimators in the for the first setting, and figures 4.-4.3 lot the risk rates for ˆΣ f and ˆΣ 2 as functions of for the second setting. Overview through the simulation results reveals that our new estimators rovide significant imrovements over ˆΣ f for both tyes of missing scenarios. Although RRs of ˆΣ and ˆΣ 2 are very close to each other for all ρ, ˆΣ 2 dominates ˆΣ in the case when missing art grows. As exected, ˆΣ 2 yields substantial imrovement over ˆΣ f even in the case of fixed missing art. It could be observed that both estimators are sensitive to the size of ρ 0, which is naturally exected in a AR model. In conclusion, both suggested estimators, ˆΣ and ˆΣ 2 are shown to have dominance roerties over ˆΣ f in high-dimensional normal model with 2-ste monotone missing data. Please insert Tables 4. 4.3 and Figures 4. 4.3 around here.

5 Conclusion This aer rovided the estimators for Σ in the case of 2-ste monotone missing data with high-dimensional setting by extending Ledoit and Wolf s 2004 estimator. The result rovided a solution to the roblem such that the existing MLE on the basis of 2-ste monotone missing data is not also alicable in the case of high-dimensional setting. Further, it could be observed that our estimators were suerior to Ledoit and Wolf s 2004 estimator in view of the relative risk rate for the erformed simulations. In the future, we may also derive the theoretical results of the rocedures with our estimators which are alicable to the high-dimensional and missing data. Aendix To rove roositions 2. and 2.2 we need two additional lemmas stated below. For these roofs we need the some reliminary results formulated in Lemma A. and Lemma A.2. A. Preliminary results Lemma 5.. Suose that A follows Wishart distribution W Σ, n, where A admits the similar artition as that of Σ in Section 2, and let A 22 = A 22 A 2 A A 2. Then i A 22 W Σ 22, n, and A 22 is indeendent of A and A 2, ii A 2 A N 2 V eca Σ Σ 2, Σ 22 A Proof. For the roof, see Muirhead 982 and Siotani et al. 985. Now we state Lemma A.2. 2

Lemma 5.2. For ˆΣ defined in Section 2, it holds that E[tr ˆΣ 2 ] = n + n tr Σ 2 + n tr Σ 2 N 2 nn {tr Σ 2 + tr Σ 2 k tr Σ 2 22 k 2 tr Σ 22 2 2k 3 tr Σ 22 tr Σ + tr Σ 22 Σ 2 Σ Σ 2 + tr Σ 22 tr Σ 2 Σ Σ 2 }, where k i, i =, 2, 3 is defined in Lemma 2.2. Proof. Using the following exression E[tr ˆΣ 2 ] = E[trˆΣ 2, + ˆΣ,2 ˆΣ,2 ] + E[trˆΣ 2,22 + ˆΣ,2 ˆΣ,2 ] A. and the results of Lemma A., we obtain E[trˆΣ 2, + ˆΣ,2 ˆΣ,2 ] = n + n trσ2 + Σ 2 Σ 2 + n tr Σ tr Σ N 2 + nn tr Σ 22 tr Σ, A.2 For the second term in A. we get E[trˆΣ 2,22 + ˆΣ,2 ˆΣ,2 ] = n 2 E [tr ˆΨ 2 W 2 ˆΨ 2 ] + b 2 E + b n E [tr ˆΨ 2 W ˆΨ2 W 22 ] + b n E [tr W 22 ˆΨ 2 W ˆΨ2 ] ] [tr W 2 22 + n 2 E [tr ˆΨ 2 W ˆΨ2 ˆΨ2 W ˆΨ2 ], A.3 3

where ˆΨ 2 = W 2 W. From Lemma A., E[tr ˆΨ 2 W 2 ˆΨ 2 ] = nn + tr Σ 2 Σ 2 + ntr Σ tr Σ 2 Σ Σ 2 + tr Σ 22 tr Σ + E[tr W 2 22 ] = n n tr Σ 2 22 +n tr Σ 22 2, E[tr ˆΨ 2 W ˆΨ2 W 22 ] = n + E[tr ˆΨ 2 W ˆΨ2 2 ] = +nn tr Σ 2 Σ { 2 + 22 N 2 nn 2 n tr Σ tr Σ 22, N 2 n } Σ 2 Σ 22 tr Σ 2 22 n + c 2N 2 + N2 2 2 } tr Σ 2 22 +2c 2 N 2 + N 2 2 + N2 2 { + + 2 N 2 A.4, A.5 n + c 22N 2 + N2 2 2 } +c 2 n N 2 + N2 2 + N 2 2 +{tr Σ 2 22 + tr Σ 22 2 } + 2 n + N 2 + N 2 2 + N 2 + N 2 n +2n + N 2 n +n 2 + n trσ 2 Σ Σ 2 2 tr Σ 22 Σ 2 Σ Σ 2 tr Σ 22 tr Σ 2 Σ Σ 2 +n tr Σ 2 Σ Σ 2 2. A.6 Substituting A.4-A.6 into A.3, the roof of Lemma A. is comlete. 4

A.2 Proof of Lemma 2. Using the moment roerties of the Wishart distribution and alying Lemma A., we get E[ˆΣ, ] = [ E N = ] W + W 2 N {N Σ + N 2 Σ } = Σ, A.7 [ ] E[ˆΣ,2 ] = N E W + W 2 W W 2 [ ] = N E W 2 + W 2 W W 2 = N {N Σ 2 + N 2 Σ 2 } = Σ 2, A.8 E[ˆΣ,22 ] = N N N 2 E[W N N 2 22 ] + [ ] N E W 2 W W + W 2 W W 2.A.9 Then, for the first term of A.9 we have N = N N 2 N N 2 Σ 22. N N 2 N N 2 For the second term of A.9 we obtain [ N E W 2 W = Σ 22 W + W 2 W N N 2 N N 2 E[W 22 ] ] W 2 Σ 22. A.0 A. Substituting A.0 and A. into A.9, we have E[ˆΣ,22 ] = Σ 22. A.2 Combining A.7, A.8 and A.2, we get E[ˆΣ ] = Σ. The unbiasedness of ˆΣ 2 is easily shown by calculating the first moment of the Wishart matrices W and W 2. 5

A.3 Proof of Lemma 2.2 The result in i is established in see, e.g., Ledoit and Wolf 2004. To show ii we first exress it as E[ ˆΣ Σ 2 F ] = E[tr ˆΣ 2 ]/ a 2. By using Lemma A.2, we can calculate E[tr ˆΣ 2 ], which gives the statement ii. It should be noted that i is shown assuming that N 2 = 0. Finaly, we give the roof of iii, by first exressing E[ ˆΣ 2 Σ 2 F ] = E[tr ˆΣ 2 2]/ a 2 A.3 and using the decomosition of ˆΣ 2 as n n S + N 2 n S2 S 2 S 2 S. 22 Then E [tr ˆΣ 2 2] = = { E [ n tr n S + N 2 ]} [tr S 2 n S2 + E 22 { n + tr Σ 2 + tr Σ 2 n n 2 ] } + 2 E[tr S 2 S 2 ] N 2 nn { tr Σ 2 + tr Σ 2}. A.4 Substituting A.4 into A.3, we obtain iii. It should be noted that i is shown assuming that N 2 = 0. A.4 Proof of Proosition 3. Under the conditions C and assuming that n, n 2,, with /n 0 and /n i c i 0,, i =, 2, E[ ˆΣ Σ 2 F ] αβ δ and E[ ˆΣ f Σ 2 F ] αβ 0 δ 0, A.5 6

where δ 0 = α+β 0. By using the continuous maing theorem and A.5, we obtain E[ ˆΣ f Σ 2 F ] E[ ˆΣ Σ 2 F ] αβ 0 δ 0 αβ δ. A.6 On the other hand, under the same assumtions as above, αβ 0 δ 0 αβ δ = α δ 0 δ β 0 β = α δ 0 δ m N 2α nn δ 0 δ a 2 a 22 2. A.7 Combining A.6 and A.7, Proosition 3. follows. Acknowledgments The authors would like to extend their sincere gratitude to the referee who gave invaluable comments and suggestions, which have greatly enhanced this aer. Any remaining errors are the authors resonsibility. This research was in art suorted by grants from the Forum for Asian Studies 44400-200, Stockholm University, Sweden. M.Hyodo s research was in art suorted by Grant-in-Aid for JSPS Fellow 23-973. T. Seo s research was in art suorted by Grant-in-Aid for Scientific Research C 23500360. T. Pavlenko s research was in art suorted by grants from Swedish Research Council 42-2008-966. References Anderson, T. W. 957. Maximum likelihood estimates for multivariate normal distribution when some observations are missing. Journal of the American Statistical Association, 52, 200-203. Anderson, T.W. and Olkin, I. 985. Maximum likelihood estimation of the arameters of multivariate normal distribution. Linear Algebra and Its Alications, 70, 47-7. Bhargava, R. P. 975. Some one-samle hyothesis testing roblems when there is monotone samle from a multivariate normal oulation. Annals of the Institute of Statistical Mathematics, 27, 327-339. 7

Fujikoshi, Y. and Seo, T. 998. Asymtotic aroximations for EPMC s of the linear and the quadratic discriminant function when the samle size and the dimension are large. Random Oerators and Stochastic Equations, 6, 269-280. Jinadasa, K. G. and Tracy, D. S. 992. Maximum likelihood estimation for multivariate normal distribution with monotone samle. Communications in Statistics A, Theory and Methods, 2, 4-50. Kanda, T. and Fujikoshi, Y. 998. Some basic roerties of the MLE s for a multivariate normal distribution with monotone missing data. American Journal of Mathematical and Management Sciences, 8, 6-90. Kubokawa, T. and Srivastava, M. S. 2008. Estimation of the recision matrix of a singular Wishart distribution and its alication in high-dimensional data. Journal of Multivariate Analysis, 99, 906-928. Little, R. J. A. and Rubin, D. R. 987. Statistical Analysis with Missing Data, Wiley, New York. Ledoit, O. and Wolf, M. 2004. A well-conditioned estimator for large dimensional covariance matrices. Journal of Multivariate Analysis, 88, 365 4. Muirhead, R. J. 982. Asects of multivariate statistical theory, Wiley, New York. Shutoh, N., Hyodo, M. and Seo, T. 20. An asymtotic aroximation for EPMC in linear discriminant analysis based on two-ste monotone missing samles. Journal of Multivariate Analysis, 02, 252 263. Siotani, M., Hayakawa, T. and Fujikoshi, Y. 985. Modern Multivariate Statistical Analysis, A Graduate Course and Handbook. Ohio:American Science Press. 8

Table 4.: The risk rates when ρ = 0.2 = 00 = 50 = 200 RRˆΣ f 0.968 0.347 0.050 RRˆΣ 0.946 0.336 0.044 RRˆΣ 2 0.942 0.336 0.044 Table 4.2: The risk rates when ρ = 0.5 = 00 = 50 = 200 RRˆΣ f 0.3392 0.2550 0.26 RRˆΣ 0.339 0.25 0.235 RRˆΣ 2 0.3309 0.2507 0.234 Table 4.3: The risk rates when ρ = 0.8 = 00 = 50 = 200 RRˆΣ f 0.682 0.5826 0.5039 RRˆΣ 0.6574 0.5634 0.4895 RRˆΣ 2 0.6508 0.5625 0.4890 9

N N 2 missing data 2 Figure.: 2-ste monotone missing data RR lotted as a function of the number of variables for = ˆΣ fsolid and = ˆΣ 2dashed. Fig.4.. q = 0.5. Fig.4.2. q = 0.7. Fig.4.3. q = 0.9. 20