Statistical Inference with Monotone Incomplete Multivariate Normal Data


This talk is based on joint work with my wonderful co-authors:
Wan-Ying Chang (Dept. of Fish & Wildlife, WA, USA)
Tomoya Yamada (Sapporo Gakuin University, Japan)
Megan Romer (Penn State University)


Background

We have a population of patients. We draw a random sample of N patients, and measure m variables on each patient:
1. Visual acuity
2. LDL (low-density lipoprotein) cholesterol
3. Systolic blood pressure
4. Glucose intolerance
5. Insulin response to oral glucose
6. Actual weight / Expected weight
...
m. White blood cell count

We obtain data:

Patient:    1         2         3       ...      N
          v_{1,1}   v_{2,1}   v_{3,1}   ...    v_{N,1}
          v_{1,2}   v_{2,2}   v_{3,2}   ...    v_{N,2}
            ...       ...       ...              ...
          v_{1,m}   v_{2,m}   v_{3,m}   ...    v_{N,m}

Vector notation: V_1, V_2, ..., V_N, where V_1 denotes the measurements on patient 1, stacked into a column, etc.

Classical multivariate analysis

Statistical analysis of N m-dimensional data vectors.
Common assumption: the population has a multivariate normal distribution.
V: the vector of measurements on a randomly chosen patient.
Multivariate normal populations are characterized by:
µ: the population mean vector
Σ: the population covariance matrix
For a given data set, µ and Σ are unknown.

We wish to perform inference about µ and Σ: construct confidence regions for, and test hypotheses about, µ and Σ.

Anderson (2003). An Introduction to Multivariate Statistical Analysis, third edition.
Eaton (1984). Multivariate Statistics: A Vector-Space Approach.
Johnson and Wichern (2002). Applied Multivariate Statistical Analysis.
Muirhead (1982). Aspects of Multivariate Statistical Theory.

Standard notation: V ~ N_m(µ, Σ)

The probability density function of V: for v ∈ R^m,

f(v) = (2π)^{-m/2} |Σ|^{-1/2} exp( -(1/2)(v - µ)' Σ^{-1} (v - µ) )

V_1, V_2, ..., V_N: measurements on N randomly chosen patients.
Estimate µ and Σ using Fisher's maximum likelihood principle.
Likelihood function: L(µ, Σ) = ∏_{j=1}^N f(V_j)
Maximum likelihood estimator: the value of (µ, Σ) that maximizes L.
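
As a quick aside (not part of the talk), here is a minimal numerical sketch of evaluating this log-likelihood, assuming NumPy/SciPy and simulated data; all names here are hypothetical.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
m, N = 3, 50
V = rng.multivariate_normal(np.zeros(m), np.eye(m), size=N)   # rows are V_1', ..., V_N'

def log_likelihood(mu, Sigma, data):
    # log L(mu, Sigma) = sum_j log f(V_j)
    return multivariate_normal(mean=mu, cov=Sigma).logpdf(data).sum()

print(log_likelihood(np.zeros(m), np.eye(m), V))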

µ̂ = (1/N) ∑_{j=1}^N V_j: the sample mean and MLE of µ
Σ̂ = (1/N) ∑_{j=1}^N (V_j - V̄)(V_j - V̄)': the MLE of Σ

What are the probability distributions of µ̂ and Σ̂?

µ̂ ~ N_m(µ, (1/N)Σ)
LLN: as N → ∞, (1/N)Σ → 0 and hence µ̂ → µ, a.s.
NΣ̂ has a Wishart distribution, a generalization of the χ².
µ̂ and Σ̂ also are mutually independent.
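
A hedged sketch of these closed-form complete-data estimates, under the same assumptions as the previous snippet:

import numpy as np

def mle_complete(V):
    # V has shape (N, m); returns the sample mean and the MLE of Sigma (divisor N, not N - 1).
    N = V.shape[0]
    mu_hat = V.mean(axis=0)
    centered = V - mu_hat
    return mu_hat, centered.T @ centered / N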

Monotone incomplete data

Some patients were not measured completely. The resulting data set, with ∗ denoting a missing observation:

v_{1,1}   v_{1,2}   v_{1,3}   ...   v_{1,m}
   ∗      v_{2,2}   v_{2,3}   ...   v_{2,m}
   ∗      v_{3,2}   v_{3,3}   ...   v_{3,m}
  ...
   ∗         ∗         ∗      ...   v_{N,m}

Monotone data: each ∗ is followed by ∗'s only.
We may need to renumber the patients to display the data in monotone form.
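
As an illustration only (not from the talk), a small sketch of that renumbering step: sort patients by how much is missing and check the staircase property, assuming missing values are coded as np.nan.

import numpy as np

def to_monotone_order(data):
    # Re-order patients (rows) so that those with fewer missing variables come first.
    n_missing = np.isnan(data).sum(axis=1)
    return data[np.argsort(n_missing, kind="stable")]

def looks_monotone(data):
    # After re-ordering: within every column, once a value is missing it stays missing.
    miss = np.isnan(to_monotone_order(data))
    return bool(np.all(miss[:-1] <= miss[1:]))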

Physical Fitness Data

A well-known data set from a SAS manual on missing data.
Patients: men taking a physical fitness course at NCSU.
Three variables were measured:
Oxygen: oxygen intake rate (ml per kg body weight per minute)
RunTime: time taken, in minutes, to run 1.5 miles
RunPulse: heart rate while running

Oxygen   RunTime   RunPulse
44.609   11.37     178
39.407   12.63     174
45.313   10.07     185
46.080   11.17     156
54.297    8.65     156
45.441    9.63     164
51.855   10.33     166
54.625    8.92     146
49.156    8.95     180
39.442   13.08     174
40.836   10.95     168
60.055    8.63     170
44.811   11.63     176
37.388   14.03     186
45.681   11.95     176
44.754   11.12     176
39.203   12.88     168
46.672   10.00      *
45.790   10.47     186
46.774   10.25      *
50.545    9.93     148
45.118   11.08      *
48.673    9.40     186
49.874    9.22      *
47.920   11.50     170
49.091   10.85      *
47.467   10.50     170
59.571     *        *
50.388   10.08     168
50.541     *        *
47.273     *        *

Monotone data have a staircase pattern; we will consider the two-step pattern.

Partition V into an incomplete part X of dimension p and a complete part Y of dimension q. The data are

(X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n), Y_{n+1}, Y_{n+2}, ..., Y_N

Assume that the individual vectors are independent and are drawn from N_m(µ, Σ).

Goal: maximum likelihood inference for µ and Σ, with analytical results as extensive and as explicit as in the classical setting.

Where do monotone incomplete data arise?

Panel survey data (Census Bureau, Bureau of Labor Statistics)
Astronomy
Early detection of diseases
Wildlife survey research
Covert communications
Mental health research
Climate and atmospheric studies
...

We have n observations on (X', Y')' and N - n additional observations on Y.

Difficulty: the likelihood function is more complicated:

L = ∏_{i=1}^n f_{X,Y}(x_i, y_i) · ∏_{i=n+1}^N f_Y(y_i)
  = ∏_{i=1}^n f_Y(y_i) f_{X|Y}(x_i) · ∏_{i=n+1}^N f_Y(y_i)
  = ∏_{i=1}^N f_Y(y_i) · ∏_{i=1}^n f_{X|Y}(x_i)

Partition µ and Σ similarly:

µ = (µ_1', µ_2')',    Σ = [ Σ_11, Σ_12 ; Σ_21, Σ_22 ]

Let µ_{1·2} = µ_1 + Σ_12 Σ_22^{-1} (Y - µ_2) and Σ_{11·2} = Σ_11 - Σ_12 Σ_22^{-1} Σ_21. Then

Y ~ N_q(µ_2, Σ_22),    X | Y ~ N_p(µ_{1·2}, Σ_{11·2})

µ̂ and Σ̂: Wilks, Anderson, Morrison, Olkin, Jinadasa, Tracy, ...
Anderson and Olkin (1985): an elegant derivation of Σ̂.
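
A hedged sketch (not the talk's code) of the factored log-likelihood from the previous slide, using these conditional-distribution formulas; X is an (n, p) array, Y an (N, q) array whose first n rows are paired with X, and all names are hypothetical.

import numpy as np
from scipy.stats import multivariate_normal

def two_step_loglik(mu1, mu2, S11, S12, S22, X, Y):
    n, p = X.shape
    loglik = multivariate_normal(mean=mu2, cov=S22).logpdf(Y).sum()   # sum_i log f_Y(y_i)
    B = S12 @ np.linalg.inv(S22)                                      # Sigma_12 Sigma_22^{-1}
    S11_2 = S11 - B @ S12.T                                           # Sigma_{11.2}
    resid = X - (mu1 + (Y[:n] - mu2) @ B.T)                           # X_i minus its conditional mean mu_{1.2}
    loglik += multivariate_normal(mean=np.zeros(p), cov=S11_2).logpdf(resid).sum()
    return loglik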

Sample means:

X̄ = (1/n) ∑_{j=1}^n X_j,    Ȳ_1 = (1/n) ∑_{j=1}^n Y_j,
Ȳ_2 = (1/(N-n)) ∑_{j=n+1}^N Y_j,    Ȳ = (1/N) ∑_{j=1}^N Y_j

Sample covariance matrices:

A_11 = ∑_{j=1}^n (X_j - X̄)(X_j - X̄)',    A_12 = ∑_{j=1}^n (X_j - X̄)(Y_j - Ȳ_1)' = A_21',
A_{22,n} = ∑_{j=1}^n (Y_j - Ȳ_1)(Y_j - Ȳ_1)',    A_{22,N} = ∑_{j=1}^N (Y_j - Ȳ)(Y_j - Ȳ)'

The MLEs of µ and Σ

Notation: τ = n/N, τ̄ = 1 - τ.

µ̂_1 = X̄ - τ̄ A_12 A_{22,n}^{-1} (Ȳ_1 - Ȳ_2),    µ̂_2 = Ȳ

µ̂_1 is called the regression estimator of µ_1. In sample surveys, extra observations on a subset of the variables are used to improve estimation of a parameter.

Σ̂ is more complicated:

Σ̂_11 = (1/n)(A_11 - A_12 A_{22,n}^{-1} A_21) + (1/N) A_12 A_{22,n}^{-1} A_{22,N} A_{22,n}^{-1} A_21
Σ̂_12 = (1/N) A_12 A_{22,n}^{-1} A_{22,N}
Σ̂_22 = (1/N) A_{22,N}
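
A hedged implementation sketch of the sample statistics from the previous slide and these MLE formulas; it assumes X is (n, p), Y is (N, q), the first n rows of Y are the complete cases, and N > n.

import numpy as np

def monotone_mle(X, Y):
    n, N = X.shape[0], Y.shape[0]
    Xbar, Ybar1, Ybar2, Ybar = X.mean(0), Y[:n].mean(0), Y[n:].mean(0), Y.mean(0)
    A11  = (X - Xbar).T @ (X - Xbar)
    A12  = (X - Xbar).T @ (Y[:n] - Ybar1)
    A22n = (Y[:n] - Ybar1).T @ (Y[:n] - Ybar1)
    A22N = (Y - Ybar).T @ (Y - Ybar)
    B = A12 @ np.linalg.inv(A22n)                      # A_12 A_{22,n}^{-1}
    tau_bar = 1 - n / N
    mu1 = Xbar - tau_bar * B @ (Ybar1 - Ybar2)         # regression estimator of mu_1
    mu2 = Ybar
    S22 = A22N / N
    S12 = B @ A22N / N
    S11 = (A11 - B @ A12.T) / n + B @ A22N @ B.T / N
    return mu1, mu2, S11, S12, S22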

Seventy-year-old unsolved problems

Explicit confidence levels for elliptical confidence regions for µ.
In testing hypotheses on µ or Σ, are the LRT statistics unbiased?
Calculate the higher moments of the components of µ̂.
Determine the asymptotic behavior of µ̂ as n, N → ∞.
The Stein phenomenon for µ̂?

The crucial obstacle: the exact distribution of µ̂.

The Exact Distribution of µ̂

Theorem (Chang and D.R.): For n > p + q,

µ̂ =_L µ + V_1 + (1/n - 1/N)^{1/2} (Q_2/Q_1)^{1/2} (V_2', 0')'

where V_1, V_2, Q_1, and Q_2 are independent;

V_1 ~ N_{p+q}(0, Ω),    V_2 ~ N_p(0, Σ_{11·2}),    Q_1 ~ χ²_{n-q},    Q_2 ~ χ²_q;

Ω = (1/N) Σ + (1/n - 1/N) [ Σ_{11·2}, 0 ; 0, 0 ]

Corollary: µ̂ is an unbiased estimator of µ. Also, µ̂_1 and µ̂_2 are independent iff Σ_12 = 0.

A comment on the power of the method of characteristic functions; Terence's Stuff.
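
A hedged simulation sketch of this stochastic representation (my own check, not code from the talk); averaging many draws illustrates the unbiasedness in the corollary and the covariance formula on the next slide. It assumes n > q.

import numpy as np

def sample_mu_hat(mu, Sigma, p, q, n, N, rng):
    S11_2 = Sigma[:p, :p] - Sigma[:p, p:] @ np.linalg.solve(Sigma[p:, p:], Sigma[p:, :p])
    Omega = Sigma / N
    Omega[:p, :p] += (1/n - 1/N) * S11_2
    V1 = rng.multivariate_normal(np.zeros(p + q), Omega)
    V2 = rng.multivariate_normal(np.zeros(p), S11_2)
    Q1, Q2 = rng.chisquare(n - q), rng.chisquare(q)
    draw = mu + V1
    draw[:p] += np.sqrt((1/n - 1/N) * Q2 / Q1) * V2
    return draw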

Computation of the higher moments of µ̂ now is straightforward.

Due to the term 1/Q_1, even moments exist only up to order n - q.

The covariance matrix of µ̂:

Cov(µ̂) = (1/N) Σ + [ (n-2) τ̄ / (n(n-q-2)) ] [ Σ_{11·2}, 0 ; 0, 0 ]

Asymptotics for µ̂

Let n, N → ∞ with N/n → δ ≥ 1. Then

√N (µ̂ - µ) →_L N_{p+q}( 0, Σ + (δ - 1) [ Σ_{11·2}, 0 ; 0, 0 ] )

Many other asymptotic results can be obtained from the exact distribution of µ̂; e.g., if n → ∞ and N → ∞ with n/N → 0, then µ̂_2 → µ_2, almost surely.

The analog of Hotelling's T²-statistic:

T² = (µ̂ - µ)' [Ĉov(µ̂)]^{-1} (µ̂ - µ)

where

Ĉov(µ̂) = (1/N) Σ̂ + [ (n-2) τ̄ / (n(n-q-2)) ] [ Σ̂_{11·2}, 0 ; 0, 0 ]

An obvious ellipsoidal confidence region for µ is

{ ν ∈ R^{p+q} : (µ̂ - ν)' [Ĉov(µ̂)]^{-1} (µ̂ - ν) ≤ c }

Calculate the corresponding confidence level.
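
A hedged sketch of this T²-type statistic, built from the MLE blocks returned by the earlier monotone_mle sketch; ν (nu) is the point being tested.

import numpy as np

def t2_statistic(mu1, mu2, S11, S12, S22, n, N, nu):
    p, q = len(mu1), len(mu2)
    mu_hat = np.concatenate([mu1, mu2])
    S = np.block([[S11, S12], [S12.T, S22]])
    S11_2 = S11 - S12 @ np.linalg.solve(S22, S12.T)           # estimated Sigma_{11.2}
    cov_hat = S / N
    cov_hat[:p, :p] += (n - 2) * (1 - n/N) / (n * (n - q - 2)) * S11_2
    d = mu_hat - nu
    return float(d @ np.linalg.solve(cov_hat, d))

# nu lies in the ellipsoidal confidence region iff t2_statistic(..., nu) <= c.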

Chang and D.R.: probability inequalities for T².

µ̂_1 = X̄ - τ̄ A_12 A_{22,n}^{-1} (Ȳ_1 - Ȳ_2),    µ̂_2 = Ȳ

A crucial identity (Anderson, p. 63):

µ̂' Σ^{-1} µ̂ = (µ̂_1 - Σ_12 Σ_22^{-1} µ̂_2)' Σ_{11·2}^{-1} (µ̂_1 - Σ_12 Σ_22^{-1} µ̂_2) + µ̂_2' Σ_22^{-1} µ̂_2

Romer (2009, thesis): an exact stochastic representation for the T²-statistic.

Romer's result opens the way toward Stein estimation of µ when Σ is unknown.

A decomposition of Σ̂

Notation: A_{11·2,n} := A_11 - A_12 A_{22,n}^{-1} A_21

n Σ̂ = τ [ A_11, A_12 ; A_21, A_{22,n} ] + τ̄ [ A_{11·2,n}, 0 ; 0, 0 ]
      + τ [ A_12 A_{22,n}^{-1}, 0 ; 0, I_q ] [ B, B ; B, B ] [ A_{22,n}^{-1} A_21, 0 ; 0, I_q ]

where

[ A_11, A_12 ; A_21, A_{22,n} ] ~ W_{p+q}(n-1, Σ)    and    B ~ W_q(N-n, Σ_22)

are independent. Also, N Σ̂_22 ~ W_q(N-1, Σ_22).

A_{22,N} = ∑_{j=1}^n (Y_j - Ȳ_1 + Ȳ_1 - Ȳ)(Y_j - Ȳ_1 + Ȳ_1 - Ȳ)' + ∑_{j=n+1}^N (Y_j - Ȳ_2 + Ȳ_2 - Ȳ)(Y_j - Ȳ_2 + Ȳ_2 - Ȳ)'

Hence A_{22,N} = A_{22,n} + B, where

B = ∑_{j=n+1}^N (Y_j - Ȳ_2)(Y_j - Ȳ_2)' + [ n(N-n)/N ] (Ȳ_1 - Ȳ_2)(Ȳ_1 - Ȳ_2)'

Verify that the terms in the decomposition of Σ̂ are independent.

The marginal distribution of Σ̂_11 is non-trivial.

If Σ_12 = 0 then A_{22,n}, B, A_{11·2,n}, A_12 A_{22,n}^{-1} A_21, X̄, Ȳ_1, and Ȳ_2 are independent.

Matrix F-distribution: F^{(q)}_{a,b} = W_2^{-1/2} W_1 W_2^{-1/2}, where W_1 ~ W_q(a, Σ_22) and W_2 ~ W_q(b, Σ_22).

Theorem: Suppose that Σ_12 = 0. Then

Σ_11^{-1/2} Σ̂_11 Σ_11^{-1/2} =_L (1/n) W_1 + (1/N) W_2^{1/2} (I_p + F) W_2^{1/2}

where W_1, W_2, and F are independent,

W_1 ~ W_p(n-q-1, I_p),    W_2 ~ W_p(q, I_p),    F ~ F^{(p)}_{N-n, n-q+p-1}

and

N Σ_11^{-1/2} Σ̂_11 Σ_11^{-1/2} =_L (N/n) Σ_11^{-1/2} A_{11·2,n} Σ_11^{-1/2} + Σ_11^{-1/2} A_12 A_{22,n}^{-1} (A_{22,n} + B) A_{22,n}^{-1} A_21 Σ_11^{-1/2}

Theorem: With no assumptions on Σ_12,

Σ̂_12 Σ̂_22^{-1} =_L Σ_12 Σ_22^{-1} + Σ_{11·2}^{1/2} W^{-1/2} K Σ_22^{-1/2}

where W and K are independent, W ~ W_p(n-q+p-1, I_p), and K ~ N_{p×q}(0, I_p ⊗ I_q).

In particular, Σ̂_12 Σ̂_22^{-1} is an unbiased estimator of Σ_12 Σ_22^{-1}.

The general distribution of Σ̂ requires the hypergeometric functions of matrix argument; saddlepoint approximations.

The distribution of |Σ̂| is much simpler:

|Σ̂| = |Σ̂_{11·2}| |Σ̂_22|

Σ̂_{11·2} and Σ̂_22 are independent; the determinant of each is proportional to a product of independent χ² variables.

Hao and Krishnamoorthy (2001):

|Σ̂| =_L n^{-p} N^{-q} |Σ| ∏_{j=1}^p χ²_{n-q-j} ∏_{j=1}^q χ²_{N-j}

It now is plausible that tests of hypotheses on Σ are unbiased.

Testing Σ = Σ_0

Data: a two-step, monotone incomplete sample.
Σ_0: a given, positive definite matrix.
Test H_0: Σ = Σ_0 vs. H_a: Σ ≠ Σ_0 (WLOG, Σ_0 = I_{p+q}).

Hao and Krishnamoorthy (2001): the LRT statistic is

λ_1 ∝ |A_{22,N}|^{N/2} exp( -(1/2) tr A_{22,N} ) · |A_{11·2,n}|^{n/2} exp( -(1/2) tr A_{11·2,n} ) · exp( -(1/2) tr A_12 A_{22,n}^{-1} A_21 )

Is the LRT unbiased? If C is a critical region of size α, is P(λ_1 ∈ C | H_a) ≥ P(λ_1 ∈ C | H_0)?
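
A hedged sketch of λ_1 on the log scale, up to additive constants, computed from the A matrices (WLOG Σ_0 = I); the A matrices are as in the monotone_mle sketch.

import numpy as np

def log_lambda1(A11, A12, A22n, A22N, n, N):
    cross = A12 @ np.linalg.solve(A22n, A12.T)          # A_12 A_{22,n}^{-1} A_21
    A11_2n = A11 - cross                                # A_{11.2,n}
    return (0.5 * N * np.linalg.slogdet(A22N)[1] - 0.5 * np.trace(A22N)
            + 0.5 * n * np.linalg.slogdet(A11_2n)[1] - 0.5 * np.trace(A11_2n)
            - 0.5 * np.trace(cross))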

E. J. G. Pitman: with complete data, λ_1 is not unbiased; λ_1 becomes unbiased if sample sizes are replaced by degrees of freedom.

With two-step monotone data, perhaps a similarly modified statistic, λ_2, is unbiased? Answer: still unknown.

Theorem: If |Σ_11| < 1 then λ_2 is unbiased.

With monotone incomplete data, further modification is needed.

Theorem: The modified LRT,

λ_3 ∝ |A_{22,N}|^{(N-1)/2} exp( -(1/2) tr A_{22,N} ) · |A_{11·2,n}|^{(n-q-1)/2} exp( -(1/2) tr A_{11·2,n} ) · |A_12 A_{22,n}^{-1} A_21|^{q/2} exp( -(1/2) tr A_12 A_{22,n}^{-1} A_21 ),

is unbiased. Also, λ_1 is not unbiased.

For diagonal Σ = diag(σ_jj), the power function of λ_3 increases monotonically as any |σ_jj - 1| increases, j = 1, ..., p+q.

With monotone two-step data, test H_0: (µ, Σ) = (µ_0, Σ_0) vs. H_a: (µ, Σ) ≠ (µ_0, Σ_0), where µ_0 and Σ_0 are given (WLOG, µ_0 = 0 and Σ_0 = I_{p+q}). The LRT statistic is

λ_4 = λ_1 exp( -(1/2)(n X̄'X̄ + N Ȳ'Ȳ) )

Remarkably, λ_4 is unbiased.

The sphericity test, H_0: Σ ∝ I_{p+q} vs. H_a: Σ not ∝ I_{p+q}: the unbiasedness of the LRT statistic is an open problem.

The Stein phenomenon for µ̂

µ̂: the mean of a complete sample from N_m(µ, I_m).
Quadratic loss function: L(µ̂, µ) = ‖µ̂ - µ‖².
Risk function: R(µ̂) = E L(µ̂, µ).

C. Stein: µ̂ is inadmissible for m ≥ 3 (Berkeley Symposium, 1956).

The James-Stein estimator for shrinking µ̂ to ν ∈ R^m:

µ̂_c = ( 1 - c/‖µ̂ - ν‖² ) (µ̂ - ν) + ν

Baranchik's positive-part shrinkage estimator:

µ̂_c^+ = ( 1 - c/‖µ̂ - ν‖² )_+ (µ̂ - ν) + ν

With complete data from N_m(µ, I_m), R(µ̂) > R(µ̂_c) > R(µ̂_c^+) for 0 < c < 2(m - 2).
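
A hedged sketch of these two shrinkage estimators; mu_hat and nu are NumPy vectors and c > 0 is the shrinkage constant.

import numpy as np

def james_stein(mu_hat, nu, c):
    d = mu_hat - nu
    return (1.0 - c / np.dot(d, d)) * d + nu

def positive_part(mu_hat, nu, c):
    d = mu_hat - nu
    return max(1.0 - c / np.dot(d, d), 0.0) * d + nu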

We collect a monotone incomplete sample from N_{p+q}(µ, Σ). Does the Stein phenomenon hold for µ̂, the MLE of µ?

The phenomenon seems almost universal: it holds for many loss functions, inference problems, and distributions.

Various results are available on shrinkage estimation of Σ with incomplete data, but no such results were available for µ.

The crucial impediment: the distribution of µ̂ was unknown.

Theorem (Yamada and D.R.): With two-step monotone incomplete data from N_{p+q}(µ, I_{p+q}), with p ≥ 2 and n ≥ q + 3, both µ̂ and µ̂_c are inadmissible:

R(µ̂) > R(µ̂_c) > R(µ̂_c^+)

for all ν ∈ R^{p+q} and all c ∈ (0, 2c*), where

c* = (p - 2)/n + q/N

Non-radial loss functions: replace ‖µ̂ - ν‖² by non-radial functions of µ̂ - ν.

Shrinkage to a random vector ν, calculated from the data.
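
For concreteness, a usage sketch of the shrinkage functions above, taking the reconstruction of c* at face value; the numbers are arbitrary.

p, q, n, N = 5, 3, 40, 100
c_star = (p - 2) / n + q / N
# shrunk = positive_part(mu_hat, nu=np.zeros(p + q), c=c_star)   # any c in (0, 2*c_star)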

The Fundamental Open Problems

Extension to general, k-step, monotone incomplete data. The crucial impediment: the distribution of µ̂.

Conjecture: the Stein phenomenon holds for all monotone missingness patterns under normality.

Does Stein's phenomenon hold for arbitrary missingness patterns? Will we have reduced risk with similar estimators?

Testing for multivariate normality

Monotone incomplete data, i.i.d., unknown population:

(X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n), Y_{n+1}, Y_{n+2}, ..., Y_N

A generalization of Mardia's statistic for testing kurtosis:

β̂ = ∑_{j=1}^n [ ((X_j', Y_j')' - µ̂)' Σ̂^{-1} ((X_j', Y_j')' - µ̂) ]² + ∑_{j=n+1}^N [ (Y_j - µ̂_2)' Σ̂_22^{-1} (Y_j - µ̂_2) ]²
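
A hedged sketch of β̂, reusing the MLE blocks from the monotone_mle sketch; X is (n, p), Y is (N, q).

import numpy as np

def beta_hat(X, Y, mu1, mu2, S11, S12, S22):
    n = X.shape[0]
    mu = np.concatenate([mu1, mu2])
    Sinv = np.linalg.inv(np.block([[S11, S12], [S12.T, S22]]))
    full = np.hstack([X, Y[:n]]) - mu                            # complete cases
    tail = Y[n:] - mu2                                           # Y-only cases
    d_full = np.einsum("ij,jk,ik->i", full, Sinv, full)          # squared Mahalanobis distances
    d_tail = np.einsum("ij,jk,ik->i", tail, np.linalg.inv(S22), tail)
    return float((d_full**2).sum() + (d_tail**2).sum())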

An alternative to β̂

Impute each missing X_j using linear regression:

X̃_j = X_j for 1 ≤ j ≤ n,    X̃_j = µ̂_1 + Σ̂_12 Σ̂_22^{-1} (Y_j - µ̂_2) for n+1 ≤ j ≤ N

Construct

β̃ = ∑_{j=1}^N [ ((X̃_j', Y_j')' - µ̂)' Σ̂^{-1} ((X̃_j', Y_j')' - µ̂) ]²

Remarkably, β̃ ≡ β̂.

Theorem (Yamada, Romer, D.R.): Under certain regularity conditions, there are constants c_1, c_2 such that (β̂ - c_1)/c_2 →_L N(0, 1).
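
A hedged sketch of the imputation-based β̃; numerically it should agree with the beta_hat sketch above, which is exactly the identity noted on this slide.

import numpy as np

def beta_tilde(X, Y, mu1, mu2, S11, S12, S22):
    n = X.shape[0]
    B = S12 @ np.linalg.inv(S22)
    X_imp = mu1 + (Y[n:] - mu2) @ B.T                  # regression imputation of the missing X_j
    Z = np.hstack([np.vstack([X, X_imp]), Y]) - np.concatenate([mu1, mu2])
    Sinv = np.linalg.inv(np.block([[S11, S12], [S12.T, S22]]))
    mahal = np.einsum("ij,jk,ik->i", Z, Sinv, Z)
    return float((mahal**2).sum())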

References

Chang and R. (2009). Finite-sample inference with monotone incomplete multivariate normal data, I. J. Multivariate Analysis, 100, 1883–1899.

Chang and R. (2010). Finite-sample inference with monotone incomplete multivariate normal data, II. J. Multivariate Analysis, 101, 603–620.

Romer (2009). The Statistical Analysis of Monotone Incomplete Data. Ph.D. thesis, Penn State University.

R. and Yamada (2010). The Stein phenomenon for monotone incomplete multivariate normal data. J. Multivariate Analysis, 101, 657–678.

Romer and R. (2010). Maximum likelihood estimation of the mean of a multivariate normal population with monotone incomplete data. Statist. Probab. Lett., 80, 1284–1288.

Yamada, Romer, and R. (2010). Measures of multivariate kurtosis with monotone incomplete data. Preprint, Penn State University.

Romer and R. (2010). Finite-sample inference with monotone incomplete multivariate normal data, III. Preprint, Penn State University.