THE STATISTICAL ANALYSIS OF MONOTONE INCOMPLETE MULTIVARIATE NORMAL DATA


The Pennsylvania State University
The Graduate School
Department of Statistics

THE STATISTICAL ANALYSIS OF MONOTONE INCOMPLETE MULTIVARIATE NORMAL DATA

A Dissertation in Statistics
by
Megan M. Romer

© 2009 Megan M. Romer

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

August 2009

The dissertation of Megan M. Romer was reviewed and approved* by the following:

Steven F. Arnold
Professor of Statistics
Chair of Committee

Vernon M. Chinchilli
Professor of Statistics
Distinguished Professor and Chair of Public Health Sciences

Diane M. Henderson
Professor of Mathematics

Donald St. P. Richards
Professor of Statistics
Dissertation Adviser

Bruce G. Lindsay
Willaman Professor of Statistics
Head of the Department of Statistics

* Signatures on file in the Graduate School.

Abstract

We consider problems in finite-sample inference with monotone incomplete data drawn from $N_d(\mu, \Sigma)$, a multivariate normal population with mean $\mu$ and covariance matrix $\Sigma$. In the case of two-step, monotone incomplete data, we show that $\hat{\mu}$ and $\hat{\Sigma}$, the maximum likelihood estimators of $\mu$ and $\Sigma$, respectively, are equivariant, and we obtain a new derivation of a stochastic representation for $\hat{\mu}$. Our new derivation allows us to identify explicitly, in terms of the data, the independent random variables that arise in that stochastic representation. Again in the case of two-step, monotone incomplete data, we derive a stochastic representation for the exact distribution of a generalization of Hotelling's $T^2$, and therefore obtain ellipsoidal confidence regions for $\mu$. We then derive probability inequalities for the cumulative distribution function of $T^2$. We apply these results to construct confidence regions for linear combinations of $\mu$, and provide a numerical example in which we analyze a data set consisting of cholesterol measurements on a group of Pennsylvania hospital patients. In the case of three-step, monotone incomplete data, we examine the independence properties and joint distribution of subvectors of $\hat{\mu}$, the maximum likelihood estimator of $\mu$. In our examination of the joint distribution of $\hat{\mu}$, we first establish that $\hat{\mu}$ is equivariant and then identify the distribution of $\hat{\mu}$ up to a certain set of conditioning variables.

Table of Contents

List of Tables
List of Figures
Acknowledgments

Chapter 1. Introduction

Chapter 2. Preliminaries
2.1 Some matrix algebra
2.2 Some multivariate distributions
2.2.1 The matrix normal distribution
2.2.2 The Wishart distribution
2.2.3 The multivariate beta distribution
2.2.4 The noncentral $\chi^2$ distribution
2.3 Some properties of these distributions

Chapter 3. Two-step Monotone Incomplete Multivariate Normal Data
3.1 Notation and maximum likelihood estimators
3.2 A new derivation of an exact stochastic representation for $\hat{\mu}$
3.3 An exact stochastic representation for the $T^2$-statistic
3.4 Probability inequalities for the $T^2$-statistic
3.5 Applications of the $T^2$-statistic
3.5.1 Simultaneous confidence intervals for linear functions of $\mu$
3.5.2 Ellipsoidal prediction regions for future observations
3.5.3 Analysis of the Pennsylvania cholesterol data

Chapter 4. Three-Step Monotone Incomplete Multivariate Normal Data
4.1 Notation and maximum likelihood estimators
4.2 Correlation properties of $\hat{\mu}_1$, $\hat{\mu}_2$, and $\hat{\mu}_3$
4.3 The distribution of $\hat{\mu}$
4.4 The covariance matrix of $\hat{\mu}$

Chapter 5. Concluding Remarks

Bibliography

List of Tables

1.1 The Pennsylvania Cholesterol Data
3.1 ...% Confidence Interval for Mean Cholesterol Levels

List of Figures

3.1 Simulated Cumulative Distribution Function of the $T^2$-statistic: Comparison of Bounds (p = 2; q = ; n = 9; N = )
3.2 Simulated Cumulative Distribution Function of the $T^2$-statistic: Comparison of Bounds (p = 2; q = 2; n = 0; N = )
3.3 Simulated Cumulative Distribution Function of the $T^2$-statistic: Comparison of Bounds (p = 3; q = 3; n = 5; N = )
3.4 Simulated Cumulative Distribution Function of the $T^2$-statistic: Comparison of Bounds (p = 4; q = 4; n = 5; N = )
3.5 Simulated Cumulative Distribution Function of the $T^2$-statistic: Approximation (p = 2; q = 2; n = 0; N = )
3.6 Simulated Cumulative Distribution Function of the $T^2$-statistic: Approximation (p = 4; q = 4; n = 5; N = )
3.7 The Pennsylvania Cholesterol Data: Assessment of Multivariate Normality

Acknowledgments

To Donald, Jenn, Chuck, and Baby: For reasons unique to each of you, I thank you.

Chapter 1

Introduction

In all areas of research, incomplete data sets are ubiquitous. To describe a few of the many situations in which such data occur: in clinical trials, participants often drop out of studies; in engineering research, machines often fail; in scientific laboratories, beakers often break; in astronomy, cloudy weather interferes with data collection. Because of the omnipresence of data sets with missing values, there now exists an extensive literature on the analysis of such data. We refer to Giri [13], Johnson and Wichern [16], Little and Rubin [23], Schafer [30], and Srivastava [31] for treatments of statistical inference with incomplete data and a wide range of applications including astronomy, biology, clinical trials, and sample surveys.

A well-known example of an incomplete data set was provided by Ryan and Joiner [29]. Researchers at a medical center in Pennsylvania monitored the cholesterol levels of 28 patients over a period of 14 days immediately following a heart attack. All 28 patients had their cholesterol levels measured at 2 and at 4 days of follow-up, and 19 patients were measured again on day 14. The data are displayed in Table 1.1.

[Table 1.1. The Pennsylvania Cholesterol Data: cholesterol levels at 2, 4, and 14 days of follow-up; asterisks denote missing day-14 measurements. Table entries not reproduced.]

The cholesterol data provide an example of a two-step monotone incomplete pattern. A random sample $X_\alpha = (X_{1\alpha}, \ldots, X_{d\alpha})'$, $\alpha = 1, \ldots, N$, from a $d$-dimensional multivariate population is said to be monotone incomplete if, whenever $X_{l\alpha}$ is missing, then $X_{j\beta}$ also is missing for all $j > l$ and $\beta > \alpha$. More generally, we can conceive of $k$-step monotone incomplete data sets, as visualized in Figure 1.1.
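As a concrete illustration of this definition, the following Python sketch checks whether a numeric array with missing entries (coded as NaN) exhibits a monotone incomplete pattern. The function name and the toy data are hypothetical and are not taken from the dissertation; the check assumes the variables and observations have already been arranged in the order used above.

```python
import numpy as np

def is_monotone_incomplete(data):
    """Return True if the N x d array `data` (np.nan = missing) has a
    monotone incomplete pattern: once a variable is missing for some
    observation, it and all later variables are missing for that
    observation and for every later observation."""
    missing = np.isnan(data)
    N, d = data.shape
    for alpha in range(N):
        for l in range(d):
            if missing[alpha, l] and not missing[alpha:, l:].all():
                return False
    return True

# Toy two-step example: 3 complete rows, then 2 rows observed on the first
# variable only.
X = np.array([[1.0, 2.0], [0.5, 1.5], [2.0, 3.0],
              [1.2, np.nan], [0.8, np.nan]])
print(is_monotone_incomplete(X))  # True
```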

A random vector $X \in \mathbb{R}^d$ is said to have a multivariate normal distribution with mean $\mu \in \mathbb{R}^d$ and positive definite (symmetric) $d \times d$ covariance matrix $\Sigma$, denoted $X \sim N_d(\mu, \Sigma)$, if its density function is

$$(2\pi)^{-d/2}\,|\Sigma|^{-1/2}\exp\left[-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right], \qquad (1.0.1)$$

$x \in \mathbb{R}^d$. The multivariate normal distribution is undoubtedly the most important distribution in statistics, and many extensive, classical treatments of statistical inference for multivariate normally distributed data are available, e.g., Anderson [2], Eaton [9], and Muirhead [25].

In this dissertation, we consider problems arising in the statistical analysis of monotone incomplete data drawn from a multivariate normal population. In general, $\mu$ and $\Sigma$ are unknown, so it is of interest to perform statistical inference for them. Throughout the dissertation, we focus on maximum likelihood estimators of $\mu$ and $\Sigma$. When the data are monotone incomplete, the maximum likelihood estimators of $\mu$ and $\Sigma$, denoted by $\hat{\mu}$ and $\hat{\Sigma}$, respectively, are well known [1], [3], [15], [24]; however, the exact distributions of $\hat{\mu}$ and $\hat{\Sigma}$ are far more complicated than in the case of complete data sets. In fact, until recently, neither distribution was known in the case of two-step data and, for general $k$, both distributions still remain unknown. Morrison [24] and Kanda and Fujikoshi [17] examined the exact means and variances of $\hat{\mu}$ in great detail. Kanda and Fujikoshi also went on to find asymptotic results.

Statistical inference for $\mu$ was first developed without knowledge of the exact distribution of $\hat{\mu}$. There has been much research in the area of hypothesis testing for $\mu$, e.g.,

Bhargava [4], [5], Eaton and Kariya [10], Giri [13], and Hao and Krishnamoorthy [14]. A drawback of hypothesis testing alone is that each element of $\mu$ must be specified in the null hypothesis; therefore, it is preferable that the results of a hypothesis test be accompanied by a confidence region. Confidence regions for $\mu$ may be based on the likelihood ratio test statistic; however, the resulting regions are not ellipsoidal and, in fact, have rather counterintuitive shapes. For the case in which the data are two-step monotone incomplete, the first derivation of ellipsoidal confidence regions for $\mu$ was obtained by Krishnamoorthy and Pannala [20]. The estimator given in [20] of $\mathrm{Cov}(\hat{\mu})$, the covariance matrix of $\hat{\mu}$, is, in retrospect, not identical to $\widehat{\mathrm{Cov}}(\hat{\mu})$, the maximum likelihood estimator of $\mathrm{Cov}(\hat{\mu})$; however, it is asymptotically equivalent. Therefore we denote their estimated covariance matrix by $\widetilde{\mathrm{Cov}}(\hat{\mu})$. Krishnamoorthy and Pannala [20] obtained ellipsoidal confidence regions for $\mu$ by means of

$$T^2 = (\hat{\mu} - \mu)'\,\widetilde{\mathrm{Cov}}(\hat{\mu})^{-1}(\hat{\mu} - \mu),$$

a generalization of the classical Hotelling's $T^2$-statistic. Krishnamoorthy and Pannala approximated the distribution of $T^2$ with an $F$-distribution, and we shall show that, for small dimensions, their approximation is very close to the exact distribution. Moreover, they also extended this method to general $k$-step monotone incomplete data.

Chang and Richards [6], [7] derived stochastic representations for the exact distributions of $\hat{\mu}$ and $\hat{\Sigma}$ in the case of two-step monotone incomplete data. These stochastic representations are important because the asymptotic distributions hold only for large sample sizes, which often are unavailable, especially for high-dimensional data. Chang

and Richards [6] also derived $\widehat{\mathrm{Cov}}(\hat{\mu})$, the maximum likelihood estimator of $\mathrm{Cov}(\hat{\mu})$, and therefore generalized the classical Hotelling's $T^2$-statistic to

$$T^2 = (\hat{\mu} - \mu)'\,\widehat{\mathrm{Cov}}(\hat{\mu})^{-1}(\hat{\mu} - \mu). \qquad (1.0.2)$$

Chang and Richards based their ellipsoidal confidence regions for $\mu$ on probability inequalities for $T^2$.

In this dissertation, we begin by considering the case of two-step monotone incomplete data, i.e., $k = 2$. Let $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$. Suppose $(X', Y')' \sim N_{p+q}(\mu, \Sigma)$. In the two-step setting, we observe $n$ mutually independent observations on $(X', Y')'$ and an additional $N - n$ independent observations on $X$ only. Therefore, the data are mutually independent vectors of the form

$$\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix}, \begin{pmatrix} X_2 \\ Y_2 \end{pmatrix}, \ldots, \begin{pmatrix} X_n \\ Y_n \end{pmatrix}, \; X_{n+1}, X_{n+2}, \ldots, X_N.$$

In this setting, we provide an alternative derivation of the stochastic representation for $\hat{\mu}$ found by Chang and Richards [6]. We first prove that both $\hat{\mu}$ and $\hat{\Sigma}$ are equivariant, a result that greatly simplifies the examination of both statistics, as we then may assume $\mu = 0$ and $\Sigma = I_{p+q}$, the identity matrix. This new derivation is based on the conditional distribution of the incomplete data given the complete data, and it identifies explicitly the independent random variables that appear in the stochastic representation.

We derive next a stochastic representation for the exact distribution of the $T^2$-statistic (1.0.2). We first prove an invariance property of the $T^2$-statistic, and we rely heavily on that property in our derivation of the exact stochastic representation. This stochastic representation allows us to construct ellipsoidal confidence regions for $\mu$, or to perform related tests of hypotheses, with exact confidence levels or levels of significance, respectively. As our confidence regions are based on the exact distribution of the $T^2$-statistic, it follows that our confidence regions are of exact level and hence are preferable to those of Chang and Richards [6] or Krishnamoorthy and Pannala [20]. From this stochastic representation, we also derive simultaneous confidence intervals for linear combinations of $\mu$, and we apply our confidence intervals and those previously available to the Pennsylvania cholesterol data as a numerical example. Because our stochastic representation is quite complex, we also provide in Chapter 3 probability inequalities for the $T^2$-statistic. The last application we explore is the construction of prediction regions for new observations; although we are unable to find the exact distribution for our proposed statistic, we are confident that $F$-approximations can be obtained.

In the second part of the dissertation, which appears in Chapter 4, we address several research problems related to three-step monotone incomplete data, i.e., $k = 3$. Let $X_1 \in \mathbb{R}^{p_1}$, $X_2 \in \mathbb{R}^{p_2}$, and $X_3 \in \mathbb{R}^{p_3}$. Suppose $(X_1', X_2', X_3')' \sim N_{p_1+p_2+p_3}(\mu, \Sigma)$. In the three-step setting, we observe $n_1$ mutually independent observations on $(X_1', X_2', X_3')'$,

an additional $n_2$ independent observations on $(X_1', X_2')'$, and an additional $n_3$ observations on $X_1$ only. Therefore, the data are mutually independent vectors of the form

$$\begin{pmatrix} X_{1,1} \\ X_{2,1} \\ X_{3,1} \end{pmatrix}, \ldots, \begin{pmatrix} X_{1,n_1} \\ X_{2,n_1} \\ X_{3,n_1} \end{pmatrix}, \; \begin{pmatrix} X_{1,n_1+1} \\ X_{2,n_1+1} \end{pmatrix}, \ldots, \begin{pmatrix} X_{1,n_1+n_2} \\ X_{2,n_1+n_2} \end{pmatrix}, \; X_{1,n_1+n_2+1}, \ldots, X_{1,n_1+n_2+n_3}.$$

We partition $\mu$ in similar fashion into three subvectors, $\mu_1$, $\mu_2$, and $\mu_3$. In this setting, we establish independence between $\hat{\mu}_1$ and $\{\hat{\mu}_2, \hat{\mu}_3\}$. Furthermore, we prove that when $\Sigma = I_d$, $d = p_1 + p_2 + p_3$, these subvectors are pairwise uncorrelated; therefore, although we have not established independence between $\hat{\mu}_2$ and $\hat{\mu}_3$, we have shown that they are uncorrelated for $\Sigma = I_d$. We establish the equivariance of $\hat{\mu}$ under a certain group of transformations and provide an extension of our alternative derivation for the distribution of $\hat{\mu}$ from two-step to three-step monotone incomplete data. Although we have not been able to find a joint stochastic representation for $\hat{\mu}_1$, $\hat{\mu}_2$, and $\hat{\mu}_3$, we believe that we have identified the six random variables, whose joint distribution is unknown, that form the basis of that representation.

Throughout the dissertation, we have also made an assumption on the process that generates the incomplete data. There are three main underlying processes that describe how observations come to be missing: missing at random, missing completely at random, and not missing at random [28]. Because of the independence structure we have assumed, we have also implicitly assumed that our data are missing completely at random; that is, there is no reason or order as to why any one unit would be missing an observation as opposed to another.

Readers are referred to [23] and [30] for further discussion of the types of missingness.
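To make the monotone layouts described in this chapter concrete, the sketch below generates a synthetic three-step monotone incomplete normal sample: $n_1$ complete observations on $(X_1', X_2', X_3')'$, a further $n_2$ observed only on $(X_1', X_2')'$, and a further $n_3$ observed only on $X_1$, with missing entries marked as NaN. The block sizes, sample sizes, and covariance matrix are arbitrary choices made only for illustration, consistent with the missing-completely-at-random assumption discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2, p3 = 2, 1, 1
n1, n2, n3 = 30, 15, 10
d, N = p1 + p2 + p3, n1 + n2 + n3

A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)          # a positive definite covariance matrix
mu = np.zeros(d)

data = rng.multivariate_normal(mu, Sigma, size=N)
data[n1:, p1 + p2:] = np.nan         # X3 missing after the first n1 rows
data[n1 + n2:, p1:] = np.nan         # X2 and X3 missing after the first n1 + n2 rows

# Per-column missing counts: 0 for X1, n3 for X2, n2 + n3 for X3.
print(np.isnan(data).sum(axis=0))
```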

Chapter 2

Preliminaries

2.1 Some matrix algebra

Let $X$ be a $p \times q$ matrix and let $\mathrm{vec}(X)$ denote the $pq \times 1$ column vector formed by stacking the columns of $X$. Let $C > 0$ and $D > 0$ be $p \times p$ and $q \times q$ matrices, respectively, where $> 0$ denotes that the matrices are positive definite (and symmetric). We denote the inverse, trace, and determinant of $C$ by $C^{-1}$, $\mathrm{tr}(C)$, and $|C|$, respectively. Also, we denote the Kronecker product of $C$ and $D$ by $C \otimes D$. Muirhead [25] provides a number of useful properties of these matrix operations, and we collect together some of those properties in the following proposition.

Proposition 2.1.1.
(i) $(C \otimes D)' = C' \otimes D'$, $\mathrm{tr}(C \otimes D) = (\mathrm{tr}\,C)(\mathrm{tr}\,D)$, $(C \otimes D)^{-1} = C^{-1} \otimes D^{-1}$, and $|C \otimes D| = |C|^q\,|D|^p$.
(ii) If $A$ is $m \times p$ and $B$ is $r \times q$, then $(A \otimes B)(C \otimes D) = AC \otimes BD$.
(iii) If $A$ is $m \times p$ and $B$ is $q \times m$, then the following equalities hold:
$$\mathrm{vec}(BAX) = (X' \otimes B)\,\mathrm{vec}(A),$$
$$\mathrm{tr}(BAX) = (\mathrm{vec}(B'))'(I \otimes A)\,\mathrm{vec}(X),$$
$$\mathrm{tr}(AX'CXB) = (\mathrm{vec}(X))'\big((BA)' \otimes C\big)\,\mathrm{vec}(X).$$

(iv) Let $A$ be $p \times q$ and $B$ be $q \times p$, and let $P = C + ADB$. The following expression for $P^{-1}$ is known as Woodbury's formula:
$$P^{-1} = C^{-1} - C^{-1}AD(D + DBC^{-1}AD)^{-1}DBC^{-1}. \qquad (2.1.1)$$
(v) Let $\lambda_1, \ldots, \lambda_p$ be the eigenvalues of $C$. Then there exists an orthogonal $p \times p$ matrix $H$ such that
$$H'CH = \mathrm{diag}(\lambda_1, \ldots, \lambda_p). \qquad (2.1.2)$$

Let the $(p+q) \times (p+q)$ matrix $M > 0$ be partitioned into $p$ and $q$ rows and columns, i.e.,
$$M = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix},$$
where $M_{11}$ is $p \times p$, $M_{12} = M_{21}'$ is $p \times q$, and $M_{22}$ is $q \times q$. A large portion of our research involves the partitioning of data into blocks with similar patterns of missingness. We therefore make much use of the well-known Schur complement, that is, $M_{22 \cdot 1} = M_{22} - M_{21}M_{11}^{-1}M_{12}$. There are a number of results involving Schur complements that we will need, and so we list them in the following proposition; see Anderson [2] and Muirhead [25].

Proposition 2.1.2. Partition the positive definite matrix $M$ into $p$ and $q$ rows and columns, as above. Then
(i) The partial Iwasawa coordinates of $M$ are $\{M_{11}, M_{11}^{-1}M_{12}, M_{22 \cdot 1}\}$, and
$$M = \begin{pmatrix} I_p & 0 \\ M_{21}M_{11}^{-1} & I_q \end{pmatrix} \begin{pmatrix} M_{11} & 0 \\ 0 & M_{22 \cdot 1} \end{pmatrix} \begin{pmatrix} I_p & M_{11}^{-1}M_{12} \\ 0 & I_q \end{pmatrix}. \qquad (2.1.3)$$

Further,
$$M^{-1} = \begin{pmatrix} I_p & -M_{11}^{-1}M_{12} \\ 0 & I_q \end{pmatrix} \begin{pmatrix} M_{11}^{-1} & 0 \\ 0 & M_{22 \cdot 1}^{-1} \end{pmatrix} \begin{pmatrix} I_p & 0 \\ -M_{21}M_{11}^{-1} & I_q \end{pmatrix}.$$
(ii) Let $x = (x_1', x_2')'$, where $x_1 \in \mathbb{R}^p$, $x_2 \in \mathbb{R}^q$. Then
$$x'M^{-1}x = (x_1 - M_{12}M_{22}^{-1}x_2)'M_{11 \cdot 2}^{-1}(x_1 - M_{12}M_{22}^{-1}x_2) + x_2'M_{22}^{-1}x_2. \qquad (2.1.4)$$
(iii) Let
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N_{p+q}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right);$$
then
$$Y \mid X \sim N_q\big(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X - \mu_1),\; \Sigma_{22 \cdot 1}\big). \qquad (2.1.5)$$

Finally, we define $e_1 = (1, 0, \ldots, 0)'$ to be the vector whose first element is one, followed by zeros, of an arbitrary length that will be obvious from the context of the problem. Similarly, we define $e_2 = (0, 1, 0, \ldots, 0)'$.
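The matrix identities collected above are easy to spot-check numerically. The following sketch verifies the relation $\mathrm{vec}(BAX) = (X' \otimes B)\,\mathrm{vec}(A)$ from Proposition 2.1.1(iii) and Woodbury's formula (2.1.1) on randomly generated matrices; the dimensions are arbitrary choices and the snippet is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# vec/Kronecker identity: vec(BAX) = (X' kron B) vec(A), with column-stacking vec.
m, p, q = 3, 4, 2
A = rng.standard_normal((m, p))
B = rng.standard_normal((q, m))
X = rng.standard_normal((p, q))
vec = lambda M: M.reshape(-1, order="F")
print(np.allclose(vec(B @ A @ X), np.kron(X.T, B) @ vec(A)))   # True

# Woodbury's formula (2.1.1):
# (C + A D B)^{-1} = C^{-1} - C^{-1} A D (D + D B C^{-1} A D)^{-1} D B C^{-1}.
pw, qw = 4, 2
Cm = np.eye(pw) + rng.standard_normal((pw, pw)) @ rng.standard_normal((pw, pw)).T
Dm = np.eye(qw) + rng.standard_normal((qw, qw)) @ rng.standard_normal((qw, qw)).T
Am = rng.standard_normal((pw, qw))
Bm = rng.standard_normal((qw, pw))
Ci = np.linalg.inv(Cm)
lhs = np.linalg.inv(Cm + Am @ Dm @ Bm)
rhs = Ci - Ci @ Am @ Dm @ np.linalg.inv(Dm + Dm @ Bm @ Ci @ Am @ Dm) @ Dm @ Bm @ Ci
print(np.allclose(lhs, rhs))   # True
```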

2.2 Some multivariate distributions

2.2.1 The matrix normal distribution

Let $M$ be a $p \times q$ matrix, and let $C$ and $D$ be $p \times p$ and $q \times q$ positive definite matrices, respectively. A $p \times q$ random matrix $W_{12}$ has a matrix normal distribution with mean $M$ and covariance parameter $C \otimes D$, denoted $W_{12} \sim N(M, C \otimes D)$, if the probability density function of $W_{12}$ is
$$(2\pi)^{-pq/2}\,|C|^{-q/2}\,|D|^{-p/2}\exp\left[-\tfrac{1}{2}\,\mathrm{tr}\big(C^{-1}(W_{12}-M)D^{-1}(W_{12}-M)'\big)\right], \qquad (2.2.1)$$
$W_{12} \in \mathbb{R}^{p \times q}$. Then $W_{12} \sim N(M, C \otimes D)$ is equivalent to stating that $\mathrm{vec}(W_{12}) \sim N(\mathrm{vec}(M), C \otimes D)$, the multivariate normal distribution discussed in (1.0.1); see Muirhead [25, p. 79].

2.2.2 The Wishart distribution

A $d \times d$ random matrix $W$ has a Wishart distribution with degrees of freedom $a > d - 1$ and covariance matrix $\Lambda > 0$, denoted $W \sim W_d(a, \Lambda)$, if its probability density function is
$$\frac{1}{2^{ad/2}\,|\Lambda|^{a/2}\,\Gamma_d(a/2)}\,|W|^{(a-d-1)/2}\exp\left(-\tfrac{1}{2}\,\mathrm{tr}\,\Lambda^{-1}W\right), \qquad (2.2.2)$$
where $W > 0$, $\mathrm{Re}(a) > \tfrac{1}{2}(d-1)$, and
$$\Gamma_d(a) = \pi^{d(d-1)/4}\prod_{j=1}^{d}\Gamma\left(a - \tfrac{1}{2}(j-1)\right) \qquad (2.2.3)$$
is the multivariate gamma function [2], [25]. The Wishart distribution is also defined when $W$ is singular, in which case the density function does not exist.

Let $\Sigma > 0$ and $T$ be $d \times d$ matrices and let $Z \sim N(0, I_n \otimes \Sigma)$. If $W = Z'Z$, then $W \sim W_d(n, \Sigma)$ and has characteristic function
$$E[\exp(i\,\mathrm{tr}(TW))] = |I_d - 2iT\Sigma|^{-n/2}. \qquad (2.2.4)$$

2.2.3 The multivariate beta distribution

A $d \times d$ random matrix $L$ has a multivariate beta distribution with degrees of freedom $(a, b)$, where $a > d - 1$ and $b > d - 1$, denoted $L \sim \mathrm{Beta}_d(a/2, b/2)$, if its probability density function is
$$\frac{\Gamma_d((a+b)/2)}{\Gamma_d(a/2)\,\Gamma_d(b/2)}\,|L|^{(a-d-1)/2}\,|I - L|^{(b-d-1)/2}, \qquad (2.2.5)$$
$L > 0$, $I - L > 0$.

2.2.4 The noncentral $\chi^2$ distribution

Let $Z \sim N(\mu, I_d)$; then $v = Z'Z \sim \chi^2_d(\mu'\mu)$ has a noncentral chi-square distribution with $d$ degrees of freedom and noncentrality parameter $\tau^2 = \mu'\mu$. It is well known [25] that the characteristic function of $v$ is
$$E[\exp(itv)] = (1 - 2it)^{-d/2}\exp\left[\frac{it\tau^2}{1 - 2it}\right]. \qquad (2.2.6)$$
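The Wishart and noncentral chi-square facts quoted in this section can be checked by simulation. Assuming NumPy and SciPy are available, the sketch below draws $W = Z'Z$ with the rows of $Z$ i.i.d. $N_d(0, \Sigma)$ and compares the empirical mean of $W$ with $n\Sigma$, and then compares the empirical distribution of $Z'Z$ for $Z \sim N(\mu, I_d)$ with the $\chi^2_d(\mu'\mu)$ distribution. All sizes and parameters are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, n, reps = 3, 8, 20000

A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)
L = np.linalg.cholesky(Sigma)

# Empirical mean of W = Z'Z should be close to n * Sigma.
W_sum = np.zeros((d, d))
for _ in range(reps):
    Z = rng.standard_normal((n, d)) @ L.T
    W_sum += Z.T @ Z
print(np.round(W_sum / reps - n * Sigma, 2))   # entries near 0

# Empirical CDF of Z'Z for Z ~ N(mu, I_d) versus the noncentral chi-square CDF.
mu = np.array([1.0, -0.5, 2.0])
v = ((rng.standard_normal((reps, d)) + mu) ** 2).sum(axis=1)
grid = np.linspace(0.0, 30.0, 7)
print(np.round((v[:, None] <= grid).mean(axis=0) -
               stats.ncx2.cdf(grid, df=d, nc=mu @ mu), 3))   # entries near 0
```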

2.3 Some properties of these distributions

We begin by stating a result on the characteristic function of a quadratic form in multivariate normal variables. Results of this type have been stated in various forms in the literature; notably, they can be deduced from a result of Khatri [18], p. 446.

Lemma 2.3.1 (Khatri [18]). Let $C$ be a real, symmetric $p \times p$ matrix, $t \in \mathbb{R}$, $v \in \mathbb{R}^p$, and $Z \sim N_p(0, \Sigma)$. Then
$$E\,e^{it(Z'CZ + v'Z)} = |I_p - 2itC\Sigma|^{-1/2}\exp\left(-\tfrac{1}{2}t^2\,v'\Sigma(I_p - 2itC\Sigma)^{-1}v\right). \qquad (2.3.1)$$
Moreover, (2.3.1) remains valid if $C$ is a complex symmetric matrix whose imaginary part is positive definite and $v$ is a complex vector.

The following result extends Lemma 2.2 of Chang and Richards [6].

Lemma 2.3.2. Let $\Lambda$ be a $q \times q$ positive definite matrix and $U$ be a $p \times p$ positive semi-definite matrix. If $B_{12} \sim N(0, C \otimes D)$, then
$$E\exp\left(-\mathrm{tr}\,U B_{12} D^{-1}\Lambda D^{-1} B_{12}'\right) = \big|I_{pq} + 2\,C^{1/2}UC^{1/2} \otimes D^{-1/2}\Lambda D^{-1/2}\big|^{-1/2}. \qquad (2.3.2)$$
This result remains valid if $U$ is a symmetric complex matrix with the real part of $U$ positive definite.

Proof. We attribute the following proof of (2.3.2) to an anonymous referee of Chang and Richards [6]. First, recall that if $X \sim N_d(0, I_d)$, then $XX' \sim W_d(1, I_d)$. Therefore, for any positive definite matrix $A$,
$$E\exp(-tX'AX) = E\exp(-t\,\mathrm{tr}\,AXX') = |I + 2tA|^{-1/2}, \qquad (2.3.3)$$

22 5 for t > 0. Define K = D /2 B 2 C /2, φ = D /2 ΛD /2, and ψ = C /2 UC /2. By Proposition 2.. (iii, vec (K = (C /2 D /2 vec(b. Because 2 vec(b 2 N(0, C D, it follows that vec(k N(0, (C /2 D /2 (C D(C /2 D /2. By Proposition 2.. (ii, the covariance matrix of vec(k equals (C /2 D /2 (C D(C /2 D /2 = C /2 CC /2 D /2 DD /2 = I p I q = I pq ; hence, vec(k N pq (0, I pq. Further, by Proposition 2.. (iii, (vec K (ψ φ(vec K (vec K vec(φkψ = tr (K φkψ = tr (ψk φk, and from the definitions of ψ, K, and φ, we have ψk φk = UB 2 D ΛD B 2. Because vec(k N pq (0, I pq, the moment-generating function stated above, (2.3.3, with t = and A = ψ φ, yields the desired result. Chang and Richards [6] and Kanda and Fujikoshi [7] gathered together a collection of properties of the Wishart distribution that we will also need here, all of which are available from Anderson [2], Eaton [9], or Muirhead [25]. Proposition Suppose that W W d (a, Λ, and W = W W 2 and W 2 W 22 Λ Λ = Λ 2 have been partitioned similarly to the matrix M in Section (2.. Λ 2 Λ 22 Then,

(i) $W_{22 \cdot 1}$ and $\{W_{12}, W_{11}\}$ are mutually independent, and $W_{22 \cdot 1} \sim W_q(a - p, \Lambda_{22 \cdot 1})$.
(ii) $W_{21} \mid W_{11} \sim N(\Lambda_{21}\Lambda_{11}^{-1}W_{11},\; \Lambda_{22 \cdot 1} \otimes W_{11})$.
(iii) If $\Lambda_{12} = 0$, then $W_{22 \cdot 1}$, $W_{11}$, and $W_{21}W_{11}^{-1}W_{12}$ are mutually independent. Moreover, $W_{21}W_{11}^{-1}W_{12} \sim W_q(p, \Lambda_{22})$.
(iv) For $k \le d$, if $M$ is $k \times d$ of rank $k$, then $(MW^{-1}M')^{-1} \sim W_k(a - d + k, (M\Lambda^{-1}M')^{-1})$. In particular, if $Y$ is a $d \times 1$ random vector which is independent of $W$ and satisfies $P(Y = 0) = 0$, then $Y$ is independent of $Y'\Lambda^{-1}Y / Y'W^{-1}Y \sim \chi^2_{a-d+1}$.
(v) Let $G \sim W_d(a, \Sigma)$ and $H \sim W_d(b, \Sigma)$, where $G$ and $H$ are independent. Then $L = (G + H)^{-1/2}G(G + H)^{-1/2} \sim \mathrm{Beta}_d(a/2, b/2)$, and $L$ is independent of $G + H$.

Finally, in deriving any stochastic representation, we will use the standard notation $\stackrel{\mathcal{L}}{=}$ for "equal in distribution," $\ge_{\mathcal{L}}$ for "stochastically greater," and $\le_{\mathcal{L}}$ for "stochastically smaller." That is to say, if $X$ and $Y$ are random entities, then $X \stackrel{\mathcal{L}}{=} Y$ signifies that $X$ and $Y$ have the same probability distribution; also, if $X$ and $Y$ are scalar-valued random variables, then $X \ge_{\mathcal{L}} Y$ signifies that $P(X \ge t) \ge P(Y \ge t)$ for all $t \in \mathbb{R}$, and $X \le_{\mathcal{L}} Y$ whenever $Y \ge_{\mathcal{L}} X$.
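Proposition 2.3.3(iv) is used repeatedly in Chapter 3 (it produces the chi-square variables $Q_1$ and $Q_3$ there), so a small Monte Carlo check of it may be helpful. The sketch below, with hypothetical dimensions and using SciPy for the reference distribution, compares the empirical distribution of $Y'\Lambda^{-1}Y / Y'W^{-1}Y$ with the $\chi^2_{a-d+1}$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, a, reps = 4, 12, 20000

G = rng.standard_normal((d, d))
Lam = G @ G.T + np.eye(d)
L = np.linalg.cholesky(Lam)
Y = rng.standard_normal(d)                   # a fixed nonzero vector
num = Y @ np.linalg.solve(Lam, Y)            # Y' Lambda^{-1} Y

ratios = np.empty(reps)
for i in range(reps):
    Z = rng.standard_normal((a, d)) @ L.T    # rows i.i.d. N_d(0, Lambda)
    W = Z.T @ Z                              # W ~ W_d(a, Lambda)
    ratios[i] = num / (Y @ np.linalg.solve(W, Y))

grid = np.linspace(1.0, 20.0, 5)
print(np.round((ratios[:, None] <= grid).mean(axis=0) -
               stats.chi2.cdf(grid, df=a - d + 1), 3))   # entries near 0
```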

Chapter 3

Two-step Monotone Incomplete Multivariate Normal Data

This chapter begins with a thorough description of our notation and the maximum likelihood estimators for two-step monotone incomplete data. We then provide an alternative derivation of the exact distribution of $\hat{\mu}$, the maximum likelihood estimator of $\mu$, first derived by Chang and Richards [6]. In this chapter we will also derive a stochastic representation for the exact distribution of a generalization of Hotelling's $T^2$-statistic. We will then derive upper and lower bounds for the exact distribution of the $T^2$-statistic. As a consequence, we obtain exact ellipsoidal confidence regions for $\mu$. We also apply the $T^2$-statistic to derive simultaneous confidence intervals for linear functions of $\mu$, and we apply these results to the Pennsylvania cholesterol data. We complete this chapter by studying prediction regions for complete observations.

3.1 Notation and maximum likelihood estimators

Let $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$. In the case of two-step monotone incomplete data, we suppose that the data are $N$ mutually independent observations of the form
$$\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix}, \begin{pmatrix} X_2 \\ Y_2 \end{pmatrix}, \ldots, \begin{pmatrix} X_n \\ Y_n \end{pmatrix}, \; X_{n+1}, X_{n+2}, \ldots, X_N, \qquad (3.1.1)$$

where $(X_j', Y_j')'$, $j = 1, \ldots, n$, are observations from $N_{p+q}(\mu, \Sigma)$, and the incomplete data $X_j$, $j = n+1, \ldots, N$, are observations on the first $p$ characteristics of the same population. One additional assumption, necessary to guarantee that all means and variances are finite and that all integrals encountered later are absolutely convergent, is that $n > p + 2$ [6]. Data of the form (3.1.1) have been widely studied; cf. Anderson [1], Bhargava [4], [5], Morrison [24], Eaton and Kariya [10], Fujisawa [12], Hao and Krishnamoorthy [14], Kanda and Fujikoshi [17], and, most recently, Chang and Richards [6], [7].

Define the sample mean vectors
$$\bar{X}_1 = \frac{1}{n}\sum_{j=1}^{n} X_j, \quad \bar{X}_2 = \frac{1}{N-n}\sum_{j=n+1}^{N} X_j, \quad \bar{X} = \frac{1}{N}\sum_{j=1}^{N} X_j, \quad \bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j, \qquad (3.1.2)$$
and the corresponding matrices of sums of squares and products
$$A_{11,n} = \sum_{j=1}^{n}(X_j - \bar{X}_1)(X_j - \bar{X}_1)', \qquad A_{12} = A_{21}' = \sum_{j=1}^{n}(X_j - \bar{X}_1)(Y_j - \bar{Y})',$$
$$A_{22} = \sum_{j=1}^{n}(Y_j - \bar{Y})(Y_j - \bar{Y})', \qquad A_{11,N} = \sum_{j=1}^{N}(X_j - \bar{X})(X_j - \bar{X})'. \qquad (3.1.3)$$
In addition, we use the notation $\tau = n/N$ for the proportion of the data which are complete and denote $1 - \tau$ by $\bar{\tau}$, so that $\bar{\tau} = (N - n)/N$ is the proportion of incomplete observations. By Anderson [1] (cf. Morrison [24], Anderson and Olkin [3], Jinadasa and Tracy [15]),

the maximum likelihood estimators of $\mu$ and $\Sigma$ are, respectively,
$$\hat{\mu} = \begin{pmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \end{pmatrix} = \begin{pmatrix} \bar{X} \\ \bar{Y} - \bar{\tau}A_{21}A_{11,n}^{-1}(\bar{X}_1 - \bar{X}_2) \end{pmatrix} \qquad (3.1.4)$$
and
$$\hat{\Sigma} = \begin{pmatrix} \hat{\Sigma}_{11} & \hat{\Sigma}_{12} \\ \hat{\Sigma}_{21} & \hat{\Sigma}_{22} \end{pmatrix} = \begin{pmatrix} \frac{1}{N}A_{11,N} & \frac{1}{N}A_{11,N}A_{11,n}^{-1}A_{12} \\[4pt] \frac{1}{N}A_{21}A_{11,n}^{-1}A_{11,N} & \frac{1}{n}A_{22 \cdot 1,n} + \frac{1}{N}A_{21}A_{11,n}^{-1}A_{11,N}A_{11,n}^{-1}A_{12} \end{pmatrix}, \qquad (3.1.5)$$
where $A_{22 \cdot 1,n} = A_{22} - A_{21}A_{11,n}^{-1}A_{12}$.

3.2 A new derivation of an exact stochastic representation for $\hat{\mu}$

Chang and Richards [6] derived an exact stochastic representation for $\hat{\mu}$ by means of a direct analysis of its characteristic function. In examining ways to extend their methods to three-step data, we discovered an alternative method to derive the exact distribution of $\hat{\mu}$ by means of the distribution of the incomplete data given the complete data, namely $Y$ given $X_1, \ldots, X_N$. Before we delve into the exact distribution, we show that $\hat{\mu}$ and $\hat{\Sigma}$ are equivariant. This can be derived from a general argument given by Davison [8], p. 85; however, we have chosen to provide the explicit details here.
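Before turning to the equivariance argument, the following sketch shows how the estimators (3.1.4) and (3.1.5) can be computed from a two-step monotone sample. The function and variable names are hypothetical, and the toy data are simulated only to illustrate the formulas.

```python
import numpy as np

def mle_two_step(X, Y):
    """Maximum likelihood estimators (mu_hat, Sigma_hat) from (3.1.4)-(3.1.5).
    X is the N x p array of observations on X (all N cases); Y is the n x q
    array observed for the first n cases only.  An illustrative sketch."""
    N, p = X.shape
    n, q = Y.shape
    tau_bar = (N - n) / N

    X1bar, X2bar = X[:n].mean(axis=0), X[n:].mean(axis=0)
    Xbar, Ybar = X.mean(axis=0), Y.mean(axis=0)

    Xc, Yc, XcN = X[:n] - X1bar, Y - Ybar, X - Xbar
    A11n, A12, A22 = Xc.T @ Xc, Xc.T @ Yc, Yc.T @ Yc
    A11N = XcN.T @ XcN

    B = np.linalg.solve(A11n, A12).T              # B = A_{21} A_{11,n}^{-1}
    mu_hat = np.concatenate([Xbar, Ybar - tau_bar * B @ (X1bar - X2bar)])

    S11 = A11N / N
    S21 = B @ A11N / N
    S22 = (A22 - B @ A12) / n + B @ A11N @ B.T / N
    Sigma_hat = np.block([[S11, S21.T], [S21, S22]])
    return mu_hat, Sigma_hat

# Toy example with hypothetical sizes satisfying n > p + 2.
rng = np.random.default_rng(0)
p, q, n, N = 2, 1, 20, 35
G = rng.standard_normal((p + q, p + q))
Z = rng.multivariate_normal(np.zeros(p + q), G @ G.T + np.eye(p + q), size=N)
mu_hat, Sigma_hat = mle_two_step(Z[:, :p], Z[:n, p:])
print(np.round(mu_hat, 3))
```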

27 Proposition Let Λ and Λ 22 be p p and q q positive definite matrices, respectively, Λ 2 be q p, ν R p, ν 2 R q, and Λ = Λ 0 0 Λ 22, C = I p 0, ν = Λ 2 I q ν ν Then the estimators µ and Σ are equivariant under the transformation X j Y j ΛC X j Y j + ν, (3.2. for j =,..., n. For j = n +,..., N, X j Λ X j + ν. Proof. Let X j Y j = ΛC X j Y j + ν = Λ X j + ν, (3.2.2 Λ 22 Λ 2 X j + Λ 22 Y j + ν 2 for j =,..., n and X = Λ j X j + ν, j = n +,..., N. Then X j Y j N p+q µ = µ µ 2, Σ = Σ Σ 2 Σ 2 Σ 22,

28 2 j =,..., n, and X j N p (µ, Σ, j = n +,..., N, where µ = ΛCµ + ν and Σ = ΛCΣC Λ. Define the sample mean vectors X = n Ȳ = n n X, j j= n j= X 2 = N n Y j, X = N N j=n+ X j, N X, (3.2.3 j j= and the corresponding matrices of sums of squares and products A = N (X X (X X n, A =,N j j 2 A = (X X (Y Ȳ, 2 j j A,n = j= n (X X j (X X n j, A = (Y Ȳ (Y Ȳ 22 j j. (3.2.4 j= j= j= Then, the maximum likelihood estimators of µ and Σ are, respectively, µ = X Ȳ τa A ( X X 2,n 2 (3.2.5 and Σ = N A,N N A A 2,n A,N N A A,N,n A 2 n A + 22,n N A A 2,n A A,N,n A 2. (3.2.6

29 Our goal is to show that µ = ΛC µ + ν and that Σ = ΛC ΣC Λ. As a consequence of , we have the following relations: 22 X = Λ X + ν, X 2 = Λ X 2 + ν, Ȳ = Λ 22 (Ȳ + Λ 2 X + ν 2, X = Λ X + ν, and A,N = Λ A,N Λ, A,n = Λ A,n Λ, n A = 2 A = Λ 2 (X j X [ ( Λ 22 Λ2 (X j X + Y j Ȳ ] j= = Λ (A 2 + A,n Λ 2 Λ 22, A = Λ [ n (Λ 2 (X j X + Y j Ȳ (Λ 2 (X j X + Y j Ȳ ] Λ22 j= = Λ 22 [ Λ2 A,n Λ 2 + Λ 2 A 2 + A 2 Λ 2 + A 22 ] Λ22. We may write both µ and Σ in terms of the original means and matrices of sums of squares and products. It is straightforward that µ = τ X + τ X 2 = τλ X + ν + τ(λ X2 + ν = Λ µ + ν.

30 23 The maximum likelihood estimator of µ 2 is µ 2 = Λ 22Ȳ + ν 2 + Λ 22 Λ 2 X τλ 22 (A 2 + Λ 2 A,n A,n ( X X 2 = Λ 22 Ȳ + ν 2 + Λ 22 Λ 2 (τ X + τ X 2 τλ 22 A 2 A,n ( X X 2. (3.2.7 Because τ X + τ X 2 = X, it follows that µ 2 = Λ 22 Λ 2 X + Λ 22 (Ȳ + ν 2 τa 2 A,n ( X X 2 + ν 2 = Λ 22 (Λ 2 µ + µ 2 + ν 2. (3.2.8 Therefore µ µ 2 = Λ µ + ν = ΛC µ + ν, (3.2.9 Λ 22 (Λ 2 µ + µ 2 + ν 2 µ 2 so we have proved that µ is equivariant. The maximum likelihood estimators of Σ, Σ 2, and Σ 22 are, respectively, Σ = N Λ A,N Λ, (3.2.0 Σ 2 = Σ 2 = N Λ A,N A,n A 2 Λ 22 + N Λ A,N Λ 2 Λ 22, (3.2.

31 24 and Σ = 22 n Λ [ 22 Λ2 A,n Λ 2 + Λ 2 A 2 + A 2 Λ 2 + A 22 (A 2 + Λ 2 A,n A,n (A 2 + A,n Λ 2 ] Λ 22 + N Λ 22 (A 2 + Λ 2 A,n A,n A,N A,n (A 2 + A,n Λ 2 Λ 22 = n Λ 22 A 22,n Λ 22 + N Λ 22 [ Λ2 A,N Λ 2 + A 2 A,n A,N Λ 2 + Λ 2 A,N A A,n 2 + A 2 A A,n,N A A ],n 2 Λ22. (3.2.2 To establish the equivariance of Σ, let us evaluate ΛC ΣC Λ. To that end, ΛC ΣC Λ Λ = 0 Λ 22 Λ 2 Λ 22 N A,N N A 2 A A,n,N Λ Λ Λ Λ 22 N A,N A,n A 2 n A 22,n + N A 2 A,n A,N A,n A 2 Λ = Σ Λ Λ ( Σ Λ 2 + Σ 2 Λ 22 Λ 22 ( Σ 2 + Λ 2 Σ Λ Λ 22 (Λ 2 Σ Λ 2 + Σ 2 Λ 2 + Λ 2 Σ2 + Σ 22 Λ 22 (3.2.3 By straightforward matrix multiplication, it follows from (3.2.0, (3.2., and (3.2.2 that ΛC ΣC Λ = Σ. Therefore, Σ is also equivariant under the transformation (3.2.2.

As a consequence of the equivariance of $\hat{\Sigma}$, we obtain the following result.

Corollary. The estimated covariance matrix of $\hat{\mu}$ is equivariant under the transformation (3.2.2).

Proof. Because $\hat{\Sigma}$ is equivariant under the transformation (3.2.2), it follows that
$$\widehat{\mathrm{Cov}}(\hat{\mu}^*) = \frac{1}{N}\hat{\Sigma}^* + \frac{\gamma - 1}{N}\begin{pmatrix} 0 & 0 \\ 0 & \hat{\Sigma}^*_{22 \cdot 1} \end{pmatrix} = \frac{1}{N}\Lambda C\hat{\Sigma}C'\Lambda + \frac{\gamma - 1}{N}\begin{pmatrix} 0 & 0 \\ 0 & \Lambda_{22}\hat{\Sigma}_{22 \cdot 1}\Lambda_{22}' \end{pmatrix} = \Lambda C\,\widehat{\mathrm{Cov}}(\hat{\mu})\,C'\Lambda.$$
Therefore $\widehat{\mathrm{Cov}}(\hat{\mu})$ is equivariant.

By taking
$$\Lambda_{11} = \Sigma_{11}^{-1/2}, \qquad \Lambda_{22} = \Sigma_{22 \cdot 1}^{-1/2}, \qquad \Lambda_{21} = -\Sigma_{21}\Sigma_{11}^{-1}, \qquad (3.2.14)$$
then, under the transformation (3.2.2), we obtain
$$\mathrm{Cov}\begin{pmatrix} X^* \\ Y^* \end{pmatrix} = \Lambda C\Sigma C'\Lambda = I_{p+q}. \qquad (3.2.15)$$
Therefore, in analyzing the distribution of $\hat{\mu}$, we may assume, without loss of generality, that the population covariance matrix is $I_{p+q}$. Furthermore, by choosing $\nu = -\Lambda C\mu$, we may also assume, without loss of generality, that $\mu = 0$.
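The equivariance just established is easy to confirm numerically: transforming a two-step sample as in (3.2.1) and recomputing the maximum likelihood estimator should reproduce $\Lambda C\hat{\mu} + \nu$ exactly. The sketch below performs this check for $\hat{\mu}$ with arbitrary (hypothetical) choices of $\Lambda_{11}$, $\Lambda_{22}$, $\Lambda_{21}$, $\nu_1$, and $\nu_2$; it is an illustration, not part of the dissertation's argument.

```python
import numpy as np

def mu_hat(X, Y):
    """MLE of mu for two-step monotone data, per (3.1.4); X is N x p, Y is n x q."""
    N, n = X.shape[0], Y.shape[0]
    tau_bar = (N - n) / N
    X1b, X2b, Xb, Yb = X[:n].mean(0), X[n:].mean(0), X.mean(0), Y.mean(0)
    Xc, Yc = X[:n] - X1b, Y - Yb
    A11n, A21 = Xc.T @ Xc, Yc.T @ Xc
    return np.concatenate([Xb, Yb - tau_bar * (A21 @ np.linalg.solve(A11n, X1b - X2b))])

rng = np.random.default_rng(1)
p, q, n, N = 2, 2, 15, 25
G = rng.standard_normal((p + q, p + q))
Z = rng.multivariate_normal(np.zeros(p + q), G @ G.T + np.eye(p + q), size=N)
X, Y = Z[:, :p], Z[:n, p:]

# Hypothetical transformation of the form (3.2.1).
Lam11 = np.array([[1.5, 0.3], [0.3, 0.8]])    # positive definite
Lam22 = np.array([[1.0, -0.2], [-0.2, 0.6]])  # positive definite
Lam21 = np.array([[0.3, -0.2], [0.4, 0.6]])
nu1, nu2 = np.array([1.0, -2.0]), np.array([0.5, 3.0])

Xs = X @ Lam11.T + nu1                        # X_j* = Lam11 X_j + nu1
Ys = (X[:n] @ Lam21.T + Y) @ Lam22.T + nu2    # Y_j* = Lam22 (Lam21 X_j + Y_j) + nu2

Lam = np.block([[Lam11, np.zeros((p, q))], [np.zeros((q, p)), Lam22]])
C = np.block([[np.eye(p), np.zeros((p, q))], [Lam21, np.eye(q)]])
nu = np.concatenate([nu1, nu2])
print(np.allclose(mu_hat(Xs, Ys), Lam @ C @ mu_hat(X, Y) + nu))  # True
```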

We will now provide an alternative proof for the exact stochastic representation of $\hat{\mu}$.

Theorem (Chang and Richards [6]). Let
$$V \sim N_{p+q}\left(0,\; \frac{1}{N}\Sigma + \frac{\bar{\tau}}{n}\begin{pmatrix} 0 & 0 \\ 0 & \Sigma_{22 \cdot 1} \end{pmatrix}\right), \quad Q_1 \sim \chi^2_{n-p}, \quad Q_2 \sim \chi^2_p, \quad V_2 \sim N_q(0, I_q),$$
where $V$, $V_2$, $Q_1$, and $Q_2$ are mutually independent. Then the distribution of $\hat{\mu}$ is given by the exact stochastic representation
$$\hat{\mu} \stackrel{\mathcal{L}}{=} \mu + V + \left(\frac{\bar{\tau}Q_2}{nQ_1}\right)^{1/2}\begin{pmatrix} 0 \\ \Sigma_{22 \cdot 1}^{1/2}V_2 \end{pmatrix}. \qquad (3.2.16)$$

Proof. Assume, without loss of generality, that $\mu = 0$ and $\Sigma = I_{p+q}$. Then, by (2.1.5), it follows that $Y_j \mid X_j \sim N_q(0, I_q)$, and therefore all the $X_j$ and $Y_j$ are mutually independent, normally distributed random vectors. Therefore $\bar{Y} \mid \{X_1, \ldots, X_N\} \sim N_q(0, \frac{1}{n}I_q)$. Conditional on the complete data, $\mathcal{X} = \{X_1, \ldots, X_N\}$, the vector $\hat{\mu}_2$ is a linear combination of the incomplete data, $\mathcal{Y} = (Y_1, \ldots, Y_n)$; hence $\hat{\mu}_2$ is normally distributed. Moreover, because $\hat{\mu}_1 = \bar{X}$, it then follows that $\hat{\mu}_1$ is fixed in the conditional distribution of $\hat{\mu}_2$ given $\mathcal{X}$. We now need to find the conditional expected value and covariance matrix of $\hat{\mu}_2$. Let $c_j = (X_j - \bar{X}_1)'A_{11,n}^{-1}(\bar{X}_1 - \bar{X}_2)$, $j = 1, \ldots, n$; noting that $\sum_{j=1}^{n}c_j = 0$, we may write $\hat{\mu}_2$ as
$$\hat{\mu}_2 = \bar{Y} - \bar{\tau}A_{21}A_{11,n}^{-1}(\bar{X}_1 - \bar{X}_2) = \bar{Y} - \bar{\tau}\sum_{j=1}^{n}c_j(Y_j - \bar{Y}) = \frac{1}{n}\sum_{j=1}^{n}Y_j - \bar{\tau}\sum_{j=1}^{n}c_jY_j.$$

34 Let δ jk be Kronecker s delta, that is δ jk =, j = k and δ jk = 0, j k. Then it is of note that 27 n n n n Cov( Y j, c j Y j = c k E(Y j Y k j= j= = = ( j= k= n j= k= n c k δ jk I q n c j I q = 0. j= Because n j= Y j and n j= c j Y j is zero, it then follows that n j= Y j and n j= c j Y j are jointly normally distributed and their covariance are independent. Therefore, conditional on X, µ 2 is a linear combination of independent normal vectors, hence µ 2 is normally distributed with mean E( µ 2 X = = n E(( n τc j Y j X j= n ( n τc j E(Y j = 0, (3.2.7 j= and covariance matrix Cov( µ 2 X = Cov( n n n Y j + Cov( τ Y j c j j= = n I q + τ 2 n j= k= j= n c j c k δ jk I q = n n I q + τ 2 ( c 2 I j q. j=

35 28 Because n j= c 2 j = ( X X 2 A,n n j= = ( X X 2 A,n ( X X 2, (X j X (X j X A,n ( X X 2 it follows that Cov( µ 2 X = n I q + τ 2 ( X X 2 A,n ( X X 2 I q. (3.2.8 Therefore µ 2 X N q (0, n [ + τ 2 n( X X 2 A,n ( X X 2 ]I q. Observe that µ 2 depends on X only through X X 2 and A,n ; therefore { X X 2, A,n } is sufficient for µ 2. By the independence of the sample mean and sample covariance matrix of a normal random sample, X and A,n are independent and, consequently, {τ X + τ X 2, X X 2 } and A,n are independent. Next, because Cov( X X 2, τ X + τ X 2 = Cov( X, τ X Cov( X 2, τ X 2 = N I p N I p = 0 and ( X X 2, τ X + τ X 2 has a joint multivariate normal distribution, it follows that µ = τ X + τ X 2 is independent of X X 2. Therefore X X 2, A,n, and τ X + τ X 2, and consequently, µ and µ 2, are mutually independent. By Proposition (iv, Q = ( X X 2 ( X X 2 ( X X 2 A,n ( X X 2 χ2 n p,

36 29 and Q is independent of X X 2. Therefore, ( µ 2 {X, Q } N q 0, [ + τ 2 n ( X X 2 ( X X 2 ] I n Q q. (3.2.9 Because X X 2 N p (0, (n (N n I p and n (N n = /n τ, it follows that ( X X 2 ( X X 2 Q 2 /n τ, where Q 2 χ 2. We may now write the conditional p distribution of µ 2 as ( µ 2 {Q, Q 2 } N q (0, n + τ Q 2 I n Q q. By elementary properties of the normal distribution, it follows that µ 2 L = n V 2 + τq2 nq V 2, where V 2 N q (0, I q, V 2 N q (0, I q, and V 2, V 2, Q, and Q 2 are mutually independent. It is straightforward to see that µ = X N p (0, N I p and therefore µ L = V, where V N p (0, N I p. Therefore, a joint stochastic representation for µ and µ 2 is µ µ = = L V + µ 2 τq 2 nq 0 V 2, where V = V V 2 N p+q 0, N I p 0 and V 2 is as defined previously. 0 n I q

37 Our final step is to transform the data back to its original form for general µ and Σ. Recall that by (3.2.4, the transformation to µ = 0 and Σ = I is µ µ 2 = Σ /2 0 0 Σ /2 22 I 0 µ µ. I µ 2 Σ 2 Σ 30 The inverse of the latter transformation is µ = I 0 µ 2 Σ 2 Σ I Σ /2 0 0 Σ /2 22 µ µ 2 + µ; therefore µ = L I 0 Σ 2 Σ I Σ /2 0 0 Σ /2 22 V + τq 2 nq 0 + µ. ( V 2 Because Σ /2 V 22 2 N q (0, Σ 22, and I 0 Σ 2 Σ I Σ /2 0 0 Σ /2 22 V N p+q 0, N Σ + τ n 0 0, 0 Σ 22 we obtain ( One advantage of this proof is that it provides explicit formulas for V, V 2, Q, and Q 2 in terms of the data, whereas Chang and Richards showed only the existence of

38 3 these random variable. Namely, those explicit formulas are: Q = ( X X 2 ( X X 2 ( X X 2 A ( X,n X 2, Q 2 = n τ( X X 2 ( X X 2, V V = = X, V 2 Ȳ n Y j= j (X j X A ( X,n X 2 V 2 = ( X X 2 A ( X,n X An exact stochastic representation for the T 2 -statistic Following Krishnamoorthy and Pannala [20] and Chang and Richards [6], we will study the pivotal quantity, T 2 = ( µ µ Ĉov( µ ( µ µ, (3.3. a generalization of Hotelling s T 2 -statistic in the setting of monotone incomplete data. An F -distribution approximation to the distribution of a statistic similar to (3.3. was given by Krishnamoorthy and Pannala [20]. Chang and Richards [6] obtained upper and lower bounds for its distribution, leading to conservative ellipsoidal confidence regions for µ, and derived the asymptotic distribution of the T 2 -statistic for the cases in which n, N, p, and q satisfy n > p+q for fixed n, or n/n δ (0, ] as n, N. Nevertheless, the exact finite-sample distribution of this statistic was unknown before the work in this dissertation.
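As a point of comparison for the exact results derived in this section, the distribution of the $T^2$-statistic (3.3.1) can also be approximated by brute-force simulation: generate two-step monotone samples, evaluate $T^2$ with $\widehat{\mathrm{Cov}}(\hat{\mu})$ as given in (3.3.2) and (3.3.3) below, and read off empirical quantiles. By the invariance property established later in this section, taking $\mu = 0$ and $\Sigma = I_{p+q}$ entails no loss of generality. The sketch below uses hypothetical sample sizes and is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n, N, reps = 2, 2, 20, 35, 5000
tau_bar = (N - n) / N
gamma = 1 + (n - 2) * (N - n) / (n * (n - p - 2))     # gamma of (3.3.2)

def t2_statistic(Z):
    """T^2 of (3.3.1) for one simulated two-step sample with mu = 0, Sigma = I."""
    X, Y = Z[:, :p], Z[:n, p:]
    X1b, X2b, Xb, Yb = X[:n].mean(0), X[n:].mean(0), X.mean(0), Y.mean(0)
    Xc, Yc, XcN = X[:n] - X1b, Y - Yb, X - Xb
    A11n, A12, A22, A11N = Xc.T @ Xc, Xc.T @ Yc, Yc.T @ Yc, XcN.T @ XcN
    B = np.linalg.solve(A11n, A12).T                  # A_{21} A_{11,n}^{-1}
    mu_hat = np.concatenate([Xb, Yb - tau_bar * B @ (X1b - X2b)])
    S221 = (A22 - B @ A12) / n                        # Sigma_hat_{22.1}
    Sigma_hat = np.block([[A11N / N, (B @ A11N / N).T],
                          [B @ A11N / N, S221 + B @ A11N @ B.T / N]])
    cov_hat = Sigma_hat / N
    cov_hat[p:, p:] += (gamma - 1) / N * S221         # Cov_hat(mu_hat), (3.3.3)
    return mu_hat @ np.linalg.solve(cov_hat, mu_hat)

t2 = np.array([t2_statistic(rng.standard_normal((N, p + q))) for _ in range(reps)])
print(np.quantile(t2, [0.90, 0.95, 0.99]))            # simulated critical values
```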

Our primary motivation for deriving a stochastic representation for the exact distribution of the $T^2$-statistic (3.3.1) is that the resulting ellipsoidal confidence regions for $\mu$ will be less conservative than those previously derived. Let
$$\gamma = 1 + \frac{(n-2)N\bar{\tau}}{n(n-p-2)}. \qquad (3.3.2)$$
As shown by Chang and Richards [6], the maximum likelihood estimator of $\mathrm{Cov}(\hat{\mu})$ is
$$\widehat{\mathrm{Cov}}(\hat{\mu}) = \frac{1}{N}\hat{\Sigma} + \frac{\gamma - 1}{N}\begin{pmatrix} 0 & 0 \\ 0 & \hat{\Sigma}_{22 \cdot 1} \end{pmatrix}, \qquad (3.3.3)$$
where $\hat{\Sigma}$ is as defined in (3.1.5). Following Chang and Richards [7], we decompose $A_{11,N}$ as follows:
$$A_{11,N} = A_{11,n} + B_1 + B_2, \qquad (3.3.4)$$
where
$$B_1 = \sum_{j=n+1}^{N}(X_j - \bar{X}_2)(X_j - \bar{X}_2)', \qquad (3.3.5)$$
$$B_2 = \frac{n(N-n)}{N}(\bar{X}_1 - \bar{X}_2)(\bar{X}_1 - \bar{X}_2)', \qquad (3.3.6)$$
and $A_{11,n} \sim W_p(n-1, \Sigma_{11})$, $B_1 \sim W_p(N-n-1, \Sigma_{11})$, and $B_2 \sim W_p(1, \Sigma_{11})$ are mutually independent Wishart matrices. This decomposition leads to the following result due to Chang and Richards [7].

40 Lemma (Chang and Richards [6] When Σ 2 = 0, the random matrices and vectors A 22,n, A 2 A,n, A,n, Ȳ, X, B, and B 2 are mutually independent. 33 In preparation for our derivation of an exact stochastic representation of the distribution of the T 2 -statistic, we show that, without loss of generality, we may assume that µ = 0 and Σ = I d. Similar to Chang and Richards [6], we begin by writing T 2 = ( µ µ Ĉov( µ ( µ µ as a sum of two terms. Define T 2 = n(ȳ A 2 A X,n µ 2 +A 2 A µ,n A (Ȳ A 22,n 2 A X,n µ 2 +A 2 A µ,n, and (3.3.7 T 2 2 = N( X µ A,N ( X µ = N( X µ (A,n + B + B 2 ( X µ. (3.3.8 Applying the quadratic identity (2..4 with x µ µ and Λ NĈov( µ, we find that N T 2 = ( µ µ ( NĈov( µ ( µ µ = ( µ 2 µ 2 A 2 A ( µ,n µ (NĈov( µ 22 ( µ 2 µ 2 A 2 A ( µ,n µ + ( µ µ Σ ( µ µ.

41 By (3..4, µ 2 A 2 A,n µ = Ȳ A 2 A X,n ; by (3.3.3, NĈov( µ 22 = γ n A 22,n ; and by (3.3.4, A,N = A,n + B + B 2. Therefore 34 N T 2 = γ T + T 2. (3.3.9 Krishnamoorthy and Pannala [20] also decomposed their T 2 -statistic into a corresponding sum T 2 + T 2 and showed that the marginal distribution of each T 2 j does not depend on (µ, Σ. To deduce that the distribution of their T 2 -statistic depends neither on µ or Σ, it would need to be shown that the joint distribution of ( T, T 2 also satisfies that property, a result which appears difficult to establish directly. We provide a proof that uses ideas of Yamada, et al. [32] to show that the distribution of the T 2 -statistic depends neither on µ nor Σ. Proposition The statistics T 2 in (3.3.7, and T 2 2 in (3.3.8 are algebraically invariant under the transformation ( Consequently, the same holds for the T 2 - statistic in Proof. Let X Y = ΛC X + ν = Y Λ X + ν Λ 22 Λ 2 X + Λ 22 Y + ν 2 (3.2., µ and Σ are equivariant under this transformation. Because. By Proposition (T 2 2 ( µ µ Σ ( µ µ = (Λ µ + ν Λ µ ν (Λ Σ Λ (Λ µ + ν Λ µ ν = ( µ µ Σ ( µ µ T 2 2,

42 35 the statistic T 2 2 is invariant under the transformation ( To show that the statistic T 2 is invariant under the transformation (3.2.2, let us analyze each term of (T 2 ( µ 2 µ Σ Σ ( µ 2 2 µ Σ ( µ 22 2 µ Σ Σ ( µ 2 2 µ individually. The vector µ 2 µ 2 transforms to Λ 22 (Λ 2 µ + µ 2 + ν 2 (Λ 22 µ 2 + Λ 22 Λ 2 µ + ν 2 = Λ 22 ( µ 2 µ 2 + Λ 22 Λ 2 ( µ µ. The vector Σ Σ ( µ 2 µ transforms to Λ 22 (Λ 2 Σ + Σ 2 Λ (Λ Σ Λ (Λ µ + ν Λ µ ν = Λ 22 (Λ 2 + Σ 2 Σ ( µ µ. In addition, Σ 22 = Λ 22 (Λ 2 Σ Λ 2 + Σ 2 Λ 2 + Λ 2 Σ2 + Σ 22 Λ 22 Λ 22 (Λ 2 Σ + Σ 2 Λ (Λ Σ Λ Λ ( Σ Λ 2 + Σ 2 Λ 22 = Λ 22 ( Σ 22 Σ Σ 2 Σ 2 Λ 22 = Λ 22 Σ22 Λ 22.

43 36 Consequently, (T 2 equals ( µ 2 µ 2 Σ 2 Σ ( µ µ Λ 22 (Λ 22 Σ22 Λ 22 Λ 22 ( µ 2 µ 2 Σ 2 Σ ( µ µ = ( µ 2 µ 2 Σ Σ ( µ 2 µ Σ ( µ 22 2 µ 2 Σ Σ ( µ 2 µ T 2. Therefore, T 2 and T 2 2 both are invariant and hence, by (3.3.9, T 2 also is invariant. By taking Λ = Σ /2, Λ 22 = Σ /2, and Λ 22 2 = Σ 2 Σ, the covariance matrix of (X, Y under this transformation is ΛCΣC Λ = I p+q. We may then assume that the population covariance matrix is I p+q. Furthermore, by choosing ν = ΛCµ we may assume µ = 0. Hence, in deriving the distribution of the T 2 -statistic, we assume, without loss of generality, that µ = 0 and Σ = I p+q We now derive a stochastic representation for the exact distribution of the T 2 - statistic (3.3.. The proof of this result is lengthy, relying on characteristic functions and repeated applications of the powerful method of orthogonal invariance. The resulting stochastic representation, however, involves only chi-square and Beta random variables, and a 2 2 Wishart matrix, all mutually independent. Thus, the stochastic representation is straightforward to simulate. Theorem Let cos 2 θ Beta ( 2, 2 (p, Q χ 2 p, Q 2 χ2 p, Q 3 χ2 n p q, Q 4 χ 2, W W q 2 (N p, I 2, and β Beta((n p 2/2, (N n /2 be mutually

44 37 independent. Then, T 2 = L NQ ( 4 + Q γq β e W e 3 /2 nq + N nq /2 cos θ + 2 /2 N nq sin θ 2 2 W /2 nq + N nq /2 cos θ 2 /2 N nq sin θ τq + τq 2 2( τq τq 2 /2 cos θ ( + τq + τq 2 2( τq τq 2 /2 cos θ e W e /2 e W nq + 2 N nq /2 cos θ 2. (3.3.0 /2 N nq sin θ 2 Proof. We assume, without loss of generality, that µ = 0 and Σ = I p+q. Recall from ( (3.3.9 that T 2 N = γ T 2 + T 2 2, where T 2 = n(ȳ A 2 A X,n A (Ȳ A 22,n 2 A X,n and T 2 2 = N X (A,n + B + B 2 X. By elementary properties of the multivariate normal distribution, n( Ȳ A 2 A,n X { X, X 2, A 2, A,n, B } N q ( na 2 A,n X, I q, (3.3.

45 and by Proposition 2.3.3(i, A 22,n W q (n p, I q and is independent of {A 2, A,n }. 38 Define Q 3 = n ( Ȳ A 2 A (Ȳ X,n A2 A,n T 2 X, then, by Proposition 2.3.3(iv, Q 3 {A 2, A,n, X } χ 2, and Q n p q 3 is independent of Ȳ A 2 A X,n. Because this distribution does not depend on {A 2, A,n, X }, then Q 3 is also independent of {A 2, A,n, X }. Therefore, T 2 n ( Ȳ A L 2 A (Ȳ X,n A2 A X,n =, Q 3 where Q 3 χ 2 n p q and the numerator and denominator are mutually independent. By (3.3., n ( Ȳ A 2 A (Ȳ X,n A2 A,n 2( X χ n X q A,n A 2 A 2 A,n X, (3.3.2 a noncentral chi-square distribution with q degrees of freedom and noncentrality parameter n X A A,n 2 A 2 A X,n.

46 39 Let t R. By Lemma 3.3., the characteristic function of T 2 /N is E exp[itn T 2 ] [ ( = E exp it n ( Ȳ A γq 2 A (Ȳ X,n A2 A X,n 3 + N X ] (A,n + B + B 2 X [ = E Q3 E X E X2 E B E A2,A exp itn X ( ] A,n + B,n + B 2 X EȲ { X, X 2,A 2,A,n,B } [it exp n ( Ȳ A γq 2 A (Ȳ X,n A2 A 3,n X ] (3.3.3 Applying the formula (2.2.6 for the characteristic function of the noncentral χ 2 distribution to (3.3.2 and inserting the result into (3.3.3 yields [ E exp[itn T 2 ] = E Q3 E X E X2 E B E A2,A exp itn X ( ] A,n,n + B + B 2 X [ ( 2it q/2 itn X A A,n 2 A 2 A exp γq 3 γq 3 2it,n ] X. (3.3.4 By Proposition 2.3.3(ii, A 2 A,n N(0, I q A,n ; therefore (3.3.4 equals E Q3 ( 2it [ itn X ( ] A,n + B + B 2 X q/2e γq X E X2 E B E A,n exp 3 [ itn X A A,n 2 A ] 2 A X,n E A2 A exp. (3.3.5,n γq 3 2it

47 By Lemma 2.3.2, with U = itn/(γq 3 2itI q, B 2 = A 2, D = A,n, C = I q, and Λ = X X, [ itn X A A,n 2 A ] 2 A X,n E A2 A exp,n γq 3 2it = E A2 A,n exp = = = [ itn tr (A 2 A X,n X A A,n 2 γq 3 2it I pq 2itn γq 3 2it I q A /2 X,n X I p 2itn γq 3 2it A /2 X,n X A /2 /2,n A /2 q/2,n ( 2itn q/2 γq 3 2it X A X,n. ] 40 Substituting this result into (3.3.5 yields, E exp[itn T 2 ] = E Q3 ( 2it γq 3 q/2e X E X2 E B E A,n exp ( 2itn γq 3 2it X q/2. A X,n [ itn X ] (A,n + B + B 2 X Because ( 2it ( 2itn γq 3 γq 3 2it X γq 3 2it 2itn X A X A X,n,n =, γq 3

48 4 it follows that [ E exp[itn T 2 ] = E Q3 E X E X2 E B E A,n exp itn X ] (A,n + B + B 2 X ( 2it( + n X A X,n q/2 γq 3 [ = E Q3 E X E X2 E B E A,n exp itn X ] (A,n + B + B 2 X [ ] itq4 E Q4 exp ( + n γq X A X,n, 3 where Q 4 χ 2 q and Q 4 is independent of Q 3, X, X 2, B, A,n. Hence, E exp[itn T 2 ] [ ] itq4 [ = E Q3 E Q4 exp E γq X E X2 E B E A,n exp itn X ] (A,n + B + B 2 X 3 exp [ itnq4 γq 3 X A X,n ]. (3.3.6 By applying the method of orthogonal invariance, we shall simplify the above characteristic function greatly. For fixed Q 3 and Q 4, define the function f( X, X [ 2 = E B E A,n exp itn X ] [ (A,n + B + B 2 itnq4 X exp γq 3 X A X,n ]. We first verify that f( X, X 2 is invariant under the transformation ( X, X 2 (H X, H X 2, where H O(p, the set of orthogonal p p matrices. Recall that B 2 is a function of X and X 2, ( Suppose ( X, X 2 is replaced by (H X, H X 2, then the last

49 42 two exponential terms in (3.3.6 have exponents itn(h(τ X + τ X 2 (A,n + B + n τh( X X 2 ( X X 2 H (H(τ X + τ X 2 and (3.3.7 itnq 4 γq 3 (H X A,n (H X. (3.3.8 Because A,n W p (n, I p, then also HA,n H W p (n, I p and a similar result holds for B. Then the random variable in (3.3.7 is equal in distribution to X H (HA,n H + HB H + n τh( X X 2 ( X X 2 H H X = X H (H(A,n + B + n τ( X X 2 ( X X 2 H H X = X (A,n + B + n τ( X X 2 ( X X 2 X, and, similarly, the random variable in (3.3.8 is equal in distribution to itnq 4 γq 3 (H X (HA,n H (H X = itnq 4 γq 3 X A X,n. Therefore, f( X, X 2 = f(h X, H X 2. Because B 2 = n τ( X X 2 ( X X 2 is of rank, then by Proposition 2.. (v, there exists H O(p such that H( X X 2 ( X X 2 H = X X = X X 2 2 e e, 0 0

50 43 where e = (, 0,..., 0 R p. Therefore, we may replace ( X X 2 ( X X 2 by H X X 2 2 e e H in ( In addition, by replacing ( X, X 2 with ( H X, H X2 then, by orthogonal invariance, (3.3.6 becomes E Q3 E Q4 exp [ itq4 γq 3 ]E X E X2 E B E A,n exp [ itn X (A,n + B + n τ X X ] 2 2 e e X exp [ itnq4 γq 3 X A X,n ]. (3.3.9 We make one last orthogonal transformation. There exists an orthogonal matrix C O(p with first row X / X ; we may construct the remaining rows of C using the Gram-Schmidt orthogonalization process. We transform X to C X = X e and X 2 to αe + βe 2, where e is defined as before, e 2 = (0,, 0,..., 0 R p, and α and β are such that α X = X X 2, α 2 + β 2 = X 2 2. Let θ be the angle between X and X 2 and recall that cos θ = X X 2 / X X 2. Then α = X X 2 X = X 2 cos θ, and β = = ( X 2 2 ( X X 2 2 /2 X 2 ( X 2 X 2 2 ( cos 2 /2 θ X 2 = X2 sin θ. (3.3.20

51 44 Therefore X = τ X + τ X 2 = τ X e + τ X 2 (e cos θ + e 2 sin θ and X X 2 2 = X 2 + X X X 2 cos θ. Because n X N p (0, I p and N n X 2 N p (0, I p, then X and X 2 are orthogonally invariant random vectors. Therefore X / X and X 2 / X 2 are mutually independent and uniformly distributed on S p, the unit sphere in R p. Hence cos θ L = U U 2, where U and U 2 are independent and uniformly distributed on Sp. By Muirhead [25, p. 38], we then have X, X2, and θ are mutually independent, and cos 2 θ Beta(/2, (p /2. By (3.3.9, T 2 = L NQ ( 4 + n X 2 e γq A e,n 3 + N 2 [( τ X + τ X 2 cos θ e + τ X 2 e 2 sin θ ] ( ( A,n + B + n τ X 2 + X X X 2 cos θ e e [( τ X + τ X 2 cos θ e + τ X 2 e 2 sin θ ]. (3.3.2 Because n X N p (0, I p, it follows that Q n X 2 χ 2 p. Similarly, N n X 2 N p (0, I p and therefore Q 2 (N n X 2 2 χ 2 p. In addition, A,n, Q 3, Q 4, X,

52 X 2, θ, and B are mutually independent. Thus, we have mutual independence between Q, Q 2, Q 3, Q 4, θ, A,n, and B. We therefore conclude that 45 T 2 = L NQ ( 4 + Q γq e A e,n 3 [( τq /2 + N + ( A,n + B + [( τq /2 + τq /2 2 cos θ e + /2 τq e 2 sin θ ( τq + τq 2 2( τq τq 2 /2 cos θ τq /2 2 cos θ e + 2 ] e e ] /2 τq e 2 sin θ. ( This representation involves p p Wishart matrices, so it would be nice to reduce the size of any such matrices appearing in the final result. Next, we represent the distribution of T 2 in terms of a 2 2 Wishart matrix. By Proposition (v, L = (A,n + B /2 A,n (A,n + B /2 is independent of P = A,n + B and L Beta p ((n /2, (N n /2. Therefore, we may write ( in terms of L and P as T 2 = L NQ ( 4 + Q γq (P /2 e L (P /2 e 3 [( τq /2 + N + τq /2 2 cos θ e + /2 τq e 2 sin θ ( P + ( τq + τq 2 2( τq τq 2 /2 cos θ e e [( τq /2 + τq /2 2 cos θ e + 2 ] ] /2 τq e 2 sin θ. ( Because the distribution of L is invariant under orthogonal transformations, we may replace L by HLH, where H O(p. We now choose H to be the orthogonal matrix

53 with first row P /2 e / P /2 e and with all the remaining rows of H constructed using the Gram-Schmidt orthogonalization process. Then (P /2 e L P /2 e L = e P e e L e. By Muirhead [25], β = e L e /Beta((n p 2/2, (N n /2. 2 In order to simplify the notation let us also define u = (u, u 2 = N /2 (( τq /2 + /2 /2 τq cos θ, τq sin θ, and v = τq + τq 2 2( τq τq 2 /2 cos θ. Then the representation ( becomes 2 46 T 2 = L NQ ( 4 + Q β e γq P e + (u e + u 2 e 2 ( P + ve e (u e + u 2 e 2. 3 ( Our final step is to specify the distribution of the remaining terms that involve P. The first term only involves e P e and the second term involving P may be simplified by Woodbury s formula, (2.., as follows: (u e + u 2 e 2 ( P + ve e (u e + u 2 e 2 ( = (u e + u 2 e 2 P vp e e P + ve P (u e e + u 2 e 2 = (u e + u 2 e 2 P (u e + u 2 e 2 v + ve P e [(u e + u 2 e 2 P e ] 2.

54 47 Therefore, we derive the joint distribution of e P e, e P e 2, and e P e 2 2, when P W p (N 2, I p. Let M = (e, e 2 ; noting that MM = I 2, it follows by Proposition (iv, that (MP M W 2 (N p, I 2. Let W = MP M e = P e e P e 2. e P e 2 e P e 2 2 Then the stochastic representation, (3.3.24, reduces to T 2 = L NQ ( 4 + Q γq β e W e + u W v u 3 + ve W (e e W u 2, ( where W W 2 (N p, I 2. Remark We may take this one step further and represent the entire stochastic representation in terms of scalar mutually independent random variables. Let w = e W e, w 22 = e W e 2 2, w 2 = e W e 2, and ρ = w 2 / w w 22. We may rewrite Equation ( in terms of scalar random variables: ( T 2 = L t + Q β w + (u e + u 2 e 2 W (u e + u 2 e 2 v(u w + u 2 ρ w w vw ( = t + Q β w + u 2 w + u 2 2 w22 + 2u u 2 ρ w w 22 v(u w + u 2 ρ w w vw.

55 48 By Anderson [2], W = T T, where T is lower-triangular, that is t T = 0, t 2 t 22 and the entries of T are mutually independent, t 2 jj χ2 N p, and t ij N(0,, i j. It follows that W = (T T = t t 22 = t 2 t2 22 t 22 t 2 t t t 2 t t t2 t 2 2 t. ( t 2 t t 2 Therefore the joint distribution of {w, w 22, ρ 2 } is equal to the joint distribution of {(t t2 2 /t2 t2, 22 /t2, 22 t2 2 /(t2 2 +t2 } = {(Q Q 7 /Q 5 Q 6, /Q 6, Q 7 /(Q 6 +Q 7 }, where Q 5 χ 2 N p, Q 6 χ2 N p, and Q 7 χ2. Remark We have provided in the stochastic representation the distribution of cos 2 θ. Because the distribution of cos θ may also be desired, we provide the details here. Recall from Theorem that cos 2 θ Beta ( 2, 2 (p, where cos θ (, and cos θ L = cos θ. Let α = 2, β = 2 (p, X = cos θ, and Y = cos2 θ Beta(α, β. Let t (0,, then, because the distribution of cos θ is symmetric, P (X > t = 2 ( P ( t < X < t = 2 ( P ( X < t = 2 ( P (Y < t2.

Therefore $P(X \le t) = \tfrac{1}{2}(1 + P(Y < t^2))$. Because the probability density function (p.d.f.) is the derivative of the cumulative distribution function, $f_X(t)$, the p.d.f. of $X$, equals $t f_Y(t^2)$, $0 < t < 1$, where $f_Y(t)$ is the p.d.f. of $Y$. Similarly, $f_X(t) = -t f_Y(t^2)$, $-1 < t \le 0$, and therefore $f_X(t) = |t|\,f_Y(t^2)$, $-1 < t < 1$.

3.4 Probability inequalities for the $T^2$-statistic

Because the exact distribution of $T^2$ is complicated, it would be useful to find simpler upper and lower bounds on its distribution. Chang and Richards [6] found upper and lower bounds for the distribution function of the $T^2$-statistic, and Krishnamoorthy and Pannala [20] obtained an approximation to the distribution of their $T^2$-statistic by means of an $F$-distributed statistic. Because our bounds are based on a stochastic representation of the exact distribution of the $T^2$-statistic, it can be expected that our bounds will lead to more precise confidence regions than those in [6] and [20].

Theorem. Let $Q_1 \sim \chi^2_p$, $Q_3 \sim \chi^2_{n-p-q}$, $Q_4 \sim \chi^2_q$, $Q_8 \sim \chi^2_{N-p-1}$, and $\beta \sim \mathrm{Beta}((n-p-2)/2, (N-n)/2)$ be mutually independent. For $t \ge 0$,
$$P(T^2 \ge t) \le P\left(\frac{NQ_4}{\gamma Q_3}\left(1 + \frac{Q_1}{Q_8\beta}\right) \ge t\right). \qquad (3.4.1)$$

Proof. It is straightforward from the stochastic representation derived in Section 3.3 that
$$\hat{\mu}'\,\widehat{\mathrm{Cov}}(\hat{\mu})^{-1}\hat{\mu} \;\stackrel{\mathcal{L}}{\ge}\; \frac{NQ_4}{\gamma Q_3}\left(1 + \frac{Q_1}{\beta}\right)e_1'P^{-1}e_1. \qquad (3.4.2)$$
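The upper bound (3.4.1) is simple enough to evaluate by direct simulation of the scalar random variables appearing in it, which yields conservative critical values for $T^2$. The sketch below does this for hypothetical sample sizes, with the chi-square and Beta degrees of freedom taken as stated in the theorem above; it is an illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n, N, reps = 2, 2, 20, 35, 200000
gamma = 1 + (n - 2) * (N - n) / (n * (n - p - 2))    # gamma of (3.3.2)

Q1 = rng.chisquare(p, reps)
Q3 = rng.chisquare(n - p - q, reps)
Q4 = rng.chisquare(q, reps)
Q8 = rng.chisquare(N - p - 1, reps)
beta = rng.beta((n - p - 2) / 2, (N - n) / 2, reps)

bound = N * Q4 / (gamma * Q3) * (1 + Q1 / (Q8 * beta))
# Quantiles of the bounding variable give conservative critical values for T^2.
print(np.quantile(bound, [0.90, 0.95, 0.99]))
```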


Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

Boolean Inner-Product Spaces and Boolean Matrices

Boolean Inner-Product Spaces and Boolean Matrices Boolean Inner-Product Spaces and Boolean Matrices Stan Gudder Department of Mathematics, University of Denver, Denver CO 80208 Frédéric Latrémolière Department of Mathematics, University of Denver, Denver

More information

Gaussian Models (9/9/13)

Gaussian Models (9/9/13) STA561: Probabilistic machine learning Gaussian Models (9/9/13) Lecturer: Barbara Engelhardt Scribes: Xi He, Jiangwei Pan, Ali Razeen, Animesh Srivastava 1 Multivariate Normal Distribution The multivariate

More information

Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013

Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013 Multivariate Gaussian Distribution Auxiliary notes for Time Series Analysis SF2943 Spring 203 Timo Koski Department of Mathematics KTH Royal Institute of Technology, Stockholm 2 Chapter Gaussian Vectors.

More information

6 Pattern Mixture Models

6 Pattern Mixture Models 6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Expected probabilities of misclassification in linear discriminant analysis based on 2-Step monotone missing samples

Expected probabilities of misclassification in linear discriminant analysis based on 2-Step monotone missing samples Expected probabilities of misclassification in linear discriminant analysis based on 2-Step monotone missing samples Nobumichi Shutoh, Masashi Hyodo and Takashi Seo 2 Department of Mathematics, Graduate

More information

MULTIVARIATE POPULATIONS

MULTIVARIATE POPULATIONS CHAPTER 5 MULTIVARIATE POPULATIONS 5. INTRODUCTION In the following chapters we will be dealing with a variety of problems concerning multivariate populations. The purpose of this chapter is to provide

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

1 Appendix A: Matrix Algebra

1 Appendix A: Matrix Algebra Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

Studentization and Prediction in a Multivariate Normal Setting

Studentization and Prediction in a Multivariate Normal Setting Studentization and Prediction in a Multivariate Normal Setting Morris L. Eaton University of Minnesota School of Statistics 33 Ford Hall 4 Church Street S.E. Minneapolis, MN 55455 USA eaton@stat.umn.edu

More information

ANOVA: Analysis of Variance - Part I

ANOVA: Analysis of Variance - Part I ANOVA: Analysis of Variance - Part I The purpose of these notes is to discuss the theory behind the analysis of variance. It is a summary of the definitions and results presented in class with a few exercises.

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error Journal of Multivariate Analysis 00 (009) 305 3 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Sphericity test in a GMANOVA MANOVA

More information

A Few Special Distributions and Their Properties

A Few Special Distributions and Their Properties A Few Special Distributions and Their Properties Econ 690 Purdue University Justin L. Tobias (Purdue) Distributional Catalog 1 / 20 Special Distributions and Their Associated Properties 1 Uniform Distribution

More information

7 Matrix Operations. 7.0 Matrix Multiplication + 3 = 3 = 4

7 Matrix Operations. 7.0 Matrix Multiplication + 3 = 3 = 4 7 Matrix Operations Copyright 017, Gregory G. Smith 9 October 017 The product of two matrices is a sophisticated operations with a wide range of applications. In this chapter, we defined this binary operation,

More information

3. Probability and Statistics

3. Probability and Statistics FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important

More information

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality. 88 Chapter 5 Distribution Theory In this chapter, we summarize the distributions related to the normal distribution that occur in linear models. Before turning to this general problem that assumes normal

More information

The Multivariate Normal Distribution. In this case according to our theorem

The Multivariate Normal Distribution. In this case according to our theorem The Multivariate Normal Distribution Defn: Z R 1 N(0, 1) iff f Z (z) = 1 2π e z2 /2. Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) T with the Z i independent and each Z i N(0, 1). In this

More information

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in

More information

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem T Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem Toshiki aito a, Tamae Kawasaki b and Takashi Seo b a Department of Applied Mathematics, Graduate School

More information

Recall the convention that, for us, all vectors are column vectors.

Recall the convention that, for us, all vectors are column vectors. Some linear algebra Recall the convention that, for us, all vectors are column vectors. 1. Symmetric matrices Let A be a real matrix. Recall that a complex number λ is an eigenvalue of A if there exists

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control

On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control Takahiro Nishiyama a a Department of Mathematical Information Science, Tokyo University of Science,

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u Interval estimation and hypothesis tests So far our focus has been on estimation of the parameter vector β in the linear model y i = β 1 x 1i + β 2 x 2i +... + β K x Ki + u i = x iβ + u i for i = 1, 2,...,

More information

Multivariate Gaussian Analysis

Multivariate Gaussian Analysis BS2 Statistical Inference, Lecture 7, Hilary Term 2009 February 13, 2009 Marginal and conditional distributions For a positive definite covariance matrix Σ, the multivariate Gaussian distribution has density

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

MULTIVARIATE DISCRETE PHASE-TYPE DISTRIBUTIONS

MULTIVARIATE DISCRETE PHASE-TYPE DISTRIBUTIONS MULTIVARIATE DISCRETE PHASE-TYPE DISTRIBUTIONS By MATTHEW GOFF A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY WASHINGTON STATE UNIVERSITY Department

More information

Multivariate Linear Models

Multivariate Linear Models Multivariate Linear Models Stanley Sawyer Washington University November 7, 2001 1. Introduction. Suppose that we have n observations, each of which has d components. For example, we may have d measurements

More information

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data Yujun Wu, Marc G. Genton, 1 and Leonard A. Stefanski 2 Department of Biostatistics, School of Public Health, University of Medicine

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Haruhiko Ogasawara. This article gives the first half of an expository supplement to Ogasawara (2015).

Haruhiko Ogasawara. This article gives the first half of an expository supplement to Ogasawara (2015). Economic Review (Otaru University of Commerce, Vol.66, No. & 3, 9-58. December, 5. Expository supplement I to the paper Asymptotic expansions for the estimators of Lagrange multipliers and associated parameters

More information

Statistical Inference On the High-dimensional Gaussian Covarianc

Statistical Inference On the High-dimensional Gaussian Covarianc Statistical Inference On the High-dimensional Gaussian Covariance Matrix Department of Mathematical Sciences, Clemson University June 6, 2011 Outline Introduction Problem Setup Statistical Inference High-Dimensional

More information

Some New Properties of Wishart Distribution

Some New Properties of Wishart Distribution Applied Mathematical Sciences, Vol., 008, no. 54, 673-68 Some New Properties of Wishart Distribution Evelina Veleva Rousse University A. Kanchev Department of Numerical Methods and Statistics 8 Studentska

More information

Chapter 5. The multivariate normal distribution. Probability Theory. Linear transformations. The mean vector and the covariance matrix

Chapter 5. The multivariate normal distribution. Probability Theory. Linear transformations. The mean vector and the covariance matrix Probability Theory Linear transformations A transformation is said to be linear if every single function in the transformation is a linear combination. Chapter 5 The multivariate normal distribution When

More information

Inferences about a Mean Vector

Inferences about a Mean Vector Inferences about a Mean Vector Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University

More information

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work Hiroshima Statistical Research Group: Technical Report Approximate interval estimation for PMC for improved linear discriminant rule under high dimensional frame work Masashi Hyodo, Tomohiro Mitani, Tetsuto

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information

4. Distributions of Functions of Random Variables

4. Distributions of Functions of Random Variables 4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Inferences about Parameters of Trivariate Normal Distribution with Missing Data

Inferences about Parameters of Trivariate Normal Distribution with Missing Data Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 7-5-3 Inferences about Parameters of Trivariate Normal Distribution with Missing

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

ON COMBINING CORRELATED ESTIMATORS OF THE COMMON MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION

ON COMBINING CORRELATED ESTIMATORS OF THE COMMON MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION ON COMBINING CORRELATED ESTIMATORS OF THE COMMON MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION K. KRISHNAMOORTHY 1 and YONG LU Department of Mathematics, University of Louisiana at Lafayette Lafayette, LA

More information

= ϕ r cos θ. 0 cos ξ sin ξ and sin ξ cos ξ. sin ξ 0 cos ξ

= ϕ r cos θ. 0 cos ξ sin ξ and sin ξ cos ξ. sin ξ 0 cos ξ 8. The Banach-Tarski paradox May, 2012 The Banach-Tarski paradox is that a unit ball in Euclidean -space can be decomposed into finitely many parts which can then be reassembled to form two unit balls

More information

c 2005 Society for Industrial and Applied Mathematics

c 2005 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. XX, No. X, pp. XX XX c 005 Society for Industrial and Applied Mathematics DISTRIBUTIONS OF THE EXTREME EIGENVALUES OF THE COMPLEX JACOBI RANDOM MATRIX ENSEMBLE PLAMEN KOEV

More information

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests

More information

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics Chapter 6 Order Statistics and Quantiles 61 Extreme Order Statistics Suppose we have a finite sample X 1,, X n Conditional on this sample, we define the values X 1),, X n) to be a permutation of X 1,,

More information

Analysis of variance using orthogonal projections

Analysis of variance using orthogonal projections Analysis of variance using orthogonal projections Rasmus Waagepetersen Abstract The purpose of this note is to show how statistical theory for inference in balanced ANOVA models can be conveniently developed

More information

Mathematical Methods wk 2: Linear Operators

Mathematical Methods wk 2: Linear Operators John Magorrian, magog@thphysoxacuk These are work-in-progress notes for the second-year course on mathematical methods The most up-to-date version is available from http://www-thphysphysicsoxacuk/people/johnmagorrian/mm

More information

Department of Statistics

Department of Statistics Research Report Department of Statistics Research Report Department of Statistics No. 05: Testing in multivariate normal models with block circular covariance structures Yuli Liang Dietrich von Rosen Tatjana

More information

ELEMENTARY LINEAR ALGEBRA

ELEMENTARY LINEAR ALGEBRA ELEMENTARY LINEAR ALGEBRA K R MATTHEWS DEPARTMENT OF MATHEMATICS UNIVERSITY OF QUEENSLAND First Printing, 99 Chapter LINEAR EQUATIONS Introduction to linear equations A linear equation in n unknowns x,

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

An Introduction to Multivariate Statistical Analysis

An Introduction to Multivariate Statistical Analysis An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 4

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 4 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 4 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory April 12, 2012 Andre Tkacenko

More information

Orthogonal decompositions in growth curve models

Orthogonal decompositions in growth curve models ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 4, Orthogonal decompositions in growth curve models Daniel Klein and Ivan Žežula Dedicated to Professor L. Kubáček on the occasion

More information

Supermodular ordering of Poisson arrays

Supermodular ordering of Poisson arrays Supermodular ordering of Poisson arrays Bünyamin Kızıldemir Nicolas Privault Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang Technological University 637371 Singapore

More information

CMPE 58K Bayesian Statistics and Machine Learning Lecture 5

CMPE 58K Bayesian Statistics and Machine Learning Lecture 5 CMPE 58K Bayesian Statistics and Machine Learning Lecture 5 Multivariate distributions: Gaussian, Bernoulli, Probability tables Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey

More information

Estimation of a multivariate normal covariance matrix with staircase pattern data

Estimation of a multivariate normal covariance matrix with staircase pattern data AISM (2007) 59: 211 233 DOI 101007/s10463-006-0044-x Xiaoqian Sun Dongchu Sun Estimation of a multivariate normal covariance matrix with staircase pattern data Received: 20 January 2005 / Revised: 1 November

More information

MULTIVARIATE ANALYSIS OF VARIANCE UNDER MULTIPLICITY José A. Díaz-García. Comunicación Técnica No I-07-13/ (PE/CIMAT)

MULTIVARIATE ANALYSIS OF VARIANCE UNDER MULTIPLICITY José A. Díaz-García. Comunicación Técnica No I-07-13/ (PE/CIMAT) MULTIVARIATE ANALYSIS OF VARIANCE UNDER MULTIPLICITY José A. Díaz-García Comunicación Técnica No I-07-13/11-09-2007 (PE/CIMAT) Multivariate analysis of variance under multiplicity José A. Díaz-García Universidad

More information

Mean Vector Inferences

Mean Vector Inferences Mean Vector Inferences Lecture 5 September 21, 2005 Multivariate Analysis Lecture #5-9/21/2005 Slide 1 of 34 Today s Lecture Inferences about a Mean Vector (Chapter 5). Univariate versions of mean vector

More information

Lecture 15: Multivariate normal distributions

Lecture 15: Multivariate normal distributions Lecture 15: Multivariate normal distributions Normal distributions with singular covariance matrices Consider an n-dimensional X N(µ,Σ) with a positive definite Σ and a fixed k n matrix A that is not of

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

CSL361 Problem set 4: Basic linear algebra

CSL361 Problem set 4: Basic linear algebra CSL361 Problem set 4: Basic linear algebra February 21, 2017 [Note:] If the numerical matrix computations turn out to be tedious, you may use the function rref in Matlab. 1 Row-reduced echelon matrices

More information