THE STATISTICAL ANALYSIS OF MONOTONE INCOMPLETE MULTIVARIATE NORMAL DATA


The Pennsylvania State University
The Graduate School
Department of Statistics

THE STATISTICAL ANALYSIS OF MONOTONE INCOMPLETE MULTIVARIATE NORMAL DATA

A Dissertation in Statistics
by
Megan M. Romer

© 2009 Megan M. Romer

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

August 2009

The dissertation of Megan M. Romer was reviewed and approved* by the following:

Steven F. Arnold
Professor of Statistics
Chair of Committee

Vernon M. Chinchilli
Professor of Statistics
Distinguished Professor and Chair of Public Health Sciences

Diane M. Henderson
Professor of Mathematics

Donald St. P. Richards
Professor of Statistics
Dissertation Adviser

Bruce G. Lindsay
Willaman Professor of Statistics
Head of the Department of Statistics

* Signatures on file in the Graduate School.

Abstract

We consider problems in finite-sample inference with monotone incomplete data drawn from $N_d(\mu, \Sigma)$, a multivariate normal population with mean $\mu$ and covariance matrix $\Sigma$. In the case of two-step, monotone incomplete data, we show that $\hat{\mu}$ and $\hat{\Sigma}$, the maximum likelihood estimators of $\mu$ and $\Sigma$, respectively, are equivariant, and we obtain a new derivation of a stochastic representation for $\hat{\mu}$. Our new derivation allows us to identify explicitly, in terms of the data, the independent random variables that arise in that stochastic representation. Again in the case of two-step, monotone incomplete data, we derive a stochastic representation for the exact distribution of a generalization of Hotelling's $T^2$, and therefore obtain ellipsoidal confidence regions for $\mu$. We then derive probability inequalities for the cumulative distribution function of $T^2$. We apply these results to construct confidence regions for linear combinations of $\mu$, and provide a numerical example in which we analyze a data set consisting of cholesterol measurements on a group of Pennsylvania hospital patients. In the case of three-step, monotone incomplete data, we examine the independence properties and joint distribution of subvectors of $\hat{\mu}$, the maximum likelihood estimator of $\mu$. In our examination of the joint distribution of $\hat{\mu}$, we first establish that $\hat{\mu}$ is equivariant and then identify the distribution of $\hat{\mu}$ up to a certain set of conditioning variables.

Table of Contents

List of Tables
List of Figures
Acknowledgments

Chapter 1. Introduction

Chapter 2. Preliminaries
2.1 Some matrix algebra
2.2 Some multivariate distributions
2.2.1 The matrix normal distribution
2.2.2 The Wishart distribution
2.2.3 The multivariate beta distribution
2.2.4 The noncentral $\chi^2$ distribution
2.3 Some properties of these distributions

Chapter 3. Two-step Monotone Incomplete Multivariate Normal Data
3.1 Notation and maximum likelihood estimators
3.2 A new derivation of an exact stochastic representation for $\hat{\mu}$
3.3 An exact stochastic representation for the $T^2$-statistic
3.4 Probability inequalities for the $T^2$-statistic
3.5 Applications of the $T^2$-statistic
3.5.1 Simultaneous confidence intervals for linear functions of $\mu$
3.5.2 Ellipsoidal prediction regions for future observations
3.5.3 Analysis of the Pennsylvania cholesterol data

Chapter 4. Three-Step Monotone Incomplete Multivariate Normal Data
4.1 Notation and maximum likelihood estimators
4.2 Correlation properties of $\hat{\mu}_1$, $\hat{\mu}_2$, and $\hat{\mu}_3$
4.3 The distribution of $\hat{\mu}$
4.4 The covariance matrix of $\hat{\mu}$

Chapter 5. Concluding Remarks

Bibliography

List of Tables

1.1 The Pennsylvania Cholesterol Data
3.1 ...% Confidence Interval for Mean Cholesterol Levels

List of Figures

3.1 Simulated Cumulative Distribution Function of the $T^2$-statistic: Comparison of Bounds (p = 2; q = ; n = 9; N = )
3.2 Simulated Cumulative Distribution Function of the $T^2$-statistic: Comparison of Bounds (p = 2; q = 2; n = 0; N = )
3.3 Simulated Cumulative Distribution Function of the $T^2$-statistic: Comparison of Bounds (p = 3; q = 3; n = 5; N = )
3.4 Simulated Cumulative Distribution Function of the $T^2$-statistic: Comparison of Bounds (p = 4; q = 4; n = 5; N = )
3.5 Simulated Cumulative Distribution Function of the $T^2$-statistic: Approximation (p = 2; q = 2; n = 0; N = )
3.6 Simulated Cumulative Distribution Function of the $T^2$-statistic: Approximation (p = 4; q = 4; n = 5; N = )
3.7 The Pennsylvania Cholesterol Data: Assessment of Multivariate Normality

Acknowledgments

To Donald, Jenn, Chuck, and Baby: For reasons unique to each of you, I thank you.

Chapter 1

Introduction

In all areas of research, incomplete data sets are ubiquitous. To describe a few of the many situations in which such data occur: in clinical trials, participants often drop out of studies; in engineering research, machines often fail; in scientific laboratories, beakers often break; in astronomy, cloudy weather interferes with data collection. Because of the omnipresence of data sets with missing values, there now exists an extensive literature on the analysis of such data. We refer to Giri [13], Johnson and Wichern [16], Little and Rubin [23], Schafer [30], and Srivastava [31] for treatments of statistical inference with incomplete data and a wide range of applications including astronomy, biology, clinical trials, and sample surveys.

A well-known example of an incomplete data set was provided by Ryan and Joiner [29]. Researchers at a medical center in Pennsylvania monitored the cholesterol levels of 28 patients over a period of 14 days immediately following a heart attack. All 28 patients had their cholesterol levels measured at 2 and at 4 days of follow-up, and 19 patients were measured again on day 14. The data are displayed in Table 1.1.

[Table 1.1. The Pennsylvania Cholesterol Data: cholesterol levels at 2, 4, and 14 days of follow-up; asterisks denote missing day-14 measurements. Table entries not reproduced.]

The cholesterol data provide an example of a two-step monotone incomplete pattern. A random sample $X_\alpha = (X_{1\alpha}, \ldots, X_{d\alpha})'$, $\alpha = 1, \ldots, N$, from a $d$-dimensional multivariate population is said to be monotone incomplete if, whenever $X_{l\alpha}$ is missing, then $X_{j\beta}$ also is missing for all $j > l$ and $\beta > \alpha$. More generally, we can conceive of $k$-step monotone incomplete data sets, as visualized in Figure 1.1.
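As a concrete illustration of this definition, the following Python sketch checks whether a numeric array with missing entries (coded as NaN) exhibits a monotone incomplete pattern. The function name and the toy data are hypothetical and are not taken from the dissertation; the check assumes the variables and observations have already been arranged in the order used above.

```python
import numpy as np

def is_monotone_incomplete(data):
    """Return True if the N x d array `data` (np.nan = missing) has a
    monotone incomplete pattern: once a variable is missing for some
    observation, it and all later variables are missing for that
    observation and for every later observation."""
    missing = np.isnan(data)
    N, d = data.shape
    for alpha in range(N):
        for l in range(d):
            if missing[alpha, l] and not missing[alpha:, l:].all():
                return False
    return True

# Toy two-step example: 3 complete rows, then 2 rows observed on the first
# variable only.
X = np.array([[1.0, 2.0], [0.5, 1.5], [2.0, 3.0],
              [1.2, np.nan], [0.8, np.nan]])
print(is_monotone_incomplete(X))  # True
```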

A random vector $X \in \mathbb{R}^d$ is said to have a multivariate normal distribution with mean $\mu \in \mathbb{R}^d$ and positive definite (symmetric) $d \times d$ covariance matrix $\Sigma$, denoted $X \sim N_d(\mu, \Sigma)$, if its density function is

$$(2\pi)^{-d/2}\,|\Sigma|^{-1/2}\exp\left[-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right], \qquad (1.0.1)$$

$x \in \mathbb{R}^d$. The multivariate normal distribution is undoubtedly the most important distribution in statistics, and many extensive, classical treatments of statistical inference for multivariate normally distributed data are available, e.g., Anderson [2], Eaton [9], and Muirhead [25].

In this dissertation, we consider problems arising in the statistical analysis of monotone incomplete data drawn from a multivariate normal population. In general, $\mu$ and $\Sigma$ are unknown, so it is of interest to perform statistical inference for them. Throughout the dissertation, we focus on maximum likelihood estimators of $\mu$ and $\Sigma$. When the data are monotone incomplete, the maximum likelihood estimators of $\mu$ and $\Sigma$, denoted by $\hat{\mu}$ and $\hat{\Sigma}$, respectively, are well known [1], [3], [15], [24]; however, the exact distributions of $\hat{\mu}$ and $\hat{\Sigma}$ are far more complicated than in the case of complete data sets. In fact, until recently, neither distribution was known in the case of two-step data and, for general $k$, both distributions still remain unknown. Morrison [24] and Kanda and Fujikoshi [17] examined the exact means and variances of $\hat{\mu}$ in great detail. Kanda and Fujikoshi also went on to find asymptotic results.

Statistical inference for $\mu$ was first developed without knowledge of the exact distribution of $\hat{\mu}$. There has been much research in the area of hypothesis testing for $\mu$, e.g.,

Bhargava [4], [5], Eaton and Kariya [10], Giri [13], and Hao and Krishnamoorthy [14]. A drawback of hypothesis testing alone is that each element of $\mu$ must be specified in the null hypothesis; therefore, it is preferable that the results of a hypothesis test be accompanied by a confidence region. Confidence regions for $\mu$ may be based on the likelihood ratio test statistic; however, the resulting regions are not ellipsoidal and, in fact, have rather counterintuitive shapes. For the case in which the data are two-step monotone incomplete, the first derivation of ellipsoidal confidence regions for $\mu$ was obtained by Krishnamoorthy and Pannala [20]. The estimator given in [20] of $\mathrm{Cov}(\hat{\mu})$, the covariance matrix of $\hat{\mu}$, is, in retrospect, not identical to $\widehat{\mathrm{Cov}}(\hat{\mu})$, the maximum likelihood estimator of $\mathrm{Cov}(\hat{\mu})$; however, it is asymptotically equivalent. Therefore we denote their estimated covariance matrix by $\widetilde{\mathrm{Cov}}(\hat{\mu})$. Krishnamoorthy and Pannala [20] obtained ellipsoidal confidence regions for $\mu$ by means of

$$T^2 = (\hat{\mu} - \mu)'\,\widetilde{\mathrm{Cov}}(\hat{\mu})^{-1}(\hat{\mu} - \mu),$$

a generalization of the classical Hotelling's $T^2$-statistic. Krishnamoorthy and Pannala approximated the distribution of $T^2$ with an $F$-distribution, and we shall show that, for small dimensions, their approximation is very close to the exact distribution. Moreover, they also extended this method to general $k$-step monotone incomplete data.

Chang and Richards [6], [7] derived stochastic representations for the exact distributions of $\hat{\mu}$ and $\hat{\Sigma}$ in the case of two-step monotone incomplete data. These stochastic representations are important because the asymptotic distributions hold only for large sample sizes, which often are unavailable, especially for high-dimensional data. Chang

and Richards [6] also derived $\widehat{\mathrm{Cov}}(\hat{\mu})$, the maximum likelihood estimator of $\mathrm{Cov}(\hat{\mu})$, and therefore generalized the classical Hotelling's $T^2$-statistic to

$$T^2 = (\hat{\mu} - \mu)'\,\widehat{\mathrm{Cov}}(\hat{\mu})^{-1}(\hat{\mu} - \mu). \qquad (1.0.2)$$

Chang and Richards based their ellipsoidal confidence regions for $\mu$ on probability inequalities for $T^2$.

In this dissertation, we begin by considering the case of two-step monotone incomplete data, i.e., $k = 2$. Let $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$. Suppose $(X', Y')' \sim N_{p+q}(\mu, \Sigma)$. In the two-step setting, we observe $n$ mutually independent observations on $(X', Y')'$ and an additional $N - n$ independent observations on $X$ only. Therefore, the data are mutually independent vectors of the form

$$\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix}, \begin{pmatrix} X_2 \\ Y_2 \end{pmatrix}, \ldots, \begin{pmatrix} X_n \\ Y_n \end{pmatrix}, \; X_{n+1}, X_{n+2}, \ldots, X_N.$$

In this setting, we provide an alternative derivation of the stochastic representation for $\hat{\mu}$ found by Chang and Richards [6]. We first prove that both $\hat{\mu}$ and $\hat{\Sigma}$ are equivariant, a result that greatly simplifies the examination of both statistics, as we then may assume $\mu = 0$ and $\Sigma = I_{p+q}$, the identity matrix. This new derivation is based on the conditional distribution of the incomplete data given the complete data, and it identifies explicitly the independent random variables that appear in the stochastic representation.

We derive next a stochastic representation for the exact distribution of the $T^2$-statistic (1.0.2). We first prove an invariance property of the $T^2$-statistic, and we rely heavily on that property in our derivation of the exact stochastic representation. This stochastic representation allows us to construct ellipsoidal confidence regions for $\mu$, or to perform related tests of hypotheses, with exact confidence levels or levels of significance, respectively. As our confidence regions are based on the exact distribution of the $T^2$-statistic, it follows that our confidence regions are of exact level and hence are preferable to those of Chang and Richards [6] or Krishnamoorthy and Pannala [20]. From this stochastic representation, we also derive simultaneous confidence intervals for linear combinations of $\mu$, and we apply our confidence intervals and those previously available to the Pennsylvania cholesterol data as a numerical example. Because our stochastic representation is quite complex, we also provide in Chapter 3 probability inequalities for the $T^2$-statistic. The last application we explore is the construction of prediction regions for new observations; although we are unable to find the exact distribution for our proposed statistic, we are confident that $F$-approximations can be obtained.

In the second part of the dissertation, which appears in Chapter 4, we address several research problems related to three-step monotone incomplete data, i.e., $k = 3$. Let $X_1 \in \mathbb{R}^{p_1}$, $X_2 \in \mathbb{R}^{p_2}$, and $X_3 \in \mathbb{R}^{p_3}$. Suppose $(X_1', X_2', X_3')' \sim N_{p_1+p_2+p_3}(\mu, \Sigma)$. In the three-step setting, we observe $n_1$ mutually independent observations on $(X_1', X_2', X_3')'$,

an additional $n_2$ independent observations on $(X_1', X_2')'$, and an additional $n_3$ observations on $X_1$ only. Therefore, the data are mutually independent vectors of the form

$$\begin{pmatrix} X_{1,1} \\ X_{2,1} \\ X_{3,1} \end{pmatrix}, \ldots, \begin{pmatrix} X_{1,n_1} \\ X_{2,n_1} \\ X_{3,n_1} \end{pmatrix}, \; \begin{pmatrix} X_{1,n_1+1} \\ X_{2,n_1+1} \end{pmatrix}, \ldots, \begin{pmatrix} X_{1,n_1+n_2} \\ X_{2,n_1+n_2} \end{pmatrix}, \; X_{1,n_1+n_2+1}, \ldots, X_{1,n_1+n_2+n_3}.$$

We partition $\mu$ in similar fashion into three subvectors, $\mu_1$, $\mu_2$, and $\mu_3$. In this setting, we establish independence between $\hat{\mu}_1$ and $\{\hat{\mu}_2, \hat{\mu}_3\}$. Furthermore, we prove that when $\Sigma = I_d$, $d = p_1 + p_2 + p_3$, these subvectors are pairwise uncorrelated; therefore, although we have not established independence between $\hat{\mu}_2$ and $\hat{\mu}_3$, we have shown that they are uncorrelated for $\Sigma = I_d$. We establish the equivariance of $\hat{\mu}$ under a certain group of transformations and provide an extension of our alternative derivation for the distribution of $\hat{\mu}$ from two-step to three-step monotone incomplete data. Although we have not been able to find a joint stochastic representation for $\hat{\mu}_1$, $\hat{\mu}_2$, and $\hat{\mu}_3$, we believe that we have identified the six random variables, whose joint distribution is unknown, that form the basis of that representation.

Throughout the dissertation, we have also made an assumption on the process that generates the incomplete data. There are three main underlying processes that describe how observations come to be missing: missing at random, missing completely at random, and not missing at random [28]. Because of the independence structure we have assumed, we have also implicitly assumed that our data are missing completely at random; that is, there is no reason or order as to why any one unit would be missing an observation as opposed to another.

Readers are referred to [23] and [30] for further discussion of the types of missingness.
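To make the monotone layouts described in this chapter concrete, the sketch below generates a synthetic three-step monotone incomplete normal sample: $n_1$ complete observations on $(X_1', X_2', X_3')'$, a further $n_2$ observed only on $(X_1', X_2')'$, and a further $n_3$ observed only on $X_1$, with missing entries marked as NaN. The block sizes, sample sizes, and covariance matrix are arbitrary choices made only for illustration, consistent with the missing-completely-at-random assumption discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2, p3 = 2, 1, 1
n1, n2, n3 = 30, 15, 10
d, N = p1 + p2 + p3, n1 + n2 + n3

A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)          # a positive definite covariance matrix
mu = np.zeros(d)

data = rng.multivariate_normal(mu, Sigma, size=N)
data[n1:, p1 + p2:] = np.nan         # X3 missing after the first n1 rows
data[n1 + n2:, p1:] = np.nan         # X2 and X3 missing after the first n1 + n2 rows

# Per-column missing counts: 0 for X1, n3 for X2, n2 + n3 for X3.
print(np.isnan(data).sum(axis=0))
```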

Chapter 2

Preliminaries

2.1 Some matrix algebra

Let $X$ be a $p \times q$ matrix and let $\mathrm{vec}(X)$ denote the $pq \times 1$ column vector formed by stacking the columns of $X$. Let $C > 0$ and $D > 0$ be $p \times p$ and $q \times q$ matrices, respectively, where $> 0$ denotes that the matrices are positive definite (and symmetric). We denote the inverse, trace, and determinant of $C$ by $C^{-1}$, $\mathrm{tr}(C)$, and $|C|$, respectively. Also, we denote the Kronecker product of $C$ and $D$ by $C \otimes D$. Muirhead [25] provides a number of useful properties of these matrix operations, and we collect together some of those properties in the following proposition.

Proposition 2.1.1.
(i) $(C \otimes D)' = C' \otimes D'$, $\mathrm{tr}(C \otimes D) = (\mathrm{tr}\,C)(\mathrm{tr}\,D)$, $(C \otimes D)^{-1} = C^{-1} \otimes D^{-1}$, and $|C \otimes D| = |C|^q\,|D|^p$.
(ii) If $A$ is $m \times p$ and $B$ is $r \times q$, then $(A \otimes B)(C \otimes D) = AC \otimes BD$.
(iii) If $A$ is $m \times p$ and $B$ is $q \times m$, then the following equalities hold:
$$\mathrm{vec}(BAX) = (X' \otimes B)\,\mathrm{vec}(A),$$
$$\mathrm{tr}(BAX) = (\mathrm{vec}(B'))'(I \otimes A)\,\mathrm{vec}(X),$$
$$\mathrm{tr}(AX'CXB) = (\mathrm{vec}(X))'\big((BA)' \otimes C\big)\,\mathrm{vec}(X).$$

(iv) Let $A$ be $p \times q$ and $B$ be $q \times p$, and let $P = C + ADB$. The following expression for $P^{-1}$ is known as Woodbury's formula:
$$P^{-1} = C^{-1} - C^{-1}AD(D + DBC^{-1}AD)^{-1}DBC^{-1}. \qquad (2.1.1)$$
(v) Let $\lambda_1, \ldots, \lambda_p$ be the eigenvalues of $C$. Then there exists an orthogonal $p \times p$ matrix $H$ such that
$$H'CH = \mathrm{diag}(\lambda_1, \ldots, \lambda_p). \qquad (2.1.2)$$

Let the $(p+q) \times (p+q)$ matrix $M > 0$ be partitioned into $p$ and $q$ rows and columns, i.e.,
$$M = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix},$$
where $M_{11}$ is $p \times p$, $M_{12} = M_{21}'$ is $p \times q$, and $M_{22}$ is $q \times q$. A large portion of our research involves the partitioning of data into blocks with similar patterns of missingness. We therefore make much use of the well-known Schur complement, that is, $M_{22 \cdot 1} = M_{22} - M_{21}M_{11}^{-1}M_{12}$. There are a number of results involving Schur complements that we will need, and so we list them in the following proposition; see Anderson [2] and Muirhead [25].

Proposition 2.1.2. Partition the positive definite matrix $M$ into $p$ and $q$ rows and columns, as above. Then
(i) The partial Iwasawa coordinates of $M$ are $\{M_{11}, M_{11}^{-1}M_{12}, M_{22 \cdot 1}\}$, and
$$M = \begin{pmatrix} I_p & 0 \\ M_{21}M_{11}^{-1} & I_q \end{pmatrix} \begin{pmatrix} M_{11} & 0 \\ 0 & M_{22 \cdot 1} \end{pmatrix} \begin{pmatrix} I_p & M_{11}^{-1}M_{12} \\ 0 & I_q \end{pmatrix}. \qquad (2.1.3)$$

Further,
$$M^{-1} = \begin{pmatrix} I_p & -M_{11}^{-1}M_{12} \\ 0 & I_q \end{pmatrix} \begin{pmatrix} M_{11}^{-1} & 0 \\ 0 & M_{22 \cdot 1}^{-1} \end{pmatrix} \begin{pmatrix} I_p & 0 \\ -M_{21}M_{11}^{-1} & I_q \end{pmatrix}.$$
(ii) Let $x = (x_1', x_2')'$, where $x_1 \in \mathbb{R}^p$, $x_2 \in \mathbb{R}^q$. Then
$$x'M^{-1}x = (x_1 - M_{12}M_{22}^{-1}x_2)'M_{11 \cdot 2}^{-1}(x_1 - M_{12}M_{22}^{-1}x_2) + x_2'M_{22}^{-1}x_2. \qquad (2.1.4)$$
(iii) Let
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N_{p+q}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right);$$
then
$$Y \mid X \sim N_q\big(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X - \mu_1),\; \Sigma_{22 \cdot 1}\big). \qquad (2.1.5)$$

Finally, we define $e_1 = (1, 0, \ldots, 0)'$ to be the vector whose first element is one, followed by zeros, of an arbitrary length that will be obvious from the context of the problem. Similarly, we define $e_2 = (0, 1, 0, \ldots, 0)'$.
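The matrix identities collected above are easy to spot-check numerically. The following sketch verifies the relation $\mathrm{vec}(BAX) = (X' \otimes B)\,\mathrm{vec}(A)$ from Proposition 2.1.1(iii) and Woodbury's formula (2.1.1) on randomly generated matrices; the dimensions are arbitrary choices and the snippet is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# vec/Kronecker identity: vec(BAX) = (X' kron B) vec(A), with column-stacking vec.
m, p, q = 3, 4, 2
A = rng.standard_normal((m, p))
B = rng.standard_normal((q, m))
X = rng.standard_normal((p, q))
vec = lambda M: M.reshape(-1, order="F")
print(np.allclose(vec(B @ A @ X), np.kron(X.T, B) @ vec(A)))   # True

# Woodbury's formula (2.1.1):
# (C + A D B)^{-1} = C^{-1} - C^{-1} A D (D + D B C^{-1} A D)^{-1} D B C^{-1}.
pw, qw = 4, 2
Cm = np.eye(pw) + rng.standard_normal((pw, pw)) @ rng.standard_normal((pw, pw)).T
Dm = np.eye(qw) + rng.standard_normal((qw, qw)) @ rng.standard_normal((qw, qw)).T
Am = rng.standard_normal((pw, qw))
Bm = rng.standard_normal((qw, pw))
Ci = np.linalg.inv(Cm)
lhs = np.linalg.inv(Cm + Am @ Dm @ Bm)
rhs = Ci - Ci @ Am @ Dm @ np.linalg.inv(Dm + Dm @ Bm @ Ci @ Am @ Dm) @ Dm @ Bm @ Ci
print(np.allclose(lhs, rhs))   # True
```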

2.2 Some multivariate distributions

2.2.1 The matrix normal distribution

Let $M$ be a $p \times q$ matrix, and let $C$ and $D$ be $p \times p$ and $q \times q$ positive definite matrices, respectively. A $p \times q$ random matrix $W_{12}$ has a matrix normal distribution with mean $M$ and covariance parameter $C \otimes D$, denoted $W_{12} \sim N(M, C \otimes D)$, if the probability density function of $W_{12}$ is
$$(2\pi)^{-pq/2}\,|C|^{-q/2}\,|D|^{-p/2}\exp\left[-\tfrac{1}{2}\,\mathrm{tr}\big(C^{-1}(W_{12}-M)D^{-1}(W_{12}-M)'\big)\right], \qquad (2.2.1)$$
$W_{12} \in \mathbb{R}^{p \times q}$. Then $W_{12} \sim N(M, C \otimes D)$ is equivalent to stating that $\mathrm{vec}(W_{12}) \sim N(\mathrm{vec}(M), C \otimes D)$, the multivariate normal distribution discussed in (1.0.1); see Muirhead [25, p. 79].

2.2.2 The Wishart distribution

A $d \times d$ random matrix $W$ has a Wishart distribution with degrees of freedom $a > d - 1$ and covariance matrix $\Lambda > 0$, denoted $W \sim W_d(a, \Lambda)$, if its probability density function is
$$\frac{1}{2^{ad/2}\,|\Lambda|^{a/2}\,\Gamma_d(a/2)}\,|W|^{(a-d-1)/2}\exp\left(-\tfrac{1}{2}\,\mathrm{tr}\,\Lambda^{-1}W\right), \qquad (2.2.2)$$
where $W > 0$, $\mathrm{Re}(a) > \tfrac{1}{2}(d-1)$, and
$$\Gamma_d(a) = \pi^{d(d-1)/4}\prod_{j=1}^{d}\Gamma\left(a - \tfrac{1}{2}(j-1)\right) \qquad (2.2.3)$$
is the multivariate gamma function [2], [25]. The Wishart distribution is also defined when $W$ is singular, in which case the density function does not exist.

Let $\Sigma > 0$ and $T$ be $d \times d$ matrices and let $Z \sim N(0, I_n \otimes \Sigma)$. If $W = Z'Z$, then $W \sim W_d(n, \Sigma)$ and has characteristic function
$$E[\exp(i\,\mathrm{tr}(TW))] = |I_d - 2iT\Sigma|^{-n/2}. \qquad (2.2.4)$$

2.2.3 The multivariate beta distribution

A $d \times d$ random matrix $L$ has a multivariate beta distribution with degrees of freedom $(a, b)$, where $a > d - 1$ and $b > d - 1$, denoted $L \sim \mathrm{Beta}_d(a/2, b/2)$, if its probability density function is
$$\frac{\Gamma_d((a+b)/2)}{\Gamma_d(a/2)\,\Gamma_d(b/2)}\,|L|^{(a-d-1)/2}\,|I - L|^{(b-d-1)/2}, \qquad (2.2.5)$$
$L > 0$, $I - L > 0$.

2.2.4 The noncentral $\chi^2$ distribution

Let $Z \sim N(\mu, I_d)$; then $v = Z'Z \sim \chi^2_d(\mu'\mu)$ has a noncentral chi-square distribution with $d$ degrees of freedom and noncentrality parameter $\tau^2 = \mu'\mu$. It is well known [25] that the characteristic function of $v$ is
$$E[\exp(itv)] = (1 - 2it)^{-d/2}\exp\left[\frac{it\tau^2}{1 - 2it}\right]. \qquad (2.2.6)$$
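The Wishart and noncentral chi-square facts quoted in this section can be checked by simulation. Assuming NumPy and SciPy are available, the sketch below draws $W = Z'Z$ with the rows of $Z$ i.i.d. $N_d(0, \Sigma)$ and compares the empirical mean of $W$ with $n\Sigma$, and then compares the empirical distribution of $Z'Z$ for $Z \sim N(\mu, I_d)$ with the $\chi^2_d(\mu'\mu)$ distribution. All sizes and parameters are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, n, reps = 3, 8, 20000

A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)
L = np.linalg.cholesky(Sigma)

# Empirical mean of W = Z'Z should be close to n * Sigma.
W_sum = np.zeros((d, d))
for _ in range(reps):
    Z = rng.standard_normal((n, d)) @ L.T
    W_sum += Z.T @ Z
print(np.round(W_sum / reps - n * Sigma, 2))   # entries near 0

# Empirical CDF of Z'Z for Z ~ N(mu, I_d) versus the noncentral chi-square CDF.
mu = np.array([1.0, -0.5, 2.0])
v = ((rng.standard_normal((reps, d)) + mu) ** 2).sum(axis=1)
grid = np.linspace(0.0, 30.0, 7)
print(np.round((v[:, None] <= grid).mean(axis=0) -
               stats.ncx2.cdf(grid, df=d, nc=mu @ mu), 3))   # entries near 0
```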

2.3 Some properties of these distributions

We begin by stating a result on the characteristic function of a quadratic form in multivariate normal variables. Results of this type have been stated in various forms in the literature; notably, they can be deduced from a result of Khatri [18], p. 446.

Lemma 2.3.1 (Khatri [18]). Let $C$ be a real, symmetric $p \times p$ matrix, $t \in \mathbb{R}$, $v \in \mathbb{R}^p$, and $Z \sim N_p(0, \Sigma)$. Then
$$E\,e^{it(Z'CZ + v'Z)} = |I_p - 2itC\Sigma|^{-1/2}\exp\left(-\tfrac{1}{2}t^2\,v'\Sigma(I_p - 2itC\Sigma)^{-1}v\right). \qquad (2.3.1)$$
Moreover, (2.3.1) remains valid if $C$ is a complex symmetric matrix whose imaginary part is positive definite and $v$ is a complex vector.

The following result extends Lemma 2.2 of Chang and Richards [6].

Lemma 2.3.2. Let $\Lambda$ be a $q \times q$ positive definite matrix and $U$ be a $p \times p$ positive semi-definite matrix. If $B_{12} \sim N(0, C \otimes D)$, then
$$E\exp\left(-\mathrm{tr}\,U B_{12} D^{-1}\Lambda D^{-1} B_{12}'\right) = \big|I_{pq} + 2\,C^{1/2}UC^{1/2} \otimes D^{-1/2}\Lambda D^{-1/2}\big|^{-1/2}. \qquad (2.3.2)$$
This result remains valid if $U$ is a symmetric complex matrix with the real part of $U$ positive definite.

Proof. We attribute the following proof of (2.3.2) to an anonymous referee of Chang and Richards [6]. First, recall that if $X \sim N_d(0, I_d)$, then $XX' \sim W_d(1, I_d)$. Therefore, for any positive definite matrix $A$,
$$E\exp(-tX'AX) = E\exp(-t\,\mathrm{tr}\,AXX') = |I + 2tA|^{-1/2}, \qquad (2.3.3)$$

22 5 for t > 0. Define K = D /2 B 2 C /2, φ = D /2 ΛD /2, and ψ = C /2 UC /2. By Proposition 2.. (iii, vec (K = (C /2 D /2 vec(b. Because 2 vec(b 2 N(0, C D, it follows that vec(k N(0, (C /2 D /2 (C D(C /2 D /2. By Proposition 2.. (ii, the covariance matrix of vec(k equals (C /2 D /2 (C D(C /2 D /2 = C /2 CC /2 D /2 DD /2 = I p I q = I pq ; hence, vec(k N pq (0, I pq. Further, by Proposition 2.. (iii, (vec K (ψ φ(vec K (vec K vec(φkψ = tr (K φkψ = tr (ψk φk, and from the definitions of ψ, K, and φ, we have ψk φk = UB 2 D ΛD B 2. Because vec(k N pq (0, I pq, the moment-generating function stated above, (2.3.3, with t = and A = ψ φ, yields the desired result. Chang and Richards [6] and Kanda and Fujikoshi [7] gathered together a collection of properties of the Wishart distribution that we will also need here, all of which are available from Anderson [2], Eaton [9], or Muirhead [25]. Proposition Suppose that W W d (a, Λ, and W = W W 2 and W 2 W 22 Λ Λ = Λ 2 have been partitioned similarly to the matrix M in Section (2.. Λ 2 Λ 22 Then,

(i) $W_{22 \cdot 1}$ and $\{W_{12}, W_{11}\}$ are mutually independent, and $W_{22 \cdot 1} \sim W_q(a - p, \Lambda_{22 \cdot 1})$.
(ii) $W_{21} \mid W_{11} \sim N(\Lambda_{21}\Lambda_{11}^{-1}W_{11},\; \Lambda_{22 \cdot 1} \otimes W_{11})$.
(iii) If $\Lambda_{12} = 0$, then $W_{22 \cdot 1}$, $W_{11}$, and $W_{21}W_{11}^{-1}W_{12}$ are mutually independent. Moreover, $W_{21}W_{11}^{-1}W_{12} \sim W_q(p, \Lambda_{22})$.
(iv) For $k \le d$, if $M$ is $k \times d$ of rank $k$, then $(MW^{-1}M')^{-1} \sim W_k(a - d + k, (M\Lambda^{-1}M')^{-1})$. In particular, if $Y$ is a $d \times 1$ random vector which is independent of $W$ and satisfies $P(Y = 0) = 0$, then $Y$ is independent of $Y'\Lambda^{-1}Y / Y'W^{-1}Y \sim \chi^2_{a-d+1}$.
(v) Let $G \sim W_d(a, \Sigma)$ and $H \sim W_d(b, \Sigma)$, where $G$ and $H$ are independent. Then $L = (G + H)^{-1/2}G(G + H)^{-1/2} \sim \mathrm{Beta}_d(a/2, b/2)$, and $L$ is independent of $G + H$.

Finally, in deriving any stochastic representation, we will use the standard notation $\stackrel{\mathcal{L}}{=}$ for "equal in distribution," $\ge_{\mathcal{L}}$ for "stochastically greater," and $\le_{\mathcal{L}}$ for "stochastically smaller." That is to say, if $X$ and $Y$ are random entities, then $X \stackrel{\mathcal{L}}{=} Y$ signifies that $X$ and $Y$ have the same probability distribution; also, if $X$ and $Y$ are scalar-valued random variables, then $X \ge_{\mathcal{L}} Y$ signifies that $P(X \ge t) \ge P(Y \ge t)$ for all $t \in \mathbb{R}$, and $X \le_{\mathcal{L}} Y$ whenever $Y \ge_{\mathcal{L}} X$.
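Proposition 2.3.3(iv) is used repeatedly in Chapter 3 (it produces the chi-square variables $Q_1$ and $Q_3$ there), so a small Monte Carlo check of it may be helpful. The sketch below, with hypothetical dimensions and using SciPy for the reference distribution, compares the empirical distribution of $Y'\Lambda^{-1}Y / Y'W^{-1}Y$ with the $\chi^2_{a-d+1}$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, a, reps = 4, 12, 20000

G = rng.standard_normal((d, d))
Lam = G @ G.T + np.eye(d)
L = np.linalg.cholesky(Lam)
Y = rng.standard_normal(d)                   # a fixed nonzero vector
num = Y @ np.linalg.solve(Lam, Y)            # Y' Lambda^{-1} Y

ratios = np.empty(reps)
for i in range(reps):
    Z = rng.standard_normal((a, d)) @ L.T    # rows i.i.d. N_d(0, Lambda)
    W = Z.T @ Z                              # W ~ W_d(a, Lambda)
    ratios[i] = num / (Y @ np.linalg.solve(W, Y))

grid = np.linspace(1.0, 20.0, 5)
print(np.round((ratios[:, None] <= grid).mean(axis=0) -
               stats.chi2.cdf(grid, df=a - d + 1), 3))   # entries near 0
```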

Chapter 3

Two-step Monotone Incomplete Multivariate Normal Data

This chapter begins with a thorough description of our notation and the maximum likelihood estimators for two-step monotone incomplete data. We then provide an alternative derivation of the exact distribution of $\hat{\mu}$, the maximum likelihood estimator of $\mu$, first derived by Chang and Richards [6]. In this chapter we will also derive a stochastic representation for the exact distribution of a generalization of Hotelling's $T^2$-statistic. We will then derive upper and lower bounds for the exact distribution of the $T^2$-statistic. As a consequence, we obtain exact ellipsoidal confidence regions for $\mu$. We also apply the $T^2$-statistic to derive simultaneous confidence intervals for linear functions of $\mu$, and we apply these results to the Pennsylvania cholesterol data. We complete this chapter by studying prediction regions for complete observations.

3.1 Notation and maximum likelihood estimators

Let $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$. In the case of two-step monotone incomplete data, we suppose that the data are $N$ mutually independent observations of the form
$$\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix}, \begin{pmatrix} X_2 \\ Y_2 \end{pmatrix}, \ldots, \begin{pmatrix} X_n \\ Y_n \end{pmatrix}, \; X_{n+1}, X_{n+2}, \ldots, X_N, \qquad (3.1.1)$$

where $(X_j', Y_j')'$, $j = 1, \ldots, n$, are observations from $N_{p+q}(\mu, \Sigma)$, and the incomplete data $X_j$, $j = n+1, \ldots, N$, are observations on the first $p$ characteristics of the same population. One additional assumption, necessary to guarantee that all means and variances are finite and that all integrals encountered later are absolutely convergent, is that $n > p + 2$ [6]. Data of the form (3.1.1) have been widely studied; cf. Anderson [1], Bhargava [4], [5], Morrison [24], Eaton and Kariya [10], Fujisawa [12], Hao and Krishnamoorthy [14], Kanda and Fujikoshi [17], and, most recently, Chang and Richards [6], [7].

Define the sample mean vectors
$$\bar{X}_1 = \frac{1}{n}\sum_{j=1}^{n} X_j, \quad \bar{X}_2 = \frac{1}{N-n}\sum_{j=n+1}^{N} X_j, \quad \bar{X} = \frac{1}{N}\sum_{j=1}^{N} X_j, \quad \bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j, \qquad (3.1.2)$$
and the corresponding matrices of sums of squares and products
$$A_{11,n} = \sum_{j=1}^{n}(X_j - \bar{X}_1)(X_j - \bar{X}_1)', \qquad A_{12} = A_{21}' = \sum_{j=1}^{n}(X_j - \bar{X}_1)(Y_j - \bar{Y})',$$
$$A_{22} = \sum_{j=1}^{n}(Y_j - \bar{Y})(Y_j - \bar{Y})', \qquad A_{11,N} = \sum_{j=1}^{N}(X_j - \bar{X})(X_j - \bar{X})'. \qquad (3.1.3)$$
In addition, we use the notation $\tau = n/N$ for the proportion of the data which are complete and denote $1 - \tau$ by $\bar{\tau}$, so that $\bar{\tau} = (N - n)/N$ is the proportion of incomplete observations. By Anderson [1] (cf. Morrison [24], Anderson and Olkin [3], Jinadasa and Tracy [15]),

the maximum likelihood estimators of $\mu$ and $\Sigma$ are, respectively,
$$\hat{\mu} = \begin{pmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \end{pmatrix} = \begin{pmatrix} \bar{X} \\ \bar{Y} - \bar{\tau}A_{21}A_{11,n}^{-1}(\bar{X}_1 - \bar{X}_2) \end{pmatrix} \qquad (3.1.4)$$
and
$$\hat{\Sigma} = \begin{pmatrix} \hat{\Sigma}_{11} & \hat{\Sigma}_{12} \\ \hat{\Sigma}_{21} & \hat{\Sigma}_{22} \end{pmatrix} = \begin{pmatrix} \frac{1}{N}A_{11,N} & \frac{1}{N}A_{11,N}A_{11,n}^{-1}A_{12} \\[4pt] \frac{1}{N}A_{21}A_{11,n}^{-1}A_{11,N} & \frac{1}{n}A_{22 \cdot 1,n} + \frac{1}{N}A_{21}A_{11,n}^{-1}A_{11,N}A_{11,n}^{-1}A_{12} \end{pmatrix}, \qquad (3.1.5)$$
where $A_{22 \cdot 1,n} = A_{22} - A_{21}A_{11,n}^{-1}A_{12}$.

3.2 A new derivation of an exact stochastic representation for $\hat{\mu}$

Chang and Richards [6] derived an exact stochastic representation for $\hat{\mu}$ by means of a direct analysis of its characteristic function. In examining ways to extend their methods to three-step data, we discovered an alternative method to derive the exact distribution of $\hat{\mu}$ by means of the distribution of the incomplete data given the complete data, namely $Y$ given $X_1, \ldots, X_N$. Before we delve into the exact distribution, we show that $\hat{\mu}$ and $\hat{\Sigma}$ are equivariant. This can be derived from a general argument given by Davison [8], p. 85; however, we have chosen to provide the explicit details here.
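Before turning to the equivariance argument, the following sketch shows how the estimators (3.1.4) and (3.1.5) can be computed from a two-step monotone sample. The function and variable names are hypothetical, and the toy data are simulated only to illustrate the formulas.

```python
import numpy as np

def mle_two_step(X, Y):
    """Maximum likelihood estimators (mu_hat, Sigma_hat) from (3.1.4)-(3.1.5).
    X is the N x p array of observations on X (all N cases); Y is the n x q
    array observed for the first n cases only.  An illustrative sketch."""
    N, p = X.shape
    n, q = Y.shape
    tau_bar = (N - n) / N

    X1bar, X2bar = X[:n].mean(axis=0), X[n:].mean(axis=0)
    Xbar, Ybar = X.mean(axis=0), Y.mean(axis=0)

    Xc, Yc, XcN = X[:n] - X1bar, Y - Ybar, X - Xbar
    A11n, A12, A22 = Xc.T @ Xc, Xc.T @ Yc, Yc.T @ Yc
    A11N = XcN.T @ XcN

    B = np.linalg.solve(A11n, A12).T              # B = A_{21} A_{11,n}^{-1}
    mu_hat = np.concatenate([Xbar, Ybar - tau_bar * B @ (X1bar - X2bar)])

    S11 = A11N / N
    S21 = B @ A11N / N
    S22 = (A22 - B @ A12) / n + B @ A11N @ B.T / N
    Sigma_hat = np.block([[S11, S21.T], [S21, S22]])
    return mu_hat, Sigma_hat

# Toy example with hypothetical sizes satisfying n > p + 2.
rng = np.random.default_rng(0)
p, q, n, N = 2, 1, 20, 35
G = rng.standard_normal((p + q, p + q))
Z = rng.multivariate_normal(np.zeros(p + q), G @ G.T + np.eye(p + q), size=N)
mu_hat, Sigma_hat = mle_two_step(Z[:, :p], Z[:n, p:])
print(np.round(mu_hat, 3))
```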

27 Proposition Let Λ and Λ 22 be p p and q q positive definite matrices, respectively, Λ 2 be q p, ν R p, ν 2 R q, and Λ = Λ 0 0 Λ 22, C = I p 0, ν = Λ 2 I q ν ν Then the estimators µ and Σ are equivariant under the transformation X j Y j ΛC X j Y j + ν, (3.2. for j =,..., n. For j = n +,..., N, X j Λ X j + ν. Proof. Let X j Y j = ΛC X j Y j + ν = Λ X j + ν, (3.2.2 Λ 22 Λ 2 X j + Λ 22 Y j + ν 2 for j =,..., n and X = Λ j X j + ν, j = n +,..., N. Then X j Y j N p+q µ = µ µ 2, Σ = Σ Σ 2 Σ 2 Σ 22,

28 2 j =,..., n, and X j N p (µ, Σ, j = n +,..., N, where µ = ΛCµ + ν and Σ = ΛCΣC Λ. Define the sample mean vectors X = n Ȳ = n n X, j j= n j= X 2 = N n Y j, X = N N j=n+ X j, N X, (3.2.3 j j= and the corresponding matrices of sums of squares and products A = N (X X (X X n, A =,N j j 2 A = (X X (Y Ȳ, 2 j j A,n = j= n (X X j (X X n j, A = (Y Ȳ (Y Ȳ 22 j j. (3.2.4 j= j= j= Then, the maximum likelihood estimators of µ and Σ are, respectively, µ = X Ȳ τa A ( X X 2,n 2 (3.2.5 and Σ = N A,N N A A 2,n A,N N A A,N,n A 2 n A + 22,n N A A 2,n A A,N,n A 2. (3.2.6

29 Our goal is to show that µ = ΛC µ + ν and that Σ = ΛC ΣC Λ. As a consequence of , we have the following relations: 22 X = Λ X + ν, X 2 = Λ X 2 + ν, Ȳ = Λ 22 (Ȳ + Λ 2 X + ν 2, X = Λ X + ν, and A,N = Λ A,N Λ, A,n = Λ A,n Λ, n A = 2 A = Λ 2 (X j X [ ( Λ 22 Λ2 (X j X + Y j Ȳ ] j= = Λ (A 2 + A,n Λ 2 Λ 22, A = Λ [ n (Λ 2 (X j X + Y j Ȳ (Λ 2 (X j X + Y j Ȳ ] Λ22 j= = Λ 22 [ Λ2 A,n Λ 2 + Λ 2 A 2 + A 2 Λ 2 + A 22 ] Λ22. We may write both µ and Σ in terms of the original means and matrices of sums of squares and products. It is straightforward that µ = τ X + τ X 2 = τλ X + ν + τ(λ X2 + ν = Λ µ + ν.

30 23 The maximum likelihood estimator of µ 2 is µ 2 = Λ 22Ȳ + ν 2 + Λ 22 Λ 2 X τλ 22 (A 2 + Λ 2 A,n A,n ( X X 2 = Λ 22 Ȳ + ν 2 + Λ 22 Λ 2 (τ X + τ X 2 τλ 22 A 2 A,n ( X X 2. (3.2.7 Because τ X + τ X 2 = X, it follows that µ 2 = Λ 22 Λ 2 X + Λ 22 (Ȳ + ν 2 τa 2 A,n ( X X 2 + ν 2 = Λ 22 (Λ 2 µ + µ 2 + ν 2. (3.2.8 Therefore µ µ 2 = Λ µ + ν = ΛC µ + ν, (3.2.9 Λ 22 (Λ 2 µ + µ 2 + ν 2 µ 2 so we have proved that µ is equivariant. The maximum likelihood estimators of Σ, Σ 2, and Σ 22 are, respectively, Σ = N Λ A,N Λ, (3.2.0 Σ 2 = Σ 2 = N Λ A,N A,n A 2 Λ 22 + N Λ A,N Λ 2 Λ 22, (3.2.

31 24 and Σ = 22 n Λ [ 22 Λ2 A,n Λ 2 + Λ 2 A 2 + A 2 Λ 2 + A 22 (A 2 + Λ 2 A,n A,n (A 2 + A,n Λ 2 ] Λ 22 + N Λ 22 (A 2 + Λ 2 A,n A,n A,N A,n (A 2 + A,n Λ 2 Λ 22 = n Λ 22 A 22,n Λ 22 + N Λ 22 [ Λ2 A,N Λ 2 + A 2 A,n A,N Λ 2 + Λ 2 A,N A A,n 2 + A 2 A A,n,N A A ],n 2 Λ22. (3.2.2 To establish the equivariance of Σ, let us evaluate ΛC ΣC Λ. To that end, ΛC ΣC Λ Λ = 0 Λ 22 Λ 2 Λ 22 N A,N N A 2 A A,n,N Λ Λ Λ Λ 22 N A,N A,n A 2 n A 22,n + N A 2 A,n A,N A,n A 2 Λ = Σ Λ Λ ( Σ Λ 2 + Σ 2 Λ 22 Λ 22 ( Σ 2 + Λ 2 Σ Λ Λ 22 (Λ 2 Σ Λ 2 + Σ 2 Λ 2 + Λ 2 Σ2 + Σ 22 Λ 22 (3.2.3 By straightforward matrix multiplication, it follows from (3.2.0, (3.2., and (3.2.2 that ΛC ΣC Λ = Σ. Therefore, Σ is also equivariant under the transformation (3.2.2.

As a consequence of the equivariance of $\hat{\Sigma}$, we obtain the following result.

Corollary. The estimated covariance matrix of $\hat{\mu}$ is equivariant under the transformation (3.2.2).

Proof. Because $\hat{\Sigma}$ is equivariant under the transformation (3.2.2), it follows that
$$\widehat{\mathrm{Cov}}(\hat{\mu}^*) = \frac{1}{N}\hat{\Sigma}^* + \frac{\gamma - 1}{N}\begin{pmatrix} 0 & 0 \\ 0 & \hat{\Sigma}^*_{22 \cdot 1} \end{pmatrix} = \frac{1}{N}\Lambda C\hat{\Sigma}C'\Lambda + \frac{\gamma - 1}{N}\begin{pmatrix} 0 & 0 \\ 0 & \Lambda_{22}\hat{\Sigma}_{22 \cdot 1}\Lambda_{22}' \end{pmatrix} = \Lambda C\,\widehat{\mathrm{Cov}}(\hat{\mu})\,C'\Lambda.$$
Therefore $\widehat{\mathrm{Cov}}(\hat{\mu})$ is equivariant.

By taking
$$\Lambda_{11} = \Sigma_{11}^{-1/2}, \qquad \Lambda_{22} = \Sigma_{22 \cdot 1}^{-1/2}, \qquad \Lambda_{21} = -\Sigma_{21}\Sigma_{11}^{-1}, \qquad (3.2.14)$$
then, under the transformation (3.2.2), we obtain
$$\mathrm{Cov}\begin{pmatrix} X^* \\ Y^* \end{pmatrix} = \Lambda C\Sigma C'\Lambda = I_{p+q}. \qquad (3.2.15)$$
Therefore, in analyzing the distribution of $\hat{\mu}$, we may assume, without loss of generality, that the population covariance matrix is $I_{p+q}$. Furthermore, by choosing $\nu = -\Lambda C\mu$, we may also assume, without loss of generality, that $\mu = 0$.
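The equivariance just established is easy to confirm numerically: transforming a two-step sample as in (3.2.1) and recomputing the maximum likelihood estimator should reproduce $\Lambda C\hat{\mu} + \nu$ exactly. The sketch below performs this check for $\hat{\mu}$ with arbitrary (hypothetical) choices of $\Lambda_{11}$, $\Lambda_{22}$, $\Lambda_{21}$, $\nu_1$, and $\nu_2$; it is an illustration, not part of the dissertation's argument.

```python
import numpy as np

def mu_hat(X, Y):
    """MLE of mu for two-step monotone data, per (3.1.4); X is N x p, Y is n x q."""
    N, n = X.shape[0], Y.shape[0]
    tau_bar = (N - n) / N
    X1b, X2b, Xb, Yb = X[:n].mean(0), X[n:].mean(0), X.mean(0), Y.mean(0)
    Xc, Yc = X[:n] - X1b, Y - Yb
    A11n, A21 = Xc.T @ Xc, Yc.T @ Xc
    return np.concatenate([Xb, Yb - tau_bar * (A21 @ np.linalg.solve(A11n, X1b - X2b))])

rng = np.random.default_rng(1)
p, q, n, N = 2, 2, 15, 25
G = rng.standard_normal((p + q, p + q))
Z = rng.multivariate_normal(np.zeros(p + q), G @ G.T + np.eye(p + q), size=N)
X, Y = Z[:, :p], Z[:n, p:]

# Hypothetical transformation of the form (3.2.1).
Lam11 = np.array([[1.5, 0.3], [0.3, 0.8]])    # positive definite
Lam22 = np.array([[1.0, -0.2], [-0.2, 0.6]])  # positive definite
Lam21 = np.array([[0.3, -0.2], [0.4, 0.6]])
nu1, nu2 = np.array([1.0, -2.0]), np.array([0.5, 3.0])

Xs = X @ Lam11.T + nu1                        # X_j* = Lam11 X_j + nu1
Ys = (X[:n] @ Lam21.T + Y) @ Lam22.T + nu2    # Y_j* = Lam22 (Lam21 X_j + Y_j) + nu2

Lam = np.block([[Lam11, np.zeros((p, q))], [np.zeros((q, p)), Lam22]])
C = np.block([[np.eye(p), np.zeros((p, q))], [Lam21, np.eye(q)]])
nu = np.concatenate([nu1, nu2])
print(np.allclose(mu_hat(Xs, Ys), Lam @ C @ mu_hat(X, Y) + nu))  # True
```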

We will now provide an alternative proof for the exact stochastic representation of $\hat{\mu}$.

Theorem (Chang and Richards [6]). Let
$$V \sim N_{p+q}\left(0,\; \frac{1}{N}\Sigma + \frac{\bar{\tau}}{n}\begin{pmatrix} 0 & 0 \\ 0 & \Sigma_{22 \cdot 1} \end{pmatrix}\right), \quad Q_1 \sim \chi^2_{n-p}, \quad Q_2 \sim \chi^2_p, \quad V_2 \sim N_q(0, I_q),$$
where $V$, $V_2$, $Q_1$, and $Q_2$ are mutually independent. Then the distribution of $\hat{\mu}$ is given by the exact stochastic representation
$$\hat{\mu} \stackrel{\mathcal{L}}{=} \mu + V + \left(\frac{\bar{\tau}Q_2}{nQ_1}\right)^{1/2}\begin{pmatrix} 0 \\ \Sigma_{22 \cdot 1}^{1/2}V_2 \end{pmatrix}. \qquad (3.2.16)$$

Proof. Assume, without loss of generality, that $\mu = 0$ and $\Sigma = I_{p+q}$. Then, by (2.1.5), it follows that $Y_j \mid X_j \sim N_q(0, I_q)$, and therefore all the $X_j$ and $Y_j$ are mutually independent, normally distributed random vectors. Therefore $\bar{Y} \mid \{X_1, \ldots, X_N\} \sim N_q(0, \frac{1}{n}I_q)$. Conditional on the complete data, $\mathcal{X} = \{X_1, \ldots, X_N\}$, the vector $\hat{\mu}_2$ is a linear combination of the incomplete data, $\mathcal{Y} = (Y_1, \ldots, Y_n)$; hence $\hat{\mu}_2$ is normally distributed. Moreover, because $\hat{\mu}_1 = \bar{X}$, it then follows that $\hat{\mu}_1$ is fixed in the conditional distribution of $\hat{\mu}_2$ given $\mathcal{X}$. We now need to find the conditional expected value and covariance matrix of $\hat{\mu}_2$. Let $c_j = (X_j - \bar{X}_1)'A_{11,n}^{-1}(\bar{X}_1 - \bar{X}_2)$, $j = 1, \ldots, n$; noting that $\sum_{j=1}^{n}c_j = 0$, we may write $\hat{\mu}_2$ as
$$\hat{\mu}_2 = \bar{Y} - \bar{\tau}A_{21}A_{11,n}^{-1}(\bar{X}_1 - \bar{X}_2) = \bar{Y} - \bar{\tau}\sum_{j=1}^{n}c_j(Y_j - \bar{Y}) = \frac{1}{n}\sum_{j=1}^{n}Y_j - \bar{\tau}\sum_{j=1}^{n}c_jY_j.$$

34 Let δ jk be Kronecker s delta, that is δ jk =, j = k and δ jk = 0, j k. Then it is of note that 27 n n n n Cov( Y j, c j Y j = c k E(Y j Y k j= j= = = ( j= k= n j= k= n c k δ jk I q n c j I q = 0. j= Because n j= Y j and n j= c j Y j is zero, it then follows that n j= Y j and n j= c j Y j are jointly normally distributed and their covariance are independent. Therefore, conditional on X, µ 2 is a linear combination of independent normal vectors, hence µ 2 is normally distributed with mean E( µ 2 X = = n E(( n τc j Y j X j= n ( n τc j E(Y j = 0, (3.2.7 j= and covariance matrix Cov( µ 2 X = Cov( n n n Y j + Cov( τ Y j c j j= = n I q + τ 2 n j= k= j= n c j c k δ jk I q = n n I q + τ 2 ( c 2 I j q. j=

35 28 Because n j= c 2 j = ( X X 2 A,n n j= = ( X X 2 A,n ( X X 2, (X j X (X j X A,n ( X X 2 it follows that Cov( µ 2 X = n I q + τ 2 ( X X 2 A,n ( X X 2 I q. (3.2.8 Therefore µ 2 X N q (0, n [ + τ 2 n( X X 2 A,n ( X X 2 ]I q. Observe that µ 2 depends on X only through X X 2 and A,n ; therefore { X X 2, A,n } is sufficient for µ 2. By the independence of the sample mean and sample covariance matrix of a normal random sample, X and A,n are independent and, consequently, {τ X + τ X 2, X X 2 } and A,n are independent. Next, because Cov( X X 2, τ X + τ X 2 = Cov( X, τ X Cov( X 2, τ X 2 = N I p N I p = 0 and ( X X 2, τ X + τ X 2 has a joint multivariate normal distribution, it follows that µ = τ X + τ X 2 is independent of X X 2. Therefore X X 2, A,n, and τ X + τ X 2, and consequently, µ and µ 2, are mutually independent. By Proposition (iv, Q = ( X X 2 ( X X 2 ( X X 2 A,n ( X X 2 χ2 n p,

36 29 and Q is independent of X X 2. Therefore, ( µ 2 {X, Q } N q 0, [ + τ 2 n ( X X 2 ( X X 2 ] I n Q q. (3.2.9 Because X X 2 N p (0, (n (N n I p and n (N n = /n τ, it follows that ( X X 2 ( X X 2 Q 2 /n τ, where Q 2 χ 2. We may now write the conditional p distribution of µ 2 as ( µ 2 {Q, Q 2 } N q (0, n + τ Q 2 I n Q q. By elementary properties of the normal distribution, it follows that µ 2 L = n V 2 + τq2 nq V 2, where V 2 N q (0, I q, V 2 N q (0, I q, and V 2, V 2, Q, and Q 2 are mutually independent. It is straightforward to see that µ = X N p (0, N I p and therefore µ L = V, where V N p (0, N I p. Therefore, a joint stochastic representation for µ and µ 2 is µ µ = = L V + µ 2 τq 2 nq 0 V 2, where V = V V 2 N p+q 0, N I p 0 and V 2 is as defined previously. 0 n I q

37 Our final step is to transform the data back to its original form for general µ and Σ. Recall that by (3.2.4, the transformation to µ = 0 and Σ = I is µ µ 2 = Σ /2 0 0 Σ /2 22 I 0 µ µ. I µ 2 Σ 2 Σ 30 The inverse of the latter transformation is µ = I 0 µ 2 Σ 2 Σ I Σ /2 0 0 Σ /2 22 µ µ 2 + µ; therefore µ = L I 0 Σ 2 Σ I Σ /2 0 0 Σ /2 22 V + τq 2 nq 0 + µ. ( V 2 Because Σ /2 V 22 2 N q (0, Σ 22, and I 0 Σ 2 Σ I Σ /2 0 0 Σ /2 22 V N p+q 0, N Σ + τ n 0 0, 0 Σ 22 we obtain ( One advantage of this proof is that it provides explicit formulas for V, V 2, Q, and Q 2 in terms of the data, whereas Chang and Richards showed only the existence of

38 3 these random variable. Namely, those explicit formulas are: Q = ( X X 2 ( X X 2 ( X X 2 A ( X,n X 2, Q 2 = n τ( X X 2 ( X X 2, V V = = X, V 2 Ȳ n Y j= j (X j X A ( X,n X 2 V 2 = ( X X 2 A ( X,n X An exact stochastic representation for the T 2 -statistic Following Krishnamoorthy and Pannala [20] and Chang and Richards [6], we will study the pivotal quantity, T 2 = ( µ µ Ĉov( µ ( µ µ, (3.3. a generalization of Hotelling s T 2 -statistic in the setting of monotone incomplete data. An F -distribution approximation to the distribution of a statistic similar to (3.3. was given by Krishnamoorthy and Pannala [20]. Chang and Richards [6] obtained upper and lower bounds for its distribution, leading to conservative ellipsoidal confidence regions for µ, and derived the asymptotic distribution of the T 2 -statistic for the cases in which n, N, p, and q satisfy n > p+q for fixed n, or n/n δ (0, ] as n, N. Nevertheless, the exact finite-sample distribution of this statistic was unknown before the work in this dissertation.
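As a point of comparison for the exact results derived in this section, the distribution of the $T^2$-statistic (3.3.1) can also be approximated by brute-force simulation: generate two-step monotone samples, evaluate $T^2$ with $\widehat{\mathrm{Cov}}(\hat{\mu})$ as given in (3.3.2) and (3.3.3) below, and read off empirical quantiles. By the invariance property established later in this section, taking $\mu = 0$ and $\Sigma = I_{p+q}$ entails no loss of generality. The sketch below uses hypothetical sample sizes and is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n, N, reps = 2, 2, 20, 35, 5000
tau_bar = (N - n) / N
gamma = 1 + (n - 2) * (N - n) / (n * (n - p - 2))     # gamma of (3.3.2)

def t2_statistic(Z):
    """T^2 of (3.3.1) for one simulated two-step sample with mu = 0, Sigma = I."""
    X, Y = Z[:, :p], Z[:n, p:]
    X1b, X2b, Xb, Yb = X[:n].mean(0), X[n:].mean(0), X.mean(0), Y.mean(0)
    Xc, Yc, XcN = X[:n] - X1b, Y - Yb, X - Xb
    A11n, A12, A22, A11N = Xc.T @ Xc, Xc.T @ Yc, Yc.T @ Yc, XcN.T @ XcN
    B = np.linalg.solve(A11n, A12).T                  # A_{21} A_{11,n}^{-1}
    mu_hat = np.concatenate([Xb, Yb - tau_bar * B @ (X1b - X2b)])
    S221 = (A22 - B @ A12) / n                        # Sigma_hat_{22.1}
    Sigma_hat = np.block([[A11N / N, (B @ A11N / N).T],
                          [B @ A11N / N, S221 + B @ A11N @ B.T / N]])
    cov_hat = Sigma_hat / N
    cov_hat[p:, p:] += (gamma - 1) / N * S221         # Cov_hat(mu_hat), (3.3.3)
    return mu_hat @ np.linalg.solve(cov_hat, mu_hat)

t2 = np.array([t2_statistic(rng.standard_normal((N, p + q))) for _ in range(reps)])
print(np.quantile(t2, [0.90, 0.95, 0.99]))            # simulated critical values
```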

Our primary motivation for deriving a stochastic representation for the exact distribution of the $T^2$-statistic (3.3.1) is that the resulting ellipsoidal confidence regions for $\mu$ will be less conservative than those previously derived. Let
$$\gamma = 1 + \frac{(n-2)N\bar{\tau}}{n(n-p-2)}. \qquad (3.3.2)$$
As shown by Chang and Richards [6], the maximum likelihood estimator of $\mathrm{Cov}(\hat{\mu})$ is
$$\widehat{\mathrm{Cov}}(\hat{\mu}) = \frac{1}{N}\hat{\Sigma} + \frac{\gamma - 1}{N}\begin{pmatrix} 0 & 0 \\ 0 & \hat{\Sigma}_{22 \cdot 1} \end{pmatrix}, \qquad (3.3.3)$$
where $\hat{\Sigma}$ is as defined in (3.1.5). Following Chang and Richards [7], we decompose $A_{11,N}$ as follows:
$$A_{11,N} = A_{11,n} + B_1 + B_2, \qquad (3.3.4)$$
where
$$B_1 = \sum_{j=n+1}^{N}(X_j - \bar{X}_2)(X_j - \bar{X}_2)', \qquad (3.3.5)$$
$$B_2 = \frac{n(N-n)}{N}(\bar{X}_1 - \bar{X}_2)(\bar{X}_1 - \bar{X}_2)', \qquad (3.3.6)$$
and $A_{11,n} \sim W_p(n-1, \Sigma_{11})$, $B_1 \sim W_p(N-n-1, \Sigma_{11})$, and $B_2 \sim W_p(1, \Sigma_{11})$ are mutually independent Wishart matrices. This decomposition leads to the following result due to Chang and Richards [7].

40 Lemma (Chang and Richards [6] When Σ 2 = 0, the random matrices and vectors A 22,n, A 2 A,n, A,n, Ȳ, X, B, and B 2 are mutually independent. 33 In preparation for our derivation of an exact stochastic representation of the distribution of the T 2 -statistic, we show that, without loss of generality, we may assume that µ = 0 and Σ = I d. Similar to Chang and Richards [6], we begin by writing T 2 = ( µ µ Ĉov( µ ( µ µ as a sum of two terms. Define T 2 = n(ȳ A 2 A X,n µ 2 +A 2 A µ,n A (Ȳ A 22,n 2 A X,n µ 2 +A 2 A µ,n, and (3.3.7 T 2 2 = N( X µ A,N ( X µ = N( X µ (A,n + B + B 2 ( X µ. (3.3.8 Applying the quadratic identity (2..4 with x µ µ and Λ NĈov( µ, we find that N T 2 = ( µ µ ( NĈov( µ ( µ µ = ( µ 2 µ 2 A 2 A ( µ,n µ (NĈov( µ 22 ( µ 2 µ 2 A 2 A ( µ,n µ + ( µ µ Σ ( µ µ.

41 By (3..4, µ 2 A 2 A,n µ = Ȳ A 2 A X,n ; by (3.3.3, NĈov( µ 22 = γ n A 22,n ; and by (3.3.4, A,N = A,n + B + B 2. Therefore 34 N T 2 = γ T + T 2. (3.3.9 Krishnamoorthy and Pannala [20] also decomposed their T 2 -statistic into a corresponding sum T 2 + T 2 and showed that the marginal distribution of each T 2 j does not depend on (µ, Σ. To deduce that the distribution of their T 2 -statistic depends neither on µ or Σ, it would need to be shown that the joint distribution of ( T, T 2 also satisfies that property, a result which appears difficult to establish directly. We provide a proof that uses ideas of Yamada, et al. [32] to show that the distribution of the T 2 -statistic depends neither on µ nor Σ. Proposition The statistics T 2 in (3.3.7, and T 2 2 in (3.3.8 are algebraically invariant under the transformation ( Consequently, the same holds for the T 2 - statistic in Proof. Let X Y = ΛC X + ν = Y Λ X + ν Λ 22 Λ 2 X + Λ 22 Y + ν 2 (3.2., µ and Σ are equivariant under this transformation. Because. By Proposition (T 2 2 ( µ µ Σ ( µ µ = (Λ µ + ν Λ µ ν (Λ Σ Λ (Λ µ + ν Λ µ ν = ( µ µ Σ ( µ µ T 2 2,

42 35 the statistic T 2 2 is invariant under the transformation ( To show that the statistic T 2 is invariant under the transformation (3.2.2, let us analyze each term of (T 2 ( µ 2 µ Σ Σ ( µ 2 2 µ Σ ( µ 22 2 µ Σ Σ ( µ 2 2 µ individually. The vector µ 2 µ 2 transforms to Λ 22 (Λ 2 µ + µ 2 + ν 2 (Λ 22 µ 2 + Λ 22 Λ 2 µ + ν 2 = Λ 22 ( µ 2 µ 2 + Λ 22 Λ 2 ( µ µ. The vector Σ Σ ( µ 2 µ transforms to Λ 22 (Λ 2 Σ + Σ 2 Λ (Λ Σ Λ (Λ µ + ν Λ µ ν = Λ 22 (Λ 2 + Σ 2 Σ ( µ µ. In addition, Σ 22 = Λ 22 (Λ 2 Σ Λ 2 + Σ 2 Λ 2 + Λ 2 Σ2 + Σ 22 Λ 22 Λ 22 (Λ 2 Σ + Σ 2 Λ (Λ Σ Λ Λ ( Σ Λ 2 + Σ 2 Λ 22 = Λ 22 ( Σ 22 Σ Σ 2 Σ 2 Λ 22 = Λ 22 Σ22 Λ 22.

43 36 Consequently, (T 2 equals ( µ 2 µ 2 Σ 2 Σ ( µ µ Λ 22 (Λ 22 Σ22 Λ 22 Λ 22 ( µ 2 µ 2 Σ 2 Σ ( µ µ = ( µ 2 µ 2 Σ Σ ( µ 2 µ Σ ( µ 22 2 µ 2 Σ Σ ( µ 2 µ T 2. Therefore, T 2 and T 2 2 both are invariant and hence, by (3.3.9, T 2 also is invariant. By taking Λ = Σ /2, Λ 22 = Σ /2, and Λ 22 2 = Σ 2 Σ, the covariance matrix of (X, Y under this transformation is ΛCΣC Λ = I p+q. We may then assume that the population covariance matrix is I p+q. Furthermore, by choosing ν = ΛCµ we may assume µ = 0. Hence, in deriving the distribution of the T 2 -statistic, we assume, without loss of generality, that µ = 0 and Σ = I p+q We now derive a stochastic representation for the exact distribution of the T 2 - statistic (3.3.. The proof of this result is lengthy, relying on characteristic functions and repeated applications of the powerful method of orthogonal invariance. The resulting stochastic representation, however, involves only chi-square and Beta random variables, and a 2 2 Wishart matrix, all mutually independent. Thus, the stochastic representation is straightforward to simulate. Theorem Let cos 2 θ Beta ( 2, 2 (p, Q χ 2 p, Q 2 χ2 p, Q 3 χ2 n p q, Q 4 χ 2, W W q 2 (N p, I 2, and β Beta((n p 2/2, (N n /2 be mutually

44 37 independent. Then, T 2 = L NQ ( 4 + Q γq β e W e 3 /2 nq + N nq /2 cos θ + 2 /2 N nq sin θ 2 2 W /2 nq + N nq /2 cos θ 2 /2 N nq sin θ τq + τq 2 2( τq τq 2 /2 cos θ ( + τq + τq 2 2( τq τq 2 /2 cos θ e W e /2 e W nq + 2 N nq /2 cos θ 2. (3.3.0 /2 N nq sin θ 2 Proof. We assume, without loss of generality, that µ = 0 and Σ = I p+q. Recall from ( (3.3.9 that T 2 N = γ T 2 + T 2 2, where T 2 = n(ȳ A 2 A X,n A (Ȳ A 22,n 2 A X,n and T 2 2 = N X (A,n + B + B 2 X. By elementary properties of the multivariate normal distribution, n( Ȳ A 2 A,n X { X, X 2, A 2, A,n, B } N q ( na 2 A,n X, I q, (3.3.

45 and by Proposition 2.3.3(i, A 22,n W q (n p, I q and is independent of {A 2, A,n }. 38 Define Q 3 = n ( Ȳ A 2 A (Ȳ X,n A2 A,n T 2 X, then, by Proposition 2.3.3(iv, Q 3 {A 2, A,n, X } χ 2, and Q n p q 3 is independent of Ȳ A 2 A X,n. Because this distribution does not depend on {A 2, A,n, X }, then Q 3 is also independent of {A 2, A,n, X }. Therefore, T 2 n ( Ȳ A L 2 A (Ȳ X,n A2 A X,n =, Q 3 where Q 3 χ 2 n p q and the numerator and denominator are mutually independent. By (3.3., n ( Ȳ A 2 A (Ȳ X,n A2 A,n 2( X χ n X q A,n A 2 A 2 A,n X, (3.3.2 a noncentral chi-square distribution with q degrees of freedom and noncentrality parameter n X A A,n 2 A 2 A X,n.

46 39 Let t R. By Lemma 3.3., the characteristic function of T 2 /N is E exp[itn T 2 ] [ ( = E exp it n ( Ȳ A γq 2 A (Ȳ X,n A2 A X,n 3 + N X ] (A,n + B + B 2 X [ = E Q3 E X E X2 E B E A2,A exp itn X ( ] A,n + B,n + B 2 X EȲ { X, X 2,A 2,A,n,B } [it exp n ( Ȳ A γq 2 A (Ȳ X,n A2 A 3,n X ] (3.3.3 Applying the formula (2.2.6 for the characteristic function of the noncentral χ 2 distribution to (3.3.2 and inserting the result into (3.3.3 yields [ E exp[itn T 2 ] = E Q3 E X E X2 E B E A2,A exp itn X ( ] A,n,n + B + B 2 X [ ( 2it q/2 itn X A A,n 2 A 2 A exp γq 3 γq 3 2it,n ] X. (3.3.4 By Proposition 2.3.3(ii, A 2 A,n N(0, I q A,n ; therefore (3.3.4 equals E Q3 ( 2it [ itn X ( ] A,n + B + B 2 X q/2e γq X E X2 E B E A,n exp 3 [ itn X A A,n 2 A ] 2 A X,n E A2 A exp. (3.3.5,n γq 3 2it

47 By Lemma 2.3.2, with U = itn/(γq 3 2itI q, B 2 = A 2, D = A,n, C = I q, and Λ = X X, [ itn X A A,n 2 A ] 2 A X,n E A2 A exp,n γq 3 2it = E A2 A,n exp = = = [ itn tr (A 2 A X,n X A A,n 2 γq 3 2it I pq 2itn γq 3 2it I q A /2 X,n X I p 2itn γq 3 2it A /2 X,n X A /2 /2,n A /2 q/2,n ( 2itn q/2 γq 3 2it X A X,n. ] 40 Substituting this result into (3.3.5 yields, E exp[itn T 2 ] = E Q3 ( 2it γq 3 q/2e X E X2 E B E A,n exp ( 2itn γq 3 2it X q/2. A X,n [ itn X ] (A,n + B + B 2 X Because ( 2it ( 2itn γq 3 γq 3 2it X γq 3 2it 2itn X A X A X,n,n =, γq 3

48 4 it follows that [ E exp[itn T 2 ] = E Q3 E X E X2 E B E A,n exp itn X ] (A,n + B + B 2 X ( 2it( + n X A X,n q/2 γq 3 [ = E Q3 E X E X2 E B E A,n exp itn X ] (A,n + B + B 2 X [ ] itq4 E Q4 exp ( + n γq X A X,n, 3 where Q 4 χ 2 q and Q 4 is independent of Q 3, X, X 2, B, A,n. Hence, E exp[itn T 2 ] [ ] itq4 [ = E Q3 E Q4 exp E γq X E X2 E B E A,n exp itn X ] (A,n + B + B 2 X 3 exp [ itnq4 γq 3 X A X,n ]. (3.3.6 By applying the method of orthogonal invariance, we shall simplify the above characteristic function greatly. For fixed Q 3 and Q 4, define the function f( X, X [ 2 = E B E A,n exp itn X ] [ (A,n + B + B 2 itnq4 X exp γq 3 X A X,n ]. We first verify that f( X, X 2 is invariant under the transformation ( X, X 2 (H X, H X 2, where H O(p, the set of orthogonal p p matrices. Recall that B 2 is a function of X and X 2, ( Suppose ( X, X 2 is replaced by (H X, H X 2, then the last

49 42 two exponential terms in (3.3.6 have exponents itn(h(τ X + τ X 2 (A,n + B + n τh( X X 2 ( X X 2 H (H(τ X + τ X 2 and (3.3.7 itnq 4 γq 3 (H X A,n (H X. (3.3.8 Because A,n W p (n, I p, then also HA,n H W p (n, I p and a similar result holds for B. Then the random variable in (3.3.7 is equal in distribution to X H (HA,n H + HB H + n τh( X X 2 ( X X 2 H H X = X H (H(A,n + B + n τ( X X 2 ( X X 2 H H X = X (A,n + B + n τ( X X 2 ( X X 2 X, and, similarly, the random variable in (3.3.8 is equal in distribution to itnq 4 γq 3 (H X (HA,n H (H X = itnq 4 γq 3 X A X,n. Therefore, f( X, X 2 = f(h X, H X 2. Because B 2 = n τ( X X 2 ( X X 2 is of rank, then by Proposition 2.. (v, there exists H O(p such that H( X X 2 ( X X 2 H = X X = X X 2 2 e e, 0 0

50 43 where e = (, 0,..., 0 R p. Therefore, we may replace ( X X 2 ( X X 2 by H X X 2 2 e e H in ( In addition, by replacing ( X, X 2 with ( H X, H X2 then, by orthogonal invariance, (3.3.6 becomes E Q3 E Q4 exp [ itq4 γq 3 ]E X E X2 E B E A,n exp [ itn X (A,n + B + n τ X X ] 2 2 e e X exp [ itnq4 γq 3 X A X,n ]. (3.3.9 We make one last orthogonal transformation. There exists an orthogonal matrix C O(p with first row X / X ; we may construct the remaining rows of C using the Gram-Schmidt orthogonalization process. We transform X to C X = X e and X 2 to αe + βe 2, where e is defined as before, e 2 = (0,, 0,..., 0 R p, and α and β are such that α X = X X 2, α 2 + β 2 = X 2 2. Let θ be the angle between X and X 2 and recall that cos θ = X X 2 / X X 2. Then α = X X 2 X = X 2 cos θ, and β = = ( X 2 2 ( X X 2 2 /2 X 2 ( X 2 X 2 2 ( cos 2 /2 θ X 2 = X2 sin θ. (3.3.20

51 44 Therefore X = τ X + τ X 2 = τ X e + τ X 2 (e cos θ + e 2 sin θ and X X 2 2 = X 2 + X X X 2 cos θ. Because n X N p (0, I p and N n X 2 N p (0, I p, then X and X 2 are orthogonally invariant random vectors. Therefore X / X and X 2 / X 2 are mutually independent and uniformly distributed on S p, the unit sphere in R p. Hence cos θ L = U U 2, where U and U 2 are independent and uniformly distributed on Sp. By Muirhead [25, p. 38], we then have X, X2, and θ are mutually independent, and cos 2 θ Beta(/2, (p /2. By (3.3.9, T 2 = L NQ ( 4 + n X 2 e γq A e,n 3 + N 2 [( τ X + τ X 2 cos θ e + τ X 2 e 2 sin θ ] ( ( A,n + B + n τ X 2 + X X X 2 cos θ e e [( τ X + τ X 2 cos θ e + τ X 2 e 2 sin θ ]. (3.3.2 Because n X N p (0, I p, it follows that Q n X 2 χ 2 p. Similarly, N n X 2 N p (0, I p and therefore Q 2 (N n X 2 2 χ 2 p. In addition, A,n, Q 3, Q 4, X,

52 X 2, θ, and B are mutually independent. Thus, we have mutual independence between Q, Q 2, Q 3, Q 4, θ, A,n, and B. We therefore conclude that 45 T 2 = L NQ ( 4 + Q γq e A e,n 3 [( τq /2 + N + ( A,n + B + [( τq /2 + τq /2 2 cos θ e + /2 τq e 2 sin θ ( τq + τq 2 2( τq τq 2 /2 cos θ τq /2 2 cos θ e + 2 ] e e ] /2 τq e 2 sin θ. ( This representation involves p p Wishart matrices, so it would be nice to reduce the size of any such matrices appearing in the final result. Next, we represent the distribution of T 2 in terms of a 2 2 Wishart matrix. By Proposition (v, L = (A,n + B /2 A,n (A,n + B /2 is independent of P = A,n + B and L Beta p ((n /2, (N n /2. Therefore, we may write ( in terms of L and P as T 2 = L NQ ( 4 + Q γq (P /2 e L (P /2 e 3 [( τq /2 + N + τq /2 2 cos θ e + /2 τq e 2 sin θ ( P + ( τq + τq 2 2( τq τq 2 /2 cos θ e e [( τq /2 + τq /2 2 cos θ e + 2 ] ] /2 τq e 2 sin θ. ( Because the distribution of L is invariant under orthogonal transformations, we may replace L by HLH, where H O(p. We now choose H to be the orthogonal matrix

53 with first row P /2 e / P /2 e and with all the remaining rows of H constructed using the Gram-Schmidt orthogonalization process. Then (P /2 e L P /2 e L = e P e e L e. By Muirhead [25], β = e L e /Beta((n p 2/2, (N n /2. 2 In order to simplify the notation let us also define u = (u, u 2 = N /2 (( τq /2 + /2 /2 τq cos θ, τq sin θ, and v = τq + τq 2 2( τq τq 2 /2 cos θ. Then the representation ( becomes 2 46 T 2 = L NQ ( 4 + Q β e γq P e + (u e + u 2 e 2 ( P + ve e (u e + u 2 e 2. 3 ( Our final step is to specify the distribution of the remaining terms that involve P. The first term only involves e P e and the second term involving P may be simplified by Woodbury s formula, (2.., as follows: (u e + u 2 e 2 ( P + ve e (u e + u 2 e 2 ( = (u e + u 2 e 2 P vp e e P + ve P (u e e + u 2 e 2 = (u e + u 2 e 2 P (u e + u 2 e 2 v + ve P e [(u e + u 2 e 2 P e ] 2.

54 47 Therefore, we derive the joint distribution of e P e, e P e 2, and e P e 2 2, when P W p (N 2, I p. Let M = (e, e 2 ; noting that MM = I 2, it follows by Proposition (iv, that (MP M W 2 (N p, I 2. Let W = MP M e = P e e P e 2. e P e 2 e P e 2 2 Then the stochastic representation, (3.3.24, reduces to T 2 = L NQ ( 4 + Q γq β e W e + u W v u 3 + ve W (e e W u 2, ( where W W 2 (N p, I 2. Remark We may take this one step further and represent the entire stochastic representation in terms of scalar mutually independent random variables. Let w = e W e, w 22 = e W e 2 2, w 2 = e W e 2, and ρ = w 2 / w w 22. We may rewrite Equation ( in terms of scalar random variables: ( T 2 = L t + Q β w + (u e + u 2 e 2 W (u e + u 2 e 2 v(u w + u 2 ρ w w vw ( = t + Q β w + u 2 w + u 2 2 w22 + 2u u 2 ρ w w 22 v(u w + u 2 ρ w w vw.

55 48 By Anderson [2], W = T T, where T is lower-triangular, that is t T = 0, t 2 t 22 and the entries of T are mutually independent, t 2 jj χ2 N p, and t ij N(0,, i j. It follows that W = (T T = t t 22 = t 2 t2 22 t 22 t 2 t t t 2 t t t2 t 2 2 t. ( t 2 t t 2 Therefore the joint distribution of {w, w 22, ρ 2 } is equal to the joint distribution of {(t t2 2 /t2 t2, 22 /t2, 22 t2 2 /(t2 2 +t2 } = {(Q Q 7 /Q 5 Q 6, /Q 6, Q 7 /(Q 6 +Q 7 }, where Q 5 χ 2 N p, Q 6 χ2 N p, and Q 7 χ2. Remark We have provided in the stochastic representation the distribution of cos 2 θ. Because the distribution of cos θ may also be desired, we provide the details here. Recall from Theorem that cos 2 θ Beta ( 2, 2 (p, where cos θ (, and cos θ L = cos θ. Let α = 2, β = 2 (p, X = cos θ, and Y = cos2 θ Beta(α, β. Let t (0,, then, because the distribution of cos θ is symmetric, P (X > t = 2 ( P ( t < X < t = 2 ( P ( X < t = 2 ( P (Y < t2.

Therefore $P(X \le t) = \tfrac{1}{2}(1 + P(Y < t^2))$. Because the probability density function (p.d.f.) is the derivative of the cumulative distribution function, $f_X(t)$, the p.d.f. of $X$, equals $t f_Y(t^2)$, $0 < t < 1$, where $f_Y(t)$ is the p.d.f. of $Y$. Similarly, $f_X(t) = -t f_Y(t^2)$, $-1 < t \le 0$, and therefore $f_X(t) = |t|\,f_Y(t^2)$, $-1 < t < 1$.

3.4 Probability inequalities for the $T^2$-statistic

Because the exact distribution of $T^2$ is complicated, it would be useful to find simpler upper and lower bounds on its distribution. Chang and Richards [6] found upper and lower bounds for the distribution function of the $T^2$-statistic, and Krishnamoorthy and Pannala [20] obtained an approximation to the distribution of their $T^2$-statistic by means of an $F$-distributed statistic. Because our bounds are based on a stochastic representation of the exact distribution of the $T^2$-statistic, it can be expected that our bounds will lead to more precise confidence regions than those in [6] and [20].

Theorem. Let $Q_1 \sim \chi^2_p$, $Q_3 \sim \chi^2_{n-p-q}$, $Q_4 \sim \chi^2_q$, $Q_8 \sim \chi^2_{N-p-1}$, and $\beta \sim \mathrm{Beta}((n-p-2)/2, (N-n)/2)$ be mutually independent. For $t \ge 0$,
$$P(T^2 \ge t) \le P\left(\frac{NQ_4}{\gamma Q_3}\left(1 + \frac{Q_1}{Q_8\beta}\right) \ge t\right). \qquad (3.4.1)$$

Proof. It is straightforward from the stochastic representation derived in Section 3.3 that
$$\hat{\mu}'\,\widehat{\mathrm{Cov}}(\hat{\mu})^{-1}\hat{\mu} \;\stackrel{\mathcal{L}}{\ge}\; \frac{NQ_4}{\gamma Q_3}\left(1 + \frac{Q_1}{\beta}\right)e_1'P^{-1}e_1. \qquad (3.4.2)$$
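The upper bound (3.4.1) is simple enough to evaluate by direct simulation of the scalar random variables appearing in it, which yields conservative critical values for $T^2$. The sketch below does this for hypothetical sample sizes, with the chi-square and Beta degrees of freedom taken as stated in the theorem above; it is an illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n, N, reps = 2, 2, 20, 35, 200000
gamma = 1 + (n - 2) * (N - n) / (n * (n - p - 2))    # gamma of (3.3.2)

Q1 = rng.chisquare(p, reps)
Q3 = rng.chisquare(n - p - q, reps)
Q4 = rng.chisquare(q, reps)
Q8 = rng.chisquare(N - p - 1, reps)
beta = rng.beta((n - p - 2) / 2, (N - n) / 2, reps)

bound = N * Q4 / (gamma * Q3) * (1 + Q1 / (Q8 * beta))
# Quantiles of the bounding variable give conservative critical values for T^2.
print(np.quantile(bound, [0.90, 0.95, 0.99]))
```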


Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

Boolean Inner-Product Spaces and Boolean Matrices

Boolean Inner-Product Spaces and Boolean Matrices Boolean Inner-Product Spaces and Boolean Matrices Stan Gudder Department of Mathematics, University of Denver, Denver CO 80208 Frédéric Latrémolière Department of Mathematics, University of Denver, Denver

More information

Gaussian Models (9/9/13)

Gaussian Models (9/9/13) STA561: Probabilistic machine learning Gaussian Models (9/9/13) Lecturer: Barbara Engelhardt Scribes: Xi He, Jiangwei Pan, Ali Razeen, Animesh Srivastava 1 Multivariate Normal Distribution The multivariate

More information

Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013

Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013 Multivariate Gaussian Distribution Auxiliary notes for Time Series Analysis SF2943 Spring 203 Timo Koski Department of Mathematics KTH Royal Institute of Technology, Stockholm 2 Chapter Gaussian Vectors.

More information

6 Pattern Mixture Models

6 Pattern Mixture Models 6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Expected probabilities of misclassification in linear discriminant analysis based on 2-Step monotone missing samples

Expected probabilities of misclassification in linear discriminant analysis based on 2-Step monotone missing samples Expected probabilities of misclassification in linear discriminant analysis based on 2-Step monotone missing samples Nobumichi Shutoh, Masashi Hyodo and Takashi Seo 2 Department of Mathematics, Graduate

More information

MULTIVARIATE POPULATIONS

MULTIVARIATE POPULATIONS CHAPTER 5 MULTIVARIATE POPULATIONS 5. INTRODUCTION In the following chapters we will be dealing with a variety of problems concerning multivariate populations. The purpose of this chapter is to provide

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

1 Appendix A: Matrix Algebra

1 Appendix A: Matrix Algebra Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

Studentization and Prediction in a Multivariate Normal Setting

Studentization and Prediction in a Multivariate Normal Setting Studentization and Prediction in a Multivariate Normal Setting Morris L. Eaton University of Minnesota School of Statistics 33 Ford Hall 4 Church Street S.E. Minneapolis, MN 55455 USA eaton@stat.umn.edu

More information

ANOVA: Analysis of Variance - Part I

ANOVA: Analysis of Variance - Part I ANOVA: Analysis of Variance - Part I The purpose of these notes is to discuss the theory behind the analysis of variance. It is a summary of the definitions and results presented in class with a few exercises.

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error Journal of Multivariate Analysis 00 (009) 305 3 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Sphericity test in a GMANOVA MANOVA

More information

A Few Special Distributions and Their Properties

A Few Special Distributions and Their Properties A Few Special Distributions and Their Properties Econ 690 Purdue University Justin L. Tobias (Purdue) Distributional Catalog 1 / 20 Special Distributions and Their Associated Properties 1 Uniform Distribution

More information

7 Matrix Operations. 7.0 Matrix Multiplication + 3 = 3 = 4

7 Matrix Operations. 7.0 Matrix Multiplication + 3 = 3 = 4 7 Matrix Operations Copyright 017, Gregory G. Smith 9 October 017 The product of two matrices is a sophisticated operations with a wide range of applications. In this chapter, we defined this binary operation,

More information

3. Probability and Statistics

3. Probability and Statistics FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important

More information

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality. 88 Chapter 5 Distribution Theory In this chapter, we summarize the distributions related to the normal distribution that occur in linear models. Before turning to this general problem that assumes normal

More information

The Multivariate Normal Distribution. In this case according to our theorem

The Multivariate Normal Distribution. In this case according to our theorem The Multivariate Normal Distribution Defn: Z R 1 N(0, 1) iff f Z (z) = 1 2π e z2 /2. Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) T with the Z i independent and each Z i N(0, 1). In this

More information

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in

More information

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem T Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem Toshiki aito a, Tamae Kawasaki b and Takashi Seo b a Department of Applied Mathematics, Graduate School

More information

Recall the convention that, for us, all vectors are column vectors.

Recall the convention that, for us, all vectors are column vectors. Some linear algebra Recall the convention that, for us, all vectors are column vectors. 1. Symmetric matrices Let A be a real matrix. Recall that a complex number λ is an eigenvalue of A if there exists

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control

On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control Takahiro Nishiyama a a Department of Mathematical Information Science, Tokyo University of Science,

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u Interval estimation and hypothesis tests So far our focus has been on estimation of the parameter vector β in the linear model y i = β 1 x 1i + β 2 x 2i +... + β K x Ki + u i = x iβ + u i for i = 1, 2,...,

More information

Multivariate Gaussian Analysis

Multivariate Gaussian Analysis BS2 Statistical Inference, Lecture 7, Hilary Term 2009 February 13, 2009 Marginal and conditional distributions For a positive definite covariance matrix Σ, the multivariate Gaussian distribution has density

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

MULTIVARIATE DISCRETE PHASE-TYPE DISTRIBUTIONS

MULTIVARIATE DISCRETE PHASE-TYPE DISTRIBUTIONS MULTIVARIATE DISCRETE PHASE-TYPE DISTRIBUTIONS By MATTHEW GOFF A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY WASHINGTON STATE UNIVERSITY Department

More information

Multivariate Linear Models

Multivariate Linear Models Multivariate Linear Models Stanley Sawyer Washington University November 7, 2001 1. Introduction. Suppose that we have n observations, each of which has d components. For example, we may have d measurements

More information

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data Yujun Wu, Marc G. Genton, 1 and Leonard A. Stefanski 2 Department of Biostatistics, School of Public Health, University of Medicine

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Haruhiko Ogasawara. This article gives the first half of an expository supplement to Ogasawara (2015).

Haruhiko Ogasawara. This article gives the first half of an expository supplement to Ogasawara (2015). Economic Review (Otaru University of Commerce, Vol.66, No. & 3, 9-58. December, 5. Expository supplement I to the paper Asymptotic expansions for the estimators of Lagrange multipliers and associated parameters

More information

Statistical Inference On the High-dimensional Gaussian Covarianc

Statistical Inference On the High-dimensional Gaussian Covarianc Statistical Inference On the High-dimensional Gaussian Covariance Matrix Department of Mathematical Sciences, Clemson University June 6, 2011 Outline Introduction Problem Setup Statistical Inference High-Dimensional

More information

Some New Properties of Wishart Distribution

Some New Properties of Wishart Distribution Applied Mathematical Sciences, Vol., 008, no. 54, 673-68 Some New Properties of Wishart Distribution Evelina Veleva Rousse University A. Kanchev Department of Numerical Methods and Statistics 8 Studentska

More information

Chapter 5. The multivariate normal distribution. Probability Theory. Linear transformations. The mean vector and the covariance matrix

Chapter 5. The multivariate normal distribution. Probability Theory. Linear transformations. The mean vector and the covariance matrix Probability Theory Linear transformations A transformation is said to be linear if every single function in the transformation is a linear combination. Chapter 5 The multivariate normal distribution When

More information

Inferences about a Mean Vector

Inferences about a Mean Vector Inferences about a Mean Vector Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University

More information

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work Hiroshima Statistical Research Group: Technical Report Approximate interval estimation for PMC for improved linear discriminant rule under high dimensional frame work Masashi Hyodo, Tomohiro Mitani, Tetsuto

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information

4. Distributions of Functions of Random Variables

4. Distributions of Functions of Random Variables 4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Inferences about Parameters of Trivariate Normal Distribution with Missing Data

Inferences about Parameters of Trivariate Normal Distribution with Missing Data Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 7-5-3 Inferences about Parameters of Trivariate Normal Distribution with Missing

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

ON COMBINING CORRELATED ESTIMATORS OF THE COMMON MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION

ON COMBINING CORRELATED ESTIMATORS OF THE COMMON MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION ON COMBINING CORRELATED ESTIMATORS OF THE COMMON MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION K. KRISHNAMOORTHY 1 and YONG LU Department of Mathematics, University of Louisiana at Lafayette Lafayette, LA

More information

= ϕ r cos θ. 0 cos ξ sin ξ and sin ξ cos ξ. sin ξ 0 cos ξ

= ϕ r cos θ. 0 cos ξ sin ξ and sin ξ cos ξ. sin ξ 0 cos ξ 8. The Banach-Tarski paradox May, 2012 The Banach-Tarski paradox is that a unit ball in Euclidean -space can be decomposed into finitely many parts which can then be reassembled to form two unit balls

More information

c 2005 Society for Industrial and Applied Mathematics

c 2005 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. XX, No. X, pp. XX XX c 005 Society for Industrial and Applied Mathematics DISTRIBUTIONS OF THE EXTREME EIGENVALUES OF THE COMPLEX JACOBI RANDOM MATRIX ENSEMBLE PLAMEN KOEV

More information

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests

More information

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics Chapter 6 Order Statistics and Quantiles 61 Extreme Order Statistics Suppose we have a finite sample X 1,, X n Conditional on this sample, we define the values X 1),, X n) to be a permutation of X 1,,

More information

Analysis of variance using orthogonal projections

Analysis of variance using orthogonal projections Analysis of variance using orthogonal projections Rasmus Waagepetersen Abstract The purpose of this note is to show how statistical theory for inference in balanced ANOVA models can be conveniently developed

More information

Mathematical Methods wk 2: Linear Operators

Mathematical Methods wk 2: Linear Operators John Magorrian, magog@thphysoxacuk These are work-in-progress notes for the second-year course on mathematical methods The most up-to-date version is available from http://www-thphysphysicsoxacuk/people/johnmagorrian/mm

More information

Department of Statistics

Department of Statistics Research Report Department of Statistics Research Report Department of Statistics No. 05: Testing in multivariate normal models with block circular covariance structures Yuli Liang Dietrich von Rosen Tatjana

More information

ELEMENTARY LINEAR ALGEBRA

ELEMENTARY LINEAR ALGEBRA ELEMENTARY LINEAR ALGEBRA K R MATTHEWS DEPARTMENT OF MATHEMATICS UNIVERSITY OF QUEENSLAND First Printing, 99 Chapter LINEAR EQUATIONS Introduction to linear equations A linear equation in n unknowns x,

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

An Introduction to Multivariate Statistical Analysis

An Introduction to Multivariate Statistical Analysis An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 4

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 4 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 4 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory April 12, 2012 Andre Tkacenko

More information

Orthogonal decompositions in growth curve models

Orthogonal decompositions in growth curve models ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 4, Orthogonal decompositions in growth curve models Daniel Klein and Ivan Žežula Dedicated to Professor L. Kubáček on the occasion

More information

Supermodular ordering of Poisson arrays

Supermodular ordering of Poisson arrays Supermodular ordering of Poisson arrays Bünyamin Kızıldemir Nicolas Privault Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang Technological University 637371 Singapore

More information

CMPE 58K Bayesian Statistics and Machine Learning Lecture 5

CMPE 58K Bayesian Statistics and Machine Learning Lecture 5 CMPE 58K Bayesian Statistics and Machine Learning Lecture 5 Multivariate distributions: Gaussian, Bernoulli, Probability tables Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey

More information

Estimation of a multivariate normal covariance matrix with staircase pattern data

Estimation of a multivariate normal covariance matrix with staircase pattern data AISM (2007) 59: 211 233 DOI 101007/s10463-006-0044-x Xiaoqian Sun Dongchu Sun Estimation of a multivariate normal covariance matrix with staircase pattern data Received: 20 January 2005 / Revised: 1 November

More information

MULTIVARIATE ANALYSIS OF VARIANCE UNDER MULTIPLICITY José A. Díaz-García. Comunicación Técnica No I-07-13/ (PE/CIMAT)

MULTIVARIATE ANALYSIS OF VARIANCE UNDER MULTIPLICITY José A. Díaz-García. Comunicación Técnica No I-07-13/ (PE/CIMAT) MULTIVARIATE ANALYSIS OF VARIANCE UNDER MULTIPLICITY José A. Díaz-García Comunicación Técnica No I-07-13/11-09-2007 (PE/CIMAT) Multivariate analysis of variance under multiplicity José A. Díaz-García Universidad

More information

Mean Vector Inferences

Mean Vector Inferences Mean Vector Inferences Lecture 5 September 21, 2005 Multivariate Analysis Lecture #5-9/21/2005 Slide 1 of 34 Today s Lecture Inferences about a Mean Vector (Chapter 5). Univariate versions of mean vector

More information

Lecture 15: Multivariate normal distributions

Lecture 15: Multivariate normal distributions Lecture 15: Multivariate normal distributions Normal distributions with singular covariance matrices Consider an n-dimensional X N(µ,Σ) with a positive definite Σ and a fixed k n matrix A that is not of

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

CSL361 Problem set 4: Basic linear algebra

CSL361 Problem set 4: Basic linear algebra CSL361 Problem set 4: Basic linear algebra February 21, 2017 [Note:] If the numerical matrix computations turn out to be tedious, you may use the function rref in Matlab. 1 Row-reduced echelon matrices

More information