Inferences about Parameters of Trivariate Normal Distribution with Missing Data


Florida International University
FIU Digital Commons
FIU Electronic Theses and Dissertations, University Graduate School

Inferences about Parameters of Trivariate Normal Distribution with Missing Data
Xing Wang, Florida International University
DOI: .548/etd.FI3899

Follow this and additional works at FIU Digital Commons. Part of the Statistical Models Commons and the Statistical Theory Commons.

Recommended Citation: Wang, Xing, "Inferences about Parameters of Trivariate Normal Distribution with Missing Data" (2013). FIU Electronic Theses and Dissertations.

This work is brought to you for free and open access by the University Graduate School at FIU Digital Commons. It has been accepted for inclusion in FIU Electronic Theses and Dissertations by an authorized administrator of FIU Digital Commons.

FLORIDA INTERNATIONAL UNIVERSITY
Miami, Florida

INFERENCES ABOUT PARAMETERS OF TRIVARIATE NORMAL DISTRIBUTION WITH MISSING DATA

A thesis submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in STATISTICS by Xing Wang, 2013

To: Dean Kenneth Furton, College of Arts and Sciences

This thesis, written by Xing Wang, and entitled Inferences about Parameters of Trivariate Normal Distribution with Missing Data, having been approved in respect to style and intellectual content, is referred to you for judgment. We have read this thesis and recommend that it be approved.

Florence George
Kai Huang, Co-Major Professor
Jie Mi, Co-Major Professor

Date of Defense: July 5, 2013

The thesis of Xing Wang is approved.

Dean Kenneth Furton, College of Arts and Sciences
Dean Lakshmi N. Reddi, University Graduate School

Florida International University, 2013

ACKNOWLEDGMENTS

First of all, I would like to express my sincere thanks to my major professor, Dr. Jie Mi, for his patient guidance, enthusiasm, encouragement, and friendship throughout this whole study. I could not have finished my research without his great support. I would like to express my deepest appreciation to my co-major professor, Dr. Kai Huang, for his technical support and patient guidance in LaTeX and MATLAB. I would also like to thank the member of my committee, Dr. Florence George, for her time, valuable advice, and great encouragement. In addition, I would like to thank the Department of Mathematics and Statistics and all the professors who supported and encouraged me throughout my life at FIU. I wish to thank auntie Miao, who gave me substantial help in my life. I also thank my classmate Maria, who provided technical support in MATLAB. I am really grateful to have them with me, and may God bless them all.

ABSTRACT OF THE THESIS

INFERENCES ABOUT PARAMETERS OF TRIVARIATE NORMAL DISTRIBUTION WITH MISSING DATA

by Xing Wang

Florida International University, 2013, Miami, Florida
Professor Jie Mi, Co-Major Professor
Professor Kai Huang, Co-Major Professor

The multivariate normal distribution is commonly encountered in many fields, and missing values are a frequent issue in practice. The purpose of this research was to estimate the parameters of the three-dimensional normal distribution with permutation-symmetric covariance, using complete data together with all possible patterns of incomplete data. In this study, the MLEs with missing data were derived, and the properties of the MLEs as well as their sampling distributions were obtained. A Monte Carlo simulation study was used to evaluate the performance of the considered estimators for both cases, when the correlation coefficient ρ was known and when it was unknown. All results indicated that, compared to estimators in the case of omitting observations with missing data, the estimators derived in this thesis led to better performance. Furthermore, when ρ was unknown, using the estimate of ρ led to the same conclusion.

Keywords: Trivariate Normal Distribution, Permutation-Symmetric Covariance, Missing Data, MLE

TABLE OF CONTENTS

CHAPTER
I. Introduction
II. Review of MLE with Complete Data
III. The Maximum Likelihood Estimates with Missing Data
   3.1 Notation and Assumptions
   3.2 Likelihood Function
   3.3 Derivation of µ̂ and τ̂
IV. Properties of MLEs
   4.1 Mean of µ̂
   4.2 Variance of µ̂
   4.3 Variance of µ̂1 − µ̂2
V. Sampling Distributions
   5.1 The distribution of µ̂ − µ
   5.2 The distribution of µ̂1 − µ̂2
VI. Simulation Study with Known ρ
VII. Simulation Study with Unknown ρ
VIII. Conclusion
REFERENCES

LIST OF TABLES

TABLE
1. Coverage probabilities of 95% confidence regions of µ
2. The probability of Type I error
3. The power of testing H0: µ = 0 vs. Ha: µ ≠ 0
4. The coverage probabilities of confidence intervals of µ1
5. The average width of confidence intervals of µ1
6. The average variance of µ̂1
7. The coverage probabilities of confidence intervals of µ1 − µ2
8. The average width of confidence intervals of µ1 − µ2
9. The average variance of µ̂1 − µ̂2
10. Coverage probabilities of confidence regions of µ
11. The probability of Type I error
12. The power of testing H0: µ = 0 vs. Ha: µ ≠ 0
13. The coverage probabilities of confidence intervals of µ1
14. The average width of confidence intervals of µ1
15. The average variance of µ̂1
16. The coverage probabilities of confidence intervals of µ1 − µ2
17. The average width of confidence intervals of µ1 − µ2
18. The average variance of µ̂1 − µ̂2

Tables 1-9 correspond to the simulation with known ρ (Section VI) and Tables 10-18 to the simulation with unknown ρ (Section VII).

LIST OF FIGURES

FIGURE
1. Comparison of coverage probabilities of the confidence region of µ
2. Absolute value of the difference between the coverage probabilities and 0.95
3. Comparison of Type I error
4. Absolute value of the difference between the Type I error and α = 0.05
5. Comparison of the power of the test
6. Coverage probabilities of the confidence intervals of µ1
7. Absolute value of the difference between the coverage probabilities and 0.95
8. Average widths of the confidence intervals of µ1
9. Average variance of µ̂1
10. Coverage probabilities of the confidence intervals of µ1 − µ2
11. Absolute value of the difference between the coverage probabilities and 0.95
12. Average widths of the confidence intervals of µ1 − µ2
13. Average variance of µ̂1 − µ̂2
14. Comparison of coverage probabilities of the confidence regions of µ
15. Absolute value of the difference between the coverage probabilities and 0.95
16. Comparison of Type I error
17. Absolute value of the difference between the Type I error and α = 0.05
18. Comparison of the power of the test
19. Coverage probabilities of the confidence intervals of µ1 = 1
20. Absolute value of the difference between the coverage probabilities and 0.95
21. Average widths of the confidence intervals of µ1 = 1
22. Average variance of µ̂1
23. Coverage probabilities of the confidence intervals of µ1 − µ2
24. Absolute value of the difference between the coverage probabilities and 0.95
25. Average widths of the confidence intervals of µ1 − µ2
26. Average variance of µ̂1 − µ̂2

Figures 1-13 correspond to the simulation with known ρ (Section VI) and Figures 14-26 to the simulation with unknown ρ (Section VII).

I. Introduction

In practice, the normal distribution is commonly encountered, since the sampling distributions of many multivariate statistics are approximately normal because of a central limit effect. Early applications of the multivariate normal distribution were in biological studies. For instance, we often think that height and weight, when observed on the same individual, approximately follow a bivariate normal distribution. We could extend this to foot size, or any other related physical characteristic, and these measurements together could follow a multivariate normal distribution.

Moreover, in practice we often have to analyze data that contain missing values. Incomplete normal data can arise in any number of scientific investigations: early detection of diseases, wildlife survey research, mental health research, and so on. Missing data are one of the most pervasive problems in data analysis and can occur for a variety of reasons. For example, a participant may refuse to answer some of the questions; the use of new instruments results in incomplete historical data; some information may be purposely excised to protect confidentiality.

Estimation of the parameters of a multivariate normal distribution when data are incomplete has been discussed by many authors. A systematic approach to the missing values problem was derived using likelihoods of observed values. Wilks (1932) considered MLEs for a bivariate normal population with missing data in both variables. Srivastava and Zaatar (1973) used Monte Carlo simulation to compare four estimators of the covariance matrix of a bivariate normal distribution. Edgett (1956) gave maximum likelihood estimates of the parameters of a trivariate normal distribution when observations on only one variable are missing. Lord (1955) and Matthai (1951) also found estimates of the parameters

of a trivariate normal distribution in other special cases. Anderson (1957) indicated how one could obtain the maximum likelihood estimates when the sample was monotone. Hocking and Smith (1968) developed a method of estimating the parameters of a p-variate normal distribution with zero mean vector in which the missing observations are not required to follow certain patterns. Their estimation technique can be summarized as follows: (a) the data were divided into groups according to which variates were missing; (b) initial estimates of the parameters were obtained from the group of observations with no missing variates; (c) these initial estimators were modified by adjoining, optimally, the information in the remaining groups in a sequential manner until all data were used. Hocking and Marx (1979) used the same method as Hocking and Smith (1968) to derive estimates, but their use of matrices simplified the notation and gave the estimates in a form that was easily implemented on a computer. They also gave exact small-sample moments of the estimators for the case of two data groups.

The purpose of my research is to estimate the parameters of the three-dimensional normal distribution with permutation-symmetric covariance using complete data and all possible patterns of incomplete data, and then to study the properties of the estimators. It is assumed that all correlation coefficients are equal and all variances are equal, which means that we focus on the covariance permutation-symmetric trivariate normal distribution. A special case of the covariance permutation-symmetric model is that of exchangeable normal variables, which make their appearance in many statistical applications. For instance, in Bayes theory concerning normal observations, it is generally assumed that the prior distribution

of θ, the population mean, is N(µ, ω²) for some µ and ω > 0. Thus, if for given θ the random variables are (conditionally) i.i.d. normal variables, then the marginal distribution of the data vector X = (X1, ..., Xn) is a mixture. In this case X1, ..., Xn are exchangeable normal variables.

The thesis is organized as follows. The maximum likelihood estimates of the parameters of the trivariate normal distribution with complete data are reviewed in Section 2. Maximum likelihood estimates with missing data are derived in Section 3. In Section 4, the properties of the MLEs are obtained, followed by the sampling distributions in Section 5. The numerical studies based on Monte Carlo simulations are considered in Section 6, which include (a) comparison of confidence regions of µ; (b) the probability of Type I error and the power of testing H0: µ = 0 vs. Ha: µ ≠ 0; (c) coverage probability and average width of confidence intervals of µ1; (d) coverage probability and average width of confidence intervals of µ1 − µ2, using (i) only three-dimensional observations and (ii) both three-dimensional observations and all possible incomplete observations. Finally, in Section 7, the simulation follows the same procedure as in Section 6 but with ρ unknown.

II. Review of MLE with complete data

In order to obtain the trivariate normal likelihood function, assume that the 3 × 1 vectors X1, X2, ..., Xn represent a random sample from a three-dimensional multivariate normal population with mean vector µ and covariance matrix Σ, where Xi = (xi1, xi2, xi3)', i = 1, 2, ..., n, µ = (µ1, µ2, µ3)', and Σ is the 3 × 3 covariance matrix. Since X1, X2, ..., Xn are mutually independent and each has a N3(µ, Σ) distribution, the joint density function of all the observations is the product of the marginal normal densities:

$$f(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{3/2}|\Sigma|^{1/2}} \exp\Big[-\frac{(X_i-\mu)'\Sigma^{-1}(X_i-\mu)}{2}\Big] = \frac{1}{(2\pi)^{3n/2}|\Sigma|^{n/2}} \exp\Big[-\sum_{i=1}^{n}\frac{(X_i-\mu)'\Sigma^{-1}(X_i-\mu)}{2}\Big]$$

It has been derived that

$$\hat\mu = \bar X \qquad (1)$$

and

$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar X)(X_i-\bar X)' = \frac{n-1}{n}\,S \qquad (2)$$

are the maximum likelihood estimates of µ and Σ, respectively, where

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad \text{and} \qquad S = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar X)(X_i-\bar X)'.$$
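As a quick numerical illustration of (1) and (2), here is a minimal Python sketch (our own; the parameter values and variable names are illustrative, not from the thesis) that computes µ̂ and Σ̂ from a simulated complete trivariate sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (ours, not the thesis's): a permutation-
# symmetric covariance with common variance sigma^2 and common correlation rho.
mu = np.array([1.0, 2.0, 3.0])
sigma2, rho = 1.0, 0.5
Sigma = sigma2 * ((1 - rho) * np.eye(3) + rho * np.ones((3, 3)))

X = rng.multivariate_normal(mu, Sigma, size=200)    # n = 200 complete observations

mu_hat = X.mean(axis=0)                             # equation (1): sample mean
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / len(X)  # equation (2): divisor n, not n-1

print(mu_hat)
print(Sigma_hat)
```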

III. The Maximum Likelihood Estimates with Missing Data

3.1 Notation and Assumptions

In the present paper, we consider the covariance permutation-symmetric trivariate normal distribution, which means that the variances are equal, σ1 = σ2 = σ3 = σ, and all correlation coefficients are equal: ρ12 = ρ13 = ρ23 = ρ. Denoting τ = σ², the joint density function can be expressed as

$$f(x_1, x_2, x_3) = \frac{\exp\{-w/[2\tau(1-\rho)(1+2\rho)]\}}{(2\pi)^{3/2}\,\tau^{3/2}\,(1-\rho)(1+2\rho)^{1/2}} \qquad (3)$$

where

$$w = (1+\rho)\big[(x_1-\mu_1)^2 + (x_2-\mu_2)^2 + (x_3-\mu_3)^2\big] - 2\rho\big[(x_1-\mu_1)(x_2-\mu_2) + (x_1-\mu_1)(x_3-\mu_3) + (x_2-\mu_2)(x_3-\mu_3)\big].$$

The data in our study are divided into groups according to which variables are missing, and initial estimates of the parameters are obtained from the group of observations with no missing variables. Specifically, it is assumed that a sample of size n + n12 + n13 + n23 + n1 + n2 + n3 is taken from a three-dimensional normal distribution, but some of the observations have randomly occurring missing entries. In this case, the observations can be placed in 7 groups. The data consist of:

- n complete observations {(xi1, xi2, xi3), i = 1, ..., n} on X = (X1, X2, X3),
- n12 observations {(x(12)i1, x(12)i2), i = 1, ..., n12} on (X1, X2),
- n13 observations {(x(13)i1, x(13)i3), i = 1, ..., n13} on (X1, X3),
- n23 observations {(x(23)i2, x(23)i3), i = 1, ..., n23} on (X2, X3),
- n1 observations {x(1)i1, i = 1, ..., n1} on X1,
- n2 observations {x(2)i2, i = 1, ..., n2} on X2,
- n3 observations {x(3)i3, i = 1, ..., n3} on X3.
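The density (3) can be sanity-checked numerically. The sketch below (our own illustration, with hypothetical names) evaluates the logarithm of (3) directly and compares it with a generic multivariate normal log-density built from Σ = τ[(1 − ρ)I + ρJ]; the two agree.

```python
import numpy as np
from scipy.stats import multivariate_normal

def logpdf_eq3(x, mu, tau, rho):
    """Log of density (3) for the permutation-symmetric trivariate normal."""
    d = x - mu
    w = (1 + rho) * np.sum(d**2) - 2 * rho * (d[0]*d[1] + d[0]*d[2] + d[1]*d[2])
    return (-w / (2 * tau * (1 - rho) * (1 + 2*rho))
            - 1.5 * np.log(2 * np.pi * tau)
            - np.log(1 - rho) - 0.5 * np.log(1 + 2*rho))

mu, tau, rho = np.array([1.0, 2.0, 3.0]), 2.0, 0.4
Sigma = tau * ((1 - rho) * np.eye(3) + rho * np.ones((3, 3)))
x = np.array([0.7, 2.5, 2.9])

print(logpdf_eq3(x, mu, tau, rho))               # via equation (3)
print(multivariate_normal(mu, Sigma).logpdf(x))  # same value from the generic form
```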

3.2 Likelihood Function

Following Hocking and Smith (1968), the likelihood function can be written as

$$L(\mu, \tau) = L_0(\mu, \tau)\, L_{12}(\mu_1, \mu_2, \tau)\, L_{13}(\mu_1, \mu_3, \tau)\, L_{23}(\mu_2, \mu_3, \tau)\, L_1(\mu_1, \tau)\, L_2(\mu_2, \tau)\, L_3(\mu_3, \tau)$$

$$= \prod_{i=1}^{n} f(x_i; \mu, \tau) \prod_{i=1}^{n_{12}} f(x^{(12)}_i; \mu_1, \mu_2, \tau) \prod_{i=1}^{n_{13}} f(x^{(13)}_i; \mu_1, \mu_3, \tau) \prod_{i=1}^{n_{23}} f(x^{(23)}_i; \mu_2, \mu_3, \tau) \prod_{i=1}^{n_1} f(x^{(1)}_i; \mu_1, \tau) \prod_{i=1}^{n_2} f(x^{(2)}_i; \mu_2, \tau) \prod_{i=1}^{n_3} f(x^{(3)}_i; \mu_3, \tau) \qquad (4)$$

where $x_i = (x_{i1}, x_{i2}, x_{i3})'$, $x^{(12)}_i = (x^{(12)}_{i1}, x^{(12)}_{i2})'$, $x^{(13)}_i = (x^{(13)}_{i1}, x^{(13)}_{i3})'$, $x^{(23)}_i = (x^{(23)}_{i2}, x^{(23)}_{i3})'$, $x^{(1)}_i = x^{(1)}_{i1}$, $x^{(2)}_i = x^{(2)}_{i2}$, and $x^{(3)}_i = x^{(3)}_{i3}$.

$L_0$ denotes the likelihood function for the three-dimensional complete observations,

$$L_0(\mu, \tau) = \frac{\exp\big\{-\sum_{i=1}^{n} w_i/[2\tau(1-\rho)(1+2\rho)]\big\}}{\big[(2\pi)^{3/2}\,\tau^{3/2}\,(1-\rho)(1+2\rho)^{1/2}\big]^{n}}$$

with

$$w_i = (1+\rho)\big[(x_{i1}-\mu_1)^2 + (x_{i2}-\mu_2)^2 + (x_{i3}-\mu_3)^2\big] - 2\rho\big[(x_{i1}-\mu_1)(x_{i2}-\mu_2) + (x_{i1}-\mu_1)(x_{i3}-\mu_3) + (x_{i2}-\mu_2)(x_{i3}-\mu_3)\big].$$

Similarly, the factors for the two-dimensional groups are

$$L_{12}(\mu_1, \mu_2, \tau) = \frac{\exp\Big\{-\Big[\sum_{i=1}^{n_{12}}(x^{(12)}_{i1}-\mu_1)^2 - 2\rho\sum_{i=1}^{n_{12}}(x^{(12)}_{i1}-\mu_1)(x^{(12)}_{i2}-\mu_2) + \sum_{i=1}^{n_{12}}(x^{(12)}_{i2}-\mu_2)^2\Big]\Big/\big[2\tau(1-\rho^2)\big]\Big\}}{(2\pi)^{n_{12}}\,\tau^{n_{12}}\,(1-\rho^2)^{n_{12}/2}}$$

and analogously for $L_{13}(\mu_1, \mu_3, \tau)$ and $L_{23}(\mu_2, \mu_3, \tau)$; the factors for the one-dimensional groups are

$$L_1(\mu_1, \tau) = \frac{\exp\big\{-\sum_{i=1}^{n_1}(x^{(1)}_{i1}-\mu_1)^2/(2\tau)\big\}}{(2\pi)^{n_1/2}\,\tau^{n_1/2}}$$

and analogously for $L_2(\mu_2, \tau)$ and $L_3(\mu_3, \tau)$.

3.3 Derivation of µ̂ and τ̂

In the rest of this study it is assumed that ρ ∈ (−1/2, 1) is known. In order to apply the likelihood-based method to obtain the MLEs µ̂ and τ̂ of µ and τ, the natural logarithms of the above likelihood functions are taken. For example,

$$\ln L_0(\mu, \tau) = -\frac{3n}{2}\ln(2\pi) - \frac{3n}{2}\ln\tau - n\ln(1-\rho) - \frac{n}{2}\ln(1+2\rho) - \frac{\sum_{i=1}^{n} w_i}{2\tau(1-\rho)(1+2\rho)}$$

and

$$\ln L_{12}(\mu_1, \mu_2, \tau) = -n_{12}\ln(2\pi) - n_{12}\ln\tau - \frac{n_{12}}{2}\ln(1-\rho^2) - \frac{\sum_{i=1}^{n_{12}}\big[(x^{(12)}_{i1}-\mu_1)^2 - 2\rho(x^{(12)}_{i1}-\mu_1)(x^{(12)}_{i2}-\mu_2) + (x^{(12)}_{i2}-\mu_2)^2\big]}{2\tau(1-\rho^2)},$$

with corresponding expressions for ln L13, ln L23, ln L1, ln L2 and ln L3.

Combining all the above expressions yields the log-likelihood function

$$\ln L(\mu, \tau) = c_0 - \frac{1}{2}\big(3n + 2n_{12} + 2n_{13} + 2n_{23} + n_1 + n_2 + n_3\big)\ln\tau - \frac{\sum_{i=1}^{n} w_i}{2\tau(1-\rho)(1+2\rho)} - \frac{Q_{12} + Q_{13} + Q_{23}}{2\tau(1-\rho^2)} - \frac{\sum_{i=1}^{n_1}(x^{(1)}_{i1}-\mu_1)^2 + \sum_{i=1}^{n_2}(x^{(2)}_{i2}-\mu_2)^2 + \sum_{i=1}^{n_3}(x^{(3)}_{i3}-\mu_3)^2}{2\tau} \qquad (5)$$

where $c_0$ collects all the terms that involve neither µ nor τ, and, for $(j,k) \in \{(1,2), (1,3), (2,3)\}$,

$$Q_{jk} = \sum_{i=1}^{n_{jk}}\big[(x^{(jk)}_{ij}-\mu_j)^2 - 2\rho(x^{(jk)}_{ij}-\mu_j)(x^{(jk)}_{ik}-\mu_k) + (x^{(jk)}_{ik}-\mu_k)^2\big].$$

In order to find the maximum likelihood estimates of µ = (µ1, µ2, µ3)' and τ, the partial derivatives of ln L with respect to µ and τ are taken and set to zero:

$$\frac{\partial \ln L}{\partial \mu} = 0, \qquad \frac{\partial \ln L}{\partial \tau} = 0,$$

which are equivalent to the following expressions:

$$\frac{\partial \ln L}{\partial \mu_1} = \frac{(1+\rho)\sum_{i=1}^{n}(x_{i1}-\mu_1) - \rho\sum_{i=1}^{n}\big[(x_{i2}-\mu_2)+(x_{i3}-\mu_3)\big]}{\tau(1-\rho)(1+2\rho)} + \frac{\sum_{i=1}^{n_{12}}(x^{(12)}_{i1}-\mu_1) - \rho\sum_{i=1}^{n_{12}}(x^{(12)}_{i2}-\mu_2)}{\tau(1-\rho^2)} + \frac{\sum_{i=1}^{n_{13}}(x^{(13)}_{i1}-\mu_1) - \rho\sum_{i=1}^{n_{13}}(x^{(13)}_{i3}-\mu_3)}{\tau(1-\rho^2)} + \frac{\sum_{i=1}^{n_1}(x^{(1)}_{i1}-\mu_1)}{\tau} = 0 \qquad (6)$$

$$\frac{\partial \ln L}{\partial \mu_2} = \frac{(1+\rho)\sum_{i=1}^{n}(x_{i2}-\mu_2) - \rho\sum_{i=1}^{n}\big[(x_{i1}-\mu_1)+(x_{i3}-\mu_3)\big]}{\tau(1-\rho)(1+2\rho)} + \frac{\sum_{i=1}^{n_{12}}(x^{(12)}_{i2}-\mu_2) - \rho\sum_{i=1}^{n_{12}}(x^{(12)}_{i1}-\mu_1)}{\tau(1-\rho^2)} + \frac{\sum_{i=1}^{n_{23}}(x^{(23)}_{i2}-\mu_2) - \rho\sum_{i=1}^{n_{23}}(x^{(23)}_{i3}-\mu_3)}{\tau(1-\rho^2)} + \frac{\sum_{i=1}^{n_2}(x^{(2)}_{i2}-\mu_2)}{\tau} = 0 \qquad (7)$$

$$\frac{\partial \ln L}{\partial \mu_3} = \frac{(1+\rho)\sum_{i=1}^{n}(x_{i3}-\mu_3) - \rho\sum_{i=1}^{n}\big[(x_{i1}-\mu_1)+(x_{i2}-\mu_2)\big]}{\tau(1-\rho)(1+2\rho)} + \frac{\sum_{i=1}^{n_{13}}(x^{(13)}_{i3}-\mu_3) - \rho\sum_{i=1}^{n_{13}}(x^{(13)}_{i1}-\mu_1)}{\tau(1-\rho^2)} + \frac{\sum_{i=1}^{n_{23}}(x^{(23)}_{i3}-\mu_3) - \rho\sum_{i=1}^{n_{23}}(x^{(23)}_{i2}-\mu_2)}{\tau(1-\rho^2)} + \frac{\sum_{i=1}^{n_3}(x^{(3)}_{i3}-\mu_3)}{\tau} = 0 \qquad (8)$$

$$\frac{\partial \ln L}{\partial \tau} = -\frac{3n + 2n_{12} + 2n_{13} + 2n_{23} + n_1 + n_2 + n_3}{2\tau} + \frac{\sum_{i=1}^{n} w_i}{2\tau^2(1-\rho)(1+2\rho)} + \frac{Q_{12}+Q_{13}+Q_{23}}{2\tau^2(1-\rho^2)} + \frac{\sum_{i=1}^{n_1}(x^{(1)}_{i1}-\mu_1)^2 + \sum_{i=1}^{n_2}(x^{(2)}_{i2}-\mu_2)^2 + \sum_{i=1}^{n_3}(x^{(3)}_{i3}-\mu_3)^2}{2\tau^2} = 0 \qquad (9)$$

The system of linear equations (6), (7), (8) can be rewritten by multiplying each equation by τ(1 − ρ)(1 + ρ)(1 + 2ρ); for (6) this gives

$$(1+\rho)^2\sum_{i=1}^{n}(x_{i1}-\mu_1) - \rho(1+\rho)\sum_{i=1}^{n}\big[(x_{i2}-\mu_2)+(x_{i3}-\mu_3)\big] + (1+2\rho)\Big[\sum_{i=1}^{n_{12}}(x^{(12)}_{i1}-\mu_1) - \rho\sum_{i=1}^{n_{12}}(x^{(12)}_{i2}-\mu_2)\Big] + (1+2\rho)\Big[\sum_{i=1}^{n_{13}}(x^{(13)}_{i1}-\mu_1) - \rho\sum_{i=1}^{n_{13}}(x^{(13)}_{i3}-\mu_3)\Big] + (1-\rho)(1+\rho)(1+2\rho)\sum_{i=1}^{n_1}(x^{(1)}_{i1}-\mu_1) = 0 \qquad (10)$$

with the analogous equations (11) and (12) obtained from (7) and (8). Introducing

$$\gamma_1 = 1+\rho, \qquad \gamma_2 = 1-\rho, \qquad \gamma_3 = 1+2\rho$$

and collecting the terms in µ1, µ2, µ3, equations (10)-(12) can be further expressed as

$$\big[n\gamma_1^2 + (n_{12}+n_{13})\gamma_3 + n_1\gamma_1\gamma_2\gamma_3\big]\mu_1 - \rho\big(n_{12}\gamma_3 + n\gamma_1\big)\mu_2 - \rho\big(n_{13}\gamma_3 + n\gamma_1\big)\mu_3 = b_1 \qquad (13)$$

$$-\rho\big(n_{12}\gamma_3 + n\gamma_1\big)\mu_1 + \big[n\gamma_1^2 + (n_{12}+n_{23})\gamma_3 + n_2\gamma_1\gamma_2\gamma_3\big]\mu_2 - \rho\big(n_{23}\gamma_3 + n\gamma_1\big)\mu_3 = b_2 \qquad (14)$$

$$-\rho\big(n_{13}\gamma_3 + n\gamma_1\big)\mu_1 - \rho\big(n_{23}\gamma_3 + n\gamma_1\big)\mu_2 + \big[n\gamma_1^2 + (n_{13}+n_{23})\gamma_3 + n_3\gamma_1\gamma_2\gamma_3\big]\mu_3 = b_3 \qquad (15)$$

with b1, b2, b3 given in (18) below. For the sake of convenience, the system of equations (13)-(15) is denoted in matrix form as

$$A\mu = b \qquad (16)$$

where

$$A = \begin{pmatrix} a_{11} & -\rho(n_{12}\gamma_3 + n\gamma_1) & -\rho(n_{13}\gamma_3 + n\gamma_1) \\ -\rho(n_{12}\gamma_3 + n\gamma_1) & a_{22} & -\rho(n_{23}\gamma_3 + n\gamma_1) \\ -\rho(n_{13}\gamma_3 + n\gamma_1) & -\rho(n_{23}\gamma_3 + n\gamma_1) & a_{33} \end{pmatrix} \qquad (17)$$

with

$$a_{11} = n\gamma_1^2 + (n_{12}+n_{13})\gamma_3 + n_1\gamma_1\gamma_2\gamma_3, \qquad a_{22} = n\gamma_1^2 + (n_{12}+n_{23})\gamma_3 + n_2\gamma_1\gamma_2\gamma_3, \qquad a_{33} = n\gamma_1^2 + (n_{13}+n_{23})\gamma_3 + n_3\gamma_1\gamma_2\gamma_3.$$

$$\mu = (\mu_1, \mu_2, \mu_3)', \qquad b = (b_1, b_2, b_3)' \qquad (18)$$

where

$$b_1 = \gamma_1^2 n \bar x_{.1} + \gamma_3\big(n_{12}\bar x^{(12)}_{.1} + n_{13}\bar x^{(13)}_{.1}\big) + \gamma_1\gamma_2\gamma_3\, n_1 \bar x^{(1)}_{.1} - \rho\gamma_1 n\big(\bar x_{.2} + \bar x_{.3}\big) - \rho\gamma_3\big(n_{12}\bar x^{(12)}_{.2} + n_{13}\bar x^{(13)}_{.3}\big)$$

$$b_2 = \gamma_1^2 n \bar x_{.2} + \gamma_3\big(n_{12}\bar x^{(12)}_{.2} + n_{23}\bar x^{(23)}_{.2}\big) + \gamma_1\gamma_2\gamma_3\, n_2 \bar x^{(2)}_{.2} - \rho\gamma_1 n\big(\bar x_{.1} + \bar x_{.3}\big) - \rho\gamma_3\big(n_{12}\bar x^{(12)}_{.1} + n_{23}\bar x^{(23)}_{.3}\big)$$

$$b_3 = \gamma_1^2 n \bar x_{.3} + \gamma_3\big(n_{13}\bar x^{(13)}_{.3} + n_{23}\bar x^{(23)}_{.3}\big) + \gamma_1\gamma_2\gamma_3\, n_3 \bar x^{(3)}_{.3} - \rho\gamma_1 n\big(\bar x_{.1} + \bar x_{.2}\big) - \rho\gamma_3\big(n_{13}\bar x^{(13)}_{.1} + n_{23}\bar x^{(23)}_{.2}\big)$$

and the group means are

$$\bar x_{.j} = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad \bar x^{(jk)}_{.j} = \frac{1}{n_{jk}}\sum_{i=1}^{n_{jk}} x^{(jk)}_{ij}, \qquad \bar x^{(j)}_{.j} = \frac{1}{n_j}\sum_{i=1}^{n_j} x^{(j)}_{ij}, \qquad j = 1, 2, 3.$$

If A is a positive definite matrix, then equation (16) can be solved, which gives the matrix expression of the maximum likelihood estimate of µ. To prove the positive definiteness of A, the matrix is first written as the sum of two matrices B1 and B2. The next step is to show that B1 is positive semi-definite and B2 is positive definite, where

$$B_1 = \gamma_3\begin{pmatrix} n_{12}+n_{13} & -\rho n_{12} & -\rho n_{13} \\ -\rho n_{12} & n_{12}+n_{23} & -\rho n_{23} \\ -\rho n_{13} & -\rho n_{23} & n_{13}+n_{23} \end{pmatrix} \qquad (19)$$

$$B_2 = \begin{pmatrix} n\gamma_1^2 + n_1\gamma_1\gamma_2\gamma_3 & -n\rho\gamma_1 & -n\rho\gamma_1 \\ -n\rho\gamma_1 & n\gamma_1^2 + n_2\gamma_1\gamma_2\gamma_3 & -n\rho\gamma_1 \\ -n\rho\gamma_1 & -n\rho\gamma_1 & n\gamma_1^2 + n_3\gamma_1\gamma_2\gamma_3 \end{pmatrix} \qquad (20)$$

It can be proved that B1 is positive semi-definite by writing it as the sum of C1, C2 and C3, where

$$C_1 = \gamma_3\begin{pmatrix} n_{12} & -\rho n_{12} & 0 \\ -\rho n_{12} & n_{12} & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad (21) \qquad C_2 = \gamma_3\begin{pmatrix} n_{13} & 0 & -\rho n_{13} \\ 0 & 0 & 0 \\ -\rho n_{13} & 0 & n_{13} \end{pmatrix} \quad (22) \qquad C_3 = \gamma_3\begin{pmatrix} 0 & 0 & 0 \\ 0 & n_{23} & -\rho n_{23} \\ 0 & -\rho n_{23} & n_{23} \end{pmatrix} \quad (23)$$

Proof. A matrix is positive semi-definite if and only if every principal minor of the matrix is nonnegative. Since n12 ≥ 0, ρ ∈ (−1/2, 1) and γ3 = 1 + 2ρ > 0, the first-order principal minors of C1 are nonnegative; the nonzero second-order principal minor of C1 is γ3² n12² (1 − ρ²) ≥ 0; the third-order principal minor of C1 is 0. Therefore, C1 is a positive semi-definite matrix. Similarly, since n13 ≥ 0 and n23 ≥ 0, all the principal minors of C2 and C3 are nonnegative, so C2 and C3 are also positive semi-definite matrices. Based on the results above, B1 = C1 + C2 + C3 is a positive semi-definite matrix.

Applying a similar argument to B2, it can be written as the sum of C4 and C5 given below. By showing that C4 is positive definite and C5 is positive semi-definite, B2 is proved to be a positive definite matrix.

$$C_4 = \begin{pmatrix} n\gamma_1^2 & -n\rho\gamma_1 & -n\rho\gamma_1 \\ -n\rho\gamma_1 & n\gamma_1^2 & -n\rho\gamma_1 \\ -n\rho\gamma_1 & -n\rho\gamma_1 & n\gamma_1^2 \end{pmatrix} \qquad (24)$$

$$C_5 = \gamma_1\gamma_2\gamma_3\begin{pmatrix} n_1 & 0 & 0 \\ 0 & n_2 & 0 \\ 0 & 0 & n_3 \end{pmatrix} \qquad (25)$$

Proof. Since n > 0, ρ ∈ (−1/2, 1), γ1 = 1 + ρ > 0, γ2 = 1 − ρ > 0 and γ3 = 1 + 2ρ > 0, the first-order principal minors of C4 equal nγ1² > 0; the second-order principal minors equal n²γ1²(γ1² − ρ²) > 0; and the third-order principal minor equals n³(1 + ρ)³(1 − ρ)(1 + 2ρ)² > 0. Therefore, C4 is a positive definite matrix. Since n1, n2, n3 ≥ 0, C5 is a diagonal matrix with nonnegative entries n1γ1γ2γ3, n2γ1γ2γ3 and n3γ1γ2γ3, and is therefore positive semi-definite. As a result, B2 = C4 + C5 is a positive definite matrix.

B1 has been shown to be positive semi-definite and B2 positive definite, therefore A = B1 + B2 is a positive definite matrix. Since A is positive definite, it is nonsingular and its inverse matrix exists. It follows from expression (16) that

$$\hat\mu = A^{-1} b \qquad (26)$$

where µ̂ = (µ̂1, µ̂2, µ̂3)'.
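To make the construction of (16)-(26) concrete, here is a small Python sketch (our own illustration; the data layout and function name are hypothetical, not from the thesis) that assembles A and b from the seven data groups and solves for µ̂ when ρ is known.

```python
import numpy as np

def mle_mu(X, X12, X13, X23, x1, x2, x3, rho):
    """MLE of mu via A mu = b, equations (16)-(18), rho known.

    X: (n,3) complete cases; X12, X13, X23: (n_jk, 2) bivariate groups with
    columns in the order of their superscripts; x1, x2, x3: 1-d arrays.
    All seven groups are assumed non-empty in this sketch.
    """
    g1, g2, g3 = 1 + rho, 1 - rho, 1 + 2*rho
    n, n12, n13, n23 = len(X), len(X12), len(X13), len(X23)
    n1, n2, n3 = len(x1), len(x2), len(x3)

    a11 = n*g1**2 + (n12 + n13)*g3 + n1*g1*g2*g3
    a22 = n*g1**2 + (n12 + n23)*g3 + n2*g1*g2*g3
    a33 = n*g1**2 + (n13 + n23)*g3 + n3*g1*g2*g3
    A = np.array([
        [a11, -rho*(n12*g3 + n*g1), -rho*(n13*g3 + n*g1)],
        [-rho*(n12*g3 + n*g1), a22, -rho*(n23*g3 + n*g1)],
        [-rho*(n13*g3 + n*g1), -rho*(n23*g3 + n*g1), a33],
    ])

    m = X.mean(axis=0)   # complete-data means (x̄.1, x̄.2, x̄.3)
    m12, m13, m23 = X12.mean(axis=0), X13.mean(axis=0), X23.mean(axis=0)
    m1, m2, m3 = x1.mean(), x2.mean(), x3.mean()

    # b1, b2, b3 per equation (18): pair groups enter with weight g3,
    # singleton groups with weight g1*g2*g3, cross terms carry -rho.
    b = np.array([
        g1**2*n*m[0] + g3*(n12*m12[0] + n13*m13[0]) + g1*g2*g3*n1*m1
            - rho*g1*n*(m[1] + m[2]) - rho*g3*(n12*m12[1] + n13*m13[1]),
        g1**2*n*m[1] + g3*(n12*m12[1] + n23*m23[0]) + g1*g2*g3*n2*m2
            - rho*g1*n*(m[0] + m[2]) - rho*g3*(n12*m12[0] + n23*m23[1]),
        g1**2*n*m[2] + g3*(n13*m13[1] + n23*m23[1]) + g1*g2*g3*n3*m3
            - rho*g1*n*(m[0] + m[1]) - rho*g3*(n13*m13[0] + n23*m23[0]),
    ])
    return np.linalg.solve(A, b)  # A is positive definite, hence invertible
```

As a design check, when all incomplete groups are empty the system collapses to nγ1[(1 + 2ρ)I − ρJ]µ = nγ1[(1 + 2ρ)I − ρJ]x̄, so µ̂ reduces to the complete-data sample mean, as it should.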

Moreover, the maximum likelihood estimator of τ is obtained by plugging µ̂ into equation (9):

$$\hat\tau = \frac{1}{c}\Big\{\frac{\sum_{i=1}^{n} \hat w_i}{(1-\rho)(1+2\rho)} + \frac{\hat Q_{12} + \hat Q_{13} + \hat Q_{23}}{1-\rho^2} + \sum_{i=1}^{n_1}\big(x^{(1)}_{i1}-\hat\mu_1\big)^2 + \sum_{i=1}^{n_2}\big(x^{(2)}_{i2}-\hat\mu_2\big)^2 + \sum_{i=1}^{n_3}\big(x^{(3)}_{i3}-\hat\mu_3\big)^2\Big\} \qquad (27)$$

where ŵi and Q̂jk denote wi and Qjk evaluated at µ = µ̂. By denoting c = 3n + 2n12 + 2n13 + 2n23 + n1 + n2 + n3 and using γ1 = 1 + ρ, γ2 = 1 − ρ, γ3 = 1 + 2ρ, the equation above can be further simplified as

$$\hat\tau = \frac{1}{c}\Big\{\frac{\gamma_1}{\gamma_2\gamma_3}\sum_{i=1}^{n}\big[(x_{i1}-\hat\mu_1)^2 + (x_{i2}-\hat\mu_2)^2 + (x_{i3}-\hat\mu_3)^2\big] - \frac{2\rho}{\gamma_2\gamma_3}\sum_{i=1}^{n}\big[(x_{i1}-\hat\mu_1)(x_{i2}-\hat\mu_2) + (x_{i1}-\hat\mu_1)(x_{i3}-\hat\mu_3) + (x_{i2}-\hat\mu_2)(x_{i3}-\hat\mu_3)\big] + \frac{1}{\gamma_1\gamma_2}\big(\hat Q_{12} + \hat Q_{13} + \hat Q_{23}\big) + \sum_{i=1}^{n_1}\big(x^{(1)}_{i1}-\hat\mu_1\big)^2 + \sum_{i=1}^{n_2}\big(x^{(2)}_{i2}-\hat\mu_2\big)^2 + \sum_{i=1}^{n_3}\big(x^{(3)}_{i3}-\hat\mu_3\big)^2\Big\} \qquad (28)$$
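Continuing the sketch above, τ̂ in (28) is a weighted combination of residual sums of squares and cross-products. A direct transcription (again with our own hypothetical names, under the same data layout) might look as follows.

```python
import numpy as np

def mle_tau(X, X12, X13, X23, x1, x2, x3, mu, rho):
    """Plug-in MLE of tau from equation (28), given mu = mu_hat and known rho."""
    g1, g2, g3 = 1 + rho, 1 - rho, 1 + 2*rho
    c = 3*len(X) + 2*(len(X12) + len(X13) + len(X23)) + len(x1) + len(x2) + len(x3)

    d = X - mu                                    # complete-data residuals
    ss = np.sum(d**2)                             # sum of squared residuals
    cp = np.sum(d[:, 0]*d[:, 1] + d[:, 0]*d[:, 2] + d[:, 1]*d[:, 2])

    def Q(Y, j, k):                               # bivariate quadratic form Q_jk
        u, v = Y[:, 0] - mu[j], Y[:, 1] - mu[k]
        return np.sum(u**2 - 2*rho*u*v + v**2)

    singles = (np.sum((x1 - mu[0])**2) + np.sum((x2 - mu[1])**2)
               + np.sum((x3 - mu[2])**2))
    return ((g1*ss - 2*rho*cp) / (g2*g3)
            + (Q(X12, 0, 1) + Q(X13, 0, 2) + Q(X23, 1, 2)) / (g1*g2)
            + singles) / c
```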

IV. Properties of MLEs

4.1 Mean of µ̂

The estimator of µ in our model with missing data is unbiased. This property can be easily proved as follows.

Proof. Every group mean is an unbiased estimator of the corresponding component of µ, that is, E(x̄.j) = E(x̄(jk).j) = E(x̄(j).j) = µj, j = 1, 2, 3. Substituting these expectations into (18) gives, for the first component,

$$E(b_1) = \big[n\gamma_1^2 + (n_{12}+n_{13})\gamma_3 + n_1\gamma_1\gamma_2\gamma_3\big]\mu_1 - \rho\big(n_{12}\gamma_3 + n\gamma_1\big)\mu_2 - \rho\big(n_{13}\gamma_3 + n\gamma_1\big)\mu_3,$$

and similarly for E(b2) and E(b3), so that E(b) = Aµ. Consequently,

$$E(\hat\mu) = E(A^{-1}b) = A^{-1}E(b) = A^{-1}A\mu = \mu.$$

4.2 Variance of µ̂

With matrix notation, the variance of µ̂ can be written as

$$\mathrm{Var}(\hat\mu) = \mathrm{Var}(A^{-1}b) = A^{-1}\,\mathrm{Var}(b)\,(A^{-1})' \qquad (29)$$

where b is the same vector as in (18). For the sake of convenience, b can be denoted as b = D X̄*, where

$$\bar X^{*} = \big(\bar x_{.1},\ \bar x_{.2},\ \bar x_{.3},\ \bar x^{(12)}_{.1},\ \bar x^{(12)}_{.2},\ \bar x^{(13)}_{.1},\ \bar x^{(13)}_{.3},\ \bar x^{(23)}_{.2},\ \bar x^{(23)}_{.3},\ \bar x^{(1)}_{.1},\ \bar x^{(2)}_{.2},\ \bar x^{(3)}_{.3}\big)'$$

and D is the 3 × 12 coefficient matrix read off from (18):

$$D = \begin{pmatrix} n\gamma_1^2 & -n\rho\gamma_1 & -n\rho\gamma_1 & n_{12}\gamma_3 & -n_{12}\rho\gamma_3 & n_{13}\gamma_3 & -n_{13}\rho\gamma_3 & 0 & 0 & n_1\gamma_1\gamma_2\gamma_3 & 0 & 0 \\ -n\rho\gamma_1 & n\gamma_1^2 & -n\rho\gamma_1 & -n_{12}\rho\gamma_3 & n_{12}\gamma_3 & 0 & 0 & n_{23}\gamma_3 & -n_{23}\rho\gamma_3 & 0 & n_2\gamma_1\gamma_2\gamma_3 & 0 \\ -n\rho\gamma_1 & -n\rho\gamma_1 & n\gamma_1^2 & 0 & 0 & -n_{13}\rho\gamma_3 & n_{13}\gamma_3 & -n_{23}\rho\gamma_3 & n_{23}\gamma_3 & 0 & 0 & n_3\gamma_1\gamma_2\gamma_3 \end{pmatrix}$$

Since

$$\mathrm{Var}(b) = \mathrm{Var}(D\bar X^{*}) = D\,\mathrm{Var}(\bar X^{*})\,D' \qquad (30)$$

combining (29) and (30) gives

$$\mathrm{Var}(\hat\mu) = (A^{-1}D)\,\mathrm{Var}(\bar X^{*})\,(A^{-1}D)' \qquad (31)$$

where, because the seven groups are mutually independent, the variance of X̄* is block diagonal:

$$\mathrm{Var}(\bar X^{*}) = \tau\,\mathrm{diag}\Big(\tfrac{1}{n}R_3,\ \tfrac{1}{n_{12}}R_2,\ \tfrac{1}{n_{13}}R_2,\ \tfrac{1}{n_{23}}R_2,\ \tfrac{1}{n_1},\ \tfrac{1}{n_2},\ \tfrac{1}{n_3}\Big), \qquad R_3 = \begin{pmatrix}1&\rho&\rho\\ \rho&1&\rho\\ \rho&\rho&1\end{pmatrix}, \qquad R_2 = \begin{pmatrix}1&\rho\\ \rho&1\end{pmatrix}.$$
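Under the same hypothetical data layout, (31) can be evaluated numerically. The sketch below builds D and the block-diagonal Var(X̄*) and returns Var(µ̂); the matrix A is assumed to come from the earlier sketch.

```python
import numpy as np
from scipy.linalg import block_diag

def var_mu_hat(A, counts, tau, rho):
    """Var(mu_hat) = (A^{-1} D) Var(Xbar*) (A^{-1} D)', equation (31).

    counts = (n, n12, n13, n23, n1, n2, n3).
    """
    n, n12, n13, n23, n1, n2, n3 = counts
    g1, g2, g3 = 1 + rho, 1 - rho, 1 + 2*rho

    D = np.zeros((3, 12))
    # complete-data block: diagonal n*g1^2, off-diagonal -n*rho*g1
    D[:, :3] = n * (g1**2 * np.eye(3) - rho * g1 * (np.ones((3, 3)) - np.eye(3)))
    D[0, 3:5] = n12 * g3 * np.array([1, -rho]); D[1, 3:5] = n12 * g3 * np.array([-rho, 1])
    D[0, 5:7] = n13 * g3 * np.array([1, -rho]); D[2, 5:7] = n13 * g3 * np.array([-rho, 1])
    D[1, 7:9] = n23 * g3 * np.array([1, -rho]); D[2, 7:9] = n23 * g3 * np.array([-rho, 1])
    D[0, 9], D[1, 10], D[2, 11] = n1 * g1*g2*g3, n2 * g1*g2*g3, n3 * g1*g2*g3

    R3 = (1 - rho) * np.eye(3) + rho * np.ones((3, 3))
    R2 = np.array([[1, rho], [rho, 1]])
    V = tau * block_diag(R3/n, R2/n12, R2/n13, R2/n23,
                         np.diag([1/n1, 1/n2, 1/n3]))   # Var(Xbar*)

    AiD = np.linalg.solve(A, D)   # A^{-1} D
    return AiD @ V @ AiD.T
```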

4.3 Variance of µ̂1 − µ̂2

The variance of µ̂ is a 3 × 3 matrix of the form

$$\mathrm{Var}(\hat\mu) = \begin{pmatrix} \mathrm{Var}(\hat\mu_1) & \mathrm{Cov}(\hat\mu_1,\hat\mu_2) & \mathrm{Cov}(\hat\mu_1,\hat\mu_3) \\ \mathrm{Cov}(\hat\mu_1,\hat\mu_2) & \mathrm{Var}(\hat\mu_2) & \mathrm{Cov}(\hat\mu_2,\hat\mu_3) \\ \mathrm{Cov}(\hat\mu_1,\hat\mu_3) & \mathrm{Cov}(\hat\mu_2,\hat\mu_3) & \mathrm{Var}(\hat\mu_3) \end{pmatrix} \qquad (32)$$

Consequently, the variance of µ̂1 − µ̂2 can be obtained as

$$\mathrm{Var}(\hat\mu_1 - \hat\mu_2) = \mathrm{Var}(\hat\mu_1) + \mathrm{Var}(\hat\mu_2) - 2\,\mathrm{Cov}(\hat\mu_1, \hat\mu_2).$$

V. Sampling Distributions

5.1 The distribution of µ̂ − µ

Since the vector X = (X1, X2, X3)' has a joint normal distribution, µ̂ is a linear combination of the components of b, which is in turn a linear combination of the components of X̄*. As a result, µ̂ − µ follows the trivariate normal distribution N3(E(µ̂ − µ), Var(µ̂ − µ)), where

$$E(\hat\mu - \mu) = E(\hat\mu) - E(\mu) = \mu - \mu = 0, \qquad \mathrm{Var}(\hat\mu - \mu) = \mathrm{Var}(\hat\mu) = (A^{-1}D)\,\mathrm{Var}(\bar X^{*})\,(A^{-1}D)'.$$

5.2 The distribution of µ̂1 − µ̂2

Since µ̂ = (µ̂1, µ̂2, µ̂3)' has a joint normal distribution, µ̂1 − µ̂2 follows a normal distribution with mean E(µ̂1 − µ̂2) = E(µ̂1) − E(µ̂2) = µ1 − µ2 and variance Var(µ̂1 − µ̂2) = Var(µ̂1) + Var(µ̂2) − 2Cov(µ̂1, µ̂2).

VI. Simulation Study with Known ρ

In this section, a simulation study is conducted to compare the performance of the estimates in case 1, which uses only the three-dimensional complete observations, and case 2, which uses both the three-dimensional observations and all possible incomplete observations. The efficiency of the estimators is evaluated by means of the coverage probabilities and average widths of the confidence intervals, and the Type I error and power of testing H0: µ = 0 vs. Ha: µ ≠ 0. The sample sizes are chosen as n = 20, n12 = n13 = n23 = 15 and n1 = n2 = n3 = 10, and five levels of the standard deviation σ are considered: σ = 0.25, 0.5, 1, 2, 4. It is assumed that ρ is known in this section, so the true value of ρ is used; it ranges from 0.1 to 0.9. The population mean is chosen to be µ = (1, 2, 3)' when calculating the power of the test, the coverage probabilities and the average widths of the confidence intervals. The results for each setting are based on r independent Monte Carlo replications.
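As a rough illustration of this simulation design (a sketch under our own assumptions: it reuses the hypothetical mle_mu function from the Section 3.3 sketch, and the group sizes are the ones quoted above), one replication can be drawn and the two estimators compared as follows.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_groups(mu, sigma, rho, n=20, npair=15, nsingle=10):
    """Draw one replication of the seven groups (sizes are illustrative)."""
    Sigma = sigma**2 * ((1 - rho) * np.eye(3) + rho * np.ones((3, 3)))
    full = lambda m: rng.multivariate_normal(mu, Sigma, size=m)
    X = full(n)                                       # complete cases
    X12, X13, X23 = (full(npair)[:, [0, 1]], full(npair)[:, [0, 2]],
                     full(npair)[:, [1, 2]])          # bivariate groups
    x1, x2, x3 = (full(nsingle)[:, 0], full(nsingle)[:, 1],
                  full(nsingle)[:, 2])                # univariate groups
    return X, X12, X13, X23, x1, x2, x3

mu, sigma, rho = np.array([1.0, 2.0, 3.0]), 1.0, 0.5
X, X12, X13, X23, x1, x2, x3 = draw_groups(mu, sigma, rho)

m_case1 = X.mean(axis=0)                              # case 1: complete cases only
m_case2 = mle_mu(X, X12, X13, X23, x1, x2, x3, rho)   # case 2: all seven groups
print(m_case1, m_case2, sep="\n")
```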

1. Compare the confidence regions of µ

Table 1: Coverage probabilities of 95% confidence regions of µ = (1, 2, 3).

Figure 1: Comparison of coverage probabilities of the confidence region of µ (one panel for each level of σ; confidence level 0.95).

Figure 2: Absolute value of the difference between the coverage probabilities and 0.95.

It is evident from Figure 1 that, in both case 1 and case 2, the coverage probabilities of the 95% confidence regions are close to 0.95. The absolute values of the difference between the coverage probabilities and the confidence level 0.95, as shown in Figure 2, are also small in both cases.

2. Compare the probability of Type I error and power of testing H0: µ = 0 vs. Ha: µ ≠ 0

2.1 Compare the probability of Type I error

Table 2: The probability of Type I error.

Figure 3: Comparison of Type I error (α = 0.05).

Figure 4: Absolute value of the difference between the Type I error and α = 0.05.

In the case of α = 0.05, the Type I errors in both case 1 and case 2 are close to 0.05 at each level of σ. It can be noted more clearly from Figure 4 that the absolute values of the difference between the Type I errors and α = 0.05 are small in both cases.

2.2 Compare the power of testing H0: µ = 0 vs. Ha: µ ≠ 0

Table 3: The power of testing H0: µ = (0, 0, 0) vs. Ha: µ = (1, 2, 3).

Figure 5: Comparison of the power of the test (one panel for each of σ = 6, 8, 10, 12, 14).

In order to compare the power of the test, the levels of the standard deviation were chosen as σ = 6, 8, 10, 12, 14 to make the results more observable. It can be observed from Figure 5 that, at each level of σ, the power of the test is much higher in case 2 than in case 1.

3. Compare the confidence intervals of µ1

3.1 Compare the coverage probabilities of confidence intervals of µ1

It can be noted from Figure 6 and Figure 7 that, in both case 1 and case 2, the coverage probabilities of the 95% confidence intervals of µ1 are close to 0.95. The absolute values of the differences between the coverage probabilities and the confidence level 0.95 are small in both cases.

Table 4: The coverage probabilities of confidence intervals of µ1 = 1.

Figure 6: Coverage probabilities of the confidence intervals of µ1 (confidence level 0.95).

Figure 7: Absolute value of the difference between the coverage probabilities and 0.95.

3.2 Compare the average width of confidence intervals of µ1

Table 5: The average width of confidence intervals of µ1 = 1.

Figure 8: Comparison of the average widths of the confidence intervals of µ1 (one panel for each level of σ).

As presented in Figure 8, at each level of σ, the average width of the confidence intervals is smaller in case 2 than in case 1.

3.3 Compare the average variance of µ̂1

As presented in Figure 9, at each level of σ, the average variance of µ̂1 is smaller in case 2 than in case 1. This result explains the finding in Section 3.2: the average width of the confidence intervals is smaller in case 2 than in case 1.

Table 6: The average variance of µ̂1.

Figure 9: Comparison of the average variance of µ̂1.

4. Compare the confidence intervals of µ1 − µ2

4.1 The coverage probabilities of confidence intervals of µ1 − µ2 (µ1 = 1, µ2 = 2)

It can be observed from Figure 10 and Figure 11 that, in both case 1 and case 2, the coverage probabilities of the 95% confidence intervals of µ1 − µ2 are close to 0.95. The absolute values of the differences between the coverage probabilities and the confidence level 0.95 are small in both cases.

Table 7: The coverage probability of confidence intervals of µ1 − µ2 (µ1 = 1, µ2 = 2).

Figure 10: Coverage probabilities of the confidence intervals of µ1 − µ2 (confidence level 0.95).

Figure 11: Absolute value of the difference between the coverage probabilities and 0.95.

4.2 Average widths of confidence intervals of µ1 − µ2 (µ1 = 1, µ2 = 2)

Table 8: Average widths of confidence intervals of µ1 − µ2 (µ1 = 1, µ2 = 2).

Figure 12: Comparison of the average widths of the confidence intervals of µ1 − µ2 (one panel for each level of σ).

Figure 12 shows that the average width of the confidence intervals of µ1 − µ2 decreases as ρ increases. The average widths in case 2 are always smaller than those in case 1 at each level of σ.

4.3 Compare the average variance of µ̂1 − µ̂2

As presented in Figure 13, at each level of σ, the average variance of µ̂1 − µ̂2 is smaller in case 2 than in case 1. This result explains the finding in Section 4.2: the average width of the confidence intervals is smaller in case 2 than in case 1.

Table 9: The average variance of µ̂1 − µ̂2.

Figure 13: Comparison of the average variance of µ̂1 − µ̂2.

VII. Simulation Study with Unknown ρ

It is assumed that ρ is unknown in this section, so an estimate of ρ is used. The formula for estimating ρ by the maximum likelihood method from the three-dimensional complete data is given by Orjuela (2013):

$$\hat\rho = \frac{\sum_{i=1}^{n}\big[(x_{i1}-\bar x_{.1})(x_{i2}-\bar x_{.2}) + (x_{i1}-\bar x_{.1})(x_{i3}-\bar x_{.3}) + (x_{i2}-\bar x_{.2})(x_{i3}-\bar x_{.3})\big]}{\sum_{i=1}^{n}\big[(x_{i1}-\bar x_{.1})^2 + (x_{i2}-\bar x_{.2})^2 + (x_{i3}-\bar x_{.3})^2\big]}$$

Everything else is the same as in Section VI; specifically, the simulation in this part is conducted to compare the performance of the estimates in case 1, using only the three-dimensional observations, and case 2, using both the three-dimensional observations and all possible incomplete observations.
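A direct transcription of this estimator (a minimal sketch; the function name is ours) is short:

```python
import numpy as np

def rho_hat(X):
    """Estimate rho from the (n,3) complete cases, per the displayed formula."""
    d = X - X.mean(axis=0)                 # residuals about the column means
    cross = np.sum(d[:, 0]*d[:, 1] + d[:, 0]*d[:, 2] + d[:, 1]*d[:, 2])
    return cross / np.sum(d**2)
```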

1. Compare the confidence regions of µ

Table 10: Coverage probabilities of confidence regions of µ = (1, 2, 3).

Figure 14: Comparison of the coverage probabilities of the confidence regions of µ (confidence level 0.95).

Figure 15: Absolute value of the difference between the coverage probabilities and 0.95.

Using the estimate of ρ makes the coverage probabilities a little lower than the confidence level 0.95 in both case 1 and case 2. It can be noted from Figure 15 that the absolute values of the difference between the coverage probabilities and 0.95 remain small at each level of σ.

2. Compare the probability of Type I error and power of testing H0: µ = 0 vs. Ha: µ ≠ 0

2.1 Compare the probability of Type I error

Table 11: The probability of Type I error.

Figure 16: Comparison of the Type I error (α = 0.05).

Figure 17: Absolute value of the difference between the Type I error and α = 0.05.

When using the estimate of ρ, the Type I errors in both case 1 and case 2 are a little higher than α = 0.05 at each level of σ. It can be observed more clearly from Figure 17 that the absolute values of the difference between the Type I errors and α = 0.05 remain small in both cases.

2.2 Compare the power of testing H0: µ = 0 vs. Ha: µ ≠ 0

Table 12: The power of testing H0: µ = 0 vs. Ha: µ = (1, 2, 3).

Figure 18: Comparison of the power of testing H0: µ = 0 vs. Ha: µ = (1, 2, 3).

With regard to the power of the test, it can be observed that, when the level of σ is fixed, case 2 has higher power than case 1.

3. Compare the confidence intervals of µ1

3.1 Compare the coverage probabilities of confidence intervals of µ1

The coverage probabilities of the 95% confidence intervals are close to 0.95 in all cases, and the differences between the coverage probabilities and the confidence level 0.95 are small.

Table 13: The coverage probabilities of confidence intervals of µ1 = 1.

Figure 19: Coverage probabilities of the confidence intervals of µ1 = 1 (confidence level 0.95).

Figure 20: Absolute value of the difference between the coverage probabilities and 0.95.

3.2 Compare the average width of confidence intervals of µ1

Table 14: The average width of confidence intervals of µ1 = 1.

Figure 21: Comparison of the average widths of the confidence intervals of µ1 = 1.

With regard to the average width of the confidence intervals of µ1 = 1, the confidence intervals in case 2 are always narrower than those in case 1.

3.3 Compare the average variance of µ̂1

As presented in Figure 22, at each level of σ, the average variance of µ̂1 is smaller in case 2 than in case 1. This result explains the finding in Section 3.2: the average width of the confidence intervals is smaller in case 2 than in case 1.

Table 15: The average variance of µ̂1.

Figure 22: Comparison of the average variance of µ̂1.

4. Compare the confidence intervals of µ1 − µ2

4.1 The coverage probabilities of confidence intervals of µ1 − µ2 (µ1 = 1, µ2 = 2)

Using the estimate of ρ makes the coverage probabilities a little lower than the confidence level 0.95. It can be noted from Figure 24 that the absolute values of the difference remain small at each level of σ.

Table 16: The coverage probabilities of confidence intervals of µ1 − µ2 (µ1 = 1, µ2 = 2).

Figure 23: Coverage probabilities of the confidence intervals of µ1 − µ2 (confidence level 0.95).

Figure 24: Absolute value of the difference between the coverage probabilities and 0.95.

4.2 The average width of confidence intervals of µ1 − µ2 (µ1 = 1, µ2 = 2)

Table 17: The average width of confidence intervals of µ1 − µ2 (µ1 = 1, µ2 = 2).

Figure 25: Comparison of the average widths of the confidence intervals of µ1 − µ2.

With regard to the average width of the confidence intervals of µ1 − µ2, the confidence intervals in case 2 are always narrower than those in case 1.

4.3 Compare the average variance of µ̂1 − µ̂2

As presented in Figure 26, at each level of σ, the average variance of µ̂1 − µ̂2 is smaller in case 2 than in case 1. This result explains the finding in Section 4.2: the average width of the confidence intervals is smaller in case 2 than in case 1.

Table 18: The average variance of µ̂1 − µ̂2.

Figure 26: Comparison of the average variance of µ̂1 − µ̂2.

VIII. Conclusion

It can be concluded from the numerical study that, compared to estimators computed after omitting observations with missing data, the estimators considered in this thesis generally lead to better performance in the following respects: higher power of the test and shorter confidence intervals, while keeping the coverage probabilities close to the nominal confidence level and the Type I error close to the desired level α. Furthermore, when ρ is unknown, using the estimate of ρ leads to the same conclusion.


More information

Introduction to Maximum Likelihood Estimation

Introduction to Maximum Likelihood Estimation Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

An Introduction to Multivariate Statistical Analysis

An Introduction to Multivariate Statistical Analysis An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

Machine Learning, Fall 2012 Homework 2

Machine Learning, Fall 2012 Homework 2 0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0

More information

Approximation of Posterior Means and Variances of the Digitised Normal Distribution using Continuous Normal Approximation

Approximation of Posterior Means and Variances of the Digitised Normal Distribution using Continuous Normal Approximation Approximation of Posterior Means and Variances of the Digitised Normal Distribution using Continuous Normal Approximation Robert Ware and Frank Lad Abstract All statistical measurements which represent

More information

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes: Practice Exam 1 1. Losses for an insurance coverage have the following cumulative distribution function: F(0) = 0 F(1,000) = 0.2 F(5,000) = 0.4 F(10,000) = 0.9 F(100,000) = 1 with linear interpolation

More information

Testing equality of two mean vectors with unequal sample sizes for populations with correlation

Testing equality of two mean vectors with unequal sample sizes for populations with correlation Testing equality of two mean vectors with unequal sample sizes for populations with correlation Aya Shinozaki Naoya Okamoto 2 and Takashi Seo Department of Mathematical Information Science Tokyo University

More information

Empirical Likelihood Inference for Two-Sample Problems

Empirical Likelihood Inference for Two-Sample Problems Empirical Likelihood Inference for Two-Sample Problems by Ying Yan A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Statistics

More information

BAYESIAN RELIABILITY ANALYSIS FOR THE GUMBEL FAILURE MODEL. 1. Introduction

BAYESIAN RELIABILITY ANALYSIS FOR THE GUMBEL FAILURE MODEL. 1. Introduction BAYESIAN RELIABILITY ANALYSIS FOR THE GUMBEL FAILURE MODEL CHRIS P. TSOKOS AND BRANKO MILADINOVIC Department of Mathematics Statistics, University of South Florida, Tampa, FL 3362 ABSTRACT. The Gumbel

More information

Working correlation selection in generalized estimating equations

Working correlation selection in generalized estimating equations University of Iowa Iowa Research Online Theses and Dissertations Fall 2011 Working correlation selection in generalized estimating equations Mi Jin Jang University of Iowa Copyright 2011 Mijin Jang This

More information

To Estimate or Not to Estimate?

To Estimate or Not to Estimate? To Estimate or Not to Estimate? Benjamin Kedem and Shihua Wen In linear regression there are examples where some of the coefficients are known but are estimated anyway for various reasons not least of

More information

TAMS39 Lecture 2 Multivariate normal distribution

TAMS39 Lecture 2 Multivariate normal distribution TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution

More information

Terminology Suppose we have N observations {x(n)} N 1. Estimators as Random Variables. {x(n)} N 1

Terminology Suppose we have N observations {x(n)} N 1. Estimators as Random Variables. {x(n)} N 1 Estimation Theory Overview Properties Bias, Variance, and Mean Square Error Cramér-Rao lower bound Maximum likelihood Consistency Confidence intervals Properties of the mean estimator Properties of the

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

The Delta Method and Applications

The Delta Method and Applications Chapter 5 The Delta Method and Applications 5.1 Local linear approximations Suppose that a particular random sequence converges in distribution to a particular constant. The idea of using a first-order

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Stochastic Realization of Binary Exchangeable Processes

Stochastic Realization of Binary Exchangeable Processes Stochastic Realization of Binary Exchangeable Processes Lorenzo Finesso and Cecilia Prosdocimi Abstract A discrete time stochastic process is called exchangeable if its n-dimensional distributions are,

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models

MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models 1/13 MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models Dominique Guillot Departments of Mathematical Sciences University of Delaware May 4, 2016 Recall

More information

Chapter 4: Asymptotic Properties of the MLE (Part 2)

Chapter 4: Asymptotic Properties of the MLE (Part 2) Chapter 4: Asymptotic Properties of the MLE (Part 2) Daniel O. Scharfstein 09/24/13 1 / 1 Example Let {(R i, X i ) : i = 1,..., n} be an i.i.d. sample of n random vectors (R, X ). Here R is a response

More information

Confidence Intervals for the Ratio of Two Exponential Means with Applications to Quality Control

Confidence Intervals for the Ratio of Two Exponential Means with Applications to Quality Control Western Kentucky University TopSCHOLAR Student Research Conference Select Presentations Student Research Conference 6-009 Confidence Intervals for the Ratio of Two Exponential Means with Applications to

More information

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS Communications in Statistics - Simulation and Computation 33 (2004) 431-446 COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS K. Krishnamoorthy and Yong Lu Department

More information

Estimators as Random Variables

Estimators as Random Variables Estimation Theory Overview Properties Bias, Variance, and Mean Square Error Cramér-Rao lower bound Maimum likelihood Consistency Confidence intervals Properties of the mean estimator Introduction Up until

More information

Department of Statistical Science FIRST YEAR EXAM - SPRING 2017

Department of Statistical Science FIRST YEAR EXAM - SPRING 2017 Department of Statistical Science Duke University FIRST YEAR EXAM - SPRING 017 Monday May 8th 017, 9:00 AM 1:00 PM NOTES: PLEASE READ CAREFULLY BEFORE BEGINNING EXAM! 1. Do not write solutions on the exam;

More information

COMS 4721: Machine Learning for Data Science Lecture 1, 1/17/2017

COMS 4721: Machine Learning for Data Science Lecture 1, 1/17/2017 COMS 4721: Machine Learning for Data Science Lecture 1, 1/17/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University OVERVIEW This class will cover model-based

More information

Parameter Estimation

Parameter Estimation Parameter Estimation Consider a sample of observations on a random variable Y. his generates random variables: (y 1, y 2,, y ). A random sample is a sample (y 1, y 2,, y ) where the random variables y

More information

A Distribution of the First Order Statistic When the Sample Size is Random

A Distribution of the First Order Statistic When the Sample Size is Random East Tennessee State University Digital Commons @ East Tennessee State University Electronic Theses and Dissertations 5-2017 A Distribution of the First Order Statistic When the Sample Size is Random Vincent

More information

FIRST YEAR EXAM Monday May 10, 2010; 9:00 12:00am

FIRST YEAR EXAM Monday May 10, 2010; 9:00 12:00am FIRST YEAR EXAM Monday May 10, 2010; 9:00 12:00am NOTES: PLEASE READ CAREFULLY BEFORE BEGINNING EXAM! 1. Do not write solutions on the exam; please write your solutions on the paper provided. 2. Put the

More information

Jackknife Empirical Likelihood for the Variance in the Linear Regression Model

Jackknife Empirical Likelihood for the Variance in the Linear Regression Model Georgia State University ScholarWorks @ Georgia State University Mathematics Theses Department of Mathematics and Statistics Summer 7-25-2013 Jackknife Empirical Likelihood for the Variance in the Linear

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample

More information

Bayesian Inference: Probit and Linear Probability Models

Bayesian Inference: Probit and Linear Probability Models Utah State University DigitalCommons@USU All Graduate Plan B and other Reports Graduate Studies 5-1-2014 Bayesian Inference: Probit and Linear Probability Models Nate Rex Reasch Utah State University Follow

More information

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables.

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables. Random vectors Recall that a random vector X = X X 2 is made up of, say, k random variables X k A random vector has a joint distribution, eg a density f(x), that gives probabilities P(X A) = f(x)dx Just

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

CS 195-5: Machine Learning Problem Set 1

CS 195-5: Machine Learning Problem Set 1 CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in

More information

You can compute the maximum likelihood estimate for the correlation

You can compute the maximum likelihood estimate for the correlation Stat 50 Solutions Comments on Assignment Spring 005. (a) _ 37.6 X = 6.5 5.8 97.84 Σ = 9.70 4.9 9.70 75.05 7.80 4.9 7.80 4.96 (b) 08.7 0 S = Σ = 03 9 6.58 03 305.6 30.89 6.58 30.89 5.5 (c) You can compute

More information

Gaussian Models (9/9/13)

Gaussian Models (9/9/13) STA561: Probabilistic machine learning Gaussian Models (9/9/13) Lecturer: Barbara Engelhardt Scribes: Xi He, Jiangwei Pan, Ali Razeen, Animesh Srivastava 1 Multivariate Normal Distribution The multivariate

More information