Applied Mathematical Sciences, Vol. 3, 2009, no. 54, 695-702

Testing a Normal Covariance Matrix for Small Samples with Monotone Missing Data

Evelina Veleva

Rousse University "A. Kanchev"
Department of Numerical Methods and Statistics
Bulgaria, 7017 Rousse, 8 Studentska str., room 44
eveleva@ru.acad.bg

Abstract

We consider samples with monotone missing data, drawn from a normal population, to test whether the covariance matrix is equal to a given positive definite matrix. We propose an imputation procedure for the missing data and give the exact distribution of the corresponding likelihood ratio test statistic from the classical complete case.

Mathematics Subject Classification: 62H10, 62H15

Keywords: monotone missing data, Bellman gamma distribution, covariance matrix, hypotheses testing

1 Introduction

The problem of missing data is an important applied problem, because missing values are encountered in many practical situations (see [4]). Two commonly used approaches to the analysis of incomplete data are the likelihood based approach and multiple imputation. The imputation method imputes the missing data to form a complete data set and then uses the standard methods available for complete data analysis. For a good exposition of imputation procedures and of the validity of imputation inferences in practice, we refer to [6].

In this paper we consider samples with a monotone missing data pattern. Let (X_1, …, X_n)^t be a random vector with multivariate normal distribution N_n(μ, Σ), where the mean vector μ and the covariance matrix Σ are unknown. Suppose that we have k_1 + ⋯ + k_n independent observations, k_1 of which are on (X_1, …, X_n)^t, k_2 on (X_2, …, X_n)^t and so on, k_n on the random variable X_n. Assume that k_j ≥ 0 and m_j = k_1 + ⋯ + k_j > j, j = 1, …, n. The data
can be written in the following pattern, known as a monotone pattern:

    x_{1,1} … x_{1,m_1}
    ⋮
    x_{n-1,1} … x_{n-1,m_1}  x_{n-1,m_1+1} … x_{n-1,m_{n-1}}
    x_{n,1}   … x_{n,m_1}    x_{n,m_1+1}   … x_{n,m_{n-1}}  x_{n,m_{n-1}+1} … x_{n,m_n}     (1)

In the literature on inference for μ and Σ, it is noticeable that the exact distributions of μ̂ and Σ̂, the maximum likelihood estimators of μ and Σ, have remained unknown. This problem is basic to inference with incomplete data when large samples are infeasible or impractical (see [1]). In [1] the authors initiate a program of research on inference for μ and Σ with the goal of deriving explicit results analogous to those existing in the classical complete case. In this paper we consider the hypotheses H_0: Σ = Σ_0 against H_a: Σ ≠ Σ_0, propose an imputation procedure for the missing data in (1) and give the exact distribution of the corresponding likelihood ratio test statistic from the classical complete case.

2 Preliminary Notes

We shall use the following known propositions, which can be found in [5].

Proposition 2.1. Let the n×1 random vector x be normally distributed according to x ~ N_n(μ, Σ); then the m×1 random vector y, obtained by the linear transformation y = Ax + c, where A denotes an m×n matrix of constants with full rank m and c an m×1 vector of constants, has the normal distribution y ~ N_m(Aμ + c, AΣA^t).

Proposition 2.2. Let the n×1 random vector x be normally distributed according to x ~ N_n(μ, Σ); then x^t A x, where A is an n×n matrix of constants, has noncentral chi-square distribution χ²(rank A, μ^t A μ) if and only if the matrix AΣ is idempotent.

Proposition 2.3. If A is idempotent, then rank A = tr A.

Proposition 2.4. If the n×n matrix A with rank A = r is idempotent, then I_n − A is also idempotent with rank(I_n − A) = n − r.

Proposition 2.5. Let the n×1 random vector x be normally distributed according to x ~ N_n(μ, Σ); then the linear form Ax and the quadratic form x^t B x with a positive definite or positive semidefinite matrix B are independent if and only if AΣB = 0.
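The last three propositions above can be checked numerically for a projection matrix of the kind used later in the proofs. The following is a minimal NumPy sketch of ours, with hypothetical dimensions r and m standing in for the i−1 and m_i of Section 3:

```python
import numpy as np

# Hypothetical dimensions: a full-row-rank r x m matrix of constants,
# playing the role of the (i-1) x m_i matrices used in the proofs.
rng = np.random.default_rng(0)
r, m = 3, 7
X = rng.standard_normal((r, m))   # full row rank with probability 1

# A = X^t (X X^t)^{-1} X is idempotent (projector onto the row space of X)
A = X.T @ np.linalg.inv(X @ X.T) @ X
assert np.allclose(A @ A, A)

# Proposition on idempotent matrices: rank A = tr A
assert abs(np.trace(A) - r) < 1e-8

# The complementary matrix I_m - A is idempotent with rank m - r
B = np.eye(m) - A
assert np.allclose(B @ B, B)
assert abs(np.trace(B) - (m - r)) < 1e-8

# X (I_m - A) = 0: the orthogonality condition under which the linear
# form Xz and the quadratic form z^t (I_m - A) z are independent
assert np.allclose(X @ B, 0)
print("all checks passed")
```

The same three facts are exactly what the proof of the first theorem below uses to identify the conditional chi-square distributions.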
Let A be a real square matrix of order n. Let α and β be nonempty subsets of the set N_n = {1, …, n}. By A[α, β] we denote the submatrix of A, composed of the rows with numbers from α and the columns with numbers from β. When β = α, A[α, α] is denoted simply by A[α].

The Bellman gamma distribution is a matrix variate distribution, which is a generalization of the Wishart and the matrix gamma distributions. The next definition is given in [3]. By Γ_n(a_1, …, a_n) is denoted the generalized multivariate gamma function,

    Γ_n(a_1, …, a_n) = π^{n(n−1)/4} ∏_{j=1}^{n} Γ(a_j − (j−1)/2),  for a_j > (j−1)/2, j = 1, …, n.

Definition 2.6. A random positive definite matrix U (n×n) is said to follow the Bellman gamma type I distribution, denoted by U ~ BG^I_n(a_1, …, a_n; C), if its probability density function is given by

    [∏_{i=1}^{n} (det C[{i, …, n}])^{a_i − a_{i−1}}] (det U)^{a_n − (n+1)/2} etr(−CU) / [Γ_n(a_1, …, a_n) ∏_{i=2}^{n} (det U[{1, …, i−1}])^{a_i − a_{i−1}}],

where C (n×n) is a positive definite constant matrix, a_0 = 0 and a_j > (j−1)/2, j = 1, …, n.

3 Main Results

Let us replace the missing values in (1) by zero and denote the obtained matrix by X,

    X = ( x_{1,1}   … x_{1,m_1}    0 … 0                      0 … 0
          ⋮
          x_{n-1,1} … x_{n-1,m_{n-1}}                          0 … 0
          x_{n,1}   … x_{n,m_{n-1}}  x_{n,m_{n-1}+1} … x_{n,m_n}     ).     (2)

Theorem 3.1. Let the matrix X, defined by (2), present the observations (1) on a random vector (X_1, …, X_n)^t with multivariate standard normal distribution N_n(0, I_n). Then the matrix W = XX^t has Bellman gamma type I distribution BG^I_n(m_1/2, …, m_n/2; (1/2) I_n).

Proof. Let W = (w_{i,j}) be the matrix W = XX^t and let w_i = (w_{1,i}, …, w_{i−1,i})^t, i = 2, …, n. Let us denote by z_i the vector of available observations on X_i, i.e. z_i = (x_{i,1}, …, x_{i,m_i})^t, i = 1, …, n. Let X_i be the matrix X_i = X[{1, …, i−1}, {1, …, m_i}], i = 2, …, n. The distribution of z_i is N_{m_i}(0, I_{m_i}), i = 1, …, n. Since w_i = X_i z_i and X_i X_i^t =
W[{1, …, i−1}], according to Proposition 2.1 the distribution of w_i, i = 2, …, n, is N_{i−1}(0, W[{1, …, i−1}]) under the condition that the matrix X_i has fixed elements. It is easy to see that

    W[{1, …, i}] = ( W[{1, …, i−1}]   w_i
                     w_i^t            w_{i,i} ).

Let

    w_(i) = w_{i,i} − w_i^t (W[{1, …, i−1}])^{−1} w_i,     (3)

i.e.

    w_(i) = det W[{1, …, i}] / det W[{1, …, i−1}],  i = 2, …, n.     (4)

For i = 2, …, n, w_(i) can be written in the form

    w_(i) = z_i^t z_i − z_i^t X_i^t (X_i X_i^t)^{−1} X_i z_i = z_i^t [I_{m_i} − X_i^t (X_i X_i^t)^{−1} X_i] z_i.

Since X_i [I_{m_i} − X_i^t (X_i X_i^t)^{−1} X_i] = X_i − X_i = 0, according to Proposition 2.5, w_i is independent of w_(i) under the condition that the matrix X_i has fixed elements. Additionally, since the matrix X_i^t (X_i X_i^t)^{−1} X_i is idempotent, from Proposition 2.3 we have that

    rank(X_i^t (X_i X_i^t)^{−1} X_i) = tr(X_i^t (X_i X_i^t)^{−1} X_i) = tr(I_{i−1}) = i − 1.

Hence, according to Proposition 2.4, the matrix I_{m_i} − X_i^t (X_i X_i^t)^{−1} X_i is also idempotent with rank m_i − i + 1. Now, applying Proposition 2.2, we get that w_(i) has chi-square distribution χ²(m_i − i + 1), again under the condition that the matrix X_i has fixed elements. The conditional distributions of w_(i) and w_i depend only on the elements of the matrix W[{1, …, i−1}]. Therefore the joint density of w_{1,1}, w_(2), w_2, …, w_(n), w_n will have the form

    [w_{1,1}^{m_1/2 − 1} e^{−w_{1,1}/2} / (2^{m_1/2} Γ(m_1/2))] ∏_{i=2}^{n} [w_(i)^{(m_i − i + 1)/2 − 1} e^{−w_(i)/2} / (2^{(m_i − i + 1)/2} Γ((m_i − i + 1)/2))] · [e^{−w_i^t (W[{1, …, i−1}])^{−1} w_i / 2} / ((2π)^{(i−1)/2} (det W[{1, …, i−1}])^{1/2})].

By the transformation of the variables w_(i) into w_{i,i} by means of (3), with det J = 1, we get the distribution of the elements w_{i,j} of W, which using (4) can be written in the form

    (det W)^{[m_n − (n+1)]/2} etr(−(1/2) W) / [2^{(m_1 + ⋯ + m_n)/2} Γ_n(m_1/2, …, m_n/2) ∏_{i=2}^{n} (det W[{1, …, i−1}])^{(m_i − m_{i−1})/2}].
Consequently, according to Definition 2.6, W ~ BG^I_n(m_1/2, …, m_n/2; (1/2) I_n). □

Let z_i be the vector of available observations on X_i in (1), i.e. z_i = (x_{i,1}, …, x_{i,m_i})^t, i = 1, …, n. Let us denote by z̄_i the mean of the elements of z_i, i = 1, …, n. Consider the data matrix

    X̄ = ( x_{1,1}   … x_{1,m_1}    z̄_1 … z̄_1
          ⋮
          x_{n-1,1} … x_{n-1,m_{n-1}}  z̄_{n-1} … z̄_{n-1}
          x_{n,1}   … x_{n,m_{n-1}}  x_{n,m_{n-1}+1} … x_{n,m_n} )     (5)

in which we substitute z̄_1, …, z̄_{n−1} for the missing values in the 1st, …, (n−1)st row, respectively, of the data in (1).

Theorem 3.2. Let the matrix X̄, defined by (5), present the observations (1) on a random vector (X_1, …, X_n)^t with multivariate normal distribution N_n(μ, I_n). Let x_i, i = 1, …, m_n, be the column vectors of the matrix X̄, let x̄ be the vector x̄ = (z̄_1, …, z̄_n)^t and let S be the matrix

    S = ∑_{i=1}^{m_n} (x_i − x̄)(x_i − x̄)^t.     (6)

Then x̄ and S are independent, x̄ ~ N_n(μ, diag(1/m_1, …, 1/m_n)) and S has Bellman gamma distribution BG^I_n((m_1 − 1)/2, …, (m_n − 1)/2; (1/2) I_n).

Proof. Let Θ be an orthogonal m_n × m_n matrix of the form

    Θ = ( 1/√m_n   θ_{1,2}   … θ_{1,m_1}     θ_{1,m_1+1}     … θ_{1,m_2}     …  θ_{1,m_n}
          ⋮
          1/√m_n   θ_{m_1,2} … θ_{m_1,m_1}   θ_{m_1,m_1+1}   … θ_{m_1,m_2}   …  θ_{m_1,m_n}
          1/√m_n   0         … 0             θ_{m_1+1,m_1+1} … θ_{m_1+1,m_2} …  θ_{m_1+1,m_n}
          ⋮
          1/√m_n   0         … 0             0               … 0             …  θ_{m_n,m_n} ).

All elements in the first column of Θ are equal to 1/√m_n. In columns from 2 to m_1, the last m_n − m_1 elements are equal to zero. In columns from m_j + 1
to m_{j+1}, j = 1, …, n−1, the last m_n − m_{j+1} elements are equal to zero and the first m_j elements are equal to each other. It can easily be checked that such an orthogonal matrix exists and even is not unique.

Let Y = (y_{i,j}) be the matrix Y = X̄Θ. Then YY^t = X̄ΘΘ^tX̄^t = X̄X̄^t. Let us denote by y_i, i = 1, …, m_n, the column vectors of Y. It is easy to see that y_1 = √m_n x̄. Therefore the matrix S, defined by (6), can be written in the form

    S = ∑_{i=1}^{m_n} x_i x_i^t − m_n x̄ x̄^t = X̄X̄^t − y_1 y_1^t = YY^t − y_1 y_1^t = ∑_{i=2}^{m_n} y_i y_i^t.     (7)

Since the matrix Θ is orthogonal, θ_1^t θ_i = 0 for i = 2, …, m_n, where θ_i, i = 1, …, m_n, denote the column vectors of Θ. Therefore the sum of the elements of θ_i equals zero, i = 2, …, m_n. Hence it follows that E(y_i) = 0, i = 2, …, m_n. It can be checked that in the matrix Y all elements lying in the places of the missing data in (1) are equal to zero.

Let y_{k,s}, s ≥ 2, be a nonzero element of Y. Then s ≤ m_k; therefore the last m_n − m̄ elements of the vector θ_s are equal to zero, where m̄ (m̄ ≤ m_k) denotes the number of its possibly nonzero elements. Then

    y_{k,s} = x_{k,1} θ_{1,s} + ⋯ + x_{k,m̄} θ_{m̄,s}.     (8)

The distribution of y_{k,s}, as a linear combination of independent normally distributed random variables, is also normal. The variance of y_{k,s} is

    Var(y_{k,s}) = θ_{1,s}² Var(x_{k,1}) + ⋯ + θ_{m̄,s}² Var(x_{k,m̄}) = θ_{1,s}² + ⋯ + θ_{m̄,s}² = 1.

For k = 1, …, n,

    y_{k,1} = √m_n z̄_k = (√m_n / m_k)(x_{k,1} + ⋯ + x_{k,m_k}).     (9)

Consequently y_{k,1} is also a linear combination of the observations in the k-th row of (1) and hence is normally distributed,

    E(y_{k,1}) = (√m_n / m_k)[E(x_{k,1}) + ⋯ + E(x_{k,m_k})] = √m_n μ_k,

where μ_k is the k-th coordinate of the mean vector μ, and

    Var(y_{k,1}) = (m_n / m_k²)[Var(x_{k,1}) + ⋯ + Var(x_{k,m_k})] = m_n / m_k.

Since (X_1, …, X_n)^t ~ N_n(μ, I_n), the vectors z_1, …, z_n are independent. Consequently, if k_1, …, k_p are different integers from the interval [1, n] and y_{k_1,s_1}, …, y_{k_p,s_p} are nonzero elements of the matrix Y, then they are independent. Hence for i = 2, …, m_n the nonzero elements of the vector y_i have joint
multivariate normal distribution. For i = 2, …, m_n it is N_{n−j}(0, I_{n−j}), where j is the biggest integer such that m_j < i (m_0 = 1). Since y_1 = √m_n x̄, the distribution of x̄ is N_n(μ, diag(1/m_1, …, 1/m_n)).

From (8) and (9) it follows that if y_{k,s_1}, …, y_{k,s_p} are arbitrary nonzero elements of Y, then they have joint multivariate normal distribution. Moreover, (8) and (9) can be written in the form

    y_{k,s} = (x_{k,1} − μ_k) θ_{1,s} + ⋯ + (x_{k,m̄} − μ_k) θ_{m̄,s},  s ≥ 2,
    y_{k,1} = (√m_n / m_k)[(x_{k,1} − μ_k) + ⋯ + (x_{k,m_k} − μ_k)] + √m_n μ_k.

Hence for s ≥ 2

    Cov(y_{k,1}, y_{k,s}) = E(y_{k,1} y_{k,s}) = (√m_n / m_k)[E(x_{k,1} − μ_k)² θ_{1,s} + ⋯ + E(x_{k,m̄} − μ_k)² θ_{m̄,s}]
        = (√m_n / m_k)(θ_{1,s} + ⋯ + θ_{m̄,s}) = 0,

and for 2 ≤ s < q

    Cov(y_{k,s}, y_{k,q}) = E(y_{k,s} y_{k,q}) = E(x_{k,1} − μ_k)² θ_{1,s} θ_{1,q} + ⋯ + E(x_{k,m̄} − μ_k)² θ_{m̄,s} θ_{m̄,q}
        = θ_{1,s} θ_{1,q} + ⋯ + θ_{m̄,s} θ_{m̄,q} = 0.

Therefore y_{k,s_1}, …, y_{k,s_p} are independent and hence the vectors y_2, …, y_{m_n} are independent. Consequently, from (7) and Theorem 3.1, the Theorem follows. □

For the data in (1), let us consider the hypotheses H_0: Σ = Σ_0 against H_a: Σ ≠ Σ_0, where Σ_0 is an arbitrary positive definite matrix of size n. It is shown in [4] that the testing problem is invariant under a suitable transformation of the data in (1), and without loss of generality we can assume that Σ_0 = I_n, where I_n is the identity matrix of size n. It is easy to see that under H_0: Σ = I_n the maximum likelihood estimations for the missing values in the j-th row in (1) are equal to z̄_j, j = 1, …, n−1. The matrix S/(m_n − 1), where S is defined by (6), is actually the empirical covariance matrix obtained from the data matrix (5). Let us consider the modified likelihood ratio test statistic λ,

    λ = (e/m_n)^{n(m_n−1)/2} (det S)^{(m_n−1)/2} e^{−(tr S)/2},

which is unbiased in the classical case of a fully observed data matrix (see [2]).

Theorem 3.3. Under H_0: Σ = I_n, λ is distributed as the product

    K (ζ_1 ⋯ ζ_{n−1})^{(m_n−1)/2} η_1 ⋯ η_n,
where K is the constant K = (e/m_n)^{n(m_n−1)/2} 2^{n(m_n−1)/2}, ζ_1, …, ζ_{n−1}, η_1, …, η_n are mutually independent random variables, ζ_j has Beta distribution Beta((m_{j+1} − j)/2, j/2), j = 1, …, n−1, and η_j = ξ_j^{(m_n−1)/2} e^{−ξ_j}, where ξ_j is Gamma distributed G((m_j − 1)/2, 1), j = 1, …, n.

The proof of Theorem 3.3 will appear in a subsequent paper on stochastic representations of the Bellman gamma distribution. Since Theorem 3.3 gives the exact distribution of the test statistic λ, the test procedure suggested in this paper is proper for small samples.

References

[1] W. Chang and D. St. P. Richards, Finite-sample inference with monotone incomplete multivariate normal data I, J. Multivariate Anal., In Press, Corrected Proof, Available online 14 May 2009.

[2] N.C. Giri, Multivariate Statistical Analysis, Marcel Dekker Inc., New York, 2004.

[3] A.K. Gupta and D.K. Nagar, Matrix Variate Distributions, Chapman & Hall/CRC, 2000.

[4] J. Hao and K. Krishnamoorthy, Inferences on a normal covariance matrix and generalized variance with monotone missing data, J. Multivariate Anal., 78 (2001), 62-82.

[5] K.R. Koch, Estimation and Hypothesis Testing in Linear Models, Springer-Verlag, Berlin, 1999.

[6] R.J.A. Little and D.B. Rubin, Statistical Analysis With Missing Data, 2nd edition, Wiley-Interscience, Hoboken, NJ, 2002.

Received: May, 2009
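To make the imputation procedure and the statistic λ concrete, here is a minimal NumPy sketch of ours, assuming the formula for λ given above; the function name `lambda_stat` and the sample sizes m_1, m_2, m_3 are illustrative choices, not part of the paper:

```python
import numpy as np

def lambda_stat(rows):
    """Modified LRT statistic for H0: Sigma = I_n from monotone data.

    rows[j] holds the m_{j+1} available observations on X_{j+1};
    monotone pattern: len(rows[0]) <= ... <= len(rows[-1]).
    """
    n = len(rows)
    m_n = len(rows[-1])
    # data matrix (5): missing values in row j replaced by the row mean
    Xbar = np.vstack([np.concatenate([z, np.full(m_n - z.size, z.mean())])
                      for z in rows])
    xbar = Xbar.mean(axis=1, keepdims=True)       # x-bar = (z1-bar, ..., zn-bar)^t
    S = (Xbar - xbar) @ (Xbar - xbar).T           # matrix (6)
    # evaluate lambda on the log scale for numerical stability
    _, logdet = np.linalg.slogdet(S)
    log_lam = (n * (m_n - 1) / 2) * (1 - np.log(m_n)) \
              + ((m_n - 1) / 2) * logdet - np.trace(S) / 2
    return float(np.exp(log_lam)), S

rng = np.random.default_rng(1)
m = [5, 8, 12]                                    # m_1 <= m_2 <= m_3
rows = [rng.standard_normal(mi) for mi in m]      # data under H0: Sigma = I_3
lam, S = lambda_stat(rows)
assert lam > 0 and np.isfinite(lam)
```

A Monte Carlo loop over such simulated samples, combined with the exact representation of Theorem 3.3, could then be used to check small-sample critical values for λ.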