Examensarbete (Degree Thesis): Explicit Estimators for a Banded Covariance Matrix in a Multivariate Normal Distribution. Emil Karlsson


Examensarbete: Explicit Estimators for a Banded Covariance Matrix in a Multivariate Normal Distribution. Emil Karlsson. LiTH - MAT - EX / SE


Explicit Estimators for a Banded Covariance Matrix in a Multivariate Normal Distribution
Mathematical Statistics, Linköpings Universitet
Emil Karlsson
LiTH - MAT - EX / SE
Examensarbete: 16 hp
Level: G2
Supervisor: Martin Singull, Mathematical Statistics, Linköpings Universitet
Examiner: Martin Singull, Mathematical Statistics, Linköpings Universitet
Linköping: January 2014


Abstract

The problem of estimating the mean and covariances of a multivariate normally distributed random vector has been studied in many forms. This thesis focuses on the estimators proposed in [15] for a banded covariance structure with m-dependence. It presents the previous results for the estimator and rewrites the estimator when m = 1, thus making it easier to analyze. This leads to an adjustment, and a proposition for an unbiased estimator can be presented. A new and easier proof of consistency is then given. This theory is later generalized to a general linear model, where corresponding theorems and propositions are stated to establish unbiasedness and consistency. In the last chapter, simulations with the previous and the new estimator verify that the theoretical results indeed make an impact.

Keywords: Banded covariance matrices, covariance matrix estimation, explicit estimators, multivariate normal distribution, general linear model.


Acknowledgements

First of all, I want to thank my supervisor Martin Singull. You introduced me to the world of mathematical statistics and have inspired me a lot. You have read, helped and commented on my work for a long time and have always been helpful when I ran into questions, whether they were about style, the thesis or mathematics in general. Further, I would like to thank my opponent Jonas Granholm for the comments regarding the thesis and for the time taken to improve it. I also want to thank Tahere for your encouragement and continuous support. I want to thank mum and dad as well. Lastly, I would like to thank the most important person in my life, Ghazaleh. You have always been an incredible source of inspiration and happiness for me and have helped me in ways I never could have imagined. You are the most incredible person I have ever met. Thank you for making me a better person. I love you.


Nomenclature

Most of the recurring abbreviations and symbols are described here.

Notation

Throughout the thesis, matrices and vectors are denoted by boldface type. Matrices are denoted by capital letters and vectors by small letters of the Latin or Greek alphabets. Scalars and matrix elements are denoted by ordinary letters of the Latin or Greek alphabets. Random variables are denoted by capital letters from the end of the Latin alphabet. The end of a proof is marked by □.

List of notation

A_{m,n} - matrix of size m × n
M_{m,n} - the set of all matrices of size m × n
a_ij - matrix element in the i-th row and j-th column
a_n - vector of size n
c - scalar
A' - transpose of the matrix A
I_n - identity matrix of size n
|A| - determinant of A
rank(A) - rank of A
tr(A) - trace of A
X - random matrix
x - random vector
X - random variable
E[X] - expectation
var(X) - variance
cov(X, Y) - covariance of X and Y
S - sample dispersion matrix
N_p(µ, Σ) - multivariate normal distribution
N_{p,n}(M, Σ, Ψ) - matrix normal distribution
MLE - maximum likelihood estimator


Contents

1 Introduction
  1.1 Chapter outline
2 Mathematical background
  2.1 Linear algebra
    General definitions and theorems
    Trace
    Idempotence
    Vectorization and Kronecker product
  2.2 Statistics
    General concepts
    The Maximum Likelihood Estimator
    The matrix normal distribution
    General Linear Model
    Quadratic and bilinear forms
3 Explicit estimators of a banded covariance matrix
  3.1 Previous results
  3.2 Remodeling of explicit estimators
  3.3 Proof of unbiasedness and consistency
4 Generalization to a general linear model
  4.1 Presentation
  4.2 B̂ instead of µ̂
  4.3 Proposed estimators
  4.4 Proof of unbiasedness and consistency
5 Simulations
  5.1 Simulations of the regular normal distribution
  5.2 Simulations of the estimators for a general linear model
6 Discussion and further research
  6.1 Discussion
  6.2 Further research
A Motivation of the estimator


Chapter 1
Introduction

Many tests, estimators, confidence intervals and regression models in the multivariate statistical literature are based on the assumption that the underlying distribution is normal [21, 11, 1]. The primary reason is that multivariate datasets often are, at least approximately, normally distributed. The multivariate normal distribution is also simpler to analyze than many other distributions; for example, all the information in a multivariate normal distribution is contained in its mean and covariances. Because of this, estimating the mean and the covariances is a subject of importance in statistics. This thesis studies an estimation procedure for a patterned covariance matrix.

Patterned covariance matrices arise in a variety of situations and applications and have been studied by many authors. A seminal paper from the 1940s [23] considered patterned covariance matrices in the study of psychological tests. This was the introduction of the covariance matrix with equal diagonal and equal off-diagonal elements. Two years later the author extended this structure into blocks with a certain pattern [22]. During the late 1960s, a covariance model in which the covariances depend on the distance between the variables, a circular stationary model, was developed in [17]. There have been several other developments and directions for structured covariance matrices, e.g., [16, 12, 3]. Extensions from a linear model with one error term to mixed linear models, growth curves and variance component models have also been made, e.g., [6, 20]. More recent results, e.g., on block structures of matrices, can be found in [13, 14] and others.

A special type of patterned covariance matrix, called a banded covariance matrix, is discussed in this thesis. Banded covariance matrices are common in applications and arise often in association with time series, for example in signal processing and for covariances of Gauss-Markov random processes or cyclostationary processes [24, 10, 5]. In this thesis we study a special case of banded matrices whose elements are unrestricted except that certain covariances are zero. These covariance matrices have a tridiagonal structure.

Originally, estimates of covariance matrices were obtained using non-iterative methods such as analysis of variance (ANOVA) and minimum norm quadratic unbiased estimation (MINQUE). The introduction of the modern computer changed a lot of things, and the cheap processing power made it possible to use iterative methods, which performed better.

With this came the rise of the maximum likelihood method and more general estimating equations. These methods have dominated in recent years. Nowadays, however, we see a shift back towards non-iterative methods, since datasets have grown tremendously; examples are EEG/EKG studies in medicine, QTL analysis in genetics and time series in meteorology. With such huge datasets, estimating with iterative methods can be a slow and tedious job.

This thesis discusses some properties of an explicit non-iterative estimator for a banded covariance matrix derived in [15] and presents some improvements to this estimator. The improvements give an unbiased and consistent estimator of the mean and the covariance matrix in the special case of first order dependence.

1.1 Chapter outline

Chapter 2: Introduces the necessary concepts and results in mathematics, mainly in linear algebra and statistics, which are required for studying this thesis. Some of these results are referenced throughout the thesis.

Chapter 3: Presents the explicit estimator and some results regarding it. From these results a new unbiased explicit estimator is suggested.

Chapter 4: The explicit estimator is generalized for estimating the covariance matrix in a general linear model. An unbiased estimator is then proposed.

Chapter 5: Simulations based on the new unbiased explicit estimator, in comparison to the previous explicit estimators, are presented.

Chapter 6: Discusses further improvements and directions of research which are interesting with regard to the explicit estimator.

Appendix: Presents a short motivation of the estimator given in [15].

Chapter 2
Mathematical background

This chapter introduces some of the mathematics needed to understand this thesis. Some of the material may be new and some not, but the aim of this chapter is to give a student in a Bachelor's programme in Mathematics sufficient background to understand this thesis. Most of the theorems will be given without proof, but the proofs can be found in the referenced sources of each section.

2.1 Linear algebra

Since this thesis makes extensive use of linear algebra and matrices, some of the more common definitions and theorems are introduced here.

General definitions and theorems

This first part contains some general definitions and theorems in real linear algebra and presents notation and expressions that are required later on. These results are given without proof but can be found in [7, 8, 2].

Definition 1. The set of all matrices with m rows and n columns is denoted M_{m,n}.

Definition 2. A matrix A ∈ M_{n,n} is called the identity matrix of size n, denoted I_n, if the diagonal elements are 1 and the off-diagonal elements are 0.

Definition 3. A matrix A ∈ M_{m,n} is called a tridiagonal matrix if a_ij = 0 for all |i − j| > 1, where a_ij is the element in row i and column j of the matrix A.

Definition 4. A vector 1_n ∈ M_{n,1} whose elements all equal 1 is called the one-vector of size n, e.g., 1_3' = (1, 1, 1), where ' denotes the transpose.

Definition 5. A matrix A ∈ M_{m,n} is called the zero-matrix, denoted 0_{m,n}, if all matrix elements of A equal 0.

Definition 6. The rank of a matrix A ∈ M_{m,n}, denoted rank(A), is defined as the number of linearly independent columns (or rows) of the matrix.

Definition 7. A matrix A ∈ M_{n,n} is called symmetric if A' = A.

Definition 8. A matrix A ∈ M_{n,n} is called normal if AA' = A'A.

Definition 9. A matrix A ∈ M_{n,n} is called orthogonal if A'A = I_n. In that case AA' = I_n as well.

Definition 10. A symmetric square matrix A ∈ M_{n,n} is positive (semi-)definite if x'Ax > (≥) 0 for any vector x ≠ 0.

Definition 11. The values λ_i which satisfy Ax_i = λ_i x_i are called eigenvalues of the matrix A. The vector x_i ≠ 0 which corresponds to the eigenvalue λ_i is called an eigenvector of A corresponding to λ_i.

And lastly a theorem regarding matrix decomposition, called the spectral theorem.

Theorem 1. Any normal matrix A ∈ M_{n,n} has an orthonormal basis of eigenvectors. In other words, any normal matrix A can be represented as A = UDU', where U ∈ M_{n,n} is an orthogonal matrix and D ∈ M_{n,n} is a diagonal matrix with the eigenvalues of A on the diagonal.

Trace

This section introduces the linear operator called the trace, which is frequently used in this thesis. These results can be found in [7, 2].

Definition 12. The trace of a square matrix A ∈ M_{n,n}, denoted tr(A), is defined to be the sum of its diagonal entries, that is, tr(A) = ∑_{i=1}^n a_ii.

Here follow some common properties of the trace of a matrix.

Lemma 1.
(i) tr(A + B) = tr(A) + tr(B) for A, B ∈ M_{n,n}.
(ii) tr(cA) = c tr(A) for all scalars c. Thus, the trace is a linear map.
(iii) tr(A') = tr(A).
(iv) tr(AB) = tr(BA) for A ∈ M_{n,m} and B ∈ M_{m,n}.
(v) Equivalently, the trace is invariant under cyclic permutations, i.e., tr(ABCD) = tr(BCDA) = tr(CDAB) = tr(DABC) for A, B, C and D of proper sizes. This is known as the cyclic property.

The trace of a matrix A can also be written as the sum of its eigenvalues.

Lemma 2. Let A ∈ M_{n,n} and let λ_1, ..., λ_n be the eigenvalues of A, listed according to their algebraic multiplicities. Then tr(A) = ∑_{i=1}^n λ_i.
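The trace identities above are easy to verify numerically. The following short sketch (assuming NumPy, with arbitrary random matrices of my own choosing) checks tr(AB) = tr(BA), the cyclic property of Lemma 1 and the eigenvalue sum of Lemma 2.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.normal(size=(4, 3)), rng.normal(size=(3, 4))
C, D = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

# tr(AB) = tr(BA) even though AB (4x4) and BA (3x3) have different sizes.
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))

# Cyclic property: tr(CDAB) = tr(DABC).
print(np.isclose(np.trace(C @ D @ A @ B), np.trace(D @ A @ B @ C)))

# The trace equals the sum of the eigenvalues (Lemma 2).
print(np.isclose(np.trace(C), np.sum(np.linalg.eigvals(C)).real))
```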

Idempotence

Here follow a definition and some results regarding idempotent matrices. This type of matrix arises frequently in multivariate statistics. These results rely on [7, 2].

Definition 13. A matrix A ∈ M_{n,n} is called idempotent if A² = AA = A.

In terms of Theorem 1, an idempotent matrix is always diagonalizable.

Lemma 3. An idempotent matrix is always diagonalizable and its eigenvalues are either 0 or 1.

Since an idempotent matrix only has eigenvalues 0 or 1, it is easy to express its trace.

Lemma 4. For an idempotent matrix A ∈ M_{n,n} it follows that tr(A) = rank(A).

Idempotent matrices have a strong connection to orthogonal projections in the following sense.

Theorem 2. A matrix A ∈ M_{n,n} is idempotent and symmetric if and only if A is an orthogonal projection.

It is also possible to construct new idempotent matrices by subtracting an idempotent matrix from the identity matrix.

Theorem 3. Let A ∈ M_{n,n} be an idempotent matrix. Then the matrix B = I_n − A is also idempotent.

Proof. BB = (I_n − A)(I_n − A) = I_n − 2A + A² = I_n − 2A + A = I_n − A = B. □

Vectorization and Kronecker product

In this section some definitions and results on vectorization and the Kronecker product are presented. These results simplify the notation for the matrix normal distribution and make it easier to interpret. The results rely on [8].

Definition 14. Let A = (a_ij) be a p × q matrix and B = (b_ij) an r × s matrix. Then the pr × qs matrix A ⊗ B = [a_ij B], i = 1, ..., p; j = 1, ..., q, is called the Kronecker product of the matrices A and B, where each block a_ij B is the r × s matrix with elements a_ij b_kl, k = 1, ..., r; l = 1, ..., s.

Definition 15. Let A = (a_1, ..., a_q) be a p × q matrix, where a_i, i = 1, ..., q, is the i-th column vector. The vectorization operator vec is an operator from R^{p×q} to R^{pq} that stacks the columns, vec A = (a_1', ..., a_q')'.
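As a small illustration of Definitions 14 and 15 (assuming NumPy; the matrices are arbitrary examples of my own), the sketch below forms a Kronecker product and the column-stacking vec of a matrix, and checks the standard identity vec(AXB) = (B' ⊗ A) vec X, which is the kind of identity behind the relation Ω = Ψ ⊗ Σ used for the matrix normal distribution later on.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3))   # p x q
B = rng.normal(size=(4, 2))   # r x s
X = rng.normal(size=(3, 4))

# Kronecker product (Definition 14): a (2*4) x (3*2) block matrix [a_ij * B].
K = np.kron(A, B)
print(K.shape)                # (8, 6)

def vec(M):
    # Column-stacking vec operator (Definition 15); NumPy is row-major,
    # so order='F' is needed to stack columns rather than rows.
    return M.flatten(order="F")

# Standard identity: vec(A X B) = (B' kron A) vec(X).
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))  # True
```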

2.2 Statistics

This section presents some general definitions in statistics as well as some results in multivariate statistics and on the general linear model. The ambition is to give enough substance to establish a basis for the understanding of this thesis. Most of the results will be given without proof but can be found in the references of each subsection.

General concepts

Here follow some definitions and properties regarding estimation in statistics. These definitions and theorems come from [4]. First we present the definition of an unbiased estimator.

Definition 16. The bias of a point estimator θ̂ of a parameter θ is the difference between the expected value of θ̂ and θ; that is, Bias_θ(θ̂) = E_θ[θ̂] − θ. An estimator whose bias is identically (in θ) equal to 0 is called unbiased and satisfies E[θ̂] = θ for all θ.

Here follows the definition of a consistent estimator.

Definition 17. A sequence of estimators θ̂_n = θ̂_n(X_1, ..., X_n) is a consistent sequence of estimators of the parameter θ if for every ε > 0 and every θ in the parameter space,
lim_{n→∞} P_θ(|θ̂_n − θ| < ε) = 1.

And lastly a theorem which is widely used to establish consistency when unbiasedness has already been proved.

Theorem 4. If θ̂_n is a sequence of estimators of a parameter θ satisfying
(i) lim_{n→∞} var(θ̂_n) = 0,
(ii) lim_{n→∞} Bias_θ(θ̂_n) = 0,
for every θ in the parameter space, then θ̂_n is a consistent sequence of estimators of θ.

The Maximum Likelihood Estimator

In this section the estimation procedure called maximum likelihood is examined more closely and some results regarding the properties of these estimators are presented. These results can be found in [4]. First we present a formal definition of the maximum likelihood estimator (MLE).

Definition 18. For each sample point x, let θ̂(x) be a parameter value at which L(θ | x) attains its maximum as a function of θ, with x held fixed. A maximum likelihood estimator (MLE) of the parameter θ based on a sample X is θ̂(X).

The main reasons why MLEs are so popular are the following theorems regarding their asymptotic properties.

Theorem 5. Let X_1, X_2, ... be independent and identically distributed with density f(x | θ), and let L(θ | x) = ∏_{i=1}^n f(x_i | θ) be the likelihood function. Let θ̂ denote the MLE of θ and let τ(θ) be a continuous function of θ. Under some regularity conditions on f(x | θ), and hence on L(θ | x), which can be found in the Miscellanea of [4], for every ε > 0 and every θ ∈ Θ,
lim_{n→∞} P_θ(|τ(θ̂) − τ(θ)| ≥ ε) = 0.
That is, τ(θ̂) is a consistent estimator of τ(θ).

Theorem 6. Let X_1, X_2, ... be independent and identically distributed with density f(x | θ), let θ̂ denote the MLE of θ, and let τ(θ) be a continuous function of θ. Under some regularity conditions on f(x | θ), and hence on L(θ | x), which can be found in the Miscellanea of [4],
√n (τ(θ̂) − τ(θ)) → N(0, var_LB(θ̂)),
where var_LB(θ̂) is the Cramér-Rao lower bound. That is, τ(θ̂) is a consistent, asymptotically unbiased and asymptotically efficient estimator of τ(θ).

Note 1. The regularity conditions in the two previous theorems are rather loose conditions on the probability density function. Since the underlying distribution in this thesis is the normal distribution, the regularity conditions impose no restriction on the development of the results in this thesis.

The matrix normal distribution

This section presents some definitions and estimators for the normal distribution generalized to the multivariate and matrix normal distributions. These results can be found in [4, 8, 9, 21, 11]. First the univariate normal distribution is defined.

Definition 19. A random variable X is said to have a univariate normal distribution, i.e., X ~ N(µ, σ²), if the probability density function of X is
f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).

In order to generalize this to a multivariate setting we need to define the covariance of a vector, i.e., the covariance matrix.

Definition 20. If the random vector x of size m has mean µ, the covariance matrix of x is defined to be the m × m matrix
Σ = (σ_ij) = cov(x) = E[(x − µ)(x − µ)'],
where σ_ij = cov(X_i, X_j).

In this thesis a covariance matrix with a banded structure is studied. It is defined as follows.

Definition 21. A covariance matrix Σ of size k × k has a banded structure of order m, 1 ≤ m < k, denoted Σ^{(m)}(k), if σ_ij = 0 whenever |i − j| > m, i.e., only the main diagonal and the m closest sub- and superdiagonals may contain non-zero elements. For m = 1 the matrix is tridiagonal.

Here follows the definition of a multivariate (vector) normal distribution.

Definition 22. A random vector x of size m is said to have an m-variate normal distribution if, for every a ∈ R^m, the distribution of a'x is univariate normal.

Theorem 7. If x has an m-variate normal distribution, then both µ = E[x] and Σ = cov(x) exist, and the distribution of x is determined by µ and Σ.

In the standard case the following estimators are most frequently used.

Theorem 8. Let x_1, ..., x_n be a random sample from a multivariate normal population with mean µ and covariance Σ. Then
µ̂ = x̄ = (1/n) ∑_{i=1}^n x_i  and  Σ̂ = (1/(n − 1)) ∑_{i=1}^n (x_i − x̄)(x_i − x̄)'
are the MLE and the corrected MLE of µ and Σ, respectively.

Theorem 9. The estimators in Theorem 8 are sufficient, consistent and unbiased with respect to the multivariate normal distribution.

The matrix normal distribution adds another dimension of dependence in the data and is often used to model multivariate datasets. Here follows its definition.

Definition 23. Let Σ = ττ' and Ψ = γγ', where τ ∈ M_{p,r} and γ ∈ M_{n,s}. A matrix X ∈ M_{p,n} is said to be matrix normally distributed with parameters M, Σ and Ψ if it has the same distribution as M + τUγ', where M ∈ M_{p,n} is non-random and U ∈ M_{r,s} consists of s independent and identically distributed (i.i.d.) N_r(0, I) vectors U_i, i = 1, 2, ..., s. If X ∈ M_{p,n} is matrix normally distributed, this will be denoted X ~ N_{p,n}(M, Σ, Ψ).

This definition makes it possible to rewrite the likelihood in Theorem 8 in a matrix normal setting. Let X = (x_1, ..., x_n). Then X ~ N_{p,n}(M, Σ, I_n), where M = µ1_n'. This is a random sample from a matrix normal population, and the estimator of Σ can be written as
Σ̂ = (1/(n − 1)) XCX',
where C = I_n − (1/n) 1_n1_n'.
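A minimal sketch of Theorem 8 (assuming NumPy, with illustrative parameter values of my own choosing): the MLE of µ is the sample mean and the corrected MLE of Σ is the sample covariance matrix, which can equivalently be written with the centering matrix C just defined.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 200
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

# X holds the observations x_1, ..., x_n as columns (p x n), as in the thesis.
X = rng.multivariate_normal(mu, Sigma, size=n).T

mu_hat = X.mean(axis=1)                        # MLE of mu
Xc = X - mu_hat[:, None]
Sigma_hat = Xc @ Xc.T / (n - 1)                # corrected MLE of Sigma

# Equivalently, Sigma_hat = X C X' / (n - 1) with C = I_n - (1/n) 1 1'.
C = np.eye(n) - np.ones((n, n)) / n
print(np.allclose(Sigma_hat, X @ C @ X.T / (n - 1)))  # True
```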

The matrix normal distribution is nothing else than a multivariate normal distribution, i.e., if X ~ N_{p,n}(M, Σ, Ψ), then vec X = x ~ N_{pn}(vec M, Ω), where Ω = Ψ ⊗ Σ.

Here follows a theorem which shows a property of the matrix normal distribution; it is later used in the proof of the bilinear expectation.

Theorem 10. Let X ~ N_{p,n}(0, Σ, I) and let Q ∈ M_{n,n} be an orthogonal matrix which is independent of X. Then X and XQ have the same distribution.

General Linear Model

This section introduces the general linear model and some estimators frequently associated with it. The results are taken from [11, 21, 1]. The general linear model extends the multivariate linear model in the sense that it allows the vectors of observations, given by the rows of a matrix Y, to correspond to the rows of a known design matrix X. The multivariate model takes the form
Y = XB + E,
where Y ∈ M_{n,m} and E ∈ M_{n,m} are random matrices, X ∈ M_{n,p} is a known matrix, called the design matrix, and B ∈ M_{p,m} is an unknown matrix of parameters called regression coefficients. We will assume throughout this chapter that X has full rank p, that n > m + p, and that the rows of the error matrix E are independent N_m(0, Σ) random vectors. Using the notation introduced in the earlier subsection, this means that E is N_{n,m}(0, I_n, Σ), so that Y is N_{n,m}(XB, I_n, Σ).

Here follows a presentation of the MLEs and some of their properties.

Theorem 11. Let Y = XB + E ~ N_{n,m}(XB, I_n, Σ), where n > m + p and p = rank(X). Then the MLE of B and the corrected MLE of Σ are
B̂ = (X'X)^{-1}X'Y  and  Σ̂ = (1/(n − p)) (Y − XB̂)'(Y − XB̂) = (1/(n − p)) Y'HY,
where H = I_n − X(X'X)^{-1}X'.

The non-corrected MLE of Σ above is not unbiased; therefore the adjustment with the rank of X is required to establish unbiasedness. Here follows a theorem regarding the properties of the estimators.

Theorem 12. The estimators in Theorem 11 are unbiased, consistent and sufficient for (B, Σ).
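The estimators in Theorem 11 take only a few lines to compute. The sketch below (assuming NumPy, with illustrative values for the design matrix and the coefficients) computes B̂ = (X'X)^{-1}X'Y and the corrected Σ̂ = Y'HY/(n − p).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, m = 100, 2, 3                 # observations, regressors, responses
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # n x p design matrix
B = np.array([[1.0, 0.5, -1.0],
              [2.0, 0.0,  1.0]])    # p x m regression coefficients
E = rng.normal(size=(n, m))         # i.i.d. N(0, I) errors for illustration
Y = X @ B + E

XtX_inv = np.linalg.inv(X.T @ X)
B_hat = XtX_inv @ X.T @ Y                         # MLE of B
H = np.eye(n) - X @ XtX_inv @ X.T                 # projection onto the residual space
Sigma_hat = Y.T @ H @ Y / (n - p)                 # corrected MLE of Sigma

print(B_hat.round(2))
print(Sigma_hat.round(2))
```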

Quadratic and bilinear forms

In this section some results regarding quadratic and bilinear forms in statistics are presented. These results are less standard, so the references are given within the section. First a short presentation of the quadratic form, taken from [9].

Definition 24. Let x ~ N_n(µ, Σ) and A ∈ M_{n,n}. Then x'Ax is called a quadratic form.

Here follow the first two moments of the quadratic form.

Theorem 13. Let x ~ N_n(µ, Σ) and A ∈ M_{n,n}. Then
(i) E[x'Ax] = tr(AΣ) + µ'Aµ,
(ii) var[x'Ax] = 2 tr((AΣ)²) + 4µ'AΣAµ.

Here follow some results on the bilinear form. This form has not been studied as much, so a definition and some results based on the same ideas as for the quadratic form are given.

Definition 25. Let x ~ N_n(µ_x, I_n), y ~ N_n(µ_y, I_n) and A ∈ M_{n,n}. Then x'Ay is called a bilinear form.

Here the first two moments of the bilinear form are presented.

Theorem 14. Let x ~ N_n(0, Σ_x), y ~ N_n(0, Σ_y) and A ∈ M_{n,n}. The bilinear form x'Ay has the following properties:
(i) E[x'Ay] = tr(A cov(x, y)),
(ii) var[x'Ay] = tr(A cov(x, y) A cov(x, y)) + tr(A var(y) A' var(x)). In particular, if cov(x, y) = σ_{xy} I_n, var(x) = σ_x I_n and var(y) = σ_y I_n, then var[x'Ay] = tr(A²) σ_{xy}² + tr(AA') σ_x σ_y.

Proof. Here follows a proof of property (i) for the slightly simpler case when A is normal. Since A is normal, it can be written as A = BDB' by the spectral theorem (Theorem 1), where B is orthogonal and D is a diagonal matrix. It is now possible to write the bilinear form as
Q = x'Ay = x'(BDB')y = (B'x)'D(B'y) = z'Dw,
where z = B'x and w = B'y. Since B is orthogonal, z and w are again jointly normally distributed with zero means (cf. Theorem 10). It is now possible to compute the expected value of Q. Since D is diagonal, and hence symmetric, z'Dw = tr(Dwz'), so
E[Q] = E[tr(Dwz')] = tr(D E[wz']) = tr(D cov(z, w)) = tr(D cov(B'x, B'y)) = tr(DB' cov(x, y)B) = tr(BDB' cov(x, y)) = tr(A cov(x, y)).
The proof for a general square matrix A, and of the second property, can be found in [19, 18]. □
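Theorem 13(i) can be verified by simulation. The following sketch (assuming NumPy, with arbitrary illustrative µ, Σ and A) compares the Monte Carlo mean of x'Ax with tr(AΣ) + µ'Aµ.

```python
import numpy as np

rng = np.random.default_rng(4)
n_dim, reps = 3, 200_000
mu = np.array([1.0, -0.5, 0.2])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.4],
                  [0.0, 0.4, 1.5]])
A = rng.normal(size=(n_dim, n_dim))

x = rng.multivariate_normal(mu, Sigma, size=reps)
quad = np.einsum('ij,jk,ik->i', x, A, x)          # x'Ax for every sample

print(quad.mean())                                 # Monte Carlo estimate
print(np.trace(A @ Sigma) + mu @ A @ mu)           # tr(A Sigma) + mu' A mu
```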

Chapter 3
Explicit estimators of a banded covariance matrix

In this chapter we introduce the estimator developed in [15]. This thesis focuses on the case where the covariance matrix is banded of order one.

3.1 Previous results

In [15] an explicit estimator of the covariance matrix of a multivariate normal distribution is presented for the case when the covariance matrix has an m-dependence structure. The authors propose estimators for the general case when m + 1 < p < n and establish some of their properties. Furthermore, the article considers the special case m = 1 in detail, and in this section some of the results from the article are presented.

Proposition 1. Let X ~ N_{p,n}(µ1_n', Σ^{(1)}(p), I_n). Explicit estimators are given by
µ̂_i = (1/n) x_i'1_n,
σ̂_ii = (1/n) x_i'Cx_i for i = 1, ..., p,
σ̂_{i,i+1} = (1/n) r̂_i'Cx_{i+1} for i = 1, ..., p − 1,
where r̂_1 = x_1 and r̂_i = x_i − ŝ_i r̂_{i−1} for i = 2, ..., p − 1, with
ŝ_i = (r̂_{i−1}'Cx_i) / (r̂_{i−1}'Cx_{i−1}),
and C = I_n − (1/n) 1_n1_n'.

Since these estimators are ad hoc it is important to establish some properties that motivate them, hence the following theorem.

Theorem 15. The estimator µ̂ = (µ̂_1, ..., µ̂_p)' given in Proposition 1 is unbiased and consistent, and the estimator Σ̂^{(1)}(p) = (σ̂_ij) is consistent.
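To make Proposition 1 concrete, here is a minimal NumPy transcription (a sketch of my own, not the authors' code; the function name is hypothetical): the data matrix X is p × n with the vectors x_i' as rows, C is the centering matrix, and the residual vectors r̂_i are built recursively.

```python
import numpy as np

def explicit_estimators_prop1(X):
    """Explicit estimators of Proposition 1 (illustrative sketch, not from the thesis).

    X is a p x n data matrix whose rows are the vectors x_1', ..., x_p'.
    Returns (mu_hat, Sigma_hat), where Sigma_hat is tridiagonal.
    """
    p, n = X.shape
    C = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    mu_hat = X.mean(axis=1)                      # mu_i = x_i' 1_n / n

    Sigma_hat = np.zeros((p, p))
    for i in range(p):                           # diagonal: sigma_ii = x_i' C x_i / n
        Sigma_hat[i, i] = X[i] @ C @ X[i] / n

    r = X[0].copy()                              # r_1 = x_1
    for i in range(p - 1):                       # off-diagonal elements
        if i > 0:
            s = (r @ C @ X[i]) / (r @ C @ X[i - 1])
            r = X[i] - s * r                     # r_i = x_i - s_i r_{i-1}
        Sigma_hat[i, i + 1] = r @ C @ X[i + 1] / n
        Sigma_hat[i + 1, i] = Sigma_hat[i, i + 1]
    return mu_hat, Sigma_hat
```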

The previous theorem establishes two crucial properties, unbiasedness and consistency, for the mean estimator; the covariance matrix estimator above, however, is only consistent and lacks the property of unbiasedness. The development of this desired property was one of the main goals of this thesis. Simulations comparing the estimators to standard methods, e.g., maximum likelihood, have shown great promise, which made it probable that an unbiased variation of the covariance estimator exists. The result of this goal is presented in the following sections.

3.2 Remodeling of explicit estimators

This section presents a rewriting of the estimator above that reveals some of its properties and makes the expression clearer and more suitable for interpretation and analysis. The estimators presented in Proposition 1 are partly composed of MLEs. The estimator of µ_i,
µ̂_i = (1/n) x_i'1_n,
is the sample mean, which is the MLE. The proposed estimator and the MLE also share a resemblance in the estimators of the diagonal elements of the covariance matrix,
σ̂_ii = (1/n) x_i'Cx_i for i = 1, ..., p,
which are the MLEs of the diagonal elements of an unstructured covariance matrix. Also, the first off-diagonal element of the covariance matrix is
σ̂_12 = (1/n) r̂_1'Cx_2, where r̂_1 = x_1,
so that
σ̂_12 = (1/n) r̂_1'Cx_2 = (1/n) x_1'Cx_2,
which is the MLE of the same element in an unstructured matrix. This can also be seen by looking at the banded covariance matrix Σ^{(1)}(p) as a whole: its upper left 2 × 2 block,
Σ^{(1)}(2) = ( σ_11  σ_12 ; σ_21  σ_22 ),

is an unstructured covariance matrix. However, the estimators of the off-diagonal elements of the covariance matrix, except σ̂_12, are not MLEs and are therefore harder to analyze. But when i > 1 we can write the estimators as
σ̂_{i,i+1} = (1/n) r̂_i'Cx_{i+1} = (1/n) (x_i − ŝ_i r̂_{i−1})'Cx_{i+1} = (1/n) (x_i − (r̂_{i−1}'Cx_i)/(r̂_{i−1}'Cr̂_{i−1}) r̂_{i−1})'Cx_{i+1} = (1/n) (x_i'Cx_{i+1} − (r̂_{i−1}'Cx_i)/(r̂_{i−1}'Cr̂_{i−1}) r̂_{i−1}'Cx_{i+1}),
where we have used that the denominator of ŝ_i may be written r̂_{i−1}'Cx_{i−1} = r̂_{i−1}'Cr̂_{i−1}. Since r̂_{i−1}'Cr̂_{i−1} is a scalar it is possible to write
σ̂_{i,i+1} = (1/n) (x_i'Cx_{i+1} − x_i'Cr̂_{i−1}(r̂_{i−1}'Cr̂_{i−1})^{-1} r̂_{i−1}'Cx_{i+1}) = (1/n) x_i'(C − Cr̂_{i−1}(r̂_{i−1}'Cr̂_{i−1})^{-1} r̂_{i−1}'C) x_{i+1},
which can be written as
σ̂_{i,i+1} = (1/n) x_i'A_{i−1}x_{i+1},  where A_i = C − Cr̂_i(r̂_i'Cr̂_i)^{-1} r̂_i'C.
This makes it suitable to propose another form of the estimator in Proposition 1, justified by the calculations above. Hence the following proposition.

Proposition 2. Let X ~ N_{p,n}(µ1_n', Σ^{(1)}(p), I_n). Explicit estimators are given by
µ̂_i = (1/n) x_i'1_n,
σ̂_ii = (1/n) x_i'Cx_i for i = 1, ..., p,
σ̂_{i,i+1} = (1/n) x_i'A_{i−1}x_{i+1} for i = 1, ..., p − 1,
where A_i = C − Cr̂_i(r̂_i'Cr̂_i)^{-1} r̂_i'C, with A_0 = C,
r̂_1 = x_1 and r̂_i = x_i − (r̂_{i−1}'Cx_i)/(r̂_{i−1}'Cx_{i−1}) r̂_{i−1} for i = 2, ..., p − 1,
and C = I_n − (1/n) 1_n1_n'.

Theorem 16. The matrix A_i in Proposition 2 is idempotent and symmetric with rank n − 2.

Proof. Idempotence: with A_i = C − Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C,
A_i² = (C − Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C)(C − Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C)
    = C² − CCr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C − Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'CC + Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'CCr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C
    = C − 2Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C + Cr̂_i(r̂_i'Cr̂_i)^{-1}(r̂_i'Cr̂_i)(r̂_i'Cr̂_i)^{-1}r̂_i'C
    = C − Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C = A_i,
where we have used that C is idempotent.

Symmetry:
A_i' = (C − Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C)' = C' − C'r̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C' = C − Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C = A_i.

Rank: Since A_i is idempotent, rank(A_i) = tr(A_i). This implies
rank(A_i) = tr(C − Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C) = tr(C) − tr(Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C) = (n − 1) − tr((r̂_i'CCr̂_i)(r̂_i'Cr̂_i)^{-1}) = n − 1 − tr(1) = n − 2. □
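Theorem 16 is also easy to check numerically. The following sketch (assuming NumPy, with an arbitrary vector standing in for r̂_i) verifies that A_i = C − Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C is symmetric, idempotent and has trace, hence rank, n − 2.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 12
C = np.eye(n) - np.ones((n, n)) / n
r = rng.normal(size=n)                       # stand-in for a residual vector r_i

A = C - np.outer(C @ r, r @ C) / (r @ C @ r)

print(np.allclose(A, A.T))            # symmetric
print(np.allclose(A @ A, A))          # idempotent
print(np.isclose(np.trace(A), n - 2)) # rank n - 2
```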

3.3 Proof of unbiasedness and consistency

The last section presented some alterations of the original estimator which make it possible to write it as a quadratic or bilinear form with an idempotent centering matrix, i.e.,
σ̂_ii = (1/n) x_i'Cx_i for i = 1, ..., p, and σ̂_{i,i+1} = (1/n) x_i'A_{i−1}x_{i+1} for i = 1, ..., p − 1.
The section on quadratic and bilinear forms presented results on the expectation and variance of such forms. It is now possible to combine those results with the results of the previous section to develop an unbiased estimator of the covariance matrix. It is also possible to present a new and much simpler proof of consistency for the covariance estimator compared to the proof in [15]. First we propose an unbiased estimator of the covariance matrix.

Proposition 3. Let X ~ N_{p,n}(µ1_n', Σ^{(1)}(p), I_n). Explicit estimators are given by
µ̂_i = (1/n) x_i'1_n,
σ̂_ii = (1/(n − 1)) x_i'Cx_i for i = 1, ..., p,
σ̂_12 = (1/(n − 1)) x_1'Cx_2,
σ̂_{i,i+1} = (1/(n − 2)) x_i'A_{i−1}x_{i+1} for i = 2, ..., p − 1,
where r̂_1 = x_1 and r̂_i = x_i − (r̂_{i−1}'Cx_i)/(r̂_{i−1}'Cx_{i−1}) r̂_{i−1} for i = 2, ..., p − 1,
A_i = C − Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C, and C = I_n − (1/n) 1_n1_n'.

Here follows the proof of unbiasedness of the previous estimator.

Theorem 17. The estimators from Proposition 3 are unbiased.

Proof. The estimators µ̂_i, σ̂_ii for i = 1, ..., p and σ̂_12 coincide with the (corrected) maximum likelihood estimators and are thus, according to Theorem 9, unbiased. Therefore it remains to prove that σ̂_{i,i+1} is unbiased for i = 2, ..., p − 1. When the estimators are derived in the Appendix, they are conditioned on the previous x's; that is, the calculation of σ̂_{i,i+1} assumes x_1, ..., x_{i−1} to be known constants. Therefore, the matrix A_{i−1} below can be considered a non-random matrix. We can then consider σ̂_{i,i+1} as a bilinear form and calculate its expectation:
E(σ̂_{i,i+1}) = E((1/(n − 2)) x_i'A_{i−1}x_{i+1}) = (1/(n − 2)) E(x_i'A_{i−1}x_{i+1} | x_1, ..., x_{i−1}).
According to Theorem 14,
E(σ̂_{i,i+1}) = (1/(n − 2)) tr(A_{i−1}) σ_{i,i+1} = (1/(n − 2)) rank(A_{i−1}) σ_{i,i+1} = (1/(n − 2)) (n − 2) σ_{i,i+1} = σ_{i,i+1},
where we have used Theorem 16 for the second to last equality. Thus E(σ̂_{i,i+1}) = σ_{i,i+1} and the theorem is proved. □

The proof of consistency follows the same structure as the proof above, but instead uses that the estimators are unbiased and studies their variance.

Theorem 18. The estimators from Proposition 3 are consistent.

Proof. The estimators µ̂_i, σ̂_ii for i = 1, ..., p and σ̂_12 coincide with the (corrected) maximum likelihood estimators and are thus, according to Theorem 9, consistent. Therefore it remains to prove that σ̂_{i,i+1} is consistent for i = 2, ..., p − 1. When the estimators are derived in the Appendix, they are conditioned on the previous x's; that is, the calculation of σ̂_{i,i+1} assumes x_1, ..., x_{i−1} to be known constants. Therefore, the matrix A_{i−1} below can be considered a non-random matrix. We can then consider σ̂_{i,i+1} as a bilinear form and calculate its variance:
var(σ̂_{i,i+1}) = var((1/(n − 2)) x_i'A_{i−1}x_{i+1}) = (1/(n − 2)²) var(x_i'A_{i−1}x_{i+1} | x_1, ..., x_{i−1}).
According to Theorem 14,
var(σ̂_{i,i+1}) = (1/(n − 2)²) (rank(A_{i−1}) σ_{i,i+1}² + tr(A_{i−1}²) σ_ii σ_{i+1,i+1}).
Since, by Theorem 16, A_{i−1} is idempotent with rank(A_{i−1}) = tr(A_{i−1}) = tr(A_{i−1}²) = n − 2, it is possible to simplify this further:
var(σ̂_{i,i+1}) = (σ_{i,i+1}² + σ_ii σ_{i+1,i+1}) / (n − 2) → 0
when the number of observations goes to infinity. Since the estimator is also unbiased, it is consistent according to Theorem 4. Thus the theorem is proved. □
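Relative to the sketch of Proposition 1 given earlier, the unbiased estimators of Proposition 3 only change the normalising constants. A hypothetical implementation (the function name is my own, and the code is an illustration rather than the thesis' implementation) could look as follows: σ̂_ii and σ̂_12 are divided by n − 1, and the remaining off-diagonal elements by n − 2.

```python
import numpy as np

def explicit_estimators_prop3(X):
    """Unbiased explicit estimators of Proposition 3 (illustrative sketch).

    X is a p x n data matrix whose rows are the vectors x_1', ..., x_p'.
    """
    p, n = X.shape
    C = np.eye(n) - np.ones((n, n)) / n
    mu_hat = X.mean(axis=1)

    Sigma_hat = np.zeros((p, p))
    for i in range(p):
        Sigma_hat[i, i] = X[i] @ C @ X[i] / (n - 1)   # corrected MLE of sigma_ii

    r = X[0].copy()                                    # r_1 = x_1
    for i in range(p - 1):
        if i == 0:
            Sigma_hat[0, 1] = X[0] @ C @ X[1] / (n - 1)   # sigma_12: corrected MLE
        else:
            s = (r @ C @ X[i]) / (r @ C @ X[i - 1])
            r = X[i] - s * r
            # A_{i-1} has rank n - 2 (Theorem 16), so dividing by n - 2 removes the bias.
            Sigma_hat[i, i + 1] = r @ C @ X[i + 1] / (n - 2)
        Sigma_hat[i + 1, i] = Sigma_hat[i, i + 1]
    return mu_hat, Sigma_hat
```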

Chapter 4
Generalization to a general linear model

In this chapter the estimator presented earlier is extended to a general linear model, and the results proved in the last chapter are then proved for this new estimator. Since the estimation procedure for the covariance matrix in a linear model remains similar to the estimation of an ordinary multivariate normal covariance matrix, the reasoning will be similar. Two differences of concern are the effect of estimating the regression parameters instead of the expected value, and the degrees of freedom of the covariance estimator, which are now related to the rank of the design matrix.

4.1 Presentation

In this section the assumptions and some notation for the estimators are given. We use assumptions and a model similar to those for the general linear model in Chapter 2. Thus the multivariate model takes the form
Y = XB + E,
where Y and E are n × m random matrices, X is a known n × p design matrix, and B is an unknown p × m matrix of regression coefficients. We assume throughout this chapter that X has full rank p, that n > m + p, and that the rows of the error matrix E are independent N_m(0, Σ) random vectors. We also assign Σ a certain structure as in the previous chapter; hence Σ is banded of order one.

4.2 B̂ instead of µ̂

This section contains a motivation of why B̂ maximizes the conditional likelihood function in the same way as µ̂ does. The main difference between an ordinary normal model and a general linear model is the regression parameters, which affect the mean; therefore B is estimated instead of µ.

In a general linear model the MLE of B is
B̂ = (X'X)^{-1}X'Y.
Since the general linear model combines several response variables, it is possible to determine the different b_i separately with the following expression:
b̂_i = (X'X)^{-1}X'y_i, for i = 1, ..., k.
The explicit estimator in this thesis is derived from a stepwise maximization of the likelihood function. This procedure gives the estimators in this thesis and can be found in the Appendix. The estimators, as we have seen, coincide with the mean for a normal distribution. The same principle applies to the general linear model in the following way. With B = (b_1, ..., b_k), the maximizer of the conditional likelihood satisfies
b̂_i | y_1, ..., y_{i−1} = b̂_i = (X'X)^{-1}X'y_i.
Since each individual b-vector can be determined independently, and because the estimator b̂_i above is the MLE, it maximizes the conditional distribution owing to the independence of the b_i's. It is then possible to write B̂ = (b̂_1, ..., b̂_k). The estimators B̂ and Σ̂ are sufficient for B and Σ. This altogether makes a good basis for proposing explicit estimators for a general linear model with a banded structure of order one.

4.3 Proposed estimators

In this section we propose explicit estimators for a general linear model. In the last section a motivation for B̂ = (X'X)^{-1}X'Y was given, and here follows a motivation for the covariance matrix. In the previous chapter we assumed X ~ N_{p,n}(µ1_n', Σ^{(1)}(p), I_n). We now study the general linear model Y ~ N_{n,p}(XB, I_n, Σ^{(1)}(p)), where X denotes the design matrix, and see that the transformation Y − XB yields the same model as in the last chapter, hence Y − XB ~ N_{n,p}(0, I_n, Σ^{(1)}(p)). Replacing the observation x_i by y_i − Xb̂_i in Proposition 3 gives the following form of the estimator.

Proposition 4. Let Y ~ N_{n,p}(XB, I_n, Σ^{(1)}(p)). Explicit estimators are given by
B̂ = (X'X)^{-1}X'Y,
σ̂_ii = (1/(n − 1)) (y_i − Xb̂_i)'C(y_i − Xb̂_i) for i = 1, ..., p,
σ̂_12 = (1/(n − 1)) (y_1 − Xb̂_1)'C(y_2 − Xb̂_2),
σ̂_{i,i+1} = (1/(n − 2)) (y_i − Xb̂_i)'A_{i−1}(y_{i+1} − Xb̂_{i+1}) for i = 2, ..., p − 1,
where A_i = C − Cr̂_i(r̂_i'Cr̂_i)^{-1}r̂_i'C, with A_0 = C,

and where r̂_1 = y_1 and r̂_i = (y_i − Xb̂_i) − (r̂_{i−1}'C(y_i − Xb̂_i))/(r̂_{i−1}'C(y_{i−1} − Xb̂_{i−1})) r̂_{i−1} for i = 2, ..., p − 1, with C = I_n − (1/n) 1_n1_n'.

This can be simplified by changing the main matrix C into D = I_n − X(X'X)^{-1}X'. This removes the need for the residuals y_i − Xb̂_i, since the projection is included in the matrix D. This leads us to the following proposition.

Proposition 5. Let Y = XB + E ~ N_{n,p}(XB, I_n, Σ^{(1)}(p)), where rank(X) = k. Explicit estimators are given by
B̂ = (X'X)^{-1}X'Y,
σ̂_ii = (1/(n − 1)) y_i'Dy_i for i = 1, ..., p,
σ̂_12 = (1/(n − 2)) y_1'Dy_2,
σ̂_{i,i+1} = (1/(n − 2)) y_i'A_{i−1}y_{i+1} for i = 2, ..., p − 1,
where r̂_1 = y_1 and r̂_i = y_i − (r̂_{i−1}'Dy_i)/(r̂_{i−1}'Dy_{i−1}) r̂_{i−1} for i = 2, ..., p − 1,
A_i = D − Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D with A_0 = D, and D = I_n − X(X'X)^{-1}X'.

Since we saw in the last chapter that the correction for unbiasedness depends on the matrix A_i above, of which C was a part, we need to study the properties of the new A_i to determine what kind of estimator will give us unbiasedness in a general linear model. Here follows a theorem regarding the properties of the matrix A_i above.

Theorem 19. The matrix A_i in Proposition 5 is idempotent and symmetric with rank n − k − 1.

Proof. Idempotence: with A_i = D − Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D,
A_i² = (D − Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D)(D − Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D)
    = D² − DDr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D − Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'DD + Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'DDr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D
    = D − 2Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D + Dr̂_i(r̂_i'Dr̂_i)^{-1}(r̂_i'Dr̂_i)(r̂_i'Dr̂_i)^{-1}r̂_i'D
    = D − Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D = A_i,
where we have used that D is idempotent.

Symmetry:
A_i' = (D − Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D)' = D' − D'r̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D' = D − Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D = A_i.

Rank: Since A_i is idempotent, rank(A_i) = tr(A_i). This implies
rank(A_i) = tr(D − Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D) = tr(D) − tr(Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D) = (n − k) − tr((r̂_i'DDr̂_i)(r̂_i'Dr̂_i)^{-1}) = n − k − tr(1) = n − k − 1. □

This gives us the possibility to present the following explicit estimators, which in the next section will be proved to be unbiased and consistent.

Proposition 6. Let Y = XB + E ~ N_{n,p}(XB, I_n, Σ^{(1)}(p)), where rank(X) = k. Explicit estimators are given by
B̂ = (X'X)^{-1}X'Y,
σ̂_ii = (1/(n − k)) y_i'Dy_i for i = 1, ..., p,
σ̂_12 = (1/(n − k)) y_1'Dy_2,
σ̂_{i,i+1} = (1/(n − k − 1)) y_i'A_{i−1}y_{i+1} for i = 2, ..., p − 1,
where A_i = D − Dr̂_i(r̂_i'Dr̂_i)^{-1}r̂_i'D, with A_0 = D,
r̂_1 = y_1 and r̂_i = y_i − (r̂_{i−1}'Dy_i)/(r̂_{i−1}'Dy_{i−1}) r̂_{i−1} for i = 2, ..., p − 1,
and D = I_n − X(X'X)^{-1}X'.
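A minimal sketch of Proposition 6 (assuming NumPy; the function name is my own and the code is an illustration, not the thesis' implementation): C is replaced by D = I_n − X(X'X)^{-1}X', the diagonal elements and σ̂_12 are divided by n − k, and the remaining off-diagonal elements by n − k − 1.

```python
import numpy as np

def explicit_estimators_glm(Y, X):
    """Unbiased explicit estimators of Proposition 6 (illustrative sketch).

    Y is n x p (columns y_1, ..., y_p); X is the n x k design matrix
    of full column rank k.
    """
    n, p = Y.shape
    k = X.shape[1]
    D = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto the residual space

    B_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
    Sigma_hat = np.zeros((p, p))
    for i in range(p):
        Sigma_hat[i, i] = Y[:, i] @ D @ Y[:, i] / (n - k)

    r = Y[:, 0].copy()                                  # r_1 = y_1
    for i in range(p - 1):
        if i == 0:
            Sigma_hat[0, 1] = Y[:, 0] @ D @ Y[:, 1] / (n - k)
        else:
            s = (r @ D @ Y[:, i]) / (r @ D @ Y[:, i - 1])
            r = Y[:, i] - s * r
            # A_{i-1} built from D has rank n - k - 1 (Theorem 19).
            Sigma_hat[i, i + 1] = r @ D @ Y[:, i + 1] / (n - k - 1)
        Sigma_hat[i + 1, i] = Sigma_hat[i, i + 1]
    return B_hat, Sigma_hat
```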

4.4 Proof of unbiasedness and consistency

This section provides the proofs of unbiasedness and consistency of the estimators presented above. Since the structure of the estimators is similar to that of an ordinary normal model, the proofs are similar to the ones presented in Chapter 3. We start by proving that the estimators are unbiased.

Theorem 20. The estimators from Proposition 6 are unbiased.

Proof. The estimators B̂, σ̂_ii for i = 1, ..., p and σ̂_12 coincide with the (corrected) maximum likelihood estimators and are thus, according to Theorem 12, unbiased. Therefore it remains to prove that σ̂_{i,i+1} is unbiased for i = 2, ..., p − 1. When the estimators are derived in the Appendix, they are conditioned on the previous y's; that is, the calculation of σ̂_{i,i+1} assumes y_1, ..., y_{i−1} to be known constants. Therefore, the matrix A_{i−1} below can be considered a non-random matrix. We can consider σ̂_{i,i+1} as a bilinear form and calculate its expectation as
E(σ̂_{i,i+1}) = E((1/(n − k − 1)) y_i'A_{i−1}y_{i+1}) = (1/(n − k − 1)) E(y_i'A_{i−1}y_{i+1} | y_1, ..., y_{i−1}) = (1/(n − k − 1)) rank(A_{i−1}) σ_{i,i+1} = (1/(n − k − 1)) (n − k − 1) σ_{i,i+1} = σ_{i,i+1},
according to Theorem 14 and Theorem 19. Thus E(σ̂_{i,i+1}) = σ_{i,i+1} and the theorem is proved. □

The proof of consistency follows the same structure as the proof above, but instead uses that the estimators are unbiased and studies their variance.

Theorem 21. The estimators from Proposition 6 are consistent.

Proof. The estimators B̂, σ̂_ii for i = 1, ..., p and σ̂_12 coincide with the (corrected) maximum likelihood estimators and are thus, according to Theorem 12, consistent. Therefore it remains to prove that σ̂_{i,i+1} is consistent for i = 2, ..., p − 1. When the estimators are derived in the Appendix, they are conditioned on the previous y's; that is, the calculation of σ̂_{i,i+1} assumes y_1, ..., y_{i−1} to be known constants. Therefore, the matrix A_{i−1} below can be considered a non-random matrix. We can then consider σ̂_{i,i+1} as a bilinear form and calculate its variance as
var(σ̂_{i,i+1}) = var((1/(n − k − 1)) y_i'A_{i−1}y_{i+1}) = (1/(n − k − 1)²) var(y_i'A_{i−1}y_{i+1} | y_1, ..., y_{i−1})
            = (1/(n − k − 1)²) (rank(A_{i−1}) σ_{i,i+1}² + tr(A_{i−1}²) σ_ii σ_{i+1,i+1})
            = (1/(n − k − 1)²) (n − k − 1) (σ_{i,i+1}² + σ_ii σ_{i+1,i+1})
            = (σ_{i,i+1}² + σ_ii σ_{i+1,i+1}) / (n − k − 1),
according to Theorem 14, Theorem 19 and since rank(A_{i−1}) = tr(A_{i−1}) = tr(A_{i−1}²). One can see that var(σ̂_{i,i+1}) → 0 when n → ∞. Estimators which are both unbiased and whose variance tends to 0 when n → ∞ are consistent according to Theorem 4. Thus the theorem is proved. □

We have presented and proved that these explicit estimators can be used, with good properties, for a general linear model.

Chapter 5
Simulations

In this chapter we display some simulations of the unbiased covariance matrix estimators presented in Proposition 3 and Proposition 6 and compare them to the previous estimators in Proposition 1 and Proposition 5. This gives an idea of how much better the improved estimators perform.

5.1 Simulations of the regular normal distribution

In this section we assumed that x ~ N_4(µ, Σ^{(1)}(4)) for a fixed mean vector µ and a fixed tridiagonal covariance matrix Σ. In each simulation a sample of size n = 20 observations was randomly generated using MATLAB, and the unbiased explicit estimators and the previous explicit estimators were calculated. This was repeated a large number of times and the average values of the obtained estimates were computed, giving an average unbiased explicit estimate Σ̂_new and an average previous estimate Σ̂_prev.

In this simulation the unbiased estimators perform considerably better than the previous ones. This is in line with the theory presented.

5.2 Simulations of the estimators for a general linear model

In this section we assumed that Y = XB + E ~ N_{n,5}(XB, I_n, Σ^{(1)}(5)) for a fixed tridiagonal covariance matrix Σ. For each simulation the matrices X and B were randomly generated using MATLAB's randomizer routine, to avoid any effect on the estimation process. In each simulation a sample of size n = 80 observations was randomly generated using MATLAB, and the unbiased explicit estimators and the previous explicit estimators were calculated. This was repeated a large number of times and the average values of the obtained estimates were computed, giving an average unbiased explicit estimate Σ̂_new and an average previous estimate Σ̂_prev.

In this simulation the unbiased estimators perform vastly better than the previous estimators. This is also in line with the theory, since the adjustment for unbiasedness is larger than for an ordinary normal distribution.
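As an illustration, here is a minimal sketch (assuming NumPy; the tridiagonal Σ below is an arbitrary example of my own, not the thesis' exact matrix) of the kind of simulation used in Section 5.1: it compares the averages of the Proposition 1 and Proposition 3 estimates of σ_23 over repeated samples of size n = 20, showing the downward bias of the previous estimator.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n, reps = 4, 20, 10_000
Sigma = np.array([[2.0, 0.5, 0.0, 0.0],     # illustrative tridiagonal Sigma
                  [0.5, 1.5, 0.4, 0.0],
                  [0.0, 0.4, 1.0, 0.3],
                  [0.0, 0.0, 0.3, 2.0]])
mu = np.zeros(p)
C = np.eye(n) - np.ones((n, n)) / n

prev, new = np.zeros(reps), np.zeros(reps)
for rep in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n).T        # p x n sample
    r1 = X[0]                                                # r_1 = x_1
    s2 = (r1 @ C @ X[1]) / (r1 @ C @ X[0])
    r2 = X[1] - s2 * r1                                      # residual vector r_2
    prev[rep] = r2 @ C @ X[2] / n                            # Proposition 1 estimate of sigma_23
    new[rep] = r2 @ C @ X[2] / (n - 2)                       # Proposition 3 estimate of sigma_23

print(Sigma[1, 2], prev.mean(), new.mean())   # true value vs. the two averages
```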

Chapter 6
Discussion and further research

6.1 Discussion

This thesis presents unbiased and consistent estimators for a covariance matrix with a banded structure of order one. This progress, together with the generalization to a general linear model, makes the estimators more suitable for use in real situations, since unbiasedness is a highly desired property. The simulation study in Chapter 5 shows that the effect is prominent, and since the estimator in the earlier paper [15] was comparable to other commonly used methods, this improved version should be efficient as well.

6.2 Further research

The main direction for future research in this area is to extend these results to a banded covariance matrix of any order. This would provide explicit estimators with good properties. Another step is to compare these estimators with the MLEs for a banded covariance matrix and determine how well they perform in that comparison. Lastly, a direction of research could be to study the efficiency of the estimator, which would give a comparison against all other possible unbiased estimators of a banded covariance matrix.


Appendix A
Motivation of the estimator

In this chapter the motivation of the explicit estimator presented in [15] is given. Here follows some notation specific to the appendix. We define M^{ji}_{(k)} as the matrix obtained when the j-th row and the i-th column have been removed from Σ_{(k)}. Moreover, we partition the matrix Σ^{(m)}(k) as
Σ^{(m)}(k) = ( Σ^{(m)}(k − 1)  σ_{1k} ; σ_{k1}'  σ_{kk} ),
where σ_{k1}' = (0, ..., 0, σ_{k,k−m}, ..., σ_{k,k−1}). For a matrix X, the notation X_{i:j} is used for the matrix including rows i to j, i.e., X_{i:j} = (x_i, x_{i+1}, ..., x_j)'.

The estimators are based on the likelihood. However, instead of maximizing the complete likelihood, we factor the likelihood and maximize each term; in this way explicit estimators are obtained. By conditioning, the probability density equals
f(X) = f(x_p | X_{1:p−1}) ··· f(x_{m+2} | X_{1:m+1}) f(X_{1:m+1}).
Hence, for k = m + 2, ..., p, partition the covariance matrix Σ_{(k)} as above. We have
x_k | X_{1:k−1} ~ N_{1,n}(µ_{k|1:k−1}, σ_{k|1:k−1}, I_n),
where the conditional variance equals
σ_{k|1:k−1} = σ_{kk} − σ_{k1}'Σ^{-1}_{(k−1)}σ_{1k},
and where the conditional expectation equals
µ_{k|1:k−1} = µ_k1_n + σ_{k1}'Σ^{-1}_{(k−1)} (x_1 − µ_11_n, ..., x_{k−1} − µ_{k−1}1_n)' = r_{k0}1_n + ∑_{i=k−m}^{k−1} σ_{ki} ∑_{j=1}^{k−1} σ^{ij}_{(k−1)} x_j.   (A.1)
Here σ^{ij}_{(k−1)} are the elements of the inverse matrix
Σ^{-1}_{(k−1)} = (σ^{ij}_{(k−1)})_{i,j} = ((−1)^{i+j} |M^{ji}_{(k−1)}| / |Σ_{(k−1)}|)_{i,j}.
The first regression coefficient equals
r_{k0} = µ_k − ∑_{i=k−m}^{k−1} σ_{ki} ∑_{j=1}^{k−1} σ^{ij}_{(k−1)} µ_j = µ_k − ∑_{i=k−m}^{k−1} σ_{ki} ∑_{j=1}^{k−1} (−1)^{i+j} (|M^{ji}_{(k−1)}| / |Σ_{(k−1)}|) µ_j.
We may rewrite equation (A.1) as
µ_{k|1:k−1} = r_{k0}1_n + ∑_{i=k−m}^{k−1} σ_{ki} ∑_{j=1}^{k−1} (−1)^{i+j} (|M^{ji}_{(k−1)}| / |Σ_{(k−1)}|) x_j = r_{k0}1_n + ∑_{i=k−m}^{k−1} r_{ki} s_{k−1,i} = S_{k−1}r_k,
where
r_k = (r_{k0}, r_{k,k−m}, ..., r_{k,k−1})',
S_{k−1} = (1_n, s_{k−1,k−m}, ..., s_{k−1,k−1}),
r_{ki} = σ_{ki} |Σ_{(k−2)}| / |Σ_{(k−1)}|, for i = k − m, ..., k − 1,
s_{k−1,i} = ∑_{j=1}^{k−1} (−1)^{i+j} (|M^{ji}_{(k−1)}| / |Σ_{(k−2)}|) x_j, for i = k − m, ..., k − 1.
The proposed estimators of the regression coefficients in the k-th step are
r̂_k = (r̂_{k0}, r̂_{k,k−m}, ..., r̂_{k,k−1})' = (Ŝ_{k−1}'Ŝ_{k−1})^{-1}Ŝ_{k−1}'x_k,
where
Ŝ_{k−1} = (1_n, ŝ_{k−1,k−m}, ..., ŝ_{k−1,k−1}),
ŝ_{k−1,i} = ∑_{j=1}^{k−1} (−1)^{i+j} (|M̂^{ji}_{(k−1)}| / |Σ̂_{(k−2)}|) x_j, for i = k − m, ..., k − 1.
Here the estimators from the previous steps (1, 2, ..., k − 1) are inserted in ŝ_{k−1,i} for all i = k − m, ..., k − 1. The estimator of the conditional variance is given by
σ̂_{k|1:k−1} = (1/n) (x_k − µ̂_{k|1:k−1})'(x_k − µ̂_{k|1:k−1}) = (1/n) x_k'(I_n − Ŝ_{k−1}(Ŝ_{k−1}'Ŝ_{k−1})^{-1}Ŝ_{k−1}')x_k.
The estimators of the original parameters may then be calculated as
σ̂_{ki} = r̂_{ki} |Σ̂_{(k−1)}| / |Σ̂_{(k−2)}|, for i = k − m, ..., k − 1,
µ̂_k = r̂_{k0} + ∑_{i=k−m}^{k−1} σ̂_{ki} ∑_{j=1}^{k−1} (−1)^{i+j} (|M̂^{ji}_{(k−1)}| / |Σ̂_{(k−1)}|) µ̂_j,
and
σ̂_{kk} = (1/n) x_k'(I_n − Ŝ_{k−1}(Ŝ_{k−1}'Ŝ_{k−1})^{-1}Ŝ_{k−1}')x_k + σ̂_{k1}'Σ̂^{-1}_{(k−1)}σ̂_{1k}.
It remains to show that the estimator µ̂_k is the mean of x_k, i.e., µ̂_k = (1/n) x_k'1_n for all k = 1, ..., p, and a proof by induction is now presented.

Base step: For k = 1, 2, ..., m + 1 we have µ̂_k = (1/n) x_k'1_n, since the estimators are MLEs in a model with an unstructured covariance matrix.

Inductive step: For some k > m + 1, assume that µ̂_j = (1/n) x_j'1_n for all j < k. Then
µ̂_k = r̂_{k0} + ∑_{i=k−m}^{k−1} σ̂_{ki} ∑_{j=1}^{k−1} (−1)^{i+j} (|M̂^{ji}_{(k−1)}| / |Σ̂_{(k−1)}|) µ̂_j
    = r̂_{k0} + ∑_{i=k−m}^{k−1} r̂_{ki} ∑_{j=1}^{k−1} (−1)^{i+j} (|M̂^{ji}_{(k−1)}| / |Σ̂_{(k−2)}|) (1/n) x_j'1_n
    = r̂_{k0} + ∑_{i=k−m}^{k−1} r̂_{ki} (1/n) ŝ_{k−1,i}'1_n
    = (1/n) 1_n'Ŝ_{k−1}r̂_k = (1/n) 1_n'Ŝ_{k−1}(Ŝ_{k−1}'Ŝ_{k−1})^{-1}Ŝ_{k−1}'x_k.
Since Ŝ_{k−1}(Ŝ_{k−1}'Ŝ_{k−1})^{-1}Ŝ_{k−1}' is a projection onto a space which contains the vector 1_n, we have
µ̂_k = (1/n) 1_n'Ŝ_{k−1}(Ŝ_{k−1}'Ŝ_{k−1})^{-1}Ŝ_{k−1}'x_k = (1/n) 1_n'x_k.
Hence, by induction, all the estimators of the expectations are means, i.e., µ̂_k = (1/n) x_k'1_n.


Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

Mathematical Foundations of Applied Statistics: Matrix Algebra

Mathematical Foundations of Applied Statistics: Matrix Algebra Mathematical Foundations of Applied Statistics: Matrix Algebra Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/105 Literature Seber, G.

More information

Chapter 5 Matrix Approach to Simple Linear Regression

Chapter 5 Matrix Approach to Simple Linear Regression STAT 525 SPRING 2018 Chapter 5 Matrix Approach to Simple Linear Regression Professor Min Zhang Matrix Collection of elements arranged in rows and columns Elements will be numbers or symbols For example:

More information

STAT200C: Review of Linear Algebra

STAT200C: Review of Linear Algebra Stat200C Instructor: Zhaoxia Yu STAT200C: Review of Linear Algebra 1 Review of Linear Algebra 1.1 Vector Spaces, Rank, Trace, and Linear Equations 1.1.1 Rank and Vector Spaces Definition A vector whose

More information

Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013

Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013 Multivariate Gaussian Distribution Auxiliary notes for Time Series Analysis SF2943 Spring 203 Timo Koski Department of Mathematics KTH Royal Institute of Technology, Stockholm 2 Chapter Gaussian Vectors.

More information

MATRIX ALGEBRA. or x = (x 1,..., x n ) R n. y 1 y 2. x 2. x m. y m. y = cos θ 1 = x 1 L x. sin θ 1 = x 2. cos θ 2 = y 1 L y.

MATRIX ALGEBRA. or x = (x 1,..., x n ) R n. y 1 y 2. x 2. x m. y m. y = cos θ 1 = x 1 L x. sin θ 1 = x 2. cos θ 2 = y 1 L y. as Basics Vectors MATRIX ALGEBRA An array of n real numbers x, x,, x n is called a vector and it is written x = x x n or x = x,, x n R n prime operation=transposing a column to a row Basic vector operations

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

Topic 7 - Matrix Approach to Simple Linear Regression. Outline. Matrix. Matrix. Review of Matrices. Regression model in matrix form

Topic 7 - Matrix Approach to Simple Linear Regression. Outline. Matrix. Matrix. Review of Matrices. Regression model in matrix form Topic 7 - Matrix Approach to Simple Linear Regression Review of Matrices Outline Regression model in matrix form - Fall 03 Calculations using matrices Topic 7 Matrix Collection of elements arranged in

More information

Foundations of Matrix Analysis

Foundations of Matrix Analysis 1 Foundations of Matrix Analysis In this chapter we recall the basic elements of linear algebra which will be employed in the remainder of the text For most of the proofs as well as for the details, the

More information

Chapter 3. Matrices. 3.1 Matrices

Chapter 3. Matrices. 3.1 Matrices 40 Chapter 3 Matrices 3.1 Matrices Definition 3.1 Matrix) A matrix A is a rectangular array of m n real numbers {a ij } written as a 11 a 12 a 1n a 21 a 22 a 2n A =.... a m1 a m2 a mn The array has m rows

More information

1 Appendix A: Matrix Algebra

1 Appendix A: Matrix Algebra Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix

More information

MATH 431: FIRST MIDTERM. Thursday, October 3, 2013.

MATH 431: FIRST MIDTERM. Thursday, October 3, 2013. MATH 431: FIRST MIDTERM Thursday, October 3, 213. (1) An inner product on the space of matrices. Let V be the vector space of 2 2 real matrices (that is, the algebra Mat 2 (R), but without the mulitiplicative

More information

[POLS 8500] Review of Linear Algebra, Probability and Information Theory

[POLS 8500] Review of Linear Algebra, Probability and Information Theory [POLS 8500] Review of Linear Algebra, Probability and Information Theory Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017 For today... Basic linear algebra. Basic probability. Programming

More information

Massachusetts Institute of Technology Department of Economics Statistics. Lecture Notes on Matrix Algebra

Massachusetts Institute of Technology Department of Economics Statistics. Lecture Notes on Matrix Algebra Massachusetts Institute of Technology Department of Economics 14.381 Statistics Guido Kuersteiner Lecture Notes on Matrix Algebra These lecture notes summarize some basic results on matrix algebra used

More information

Exam 2. Jeremy Morris. March 23, 2006

Exam 2. Jeremy Morris. March 23, 2006 Exam Jeremy Morris March 3, 006 4. Consider a bivariate normal population with µ 0, µ, σ, σ and ρ.5. a Write out the bivariate normal density. The multivariate normal density is defined by the following

More information

Gaussian Models (9/9/13)

Gaussian Models (9/9/13) STA561: Probabilistic machine learning Gaussian Models (9/9/13) Lecturer: Barbara Engelhardt Scribes: Xi He, Jiangwei Pan, Ali Razeen, Animesh Srivastava 1 Multivariate Normal Distribution The multivariate

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Review of Linear Algebra Denis Helic KTI, TU Graz Oct 9, 2014 Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 1 / 74 Big picture: KDDM Probability Theory

More information

The Multivariate Gaussian Distribution

The Multivariate Gaussian Distribution The Multivariate Gaussian Distribution Chuong B. Do October, 8 A vector-valued random variable X = T X X n is said to have a multivariate normal or Gaussian) distribution with mean µ R n and covariance

More information

Theory of Statistics.

Theory of Statistics. Theory of Statistics. Homework V February 5, 00. MT 8.7.c When σ is known, ˆµ = X is an unbiased estimator for µ. If you can show that its variance attains the Cramer-Rao lower bound, then no other unbiased

More information

Bare minimum on matrix algebra. Psychology 588: Covariance structure and factor models

Bare minimum on matrix algebra. Psychology 588: Covariance structure and factor models Bare minimum on matrix algebra Psychology 588: Covariance structure and factor models Matrix multiplication 2 Consider three notations for linear combinations y11 y1 m x11 x 1p b11 b 1m y y x x b b n1

More information

Linear Algebra. Matrices Operations. Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0.

Linear Algebra. Matrices Operations. Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0. Matrices Operations Linear Algebra Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0 The rectangular array 1 2 1 4 3 4 2 6 1 3 2 1 in which the

More information

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document

More information

Introduction to Quantitative Techniques for MSc Programmes SCHOOL OF ECONOMICS, MATHEMATICS AND STATISTICS MALET STREET LONDON WC1E 7HX

Introduction to Quantitative Techniques for MSc Programmes SCHOOL OF ECONOMICS, MATHEMATICS AND STATISTICS MALET STREET LONDON WC1E 7HX Introduction to Quantitative Techniques for MSc Programmes SCHOOL OF ECONOMICS, MATHEMATICS AND STATISTICS MALET STREET LONDON WC1E 7HX September 2007 MSc Sep Intro QT 1 Who are these course for? The September

More information

A matrix over a field F is a rectangular array of elements from F. The symbol

A matrix over a field F is a rectangular array of elements from F. The symbol Chapter MATRICES Matrix arithmetic A matrix over a field F is a rectangular array of elements from F The symbol M m n (F ) denotes the collection of all m n matrices over F Matrices will usually be denoted

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

STAT 730 Chapter 4: Estimation

STAT 730 Chapter 4: Estimation STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum

More information

Gaussian vectors and central limit theorem

Gaussian vectors and central limit theorem Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Boolean Inner-Product Spaces and Boolean Matrices

Boolean Inner-Product Spaces and Boolean Matrices Boolean Inner-Product Spaces and Boolean Matrices Stan Gudder Department of Mathematics, University of Denver, Denver CO 80208 Frédéric Latrémolière Department of Mathematics, University of Denver, Denver

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information

A Quick Tour of Linear Algebra and Optimization for Machine Learning

A Quick Tour of Linear Algebra and Optimization for Machine Learning A Quick Tour of Linear Algebra and Optimization for Machine Learning Masoud Farivar January 8, 2015 1 / 28 Outline of Part I: Review of Basic Linear Algebra Matrices and Vectors Matrix Multiplication Operators

More information

Random Vectors 1. STA442/2101 Fall See last slide for copyright information. 1 / 30

Random Vectors 1. STA442/2101 Fall See last slide for copyright information. 1 / 30 Random Vectors 1 STA442/2101 Fall 2017 1 See last slide for copyright information. 1 / 30 Background Reading: Renscher and Schaalje s Linear models in statistics Chapter 3 on Random Vectors and Matrices

More information

ECE 275A Homework 6 Solutions

ECE 275A Homework 6 Solutions ECE 275A Homework 6 Solutions. The notation used in the solutions for the concentration (hyper) ellipsoid problems is defined in the lecture supplement on concentration ellipsoids. Note that θ T Σ θ =

More information

Matrix Algebra: Summary

Matrix Algebra: Summary May, 27 Appendix E Matrix Algebra: Summary ontents E. Vectors and Matrtices.......................... 2 E.. Notation.................................. 2 E..2 Special Types of Vectors.........................

More information

Canonical Correlation Analysis of Longitudinal Data

Canonical Correlation Analysis of Longitudinal Data Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate

More information

Fundamentals of Engineering Analysis (650163)

Fundamentals of Engineering Analysis (650163) Philadelphia University Faculty of Engineering Communications and Electronics Engineering Fundamentals of Engineering Analysis (6563) Part Dr. Omar R Daoud Matrices: Introduction DEFINITION A matrix is

More information

Common-Knowledge / Cheat Sheet

Common-Knowledge / Cheat Sheet CSE 521: Design and Analysis of Algorithms I Fall 2018 Common-Knowledge / Cheat Sheet 1 Randomized Algorithm Expectation: For a random variable X with domain, the discrete set S, E [X] = s S P [X = s]

More information

The Hilbert Space of Random Variables

The Hilbert Space of Random Variables The Hilbert Space of Random Variables Electrical Engineering 126 (UC Berkeley) Spring 2018 1 Outline Fix a probability space and consider the set H := {X : X is a real-valued random variable with E[X 2

More information

Random Vectors and Multivariate Normal Distributions

Random Vectors and Multivariate Normal Distributions Chapter 3 Random Vectors and Multivariate Normal Distributions 3.1 Random vectors Definition 3.1.1. Random vector. Random vectors are vectors of random 75 variables. For instance, X = X 1 X 2., where each

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

Ch4. Distribution of Quadratic Forms in y

Ch4. Distribution of Quadratic Forms in y ST4233, Linear Models, Semester 1 2008-2009 Ch4. Distribution of Quadratic Forms in y 1 Definition Definition 1.1 If A is a symmetric matrix and y is a vector, the product y Ay = i a ii y 2 i + i j a ij

More information

Linear Models Review

Linear Models Review Linear Models Review Vectors in IR n will be written as ordered n-tuples which are understood to be column vectors, or n 1 matrices. A vector variable will be indicted with bold face, and the prime sign

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n = Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

CS 246 Review of Linear Algebra 01/17/19

CS 246 Review of Linear Algebra 01/17/19 1 Linear algebra In this section we will discuss vectors and matrices. We denote the (i, j)th entry of a matrix A as A ij, and the ith entry of a vector as v i. 1.1 Vectors and vector operations A vector

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

ELEG 5633 Detection and Estimation Minimum Variance Unbiased Estimators (MVUE)

ELEG 5633 Detection and Estimation Minimum Variance Unbiased Estimators (MVUE) 1 ELEG 5633 Detection and Estimation Minimum Variance Unbiased Estimators (MVUE) Jingxian Wu Department of Electrical Engineering University of Arkansas Outline Minimum Variance Unbiased Estimators (MVUE)

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Introduction to Matrix Algebra August 18, 2010 1 Vectors 1.1 Notations A p-dimensional vector is p numbers put together. Written as x 1 x =. x p. When p = 1, this represents a point in the line. When p

More information

3 Multiple Linear Regression

3 Multiple Linear Regression 3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is

More information

CS281A/Stat241A Lecture 17

CS281A/Stat241A Lecture 17 CS281A/Stat241A Lecture 17 p. 1/4 CS281A/Stat241A Lecture 17 Factor Analysis and State Space Models Peter Bartlett CS281A/Stat241A Lecture 17 p. 2/4 Key ideas of this lecture Factor Analysis. Recall: Gaussian

More information

2. Linear algebra. matrices and vectors. linear equations. range and nullspace of matrices. function of vectors, gradient and Hessian

2. Linear algebra. matrices and vectors. linear equations. range and nullspace of matrices. function of vectors, gradient and Hessian FE661 - Statistical Methods for Financial Engineering 2. Linear algebra Jitkomut Songsiri matrices and vectors linear equations range and nullspace of matrices function of vectors, gradient and Hessian

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information

Chapter 2. Linear Algebra. rather simple and learning them will eventually allow us to explain the strange results of

Chapter 2. Linear Algebra. rather simple and learning them will eventually allow us to explain the strange results of Chapter 2 Linear Algebra In this chapter, we study the formal structure that provides the background for quantum mechanics. The basic ideas of the mathematical machinery, linear algebra, are rather simple

More information

Stat 206: Linear algebra

Stat 206: Linear algebra Stat 206: Linear algebra James Johndrow (adapted from Iain Johnstone s notes) 2016-11-02 Vectors We have already been working with vectors, but let s review a few more concepts. The inner product of two

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

Hands-on Matrix Algebra Using R

Hands-on Matrix Algebra Using R Preface vii 1. R Preliminaries 1 1.1 Matrix Defined, Deeper Understanding Using Software.. 1 1.2 Introduction, Why R?.................... 2 1.3 Obtaining R.......................... 4 1.4 Reference Manuals

More information

Stat 206: Sampling theory, sample moments, mahalanobis

Stat 206: Sampling theory, sample moments, mahalanobis Stat 206: Sampling theory, sample moments, mahalanobis topology James Johndrow (adapted from Iain Johnstone s notes) 2016-11-02 Notation My notation is different from the book s. This is partly because

More information

Lecture 1 Review: Linear models have the form (in matrix notation) Y = Xβ + ε,

Lecture 1 Review: Linear models have the form (in matrix notation) Y = Xβ + ε, 2. REVIEW OF LINEAR ALGEBRA 1 Lecture 1 Review: Linear models have the form (in matrix notation) Y = Xβ + ε, where Y n 1 response vector and X n p is the model matrix (or design matrix ) with one row for

More information

Testing a Normal Covariance Matrix for Small Samples with Monotone Missing Data

Testing a Normal Covariance Matrix for Small Samples with Monotone Missing Data Applied Mathematical Sciences, Vol 3, 009, no 54, 695-70 Testing a Normal Covariance Matrix for Small Samples with Monotone Missing Data Evelina Veleva Rousse University A Kanchev Department of Numerical

More information