Probability Theory
Chapter 5: The multivariate normal distribution

Linear transformations

A transformation is said to be linear if every function in the transformation is a linear combination of the components of its argument. When dealing with linear transformations it is convenient to use matrix notation: a linear transformation can then be written as Y = BX + b, where B is a constant matrix and b is a constant vector.

The mean vector and the covariance matrix

Definition 2.1. Let X be a random n-vector whose components have finite variance. The mean vector of X is μ = E(X), and the covariance matrix of X is Λ = E[(X − μ)(X − μ)′].

When dealing with random vectors and matrices, expectations are taken componentwise, which means that E(X) = (E(X_1), ..., E(X_n))′. That is, the elements of the mean vector are the means of the components of X. For the covariance matrix of X it follows that the (i, j) element is Λ_ij = Cov(X_i, X_j); in particular, the diagonal elements are the variances of the components.
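As a concrete illustration of Definition 2.1, the sketch below (Python with NumPy; the data are simulated and purely illustrative, not from the text) estimates a mean vector and covariance matrix componentwise:

```python
import numpy as np

rng = np.random.default_rng(0)
# 1000 observations of a random 3-vector, stored as the columns of a 3x1000 array.
X = rng.normal(size=(3, 1000))

# Componentwise expectation: the mean vector is the vector of component means.
mu_hat = X.mean(axis=1)

# Sample analogue of Lambda = E[(X - mu)(X - mu)']: average the outer products.
centered = X - mu_hat[:, None]
Lambda_hat = centered @ centered.T / X.shape[1]

print(mu_hat.shape)      # (3,)
print(Lambda_hat.shape)  # (3, 3)
```

By construction the estimate is symmetric with non-negative diagonal entries, mirroring the properties of a covariance matrix.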
Expectations for linear transformations

Theorem 2.2. Let X be a random n-vector with mean vector μ and covariance matrix Λ. Further, let B be a constant m × n matrix, let b be a constant m-vector, and set Y = BX + b. Then

E(Y) = Bμ + b   and   Cov(Y) = BΛB′.

Proof (the covariance matrix). Because multiplicative constant matrices can be moved outside of the expectation, it follows that

Cov(Y) = E[(Y − E(Y))(Y − E(Y))′] = E[B(X − μ)(X − μ)′B′] = B E[(X − μ)(X − μ)′] B′ = BΛB′.

The multivariate normal distribution: Definition I

Definition I. The random n-vector X is normal iff, for every constant n-vector a, the linear combination a′X is (univariate) normal.

Notation. The notation X ∼ N(μ, Λ) is used to denote that X has a multivariate normal distribution with mean vector μ and covariance matrix Λ.

Theorem 3.1. Let X ∼ N(μ, Λ) and set Y = BX + b. Then Y ∼ N(Bμ + b, BΛB′).

Proof. The correctness of the mean vector and the covariance matrix follows directly from Theorem 2.2. Next we prove that every linear combination of Y is normal, by showing that a linear combination of Y is another linear combination of X: a′Y = a′BX + a′b = (B′a)′X + a′b, which is univariate normal by Definition I.

Exercise 5.3.2. Let X = (X_1, X_2)′ be a normal random vector, distributed as X ∼ N(μ, Λ) with the mean vector and covariance matrix given in the exercise. What is the joint distribution of Y_1 = X_1 + X_2 and Y_2 = 2X_1 − 3X_2? Since Y = BX with

B = [ 1   1 ]
    [ 2  −3 ]

it follows from Theorem 3.1 that Y is normal with mean vector Bμ and covariance matrix BΛB′.
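Theorem 3.1 reduces Exercise 5.3.2 to a matrix computation. A minimal sketch, using a hypothetical μ and Λ (placeholders, not the exercise's actual values):

```python
import numpy as np

# Hypothetical inputs: X ~ N(mu, Lam). These are NOT the exercise's values,
# which are not reproduced in these notes.
mu = np.array([0.0, 0.0])
Lam = np.array([[2.0, 1.0],
                [1.0, 3.0]])

# Y1 = X1 + X2 and Y2 = 2*X1 - 3*X2, i.e. Y = B X.
B = np.array([[1.0, 1.0],
              [2.0, -3.0]])

# Theorem 3.1: Y ~ N(B mu, B Lam B').
mu_Y = B @ mu
Lam_Y = B @ Lam @ B.T
print(mu_Y)
print(Lam_Y)
```

With these placeholder values the covariance matrix of Y comes out as [[7, −6], [−6, 23]].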
The multivariate normal distribution: Definition II (transforms)

The moment generating function of a random vector X is given by

ψ_X(t) = E(e^{t′X}).

Definition II. The random vector X is normal, N(μ, Λ), iff its moment generating function is of the form

ψ_X(t) = exp(t′μ + ½ t′Λt).

Theorem 4.2. Definition I and Definition II are equivalent.

The meaning. If every linear combination of X is univariate normal, then the moment generating function of X is of the form given above. If, on the other hand, the moment generating function of X is of the form given above, then every linear combination of X is univariate normal.

Proof of Theorem 4.2: Definition I implies Definition II. Let X be N(μ, Λ) by Definition I. The mgf of X is given by ψ_X(t) = E(e^{t′X}), and since Y = t′X is a linear combination of the components of X, it follows from Definition I that Y is (univariate) normal and therefore has a moment generating function. Furthermore, it follows from Theorem 2.2 that E(Y) = t′μ and Var(Y) = t′Λt. Hence

ψ_X(t) = E(e^Y) = exp(E(Y) + ½ Var(Y)) = exp(t′μ + ½ t′Λt),

and the first part of the proof is established.

Properties of symmetric matrices

Definition. A symmetric matrix A is said to be positive definite if the quadratic form x′Ax is positive for all x ≠ 0. If the quadratic form is non-negative for all x, then A is said to be non-negative definite (or positive semidefinite).

Theorem 2.1. Every covariance matrix Λ is non-negative definite.

Proof. Let X be a random vector whose covariance matrix is Λ, and consider the linear combination y′X. By Theorem 2.2, 0 ≤ Var(y′X) = y′Λy for every y, and the theorem is proved.

Orthogonal matrices. A square matrix C is an orthogonal matrix if C′C = I, where I is the identity matrix. It follows that the rows (and the columns) of an orthogonal matrix are orthonormal, that is, they all have unit length and they are pairwise orthogonal.

Diagonal matrices. A square matrix D is a diagonal matrix if the diagonal elements are the only non-zero elements of D.

Diagonalization. Let A be a symmetric matrix. Then there exists an orthogonal matrix C and a diagonal matrix D such that A = CDC′.
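The diagonalization A = CDC′ can be checked numerically; numpy.linalg.eigh returns the eigenvalues (the diagonal of D) and an orthogonal matrix of eigenvectors:

```python
import numpy as np

# An example symmetric matrix (chosen arbitrarily for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh gives eigenvalues and an orthogonal eigenvector matrix C with A = C D C'.
eigvals, C = np.linalg.eigh(A)
D = np.diag(eigvals)

# C is orthogonal: C'C = I.
print(np.allclose(C.T @ C, np.eye(2)))  # True
# The decomposition reproduces A.
print(np.allclose(C @ D @ C.T, A))      # True
```

Here the eigenvalues are 1 and 3, the diagonal of D.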
Furthermore, the diagonal elements of D are the eigenvalues of A.

The square root. Let A be a non-negative definite symmetric matrix. The square root of A is a matrix, usually denoted A^{1/2}, such that A^{1/2} A^{1/2} = A. It follows from the diagonalization of A that A^{1/2} = C D^{1/2} C′, where D^{1/2} is the diagonal matrix whose elements are the square roots of the eigenvalues.
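A short sketch of the square root construction A^{1/2} = C D^{1/2} C′, for an example positive-definite matrix:

```python
import numpy as np

# An example positive-definite covariance matrix (illustrative values).
Lam = np.array([[4.0, 2.0],
                [2.0, 10.0]])

# Square root via the spectral decomposition: Lam^{1/2} = C D^{1/2} C'.
eigvals, C = np.linalg.eigh(Lam)
sqrt_Lam = C @ np.diag(np.sqrt(eigvals)) @ C.T

# Check the defining property: Lam^{1/2} Lam^{1/2} = Lam.
print(np.allclose(sqrt_Lam @ sqrt_Lam, Lam))  # True
```

Note that this square root is itself symmetric, which is used below when X = Λ^{1/2}Y + μ is transformed.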
Proof of Theorem 4.2: Definition II implies Definition I. Let Y_1, ..., Y_n be independent N(0, 1); that is, Y = (Y_1, ..., Y_n)′ is N(0, I) by Definition I. The moment generating function of Y is given by

ψ_Y(t) = ∏ E(e^{t_i Y_i}) = ∏ exp(½ t_i²) = exp(½ t′t).

Next we let X = Λ^{1/2}Y + μ, and since this is a linear transformation of Y, it follows from Theorem 2.2 that E(X) = μ and Cov(X) = Λ^{1/2} I (Λ^{1/2})′ = Λ. The moment generating function of X is given by

ψ_X(t) = E(e^{t′(Λ^{1/2}Y + μ)}) = e^{t′μ} ψ_Y(Λ^{1/2}t) = exp(t′μ + ½ t′Λt),

which is the mgf given in Definition II. By the uniqueness of moment generating functions, any random vector with this mgf has the same distribution as X; and since any linear combination of X is a linear combination of Y plus a constant, X is normal, N(μ, Λ), according to Definition I.

Problem 5.10.30 (part 1)

Let X_1, X_2, and X_3 have the joint moment generating function given in the problem. Find the joint distribution of Y_1 = X_1 + X_3 and Y_2 = X_1 + X_2, that is, the distribution of the linear transformation Y = BX where

B = [ 1  0  1 ]
    [ 1  1  0 ]

Since, by Definition II, the given mgf shows that X_1, X_2, and X_3 are jointly normal, it follows from Theorem 3.1 that Y is normal with mean vector Bμ and covariance matrix BΛB′.
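The construction X = Λ^{1/2}Y + μ used in the proof doubles as a recipe for simulating N(μ, Λ): start from iid N(0, 1) components and apply the square root. A sketch with assumed example values for μ and Λ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative target parameters (assumed, not from the text).
mu = np.array([1.0, -2.0])
Lam = np.array([[2.0, 1.0],
                [1.0, 2.0]])

# Lam^{1/2} via the spectral decomposition.
w, C = np.linalg.eigh(Lam)
root = C @ np.diag(np.sqrt(w)) @ C.T

# Y has iid N(0,1) components, so Y ~ N(0, I); then X = Lam^{1/2} Y + mu ~ N(mu, Lam).
Y = rng.normal(size=(2, 100_000))
X = root @ Y + mu[:, None]

print(X.mean(axis=1))  # close to mu
print(np.cov(X))       # close to Lam
```

With 100 000 samples the empirical mean and covariance match μ and Λ to within sampling error.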
Important properties of determinants

1. A square matrix A is invertible iff det A ≠ 0.
2. For the identity matrix I we have det I = 1.
3. For the transpose of A we have det A′ = det A.
4. Let A and B be square matrices. Then det AB = det A · det B.
5. Results 2 and 4 now imply that det A^{-1} = (det A)^{-1}.
6. Let C be an orthogonal matrix. Results 2, 3, and 4 now imply that det C = ±1.
7. Since a symmetric matrix A can be diagonalized as A = CDC′, it follows by results 4 and 6 that det A = det D = λ_1 λ_2 ··· λ_n, where λ_1, λ_2, ..., λ_n are the eigenvalues of A.

The multivariate normal distribution: Definition III (the density function)

Definition III. The random vector X is normal, N(μ, Λ) (where det Λ > 0), iff its density function is of the form

f_X(x) = (2π)^{-n/2} (det Λ)^{-1/2} exp(−½ (x − μ)′Λ^{-1}(x − μ)).

Theorem 5.2. Definitions I, II, and III are equivalent (in the nonsingular case).

Idea for the proof. First we find a normal random vector Y whose density function is easy to derive. Then a suitably defined linear transformation of Y will be N(μ, Λ). Finally, the transformation theorem (Theorem 1.2.1) will give us the density function of X.

Proof of Theorem 5.2

Step 1. Find a normal random vector Y whose density function is easy to derive. Let Y_1, ..., Y_n be independent N(0, 1). Then, by Definition I, Y = (Y_1, ..., Y_n)′ is N(0, I). The density function of Y is given by

f_Y(y) = ∏_{i=1}^{n} (2π)^{-1/2} e^{-y_i²/2} = (2π)^{-n/2} exp(−½ y′y).

Step 2. We know from before that X = Λ^{1/2}Y + μ is N(μ, Λ).
Step 3. Find the density function of X; recall Theorem 1.2.1.
Step 3.1. Inversion yields Y = Λ^{-1/2}(X − μ).
Step 3.2. Since it is a linear transformation, the Jacobian becomes det Λ^{-1/2} = (det Λ)^{-1/2}.
Step 3.3. Finally, it follows from Theorem 1.2.1 that

f_X(x) = f_Y(Λ^{-1/2}(x − μ)) (det Λ)^{-1/2} = (2π)^{-n/2} (det Λ)^{-1/2} exp(−½ (x − μ)′Λ^{-1}(x − μ)).
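One sanity check on the density of Definition III: when Λ is diagonal the components are independent, so the joint density should factor into a product of univariate normal densities. A sketch (all numeric values are illustrative):

```python
import numpy as np

def mvn_pdf(x, mu, Lam):
    """Density from Definition III:
    (2*pi)^(-n/2) * det(Lam)^(-1/2) * exp(-(x-mu)' Lam^{-1} (x-mu) / 2)."""
    n = len(mu)
    d = x - mu
    quad = d @ np.linalg.inv(Lam) @ d
    return (2 * np.pi) ** (-n / 2) * np.linalg.det(Lam) ** (-0.5) * np.exp(-quad / 2)

def norm_pdf(x, m, s2):
    """Univariate N(m, s2) density."""
    return np.exp(-(x - m) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

mu = np.array([1.0, -1.0])
Lam = np.diag([2.0, 5.0])   # diagonal covariance -> independent components
x = np.array([0.5, 0.3])

joint = mvn_pdf(x, mu, Lam)
product = norm_pdf(0.5, 1.0, 2.0) * norm_pdf(0.3, -1.0, 5.0)
print(np.isclose(joint, product))  # True
```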
Problem 5.10.30 (part 2)

In the first part of the problem we found that Y = BX is bivariate normal, with covariance matrix

Λ_Y = [ 4   2 ]
      [ 2  10 ]

Since det Λ_Y = 4 · 10 − 2 · 2 = 36 and

Λ_Y^{-1} = (1/36) [ 10  −2 ]
                  [ −2   4 ]

it follows from Definition III that the density of Y is given by

f_Y(y) = (2π · 6)^{-1} exp(−½ (y − μ_Y)′ Λ_Y^{-1} (y − μ_Y)).

Conditional distributions

General situation. Let X be N(μ, Λ) with det Λ > 0. Furthermore, let X_1 and X_2 be subvectors of X, where the components of X_1 and X_2 are assumed to be different. Can anything be said about the distribution of X_2 | X_1 = x_1?

Answer. YES! Conditional distributions of multivariate normal distributions are normal.

Problem 5.10.30 (part 3)

Find the conditional density of Y_1 given that Y_2 = 1, that is, find f_{Y_1 | Y_2 = 1}(y_1). Since

f_{Y_1 | Y_2 = 1}(y_1) = f_{Y_1, Y_2}(y_1, 1) / f_{Y_2}(1),

completing the square in y_1 in the exponent of this ratio shows that the conditional density is a normal density with variance Λ_11 − Λ_12²/Λ_22 = 4 − 4/10 = 18/5 and mean μ_1 + (Λ_12/Λ_22)(1 − μ_2). Hence, the conditional distribution of Y_1 given that Y_2 = 1 is N(4/5, 18/5).

Independence

Natural question 1. Is there an easy way to determine whether the components of a normal random vector are independent?

Theorem 7.1. Let X be a normal random vector. The components of X are independent iff they are uncorrelated.

Proof. Independence always implies uncorrelatedness, so it remains to show that uncorrelated components imply independence. If the components are uncorrelated, then Λ is diagonal, so the mgf exp(t′μ + ½ t′Λt) factors into the product of the marginal mgfs exp(t_i μ_i + ½ Λ_ii t_i²), which means that the components are independent.
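The bivariate conditioning formulas can be sketched in code. The covariance matrix below is the one from Problem 5.10.30; the mean vector is a placeholder assumption (it is not reproduced in these notes), so only the conditional variance, which does not depend on the means, is checked against the stated answer 18/5:

```python
import numpy as np

def conditional_1_given_2(mu, Lam, x2):
    """For (X1, X2)' ~ N(mu, Lam), X1 | X2 = x2 is
    N(mu1 + Lam12/Lam22 * (x2 - mu2), Lam11 - Lam12**2 / Lam22)."""
    m = mu[0] + Lam[0, 1] / Lam[1, 1] * (x2 - mu[1])
    v = Lam[0, 0] - Lam[0, 1] ** 2 / Lam[1, 1]
    return m, v

# Covariance matrix from Problem 5.10.30.
Lam = np.array([[4.0, 2.0],
                [2.0, 10.0]])
mu = np.array([0.0, 0.0])  # placeholder mean vector (assumed)

m, v = conditional_1_given_2(mu, Lam, x2=1.0)
print(v)  # 3.6, i.e. 18/5, the conditional variance stated in the slides
```

The conditional mean depends on the assumed μ, so it is printed but not compared with the slides' value 4/5.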
Problem 5.10.10

Suppose that the moment generating function of (X, Y) is as given in the problem, with a parameter a. Determine a so that U = X + 2Y and V = 2X − Y become independent.

By Definition II, the given mgf shows that (X, Y) is bivariate normal and identifies its mean vector and covariance matrix Λ (which contains the parameter a). Since (U, V) is a linear transformation of (X, Y), it is clear that (U, V) is also bivariate normal, with covariance matrix BΛB′, where

B = [ 1   2 ]
    [ 2  −1 ]

It is, however, by Theorem 7.1, enough to determine an off-diagonal element: U and V are independent iff Cov(U, V) = 0, and working this out shows that only for a = 4/3 will U and V be independent.

Independence and linear transformations

Natural question 2. A linear transformation of a normal random vector is itself normal. Is it always possible to find a linear transformation that will have uncorrelated, and hence independent, components?

Theorem 8.1. Let X be N(μ, Λ). Furthermore, let C be the orthogonal matrix that diagonalizes Λ, that is, C′ΛC = D, where the diagonal elements of D are the eigenvalues of Λ. Then Y = C′X is N(C′μ, D).

Theorem 8.2. Let X be N(μ, σ²I). Furthermore, let C be an arbitrary orthogonal matrix. Then Y = C′X is N(C′μ, σ²I).

Conclusion. For the general N(μ, Λ) there always exists one orthogonal transformation that yields a normal random vector with independent components. For the special case N(μ, σ²I), any orthogonal transformation produces a normal random vector with independent components.

Problem 5.10.9 b

Let X and Y be independent N(0, σ²). Show that X + Y and X − Y are independent normal random variables.

Since X and Y are independent, we have that (X, Y)′ is bivariate normal, N(0, σ²I). Furthermore, (X + Y, X − Y)′ = B(X, Y)′ with

B = [ 1   1 ]
    [ 1  −1 ]

and because BB′ = 2I, the covariance matrix of (X + Y, X − Y)′ is σ²BB′ = 2σ²I. Since B/√2 is orthogonal, it follows from Theorem 8.2 (or directly from Theorem 7.1) that the components of (X + Y, X − Y)′ are independent normal random variables.
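The covariance computation in Problem 5.10.9 b can be verified directly with Theorem 2.2:

```python
import numpy as np

sigma2 = 2.0  # any common variance (value chosen for illustration)

# (X, Y)' ~ N(0, sigma^2 I); (X+Y, X-Y)' = B (X, Y)' with:
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])

# Theorem 2.2: Cov of the transformed vector is B (sigma^2 I) B' = sigma^2 B B'.
cov = B @ (sigma2 * np.eye(2)) @ B.T
print(cov)  # 2*sigma^2 * I: the off-diagonal elements vanish
```

The off-diagonal element is zero for any σ², so X + Y and X − Y are uncorrelated, hence independent by Theorem 7.1.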
Problem 5.10.37

Let (X, Y)′ ∼ N(0, Λ) with

Λ = [ 1  ρ ]
    [ ρ  1 ]

where ρ is the correlation coefficient. Determine the probability distribution of the quadratic form

W = (X² − 2ρXY + Y²)/(1 − ρ²).

The moment generating function of W is defined by ψ_W(t) = E(e^{tW}), and in order to find it we first have to find the joint density of X and Y. Since det Λ = 1 − ρ² and

Λ^{-1} = 1/(1 − ρ²) [  1  −ρ ]
                    [ −ρ   1 ]

it follows that the joint density function of X and Y is given by

f(x, y) = 1/(2π√(1 − ρ²)) exp(−(x² − 2ρxy + y²)/(2(1 − ρ²))).

It follows by the density of (X, Y) that the main part of the expression for the moment generating function of W is given by Q, where

ψ_W(t) = 1/(2π√(1 − ρ²)) ∫∫ e^{−Q/2} dx dy,   Q = (1 − 2t)(x² − 2ρxy + y²)/(1 − ρ²).

Since Q is the main part of a multivariate normal density function with covariance matrix Λ/(1 − 2t), the double integral equals 2π √(det(Λ/(1 − 2t))) = 2π √(1 − ρ²)/(1 − 2t) for t < 1/2. It follows that the moment generating function of W is given by

ψ_W(t) = 1/(1 − 2t),   t < 1/2,

and since this is the mgf of the chi-square distribution with two degrees of freedom, it is clear that W is χ²(2).
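A quick numerical check that the expression (x² − 2ρxy + y²)/(1 − ρ²) really is the quadratic form (x, y)Λ^{-1}(x, y)′ appearing in the exponent of the density (the values of ρ, x, y below are arbitrary test values):

```python
import numpy as np

rho = 0.6  # arbitrary correlation in (-1, 1)
Lam = np.array([[1.0, rho],
                [rho, 1.0]])  # det = 1 - rho^2

# Lam^{-1} = 1/(1-rho^2) * [[1, -rho], [-rho, 1]], so the quadratic form is
# (x^2 - 2*rho*x*y + y^2) / (1 - rho^2).
x, y = 0.7, -1.3
z = np.array([x, y])

Q = z @ np.linalg.inv(Lam) @ z
W = (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
print(np.isclose(Q, W))  # True
```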
The multivariate normal distribution and the chi-square distribution

Theorem 9.1. Let X be N(μ, Λ) with det Λ > 0. Then

(X − μ)′Λ^{-1}(X − μ) ∼ χ²(n),

where n is the dimension of X.

Proof. Set Y = Λ^{-1/2}(X − μ). Then Y is N(0, I) and it follows that

(X − μ)′Λ^{-1}(X − μ) = Y′Y = Y_1² + Y_2² + ··· + Y_n²,

and since Y_1, Y_2, ..., Y_n are i.i.d. N(0, 1), it is clear that Y′Y is χ²(n).
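The identity at the heart of the proof, (x − μ)′Λ^{-1}(x − μ) = ‖Λ^{-1/2}(x − μ)‖² = y′y, can be checked numerically for an example Λ (all values below are chosen arbitrarily for illustration):

```python
import numpy as np

# Example parameters (arbitrary; Lam is positive definite).
mu = np.array([1.0, 2.0, 3.0])
Lam = np.array([[2.0, 0.5, 0.0],
                [0.5, 1.0, 0.2],
                [0.0, 0.2, 1.5]])

# Lam^{-1/2} via the spectral decomposition.
w, C = np.linalg.eigh(Lam)
inv_root = C @ np.diag(1 / np.sqrt(w)) @ C.T

x = np.array([0.3, 2.5, 2.0])  # an arbitrary observation
y = inv_root @ (x - mu)        # the standardized vector y = Lam^{-1/2}(x - mu)

# The quadratic form equals y'y, a sum of n squares; for X ~ N(mu, Lam) the
# standardized components are i.i.d. N(0,1), giving the chi-square(n) law.
quad = (x - mu) @ np.linalg.inv(Lam) @ (x - mu)
print(np.isclose(quad, y @ y))  # True
```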