Multivariate Gaussian Distribution

Auxiliary notes for Time Series Analysis, SF2943, Spring 2013

Timo Koski

Department of Mathematics, KTH Royal Institute of Technology, Stockholm
Chapter 1

Gaussian Vectors

1.1 Multivariate Gaussian Distribution

Let us recall the following: $X$ is a normal random variable if
\[
f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2},
\]
where $\mu$ is real and $\sigma > 0$. Notation: $X \sim N(\mu, \sigma^2)$.

Properties:

1. $X \sim N(\mu,\sigma^2) \Rightarrow E[X] = \mu$, $\mathrm{Var}(X) = \sigma^2$.

2. If $X \sim N(\mu,\sigma^2)$, then the moment generating function is
\[
\psi_X(t) = E\left[e^{tX}\right] = e^{t\mu + \frac{1}{2}t^2\sigma^2}, \tag{1.1}
\]
and the characteristic function is
\[
\varphi_X(t) = E\left[e^{itX}\right] = e^{it\mu - \frac{1}{2}t^2\sigma^2}. \tag{1.2}
\]

3. $X \sim N(\mu,\sigma^2) \Rightarrow Y = aX + b \sim N(a\mu + b,\, a^2\sigma^2)$.

4. $X \sim N(\mu,\sigma^2) \Rightarrow Z = \frac{X-\mu}{\sigma} \sim N(0,1)$.

1.1.1 Notation for Vectors, Mean Vector, Covariance Matrix & Characteristic Functions

An $n \times 1$ random vector (or multivariate random variable) is denoted by
\[
\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \left(X_1, X_2, \ldots, X_n\right)^\top,
\]
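The scalar rules above lend themselves to a quick numerical sanity check. The following sketch (plain NumPy, with illustrative parameter values) verifies the affine rule $Y = aX + b \sim N(a\mu + b, a^2\sigma^2)$ by simulation:

```python
import numpy as np

# Sketch: empirical check of the affine rule
# X ~ N(mu, sigma^2)  =>  Y = aX + b ~ N(a*mu + b, a^2 * sigma^2).
rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0      # illustrative parameters
a, b = -1.5, 4.0

x = rng.normal(mu, sigma, size=1_000_000)
y = a * x + b

print(y.mean())   # should be close to a*mu + b = 1.0
print(y.std())    # should be close to |a|*sigma = 4.5
```

With $10^6$ samples the Monte Carlo error in both estimates is on the order of a few thousandths, so the agreement with the exact values is easy to see.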
where $\top$ denotes the vector transpose. A vector in $\mathbb{R}^n$ is designated by
\[
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \left(x_1, x_2, \ldots, x_n\right)^\top.
\]
We denote by $F_{\mathbf{X}}(\mathbf{x})$ the joint distribution function of $\mathbf{X}$, which means that
\[
F_{\mathbf{X}}(\mathbf{x}) = P(\mathbf{X} \leq \mathbf{x}) = P(X_1 \leq x_1, X_2 \leq x_2, \ldots, X_n \leq x_n).
\]
The following definitions are natural. We have the mean vector
\[
\mu_{\mathbf{X}} = E[\mathbf{X}] = \left(E[X_1], E[X_2], \ldots, E[X_n]\right)^\top,
\]
which is an $n \times 1$ column vector of the means (= expected values) of the components of $\mathbf{X}$. The covariance matrix is a square $n \times n$ matrix
\[
C_{\mathbf{X}} := E\left[(\mathbf{X} - \mu_{\mathbf{X}})(\mathbf{X} - \mu_{\mathbf{X}})^\top\right],
\]
where the entry at position $(i,j)$ is
\[
c_{i,j} \overset{\text{def}}{=} C_{\mathbf{X}}(i,j) = E\left[(X_i - \mu_i)(X_j - \mu_j)\right],
\]
that is, the covariance of $X_i$ and $X_j$. Every covariance matrix, now designated by $C$, is by construction symmetric and nonnegative definite, i.e.,
\[
C = C^\top \tag{1.3}
\]
and for all $\mathbf{x} \in \mathbb{R}^n$
\[
\mathbf{x}^\top C \mathbf{x} \geq 0. \tag{1.4}
\]
It is shown in linear algebra that nonnegative definiteness implies, in particular, $\det C \geq 0$. In terms of the entries $c_{i,j}$ of a covariance matrix $C = \left(c_{i,j}\right)_{i=1,j=1}^{n,n}$, there are the following necessary properties:

1. $c_{i,j} = c_{j,i}$ (symmetry).

2. $c_{i,i} = \mathrm{Var}(X_i) = \sigma_i^2 \geq 0$ (the elements on the main diagonal are the variances, and thus all elements on the main diagonal are nonnegative).

3. $c_{i,j}^2 \leq c_{i,i}\, c_{j,j}$.
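All three properties can be observed on a sample covariance matrix. The sketch below (NumPy; the $3 \times 3$ mixing matrix is an arbitrary illustrative choice) checks symmetry, nonnegative diagonal, the bound $c_{i,j}^2 \leq c_{i,i} c_{j,j}$, and nonnegative definiteness via the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
# Arbitrary lower triangular mixing matrix (illustrative choice).
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
X = rng.standard_normal((100_000, 3)) @ A.T   # rows are samples
C = np.cov(X, rowvar=False)                   # 3x3 sample covariance

assert np.allclose(C, C.T)                    # 1. symmetry
assert np.all(np.diag(C) >= 0)                # 2. variances on the diagonal
for i in range(3):                            # 3. c_ij^2 <= c_ii * c_jj
    for j in range(3):
        assert C[i, j] ** 2 <= C[i, i] * C[j, j] + 1e-12
assert np.all(np.linalg.eigvalsh(C) >= -1e-12)  # nonnegative definite
print("all covariance properties hold")
```

Property 3 is a Cauchy–Schwarz inequality; for a sample covariance matrix it holds exactly (up to floating point), since such a matrix is a Gram matrix.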
Example 1.1.1 The covariance matrix of a bivariate random variable $\mathbf{X} = (X_1, X_2)^\top$ is often written in the following form:
\[
C = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \tag{1.5}
\]
where $\sigma_1^2 = \mathrm{Var}(X_1)$, $\sigma_2^2 = \mathrm{Var}(X_2)$ and $\rho = \mathrm{Cov}(X_1, X_2)/(\sigma_1\sigma_2)$ is the coefficient of correlation of $X_1$ and $X_2$. $C$ is invertible (positive definite) if and only if $\rho^2 < 1$.

The rules for finding the mean vector and the covariance matrix of a transformed vector are simple.

Proposition 1.1.2 $\mathbf{X}$ is a random vector with mean vector $\mu_{\mathbf{X}}$ and covariance matrix $C_{\mathbf{X}}$. $B$ is an $m \times n$ matrix. If $\mathbf{Y} = B\mathbf{X} + \mathbf{b}$, then
\[
E[\mathbf{Y}] = B\mu_{\mathbf{X}} + \mathbf{b}, \tag{1.6}
\]
\[
C_{\mathbf{Y}} = B C_{\mathbf{X}} B^\top. \tag{1.7}
\]

Proof. For simplicity of writing, take $\mathbf{b} = 0$ and $\mu_{\mathbf{X}} = 0$. Then
\[
C_{\mathbf{Y}} = E\left[\mathbf{Y}\mathbf{Y}^\top\right] = E\left[(B\mathbf{X})(B\mathbf{X})^\top\right] = E\left[B\mathbf{X}\mathbf{X}^\top B^\top\right] = B\, E\left[\mathbf{X}\mathbf{X}^\top\right] B^\top = B C_{\mathbf{X}} B^\top.
\]

We have

Definition 1.1.1
\[
\varphi_{\mathbf{X}}(\mathbf{s}) \overset{\text{def}}{=} E\left[e^{i\mathbf{s}^\top \mathbf{X}}\right] = \int_{\mathbb{R}^n} e^{i\mathbf{s}^\top \mathbf{x}}\, dF_{\mathbf{X}}(\mathbf{x}) \tag{1.8}
\]
is the characteristic function of the random vector $\mathbf{X}$. In (1.8), $\mathbf{s}^\top \mathbf{x}$ is the scalar product in $\mathbb{R}^n$,
\[
\mathbf{s}^\top \mathbf{x} = \sum_{i=1}^{n} s_i x_i.
\]
As $F_{\mathbf{X}}$ is a joint distribution function on $\mathbb{R}^n$ and $\int_{\mathbb{R}^n}$ is a notation for a multiple integral over $\mathbb{R}^n$, we know that $\int_{\mathbb{R}^n} dF_{\mathbf{X}}(\mathbf{x}) = 1$, which means that $\varphi_{\mathbf{X}}(\mathbf{0}) = 1$, where $\mathbf{0}$ is an $n \times 1$ vector of zeros.
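The transformation rules of Proposition 1.1.2 can be checked numerically. The sketch below (NumPy; all matrices and vectors are illustrative choices) compares the exact mean and covariance rules with simulation-based estimates for $\mathbf{Y} = B\mathbf{X} + \mathbf{b}$ with $m = 3$, $n = 2$:

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative parameters for X ~ N(mu_X, C_X), n = 2.
mu_X = np.array([1.0, -1.0])
C_X = np.array([[2.0, 0.5],
                [0.5, 1.0]])
B = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])        # m x n with m = 3
b = np.array([0.5, 0.0, -2.0])

X = rng.multivariate_normal(mu_X, C_X, size=500_000)
Y = X @ B.T + b                    # Y = B X + b, applied row-wise

mean_exact = B @ mu_X + b          # mean rule
cov_exact = B @ C_X @ B.T          # covariance rule

print(np.max(np.abs(Y.mean(axis=0) - mean_exact)))          # small
print(np.max(np.abs(np.cov(Y, rowvar=False) - cov_exact)))  # small
```

Note that $C_{\mathbf{Y}}$ here is $3 \times 3$ but only has rank 2, inherited from the $n = 2$ source vector: the transformation rule makes no invertibility demands on $B$.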
1.1.2 Multivariate Normal Distribution

Definition 1.1.2 $\mathbf{X}$ has a multivariate normal distribution with mean vector $\mu$ and covariance matrix $C$, written as $\mathbf{X} \sim N(\mu, C)$, if and only if the characteristic function is given as
\[
\varphi_{\mathbf{X}}(\mathbf{s}) = e^{i\mathbf{s}^\top \mu - \frac{1}{2}\mathbf{s}^\top C \mathbf{s}}. \tag{1.9}
\]

Theorem 1.1.3 $\mathbf{X}$ has a multivariate normal distribution $N(\mu, C)$ if and only if
\[
\mathbf{a}^\top \mathbf{X} = \sum_{i=1}^{n} a_i X_i \tag{1.10}
\]
has a normal distribution for all vectors $\mathbf{a} = (a_1, a_2, \ldots, a_n)^\top$.

Additional properties are:

1. Theorem 1.1.4 If $\mathbf{Y} = B\mathbf{X} + \mathbf{b}$ and $\mathbf{X} \sim N(\mu, C)$, then $\mathbf{Y} \sim N(B\mu + \mathbf{b},\, BCB^\top)$.

2. Theorem 1.1.5 A Gaussian multivariate random variable has independent components if and only if the covariance matrix is diagonal.

3. Theorem 1.1.6 If $C$ is positive definite ($\det C > 0$), then it can be shown that there is a joint density of the form
\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\sqrt{\det C}}\, e^{-\frac{1}{2}(\mathbf{x}-\mu_{\mathbf{X}})^\top C^{-1} (\mathbf{x}-\mu_{\mathbf{X}})}. \tag{1.11}
\]

4. Theorem 1.1.7 $(X_1, X_2)^\top$ is a bivariate Gaussian random variable. The conditional distribution of $X_2$ given $X_1 = x_1$ is
\[
N\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1),\; \sigma_2^2(1-\rho^2)\right), \tag{1.12}
\]
where $\mu_2 = E[X_2]$, $\mu_1 = E[X_1]$, $\sigma_2^2 = \mathrm{Var}(X_2)$, $\sigma_1^2 = \mathrm{Var}(X_1)$ and $\rho = \mathrm{Cov}(X_1, X_2)/(\sigma_1\sigma_2)$. The proof is done by an explicit evaluation of (1.11) followed by an explicit evaluation of the pertinent conditional density.

Definition 1.1.3 $\mathbf{Z} \sim N(\mathbf{0}, I)$ is a standard Gaussian vector, where $I$ is the $n \times n$ identity matrix.
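The density of Theorem 1.1.6 can be evaluated directly, and Theorem 1.1.5 supplies a convenient consistency check: for a diagonal $C$ the joint density must factor into a product of univariate normal densities. A minimal NumPy sketch (parameter values are illustrative):

```python
import numpy as np

def mvn_pdf(x, mu, C):
    """Multivariate normal density f_X(x) for positive definite C."""
    n = len(mu)
    d = x - mu
    quad = d @ np.linalg.solve(C, d)                 # (x-mu)^T C^{-1} (x-mu)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(C))
    return np.exp(-0.5 * quad) / norm

def norm_pdf(x, mu, sigma):
    """Univariate normal density."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Diagonal C  =>  independent components  =>  density factorizes.
mu = np.array([1.0, -2.0])
C = np.diag([4.0, 0.25])          # sigma_1 = 2, sigma_2 = 0.5
x = np.array([0.3, -1.5])

lhs = mvn_pdf(x, mu, C)
rhs = norm_pdf(x[0], 1.0, 2.0) * norm_pdf(x[1], -2.0, 0.5)
assert abs(lhs - rhs) < 1e-12
print(lhs)
```

For larger dimensions one would compute the log-density via a Cholesky factor instead of `det` and `solve`, for numerical stability; the direct form above mirrors (1.11) for clarity.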
Let $\mathbf{X} \sim N(\mu_{\mathbf{X}}, C)$. Then, if $C$ is positive definite, we can factorize $C$ as
\[
C = A A^\top,
\]
for an $n \times n$ matrix $A$, where $A$ is lower triangular. Actually we can always decompose $C = L D L^\top$, where $L$ is a unique $n \times n$ lower triangular matrix and $D$ is diagonal with positive elements on the main diagonal, and we write $A = L\sqrt{D}$. Then $A$ is lower triangular, and
\[
\mathbf{Z} = A^{-1}(\mathbf{X} - \mu_{\mathbf{X}})
\]
is a standard Gaussian vector. In some applications, e.g. in time series analysis and signal processing, one refers to $A^{-1}$ as a whitening matrix. It can be shown that $A^{-1}$ is lower triangular, thus we have obtained $\mathbf{Z}$ by a causal operation, in the sense that $Z_i$ is a function of $X_1, \ldots, X_i$. $\mathbf{Z}$ is known as the innovations of $\mathbf{X}$. Conversely, one goes from the innovations to $\mathbf{X}$ through another causal operation, $\mathbf{X} = A\mathbf{Z} + \mathbf{b}$, and then $\mathbf{X} \sim N(\mathbf{b}, AA^\top)$.

Example 1.1.8 (Factorization of a $2 \times 2$ Covariance Matrix) Let
\[
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N(\mu, C).
\]
Let $Z_1$ and $Z_2$ be independent $N(0,1)$. We consider the lower triangular matrix
\[
B = \begin{pmatrix} \sigma_1 & 0 \\ \rho\sigma_2 & \sigma_2\sqrt{1-\rho^2} \end{pmatrix}, \tag{1.13}
\]
which clearly has an inverse as soon as $\rho \neq \pm 1$. Moreover, one verifies that $C = B B^\top$, when we write $C$ as in (1.5). Then we get
\[
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \mu + B \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}, \tag{1.14}
\]
where, of course,
\[
\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right).
\]
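The whitening construction corresponds to a Cholesky factorization. The sketch below (NumPy; the $2 \times 2$ covariance is an illustrative choice) factorizes $C = AA^\top$, forms the innovations $\mathbf{Z} = A^{-1}(\mathbf{X} - \mu_{\mathbf{X}})$ from simulated data, and checks that $\mathbf{Z}$ is approximately standard Gaussian:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, 2.0])
C = np.array([[4.0, 1.2],
              [1.2, 1.0]])

A = np.linalg.cholesky(C)              # lower triangular, C = A A^T
assert np.allclose(A @ A.T, C)

X = rng.multivariate_normal(mu, C, size=200_000)
Z = np.linalg.solve(A, (X - mu).T).T   # innovations Z = A^{-1}(X - mu)

# Z should be approximately standard Gaussian:
# zero mean vector and identity covariance matrix.
print(np.round(Z.mean(axis=0), 2))
print(np.round(np.cov(Z, rowvar=False), 2))
```

Because `A` is lower triangular, solving $A\mathbf{z} = \mathbf{x} - \mu$ by forward substitution makes $Z_i$ depend only on $X_1, \ldots, X_i$, which is exactly the causality property described above.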
1.2 Partitioned Covariance Matrices

Assume that $\mathbf{X}$, $n \times 1$, is partitioned as
\[
\mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix},
\]
where $\mathbf{X}_1$ is $p \times 1$ and $\mathbf{X}_2$ is $q \times 1$, $n = p + q$. Let the covariance matrix $C$ be partitioned in the sense that
\[
C = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \tag{1.15}
\]
where $\Sigma_{11}$ is $p \times p$, $\Sigma_{22}$ is $q \times q$, etc. The mean is partitioned correspondingly as
\[
\mu := \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}. \tag{1.16}
\]
Let $\mathbf{X} \sim N_n(\mu, C)$, where $N_n$ refers to a normal distribution in $n$ variables, and $C$ and $\mu$ are partitioned as in (1.15)-(1.16). Then the marginal distribution of $\mathbf{X}_2$ is $\mathbf{X}_2 \sim N_q(\mu_2, \Sigma_{22})$.

Let $\mathbf{X} \sim N_n(\mu, C)$, where $C$ and $\mu$ are partitioned as in (1.15)-(1.16). Assume that the inverse $\Sigma_{22}^{-1}$ exists. Then the conditional distribution of $\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{x}_2$ is normal, or
\[
\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \sim N_p\left(\mu_{1|2}, \Sigma_{1|2}\right), \tag{1.17}
\]
where
\[
\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \mu_2) \tag{1.18}
\]
and
\[
\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.
\]
By virtue of (1.17) and (1.18), the best estimator in the mean square sense and the best linear estimator in the mean square sense are one and the same random variable.

1.3 Gaussian Time Series

$\{X_t \mid t \in T\}$ is a Gaussian time series if all joint distributions are multivariate Gaussian. In other words, for any $t_1, \ldots, t_n$ and integer $n$, the vector
\[
\left(X_{t_1}, X_{t_2}, \ldots, X_{t_n}\right)^\top \sim N(\mu_{\mathbf{t}}, \Sigma_{\mathbf{t}}).
\]
Here
\[
\mu_{\mathbf{t}} = \left(E[X_{t_1}], E[X_{t_2}], \ldots, E[X_{t_n}]\right)^\top
\]
is the mean vector, with components obtained from the mean function of the process $\{X_t \mid t \in T\}$. The matrix
\[
\Sigma_{\mathbf{t}} = \left\{\gamma_X(t_i, t_j)\right\}_{i=1,j=1}^{n,n}
\]
has as its entries the values of the ACVF of $\{X_t \mid t \in T\}$.
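The conditioning formulas translate directly into code. The sketch below (NumPy; parameter values are illustrative) computes $\mu_{1|2}$ and $\Sigma_{1|2}$ for a general partition and, as a cross-check, specializes to $p = q = 1$, where the result must agree with the bivariate formula of Theorem 1.1.7:

```python
import numpy as np

def conditional_gaussian(mu, C, p, x2):
    """Parameters of X1 | X2 = x2 for X ~ N(mu, C), with X1 of dimension p."""
    S11, S12 = C[:p, :p], C[:p, p:]
    S21, S22 = C[p:, :p], C[p:, p:]
    mu1, mu2 = mu[:p], mu[p:]
    mu_cond = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)   # conditional mean
    C_cond = S11 - S12 @ np.linalg.solve(S22, S21)         # Schur complement
    return mu_cond, C_cond

# Cross-check against the bivariate case (p = q = 1, illustrative values):
# X1 | X2 = x2 ~ N(mu1 + rho*s1/s2*(x2 - mu2), s1^2*(1 - rho^2)).
s1, s2, rho = 2.0, 3.0, 0.6
mu = np.array([1.0, -2.0])
C = np.array([[s1**2,     rho*s1*s2],
              [rho*s1*s2, s2**2    ]])
x2 = np.array([0.5])

mu_c, C_c = conditional_gaussian(mu, C, 1, x2)
assert np.isclose(mu_c[0], mu[0] + rho * s1 / s2 * (x2[0] - mu[1]))
assert np.isclose(C_c[0, 0], s1**2 * (1 - rho**2))
print(mu_c, C_c)
```

Using `np.linalg.solve` rather than forming $\Sigma_{22}^{-1}$ explicitly is the standard numerically preferable way to evaluate (1.18) and the Schur complement.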
1.4 Appendix: Symmetric Matrices & Orthogonal Diagonalization & Gaussian Vectors

We quote some results from [1] or any textbook in linear algebra. An $n \times n$ matrix $A$ is orthogonally diagonalizable if there is an orthogonal matrix $P$ (i.e., $P^\top P = P P^\top = I$) such that
\[
P^\top A P = \Lambda,
\]
where $\Lambda$ is a diagonal matrix. Then we have

Theorem 1.4.1 If $A$ is an $n \times n$ matrix, then the following are equivalent:

(i) $A$ is orthogonally diagonalizable.

(ii) $A$ has an orthonormal set of eigenvectors.

(iii) $A$ is symmetric.

Since covariance matrices are symmetric, we have by the theorem above that all covariance matrices are orthogonally diagonalizable.

Theorem 1.4.2 If $A$ is a symmetric matrix, then

(i) the eigenvalues of $A$ are all real numbers;

(ii) eigenvectors from different eigenspaces are orthogonal.

That is, all eigenvalues of a covariance matrix are real. Hence we have for any covariance matrix the spectral decomposition
\[
C = \sum_{i=1}^{n} \lambda_i \mathbf{e}_i \mathbf{e}_i^\top, \tag{1.19}
\]
where $C\mathbf{e}_i = \lambda_i \mathbf{e}_i$. Since $C$ is nonnegative definite and its eigenvectors are orthonormal,
\[
0 \leq \mathbf{e}_i^\top C \mathbf{e}_i = \lambda_i \mathbf{e}_i^\top \mathbf{e}_i = \lambda_i,
\]
and thus the eigenvalues of a covariance matrix are nonnegative.

Let now $P$ be an orthogonal matrix such that $P^\top C_{\mathbf{X}} P = \Lambda$, and $\mathbf{X} \sim N(\mathbf{0}, C_{\mathbf{X}})$, i.e., $C_{\mathbf{X}}$ is a covariance matrix and $\Lambda$ is diagonal with the eigenvalues of $C_{\mathbf{X}}$ on the main diagonal. Then if $\mathbf{Y} = P^\top \mathbf{X}$, we have by Theorem 1.1.4 that $\mathbf{Y} \sim N(\mathbf{0}, \Lambda)$.
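The spectral decomposition (1.19) and the decorrelating transform $\mathbf{Y} = P^\top \mathbf{X}$ can be sketched as follows (NumPy; the covariance matrix is an illustrative choice):

```python
import numpy as np

C = np.array([[2.0, 0.8],
              [0.8, 1.0]])

# eigh handles symmetric matrices: real eigenvalues (ascending),
# orthonormal eigenvectors in the columns of P.
lam, P = np.linalg.eigh(C)
assert np.all(lam >= 0)                       # covariance eigenvalues nonnegative
assert np.allclose(P.T @ P, np.eye(2))        # P is orthogonal

# Spectral decomposition: C = sum_i lam_i e_i e_i^T
C_rebuilt = sum(lam[i] * np.outer(P[:, i], P[:, i]) for i in range(2))
assert np.allclose(C_rebuilt, C)

# Y = P^T X has covariance P^T C P = Lambda, diagonal,
# hence independent components by Theorem 1.1.5.
assert np.allclose(P.T @ C @ P, np.diag(lam))
print(np.round(lam, 4))
```

This is precisely the computation underlying principal component analysis: the columns of $P$ are the principal directions and the $\lambda_i$ the variances along them.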
In other words, $\mathbf{Y}$ is a Gaussian vector and has by Theorem 1.1.5 independent components. This method of producing independent Gaussians has several important applications, one of which is principal component analysis. In addition, the operation is invertible, as
\[
\mathbf{X} = P\mathbf{Y}
\]
recreates $\mathbf{X} \sim N(\mathbf{0}, C_{\mathbf{X}})$ from $\mathbf{Y}$.

1.5 Appendix: Proof of (1.12)

Let $\mathbf{X} = (X_1, X_2)^\top \sim N(\mu_{\mathbf{X}}, C)$, with $\mu_{\mathbf{X}} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and $C$ as in (1.5) with $\rho^2 < 1$. The inverse of $C$ in (1.5) is
\[
C^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}.
\]
Then we get by straightforward evaluation in (1.11)
\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{2\pi\sqrt{\det C}}\, e^{-\frac{1}{2}(\mathbf{x}-\mu_{\mathbf{X}})^\top C^{-1}(\mathbf{x}-\mu_{\mathbf{X}})} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}Q(x_1,x_2)}, \tag{1.20}
\]
where
\[
Q(x_1,x_2) = \frac{1}{1-\rho^2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\,\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right].
\]
Now we claim that
\[
f_{X_2 \mid X_1 = x_1}(x_2) = \frac{1}{\tilde{\sigma}_2\sqrt{2\pi}}\, e^{-\frac{1}{2\tilde{\sigma}_2^2}\left(x_2 - \tilde{\mu}_2(x_1)\right)^2},
\]
a density of a Gaussian random variable $X_2 \mid X_1 = x_1$ with the conditional expectation $\tilde{\mu}_2(x_1)$ and the conditional variance $\tilde{\sigma}_2^2$,
\[
\tilde{\mu}_2(x_1) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1), \qquad \tilde{\sigma}_2^2 = \sigma_2^2(1-\rho^2).
\]
To prove these assertions about $f_{X_2 \mid X_1 = x_1}(x_2)$ we set
\[
f_{X_1}(x_1) = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma_1^2}(x_1-\mu_1)^2}, \tag{1.21}
\]
and compute the ratio $\frac{f_{X_1,X_2}(x_1,x_2)}{f_{X_1}(x_1)}$. We get from the above, by (1.20) and (1.21), that
\[
\frac{f_{X_1,X_2}(x_1,x_2)}{f_{X_1}(x_1)} = \frac{\sigma_1\sqrt{2\pi}}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}Q(x_1,x_2) + \frac{1}{2\sigma_1^2}(x_1-\mu_1)^2},
\]
which we organize, for clarity, by introducing the auxiliary function $H(x_1,x_2)$ by
\[
-\frac{1}{2}H(x_1,x_2) \overset{\text{def}}{=} -\frac{1}{2}Q(x_1,x_2) + \frac{1}{2\sigma_1^2}(x_1-\mu_1)^2.
\]
Here we have
\[
H(x_1,x_2) = \frac{1}{1-\rho^2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\,\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right] - \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2
\]
\[
= \frac{1}{1-\rho^2}\left[\rho^2\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\,\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right].
\]
Evidently we have now shown
\[
H(x_1,x_2) = \frac{\left(x_2 - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)\right)^2}{\sigma_2^2(1-\rho^2)}.
\]
Hence we have found that
\[
\frac{f_{X_1,X_2}(x_1,x_2)}{f_{X_1}(x_1)} = \frac{1}{\sigma_2\sqrt{1-\rho^2}\sqrt{2\pi}}\, e^{-\frac{\left(x_2 - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)\right)^2}{2\sigma_2^2(1-\rho^2)}}.
\]
This establishes the properties of bivariate normal random variables claimed in (1.12) above.
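The completion-of-square step above can also be verified numerically: for arbitrary points, $Q(x_1,x_2) - \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2$ should equal $\left(x_2 - \tilde{\mu}_2(x_1)\right)^2 / \left(\sigma_2^2(1-\rho^2)\right)$. A short sketch with illustrative parameter values:

```python
import numpy as np

# Illustrative bivariate normal parameters.
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 3.0, 0.6

def Q(x1, x2):
    """Quadratic form in the exponent of the bivariate normal density."""
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    return (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho * rho)

rng = np.random.default_rng(4)
for x1, x2 in rng.normal(size=(100, 2)):
    cond_mean = mu2 + rho * s2 / s1 * (x1 - mu1)   # mu~_2(x1)
    cond_var = s2 ** 2 * (1 - rho ** 2)            # sigma~_2^2
    lhs = Q(x1, x2) - ((x1 - mu1) / s1) ** 2       # H(x1, x2)
    rhs = (x2 - cond_mean) ** 2 / cond_var
    assert abs(lhs - rhs) < 1e-10
print("H(x1, x2) identity verified at 100 random points")
```

Agreement at arbitrary points confirms the algebraic identity for $H(x_1,x_2)$, and hence the conditional mean and variance claimed in (1.12).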
Bibliography

[1] H. Anton & C. Rorres: Elementary Linear Algebra with Supplemental Applications. John Wiley & Sons (Asia) Pte Ltd, 2011.