Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013


Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943, Spring 2013. Timo Koski, Department of Mathematics, KTH Royal Institute of Technology, Stockholm


Chapter 1

Gaussian Vectors

1.1 Multivariate Gaussian Distribution

Let us recall the following: X is a normal random variable if

$$ f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, $$

where $\mu$ is real and $\sigma > 0$. Notation: $X \in N(\mu,\sigma^2)$.

Properties:

- $X \in N(\mu,\sigma^2) \Rightarrow E[X] = \mu$, $\mathrm{Var}(X) = \sigma^2$.
- If $X \in N(\mu,\sigma^2)$, then the moment generating function is
  $$ \psi_X(t) = E\left[e^{tX}\right] = e^{t\mu + \frac{1}{2}t^2\sigma^2}, \tag{1.1} $$
  and the characteristic function is
  $$ \varphi_X(t) = E\left[e^{itX}\right] = e^{it\mu - \frac{1}{2}t^2\sigma^2}. \tag{1.2} $$
- $X \in N(\mu,\sigma^2) \Rightarrow Y = aX + b \in N(a\mu + b,\, a^2\sigma^2)$.
- $X \in N(\mu,\sigma^2) \Rightarrow Z = \frac{X-\mu}{\sigma} \in N(0,1)$.

1.1.1 Notation for Vectors, Mean Vector, Covariance Matrix & Characteristic Functions

An $n \times 1$ random vector, or a multivariate random variable, is denoted by

$$ \mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = (X_1, X_2, \ldots, X_n)', $$
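As a quick numerical illustration of the moment generating function (1.1) (this sketch is not part of the original notes, and the values of $\mu$, $\sigma$ and $t$ below are arbitrary), one can estimate $E[e^{tX}]$ by simulation with NumPy and compare it with the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, t = 1.0, 2.0, 0.3

# Monte Carlo estimate of the moment generating function E[exp(t*X)]
x = rng.normal(mu, sigma, size=1_000_000)
mgf_mc = np.exp(t * x).mean()

# Closed form from (1.1): psi_X(t) = exp(t*mu + 0.5*t^2*sigma^2)
mgf_exact = np.exp(t * mu + 0.5 * t**2 * sigma**2)

print(mgf_mc, mgf_exact)   # the two values should agree to a few decimals
```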

where $'$ denotes the vector transpose. A vector in $\mathbf{R}^n$ is designated by

$$ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = (x_1, x_2, \ldots, x_n)'. $$

We denote by $F_{\mathbf{X}}(\mathbf{x})$ the joint distribution function of $\mathbf{X}$, which means that

$$ F_{\mathbf{X}}(\mathbf{x}) = P(\mathbf{X} \leq \mathbf{x}) = P(X_1 \leq x_1, X_2 \leq x_2, \ldots, X_n \leq x_n). $$

The following definitions are natural. We have the mean vector

$$ \mu_{\mathbf{X}} = E[\mathbf{X}] = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{pmatrix}, $$

which is an $n \times 1$ column vector of means (= expected values) of the components of $\mathbf{X}$. The covariance matrix is a square $(n \times n)$-matrix

$$ C_{\mathbf{X}} := E\left[ (\mathbf{X} - \mu_{\mathbf{X}})(\mathbf{X} - \mu_{\mathbf{X}})' \right], $$

where the entry at position $(i,j)$ is

$$ c_{i,j} \overset{\mathrm{def}}{=} C_{\mathbf{X}}(i,j) = E[(X_i - \mu_i)(X_j - \mu_j)], $$

that is, the covariance of $X_i$ and $X_j$. Every covariance matrix, now designated by $C$, is by construction symmetric and nonnegative definite, i.e., for all $\mathbf{x} \in \mathbf{R}^n$

$$ C = C', \tag{1.3} $$
$$ \mathbf{x}' C \mathbf{x} \geq 0. \tag{1.4} $$

It is shown in linear algebra that nonnegative definiteness of a symmetric matrix implies, in particular, that $\det C \geq 0$. In terms of the entries $c_{i,j}$ of a covariance matrix $C = (c_{i,j})_{i=1,j=1}^{n,n}$, there are the following necessary properties.

1. $c_{i,j} = c_{j,i}$ (symmetry).
2. $c_{i,i} = \mathrm{Var}(X_i) = \sigma_i^2 \geq 0$ (the elements on the main diagonal are the variances, and thus all elements on the main diagonal are nonnegative).
3. $c_{i,j}^2 \leq c_{i,i}\, c_{j,j}$.
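These three properties are easy to check numerically. The following sketch (added for illustration, not from the original text; the data-generating matrix is an arbitrary choice) forms a sample covariance matrix with NumPy and verifies symmetry, the nonnegative diagonal, the bound $c_{i,j}^2 \leq c_{i,i}c_{j,j}$, and nonnegative definiteness via the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
# 3-dimensional data with an arbitrary correlation structure
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
data = rng.normal(size=(10_000, 3)) @ L.T
C = np.cov(data, rowvar=False)                  # sample covariance matrix

print(np.allclose(C, C.T))                      # property 1: symmetry
print(np.all(np.diag(C) >= 0))                  # property 2: nonnegative variances
i, j = 0, 2
print(C[i, j]**2 <= C[i, i] * C[j, j])          # property 3: c_ij^2 <= c_ii c_jj
print(np.all(np.linalg.eigvalsh(C) >= -1e-12))  # nonnegative definiteness (1.4)
```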

Example 1.1.1. The covariance matrix of a bivariate random variable $\mathbf{X} = (X_1, X_2)'$ is often written in the following form

$$ C = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \tag{1.5} $$

where $\sigma_1^2 = \mathrm{Var}(X_1)$, $\sigma_2^2 = \mathrm{Var}(X_2)$ and $\rho = \mathrm{Cov}(X_1,X_2)/(\sigma_1\sigma_2)$ is the coefficient of correlation of $X_1$ and $X_2$. $C$ is invertible (positive definite) if and only if $\rho^2 \neq 1$.

The rules for finding the mean vector and the covariance matrix of a transformed vector are simple.

Proposition 1.1.2. $\mathbf{X}$ is a random vector with mean vector $\mu_{\mathbf{X}}$ and covariance matrix $C_{\mathbf{X}}$. $B$ is an $m \times n$ matrix. If $\mathbf{Y} = B\mathbf{X} + \mathbf{b}$, then

$$ E[\mathbf{Y}] = B\mu_{\mathbf{X}} + \mathbf{b}, \tag{1.6} $$
$$ C_{\mathbf{Y}} = B C_{\mathbf{X}} B'. \tag{1.7} $$

Proof: For simplicity of writing, take $\mathbf{b} = \mu = \mathbf{0}$. Then

$$ C_{\mathbf{Y}} = E[\mathbf{Y}\mathbf{Y}'] = E[(B\mathbf{X})(B\mathbf{X})'] = E[B\mathbf{X}\mathbf{X}'B'] = B\, E\left[\mathbf{X}\mathbf{X}'\right] B' = B C_{\mathbf{X}} B'. $$

We have

Definition 1.1.1.

$$ \phi_{\mathbf{X}}(\mathbf{s}) \overset{\mathrm{def}}{=} E\left[ e^{i\mathbf{s}'\mathbf{X}} \right] = \int_{\mathbf{R}^n} e^{i\mathbf{s}'\mathbf{x}}\, dF_{\mathbf{X}}(\mathbf{x}) \tag{1.8} $$

is the characteristic function of the random vector $\mathbf{X}$.

In (1.8) $\mathbf{s}'\mathbf{x}$ is a scalar product in $\mathbf{R}^n$,

$$ \mathbf{s}'\mathbf{x} = \sum_{i=1}^{n} s_i x_i. $$

As $F_{\mathbf{X}}$ is a joint distribution function on $\mathbf{R}^n$ and $\int_{\mathbf{R}^n}$ is a notation for a multiple integral over $\mathbf{R}^n$, we know that $\int_{\mathbf{R}^n} dF_{\mathbf{X}}(\mathbf{x}) = 1$, which means that $\phi_{\mathbf{X}}(\mathbf{0}) = 1$, where $\mathbf{0}$ is an $n \times 1$ vector of zeros.
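A minimal simulation sketch of Proposition 1.1.2 (added here for illustration; the choices of $\mu_{\mathbf{X}}$, $C_{\mathbf{X}}$, $B$ and $\mathbf{b}$ are arbitrary) compares the empirical mean and covariance of $\mathbf{Y} = B\mathbf{X} + \mathbf{b}$ with (1.6) and (1.7):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_X = np.array([1.0, -2.0, 0.5])
C_X = np.array([[2.0, 0.3, 0.1],
                [0.3, 1.0, 0.2],
                [0.1, 0.2, 0.5]])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])          # a 2x3 matrix
b = np.array([0.5, -0.5])

X = rng.multivariate_normal(mu_X, C_X, size=200_000)
Y = X @ B.T + b                           # Y = BX + b applied row-wise

print(Y.mean(axis=0), B @ mu_X + b)             # (1.6): E[Y] = B mu_X + b
print(np.cov(Y, rowvar=False), B @ C_X @ B.T)   # (1.7): C_Y = B C_X B'
```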

1.1.2 Multivariate Normal Distribution

Definition 1.1.2. $\mathbf{X}$ has a multivariate normal distribution with mean vector $\mu$ and covariance matrix $C$, written as $\mathbf{X} \in N(\mu, C)$, if and only if the characteristic function is given as

$$ \phi_{\mathbf{X}}(\mathbf{s}) = e^{i\mathbf{s}'\mu - \frac{1}{2}\mathbf{s}'C\mathbf{s}}. \tag{1.9} $$

Theorem 1.1.3. $\mathbf{X}$ has a multivariate normal distribution $N(\mu, C)$ if and only if

$$ \mathbf{a}'\mathbf{X} = \sum_{i=1}^{n} a_i X_i \tag{1.10} $$

has a normal distribution for all vectors $\mathbf{a} = (a_1, a_2, \ldots, a_n)'$.

Additional properties are:

1. Theorem 1.1.4. If $\mathbf{Y} = B\mathbf{X} + \mathbf{b}$ and $\mathbf{X} \in N(\mu, C)$, then $\mathbf{Y} \in N(B\mu + \mathbf{b},\, BCB')$.

2. Theorem 1.1.5. A Gaussian multivariate random variable has independent components if and only if the covariance matrix is diagonal.

3. Theorem 1.1.6. If $C$ is positive definite ($\det C > 0$), then it can be shown that there is a simultaneous density of the form

   $$ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\sqrt{\det C}}\, e^{-\frac{1}{2}(\mathbf{x}-\mu_{\mathbf{X}})'C^{-1}(\mathbf{x}-\mu_{\mathbf{X}})}. \tag{1.11} $$

4. Theorem 1.1.7. $(X_1, X_2)$ is a bivariate Gaussian random variable. The conditional distribution for $X_2$ given $X_1 = x_1$ is

   $$ N\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1),\; \sigma_2^2(1-\rho^2)\right), \tag{1.12} $$

   where $\mu_2 = E[X_2]$, $\mu_1 = E[X_1]$, $\sigma_2^2 = \mathrm{Var}(X_2)$, $\sigma_1^2 = \mathrm{Var}(X_1)$ and $\rho = \mathrm{Cov}(X_1,X_2)/(\sigma_1\sigma_2)$.

   The proof is done by an explicit evaluation of (1.11) followed by an explicit evaluation of the pertinent conditional density.

Definition 1.1.3. $\mathbf{Z} \in N(\mathbf{0}, I)$ is a standard Gaussian vector, where $I$ is the $n \times n$ identity matrix.
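The density (1.11) can be checked against a standard implementation. The sketch below is an illustration added to these notes (it assumes SciPy is available; the mean vector, covariance matrix and evaluation point are arbitrary): it evaluates (1.11) explicitly and compares the result with scipy.stats.multivariate_normal.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])
x = np.array([0.5, 0.3])

# Density (1.11) written out explicitly
n = len(mu)
d = x - mu
f_explicit = np.exp(-0.5 * d @ np.linalg.inv(C) @ d) / (
    (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(C)))

# The same value from SciPy's implementation
f_scipy = multivariate_normal(mean=mu, cov=C).pdf(x)

print(f_explicit, f_scipy)   # both numbers should coincide
```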

Let $\mathbf{X} \in N(\mu_{\mathbf{X}}, C)$. Then, if $C$ is positive definite, we can factorize $C$ as

$$ C = AA', $$

for an $n \times n$ matrix $A$, where $A$ is lower triangular. Actually, we can always decompose $C = LDL'$, where $L$ is a unique $n \times n$ lower triangular matrix and $D$ is diagonal with positive elements on the main diagonal, and we write $A = L\sqrt{D}$. Then $A$ is lower triangular. Then

$$ \mathbf{Z} = A^{-1}(\mathbf{X} - \mu_{\mathbf{X}}) $$

is a standard Gaussian vector. In some applications, like, e.g., in time series analysis and signal processing, one refers to $A^{-1}$ as a whitening matrix. It can be shown that $A^{-1}$ is lower triangular; thus we have obtained $\mathbf{Z}$ by a causal operation, in the sense that $Z_i$ is a function of $X_1, \ldots, X_i$. $\mathbf{Z}$ is known as the innovations of $\mathbf{X}$.

Conversely, one goes from the innovations to $\mathbf{X}$ through another causal operation by $\mathbf{X} = A\mathbf{Z} + \mathbf{b}$, and then $\mathbf{X} \in N(\mathbf{b}, AA')$.

Example 1.1.8 (Factorization of a $2 \times 2$ Covariance Matrix). Let $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \in N(\mu, C)$. Let $Z_1$ and $Z_2$ be independent $N(0,1)$. We consider the lower triangular matrix

$$ B = \begin{pmatrix} \sigma_1 & 0 \\ \rho\sigma_2 & \sigma_2\sqrt{1-\rho^2} \end{pmatrix}, \tag{1.13} $$

which clearly has an inverse, as soon as $\rho \neq \pm 1$. Moreover, one verifies that $C = BB'$, when we write $C$ as in (1.5). Then we get

$$ \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \mu + B\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}, \tag{1.14} $$

where, of course,

$$ \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \in N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right). $$
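A minimal numerical sketch of the whitening construction (added for illustration; the covariance matrix below is an arbitrary positive definite example). NumPy's np.linalg.cholesky returns the lower triangular factor $A$ with $C = AA'$, so $\mathbf{Z} = A^{-1}(\mathbf{X}-\mu_{\mathbf{X}})$ should have approximately the identity covariance, and $A\mathbf{Z} + \mu_{\mathbf{X}}$ recreates the original covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0, 2.0])
C = np.array([[4.0, 1.2, 0.6],
              [1.2, 2.0, 0.5],
              [0.6, 0.5, 1.0]])        # positive definite covariance

A = np.linalg.cholesky(C)              # lower triangular, C = A A'
X = rng.multivariate_normal(mu, C, size=100_000)

# Whitening / innovations: Z = A^{-1}(X - mu)
Z = np.linalg.solve(A, (X - mu).T).T
print(np.cov(Z, rowvar=False))         # approximately the identity matrix

# The inverse (causal) operation recreates a vector with the original covariance
X_back = Z @ A.T + mu
print(np.cov(X_back, rowvar=False))    # approximately C again
```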

1.2 Partitioned Covariance Matrices

Assume that $\mathbf{X}$, $n \times 1$, is partitioned as $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2)'$, where $\mathbf{X}_1$ is $p \times 1$ and $\mathbf{X}_2$ is $q \times 1$, $n = p + q$. Let the covariance matrix $C$ be partitioned in the sense that

$$ C = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \tag{1.15} $$

where $\Sigma_{11}$ is $p \times p$, $\Sigma_{22}$ is $q \times q$, etc. The mean is partitioned correspondingly as

$$ \mu := \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}. \tag{1.16} $$

Let $\mathbf{X} \in N_n(\mu, C)$, where $N_n$ refers to a normal distribution in $n$ variables, and $C$ and $\mu$ are partitioned as in (1.15)-(1.16). Then the marginal distribution of $\mathbf{X}_2$ is $\mathbf{X}_2 \in N_q(\mu_2, \Sigma_{22})$, if $\Sigma_{22}$ is invertible.

Let $\mathbf{X} \in N_n(\mu, C)$, where $C$ and $\mu$ are partitioned as in (1.15)-(1.16). Assume that the inverse $\Sigma_{22}^{-1}$ exists. Then the conditional distribution of $\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{x}_2$ is normal, or,

$$ \mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \in N_p(\mu_{1|2}, \Sigma_{1|2}), \tag{1.17} $$

where

$$ \mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \mu_2) \tag{1.18} $$

and

$$ \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}. $$

By virtue of (1.17) and (1.18) the best estimator in the mean square sense and the best linear estimator in the mean square sense are one and the same random variable.

1.3 Gaussian Time Series

$\{X_t \mid t \in T\}$ is a Gaussian time series, if all joint distributions are multivariate Gaussian. In other words, for any $t_1, \ldots, t_n$ and integer $n$ the vector

$$ (X_{t_1}, X_{t_2}, \ldots, X_{t_n})' \in N(\mu_{\mathbf{t}}, \Sigma_{\mathbf{t}}). $$

Here

$$ \mu_{\mathbf{t}} = (E[X_{t_1}], E[X_{t_2}], \ldots, E[X_{t_n}])' $$

is the mean vector with components obtained from the mean function of the process $\{X_t \mid t \in T\}$. The matrix $\Sigma_{\mathbf{t}} = \{\gamma_X(t_i, t_j)\}_{i=1,j=1}^{n,n}$ has as its entries the values of the ACVF of $\{X_t \mid t \in T\}$.
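Returning to the partitioned-covariance formulas (1.17)-(1.18), the following sketch (an illustration added to these notes; the partition, the numbers in $\mu$ and $C$, and the observed $\mathbf{x}_2$ are all arbitrary) computes the conditional mean and covariance of $\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{x}_2$ with NumPy:

```python
import numpy as np

mu = np.array([0.0, 1.0, -1.0, 2.0])   # first p=2 entries are mu_1, last q=2 are mu_2
C = np.array([[2.0, 0.4, 0.5, 0.1],
              [0.4, 1.5, 0.2, 0.3],
              [0.5, 0.2, 1.0, 0.2],
              [0.1, 0.3, 0.2, 1.2]])
p = 2
S11, S12 = C[:p, :p], C[:p, p:]
S21, S22 = C[p:, :p], C[p:, p:]
mu1, mu2 = mu[:p], mu[p:]

x2 = np.array([0.5, 1.5])              # an observed value of X_2

# (1.17)-(1.18): parameters of the conditional distribution of X_1 | X_2 = x2
mu_cond = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S21)
print(mu_cond)
print(Sigma_cond)
```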

1.4 Appendix: Symmetric Matrices & Orthogonal Diagonalization & Gaussian Vectors

We quote some results from [1] or any textbook in linear algebra.

An $n \times n$ matrix $A$ is orthogonally diagonalizable, if there is an orthogonal matrix $P$ (i.e., $P'P = PP' = I$) such that

$$ P'AP = \Lambda, $$

where $\Lambda$ is a diagonal matrix. Then we have

Theorem 1.4.1. If $A$ is an $n \times n$ matrix, then the following are equivalent:

(i) $A$ is orthogonally diagonalizable.
(ii) $A$ has an orthonormal set of eigenvectors.
(iii) $A$ is symmetric.

Since covariance matrices are symmetric, we have by the theorem above that all covariance matrices are orthogonally diagonalizable.

Theorem 1.4.2. If $A$ is a symmetric matrix, then

(i) the eigenvalues of $A$ are all real numbers;
(ii) eigenvectors from different eigenspaces are orthogonal.

That is, all eigenvalues of a covariance matrix are real. Hence we have for any covariance matrix the spectral decomposition

$$ C = \sum_{i=1}^{n} \lambda_i \mathbf{e}_i \mathbf{e}_i', \tag{1.19} $$

where $C\mathbf{e}_i = \lambda_i \mathbf{e}_i$. Since $C$ is nonnegative definite, and its eigenvectors are orthonormal,

$$ 0 \leq \mathbf{e}_i' C \mathbf{e}_i = \lambda_i \mathbf{e}_i' \mathbf{e}_i = \lambda_i, $$

and thus the eigenvalues of a covariance matrix are nonnegative.

Let now $P$ be an orthogonal matrix such that $P' C_{\mathbf{X}} P = \Lambda$, and $\mathbf{X} \in N(\mathbf{0}, C_{\mathbf{X}})$, i.e., $C_{\mathbf{X}}$ is a covariance matrix and $\Lambda$ is diagonal with the eigenvalues of $C_{\mathbf{X}}$ on the main diagonal. Then if $\mathbf{Y} = P'\mathbf{X}$, we have by Theorem 1.1.4 that $\mathbf{Y} \in N(\mathbf{0}, \Lambda)$.
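The spectral decomposition (1.19) and the decorrelation $\mathbf{Y} = P'\mathbf{X}$ can be illustrated as follows (a sketch added to these notes, with an arbitrary covariance matrix; np.linalg.eigh returns the eigenvalues and an orthogonal matrix of eigenvectors of a symmetric matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
C = np.array([[3.0, 1.0, 0.5],
              [1.0, 2.0, 0.3],
              [0.5, 0.3, 1.0]])

# Spectral decomposition (1.19): C = sum_i lambda_i e_i e_i'
lam, P = np.linalg.eigh(C)             # columns of P are orthonormal eigenvectors
C_rebuilt = sum(lam[i] * np.outer(P[:, i], P[:, i]) for i in range(len(lam)))
print(np.allclose(C, C_rebuilt))       # True
print(np.all(lam >= 0))                # eigenvalues of a covariance matrix are nonnegative

# Decorrelation: if X ~ N(0, C), then Y = P'X has covariance Lambda (diagonal)
X = rng.multivariate_normal(np.zeros(3), C, size=100_000)
Y = X @ P                              # row-wise version of Y = P'X
print(np.cov(Y, rowvar=False))         # approximately diag(lam)
```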

In other words, $\mathbf{Y}$ is a Gaussian vector and has by Theorem 1.1.5 independent components. This method of producing independent Gaussians has several important applications. One of these is principal component analysis. In addition, the operation is invertible, as

$$ \mathbf{X} = P\mathbf{Y} $$

recreates $\mathbf{X} \in N(\mathbf{0}, C_{\mathbf{X}})$ from $\mathbf{Y}$.

1.5 Appendix: Proof of (1.12)

Let $\mathbf{X} = (X_1, X_2)' \in N(\mu_{\mathbf{X}}, C)$, with $\mu_{\mathbf{X}} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and $C$ as in (1.5) with $\rho^2 \neq 1$. The inverse of $C$ in (1.5) is

$$ C^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}. $$

Then we get by straightforward evaluation in (1.11)

$$ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{2\pi\sqrt{\det C}}\, e^{-\frac{1}{2}(\mathbf{x}-\mu_{\mathbf{X}})'C^{-1}(\mathbf{x}-\mu_{\mathbf{X}})} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}Q(x_1,x_2)}, \tag{1.20} $$

where

$$ Q(x_1,x_2) = \frac{1}{1-\rho^2}\left[ \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 \right]. $$

Now we claim that

$$ f_{X_2 \mid X_1 = x_1}(x_2) = \frac{1}{\sigma_{2|1}\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma_{2|1}^2}\left(x_2 - \mu_{2|1}(x_1)\right)^2}, $$

a density of a Gaussian random variable $X_2 \mid X_1 = x_1$ with the conditional expectation $\mu_{2|1}(x_1)$ and the conditional variance $\sigma_{2|1}^2$,

$$ \mu_{2|1}(x_1) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1), \qquad \sigma_{2|1}^2 = \sigma_2^2(1-\rho^2). $$

To prove these assertions about $f_{X_2 \mid X_1 = x_1}(x_2)$ we set

$$ f_{X_1}(x_1) = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma_1^2}(x_1-\mu_1)^2}, \tag{1.21} $$

and compute the ratio $\frac{f_{X_1,X_2}(x_1,x_2)}{f_{X_1}(x_1)}$. We get from the above by (1.20) and (1.21) that

$$ \frac{f_{X_1,X_2}(x_1,x_2)}{f_{X_1}(x_1)} = \frac{\sigma_1\sqrt{2\pi}}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}Q(x_1,x_2) + \frac{1}{2\sigma_1^2}(x_1-\mu_1)^2}, $$

which we organize, for clarity, by introducing the auxiliary function $H(x_1,x_2)$ by

$$ H(x_1,x_2) \overset{\mathrm{def}}{=} -\frac{1}{2}Q(x_1,x_2) + \frac{1}{2\sigma_1^2}(x_1-\mu_1)^2. $$

Here we have

$$ H(x_1,x_2) = -\frac{1}{2(1-\rho^2)}\left[ \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 \right] + \frac{(x_1-\mu_1)^2}{2\sigma_1^2} $$

$$ = -\frac{1}{2(1-\rho^2)}\left[ \rho^2\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 \right]. $$

Evidently we have now shown

$$ H(x_1,x_2) = -\frac{\left(x_2 - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)\right)^2}{2\sigma_2^2(1-\rho^2)}. $$

Hence we have found that

$$ \frac{f_{X_1,X_2}(x_1,x_2)}{f_{X_1}(x_1)} = \frac{1}{\sigma_2\sqrt{1-\rho^2}\sqrt{2\pi}}\, e^{-\frac{\left(x_2 - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)\right)^2}{2\sigma_2^2(1-\rho^2)}}. $$

This establishes the properties of bivariate normal random variables claimed in (1.12) above.
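As a numerical cross-check of this proof (an illustration added to these notes, assuming SciPy; the parameter values and the evaluation point are arbitrary), one can verify that the ratio $f_{X_1,X_2}(x_1,x_2)/f_{X_1}(x_1)$ coincides with the Gaussian density having the conditional mean and variance stated in (1.12):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

mu1, mu2 = 1.0, -0.5
s1, s2, rho = 1.5, 0.8, 0.6
C = np.array([[s1**2, rho * s1 * s2],
              [rho * s1 * s2, s2**2]])

x1, x2 = 0.7, -0.2

# Ratio f_{X1,X2}(x1,x2) / f_{X1}(x1), as in the proof above
ratio = multivariate_normal(mean=[mu1, mu2], cov=C).pdf([x1, x2]) / norm(mu1, s1).pdf(x1)

# Conditional density from (1.12): N(mu2 + rho*(s2/s1)*(x1 - mu1), s2^2*(1 - rho^2))
cond = norm(mu2 + rho * s2 / s1 * (x1 - mu1), s2 * np.sqrt(1 - rho**2)).pdf(x2)

print(ratio, cond)   # the two values agree
```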


Bibliography

[1] H. Anton & C. Rorres: Elementary Linear Algebra with Supplemental Applications. John Wiley & Sons (Asia) Pte Ltd, 2011.