The Multivariate Normal Distribution

Defn: $Z \in R^1$ has a $N(0,1)$ distribution iff $f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$.

Defn: $Z \in R^p$ has a $MVN_p(0, I)$ distribution if and only if $Z = (Z_1, \ldots, Z_p)^T$ with the $Z_i$ independent and each $Z_i \sim N(0,1)$. In this case, according to our theorem,
$$f_Z(z_1, \ldots, z_p) = \prod_{i=1}^p \frac{1}{\sqrt{2\pi}} e^{-z_i^2/2} = (2\pi)^{-p/2} \exp\{-z^T z/2\};$$
superscript $T$ denotes matrix transpose.

Defn: $X \in R^p$ has a multivariate normal distribution if it has the same distribution as $AZ + \mu$ for some $\mu \in R^p$, some $p \times q$ matrix of constants $A$, and $Z \sim MVN_q(0, I)$.
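
The definition can be illustrated numerically. Below is a minimal sketch (not part of the original notes; the matrix $A$, mean $\mu$, and sample size are assumed, illustrative values): draw independent $N(0,1)$ coordinates and apply $X = AZ + \mu$; the sample mean and covariance should then be close to $\mu$ and $AA^T$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example values (purely illustrative): a 3x2 constant matrix A
# and a mean vector mu in R^3.
A = np.array([[1.0, 0.0],
              [0.5, 1.0],
              [0.2, 0.3]])
mu = np.array([1.0, -2.0, 0.5])

# Z ~ MVN_2(0, I): independent N(0,1) coordinates.
Z = rng.standard_normal(size=(100_000, 2))

# X = A Z + mu has, by definition, a multivariate normal distribution.
X = Z @ A.T + mu

print("sample mean      ", X.mean(axis=0))             # close to mu
print("sample covariance\n", np.cov(X, rowvar=False))  # close to A @ A.T
print("A @ A.T\n", A @ A.T)
```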

Case $p = q$, $A$ singular: $X$ does not have a density. $A$ invertible: derive the multivariate normal density by change of variables:
$$X = AZ + \mu \iff Z = A^{-1}(X - \mu),$$
so $\partial X/\partial Z = A$ and $\partial Z/\partial X = A^{-1}$. Then
$$f_X(x) = f_Z\big(A^{-1}(x-\mu)\big)\,|\det(A^{-1})| = \frac{\exp\{-(x-\mu)^T (A^{-1})^T A^{-1}(x-\mu)/2\}}{(2\pi)^{p/2}\,|\det A|}.$$
Now define $\Sigma = AA^T$ and notice that
$$\Sigma^{-1} = (A^T)^{-1}A^{-1} = (A^{-1})^T A^{-1}$$
and
$$\det\Sigma = \det A \det A^T = (\det A)^2.$$
Thus $f_X$ is
$$\frac{\exp\{-(x-\mu)^T \Sigma^{-1}(x-\mu)/2\}}{(2\pi)^{p/2}(\det\Sigma)^{1/2}};$$
the $MVN(\mu, \Sigma)$ density. Note the density is the same for all $A$ such that $AA^T = \Sigma$. This justifies the notation $MVN(\mu, \Sigma)$.
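
As a sanity check on the derived formula, one can compare it with a library implementation. A small sketch, assuming an invertible $2\times 2$ example $A$ and mean $\mu$ (illustrative values, not from the notes); scipy's `multivariate_normal` implements the same density:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative invertible A and mu (assumed values).
A = np.array([[2.0, 0.0],
              [1.0, 1.5]])
mu = np.array([0.5, -1.0])
Sigma = A @ A.T

x = np.array([1.0, 0.3])

# Density written exactly as derived above.
d = x - mu
p = len(mu)
dens = np.exp(-d @ np.linalg.inv(Sigma) @ d / 2) / (
    (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
)

print(dens)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree
```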

For which $\mu$, $\Sigma$ is this a density? Any $\mu$, but if $x \in R^p$ then
$$x^T \Sigma x = x^T A A^T x = (A^T x)^T (A^T x) = \sum_{i=1}^p y_i^2 \ge 0$$
where $y = A^T x$. The inequality is strict except for $y = 0$, which is equivalent to $x = 0$. Thus $\Sigma$ is a positive definite symmetric matrix.

Conversely, if $\Sigma$ is a positive definite symmetric matrix, then there is a square invertible matrix $A$ such that $AA^T = \Sigma$, so that there is a $MVN(\mu, \Sigma)$ distribution. ($A$ can be found via the Cholesky decomposition, e.g.)

When $A$ is singular, $X$ will not have a density: there is an $a \ne 0$ such that $P(a^T X = a^T \mu) = 1$; $X$ is confined to a hyperplane.

Still true: the distribution of $X$ depends only on $\Sigma = AA^T$: if $AA^T = BB^T$ then $AZ + \mu$ and $BZ + \mu$ have the same distribution.
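
The singular case can be seen directly in simulation. A sketch with an assumed rank-1 matrix $A$ in $R^{2\times 2}$ (illustrative values): any $a$ with $A^T a = 0$ satisfies $a^T X = a^T\mu$ for every draw, so $X$ lives on a line.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed singular A (rank 1), so X = A Z + mu has no density in R^2.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])        # second row = 2 * first row
mu = np.array([1.0, -1.0])

# a is orthogonal to the columns of A, i.e. A.T @ a = 0.
a = np.array([2.0, -1.0])
print(A.T @ a)                    # [0. 0.]

Z = rng.standard_normal(size=(5, 2))
X = Z @ A.T + mu

# Every draw satisfies a^T X = a^T mu: X is confined to a line (hyperplane).
print(X @ a)
print(a @ mu)
```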

Expectation, moments

Defn: If $X \in R^p$ has density $f$ then
$$E(g(X)) = \int g(x) f(x)\, dx$$
for any $g$ from $R^p$ to $R$.

FACT: if $Y = g(X)$ for a smooth $g$ (mapping $R \to R$) then
$$E(Y) = \int y f_Y(y)\, dy = \int g(x) f_Y(g(x)) g'(x)\, dx = E(g(X))$$
by the change of variables formula for integration. This is good because otherwise we might have two different values for $E(e^X)$.

Linearity: $E(aX + bY) = aE(X) + bE(Y)$ for real $X$ and $Y$.

Defn: The $r$th moment (about the origin) of a real rv $X$ is $\mu_r' = E(X^r)$ (provided it exists). We generally use $\mu$ for $E(X)$.

Defn: The $r$th central moment is $\mu_r = E[(X - \mu)^r]$. We call $\sigma^2 = \mu_2$ the variance.

Defn: For an $R^p$ valued random vector $X$, $\mu_X = E(X)$ is the vector whose $i$th entry is $E(X_i)$ (provided all entries exist). Fact: the same idea is used for random matrices.

Defn: The ($p \times p$) variance covariance matrix of $X$ is
$$Var(X) = E\left[(X - \mu)(X - \mu)^T\right],$$
which exists provided each component $X_i$ has a finite second moment.

Example moments: If $Z \sim N(0, 1)$ then
$$E(Z) = \int z e^{-z^2/2}\, dz/\sqrt{2\pi} = \left. -\frac{e^{-z^2/2}}{\sqrt{2\pi}}\right|_{-\infty}^{\infty} = 0$$
and (integrating by parts)
$$E(Z^r) = \int z^r e^{-z^2/2}\, dz/\sqrt{2\pi} = \left. -\frac{z^{r-1} e^{-z^2/2}}{\sqrt{2\pi}}\right|_{-\infty}^{\infty} + (r-1)\int z^{r-2} e^{-z^2/2}\, dz/\sqrt{2\pi}$$
so that
$$\mu_r = (r-1)\mu_{r-2}$$
for $r \ge 2$. Remembering that $\mu_1 = 0$ and $\mu_0 = \int z^0 e^{-z^2/2}\, dz/\sqrt{2\pi} = 1$, we find that
$$\mu_r = \begin{cases} 0 & r \text{ odd} \\ (r-1)(r-3)\cdots 1 & r \text{ even.} \end{cases}$$
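
The recursion can be checked against a library routine. A small sketch (the helper function name is ours) comparing the recursion $\mu_r = (r-1)\mu_{r-2}$ with scipy's moment computation for the standard normal:

```python
from scipy.stats import norm

def std_normal_moment(r):
    """E(Z^r) for Z ~ N(0,1) via the recursion mu_r = (r-1) mu_{r-2}."""
    if r % 2 == 1:
        return 0.0
    val = 1.0
    while r >= 2:
        val *= r - 1
        r -= 2
    return val

for r in range(1, 9):
    print(r, std_normal_moment(r), norm().moment(r))  # the two columns agree
```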

If now $X \sim N(\mu, \sigma^2)$, that is, $X$ has the same distribution as $\sigma Z + \mu$, then $E(X) = \sigma E(Z) + \mu = \mu$ and
$$\mu_r(X) = E[(X - \mu)^r] = \sigma^r E(Z^r).$$
In particular, we see that our choice of notation $N(\mu, \sigma^2)$ for the distribution of $\sigma Z + \mu$ is justified; $\sigma^2$ is indeed the variance.

Similarly, for $X \sim MVN(\mu, \Sigma)$ we have $X = AZ + \mu$ with $Z \sim MVN(0, I)$, so $E(X) = \mu$ and
$$Var(X) = E\{(X - \mu)(X - \mu)^T\} = E\{AZ(AZ)^T\} = A\,E(ZZ^T)\,A^T = AIA^T = \Sigma.$$
Note the use of an easy calculation: $E(Z) = 0$ and $Var(Z) = E(ZZ^T) = I$.

Moments and independence

Theorem: If $X_1, \ldots, X_p$ are independent and each $X_i$ is integrable, then $X = X_1 \cdots X_p$ is integrable and
$$E(X_1 \cdots X_p) = E(X_1)\cdots E(X_p).$$

Moment Generating Functions

Defn: The moment generating function of a real valued $X$ is
$$M_X(t) = E(e^{tX}),$$
defined for those real $t$ for which the expected value is finite.

Defn: The moment generating function of $X \in R^p$ is
$$M_X(u) = E[e^{u^T X}],$$
defined for those vectors $u$ for which the expected value is finite.

Example: If $Z \sim N(0, 1)$ then
$$M_Z(t) = \frac{1}{\sqrt{2\pi}}\int e^{tz - z^2/2}\, dz = \frac{1}{\sqrt{2\pi}}\int e^{-(z-t)^2/2 + t^2/2}\, dz = \frac{1}{\sqrt{2\pi}}\int e^{-u^2/2 + t^2/2}\, du = e^{t^2/2}.$$

Theorem: ($p = 1$) If $M$ is finite for all $t$ in a neighbourhood of 0, then
1. Every moment of $X$ is finite.
2. $M$ is $C^\infty$ (in fact $M$ is analytic).
3. $\mu_k' = \frac{d^k}{dt^k} M_X(0)$.

Note: $C^\infty$ means has continuous derivatives of all orders. Analytic means has a convergent power series expansion in a neighbourhood of each $t \in (-\epsilon, \epsilon)$. The proof, and many other facts about mgfs, rely on techniques of complex variables.

Characterization & MGFs

Theorem: Suppose $X$ and $Y$ are $R^p$ valued random vectors such that $M_X(u) = M_Y(u)$ for $u$ in some open neighbourhood of 0 in $R^p$. Then $X$ and $Y$ have the same distribution. The proof relies on techniques of complex variables.

MGFs and Sums

If $X_1, \ldots, X_p$ are independent and $Y = \sum X_i$ then the mgf of $Y$ is the product of the mgfs of the individual $X_i$:
$$E(e^{tY}) = \prod_i E(e^{tX_i}) \quad \text{or} \quad M_Y = \prod M_{X_i}.$$
(Also for multivariate $X_i$.)

Example: If $Z_1, \ldots, Z_p$ are independent $N(0, 1)$ then
$$E\big(e^{\sum a_i Z_i}\big) = \prod_i E(e^{a_i Z_i}) = \prod_i e^{a_i^2/2} = \exp\Big(\sum a_i^2/2\Big).$$

Conclusion: If $Z \sim MVN_p(0, I)$ then
$$M_Z(u) = \exp\Big(\sum u_i^2/2\Big) = \exp(u^T u/2).$$

Example: If $X \sim N(\mu, \sigma^2)$ then $X = \sigma Z + \mu$ and
$$M_X(t) = E(e^{t(\sigma Z + \mu)}) = e^{t\mu} e^{\sigma^2 t^2/2}.$$
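
The conclusion $M_Z(u) = \exp(u^T u/2)$ can be checked by Monte Carlo. A minimal sketch, assuming a small illustrative argument $u$ (mgf estimates become noisy for large $\lVert u\rVert$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed small argument u (illustrative only).
u = np.array([0.3, -0.2, 0.5])

Z = rng.standard_normal(size=(1_000_000, 3))   # Z ~ MVN_3(0, I)

mc_estimate = np.exp(Z @ u).mean()             # Monte Carlo E[exp(u^T Z)]
closed_form = np.exp(u @ u / 2)                # exp(u^T u / 2)

print(mc_estimate, closed_form)                # close to each other
```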

Theorem: Suppose $X = AZ + \mu$ and $Y = A'Z' + \mu'$ where $Z \sim MVN_p(0, I)$ and $Z' \sim MVN_q(0, I)$. Then $X$ and $Y$ have the same distribution if and only if the following two conditions hold:
1. $\mu = \mu'$.
2. $AA^T = A'(A')^T$.

Alternatively: if $X$, $Y$ are each MVN then $E(X) = E(Y)$ and $Var(X) = Var(Y)$ imply that $X$ and $Y$ have the same distribution.

Proof: If 1 and 2 hold, the mgf of $X$ is
$$E\big(e^{t^T X}\big) = E\big(e^{t^T(AZ + \mu)}\big) = e^{t^T\mu}\,E\big(e^{(A^T t)^T Z}\big) = e^{t^T\mu + (A^T t)^T (A^T t)/2} = e^{t^T\mu + t^T\Sigma t/2}.$$
Thus $M_X = M_Y$. Conversely, if $X$ and $Y$ have the same distribution, then they have the same mean and variance.

Thus the mgf is determined by $\mu$ and $\Sigma$.

Theorem: If $X \sim MVN_p(\mu, \Sigma)$ then there is a $p \times p$ matrix $A$ such that $X$ has the same distribution as $AZ + \mu$ for $Z \sim MVN_p(0, I)$. We may assume that $A$ is symmetric and nonnegative definite, or that $A$ is upper triangular, or that $A$ is lower triangular.

Proof: Pick any $A$ such that $AA^T = \Sigma$, such as $PD^{1/2}P^T$ from the spectral decomposition. Then $AZ + \mu \sim MVN_p(\mu, \Sigma)$.

From the symmetric square root one can produce an upper triangular square root by the Gram Schmidt process: if $A$ has rows $a_1^T, \ldots, a_p^T$ then let $v_p = a_p/\sqrt{a_p^T a_p}$. Choose $v_{p-1}$ proportional to $a_{p-1} - b v_p$, where $b = a_{p-1}^T v_p$, so that $v_{p-1}$ has unit length. Continue in this way; you automatically get $a_j^T v_k = 0$ if $k < j$. If $P$ has columns $v_1, \ldots, v_p$ then $P$ is orthogonal and $AP$ is an upper triangular square root of $\Sigma$.
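
Both kinds of square root mentioned in the proof are easy to compute; a sketch with an assumed positive definite $\Sigma$ (illustrative numbers), comparing the symmetric root $PD^{1/2}P^T$ with a triangular (Cholesky) root. They differ as matrices but both satisfy $AA^T = \Sigma$.

```python
import numpy as np

# Assumed positive definite example Sigma.
Sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 3.0, 0.8],
                  [0.5, 0.8, 2.0]])

# Symmetric nonnegative definite square root from the spectral decomposition.
eigvals, P = np.linalg.eigh(Sigma)
A_sym = P @ np.diag(np.sqrt(eigvals)) @ P.T

# Lower triangular square root (Cholesky).
A_tri = np.linalg.cholesky(Sigma)

print(np.allclose(A_sym @ A_sym.T, Sigma))  # True
print(np.allclose(A_tri @ A_tri.T, Sigma))  # True
print(np.allclose(A_sym, A_tri))            # False: different roots, same Sigma
```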

Variances, Covariances, Correlations

Defn: The covariance between $X$ and $Y$ is
$$Cov(X, Y) = E\{(X - \mu_X)(Y - \mu_Y)^T\}.$$
This is a matrix.

Properties: $Cov(X, X) = Var(X)$. Cov is bilinear:
$$Cov(AX + BW, Y) = A\,Cov(X, Y) + B\,Cov(W, Y)$$
and
$$Cov(X, CY + DZ) = Cov(X, Y)C^T + Cov(X, Z)D^T.$$

Properties of the MVN distribution

1: All margins are multivariate normal: if
$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$
then $X \sim MVN(\mu, \Sigma)$ implies $X_1 \sim MVN(\mu_1, \Sigma_{11})$.

2: $MX + \nu \sim MVN(M\mu + \nu, M\Sigma M^T)$: an affine transformation of an MVN is normal.

3: If $\Sigma_{12} = Cov(X_1, X_2) = 0$ then $X_1$ and $X_2$ are independent.

4: All conditionals are normal: the conditional distribution of $X_1$ given $X_2 = x_2$ is
$$MVN\big(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\big).$$
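
Property 4 can be illustrated numerically; a rough sketch with assumed values (not from the notes): take $X_1$ as the first two coordinates and $X_2$ as the last, keep only simulated draws whose $X_2$ lands near a fixed $x_2$, and compare with the formula.

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([1.0, -1.0, 2.0])          # assumed example values
Sigma = np.array([[2.0, 0.6, 0.8],
                  [0.6, 1.5, 0.4],
                  [0.8, 0.4, 1.0]])

mu1, mu2 = mu[:2], mu[2:]
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

x2 = np.array([2.5])

# Theoretical conditional mean and variance of X1 given X2 = x2.
cond_mean = mu1 + S12 @ np.linalg.inv(S22) @ (x2 - mu2)
cond_var = S11 - S12 @ np.linalg.inv(S22) @ S21

# Crude simulation check: keep draws whose X2 is within 0.02 of x2.
X = rng.multivariate_normal(mu, Sigma, size=2_000_000)
near = np.abs(X[:, 2] - x2[0]) < 0.02
print(cond_mean, X[near, :2].mean(axis=0))   # roughly equal
print(cond_var)
print(np.cov(X[near, :2], rowvar=False))     # roughly equal
```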

Proof of (1): If $X = AZ + \mu$ then $X_1 = [I\ 0]X$ for $I$ the identity matrix of the correct dimension. So
$$X_1 = \big([I\ 0]A\big)Z + [I\ 0]\mu.$$
Compute the mean and variance to check the rest.

Proof of (2): If $X = AZ + \mu$ then
$$MX + \nu = (MA)Z + (M\mu + \nu).$$

Proof of (3): If
$$u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$$
then
$$M_X(u) = M_{X_1}(u_1)\,M_{X_2}(u_2).$$

Proof of (4): first case: assume $\Sigma_{22}$ has an inverse. Define
$$W = X_1 - \Sigma_{12}\Sigma_{22}^{-1}X_2.$$
Then
$$\begin{bmatrix} W \\ X_2 \end{bmatrix} = \begin{bmatrix} I & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0 & I \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}.$$
Thus $(W, X_2)^T$ is MVN with mean $(\mu_1 - \Sigma_{12}\Sigma_{22}^{-1}\mu_2,\ \mu_2)$ and variance
$$\Sigma^* = \begin{bmatrix} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} & 0 \\ 0 & \Sigma_{22} \end{bmatrix}.$$

Now the joint density of $W$ and $X_2$ factors:
$$f_{W, X_2}(w, x_2) = f_W(w)\,f_{X_2}(x_2).$$
By change of variables, the joint density of $X$ is
$$f_{X_1, X_2}(x_1, x_2) = c\,f_W(x_1 - Mx_2)\,f_{X_2}(x_2)$$
where $c = 1$ is the constant Jacobian of the linear transformation from $(W, X_2)$ to $(X_1, X_2)$ and $M = \Sigma_{12}\Sigma_{22}^{-1}$.

Thus the conditional density of $X_1$ given $X_2 = x_2$ is
$$\frac{f_W(x_1 - Mx_2)\,f_{X_2}(x_2)}{f_{X_2}(x_2)} = f_W(x_1 - Mx_2).$$
As a function of $x_1$ this density has the form of the advertised multivariate normal density.

Specialization to the bivariate case: Write
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$
where we define
$$\rho = \frac{Cov(X_1, X_2)}{\sqrt{Var(X_1)Var(X_2)}}.$$
Note that $\sigma_i^2 = Var(X_i)$.

Then
$$W = X_1 - \rho\frac{\sigma_1}{\sigma_2}X_2$$
is independent of $X_2$. The marginal distribution of $W$ is $N(\mu_1 - \rho\sigma_1\mu_2/\sigma_2,\ \tau^2)$ where
$$\tau^2 = Var(X_1) - 2\rho\frac{\sigma_1}{\sigma_2}Cov(X_1, X_2) + \Big(\rho\frac{\sigma_1}{\sigma_2}\Big)^2 Var(X_2).$$

This simplifies to $\sigma_1^2(1 - \rho^2)$. Notice that it follows that $-1 \le \rho \le 1$.

More generally, for any $X$ and $Y$:
$$0 \le Var(X - \lambda Y) = Var(X) - 2\lambda\,Cov(X, Y) + \lambda^2 Var(Y).$$
The RHS is minimized at
$$\lambda = \frac{Cov(X, Y)}{Var(Y)}.$$
The minimum value is
$$0 \le Var(X)(1 - \rho_{XY}^2)$$
where
$$\rho_{XY} = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}$$
defines the correlation between $X$ and $Y$.
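
The minimization step can also be checked symbolically. A small sketch using sympy; the symbol names `v_X`, `v_Y`, `c` are arbitrary stand-ins for $Var(X)$, $Var(Y)$ and $Cov(X, Y)$:

```python
import sympy as sp

lam, vX, vY, c = sp.symbols('lambda v_X v_Y c', positive=True)

# Var(X - lambda Y) expanded as in the notes.
expr = vX - 2 * lam * c + lam**2 * vY

lam_star = sp.solve(sp.diff(expr, lam), lam)[0]
min_val = sp.simplify(expr.subs(lam, lam_star))

print(lam_star)   # c/v_Y, i.e. Cov(X, Y)/Var(Y)
print(min_val)    # v_X - c**2/v_Y, i.e. Var(X)(1 - rho_XY**2)
```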

Multiple Correlation

Now suppose $X_2$ is scalar but $X_1$ is a vector.

Defn: The multiple correlation between $X_1$ and $X_2$ is
$$R^2(X_1, X_2) = \max_{a \ne 0}\ \rho^2_{a^T X_1,\, X_2},$$
the maximum taken over all $a \ne 0$.

Thus: maximize
$$\frac{Cov^2(a^T X_1, X_2)}{Var(a^T X_1)Var(X_2)} = \frac{a^T\Sigma_{12}\Sigma_{21}a}{\big(a^T\Sigma_{11}a\big)\Sigma_{22}}.$$
Put $b = \Sigma_{11}^{1/2}a$. For $\Sigma_{11}$ invertible the problem is equivalent to maximizing
$$\frac{b^T Q b}{b^T b}$$
where
$$Q = \Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{21}\Sigma_{11}^{-1/2}.$$
Solution: find the largest eigenvalue of $Q$.

Note
$$Q = vv^T \quad \text{where} \quad v = \Sigma_{11}^{-1/2}\Sigma_{12}$$
is a vector. Set $vv^T x = \lambda x$ and multiply by $v^T$ to get $v^T x = 0$ or $\lambda = v^T v$. If $v^T x = 0$ then we see $\lambda = 0$, so the largest eigenvalue is $v^T v$.

Summary: the maximum squared correlation is
$$R^2(X_1, X_2) = \frac{v^T v}{\Sigma_{22}} = \frac{\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}}{\Sigma_{22}}.$$
It is achieved when the eigenvector is $x = v = b$, so
$$a = \Sigma_{11}^{-1/2}\Sigma_{11}^{-1/2}\Sigma_{12} = \Sigma_{11}^{-1}\Sigma_{12}.$$
Notice: since $R^2$ is a squared correlation between two scalars ($a^T X_1$ and $X_2$) we have $0 \le R^2 \le 1$. It equals 1 iff $X_2$ is a linear combination of $X_1$.
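
A numerical sketch of this summary, with an assumed covariance matrix (illustrative values, not from the notes): compute $R^2$ from the closed form and compare with the simulated squared correlation between $a^T X_1$ and $X_2$ at the optimal $a = \Sigma_{11}^{-1}\Sigma_{12}$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed covariance with X1 in R^2 and scalar X2 (last coordinate).
Sigma = np.array([[2.0, 0.6, 0.8],
                  [0.6, 1.5, 0.4],
                  [0.8, 0.4, 1.0]])
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

# Closed form from the notes: R^2 = Sigma_21 Sigma_11^{-1} Sigma_12 / Sigma_22.
R2 = float(S21 @ np.linalg.inv(S11) @ S12 / S22)

# Check: squared correlation of a^T X1 with X2 at a = Sigma_11^{-1} Sigma_12.
a = np.linalg.inv(S11) @ S12
X = rng.multivariate_normal(np.zeros(3), Sigma, size=500_000)
proj = X[:, :2] @ a[:, 0]
print(R2, np.corrcoef(proj, X[:, 2])[0, 1] ** 2)   # approximately equal
```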

Correlation matrices, partial correlations: The correlation between two scalars $X$ and $Y$ is
$$\rho_{XY} = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}.$$
If $X$ has variance $\Sigma$ then the correlation matrix of $X$ is $R_X$ with entries
$$R_{ij} = \frac{Cov(X_i, X_j)}{\sqrt{Var(X_i)Var(X_j)}} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\Sigma_{jj}}}.$$
If $X_1$, $X_2$ are MVN with the usual partitioned variance covariance matrix, then the conditional variance of $X_1$ given $X_2$ is
$$\Sigma_{11\cdot 2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$
From this define the partial correlation matrix $R_{11\cdot 2}$ with entries
$$(R_{11\cdot 2})_{ij} = \frac{(\Sigma_{11\cdot 2})_{ij}}{\sqrt{(\Sigma_{11\cdot 2})_{ii}(\Sigma_{11\cdot 2})_{jj}}}.$$
Note: these are used even when $X_1$, $X_2$ are NOT MVN.
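
A short sketch of the partial correlation computation, reusing the same assumed partitioned covariance as above ($X_1$ = first two coordinates, $X_2$ = last): form $\Sigma_{11\cdot 2}$ and rescale it by its own diagonal.

```python
import numpy as np

# Assumed partitioned covariance (illustrative values).
Sigma = np.array([[2.0, 0.6, 0.8],
                  [0.6, 1.5, 0.4],
                  [0.8, 0.4, 1.0]])
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

# Conditional (partial) covariance of X1 given X2.
S11_2 = S11 - S12 @ np.linalg.inv(S22) @ S21

# Partial correlation matrix: scale by the conditional standard deviations.
d = np.sqrt(np.diag(S11_2))
R11_2 = S11_2 / np.outer(d, d)

print(R11_2)   # unit diagonal; off-diagonal entry is the partial correlation
```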