Chapter 5. The multivariate normal distribution

Linear transformations

A transformation is said to be linear if every function in the transformation is a linear combination of the original variables. When dealing with linear transformations it is convenient to use matrix notation; a linear transformation can then be written as

    Y = BX + b,

where B is a matrix of constants and b is a constant vector.

The mean vector and the covariance matrix

Definition 2.1. Let X be a random n-vector whose components have finite variance. The mean vector of X is μ = E(X), and the covariance matrix of X is Λ = E[(X - μ)(X - μ)'].

When dealing with random vectors and matrices, expectations are taken componentwise, which means that

    E(X) = (E X_1, E X_2, ..., E X_n)'.

That is, the elements of the mean vector are the means of the components of X. For the covariance matrix of X it follows that the (i, j) element is

    Λ_ij = E[(X_i - μ_i)(X_j - μ_j)] = Cov(X_i, X_j),

so the diagonal elements are the variances of the components and the off-diagonal elements are the pairwise covariances.
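
As a quick numerical companion to Definition 2.1, the sketch below (not part of the chapter; the parameters mu and Lambda are hypothetical) estimates the mean vector and covariance matrix componentwise from simulated data and compares them with NumPy's built-in estimators.

```python
import numpy as np

# Minimal sketch: estimate the mean vector and covariance matrix of a random
# 3-vector from simulated data, computing the componentwise definitions by
# hand and comparing with NumPy's built-ins. mu and Lambda are hypothetical.
rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])
Lambda = np.array([[2.0, 0.5, 0.0],
                   [0.5, 1.0, 0.3],
                   [0.0, 0.3, 1.5]])

X = rng.multivariate_normal(mu, Lambda, size=200_000)    # rows are observations

mean_vector = X.mean(axis=0)                              # E(X) componentwise
centered = X - mean_vector
cov_matrix = centered.T @ centered / (len(X) - 1)         # estimate of E[(X-mu)(X-mu)']

print(np.round(mean_vector, 2))                           # close to mu
print(np.round(cov_matrix, 2))                            # close to Lambda
print(np.allclose(cov_matrix, np.cov(X, rowvar=False)))   # matches np.cov
```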

Expectations for linear transformations

Theorem 2.2. Let X be a random n-vector with mean vector μ and covariance matrix Λ. Further, let B be an m × n matrix, let b be a constant m-vector, and set Y = BX + b. Then

    E(Y) = Bμ + b   and   Cov(Y) = BΛB'.

Proof (the covariance matrix). Because multiplicative constant matrices can be moved outside of the expectation, it follows that

    Cov(Y) = E[(Y - E Y)(Y - E Y)'] = E[B(X - μ)(X - μ)'B'] = B E[(X - μ)(X - μ)'] B' = BΛB'.

The multivariate normal distribution: Definition I

Definition I. The random n-vector X is normal iff, for every n-vector a, the linear combination a'X is (univariate) normal.

Notation. The notation X ~ N(μ, Λ) is used to denote that X has a multivariate normal distribution with mean vector μ and covariance matrix Λ.

Theorem 3.1. Let X ~ N(μ, Λ) and set Y = BX + b. Then Y ~ N(Bμ + b, BΛB').

Proof. The correctness of the mean vector and the covariance matrix follows directly from Theorem 2.2. Next we prove that every linear combination of Y is normal by showing that a linear combination of Y is another linear combination of X: for any vector a,

    a'Y = a'(BX + b) = (B'a)'X + a'b,

which is a linear combination of X plus a constant and hence univariate normal by Definition I.

Exercise 5.3.2

Let X = (X_1, X_2)' be a normal random vector with the distribution given in the exercise. What is the joint distribution of Y_1 = X_1 + X_2 and Y_2 = 2X_1 - 3X_2? Since Y = BX with

    B = ( 1   1
          2  -3 ),

it follows from Theorem 3.1 that Y is normal with mean vector Bμ and covariance matrix BΛB', where μ and Λ are the mean vector and covariance matrix of X.
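
The following sketch illustrates Theorem 2.2 and Theorem 3.1 numerically. The mean vector and covariance matrix of X are hypothetical choices (the exercise's actual distribution is not reproduced here); B is taken from the Y_1 = X_1 + X_2, Y_2 = 2X_1 - 3X_2 example above.

```python
import numpy as np

# Sketch of Theorem 3.1: for Y = BX + b with X ~ N(mu, Lambda), the
# distribution of Y is N(B mu + b, B Lambda B'). mu and Lambda are hypothetical.
rng = np.random.default_rng(1)

mu = np.array([2.0, 1.0])                      # assumed mean vector of X
Lambda = np.array([[1.0, 0.5],
                   [0.5, 2.0]])                # assumed covariance matrix of X

B = np.array([[1.0,  1.0],                     # Y1 = X1 + X2
              [2.0, -3.0]])                    # Y2 = 2X1 - 3X2
b = np.zeros(2)

# Exact parameters of Y according to Theorem 3.1
mean_Y = B @ mu + b
cov_Y = B @ Lambda @ B.T

# Monte Carlo check
X = rng.multivariate_normal(mu, Lambda, size=500_000)
Y = X @ B.T + b
print(mean_Y, np.round(Y.mean(axis=0), 2))
print(cov_Y)
print(np.round(np.cov(Y, rowvar=False), 2))
```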

The multivariate normal distribution: Definition II (transforms)

The moment generating function of a random vector X is given by

    ψ_X(t) = E[e^(t'X)].

Definition II. The random vector X is normal, N(μ, Λ), iff its moment generating function is of the form

    ψ_X(t) = exp(t'μ + (1/2) t'Λt).

Theorem 4.2. Definition I and Definition II are equivalent.

The meaning. If every linear combination of X is univariate normal, then the moment generating function of X is of the form given above. If, on the other hand, the moment generating function of X is of the form given above, then every linear combination of X is univariate normal.

Proof of Theorem 4.2: Definition I implies Definition II

Let X be N(μ, Λ) by Definition I. The mgf of X is given by ψ_X(t) = E[e^(t'X)], and since Y = t'X is a linear combination of X, it follows from Definition I that Y is (univariate) normal and therefore has a moment generating function. Furthermore, it follows from Theorem 2.2 that E(Y) = t'μ and Var(Y) = t'Λt. Hence

    ψ_X(t) = E[e^Y] = exp(E(Y) + (1/2) Var(Y)) = exp(t'μ + (1/2) t'Λt),

and the first part of the proof is established.

Properties of symmetric matrices

Definition. A symmetric matrix A is said to be positive definite if the quadratic form x'Ax is positive for all x ≠ 0. If the quadratic form is non-negative for all x, then A is said to be non-negative definite (or positive semidefinite).

Theorem 2.1. Every covariance matrix Λ is non-negative definite.

Proof. Let X be a random vector whose covariance matrix is Λ, and study the linear combination y'X. By Theorem 2.2,

    0 ≤ Var(y'X) = y'Λy   for every vector y,

and the theorem is proved.

Properties of non-negative definite symmetric matrices

Orthogonal matrices. A square matrix C is an orthogonal matrix if C'C = I, where I is the identity matrix. It follows that the rows (and columns) of an orthogonal matrix are orthonormal, that is, they all have unit length and they are pairwise orthogonal.

Diagonal matrices. A square matrix D is a diagonal matrix if the diagonal elements are the only non-zero elements of D.

Diagonalization. Let A be a symmetric matrix. Then there exist an orthogonal matrix C and a diagonal matrix D such that A = CDC'. Furthermore, the diagonal elements of D are the eigenvalues of A.

The square root. Let A be a non-negative definite symmetric matrix. The square root of A is a matrix (usually denoted A^(1/2)) satisfying A^(1/2) A^(1/2) = A. It follows from the diagonalization of A that A^(1/2) = C D^(1/2) C'.
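
A small sketch of the diagonalization and square-root facts above, using a hypothetical symmetric non-negative definite matrix A; the eigendecomposition supplies the orthogonal C and diagonal D.

```python
import numpy as np

# For a symmetric non-negative definite A, an orthogonal C and diagonal D with
# A = C D C' come from the eigendecomposition, and A^(1/2) = C D^(1/2) C'.
# A below is a hypothetical covariance-like matrix chosen for illustration.
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

eigvals, C = np.linalg.eigh(A)       # columns of C are orthonormal eigenvectors
D = np.diag(eigvals)                 # eigenvalues of A on the diagonal

print(np.allclose(C.T @ C, np.eye(2)))    # C is orthogonal: C'C = I
print(np.allclose(C @ D @ C.T, A))        # A = C D C'

A_sqrt = C @ np.diag(np.sqrt(eigvals)) @ C.T
print(np.allclose(A_sqrt @ A_sqrt, A))    # A^(1/2) A^(1/2) = A
```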

Proof of Theorem 4.2: Definition II implies Definition I

Let Y_1, ..., Y_n be independent N(0, 1), that is, Y = (Y_1, ..., Y_n)' is N(0, I) by Definition I. The moment generating function of Y is given by

    ψ_Y(t) = E[e^(t'Y)] = ∏_i E[e^(t_i Y_i)] = ∏_i exp(t_i²/2) = exp((1/2) t't).

Next we let X = Λ^(1/2) Y + μ, and since this is a linear transformation of Y it follows from Theorem 2.2 that

    E(X) = μ   and   Cov(X) = Λ^(1/2) I Λ^(1/2) = Λ.

The moment generating function of X is given by

    ψ_X(t) = E[e^(t'X)] = e^(t'μ) E[e^((Λ^(1/2)t)'Y)] = e^(t'μ) exp((1/2) t'Λ^(1/2)Λ^(1/2)t) = exp(t'μ + (1/2) t'Λt),

which is the mgf given in Definition II. Since it is clear that any linear combination of X is another linear combination of Y (plus a constant), X is normal, N(μ, Λ), according to Definition I.

Problem 5.10.30 (part 1)

Let X_1, X_2, and X_3 have the joint moment generating function given in the problem, which is of the form in Definition II. By Definition II it follows that X_1, X_2, and X_3 are jointly normal, with mean vector μ and covariance matrix Λ that can be read off from the exponent of the mgf.

Find the joint distribution of Y_1 = X_1 + X_3 and Y_2 = X_1 + X_2, that is, the distribution of the linear transformation Y = BX where

    B = ( 1  0  1
          1  1  0 ).

Since X_1, X_2, and X_3 are jointly normal, it follows from Theorem 3.1 that Y is normal with mean vector Bμ and covariance matrix BΛB'.
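
The construction in the proof, X = Λ^(1/2)Y + μ with Y built from independent N(0, 1) components, can be checked by simulation; the parameters below are hypothetical illustration values.

```python
import numpy as np

# Sketch of the construction used in the proof: start from Y with independent
# N(0,1) components and set X = Lambda^(1/2) Y + mu, which is N(mu, Lambda).
rng = np.random.default_rng(2)

mu = np.array([0.0, 1.0, -1.0])
Lambda = np.array([[2.0, 0.6, 0.2],
                   [0.6, 1.0, 0.0],
                   [0.2, 0.0, 0.5]])

# Symmetric square root via diagonalization, Lambda^(1/2) = C D^(1/2) C'
eigvals, C = np.linalg.eigh(Lambda)
Lambda_sqrt = C @ np.diag(np.sqrt(eigvals)) @ C.T

Y = rng.standard_normal((300_000, 3))        # rows: independent N(0, I) vectors
X = Y @ Lambda_sqrt.T + mu                   # X = Lambda^(1/2) Y + mu, row-wise

print(np.round(X.mean(axis=0), 2))           # approximately mu
print(np.round(np.cov(X, rowvar=False), 2))  # approximately Lambda
```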

Important properties of determinants

1. A square matrix A is invertible iff det A ≠ 0.
2. For the identity matrix I we have det I = 1.
3. For the transpose of A we have det A' = det A.
4. Let A and B be square matrices. Then det AB = det A · det B.
5. Results 2 and 4 now imply that det A⁻¹ = (det A)⁻¹.
6. Let C be an orthogonal matrix. Results 2, 3, and 4 now imply that det C = ±1.
7. Since a symmetric matrix A can be diagonalized as A = CDC', it follows from results 4 and 6 that det A = det D = λ_1 λ_2 ··· λ_n, where λ_1, λ_2, ..., λ_n are the eigenvalues of A.

The multivariate normal distribution: Definition III (the density function)

Definition III. The random vector X is normal, N(μ, Λ), where det Λ > 0, iff its density function is of the form

    f_X(x) = (2π)^(-n/2) (det Λ)^(-1/2) exp(-(1/2)(x - μ)'Λ⁻¹(x - μ)).

Theorem 5.2. Definitions I, II, and III are equivalent (in the nonsingular case).

Idea for the proof. First we find a normal random vector Y whose density function is easy to derive. Then a suitably defined linear transformation X = BY will be N(μ, Λ). Finally the transformation theorem (Theorem 1.2.1) gives us the density function of X.

Proof of Theorem 5.2

Step 1. Find a normal random vector Y whose density function is easy to derive. Let Y_1, ..., Y_n be independent N(0, 1). Then, by Definition I, Y = (Y_1, ..., Y_n)' is N(0, I). The density function of Y is given by

    f_Y(y) = ∏_i (2π)^(-1/2) exp(-y_i²/2) = (2π)^(-n/2) exp(-(1/2) y'y).

Step 2. We know from before that X = Λ^(1/2) Y + μ is N(μ, Λ).

Step 3. Find the density function of X; recall Theorem 1.2.1.

Step 3.1. Inversion yields Y = Λ^(-1/2)(X - μ).

Step 3.2. Since the transformation is linear, the Jacobian is det Λ^(-1/2) = (det Λ)^(-1/2).

Step 3.3. Finally, it follows from Theorem 1.2.1 that

    f_X(x) = f_Y(Λ^(-1/2)(x - μ)) · (det Λ)^(-1/2) = (2π)^(-n/2) (det Λ)^(-1/2) exp(-(1/2)(x - μ)'Λ⁻¹(x - μ)).
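
The Definition III density can be evaluated directly from the formula and compared with SciPy's implementation; the parameters and evaluation point below are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch checking the Definition III density formula numerically against
# SciPy's implementation, at one hypothetical point x.
mu = np.array([1.0, -1.0])
Lambda = np.array([[2.0, 0.8],
                   [0.8, 1.0]])
x = np.array([0.5, 0.0])

n = len(mu)
diff = x - mu
quad_form = diff @ np.linalg.inv(Lambda) @ diff      # (x - mu)' Lambda^{-1} (x - mu)
density = (2 * np.pi) ** (-n / 2) * np.linalg.det(Lambda) ** (-0.5) * np.exp(-0.5 * quad_form)

print(density)
print(multivariate_normal(mean=mu, cov=Lambda).pdf(x))   # should agree
```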

Problem 5.10.30 (part 2)

In the first part of the problem we found the joint distribution of Y = (Y_1, Y_2)'. Since det Λ = 4·10 - 2·2 = 36, and Λ⁻¹ is easily computed from Λ, it follows from Definition III that the density of Y is the bivariate normal density with the mean vector and covariance matrix found in part 1.

Conditional distributions

General situation. Let X be N(μ, Λ) with det Λ > 0. Furthermore, let X_1 and X_2 be subvectors of X, where the components of X_1 and X_2 are assumed to be different. Can anything be said about the distribution of X_2 | X_1 = x_1?

Answer: YES! Conditional distributions of multivariate normal distributions are normal.

Problem 5.10.30 (part 3)

Find the conditional density of Y_1 given that Y_2 = 1, that is, find f_{Y_1 | Y_2 = 1}(y_1). By definition,

    f_{Y_1 | Y_2 = 1}(y_1) = f_{Y_1, Y_2}(y_1, 1) / f_{Y_2}(1),

and inserting the joint density from part 2 and completing the square in y_1 shows that this ratio is itself a univariate normal density. Hence, the conditional distribution of Y_1 given that Y_2 = 1 is N(4/5, 18/5).

Independence

Natural question 1. Is there an easy way to determine whether the components of a normal random vector are independent?

Theorem 7.1. Let X be a normal random vector. The components of X are independent iff they are uncorrelated.

Proof. It suffices to show that uncorrelated components imply independence (the converse always holds). If the components are uncorrelated, then Λ is diagonal, so the exponent in the mgf of Definition II (or in the density of Definition III) splits into a sum with one term per component, and the joint mgf (or density) factorizes into a product of the marginal ones, which means that the components are independent.
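
A sketch of the conditional-distribution fact for the bivariate case, using the standard partitioned-covariance formulas (which the slides use implicitly but do not display); all numbers below are hypothetical, not those of Problem 5.10.30.

```python
import numpy as np

# Conditional distributions of a bivariate normal are normal, with
#   E[X1 | X2 = x2]   = mu1 + L12 / L22 * (x2 - mu2)
#   Var[X1 | X2 = x2] = L11 - L12**2 / L22
# (standard partitioned-covariance formulas; parameters are hypothetical).
rng = np.random.default_rng(3)

mu = np.array([1.0, 2.0])
Lambda = np.array([[3.0, 1.2],
                   [1.2, 2.0]])
x2 = 1.0                                  # condition on the second component

cond_mean = mu[0] + Lambda[0, 1] / Lambda[1, 1] * (x2 - mu[1])
cond_var = Lambda[0, 0] - Lambda[0, 1] ** 2 / Lambda[1, 1]
print(cond_mean, cond_var)

# Monte Carlo check: keep draws whose second component is close to x2
X = rng.multivariate_normal(mu, Lambda, size=2_000_000)
near = X[np.abs(X[:, 1] - x2) < 0.01, 0]
print(near.mean(), near.var())            # approximately cond_mean, cond_var
```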

Problem 5.10.10

Suppose that the moment generating function of (X, Y) is as given in the problem (an mgf of the form in Definition II involving a constant a). Determine a so that U = X + 2Y and V = 2X - Y become independent.

Since the mgf is of the form in Definition II, it follows from Definition II that (X, Y) is bivariate normal, and its mean vector and covariance matrix can be read off from the exponent of the mgf.

Since (U, V) is a linear transformation of (X, Y), it is clear that (U, V) is also bivariate normal. The covariance matrix of (U, V) is given by BΛB', where B is the matrix of the transformation. It is, however, by Theorem 7.1, enough to determine an off-diagonal element: U and V are independent exactly when Cov(U, V) = 0. Setting the off-diagonal element of BΛB' equal to zero, it is thus clear that only for a = 4/3 will U and V be independent.

Independence and linear transformations

Natural question 2. A linear transformation of a normal random vector is itself normal. Is it always possible to find a linear transformation that will have uncorrelated, and hence independent, components?

Theorem 8.1. Let X be N(μ, Λ). Furthermore, let C be the orthogonal matrix that diagonalizes Λ, that is, C'ΛC = D, where the diagonal elements of D are the eigenvalues of Λ. Then Y = C'X is N(C'μ, D).

Theorem 8.2. Let X be N(μ, σ²I). Furthermore, let C be an arbitrary orthogonal matrix. Then Y = C'X is N(C'μ, σ²I).

Conclusion. For the general N(μ, Λ) there always exists one orthogonal transformation that will yield a normal random vector with independent components. For the special case N(μ, σ²I), any orthogonal transformation will produce a normal random vector with independent components.

Problem 5.10.9 b

Let X and Y be independent N(0, σ²). Show that X + Y and X - Y are independent normal random variables.

Since X and Y are independent, we have that (X, Y)' is bivariate normal, N(0, σ²I). Furthermore,

    (X + Y, X - Y)' = √2 · C'(X, Y)',

where

    C' = (1/√2) ( 1   1
                  1  -1 )

is an orthogonal matrix. Because of this fact, it follows from Theorem 8.2 that the components of C'(X, Y)' are independent normal random variables, and hence so are the components of (X + Y, X - Y)'.
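
Theorem 8.1 in code: diagonalizing a hypothetical covariance matrix Λ with an orthogonal C and checking that Y = C'X has (approximately) diagonal sample covariance.

```python
import numpy as np

# Sketch of Theorem 8.1: if C diagonalizes Lambda (C' Lambda C = D), then
# Y = C'X has uncorrelated, hence independent, components.
# mu and Lambda are hypothetical illustration values.
rng = np.random.default_rng(4)

mu = np.array([1.0, -1.0, 0.0])
Lambda = np.array([[2.0, 1.0, 0.3],
                   [1.0, 3.0, 0.5],
                   [0.3, 0.5, 1.0]])

eigvals, C = np.linalg.eigh(Lambda)          # columns of C are eigenvectors
D = C.T @ Lambda @ C                         # C' Lambda C = D (diagonal)
print(np.round(D, 10))

X = rng.multivariate_normal(mu, Lambda, size=400_000)
Y = X @ C                                    # row-wise Y = C'X
print(np.round(np.cov(Y, rowvar=False), 2))  # approximately diag(eigenvalues)
```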

Problem 5.10.37

Let (X, Y) be a normal random vector with the covariance matrix Λ given in the problem, where ρ is the correlation coefficient. Determine the probability distribution of the random variable W defined in the problem.

The moment generating function of W is defined by ψ_W(t) = E[e^(tW)], and in order to find it we first have to find the joint density of X and Y. Since det Λ = 1 - ρ², and Λ⁻¹ can be written out explicitly, it follows that the joint density function of X and Y is given by the bivariate normal density of Definition III.

It follows from the density of (X, Y) that the main part of the expression for the moment generating function of W is an integral of exp(Q), where Q is a quadratic form in x and y. Since Q is the main part of a multivariate normal density function, the integral can be evaluated by identifying the corresponding covariance matrix and the normalizing constant it integrates to. It follows that the moment generating function of W is given by

    ψ_W(t) = (1 - 2t)⁻¹,   t < 1/2,

and it is clear that W is χ²(2).
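
As a check on the last identification step, the sketch below verifies by Monte Carlo that a χ²(2) variable has moment generating function (1 - 2t)⁻¹; the grid of t values is arbitrary.

```python
import numpy as np
from scipy import stats

# The mgf of a chi-square(2) variable is (1 - 2t)^{-1} for t < 1/2.
# Here it is checked by Monte Carlo at a few arbitrary t values.
rng = np.random.default_rng(5)

w = stats.chi2.rvs(df=2, size=2_000_000, random_state=rng)
for t in (-0.5, 0.0, 0.1, 0.2):
    mc = np.exp(t * w).mean()           # Monte Carlo estimate of E[exp(tW)]
    exact = (1 - 2 * t) ** (-1)
    print(t, round(mc, 3), round(exact, 3))
```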

The multivariate normal distribution and the chi-square distribution

Theorem 9.1. Let X be N(μ, Λ) with det Λ > 0. Then

    (X - μ)'Λ⁻¹(X - μ) is χ²(n),

where n is the dimension of X.

Proof. Set Y = Λ^(-1/2)(X - μ). Then Y is N(0, I), and it follows that

    (X - μ)'Λ⁻¹(X - μ) = Y'Y = Y_1² + Y_2² + ··· + Y_n².

Since Y_1, Y_2, ..., Y_n are i.i.d. N(0, 1), it is clear that Y'Y is χ²(n).
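
A simulation sketch of Theorem 9.1 with hypothetical parameters: the quadratic form (X - μ)'Λ⁻¹(X - μ) should match the χ²(n) distribution, here compared through a few quantiles.

```python
import numpy as np
from scipy import stats

# Sketch of Theorem 9.1: for X ~ N(mu, Lambda) the quadratic form
# (X - mu)' Lambda^{-1} (X - mu) is chi-square with n degrees of freedom.
# mu and Lambda are hypothetical 3-dimensional illustration values.
rng = np.random.default_rng(6)

mu = np.array([1.0, 0.0, -2.0])
Lambda = np.array([[2.0, 0.5, 0.0],
                   [0.5, 1.0, 0.4],
                   [0.0, 0.4, 1.5]])
n = len(mu)

X = rng.multivariate_normal(mu, Lambda, size=1_000_000)
diff = X - mu
quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Lambda), diff)

# Compare a few quantiles with those of chi-square(n)
probs = [0.5, 0.9, 0.99]
print(np.round(np.quantile(quad, probs), 2))
print(np.round(stats.chi2.ppf(probs, df=n), 2))
```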