Random Vectors and Multivariate Normal Distributions


Chapter 3: Random Vectors and Multivariate Normal Distributions

3.1 Random vectors

Definition 3.1.1. Random vector. Random vectors are vectors of random variables. For instance,
\[
X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix},
\]
where each element represents a random variable, is a random vector.

Definition 3.1.2. Mean and covariance matrix of a random vector. The mean (expectation) and covariance matrix of a random vector X are defined as follows:
\[
E[X] = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{pmatrix},
\]
and
\[
\mathrm{cov}(X) = E\left[\{X - E(X)\}\{X - E(X)\}^T\right]
= \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \dots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \dots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \dots & \sigma_n^2
\end{pmatrix}, \qquad (3.1.1)
\]
where \sigma_j^2 = var(X_j) and \sigma_{jk} = cov(X_j, X_k) for j, k = 1, 2, ..., n.

Properties of Mean and Covariance.

1. If X and Y are random vectors and A, B, C, and D are constant matrices of conformable dimensions, then
\[
E[AXB + CY + D] = A\,E[X]\,B + C\,E[Y] + D. \qquad (3.1.2)
\]
Proof. Left as an exercise.

2. For any random vector X, the covariance matrix cov(X) is symmetric.
Proof. Left as an exercise.

3. If X_j, j = 1, 2, ..., n, are independent random variables, then cov(X) = diag(\sigma_j^2, j = 1, 2, ..., n).
Proof. Left as an exercise.

4. cov(X + a) = cov(X) for a constant vector a.
Proof. Left as an exercise.

Properties of Mean and Covariance (cont.)

5. cov(AX) = A cov(X) A^T for a constant matrix A.
Proof. Left as an exercise.

6. cov(X) is positive semi-definite. Meaningful covariance matrices are positive definite.
Proof. Left as an exercise.

7. cov(X) = E[XX^T] - E[X]\{E[X]\}^T.
Proof. Left as an exercise.
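The following sketch (not part of the original notes) checks properties 4, 5, and 7 by Monte Carlo simulation; the matrix A, the shift a, the mean, the covariance, and the sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary 3-dimensional random vector X with dependent components.
n_samples = 200_000
Sigma_true = np.array([[2.0, 0.5, 0.3],
                       [0.5, 1.0, 0.2],
                       [0.3, 0.2, 1.5]])
mu_true = np.array([1.0, -2.0, 0.5])
X = rng.multivariate_normal(mu_true, Sigma_true, size=n_samples)

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])       # constant 2x3 matrix
a = np.array([3.0, -1.0, 4.0])         # constant shift vector

cov_X = np.cov(X, rowvar=False)

# Property 4: cov(X + a) = cov(X)  (a constant shift does not change covariance)
print(np.allclose(np.cov(X + a, rowvar=False), cov_X, atol=0.05))

# Property 5: cov(AX) = A cov(X) A^T
print(np.allclose(np.cov(X @ A.T, rowvar=False), A @ cov_X @ A.T, atol=0.05))

# Property 7: cov(X) = E[X X^T] - E[X] E[X]^T
EXXT = (X[:, :, None] * X[:, None, :]).mean(axis=0)
EX = X.mean(axis=0)
print(np.allclose(EXXT - np.outer(EX, EX), cov_X, atol=0.05))
```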

Definition 3.1.3. Correlation Matrix. The correlation matrix of a random vector X is defined as the matrix of pairwise correlations between the elements of X. Explicitly,
\[
\mathrm{corr}(X) = \begin{pmatrix}
1 & \rho_{12} & \dots & \rho_{1n} \\
\rho_{21} & 1 & \dots & \rho_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{n1} & \rho_{n2} & \dots & 1
\end{pmatrix}, \qquad (3.1.3)
\]
where \rho_{jk} = corr(X_j, X_k) = \sigma_{jk}/(\sigma_j \sigma_k), j, k = 1, 2, ..., n.

Example 3.1.1. If only successive random variables in the random vector X are correlated and have the same correlation \rho, then the correlation matrix corr(X) is given by
\[
\mathrm{corr}(X) = \begin{pmatrix}
1 & \rho & 0 & \dots & 0 \\
\rho & 1 & \rho & \dots & 0 \\
0 & \rho & 1 & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \dots & 1
\end{pmatrix}. \qquad (3.1.4)
\]

Example 3.1.2. If every pair of random variables in the random vector X has the same correlation \rho, then the correlation matrix corr(X) is given by
\[
\mathrm{corr}(X) = \begin{pmatrix}
1 & \rho & \rho & \dots & \rho \\
\rho & 1 & \rho & \dots & \rho \\
\rho & \rho & 1 & \dots & \rho \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho & \rho & \rho & \dots & 1
\end{pmatrix}, \qquad (3.1.5)
\]
and the random variables are said to be exchangeable.

3.2 Multivariate Normal Distribution

Definition 3.2.1. Multivariate Normal Distribution. A random vector X = (X_1, X_2, ..., X_n)^T is said to follow a multivariate normal distribution with mean \mu and covariance matrix \Sigma if X can be expressed as X = AZ + \mu, where \Sigma = AA^T and Z = (Z_1, Z_2, ..., Z_n)^T with Z_i, i = 1, 2, ..., n, iid N(0, 1) variables.
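A minimal numerical sketch of this constructive definition (not in the original notes, with an arbitrary mean and covariance): take A from a Cholesky factorization of \Sigma, draw iid N(0, 1) components Z, and set X = AZ + \mu.

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.4]])

A = np.linalg.cholesky(Sigma)          # Sigma = A A^T

# Generate draws of X = A Z + mu with Z ~ N(0, I_2).
Z = rng.standard_normal(size=(100_000, 2))
X = Z @ A.T + mu

# Empirical mean and covariance should be close to mu and Sigma.
print(X.mean(axis=0))                  # approx [ 1.0, -1.0]
print(np.cov(X, rowvar=False))         # approx Sigma
```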

[Figure: surface plot of a bivariate normal probability density with mean (0, 0)^T; axes x1, x2, and probability density.]

Definition 3.2.2. Multivariate Normal Distribution. A random vector X = (X_1, X_2, ..., X_n)^T is said to follow a multivariate normal distribution with mean \mu and a positive definite covariance matrix \Sigma if X has the density
\[
f_X(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right]. \qquad (3.2.1)
\]
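As a sanity check on (3.2.1), the sketch below (not from the original notes) evaluates the density formula directly and compares it with scipy.stats.multivariate_normal; the test point and parameters are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.4]])
x = np.array([0.5, -0.2])

n = len(mu)
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)          # (x - mu)^T Sigma^{-1} (x - mu)
density = np.exp(-0.5 * quad) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))

print(density)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # should match
```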

Properties

1. The moment generating function of a N(\mu, \Sigma) random vector X is given by
\[
M_X(t) = \exp\left\{ \mu^T t + \frac{1}{2} t^T \Sigma t \right\}. \qquad (3.2.2)
\]

2. E(X) = \mu and cov(X) = \Sigma.

3. If X_1, X_2, ..., X_n are i.i.d. N(0, 1) random variables, then their joint distribution can be characterized by X = (X_1, X_2, ..., X_n)^T ~ N(0, I_n).

4. X ~ N_n(\mu, \Sigma) if and only if all non-zero linear combinations of the components of X are normally distributed.

Linear transformation

5. If X ~ N_n(\mu, \Sigma) and A_{m \times n} is a constant matrix of rank m, then Y = AX ~ N_m(A\mu, A\Sigma A^T).
Proof. Use Definition 3.2.1 or property 1 above.

Orthogonal linear transformation

6. If X ~ N_n(\mu, I_n) and A_{n \times n} is an orthogonal matrix, then Y = AX ~ N_n(A\mu, I_n).

Marginal and Conditional distributions

Suppose X is N_n(\mu, \Sigma) and X is partitioned as
\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix},
\]
where X_1 is of dimension p \times 1 and X_2 is of dimension (n - p) \times 1. Suppose the corresponding partitions of \mu and \Sigma are given by
\[
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}
\quad \text{and} \quad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},
\]
respectively. Then,

7. Marginal distribution. X_1 is multivariate normal, N_p(\mu_1, \Sigma_{11}).
Proof. Use the result from property 5 above.

8. Conditional distribution. Provided \Sigma is positive definite, the distribution of X_1 | X_2 is p-variate normal, N_p(\mu_{1|2}, \Sigma_{1|2}), where
\[
\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (X_2 - \mu_2),
\qquad
\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.
\]
Proof. See Result 5.2.10, page 156 (Ravishanker and Dey).
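A small sketch of the formulas in property 8 (not in the original notes): for an arbitrary 3-dimensional example partitioned into a 1-dimensional first block and a 2-dimensional second block, compute \mu_{1|2} and \Sigma_{1|2} with numpy.

```python
import numpy as np

# Arbitrary illustrative mean and covariance for X = (X1, X2, X3)^T,
# partitioned into the first component and the last two components.
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

p = 1                                   # dimension of the first block
mu1, mu2 = mu[:p], mu[p:]
S11 = Sigma[:p, :p]
S12 = Sigma[:p, p:]
S21 = Sigma[p:, :p]
S22 = Sigma[p:, p:]

x2_obs = np.array([0.5, -0.5])          # an observed value of the second block

# Conditional mean and covariance of the first block given the second (property 8).
mu_1_given_2 = mu1 + S12 @ np.linalg.solve(S22, x2_obs - mu2)
Sigma_1_given_2 = S11 - S12 @ np.linalg.solve(S22, S21)

print(mu_1_given_2)       # conditional mean
print(Sigma_1_given_2)    # conditional covariance (does not depend on x2_obs)
```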

Uncorrelated implies independence for multivariate normal random variables

9. If X, \mu, and \Sigma are partitioned as above, then X_1 and X_2 are independent if and only if \Sigma_{12} = 0 = \Sigma_{21}^T.
Proof. We will use the m.g.f. to prove this result. Two random vectors X_1 and X_2 are independent iff M_{(X_1, X_2)}(t_1, t_2) = M_{X_1}(t_1) M_{X_2}(t_2).

3.3 Non-central distributions

We will start with the standard chi-square distribution.

Definition 3.3.1. Chi-square distribution. If X_1, X_2, ..., X_n are n independent N(0, 1) variables, then the distribution of \sum_{i=1}^n X_i^2 is \chi^2_n (chi-square with n degrees of freedom). The \chi^2_n-distribution is a special case of the gamma distribution with rate parameter (inverse scale) 1/2 and shape parameter n/2. That is, the density of \chi^2_n is given by
\[
f_{\chi^2_n}(x) = \frac{(1/2)^{n/2}}{\Gamma(n/2)} e^{-x/2} x^{n/2 - 1}, \quad x \ge 0; \; n = 1, 2, \dots. \qquad (3.3.1)
\]

Example 3.3.1. The distribution of (n - 1)S^2/\sigma^2, where S^2 = \sum_{i=1}^n (X_i - \bar{X})^2/(n - 1) is the sample variance of a random sample of size n from a normal distribution with mean \mu and variance \sigma^2, is \chi^2_{n-1}.

The moment generating function of a chi-square distribution with n d.f. is given by
\[
M_{\chi^2_n}(t) = (1 - 2t)^{-n/2}, \quad t < 1/2. \qquad (3.3.2)
\]
The m.g.f. (3.3.2) shows that the sum of two independent chi-square random variables is also a chi-square. Therefore, differences of sequential sums of squares of independent normal random variables will be distributed independently as chi-squares.
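A quick Monte Carlo illustration of Example 3.3.1 (not in the original notes): simulate normal samples of size n and compare the empirical quantiles of (n - 1)S^2/\sigma^2 with \chi^2_{n-1} quantiles; n, \mu, and \sigma are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

n, mu, sigma = 10, 5.0, 2.0
reps = 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
S2 = samples.var(axis=1, ddof=1)                  # sample variance of each row
stat = (n - 1) * S2 / sigma**2

# Compare a few empirical quantiles with chi-square(n-1) quantiles.
qs = [0.1, 0.5, 0.9]
print(np.quantile(stat, qs))
print(stats.chi2.ppf(qs, df=n - 1))               # should be close
```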

Theorem 3.3.2. If X ~ N_n(\mu, \Sigma) and \Sigma is positive definite, then
\[
(X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_n. \qquad (3.3.3)
\]
Proof. Since \Sigma is positive definite, there exists a non-singular A_{n \times n} such that \Sigma = AA^T (Cholesky decomposition). Then, by the definition of the multivariate normal distribution, X = AZ + \mu, where Z is a vector of n iid N(0, 1) variables. Now,
\[
(X - \mu)^T \Sigma^{-1} (X - \mu) = Z^T A^T (AA^T)^{-1} A Z = Z^T Z = \sum_{i=1}^n Z_i^2 \sim \chi^2_n.
\]
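A simulation sketch of Theorem 3.3.2 (not in the original notes), using an arbitrary \mu and \Sigma with n = 3; the quadratic form should follow a \chi^2_3 distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
n = len(mu)

X = rng.multivariate_normal(mu, Sigma, size=100_000)
diff = X - mu
# Quadratic form (X - mu)^T Sigma^{-1} (X - mu) for each draw.
Q = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)

qs = [0.25, 0.5, 0.75, 0.95]
print(np.quantile(Q, qs))
print(stats.chi2.ppf(qs, df=n))       # should be close to chi-square(3) quantiles
```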

[Figure 3.1: Non-central chi-square densities with df 5 and non-centrality parameter \lambda = 0, 2, 4, 6, 8, 10.]

Definition 3.3.2. Non-central chi-square distribution. Suppose the X_i's are as in Definition 3.3.1 except that each X_i has mean \mu_i, i = 1, 2, ..., n. Equivalently, let X = (X_1, ..., X_n)^T be a random vector distributed as N_n(\mu, I_n), where \mu = (\mu_1, ..., \mu_n)^T. Then the distribution of \sum_{i=1}^n X_i^2 = X^T X is referred to as non-central chi-square with d.f. n and non-centrality parameter \lambda = \sum_{i=1}^n \mu_i^2/2 = \frac{1}{2}\mu^T \mu. The density of such a non-central chi-square variable \chi^2_n(\lambda) can be written as an infinite Poisson mixture of central chi-square densities as follows:
\[
f_{\chi^2_n(\lambda)}(x) = \sum_{j=0}^{\infty} \frac{e^{-\lambda} \lambda^j}{j!} \, \frac{(1/2)^{(n+2j)/2}}{\Gamma((n+2j)/2)} \, e^{-x/2} x^{(n+2j)/2 - 1}. \qquad (3.3.4)
\]
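The Poisson-mixture representation (3.3.4) can be checked numerically. The sketch below (not in the original notes) sums the mixture and compares it with scipy.stats.ncx2; note that scipy appears to parameterize noncentrality as nc = \mu^T \mu, i.e. 2\lambda in this chapter's convention (scipy's mean is df + nc, while the text's is n + 2\lambda). The evaluation point and parameters are arbitrary.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def ncx2_pdf_mixture(x, n, lam, terms=200):
    """Density (3.3.4): Poisson(lam) mixture of central chi-square(n + 2j) densities."""
    j = np.arange(terms)
    log_w = -lam + j * np.log(lam) - gammaln(j + 1)           # log Poisson weights
    df = n + 2 * j
    log_f = ((df / 2) * np.log(0.5) - gammaln(df / 2)
             - x / 2 + (df / 2 - 1) * np.log(x))              # log central chi-square densities
    return np.sum(np.exp(log_w + log_f))

n, lam, x = 5, 3.0, 7.5
print(ncx2_pdf_mixture(x, n, lam))
# scipy's noncentrality is 2*lambda in this chapter's convention.
print(stats.ncx2.pdf(x, df=n, nc=2 * lam))                    # should match
```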

Properties

1. The moment generating function of a non-central chi-square variable \chi^2_n(\lambda) is given by
\[
M_{\chi^2_n(\lambda)}(t) = (1 - 2t)^{-n/2} \exp\left\{ \frac{2\lambda t}{1 - 2t} \right\}, \quad t < 1/2. \qquad (3.3.5)
\]

2. E[\chi^2_n(\lambda)] = n + 2\lambda.

3. Var[\chi^2_n(\lambda)] = 2(n + 4\lambda).

4. \chi^2_n(0) \equiv \chi^2_n.

5. For a given constant c,
(a) P(\chi^2_n(\lambda) > c) is an increasing function of \lambda.
(b) P(\chi^2_n(\lambda) > c) \ge P(\chi^2_n > c).

Theorem 3.3.3. If X ~ N_n(\mu, \Sigma) and \Sigma is positive definite, then
\[
X^T \Sigma^{-1} X \sim \chi^2_n(\lambda = \mu^T \Sigma^{-1} \mu / 2). \qquad (3.3.6)
\]
Proof. Since \Sigma is positive definite, there exists a non-singular matrix A_{n \times n} such that \Sigma = AA^T (Cholesky decomposition). Define Y = A^{-1}X. Then X = AY, and
\[
X^T \Sigma^{-1} X = Y^T A^T \Sigma^{-1} A Y = Y^T A^T \{AA^T\}^{-1} A Y = Y^T Y = \sum_{i=1}^n Y_i^2.
\]
But notice that Y is a linear transformation of X and hence is distributed as N_n(A^{-1}\mu, A^{-1}\Sigma\{A^{-1}\}^T = I_n). Therefore the components of Y are independently distributed. Specifically, Y_i ~ N(\tau_i, 1), where \tau = A^{-1}\mu. Thus, by the definition of the non-central chi-square, X^T \Sigma^{-1} X = \sum_{i=1}^n Y_i^2 \sim \chi^2_n(\lambda), where
\[
\lambda = \sum_{i=1}^n \tau_i^2 / 2 = \tau^T \tau / 2 = \mu^T \Sigma^{-1} \mu / 2.
\]

[Figure 3.2: Non-central F-densities with df 5 and 15 and non-centrality parameter \lambda = 0, 2, 4, 6, 8, 10.]

Definition 3.3.3. Non-central F-distribution. If U_1 ~ \chi^2_{n_1}(\lambda) and U_2 ~ \chi^2_{n_2}, and U_1 and U_2 are independent, then the distribution of
\[
F = \frac{U_1/n_1}{U_2/n_2} \qquad (3.3.7)
\]
is referred to as the non-central F-distribution with df n_1 and n_2, and non-centrality parameter \lambda.

[Figure 3.3: Non-central t-densities with df 5 and non-centrality parameter \lambda = 0, 2, 4, 6, 8, 10.]

Definition 3.3.4. Non-central t-distribution. If U_1 ~ N(\lambda, 1) and U_2 ~ \chi^2_n, and U_1 and U_2 are independent, then the distribution of
\[
T = \frac{U_1}{\sqrt{U_2/n}} \qquad (3.3.8)
\]
is referred to as the non-central t-distribution with df n and non-centrality parameter \lambda.
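Both definitions are constructive, so they can be illustrated by simulation. In the sketch below (not from the original notes) the non-central F and t variables are built from their definitions and compared with scipy.stats.ncf and scipy.stats.nct; note that scipy's noncentrality for ncf appears to be \mu^T\mu = 2\lambda in this chapter's convention, while for nct it is the mean of the normal numerator, i.e. \lambda itself. All parameter values are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
reps = 200_000
qs = [0.25, 0.5, 0.9]

# Non-central F (Definition 3.3.3): U1 ~ chi2_{n1}(lambda), U2 ~ chi2_{n2}, independent.
n1, n2, lam = 5, 15, 3.0
mus = np.zeros(n1)
mus[0] = np.sqrt(2 * lam)                       # any mean vector with mu^T mu = 2*lambda works
U1 = ((rng.standard_normal(size=(reps, n1)) + mus) ** 2).sum(axis=1)
U2 = rng.chisquare(n2, size=reps)
F = (U1 / n1) / (U2 / n2)

print(np.quantile(F, qs))
print(stats.ncf.ppf(qs, dfn=n1, dfd=n2, nc=2 * lam))   # should be close

# Non-central t (Definition 3.3.4): U1 ~ N(lambda, 1), U2 ~ chi2_n, independent.
n, delta = 5, 2.0
U1 = rng.normal(delta, 1.0, size=reps)
U2 = rng.chisquare(n, size=reps)
T = U1 / np.sqrt(U2 / n)

print(np.quantile(T, qs))
print(stats.nct.ppf(qs, df=n, nc=delta))               # scipy's nc here is the normal mean
```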

3.4 Distribution of quadratic forms

Caution: We assume that the matrix of a quadratic form is symmetric.

Lemma 3.4.1. If A_{n \times n} is symmetric and idempotent with rank r, then r of its eigenvalues are exactly equal to 1 and n - r are equal to zero.
Proof. Use the spectral decomposition theorem. (See Result 2.3.10 on page 51 of Ravishanker and Dey.)

Theorem 3.4.2. Let X ~ N_n(0, I_n). The quadratic form X^T AX ~ \chi^2_r iff A is idempotent with rank(A) = r.
Proof. Let A be a (symmetric) idempotent matrix of rank r. Then, by the spectral decomposition theorem, there exists an orthogonal matrix P such that
\[
P^T A P = \Lambda = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}. \qquad (3.4.1)
\]
Define
\[
Y = P^T X = \begin{pmatrix} P_1^T X \\ P_2^T X \end{pmatrix} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix},
\]
so that P_1^T P_1 = I_r. Thus X = PY and Y_1 ~ N_r(0, I_r). Now,
\[
X^T A X = (PY)^T A\, PY = Y^T \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Y = Y_1^T Y_1 \sim \chi^2_r. \qquad (3.4.2)
\]
Now suppose X^T AX ~ \chi^2_r. This means that the moment generating function of X^T AX is given by
\[
M_{X^T AX}(t) = (1 - 2t)^{-r/2}. \qquad (3.4.3)
\]
But one can calculate the m.g.f. of X^T AX directly using the multivariate normal density as
\[
\begin{aligned}
M_{X^T AX}(t) &= E\left[ \exp\{ (X^T AX)t \} \right] = \int \exp\{ (x^T Ax)t \} f_X(x)\, dx \\
&= \int \exp\{ (x^T Ax)t \} \frac{1}{(2\pi)^{n/2}} \exp\left[ -\frac{1}{2} x^T x \right] dx \\
&= \int \frac{1}{(2\pi)^{n/2}} \exp\left[ -\frac{1}{2} x^T (I_n - 2tA) x \right] dx \\
&= |I_n - 2tA|^{-1/2} = \prod_{i=1}^{n} (1 - 2t\lambda_i)^{-1/2}, \qquad (3.4.4)
\end{aligned}
\]
where \lambda_1, ..., \lambda_n are the eigenvalues of A. Equate (3.4.3) and (3.4.4) to obtain the desired result.
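To illustrate Theorem 3.4.2, the sketch below (not in the original notes) uses the centering matrix A = I_n - (1/n)11^T, which is symmetric and idempotent with rank n - 1, and compares the simulated quadratic form with \chi^2_{n-1}; the dimension and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

n = 6
A = np.eye(n) - np.ones((n, n)) / n        # centering matrix: symmetric, idempotent, rank n-1
print(np.allclose(A @ A, A), np.linalg.matrix_rank(A))   # True, 5

X = rng.standard_normal(size=(100_000, n))                # rows are draws of X ~ N_n(0, I_n)
Q = np.einsum('ij,jk,ik->i', X, A, X)                     # X^T A X for each draw

qs = [0.25, 0.5, 0.9]
print(np.quantile(Q, qs))
print(stats.chi2.ppf(qs, df=n - 1))                       # should be close
```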

Theorem 3.4.3. Let X ~ N_n(\mu, \Sigma), where \Sigma is positive definite. The quadratic form X^T AX ~ \chi^2_r(\lambda), where \lambda = \mu^T A\mu/2, iff A\Sigma is idempotent with rank(A\Sigma) = r.
Proof. Omitted.

Theorem 3.4.4. Independence of two quadratic forms. Let X ~ N_n(\mu, \Sigma), where \Sigma is positive definite. The two quadratic forms X^T AX and X^T BX are independent if and only if
\[
A\Sigma B = 0 = B\Sigma A. \qquad (3.4.5)
\]
Proof. Omitted.

Remark 3.4.1. Note that in the above theorem, the two quadratic forms need not have chi-square distributions. When they do, the theorem is referred to as Craig's theorem.

Theorem 3.4.5. Independence of linear and quadratic forms. Let X ~ N_n(\mu, \Sigma), where \Sigma is positive definite. The quadratic form X^T AX and the linear form BX are independently distributed if and only if
\[
B\Sigma A = 0. \qquad (3.4.6)
\]
Proof. Omitted.

Remark 3.4.2. Note that in the above theorem, the quadratic form need not have a chi-square distribution.

Example 3.4.6. Independence of sample mean and sample variance. Suppose X ~ N_n(0, I_n). Then \bar{X} = \sum_{i=1}^n X_i/n = \mathbf{1}^T X/n and S_X^2 = \sum_{i=1}^n (X_i - \bar{X})^2/(n - 1) are independently distributed.
Proof. (Hint: apply Theorem 3.4.5 with B = \mathbf{1}^T/n and A = I_n - \mathbf{1}\mathbf{1}^T/n, noting that BA = 0.)
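A quick empirical illustration of Example 3.4.6 (not a proof, and not in the original notes): simulate standard normal samples and check that the sample correlation between \bar{X} and S^2 is near zero.

```python
import numpy as np

rng = np.random.default_rng(6)

n, reps = 10, 200_000
X = rng.standard_normal(size=(reps, n))

xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)

# Under normality, xbar and s2 are independent, so their correlation should be ~0.
print(np.corrcoef(xbar, s2)[0, 1])
```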

Theorem 3.4.7. Let X ~ N_n(\mu, \Sigma). Then
\[
E[X^T AX] = \mu^T A\mu + \mathrm{trace}(A\Sigma). \qquad (3.4.7)
\]
Remark 3.4.3. Note that in the above theorem, the quadratic form need not have a chi-square distribution.
Proof. Write E[X^T AX] = E[\mathrm{trace}(A X X^T)] = \mathrm{trace}(A\, E[XX^T]) = \mathrm{trace}(A(\Sigma + \mu\mu^T)) = \mathrm{trace}(A\Sigma) + \mu^T A\mu.
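A numerical check of (3.4.7), not in the original notes, with an arbitrary \mu, \Sigma, and symmetric A:

```python
import numpy as np

rng = np.random.default_rng(7)

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
A = np.array([[1.0, 0.2, 0.0],
              [0.2, 2.0, -0.3],
              [0.0, -0.3, 0.5]])        # arbitrary symmetric matrix

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Q = np.einsum('ij,jk,ik->i', X, A, X)   # X^T A X for each draw

print(Q.mean())                          # Monte Carlo estimate of E[X^T A X]
print(mu @ A @ mu + np.trace(A @ Sigma)) # Theorem 3.4.7: should be close
```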

Theorem 3.4.8. Fisher-Cochran theorem. Suppose X ~ N_n(\mu, I_n). Let Q_j = X^T A_j X, j = 1, 2, ..., k, be k quadratic forms with rank(A_j) = r_j such that X^T X = \sum_{j=1}^k Q_j. Then the Q_j's are independently distributed as \chi^2_{r_j}(\lambda_j), where \lambda_j = \mu^T A_j \mu/2, if and only if \sum_{j=1}^k r_j = n.
Proof. Omitted.

Theorem 3.4.9. Generalization of the Fisher-Cochran theorem. Suppose X ~ N_n(\mu, \Sigma) with \Sigma positive definite. Let A_j, j = 1, 2, ..., k, be k n \times n symmetric matrices with rank(A_j) = r_j, and let A = \sum_{j=1}^k A_j with rank(A) = r. Then,
1. the X^T A_j X's are independently distributed as \chi^2_{r_j}(\lambda_j), where \lambda_j = \mu^T A_j \mu/2, and
2. X^T AX ~ \chi^2_r(\lambda), where \lambda = \sum_{j=1}^k \lambda_j,
if and only if any one of the following conditions is satisfied:
C1. A_j\Sigma is idempotent for all j, and A_j\Sigma A_l = 0 for all j < l.
C2. A_j\Sigma is idempotent for all j, and A\Sigma is idempotent.
C3. A_j\Sigma A_l = 0 for all j < l, and A\Sigma is idempotent.
C4. r = \sum_{j=1}^k r_j and A\Sigma is idempotent.
C5. The matrices A\Sigma and A_j\Sigma, j = 1, 2, ..., k - 1, are idempotent and A_k\Sigma is non-negative definite.
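As an illustration of the Fisher-Cochran theorem (not in the original notes), the sketch below decomposes X^T X into Q_1 = X^T(11^T/n)X and Q_2 = X^T(I_n - 11^T/n)X, whose ranks sum to n, and checks by simulation that Q_1 and Q_2 behave as independent (non-central) chi-squares; the common mean and dimension are arbitrary, and scipy's noncentrality is again 2\lambda in this chapter's convention.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)

n, reps = 6, 200_000
mu = np.full(n, 1.5)                       # common mean, arbitrary

# Decompose X^T X = Q1 + Q2 with A1 = (1/n) 11^T (rank 1) and A2 = I - A1 (rank n-1).
A1 = np.ones((n, n)) / n
A2 = np.eye(n) - A1                        # ranks add to n, so the theorem applies

X = mu + rng.standard_normal(size=(reps, n))
Q1 = np.einsum('ij,jk,ik->i', X, A1, X)
Q2 = np.einsum('ij,jk,ik->i', X, A2, X)

lam1 = mu @ A1 @ mu / 2                    # non-centrality of Q1; here mu @ A2 @ mu = 0
qs = [0.25, 0.5, 0.9]
print(np.quantile(Q1, qs))
print(stats.ncx2.ppf(qs, df=1, nc=2 * lam1))   # scipy's nc = 2*lambda
print(np.quantile(Q2, qs))
print(stats.chi2.ppf(qs, df=n - 1))
print(np.corrcoef(Q1, Q2)[0, 1])               # approximately 0 (consistent with independence)
```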