Lecture 14: Multivariate mgf's and chf's


Multivariate mgf and chf
For an n-dimensional random vector X, its mgf is defined as
    M_X(t) = E(e^{t'X}),  t ∈ R^n,
and its chf is defined as
    φ_X(t) = E(e^{ıt'X}),  t ∈ R^n.

Simple properties of mgf's and chf's
- M_X(t) is either finite or ∞, and M_X(0) = 1.
- The chf φ_X(t) is a continuous function of t ∈ R^n and φ_X(0) = 1.
- For any fixed k×n matrix A and vector b ∈ R^k, the mgf and chf of the random vector AX + b are, respectively,
    M_{AX+b}(t) = e^{b't} M_X(A't)  and  φ_{AX+b}(t) = e^{ıb't} φ_X(A't),  t ∈ R^k.
- In particular, if Y consists of the first m components of X, m < n, then the mgf and chf of Y are, respectively,
    M_Y(s) = M_X((s,0))  and  φ_Y(s) = φ_X((s,0)),  s ∈ R^m, 0 ∈ R^{n-m}.
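As a quick numerical illustration of these properties (an added sketch, not part of the lecture), the following Python code estimates the mgf and chf by Monte Carlo and checks the linear-transformation property M_{AX+b}(t) = e^{b't} M_X(A't). The choice of distribution, the matrix A, the vector b, and the evaluation point t are arbitrary assumptions made only for this example.

import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000

# X: 3-dimensional vector with independent Exponential(1) components
# (an arbitrary choice with an mgf that is finite near 0)
X = rng.exponential(scale=1.0, size=(n_samples, 3))

def mgf(samples, t):
    """Monte Carlo estimate of M(t) = E exp(t'X)."""
    return np.mean(np.exp(samples @ t))

def chf(samples, t):
    """Monte Carlo estimate of phi(t) = E exp(i t'X)."""
    return np.mean(np.exp(1j * (samples @ t)))

A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, -1.0]])   # fixed 2x3 matrix
b = np.array([0.3, -0.2])          # fixed vector in R^2
Y = X @ A.T + b                    # samples of AX + b

t = np.array([0.1, 0.2])           # evaluation point in R^2
print(mgf(Y, t), np.exp(b @ t) * mgf(X, A.T @ t))           # should nearly agree
print(chf(Y, t), np.exp(1j * (b @ t)) * chf(X, A.T @ t))    # should nearly agree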

Calculation of moments
If X = (X_1,...,X_n) and M_X(t) is finite in a neighborhood of 0, then E(X_1^{r_1} ⋯ X_n^{r_n}) is finite for any nonnegative integers r_1,...,r_n, and
    E(X_1^{r_1} ⋯ X_n^{r_n}) = ∂^{r_1+⋯+r_n} M_X(t) / (∂t_1^{r_1} ⋯ ∂t_n^{r_n}) |_{t=0}.
In particular,
    ∂M_X(t)/∂t |_{t=0} = E(X) = (E(X_1), ..., E(X_n))'
and
    ∂²M_X(t)/(∂t∂t') |_{t=0} = E(XX') =
        ( E(X_1²)     E(X_1X_2)   ...   E(X_1X_n) )
        ( E(X_2X_1)   E(X_2²)     ...   E(X_2X_n) )
        (    ...          ...     ...      ...    )
        ( E(X_nX_1)   E(X_nX_2)   ...   E(X_n²)   )
When the mgf is not finite in any neighborhood of 0, we can use the chf to calculate moments.
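A small symbolic check of this moment formula (an added illustration, not from the slides): for two independent Exponential(1) variables the joint mgf is M(t_1,t_2) = 1/[(1-t_1)(1-t_2)], and mixed partial derivatives at 0 should reproduce E(X_1^{r_1} X_2^{r_2}) = r_1! r_2!. The mgf and the moment values used below follow from that assumed example.

import sympy as sp

t1, t2 = sp.symbols('t1 t2')
M = 1 / ((1 - t1) * (1 - t2))      # mgf of two independent Exponential(1) variables

def moment(r1, r2):
    """E(X1^r1 X2^r2), obtained by differentiating the mgf and setting t = 0."""
    return sp.diff(M, t1, r1, t2, r2).subs({t1: 0, t2: 0})

print(moment(1, 1))   # E(X1 X2)     -> 1
print(moment(2, 1))   # E(X1^2 X2)   -> 2
print(moment(3, 2))   # E(X1^3 X2^2) -> 12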

Suppose that E|X_1^{r_1} ⋯ X_n^{r_n}| < ∞ for some nonnegative integers r_1,...,r_n with r = r_1 + ⋯ + r_n. Since
    | ∂^r e^{ıt'X} / (∂t_1^{r_1} ⋯ ∂t_n^{r_n}) | = | ı^r X_1^{r_1} ⋯ X_n^{r_n} e^{ıt'X} | ≤ |X_1^{r_1} ⋯ X_n^{r_n}|,
which is integrable, we can switch integration and differentiation to get
    ∂^r φ_X(t) / (∂t_1^{r_1} ⋯ ∂t_n^{r_n}) = ı^r E( X_1^{r_1} ⋯ X_n^{r_n} e^{ıt'X} )
and
    ∂^r φ_X(t) / (∂t_1^{r_1} ⋯ ∂t_n^{r_n}) |_{t=0} = ı^r E(X_1^{r_1} ⋯ X_n^{r_n}).
In particular,
    ∂φ_X(t)/∂t |_{t=0} = ı E(X),   ∂²φ_X(t)/(∂t∂t') |_{t=0} = -E(XX').

Theorem M1.
If the mgf M_X(t) of an n-dimensional random vector X is finite in a neighborhood of 0, then the chf of X is φ_X(t) = M_X(ıt), t ∈ R^n.

Proof (we need a proof because M_X(t) may not be finite for all t).
We first prove this for a univariate X. If M_X(t) is finite in a neighborhood (-δ,δ) of 0, 0 < δ < 1, then M_X(t) is differentiable of all orders in (-δ,δ), E|X|^r < ∞ for all r = 1,2,..., and
    M_X(t) = Σ_{k=0}^∞ E(X^k) t^k / k!,  t ∈ (-δ,δ).
Using the inequality
    | e^{ısx} [ e^{ıtx} - Σ_{k=0}^{n} (ıtx)^k / k! ] | ≤ |tx|^{n+1} / (n+1)!,  t, s ∈ R,
we obtain that
    | φ_X(s+t) - Σ_{k=0}^{n} ı^k t^k E(X^k e^{ısX}) / k! | ≤ |t|^{n+1} E|X|^{n+1} / (n+1)!,  |t| < δ, s ∈ R.
Since Σ_k E|X|^k |t|^k / k! ≤ M_X(|t|) + M_X(-|t|) < ∞ for |t| < δ, the right-hand side tends to 0 as n → ∞, which implies that
    φ_X(s+t) = Σ_{k=0}^∞ φ_X^{(k)}(s) t^k / k!,  |t| < δ, s ∈ R,
where g^{(k)}(s) = d^k g(s)/ds^k and we note that φ_X^{(k)}(s) = ı^k E(X^k e^{ısX}).

Setting s = 0, we obtain that
    φ_X(t) = Σ_{k=0}^∞ φ_X^{(k)}(0) t^k / k! = Σ_{k=0}^∞ ı^k E(X^k) t^k / k! = Σ_{k=0}^∞ E(X^k)(ıt)^k / k! = M_X(ıt),  |t| < δ.
For |t| < δ and |s| < δ,
    φ_X(s+t) = Σ_{k=0}^∞ φ_X^{(k)}(s) t^k / k!
             = Σ_{k=0}^∞ (t^k/k!) Σ_{j=k}^∞ ı^j E(X^j) s^{j-k} / (j-k)!
             = Σ_{j=0}^∞ [ı^j E(X^j)/j!] Σ_{k=0}^{j} [j!/(k!(j-k)!)] t^k s^{j-k}
             = Σ_{j=0}^∞ ı^j E(X^j) (s+t)^j / j!.
This proves that φ_X(s) = M_X(ıs) for |s| < 2δ. Continuing this process, we can show that φ_X(s) = M_X(ıs) for |s| < 3δ, |s| < 4δ, ..., and hence φ_X(t) = M_X(ıt) for all t ∈ R.
Consider now an n-dimensional X. From the result for the univariate case, we have
    φ_X(t) = E(e^{ıt'X}) = φ_{t'X}(1) = M_{t'X}(ı) = M_X(ıt),  t ∈ R^n.
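A quick numerical illustration of Theorem M1 (an added sketch, not from the slides): for a Gamma(a) variable with unit scale the mgf is M(t) = (1-t)^{-a} for t < 1, and plugging the purely imaginary argument ıt into the same formula should give the chf. The shape parameter, sample size, and evaluation points below are arbitrary choices.

import numpy as np

rng = np.random.default_rng(1)
a = 2.5
x = rng.gamma(shape=a, scale=1.0, size=500_000)

for t in (0.3, 1.0, 2.0):
    chf_mc = np.mean(np.exp(1j * t * x))       # Monte Carlo estimate of E exp(i t X)
    chf_from_mgf = (1 - 1j * t) ** (-a)        # M(i t) with M(s) = (1 - s)^(-a)
    print(t, chf_mc, chf_from_mgf)             # the two should nearly agree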

Theorem M2 (uniqueness).
(i) Random vectors X and Y have the same distribution iff their chf's satisfy φ_X(t) = φ_Y(t) for all t ∈ R^n.
(ii) If the mgf's of random vectors X and Y satisfy M_X(t) = M_Y(t) < ∞ for t in a neighborhood of 0, then X and Y have the same distribution.
Proof.
It is clear that the distribution of X determines its mgf and chf. Part (i) follows from the following multivariate inversion formula: for any A = (a_1,b_1) × ⋯ × (a_n,b_n) such that the distribution of X is continuous at all points on the boundary of A,
    P(X ∈ A) = lim_{c→∞} ∫_{-c}^{c} ⋯ ∫_{-c}^{c} [φ_X(t_1,...,t_n) / (ı^n (2π)^n)] ∏_{i=1}^{n} (e^{-ıt_i a_i} - e^{-ıt_i b_i}) / t_i  dt_1 ⋯ dt_n.
To establish (ii), we use the relationship between mgf and chf shown in Theorem M1. If the mgf's M_X(t) = M_Y(t) < ∞ for t in a neighborhood of 0, then
    φ_X(t) = M_X(ıt) = M_Y(ıt) = φ_Y(t),  t ∈ R^n.
Then by (i), X and Y have the same distribution.
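The following is a one-dimensional numerical sketch of the inversion formula (added here as an illustration; the slide states the multivariate version). For X ∼ N(0,1) with chf φ(t) = e^{-t²/2} and (a,b) = (-1,1), the integrand φ(t)(e^{-ıta} - e^{-ıtb})/(2πıt) reduces to sin(t)/(πt) · e^{-t²/2}, and integrating it should recover P(-1 < X < 1) ≈ 0.6827. The cutoff c = 50 is an arbitrary stand-in for the limit c → ∞.

import numpy as np
from scipy import integrate
from scipy.stats import norm

def integrand(t):
    # sin(t)/(pi*t) * exp(-t^2/2); np.sinc handles the removable singularity at t = 0
    return np.sinc(t / np.pi) / np.pi * np.exp(-t**2 / 2)

approx, _ = integrate.quad(integrand, -50, 50)    # finite c approximating c -> infinity
print(approx)
print(norm.cdf(1) - norm.cdf(-1))                 # exact probability, about 0.6827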

As in the univariate case, convergence of chf's is equivalent to convergence of cdf's, and convergence of mgf's implies convergence of cdf's. We state the following result without proof.
Theorem M3.
Suppose that X_1, X_2, ... is a sequence of k-dimensional random vectors with mgf's M_{X_n}(t) and chf's φ_{X_n}(t), t ∈ R^k.
(i) If lim_{n→∞} M_{X_n}(t) = M_X(t) < ∞ for all t in a neighborhood of 0, where M_X(t) is the mgf of a k-dimensional random vector X, then lim_{n→∞} F_{X_n}(x) = F_X(x) for all x where F_X(x) is continuous.
(ii) A necessary and sufficient condition for the convergence of the F_{X_n}(x)'s in (i) is that lim_{n→∞} φ_{X_n}(t) = φ_X(t) for all t ∈ R^k, where φ_X(t) is the chf of a k-dimensional random vector X.
(iii) In (ii), if lim_{n→∞} φ_{X_n}(t) = g(t) for all t ∈ R^k and g(t) is a continuous function on R^k, then g(t) must be the chf of some k-dimensional random vector X and the result in (ii) holds.
Note that Theorem M3(ii) gives a necessary and sufficient condition, while the converse of Theorem M3(i) is not true in general.
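As an illustration of part (ii) (an added sketch, not from the slides): for X_1, X_2, ... iid Exponential(1), the chf of the standardized sum Z_n = (X_1 + ⋯ + X_n - n)/√n is e^{-ı√n t}(1 - ıt/√n)^{-n}, which converges to e^{-t²/2}, the chf of N(0,1), for each fixed t. The evaluation point t = 1.2 and the values of n are arbitrary choices.

import numpy as np

def chf_Zn(t, n):
    """chf of the standardized sum of n iid Exponential(1) variables."""
    return np.exp(-1j * np.sqrt(n) * t) * (1 - 1j * t / np.sqrt(n)) ** (-n)

t = 1.2
for n in (1, 10, 100, 10_000):
    print(n, chf_Zn(t, n))
print("limit", np.exp(-t**2 / 2))     # chf of N(0,1) at t = 1.2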

The next result concerns the relationship between independence and chf's and mgf's.
Theorem M4.
Let X and Y be two random vectors with dimensions n and m, respectively.
(i) X and Y are independent iff their joint and marginal chf's satisfy φ_{(X,Y)}(t) = φ_X(t_1) φ_Y(t_2) for all t = (t_1,t_2), t_1 ∈ R^n, t_2 ∈ R^m.
(ii) If the mgf of (X,Y) is finite in a neighborhood of 0 ∈ R^{n+m}, then X and Y are independent iff their joint and marginal mgf's satisfy M_{(X,Y)}(t) = M_X(t_1) M_Y(t_2) for all t = (t_1,t_2) in the neighborhood of 0.
Proof.
Because of Theorem M1, (ii) follows from (i). The only if part of (i) follows from Theorem 4.2.12A. It remains to show the if part of (i).
Let F(x,y) be the cdf of (X,Y), F_X(x) be the cdf of X, and F_Y(y) be the cdf of Y, x ∈ R^n, y ∈ R^m.
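A Monte Carlo sketch of part (i) (an added illustration, not from the slides): we estimate the joint chf of a bivariate sample at one point (t_1,t_2) and compare it with the product of the marginal chf's, once for independent components and once for correlated normal components, where the factorization should visibly fail. The sample sizes, correlation, and evaluation point are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(2)
N = 400_000
t1, t2 = 0.7, -0.4

def joint_vs_product(xy):
    x, y = xy[:, 0], xy[:, 1]
    joint = np.mean(np.exp(1j * (t1 * x + t2 * y)))                  # joint chf estimate
    product = np.mean(np.exp(1j * t1 * x)) * np.mean(np.exp(1j * t2 * y))
    return joint, product

indep = rng.normal(size=(N, 2))                                      # independent N(0,1) components
corr = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], N)      # correlation 0.8

print("independent:", joint_vs_product(indep))   # joint close to product
print("correlated: ", joint_vs_product(corr))    # joint clearly differs from product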

Define F̃(x,y) = F_X(x) F_Y(y), x ∈ R^n, y ∈ R^m. It can be easily shown that F̃ is a cdf on R^{n+m}. There exist random vectors X̃ (n-dimensional) and Ỹ (m-dimensional) such that the cdf of (X̃,Ỹ) is F̃(x,y). From the form of F̃, we know that X̃ and Ỹ are independent, X̃ ∼ F_X with chf φ_X, and Ỹ ∼ F_Y with chf φ_Y. Hence the chf of F̃ must be φ_X(t_1) φ_Y(t_2), t_1 ∈ R^n, t_2 ∈ R^m.
If φ_{(X,Y)}(t) = φ_X(t_1) φ_Y(t_2) for all t = (t_1,t_2), t_1 ∈ R^n, t_2 ∈ R^m, then, by uniqueness, F(x,y) = F̃(x,y) = F_X(x) F_Y(y) for all x ∈ R^n, y ∈ R^m. This proves that X and Y are independent.

As an example, we now consider the mgf's in a family of multivariate distributions that is an extension of the univariate normal distribution family.

n-dimensional multivariate normal distribution
Let µ ∈ R^n, Σ be a positive definite n×n matrix, and |Σ| be the determinant of Σ.

The following pdf on R^n,
    f(x) = (2π)^{-n/2} |Σ|^{-1/2} exp( -(x-µ)'Σ^{-1}(x-µ)/2 ),  x ∈ R^n,
is called the n-dimensional normal pdf, and the corresponding distribution is called the n-dimensional normal distribution, denoted by N(µ,Σ).
Using transformation, we can show that f(x) is indeed a pdf on R^n.
If X ∼ N(0,Σ), then, for any t ∈ R^n,
    M_X(t) = (2π)^{-n/2} |Σ|^{-1/2} ∫_{R^n} e^{t'x} exp( -x'Σ^{-1}x/2 ) dx
           = e^{t'Σt/2} (2π)^{-n/2} |Σ|^{-1/2} ∫_{R^n} exp( -(x-Σt)'Σ^{-1}(x-Σt)/2 ) dx
           = e^{t'Σt/2}.
If X ∼ N(µ,Σ), an application of the property of the mgf and the previous result leads to
    M_X(t) = e^{µ't + t'Σt/2},  t ∈ R^n.
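A quick Monte Carlo check of the closed form M_X(t) = e^{µ't + t'Σt/2} (an added sketch, not from the slides); the particular µ, Σ, evaluation point t, and sample size are arbitrary choices made for the illustration.

import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=500_000)

t = np.array([0.3, 0.2])
mc = np.mean(np.exp(X @ t))                     # Monte Carlo estimate of E exp(t'X)
closed = np.exp(mu @ t + t @ Sigma @ t / 2)     # mgf formula from the slide
print(mc, closed)                               # should nearly agree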

From the property of the mgf, we can differentiate M_X(t) to get
    E(X) = ∂M_X(t)/∂t |_{t=0} = (µ + Σt) e^{µ't+t'Σt/2} |_{t=0} = µ,
    E(XX') = ∂²M_X(t)/(∂t∂t') |_{t=0} = ∂/∂t' [ (µ + Σt) e^{µ't+t'Σt/2} ] |_{t=0}
           = [ Σ e^{µ't+t'Σt/2} + (µ + Σt)(µ + Σt)' e^{µ't+t'Σt/2} ] |_{t=0} = Σ + µµ'.
Therefore,
    Var(X) = E(XX') - E(X)E(X') = Σ.
As in the univariate case, µ and Σ in N(µ,Σ) are, respectively, the mean vector and covariance matrix of the distribution.
Another important consequence of the previous result is that the components of X ∼ N(µ,Σ) are independent iff Σ is a diagonal matrix, i.e., iff the components of X are uncorrelated.
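As a symbolic cross-check of this differentiation (an added sketch, not from the slides), the following sympy code carries out the n = 2 case with generic symbols and confirms E(X) = µ and E(XX') = Σ + µµ'.

import sympy as sp

t1, t2, m1, m2, s11, s12, s22 = sp.symbols('t1 t2 m1 m2 s11 s12 s22')
t = sp.Matrix([t1, t2])
mu = sp.Matrix([m1, m2])
Sigma = sp.Matrix([[s11, s12], [s12, s22]])

M = sp.exp((mu.T * t)[0] + (t.T * Sigma * t)[0] / 2)     # mgf of N(mu, Sigma), n = 2

grad = sp.Matrix([sp.diff(M, v) for v in (t1, t2)]).subs({t1: 0, t2: 0})
hess = sp.hessian(M, (t1, t2)).subs({t1: 0, t2: 0})

print(grad.applyfunc(sp.simplify))                           # -> Matrix([m1, m2])
print((hess - (Sigma + mu * mu.T)).applyfunc(sp.simplify))   # -> zero matrix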

We now study properties of the multivariate normal distribution. The bivariate normal distribution is the special case of the multivariate normal distribution with n = 2.

Bivariate normal distributions
First, we want to show that the definition of the general multivariate normal distribution with n = 2 is consistent with the earlier definition of the bivariate normal distribution.
Let µ_1 ∈ R, µ_2 ∈ R, σ_1 > 0, σ_2 > 0, and -1 < ρ < 1 be constants, and
    µ = ( µ_1 )     Σ = ( σ_1²      ρσ_1σ_2 )     x = ( x_1 )
        ( µ_2 )         ( ρσ_1σ_2   σ_2²    )         ( x_2 )
Then
    |Σ| = σ_1²σ_2² - ρ²σ_1²σ_2² = σ_1²σ_2²(1-ρ²),
    Σ^{-1} = [1/(σ_1²σ_2²(1-ρ²))] ( σ_2²       -ρσ_1σ_2 )
                                  ( -ρσ_1σ_2   σ_1²     )
    (x-µ)'Σ^{-1}(x-µ) = (1/|Σ|) (x_1-µ_1, x_2-µ_2) ( σ_2²       -ρσ_1σ_2 ) ( x_1-µ_1 )
                                                   ( -ρσ_1σ_2   σ_1²     ) ( x_2-µ_2 )

    = (1/|Σ|) (x_1-µ_1, x_2-µ_2) ( σ_2²(x_1-µ_1) - ρσ_1σ_2(x_2-µ_2)  )
                                 ( -ρσ_1σ_2(x_1-µ_1) + σ_1²(x_2-µ_2) )
    = [ σ_2²(x_1-µ_1)² - 2ρσ_1σ_2(x_1-µ_1)(x_2-µ_2) + σ_1²(x_2-µ_2)² ] / [ σ_1²σ_2²(1-ρ²) ]
    = (x_1-µ_1)²/[σ_1²(1-ρ²)] - 2ρ(x_1-µ_1)(x_2-µ_2)/[σ_1σ_2(1-ρ²)] + (x_2-µ_2)²/[σ_2²(1-ρ²)].
Thus, when n = 2,
    f(x) = (2π)^{-n/2} |Σ|^{-1/2} exp( -(x-µ)'Σ^{-1}(x-µ)/2 )
         = [1/(2πσ_1σ_2√(1-ρ²))] exp( -(x_1-µ_1)²/[2σ_1²(1-ρ²)] + ρ(x_1-µ_1)(x_2-µ_2)/[σ_1σ_2(1-ρ²)] - (x_2-µ_2)²/[2σ_2²(1-ρ²)] ),
which is the same as the bivariate normal pdf defined earlier.

Transformation and marginals
If X is an n-dimensional random vector ∼ N(µ,Σ) and Y = AX + b, where A is a fixed k×n matrix of rank k ≤ n and b is a fixed k-dimensional vector, then Y ∼ N(Aµ + b, AΣA').
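Here is a numerical spot check (an added illustration, not from the slides) that the n = 2 case of the general normal pdf agrees with the bivariate normal formula derived just above; the parameter values and evaluation points are arbitrary.

import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2, s1, s2, rho = 0.5, -1.0, 1.5, 0.8, 0.6
mu = np.array([mu1, mu2])
Sigma = np.array([[s1**2, rho*s1*s2],
                  [rho*s1*s2, s2**2]])

def bivariate_pdf(x1, x2):
    """Bivariate normal pdf written directly in terms of (mu1, mu2, s1, s2, rho)."""
    z = ((x1-mu1)**2/s1**2 - 2*rho*(x1-mu1)*(x2-mu2)/(s1*s2) + (x2-mu2)**2/s2**2)
    return np.exp(-z / (2*(1-rho**2))) / (2*np.pi*s1*s2*np.sqrt(1-rho**2))

for x in ([0.0, 0.0], [1.2, -0.3], [-2.0, 1.0]):
    print(bivariate_pdf(*x), multivariate_normal(mu, Sigma).pdf(x))   # should agree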

This can be proved by using the multivariate transformation, or the mgf. We showed that the mgf of X ∼ N(µ,Σ) is
    M_X(t) = e^{µ't + t'Σt/2},  t ∈ R^n.
From the properties of the mgf, Y = AX + b has mgf
    M_Y(t) = e^{b't} M_X(A't) = e^{b't} e^{µ'(A't) + (A't)'Σ(A't)/2} = e^{(Aµ+b)'t + t'(AΣA')t/2},
which is exactly the mgf of N(Aµ + b, AΣA'), noting that (AΣA')^{-1} exists since A has rank k ≤ n and Σ has rank n. The result still seems to hold even if AΣA' is singular, but we have not defined the normal distribution with a singular covariance matrix.
If we take A = (I_k 0), where I_k is the identity matrix of order k and 0 is the k×(n-k) matrix of all 0's, then AX is exactly the vector containing the first k components of X and, therefore, we have shown the following:
    If Y is multivariate normal, then any sub-vector of Y is also normally distributed.
Is the converse true? That is, if both X and Y are normal, must the joint distribution of (X,Y) always be normal?
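A simulation sketch of the transformation result (an added illustration, not from the slides): samples of Y = AX + b with X ∼ N(µ,Σ) should have sample mean close to Aµ + b and sample covariance close to AΣA'. The particular µ, Σ, A, b, and sample size are arbitrary choices.

import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])      # 2x3 matrix of rank 2
b = np.array([2.0, -3.0])

X = rng.multivariate_normal(mu, Sigma, size=300_000)
Y = X @ A.T + b                       # samples of AX + b

print(Y.mean(axis=0), A @ mu + b)                   # sample mean vs A mu + b
print(np.cov(Y, rowvar=False), A @ Sigma @ A.T)     # sample covariance vs A Sigma A'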

A counter-example
Consider
    f(x,y) = (1/2π) e^{-(x²+y²)/2} [1 + g(x,y)],  (x,y) ∈ R²,
where
    g(x,y) = xy  if -1 < x < 1, -1 < y < 1;   g(x,y) = 0  otherwise.
Then f(x,y) ≥ 0. (Is it a pdf?)
    ∫ f(x,y) dy = (1/2π) [ ∫ e^{-(x²+y²)/2} dy + ∫ e^{-(x²+y²)/2} g(x,y) dy ]
                = (1/2π) e^{-x²/2} [ ∫ e^{-y²/2} dy + x I(|x| < 1) ∫_{-1}^{1} y e^{-y²/2} dy ]
                = (1/√(2π)) e^{-x²/2},
since the last integral vanishes by symmetry; i.e., the marginal of X is N(0,1) (which also shows that f(x,y) is a pdf). Similarly, the marginal pdf of Y is N(0,1). However, this f(x,y) is certainly not a bivariate normal pdf.
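A numerical check of the counter-example (an added sketch, not from the slides): integrating f(x,y) over y recovers the N(0,1) density for several values of x, even though f is not a bivariate normal pdf; inside the square (-1,1)×(-1,1) it differs from the product of its N(0,1) marginals. The evaluation points and integration range are arbitrary choices.

import numpy as np
from scipy import integrate
from scipy.stats import norm

def g(x, y):
    return x * y if (-1 < x < 1 and -1 < y < 1) else 0.0

def f(x, y):
    return np.exp(-(x**2 + y**2) / 2) / (2 * np.pi) * (1 + g(x, y))

for x in (0.0, 0.5, 2.0):
    # integrate over y; the points argument flags the jumps of g at y = -1 and y = 1
    marginal, _ = integrate.quad(lambda y: f(x, y), -10, 10, points=[-1, 1])
    print(x, marginal, norm.pdf(x))                 # marginal of X matches N(0,1)

print(f(0.5, 0.5), norm.pdf(0.5) * norm.pdf(0.5))   # joint differs from product of marginals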

Other joint distributions may not have the property that the marginal distributions are of the same type as the joint distribution.

Uniform distributions
A general n-dimensional uniform distribution on a set A ⊂ R^n has pdf
    f(x) = 1/C  if x ∈ A;   f(x) = 0  if x ∉ A,   where C = ∫_A dx.
For n = 1 and A an interval, C is the length of the interval. For n = 2, C is the area of the set A.
Consider, for example, n = 2 and A the rectangle
    A = [a,b] × [c,d] = {(x,y) ∈ R² : a ≤ x ≤ b, c ≤ y ≤ d},
where a, b, c and d are constants. In this case, the two marginal distributions are uniform distributions on the intervals [a,b] and [c,d], since, when x ∈ [a,b],
    ∫ f(x,y) dy = (1/C) ∫_c^d dy = (d-c)/[(b-a)(d-c)] = 1/(b-a),
and ∫ f(x,y) dy = 0 when x ∉ [a,b].

However, if A is the disk
    A = {(x,y) ∈ R² : x² + y² ≤ 1},
then C = π and, when x ∈ [-1,1],
    ∫ f(x,y) dy = ∫_{-√(1-x²)}^{√(1-x²)} (1/π) dy = 2√(1-x²)/π,
and ∫ f(x,y) dy = 0 when x ∉ [-1,1]. This shows that the marginal distributions are not uniform.
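A simulation sketch of this last point (an added illustration, not from the slides): sampling uniformly on the unit disk by rejection from the square [-1,1]² and checking that the first coordinate follows the density 2√(1-x²)/π rather than a uniform density. The interval (0.8, 1.0) and the sample size are arbitrary choices.

import numpy as np
from scipy import integrate

rng = np.random.default_rng(5)
pts = rng.uniform(-1, 1, size=(2_000_000, 2))
disk = pts[pts[:, 0]**2 + pts[:, 1]**2 <= 1]       # keep only points inside the disk

a, b = 0.8, 1.0
empirical = np.mean((disk[:, 0] > a) & (disk[:, 0] < b))
exact, _ = integrate.quad(lambda x: 2 * np.sqrt(1 - x**2) / np.pi, a, b)
uniform_guess = (b - a) / 2                        # what a uniform marginal on [-1,1] would give
print(empirical, exact, uniform_guess)             # empirical matches exact, not uniform_guess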