Chapter 4. Multivariate Distributions

4.1 Bivariate Distributions

For a pair of r.v.s $(X, Y)$, the joint CDF is defined as
$$F_{X,Y}(x, y) = P(X \le x,\ Y \le y).$$
Obviously, the marginal distributions may be obtained easily from the joint distribution:
$$F_X(x) = P(X \le x) = P(X \le x,\ Y < \infty) = F_{X,Y}(x, \infty), \qquad F_Y(y) = F_{X,Y}(\infty, y).$$
Covariance and correlation of $X$ and $Y$:
$$\mathrm{Cov}(X, Y) = E\{(X - EX)(Y - EY)\} = E(XY) - (EX)(EY), \qquad \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$

Discrete bivariate distributions

If $X$ takes discrete values $x_1, \dots, x_m$ and $Y$ takes discrete values $y_1, \dots, y_n$, their joint probability function may be presented in a table:

  X \ Y |  y_1    y_2    ...   y_n   |
  ------+----------------------------+--------
   x_1  |  p_11   p_12   ...   p_1n  |  p_{1.}
   x_2  |  p_21   p_22   ...   p_2n  |  p_{2.}
   ...  |  ...    ...          ...   |  ...
   x_m  |  p_m1   p_m2   ...   p_mn  |  p_{m.}
  ------+----------------------------+--------
        |  p_{.1} p_{.2} ...   p_{.n}|

where $p_{ij} = P(X = x_i,\ Y = y_j)$, and the marginals are
$$p_{i\cdot} = P(X = x_i) = \sum_{j=1}^{n} P(X = x_i,\ Y = y_j) = \sum_{j=1}^{n} p_{ij}, \qquad p_{\cdot j} = P(Y = y_j) = \sum_{i=1}^{m} P(X = x_i,\ Y = y_j) = \sum_{i=1}^{m} p_{ij}.$$
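In R such a table is naturally stored as a matrix, and the marginals are its row and column sums; a minimal sketch with illustrative numbers (not taken from the notes):

p <- matrix(c(0.10, 0.20, 0.05,
              0.30, 0.25, 0.10), byrow=T, nrow=2)   # joint probabilities p_ij
rownames(p) <- c("x1", "x2"); colnames(p) <- c("y1", "y2", "y3")
p.X <- rowSums(p)     # marginal probability function of X (row sums p_{i.})
p.Y <- colSums(p)     # marginal probability function of Y (column sums p_{.j})
sum(p)                # must be 1 for a valid joint probability function
outer(p.X, p.Y)       # equals p only when X and Y are independent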

In general, $p_{ij} \ne p_{i\cdot}\, p_{\cdot j}$. However, if $p_{ij} = p_{i\cdot}\, p_{\cdot j}$ for all $i$ and $j$, $X$ and $Y$ are independent, i.e.
$$P(X = x_i,\ Y = y_j) = P(X = x_i)\, P(Y = y_j) \quad \text{for all } i, j.$$
For independent $X$ and $Y$, $\mathrm{Cov}(X, Y) = 0$.

Example 1. Flip a fair coin two times. Let $X = 1$ if H occurs in the first flip, and $0$ if T occurs in the first flip. Let $Y = 1$ if the outcomes in the two flips are the same, and $0$ if the two outcomes are different. The joint probability function is

  X \ Y |   1     0   |
  ------+-------------+------
    1   |  1/4   1/4  |  1/2
    0   |  1/4   1/4  |  1/2
  ------+-------------+------
        |  1/2   1/2  |

It is easy to see that $X$ and $Y$ are independent, which is somewhat counter-intuitive.
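A quick simulation check of Example 1 (a sketch, not part of the notes); all four joint relative frequencies should be close to 1/4:

set.seed(1)
n <- 100000
flip1 <- rbinom(n, 1, 0.5)         # 1 = H on the first flip
flip2 <- rbinom(n, 1, 0.5)         # 1 = H on the second flip
X <- flip1                         # X = 1 if H occurs in the first flip
Y <- as.integer(flip1 == flip2)    # Y = 1 if the two outcomes are the same
table(X, Y) / n                    # each cell should be close to 1/4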

Continuous bivariate distributions

If the CDF $F_{X,Y}$ can be written as
$$F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\, du\, dv$$
for any $x$ and $y$, where $f_{X,Y} \ge 0$, then $(X, Y)$ has a continuous joint distribution, and $f_{X,Y}(x, y)$ is the joint PDF. As $F_{X,Y}(\infty, \infty) = 1$, it holds that
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(u, v)\, du\, dv = 1.$$
In fact any non-negative function satisfying this condition is a PDF. Furthermore, for any subset $A$ of $\mathbb{R}^2$,
$$P\{(X, Y) \in A\} = \int\!\!\int_{A} f_{X,Y}(x, y)\, dx\, dy.$$

Also
$$\mathrm{Cov}(X, Y) = \int\!\!\int (x - EX)(y - EY)\, f_{X,Y}(x, y)\, dx\, dy = \int\!\!\int xy\, f_{X,Y}(x, y)\, dx\, dy - EX \cdot EY.$$
Note that
$$F_X(x) = F_{X,Y}(x, \infty) = \int_{-\infty}^{\infty} \int_{-\infty}^{x} f_{X,Y}(u, v)\, du\, dv = \int_{-\infty}^{x} \Big\{ \int_{-\infty}^{\infty} f_{X,Y}(u, v)\, dv \Big\}\, du,$$
hence the marginal PDF of $X$ can be derived from the joint PDF as follows:
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy.$$
Similarly, $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx$.
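As a quick numerical illustration (a sketch with an illustrative joint PDF, not one from the notes), the one-dimensional integral defining a marginal can be computed with R's integrate(); here $f(x, y) = x + y$ on the unit square, whose marginal is $f_X(x) = x + 1/2$:

f.xy <- function(x, y) (x + y) * (x >= 0 & x <= 1) * (y >= 0 & y <= 1)
f.X  <- function(x) integrate(function(y) f.xy(x, y), 0, 1)$value   # integrate out y
f.X(0.3)            # numerical value, should be close to 0.3 + 0.5 = 0.8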

Note. Different from the discrete case, it is not always easy to work out marginal PDFs from joint PDFs, especially when the PDFs are discontinuous.

When $f_{X,Y}(x, y) = f_X(x) f_Y(y)$ for any $x$ and $y$, $X$ and $Y$ are independent, as then
$$P(X \le x,\ Y \le y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\, du\, dv = \int_{-\infty}^{y} \int_{-\infty}^{x} f_X(u) f_Y(v)\, du\, dv = \int_{-\infty}^{x} f_X(u)\, du \int_{-\infty}^{y} f_Y(v)\, dv = P(X \le x)\, P(Y \le y),$$
and also $\mathrm{Cov}(X, Y) = 0$.

Example 2. Uniform distribution on the unit square, $U[0, 1]^2$:
$$f(x, y) = \begin{cases} 1 & 0 \le x \le 1,\ 0 \le y \le 1, \\ 0 & \text{otherwise}. \end{cases}$$

This is a well-defined PDF, as $f \ge 0$ and $\int\!\!\int f(x, y)\, dx\, dy = 1$. It is easy to see that $X$ and $Y$ are independent. Let us calculate some probabilities:
$$P(X < 1/2,\ Y < 1/2) = F(1/2, 1/2) = \int_0^{1/2}\!\!\int_0^{1/2} dx\, dy = 1/4,$$
$$P(X + Y < 1) = \int\!\!\int_{\{x + y < 1\}} f_{X,Y}(x, y)\, dx\, dy = \int\!\!\int_{\{x > 0,\ y > 0,\ x + y < 1\}} dx\, dy = \int_0^1 dy \int_0^{1-y} dx = \int_0^1 (1 - y)\, dy = 1/2.$$

Example 3. Let $(X, Y)$ have the joint PDF
$$f(x, y) = \begin{cases} x^2 + xy/3 & 0 \le x \le 1,\ 0 \le y \le 2, \\ 0 & \text{otherwise}. \end{cases}$$

Calculate $P(0 < X < 1/2,\ 1/4 < Y < 3)$ and $P(X < Y)$. Are $X$ and $Y$ independent of each other?
$$P(0 < X < 1/2,\ 1/4 < Y < 3) = P(0 < X < 1/2,\ 1/4 < Y < 2) = \int_{1/4}^{2} dy \int_0^{1/2} \Big(x^2 + \frac{xy}{3}\Big)\, dx = \int_{1/4}^{2} \Big(\frac{1}{24} + \frac{y}{24}\Big)\, dy = \Big[\frac{y}{24} + \frac{y^2}{48}\Big]_{1/4}^{2} \approx 0.155.$$
$$P(X < Y) = \int_0^1 dx \int_x^2 \Big(x^2 + \frac{xy}{3}\Big)\, dy = \int_0^1 \Big(\frac{2}{3}x + 2x^2 - \frac{7}{6}x^3\Big)\, dx = 17/24 \approx 0.708.$$
$$f_X(x) = \int_0^2 \Big(x^2 + \frac{xy}{3}\Big)\, dy = 2x^2 + \frac{2x}{3}, \qquad f_Y(y) = \int_0^1 \Big(x^2 + \frac{xy}{3}\Big)\, dx = \frac{1}{3} + \frac{y}{6}.$$
Both $f_X(x)$ and $f_Y(y)$ are well-defined PDFs. But $f(x, y) \ne f_X(x) f_Y(y)$, hence they are not independent.
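These values can be cross-checked numerically (a sketch using nested integrate() calls, not part of the original notes):

inner <- function(x) sapply(x, function(xx)
  integrate(function(y) xx^2 + xx*y/3, lower = xx, upper = 2)$value)
integrate(inner, 0, 1)$value       # P(X < Y), approximately 17/24 = 0.708
inner2 <- function(x) sapply(x, function(xx)
  integrate(function(y) xx^2 + xx*y/3, lower = 1/4, upper = 2)$value)
integrate(inner2, 0, 1/2)$value    # P(0 < X < 1/2, 1/4 < Y < 2), approximately 0.155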

4.2 Conditional Distributions

If $X$ and $Y$ are not independent, knowing $X$ should be helpful in determining $Y$, as $X$ may carry some information about $Y$. Therefore it makes sense to define the distribution of $Y$ given, say, $X = x$. This is the concept of conditional distributions.

If both $X$ and $Y$ are discrete, the conditional probability function is simply a special case of conditional probabilities:
$$P(Y = y \mid X = x) = P(Y = y,\ X = x) / P(X = x).$$
However, this definition does not extend to continuous r.v.s, as then $P(X = x) = 0$.

Definition (Conditional PDF). For continuous r.v.s $X$ and $Y$, the conditional PDF of $Y$ given $X = x$ is
$$f_{Y \mid X}(\cdot \mid x) = f_{X,Y}(x, \cdot) / f_X(x).$$

Remark. (i) As a function of $y$, $f_{Y \mid X}(y \mid x)$ is a PDF:
$$P(Y \in A \mid X = x) = \int_A f_{Y \mid X}(y \mid x)\, dy,$$
while $x$ is treated as a constant (i.e. not a variable).

(ii) $E(Y \mid X = x) = \int y\, f_{Y \mid X}(y \mid x)\, dy$ is a function of $x$, and $\mathrm{Var}(Y \mid X = x) = \int \{y - E(Y \mid X = x)\}^2 f_{Y \mid X}(y \mid x)\, dy$.

(iii) If $X$ and $Y$ are independent, $f_{Y \mid X}(y \mid x) = f_Y(y)$.

(iv) $f_{X,Y}(x, y) = f_X(x)\, f_{Y \mid X}(y \mid x) = f_{X \mid Y}(x \mid y)\, f_Y(y)$, which offers alternative ways to determine the joint PDF.

(v) $E\{E(Y \mid X)\} = E(Y)$. This in fact holds for any r.v.s $X$ and $Y$. We give a proof here for continuous r.v.s only:
$$E\{E(Y \mid X)\} = \int \Big\{ \int y\, f_{Y \mid X}(y \mid x)\, dy \Big\} f_X(x)\, dx = \int\!\!\int y\, f_{X,Y}(x, y)\, dx\, dy = \int y \Big\{ \int f_{X,Y}(x, y)\, dx \Big\}\, dy = \int y\, f_Y(y)\, dy = EY.$$

Example 4. Let $f_{X,Y}(x, y) = e^{-y}$ for $0 < x < y < \infty$, and $0$ otherwise. Find $f_{Y \mid X}(y \mid x)$, $f_{X \mid Y}(x \mid y)$ and $\mathrm{Cov}(X, Y)$.

We need to find $f_X(x)$ and $f_Y(y)$ first:
$$f_X(x) = \int f_{X,Y}(x, y)\, dy = \int_x^{\infty} e^{-y}\, dy = e^{-x}, \quad x \in (0, \infty),$$
$$f_Y(y) = \int f_{X,Y}(x, y)\, dx = \int_0^{y} e^{-y}\, dx = y e^{-y}, \quad y \in (0, \infty).$$
Hence
$$f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = e^{-(y - x)}, \quad y \in (x, \infty),$$
$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = 1/y, \quad x \in (0, y).$$
Note that given $Y = y$, $X \sim U(0, y)$, i.e. the uniform distribution on $(0, y)$.

To find $\mathrm{Cov}(X, Y)$, we compute $EX$, $EY$ and $E(XY)$ first.
$$EX = \int x f_X(x)\, dx = \int_0^{\infty} x e^{-x}\, dx = \big[-e^{-x}(1 + x)\big]_0^{\infty} = 1,$$
$$EY = \int y f_Y(y)\, dy = \int_0^{\infty} y^2 e^{-y}\, dy = \big[-y^2 e^{-y}\big]_0^{\infty} + 2\int_0^{\infty} y e^{-y}\, dy = 2,$$
$$E(XY) = \int\!\!\int xy\, f_{X,Y}(x, y)\, dx\, dy = \int_0^{\infty} dy \int_0^{y} x y e^{-y}\, dx = \frac{1}{2}\int_0^{\infty} y^3 e^{-y}\, dy = \Big[-\frac{1}{2} y^3 e^{-y}\Big]_0^{\infty} + \frac{3}{2}\int_0^{\infty} y^2 e^{-y}\, dy = 3.$$
Hence $\mathrm{Cov}(X, Y) = E(XY) - (EX)(EY) = 3 - 2 = 1$.
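A quick simulation check (a sketch, not from the notes): the joint PDF factorises as $f(x, y) = f_Y(y)\, f_{X \mid Y}(x \mid y)$ with $Y \sim \text{Gamma}(2, 1)$ and $X \mid Y = y \sim U(0, y)$, so one can simulate and compare the sample moments:

set.seed(4)
n <- 200000
Y <- rgamma(n, shape=2, rate=1)    # density y*exp(-y), i.e. f_Y above
X <- runif(n, min=0, max=Y)        # given Y = y, X is uniform on (0, y)
c(mean(X), mean(Y), cov(X, Y))     # should be close to 1, 2 and 1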

4.3 Multivariate Distributions

Let $\mathbf{X} = (X_1, \dots, X_n)$ be a random vector consisting of $n$ r.v.s. The joint CDF is defined as
$$F(x_1, \dots, x_n) \equiv F_{X_1, \dots, X_n}(x_1, \dots, x_n) = P(X_1 \le x_1, \dots, X_n \le x_n).$$
If $\mathbf{X}$ is continuous, its PDF $f$ satisfies
$$F(x_1, \dots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f(u_1, \dots, u_n)\, du_1 \cdots du_n.$$
In general, the PDF admits the factorisation
$$f(x_1, \dots, x_n) = f(x_1)\, f(x_2 \mid x_1)\, f(x_3 \mid x_1, x_2) \cdots f(x_n \mid x_1, \dots, x_{n-1}),$$
where $f(x_j \mid x_1, \dots, x_{j-1})$ denotes the conditional PDF of $X_j$ given $X_1 = x_1, \dots, X_{j-1} = x_{j-1}$.

However, when $X_1, \dots, X_n$ are independent,
$$f_{X_1, \dots, X_n}(x_1, \dots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n).$$

IID Samples. If $X_1, \dots, X_n$ are independent and each has the same CDF $F$, we say that $X_1, \dots, X_n$ are IID (independent and identically distributed) and write $X_1, \dots, X_n \overset{iid}{\sim} F$. We also call $X_1, \dots, X_n$ a sample or a random sample.

Two important multivariate distributions

Multinomial Distribution $\text{Multinomial}(n, p_1, \dots, p_k)$: an extension of $\text{Bin}(n, p)$.

Suppose we throw a $k$-sided die $n$ times, and record $X_i$ as the number of throws ending with the $i$-th side, $i = 1, \dots, k$. Then $(X_1, \dots, X_k) \sim \text{Multinomial}(n, p_1, \dots, p_k)$, where $p_i$ is the probability that the $i$-th side occurs in one throw. Obviously $p_i \ge 0$ and $\sum_i p_i = 1$.

We may immediately make the following observations from the above definition.

(i) $X_1 + \cdots + X_k = n$, therefore $X_1, \dots, X_k$ are not independent.

(ii) $X_i \sim \text{Bin}(n, p_i)$, hence $EX_i = n p_i$ and $\mathrm{Var}(X_i) = n p_i(1 - p_i)$.

The joint probability function for $\text{Multinomial}(n, p_1, \dots, p_k)$: for any $j_1, \dots, j_k \ge 0$ with $j_1 + \cdots + j_k = n$,
$$P(X_1 = j_1, \dots, X_k = j_k) = \frac{n!}{j_1! \cdots j_k!}\, p_1^{j_1} \cdots p_k^{j_k}.$$
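This probability function is available in R as dmultinom(), and rmultinom() generates draws; a short sketch with illustrative numbers (not from the notes):

p <- c(0.2, 0.3, 0.5)                        # probabilities for a 3-sided "die"
dmultinom(c(2, 3, 5), size=10, prob=p)       # P(X1=2, X2=3, X3=5) when n = 10
counts <- rmultinom(1000, size=10, prob=p)   # each column is one draw of (X1, X2, X3)
rowMeans(counts)                             # should be close to n*p = (2, 3, 5)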

Multivariate Normal Distribution $N(\mu, \Sigma)$: a $k$-variate r.v. $\mathbf{X} = (X_1, \dots, X_k)^\top$ is normal with mean $\mu$ and covariance matrix $\Sigma$ if its PDF is
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\Big\{ -\frac{1}{2} (\mathbf{x} - \mu)^\top \Sigma^{-1} (\mathbf{x} - \mu) \Big\}, \quad \mathbf{x} \in \mathbb{R}^k,$$
where $\mu$ is a $k$-vector, and $\Sigma$ is a $k \times k$ positive-definite matrix.

Some properties of $N(\mu, \Sigma)$: let $\mu = (\mu_1, \dots, \mu_k)^\top$ and $\Sigma \equiv (\sigma_{ij})$, then

(i) $E\mathbf{X} = \mu$, and the covariance matrix is $\mathrm{Cov}(\mathbf{X}) = E\{(\mathbf{X} - \mu)(\mathbf{X} - \mu)^\top\} = \Sigma$, where $\sigma_{ij} = \mathrm{Cov}(X_i, X_j) = E\{(X_i - \mu_i)(X_j - \mu_j)\}$.

(ii) When $\sigma_{ij} = 0$ for all $i \ne j$, i.e. the components of $\mathbf{X}$ are uncorrelated, $\Sigma = \mathrm{diag}(\sigma_{11}, \dots, \sigma_{kk})$ and $|\Sigma| = \prod_i \sigma_{ii}$. Hence the PDF admits a simple form:
$$f(\mathbf{x}) = \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi \sigma_{ii}}} \exp\Big\{ -\frac{1}{2\sigma_{ii}} (x_i - \mu_i)^2 \Big\}.$$
Thus $X_1, \dots, X_k$ are independent when $\sigma_{ij} = 0$ for all $i \ne j$.

(iii) Let $\mathbf{Y} = A\mathbf{X} + \mathbf{b}$, where $A$ is a constant matrix and $\mathbf{b}$ is a constant vector. Then $\mathbf{Y} \sim N(A\mu + \mathbf{b},\ A \Sigma A^\top)$.

(iv) $X_i \sim N(\mu_i, \sigma_{ii})$. For any constant $k$-vector $\mathbf{a}$, $\mathbf{a}^\top \mathbf{X}$ is a scalar r.v. and $\mathbf{a}^\top \mathbf{X} \sim N(\mathbf{a}^\top \mu,\ \mathbf{a}^\top \Sigma \mathbf{a})$.

(v) Standard Normal Distribution: $N(\mathbf{0}, I_k)$, where $I_k$ is the $k \times k$ identity matrix.
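For instance (an illustrative computation, not from the notes), if $(X_1, X_2)^\top \sim N(\mu, \Sigma)$ with $\mu = (1, 2)^\top$, $\sigma_{11} = 4$, $\sigma_{22} = 1$ and $\sigma_{12} = \sigma_{21} = 1$, then taking $\mathbf{a} = (1, 1)^\top$ in (iv) gives
$$X_1 + X_2 = \mathbf{a}^\top \mathbf{X} \sim N(\mathbf{a}^\top \mu,\ \mathbf{a}^\top \Sigma \mathbf{a}) = N(1 + 2,\ \sigma_{11} + \sigma_{22} + 2\sigma_{12}) = N(3, 7).$$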

Example 5. Let $X_1, X_2, X_3$ be jointly normal with common mean $0$, variance $1$ and $\mathrm{Corr}(X_i, X_j) = 0.5$ for $1 \le i \ne j \le 3$. Find the probability $P(|X_1| + |X_2| + |X_3| \le 2)$.

It is difficult to calculate this probability by integrating the joint PDF. We provide an estimate by simulation. (The justification will be given in Chapter 6.)

We solve a general problem first. Let $\mathbf{X} \sim N(\mu, \Sigma)$, where $\mathbf{X}$ has $p$ components. For any set $A \subset \mathbb{R}^p$, we may estimate the probability $P(\mathbf{X} \in A)$ by the relative frequency
$$\#\{1 \le i \le n : \mathbf{X}_i \in A\} / n,$$
where $n$ is a large integer, and $\mathbf{X}_1, \dots, \mathbf{X}_n$ are $n$ vectors generated independently from $N(\mu, \Sigma)$.

Note that $\mathbf{X} = \mu + \Sigma^{1/2} \mathbf{Z}$, where $\mathbf{Z} \sim N(\mathbf{0}, I_p)$ is standard normal, and $\Sigma^{1/2} \ge 0$ satisfies $\Sigma^{1/2} \Sigma^{1/2} = \Sigma$. We generate $\mathbf{Z}$ by rnorm(p), and apply the above linear transformation to obtain $\mathbf{X}$.

$\Sigma^{1/2}$ may be obtained by an eigenanalysis of $\Sigma$ using the R function eigen. Since $\Sigma \ge 0$, it holds that $\Sigma = \Gamma \Lambda \Gamma^\top$, where $\Gamma$ is an orthogonal matrix (i.e. $\Gamma^\top \Gamma = I_p$) and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ is a diagonal matrix. Then $\Sigma^{1/2} = \Gamma \Lambda^{1/2} \Gamma^\top$, where $\Lambda^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_p})$.

The R function rmnorm below generates random vectors from $N(\mu, \Sigma)$.

rmnorm <- function(n, p, mu, Sigma)
{ # generate n p-vectors from N(mu, Sigma)
  # mu is the p-vector of means, Sigma >= 0 is a p x p matrix
  t <- eigen(Sigma, symmetric=T)    # eigenanalysis for Sigma
  ev <- sqrt(t$values)              # square-roots of the eigenvalues
  G <- as.matrix(t$vectors)         # line up eigenvectors into a matrix G
  D <- G*0; for(i in 1:p) D[i,i] <- ev[i]   # D is the diagonal matrix Lambda^{1/2}
  P <- G%*%D%*%t(G)                 # P = GDG' is the required transformation matrix Sigma^{1/2}
  Z <- matrix(rnorm(n*p), byrow=T, ncol=p)  # Z is an n x p matrix with elements drawn from N(0,1)
  Z <- Z%*%P                        # now each row of Z is N(0, Sigma)
  X <- matrix(rep(mu, n), byrow=T, ncol=p) + Z  # each row of X is N(mu, Sigma)
  X                                 # return the n x p matrix of simulated vectors
}

This function is saved in the file rmnorm.r. We may use it to perform the required task:

source("rmnorm.r")
mu <- c(0, 0, 0)
Sigma <- matrix(c(1,0.5,0.5, 0.5,1,0.5, 0.5,0.5,1), byrow=T, ncol=3)
X <- rmnorm(20000, 3, mu, Sigma)
dim(X)                                      # check the size of X
t <- abs(X[,1]) + abs(X[,2]) + abs(X[,3])
cat("Estimated probability:", length(t[t<=2])/20000, "\n")

It returned the value:

Estimated probability: 0.446

I repeated it a few more times and obtained the estimates 0.439, 0.445, 0.441 etc.
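One way to automate such repetitions (a sketch, reusing rmnorm, mu and Sigma from above, not part of the original notes) is replicate(); the spread of the estimates gives a rough feel for the Monte Carlo error:

est <- replicate(10, {
  X <- rmnorm(20000, 3, mu, Sigma)
  mean(abs(X[,1]) + abs(X[,2]) + abs(X[,3]) <= 2)
})
round(est, 3)    # ten independent estimates of the probability
sd(est)          # rough indication of the Monte Carlo error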

4.4 Transformations of random variables

Let a random vector $\mathbf{X}$ have PDF $f_{\mathbf{X}}$. We are interested in the distribution of a scalar function of $\mathbf{X}$, say, $Y = r(\mathbf{X})$. We introduce a general procedure first.

Three steps to find the PDF of $Y = r(\mathbf{X})$:

(i) For each $y$, find the set $A_y = \{\mathbf{x} : r(\mathbf{x}) \le y\}$.

(ii) Find the CDF
$$F_Y(y) = P(Y \le y) = P\{r(\mathbf{X}) \le y\} = \int_{A_y} f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}.$$

(iii) $f_Y(y) = \frac{d}{dy} F_Y(y)$.

Example 6. Let $X \sim f_X(x)$ ($X$ is a scalar). Find the PDF of $Y = e^X$.

$A_y = \{x : e^x \le y\} = \{x : x \le \log y\}$. Hence
$$F_Y(y) = P(Y \le y) = P\{e^X \le y\} = P(X \le \log y) = F_X(\log y),$$
and therefore
$$f_Y(y) = \frac{d}{dy} F_X(\log y) = f_X(\log y)\, \frac{d \log y}{dy} = y^{-1} f_X(\log y).$$
Note that $y = e^x$ and $\log y = x$, so $\frac{dy}{dx} = e^x = y$. The above result can be written as
$$f_Y(y) = f_X(x) \Big/ \frac{dy}{dx}, \quad \text{or} \quad f_Y(y)\, dy = f_X(x)\, dx.$$
For a 1-1 transformation $Y = r(X)$ (i.e. the inverse function $X = r^{-1}(Y)$ is uniquely defined), it holds that
$$f_Y(y) = f_X(x) / |r'(x)| = f_X(x)\, \Big|\frac{dx}{dy}\Big|.$$
Note: you should replace all $x$ in the above by $x = r^{-1}(y)$.
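As a concrete check of Example 6 (a simulation sketch under the assumption $f_X = \text{Exp}(1)$, which is not specified in the notes): then $f_Y(y) = y^{-1} f_X(\log y) = y^{-2}$ for $y > 1$, so $P(Y \le 2) = 1/2$ and $P(Y \le 4) = 3/4$:

set.seed(6)
X <- rexp(100000, rate=1)          # X ~ Exp(1), so f_X(x) = exp(-x) for x > 0
Y <- exp(X)
c(mean(Y <= 2), mean(Y <= 4))      # should be close to 0.5 and 0.75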

Example 7. Let $X \sim U(-1, 3)$. Find the PDF of $Y = X^2$.

Now this is not a 1-1 transformation. We have to use the general three-step procedure. Note that $Y$ takes values in $(0, 9)$. Consider two cases:

(i) For $y \in (0, 1)$, $A_y = (-\sqrt{y}, \sqrt{y})$, so $F_Y(y) = \int_{A_y} f_X(x)\, dx = 0.5\sqrt{y}$. Hence $f_Y(y) = F_Y'(y) = 0.25/\sqrt{y}$.

(ii) For $y \in [1, 9)$, $A_y = (-1, \sqrt{y})$, so $F_Y(y) = \int_{A_y} f_X(x)\, dx = 0.25(\sqrt{y} + 1)$. Hence $f_Y(y) = F_Y'(y) = 0.125/\sqrt{y}$.

Collectively we have
$$f_Y(y) = \begin{cases} 0.25/\sqrt{y} & 0 < y < 1, \\ 0.125/\sqrt{y} & 1 \le y < 9, \\ 0 & \text{otherwise}. \end{cases}$$
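A Monte Carlo check of Example 7 (a sketch, not part of the notes): simulate $X \sim U(-1, 3)$, set $Y = X^2$, and compare empirical probabilities with the CDF implied by the density above, namely $F_Y(y) = 0.5\sqrt{y}$ on $(0, 1)$ and $0.25(\sqrt{y} + 1)$ on $[1, 9)$:

set.seed(7)
X <- runif(100000, min=-1, max=3)
Y <- X^2
c(mean(Y <= 0.5), 0.5*sqrt(0.5))         # both close to 0.354
c(mean(Y <= 4),   0.25*(sqrt(4) + 1))    # both close to 0.75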