Chapter 4. Multivariate Distributions

4.1 Bivariate Distributions

For a pair of r.v.s $(X, Y)$, the joint CDF is defined as
$$F_{X,Y}(x, y) = P(X \le x, \, Y \le y).$$
The marginal distributions may be obtained easily from the joint distribution:
$$F_X(x) = P(X \le x) = P(X \le x, \, Y < \infty) = F_{X,Y}(x, \infty), \qquad F_Y(y) = F_{X,Y}(\infty, y).$$
Covariance and correlation of $X$ and $Y$:
$$\mathrm{Cov}(X, Y) = E\{(X - EX)(Y - EY)\} = E(XY) - (EX)(EY),$$
$$\mathrm{Corr}(X, Y) = \mathrm{Cov}(X, Y) \Big/ \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}.$$
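These population quantities have sample analogues that R computes directly; the small illustration below (my own, not part of the notes) checks the identity $\mathrm{Cov}(X, Y) = E(XY) - (EX)(EY)$ on simulated data.

n <- 100000
x <- rnorm(n)
y <- 2*x + rnorm(n)                # y correlated with x by construction
mean(x*y) - mean(x)*mean(y)        # approximates Cov(X,Y); compare with cov(x, y)
cov(x, y) / sqrt(var(x)*var(y))    # the correlation; equals cor(x, y)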
Discrete bivariate distributions

If $X$ takes discrete values $x_1, \dots, x_m$ and $Y$ takes discrete values $y_1, \dots, y_n$, their joint probability function may be presented in a table:

$$\begin{array}{c|cccc|c}
X \backslash Y & y_1 & y_2 & \cdots & y_n & \\ \hline
x_1 & p_{11} & p_{12} & \cdots & p_{1n} & p_{1\cdot} \\
x_2 & p_{21} & p_{22} & \cdots & p_{2n} & p_{2\cdot} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
x_m & p_{m1} & p_{m2} & \cdots & p_{mn} & p_{m\cdot} \\ \hline
 & p_{\cdot 1} & p_{\cdot 2} & \cdots & p_{\cdot n} &
\end{array}$$

where $p_{ij} = P(X = x_i, Y = y_j)$, and
$$p_{i\cdot} = P(X = x_i) = \sum_{j=1}^{n} P(X = x_i, Y = y_j) = \sum_{j=1}^{n} p_{ij}, \qquad
p_{\cdot j} = P(Y = y_j) = \sum_{i=1}^{m} P(X = x_i, Y = y_j) = \sum_{i=1}^{m} p_{ij}.$$
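In R, the marginal probabilities of a joint table are simply its row and column sums; the table below is a made-up 2 x 3 example of mine.

p <- matrix(c(0.10, 0.20, 0.10,
              0.25, 0.15, 0.20), nrow=2, byrow=TRUE)   # joint table p_ij
rowSums(p)    # p_i. = P(X = x_i)
colSums(p)    # p_.j = P(Y = y_j)
sum(p)        # must equal 1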
In general, $p_{ij} \ne p_{i\cdot} p_{\cdot j}$. However, if $p_{ij} = p_{i\cdot} p_{\cdot j}$ for all $i$ and $j$, then $X$ and $Y$ are independent, i.e.
$$P(X = x_i, Y = y_j) = P(X = x_i)\, P(Y = y_j) \quad \text{for all } i, j.$$
For independent $X$ and $Y$, $\mathrm{Cov}(X, Y) = 0$.

Example 1. Flip a fair coin two times. Let $X = 1$ if H occurs in the first flip, and $0$ if T occurs in the first flip. Let $Y = 1$ if the outcomes of the two flips are the same, and $0$ if the two outcomes are different. The joint probability function is

$$\begin{array}{c|cc|c}
X \backslash Y & 1 & 0 & \\ \hline
1 & 1/4 & 1/4 & 1/2 \\
0 & 1/4 & 1/4 & 1/2 \\ \hline
 & 1/2 & 1/2 &
\end{array}$$

It is easy to see that $X$ and $Y$ are independent, which is somewhat counter-intuitive.
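The independence in Example 1 can be checked empirically; the simulation below (my own sketch) recovers the joint table from 100,000 replications.

set.seed(1)                        # arbitrary seed, for reproducibility
n <- 100000
flip1 <- rbinom(n, 1, 0.5)         # 1 = H on the first flip
flip2 <- rbinom(n, 1, 0.5)         # 1 = H on the second flip
X <- flip1
Y <- as.integer(flip1 == flip2)    # 1 if the two outcomes are the same
table(X, Y) / n                    # all four cells should be close to 1/4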
Continuous bivariate distributions

If the CDF $F_{X,Y}$ can be written as
$$F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\, du\, dv$$
for any $x$ and $y$, where $f_{X,Y} \ge 0$, then $(X, Y)$ has a continuous joint distribution, and $f_{X,Y}(x, y)$ is the joint PDF. As $F_{X,Y}(\infty, \infty) = 1$, it holds that
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(u, v)\, du\, dv = 1.$$
In fact any non-negative function satisfying this condition is a PDF. Furthermore, for any subset $A \subset \mathbb{R}^2$,
$$P\{(X, Y) \in A\} = \iint_A f_{X,Y}(x, y)\, dx\, dy.$$
Also
$$\mathrm{Cov}(X, Y) = \iint (x - EX)(y - EY) f_{X,Y}(x, y)\, dx\, dy = \iint xy\, f_{X,Y}(x, y)\, dx\, dy - EX \cdot EY.$$
Note that
$$F_X(x) = F_{X,Y}(x, \infty) = \int_{-\infty}^{\infty} \int_{-\infty}^{x} f_{X,Y}(u, v)\, du\, dv = \int_{-\infty}^{x} \Big\{ \int_{-\infty}^{\infty} f_{X,Y}(u, v)\, dv \Big\}\, du,$$
hence the marginal PDF of $X$ can be derived from the joint PDF as follows:
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy. \qquad \text{Similarly,} \quad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx.$$
Note. Unlike the discrete case, it is not always easy to work out marginal PDFs from joint PDFs, especially when the PDFs are discontinuous.

When $f_{X,Y}(x, y) = f_X(x) f_Y(y)$ for any $x$ and $y$, $X$ and $Y$ are independent, as then
$$P(X \le x, \, Y \le y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\, du\, dv = \int_{-\infty}^{y} \int_{-\infty}^{x} f_X(u) f_Y(v)\, du\, dv = \int_{-\infty}^{x} f_X(u)\, du \int_{-\infty}^{y} f_Y(v)\, dv = P(X \le x)\, P(Y \le y),$$
and also $\mathrm{Cov}(X, Y) = 0$.

Example 2. Uniform distribution on the unit square, $U[0, 1]^2$:
$$f(x, y) = \begin{cases} 1 & 0 \le x \le 1, \; 0 \le y \le 1, \\ 0 & \text{otherwise}. \end{cases}$$
This is a well-defined PDF, as $f \ge 0$ and $\iint f(x, y)\, dx\, dy = 1$. It is easy to see that $X$ and $Y$ are independent. Let us calculate some probabilities:
$$P(X < 1/2, \, Y < 1/2) = F(1/2, 1/2) = \int_0^{1/2} \int_0^{1/2} dx\, dy = 1/4,$$
$$P(X + Y < 1) = \iint_{\{x + y < 1\}} f_{X,Y}(x, y)\, dx\, dy = \iint_{\{x > 0,\, y > 0,\, x + y < 1\}} dx\, dy = \int_0^1 dy \int_0^{1-y} dx = \int_0^1 (1 - y)\, dy = 1/2.$$
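Both probabilities are easy to verify by Monte Carlo (a quick check of mine, not part of the notes):

n <- 100000
x <- runif(n); y <- runif(n)   # (X, Y) uniform on the unit square
mean(x < 1/2 & y < 1/2)        # should be close to 1/4
mean(x + y < 1)                # should be close to 1/2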
Example 3. Let $(X, Y)$ have the joint PDF
$$f(x, y) = \begin{cases} x^2 + xy/3 & 0 \le x \le 1, \; 0 \le y \le 2, \\ 0 & \text{otherwise}. \end{cases}$$
Calculate $P(0 < X < 1/2, \, 1/4 < Y < 3)$ and $P(X < Y)$. Are $X$ and $Y$ independent of each other?
$$P(0 < X < 1/2, \, 1/4 < Y < 3) = P(0 < X < 1/2, \, 1/4 < Y < 2) = \int_{1/4}^{2} dy \int_0^{1/2} \Big( x^2 + \frac{xy}{3} \Big)\, dx = \int_{1/4}^{2} \Big( \frac{1}{24} + \frac{y}{24} \Big)\, dy = \Big[ \frac{y}{24} + \frac{y^2}{48} \Big]_{1/4}^{2} = 0.155,$$
$$P(X < Y) = \int_0^1 dx \int_x^2 \Big( x^2 + \frac{xy}{3} \Big)\, dy = \int_0^1 \Big( \frac{2}{3} x + 2x^2 - \frac{7}{6} x^3 \Big)\, dx = 17/24 = 0.708.$$
The marginal PDFs are
$$f_X(x) = \int_0^2 \Big( x^2 + \frac{xy}{3} \Big)\, dy = 2x^2 + \frac{2x}{3}, \qquad
f_Y(y) = \int_0^1 \Big( x^2 + \frac{xy}{3} \Big)\, dx = \frac{1}{3} + \frac{y}{6}.$$
Both $f_X(x)$ and $f_Y(y)$ are well-defined PDFs. But $f(x, y) \ne f_X(x) f_Y(y)$, hence $X$ and $Y$ are not independent.
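The two probabilities can be double-checked numerically by nesting R's one-dimensional integrate(); the helper functions below are my own scaffolding, not from the notes.

f <- function(x, y) x^2 + x*y/3
# P(0 < X < 1/2, 1/4 < Y < 2): integrate over x for each y, then over y
inner <- function(y) sapply(y, function(v) integrate(function(x) f(x, v), 0, 1/2)$value)
integrate(inner, 1/4, 2)$value    # approximately 0.155
# P(X < Y): for each x in (0, 1), integrate over y in (x, 2)
innerY <- function(x) sapply(x, function(u) integrate(function(y) f(u, y), u, 2)$value)
integrate(innerY, 0, 1)$value     # approximately 0.708 (= 17/24)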
4.2 Conditional Distributions

If $X$ and $Y$ are not independent, knowing $X$ should be helpful in determining $Y$, as $X$ may carry some information on $Y$. Therefore it makes sense to define the distribution of $Y$ given, say, $X = x$. This is the concept of conditional distributions.

If both $X$ and $Y$ are discrete, the conditional probability function is simply a special case of conditional probabilities:
$$P(Y = y \mid X = x) = P(Y = y, \, X = x) / P(X = x).$$
However, this definition does not extend to continuous r.v.s, as then $P(X = x) = 0$.
Definition (Conditional PDF). For continuous r.v.s $X$ and $Y$, the conditional PDF of $Y$ given $X = x$ is
$$f_{Y|X}(y \mid x) = f_{X,Y}(x, y) / f_X(x).$$

Remark. (i) As a function of $y$, $f_{Y|X}(y \mid x)$ is a PDF:
$$P(Y \in A \mid X = x) = \int_A f_{Y|X}(y \mid x)\, dy,$$
while $x$ is treated as a constant (i.e. not a variable).

(ii) $E(Y \mid X = x) = \int y f_{Y|X}(y \mid x)\, dy$ is a function of $x$, and
$$\mathrm{Var}(Y \mid X = x) = \int \{y - E(Y \mid X = x)\}^2 f_{Y|X}(y \mid x)\, dy.$$
(iii) If $X$ and $Y$ are independent, $f_{Y|X}(y \mid x) = f_Y(y)$.

(iv) $f_{X,Y}(x, y) = f_X(x) f_{Y|X}(y \mid x) = f_{X|Y}(x \mid y) f_Y(y)$, which offers alternative ways to determine the joint PDF.

(v) $E\{E(Y \mid X)\} = E(Y)$. This in fact holds for any r.v.s $X$ and $Y$. We give a proof here for continuous r.v.s only:
$$E\{E(Y \mid X)\} = \int \Big\{ \int y f_{Y|X}(y \mid x)\, dy \Big\} f_X(x)\, dx = \iint y f_{X,Y}(x, y)\, dx\, dy = \int y \Big\{ \int f_{X,Y}(x, y)\, dx \Big\}\, dy = \int y f_Y(y)\, dy = EY.$$

Example 4. Let $f_{X,Y}(x, y) = e^{-y}$ for $0 < x < y < \infty$, and $0$ otherwise. Find $f_{Y|X}(y \mid x)$, $f_{X|Y}(x \mid y)$ and $\mathrm{Cov}(X, Y)$.
We need to find $f_X(x)$ and $f_Y(y)$ first:
$$f_X(x) = \int f_{X,Y}(x, y)\, dy = \int_x^{\infty} e^{-y}\, dy = e^{-x}, \quad x \in (0, \infty),$$
$$f_Y(y) = \int f_{X,Y}(x, y)\, dx = \int_0^{y} e^{-y}\, dx = y e^{-y}, \quad y \in (0, \infty).$$
Hence
$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = e^{-(y - x)}, \quad y \in (x, \infty),$$
$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = 1/y, \quad x \in (0, y).$$
Note that given $Y = y$, $X \sim U(0, y)$, i.e. the uniform distribution on $(0, y)$.
To find $\mathrm{Cov}(X, Y)$, we compute $EX$, $EY$ and $E(XY)$ first:
$$EX = \int x f_X(x)\, dx = \int_0^{\infty} x e^{-x}\, dx = -e^{-x}(1 + x) \Big|_0^{\infty} = 1,$$
$$EY = \int y f_Y(y)\, dy = \int_0^{\infty} y^2 e^{-y}\, dy = -y^2 e^{-y} \Big|_0^{\infty} + 2 \int_0^{\infty} y e^{-y}\, dy = 2,$$
$$E(XY) = \iint xy f_{X,Y}(x, y)\, dx\, dy = \int_0^{\infty} dy \int_0^{y} xy e^{-y}\, dx = \frac{1}{2} \int_0^{\infty} y^3 e^{-y}\, dy = -\frac{1}{2} y^3 e^{-y} \Big|_0^{\infty} + \frac{3}{2} \int_0^{\infty} y^2 e^{-y}\, dy = 3.$$
Hence $\mathrm{Cov}(X, Y) = E(XY) - (EX)(EY) = 3 - 2 = 1$.
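Example 4 also has a structure that makes simulation easy: $f_Y(y) = y e^{-y}$ is the Gamma(2, 1) density, and given $Y = y$ we have $X \sim U(0, y)$. A short check of the moments above (my own sketch):

n <- 100000
y <- rgamma(n, shape=2, rate=1)   # f_Y(y) = y e^{-y}, i.e. Gamma(2, 1)
x <- runif(n, min=0, max=y)       # given Y = y, X ~ U(0, y)
c(mean(x), mean(y), cov(x, y))    # should be close to 1, 2 and 1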
4.3 Multivariate Distributions

Let $\mathbf{X} = (X_1, \dots, X_n)$ be a random vector consisting of $n$ r.v.s. The joint CDF is defined as
$$F(x_1, \dots, x_n) \equiv F_{X_1,\dots,X_n}(x_1, \dots, x_n) = P(X_1 \le x_1, \dots, X_n \le x_n).$$
If $\mathbf{X}$ is continuous, its PDF $f$ satisfies
$$F(x_1, \dots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f(u_1, \dots, u_n)\, du_1 \cdots du_n.$$
In general, the PDF admits the factorisation
$$f(x_1, \dots, x_n) = f(x_1)\, f(x_2 \mid x_1)\, f(x_3 \mid x_1, x_2) \cdots f(x_n \mid x_1, \dots, x_{n-1}),$$
where $f(x_j \mid x_1, \dots, x_{j-1})$ denotes the conditional PDF of $X_j$ given $X_1 = x_1, \dots, X_{j-1} = x_{j-1}$.
However, when $X_1, \dots, X_n$ are independent,
$$f_{X_1,\dots,X_n}(x_1, \dots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n).$$

IID Samples. If $X_1, \dots, X_n$ are independent and each has the same CDF $F$, we say that $X_1, \dots, X_n$ are IID (independent and identically distributed), and write $X_1, \dots, X_n \overset{iid}{\sim} F$. We also call $X_1, \dots, X_n$ a sample or a random sample.

Two important multivariate distributions

Multinomial Distribution $\mathrm{Multinomial}(n, p_1, \dots, p_k)$: an extension of $\mathrm{Bin}(n, p)$.
Suppose we throw a $k$-sided die $n$ times, and record $X_i$ as the number of throws ending with the $i$-th side, $i = 1, \dots, k$. Then $(X_1, \dots, X_k) \sim \mathrm{Multinomial}(n, p_1, \dots, p_k)$, where $p_i$ is the probability that the $i$-th side occurs in a single throw. Obviously $p_i \ge 0$ and $\sum_i p_i = 1$.

We may immediately make the following observations from the above definition.

(i) $X_1 + \cdots + X_k = n$, therefore $X_1, \dots, X_k$ are not independent.

(ii) $X_i \sim \mathrm{Bin}(n, p_i)$, hence $EX_i = np_i$ and $\mathrm{Var}(X_i) = np_i(1 - p_i)$.
The joint probability function of $\mathrm{Multinomial}(n, p_1, \dots, p_k)$: for any $j_1, \dots, j_k \ge 0$ with $j_1 + \cdots + j_k = n$,
$$P(X_1 = j_1, \dots, X_k = j_k) = \frac{n!}{j_1! \cdots j_k!}\, p_1^{j_1} \cdots p_k^{j_k}.$$
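R provides this distribution directly: dmultinom() evaluates the joint probability function above, and rmultinom() draws samples. The numbers below are arbitrary choices of mine.

p <- c(0.2, 0.3, 0.5)
dmultinom(c(2, 3, 5), size=10, prob=p)   # P(X1 = 2, X2 = 3, X3 = 5) for n = 10
X <- rmultinom(5, size=10, prob=p)       # 5 draws; each column is one (X1, X2, X3)
colSums(X)                               # confirms X1 + X2 + X3 = n = 10 in each draw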
Multivariate Normal Distribution $N(\mu, \Sigma)$: a $k$-variate r.v. $\mathbf{X} = (X_1, \dots, X_k)'$ is normal with mean $\mu$ and covariance matrix $\Sigma$ if its PDF is
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\Big\{ -\frac{1}{2} (\mathbf{x} - \mu)' \Sigma^{-1} (\mathbf{x} - \mu) \Big\}, \quad \mathbf{x} \in \mathbb{R}^k,$$
where $\mu$ is a $k$-vector, and $\Sigma$ is a $k \times k$ positive-definite matrix.

Some properties of $N(\mu, \Sigma)$: let $\mu = (\mu_1, \dots, \mu_k)'$ and $\Sigma = (\sigma_{ij})$, then

(i) $E\mathbf{X} = \mu$, and the covariance matrix $\mathrm{Cov}(\mathbf{X}) = E\{(\mathbf{X} - \mu)(\mathbf{X} - \mu)'\} = \Sigma$, with $\sigma_{ij} = \mathrm{Cov}(X_i, X_j) = E\{(X_i - \mu_i)(X_j - \mu_j)\}$.
(ii) When $\sigma_{ij} = 0$ for all $i \ne j$, i.e. the components of $\mathbf{X}$ are uncorrelated, $\Sigma = \mathrm{diag}(\sigma_{11}, \dots, \sigma_{kk})$ and $|\Sigma| = \prod_i \sigma_{ii}$. Hence the PDF admits the simple form
$$f(\mathbf{x}) = \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi \sigma_{ii}}} \exp\Big\{ -\frac{1}{2\sigma_{ii}} (x_i - \mu_i)^2 \Big\}.$$
Thus $X_1, \dots, X_k$ are independent when $\sigma_{ij} = 0$ for all $i \ne j$.

(iii) Let $\mathbf{Y} = A\mathbf{X} + \mathbf{b}$, where $A$ is a constant matrix and $\mathbf{b}$ is a constant vector. Then $\mathbf{Y} \sim N(A\mu + \mathbf{b}, \, A\Sigma A')$.

(iv) $X_i \sim N(\mu_i, \sigma_{ii})$. For any constant $k$-vector $\mathbf{a}$, $\mathbf{a}'\mathbf{X}$ is a scalar r.v. and $\mathbf{a}'\mathbf{X} \sim N(\mathbf{a}'\mu, \, \mathbf{a}'\Sigma\mathbf{a})$.

(v) Standard Normal Distribution: $N(\mathbf{0}, I_k)$, where $I_k$ is the $k \times k$ identity matrix.
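Property (iv) is easy to verify numerically. The sketch below (mine) uses mvrnorm() from the MASS package, which ships with R, to sample from $N(\mu, \Sigma)$; the particular $\mu$, $\Sigma$ and $\mathbf{a}$ are arbitrary.

library(MASS)                        # for mvrnorm()
mu <- c(1, 2); Sigma <- matrix(c(2, 1, 1, 3), 2, 2)
a <- c(0.5, -1)
X <- mvrnorm(100000, mu, Sigma)      # 100000 x 2 matrix of draws from N(mu, Sigma)
y <- X %*% a                         # samples of a'X
c(mean(y), var(y))                   # compare with the theoretical values below
c(sum(a*mu), t(a) %*% Sigma %*% a)   # a'mu = -1.5 and a'Sigma a = 2.5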
Example 5. Let $X_1, X_2, X_3$ be jointly normal with common mean $0$, variance $1$ and $\mathrm{Corr}(X_i, X_j) = 0.5$ for $1 \le i \ne j \le 3$. Find the probability $P(|X_1| + |X_2| + |X_3| \le 2)$.

It is difficult to calculate this probability by integrating the joint PDF. We provide an estimate by simulation. (The justification will be given in Chapter 6.)

We solve a general problem first. Let $\mathbf{X} \sim N(\mu, \Sigma)$, where $\mathbf{X}$ has $p$ components. For any set $A \subset \mathbb{R}^p$, we may estimate the probability $P(\mathbf{X} \in A)$ by the relative frequency
$$\#\{1 \le i \le n : \mathbf{X}_i \in A\} / n,$$
where $n$ is a large integer, and $\mathbf{X}_1, \dots, \mathbf{X}_n$ are $n$ vectors generated independently from $N(\mu, \Sigma)$.
Note $\mathbf{X} = \mu + \Sigma^{1/2} \mathbf{Z}$, where $\mathbf{Z} \sim N(\mathbf{0}, I_p)$ is standard normal, and $\Sigma^{1/2} \ge 0$ satisfies $\Sigma^{1/2} \Sigma^{1/2} = \Sigma$. We generate $\mathbf{Z}$ by rnorm(p), and apply the above linear transformation to obtain $\mathbf{X}$.

$\Sigma^{1/2}$ may be obtained by an eigenanalysis of $\Sigma$ using the R function eigen. Since $\Sigma \ge 0$, it holds that $\Sigma = \Gamma \Lambda \Gamma'$, where $\Gamma$ is an orthogonal matrix (i.e. $\Gamma'\Gamma = I_p$) and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ is a diagonal matrix. Then $\Sigma^{1/2} = \Gamma \Lambda^{1/2} \Gamma'$, where $\Lambda^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_p})$.

The R function rmnorm below generates random vectors from $N(\mu, \Sigma)$.
rmnorm <- function(n, p, mu, Sigma) {
  # generate n p-vectors from N(mu, Sigma)
  # mu is a p-vector of means; Sigma >= 0 is a p x p matrix
  eig <- eigen(Sigma, symmetric=TRUE)   # eigenanalysis of Sigma
  ev <- sqrt(eig$values)                # square roots of the eigenvalues
  G <- as.matrix(eig$vectors)           # eigenvectors lined up as columns of G
  D <- diag(ev, nrow=p)                 # D = Lambda^{1/2} is diagonal
  P <- G %*% D %*% t(G)                 # P = G D G' is the required transformation Sigma^{1/2}
  Z <- matrix(rnorm(n*p), byrow=TRUE, ncol=p)   # n x p matrix with N(0,1) entries
  Z <- Z %*% P                          # now each row of Z is N(0, Sigma)
  X <- matrix(rep(mu, n), byrow=TRUE, ncol=p) + Z   # each row of X is N(mu, Sigma)
  X                                     # return X
}
This function is saved in the file rmnorm.r. We may use it to perform the required task:

source("rmnorm.r")
mu <- c(0, 0, 0)
Sigma <- matrix(c(1,0.5,0.5, 0.5,1,0.5, 0.5,0.5,1), byrow=TRUE, ncol=3)
X <- rmnorm(20000, 3, mu, Sigma)
dim(X)                                   # check the size of X
s <- abs(X[,1]) + abs(X[,2]) + abs(X[,3])
cat("Estimated probability:", length(s[s<=2])/20000, "\n")

It returned the value:

Estimated probability: 0.446
I repeated it a few more times and obtained the estimates 0.439, 0.445, 0.441 etc.
4.4 Transformations of Random Variables

Let a random vector $\mathbf{X}$ have PDF $f_{\mathbf{X}}$. We are interested in the distribution of a scalar function of $\mathbf{X}$, say $Y = r(\mathbf{X})$. We introduce a general procedure first.

Three steps to find the PDF of $Y = r(\mathbf{X})$:

(i) For each $y$, find the set $A_y = \{\mathbf{x} : r(\mathbf{x}) \le y\}$.

(ii) Find the CDF
$$F_Y(y) = P(Y \le y) = P\{r(\mathbf{X}) \le y\} = \int_{A_y} f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}.$$

(iii) $f_Y(y) = \frac{d}{dy} F_Y(y)$.
Example 6. Let $X \sim f_X(x)$ ($X$ is a scalar). Find the PDF of $Y = e^X$.

$A_y = \{x : e^x \le y\} = \{x : x \le \log y\}$. Hence
$$F_Y(y) = P(Y \le y) = P\{e^X \le y\} = P(X \le \log y) = F_X(\log y),$$
and therefore
$$f_Y(y) = \frac{d}{dy} F_X(\log y) = f_X(\log y)\, \frac{d \log y}{dy} = y^{-1} f_X(\log y).$$
Note that with $y = e^x$ (so $\log y = x$), $\frac{dy}{dx} = e^x = y$. The above result can be written as
$$f_Y(y) = f_X(x) \Big/ \frac{dy}{dx}, \quad \text{or} \quad f_Y(y)\, dy = f_X(x)\, dx.$$
For a 1-1 transformation $Y = r(X)$ (i.e. the inverse function $X = r^{-1}(Y)$ is uniquely defined), it holds that
$$f_Y(y) = f_X(x) / |r'(x)| = f_X(x)\, \Big| \frac{dx}{dy} \Big|.$$
Note. You should replace all $x$ in the above by $x = r^{-1}(y)$.
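A quick numerical check of Example 6 (my own illustration): if $X \sim N(0, 1)$ then $Y = e^X$ is lognormal, and the formula $f_Y(y) = y^{-1} f_X(\log y)$ should reproduce R's built-in lognormal density.

y <- c(0.5, 1, 2, 5)
dnorm(log(y)) / y    # the transformation formula
dlnorm(y)            # R's lognormal density: identical values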
Example 7. Let $X \sim U(-1, 3)$. Find the PDF of $Y = X^2$.

This is not a 1-1 transformation, so we have to use the general 3-step procedure. Note that $Y$ takes values in $(0, 9)$, and $f_X(x) = 1/4$ on $(-1, 3)$. Consider two cases:

(i) For $y \in (0, 1)$, $A_y = (-\sqrt{y}, \sqrt{y})$, so $F_Y(y) = \int_{A_y} f_X(x)\, dx = 0.5\sqrt{y}$. Hence $f_Y(y) = F_Y'(y) = 0.25/\sqrt{y}$.

(ii) For $y \in [1, 9)$, $A_y = (-1, \sqrt{y})$, so $F_Y(y) = \int_{A_y} f_X(x)\, dx = 0.25(\sqrt{y} + 1)$. Hence $f_Y(y) = F_Y'(y) = 0.125/\sqrt{y}$.

Collectively we have
$$f_Y(y) = \begin{cases} 0.25/\sqrt{y} & 0 < y < 1, \\ 0.125/\sqrt{y} & 1 \le y < 9, \\ 0 & \text{otherwise}. \end{cases}$$
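Example 7 can likewise be checked by simulation (my own sketch): compare the empirical CDF of $Y = X^2$ with the CDF implied by the PDF just derived.

n <- 100000
x <- runif(n, -1, 3)
y <- x^2
FY <- function(y) ifelse(y < 1, 0.5*sqrt(y), 0.25*(sqrt(y) + 1))   # derived CDF on (0, 9)
for (v in c(0.5, 1, 4)) cat("F_Y(", v, "):", mean(y <= v), "vs", FY(v), "\n")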