
6 Multiple Random Variables

6.0 INTRODUCTION

For a single random variable (the scalar case) we have covered:
cdf, pdf
transformation of a random variable
conditional cdf, conditional pdf, total probability theorem
expectation of a random variable, total expectation theorem
characteristic function, moment generating function
inequalities

For a random vector (the vector case) we now ask:
cdf?, pdf?
transformation of a random vector?
conditional cdf, conditional pdf, total probability theorem?
expectation of a random vector, total expectation theorem?
characteristic function?, moment generating function?

Big Picture

Chapter 6. Two random variables
Chapter 7. Multiple random variables (combined with this note)
Chapter 7. Random sequence
Chapter 8. Statistics (take EECE645 Statistical Signal Processing)
Chapter 9. Concept of random process
Chapter 10. Time domain: correlation function
Chapter 11. Frequency domain: power spectral density

Chapter Outline

6-1 Bivariate distributions
6-2 One function of two random variables
6-3 Two functions of two random variables
6-4 Joint moments
6-5 Joint characteristic functions
6-6 Conditional distributions
6-7 Conditional expected values

6.1 VECTOR RANDOM VARIABLES

Q. What does it mean to say that there are two random variables?
Recall that, given (S, A, P), a random variable X(s) is a measurable function from S into R.
A. Given (S, A, P), there are two measurable functions X(s) and Y(s) from S into R.
Draw a block diagram with a source and a system with one input and two outputs.

Terminologies
a vector random variable = a random vector
a bivariate random variable = a random vector of length 2
a multivariate random variable = a random vector of length greater than or equal to 2
an N-dimensional random variable = an N-dimensional random vector

Roughly speaking, given a random experiment with a real-valued vector outcome, the vector as a random entity is called a random vector. (Caution: A specific outcome is a deterministic vector, not a random vector.)

Def. Given a probability space (S, A, P), a random vector X(s) of length N is a vector-valued measurable function
X(s) = [X_1(s), X_2(s), ..., X_N(s)]^T
from S into R^N. Draw a block diagram!!!!

Def. Random variables defined on the common sample space of a probability space are called jointly distributed random variables or, simply, joint random variables.

Given a fair-coin tossing problem, suppose that we have X(H) = 1, X(T) = −1, Y(H) = −1, and Y(T) = 1.
Q1. Do we have an identical CDF for X and Y? A.
Q2. Can we say that X = Y? A.

Terminologies
Def. X is a non-negative random variable if X(s) ≥ 0, ∀s ∈ S; X is non-negative everywhere/surely.
Def. X is non-negative almost surely/everywhere if Pr(X ≥ 0) = 1; X is non-negative with probability 1.
Def. X = Y surely/everywhere if X(s) = Y(s), ∀s ∈ S.
Def. X = Y almost surely/everywhere if P({s : X(s) = Y(s)}) = 1.
  X = Y a.s. ⟺ X = Y with probability 1 ⟺ P({X = Y}) = 1.
  X = Y a.s. ⟺ E[|X − Y|^p] = 0, for any p ≥ 1.
  X = Y a.s. ⟺ P(|X − Y| > ε) = 0, ∀ε > 0.
Def. X = Y in distribution if F_X(x) = F_Y(x), ∀x.

Given a fair-coin tossing problem, suppose that we have X_1(H) = 1, X_1(T) = −1, Y_1(H) = 1, and Y_1(T) = −1. How can we fully characterize these random variables?
A. Since there are only two outcomes {[X_1, Y_1] = [1, 1]} and {[X_1, Y_1] = [−1, −1]}, we just need to specify that Pr({[X_1, Y_1] = [1, 1]}) = 1/2 and Pr({[X_1, Y_1] = [−1, −1]}) = 1/2. Visualize the pmf.

Q. Suppose that we have X_2(H) = 1, X_2(T) = −1, Y_2(H) = −1, and Y_2(T) = 1. We have identical distributions for X_1 and X_2 and identical distributions for Y_1 and Y_2. Can we say that we have identical distributions for [X_1, Y_1] and [X_2, Y_2]? Visualize the pmf.
A.

Q. (special case) How can we fully characterize a random vector if the induced sample space is countable?
A. It is necessary and sufficient to assign probabilities to each possible outcome. Two N-D discrete random vectors have the same distribution iff the N-D probability mass functions are identical.

Q. (general case) How can we fully characterize a random vector if the induced sample space is not countable? Can we introduce something similar to the CDF/PDF of a random variable for a random vector?
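
To make the countable case concrete, here is a minimal Python sketch (the dictionary layout and the function name marginal are mine, not from the notes) that stores the joint PMF of the coin-toss pair [X_1, Y_1] above as a lookup table and recovers each marginal PMF by summing out the other coordinate.

    # Minimal sketch: joint PMF of a discrete random vector as a lookup table.
    # The pair [X1, Y1] from the coin example takes [1, 1] and [-1, -1], each w.p. 1/2.
    from collections import defaultdict

    joint_pmf = {(1, 1): 0.5, (-1, -1): 0.5}   # {(x, y): Pr([X1, Y1] = [x, y])}

    def marginal(joint, axis):
        """Sum the joint PMF over the other coordinate to get a marginal PMF."""
        m = defaultdict(float)
        for outcome, p in joint.items():
            m[outcome[axis]] += p
        return dict(m)

    print(marginal(joint_pmf, 0))   # {1: 0.5, -1: 0.5} = PMF of X1
    print(marginal(joint_pmf, 1))   # {1: 0.5, -1: 0.5} = PMF of Y1
    # [X2, Y2] = ([1, -1] or [-1, 1], each w.p. 1/2) has the same marginals but a
    # different joint PMF, which is exactly the point of the example above.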

Joint Distribution Function

The induced probability space of a random vector X of length N can be denoted by (R^N, B^N, P_X), where B^N is the Borel σ-algebra of R^N.

Consider a 2-D histogram when we have a random vector of length 2. Then, it is straightforward to introduce the notion of a 2-D PDF f_X(x) = f_{X1,X2}(x_1, x_2) of a random vector of length 2 as a limit of the normalized 2-D histogram. What about a 2-D CDF? Considering backward compatibility, we may want the 2-D PDF to be the 2-D derivative of the 2-D CDF, i.e.,

∂²F_{X,Y}(x, y) / ∂x ∂y = f_{X,Y}(x, y)
F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(x′, y′) dx′ dy′.

It turns out that this guess is true.

Consider events in the joint sample space
A = {X ≤ x} = {X ≤ x, Y ≤ ∞}
B = {Y ≤ y} = {X ≤ ∞, Y ≤ y}
(Figure 6.1-1)
Then, A ∩ B = {X ≤ x, Y ≤ y} is the joint event whose probability is defined as the joint CDF. Now, draw an X-Y plane and shade the set that corresponds to this joint event. How can you find the area? How can you find the probability of this event? What is it?

Figure 6.1-2

Def. The joint CDF of a random vector [X, Y] is defined as the probability of the joint event {X ≤ x, Y ≤ y}, i.e.,
F_{X,Y}(x, y) ≜ Pr{X ≤ x, Y ≤ y}

Lemma. (w/o proof) Any event B of interest in B^2 can be rewritten as a set operation on events of the form {X ≤ x, Y ≤ y}.

Proposition. The joint CDF of two discrete random variables can be written as
F_{X,Y}(x, y) = Σ_{m=1}^{M} Σ_{n=1}^{N} p_{X,Y}(x_m, y_n) u(x − x_m) u(y − y_n)
where p_{X,Y}(x_m, y_n) is called the joint PMF of [X, Y].

Def. For N random variables,
F_{X1,X2,...,XN}(x_1, x_2, ..., x_N) ≜ Pr{X_1 ≤ x_1, X_2 ≤ x_2, ..., X_N ≤ x_N}.

Theorem. A random vector X ≜ [X_1, X_2, ..., X_N] is fully characterized by its joint CDF F_X(x).

Figure 6.1-3

Properties of the Joint CDF

Properties of a joint distribution function for two random variables X and Y:
(1) F_{X,Y}(−∞, −∞) = 0, F_{X,Y}(−∞, y) = 0, F_{X,Y}(x, −∞) = 0   (6.1-1a)
(2) F_{X,Y}(∞, ∞) = 1   (6.1-1b)
(3) 0 ≤ F_{X,Y}(x, y) ≤ 1   (6.1-1c)
(4) F_{X,Y}(x, y) is a nondecreasing function of both x and y   (6.1-1d)
(5) F_{X,Y}(x_2, y_2) + F_{X,Y}(x_1, y_1) − F_{X,Y}(x_1, y_2) − F_{X,Y}(x_2, y_1) = P{x_1 < X ≤ x_2, y_1 < Y ≤ y_2} ≥ 0   (6.1-1e)
(6) F_{X,Y}(x, ∞) = F_X(x), F_{X,Y}(∞, y) = F_Y(y)   (6.1-1f)

Marginal CDF

Property 6: The distribution function of one random variable can be obtained by setting the other argument to ∞.
Note that {X ≤ x} = {X ≤ x, Y ≤ ∞}, {Y ≤ y} = {X ≤ ∞, Y ≤ y}.
Recall: F_X(x) = Pr{X ≤ x}, F_Y(y) = Pr{Y ≤ y}.
Thus,
F_{X,Y}(x, ∞) = P({s ∈ S : X(s) ≤ x, Y(s) ≤ ∞})
            = P({s ∈ S : X(s) ≤ x} ∩ {s ∈ S : Y(s) ≤ ∞})
            = P({s ∈ S : X(s) ≤ x} ∩ S)
            = P({s ∈ S : X(s) ≤ x})
            = F_X(x).
Similarly, F_{X,Y}(∞, y) = F_Y(y).

F_X(x) and F_Y(y) obtained from F_{X,Y}(x, y) by using Property 6 are called marginal CDFs. From an N-dimensional joint distribution function we may obtain a k-dimensional marginal distribution function.

Joint Density Function

The concept of a pdf of a random variable is extended to include multiple random variables.

Def. The joint probability density function (= the joint density function):
f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / ∂x ∂y

Lemma. If X and Y are discrete,
f_{X,Y}(x, y) = Σ_{m=1}^{M} Σ_{n=1}^{N} p_{X,Y}(x_m, y_n) δ(x − x_m) δ(y − y_n).

Def. For N random variables, the joint PDF is defined by
f_{X1,X2,...,XN}(x_1, x_2, ..., x_N) = ∂^N F_{X1,X2,...,XN}(x_1, x_2, ..., x_N) / ∂x_1 ∂x_2 ... ∂x_N,
which implies
F_{X1,X2,...,XN}(x_1, x_2, ..., x_N) = ∫_{−∞}^{x_N} ... ∫_{−∞}^{x_2} ∫_{−∞}^{x_1} f_{X1,X2,...,XN}(ξ_1, ξ_2, ..., ξ_N) dξ_1 dξ_2 ... dξ_N

Properties of a Joint PDF

Properties of a joint density function:
(1) f_{X,Y}(x, y) ≥ 0   (6.1-2a)
(2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1   (6.1-2b)
(3) F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(ξ_1, ξ_2) dξ_1 dξ_2   (6.1-2c)
(4) F_X(x) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f_{X,Y}(ξ_1, ξ_2) dξ_2 dξ_1   (6.1-2d)
    F_Y(y) = ∫_{−∞}^{y} ∫_{−∞}^{∞} f_{X,Y}(ξ_1, ξ_2) dξ_1 dξ_2   (6.1-2e)
(5) P{x_1 < X ≤ x_2, y_1 < Y ≤ y_2} = ∫_{y_1}^{y_2} ∫_{x_1}^{x_2} f_{X,Y}(x, y) dx dy   (6.1-2f)
(6) f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy   (6.1-2g)
    f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx   (6.1-2h)

Properties 1 and 2 may be used as sufficient tests to determine whether some function can be a valid density function.
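
As a quick illustration of using Properties 1 and 2 as a validity test, the following Python sketch (the candidate density f(x, y) = x + y on the unit square is my own example) checks non-negativity on a grid and numerically integrates the function to confirm it equals 1.

    # Minimal sketch: numerically checking Properties 1 and 2 for a candidate
    # joint density f(x, y) = x + y on the unit square, 0 elsewhere.
    import numpy as np
    from scipy.integrate import dblquad

    def f_xy(x, y):
        return (x + y) if (0 <= x <= 1 and 0 <= y <= 1) else 0.0

    # Property 1: f_xy >= 0 (checked on a grid here, not a proof)
    grid = np.linspace(0, 1, 101)
    assert all(f_xy(x, y) >= 0 for x in grid for y in grid)

    # Property 2: the density must integrate to 1 over the plane
    total, _ = dblquad(lambda y, x: f_xy(x, y), 0, 1, 0, 1)
    print(total)   # ~1.0, so f_xy can be a valid joint density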

Marginal Density Functions

marginal probability density functions = marginal density functions
f_X(x) = dF_X(x)/dx,  f_Y(y) = dF_Y(y)/dy

For N random variables, a k-dimensional marginal density function is
f_{X1,X2,...,Xk}(x_1, x_2, ..., x_k) = ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f_{X1,X2,...,XN}(x_1, x_2, ..., x_N) dx_{k+1} dx_{k+2} ... dx_N
There are (N choose k) k-dimensional marginal PDFs.
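
A marginal density can also be obtained numerically. The sketch below (my own example, using a bivariate Gaussian with ρ = 0.6) integrates the joint density over y and compares the result with the standard normal density that the marginal of X should equal.

    # Minimal sketch: obtaining a marginal density by integrating out the other variable.
    import numpy as np
    from scipy.stats import multivariate_normal, norm
    from scipy.integrate import quad

    rho = 0.6
    joint = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

    def f_X(x):
        # f_X(x) = integral over y of f_{X,Y}(x, y) dy
        val, _ = quad(lambda y: joint.pdf([x, y]), -np.inf, np.inf)
        return val

    for x in [-1.0, 0.0, 2.0]:
        print(x, f_X(x), norm.pdf(x))   # the two values should agree closely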

6.6 CONDITIONAL DISTRIBUTIONS

Review: the conditional cdf of X given B with P(B) ≠ 0, and the corresponding conditional pdf:
F_X(x | B) = P(X ≤ x | B) = P({X ≤ x} ∩ B) / P(B)
f_X(x | B) = dF_X(x | B) / dx
e.g., B = {X ≤ a}, B = {a < X ≤ b}, B = {X = a}

Now, we have two jointly distributed random variables X and Y. What if B = {Y ≤ a}, B = {a < Y ≤ b}, B = {Y = a}? The results must be consistent with our previous results with Y = X.

*Conditional Distribution and Density: Interval Conditioning

interval conditioning: X and Y either continuous or discrete, B = {y_a < Y ≤ y_b}

F_X(x | y_a < Y ≤ y_b) = [F_{X,Y}(x, y_b) − F_{X,Y}(x, y_a)] / [F_Y(y_b) − F_Y(y_a)]
                       = ∫_{y_a}^{y_b} ∫_{−∞}^{x} f_{X,Y}(ξ, y) dξ dy / ∫_{y_a}^{y_b} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy   (6.6-1)

f_X(x | y_a < Y ≤ y_b) = ∫_{y_a}^{y_b} f_{X,Y}(x, y) dy / ∫_{y_a}^{y_b} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy

Figure 6.6-1

Conditional Distribution and Density: Point Conditioning

the conditional pdf of X given {Y = y}
Distinguish a constant random variable Y = 3 from an event {Y = 3}!

point conditioning: B = {y − Δy < Y ≤ y + Δy}
F_X(x | y − Δy < Y ≤ y + Δy) = ∫_{y−Δy}^{y+Δy} ∫_{−∞}^{x} f_{X,Y}(ξ_1, ξ_2) dξ_1 dξ_2 / ∫_{y−Δy}^{y+Δy} f_Y(ξ) dξ

We consider only two cases.

Case 1: X and Y are both discrete ⟹ Pr(Y = y_k) > 0
f_{X,Y}(x, y) = Σ_{i=1}^{N} Σ_{j=1}^{M} P(x_i, y_j) δ(x − x_i) δ(y − y_j)
f_Y(y) = Σ_{j=1}^{M} P(y_j) δ(y − y_j)
F_X(x | Y = y_k) = Σ_{i=1}^{N} [P(x_i, y_k) / P(y_k)] u(x − x_i)
f_X(x | Y = y_k) = Σ_{i=1}^{N} [P(x_i, y_k) / P(y_k)] δ(x − x_i)

the conditional probability of event A given {Y = y}:
P(A | Y = y) = P(A ∩ {Y = y}) / P(Y = y)

Case 2: X and Y are both continuous ⟹ Pr(Y = y_k) = 0
F_X(x | y − Δy < Y ≤ y + Δy) ≈ [∫_{−∞}^{x} f_{X,Y}(ξ_1, y) dξ_1 · 2Δy] / [f_Y(y) · 2Δy]
F_X(x | Y = y) = ∫_{−∞}^{x} f_{X,Y}(ξ, y) dξ / f_Y(y)
f_X(x | Y = y) = f_{X,Y}(x, y) / f_Y(y)

Notation:
f_{X|Y}(x, y) = f_{X,Y}(x, y) / f_Y(y)
f_{Y|X}(y, x) = f_{X,Y}(x, y) / f_X(x)

the conditional probability of event A given {Y = y} (continuous Y):
P(A | Y = y) = P(A ∩ {Y = y}) / f_Y(y)

Figure 6.6-2

Bayes Theorems for joint random variables

X continuous, Y continuous:
f_{X|Y}(x, y) = f_{Y|X}(y, x) f_X(x) / ∫ f_{Y|X}(y, x) f_X(x) dx

X continuous, Y discrete:
f_{X|Y}(x, y) = Pr(Y = y | X = x) f_X(x) / ∫ Pr(Y = y | X = x) f_X(x) dx

X discrete, Y continuous:
Pr(X = x | Y = y) = f_{Y|X}(y, x) Pr(X = x) / Σ_x f_{Y|X}(y, x) Pr(X = x)

X discrete, Y discrete: ...
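
A small worked instance of the mixed case (X discrete, Y continuous) may help; the setup below is my own toy model, not from the notes: X is a bit in {−1, +1} with equal priors, Y = X + N with Gaussian noise N, and the posterior Pr(X = x | Y = y) is computed exactly as in the third formula above.

    # Minimal sketch: Bayes theorem with X discrete and Y continuous.
    # X in {-1, +1} with Pr(X = 1) = 0.5, and Y = X + N where N ~ N(0, sigma^2),
    # so f_{Y|X}(y, x) is a Gaussian density centered at x.
    from scipy.stats import norm

    sigma = 1.0
    prior = {+1: 0.5, -1: 0.5}

    def posterior(y):
        """Pr(X = x | Y = y) = f_{Y|X}(y, x) Pr(X = x) / sum_x f_{Y|X}(y, x) Pr(X = x)."""
        likelihood = {x: norm.pdf(y, loc=x, scale=sigma) for x in prior}
        evidence = sum(likelihood[x] * prior[x] for x in prior)
        return {x: likelihood[x] * prior[x] / evidence for x in prior}

    print(posterior(0.0))   # symmetric observation: posterior stays 0.5 / 0.5
    print(posterior(1.2))   # observation near +1: Pr(X = +1 | Y = 1.2) > 0.5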

STATISTICAL INDEPENDENCE

statistical independence of events vs. statistical independence of random variables

Review. Two events A and B are statistically independent if P(A ∩ B) = P(A)P(B).

Def. Two random variables X and Y are statistically independent if
P{X ≤ x, Y ≤ y} = P{X ≤ x} P{Y ≤ y}, ∀x, y
Equivalently,
F_{X,Y}(x, y) = F_X(x) F_Y(y), ∀x, y
f_{X,Y}(x, y) = f_X(x) f_Y(y), ∀x, y

F_X(x | Y ≤ y) = P{X ≤ x, Y ≤ y} / P{Y ≤ y} = F_{X,Y}(x, y) / F_Y(y) = F_X(x) F_Y(y) / F_Y(y) = F_X(x), ∀x, y
F_Y(y | X ≤ x) = F_Y(y), ∀x, y
f_X(x | Y ≤ y) = f_X(x), ∀x, y
f_Y(y | X ≤ x) = f_Y(y), ∀x, y

Any event in terms of X and any other event in terms of Y are always independent.

Statistical Independence of N Random Variables

Def. For any M, N ∈ ℕ and for any indices k_1, k_2, ..., k_M and l_1, l_2, ..., l_N without any repetition, an event in terms of X_{k1}, X_{k2}, ..., X_{kM} and an event in terms of X_{l1}, X_{l2}, ..., X_{lN} are independent events.

If X_1, X_2, ..., X_N are statistically independent, then any group of these random variables is independent of any other group. A function of any group is independent of any function of any other group of the random variables.

A_i = {X_i ≤ x_i}, i = 1, 2, ..., N

Theorem. X_1, X_2, ..., X_N are statistically independent iff
F_{X1,X2,...,XN}(x_1, x_2, ..., x_N) = Π_{n=1}^{N} F_{Xn}(x_n)
Equivalently,
f_{X1,X2,...,XN}(x_1, x_2, ..., x_N) = Π_{n=1}^{N} f_{Xn}(x_n)

Q. What is the difference from the definition of N independent events?

6.2 ONE FUNCTION OF TWO RANDOM VARIABLES

a function of two (statistically independent or dependent) random variables: how to find the cdf and the pdf?

General One Function Case: Y = g(X_1, X_2, ..., X_N)

F_Y(y) = P{g(X_1, X_2, ..., X_N) ≤ y} = ∫...∫_{{g(x_1, x_2, ..., x_N) ≤ y}} f_{X1,X2,...,XN}(x_1, x_2, ..., x_N) dx_1 dx_2 ... dx_N

f_Y(y) = dF_Y(y)/dy = d/dy ∫...∫_{{g(x_1, x_2, ..., x_N) ≤ y}} f_{X1,X2,...,XN}(x_1, x_2, ..., x_N) dx_1 dx_2 ... dx_N

Special One Function Case: Sum of Two Random Variables

Q. Given two jointly distributed random variables X and Y, find the cdf and the pdf of X + Y.

Case I: When X and Y are dependent. We use the result for the general case, now with g(X, Y) = X + Y. The key is to set W = X + Y. Then, the cdf of W is given by F_W(w) = Pr(W ≤ w) = Pr(X + Y ≤ w). Let's visualize the event whose probability is to be computed.

Figure 6.2-1

Thus, the CDF of W can be rewritten in terms of the joint PDF of X and Y as
F_W(w) = ∫_{−∞}^{∞} ∫_{x=−∞}^{w−y} f_{X,Y}(x, y) dx dy
By differentiating using Leibniz's rule, we have the PDF of W as
f_W(w) = ∫_{−∞}^{∞} f_{X,Y}(w − y, y) dy

Case II: When X and Y are independent.
The CDF of W can be rewritten in terms of the joint PDF of X and Y as
F_W(w) = ∫_{−∞}^{∞} ∫_{x=−∞}^{w−y} f_X(x) f_Y(y) dx dy = ∫_{−∞}^{∞} f_Y(y) ∫_{x=−∞}^{w−y} f_X(x) dx dy   (6.2-1)
The PDF of W is given by
f_W(w) = ∫_{−∞}^{∞} f_X(w − y) f_Y(y) dy = f_X(w) ∗ f_Y(w)

An alternate derivation:
f_W(w) = ∫_{−∞}^{∞} f_{W,Y}(w, y) dy = ∫_{−∞}^{∞} f_{W|Y=y}(w) f_Y(y) dy = ∫_{−∞}^{∞} f_X(w − y) f_Y(y) dy = f_X(w) ∗ f_Y(w)

The density function of the sum of two statistically independent random variables is the convolution of their individual density functions.

Figure 6.2-2
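
The convolution statement is easy to check numerically. In the sketch below (my own example) X and Y are independent Uniform(0, 1) random variables, so f_W should be the triangular density on (0, 2); a histogram of sampled sums is compared with the discretized convolution f_X ∗ f_Y.

    # Minimal sketch: for independent X, Y the density of W = X + Y is f_X * f_Y.
    # Here X, Y ~ Uniform(0, 1), so f_W is triangular on (0, 2).
    import numpy as np

    rng = np.random.default_rng(0)
    n, dx = 200_000, 0.01

    # Monte Carlo estimate of f_W from samples of X + Y
    w = rng.uniform(0, 1, n) + rng.uniform(0, 1, n)
    hist, edges = np.histogram(w, bins=np.arange(0, 2 + dx, dx), density=True)

    # Numerical convolution of the two discretized densities
    x = np.arange(0, 1, dx)
    f_x = np.ones_like(x)             # Uniform(0, 1) density
    f_w = np.convolve(f_x, f_x) * dx  # length 2*len(x) - 1, spacing dx

    print(hist[50], f_w[50])    # both ~0.5 near w = 0.5
    print(hist[100], f_w[100])  # both ~1.0 near w = 1.0 (peak of the triangle)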

Q. What if X and Y are jointly Gaussian and Z = X + Y? Z is Gaussian.

Q. What about X − Y? Hint: X − Y = X + (−Y), and f_{−Y}(y) = f_Y(−y).

Q. What about X/Y? If X and Y are jointly normal, then X/Y has a Cauchy density centered at rσ_1/σ_2.

Q. What about X² + Y²? If X and Y are i.i.d. normal with zero mean, then X² + Y² has an exponential density.

Q. What about √(X² + Y²)? If X and Y are i.i.d. normal with zero mean, then √(X² + Y²) has a Rayleigh density. If X and Y are independent normal with the same variance, then √(X² + Y²) has a Ricean/Rician density.

Order Statistics

Def. Given X_1, X_2, ..., X_N, we can rearrange them in increasing order of magnitude such that X_(1) ≤ X_(2) ≤ ... ≤ X_(N). Then, X_(k) is called the kth-order statistic.

As X_1, X_2, ..., X_N are dependent in general, X_(1), X_(2), ..., X_(N) are also dependent in general. X_(k) is a nonlinear operation on [X_1, X_2, ..., X_N]. X_(1) = min(X_1, X_2, ..., X_N) and X_(N) = max(X_1, X_2, ..., X_N) are representative ones.

For N = 2, sketch the events {min(X, Y) ≤ w} and {max(X, Y) ≤ w}. What if X and Y are independent?

Discrete Case: case with X, Y ∈ ℤ, case with ...
Q. What if X and Y are independent Poisson and Z = X + Y? Z is Poisson.
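
For N = 2 and independent X and Y, the sketched events give Pr(max(X, Y) ≤ w) = F_X(w)F_Y(w) and Pr(min(X, Y) ≤ w) = 1 − (1 − F_X(w))(1 − F_Y(w)). The following Monte Carlo sketch (my own check, with uniform X and Y) verifies both.

    # Minimal sketch: CDFs of min and max of two independent random variables,
    # verified by simulation with X, Y ~ Uniform(0, 1), where F(w) = w on [0, 1].
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 1, 500_000)
    y = rng.uniform(0, 1, 500_000)

    w = 0.7
    print(np.mean(np.maximum(x, y) <= w), w * w)                  # ~0.49 both
    print(np.mean(np.minimum(x, y) <= w), 1 - (1 - w) * (1 - w))  # ~0.91 both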

6.3 TWO FUNCTIONS OF TWO RANDOM VARIABLES

Q. Given X = [X_1, X_2]^T, find the joint CDF and PDF of Y = [g_1(X_1, X_2), g_2(X_1, X_2)]^T.

CDF first:
F_{Y1,Y2}(y_1, y_2) = Pr(Y_1 ≤ y_1, Y_2 ≤ y_2)
                    = Pr(g_1(X_1, X_2) ≤ y_1, g_2(X_1, X_2) ≤ y_2)
                    = ∫∫_{{(x_1, x_2) : g_1(x_1, x_2) ≤ y_1, g_2(x_1, x_2) ≤ y_2}} f_{X1,X2}(x_1, x_2) dx_1 dx_2   (6.3-1)
Then, PDF by differentiating...

Example 6-21: [min(X_1, X_2), max(X_1, X_2)]

Example 6-22: Cartesian coordinates to polar coordinates
R = √(X² + Y²), Θ = tan⁻¹(Y/X)
f_{R,Θ}(r, θ) = f_{X,Y}(r cos θ, r sin θ) · r
To show: Q(x) ≤ (1/2) e^{−x²/2}, x ≥ 0.

Multiple Functions

Q. Find the joint density function of a set of functions that defines a set of random variables:
Y_i = g_i(X_1, X_2, ..., X_N), i = 1, 2, ..., N
Note that N random variables are mapped to N random variables. What if N random variables are mapped to M < N random variables? Introduce auxiliary random variables, then find an Mth-order marginal PDF.

Backward compatibility: Recall that
f_Y(y) = Σ_i f_X(x_i) / |dg(x)/dx|_{x = x_i}
where y = g(x_i).

General Case from 2 r.v. to 2 r.v.: When [Z, W] = [g(X, Y), h(X, Y)],
f_{Z,W}(z, w) = Σ_i f_{X,Y}(x_i, y_i) / |J(x_i, y_i)|
where z = g(x_i, y_i), w = h(x_i, y_i), and J(x_i, y_i) is the Jacobian (determinant) of the original transformation, defined as
J(x_i, y_i) = det([∂g/∂x, ∂g/∂y; ∂h/∂x, ∂h/∂y]).
The Jacobian can be negative, so we take its absolute value. It is assumed that g and h are ... differentiable.
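
As a small illustration of the Jacobian in the 2-to-2 case, the sketch below computes the Jacobian determinant of the inverse polar transform x = r cos θ, y = r sin θ symbolically; the result r is exactly the factor appearing in Example 6-22 above.

    # Minimal sketch: Jacobian of the inverse Cartesian-to-polar transform with sympy.
    import sympy as sp

    r, theta = sp.symbols('r theta', positive=True)
    x = r * sp.cos(theta)
    y = r * sp.sin(theta)

    J = sp.Matrix([[sp.diff(x, r), sp.diff(x, theta)],
                   [sp.diff(y, r), sp.diff(y, theta)]])
    print(sp.simplify(J.det()))   # r, the factor in f_{R,Theta}(r, theta)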

Special Case from N r.v. to N r.v.

invertible: It is assumed that a set of continuous inverse functions T_j^{-1} exists such that
X_j = T_j^{-1}(Y_1, Y_2, ..., Y_N), j = 1, 2, ..., N.
Then,
∫...∫_{R_X} f_{X1,...,XN}(x_1, ..., x_N) dx_1 ... dx_N = ∫...∫_{R_Y} f_{Y1,...,YN}(y_1, ..., y_N) dy_1 ... dy_N.

differentiable: The Jacobian of the transformation is the determinant of a matrix of derivatives,
J(y) = det[ ∂T_j^{-1}/∂y_k ]  (the N×N matrix with (j, k) entry ∂T_j^{-1}/∂y_k).
Thus,
∫...∫_{R_X} f_{X1,...,XN}(x_1, ..., x_N) dx_1 ... dx_N = ∫...∫_{R_Y} f_{X1,...,XN}(x_1 = T_1^{-1}, ..., x_N = T_N^{-1}) |J| dy_1 ... dy_N,
which implies
f_Y(y) = f_X(T^{-1}(y)) |J(y)|

6.4 JOINT MOMENTS

expectation for two or more random variables: moments, characteristic functions, moment generating functions

EXPECTED VALUE OF A FUNCTION OF RANDOM VARIABLES

Recall that if Z = g(X, Y), then E[Z] = ∫ z f_Z(z) dz.

(Fundamental Theorem of Expectation) For multiple random variables,
E[g(X, Y)] = ∫∫ g(x, y) f_{X,Y}(x, y) dx dy
E[g(X_1, ..., X_N)] = ∫...∫ g(x_1, ..., x_N) f_{X1,...,XN}(x_1, ..., x_N) dx_1 ... dx_N

Backward compatibility: if g(X_1, ..., X_N) = g(X_1), then
E[g(X_1)] = ∫ g(x_1) f_{X1}(x_1) dx_1

Special Cases:
E{X + Y} = E{X} + E{Y} always.
E{Σ_i a_i X_i} = Σ_i a_i E{X_i} always.
E{XY} ≠ E{X}E{Y} in general. If X and Y are independent, then E{XY} = E{X}E{Y}.

Joint Moments about the Origin

Def. joint moments about the origin = joint non-central moments
m_{nk} = E[X^n Y^k] = ∫∫ x^n y^k f_{X,Y}(x, y) dx dy
the first-order moments, the second-order moments, ...

the correlation of X and Y:
R_{XY} = m_{11} = E[XY] = ∫∫ x y f_{X,Y}(x, y) dx dy

Def. X and Y are uncorrelated if
R_{XY} = E{XY} = E{X}E{Y}
statistical independence of X and Y ⟹ uncorrelatedness. The converse is not true in general. For Gaussian random variables, the converse is true. Why? Correlation does not imply causation.

Def. X and Y are orthogonal if R_{XY} = E{XY} = 0.
Zero-mean X and Y are orthogonal iff uncorrelated.

Joint Central Moments

Def. joint central moments
µ_{nk} = E[(X − X̄)^n (Y − Ȳ)^k] = ∫∫ (x − X̄)^n (y − Ȳ)^k f_{X,Y}(x, y) dx dy

the second-order central moments: variances
µ_{20} = E[(X − X̄)²] = σ_X² = C_XX
µ_{02} = E[(Y − Ȳ)²] = σ_Y² = C_YY

the covariance of X and Y:
C_XY = µ_{11} = E[(X − X̄)(Y − Ȳ)] = ∫∫ (x − X̄)(y − Ȳ) f_{X,Y}(x, y) dx dy
C_XY = R_XY − X̄Ȳ = R_XY − E[X]E[Y]
C_XY = 0 iff X and Y are uncorrelated
C_XY = −E[X]E[Y] iff X and Y are orthogonal
If X and Y are orthogonal and either X or Y has zero mean value, then C_XY = 0.

the normalized second-order moment:

ρ ≜ µ_{11} / √(µ_{20} µ_{02}) = C_XY / (σ_X σ_Y)   (6.4-1a)
ρ = E[ (X − X̄)(Y − Ȳ) / (σ_X σ_Y) ]
ρ = C_XY / √(C_XX C_YY)   (6.4-1b)
ρ = the correlation coefficient of X and Y

−1 ≤ ρ ≤ 1. Why? Consider E[(aX + Y)²] ≥ 0, ∀a, and its discriminant.

Examples in Wikipedia.

If the X_i are pairwise uncorrelated random variables, then
Var{Σ_i a_i X_i} = Σ_i a_i² Var{X_i}.
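
The covariance and correlation coefficient are easy to estimate from data. The sketch below (my own example, with Y = 0.5X + noise so that ρ = 0.5/√1.25 ≈ 0.447) forms the sample covariance matrix and normalizes it as in (6.4-1a).

    # Minimal sketch: estimating C_XY and rho from samples.
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(0.0, 1.0, 200_000)
    y = 0.5 * x + rng.normal(0.0, 1.0, 200_000)

    c = np.cov(x, y)            # 2x2 sample covariance [[C_XX, C_XY], [C_XY, C_YY]]
    rho = c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])
    print(rho, np.corrcoef(x, y)[0, 1])   # both ~0.447 = 0.5/sqrt(1.25)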

6.5 JOINT CHARACTERISTIC FUNCTIONS

Def. The joint characteristic function of two random variables X and Y, where ω_1 and ω_2 are real numbers:
Φ_{X,Y}(ω_1, ω_2) = E[e^{jω_1 X + jω_2 Y}] = ∫∫ f_{X,Y}(x, y) e^{jω_1 x + jω_2 y} dx dy
the two-dimensional Fourier transform (with signs of ω_1 and ω_2 reversed) of the joint density function

f_{X,Y}(x, y) = (1/(2π)²) ∫∫ Φ_{X,Y}(ω_1, ω_2) e^{−jω_1 x − jω_2 y} dω_1 dω_2

marginal characteristic functions:
Φ_X(ω_1) = Φ_{X,Y}(ω_1, 0)
Φ_Y(ω_2) = Φ_{X,Y}(0, ω_2)

Joint moments can be found from the joint characteristic function:
m_{nk} = (−j)^{n+k} ∂^{n+k} Φ_{X,Y}(ω_1, ω_2) / ∂ω_1^n ∂ω_2^k |_{ω_1 = 0, ω_2 = 0}

useful where the probability density function is needed for the sum of N statistically independent random variables
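
The joint characteristic function can be estimated directly as a sample average of e^{j(ω_1 X + ω_2 Y)}. The sketch below (my own example) does this for a zero-mean jointly Gaussian pair and compares against the known closed form exp(−½ ωᵀ[C]ω) for that case.

    # Minimal sketch: empirical joint characteristic function vs. the zero-mean
    # jointly Gaussian closed form exp(-0.5 * w^T C w).
    import numpy as np

    rng = np.random.default_rng(3)
    C = np.array([[1.0, 0.6], [0.6, 2.0]])
    xy = rng.multivariate_normal([0.0, 0.0], C, size=400_000)

    w = np.array([0.7, -0.3])
    phi_hat = np.mean(np.exp(1j * xy @ w))   # sample average of exp(j(w1 X + w2 Y))
    phi_true = np.exp(-0.5 * w @ C @ w)
    print(phi_hat, phi_true)   # close agreement (imaginary part of phi_hat ~ 0)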

Two JOINTLY GAUSSIAN Random Variables

Two random variables X and Y are jointly Gaussian if their joint pdf is of the form
f_{X,Y}(x, y) = 1 / (2π σ_X σ_Y √(1 − ρ²)) · exp{ −1/(2(1 − ρ²)) [ (x − X̄)²/σ_X² − 2ρ(x − X̄)(y − Ȳ)/(σ_X σ_Y) + (y − Ȳ)²/σ_Y² ] }
the bivariate Gaussian density, where
X̄ = E[X]   (6.5-1)
Ȳ = E[Y]   (6.5-2)
σ_X² = E[(X − X̄)²]   (6.5-3)
σ_Y² = E[(Y − Ȳ)²]   (6.5-4)
ρ = E[(X − X̄)(Y − Ȳ)] / (σ_X σ_Y)   (6.5-5)

f_{X,Y}(x, y) ≤ f_{X,Y}(X̄, Ȳ) = 1 / (2π σ_X σ_Y √(1 − ρ²))

The locus of constant values of f_{X,Y}(x, y) is an ellipse.

If ρ = 0, corresponding to uncorrelated X and Y, then f_{X,Y}(x, y) = f_X(x) f_Y(y), with
f_X(x) = 1/√(2πσ_X²) exp[ −(x − X̄)²/(2σ_X²) ]

f_Y(y) = 1/√(2πσ_Y²) exp[ −(y − Ȳ)²/(2σ_Y²) ]

Any uncorrelated jointly Gaussian random variables are also statistically independent.

A coordinate rotation (linear transformation of X and Y) through an angle
θ = (1/2) tan⁻¹[ 2ρσ_X σ_Y / (σ_X² − σ_Y²) ]
is sufficient to convert correlated random variables X and Y into two statistically independent Gaussian random variables.

Figure 6.5-1

COMPUTER GENERATION OF MULTIPLE RANDOM VARIABLES

Using two statistically independent random variables X_1 and X_2, both uniformly distributed on (0, 1), generate two statistically independent Gaussian random variables Y_1 and Y_2, each with zero mean and unit variance:
Y_1 = T_1(X_1, X_2) = √(−2 ln(X_1)) cos(2πX_2)   (6.5-6a)
Y_2 = T_2(X_1, X_2) = √(−2 ln(X_1)) sin(2πX_2)   (6.5-6b)
f_{Y1,Y2}(y_1, y_2) = (e^{−y_1²/2}/√(2π)) · (e^{−y_2²/2}/√(2π))

Using two statistically independent Gaussian random variables Y_1 and Y_2, each with zero mean and unit variance, generate two Gaussian random variables W_1 and W_2 that have arbitrary variances and an arbitrary correlation coefficient:
[C_W] = [ σ_{W1}²              ρ_W σ_{W1} σ_{W2} ]
        [ ρ_W σ_{W1} σ_{W2}    σ_{W2}²           ] = [T][T]^t
To find [T], set [T] as a lower triangular matrix of the form
[T] = [ T_11   0    ]
      [ T_21   T_22 ]
(possible as long as [C_W] is nonsingular)
T_11 = σ_{W1}   (6.5-7a)
T_21 = ρ_W σ_{W2}   (6.5-7b)
T_22 = σ_{W2} √(1 − ρ_W²)   (6.5-7c)

Thus,
W_1 = T_11 Y_1 = σ_{W1} Y_1   (6.5-8a)
W_2 = T_21 Y_1 + T_22 Y_2 = ρ_W σ_{W2} Y_1 + σ_{W2} √(1 − ρ_W²) Y_2   (6.5-8b)
If arbitrary means are desired,
W_1 = W̄_1 + σ_{W1} Y_1   (6.5-9a)
W_2 = W̄_2 + ρ_W σ_{W2} Y_1 + σ_{W2} √(1 − ρ_W²) Y_2   (6.5-9b)
For N random variables, [T] can be found by the Cholesky method of factoring matrices.

Suppose two statistically independent Gaussian random variables W_1 and W_2, with respective means W̄_1 and W̄_2 and variances both equal to σ², are subject to the transformation
R = T_1(W_1, W_2) = √(W_1² + W_2²)   (6.5-10)
Θ = T_2(W_1, W_2) = tan⁻¹(W_2/W_1)   (6.5-11)
Then, since
W_1 = T_1^{-1}(R, Θ) = R cos(Θ)   (6.5-12)
W_2 = T_2^{-1}(R, Θ) = R sin(Θ)   (6.5-13)

we have
f_{W1,W2}(w_1, w_2) = 1/(2πσ²) e^{−[(w_1 − W̄_1)² + (w_2 − W̄_2)²]/(2σ²)}
so that
f_{R,Θ}(r, θ) = (r u(r)/(2πσ²)) exp{ −[ (r cos(θ) − W̄_1)² + (r sin(θ) − W̄_2)² ] / (2σ²) }   (6.5-14)
If we define
A_0 = √(W̄_1² + W̄_2²),  θ_0 = tan⁻¹(W̄_2 / W̄_1)
then (6.5-14) simplifies to
f_{R,Θ}(r, θ) = (r u(r)/(2πσ²)) exp{ −[ r² + A_0² − 2 r A_0 cos(θ − θ_0) ] / (2σ²) }
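
The two generation steps above translate directly into code. The sketch below (numpy only; variable names are mine) applies the Box-Muller transform (6.5-6) and then the lower-triangular [T] of (6.5-7)/(6.5-8), and prints the sample covariance matrix as a check against [C_W].

    # Minimal sketch of (6.5-6)-(6.5-8): Box-Muller followed by the lower-triangular [T].
    import numpy as np

    rng = np.random.default_rng(4)
    n = 500_000
    x1, x2 = rng.uniform(size=n), rng.uniform(size=n)

    # (6.5-6): two independent zero-mean, unit-variance Gaussians from two uniforms
    y1 = np.sqrt(-2.0 * np.log(x1)) * np.cos(2.0 * np.pi * x2)
    y2 = np.sqrt(-2.0 * np.log(x1)) * np.sin(2.0 * np.pi * x2)

    # (6.5-7)/(6.5-8): impose target variances and correlation coefficient
    sigma_w1, sigma_w2, rho_w = 2.0, 3.0, 0.8
    w1 = sigma_w1 * y1
    w2 = rho_w * sigma_w2 * y1 + sigma_w2 * np.sqrt(1.0 - rho_w**2) * y2

    print(np.cov(w1, w2))   # ~[[4.0, 4.8], [4.8, 9.0]] = [C_W]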

6.7 CONDITIONAL EXPECTED VALUES

Conditional Expectation

When X and Y are jointly distributed,
E[X | Y = y] = ∫ x f_{X|Y=y}(x) dx = ∫ x [f_{X,Y}(x, y) / f_Y(y)] dx,
which is a function of y.

Let T(y) ≜ E[X | Y = y]. Then, what is Z ≜ T(Y)? T(Y) is not denoted by E[X | Y = Y]; T(Y) is denoted by E[X | Y]. E[X | Y] is a random variable.

If X and Y are independent, then T(y) = E[X | Y = y] = E[X]. Thus, E[X | Y] = E[X] is a constant random variable.

The pdf of E[X | Y] can be found as we already learned. What is E[E[X | Y]]?

Total Expectation Theorem:
E[E[X | Y]] = ∫ E[X | Y = y] f_Y(y) dy
            = ∫ ( ∫ x [f_{X,Y}(x, y) / f_Y(y)] dx ) f_Y(y) dy
            = ∫∫ x f_{X,Y}(x, y) dx dy
            = E[X]   (6.7-1)

When X, Y, and Z are jointly distributed, the two functions E[Z | X = x] and E[E[Z | X, Y] | X = x] satisfy E[Z | X = x] = E[E[Z | X, Y] | X = x]. Why? Thus, the two random variables E[Z | X] and E[E[Z | X, Y] | X] satisfy E[Z | X] = E[E[Z | X, Y] | X].

Caution:
E[g(X, Y) | X = x] = E[g(x, Y) | X = x] ≠ E[g(x, Y)] in general.
E[g(X, Y) | X] is a random variable.
E[E[g(X, Y) | X]] = E[g(X, Y)]

If X and Y are zero-mean Gaussian random variables with correlation coefficient r, then
E[X² Y²] = E[X²]E[Y²] + 2 (E[XY])².
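
The total expectation theorem and the last identity are easy to check by simulation. The sketch below uses a toy hierarchical model of my own choosing (Y ~ N(1, 1), X | Y = y ~ N(2y, 1)) so that E[X | Y] = 2Y is known in closed form.

    # Minimal sketch: checking E[E[X | Y]] = E[X] by simulation.
    import numpy as np

    rng = np.random.default_rng(5)
    y = rng.normal(1.0, 1.0, 300_000)
    x = rng.normal(2.0 * y, 1.0)          # draw X given each Y

    cond_mean = 2.0 * y                   # E[X | Y] as a random variable
    print(np.mean(cond_mean), np.mean(x)) # both ~2.0 = E[X]

    # Side check of the last identity: for zero-mean jointly Gaussian X, Y,
    # E[X^2 Y^2] = E[X^2] E[Y^2] + 2 (E[XY])^2.
    rho = 0.6
    xy = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=300_000)
    print(np.mean(xy[:, 0]**2 * xy[:, 1]**2), 1 + 2 * rho**2)  # both ~1.72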

6.8 SUMMARY

the theory of multiple random variables
a random vector
joint cdf, joint pdf
conditional cdf and conditional pdf for several random variables
statistical independence of random variables