
6 Multiple Random Variables

6.0 INTRODUCTION

For a single random variable (the scalar case) we have covered:
cdf, pdf
transformation of a random variable
conditional cdf, conditional pdf, total probability theorem
expectation of a random variable, total expectation theorem
characteristic function, moment generating function
inequalities

For a random vector (the vector case) we now ask:
cdf?, pdf?
transformation of a random vector?
conditional cdf, conditional pdf, total probability theorem?
expectation of a random vector, total expectation theorem?
characteristic function?, moment generating function?

Big Picture

Chapter 6. Two random variables
Chapter 7. Multiple random variables (combined with this note)
Chapter 7. Random sequence
Chapter 8. Statistics (take EECE645 Statistical Signal Processing)
Chapter 9. Concept of random process
Chapter 10. Time domain: correlation function
Chapter 11. Frequency domain: power spectral density

Chapter Outline

6-1 Bivariate distributions
6-2 One function of two random variables
6-3 Two functions of two random variables
6-4 Joint moments
6-5 Joint characteristic functions
6-6 Conditional distributions
6-7 Conditional expected values

6.1 VECTOR RANDOM VARIABLES

Q. What does it mean to say that there are two random variables?
Recall that, given (S, A, P), a random variable X(s) is a measurable function from S into R.
A. Given (S, A, P), there are two measurable functions X(s) and Y(s) from S into R.
Draw a block diagram with a source and a system with one input and two outputs.

Terminologies
a vector random variable = a random vector
a bivariate random variable = a random vector of length 2
a multivariate random variable = a random vector of length greater than or equal to 2
an N-dimensional random variable = an N-dimensional random vector

Roughly speaking, given a random experiment with a real-valued vector outcome, the vector as a random entity is called a random vector. (Caution: A specific outcome is a deterministic vector, not a random vector.)

Def. Given a probability space (S, A, P), a random vector X(s) of length N is a vector-valued measurable function
X(s) = [X_1(s), X_2(s), ..., X_N(s)]^T
from S into R^N. Draw a block diagram!!!!

Def. Random variables defined on the common sample space of a probability space are called jointly distributed random variables or, simply, joint random variables.

Given a fair-coin tossing problem, suppose that we have X(H) = 1, X(T) = −1, Y(H) = −1, and Y(T) = 1.
Q1. Do we have an identical CDF for X and Y? A.
Q2. Can we say that X = Y? A.

Terminologies
Def. X is a non-negative random variable if X(s) ≥ 0, ∀s ∈ S; X is non-negative everywhere/surely.
Def. X is non-negative almost surely/everywhere if Pr(X ≥ 0) = 1; X is non-negative with probability 1.
Def. X = Y surely/everywhere if X(s) = Y(s), ∀s ∈ S.
Def. X = Y almost surely/everywhere if P({s : X(s) = Y(s)}) = 1.
  X = Y a.s. ⟺ X = Y with probability 1 ⟺ P({X = Y}) = 1.
  X = Y a.s. ⟺ E[|X − Y|^p] = 0, for any p ≥ 1.
  X = Y a.s. ⟺ P(|X − Y| > ε) = 0, ∀ε > 0.
Def. X = Y in distribution if F_X(x) = F_Y(x), ∀x.

Given a fair-coin tossing problem, suppose that we have X_1(H) = 1, X_1(T) = −1, Y_1(H) = 1, and Y_1(T) = −1. How can we fully characterize these random variables?
A. Since there are only two outcomes {[X_1, Y_1] = [1, 1]} and {[X_1, Y_1] = [−1, −1]}, we just need to specify that Pr({[X_1, Y_1] = [1, 1]}) = 1/2 and Pr({[X_1, Y_1] = [−1, −1]}) = 1/2. Visualize the pmf.

Q. Suppose that we have X_2(H) = 1, X_2(T) = −1, Y_2(H) = −1, and Y_2(T) = 1. We have identical distributions for X_1 and X_2 and identical distributions for Y_1 and Y_2. Can we say that we have identical distributions for [X_1, Y_1] and [X_2, Y_2]? Visualize the pmf.
A.

Q. (special case) How can we fully characterize a random vector if the induced sample space is countable?
A. It is necessary and sufficient to assign probabilities to each possible outcome. Two N-D discrete random vectors have the same distribution iff the N-D probability mass functions are identical.

Q. (general case) How can we fully characterize a random vector if the induced sample space is not countable? Can we introduce something similar to the CDF/PDF of a random variable for a random vector?
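
To make the countable case concrete, here is a minimal Python sketch (the dictionary layout and the function name marginal are mine, not from the notes) that stores the joint PMF of the coin-toss pair [X_1, Y_1] above as a lookup table and recovers each marginal PMF by summing out the other coordinate.

    # Minimal sketch: joint PMF of a discrete random vector as a lookup table.
    # The pair [X1, Y1] from the coin example takes [1, 1] and [-1, -1], each w.p. 1/2.
    from collections import defaultdict

    joint_pmf = {(1, 1): 0.5, (-1, -1): 0.5}   # {(x, y): Pr([X1, Y1] = [x, y])}

    def marginal(joint, axis):
        """Sum the joint PMF over the other coordinate to get a marginal PMF."""
        m = defaultdict(float)
        for outcome, p in joint.items():
            m[outcome[axis]] += p
        return dict(m)

    print(marginal(joint_pmf, 0))   # {1: 0.5, -1: 0.5} = PMF of X1
    print(marginal(joint_pmf, 1))   # {1: 0.5, -1: 0.5} = PMF of Y1
    # [X2, Y2] = ([1, -1] or [-1, 1], each w.p. 1/2) has the same marginals but a
    # different joint PMF, which is exactly the point of the example above.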

Joint Distribution Function

The induced probability space of a random vector X of length N can be denoted by (R^N, B^N, P_X), where B^N is the Borel σ-algebra of R^N.

Consider a 2-D histogram when we have a random vector of length 2. Then, it is straightforward to introduce the notion of a 2-D PDF f_X(x) = f_{X1,X2}(x_1, x_2) of a random vector of length 2 as a limit of the normalized 2-D histogram. What about a 2-D CDF? Considering backward compatibility, we may want the 2-D PDF to be the 2-D derivative of the 2-D CDF, i.e.,

∂²F_{X,Y}(x, y) / ∂x ∂y = f_{X,Y}(x, y)
F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(x′, y′) dx′ dy′.

It turns out that this guess is true.

Consider events in the joint sample space
A = {X ≤ x} = {X ≤ x, Y ≤ ∞}
B = {Y ≤ y} = {X ≤ ∞, Y ≤ y}
(Figure 6.1-1)
Then, A ∩ B = {X ≤ x, Y ≤ y} is the joint event whose probability is defined as the joint CDF. Now, draw an X-Y plane and shade the set that corresponds to this joint event. How can you find the area? How can you find the probability of this event? What is it?

Figure 6.1-2

Def. The joint CDF of a random vector [X, Y] is defined as the probability of the joint event {X ≤ x, Y ≤ y}, i.e.,
F_{X,Y}(x, y) ≜ Pr{X ≤ x, Y ≤ y}

Lemma. (w/o proof) Any event B of interest in B^2 can be rewritten as a set operation on events of the form {X ≤ x, Y ≤ y}.

Proposition. The joint CDF of two discrete random variables can be written as
F_{X,Y}(x, y) = Σ_{m=1}^{M} Σ_{n=1}^{N} p_{X,Y}(x_m, y_n) u(x − x_m) u(y − y_n)
where p_{X,Y}(x_m, y_n) is called the joint PMF of [X, Y].

Def. For N random variables,
F_{X1,X2,...,XN}(x_1, x_2, ..., x_N) ≜ Pr{X_1 ≤ x_1, X_2 ≤ x_2, ..., X_N ≤ x_N}.

Theorem. A random vector X ≜ [X_1, X_2, ..., X_N] is fully characterized by its joint CDF F_X(x).

Figure 6.1-3

Properties of the Joint CDF

Properties of a joint distribution function for two random variables X and Y:
(1) F_{X,Y}(−∞, −∞) = 0, F_{X,Y}(−∞, y) = 0, F_{X,Y}(x, −∞) = 0   (6.1-1a)
(2) F_{X,Y}(∞, ∞) = 1   (6.1-1b)
(3) 0 ≤ F_{X,Y}(x, y) ≤ 1   (6.1-1c)
(4) F_{X,Y}(x, y) is a nondecreasing function of both x and y   (6.1-1d)
(5) F_{X,Y}(x_2, y_2) + F_{X,Y}(x_1, y_1) − F_{X,Y}(x_1, y_2) − F_{X,Y}(x_2, y_1) = P{x_1 < X ≤ x_2, y_1 < Y ≤ y_2} ≥ 0   (6.1-1e)
(6) F_{X,Y}(x, ∞) = F_X(x), F_{X,Y}(∞, y) = F_Y(y)   (6.1-1f)

Marginal CDF

Property 6: The distribution function of one random variable can be obtained by setting the other argument to ∞.
Note that {X ≤ x} = {X ≤ x, Y ≤ ∞}, {Y ≤ y} = {X ≤ ∞, Y ≤ y}.
Recall: F_X(x) = Pr{X ≤ x}, F_Y(y) = Pr{Y ≤ y}.
Thus,
F_{X,Y}(x, ∞) = P({s ∈ S : X(s) ≤ x, Y(s) ≤ ∞})
            = P({s ∈ S : X(s) ≤ x} ∩ {s ∈ S : Y(s) ≤ ∞})
            = P({s ∈ S : X(s) ≤ x} ∩ S)
            = P({s ∈ S : X(s) ≤ x})
            = F_X(x).
Similarly, F_{X,Y}(∞, y) = F_Y(y).

F_X(x) and F_Y(y) obtained from F_{X,Y}(x, y) by using Property 6 are called marginal CDFs. From an N-dimensional joint distribution function we may obtain a k-dimensional marginal distribution function.

Joint Density Function

The concept of a pdf of a random variable is extended to include multiple random variables.

Def. The joint probability density function (= the joint density function):
f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / ∂x ∂y

Lemma. If X and Y are discrete,
f_{X,Y}(x, y) = Σ_{m=1}^{M} Σ_{n=1}^{N} p_{X,Y}(x_m, y_n) δ(x − x_m) δ(y − y_n).

Def. For N random variables, the joint PDF is defined by
f_{X1,X2,...,XN}(x_1, x_2, ..., x_N) = ∂^N F_{X1,X2,...,XN}(x_1, x_2, ..., x_N) / ∂x_1 ∂x_2 ... ∂x_N,
which implies
F_{X1,X2,...,XN}(x_1, x_2, ..., x_N) = ∫_{−∞}^{x_N} ... ∫_{−∞}^{x_2} ∫_{−∞}^{x_1} f_{X1,X2,...,XN}(ξ_1, ξ_2, ..., ξ_N) dξ_1 dξ_2 ... dξ_N

Properties of a Joint PDF

Properties of a joint density function:
(1) f_{X,Y}(x, y) ≥ 0   (6.1-2a)
(2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1   (6.1-2b)
(3) F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(ξ_1, ξ_2) dξ_1 dξ_2   (6.1-2c)
(4) F_X(x) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f_{X,Y}(ξ_1, ξ_2) dξ_2 dξ_1   (6.1-2d)
    F_Y(y) = ∫_{−∞}^{y} ∫_{−∞}^{∞} f_{X,Y}(ξ_1, ξ_2) dξ_1 dξ_2   (6.1-2e)
(5) P{x_1 < X ≤ x_2, y_1 < Y ≤ y_2} = ∫_{y_1}^{y_2} ∫_{x_1}^{x_2} f_{X,Y}(x, y) dx dy   (6.1-2f)
(6) f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy   (6.1-2g)
    f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx   (6.1-2h)

Properties 1 and 2 may be used as sufficient tests to determine whether some function can be a valid density function.
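
As a quick illustration of using Properties 1 and 2 as a validity test, the following Python sketch (the candidate density f(x, y) = x + y on the unit square is my own example) checks non-negativity on a grid and numerically integrates the function to confirm it equals 1.

    # Minimal sketch: numerically checking Properties 1 and 2 for a candidate
    # joint density f(x, y) = x + y on the unit square, 0 elsewhere.
    import numpy as np
    from scipy.integrate import dblquad

    def f_xy(x, y):
        return (x + y) if (0 <= x <= 1 and 0 <= y <= 1) else 0.0

    # Property 1: f_xy >= 0 (checked on a grid here, not a proof)
    grid = np.linspace(0, 1, 101)
    assert all(f_xy(x, y) >= 0 for x in grid for y in grid)

    # Property 2: the density must integrate to 1 over the plane
    total, _ = dblquad(lambda y, x: f_xy(x, y), 0, 1, 0, 1)
    print(total)   # ~1.0, so f_xy can be a valid joint density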

Marginal Density Functions

marginal probability density functions = marginal density functions
f_X(x) = dF_X(x)/dx,  f_Y(y) = dF_Y(y)/dy

For N random variables, a k-dimensional marginal density function is
f_{X1,X2,...,Xk}(x_1, x_2, ..., x_k) = ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f_{X1,X2,...,XN}(x_1, x_2, ..., x_N) dx_{k+1} dx_{k+2} ... dx_N
There are (N choose k) k-dimensional marginal PDFs.
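
A marginal density can also be obtained numerically. The sketch below (my own example, using a bivariate Gaussian with ρ = 0.6) integrates the joint density over y and compares the result with the standard normal density that the marginal of X should equal.

    # Minimal sketch: obtaining a marginal density by integrating out the other variable.
    import numpy as np
    from scipy.stats import multivariate_normal, norm
    from scipy.integrate import quad

    rho = 0.6
    joint = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

    def f_X(x):
        # f_X(x) = integral over y of f_{X,Y}(x, y) dy
        val, _ = quad(lambda y: joint.pdf([x, y]), -np.inf, np.inf)
        return val

    for x in [-1.0, 0.0, 2.0]:
        print(x, f_X(x), norm.pdf(x))   # the two values should agree closely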

6.6 CONDITIONAL DISTRIBUTIONS

Review: the conditional cdf of X given B with P(B) ≠ 0, and the corresponding conditional pdf:
F_X(x | B) = P(X ≤ x | B) = P({X ≤ x} ∩ B) / P(B)
f_X(x | B) = dF_X(x | B) / dx
e.g., B = {X ≤ a}, B = {a < X ≤ b}, B = {X = a}

Now, we have two jointly distributed random variables X and Y. What if B = {Y ≤ a}, B = {a < Y ≤ b}, B = {Y = a}? The results must be consistent with our previous results with Y = X.

*Conditional Distribution and Density: Interval Conditioning

interval conditioning: X and Y either continuous or discrete, B = {y_a < Y ≤ y_b}

F_X(x | y_a < Y ≤ y_b) = [F_{X,Y}(x, y_b) − F_{X,Y}(x, y_a)] / [F_Y(y_b) − F_Y(y_a)]
                       = ∫_{y_a}^{y_b} ∫_{−∞}^{x} f_{X,Y}(ξ, y) dξ dy / ∫_{y_a}^{y_b} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy   (6.6-1)

f_X(x | y_a < Y ≤ y_b) = ∫_{y_a}^{y_b} f_{X,Y}(x, y) dy / ∫_{y_a}^{y_b} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy

Figure 6.6-1

Conditional Distribution and Density: Point Conditioning

the conditional pdf of X given {Y = y}
Distinguish a constant random variable Y = 3 from an event {Y = 3}!

point conditioning: B = {y − Δy < Y ≤ y + Δy}
F_X(x | y − Δy < Y ≤ y + Δy) = ∫_{y−Δy}^{y+Δy} ∫_{−∞}^{x} f_{X,Y}(ξ_1, ξ_2) dξ_1 dξ_2 / ∫_{y−Δy}^{y+Δy} f_Y(ξ) dξ

We consider only two cases.

Case 1: X and Y are both discrete ⟹ Pr(Y = y_k) > 0
f_{X,Y}(x, y) = Σ_{i=1}^{N} Σ_{j=1}^{M} P(x_i, y_j) δ(x − x_i) δ(y − y_j)
f_Y(y) = Σ_{j=1}^{M} P(y_j) δ(y − y_j)
F_X(x | Y = y_k) = Σ_{i=1}^{N} [P(x_i, y_k) / P(y_k)] u(x − x_i)
f_X(x | Y = y_k) = Σ_{i=1}^{N} [P(x_i, y_k) / P(y_k)] δ(x − x_i)

the conditional probability of event A given {Y = y}:
P(A | Y = y) = P(A ∩ {Y = y}) / P(Y = y)

Case 2: X and Y are both continuous ⟹ Pr(Y = y_k) = 0
F_X(x | y − Δy < Y ≤ y + Δy) ≈ [∫_{−∞}^{x} f_{X,Y}(ξ_1, y) dξ_1 · 2Δy] / [f_Y(y) · 2Δy]
F_X(x | Y = y) = ∫_{−∞}^{x} f_{X,Y}(ξ, y) dξ / f_Y(y)
f_X(x | Y = y) = f_{X,Y}(x, y) / f_Y(y)

Notation:
f_{X|Y}(x, y) = f_{X,Y}(x, y) / f_Y(y)
f_{Y|X}(y, x) = f_{X,Y}(x, y) / f_X(x)

the conditional probability of event A given {Y = y} (continuous Y):
P(A | Y = y) = P(A ∩ {Y = y}) / f_Y(y)

Figure 6.6-2

Bayes Theorems for joint random variables

X continuous, Y continuous:
f_{X|Y}(x, y) = f_{Y|X}(y, x) f_X(x) / ∫ f_{Y|X}(y, x) f_X(x) dx

X continuous, Y discrete:
f_{X|Y}(x, y) = Pr(Y = y | X = x) f_X(x) / ∫ Pr(Y = y | X = x) f_X(x) dx

X discrete, Y continuous:
Pr(X = x | Y = y) = f_{Y|X}(y, x) Pr(X = x) / Σ_x f_{Y|X}(y, x) Pr(X = x)

X discrete, Y discrete: ...
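
A small worked instance of the mixed case (X discrete, Y continuous) may help; the setup below is my own toy model, not from the notes: X is a bit in {−1, +1} with equal priors, Y = X + N with Gaussian noise N, and the posterior Pr(X = x | Y = y) is computed exactly as in the third formula above.

    # Minimal sketch: Bayes theorem with X discrete and Y continuous.
    # X in {-1, +1} with Pr(X = 1) = 0.5, and Y = X + N where N ~ N(0, sigma^2),
    # so f_{Y|X}(y, x) is a Gaussian density centered at x.
    from scipy.stats import norm

    sigma = 1.0
    prior = {+1: 0.5, -1: 0.5}

    def posterior(y):
        """Pr(X = x | Y = y) = f_{Y|X}(y, x) Pr(X = x) / sum_x f_{Y|X}(y, x) Pr(X = x)."""
        likelihood = {x: norm.pdf(y, loc=x, scale=sigma) for x in prior}
        evidence = sum(likelihood[x] * prior[x] for x in prior)
        return {x: likelihood[x] * prior[x] / evidence for x in prior}

    print(posterior(0.0))   # symmetric observation: posterior stays 0.5 / 0.5
    print(posterior(1.2))   # observation near +1: Pr(X = +1 | Y = 1.2) > 0.5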

STATISTICAL INDEPENDENCE

statistical independence of events vs. statistical independence of random variables

Review. Two events A and B are statistically independent if P(A ∩ B) = P(A)P(B).

Def. Two random variables X and Y are statistically independent if
P{X ≤ x, Y ≤ y} = P{X ≤ x} P{Y ≤ y}, ∀x, y
Equivalently,
F_{X,Y}(x, y) = F_X(x) F_Y(y), ∀x, y
f_{X,Y}(x, y) = f_X(x) f_Y(y), ∀x, y

F_X(x | Y ≤ y) = P{X ≤ x, Y ≤ y} / P{Y ≤ y} = F_{X,Y}(x, y) / F_Y(y) = F_X(x) F_Y(y) / F_Y(y) = F_X(x), ∀x, y
F_Y(y | X ≤ x) = F_Y(y), ∀x, y
f_X(x | Y ≤ y) = f_X(x), ∀x, y
f_Y(y | X ≤ x) = f_Y(y), ∀x, y

Any event in terms of X and any other event in terms of Y are always independent.

Statistical Independence of N Random Variables

Def. For any M, N ∈ ℕ and for any indices k_1, k_2, ..., k_M and l_1, l_2, ..., l_N without any repetition, an event in terms of X_{k1}, X_{k2}, ..., X_{kM} and an event in terms of X_{l1}, X_{l2}, ..., X_{lN} are independent events.

If X_1, X_2, ..., X_N are statistically independent, then any group of these random variables is independent of any other group. A function of any group is independent of any function of any other group of the random variables.

A_i = {X_i ≤ x_i}, i = 1, 2, ..., N

Theorem. X_1, X_2, ..., X_N are statistically independent iff
F_{X1,X2,...,XN}(x_1, x_2, ..., x_N) = Π_{n=1}^{N} F_{Xn}(x_n)
Equivalently,
f_{X1,X2,...,XN}(x_1, x_2, ..., x_N) = Π_{n=1}^{N} f_{Xn}(x_n)

Q. What is the difference from the definition of N independent events?

6.2 ONE FUNCTION OF TWO RANDOM VARIABLES

a function of two (statistically independent or dependent) random variables: how to find the cdf and the pdf?

General One Function Case: Y = g(X_1, X_2, ..., X_N)

F_Y(y) = P{g(X_1, X_2, ..., X_N) ≤ y} = ∫...∫_{{g(x_1, x_2, ..., x_N) ≤ y}} f_{X1,X2,...,XN}(x_1, x_2, ..., x_N) dx_1 dx_2 ... dx_N

f_Y(y) = dF_Y(y)/dy = d/dy ∫...∫_{{g(x_1, x_2, ..., x_N) ≤ y}} f_{X1,X2,...,XN}(x_1, x_2, ..., x_N) dx_1 dx_2 ... dx_N

Special One Function Case: Sum of Two Random Variables

Q. Given two jointly distributed random variables X and Y, find the cdf and the pdf of X + Y.

Case I: When X and Y are dependent. We use the result for the general case, now with g(X, Y) = X + Y. The key is to set W = X + Y. Then, the cdf of W is given by F_W(w) = Pr(W ≤ w) = Pr(X + Y ≤ w). Let's visualize the event whose probability is to be computed.

Figure 6.2-1

Thus, the CDF of W can be rewritten in terms of the joint PDF of X and Y as
F_W(w) = ∫_{−∞}^{∞} ∫_{x=−∞}^{w−y} f_{X,Y}(x, y) dx dy
By differentiating using Leibniz's rule, we have the PDF of W as
f_W(w) = ∫_{−∞}^{∞} f_{X,Y}(w − y, y) dy

Case II: When X and Y are independent.
The CDF of W can be rewritten in terms of the joint PDF of X and Y as
F_W(w) = ∫_{−∞}^{∞} ∫_{x=−∞}^{w−y} f_X(x) f_Y(y) dx dy = ∫_{−∞}^{∞} f_Y(y) ∫_{x=−∞}^{w−y} f_X(x) dx dy   (6.2-1)
The PDF of W is given by
f_W(w) = ∫_{−∞}^{∞} f_X(w − y) f_Y(y) dy = f_X(w) ∗ f_Y(w)

An alternate derivation:
f_W(w) = ∫_{−∞}^{∞} f_{W,Y}(w, y) dy = ∫_{−∞}^{∞} f_{W|Y=y}(w) f_Y(y) dy = ∫_{−∞}^{∞} f_X(w − y) f_Y(y) dy = f_X(w) ∗ f_Y(w)

The density function of the sum of two statistically independent random variables is the convolution of their individual density functions.

Figure 6.2-2
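
The convolution statement is easy to check numerically. In the sketch below (my own example) X and Y are independent Uniform(0, 1) random variables, so f_W should be the triangular density on (0, 2); a histogram of sampled sums is compared with the discretized convolution f_X ∗ f_Y.

    # Minimal sketch: for independent X, Y the density of W = X + Y is f_X * f_Y.
    # Here X, Y ~ Uniform(0, 1), so f_W is triangular on (0, 2).
    import numpy as np

    rng = np.random.default_rng(0)
    n, dx = 200_000, 0.01

    # Monte Carlo estimate of f_W from samples of X + Y
    w = rng.uniform(0, 1, n) + rng.uniform(0, 1, n)
    hist, edges = np.histogram(w, bins=np.arange(0, 2 + dx, dx), density=True)

    # Numerical convolution of the two discretized densities
    x = np.arange(0, 1, dx)
    f_x = np.ones_like(x)             # Uniform(0, 1) density
    f_w = np.convolve(f_x, f_x) * dx  # length 2*len(x) - 1, spacing dx

    print(hist[50], f_w[50])    # both ~0.5 near w = 0.5
    print(hist[100], f_w[100])  # both ~1.0 near w = 1.0 (peak of the triangle)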

Q. What if X and Y are jointly Gaussian and Z = X + Y? Z is Gaussian.

Q. What about X − Y? Hint: X − Y = X + (−Y), and f_{−Y}(y) = f_Y(−y).

Q. What about X/Y? If X and Y are jointly normal, then X/Y has a Cauchy density centered at rσ_1/σ_2.

Q. What about X² + Y²? If X and Y are i.i.d. normal with zero mean, then X² + Y² has an exponential density.

Q. What about √(X² + Y²)? If X and Y are i.i.d. normal with zero mean, then √(X² + Y²) has a Rayleigh density. If X and Y are independent normal with the same variance, then √(X² + Y²) has a Ricean/Rician density.

Order Statistics

Def. Given X_1, X_2, ..., X_N, we can rearrange them in increasing order of magnitude such that X_(1) ≤ X_(2) ≤ ... ≤ X_(N). Then, X_(k) is called the kth-order statistic.

As X_1, X_2, ..., X_N are dependent in general, X_(1), X_(2), ..., X_(N) are also dependent in general. X_(k) is a nonlinear operation on [X_1, X_2, ..., X_N]. X_(1) = min(X_1, X_2, ..., X_N) and X_(N) = max(X_1, X_2, ..., X_N) are representative ones.

For N = 2, sketch the events {min(X, Y) ≤ w} and {max(X, Y) ≤ w}. What if X and Y are independent?

Discrete Case: case with X, Y ∈ ℤ, case with ...
Q. What if X and Y are independent Poisson and Z = X + Y? Z is Poisson.
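
For N = 2 and independent X and Y, the sketched events give Pr(max(X, Y) ≤ w) = F_X(w)F_Y(w) and Pr(min(X, Y) ≤ w) = 1 − (1 − F_X(w))(1 − F_Y(w)). The following Monte Carlo sketch (my own check, with uniform X and Y) verifies both.

    # Minimal sketch: CDFs of min and max of two independent random variables,
    # verified by simulation with X, Y ~ Uniform(0, 1), where F(w) = w on [0, 1].
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 1, 500_000)
    y = rng.uniform(0, 1, 500_000)

    w = 0.7
    print(np.mean(np.maximum(x, y) <= w), w * w)                  # ~0.49 both
    print(np.mean(np.minimum(x, y) <= w), 1 - (1 - w) * (1 - w))  # ~0.91 both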

6.3 TWO FUNCTIONS OF TWO RANDOM VARIABLES

Q. Given X = [X_1, X_2]^T, find the joint CDF and PDF of Y = [g_1(X_1, X_2), g_2(X_1, X_2)]^T.

CDF first:
F_{Y1,Y2}(y_1, y_2) = Pr(Y_1 ≤ y_1, Y_2 ≤ y_2)
                    = Pr(g_1(X_1, X_2) ≤ y_1, g_2(X_1, X_2) ≤ y_2)
                    = ∫∫_{{(x_1, x_2) : g_1(x_1, x_2) ≤ y_1, g_2(x_1, x_2) ≤ y_2}} f_{X1,X2}(x_1, x_2) dx_1 dx_2   (6.3-1)
Then, PDF by differentiating...

Example 6-21: [min(X_1, X_2), max(X_1, X_2)]

Example 6-22: Cartesian coordinates to polar coordinates
R = √(X² + Y²), Θ = tan⁻¹(Y/X)
f_{R,Θ}(r, θ) = f_{X,Y}(r cos θ, r sin θ) · r
To show: Q(x) ≤ (1/2) e^{−x²/2}, x ≥ 0.

Multiple Functions

Q. Find the joint density function of a set of functions that defines a set of random variables:
Y_i = g_i(X_1, X_2, ..., X_N), i = 1, 2, ..., N
Note that N random variables are mapped to N random variables. What if N random variables are mapped to M < N random variables? Introduce auxiliary random variables, then find an Mth-order marginal PDF.

Backward compatibility: Recall that
f_Y(y) = Σ_i f_X(x_i) / |dg(x)/dx|_{x = x_i}
where y = g(x_i).

General Case from 2 r.v. to 2 r.v.: When [Z, W] = [g(X, Y), h(X, Y)],
f_{Z,W}(z, w) = Σ_i f_{X,Y}(x_i, y_i) / |J(x_i, y_i)|
where z = g(x_i, y_i), w = h(x_i, y_i), and J(x_i, y_i) is the Jacobian (determinant) of the original transformation, defined as
J(x_i, y_i) = det([∂g/∂x, ∂g/∂y; ∂h/∂x, ∂h/∂y]).
The Jacobian can be negative, so we take its absolute value. It is assumed that g and h are ... differentiable.
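
As a small illustration of the Jacobian in the 2-to-2 case, the sketch below computes the Jacobian determinant of the inverse polar transform x = r cos θ, y = r sin θ symbolically; the result r is exactly the factor appearing in Example 6-22 above.

    # Minimal sketch: Jacobian of the inverse Cartesian-to-polar transform with sympy.
    import sympy as sp

    r, theta = sp.symbols('r theta', positive=True)
    x = r * sp.cos(theta)
    y = r * sp.sin(theta)

    J = sp.Matrix([[sp.diff(x, r), sp.diff(x, theta)],
                   [sp.diff(y, r), sp.diff(y, theta)]])
    print(sp.simplify(J.det()))   # r, the factor in f_{R,Theta}(r, theta)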

Special Case from N r.v. to N r.v.

invertible: It is assumed that a set of continuous inverse functions T_j^{-1} exists such that
X_j = T_j^{-1}(Y_1, Y_2, ..., Y_N), j = 1, 2, ..., N.
Then,
∫...∫_{R_X} f_{X1,...,XN}(x_1, ..., x_N) dx_1 ... dx_N = ∫...∫_{R_Y} f_{Y1,...,YN}(y_1, ..., y_N) dy_1 ... dy_N.

differentiable: The Jacobian of the transformation is the determinant of a matrix of derivatives,
J(y) = det[ ∂T_j^{-1}/∂y_k ]  (the N×N matrix with (j, k) entry ∂T_j^{-1}/∂y_k).
Thus,
∫...∫_{R_X} f_{X1,...,XN}(x_1, ..., x_N) dx_1 ... dx_N = ∫...∫_{R_Y} f_{X1,...,XN}(x_1 = T_1^{-1}, ..., x_N = T_N^{-1}) |J| dy_1 ... dy_N,
which implies
f_Y(y) = f_X(T^{-1}(y)) |J(y)|

6.4 JOINT MOMENTS

expectation for two or more random variables: moments, characteristic functions, moment generating functions

EXPECTED VALUE OF A FUNCTION OF RANDOM VARIABLES

Recall that if Z = g(X, Y), then E[Z] = ∫ z f_Z(z) dz.

(Fundamental Theorem of Expectation) For multiple random variables,
E[g(X, Y)] = ∫∫ g(x, y) f_{X,Y}(x, y) dx dy
E[g(X_1, ..., X_N)] = ∫...∫ g(x_1, ..., x_N) f_{X1,...,XN}(x_1, ..., x_N) dx_1 ... dx_N

Backward compatibility: if g(X_1, ..., X_N) = g(X_1), then
E[g(X_1)] = ∫ g(x_1) f_{X1}(x_1) dx_1

Special Cases:
E{X + Y} = E{X} + E{Y} always.
E{Σ_i a_i X_i} = Σ_i a_i E{X_i} always.
E{XY} ≠ E{X}E{Y} in general. If X and Y are independent, then E{XY} = E{X}E{Y}.

Joint Moments about the Origin

Def. joint moments about the origin = joint non-central moments
m_{nk} = E[X^n Y^k] = ∫∫ x^n y^k f_{X,Y}(x, y) dx dy
the first-order moments, the second-order moments, ...

the correlation of X and Y:
R_{XY} = m_{11} = E[XY] = ∫∫ x y f_{X,Y}(x, y) dx dy

Def. X and Y are uncorrelated if
R_{XY} = E{XY} = E{X}E{Y}
statistical independence of X and Y ⟹ uncorrelatedness. The converse is not true in general. For Gaussian random variables, the converse is true. Why? Correlation does not imply causation.

Def. X and Y are orthogonal if R_{XY} = E{XY} = 0.
Zero-mean X and Y are orthogonal iff uncorrelated.

Joint Central Moments

Def. joint central moments
µ_{nk} = E[(X − X̄)^n (Y − Ȳ)^k] = ∫∫ (x − X̄)^n (y − Ȳ)^k f_{X,Y}(x, y) dx dy

the second-order central moments: variances
µ_{20} = E[(X − X̄)²] = σ_X² = C_XX
µ_{02} = E[(Y − Ȳ)²] = σ_Y² = C_YY

the covariance of X and Y:
C_XY = µ_{11} = E[(X − X̄)(Y − Ȳ)] = ∫∫ (x − X̄)(y − Ȳ) f_{X,Y}(x, y) dx dy
C_XY = R_XY − X̄Ȳ = R_XY − E[X]E[Y]
C_XY = 0 iff X and Y are uncorrelated
C_XY = −E[X]E[Y] iff X and Y are orthogonal
If X and Y are orthogonal and either X or Y has zero mean value, then C_XY = 0.

the normalized second-order moment:

ρ ≜ µ_{11} / √(µ_{20} µ_{02}) = C_XY / (σ_X σ_Y)   (6.4-1a)
ρ = E[ (X − X̄)(Y − Ȳ) / (σ_X σ_Y) ]
ρ = C_XY / √(C_XX C_YY)   (6.4-1b)
ρ = the correlation coefficient of X and Y

−1 ≤ ρ ≤ 1. Why? Consider E[(aX + Y)²] ≥ 0, ∀a, and its discriminant.

Examples in Wikipedia.

If the X_i are pairwise uncorrelated random variables, then
Var{Σ_i a_i X_i} = Σ_i a_i² Var{X_i}.
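
The covariance and correlation coefficient are easy to estimate from data. The sketch below (my own example, with Y = 0.5X + noise so that ρ = 0.5/√1.25 ≈ 0.447) forms the sample covariance matrix and normalizes it as in (6.4-1a).

    # Minimal sketch: estimating C_XY and rho from samples.
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(0.0, 1.0, 200_000)
    y = 0.5 * x + rng.normal(0.0, 1.0, 200_000)

    c = np.cov(x, y)            # 2x2 sample covariance [[C_XX, C_XY], [C_XY, C_YY]]
    rho = c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])
    print(rho, np.corrcoef(x, y)[0, 1])   # both ~0.447 = 0.5/sqrt(1.25)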

6.5 JOINT CHARACTERISTIC FUNCTIONS

Def. The joint characteristic function of two random variables X and Y, where ω_1 and ω_2 are real numbers:
Φ_{X,Y}(ω_1, ω_2) = E[e^{jω_1 X + jω_2 Y}] = ∫∫ f_{X,Y}(x, y) e^{jω_1 x + jω_2 y} dx dy
the two-dimensional Fourier transform (with signs of ω_1 and ω_2 reversed) of the joint density function

f_{X,Y}(x, y) = (1/(2π)²) ∫∫ Φ_{X,Y}(ω_1, ω_2) e^{−jω_1 x − jω_2 y} dω_1 dω_2

marginal characteristic functions:
Φ_X(ω_1) = Φ_{X,Y}(ω_1, 0)
Φ_Y(ω_2) = Φ_{X,Y}(0, ω_2)

Joint moments can be found from the joint characteristic function:
m_{nk} = (−j)^{n+k} ∂^{n+k} Φ_{X,Y}(ω_1, ω_2) / ∂ω_1^n ∂ω_2^k |_{ω_1 = 0, ω_2 = 0}

useful where the probability density function is needed for the sum of N statistically independent random variables
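
The joint characteristic function can be estimated directly as a sample average of e^{j(ω_1 X + ω_2 Y)}. The sketch below (my own example) does this for a zero-mean jointly Gaussian pair and compares against the known closed form exp(−½ ωᵀ[C]ω) for that case.

    # Minimal sketch: empirical joint characteristic function vs. the zero-mean
    # jointly Gaussian closed form exp(-0.5 * w^T C w).
    import numpy as np

    rng = np.random.default_rng(3)
    C = np.array([[1.0, 0.6], [0.6, 2.0]])
    xy = rng.multivariate_normal([0.0, 0.0], C, size=400_000)

    w = np.array([0.7, -0.3])
    phi_hat = np.mean(np.exp(1j * xy @ w))   # sample average of exp(j(w1 X + w2 Y))
    phi_true = np.exp(-0.5 * w @ C @ w)
    print(phi_hat, phi_true)   # close agreement (imaginary part of phi_hat ~ 0)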

Two JOINTLY GAUSSIAN Random Variables

Two random variables X and Y are jointly Gaussian if their joint pdf is of the form
f_{X,Y}(x, y) = 1 / (2π σ_X σ_Y √(1 − ρ²)) · exp{ −1/(2(1 − ρ²)) [ (x − X̄)²/σ_X² − 2ρ(x − X̄)(y − Ȳ)/(σ_X σ_Y) + (y − Ȳ)²/σ_Y² ] }
the bivariate Gaussian density, where
X̄ = E[X]   (6.5-1)
Ȳ = E[Y]   (6.5-2)
σ_X² = E[(X − X̄)²]   (6.5-3)
σ_Y² = E[(Y − Ȳ)²]   (6.5-4)
ρ = E[(X − X̄)(Y − Ȳ)] / (σ_X σ_Y)   (6.5-5)

f_{X,Y}(x, y) ≤ f_{X,Y}(X̄, Ȳ) = 1 / (2π σ_X σ_Y √(1 − ρ²))

The locus of constant values of f_{X,Y}(x, y) is an ellipse.

If ρ = 0, corresponding to uncorrelated X and Y, then f_{X,Y}(x, y) = f_X(x) f_Y(y), with
f_X(x) = 1/√(2πσ_X²) exp[ −(x − X̄)²/(2σ_X²) ]

f_Y(y) = 1/√(2πσ_Y²) exp[ −(y − Ȳ)²/(2σ_Y²) ]

Any uncorrelated jointly Gaussian random variables are also statistically independent.

A coordinate rotation (linear transformation of X and Y) through an angle
θ = (1/2) tan⁻¹[ 2ρσ_X σ_Y / (σ_X² − σ_Y²) ]
is sufficient to convert correlated random variables X and Y into two statistically independent Gaussian random variables.

Figure 6.5-1

COMPUTER GENERATION OF MULTIPLE RANDOM VARIABLES

Using two statistically independent random variables X_1 and X_2, both uniformly distributed on (0, 1), generate two statistically independent Gaussian random variables Y_1 and Y_2, each with zero mean and unit variance:
Y_1 = T_1(X_1, X_2) = √(−2 ln(X_1)) cos(2πX_2)   (6.5-6a)
Y_2 = T_2(X_1, X_2) = √(−2 ln(X_1)) sin(2πX_2)   (6.5-6b)
f_{Y1,Y2}(y_1, y_2) = (e^{−y_1²/2}/√(2π)) · (e^{−y_2²/2}/√(2π))

Using two statistically independent Gaussian random variables Y_1 and Y_2, each with zero mean and unit variance, generate two Gaussian random variables W_1 and W_2 that have arbitrary variances and an arbitrary correlation coefficient:
[C_W] = [ σ_{W1}²              ρ_W σ_{W1} σ_{W2} ]
        [ ρ_W σ_{W1} σ_{W2}    σ_{W2}²           ] = [T][T]^t
To find [T], set [T] as a lower triangular matrix of the form
[T] = [ T_11   0    ]
      [ T_21   T_22 ]
(possible as long as [C_W] is nonsingular)
T_11 = σ_{W1}   (6.5-7a)
T_21 = ρ_W σ_{W2}   (6.5-7b)
T_22 = σ_{W2} √(1 − ρ_W²)   (6.5-7c)

Thus,
W_1 = T_11 Y_1 = σ_{W1} Y_1   (6.5-8a)
W_2 = T_21 Y_1 + T_22 Y_2 = ρ_W σ_{W2} Y_1 + σ_{W2} √(1 − ρ_W²) Y_2   (6.5-8b)
If arbitrary means are desired,
W_1 = W̄_1 + σ_{W1} Y_1   (6.5-9a)
W_2 = W̄_2 + ρ_W σ_{W2} Y_1 + σ_{W2} √(1 − ρ_W²) Y_2   (6.5-9b)
For N random variables, [T] can be found by the Cholesky method of factoring matrices.

Suppose two statistically independent Gaussian random variables W_1 and W_2, with respective means W̄_1 and W̄_2 and variances both equal to σ², are subject to the transformation
R = T_1(W_1, W_2) = √(W_1² + W_2²)   (6.5-10)
Θ = T_2(W_1, W_2) = tan⁻¹(W_2/W_1)   (6.5-11)
Then, since
W_1 = T_1^{-1}(R, Θ) = R cos(Θ)   (6.5-12)
W_2 = T_2^{-1}(R, Θ) = R sin(Θ)   (6.5-13)

we have
f_{W1,W2}(w_1, w_2) = 1/(2πσ²) e^{−[(w_1 − W̄_1)² + (w_2 − W̄_2)²]/(2σ²)}
so that
f_{R,Θ}(r, θ) = (r u(r)/(2πσ²)) exp{ −[ (r cos(θ) − W̄_1)² + (r sin(θ) − W̄_2)² ] / (2σ²) }   (6.5-14)
If we define
A_0 = √(W̄_1² + W̄_2²),  θ_0 = tan⁻¹(W̄_2 / W̄_1)
then (6.5-14) simplifies to
f_{R,Θ}(r, θ) = (r u(r)/(2πσ²)) exp{ −[ r² + A_0² − 2 r A_0 cos(θ − θ_0) ] / (2σ²) }
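
The two generation steps above translate directly into code. The sketch below (numpy only; variable names are mine) applies the Box-Muller transform (6.5-6) and then the lower-triangular [T] of (6.5-7)/(6.5-8), and prints the sample covariance matrix as a check against [C_W].

    # Minimal sketch of (6.5-6)-(6.5-8): Box-Muller followed by the lower-triangular [T].
    import numpy as np

    rng = np.random.default_rng(4)
    n = 500_000
    x1, x2 = rng.uniform(size=n), rng.uniform(size=n)

    # (6.5-6): two independent zero-mean, unit-variance Gaussians from two uniforms
    y1 = np.sqrt(-2.0 * np.log(x1)) * np.cos(2.0 * np.pi * x2)
    y2 = np.sqrt(-2.0 * np.log(x1)) * np.sin(2.0 * np.pi * x2)

    # (6.5-7)/(6.5-8): impose target variances and correlation coefficient
    sigma_w1, sigma_w2, rho_w = 2.0, 3.0, 0.8
    w1 = sigma_w1 * y1
    w2 = rho_w * sigma_w2 * y1 + sigma_w2 * np.sqrt(1.0 - rho_w**2) * y2

    print(np.cov(w1, w2))   # ~[[4.0, 4.8], [4.8, 9.0]] = [C_W]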

6.7 CONDITIONAL EXPECTED VALUES

Conditional Expectation

When X and Y are jointly distributed,
E[X | Y = y] = ∫ x f_{X|Y=y}(x) dx = ∫ x [f_{X,Y}(x, y) / f_Y(y)] dx,
which is a function of y.

Let T(y) ≜ E[X | Y = y]. Then, what is Z ≜ T(Y)? T(Y) is not denoted by E[X | Y = Y]; T(Y) is denoted by E[X | Y]. E[X | Y] is a random variable.

If X and Y are independent, then T(y) = E[X | Y = y] = E[X]. Thus, E[X | Y] = E[X] is a constant random variable.

The pdf of E[X | Y] can be found as we already learned. What is E[E[X | Y]]?

Total Expectation Theorem:
E[E[X | Y]] = ∫ E[X | Y = y] f_Y(y) dy
            = ∫ ( ∫ x [f_{X,Y}(x, y) / f_Y(y)] dx ) f_Y(y) dy
            = ∫∫ x f_{X,Y}(x, y) dx dy
            = E[X]   (6.7-1)

When X, Y, and Z are jointly distributed, the two functions E[Z | X = x] and E[E[Z | X, Y] | X = x] satisfy E[Z | X = x] = E[E[Z | X, Y] | X = x]. Why? Thus, the two random variables E[Z | X] and E[E[Z | X, Y] | X] satisfy E[Z | X] = E[E[Z | X, Y] | X].

Caution:
E[g(X, Y) | X = x] = E[g(x, Y) | X = x] ≠ E[g(x, Y)] in general.
E[g(X, Y) | X] is a random variable.
E[E[g(X, Y) | X]] = E[g(X, Y)]

If X and Y are zero-mean Gaussian random variables with correlation coefficient r, then
E[X² Y²] = E[X²]E[Y²] + 2 (E[XY])².
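
The total expectation theorem and the last identity are easy to check by simulation. The sketch below uses a toy hierarchical model of my own choosing (Y ~ N(1, 1), X | Y = y ~ N(2y, 1)) so that E[X | Y] = 2Y is known in closed form.

    # Minimal sketch: checking E[E[X | Y]] = E[X] by simulation.
    import numpy as np

    rng = np.random.default_rng(5)
    y = rng.normal(1.0, 1.0, 300_000)
    x = rng.normal(2.0 * y, 1.0)          # draw X given each Y

    cond_mean = 2.0 * y                   # E[X | Y] as a random variable
    print(np.mean(cond_mean), np.mean(x)) # both ~2.0 = E[X]

    # Side check of the last identity: for zero-mean jointly Gaussian X, Y,
    # E[X^2 Y^2] = E[X^2] E[Y^2] + 2 (E[XY])^2.
    rho = 0.6
    xy = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=300_000)
    print(np.mean(xy[:, 0]**2 * xy[:, 1]**2), 1 + 2 * rho**2)  # both ~1.72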

6.8 SUMMARY

the theory of multiple random variables
a random vector
joint cdf, joint pdf
conditional cdf and conditional pdf for several random variables
statistical independence of random variables