A Probability Review

Outline: a probability review.

Shorthand notation: RV stands for "random variable."
Reading: go over handouts 2-5 in the EE 420x notes.

Basic probability rules:

(1) $\Pr\{\Omega\} = 1$, $\Pr\{\emptyset\} = 0$, $0 \le \Pr\{A\} \le 1$;
    $\Pr\{\bigcup_{i=1}^{\infty} A_i\} = \sum_{i=1}^{\infty} \Pr\{A_i\}$ if $A_i \cap A_j = \emptyset$ for all $i \ne j$ ($A_i$ and $A_j$ disjoint);

(2) $\Pr\{A \cup B\} = \Pr\{A\} + \Pr\{B\} - \Pr\{A \cap B\}$, $\Pr\{A^c\} = 1 - \Pr\{A\}$;

(3) if $A$ and $B$ are independent, then $\Pr\{A \cap B\} = \Pr\{A\}\,\Pr\{B\}$;

(4) $\Pr\{A \mid B\} = \Pr\{A \cap B\} / \Pr\{B\}$ (conditional probability), or $\Pr\{A \cap B\} = \Pr\{A \mid B\}\,\Pr\{B\}$ (chain rule);
(5) $\Pr\{A\} = \Pr\{A \mid B_1\}\,\Pr\{B_1\} + \cdots + \Pr\{A \mid B_n\}\,\Pr\{B_n\}$ if $B_1, B_2, \ldots, B_n$ form a partition of the full space $\Omega$;

(6) Bayes' rule:
$$\Pr\{A \mid B\} = \frac{\Pr\{B \mid A\}\,\Pr\{A\}}{\Pr\{B\}}.$$
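The rules above are purely arithmetic, so a tiny numerical check is easy to write. Below is a minimal Python sketch (not from the notes; the probability values for the two-event partition are made up) that verifies rule (5) and then applies Bayes' rule (6).

```python
# Hypothetical example: partition {B1, B2} of the sample space and an event A.
p_B = [0.3, 0.7]              # Pr{B1}, Pr{B2}: assumed values
p_A_given_B = [0.9, 0.2]      # Pr{A | B1}, Pr{A | B2}: assumed values

# (5) total probability: Pr{A} = sum_i Pr{A | Bi} Pr{Bi}
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))

# (6) Bayes' rule: Pr{B1 | A} = Pr{A | B1} Pr{B1} / Pr{A}
p_B1_given_A = p_A_given_B[0] * p_B[0] / p_A

print(p_A)            # 0.9*0.3 + 0.2*0.7 = 0.41
print(p_B1_given_A)   # 0.27 / 0.41 = 0.6585...
```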
Reminder: Independence, Correlation, and Covariance

For simplicity, we state all the definitions for pdfs; the corresponding definitions for pmfs are analogous.

Two random variables $X$ and $Y$ are independent if
$$f_{X,Y}(x, y) = f_X(x)\,f_Y(y).$$

Correlation between real-valued random variables $X$ and $Y$:
$$E_{X,Y}\{X\,Y\} = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x\,y\,f_{X,Y}(x, y)\,dx\,dy.$$

Covariance between real-valued random variables $X$ and $Y$:
$$\mathrm{cov}_{X,Y}(X, Y) = E_{X,Y}\{(X - \mu_X)(Y - \mu_Y)\} = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} (x - \mu_X)(y - \mu_Y)\,f_{X,Y}(x, y)\,dx\,dy$$
where
$$\mu_X = E_X(X) = \int_{-\infty}^{+\infty} x\,f_X(x)\,dx, \qquad \mu_Y = E_Y(Y) = \int_{-\infty}^{+\infty} y\,f_Y(y)\,dy.$$

Uncorrelated random variables: random variables $X$ and $Y$ are uncorrelated if
$$c_{X,Y} = \mathrm{cov}_{X,Y}(X, Y) = 0. \qquad (1)$$

If $X$ and $Y$ are real-valued RVs, then (1) can be written as
$$E_{X,Y}\{X\,Y\} = E_X\{X\}\,E_Y\{Y\}.$$

Mean Vector and Covariance Matrix: consider a random vector
$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}.$$
The mean of this random vector is defined as
$$\mu_X = E_X\{X\} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_N \end{bmatrix} = \begin{bmatrix} E_{X_1}[X_1] \\ E_{X_2}[X_2] \\ \vdots \\ E_{X_N}[X_N] \end{bmatrix}.$$

Denote the covariance between $X_i$ and $X_k$, $\mathrm{cov}_{X_i,X_k}(X_i, X_k)$, by $c_{i,k}$; hence, the variance of $X_i$ is
$$c_{i,i} = \mathrm{cov}_{X_i,X_i}(X_i, X_i) = \mathrm{var}_{X_i}(X_i) = \sigma^2_{X_i}.$$

The covariance matrix of $X$ is defined as
$$C_X = \begin{bmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,N} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ c_{N,1} & c_{N,2} & \cdots & c_{N,N} \end{bmatrix}.$$

The above definitions apply to both real and complex vectors $X$.

Covariance matrix of a real-valued random vector $X$:
$$C_X = E_X\{(X - E_X[X])(X - E_X[X])^T\} = E_X[X X^T] - E_X[X]\,(E_X[X])^T.$$

For real-valued $X$, $c_{i,k} = c_{k,i}$ and, therefore, $C_X$ is a symmetric matrix.
Linear Transform of Random Vectors

Linear transform: for real-valued $Y$, $X$, and $A$,
$$Y = g(X) = A\,X.$$

Mean vector:
$$\mu_Y = E_X\{A\,X\} = A\,\mu_X. \qquad (2)$$

Covariance matrix:
$$C_Y = E_Y\{Y Y^T\} - \mu_Y \mu_Y^T = E_X\{A X X^T A^T\} - A \mu_X \mu_X^T A^T = A\,\underbrace{\big(E_X\{X X^T\} - \mu_X \mu_X^T\big)}_{C_X}\,A^T = A\,C_X\,A^T. \qquad (3)$$
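A short Monte Carlo sketch of (2) and (3): with an arbitrarily chosen mean, covariance, and transform matrix (assumed values, for illustration only), the sample mean and sample covariance of $Y = A X$ should approach $A \mu_X$ and $A C_X A^T$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example parameters (for illustration only)
mu_X = np.array([1.0, -2.0])
C_X = np.array([[2.0, 0.5],
                [0.5, 1.0]])
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])

X = rng.multivariate_normal(mu_X, C_X, size=200_000)   # one realization per row
Y = X @ A.T                                             # applies Y = A X to each realization

print(Y.mean(axis=0), A @ mu_X)                 # sample mean  vs  A mu_X       (2)
print(np.cov(Y, rowvar=False), A @ C_X @ A.T)   # sample cov   vs  A C_X A^T    (3)
```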
Reminder: Iterated Expectations

In general, we can find $E_{X,Y}[g(X, Y)]$ using iterated expectations:
$$E_{X,Y}[g(X, Y)] = E_Y\big\{E_{X\mid Y}[g(X, Y) \mid Y]\big\} \qquad (4)$$
where $E_{X\mid Y}$ denotes the expectation with respect to $f_{X\mid Y}(x \mid y)$ and $E_Y$ denotes the expectation with respect to $f_Y(y)$.

Proof.
$$
E_Y\big\{E_{X\mid Y}[g(X, Y) \mid Y]\big\}
= \int_{-\infty}^{+\infty} E_{X\mid Y}[g(X, Y) \mid y]\,f_Y(y)\,dy
= \int_{-\infty}^{+\infty} \Big( \int_{-\infty}^{+\infty} g(x, y)\,f_{X\mid Y}(x \mid y)\,dx \Big) f_Y(y)\,dy
$$
$$
= \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} g(x, y)\,f_{X\mid Y}(x \mid y)\,f_Y(y)\,dx\,dy
= \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} g(x, y)\,f_{X,Y}(x, y)\,dx\,dy
= E_{X,Y}[g(X, Y)].
$$
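Identity (4) can also be checked by simulation. The sketch below assumes a toy model (Y ~ N(0,1), X | Y = y ~ N(y, 1), and g(x, y) = x y) chosen only for illustration; the direct average of g and the average of the inner conditional expectation agree.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Assumed toy model: Y ~ N(0, 1), X | Y = y ~ N(y, 1), g(x, y) = x * y
Y = rng.standard_normal(n)
X = Y + rng.standard_normal(n)

direct = np.mean(X * Y)                 # E_{X,Y}[g(X, Y)]
# Inner expectation in closed form: E_{X|Y}[X Y | Y] = Y * E[X | Y] = Y**2
iterated = np.mean(Y**2)                # E_Y{ E_{X|Y}[g(X, Y) | Y] }

print(direct, iterated)                 # both approach E[Y^2] = 1
```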
Reminder: Law of Conditional Variances

Define the conditional variance of $X$ given $Y = y$ to be the variance of $X$ with respect to $f_{X\mid Y}(x \mid y)$, i.e.
$$\mathrm{var}_{X\mid Y}(X \mid Y = y) = E_{X\mid Y}\big[(X - E_{X\mid Y}[X \mid y])^2 \,\big|\, y\big] = E_{X\mid Y}[X^2 \mid y] - (E_{X\mid Y}[X \mid y])^2.$$

The random variable $\mathrm{var}_{X\mid Y}(X \mid Y)$ is a function of $Y$ only, taking values $\mathrm{var}_{X\mid Y}(X \mid Y = y)$. Its expected value with respect to $Y$ is
$$
E_Y\{\mathrm{var}_{X\mid Y}(X \mid Y)\} = E_Y\big\{E_{X\mid Y}[X^2 \mid Y] - (E_{X\mid Y}[X \mid Y])^2\big\}
\overset{\text{iterated exp.}}{=} E_{X,Y}[X^2] - E_Y\{(E_{X\mid Y}[X \mid Y])^2\}
= E_X[X^2] - E_Y\{(E_{X\mid Y}[X \mid Y])^2\}.
$$

Since $E_{X\mid Y}[X \mid Y]$ is a random variable (and a function of $Y$ only), it has variance:
$$
\mathrm{var}_Y\{E_{X\mid Y}[X \mid Y]\} = E_Y\{(E_{X\mid Y}[X \mid Y])^2\} - \big(E_Y\{E_{X\mid Y}[X \mid Y]\}\big)^2
\overset{\text{iterated exp.}}{=} E_Y\{(E_{X\mid Y}[X \mid Y])^2\} - (E_{X,Y}[X])^2
= E_Y\{(E_{X\mid Y}[X \mid Y])^2\} - (E_X[X])^2.
$$
Adding the above two expressions yields the law of conditional variances:
$$E_Y\{\mathrm{var}_{X\mid Y}(X \mid Y)\} + \mathrm{var}_Y\{E_{X\mid Y}[X \mid Y]\} = \mathrm{var}_X(X). \qquad (5)$$

Note: (4) and (5) hold for both real- and complex-valued random variables.
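A quick Monte Carlo illustration of (5), again with an assumed toy model (Y ~ N(0,1), X | Y = y ~ N(y, σ²) with σ = 2): the two terms on the left are σ² = 4 and var(Y) = 1, and their sum matches var_X(X) = 5.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Assumed toy model: Y ~ N(0, 1), X | Y = y ~ N(y, sigma^2) with sigma = 2
sigma = 2.0
Y = rng.standard_normal(n)
X = Y + sigma * rng.standard_normal(n)

# E_Y{var_{X|Y}(X|Y)} = sigma^2  and  var_Y{E_{X|Y}[X|Y]} = var(Y)
lhs = sigma**2 + np.var(Y)
print(lhs, np.var(X))        # both approach sigma^2 + 1 = 5
```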
Useful Expectation and Covariance Identities for Real-valued Random Variables and Vectors

$$E_{X,Y}[a\,X + b\,Y + c] = a\,E_X[X] + b\,E_Y[Y] + c$$
$$\mathrm{var}_{X,Y}(a\,X + b\,Y + c) = a^2\,\mathrm{var}_X(X) + b^2\,\mathrm{var}_Y(Y) + 2\,a\,b\,\mathrm{cov}_{X,Y}(X, Y)$$
where $a$, $b$, and $c$ are constants and $X$ and $Y$ are random variables.

A vector/matrix version of the above identities:
$$E_{X,Y}[A\,X + B\,Y + c] = A\,E_X[X] + B\,E_Y[Y] + c$$
$$\mathrm{cov}_{X,Y}(A\,X + B\,Y + c) = A\,\mathrm{cov}_X(X)\,A^T + B\,\mathrm{cov}_Y(Y)\,B^T + A\,\mathrm{cov}_{X,Y}(X, Y)\,B^T + B\,\mathrm{cov}_{Y,X}(Y, X)\,A^T$$
where $^T$ denotes a transpose and
$$\mathrm{cov}_{X,Y}(X, Y) = E_{X,Y}\{(X - E_X[X])(Y - E_Y[Y])^T\}.$$

Useful properties of cross-covariance matrices:
$$\mathrm{cov}_{X,Y,Z}(X, Y + Z) = \mathrm{cov}_{X,Y}(X, Y) + \mathrm{cov}_{X,Z}(X, Z)$$
$$\mathrm{cov}_{Y,X}(Y, X) = [\mathrm{cov}_{X,Y}(X, Y)]^T$$
$$\mathrm{cov}_X(X) = \mathrm{cov}_{X,X}(X, X), \qquad \mathrm{var}_X(X) = \mathrm{cov}_{X,X}(X, X)$$
$$\mathrm{cov}_{X,Y}(A\,X + b,\; P\,Y + q) = A\,\mathrm{cov}_{X,Y}(X, Y)\,P^T$$

(To refresh your memory about covariance and its properties, see p. 12 of handout 5 in the EE 420x notes. For random vectors, see handout 7 in the EE 420x notes, particularly pp. 1-15.)

Useful theorems:

(1) (handout 5 in the EE 420x notes)
$$E_X(X) = E_Y[E_{X\mid Y}(X \mid Y)] \qquad \text{(shown earlier, under Iterated Expectations)}$$
$$E_{X\mid Y}[g(X)\,h(Y) \mid y] = h(y)\,E_{X\mid Y}[g(X) \mid y]$$
$$E_{X,Y}[g(X)\,h(Y)] = E_Y\{h(Y)\,E_{X\mid Y}[g(X) \mid Y]\}$$

The vector version of (1) is the same: just replace the scalars with (bold) vectors.
(2) $\mathrm{var}_X(X) = E_Y[\mathrm{var}_{X\mid Y}(X \mid Y)] + \mathrm{var}_Y(E_{X\mid Y}[X \mid Y])$; and the vector/matrix version is
$$\underbrace{\mathrm{cov}_X(X)}_{\text{variance/covariance matrix of } X} = E_Y[\mathrm{cov}_{X\mid Y}(X \mid Y)] + \mathrm{cov}_Y(E_{X\mid Y}[X \mid Y])$$
(the scalar version was shown earlier, under Law of Conditional Variances).

(3) Generalized law of conditional variances:
$$\mathrm{cov}_{X,Y}(X, Y) = E_Z[\mathrm{cov}_{X,Y\mid Z}(X, Y \mid Z)] + \mathrm{cov}_Z(E_{X\mid Z}[X \mid Z],\, E_{Y\mid Z}[Y \mid Z])$$

(4) Transformation: if $Y = g(X)$ is one-to-one, with
$$Y_1 = g_1(X_1, \ldots, X_n), \;\; \ldots, \;\; Y_n = g_n(X_1, \ldots, X_n),$$
then
$$f_Y(y) = f_X\big(h_1(y), \ldots, h_n(y)\big)\,|J|$$
where $h(\cdot)$ is the unique inverse of $g(\cdot)$, i.e. $x = h(y)$, and $|J|$ is the absolute value of the Jacobian determinant
$$J = \det\Big(\frac{\partial x}{\partial y^T}\Big) = \det \begin{bmatrix} \dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \vdots & & \vdots \\ \dfrac{\partial x_n}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{bmatrix}.$$

Print and read the handout "Probability distributions" from the Course readings section on WebCT. Bring it with you to the midterm exams.
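For an invertible linear map Y = A X, rule (4) reduces to f_Y(y) = f_X(A^{-1} y) / |det A|. The sketch below (the 2x2 matrix A and the test point y are assumptions chosen for illustration) compares this change-of-variables formula against the directly known Gaussian density of Y.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Assumed invertible transform and test point (illustration only)
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
y = np.array([0.7, -1.2])

f_X = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))   # X ~ N(0, I)
x = np.linalg.solve(A, y)                                    # x = h(y) = A^{-1} y

# Change-of-variables formula: f_Y(y) = f_X(h(y)) * |J|, with |J| = 1 / |det A|
f_Y_formula = f_X.pdf(x) / abs(np.linalg.det(A))

# Direct density of Y = A X ~ N(0, A A^T), for comparison
f_Y_direct = multivariate_normal(mean=np.zeros(2), cov=A @ A.T).pdf(y)

print(f_Y_formula, f_Y_direct)     # the two values should match
```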
Jointly Gaussian Real-valued RVs

Scalar Gaussian random variable:
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}}\,\exp\Big[-\frac{(x - \mu_X)^2}{2\sigma_X^2}\Big].$$

Definition. Two real-valued RVs $X$ and $Y$ are jointly Gaussian if their joint pdf is of the form
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho_{X,Y}^2}}\,
\exp\Big\{-\frac{1}{2(1 - \rho_{X,Y}^2)}\Big[\frac{(x - \mu_X)^2}{\sigma_X^2} + \frac{(y - \mu_Y)^2}{\sigma_Y^2}
- \frac{2\rho_{X,Y}(x - \mu_X)(y - \mu_Y)}{\sigma_X\sigma_Y}\Big]\Big\}. \qquad (6)$$

This pdf is parameterized by $\mu_X$, $\mu_Y$, $\sigma_X^2$, $\sigma_Y^2$, and $\rho_{X,Y}$. Here, $\sigma_X = \sqrt{\sigma_X^2}$ and $\sigma_Y = \sqrt{\sigma_Y^2}$.

Note: we will soon define a more general multivariate Gaussian pdf.
If $X$ and $Y$ are jointly Gaussian, contours of equal joint pdf are ellipses defined by the quadratic equation
$$\frac{(x - \mu_X)^2}{\sigma_X^2} + \frac{(y - \mu_Y)^2}{\sigma_Y^2} - \frac{2\rho_{X,Y}(x - \mu_X)(y - \mu_Y)}{\sigma_X\sigma_Y} = \mathrm{const} \ge 0.$$

Examples: in the following examples, we plot contours of the joint pdf $f_{X,Y}(x, y)$ for zero-mean jointly Gaussian RVs for various values of $\sigma_X$, $\sigma_Y$, and $\rho_{X,Y}$.
[Figures: contour plots of $f_{X,Y}(x, y)$ for various values of $\sigma_X$, $\sigma_Y$, and $\rho_{X,Y}$.]
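A minimal matplotlib sketch in the spirit of those plots, with assumed values σ_X = 1, σ_Y = 2, ρ_{X,Y} = 0.8 (any other values can be substituted):

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed parameters for illustration: zero means, sigma_X = 1, sigma_Y = 2, rho = 0.8
sx, sy, rho = 1.0, 2.0, 0.8

x, y = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-6, 6, 200))
# Quadratic form whose level sets are the elliptical pdf contours
q = (x / sx)**2 + (y / sy)**2 - 2 * rho * x * y / (sx * sy)
pdf = np.exp(-q / (2 * (1 - rho**2))) / (2 * np.pi * sx * sy * np.sqrt(1 - rho**2))

plt.contour(x, y, pdf)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Contours of a zero-mean jointly Gaussian pdf")
plt.show()
```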
If $X$ and $Y$ are jointly Gaussian, the conditional pdfs are Gaussian as well, e.g.
$$X \mid \{Y = y\} \;\sim\; \mathcal{N}\Big(\rho_{X,Y}\,\sigma_X\,\frac{y - E_Y[Y]}{\sigma_Y} + E_X[X],\;\; (1 - \rho_{X,Y}^2)\,\sigma_X^2\Big). \qquad (7)$$

If $X$ and $Y$ are jointly Gaussian and uncorrelated, i.e. $\rho_{X,Y} = 0$, they are also independent.
Gaussian Random Vectors

Real-valued Gaussian random vector:
$$f_X(x) = \frac{1}{(2\pi)^{N/2}\,|C_X|^{1/2}}\,\exp\big[-\tfrac{1}{2}(x - \mu_X)^T C_X^{-1} (x - \mu_X)\big].$$

Complex-valued Gaussian random vector:
$$f_Z(z) = \frac{1}{\pi^N\,|C_Z|}\,\exp\big[-(z - \mu_Z)^H C_Z^{-1} (z - \mu_Z)\big].$$

Notation for real- and complex-valued Gaussian random vectors:
$$X \sim \mathcal{N}_r(\mu_X, C_X) \;\text{[or simply } \mathcal{N}(\mu_X, C_X)\text{]} \quad \text{(real)}, \qquad X \sim \mathcal{N}_c(\mu_X, C_X) \quad \text{(complex)}.$$

An affine transform of a Gaussian random vector is also a Gaussian random vector, i.e. if
$$Y = A\,X + b$$
then
$$Y \sim \mathcal{N}_r(A\,\mu_X + b,\; A\,C_X\,A^T) \quad \text{(real)}, \qquad Y \sim \mathcal{N}_c(A\,\mu_X + b,\; A\,C_X\,A^H) \quad \text{(complex)}.$$
The Gaussian random vector $W \sim \mathcal{N}_r(0, \sigma^2 I_n)$ (where $I_n$ denotes the identity matrix of size $n$) is called white; pdf contours of a white Gaussian random vector are spheres centered at the origin.

Suppose that $W[n],\; n = 0, 1, \ldots, N-1$ are independent, identically distributed (i.i.d.) zero-mean univariate Gaussian $\mathcal{N}(0, \sigma^2)$. Then, for $W = [W[0], W[1], \ldots, W[N-1]]^T$,
$$f_W(w) = \mathcal{N}(w \mid 0, \sigma^2 I).$$

Suppose now that, for these $W[n]$,
$$Y[n] = \theta + W[n]$$
where $\theta$ is a constant. What is the joint pdf of $Y[0], Y[1], \ldots, Y[N-1]$? This pdf is the pdf of the vector $Y = [Y[0], Y[1], \ldots, Y[N-1]]^T$:
$$Y = \mathbf{1}\,\theta + W$$
where $\mathbf{1}$ is an $N \times 1$ vector of ones. Now,
$$f_Y(y) = \mathcal{N}(y \mid \mathbf{1}\,\theta, \sigma^2 I).$$
Since $\theta$ is a constant, $f_Y(y) = f_{Y\mid\theta}(y \mid \theta)$.
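A short simulation sketch of the model Y = 1θ + W above; the values of θ, σ, and N are assumptions chosen for illustration. The sample mean and sample covariance of Y should be consistent with N(1θ, σ²I).

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed values for illustration
theta, sigma, N = 2.5, 1.0, 8
trials = 100_000

W = sigma * rng.standard_normal((trials, N))        # i.i.d. N(0, sigma^2) noise, one row per trial
Y = theta * np.ones(N) + W                          # Y = 1*theta + W for each trial

print(Y.mean(axis=0))                    # approaches theta * 1 (a vector of theta's)
print(np.cov(Y, rowvar=False).round(2))  # approaches sigma^2 * I
```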
Gaussian Random Vectors

A real-valued random vector $X = [X_1, X_2, \ldots, X_n]^T$ with mean $\mu$ and covariance matrix $\Sigma$ with determinant $|\Sigma| > 0$ (i.e. $\Sigma$ is positive definite) is a Gaussian random vector (or $X_1, X_2, \ldots, X_n$ are jointly Gaussian RVs) if and only if its joint pdf is
$$f_X(x) = \frac{1}{|2\pi\Sigma|^{1/2}}\,\exp\big[-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\big]. \qquad (8)$$

Verify that, for $n = 2$, this joint pdf reduces to the two-dimensional pdf in (6).

Notation: we use $X \sim \mathcal{N}(\mu, \Sigma)$ to denote a Gaussian random vector.

Since $\Sigma$ is positive definite, $\Sigma^{-1}$ is also positive definite and, for $x \ne \mu$,
$$(x - \mu)^T \Sigma^{-1} (x - \mu) > 0$$
which means that the contours of the multivariate Gaussian pdf in (8) are ellipsoids.

The Gaussian random vector $X \sim \mathcal{N}(0, \sigma^2 I_n)$ (where $I_n$ denotes the identity matrix of size $n$) is called white; contours of the pdf of a white Gaussian random vector are spheres centered at the origin.
Properties of Real-valued Gaussian Random Vectors

Property 1: For a Gaussian random vector, uncorrelatedness implies independence. This is easy to verify by setting $\Sigma_{i,j} = 0$ for all $i \ne j$ in the joint pdf; then $\Sigma$ becomes diagonal and so does $\Sigma^{-1}$, and the joint pdf reduces to the product of the marginal pdfs $f_{X_i}(x_i) = \mathcal{N}(\mu_i, \Sigma_{i,i}) = \mathcal{N}(\mu_i, \sigma^2_{X_i})$. Clearly, this property holds for blocks of RVs (subvectors) as well.

Property 2: A linear transform of a Gaussian random vector $X \sim \mathcal{N}(\mu_X, \Sigma_X)$ yields a Gaussian random vector:
$$Y = A\,X \sim \mathcal{N}(A\,\mu_X,\; A\,\Sigma_X\,A^T).$$

It is easy to show that $E_Y[Y] = A\,\mu_X$ and $\mathrm{cov}_Y(Y) = \Sigma_Y = A\,\Sigma_X\,A^T$:
$$E_Y[Y] = E_X[A\,X] = A\,E_X[X] = A\,\mu_X$$
and
$$\Sigma_Y = E_Y[(Y - E_Y[Y])(Y - E_Y[Y])^T] = E_X[(A X - A\mu_X)(A X - A\mu_X)^T] = A\,E_X[(X - \mu_X)(X - \mu_X)^T]\,A^T = A\,\Sigma_X\,A^T.$$
Of course, if we use the definition of a Gaussian random vector in (8), we cannot yet claim that $Y$ is a Gaussian random vector. (For a different definition of a Gaussian random vector, we would be done right here.)

Proof. We need to verify that the joint pdf of $Y$ indeed has the right form. Here, we take the equivalent (easier) task and verify that the characteristic function of $Y$ has the right form.

Definition. Suppose $X \sim f_X(x)$. Then the characteristic function of $X$ is given by
$$\Phi_X(\omega) = E_X[\exp(j\,\omega^T X)]$$
where $\omega$ is an $n$-dimensional real-valued vector and $j = \sqrt{-1}$. Thus
$$\Phi_X(\omega) = \int_{-\infty}^{+\infty}\!\cdots\!\int_{-\infty}^{+\infty} f_X(x)\,\exp(j\,\omega^T x)\,dx$$
which is proportional to the inverse multi-dimensional Fourier transform of $f_X(x)$; therefore, we can find $f_X(x)$ by taking the Fourier transform (with the appropriate proportionality factor):
$$f_X(x) = \frac{1}{(2\pi)^n} \int_{-\infty}^{+\infty}\!\cdots\!\int_{-\infty}^{+\infty} \Phi_X(\omega)\,\exp(-j\,\omega^T x)\,d\omega.$$
Example: the characteristic function of $X \sim \mathcal{N}(\mu, \sigma^2)$ is given by
$$\Phi_X(\omega) = \exp\big(-\tfrac{1}{2}\omega^2\sigma^2 + j\,\mu\,\omega\big) \qquad (9)$$
and, for a Gaussian random vector $Z \sim \mathcal{N}(\mu, \Sigma)$,
$$\Phi_Z(\omega) = \exp\big(-\tfrac{1}{2}\omega^T \Sigma\,\omega + j\,\omega^T \mu\big). \qquad (10)$$

Now, go back to our proof: the characteristic function of $Y = A\,X$ is
$$\Phi_Y(\omega) = E_Y[\exp(j\,\omega^T Y)] = E_X[\exp(j\,\omega^T A\,X)] = \exp\big(-\tfrac{1}{2}\omega^T A\,\Sigma_X A^T \omega + j\,\omega^T A\,\mu_X\big).$$
Thus, $Y = A\,X \sim \mathcal{N}(A\,\mu_X,\; A\,\Sigma_X\,A^T)$.
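Equation (9) is easy to spot-check by simulation: the sample average of exp(jωX) should approach exp(-½ω²σ² + jµω). The parameter values below are assumed, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed parameters for illustration
mu, sigma, omega = 1.0, 2.0, 0.7

X = mu + sigma * rng.standard_normal(1_000_000)
empirical = np.mean(np.exp(1j * omega * X))                       # E[exp(j*omega*X)]
formula = np.exp(-0.5 * omega**2 * sigma**2 + 1j * mu * omega)    # eq. (9)

print(empirical, formula)      # the two complex numbers should nearly agree
```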
Property 3: Marginals of a Gaussian random vector are Gaussian, i.e. if $X$ is a Gaussian random vector, then, for any $\{i_1, i_2, \ldots, i_k\} \subseteq \{1, 2, \ldots, n\}$,
$$Y = \begin{bmatrix} X_{i_1} \\ X_{i_2} \\ \vdots \\ X_{i_k} \end{bmatrix}$$
is a Gaussian random vector. To show this, we use Property 2.

Here is an example with $n = 3$ and $Y = \begin{bmatrix} X_1 \\ X_3 \end{bmatrix}$. We set
$$Y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}$$
thus
$$Y \sim \mathcal{N}\Big(\begin{bmatrix} \mu_1 \\ \mu_3 \end{bmatrix},\; \begin{bmatrix} \Sigma_{1,1} & \Sigma_{1,3} \\ \Sigma_{3,1} & \Sigma_{3,3} \end{bmatrix}\Big).$$
Here,
$$E_X\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
\quad\text{and}\quad
\mathrm{cov}_X\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} = \begin{bmatrix} \Sigma_{1,1} & \Sigma_{1,2} & \Sigma_{1,3} \\ \Sigma_{2,1} & \Sigma_{2,2} & \Sigma_{2,3} \\ \Sigma_{3,1} & \Sigma_{3,2} & \Sigma_{3,3} \end{bmatrix}$$
and note that
$$\begin{bmatrix} \mu_1 \\ \mu_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} \Sigma_{1,1} & \Sigma_{1,3} \\ \Sigma_{3,1} & \Sigma_{3,3} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \Sigma_{1,1} & \Sigma_{1,2} & \Sigma_{1,3} \\ \Sigma_{2,1} & \Sigma_{2,2} & \Sigma_{2,3} \\ \Sigma_{3,1} & \Sigma_{3,2} & \Sigma_{3,3} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}.$$

The converse of Property 3 does not hold in general; here is a counterexample.

Example: suppose $X_1 \sim \mathcal{N}(0, 1)$ and
$$X_2 = \begin{cases} +1, & \text{w.p. } \tfrac{1}{2} \\ -1, & \text{w.p. } \tfrac{1}{2} \end{cases}$$
are independent RVs, and consider $X_3 = X_1\,X_2$. Observe that $X_3 \sim \mathcal{N}(0, 1)$, yet $f_{X_1,X_3}(x_1, x_3)$ is not a jointly Gaussian pdf.
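A Monte Carlo sketch of this counterexample: X_3 looks marginally N(0, 1) and is uncorrelated with X_1, yet (X_1, X_3) cannot be jointly Gaussian, since X_1 + X_3 = X_1 (1 + X_2) has a point mass at zero (it is exactly zero whenever X_2 = -1), which no Gaussian RV has.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

X1 = rng.standard_normal(n)
X2 = rng.choice([-1.0, 1.0], size=n)    # +/-1 with probability 1/2 each
X3 = X1 * X2

print(X3.mean(), X3.var())              # approx 0 and 1: marginally N(0, 1)
print(np.corrcoef(X1, X3)[0, 1])        # approx 0: uncorrelated

# If (X1, X3) were jointly Gaussian, X1 + X3 would be Gaussian;
# here X1 + X3 = X1 * (1 + X2) is exactly 0 whenever X2 = -1.
print(np.mean(X1 + X3 == 0.0))          # approx 0.5: a point mass at zero
```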
Property 4: Conditionals of Gaussian random vectors are Gaussian, i.e. if
$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim \mathcal{N}\Big(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix},\; \begin{bmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{bmatrix}\Big)$$
then
$$\{X_2 \mid X_1 = x_1\} \sim \mathcal{N}\big(\Sigma_{2,1}\,\Sigma_{1,1}^{-1}(x_1 - \mu_1) + \mu_2,\;\; \Sigma_{2,2} - \Sigma_{2,1}\,\Sigma_{1,1}^{-1}\,\Sigma_{1,2}\big)$$
and
$$\{X_1 \mid X_2 = x_2\} \sim \mathcal{N}\big(\Sigma_{1,2}\,\Sigma_{2,2}^{-1}(x_2 - \mu_2) + \mu_1,\;\; \Sigma_{1,1} - \Sigma_{1,2}\,\Sigma_{2,2}^{-1}\,\Sigma_{2,1}\big).$$
Example: compare this result to the case of $n = 2$ in (7):
$$\{X_2 \mid X_1 = x_1\} \sim \mathcal{N}\Big(\frac{\Sigma_{2,1}}{\Sigma_{1,1}}(x_1 - \mu_1) + \mu_2,\;\; \Sigma_{2,2} - \frac{\Sigma_{1,2}^2}{\Sigma_{1,1}}\Big).$$

In particular, taking $X = X_2$ and $Y = X_1$, $y = x_1$, this result becomes
$$\{X \mid Y = y\} \sim \mathcal{N}\Big(\frac{\sigma_{X,Y}}{\sigma_Y^2}(y - \mu_Y) + \mu_X,\;\; \sigma_X^2 - \frac{\sigma_{X,Y}^2}{\sigma_Y^2}\Big)$$
where $\sigma_{X,Y} = \mathrm{cov}_{X,Y}(X, Y)$, $\sigma_X^2 = \mathrm{cov}_{X,X}(X, X) = \mathrm{var}_X(X)$, and $\sigma_Y^2 = \mathrm{cov}_{Y,Y}(Y, Y) = \mathrm{var}_Y(Y)$. Now, it is clear that
$$\rho_{X,Y} = \frac{\sigma_{X,Y}}{\sigma_X\,\sigma_Y}$$
where $\sigma_X = \sqrt{\sigma_X^2} > 0$ and $\sigma_Y = \sqrt{\sigma_Y^2} > 0$.
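A small sketch of Property 4 in the scalar-block case: given assumed joint parameters (the mean, covariance, and conditioning value x_1 below are all made up for illustration), compute the conditional mean and variance of X_2 given X_1 = x_1.

```python
import numpy as np

# Assumed joint parameters for X = [X1; X2] (both blocks scalar here, for illustration)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x1 = 2.0                                   # conditioning value

S11, S12, S21, S22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 0], Sigma[1, 1]

# Property 4: {X2 | X1 = x1} ~ N(m, v)
m = S21 / S11 * (x1 - mu[0]) + mu[1]
v = S22 - S21 * S12 / S11

print(m, v)    # conditional mean and conditional variance
```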
Property 5: If $X \sim \mathcal{N}(\mu, \Sigma)$ is $n \times 1$, then
$$(X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_n,$$
i.e. chi-square with $n$ degrees of freedom (see the chi-square entry in your distribution table).
Additive Gaussian Noise Channel

Consider a communication channel with input $X \sim \mathcal{N}(\mu_X, \tau_X^2)$ and noise $W \sim \mathcal{N}(0, \sigma^2)$, where $X$ and $W$ are independent, and the measurement $Y$ is
$$Y = X + W.$$

Since $X$ and $W$ are independent, we have $f_{X,W}(x, w) = f_X(x)\,f_W(w)$ and
$$\begin{bmatrix} X \\ W \end{bmatrix} \sim \mathcal{N}\Big(\begin{bmatrix} \mu_X \\ 0 \end{bmatrix},\; \begin{bmatrix} \tau_X^2 & 0 \\ 0 & \sigma^2 \end{bmatrix}\Big).$$

What is $f_{Y\mid X}(y \mid x)$? Since
$$\{Y \mid X = x\} = x + W \sim \mathcal{N}(x, \sigma^2)$$
we have $f_{Y\mid X}(y \mid x) = \mathcal{N}(y \mid x, \sigma^2)$.
How about $f_Y(y)$? Construct the joint pdf $f_{X,Y}(x, y)$ of $X$ and $Y$: since
$$\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} X \\ W \end{bmatrix}$$
then
$$\begin{bmatrix} X \\ Y \end{bmatrix} \sim \mathcal{N}\Big(\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \mu_X \\ 0 \end{bmatrix},\;\; \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \tau_X^2 & 0 \\ 0 & \sigma^2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\Big)$$
yielding
$$\begin{bmatrix} X \\ Y \end{bmatrix} \sim \mathcal{N}\Big(\begin{bmatrix} \mu_X \\ \mu_X \end{bmatrix},\;\; \begin{bmatrix} \tau_X^2 & \tau_X^2 \\ \tau_X^2 & \tau_X^2 + \sigma^2 \end{bmatrix}\Big).$$
Therefore,
$$f_Y(y) = \mathcal{N}(y \mid \mu_X,\; \tau_X^2 + \sigma^2).$$
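A numerical sketch of this construction, with assumed values of µ_X, τ_X², and σ²: build the joint mean and covariance of [X; Y] via the A C A^T rule and read off the marginal of Y.

```python
import numpy as np

# Assumed channel parameters for illustration
mu_X, tau2, sigma2 = 1.0, 4.0, 0.5

A = np.array([[1.0, 0.0],
              [1.0, 1.0]])                 # [X; Y] = A [X; W]
mu_XW = np.array([mu_X, 0.0])
C_XW = np.diag([tau2, sigma2])

mu_XY = A @ mu_XW                          # joint mean of [X; Y]
C_XY = A @ C_XW @ A.T                      # joint covariance of [X; Y]

print(mu_XY)    # [mu_X, mu_X]
print(C_XY)     # [[tau2, tau2], [tau2, tau2 + sigma2]]
print("f_Y(y) = N(y | %.1f, %.1f)" % (mu_XY[1], C_XY[1, 1]))
```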
Complex Gaussian Distribution

Consider the joint pdf of the real and imaginary parts of an $n \times 1$ complex vector $Z$:
$$Z = U + j\,V.$$

Define $X = \begin{bmatrix} U \\ V \end{bmatrix}$. The $2n$-variate Gaussian pdf of the (real!) vector $X$ is
$$f_X(x) = \frac{1}{\sqrt{(2\pi)^{2n}\,|\Sigma_X|}}\,\exp\big[-\tfrac{1}{2}(x - \mu_X)^T \Sigma_X^{-1} (x - \mu_X)\big]$$
where
$$\mu_X = \begin{bmatrix} \mu_U \\ \mu_V \end{bmatrix}, \qquad \Sigma_X = \begin{bmatrix} \Sigma_{UU} & \Sigma_{UV} \\ \Sigma_{VU} & \Sigma_{VV} \end{bmatrix}.$$

Therefore,
$$\Pr\{X \in A\} = \int_{x \in A} f_X(x)\,dx.$$
Complex Gaussian Distribution (cont.)

Assume that $\Sigma_X$ has a special structure:
$$\Sigma_{UU} = \Sigma_{VV} \quad \text{and} \quad \Sigma_{UV} = -\Sigma_{VU}.$$
[Note that $\Sigma_{UV} = \Sigma_{VU}^T$ has to hold as well.]

Then, we can define a complex Gaussian pdf of $Z$ as
$$f_Z(z) = \frac{1}{\pi^n\,|\Sigma_Z|}\,\exp\big[-(z - \mu_Z)^H \Sigma_Z^{-1} (z - \mu_Z)\big]$$
where
$$\mu_Z = \mu_U + j\,\mu_V$$
$$\Sigma_Z = E_Z\{(Z - \mu_Z)(Z - \mu_Z)^H\} = 2\,(\Sigma_{UU} + j\,\Sigma_{VU})$$
$$0 = E_Z\{(Z - \mu_Z)(Z - \mu_Z)^T\}.$$
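A sketch that generates a circular (proper) complex Gaussian vector from its real representation and checks the two moment identities above. The blocks Σ_UU and Σ_VU are assumed values chosen only to satisfy the required structure (Σ_UU symmetric, Σ_VU skew-symmetric).

```python
import numpy as np

rng = np.random.default_rng(6)
n_samples = 200_000

# Assumed blocks with the required structure: S_UU symmetric, S_VU skew-symmetric
S_UU = np.array([[1.0, 0.3],
                 [0.3, 1.0]])
S_VU = np.array([[0.0, 0.2],
                 [-0.2, 0.0]])
Sigma_X = np.block([[S_UU, -S_VU],
                    [S_VU,  S_UU]])        # covariance of the stacked real vector [U; V]

X = rng.multivariate_normal(np.zeros(4), Sigma_X, size=n_samples)
Z = X[:, :2] + 1j * X[:, 2:]               # Z = U + jV (zero mean here)

cov_Z = Z.T @ Z.conj() / n_samples         # sample E{Z Z^H}
pseudo = Z.T @ Z / n_samples               # sample E{Z Z^T} (pseudo-covariance)

print(np.round(cov_Z, 2))                  # approx 2 * (S_UU + 1j * S_VU)
print(np.round(pseudo, 2))                 # approx 0
```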