Section 8.1 Vector Notation

Definition 8.1 Random Vector A random vector is a column vector X = [X_1 ... X_n]'. Each X_i is a random variable.

Definition 8.2 Vector Sample Value A sample value of a random vector is a column vector x = [x_1 ... x_n]'. The ith component, x_i, of the vector x is a sample value of a random variable, X_i.

Random Vectors: Notation Following our convention for random variables, the uppercase X is the random vector and the lowercase x is a sample value of X. However, we also use boldface capitals such as A and B to denote matrices with components that are not random variables. It will be clear from the context whether A is a matrix of numbers, a matrix of random variables, or a random vector.

Definition 8.3 Random Vector Probability Functions (a) The CDF of a random vector X is F_X(x) = F_{X_1,...,X_n}(x_1, ..., x_n). (b) The PMF of a discrete random vector X is P_X(x) = P_{X_1,...,X_n}(x_1, ..., x_n). (c) The PDF of a continuous random vector X is f_X(x) = f_{X_1,...,X_n}(x_1, ..., x_n).

Definition 8.4 Probability Functions of a Pair of Random Vectors For random vectors X with n components and Y with m components: (a) The joint CDF of X and Y is F_{X,Y}(x, y) = F_{X_1,...,X_n,Y_1,...,Y_m}(x_1, ..., x_n, y_1, ..., y_m); (b) The joint PMF of discrete random vectors X and Y is P_{X,Y}(x, y) = P_{X_1,...,X_n,Y_1,...,Y_m}(x_1, ..., x_n, y_1, ..., y_m); (c) The joint PDF of continuous random vectors X and Y is f_{X,Y}(x, y) = f_{X_1,...,X_n,Y_1,...,Y_m}(x_1, ..., x_n, y_1, ..., y_m).

Example 8.1 Problem Random vector X has PDF f_X(x) = 6 e^{-a'x} for x ≥ 0, and 0 otherwise, (8.1) where a = [1 2 3]'. What is the CDF of X?

Example 8.1 Solution Because a has three components, we infer that X is a three-dimensional random vector. Expanding a'x, we write the PDF as a function of the vector components: f_X(x) = 6 e^{-x_1 - 2x_2 - 3x_3} for x_i ≥ 0, and 0 otherwise. (8.2) Applying Definition 8.3, we integrate the PDF over the three variables to obtain F_X(x) = (1 - e^{-x_1})(1 - e^{-2x_2})(1 - e^{-3x_3}) for x_i ≥ 0, and 0 otherwise. (8.3)
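
As a quick numerical sanity check (a sketch, not part of the example; the test point x = [1 2 3]' is an arbitrary choice), Matlab's integral3 can integrate the PDF of Equation (8.2) over [0, x_1] x [0, x_2] x [0, x_3]; the result should match the closed-form CDF of Equation (8.3).

% Numerical check of Example 8.1: integrate the PDF and compare with the CDF.
fX = @(x1,x2,x3) 6*exp(-x1 - 2*x2 - 3*x3);    % PDF of Equation (8.2), valid for xi >= 0
x = [1; 2; 3];                                % arbitrary test point
FXnum    = integral3(fX, 0, x(1), 0, x(2), 0, x(3));
FXclosed = (1 - exp(-x(1)))*(1 - exp(-2*x(2)))*(1 - exp(-3*x(3)));
disp([FXnum FXclosed])                        % the two values agree to integration tolerance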

Quiz 8.1 Discrete random vectors X = [X_1 X_2 X_3]' and Y = [Y_1 Y_2 Y_3]' are related by Y = AX. Find the joint PMF P_Y(y) if X has joint PMF P_X(x) = (1 - p)^3 p^{x_3} for x_1 < x_2 < x_3 with x_1, x_2, x_3 ∈ {1, 2, ...}, and 0 otherwise, and A = [1 0 0; -1 1 0; 0 -1 1].

Quiz 8.1 Solution By definition of A, Y_1 = X_1, Y_2 = X_2 - X_1 and Y_3 = X_3 - X_2. Since 0 < X_1 < X_2 < X_3, each Y_i must be a strictly positive integer. Thus, for y_1, y_2, y_3 ∈ {1, 2, ...}, P_Y(y) = P[Y_1 = y_1, Y_2 = y_2, Y_3 = y_3] = P[X_1 = y_1, X_2 - X_1 = y_2, X_3 - X_2 = y_3] = P[X_1 = y_1, X_2 = y_2 + y_1, X_3 = y_3 + y_2 + y_1] = P_X(y_1, y_2 + y_1, y_3 + y_2 + y_1) = (1 - p)^3 p^{y_1 + y_2 + y_3}. (1) With a = [1 1 1]' and q = 1 - p, the joint PMF of Y is P_Y(y) = q^3 p^{a'y} for y_1, y_2, y_3 ∈ {1, 2, ...}, and 0 otherwise.

Section 8.2 Independent Random Variables and Random Vectors

Definition 8.5 Independent Random Vectors Random vectors X and Y are independent if Discrete: P_{X,Y}(x, y) = P_X(x) P_Y(y); Continuous: f_{X,Y}(x, y) = f_X(x) f_Y(y).

Example 8.2 Problem As in Example 5.22, random variables Y_1, ..., Y_4 have the joint PDF f_{Y_1,...,Y_4}(y_1, ..., y_4) = 4 for 0 ≤ y_1 ≤ y_2 ≤ 1 and 0 ≤ y_3 ≤ y_4 ≤ 1, and 0 otherwise. (8.4) Let V = [Y_1 Y_4]' and W = [Y_2 Y_3]'. Are V and W independent random vectors?

Example 8.2 Solution We first note that the components of V are V_1 = Y_1 and V_2 = Y_4. Also, W_1 = Y_2 and W_2 = Y_3. Therefore, f_{V,W}(v, w) = f_{Y_1,...,Y_4}(v_1, w_1, w_2, v_2) = 4 for 0 ≤ v_1 ≤ w_1 ≤ 1 and 0 ≤ w_2 ≤ v_2 ≤ 1, and 0 otherwise. (8.5) Since V = [Y_1 Y_4]' and W = [Y_2 Y_3]', f_V(v) = f_{Y_1,Y_4}(v_1, v_2) and f_W(w) = f_{Y_2,Y_3}(w_1, w_2). (8.6) In Example 5.22, we found the marginal PDFs f_{Y_1,Y_4}(y_1, y_4) and f_{Y_2,Y_3}(y_2, y_3) in Equations (5.78) and (5.80). From these marginal PDFs, we have f_V(v) = 4(1 - v_1) v_2 for 0 ≤ v_1, v_2 ≤ 1, and 0 otherwise, (8.7) and f_W(w) = 4 w_1 (1 - w_2) for 0 ≤ w_1, w_2 ≤ 1, and 0 otherwise. (8.8) Therefore, f_V(v) f_W(w) = 16 (1 - v_1) v_2 w_1 (1 - w_2) for 0 ≤ v_1, v_2, w_1, w_2 ≤ 1, and 0 otherwise, (8.9) which is not equal to f_{V,W}(v, w). Therefore V and W are not independent.

Quiz 8.2 Use the components of Y = [Y_1, ..., Y_4]' in Example 8.2 to construct two independent random vectors V and W. Prove that V and W are independent.

Quiz 8.2 Solution In the PDF f_Y(y), the components have dependencies as a result of the ordering constraints Y_1 ≤ Y_2 and Y_3 ≤ Y_4. We can separate these constraints by creating the vectors V = [Y_1 Y_2]' and W = [Y_3 Y_4]'. (1) The joint PDF of V and W is f_{V,W}(v, w) = 4 for 0 ≤ v_1 ≤ v_2 ≤ 1 and 0 ≤ w_1 ≤ w_2 ≤ 1, and 0 otherwise. We must verify that V and W are independent. For 0 ≤ v_1 ≤ v_2 ≤ 1, f_V(v) = ∫∫ f_{V,W}(v, w) dw_1 dw_2 = ∫_0^1 ( ∫_{w_1}^1 4 dw_2 ) dw_1 (2) = ∫_0^1 4(1 - w_1) dw_1 = 2. (3) [Continued]

Quiz 8.2 Solution (Continued 2) Similarly, for 0 ≤ w_1 ≤ w_2 ≤ 1, f_W(w) = ∫∫ f_{V,W}(v, w) dv_1 dv_2 = ∫_0^1 ( ∫_{v_1}^1 4 dv_2 ) dv_1 = 2. (4) It follows that V and W have PDFs f_V(v) = 2 for 0 ≤ v_1 ≤ v_2 ≤ 1, and 0 otherwise, (5) and f_W(w) = 2 for 0 ≤ w_1 ≤ w_2 ≤ 1, and 0 otherwise. (6) It is easy to verify that f_{V,W}(v, w) = f_V(v) f_W(w), confirming that V and W are independent vectors.

Section 8.3 Functions of Random Vectors

Theorem 8.1 For random variable W = g(X), Discrete: P_W(w) = P[W = w] = Σ_{x: g(x) = w} P_X(x); Continuous: F_W(w) = P[W ≤ w] = ∫···∫_{g(x) ≤ w} f_X(x) dx_1 ··· dx_n.

Example 8.3 Problem Consider an experiment that consists of spinning the pointer on the wheel of circumference 1 meter in Example 4.1 n times and observing Y_n meters, the maximum position of the pointer in the n spins. Find the CDF and PDF of Y_n.

Example 8.3 Solution If X_i is the position of the pointer on spin i, then Y_n = max{X_1, X_2, ..., X_n}. As a result, Y_n ≤ y if and only if each X_i ≤ y. This implies F_{Y_n}(y) = P[Y_n ≤ y] = P[X_1 ≤ y, X_2 ≤ y, ..., X_n ≤ y]. (8.10) If we assume the spins to be independent, the events {X_1 ≤ y}, {X_2 ≤ y}, ..., {X_n ≤ y} are independent events. Thus F_{Y_n}(y) = P[X_1 ≤ y] ··· P[X_n ≤ y] = (P[X ≤ y])^n = (F_X(y))^n. (8.11) Example 4.2 derives Equation (4.8): F_X(x) = 0 for x < 0, x for 0 ≤ x < 1, and 1 for x ≥ 1. (8.12) Equations (8.11) and (8.12) imply that the CDF and corresponding PDF are F_{Y_n}(y) = 0 for y < 0, y^n for 0 ≤ y ≤ 1, and 1 for y > 1; f_{Y_n}(y) = n y^{n-1} for 0 ≤ y ≤ 1, and 0 otherwise. (8.13)
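
A short Monte Carlo sketch (the values of n and the sample size are arbitrary choices, not from the text) comparing the empirical CDF of Y_n with the analytic result y^n from Equation (8.13):

% Monte Carlo check of Example 8.3: Yn = max of n independent uniform(0,1) spins.
n = 4;  m = 1e5;                     % number of spins and number of simulated experiments
Yn = max(rand(n, m), [], 1);         % each column is one experiment; take the max over the n spins
y = 0.7;                             % arbitrary test point
disp([mean(Yn <= y)  y^n])           % empirical CDF vs F_Yn(y) = y^n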

Theorem 8.2 Let X be a vector of n iid continuous random variables, each with CDF F_X(x) and PDF f_X(x). (a) The CDF and the PDF of Y = max{X_1, ..., X_n} are F_Y(y) = (F_X(y))^n, f_Y(y) = n (F_X(y))^{n-1} f_X(y). (b) The CDF and the PDF of W = min{X_1, ..., X_n} are F_W(w) = 1 - (1 - F_X(w))^n, f_W(w) = n (1 - F_X(w))^{n-1} f_X(w).

Proof: Theorem 8.2 By definition, F_Y(y) = P[Y ≤ y]. Because Y is the maximum value of {X_1, ..., X_n}, the event {Y ≤ y} = {X_1 ≤ y, X_2 ≤ y, ..., X_n ≤ y}. Because all the random variables X_i are iid, {Y ≤ y} is the intersection of n independent events. Each of the events {X_i ≤ y} has probability F_X(y). The probability of the intersection is the product of the individual probabilities, which implies the first part of the theorem: F_Y(y) = (F_X(y))^n. The second part is the result of differentiating F_Y(y) with respect to y. The derivations of F_W(w) and f_W(w) are similar. They begin with the observations that F_W(w) = 1 - P[W > w] and that the event {W > w} = {X_1 > w, X_2 > w, ..., X_n > w}, which is the intersection of n independent events, each with probability 1 - F_X(w).
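
Part (b) can be checked the same way (a sketch with arbitrary illustration values): the minimum of n iid exponential(1) variables has CDF 1 - (1 - F_X(w))^n = 1 - e^{-nw}, i.e., it is exponential(n).

% Monte Carlo check of Theorem 8.2(b) for exponential(1) components.
n = 5;  m = 1e5;  w = 0.1;               % arbitrary illustration values
W = min(-log(rand(n, m)), [], 1);        % columns of n exponential(1) samples; take the minimum
disp([mean(W <= w)  1 - exp(-n*w)])      % empirical vs analytic CDF at w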

Theorem 8.3 For a random vector X, the random variable g(X) has expected value Discrete: E[g(X)] = Σ_{x_1 ∈ S_{X_1}} ··· Σ_{x_n ∈ S_{X_n}} g(x) P_X(x); Continuous: E[g(X)] = ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} g(x) f_X(x) dx_1 ··· dx_n.

Theorem 8.4 When the components of X are independent random variables, E[g_1(X_1) g_2(X_2) ··· g_n(X_n)] = E[g_1(X_1)] E[g_2(X_2)] ··· E[g_n(X_n)].

Proof: Theorem 8.4 When X is discrete, independence implies P_X(x) = P_{X_1}(x_1) ··· P_{X_n}(x_n). This implies E[g_1(X_1) ··· g_n(X_n)] = Σ_{x_1 ∈ S_{X_1}} ··· Σ_{x_n ∈ S_{X_n}} g_1(x_1) ··· g_n(x_n) P_X(x) (8.14) = ( Σ_{x_1 ∈ S_{X_1}} g_1(x_1) P_{X_1}(x_1) ) ··· ( Σ_{x_n ∈ S_{X_n}} g_n(x_n) P_{X_n}(x_n) ) (8.15) = E[g_1(X_1)] E[g_2(X_2)] ··· E[g_n(X_n)]. (8.16) The derivation is similar for independent continuous random variables.

Theorem 8.5 Given the continuous random vector X, define the derived random vector Y such that Y_k = a X_k + b for constants a > 0 and b. The CDF and PDF of Y are F_Y(y) = F_X((y_1 - b)/a, ..., (y_n - b)/a), f_Y(y) = (1/a^n) f_X((y_1 - b)/a, ..., (y_n - b)/a).

Proof: Theorem 8.5 We observe that Y has CDF F_Y(y) = P[a X_1 + b ≤ y_1, ..., a X_n + b ≤ y_n]. Since a > 0, F_Y(y) = P[X_1 ≤ (y_1 - b)/a, ..., X_n ≤ (y_n - b)/a] = F_X((y_1 - b)/a, ..., (y_n - b)/a). (8.17) Definition 5.13 defines the joint PDF of Y, f_Y(y) = ∂^n F_{Y_1,...,Y_n}(y_1, ..., y_n) / (∂y_1 ··· ∂y_n) = (1/a^n) f_X((y_1 - b)/a, ..., (y_n - b)/a). (8.18)

Theorem 8.6 If X is a continuous random vector and A is an invertible matrix, then Y = AX + b has PDF f_Y(y) = (1/|det(A)|) f_X(A^{-1}(y - b)).

Proof: Theorem 8.6 Let B = {y | y ≤ ỹ} so that F_Y(ỹ) = ∫_B f_Y(y) dy. Define the vector transformation x = T(y) = A^{-1}(y - b). It follows that Y ∈ B if and only if X ∈ T(B), where T(B) = {x | Ax + b ≤ ỹ} is the image of B under the transformation T. This implies F_Y(ỹ) = P[X ∈ T(B)] = ∫_{T(B)} f_X(x) dx. (8.19) By the change-of-variable theorem (Math Fact B.13), F_Y(ỹ) = ∫_B f_X(A^{-1}(y - b)) |det(A^{-1})| dy, (8.20) where |det(A^{-1})| is the absolute value of the determinant of A^{-1}. Definition 8.3 for the CDF and PDF of a random vector combined with Theorem 5.23(b) imply that f_Y(y) = f_X(A^{-1}(y - b)) |det(A^{-1})|. The theorem follows, since |det(A^{-1})| = 1/|det(A)|.
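
The theorem translates directly into a few lines of Matlab (a sketch; the example density, matrix, and test point below are my own illustrations, not from the text): given a handle for f_X, the handle for f_Y re-centers, solves the linear system, and divides by |det(A)|.

% Sketch of Theorem 8.6: build f_Y from f_X for Y = A*X + b with A invertible.
fX = @(x) prod(exp(-x) .* (x >= 0));          % illustrative PDF: iid exponential(1) components
A  = [2 1; 0 3];  b = [1; -1];                % illustrative invertible A and offset b
fY = @(y) fX(A \ (y - b)) / abs(det(A));      % f_Y(y) = f_X(A^{-1}(y - b)) / |det(A)|
fY([3; 2])                                    % evaluate the transformed PDF at a test point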

Quiz 8.3(A) A test of light bulbs produced by a machine has three possible outcomes: L, long life; A, average life; and R, reject. The results of different tests are independent. All tests have the following probability model: P[L] = 0.3, P[A] = 0.6, and P[R] = 0.1. Let X_1, X_2, and X_3 be the number of light bulbs that are L, A, and R respectively in five tests. Find the PMF P_X(x); the marginal PMFs P_{X_1}(x_1), P_{X_2}(x_2), and P_{X_3}(x_3); and the PMF of W = max(X_1, X_2, X_3).

Quiz 8.3(A) Solution Referring to Theorem 2.9, each test is a subexperiment with three possible outcomes: L, A and R. In five trials, the vector X = [X_1 X_2 X_3]' indicating the number of outcomes of each subexperiment has the multinomial PMF P_X(x) = (5 choose x_1, x_2, x_3) 0.3^{x_1} 0.6^{x_2} 0.1^{x_3}, where (5 choose x_1, x_2, x_3) = 5!/(x_1! x_2! x_3!). We can find the marginal PMF for each X_i from the joint PMF P_X(x); however, it is simpler to start from first principles and observe that X_1 is the number of occurrences of L in five independent tests. If we view each test as a trial with success probability P[L] = 0.3, we see that X_1 is a binomial (n, p) = (5, 0.3) random variable. Similarly, X_2 is a binomial (5, 0.6) random variable and X_3 is a binomial (5, 0.1) random variable. That is, for p_1 = 0.3, p_2 = 0.6 and p_3 = 0.1, P_{X_i}(x) = (5 choose x) p_i^x (1 - p_i)^{5-x}. (1) [Continued]

Quiz 8.3(A) Solution (Continued 2) From the marginal PMFs, we see that X_1, X_2 and X_3 are not independent. Hence, we must use Theorem 8.1 to find the PMF of W. In particular, since X_1 + X_2 + X_3 = 5 and since each X_i is non-negative, P_W(0) = P_W(1) = 0. Furthermore, P_W(2) = P_X(1, 2, 2) + P_X(2, 1, 2) + P_X(2, 2, 1) = [5!/(1!2!2!)] 0.3 (0.6)^2 (0.1)^2 + [5!/(2!1!2!)] 0.3^2 (0.6) (0.1)^2 + [5!/(2!2!1!)] 0.3^2 (0.6)^2 (0.1) = 0.1458. (2) In addition, for w = 3, w = 4, and w = 5, the event {W = w} occurs if and only if one of the mutually exclusive events {X_1 = w}, {X_2 = w}, or {X_3 = w} occurs. Thus, P_W(3) = Σ_{i=1}^{3} P_{X_i}(3) = 0.486, (3) P_W(4) = Σ_{i=1}^{3} P_{X_i}(4) = 0.288, (4) P_W(5) = Σ_{i=1}^{3} P_{X_i}(5) = 0.0802. (5)
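
In the spirit of Section 8.6, the whole PMF of W can also be obtained by brute-force enumeration (a sketch; the loop simply tabulates the multinomial PMF over every outcome with x_1 + x_2 + x_3 = 5):

% PMF of W = max(X1, X2, X3) for the multinomial model of Quiz 8.3(A).
p = [0.3 0.6 0.1]; n = 5;
PW = zeros(1, n+1);                              % PW(w+1) = P[W = w] for w = 0, ..., 5
for x1 = 0:n
  for x2 = 0:(n - x1)
    x3 = n - x1 - x2;
    Px = factorial(n)/(factorial(x1)*factorial(x2)*factorial(x3)) ...
         * p(1)^x1 * p(2)^x2 * p(3)^x3;          % multinomial PMF P_X(x)
    w = max([x1 x2 x3]);
    PW(w+1) = PW(w+1) + Px;
  end
end
disp(PW(3:6))                                    % P_W(2) ... P_W(5): 0.1458 0.4860 0.2880 0.0802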

Quiz 8.3(B) The random vector X has PDF f_X(x) = e^{-x_3} for 0 ≤ x_1 ≤ x_2 ≤ x_3, and 0 otherwise. (8.21) Find the PDF of Y = AX + b, where A = diag[2, 2, 2] and b = [4 4 4]'.

Quiz 8.3(B) Solution Since each Y_i = 2X_i + 4, we can apply Theorem 8.5 to write f_Y(y) = (1/2^3) f_X((y_1 - 4)/2, (y_2 - 4)/2, (y_3 - 4)/2) = (1/8) e^{-(y_3 - 4)/2} for 4 ≤ y_1 ≤ y_2 ≤ y_3, and 0 otherwise. (1) Note that for other matrices A, the constraints on y resulting from the constraints 0 ≤ X_1 ≤ X_2 ≤ X_3 can be much more complicated.

Section 8.4 Expected Value Vector and Correlation Matrix

Definition 8.6 Expected Value Vector The expected value of a random vector X is a column vector E[X] = µ_X = [E[X_1] E[X_2] ··· E[X_n]]'.

Example 8.4 Problem If X = [X_1 X_2 X_3]', what are the components of XX'?

Example 8.4 Solution XX' = [X_1; X_2; X_3] [X_1 X_2 X_3] = [X_1^2 X_1X_2 X_1X_3; X_2X_1 X_2^2 X_2X_3; X_3X_1 X_3X_2 X_3^2]. (8.22)

Definition 8.7 Expected Value of a Random Matrix For a random matrix A with the random variable A_{ij} as its i, jth element, E[A] is a matrix with i, jth element E[A_{ij}].

Definition 8.8 Vector Correlation The correlation of a random vector X is an n × n matrix R_X with i, jth element R_X(i, j) = E[X_i X_j]. In vector notation, R_X = E[XX'].

Example 8.5 If X = [X_1 X_2 X_3]', the correlation matrix of X is R_X = [E[X_1^2] E[X_1X_2] E[X_1X_3]; E[X_2X_1] E[X_2^2] E[X_2X_3]; E[X_3X_1] E[X_3X_2] E[X_3^2]] = [E[X_1^2] r_{X_1,X_2} r_{X_1,X_3}; r_{X_2,X_1} E[X_2^2] r_{X_2,X_3}; r_{X_3,X_1} r_{X_3,X_2} E[X_3^2]].

Definition 8.9 Vector Covariance The covariance of a random vector X is an n × n matrix C_X with components C_X(i, j) = Cov[X_i, X_j]. In vector notation, C_X = E[(X - µ_X)(X - µ_X)'].

Example 8.6 If X = [X_1 X_2 X_3]', the covariance matrix of X is C_X = [Var[X_1] Cov[X_1, X_2] Cov[X_1, X_3]; Cov[X_2, X_1] Var[X_2] Cov[X_2, X_3]; Cov[X_3, X_1] Cov[X_3, X_2] Var[X_3]]. (8.23)

Theorem 8.7 For a random vector X with correlation matrix R_X, covariance matrix C_X, and vector expected value µ_X, C_X = R_X - µ_X µ_X'.

Proof: Theorem 8.7 The proof is essentially the same as the proof of Theorem 5.16(a), with vectors replacing scalars. Cross multiplying inside the expectation of Definition 8.9 yields C_X = E[XX' - Xµ_X' - µ_X X' + µ_X µ_X'] = E[XX'] - E[Xµ_X'] - E[µ_X X'] + E[µ_X µ_X']. (8.24) Since E[X] = µ_X is a constant vector, C_X = R_X - E[X]µ_X' - µ_X E[X'] + µ_X µ_X' = R_X - µ_X µ_X'. (8.25)

Example 8.7 Problem Find the expected value E[X], the correlation matrix R_X, and the covariance matrix C_X of the two-dimensional random vector X with PDF f_X(x) = 2 for 0 ≤ x_1 ≤ x_2 ≤ 1, and 0 otherwise. (8.26)

Example 8.7 Solution The elements of the expected value vector are E[X_i] = ∫∫ x_i f_X(x) dx_1 dx_2 = ∫_0^1 ∫_0^{x_2} 2 x_i dx_1 dx_2, i = 1, 2. (8.27) The integrals are E[X_1] = 1/3 and E[X_2] = 2/3, so that µ_X = E[X] = [1/3 2/3]'. The elements of the correlation matrix are E[X_1^2] = ∫∫ x_1^2 f_X(x) dx_1 dx_2 = ∫_0^1 ∫_0^{x_2} 2 x_1^2 dx_1 dx_2, (8.28) E[X_2^2] = ∫∫ x_2^2 f_X(x) dx_1 dx_2 = ∫_0^1 ∫_0^{x_2} 2 x_2^2 dx_1 dx_2, (8.29) E[X_1 X_2] = ∫∫ x_1 x_2 f_X(x) dx_1 dx_2 = ∫_0^1 ∫_0^{x_2} 2 x_1 x_2 dx_1 dx_2. (8.30) These integrals are E[X_1^2] = 1/6, E[X_2^2] = 1/2, and E[X_1 X_2] = 1/4. [Continued]

Example 8.7 Solution (Continued 2) Therefore, R_X = [1/6 1/4; 1/4 1/2]. (8.31) We use Theorem 8.7 to find the elements of the covariance matrix: C_X = R_X - µ_X µ_X' = [1/6 1/4; 1/4 1/2] - [1/9 2/9; 2/9 4/9] = [1/18 1/36; 1/36 1/18]. (8.32)
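
A numerical cross-check of these moments (a sketch, not part of the example): integral2 integrates over the unit square with the triangular support 0 ≤ x_1 ≤ x_2 ≤ 1 written as an indicator.

% Numerical check of Example 8.7: moments of f_X(x) = 2 on 0 <= x1 <= x2 <= 1.
f  = @(x1,x2) 2*(x1 <= x2);                          % PDF with the support as an indicator
m  = @(g) integral2(@(x1,x2) g(x1,x2).*f(x1,x2), 0, 1, 0, 1);
EX = [m(@(x1,x2) x1); m(@(x1,x2) x2)];               % should be [1/3; 2/3]
RX = [m(@(x1,x2) x1.^2)   m(@(x1,x2) x1.*x2); ...
      m(@(x1,x2) x1.*x2)  m(@(x1,x2) x2.^2)];        % should be [1/6 1/4; 1/4 1/2]
CX = RX - EX*EX'                                     % Theorem 8.7: should be [1/18 1/36; 1/36 1/18]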

Definition 8.10 Vector Cross-Correlation The cross-correlation of random vectors X with n components and Y with m components is an n × m matrix R_XY with i, jth element R_XY(i, j) = E[X_i Y_j], or, in vector notation, R_XY = E[XY'].

Definition 8.11 Vector Cross-Covariance The cross-covariance of a pair of random vectors X with n components and Y with m components is an n × m matrix C_XY with i, jth element C_XY(i, j) = Cov[X_i, Y_j], or, in vector notation, C_XY = E[(X - µ_X)(Y - µ_Y)'].

Theorem 8.8 X is an n-dimensional random vector with expected value µ_X, correlation R_X, and covariance C_X. The m-dimensional random vector Y = AX + b, where A is an m × n matrix and b is an m-dimensional vector, has expected value µ_Y, correlation matrix R_Y, and covariance matrix C_Y given by µ_Y = Aµ_X + b, R_Y = A R_X A' + (Aµ_X) b' + b (Aµ_X)' + b b', C_Y = A C_X A'.

Proof: Theorem 8.8 We derive the formulas for the expected value and covariance of Y. The derivation for the correlation is similar. First, the expected value of Y is µ_Y = E[AX + b] = A E[X] + E[b] = Aµ_X + b. (8.33) It follows that Y - µ_Y = A(X - µ_X). This implies C_Y = E[(A(X - µ_X))(A(X - µ_X))'] = E[A(X - µ_X)(X - µ_X)' A'] = A E[(X - µ_X)(X - µ_X)'] A' = A C_X A'. (8.34)

Example 8.8 Problem Given the expected value µ_X, the correlation R_X, and the covariance C_X of random vector X in Example 8.7, and Y = AX + b, where A = [1 0; 6 3; 3 6] and b = [0; -2; -2], (8.35) find the expected value µ_Y, the correlation R_Y, and the covariance C_Y.

Example 8.8 Solution From the matrix operations of Theorem 8.8, we obtain µ_Y = [1/3 2 3]', R_Y = [1/6 13/12 4/3; 13/12 7.5 9.25; 4/3 9.25 12.5], and C_Y = [1/18 5/12 1/3; 5/12 3.5 3.25; 1/3 3.25 3.5]. (8.36)
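
The same numbers fall out of a few lines of Matlab (a sketch that simply reproduces the matrix arithmetic of Theorem 8.8 with the quantities of Examples 8.7 and 8.8):

% Example 8.8 via Theorem 8.8: moments of Y = A*X + b.
muX = [1/3; 2/3];
RX  = [1/6 1/4; 1/4 1/2];
CX  = [1/18 1/36; 1/36 1/18];
A   = [1 0; 6 3; 3 6];  b = [0; -2; -2];
muY = A*muX + b                                   % [1/3; 2; 3]
RY  = A*RX*A' + (A*muX)*b' + b*(A*muX)' + b*b'    % matches Equation (8.36)
CY  = A*CX*A'                                     % matches Equation (8.36)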

Theorem 8.9 The vectors X and Y = AX + b have cross-correlation R_XY and cross-covariance C_XY given by R_XY = R_X A' + µ_X b', C_XY = C_X A'.

Example 8.9 Problem Continuing Example 8.8 for random vectors X and Y = AX + b, calculate (a) the cross-correlation matrix R_XY and the cross-covariance matrix C_XY; (b) the correlation coefficients ρ_{Y_1,Y_3} and ρ_{X_2,Y_1}.

Example 8.9 Solution (a) Direct matrix calculation using Theorem 8.9 yields R_XY = [1/6 13/12 4/3; 1/4 5/3 29/12], C_XY = [1/18 5/12 1/3; 1/36 1/3 5/12]. (8.37) (b) Referring to Definition 5.6 and recognizing that Var[Y_i] = C_Y(i, i), we have ρ_{Y_1,Y_3} = Cov[Y_1, Y_3] / sqrt(Var[Y_1] Var[Y_3]) = C_Y(1, 3) / sqrt(C_Y(1, 1) C_Y(3, 3)) = 0.756. (8.38) Similarly, ρ_{X_2,Y_1} = Cov[X_2, Y_1] / sqrt(Var[X_2] Var[Y_1]) = C_XY(2, 1) / sqrt(C_X(2, 2) C_Y(1, 1)) = 1/2. (8.39)

Quiz 8.4 The three-dimensional random vector X = [X_1 X_2 X_3]' has PDF f_X(x) = 6 for 0 ≤ x_1 ≤ x_2 ≤ x_3 ≤ 1, and 0 otherwise. (8.40) Find E[X] and the correlation and covariance matrices R_X and C_X.

Quiz 8.4 Solution To solve this problem, we need to find the expected values E[X_i] and E[X_i X_j] for each i and j. To do this, we need the marginal PDFs f_{X_i}(x_i) and f_{X_i,X_j}(x_i, x_j). First we note that each marginal PDF is nonzero only where the corresponding x_i obey the ordering constraints 0 ≤ x_1 ≤ x_2 ≤ x_3 ≤ 1. Within these constraints, we have f_{X_1,X_2}(x_1, x_2) = ∫ f_X(x) dx_3 = ∫_{x_2}^{1} 6 dx_3 = 6(1 - x_2), (1) f_{X_2,X_3}(x_2, x_3) = ∫ f_X(x) dx_1 = ∫_0^{x_2} 6 dx_1 = 6 x_2, (2) and f_{X_1,X_3}(x_1, x_3) = ∫ f_X(x) dx_2 = ∫_{x_1}^{x_3} 6 dx_2 = 6(x_3 - x_1). (3) In particular, we must keep in mind that f_{X_1,X_2}(x_1, x_2) = 0 unless 0 ≤ x_1 ≤ x_2 ≤ 1, f_{X_2,X_3}(x_2, x_3) = 0 unless 0 ≤ x_2 ≤ x_3 ≤ 1, and f_{X_1,X_3}(x_1, x_3) = 0 unless 0 ≤ x_1 ≤ x_3 ≤ 1. The complete expressions are f_{X_1,X_2}(x_1, x_2) = 6(1 - x_2) for 0 ≤ x_1 ≤ x_2 ≤ 1, and 0 otherwise; (4) f_{X_2,X_3}(x_2, x_3) = 6 x_2 for 0 ≤ x_2 ≤ x_3 ≤ 1, and 0 otherwise; (5) f_{X_1,X_3}(x_1, x_3) = 6(x_3 - x_1) for 0 ≤ x_1 ≤ x_3 ≤ 1, and 0 otherwise. (6) [Continued]

Quiz 8.4 Solution (Continued 2) Now we can find the marginal PDFs. When 0 ≤ x_i ≤ 1 for each x_i, f_{X_1}(x_1) = ∫ f_{X_1,X_2}(x_1, x_2) dx_2 = ∫_{x_1}^{1} 6(1 - x_2) dx_2 = 3(1 - x_1)^2, (7) f_{X_2}(x_2) = ∫ f_{X_2,X_3}(x_2, x_3) dx_3 = ∫_{x_2}^{1} 6 x_2 dx_3 = 6 x_2 (1 - x_2), (8) f_{X_3}(x_3) = ∫ f_{X_2,X_3}(x_2, x_3) dx_2 = ∫_0^{x_3} 6 x_2 dx_2 = 3 x_3^2. (9) [Continued]

Quiz 8.4 Solution (Continued 3) The complete expressions are f_{X_1}(x_1) = 3(1 - x_1)^2 for 0 ≤ x_1 ≤ 1, and 0 otherwise; (10) f_{X_2}(x_2) = 6 x_2 (1 - x_2) for 0 ≤ x_2 ≤ 1, and 0 otherwise; (11) f_{X_3}(x_3) = 3 x_3^2 for 0 ≤ x_3 ≤ 1, and 0 otherwise. (12) Now we can find the components E[X_i] = ∫ x f_{X_i}(x) dx of µ_X: E[X_1] = ∫_0^1 3x(1 - x)^2 dx = 1/4, (13) E[X_2] = ∫_0^1 6x^2 (1 - x) dx = 1/2, (14) E[X_3] = ∫_0^1 3x^3 dx = 3/4. (15) [Continued]

Quiz 8.4 Solution (Continued 4) To find the correlation matrix R_X, we need E[X_i X_j] for all i and j. We start with the second moments: E[X_1^2] = ∫_0^1 3x^2 (1 - x)^2 dx = 1/10, (16) E[X_2^2] = ∫_0^1 6x^3 (1 - x) dx = 3/10, (17) E[X_3^2] = ∫_0^1 3x^4 dx = 3/5. (18) Using the marginal PDFs, the cross terms are E[X_1 X_2] = ∫∫ x_1 x_2 f_{X_1,X_2}(x_1, x_2) dx_1 dx_2 = ∫_0^1 ( ∫_{x_1}^1 6 x_1 x_2 (1 - x_2) dx_2 ) dx_1 = ∫_0^1 [x_1 - 3x_1^3 + 2x_1^4] dx_1 = 3/20, (19) E[X_2 X_3] = ∫_0^1 ∫_{x_2}^1 6 x_2^2 x_3 dx_3 dx_2 = ∫_0^1 [3x_2^2 - 3x_2^4] dx_2 = 2/5. [Continued]

Quiz 8.4 Solution (Continued 5) E[X_1 X_3] = ∫_0^1 ∫_{x_1}^1 6 x_1 x_3 (x_3 - x_1) dx_3 dx_1 = ∫_0^1 (2 x_1 x_3^3 - 3 x_1^2 x_3^2) |_{x_3 = x_1}^{x_3 = 1} dx_1 = ∫_0^1 [2x_1 - 3x_1^2 + x_1^4] dx_1 = 1/5. (20) Summarizing the results, X has correlation matrix R_X = [1/10 3/20 1/5; 3/20 3/10 2/5; 1/5 2/5 3/5]. (21) Vector X has covariance matrix C_X = R_X - E[X] E[X]' = [1/10 3/20 1/5; 3/20 3/10 2/5; 1/5 2/5 3/5] - [1/4; 1/2; 3/4][1/4 1/2 3/4] = (1/80) [3 2 1; 2 4 2; 1 2 3]. (22) This problem shows that even for fairly simple joint PDFs, computing the covariance matrix can be time consuming.
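
When the hand integration gets tedious, a numerical check is cheap (a sketch, not part of the quiz solution): integral3 over the unit cube with the ordering constraint written as an indicator reproduces E[X], R_X and C_X.

% Numerical check of Quiz 8.4: moments of f_X(x) = 6 on 0 <= x1 <= x2 <= x3 <= 1.
f = @(x1,x2,x3) 6*((x1 <= x2) & (x2 <= x3));           % PDF with the ordering constraint as an indicator
m = @(g) integral3(@(x1,x2,x3) g(x1,x2,x3).*f(x1,x2,x3), 0,1, 0,1, 0,1);
EX = [m(@(x1,x2,x3) x1); m(@(x1,x2,x3) x2); m(@(x1,x2,x3) x3)];   % should be [1/4; 1/2; 3/4]
RX = [m(@(x1,x2,x3) x1.^2)   m(@(x1,x2,x3) x1.*x2)   m(@(x1,x2,x3) x1.*x3); ...
      m(@(x1,x2,x3) x1.*x2)  m(@(x1,x2,x3) x2.^2)    m(@(x1,x2,x3) x2.*x3); ...
      m(@(x1,x2,x3) x1.*x3)  m(@(x1,x2,x3) x2.*x3)   m(@(x1,x2,x3) x3.^2)];
CX = RX - EX*EX'                                       % should equal (1/80)*[3 2 1; 2 4 2; 1 2 3]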

Section 8.5 Gaussian Random Vectors

Definition 8.12 Gaussian Random Vector X is the Gaussian (µ_X, C_X) random vector with expected value µ_X and covariance C_X if and only if f_X(x) = 1 / ((2π)^{n/2} [det(C_X)]^{1/2}) exp( -(1/2)(x - µ_X)' C_X^{-1} (x - µ_X) ), where det(C_X), the determinant of C_X, satisfies det(C_X) > 0.

Theorem 8.10 A Gaussian random vector X has independent components if and only if C X is a diagonal matrix.

Proof: Theorem 8.10 First, if the components of X are independent, then for i ≠ j, X_i and X_j are independent. By Theorem 5.17(c), Cov[X_i, X_j] = 0. Hence the off-diagonal terms of C_X are all zero. If C_X is diagonal, then C_X = diag[σ_1^2, ..., σ_n^2] and C_X^{-1} = diag[1/σ_1^2, ..., 1/σ_n^2]. (8.41) It follows that C_X has determinant det(C_X) = Π_{i=1}^{n} σ_i^2 and that (x - µ_X)' C_X^{-1} (x - µ_X) = Σ_{i=1}^{n} (x_i - µ_i)^2 / σ_i^2. (8.42) From Definition 8.12, we see that f_X(x) = 1 / ((2π)^{n/2} (Π_{i=1}^{n} σ_i^2)^{1/2}) exp( -Σ_{i=1}^{n} (x_i - µ_i)^2 / (2σ_i^2) ) (8.43) = Π_{i=1}^{n} (1 / sqrt(2π σ_i^2)) exp( -(x_i - µ_i)^2 / (2σ_i^2) ). (8.44) Thus f_X(x) = Π_{i=1}^{n} f_{X_i}(x_i), implying X_1, ..., X_n are independent.

Example 8.10 Problem Consider the outdoor temperature at a certain weather station. On May 5, the temperature measurements in units of degrees Fahrenheit taken at 6 AM, 12 noon, and 6 PM are all Gaussian random variables, X_1, X_2, X_3, with variance 16 degrees^2. The expected values are 50 degrees, 62 degrees, and 58 degrees respectively. The covariance matrix of the three measurements is C_X = [16.0 12.8 11.2; 12.8 16.0 12.8; 11.2 12.8 16.0]. (8.45) (a) Write the joint PDF of X_1, X_2 using the algebraic notation of Definition 5.10. (b) Write the joint PDF of X_1, X_2 using vector notation. (c) Write the joint PDF of X = [X_1 X_2 X_3]' using vector notation.

Example 8.10 Solution (a) First we note that X_1 and X_2 have expected values µ_1 = 50 and µ_2 = 62, variances σ_1^2 = σ_2^2 = 16, and covariance Cov[X_1, X_2] = 12.8. It follows from Definition 5.6 that the correlation coefficient is ρ_{X_1,X_2} = Cov[X_1, X_2] / (σ_1 σ_2) = 12.8/16 = 0.8. (8.46) From Definition 5.10, the joint PDF is f_{X_1,X_2}(x_1, x_2) = (1/60.3) exp( -[(x_1 - 50)^2 - 1.6(x_1 - 50)(x_2 - 62) + (x_2 - 62)^2] / 11.52 ). (b) Let W = [X_1 X_2]' denote a vector representation for random variables X_1 and X_2. From the covariance matrix C_X, we observe that the 2 × 2 submatrix in the upper left corner is the covariance matrix of the random vector W. Thus [Continued]

Example 8.10 Solution (Continued 2) µ_W = [50; 62], C_W = [16.0 12.8; 12.8 16.0]. (8.47) We observe that det(C_W) = 92.16 and det(C_W)^{1/2} = 9.6. From Definition 8.12, the joint PDF of W is f_W(w) = (1/60.3) exp( -(1/2)(w - µ_W)' C_W^{-1} (w - µ_W) ). (8.48) (c) Since µ_X = [50 62 58]' and det(C_X)^{1/2} = 22.717, X has PDF f_X(x) = (1/357.8) exp( -(1/2)(x - µ_X)' C_X^{-1} (x - µ_X) ). (8.49)

Theorem 8.11 Given an n-dimensional Gaussian random vector X with expected value µ_X and covariance C_X, and an m × n matrix A with rank(A) = m, Y = AX + b is an m-dimensional Gaussian random vector with expected value µ_Y = Aµ_X + b and covariance C_Y = A C_X A'.

Proof: Theorem 8.11 The proof of Theorem 8.8 contains the derivations of µ_Y and C_Y. Our proof that Y has a Gaussian PDF is confined to the special case when m = n and A is an invertible matrix. The case of m < n is addressed in Problem 8.5.14. When m = n, we use Theorem 8.6 to write f_Y(y) = (1/|det(A)|) f_X(A^{-1}(y - b)) (8.50) = exp( -(1/2)[A^{-1}(y - b) - µ_X]' C_X^{-1} [A^{-1}(y - b) - µ_X] ) / ((2π)^{n/2} |det(A)| det(C_X)^{1/2}). (8.51) In the exponent of f_Y(y), we observe that A^{-1}(y - b) - µ_X = A^{-1}[y - (Aµ_X + b)] = A^{-1}(y - µ_Y), (8.52) since µ_Y = Aµ_X + b. [Continued]

Proof: Theorem 8.11 (Continued 2) Applying (8.52) to (8.51) yields f_Y(y) = exp( -(1/2)[A^{-1}(y - µ_Y)]' C_X^{-1} [A^{-1}(y - µ_Y)] ) / ((2π)^{n/2} |det(A)| det(C_X)^{1/2}). (8.53) Using the identities |det(A)| det(C_X)^{1/2} = det(A C_X A')^{1/2} and (A^{-1})' = (A')^{-1}, we can write f_Y(y) = exp( -(1/2)(y - µ_Y)' (A')^{-1} C_X^{-1} A^{-1} (y - µ_Y) ) / ((2π)^{n/2} det(A C_X A')^{1/2}). (8.54) Since (A')^{-1} C_X^{-1} A^{-1} = (A C_X A')^{-1}, we see from Equation (8.54) that Y is a Gaussian vector with expected value µ_Y and covariance matrix C_Y = A C_X A'.

Example 8.11 Problem Continuing Example 8.10, use the formula Y_i = (5/9)(X_i - 32) to convert the three temperature measurements to degrees Celsius. (a) What is µ_Y, the expected value of random vector Y? (b) What is C_Y, the covariance of random vector Y? (c) Write the joint PDF of Y = [Y_1 Y_2 Y_3]' using vector notation.

Example 8.11 Solution (a) In terms of matrices, we observe that Y = AX + b, where A = [5/9 0 0; 0 5/9 0; 0 0 5/9] and b = -(160/9)[1; 1; 1]. (8.55) (b) Since µ_X = [50 62 58]', from Theorem 8.11, µ_Y = Aµ_X + b = [10; 50/3; 130/9]. (8.56) (c) The covariance of Y is C_Y = A C_X A'. We note that A = A' = (5/9)I, where I is the 3 × 3 identity matrix. Thus C_Y = (5/9)^2 C_X and C_Y^{-1} = (9/5)^2 C_X^{-1}. The PDF of Y is f_Y(y) = (1/61.35) exp( -(81/50)(y - µ_Y)' C_X^{-1} (y - µ_Y) ). (8.57)
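
The conversion is also a one-liner with Theorem 8.11 in Matlab (a sketch using the Example 8.10 numbers):

% Example 8.11 via Theorem 8.11: Celsius conversion Y = A*X + b.
muX = [50; 62; 58];
CX  = [16.0 12.8 11.2; 12.8 16.0 12.8; 11.2 12.8 16.0];
A   = (5/9)*eye(3);  b = -(160/9)*ones(3,1);
muY = A*muX + b                          % [10; 50/3; 130/9]
CY  = A*CX*A'                            % equals (5/9)^2 * CX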

Definition 8.13 Standard Normal Random Vector The n-dimensional standard normal random vector Z is the n-dimensional Gaussian random vector with E[Z] = 0 and C Z = I.

Theorem 8.12 For a Gaussian (µ_X, C_X) random vector, let A be an n × n matrix with the property AA' = C_X. The random vector Z = A^{-1}(X - µ_X) is a standard normal random vector.

Proof: Theorem 8.12 Applying Theorem 8.11 with A replaced by A^{-1} and b = -A^{-1}µ_X, we have that Z is a Gaussian random vector with expected value E[Z] = E[A^{-1}(X - µ_X)] = A^{-1} E[X - µ_X] = 0 (8.58) and covariance C_Z = A^{-1} C_X (A^{-1})' = A^{-1} A A' (A')^{-1} = I. (8.59)

Theorem 8.13 Given the n-dimensional standard normal random vector Z, an invertible n × n matrix A, and an n-dimensional vector b, X = AZ + b is an n-dimensional Gaussian random vector with expected value µ_X = b and covariance matrix C_X = AA'.

Proof: Theorem 8.13 By Theorem 8.11, X is a Gaussian random vector with expected value µ_X = E[X] = E[AZ + b] = A E[Z] + b = b. (8.60) The covariance of X is C_X = A C_Z A' = A I A' = AA'. (8.61)

Theorem 8.14 For a Gaussian vector X with covariance C_X, there always exists a matrix A such that C_X = AA'.

Proof: Theorem 8.14 To verify this fact, we connect some simple facts: In Problem 8.4.12, we ask you to show that every random vector X has a positive semidefinite covariance matrix C_X. By Math Fact B.17, every eigenvalue of C_X is nonnegative. The definition of the Gaussian vector PDF requires the existence of C_X^{-1}. Hence, for a Gaussian vector X, all eigenvalues of C_X are nonzero. From the previous step, we observe that all eigenvalues of C_X must be positive. Since C_X is a real symmetric matrix, Math Fact B.15 says it has a singular value decomposition (SVD) C_X = UDU', where D = diag[d_1, ..., d_n] is the diagonal matrix of eigenvalues of C_X. Since each d_i is positive, we can define D^{1/2} = diag[sqrt(d_1), ..., sqrt(d_n)], and we can write C_X = U D^{1/2} D^{1/2} U' = (U D^{1/2})(U D^{1/2})'. (8.62) We see that A = U D^{1/2}.
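
The construction in the proof is exactly what a few lines of Matlab do (a sketch; the covariance values are reused from Example 8.10, and the last two lines show the factor being used as in Theorem 8.13 to generate one Gaussian sample):

% Sketch of Theorem 8.14: factor CX = A*A' from the eigendecomposition CX = U*D*U'.
CX = [16.0 12.8 11.2; 12.8 16.0 12.8; 11.2 12.8 16.0];
[U, D] = eig(CX);                  % columns of U are eigenvectors; D holds the (positive) eigenvalues
A = U*sqrt(D);                     % A = U*D^(1/2), as in Equation (8.62)
disp(norm(A*A' - CX))              % numerically zero
Z = randn(3, 1);                   % standard normal vector (Definition 8.13)
X = A*Z + [50; 62; 58];            % one Gaussian (muX, CX) sample via Theorem 8.13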

Quiz 8.5 Z is the two-dimensional standard normal random vector. The Gaussian random vector X has components X_1 = 2Z_1 + Z_2 + 2 and X_2 = Z_1 - Z_2. (8.65) Calculate the expected value vector µ_X and the covariance matrix C_X.

Quiz 8.5 Solution We observe that X = AZ + b, where A = [2 1; 1 -1] and b = [2; 0]. (1) It follows from Theorem 8.13 that µ_X = b and that C_X = AA' = [2 1; 1 -1] [2 1; 1 -1]' = [5 1; 1 2].

Section 8.6 Matlab

Example 8.12 Problem Finite random vector X = [X_1 X_2 ··· X_5]' has PMF P_X(x) = k sqrt(x'x) for x_i ∈ {-10, -9, ..., 10}, i = 1, 2, ..., 5, and 0 otherwise. (8.66) What is the constant k? Find the expected value and standard deviation of X_3.

Example 8.12 Solution Summing P_X(x) over all possible values of x is the sort of tedious task that Matlab handles easily. Here are the code and corresponding output:

%x5.m
sx=-10:10;
[SX1,SX2,SX3,SX4,SX5]=ndgrid(sx,sx,sx,sx,sx);
P=sqrt(SX1.^2+SX2.^2+SX3.^2+SX4.^2+SX5.^2);
k=1.0/(sum(sum(sum(sum(sum(P))))))
P=k*P;
EX3=sum(sum(sum(sum(sum(P.*SX3)))))
EX32=sum(sum(sum(sum(sum(P.*(SX3.^2))))));
sigma3=sqrt(EX32-(EX3)^2)

>> x5
k = 1.8491e-008
EX3 = -3.2960e-017
sigma3 = 6.3047
>>

In fact, by symmetry arguments, it should be clear that E[X_3] = 0. In adding 21^5 terms, Matlab's finite precision led to a small error on the order of 10^-17.

Example 8.13 Problem Write a Matlab function f=gaussvectorpdf(mu,c,x) that calculates f X (x) for a Gaussian (µ, C) random vector.

Example 8.13 Solution

function f=gaussvectorpdf(mu,c,x)
n=length(x);
z=x(:)-mu(:);
f=exp(-0.5*z'*inv(c)*z)/sqrt((2*pi)^n*det(c));

gaussvectorpdf computes the Gaussian vector PDF f_X(x) of Definition 8.12. Of course, Matlab makes the calculation simple by providing operators for matrix inverses and determinants.
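
As a usage check (my own test values, taken from Example 8.10, assuming gaussvectorpdf.m is on the Matlab path), evaluating the PDF at its expected value should return 1/((2π)^{3/2} det(C_X)^{1/2}) = 1/357.8:

% Evaluate the trivariate temperature PDF of Example 8.10 at x = muX.
muX = [50; 62; 58];
CX  = [16.0 12.8 11.2; 12.8 16.0 12.8; 11.2 12.8 16.0];
f0  = gaussvectorpdf(muX, CX, muX)       % expect 1/357.8, about 2.795e-03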

Quiz 8.6 The daily noon temperature, measured in degrees Fahrenheit, in New Jersey in July can be modeled as a Gaussian random vector T = [T_1 ··· T_31]', where T_i is the temperature on the ith day of the month. Suppose that E[T_i] = 80 for all i, and that T_i and T_j have covariance Cov[T_i, T_j] = 36 / (1 + |i - j|). (8.67) Define the daily average temperature as Y = (T_1 + T_2 + ··· + T_31) / 31. (8.68) Based on this model, write a Matlab program p=julytemps(T) that calculates P[Y ≥ T], the probability that the daily average temperature is at least T degrees.

Quiz 8.6 Solution First, we observe that Y = AT where A = [1/31 1/31 ··· 1/31]. Since T is a Gaussian random vector, Theorem 8.11 tells us that Y is a 1-dimensional Gaussian vector, i.e., just a Gaussian random variable. The expected value of Y is µ_Y = µ_T = 80. The covariance matrix of Y is 1 × 1 and is just equal to Var[Y]. Thus, by Theorem 8.11, Var[Y] = A C_T A'. In julytemps.m shown below, the first two lines generate the 31 × 31 covariance matrix CT, or C_T. Next we calculate Var[Y]. The final step is to use the Φ(·) function to calculate P[Y < T].

function p=julytemps(T);
[D1 D2]=ndgrid((1:31),(1:31));
CT=36./(1+abs(D1-D2));
A=ones(31,1)/31.0;
CY=(A')*CT*A;
p=phi((T-80)/sqrt(CY));

[Continued]

Quiz 8.6 Solution (Continued 2) Here is the output of julytemps.m:

>> julytemps([70 75 80 85 90])
ans = 0.0000 0.0221 0.5000 0.9779 1.0000

Note that P[Y ≤ 70] is not actually zero and that P[Y ≤ 90] is not actually 1.0000. It's just that Matlab's short format output, invoked with the command format short, rounds off those probabilities. The long format output resembles:

>> format long
>> julytemps([70 75])
ans = 0.000028442631 0.022073830676
>> julytemps([85 90])
ans = 0.977926169323 0.999971557368

The ndgrid function is a useful way to calculate many covariance matrices. However, in this problem, C_T has a special structure; the i, jth element is [Continued]

Quiz 8.6 Solution (Continued 3) C_T(i, j) = c_{|i-j|} = 36 / (1 + |i - j|). (1) If we write out the elements of the covariance matrix, we see that C_T = [c_0 c_1 ··· c_30; c_1 c_0 ··· c_29; ··· ; c_30 c_29 ··· c_0], (2) a matrix that is constant along each diagonal. This covariance matrix is known as a symmetric Toeplitz matrix. Because Toeplitz covariance matrices are quite common, Matlab has a toeplitz function for generating them. The function julytemps2 uses toeplitz to generate the covariance matrix C_T:

function p=julytemps2(T);
c=36./(1+abs(0:30));
CT=toeplitz(c);
A=ones(31,1)/31.0;
CY=(A')*CT*A;
p=phi((T-80)/sqrt(CY));