ECE 650 Lecture 4: Intro to Estimation Theory; Random Vectors (D. Van Alphen)


Lecture Overview: Random Variables & Estimation Theory

- Functions of RVs (5.9)
- Introduction to Estimation Theory
- MMSE Estimation Example + Handout Intro/Reference
- Orthogonality Principle, with Example
- Random Vectors & Transformations of Random Vectors
- Independent Experiments & Repeated Trials
- Complex Random Variables: pdf, covariance, variance
- (Ch. 6) Correlation Matrix, Covariance Matrix for Random Vectors
- Conditional Densities and Distributions
- Conditional Expected Values
- Characteristic Functions for Random Vectors

Functions of RVs

Start with RVs X and Y whose joint statistics are known. Consider these as inputs to two systems, say g and h, yielding new RVs Z and W:

Z = g(X, Y),   W = h(X, Y)

(X, Y: the old RVs, with known joint pdf; Z, W: the new RVs)

Goal: Find the statistical description (i.e., the joint pdf) for Z and W; then use f_ZW(z, w) to find f_Z(z) and f_W(w).

Functions of RVs

z = g(x, y);  w = h(x, y)

Claim:

f_ZW(z, w) = f_XY(x_1, y_1)/|J(x_1, y_1)| + f_XY(x_2, y_2)/|J(x_2, y_2)| + ... + f_XY(x_n, y_n)/|J(x_n, y_n)|

(summing over all roots (x_i, y_i) of the system z = g(x, y), w = h(x, y)), where

J(x, y) = det [ ∂z/∂x  ∂z/∂y ;  ∂w/∂x  ∂w/∂y ]

is the Jacobian of the transformation ("d(new)/d(old)"). Finally, express the x's and y's in terms of z and w (the inverse functions x = x(z, w), y = y(z, w)) to get rid of x, y in the answer.

Functions of RVs: An Example

Consider the linear transformation:  z = ax + by,  w = cx + dy

J(x, y) = det [ a  b ;  c  d ] = ad - bc

Solve the original system backwards for x and y:

x = (dz - bw)/k,   y = (aw - cz)/k,   where k = ad - bc

To get rid of x and y:

f_ZW(z, w) = (1/|k|) f_XY( (dz - bw)/k, (aw - cz)/k )

Note: if X, Y are jointly normal, then so are Z, W.
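As a quick numerical sanity check (not from the lecture), the NumPy sketch below applies this linear-transformation result with hypothetical coefficients a = 2, b = 1, c = 1, d = 3 and independent N(0, 1) inputs, and compares the formula for f_ZW at one test point against an empirical density estimate from simulated samples.

```python
import numpy as np

# Sanity check of f_ZW(z, w) = f_XY((dz - bw)/k, (aw - cz)/k) / |k|, k = ad - bc,
# for the linear map z = ax + by, w = cx + dy, using independent N(0, 1) inputs.
rng = np.random.default_rng(0)
a, b, c, d = 2.0, 1.0, 1.0, 3.0          # hypothetical coefficients
k = a * d - b * c                         # Jacobian (constant for a linear map)

n = 2_000_000
x, y = rng.standard_normal(n), rng.standard_normal(n)
z, w = a * x + b * y, c * x + d * y

def f_xy(x, y):                           # joint pdf of two independent N(0, 1) RVs
    return np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)

z0, w0 = 1.0, -0.5                        # test point
theory = f_xy((d * z0 - b * w0) / k, (a * w0 - c * z0) / k) / abs(k)

# Empirical density: fraction of samples in a small box around (z0, w0), / box area
h = 0.1
inside = (np.abs(z - z0) < h / 2) & (np.abs(w - w0) < h / 2)
empirical = inside.mean() / h**2

print(f"theory    = {theory:.5f}")
print(f"empirical = {empirical:.5f}")
```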

Linear Transformation: Example, continued

Special case:  z = x cos φ + y sin φ,  w = -x sin φ + y cos φ

(a = cos φ, b = sin φ, c = -sin φ, d = cos φ, so k = ad - bc = cos²φ + sin²φ = 1)

(Sketch: the point (x_0, y_0) maps to (z_0, w_0): a rotation of the RVs (x, y) by angle φ.)

Introduction to Estimation Theory (not from Miller & Childers)

(Block diagram: Tx sends Y, the RV of interest; a random disturbance is added; Rcv observes X, the observable, or data.)

Goal: get an estimate of Y in terms of the observation X, i.e., as a function f(X) of X.

Best estimate (one possibility): minimize the mean square value of the estimation error (the MS estimate, or MMSE). Choose f(X) to minimize:  E{[Y - f(X)]²}

Case 1: Estimation of RV Y by a Constant c:  f(x) = c

Notation: Let e = E{[Y - c]²} = ∫ (y - c)² f_Y(y) dy

Choose c to minimize e, the mean-squared error:

de/dc = -2 ∫ (y - c) f_Y(y) dy = 0
⇒ ∫ y f_Y(y) dy = c ∫ f_Y(y) dy = c
⇒ c = η_Y  (the mean of Y)
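A minimal simulation sketch of Case 1 (my own illustration, assuming a hypothetical exponential distribution for Y): sweep the constant c over a grid and check that the mean-squared error is smallest near the sample mean.

```python
import numpy as np

# Numerical check of Case 1: among constant estimates c, the mean minimizes E{(Y - c)^2}.
# Hypothetical distribution: Y ~ Exponential(mean 2); any distribution would do.
rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=500_000)

cs = np.linspace(0.0, 4.0, 401)
mse = [np.mean((y - c) ** 2) for c in cs]
c_best = cs[int(np.argmin(mse))]

print(f"sample mean eta_Y      = {y.mean():.3f}")
print(f"best constant c (grid) = {c_best:.3f}   # should be close to the mean")
```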

Case 2: Linear MS Estimation of RV Y

Goal: get an estimate of Y as a linear function of the observation X:  f(X) = AX + B

Choose A, B to minimize  e = E{[Y - (AX + B)]²}   (*)

First fix A; the equivalent requirement (to *) is: choose the constant B to minimize

e = E{[(Y - AX) - B]²}   (**)

(Y - AX is the quantity to be estimated; B is the constant estimate.)

By Case 1, we want:  B = E{Y - AX} = η_Y - A η_X

Thus, (**) becomes (plugging in for B):

e = E{[(Y - AX) - (η_Y - A η_X)]²} = E{[(Y - η_Y) - A(X - η_X)]²}

Case 2: Linear MS Estimation of RV Y, continued

Continuing with:

e = E{[(Y - η_Y) - A(X - η_X)]²}
  = E{(Y - η_Y)²} - 2A E{(X - η_X)(Y - η_Y)} + A² E{(X - η_X)²}
  = σ_Y² - 2A ρ σ_X σ_Y + A² σ_X²        (the middle expectation is Cov(X, Y) = ρ σ_X σ_Y)

Now set de/dA = 0 to minimize e by our choice of A:

ρ σ_X σ_Y = A σ_X²   ⇒   A = ρ σ_Y / σ_X

Also:  A = ρ σ_Y/σ_X = [cov(X, Y)/(σ_X σ_Y)] (σ_Y/σ_X) = cov(X, Y)/σ_X²
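The following sketch (not from the lecture) estimates A and B from simulated data and confirms that perturbing A increases the mean-squared error; the model Y = 3X + noise and all parameters are hypothetical choices made only for illustration.

```python
import numpy as np

# Case 2 check: A = rho*sigma_Y/sigma_X = cov(X, Y)/var(X), B = eta_Y - A*eta_X.
# Hypothetical correlated pair: Y = 3X + noise, with X ~ N(0, 1).
rng = np.random.default_rng(2)
n = 200_000
x = rng.standard_normal(n)
y = 3.0 * x + rng.normal(scale=0.5, size=n)

A = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # cov(X, Y) / sigma_X^2
B = y.mean() - A * x.mean()                     # eta_Y - A * eta_X

# Compare the MSE of (A, B) with that of slightly perturbed coefficients.
mse = lambda a, b: np.mean((y - (a * x + b)) ** 2)
print(f"A = {A:.3f}, B = {B:.3f}")
print(f"MSE at (A, B)       = {mse(A, B):.4f}")
print(f"MSE at (A + 0.1, B) = {mse(A + 0.1, B):.4f}   # strictly larger")
```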

Vocabulary for Case 2: Linear MS Estimation of RV Y

- The linear estimate (non-homogeneous linear estimate): AX + B
- The homogeneous linear estimate: AX
- The data or observable of the estimate: the RV X
- The error of the estimate: E = Y - (AX + B)
- The mean-squared error of the estimate: e = E{E²}

Case 3: Non-linear MS Estimate of Y by Some Function c(X)

(No constraints: an arbitrary function is the best choice for minimizing the MS error.)

Goal: find c(x) to minimize:

e = E{[Y - c(X)]²} = ∫∫ [y - c(x)]² f(x, y) dx dy = ∫ f(x) { ∫ [y - c(x)]² f(y|x) dy } dx

Since f(x) ≥ 0, minimize the inner integral for each fixed x. But note: c(x) is a constant for each fixed x, and f(y|x) is just some pdf in y for each fixed x, so from Case 1:

c(x) = E{Y | X = x} = ∫ y f(y|x) dy
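One way to see Case 3 at work is to approximate E{Y | X = x} by binning, as in the hedged sketch below; the quadratic model Y = X² + noise, the bin count, and every parameter are hypothetical choices, not part of the lecture.

```python
import numpy as np

# Case 3 sketch: approximate c(x) = E{Y | X = x} by binning x and averaging y in each bin,
# then compare its MSE to the best linear estimate AX + B.  Hypothetical model: Y = X^2 + noise.
rng = np.random.default_rng(3)
n = 400_000
x = rng.uniform(-1.0, 1.0, n)
y = x ** 2 + rng.normal(scale=0.1, size=n)

# Binned conditional mean (a crude stand-in for E{Y | X = x})
edges = np.linspace(-1.0, 1.0, 51)
idx = np.clip(np.digitize(x, edges) - 1, 0, 49)
bin_means = np.array([y[idx == k].mean() for k in range(50)])
y_hat_nonlinear = bin_means[idx]

# Best linear estimate from Case 2
A = np.cov(x, y, bias=True)[0, 1] / np.var(x)
B = y.mean() - A * x.mean()
y_hat_linear = A * x + B

print(f"MSE, nonlinear c(x) = E[Y|X=x]: {np.mean((y - y_hat_nonlinear) ** 2):.4f}")
print(f"MSE, best linear AX + B:        {np.mean((y - y_hat_linear) ** 2):.4f}")
```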

Case 3, Special Cases: (1) Y = g(X); (2) X, Y independent

1. Here Y is a deterministic, known function of X, so:
   c(x) = E{Y | X = x} = g(x), and e = E{E²} = E{[Y - c(X)]²} = E{[Y - g(X)]²} = 0.

2. Here knowing X tells us nothing about Y:
   c(x) = E{Y | X} = E{Y}, a constant, independent of the observation.

Notes on Estimation Theory

In general, the non-linear MS estimate c(x) = E{Y | X = x} is not a straight line, and will yield a smaller e than the linear estimate AX + B (but it is hard to find, whereas the linear estimate is easy). However, if X and Y are jointly normal, the non-linear MS estimate and the linear MS estimate are identical:  E{Y | X} = AX + B
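A rough numerical illustration of this fact (with assumed parameters ρ = 0.8, σ_X = 1, σ_Y = 2, zero means): conditioning simulated jointly normal samples on X ≈ x_0 gives conditional means that track the line A x_0.

```python
import numpy as np

# For jointly normal X, Y the conditional mean E{Y | X = x} is exactly the line A x + B.
# Hypothetical parameters: zero means, sigma_X = 1, sigma_Y = 2, rho = 0.8  (so B = 0 here).
rng = np.random.default_rng(4)
rho, sig_x, sig_y = 0.8, 1.0, 2.0
cov = [[sig_x**2, rho * sig_x * sig_y], [rho * sig_x * sig_y, sig_y**2]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=500_000).T

A = rho * sig_y / sig_x                          # theoretical slope
for x0 in (-1.0, 0.0, 1.5):
    sel = np.abs(x - x0) < 0.05                  # condition on X ~= x0
    print(f"x0 = {x0:+.1f}: E[Y|X=x0] ~ {y[sel].mean():+.3f}, A*x0 = {A * x0:+.3f}")
```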

Summary: MMSE Estimation

(Tx: Y, the RV of interest; a random disturbance is added; Rcv: X, the observable.)

Case 1: Estimating Y by a constant c:  c = η_Y = E{Y}
Case 2: Linear estimate of Y [Ŷ = AX + B]:  B = η_Y - A η_X,  A = ρ σ_Y / σ_X
Case 3: Arbitrary estimate of Y [Ŷ = c(X)]:  c(x) = E{Y | X = x} = ∫ y f_Y(y|x) dy
        (reduces to the linear estimate if X, Y are jointly Gaussian)

Recall: RVs X and Y are orthogonal iff E{XY} = 0.

MMSE Estimation Example

(Tx: Y, the RV of interest; a random disturbance is added; Rcv: X, the observable.)

Assume X ~ U(0, 1), and Y ~ U(0, x) given X = x.

Find the (unconstrained) MMSE estimate of Y, given X = x.

Solution: Since f_Y(y|x) = 1/x on (0, x),

ŷ_MMSE = E{Y | X = x} = ∫ y f_Y(y|x) dy = ∫_0^x y (1/x) dy = (1/x)(x²/2) = x/2
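The example can also be checked by simulation; the sketch below (not part of the handout) draws X ~ U(0, 1), then Y | X = x ~ U(0, x), and compares the conditional sample mean near a few values x_0 with x_0/2.

```python
import numpy as np

# Simulation of the example: X ~ U(0, 1), and given X = x, Y ~ U(0, x).
# The MMSE estimate E{Y | X = x} should be x/2.
rng = np.random.default_rng(5)
n = 1_000_000
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(0.0, x)                 # Y | X = x  ~  U(0, x)

for x0 in (0.2, 0.5, 0.9):
    sel = np.abs(x - x0) < 0.01
    print(f"x0 = {x0}: E[Y|X~x0] ~ {y[sel].mean():.3f}   (x0/2 = {x0/2:.3f})")
```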

Cond. Prob. & Estimation Example (See separate handout on web page for solution.)

(Scheaffer & McClave) A soft-drink machine has a random amount Y_2 in supply at the beginning of a given day, and dispenses a random amount Y_1 during the day (say in gallons). It is not re-supplied during the day; hence Y_1 ≤ Y_2. The joint density for Y_1 and Y_2 is:

f(y_1, y_2) = 1/2 for 0 ≤ y_1 ≤ y_2, 0 ≤ y_2 ≤ 2;  0 else

(That is, the points (y_1, y_2) are uniformly distributed over the triangle shown in the look-down sketch of Y_2 available vs. Y_1 dispensed.)

Find the conditional probability density of Y_1, given that Y_2 = y_2. Also evaluate the probability that less than 1/2 gallon is sold, given that the machine contains 1 gallon at the start of the day.

Orthogonality Principle

Consider the linear MMSE estimate AX + B of RV Y. The MSE e is a function of A and B, and is thus minimized when:  ∂e/∂A = 0,  ∂e/∂B = 0

∂e/∂A = ∂/∂A E[(Y - (AX + B))²] = E[2(Y - (AX + B))(-X)] = 0
⇒ E[(Y - (AX + B)) (X)] = 0
     (error RV)      (data, or observation)

The linear MMSE estimate AX + B of Y is the one that makes the error orthogonal to the data.

Orthogonality Principle: Intuitive Sketch

Sketch for the case of the homogeneous linear MMSE estimate (B = 0):

(Sketch: the vector y, the data vector x, the estimate Ax along x, and the error y - Ax drawn perpendicular to x.)

Note that B = 0 means the estimate is ŷ = Ax, in the same direction as x.

Example: Finding a Homogeneous Linear MMSE Estimate

Find a such that e = E{[Y - aX]²} is minimum (Y - aX is the error).

Applying the orthogonality principle, we need:

E{[Y - aX] X} = 0   (error orthogonal to data)
⇒ E{YX} - a E{X²} = 0
⇒ a = E{XY} / E{X²}
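A small sketch of this result (my own illustration, with a hypothetical relationship Y = 4X + noise and X ~ U(1, 2)): compute a = E{XY}/E{X²} from samples and verify that the resulting error is numerically orthogonal to the data.

```python
import numpy as np

# Orthogonality-principle check for the homogeneous estimate Y_hat = a X:
# a = E{XY}/E{X^2}, and the error (Y - aX) should be orthogonal to the data X.
rng = np.random.default_rng(6)
n = 300_000
x = rng.uniform(1.0, 2.0, n)                       # hypothetical observable
y = 4.0 * x + rng.normal(scale=0.3, size=n)        # hypothetical relationship

a = np.mean(x * y) / np.mean(x ** 2)
err = y - a * x
print(f"a = {a:.4f}")
print(f"E[error * X] = {np.mean(err * x):.2e}   # ~ 0: error orthogonal to data")
```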

Random Vectors

A random vector is a column vector X = [X_1, X_2, ..., X_n]^T whose components X_i are RVs, where T denotes the transpose.

To find the probability that random vector X is in region D, we do an n-dimensional integral of the pdf over region D:

Pr{X ∈ D} = ∫_D f_X(x_1, x_2, ..., x_n) dx_1 dx_2 ... dx_n,

where the joint density for the RVs is

f_X(x) = f_X(x_1, x_2, ..., x_n) = ∂ⁿ F(x_1, x_2, ..., x_n) / (∂x_1 ... ∂x_n)

and the joint cdf for the RVs is:

F_X(x_1, x_2, ..., x_n) = Pr{X_1 ≤ x_1, ..., X_n ≤ x_n}

Mean Vectors

The random (column) vector X = [X_1, X_2, ..., X_n]^T has mean (vector)

E(X) = [η_X1, η_X2, ..., η_Xn]^T

where each entry in the vector is the mean of the corresponding RV.

Example: Consider the random vector [X_1, X_2, X_3, X_4]^T where the component RVs X_k are independent Gaussians, with X_k ~ N(η_Xk = k, σ_Xk = k). Then the mean vector is:

E(X) = [η_X1, η_X2, η_X3, η_X4]^T = [1, 2, 3, 4]^T
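A quick check of this example in NumPy (simulation-based, so the result is only approximate):

```python
import numpy as np

# Mean-vector example: four independent Gaussians with eta_k = k and sigma_k = k.
rng = np.random.default_rng(7)
n = 200_000
k = np.arange(1, 5)
X = rng.normal(loc=k, scale=k, size=(n, 4))        # rows are samples of the random vector

print("sample mean vector:", np.round(X.mean(axis=0), 2))   # ~ [1, 2, 3, 4]
```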

Random Vectors, continued

In the overall joint cdf F_X(x_1, ..., x_n): replace some of the arguments by ∞ to obtain the joint cdf of the remaining RVs,

e.g., F(x_1, ∞, x_3, ∞) = F(x_1, x_3)

Integrate the overall joint pdf f_X(x_1, ..., x_n) over some of the arguments to obtain the joint pdf of the remaining RVs,

e.g., ∫∫ f_X(x_1, x_2, x_3, x_4) dx_2 dx_4 = f(x_1, x_3)

Transformations of Random Vectors (6.4.1)

Given n functions g_1(X), ..., g_n(X), where X = [X_1, ..., X_n]^T, consider the RVs:

Y_1 = g_1(X), ..., Y_n = g_n(X)

Then solve the system backwards for the x_i's in terms of the y_i's.

1. If the system of equations has no roots, then f_Y(y_1, ..., y_n) = 0.
2. If the system of equations has a single root, then:

   f_Y(y_1, ..., y_n) = f_X(x_1, ..., x_n) / |J(x_1, ..., x_n)|   (*)

where

J(x_1, ..., x_n) = det [ ∂g_1/∂x_1 ... ∂g_1/∂x_n ;  ... ;  ∂g_n/∂x_1 ... ∂g_n/∂x_n ]

is the Jacobian of the transformation ("d(new)/d(old)").

3. If the system of equations has multiple roots, then add the corresponding terms (one for each root) to equation (*), summing over all roots.
4. Replace the x_i's in the final equation by the y_i's obtained from the "solve backwards" step.
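As a concrete instance of this recipe (an assumed example, not from the lecture), take X, Y iid N(0, 1) and the Cartesian-to-polar transform R = √(X² + Y²), Θ = atan2(Y, X). The Jacobian d(new)/d(old) works out to 1/r, so f_{R,Θ}(r, θ) = r f_XY(r cos θ, r sin θ) and the marginal f_R(r) should be the Rayleigh pdf r e^{-r²/2}; the sketch below checks that numerically.

```python
import numpy as np

# X, Y iid N(0, 1); Y1 = g1(X, Y) = sqrt(X^2 + Y^2) = R, Y2 = g2(X, Y) = atan2(Y, X) = Theta.
# Jacobian of (x, y) -> (r, theta) is 1/r, so f_{R,Theta}(r, th) = r * exp(-r^2/2) / (2*pi),
# i.e. R is Rayleigh(1) and Theta is uniform on (-pi, pi].
rng = np.random.default_rng(8)
n = 1_000_000
x, y = rng.standard_normal(n), rng.standard_normal(n)
r = np.hypot(x, y)

r0, h = 1.2, 0.02
empirical = np.mean(np.abs(r - r0) < h / 2) / h              # density of R near r0
theory = r0 * np.exp(-r0 ** 2 / 2)                           # Rayleigh(1) pdf
print(f"f_R({r0}): empirical = {empirical:.4f}, theory = {theory:.4f}")
```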

Independence of RVs

The RVs X_1, ..., X_n are (mutually) independent iff:

F(x_1, ..., x_n) = F(x_1) ... F(x_n)   ⇔   f(x_1, ..., x_n) = f(x_1) ... f(x_n)

If the RVs X_1, ..., X_n are independent, then so are the RVs Y_1 = g_1(X_1), ..., Y_n = g_n(X_n). (Functions of independent RVs are themselves independent.)

Independent Experiments & Repeated Trials

Let Sⁿ = S_1 × S_2 × ... × S_n be the sample space of a combined experiment where RV X_i depends only on the outcome ζ_i of S_i; i.e., X_i(ζ_1, ζ_2, ..., ζ_i, ..., ζ_n) = X_i(ζ_i).

Special case: repeat the same experiment n times; then each of the repetitions is independent of the others, and the RVs X_i are independent and identically distributed (iid).

Example: Toss a coin 100 times; let X_i = 1 if the i-th toss is heads, 0 if tails. Each X_i has the pmf f_Xi(x_i) = 1/2 at x_i = 0 and at x_i = 1.
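A toy simulation of the repeated-trials setup (the seed and run counts are arbitrary, hypothetical choices), showing that individual tosses share the same pmf and that distinct tosses are essentially uncorrelated:

```python
import numpy as np

# Repeated-trials sketch: 100 independent fair-coin tosses as iid Bernoulli(1/2) RVs.
rng = np.random.default_rng(9)
trials = rng.integers(0, 2, size=(50_000, 100))    # each row: one run of 100 tosses

# Every X_i has the same pmf (P{X_i = 1} ~ 1/2), and distinct tosses are uncorrelated.
print("P{X_7 = 1}      ~", trials[:, 7].mean())
print("corr(X_3, X_42) ~", round(np.corrcoef(trials[:, 3], trials[:, 42])[0, 1], 4))
```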

Correlation Matrices for Random Vectors

Multiple RVs {X_i} are uncorrelated if C_ij = cov(X_i, X_j) = 0 for all i ≠ j.

Define the correlation matrix for the random vector X = [X_1 ... X_n]^T:

R_X = R_XX = E[X X^T] = [ R_11  R_12  ...  R_1n
                          R_21  R_22  ...  R_2n
                           ...
                          R_n1  R_n2  ...  R_nn ]

where R_ij = E{X_i X_j} = R_ji is the correlation of RVs X_i and X_j. (Note that the matrix is symmetric.)

Correlation Matrices & Covariance Matrices

Define the covariance matrix for the random vector X = [X_1 ... X_n]^T:

C_X = C_XX = [ C_11  C_12  ...  C_1n
               C_21  C_22  ...  C_2n
                ...
               C_n1  C_n2  ...  C_nn ]

where C_ij = E{X_i X_j} - η_i η_j = R_ij - η_i η_j = C_ji is the covariance of RVs X_i and X_j.

Note that R_X = E{X X^T} = E{ [X_1 ... X_n]^T [X_1 ... X_n] }; the size of the product is (n, 1)(1, n) = (n, n).

Recall RVs X_i and X_j are said to be orthogonal if E{X_i X_j} = 0.
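The identity R_X = C_X + η η^T, which follows directly from the definitions above, can be checked numerically; the 3-dimensional random vector used below is a hypothetical construction for illustration only.

```python
import numpy as np

# Correlation vs. covariance matrix for a sampled random vector:
# R = E{X X^T}, C = E{(X - eta)(X - eta)^T}, and R = C + eta eta^T.
rng = np.random.default_rng(10)
n = 500_000
# Hypothetical 3-dimensional random vector with correlated components and nonzero mean
Z = rng.standard_normal((n, 3))
X = Z @ np.array([[1.0, 0.2, 0.0],
                  [0.0, 1.0, 0.5],
                  [0.0, 0.0, 1.0]]) + np.array([1.0, -2.0, 0.5])

eta = X.mean(axis=0)
R = (X.T @ X) / n                          # E{X X^T}
C = ((X - eta).T @ (X - eta)) / n          # covariance matrix
print("max |R - (C + eta eta^T)| =", np.max(np.abs(R - (C + np.outer(eta, eta)))))
```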

Correlation Matrices & Covariance Matrices: An Example

Find the covariance matrix for the random vector X = [X_1, X_2, X_3, X_4]^T where the component RVs X_k are independent Gaussians, each X_k ~ N(η = k, σ = k).

Note 1: The diagonal entries are just the variances: C_kk = k².
Note 2: The off-diagonal entries are the covariances; independent ⇒ uncorrelated ⇒ cov_ij = 0 (i ≠ j).

Thus,

C_X = [ 1  0  0  0
        0  4  0  0
        0  0  9  0
        0  0  0 16 ]
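A quick Monte Carlo check of this example (approximate, since it is sample-based):

```python
import numpy as np

# Check of the example: independent Gaussians X_k ~ N(k, k^2), k = 1..4,
# so C_X should be approximately diag(1, 4, 9, 16).
rng = np.random.default_rng(11)
n = 1_000_000
k = np.arange(1, 5)
X = rng.normal(loc=k, scale=k, size=(n, 4))

C = np.cov(X, rowvar=False)
print(np.round(C, 2))
```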

Review of Facts from Linear Algebra

Definition: A square real matrix Z of size (n, n) is non-negative definite if

Q = A Z A^T ≥ 0   (*)

for any real (row) vector A = [a_1, ..., a_n].

Non-negative definite (nnd) matrices have all eigenvalues ≥ 0. If Q in equation (*) is strictly > 0 for every nonzero A, then Z is positive definite, and all of the eigenvalues of Z will be positive.
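A small numerical illustration (assuming a matrix of the form Z = M M^T, which is always non-negative definite):

```python
import numpy as np

# Non-negative definiteness check: any matrix of the form Z = M M^T satisfies
# a Z a^T >= 0 for every real a, and its eigenvalues are all >= 0.
rng = np.random.default_rng(12)
M = rng.standard_normal((4, 4))
Z = M @ M.T                                    # a (hypothetical) nnd matrix

a = rng.standard_normal(4)
print("quadratic form a Z a^T =", a @ Z @ a)   # >= 0
print("eigenvalues of Z       =", np.round(np.linalg.eigvalsh(Z), 3))  # all >= 0
```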

Special Properties of Correlation Matrices

Let D_n be the determinant of the correlation matrix R_X of the RVs {X_i}.

1. R_X is non-negative definite.
2. D_n is real and non-negative: D_n ≥ 0.
3. D_n ≤ R_11 R_22 ... R_nn, with equality iff the RVs {X_i} are mutually orthogonal, i.e., the matrix R_X is a diagonal matrix.

Note that the covariance matrix C_X has properties similar to the three above, because it is the correlation matrix for the centered RVs {X_i - η_i}.
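Property 3 (a Hadamard-type inequality) can be spot-checked on a sampled correlation matrix; the mixing matrix below is an arbitrary, hypothetical choice used only to create correlated components.

```python
import numpy as np

# Property 3 check: det(R_X) <= R_11 R_22 ... R_nn, with equality when R_X is diagonal.
rng = np.random.default_rng(13)
n = 200_000
X = rng.standard_normal((n, 4)) @ rng.standard_normal((4, 4))   # correlated components

R = (X.T @ X) / n
print("det(R)            =", round(np.linalg.det(R), 3))
print("prod of diagonals =", round(np.prod(np.diag(R)), 3))     # >= det(R)
```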

Conditional Densities & Distributions

Recall the conditional pdf for RVs X and Y:  f(y|x) = f(x, y) / f(x)

Similarly, the conditional pdf for RVs X_n, ..., X_{k+1}, given X_k, ..., X_1, is:

f(x_n, ..., x_{k+1} | x_k, ..., x_1) = f(x_1, ..., x_k, ..., x_n) / f(x_1, ..., x_k)

Example:  f(x_1 | x_2, x_3) = f(x_1, x_2, x_3) / f(x_2, x_3) = (d/dx_1) F(x_1 | x_2, x_3)

Chain rule, with 4 RVs:

f(x_1, x_2, x_3, x_4) = f(x_4 | x_3, x_2, x_1) f(x_3 | x_2, x_1) f(x_2 | x_1) f(x_1)
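A small discrete sketch of the chain rule (the pmfs below are randomly generated and purely hypothetical): build a joint pmf from the factors p(x_1), p(x_2|x_1), p(x_3|x_1, x_2), then recover p(x_2|x_1) from the joint and confirm it matches the factor we started with.

```python
import numpy as np

# Chain-rule sketch: joint pmf p(x1, x2, x3) = p(x3|x1, x2) p(x2|x1) p(x1).
rng = np.random.default_rng(14)
p1 = np.array([0.3, 0.7])                                                 # p(x1), hypothetical
p2g1 = rng.random((2, 3)); p2g1 /= p2g1.sum(axis=1, keepdims=True)        # p(x2|x1)
p3g12 = rng.random((2, 3, 2)); p3g12 /= p3g12.sum(axis=2, keepdims=True)  # p(x3|x1,x2)

joint = p1[:, None, None] * p2g1[:, :, None] * p3g12        # chain rule
p12 = joint.sum(axis=2)                                     # marginal p(x1, x2)
print(np.allclose(p12 / p12.sum(axis=1, keepdims=True), p2g1))   # True: recovers p(x2|x1)
```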