EE 650, Lecture 4: Intro to Estimation Theory; Random Vectors
D. van Alphen
Lecture Overview: Random Variables & Estimation Theory
- Functions of RVs (5.9)
- Introduction to Estimation Theory
- MMSE Estimation Example + Handout Intro/Reference
- Orthogonality Principle, with Example
- Random Vectors & Transformations of Random Vectors
- Independent Experiments & Repeated Trials
- Complex Random Variables: pdf, covariance, variance (Ch. 6)
- Correlation Matrix, Covariance Matrix for R. Vectors
- Conditional Densities and Distributions
- Conditional Expected Values
- Characteristic Functions for R. Vectors
Functions of RVs
Start with RVs X and Y that are statistically known (old RVs, with known joint pdf). Consider these as inputs to systems, say g and h, yielding new RVs Z and W:
$$Z = g(X, Y), \qquad W = h(X, Y)$$
Goal: find the statistical description (i.e., the joint pdf) for Z and W; use $f_{ZW}(z, w)$ to find $f_Z(z)$, $f_W(w)$.
Functions of RVs
z = g(x, y); w = h(x, y)
Claim:
$$f_{ZW}(z, w) = \frac{f_{XY}(x_1, y_1)}{|J(x_1, y_1)|} + \cdots + \frac{f_{XY}(x_n, y_n)}{|J(x_n, y_n)|}$$
(summing over all roots $(x_i, y_i)$ of the system), where
$$J(x, y) = \begin{vmatrix} \partial z/\partial x & \partial z/\partial y \\ \partial w/\partial x & \partial w/\partial y \end{vmatrix}$$
is the Jacobian of the transformation ("d(new)/d(old)"). Solve the system backwards for x and y to get rid of x, y in the answer.
Functions of RVs: An Example
Consider the linear transformation: z = ax + by, w = cx + dy.
$$J(x, y) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$$
Solve the original system backwards for x and y (to get rid of x, y), with k = ad − bc:
$$x = \frac{dz - bw}{k}, \qquad y = \frac{aw - cz}{k}$$
$$f_{ZW}(z, w) = \frac{1}{|ad - bc|}\, f_{XY}\!\left(\frac{dz - bw}{k},\, \frac{aw - cz}{k}\right)$$
Note: if X, Y are jointly normal, then so are W, Z.
Linear Transformation: Example, continued
Special case:
$$z = x \cos\phi + y \sin\phi, \qquad w = -x \sin\phi + y \cos\phi$$
(a = cos φ, b = sin φ, c = −sin φ, d = cos φ)
(Sketch: the point (x₀, y₀) maps to (z₀, w₀); a rotation of the RVs (x, y) by angle φ.)
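As a numerical cross-check (my own sketch, not from the slides), the snippet below pushes jointly normal samples through the rotation above and compares the sample covariance of (Z, W) with the theoretical R C Rᵀ; the angle phi and the input covariance matrix are arbitrary assumed values. Note |J| = ad − bc = 1 for a rotation, consistent with the density formula.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.pi / 6                      # arbitrary rotation angle
C_xy = np.array([[2.0, 0.8],
                 [0.8, 1.0]])        # assumed covariance of (X, Y)

# Rotation: z = x cos(phi) + y sin(phi), w = -x sin(phi) + y cos(phi)
R = np.array([[np.cos(phi),  np.sin(phi)],
              [-np.sin(phi), np.cos(phi)]])

xy = rng.multivariate_normal([0, 0], C_xy, size=200_000).T   # rows: x, y
zw = R @ xy                                                  # rows: z, w

# For a linear map of jointly normal RVs, (Z, W) stays jointly normal
# with covariance R C R^T; the sample covariance should match it.
print(np.cov(zw))          # empirical covariance of (Z, W)
print(R @ C_xy @ R.T)      # theoretical covariance
```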
Introduction to Estimation Theory (not from Miller & Childers)
Tx: Y (RV of interest) → [Random Disturbance] → Rcv: X (observable, data)
Goal: get an estimate of Y in terms of the observation x, i.e., as a function of the data: f(X).
Best estimate (one possibility): minimize the mean-square value of the estimation error (MS estimate, or MMSE).
Choose f(X) to minimize: $E\{[Y - f(X)]^2\}$
Case 1: Estimation of RV Y by a constant c: f(x) = c
Notation: let $e = E\{[Y - c]^2\} = \int (y - c)^2 f_Y(y)\,dy$.
Choose c to minimize e, the mean-squared error:
$$\frac{de}{dc} = -2 \int (y - c)\, f_Y(y)\,dy = 0 \;\Rightarrow\; \int y\, f_Y(y)\,dy = c \int f_Y(y)\,dy = c$$
$$\Rightarrow\; c = \eta_Y$$
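A minimal numerical sketch of Case 1 (my illustration; the exponential distribution is an arbitrary choice): scanning over candidate constants c shows the average squared error is smallest at the sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=100_000)   # any distribution works

# Brute-force search for the constant c minimizing E{[Y - c]^2}
c_grid = np.linspace(0, 6, 601)
mse = [np.mean((y - c) ** 2) for c in c_grid]

print(c_grid[np.argmin(mse)])   # minimizer of the empirical MSE
print(y.mean())                 # sample mean; both near E{Y} = 2
```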
Case 2: Linear MS Estimation of RV Y
Goal: estimate Y as a linear function of the observation X: f(X) = AX + B.
Choose A, B to minimize $e = E\{[Y - (AX + B)]^2\}$   (*)
First fix A; the equivalent requirement (to *) is: choose the constant B to minimize
$$e = E\{[(Y - AX) - B]^2\} \quad (**)$$
(Y − AX is the quantity to be estimated; B is the constant estimate.) By Case 1, we want:
$$B = E\{Y - AX\} = \eta_Y - A\eta_X$$
Thus, (**) becomes (plugging in for B):
$$e = E\{[(Y - AX) - (\eta_Y - A\eta_X)]^2\} = E\{[(Y - \eta_Y) - A(X - \eta_X)]^2\}$$
Case 2: Linear MS Estimation of RV Y
Continuing with:
$$e = E\{[(Y - \eta_Y) - A(X - \eta_X)]^2\}$$
$$= E\{(Y - \eta_Y)^2\} - 2A\, E\{(X - \eta_X)(Y - \eta_Y)\} + A^2 E\{(X - \eta_X)^2\}$$
$$= \sigma_Y^2 - 2A\, r\, \sigma_X \sigma_Y + A^2 \sigma_X^2$$
(the middle expectation is cov(X, Y)). Now set de/dA = 0 to minimize e by our choice of A:
$$-2 r \sigma_X \sigma_Y + 2A \sigma_X^2 = 0 \;\Rightarrow\; A = \frac{r\, \sigma_Y}{\sigma_X}$$
Also:
$$A = \frac{r\, \sigma_Y}{\sigma_X} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \cdot \frac{\sigma_Y}{\sigma_X} = \frac{\mathrm{cov}(X, Y)}{\sigma_X^2}$$
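The following sketch (not from the lecture; the toy model tying Y to X is an assumption) estimates A = cov(X, Y)/σ_X² and B = η_Y − Aη_X from samples, then checks them against an ordinary least-squares line fit, which minimizes the same squared-error criterion.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
y = 3.0 * x + rng.normal(size=100_000)   # assumed toy model

c = np.cov(x, y)                         # 2x2 sample covariance matrix
A = c[0, 1] / c[0, 0]                    # A = cov(X, Y) / sigma_X^2
B = y.mean() - A * x.mean()              # B = eta_Y - A * eta_X

# Cross-check against an ordinary least-squares line fit
A_ls, B_ls = np.polyfit(x, y, 1)
print(A, B)
print(A_ls, B_ls)                        # should agree closely
```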
Vocabulary for Case 2: Linear MS Estimation of RV Y
- The linear estimate: AX + B
- Non-homogeneous linear estimate: AX + B
- Homogeneous linear estimate: AX
- The data or observable of the estimate: RV X
- The error of the estimate: E = Y − (AX + B)
- The mean-squared error of the estimate: e = E{E²}
Case 3: Non-linear MS Estimate of Y by Some Function c(X)
(No constraints; an arbitrary function is the best choice for minimizing MS error.)
Goal: find c(x) to minimize:
$$e = E\{[Y - c(X)]^2\} = \iint [y - c(x)]^2 f(x, y)\,dx\,dy = \int f(x) \left[ \int [y - c(x)]^2 f(y|x)\,dy \right] dx$$
Since f(x) ≥ 0, minimize the bracketed inner integral for each fixed x. But note: c(x) is a constant for each fixed x, and $f_Y(y|x)$ is just some density $f_Y(y)$ for each fixed x. From Case 1:
$$c(x) = E\{Y \mid X = x\} = \int y\, f(y|x)\,dy$$
Case 3, Special Cases: (1) Y = g(X); (2) X, Y independent
1. Here Y is a deterministic, known function of X, so:
   c(x) = E{Y | X = x} = g(x)
   e = E{E²} = E{[Y − c(X)]²} = E{[Y − g(X)]²} = 0
2. Here knowing X tells me nothing about Y:
   c(x) = E{Y | X = x} = E{Y}, a constant, independent of the observation
Notes on Estimation Theory
In general, the non-linear MS estimate c(x) = E{Y | X = x} is not a straight line, and will yield a smaller e than the linear estimate AX + B (the non-linear estimate is hard to find; the linear one is easy).
But if X and Y are jointly normal, the non-linear MS estimate and the linear MS estimate are identical:
E{Y | X} = AX + B
Summary: MMSE Estimation
Tx: Y (RV of interest) → [Random Disturbance] → Rcv: X (observable)
Case 1: Estimating Y by a constant c: c = η_Y = E{Y}
Case 2: Linear estimate of Y [Ŷ = AX + B]: B = η_Y − Aη_X, A = r σ_Y/σ_X
Case 3: Arbitrary estimate of Y [Ŷ = c(X)]: c(x) = E{Y | X = x} = ∫ y f_Y(y|x) dy
(reduces to the linear estimate if X, Y are jointly Gaussian)
Recall: RVs X and Y are orthogonal iff E{XY} = 0.
MMSE Estimation Example
Tx: Y (RV of interest) → [Random Disturbance] → Rcv: X (observable)
Assume X ~ U(0, 1), and Y ~ U(0, x) given X = x.
Find the (unconstrained) MMSE estimate of Y, given X = x.
Solution: with $f_Y(y|x) = 1/x$ on (0, x),
$$\hat{y}_{MMSE} = E\{Y \mid X = x\} = \int y\, f_Y(y|x)\,dy = \int_0^x y \cdot \frac{1}{x}\,dy = \frac{1}{x} \cdot \frac{x^2}{2} = \frac{x}{2}$$
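A quick Monte Carlo check of this example (my own sketch, not from the slides): draw X ~ U(0, 1), then Y ~ U(0, x), and verify that the conditional sample mean of Y near a given x is close to x/2; the bin locations and widths are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, size=1_000_000)
y = rng.uniform(0, x)                 # Y | X = x ~ U(0, x), vectorized

# Bin on x and compare the conditional sample mean of Y with x/2
for lo in (0.2, 0.5, 0.8):
    mask = (x > lo) & (x < lo + 0.05)
    x_mid = lo + 0.025
    print(x_mid, y[mask].mean(), x_mid / 2)   # last two should be close
```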
Cond. Prob. & Estimation Example (see separate handout on web page for solution)
(Scheaffer & McClave) A soft-drink machine has a random amount Y₂ in supply at the beginning of a given day, and dispenses a random amount Y₁ during the day (say, in gallons). It is not re-supplied during the day; hence Y₁ ≤ Y₂. The joint density for Y₁ and Y₂ is:
$$f(y_1, y_2) = \begin{cases} 1/2, & 0 \le y_1 \le y_2,\; 0 \le y_2 \le 2 \\ 0, & \text{else} \end{cases}$$
(That is, the points (y₁, y₂) are uniformly distributed over the triangle shown in the look-down sketch: Y₂ available vs. Y₁ dispensed.)
Find the conditional probability density of Y₁, given that Y₂ = y₂. Also evaluate the probability that less than 1/2 gallon is sold, given that the machine contains 1 gallon at the start of the day.
Orthogonality Principle
Consider the linear MMSE estimate AX + B of RV Y, as a function of A and B; e is minimized if ∂e/∂A = 0 and ∂e/∂B = 0.
$$\frac{\partial e}{\partial A} = \frac{\partial}{\partial A} E\{[Y - (AX + B)]^2\} = E\{2\,[Y - (AX + B)]\,(-X)\} = 0$$
$$\Rightarrow\; E\{\underbrace{[Y - (AX + B)]}_{\text{error RV}}\; \underbrace{X}_{\text{data, or obs.}}\} = 0$$
The linear MMSE estimate AX + B of Y is the one that makes the error orthogonal to the data.
Orthogonality Principle: Intuitive Sketch
Sketch for the case of the homogeneous linear MMSE estimate (B = 0): the vector y, its estimate Ax along the direction of x, and the error y − Ax drawn perpendicular (orthogonal) to x.
Note that B = 0 means the estimate is ŷ = Ax, in the same direction as x.
Example: Finding a Homogeneous Linear MMSE Estimate
Find a such that $e = E\{[Y - aX]^2\}$ is minimum (Y − aX is the error).
Applying the orthogonality principle, we need:
$$E\{[Y - aX]\, X\} = 0 \quad \text{(error orthogonal to data)}$$
$$\Rightarrow\; E\{XY\} - a\, E\{X^2\} = 0 \;\Rightarrow\; a = \frac{E\{XY\}}{E\{X^2\}}$$
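A short numerical sketch (assumed toy data, not from the slides): compute a = E{XY}/E{X²} from samples and confirm that the resulting error is approximately orthogonal to the data.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(1, 2, size=200_000)       # arbitrary nonzero-mean data
y = 2.0 * x + rng.normal(size=200_000)    # assumed toy model

a = np.mean(x * y) / np.mean(x ** 2)      # a = E{XY} / E{X^2}
err = y - a * x                           # error of the homogeneous estimate

print(a)
print(np.mean(err * x))                   # ~0: error orthogonal to the data
```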
Random Vectors
A random vector is a column vector X = [X₁, X₂, …, Xₙ]ᵀ whose components Xᵢ are RVs, where T denotes the transpose.
To find the probability that random vector X is in region D, we do an n-dimensional integral of the pdf over region D:
$$\Pr\{X \in D\} = \int_D f_X(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$$
where the joint density for the RVs is
$$f_X(x) = f_X(x_1, x_2, \ldots, x_n) = \frac{\partial^n F(x_1, x_2, \ldots, x_n)}{\partial x_1 \cdots \partial x_n}$$
and the joint cdf for the RVs is:
$$F_X(x_1, x_2, \ldots, x_n) = \Pr\{X_1 \le x_1, \ldots, X_n \le x_n\}$$
Mean Vectors
The random (column) vector X = [X₁, X₂, …, Xₙ]ᵀ has mean (vector)
$$E(X) = [\eta_{X_1}, \eta_{X_2}, \ldots, \eta_{X_n}]^T$$
where each entry in the vector is the mean of the corresponding RV.
Example: Consider the random vector [X₁, X₂, X₃, X₄]ᵀ where the component RVs X_k are independent Gaussians, and where X_k ~ N(η_{X_k} = k, σ_{X_k} = k). Then the mean vector is:
E(X) = [η_{X₁}, η_{X₂}, η_{X₃}, η_{X₄}]ᵀ = [1, 2, 3, 4]ᵀ
Random Vectors, continued
In the over-all joint cdf, F_X(x₁, …, xₙ): replace some of the arguments by ∞ to obtain the joint cdf for the other RVs;
e.g., F(x₁, ∞, x₃, ∞) = F(x₁, x₃)
Integrate the over-all joint pdf, f_X(x₁, …, xₙ), over some of the arguments to obtain the joint pdf for the other RVs;
e.g., $\iint f_X(x_1, x_2, x_3, x_4)\, dx_2\, dx_4 = f(x_1, x_3)$
Transformations of Random Vectors (6.4.1)
Given n functions g₁(X), …, gₙ(X), where X = [X₁, …, Xₙ]ᵀ, consider the RVs: Y₁ = g₁(X), …, Yₙ = gₙ(X). Then solve the system backwards for the xᵢ's in terms of the yᵢ's.
1. If the system of equations has no roots, then f_Y(y₁, …, yₙ) = 0.
2. If the system of equations has a single root, then:
$$f_Y(y_1, \ldots, y_n) = \frac{f_X(x_1, \ldots, x_n)}{|J(x_1, \ldots, x_n)|} \quad (*)$$
where
Transformations of R. Vectors, continued
$$J(x_1, \ldots, x_n) = \begin{vmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial x_1} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{vmatrix}$$
is the Jacobian of the transformation ("d(new)/d(old)").
3. If the system of equations has multiple roots, then add corresponding terms (one for each root) to equation (*), summing over all roots.
4. Replace the xᵢ's in the final equation by the yᵢ's obtained from the "solve backwards" step. (A numerical sketch of the single-root case follows.)
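To make steps 1-4 concrete, here is a hedged sketch for the single-root case of a linear map Y = MX with X Gaussian, where |J| = |det M| everywhere; it assumes SciPy is available, and the matrices M, C and the test point are arbitrary choices of mine. The density formula (*) is compared against the known Gaussian answer for Y (covariance M C Mᵀ).

```python
import numpy as np
from scipy.stats import multivariate_normal

C = np.array([[1.0, 0.3, 0.0],
              [0.3, 2.0, 0.5],
              [0.0, 0.5, 1.5]])       # assumed covariance of X ~ N(0, C)
M = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])       # invertible linear map, y = M x

f_X = multivariate_normal(mean=np.zeros(3), cov=C).pdf
f_Y = multivariate_normal(mean=np.zeros(3), cov=M @ C @ M.T).pdf

y = np.array([0.5, -1.0, 2.0])        # arbitrary test point
x = np.linalg.solve(M, y)             # "solve backwards": x = M^{-1} y
J = abs(np.linalg.det(M))             # |Jacobian|, "d(new)/d(old)"

print(f_X(x) / J)                     # density formula (*)
print(f_Y(y))                         # known Gaussian answer; should match
```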
Independence of RVs
The RVs X₁, …, Xₙ are (mutually) independent iff:
F(x₁, …, xₙ) = F(x₁) ⋯ F(xₙ)
f(x₁, …, xₙ) = f(x₁) ⋯ f(xₙ)
If the RVs X₁, …, Xₙ are independent, then so are the RVs Y₁ = g₁(X₁), …, Yₙ = gₙ(Xₙ). (Functions of independent RVs are themselves independent.)
Independent Experiments & Repeated Trials
Let Sⁿ = S₁ × S₂ × ⋯ × Sₙ be the sample space of a combined experiment where RV Xᵢ depends only on outcome ζᵢ of Sᵢ; i.e., Xᵢ(ζ₁, ζ₂, …, ζᵢ, …, ζₙ) = Xᵢ(ζᵢ).
Special case: repeat the same experiment n times; then each of the repetitions is independent of the others, and the RVs Xᵢ are independent and identically distributed (iid).
Example: Toss a coin 100 times; let Xᵢ = 1 if the i-th toss is heads, 0 if tails. Each Xᵢ has pmf f_{Xᵢ}(0) = f_{Xᵢ}(1) = 1/2.
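A small simulation of the repeated-trials example (my illustration, not from the slides): 100 iid fair-coin tosses, repeated many times.

```python
import numpy as np

rng = np.random.default_rng(5)
tosses = rng.integers(0, 2, size=(10_000, 100))   # 10,000 runs of 100 tosses

# Each X_i is Bernoulli(1/2), iid across the 100 repetitions
print(tosses.mean())                   # ~0.5, matching P{X_i = 1} = 1/2
print(tosses.sum(axis=1).mean())       # ~50 heads per 100-toss experiment
```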
Correlation Matrices for R. Vectors
Multiple RVs {Xᵢ} are uncorrelated if C_ij = cov(Xᵢ, Xⱼ) = 0 for all i ≠ j.
Define the correlation matrix for the random vector X = [X₁ ⋯ Xₙ]ᵀ:
$$R_X = R_{XX} = E[XX^T] = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1n} \\ R_{21} & R_{22} & \cdots & R_{2n} \\ \vdots & & & \vdots \\ R_{n1} & R_{n2} & \cdots & R_{nn} \end{bmatrix}$$
where R_ij = E{XᵢXⱼ} = R_ji is the correlation of RVs Xᵢ and Xⱼ. (Note that the matrix is symmetric.)
Correlation Matrices & Covariance Matrices
Define the covariance matrix for the random vector X = [X₁ ⋯ Xₙ]ᵀ:
$$C_X = C_{XX} = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & & & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{bmatrix}$$
where C_ij = E{XᵢXⱼ} − ηᵢηⱼ = R_ij − ηᵢηⱼ = C_ji is the covariance of RVs Xᵢ and Xⱼ.
Note that
$$R_X = E\{XX^T\} = E\left\{ \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} [X_1 \cdots X_n] \right\}; \quad \text{size of product: } (n, 1)(1, n) = (n, n)$$
Recall: RVs Xᵢ and Xⱼ are said to be orthogonal if E{XᵢXⱼ} = 0.
Correlation Matrices & Covariance Matrices: An Example
Find the covariance matrix for the random vector X = [X₁, X₂, X₃, X₄]ᵀ where the component RVs X_k are independent Gaussians, each X_k ~ N(η = k, σ = k).
Note 1: The diagonal entries are just the variances: C_kk = k².
Note 2: The off-diagonal entries are the covariances; independent ⇒ uncorrelated ⇒ cov_ij = 0 (i ≠ j). Thus:
$$C_X = \begin{bmatrix} C_{11} & C_{12} & C_{13} & C_{14} \\ C_{21} & C_{22} & C_{23} & C_{24} \\ C_{31} & C_{32} & C_{33} & C_{34} \\ C_{41} & C_{42} & C_{43} & C_{44} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 9 & 0 \\ 0 & 0 & 0 & 16 \end{bmatrix}$$
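The sketch below (my own check, not from the slides) draws samples of the four independent Gaussians X_k ~ N(η = k, σ = k) and confirms that the sample mean vector and covariance matrix match [1, 2, 3, 4]ᵀ and diag(1, 4, 9, 16).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
# X_k independent N(eta = k, sigma = k), k = 1..4, as in the example
X = np.stack([rng.normal(loc=k, scale=k, size=n) for k in (1, 2, 3, 4)])

print(X.mean(axis=1))        # mean vector ~ [1, 2, 3, 4]
print(np.cov(X).round(2))    # ~ diag(1, 4, 9, 16); off-diagonals ~ 0
```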
Review of Facts from Linear Algebra
Definition: A square real matrix Z of size (n, n) is non-negative definite if
Q = A Z Aᵀ ≥ 0   (*)
for any real (row) vector A = [a₁, …, aₙ].
Non-negative definite (nnd) matrices have all eigenvalues ≥ 0.
If Q in equation (*) is strictly > 0 (for every A ≠ 0), then Z is positive definite, and all of the eigenvalues of Z will be positive.
Special Properties of Correlation Matrices
Let Dₙ be the determinant of the correlation matrix R_X of the RVs {Xᵢ}.
1. R_X is non-negative definite.
2. Dₙ is real and non-negative: Dₙ ≥ 0
3. Dₙ ≤ R₁₁ R₂₂ ⋯ Rₙₙ, with equality iff the RVs {Xᵢ} are mutually orthogonal ⇔ matrix R_X is a diagonal matrix.
Note that the covariance matrix C_X will have properties similar to the 3 above, because it is the correlation matrix for the centered RVs {Xᵢ − ηᵢ}. (A numerical check of the three properties follows.)
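A numerical illustration of the three properties (my sketch; the underlying covariance matrix is an arbitrary assumed choice): form a sample correlation matrix E[XXᵀ] and check its eigenvalues, its determinant, and the bound Dₙ ≤ R₁₁ R₂₂ ⋯ Rₙₙ.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.multivariate_normal([0, 0, 0],
                            [[2.0, 0.5, 0.1],
                             [0.5, 1.0, 0.3],
                             [0.1, 0.3, 1.5]], size=100_000)
R = (X.T @ X) / len(X)          # sample correlation matrix E[X X^T]

print(np.linalg.eigvalsh(R))    # all >= 0: R_X is non-negative definite
print(np.linalg.det(R))         # D_n >= 0
print(np.prod(np.diag(R)))      # R_11 R_22 R_33 >= D_n (equality iff diagonal)
```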
Conditional Densities & Distributions
Recall the conditional pdf for RVs X and Y:
$$f(y|x) = \frac{f(x, y)}{f(x)}$$
Similarly, the conditional pdf for RVs Xₙ, …, X_{k+1}, given X_k, …, X₁:
$$f(x_n, \ldots, x_{k+1} \mid x_k, \ldots, x_1) = \frac{f(x_1, \ldots, x_k, \ldots, x_n)}{f(x_1, \ldots, x_k)}$$
Example:
$$f(x_1 \mid x_2, x_3) = \frac{f(x_1, x_2, x_3)}{f(x_2, x_3)} = \frac{d}{dx_1} F(x_1 \mid x_2, x_3)$$
Chain rule, with 4 RVs:
$$f(x_1, x_2, x_3, x_4) = f(x_4 \mid x_3, x_2, x_1)\, f(x_3 \mid x_2, x_1)\, f(x_2 \mid x_1)\, f(x_1)$$
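The chain rule is easy to verify numerically on a small discrete distribution (my sketch, not from the slides; the random 2×2×2 pmf is arbitrary): factor the joint pmf into conditionals and rebuild it.

```python
import numpy as np

rng = np.random.default_rng(8)
P = rng.random((2, 2, 2))
P /= P.sum()                          # arbitrary joint pmf f(x1, x2, x3)

f1 = P.sum(axis=(1, 2))               # marginal f(x1)
f12 = P.sum(axis=2)                   # marginal f(x1, x2)
cond_2_given_1 = f12 / f1[:, None]            # f(x2 | x1)
cond_3_given_12 = P / f12[:, :, None]         # f(x3 | x2, x1)

# Chain rule: f(x1, x2, x3) = f(x3 | x2, x1) f(x2 | x1) f(x1)
rebuilt = cond_3_given_12 * cond_2_given_1[:, :, None] * f1[:, None, None]
print(np.allclose(rebuilt, P))        # True
```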