Recursive Estimation
Raffaello D'Andrea, Spring 07

Problem Set 3: Extracting Estimates from Probability Distributions

Last updated: April 5, 07

Notes:
Notation: Unless otherwise noted, x, y, and z denote random variables, f(x) (or the shorthand f_x(x)) denotes the probability density function of x, and f(x|y) (or f_{x|y}(x|y)) denotes the conditional probability density function of x conditioned on y. The expected value is denoted by E[·], the variance is denoted by Var[·], and Pr(Z) denotes the probability that the event Z occurs. A normally distributed random variable with mean µ and variance σ² is denoted by N(µ, σ²).

Please report any errors found in this problem set to the teaching assistants (michaemu@ethz.ch or lhewing@ethz.ch).
Problem Set

Problem 1
Let x and w be scalar CRVs, where x and w are independent: f(x, w) = f(x)f(w), and w ∈ [−1, 1] is uniformly distributed. Let z = x + 3w.
a) Calculate f(z|x) using the change of variables.
b) Let x = 1. Verify that Pr(z ∈ [1, 4] | x = 1) = 1/2 = Pr(w ∈ [0, 1] | x = 1).

Problem 2
Let w ∈ R² and x ∈ R⁴ be CRVs, where x and w are independent: f(x, w) = f(x)f(w). The two elements of the measurement noise vector w are w₁, w₂, and the PDF of w is uniform:
  f_w(w) = f(w₁, w₂) = 1/4 for |w₁| ≤ 1 and |w₂| ≤ 1, and 0 otherwise.
The measurement model is
  z = Hx + Gw,
where
  H = [1 1 0 0; 0 1 0 1],  G = [1 0; 0 3].
Calculate the conditional joint PDF f(z|x) using the change of variables.

Problem 3
Find the maximum likelihood (ML) estimate x̂_ML := arg max_x f(z|x) when z_i = x + w_i, i = 1, 2, where the w_i are independent with w_i ∼ N(0, σ_i²) and z_i, w_i, x ∈ R.

Problem 4
Solve Problem 3 if the joint PDF of w is
  f(w) = f(w₁, w₂) = 1/4 for |w₁| ≤ 1 and |w₂| ≤ 1, and 0 otherwise.
Problem 5
Let x ∈ Rⁿ be an unknown, constant parameter. Furthermore, let w ∈ Rᵐ, n ≤ m, represent the measurement noise with its joint PDF defined by the multivariate Gaussian distribution
  f_w(w) = (2π)^(−m/2) (det Σ)^(−1/2) exp(−½ wᵀ Σ⁻¹ w)
with mean E[w] = 0 and variance E[wwᵀ] =: Σ = Σᵀ ∈ R^(m×m). Assume that x and w are independent. Let the measurement be
  z = Hx + w,
where the matrix H ∈ R^(m×n) has full column rank.
a) Calculate the ML estimate x̂_ML := arg max_x f(z|x) for a diagonal variance
  Σ = diag(σ₁², σ₂², ..., σ_m²).
Note that in this case, the elements w_i of w are mutually independent.
b) Calculate the ML estimate for non-diagonal Σ, which implies that the elements w_i of w are correlated.
Hint: The following matrix calculus is useful for this problem (see, for example, the Matrix Cookbook for reference):
Recall the definition of a Jacobian matrix from the lecture: Let g(x): Rⁿ → Rᵐ be a function that maps the vector x ∈ Rⁿ to another vector g(x) ∈ Rᵐ. The Jacobian matrix is defined as
  ∂g(x)/∂x := [∂g_i(x)/∂x_j] ∈ R^(m×n),
where x_j is the j-th element of x and g_i(x) the i-th element of g(x).
Derivative of a quadratic form:
  ∂(xᵀAx)/∂x = xᵀ(A + Aᵀ) ∈ R^(1×n), A ∈ R^(n×n).
Chain rule: for u = g(x), x ∈ Rⁿ, u ∈ Rⁿ:
  ∂(uᵀAu)/∂x = uᵀ(A + Aᵀ) ∂u/∂x, with ∂u/∂x ∈ R^(n×n).
Problem 6
Find the maximum a posteriori (MAP) estimate x̂_MAP := arg max_x f(z|x)f(x) when z = x + w, x, z, w ∈ R, where x is exponentially distributed,
  f(x) = 0 for x < 0, and f(x) = e^(−x) for x ≥ 0,
and w ∼ N(0, 1).

Problem 7
In the derivation of the recursive least squares algorithm, we used the following result:
  ∂/∂A trace(ABAᵀ) = 2AB if B = Bᵀ,
where trace(·) is the sum of the diagonal elements of a matrix. Prove that the above holds for the two-dimensional case, i.e. A, B ∈ R^(2×2).

Problem 8
The derivation of the recursive least squares algorithm also used the following result:
  ∂ trace(AB)/∂A = Bᵀ.
Prove that the above holds for A, B ∈ R^(2×2).

Problem 9
Consider the recursive least squares algorithm presented in lecture, where the estimation error at step k was shown to be
  e(k) = (I − K(k)H(k)) e(k−1) − K(k)w(k).
Furthermore, E[w(k)] = 0, E[e(k)] = 0, and {x, w(1), ...} are mutually independent. Show that the measurement error variance P(k) = Var[e(k)] = E[e(k)eᵀ(k)] is given by
  P(k) = (I − K(k)H(k)) P(k−1) (I − K(k)H(k))ᵀ + K(k)R(k)Kᵀ(k),
where R(k) = Var[w(k)].

Problem 10
Use the results from Problems 7 and 8 to derive the gain matrix K(k) for the recursive least squares algorithm that minimizes the mean squared error (MMSE) by solving
  0 = ∂/∂K(k) trace( (I − K(k)H(k)) P(k−1) (I − K(k)H(k))ᵀ + K(k)R(k)Kᵀ(k) ).
Problem 11
Apply the recursive least squares algorithm when
  z(k) = x + w(k), k = 1, 2, ...  (1)
where
  E[x] = 0, Var[x] = 1, and E[w(k)] = 0, Var[w(k)] = 1.  (2)
The random variables {x, w(1), w(2), ...} are mutually independent and z(k), w(k), x ∈ R.
a) Express the gain matrix K(k) and variance P(k) as explicit functions of k.
b) Simulate the algorithm for normally distributed continuous random variables w(k) and x with the above properties (2).
c) Simulate the algorithm for x and w(k) being discrete random variables that take the values −1 and 1 with equal probability. Note that x and w(k) satisfy (2). Simulate for values of k up to 10000.

Problem 12
Consider the random variables and measurement equation from Problem 11 c), i.e.
  z(k) = x + w(k), k = 1, 2, ...  (3)
for mutually independent x and w(k) that take the values −1 and 1 with equal probability.
a) What is the MMSE estimate of x at time k given all past measurements z(1), z(2), ..., z(k)? Remember that the MMSE estimate is given by
  x̂_MMSE(k) := arg min_x̂ E[ (x − x̂)² | z(1), z(2), ..., z(k) ].
b) Now consider the MMSE estimate that is restricted to the sample space of x, i.e.
  x̂_MMSE(k) := arg min_{x̂ ∈ X} E[ (x − x̂)² | z(1), z(2), ..., z(k) ],
where X denotes the sample space of x. Calculate the estimate of x at time k given all past measurements z(1), z(2), ..., z(k).
Sample solutions

Problem 1
a) The change of variables formula for conditional PDFs when z = g(x, w) is
  f(z|x) = f_w(h(z, x)) / |∂g/∂w (x, h(z, x))|,
where w = h(z, x) is the unique solution to z = g(x, w). We solve z = g(x, w) for w to find h(z, x):
  w = (z − x)/3 = h(z, x),
and apply the change of variables:
  f(z|x) = f_w(h(z, x)) / |∂g/∂w| = f_w((z − x)/3) / |3| = (1/3) f_w((z − x)/3).
With the given uniform PDF of w, the PDF f(z|x) can then be stated explicitly:
  f(z|x) = 1/6 for x − 3 ≤ z ≤ x + 3, and 0 otherwise.
b) We begin by computing Pr(z ∈ [1, 4] | x = 1):
  Pr(z ∈ [1, 4] | x = 1) = ∫₁⁴ f(z | x = 1) dz = ∫₁⁴ (1/6) dz = 1/2.
The probability Pr(w ∈ [0, 1] | x = 1) is
  Pr(w ∈ [0, 1] | x = 1) = Pr(w ∈ [0, 1]) = ∫₀¹ f_w(w) dw = ∫₀¹ (1/2) dw = 1/2,
which is indeed identical to Pr(z ∈ [1, 4] | x = 1).

Problem 2
Analogous to the lecture notes, the multivariate change of variables formula for conditional PDFs is
  f(z|x) = f_w(h(z, x)) / |det G|,
where w = h(z, x) is the unique solution to z = Hx + Gw. The function h(z, x) is given by
  w = G⁻¹(z − Hx) = [1 0; 0 1/3] [z₁ − x₁ − x₂; z₂ − x₂ − x₄] = [z₁ − x₁ − x₂; (z₂ − x₂ − x₄)/3],
where we used x = (x₁, x₂, x₃, x₄) and z = (z₁, z₂). Applying the change of variables, it follows that
  f(z|x) = f_w(h(z, x)) / |det G| = f_w(G⁻¹(z − Hx)) / 3 = (1/3) f_w(z₁ − x₁ − x₂, (z₂ − x₂ − x₄)/3).
Using the PDF f_w(w), this can then be expressed as
  f(z|x) = 1/12 for x₁ + x₂ − 1 ≤ z₁ ≤ x₁ + x₂ + 1 and x₂ + x₄ − 3 ≤ z₂ ≤ x₂ + x₄ + 3, and 0 otherwise.

Problem 3
For a normally distributed random variable with w_i ∼ N(0, σ_i²), we have
  f(w_i) ∝ exp(−w_i² / (2σ_i²)).
Using the change of variables, and by the conditional independence of z₁, z₂ given x, we can write
  f(z₁, z₂ | x) = f(z₁|x) f(z₂|x) ∝ exp(−½ ((z₁ − x)²/σ₁² + (z₂ − x)²/σ₂²)).
Differentiating with respect to x, and setting to 0, leads to
  ∂/∂x f(z₁, z₂ | x) |_{x = x̂_ML} = 0  ⟹  (z₁ − x̂_ML)/σ₁² + (z₂ − x̂_ML)/σ₂² = 0
  ⟹  x̂_ML = (z₁σ₂² + z₂σ₁²) / (σ₁² + σ₂²),
which is a weighted sum of the measurements. Note that for σ₁ = 0: x̂_ML = z₁, and for σ₂ = 0: x̂_ML = z₂.

Problem 4
Using the total probability theorem, we find that
  f(w_i) = 1/2 for w_i ∈ [−1, 1], and 0 otherwise,
and f(w₁, w₂) = f(w₁)f(w₂). Using the change of variables,
  f(z_i|x) = 1/2 if |z_i − x| ≤ 1, and 0 otherwise.
We need to solve for f(z₁, z₂ | x) = f(z₁|x) f(z₂|x). Consider the different cases:
• |z₁ − z₂| > 2, for which the likelihoods do not overlap, see the following figure:
(figure: the two box likelihoods f(z₁|x) and f(z₂|x), centered at z₁ and z₂, with no overlap)

Since no value for x can explain both z₁ and z₂, given the model z_i = x + w_i, we find f(z₁, z₂ | x) = 0.
• |z₁ − z₂| ≤ 2, where the likelihoods overlap, which is shown in the following figure:

(figure: overlapping box likelihoods f(z₁|x) and f(z₂|x); their product f(z₁, z₂ | x) equals 1/4 on the overlap)

We find
  f(z₁, z₂ | x) = 1/4 for x ∈ [z₁ − 1, z₁ + 1] ∩ [z₂ − 1, z₂ + 1], and 0 otherwise,
where A ∩ B denotes the intersection of the sets A and B. Therefore the estimate is
  x̂_ML ∈ [z₁ − 1, z₁ + 1] ∩ [z₂ − 1, z₂ + 1].
All values in this range are equally likely, and any value in the range is therefore a valid maximum likelihood estimate x̂_ML.

Problem 5
a) We apply the multivariate change of coordinates:
  f(z|x) = f_w(h(z, x)) / |det G| = f_w(z − Hx) / |det I| = f_w(z − Hx)
    = (2π)^(−m/2) (det Σ)^(−1/2) exp(−½ (z − Hx)ᵀ Σ⁻¹ (z − Hx)).
We proceed similarly as in class, for z, w ∈ Rᵐ and x ∈ Rⁿ:
  H = [H₁; H₂; ...; H_m],  H_i = [h_{i1} h_{i2} ... h_{in}],  where h_{ij} ∈ R.
Using
  f(z|x) ∝ exp(−½ Σ_{i=1}^m (z_i − H_i x)² / σ_i²),
differentiating with respect to x_j, and setting to zero yields
  Σ_{i=1}^m ((z_i − H_i x̂_ML)/σ_i²) h_{ij} = 0, j = 1, 2, ..., n,
where [h_{1j} ... h_{mj}]ᵀ is the j-th column of H. Collecting all n equations,
  Hᵀ W (z − H x̂_ML) = 0, with W := diag(1/σ₁², ..., 1/σ_m²),
from which we have
  Hᵀ W H x̂_ML = Hᵀ W z  and  x̂_ML = (Hᵀ W H)⁻¹ Hᵀ W z.
Note that this can also be interpreted as a weighted least squares problem:
  w(x) = z − Hx,  x̂ = arg min_x w(x)ᵀ W w(x),
where W is a weighting of the measurements. The result in both cases is the same!
b) We find the maximizing x̂_ML by setting the derivative of f(z|x) with respect to x to zero. With the given facts about matrix calculus, we find
  0 = ∂/∂x f(z|x) ∝ ∂/∂x exp(−½ (z − Hx)ᵀ Σ⁻¹ (z − Hx))
  ⟹  0 = −½ (z − Hx)ᵀ (Σ⁻¹ + (Σ⁻¹)ᵀ) (−H) = (z − Hx)ᵀ Σ⁻¹ H,
where we use the fact that (Σ⁻¹)ᵀ = (Σᵀ)⁻¹ = Σ⁻¹. Finally,
  (z − Hx)ᵀ Σ⁻¹ H = 0  ⟹  zᵀ Σ⁻¹ H = xᵀ Hᵀ Σ⁻¹ H,
and after we transpose the left and right hand sides, we find
  Hᵀ Σ⁻¹ z = Hᵀ Σ⁻¹ H x.
It follows that
  x̂_ML = (Hᵀ Σ⁻¹ H)⁻¹ Hᵀ Σ⁻¹ z.
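As a quick numerical sanity check, part a) reduces to a scalar ratio when there is a single unknown (n = 1), so H is a column of scalars h_i. A minimal Python sketch under that assumption (the function name is ours, not part of the problem set):

```python
def ml_weighted(h, z, var):
    # x_hat = (H^T W H)^{-1} H^T W z with W = diag(1/var_i), for n = 1:
    # a ratio of weighted sums, weights 1/var_i.
    num = sum(hi * zi / vi for hi, zi, vi in zip(h, z, var))
    den = sum(hi * hi / vi for hi, vi in zip(h, var))
    return num / den
```

With h = (1, 1) this reproduces the weighted average of Problem 3, x̂_ML = (z₁σ₂² + z₂σ₁²)/(σ₁² + σ₂²), which is a useful consistency check between the two problems.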
Problem 6
We calculate the conditional probability density function,
  f(z|x) = (1/√(2π)) exp(−½ (z − x)²).
Both the measurement likelihood f(z|x) and the prior f(x) are shown in the following figure:

(figure: the exponential prior f(x) and the Gaussian likelihood f(z|x) centered at z)

We find
  f(z|x)f(x) = 0 for x < 0, and (1/√(2π)) exp(−x) exp(−½ (z − x)²) for x ≥ 0.
We calculate the MAP estimate x̂_MAP that maximizes the above function for a given z. We note that f(z|x)f(x) > 0 for x ≥ 0, and the maximizing value therefore satisfies x̂_MAP ≥ 0. Since the function is discontinuous at x = 0, we have to consider two cases:
1. The maximum is at the boundary, x̂_MAP = 0. It follows that
  f(z | x̂_MAP) f(x̂_MAP) = (1/√(2π)) exp(−½ z²).  (4)
2. The maximum is where the derivative of the likelihood f(z|x)f(x) with respect to x is zero, i.e.
  0 = ∂/∂x (f(z|x)f(x)) |_{x = x̂_MAP},
from which follows −1 + (z − x̂_MAP) = 0, i.e. x̂_MAP = z − 1 at the maximum. Substituting back into the likelihood, we obtain
  f(z | x̂_MAP) f(x̂_MAP) = (1/√(2π)) exp(−(z − 1)) exp(−½) = (1/√(2π)) exp(−z + ½).  (5)
We note that the interior candidate x̂_MAP = z − 1 is feasible only for z ≥ 1, and that (5) ≥ (4), since −z + ½ ≥ −½z² is equivalent to (z − 1)² ≥ 0. The MAP estimate is therefore
  x̂_MAP = 0 for z < 1, and x̂_MAP = z − 1 for z ≥ 1.
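The case distinction can be verified by brute force. A small Python sketch (the helper names are ours) that grid searches the unnormalized posterior f(z|x)f(x) and compares against the closed form max(z − 1, 0):

```python
import math

def posterior_unnorm(x, z):
    # f(z|x) f(x) up to the 1/sqrt(2*pi) constant:
    # exponential prior on x >= 0 times a unit-variance Gaussian likelihood
    if x < 0:
        return 0.0
    return math.exp(-x) * math.exp(-0.5 * (z - x) ** 2)

def map_estimate(z):
    # closed form from the two cases above
    return max(z - 1.0, 0.0)

def map_by_grid(z, lo=0.0, hi=6.0, steps=60_000):
    # brute-force maximizer over a fine grid on [lo, hi]
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(xs, key=lambda x: posterior_unnorm(x, z))
```

The grid search recovers the interior optimum z − 1 for z ≥ 1 and the boundary optimum 0 for z < 1.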
Problem 7
We begin by evaluating the individual elements of AB and ABAᵀ, with (A)_{ij} denoting the (i, j)-th element of matrix A (element in i-th row and j-th column):
  (AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j}
  (ABAᵀ)_{ij} = (a_{i1}b_{11} + a_{i2}b_{21}) a_{j1} + (a_{i1}b_{12} + a_{i2}b_{22}) a_{j2}.
From this we obtain the trace as
  trace(ABAᵀ) = (ABAᵀ)_{11} + (ABAᵀ)_{22}
    = (a_{11}b_{11} + a_{12}b_{21}) a_{11} + (a_{11}b_{12} + a_{12}b_{22}) a_{12}
      + (a_{21}b_{11} + a_{22}b_{21}) a_{21} + (a_{21}b_{12} + a_{22}b_{22}) a_{22},
and the derivatives (using b_{12} = b_{21}):
  ∂/∂a_{11} trace(ABAᵀ) = 2a_{11}b_{11} + a_{12}b_{21} + a_{12}b_{12} = 2(AB)_{11}
  ∂/∂a_{12} trace(ABAᵀ) = a_{11}b_{12} + a_{11}b_{21} + 2a_{12}b_{22} = 2(AB)_{12}
  ∂/∂a_{21} trace(ABAᵀ) = 2a_{21}b_{11} + a_{22}b_{21} + a_{22}b_{12} = 2(AB)_{21}
  ∂/∂a_{22} trace(ABAᵀ) = a_{21}b_{12} + a_{21}b_{21} + 2a_{22}b_{22} = 2(AB)_{22}.
We therefore proved that
  ∂/∂A trace(ABAᵀ) = 2AB
holds for A, B ∈ R^(2×2) with B = Bᵀ.

Problem 8
We proceed similarly to the previous problem:
  trace(AB) = (AB)_{11} + (AB)_{22} = a_{11}b_{11} + a_{12}b_{21} + a_{21}b_{12} + a_{22}b_{22},
so that
  ∂ trace(AB)/∂a_{11} = b_{11} = (Bᵀ)_{11}
  ∂ trace(AB)/∂a_{12} = b_{21} = (Bᵀ)_{12}
  ∂ trace(AB)/∂a_{21} = b_{12} = (Bᵀ)_{21}
  ∂ trace(AB)/∂a_{22} = b_{22} = (Bᵀ)_{22},
that is,
  ∂ trace(AB)/∂A = Bᵀ
for A, B ∈ R^(2×2).
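Both trace identities can also be spot-checked numerically with central finite differences; a self-contained Python sketch for 2×2 matrices (all helper names are ours):

```python
def matmul(X, Y):
    # 2x2 matrix product
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def trace(X):
    return X[0][0] + X[1][1]

def grad_fd(f, A, eps=1e-6):
    # central finite-difference gradient of the scalar f(A) w.r.t. each element of A
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Ap = [row[:] for row in A]
            Am = [row[:] for row in A]
            Ap[i][j] += eps
            Am[i][j] -= eps
            G[i][j] = (f(Ap) - f(Am)) / (2 * eps)
    return G
```

For symmetric B, grad_fd applied to trace(ABAᵀ) matches 2AB, and applied to trace(AB) it matches Bᵀ, to finite-difference accuracy.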
Problem 9
By definition, P(k) = E[e(k)eᵀ(k)]. Substituting the estimation error
  e(k) = (I − K(k)H(k)) e(k−1) − K(k)w(k)
in the definition, we obtain
  P(k) = E[ ((I − K(k)H(k)) e(k−1) − K(k)w(k)) ((I − K(k)H(k)) e(k−1) − K(k)w(k))ᵀ ]
    = E[ (I − K(k)H(k)) e(k−1)eᵀ(k−1) (I − K(k)H(k))ᵀ + K(k)w(k)wᵀ(k)Kᵀ(k)
       − (I − K(k)H(k)) e(k−1)wᵀ(k)Kᵀ(k) − K(k)w(k)eᵀ(k−1)(I − K(k)H(k))ᵀ ]
    = (I − K(k)H(k)) P(k−1) (I − K(k)H(k))ᵀ + K(k)R(k)Kᵀ(k)
       − (I − K(k)H(k)) E[e(k−1)wᵀ(k)] Kᵀ(k) − K(k) E[w(k)eᵀ(k−1)] (I − K(k)H(k))ᵀ.
Because e(k−1) and w(k) are independent, E[w(k)eᵀ(k−1)] = E[w(k)] E[eᵀ(k−1)]. Since E[w(k)] = 0, E[w(k)] E[eᵀ(k−1)] = 0, and it follows that
  P(k) = (I − K(k)H(k)) P(k−1) (I − K(k)H(k))ᵀ + K(k)R(k)Kᵀ(k).

Problem 10
First, we expand
  (I − K(k)H(k)) P(k−1) (I − K(k)H(k))ᵀ + K(k)R(k)Kᵀ(k)
    = P(k−1) − K(k)H(k)P(k−1) − P(k−1)Hᵀ(k)Kᵀ(k) + K(k)(H(k)P(k−1)Hᵀ(k) + R(k))Kᵀ(k).
We use the results from Problems 7 and 8 and evaluate the summands individually, since trace(A + B) = trace(A) + trace(B). We find
  ∂/∂K(k) trace(P(k−1)) = 0
  ∂/∂K(k) trace(−K(k)H(k)P(k−1)) = −P(k−1)Hᵀ(k), note that P(k−1) = Pᵀ(k−1)
  ∂/∂K(k) trace(−P(k−1)Hᵀ(k)Kᵀ(k)) = −P(k−1)Hᵀ(k), note that trace(A) = trace(Aᵀ)
  ∂/∂K(k) trace(K(k)(H(k)P(k−1)Hᵀ(k) + R(k))Kᵀ(k)) = 2K(k)(H(k)P(k−1)Hᵀ(k) + R(k)).
Note that Rᵀ(k) = R(k) and (H(k)P(k−1)Hᵀ(k))ᵀ = H(k)P(k−1)Hᵀ(k). Finally, setting the sum of the derivatives to zero,
  K(k)(H(k)P(k−1)Hᵀ(k) + R(k)) = P(k−1)Hᵀ(k)
  ⟹  K(k) = P(k−1)Hᵀ(k)(H(k)P(k−1)Hᵀ(k) + R(k))⁻¹.
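In the scalar case, the optimality of this gain is easy to check numerically. A Python sketch (our own function names) comparing K = PHᵀ(HPHᵀ + R)⁻¹ against a grid search over the posterior variance:

```python
def posterior_var(K, P, H, R):
    # scalar version of (I - KH) P (I - KH)^T + K R K^T
    return (1.0 - K * H) ** 2 * P + K ** 2 * R

def optimal_gain(P, H, R):
    # scalar version of K = P H^T (H P H^T + R)^{-1}
    return P * H / (H * P * H + R)
```

No gain on the grid attains a smaller posterior variance than the closed-form gain, as the derivation predicts.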
Problem 11
a) We initialize P(0) = 1, x̂(0) = 0, H(k) = 1 and R(k) = 1 for k = 1, 2, ..., and get:
  K(k) = P(k−1) / (P(k−1) + 1)  (6)
  P(k) = (1 − K(k))² P(k−1) + K(k)²
    = P(k−1)/(1 + P(k−1))² + P(k−1)²/(1 + P(k−1))²
    = P(k−1)/(1 + P(k−1)).  (7)
From (7) and P(0) = 1, we have P(1) = 1/2, P(2) = 1/3, ..., and we can show by induction (see below) that
  P(k) = 1/(1 + k), and K(k) = 1/(1 + k) from (6).
Proof by induction: Assume
  P(k) = 1/(1 + k).  (8)
Start with P(0) = 1/(1 + 0) = 1, which satisfies (8). Now we show the result for k + 1:
  P(k+1) = P(k)/(1 + P(k))  from (7)
    = (1/(1 + k)) / (1 + 1/(1 + k))  by assumption (8)
    = 1/(1 + k + 1) = 1/(1 + (k+1)).  q.e.d.
b) The Matlab code is available on the class web page. Recursive least squares works, and provides a reasonable estimate after several thousand iterations. The recursive least squares estimator is the optimal estimator for Gaussian noise.
c) The Matlab code is available on the class web page. Recursive least squares works, but we can do much better with the nonlinear estimator derived in Problem 12.
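The closed forms K(k) = P(k) = 1/(1 + k) from part a) can be confirmed by simply running the scalar recursion; a minimal Python sketch (function name ours) with P(0) = 1, x̂(0) = 0, H(k) = R(k) = 1:

```python
def rls_scalar(zs):
    # scalar recursive least squares for z(k) = x + w(k)
    xhat, P = 0.0, 1.0
    gains = []
    for z in zs:
        K = P / (P + 1.0)                # gain update
        xhat = xhat + K * (z - xhat)     # measurement update
        P = (1.0 - K) ** 2 * P + K ** 2  # variance update
        gains.append((K, P))
    return xhat, gains
```

With these gains, the estimate after k steps is the average of the k measurements together with the zero prior mean (the prior, having variance 1, acts like one extra pseudo-measurement of 0), so x̂(k) = (z(1) + ... + z(k))/(k + 1).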
Problem 12
a) The MMSE estimate is
  x̂_MMSE(k) = arg min_x̂ Σ_{x ∈ {−1, 1}} (x − x̂)² f(x | z(1), z(2), ..., z(k)).  (9)
Let us first consider the MMSE estimate at time k = 1. The only possible combinations are:

  x    w(1)   z(1)
  −1   −1     −2
  −1    1      0
   1   −1      0
   1    1      2

We can now construct the conditional PDF of x when conditioned on z(1):

  z(1)   f(x = −1 | z(1))   f(x = 1 | z(1))
  −2     1                  0
   0     1/2                1/2
   2     0                  1

Therefore, if z(1) = −2 or z(1) = 2, we know x precisely: x = −1 and x = 1, respectively. If z(1) = 0, it is equally likely that x = −1 or x = 1. The MMSE estimate is
  for z(1) = −2: x̂_MMSE(1) = −1, since f(x = 1 | z(1) = −2) = 0,
  for z(1) = 2: x̂_MMSE(1) = 1, since f(x = −1 | z(1) = 2) = 0,
  for z(1) = 0: x̂_MMSE(1) = arg min_x̂ ( f(x = −1 | z(1) = 0)(x̂ + 1)² + f(x = 1 | z(1) = 0)(x̂ − 1)² )
    = arg min_x̂ ( ½(x̂ + 1)² + ½(x̂ − 1)² ) = 0.
Note that the same result could have been obtained by evaluating
  x̂_MMSE(1) = E[x | z(1)] = Σ_{x ∈ {−1, 1}} x f(x | z(1) = 0) = ½(−1) + ½(1) = 0.
At time k = 2, we could construct a table for the conditional PDF of x when conditioned on z(1) and z(2) as above. However, note that the MMSE estimate is given for all k if one measures either −2 or 2 at time k = 1:
  for z(1) = −2: x̂_MMSE(k) = −1, k = 1, 2, ...
  for z(1) = 2: x̂_MMSE(k) = 1, k = 1, 2, ...
On the other hand, a measurement z(1) = 0 does not provide any information, i.e. x = −1 and x = 1 stay equally likely. As soon as a subsequent measurement z(k), k ≥ 2, has a value of −2 or 2, x is known precisely, which is also reflected by the MMSE estimate. As long as we keep measuring 0, we do not gain additional information and
  x̂_MMSE(k) = 0 if z(l) = 0 for all l = 1, 2, ..., k.
Matlab code for this nonlinear MMSE estimator is available on the class webpage.
b) The solution is identical to part a), except that the estimate is the minimizing value from the sample space of x:
  x̂_MMSE(k) = arg min_{x̂ ∈ X} Σ_{x ∈ {−1, 1}} (x − x̂)² f(x | z(1), z(2), ..., z(k)).  (10)
The same argument holds that, if one measurement is either −2 or 2, we know x precisely. However, for z(1) = 0, the estimate is
  x̂_MMSE(1) = arg min_{x̂ ∈ X} ( f(x = −1 | z(1) = 0)(x̂ + 1)² + f(x = 1 | z(1) = 0)(x̂ − 1)² )
    = arg min_{x̂ ∈ X} ( ½(x̂ + 1)² + ½(x̂ − 1)² ) = ±1.
Note that X = {−1, 1}, and therefore we cannot differentiate the right hand side of Equation (10) and set it to zero to find the minimizing value. This implies that the estimate is not necessarily equal to E[x | z], which is the case for measuring z = 0: here E[x | z = 0] is equal to 0, whereas the restricted MMSE estimator yields ±1 for x̂_MMSE (both values are equally good). The remainder of the solution is identical to part a) in that the estimate remains x̂_MMSE = ±1 until a measurement is either −2 or 2.
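The nonlinear estimator of part a) is simple enough to state in a few lines of Python; a sketch under the assumptions above (function name ours), returning the unrestricted MMSE estimate:

```python
def mmse_estimate(zs):
    # x and each w(k) take the values -1 and +1, and z(k) = x + w(k).
    # A measurement of +2 or -2 reveals x exactly; if every measurement
    # is 0, x = -1 and x = +1 remain equally likely and the MMSE estimate is 0.
    for z in zs:
        if z == 2:
            return 1
        if z == -2:
            return -1
    return 0
```

For the restricted estimator of part b), the only change is that the all-zero case may return either −1 or +1 instead of 0, since both minimize the restricted criterion.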