ECE 541 Stochastic Signals and Systems — Problem Set 9 Solutions

Problem Solutions: Yates and Goodman, 9.1.4, 9.2.2, 9.2.6, 9.3.2, 9.4.2, 9.4.6, 9.4.7, and 9.5.3

Problem 9.1.4 Solution

The joint PDF of X and Y is

    f_{X,Y}(x,y) = { 6(y − x)   0 ≤ x ≤ y ≤ 1,
                   { 0          otherwise.                                  (1)

(a) The conditional PDF of X given Y is found by dividing the joint PDF by the marginal PDF of Y. For y < 0 or y > 1, f_Y(y) = 0. For 0 ≤ y ≤ 1,

    f_Y(y) = ∫_0^y 6(y − x) dx = [6xy − 3x²]_{x=0}^{x=y} = 3y².            (2)

The complete expression for the marginal PDF of Y is

    f_Y(y) = { 3y²   0 ≤ y ≤ 1,
             { 0     otherwise.                                             (3)

Thus for 0 < y ≤ 1,

    f_{X|Y}(x|y) = f_{X,Y}(x,y)/f_Y(y) = { 6(y − x)/(3y²)   0 ≤ x ≤ y,
                                         { 0                otherwise.      (4)

(b) The minimum mean square error estimator of X given Y = y is

    X̂_M(y) = E[X | Y = y] = ∫ x f_{X|Y}(x|y) dx                            (5)
            = ∫_0^y [6x(y − x)/(3y²)] dx = (3x²y − 2x³)/(3y²) |_{x=0}^{x=y} = y/3.   (6)

(c) First we must find the marginal PDF of X. For 0 ≤ x ≤ 1,

    f_X(x) = ∫ f_{X,Y}(x,y) dy = ∫_x^1 6(y − x) dy = [3y² − 6xy]_{y=x}^{y=1}        (7)
           = 3(1 − 2x + x²) = 3(1 − x)².                                    (8)

The conditional PDF of Y given X is

    f_{Y|X}(y|x) = f_{X,Y}(x,y)/f_X(x) = { 2(y − x)/(1 − 2x + x²)   x ≤ y ≤ 1,
                                         { 0                        otherwise.      (9)
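As a numerical sanity check on part (b), the short sketch below (not part of the original solution; the helper name cond_mean_x_given_y is ours) approximates E[X | Y = y] by a midpoint Riemann sum over the conditional PDF 6(y − x)/(3y²) and compares it with y/3.

```python
# Midpoint-rule approximation of E[X | Y = y] for the conditional PDF
# f_{X|Y}(x|y) = 6(y - x)/(3*y**2) on 0 <= x <= y (illustrative helper).
def cond_mean_x_given_y(y, steps=20000):
    dx = y / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx                       # midpoint of the i-th subinterval
        total += x * 6 * (y - x) / (3 * y**2) * dx
    return total

for y in (0.2, 0.5, 1.0):
    print(y, cond_mean_x_given_y(y), y / 3)      # the two values should agree closely
```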
(d) The minimum mean square error estimator of Y given X = x is

    Ŷ_M(x) = E[Y | X = x] = ∫ y f_{Y|X}(y|x) dy                            (10)
            = ∫_x^1 [2y(y − x)/(1 − 2x + x²)] dy                            (11)
            = [(2/3)y³ − xy²]_{y=x}^{y=1} / (1 − 2x + x²) = (2 − 3x + x³)/(3(1 − x)²).   (12)

Perhaps surprisingly, this result can be simplified to

    Ŷ_M(x) = x/3 + 2/3.                                                     (13)

Problem 9.2.2 Solution

The problem statement tells us that

    f_V(v) = { 1/12   −6 ≤ v ≤ 6,
             { 0      otherwise.                                            (1)

Furthermore, we are also told that R = V + X, where X is a Gaussian random variable with expected value 0 and variance 3.

(a) The expected value of R is the expected value of V plus the expected value of X. We already know that X has zero expected value, and that V is uniformly distributed between −6 and 6 volts and therefore also has zero expected value. So

    E[R] = E[V + X] = E[V] + E[X] = 0.                                      (2)

(b) Because X and V are independent random variables, the variance of R is the sum of the variance of V and the variance of X:

    Var[R] = Var[V] + Var[X] = 12 + 3 = 15.                                 (3)

(c) Since E[R] = E[V] = 0,

    Cov[V, R] = E[V R] = E[V(V + X)] = E[V²] = Var[V] = 12.                 (4)

(d) The correlation coefficient of V and R is

    ρ_{V,R} = Cov[V, R]/(σ_V σ_R) = Var[V]/√(Var[V] Var[R]) = σ_V/σ_R.      (5)

The LMSE estimate of V given R is

    V̂(R) = ρ_{V,R} (σ_V/σ_R)(R − E[R]) + E[V] = (σ_V²/σ_R²) R = (12/15) R.  (6)

Therefore a* = 12/15 = 4/5 and b* = 0.

(e) The minimum mean square error in the estimate is

    e* = Var[V](1 − ρ²_{V,R}) = 12(1 − 12/15) = 12/5.                       (7)
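Parts (d) and (e) of Problem 9.2.2 can be checked by simulation. The sketch below (an illustration, not part of the original solution) draws V uniform on (−6, 6) and independent zero-mean Gaussian noise X with variance 3, applies the LMSE estimate V̂ = (4/5)R, and compares the empirical mean square error with 12/5 = 2.4.

```python
import math
import random

random.seed(1)
n = 200000
se = 0.0                                   # accumulated squared estimation error
for _ in range(n):
    v = random.uniform(-6, 6)              # V: uniform on (-6, 6), Var[V] = 12
    x = random.gauss(0, math.sqrt(3))      # X: zero-mean Gaussian, Var[X] = 3
    r = v + x                              # observation R = V + X
    vhat = 0.8 * r                         # LMSE estimate with a* = 12/15 = 4/5
    se += (v - vhat) ** 2
print(se / n)                              # should be close to 12/5 = 2.4
```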
Problem 9.2.6 Solution

The linear mean square error estimator of X given Y is

    X̂_L(Y) = ((E[XY] − µ_X µ_Y)/Var[Y]) (Y − µ_Y) + µ_X.                    (1)

To find the parameters of this estimator, we calculate the marginal PDFs

    f_Y(y) = ∫_0^y 6(y − x) dx = 3y²   (0 ≤ y ≤ 1),                         (2)

    f_X(x) = ∫_x^1 6(y − x) dy = { 3(1 − 2x + x²)   0 ≤ x ≤ 1,
                                 { 0                otherwise.               (3)

The moments of X and Y are

    E[Y] = ∫_0^1 3y³ dy = 3/4,    E[Y²] = ∫_0^1 3y⁴ dy = 3/5,               (4)

    E[X] = ∫_0^1 3x(1 − 2x + x²) dx = 1/4,    E[X²] = ∫_0^1 3x²(1 − 2x + x²) dx = 1/10.   (5)

The correlation between X and Y is

    E[XY] = 6 ∫_0^1 ∫_x^1 xy(y − x) dy dx = 1/5.                            (6)

Putting these pieces together, the optimal linear estimate of X given Y is

    X̂_L(Y) = ((1/5 − 3/16)/(3/5 − (3/4)²)) (Y − 3/4) + 1/4 = Y/3.           (7)

Problem 9.3.2 Solution

From the problem statement we know that R is an exponential random variable with expected value 1/µ. Therefore it has the probability density function

    f_R(r) = { µe^{−µr}   r ≥ 0,
             { 0          otherwise.                                        (1)

It is also known that, given R = r, the number of phone calls arriving at a telephone switch, N, is a Poisson (α = rT) random variable. So we can write the following conditional probability mass function of N given R:

    P_{N|R}(n|r) = { (rT)^n e^{−rT}/n!   n = 0, 1, ...,
                   { 0                   otherwise.                         (2)

(a) The minimum mean square error estimate of N given R = r is the conditional expected value of N given R = r. Since N given R = r is Poisson with parameter rT,

    N̂_M(r) = E[N | R = r] = rT.                                             (3)
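The moments used in Problem 9.2.6 can be verified numerically. This sketch (not part of the original solution; the helper moments is ours) evaluates them with a midpoint Riemann sum over the region 0 ≤ x ≤ y ≤ 1 and recovers the optimal linear coefficient Cov[X, Y]/Var[Y] = (1/80)/(3/80) = 1/3.

```python
# Double Riemann sum over f_{X,Y}(x,y) = 6(y-x), 0 <= x <= y <= 1 (illustrative check).
def moments(steps=400):
    d = 1.0 / steps
    EX = EY = EXY = EY2 = 0.0
    for i in range(steps):
        x = (i + 0.5) * d
        for j in range(steps):
            y = (j + 0.5) * d
            if x <= y:
                w = 6 * (y - x) * d * d    # probability mass of this grid cell
                EX += x * w
                EY += y * w
                EXY += x * y * w
                EY2 += y * y * w
    return EX, EY, EXY, EY2

EX, EY, EXY, EY2 = moments()
a = (EXY - EX * EY) / (EY2 - EY * EY)      # optimal linear coefficient, near 1/3
print(EX, EY, EXY, a)
```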
(b) The maximum a posteriori estimate of N given R is simply the value of n that maximizes P_{N|R}(n|r). That is,

    n̂_MAP(r) = arg max_n P_{N|R}(n|r) = arg max_n (rT)^n e^{−rT}/n!.        (4)

Usually, we set a derivative to zero to solve for the maximizing value. In this case, that technique doesn't work because n is discrete. Since e^{−rT} is a common factor in the maximization, we can define g(n) = (rT)^n/n! so that n̂_MAP = arg max_n g(n). We observe that

    g(n) = (rT/n) g(n − 1).                                                 (5)

This implies that for n ≤ rT, g(n) ≥ g(n − 1). Hence the maximizing value of n is the largest n such that n ≤ rT. That is, n̂_MAP = ⌊rT⌋.

(c) The maximum likelihood estimate of N given R selects the value of n that maximizes f_{R|N}(r|n), the conditional PDF of R given N. When dealing with situations in which we mix continuous and discrete random variables, it's often helpful to start from first principles. In this case,

    f_{R|N}(r|n) dr = P[r < R ≤ r + dr | N = n]                             (6)
                    = P[r < R ≤ r + dr, N = n]/P[N = n]                     (7)
                    = P[N = n | R = r] P[r < R ≤ r + dr]/P[N = n].          (8)

In terms of PDFs and PMFs, we have

    f_{R|N}(r|n) = P_{N|R}(n|r) f_R(r)/P_N(n).                              (9)

To find the value of n that maximizes f_{R|N}(r|n), we need to find the denominator P_N(n):

    P_N(n) = ∫_0^∞ P_{N|R}(n|r) f_R(r) dr                                   (10)
           = ∫_0^∞ [(rT)^n e^{−rT}/n!] µe^{−µr} dr                          (11)
           = [µT^n/(n!(µ + T))] ∫_0^∞ r^n (µ + T) e^{−(µ+T)r} dr            (12)
           = [µT^n/(n!(µ + T))] E[X^n],                                     (13)

where X is an exponential random variable with expected value 1/(µ + T). There are several ways to derive the nth moment of an exponential random variable, including integration by parts. In Example 6.5, the MGF is used to show that E[X^n] = n!/(µ + T)^n. Hence, for n ≥ 0,

    P_N(n) = µT^n/(µ + T)^{n+1}.                                            (14)
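The conclusion n̂_MAP = ⌊rT⌋ can be confirmed by brute force. The sketch below (not part of the original solution; the helper poisson_map is ours) maximizes the Poisson PMF in the log domain for several non-integer values of α = rT. (When rT is an integer, n = rT and n = rT − 1 tie, as the ratio g(n)/g(n − 1) = rT/n shows.)

```python
import math

def poisson_map(alpha, nmax=50):
    """Brute-force argmax over n of the Poisson(alpha) PMF, computed in log domain."""
    best_n, best_logp = 0, -alpha          # log P(0) = -alpha
    for n in range(1, nmax + 1):
        logp = n * math.log(alpha) - alpha - math.lgamma(n + 1)
        if logp > best_logp:
            best_n, best_logp = n, logp
    return best_n

for alpha in (0.3, 2.7, 13.9):
    print(alpha, poisson_map(alpha), math.floor(alpha))   # last two columns match
```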
Finally, the conditional PDF of R given N is

    f_{R|N}(r|n) = P_{N|R}(n|r) f_R(r)/P_N(n) = [(rT)^n e^{−rT}/n!] µe^{−µr} / [µT^n/(µ + T)^{n+1}]   (15)
                 = (µ + T) [(µ + T)r]^n e^{−(µ+T)r}/n!.                     (16)

The ML estimate of N given R is

    n̂_ML(r) = arg max_n f_{R|N}(r|n) = arg max_n (µ + T)[(µ + T)r]^n e^{−(µ+T)r}/n!.   (17)

This maximization is exactly the same as in the previous part except rT is replaced by (µ + T)r. The maximizing value of n is n̂_ML = ⌊(µ + T)r⌋.

Problem 9.4.2 Solution

From the problem statement, we learn that for the vectors X = [X₁ X₂ X₃]′ and W = [W₁ W₂]′,

    E[X] = 0,   R_X = [1    3/4  1/2
                       3/4  1    3/4
                       1/2  3/4  1  ],   E[W] = 0,   R_W = [0.1  0
                                                            0    0.1].      (1)

In addition,

    Y = [Y₁ Y₂]′ = AX + W,   A = [1 1 0
                                  0 1 1].                                   (2)

(a) Since E[Y] = AE[X] = 0, we can apply Theorem 9.7(a), which states that the minimum mean square error estimate of X₁ is X̂₁(Y) = â′Y where â = R_Y⁻¹ R_{YX₁}. First we find R_Y:

    R_Y = E[YY′] = E[(AX + W)(AX + W)′]                                     (3)
        = E[(AX + W)(X′A′ + W′)]                                            (4)
        = E[AXX′A′] + E[WX′A′] + E[AXW′] + E[WW′].                          (5)

Since X and W are independent, E[WX′] = 0 and E[XW′] = 0. This implies

    R_Y = AE[XX′]A′ + E[WW′]                                                (6)
        = AR_X A′ + R_W                                                     (7)
        = [3.6  3
           3    3.6].                                                       (8)

Once again, independence of W and X₁ yields

    R_{YX₁} = E[YX₁] = E[(AX + W)X₁] = AE[XX₁].                             (9)
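The matrix computation R_Y = A R_X A′ + R_W can be checked exactly with rational arithmetic. The sketch below (an illustrative check, not part of the original solution) does the products in pure Python and reproduces [[3.6, 3], [3, 3.6]] = [[18/5, 3], [3, 18/5]].

```python
from fractions import Fraction as F

# Exact check that R_Y = A R_X A' + R_W for the data of Problem 9.4.2.
RX = [[F(1), F(3, 4), F(1, 2)],
      [F(3, 4), F(1), F(3, 4)],
      [F(1, 2), F(3, 4), F(1)]]
A  = [[F(1), F(1), F(0)],
      [F(0), F(1), F(1)]]
RW = [[F(1, 10), F(0)], [F(0), F(1, 10)]]

def matmul(M, N):
    return [[sum(M[i][t] * N[t][j] for t in range(len(N))) for j in range(len(N[0]))]
            for i in range(len(M))]

At = [[A[j][i] for j in range(2)] for i in range(3)]   # A transpose
ARA = matmul(matmul(A, RX), At)
RY = [[ARA[i][j] + RW[i][j] for j in range(2)] for i in range(2)]
print(RY)                                              # diagonal 18/5, off-diagonal 3
```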
This implies

    R_{YX₁} = A [E[X₁²] E[X₂X₁] E[X₃X₁]]′ = A [R_X(1,1) R_X(2,1) R_X(3,1)]′ = A [1 3/4 1/2]′ = [7/4 5/4]′.   (10)

Putting these facts together, we find that

    â = R_Y⁻¹ R_{YX₁} = [10/11   −25/33
                         −25/33  10/11 ] [7/4 5/4]′ = (1/132) [85 −25]′.    (11)

Thus the linear MMSE estimator of X₁ given Y is

    X̂₁(Y) = â′Y = (85/132)Y₁ − (25/132)Y₂ ≈ 0.6439Y₁ − 0.1894Y₂.            (12)

(b) By Theorem 9.7(c), the mean square error of the optimal estimator is

    e*_L = Var[X₁] − â′R_{YX₁}                                              (13)
         = R_X(1,1) − R′_{YX₁} R_Y⁻¹ R_{YX₁}                                (14)
         = 1 − [7/4 5/4] [10/11   −25/33
                          −25/33  10/11 ] [7/4 5/4]′ = 29/264 ≈ 0.1098.     (15)

In Problem 9.4.1, we solved essentially the same problem but the observations Y were not subjected to the additive noise W. In comparing the estimators, we see that the additive noise perturbs the estimator somewhat but not dramatically, because the correlation structure of X and the mapping A from X to Y remain unchanged. On the other hand, in the noiseless case, the resulting mean square error was about half as much: 3/52 ≈ 0.0577 versus 0.1098.

(c) We can estimate random variable X₁ based on the observation of the single random variable Y₁ using Theorem 9.4. Note that Theorem 9.8, in which the observation is a random vector, includes Theorem 9.4 as a special case. In any case, from Theorem 9.4, the optimum linear estimate is X̂₁(Y₁) = a*Y₁ + b* where

    a* = Cov[X₁, Y₁]/Var[Y₁],   b* = µ_{X₁} − a*µ_{Y₁}.                     (16)

Since E[X_i] = µ_{X_i} = 0 and Y₁ = X₁ + X₂ + W₁, we see that

    µ_{Y₁} = E[Y₁] = E[X₁] + E[X₂] + E[W₁] = 0.                             (17)

These facts, along with independence of X₁ and W₁, imply

    Cov[X₁, Y₁] = E[X₁Y₁] = E[X₁(X₁ + X₂ + W₁)]                             (18)
                = R_X(1,1) + R_X(1,2) = 7/4.                                (19)

In addition, using R_Y from part (a), we see that

    Var[Y₁] = E[Y₁²] = R_Y(1,1) = 3.6.                                      (20)
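The solve for â and the mean square error in parts (a) and (b) can likewise be reproduced exactly with an explicit 2 × 2 inverse (an illustrative check, not part of the original solution):

```python
from fractions import Fraction as F

# Exact solve of R_Y a = R_YX1 for Problem 9.4.2(a), plus the MSE of part (b).
RY  = [[F(18, 5), F(3)], [F(3), F(18, 5)]]
RYX = [F(7, 4), F(5, 4)]

det = RY[0][0] * RY[1][1] - RY[0][1] * RY[1][0]        # determinant of R_Y
a0 = ( RY[1][1] * RYX[0] - RY[0][1] * RYX[1]) / det    # Cramer's rule
a1 = (-RY[1][0] * RYX[0] + RY[0][0] * RYX[1]) / det
mse = F(1) - (a0 * RYX[0] + a1 * RYX[1])               # Var[X1] - a' R_YX1
print(a0, a1, mse)                                     # 85/132, -25/132, 29/264
```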
Thus a Cov [X 1, Y 1 ] 7/4 Var[Y 1 ] 3.6 35 72, b µ X1 a µ Y1. (21) Thus the optimum linear estimate of X 1 given Y 1 is ˆX 1 (Y 1 ) 35 72 Y 1. (22) From Theorem 9.4(b), the mean square error of this estimator is e L σ2 X 1 (1 ρ 2 X 1,Y 1 ) (23) Since X 1 and Y 1 have zero expected value, σx 2 1 R X (1, 1) 1 and σy 2 1 R Y (1, 1) 3.6. Also, since Cov[X 1, Y 1 ] 7/4, we see that ρ X1,Y 1 Cov [X 1, Y 1 ] 7/4 49 σ X1 σ Y1 3.6 24. (24) Thus e L 1 (49/242 ).1493. As we would expect, the estimate of X 1 based on just Y 1 has larger mean square error than the estimate based on both Y 1 and Y 2. Problem 9.4.6 Solution For this problem, let Y [ X 1 X 2 n 1] and let X X n. Since E[Y] and E[X], Theorem 9.7(a) tells us that the minimum mean square linear estimate of X given Y is ˆX n (Y) â Y, where â R 1 Y R YX. This implies that â is the solution to Note that 1 c c n 2 R Y E [ YY ] c c 2........... c, c n 2 c 1 R Y â R YX. (1) R YX E X 1 X 2. X n 1 X n c n 1 c n 2. c. (2) We see that the last column of cr Y equals R YX. Equivalently, if â [ c ], then R Y â R YX. It follows that the optimal linear estimator of X n given Y is which completes the proof of the claim. The mean square error of this estimate is ˆX n (Y) â Y cx n 1, (3) e L E [ (X n cx n 1 ) 2] (4) R X (n, n) cr X (n, n 1) cr X (n 1, n) + c 2 R X (n 1, n 1) (5) 1 2c 2 + c 2 1 c 2 (6) When c is close to 1, X n 1 and X n are highly correlated and the estimation error will be small. Comment: We will see in Chapter 11 that correlation matrices with this structure arise frequently in the study of wide sense stationary random sequences. In fact, if you read ahead, you will find that the claim we just proved is the essentially the same as that made in Theorem 11.1. 7
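The claim of Problem 9.4.6 is easy to spot-check numerically: multiplying the Toeplitz R_Y by â = [0 ··· 0 c]′ should reproduce R_{YX}, giving error 1 − c². A sketch (not part of the original solution) for the arbitrary choice n = 5, c = 0.8:

```python
# Check that R_Y a-hat = R_YX for a-hat = [0, ..., 0, c]', so X_n-hat = c X_(n-1).
n, c = 5, 0.8
m = n - 1                                     # Y has n - 1 = 4 components
RY   = [[c**abs(i - j) for j in range(m)] for i in range(m)]   # Toeplitz, entries c^|i-j|
RYX  = [c**(n - 1 - i) for i in range(m)]     # E[X_(i+1) X_n] = c^(n-1-i), i = 0..m-1
ahat = [0.0] * (m - 1) + [c]

lhs = [sum(RY[i][j] * ahat[j] for j in range(m)) for i in range(m)]
print(all(abs(lhs[i] - RYX[i]) < 1e-12 for i in range(m)))     # R_Y a-hat == R_YX
print(1 - c**2)                               # mean square error, here 1 - 0.64
```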
Problem 9.4.7 Solution

(a) In this case, we use the observation Y to estimate each X_i. Since E[X_i] = 0,

    E[Y] = Σ_{j=1}^k E[X_j]√p_j S_j + E[N] = 0.                             (1)

Thus, Theorem 9.7(a) tells us that the MMSE linear estimate of X_i is X̂_i(Y) = â′Y where â = R_Y⁻¹ R_{YX_i}. First we note that

    R_{YX_i} = E[YX_i] = E[(Σ_{j=1}^k √p_j X_j S_j + N) X_i].               (2)

Since N and X_i are independent, E[NX_i] = E[N]E[X_i] = 0. Because X_i and X_j are independent for i ≠ j, E[X_iX_j] = E[X_i]E[X_j] = 0 for i ≠ j. In addition, E[X_i²] = 1, and it follows that

    R_{YX_i} = Σ_{j=1}^k E[X_jX_i]√p_j S_j + E[NX_i] = √p_i S_i.            (3)

For the same reasons,

    R_Y = E[YY′] = E[(Σ_{j=1}^k √p_j X_j S_j + N)(Σ_{l=1}^k √p_l X_l S_l + N)′]   (4)
        = Σ_{j=1}^k Σ_{l=1}^k √(p_j p_l) E[X_jX_l] S_j S_l′
          + Σ_{j=1}^k √p_j S_j E[X_jN′] + Σ_{l=1}^k √p_l E[X_lN] S_l′ + E[NN′]    (5)
        = Σ_{j=1}^k p_j S_j S_j′ + σ²I,                                     (6)

where the two middle sums in (5) vanish by independence of N and the X_j. Now we use a linear algebra identity. For a matrix S with columns S₁, S₂, ..., S_k, and a diagonal matrix P = diag[p₁, p₂, ..., p_k],

    Σ_{j=1}^k p_j S_j S_j′ = SPS′.                                          (7)

Although this identity may be unfamiliar, it is handy in manipulating correlation matrices. (Also, if this is unfamiliar, you may wish to work out an example with k = 2 vectors of length 2 or 3.) Thus,

    R_Y = SPS′ + σ²I,                                                       (8)
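The identity Σ_j p_j S_j S_j′ = SPS′ can be worked out numerically, as the solution suggests. The sketch below (not part of the original solution; the example vectors are arbitrary) builds both sides for k = 3 vectors of length 2 and compares them entry by entry.

```python
# Check sum_j p_j S_j S_j' == S P S' for a small arbitrary example.
S = [[1.0, 2.0, 0.5],      # S_1, S_2, S_3 are the columns of S
     [3.0, -1.0, 4.0]]
p = [0.2, 0.5, 0.3]
k, d = len(p), len(S)

# Left side: sum of rank-one outer products p_j S_j S_j'
lhs = [[sum(p[j] * S[a][j] * S[b][j] for j in range(k)) for b in range(d)]
       for a in range(d)]

def matmul(M, N):
    return [[sum(M[i][t] * N[t][j] for t in range(len(N))) for j in range(len(N[0]))]
            for i in range(len(M))]

# Right side: S P S' with P = diag(p)
P  = [[p[j] if i == j else 0.0 for j in range(k)] for i in range(k)]
St = [[S[i][j] for i in range(d)] for j in range(k)]
rhs = matmul(matmul(S, P), St)
print(all(abs(lhs[a][b] - rhs[a][b]) < 1e-9 for a in range(d) for b in range(d)))
```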
and

    â = R_Y⁻¹ R_{YX_i} = (SPS′ + σ²I)⁻¹ √p_i S_i.                           (9)

Recall that if C is symmetric, then C⁻¹ is also symmetric. This implies the MMSE estimate of X_i given Y is

    X̂_i(Y) = â′Y = √p_i S_i′(SPS′ + σ²I)⁻¹ Y.                               (10)

(b) We observe that V = (SPS′ + σ²I)⁻¹Y is a vector that does not depend on which bit X_i we want to estimate. Since X̂_i = √p_i S_i′V, we can form the vector of estimates

    X̂ = [X̂₁ ··· X̂_k]′ = [√p₁ S₁′V ··· √p_k S_k′V]′ = diag[√p₁, ..., √p_k] [S₁ ··· S_k]′ V   (11)
      = P^{1/2} S′V                                                         (12)
      = P^{1/2} S′(SPS′ + σ²I)⁻¹ Y.                                         (13)

Problem 9.5.3 Solution

The solution to this problem is almost the same as the solution to Example 9.10, except perhaps the Matlab code is somewhat simpler. As in the example, let W^(n), X^(n), and Y^(n) denote the vectors consisting of the first n components of W, X, and Y. Just as in Examples 9.8 and 9.10, independence of X^(n) and W^(n) implies that the correlation matrix of Y^(n) is

    R_{Y^(n)} = E[(X^(n) + W^(n))(X^(n) + W^(n))′] = R_{X^(n)} + R_{W^(n)}.  (1)

Note that R_{X^(n)} and R_{W^(n)} are the n × n upper-left submatrices of R_X and R_W. In addition,

    R_{Y^(n)X} = E[[X₁ + W₁ ··· X_n + W_n]′ X₁] = [r₀ r₁ ··· r_{n−1}]′.      (2)

Compared to the solution of Example 9.10, the only difference in the solution is in the reversal of the vector R_{Y^(n)X}. The optimal filter based on the first n observations is â^(n) = R_{Y^(n)}⁻¹ R_{Y^(n)X}, and the mean square error is

    e*_L = Var[X₁] − (â^(n))′ R_{Y^(n)X}.                                    (3)
The program mse953.m simply calculates the mean square error e*_L:

    function e=mse953(r)
    N=length(r);
    e=[];
    for n=1:N,
        RYX=r(1:n)';
        RY=toeplitz(r(1:n))+0.1*eye(n);
        a=RY\RYX;
        en=r(1)-(a')*RYX;
        e=[e;en];
    end
    plot(1:N,e);

The input is the vector r corresponding to the vector [r₀ ··· r₂₀]′, which holds the first row of the Toeplitz correlation matrix R_X. Note that R_{X^(n)} is the Toeplitz matrix whose first row is the first n elements of r. To plot the mean square error as a function of the number of observations, n, we generate the vector r and then run mse953(r). For the requested cases (a) and (b), the necessary Matlab commands are

    (a) ra=sinc(0.1*pi*(0:20)); mse953(ra)
    (b) rb=cos(0.5*pi*(0:20)); mse953(rb)

[The corresponding plots of the mean square estimation error as a function of n appear here.]

In comparing the results of cases (a) and (b), we see that the mean square estimation error depends strongly on the correlation structure given by r_{|i−j|}. For case (a), Y₁ is a noisy observation of X₁ and is highly correlated with X₁. The MSE at n = 1 is just the variance of W₁. Additional samples of Y_n mostly help to average the additive noise. Also, samples X_n for n > 10 have very little correlation with X₁. Thus for n > 10, the samples of Y_n result in almost no improvement in the estimate of X₁. In case (b), Y₁ = X₁ + W₁, just as in case (a), is simply a noisy copy of X₁ and the estimation error is due to the variance of W₁. On the other hand, for case (b), X₅, X₉, X₁₃, X₁₇, and X₂₁ are completely correlated with X₁. Other samples also have significant correlation with X₁. As a result, the MSE continues to go down with increasing n.