Solutions to Homework Set #5 (Prepared by Lele Wang)

1. Neural net. Let Y = X + Z, where the signal X ~ Unif[−1, 1] and the noise Z ~ N(0, 1) are independent.

(a) Find the function g(y) that minimizes MSE = E[(sgn(X) − g(Y))²], where

sgn(x) = −1 for x ≤ 0, and +1 for x > 0.

(b) Plot g(y) vs. y.

The minimum MSE is achieved when g(y) = E(sgn(X) | Y = y) = ∫ sgn(x) f_{X|Y}(x|y) dx. To find the conditional pdf of X given Y, we use

f_{X|Y}(x|y) = f_{Y|X}(y|x) f_X(x) / f_Y(y),

where f_X(x) = 1/2 for −1 ≤ x ≤ 1, and 0 otherwise. Since X and Z are independent,

f_{Y|X}(y|x) = f_Z(y − x), i.e., Y | {X = x} ~ N(x, 1).

To find f_Y(y) we integrate f_{Y|X}(y|x) f_X(x) over x:

f_Y(y) = ∫ f_{Y|X}(y|x) f_X(x) dx = (1/2) ∫_{−1}^{1} (1/√(2π)) e^{−(y−x)²/2} dx = (1/2) (Q(y − 1) − Q(y + 1)),

where Q(a) = ∫_a^∞ (1/√(2π)) e^{−t²/2} dt denotes the Gaussian tail function. Combining the above results, we get

g(y) = (1/f_Y(y)) ∫ sgn(x) (1/2) (1/√(2π)) e^{−(y−x)²/2} dx
     = (1/(2 f_Y(y))) ( ∫_0^1 (1/√(2π)) e^{−(y−x)²/2} dx − ∫_{−1}^0 (1/√(2π)) e^{−(y−x)²/2} dx )
     = (1/(2 f_Y(y))) ( (Q(y − 1) − Q(y)) − (Q(y) − Q(y + 1)) )
     = (Q(y + 1) − 2Q(y) + Q(y − 1)) / (Q(y − 1) − Q(y + 1)).
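As a sanity check, the closed form can be compared against a direct numerical evaluation of E(sgn(X) | Y = y). A minimal Python sketch, assuming numpy and scipy are available (this is an added check, not part of the original solution):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

Q = norm.sf  # Gaussian tail function Q(a) = P(N(0,1) > a)

def g_closed_form(y):
    """g(y) = (Q(y+1) - 2Q(y) + Q(y-1)) / (Q(y-1) - Q(y+1))."""
    return (Q(y + 1) - 2 * Q(y) + Q(y - 1)) / (Q(y - 1) - Q(y + 1))

def g_numeric(y):
    """E[sgn(X) | Y = y] by direct numerical integration.

    np.sign(0) = 0 differs from sgn(0) = -1, but only on a set of
    measure zero, so the integral is unaffected.
    """
    num, _ = quad(lambda x: np.sign(x) * norm.pdf(y - x) / 2, -1.0, 1.0)
    den, _ = quad(lambda x: norm.pdf(y - x) / 2, -1.0, 1.0)
    return num / den

for y in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    print(y, g_closed_form(y), g_numeric(y))  # the two columns agree
```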
The plot is shown below. Note the sigmoidal shape corresponding to the common neural network activation function.

[Figure: g(y) versus y for −8 ≤ y ≤ 8; g(y) rises sigmoidally from −1 to +1.]

2. Additive shot noise channel. Consider an additive noise channel Y = X + Z, where the signal X ~ N(0, 1), and the noise Z | {X = x} ~ N(0, x²), i.e., the noise power increases linearly with the signal squared.

(a) Find E(Z²).

(b) Find the best linear MSE estimate of X given Y.

(a) Since Z | {X = x} ~ N(0, x²),

E(Z | X = x) = 0 and E(Z² | X = x) = Var(Z | X = x) + (E(Z | X = x))² = x².

Therefore,

E(Z²) = E(E(Z² | X)) = E(X²) = 1.

(b) From the best linear estimate formula,

X̂ = (Cov(X, Y)/σ_Y²)(Y − E(Y)) + E(X).

Here we have E(X) = 0,

E(Y) = E(X + Z) = E(X) + E(Z) = E(X) + E(E(Z | X)) = 0 + E(0) = 0,
E(XZ) = E(E(XZ | X)) = E(X E(Z | X)) = E(X · 0) = 0,
σ_Y² = E(Y²) − (E(Y))² = E((X + Z)²) = E(X²) + E(Z²) + 2E(XZ) = 1 + 1 + 0 = 2,
Cov(X, Y) = E((X − E(X))(Y − E(Y))) = E(XY) = E(X(X + Z)) = E(X²) + E(XZ) = 1.

Using all of the above, we get

X̂ = (1/2) Y.
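The coefficient 1/2 can be checked by simulation. A minimal Python sketch, assuming numpy is available (an added check, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.standard_normal(n)              # X ~ N(0, 1)
z = np.abs(x) * rng.standard_normal(n)  # Z | {X = x} ~ N(0, x^2)
y = x + z

# Empirical slope of the best linear estimate: Cov(X, Y) / Var(Y).
slope = np.cov(x, y)[0, 1] / np.var(y)
print(np.mean(z**2))  # ~1, matching E(Z^2) = E(X^2) = 1
print(slope)          # ~0.5, matching X_hat = Y/2
```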
3. Estimation vs. detection. Let the signal

X = +1 with probability 1/2, and −1 with probability 1/2,

and the noise Z ~ Unif[−2, 2] be independent random variables. Their sum Y = X + Z is observed.

(a) Find the best MSE estimate of X given Y and its MSE.

(b) Now suppose we use a decoder to decide whether X = +1 or X = −1 so that the probability of error is minimized. Find the optimal decoder and its probability of error. Compare the optimal decoder's MSE to the minimum MSE.

(a) We can easily find the piecewise constant density of Y:

f_Y(y) = 1/4 for −1 ≤ y ≤ 1, 1/8 for 1 < |y| ≤ 3, and 0 otherwise.

The conditional probabilities of X given Y are

P{X = +1 | Y = y} = 0 for −3 ≤ y < −1, 1/2 for −1 ≤ y ≤ +1, 1 for +1 < y ≤ +3,
P{X = −1 | Y = y} = 1 for −3 ≤ y < −1, 1/2 for −1 ≤ y ≤ +1, 0 for +1 < y ≤ +3.

Thus the best MSE estimate is

g(Y) = E(X | Y) = −1 for −3 ≤ Y < −1, 0 for −1 ≤ Y ≤ +1, +1 for +1 < Y ≤ +3.

The minimum mean square error is

E(Var(X | Y)) = E(E(X² | Y) − (E(X | Y))²) = E(1 − g(Y)²) = 1 − ∫ g(y)² f_Y(y) dy
= 1 − ( ∫_{−3}^{−1} (1/8) dy + ∫_{1}^{3} (1/8) dy )
= 1 − (1/4 + 1/4) = 1/2.
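The minimum MSE of 1/2 is easy to verify empirically. A minimal Python sketch, assuming numpy is available (an added check, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.choice([-1.0, 1.0], size=n)      # X = ±1 with probability 1/2 each
y = x + rng.uniform(-2.0, 2.0, size=n)   # Z ~ Unif[-2, 2], independent of X

# Piecewise-constant MMSE estimator g(Y) = E(X | Y) derived above.
g = np.where(y < -1, -1.0, np.where(y > 1, 1.0, 0.0))
print(np.mean((x - g) ** 2))  # ~0.5, matching the minimum MSE of 1/2
```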
(b) The optimal decoder is given by the MAP rule. The a posteriori pmf of X was found in part (a). Thus the MAP rule reduces to

D(y) = −1 for −3 ≤ y < −1, ±1 (either) for −1 ≤ y ≤ +1, +1 for +1 < y ≤ +3.

Since either value can be chosen for D(y) in the center range of Y, a symmetrical decoder is sufficient, i.e.,

D(y) = −1 for y < 0, and +1 for y ≥ 0.

The probability of decoding error is

P{D(Y) ≠ X} = P{X = −1, Y ≥ 0} + P{X = +1, Y < 0}
= P{Y ≥ 0 | X = −1} P{X = −1} + P{Y < 0 | X = +1} P{X = +1}
= (1/4)(1/2) + (1/4)(1/2) = 1/4.

If we use the decoder (detector) as an estimator, its MSE is

E((D(Y) − X)²) = 0 · (3/4) + 2² · (1/4) = 1.

This MSE is twice that of the minimum mean square error estimator.

4. Linear estimator. Consider a channel with the observation Y = XZ, where the signal X and the noise Z are uncorrelated Gaussian random variables. Let E[X] = 1, E[Z] = 2, σ_X² = 5, and σ_Z² = 8.

(a) Find the best MSE linear estimate of X given Y.

(b) Suppose your friend from Caltech tells you that he was able to derive an estimator with a lower MSE. Your friend from UCLA disagrees, saying that this is not possible because the signal and the noise are Gaussian, and hence the best linear MSE estimator will also be the best MSE estimator. Could your UCLA friend be wrong?

(a) We know that the best linear estimate is given by the formula

X̂ = (Cov(X, Y)/σ_Y²)(Y − E(Y)) + E(X).

Note that X and Z Gaussian and uncorrelated implies they are independent. Therefore,

E(Y) = E(XZ) = E(X) E(Z) = 2,
E(XY) = E(X²Z) = E(X²) E(Z) = (σ_X² + E²(X)) E(Z) = (5 + 1) · 2 = 12,
E(Y²) = E(X²Z²) = E(X²) E(Z²) = (σ_X² + E²(X))(σ_Z² + E²(Z)) = 6 · 12 = 72,
σ_Y² = E(Y²) − E²(Y) = 72 − 4 = 68,
Cov(X, Y) = E(XY) − E(X) E(Y) = 12 − 2 = 10,
Cov(X, Y)/σ_Y² = 10/68 = 5/34.

Using all of the above, we get

X̂ = (5/34)(Y − 2) + 1 = (5/34) Y + 12/17.
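The coefficients in part (a) can be confirmed by simulation. A minimal Python sketch, assuming numpy is available (an added check, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000_000
x = 1 + np.sqrt(5) * rng.standard_normal(n)  # X ~ N(1, 5)
z = 2 + np.sqrt(8) * rng.standard_normal(n)  # Z ~ N(2, 8), independent of X
y = x * z

slope = np.cov(x, y)[0, 1] / np.var(y)       # best linear slope Cov(X,Y)/Var(Y)
intercept = np.mean(x) - slope * np.mean(y)  # E(X) - slope * E(Y)
print(slope, intercept)  # ~5/34 ≈ 0.147 and ~12/17 ≈ 0.706
```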
(b) The fact that the best linear MSE estimate equals the best MSE estimate when the signal and the noise are independent Gaussians is only known to be true for additive channels. For a multiplicative channel this need not be the case, because (X, Y) need not be jointly Gaussian. In the following, we prove that Y is not Gaussian by contradiction. Suppose Y were Gaussian; then Y ~ N(2, 68) and

f_Y(y) = (1/√(2π · 68)) e^{−(y−2)²/136}.

On the other hand, as a function of the two independent random variables X and Z, Y = XZ has pdf

f_Y(y) = ∫ (1/|x|) f_X(x) f_Z(y/x) dx.

But these two expressions are not consistent. For example, at y = 0,

f_Y(0) = f_Z(0) ∫ (1/|x|) f_X(x) dx = ∞,

since f_X is continuous and strictly positive at x = 0, so the integral of f_X(x)/|x| diverges near the origin. This contradicts the finite value (1/√(2π · 68)) e^{−4/136} that the Gaussian pdf assigns at y = 0. Hence Y is not Gaussian, X and Y are not jointly Gaussian, and the Caltech friend might indeed be able to derive an estimator with a lower MSE. The UCLA friend could be wrong.

5. Additive-noise channel with path gain. Consider the additive noise channel shown in the figure below, where X and Z are zero mean and uncorrelated, and a and b are constants.

[Figure: block diagram. X is scaled by a, the noise Z is added, and the sum is scaled by b, so that Y = b(aX + Z).]

Find the MMSE linear estimate of X given Y and its MSE in terms only of σ_X², σ_Z², a, and b.

By the theorem on MMSE linear estimates, we have

X̂ = (Cov(X, Y)/σ_Y²)(Y − E(Y)) + E(X).

Since X and Z are zero mean and uncorrelated, we have

E(X) = 0,
E(Y) = b(a E(X) + E(Z)) = 0,
Cov(X, Y) = E(XY) − E(X) E(Y) = E(X b(aX + Z)) = ab σ_X²,
σ_Y² = E(Y²) − (E(Y))² = E(b²(aX + Z)²) = b²(a² σ_X² + σ_Z²).

Hence, the best linear MSE estimate of X given Y is given by

X̂ = (a σ_X² / (b(a² σ_X² + σ_Z²))) Y,

and its MSE is

MSE = σ_X² − Cov²(X, Y)/σ_Y² = σ_X² − a² σ_X⁴/(a² σ_X² + σ_Z²) = σ_X² σ_Z² / (a² σ_X² + σ_Z²).
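Both the slope and the MSE formulas can be sanity-checked numerically. A minimal Python sketch, assuming numpy is available; the concrete values a = 2, b = 3, σ_X² = 4, σ_Z² = 1 are illustrative assumptions, not from the problem:

```python
import numpy as np

a, b, var_x, var_z = 2.0, 3.0, 4.0, 1.0  # hypothetical values for illustration

rng = np.random.default_rng(3)
n = 1_000_000
x = np.sqrt(var_x) * rng.standard_normal(n)  # zero-mean X
z = np.sqrt(var_z) * rng.standard_normal(n)  # zero-mean Z, uncorrelated with X
y = b * (a * x + z)

slope_mc = np.cov(x, y)[0, 1] / np.var(y)
slope_formula = a * var_x / (b * (a**2 * var_x + var_z))
mse_mc = np.mean((x - slope_mc * y) ** 2)
mse_formula = var_x * var_z / (a**2 * var_x + var_z)
print(slope_mc, slope_formula)  # should agree
print(mse_mc, mse_formula)      # should agree
```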
6. Worst noise distribution. Consider an additive noise channel Y = X + Z, where the signal X ~ N(0, P) and the noise Z has zero mean and variance N. Assume X and Z are independent. Find a distribution of Z that maximizes the minimum MSE of estimating X given Y, i.e., the distribution of the worst noise Z that has the given mean and variance. You need to justify your answer.

The worst noise has the Gaussian distribution, i.e., Z ~ N(0, N). To prove this statement, we show that the MSE corresponding to any other distribution of Z is less than or equal to the MSE for Gaussian noise, i.e., MSE_NonG ≤ MSE_G.

We know that for any noise, MMSE estimation is no worse than linear MMSE estimation, so MSE_NonG ≤ LMSE. The linear MMSE estimate of X given Y is given by

X̂ = (Cov(X, Y)/σ_Y²)(Y − E(Y)) + E(X) = (P/(P + N)) Y,

with

LMSE = σ_X² − Cov²(X, Y)/σ_Y² = P − P²/(P + N) = NP/(P + N).

Note that the LMSE depends only on the second moments of X and Z. So the MSE corresponding to any distribution of Z is upper bounded by the same LMSE, i.e., MSE_NonG ≤ NP/(P + N). When Z is Gaussian and independent of X, (X, Y) are jointly Gaussian. Then MSE_G is equal to the LMSE, i.e., MSE_G = NP/(P + N). Hence

MSE_NonG ≤ NP/(P + N) = MSE_G,

which shows that Gaussian noise is the worst.

7. Image processing. A pixel signal X ~ Unif[−k, k] is digitized to obtain

X̃ = i + 1/2, if i < X ≤ i + 1, for i = −k, −k + 1, ..., k − 2, k − 1.

To improve the visual appearance, the digitized value X̃ is dithered by adding an independent noise Z with mean E(Z) = 0 and variance Var(Z) = N to obtain Y = X̃ + Z.

(a) Find the correlation of X and Y.

(b) Find the best linear MSE estimate of X given Y. Your answer should be in terms only of k, N, and Y.

(a) From the definition of X̃, we know P{X̃ = i + 1/2} = P{i < X ≤ i + 1} = 1/(2k). By the law of
total expectation, we have

Cov(X, Y) = E(XY) − E(X) E(Y) = E(X(X̃ + Z)) = E(X X̃)
= Σ_{i=−k}^{k−1} E(X X̃ | i < X ≤ i + 1) P{i < X ≤ i + 1}
= Σ_{i=−k}^{k−1} (1/(2k)) ∫_i^{i+1} x (i + 1/2) dx
= (1/(2k)) Σ_{i=−k}^{k−1} (i + 1/2)²
= (1/k) Σ_{i=0}^{k−1} (i + 1/2)²
= (4k² − 1)/12,

where the last step uses Σ_{i=1}^{k} i² = k(k + 1)(2k + 1)/6.

(b) We have E(X) = 0 and

E(Y) = E(X̃) + E(Z) = 0,
σ_Y² = Var(X̃) + Var(Z) = Σ_{i=−k}^{k−1} (i + 1/2)² (1/(2k)) + N = (4k² − 1)/12 + N.

Then the best linear MMSE estimate of X given Y is given by

X̂ = (Cov(X, Y)/σ_Y²)(Y − E(Y)) + E(X) = ( (4k² − 1)/12 / ((4k² − 1)/12 + N) ) Y = ( (4k² − 1)/(4k² − 1 + 12N) ) Y.

8. Covariance matrices. Which of the following matrices can be a covariance matrix? Justify your answer either by constructing a random vector X, as a function of the i.i.d. zero mean unit variance random variables Z₁, Z₂, and Z₃, with the given covariance matrix, or by establishing a contradiction.

[Four candidate matrices: (a) and (b) are 2×2, (c) and (d) are 3×3.]

(a) This cannot be a covariance matrix because it is not symmetric.

(b) This is the covariance matrix of X₁ = Z₁ + Z₂ and X₂ = Z₂ + Z₃, namely

[ 2 1 ]
[ 1 2 ]

(c) This is the covariance matrix of X₁ = Z₁, X₂ = Z₁ + Z₂, and X₃ = Z₁ + Z₂ + Z₃, namely

[ 1 1 1 ]
[ 1 2 2 ]
[ 1 2 3 ]

(d) This cannot be a covariance matrix. Suppose it were. Then σ₁₃² = 9 > σ₁₁σ₃₃ = 6, which contradicts the Cauchy–Schwarz inequality. You can also verify this by showing that the matrix is not positive semidefinite: its determinant is negative, and one of its eigenvalues is negative. Alternatively, we can directly show that the matrix does not satisfy the definition of positive semidefiniteness by exhibiting a vector a for which aᵀΣa < 0.
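The constructions in (b) and (c) can be verified numerically by building X from i.i.d. N(0, 1) variables and comparing the sample covariance with the claimed matrices. A minimal Python sketch, assuming numpy is available (an added check, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(4)
z1, z2, z3 = rng.standard_normal((3, 1_000_000))  # i.i.d. N(0, 1) samples

xb = np.vstack([z1 + z2, z2 + z3])            # construction for case (b)
xc = np.vstack([z1, z1 + z2, z1 + z2 + z3])   # construction for case (c)
print(np.cov(xb))  # ~[[2, 1], [1, 2]]
print(np.cov(xc))  # ~[[1, 1, 1], [1, 2, 2], [1, 2, 3]]

# A valid covariance matrix must be positive semidefinite.
sigma_c = np.array([[1, 1, 1], [1, 2, 2], [1, 2, 3]], dtype=float)
print(np.linalg.eigvalsh(sigma_c))  # all eigenvalues >= 0, so PSD
```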
9. Iocane or Sennari: Return of the chemistry professor. An absent-minded chemistry professor forgets to label two identical-looking bottles. One contains a chemical named Iocane and the other contains a chemical named Sennari. It is well known that the radioactivity level of Iocane has the Unif[0, 1] distribution, while the radioactivity level of Sennari has the Exp(1) distribution. In the previous homework, we found the optimal rule to decide which bottle is which by measuring the radioactivity level of one of the bottles. The chemistry professor got smarter this time; she now measures both bottles.

(a) Let X be the radioactivity level measured from one bottle, and let Y be the radioactivity level measured from the other bottle. What is the optimal decision rule (based on the measurement (X, Y)) that maximizes the chance of correctly identifying the contents? Assume that the radioactivity level of one chemical is independent of the level of the other bottle (conditioned on which bottle contains which).

(b) What is the associated probability of error?

Let Θ = 1 denote the case in which the first bottle (measurement X) is Iocane and the second bottle (measurement Y) is Sennari. Let Θ = 2 denote the other case.

(a) Since the two cases are equally likely, the optimal MAP rule is equivalent to the ML rule

D(x, y) = 1 if f_{X,Y|Θ}(x, y | 1) > f_{X,Y|Θ}(x, y | 2), and 2 otherwise.

Since f_{X,Y|Θ}(x, y | 1) = 1_{0 ≤ x ≤ 1} e^{−y} and f_{X,Y|Θ}(x, y | 2) = e^{−x} 1_{0 ≤ y ≤ 1},

D(x, y) = 1 if (0 ≤ x ≤ 1, y > 1) or (0 ≤ y < x ≤ 1), and 2 otherwise.

(b) By symmetry, the probability of error is given by

P(Θ ≠ D(X, Y)) = (1/2) P(X ≤ Y ≤ 1 | Θ = 1) + (1/2) P(Y ≤ X ≤ 1 | Θ = 2)
= P(X ≤ Y ≤ 1 | Θ = 1)
= ∫_0^1 ∫_0^y e^{−y} dx dy
= ∫_0^1 y e^{−y} dy
= 1 − 2e^{−1},

which is less than the error probability (1 − e^{−1})/2 from the single measurement.
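The error probability 1 − 2/e ≈ 0.2642 can be confirmed by simulating the decision rule. A minimal Python sketch, assuming numpy is available (an added check, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
theta = rng.integers(1, 3, size=n)        # true labeling, 1 or 2, equally likely
iocane = rng.uniform(0.0, 1.0, size=n)    # Iocane level ~ Unif[0, 1]
sennari = rng.exponential(1.0, size=n)    # Sennari level ~ Exp(1)
x = np.where(theta == 1, iocane, sennari) # first-bottle measurement
y = np.where(theta == 1, sennari, iocane) # second-bottle measurement

# Decision rule from part (a): decide 1 on {x <= 1, y > 1} or {y < x <= 1}.
decide_1 = ((x <= 1) & (y > 1)) | ((y < x) & (x <= 1))
decision = np.where(decide_1, 1, 2)
print(np.mean(decision != theta), 1 - 2 / np.e)  # both ~0.2642
```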
Solutions to Additional Exercises

1. Orthogonality. Let X̂ be the minimum MSE estimate of X given Y.

(a) Show that for any function g(y), E((X − X̂) g(Y)) = 0, i.e., the error (X − X̂) and g(Y) are orthogonal.

(b) Show that Var(X) = E(Var(X | Y)) + Var(X̂). Provide a geometric interpretation for this result.

(a) We use iterated expectation and the fact that E(g(Y) | Y) = g(Y):

E((X − X̂) g(Y)) = E[E((X − X̂) g(Y) | Y)]
= E[E((X − E(X | Y)) g(Y) | Y)]
= E(g(Y) E(X − E(X | Y) | Y))
= E(g(Y)(E(X | Y) − E(X | Y))) = 0.

(b) First we write

E(Var(X | Y)) = E(X²) − E((E(X | Y))²)

and

Var(E(X | Y)) = E((E(X | Y))²) − (E(E(X | Y)))² = E((E(X | Y))²) − (E(X))².

Adding the two terms completes the proof.

Interpretation: If we view X, E(X | Y), and X − E(X | Y) as vectors with norms √Var(X), √Var(E(X | Y)), and √E(Var(X | Y)), respectively, then this result provides a Pythagorean theorem, where the signal, the error, and the estimate are the sides of a right triangle (estimate and error being orthogonal).

2. Jointly Gaussian random variables. Let X and Y be jointly Gaussian random variables with pdf

f_{X,Y}(x, y) = (1/(π √(3/4))) e^{−(4x²/3 + 16y²/3 + 8xy/3 − 8x − 16y + 16)/2}.

(a) Find E(X), E(Y), Var(X), Var(Y), and Cov(X, Y).

(b) Find the minimum MSE estimate of X given Y and its MSE.
(a) We can write the joint pdf of X and Y jointly Gaussian as

f_{X,Y}(x, y) = (1/(2π σ_X σ_Y √(1 − ρ²_{X,Y}))) exp( −[ a(x − µ_X)² + b(y − µ_Y)² + c(x − µ_X)(y − µ_Y) ] ),

where

a = 1/(2(1 − ρ²_{X,Y}) σ_X²), b = 1/(2(1 − ρ²_{X,Y}) σ_Y²), c = −ρ_{X,Y}/((1 − ρ²_{X,Y}) σ_X σ_Y).

By inspection of the given f_{X,Y}(x, y) we find that

a = 2/3, b = 8/3, c = 4/3,

and we get three equations in three unknowns:

ρ²_{X,Y} = c²/(4ab) = 1/4, so ρ_{X,Y} = −1/2 (negative since c > 0),
σ_X² = 1/(2(1 − ρ²_{X,Y}) a) = 1,
σ_Y² = 1/(2(1 − ρ²_{X,Y}) b) = 1/4.

To find µ_X and µ_Y, we match the linear terms in the exponent and solve the equations

2a µ_X + c µ_Y = 4,
2b µ_Y + c µ_X = 8,

and find that µ_X = 2, µ_Y = 1. Finally,

Cov(X, Y) = ρ_{X,Y} σ_X σ_Y = −1/4.

(b) X and Y are jointly Gaussian random variables. Thus, the minimum MSE estimate of X given Y is linear:

E(X | Y) = (Cov(X, Y)/σ_Y²)(Y − µ_Y) + µ_X = −(Y − 1) + 2 = 3 − Y,

and its MSE is

MMSE = E(Var(X | Y)) = (1 − ρ²_{X,Y}) σ_X² = 3/4.
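The recovered parameters can be verified symbolically by expanding the quadratic form and comparing it with the exponent of the given pdf. A minimal Python sketch, assuming sympy is available (an added check, not part of the original solution):

```python
import sympy as sp

x, y = sp.symbols('x y')
a, b, c = sp.Rational(2, 3), sp.Rational(8, 3), sp.Rational(4, 3)
mux, muy = 2, 1

# Quadratic form built from the recovered parameters ...
form = a*(x - mux)**2 + b*(y - muy)**2 + c*(x - mux)*(y - muy)
# ... compared with the exponent of the given pdf (divided by 2).
given = (sp.Rational(4, 3)*x**2 + sp.Rational(16, 3)*y**2
         + sp.Rational(8, 3)*x*y - 8*x - 16*y + 16) / 2
print(sp.simplify(form - given))  # 0, so the parameters match
```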
3. Let X and Y be two random variables. Let Z = X + Y and let W = X − Y. Find the best linear estimate of W given Z as a function of E(X), E(Y), σ_X, σ_Y, ρ_{XY}, and Z.

By the theorem on MMSE linear estimates, we have

Ŵ = (Cov(W, Z)/σ_Z²)(Z − E(Z)) + E(W).

Here we have

E(W) = E(X) − E(Y),
E(Z) = E(X) + E(Y),
σ_Z² = σ_X² + σ_Y² + 2ρ_{XY} σ_X σ_Y,
Cov(W, Z) = E(WZ) − E(W) E(Z)
= E((X − Y)(X + Y)) − (E(X) − E(Y))(E(X) + E(Y))
= E(X²) − E(Y²) − (E(X))² + (E(Y))²
= σ_X² − σ_Y².

So the best linear estimate of W given Z is

Ŵ = ( (σ_X² − σ_Y²)/(σ_X² + σ_Y² + 2ρ_{XY} σ_X σ_Y) ) (Z − E(X) − E(Y)) + E(X) − E(Y).

4. Let X and Y be two random variables with joint pdf

f(x, y) = x + y for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.

(a) Find the MMSE estimator of X given Y.

(b) Find the corresponding MSE.

(c) Find the pdf of Z = E(X | Y).

(d) Find the linear MMSE estimator of X given Y.

(e) Find the corresponding MSE.

(a) We first calculate the marginal pdf of Y by a direct integration. For y < 0 or y > 1, f_Y(y) = 0. For 0 ≤ y ≤ 1,

f_Y(y) = ∫_0^1 (x + y) dx = y + 1/2.

Note that the limits of the integration are derived from the definition of the joint pdf. Thus,

f_Y(y) = y + 1/2 for 0 ≤ y ≤ 1, and 0 otherwise.

Now we can calculate the conditional pdf

f_{X|Y}(x | y) = f_{XY}(x, y)/f_Y(y) = (x + y)/(y + 1/2)
for 0 ≤ x, y ≤ 1. Therefore, for 0 ≤ y ≤ 1,

E[X | Y = y] = ∫_0^1 x f_{X|Y}(x | y) dx = ∫_0^1 (x² + yx)/(y + 1/2) dx = (1/3 + y/2)/(1/2 + y),

hence

E[X | Y] = (1/3 + Y/2)/(1/2 + Y).

(b) The MSE is given by

E(Var(X | Y)) = E(X²) − E((E(X | Y))²)
= ∫_0^1 x²(x + 1/2) dx − ∫_0^1 ( (1/3 + y/2)/(1/2 + y) )² (y + 1/2) dy
= 5/12 − ∫_0^1 (1/3 + y/2)²/(1/2 + y) dy
= 5/12 − (1/3 + ln(3)/144)
= 1/12 − ln(3)/144
≈ 0.0757.

(c) Since

E[X | Y = y] = (1/3 + y/2)/(1/2 + y) = 1/2 + 1/(6(1 + 2y)),

we have 5/9 ≤ E[X | Y = y] ≤ 2/3 for 0 ≤ y ≤ 1. From part (a), we know Z = E[X | Y] = (1/3 + Y/2)/(1/2 + Y), and thus 5/9 ≤ Z ≤ 2/3. We first find the cdf F_Z(z) of Z and then differentiate it to get the pdf. Consider

F_Z(z) = P{Z ≤ z} = P{1/2 + 1/(6(1 + 2Y)) ≤ z} = P{Y ≥ (2 − 3z)/(3(2z − 1))}.

For 0 ≤ (2 − 3z)/(3(2z − 1)) ≤ 1, i.e., 5/9 ≤ z ≤ 2/3, we have

F_Z(z) = ∫_{(2−3z)/(3(2z−1))}^{1} f_Y(y) dy = ∫_{(2−3z)/(3(2z−1))}^{1} (y + 1/2) dy.

By differentiating with respect to z, we get

f_Z(z) = 1/(18(2z − 1)³)

for 5/9 ≤ z ≤ 2/3. Otherwise f_Z(z) = 0.

(d) The best linear MMSE estimate is

X̂ = (Cov(X, Y)/σ_Y²)(Y − E(Y)) + E(X).
Here we have

E(X) = ∫_0^1 ∫_0^1 x(x + y) dy dx = ∫_0^1 x(x + 1/2) dx = 7/12,
E(Y) = ∫_0^1 y(y + 1/2) dy = 7/12,
E(XY) = ∫_0^1 ∫_0^1 xy(x + y) dy dx = 1/3,
Cov(X, Y) = E(XY) − E(X) E(Y) = 1/3 − (7/12)² = −1/144,
σ_Y² = E(Y²) − (E(Y))² = ∫_0^1 y²(y + 1/2) dy − (7/12)² = 5/12 − 49/144 = 11/144.

So the best linear MMSE estimate is

X̂ = −(1/11)(Y − 7/12) + 7/12 = (7 − Y)/11.

(e) The MSE of the linear estimate is

MSE = σ_X² − Cov²(X, Y)/σ_Y².

Here by symmetry, σ_X² = σ_Y² = 11/144. Thus,

MSE = 11/144 − (1/144)²/(11/144) = 5/66 ≈ 0.0758.

We can check that LMSE > MMSE (0.0758 > 0.0757).

5. Additive-noise channel with signal dependent noise. Consider the channel with correlated signal X and noise Z and observation Y = 2X + Z, where

µ_X = 1, µ_Z = 0, σ_X² = 4, σ_Z² = 9, ρ_{X,Z} = −3/8.

Find the best MSE linear estimate of X given Y.

The best linear MMSE estimate is given by the formula

X̂ = (Cov(X, Y)/σ_Y²)(Y − E(Y)) + E(X).

Here we have

E(Y) = 2E(X) + E(Z) = 2,
σ_Y² = 4σ_X² + σ_Z² + 4ρ_{XZ} σ_X σ_Z = 16 + 9 − 4 · (3/8) · 2 · 3 = 16,
Cov(X, Y) = E(XY) − E(X) E(Y) = E(2X² + XZ) − E(X) E(Y)
= 2(σ_X² + µ_X²) + (ρ_{XZ} σ_X σ_Z + µ_X µ_Z) − 2 = 10 − 9/4 − 2 = 23/4.

So the best linear MMSE estimate is

X̂ = (23/64)(Y − 2) + 1 = (23/64) Y + 9/32.
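The two error values in Exercise 4 (MMSE ≈ 0.0757 in part (b), LMSE = 5/66 ≈ 0.0758 in part (e)) can be verified by numerical integration against the joint pdf. A minimal Python sketch, assuming numpy and scipy are available (an added check, not part of the original solution):

```python
import numpy as np
from scipy import integrate

f = lambda x, y: x + y                      # joint pdf on the unit square
g = lambda y: (1/3 + y/2) / (1/2 + y)       # MMSE estimator E(X | Y = y)

# dblquad integrates func(inner, outer); here x is inner, y is outer.
mmse, _ = integrate.dblquad(lambda x, y: (x - g(y))**2 * f(x, y), 0, 1, 0, 1)
print(mmse, 1/12 - np.log(3)/144)  # both ~0.07570

lmse, _ = integrate.dblquad(
    lambda x, y: (x - (7 - y)/11)**2 * f(x, y), 0, 1, 0, 1)
print(lmse, 5/66)                  # both ~0.07576, slightly above the MMSE
```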