Solutions to Homework Set #5 (Prepared by Lele Wang)

1. Neural net. Let Y = X + Z, where the signal X ~ Unif[-1, 1] and the noise Z ~ N(0, 1) are independent.

(a) Find the function g(y) that minimizes
MSE = E[(sgn(X) - g(Y))²],
where sgn(x) = -1 if x ≤ 0, and +1 if x > 0.
(b) Plot g(y) vs. y.

The minimum MSE is achieved when g(y) = E(sgn(X) | Y = y). We have
g(y) = E(sgn(X) | Y = y) = ∫ sgn(x) f_{X|Y}(x|y) dx.
To find the conditional pdf of X given Y, we use
f_{X|Y}(x|y) = f_{Y|X}(y|x) f_X(x) / f_Y(y),
where f_X(x) = 1/2 for -1 ≤ x ≤ 1, and 0 otherwise. Since X and Z are independent,
f_{Y|X}(y|x) = f_Z(y - x), i.e., Y | {X = x} ~ N(x, 1).
To find f_Y(y) we integrate f_{Y|X}(y|x) f_X(x) over x:
f_Y(y) = ∫ f_{Y|X}(y|x) f_X(x) dx = (1/2) ∫_{-1}^{1} (1/√(2π)) e^{-(x-y)²/2} dx = (1/2)(Q(y-1) - Q(y+1)).
Combining the above results, we get
g(y) = (1/f_Y(y)) ∫ sgn(x) f_{Y|X}(y|x) f_X(x) dx
= [ (1/2) ∫_0^1 (1/√(2π)) e^{-(y-x)²/2} dx - (1/2) ∫_{-1}^0 (1/√(2π)) e^{-(y-x)²/2} dx ] / [ (1/2)(Q(y-1) - Q(y+1)) ]
= [ Q(y-1) - 2Q(y) + Q(y+1) ] / [ Q(y-1) - Q(y+1) ].
The plot is shown below. Note the sigmoidal shape corresponding to the common neural network activation function.

[Figure: plot of g(y) versus y for -8 ≤ y ≤ 8; the curve increases smoothly from -1 to +1 with a sigmoidal shape, crossing 0 at y = 0.]
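
As a quick numerical check, the closed-form g(y) can be compared with a brute-force Monte Carlo estimate of E(sgn(X) | Y ≈ y). This is a minimal sketch, assuming NumPy and SciPy are available (norm.sf is the Q function); the seed, sample size, and test points are arbitrary.

    import numpy as np
    from scipy.stats import norm

    def g_closed_form(y):
        # g(y) = (Q(y-1) - 2 Q(y) + Q(y+1)) / (Q(y-1) - Q(y+1)), with Q = norm.sf
        Q = norm.sf
        return (Q(y - 1) - 2 * Q(y) + Q(y + 1)) / (Q(y - 1) - Q(y + 1))

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 1_000_000)       # X ~ Unif[-1, 1]
    y = x + rng.normal(size=x.size)         # Y = X + Z with Z ~ N(0, 1)

    for y0 in (-2.0, 0.5, 2.0):
        near = np.abs(y - y0) < 0.05        # crude conditioning on Y being near y0
        print(y0, np.sign(x[near]).mean(), g_closed_form(y0))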

2. Additive shot noise channel. Consider an additive noise channel Y = X + Z, where the signal X ~ N(0, 1) and the noise Z | {X = x} ~ N(0, x²), i.e., the noise power increases linearly with the signal squared.

(a) Find E(Z²).
(b) Find the best linear MSE estimate of X given Y.

(a) Since Z | {X = x} ~ N(0, x²),
E(Z | X) = 0 and Var(Z | X = x) = E(Z² | X = x) - (E(Z | X = x))² = E(Z² | X = x) = x².
Therefore,
E(Z²) = E(E(Z² | X)) = E(X²) = 1.

(b) From the best linear estimate formula,
X̂ = (Cov(X, Y)/σ_Y²)(Y - E(Y)) + E(X).
Here we have E(X) = 0,
E(Y) = E(X + Z) = E(X) + E(Z) = E(X) + E(E(Z | X)) = 0 + 0 = 0,
E(XZ) = E(E(XZ | X)) = E(X E(Z | X)) = E(X · 0) = 0,
σ_Y² = E(Y²) - (E(Y))² = E((X + Z)²) = E(X²) + E(Z²) + 2E(XZ) = 1 + 1 + 0 = 2,
Cov(X, Y) = E((X - E(X))(Y - E(Y))) = E(XY) = E(X(X + Z)) = E(X²) + E(XZ) = 1 + 0 = 1.
Using all of the above, we get
X̂ = Y/2.
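
The moments used above are easy to confirm by simulation. A minimal sketch, assuming NumPy (seed and sample size arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=1_000_000)            # X ~ N(0, 1)
    z = rng.normal(scale=np.abs(x))           # Z | {X = x} ~ N(0, x^2)
    y = x + z

    print(np.mean(z**2))                      # ~ 1, i.e., E(Z^2) = E(X^2)
    print(np.cov(x, y)[0, 1] / np.var(y))     # ~ 1/2, the coefficient in X_hat = Y/2
    print(np.mean((x - y / 2)**2))            # MSE of the linear estimate, ~ 1/2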

3. Estimation vs. detection. Let the signal
X = +1 with probability 1/2, and -1 with probability 1/2,
and the noise Z ~ Unif[-2, 2] be independent random variables. Their sum Y = X + Z is observed.

(a) Find the best MSE estimate of X given Y and its MSE.
(b) Now suppose we use a decoder to decide whether X = +1 or X = -1 so that the probability of error is minimized. Find the optimal decoder and its probability of error. Compare the optimal decoder's MSE to the minimum MSE.

(a) We can easily find the piecewise constant density of Y:
f_Y(y) = 1/4 for |y| ≤ 1, 1/8 for 1 < |y| ≤ 3, and 0 otherwise.
The conditional probabilities of X given Y are
P{X = +1 | Y = y} = 0 for -3 ≤ y < -1, 1/2 for |y| ≤ 1, 1 for 1 < y ≤ 3,
P{X = -1 | Y = y} = 1 for -3 ≤ y < -1, 1/2 for |y| ≤ 1, 0 for 1 < y ≤ 3.
Thus the best MSE estimate is
g(Y) = E(X | Y) = -1 for -3 ≤ Y < -1, 0 for |Y| ≤ 1, +1 for 1 < Y ≤ 3.
The minimum mean square error is
E_Y(Var(X | Y)) = E_Y(E(X² | Y) - (E(X | Y))²) = E(1 - g(Y)²) = 1 - E(g(Y)²)
= 1 - ∫ g(y)² f_Y(y) dy
= 1 - ( ∫_{-3}^{-1} (1/8) dy + ∫_{-1}^{1} 0 · (1/4) dy + ∫_{1}^{3} (1/8) dy )
= 1 - 1/4 - 0 - 1/4 = 1/2.
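
A short simulation reproduces this minimum MSE of 1/2. A minimal sketch, assuming NumPy (seed and sample size arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1_000_000
    x = rng.choice([-1.0, 1.0], size=n)            # X = +1 or -1 with probability 1/2 each
    y = x + rng.uniform(-2, 2, size=n)             # Z ~ Unif[-2, 2]

    g = np.where(y < -1, -1.0, np.where(y > 1, 1.0, 0.0))   # E(X | Y) from part (a)
    print(np.mean((x - g)**2))                     # ~ 0.5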

(b) The optimal decoder is given by the MAP rule. The a posteriori pmf of X was found in part (a). Thus the MAP rule reduces to
D(y) = -1 for -3 ≤ y < -1, ±1 (either value) for -1 ≤ y ≤ +1, and +1 for +1 < y ≤ +3.
Since either value can be chosen for D(y) in the center range of Y, a symmetrical decoder is sufficient, i.e.,
D(y) = -1 if y < 0, and +1 if y ≥ 0.
The probability of decoding error is
P{D(Y) ≠ X} = P{X = +1, Y < 0} + P{X = -1, Y ≥ 0}
= P{X = +1 | Y < 0} P{Y < 0} + P{X = -1 | Y ≥ 0} P{Y ≥ 0}
= (1/4)(1/2) + (1/4)(1/2) = 1/4.
If we use the decoder (detector) as an estimator, its MSE is
E((D(Y) - X)²) = 0² · (3/4) + 2² · (1/4) = 1.
This MSE is twice that of the minimum mean square error estimator.

4. Linear estimator. Consider a channel with the observation Y = XZ, where the signal X and the noise Z are uncorrelated Gaussian random variables. Let E[X] = 1, E[Z] = 2, σ_X² = 5, and σ_Z² = 8.

(a) Find the best linear MSE estimate of X given Y.
(b) Suppose your friend from Caltech tells you that he was able to derive an estimator with a lower MSE. Your friend from UCLA disagrees, saying that this is not possible because the signal and the noise are Gaussian, and hence the best linear MSE estimator will also be the best MSE estimator. Could your UCLA friend be wrong?

(a) We know that the best linear estimate is given by the formula
X̂ = (Cov(X, Y)/σ_Y²)(Y - E(Y)) + E(X).
Note that X and Z being Gaussian and uncorrelated implies they are independent. Therefore,
E(Y) = E(XZ) = E(X)E(Z) = 2,
E(XY) = E(X²Z) = E(X²)E(Z) = (σ_X² + E²(X))E(Z) = 12,
E(Y²) = E(X²Z²) = E(X²)E(Z²) = (σ_X² + E²(X))(σ_Z² + E²(Z)) = 72,
σ_Y² = E(Y²) - E²(Y) = 68,
Cov(X, Y)/σ_Y² = (E(XY) - E(X)E(Y))/σ_Y² = 10/68 = 5/34.
Using all of the above, we get
X̂ = (5/34)(Y - 2) + 1 = (5/34)Y + 12/17.
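
The coefficient 5/34 and the intercept 12/17 can be checked by simulating the multiplicative channel. A minimal sketch, assuming NumPy; X and Z are sampled independently, consistent with the solution's assumption:

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(1, np.sqrt(5), size=1_000_000)   # E[X] = 1, Var(X) = 5
    z = rng.normal(2, np.sqrt(8), size=x.size)      # E[Z] = 2, Var(Z) = 8
    y = x * z

    slope = np.cov(x, y)[0, 1] / np.var(y)
    intercept = x.mean() - slope * y.mean()
    print(slope, 5 / 34)                            # both ~ 0.147
    print(intercept, 12 / 17)                       # both ~ 0.706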

(b) The fact that the best linear estimate equals the best MMSE estimate when the input and the noise are independent Gaussians is only known to be true for additive channels. For multiplicative channels this need not be the case in general. In the following, we prove that Y is not Gaussian by contradiction. Suppose Y is Gaussian; then Y ~ N(2, 68) and
f_Y(y) = (1/√(2π · 68)) e^{-(y-2)²/(2 · 68)},
which is finite for every y. On the other hand, as the product of two independent random variables, Y = XZ has pdf
f_Y(y) = ∫ (1/|x|) f_X(x) f_Z(y/x) dx.
But these two expressions are not consistent. For example, at y = 0,
f_Y(0) = ∫ (1/|x|) f_X(x) f_Z(0) dx = f_Z(0) ∫ (1/|x|) f_X(x) dx, with f_Z(0) = (1/√(2π · 8)) e^{-(0-2)²/(2 · 8)} > 0,
and the remaining integral diverges, since the integrand behaves like f_X(0)/|x| near x = 0. Thus the product formula gives f_Y(0) = ∞, while the Gaussian assumption gives the finite value (1/√(2π · 68)) e^{-(0-2)²/(2 · 68)}, which is a contradiction. Hence X and Y are not jointly Gaussian, and we might be able to derive an estimator with a lower MSE, so the UCLA friend could indeed be wrong.

5. Additive-noise channel with path gain. Consider the additive noise channel shown in the figure below, where X and Z are zero mean and uncorrelated, and a and b are constants.

[Block diagram: the signal X is scaled by a, the noise Z is added, and the sum is scaled by b, so that Y = b(aX + Z).]

Find the MMSE linear estimate of X given Y and its MSE in terms only of σ_X², σ_Z², a, and b.

By the theorem on the MMSE linear estimate, we have
X̂ = (Cov(X, Y)/σ_Y²)(Y - E(Y)) + E(X).
Since X and Z are zero mean and uncorrelated, we have
E(X) = 0, E(Y) = b(aE(X) + E(Z)) = 0,
Cov(X, Y) = E(XY) - E(X)E(Y) = E(Xb(aX + Z)) = abσ_X²,
σ_Y² = E(Y²) - (E(Y))² = E(b²(aX + Z)²) = b²a²σ_X² + b²σ_Z².
Hence, the best linear MSE estimate of X given Y is given by
X̂ = (abσ_X²)/(b²a²σ_X² + b²σ_Z²) Y = (aσ_X²)/(b(a²σ_X² + σ_Z²)) Y,
and its MSE is
MSE = σ_X² - Cov²(X, Y)/σ_Y² = σ_X² - (a²σ_X⁴)/(a²σ_X² + σ_Z²) = (σ_X²σ_Z²)/(a²σ_X² + σ_Z²).
Note that the MSE does not depend on the path gain b.
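
A numerical spot check of the path-gain formulas; this is a sketch assuming NumPy, where the values of a, b, σ_X, σ_Z are arbitrary test choices (not from the problem) and Gaussian samples are used only for convenience, since only second moments matter:

    import numpy as np

    rng = np.random.default_rng(4)
    a, b = 3.0, 0.5                                   # arbitrary gains
    sx, sz = 2.0, 1.5                                 # arbitrary standard deviations
    x = rng.normal(0, sx, size=1_000_000)
    z = rng.normal(0, sz, size=x.size)
    y = b * (a * x + z)

    coef = np.cov(x, y)[0, 1] / np.var(y)
    print(coef, a * sx**2 / (b * (a**2 * sx**2 + sz**2)))           # coefficient check
    print(np.mean((x - coef * y)**2),
          sx**2 * sz**2 / (a**2 * sx**2 + sz**2))                   # MSE check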

6. Worst noise distribution. Consider an additive noise channel Y = X + Z, where the signal X ~ N(0, P) and the noise Z has zero mean and variance N. Assume X and Z are independent. Find a distribution of Z that maximizes the minimum MSE of estimating X given Y, i.e., the distribution of the worst noise Z that has the given mean and variance. You need to justify your answer.

The worst noise has the Gaussian distribution, i.e., Z ~ N(0, N). To prove this statement, we show that the MSE corresponding to any other distribution of Z is less than or equal to the MSE for Gaussian noise, i.e., MSE_NonG ≤ MSE_G. We know that for any noise, MMSE estimation is no worse than linear MMSE estimation, so MSE_NonG ≤ LMSE. The linear MMSE estimate of X given Y is given by
X̂ = (Cov(X, Y)/σ_Y²)(Y - E(Y)) + E(X) = (P/(P + N)) Y,
with
LMSE = σ_X² - Cov²(X, Y)/σ_Y² = P - P²/(P + N) = NP/(P + N).
Note that the LMSE depends only on the second moments of X and Z. So the MSE corresponding to any distribution of Z is always upper bounded by the same LMSE, i.e., MSE_NonG ≤ NP/(P + N). When Z is Gaussian and independent of X, (X, Y) are jointly Gaussian, and then MSE_G is equal to the LMSE, i.e., MSE_G = NP/(P + N). Hence,
MSE_NonG ≤ NP/(P + N) = MSE_G,
which shows that the Gaussian noise is the worst.

7. Image processing. A pixel signal X ~ Unif[-k, k] is digitized to obtain
X̃ = i + 1/2, if i < X ≤ i + 1, for i = -k, -k + 1, ..., k - 2, k - 1.
To improve the visual appearance, the digitized value X̃ is dithered by adding an independent noise Z with mean E(Z) = 0 and variance Var(Z) = N to obtain Y = X̃ + Z.

(a) Find the correlation of X and Y.
(b) Find the best linear MSE estimate of X given Y. Your answer should be in terms only of k, N, and Y.

(a) From the definition of X̃, we know P{X̃ = i + 1/2} = P{i < X ≤ i + 1} = 1/(2k). By the law of total expectation, we have
Cov(X, Y) = E(XY) - E(X)E(Y) = E(X(X̃ + Z)) = E(XX̃)
= Σ_{i=-k}^{k-1} E[XX̃ | i < X ≤ i + 1] P(i < X ≤ i + 1)
= Σ_{i=-k}^{k-1} (i + 1/2)² (1/(2k)) = (1/(4k)) Σ_{i=0}^{k-1} (2i + 1)²
= (4k² - 1)/12,
where the last sum is evaluated using Σ_{i=1}^{k} i² = k(k + 1)(2k + 1)/6.

(b) We have
E(X) = 0, E(Y) = E(X̃) + E(Z) = 0,
σ_Y² = Var(X̃) + Var(Z) = Σ_{i=-k}^{k-1} (i + 1/2)² (1/(2k)) + N = (4k² - 1)/12 + N.
Then the best linear MMSE estimate of X given Y is given by
X̂ = (Cov(X, Y)/σ_Y²)(Y - E(Y)) + E(X) = [(4k² - 1)/12] / [(4k² - 1)/12 + N] · Y = (4k² - 1)/(4k² - 1 + 12N) · Y.

8. Covariance matrices. Which of the following matrices can be a covariance matrix? Justify your answer either by constructing a random vector X, as a function of the i.i.d. zero mean unit variance random variables Z₁, Z₂, and Z₃, with the given covariance matrix, or by establishing a contradiction.

[The four candidate matrices (a)-(d) are not reproduced here.]

(a) This cannot be a covariance matrix because it is not symmetric.
(b) This is a covariance matrix, for example for X₁ = Z₁ + Z₂ and X₂ = Z₁ + Z₃, which has diagonal entries 2 and off-diagonal entries 1.
(c) This is a covariance matrix, for example for X₁ = Z₁, X₂ = Z₁ + Z₂, and X₃ = Z₁ + Z₂ + Z₃.
(d) This cannot be a covariance matrix. Suppose it were; then σ₁₃² = 9 > σ₁₁σ₃₃ = 6, which contradicts the Schwarz inequality. You can also verify this by showing that the matrix is not positive semidefinite: its determinant is negative, and one of its eigenvalues is negative (λ ≈ -0.856). Alternatively, one can directly exhibit a vector a with aᵀΣa < 0, violating the definition of positive semidefiniteness.
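
A covariance matrix is exactly a symmetric positive semidefinite matrix, so candidates like those in Problem 8 can be screened numerically. A minimal sketch assuming NumPy; the matrices below are the ones recovered for parts (b) and (c), and the empirical check mirrors the construction used in the solution:

    import numpy as np

    def is_valid_covariance(S, tol=1e-9):
        # symmetric and all eigenvalues nonnegative (up to numerical tolerance)
        S = np.asarray(S, dtype=float)
        return np.allclose(S, S.T) and np.linalg.eigvalsh(S).min() >= -tol

    print(is_valid_covariance([[2, 1], [1, 2]]))                    # part (b): True
    print(is_valid_covariance([[1, 1, 1], [1, 2, 2], [1, 2, 3]]))   # part (c): True

    # empirical check of the construction in part (c)
    rng = np.random.default_rng(5)
    z = rng.normal(size=(3, 1_000_000))
    x = np.vstack([z[0], z[0] + z[1], z[0] + z[1] + z[2]])
    print(np.cov(x).round(2))                                       # ~ [[1,1,1],[1,2,2],[1,2,3]]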

9. Iocane or Sennari: Return of the chemistry professor. An absent-minded chemistry professor forgets to label two identically looking bottles. One contains a chemical named Iocane and the other contains a chemical named Sennari. It is well known that the radioactivity level of Iocane has the Unif[0, 1] distribution, while the radioactivity level of Sennari has the Exp(1) distribution. In the previous homework, we found the optimal rule to decide which bottle is which by measuring the radioactivity level of one of the bottles. The chemistry professor got smarter this time; she now measures both bottles.

(a) Let X be the radioactivity level measured from one bottle, and let Y be the radioactivity level measured from the other bottle. What is the optimal decision rule (based on the measurement (X, Y)) that maximizes the chance of correctly identifying the contents? Assume that the radioactivity level of one chemical is independent of the level of the other bottle (conditioned on which bottle contains which).
(b) What is the associated probability of error?

Let Θ = 1 denote the case in which the first bottle (measurement X) is Iocane and the second bottle (measurement Y) is Sennari. Let Θ = 2 denote the other case.

(a) Since the two cases are equally likely, the optimal MAP rule is equivalent to the ML rule
D(x, y) = 1 if f_{X,Y|Θ}(x, y | 1) > f_{X,Y|Θ}(x, y | 2), and 2 otherwise.
Since f_{X,Y|Θ}(x, y | 1) = 1{0 ≤ x ≤ 1} e^{-y} and f_{X,Y|Θ}(x, y | 2) = e^{-x} 1{0 ≤ y ≤ 1},
D(x, y) = 1 if (x ≤ 1 and y > 1) or (0 ≤ y < x ≤ 1), and 2 otherwise.

(b) The probability of error is given by
P(Θ ≠ D(X, Y)) = (1/2) P(X ≤ Y ≤ 1 | Θ = 1) + (1/2) P(Y ≤ X ≤ 1 | Θ = 2)
= P(X ≤ Y ≤ 1 | Θ = 1)
= ∫_0^1 ∫_x^1 e^{-y} dy dx = 1 - 2e^{-1} ≈ 0.26,
which is less than the error probability (1 - e^{-1})/2 ≈ 0.32 from the single measurement.
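
A Monte Carlo sketch of this two-measurement rule, assuming NumPy (seed and sample size arbitrary), reproduces the error probability 1 - 2/e ≈ 0.264:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 1_000_000
    theta = rng.integers(1, 3, size=n)          # 1: (X Iocane, Y Sennari), 2: the reverse
    iocane = rng.uniform(0, 1, size=n)          # Unif[0, 1] reading
    sennari = rng.exponential(1.0, size=n)      # Exp(1) reading
    x = np.where(theta == 1, iocane, sennari)
    y = np.where(theta == 1, sennari, iocane)

    decide_1 = ((x <= 1) & (y > 1)) | ((y < x) & (x <= 1))
    d = np.where(decide_1, 1, 2)
    print(np.mean(d != theta), 1 - 2 / np.e)    # both ~ 0.264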

Solutions to Additional Exercises

1. Orthogonality. Let X̂ be the minimum MSE estimate of X given Y.

(a) Show that for any function g(y), E((X - X̂)g(Y)) = 0, i.e., the error (X - X̂) and g(Y) are orthogonal.
(b) Show that Var(X) = E(Var(X | Y)) + Var(X̂). Provide a geometric interpretation for this result.

(a) We use iterated expectation and the fact that E(g(Y) | Y) = g(Y):
E((X - X̂)g(Y)) = E[E((X - X̂)g(Y) | Y)]
= E[E((X - E(X | Y))g(Y) | Y)]
= E(g(Y) E(X - E(X | Y) | Y))
= E(g(Y)(E(X | Y) - E(X | Y))) = 0.

(b) First we write
E(Var(X | Y)) = E(X²) - E((E(X | Y))²),
and
Var(E(X | Y)) = E((E(X | Y))²) - (E(E(X | Y)))² = E((E(X | Y))²) - (E(X))².
Adding the two terms completes the proof.
Interpretation: If we view X, E(X | Y), and X - E(X | Y) as vectors with norms √Var(X), √Var(E(X | Y)), and √E(Var(X | Y)), respectively, then this result provides a Pythagorean theorem, where the signal, the error, and the estimate are the sides of a right triangle (estimate and error being orthogonal).

2. Jointly Gaussian random variables. Let X and Y be jointly Gaussian random variables with pdf
f_{X,Y}(x, y) = (1/(π√(3/4))) e^{-(1/2)(4x²/3 + 16y²/3 + 8xy/3 - 8x - 16y + 16)}.

(a) Find E(X), E(Y), Var(X), Var(Y), and Cov(X, Y).
(b) Find the minimum MSE estimate of X given Y and its MSE.

(a) We can write the joint pdf of two jointly Gaussian random variables X and Y as
f_{X,Y}(x, y) = (1/(2πσ_Xσ_Y√(1 - ρ²_{X,Y}))) exp(-[a(x - µ_X)² + b(y - µ_Y)² + c(x - µ_X)(y - µ_Y)]),
where
a = 1/(2(1 - ρ²_{X,Y})σ_X²), b = 1/(2(1 - ρ²_{X,Y})σ_Y²), c = -ρ_{X,Y}/((1 - ρ²_{X,Y})σ_Xσ_Y).
By inspection of the given f_{X,Y}(x, y) we find that
a = 2/3, b = 8/3, c = 4/3,
and we get three equations in three unknowns:
ρ_{X,Y} = -c/(2√(ab)) = -1/2,
σ_X² = 1/(2(1 - ρ²_{X,Y})a) = 1, σ_Y² = 1/(2(1 - ρ²_{X,Y})b) = 1/4.
To find µ_X and µ_Y, we solve the equations
2aµ_X + cµ_Y = 4, 2bµ_Y + cµ_X = 8,
and find that µ_X = 2 and µ_Y = 1. Finally,
Cov(X, Y) = ρ_{X,Y}σ_Xσ_Y = -1/4.

(b) X and Y are jointly Gaussian random variables. Thus, the minimum MSE estimate of X given Y is linear:
E(X | Y) = (Cov(X, Y)/σ_Y²)(Y - µ_Y) + µ_X = -(Y - 1) + 2 = 3 - Y,
MMSE = E(Var(X | Y)) = (1 - ρ²_{X,Y})σ_X² = 3/4.
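
The recovered parameters (µ_X = 2, µ_Y = 1, σ_X² = 1, σ_Y² = 1/4, Cov(X, Y) = -1/4) can be cross-checked by comparing the quoted pdf against SciPy's bivariate normal density at a few points. A minimal sketch, assuming NumPy and SciPy:

    import numpy as np
    from scipy.stats import multivariate_normal

    def f_given(x, y):
        # the pdf exactly as stated in the problem
        expo = 4*x**2/3 + 16*y**2/3 + 8*x*y/3 - 8*x - 16*y + 16
        return np.exp(-expo / 2) / (np.pi * np.sqrt(3 / 4))

    mvn = multivariate_normal(mean=[2, 1], cov=[[1, -0.25], [-0.25, 0.25]])
    for pt in [(2.0, 1.0), (1.3, 0.4), (3.1, 1.8)]:
        print(f_given(*pt), mvn.pdf(pt))    # the two columns should agree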

3. Let X and Y be two random variables. Let Z = X + Y and let W = X - Y. Find the best linear estimate of W given Z as a function of E(X), E(Y), σ_X², σ_Y², ρ_{XY}, and Z.

By the theorem on the MMSE linear estimate, we have
Ŵ = (Cov(W, Z)/σ_Z²)(Z - E(Z)) + E(W).
Here we have
E(W) = E(X) - E(Y), E(Z) = E(X) + E(Y),
σ_Z² = σ_X² + σ_Y² + 2ρ_{XY}σ_Xσ_Y,
Cov(W, Z) = E(WZ) - E(W)E(Z) = E((X - Y)(X + Y)) - (E(X) - E(Y))(E(X) + E(Y))
= E(X²) - E(Y²) - (E(X))² + (E(Y))² = σ_X² - σ_Y².
So the best linear estimate of W given Z is
Ŵ = (σ_X² - σ_Y²)/(σ_X² + σ_Y² + 2ρ_{XY}σ_Xσ_Y) (Z - E(X) - E(Y)) + E(X) - E(Y).

4. Let X and Y be two random variables with joint pdf
f(x, y) = x + y for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, and 0 otherwise.

(a) Find the MMSE estimator of X given Y.
(b) Find the corresponding MSE.
(c) Find the pdf of Z = E(X | Y).
(d) Find the linear MMSE estimator of X given Y.
(e) Find the corresponding MSE.

(a) We first calculate the marginal pdf of Y by a direct integration. For y < 0 or y > 1, f_Y(y) = 0. For 0 ≤ y ≤ 1,
f_Y(y) = ∫_0^1 (x + y) dx = y + 1/2,
where the limits of the integration are derived from the definition of the joint pdf. Thus,
f_Y(y) = y + 1/2 for 0 ≤ y ≤ 1, and 0 otherwise.
Now we can calculate the conditional pdf
f_{X|Y}(x|y) = f_{XY}(x, y)/f_Y(y) = (x + y)/(1/2 + y)
for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Therefore, for 0 ≤ y ≤ 1,
E[X | Y = y] = ∫_0^1 x f_{X|Y}(x|y) dx = ∫_0^1 (x² + yx)/(1/2 + y) dx = (1/3 + y/2)/(1/2 + y),
hence
E[X | Y] = (1/3 + Y/2)/(1/2 + Y).

(b) The MSE is given by
E(Var(X | Y)) = E(X²) - E((E(X | Y))²)
= ∫_0^1 x²(x + 1/2) dx - ∫_0^1 ((1/3 + y/2)/(1/2 + y))² (y + 1/2) dy
= 5/12 - (1/3 + ln(3)/144)
= 1/12 - ln(3)/144 ≈ 0.0757.

(c) Since
E[X | Y = y] = (1/3 + y/2)/(1/2 + y) = 1/2 + 1/(6(1 + 2y)),
we have 5/9 ≤ E[X | Y = y] ≤ 2/3 for 0 ≤ y ≤ 1. From part (a), we know Z = E[X | Y] = (1/3 + Y/2)/(1/2 + Y), so 5/9 ≤ Z ≤ 2/3. We first find the cdf F_Z(z) of Z and then differentiate it to get the pdf. Consider
F_Z(z) = P{Z ≤ z} = P{(1/3 + Y/2)/(1/2 + Y) ≤ z} = P{1/3 + Y/2 ≤ z/2 + zY} = P{Y ≥ (2 - 3z)/(6z - 3)}.
For 0 ≤ (2 - 3z)/(6z - 3) ≤ 1, i.e., 5/9 ≤ z ≤ 2/3, we have
F_Z(z) = ∫_{(2-3z)/(6z-3)}^{1} f_Y(y) dy = ∫_{(2-3z)/(6z-3)}^{1} (y + 1/2) dy.
By differentiating with respect to z, we get
f_Z(z) = ((2 - 3z)/(6z - 3) + 1/2) · 3/(6z - 3)² = 1/(18(2z - 1)³)
for 5/9 ≤ z ≤ 2/3. Otherwise f_Z(z) = 0.

(d) The best linear MMSE estimate is
X̂ = (Cov(X, Y)/σ_Y²)(Y - E(Y)) + E(X).
Here we have
E(X) = ∫_0^1 ∫_0^1 x(x + y) dy dx = ∫_0^1 x(x + 1/2) dx = 7/12,
E(Y) = ∫_0^1 y(y + 1/2) dy = 7/12,
Cov(X, Y) = E(XY) - E(X)E(Y) = ∫_0^1 ∫_0^1 xy(x + y) dy dx - (7/12)² = 1/3 - 49/144 = -1/144,
σ_Y² = E(Y²) - (E(Y))² = ∫_0^1 y²(y + 1/2) dy - (7/12)² = 5/12 - 49/144 = 11/144.
So the best linear MMSE estimate is
X̂ = -(1/11)(Y - 7/12) + 7/12 = (7 - Y)/11.

(e) The MSE of the linear estimate is
MSE = σ_X² - Cov²(X, Y)/σ_Y².
Here, by symmetry, σ_X² = σ_Y² = 11/144. Thus,
MSE = 11/144 - (1/144)²/(11/144) = 5/66 ≈ 0.0758.
We can check that LMSE > MSE, i.e., the linear estimator is slightly worse than the MMSE estimator of part (b). (A numerical cross-check of these two values appears after Problem 5 below.)

5. Additive-noise channel with signal dependent noise. Consider the channel with correlated signal X and noise Z and observation Y = 2X + Z, where
µ_X = 1, µ_Z = 0, σ_X² = 4, σ_Z² = 9, ρ_{X,Z} = -3/8.
Find the best MSE linear estimate of X given Y.

The best linear MMSE estimate is given by the formula
X̂ = (Cov(X, Y)/σ_Y²)(Y - E(Y)) + E(X).
Here we have
E(Y) = 2E(X) + E(Z) = 2,
σ_Y² = 4σ_X² + σ_Z² + 4ρ_{XZ}σ_Xσ_Z = 16 + 9 - 9 = 16,
Cov(X, Y) = E(XY) - E(X)E(Y) = E(2X² + XZ) - E(X)E(Y)
= 2(σ_X² + µ_X²) + (ρ_{XZ}σ_Xσ_Z + µ_Xµ_Z) - µ_X(2µ_X + µ_Z) = 10 - 9/4 - 2 = 23/4.
So the best linear MMSE estimate is
X̂ = (23/64)(Y - 2) + 1 = (23/64)Y + 9/32.

Here we have E(X) E(Y) x(x+y)dydx y(y + )dy 7, Cov(X,Y) E(XY) E(X)E(Y) σ Y E(Y ) (E(Y)) So the best linear MMSE estimate is (e) The MSE of linear estimate is ˆX Here by symmetry, σ X σ Y 44. Thus, x(x+ )dydx 7, xy(x+y)dydx 7 7 44, y (y + ( ) 7 )dy 44. ( Y 7 ) + 7. MSE σx Cov (X,Y). MSE σ X Cov (X,Y) σ Y 44 +.67. We can check that LMSE > MSE. 5. Additive-noise channel with signal dependent noise. Consider the channel with correlated signal X and noise Z and observation Y X +Z, where µ X, µ Z, σ X 4, σ Z 9, ρ X,Z 3 8. Find the best MSE linear estimate of X given Y. The best linear MMSE estimate is given by the formula Here we have (Y E(Y))+E(X). E(Y) E(X)+E(Z), σ Y 4σ X +σ Z +4ρ XZ σ X σ Z 6, Cov(X,Y) E(XY) E(X)E(Y) E(X +XZ) So the best linear MMSE estimate is (σ X +µ X)+(ρ XZ σ X σ Z +µ X µ Z ) 3 4. ˆX 3 3 (Y )+ 64 64 Y + 9 3. 3