Summary of EE226a: MMSE and LLSE

Jean Walrand

I. SUMMARY

Here are the key ideas and results:
- The MMSE of X given Y is E[X | Y].
- The LLSE of X given Y is L[X | Y] = E(X) + K_{X,Y} K_Y^{-1} (Y − E(Y)) if K_Y is nonsingular.
- Theorem 3 states the properties of the LLSE.
- The linear regression approximates the LLSE when the samples are realizations of i.i.d. random pairs (X_m, Y_m).

II. ESTIMATION: FORMULATION

The estimation problem is a generalized version of the Bayesian decision problem where the set of values of X can be R^n. In applications, one is given a source model f_X and a channel model f_{Y|X} that together define f_{X,Y}. One may have to estimate f_{Y|X} by using a training sequence, i.e., by selecting the values of X and observing the channel outputs. Alternatively, one may be able to observe a sequence of values of (X, Y) and use them to estimate f_{X,Y}.

Definition 1: Estimation Problems
One is given the joint distribution of (X, Y). The estimation problem is to calculate Z = g(Y) to minimize E(c(X, Z)) for a given function c(·, ·). The random variable Z = g(Y) that minimizes E(||X − Z||^2) is the minimum mean squares estimator (MMSE) of X given Y. The random variable Z = AY + b that minimizes E(||X − Z||^2) is called the linear least squares estimator (LLSE) of X given Y; we designate it by Z = L[X | Y].

III. MMSE

Here is the central result about minimum mean squares estimation. You have seen this before, but we recall the proof of that important result.

Theorem 1: MMSE
The MMSE of X given Y is E[X | Y].

Proof: You should recall the definition of E[X | Y]: it is a random variable that has the property

E[(X − E[X | Y]) h_1(Y)] = 0, for all h_1(·),   (1)

or equivalently

E[h_2(Y) (X − E[X | Y])] = 0, for all h_2(·).   (2)

In particular, taking h_1(·) = 1 in (1) gives E(X) = E[E[X | Y]]. The interpretation of (1)-(2) is that X − E[X | Y] ⊥ h(Y) for all h(·). By Pythagoras, one then expects E[X | Y] to be the function of Y that is closest to X, as illustrated in Figure 1.

Fig. 1. The conditional expectation E[X | Y] = g(Y) as the projection of X on the set {r(Y), r(·) = function}.

Formally, let h(Y) be an arbitrary function. Then

E(||X − h(Y)||^2) = E(||X − E[X | Y] + E[X | Y] − h(Y)||^2)
= E(||X − E[X | Y]||^2) + 2 E((E[X | Y] − h(Y))^T (X − E[X | Y])) + E(||E[X | Y] − h(Y)||^2)
= E(||X − E[X | Y]||^2) + E(||E[X | Y] − h(Y)||^2)
≥ E(||X − E[X | Y]||^2).

In this derivation, the third identity follows from the fact that the cross-term vanishes in view of (2). Note also that this derivation corresponds to the fact that the triangle {X, E[X | Y], h(Y)} is a right triangle with hypotenuse (X, h(Y)).
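To make the projection argument concrete, here is a minimal Monte Carlo sketch; it is my addition, not part of the notes. It assumes a pair for which the conditional expectation is known in closed form: X standard normal and Y = X + N with N standard normal and independent of X, so that E[X | Y] = Y/2. The script checks the orthogonality property (2) for a few functions h(·) and the resulting MSE inequality.

```python
# Monte Carlo sketch of Theorem 1 (illustration only, not from the notes).
# Assumed model: X ~ N(0,1), Y = X + N with N ~ N(0,1) independent, so E[X|Y] = Y/2.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.standard_normal(n)
Y = X + rng.standard_normal(n)

mmse = Y / 2  # E[X | Y] for this jointly Gaussian pair

# Orthogonality (2): E[h(Y) (X - E[X|Y])] should vanish for every h(.)
for h in (lambda y: y, np.sin, lambda y: y ** 3):
    print("cross term:", np.mean(h(Y) * (X - mmse)))  # ~0 up to Monte Carlo error

# Any other estimator h(Y) of X has a larger mean squared error.
print("MSE of E[X|Y] :", np.mean((X - mmse) ** 2))      # ~0.5
print("MSE of h(Y)=Y :", np.mean((X - Y) ** 2))         # ~1.0
print("MSE of tanh(Y):", np.mean((X - np.tanh(Y)) ** 2))
```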

Example 1: You recall that if (X, Y) are jointly Gaussian with K_Y nonsingular, then

E[X | Y] = E(X) + K_{X,Y} K_Y^{-1} (Y − E(Y)).

You also recall what happens when K_Y is singular. The key is to find A so that K_{X,Y} = A K_Y = A Q Λ Q^T. By writing

Q = [Q_1  Q_2],  Λ = [Λ_1  0; 0  0],

where Λ_1 corresponds to the nonzero eigenvalues of K_Y, one finds

K_{X,Y} = A Q Λ Q^T = A [Q_1  Q_2] [Λ_1  0; 0  0] [Q_1^T; Q_2^T] = A Q_1 Λ_1 Q_1^T,

so that K_{X,Y} = A K_Y with A = K_{X,Y} Q_1 Λ_1^{-1} Q_1^T. (This A indeed satisfies A K_Y = K_{X,Y}, since K_{X,Y} Q_2 = 0: the components of Q_2^T (Y − E(Y)) have zero variance.)

IV. LLSE

Theorem 2: LLSE
L[X | Y] = E(X) + K_{X,Y} K_Y^{-1} (Y − E(Y)) if K_Y is nonsingular.
L[X | Y] = E(X) + K_{X,Y} Q_1 Λ_1^{-1} Q_1^T (Y − E(Y)) if K_Y is singular.

Proof: Z = L[X | Y] = AY + b satisfies X − Z ⊥ BY + d for any B and d, and E(X) = E[L[X | Y]]. If we consider any Z' = CY + c, then E((Z − Z')^T (X − Z)) = 0, since Z − Z' = AY + b − CY − c = (A − C)Y + (b − c) =: BY + d. It follows that

E(||X − Z'||^2) = E(||X − Z + Z − Z'||^2)
= E(||X − Z||^2) + 2 E((Z − Z')^T (X − Z)) + E(||Z − Z'||^2)
= E(||X − Z||^2) + E(||Z − Z'||^2)
≥ E(||X − Z||^2).

Figure 2 illustrates this calculation. It shows that L[X | Y] is the projection of X on the set of linear functions of Y. The projection Z is characterized by the property that X − Z is orthogonal to all the linear functions BY + d. This property holds if and only if (compare with (1))

E(X − Z) = 0 and E((X − Z) Y^T) = 0,

which are equivalent to E(X − Z) = 0 and cov(X − Z, Y) = 0. The comparison between L[X | Y] and E[X | Y] is shown in Figure 3.

Fig. 2. The LLSE Z = L[X | Y] as the projection of X on the set {r(Y) = CY + d}.

Fig. 3. LLSE vs. MMSE: L[X | Y] is the projection on {CY + d}, E[X | Y] the projection on {g(Y)}.

The following result indicates some properties of the LLSE. They are easy to prove.

Theorem 3: Properties of LLSE
(a) L[A X_1 + B X_2 | Y] = A L[X_1 | Y] + B L[X_2 | Y].
(b) L[L[X | Y, Z] | Y] = L[X | Y].
(c) If X and Y are uncorrelated, then L[X | Y] = E(X).
(d) Assume that X and Z are conditionally independent given Y. Then, in general, L[X | Y, Z] ≠ L[X | Y].
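Theorem 2 translates directly into a short computation. The sketch below is mine (the names llse_coefficients and tol are not from the notes): it builds A = K_{X,Y} Q_1 Λ_1^{-1} Q_1^T from an eigendecomposition of K_Y, which also covers the nonsingular case, and tests it on an observation vector whose covariance is rank-deficient.

```python
# Sketch of Theorem 2: L[X|Y] = E(X) + A (Y - E(Y)) with A = K_XY Q_1 Lambda_1^{-1} Q_1^T.
# Illustration only; function and variable names are my own, not from the notes.
import numpy as np

def llse_coefficients(K_XY, K_Y, tol=1e-10):
    """Return A such that L[X|Y] = E(X) + A (Y - E(Y)), even if K_Y is singular."""
    lam, Q = np.linalg.eigh(K_Y)            # K_Y = Q diag(lam) Q^T
    keep = lam > tol * lam.max()            # columns Q_1 and eigenvalues Lambda_1
    Q1, lam1 = Q[:, keep], lam[keep]
    return K_XY @ Q1 @ np.diag(1.0 / lam1) @ Q1.T

# Assumed example: X = 2U + noise, observed through two identical copies of U,
# so K_Y is 2x2 with rank 1.
rng = np.random.default_rng(1)
U = rng.standard_normal(100_000)
X = 2 * U + 0.1 * rng.standard_normal(100_000)
Y = np.vstack([U, U])

K_Y = np.cov(Y)
K_XY = np.atleast_2d(np.cov(np.vstack([X, Y]))[0, 1:])
A = llse_coefficients(K_XY, K_Y)
Xhat = X.mean() + (A @ (Y - Y.mean(axis=1, keepdims=True))).ravel()
print(A)                          # ~[[1, 1]]: the weight 2 on U is split across the two copies
print(np.mean((X - Xhat) ** 2))   # ~0.01, the variance of the noise
```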

Proof:
(b): It suffices to show that E(L[X | Y, Z] − L[X | Y]) = 0 and E((L[X | Y, Z] − L[X | Y]) Y^T) = 0. We already have E(X − L[X | Y, Z]) = 0 and E(X − L[X | Y]) = 0, which implies E(L[X | Y, Z] − L[X | Y]) = 0. Moreover, from E((X − L[X | Y, Z]) Y^T) = 0 and E((X − L[X | Y]) Y^T) = 0, it follows that E((L[X | Y, Z] − L[X | Y]) Y^T) = 0.

(d): Here is a trivial example. Assume that X = Z = Y^{1/2}, where X is U[0, 1]. Then L[X | Y, Z] = Z. However, since Y = X^2, Y^2 = X^4, and XY = X^3, we find

L[X | Y] = E(X) + (E(XY) − E(X) E(Y)) / (E(Y^2) − E(Y)^2) (Y − E(Y))
= 1/2 + (1/4 − (1/2)(1/3)) / (1/5 − 1/9) (Y − 1/3)
= 3/16 + (15/16) Y.

We see that L[X | Y, Z] ≠ L[X | Y] because Z ≠ 3/16 + (15/16) Y = 3/16 + (15/16) Z^2.

Figure 4 illustrates property (b), which is called the smoothing property of LLSE.

Fig. 4. Smoothing property of LLSE: L[X | Y, Z] lies in {AY + BZ + c}, and its projection on {CY + d} is L[X | Y].

V. EXAMPLES

We illustrate the previous ideas on a few examples.

Example 2: Assume that U, V, W are independent and U[0, 1]. Calculate L[(U + V)^2 | V^2 + W^2].

Solution: Let X = (U + V)^2 and Y = V^2 + W^2. Then

K_{X,Y} = cov(U^2 + 2UV + V^2, V^2 + W^2) = 2 cov(UV, V^2) + cov(V^2, V^2).

Now,

cov(UV, V^2) = E(UV^3) − E(UV) E(V^2) = 1/8 − 1/12 = 1/24

and

cov(V^2, V^2) = E(V^4) − E(V^2) E(V^2) = 1/5 − 1/9 = 4/45.

Hence,

K_{X,Y} = 1/12 + 4/45 = 31/180.

Also, K_Y = var(V^2) + var(W^2) = 8/45, so that K_{X,Y} K_Y^{-1} = (31/180)(45/8) = 31/32. We conclude that

L[(U + V)^2 | V^2 + W^2] = E((U + V)^2) + K_{X,Y} K_Y^{-1} (Y − E(Y)) = 7/6 + (31/32)(Y − 2/3).
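As a sanity check on Example 2 (my addition, not in the notes), one can estimate the slope and intercept of the LLSE from simulated samples and compare them with the closed form 7/6 + (31/32)(Y − 2/3), whose intercept is 7/6 − (31/32)(2/3) = 25/48.

```python
# Simulation check of Example 2 (illustration only, not from the notes).
import numpy as np

rng = np.random.default_rng(2)
n = 2_000_000
U, V, W = rng.uniform(size=(3, n))
X = (U + V) ** 2
Y = V ** 2 + W ** 2

slope = np.cov(X, Y)[0, 1] / np.var(Y, ddof=1)
intercept = X.mean() - slope * Y.mean()
print(slope, 31 / 32)        # sample estimate vs. 0.96875
print(intercept, 25 / 48)    # sample estimate vs. ~0.5208
```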

Example 3: Let U, V, W be as in the previous example. Calculate L[cos(U + V) | V + W].

Solution: Let X = cos(U + V) and Y = V + W. We find

K_{X,Y} = E(XY) − E(X) E(Y).

Now,

E(XY) = E(V cos(U + V)) + (1/2) E(X).

Also,

E(V cos(U + V)) = ∫_0^1 ∫_0^1 v cos(u + v) du dv = ∫_0^1 [v sin(u + v)]_{u=0}^{u=1} dv = ∫_0^1 (v sin(v + 1) − v sin(v)) dv
= −[v cos(v + 1)]_0^1 + ∫_0^1 cos(v + 1) dv + [v cos(v)]_0^1 − ∫_0^1 cos(v) dv
= −cos(2) + [sin(v + 1)]_0^1 + cos(1) − [sin(v)]_0^1
= −cos(2) + sin(2) − sin(1) + cos(1) − sin(1).

Moreover,

E(X) = ∫_0^1 ∫_0^1 cos(u + v) du dv = ∫_0^1 [sin(u + v)]_{v=0}^{v=1} du = ∫_0^1 (sin(u + 1) − sin(u)) du
= −[cos(u + 1)]_0^1 + [cos(u)]_0^1 = −cos(2) + cos(1) + cos(1) − cos(0) = −cos(2) + 2 cos(1) − 1.

In addition,

E(Y) = E(V) + E(W) = 1

and

K_Y = var(V) + var(W) = 1/6.

Combining these expressions gives K_{X,Y}, and then L[cos(U + V) | V + W] = E(X) + 6 K_{X,Y} (Y − 1).

VI. LINEAR REGRESSION VS. LLSE

We discuss the connection between the familiar linear regression procedure and the LLSE. One is given a set of n pairs of numbers {(x_m, y_m), m = 1, ..., n}, as shown in Figure 5. One draws a line through the points. That line approximates the values y_m by z_m = α x_m + β. The line is chosen to minimize

Σ_{m=1}^{n} (z_m − y_m)^2 = Σ_{m=1}^{n} (α x_m + β − y_m)^2.   (3)

That is, the linear regression is the linear approximation that minimizes the sum of the squared errors. Note that there is no probabilistic framework in this procedure.

Fig. 5. Linear Regression: the line z_m = α x_m + β through the points (x_m, y_m).

To find the values of α and β that minimize the sum of the squared errors, one differentiates (3) with respect to α and β and sets the derivatives to zero. One finds

Σ_m (α x_m + β − y_m) = 0 and Σ_m x_m (α x_m + β − y_m) = 0.

Solving these equations, we find

α = (A(xy) − A(x) A(y)) / (A(x^2) − A(x)^2) and β = A(y) − α A(x).

In these expressions, we used the following notation:

A(y) := (1/n) Σ_m y_m, A(x) := (1/n) Σ_m x_m, A(xy) := (1/n) Σ_m x_m y_m, and A(x^2) := (1/n) Σ_m x_m^2.

Thus, the point y_m is approximated by

z_m = A(y) + (A(xy) − A(x) A(y)) / (A(x^2) − A(x)^2) (x_m − A(x)).

Note that if the pairs (x_m, y_m) are realizations of i.i.d. random variables (X_m, Y_m) distributed like (X, Y) with finite variances, then, as n → ∞, A(x) → E(X), A(x^2) → E(X^2), A(y) → E(Y), and A(xy) → E(XY). Consequently, if n is large, we see that

z_m ≈ E(Y) + (cov(X, Y) / var(X)) (x_m − E(X)) = L[Y | X = x_m].

See [2] for examples.
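The convergence of the regression line to the LLSE can also be checked numerically. The model below (Y = 2X + noise with X uniform) is an assumption of mine for illustration; for it, cov(X, Y)/var(X) = 2 and E(Y) − 2 E(X) = 0, and the coefficients computed from the empirical averages A(·) approach these values as n grows.

```python
# Sketch of Section VI (illustration only; the model Y = 2X + noise is my choice).
import numpy as np

rng = np.random.default_rng(3)

def regression_line(x, y):
    """Slope alpha and intercept beta from the empirical-average formulas of Section VI."""
    Ax, Ay = x.mean(), y.mean()
    Axy, Ax2 = (x * y).mean(), (x * x).mean()
    alpha = (Axy - Ax * Ay) / (Ax2 - Ax ** 2)
    return alpha, Ay - alpha * Ax

for n in (100, 10_000, 1_000_000):
    x = rng.uniform(size=n)
    y = 2 * x + 0.3 * rng.standard_normal(n)
    print(n, regression_line(x, y))   # approaches (2.0, 0.0), the LLSE coefficients
```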

REFERENCES

[1] R. G. Gallager, Stochastic Processes: A Conceptual Approach, draft manuscript.
[2] D. Freedman, Statistical Models: Theory and Practice, Cambridge University Press, 2005.
[3] J. Walrand, Lecture Notes on Probability and Random Processes, on-line book.