Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory

- Parametric estimation
- Properties of estimators
- Minimum variance estimator
- Cramer-Rao bound
- Maximum likelihood estimators
- Confidence intervals
- Bayesian estimation

Random Variables

Let $X$ be a scalar random variable (rv), $X : \Omega \to \mathbb{R}$, defined over the set of elementary events $\Omega$. The notation $X \sim F_X(x),\ f_X(x)$ denotes that:

- $F_X(x)$ is the cumulative distribution function (cdf) of $X$: $F_X(x) = P\{X \le x\}$, $x \in \mathbb{R}$
- $f_X(x)$ is the probability density function (pdf) of $X$: $F_X(x) = \int_{-\infty}^{x} f_X(\sigma)\,d\sigma$, $x \in \mathbb{R}$

Multivariate distributions

Let $X = (X_1,\dots,X_n)$ be a vector of rvs, $X : \Omega \to \mathbb{R}^n$, defined over $\Omega$. The notation $X \sim F_X(x),\ f_X(x)$ denotes that:

- $F_X(x)$ is the joint cumulative distribution function (cdf) of $X$: $F_X(x) = P\{X_1 \le x_1,\dots,X_n \le x_n\}$, $x = (x_1,\dots,x_n) \in \mathbb{R}^n$
- $f_X(x)$ is the joint probability density function (pdf) of $X$: $F_X(x) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n} f_X(\sigma_1,\dots,\sigma_n)\,d\sigma_1\cdots d\sigma_n$, $x \in \mathbb{R}^n$

Moments of a rv

First order moment (mean): $m_X = E[X] = \int_{-\infty}^{+\infty} x\,f_X(x)\,dx$

Second order moment (variance): $\sigma_X^2 = \mathrm{Var}(X) = E\left[(X - m_X)^2\right] = \int_{-\infty}^{+\infty} (x - m_X)^2 f_X(x)\,dx$

Example. The normal or Gaussian pdf, denoted by $N(m,\sigma^2)$, is defined as $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-m)^2}{2\sigma^2}}$. It turns out that $E[X] = m$ and $\mathrm{Var}(X) = \sigma^2$.
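
As a quick numerical check of the Gaussian example, the following MATLAB sketch (the values of m, sigma and the sample size are arbitrary choices) estimates the two moments by Monte Carlo:

    % Monte Carlo check of the mean and variance of a N(m, sigma^2) rv
    m = 2; sigma = 3;           % true parameters (arbitrary)
    N = 1e6;                    % number of samples
    x = m + sigma*randn(N, 1);  % draws from N(m, sigma^2)
    fprintf('sample mean = %.4f (m = %g)\n', mean(x), m);
    fprintf('sample variance = %.4f (sigma^2 = %g)\n', var(x), sigma^2);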

Conditional distribution

Bayes formula: $f_{X|Y}(x|y) = \dfrac{f_{X,Y}(x,y)}{f_Y(y)}$

One has: $f_X(x) = \int_{-\infty}^{+\infty} f_{X|Y}(x|y)\,f_Y(y)\,dy$

If $X$ and $Y$ are independent: $f_{X|Y}(x|y) = f_X(x)$

Definitions:
- conditional mean: $E[X|Y] = \int_{-\infty}^{+\infty} x\,f_{X|Y}(x|y)\,dx$
- conditional variance: $P_{X|Y} = \int_{-\infty}^{+\infty} (x - E[X|Y])^2 f_{X|Y}(x|y)\,dx$

Gaussian conditional distribution

Let $X$ and $Y$ be Gaussian rvs such that $E[X] = m_X$, $E[Y] = m_Y$ and

$E\left[\begin{pmatrix} X - m_X \\ Y - m_Y \end{pmatrix}\begin{pmatrix} X - m_X \\ Y - m_Y \end{pmatrix}'\right] = \begin{pmatrix} R_X & R_{XY} \\ R_{XY}' & R_Y \end{pmatrix}$

It turns out that:

$E[X|Y] = m_X + R_{XY} R_Y^{-1}(Y - m_Y)$
$P_{X|Y} = R_X - R_{XY} R_Y^{-1} R_{XY}'$
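
These formulas can be checked by simulation. The sketch below (scalar case, with arbitrarily chosen moments) draws jointly Gaussian samples, applies the conditional-mean formula, and verifies that the resulting error has variance $P_{X|Y}$ and is uncorrelated with $Y$:

    % Monte Carlo check of the Gaussian conditional mean/variance (scalar case)
    mX = 1; mY = -2;                       % means (arbitrary)
    RX = 4; RXY = 1.5; RY = 2;             % second-order moments (arbitrary)
    N = 1e6;
    C = chol([RX RXY; RXY RY], 'lower');   % factor of the joint covariance
    Z = C*randn(2, N);                     % zero-mean jointly Gaussian samples
    X = mX + Z(1,:);  Y = mY + Z(2,:);
    Xhat = mX + RXY/RY*(Y - mY);           % E[X|Y]
    fprintf('error variance = %.4f, P_XY = %.4f\n', var(X - Xhat), RX - RXY^2/RY);
    fprintf('cov(error, Y) = %.4f (should be close to 0)\n', mean((X - Xhat).*(Y - mY)));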

Estimation problems

Problem. Estimate the value of $\theta \in \mathbb{R}^p$, using an observation $y$ of the rv $Y \in \mathbb{R}^n$.

Two different settings:

a. Parametric estimation: the pdf of $Y$ depends on the unknown parameter $\theta$
b. Bayesian estimation: the unknown $\theta$ is a random variable

Parametric estimation problem

The cdf and pdf of $Y$ depend on the unknown parameter vector $\theta$: $Y \sim F_Y^\theta(x),\ f_Y^\theta(x)$

- $\Theta \subseteq \mathbb{R}^p$ denotes the parameter space, i.e., the set of values which $\theta$ can take
- $\mathcal{Y} \subseteq \mathbb{R}^n$ denotes the observation space, to which the rv $Y$ belongs

Parametric estimator

The parametric estimation problem consists in finding $\theta$ on the basis of an observation $y$ of the rv $Y$.

Definition 1. An estimator of the parameter $\theta$ is a function $T : \mathcal{Y} \to \Theta$.

Given the estimator $T(\cdot)$, if one observes $y$, then the estimate of $\theta$ is $\hat\theta = T(y)$.

There are infinitely many possible estimators (all the functions of $y$!). Therefore, it is crucial to establish a criterion to assess the quality of an estimator.

Unbiased estimator

Definition 2. An estimator $T(\cdot)$ of the parameter $\theta$ is unbiased (or correct) if $E^\theta[T(Y)] = \theta$, $\forall \theta \in \Theta$.

[Figure: pdfs of two estimators $T(\cdot)$, one biased and one unbiased, with the unbiased one centered at $\theta$.]

Examples

Let $Y_1,\dots,Y_n$ be identically distributed rvs, with mean $m$. The sample mean

$\bar Y = \frac{1}{n}\sum_{i=1}^{n} Y_i$

is an unbiased estimator of $m$. Indeed, $E[\bar Y] = \frac{1}{n}\sum_{i=1}^{n} E[Y_i] = m$.

Let $Y_1,\dots,Y_n$ be independent identically distributed (i.i.d.) rvs, with variance $\sigma^2$. The sample variance

$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar Y)^2$

is an unbiased estimator of $\sigma^2$.
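
A short Monte Carlo sketch (Gaussian data; m, sigma, the sample size and the number of runs are arbitrary) illustrating that both estimators are unbiased, and anticipating the bias of the 1/n-normalized variance estimator discussed in the maximum likelihood section:

    % Unbiasedness of the sample mean and of the sample variance
    m = 5; sigma = 2; n = 10; M = 1e5;   % true values, sample size, MC runs (arbitrary)
    Y = m + sigma*randn(n, M);           % each column is one sample of size n
    Ybar = mean(Y, 1);                   % sample means
    S2   = var(Y, 0, 1);                 % sample variances, 1/(n-1) normalization
    S2ml = var(Y, 1, 1);                 % 1/n normalization (biased, see the ML section)
    fprintf('E[Ybar] ~ %.4f (m = %g)\n', mean(Ybar), m);
    fprintf('E[S2]   ~ %.4f (sigma^2 = %g)\n', mean(S2), sigma^2);
    fprintf('E[S2ml] ~ %.4f ((n-1)/n sigma^2 = %.4f)\n', mean(S2ml), (n-1)/n*sigma^2);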

Consistent estimator

Definition 3. Let $\{Y_i\}_{i=1}^{\infty}$ be a sequence of rvs. The sequence of estimators $T_n = T_n(Y_1,\dots,Y_n)$ is said to be consistent if $T_n$ converges to $\theta$ in probability for all $\theta \in \Theta$, i.e.,

$\lim_{n\to\infty} P\{|T_n - \theta| > \varepsilon\} = 0, \quad \forall \varepsilon > 0,\ \forall \theta \in \Theta$

[Figure: pdfs of a sequence of consistent estimators $T_n(\cdot)$, concentrating around $\theta$ as $n$ grows ($n = 20, 50, 100, 500$).]

Example

Let $Y_1,\dots,Y_n$ be independent rvs with mean $m$ and finite variance. The sample mean

$\bar Y = \frac{1}{n}\sum_{i=1}^{n} Y_i$

is a consistent estimator of $m$, thanks to the next result.

Theorem 1 (Law of large numbers). Let $\{Y_i\}_{i=1}^{\infty}$ be a sequence of independent rvs with mean $m$ and finite variance. Then, the sample mean $\bar Y$ converges to $m$ in probability.

A sufficient condition for consistency

Theorem 2. Let $\hat\theta_n = T_n(y)$ be a sequence of unbiased estimators of $\theta \in \mathbb{R}$, based on the realization $y \in \mathbb{R}^n$ of the $n$-dimensional rv $Y$, i.e., $E^\theta[T_n(Y)] = \theta$, $\forall n$, $\forall \theta \in \Theta$. If

$\lim_{n\to+\infty} E^\theta\left[(T_n(Y) - \theta)^2\right] = 0,$

then the sequence of estimators $T_n(\cdot)$ is consistent.

Example. Let $Y_1,\dots,Y_n$ be independent rvs with mean $m$ and variance $\sigma^2$. We know that the sample mean $\bar Y$ is an unbiased estimate of $m$. Moreover, it turns out that $\mathrm{Var}(\bar Y) = \frac{\sigma^2}{n}$. Therefore, the sample mean is a consistent estimator of the mean.
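
A small sketch of this example (arbitrary values; the sample sizes mirror those in the figure above) shows the variance of the sample mean shrinking like sigma^2/n:

    % Consistency of the sample mean: Var(Ybar) = sigma^2/n decreases with n
    m = 0; sigma = 1; M = 2e4;                  % true values and MC runs (arbitrary)
    for n = [20 50 100 500]
        Ybar = mean(m + sigma*randn(n, M), 1);  % M independent sample means of size n
        fprintf('n = %4d: Var(Ybar) = %.5f, sigma^2/n = %.5f\n', n, var(Ybar), sigma^2/n);
    end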

Mean square error

Consider an estimator $T(\cdot)$ of the scalar parameter $\theta$.

Definition 4. The mean square error (MSE) of $T(\cdot)$ is $E^\theta\left[(T(Y) - \theta)^2\right]$.

If the estimator $T(\cdot)$ is unbiased, the mean square error corresponds to the variance of the estimation error $T(Y) - \theta$.

Definition 5. Given two estimators $T_1(\cdot)$ and $T_2(\cdot)$ of $\theta$, $T_1(\cdot)$ is better than $T_2(\cdot)$ if

$E^\theta\left[(T_1(Y) - \theta)^2\right] \le E^\theta\left[(T_2(Y) - \theta)^2\right], \quad \forall \theta \in \Theta$

If we restrict our attention to unbiased estimators, we are interested in the one with the least MSE for any value of $\theta$ (notice that it may not exist).

Minimum variance unbiased estimator

Definition 6. An unbiased estimator $T^*(\cdot)$ of $\theta$ is UMVUE (Uniformly Minimum Variance Unbiased Estimator) if

$E^\theta\left[(T^*(Y) - \theta)^2\right] \le E^\theta\left[(T(Y) - \theta)^2\right], \quad \forall \theta \in \Theta$

for any unbiased estimator $T(\cdot)$ of $\theta$.

[Figure: pdf of the UMVUE, more concentrated around $\theta$ than that of any other unbiased estimator.]

Minimum variance linear estimator

Let us restrict our attention to the class of linear estimators

$T(x) = \sum_{i=1}^{n} a_i x_i, \quad a_i \in \mathbb{R}$

Definition 7. A linear unbiased estimator $T^*(\cdot)$ of the scalar parameter $\theta$ is said to be BLUE (Best Linear Unbiased Estimator) if

$E^\theta\left[(T^*(Y) - \theta)^2\right] \le E^\theta\left[(T(Y) - \theta)^2\right], \quad \forall \theta \in \Theta$

for any linear unbiased estimator $T(\cdot)$ of $\theta$.

Example. Let $Y_i$ be independent rvs with mean $m$ and variance $\sigma_i^2$, $i = 1,\dots,n$. Then

$\hat Y = \frac{1}{\sum_{i=1}^{n} \frac{1}{\sigma_i^2}} \sum_{i=1}^{n} \frac{1}{\sigma_i^2}\,Y_i$

is the BLUE estimator of $m$.
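
The sketch below (arbitrary noise levels and sample size) compares the inverse-variance weighted mean of the example with the plain sample mean; both are unbiased, but the weighted one has smaller variance:

    % BLUE (inverse-variance weighted mean) vs. plain sample mean
    m = 3; n = 8; M = 1e5;                 % true mean, sample size, MC runs (arbitrary)
    sig = linspace(0.5, 4, n)';            % standard deviations sigma_i (arbitrary)
    Y = m + sig.*randn(n, M);              % each column is one experiment
    w = 1./sig.^2;                         % weights 1/sigma_i^2
    Yblue = sum(w.*Y, 1)/sum(w);           % BLUE estimate of m
    Ybar  = mean(Y, 1);                    % plain sample mean
    fprintf('Var(BLUE) = %.5f (theory %.5f)\n', var(Yblue), 1/sum(w));
    fprintf('Var(mean) = %.5f (theory %.5f)\n', var(Ybar), sum(sig.^2)/n^2);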

Cramer-Rao bound

The Cramer-Rao bound is a lower bound to the variance of any unbiased estimator of the parameter $\theta$.

Theorem 3. Let $T(\cdot)$ be an unbiased estimator of the scalar parameter $\theta$, and let the observation space $\mathcal{Y}$ not depend on $\theta$. Then (under some technical assumptions),

$E^\theta\left[(T(Y) - \theta)^2\right] \ge \left[I_n(\theta)\right]^{-1}$

where $I_n(\theta) = E^\theta\left[\left(\dfrac{\partial \ln f_Y^\theta(Y)}{\partial \theta}\right)^2\right]$ (Fisher information).

Remark. To compute $I_n(\theta)$ one must know the actual value of $\theta$; therefore, the Cramer-Rao bound is usually unknown in practice.

Cramer-Rao bound

For a parameter vector $\theta$ and any unbiased estimator $T(\cdot)$, one has

$E^\theta\left[(T(Y) - \theta)(T(Y) - \theta)'\right] \ge \left[I_n(\theta)\right]^{-1} \qquad (1)$

where

$I_n(\theta) = E^\theta\left[\left(\dfrac{\partial \ln f_Y^\theta(Y)}{\partial \theta}\right)\left(\dfrac{\partial \ln f_Y^\theta(Y)}{\partial \theta}\right)'\right]$

is the Fisher information matrix. The inequality in (1) is in the matrix sense ($A \ge B$ means that $A - B$ is positive semidefinite).

Definition 8. An unbiased estimator $T(\cdot)$ such that equality holds in (1) is said to be efficient.

Cramer-Rao bound

If the rvs $Y_1,\dots,Y_n$ are i.i.d., it turns out that $I_n(\theta) = n\,I_1(\theta)$. Hence, for fixed $\theta$ the Cramer-Rao bound decreases as $\frac{1}{n}$ with the size $n$ of the data sample.

Example. Let $Y_1,\dots,Y_n$ be i.i.d. rvs with mean $m$ and variance $\sigma^2$. Then

$E\left[\left(\bar Y - m\right)^2\right] = \frac{\sigma^2}{n} \ge \left[I_n(\theta)\right]^{-1} = \frac{\left[I_1(\theta)\right]^{-1}}{n}$

where $\bar Y$ denotes the sample mean. Moreover, if the rvs $Y_1,\dots,Y_n$ are normally distributed, one has also $I_1(\theta) = \frac{1}{\sigma^2}$. Since the Cramer-Rao bound is achieved, in the case of normal i.i.d. rvs the sample mean is an efficient estimator of the mean.
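
For the Gaussian case, the Fisher information and the bound can be checked numerically. In the sketch below (arbitrary values), the score is the derivative of the log-likelihood with respect to m, and its second moment is compared with n/sigma^2:

    % Fisher information for the mean of i.i.d. N(m, sigma^2) data, by Monte Carlo
    m = 1; sigma = 2; n = 25; M = 1e5;       % arbitrary values
    Y = m + sigma*randn(n, M);
    score = sum(Y - m, 1)/sigma^2;           % d/dm of the log-likelihood, one value per experiment
    In = mean(score.^2);                     % estimate of E[(d lnf/dm)^2]
    fprintf('Monte Carlo I_n = %.4f, n/sigma^2 = %.4f\n', In, n/sigma^2);
    fprintf('Var(Ybar) = %.5f, CR bound 1/I_n = %.5f\n', var(mean(Y, 1)), sigma^2/n);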

Maximum likelihood estimators

Consider a rv $Y \sim f_Y^\theta(y)$, and let $y$ be an observation of $Y$. We define the likelihood function as the function of $\theta$ (for fixed $y$)

$L(\theta|y) = f_Y^\theta(y)$

We choose as estimate of $\theta$ the value of the parameter which maximises the likelihood of the observed event (this value depends on $y$!).

Definition 9. A maximum likelihood estimator of the parameter $\theta$ is the estimator

$T_{ML}(y) = \arg\max_{\theta \in \Theta} L(\theta|y)$

Remark. The functions $L(\theta|y)$ and $\ln L(\theta|y)$ achieve their maximum values for the same $\theta$. In some cases it is easier to find the maximum of $\ln L(\theta|y)$ (e.g., for exponential-type distributions).
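
When no closed-form maximizer is available, the log-likelihood can be maximized numerically. A minimal sketch, assuming i.i.d. $N(m,\sigma^2)$ data and using fminsearch on the negative log-likelihood (the parametrization through log(sigma) keeps sigma positive during the search):

    % Numerical maximum likelihood for i.i.d. N(m, sigma^2) data
    m = 2; sigma = 1.5; n = 200;            % arbitrary true values
    y = m + sigma*randn(n, 1);              % one observed sample
    % p = [m; log(sigma)]; -ln L(theta|y) up to additive constants
    negLogL = @(p) n*p(2) + sum((y - p(1)).^2)/(2*exp(2*p(2)));
    pML = fminsearch(negLogL, [0; 0]);
    fprintf('numerical ML: m = %.4f, sigma = %.4f\n', pML(1), exp(pML(2)));
    fprintf('closed form : m = %.4f, sigma = %.4f\n', mean(y), std(y, 1));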

Properties of the maximum likelihood estimators

Theorem 4. Under the assumptions for the existence of the Cramer-Rao bound, if there exists an efficient estimator $T^*(\cdot)$, then it is a maximum likelihood estimator $T_{ML}(\cdot)$.

Example. Let $Y_i \sim N(m,\sigma_i^2)$ be independent, with known $\sigma_i^2$, $i = 1,\dots,n$. The estimator

$\hat Y = \frac{1}{\sum_{i=1}^{n} \frac{1}{\sigma_i^2}} \sum_{i=1}^{n} \frac{1}{\sigma_i^2}\,Y_i$

of $m$ is unbiased and such that $\mathrm{Var}(\hat Y) = \left(\sum_{i=1}^{n} \frac{1}{\sigma_i^2}\right)^{-1}$, while $I_n(m) = \sum_{i=1}^{n} \frac{1}{\sigma_i^2}$. Hence, $\hat Y$ is efficient, and therefore it is a maximum likelihood estimator of $m$.

The maximum likelihood estimator has several nice asymptotic properties.

Theorem 5. If the rvs $Y_1,\dots,Y_n$ are i.i.d., then (under suitable technical assumptions)

$\lim_{n\to+\infty} \sqrt{I_n(\theta)}\,(T_{ML}(Y) - \theta)$

is a random variable with standard normal distribution $N(0,1)$.

Theorem 5 states that the maximum likelihood estimator is:
- asymptotically unbiased
- consistent
- asymptotically efficient
- asymptotically normal

Example. Let $Y_1,\dots,Y_n$ be normal rvs with mean $m$ and variance $\sigma^2$. The sample mean

$\bar Y = \frac{1}{n}\sum_{i=1}^{n} Y_i$

is a maximum likelihood estimator of $m$. Moreover, $\sqrt{I_n(m)}\,(\bar Y - m) \sim N(0,1)$, being $I_n(m) = \frac{n}{\sigma^2}$.

Remark. The maximum likelihood estimator may be biased. Let $Y_1,\dots,Y_n$ be independent normal rvs with variance $\sigma^2$. The maximum likelihood estimator of $\sigma^2$ is

$\hat S^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar Y)^2$

which is biased, being $E[\hat S^2] = \frac{n-1}{n}\sigma^2$.

Confidence intervals

In many estimation problems, it is important to establish a set to which the parameter to be estimated belongs with a known probability.

Definition 10. A confidence interval with confidence level $1-\alpha$, $0 < \alpha < 1$, for the scalar parameter $\theta$ is a function that maps any observation $y \in \mathcal{Y}$ into an interval $B(y) \subseteq \Theta$ such that

$P^\theta\{\theta \in B(y)\} \ge 1-\alpha, \quad \forall \theta \in \Theta$

Hence, a confidence interval of level $1-\alpha$ for $\theta$ is a subset of $\Theta$ such that, if we observe $y$, then $\theta \in B(y)$ with probability at least $1-\alpha$, whatever the true value $\theta \in \Theta$ is.

Example

Let $Y_1,\dots,Y_n$ be normal rvs with unknown mean $m$ and known variance $\sigma^2$. Then, $\frac{\sqrt{n}}{\sigma}(\bar Y - m) \sim N(0,1)$, where $\bar Y$ is the sample mean. Let $y_\alpha$ be such that

$\int_{-y_\alpha}^{y_\alpha} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{y^2}{2}}\,dy = 1-\alpha.$

Being

$1-\alpha = P\left\{\left|\frac{\sqrt{n}}{\sigma}(\bar Y - m)\right| \le y_\alpha\right\} = P\left\{\bar Y - \frac{\sigma}{\sqrt{n}}\,y_\alpha \le m \le \bar Y + \frac{\sigma}{\sqrt{n}}\,y_\alpha\right\}$

one has that

$\left[\bar Y - \frac{\sigma}{\sqrt{n}}\,y_\alpha,\ \bar Y + \frac{\sigma}{\sqrt{n}}\,y_\alpha\right]$

is a confidence interval of level $1-\alpha$ for $m$.

[Figure: standard normal pdf, with the central region between $-y_\alpha$ and $y_\alpha$ of area $1-\alpha$ shaded.]
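
A minimal sketch of this computation on simulated data (arbitrary values; for a standard normal rv, $y_\alpha = \sqrt{2}\,\mathrm{erfinv}(1-\alpha)$, so no toolbox is needed):

    % (1-alpha) confidence interval for the mean of N(m, sigma^2) data, known sigma
    alpha = 0.05; m = 10; sigma = 2; n = 50;      % arbitrary values
    y = m + sigma*randn(n, 1);                    % observed sample
    ybar = mean(y);
    yalpha = sqrt(2)*erfinv(1 - alpha);           % P{|Z| <= yalpha} = 1-alpha for Z ~ N(0,1)
    ci = [ybar - sigma/sqrt(n)*yalpha, ybar + sigma/sqrt(n)*yalpha];
    fprintf('%.0f%% confidence interval for m: [%.4f, %.4f]\n', 100*(1-alpha), ci(1), ci(2));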

Nonlinear ML estimation problems

Let $Y \in \mathbb{R}^n$ be a vector of rvs such that

$Y = U(\theta) + \varepsilon$

where
- $\theta \in \mathbb{R}^p$ is the unknown parameter vector
- $U(\cdot) : \mathbb{R}^p \to \mathbb{R}^n$ is a known function
- $\varepsilon \in \mathbb{R}^n$ is a vector of rvs, for which we assume $\varepsilon \sim N(0,\Sigma_\varepsilon)$

Problem: find a maximum likelihood estimator of $\theta$, $\hat\theta_{ML} = T_{ML}(Y)$.

Least squares estimate

The pdf of the data $Y$ is $f_Y(y) = f_\varepsilon(y - U(\theta)) = L(\theta|y)$. Therefore,

$\hat\theta_{ML} = \arg\max_\theta \ln L(\theta|y) = \arg\min_\theta (y - U(\theta))'\,\Sigma_\varepsilon^{-1}\,(y - U(\theta))$

If the covariance matrix $\Sigma_\varepsilon$ is known, we obtain the weighted least squares estimate.

If $U(\theta)$ is a generic nonlinear function, the solution must be computed numerically (MATLAB Optimization Toolbox: >> help optim). This problem can be computationally intractable!

Linear estimation problems

If the function $U(\cdot)$ is linear, i.e., $U(\theta) = U\theta$ with $U \in \mathbb{R}^{n\times p}$ a known matrix, one has

$Y = U\theta + \varepsilon$

and the maximum likelihood estimator is the so-called Gauss-Markov estimator

$\hat\theta_{ML} = \hat\theta_{GM} = (U'\Sigma_\varepsilon^{-1}U)^{-1}U'\Sigma_\varepsilon^{-1}y$

In the special case $\varepsilon \sim N(0,\sigma^2 I)$ (the rvs $\varepsilon_i$ are independent!), one has the celebrated least squares estimator

$\hat\theta_{LS} = (U'U)^{-1}U'y$
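
Both estimators are one-line computations in MATLAB. A sketch on a simple straight-line model with heteroscedastic noise (all numerical values are arbitrary):

    % Least squares and Gauss-Markov estimates for Y = U*theta + eps
    n = 100; theta = [1; -0.5];              % true parameter, p = 2 (arbitrary)
    t = linspace(0, 1, n)';
    U = [ones(n,1) t];                       % regressor matrix (straight-line fit)
    sig = 0.1 + t;                           % noise std growing with t
    y = U*theta + sig.*randn(n, 1);
    Sigma = diag(sig.^2);                    % noise covariance
    theta_LS = (U'*U)\(U'*y);                % least squares estimate
    theta_GM = (U'/Sigma*U)\(U'/Sigma*y);    % Gauss-Markov (weighted LS) estimate
    disp([theta theta_LS theta_GM])          % true value vs. the two estimates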

A special case: biased measurement error

How to treat the case in which $E[\varepsilon_i] = m_\varepsilon \ne 0$, $i = 1,\dots,n$?

1) If $m_\varepsilon$ is known, just use the unbiased measurements $Y - m_\varepsilon \mathbf{1}$, where $\mathbf{1} = [1\ 1\ \dots\ 1]'$:

$\hat\theta_{ML} = \hat\theta_{GM} = (U'\Sigma_\varepsilon^{-1}U)^{-1}U'\Sigma_\varepsilon^{-1}(y - m_\varepsilon \mathbf{1})$

2) If $m_\varepsilon$ is unknown, estimate it! Let $\bar\theta = [\theta'\ m_\varepsilon]' \in \mathbb{R}^{p+1}$, so that $Y = [U\ \mathbf{1}]\,\bar\theta + \varepsilon$. Then, apply the Gauss-Markov estimator with $\bar U = [U\ \mathbf{1}]$ to obtain an estimate of $\bar\theta$ (simultaneous estimate of $\theta$ and $m_\varepsilon$).

Clearly, the variance of the estimation error of $\theta$ will be higher with respect to case 1!

Gauss-Markov estimator

The estimates $\hat\theta_{GM}$ and $\hat\theta_{LS}$ are widely used in practice, even if some of the assumptions on $\varepsilon$ do not hold or cannot be validated. In particular, the following result holds.

Theorem 6. Let $Y = U\theta + \varepsilon$, with $\varepsilon$ a vector of random variables with zero mean and covariance matrix $\Sigma$. Then, the Gauss-Markov estimator is the BLUE estimator of the parameter $\theta$,

$\hat\theta_{BLUE} = \hat\theta_{GM}$

and the corresponding covariance of the estimation error is equal to

$E\left[(\hat\theta_{GM} - \theta)(\hat\theta_{GM} - \theta)'\right] = \left(U'\Sigma^{-1}U\right)^{-1}.$

Examples of least squares estimate

Example 1. $Y_i = \theta + \varepsilon_i$, $i = 1,\dots,n$, with $\varepsilon_i$ independent rvs with zero mean and variance $\sigma^2$, so that $E[Y_i] = \theta$. We want to estimate $\theta$ using observations of $Y_i$, $i = 1,\dots,n$.

One has $Y = U\theta + \varepsilon$ with $U = (1\ 1\ \dots\ 1)'$ and

$\hat\theta_{LS} = (U'U)^{-1}U'y = \frac{1}{n}\sum_{i=1}^{n} y_i$

The least squares estimator is equal to the sample mean (and it is also the maximum likelihood estimate if the rvs $\varepsilon_i$ are normal).

Example 2. Same setting as Example 1, with $E[\varepsilon_i^2] = \sigma_i^2$, $i = 1,\dots,n$. In this case,

$E[\varepsilon\varepsilon'] = \Sigma_\varepsilon = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2)$

The least squares estimator is still the sample mean. The Gauss-Markov estimator is

$\hat\theta_{GM} = (U'\Sigma_\varepsilon^{-1}U)^{-1}U'\Sigma_\varepsilon^{-1}y = \frac{1}{\sum_{i=1}^{n} \frac{1}{\sigma_i^2}} \sum_{i=1}^{n} \frac{1}{\sigma_i^2}\,y_i$

and is equal to the maximum likelihood estimate if the rvs $\varepsilon_i$ are normal.

Bayesian estimation

Estimate an unknown rv $X$, using observations of the rv $Y$.

Key tool: the joint pdf $f_{X,Y}(x,y)$

- least mean square error estimator
- optimal linear estimator

Bayesian estimation: problem formulation

Problem: given observations $y$ of the rv $Y \in \mathbb{R}^n$, find an estimator of the rv $X$ based on the data $y$.

Solution: an estimator $\hat X = T(Y)$, where $T(\cdot) : \mathbb{R}^n \to \mathbb{R}^p$.

To assess the quality of the estimator we must define a suitable criterion: in general, we consider the risk function

$J_r = E[d(X,T(Y))] = \int\!\!\int d(x,T(y))\,f_{X,Y}(x,y)\,dx\,dy$

and we minimize $J_r$ with respect to all possible estimators $T(\cdot)$, where $d(X,T(Y))$ is a distance between the unknown $X$ and its estimate $T(Y)$.

Least mean square error estimator

Let $d(X,T(Y)) = \|X - T(Y)\|^2$. One gets the least mean square error (MSE) estimator $\hat X_{MSE} = T^*(Y)$, where

$T^*(\cdot) = \arg\min_{T(\cdot)} E\left[\|X - T(Y)\|^2\right]$

Theorem. $\hat X_{MSE} = E[X|Y]$. The conditional mean of $X$ given $Y$ is the least MSE estimate of $X$ based on the observation of $Y$.

Let $Q(X,T(Y)) = E\left[(X - T(Y))(X - T(Y))'\right]$. Then:

$Q(X,\hat X_{MSE}) \le Q(X,T(Y))$, for any $T(Y)$.

Optimal linear estimator

The least MSE estimator needs the knowledge of the conditional distribution of $X$ given $Y$; this motivates the use of simpler estimators.

Linear estimators: $T(Y) = AY + b$, where $A \in \mathbb{R}^{p\times n}$ and $b \in \mathbb{R}^{p\times 1}$ are the estimator coefficients (to be determined).

The Linear Mean Square Error (LMSE) estimate is given by $\hat X_{LMSE} = A^*Y + b^*$, where

$A^*, b^* = \arg\min_{A,b} E\left[\|X - AY - b\|^2\right]$

LMSE estimator

Theorem. Let $X$ and $Y$ be rvs such that $E[X] = m_X$, $E[Y] = m_Y$ and

$E\left[\begin{pmatrix} X - m_X \\ Y - m_Y \end{pmatrix}\begin{pmatrix} X - m_X \\ Y - m_Y \end{pmatrix}'\right] = \begin{pmatrix} R_X & R_{XY} \\ R_{XY}' & R_Y \end{pmatrix}$

Then

$\hat X_{LMSE} = m_X + R_{XY} R_Y^{-1}(Y - m_Y)$

i.e., $A^* = R_{XY} R_Y^{-1}$, $b^* = m_X - R_{XY} R_Y^{-1} m_Y$. Moreover,

$E\left[(X - \hat X_{LMSE})(X - \hat X_{LMSE})'\right] = R_X - R_{XY} R_Y^{-1} R_{XY}'$
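
The theorem translates directly into a few lines of MATLAB. A sketch with arbitrary illustrative moments (p = 2, n = 2; the observation y below is hypothetical):

    % LMSE estimate of X from Y, given first- and second-order moments
    mX = [1; 0];  mY = [0; 2];                 % means (arbitrary)
    RX  = [2 0.5; 0.5 1];                      % Cov(X)
    RXY = [0.8 0.2; 0.1 0.6];                  % Cov(X,Y)
    RY  = [1.5 0.3; 0.3 1.0];                  % Cov(Y)
    lmse = @(y) mX + RXY/RY*(y - mY);          % Xhat_LMSE as a function of the observation
    Perr = RX - RXY/RY*RXY';                   % covariance of the estimation error
    y = [0.4; 1.7];                            % a hypothetical observation
    disp(lmse(y)); disp(Perr);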

Properties of the LMSE estimator

- The LMSE estimator does not require knowledge of the joint pdf of $X$ and $Y$, but only of the covariance matrices $R_{XY}$, $R_Y$ (second-order statistics).
- The LMSE estimate satisfies

$E\left[(X - \hat X_{LMSE})Y'\right] = E\left[\{X - m_X - R_{XY} R_Y^{-1}(Y - m_Y)\}Y'\right] = R_{XY} - R_{XY} R_Y^{-1} R_Y = 0$

i.e., the estimation error of the optimal linear estimator is uncorrelated with the data $Y$.
- If $X$ and $Y$ are jointly Gaussian,

$E[X|Y] = m_X + R_{XY} R_Y^{-1}(Y - m_Y)$

hence $\hat X_{LMSE} = \hat X_{MSE}$. In the Gaussian setting, the MSE estimate is a linear function of the observed variables $Y$, and therefore it is equal to the LMSE estimate.

Sample mean and covariances

In many estimation problems, first- and second-order moments are not known. What if only a set of data $x_i$, $y_i$, $i = 1,\dots,N$, is available? Use the sample means and sample covariances as estimates of the moments.

Sample means:

$\hat m_X^N = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat m_Y^N = \frac{1}{N}\sum_{i=1}^{N} y_i$

Sample covariances:

$\hat R_X^N = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \hat m_X^N)(x_i - \hat m_X^N)'$
$\hat R_Y^N = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \hat m_Y^N)(y_i - \hat m_Y^N)'$
$\hat R_{XY}^N = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \hat m_X^N)(y_i - \hat m_Y^N)'$
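
These estimates are straightforward to compute from paired data. A sketch (random illustrative data, one column per observation; dimensions and N are arbitrary):

    % Sample means and covariances from paired data (x_i, y_i), i = 1..N
    p = 2; n = 3; N = 500;                      % dimensions and data size (arbitrary)
    X = randn(p, N); Y = randn(n, N);           % illustrative data, one column per sample
    mX = mean(X, 2);  mY = mean(Y, 2);          % sample means
    RX  = (X - mX)*(X - mX)'/(N - 1);           % sample covariance of X
    RY  = (Y - mY)*(Y - mY)'/(N - 1);           % sample covariance of Y
    RXY = (X - mX)*(Y - mY)'/(N - 1);           % sample cross-covariance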

Example of LMSE estimation (1/2)

Let $Y_i$, $i = 1,\dots,n$, be rvs such that

$Y_i = u_i X + \varepsilon_i$

where
- $X$ is a rv with mean $m_X$ and variance $\sigma_X^2$;
- $u_i$ are known coefficients;
- $\varepsilon_i$ are independent rvs with zero mean and variance $\sigma_i^2$.

One has $Y = UX + \varepsilon$, where $U = (u_1\ u_2\ \dots\ u_n)'$ and $E[\varepsilon\varepsilon'] = \Sigma_\varepsilon = \mathrm{diag}\{\sigma_i^2\}$.

We want to compute the LMSE estimate $\hat X_{LMSE} = m_X + R_{XY} R_Y^{-1}(Y - m_Y)$.

Example of LMSE estimation (2/2)

- $m_Y = E[Y] = U m_X$
- $R_{XY} = E[(X - m_X)(Y - U m_X)'] = \sigma_X^2\,U'$
- $R_Y = E[(Y - U m_X)(Y - U m_X)'] = U\sigma_X^2 U' + \Sigma_\varepsilon$

Being $\left(U\sigma_X^2 U' + \Sigma_\varepsilon\right)^{-1} = \Sigma_\varepsilon^{-1} - \Sigma_\varepsilon^{-1}U\left(U'\Sigma_\varepsilon^{-1}U + \frac{1}{\sigma_X^2}\right)^{-1}U'\Sigma_\varepsilon^{-1}$, one gets

$\hat X_{LMSE} = \dfrac{U'\Sigma_\varepsilon^{-1}Y + \frac{1}{\sigma_X^2}\,m_X}{U'\Sigma_\varepsilon^{-1}U + \frac{1}{\sigma_X^2}}$

Special case: $U = (1\ 1\ \dots\ 1)'$ (i.e., $Y_i = X + \varepsilon_i$):

$\hat X_{LMSE} = \dfrac{\sum_{i=1}^{n} \frac{Y_i}{\sigma_i^2} + \frac{m_X}{\sigma_X^2}}{\sum_{i=1}^{n} \frac{1}{\sigma_i^2} + \frac{1}{\sigma_X^2}}$

Remark: the a priori info on $X$ is treated as additional data.

Example of Bayesian estimation (1/2)

Let $X$ and $Y$ be two rvs whose joint pdf is

$f_{X,Y}(x,y) = \begin{cases} -\frac{3}{2}x^2 + 2xy & 0 \le x \le 1,\ 1 \le y \le 2 \\ 0 & \text{else} \end{cases}$

We want to find the estimates $\hat X_{MSE}$ and $\hat X_{LMSE}$ of $X$, based on one observation of the rv $Y$.

Solutions:

$\hat X_{MSE} = \dfrac{\frac{2}{3}y - \frac{3}{8}}{y - \frac{1}{2}}, \qquad \hat X_{LMSE} = \frac{1}{22}\,y + \frac{73}{132}$

See MATLAB file: Es bayes.m
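
A minimal sketch of how these estimates can be reproduced numerically (this is not the original Es bayes.m, whose content is not shown here): the conditional mean is obtained by integrating over x on a grid and compared with the linear estimate.

    % Numerical computation of Xhat_MSE(y) = E[X|Y=y] for the joint pdf above
    f = @(x,y) 2*x.*y - 1.5*x.^2;          % joint pdf on [0,1] x [1,2]
    x = linspace(0, 1, 501)';              % integration grid for x
    y = linspace(1, 2, 501);               % values of the observation y
    num = trapz(x, x.*f(x,y));             % int x f(x,y) dx, one value per y
    den = trapz(x, f(x,y));                % f_Y(y)
    xMSE  = num./den;                      % E[X|Y=y]
    xLMSE = y/22 + 73/132;                 % linear estimate
    plot(y, xMSE, 'r', y, xLMSE, 'g'); xlabel('y'); legend('MSE','LMSE');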

Example of Bayesian estimation (2/2)

[Figures: left, the joint pdf $f_{X,Y}(x,y)$ over $0 \le x \le 1$, $1 \le y \le 2$; right, the estimates $\hat X_{MSE}(y)$ (red) and $\hat X_{LMSE}(y)$ (green) as functions of $y$, together with the constant $E[X]$ (blue).]