Estimation Theory Fredrik Rusek Chapters 3.5-3.10

Recap. We deal with unbiased estimators of deterministic parameters. The performance of an estimator is measured by the variance of the estimate (due to the unbiased condition). If an estimator has lower variance for all values of the parameter to estimate, it is the minimum variance unbiased (MVU) estimator. If the estimator is biased, one cannot define any concept of an estimator that is optimal for all values of the parameter. The smallest possible variance is determined by the CRLB. If the CRLB is attained for all values of the parameter, the estimator is efficient. The CRLB provides us with one method to find the MVU estimator: efficient implies MVU, but the converse is not true. To derive the CRLB, one must know the likelihood function.

Recap Chapter 3 Cramer-Rao lower bound

Fisher Information. The quantity I(θ) = -E[∂² ln p(x; θ)/∂θ²] qualifies as an information measure: 1. The more information, the smaller the variance. 2. It is nonnegative: I(θ) ≥ 0. 3. It is additive for independent observations.

Fisher Information. Property 3, additivity for independent observations, is important. For independent observations I(θ) = Σ_{n=0}^{N-1} I_n(θ), and for IID observations x[n] we get I(θ) = N i(θ), where i(θ) is the Fisher information of a single observation. Further verification of the information measure: if x[0] = x[1] = ... = x[N-1], then I(θ) = i(θ), so repeated copies of the same sample add no information.
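As a quick numerical check of the additivity property, the short sketch below (an added illustration, not from the slides; the values A = 1, σ² = 2, N = 10 are assumed) estimates the Fisher information of the DC level A from N IID Gaussian samples by Monte Carlo averaging of the squared score and compares it with N/σ².

import numpy as np

# Monte Carlo check that Fisher information is additive: for N IID samples
# x[n] = A + w[n], w[n] ~ N(0, sigma2), the information about A is N/sigma2.
rng = np.random.default_rng(0)
A, sigma2, N, trials = 1.0, 2.0, 10, 200_000

x = A + np.sqrt(sigma2) * rng.standard_normal((trials, N))
score = np.sum(x - A, axis=1) / sigma2        # d/dA of ln p(x; A) for the Gaussian likelihood
I_mc = np.mean(score**2)                      # E[score^2] estimates I(A)
print(I_mc, N / sigma2)                       # both approximately 5.0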

Section 3.5: CRLB for signals in white Gaussian noise. White Gaussian noise is common, so it is handy to derive the CRLB explicitly for this case. Signal model: x[n] = s[n; θ] + w[n], n = 0, ..., N-1, with w[n] white Gaussian noise of variance σ². Likelihood: ln p(x; θ) = -(N/2) ln(2πσ²) - (1/(2σ²)) Σ (x[n] - s[n; θ])². Take one differential: ∂ ln p/∂θ = (1/σ²) Σ (x[n] - s[n; θ]) ∂s[n; θ]/∂θ. Take one more: ∂² ln p/∂θ² = (1/σ²) Σ [(x[n] - s[n; θ]) ∂²s[n; θ]/∂θ² - (∂s[n; θ]/∂θ)²]. Take the expectation: E[∂² ln p/∂θ²] = -(1/σ²) Σ (∂s[n; θ]/∂θ)². Conclude: I(θ) = (1/σ²) Σ (∂s[n; θ]/∂θ)², so var(θ̂) ≥ σ² / Σ_{n=0}^{N-1} (∂s[n; θ]/∂θ)².
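To make the conclusion concrete, here is a small sketch (my own illustration; the signal model and the values σ² = 2, N = 100 are assumed) that evaluates the bound σ² / Σ (∂s[n; θ]/∂θ)² numerically with a finite-difference derivative of the signal.

import numpy as np

def crlb_wgn(s, theta, sigma2, N, d=1e-6):
    # CRLB for a scalar parameter of a signal s(n, theta) observed in white
    # Gaussian noise of variance sigma2: sigma2 / sum_n (ds[n; theta]/dtheta)^2.
    n = np.arange(N)
    ds = (s(n, theta + d) - s(n, theta - d)) / (2 * d)   # finite-difference derivative
    return sigma2 / np.sum(ds**2)

# Sanity check with the DC level s[n; A] = A: the bound is sigma2 / N.
print(crlb_wgn(lambda n, A: A * np.ones_like(n, dtype=float), 1.0, 2.0, 100))   # 0.02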

Example 3.5: sinusoidal frequency estimation in white noise. Signal model: x[n] = A cos(2πf0 n + φ) + w[n], with A and φ known. Derive the CRLB for f0. We have from the previous slides (with θ = f0) that I(f0) = (1/σ²) Σ (∂s[n; f0]/∂f0)², and ∂s[n; f0]/∂f0 = -2πnA sin(2πf0 n + φ). Which yields var(f̂0) ≥ σ² / (A² Σ_{n=0}^{N-1} [2πn sin(2πf0 n + φ)]²).
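The same routine idea applied to the sinusoidal model gives a quick look at how fast the bound shrinks with N; the values A = 1, φ = 0, f0 = 0.1, σ² = 0.5 below are assumed for illustration only.

import numpy as np

# CRLB for f0 of s[n] = A*cos(2*pi*f0*n + phi) in white Gaussian noise:
# sigma2 / (A^2 * sum_n (2*pi*n*sin(2*pi*f0*n + phi))^2).
A, phi, f0, sigma2 = 1.0, 0.0, 0.1, 0.5

def crlb_f0(N):
    n = np.arange(N)
    return sigma2 / (A**2 * np.sum((2 * np.pi * n * np.sin(2 * np.pi * f0 * n + phi))**2))

for N in (20, 40, 80):
    print(N, crlb_f0(N))    # the bound shrinks roughly as 1/N^3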

Section 3.6: transformation of parameters. Now assume that we are interested in estimating α = g(θ). We already proved the CRLB for this case: var(α̂) ≥ (∂g(θ)/∂θ)² / I(θ). Question: if we estimate θ first, can we then estimate α as α̂ = g(θ̂)?

Section 3.6: transformation of parameters. This works for linear transformations g(x) = ax + b, provided the estimator θ̂ of θ is efficient. Choose the estimator of g(θ) as ĝ = aθ̂ + b. We have E[ĝ] = aE[θ̂] + b = aθ + b = g(θ), so the estimator is unbiased. The CRLB states var(ĝ) ≥ (∂g/∂θ)² / I(θ) = a² / I(θ), but var(aθ̂ + b) = a² var(θ̂) = a² / I(θ), so we reach the CRLB. Efficiency is preserved for affine transformations!

Section 3.6: transformation of parameters. Now move on to non-linear transformations. Recall the DC level in white noise example; now seek the CRLB for the power A². We have g(A) = A², so var(Â²) ≥ (∂g/∂A)² / I(A) = (2A)² σ²/N = 4A²σ²/N.

Section 3.6: transformation of parameters. Recall: the sample mean estimator x̄ is efficient for the DC level A. Question: is x̄² efficient for A²? NO! It is not even an unbiased estimator, since E[x̄²] = A² + σ²/N ≠ A². Efficiency is in general destroyed by non-linear transformations!

Section 3.6: transformation of parameters. Non-linear transformations with large data records: take a look at the bias again. Since E[x̄²] = A² + σ²/N, the square of the sample mean is asymptotically unbiased, i.e., unbiased as N grows large. Further, var(x̄²) = 4A²σ²/N + 2σ⁴/N², while the CRLB states var(Â²) ≥ 4A²σ²/N. Hence, the estimator x̄² of A² is asymptotically efficient.
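A brief Monte Carlo sketch (my own, with assumed values A = 1, σ² = 2) illustrating both statements: the bias σ²/N of x̄² vanishes as N grows, and N·var(x̄²) approaches 4A²σ², i.e., N times the CRLB.

import numpy as np

# Monte Carlo: bias and scaled variance of (sample mean)^2 as an estimator of A^2.
rng = np.random.default_rng(1)
A, sigma2, trials = 1.0, 2.0, 20_000

for N in (5, 50, 500):
    x = A + np.sqrt(sigma2) * rng.standard_normal((trials, N))
    est = x.mean(axis=1)**2
    print(N,
          est.mean() - A**2,     # bias, close to sigma2/N
          N * est.var())         # approaches 4*A^2*sigma2 = N times the CRLB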

Section 3.6: transformation of parameters. Why is the estimator α̂ = g(θ̂) asymptotically efficient? When the data record grows, the data statistic* becomes more concentrated around a stable value. We can linearize g(x) around this value, and as the data record grows large the nonlinear region is seldom visited. *Definition of a statistic: a function of the data observations used to estimate the parameters of interest.

Section 3.6: transformation of parameters. Why is the estimator α̂ = g(θ̂) asymptotically efficient? Linearization at A: g(x̄) ≈ g(A) + g'(A)(x̄ - A). Expectation: E[g(x̄)] ≈ g(A), so the linearized estimator is unbiased! Variance: var(g(x̄)) ≈ [g'(A)]² var(x̄) = [g'(A)]² / I(A), which equals the CRLB for g(A), so it is efficient!

Section 3.7: Multi-variable CRLB

Section 3.7: Multi-variable CRLB. Same regularity condition as before, E[∂ ln p(x; θ)/∂θ] = 0 for all θ, satisfied (in general) when integration and differentiation can be interchanged.

Section 3.7: Multi-variable CRLB. The covariance matrix of the estimator minus the inverse of the Fisher information matrix is positive semi-definite: C_θ̂ - I⁻¹(θ) ⪰ 0. This will soon be simplified a bit.

Section 3.7: Multi-variable CRLB. We can find the MVU estimator in this case as well.

Section 3.7: Multi-variable CRLB. Let us start with an implication of the fact that C_θ̂ - I⁻¹(θ) is positive semi-definite. The diagonal elements of a positive semi-definite matrix are non-negative. Therefore [C_θ̂ - I⁻¹(θ)]_ii ≥ 0, and consequently var(θ̂_i) ≥ [I⁻¹(θ)]_ii.

Proof. Regularity condition: same story as for the single-parameter case. Start by differentiating the unbiasedness conditions, and then add 0 via the regularity condition. For i ≠ j we have the corresponding cross condition. Combining into matrix form (an r×1 vector times a 1×p vector, giving an r×p matrix), we obtain the matrix identity used below.

Proof. Regularity condition: same story as for the single-parameter case. Now premultiply the matrix identity with aᵀ and postmultiply with b to obtain a scalar relation, and then apply the Cauchy-Schwarz (C-S) inequality.

Proof. Regularity condition: same story as for the single-parameter case. Note that the expectation appearing on the right-hand side is the outer-product (first-derivative) form of the Fisher information matrix rather than the second-derivative form, but since the two forms are equal this is ok.

Proof. The vectors a and b are arbitrary. Now select b as b = I⁻¹(θ) (∂g(θ)/∂θ)ᵀ a. Some manipulations then yield aᵀ C_ĝ a ≥ aᵀ (∂g(θ)/∂θ) I⁻¹(θ) (∂g(θ)/∂θ)ᵀ a. For later use, we must now prove that the Fisher information matrix I(θ) is positive semidefinite.

Interlude (proof that the Fisher information matrix is positive semidefinite). We have I(θ) = E[(∂ ln p(x; θ)/∂θ)(∂ ln p(x; θ)/∂θ)ᵀ], so for any vector d, dᵀ I(θ) d = E[(dᵀ ∂ ln p(x; θ)/∂θ)²] ≥ 0. But this holds for every d, which is exactly the condition for positive semi-definiteness.

Proof. Now return to the proof; we had aᵀ C_ĝ a ≥ aᵀ (∂g(θ)/∂θ) I⁻¹(θ) (∂g(θ)/∂θ)ᵀ a, where I(θ), hence I⁻¹(θ), hence the sandwiched matrix (∂g(θ)/∂θ) I⁻¹(θ) (∂g(θ)/∂θ)ᵀ are all positive semi-definite. We therefore get aᵀ [C_ĝ - (∂g(θ)/∂θ) I⁻¹(θ) (∂g(θ)/∂θ)ᵀ] a ≥ 0, and since a is arbitrary, C_ĝ - (∂g(θ)/∂θ) I⁻¹(θ) (∂g(θ)/∂θ)ᵀ must be positive semi-definite.

Proof. Last part: the condition for equality in the Cauchy-Schwarz step. However, a is arbitrary, so the equality condition must hold for every a, i.e., element-wise.

Proof. Each element of the resulting vector equality is examined in turn. Taking one more differential (just as in the single-parameter case) via the chain rule, and then the expectation, leaves only the k = j term; together with the definition of the Fisher information and the relation obtained above, this identifies the proportionality factor and completes the condition for equality.

Example 3.6. DC level in white Gaussian noise with unknown noise density: estimate both A and σ². The Fisher information matrix is I(θ) = [N/σ², 0; 0, N/(2σ⁴)]; the required derivatives of ln p(x; A, σ²) with respect to A and σ² follow directly from the Gaussian likelihood. What are the implications of a diagonal Fisher information matrix? The bounds become [I⁻¹(θ)]₁₁ = σ²/N and [I⁻¹(θ)]₂₂ = 2σ⁴/N, but this is precisely what one would have obtained if only one parameter was unknown.

Example 3.6. DC level in white Gaussian noise with unknown density: estimate both A and σ². A diagonal Fisher information matrix means that the estimation problems decouple: the estimation errors are uncorrelated (for an efficient estimator), and the quality of each estimate is not degraded when the other parameters are unknown.
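A small sketch (assumed values N = 100, σ² = 2) computing the Fisher matrix of Example 3.6 and its inverse; since the matrix is diagonal, the per-parameter bounds coincide with the single-parameter bounds σ²/N and 2σ⁴/N.

import numpy as np

N, sigma2 = 100, 2.0

# Fisher information matrix for (A, sigma^2) of a DC level in WGN (Example 3.6).
I = np.array([[N / sigma2, 0.0],
              [0.0, N / (2 * sigma2**2)]])
crlb = np.diag(np.linalg.inv(I))
print(crlb)                           # [sigma2/N, 2*sigma2^2/N] = [0.02, 0.08]
print(sigma2 / N, 2 * sigma2**2 / N)  # single-parameter bounds: identical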

Example 3.7. Line fitting problem: x[n] = A + Bn + w[n], n = 0, ..., N-1. Estimate A and B from the data x[n]. The Fisher information matrix is I(θ) = (1/σ²) [N, Σn; Σn, Σn²], with the sums over n = 0, ..., N-1. In this case the quality of the estimate of A is degraded when B is unknown: [I⁻¹(θ)]₁₁ = 2(2N-1)σ²/(N(N+1)), which is about 4 times σ²/N when N is large.
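For comparison with Example 3.6, the sketch below (assumed N = 100, σ² = 1) builds the line-fitting Fisher matrix; here the matrix is not diagonal, and the bound on A is roughly 4 times larger than the σ²/N obtained when B is known.

import numpy as np

N, sigma2 = 100, 1.0
n = np.arange(N)

# Fisher information matrix for (A, B) in x[n] = A + B*n + w[n] (Example 3.7).
I = (1 / sigma2) * np.array([[N, n.sum()],
                             [n.sum(), (n**2).sum()]])
crlb_A = np.linalg.inv(I)[0, 0]
print(crlb_A, sigma2 / N, crlb_A / (sigma2 / N))   # ratio close to 4 for large N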

Example 3.7. How do we find the MVU estimator? Use the CRLB!

Example 3.7. Find the MVU estimator from the equality condition ∂ ln p(x; θ)/∂θ = I(θ)(g(x) - θ). It is not easy to see this directly, but the score can be written in exactly this form, where the matrix factor has no dependency on x[n] and g(x) has no dependency on the parameters to estimate; g(x) is therefore the MVU (efficient) estimator.

Example 3.8. DC level in Gaussian noise with unknown density. Estimate the SNR α = A²/σ². From the proof of the CRLB, the bound for a transformed parameter is var(α̂) ≥ (∂g(θ)/∂θ) I⁻¹(θ) (∂g(θ)/∂θ)ᵀ, with the Fisher information matrix of Example 3.6. With ∂g(θ)/∂θ = [2A/σ², -A²/σ⁴], this gives var(α̂) ≥ (4α + 2α²)/N.
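The SNR bound can be reproduced numerically, as in this sketch (assumed A = 1, σ² = 2, N = 100): form the gradient of g(A, σ²) = A²/σ², sandwich the inverse Fisher matrix from Example 3.6, and compare with the closed form (4α + 2α²)/N.

import numpy as np

A, sigma2, N = 1.0, 2.0, 100
alpha = A**2 / sigma2                        # the SNR to be estimated

I = np.array([[N / sigma2, 0.0],
              [0.0, N / (2 * sigma2**2)]])   # Fisher matrix from Example 3.6
grad = np.array([2 * A / sigma2, -A**2 / sigma2**2])   # d(A^2/sigma^2)/d(A, sigma^2)

crlb = grad @ np.linalg.inv(I) @ grad
print(crlb, (4 * alpha + 2 * alpha**2) / N)            # identical: 0.025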

Section 3.9: Derivation of the Fisher matrix for Gaussian signals. A common case is that the received signal is Gaussian, x ~ N(μ(θ), C(θ)). It is convenient to derive formulas for this case (see the appendix for the proof): [I(θ)]_ij = [∂μ(θ)/∂θ_i]ᵀ C⁻¹(θ) [∂μ(θ)/∂θ_j] + (1/2) tr[C⁻¹(θ) (∂C(θ)/∂θ_i) C⁻¹(θ) (∂C(θ)/∂θ_j)]. When the covariance does not depend on θ, the second term vanishes, and one reaches [I(θ)]_ij = [∂μ(θ)/∂θ_i]ᵀ C⁻¹ [∂μ(θ)/∂θ_j].
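The general formula can be sketched in a few lines (my own illustration; the helper gaussian_fim, the finite-difference derivatives, and the test values are assumptions for demonstration). Applied to the model of Example 3.6 it reproduces the diagonal Fisher matrix.

import numpy as np

def gaussian_fim(mu, C, theta, d=1e-6):
    # Fisher matrix for x ~ N(mu(theta), C(theta)):
    # I_ij = dmu_i^T C^-1 dmu_j + 0.5 * tr(C^-1 dC_i C^-1 dC_j),
    # with the derivatives taken by central finite differences.
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    Cinv = np.linalg.inv(C(theta))
    dmu, dC = [], []
    for i in range(p):
        e = np.zeros(p)
        e[i] = d
        dmu.append((mu(theta + e) - mu(theta - e)) / (2 * d))
        dC.append((C(theta + e) - C(theta - e)) / (2 * d))
    I = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            I[i, j] = dmu[i] @ Cinv @ dmu[j] + 0.5 * np.trace(Cinv @ dC[i] @ Cinv @ dC[j])
    return I

# Reproduces Example 3.6: theta = (A, sigma2), mu(theta) = A*ones, C(theta) = sigma2*I.
N = 100
print(gaussian_fim(lambda t: t[0] * np.ones(N),
                   lambda t: t[1] * np.eye(N),
                   [1.0, 2.0]))
# approximately [[N/sigma2, 0], [0, N/(2*sigma2^2)]] = [[50, 0], [0, 12.5]]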

Example 3.14. Consider the estimation of A, f0, and φ for the sinusoid s[n] = A cos(2πf0 n + φ) in white Gaussian noise. What is the interpretation of the structure of the resulting bounds? The amplitude and phase bounds decay as 1/N, whereas the frequency-estimation bound decays as 1/N³.

Section 3.10: Asymptotic CRLB for Gaussian WSS processes. For Gaussian WSS processes (whose first- and second-order statistics are constant over time), the elements of the Fisher matrix can be found easily as [I(θ)]_ij = (N/2) ∫_{-1/2}^{1/2} [∂ ln P_xx(f; θ)/∂θ_i] [∂ ln P_xx(f; θ)/∂θ_j] df, where P_xx(f; θ) is the PSD of the process and N (the observation length) grows unbounded. This is widely used in, e.g., ISI problems.

Chapter 4: The linear model. Definition: x = Hθ + w, where w ~ N(0, σ²I) and H is a known observation matrix. This is the linear model; note that in this book the noise is white Gaussian.

Chapter 4: The linear model. Let us now find the MVU estimator. How to proceed? Write the score ∂ ln p(x; θ)/∂θ in the CRLB equality form I(θ)(g(x) - θ). Conclusion 1: the MVU estimator (which is efficient) is θ̂ = (HᵀH)⁻¹Hᵀx. Conclusion 2: its statistical performance is θ̂ ~ N(θ, σ²(HᵀH)⁻¹), i.e., the covariance is C_θ̂ = σ²(HᵀH)⁻¹ = I⁻¹(θ).

Chapter 4: The linear model. Example 4.1: curve fitting. The task is to fit the data samples with a second-order polynomial, x[n] = θ₁ + θ₂n + θ₃n² + w[n]. We can write this as x = Hθ + w, with rows [1, n, n²] in H, and the (MVU) estimator is θ̂ = (HᵀH)⁻¹Hᵀx.
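A sketch of the curve-fitting estimator under assumed simulation values (true θ = (1, 0.5, -0.02), σ² = 0.25, N = 50): build H with rows [1, n, n²], apply θ̂ = (HᵀH)⁻¹Hᵀx, and read off the estimator standard deviations from σ²(HᵀH)⁻¹.

import numpy as np

rng = np.random.default_rng(2)
N, sigma2 = 50, 0.25
theta_true = np.array([1.0, 0.5, -0.02])

n = np.arange(N)
H = np.column_stack([np.ones(N), n, n**2])       # second-order polynomial observation matrix
x = H @ theta_true + np.sqrt(sigma2) * rng.standard_normal(N)

theta_hat = np.linalg.solve(H.T @ H, H.T @ x)    # MVU estimator (H^T H)^-1 H^T x
cov = sigma2 * np.linalg.inv(H.T @ H)            # estimator covariance = CRLB
print(theta_hat, np.sqrt(np.diag(cov)))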

Chapter 4: The linear model. Section 4.5: the extended linear model. Now assume that the noise is not white, so w ~ N(0, C). Further assume that the data contains a known part s, so that we have x = Hθ + s + w. We can transform this back to the white-noise linear model by applying the whitening transformation x' = D(x - s), where C⁻¹ = DᵀD.

Chapter 4: The linear model. Section 4.5: extended linear model. In general we have θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹(x - s), with covariance C_θ̂ = (HᵀC⁻¹H)⁻¹.
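A sketch (with an assumed exponential-correlation noise covariance and, for simplicity, known part s = 0) checking that whitening with D from the Cholesky factor of C⁻¹ followed by ordinary least squares reproduces the general estimator (HᵀC⁻¹H)⁻¹HᵀC⁻¹(x - s).

import numpy as np

rng = np.random.default_rng(3)
N = 40
n = np.arange(N)
C = 0.9 ** np.abs(n[:, None] - n[None, :])     # assumed colored-noise covariance
H = np.column_stack([np.ones(N), n])
s = np.zeros(N)                                # known part, zero here for simplicity
theta_true = np.array([1.0, -0.3])

w = np.linalg.cholesky(C) @ rng.standard_normal(N)   # noise with covariance C
x = H @ theta_true + s + w

Cinv = np.linalg.inv(C)
theta_gen = np.linalg.solve(H.T @ Cinv @ H, H.T @ Cinv @ (x - s))   # extended-model estimator

D = np.linalg.cholesky(Cinv).T                 # D^T D = C^-1, so D(x - s) has white noise
Hw, xw = D @ H, D @ (x - s)
theta_white = np.linalg.solve(Hw.T @ Hw, Hw.T @ xw)                 # ordinary LS after whitening
print(theta_gen, theta_white)                  # identical up to rounding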

Chapter 4: The linear model. Example: a signal transmitted over multiple antennas and received by multiple antennas. Assume that an unknown signal θ = [θ[0], ..., θ[n-1]]ᵀ is transmitted and that x[0], ..., x[n-1] are received over equally many antennas. All channel gains are assumed different due to the nature of radio propagation, so the channel matrix H is (with high probability) invertible. The linear model applies: x = Hθ + w. So the best (MVU) estimator is θ̂ = (HᵀH)⁻¹Hᵀx = H⁻¹x, which is the ZF (zero-forcing) equalizer in MIMO.
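A minimal sketch of the MIMO interpretation (an assumed 4×4 real-valued channel with IID Gaussian entries and noise standard deviation 0.1): since H is square and, with high probability, invertible, the MVU estimator (HᵀH)⁻¹Hᵀx coincides with the zero-forcing equalizer H⁻¹x.

import numpy as np

rng = np.random.default_rng(4)
p = 4                                           # number of transmit = receive antennas

H = rng.standard_normal((p, p))                 # assumed known channel matrix
theta = rng.standard_normal(p)                  # unknown transmitted signal
x = H @ theta + 0.1 * rng.standard_normal(p)    # received signal

theta_mvu = np.linalg.solve(H.T @ H, H.T @ x)   # (H^T H)^-1 H^T x
theta_zf = np.linalg.solve(H, x)                # zero-forcing equalizer H^-1 x
print(np.allclose(theta_mvu, theta_zf), theta_zf)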