Estimation techniques

March 2, 2006

Contents

1 Problem Statement
2 Bayesian Estimation Techniques
  2.1 Minimum Mean Squared Error (MMSE) estimation
    2.1.1 General formulation
    2.1.2 Important special case: Gaussian model
    2.1.3 Example: Linear Gaussian Model
  2.2 Linear estimation
  2.3 Linear MMSE Estimator
  2.4 Orthogonality Condition
    2.4.1 Principle
    2.4.2 Example: Linear Model
3 Non-Bayesian Estimation Techniques
  3.1 The BLU estimator
    3.1.1 Example: Linear Gaussian Model
  3.2 The LS estimator
  3.3 Orthogonality condition
4 Extension to vector estimation
5 Further problems
  5.1 Problem 1: relations for Linear Gaussian Models
    5.1.1 No a priori information: MMSE = LS
    5.1.2 Uncorrelated noise: BLU = LS
    5.1.3 Zero-noise: MMSE = LS
  5.2 Problem 2: orthogonality
  5.3 Practical example: multi-antenna communication
    5.3.1 Problem
    5.3.2 Solution

1 Problem Statement

We are interested in estimating a continuous scalar parameter $a \in \mathcal{A}$ from a vector observation $r$. (We will only consider estimation of scalars; the extension to vectors is straightforward.) The observation and the parameter are related through a probabilistic mapping $p_{R|A}(r \mid a)$. The parameter $a$ may or may not be a sample of a random variable; this leads to Bayesian and non-Bayesian estimators, respectively.

Special cases:

- Noisy observation model: $r = f(a, n)$, where $n$ (the noise component) is independent of $a$ and has a pdf $p_N(n)$.
- Linear model: $r = h a + n$ for a known $h$, with $n$ independent of $a$.
- Linear Gaussian model: linear model with $N \sim \mathcal{N}(0, \Sigma_N)$ and $A \sim \mathcal{N}(m_A, \Sigma_A)$.

2 Bayesian Estimation Techniques

Here, $a \in \mathcal{A}$ has a known a priori distribution $p_A(a)$. The most important Bayesian estimators are

- the MAP (maximum a posteriori) estimator,
- the MMSE (minimum mean squared error) estimator,
- the linear MMSE estimator.

The latter two will be covered in this section. We recall that the MAP estimate is given by

$$\hat{a}_{\mathrm{MAP}}(r) = \arg\max_{a \in \mathcal{A}} p_{A|R}(a \mid r).$$

2.1 Minimum Mean Squared Error (MMSE) estimation

2.1.1 General formulation

The MMSE estimator minimizes the expected squared estimation error

$$C = \int_{\mathcal{R}} \int_{\mathcal{A}} (a - \hat{a}(r))^2 \, p_{A|R}(a \mid r) \, p_R(r) \, da \, dr = E\{(A - \hat{a}(R))^2\}.$$

Taking the derivative (assuming this is possible) with respect to $\hat{a}(r)$ and setting the result to zero yields

$$\hat{a}_{\mathrm{MMSE}}(r) = \int_{\mathcal{A}} a \, p_{A|R}(a \mid r) \, da,$$

i.e., the MMSE estimate is the posterior mean of $A$ given $R = r$.
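As a quick illustration of this general formulation (a sketch added here, not part of the original notes), the integral above can be evaluated numerically when the posterior has no convenient closed form. The example below assumes a uniform prior on $a$ and a scalar noisy observation $r = a + n$ with Gaussian noise; all names and parameter values are illustrative.

```python
import numpy as np

# Hypothetical scalar example: a ~ Uniform(-1, 1), r = a + n, n ~ N(0, sigma^2).
# The MMSE estimate is the posterior mean, computed here on a grid of candidate
# values of a (the grid spacing cancels in the normalization).

sigma = 0.5
a_grid = np.linspace(-1.0, 1.0, 2001)       # support of the uniform prior
prior = np.ones_like(a_grid)                # flat prior on the grid

def mmse_estimate(r):
    lik = np.exp(-0.5 * (r - a_grid) ** 2 / sigma ** 2)  # p_{R|A}(r | a)
    post = lik * prior
    post /= post.sum()                      # normalized posterior on the grid
    return float(np.sum(a_grid * post))     # posterior mean = MMSE estimate

print(mmse_estimate(0.3))   # pulled towards 0 relative to the raw observation
```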

2.1.2 Important special case: Gaussian model

When $A$ and $R$ are jointly Gaussian, we can write

$$[A, R] \sim \mathcal{N}(m, \Sigma)$$

where

$$m = \begin{bmatrix} m_A \\ m_R \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_A & \Sigma_{AR}^T \\ \Sigma_{AR} & \Sigma_R \end{bmatrix}.$$

In that case it can be shown (after some straightforward manipulation) that

$$A \mid R \sim \mathcal{N}\left(m_A + \Sigma_{AR}^T \Sigma_R^{-1}(r - m_R),\; \Sigma_A - \Sigma_{AR}^T \Sigma_R^{-1} \Sigma_{AR}\right)$$

so that

$$\hat{a}_{\mathrm{MMSE}}(r) = m_A + \Sigma_{AR}^T \Sigma_R^{-1}(r - m_R). \tag{1}$$

Note that:

- A posteriori (after observing $r$), the uncertainty w.r.t. $a$ is reduced from $\Sigma_A$ to $\Sigma_A - \Sigma_{AR}^T \Sigma_R^{-1} \Sigma_{AR} < \Sigma_A$.
- When $A$ and $R$ are independent, $\Sigma_{AR} = 0$, so that $\hat{a}_{\mathrm{MMSE}}(r) = m_A$.
- For $A$ and $R$ jointly Gaussian, $\hat{a}_{\mathrm{MAP}}(r) = \hat{a}_{\mathrm{MMSE}}(r)$.
- The estimator is linear in the observation.

2.1.3 Example: Linear Gaussian Model

Problem: Derive the MMSE estimator for the Linear Gaussian Model.

Solution: We know that $R = hA + N$ with $A \sim \mathcal{N}(m_A, \Sigma_A)$, $N \sim \mathcal{N}(0, \Sigma_N)$, and $A$ and $N$ independent. We also know that the MMSE estimator is given by

$$\hat{a}_{\mathrm{MMSE}}(r) = m_A + \Sigma_{AR}^T \Sigma_R^{-1}(r - m_R).$$

Let us compute $m_R$, $\Sigma_R$ and $\Sigma_{AR}$:

$$m_R = E\{R\} = h m_A$$

$$\Sigma_{AR} = E\{(R - m_R)(A - m_A)\} = E\{(hA + N - h m_A)(A - m_A)\} = E\{(h(A - m_A) + N)(A - m_A)\} = h\, E\{(A - m_A)^2\} + E\{N(A - m_A)\} = h \Sigma_A$$

$$\Sigma_R = E\{(R - m_R)(R - m_R)^T\} = E\{(h(A - m_A) + N)(h(A - m_A) + N)^T\} = h\, E\{(A - m_A)^2\}\, h^T + E\{N N^T\} = h \Sigma_A h^T + \Sigma_N.$$

This leads to

$$\hat{a}_{\mathrm{MMSE}}(r) = m_A + \Sigma_{AR}^T \Sigma_R^{-1}(r - m_R) = m_A + \Sigma_A h^T \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} (r - h m_A).$$
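A small numerical sketch of this example (added for illustration; the dimensions and parameter values are arbitrary choices): it draws samples from the linear Gaussian model, applies the closed-form estimator above, and checks that the empirical MSE matches the posterior variance $\Sigma_A - \Sigma_{AR}^T \Sigma_R^{-1} \Sigma_{AR}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear Gaussian model: scalar a, 3-dimensional observation r = h*a + n.
h = np.array([1.0, 0.5, -0.3])
Sigma_A, m_A = 2.0, 1.0
Sigma_N = np.diag([0.4, 0.2, 0.6])

Sigma_R = Sigma_A * np.outer(h, h) + Sigma_N      # h Sigma_A h^T + Sigma_N
Sigma_AR = Sigma_A * h                            # h Sigma_A
gain = Sigma_AR @ np.linalg.inv(Sigma_R)          # Sigma_AR^T Sigma_R^{-1}

n_trials = 200_000
a = m_A + np.sqrt(Sigma_A) * rng.standard_normal(n_trials)
n = rng.multivariate_normal(np.zeros(3), Sigma_N, size=n_trials)
r = a[:, None] * h + n

a_hat = m_A + (r - h * m_A) @ gain                # MMSE estimate for every trial

print("empirical MSE :", np.mean((a_hat - a) ** 2))
print("posterior var :", Sigma_A - gain @ Sigma_AR)
```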

2.2 Linear estimation

In some cases, it is preferred to have an estimator which is a linear function of the observation:

$$\hat{a}(r) = b^T r + c,$$

so that $\hat{a}(r)$ is obtained through an affine transformation of the observation. Clearly, when $A$ and $R$ are jointly Gaussian, the MMSE estimator is a linear estimator with

$$b^T = \Sigma_{AR}^T \Sigma_R^{-1}, \qquad c = m_A - \Sigma_{AR}^T \Sigma_R^{-1} m_R.$$

2.3 Linear MMSE Estimator

When $A$ and $R$ are not jointly Gaussian, we can still use (1) as an estimator. This is known as the Linear MMSE (LMMSE) estimator:

$$\hat{a}_{\mathrm{LMMSE}}(r) = m_A + \Sigma_{AR}^T \Sigma_R^{-1}(r - m_R).$$

Theorem: We wish to find an estimator with the following properties:

- the estimator must be linear in the observation,
- $E\{\hat{a}(R)\} = m_A$ (this can be interpreted as a form of unbiasedness),
- the estimator has the minimum MSE among all linear estimators.

Then the estimator is the LMMSE estimator.

Proof: Given a generic linear estimator

$$\hat{a}(r) = b^T r + c \tag{2}$$

we would like to find $b$ and $c$ such that the resulting estimator is unbiased and has minimum variance. Clearly $E\{\hat{a}(R)\} = b^T m_R + c$, so that $b$ and $c$ have to satisfy

$$b^T m_R + c = m_A \quad \Rightarrow \quad c = m_A - b^T m_R \quad \Rightarrow \quad c^2 = m_A^2 + b^T m_R m_R^T b - 2 m_A b^T m_R.$$

We also know that

$$\Sigma_R = E\{(R - m_R)(R - m_R)^T\} = E\{R R^T\} - m_R m_R^T, \qquad \Sigma_{AR} = E\{(R - m_R)(A - m_A)\} = E\{R A\} - m_R m_A.$$

Let us now look at the variance of the estimation error (the estimation error is zero-mean):

$$
\begin{aligned}
V &= E\{(\hat{a}(R) - A)^2\} = E\{(b^T R + c - A)^2\} \\
  &= b^T E\{R R^T\} b + E\{A^2\} + c^2 + 2 c\, b^T m_R - 2 c\, m_A - 2 b^T E\{R A\} \\
  &= b^T \left(\Sigma_R + m_R m_R^T\right) b + \Sigma_A + m_A^2 + c^2 + 2 c \left(b^T m_R - m_A\right) - 2 b^T \left(\Sigma_{AR} + m_R m_A\right) \\
  &= b^T \left(\Sigma_R + m_R m_R^T\right) b + \Sigma_A + m_A^2 - c^2 - 2 b^T \left(\Sigma_{AR} + m_R m_A\right) \\
  &= b^T \Sigma_R b + \Sigma_A - 2 b^T \Sigma_{AR},
\end{aligned}
$$

where we have used the fact that $c^2 = m_A^2 + b^T m_R m_R^T b - 2 m_A b^T m_R$. Taking the derivative w.r.t. $b$ and equating the result to zero (this requires some vector calculus: differentiation of $x^T A x$ w.r.t. $x$ gives $2Ax$, and of $x^T y$ gives $y$) yields

$$2 \Sigma_R b - 2 \Sigma_{AR} = 0,$$

thus (since $\Sigma_R$, and hence $\Sigma_R^{-1}$, is symmetric)

$$b = \Sigma_R^{-1} \Sigma_{AR} \tag{3}$$

and

$$c = m_A - b^T m_R = m_A - \Sigma_{AR}^T \Sigma_R^{-1} m_R. \tag{4}$$

Substitution of (3) and (4) into (2) gives us the final result

$$\hat{a}(r) = m_A + \Sigma_{AR}^T \Sigma_R^{-1}(r - m_R),$$

which is clearly the desired result. QED

2.4 Orthogonality Condition

2.4.1 Principle

The orthogonality condition is useful in deriving (L)MMSE estimators.

Theorem: for LMMSE estimation, the estimation error is (statistically) orthogonal to the observation when $m_R = 0$ and $m_A = 0$:

$$E\{R(\hat{a}(R) - A)\} = 0.$$

Proof:

$$E\{R(\hat{a}(R) - A)\} = E\{R(\Sigma_{AR}^T \Sigma_R^{-1} R - A)\} = E\{R R^T\}\left(\Sigma_R^{-1} \Sigma_{AR}\right) - E\{R A\} = \Sigma_R \left(\Sigma_R^{-1} \Sigma_{AR}\right) - \Sigma_{AR} = 0.$$

QED.
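A quick numerical illustration of the orthogonality condition (a sketch added here, not in the original notes; the zero-mean jointly Gaussian model below is an arbitrary example): the sample average of $R(\hat{a}(R) - A)$ should be close to the zero vector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Zero-mean jointly Gaussian (A, R) with R = h*A + N, chosen only for illustration.
h = np.array([0.8, -1.2])
Sigma_A = 1.5
Sigma_N = np.diag([0.3, 0.7])

Sigma_R = Sigma_A * np.outer(h, h) + Sigma_N
Sigma_AR = Sigma_A * h
gain = Sigma_AR @ np.linalg.inv(Sigma_R)       # LMMSE gain Sigma_AR^T Sigma_R^{-1}

n_trials = 500_000
a = np.sqrt(Sigma_A) * rng.standard_normal(n_trials)
r = a[:, None] * h + rng.multivariate_normal(np.zeros(2), Sigma_N, size=n_trials)

err = r @ gain - a                             # estimation error a_hat(R) - A
print("E{R (a_hat - A)} ≈", (r * err[:, None]).mean(axis=0))   # close to [0, 0]
```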

2.4.2 Example: Linear Model

Problem: Verify the orthogonality condition for the linear model, assuming $m_A = 0$.

Solution: For $m_A = 0$, we need to verify that $E\{R(\hat{a}(R) - A)\} = 0$, i.e., we need to verify that $E\{R \hat{a}(R)\} = E\{R A\} = h \Sigma_A$. Indeed,

$$
E\{R \hat{a}(R)\} = E\{R\, \Sigma_A h^T \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} R\}
= E\{R R^T\} \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} \Sigma_A h
= \left(h \Sigma_A h^T + \Sigma_N\right)\left(h \Sigma_A h^T + \Sigma_N\right)^{-1} \Sigma_A h
= \Sigma_A h.
$$

3 Non-Bayesian Estimation Techniques

The above techniques cannot be applied when we do not consider $A$ to be a random variable. Another approach is called for, where we would like unbiased estimators with small variances. The most important non-Bayesian estimators are

- the ML (maximum likelihood) estimator,
- the BLU (best linear unbiased) estimator,
- the LS (least squares) estimator.

The latter two will be covered in this section. All expectations are taken for a given value of $a$. We recall that the ML estimate is given by

$$\hat{a}_{\mathrm{ML}}(r) = \arg\max_{a \in \mathcal{A}} p_{R|A}(r \mid a).$$

3.1 The BLU estimator

As before, we consider a generic linear estimator:

$$\hat{a}(r) = b^T r + c.$$

To obtain an unbiased estimator, we need to assume that $c = 0$ and that $E\{R\} = h a$ for some known vector $h$. We introduce

$$\Sigma_R = E\{(R - E\{R\})(R - E\{R\})^T\} = E\{R R^T\} - a^2 h h^T,$$

which is assumed to be known to the estimator. Our goal is to find a $b$ that leads to an unbiased estimator with minimal variance for all $a$.

Unbiased: an unbiased estimator must satisfy $E\{\hat{a}(R)\} = a$ for all $a \in \mathcal{A}$, so that $b^T h = 1$.

Variance of the estimation error: the variance of the estimation error is given by

$$V_a = E\{(\hat{a}(R) - a)^2\} = E\{(b^T R - a)^2\} = E\{(b^T (R - h a))^2\} = b^T E\{(R - h a)(R - h a)^T\} b = b^T \Sigma_R b.$$

This leads us to the following optimization problem: find $b$ that minimizes $V_a$, subject to $b^T h = 1$. Using a Lagrange multiplier technique, we find we need to minimize

$$b^T \Sigma_R b - \lambda \left(b^T h - 1\right).$$

This leads to $b = \Sigma_R^{-1} h \lambda$ (absorbing the factor $1/2$ into $\lambda$), with the constraint $b^T h = 1$ giving rise to

$$\lambda\, h^T \Sigma_R^{-1} h = 1,$$

so that

$$b = \Sigma_R^{-1} h \left(h^T \Sigma_R^{-1} h\right)^{-1}$$

and

$$V_a = \left(h^T \Sigma_R^{-1} h\right)^{-1} h^T \Sigma_R^{-1} \Sigma_R \Sigma_R^{-1} h \left(h^T \Sigma_R^{-1} h\right)^{-1} = \left(h^T \Sigma_R^{-1} h\right)^{-1}.$$

3.1.1 Example: Linear Gaussian Model

Problem: derive the BLU estimator for the Linear Gaussian Model.

Solution:

$$\hat{a}_{\mathrm{BLU}}(r) = \left(h^T \Sigma_R^{-1} h\right)^{-1} h^T \Sigma_R^{-1} r,$$

where

$$\Sigma_R = E\{(R - m_R)(R - m_R)^T\} = E\{(R - h a)(R - h a)^T\} = E\{(h a + N - h a)(h a + N - h a)^T\} = \Sigma_N.$$

Hence

$$\hat{a}_{\mathrm{BLU}}(r) = \left(h^T \Sigma_N^{-1} h\right)^{-1} h^T \Sigma_N^{-1} r.$$
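A numerical sketch of the BLU estimator for this example (added for illustration; the vector $h$ and noise covariance below are arbitrary choices): for a fixed true value of $a$, the estimate is unbiased and its empirical variance matches $(h^T \Sigma_N^{-1} h)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative linear Gaussian model with a fixed (non-random) parameter a.
h = np.array([1.0, 2.0, -1.0])
Sigma_N = np.diag([0.5, 1.0, 0.25])
a_true = 0.7

Sigma_N_inv = np.linalg.inv(Sigma_N)
blu_gain = Sigma_N_inv @ h / (h @ Sigma_N_inv @ h)   # (h^T Sigma_N^-1 h)^-1 h^T Sigma_N^-1

n_trials = 200_000
r = a_true * h + rng.multivariate_normal(np.zeros(3), Sigma_N, size=n_trials)
a_hat = r @ blu_gain

print("mean of estimate      :", a_hat.mean())        # close to a_true (unbiased)
print("empirical variance    :", a_hat.var())
print("(h^T Sigma_N^-1 h)^-1 :", 1.0 / (h @ Sigma_N_inv @ h))
```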

3.2 The LS estimator

If we refer back to our problem statement, the LS estimator tries to find the $a$ that minimizes the distance between the observation and the reconstructed noiseless observation:

$$d = \| r - h(a, 0) \|^2.$$

When $h(a, 0) = h a$, we find that

$$d = \| r - h a \|^2 = r^T r + a^2 h^T h - 2 a\, r^T h.$$

Minimizing w.r.t. $a$ gives us

$$2 a\, h^T h - 2 r^T h = 0$$

and finally

$$\hat{a}_{\mathrm{LS}}(r) = \left(h^T h\right)^{-1} h^T r.$$

3.3 Orthogonality condition

The reconstruction error is orthogonal to the reconstructed noiseless observation $h a$, for any $a$:

$$\left(r - h \hat{a}_{\mathrm{LS}}(r)\right)^T h a = \left(r - h \left(h^T h\right)^{-1} h^T r\right)^T h a = \left(r^T h - r^T h\right) a = 0.$$

4 Extension to vector estimation

When

$$r = H a + n$$

with $r$ and $n$ being $N \times 1$ vectors, $H$ an $N \times M$ matrix, and $a$ an $M \times 1$ vector, the MMSE, LS and BLU estimates are given by

$$\hat{a}_{\mathrm{LMMSE}}(r) = m_A + \Sigma_A H^T \left(H \Sigma_A H^T + \Sigma_N\right)^{-1} (r - H m_A) = m_A + \left(H^T \Sigma_N^{-1} H + \Sigma_A^{-1}\right)^{-1} H^T \Sigma_N^{-1} (r - H m_A)$$

$$\hat{a}_{\mathrm{LS}}(r) = \left(H^T H\right)^{-1} H^T r$$

$$\hat{a}_{\mathrm{BLU}}(r) = \left(H^T \Sigma_N^{-1} H\right)^{-1} H^T \Sigma_N^{-1} r$$

5 Further problems

5.1 Problem 1: relations for Linear Gaussian Models

How are MMSE, LS and BLU related? Consider the scalar case. The estimators are given by

$$\hat{a}_{\mathrm{MMSE}}(r) = m_A + \Sigma_A h^T \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} (r - h m_A)$$

$$\hat{a}_{\mathrm{LS}}(r) = \left(h^T h\right)^{-1} h^T r$$

$$\hat{a}_{\mathrm{BLU}}(r) = \left(h^T \Sigma_N^{-1} h\right)^{-1} h^T \Sigma_N^{-1} r$$
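Before working through the limiting cases below, here is a small numerical sketch (added for illustration, with arbitrary parameter values) that evaluates all three estimators on the same data and compares their average squared errors; the MMSE estimator should come out best.

```python
import numpy as np

rng = np.random.default_rng(3)

# Scalar parameter a, vector observation r = h*a + n (illustrative values).
h = np.array([1.0, -0.5, 2.0])
m_A, Sigma_A = 0.5, 1.0
Sigma_N = np.diag([0.2, 0.8, 0.4])
Sigma_N_inv = np.linalg.inv(Sigma_N)

def mmse(r):
    gain = Sigma_A * h @ np.linalg.inv(Sigma_A * np.outer(h, h) + Sigma_N)
    return m_A + gain @ (r - h * m_A)

def ls(r):
    return (h @ r) / (h @ h)

def blu(r):
    return (h @ Sigma_N_inv @ r) / (h @ Sigma_N_inv @ h)

n_trials = 5_000
a = m_A + np.sqrt(Sigma_A) * rng.standard_normal(n_trials)
r = a[:, None] * h + rng.multivariate_normal(np.zeros(3), Sigma_N, size=n_trials)

for name, est in [("MMSE", mmse), ("LS", ls), ("BLU", blu)]:
    errs = np.array([est(ri) for ri in r]) - a
    print(name, "average squared error:", np.mean(errs ** 2))
```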

5.1.1 No a priori information: MMSE = LS

When $m_A = 0$ and $\Sigma_A^{-1} = 0$ (the prior carries no information, i.e., the prior variance grows without bound), the term $\Sigma_N$ becomes negligible compared with $h \Sigma_A h^T$ and we get

$$\hat{a}_{\mathrm{MMSE}}(r) = h^T \left(h h^T\right)^{-1} r = \left(h^T h\right)^{-1} h^T r = \hat{a}_{\mathrm{LS}}(r),$$

where the inverse of the rank-one matrix $h h^T$ is understood in the pseudo-inverse sense.

5.1.2 Uncorrelated noise: BLU = LS

When $\Sigma_N = \sigma^2 I$,

$$\hat{a}_{\mathrm{BLU}}(r) = \left(h^T h\right)^{-1} h^T r = \hat{a}_{\mathrm{LS}}(r).$$

5.1.3 Zero-noise: MMSE = LS

For the MMSE estimator, when $\Sigma_N = 0$,

$$\hat{a}_{\mathrm{MMSE}}(r) = m_A + h^T \left(h h^T\right)^{-1} (r - h m_A) = h^T \left(h h^T\right)^{-1} r = \hat{a}_{\mathrm{LS}}(r).$$

5.2 Problem 2: orthogonality

Problem: Derive the LMMSE estimator, starting from the orthogonality condition.

Solution: We introduce $Q = A - m_A$ and $S = R - m_R = R - h m_A$, and find the LMMSE estimate of $Q$ from the observation $S$. The orthogonality principle then tells us that

$$E\{S \hat{q}(S)\} = E\{S Q\} = h \Sigma_A$$

with, for some (as yet undetermined) $b$,

$$E\{S \hat{q}(S)\} = E\{S\, b^T S\} = \left(h \Sigma_A h^T + \Sigma_N\right) b,$$

so that

$$b = \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} \Sigma_A h.$$

Hence

$$\hat{q}_{\mathrm{LMMSE}}(s) = \Sigma_A h^T \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} s.$$

The estimate of $A$ is then given by

$$\hat{a}_{\mathrm{LMMSE}}(r) = \hat{q}_{\mathrm{LMMSE}}(r - m_R) + m_A = m_A + \Sigma_A h^T \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} (r - h m_A).$$

5.3 Practical example: multi-antenna communication

In this problem, we show how LMMSE and LS estimation can be used when ML or MAP detection leads to algorithms which are too complex to implement.

5.3.1 Problem

We are interested in the following problem: we transmit a vector of $n_T$ i.i.d. data symbols $a \in \mathcal{A} = \Omega^{n_T}$ over an AWGN channel. Here $n_T$ is the number of transmit antennas. The receiver is equipped with $n_R$ receive antennas. We can write the observation as

$$r = H a + n$$

where $H$ is a (known) $n_R \times n_T$ channel matrix and $N \sim \mathcal{N}(0, \sigma^2 I_{n_R})$. In any practical communication scheme, $\Omega$ is a finite set of size $M$ (e.g., BPSK signaling with $\Omega = \{-1, +1\}$). The symbols are i.i.d., uniformly distributed over $\Omega$.

1. Determine the ML and MAP estimates of $a$. Determine the complexity of the receiver (number of operations to estimate $a$) as a function of $n_T$.
2. How can LMMSE and LS estimation be used to estimate $a$? We assume $\Sigma_A = I_{n_T}$ and $m_A = 0$. Determine the complexity of the resulting estimators as a function of $n_T$.

5.3.2 Solution

Solution, part 1: The MAP and ML estimators are considered to be optimal in the case of estimating discrete parameters, in the sense of minimizing the error probability.

$$\hat{a}_{\mathrm{ML}}(r) = \arg\max_a p_{R|A}(r \mid a) = \arg\max_a \log p_{R|A}(r \mid a) = \arg\max_a \left(-\frac{1}{\sigma^2} \| r - H a \|^2\right) = \arg\min_a \| r - H a \|^2$$

$$\hat{a}_{\mathrm{MAP}}(r) = \arg\max_a p_{A|R}(a \mid r) = \arg\max_a \frac{p_{R|A}(r \mid a)\, p_A(a)}{p_R(r)} = \arg\max_a p_{R|A}(r \mid a) = \hat{a}_{\mathrm{ML}}(r)$$

Complexity: for each $a \in \Omega^{n_T}$, we need to compute $\| r - H a \|^2$. Hence, the complexity is exponential in the number of transmit antennas.

Special case: let us assume $\Omega = \{-1, +1\}$, $n_T = 1$ and $n_R > 1$. Then

$$\hat{a}_{\mathrm{ML}}(r) = \arg\max_{a \in \Omega} \left(a\, h^T r\right).$$

In this case we combine the observations from multiple receive antennas. Each observation is weighted with the channel gain. This means that when the channel is unreliable on a given antenna (so that $h_k$ is small), that antenna has only a small contribution to our decision statistic.
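A brute-force sketch of the ML detector described above (illustrative only; the channel and noise level are made up): it enumerates all $M^{n_T}$ candidate symbol vectors, which is exactly what makes the complexity exponential in $n_T$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)

n_t, n_r, sigma = 4, 6, 0.5
omega = (-1.0, +1.0)                               # BPSK alphabet

H = rng.standard_normal((n_r, n_t))
a_true = rng.choice(omega, size=n_t)
r = H @ a_true + sigma * rng.standard_normal(n_r)

# Exhaustive ML search: arg min over all M^{n_T} candidates of ||r - H a||^2.
candidates = list(itertools.product(omega, repeat=n_t))    # 2^4 = 16 vectors here
a_ml = min(candidates, key=lambda a: np.sum((r - H @ np.array(a)) ** 2))

print("number of candidates:", len(candidates))
print("true symbols :", a_true)
print("ML decision  :", np.array(a_ml))
```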

Solution, part 2: Let us temporarily forget that $a$ lives in a discrete space. We can then introduce the LMMSE and LS estimates as follows:

$$\hat{a}_{\mathrm{LMMSE}}(r) = H^T \left(H H^T + \sigma^2 I_{n_R}\right)^{-1} r$$

$$\hat{a}_{\mathrm{LS}}(r) = \left(H^T H\right)^{-1} H^T r.$$

Note that $\hat{a}_{\mathrm{LMMSE}}(r)$ and $\hat{a}_{\mathrm{LS}}(r)$ may not belong to $\Omega^{n_T}$. Hence, we need to quantize these estimates (mapping the estimate to the closest point in $\Omega^{n_T}$); this can be done on a symbol-per-symbol basis. Generally the LS estimator will have poor performance when $H^T H$ is close to singular. Note that the LMMSE estimator requires knowledge of $\sigma^2$, while the MAP and ML estimators do not.

Complexity: now the complexity is of the order $n_R n_T$ (since any matrix not depending on $r$ can be pre-computed by the receiver), which is linear in the number of transmit antennas.

Special case: (Comment: there was a mistake in the lecture; these are the correct results.) Let us assume that we use more transmit than receive antennas: $n_T > n_R$. In that case the $n_T \times n_T$ matrix $H^T H$ is necessarily singular, so that LS estimation will not work. The LMMSE estimator requires the inversion of $H H^T + \sigma^2 I_{n_R}$; when $\sigma^2 > 0$, this matrix is always invertible. This is a strange situation, where noise is actually helpful in the estimation process.
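To make part 2 concrete, here is a sketch of LMMSE detection followed by symbol-wise quantization (illustrative values, not from the original notes). It also checks the $n_T > n_R$ case, where $H^T H$ is singular and plain LS fails while the regularized LMMSE matrix stays invertible.

```python
import numpy as np

rng = np.random.default_rng(5)

# LMMSE detection followed by symbol-wise quantization (illustrative values).
n_t, n_r, sigma = 4, 6, 0.3
H = rng.standard_normal((n_r, n_t))
a_true = rng.choice([-1.0, +1.0], size=n_t)
r = H @ a_true + sigma * rng.standard_normal(n_r)

# LMMSE front-end H^T (H H^T + sigma^2 I)^{-1} r, then map each entry to the
# closest point of Omega = {-1, +1} (symbol-per-symbol quantization).
a_lmmse = H.T @ np.linalg.solve(H @ H.T + sigma**2 * np.eye(n_r), r)
print("true symbols   :", a_true)
print("LMMSE + slicing:", np.sign(a_lmmse))

# With more transmit than receive antennas, H^T H is singular and LS breaks down,
# while H H^T + sigma^2 I remains invertible for sigma^2 > 0.
H_fat = rng.standard_normal((2, 5))                      # n_R = 2 < n_T = 5
print("rank of H^T H  :", np.linalg.matrix_rank(H_fat.T @ H_fat), "(< n_T = 5)")
```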