Parameter Estimation


Parameter Estimation
Saravanan Vijayakumaran
sarva@ee.iitb.ac.in
Department of Electrical Engineering
Indian Institute of Technology Bombay
October 25, 2012

Motivation

System Model Used to Derive Optimal Receivers

y(t) = s(t) + n(t)

where s(t) is the transmitted signal, y(t) is the received signal at the channel output, and n(t) is noise.

This simplified system model does not account for
- propagation delay
- carrier frequency mismatch between transmitter and receiver
- clock frequency mismatch between transmitter and receiver

In short, lies! Why?

You want answers?

I want the truth!

You can't handle the truth!... right at the beginning of the course. Now you can.

Why Study the Simplified System Model?

y(t) = s(t) + n(t)

- Receivers estimate the propagation delay, carrier frequency and clock frequency before demodulation
- Once these unknown parameters are estimated, the simplified system model is valid
- Then why not study parameter estimation first?
  - Hypothesis testing is easier to learn than parameter estimation
  - Historical reasons

Unsimplifying the System Model: Effect of Propagation Delay

Consider a complex baseband signal

$$s(t) = \sum_{n=-\infty}^{\infty} b_n p(t - nT)$$

and the corresponding passband signal

$$s_p(t) = \mathrm{Re}\left[\sqrt{2}\, s(t) e^{j2\pi f_c t}\right].$$

After passing through a noisy channel which causes amplitude scaling and delay, we have

$$y_p(t) = A s_p(t - \tau) + n_p(t)$$

where A is an unknown amplitude, τ is an unknown delay and $n_p(t)$ is passband noise.

Unsimplifying the System Model: Effect of Propagation Delay

The delayed passband signal is

$$s_p(t - \tau) = \mathrm{Re}\left[\sqrt{2}\, s(t-\tau) e^{j2\pi f_c (t-\tau)}\right] = \mathrm{Re}\left[\sqrt{2}\, s(t-\tau) e^{j\theta} e^{j2\pi f_c t}\right]$$

where $\theta = -2\pi f_c \tau \bmod 2\pi$. For large $f_c$, θ is modeled as uniformly distributed over [0, 2π].

The complex baseband representation of the received signal is then

$$y(t) = A e^{j\theta} s(t - \tau) + n(t)$$

where n(t) is complex Gaussian noise.

Unsimplifying the System Model: Effect of Carrier Offset

The frequency of the local oscillator (LO) at the receiver differs from that of the transmitter.

Suppose the LO frequency at the transmitter is $f_c$, so that

$$s_p(t) = \mathrm{Re}\left[\sqrt{2}\, s(t) e^{j2\pi f_c t}\right],$$

and suppose that the LO frequency at the receiver is $f_c - \Delta f$.

The received passband signal is $y_p(t) = A s_p(t - \tau) + n_p(t)$, and the complex baseband representation of the received signal is then

$$y(t) = A e^{j(2\pi \Delta f\, t + \theta)} s(t - \tau) + n(t).$$

Unsimplifying the System Model: Effect of Clock Offset

The frequency of the clock at the receiver differs from that of the transmitter. The clock frequency determines the sampling instants at the matched filter output.

Suppose the symbol rate at the transmitter is $\frac{1}{T}$ symbols per second and the receiver sampling rate is $\frac{1+\delta}{T}$ symbols per second, where $|\delta| \ll 1$ and δ may be positive or negative.

The actual sampling instants and the ideal sampling instants will drift apart over time.

The Solution

Estimate the unknown parameters τ, θ, Δf and δ:
- Timing synchronization: estimation of τ
- Carrier synchronization: estimation of θ and Δf
- Clock synchronization: estimation of δ

Perform demodulation after synchronization.
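To make these unknown parameters concrete, here is a minimal sketch in Python/NumPy (not part of the original slides) that generates a sampled version of $y(t) = A e^{j(2\pi \Delta f\, t + \theta)} s(t-\tau) + n(t)$; the rectangular pulse, the QPSK symbol alphabet and every numerical value below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) values for the unknown parameters
T = 1.0                # symbol period (seconds)
L = 8                  # samples per symbol
A = 0.7                # unknown amplitude
theta = 1.2            # unknown carrier phase (radians)
df = 0.01 / T          # unknown carrier frequency offset (Hz)
tau = 3                # unknown delay, in samples (kept integer for simplicity)
sigma = 0.1            # noise standard deviation per real dimension

# QPSK symbols shaped with a rectangular pulse p(t): s(t) = sum_n b_n p(t - nT)
b = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, size=20)))
s = np.repeat(b, L)

# y(t) = A exp(j(2*pi*df*t + theta)) s(t - tau) + n(t), sampled at L samples per symbol
s_delayed = np.concatenate([np.zeros(tau, dtype=complex), s])
t = np.arange(len(s_delayed)) * (T / L)
n = sigma * (rng.standard_normal(len(t)) + 1j * rng.standard_normal(len(t)))
y = A * np.exp(1j * (2 * np.pi * df * t + theta)) * s_delayed + n

print(y[:4])
```

Synchronization amounts to estimating τ, θ, Δf (and δ) from samples like y before demodulating the symbols $b_n$.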

Parameter Estimation

Parameter Estimation

Hypothesis testing was about making a choice between discrete states of nature. Parameter or point estimation is about choosing from a continuum of possible states.

Example: Consider the complex baseband signal

$$y(t) = A e^{j\theta} s(t - \tau) + n(t).$$

- The phase θ can take any real value in the interval [0, 2π)
- The amplitude A can be any real number
- The delay τ can be any real number

System Model for Parameter Estimation

Consider a family of distributions

$$Y \sim P_\theta, \quad \theta \in \Lambda,$$

where the observation vector $Y \in \Gamma \subseteq \mathbb{R}^n$ for $n \in \mathbb{N}$, and $\Lambda \subseteq \mathbb{R}^m$ is the parameter space.

Example: Y = A + N where A is an unknown parameter and N is a standard Gaussian RV.

The goal of parameter estimation is to find θ given Y. An estimator is a function from the observation space to the parameter space,

$$\hat{\theta} : \Gamma \to \Lambda.$$

Which is the Optimal Estimator?

Assume there is a cost function C which quantifies the estimation error,

$$C : \Lambda \times \Lambda \to \mathbb{R},$$

such that C[a, θ] is the cost of estimating the true value θ as a.

Examples of cost functions:
- Squared error: $C[a, \theta] = (a - \theta)^2$
- Absolute error: $C[a, \theta] = |a - \theta|$
- Threshold error: $C[a, \theta] = \begin{cases} 0 & \text{if } |a - \theta| \le \Delta \\ 1 & \text{if } |a - \theta| > \Delta \end{cases}$
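As a minimal sketch (Python; the tolerance value delta = 0.1 is an assumption, not from the slides), the three cost functions can be written directly:

```python
def squared_error(a, theta):
    return (a - theta) ** 2

def absolute_error(a, theta):
    return abs(a - theta)

def threshold_error(a, theta, delta=0.1):
    # cost is 0 if the estimate is within delta of the true value, 1 otherwise
    return 0.0 if abs(a - theta) <= delta else 1.0
```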

Which is the Optimal Estimator?

With an estimator $\hat{\theta}$ we associate a conditional cost or risk, conditioned on θ:

$$R_\theta(\hat{\theta}) = E_\theta\left\{ C\left[\hat{\theta}(Y), \theta\right] \right\}.$$

Suppose that the parameter θ is the realization of a random variable Θ. The average risk or Bayes risk is given by

$$r(\hat{\theta}) = E\left\{ R_\Theta(\hat{\theta}) \right\}.$$

The optimal estimator is the one which minimizes the Bayes risk.

Which is the Optimal Estimator?

Given that

$$R_\theta(\hat{\theta}) = E_\theta\left\{ C\left[\hat{\theta}(Y), \theta\right] \right\} = E\left\{ C\left[\hat{\theta}(Y), \Theta\right] \,\Big|\, \Theta = \theta \right\},$$

the average risk or Bayes risk is given by

$$r(\hat{\theta}) = E\left\{ C\left[\hat{\theta}(Y), \Theta\right] \right\} = E\left\{ E\left\{ C\left[\hat{\theta}(Y), \Theta\right] \,\Big|\, Y \right\} \right\}.$$

The optimal estimate for θ can be found by minimizing, for each Y = y, the posterior cost

$$E\left\{ C\left[\hat{\theta}(y), \Theta\right] \,\Big|\, Y = y \right\}.$$

Minimum-Mean-Squared-Error (MMSE) Estimation

$$C[a, \theta] = (a - \theta)^2$$

The posterior cost is given by

$$E\left\{ (\hat{\theta}(y) - \Theta)^2 \,\Big|\, Y = y \right\} = \left[\hat{\theta}(y)\right]^2 - 2\hat{\theta}(y)\, E\left\{ \Theta \,\big|\, Y = y \right\} + E\left\{ \Theta^2 \,\big|\, Y = y \right\}.$$

Minimizing over $\hat{\theta}(y)$, the Bayes estimate is given by

$$\hat{\theta}_{\text{MMSE}}(y) = E\left\{ \Theta \,\big|\, Y = y \right\}.$$

Example 1: MMSE Estimation

Suppose X and Y are jointly Gaussian random variables with joint pdf

$$p_{XY}(x, y) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{s} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{s} - \boldsymbol{\mu}) \right)$$

where

$$\mathbf{s} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \boldsymbol{\mu} = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{bmatrix}.$$

Suppose Y is observed and we want to estimate X. The MMSE estimate of X is

$$\hat{X}_{\text{MMSE}}(y) = E\left[ X \,\big|\, Y = y \right].$$

Example 1: MMSE Estimation

The conditional distribution of X given Y = y is Gaussian with mean and variance

$$\mu_{X|y} = \mu_x + \frac{\sigma_x}{\sigma_y} \rho (y - \mu_y), \qquad \sigma^2_{X|y} = (1 - \rho^2)\, \sigma_x^2.$$

Thus the MMSE estimate of X given Y = y is

$$\hat{X}_{\text{MMSE}}(y) = \mu_x + \frac{\sigma_x}{\sigma_y} \rho (y - \mu_y).$$
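A quick Monte Carlo check of this formula (a sketch; the means, variances and correlation coefficient are assumed values chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed parameters of the jointly Gaussian pair (X, Y)
mu_x, mu_y = 1.0, -2.0
sigma_x, sigma_y, rho = 2.0, 1.5, 0.6

cov = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                [rho * sigma_x * sigma_y, sigma_y**2]])
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=500_000).T

# Closed-form MMSE estimate E[X | Y = y0] versus an empirical conditional mean
y0 = 0.0
x_mmse = mu_x + (sigma_x / sigma_y) * rho * (y0 - mu_y)
x_empirical = x[np.abs(y - y0) < 0.02].mean()

print(x_mmse, x_empirical)   # the two values should agree closely
```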

Example 2: MMSE Estimation

Suppose A is a Gaussian RV with mean μ and known variance $v^2$. Suppose we observe $Y_i$, $i = 1, 2, \ldots, M$, such that

$$Y_i = A + N_i$$

where the $N_i$ are independent Gaussian RVs with mean 0 and known variance $\sigma^2$, and A is independent of the $N_i$.

The MMSE estimate is given by

$$\hat{A}_{\text{MMSE}}(y) = \frac{\frac{Mv^2}{\sigma^2} \hat{A}_1(y) + \mu}{\frac{Mv^2}{\sigma^2} + 1}, \quad \text{where} \quad \hat{A}_1(y) = \frac{1}{M} \sum_{i=1}^{M} y_i.$$
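The same estimate in code, as a small sketch (the prior mean, variances and sample size are assumed values):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed values: prior A ~ N(mu, v^2), noise N_i ~ N(0, sigma^2), M observations
mu, v, sigma, M = 1.0, 2.0, 1.5, 10

A = rng.normal(mu, v)                       # draw the unknown parameter from its prior
y = A + sigma * rng.standard_normal(M)      # Y_i = A + N_i

A1 = y.mean()                               # sample mean, \hat{A}_1(y)
w = M * v**2 / sigma**2
A_mmse = (w * A1 + mu) / (w + 1)            # weighted combination of sample mean and prior mean

print(A, A1, A_mmse)
```

The MMSE estimate shrinks the sample mean toward the prior mean μ; as M grows, the weight w/(w+1) on the sample mean approaches 1 and the estimate approaches $\hat{A}_1(y)$.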

Minimum-Mean-Absolute-Error (MMAE) Estimation

$$C[a, \theta] = |a - \theta|$$

The Bayes estimate $\hat{\theta}_{\text{ABS}}$ is given by the median of the posterior density p(θ | Y = y):

$$\Pr\left( \Theta < t \,\big|\, Y = y \right) \le \Pr\left( \Theta > t \,\big|\, Y = y \right), \quad t < \hat{\theta}_{\text{ABS}}(y)$$
$$\Pr\left( \Theta < t \,\big|\, Y = y \right) \ge \Pr\left( \Theta > t \,\big|\, Y = y \right), \quad t > \hat{\theta}_{\text{ABS}}(y)$$

[Figure: the posterior density p(θ | Y = y), with $\hat{\theta}_{\text{ABS}}(y)$ splitting the probability mass into $\Pr(\Theta < t \mid Y = y)$ and $\Pr(\Theta > t \mid Y = y)$.]

Minimum-Mean-Absolute-Error (MMAE) Estimation

For a nonnegative random variable, i.e. Pr[X ≥ 0] = 1, we have $E[X] = \int_0^\infty \Pr[X > x]\, dx$. Since $|\hat{\theta}(y) - \Theta| \ge 0$,

$$\begin{aligned}
E\left\{ |\hat{\theta}(y) - \Theta| \,\Big|\, Y = y \right\}
&= \int_0^\infty \Pr\left[ |\hat{\theta}(y) - \Theta| > x \,\Big|\, Y = y \right] dx \\
&= \int_0^\infty \Pr\left[ \Theta > x + \hat{\theta}(y) \,\Big|\, Y = y \right] dx + \int_0^\infty \Pr\left[ \Theta < -x + \hat{\theta}(y) \,\Big|\, Y = y \right] dx \\
&= \int_{\hat{\theta}(y)}^\infty \Pr\left[ \Theta > t \,\big|\, Y = y \right] dt + \int_{-\infty}^{\hat{\theta}(y)} \Pr\left[ \Theta < t \,\big|\, Y = y \right] dt.
\end{aligned}$$

Minimum-Mean-Absolute-Error (MMAE) Estimation

Differentiating $E\left\{ |\hat{\theta}(y) - \Theta| \,\big|\, Y = y \right\}$ with respect to $\hat{\theta}(y)$,

$$\frac{\partial}{\partial \hat{\theta}(y)} E\left\{ |\hat{\theta}(y) - \Theta| \,\Big|\, Y = y \right\} = \Pr\left[ \Theta < \hat{\theta}(y) \,\big|\, Y = y \right] - \Pr\left[ \Theta > \hat{\theta}(y) \,\big|\, Y = y \right].$$

The derivative is nondecreasing, tending to −1 as $\hat{\theta}(y) \to -\infty$ and to +1 as $\hat{\theta}(y) \to \infty$. The minimum risk is achieved at the point where the derivative changes sign.

Minimum-Mean-Absolute-Error (MMAE) Estimation

Thus the MMAE estimate $\hat{\theta}_{\text{ABS}}$ is given by any value θ such that

$$\Pr\left( \Theta < t \,\big|\, Y = y \right) \le \Pr\left( \Theta > t \,\big|\, Y = y \right), \quad t < \hat{\theta}_{\text{ABS}}(y)$$
$$\Pr\left( \Theta < t \,\big|\, Y = y \right) \ge \Pr\left( \Theta > t \,\big|\, Y = y \right), \quad t > \hat{\theta}_{\text{ABS}}(y)$$

Why not the following expression?

$$\Pr\left( \Theta < \hat{\theta}_{\text{ABS}}(y) \,\big|\, Y = y \right) = \Pr\left( \Theta \ge \hat{\theta}_{\text{ABS}}(y) \,\big|\, Y = y \right)$$

Why not the following expression?

$$\Pr\left( \Theta < \hat{\theta}_{\text{ABS}}(y) \,\big|\, Y = y \right) = \Pr\left( \Theta > \hat{\theta}_{\text{ABS}}(y) \,\big|\, Y = y \right)$$

MMAE estimation for discrete distributions requires the more general expression above.

Maximum A Posteriori (MAP) Estimation

The MAP estimator is given by

$$\hat{\theta}_{\text{MAP}}(y) = \underset{\theta}{\operatorname{argmax}}\; p\left( \theta \,\big|\, Y = y \right).$$

It can be obtained as the optimal estimator for the threshold cost function

$$C[a, \theta] = \begin{cases} 0 & \text{if } |a - \theta| \le \Delta \\ 1 & \text{if } |a - \theta| > \Delta \end{cases}$$

for small Δ > 0.

Maximum A Posteriori (MAP) Estimation

For the threshold cost function, we have (assuming a scalar parameter θ for illustration)

$$\begin{aligned}
E\left\{ C\left[\hat{\theta}(y), \Theta\right] \,\Big|\, Y = y \right\}
&= \int_{-\infty}^{\infty} C[\hat{\theta}(y), \theta]\, p\left( \theta \,\big|\, Y = y \right) d\theta \\
&= \int_{-\infty}^{\hat{\theta}(y) - \Delta} p\left( \theta \,\big|\, Y = y \right) d\theta + \int_{\hat{\theta}(y) + \Delta}^{\infty} p\left( \theta \,\big|\, Y = y \right) d\theta \\
&= 1 - \int_{\hat{\theta}(y) - \Delta}^{\hat{\theta}(y) + \Delta} p\left( \theta \,\big|\, Y = y \right) d\theta.
\end{aligned}$$

The Bayes estimate is obtained by maximizing the integral in the last equality.

Maximum A Posteriori (MAP) Estimation

[Figure: the posterior density p(θ | Y = y) with the interval $[\hat{\theta}(y) - \Delta,\; \hat{\theta}(y) + \Delta]$ shaded.]

The shaded area is the integral

$$\int_{\hat{\theta}(y) - \Delta}^{\hat{\theta}(y) + \Delta} p\left( \theta \,\big|\, Y = y \right) d\theta.$$

To maximize this integral, the location of $\hat{\theta}(y)$ should be chosen to be the value of θ which maximizes p(θ | Y = y).

Maximum A Posteriori (MAP) Estimation

[Figure: the same posterior density with the shaded interval centered at $\hat{\theta}_{\text{MAP}}(y)$.]

This argument is not airtight, as p(θ | Y = y) may not be symmetric at the maximum. But the MAP estimator is widely used as it is easier to compute than the MMSE or MMAE estimators.
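The three Bayesian estimators pick different summaries of the same posterior: the mean (MMSE), the median (MMAE) and the mode (MAP). A small sketch on a grid, using an assumed skewed posterior purely for illustration:

```python
import numpy as np

# Assumed posterior p(theta | Y = y): a Gamma(2, 1)-shaped density on a grid
theta = np.linspace(0.0, 15.0, 150_001)
dx = theta[1] - theta[0]
post = theta * np.exp(-theta)
post /= post.sum() * dx                          # normalize so it integrates to 1

cdf = np.cumsum(post) * dx

theta_mmse = (theta * post).sum() * dx           # posterior mean   -> MMSE estimate
theta_mmae = theta[np.searchsorted(cdf, 0.5)]    # posterior median -> MMAE estimate
theta_map = theta[np.argmax(post)]               # posterior mode   -> MAP estimate

print(theta_mmse, theta_mmae, theta_map)         # roughly 2.0, 1.68, 1.0 for this posterior
```

For a symmetric unimodal posterior the three estimates coincide; the skewed example shows how they can differ.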

Maximum Likelihood (ML) Estimation

The ML estimator is given by

$$\hat{\theta}_{\text{ML}}(y) = \underset{\theta}{\operatorname{argmax}}\; p\left( Y = y \,\big|\, \theta \right).$$

It is the same as the MAP estimator when the prior probability distribution of Θ is uniform. It is also used when the prior distribution is not known.

Example 1: ML Estimation

Suppose we observe $Y_i$, $i = 1, 2, \ldots, M$, such that $Y_i \sim \mathcal{N}(\mu, \sigma^2)$, where the $Y_i$ are independent, μ is unknown and $\sigma^2$ is known. The ML estimate is given by

$$\hat{\mu}_{\text{ML}}(y) = \frac{1}{M} \sum_{i=1}^{M} y_i.$$

(Assignment 5)

Example 2: ML Estimation

Suppose we observe $Y_i$, $i = 1, 2, \ldots, M$, such that $Y_i \sim \mathcal{N}(\mu, \sigma^2)$, where the $Y_i$ are independent and both μ and $\sigma^2$ are unknown. The ML estimates are given by

$$\hat{\mu}_{\text{ML}}(y) = \frac{1}{M} \sum_{i=1}^{M} y_i, \qquad \hat{\sigma}^2_{\text{ML}}(y) = \frac{1}{M} \sum_{i=1}^{M} \left( y_i - \hat{\mu}_{\text{ML}}(y) \right)^2.$$

(Assignment 5)
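A quick numerical check of these two formulas (a sketch; the true values and sample size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

mu_true, sigma_true, M = 2.0, 1.5, 10_000   # assumed true values, treated as unknown
y = rng.normal(mu_true, sigma_true, size=M)

mu_ml = y.mean()                            # (1/M) * sum_i y_i
var_ml = ((y - mu_ml) ** 2).mean()          # (1/M) * sum_i (y_i - mu_ml)^2

print(mu_ml, var_ml)                        # close to 2.0 and 1.5**2 = 2.25
```

Note the 1/M normalization: the ML variance estimate is slightly biased, unlike the 1/(M−1) sample variance.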

Example 3: ML Estimation

Suppose we observe $Y_i$, $i = 1, 2, \ldots, M$, such that $Y_i \sim \text{Bernoulli}(p)$, where the $Y_i$ are independent and p is unknown. The ML estimate of p is given by

$$\hat{p}_{\text{ML}}(y) = \frac{1}{M} \sum_{i=1}^{M} y_i.$$

(Assignment 5)

Example 4: ML Estimation

Suppose we observe $Y_i$, $i = 1, 2, \ldots, M$, such that $Y_i \sim \text{Uniform}[0, \theta]$, where the $Y_i$ are independent and θ is unknown. The ML estimate of θ is given by

$$\hat{\theta}_{\text{ML}}(y) = \max\left( y_1, y_2, \ldots, y_{M-1}, y_M \right).$$

(Assignment 5)
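The Bernoulli and uniform examples can be checked the same way (a sketch with assumed true parameter values):

```python
import numpy as np

rng = np.random.default_rng(4)
M = 1_000

# Bernoulli(p): the ML estimate is the fraction of ones
p_true = 0.3
p_ml = rng.binomial(1, p_true, size=M).mean()

# Uniform[0, theta]: the ML estimate is the largest observation
theta_true = 5.0
theta_ml = rng.uniform(0.0, theta_true, size=M).max()

print(p_ml, theta_ml)   # close to 0.3; slightly below 5.0, since no observation can exceed theta
```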

Reference

Chapter 4, An Introduction to Signal Detection and Estimation, H. V. Poor, Second Edition, Springer-Verlag, 1994.

Parameter Estimation of Random Processes

ML Estimation Requires Conditional Densities

ML estimation involves maximizing the conditional density with respect to the unknown parameters.

Example: $Y \sim \mathcal{N}(\theta, \sigma^2)$ where θ is unknown and $\sigma^2$ is known,

$$p\left( Y = y \,\big|\, \theta \right) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y - \theta)^2}{2\sigma^2}}.$$

Suppose instead that the observation is the realization of a random process

$$y(t) = A e^{j\theta} s(t - \tau) + n(t).$$

What is the conditional density of y(t) given A, θ and τ?

Maximizing Likelihood Ratio for ML Estimation

Consider $Y \sim \mathcal{N}(\theta, \sigma^2)$ where θ is unknown and $\sigma^2$ is known,

$$p(y \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y - \theta)^2}{2\sigma^2}}.$$

Let q(y) be the density of a Gaussian with distribution $\mathcal{N}(0, \sigma^2)$,

$$q(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{y^2}{2\sigma^2}}.$$

The ML estimate of θ is obtained as

$$\hat{\theta}_{\text{ML}}(y) = \underset{\theta}{\operatorname{argmax}}\; p(y \mid \theta) = \underset{\theta}{\operatorname{argmax}}\; \frac{p(y \mid \theta)}{q(y)} = \underset{\theta}{\operatorname{argmax}}\; L(y \mid \theta)$$

where L(y | θ) is called the likelihood ratio.
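A small sketch of this recipe for a single Gaussian observation (the values of y and σ are assumptions): maximizing the likelihood ratio over a grid of θ recovers the observation itself, as expected.

```python
import numpy as np

sigma = 1.0
y = 0.8                                    # a single observation (assumed value)

theta = np.linspace(-5.0, 5.0, 100_001)
# log L(y | theta) = log p(y | theta) - log q(y); q(y) does not depend on theta
log_L = (y**2 - (y - theta) ** 2) / (2 * sigma**2)

theta_ml = theta[np.argmax(log_L)]
print(theta_ml)                            # 0.8, i.e. the observation itself
```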

Likelihood Ratio and Hypothesis Testing

The likelihood ratio L(y | θ) is the ML decision statistic for the following binary hypothesis testing problem:

$$H_1: Y \sim \mathcal{N}(\theta, \sigma^2)$$
$$H_0: Y \sim \mathcal{N}(0, \sigma^2)$$

where θ is assumed to be known. $H_0$ is a dummy hypothesis which makes calculation of the ML estimator easy for random processes.

Likelihood Ratio of a Signal in AWGN

Let $H_s(\theta)$ be the hypothesis corresponding to the following received signal, where θ can be a vector parameter:

$$H_s(\theta): \; y(t) = s_\theta(t) + n(t).$$

Define a noise-only dummy hypothesis $H_0$:

$$H_0: \; y(t) = n(t).$$

Define Z and $y^\perp(t)$ as follows:

$$Z = \langle y, s_\theta \rangle, \qquad y^\perp(t) = y(t) - \frac{\langle y, s_\theta \rangle}{\|s_\theta\|^2}\, s_\theta(t).$$

Z and $y^\perp(t)$ completely characterize y(t).

Likelihood Ratio of a Signal in AWGN

Under both hypotheses, $y^\perp(t)$ is equal to $n^\perp(t)$, where

$$n^\perp(t) = n(t) - \frac{\langle n, s_\theta \rangle}{\|s_\theta\|^2}\, s_\theta(t).$$

$n^\perp(t)$ is independent of the noise component in Z and has the same distribution under both hypotheses, so $n^\perp(t)$ is irrelevant for this binary hypothesis testing problem.

The likelihood ratio of y(t) therefore equals the likelihood ratio of Z under the following hypothesis testing problem:

$$H_s(\theta): \; Z \sim \mathcal{N}\left( \|s_\theta\|^2,\; \sigma^2 \|s_\theta\|^2 \right)$$
$$H_0(\theta): \; Z \sim \mathcal{N}\left( 0,\; \sigma^2 \|s_\theta\|^2 \right)$$

Likelihood Ratio of Signals in AWGN

The likelihood ratio of signals in real AWGN is

$$L(y \mid s_\theta) = \exp\left( \frac{1}{\sigma^2} \left[ \langle y, s_\theta \rangle - \frac{\|s_\theta\|^2}{2} \right] \right).$$

The likelihood ratio of signals in complex AWGN is

$$L(y \mid s_\theta) = \exp\left( \frac{1}{\sigma^2} \left[ \operatorname{Re}\left( \langle y, s_\theta \rangle \right) - \frac{\|s_\theta\|^2}{2} \right] \right).$$

Maximizing these likelihood ratios as functions of θ results in the ML estimator.
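As an illustration of how these likelihood ratios drive synchronization, the sketch below (an assumed discrete-time setting with a known unit-energy pulse, an integer delay and an unknown carrier phase; not from the slides) estimates the delay by maximizing $|\langle y, s_\tau \rangle|$. Since $\|s_\tau\|$ does not depend on τ here, maximizing the complex-AWGN likelihood ratio jointly over the unknown phase reduces to picking the delay with the largest correlation magnitude, and the phase estimate is the angle of that correlation.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed setup: known unit-energy rectangular pulse, unknown integer delay and phase
pulse = np.ones(16) / np.sqrt(16)
true_delay, true_phase, sigma = 23, 0.9, 0.3
N = 128

s = np.zeros(N, dtype=complex)
s[true_delay:true_delay + len(pulse)] = pulse
noise = sigma * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
y = np.exp(1j * true_phase) * s + noise

# Correlate y against every shift of the pulse: corr[d] = <y, s_d>
template = np.pad(pulse.astype(complex), (0, N - len(pulse)))
delays = np.arange(N - len(pulse) + 1)
corr = np.array([np.vdot(np.roll(template, d), y) for d in delays])

tau_ml = delays[np.argmax(np.abs(corr))]        # ML delay estimate
theta_ml = np.angle(corr[tau_ml])               # ML phase estimate given that delay

print(tau_ml, theta_ml)    # should be close to 23 and 0.9
```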

Thanks for your attention