EIE6207: Maximum-Likelihood and Bayesian Estimation

Man-Wai MAK
Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University
enmwmak@polyu.edu.hk
http://www.eie.polyu.edu.hk/~mwmak

References:
Steven M. Kay, Fundamentals of Statistical Signal Processing, Prentice Hall, 1993.
http://www.cs.tut.fi/~hehu/ssp/

November 12, 2018

Overview
1 Introduction to ML Estimators
2 Biased and Unbiased ML Estimators
3 MLE of Transformed Parameters
4 Application: Range Estimation in Radar
5 Bayesian Estimators

What is an ML Estimator?
Maximum likelihood (ML) is the most popular estimation approach because it remains applicable even in complicated estimation problems.

The basic principle is simple: find the parameter θ under which the observed data x are most probable.

The ML estimator is in general neither unbiased nor optimal in the minimum-variance sense. However, it is asymptotically unbiased and asymptotically attains the Cramér-Rao bound.

Definition
The maximum-likelihood estimate of a scalar parameter θ is defined to be the value that maximizes p(x; θ). Viewed as a function of θ, log p(x; θ) is the log-likelihood function. The ML estimate is

    θ_ML = argmax_θ log p(x; θ)

The figure on the next page shows the likelihood function and the log-likelihood function for one possible realization of the data. The data consist of 50 points, with true A = 5. The likelihood function gives the probability of observing these particular points for different values of A.

Example 1
Consider the DC level in WGN:

    x[n] = A + w[n],   n = 0, 1, ..., N-1,

where w[n] ~ N(0, σ²). The likelihood and log-likelihood functions are shown below:
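The original figure is not reproduced here. As a rough substitute, the following sketch evaluates the likelihood and log-likelihood on a grid of A values, using N = 50 and true A = 5 as stated above; σ² = 1 and the random seed are illustrative assumptions.

```python
import numpy as np

# Illustrative settings: N and the true A follow the slides; sigma2 is assumed.
N, A_true, sigma2 = 50, 5.0, 1.0
rng = np.random.default_rng(0)
x = A_true + rng.normal(0.0, np.sqrt(sigma2), N)   # x[n] = A + w[n]

A_grid = np.linspace(3, 7, 401)
# log p(x; A) = -(N/2) log(2*pi*sigma2) - (1/(2*sigma2)) * sum_n (x[n] - A)^2
loglik = np.array([-(N / 2) * np.log(2 * np.pi * sigma2)
                   - np.sum((x - A) ** 2) / (2 * sigma2) for A in A_grid])
lik = np.exp(loglik)   # likelihood curve (same maximiser as the log-likelihood)

A_hat = A_grid[np.argmax(loglik)]
print(f"grid maximiser = {A_hat:.3f}, sample mean = {x.mean():.3f}")
```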

Example 1
We maximize log p(x; A) with respect to A:

    Â = argmax_A log p(x; A)
      = argmax_A { -(N/2) log(2πσ²) - (1/(2σ²)) Σ_{n=0}^{N-1} (x[n] - A)² }

Setting ∂ log p(x; A)/∂A = 0, we have the ML estimator

    Â = (1/N) Σ_{n=0}^{N-1} x[n]

Setting ∂ log p(x; A)/∂σ² = 0, we have the ML estimator for σ²:

    σ²_ML = σ̂² = (1/N) Σ_{n=0}^{N-1} (x[n] - Â)²    (1)
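As a quick sanity check (not part of the original slides), one can maximise the log-likelihood numerically over (A, σ²) and compare with the closed-form estimators above; the data-generation settings are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, A_true, sigma2_true = 200, 5.0, 2.0
x = A_true + rng.normal(0.0, np.sqrt(sigma2_true), N)

def neg_loglik(theta):
    A, s2 = theta
    # negative of log p(x; A, sigma^2) for the DC-level-in-WGN model
    return (N / 2) * np.log(2 * np.pi * s2) + np.sum((x - A) ** 2) / (2 * s2)

res = minimize(neg_loglik, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
A_ml, s2_ml = res.x
print(A_ml, x.mean())                        # numerical MLE of A vs. sample mean
print(s2_ml, np.mean((x - x.mean()) ** 2))   # numerical MLE of sigma^2 vs. Eq. (1)
```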

Example 1
Â is unbiased because E{Â} = (1/N) Σ_{n=0}^{N-1} E{x[n]} = (1/N) · N A = A.

σ̂² is biased. To prove this, expand

    E{σ̂²} = E{ (1/N) Σ_n (x[n] - Â)² }
           = E{ (1/N) Σ_n x²[n] } - E{ (2/N) Σ_n x[n] Â } + E{ Â² }    (2)

and use

    E{z²} = cov(z, z) + µ_z² = σ_z² + µ_z²    (3)

Example 1
The first term in Eq. 2 is

    E{ (1/N) Σ_n x²[n] } = (1/N) Σ_n ( cov(x[n], x[n]) + A² )
                         = (1/N) Σ_n ( σ² + A² )
                         = σ² + A²

The second term in Eq. 2 is

    E{ (2/N) Σ_n x[n] Â } = E{ (2/N²) Σ_n x[n] Σ_m x[m] }
                          = (2/N²) Σ_n Σ_m E{ x[n] x[m] }
                          = (2/N²) Σ_n Σ_m ( cov(x[n], x[m]) + A² )
                          = (2/N²) [ N σ² + N² A² ]
                          = 2 ( A² + σ²/N )

Example 1
The third term in Eq. 2 is

    E{Â²} = E{ ( (1/N) Σ_n x[n] )² }
          = (1/N²) [ Σ_n E{x²[n]} + Σ_n Σ_{m≠n} E{x[n] x[m]} ]
          = (1/N²) [ N (σ² + A²) + N(N-1) A² ]
          = (1/N²) [ N² A² + N σ² ]
          = A² + σ²/N

Example 1
Combining the three terms, we have

    E{σ̂²} = σ² + A² - 2(A² + σ²/N) + A² + σ²/N
           = σ² - σ²/N
           = (1 - 1/N) σ²

To make σ̂² unbiased, we need to use

    σ̂² = (1/(N-1)) Σ_{n=0}^{N-1} (x[n] - Â)²
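A short Monte Carlo sketch (with arbitrary illustrative values of A, σ², and N) that checks E{σ̂²} ≈ (1 - 1/N)σ² for the 1/N estimator, while the 1/(N-1) version is unbiased.

```python
import numpy as np

rng = np.random.default_rng(2)
A, sigma2, N, trials = 5.0, 2.0, 10, 100_000

x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
A_hat = x.mean(axis=1, keepdims=True)
s2_ml = np.mean((x - A_hat) ** 2, axis=1)            # 1/N estimator (biased)
s2_ub = np.sum((x - A_hat) ** 2, axis=1) / (N - 1)   # 1/(N-1) estimator (unbiased)

print(s2_ml.mean(), (N - 1) / N * sigma2)   # both close to 1.8
print(s2_ub.mean(), sigma2)                 # both close to 2.0
```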

Example 2
Consider the DC level in WGN: x[n] = A + w[n], n = 0, 1, ..., N-1, where w[n] ~ N(0, A).

The CRLB approach cannot be used to find an efficient estimator because

    ∂ log p(x; A)/∂A = -N/(2A) + (1/A) Σ_n (x[n] - A) + (1/(2A²)) Σ_n (x[n] - A)²
                     ≠ I(A) (g(x) - A)

for any functions I(A) and g(x). However, we may use maximum likelihood and set ∂ log p(x; A)/∂A = 0, which gives

    Â² + Â - (1/N) Σ_n x²[n] = 0   ⟹   Â = -1/2 + √( (1/N) Σ_n x²[n] + 1/4 )    (4)

Example 2
This estimator is biased because

    E{Â} = E{ -1/2 + √( (1/N) Σ_n x²[n] + 1/4 ) }
         ≠ -1/2 + √( E{ (1/N) Σ_n x²[n] } + 1/4 )
         = -1/2 + √( A + A² + 1/4 )
         = A,

since the expectation cannot be carried inside the square root.

Example 2
However, if N is large enough, the bias is negligible.

Example 2
The ML estimator in Eq. 4 is a reasonable estimator because, as N → ∞,

    (1/N) Σ_n x²[n] → E{x²[n]} = A + A²

Therefore,

    Â → -1/2 + √( A + A² + 1/4 ) = -1/2 + (A + 1/2) = A   as N → ∞

The MLE becomes asymptotically unbiased and optimal as N → ∞.
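A Monte Carlo sketch of the estimator in Eq. 4 (the value A = 2 and the simulation settings are illustrative assumptions), showing the bias shrinking as N grows.

```python
import numpy as np

rng = np.random.default_rng(3)
A, trials = 2.0, 50_000

for N in (5, 50, 500):
    # x[n] = A + w[n], w[n] ~ N(0, A): mean and variance both equal A
    x = A + rng.normal(0.0, np.sqrt(A), size=(trials, N))
    A_hat = -0.5 + np.sqrt(np.mean(x ** 2, axis=1) + 0.25)   # Eq. 4
    print(N, A_hat.mean())   # approaches A = 2 as N increases
```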

MLE of Transformed Parameters
Often it is required to estimate a transformed parameter instead of the one the PDF depends on. For example, in the DC-level problem we might be interested in the power of the signal, A², instead of the mean A.

Given x[n] = A + w[n], n = 0, 1, ..., N-1, where w[n] ~ N(0, σ²), find the MLE of the transformed parameter

    α = exp(A)

The log-likelihood function is

    log p_T(x; α) = -(N/2) log(2πσ²) - (1/(2σ²)) Σ_{n=0}^{N-1} (x[n] - log α)²

MLE of Transformed Parameters
Setting the derivative of log p_T(x; α) to 0 yields

    (1/α̂) (1/σ²) Σ_n (x[n] - log α̂) = 0   ⟹   α̂ = exp(x̄),

where α̂ > 0 and x̄ is the sample mean.

Things get more complicated if the transformation is

    α = A²   ⟹   A = ±√α

We need to consider two PDFs:

    log p_T1(x; α) = const - (1/(2σ²)) Σ_n (x[n] - √α)²   for α ≥ 0, A ≥ 0
    log p_T2(x; α) = const - (1/(2σ²)) Σ_n (x[n] + √α)²   for α > 0, A < 0

MLE of Transformed Parameters
Then, we solve the ML estimation problem in both cases and choose the one that gives the higher maximum value:

    α̂ = argmax_α { p_T1(x; α), p_T2(x; α) }

It can easily be shown that the MLE is α̂ = Â² = x̄².
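A numerical sketch (the values of A, σ², N and the grid are illustrative assumptions) that maximises both branch likelihoods over a grid of α and confirms that the winner agrees with x̄².

```python
import numpy as np

rng = np.random.default_rng(4)
N, A_true, sigma2 = 50, -1.5, 1.0            # negative A, so the A < 0 branch should win
x = A_true + rng.normal(0.0, np.sqrt(sigma2), N)

alpha = np.linspace(1e-6, 9.0, 20_000)
logp1 = -np.array([np.sum((x - np.sqrt(a)) ** 2) for a in alpha]) / (2 * sigma2)  # A >= 0 branch
logp2 = -np.array([np.sum((x + np.sqrt(a)) ** 2) for a in alpha]) / (2 * sigma2)  # A <  0 branch

# take the pointwise maximum of the two branches, then maximise over alpha
alpha_hat = alpha[np.argmax(np.maximum(logp1, logp2))]
print(alpha_hat, x.mean() ** 2)   # both close to A_true**2
```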

Invariance Property of the MLE
Given a PDF p(x; θ) parameterized by θ, the MLE of the parameter α = g(θ) is α̂ = g(θ̂), where θ̂ is the MLE of θ, obtained by maximizing p(x; θ).

If g is not a one-to-one function, then α̂ maximizes the modified likelihood function

    p̄_T(x; α) = max_{θ: α = g(θ)} p(x; θ)

Application: Range Estimation in Radar
In radar or sonar, a signal pulse is transmitted. The round-trip delay τ₀ from the transmitter to the target and back is related to the range R by τ₀ = 2R/c, where c is the speed of propagation.

In analog form, the received signal can be written as

    x(t) = s(t - τ₀) + w(t),   0 ≤ t ≤ T,

where s(t) is the transmitted signal and w(t) is noise with variance σ².

Application: Range Estimation in Radar
After discretisation, we have

    x[n] = w[n],               0 ≤ n ≤ n₀ - 1
    x[n] = s[n - n₀] + w[n],   n₀ ≤ n ≤ n₀ + M - 1
    x[n] = w[n],               n₀ + M ≤ n ≤ N - 1

where M is the length of the sampled signal and n₀ = F_s τ₀, where F_s is the sampling rate, which must be at least twice the bandwidth of the signal.

Application: Range Estimation in Radar
Assuming that everything is Gaussian, the PDF is

    p(x; n₀) = ∏_{n=0}^{n₀-1} (1/√(2πσ²)) exp{ -x²[n]/(2σ²) }
               × ∏_{n=n₀}^{n₀+M-1} (1/√(2πσ²)) exp{ -(x[n] - s[n - n₀])²/(2σ²) }
               × ∏_{n=n₀+M}^{N-1} (1/√(2πσ²)) exp{ -x²[n]/(2σ²) }

             = (2πσ²)^{-N/2} exp{ -(1/(2σ²)) Σ_{n=0}^{N-1} x²[n] }
               × exp{ -(1/(2σ²)) Σ_{n=n₀}^{n₀+M-1} ( -2 x[n] s[n - n₀] + s²[n - n₀] ) }

Application: Range Estimation in Radar
Considering only the term involving n₀, the MLE of n₀ can be found by maximizing

    exp{ -(1/(2σ²)) Σ_{n=n₀}^{n₀+M-1} ( -2 x[n] s[n - n₀] + s²[n - n₀] ) }

or, equivalently, by minimizing

    Σ_{n=n₀}^{n₀+M-1} ( -2 x[n] s[n - n₀] + s²[n - n₀] )

Note that Σ_{n=n₀}^{n₀+M-1} s²[n - n₀] = Σ_{m=0}^{M-1} s²[m], which is independent of n₀. So the MLE of n₀ is found by maximizing

    Σ_{n=n₀}^{n₀+M-1} x[n] s[n - n₀] = Σ_{m=0}^{M-1} x[m + n₀] s[m]

Application: Range Estimation in Radar
This means that the MLE of n₀ is found by correlating the transmitted signal s[n] with the received signal x[n] at all possible delays and choosing the delay that gives the maximum correlation.

By the invariance principle, the MLE of the range is

    R̂ = c τ̂₀ / 2 = c n̂₀ / (2 F_s)
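A minimal correlator sketch of this MLE; the pulse shape, noise level, sample rate F_s, and propagation speed c used below are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, n0_true, sigma = 1000, 64, 300, 0.5

s = np.hanning(M) * np.cos(2 * np.pi * 0.2 * np.arange(M))   # assumed transmitted pulse s[n]
x = rng.normal(0.0, sigma, N)                                 # noise everywhere
x[n0_true:n0_true + M] += s                                   # x[n] = s[n - n0] + w[n] in the active window

# MLE of n0: maximise sum_m x[m + n0] * s[m] over all candidate delays
corr = np.array([np.dot(x[n0:n0 + M], s) for n0 in range(N - M + 1)])
n0_hat = int(np.argmax(corr))

Fs, c = 1e6, 3e8                       # assumed sampling rate and propagation speed
R_hat = c * n0_hat / (2 * Fs)          # invariance: range estimate from the delay estimate
print(n0_hat, R_hat)
```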

Bayesian Estimators
Bayesian estimators differ from classical estimators in that they treat the parameters as random variables instead of unknown constants.

The parameters therefore also have a PDF, which needs to be taken into account when seeking an estimator.

The PDF of the parameters can be used to incorporate any prior knowledge we may have about their values.

Bayesian Estimators
For example, we might know that the normalized frequency f₀ of an observed sinusoid cannot be greater than 0.1. This is ensured by choosing

    p(f₀) = 10   if 0 ≤ f₀ ≤ 0.1
    p(f₀) = 0    otherwise

as the prior PDF in the Bayesian framework. Usually differentiable PDFs are easier to work with, and we could approximate the uniform PDF with, e.g., the Rayleigh PDF.

Prior and Posterior Estimates
The Bayesian approach can be applied to small data records, and the estimate can be improved sequentially as new data arrive.

For example, consider tossing a coin and estimating the probability of a head, µ. The maximum-likelihood estimate is

    µ̂ = #heads / #tosses

If the number of tosses is 3 and 3 heads (no tails) are observed, then µ_ML = 1.

The Bayesian approach can circumvent this problem, because the prior regularizes the likelihood and avoids overfitting to the small amount of data.

Prior and Posterior Estimates
(Figure: likelihood, prior, and posterior after observing 3 heads in a row.)

Prior and Posterior Estimates
Likelihood function:

    p(x|µ) = µ^{#heads} (1 - µ)^{#tails}

If x = {H, H, H}, then max_µ p(x|µ) = 1 and argmax_µ p(x|µ) = 1.

The prior p(µ) is selected to reflect our belief that we have a fair coin.

The posterior density can be obtained from Bayes' formula:

    p(µ|x) = p(x|µ) p(µ) / p(x) ∝ p(x|µ) p(µ)

The Bayesian approach here is to select the maximum of the posterior (maximum a posteriori, MAP):

    µ̂ = argmax_µ p(µ|x) = argmax_µ p(x|µ) p(µ)
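A sketch of the coin example. The slides do not specify the form of the prior, so a Beta(a, b) prior peaked at µ = 0.5 is assumed here as one common way to encode the "fair coin" belief; the grid search is a simple stand-in for the closed-form MAP solution.

```python
import numpy as np

heads, tails = 3, 0
a, b = 5.0, 5.0                     # assumed Beta prior peaked at mu = 0.5 ("fair coin" belief)

mu = np.linspace(1e-6, 1 - 1e-6, 10_001)
log_lik   = heads * np.log(mu) + tails * np.log(1 - mu)       # log p(x | mu)
log_prior = (a - 1) * np.log(mu) + (b - 1) * np.log(1 - mu)   # Beta(a, b) up to a constant
log_post  = log_lik + log_prior                               # log posterior up to a constant

mu_ml  = mu[np.argmax(log_lik)]
mu_map = mu[np.argmax(log_post)]
print(mu_ml)    # ~1.0: overfits the three heads
print(mu_map)   # (heads + a - 1) / (heads + tails + a + b - 2) = 7/11 ~ 0.636
```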

Average Cost
Bayesian estimators can be obtained by minimizing the average cost:

    θ̂ = argmin_θ̂ ∫_θ ∫_x C(θ - θ̂) p(x, θ) dx dθ
      = argmin_θ̂ ∫_θ ∫_x C(θ - θ̂) p(θ|x) p(x) dx dθ
      = argmin_θ̂ ∫_x ( ∫_θ C(θ - θ̂) p(θ|x) dθ ) p(x) dx
      = argmin_θ̂ ∫_θ C(θ - θ̂) p(θ|x) dθ

where the last step follows because p(x) ≥ 0, so it suffices to minimize the inner integral for each x.

Bayesian MMSE Estimator
If C(z) = z², we have the Bayesian minimum mean-square error (MMSE) estimator:

    θ̂_mmse = argmin_θ̂ ∫_θ (θ - θ̂)² p(θ|x) dθ

Differentiating the integral w.r.t. θ̂ and setting the result to 0, we obtain

    ∫ -2 (θ - θ̂) p(θ|x) dθ = 0
    ⟹ θ̂ ∫ p(θ|x) dθ = ∫ θ p(θ|x) dθ
    ⟹ θ̂ = ∫ θ p(θ|x) dθ

Bayesian MMSE Estimator
Therefore, the Bayesian MMSE estimator is

    θ̂_mmse = ∫ θ p(θ|x) dθ = E{θ|x},

which is the mean of the posterior PDF, p(θ|x).
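Continuing the coin sketch above (same assumed Beta prior), the MMSE estimate can be approximated by computing the posterior mean numerically on a grid.

```python
import numpy as np

heads, tails = 3, 0
a, b = 5.0, 5.0                                          # same assumed Beta prior as before

mu = np.linspace(1e-6, 1 - 1e-6, 10_001)
post = mu ** (heads + a - 1) * (1 - mu) ** (tails + b - 1)   # unnormalised posterior
post /= np.trapz(post, mu)                                   # normalise so it integrates to 1

mu_mmse = np.trapz(mu * post, mu)                            # posterior mean = MMSE estimate
print(mu_mmse)   # analytic value: (heads + a) / (heads + tails + a + b) = 8/13 ~ 0.615
```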