EIE6207: Maximum-Likelihood and Bayesian Estimation

Man-Wai MAK
Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University
enmwmak@polyu.edu.hk
http://www.eie.polyu.edu.hk/~mwmak

References:
Steven M. Kay, Fundamentals of Statistical Signal Processing, Prentice Hall, 1993.
http://www.cs.tut.fi/~hehu/ssp/

November 12, 2018

Overview
1 Introduction to ML Estimators
2 Biased and Unbiased ML Estimators
3 MLE of Transformed Parameters
4 Application: Range Estimation in Radar
5 Bayesian Estimators

What is an ML Estimator?
Maximum likelihood (ML) is the most popular estimation approach because it remains applicable even in complicated estimation problems.

The basic principle is simple: find the parameter θ under which the observed data x are most probable.

The ML estimator is in general neither unbiased nor optimal in the minimum-variance sense. However, it is asymptotically unbiased and asymptotically attains the Cramér-Rao bound.

Definition
The maximum-likelihood estimate of a scalar parameter θ is defined to be the value that maximizes p(x; θ). Viewed as a function of θ, log p(x; θ) is the log-likelihood function. The ML estimate is

    θ_ML = argmax_θ log p(x; θ)

The figure on the next page shows the likelihood function and the log-likelihood function for one possible realization of the data. The data consist of 50 points, with true A = 5. The likelihood function gives the probability of observing these particular points for different values of A.

Example 1
Consider the DC level in WGN:

    x[n] = A + w[n],   n = 0, 1, ..., N-1,

where w[n] ~ N(0, σ²). The likelihood and log-likelihood functions are shown below:
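The original figure is not reproduced here. As a rough substitute, the following sketch evaluates the likelihood and log-likelihood on a grid of A values, using N = 50 and true A = 5 as stated above; σ² = 1 and the random seed are illustrative assumptions.

```python
import numpy as np

# Illustrative settings: N and the true A follow the slides; sigma2 is assumed.
N, A_true, sigma2 = 50, 5.0, 1.0
rng = np.random.default_rng(0)
x = A_true + rng.normal(0.0, np.sqrt(sigma2), N)   # x[n] = A + w[n]

A_grid = np.linspace(3, 7, 401)
# log p(x; A) = -(N/2) log(2*pi*sigma2) - (1/(2*sigma2)) * sum_n (x[n] - A)^2
loglik = np.array([-(N / 2) * np.log(2 * np.pi * sigma2)
                   - np.sum((x - A) ** 2) / (2 * sigma2) for A in A_grid])
lik = np.exp(loglik)   # likelihood curve (same maximiser as the log-likelihood)

A_hat = A_grid[np.argmax(loglik)]
print(f"grid maximiser = {A_hat:.3f}, sample mean = {x.mean():.3f}")
```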

Example 1
We maximize log p(x; A) with respect to A:

    Â = argmax_A log p(x; A)
      = argmax_A { -(N/2) log(2πσ²) - (1/(2σ²)) Σ_{n=0}^{N-1} (x[n] - A)² }

Setting ∂ log p(x; A)/∂A = 0, we have the ML estimator

    Â = (1/N) Σ_{n=0}^{N-1} x[n]

Setting ∂ log p(x; A)/∂σ² = 0, we have the ML estimator for σ²:

    σ²_ML = σ̂² = (1/N) Σ_{n=0}^{N-1} (x[n] - Â)²    (1)
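As a quick sanity check (not part of the original slides), one can maximise the log-likelihood numerically over (A, σ²) and compare with the closed-form estimators above; the data-generation settings are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, A_true, sigma2_true = 200, 5.0, 2.0
x = A_true + rng.normal(0.0, np.sqrt(sigma2_true), N)

def neg_loglik(theta):
    A, s2 = theta
    # negative of log p(x; A, sigma^2) for the DC-level-in-WGN model
    return (N / 2) * np.log(2 * np.pi * s2) + np.sum((x - A) ** 2) / (2 * s2)

res = minimize(neg_loglik, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
A_ml, s2_ml = res.x
print(A_ml, x.mean())                        # numerical MLE of A vs. sample mean
print(s2_ml, np.mean((x - x.mean()) ** 2))   # numerical MLE of sigma^2 vs. Eq. (1)
```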

Example 1
Â is unbiased because E{Â} = (1/N) Σ_{n=0}^{N-1} E{x[n]} = (1/N) · N A = A.

σ̂² is biased. To prove this, expand

    E{σ̂²} = E{ (1/N) Σ_n (x[n] - Â)² }
           = E{ (1/N) Σ_n x²[n] } - E{ (2/N) Σ_n x[n] Â } + E{ Â² }    (2)

and use

    E{z²} = cov(z, z) + µ_z² = σ_z² + µ_z²    (3)

Example 1
The first term in Eq. 2 is

    E{ (1/N) Σ_n x²[n] } = (1/N) Σ_n ( cov(x[n], x[n]) + A² )
                         = (1/N) Σ_n ( σ² + A² )
                         = σ² + A²

The second term in Eq. 2 is

    E{ (2/N) Σ_n x[n] Â } = E{ (2/N²) Σ_n x[n] Σ_m x[m] }
                          = (2/N²) Σ_n Σ_m E{ x[n] x[m] }
                          = (2/N²) Σ_n Σ_m ( cov(x[n], x[m]) + A² )
                          = (2/N²) [ N σ² + N² A² ]
                          = 2 ( A² + σ²/N )

Example 1
The third term in Eq. 2 is

    E{Â²} = E{ ( (1/N) Σ_n x[n] )² }
          = (1/N²) [ Σ_n E{x²[n]} + Σ_n Σ_{m≠n} E{x[n] x[m]} ]
          = (1/N²) [ N (σ² + A²) + N(N-1) A² ]
          = (1/N²) [ N² A² + N σ² ]
          = A² + σ²/N

Example 1
Combining the three terms, we have

    E{σ̂²} = σ² + A² - 2(A² + σ²/N) + A² + σ²/N
           = σ² - σ²/N
           = (1 - 1/N) σ²

To make σ̂² unbiased, we need to use

    σ̂² = (1/(N-1)) Σ_{n=0}^{N-1} (x[n] - Â)²
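A short Monte Carlo sketch (with arbitrary illustrative values of A, σ², and N) that checks E{σ̂²} ≈ (1 - 1/N)σ² for the 1/N estimator, while the 1/(N-1) version is unbiased.

```python
import numpy as np

rng = np.random.default_rng(2)
A, sigma2, N, trials = 5.0, 2.0, 10, 100_000

x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
A_hat = x.mean(axis=1, keepdims=True)
s2_ml = np.mean((x - A_hat) ** 2, axis=1)            # 1/N estimator (biased)
s2_ub = np.sum((x - A_hat) ** 2, axis=1) / (N - 1)   # 1/(N-1) estimator (unbiased)

print(s2_ml.mean(), (N - 1) / N * sigma2)   # both close to 1.8
print(s2_ub.mean(), sigma2)                 # both close to 2.0
```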

Example 2
Consider the DC level in WGN: x[n] = A + w[n], n = 0, 1, ..., N-1, where w[n] ~ N(0, A).

The CRLB approach cannot be used to find an efficient estimator because

    ∂ log p(x; A)/∂A = -N/(2A) + (1/A) Σ_n (x[n] - A) + (1/(2A²)) Σ_n (x[n] - A)²
                     ≠ I(A) (g(x) - A)

for any functions I(A) and g(x). However, we may use maximum likelihood and set ∂ log p(x; A)/∂A = 0, which gives

    Â² + Â - (1/N) Σ_n x²[n] = 0   ⟹   Â = -1/2 + √( (1/N) Σ_n x²[n] + 1/4 )    (4)

Example 2
This estimator is biased because

    E{Â} = E{ -1/2 + √( (1/N) Σ_n x²[n] + 1/4 ) }
         ≠ -1/2 + √( E{ (1/N) Σ_n x²[n] } + 1/4 )
         = -1/2 + √( A + A² + 1/4 )
         = A,

since the expectation cannot be carried inside the square root.

Example 2
However, if N is large enough, the bias is negligible.

Example 2
The ML estimator in Eq. 4 is a reasonable estimator because, as N → ∞,

    (1/N) Σ_n x²[n] → E{x²[n]} = A + A²

Therefore,

    Â → -1/2 + √( A + A² + 1/4 ) = -1/2 + (A + 1/2) = A   as N → ∞

The MLE becomes asymptotically unbiased and optimal as N → ∞.
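A Monte Carlo sketch of the estimator in Eq. 4 (the value A = 2 and the simulation settings are illustrative assumptions), showing the bias shrinking as N grows.

```python
import numpy as np

rng = np.random.default_rng(3)
A, trials = 2.0, 50_000

for N in (5, 50, 500):
    # x[n] = A + w[n], w[n] ~ N(0, A): mean and variance both equal A
    x = A + rng.normal(0.0, np.sqrt(A), size=(trials, N))
    A_hat = -0.5 + np.sqrt(np.mean(x ** 2, axis=1) + 0.25)   # Eq. 4
    print(N, A_hat.mean())   # approaches A = 2 as N increases
```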

MLE of Transformed Parameters
Often it is required to estimate a transformed parameter instead of the one the PDF depends on. For example, in the DC-level problem we might be interested in the power of the signal, A², instead of the mean A.

Given x[n] = A + w[n], n = 0, 1, ..., N-1, where w[n] ~ N(0, σ²), find the MLE of the transformed parameter

    α = exp(A)

The log-likelihood function is

    log p_T(x; α) = -(N/2) log(2πσ²) - (1/(2σ²)) Σ_{n=0}^{N-1} (x[n] - log α)²

MLE of Transformed Parameters
Setting the derivative of log p_T(x; α) to 0 yields

    (1/α̂) (1/σ²) Σ_n (x[n] - log α̂) = 0   ⟹   α̂ = exp(x̄),

where α̂ > 0 and x̄ is the sample mean.

Things get more complicated if the transformation is

    α = A²   ⟹   A = ±√α

We need to consider two PDFs:

    log p_T1(x; α) = const - (1/(2σ²)) Σ_n (x[n] - √α)²   for α ≥ 0, A ≥ 0
    log p_T2(x; α) = const - (1/(2σ²)) Σ_n (x[n] + √α)²   for α > 0, A < 0

MLE of Transformed Parameters
Then, we solve the ML estimation problem in both cases and choose the one that gives the higher maximum value:

    α̂ = argmax_α { p_T1(x; α), p_T2(x; α) }

It can easily be shown that the MLE is α̂ = Â² = x̄².
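A numerical sketch (the values of A, σ², N and the grid are illustrative assumptions) that maximises both branch likelihoods over a grid of α and confirms that the winner agrees with x̄².

```python
import numpy as np

rng = np.random.default_rng(4)
N, A_true, sigma2 = 50, -1.5, 1.0            # negative A, so the A < 0 branch should win
x = A_true + rng.normal(0.0, np.sqrt(sigma2), N)

alpha = np.linspace(1e-6, 9.0, 20_000)
logp1 = -np.array([np.sum((x - np.sqrt(a)) ** 2) for a in alpha]) / (2 * sigma2)  # A >= 0 branch
logp2 = -np.array([np.sum((x + np.sqrt(a)) ** 2) for a in alpha]) / (2 * sigma2)  # A <  0 branch

# take the pointwise maximum of the two branches, then maximise over alpha
alpha_hat = alpha[np.argmax(np.maximum(logp1, logp2))]
print(alpha_hat, x.mean() ** 2)   # both close to A_true**2
```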

Invariance Property of the MLE
Given a PDF p(x; θ) parameterized by θ, the MLE of the parameter α = g(θ) is α̂ = g(θ̂), where θ̂ is the MLE of θ, obtained by maximizing p(x; θ).

If g is not a one-to-one function, then α̂ maximizes the modified likelihood function

    p̄_T(x; α) = max_{θ: α = g(θ)} p(x; θ)

Application: Range Estimation in Radar
In radar or sonar, a signal pulse is transmitted. The round-trip delay τ₀ from the transmitter to the target and back is related to the range R by τ₀ = 2R/c, where c is the speed of propagation.

In analog form, the received signal can be written as

    x(t) = s(t - τ₀) + w(t),   0 ≤ t ≤ T,

where s(t) is the transmitted signal and w(t) is noise with variance σ².

Application: Range Estimation in Radar
After discretisation, we have

    x[n] = w[n],               0 ≤ n ≤ n₀ - 1
    x[n] = s[n - n₀] + w[n],   n₀ ≤ n ≤ n₀ + M - 1
    x[n] = w[n],               n₀ + M ≤ n ≤ N - 1

where M is the length of the sampled signal and n₀ = F_s τ₀, where F_s is the sampling rate, which must be at least twice the bandwidth of the signal.

Application: Range Estimation in Radar
Assuming that everything is Gaussian, the PDF is

    p(x; n₀) = ∏_{n=0}^{n₀-1} (1/√(2πσ²)) exp{ -x²[n]/(2σ²) }
               × ∏_{n=n₀}^{n₀+M-1} (1/√(2πσ²)) exp{ -(x[n] - s[n - n₀])²/(2σ²) }
               × ∏_{n=n₀+M}^{N-1} (1/√(2πσ²)) exp{ -x²[n]/(2σ²) }

             = (2πσ²)^{-N/2} exp{ -(1/(2σ²)) Σ_{n=0}^{N-1} x²[n] }
               × exp{ -(1/(2σ²)) Σ_{n=n₀}^{n₀+M-1} ( -2 x[n] s[n - n₀] + s²[n - n₀] ) }

Application: Range Estimation in Radar
Considering only the term involving n₀, the MLE of n₀ can be found by maximizing

    exp{ -(1/(2σ²)) Σ_{n=n₀}^{n₀+M-1} ( -2 x[n] s[n - n₀] + s²[n - n₀] ) }

or, equivalently, by minimizing

    Σ_{n=n₀}^{n₀+M-1} ( -2 x[n] s[n - n₀] + s²[n - n₀] )

Note that Σ_{n=n₀}^{n₀+M-1} s²[n - n₀] = Σ_{m=0}^{M-1} s²[m], which is independent of n₀. So the MLE of n₀ is found by maximizing

    Σ_{n=n₀}^{n₀+M-1} x[n] s[n - n₀] = Σ_{m=0}^{M-1} x[m + n₀] s[m]

Application: Range Estimation in Radar
This means that the MLE of n₀ is found by correlating the transmitted signal s[n] with the received signal x[n] at all possible delays and choosing the delay that gives the maximum correlation.

By the invariance principle, the MLE of the range is

    R̂ = c τ̂₀ / 2 = c n̂₀ / (2 F_s)
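A minimal correlator sketch of this MLE; the pulse shape, noise level, sample rate F_s, and propagation speed c used below are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, n0_true, sigma = 1000, 64, 300, 0.5

s = np.hanning(M) * np.cos(2 * np.pi * 0.2 * np.arange(M))   # assumed transmitted pulse s[n]
x = rng.normal(0.0, sigma, N)                                 # noise everywhere
x[n0_true:n0_true + M] += s                                   # x[n] = s[n - n0] + w[n] in the active window

# MLE of n0: maximise sum_m x[m + n0] * s[m] over all candidate delays
corr = np.array([np.dot(x[n0:n0 + M], s) for n0 in range(N - M + 1)])
n0_hat = int(np.argmax(corr))

Fs, c = 1e6, 3e8                       # assumed sampling rate and propagation speed
R_hat = c * n0_hat / (2 * Fs)          # invariance: range estimate from the delay estimate
print(n0_hat, R_hat)
```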

Bayesian Estimators
Bayesian estimators differ from classical estimators in that they treat the parameters as random variables instead of unknown constants.

The parameters therefore also have a PDF, which needs to be taken into account when seeking an estimator.

The PDF of the parameters can be used to incorporate any prior knowledge we may have about their values.

Bayesian Estimators
For example, we might know that the normalized frequency f₀ of an observed sinusoid cannot be greater than 0.1. This is ensured by choosing

    p(f₀) = 10   if 0 ≤ f₀ ≤ 0.1
    p(f₀) = 0    otherwise

as the prior PDF in the Bayesian framework. Usually differentiable PDFs are easier to work with, and we could approximate the uniform PDF with, e.g., the Rayleigh PDF.

Prior and Posterior Estimates
The Bayesian approach can be applied to small data records, and the estimate can be improved sequentially as new data arrive.

For example, consider tossing a coin and estimating the probability of a head, µ. The maximum-likelihood estimate is

    µ̂ = #heads / #tosses

If the number of tosses is 3 and 3 heads (no tails) are observed, then µ_ML = 1.

The Bayesian approach can circumvent this problem, because the prior regularizes the likelihood and avoids overfitting to the small amount of data.

Prior and Posterior Estimates
(Figure: likelihood, prior, and posterior after observing 3 heads in a row.)

Prior and Posterior Estimates
Likelihood function:

    p(x|µ) = µ^{#heads} (1 - µ)^{#tails}

If x = {H, H, H}, then max_µ p(x|µ) = 1 and argmax_µ p(x|µ) = 1.

The prior p(µ) is selected to reflect our belief that we have a fair coin.

The posterior density can be obtained from Bayes' formula:

    p(µ|x) = p(x|µ) p(µ) / p(x) ∝ p(x|µ) p(µ)

The Bayesian approach here is to select the maximum of the posterior (maximum a posteriori, MAP):

    µ̂ = argmax_µ p(µ|x) = argmax_µ p(x|µ) p(µ)
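A sketch of the coin example. The slides do not specify the form of the prior, so a Beta(a, b) prior peaked at µ = 0.5 is assumed here as one common way to encode the "fair coin" belief; the grid search is a simple stand-in for the closed-form MAP solution.

```python
import numpy as np

heads, tails = 3, 0
a, b = 5.0, 5.0                     # assumed Beta prior peaked at mu = 0.5 ("fair coin" belief)

mu = np.linspace(1e-6, 1 - 1e-6, 10_001)
log_lik   = heads * np.log(mu) + tails * np.log(1 - mu)       # log p(x | mu)
log_prior = (a - 1) * np.log(mu) + (b - 1) * np.log(1 - mu)   # Beta(a, b) up to a constant
log_post  = log_lik + log_prior                               # log posterior up to a constant

mu_ml  = mu[np.argmax(log_lik)]
mu_map = mu[np.argmax(log_post)]
print(mu_ml)    # ~1.0: overfits the three heads
print(mu_map)   # (heads + a - 1) / (heads + tails + a + b - 2) = 7/11 ~ 0.636
```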

Average Cost
Bayesian estimators can be obtained by minimizing the average cost:

    θ̂ = argmin_θ̂ ∫_θ ∫_x C(θ - θ̂) p(x, θ) dx dθ
      = argmin_θ̂ ∫_θ ∫_x C(θ - θ̂) p(θ|x) p(x) dx dθ
      = argmin_θ̂ ∫_x ( ∫_θ C(θ - θ̂) p(θ|x) dθ ) p(x) dx
      = argmin_θ̂ ∫_θ C(θ - θ̂) p(θ|x) dθ

where the last step follows because p(x) ≥ 0, so it suffices to minimize the inner integral for each x.

Bayesian MMSE Estimator
If C(z) = z², we have the Bayesian minimum mean-square error (MMSE) estimator:

    θ̂_mmse = argmin_θ̂ ∫_θ (θ - θ̂)² p(θ|x) dθ

Differentiating the integral w.r.t. θ̂ and setting the result to 0, we obtain

    ∫ -2 (θ - θ̂) p(θ|x) dθ = 0
    ⟹ θ̂ ∫ p(θ|x) dθ = ∫ θ p(θ|x) dθ
    ⟹ θ̂ = ∫ θ p(θ|x) dθ

Bayesian MMSE Estimator
Therefore, the Bayesian MMSE estimator is

    θ̂_mmse = ∫ θ p(θ|x) dθ = E{θ|x},

which is the mean of the posterior PDF, p(θ|x).
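Continuing the coin sketch above (same assumed Beta prior), the MMSE estimate can be approximated by computing the posterior mean numerically on a grid.

```python
import numpy as np

heads, tails = 3, 0
a, b = 5.0, 5.0                                          # same assumed Beta prior as before

mu = np.linspace(1e-6, 1 - 1e-6, 10_001)
post = mu ** (heads + a - 1) * (1 - mu) ** (tails + b - 1)   # unnormalised posterior
post /= np.trapz(post, mu)                                   # normalise so it integrates to 1

mu_mmse = np.trapz(mu * post, mu)                            # posterior mean = MMSE estimate
print(mu_mmse)   # analytic value: (heads + a) / (heads + tails + a + b) = 8/13 ~ 0.615
```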