EIE6207: Estimation Theory


Man-Wai MAK (mwmak)
Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University

Reference: Steven M. Kay, Fundamentals of Statistical Signal Processing, Prentice Hall.

November 8, 2018

Overview

1. Motivations
2. Introduction
3. Minimum Variance Unbiased Estimation
4. Cramer-Rao Lower Bound

Motivations

Estimation in signal processing is fundamental to many applications:

- Radar
- Sonar
- Speech and audio processing
- Image and video processing
- Biomedicine
- Communications
- Control engineering
- Seismology

All of these applications require the estimation of a set of parameters, e.g., the position of an aircraft through radar, the position of a submarine through sonar, or the phonemes in a speech waveform.

All of these applications share a common characteristic: the parameters are estimated from noisy signals, so their values can only be estimated.

Introduction

In all of these applications, we estimate the values of parameters based on some waveforms. Specifically, given $N$ discrete-time samples $\{x[0], x[1], \ldots, x[N-1]\}$ that depend on an unknown parameter $\theta$, we define an estimator $\hat{\theta}$ of $\theta$:

$$\hat{\theta} = g(x[0], x[1], \ldots, x[N-1]),$$

where $g$ is some function. For example, for the mean estimator, $g$ is given by

$$g(x[0], x[1], \ldots, x[N-1]) = \frac{1}{N}\sum_{i=0}^{N-1} x[i].$$
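To make the definition concrete, here is a minimal Python sketch of the sample-mean estimator $g$; the true parameter, noise level, and sample size are assumed values for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed demo values: true parameter theta, noise std sigma, sample size N.
theta, sigma, N = 2.0, 1.0, 100
x = theta + rng.normal(0.0, sigma, N)   # noisy discrete-time observations

def g(x):
    """Sample-mean estimator: g(x[0],...,x[N-1]) = (1/N) sum_i x[i]."""
    return x.mean()

print(g(x))  # close to theta = 2.0 for large N
```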

Introduction

To determine good estimators, we need to mathematically model the data. We describe the data using a probability density function (PDF) parameterized by $\theta$:

$$p(x[0], \ldots, x[N-1]; \theta).$$

For example, when $N = 1$ and assuming a Gaussian PDF, we have

$$p(x[0]; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(x[0] - \theta)^2\right].$$

In actual problems, we do not have the PDF but only the data. So, we select an appropriate parametric PDF based on our prior knowledge of the problem. The PDF should also be mathematically tractable.

Introduction

The stock price in the figure below seems to increase on average. To verify this hypothesis, we use the model

$$x[n] = A + Bn + w[n], \quad n = 0, 1, \ldots, N-1,$$

where $w[n]$ is white Gaussian noise with PDF $\mathcal{N}(0, \sigma^2)$.

Figure: y-axis: stock price $x[n]$; x-axis: time point $n$.

Introduction

So far, we have assumed that $\theta$ is deterministic but unknown. This is called classical estimation. But we could also bring in prior knowledge about $\theta$, e.g., the prior range or distribution of $A$ and $B$. Then, we have a Bayesian estimator. The data and parameters are modeled by a joint PDF:

$$p(\mathbf{x}, \theta) = p(\mathbf{x}|\theta)p(\theta),$$

where $p(\theta)$ is the prior PDF, which summarizes our knowledge about $\theta$ before any data are observed, and $p(\mathbf{x}|\theta)$ is the conditional PDF, summarizing our knowledge about $\mathbf{x}$ when we know $\theta$.
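As a sketch of the Bayesian route (not part of the slide), the DC-level model with a Gaussian prior on $A$ admits a closed-form posterior mean via the standard conjugate-Gaussian result; all numerical values below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: DC level A in WGN, with a Gaussian prior A ~ N(mu0, sigma02).
A_true, sigma2, N = 1.0, 1.0, 50
mu0, sigma02 = 0.0, 0.5
x = A_true + rng.normal(0.0, np.sqrt(sigma2), N)

# Classical (sample mean) vs Bayesian (posterior mean, conjugate Gaussian case)
A_classical = x.mean()
A_bayes = (N*sigma02*x.mean() + sigma2*mu0) / (N*sigma02 + sigma2)
print(A_classical, A_bayes)  # the posterior mean shrinks toward the prior mu0
```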

Introduction

If the noise samples $w[n]$ are independent, then

$$p(\mathbf{x}; \boldsymbol{\theta}) = \prod_{n=0}^{N-1} p(x[n]; \boldsymbol{\theta}) = \prod_{n=0}^{N-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(x[n] - A - Bn)^2\right] = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n] - A - Bn)^2\right],$$

where $\mathbf{x} = [x[0] \; \cdots \; x[N-1]]^T$ and $\boldsymbol{\theta} = [A \; B]^T$.

Maximizing $p(\mathbf{x}; \boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$, we obtain closed-form solutions for $A$ and $B$. For the data shown on the previous slide, $B > 0$, meaning the stock price is increasing.
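Since the likelihood above is Gaussian, maximizing it over $\boldsymbol{\theta} = [A \; B]^T$ is equivalent to least squares. A minimal sketch with assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed demo values for the trend model x[n] = A + B*n + w[n].
A_true, B_true, sigma, N = 10.0, 0.1, 1.0, 200
n = np.arange(N)
x = A_true + B_true*n + rng.normal(0.0, sigma, N)

# Maximizing the Gaussian likelihood = minimizing sum_n (x[n] - A - B*n)^2.
H = np.column_stack([np.ones(N), n])            # design matrix for [A, B]
(A_hat, B_hat), *_ = np.linalg.lstsq(H, x, rcond=None)
print(A_hat, B_hat)  # B_hat > 0 supports the "increasing price" hypothesis
```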

Minimum Variance Unbiased Estimation

An estimator is unbiased if on average it yields the true value of the unknown parameter $\theta$. Mathematically, an estimator is unbiased if

$$E(\hat{\theta}) = \theta \quad \text{for all } a < \theta < b.$$

Example: unbiased estimator for a DC level in white Gaussian noise (WGN). Consider the observations

$$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1, \tag{1}$$

where $-\infty < A < \infty$ is the parameter to be estimated and $w[n]$ is WGN. A reasonable estimator of $A$ is the average value of $x[n]$:

$$\hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n].$$

Minimum Variance Unbiased Estimation

Using the linearity property of expectation:

$$E(\hat{A}) = \frac{1}{N}\sum_{n=0}^{N-1} E(x[n]) = A.$$

The sample mean is an unbiased estimator of the DC level in WGN.

If multiple estimators of the same parameter $\theta$ are available, i.e., $\{\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_K\}$, a better estimator can be obtained by averaging them:

$$\hat{\theta} = \frac{1}{K}\sum_{k=1}^{K} \hat{\theta}_k.$$

If all estimators are unbiased, we have $E(\hat{\theta}) = \theta$. If, in addition, the estimators are independent and have the same variance, then

$$\text{var}(\hat{\theta}) = \frac{1}{K^2}\sum_{k=1}^{K} \text{var}(\hat{\theta}_k) = \frac{\text{var}(\hat{\theta}_1)}{K},$$

so the variance decreases as $K$ increases.
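A quick Monte Carlo check (assumed values) of the $1/K$ variance reduction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Average K independent unbiased estimators of theta; the variance of the
# average should drop to var(theta_hat_1)/K.
theta, K, trials = 1.0, 10, 100_000
theta_hats = theta + rng.normal(0.0, 1.0, (trials, K))  # each column: one estimator
avg = theta_hats.mean(axis=1)

print(theta_hats[:, 0].var())  # ~1.0  (variance of a single estimator)
print(avg.var())               # ~0.1  (reduced by a factor of K)
```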

Minimum Variance Unbiased Estimation

However, a biased estimator is characterized by a systematic error, e.g., $E(\hat{\theta}_k) = \theta + b(\theta)$. Then,

$$E(\hat{\theta}) = \frac{1}{K}\sum_{k=1}^{K} E(\hat{\theta}_k) = \theta + b(\theta).$$

This means that averaging more estimators does not reduce the bias: however many biased estimators we average, the systematic error $b(\theta)$ remains.

Minimum Variance Criterion

Mean square error (MSE):

$$\text{mse}(\hat{\theta}) = E\left\{(\hat{\theta} - \theta)^2\right\}.$$

The MSE criterion is problematic if the estimator is biased, because

$$\text{mse}(\hat{\theta}) = E\left\{\left[(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)\right]^2\right\} = \text{var}(\hat{\theta}) + \left[E(\hat{\theta}) - \theta\right]^2 = \text{var}(\hat{\theta}) + b^2(\theta). \tag{2}$$

This means that the MSE comprises errors due to both the variance of the estimator and its bias. Therefore, we should find an unbiased estimator (with $b = 0$) that produces the minimum variance.
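The decomposition in Eq. 2 is easy to verify numerically; the sketch below uses a deliberately biased, scaled sample mean with assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Numerical check of mse = var + bias^2 (Eq. 2) for a biased estimator.
theta, N, trials = 1.0, 20, 200_000
x = theta + rng.normal(0.0, 1.0, (trials, N))
theta_hat = 0.8 * x.mean(axis=1)        # biased: E[theta_hat] = 0.8*theta

mse  = np.mean((theta_hat - theta)**2)
var  = theta_hat.var()
bias = theta_hat.mean() - theta
print(mse, var + bias**2)               # the two numbers agree closely
```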

Example of Biased Estimators

Let the estimator of $A$ in Eq. 1 be

$$\tilde{A} = \frac{\alpha}{N}\sum_{n=0}^{N-1} x[n],$$

where $\alpha$ is an arbitrary constant. We would like to find the value of $\alpha$ that minimizes

$$\text{mse}(\tilde{A}) = E\left\{(\tilde{A} - A)^2\right\} = \text{var}(\tilde{A}) + \left[E\{\tilde{A}\} - A\right]^2 = \frac{\alpha^2\sigma^2}{N} + (\alpha - 1)^2 A^2.$$

Setting $\frac{\partial\,\text{mse}(\tilde{A})}{\partial\alpha} = 0$, we obtain

$$\alpha_{\text{opt}} = \frac{A^2}{A^2 + \sigma^2/N}.$$
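A small numeric check (assumed values) that the analytic $\alpha_{\text{opt}}$ coincides with the minimizer of $\text{mse}(\tilde{A})$ over a grid:

```python
import numpy as np

# Assumed values; mse(alpha) = alpha^2*sigma2/N + (alpha - 1)^2*A^2.
A, sigma2, N = 1.0, 1.0, 10
alphas = np.linspace(0.0, 2.0, 2001)
mse = alphas**2 * sigma2/N + (alphas - 1)**2 * A**2

alpha_opt = A**2 / (A**2 + sigma2/N)
print(alphas[np.argmin(mse)], alpha_opt)  # numerical and analytic minima agree
```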

Example of Biased Estimators

The optimal value of $\alpha$ depends on the unknown parameter $A$. Therefore, the estimator is not realizable.

Unbiased Estimators

We could focus on estimators that have zero bias so that the bias contributes nothing to the MSE. Without the bias term, the MSE in Eq. 2 reduces to the variance alone. By focusing on estimators with zero bias, we may hope to arrive at a design criterion that yields realizable estimators.

Definition: An estimator $\hat{\theta}$ is called unbiased if its bias is zero for all values of the unknown parameter.

Recap: For an estimator to be unbiased, we require that on average the estimator yields the true value of the unknown parameter. Let the estimator be $\hat{\theta} = g(\mathbf{x})$, e.g., $g$ can be the mean function. If for all $\theta$

$$E\{\hat{\theta}\} = \int g(\mathbf{x})\, p(\mathbf{x}; \theta)\, d\mathbf{x} = \theta,$$

then $g(\mathbf{x})$ is an unbiased estimator.

Minimum Variance Criterion

An unbiased estimator is not necessarily a good estimator, and an estimator can be unbiased for some parameter values but biased for others. For example,

$$\tilde{A} = \frac{1}{2N}\sum_{n=0}^{N-1} x[n], \qquad E(\tilde{A}) = \frac{A}{2} = \begin{cases} 0 = A & \text{if } A = 0 \;\Rightarrow\; \text{unbiased} \\ \frac{A}{2} \neq A & \text{if } A \neq 0 \;\Rightarrow\; \text{biased} \end{cases}$$

Some unbiased estimators are more useful than others. For example, consider two estimators for Eq. 1: $\hat{A}_1 = x[0]$ and $\hat{A}_2 = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$. Although $E(\hat{A}_1) = E(\hat{A}_2) = A$, $\hat{A}_2$ is better because

$$\text{var}(\hat{A}_1) = \sigma^2 > \text{var}(\hat{A}_2) = \frac{\sigma^2}{N}. \tag{3}$$
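A Monte Carlo comparison (assumed values) of the two unbiased estimators in Eq. 3:

```python
import numpy as np

rng = np.random.default_rng(0)

# Compare A1_hat = x[0] with A2_hat = sample mean; both unbiased for A.
A, sigma, N, trials = 1.0, 1.0, 25, 100_000
x = A + rng.normal(0.0, sigma, (trials, N))

A1_hat = x[:, 0]            # var ~ sigma^2 = 1.0
A2_hat = x.mean(axis=1)     # var ~ sigma^2/N = 0.04
print(A1_hat.var(), A2_hat.var())
```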

Minimum Variance Unbiased Estimators

If the bias is zero, the MSE is just the variance. This gives rise to the minimum variance unbiased estimator (MVUE) for $\theta$.

Definition: An estimator $\hat{\theta}_{\text{MVUE}}$ is the MVUE if it is unbiased and has the smallest variance among all unbiased estimators for all values of the unknown parameter $\theta$, i.e., for every unbiased estimator $\hat{\theta}$,

$$\text{var}(\hat{\theta}_{\text{MVUE}}) \leq \text{var}(\hat{\theta}).$$

It is important to note that a uniformly MVUE may not always exist, and even if it does, we may not be able to find it. This definition and the above discussion also apply to vector parameters $\boldsymbol{\theta}$.

Cramer-Rao Lower Bound

Given an estimation problem, what is the variance of the best possible estimator? This quantity is given by the Cramer-Rao lower bound (CRLB). The CRLB theorem also provides a method for finding the best estimator.

We will use the example in Eq. 1 to explain the idea of the CRLB. For simplicity, suppose that we are using only a single observation $x[0]$ to estimate $A$ in Eq. 1. The PDF of $x[0]$ is

$$p(x[0]; A) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(x[0] - A)^2\right].$$

Once we have observed $x[0]$, say $x[0] = 3$, some values of $A$ are more likely than others.

Cramer-Rao Lower Bound

Viewed as a function of $A$, the PDF has the same form as the PDF of $x[0]$:

$$p(x[0] = 3; A) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(3 - A)^2\right].$$

Figure: the PDF plotted versus $A$ for $x[0] = 3$, for a small $\sigma^2$ (upper panel) and a larger $\sigma^2$ (lower panel).

We will likely obtain a more accurate estimate of $A$ for the sharper distribution in the upper panel.
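A plotting sketch for this figure; since the exact noise variances were garbled in the transcription, the values $\sigma^2 = 1/3$ and $\sigma^2 = 1$ are assumed for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Likelihood of A given x[0] = 3, for two assumed noise variances.
A = np.linspace(0.0, 6.0, 500)
x0 = 3.0
for sigma2, style in [(1/3, '-'), (1.0, '--')]:
    lik = np.exp(-(x0 - A)**2 / (2*sigma2)) / np.sqrt(2*np.pi*sigma2)
    plt.plot(A, lik, style, label=f'sigma^2 = {sigma2:.2f}')
plt.xlabel('Value of A'); plt.ylabel('p(x[0]=3; A)'); plt.legend()
plt.show()   # the smaller variance gives a much sharper likelihood
```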

Cramer-Rao Lower Bound

If the PDF is viewed as a function of the unknown parameter (with $\mathbf{x}$ fixed), it is called the likelihood function. The sharpness of the likelihood function determines how accurately we can estimate the parameter. For example, the PDF in the top panel of the previous figure is the easier case.

If a single sample is observed as $x[0] = A + w[0]$, then we can expect a better estimate if $\sigma^2$ is small.

How do we quantify the sharpness? Is there any measure that is common to all possible estimators for a specific estimation problem? The second derivative of the likelihood (or log-likelihood) function is one way of measuring its sharpness.

Cramer-Rao Lower Bound

Recall that the PDF of $x[0]$ is

$$p(x[0]; A) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(x[0] - A)^2\right].$$

The log-likelihood function is

$$\log p(x[0]; A) = -\log\sqrt{2\pi\sigma^2} - \frac{1}{2\sigma^2}(x[0] - A)^2.$$

The first and second derivatives w.r.t. $A$ are:

$$\frac{\partial \log p(x[0]; A)}{\partial A} = \frac{1}{\sigma^2}(x[0] - A), \qquad \frac{\partial^2 \log p(x[0]; A)}{\partial A^2} = -\frac{1}{\sigma^2}.$$
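These derivatives can be checked symbolically; a minimal sympy sketch:

```python
import sympy as sp

# Symbolic check of the first and second derivatives of the log-likelihood.
x0, A = sp.symbols('x0 A', real=True)
s2 = sp.symbols('sigma2', positive=True)

logp = -sp.log(sp.sqrt(2*sp.pi*s2)) - (x0 - A)**2 / (2*s2)
print(sp.simplify(sp.diff(logp, A)))     # (x0 - A)/sigma2
print(sp.simplify(sp.diff(logp, A, 2)))  # -1/sigma2
```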

Cramer-Rao Lower Bound

Since $\sigma^2$ is the smallest possible variance, in this specific case we have an alternative way of finding the minimum variance of all estimators:

$$\text{minimum variance} = \sigma^2 = \frac{1}{-\frac{\partial^2 \log p(x[0]; A)}{\partial A^2}}.$$

For the general case:

- If the function depends on the data $\mathbf{x}$, take the expectation over all $\mathbf{x}$.
- If the function depends on the parameter $\theta$, evaluate the derivative at the true value of $\theta$.

We have the general rule:

$$\text{min. var. of any unbiased estimator} = \frac{1}{-E\left[\frac{\partial^2 \log p(\mathbf{x}; \theta)}{\partial \theta^2}\right]}.$$

CRLB Theorem

If the PDF $p(\mathbf{x}; \theta)$ satisfies the regularity condition

$$E\left\{\frac{\partial \log p(\mathbf{x}; \theta)}{\partial \theta}\right\} = 0 \quad \text{for all } \theta,$$

where the expectation is taken with respect to $p(\mathbf{x}; \theta)$, then the variance of any unbiased estimator $\hat{\theta}$ must satisfy

$$\text{var}(\hat{\theta}) \geq \frac{1}{-E\left[\frac{\partial^2 \log p(\mathbf{x}; \theta)}{\partial \theta^2}\right]},$$

where the derivative is evaluated at the true value of $\theta$ and the expectation is taken with respect to $p(\mathbf{x}; \theta)$.

An unbiased estimator attains the bound for all $\theta$ if and only if

$$\frac{\partial \log p(\mathbf{x}; \theta)}{\partial \theta} = I(\theta)(g(\mathbf{x}) - \theta) \tag{4}$$

for some functions $g$ and $I$. The MVU estimator is then $\hat{\theta} = g(\mathbf{x})$, and the minimum variance is $1/I(\theta)$.
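For the single-sample DC-level model, the regularity condition reads $E[(x[0] - A)/\sigma^2] = 0$ at the true $A$; a quick Monte Carlo check with assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sanity check of the regularity condition E[d log p / dA] = 0.
A, sigma2, trials = 1.0, 1.0, 1_000_000
x0 = A + rng.normal(0.0, np.sqrt(sigma2), trials)

score = (x0 - A) / sigma2
print(score.mean())   # ~0, as the regularity condition requires
```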

CRLB Example 1: Estimation of DC Level in WGN

Example: DC level in Gaussian noise:

$$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1.$$

What is the minimum variance of any unbiased estimator using $N$ samples? The likelihood function is the product of $N$ densities:

$$p(\mathbf{x}; A) = \prod_{n=0}^{N-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(x[n] - A)^2\right] = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n] - A)^2\right].$$

CRLB Example 1: Estimation of DC Level in WGN

The log-likelihood function is

$$\log p(\mathbf{x}; A) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n] - A)^2.$$

The first derivative is

$$\frac{\partial \log p(\mathbf{x}; A)}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n] - A) = \frac{N}{\sigma^2}(\bar{x} - A),$$

where $\bar{x}$ is the sample mean. The second derivative is

$$\frac{\partial^2 \log p(\mathbf{x}; A)}{\partial A^2} = -\frac{N}{\sigma^2}.$$

Therefore, the minimum variance of any unbiased estimator is given by

$$\text{var}(\hat{A}) \geq \frac{\sigma^2}{N}. \tag{5}$$
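A Monte Carlo check (assumed values) that the sample mean attains this bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical variance of the sample mean vs the CRLB sigma^2/N of Eq. 5.
A, sigma2, N, trials = 1.0, 1.0, 50, 200_000
x = A + rng.normal(0.0, np.sqrt(sigma2), (trials, N))

A_hat = x.mean(axis=1)
print(A_hat.var(), sigma2/N)   # both ~0.02, so the bound is attained
```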

CRLB Example 1: Estimation of DC Level in WGN

In Eq. 3, the sample mean $\hat{A}_2$ attains the minimum variance in Eq. 5. Therefore, the sample mean is the MVUE for this problem. Using the CRLB theorem in Eq. 4, we have

$$\frac{\partial \log p(\mathbf{x}; A)}{\partial A} = \frac{N}{\sigma^2}(\bar{x} - A) = I(A)(g(\mathbf{x}) - A),$$

where $I(A) = \frac{N}{\sigma^2}$ and $g(\mathbf{x}) = \bar{x}$. So, $g(\mathbf{x})$ is the MVUE and $1/I(A)$ is its variance.

CRLB Example 2: Estimation of the Phase of a Sinusoid

Problem: Given $N$ data points and a model with known amplitude $A$ and frequency $f_0$:

$$x[n] = A\cos(2\pi f_0 n + \phi) + w[n], \quad n = 0, \ldots, N-1,$$

estimate the phase $\phi$ of a sinusoid embedded in WGN with variance $\sigma^2$. How accurate will the estimator be?

Solution (derivation: see Tutorial):

$$\text{var}(\hat{\phi}) \geq \frac{2\sigma^2}{NA^2} \quad (\text{approximation}).$$
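A simulation sketch of this example: ML phase estimation by grid search (an assumed implementation, not from the slides), with the empirical variance compared against the approximate CRLB. All parameter values are assumed; $f_0$ is kept away from 0 and 1/2 so the approximation holds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed values; high SNR so the ML estimator nearly attains the CRLB.
A, f0, N, sigma2, trials = 1.0, 0.08, 100, 0.05, 500
n = np.arange(N)
phi_true = 0.4

# ML estimate: the phi minimizing sum_n (x[n] - A*cos(2*pi*f0*n + phi))^2.
phis = np.linspace(-np.pi, np.pi, 2048)
templates = A * np.cos(2*np.pi*f0*n[None, :] + phis[:, None])  # (grid, N)

phi_hat = np.empty(trials)
for t in range(trials):
    x = A*np.cos(2*np.pi*f0*n + phi_true) + rng.normal(0, np.sqrt(sigma2), N)
    phi_hat[t] = phis[np.argmin(((x - templates)**2).sum(axis=1))]

print(phi_hat.var(), 2*sigma2/(N*A**2))  # empirical variance vs approx. CRLB
```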

CRLB Summary

- The Cramer-Rao inequality provides a lower bound on the estimation error variance.
- The minimum attainable variance is often larger than the CRLB.
- We need to know the PDF to evaluate the CRLB. Often we don't have this information and cannot evaluate the bound. If the data are multivariate Gaussian or i.i.d. with a known distribution, we can evaluate it.
- It is not guaranteed that the MVUE exists or is realizable.
