Estimation and Detection

1 Estimation and Detection. Lecture 2: Cramér-Rao Lower Bound. Dr. ir. Richard C. Hendriks & Dr. Sundeep P. Chepuri, 2017.

2 Remember: Introductory Example. Given a process (DC level in noise): $x[n] = A + w[n]$, $n = 0, 1, \ldots, N-1$, with $w[n]$ zero-mean noise. How can we estimate $A$? Some candidates: 1. $\hat{A}_1 = x[0]$; 2. $\hat{A}_2 = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$. Consider the case $A = 1$; in one realisation the sample mean gave $\hat{A}_2 = 0.9$. Which estimate is better? Can the performance of an estimator be based on a single realisation? No! The data is random; as a result, the estimator is a random variable too.

3 Introductory Example. Use statistical descriptors: the expected value and variance of the estimator. $E(\hat{A}_1) = A$ and $E(\hat{A}_2) = E\big[\frac{1}{N}\sum_{n=0}^{N-1} x[n]\big] = \frac{1}{N}\sum_{n=0}^{N-1} E(x[n]) = A$: both estimators are unbiased. Now look at the variance; define $\sigma^2 = \mathrm{var}(w[n])$. Then $\mathrm{var}(\hat{A}_1) = \sigma^2$, while $\mathrm{var}(\hat{A}_2) = \mathrm{var}\big(\frac{1}{N}\sum_{n=0}^{N-1} x[n]\big) = \frac{1}{N^2}\sum_{n=0}^{N-1}\mathrm{var}(x[n]) = \frac{\sigma^2}{N}$. $\hat{A}_2$ has the smaller variance. Also, $\mathrm{var}(\hat{A}_2) \to 0$ as $N \to \infty$: the estimator is consistent. Conclusion: estimators are random variables. Question: what are good (or optimal) estimators?
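
These claims are easy to check numerically. Below is a minimal Monte Carlo sketch (not part of the original slides; all parameter values are illustrative assumptions) comparing the two candidate estimators over many realisations:

```python
# Sketch: Monte Carlo comparison of A_hat1 = x[0] and the sample mean
# A_hat2 for x[n] = A + w[n]. Parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
A, sigma, N, n_trials = 1.0, 1.0, 100, 10_000

x = A + sigma * rng.standard_normal((n_trials, N))
A_hat1 = x[:, 0]          # use only the first sample
A_hat2 = x.mean(axis=1)   # average all N samples

for name, est in [("A_hat1", A_hat1), ("A_hat2", A_hat2)]:
    print(f"{name}: mean ~ {est.mean():.3f}, var ~ {est.var():.4f}")
# Both means come out near A = 1 (unbiased); the variances come out near
# sigma^2 = 1 and sigma^2/N = 0.01, matching the expressions on the slide.
```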

4 Minimum Variance Criterion. A natural criterion is the Mean Square Error (MSE):
$$\mathrm{mse}(\hat{\theta}) = E\big[(\hat{\theta}-\theta)^2\big] = E\Big[\big((\hat{\theta}-E(\hat{\theta})) + (E(\hat{\theta})-\theta)\big)^2\Big] = E\big[(\hat{\theta}-E(\hat{\theta}))^2\big] + (E(\hat{\theta})-\theta)^2 = \underbrace{\mathrm{var}(\hat{\theta})}_{\text{variance}} + \underbrace{(E(\hat{\theta})-\theta)^2}_{\text{bias}^2}.$$
The MSE thus consists of errors due to the variance of the estimator and the bias of the estimator.
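
As a quick sanity check of this decomposition, the sketch below (an illustration, not from the slides; the shrinkage factor $a = 0.8$ is an arbitrary assumption) estimates the MSE, variance, and squared bias of a deliberately biased estimator by Monte Carlo:

```python
# Sketch: empirical check that mse = var + bias^2, using the deliberately
# biased estimator a * (sample mean); a = 0.8 is an arbitrary assumption.
import numpy as np

rng = np.random.default_rng(1)
A, sigma, N, a = 1.0, 1.0, 10, 0.8

x = A + sigma * rng.standard_normal((100_000, N))
est = a * x.mean(axis=1)

mse = np.mean((est - A) ** 2)
var_plus_bias2 = est.var() + (est.mean() - A) ** 2
print(f"mse ~ {mse:.5f}, var + bias^2 ~ {var_plus_bias2:.5f}")  # they agree
```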

5 Minimum Variance Criterion. $\mathrm{mse}(\hat{\theta}) = \mathrm{var}(\hat{\theta}) + (E(\hat{\theta})-\theta)^2$. Consider the (biased) estimator $\hat{A}_3 = \frac{a}{N}\sum_{n=0}^{N-1} x[n]$, with constant $a$. Then $E[\hat{A}_3] = aA$, $\mathrm{var}[\hat{A}_3] = \frac{a^2\sigma^2}{N}$, and
$$\mathrm{MSE}(\hat{A}_3) = \frac{a^2\sigma^2}{N} + (a-1)^2 A^2.$$
Finding the optimal estimator requires finding the optimal $a$: $a_{\mathrm{opt}} = \frac{A^2}{A^2 + \sigma^2/N}$. The optimal estimator is $\hat{A}_3 = \frac{A^2}{(A^2 + \sigma^2/N)\,N}\sum_{n=0}^{N-1} x[n]$. $\Rightarrow$ This estimator depends on $A$ and is therefore unrealisable!
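
The formula for $a_{\mathrm{opt}}$ can be made concrete with a few numbers. The sketch below (illustrative values, not from the slides) evaluates $a_{\mathrm{opt}}$ and the resulting MSE for several true values of $A$, showing both that the biased estimator beats the unbiased one ($a=1$) and that $a_{\mathrm{opt}}$ itself moves with the unknown $A$:

```python
# Sketch: a_opt = A^2 / (A^2 + sigma^2/N) and why it is unrealisable:
# it changes with the true A. sigma^2, N, and the A values are illustrative.
import numpy as np

sigma2, N = 1.0, 10
for A in [0.5, 1.0, 2.0]:
    a_opt = A**2 / (A**2 + sigma2 / N)
    mse_opt = a_opt**2 * sigma2 / N + (a_opt - 1) ** 2 * A**2
    mse_unbiased = sigma2 / N  # a = 1, i.e. the plain sample mean
    print(f"A={A}: a_opt={a_opt:.3f}, mse={mse_opt:.4f} < {mse_unbiased:.4f}")
# a_opt depends on the unknown A, so the "optimal" estimator cannot be built.
```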

6 Minimum Variance Unbiased Estimator. The optimal estimator depends on $A$ (and is thus unrealisable) because the bias term in the MSE depends on $A$:
$$\mathrm{MSE}(\hat{A}_3) = \frac{a^2\sigma^2}{N} + \underbrace{(a-1)^2 A^2}_{\text{bias}^2}.$$
We therefore abandon the MSE criterion and constrain the bias to be zero. Hence, if $E(\hat{\theta}) = \theta$, then
$$\mathrm{mse}(\hat{\theta}) = E\big[(\hat{\theta}-E(\hat{\theta}))^2\big] + (E(\hat{\theta})-\theta)^2 = E\big[(\hat{\theta}-E(\hat{\theta}))^2\big].$$
The estimator that minimises this is what we call the minimum variance unbiased (MVU) estimator.

7 Minimum Variance Unbiased Estimator. Remark: the MVU estimator does not always exist and is generally difficult to find. [Figure: two plots of $\mathrm{var}(\hat{\theta}_i)$ versus $\theta$ for three unbiased estimators $\hat{\theta}_1$, $\hat{\theta}_2$, $\hat{\theta}_3$. Left: $\hat{\theta}_3$ has the smallest variance for every $\theta$, so $\hat{\theta}_3$ is the MVU. Right: the variance curves cross, so no single estimator has minimum variance for all $\theta$ and the MVU does not exist.]

8 Finding the MVU. How to find the MVU? Even if the MVU exists, there is no standard "recipe" to find it. 1. Determine the Cramér-Rao lower bound (today, Ch. 3). 2. Apply the Rao-Blackwell-Lehmann-Scheffé theorem (will not be discussed). 3. Restrict estimators to be both unbiased AND linear (next week, Ch. 6).

9 Today's Agenda. 1. Stating the Cramér-Rao lower bound. 2. Example. 3. Derivation of the CRLB. 4. Examples.

10 Cramér-Rao Lower Bound Theorem (1). The CRLB is a bound on the variance of any unbiased estimator. Assume the pdf $p(\mathbf{x};\theta)$ satisfies the regularity condition
$$E\left[\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta}\right] = 0 \quad \text{for all } \theta.$$
The variance of any unbiased estimator $\hat{\theta}$ then satisfies
$$\mathrm{var}(\hat{\theta}) \;\ge\; \frac{1}{-E\left[\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial\theta^2}\right]} \;=\; \frac{1}{E\left[\left(\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2\right]}.$$

11 Cramér-Rao Lower Bound Theorem (2). The CRLB can in some cases be used to find the MVU. An unbiased estimator that attains the bound for all $\theta$ may be found if and only if
$$\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta} = I(\theta)\,\big(g(\mathbf{x}) - \theta\big)$$
for some functions $g$ and $I$. The estimator then is $\hat{\theta} = g(\mathbf{x})$, with mean $E(\hat{\theta}) = \theta$ and variance $\mathrm{var}(\hat{\theta}) = \frac{1}{I(\theta)}$. An estimator is called efficient if it meets the CRLB with equality; in that case it is the MVU (the converse is not necessarily true).

12 Example CRLB (1). Given a process (DC level in noise): $x[n] = A + w[n]$, $n = 0, 1, \ldots, N-1$, with $w[n]$ zero-mean white Gaussian noise:
$$p(\mathbf{x};A) = \prod_{n=0}^{N-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x[n]-A)^2}{2\sigma^2}\right] = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\left[-\frac{\sum_{n=0}^{N-1}(x[n]-A)^2}{2\sigma^2}\right].$$
Hence
$$\frac{\partial \ln p(\mathbf{x};A)}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A), \qquad \frac{\partial^2 \ln p(\mathbf{x};A)}{\partial A^2} = -\frac{N}{\sigma^2},$$
which is data independent in this case.

13 Example CRLB (2). From the CRLB we know $\mathrm{var}(\hat{\theta}) \ge 1\big/\big(-E\big[\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\big]\big)$, and thus $\mathrm{var}(\hat{A}) \ge \frac{\sigma^2}{N}$. Moreover,
$$\frac{\partial\ln p(\mathbf{x};A)}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A) = \underbrace{\frac{N}{\sigma^2}}_{I(A)}\Bigg(\underbrace{\frac{1}{N}\sum_{n=0}^{N-1}x[n]}_{g(\mathbf{x})} - A\Bigg).$$
In this case our estimator, the sample mean, attains the bound and is thus efficient: $\mathrm{var}(\hat{A}) = \frac{\sigma^2}{N}$.
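
A Monte Carlo check of this efficiency claim (a sketch; parameters are illustrative assumptions, not from the slides):

```python
# Sketch: the sample mean attains the CRLB sigma^2/N for DC in WGN.
import numpy as np

rng = np.random.default_rng(2)
A, sigma, N = 1.0, 1.0, 50

x = A + sigma * rng.standard_normal((200_000, N))
A_hat = x.mean(axis=1)
print(f"empirical var ~ {A_hat.var():.5f}, CRLB = {sigma**2 / N:.5f}")
# The two numbers agree up to Monte Carlo error: the estimator is efficient.
```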

14 The Cramér-Rao Lower Bound. The CRLB is a lower bound on the variance of any unbiased estimator; notice that a biased estimator could have a lower variance. We consider the pdf $p(\mathbf{x};\theta)$ for a particular realisation $\mathbf{x}$, as a function of the unknown parameter $\theta$: the likelihood function. If $p(\mathbf{x};\theta)$ is very "peaky" with respect to changes in $\theta$, it is easy to determine the correct value of $\theta$ from the data $\mathbf{x}$.

15 The Cramér-Rao Lower Bound. The CRLB is a lower bound on the variance of any unbiased estimator; notice that a biased estimator could have a lower variance. Given again a process (DC level in noise): $x[n] = A + w[n]$, $n = 0, 1, \ldots, N-1$, with $w[n]$ zero-mean white Gaussian noise. Example of a biased estimator: $\hat{A} = \frac{c}{N}\sum_{n=0}^{N-1}x[n]$ with $0 \le c \le 1$. Then $E[\hat{A}] = cA$ (thus biased), and $\mathrm{var}[\hat{A}] = E[\hat{A}^2] - E[\hat{A}]^2 = \frac{c^2\sigma^2}{N}$, which is lower than the CRLB!
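
Numerically (a sketch with illustrative values; $c = 0.5$ is an arbitrary choice, not from the slides):

```python
# Sketch: the shrunken estimator (c/N) * sum x[n] with c < 1 has variance
# c^2 sigma^2 / N below the CRLB -- no contradiction, since the CRLB only
# constrains *unbiased* estimators.
import numpy as np

rng = np.random.default_rng(3)
A, sigma, N, c = 1.0, 1.0, 50, 0.5

x = A + sigma * rng.standard_normal((200_000, N))
A_hat = c * x.mean(axis=1)
print(f"mean ~ {A_hat.mean():.3f} (biased, cA = {c * A})")
print(f"var  ~ {A_hat.var():.5f} < CRLB = {sigma**2 / N:.5f}")
```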

16 The Cramér-Rao Lower Bound - Curvature. Likelihood function $p(x[0];A)$ as a function of $A$, for a single observation $x[0] = 3$. Consider the two cases $\sigma^2 = 1$ and $\sigma^2 = 1/3$. [Figure: the two likelihoods $p(x[0]=3;A,\sigma^2=1/3)$ and $p(x[0]=3;A,\sigma^2=1)$ plotted against $A$; the low-variance likelihood is much sharper around its peak.] The "sharpness" (curvature) of the likelihood function determines how accurately the unknown parameter can be estimated. How to measure sharpness/curvature?

17 The Cramér-Rao Lower Bound - Curvature. Measuring the curvature: [Figure: top panel, the likelihood $p(x[0];A)$ versus $A$; bottom panel, the log-likelihood $\ln[p(x[0];A)]$ versus $A$. The curvature of the log-likelihood at its peak serves as the measure of sharpness.]

18 The Cramér-Rao Lower Bound - Score Function. The CRLB is based on the gradient of the log-likelihood function:
$$s(\mathbf{x};\theta) = \frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}.$$
$s(\mathbf{x};\theta)$ is called the score function and is a measure of the sensitivity of $p(\mathbf{x};\theta)$ to changes in $\theta$ (it "scores" values of $\theta$). Scores near zero are "good" scores, as we search for stationary points (maxima) of $p(\mathbf{x};\theta)$. The log-likelihood and the score function depend on the data, so the score is itself a random variable; it is therefore of interest to calculate $E[s(\mathbf{x};\theta)]$ and $\mathrm{var}[s(\mathbf{x};\theta)]$.

19 The Expected Value and Variance of the Score Function. The first- and second-order moments of the score function are central to the CRLB theorem. An important condition when deriving the CRLB is that the first-order moment of the score function is zero:
$$E\left[\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right] = 0.$$
This is known as the regularity condition. Most pdfs are regular; the first-order moment is nonzero when the domain on which the pdf is nonzero depends on the unknown parameter.

20 The Expected Value and Variance of the Score Function. When does the regularity condition hold?
$$E\left[\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right] = \int \frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\,p(\mathbf{x};\theta)\,d\mathbf{x} = \int \frac{\partial p(\mathbf{x};\theta)/\partial\theta}{p(\mathbf{x};\theta)}\,p(\mathbf{x};\theta)\,d\mathbf{x} = \int\frac{\partial p(\mathbf{x};\theta)}{\partial\theta}\,d\mathbf{x} \overset{(*)}{=} \frac{\partial}{\partial\theta}\int p(\mathbf{x};\theta)\,d\mathbf{x} = \frac{\partial}{\partial\theta}\,1 = 0.$$
Step $(*)$ assumes we may interchange differentiation and integration; this is not allowed when the integration boundary, i.e. the domain of $p$, depends on $\theta$. Counterexample: consider $p(x;\theta) = U[0,\theta]$. Then
$$\int_0^\theta \frac{\partial}{\partial\theta}\frac{1}{\theta}\,dx = -\frac{1}{\theta} \;\ne\; \frac{\partial}{\partial\theta}\int_0^\theta \frac{1}{\theta}\,dx = 0,$$
so the regularity condition does not hold in this case.
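
The contrast can be verified by simulation. The sketch below (illustrative values, not from the slides) estimates $E[s(x;\theta)]$ for a regular pdf (Gaussian with unknown mean) and for the non-regular $U[0,\theta]$:

```python
# Sketch: E[score] for a regular pdf (Gaussian, mean theta) vs. the
# non-regular U[0, theta]. theta = 2 and the sample size are illustrative.
import numpy as np

rng = np.random.default_rng(4)
theta, M = 2.0, 1_000_000

# Gaussian N(theta, 1): score = d/dtheta ln p = (x - theta)
x_g = rng.normal(theta, 1.0, M)
print(f"Gaussian: E[score] ~ {(x_g - theta).mean():+.4f}  (regular: ~0)")

# Uniform U[0, theta]: ln p = -ln(theta) on the support, score = -1/theta
x_u = rng.uniform(0, theta, M)
score_u = np.full_like(x_u, -1.0 / theta)
print(f"Uniform : E[score] = {score_u.mean():+.4f}  (non-regular: -1/theta)")
```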

21 The Expected Value and Variance of the Score Function. The second-order moment of the score function is called the Fisher information:
$$I(\theta) = E\left[\left(\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2\right] = -E\left[\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\right].$$
The Fisher information measures the amount of information that the observable random variable $\mathbf{x}$ carries about the unknown parameter. The Fisher information is: 1. Nonnegative. 2. Additive for independent observations:
$$\ln p(\mathbf{x};\theta) = \sum_{n=0}^{N-1}\ln p(x[n];\theta) \;\Rightarrow\; -E\left[\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\right] = \sum_{n=0}^{N-1} -E\left[\frac{\partial^2\ln p(x[n];\theta)}{\partial\theta^2}\right].$$

22 The Expected Value and Variance of the Score Function. Let us now prove that
$$-E\left[\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\right] = E\left[\left(\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2\right].$$
Proof: from the regularity condition we obtain $\int\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\,p(\mathbf{x};\theta)\,d\mathbf{x} = 0$. Differentiating with respect to $\theta$:
$$\int\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\,p(\mathbf{x};\theta)\,d\mathbf{x} + \int\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\,\frac{\partial p(\mathbf{x};\theta)}{\partial\theta}\,d\mathbf{x} = 0.$$
Using $\frac{\partial p}{\partial\theta} = \frac{\partial\ln p}{\partial\theta}\,p$, this gives
$$-\int\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\,p(\mathbf{x};\theta)\,d\mathbf{x} = \int\left(\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2 p(\mathbf{x};\theta)\,d\mathbf{x},$$
which is the claimed identity.
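
For the DC-in-WGN example both sides of this identity are easy to evaluate: the score is $\frac{1}{\sigma^2}\sum_n(x[n]-A)$ and the second derivative is the constant $-N/\sigma^2$. A Monte Carlo check (a sketch with illustrative parameters):

```python
# Sketch: both forms of the Fisher information agree for DC in WGN.
# Score = (1/sigma^2) * sum(x[n] - A); -d^2/dA^2 ln p = N / sigma^2.
import numpy as np

rng = np.random.default_rng(5)
A, sigma, N = 1.0, 1.0, 20

x = A + sigma * rng.standard_normal((500_000, N))
score = (x - A).sum(axis=1) / sigma**2
print(f"E[score^2]      ~ {np.mean(score**2):.3f}")
print(f"-E[d2/dA2 ln p] = {N / sigma**2:.3f}")  # constant here; they agree
```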

23 Proof of the Cramér-Rao Lower Bound (1). Start from the unbiasedness assumption: $\int(\hat{\theta}-\theta)\,p(\mathbf{x};\theta)\,d\mathbf{x} = 0$. Differentiating with respect to $\theta$:
$$\frac{\partial}{\partial\theta}\int(\hat{\theta}-\theta)\,p(\mathbf{x};\theta)\,d\mathbf{x} = \int(\hat{\theta}-\theta)\frac{\partial p(\mathbf{x};\theta)}{\partial\theta}\,d\mathbf{x} - \int p(\mathbf{x};\theta)\,d\mathbf{x} = 0$$
$$\Rightarrow\; \int(\hat{\theta}-\theta)\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\,p(\mathbf{x};\theta)\,d\mathbf{x} = 1$$
$$\Rightarrow\; \int(\hat{\theta}-\theta)\sqrt{p(\mathbf{x};\theta)}\cdot\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\sqrt{p(\mathbf{x};\theta)}\,d\mathbf{x} = 1.$$

24 Proof of the Cramér-Rao Lower Bound (2). Now use the Cauchy-Schwarz inequality
$$\left(\int f(\mathbf{x})\,g(\mathbf{x})\,d\mathbf{x}\right)^2 \le \int f^2(\mathbf{x})\,d\mathbf{x}\int g^2(\mathbf{x})\,d\mathbf{x}$$
with $f = (\hat{\theta}-\theta)\sqrt{p(\mathbf{x};\theta)}$ and $g = \frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\sqrt{p(\mathbf{x};\theta)}$. Then we obtain
$$1 \le \int(\hat{\theta}-\theta)^2\,p(\mathbf{x};\theta)\,d\mathbf{x}\int\left(\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2 p(\mathbf{x};\theta)\,d\mathbf{x} \;\Rightarrow\; \mathrm{var}(\hat{\theta}) \ge \frac{1}{E\left[\left(\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2\right]}.$$

25 Proof of the Cramér-Rao Lower Bound (3). The CRLB can in some cases be used to find the MVU. An unbiased estimator that attains the bound for all $\theta$ may be found if and only if
$$\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta} = I(\theta)\,\big(g(\mathbf{x})-\theta\big)$$
for some functions $g$ and $I$. The estimator then is $\hat{\theta} = g(\mathbf{x})$, with mean $E(\hat{\theta}) = \theta$ and variance $\mathrm{var}(\hat{\theta}) = \frac{1}{I(\theta)}$. An estimator is called efficient if it meets the CRLB with equality; in that case it is the MVU (the converse is not necessarily true).

26 Proof of the Cramér-Rao Lower Bound (4). If $\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta} = I(\theta)(\hat{\theta}(\mathbf{x})-\theta)$, it follows that $E(\hat{\theta}) = \theta$ and $\mathrm{var}(\hat{\theta}) = \frac{1}{I(\theta)}$. Proof that $E(\hat{\theta}) = \theta$: from the regularity condition,
$$E\left[\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right] = E\big[I(\theta)(\hat{\theta}(\mathbf{x})-\theta)\big] = I(\theta)\,E\big[\hat{\theta}(\mathbf{x})-\theta\big] = 0.$$
So $E[\hat{\theta}(\mathbf{x})] = \theta$.

27 Proof of the Cramér-Rao Lower Bound (5). If $\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta} = I(\theta)(\hat{\theta}(\mathbf{x})-\theta)$, it follows that $E(\hat{\theta}) = \theta$ and $\mathrm{var}(\hat{\theta}) = \frac{1}{I(\theta)}$. Proof for $\mathrm{var}(\hat{\theta}) = \frac{1}{I(\theta)}$:
$$E\left[\left(\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2\right] = E\big[I^2(\theta)(\hat{\theta}(\mathbf{x})-\theta)^2\big] = I^2(\theta)\,E\big[(\hat{\theta}(\mathbf{x})-\theta)^2\big] = I^2(\theta)\,\mathrm{var}\big[\hat{\theta}(\mathbf{x})\big].$$
So
$$\mathrm{var}\big[\hat{\theta}(\mathbf{x})\big] = \frac{1}{I^2(\theta)}\,E\left[\left(\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2\right].$$

28 Proof of the Cramér-Rao Lower Bound (6). Further, differentiating $\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta} = I(\theta)(\hat{\theta}(\mathbf{x})-\theta)$ once more gives
$$\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2} = -I(\theta) + \frac{\partial I(\theta)}{\partial\theta}\big(\hat{\theta}(\mathbf{x})-\theta\big).$$
Taking expectations:
$$E\left[\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\right] = -I(\theta) + \frac{\partial I(\theta)}{\partial\theta}\underbrace{E\big[\hat{\theta}(\mathbf{x})-\theta\big]}_{=0\text{ because unbiased}}.$$
We obtain $E\left[\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\right] = -I(\theta)$.

29 Proof of the Cramér-Rao Lower Bound (7). We already know that
$$-E\left[\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\right] = E\left[\left(\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2\right],$$
and we just obtained $I(\theta) = -E\left[\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\right]$, and thus $I(\theta) = E\left[\left(\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2\right]$. Therefore
$$\mathrm{var}\big[\hat{\theta}(\mathbf{x})\big] = \frac{E\left[\left(\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2\right]}{I^2(\theta)} = \frac{1}{I(\theta)}.$$

30 CRLB for the General Gaussian Model. Let us assume a Gaussian distribution for the noise: $\mathbf{w}\sim\mathcal{N}(\mathbf{0},C_w)$,
$$p(\mathbf{w}) = \frac{1}{(2\pi)^{N/2}\det(C_w)^{1/2}}\exp\left[-\frac{1}{2}\mathbf{w}^T C_w^{-1}\mathbf{w}\right].$$
The general Gaussian model is then defined as $\mathbf{x} = \mathbf{h}(\theta)+\mathbf{w}$, $\mathbf{x}\sim\mathcal{N}(\mathbf{h}(\theta),C_w)$, so
$$p(\mathbf{x};\theta) = \frac{1}{(2\pi)^{N/2}\det(C_w)^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{h}(\theta))^T C_w^{-1}(\mathbf{x}-\mathbf{h}(\theta))\right].$$
Let us derive the CRLB for this model:
$$\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta} = \frac{\partial\mathbf{h}^T(\theta)}{\partial\theta}C_w^{-1}\big(\mathbf{x}-\mathbf{h}(\theta)\big), \qquad \frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2} = \frac{\partial^2\mathbf{h}^T(\theta)}{\partial\theta^2}C_w^{-1}\big(\mathbf{x}-\mathbf{h}(\theta)\big) - \frac{\partial\mathbf{h}^T(\theta)}{\partial\theta}C_w^{-1}\frac{\partial\mathbf{h}(\theta)}{\partial\theta},$$
so
$$-E\left[\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\right] = \frac{\partial\mathbf{h}^T(\theta)}{\partial\theta}C_w^{-1}\frac{\partial\mathbf{h}(\theta)}{\partial\theta} \;\Rightarrow\; \mathrm{var}(\hat{\theta}) \ge \left(\frac{\partial\mathbf{h}^T(\theta)}{\partial\theta}C_w^{-1}\frac{\partial\mathbf{h}(\theta)}{\partial\theta}\right)^{-1}.$$
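
When $\mathbf{h}(\theta)$ has no convenient closed-form derivative, $\partial\mathbf{h}(\theta)/\partial\theta$ can be formed numerically. Below is a sketch for scalar $\theta$ with white noise $C_w = \sigma^2 I$; the decaying-exponential signal model is an illustrative assumption, not from the slides:

```python
# Sketch: numerical CRLB for x = h(theta) + w, w white with C_w = sigma^2 I.
# h(theta)[n] = exp(-theta * n) is an illustrative nonlinear signal model.
import numpy as np

sigma2, N, theta = 0.1, 20, 0.5
n = np.arange(N)
h = lambda th: np.exp(-th * n)

eps = 1e-6
dh = (h(theta + eps) - h(theta - eps)) / (2 * eps)  # central-difference dh/dtheta
crlb = sigma2 / (dh @ dh)                           # 1 / (dh^T C_w^{-1} dh)
print(f"CRLB at theta = {theta}: {crlb:.5f}")
```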

31 CRLB for the Linear Gaussian Model & MVU. Assume the model is a linear Gaussian model: $\mathbf{x} = \mathbf{h}\theta+\mathbf{w}$, $\mathbf{w}\sim\mathcal{N}(\mathbf{0},C_w)$. From the CRLB for the general Gaussian model, we obtain $\mathrm{var}(\hat{\theta}) \ge (\mathbf{h}^T C_w^{-1}\mathbf{h})^{-1}$. Moreover,
$$\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta} = \mathbf{h}^T C_w^{-1}(\mathbf{x}-\mathbf{h}\theta) = \mathbf{h}^T C_w^{-1}\mathbf{h}\left[(\mathbf{h}^T C_w^{-1}\mathbf{h})^{-1}\mathbf{h}^T C_w^{-1}\mathbf{x} - \theta\right].$$
As a result, the MVU exists and reaches the CRLB: $\hat{\theta} = (\mathbf{h}^T C_w^{-1}\mathbf{h})^{-1}\mathbf{h}^T C_w^{-1}\mathbf{x}$. Properties: $E(\hat{\theta}) = \theta$; $\mathrm{var}(\hat{\theta}) = (\mathbf{h}^T C_w^{-1}\mathbf{h})^{-1}$, equal to the CRLB, so in this case the MVU is efficient; $\hat{\theta}$ is a linear transformation of the Gaussian $\mathbf{x}$ and thus Gaussian: $\hat{\theta}\sim\mathcal{N}\big(\theta,(\mathbf{h}^T C_w^{-1}\mathbf{h})^{-1}\big)$.
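
A direct implementation of this estimator, checked against the CRLB by Monte Carlo (a sketch; the particular $\mathbf{h}$ and $C_w$ below are illustrative assumptions):

```python
# Sketch: the closed-form MVU for the linear Gaussian model,
# theta_hat = (h^T Cw^{-1} h)^{-1} h^T Cw^{-1} x, vs. the CRLB.
import numpy as np

rng = np.random.default_rng(6)
N, theta = 20, 2.0
h = np.linspace(1, 2, N)                    # known observation vector
Cw = np.diag(np.linspace(0.5, 1.5, N))      # known noise covariance
Cw_inv = np.linalg.inv(Cw)

L = np.linalg.cholesky(Cw)
w = L @ rng.standard_normal((N, 100_000))   # correlated noise draws
x = theta * h[:, None] + w

denom = h @ Cw_inv @ h
theta_hat = (h @ Cw_inv @ x) / denom
print(f"mean ~ {theta_hat.mean():.4f} (unbiased)")
print(f"var  ~ {theta_hat.var():.6f} vs CRLB = {1 / denom:.6f}")
```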

32 Example: Poisson Distribution. Let $x_1,\ldots,x_N$ be iid measurements from a Poisson($\lambda$) distribution with marginal pmf
$$p(x_i;\lambda) = e^{-\lambda}\frac{\lambda^{x_i}}{x_i!},$$
with expected value $E[x_i] = \lambda$. Calculate the CRLB for the parameter $\lambda$ and try to find the MVU estimator for $\lambda$.
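
One way to work this exercise (a sketch, not from the slides): $\ln p(\mathbf{x};\lambda) = -N\lambda + \big(\sum_i x_i\big)\ln\lambda - \sum_i\ln x_i!$, so the score is $-N + \frac{1}{\lambda}\sum_i x_i = \frac{N}{\lambda}(\bar{x}-\lambda)$, which is exactly the factorised form $I(\lambda)\,(g(\mathbf{x})-\lambda)$ with $I(\lambda) = N/\lambda$. Hence the CRLB is $\lambda/N$ and the sample mean is the efficient MVU estimator. A Monte Carlo check (illustrative $\lambda$ and $N$):

```python
# Sketch: the sample mean of iid Poisson(lam) data attains the CRLB lam/N.
import numpy as np

rng = np.random.default_rng(7)
lam, N = 3.0, 40

x = rng.poisson(lam, size=(200_000, N))
lam_hat = x.mean(axis=1)
print(f"mean ~ {lam_hat.mean():.4f}, var ~ {lam_hat.var():.5f}, "
      f"CRLB = {lam / N:.5f}")
```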

33 Summary (1). The CRLB is a bound on the variance of any unbiased estimator. Assume the pdf $p(\mathbf{x};\theta)$ satisfies the regularity condition $E\left[\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right] = 0$ for all $\theta$. The variance of any unbiased estimator $\hat{\theta}$ then satisfies
$$\mathrm{var}(\hat{\theta}) \ge \frac{1}{-E\left[\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\right]} = \frac{1}{E\left[\left(\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\right)^2\right]}.$$

34 Summary (2). The CRLB can in some cases be used to find the MVU. An unbiased estimator that attains the bound for all $\theta$ may be found if and only if $\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta} = I(\theta)\,(g(\mathbf{x})-\theta)$ for some functions $g$ and $I$. The estimator then is $\hat{\theta} = g(\mathbf{x})$, with mean $E(\hat{\theta}) = \theta$ and variance $\mathrm{var}(\hat{\theta}) = \frac{1}{I(\theta)}$. An estimator is called efficient if it meets the CRLB with equality; in that case it is the MVU (the converse is not necessarily true).
