Maximum Likelihood Estimation


ECE 645: Estimation Theory, Spring 2015
Instructor: Prof. Stanley H. Chan
Maximum Likelihood Estimation (LaTeX prepared by Shaobo Fang)
April 14, 2015

This lecture note is based on ECE 645 (Spring 2015) by Prof. Stanley H. Chan in the School of Electrical and Computer Engineering at Purdue University.

1 Introduction

For many families besides the exponential family, the Minimum Variance Unbiased Estimator (MVUE) can be very difficult to find, or it may not even exist. For such models we need an alternative method to obtain good estimators. In the absence of prior information, maximum likelihood estimation might be a viable alternative. (Poor IV.D)

Definition 1 (Maximum likelihood estimate, MLE). The maximum likelihood estimator is defined as
$$\hat{\theta}_{ML}(y) \;\overset{\text{def}}{=}\; \arg\max_{\theta} f_\theta(y), \qquad (1)$$
where $f_\theta(y) = f_Y(y;\theta)$. The function $f_\theta(y)$ is called the likelihood function. We can also take the log of $f_\theta(y)$ and obtain the same maximizer:
$$\hat{\theta}_{ML}(y) \;\overset{\text{def}}{=}\; \arg\max_{\theta} \log f_\theta(y). \qquad (2)$$
The function $\log f_\theta(y)$ is called the log-likelihood function.

Example 1. Let $Y = [Y_1,\ldots,Y_n]$ be a sequence of i.i.d. random variables such that $Y_k \sim \mathcal{N}(\mu,\sigma^2)$. Assume that $\sigma^2$ is known; find $\hat{\theta}_{ML}$ for $\mu$.

Solution: First of all, the likelihood function is
$$f_\theta(y) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{k=1}^{n}(y_k-\mu)^2\right).$$
Taking the log on both sides, the log-likelihood function is
$$\log f_\theta(y) = -\frac{1}{2\sigma^2}\sum_{k=1}^{n}(y_k-\mu)^2 - \frac{n}{2}\log(2\pi\sigma^2).$$
To find the maximizer of the log-likelihood function, we take the first-order derivative and set it to zero. This yields
$$\hat{\mu}_{ML}(y) = \frac{1}{n}\sum_{k=1}^{n} y_k.$$
We can also show that
$$E[\hat{\mu}_{ML}(Y)] = \frac{1}{n}\sum_{k=1}^{n} E[Y_k] = \mu,$$
which says that the estimator is unbiased.
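As a sanity check, the closed-form MLE can be compared against a direct numerical maximization of the log-likelihood. The following is a minimal sketch assuming NumPy and SciPy are available; the values of $\mu$, $\sigma$, and the sample size are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
mu_true, sigma, n = 2.0, 1.5, 1000           # arbitrary test values, sigma known
y = rng.normal(mu_true, sigma, size=n)

def neg_log_likelihood(mu):
    # negative Gaussian log-likelihood as a function of mu (sigma^2 known)
    return np.sum((y - mu)**2) / (2 * sigma**2) + (n / 2) * np.log(2 * np.pi * sigma**2)

res = minimize_scalar(neg_log_likelihood)    # numerical maximizer of the likelihood
print(res.x, y.mean())                       # agrees with the sample mean
```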

Example 2. Now we consider the previous example with both $\mu$ and $\sigma$ unknown. Our goal is to determine $\hat{\theta}_{ML} \overset{\text{def}}{=} [\hat{\theta}_1, \hat{\theta}_2]^T$ for $\theta_1 = \mu$ and $\theta_2 = \sigma^2$.

Solution: As in the previous problem, the log-likelihood function is
$$\log f_\theta(y) = -\frac{1}{2\theta_2}\sum_{k=1}^{n}(y_k-\theta_1)^2 - \frac{n}{2}\log(2\pi\theta_2).$$
Taking the partial derivative with respect to $\theta_1$ yields
$$\frac{\partial}{\partial\theta_1}\log f_\theta(y) = \frac{1}{2\theta_2}\sum_{k=1}^{n} 2(y_k-\theta_1) = 0,$$
which gives
$$\hat{\theta}_1(y) = \frac{1}{n}\sum_{k=1}^{n} y_k.$$
Similarly, taking the partial derivative with respect to $\theta_2$ yields
$$\frac{\partial}{\partial\theta_2}\log f_\theta(y) = \frac{1}{2\theta_2^2}\sum_{k=1}^{n}(y_k-\theta_1)^2 - \frac{n}{2\theta_2} = 0,$$
which gives
$$\hat{\theta}_2(y) = \frac{1}{n}\sum_{k=1}^{n}(y_k-\hat{\theta}_1)^2.$$
Note that $E[\hat{\theta}_2(Y)] = \frac{n-1}{n}\sigma^2 \neq \sigma^2$, so $\hat{\theta}_2$ is biased.

Remark: To obtain an unbiased estimator of the population variance, it is preferred to use the sample variance, defined as (Zwillinger 1995, p. 603)
$$S_{n-1} = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar{Y})^2,$$
where $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ is the sample mean. In fact, the function var in MATLAB computes the sample variance.
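A quick Monte Carlo check of the bias, as a minimal sketch assuming NumPy; the factor $(n-1)/n$ below matches the expectation stated above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, trials = 10, 4.0, 200_000
y = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

var_ml = y.var(axis=1, ddof=0)        # MLE: divides by n,     E = (n-1)/n * sigma^2
var_sample = y.var(axis=1, ddof=1)    # sample variance: n-1,  E = sigma^2

print(var_ml.mean(), (n - 1) / n * sigma2)   # ~3.6 vs 3.6 (biased)
print(var_sample.mean(), sigma2)             # ~4.0 vs 4.0 (unbiased)
```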

Example 3 (Bernoulli; Statistical Inference, Example 7.2.7, Casella and Berger). Let $Y = [Y_1,\ldots,Y_n]$ be a sequence of i.i.d. Bernoulli random variables with parameter $\theta$. We would like to find the MLE $\hat{\theta}_{ML}$ for $\theta$.

Solution: First of all, we define the likelihood function:
$$f_\theta(y) = \prod_{k=1}^{n}\theta^{y_k}(1-\theta)^{1-y_k}.$$
Letting $y = \sum_{k=1}^{n} y_k$, we can rewrite the likelihood function as
$$f_\theta(y) = \theta^{y}(1-\theta)^{n-y}.$$
Hence, the log-likelihood function is
$$\log f_\theta(y) = y\log\theta + (n-y)\log(1-\theta).$$
Taking the derivative and setting it to zero yields
$$\frac{\partial}{\partial\theta}\log f_\theta(y) = \frac{y}{\theta} - \frac{n-y}{1-\theta} = 0.$$
Therefore,
$$\hat{\theta}_{ML}(y) = \frac{1}{n}\sum_{k=1}^{n} y_k.$$

Example 4 (Binomial). Let $Y = [Y_1,\ldots,Y_n]$ be a sequence of i.i.d. random variables following a Binomial$(k,\theta)$ distribution. We would like to find $\hat{\theta}_{ML}$ for $\theta$.

Solution: The likelihood function is
$$f_\theta(y) = \prod_{i=1}^{n}\binom{k}{y_i}\theta^{y_i}(1-\theta)^{k-y_i}.$$
By letting $y = \sum_{i=1}^{n} y_i$, we can rewrite the likelihood function as
$$f_\theta(y) = \left[\prod_{i=1}^{n}\binom{k}{y_i}\right]\theta^{y}(1-\theta)^{nk-y}.$$
The log-likelihood function is
$$\log f_\theta(y) = y\log\theta + (nk-y)\log(1-\theta) + \underbrace{\sum_{i=1}^{n}\log\binom{k}{y_i}}_{\text{this term does not contain }\theta}.$$
Taking the first-order derivative and setting it to zero yields
$$\hat{\theta}_{ML}(y) = \frac{y}{nk} = \frac{1}{nk}\sum_{i=1}^{n} y_i.$$
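The Binomial closed form can likewise be verified numerically. A minimal sketch assuming NumPy and SciPy; $k$, $\theta$, and $n$ are arbitrary test values, and the constant term $\sum_i \log\binom{k}{y_i}$ is omitted since it does not affect the maximizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
n, k, theta_true = 500, 7, 0.3
y = rng.binomial(k, theta_true, size=n)

def neg_log_likelihood(theta):
    # drop the log binomial coefficients: they do not depend on theta
    return -(y.sum() * np.log(theta) + (n * k - y.sum()) * np.log(1 - theta))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, y.sum() / (n * k))   # numerical MLE vs closed form y/(nk)
```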

Example 5 (Poisson). Let $Y = [Y_1,\ldots,Y_n]$ be a sequence of i.i.d. Poisson random variables with parameter $\lambda$. Recall that the Poisson distribution is
$$P(Y_i = y_i) = \frac{e^{-\lambda}\lambda^{y_i}}{y_i!}.$$
We would like to find $\hat{\lambda}_{ML}$ for the parameter $\lambda$.

Solution: As in the previous examples, we first find the likelihood function:
$$f_\lambda(y) = \prod_{i=1}^{n}\frac{e^{-\lambda}\lambda^{y_i}}{y_i!} = \frac{e^{-n\lambda}\lambda^{\sum_i y_i}}{\prod_i y_i!}.$$
Thus the log-likelihood function is
$$\log f_\lambda(y) = -n\lambda + \left(\sum_{i=1}^{n} y_i\right)\log\lambda - \underbrace{\sum_{i=1}^{n}\log y_i!}_{\text{this term does not contain }\lambda}.$$
Setting the first-order derivative to 0 yields
$$\hat{\lambda}_{ML}(y) = \frac{1}{n}\sum_{i=1}^{n} y_i.$$

2 Bias vs. Variance

In general, the MLE can be either biased or unbiased. To take a closer look at this property, we decompose the mean squared error (MSE) into a variance term and a bias term:
$$\begin{aligned}
\mathrm{MSE}_\theta &= E_Y[(\hat{\theta}_{ML}(Y)-\theta)^2] \\
&= E[(\hat{\theta}_{ML} - E[\hat{\theta}_{ML}] + E[\hat{\theta}_{ML}] - \theta)^2] \\
&= E[(\hat{\theta}_{ML} - E[\hat{\theta}_{ML}])^2] + E[(E[\hat{\theta}_{ML}] - \theta)^2] + 2E[(\hat{\theta}_{ML} - E[\hat{\theta}_{ML}])(E[\hat{\theta}_{ML}] - \theta)] \\
&= \underbrace{E_Y[(\hat{\theta}_{ML}(Y) - E[\hat{\theta}_{ML}(Y)])^2]}_{\text{variance}} + \underbrace{(E[\hat{\theta}_{ML}(Y)] - \theta)^2}_{\text{bias}^2}, \qquad (3)
\end{aligned}$$
where the cross term vanishes because $E[\hat{\theta}_{ML}] - \theta$ is deterministic and $E[\hat{\theta}_{ML} - E[\hat{\theta}_{ML}]] = 0$.
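The decomposition (3) can be checked by Monte Carlo for the biased variance estimator $\hat{\theta}_2$ of Example 2. A minimal sketch assuming NumPy; the estimand is $\theta = \sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, trials = 10, 4.0, 200_000
y = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

theta_hat = y.var(axis=1, ddof=0)           # MLE of sigma^2 (biased), one value per trial
mse = np.mean((theta_hat - sigma2)**2)      # left-hand side of (3)
variance = theta_hat.var()                  # E[(theta_hat - E[theta_hat])^2]
bias2 = (theta_hat.mean() - sigma2)**2      # (E[theta_hat] - theta)^2

print(mse, variance + bias2)                # the two sides of (3) match
```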

Example 6 (Image Denoising). Let $z$ be a clean signal and let $n$ be a noise vector such that $n \sim \mathcal{N}(0,\sigma^2 I)$. Suppose we are given the noisy observation $y = z + n$; our goal is to estimate $z$ from $y$. In this example, let us consider a linear denoising method: we would like to find a $W$ such that the estimator $\hat{z} = Wy$ is optimal in some sense. We shall call $W$ a smoothing filter. To determine which $W$ would be good, we first consider the MSE:
$$\begin{aligned}
\mathrm{MSE} &= E[\|\hat{z} - z\|^2] = E[\|Wy - z\|^2] = E[\|W(z+n) - z\|^2] = E[\|(W-I)z + Wn\|^2] \\
&= \underbrace{\|(W-I)z\|^2}_{\text{bias}^2} + \underbrace{E[\|Wn\|^2]}_{\text{variance}}.
\end{aligned}$$
Now, by using the eigendecomposition we can write $W = U\Lambda U^T$. Then the bias term can be computed as
$$\|(W-I)z\|^2 = \|(U\Lambda U^T - I)z\|^2 = \|U(\Lambda - I)U^T z\|^2 = z^T U(\Lambda - I)^2 U^T z = \sum_{i=1}^{n}(\lambda_i - 1)^2 v_i^2,$$
where $v = U^T z$. Similarly, the variance term can be computed as
$$E[\|Wn\|^2] = E[n^T W^T W n] = \sigma^2\,\mathrm{Tr}\{W^T W\} = \sigma^2\sum_{i=1}^{n}\lambda_i^2.$$
Therefore, the MSE can be written as
$$\mathrm{MSE} = \sum_{i=1}^{n}(\lambda_i - 1)^2 v_i^2 + \sigma^2\sum_{i=1}^{n}\lambda_i^2.$$
To minimize the MSE, each $\lambda_i$ should be chosen such that
$$\frac{\partial}{\partial\lambda_i}\mathrm{MSE} = 2v_i^2(\lambda_i - 1) + 2\sigma^2\lambda_i = 0,$$
which gives
$$\lambda_i = \frac{v_i^2}{v_i^2 + \sigma^2}.$$

Thus far we have come across many examples where the estimators are unbiased. So are biased estimators bad? The answer is no. Here is an example. Let us consider a random variable $Y \sim \mathcal{N}(0,\sigma^2)$ and the following two estimators of $\sigma^2$:

Estimator 1: $\hat{\theta}_1(Y) = Y^2$. Then $E[\hat{\theta}_1(Y)] = E[Y^2] = \sigma^2$, so it is unbiased.

Estimator 2: $\hat{\theta}_2(Y) = aY^2$, $a \neq 1$. Then $E[\hat{\theta}_2(Y)] = a\sigma^2$, so it is biased.

Let us now consider the MSE of $\hat{\theta}_2$. (Note that the MSE of $\hat{\theta}_1$ can be found by letting $a = 1$.)
$$\mathrm{MSE} = E[(\hat{\theta}_2(Y) - \sigma^2)^2] = E[(aY^2 - \sigma^2)^2] = E[a^2 Y^4] - 2\sigma^2 E[aY^2] + \sigma^4 = 3a^2\sigma^4 - 2a\sigma^4 + \sigma^4 = \sigma^4(3a^2 - 2a + 1).$$
The MSE attains its minimum where
$$\frac{\partial}{\partial a}\mathrm{MSE} = \sigma^4(6a - 2) = 0,$$
that is, at $a = \frac{1}{3}$. This result says: although $\hat{\theta}_2$ is biased, it actually attains a lower MSE!
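The claim that $a = 1/3$ minimizes the MSE is easy to check by simulation. A minimal sketch assuming NumPy; $\sigma^2$ is an arbitrary test value.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2, trials = 2.0, 1_000_000
y = rng.normal(0.0, np.sqrt(sigma2), size=trials)

for a in (1.0, 1/3):
    mse_mc = np.mean((a * y**2 - sigma2)**2)          # Monte Carlo MSE of a*Y^2
    mse_formula = sigma2**2 * (3 * a**2 - 2 * a + 1)  # sigma^4 (3a^2 - 2a + 1)
    print(a, mse_mc, mse_formula)
# the biased choice a = 1/3 yields a smaller MSE than the unbiased choice a = 1
```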

3 Fisher Information

3.1 Variance and Curvature of the Log-Likelihood

For unbiased estimators, the variance provides extremely important information about the performance of the estimator. To study the variance more carefully, we first examine its relationship to the log-likelihood, as demonstrated in the example below.

Example 7. Let $Y \sim \mathcal{N}(\theta,\sigma^2)$, where $\sigma$ is known. Accordingly,
$$f_\theta(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y-\theta)^2}{2\sigma^2}\right),$$
$$\log f_\theta(y) = -\log\sqrt{2\pi\sigma^2} - \frac{(y-\theta)^2}{2\sigma^2},$$
$$\frac{\partial\log f_\theta(y)}{\partial\theta} = \frac{1}{\sigma^2}(y-\theta),$$
$$\underbrace{\frac{\partial^2\log f_\theta(y)}{\partial\theta^2} = -\frac{1}{\sigma^2}}_{\text{curvature of the log-likelihood}}.$$
Therefore, as $\sigma^2$ increases, the magnitude of $\frac{\partial^2}{\partial\theta^2}\log f_\theta(y)$ decreases: as the variance increases, the curvature of the log-likelihood decreases.

3.2 Fisher Information

Definition 2 (Fisher Information). The Fisher information is defined as
$$I(\theta) \;\overset{\text{def}}{=}\; -E_Y\!\left[\frac{\partial^2\log f_\theta(Y)}{\partial\theta^2}\right], \qquad (4)$$
where
$$E_Y\!\left[\frac{\partial^2\log f_\theta(Y)}{\partial\theta^2}\right] = \int \frac{\partial^2\log f_\theta(y)}{\partial\theta^2}\, f_\theta(y)\,dy. \qquad (5)$$
We compute the Fisher information in the examples below.

Example 8. Let $Y = [Y_1,\ldots,Y_n]$ be a sequence of i.i.d. random variables such that $Y_i \sim \mathcal{N}(\theta,\sigma^2)$. We would like to determine $I(\theta)$. First, we know that the log-likelihood is
$$\log f_\theta(y) = -\frac{n}{2}\log(2\pi\sigma^2) - \sum_{i=1}^{n}\frac{(y_i-\theta)^2}{2\sigma^2}.$$
The first-order derivative is
$$\frac{\partial\log f_\theta(y)}{\partial\theta} = \frac{n}{\sigma^2}(\bar{y}-\theta), \quad\text{where}\quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$
Consequently, the second-order derivative is
$$\frac{\partial^2\log f_\theta(y)}{\partial\theta^2} = -\frac{n}{\sigma^2}.$$
Finally, the Fisher information is
$$I(\theta) = -E_Y\!\left[\frac{\partial^2\log f_\theta(Y)}{\partial\theta^2}\right] = -E_Y\!\left[-\frac{n}{\sigma^2}\right] = \frac{n}{\sigma^2}.$$

Example 9. Let $Y = [Y_1,\ldots,Y_n]$ be a sequence of random variables such that
$$Y_k = A\cos(\omega_0 k + \theta) + N_k,$$
where the $N_k \sim \mathcal{N}(0,\sigma^2)$ are i.i.d. Find $I(\theta)$.

The likelihood function is
$$f_\theta(y) = \prod_{k=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_k - A\cos(\omega_0 k+\theta))^2}{2\sigma^2}\right) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\sum_{k=1}^{n}\big(y_k - A\cos(\omega_0 k+\theta)\big)^2\right).$$
Then the first-order derivative of the log-likelihood is
$$\frac{\partial}{\partial\theta}\log f_\theta(y) = \frac{\partial}{\partial\theta}\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n}\big(y_k - A\cos(\omega_0 k+\theta)\big)^2\right] = -\frac{1}{\sigma^2}\sum_{k=1}^{n}\big(y_k - A\cos(\omega_0 k+\theta)\big)\,A\sin(\omega_0 k+\theta)$$
$$= -\frac{A}{\sigma^2}\sum_{k=1}^{n}\left(y_k\sin(\omega_0 k+\theta) - \frac{A}{2}\sin(2\omega_0 k+2\theta)\right).$$
The second-order derivative is
$$\frac{\partial^2}{\partial\theta^2}\log f_\theta(y) = -\frac{A}{\sigma^2}\sum_{k=1}^{n}\left[y_k\cos(\omega_0 k+\theta) - A\cos(2\omega_0 k+2\theta)\right].$$
Accordingly, $E_Y\!\left[\frac{\partial^2}{\partial\theta^2}\log f_\theta(Y)\right]$ can be computed as
$$\begin{aligned}
E_Y\!\left[\frac{\partial^2}{\partial\theta^2}\log f_\theta(Y)\right] &= -\frac{A}{\sigma^2}\sum_{k=1}^{n}\big(E[Y_k]\cos(\omega_0 k+\theta) - A\cos(2\omega_0 k+2\theta)\big) \\
&= -\frac{A}{\sigma^2}\sum_{k=1}^{n}\left[A\cos^2(\omega_0 k+\theta) - A\cos(2\omega_0 k+2\theta)\right] \\
&= -\frac{A^2}{\sigma^2}\sum_{k=1}^{n}\left(\frac{1}{2} + \frac{1}{2}\cos(2\omega_0 k+2\theta) - \cos(2\omega_0 k+2\theta)\right) \\
&= -\frac{nA^2}{2\sigma^2} + \frac{A^2}{2\sigma^2}\sum_{k=1}^{n}\cos(2\omega_0 k+2\theta).
\end{aligned}$$
By using the fact that $\frac{1}{n}\sum_{k=1}^{n}\cos(2\omega_0 k+2\theta) \approx 0$, we have
$$I(\theta) \approx \frac{nA^2}{2\sigma^2}.$$
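The approximation $I(\theta) \approx nA^2/(2\sigma^2)$ can be checked by Monte Carlo, averaging the negative of the second derivative derived above over noise realizations. A sketch assuming NumPy; $A$, $\omega_0$, $\theta$, and $\sigma$ are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(5)
n, A, w0, theta, sigma = 200, 1.0, 0.5, 0.7, 0.8
trials = 50_000
k = np.arange(1, n + 1)

y = A * np.cos(w0 * k + theta) + rng.normal(0.0, sigma, size=(trials, n))

# second derivative of the log-likelihood, evaluated for each noise realization
d2 = -(A / sigma**2) * np.sum(y * np.cos(w0 * k + theta)
                              - A * np.cos(2 * w0 * k + 2 * theta), axis=1)

print(-d2.mean())                  # Monte Carlo estimate of I(theta)
print(n * A**2 / (2 * sigma**2))   # approximation n A^2 / (2 sigma^2)
```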

3.3 Fisher Information and KL Divergence

There is an interesting relationship between the Fisher information and the KL divergence, which we shall now discuss. To begin, let us first list two assumptions.

Assumptions:
1. $\frac{\partial}{\partial\theta}\int f_\theta(y)\,dy = \int \frac{\partial}{\partial\theta} f_\theta(y)\,dy$
2. $\frac{\partial}{\partial\theta}\int \hat{\theta}(y) f_\theta(y)\,dy = \int \frac{\partial}{\partial\theta} f_\theta(y)\,\hat{\theta}(y)\,dy$

Basically, the two assumptions say that we can interchange the order of integration and differentiation. If the assumptions hold, we can show the following result:
$$I(\theta) = E_Y\!\left[\left(\frac{\partial\log f_\theta(Y)}{\partial\theta}\right)^2\right]. \qquad (6)$$

Proof. Writing $\frac{\partial^2}{\partial\theta^2}\log f_\theta(y) = \frac{f''_\theta(y)}{f_\theta(y)} - \frac{(f'_\theta(y))^2}{f_\theta(y)^2}$, where primes denote derivatives with respect to $\theta$, and using the assumptions, we have
$$I(\theta) = -E_Y\!\left[\frac{\partial^2\log f_\theta(Y)}{\partial\theta^2}\right] = -\int\left(\frac{f''_\theta(y)}{f_\theta(y)} - \frac{(f'_\theta(y))^2}{f_\theta(y)^2}\right)f_\theta(y)\,dy = \underbrace{-\int f''_\theta(y)\,dy}_{=\,0\ \text{by assumption 1}} + \int\frac{(f'_\theta(y))^2}{f_\theta(y)}\,dy$$
$$= \int\left(\frac{\partial\log f_\theta(y)}{\partial\theta}\right)^2 f_\theta(y)\,dy = E_Y\!\left[\left(\frac{\partial\log f_\theta(Y)}{\partial\theta}\right)^2\right].$$

The following proposition links the KL divergence and $I(\theta)$.

Proposition 1. Let $\theta = \theta_0 + \delta$ for some small deviation $\delta$. Then
$$D(f_{\theta_0}\,\|\,f_\theta) \approx \frac{I(\theta_0)}{2}(\theta-\theta_0)^2 + O\big((\theta-\theta_0)^3\big). \qquad (7)$$

Interpretation: If $I(\theta)$ is large, then $D(f_{\theta_0}\,\|\,f_\theta)$ is large. Accordingly, it is easier to distinguish $\theta_0$ from $\theta$.

Proof. First, recall that the KL divergence is defined as
$$D(f_{\theta_0}\,\|\,f_\theta) = \int f_{\theta_0}(y)\log\frac{f_{\theta_0}(y)}{f_\theta(y)}\,dy.$$
Consider the Taylor expansion about $\theta_0$; we compute the first two derivative terms below.

First-order derivative:
$$\left.\frac{\partial}{\partial\theta}D(f_{\theta_0}\,\|\,f_\theta)\right|_{\theta=\theta_0} = \left.\int f_{\theta_0}(y)\frac{\partial}{\partial\theta}\big[\log f_{\theta_0}(y) - \log f_\theta(y)\big]\,dy\right|_{\theta=\theta_0} = \left.-\int f_{\theta_0}(y)\frac{1}{f_\theta(y)}\frac{\partial}{\partial\theta}f_\theta(y)\,dy\right|_{\theta=\theta_0}$$
$$= -\int\frac{\partial}{\partial\theta}f_\theta(y)\,dy\bigg|_{\theta=\theta_0} = -\frac{\partial}{\partial\theta}\int f_\theta(y)\,dy = 0. \qquad (8)$$

Second-order derivative:
$$\left.\frac{\partial^2}{\partial\theta^2}D(f_{\theta_0}\,\|\,f_\theta)\right|_{\theta=\theta_0} = \left.\int f_{\theta_0}(y)\frac{\partial^2}{\partial\theta^2}\big[-\log f_\theta(y)\big]\,dy\right|_{\theta=\theta_0} = \left.-E\!\left[\frac{\partial^2}{\partial\theta^2}\log f_\theta(Y)\right]\right|_{\theta=\theta_0} = I(\theta_0). \qquad (9)$$

Substituting these terms into the Taylor expansion and ignoring the higher-order terms:
$$D(f_{\theta_0}\,\|\,f_\theta) = D(f_{\theta_0}\,\|\,f_{\theta_0}) + (\theta-\theta_0)\left.\frac{\partial}{\partial\theta}D(f_{\theta_0}\,\|\,f_\theta)\right|_{\theta=\theta_0} + \frac{(\theta-\theta_0)^2}{2}\left.\frac{\partial^2}{\partial\theta^2}D(f_{\theta_0}\,\|\,f_\theta)\right|_{\theta=\theta_0} + O\big((\theta-\theta_0)^3\big)$$
$$= \frac{(\theta-\theta_0)^2}{2}I(\theta_0) + O\big((\theta-\theta_0)^3\big). \qquad (10)$$
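Proposition 1 can be checked numerically for a family where both sides are available in closed form. For a single Bernoulli observation, $D(f_{\theta_0}\,\|\,f_\theta) = \theta_0\log\frac{\theta_0}{\theta} + (1-\theta_0)\log\frac{1-\theta_0}{1-\theta}$ and, from Definition 2, $I(\theta_0) = \frac{1}{\theta_0(1-\theta_0)}$; this Bernoulli Fisher information is not derived above and is used here only for illustration. A minimal sketch assuming NumPy:

```python
import numpy as np

theta0 = 0.3
I0 = 1.0 / (theta0 * (1.0 - theta0))     # Fisher information of one Bernoulli sample

for delta in (0.1, 0.03, 0.01):
    theta = theta0 + delta
    kl = theta0 * np.log(theta0 / theta) + (1 - theta0) * np.log((1 - theta0) / (1 - theta))
    quad = 0.5 * I0 * delta**2           # quadratic approximation from Proposition 1
    print(delta, kl, quad)               # the two values agree as delta shrinks
```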

4 Cramér-Rao Lower Bound (CRLB)

The CRLB is a fundamental result that characterizes the performance of an estimator.

Theorem 1. Under assumptions (1) and (2),
$$\mathrm{Var}(\hat{\theta}(Y)) \ge \frac{\left(\frac{\partial}{\partial\theta}E[\hat{\theta}(Y)]\right)^2}{I(\theta)} \qquad (11)$$
for any estimator $\hat{\theta}(Y)$.

Proof. To prove the inequality, we first note that
$$\mathrm{Var}(\hat{\theta}(Y))\,I(\theta) = \int\big(\hat{\theta}(y) - E[\hat{\theta}(Y)]\big)^2 f_\theta(y)\,dy \int\left(\frac{\partial}{\partial\theta}\log f_\theta(y)\right)^2 f_\theta(y)\,dy.$$
Letting
$$A = \hat{\theta}(Y) - E[\hat{\theta}(Y)], \qquad B = \frac{\partial}{\partial\theta}\log f_\theta(Y),$$

the above expression can be written as
$$\mathrm{Var}(\hat{\theta}(Y))\,I(\theta) = E[A^2]\,E[B^2] \ge E[AB]^2,$$
where the inequality is due to Cauchy-Schwarz. We can also show that
$$\begin{aligned}
E[AB]^2 &= \left[\int\big(\hat{\theta}(y) - E[\hat{\theta}(Y)]\big)\left(\frac{\partial}{\partial\theta}\log f_\theta(y)\right)f_\theta(y)\,dy\right]^2 \\
&= \left[\int\big(\hat{\theta}(y) - E[\hat{\theta}(Y)]\big)\frac{\partial}{\partial\theta}f_\theta(y)\,dy\right]^2 \\
&= \left[\int\hat{\theta}(y)\frac{\partial}{\partial\theta}f_\theta(y)\,dy - E[\hat{\theta}(Y)]\int\frac{\partial}{\partial\theta}f_\theta(y)\,dy\right]^2 \\
&= \left[\frac{\partial}{\partial\theta}E[\hat{\theta}(Y)] - 0\right]^2 = \left(\frac{\partial}{\partial\theta}E[\hat{\theta}(Y)]\right)^2.
\end{aligned}$$
Combining the two displays proves (11).

Proposition 2. An estimator $\hat{\theta}(Y)$ achieves the CRLB with equality if and only if $\hat{\theta}(Y)$ is a sufficient statistic of a one-parameter exponential family.

Proof. Suppose that CRLB equality holds. Then we must have
$$\frac{\partial}{\partial\theta}\log f_\theta(y) = k(\theta)\big(\hat{\theta}(y) - E[\hat{\theta}(Y)]\big)$$
for some function $k(\theta)$. This implies that
$$\log f_\theta(y) = \int_a^\theta k(\theta')\big(\hat{\theta}(y) - E[\hat{\theta}(Y)]\big)\,d\theta' + H(y) = \underbrace{-\int_a^\theta k(\theta')E[\hat{\theta}(Y)]\,d\theta'}_{\log C(\theta)} + \underbrace{H(y)}_{\log h(y)} + \hat{\theta}(y)\underbrace{\int_a^\theta k(\theta')\,d\theta'}_{Q(\theta)}.$$
Thus,
$$f_\theta(y) = C(\theta)\exp\big(Q(\theta)\,\hat{\theta}(y)\big)\,h(y),$$
which is a one-parameter exponential family.

Conversely, suppose that $\hat{\theta}(Y)$ is a sufficient statistic of a one-parameter exponential family. Then
$$f_\theta(y) = C(\theta)\exp\big(Q(\theta)T(y)\big)\,h(y),$$
where $T(y) = \hat{\theta}(y)$, and
$$C(\theta) = \left(\int\exp\big(Q(\theta)T(y)\big)h(y)\,dy\right)^{-1}.$$
To show that $\mathrm{Var}\{T(Y)\}$ attains the CRLB, we need to obtain the Fisher information
$$I(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta}\log f_\theta(Y)\right)^2\right].$$

Note that since
$$\log f_\theta(y) = Q(\theta)T(y) + \log h(y) - \log\int\exp\big(Q(\theta)T(y')\big)h(y')\,dy',$$
we must have
$$\frac{\partial}{\partial\theta}\log f_\theta(y) = Q'(\theta)T(y) - Q'(\theta)\,\frac{\int\exp\big(Q(\theta)T(y')\big)h(y')\,T(y')\,dy'}{\int\exp\big(Q(\theta)T(y')\big)h(y')\,dy'} = Q'(\theta)\big\{T(y) - E\{T(Y)\}\big\}.$$
Therefore,
$$I(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta}\log f_\theta(Y)\right)^2\right] = \big(Q'(\theta)\big)^2\,\mathrm{Var}\{T(Y)\}.$$
The Cramér-Rao lower bound is
$$\mathrm{Var}(\hat{\theta}(Y)) \ge \frac{\left(\frac{\partial}{\partial\theta}E[\hat{\theta}(Y)]\right)^2}{I(\theta)},$$
so we need to determine $\left(\frac{\partial}{\partial\theta}E[\hat{\theta}(Y)]\right)^2$. Suppose that $\hat{\theta}(Y) = T(Y)$. Then
$$\begin{aligned}
\frac{\partial}{\partial\theta}E\{\hat{\theta}(Y)\} &= \frac{\partial}{\partial\theta}\left[\frac{\int T(y)\exp\big(Q(\theta)T(y)\big)h(y)\,dy}{\int\exp\big(Q(\theta)T(y)\big)h(y)\,dy}\right] \\
&= Q'(\theta)\,\frac{\int T(y)^2\exp\big(Q(\theta)T(y)\big)h(y)\,dy\int\exp\big(Q(\theta)T(y)\big)h(y)\,dy - \left(\int T(y)\exp\big(Q(\theta)T(y)\big)h(y)\,dy\right)^2}{\left(\int\exp\big(Q(\theta)T(y)\big)h(y)\,dy\right)^2} \\
&= Q'(\theta)\,\mathrm{Var}\{T(Y)\}.
\end{aligned}$$
Therefore,
$$\frac{\left(\frac{\partial}{\partial\theta}E[\hat{\theta}(Y)]\right)^2}{I(\theta)} = \frac{\big(Q'(\theta)\big)^2\,\mathrm{Var}\{T(Y)\}^2}{\big(Q'(\theta)\big)^2\,\mathrm{Var}\{T(Y)\}} = \mathrm{Var}(T(Y)) = \mathrm{Var}(\hat{\theta}(Y)),$$
which shows that CRLB equality is attained.

Example 10. Let $Y = [Y_1,\ldots,Y_n]$ be a sequence of i.i.d. random variables such that $Y_i \sim \mathcal{N}(\theta,\sigma^2)$. Consider the estimator $\hat{\theta}(Y) = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Is $\hat{\theta}(Y)$ an MVUE?

Solution: The CRLB is
$$\mathrm{Var}(\hat{\theta}) \ge \frac{\left(\frac{\partial}{\partial\theta}E[\hat{\theta}(Y)]\right)^2}{I(\theta)},$$
where it is not difficult to show that $I(\theta) = \frac{n}{\sigma^2}$ (Example 8) and $E[\hat{\theta}(Y)] = \theta$. Therefore, the CRLB becomes
$$\mathrm{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)} = \frac{\sigma^2}{n}.$$

On the other hand, we can show that
$$\mathrm{Var}(\hat{\theta}) = \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(Y_i) = \frac{\sigma^2}{n} = \frac{1}{I(\theta)},$$
which means that CRLB equality is achieved. Therefore, the estimator
$$\hat{\theta}(Y) = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
is an MVUE.

Example 11. Let $Y = [Y_1,\ldots,Y_n]$ be a sequence of random variables such that $Y_k = s_k(\theta) + N_k$, where $s_k(\theta)$ is a known function of $\theta$ for each $k$, and the $N_k \sim \mathcal{N}(0,\sigma^2)$ are i.i.d. Find the CRLB for any unbiased estimator $\hat{\theta}$.

Solution: The log-likelihood is
$$\log f_\theta(y) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{k=1}^{n}\big(y_k - s_k(\theta)\big)^2.$$
Consequently, we can show that
$$\frac{\partial^2}{\partial\theta^2}\log f_\theta(y) = \frac{1}{\sigma^2}\sum_{k=1}^{n}\left[\big(y_k - s_k(\theta)\big)\frac{\partial^2 s_k(\theta)}{\partial\theta^2} - \left(\frac{\partial s_k(\theta)}{\partial\theta}\right)^2\right].$$
Accordingly,
$$I(\theta) = -E\!\left[\frac{\partial^2}{\partial\theta^2}\log f_\theta(Y)\right] = \frac{1}{\sigma^2}\sum_{k=1}^{n}\left(\frac{\partial s_k(\theta)}{\partial\theta}\right)^2.$$
Therefore,
$$\mathrm{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)} = \frac{\sigma^2}{\sum_{k=1}^{n}\left(\frac{\partial}{\partial\theta}s_k(\theta)\right)^2}.$$
For example, if $s_k(\theta) = \theta$, then $\mathrm{Var}(\hat{\theta}) \ge \frac{\sigma^2}{n}$. If $s_k(\theta) = A\cos(\omega_0 k+\theta)$, then $\mathrm{Var}(\hat{\theta}) \gtrsim \frac{2\sigma^2}{nA^2}$, using the approximation of Example 9.
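As a final numerical check, the sample mean of Example 10 should attain the CRLB $\sigma^2/n$ with equality. A minimal sketch assuming NumPy; $\theta$, $\sigma$, and $n$ are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(6)
n, theta, sigma, trials = 20, 1.0, 2.0, 200_000
y = rng.normal(theta, sigma, size=(trials, n))

theta_hat = y.mean(axis=1)           # sample mean for each trial
print(theta_hat.var())               # Monte Carlo variance of the estimator
print(sigma**2 / n)                  # CRLB = sigma^2 / n, attained with equality
```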