Lecture 6. Efficient estimators. Rao-Cramer bound.

1 MSE and Sufficiency

Let $X = (X_1, \dots, X_n)$ be a random sample from distribution $f_\theta$. Let $\hat\theta = \delta(X)$ be an estimator of $\theta$, and let $T(X)$ be a sufficient statistic for $\theta$. As we have seen already, MSE provides one way to compare the quality of different estimators; in particular, estimators with smaller MSE are said to be more efficient. On the other hand, once we know $T(X)$, we can discard $X$. How do these concepts relate to each other? The theorem below shows that for any estimator $\hat\theta = \delta(X)$, there is another estimator which depends on the data $X$ only through $T(X)$ and is at least as efficient as $\hat\theta$:

Theorem 1 (Rao-Blackwell). In the setting above, define $\varphi(T) = E[\delta(X) \mid T]$. Then $\hat\theta_2 = \varphi(T(X))$ is an estimator for $\theta$ and $MSE(\hat\theta_2) \leq MSE(\hat\theta)$. In addition, if $\hat\theta$ is unbiased, then $\hat\theta_2$ is unbiased as well.

Proof. To show that $\hat\theta_2$ is an estimator, we have to check that it does not depend on $\theta$. Indeed, since $T$ is sufficient for $\theta$, the conditional distribution of $X$ given $T$ is independent of $\theta$. So the conditional distribution of $\delta(X)$ given $T$ is independent of $\theta$ as well. In particular, the conditional expectation $E[\delta(X) \mid T]$ does not depend on $\theta$. Thus, $\varphi(T(X))$ depends only on the data $X$, and $\hat\theta_2$ is an estimator. Next,

$$\begin{aligned}
MSE(\hat\theta) &= E[(\hat\theta - \hat\theta_2 + \hat\theta_2 - \theta)^2] \\
&= E[(\hat\theta - \hat\theta_2)^2] + 2E[(\hat\theta - \hat\theta_2)(\hat\theta_2 - \theta)] + E[(\hat\theta_2 - \theta)^2] \\
&= E[(\hat\theta - \hat\theta_2)^2] + 2E[(\hat\theta - \hat\theta_2)(\hat\theta_2 - \theta)] + MSE(\hat\theta_2) \\
&= E[(\hat\theta - \hat\theta_2)^2] + MSE(\hat\theta_2) \geq MSE(\hat\theta_2),
\end{aligned}$$

where in the last line we used

$$\begin{aligned}
E[(\hat\theta - \hat\theta_2)(\hat\theta_2 - \theta)] &= E[(\delta(X) - \varphi(T(X)))(\varphi(T(X)) - \theta)] \\
&= E\big[E[(\delta(X) - \varphi(T(X)))(\varphi(T(X)) - \theta) \mid T]\big] \\
&= E\big[(\varphi(T(X)) - \theta)\, E[\delta(X) - \varphi(T(X)) \mid T]\big] \\
&= E\big[(\varphi(T(X)) - \theta)\,(E[\delta(X) \mid T] - \varphi(T(X)))\big] = 0,
\end{aligned}$$

since $E[\delta(X) \mid T] = \varphi(T(X))$.

To show the last result, by the law of iterated expectations we have

$$E[\varphi(T(X))] = E\big[E[\delta(X) \mid T]\big] = E[\delta(X)] = \theta.$$

Example. Let $X_1, \dots, X_n$ be a random sample from $Binomial(k, p)$, i.e. $P\{X_j = m\} = \frac{k!}{m!(k-m)!} p^m (1-p)^{k-m}$ for any integer $m = 0, 1, \dots, k$. Suppose our parameter of interest is the probability of one success, i.e. $\theta = P\{X_j = 1\} = kp(1-p)^{k-1}$. One possible estimator is $\hat\theta = \sum_{i=1}^n I(X_i = 1)/n$. This estimator is unbiased, i.e. $E[\hat\theta] = \theta$. Let us find a sufficient statistic. The joint density of the data is

$$f(x_1, \dots, x_n) = \prod_{i=1}^n \frac{k!}{x_i!(k-x_i)!} p^{x_i} (1-p)^{k-x_i} = \text{function}(x_1, \dots, x_n) \cdot p^{\sum_i x_i} (1-p)^{nk - \sum_i x_i}.$$

Thus, $T = \sum_{i=1}^n X_i$ is sufficient by the factorization theorem. In fact, it is minimal sufficient. Using the Rao-Blackwell theorem, we can improve $\hat\theta$ by considering its conditional expectation given $T$. Let $\varphi = E[\hat\theta \mid T]$ denote this estimator. Then, for any nonnegative integer $t$,

$$\begin{aligned}
\varphi(t) &= E\Big[\sum_{i=1}^n I(X_i = 1)/n \,\Big|\, \sum_{j=1}^n X_j = t\Big] = \sum_{i=1}^n P\Big\{X_i = 1 \,\Big|\, \sum_{j=1}^n X_j = t\Big\}\Big/n \\
&= P\Big\{X_1 = 1 \,\Big|\, \sum_{j=1}^n X_j = t\Big\} = \frac{P\{X_1 = 1, \sum_{j=1}^n X_j = t\}}{P\{\sum_{j=1}^n X_j = t\}} = \frac{P\{X_1 = 1, \sum_{j=2}^n X_j = t-1\}}{P\{\sum_{j=1}^n X_j = t\}} \\
&= \frac{P\{X_1 = 1\}\, P\{\sum_{j=2}^n X_j = t-1\}}{P\{\sum_{j=1}^n X_j = t\}} = \frac{kp(1-p)^{k-1} \cdot \frac{(k(n-1))!}{(t-1)!\,(k(n-1)-(t-1))!}\, p^{t-1}(1-p)^{k(n-1)-(t-1)}}{\frac{(kn)!}{t!\,(kn-t)!}\, p^t (1-p)^{kn-t}} \\
&= \frac{k\,(k(n-1))! \,/\, \big((t-1)!\,(k(n-1)-(t-1))!\big)}{(kn)! \,/\, \big(t!\,(kn-t)!\big)} = \frac{k\,(k(n-1))!\,(kn-t)!\,t}{(kn)!\,(kn-k+1-t)!},
\end{aligned}$$

where we used the facts that $X_1$ is independent of $(X_2, \dots, X_n)$, $\sum_{i=1}^n X_i \sim Binomial(kn, p)$, and $\sum_{i=2}^n X_i \sim Binomial(k(n-1), p)$. So our new estimator is

$$\hat\theta_2 = \varphi(X_1, \dots, X_n) = \frac{k\,(k(n-1))!\,\big(kn - \sum_{i=1}^n X_i\big)!\, \sum_{i=1}^n X_i}{(kn)!\,\big(kn - k + 1 - \sum_{i=1}^n X_i\big)!}.$$

By the theorem above, it is unbiased and at least as efficient as $\hat\theta$. The procedure we just applied is sometimes informally referred to as Rao-Blackwellization.
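As an informal numerical check of this example (an added sketch, not part of the original notes), the following Python snippet simulates the Binomial setting and compares the two estimators by Monte Carlo. It assumes numpy is available; the helper `phi` and the values $k = 5$, $p = 0.3$, $n = 20$ are our own illustrative choices.

```python
# Monte Carlo sketch of Rao-Blackwellization for the Binomial(k, p) example.
# theta = P{X_j = 1} = k p (1-p)^(k-1); theta_hat counts exact ones,
# theta_hat_2 = phi(T) is its conditional expectation given T = sum(X_i).
import numpy as np
from math import comb

k, p, n, reps = 5, 0.3, 20, 200_000
theta = k * p * (1 - p) ** (k - 1)
rng = np.random.default_rng(0)

def phi(t, k, n):
    # phi(t) = P{X_1 = 1 | sum X_j = t}; the p factors cancel, leaving
    # k * C(k(n-1), t-1) / C(kn, t), as derived above.
    if t == 0:
        return 0.0
    return k * comb(k * (n - 1), t - 1) / comb(k * n, t)

X = rng.binomial(k, p, size=(reps, n))
theta_hat = (X == 1).mean(axis=1)                  # fraction of X_i equal to 1
T = X.sum(axis=1)
theta_hat_2 = np.array([phi(t, k, n) for t in T])  # Rao-Blackwellized estimator

for name, est in [("theta_hat  ", theta_hat), ("theta_hat_2", theta_hat_2)]:
    print(name, "bias ~", est.mean() - theta, " MSE ~", ((est - theta) ** 2).mean())
# Both estimators are (nearly) unbiased; theta_hat_2 shows the smaller MSE,
# as the Rao-Blackwell theorem guarantees.
```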

2 Fisher information

Let $f(x|\theta)$ with $\theta \in \Theta$ be some parametric family. For given $\theta \in \Theta$, let $\mathrm{Supp}(\theta) = \{x : f(x|\theta) > 0\}$; $\mathrm{Supp}(\theta)$ is usually called the support of the distribution $f(x|\theta)$. Assume that $\mathrm{Supp}(\theta)$ does not depend on $\theta$. As before, $l(x, \theta) = \log f(x|\theta)$ is called the log-likelihood function. Assume that $l(x, \theta)$ is twice continuously differentiable in $\theta$ for all $x$ in the support. Let $X$ be some random variable with distribution $f(x|\theta)$. Then:

Definition 2. $I(\theta) = E_\theta[(\partial l(X, \theta)/\partial\theta)^2]$ is called Fisher information.

Fisher information plays an important role in maximum likelihood estimation. The theorem below gives two information equalities:

Theorem 3. In the setting above, (1) $E_\theta[\partial l(X, \theta)/\partial\theta] = 0$ and (2) $I(\theta) = -E_\theta[\partial^2 l(X, \theta)/\partial\theta^2]$.

Proof. Since $l(x, \theta)$ is twice differentiable in $\theta$, $f(x|\theta)$ is twice differentiable in $\theta$ as well. Differentiating the identity $\int f(x|\theta)\,dx = 1$ with respect to $\theta$ yields

$$\int \frac{\partial f(x|\theta)}{\partial\theta}\,dx = 0$$

for all $\theta \in \Theta$. The second differentiation yields

$$\int \frac{\partial^2 f(x|\theta)}{\partial\theta^2}\,dx = 0 \quad (1)$$

for all $\theta \in \Theta$. In addition,

$$\frac{\partial l(x, \theta)}{\partial\theta} = \frac{\partial \log f(x|\theta)}{\partial\theta} = \frac{1}{f(x|\theta)} \frac{\partial f(x|\theta)}{\partial\theta}$$

and

$$\frac{\partial^2 l(x, \theta)}{\partial\theta^2} = -\frac{1}{f^2(x|\theta)} \Big(\frac{\partial f(x|\theta)}{\partial\theta}\Big)^2 + \frac{1}{f(x|\theta)} \frac{\partial^2 f(x|\theta)}{\partial\theta^2}.$$

The former equality yields

$$E_\theta\Big[\frac{\partial l(X, \theta)}{\partial\theta}\Big] = \int \frac{1}{f(x|\theta)} \frac{\partial f(x|\theta)}{\partial\theta}\, f(x|\theta)\,dx = \int \frac{\partial f(x|\theta)}{\partial\theta}\,dx = 0,$$

which is our first result.

The latter equality yields

$$E_\theta\Big[\frac{\partial^2 l(X, \theta)}{\partial\theta^2}\Big] = -\int \frac{1}{f(x|\theta)} \Big(\frac{\partial f(x|\theta)}{\partial\theta}\Big)^2 dx + \int \frac{\partial^2 f(x|\theta)}{\partial\theta^2}\,dx = -\int \frac{1}{f(x|\theta)} \Big(\frac{\partial f(x|\theta)}{\partial\theta}\Big)^2 dx$$

in view of equation (1). So,

$$I(\theta) = E_\theta\Big[\Big(\frac{\partial l(X, \theta)}{\partial\theta}\Big)^2\Big] = \int \Big(\frac{1}{f(x|\theta)} \frac{\partial f(x|\theta)}{\partial\theta}\Big)^2 f(x|\theta)\,dx = \int \frac{1}{f(x|\theta)} \Big(\frac{\partial f(x|\theta)}{\partial\theta}\Big)^2 dx = -E_\theta\Big[\frac{\partial^2 l(X, \theta)}{\partial\theta^2}\Big],$$

which is our second result.

Example. Let us calculate Fisher information for the $N(\mu, \sigma^2)$ distribution where $\sigma^2$ is known, so that our parameter is $\theta = \mu$. The density of the normal distribution is $f(x|\mu) = \exp(-(x-\mu)^2/(2\sigma^2))/\sqrt{2\pi\sigma^2}$. The log-likelihood is $l(x, \mu) = -\log(2\pi\sigma^2)/2 - (x-\mu)^2/(2\sigma^2)$. So $\partial l(x, \mu)/\partial\mu = (x-\mu)/\sigma^2$ and $\partial^2 l(x, \mu)/\partial\mu^2 = -1/\sigma^2$. So $-E_\mu[\partial^2 l(X, \mu)/\partial\mu^2] = 1/\sigma^2$. At the same time,

$$I(\theta) = E_\mu[(\partial l(X, \mu)/\partial\mu)^2] = E_\mu[(X - \mu)^2/\sigma^4] = 1/\sigma^2.$$

So, as was expected in view of the theorem above, $I(\theta) = -E_\mu[\partial^2 l(X, \mu)/\partial\mu^2]$ in this example.
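The two information equalities can also be checked numerically. The sketch below is an addition to the notes: it assumes numpy is available, draws a large normal sample with arbitrary $\mu$ and $\sigma$, and compares Monte Carlo versions of $E_\mu[\partial l/\partial\mu]$, $E_\mu[(\partial l/\partial\mu)^2]$, and $-E_\mu[\partial^2 l/\partial\mu^2]$ with the theoretical value $1/\sigma^2$.

```python
# Numeric check of the information equalities for N(mu, sigma^2), sigma^2 known.
import numpy as np

mu, sigma = 1.5, 2.0
rng = np.random.default_rng(1)
X = rng.normal(mu, sigma, size=1_000_000)

score = (X - mu) / sigma**2          # dl/dmu evaluated at the sample
hessian = -1.0 / sigma**2            # d^2 l / dmu^2 (constant in x here)

print("E[score]    ~", score.mean())        # first equality: ~ 0
print("E[score^2]  ~", (score**2).mean())   # Monte Carlo I(mu)
print("-E[hessian] =", -hessian)            # second equality: 1/sigma^2
# All three agree with 1/sigma^2 = 0.25 up to simulation noise.
```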

Example. Let us calculate Fisher information for a $Bernoulli(\theta)$ distribution. Note that the Bernoulli distribution is discrete, so we use the probability mass function (pmf) instead of the pdf. The pmf of $Bernoulli(\theta)$ is $f(x|\theta) = \theta^x (1-\theta)^{1-x}$ for $x \in \{0, 1\}$. The log-likelihood is $l(x, \theta) = x \log\theta + (1-x)\log(1-\theta)$. So $\partial l(x, \theta)/\partial\theta = x/\theta - (1-x)/(1-\theta)$ and $\partial^2 l(x, \theta)/\partial\theta^2 = -x/\theta^2 - (1-x)/(1-\theta)^2$. So

$$\begin{aligned}
E_\theta[(\partial l(X, \theta)/\partial\theta)^2] &= E_\theta[(X/\theta - (1-X)/(1-\theta))^2] \\
&= E_\theta[X^2/\theta^2] - 2E_\theta[X(1-X)/(\theta(1-\theta))] + E_\theta[(1-X)^2/(1-\theta)^2] \\
&= E_\theta[X/\theta^2] + E_\theta[(1-X)/(1-\theta)^2] \\
&= 1/\theta + 1/(1-\theta) = 1/(\theta(1-\theta)),
\end{aligned}$$

since $x = x^2$, $x(1-x) = 0$, and $(1-x) = (1-x)^2$ if $x \in \{0, 1\}$. At the same time,

$$-E_\theta[\partial^2 l(X, \theta)/\partial\theta^2] = E_\theta[X/\theta^2 + (1-X)/(1-\theta)^2] = \theta/\theta^2 + (1-\theta)/(1-\theta)^2 = 1/\theta + 1/(1-\theta) = 1/(\theta(1-\theta)).$$

So $I(\theta) = -E_\theta[\partial^2 l(X, \theta)/\partial\theta^2]$, as it should be.

2.1 Information for a random sample

Let us now consider Fisher information for a random sample. Let $X = (X_1, \dots, X_n)$ be a random sample from distribution $f(x|\theta)$. Then the joint pdf is $f_n(x) = \prod_{i=1}^n f(x_i|\theta)$ where $x = (x_1, \dots, x_n)$, and the joint log-likelihood is $l_n(x, \theta) = \sum_{i=1}^n l(x_i, \theta)$. So Fisher information for the sample $X$ is

$$\begin{aligned}
I_n(\theta) &= E_\theta[(\partial l_n(X, \theta)/\partial\theta)^2] \\
&= E_\theta\Big[\sum_{1 \leq i, j \leq n} (\partial l(X_i, \theta)/\partial\theta)(\partial l(X_j, \theta)/\partial\theta)\Big] \\
&= \sum_{i=1}^n E_\theta[(\partial l(X_i, \theta)/\partial\theta)^2] + 2\sum_{1 \leq i < j \leq n} E_\theta[(\partial l(X_i, \theta)/\partial\theta)(\partial l(X_j, \theta)/\partial\theta)] \\
&= n I(\theta),
\end{aligned}$$

where we used the fact that for any $i < j$, $E_\theta[(\partial l(X_i, \theta)/\partial\theta)(\partial l(X_j, \theta)/\partial\theta)] = 0$ by independence and the first information equality. Here $I(\theta)$ denotes Fisher information for the distribution $f(x|\theta)$.
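The additivity $I_n(\theta) = nI(\theta)$ is easy to see in simulation. The following added sketch (assuming numpy; $\theta$, $n$, and the replication count are arbitrary choices of ours) estimates the variance of the sample score for Bernoulli data and compares it with $n/(\theta(1-\theta))$.

```python
# Sketch: Fisher information adds across an i.i.d. sample. For Bernoulli(theta),
# I(theta) = 1/(theta(1-theta)); the sample score variance should be ~ n*I(theta).
import numpy as np

theta, n, reps = 0.3, 10, 400_000
rng = np.random.default_rng(2)
X = rng.binomial(1, theta, size=(reps, n))

# per-sample score: sum_i [x_i/theta - (1-x_i)/(1-theta)]
score_n = (X / theta - (1 - X) / (1 - theta)).sum(axis=1)

print("Var(score_n) ~", score_n.var())            # Monte Carlo I_n(theta)
print("n * I(theta) =", n / (theta * (1 - theta)))
# The two agree: sample information is n times the per-observation information.
```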

3 Rao-Cramer bound

An important question in the theory of statistical estimation is whether there is a nontrivial bound such that no estimator can be more efficient than this bound. The theorem below is a result of this sort:

Theorem 4 (Rao-Cramer bound). Let $X = (X_1, \dots, X_n)$ be a random sample from distribution $f(x|\theta)$ with information $I_n(\theta)$. Let $W(X)$ be an estimator of $\theta$ such that (1) $dE_\theta[W(X)]/d\theta = \int W(x)\,\partial f_n(x|\theta)/\partial\theta\,dx$ where $x = (x_1, \dots, x_n)$, and (2) $V(W) < \infty$. Then $V(W) \geq (dE_\theta[W(X)]/d\theta)^2 / I_n(\theta)$. In particular, if $W$ is unbiased for $\theta$, then $V(W) \geq 1/I_n(\theta)$.

Proof. The first information equality gives $E_\theta[\partial l_n(X, \theta)/\partial\theta] = 0$. So,

$$\begin{aligned}
\mathrm{cov}(W(X), \partial l_n(X, \theta)/\partial\theta) &= E[W(X)\,\partial l_n(X, \theta)/\partial\theta] \\
&= \int W(x)\,\frac{\partial l_n(x, \theta)}{\partial\theta}\, f_n(x|\theta)\,dx \\
&= \int W(x)\,\frac{\partial f_n(x|\theta)}{\partial\theta}\,\frac{1}{f_n(x|\theta)}\, f_n(x|\theta)\,dx \\
&= \int W(x)\,\frac{\partial f_n(x|\theta)}{\partial\theta}\,dx = \frac{dE_\theta[W(X)]}{d\theta}.
\end{aligned}$$

By the Cauchy-Schwarz inequality,

$$\big(\mathrm{cov}(W(X), \partial l_n(X, \theta)/\partial\theta)\big)^2 \leq V(W(X))\,V(\partial l_n(X, \theta)/\partial\theta) = V(W(X))\,I_n(\theta).$$

Thus, $V(W(X)) \geq (dE_\theta[W(X)]/d\theta)^2 / I_n(\theta)$. If $W$ is unbiased for $\theta$, then $E_\theta[W(X)] = \theta$, so $dE_\theta[W(X)]/d\theta = 1$ and $V(W(X)) \geq 1/I_n(\theta)$.

Example. Let us calculate the Rao-Cramer bound for a random sample $X_1, \dots, X_n$ from the $Bernoulli(\theta)$ distribution. We have already seen that $I(\theta) = 1/(\theta(1-\theta))$ in this case. So Fisher information for the sample is $I_n(\theta) = n/(\theta(1-\theta))$. Thus, any unbiased estimator of $\theta$ satisfying the regularity conditions has variance no smaller than $\theta(1-\theta)/n$. On the other hand, let $\hat\theta = \bar X = \sum_{i=1}^n X_i/n$ be an estimator of $\theta$. Then $E_\theta[\hat\theta] = \theta$, i.e. $\hat\theta$ is unbiased, and $V(\hat\theta) = \theta(1-\theta)/n$, which coincides with the Rao-Cramer bound. Thus, $\bar X$ is the uniformly minimum variance unbiased (UMVU) estimator of $\theta$. The word "uniformly" in this situation means that $\bar X$ has the smallest variance among unbiased estimators for all $\theta \in \Theta$.

Example. Let us now consider a counterexample to the Rao-Cramer theorem. Let $X_1, \dots, X_n$ be a random sample from $U[0, \theta]$. Then $f(x_i|\theta) = 1/\theta$ if $x_i \in [0, \theta]$ and $0$ otherwise. So $l(x_i, \theta) = -\log\theta$ if $x_i \in [0, \theta]$. Then $\partial l/\partial\theta = -1/\theta$ and $\partial^2 l/\partial\theta^2 = 1/\theta^2$. So $I(\theta) = 1/\theta^2$ while $-E_\theta[\partial^2 l(X_i, \theta)/\partial\theta^2] = -1/\theta^2 \neq I(\theta)$. Thus, the second information equality does not hold in this example. The reason is that the support of the distribution depends on $\theta$. Moreover, consider the estimator $\hat\theta = ((n+1)/n)\,X_{(n)}$ of $\theta$. Then $E_\theta[\hat\theta] = \theta$ and $V(\hat\theta) = ((n+1)^2/n^2)\,V(X_{(n)}) = \theta^2/(n(n+2))$, as we saw when we considered order statistics. So $\hat\theta$ is unbiased, but its variance is smaller than $1/I_n(\theta) = \theta^2/n$. Thus, the Rao-Cramer theorem does not work in this example either. Again, the reason is that the Rao-Cramer theorem assumes that the support is independent of the parameter.
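Both examples lend themselves to a quick simulation. The added sketch below (assuming numpy; all parameter values are illustrative choices of ours) estimates $V(\bar X)$ for Bernoulli data against the bound $\theta(1-\theta)/n$, and $V(((n+1)/n)X_{(n)})$ for uniform data against $\theta^2/n$.

```python
# Sketch: Var(X_bar) for Bernoulli(theta) attains the Rao-Cramer bound, while
# ((n+1)/n) * X_(n) for U[0, theta] beats theta^2/n (the support depends on
# theta, so the bound does not apply there).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 20, 400_000

theta_b = 0.3                                    # Bernoulli case
xbar = rng.binomial(1, theta_b, size=(reps, n)).mean(axis=1)
print("Var(X_bar)          ~", xbar.var())
print("RC bound th(1-th)/n =", theta_b * (1 - theta_b) / n)

theta_u = 2.0                                    # Uniform counterexample
xmax = rng.uniform(0, theta_u, size=(reps, n)).max(axis=1)
est = (n + 1) / n * xmax                         # unbiased for theta
print("Var(est)            ~", est.var())        # ~ theta^2/(n(n+2))
print("'bound' theta^2/n   =", theta_u**2 / n)   # est's variance is far below this
```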

MIT OpenCourseWare http://ocw.mit.edu 14.381 Statistical Methods in Economics Fall 2013 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.