Lecture 6. Efficient estimators. Rao-Cramer bound.

1 MSE and Sufficiency

Let $X = (X_1, \dots, X_n)$ be a random sample from a distribution $f_\theta$. Let $\hat\theta = \delta(X)$ be an estimator of $\theta$, and let $T(X)$ be a sufficient statistic for $\theta$. As we have seen already, MSE provides one way to compare the quality of different estimators; in particular, estimators with smaller MSE are said to be more efficient. On the other hand, once we know $T(X)$, we can discard $X$. How do these two concepts relate to each other? The theorem below shows that for any estimator $\hat\theta = \delta(X)$, there is another estimator which depends on the data $X$ only through $T(X)$ and is at least as efficient as $\hat\theta$:

Theorem 1 (Rao-Blackwell). In the setting above, define $\varphi(T) = E[\delta(X) \mid T]$. Then $\hat\theta_2 = \varphi(T(X))$ is an estimator for $\theta$ and $MSE(\hat\theta_2) \le MSE(\hat\theta)$. In addition, if $\hat\theta$ is unbiased, then $\hat\theta_2$ is unbiased as well.

Proof. To show that $\hat\theta_2$ is an estimator, we have to check that it does not depend on $\theta$. Indeed, since $T$ is sufficient for $\theta$, the conditional distribution of $X$ given $T$ does not depend on $\theta$. So the conditional distribution of $\delta(X)$ given $T$ does not depend on $\theta$ either. In particular, the conditional expectation $E[\delta(X) \mid T]$ does not depend on $\theta$. Thus, $\varphi(T(X))$ depends only on the data $X$, and $\hat\theta_2$ is an estimator.

Next,
$$
\begin{aligned}
MSE(\hat\theta) &= E[(\hat\theta - \hat\theta_2 + \hat\theta_2 - \theta)^2] \\
&= E[(\hat\theta - \hat\theta_2)^2] + 2E[(\hat\theta - \hat\theta_2)(\hat\theta_2 - \theta)] + E[(\hat\theta_2 - \theta)^2] \\
&= E[(\hat\theta - \hat\theta_2)^2] + 2E[(\hat\theta - \hat\theta_2)(\hat\theta_2 - \theta)] + MSE(\hat\theta_2) \\
&= E[(\hat\theta - \hat\theta_2)^2] + MSE(\hat\theta_2) \\
&\ge MSE(\hat\theta_2),
\end{aligned}
$$
where we used the fact that the cross term vanishes:
$$
\begin{aligned}
E[(\hat\theta - \hat\theta_2)(\hat\theta_2 - \theta)] &= E[(\delta(X) - \varphi(T(X)))(\varphi(T(X)) - \theta)] \\
&= E\big[E[(\delta(X) - \varphi(T(X)))(\varphi(T(X)) - \theta) \mid T]\big] \\
&= E\big[(\varphi(T(X)) - \theta)\, E[\delta(X) - \varphi(T(X)) \mid T]\big] \\
&= E\big[(\varphi(T(X)) - \theta)\,(E[\delta(X) \mid T] - \varphi(T(X)))\big] \\
&= 0,
\end{aligned}
$$
since $E[\delta(X) \mid T] = \varphi(T(X))$.

Cite as: Anna Mikusheva, course materials for 14.381 Statistical Methods in Economics, Fall 2011. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].
To show the last result, note that by the law of iterated expectations,
$$E[\hat\theta_2] = E[\varphi(T(X))] = E\big[E[\delta(X) \mid T]\big] = E[\delta(X)] = E[\hat\theta],$$
so if $\hat\theta$ is unbiased, then $\hat\theta_2$ is unbiased as well.

Example. Let $X_1, \dots, X_n$ be a random sample from $\text{Binomial}(k, p)$, i.e. $P\{X_j = m\} = \frac{k!}{m!(k-m)!}\, p^m (1-p)^{k-m}$ for any integer $0 \le m \le k$. Suppose our parameter of interest is the probability of exactly one success, i.e. $\theta = P\{X_j = 1\} = kp(1-p)^{k-1}$. One possible estimator is $\hat\theta = \sum_{i=1}^n I(X_i = 1)/n$. This estimator is unbiased, i.e. $E[\hat\theta] = \theta$. Let us find a sufficient statistic. The joint pmf of the data is
$$f(x_1, \dots, x_n) = \prod_{i=1}^n \frac{k!}{x_i!(k - x_i)!}\, p^{x_i} (1-p)^{k - x_i} = \text{function}(x_1, \dots, x_n) \cdot p^{\sum_i x_i} (1-p)^{nk - \sum_i x_i}.$$
Thus, by the factorization theorem, $T = \sum_{i=1}^n X_i$ is sufficient. In fact, it is minimal sufficient. Using the Rao-Blackwell theorem, we can improve $\hat\theta$ by considering its conditional expectation given $T$. Let $\varphi = E[\hat\theta \mid T]$ denote this estimator. Then, for any nonnegative integer $t$,
$$
\begin{aligned}
\varphi(t) &= E\Big[\sum_{i=1}^n I(X_i = 1)/n \,\Big|\, \sum_{j=1}^n X_j = t\Big] \\
&= \sum_{i=1}^n P\Big\{X_i = 1 \,\Big|\, \sum_{j=1}^n X_j = t\Big\}\Big/n \\
&= P\Big\{X_1 = 1 \,\Big|\, \sum_{j=1}^n X_j = t\Big\} \\
&= \frac{P\{X_1 = 1,\ \sum_{j=1}^n X_j = t\}}{P\{\sum_{j=1}^n X_j = t\}} \\
&= \frac{P\{X_1 = 1,\ \sum_{j=2}^n X_j = t-1\}}{P\{\sum_{j=1}^n X_j = t\}} \\
&= \frac{P\{X_1 = 1\}\, P\{\sum_{j=2}^n X_j = t-1\}}{P\{\sum_{j=1}^n X_j = t\}} \\
&= \frac{kp(1-p)^{k-1} \cdot \frac{(k(n-1))!}{(t-1)!\,(k(n-1)-(t-1))!}\, p^{t-1}(1-p)^{k(n-1)-(t-1)}}{\frac{(kn)!}{t!\,(kn-t)!}\, p^{t} (1-p)^{kn-t}} \\
&= \frac{k \cdot \frac{(k(n-1))!}{(t-1)!\,(k(n-1)-(t-1))!}}{\frac{(kn)!}{t!\,(kn-t)!}} \\
&= \frac{k\,(k(n-1))!\,(kn-t)!\, t}{(kn)!\,(kn - k + 1 - t)!},
\end{aligned}
$$
where we used the facts that $X_1$ is independent of $(X_2, \dots, X_n)$, $\sum_{i=1}^n X_i \sim \text{Binomial}(kn, p)$, and $\sum_{i=2}^n X_i \sim$
$\text{Binomial}(k(n-1), p)$. So our new estimator is
$$\hat\theta_2 = \varphi(X_1, \dots, X_n) = \frac{k\,(k(n-1))!\,\big(kn - \sum_{i=1}^n X_i\big)!\,\big(\sum_{i=1}^n X_i\big)}{(kn)!\,\big(kn - k + 1 - \sum_{i=1}^n X_i\big)!}.$$
By the theorem above, it is unbiased and at least as efficient as $\hat\theta$. The procedure we just applied is sometimes informally referred to as Rao-Blackwellization.

2 Fisher information

Let $f(x|\theta)$ with $\theta \in \Theta$ be some parametric family. For given $\theta \in \Theta$, let $\text{Supp}_\theta = \{x : f(x|\theta) > 0\}$; $\text{Supp}_\theta$ is usually called the support of the distribution $f(x|\theta)$. Assume that $\text{Supp}_\theta$ does not depend on $\theta$. As before, $l(x, \theta) = \log f(x|\theta)$ is called the log-likelihood function. Assume that $l(x, \theta)$ is twice continuously differentiable in $\theta$ for all $x \in \text{Supp}_\theta$. Let $X$ be a random variable with distribution $f(x|\theta)$. Then

Definition 2. $I(\theta) = E_\theta[(\partial l(X, \theta)/\partial \theta)^2]$ is called the Fisher information.

Fisher information plays an important role in maximum likelihood estimation. The theorem below gives two information equalities:

Theorem 3. In the setting above, (1) $E_\theta[\partial l(X, \theta)/\partial \theta] = 0$ and (2) $I(\theta) = -E_\theta[\partial^2 l(X, \theta)/\partial \theta^2]$.

Proof. Since $l(x, \theta)$ is twice differentiable in $\theta$, $f(x|\theta)$ is twice differentiable in $\theta$ as well. Differentiating the identity $\int f(x|\theta)\,dx = 1$ with respect to $\theta$ yields
$$\int \frac{\partial f(x|\theta)}{\partial \theta}\,dx = 0 \quad \text{for all } \theta \in \Theta.$$
A second differentiation yields
$$\int \frac{\partial^2 f(x|\theta)}{\partial \theta^2}\,dx = 0 \quad \text{for all } \theta \in \Theta.$$
In addition,
$$\frac{\partial l(x, \theta)}{\partial \theta} = \frac{\partial \log f(x|\theta)}{\partial \theta} = \frac{1}{f(x|\theta)} \frac{\partial f(x|\theta)}{\partial \theta}$$
and
$$\frac{\partial^2 l(x, \theta)}{\partial \theta^2} = -\frac{1}{f^2(x|\theta)} \Big(\frac{\partial f(x|\theta)}{\partial \theta}\Big)^2 + \frac{1}{f(x|\theta)} \frac{\partial^2 f(x|\theta)}{\partial \theta^2}.$$
The former equality yields
$$E_\theta\Big[\frac{\partial l(X, \theta)}{\partial \theta}\Big] = E_\theta\Big[\frac{1}{f(X|\theta)} \frac{\partial f(X|\theta)}{\partial \theta}\Big] = \int \frac{1}{f(x|\theta)} \frac{\partial f(x|\theta)}{\partial \theta}\, f(x|\theta)\,dx = \int \frac{\partial f(x|\theta)}{\partial \theta}\,dx = 0,$$
which is our first result. The latter equality, together with the second differentiation identity above, yields
$$E_\theta\Big[\frac{\partial^2 l(X, \theta)}{\partial \theta^2}\Big] = -E_\theta\Big[\frac{1}{f^2(X|\theta)}\Big(\frac{\partial f(X|\theta)}{\partial \theta}\Big)^2\Big] + \int \frac{\partial^2 f(x|\theta)}{\partial \theta^2}\,dx = -\int \frac{1}{f(x|\theta)}\Big(\frac{\partial f(x|\theta)}{\partial \theta}\Big)^2 dx.$$
So
$$I(\theta) = E_\theta\Big[\Big(\frac{\partial l(X, \theta)}{\partial \theta}\Big)^2\Big] = \int \Big(\frac{1}{f(x|\theta)}\frac{\partial f(x|\theta)}{\partial \theta}\Big)^2 f(x|\theta)\,dx = \int \frac{1}{f(x|\theta)}\Big(\frac{\partial f(x|\theta)}{\partial \theta}\Big)^2 dx = -E_\theta\Big[\frac{\partial^2 l(X, \theta)}{\partial \theta^2}\Big],$$
which is our second result.

Example. Let us calculate the Fisher information for an $N(\mu, \sigma^2)$ distribution where $\sigma^2$ is known, so that our parameter is $\theta = \mu$. The density of a normal distribution is $f(x|\mu) = \exp(-(x-\mu)^2/(2\sigma^2))/\sqrt{2\pi\sigma^2}$. The log-likelihood is $l(x, \mu) = -\log(2\pi\sigma^2)/2 - (x-\mu)^2/(2\sigma^2)$. So $\partial l(x, \mu)/\partial \mu = (x-\mu)/\sigma^2$ and $\partial^2 l(x, \mu)/\partial \mu^2 = -1/\sigma^2$. So $-E_\mu[\partial^2 l(X, \mu)/\partial \mu^2] = 1/\sigma^2$. At the same time,
$$I(\theta) = E_\mu[(\partial l(X, \mu)/\partial \mu)^2] = E_\mu[(X - \mu)^2/\sigma^4] = 1/\sigma^2.$$
So, as expected in view of the theorem above, $I(\theta) = -E_\mu[\partial^2 l(X, \mu)/\partial \mu^2]$ in this example.

Example. Let us calculate the Fisher information for a $\text{Bernoulli}(\theta)$ distribution. Note that a Bernoulli distribution is discrete, so we use the probability mass function (pmf) instead of the pdf. The pmf of $\text{Bernoulli}(\theta)$ is $f(x|\theta) = \theta^x (1-\theta)^{1-x}$ for $x \in \{0, 1\}$. The log-likelihood is $l(x, \theta) = x \log \theta + (1-x)\log(1-\theta)$. So $\partial l(x, \theta)/\partial \theta = x/\theta - (1-x)/(1-\theta)$ and $\partial^2 l(x, \theta)/\partial \theta^2 = -x/\theta^2 - (1-x)/(1-\theta)^2$. So
$$
\begin{aligned}
E_\theta[(\partial l(X, \theta)/\partial \theta)^2] &= E_\theta[(X/\theta - (1-X)/(1-\theta))^2] \\
&= E_\theta[X^2/\theta^2] - 2E_\theta[X(1-X)/(\theta(1-\theta))] + E_\theta[(1-X)^2/(1-\theta)^2] \\
&= E_\theta[X/\theta^2] + E_\theta[(1-X)/(1-\theta)^2] \\
&= 1/(\theta(1-\theta))
\end{aligned}
$$
since $x = x^2$, $x(1-x) = 0$, and $(1-x) = (1-x)^2$ for $x \in \{0, 1\}$. At the same time,
$$-E_\theta[\partial^2 l(X, \theta)/\partial \theta^2] = E_\theta[X/\theta^2 + (1-X)/(1-\theta)^2] = \theta/\theta^2 + (1-\theta)/(1-\theta)^2 = 1/\theta + 1/(1-\theta) = 1/(\theta(1-\theta)).$$
So $I(\theta) = -E_\theta[\partial^2 l(X, \theta)/\partial \theta^2]$, as it should be.

2.1 Information for a random sample

Let us now consider the Fisher information for a random sample. Let $X = (X_1, \dots, X_n)$ be a random sample from distribution $f(x|\theta)$. Then the joint pdf is $f_n(x) = \prod_{i=1}^n f(x_i|\theta)$ where $x = (x_1, \dots, x_n)$, and the joint log-likelihood is $l_n(x, \theta) = \sum_{i=1}^n l(x_i, \theta)$. So the Fisher information for the sample $X$ is
$$
\begin{aligned}
I_n(\theta) &= E_\theta[(\partial l_n(X, \theta)/\partial \theta)^2] \\
&= \sum_{1 \le i, j \le n} E_\theta[(\partial l(X_i, \theta)/\partial \theta)(\partial l(X_j, \theta)/\partial \theta)] \\
&= \sum_{i=1}^n E_\theta[(\partial l(X_i, \theta)/\partial \theta)^2] + 2 \sum_{1 \le i < j \le n} E_\theta[(\partial l(X_i, \theta)/\partial \theta)(\partial l(X_j, \theta)/\partial \theta)] \\
&= \sum_{i=1}^n E_\theta[(\partial l(X_i, \theta)/\partial \theta)^2] \\
&= n I(\theta),
\end{aligned}
$$
where we used the fact that for any $i < j$, $E_\theta[(\partial l(X_i, \theta)/\partial \theta)(\partial l(X_j, \theta)/\partial \theta)] = 0$ by independence and the first information equality. Here $I(\theta)$ denotes the Fisher information for the distribution $f(x|\theta)$.

3 Rao-Cramer bound

An important question in the theory of statistical estimation is whether there is a nontrivial bound such that no estimator can be more efficient than this bound. The theorem below is a result of this sort:

Theorem 4 (Rao-Cramer bound). Let $X = (X_1, \dots, X_n)$ be a random sample from distribution $f(x|\theta)$ with sample information $I_n(\theta)$. Let $W(X)$ be an estimator of $\theta$ such that (1) $dE_\theta[W(X)]/d\theta = \int W(x)\, \partial f_n(x|\theta)/\partial \theta\, dx$, where $x = (x_1, \dots, x_n)$, and (2) $V(W) < \infty$. Then
$$V(W) \ge \frac{(dE_\theta[W(X)]/d\theta)^2}{I_n(\theta)}.$$
In particular, if $W$ is unbiased for $\theta$, then $V(W) \ge 1/I_n(\theta)$.
Proof. The first information equality gives $E_\theta[\partial l_n(X, \theta)/\partial \theta] = 0$. So
$$
\begin{aligned}
\text{cov}(W(X), \partial l_n(X, \theta)/\partial \theta) &= E[W(X)\, \partial l_n(X, \theta)/\partial \theta] \\
&= \int W(x)\, \frac{\partial l_n(x, \theta)}{\partial \theta}\, f_n(x|\theta)\,dx \\
&= \int W(x)\, \frac{\partial f_n(x|\theta)}{\partial \theta}\, \frac{1}{f_n(x|\theta)}\, f_n(x|\theta)\,dx \\
&= \int W(x)\, \frac{\partial f_n(x|\theta)}{\partial \theta}\,dx \\
&= \frac{dE_\theta[W(X)]}{d\theta}.
\end{aligned}
$$
By the Cauchy-Schwarz inequality,
$$\big(\text{cov}(W(X), \partial l_n(X, \theta)/\partial \theta)\big)^2 \le V(W(X))\, V(\partial l_n(X, \theta)/\partial \theta) = V(W(X))\, I_n(\theta).$$
Thus, $V(W(X)) \ge (dE_\theta[W(X)]/d\theta)^2 / I_n(\theta)$. If $W$ is unbiased for $\theta$, then $E_\theta[W(X)] = \theta$, so $dE_\theta[W(X)]/d\theta = 1$ and $V(W(X)) \ge 1/I_n(\theta)$.

Example. Let us calculate the Rao-Cramer bound for a random sample $X_1, \dots, X_n$ from a $\text{Bernoulli}(\theta)$ distribution. We have already seen that $I(\theta) = 1/(\theta(1-\theta))$ in this case, so the Fisher information for the sample is $I_n(\theta) = n/(\theta(1-\theta))$. Thus, any unbiased estimator of $\theta$, under some regularity conditions, has variance no smaller than $\theta(1-\theta)/n$. On the other hand, let $\hat\theta = \bar X = \sum_{i=1}^n X_i/n$ be an estimator of $\theta$. Then $E_\theta[\hat\theta] = \theta$, i.e. $\hat\theta$ is unbiased, and $V(\hat\theta) = \theta(1-\theta)/n$, which coincides with the Rao-Cramer bound. Thus, $\bar X$ is the uniformly minimum variance unbiased (UMVU) estimator of $\theta$. The word "uniformly" in this situation means that $\bar X$ has the smallest variance among unbiased estimators for all $\theta \in \Theta$.

Example. Let us now consider a counterexample to the Rao-Cramer theorem. Let $X_1, \dots, X_n$ be a random sample from $U[0, \theta]$. Then $f(x_i|\theta) = 1/\theta$ if $x_i \in [0, \theta]$ and $0$ otherwise. So $l(x_i, \theta) = -\log \theta$ if $x_i \in [0, \theta]$. Then $\partial l/\partial \theta = -1/\theta$ and $\partial^2 l/\partial \theta^2 = 1/\theta^2$. So $I(\theta) = 1/\theta^2$, while $-E_\theta[\partial^2 l(X_i, \theta)/\partial \theta^2] = -1/\theta^2 \ne I(\theta)$. Thus, the second information equality does not hold in this example; the reason is that the support of the distribution depends on $\theta$. Moreover, consider the estimator $\hat\theta = ((n+1)/n) X_{(n)}$ of $\theta$. Then $E_\theta[\hat\theta] = \theta$ and $V(\hat\theta) = ((n+1)^2/n^2)\, V(X_{(n)}) = \theta^2/(n(n+2))$, as we saw when we considered order statistics. So $\hat\theta$ is unbiased, but its variance is smaller than $1/I_n(\theta) = \theta^2/n$. Thus, the Rao-Cramer theorem does not apply in this example either. Again, the reason is that the Rao-Cramer theorem assumes that the support does not depend on the parameter.
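The three concrete claims above (the two information equalities for Bernoulli, the bound $\theta(1-\theta)/n$ being attained by $\bar X$, and the uniform estimator beating $\theta^2/n$) can all be checked numerically. Below is a minimal sketch in Python with NumPy; the parameter values, sample size, number of replications, and seed are arbitrary illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.4, 25, 200_000  # arbitrary illustrative values

# Bernoulli(theta): exact check of both information equalities by summing
# over the support {0, 1}.
pmf = {x: theta**x * (1 - theta) ** (1 - x) for x in (0, 1)}
score = {x: x / theta - (1 - x) / (1 - theta) for x in (0, 1)}          # dl/dtheta
hess = {x: -x / theta**2 - (1 - x) / (1 - theta) ** 2 for x in (0, 1)}  # d2l/dtheta2

mean_score = sum(pmf[x] * score[x] for x in (0, 1))   # first equality: equals 0
info = sum(pmf[x] * score[x] ** 2 for x in (0, 1))    # I(theta) = 1/(theta(1-theta))
info_hess = -sum(pmf[x] * hess[x] for x in (0, 1))    # second equality: same value

# Rao-Cramer bound attained by the sample mean: V(Xbar) = theta(1-theta)/n.
xbar = (rng.random((reps, n)) < theta).mean(axis=1)
bound = theta * (1 - theta) / n

# U[0, theta0] counterexample: ((n+1)/n) X_(n) is unbiased with variance
# theta0^2/(n(n+2)), strictly below theta0^2/n (the would-be bound).
theta0 = 2.0
theta_hat = (n + 1) / n * rng.uniform(0, theta0, size=(reps, n)).max(axis=1)

print(mean_score, info, info_hess)
print(xbar.var(), bound)
print(theta_hat.mean(), theta_hat.var(), theta0**2 / (n * (n + 2)), theta0**2 / n)
```

The exact sums reproduce $I(\theta) = 1/(\theta(1-\theta))$ both ways, while the two Monte Carlo variances should match $\theta(1-\theta)/n$ and $\theta_0^2/(n(n+2))$ up to simulation error, the latter falling well below $\theta_0^2/n$.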
MIT OpenCourseWare
http://ocw.mit.edu

14.381 Statistical Methods in Economics
Fall 2013

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.