3. Maximum likelihood estimators and efficiency

3.1. Maximum likelihood estimators. Let X_1, ..., X_n be a random sample, drawn from a distribution P_θ that depends on an unknown parameter θ. We are looking for a general method to produce a statistic T = T(X_1, ..., X_n) that (we hope) will be a reasonable estimator for θ. One possible answer is the maximum likelihood method. Suppose I observed the values x_1, ..., x_n. Before the experiment, the probability that exactly these values would occur was P_θ(X_1 = x_1, ..., X_n = x_n), and this will depend on θ. Since I did observe these values, maybe it's a good idea to look for a θ that maximizes this probability (which, to impress the uninitiated, we now call the likelihood). Please do not confuse this maximization with the futile attempt to find the θ that is now most likely, given what I just observed. I really maximize over the condition: given that θ has some concrete value, we can work out the probability that what I observed occurred, and this is what I maximize.

Exercise 3.1. Please elaborate. Can you also make it plausible that there are (artificial) examples where the MLE is in fact quite likely to produce an estimate that is hopelessly off target?

Definition 3.1. We call a statistic θ̂ = θ̂(X_1, ..., X_n) a maximum likelihood estimator (MLE) for θ if P_θ(X_1 = x_1, ..., X_n = x_n) is maximal at θ = θ̂(x_1, ..., x_n).

There is, in general, no guarantee that this maximum exists or (if it does) is unique, but we'll ignore this potential problem and just hope for the best. Also, observe that if we take the definition apart very carefully, we discover a certain amount of juggling around with arguments of functions: the MLE θ̂ is a statistic, that is, a random variable that is a function of the random sample, but the maximizing value of the parameter is obtained by replacing the X_j by their observed values x_j.
Alternatively, we could say that we consider the likelihood function L(x_1, ..., x_n) = P(X_1 = x_1, ..., X_n = x_n), then plug the random variables X_j into their own likelihood function and finally maximize, which then produces a maximizer that is a random variable itself (and in fact a statistic). None of this matters a whole lot right now; we'll encounter this curious procedure (plug random variables into functions obtained from their own distribution) again in the next section.

Example 3.1. Let's return to the coin flip example: P(X_1 = 1) = θ, P(X_1 = 0) = 1 − θ, and here it's convenient to combine this into one
formula by writing P(X_1 = x) = θ^x (1 − θ)^(1−x), for x = 0, 1. Thus

P(X_1 = x_1, ..., X_n = x_n) = θ^(Σ x_j) (1 − θ)^(n − Σ x_j).

We are looking for the θ that maximizes this expression. Take the θ derivative and set this equal to zero. Also, let's abbreviate S = Σ x_j. Then

S θ^(S−1) (1 − θ)^(n−S) − (n − S) θ^S (1 − θ)^(n−S−1) = 0,

or S(1 − θ) − (n − S)θ = 0, and this has the solution θ = S/n. (We'd now have to check that this is indeed a maximum, but we skip this part.) So the MLE for this distribution is given by θ̂ = T = X̄. It is reassuring that this obvious choice now receives some theoretical justification. We know that this estimator is unbiased. In general, however, MLEs can be biased. To see this, let's return to another example that was discussed earlier.

Example 3.2. Consider again the urn with an unknown number N = θ of balls in it, labeled 1, ..., N. We form a random sample X_1, ..., X_n by drawing n times, with replacement, according to the distribution P(X_1 = x) = (1/N) χ_{1,...,N}(x). For fixed x_1, ..., x_n ≥ 1, the probability of observing this outcome is then given by

(3.1)  P(X_1 = x_1, ..., X_n = x_n) = { N^(−n)  if max x_j ≤ N;  0  if max x_j > N }.

We want to find the MLE, so we are trying to maximize this over N, for fixed x_1, ..., x_n. Clearly, entering the second line of (3.1) is no good, so we must take N ≥ max x_j. For any such N, the quantity we're trying to maximize equals N^(−n), so we get the largest possible value by taking the smallest N that is still allowed. In other words, the MLE is given by N̂ = max X_j. We know that this estimator is not unbiased. Again, it is nice to see some theoretical justification emerging for an estimator that looked reasonable.

Example 3.3. Recall that the Poisson distribution with parameter θ > 0 is given by

P(X = x) = (θ^x / x!) e^(−θ),  (x = 0, 1, 2, ...).

Let's try to find the MLE for θ. A random sample drawn from this distribution has the likelihood function

P(X_1 = x_1, ..., X_n = x_n) = (θ^(x_1+...+x_n) / (x_1! · · · x_n!)) e^(−nθ).
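The closed-form answer θ̂ = S/n from the coin flip example can also be checked numerically. The following sketch (sample values and variable names are my own, chosen for illustration) grid-searches the likelihood θ^S (1 − θ)^(n−S) and recovers the sample mean:

```python
# Illustrative sketch: maximize the Bernoulli likelihood over a grid
# and compare with the closed-form MLE S/n.
def likelihood(theta, xs):
    s = sum(xs)          # S = number of successes
    n = len(xs)
    return theta ** s * (1 - theta) ** (n - s)

xs = [1, 0, 1, 1, 0, 1, 0, 1]              # observed sample: S = 5, n = 8
grid = [k / 1000 for k in range(1, 1000)]  # candidate values of theta
theta_hat = max(grid, key=lambda t: likelihood(t, xs))
print(theta_hat)  # 0.625, which is exactly S/n = 5/8
```

Since the likelihood is unimodal with its maximum at S/n = 0.625, and 0.625 happens to lie on the grid, the grid search lands on the analytic answer exactly.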
Christian Remling

We want to maximize this with respect to θ, so we can ignore the denominator, which does not depend on θ. Let's again write S = Σ x_j; we then want to maximize θ^S e^(−nθ). This leads to

S θ^(S−1) e^(−nθ) − n θ^S e^(−nθ) = 0,

or θ = S/n, that is, θ̂ = X̄.

Exercise 3.2. Show that EX = θ if X is Poisson distributed with parameter θ. Conclude that the MLE is unbiased.

For random samples drawn from continuous distributions, the above recipe cannot literally be applied because P(X_1 = x_1, ..., X_n = x_n) = 0 always in this situation. However, we can modify it as follows: call a statistic θ̂ an MLE for θ if θ̂(x_1, ..., x_n) maximizes the (joint) density

f_{X_1,...,X_n}(x_1, ..., x_n; θ) = f(x_1; θ) f(x_2; θ) · · · f(x_n; θ),

for all possible values x_j of the random sample. In analogy to our terminology in the discrete case, we will again refer to this product of the densities as the likelihood function.

Example 3.4. Consider the exponential distribution with parameter θ; this is the distribution with density

(3.2)  f(x) = (1/θ) e^(−x/θ)  (x ≥ 0),

and f(x) = 0 for x < 0. Let's first find EX for an exponentially distributed random variable X:

EX = (1/θ) ∫_0^∞ x e^(−x/θ) dx = −x e^(−x/θ) |_0^∞ + ∫_0^∞ e^(−x/θ) dx = θ,

by an integration by parts in the first step. (So it is natural to use θ as the parameter, rather than 1/θ.) To find the MLE for θ, we have to maximize θ^(−n) e^(−S/θ) (writing, as usual, S = Σ x_j). This gives

−n θ^(−n−1) e^(−S/θ) + (S/θ²) θ^(−n) e^(−S/θ) = 0,

or θ = S/n, that is, as a statistic, θ̂ = X̄ (again...). This MLE is unbiased. What would have happened if we had used η = 1/θ in (3.2) instead, to avoid the reciprocals? So f(x) = η e^(−ηx) for x ≥ 0, and I now want to find the MLE η̂ for η. In other words, I want to maximize η^n e^(−ηS), and proceeding as above, we find that this happens at η = n/S, or η̂ = 1/X̄. Now recall that η = 1/θ, and the MLE for θ was θ̂ = X̄. This is no coincidence; essentially, we solved the same maximization problem
twice, with slightly changed notation the second time. In general, we have the following (almost tautological) statement:

Theorem 3.2. Consider parameters η, θ that parametrize the same distribution. Suppose that they are related by η = g(θ), for a bijective g. Then, if θ̂ is an MLE for θ, then η̂ = g(θ̂) is an MLE for η.

Exercise 3.3. Give a somewhat more explicit version of the argument suggested above.

Notice, however, that the MLE is no longer unbiased after the transformation. This could be checked rather quickly by an indirect argument, but it is also possible to work things out explicitly. To get this started, let's first look at the distribution of the sum S_2 = X_1 + X_2 of two independent exponentially distributed random variables X_1, X_2. We know that the density of S_2 is the convolution of the density from (3.2) with itself:

f_2(x) = (1/θ²) ∫_0^x e^(−t/θ) e^(−(x−t)/θ) dt = (1/θ²) x e^(−x/θ).

Next, if we add one more independent random variable with this distribution, that is, if we consider S_3 = S_2 + X_3, then the density of S_3 can be obtained as the convolution of f_2 with the density f from (3.2), so

f_3(x) = (1/θ³) ∫_0^x t e^(−t/θ) e^(−(x−t)/θ) dt = (1/(2θ³)) x² e^(−x/θ).

Continuing in this style, we find that

f_n(x) = (1/((n−1)! θ^n)) x^(n−1) e^(−x/θ).

Exercise 3.4. Denote the density of S = S_n by f_n. Show that then S/n has density f(x) = n f_n(nx).

Since X̄ = S/n, the Exercise in particular says that X̄ has density

(3.3)  f(x) = (n/((n−1)! θ^n)) (nx)^(n−1) e^(−nx/θ)  (x ≥ 0).

This is already quite interesting, but let's keep going. We were originally interested in Y = 1/X̄, the MLE for η = 1/θ. We apply the usual technique to transform the densities:

P(Y ≤ y) = P(X̄ ≥ 1/y) = ∫_{1/y}^∞ f(x) dx,
and since g = f_Y can be obtained as the y derivative of this, we see that

(3.4)  g(y) = (1/y²) f(1/y) = (n/((n−1)! θ^n)) y^(−2) (n/y)^(n−1) e^(−n/(θy))  (y > 0).

This gives

EY = ∫_0^∞ y g(y) dy = (n/((n−1)! θ^n)) ∫_0^∞ y^(−1) (n/y)^(n−1) e^(−n/(θy)) dy
   = (n/((n−1)! θ)) ∫_0^∞ t^(n−2) e^(−t) dt.

We have used the substitution t = n/(θy) to pass to the second line. The integral can be evaluated by repeated integration by parts, or, somewhat more elegantly, you recognize it as Γ(n−1) = (n−2)!. So, putting things together, it follows that

E(1/X̄) = n/((n−1)θ) = (n/(n−1)) η.

In particular, Y = 1/X̄ is not an unbiased estimator for η; we are off by the factor n/(n−1) > 1 (which, however, is very close to 1 for large n).

Exercise 3.5. Check one more time that X̄ is an unbiased estimator for θ, this time by making use of the density f from (3.3) to compute EX̄ (in an admittedly rather clumsy way). You can again use the fact that Γ(k) = (k−1)! for k = 1, 2, ....

Example 3.5. Consider the uniform distribution on [0, θ]:

f(x) = { 1/θ  if 0 ≤ x ≤ θ;  0  otherwise }.

We would like to find the MLE for θ. We then need to maximize with respect to θ (for given x_1, ..., x_n) the likelihood function

f(x_1) · · · f(x_n) = { θ^(−n)  if max x_j ≤ θ;  0  if max x_j > θ }.

This first of all forces us to take θ ≥ max x_j, to enter the first line, and then θ as small as (still) possible, to maximize θ^(−n). Thus θ̂ = max(X_1, ..., X_n). This estimator is not unbiased.

Exercise 3.6. Why?

This whole example is an exact (continuous) analog of its discrete version, Example 3.2.
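Returning to the exponential example, the bias factor n/(n−1) for Y = 1/X̄ shows up clearly in a simulation. The following sketch uses arbitrary illustrative values η = 2 and n = 5, for which the formula above predicts E(1/X̄) = (5/4) · 2 = 2.5 rather than 2:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n, eta, trials = 5, 2.0, 200_000
total = 0.0
for _ in range(trials):
    xs = [random.expovariate(eta) for _ in range(n)]  # rate eta = 1/theta
    total += n / sum(xs)                              # Y = 1/xbar
print(total / trials)  # close to n/(n-1) * eta = 2.5, not eta = 2.0
```

The Monte Carlo average sits near 2.5, confirming that 1/X̄ systematically overestimates η for small n.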
Example 3.6. Finally, let's take a look at the normal distribution. Let's first find the MLE for θ = σ², for a normal distribution with known µ. We then need to maximize

θ^(−n/2) e^(−A/θ),  A = (1/2) Σ (x_j − µ)².

This gives −(n/2)/θ + A/θ² = 0, or θ = 2A/n, that is,

(3.5)  θ̂ = (1/n) Σ (X_j − µ)².

Exercise 3.7. (a) Show that n θ̂/σ² ∼ χ²(n). (b) Conclude that θ̂ is unbiased.

By Theorem 3.2, the MLE for σ is then given by

σ̂ = √( (1/n) Σ (X_j − µ)² ).

This estimator is not unbiased. What if µ and σ are both unknown? There is an obvious way to adapt our procedure: we can maximize over both parameters simultaneously to obtain two statistics that can serve as MLE style estimators. So we now want to maximize

θ^(−n/2) exp( −(1/(2θ)) Σ (x_j − µ)² )

over both µ and θ. This leads to the two conditions

−n/(2θ) + (1/(2θ²)) Σ (x_j − µ)² = 0,   Σ (x_j − µ) = 0.

The second equation says that µ = (1/n) Σ x_j =: x̄, and then, by repeating the calculation from above, we see from this and the first equation that θ = (1/n) Σ (x_j − x̄)². In other words,

µ̂ = X̄,   θ̂ = (1/n) Σ (X_j − X̄)² = ((n−1)/n) S².

So µ̂ is unbiased, but θ̂ is not since ES² = σ² = θ, so E θ̂ = ((n−1)/n) θ.

Exercise 3.8. Find the MLE for θ for the following densities: (a) f(x) = θ x^(θ−1) for 0 < x < 1, and f(x) = 0 otherwise, and θ > 0; (b) f(x) = e^(θ−x) for x ≥ θ and f(x) = 0 otherwise.
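The factor (n−1)/n in E θ̂ for the normal MLE with unknown mean can likewise be checked by simulation. A sketch with arbitrary illustrative values σ² = 1 and n = 4, so that the theory predicts E θ̂ = 3/4:

```python
import random

random.seed(1)  # reproducible run
n, trials = 4, 100_000
acc = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]  # N(mu = 0, sigma^2 = 1)
    xbar = sum(xs) / n
    acc += sum((x - xbar) ** 2 for x in xs) / n      # the MLE theta_hat
print(acc / trials)  # close to (n-1)/n = 0.75, not 1
```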
Exercise 3.9. Here's an example where the maximization does not produce a unique value. Consider the density f(x) = (1/2) e^(−|x−θ|). Assume for convenience that n = 2k is even and consider data x_1 < x_2 < ... < x_n. Then show that any θ in the interval x_k < θ < x_{k+1} maximizes the likelihood function.

Exercise 3.10. (a) Show that f(x, θ) = (1/θ²) x e^(−x/θ) (x ≥ 0) (and f(x) = 0 for x < 0) is a density for θ > 0. (b) Find the MLE θ̂ for θ. (c) Show that θ̂ is unbiased.

3.2. Cramér-Rao bounds. If an estimator is unbiased, it delivers the correct value at least on average. It would then be nice if this estimator showed only little variation about this correct value (of course, if T is biased, it is less clear if little variation about the incorrect value is a good thing). Let's take another look at our favorite example from this point of view. So P(X_1 = 1) = θ, P(X_1 = 0) = 1 − θ, and we are going to use the MLE T = θ̂ = X̄. Since the X_j are independent, the variances add up and thus

Var(T) = (1/n) Var(X_1) = θ(1 − θ)/n,

and σ_T = √(θ(1 − θ)/n) ≤ 1/(2√n). This doesn't look too bad. In particular, for large random samples, it gets small; it decays at the rate σ_T ∼ 1/√n. Could we perhaps do better than this with a different unbiased estimator? It turns out that this is not the case. The statistic T = X̄ is optimal in this example in the sense that it has the smallest possible variance among all unbiased estimators. We now derive such a result in a general setting.

Let f(x, θ) be a density that depends on the parameter θ. We will assume throughout this section that f is sufficiently well behaved so that the following manipulations are justified, without actually making explicit a precise version of such assumptions. We will certainly need f to be twice differentiable with respect to θ since we will take this second derivative, but this on its own is not sufficient to justify some of the other steps (such as differentiating under the integral sign). We have that ∫ f dx = 1, so by taking the θ derivative (and interchanging differentiation and integral), we obtain that ∫ ∂f/∂θ dx = 0.
MLE ad efficiecy 29 This we may rewrite as (3.6) f(x, θ) l f(x, θ) dx =. θ There are potetial problems here with regios where f = ; to avoid these, I will simply iterpret (3.6) as a itegral over oly those parts of the real lie where f >. (To make sure that the argumet leadig to (3.6) is still justified i this settig, we should really make the additioal assumptio that {x : f(x, θ) > } does ot deped o θ, but we ll igore purely techical poits of this kid.) A alterative readig of (3.6) is E( / θ) l f(x, θ) =. Here (ad below) I use the geeral fact that Eg(X) = g(x)f(x) dx for ay fuctio g. Also ote the somewhat curious costructio here: we plug the radom variable X ito its ow desity (ad the take the logarithm) to produce the ew radom variable l f(x) (which also depeds o θ). If we take oe more derivative, the (3.6) becomes (3.7) ( ) 2 f(x, θ) 2 l f(x, θ) dx + f(x, θ) l f(x, θ) dx =. θ2 θ Defiitio 3.3. The Fisher iformatio is defied as I(θ) = E ( ) 2 l f(x, θ). θ This assumes that X is a cotiuous radom variable; i the discrete case, we replace f by P (X = x, θ) (ad agai plug X ito its ow distributio). From (3.7), we obtai the alterative formula (3.8) I(θ) = E 2 l f(x, θ); θ2 moreover, it is also true that (3.9) I(θ) = Var(( / θ) l f(x, θ)). Example 3.7. Let s retur oe more time to the coi flip example: P (X = x) = θ x (1 θ) 1 x (x =, 1), so l P = x l θ + (1 x) l(1 θ) ad (3.1) θ l P = x θ 1 x 1 θ.
To find the Fisher information, we plug X into this function and take the square. This produces

X²/θ² + (1 − X)²/(1 − θ)² − 2X(1 − X)/(θ(1 − θ))
  = X² ( 1/θ² + 1/(1 − θ)² + 2/(θ(1 − θ)) ) − 2X ( 1/(1 − θ)² + 1/(θ(1 − θ)) ) + 1/(1 − θ)².

Now recall that EX = EX² = θ, and take the expectation. We find that

I(θ) = ( θ(1 − θ)² + θ³ + 2θ²(1 − θ) − 2θ³ − 2θ²(1 − θ) + θ² ) / ( θ²(1 − θ)² ) = 1/(θ(1 − θ)).

Alternatively, we could have obtained the same result more quickly from (3.8). Take one more derivative in (3.10), plug X into the resulting function and take the expectation:

I(θ) = −E ( −X/θ² − (1 − X)/(1 − θ)² ) = 1/θ + 1/(1 − θ) = 1/(θ(1 − θ)).

Example 3.8. Consider the N(θ, 1) distribution. Its density is given by f = (2π)^(−1/2) e^(−(x−θ)²/2), so ln f = −(x − θ)²/2 + C. Two differentiations produce (∂²/∂θ²) ln f = −1, so I = 1.

When dealing with a random sample X_1, ..., X_n, Definition 3.3 can be adapted by replacing f by what we called the likelihood function in the previous section. More precisely, we could replace (3.9) with

Var( (∂/∂θ) ln L(X_1, ..., X_n; θ) ),

where L(x_1, ..., x_n) = f(x_1) · · · f(x_n) (continuous case) or L(x_1, ..., x_n) = P(X_1 = x_1, ..., X_n = x_n) (discrete case). Then, however, we can use the product structure of L and independence to evaluate (in the continuous case, say)

Var( Σ (∂/∂θ) ln f(X_j, θ) ) = Σ Var( (∂/∂θ) ln f(X_j, θ) ) = n I(θ),

where now I is the Fisher information of an individual random variable X. An analogous calculation works in the discrete case.
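For the coin flip, the formula I(θ) = 1/(θ(1 − θ)) can be confirmed with an exact finite sum over the two outcomes, with no simulation needed (a small sketch; the value θ = 0.3 is an arbitrary choice):

```python
theta = 0.3  # arbitrary illustrative parameter value

def score(x, t):
    # d/dtheta of ln P(X = x) = x ln t + (1 - x) ln(1 - t)
    return x / t - (1 - x) / (1 - t)

# Definition 3.3: I(theta) = E[score(X)^2], an exact sum over x in {0, 1}
info = (1 - theta) * score(0, theta) ** 2 + theta * score(1, theta) ** 2
print(info, 1 / (theta * (1 - theta)))  # both equal 1/0.21 = 4.7619...
```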
Theorem 3.4 (Cramér-Rao). Let T = T(X_1, ..., X_n) be a statistic and write k(θ) = ET. Then, under suitable (smoothness) assumptions,

Var(T) ≥ (k′(θ))² / (n I(θ)).

Corollary 3.5. If the statistic T in Theorem 3.4 is unbiased, then

Var(T) ≥ 1/(n I(θ)).

As an illustration, let's again look at the coin flip example with its MLE T = θ̂ = X̄. We saw earlier that Var(T) = θ(1 − θ)/n, and this equals 1/(nI) by our calculation from Example 3.7. Since T is also unbiased, this means that this estimator achieves the Cramér-Rao bound from Corollary 3.5. We give a special name to estimators that are optimal in this sense:

Definition 3.6. Let T be an unbiased estimator for θ. We call T efficient if T achieves the CR bound:

Var(T) = 1/(n I(θ)).

So we can summarize by saying that X̄ is an efficient estimator for θ. Let's now try to derive the CR bound. I'll do this for continuous random variables, with density f(x, θ). Then

k(θ) = ∫ dx_1 ∫ dx_2 ... ∫ dx_n T(x_1, ..., x_n) f(x_1, θ) · · · f(x_n, θ),

and thus (at least if we are allowed to freely interchange differentiations and integrals)

k′(θ) = ∫ dx_1 ∫ dx_2 ... ∫ dx_n T(x_1, ..., x_n) Σ_j (∂f(x_j, θ)/∂θ) Π_{i≠j} f(x_i, θ)
      = ∫ dx_1 ∫ dx_2 ... ∫ dx_n T(x_1, ..., x_n) ( Σ_j (∂/∂θ) ln f(x_j, θ) ) f(x_1, θ) · · · f(x_n, θ)
      = E(TZ),
where we have abbreviated Z = Σ (∂/∂θ) ln f(X_j, θ). We know that EZ = 0 (compare (3.6)) and Var(Z) = n I, by independence of the X_j. We will now need the following very important and fundamental tool:

Exercise 3.11. Establish the Cauchy-Schwarz inequality: For any two random variables X, Y, we have that

|EXY| ≤ (EX²)^(1/2) (EY²)^(1/2).

Suggestion: Consider the parabola f(t) = E(X + tY)² and find its minimum.

Exercise 3.12. Can you also show that we have equality in the CSI precisely if X = cY or Y = cX for some c ∈ R?

Exercise 3.13. Define the correlation coefficient of two random variables X, Y as

ρ_{X,Y} = E(X − EX)(Y − EY) / (σ_X σ_Y).

Deduce from the CSI that −1 ≤ ρ ≤ 1. Also, show that ρ = 0 if X, Y are independent. (The converse of this statement is not true, in general.)

Since EZ = 0, we can write

k′(θ) = E(TZ) = E((T − ET)Z) = E((T − ET)(Z − EZ)),

and now the CSI shows that

k′² ≤ Var(T) Var(Z) = n I(θ) Var(T),

as claimed.

Exercise 3.14. Observe that the inequality was only introduced in the very last step. Thus, by Exercise 3.12, we have equality in the CR bound precisely if T − ET and Z are multiples of one another. In particular, this must hold for the efficient statistic T = X̄ from the coin flip example. Confirm directly that indeed X̄ − θ = cZ.

Example 3.9. We saw in Example 3.4 that the MLE for the exponential distribution f(x) = e^(−x/θ)/θ (x ≥ 0) is given by T = θ̂ = X̄ and that T is unbiased. Is T also efficient? To answer this, we compute the Fisher information: ln f = −ln θ − x/θ, so −∂² ln f/∂θ² = −1/θ² + 2x/θ³, and, taking expectations, we see that I = 1/θ². On the other hand, Var(T) = (1/n) Var(X_1) and

EX_1² = (1/θ) ∫_0^∞ x² e^(−x/θ) dx = θ² ∫_0^∞ t² e^(−t) dt = 2θ²,
by two integrations by parts. This implies that Var(X_1) = EX_1² − (EX_1)² = θ², and thus Var(T) = θ²/n = 1/(nI), and T is indeed efficient.

Let's now take another look at the uniform distribution from Example 3.5. Its density equals

f(x, θ) = { 1/θ  if 0 < x < θ;  0  otherwise };

recall that the MLE is given by θ̂ = max(X_1, ..., X_n). We know that T = θ̂ is not unbiased. Let's try to be more precise here. Since P(T ≤ t) = (t/θ)^n, the statistic T has density f(t) = n t^(n−1)/θ^n (0 < t < θ). It follows that

ET = (n/θ^n) ∫_0^θ t^n dt = (n/(n+1)) θ.

Exercise 3.15. Show by a similar calculation that ET² = (n/(n+2)) θ².

In particular, if we introduce

U = ((n+1)/n) T = ((n+1)/n) max(X_1, ..., X_n),

then this new statistic is unbiased (though it is no longer the MLE for θ). By the exercise,

EU² = ((n+1)/n)² ET² = ((n+1)²/(n(n+2))) θ²,

so

(3.11)  Var(U) = ( (n+1)²/(n(n+2)) − 1 ) θ² = θ²/(n(n+2)).

This looks great! In our previous examples, the variance decayed only at the rate 1/n, and here we now have that Var(U) ∼ 1/n². Come to think of it, is this consistent with the CR bound? Doesn't Corollary 3.5 say that Var(T) ≥ 1/(nI) for any unbiased statistic T? The answer to this is that the whole theory doesn't apply here. The density f(x, θ) is not continuous (let alone differentiable) as a function of θ; it jumps at θ = x. In fact, the problems can be pinpointed more precisely: (3.6) fails, since the integrand equals −1/θ², and (3.6) was used to deduce that EZ = 0, so the whole argument breaks down. Recall that by our discussion following (3.6), the integration in (3.6) is really only extended over 0 < x < θ, so problems with the jump of f are temporarily avoided. (However, I also remarked parenthetically that I would like the set {x : f(x, θ) > 0} to be independent of θ, and this clearly fails here.)
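Both the unbiasedness of U and the unusually fast 1/n² decay in (3.11) show up clearly in a simulation. A sketch with arbitrary illustrative values θ = 1 and n = 5, for which (3.11) gives Var(U) = 1/35 ≈ 0.0286:

```python
import random

random.seed(2)  # reproducible run
n, theta, trials = 5, 1.0, 200_000
vals = []
for _ in range(trials):
    m = max(random.uniform(0.0, theta) for _ in range(n))
    vals.append((n + 1) / n * m)  # U = (n+1)/n * max X_j
mean_u = sum(vals) / trials
var_u = sum((v - mean_u) ** 2 for v in vals) / trials
print(mean_u, var_u)  # close to theta = 1 and theta^2/(n(n+2)) = 1/35
```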
Let's compare U with another unbiased estimator. Let V = 2X̄. Since EX̄ = EX_1 = θ/2, this is indeed unbiased. It is a continuous analog of the unbiased estimator that we suggested (not very seriously, though) in the urn example from Chapter 2; see pg. 1. We have that Var(X̄) = Var(X_1)/n and

EX_1² = (1/θ) ∫_0^θ t² dt = θ²/3,

so Var(X_1) = θ²(1/3 − 1/4) = θ²/12, thus

Var(V) = θ²/(3n).

This is markedly inferior to (3.11). We right away had a bad feeling about V (in Chapter 2); this now receives precise theoretical confirmation.

Exercise 3.16. However, if n = 1, then Var(V) = Var(U). Can you explain this?

Exercise 3.17. Consider the density

f(x, θ) = { 2x/θ²  if 0 ≤ x ≤ θ;  0  otherwise }.

(a) Find the MLE θ̂. (b) Show that T = ((2n+1)/(2n)) θ̂ is unbiased. (c) Find Var(T). Suggestion: Proceed as in the discussion above.

Example 3.10. Let's return to the MLE T = θ̂ = X̄ for the Poisson distribution; compare Example 3.3. We saw earlier that this is unbiased. Is T also efficient? To answer this, we first work out the Fisher information:

ln P(X = x, θ) = x ln θ − θ − ln x!,

so by taking two derivatives and then the expectation (see (3.8)), we find that I(θ) = EX/θ² = 1/θ. On the other hand,

EX_1² = Σ_{k≥0} k² (θ^k/k!) e^(−θ) = θ² Σ_{k≥0} (θ^k/k!) e^(−θ) + EX_1 = θ² + θ;

the first step follows by writing k² = k(k−1) + k. Thus Var(X_1) = θ, hence Var(T) = θ/n, and T is efficient.

Exercise 3.18. In this problem, you should frequently refer to results and calculations from Example 3.4. Consider the density f(x, θ) =
θ e^(−θx) (x ≥ 0) and f(x) = 0 for x < 0. Recall that

T = ((n−1)/n) Y,  Y = 1/X̄,

is an unbiased estimator for θ. (a) Find the Fisher information I(θ) for this density. (b) Compute Var(T); conclude that T is not efficient. (Later we will see that T nevertheless has the smallest possible variance among all unbiased estimators.) Suggestion: Use the density of Y from (3.4) to work out EY², and then ET² and Var(T). Avoid the trap of forgetting that the θ of the present exercise corresponds to 1/θ in (3.4).

Example 3.11. Let's now try to estimate the variance of an N(0, σ) distribution. We take θ = σ² as the parameter labeling this family of densities. Two unbiased estimators come to mind:

T_1 = (1/n) Σ X_j²,   T_2 = S² = (1/(n−1)) Σ (X_j − X̄)².

We know from Example 3.6 that T_1 is the MLE for θ; see (3.5). We start out by computing the Fisher information. We have that ln f = −(1/2) ln θ − X²/(2θ) + C, so

I(θ) = −1/(2θ²) + (1/θ³) EX² = 1/(2θ²).

Next, independence gives that Var(T_1) = (1/n) Var(X_1²), and this latter variance we compute as EX_1⁴ − (EX_1²)².

Exercise 3.19. Show that EX_1⁴ = 3θ². Suggestion: Use integration by parts in the resulting integral.

Since EX_1² = Var(X_1) = θ, this shows that Var(X_1²) = 2θ² and thus Var(T_1) = 2θ²/n. So T_1 is efficient. As for T_2, we recall that (n−1)S²/θ ∼ χ²(n−1) and also that this is the distribution of the sum of the squares of n−1 iid N(0, 1)-distributed random variables. In other words, (n−1)S²/θ has the same distribution as Z = Σ_{j=1}^{n−1} Y_j², with Y_j iid and Y_j ∼ N(0, 1). In particular, the variances agree, and Var(Z) = (n−1) Var(Y_1²) = 2(n−1), by the calculation we just did. Thus

Var(S²) = (θ²/(n−1)²) · 2(n−1) = 2θ²/(n−1),

and this estimator is not efficient (it comes very close though).
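The two variances 2θ²/n and 2θ²/(n−1) can be compared directly by simulation. A sketch with arbitrary illustrative values θ = 1 and n = 6, for which the theory gives Var(T_1) = 1/3 and Var(S²) = 0.4:

```python
import random

random.seed(3)  # reproducible run
n, trials = 6, 200_000
t1_vals, s2_vals = [], []
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]  # N(0, sigma^2 = 1)
    xbar = sum(xs) / n
    t1_vals.append(sum(x * x for x in xs) / n)                  # T1, uses known mu = 0
    s2_vals.append(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # S^2

def sample_var(vs):
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

print(sample_var(t1_vals), sample_var(s2_vals))  # near 2/n = 0.333 and 2/(n-1) = 0.4
```

As expected, the estimator that uses the known mean comes out with the smaller variance.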
If we had used n instead of the slightly unexpected n−1 in the denominator of the formula defining S², then the resulting estimator Y_3 = ((n−1)/n) S² has variance

(3.12)  Var(Y_3) = ((n−1)²/n²) · 2θ²/(n−1) = 2(n−1)θ²/n² = ((n−1)/n) · 1/(n I(θ)).

This, of course, does not contradict the CR bound from Corollary 3.5: this estimator is not unbiased. On the contrary, everything is in perfect order; we only need to refer to Theorem 3.4, which handles this situation. Since k(θ) = EY_3 = ((n−1)/n) θ, we have that k′² = ((n−1)/n)², and the variance from (3.12) is in fact slightly larger (by a factor of n/(n−1)) than the lower bound provided by the theorem.

Exercise 3.20. Consider a random sample drawn from an N(θ, 1) distribution. Show that (the MLE) X̄ is an efficient estimator for θ.