
Chapter 3  Point Estimation

Let (Ω, A, P_θ), P_θ ∈ P = {P_θ | θ ∈ Θ}, be a probability space, let X_1, X_2, ..., X_n : (Ω, A) → (IR^k, B^k) be random variables, let (X, B_X) denote the sample space, and let γ : Θ → IR^k be a measurable function, i.e. γ : (Θ, B_Θ) → (γ(Θ), B_γ).

3.1 Introduction

Def.: An estimator T is a measurable function T : (X, B_X) → (γ(Θ), B_γ).

Of course it is hoped that T(X) will tend to be close to the unknown estimand γ(θ), but this requirement is not part of the formal definition of an estimator. Desirable properties of an estimator are:

- unbiasedness,
- consistency (strong, weak, in r-th mean),
- sufficiency,
- asymptotic normality,
- minimal sufficiency, completeness, invariance, ...

In the sequel we are interested in unbiased estimators, and we shall learn about a further statistical criterion: efficiency.

Def.: Let γ : Θ → IR^m be measurable.

(a) A statistic T : (X, B_X) → (IR^m, B^m) is called unbiased if E_θ(T) = γ(θ) for all θ ∈ Θ.

(b) Each function γ on Θ for which there exists an unbiased estimator is called an estimable function.

(c) For a biased estimator, b(γ(θ), T) := E_θ(T) − γ(θ) is called the bias.

(d) An estimator T_n is called asymptotically unbiased for γ(θ) if lim_{n→∞} b(γ(θ), T_n) = 0.

Def.: An estimator T is called median unbiased for γ(θ) if med_θ(T) = γ(θ) for all θ ∈ Θ.

Note:

- If T is unbiased for γ(θ), then in general g(T) is biased for g(γ(θ)), unless g is linear.
- Unbiased estimators do not always exist.
- Unbiased estimators are not always reasonable.
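The first of these remarks is easy to check numerically. The following sketch is an added illustration (the normal location model, the parameter value and the sample size are assumptions made only for this example): the sample mean X̄ is unbiased for θ, but X̄² is biased for θ², with bias Var_θ(X̄) = 1/n.

import numpy as np

# X_1, ..., X_n ~ N(theta, 1): the sample mean is unbiased for theta,
# but its square is biased for theta^2; the bias equals Var(xbar) = 1/n.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 200_000

x = rng.normal(theta, 1.0, size=(reps, n))
xbar = x.mean(axis=1)

print("E[xbar]   ~", xbar.mean(), "    target theta   =", theta)
print("E[xbar^2] ~", (xbar ** 2).mean(), "  target theta^2 =", theta ** 2)
print("empirical bias ~", (xbar ** 2).mean() - theta ** 2, "  theory 1/n =", 1.0 / n)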

3.2 Minimum Variance Unbiased Estimators

In the sequel the case Θ ⊆ IR is considered.

Def.: Let T be the set of all unbiased estimators T of θ with E_θ(T²) < ∞ for all θ ∈ Θ, and let T_{θ_0} be the set of all unbiased estimators T of θ_0 with E_{θ_0}(T²) < ∞.

(a) T_0 ∈ T_{θ_0} is called a locally minimum variance unbiased estimator (LMVUE) in θ_0 if E_{θ_0}[(T_0 − θ_0)²] ≤ E_{θ_0}[(T − θ_0)²] for all T ∈ T_{θ_0}.

(b) T* ∈ T is called a uniformly minimum variance unbiased estimator (UMVUE) if E_θ[(T* − θ)²] ≤ E_θ[(T − θ)²] for all T ∈ T and all θ ∈ Θ.

Other names are: (locally) best unbiased estimator and, in the case of a linear estimator, (locally) best linear unbiased estimator (BLUE).

Theorem: Let T be as in the definition above, T ≠ ∅, and let T^(0) be the set of all unbiased estimators of zero, i.e.

T^(0) = {T_0 | E_θ(T_0) = 0, E_θ(T_0²) < ∞ for all θ ∈ Θ}.

Then T ∈ T is UMVUE if and only if E_θ(T_0 T) = 0 for all θ ∈ Θ and all T_0 ∈ T^(0).

Proof: According to the above assumptions, E_θ[T_0 T] exists for all θ ∈ Θ and T_0 ∈ T^(0).

Necessity: Suppose T ∈ T is UMVUE and there exist a θ_0 ∈ Θ and a T_0 ∈ T^(0) such that E_{θ_0}[T_0 T] ≠ 0. Then T + λT_0 ∈ T for all λ ∈ IR. In case E_{θ_0}[T_0²] = 0,

E_{θ_0}[T_0 T] = 0 by the Schwarz inequality, a contradiction. Let hence E_{θ_0}[T_0²] > 0 and choose λ_0 = −E_{θ_0}[T_0 T] / E_{θ_0}[T_0²]. Then for T + λ_0 T_0 = T − T_0 E_{θ_0}[T_0 T] / E_{θ_0}[T_0²] it holds that

E_{θ_0}[(T + λ_0 T_0)²] = E_{θ_0}[T²] − E²_{θ_0}[T_0 T] / E_{θ_0}(T_0²) < E_{θ_0}(T²),

i.e. Var_{θ_0}[T + λ_0 T_0] < Var_{θ_0}[T] (contradiction!).

Sufficiency: Suppose E_θ[T_0 T*] = 0 holds for a T* ∈ T, all T_0 ∈ T^(0) and all θ ∈ Θ, and let T ∈ T. Then T − T* ∈ T^(0), and from the above condition it follows that E_θ[T*(T − T*)] = 0 for all θ ∈ Θ, which entails

E_θ[(T*)²] = E_θ[T* T] ≤ E_θ[T²]^{1/2} E_θ[(T*)²]^{1/2}.

For E_θ[(T*)²] = 0 there is nothing to prove. For E_θ[(T*)²] > 0 it follows that E_θ[(T*)²] ≤ E_θ(T²) for all θ ∈ Θ, hence Var_θ[T*] ≤ Var_θ[T] for all θ ∈ Θ and T ∈ T. □

Theorem: Let T ≠ ∅. Then there exists at most one UMVUE.

Proof: Let T* and T̃ both be UMVUEs. Then T* − T̃ ∈ T^(0), hence E_θ[T*(T* − T̃)] = 0, i.e. E_θ[T* T̃] = E_θ[(T*)²], or Cov_θ(T*, T̃) = Var_θ(T*) = Var_θ(T̃), from which Corr_θ(T*, T̃) = 1 follows for all θ ∈ Θ. Therefore there exist a, b ∈ IR, not both zero, with P_θ(a T* + b T̃ = 0) = 1 for all θ ∈ Θ. Since E_θ(a T* + b T̃) = (a + b)θ = 0 for all θ, it follows that a = −b, and hence P_θ(T* = T̃) = 1 for all θ ∈ Θ. □

Theorem (Rao–Blackwell): Let P = {P_θ | θ ∈ Θ}, T ∈ T, and let S be sufficient for P. Then

(a) E[T | S] does not depend on θ and is an unbiased estimator for θ for all θ ∈ Θ, and

(b) E_θ[(E(T | S) − θ)²] ≤ E_θ[(T − θ)²] for all θ ∈ Θ. Equality holds if and only if P_θ(T = E(T | S)) = 1 for all θ ∈ Θ.

Proof: The independence from θ follows from the fact that, by sufficiency, the conditional distributions given S = s do not depend on θ, and the unbiasedness from E_θ[E(T | S)] = E_θ[T] = θ. Therefore it suffices to show that E_θ[E(T | S)²] ≤ E_θ[T²] for all θ ∈ Θ. Now E_θ[T²] = E_θ[E(T² | S)]. Hence we have to show that E(T | S)² ≤ E(T² | S) holds P_θ-a.e. for all θ ∈ Θ. But this follows from the Schwarz inequality applied to the conditional expectation E[· | S].

Equality holds in (b) if and only if E_θ[E(T | S)²] = E_θ(T²), i.e.

E_θ[E(T² | S) − E²(T | S)] = 0,

which is equivalent to E_θ[Var(T | S)] = 0, i.e. E(T² | S) = E²(T | S) P_θ-a.e., i.e. T = E[T | S] P_θ-a.e., for all θ ∈ Θ. □

Theorem (Lehmann–Scheffé): If S is a complete sufficient statistic and if T ∈ T, then there exists a UMVUE, and it is given by E(T | S).

Proof: For T_1, T_2 ∈ T, E_θ[E(T_1 | S) − E(T_2 | S)] = 0 holds for all θ ∈ Θ. Since S is complete, E[T_1 | S] = E[T_2 | S] holds P_θ-a.e., so E(T | S) does not depend on the choice of T ∈ T, and by the Rao–Blackwell theorem it is the UMVUE. □

Remark:

(a) According to the Rao–Blackwell theorem one should look for unbiased functions of a sufficient statistic. If this sufficient statistic is complete, then this function is the UMVUE.

(b) UMVUEs may exist even if there does not exist a sufficient statistic.
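The following small simulation is an added illustration of the two theorems (the Poisson model, the parameter value and the sample size are assumptions made for this example): for X_1, ..., X_n iid Poisson(θ) and γ(θ) = P_θ(X = 0) = e^{−θ}, the crude unbiased estimator T = 1{X_1 = 0} is conditioned on the complete sufficient statistic S = Σ X_i, which gives E[T | S] = ((n−1)/n)^S; by the Lehmann–Scheffé theorem this is the UMVUE.

import numpy as np

# Rao-Blackwellization: both estimators below are unbiased for exp(-theta),
# but conditioning on the sufficient statistic S removes most of the variance.
rng = np.random.default_rng(1)
theta, n, reps = 1.5, 20, 100_000

x = rng.poisson(theta, size=(reps, n))
T_crude = (x[:, 0] == 0).astype(float)      # 1{X_1 = 0}, unbiased but noisy
S = x.sum(axis=1)
T_rb = ((n - 1) / n) ** S                   # E[T | S], the UMVUE

print("target exp(-theta)      :", np.exp(-theta))
print("crude  T: mean, variance:", T_crude.mean(), T_crude.var())
print("E[T|S]  : mean, variance:", T_rb.mean(), T_rb.var())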

Theorem (Cramér–Rao–Fréchet): Let P = {P_θ | θ ∈ Θ} have µ-densities f(x; θ) (µ the counting measure or µ = λ, the Lebesgue measure), let Θ be an open interval in IR, and let the set {x | f(x; θ) = 0} be independent of θ ∈ Θ. For every θ let ∂f(x; θ)/∂θ be defined. Suppose that

(i) ∫ (∂/∂θ) f(x; θ) µ(dx) = (d/dθ) ∫ f(x; θ) µ(dx) = 0 for all θ ∈ Θ;

(ii) γ : Θ → IR is differentiable on Θ, T is an unbiased estimator for γ(θ) such that E_θ(T²) < ∞ for all θ ∈ Θ, and furthermore

(d/dθ) ∫ T(x) f(x; θ) µ(dx) = ∫ T(x) (∂/∂θ) f(x; θ) µ(dx) for all θ ∈ Θ.

Then

(a) [γ'(θ)]² ≤ E_θ[(T − γ(θ))²] · E_θ[ ((∂/∂θ) log f(X; θ))² ] for all θ ∈ Θ.

For any θ_0 ∈ Θ, either γ'(θ_0) = 0 and equality holds in (a) for θ = θ_0, or

(b) Var_{θ_0}(T) = E_{θ_0}[(T − γ(θ_0))²] ≥ [γ'(θ_0)]² / E_{θ_0}[ ((∂/∂θ) log f(X; θ_0))² ].

If, in the latter case, equality holds in (b) and if T is not a constant, then there exists a real number K_{θ_0} ≠ 0 such that

(c) T(x) − γ(θ_0) = K_{θ_0} · (∂/∂θ) log f(x; θ_0)   µ-a.e.

Remarks: The function (∂/∂θ) log f(x; θ) is also called the score function, and

I(θ) := E_θ[ ((∂/∂θ) log f(X; θ))² ] = Var_θ( (∂/∂θ) log f(X; θ) )

is called the Fisher information. For γ(θ) = θ we have γ'(θ) = 1, of course.

Proof: Differentiating both sides of ∫ f(x; θ) µ(dx) = 1 leads (with (i)) to ∫ (∂/∂θ) f(x; θ) µ(dx) = 0, or, on {f > 0},

∫_{f>0} [(∂/∂θ) f(x; θ) / f(x; θ)] f(x; θ) µ(dx) = ∫_{f>0} (∂/∂θ) log f(x; θ) · f(x; θ) µ(dx) = 0,

leading to E_θ[ (∂/∂θ) log f(X; θ) ] = 0.

According to assumption (ii) we have γ(θ) = ∫ T(x) f(x; θ) µ(dx) and

γ'(θ) = E_θ[ T(X) · (∂/∂θ) log f(X; θ) ],

which, since the score has mean zero, entails

E_θ[ (T(X) − γ(θ)) · (∂/∂θ) log f(X; θ) ] = γ'(θ),

and (a) follows from the Schwarz inequality.

For (b) it is sufficient to consider either the case γ'(θ_0) ≠ 0 or the case where in (a) the strict inequality holds for θ_0. In both cases the Fisher information satisfies I(θ_0) > 0, which entails (b).

If in (b) the equality sign holds, then γ'(θ_0) ≠ 0 must hold. Then, by the equality condition of the Schwarz inequality, there exists a real number K_{θ_0} such that

T(X) − γ(θ_0) = K_{θ_0} · (∂/∂θ) log f(x; θ_0)

holds µ-a.e. □

For the vector case, Θ ⊆ IR^p, let γ(Θ) be a convex subset of IR^k. Then ∂f(x; θ)/∂θ is a p-vector,

I(θ) = E_θ[ ((∂/∂θ) log f(X; θ)) ((∂/∂θ) log f(X; θ))^T ]

is a p×p matrix, ∂γ(θ)/∂θ is a k×p matrix, and

Var_θ(T) = E_θ[ (T(X) − γ(θ)) (T(X) − γ(θ))^T ]

is a k×k matrix. With the corresponding regularity conditions of the Cramér–Rao theorem one can easily show the corresponding inequality

(d) Var_θ(T) ≥ (∂γ(θ)/∂θ) I(θ)^{-1} (∂γ(θ)/∂θ)^T,

where the "≥" sign is to be understood in the sense that the difference between the left-hand and the right-hand side is a positive semidefinite matrix. For a proof in the multiparameter case we refer to Lehmann/Casella (2001).

Theorem 3.2.6: In the above case let p = k, assume that the k×k matrix ∂γ(θ)/∂θ is regular for all θ ∈ Θ, and let ∂f/∂θ be continuous for all θ and x. Then in (d) the equality sign holds if and only if there are functions C(θ), Q_1(θ), ..., Q_k(θ) and H(x) such that

dP_θ/dµ = f(x; θ) = C(θ) exp{ Σ_{j=1}^k Q_j(θ) T_j(x) } H(x),

and, with Q(θ) = (Q_1(θ), ..., Q_k(θ))^T, it holds that

γ(θ) = −[Q'(θ)^T]^{-1} (∂/∂θ) ln C(θ),

where Q'(θ) = (∂Q_i(θ)/∂θ_j)_{i,j=1,...,k} denotes the Jacobian of Q.

Proof: 1. Let f and γ have the above form. We show that in the CR inequality the equality sign holds. From f as above we have, with C(θ) = exp{D(θ)},

(∂/∂θ) log f(x; θ) = Q'(θ)^T T(x) + D'(θ).

Since E_θ[ (∂/∂θ) log f(X; θ) ] = 0,

0 = E_θ[ Q'(θ)^T T(X) + D'(θ) ] = D'(θ) + Q'(θ)^T E_θ[T(X)],

and we obtain, for det(Q'(θ)) ≠ 0 (which may be assumed),

E_θ[T(X)] = −[Q'(θ)^T]^{-1} D'(θ) = γ(θ).

Hence the estimator γ̂(θ) = T(X) is unbiased for γ(θ). Since T(X) − γ(θ) = T(X) + [Q'(θ)^T]^{-1} D'(θ), we get, by putting K_θ = [Q'(θ)^T]^{-1}, that

K_θ · (∂/∂θ) log f(x; θ) = T(X) + [Q'(θ)^T]^{-1} D'(θ) = T(X) − γ(θ),

i.e. the equality sign holds in the CR inequality.

2. From the CR equality the above representation of f and γ follows. If equality holds, then there exists a regular (k×k)-matrix K_θ such that

T(X) − γ(θ) = K_θ · (∂/∂θ) log f(x; θ)   µ-a.e.,

or K_θ^{-1} [T(X) − γ(θ)] = (∂/∂θ) log f(x; θ). We integrate both sides with respect to θ, where we put D(θ) := −∫ K_θ^{-1} γ(θ) dθ and Q(θ) := ∫ K_θ^{-1} dθ. Introducing an integration constant S(x), which in general depends on x, leads to

ln f(x; θ) = Q(θ)^T T(x) + D(θ) + S(x),

and with C(θ) := exp{D(θ)} and H(x) := exp{S(x)}, f and γ have the claimed form, with γ̂(θ) = T(X). □

Corollary 3.2.7: If, under the above regularity conditions, T is an unbiased estimator for γ(θ) which attains the Cramér–Rao lower bound, then T is minimal sufficient and complete.

An unbiased estimator which attains the CR bound is called an efficient estimator. In the scalar case, the ratio e(T, θ) between the CR bound and Var_θ(T) is called the efficiency of the estimator T. Obviously, 0 ≤ e(T, θ) ≤ 1. When comparing two unbiased estimators T_1 and T_2,

e_θ(T_1 | T_2) := Var_θ(T_2) / Var_θ(T_1)

is called the relative efficiency of T_1 with respect to T_2. lim_{n→∞} e_θ(T) is called the asymptotic efficiency, and lim_{n→∞} e_θ(T_1 | T_2) is called the asymptotic relative efficiency.
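Two added numerical illustrations of these notions (the Poisson and normal models and all parameter values are assumptions made for this example, not part of the notes): for X_1, ..., X_n iid Poisson(θ) the Fisher information per observation is 1/θ, so the CR bound for estimating γ(θ) = θ from n observations is θ/n, which X̄ attains, i.e. e(X̄, θ) = 1; for N(θ, 1) data the relative efficiency of the sample median with respect to X̄ is approximately 2/π ≈ 0.64 for large n.

import numpy as np

rng = np.random.default_rng(2)
reps = 100_000

# Poisson: Var(xbar) should equal the CR bound theta/n, so the efficiency is ~1.
theta, n = 2.0, 50
xbar = rng.poisson(theta, size=(reps, n)).mean(axis=1)
cr_bound = theta / n
print("Poisson: Var(xbar) ~", xbar.var(), "  CR bound =", cr_bound,
      "  efficiency e(xbar, theta) ~", cr_bound / xbar.var())

# Normal: relative efficiency of the sample median with respect to the sample mean.
x = rng.normal(0.0, 1.0, size=(reps, 200))
print("Normal: e(median | mean) ~", x.mean(axis=1).var() / np.median(x, axis=1).var(),
      "  asymptotic value 2/pi =", 2 / np.pi)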

3.3 Method of Moments

Let P = {P_θ | θ ∈ Θ} and γ : Θ → IR^k. In many cases the estimand γ(θ) can be written as a function of the moments of P_θ, γ(θ) = g(µ_1, ..., µ_k). In order to estimate γ(θ), one may then try to replace the unknown moments µ_j, j = 1, ..., k, by the corresponding sample moments.

Let T be any statistic with existing expectation µ_t(θ) := E_θ(T(X)) for all θ ∈ Θ. Then the SLLN (Khinchine) entails

T̄_n := (T(X_1) + T(X_2) + ... + T(X_n))/n → µ_t(θ) a.s.

If Θ ⊆ IR^k and T = (T_1, ..., T_k) is a statistic with existing expectation µ_T(θ) = (µ_{t_1}(θ), ..., µ_{t_k}(θ)), then one can try to find an estimator θ̂_n = (θ̂_{1,n}, ..., θ̂_{k,n}) as a solution of the system of equations

µ_{t_1}(θ̂_{1,n}, ..., θ̂_{k,n}) = (T_1(X_1) + ... + T_1(X_n))/n =: T̄_{1,n}
...
µ_{t_k}(θ̂_{1,n}, ..., θ̂_{k,n}) = (T_k(X_1) + ... + T_k(X_n))/n =: T̄_{k,n}.

Under regularity conditions we then have (SLLN)

γ̂(θ̂_n) = g(µ̂_{t_1}, ..., µ̂_{t_k}) → g(µ_{t_1}, ..., µ_{t_k}) = γ(θ) a.s.

If the moments up to order 2k exist, then the asymptotic normality of γ̂(θ̂_n) can be proved with the Lindeberg–Lévy central limit theorem.

Remark: In general, method of moments estimators are not unique, and they are in general not functions of sufficient statistics, so they cannot be efficient either.
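A small added sketch of the method (the Gamma model and all parameter values are assumptions made for this example): for X_i ~ Gamma(shape a, scale b), the first two moments give E X = ab and Var X = ab², so solving the two moment equations yields â = X̄²/S² and b̂ = S²/X̄, with S² the sample variance.

import numpy as np

# Method of moments for the two-parameter Gamma distribution.
rng = np.random.default_rng(3)
a, b, n = 3.0, 2.0, 5_000
x = rng.gamma(a, b, size=n)

xbar = x.mean()
s2 = x.var()                       # second central sample moment
a_hat, b_hat = xbar ** 2 / s2, s2 / xbar
print("method-of-moments estimates (a, b):", a_hat, b_hat, "   true values:", a, b)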

3.4 Maximum Likelihood Estimation

Def.: A solution θ̂ of

sup_{θ ∈ Θ} L(θ; x)   (3.2)

is called a maximum likelihood estimator (MLE) for θ.

With the ML principle one tries to find the mode of the underlying distribution. Since the mode is often a worse estimator of location than the mean or the median, ML estimators often have poor small-sample properties.

In practice it is often simpler to work with the log-likelihood function l than with L. If the µ-density f(x; θ) is positive µ-a.e., if Θ ⊆ IR^k is an open set, and if (∂/∂θ) f(x; θ) exists on Θ, then a solution of (3.2) fulfills the likelihood equations

(∂/∂θ) l(θ; x) := (∂/∂θ) log f(x; θ) = 0.   (3.3)

A solution of (3.3) is called an MLE in the weak sense; a solution of (3.2) is called a strict MLE.

Theorem 3.4.1: Let Θ ⊆ IR^k and Λ ⊆ IR^p be intervals, p ≤ k, and let γ : Θ → Λ be surjective. If θ̂ is an MLE for θ, then γ(θ̂) is an MLE for γ(θ).

Proof: For each λ ∈ Λ let Θ_λ := {θ ∈ Θ | γ(θ) = λ} and let M(λ; x) := sup_{θ ∈ Θ_λ} L(θ; x). Let θ̂ be an MLE for θ. Then θ̂ belongs to one of the sets Θ_λ, say to Θ_λ̂, and it holds that M(λ̂; x) = sup_{θ ∈ Θ_λ̂} L(θ; x) ≥ L(θ̂; x). Moreover λ̂ maximizes M, since M(λ̂; x) ≤ sup_{λ ∈ Λ} M(λ; x) = sup_{θ ∈ Θ} L(θ; x) = L(θ̂; x). □
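As an added illustration of Theorem 3.4.1 (the exponential model, the parameter value and the sample size are assumptions made for this example): for X_1, ..., X_n iid exponential with rate θ, i.e. f(x; θ) = θ e^{−θx}, the MLE is θ̂ = 1/X̄, and by invariance the MLE of γ(θ) = 1/θ (the mean) is γ(θ̂) = X̄. The sketch below, assuming SciPy is available, checks the closed form against a direct numerical maximization of the log-likelihood.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
theta, n = 0.7, 2_000
x = rng.exponential(1.0 / theta, size=n)     # numpy parametrizes by the scale 1/theta

def neg_loglik(t):
    # negative log-likelihood of the exponential(rate t) sample
    return -(n * np.log(t) - t * x.sum())

theta_hat_closed = 1.0 / x.mean()
theta_hat_num = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded").x

print("closed-form MLE of theta :", theta_hat_closed)
print("numerical MLE of theta   :", theta_hat_num)
print("MLE of 1/theta (the mean):", 1.0 / theta_hat_closed, "  = xbar =", x.mean())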

Theorem 3.4.2: Let S be a sufficient statistic for P = {P_θ | θ ∈ Θ} ≪ µ (σ-finite). If a unique MLE θ̂ exists, then it is a (measurable) function of S.

Proof: Since S is sufficient, there exists a factorization f(x; θ) = g(S(x); θ) h(x). Maximizing f with respect to θ is hence equivalent to maximizing g with respect to θ; g depends on x only through S, and therefore θ̂ depends on x only through S. □

Remark: If the likelihood equations (3.3) exist and if there exists a sufficient statistic S, then the MLEs are given as solutions of

(∂/∂θ) log g(S(x); θ) = 0.

Theorem 3.4.3: Suppose that the regularity conditions of the CR inequality are satisfied and that θ belongs to an open set in IR^k. If T is an unbiased estimator for γ(θ) whose covariance matrix attains the CR lower bound, then the likelihood equations have the unique solution γ(θ̂) = T(X).

Proof: According to the Cramér–Rao theorem (resp. its multivariate version) there exists a regular matrix K_θ such that

K_θ · (∂/∂θ) log f(x; θ) = T(X) − γ(θ)   µ-a.e.,

and since K_θ is regular, the likelihood equations have the unique solution γ(θ̂) = T(X). □

For large-sample considerations we introduce the following regularity conditions:

(A0) For θ ≠ θ', f(·; θ) ≠ f(·; θ') (identifiability).

(A1) The support of f(x; θ), i.e. the set A := {x | f(x; θ) > 0}, does not depend on θ ∈ Θ.

(A2) The sample observations X_1, ..., X_n are iid with a density f(x; θ) with respect to some σ-finite measure µ.

(A3) The parameter space Θ contains an open set Θ_0, and the true parameter θ_0 is an interior point of Θ_0.

(A4) The density f(x; θ) is differentiable with respect to θ ∈ Θ_0 for µ-almost all x, with derivative f'(x; θ) := (∂/∂θ) f(x; θ).

Theorem 3.4.4: Let (A0)-(A2) hold. Then

P_{θ_0}[L(θ_0; x) > L(θ; x)] → 1 for n → ∞ and for all θ ≠ θ_0.   (3.4)

Proof: We use Jensen's inequality, according to which, for φ convex on an open interval I with P(X ∈ I) = 1 and E|X| < ∞, φ[E(X)] ≤ E[φ(X)]; for concave φ the inequality is reversed, and it is strict if φ is strictly concave and X is not degenerate.

The claim (3.4) is equivalent to

P_{θ_0}[ (1/n) Σ_{i=1}^n log{f(X_i; θ)/f(X_i; θ_0)} < 0 ] → 1 for all θ ≠ θ_0.

According to the SLLN, the average on the left-hand side converges a.s. to E_{θ_0}[log{f(X; θ)/f(X; θ_0)}]. Since log(·) is strictly concave and, by (A0), the ratio f(X; θ)/f(X; θ_0) is not P_{θ_0}-degenerate, Jensen's inequality yields

E_{θ_0}[log{f(X; θ)/f(X; θ_0)}] < log{E_{θ_0}[f(X; θ)/f(X; θ_0)]},

where the right-hand side is equal to zero. This entails (3.4). □

If therefore the density f is a smooth function of θ, then one may expect that the MLE for θ will lie close to θ_0.
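A quick Monte Carlo check of Theorem 3.4.4 (an added illustration; the normal location model, the alternative value θ = 0.5 and the sample sizes are assumptions made for this example): the fraction of samples with L(θ_0; x) > L(θ; x) approaches 1 as n grows.

import numpy as np

rng = np.random.default_rng(5)
theta0, theta1, reps = 0.0, 0.5, 20_000

def loglik(theta, x):
    # log-likelihood of N(theta, 1), additive constants dropped
    return -0.5 * ((x - theta) ** 2).sum(axis=1)

for n in (1, 5, 20, 100):
    x = rng.normal(theta0, 1.0, size=(reps, n))
    frac = (loglik(theta0, x) > loglik(theta1, x)).mean()
    print("n =", n, "   P[L(theta0) > L(theta1)] ~", frac)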

Theorem 3.4.5: Let (A0)-(A4) hold. Then, with probability tending to 1, the likelihood equations

l'(θ; x) := (∂/∂θ) l(θ; x) = Σ_{j=1}^n f'(X_j; θ)/f(X_j; θ) = 0

have a solution θ̂_n with θ̂_n → θ_0 in probability for n → ∞.

Proof: Let δ > 0 be sufficiently small such that (according to (A3)) (θ_0 − δ, θ_0 + δ) ⊆ Θ_0, and let

S_n := {x | l(θ_0; x) > l(θ_0 − δ; x) and l(θ_0; x) > l(θ_0 + δ; x)}.

According to Theorem 3.4.4, P_{θ_0}(S_n) → 1 for n → ∞. For each x ∈ S_n there is hence a θ̂_n with θ_0 − δ < θ̂_n < θ_0 + δ at which l(θ; x) attains a local maximum, and therefore l'(θ̂_n) = 0. This entails that for each small enough δ there exists a sequence θ̂_n = θ̂_n(δ) of solutions such that P_{θ_0}(|θ̂_n − θ_0| < δ) → 1 for n → ∞. It remains to show that such a sequence exists which does not depend on δ. Let θ*_n be the solution closest to θ_0. (It exists since, because of the continuity of l'(θ), the limit of a sequence of solutions is itself a solution.) Then it naturally holds that P_{θ_0}(|θ*_n − θ_0| < δ) → 1 for all δ > 0. □

Remark: If the solutions are not unique, then the above theorem does not yield a consistent sequence of estimators: θ_0 is unknown, and the data do not tell you which root to choose.
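A direct added illustration of the consistency statement (the Poisson model, δ, the sample sizes and the number of replications are assumptions made for this example): for X_i ~ Poisson(θ_0) the likelihood equation has the unique root θ̂_n = X̄_n, and the probability that it lies within δ of θ_0 tends to 1.

import numpy as np

rng = np.random.default_rng(6)
theta0, delta, reps = 3.0, 0.1, 10_000

for n in (10, 100, 1000):
    theta_hat = rng.poisson(theta0, size=(reps, n)).mean(axis=1)   # root of the likelihood equation
    coverage = (np.abs(theta_hat - theta0) < delta).mean()
    print("n =", n, "   P(|theta_hat - theta0| < delta) ~", coverage)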

In order to show asymptotic efficiency in the univariate case, further regularity conditions are needed:

(A5) Θ ⊆ IR is an open interval.

(A6) For x ∈ A the density f(x; θ) is three times continuously differentiable with respect to θ.

(A7) The integral ∫ f(x; θ) µ(dx) can be differentiated three times with respect to θ under the integral sign.

(A8) For the Fisher information, 0 < I(θ) < ∞ holds.

(A9) To every θ_0 ∈ Θ there exist a δ > 0 and a function M(x) (both may depend on θ_0) such that

|(∂³/∂θ³) log f(x; θ)| ≤ M(x) for all x ∈ A and θ_0 − δ < θ < θ_0 + δ, with E_{θ_0}[M(X)] < ∞.

Theorem 3.4.6: Let the conditions (A1), (A2), (A5)-(A9) hold. Then for each consistent sequence θ̂_n of solutions of the likelihood equations,

√n (θ̂_n − θ_0) →_L N(0, I(θ_0)^{-1}).

Proof: For every fixed x ∈ A, a Taylor series expansion of l'(θ̂_n) around θ_0 yields

0 = (∂/∂θ) log f(x; θ̂_n) = (∂/∂θ) log f(x; θ_0) + (θ̂_n − θ_0)(∂²/∂θ²) log f(x; θ_0) + (1/2)(θ̂_n − θ_0)²(∂³/∂θ³) log f(x; θ*_n),

where θ*_n lies between θ_0 and θ̂_n. With obvious abbreviations this is equal to

0 = l'(θ̂_n) = l'(θ_0) + (θ̂_n − θ_0) l''(θ_0) + (1/2)(θ̂_n − θ_0)² l'''(θ*_n),

or

(θ̂_n − θ_0) [ −l''(θ_0) − (1/2)(θ̂_n − θ_0) l'''(θ*_n) ] = l'(θ_0),

and, provided the expression [...] is ≠ 0, we obtain

√n (θ̂_n − θ_0) = n^{-1/2} l'(θ_0) / [ −n^{-1} l''(θ_0) − (1/(2n))(θ̂_n − θ_0) l'''(θ*_n) ].   (3.5)

In Theorem 3.4.5 we have already shown that (θ̂_n − θ_0) converges to zero in probability for n → ∞. We will now show that

(1) n^{-1/2} l'(θ_0) converges weakly to N(0, I(θ_0)),

(2) −n^{-1} l''(θ_0) converges to I(θ_0) > 0 a.s., and hence in probability,

(3) n^{-1} l'''(θ*_n) is stochastically bounded.

(1): n^{-1/2} l'(θ_0) = √n · (1/n) Σ_{i=1}^n (∂/∂θ) log f(X_i; θ_0) =: √n B_n, where according to the SLLN B_n converges a.s. to

B_0 = E_{θ_0}[ (∂/∂θ) log f(X; θ_0) ] = 0.

According to the CLT, √n [B_n − 0] converges in distribution to a normal distribution with expected value zero and variance

E_{θ_0}[ ((∂/∂θ) log f(X; θ_0))² ] = I(θ_0),

where I(θ_0) > 0 according to (A8).

(2): Since, with l = log f(x; θ), we have l' = f'/f and l'' = (f'' f − (f')²)/f², it follows that

−n^{-1} l''(θ_0) = (1/n) Σ_{i=1}^n [ (f'(X_i; θ_0))² − f''(X_i; θ_0) f(X_i; θ_0) ] / f²(X_i; θ_0).

According to the SLLN this term converges a.s. (and hence also in probability) to I(θ_0), since

E_{θ_0}[ (f'/f)² − f''/f ] = ∫ ((f')²/f) dµ − ∫ f'' dµ = ∫ ((f')²/f) dµ

(the second integral vanishes, since ∫ f'' dµ = (d²/dθ²) ∫ f dµ = 0 by (A7)), and

∫ ((f')²/f) dµ = E_{θ_0}[ ((∂/∂θ) log f(X; θ_0))² ] = I(θ_0).

(3): Finally, n^{-1} l'''(θ*_n) = (1/n) Σ_{i=1}^n (∂³/∂θ³) log f(X_i; θ*_n), and with (A9) we get

|n^{-1} l'''(θ*_n)| ≤ (1/n) [M(X_1) + ... + M(X_n)].

The right-hand side converges to E_{θ_0}[M(X)] < ∞ according to (A9). Since (θ̂_n − θ_0) converges to zero in probability according to Theorem 3.4.5, the second term in the denominator of (3.5) converges to zero as well.

Putting (1) to (3) together, we have shown that √n(θ̂_n − θ_0) converges weakly to N(0, I(θ_0)^{-1}). □
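The limiting distribution in Theorem 3.4.6 can be checked numerically (an added sketch; the exponential-rate model, n and the number of replications are assumptions made for this example): for X_i ~ Exp(rate θ_0) one has I(θ_0) = 1/θ_0², so √n(θ̂_n − θ_0) with θ̂_n = 1/X̄ should be approximately N(0, θ_0²).

import numpy as np

rng = np.random.default_rng(7)
theta0, n, reps = 2.0, 500, 20_000

x = rng.exponential(1.0 / theta0, size=(reps, n))
theta_hat = 1.0 / x.mean(axis=1)               # MLE of the rate
z = np.sqrt(n) * (theta_hat - theta0)

print("empirical mean of sqrt(n)(theta_hat - theta0)    :", z.mean(), "  (theory: 0)")
print("empirical variance of sqrt(n)(theta_hat - theta0):", z.var(), " (theory: theta0^2 =", theta0 ** 2, ")")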

Remarks:

(1) A sequence of estimators which fulfils the conditions of Theorem 3.4.6 is called an efficient likelihood estimator.

(2) (A6) and (A7) entail, for all θ ∈ Θ_0,

(i) E_θ[ (∂/∂θ) log f(X; θ) ] = 0 and

(ii) E_θ[ −(∂²/∂θ²) log f(X; θ) ] = E_θ[ ((∂/∂θ) log f(X; θ))² ] = I(θ).

Corollary 3.4.7: Let the conditions of Theorem 3.4.6 hold. If the likelihood equations have a unique solution for all x and n, resp. if the probability of multiple roots goes to zero for n → ∞, then the MLE is asymptotically efficient.

Some final remarks:

(1) In general, the likelihood equations (3.3) cannot be solved explicitly. In this case the roots can only be found by numerical procedures (with the attendant problems of existence, uniqueness and convergence of solutions for the algorithms used).

(2) MLEs need strong prerequisites (conditions). Under certain conditions, consistency and asymptotic normality still hold even if the distributional assumptions do not exactly coincide with reality. But in this case asymptotic efficiency is lost: already small deviations between reality and the model assumptions can lead to a considerable loss of efficiency.

(3) Consistency and asymptotic normality may hold even if some regularity conditions of the above theorems are violated.

For the multivariate case Θ ⊆ IR^k, a result like Theorem 3.4.6 can be obtained in a similar way if the conditions (A5), ... are reformulated accordingly.
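To illustrate final remark (1), here is an added sketch of a Newton-Raphson iteration for a likelihood equation without a closed-form root (the logistic location model, the starting value and all parameter values are assumptions made for this example): the iteration θ_{m+1} = θ_m − l'(θ_m)/l''(θ_m) is applied to the log-likelihood of X_1, ..., X_n iid logistic with location θ.

import numpy as np

# Logistic location model: f(x; theta) = exp(-(x - theta)) / (1 + exp(-(x - theta)))^2.
# The likelihood equation sum_i [1 - 2/(1 + exp(x_i - theta))] = 0 has no closed-form solution.
rng = np.random.default_rng(8)
theta0, n = 1.0, 1_000
x = rng.logistic(loc=theta0, size=n)

def score(theta):              # l'(theta)
    return np.sum(1.0 - 2.0 / (1.0 + np.exp(x - theta)))

def score_derivative(theta):   # l''(theta), always negative here
    e = np.exp(x - theta)
    return np.sum(-2.0 * e / (1.0 + e) ** 2)

theta = np.median(x)           # reasonable starting value
for _ in range(20):
    step = score(theta) / score_derivative(theta)
    theta -= step
    if abs(step) < 1e-10:
        break

print("Newton-Raphson root of the likelihood equation:", theta, "   true theta0 =", theta0)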
