
Chapter 3  Point Estimation

Let (Ω, A, P_θ), P_θ ∈ P = {P_θ | θ ∈ Θ}, be a probability space, let X_1, X_2, ..., X_n : (Ω, A) → (IR^k, B^k) be random variables, let (X, B_X) denote the sample space, and let γ : Θ → IR^k be a measurable function, i.e. γ : (Θ, B_Θ) → (γ(Θ), B_γ).

3.1 Introduction

Def.: An estimator T is a measurable function T : (X, B_X) → (γ(Θ), B_γ).

Of course it is hoped that T(X) will tend to be close to the unknown estimand γ(θ), but this requirement is not part of the formal definition of an estimator. Desirable properties of an estimator are:

- unbiasedness,
- consistency (strong, weak, in r-th mean),
- sufficiency,
- asymptotic normality,
- minimal sufficiency, completeness, invariance, ...

In the sequel we are interested in unbiased estimators, and we shall learn about a further statistical criterion: efficiency.

Def.: Let γ : Θ → IR^m be measurable.

(a) A statistic T : (X, B_X) → (IR^m, B^m) is called unbiased if E_θ(T) = γ(θ) for all θ ∈ Θ.

(b) Each function γ on Θ for which there exists an unbiased estimator is called an estimable function.

(c) For a biased estimator, b(γ(θ), T) := E_θ(T) − γ(θ) is called the bias.

(d) An estimator T_n is called asymptotically unbiased for γ(θ) if lim_{n→∞} b(γ(θ), T_n) = 0.

Def.: An estimator T is called median unbiased for γ(θ) if med_θ(T) = γ(θ) for all θ ∈ Θ.

Note:

- If T is unbiased for γ(θ), then in general g(T) is biased for g(γ(θ)), unless g is linear.
- Unbiased estimators do not always exist.
- Unbiased estimators are not always reasonable.
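The first of these remarks is easy to check numerically. The following sketch is an added illustration (the normal location model, the parameter value and the sample size are assumptions made only for this example): the sample mean X̄ is unbiased for θ, but X̄² is biased for θ², with bias Var_θ(X̄) = 1/n.

import numpy as np

# X_1, ..., X_n ~ N(theta, 1): the sample mean is unbiased for theta,
# but its square is biased for theta^2; the bias equals Var(xbar) = 1/n.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 200_000

x = rng.normal(theta, 1.0, size=(reps, n))
xbar = x.mean(axis=1)

print("E[xbar]   ~", xbar.mean(), "    target theta   =", theta)
print("E[xbar^2] ~", (xbar ** 2).mean(), "  target theta^2 =", theta ** 2)
print("empirical bias ~", (xbar ** 2).mean() - theta ** 2, "  theory 1/n =", 1.0 / n)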

3.2 Minimum Variance Unbiased Estimators

In the sequel the case Θ ⊆ IR is considered.

Def.: Let T be the set of all unbiased estimators T of θ with E_θ(T²) < ∞ for all θ ∈ Θ, and let T_{θ_0} be the set of all unbiased estimators T of θ_0 with E_{θ_0}(T²) < ∞.

(a) T_0 ∈ T_{θ_0} is called a locally minimum variance unbiased estimator (LMVUE) in θ_0 if E_{θ_0}[(T_0 − θ_0)²] ≤ E_{θ_0}[(T − θ_0)²] for all T ∈ T_{θ_0}.

(b) T* ∈ T is called a uniformly minimum variance unbiased estimator (UMVUE) if E_θ[(T* − θ)²] ≤ E_θ[(T − θ)²] for all T ∈ T and all θ ∈ Θ.

Other names are: (locally) best unbiased estimator and, in the case of a linear estimator, (locally) best linear unbiased estimator (BLUE).

Theorem: Let T be as in the definition above, T ≠ ∅, and let T^(0) be the set of all unbiased estimators of zero, i.e.

T^(0) = {T_0 | E_θ(T_0) = 0, E_θ(T_0²) < ∞ for all θ ∈ Θ}.

Then T ∈ T is UMVUE if and only if E_θ(T_0 T) = 0 for all θ ∈ Θ and all T_0 ∈ T^(0).

Proof: According to the above assumptions, E_θ[T_0 T] exists for all θ ∈ Θ and T_0 ∈ T^(0).

Necessity: Suppose T ∈ T is UMVUE and there exist a θ_0 ∈ Θ and a T_0 ∈ T^(0) such that E_{θ_0}[T_0 T] ≠ 0. Then T + λT_0 ∈ T for all λ ∈ IR. In case E_{θ_0}[T_0²] = 0,

E_{θ_0}[T_0 T] = 0 by the Schwarz inequality, a contradiction. Let hence E_{θ_0}[T_0²] > 0 and choose λ_0 = −E_{θ_0}[T_0 T] / E_{θ_0}[T_0²]. Then for T + λ_0 T_0 = T − T_0 E_{θ_0}[T_0 T] / E_{θ_0}[T_0²] it holds that

E_{θ_0}[(T + λ_0 T_0)²] = E_{θ_0}[T²] − E²_{θ_0}[T_0 T] / E_{θ_0}(T_0²) < E_{θ_0}(T²),

i.e. Var_{θ_0}[T + λ_0 T_0] < Var_{θ_0}[T] (contradiction!).

Sufficiency: Suppose E_θ[T_0 T*] = 0 holds for a T* ∈ T, all T_0 ∈ T^(0) and all θ ∈ Θ, and let T ∈ T. Then T − T* ∈ T^(0), and from the above condition it follows that E_θ[T*(T − T*)] = 0 for all θ ∈ Θ, which entails

E_θ[(T*)²] = E_θ[T* T] ≤ E_θ[T²]^{1/2} E_θ[(T*)²]^{1/2}.

For E_θ[(T*)²] = 0 there is nothing to prove. For E_θ[(T*)²] > 0 it follows that E_θ[(T*)²] ≤ E_θ(T²) for all θ ∈ Θ, hence Var_θ[T*] ≤ Var_θ[T] for all θ ∈ Θ and T ∈ T. □

Theorem: Let T ≠ ∅. Then there exists at most one UMVUE.

Proof: Let T* and T̃ both be UMVUEs. Then T* − T̃ ∈ T^(0), hence E_θ[T*(T* − T̃)] = 0, i.e. E_θ[T* T̃] = E_θ[(T*)²], or Cov_θ(T*, T̃) = Var_θ(T*) = Var_θ(T̃), from which Corr_θ(T*, T̃) = 1 follows for all θ ∈ Θ. Therefore there exist a, b ∈ IR, not both zero, with P_θ(a T* + b T̃ = 0) = 1 for all θ ∈ Θ. Since E_θ(a T* + b T̃) = (a + b)θ = 0 for all θ, it follows that a = −b, and hence P_θ(T* = T̃) = 1 for all θ ∈ Θ. □

Theorem (Rao–Blackwell): Let P = {P_θ | θ ∈ Θ}, T ∈ T, and let S be sufficient for P. Then

(a) E[T | S] does not depend on θ and is an unbiased estimator for θ for all θ ∈ Θ, and

(b) E_θ[(E(T | S) − θ)²] ≤ E_θ[(T − θ)²] for all θ ∈ Θ. Equality holds if and only if P_θ(T = E(T | S)) = 1 for all θ ∈ Θ.

Proof: The independence from θ follows from the fact that, by sufficiency, the conditional distributions given S = s do not depend on θ, and the unbiasedness from E_θ[E(T | S)] = E_θ[T] = θ. Therefore it suffices to show that E_θ[E(T | S)²] ≤ E_θ[T²] for all θ ∈ Θ. Now E_θ[T²] = E_θ[E(T² | S)]. Hence we have to show that E(T | S)² ≤ E(T² | S) holds P_θ-a.e. for all θ ∈ Θ. But this follows from the Schwarz inequality applied to the conditional expectation E[· | S].

Equality holds in (b) if and only if E_θ[E(T | S)²] = E_θ(T²), i.e.

E_θ[E(T² | S) − E²(T | S)] = 0,

which is equivalent to E_θ[Var(T | S)] = 0, i.e. E(T² | S) = E²(T | S) P_θ-a.e., i.e. T = E[T | S] P_θ-a.e., for all θ ∈ Θ. □

Theorem (Lehmann–Scheffé): If S is a complete sufficient statistic and if T ∈ T, then there exists a UMVUE, and it is given by E(T | S).

Proof: For T_1, T_2 ∈ T, E_θ[E(T_1 | S) − E(T_2 | S)] = 0 holds for all θ ∈ Θ. Since S is complete, E[T_1 | S] = E[T_2 | S] holds P_θ-a.e., so E(T | S) does not depend on the choice of T ∈ T, and by the Rao–Blackwell theorem it is the UMVUE. □

Remark:

(a) According to the Rao–Blackwell theorem one should look for unbiased functions of a sufficient statistic. If this sufficient statistic is complete, then this function is the UMVUE.

(b) UMVUEs may exist even if there does not exist a sufficient statistic.
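The following small simulation is an added illustration of the two theorems (the Poisson model, the parameter value and the sample size are assumptions made for this example): for X_1, ..., X_n iid Poisson(θ) and γ(θ) = P_θ(X = 0) = e^{−θ}, the crude unbiased estimator T = 1{X_1 = 0} is conditioned on the complete sufficient statistic S = Σ X_i, which gives E[T | S] = ((n−1)/n)^S; by the Lehmann–Scheffé theorem this is the UMVUE.

import numpy as np

# Rao-Blackwellization: both estimators below are unbiased for exp(-theta),
# but conditioning on the sufficient statistic S removes most of the variance.
rng = np.random.default_rng(1)
theta, n, reps = 1.5, 20, 100_000

x = rng.poisson(theta, size=(reps, n))
T_crude = (x[:, 0] == 0).astype(float)      # 1{X_1 = 0}, unbiased but noisy
S = x.sum(axis=1)
T_rb = ((n - 1) / n) ** S                   # E[T | S], the UMVUE

print("target exp(-theta)      :", np.exp(-theta))
print("crude  T: mean, variance:", T_crude.mean(), T_crude.var())
print("E[T|S]  : mean, variance:", T_rb.mean(), T_rb.var())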

Theorem (Cramér–Rao–Fréchet): Let P = {P_θ | θ ∈ Θ} have µ-densities f(x; θ) (µ the counting measure or µ = λ, the Lebesgue measure), let Θ be an open interval in IR, and let the set {x | f(x; θ) = 0} be independent of θ ∈ Θ. For every θ let ∂f(x; θ)/∂θ be defined. Suppose that

(i) ∫ (∂/∂θ) f(x; θ) µ(dx) = (d/dθ) ∫ f(x; θ) µ(dx) = 0 for all θ ∈ Θ;

(ii) γ : Θ → IR is differentiable on Θ, T is an unbiased estimator for γ(θ) such that E_θ(T²) < ∞ for all θ ∈ Θ, and furthermore

(d/dθ) ∫ T(x) f(x; θ) µ(dx) = ∫ T(x) (∂/∂θ) f(x; θ) µ(dx) for all θ ∈ Θ.

Then

(a) [γ'(θ)]² ≤ E_θ[(T − γ(θ))²] · E_θ[ ((∂/∂θ) log f(X; θ))² ] for all θ ∈ Θ.

For any θ_0 ∈ Θ, either γ'(θ_0) = 0 and equality holds in (a) for θ = θ_0, or

(b) Var_{θ_0}(T) = E_{θ_0}[(T − γ(θ_0))²] ≥ [γ'(θ_0)]² / E_{θ_0}[ ((∂/∂θ) log f(X; θ_0))² ].

If, in the latter case, equality holds in (b) and if T is not a constant, then there exists a real number K_{θ_0} ≠ 0 such that

(c) T(x) − γ(θ_0) = K_{θ_0} · (∂/∂θ) log f(x; θ_0)   µ-a.e.

Remarks: The function (∂/∂θ) log f(x; θ) is also called the score function, and

I(θ) := E_θ[ ((∂/∂θ) log f(X; θ))² ] = Var_θ( (∂/∂θ) log f(X; θ) )

is called the Fisher information. For γ(θ) = θ we have γ'(θ) = 1, of course.

Proof: Differentiating both sides of ∫ f(x; θ) µ(dx) = 1 leads (with (i)) to ∫ (∂/∂θ) f(x; θ) µ(dx) = 0, or, on {f > 0},

∫_{f>0} [(∂/∂θ) f(x; θ) / f(x; θ)] f(x; θ) µ(dx) = ∫_{f>0} (∂/∂θ) log f(x; θ) · f(x; θ) µ(dx) = 0,

leading to E_θ[ (∂/∂θ) log f(X; θ) ] = 0.

According to assumption (ii) we have γ(θ) = ∫ T(x) f(x; θ) µ(dx) and

γ'(θ) = E_θ[ T(X) · (∂/∂θ) log f(X; θ) ],

which, since the score has mean zero, entails

E_θ[ (T(X) − γ(θ)) · (∂/∂θ) log f(X; θ) ] = γ'(θ),

and (a) follows from the Schwarz inequality.

For (b) it is sufficient to consider either the case γ'(θ_0) ≠ 0 or the case where in (a) the strict inequality holds for θ_0. In both cases the Fisher information satisfies I(θ_0) > 0, which entails (b).

If in (b) the equality sign holds, then γ'(θ_0) ≠ 0 must hold. Then, by the equality condition of the Schwarz inequality, there exists a real number K_{θ_0} such that

T(X) − γ(θ_0) = K_{θ_0} · (∂/∂θ) log f(x; θ_0)

holds µ-a.e. □

For the vector case, Θ ⊆ IR^p, let γ(Θ) be a convex subset of IR^k. Then ∂f(x; θ)/∂θ is a p-vector,

I(θ) = E_θ[ ((∂/∂θ) log f(X; θ)) ((∂/∂θ) log f(X; θ))^T ]

is a p×p matrix, ∂γ(θ)/∂θ is a k×p matrix, and

Var_θ(T) = E_θ[ (T(X) − γ(θ)) (T(X) − γ(θ))^T ]

is a k×k matrix. With the corresponding regularity conditions of the Cramér–Rao theorem one can easily show the corresponding inequality

(d) Var_θ(T) ≥ (∂γ(θ)/∂θ) I(θ)^{-1} (∂γ(θ)/∂θ)^T,

where the "≥" sign is to be understood in the sense that the difference between the left-hand and the right-hand side is a positive semidefinite matrix. For a proof in the multiparameter case we refer to Lehmann/Casella (2001).

Theorem 3.2.6: In the above case let p = k, assume that the k×k matrix ∂γ(θ)/∂θ is regular for all θ ∈ Θ, and let ∂f/∂θ be continuous for all θ and x. Then in (d) the equality sign holds if and only if there are functions C(θ), Q_1(θ), ..., Q_k(θ) and H(x) such that

dP_θ/dµ = f(x; θ) = C(θ) exp{ Σ_{j=1}^k Q_j(θ) T_j(x) } H(x),

and, with Q(θ) = (Q_1(θ), ..., Q_k(θ))^T, it holds that

γ(θ) = −[Q'(θ)^T]^{-1} (∂/∂θ) ln C(θ),

where Q'(θ) = (∂Q_i(θ)/∂θ_j)_{i,j=1,...,k} denotes the Jacobian of Q.

Proof: 1. Let f and γ have the above form. We show that in the CR inequality the equality sign holds. From f as above we have, with C(θ) = exp{D(θ)},

(∂/∂θ) log f(x; θ) = Q'(θ)^T T(x) + D'(θ).

Since E_θ[ (∂/∂θ) log f(X; θ) ] = 0,

0 = E_θ[ Q'(θ)^T T(X) + D'(θ) ] = D'(θ) + Q'(θ)^T E_θ[T(X)],

and we obtain, for det(Q'(θ)) ≠ 0 (which may be assumed),

E_θ[T(X)] = −[Q'(θ)^T]^{-1} D'(θ) = γ(θ).

Hence the estimator γ̂(θ) = T(X) is unbiased for γ(θ). Since T(X) − γ(θ) = T(X) + [Q'(θ)^T]^{-1} D'(θ), we get, by putting K_θ = [Q'(θ)^T]^{-1}, that

K_θ · (∂/∂θ) log f(x; θ) = T(X) + [Q'(θ)^T]^{-1} D'(θ) = T(X) − γ(θ),

i.e. the equality sign holds in the CR inequality.

2. From the CR equality the above representation of f and γ follows. If equality holds, then there exists a regular (k×k)-matrix K_θ such that

T(X) − γ(θ) = K_θ · (∂/∂θ) log f(x; θ)   µ-a.e.,

or K_θ^{-1} [T(X) − γ(θ)] = (∂/∂θ) log f(x; θ). We integrate both sides with respect to θ, where we put D(θ) := −∫ K_θ^{-1} γ(θ) dθ and Q(θ) := ∫ K_θ^{-1} dθ. Introducing an integration constant S(x), which in general depends on x, leads to

ln f(x; θ) = Q(θ)^T T(x) + D(θ) + S(x),

and with C(θ) := exp{D(θ)} and H(x) := exp{S(x)}, f and γ have the claimed form, with γ̂(θ) = T(X). □

Corollary 3.2.7: If, under the above regularity conditions, T is an unbiased estimator for γ(θ) which attains the Cramér–Rao lower bound, then T is minimal sufficient and complete.

An unbiased estimator which attains the CR bound is called an efficient estimator. In the scalar case, the ratio e(T, θ) between the CR bound and Var_θ(T) is called the efficiency of the estimator T. Obviously, 0 ≤ e(T, θ) ≤ 1. When comparing two unbiased estimators T_1 and T_2,

e_θ(T_1 | T_2) := Var_θ(T_2) / Var_θ(T_1)

is called the relative efficiency of T_1 with respect to T_2. lim_{n→∞} e_θ(T) is called the asymptotic efficiency, and lim_{n→∞} e_θ(T_1 | T_2) is called the asymptotic relative efficiency.
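Two added numerical illustrations of these notions (the Poisson and normal models and all parameter values are assumptions made for this example, not part of the notes): for X_1, ..., X_n iid Poisson(θ) the Fisher information per observation is 1/θ, so the CR bound for estimating γ(θ) = θ from n observations is θ/n, which X̄ attains, i.e. e(X̄, θ) = 1; for N(θ, 1) data the relative efficiency of the sample median with respect to X̄ is approximately 2/π ≈ 0.64 for large n.

import numpy as np

rng = np.random.default_rng(2)
reps = 100_000

# Poisson: Var(xbar) should equal the CR bound theta/n, so the efficiency is ~1.
theta, n = 2.0, 50
xbar = rng.poisson(theta, size=(reps, n)).mean(axis=1)
cr_bound = theta / n
print("Poisson: Var(xbar) ~", xbar.var(), "  CR bound =", cr_bound,
      "  efficiency e(xbar, theta) ~", cr_bound / xbar.var())

# Normal: relative efficiency of the sample median with respect to the sample mean.
x = rng.normal(0.0, 1.0, size=(reps, 200))
print("Normal: e(median | mean) ~", x.mean(axis=1).var() / np.median(x, axis=1).var(),
      "  asymptotic value 2/pi =", 2 / np.pi)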

3.3 Method of Moments

Let P = {P_θ | θ ∈ Θ} and γ : Θ → IR^k. In many cases the estimand γ(θ) can be written as a function of the moments of P_θ, γ(θ) = g(µ_1, ..., µ_k). In order to estimate γ(θ), one may then try to replace the unknown moments µ_j, j = 1, ..., k, by the corresponding sample moments.

Let T be any statistic with existing expectation µ_t(θ) := E_θ(T(X)) for all θ ∈ Θ. Then the SLLN (Khinchine) entails

T̄_n := (T(X_1) + T(X_2) + ... + T(X_n))/n → µ_t(θ) a.s.

If Θ ⊆ IR^k and T = (T_1, ..., T_k) is a statistic with existing expectation µ_T(θ) = (µ_{t_1}(θ), ..., µ_{t_k}(θ)), then one can try to find an estimator θ̂_n = (θ̂_{1,n}, ..., θ̂_{k,n}) as a solution of the system of equations

µ_{t_1}(θ̂_{1,n}, ..., θ̂_{k,n}) = (T_1(X_1) + ... + T_1(X_n))/n =: T̄_{1,n}
...
µ_{t_k}(θ̂_{1,n}, ..., θ̂_{k,n}) = (T_k(X_1) + ... + T_k(X_n))/n =: T̄_{k,n}.

Under regularity conditions we then have (SLLN)

γ̂(θ̂_n) = g(µ̂_{t_1}, ..., µ̂_{t_k}) → g(µ_{t_1}, ..., µ_{t_k}) = γ(θ) a.s.

If the moments up to order 2k exist, then the asymptotic normality of γ̂(θ̂_n) can be proved with the Lindeberg–Lévy central limit theorem.

Remark: In general, method of moments estimators are not unique, and they are in general not functions of sufficient statistics, so they cannot be efficient either.
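A small added sketch of the method (the Gamma model and all parameter values are assumptions made for this example): for X_i ~ Gamma(shape a, scale b), the first two moments give E X = ab and Var X = ab², so solving the two moment equations yields â = X̄²/S² and b̂ = S²/X̄, with S² the sample variance.

import numpy as np

# Method of moments for the two-parameter Gamma distribution.
rng = np.random.default_rng(3)
a, b, n = 3.0, 2.0, 5_000
x = rng.gamma(a, b, size=n)

xbar = x.mean()
s2 = x.var()                       # second central sample moment
a_hat, b_hat = xbar ** 2 / s2, s2 / xbar
print("method-of-moments estimates (a, b):", a_hat, b_hat, "   true values:", a, b)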

3.4 Maximum Likelihood Estimation

Def.: A solution θ̂ of

sup_{θ ∈ Θ} L(θ; x)   (3.2)

is called a maximum likelihood estimator (MLE) for θ.

With the ML principle one tries to find the mode of the underlying distribution. Since the mode is often a worse estimator of location than the mean or the median, ML estimators often have poor small-sample properties.

In practice it is often simpler to work with the log-likelihood function l than with L. If the µ-density f(x; θ) is positive µ-a.e., if Θ ⊆ IR^k is an open set, and if (∂/∂θ) f(x; θ) exists on Θ, then a solution of (3.2) fulfills the likelihood equations

(∂/∂θ) l(θ; x) := (∂/∂θ) log f(x; θ) = 0.   (3.3)

A solution of (3.3) is called an MLE in the weak sense; a solution of (3.2) is called a strict MLE.

Theorem 3.4.1: Let Θ ⊆ IR^k and Λ ⊆ IR^p be intervals, p ≤ k, and let γ : Θ → Λ be surjective. If θ̂ is an MLE for θ, then γ(θ̂) is an MLE for γ(θ).

Proof: For each λ ∈ Λ let Θ_λ := {θ ∈ Θ | γ(θ) = λ} and let M(λ; x) := sup_{θ ∈ Θ_λ} L(θ; x). Let θ̂ be an MLE for θ. Then θ̂ belongs to one of the sets Θ_λ, say to Θ_λ̂, and it holds that M(λ̂; x) = sup_{θ ∈ Θ_λ̂} L(θ; x) ≥ L(θ̂; x). Moreover λ̂ maximizes M, since M(λ̂; x) ≤ sup_{λ ∈ Λ} M(λ; x) = sup_{θ ∈ Θ} L(θ; x) = L(θ̂; x). □
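As an added illustration of Theorem 3.4.1 (the exponential model, the parameter value and the sample size are assumptions made for this example): for X_1, ..., X_n iid exponential with rate θ, i.e. f(x; θ) = θ e^{−θx}, the MLE is θ̂ = 1/X̄, and by invariance the MLE of γ(θ) = 1/θ (the mean) is γ(θ̂) = X̄. The sketch below, assuming SciPy is available, checks the closed form against a direct numerical maximization of the log-likelihood.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
theta, n = 0.7, 2_000
x = rng.exponential(1.0 / theta, size=n)     # numpy parametrizes by the scale 1/theta

def neg_loglik(t):
    # negative log-likelihood of the exponential(rate t) sample
    return -(n * np.log(t) - t * x.sum())

theta_hat_closed = 1.0 / x.mean()
theta_hat_num = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded").x

print("closed-form MLE of theta :", theta_hat_closed)
print("numerical MLE of theta   :", theta_hat_num)
print("MLE of 1/theta (the mean):", 1.0 / theta_hat_closed, "  = xbar =", x.mean())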

Theorem 3.4.2: Let S be a sufficient statistic for P = {P_θ | θ ∈ Θ} ≪ µ (σ-finite). If a unique MLE θ̂ exists, then it is a (measurable) function of S.

Proof: Since S is sufficient, there exists a factorization f(x; θ) = g(S(x); θ) h(x). Maximizing f with respect to θ is hence equivalent to maximizing g with respect to θ; g depends on x only through S, and therefore θ̂ depends on x only through S. □

Remark: If the likelihood equations (3.3) exist and if there exists a sufficient statistic S, then the MLEs are given as solutions of

(∂/∂θ) log g(S(x); θ) = 0.

Theorem 3.4.3: Suppose that the regularity conditions of the CR inequality are satisfied and that θ belongs to an open set in IR^k. If T is an unbiased estimator for γ(θ) whose covariance matrix attains the CR lower bound, then the likelihood equations have the unique solution γ(θ̂) = T(X).

Proof: According to the Cramér–Rao theorem (resp. its multivariate version) there exists a regular matrix K_θ such that

K_θ · (∂/∂θ) log f(x; θ) = T(X) − γ(θ)   µ-a.e.,

and since K_θ is regular, the likelihood equations have the unique solution γ(θ̂) = T(X). □

For large-sample considerations we introduce the following regularity conditions:

(A0) For θ ≠ θ', f(·; θ) ≠ f(·; θ') (identifiability).

(A1) The support of f(x; θ), i.e. the set A := {x | f(x; θ) > 0}, does not depend on θ ∈ Θ.

(A2) The sample observations X_1, ..., X_n are iid with a density f(x; θ) with respect to some σ-finite measure µ.

(A3) The parameter space Θ contains an open set Θ_0, and the true parameter θ_0 is an interior point of Θ_0.

(A4) The density f(x; θ) is differentiable with respect to θ ∈ Θ_0 for µ-almost all x, with derivative f'(x; θ) := (∂/∂θ) f(x; θ).

Theorem 3.4.4: Let (A0)-(A2) hold. Then

P_{θ_0}[L(θ_0; x) > L(θ; x)] → 1 for n → ∞ and for all θ ≠ θ_0.   (3.4)

Proof: We use Jensen's inequality, according to which, for φ convex on an open interval I with P(X ∈ I) = 1 and E|X| < ∞, φ[E(X)] ≤ E[φ(X)]; for concave φ the inequality is reversed, and it is strict if φ is strictly concave and X is not degenerate.

The claim (3.4) is equivalent to

P_{θ_0}[ (1/n) Σ_{i=1}^n log{f(X_i; θ)/f(X_i; θ_0)} < 0 ] → 1 for all θ ≠ θ_0.

According to the SLLN, the average on the left-hand side converges a.s. to E_{θ_0}[log{f(X; θ)/f(X; θ_0)}]. Since log(·) is strictly concave and, by (A0), the ratio f(X; θ)/f(X; θ_0) is not P_{θ_0}-degenerate, Jensen's inequality yields

E_{θ_0}[log{f(X; θ)/f(X; θ_0)}] < log{E_{θ_0}[f(X; θ)/f(X; θ_0)]},

where the right-hand side is equal to zero. This entails (3.4). □

If therefore the density f is a smooth function of θ, then one may expect that the MLE for θ will lie close to θ_0.
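A quick Monte Carlo check of Theorem 3.4.4 (an added illustration; the normal location model, the alternative value θ = 0.5 and the sample sizes are assumptions made for this example): the fraction of samples with L(θ_0; x) > L(θ; x) approaches 1 as n grows.

import numpy as np

rng = np.random.default_rng(5)
theta0, theta1, reps = 0.0, 0.5, 20_000

def loglik(theta, x):
    # log-likelihood of N(theta, 1), additive constants dropped
    return -0.5 * ((x - theta) ** 2).sum(axis=1)

for n in (1, 5, 20, 100):
    x = rng.normal(theta0, 1.0, size=(reps, n))
    frac = (loglik(theta0, x) > loglik(theta1, x)).mean()
    print("n =", n, "   P[L(theta0) > L(theta1)] ~", frac)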

Theorem 3.4.5: Let (A0)-(A4) hold. Then, with probability tending to 1, the likelihood equations

l'(θ; x) := (∂/∂θ) l(θ; x) = Σ_{j=1}^n f'(X_j; θ)/f(X_j; θ) = 0

have a solution θ̂_n with θ̂_n → θ_0 in probability for n → ∞.

Proof: Let δ > 0 be sufficiently small such that (according to (A3)) (θ_0 − δ, θ_0 + δ) ⊆ Θ_0, and let

S_n := {x | l(θ_0; x) > l(θ_0 − δ; x) and l(θ_0; x) > l(θ_0 + δ; x)}.

According to Theorem 3.4.4, P_{θ_0}(S_n) → 1 for n → ∞. For each x ∈ S_n there is hence a θ̂_n with θ_0 − δ < θ̂_n < θ_0 + δ at which l(θ; x) attains a local maximum, and therefore l'(θ̂_n) = 0. This entails that for each small enough δ there exists a sequence θ̂_n = θ̂_n(δ) of solutions such that P_{θ_0}(|θ̂_n − θ_0| < δ) → 1 for n → ∞. It remains to show that such a sequence exists which does not depend on δ. Let θ*_n be the solution closest to θ_0. (It exists since, because of the continuity of l'(θ), the limit of a sequence of solutions is itself a solution.) Then it naturally holds that P_{θ_0}(|θ*_n − θ_0| < δ) → 1 for all δ > 0. □

Remark: If the solutions are not unique, then the above theorem does not yield a consistent sequence of estimators: θ_0 is unknown, and the data do not tell you which root to choose.
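A direct added illustration of the consistency statement (the Poisson model, δ, the sample sizes and the number of replications are assumptions made for this example): for X_i ~ Poisson(θ_0) the likelihood equation has the unique root θ̂_n = X̄_n, and the probability that it lies within δ of θ_0 tends to 1.

import numpy as np

rng = np.random.default_rng(6)
theta0, delta, reps = 3.0, 0.1, 10_000

for n in (10, 100, 1000):
    theta_hat = rng.poisson(theta0, size=(reps, n)).mean(axis=1)   # root of the likelihood equation
    coverage = (np.abs(theta_hat - theta0) < delta).mean()
    print("n =", n, "   P(|theta_hat - theta0| < delta) ~", coverage)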

In order to show asymptotic efficiency in the univariate case, further regularity conditions are needed:

(A5) Θ ⊆ IR is an open interval.

(A6) For x ∈ A the density f(x; θ) is three times continuously differentiable with respect to θ.

(A7) The integral ∫ f(x; θ) µ(dx) can be differentiated three times with respect to θ under the integral sign.

(A8) For the Fisher information, 0 < I(θ) < ∞ holds.

(A9) To every θ_0 ∈ Θ there exist a δ > 0 and a function M(x) (both may depend on θ_0) such that

|(∂³/∂θ³) log f(x; θ)| ≤ M(x) for all x ∈ A and θ_0 − δ < θ < θ_0 + δ, with E_{θ_0}[M(X)] < ∞.

Theorem 3.4.6: Let the conditions (A1), (A2), (A5)-(A9) hold. Then for each consistent sequence θ̂_n of solutions of the likelihood equations,

√n (θ̂_n − θ_0) →_L N(0, I(θ_0)^{-1}).

Proof: For every fixed x ∈ A, a Taylor series expansion of l'(θ̂_n) around θ_0 yields

0 = (∂/∂θ) log f(x; θ̂_n) = (∂/∂θ) log f(x; θ_0) + (θ̂_n − θ_0)(∂²/∂θ²) log f(x; θ_0) + (1/2)(θ̂_n − θ_0)²(∂³/∂θ³) log f(x; θ*_n),

where θ*_n lies between θ_0 and θ̂_n. With obvious abbreviations this is equal to

0 = l'(θ̂_n) = l'(θ_0) + (θ̂_n − θ_0) l''(θ_0) + (1/2)(θ̂_n − θ_0)² l'''(θ*_n),

or

(θ̂_n − θ_0) [ −l''(θ_0) − (1/2)(θ̂_n − θ_0) l'''(θ*_n) ] = l'(θ_0),

and, provided the expression [...] is ≠ 0, we obtain

√n (θ̂_n − θ_0) = n^{-1/2} l'(θ_0) / [ −n^{-1} l''(θ_0) − (1/(2n))(θ̂_n − θ_0) l'''(θ*_n) ].   (3.5)

In Theorem 3.4.5 we have already shown that (θ̂_n − θ_0) converges to zero in probability for n → ∞. We will now show that

(1) n^{-1/2} l'(θ_0) converges weakly to N(0, I(θ_0)),

(2) −n^{-1} l''(θ_0) converges to I(θ_0) > 0 a.s., and hence in probability,

(3) n^{-1} l'''(θ*_n) is stochastically bounded.

(1): n^{-1/2} l'(θ_0) = √n · (1/n) Σ_{i=1}^n (∂/∂θ) log f(X_i; θ_0) =: √n B_n, where according to the SLLN B_n converges a.s. to

B_0 = E_{θ_0}[ (∂/∂θ) log f(X; θ_0) ] = 0.

According to the CLT, √n [B_n − 0] converges in distribution to a normal distribution with expected value zero and variance

E_{θ_0}[ ((∂/∂θ) log f(X; θ_0))² ] = I(θ_0),

where I(θ_0) > 0 according to (A8).

(2): Since, with l = log f(x; θ), we have l' = f'/f and l'' = (f'' f − (f')²)/f², it follows that

−n^{-1} l''(θ_0) = (1/n) Σ_{i=1}^n [ (f'(X_i; θ_0))² − f''(X_i; θ_0) f(X_i; θ_0) ] / f²(X_i; θ_0).

According to the SLLN this term converges a.s. (and hence also in probability) to I(θ_0), since

E_{θ_0}[ (f'/f)² − f''/f ] = ∫ ((f')²/f) dµ − ∫ f'' dµ = ∫ ((f')²/f) dµ

(the second integral vanishes, since ∫ f'' dµ = (d²/dθ²) ∫ f dµ = 0 by (A7)), and

∫ ((f')²/f) dµ = E_{θ_0}[ ((∂/∂θ) log f(X; θ_0))² ] = I(θ_0).

(3): Finally, n^{-1} l'''(θ*_n) = (1/n) Σ_{i=1}^n (∂³/∂θ³) log f(X_i; θ*_n), and with (A9) we get

|n^{-1} l'''(θ*_n)| ≤ (1/n) [M(X_1) + ... + M(X_n)].

The right-hand side converges to E_{θ_0}[M(X)] < ∞ according to (A9). Since (θ̂_n − θ_0) converges to zero in probability according to Theorem 3.4.5, the second term in the denominator of (3.5) converges to zero as well.

Putting (1) to (3) together, we have shown that √n(θ̂_n − θ_0) converges weakly to N(0, I(θ_0)^{-1}). □
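The limiting distribution in Theorem 3.4.6 can be checked numerically (an added sketch; the exponential-rate model, n and the number of replications are assumptions made for this example): for X_i ~ Exp(rate θ_0) one has I(θ_0) = 1/θ_0², so √n(θ̂_n − θ_0) with θ̂_n = 1/X̄ should be approximately N(0, θ_0²).

import numpy as np

rng = np.random.default_rng(7)
theta0, n, reps = 2.0, 500, 20_000

x = rng.exponential(1.0 / theta0, size=(reps, n))
theta_hat = 1.0 / x.mean(axis=1)               # MLE of the rate
z = np.sqrt(n) * (theta_hat - theta0)

print("empirical mean of sqrt(n)(theta_hat - theta0)    :", z.mean(), "  (theory: 0)")
print("empirical variance of sqrt(n)(theta_hat - theta0):", z.var(), " (theory: theta0^2 =", theta0 ** 2, ")")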

Remarks:

(1) A sequence of estimators which fulfils the conditions of Theorem 3.4.6 is called an efficient likelihood estimator.

(2) (A6) and (A7) entail, for all θ ∈ Θ_0,

(i) E_θ[ (∂/∂θ) log f(X; θ) ] = 0 and

(ii) E_θ[ −(∂²/∂θ²) log f(X; θ) ] = E_θ[ ((∂/∂θ) log f(X; θ))² ] = I(θ).

Corollary 3.4.7: Let the conditions of Theorem 3.4.6 hold. If the likelihood equations have a unique solution for all x and n, resp. if the probability of multiple roots goes to zero for n → ∞, then the MLE is asymptotically efficient.

Some final remarks:

(1) In general, the likelihood equations (3.3) cannot be solved explicitly. In this case the roots can only be found by numerical procedures (with the attendant problems of existence, uniqueness and convergence of solutions for the algorithms used).

(2) MLEs need strong prerequisites (conditions). Under certain conditions, consistency and asymptotic normality still hold even if the distributional assumptions do not exactly coincide with reality. But in this case asymptotic efficiency is lost: already small deviations between reality and the model assumptions can lead to a considerable loss of efficiency.

(3) Consistency and asymptotic normality may hold even if some regularity conditions of the above theorems are violated.

For the multivariate case Θ ⊆ IR^k, a result like Theorem 3.4.6 can be obtained in a similar way if the conditions (A5), ... are reformulated accordingly.
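To illustrate final remark (1), here is an added sketch of a Newton-Raphson iteration for a likelihood equation without a closed-form root (the logistic location model, the starting value and all parameter values are assumptions made for this example): the iteration θ_{m+1} = θ_m − l'(θ_m)/l''(θ_m) is applied to the log-likelihood of X_1, ..., X_n iid logistic with location θ.

import numpy as np

# Logistic location model: f(x; theta) = exp(-(x - theta)) / (1 + exp(-(x - theta)))^2.
# The likelihood equation sum_i [1 - 2/(1 + exp(x_i - theta))] = 0 has no closed-form solution.
rng = np.random.default_rng(8)
theta0, n = 1.0, 1_000
x = rng.logistic(loc=theta0, size=n)

def score(theta):              # l'(theta)
    return np.sum(1.0 - 2.0 / (1.0 + np.exp(x - theta)))

def score_derivative(theta):   # l''(theta), always negative here
    e = np.exp(x - theta)
    return np.sum(-2.0 * e / (1.0 + e) ** 2)

theta = np.median(x)           # reasonable starting value
for _ in range(20):
    step = score(theta) / score_derivative(theta)
    theta -= step
    if abs(step) < 1e-10:
        break

print("Newton-Raphson root of the likelihood equation:", theta, "   true theta0 =", theta0)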
