Lecture Note 8

Point Estimators and Point Estimation Methods

MIT 14.30 Spring 2006

Herman Bennett

Given a parameter with unknown value, the goal of point estimation is to use a sample to compute a number that represents, in some sense, a good guess for the true value of the parameter.

19 Definitions

19.1 Parameter

A pmf/pdf can be equivalently written as f_X(x) or f_X(x | θ), where θ represents the constants that fully define the distribution. For example, if X is a normally distributed RV, the constants µ and σ fully define the distribution. These constants are called parameters and are generally denoted by the Greek letter θ. [1]

Example 19.1. Normal distribution: f(x | θ) = f(x | µ, σ), 2 parameters: θ_1 = µ and θ_2 = σ. Binomial distribution: f(x | θ) = f(x | n, p), 2 parameters: θ_1 = n and θ_2 = p. Poisson distribution: f(x | θ) = f(x | λ), 1 parameter: θ = λ. Gamma distribution: f(x | θ) = f(x | α, β), 2 parameters: θ_1 = α and θ_2 = β.

19.2 (Point) Estimator

A (point) estimator of θ, denoted by θ̂, is a statistic (a function of the random sample):

    θ̂ = r(X_1, X_2, ..., X_n).    (63)

Caution: These notes are not necessarily self-explanatory. They are to be used as a complement to (and not as a substitute for) the lectures.

[1] Another way of saying this: parameters are constants that index a family of distributions.
Note that θ̂ does not depend directly on θ, but only indirectly, through the random process generating each X_i.

A (point) estimate of θ is a realization of the estimator θ̂ (i.e., a function of the realization of the random sample):

    θ̂ = r(x_1, x_2, ..., x_n).    (64)

Example 19.2. Assume we have a random sample X_1, ..., X_10 from a normal distribution N(µ, σ²) and we want an estimate of the parameter µ (which is unknown). There is an infinite number of estimators of µ that we could construct. In fact, any function of the random sample qualifies as an estimator of µ, for example:

    θ̂ = r(X_1, X_2, ..., X_10) = X_10,
    θ̂ = (X_10 + X_1)/2,
    θ̂ = X̄_10 = (1/10) Σ_{i=1}^{10} X_i,
    θ̂ = 1.5 X_2,
    etc.

Example 19.3. Assume a random sample X_1, ..., X_n from a U[0, θ] distribution, where θ is unknown. Propose 3 different estimators.
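To make the estimator/estimate distinction concrete, the following sketch simulates one realization of the sample in Example 19.2 and evaluates four of the infinitely many candidate estimators of µ at that realization. The "true" values of µ and σ and the seed are hypothetical choices for illustration, not part of the notes.

```python
import random

random.seed(0)

# Hypothetical "true" parameters, chosen only for illustration.
mu, sigma = 5.0, 2.0

# One realization of the random sample X_1, ..., X_10.
x = [random.gauss(mu, sigma) for _ in range(10)]

# Four of the (infinitely many) estimators of mu from Example 19.2,
# evaluated at this realization:
est_last = x[9]                   # X_10
est_pair = (x[9] + x[0]) / 2      # (X_10 + X_1)/2
est_mean = sum(x) / 10            # sample mean, X-bar_10
est_odd = 1.5 * x[1]              # 1.5 * X_2

print(est_last, est_pair, est_mean, est_odd)
```

All four numbers are legitimate estimates of µ; the criteria in Section 20 are what let us say some are better than others.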
20 Evaluating (Point) Estimators

Since there are many possible estimators, we need to define some characteristics in order to evaluate and rank them.

20.1 Unbiasedness

An estimator θ̂ is said to be an unbiased estimator of the parameter θ if, for every possible value of θ,

    E(θ̂) = θ.    (65)

If θ̂ is not unbiased, it is said to be a biased estimator, where the difference E(θ̂) − θ is called the bias of θ̂.

If the random sample X_1, ..., X_n is iid with E(X_i) = θ, then the sample mean estimator

    θ̂ = X̄_n = (1/n) Σ_{i=1}^n X_i    (66)

is an unbiased estimator of the population mean: E(X̄_n) = θ (see Example 18.1 in Lecture Note 7).

If the random sample X_1, ..., X_n is iid with E(X_i) = µ and Var(X_i) = θ, then the sample variance estimator

    θ̂ = S² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄_n)²    (67)

is an unbiased estimator of the population variance: E(S²) = θ (see Example 18.1 in Lecture Note 7).

Example 20.1. Let X_i ~ U[0, θ]. Assume a random sample of size n and define an estimator of θ as

    θ̂ = (2/n) Σ_{i=1}^n X_i.

Is θ̂ biased?
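A quick Monte Carlo check of Example 20.1 (not part of the notes; the true θ, sample size, and replication count are arbitrary illustrative choices): since E(X_i) = θ/2 for X_i ~ U[0, θ], we have E(θ̂) = (2/n) · n · θ/2 = θ, so the average of many simulated estimates should land near θ.

```python
import random

random.seed(1)

theta = 3.0   # hypothetical true parameter, for illustration only
n = 50        # sample size
reps = 20000  # number of simulated samples

# Simulate theta_hat = (2/n) * sum(X_i) for X_i ~ U[0, theta] many
# times; unbiasedness means the average estimate should be near theta.
estimates = []
for _ in range(reps):
    sample = [random.uniform(0, theta) for _ in range(n)]
    estimates.append(2 * sum(sample) / n)

mean_estimate = sum(estimates) / reps
print(mean_estimate)  # should be close to theta = 3.0
```

The simulation suggests the answer to Example 20.1: θ̂ is unbiased.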
20.2 Efficiency

Let θ̂_1 and θ̂_2 be two unbiased estimators of θ. θ̂_1 is said to be more efficient than θ̂_2 if, for a given sample size n,

    Var(θ̂_1) < Var(θ̂_2),    (68)

where Var(θ̂_i) is the variance of the estimator.

Let θ̂_1 be an unbiased estimator of θ. θ̂_1 is said to be efficient, or the minimum variance unbiased estimator, if for any unbiased estimator of θ, θ̂_k,

    Var(θ̂_1) ≤ Var(θ̂_k).    (69)

Do not confuse the variance of the estimator θ̂, Var(θ̂), with the sample variance estimator S², which is an unbiased estimator of the population variance σ² (!).

Example 20.2. How would you compare the efficiency of the estimators in Example 19.2? Which of these estimators is unbiased?

20.3 Mean Squared Error

Why restrict ourselves to the class of unbiased estimators? The mean squared error (MSE) specifies for each estimator θ̂ a trade-off between bias and efficiency:

    MSE(θ̂) = E[(θ̂ − θ)²] = Var(θ̂) + (bias(θ̂))².    (70)

θ̂ is the minimum mean squared error estimator of θ if, among all possible estimators of θ, it has the smallest MSE for a given sample size n.
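The decomposition in equation (70) can be verified numerically. The sketch below (an illustration, not from the notes; the shrinkage factor 0.9 and all parameter values are arbitrary) simulates a deliberately biased estimator of µ and checks that the empirical MSE equals the empirical variance plus the squared bias.

```python
import random

random.seed(2)

mu, sigma, n, reps = 2.0, 1.0, 10, 50000

# A deliberately biased (shrunken) estimator of mu, used only to
# illustrate the decomposition MSE = Var + bias^2 from equation (70).
estimates = []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    estimates.append(0.9 * sum(x) / n)  # shrinks the sample mean

m = sum(estimates) / reps
var = sum((e - m) ** 2 for e in estimates) / reps
bias = m - mu
mse = sum((e - mu) ** 2 for e in estimates) / reps

print(mse, var + bias ** 2)  # the two numbers coincide
```

With these empirical definitions the identity holds exactly (up to floating-point error), mirroring the algebra behind (70).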
Example 20.3. Graph the pdf of two estimators such that the bias of the first estimator is less of a problem than its inefficiency (and vice versa for the other estimator).

20.4 Asymptotic Criteria

20.4.1 Consistency

Let θ̂_n be an estimator of θ. θ̂_n is said to be consistent if θ̂_n →p θ (Law of Large Numbers; Lecture Note 7).

Example 20.4. Assume a random sample of size n from a population f(x), where E(X) = µ (unknown). Which of the following estimators is consistent?

    µ̂_1 = (1/n) Σ_{i=1}^n X_i,    µ̂_2 = (1/5) Σ_{i=1}^5 X_i,    µ̂_3 = (1/(n + 5)) Σ_{i=1}^n X_i.

MSE(θ̂_n) → 0 as n → ∞ implies consistency.

20.4.2 Asymptotic Efficiency

Let θ̂_1 be an estimator of θ. θ̂_1 is said to be asymptotically efficient if it satisfies the definition of an efficient estimator, equation (69), as n → ∞.
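Consistency is easy to see by simulation. The sketch below (illustrative only; the exponential population and the true µ = 1.5 are hypothetical choices) shows the sample mean concentrating around µ as n grows, exactly the Law of Large Numbers behavior invoked in 20.4.1.

```python
import random

random.seed(3)

mu = 1.5  # hypothetical true mean, for illustration

# Consistency of the sample mean: as n grows, X-bar_n concentrates
# around mu. Exponential data with rate 1/mu has E(X) = mu.
for n in [10, 1000, 100000]:
    x = [random.expovariate(1 / mu) for _ in range(n)]
    print(n, sum(x) / n)
```

The printed averages drift toward 1.5 as n increases; no single run proves convergence, but the pattern illustrates θ̂_n →p θ.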
21 Point Estimation Methods

The following are two standard methods used to construct (point) estimators.

21.1 Method of Moments (MM)

Let X_1, X_2, ..., X_n be a random sample from a population with pmf/pdf f(x | θ_1, ..., θ_k), where θ_1, ..., θ_k are unknown parameters. One way to estimate these parameters is by equating the first k population moments to the corresponding k sample moments. The resulting k estimators are called the method of moments (MM) estimators of the parameters θ_1, ..., θ_k. [2] The procedure is summarized as follows:

    Sample moment                Population moment (theoretical)
    (1/n) Σ_{i=1}^n X_i    = E(X_i)       First moment
    (1/n) Σ_{i=1}^n X_i²   = E(X_i²)      Second moment
    (1/n) Σ_{i=1}^n X_i³   = E(X_i³)      Third moment
    ...
    (1/n) Σ_{i=1}^n X_i^k  = E(X_i^k)     kth moment

Solving this system of k equations and k unknowns yields θ̂_1, ..., θ̂_k. Note that i) E(X_i^j) = g_j(θ_1, ..., θ_k), a function of the parameters; ii) the realization of each sample moment is a scalar.

[2] This estimation method was introduced by Karl Pearson in 1894.
Example 21.1. Assume a random sample of size n from a N(µ, σ²) population, where µ and σ² are unknown parameters. Find the MM estimators of both parameters.

Example 21.2. Assume a random sample of size n from a gamma distribution:

    f(x) = (1/(Γ(α) β^α)) x^{α−1} e^{−x/β}  for 0 < x < ∞.

Assume also that the random sample realization is characterized by (1/n) Σ_{i=1}^n x_i = 7.29 and (1/n) Σ_{i=1}^n x_i² = 85.59. Find the MM estimators of α and β (parameters). Remember that E(X_i) = αβ and Var(X_i) = αβ².
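One way to work Example 21.2: equate the first two sample moments to E(X) = αβ and E(X²) = Var(X) + E(X)² = αβ² + (αβ)², then solve the 2-equation system. The short script below carries out that arithmetic (the solution method is standard; the code itself is just a sketch of the computation).

```python
# Method of moments for Example 21.2. The two moment equations are
#   (1/n) sum x_i   = E(X)   = alpha * beta                   = 7.29
#   (1/n) sum x_i^2 = E(X^2) = alpha*beta^2 + (alpha*beta)^2  = 85.59
# using E(X^2) = Var(X) + E(X)^2 with Var(X) = alpha * beta^2.

m1 = 7.29   # first sample moment (given in the example)
m2 = 85.59  # second sample moment (given in the example)

beta_hat = (m2 - m1 ** 2) / m1  # = alpha*beta^2 / (alpha*beta)
alpha_hat = m1 / beta_hat

print(alpha_hat, beta_hat)  # roughly alpha ~ 1.64, beta ~ 4.45
```

Dividing the second-moment residual m2 − m1² = αβ² by the first moment αβ isolates β; α then follows from the first equation.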
21.2 Maximum Likelihood Estimation (MLE)

Let X_1, X_2, ..., X_n be a random sample from a population with pmf/pdf f(x | θ_1, ..., θ_k), where θ_1, ..., θ_k are unknown parameters. Another way to estimate these parameters is to find the values θ̂_1, ..., θ̂_k that maximize the likelihood that the observed sample is generated by f(x | θ̂_1, ..., θ̂_k). The joint pmf/pdf of the random sample, f(x_1, x_2, ..., x_n | θ_1, ..., θ_k), is called the likelihood function, and it is denoted by L(θ | x):

    L(θ | x) = L(θ_1, ..., θ_k | x_1, ..., x_n) = f(x_1, ..., x_n | θ_1, ..., θ_k)    (in general)
             = Π_{i=1}^n f(x_i | θ_1, ..., θ_k)    (random sample, iid)

where θ and x are vectors such that θ = (θ_1, ..., θ_k) and x = (x_1, ..., x_n).

For a given sample vector x, denote by θ̂^MLE(x) the parameter value of θ at which L(θ | x) attains its maximum. Then, θ̂^MLE(x) is the maximum likelihood estimator (MLE) of the (unknown) parameters θ_1, ..., θ_k. [3]

Intuition: Discrete case.

[3] This estimation method was introduced by R.A. Fisher in 1912.
Does a global maximum exist? Can we find it? Is it unique? ...Back to Calculus 101:

    FOC: ∂L(θ | x)/∂θ_i = 0,  i = 1, ..., k  (for a well-behaved function).

You also need to check that it is really a maximum and not a minimum (2nd derivative).

Many times it is easier to look for the maximum of ln L(θ | x), which has the same maximizer since ln is a monotonic transformation (...back to Calculus 101).

Invariance property of MLE: the MLE of a function τ(θ) of the parameter is that function of the MLE, τ(θ)^MLE = τ(θ̂^MLE).

For large samples the MLE is an excellent estimator of θ (consistent and asymptotically efficient). No wonder it is widely used. But... i) it can be numerically sensitive (robustness); ii) it is not necessarily unbiased; iii) it might be hard to compute.
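The "maximize ln L instead of L" advice can be illustrated with iid Bernoulli(p) data. The sketch below (illustrative only; the data vector is hypothetical) maximizes the log-likelihood by brute-force grid search rather than by solving the FOC analytically; the analytic solution for this model is the sample mean, which the grid search recovers.

```python
import math

# Maximizing the log-likelihood ln L(p | x) for iid Bernoulli(p) data
# by brute-force grid search (a minimal sketch; in practice one solves
# the first-order condition, which here yields the sample mean).

x = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical sample

def log_likelihood(p, data):
    # ln L(p | x) = sum_i [ x_i ln p + (1 - x_i) ln(1 - p) ]
    return sum(math.log(p) if xi == 1 else math.log(1 - p) for xi in data)

grid = [i / 1000 for i in range(1, 1000)]  # interior values of p
p_mle = max(grid, key=lambda p: log_likelihood(p, x))

print(p_mle, sum(x) / len(x))  # both are 0.7
```

Working with ln L avoids multiplying many small probabilities together (numerical underflow), which is one concrete instance of the "numerical sensitivity" caveat above.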
Example 21.3. Assume a random sample from a N(µ, σ²) population, where the parameters are unknown. Find the ML estimators of both parameters.

Example 21.4. Assume a random sample from a U(0, θ) population, where θ is unknown. Find θ̂^MLE.
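As a numerical companion to these two exercises (not a substitute for the derivations asked for above), the sketch below evaluates the standard closed-form answers on simulated data; all "true" parameter values and sample sizes are hypothetical choices. For U(0, θ) the likelihood θ^(−n) · 1{max_i x_i ≤ θ} is decreasing in θ, so it is maximized at the smallest admissible value, θ̂^MLE = max_i x_i; for the normal, µ̂^MLE is the sample mean and σ̂²^MLE divides by n (not n − 1), so it is biased but consistent, an instance of caveat ii) above.

```python
import random

random.seed(4)

# Example 21.4: MLE for U(0, theta) is the sample maximum.
theta = 2.5  # hypothetical true value, for illustration
x = [random.uniform(0, theta) for _ in range(200)]
theta_mle = max(x)  # always <= theta, and close to it for large n

# Example 21.3: MLEs for N(mu, sigma^2).
y = [random.gauss(1.0, 0.5) for _ in range(200)]
mu_mle = sum(y) / len(y)
sigma2_mle = sum((yi - mu_mle) ** 2 for yi in y) / len(y)  # divides by n

print(theta_mle, mu_mle, sigma2_mle)
```

Note that θ̂^MLE = max_i x_i underestimates θ in every sample, another concrete example of an MLE that is biased yet consistent.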