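Department of Mathematics
Ma 3/103: Introduction to Probability and Statistics
KC Border, Winter 2017 (v. 2017.02.13::17.42)

Lecture 19: Estimation II

Relevant textbook passages: Larsen and Marx [1]: Sections 5.2-5.7

19.1 The method of moments

(Larsen and Marx [1]: pp. 293-296)

Let $X_1, \dots, X_n$ be independent and identically distributed with density $f(x; \theta_1, \dots, \theta_m)$. Then the $k$th sample moment is
$$ \frac{1}{n} \sum_{i=1}^{n} x_i^k. $$
The distribution's $k$th moment is
$$ \int x^k f(x; \theta_1, \dots, \theta_m)\,dx. $$
Solving for the $(\hat\theta_1, \dots, \hat\theta_m)$ that equates the first $m$ moments of the distribution to the first $m$ sample moments is called the method of moments:
$$ \int x^k f(x; \hat\theta_1, \dots, \hat\theta_m)\,dx = \frac{1}{n} \sum_{i=1}^{n} x_i^k \qquad (k = 1, \dots, m). $$

19.1.1 Example (Method of moments and the Gamma distribution)

(Larsen and Marx [1]: Example 5.2.5, pp. 294-295)

Recall that the Gamma$(r, \lambda)$ distribution ($r > 0$, $\lambda > 0$) has density given by
$$ f(t) = \frac{\lambda^r}{\Gamma(r)}\, t^{r-1} e^{-\lambda t} \qquad (t > 0). $$
The parameter $r$ is the shape parameter, and $\lambda$ is the scale parameter (strictly speaking a rate, since the scale is $1/\lambda$). The mean and variance of a Gamma$(r, \lambda)$ random variable are given by
$$ E X = \frac{r}{\lambda}, \qquad \operatorname{Var} X = \frac{r}{\lambda^2}. $$
It is difficult to derive closed form expressions for the MLE of a Gamma, because the gamma function $\Gamma(r)$ does not have a closed form expression. But it is straightforward to derive the method of moments estimators for $r$ and $\lambda$. Using the fact that for any random variable $X$ we have $E(X^2) = \operatorname{Var} X + (E X)^2$ (see Section 6.10), given a sample $x_1, \dots, x_n$ of independent draws from a Gamma, we just need to solve the two equations
$$ \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{r}{\lambda}, \qquad \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{r}{\lambda^2} + \left(\frac{r}{\lambda}\right)^2 = \frac{r(r+1)}{\lambda^2}. $$
The solution is gotten by solving the first for $r = \lambda \bar x$ and substituting that into the second to get
$$ \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{(\lambda \bar x)(1 + \lambda \bar x)}{\lambda^2} = \frac{\lambda \bar x + \lambda^2 \bar x^2}{\lambda^2} = \frac{\bar x}{\lambda} + \bar x^2. $$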
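As a numerical illustration, the two moment equations can be solved directly from data. The Python sketch below is not part of the original notes: the function name `gamma_method_of_moments`, the simulated sample, and the true values $r = 3$, $\lambda = 2$ are all made up for illustration. It rearranges the last displayed equation for $\lambda$ and then uses $r = \lambda \bar x$. Note that `random.gammavariate` takes a scale parameter, so the rate $\lambda$ enters as $1/\lambda$.

```python
import random
from statistics import fmean


def gamma_method_of_moments(xs):
    """Return (r_hat, lam_hat) for a Gamma(r, lam) sample, lam being the rate."""
    m1 = fmean(xs)                        # first sample moment, x-bar
    m2 = fmean([x * x for x in xs])       # second sample moment
    spread = m2 - m1 * m1                 # equals m1 / lam in the moment equations
    if spread <= 0:
        raise ValueError("sample must contain at least two distinct values")
    lam_hat = m1 / spread                 # from m2 = m1 / lam + m1**2
    r_hat = lam_hat * m1                  # from m1 = r / lam
    return r_hat, lam_hat


# Illustrative check on simulated data (assumed true values r = 3, lam = 2).
rng = random.Random(0)
sample = [rng.gammavariate(3.0, 1.0 / 2.0) for _ in range(50_000)]
r_hat, lam_hat = gamma_method_of_moments(sample)
print(f"r_hat = {r_hat:.3f}, lam_hat = {lam_hat:.3f}")
```

With a sample this large the estimates should land close to the assumed true values, though the exact figures depend on the seed.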
Solving for $\lambda$ gives
$$ \hat\lambda = \frac{\bar x}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar x^2} \qquad\text{and}\qquad \hat r = \hat\lambda\, \bar x. $$

19.1.2 Example (Method of moments and the Normal distribution)

The Normal$(\mu, \sigma^2)$ has mean $\mu$ and variance $\sigma^2$, so the method of moments estimators solve
$$ \hat\mu = \bar x, \qquad \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \hat\sigma^2 + \hat\mu^2. $$
Solving gives
$$ \hat\mu = \bar x, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar x)^2. $$
The moments estimators for $\mu$ and $\sigma^2$ are the same as the maximum likelihood estimators.

19.2 Other ways to generate estimators

Most other general methods for finding estimators involve some sort of maximization or minimization. For instance, there are minimum $\chi^2$ estimators, which frequently have nice properties. Mosteller's [3] analysis of the World Series considers minimum $\chi^2$ estimation in addition to MLE. I'll describe this kind of estimation later on, when we discuss $\chi^2$ tests.

Most general methods for generating estimators involve choosing a measure of either similarity or distance between the observed data and the data that might have been generated by the data generating process (dgp) with a given parameter. There are deep reasons why such estimators have good properties, but that's a topic for a more advanced course.

19.3 Digression: The quantiles $z_\alpha$

(Larsen and Marx [1]: p. 307)

Statisticians have adopted the following special notation. Let $Z$ be a Standard Normal random variable, with cumulative distribution function denoted $\Phi$. For $0 < \alpha < 1$, define $z_\alpha$ by
$$ P(Z > z_\alpha) = \alpha $$
or equivalently
$$ P(Z \le z_\alpha) = 1 - \alpha. $$
Then
$$ z_\alpha = \Phi^{-1}(1 - \alpha). $$
This is something you can look up with R or Mathematica's built-in quantile functions. (Remember the quantile function is $\Phi^{-1}$.) By symmetry,
$$ P(Z < -z_\alpha) = \alpha \qquad\text{and}\qquad P(|Z| > z_\alpha) = 2\alpha. $$
Consequently,
$$ P(-z_\alpha \le Z \le z_\alpha) = 1 - 2\alpha. $$
This last equality is often expressed as
$$ P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha. $$
Here are some commonly used values of $\alpha$ and the corresponding $z_\alpha$ to two decimal places.

  $\alpha$    $z_\alpha$    $1 - 2\alpha$
  0.1         1.28          0.80
  0.05        1.64          0.90
  0.025       1.96          0.95
  0.01        2.33          0.98
  0.005       2.58          0.99

[Figure: the standard normal density plotted on $[-3, 3]$, with the two tails beyond $\pm 1.96$ shaded.]

This shaded area is the probability of the event $(|Z| > 1.96)$, which is equal to 0.05. Values outside the interval $(-1.96, 1.96)$ are often regarded as unlikely to have occurred by chance.

19.4 Confidence intervals for Normal means if $\sigma$ is known

So far we have looked at point estimates, and barely made a dent in the subject. (Erich L. Lehmann's classic Theory of Point Estimation [2] runs to about 500 pages.) But it is time to move on. Interval estimates are closely related to hypothesis testing (coming up soon) and are sometimes more useful than point estimates.

Go back to the Normal estimation case. The maximum likelihood estimator $\hat\mu_{\text{MLE}}$ of the mean $\mu$ is just the sample mean $\bar x = \sum_i x_i / n$, but how good is that estimate? If $X_1, \dots, X_n$ are independent and identically distributed $N(\mu, \sigma^2)$, then
$$ \hat\mu_{\text{MLE}} = \frac{X_1 + \cdots + X_n}{n} \sim N(\mu, \sigma^2/n), $$
so by standardizing $\hat\mu$ we have
$$ \frac{\hat\mu - \mu}{\sigma/\sqrt{n}} \sim N(0, 1). $$
We have just seen that
$$ z_{0.025} = 1.96. $$
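These quantiles are easy to check numerically. The sketch below uses Python's standard-library `statistics.NormalDist` in place of the R or Mathematica quantile functions mentioned earlier; the helper name `z_quantile` is an illustrative assumption, not notation from these notes.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_quantile(alpha: float) -> float:
    """Return z_alpha, defined by P(Z > z_alpha) = alpha for standard normal Z."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # z_alpha = Phi^{-1}(1 - alpha), where inv_cdf is the quantile function.
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha)


# Print alpha, z_alpha, and the central probability P(-z_alpha <= Z <= z_alpha).
for alpha in (0.1, 0.05, 0.025, 0.01, 0.005):
    z = z_quantile(alpha)
    central = STANDARD_NORMAL.cdf(z) - STANDARD_NORMAL.cdf(-z)
    print(f"{alpha:<6} {z:.2f}  {central:.2f}")
```

The printed rows agree with the table of commonly used values to two decimal places.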
Therefore
$$ P\left( -1.96 \le \frac{\hat\mu - \mu}{\sigma/\sqrt{n}} \le 1.96 \right) = 0.95. $$
But this event is also equal to the event
$$ \left( \hat\mu - \frac{1.96\,\sigma}{\sqrt{n}} \le \mu \le \hat\mu + \frac{1.96\,\sigma}{\sqrt{n}} \right). $$
So another way to interpret this is
$$ P\left( \mu \in \left[ \hat\mu - 1.96\,\sigma/\sqrt{n},\; \hat\mu + 1.96\,\sigma/\sqrt{n} \right] \right) = 95\%, $$
even though $\mu$ is not random. The interval
$$ I = \left[ \hat\mu - 1.96\,\sigma/\sqrt{n},\; \hat\mu + 1.96\,\sigma/\sqrt{n} \right] $$
is called a 95% confidence interval for $\mu$. More generally we have the following:

  To get a $1 - \alpha$ confidence interval for $\mu$ when $\sigma$ is known, set
  $$ I = \left[ \hat\mu - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}},\; \hat\mu + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}} \right]. \tag{1} $$
  Then $P(\mu \in I) = 1 - \alpha$.

19.4.1 Interpreting confidence intervals

Remember that $\mu$ is not random; rather the interval $I(X) = \left[ \hat\mu - 1.96\,\sigma/\sqrt{n},\; \hat\mu + 1.96\,\sigma/\sqrt{n} \right]$ is random, since it is based on the random $\hat\mu$. But once I calculate $I$, $\mu$ either belongs to $I$ or it doesn't, so what am I to make of the 95% probability? I think the way to think about it is this:

  No matter what the values of $\mu$ and $\sigma$ are, following the procedure "draw a sample $X$ from the distribution $N(\mu, \sigma^2)$, and use (1) to calculate the interval $I(X)$," the interval $I(X)$ will then have a 95% probability of containing $\mu$.

This is not the same as saying, "I used (1) to calculate the interval $I$, so no matter what the values of $\mu$ and $\sigma$ are, the interval $I$ has a 95% probability of containing $\mu$." It is the procedure, not the interval per se, that gives us the confidence.

Figure 19.1 shows the result of using this procedure 100 times to construct a symmetric 95% confidence interval for $\mu$, based on (pseudo-)random samples of size 5 drawn from a standard normal distribution. Note that in this instance, 5 of the 100 intervals missed the true mean 0.

19.4.2 Hold on

But wait! The confidence interval given by (1) depends on $\sigma$. What if we don't know $\sigma$? We can use $\hat\sigma$ to estimate $\sigma$ to get a confidence interval. The catch is that
$$ \frac{\hat\mu - \mu}{\hat\sigma/\sqrt{n}} $$
is not a Standard Normal random variable. Instead it has a Student $t$ distribution. We will discuss this later in Lecture 21, sections 21.6 and 21.7.
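The catch can be seen in a small simulation. The Python sketch below builds the interval (1) with the sample standard deviation in place of $\sigma$, while still using the normal quantile $z_{\alpha/2}$, and records how often the interval contains $\mu$. The function name `plug_in_coverage`, the seed, and the repetition count are illustrative assumptions, and the sketch assumes $\hat\sigma$ is computed with divisor $n - 1$, as `statistics.stdev` does.

```python
import random
from statistics import NormalDist, fmean, stdev


def plug_in_coverage(n=5, reps=20_000, mu=0.0, sigma=1.0, alpha=0.05, seed=1):
    """Estimate how often the plug-in interval mu_hat +/- z * sigma_hat / sqrt(n) covers mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # z_{alpha/2}, about 1.96 for alpha = 0.05
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mu_hat = fmean(sample)
        half_width = z * stdev(sample) / n ** 0.5  # sigma replaced by its estimate
        if abs(mu_hat - mu) <= half_width:
            hits += 1
    return hits / reps


print(f"estimated coverage with estimated sigma: {plug_in_coverage():.3f}")
```

For samples of size 5 the estimated coverage comes out noticeably below the nominal 0.95, which is the shortfall that the Student $t$ quantiles are meant to repair.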
Figure 19.1. Here are one hundred 95% confidence intervals for the mean from a Monte Carlo simulation, each based on a sample of 5 independent standard normals. The intervals that do not include the true mean 0 are shown in red.
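The experiment behind Figure 19.1 can be sketched as follows. This is a minimal Python version under stated assumptions, not the code that produced the figure: the function names `known_sigma_ci` and `count_misses`, the seed, and the use of `random.gauss` are illustrative choices. It builds interval (1) with known $\sigma = 1$ for each of 100 samples of size 5 and counts how many intervals miss the true mean 0.

```python
import random
from statistics import NormalDist, fmean


def known_sigma_ci(sample, sigma, alpha=0.05):
    """Return the 1 - alpha interval (1) for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # z_{alpha/2}
    mu_hat = fmean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return mu_hat - half_width, mu_hat + half_width


def count_misses(n_intervals=100, n=5, mu=0.0, sigma=1.0, alpha=0.05, seed=0):
    """Count how many of n_intervals simulated intervals fail to contain mu."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = known_sigma_ci(sample, sigma, alpha)
        if not (low <= mu <= high):
            misses += 1
    return misses


print("intervals missing the true mean:", count_misses(), "out of 100")
```

The count of misses varies with the seed; over many repetitions the miss rate should settle near $\alpha = 0.05$, which is the sense in which the procedure, rather than any single interval, carries the 95% guarantee.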
You might ask, when might I know $\sigma$, but not know $\mu$? Maybe in a case like this: I can imagine the variance in a measurement of weight using a balance beam scale depends on the friction in the balance bearing. I can also imagine that the mean measurement of a sample's mass depends on the sample's actual mass. I might have a lot of experience with this particular scale, so that I know the variance $\sigma^2$, but the mean of the measurement depends on which sample I am weighing. To get a good estimate of the weight, I might make several measurements,¹ and I could then use this procedure to generate a confidence interval. (I just made this up, and it sounds plausible, but do any of you chemists or engineers have any real information on such scales?)

19.5 Considerations in constructing confidence intervals

There are two more points worth noting. Suppose we know $\mu$, and we want to choose an interval $I$ so that the standard normal random variable
$$ Z = \frac{\hat\mu - \mu}{\sigma/\sqrt{n}} $$
lies in $I$ with probability $1 - \alpha$. Any interval $[a, b]$ satisfying
$$ \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz = 1 - \alpha $$
has this property. Because of the symmetry of the normal distribution, the symmetric interval $[-z_{\alpha/2},\; z_{\alpha/2}]$ for $Z$, which corresponds to $\left[ -z_{\alpha/2}\,\sigma/\sqrt{n},\; z_{\alpha/2}\,\sigma/\sqrt{n} \right]$ for $\hat\mu - \mu$, is the shortest such interval.

Because of the properties of the standard normal distribution, the length of the interval $\left[ \hat\mu - z_{\alpha/2}\,\sigma/\sqrt{n},\; \hat\mu + z_{\alpha/2}\,\sigma/\sqrt{n} \right]$ does not depend on $\mu$.

For distributions that are not symmetric, you may want to construct asymmetric confidence intervals. I can think of at least two principles you could use.

1. Choose the shortest interval $[a, b]$ containing your point MLE $\hat\theta$ that has $P_{\hat\theta}([a, b]) = 1 - \alpha$. This would be the interval where the likelihood (= density) is highest. Since $\hat\theta$ maximizes the likelihood, we know it will be in the interval. Oops. How do we know that an interval is the shortest set? Maybe we would be better off taking two short intervals instead of one long one. For unimodal (single-peaked) densities, this won't happen.

2. The other principle you might consider is to choose an interval $[a, b]$ so that $P(\theta < a) = P(\theta > b) = \alpha/2$, bearing in mind the above interpretation of the probability.

In the normal case, these two principles are not in conflict, and the procedure for constructing the interval described above is consistent with both.

Bibliography

[1] R. J. Larsen and M. L. Marx. 2012. An introduction to mathematical statistics and its applications, fifth ed. Boston: Prentice Hall.

[2] E. L. Lehmann. 1983. Theory of point estimation. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley and Sons.

[3] F. Mosteller. 1952. The world series competition. Journal of the American Statistical Association 47(259):355-380. http://www.jstor.org/stable/2281309

¹ My grandfather was a carpenter, so I am quite familiar with the old saw, "Measure twice, cut once." (Sorry, I couldn't help myself.)