Topic 10: Introduction to Estimation

June
© Joseph C. Watkins

1 Introduction

In the simplest possible terms, the goal of estimation theory is to answer the question: What is that number? What is the length, the reaction rate, the fraction displaying a particular behavior, the temperature, the mean salary, the mean lifetime, the slope and intercept of a line?

The next step is to perform an experiment or a data collection scheme that is well designed to estimate one (or more) numbers. However, before we can embark on such a design, we must learn some principles of estimation to have some understanding of the properties of a good estimator and to present our uncertainty about the estimation procedure. We begin with a definition:

Definition 1. A statistic is a function of the data that does not depend on any unknown quantity.

We have, to this point, seen a variety of statistics.

Example 2.
• sample mean, $\bar{x}$
• sample variance, $s^2$
• sample standard deviation, $s$
• sample median, sample quartiles $Q_1$, $Q_3$, and percentiles
• standardized scores $(x_i - \bar{x})/s$

2 Point Estimates

Here we will limit ourselves to three estimation questions:
• For a simple random sample $X_1, X_2, \ldots, X_n$ having unknown mean $\mu$, we estimate $\mu$ by $\bar{X}$, the mean of the sample.
• For Bernoulli trials $X_1, X_2, \ldots, X_n$ having unknown success probability $p$, we estimate $p$ by $\hat{p}$, the sample proportion.
• For two independent simple random samples $X_1, X_2, \ldots, X_{n_X}$ and $Y_1, Y_2, \ldots, Y_{n_Y}$ having unknown means $\mu_X$ and $\mu_Y$, respectively, we estimate $\mu_X - \mu_Y$ by $\bar{X} - \bar{Y}$.
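The three point estimates above, together with the standardized scores from Example 2, can be sketched in a few lines of Python. This is only an illustration: the function names and the small data sets are hypothetical and do not come from the notes.

```python
from statistics import mean, stdev


def standardized_scores(data):
    """Return (x_i - x_bar) / s for each observation."""
    x_bar, s = mean(data), stdev(data)
    return [(x - x_bar) / s for x in data]


def estimate_mean(sample):
    """Estimate the unknown mean mu by the sample mean."""
    return mean(sample)


def estimate_proportion(trials):
    """Estimate the success probability p from 0/1 Bernoulli trials."""
    return sum(trials) / len(trials)


def estimate_mean_difference(xs, ys):
    """Estimate mu_X - mu_Y by the difference of the two sample means."""
    return mean(xs) - mean(ys)


# Hypothetical data, chosen only to exercise the functions.
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 5.0]
trials = [1, 0, 1, 1]

print(estimate_mean(xs))
print(estimate_proportion(trials))
print(estimate_mean_difference(xs, ys))
print(standardized_scores(xs))
```

Each function depends only on the data, not on any unknown quantity, so each one is a statistic in the sense of the definition above.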

Introduction to Statistical Methodology    Introduction to Estimation

Notice that we have the expected values

$$E\bar{X} = \mu, \quad E\hat{p} = p, \quad \text{and} \quad E[\bar{X} - \bar{Y}] = \mu_X - \mu_Y.$$

In words, this says that the estimator does not systematically underestimate or overestimate the unknown mean, probability, or difference in means. When such an identity holds, we say that the estimator is unbiased. For example, for the variance $\sigma^2$, we have seen two choices:

$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad \text{and} \quad \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$

The first of these systematically underestimates $\sigma^2$; the second is unbiased.

One criterion for a good estimator is little or no bias. The second is a small variance. For the first two examples above, we have the variances

$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \quad \text{and} \quad \mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}.$$

Thus, the variance of the estimator is proportional to the variance of a single observation and inversely proportional to the number of observations.

For the third example, we have the Pythagorean theorem for variances:

$$\mathrm{Var}(\bar{X} - \bar{Y}) = \mathrm{Var}(\bar{X}) + (-1)^2\,\mathrm{Var}(\bar{Y}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) = \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}.$$

In most circumstances, the variances are unknown, and thus we replace them by their estimates from the data:

$$\frac{s^2}{n}, \quad \frac{\hat{p}(1-\hat{p})}{n}, \quad \text{and} \quad \frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}.$$

The square root of this quantity is known as the standard error:

$$\frac{s}{\sqrt{n}}, \quad \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \quad \text{and} \quad \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}.$$

Example 3. We return to the study of the smoking habits of 5375 high school children in Tucson in 1967. Here is a two-way table summarizing some of the results.

                    student    student           total    sample        standard
                    smokes     does not smoke             proportion    error
2 parents smoke        400        1380            1780      0.225         0.010
1 parent smokes        416        1823            2239      0.186         0.008
0 parents smoke        188        1168            1356      0.139         0.009
total                 1004        4371            5375
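As a check on the formulas above, here is a minimal Python sketch of the three standard errors, applied to the Tucson smoking counts. The function names are hypothetical, the counts are those of the two-way table, and the four-point data set used to compare the two variance formulas is invented for illustration.

```python
from math import sqrt
from statistics import pvariance, variance


def se_mean(s, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / sqrt(n)


def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat (1 - p_hat) / n)."""
    return sqrt(p_hat * (1 - p_hat) / n)


def se_difference(s_x, n_x, s_y, n_y):
    """Standard error of a difference of two independent sample means."""
    return sqrt(s_x**2 / n_x + s_y**2 / n_y)


# (students who smoke, row total) for each row of the two-way table.
tucson = {
    "2 parents smoke": (400, 1780),
    "1 parent smokes": (416, 2239),
    "0 parents smoke": (188, 1356),
}

rows = {}
for label, (smokers, total) in tucson.items():
    p_hat = smokers / total
    rows[label] = (round(p_hat, 3), round(se_proportion(p_hat, total), 3))
    print(f"{label:16s} {rows[label][0]:.3f} {rows[label][1]:.3f}")

# The 1/n variance formula (pvariance) is never larger than the
# unbiased 1/(n-1) formula (variance); hypothetical data.
data = [1.0, 2.0, 3.0, 4.0]
print(pvariance(data), variance(data))
```

The two variance functions differ only in the divisor, which is why the first is always the smaller of the two for non-constant data.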

Example 4. For 50 rolls each of a fair die and a weighted die, we have the following summaries.

type        mean     standard deviation
fair        3.340    1.636
weighted    2.740    1.56

Thus the estimate for the mean value on a fair die is 3.340. The estimate for the mean value on the weighted die is 2.740. The standard errors of the estimates are $1.636/\sqrt{50} = 0.231$ for the fair die and $1.56/\sqrt{50} = 0.221$ for the weighted die. In this case (unusually), we know the distributional mean is 3.5 for the fair die and 2.667 for the weighted die. Note that these values are within one standard error of the sample means.

The estimate of the difference in the means is $3.340 - 2.740 = 0.600$ with a standard error

$$\sqrt{\frac{1.636^2}{50} + \frac{1.56^2}{50}} = 0.320.$$

Note here that the mean difference is nearly twice the standard error.

3 Confidence Intervals

In some sense, giving the standard error should be sufficient to describe the estimate and determine the quality of the estimator. However, the typical way to describe an estimation procedure is with a confidence interval. This is a procedure to determine an interval from the data that has a high probability of capturing the true value. If this probability is C%, then this is called a C% confidence interval. Typical values for C% are 95%, 98%, and 99%.

Looking at the Tucson data for smoking, we have a sample proportion $\hat{p} = 0.225$ of children who smoke in households in which 2 parents smoke. We might ask what is the proportion $p$ in the entire population of children who smoke in households in which 2 parents smoke. We know that there is a 95% probability that $\hat{p}$ is within $z^* = 1.96$ standard units of the population proportion $p$. Reversing this, we have that the population proportion $p$ has a 95% probability of being within 1.96 standard units of $\hat{p}$. In symbols, with a 95% probability $p$ is somewhere in the interval between

estimate − margin of error   and   estimate + margin of error
estimate − $z^*$ value × standard error   and   estimate + $z^*$ value × standard error
$\hat{p} - z^*\sqrt{\hat{p}(1-\hat{p})/n}$   and   $\hat{p} + z^*\sqrt{\hat{p}(1-\hat{p})/n}$
$0.225 - 1.960 \times 0.010$   and   $0.225 + 1.960 \times 0.010$
0.205   and   0.245

When the confidence interval involves the mean, we need to take into account that we have replaced the distributional variance by the sample variance. Thus, the z-statistic

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

is replaced by the t-statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}.$$

Figure 1: The density and distribution function for a standard normal random variable (black) and a t random variable with 4 degrees of freedom (red). The variance of the t distribution is df/(df − 2) = 4/(4 − 2) = 2, which is higher than the variance of a standard normal. This can be seen in the broader shoulders of the density function or in the more rapid rise in the distribution function away from the mean of 0.

The remarkable discovery by William Gosset is that the distribution of the t-statistic can be determined exactly. However, this distribution depends on the number of observations. Thus, we use a table of values for the t-statistic. The so-called degrees of freedom is one fewer than the number of observations. For a 95% confidence interval, the value for 49 degrees of freedom is $t^* = 2.010$. This is slightly larger than the corresponding value 1.960 for the normal distribution. Thus, the 95% confidence interval for the fair die lies between

estimate − margin of error   and   estimate + margin of error
estimate − $t^*$ value × standard error   and   estimate + $t^*$ value × standard error
$\bar{x} - t^* s/\sqrt{n}$   and   $\bar{x} + t^* s/\sqrt{n}$
$3.340 - 2.010 \times 0.231$   and   $3.340 + 2.010 \times 0.231$
2.876   and   3.804

Example 5. A random sample of lengths of movies, in minutes, during a June weekend is given below.

0 3 9 37 5 30 4 90 96 0 07 94 90 96 84 6 97

For these data, the mean is $\bar{x} = 108.81$ minutes and the standard deviation is $s = 15.67$ minutes. Thus, the standard error of the mean is $15.67/\sqrt{21} = 3.42$ minutes. For a 95% confidence interval with 20 degrees of freedom, the value is $t^* = 2.086$. Thus,

the confidence interval for the mean length, in minutes, of movies is

$$108.81 \pm 2.086 \times 3.42 = (101.68, 115.94).$$

4 Summary of Standard Confidence Intervals

The confidence interval is an extension of the idea of a point estimate of the parameter to an interval that is likely to contain the true parameter value. A level C confidence interval for a population parameter is an interval computed from the sample data having probability C of producing an interval containing the true parameter value.

For an estimate of a population mean or proportion, a level C confidence interval often has the form

estimate ± $t^*$ × standard error

where $t^*$ is the upper $(1 - C)/2$ critical value for the t distribution with the appropriate number of degrees of freedom. If the number of degrees of freedom is infinite, we use the standard normal distribution to determine the value, usually denoted by $z^*$.

The margin of error $m = t^* \times$ standard error decreases if
• C decreases,
• the standard deviation decreases,
• n increases.

The procedures for finding the confidence interval are summarized in the table below.

procedure         parameter          estimate                    standard error                                                         degrees of freedom
one sample        $\mu$              $\bar{x}$                   $s/\sqrt{n}$                                                           $n - 1$
two sample        $\mu_1 - \mu_2$    $\bar{x}_1 - \bar{x}_2$     $\sqrt{s_1^2/n_1 + s_2^2/n_2}$                                         $\min\{n_1, n_2\} - 1$
one proportion    $p$                $\hat{p}$                   $\sqrt{\hat{p}(1-\hat{p})/n}$                                          $\infty$
two proportion    $p_1 - p_2$        $\hat{p}_1 - \hat{p}_2$     $\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$      $\infty$
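The general form estimate ± critical value × standard error can be sketched in Python as follows. The function names are hypothetical. The normal critical value is computed with the standard library, while the t critical value is passed in as a number read from a table, since the standard library has no t quantile function. The inputs are the rounded values quoted in the smoking and movie examples.

```python
from statistics import NormalDist


def z_critical(level):
    """Upper (1 - level)/2 critical value of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def confidence_interval(estimate, standard_error, critical_value):
    """Return (lower, upper) = estimate -/+ critical_value * standard_error."""
    margin = critical_value * standard_error
    return estimate - margin, estimate + margin


# Proportion of students who smoke when 2 parents smoke:
# p_hat = 0.225, standard error 0.010, normal critical value.
smoking = confidence_interval(0.225, 0.010, z_critical(0.95))

# Mean movie length: x_bar = 108.81, standard error 3.42,
# t critical value 2.086 for 20 degrees of freedom (from a t table).
movies = confidence_interval(108.81, 3.42, 2.086)

print(tuple(round(v, 3) for v in smoking))
print(tuple(round(v, 2) for v in movies))
```

Keeping the critical value as an explicit argument mirrors the summary table: the same function serves all four procedures, and only the estimate, the standard error, and the degrees of freedom used to look up the critical value change.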