Statistics 511 Additional Materials

Confidence Intervals on mu

This topic officially moves us from probability to statistics. We begin to discuss making inferences about the population. One way to differentiate probability from statistics is to think of probability as the process of making an inference about a subset (a sample) of the population when we know the attributes of the entire population. This was the case in the previous sections. We (pretended that we) knew the entire population or the entire distribution. We were then able to discuss probabilities for how a single observation might behave, as well as how the average of several observations might behave. Thus in probability we had knowledge of the whole population (via the values of its parameters) and we wanted to make statements about an observation or a group of observations.

Now in statistics this process reverses. In statistics we will have a sample, a collection of observations, from the population. From this subset, the sample, we want to be able to make statements about the population.

Background

The ideas presented in this topic depend heavily on concepts from the previous sections. Specifically, the idea of variability from sample to sample is crucial. As we saw previously, each time we take a sample we get different values for the sample mean, and these values differ from the population mean. One consequence is that the sample mean alone is not the best estimate of the population mean, because each sample gives us a different value for the sample mean. The idea of a confidence interval is that instead of simply using a single number, we use an interval, a range of numbers, to estimate the population mean.

Definitions and preliminaries

Definition: A parameter is a numerical quantity that summarizes a characteristic of the entire population.

Definition: A statistic is a numerical quantity that summarizes a characteristic of a subset of the population (that is, a sample).
We differentiate here between µ and σ, which are parameters, and $\bar{x}$ and $s_x$, which are statistics. Recall that µ is the population mean and σ is the population standard deviation, while $\bar{x}$ is the sample mean and $s_x$ is the sample standard deviation. µ and σ are numerical summaries for the entire population. $\bar{x}$ and $s_x$ are calculated from a subset (sample) of the population.

Definition: A point estimate of a population parameter is a single number used to estimate the unknown value of a population parameter.

Definition: An interval estimate of a population parameter is a range of numbers used to estimate the unknown value of a population parameter.

Suppose that we are interested in estimating the mean weight of all black bears in West Virginia. We are able to weigh 38 black bears. From the random sample of 38 observations, we want to make statements about the entire population of black bears. That is, we want to estimate the average weight of the entire population of black bears in West Virginia. A point estimate for this parameter, the mean weight of all black bears in WV, is 457 pounds. An interval estimate would be that the mean weight of all black bears is between 428 and 497 pounds.

Now it is important to note that the actual mean weight of the population of all black bears in West Virginia does not change. Rather, it is our knowledge that is imperfect. We have only 38 bears from the entire population and as a consequence we do not know the weight of all bears.

That idea is worth reiterating. The need for interval estimates comes from the fact that we do not have all of the information that we want about the parameter (in the previous example, the population mean). That is, we have a subset of the population, a sample, but not the entire population. We know that the sample mean is a random variable and that its value is likely not the same as the population mean. Thus what we will do is specify a range of values that are plausible based on our sample.

It is worth noting that unless we specify a range of values that goes from negative infinity to positive infinity, we can never guarantee that the population mean will be in the interval estimate for that mean. No method will have the population mean inside the interval for every sample; however, statistical methods can specify the percentage of the time that our intervals will miss the parameter of interest.

When we use the word contain in this topic it has a special meaning. It is important to remember that the value of the population mean, or of any parameter, is constant.
Consequently, when we are considering an interval estimate, if the parameter is inside the interval we will consider the parameter's value to be contained in the interval. For the parameter to be contained in the interval, it must fall between the endpoints of the interval; an interval will have an upper endpoint and a lower endpoint.

Definition: A (1-α)*100% confidence interval for a parameter is an interval estimate that, through repeating the process of taking a sample and making a confidence interval from that sample, will contain the parameter (1-α)*100% of the time.

Definition: The confidence level for a confidence interval is the percentage of times that a collection of confidence intervals will contain the parameter of interest in repeated sampling.

The confidence level for a 95% confidence interval is 0.95. The confidence level for an 84% confidence interval is 0.84. The confidence level is often denoted by (1-α)*100%. The reason for this is to allow each researcher to specify his/her level of confidence:

α = 0.05 yields a confidence level of 0.95, while α = 0.10 yields a confidence level of 0.90.

The definition of a confidence interval needs some explanation (maybe plenty of explanation). Each confidence interval is calculated from a sample. This sample is a subset of the population. Previously, we saw that each sample of observations was different. The idea of repeating the process of taking a sample, described in the definition above, is just that: each sample will be different. As we will see shortly (when we talk about calculations), since each sample is different, each confidence interval will be different. Because of the variability from sample to sample, some of the confidence intervals that we construct will not contain the parameter they are trying to estimate.

The difficulty with confidence intervals is this: we DON'T get to know the value of the parameter we are trying to estimate, so we don't know which intervals capture the parameter and which don't. Remember that the parameter is a quantity calculated from the population. In statistics, we only see a subset of the population, so we cannot know the value of the parameter. Thus, we will construct a confidence interval and we will not know whether the parameter is inside the confidence interval. Instead we must be content to know that the procedure works a certain percentage of the time, specifically (1-α)*100%. The procedure begins with selecting objects for the random sample, getting data from those units, and then making our calculations. However, we don't know if the confidence interval that we have created is one of the (1-α)*100% of intervals that contain the unknown value of the parameter or one of the α*100% of intervals that do not.

Consider this more concrete example. A 95% confidence interval is made for the mean height of Yellow Poplars in West Virginia. This interval goes from 85.68 feet to 94.39 feet. This is based upon a random sample of 56 trees taken from around the state.
We say that we are 95% confident that the mean height of the population of all Yellow Poplars in WV is between 85.68 and 94.39 feet. However, since we don't know the actual value of the population mean, we cannot say with absolute certainty that it is between these numbers.

So why, if I can't say anything with certainty using a confidence interval, would I use it at all? The answer is simply that by using statistics we can accurately tell the percentage of times the process will fail. No other methodology allows you to specify that percentage. Statistics allows for this, but forces a layer of uncertainty into the discourse.

Confidence Intervals on mu (Small Sample Size)

When the number of observations in the sample is large (at least 30 observations), we can use the Central Limit Theorem to help us construct a confidence interval on the unknown value of the population mean mu. For large samples the CLT tells us that the distribution of the possible values of the sample mean is approximately normal. We can use the standard normal distribution to find the critical value (z) that is part of the (margin of) error term.
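The critical value can be read from a standard normal table or computed directly. The following is a minimal Python sketch, assuming the standard library's `statistics.NormalDist` (Python 3.8 or later); the helper name `z_critical` is hypothetical and not part of the course materials.

```python
from statistics import NormalDist

def z_critical(confidence_level: float) -> float:
    """Return z such that P(-z < Z < z) equals the confidence level."""
    alpha = 1.0 - confidence_level
    # The area to the right of z is alpha/2, so z is the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level -> z = {z_critical(level):.3f}")
```

For these three levels the computed values round to 1.645, 1.960, and 2.576.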

Let's start with a 95% confidence interval for the population mean. 95% is a common choice for the confidence level. The critical value corresponding to a 95% confidence level is 1.96. Writing $\sigma_{\bar{x}}$ for the standard deviation of the sample mean ($\sigma/\sqrt{n}$), note that

$$0.95 = P(-1.96 < Z < 1.96) = P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma_{\bar{x}}} < 1.96\right)$$

Some algebra later:

$$= P\left(\mu - 1.96\,\sigma_{\bar{x}} < \bar{X} < \mu + 1.96\,\sigma_{\bar{x}}\right)$$

The above expression is a statement of probability about $\bar{X}$, if we know the values for µ and $\sigma_{\bar{x}}$. This follows what we did in the previous chapters when we pretended that we knew the value of µ or $\sigma_{\bar{x}}$ or m or p. Some more algebra later, we can turn this into an interval for µ:

$$= P\left(\bar{X} - 1.96\,\sigma_{\bar{x}} < \mu < \bar{X} + 1.96\,\sigma_{\bar{x}}\right) = 0.95.$$

This expression provides the endpoints of the CI:

$$\bar{X} - 1.96\,\sigma_{\bar{x}} < \mu < \bar{X} + 1.96\,\sigma_{\bar{x}}$$

which we can write more succinctly (and in a more general fashion) as

$$\bar{X} \pm z_{(\alpha/2)} * \sigma_{\bar{x}}$$

But we don't know the value of $\sigma_{\bar{x}}$. When the value of $\sigma_{\bar{x}}$ is unknown (always the case in the real world) we can substitute our estimate $s_{\bar{x}} = s_x/\sqrt{n}$ for $\sigma_{\bar{x}}$. Our CI is computed as

$$\bar{X} \pm t_{(n-1,\ \alpha/2)} * \frac{s_x}{\sqrt{n}}$$

After we have computed $\bar{x}$ and $s_x$ (and not µ and $\sigma_{\bar{x}}$) we can construct a confidence interval on the unknown value of mu; however, there are two direct results of this.

1. The critical value in the error term no longer comes from a normal distribution; it comes from something called a t-distribution.

2. We no longer have a statement of probability; we have a statement of confidence.

The t-distribution

The t-distribution is similar to the z distribution or standard normal distribution. It is based upon taking a sample of n observations from a Normal distribution with mean µ and standard deviation σ. The random variable T will possess a t-distribution with n-1 degrees of freedom, where T is computed as

$$T = \frac{\bar{X} - \mu}{s_{\bar{x}}}, \qquad s_{\bar{x}} = \frac{s_x}{\sqrt{n}}.$$

A particular member of the family of t-distributions is defined by its number of degrees of freedom, much like the Poisson was indexed by µ. Degrees of freedom is a parameter for the family of t-distributions just as µ was a parameter for the Poisson family. The mean of a t random variable is 0. This distribution is symmetric and unimodal, but it has slightly more variability than the Normal distribution.

We will use the percentiles of the t-distribution quite frequently throughout the rest of the course. Consequently, we have specific notation for them. The percentile of a t-distribution with df degrees of freedom that has area P to its right, that is, the (1-P)*100th percentile, will be denoted by t(df, P). t(25, 0.05) would be the 95th percentile of a t R.V. with 25 degrees of freedom. t(38, 0.10) would be the 90th percentile of a t R.V. with 38 degrees of freedom.

For calculating these percentiles we use Table A.2. This table has the degrees of freedom in the first column and percentiles in the other columns. This book uses P to represent the area to the right of the percentile; thus if we want the 90th percentile from the table, we need to look in the column with P = 0.10. Likewise the 99th percentile can be found in the column that is designated by P = 0.01.

t(14, 0.05) = 1.7613
t(25, 0.01) = 2.4851
t(30, 0.10) = 1.3104

Table A.2 does not contain all possible values for degrees of freedom. For example, if the degrees of freedom is 30 or more, then you would use the Standard Normal table (Table A.1) to estimate the corresponding t-value.
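The claim that T has more variability than a standard Normal variable, and that the difference fades as the degrees of freedom grow, can be checked by simulation. The sketch below uses only the Python standard library. The function name `fraction_beyond`, the seed, the number of repetitions, and the population values (mean 0, standard deviation 1) are all illustrative assumptions, not part of the course materials.

```python
import math
import random

def fraction_beyond(n, cutoff=1.96, reps=10000, seed=511):
    """Estimate P(|T| > cutoff) for samples of size n from a Normal population."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # hypothetical population parameters
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with the n-1 divisor.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        t = (xbar - mu) / (s / math.sqrt(n))
        if abs(t) > cutoff:
            count += 1
    return count / reps

for n in (5, 15, 40):
    print(f"n = {n:2d}: fraction with |T| > 1.96 is {fraction_beyond(n):.3f}")
```

For a standard Normal variable the fraction beyond 1.96 would be about 0.05. With n = 5 the simulated fraction should come out noticeably larger, and it should move toward 0.05 as n increases, which is why the z table becomes an acceptable stand-in for large degrees of freedom.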

Given a sample of 14 observations from a distribution that is known to be Normally distributed, construct a 99% confidence interval on the unknown value of the population parameter mu. From the data we have calculated $\bar{X} = 4.127$ and $s_x = 0.358$.

We can use the formula

$$\bar{X} \pm t_{(n-1,\ \alpha/2)} * \frac{s_x}{\sqrt{n}}$$

since there are more than 2 observations and the data come from a Normal distribution.

First, df = n-1 = 14-1 = 13 and α = 1 - C.L. = 1 - 0.99 = 0.01. So α/2 = 0.01/2 = 0.005 (here P = 0.005) and t = 3.0123.

Then

$$\begin{aligned}
\bar{X} \pm t_{(n-1,\ \alpha/2)} * \frac{s_x}{\sqrt{n}} &= 4.127 \pm t_{(13,\ 0.005)} * \frac{0.358}{\sqrt{14}} \\
&= 4.127 \pm 3.0123 * \frac{0.358}{\sqrt{14}} \\
&= 4.127 \pm 3.012 * 0.096 \\
&= 4.127 \pm 0.28915
\end{aligned}$$

The endpoints of our confidence interval on mu are (3.838, 4.416). (3.838, 4.416) is mathematical notation for an interval that goes from 3.838 to 4.416.
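The same arithmetic can be sketched in Python. This is only an illustration: the function name `t_interval` is hypothetical, and the critical value 3.0123 is typed in from the t table rather than computed, since the standard library has no t-distribution quantile function.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)  # estimated std. deviation of the sample mean
    margin = t_crit * standard_error   # the (margin of) error term
    return xbar - margin, xbar + margin

# Values from the worked example: n = 14, 99% confidence, t(13, 0.005) = 3.0123
lower, upper = t_interval(xbar=4.127, s=0.358, n=14, t_crit=3.0123)
print(f"({lower:.3f}, {upper:.3f})")
```

Carrying full precision gives roughly (3.839, 4.415). The small difference from the hand-computed (3.838, 4.416) arises because the hand calculation rounds the standard error to 0.096 before multiplying.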

So a 99% confidence interval for the population mean goes from 3.838 to 4.416. We interpret this by saying that we are 99% confident that the population mean's value falls within the interval 3.838 to 4.416.

Confidence Intervals on mu (Large Sample Size)

When the sample size is large (at least 30 observations) a t-distribution with n-1 d.f. is virtually identical to the Standard Normal distribution. So we can obtain our critical value from the Standard Normal distribution table instead of the t-table. In the large sample situation, the formula for a (1-α)*100% confidence interval on mu becomes

$$\bar{X} \pm z_{(\alpha/2)} * \frac{s_x}{\sqrt{n}}$$

The t-value in the error term is replaced by a z-value from a Standard Normal distribution. All else remains the same.

The formula above can be used to construct a (1-α)*100% confidence interval (CI) for the population mean when

1. n (the number of observations) is more than 2 and the original data (the values of the variable X) are approximately Normal, or
2. n is at least 30 (n ≥ 30) (and we don't know what distribution the data came from).

Suppose that we want to estimate the mean of a population. We have a sample of 48 observations from the population. The mean of these observations is 290.34 and the standard deviation of these observations is 41.22. Then a 95% confidence interval for the population mean would be as follows. We can use the formula below since n ≥ 30.

$$\bar{X} \pm z_{(\alpha/2)} * \frac{s_x}{\sqrt{n}} = 290.34 \pm 1.96 * \frac{41.22}{\sqrt{48}} = 290.34 \pm 11.66$$

= (278.68, 302.00)

(278.68, 302.00) is mathematical notation for an interval that goes from 278.68 to 302.00. Thus a 95% confidence interval for the population mean goes from 278.68 to 302.00. We interpret this by concluding that we are 95% confident the unknown value of the population mean is between 278.68 and 302.00.

Confidence instead of probability

When we are dealing with parameters such as µ or σ, we are dealing with fixed quantities. As a consequence, if we make a statement such as "the value of the population mean is between 8.5 and 19.4", that statement is either true or false. The population mean is either in the confidence interval or outside of it.

This has implications for our interpretation of a confidence interval. After we create a 95% confidence interval for µ from, say, 175.46 to 176.32, then P(175.46 < µ < 176.32) ≠ 0.95. This probability, P(175.46 < µ < 176.32), is either 0 or 1. The value of the population mean is either inside the interval or it is not.

The confidence that we assert comes from repetitions of the process of taking many samples and calculating the confidence interval for each sample. However, for any individual interval we do not know whether the mean is inside the interval or outside the interval. What we do know is that if we repeated the process of collecting samples and making 95% confidence intervals for the population mean from each sample, then approximately 95% of those confidence intervals would contain the population mean.

Factors Affecting the Width of a (1-α)*100% Confidence Interval

There are three factors that influence the size or width of a confidence interval.

1. The sample size n. As n increases, the width of the CI decreases.
2. The confidence level (1-α). As the confidence level increases, the width of the CI increases.
3. The sample standard deviation s. The bigger s is, the wider the CI is.

Summary: The basic form of a confidence interval for a population parameter is as follows:

Point estimate ± critical value from a sampling distribution * standard error.

The point estimate is the best single-number estimate for the parameter. The standard error is an estimate of the variability from sample to sample for the point estimate. The critical value that is used is based upon the confidence level that we want to use, and the sampling distribution is determined by the type of parameter that we are estimating.

(1-α)*100% Large Sample CI on mu: $\bar{X} \pm z_{(\alpha/2)} * \frac{s_x}{\sqrt{n}}$

(1-α)*100% Small Sample CI on mu: $\bar{X} \pm t_{(n-1,\ \alpha/2)} * \frac{s_x}{\sqrt{n}}$

(1-α)*100% CI on mu when our data come from a normal distribution (regardless of the sample size): $\bar{X} \pm t_{(n-1,\ \alpha/2)} * \frac{s_x}{\sqrt{n}}$
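The repeated-sampling interpretation of a confidence level can be illustrated with a small simulation: draw many samples from a population whose mean we secretly know, build a large-sample 95% interval from each, and count how often the interval contains the true mean. This is a sketch using only the Python standard library. The function name `coverage`, the seed, and the population values (Normal with mean 290 and standard deviation 40) are hypothetical choices made for illustration.

```python
import math
import random

def coverage(n=48, mu=290.0, sigma=40.0, z=1.96, reps=2000, seed=511):
    """Fraction of large-sample intervals xbar +/- z*s/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with the n-1 divisor.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        margin = z * s / math.sqrt(n)
        # mu is fixed; the interval is what changes from sample to sample.
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps

print(f"Fraction of intervals containing mu: {coverage():.3f}")
```

The printed fraction should land close to 0.95. Any single interval either contains the mean or it does not; the 95% describes the long-run behaviour of the procedure.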