Chapter 6. Sampling and Estimation

6.1. Introduction

Frequently the engineer is unable to completely characterize the entire population. She or he must be satisfied with examining some subset of the population, or several subsets of the population, in order to infer information about the entire population. Such subsets are called samples. A population is the entirety of observations and a sample is a subset of the population. A sample that gives correct inferences about the population is a random sample; otherwise it is biased.

Statistics are given different symbols than the expectation values because statistics are approximations of the expectation values. The statistic called the mean is an approximation to the expectation value of the mean. The statistic mean is the mean of the sample and the expectation value mean is the mean of the entire population. In order to calculate an expectation, one requires knowledge of the PDF. In practice, the motivation for calculating a statistic is that one has no knowledge of the underlying PDF.

6.2. Statistics

Any function of the random variables constituting a random sample is called a statistic.

Example 6.1.: Mean

The mean is a statistic of a random sample of size $n$ and is defined as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{6.1}$$

Example 6.2.: Median

The median is a statistic of a random sample of size $n$, which represents the middle value of the sample and, for a sampling arranged in increasing order of magnitude, is defined as

$$\tilde{X} = \begin{cases} X_{(n+1)/2} & \text{for } n \text{ odd} \\[4pt] \dfrac{X_{n/2} + X_{n/2+1}}{2} & \text{for } n \text{ even} \end{cases} \tag{6.2}$$

The median of the sample space {1,2,3} is 2. The median of the sample space {3,1,2} is 2. The median of the sample space {1,2,3,4} is 2.5.

Example 6.3.: Mode

The mode is a statistic of a random sample of size $n$, which represents the most frequently appearing value in the sample. The mode may not exist and, if it does, it may not be unique.

The mode of the sample space {1,2,2,3} is 2. The mode of the sample space {1,2,2,3,4,4} is 2 and 4 (bimodal). The mode of the sample space {1,2,3} does not exist since each entry occurs only once.

Example 6.4.: Range

The range is a statistic of a random sample of size $n$, which represents the span of the sample and, for a sampling arranged in increasing order of magnitude, is defined as

$$\mathrm{range}(X) = X_n - X_1 \tag{6.3}$$

The range of {1,2,3,4,5} is 5 - 1 = 4.

Example 6.5.: Variance

The variance is a statistic of a random sample of size $n$, which represents the spread of the sample and is defined as

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)} \tag{6.4}$$

The reason for using $(n-1)$ in the denominator rather than $n$ is given later.

Example 6.6.: Standard Deviation

The standard deviation, $S$, is a statistic of a random sample of size $n$, which represents the spread of the sample and is defined as the positive square root of the variance.

$$S = \sqrt{S^2} \tag{6.5}$$
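As a quick numerical check of the definitions in this section, the following is a minimal Python sketch that uses only the standard library's `statistics` module. The sample values are illustrative choices, and `sample_range` and `shortcut_variance` are helper names introduced here, not something defined in the text.

```python
import statistics


def sample_range(values):
    """Range as in equation (6.3): largest observation minus smallest."""
    return max(values) - min(values)


def shortcut_variance(values):
    """Sample variance via the right-hand form of equation (6.4)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


sample = [1, 2, 3, 4, 5]  # illustrative data, not taken from an experiment

mean = statistics.mean(sample)                    # equation (6.1)
median_even = statistics.median([1, 2, 3, 4])     # even n: average of the two middle values
modes = statistics.multimode([1, 2, 2, 3, 4, 4])  # every most-frequent value (bimodal case)
spread = sample_range(sample)                     # equation (6.3)
variance = statistics.variance(sample)            # equation (6.4), n - 1 in the denominator
std_dev = statistics.stdev(sample)                # equation (6.5)

print(mean, median_even, modes, spread, variance, std_dev)
print(shortcut_variance(sample))
```

One difference in convention is worth noting: `statistics.multimode` returns every value when each entry occurs only once, whereas this chapter says the mode does not exist in that case.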

6.3. Sampling Distributions

We have now stated the definitions of the statistics we are interested in. Now, we need to know the distribution of the statistics to determine how good these sampling approximations are to the true expectation values of the population.

Statistic 1. Mean when the variance is known: Sampling Distribution

If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and variance $\sigma^2$, then the limiting form of the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \tag{6.6}$$

as $n \to \infty$, is the standard normal distribution $n(z; 0, 1)$. This is known as the Central Limit Theorem. What this says is that, given a collection of random samples, each of size $n$, yielding a mean $\bar{X}$, the distribution of $\bar{X}$ approximates a normal distribution, and becomes exactly a normal distribution as the sample size goes to infinity. The distribution of $X$ does not have to be normal. Generally, the normal approximation for $\bar{X}$ is good if $n > 30$.

We provide a derivation in Appendix V proving that the distribution of the sample mean is given by the normal distribution.

Example 6.7.: distribution of the mean, variance known

In a reactor intended to grow crystals in solution, a seed is used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor in samples of size $n = 10$. The population has a variance in crystal size of $\sigma^2 = 1.0$ µm². (We must know this from previous research.) The samples yield a mean crystal size of $\bar{x} = 15.0$ µm. What is the likelihood that the true population mean, $\mu$, is actually less than 14.0 µm?

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{15 - 14}{1/\sqrt{10}} = 3.16$$

$$P(\mu < 14) = P(z > 3.16)$$

We have the change in sign because as $\mu$ increases, $z$ decreases.

The evaluation of the cumulative normal probability distribution can be performed several ways. First, when the pioneers were crossing the plains in their covered wagons and they wanted to evaluate probabilities from the normal distribution, they used tables of the cumulative normal PDF, such as those provided in the back of the statistics textbook. These tables are also available online. For example, Wikipedia has a table of cumulative standard normal PDFs at

http://en.wikipedia.org/wiki/Standard_normal_table

Using the table, we find

$$P(\mu < 14) = P(z > 3.16) = 1 - P(z < 3.16) = 1 - 0.9992 = 0.0008$$

Second, we can use a modern computational tool like MATLAB to evaluate the probability. The problem can be worked in terms of the standard normal PDF ($\mu = 0$ and $\sigma = 1$), which for $P(\mu < 14) = P(z > 3.16) = 1 - P(z < 3.16)$ is

>> p = 1 - cdf('normal',3.16,0,1)
p = 7.89e-04

Alternatively, the problem can be worked in terms of the non-standard normal PDF (mean $\bar{x} = 15$ and standard deviation $\sigma/\sqrt{n} = 1/\sqrt{10}$), which for $P(\mu < 14)$ is

>> p = cdf('normal',14,15,1/sqrt(10))
p = 7.83e-04

The difference in these results is due to the round-off in 3.16, used as an argument in the function call for the standard normal distribution.

Based on our sampling data, the probability that the true population mean is less than 14.0 µm is 0.078%.

Statistic 2. difference of means when the variance is known: Sampling Distribution

It is useful to know the sampling difference of two means when you want to determine whether there is a significant difference between two populations. This situation applies when you take two random samples of size $n_1$ and $n_2$ from two different populations, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Then the sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately normal, distributed with mean

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$$

and variance

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

Hence,

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \tag{6.7}$$

is approximately a standard normal variable.

Example 6.8.: distribution of the difference of means, variances known

In a reactor intended to grow crystals, two different types of seeds are used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor in samples of sizes $n_1 = 10$ and $n_2 = 20$. The populations have variances in crystal size of $\sigma_1^2 = 1.0$ µm² and $\sigma_2^2 = 2.0$ µm². (We must know this from previous research.) The samples yield mean crystal sizes of $\bar{X}_1 = 15.0$ µm and $\bar{X}_2 = 10.0$ µm. How confident can we be that the true difference in population means, $\mu_1 - \mu_2$, is actually 4.0 µm or greater?

Using equation (6.7) we have:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(15 - 10) - 4}{\sqrt{\dfrac{1}{10} + \dfrac{2}{20}}} = 2.236$$

$$P(\mu_1 - \mu_2 > 4.0) = P(z < 2.236)$$

We have the change in sign because as $\mu_1 - \mu_2$ increases, $z$ decreases. The probability that $\mu_1 - \mu_2$ is greater than 4.0 µm is then given by $P(Z < 2.236)$. How do we know that we want $P(Z < 2.236)$ and not $P(Z > 2.236)$? We just have to sit down and think about what the problem physically means. Since we want the probability that $\mu_1 - \mu_2$ is greater than 4.0 µm, we know we need to include the area due to higher values of $\mu_1 - \mu_2$. Higher values of $\mu_1 - \mu_2$ yield lower values of $Z$. Therefore, we need the less-than sign.

The evaluation of the cumulative normal probability distribution can again be performed two ways. First, using a standard normal table, we have

$$P(Z < 2.24) = 0.9875$$

Second, using MATLAB we have

>> p = cdf('normal',2.236,0,1)

p = 0.9873

We are 98.73% confident that the true difference in mean crystal size between the two populations is at least 4.0 µm.

Statistic 3. Mean when the variance is unknown: Sampling Distribution

Of course, usually we don't know the population variance. In that case, we have to use some other statistic to get a handle on the distribution of the mean. If $\bar{X}$ is the mean of a random sample of size $n$ taken from an approximately normal population with mean $\mu$ and unknown variance, then the statistic

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \tag{6.8}$$

follows the t distribution $f_T(t; v)$. The T-statistic has a t-distribution with $v = n - 1$ degrees of freedom. The t-distribution is just another continuous PDF, like the others we learned about in the previous section. The t distribution is given by

$$f(t) = \frac{\Gamma[(v+1)/2]}{\Gamma(v/2)\sqrt{\pi v}} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}} \quad \text{for } -\infty < t < \infty$$

As a reminder, the t distribution is plotted again in Figure 6.1.

Figure 6.1. The t distribution as a function of the degrees of freedom and the normal distribution. (Axes: f(t) versus t; one curve per value of v, plus the normal curve.)

Example 6.9.: distribution of the mean, variance unknown

In a reactor intended to grow crystals, a seed is used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor in samples of size $n = 10$. The population has unknown variance in crystal size. The samples yield a mean crystal size of $\bar{x} = 15.0$ µm and a sample variance of $s^2 = 1.0$ µm².

What is the likelihood that the true population mean, $\mu$, is actually less than 14.0 µm?

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{15 - 14}{1/\sqrt{10}} = 3.16$$

$$P(\mu < 14) = P(t > 3.16)$$

We have the change in sign because as $\mu$ increases, $t$ decreases. The parameter $v = n - 1 = 9$.

The evaluation of the cumulative t probability distribution can again be performed two ways. First, we can use a table of critical values of the t-distribution. It is crucial to note that such a table does not provide cumulative PDFs; rather, it provides one minus the cumulative PDF. In other words, whereas the standard normal table provides the probability less than $z$ (the cumulative PDF), the t-distribution table provides the probability greater than $t$ (one minus the cumulative PDF). We then have

$$P(\mu < 14) = P(t > 3.16) = 0.007$$

Second, using MATLAB we have, for $P(\mu < 14) = P(t > 3.16) = 1 - P(t < 3.16)$,

>> p = 1 - cdf('t',3.16,9)
p = 0.0058

Based on our sampling data, the probability that the true population mean is less than 14.0 µm is about 0.58%.

We should point out that this percentage is substantially greater than the percentage we found when we knew the population variance (0.078%). That is because knowing the population variance reduces our uncertainty. Approximating the population variance with the sample variance adds to the uncertainty and places more probability farther from the sample mean.

Example 6.10.: distribution of the mean, variance unknown

An engineer claims that the population mean yield of a batch process is 500 g/ml of raw material. To verify this, she samples 25 batches each month. One month the sample has a mean $\bar{X} = 518$ g and a standard deviation of $s = 40$ g. Does this sample support her claim that $\mu = 500$ g?

The first step in solving this problem is to compute the T statistic.

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{518 - 500}{40/\sqrt{25}} = 2.25$$

Second, using MATLAB we have, for $P(\bar{X} > 518) = P(t > 2.25) = P(t < -2.25)$,

>> p = cdf('t',-2.25,24)
p = 0.0169

(Or, using a table, we find that when $v = 24$ and $T = 2.25$, $\alpha \approx 0.02$.) This means there is only a 1.7% probability that a population with $\mu = 500$ would yield a sample with $\bar{X} = 518$ or higher. Therefore, it is unlikely that 500 is the population mean.

Statistic 4. difference of means when the variance is unknown: Sampling Distribution

It is useful to know the sampling difference of two means when you want to determine whether there is a significant difference between two populations. Sometimes you want to do this when you don't know the population variances. This situation applies when you take two random samples of size $n_1$ and $n_2$ from two different populations, with means $\mu_1$ and $\mu_2$ and unknown variances. Then the sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, follows the t-distribution.

transformation:

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{6.9}$$

symmetry: $t_{1-\alpha} = -t_{\alpha}$

parameters: $v = n_1 + n_2 - 2$ if $\sigma_1 = \sigma_2$

parameters:

$$v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \quad \text{if } \sigma_1 \neq \sigma_2$$

Since we don't know either population variance in this case, we can't assume they are equal unless we are told they are equal.

Example 6.11.: distribution of the difference of means, variances unknown

In a reactor intended to grow crystals, two different types of seeds are used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor in samples of sizes $n_1 = 10$ and $n_2 = 20$. The populations have unknown variances in crystal size. The samples yield mean crystal sizes of $\bar{X}_1 = 15.0$ µm and $\bar{X}_2 = 10.0$ µm and sample variances of $s_1^2 = 1.0$ µm² and $s_2^2 = 2.0$ µm². How likely is it, given these sampling results, that the true difference in population means, $\mu_1 - \mu_2$, is 4.0 µm or greater?

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(15 - 10) - 4}{\sqrt{\dfrac{1}{10} + \dfrac{2}{20}}} = 2.236$$

The degree of freedom parameter is given by:

$$v = \frac{\left(\dfrac{1}{10} + \dfrac{2}{20}\right)^2}{\dfrac{(1/10)^2}{10 - 1} + \dfrac{(2/20)^2}{20 - 1}} = 24.43 \approx 24$$

$$P(\mu_1 - \mu_2 > 4.0) = P(t < 2.236) = 1 - P(t > 2.236)$$

The evaluation of the cumulative t probability distribution can again be performed two ways. First, using a table of critical values of the t-distribution and interpolating, we have

$$P(\mu_1 - \mu_2 > 4.0) = P(t < 2.236) = 1 - P(t > 2.236) \approx 1 - 0.018 = 0.982$$

Second, using MATLAB we have, for $P(\mu_1 - \mu_2 > 4.0) = P(t < 2.236)$,

>> p = cdf('t',2.236,24)
p = 0.9824

We are about 98.2% confident that the true difference in mean crystal size between the two populations is at least 4.0 µm.

Statistic 5. Variance: Sampling Distribution

We now wish to know the sampling distribution of the sample variance, $S^2$. If $S^2$ is the variance of a random sample of size $n$ taken from a population with mean $\mu$ and variance $\sigma^2$, then the statistic

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} \tag{6.10}$$

has a chi-squared distribution with $v = n - 1$ degrees of freedom, $f_{\chi^2}(\chi^2; n-1)$. The chi-squared distribution is defined as

$$f_{\chi^2}(x; v) = \begin{cases} \dfrac{1}{2^{v/2}\,\Gamma(v/2)}\, x^{v/2 - 1} e^{-x/2} & \text{for } x > 0 \\[4pt] 0 & \text{elsewhere} \end{cases}$$

It is a special case of the Gamma distribution, when $\alpha = v/2$ and $\beta = 2$, where $v$ is called the degrees of freedom and is a positive integer. As a reminder, we provide a plot of the chi-squared distribution in Figure 6.2.

Figure 6.2. The chi-squared distribution for various values of v. (Axes: f(χ²) versus χ².)

Example 6.12.: distribution of the variance

In a reactor intended to grow crystals, a seed is used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor in samples of size $n = 10$. The samples yield a mean crystal size of $\bar{x} = 15.0$ µm and a sample variance of $s^2 = 1.0$ µm². What is the likelihood that the true population variance, $\sigma^2$, is actually less than 0.5 µm²?

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{(10 - 1)(1)}{0.5} = 18$$

$$P(\sigma^2 < 0.5) = P(\chi^2 > 18)$$

We have the change in sign because as $\sigma^2$ increases, $\chi^2$ decreases. The parameter $v = n - 1 = 9$.

The evaluation of the cumulative $\chi^2$ probability distribution can again be performed two ways. First, we can use a table of critical values of the $\chi^2$-distribution. It is crucial to note that such a table does not provide cumulative PDFs; rather, it provides one minus the cumulative PDF. We then have

$$P(\sigma^2 < 0.5) = P(\chi^2 > 18) = 0.04$$

Second, using MATLAB we have, for $P(\sigma^2 < 0.5) = P(\chi^2 > 18) = 1 - P(\chi^2 < 18)$,

>> p = 1 - cdf('chi2',18,9)
p = 0.035

Based on our sampling data, the probability that the true variance is less than 0.5 µm² is 3.5%.

Statistic 6. the ratio of Variances: Sampling Distribution (F-distribution)

Just as we studied the distribution of two sample means, so too are we interested in the distribution of two variances. In the case of the mean, it was a difference. In the case of the variance, the ratio is more useful. Now consider two random samples of size $n_1$ and $n_2$ from two different populations, with variances $\sigma_1^2$ and $\sigma_2^2$, respectively. The statistic, $F$,

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2} \tag{6.11}$$

provides a distribution of the ratio of two variances. This distribution is called the F-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom. The F-distribution is defined as

$$h(f; v_1, v_2) = \begin{cases} \dfrac{\Gamma\!\left(\dfrac{v_1 + v_2}{2}\right) \left(\dfrac{v_1}{v_2}\right)^{v_1/2}}{\Gamma\!\left(\dfrac{v_1}{2}\right)\Gamma\!\left(\dfrac{v_2}{2}\right)} \, \dfrac{f^{\,v_1/2 - 1}}{\left(1 + \dfrac{v_1 f}{v_2}\right)^{(v_1 + v_2)/2}} & \text{for } f > 0 \\[4pt] 0 & \text{elsewhere} \end{cases}$$

As a reminder, the F-distribution is plotted in Figure 6.3.

Figure 6.3. The F distribution for various values of v1 and v2. (Axes: h(f) versus f.)

Example 6.13.: ratio of the variances

In a reactor intended to grow crystals, two different types of seeds are used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor in samples of sizes $n_1 = 10$ and $n_2 = 20$. The populations have unknown variances in crystal size. The samples yield mean crystal sizes of $\bar{X}_1 = 15.0$ µm and $\bar{X}_2 = 10.0$ µm and sample variances of $s_1^2 = 1.0$ µm² and $s_2^2 = 2.0$ µm². What is the probability that the ratio of variances, $\sigma_1^2/\sigma_2^2$, is less than 0.25?

$$F = \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} = \frac{0.5}{0.25} = 2$$

$$P\!\left(\frac{\sigma_1^2}{\sigma_2^2} < 0.25\right) = P(F > 2)$$

We have the change in sign because as $\sigma_1^2/\sigma_2^2$ increases, $F$ decreases. The parameters are $v_1 = 9$ and $v_2 = 19$.

The evaluation of the cumulative F probability distribution can be performed in only one way here. We cannot use tables because there are no tables for arbitrary values of the probability. There are only tables for two values of the probability, 0.01 and 0.05. Therefore, using MATLAB we have, for $P(\sigma_1^2/\sigma_2^2 < 0.25) = P(F > 2) = 1 - P(F < 2)$,

>> p = 1 - cdf('f',2,9,19)
p = 0.0974

Based on our sampling data, the probability that the ratio of variances is less than 0.25 is 9.7%.

6.4. Confidence Intervals

In the previous section we showed what types of distributions describe various statistics of a random sample. In this section, we discuss estimating the population mean and variance from the sample mean and variance. In addition, we introduce confidence intervals to quantify the goodness of these estimates.

A confidence interval is some subset of random variable space with which someone can say something like, "I am 95% sure that the true population mean is between $\mu_{low}$ and $\mu_{hi}$." In this section, we discuss how a confidence interval is defined and calculated.

The confidence interval is defined by a percent. This percent is called $(1 - 2\alpha)$. So if $\alpha = 0.05$, then you would have a 90% confidence interval. The concept of a confidence interval is illustrated in graphical terms in Figure 6.4.

Figure 6.4. A schematic illustrating a confidence interval.

The trick then is to find $\mu_{low}$ (corresponding to $z_{\alpha}$) and $\mu_{hi}$ (corresponding to $z_{1-\alpha}$) so that you can say for a given $\alpha$, "I am $(1 - 2\alpha)$% confident that $\mu_{low} < \mu < \mu_{hi}$."

Statistic 1. mean, σ known: confidence interval

We now know that the standardized sample mean, $Z$, is distributed with the standard normal distribution. For a symmetric PDF centered around zero, like the standard normal, the lower and upper limits in $z$ are equal in magnitude and opposite in sign. We can then make the statement:

$$P(z_{\alpha} < Z < z_{1-\alpha}) = 1 - 2\alpha$$

Now the normal distribution is symmetric about the y-axis, so we can write $z_{\alpha} = -z_{1-\alpha}$, so

$$P(z_{\alpha} < Z < z_{1-\alpha}) = P(z_{\alpha} < Z < -z_{\alpha}) = 1 - 2\alpha$$

where

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

We can rearrange this equation to read

$$P\!\left(\bar{X} + z_{\alpha}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}}\right) = 1 - 2\alpha \tag{6.12}$$

where we now have $\mu_{low}$ and $\mu_{hi}$ explicitly.

Example 6.14.: confidence interval on mean, variance known

Samples of dioxin contamination in 36 front yards in St. Louis show a concentration of 6 ppm. Find the 95% confidence interval for the population mean. Assume that the standard deviation is 1.0 ppm.

To solve this, first calculate $\alpha$, $z_{\alpha}$, $z_{1-\alpha}$.

$$1 - 2\alpha = 0.95, \qquad \alpha = 0.025$$

$$z_{\alpha} = z_{0.025} = -1.96, \qquad z_{1-\alpha} = -z_{\alpha} = 1.96$$

The z value came from a standard normal table. Alternatively, we can compute this value from MATLAB,

>> z = icdf('normal',0.025,0,1)
z = -1.959963984540055

Here we used the inverse cumulative distribution function (icdf) command. Since we have the standard normal PDF, the mean is 0 and the variance is 1. The value of 0.025 corresponds to $\alpha$, the probability. To get the value of the other limit, we either rely on symmetry, or compute it directly,

>> z = icdf('normal',0.975,0,1)
z = 1.959963984540054

Note that these values of z are independent of all aspects of the problem except the value of the confidence interval.
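The same critical values can be obtained without MATLAB. The following is a minimal Python sketch, assuming Python 3.8 or later, that uses `statistics.NormalDist.inv_cdf` from the standard library as a counterpart of the inverse cumulative distribution command for the normal distribution. The function name `z_limits` and its `tail` argument are introduced here for illustration only; a 95% interval is taken to leave a probability of 0.025 in each tail.

```python
from statistics import NormalDist


def z_limits(tail):
    """Return the lower and upper standard normal limits that each cut off
    a probability `tail` (hypothetical helper, not from the text)."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    z_low = standard_normal.inv_cdf(tail)         # probability `tail` lies below z_low
    z_high = standard_normal.inv_cdf(1.0 - tail)  # probability `tail` lies above z_high
    return z_low, z_high


# 95% confidence interval: 2.5% of the probability in each tail
z_low, z_high = z_limits(0.025)
print(z_low, z_high)
```

Because the standard normal PDF is symmetric, the two limits differ only in sign, so computing one of them is enough in practice.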

Therefore, by equation (6.12)

$$P\!\left(6 + (-1.96)\frac{1}{\sqrt{36}} < \mu < 6 - (-1.96)\frac{1}{\sqrt{36}}\right) = 1 - 2(0.025) = 0.95$$

so the 95% confidence interval for the mean is $5.673 < \mu < 6.327$.

Statistic 2. mean, σ unknown: confidence interval

Now usually, we don't know the variance. We have to use our estimate of the standard deviation, $s$, for $\sigma$. In that case, estimating the mean requires the t-distribution. (See previous section.) Let me stress that we do everything exactly as we did before, but we use $s$ for $\sigma$ and use the t-distribution instead of the normal distribution. Remember the t-distribution is also symmetric about the origin, so $t_{\alpha} = -t_{1-\alpha}$. (This means you only have to compute the t probability once.) Remember, $v = n - 1$.

$$P(t_{\alpha} < T < t_{1-\alpha}) = P(t_{\alpha} < T < -t_{\alpha}) = 1 - 2\alpha$$

where

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

Just as before, we can rearrange this equation to read

$$P\!\left(\bar{X} + t_{\alpha}\frac{s}{\sqrt{n}} < \mu < \bar{X} - t_{\alpha}\frac{s}{\sqrt{n}}\right) = 1 - 2\alpha \tag{6.13}$$

where we now have $\mu_{low}$ and $\mu_{hi}$ explicitly.

Example 6.15.: confidence interval on mean, variance unknown

Samples of dioxin contamination in 36 front yards in St. Louis show a concentration of 6 ppm. Find the 95% confidence interval for the population mean. The sample standard deviation, $s$, was measured to be 1.0 ppm.

To solve this, first calculate $\alpha$, $t_{\alpha}$, $t_{1-\alpha}$ for $v = 35$.

$$1 - 2\alpha = 0.95, \qquad \alpha = 0.025$$

$$t_{\alpha} = t_{0.025} = -2.03, \qquad t_{1-\alpha} = -t_{\alpha} = +2.03$$

The t value came from a table of t-distribution values. Alternatively, we can compute this value using MATLAB,

>> t = icdf('t',0.025,35)
t = -2.0301

and for the upper limit

>> t = icdf('t',0.975,35)
t = 2.0301

which can also be obtained by symmetry. Note that these values of t are independent of all aspects of the problem except the value of the confidence interval and the number of sample points, $n$.

Therefore, by equation (6.13)

$$P\!\left(6 + (-2.03)\frac{1}{\sqrt{36}} < \mu < 6 - (-2.03)\frac{1}{\sqrt{36}}\right) = 1 - 2(0.025) = 0.95$$

so the 95% confidence interval for the mean is $5.662 < \mu < 6.338$. You should note that we are a little less confident about the mean when we use the sample variance as the estimate for the population variance: with the population variance known, the 95% confidence interval for the mean was $5.673 < \mu < 6.327$.

Statistic 3. difference of means, σ known: confidence interval

The exact same derivation that we used above for a single mean can be used for the difference of means. When the variances of the two populations are known, we have:

$$P\!\left((\bar{X}_1 - \bar{X}_2) + z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < (\mu_1 - \mu_2) < (\bar{X}_1 - \bar{X}_2) - z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) = 1 - 2\alpha \tag{6.14}$$

where $z$ is a random variable obeying the standard normal PDF.

Example 6.16.: confidence interval on the difference of means, variances known

Samples of dioxin contamination in 36 front yards in Times Beach, a suburb of St. Louis, show a concentration of 6 ppm with a population variance of 1.0 ppm². Samples of dioxin contamination in 16 front yards in Quail Run, another suburb of St. Louis, show a concentration of 8 ppm with a population variance of 3.0 ppm². Find the 95% confidence interval for the difference of population means.

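To solve this, first calculate $\alpha$, $z_{\alpha}$, $z_{1-\alpha}$.

$$1 - 2\alpha = 0.95, \qquad \alpha = 0.025$$

$$z_{\alpha} = z_{0.025} = -1.96, \qquad z_{1-\alpha} = -z_{\alpha} = 1.96$$

The z value came from a table of standard normal PDF values. Alternatively, we can compute this value from MATLAB,

>> z = icdf('normal',0.025,0,1)
z = -1.959963984540055

Therefore, by equation (6.14)

$$P\!\left[(6 - 8) - 1.96\sqrt{\frac{1}{36} + \frac{3}{16}} < (\mu_1 - \mu_2) < (6 - 8) + 1.96\sqrt{\frac{1}{36} + \frac{3}{16}}\right] = 1 - 2(0.025)$$

$$P\left[-2.909 < (\mu_1 - \mu_2) < -1.091\right] = 0.95$$

So the 95% confidence interval for the difference of means is $-2.909 < (\mu_1 - \mu_2) < -1.091$.

If we are determining which site is more contaminated, then we are 95% sure that site 2 (Quail Run) is more contaminated by 1 to 3 ppm than site 1 (Times Beach).

Statistic 4. difference of means, σ unknown: confidence interval

When the variances of the two populations are unknown, we have:

$$P\!\left((\bar{X}_1 - \bar{X}_2) + t_{\alpha}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < (\mu_1 - \mu_2) < (\bar{X}_1 - \bar{X}_2) - t_{\alpha}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right) = 1 - 2\alpha \tag{6.15}$$

where the number of degrees of freedom for the t-distribution is $v = n_1 + n_2 - 2$ if $\sigma_1 = \sigma_2$, and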

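$$v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \quad \text{if } \sigma_1 \neq \sigma_2$$

Example 6.16.: confidence interval on the difference of means, variances unknown

Samples of dioxin contamination in 36 front yards in Times Beach, a suburb of St. Louis, show a concentration of 6 ppm with a sample variance of 1.0 ppm². Samples of dioxin contamination in 16 front yards in Quail Run, another suburb of St. Louis, show a concentration of 8 ppm with a sample variance of 3.0 ppm². Find the 95% confidence interval for the difference of population means.

To solve this, first calculate $v$, $t_{\alpha}$, $t_{1-\alpha}$.

$$v = \frac{\left(\dfrac{1}{36} + \dfrac{3}{16}\right)^2}{\dfrac{(1/36)^2}{36 - 1} + \dfrac{(3/16)^2}{16 - 1}} = 19.59 \approx 20$$

$$1 - 2\alpha = 0.95, \qquad \alpha = 0.025$$

$$t_{\alpha} = t_{0.025} = -2.086, \qquad t_{1-\alpha} = -t_{\alpha} = 2.086$$

The t value came from a table of t-PDF values. Alternatively, we can compute this value using MATLAB,

>> t = icdf('t',0.025,20)
t = -2.0860

Therefore, substituting into equation (6.15) yields

$$P\!\left[(6 - 8) - 2.086\sqrt{\frac{1}{36} + \frac{3}{16}} < (\mu_1 - \mu_2) < (6 - 8) + 2.086\sqrt{\frac{1}{36} + \frac{3}{16}}\right] = 1 - 2(0.025)$$

$$P\left[-2.97 < (\mu_1 - \mu_2) < -1.03\right] = 0.95$$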
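The degrees of freedom and interval limits for the unequal-variance case can be checked with a short Python sketch. It uses only the standard library, so the t critical value is passed in as a number read from a table or from an inverse-CDF routine rather than computed. The function names `unequal_variance_dof` and `mean_difference_interval` are hypothetical names introduced here, and the inputs are the dioxin figures: sample means of 6 and 8 ppm, sample variances of 1.0 and 3.0 ppm², and sample sizes of 36 and 16.

```python
import math


def unequal_variance_dof(s1_sq, n1, s2_sq, n2):
    """Degrees of freedom for the difference of means when the two
    population variances are not assumed equal (hypothetical helper)."""
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def mean_difference_interval(x1, x2, s1_sq, n1, s2_sq, n2, t_crit):
    """Two-sided interval on mu1 - mu2; `t_crit` is the positive t critical
    value supplied by the caller (assumed input, not computed here)."""
    half_width = t_crit * math.sqrt(s1_sq / n1 + s2_sq / n2)
    diff = x1 - x2
    return diff - half_width, diff + half_width


v = unequal_variance_dof(1.0, 36, 3.0, 16)
low, high = mean_difference_interval(6.0, 8.0, 1.0, 36, 3.0, 16, t_crit=2.086)
print(round(v, 2), round(low, 2), round(high, 2))
```

Rounding the computed degrees of freedom to the nearest integer before looking up the critical value mirrors the table-based procedure; a routine that accepts non-integer degrees of freedom could use the unrounded value instead.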

So the 95% confidence interval for the difference of means is $-2.97 < (\mu_1 - \mu_2) < -1.03$.

If we are determining which site is more contaminated, then we are 95% sure that site 2 (Quail Run) is more contaminated by 1 to 3 ppm than site 1 (Times Beach).

Statistic 5. variance: confidence interval

The confidence interval of the variance can be estimated in a precisely analogous way, knowing that the statistic

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$

has a chi-squared distribution with $v = n - 1$ degrees of freedom, $f(\chi^2; n-1)$. So

$$P\!\left(\frac{(n-1)S^2}{\chi^2_{1-\alpha}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{\alpha}}\right) = 1 - 2\alpha \tag{6.16}$$

Perversely, the tables of the critical values for the $\chi^2$ distribution have defined $\alpha$ to be $1 - \alpha$, so the indices have to be switched when using the table.

$$P\!\left(\frac{(n-1)S^2}{\chi^2_{\alpha}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha}}\right) = 1 - 2\alpha \quad \text{when using the } \chi^2 \text{ critical values table only!}$$

If you get confused, just remember that the upper limit must be greater than the lower limit. Remember also that $f(\chi^2; n-1)$ is not symmetric about the origin, so we cannot use the symmetry arguments used for the confidence intervals for functions of the mean.

Example 6.17.: variance

Samples of dioxin contamination in 16 front yards in St. Louis show a concentration of 6 ppm. Find the 95% confidence interval for the population variance. The sample standard deviation, $s$, was measured to be 1.0 ppm.

To solve this, first calculate $\alpha$, $\chi^2_{\alpha}$, $\chi^2_{1-\alpha}$. For $v = 15$, we have

$$1 - 2\alpha = 0.95, \qquad \alpha = 0.025$$

$$\chi^2_{\alpha} = \chi^2_{0.025} = 6.262, \qquad \chi^2_{1-\alpha} = \chi^2_{0.975} = 27.488$$

The $\chi^2$ value came from a table of $\chi^2$-distribution values. Alternatively, we can compute this value using MATLAB,

>> chi = icdf('chi2',0.025,15)
chi = 6.2621

and

>> chi = icdf('chi2',0.975,15)
chi = 27.4884

Therefore, substituting into equation (6.16) yields

$$P\!\left(\frac{(16 - 1)(1.0)}{27.488} < \sigma^2 < \frac{(16 - 1)(1.0)}{6.262}\right) = 1 - 2(0.025)$$

$$P\left(0.5457 < \sigma^2 < 2.395\right) = 0.95$$

So the 95% confidence interval for the variance is $0.5457 < \sigma^2 < 2.395$.

Statistic 6. ratio of variances: confidence interval (p. 53)

The ratio of two population variances can be estimated in a precisely analogous way, knowing that the statistic

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$

follows the F-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom. Remember, the F-distribution has a symmetry,

$$f_{1-\alpha}(v_1, v_2) = \frac{1}{f_{\alpha}(v_2, v_1)}$$

This symmetry relation is essential if one is to use tables for the critical value of the F-distribution. It is not essential if one uses MATLAB commands.

If one is computing the cumulative PDF for the F distribution, then one simply rearranges this equation for

$\sigma_1^2/\sigma_2^2$ to obtain

$$P\!\left(\frac{S_1^2}{S_2^2}\,\frac{1}{f_{1-\alpha}(v_1, v_2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\,\frac{1}{f_{\alpha}(v_1, v_2)}\right) = 1 - 2\alpha \tag{6.17}$$

One notes that the order of the limits has changed here, since as $\sigma_1^2/\sigma_2^2$ goes up, $F$ goes down. In any case, the lower limit must be smaller than the upper limit.

If one chooses to use tables of critical values, one must take into account two idiosyncrasies of the procedure. First, as was the case with the t and chi-squared distributions, the table provides the probability that $f$ is greater than a value, not the cumulative PDF, which is the probability that $f$ is less than a value. Second, the tables only provide data for small values of $\alpha$. Therefore, we must eliminate all instances of $1 - \alpha$, using the symmetry relation. The result is

$$P\!\left(\frac{S_1^2}{S_2^2}\,\frac{1}{f_{\alpha}(v_1, v_2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\, f_{\alpha}(v_2, v_1)\right) = 1 - 2\alpha \quad \text{when using the tables only!}$$

Example 6.18.: confidence interval on the ratio of variances

Samples of dioxin contamination in 20 front yards in Times Beach, a suburb of St. Louis, show a concentration of 6 ppm with a sample variance of 1.0 ppm². Samples of dioxin contamination in 16 front yards in Quail Run, another suburb of St. Louis, show a concentration of 8 ppm with a sample variance of 3.0 ppm². Find the 90% confidence interval for the ratio of population variances.

To solve this, first calculate $\alpha$, $F_{\alpha}$, $F_{1-\alpha}$, with $v_1 = 19$ and $v_2 = 15$.

$$1 - 2\alpha = 0.90, \qquad \alpha = 0.05$$

We can compute the f probabilities using MATLAB,

>> f = icdf('f',0.05,19,15)
f = 0.4476

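and

>> f = icdf('f',0.95,19,15)
f = 2.3398

Substituting into equation (6.17) yields

$$P\!\left(\frac{1}{3}\,\frac{1}{2.3398} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{1}{3}\,\frac{1}{0.4476}\right) = 1 - 2(0.05)$$

$$P\!\left(0.1425 < \frac{\sigma_1^2}{\sigma_2^2} < 0.7447\right) = 0.90$$

Alternatively, we can use the table of critical values. The table has no column for $v_1 = 19$, so the nearest tabulated value, $v_1 = 20$, is used for the first critical value.

$$F_{0.05}(v_1 = 20, v_2 = 15) = 2.33$$

$$F_{1-0.05}(v_1 = 19, v_2 = 15) = \frac{1}{F_{0.05}(v_1 = 15, v_2 = 19)} = \frac{1}{2.23}$$

$$P\!\left(\frac{1}{3}\,\frac{1}{2.33} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{1}{3}\,(2.23)\right) = 1 - 2(0.05)$$

$$P\!\left(0.1431 < \frac{\sigma_1^2}{\sigma_2^2} < 0.7433\right) = 0.90$$

So the 90% confidence interval for the ratio of variances is $0.1425 < \sigma_1^2/\sigma_2^2 < 0.7447$.

If we are determining which site has a greater variance of contamination levels, then we are 90% sure that site 2 (Quail Run) has more variance by a factor of 1.3 to 7.0.

6.5. Problems

We intend to purchase a liquid as a raw material for a material we are designing. Two vendors offer us samples of their product and a statistic sheet. We run the samples in our own labs and come up with the following data: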

Vendor 1                 Vendor 2
sample #   outcome       sample #   outcome
1          .3            1          .49
2          .49           2          .98
3          .05           3          .8
4          .4            4          .36
5          .8            5          .47
6          .             6          .36
7          .38           7          .8
8          .39           8          .88
9          .4            9          .87
10         .46           10         .87
11         .9
12         .04
13         .43
14         .34
15         .9
16         .

Vendor Specification Claims:

Vendor 1: $\mu_1$ = .0 and $\sigma_1^2 = 0.05$, $\sigma_1 = 0.2236$
Vendor 2: $\mu_2$ = .3 and $\sigma_2^2 = 0.12$, $\sigma_2 = 0.3464$

Sample statistics, based on the data provided in the table above:

$$\bar{x}_1 = \frac{1}{16}\sum_{i=1}^{16} x_i, \qquad s_1^2 = \frac{1}{16 - 1}\sum_{i=1}^{16} (x_i - \bar{x}_1)^2$$

$$\bar{x}_2 = \frac{1}{10}\sum_{i=1}^{10} x_i, \qquad s_2^2 = \frac{1}{10 - 1}\sum_{i=1}^{10} (x_i - \bar{x}_2)^2 = 0.0744, \qquad s_2 = 0.2728$$

Problem 6.1. Determine a 95% confidence interval on the mean of sample 1. Use the value of the population variance given. Is the given population mean legitimate?

Problem 6.2. Determine a 95% confidence interval on the difference of means between samples 1 and 2. Use the values of the population variance given. Is the difference between the given population means legitimate?

Problem 6.3. Determine a 95% confidence interval on the mean of sample 1. Assume the given values of the population variances are suspect and not to be trusted. Is the given population mean legitimate?

Problem 6.4. Determine a 95% confidence interval on the difference of means between samples 1 and 2. Assume the given values of the population variances are suspect and not to be trusted. Is the difference between the given population means legitimate?

Problem 6.5. Determine a 95% confidence interval on the variance of sample 1. Is the given population variance legitimate?

Problem 6.6. Determine a 98% confidence interval on the ratio of variances of samples 1 and 2. Is the ratio of the given population variances legitimate?