Chapter 6. Sampling and Estimation

6.1. Introduction

Frequently the engineer is unable to completely characterize the entire population. She or he must be satisfied with examining some subset of the population, or several subsets of the population, in order to infer information about the entire population. Such subsets are called samples. A population is the entirety of observations and a sample is a subset of the population. A sample that gives correct inferences about the population is a random sample; otherwise it is biased.

Statistics are given different symbols than the expectation values because statistics are approximations of the expectation values. The statistic called the mean is an approximation to the expectation value of the mean. The statistic mean is the mean of the sample and the expectation value mean is the mean of the entire population. In order to calculate an expectation, one requires knowledge of the PDF. In practice, the motivation for calculating a statistic is that one has no knowledge of the underlying PDF.

6.2. Statistics

Any function of the random variables constituting a random sample is called a statistic.

Example 6.1.: Mean
The mean is a statistic of a random sample of size $n$ and is defined as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (6.1)$$

Example 6.2.: Median
The median is a statistic of a random sample of size $n$, which represents the middle value of the sample and, for a sample arranged in increasing order of magnitude, is defined as

$$\tilde{X} = \begin{cases} X_{(n+1)/2} & \text{for } n \text{ odd} \\[4pt] \dfrac{X_{n/2} + X_{n/2+1}}{2} & \text{for } n \text{ even} \end{cases} \qquad (6.2)$$

The median of the sample {1,2,3} is 2. The median of the sample {3,1,2} is 2. The median of the sample {1,2,3,4} is 2.5.

Example 6.3.: Mode
The mode is a statistic of a random sample of size $n$, which represents the most frequently appearing value in the sample. The mode may not exist and, if it does, it may not be unique.

The mode of the sample {1,2,2,3} is 2. The mode of the sample {1,2,2,3,4,4} is 2 and 4 (bimodal). The mode of the sample {1,2,3} does not exist since each entry occurs only once.

Example 6.4.: Range
The range is a statistic of a random sample of size $n$, which represents the span of the sample and, for a sample arranged in increasing order of magnitude, is defined as

$$\mathrm{range}(X) = X_n - X_1 \qquad (6.3)$$

The range of {1,2,3,4,5} is 5 - 1 = 4.

Example 6.5.: Variance
The variance is a statistic of a random sample of size $n$, which represents the spread of the sample and is defined as

$$S^2 = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1} = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)} \qquad (6.4)$$

The reason for using $(n-1)$ in the denominator rather than $n$ is given later.

Example 6.6.: Standard Deviation
The standard deviation, $S$, is a statistic of a random sample of size $n$, which represents the spread of the sample and is defined as the positive square root of the variance.

$$S = \sqrt{S^2} \qquad (6.5)$$
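The statistics defined in equations (6.1) through (6.5) can be sketched in a few lines of code. The snippet below is only an illustration: it is written in Python (the chapter itself works in MATLAB), it uses the standard-library `statistics` module, and the data list `sample` is a made-up example rather than data from the text.

```python
import statistics

# Hypothetical sample, used only to exercise the definitions.
sample = [1, 2, 2, 3, 4, 4]
n = len(sample)

mean = sum(sample) / n                   # equation (6.1)
median = statistics.median(sample)       # equation (6.2); sorts internally
modes = statistics.multimode(sample)     # all most-frequent values
data_range = max(sample) - min(sample)   # equation (6.3)

# Equation (6.4), definitional form with (n - 1) in the denominator.
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
# Equation (6.4), computational form; should agree with the line above.
variance_alt = (n * sum(x * x for x in sample) - sum(sample) ** 2) / (n * (n - 1))

std_dev = variance ** 0.5                # equation (6.5)

print(mean, median, modes, data_range, variance, std_dev)
```

Note that `statistics.multimode` returns every value when all entries occur equally often, so a sample such as {1,2,3}, for which the text says the mode does not exist, has to be detected separately (for example by checking whether the returned list is as long as the set of distinct values).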

6.3. Sampling Distributions

We have now stated the definitions of the statistics we are interested in. Now, we need to know the distribution of the statistics to determine how good these sampling approximations are to the true expectation values of the population.

Statistic 1. Mean when the variance is known: Sampling Distribution

If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and variance $\sigma^2$, then the limiting form of the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (6.6)$$

as $n \to \infty$, is the standard normal distribution $n(z; 0, 1)$. This is known as the Central Limit Theorem. What this says is that, given a collection of random samples, each of size $n$, yielding a mean $\bar{X}$, the distribution of $\bar{X}$ approximates a normal distribution, and becomes exactly a normal distribution as the sample size goes to infinity. The distribution of $X$ does not have to be normal. Generally, the normal approximation for $\bar{X}$ is good if $n > 30$. We provide a derivation in Appendix V proving that the distribution of the sample mean is given by the normal distribution.

Example 6.7.: distribution of the mean, variance known
In a reactor intended to grow crystals in solution, a seed is used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor of sizes $n = 10$. The population has variance in crystal size of $\sigma^2 = 1.0$ µm². (We must know this from previous research.) The samples yield mean crystal sizes of $\bar{x} = 15.0$ µm. What is the likelihood that the true population mean, $\mu$, is actually less than 14.0 µm?

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{15 - 14}{1/\sqrt{10}} = 3.162$$

$$P(\mu < 14) = P(z > 3.162)$$

We have the change in sign because as $\mu$ increases, $z$ decreases.

The evaluation of the cumulative normal probability distribution can be performed several ways. First, when the pioneers were crossing the plains in their covered wagons and they wanted to evaluate probabilities from the normal distribution, they used tables of the cumulative normal PDF, such as those provided in the back of the statistics textbook. These tables are also available online. For example, Wikipedia has a table of cumulative standard normal probabilities at

http://en.wikipedia.org/wiki/standard_normal_table

Using the table, we find

$$P(\mu < 14) = P(z > 3.162) = 1 - P(z < 3.162) = 1 - 0.9992 = 0.0008$$

Second, we can use a modern computational tool like MATLAB to evaluate the probability. The problem can be worked in terms of the standard normal PDF ($\mu = 0$ and $\sigma = 1$), which for $P(\mu < 14) = P(z > 3.162) = 1 - P(z < 3.162)$ is

    >> p = 1 - cdf('normal',3.162,0,1)
    p = 7.834e-04

Alternatively, the problem can be worked in terms of the non-standard normal PDF (mean $\bar{x} = 15$ and standard deviation $\sigma/\sqrt{n} = 1/\sqrt{10}$), which for $P(\mu < 14)$ is

    >> p = cdf('normal',14,15,1/sqrt(10))
    p = 7.827e-04

The difference in these results is due to the round-off in 3.162, used as an argument in the function call for the standard normal distribution.

Based on our sampling data, the probability that the true population mean is less than 14.0 µm is 0.078%.

Statistic 2. Difference of means when the variance is known: Sampling Distribution

It is useful to know the sampling distribution of the difference of two means when you want to determine whether there is a significant difference between two populations. This situation applies when you take two random samples of size $n_1$ and $n_2$ from two different populations, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Then the sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately normal, distributed with mean

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$$

and variance

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
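The mean and variance quoted above for the difference of two sample means can be checked empirically. The following Python sketch (Python is an assumption here; the chapter uses MATLAB) draws many pairs of samples from two normal populations and compares the observed mean and variance of $\bar{X}_1 - \bar{X}_2$ with the predicted values. The population parameters, sample sizes, trial count and seed are all illustrative choices, not values taken from the text at this point.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical populations and sample sizes, chosen only for illustration.
mu1, var1, n1 = 15.0, 1.0, 10
mu2, var2, n2 = 10.0, 2.0, 20
trials = 20000

diffs = []
for _ in range(trials):
    # Mean of one random sample from each population.
    xbar1 = sum(rng.gauss(mu1, var1 ** 0.5) for _ in range(n1)) / n1
    xbar2 = sum(rng.gauss(mu2, var2 ** 0.5) for _ in range(n2)) / n2
    diffs.append(xbar1 - xbar2)

predicted_mean = mu1 - mu2                 # mu_1 - mu_2
predicted_var = var1 / n1 + var2 / n2      # sigma_1^2/n_1 + sigma_2^2/n_2

observed_mean = statistics.fmean(diffs)
observed_var = statistics.variance(diffs)

print(predicted_mean, observed_mean)
print(predicted_var, observed_var)
```

With this many trials the observed values should land close to the predictions, though a simulation of this kind only illustrates the result and does not prove it.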

Hence,

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \qquad (6.7)$$

is approximately a standard normal variable.

Example 6.8.: distribution of the difference of means, variances known
In a reactor intended to grow crystals, two different types of seeds are used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor of sizes $n_1 = 10$ and $n_2 = 20$. The populations have variances in crystal size of $\sigma_1^2 = 1.0$ µm² and $\sigma_2^2 = 2.0$ µm². (We must know this from previous research.) The samples yield mean crystal sizes of $\bar{X}_1 = 15.0$ µm and $\bar{X}_2 = 10.0$ µm. How confident can we be that the true difference in population means, $\mu_1 - \mu_2$, is actually 4.0 µm or greater?

Using equation (6.7) we have:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(15 - 10) - 4}{\sqrt{\dfrac{1}{10} + \dfrac{2}{20}}} = 2.236$$

$$P(\mu_1 - \mu_2 > 4.0) = P(z < 2.236)$$

We have the change in sign because as $\mu_1 - \mu_2$ increases, $z$ decreases. The probability that $\mu_1 - \mu_2$ is greater than 4.0 µm is then given by $P(Z < 2.236)$. How do we know that we want $P(Z < 2.236)$ and not $P(Z > 2.236)$? We just have to sit down and think about what the problem physically means. Since we want the probability that $\mu_1 - \mu_2$ is greater than 4.0 µm, we know we need to include the area due to higher values of $\mu_1 - \mu_2$. Higher values of $\mu_1 - \mu_2$ yield lower values of $Z$. Therefore, we need the less-than sign.

The evaluation of the cumulative normal probability distribution can again be performed two ways. First, using a standard normal table, we have

$$P(Z < 2.24) = 0.9875$$

Second, using MATLAB we have

    >> p = cdf('normal',2.236,0,1)

    p = 0.9873

We expect 98.73% of the differences in crystal size of the two populations to be at least 4.0 µm.

Statistic 3. Mean when the variance is unknown: Sampling Distribution

Of course, usually we don't know the population variance. In that case, we have to use some other statistic to get a handle on the distribution of the mean. If $\bar{X}$ is the mean of a random sample of size $n$ taken from a normal population with mean $\mu$ and unknown variance, then the statistic

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \qquad (6.8)$$

follows the t distribution $f_T(t; v)$ with $v = n - 1$ degrees of freedom. As $n \to \infty$, the t distribution approaches the standard normal distribution. The t-distribution is just another continuous PDF, like the others we learned about in the previous section. The t distribution is given by

$$f(t) = \frac{\Gamma[(v+1)/2]}{\Gamma(v/2)\sqrt{\pi v}} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}} \quad \text{for } -\infty < t < \infty$$

As a reminder, the t distribution is plotted again in Figure 6.1.

[Figure 6.1. The t distribution as a function of the degrees of freedom and the normal distribution. Axes: f(t) versus t.]

Example 6.9.: distribution of the mean, variance unknown
In a reactor intended to grow crystals, a seed is used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor of sizes $n = 10$. The population has unknown variance in crystal size. The samples yield mean crystal sizes of $\bar{x} = 15.0$ µm and a sample variance of $s^2 = 1.0$ µm².

What is the likelihood that the true population mean, $\mu$, is actually less than 14.0 µm?

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{15 - 14}{1/\sqrt{10}} = 3.162$$

$$P(\mu < 14) = P(t > 3.162)$$

We have the change in sign because as $\mu$ increases, $t$ decreases. The parameter is $v = n - 1 = 9$.

The evaluation of the cumulative t probability distribution can again be performed two ways. First, we can use a table of critical values of the t-distribution. It is crucial to note that such a table does not provide cumulative PDFs; rather, it provides one minus the cumulative PDF. In other words, whereas the standard normal table provides the probability less than $z$ (the cumulative PDF), the t-distribution table provides the probability greater than $t$ (one minus the cumulative PDF). We then have

$$P(\mu < 14) = P(t > 3.162) \approx 0.007$$

Second, using MATLAB we have $P(\mu < 14) = P(t > 3.162) = 1 - P(t < 3.162)$

    >> p = 1 - cdf('t',3.162,9)
    p = 0.00576

Based on our sampling data, the probability that the true population mean is less than 14.0 µm is about 0.58%.

We should point out that this percentage is substantially greater than the percentage we found when we knew the population variance (0.078%). That is because knowing the population variance reduces our uncertainty. Approximating the population variance with the sample variance adds to the uncertainty and results in a larger percentage of our population deviating farther from the sample mean.

Example 6.10.: distribution of the mean, variance unknown
An engineer claims that the population mean yield of a batch process is 500 g/ml of raw material. To verify this, she samples 25 batches each month. One month the sample has a mean $\bar{X} = 518$ g and a standard deviation of $s = 40$ g. Does this sample support her claim that $\mu = 500$ g?

The first step in solving this problem is to compute the T statistic.

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{518 - 500}{40/\sqrt{25}} = 2.25$$
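As a cross-check on the T statistic just computed, the sketch below evaluates it in Python and integrates the t density numerically to obtain the upper-tail probability. This is an illustration under stated assumptions: Python is not used in the original chapter, the helper names `t_pdf` and `t_cdf` are hypothetical, and Simpson's rule stands in for a library CDF such as MATLAB's `cdf('t',...)`.

```python
import math

def t_pdf(t, v):
    """Density of the t distribution with v degrees of freedom."""
    log_c = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
             - 0.5 * math.log(math.pi * v))
    return math.exp(log_c - (v + 1) / 2 * math.log1p(t * t / v))

def t_cdf(t, v, steps=2000):
    """P(T < t): Simpson's rule on [0, |t|] plus symmetry about zero."""
    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, v) + t_pdf(a, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    area = total * h / 3                # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

# Values from Example 6.10.
x_bar, mu0, s, n = 518.0, 500.0, 40.0, 25
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_upper = 1.0 - t_cdf(t_stat, n - 1)    # P(T > t_stat) with v = n - 1

print(t_stat, p_upper)
```

The same helper can be pointed at Example 6.9 by calling `1 - t_cdf(math.sqrt(10), 9)`.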

Second, using MATLAB we have $P(\bar{X} > 518) = P(t > 2.25) = P(t < -2.25)$

    >> p = cdf('t',-2.25,24)
    p = 0.01694

(Or using a table, we find that when $v = 24$ and $T = 2.25$, $\alpha \approx 0.02$.) This means there is only a 1.7% probability that a population with $\mu = 500$ would yield a sample with $\bar{X} = 518$ or higher. Therefore, it is unlikely that 500 is the population mean.

Statistic 4. Difference of means when the variance is unknown: Sampling Distribution

It is useful to know the sampling distribution of the difference of two means when you want to determine whether there is a significant difference between two populations. Sometimes you want to do this when you don't know the population variances. This situation applies when you take two random samples of size $n_1$ and $n_2$ from two different populations, with means $\mu_1$ and $\mu_2$ and unknown variances. Then the sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, follows the t-distribution.

transformation:

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \qquad (6.9)$$

symmetry: $t_{1-\alpha} = -t_{\alpha}$

parameters:

$$v = n_1 + n_2 - 2 \quad \text{if } \sigma_1 = \sigma_2$$

$$v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \quad \text{if } \sigma_1 \neq \sigma_2$$

Since we don't know either population variance in this case, we can't assume they are equal unless we are told they are equal.

Example 6.11.: distribution of the difference of means, variances unknown
In a reactor intended to grow crystals, two different types of seeds are used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor of sizes $n_1 = 10$ and $n_2 = 20$. The populations have unknown variances in crystal size. The samples yield

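mean crystal sizes of $\bar{X}_1 = 15.0$ µm and $\bar{X}_2 = 10.0$ µm and sample variances of $s_1^2 = 1.0$ µm² and $s_2^2 = 2.0$ µm². What percentage of true population differences yielding these sampling results would have a true difference in population means, $\mu_1 - \mu_2$, of 4.0 µm or greater?

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(15 - 10) - 4}{\sqrt{\dfrac{1}{10} + \dfrac{2}{20}}} = 2.236$$

The degree of freedom parameter is given by:

$$v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(\dfrac{1}{10} + \dfrac{2}{20}\right)^2}{\dfrac{(1/10)^2}{10 - 1} + \dfrac{(2/20)^2}{20 - 1}} = 24.43 \approx 24$$

$$P(\mu_1 - \mu_2 > 4.0) = P(t < 2.236) = 1 - P(t > 2.236)$$

The evaluation of the cumulative t probability distribution can again be performed two ways. First, using a table of critical values of the t-distribution with $v = 24$, the value 2.236 lies between $t_{0.025} = 2.064$ and $t_{0.01} = 2.492$, so interpolation gives $P(t > 2.236) \approx 0.02$ and

$$P(\mu_1 - \mu_2 > 4.0) = P(t < 2.236) = 1 - P(t > 2.236) \approx 1 - 0.02 = 0.98$$

Second, using MATLAB we have for $P(\mu_1 - \mu_2 > 4.0) = P(t < 2.236)$

    >> p = cdf('t',2.236,24)
    p = 0.9825

We expect about 98.3% of the differences in crystal size of the two populations to be at least 4.0 µm.

Statistic 5. Variance: Sampling Distribution

We now wish to know the sampling distribution of the sample variance, $S^2$. If $S^2$ is the variance of a random sample of size $n$ taken from a normal population with mean $\mu$ and variance $\sigma^2$, then the statistic

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{\left(X_i - \bar{X}\right)^2}{\sigma^2} \qquad (6.10)$$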

has a chi-squared distribution with $v = n - 1$ degrees of freedom, $f_{\chi^2}(\chi^2; n-1)$. The chi-squared distribution is defined as

$$f_{\chi^2}(x; v) = \begin{cases} \dfrac{1}{2^{v/2}\,\Gamma(v/2)}\, x^{v/2 - 1} e^{-x/2} & \text{for } x > 0 \\[4pt] 0 & \text{elsewhere} \end{cases}$$

It is a special case of the Gamma Distribution, when $\alpha = v/2$ and $\beta = 2$, where $v$ is called the degrees of freedom and is a positive integer. As a reminder, we provide a plot of the chi-squared distribution in Figure 6.2.

[Figure 6.2. The chi-squared distribution for various values of v. Axes: f(χ²) versus χ².]

Example 6.12.: distribution of the variance
In a reactor intended to grow crystals, a seed is used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor of sizes $n = 10$. The samples yield mean crystal sizes of $\bar{x} = 15.0$ µm and a sample variance of $s^2 = 1.0$ µm². What is the likelihood that the true population variance, $\sigma^2$, is actually less than 0.5 µm²?

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{(10 - 1)(1.0)}{0.5} = 18$$

$$P(\sigma^2 < 0.5) = P(\chi^2 > 18)$$

We have the change in sign because as $\sigma^2$ increases, $\chi^2$ decreases. The parameter is $v = n - 1 = 9$.

The evaluation of the cumulative $\chi^2$ probability distribution can again be performed two ways. First, we can use a table of critical values of the $\chi^2$-distribution. It is crucial to note that such a table does not provide cumulative PDFs; rather, it provides one minus the cumulative PDF. We then have

$$P(\sigma^2 < 0.5) = P(\chi^2 > 18) \approx 0.04$$

Second, using MATLAB we have $P(\sigma^2 < 0.5) = P(\chi^2 > 18) = 1 - P(\chi^2 < 18)$

    >> p = 1 - cdf('chi2',18,9)
    p = 0.03517

Based on our sampling data, the probability that the true variance is less than 0.5 µm² is 3.5%.

Statistic 6. The ratio of Variances: Sampling Distribution (F-distribution)

Just as we studied the distribution of two sample means, so too are we interested in the distribution of two variances. In the case of the mean, it was a difference. In the case of the variance, the ratio is more useful. Now consider sampling two random samples of size $n_1$ and $n_2$ from two different populations, with variances $\sigma_1^2$ and $\sigma_2^2$, respectively. The statistic, $F$,

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2} \qquad (6.11)$$

provides a distribution of the ratio of two variances. This distribution is called the F-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom. The F-distribution is defined as

$$h(f; v_1, v_2) = \begin{cases} \dfrac{\Gamma\!\left(\dfrac{v_1 + v_2}{2}\right)\left(\dfrac{v_1}{v_2}\right)^{v_1/2}}{\Gamma\!\left(\dfrac{v_1}{2}\right)\Gamma\!\left(\dfrac{v_2}{2}\right)} \; \dfrac{f^{\,v_1/2 - 1}}{\left(1 + \dfrac{v_1 f}{v_2}\right)^{(v_1 + v_2)/2}} & \text{for } f > 0 \\[4pt] 0 & \text{elsewhere} \end{cases}$$

As a reminder, the F-distribution is plotted in Figure 6.3.

Example 6.13.: ratio of the variances
In a reactor intended to grow crystals, two different types of seeds are used to encourage nucleation. Individual crystals are randomly sampled from the effluent of each reactor of sizes $n_1 = 10$ and $n_2 = 20$. The populations have unknown variances in crystal size. The samples yield

mean crystal sizes of $\bar{X}_1 = 15.0$ µm and $\bar{X}_2 = 10.0$ µm and sample variances of $s_1^2 = 1.0$ µm² and $s_2^2 = 2.0$ µm². What is the probability that the ratio of variances, $\sigma_1^2/\sigma_2^2$, is less than 0.25?

$$F = \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} = \frac{1.0/2.0}{0.25} = 2$$

$$P\!\left(\frac{\sigma_1^2}{\sigma_2^2} < 0.25\right) = P(F > 2)$$

[Figure 6.3. The F distribution for various values of v1 and v2. Axes: h(f) versus f.]

We have the change in sign because as $\sigma_1^2/\sigma_2^2$ increases, $F$ decreases. The parameters are $v_1 = 9$ and $v_2 = 19$.

The evaluation of the cumulative F probability distribution can be performed in only one way. We cannot use tables because there are no tables for arbitrary values of the probability. There are only tables for two values of the probability, 0.01 and 0.05. Therefore, using MATLAB we have $P(\sigma_1^2/\sigma_2^2 < 0.25) = P(F > 2) = 1 - P(F < 2)$

    >> p = 1 - cdf('f',2,9,19)
    p = 0.0974

Based on our sampling data, the probability that the ratio of variances is less than 0.25 is 9.7%.

6.4. Confidence Intervals

In the previous section we showed what types of distributions describe various statistics of a random sample. In this section, we discuss estimating the population mean and variance from the sample mean and variance. In addition, we introduce confidence intervals to quantify the goodness of these estimates.

A confidence interval is some subset of random variable space with which someone can say something like, "I am 95% sure that the true population mean is between $\mu_{low}$ and $\mu_{hi}$." In this section, we discuss how a confidence interval is defined and calculated.

The confidence interval is defined by a percent. This percent is called $(1 - 2\alpha)$. So if $\alpha = 0.05$, then you would have a 90% confidence interval. The concept of a confidence interval is illustrated in graphical terms in Figure 6.4.

[Figure 6.4. A schematic illustrating a confidence interval.]

The trick then is to find the limits $\mu_{low}$ and $\mu_{hi}$, which correspond to the critical values $z_{\alpha}$ and $z_{1-\alpha}$, so that you can say for a given $\alpha$, "I am $(1 - 2\alpha)$% confident that $\mu_{low} < \mu < \mu_{hi}$."

Statistic 1. Mean, σ known: confidence interval

We now know that the standardized sample mean is distributed with the standard normal distribution. For a symmetric PDF centered around zero, like the standard normal, the lower critical value is the negative of the upper one. We can then make the statement:

$$P(z_{\alpha} < Z < z_{1-\alpha}) = 1 - 2\alpha$$

Now the normal distribution is symmetric about the y-axis, so we can write $z_{1-\alpha} = -z_{\alpha}$, so

$$P(z_{\alpha} < Z < z_{1-\alpha}) = P(z_{\alpha} < Z < -z_{\alpha}) = 1 - 2\alpha$$

where

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$

We can rearrange this equation to read

$$P\!\left(\bar{X} + z_{\alpha}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}}\right) = 1 - 2\alpha \qquad (6.12)$$

where we now have $\mu_{low}$ and $\mu_{hi}$ explicitly.

Example 6.14.: confidence interval on mean, variance known
Samples of dioxin contamination in 36 front yards in St. Louis show a concentration of 6 ppm. Find the 95% confidence interval for the population mean. Assume that the standard deviation is 1.0 ppm.

To solve this, first calculate $\alpha$, $z_{\alpha}$, $z_{1-\alpha}$.

$$1 - 2\alpha = 0.95 \qquad \alpha = 0.025$$

$$z_{\alpha} = z_{0.025} = -1.96 \qquad z_{1-\alpha} = -z_{\alpha} = 1.96$$

The z value came from a standard normal table. Alternatively, we can compute this value from MATLAB,

    >> z = icdf('normal',0.025,0,1)
    z = -1.959963984540055

Here we used the inverse cumulative distribution function (icdf) command. Since we have the standard normal PDF, the mean is 0 and the variance is 1. The value of 0.025 corresponds to alpha, the probability. To get the value of the other limit, we either rely on symmetry, or compute it directly,

    >> z = icdf('normal',0.975,0,1)
    z = 1.959963984540054

Note that these values of z are independent of all aspects of the problem except the value of the confidence interval.
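The critical values above can also be reproduced without MATLAB. The sketch below is an assumption-laden illustration in Python: it uses `statistics.NormalDist` from the standard library in place of `icdf('normal',...)`, takes the numbers of Example 6.14, and applies equation (6.12); the variable names are illustrative only.

```python
import math
from statistics import NormalDist

confidence = 0.95
alpha = (1 - confidence) / 2              # probability left in each tail

std_normal = NormalDist()                 # mean 0, standard deviation 1
z_low = std_normal.inv_cdf(alpha)         # lower critical value, z_alpha
z_high = std_normal.inv_cdf(1 - alpha)    # upper critical value, z_(1-alpha)

# Example 6.14: sample mean, known population standard deviation, sample size.
x_bar, sigma, n = 6.0, 1.0, 36

# Equation (6.12): z_low is negative, so adding it gives the lower limit.
lower = x_bar + z_low * sigma / math.sqrt(n)
upper = x_bar - z_low * sigma / math.sqrt(n)

print(z_low, z_high)
print(lower, upper)
```

Because only `alpha` enters the `inv_cdf` calls, the same two critical values apply to any problem with the same confidence level, which is the point made in the text.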

Therefore, by equation (6.12)

$$P\!\left(6 + (-1.96)\frac{1.0}{\sqrt{36}} < \mu < 6 - (-1.96)\frac{1.0}{\sqrt{36}}\right) = 1 - 2(0.025) = 0.95$$

so the 95% confidence interval for the mean is $5.673 < \mu < 6.327$.

Statistic 2. Mean, σ unknown: confidence interval

Now usually, we don't know the variance. We have to use our estimate of the variance, $s^2$, for $\sigma^2$. In that case, estimating the mean requires the t-distribution. (See previous section.) Let me stress that we do everything exactly as we did before, but we use $s$ for $\sigma$ and use the t-distribution instead of the normal distribution. Remember, the t-distribution is also symmetric about the origin, so $t_{1-\alpha} = -t_{\alpha}$. (This means you only have to compute the t value once.) Remember, $v = n - 1$.

$$P(t_{\alpha} < T < t_{1-\alpha}) = P(t_{\alpha} < T < -t_{\alpha}) = 1 - 2\alpha$$

where

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}.$$

Just as before, we can rearrange this equation to read

$$P\!\left(\bar{X} + t_{\alpha}\frac{s}{\sqrt{n}} < \mu < \bar{X} - t_{\alpha}\frac{s}{\sqrt{n}}\right) = 1 - 2\alpha \qquad (6.13)$$

where we now have $\mu_{low}$ and $\mu_{hi}$ explicitly.

Example 6.15.: confidence interval on mean, variance unknown
Samples of dioxin contamination in 36 front yards in St. Louis show a concentration of 6 ppm. Find the 95% confidence interval for the population mean. The sample standard deviation, $s$, was measured to be 1.0 ppm.

To solve this, first calculate $\alpha$, $t_{\alpha}$, $t_{1-\alpha}$ for $v = 35$.

$$1 - 2\alpha = 0.95 \qquad \alpha = 0.025$$

$$t_{\alpha} = t_{0.025} = -2.03 \qquad t_{1-\alpha} = -t_{\alpha} = +2.03$$

The t value came from a table of t-distribution values. Alternatively, we can compute this value using MATLAB,

    >> t = icdf('t',0.025,35)
    t = -2.0301

and for the upper limit

    >> t = icdf('t',0.975,35)
    t = 2.0301

which can also be obtained by symmetry. Note that these values of t are independent of all aspects of the problem except the value of the confidence interval and the number of sample points, $n$.

Therefore, by equation (6.13)

$$P\!\left(6 + (-2.03)\frac{1.0}{\sqrt{36}} < \mu < 6 - (-2.03)\frac{1.0}{\sqrt{36}}\right) = 1 - 2(0.025) = 0.95$$

so the 95% confidence interval for the mean is $5.662 < \mu < 6.338$. You should note that the interval is a little wider, so we are a little less certain about the mean, when we use the sample variance as the estimate for the population variance. With the population variance known, the 95% confidence interval for the mean was $5.673 < \mu < 6.327$.

Statistic 3. Difference of means, σ known: confidence interval

The exact same derivation that we used above for a single mean can be used for the difference of means. When the variances of the two populations are known, we have:

$$P\!\left[(\bar{X}_1 - \bar{X}_2) + z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < (\mu_1 - \mu_2) < (\bar{X}_1 - \bar{X}_2) - z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right] = 1 - 2\alpha \qquad (6.14)$$

where $z_{\alpha}$ is the critical value of a random variable obeying the standard normal PDF.

Example 6.16.: confidence interval on the difference of means, variances known
Samples of dioxin contamination in 36 front yards in Times Beach, a suburb of St. Louis, show a concentration of 6 ppm with a population variance of 1.0 ppm². Samples of dioxin contamination in 16 front yards in Quail Run, another suburb of St. Louis, show a concentration of 8 ppm with a population variance of 3.0 ppm². Find the 95% confidence interval for the difference of population means.

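To solve this, first calculate $\alpha$, $z_{\alpha}$, $z_{1-\alpha}$.

$$1 - 2\alpha = 0.95 \qquad \alpha = 0.025$$

$$z_{\alpha} = z_{0.025} = -1.96 \qquad z_{1-\alpha} = -z_{\alpha} = 1.96$$

The z value came from a table of standard normal PDF values. Alternatively, we can compute this value from MATLAB,

    >> z = icdf('normal',0.025,0,1)
    z = -1.959963984540055

Therefore, by equation (6.14)

$$P\!\left[(6 - 8) - 1.96\sqrt{\frac{1}{36} + \frac{3}{16}} < (\mu_1 - \mu_2) < (6 - 8) + 1.96\sqrt{\frac{1}{36} + \frac{3}{16}}\right] = 1 - 2(0.025)$$

$$P\left[-2.909 < (\mu_1 - \mu_2) < -1.091\right] = 0.95$$

So the 95% confidence interval for the difference of means is $-2.909 < (\mu_1 - \mu_2) < -1.091$.

If we are determining which site is more contaminated, then we are 95% sure that site 2 (Quail Run) is more contaminated by roughly 1 to 3 ppm than site 1 (Times Beach).

Statistic 4. Difference of means, σ unknown: confidence interval

When the variances of the two populations are unknown, we have:

$$P\!\left[(\bar{X}_1 - \bar{X}_2) + t_{\alpha}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < (\mu_1 - \mu_2) < (\bar{X}_1 - \bar{X}_2) + t_{1-\alpha}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right] = 1 - 2\alpha \qquad (6.15)$$

where the number of degrees of freedom for the t-distribution is

$$v = n_1 + n_2 - 2 \quad \text{if } \sigma_1 = \sigma_2$$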

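$$v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \quad \text{if } \sigma_1 \neq \sigma_2$$

Example 6.16.: confidence interval on the difference of means, variances unknown
Samples of dioxin contamination in 36 front yards in Times Beach, a suburb of St. Louis, show a concentration of 6 ppm with a sample variance of 1.0 ppm². Samples of dioxin contamination in 16 front yards in Quail Run, another suburb of St. Louis, show a concentration of 8 ppm with a sample variance of 3.0 ppm². Find the 95% confidence interval for the difference of population means.

To solve this, first calculate $v$, $\alpha$, $t_{\alpha}$, $t_{1-\alpha}$.

$$v = \frac{\left(\dfrac{1}{36} + \dfrac{3}{16}\right)^2}{\dfrac{(1/36)^2}{36 - 1} + \dfrac{(3/16)^2}{16 - 1}} = 19.59 \approx 20$$

$$1 - 2\alpha = 0.95 \qquad \alpha = 0.025$$

$$t_{\alpha} = t_{0.025} = -2.086 \qquad t_{1-\alpha} = -t_{\alpha} = 2.086$$

The t value came from a table of t-PDF values. Alternatively, we can compute this value using MATLAB,

    >> t = icdf('t',0.025,20)
    t = -2.0860

Therefore, substituting into equation (6.15) yields

$$P\!\left[(6 - 8) - 2.086\sqrt{\frac{1}{36} + \frac{3}{16}} < (\mu_1 - \mu_2) < (6 - 8) + 2.086\sqrt{\frac{1}{36} + \frac{3}{16}}\right] = 1 - 2(0.025)$$

$$P\left[-2.97 < (\mu_1 - \mu_2) < -1.03\right] = 0.95$$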
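The degrees-of-freedom formula and the interval arithmetic in this example are easy to get wrong by hand, so a short numerical check may help. The Python sketch below is illustrative only (the chapter uses MATLAB, and the variable names are assumptions); it takes the t critical value 2.086 quoted in the text rather than computing it, because the Python standard library has no inverse t CDF.

```python
import math

# Example data: sample size, sample variance, sample mean for each site.
n1, var1, mean1 = 36, 1.0, 6.0   # Times Beach
n2, var2, mean2 = 16, 3.0, 8.0   # Quail Run

a = var1 / n1                    # s1^2 / n1
b = var2 / n2                    # s2^2 / n2

# Degrees of freedom for unequal, unknown variances.
dof = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Critical value taken from the text (v rounded to 20, alpha = 0.025).
t_crit = 2.086

diff = mean1 - mean2
half_width = t_crit * math.sqrt(a + b)
lower, upper = diff - half_width, diff + half_width

print(dof)            # close to 19.59
print(lower, upper)   # close to -2.97 and -1.03
```

Rounding `dof` to the nearest integer before looking up the critical value follows what the example does; other conventions (such as rounding down) are also in use and would change the critical value slightly.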

So the 95% confidence interval for the difference of means is $-2.97 < (\mu_1 - \mu_2) < -1.03$.

If we are determining which site is more contaminated, then we are 95% sure that site 2 (Quail Run) is more contaminated by roughly 1 to 3 ppm than site 1 (Times Beach).

Statistic 5. Variance: confidence interval

The confidence interval of the variance can be estimated in a precisely analogous way, knowing that the statistic

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{\left(X_i - \bar{X}\right)^2}{\sigma^2}$$

has a chi-squared distribution with $v = n - 1$ degrees of freedom, $f_{\chi^2}(\chi^2; n-1)$. So

$$P\!\left(\frac{(n-1)S^2}{\chi^2_{1-\alpha}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{\alpha}}\right) = 1 - 2\alpha \qquad (6.16)$$

Perversely, the tables of the critical values for the $\chi^2$ distribution have defined $\alpha$ to be $1 - \alpha$, so the indices have to be switched when using the table.

$$P\!\left(\frac{(n-1)S^2}{\chi^2_{\alpha}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha}}\right) = 1 - 2\alpha \quad \text{when using the } \chi^2 \text{ critical values table only!}$$

If you get confused, just remember that the upper limit must be greater than the lower limit. Remember also that $f_{\chi^2}(\chi^2; n-1)$ is not symmetric about the origin, so we cannot use the symmetry arguments used for the confidence intervals for functions of the mean.

Example 6.17.: variance
Samples of dioxin contamination in 16 front yards in St. Louis show a concentration of 6 ppm. Find the 95% confidence interval for the population variance. The sample standard deviation, $s$, was measured to be 1.0 ppm.

To solve this, first calculate $\alpha$, $\chi^2_{\alpha}$, $\chi^2_{1-\alpha}$. For $v = 15$, we have

$$1 - 2\alpha = 0.95 \qquad \alpha = 0.025$$

$$\chi^2_{\alpha} = \chi^2_{0.025} = 6.262 \qquad \chi^2_{1-\alpha} = \chi^2_{0.975} = 27.488$$

The $\chi^2$ values came from a table of $\chi^2$-distribution values. Alternatively, we can compute these values using MATLAB,

    >> chi2 = icdf('chi2',0.025,15)
    chi2 = 6.2621

and

    >> chi2 = icdf('chi2',0.975,15)
    chi2 = 27.4884

Therefore, substituting into equation (6.16) yields

$$P\!\left(\frac{(16 - 1)(1.0)}{27.488} < \sigma^2 < \frac{(16 - 1)(1.0)}{6.262}\right) = 1 - 2(0.025)$$

$$P\left(0.5457 < \sigma^2 < 2.395\right) = 0.95$$

So the 95% confidence interval for the variance is $0.5457 < \sigma^2 < 2.395$.

Statistic 6. Ratio of variances: confidence interval (p. 53)

The ratio of two population variances can be estimated in a precisely analogous way, knowing that the statistic

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$

follows the F-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom. Remember, the F-distribution has a symmetry,

$$f_{1-\alpha}(v_1, v_2) = \frac{1}{f_{\alpha}(v_2, v_1)}.$$

This symmetry relation is essential if one is to use tables for the critical value of the F-distribution. It is not essential if one uses MATLAB commands. If one is computing the cumulative PDF for the F-distribution, then one simply rearranges this equation for

the ratio $\sigma_1^2/\sigma_2^2 = S_1^2/(S_2^2 F)$, which gives

$$P\left( \frac{S_1^2}{S_2^2}\,\frac{1}{f_{1-\alpha/2}(v_1, v_2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\,\frac{1}{f_{\alpha/2}(v_1, v_2)} \right) = 1 - \alpha \qquad (6.7)$$

One notes that the order of the limits has changed here, since as $\sigma_1^2/\sigma_2^2$ goes up, $F$ goes down. In any case, the lower limit must be smaller than the upper limit.

If one chooses to use tables of critical values, one must take into account two idiosyncrasies of the procedure. First, as was the case with the t and chi-squared distributions, the tables provide the probability that $f$ is greater than a value, not the CDF, which is the probability that $f$ is less than a value. Second, the tables only provide data for small values of $\alpha$. Therefore, we must eliminate all instances of $1-\alpha/2$, using the symmetry relation. The result is

$$P\left( \frac{S_1^2}{S_2^2}\,\frac{1}{f_{\alpha/2}(v_1, v_2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\,f_{\alpha/2}(v_2, v_1) \right) = 1 - \alpha \quad \text{when using the tables only!}$$

Example 6.8.: confidence interval on the ratio of variances

Samples of dioxin contamination in 20 front yards in Times Beach, a suburb of St. Louis, show a concentration of 6 ppm with a sample variance of 1.0 ppm². Samples of dioxin contamination in 16 front yards in Quail Run, another suburb of St. Louis, show a concentration of 8 ppm with a sample variance of 3.0 ppm². Find the 90% confidence interval for the ratio of population variances.

To solve this, first calculate $f_{\alpha/2}$ and $f_{1-\alpha/2}$ with $v_1 = 19$ and $v_2 = 15$, where $1 - \alpha = 0.90$ and $\alpha/2 = 0.05$. We can compute the f critical values using MATLAB,

>> f = icdf('f',0.05,19,15)
f = 0.4476

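and

>> f = icdf('f',0.95,19,15)
f = 2.3398

Substituting into equation (6.7) yields

$$P\left( \frac{1}{3}\cdot\frac{1}{2.3398} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{1}{3}\cdot\frac{1}{0.4476} \right) = 1 - 2(0.05)$$

$$P\left( 0.1425 < \frac{\sigma_1^2}{\sigma_2^2} < 0.7447 \right) = 0.90$$

Alternatively, we can use the table of critical values. The table has no entry for $v_1 = 19$, so the nearest entry, $v_1 = 20$, is used:

$$F_{0.05}(v_1 = 15, v_2 = 19) = 2.23$$

$$F_{0.05}(v_1 = 19, v_2 = 15) \approx F_{0.05}(v_1 = 20, v_2 = 15) = 2.33$$

$$P\left( \frac{1}{3}\cdot\frac{1}{2.33} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{1}{3}\cdot 2.23 \right) = 1 - 2(0.05)$$

$$P\left( 0.1431 < \frac{\sigma_1^2}{\sigma_2^2} < 0.7433 \right) = 0.90$$

So the 90% confidence interval for the ratio of variances is $0.1425 < \sigma_1^2/\sigma_2^2 < 0.7447$. If we are determining which site has a greater variance of contamination levels, then we are 90% sure that site 2 (Quail Run) has more variance by a factor of 1.3 to 7.0.

6.5. Problems

We intend to purchase a liquid as a raw material for a material we are designing. Two vendors offer us samples of their product and a statistic sheet. We run the samples in our own labs and come up with the following data: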

Vendor 1 Vendor 2
sample # outcome sample # outcome .3 .49 .49 .98 3 .05 3 .8 4 .4 4 .36 5 .8 5 .47 6 . 6 .36 7 .38 7 .8 8 .39 8 .88 9 .4 9 .87 0 .46 0 .87 .9 .04 3 .43 4 .34 5 .9 6 .

Vendor Specification Claims:
Vendor 1: µ. 0 and $\sigma_1^2 = 0.05$, $\sigma_1 = 0.2236$
Vendor 2: µ. 3 and $\sigma_2^2 = 0.12$, $\sigma_2 = 0.3464$

Sample statistics, based on the data provided in the table above.
6 i x ( x i x ) 6.80 x i 6 0 6 i x ( ) [ ] 0.09 s 53 6 s 0. 0 [ ] 0.0744 0 x i .8 s 0 i 0 i x i x s 0.78

Problem 6.1. Determine a 95% confidence interval on the mean of sample. Use the value of the population variance given. Is the given population mean legitimate?

Problem 6.2. Determine a 95% confidence interval on the difference of means between samples 1 and 2. Use the values of the population variance given. Is the difference between the given population means legitimate?

Problem 6.3. Determine a 95% confidence interval on the mean of sample. Assume the given values of the population variances are suspect and not to be trusted. Is the given population mean legitimate?

Problem 6.4. Determine a 95% confidence interval on the difference of means between samples 1 and 2. Assume the given values of the population variances are suspect and not to be trusted. Is the difference between the given population means legitimate?

Problem 6.5. Determine a 95% confidence interval on the variance of sample. Is the given population variance legitimate?

Problem 6.6. Determine a 98% confidence interval on the ratio of variance of samples 1 & 2. Is the ratio of the given population variances legitimate?
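For Problems 6.5 and 6.6, the interval formulas of equations (6.6) and (6.7) can be wrapped in two small MATLAB helpers. What follows is a minimal sketch rather than part of the original notes: the names var_ci and var_ratio_ci are hypothetical, and the code assumes the icdf function from the Statistics and Machine Learning Toolbox, as used in Examples 6.7 and 6.8. It works with cumulative probabilities throughout, so the index switching needed for the printed tables does not arise.

```matlab
% Sketch only: var_ci and var_ratio_ci are hypothetical helper names.
% icdf requires the Statistics and Machine Learning Toolbox.

% Equation (6.6): [lower, upper] limits on sigma^2, given a sample
% variance s2 from n observations, at confidence level 1 - alpha.
var_ci = @(s2, n, alpha) (n - 1) * s2 ./ ...
    [icdf('chi2', 1 - alpha/2, n - 1), icdf('chi2', alpha/2, n - 1)];

% Equation (6.7): [lower, upper] limits on sigma1^2 / sigma2^2, given
% sample variances s2_1 (n1 observations) and s2_2 (n2 observations).
var_ratio_ci = @(s2_1, n1, s2_2, n2, alpha) (s2_1 / s2_2) ./ ...
    [icdf('f', 1 - alpha/2, n1 - 1, n2 - 1), icdf('f', alpha/2, n1 - 1, n2 - 1)];

% Inputs of Example 6.7: n = 16, s = 1.0, 95% interval.
ci_var = var_ci(1.0^2, 16, 0.05);
fprintf('variance CI:       %.4f < sigma^2 < %.4f\n', ci_var(1), ci_var(2));

% Inputs of Example 6.8: n1 = 20, s1^2 = 1.0; n2 = 16, s2^2 = 3.0; 90% interval.
ci_ratio = var_ratio_ci(1.0, 20, 3.0, 16, 0.10);
fprintf('variance ratio CI: %.4f < sigma1^2/sigma2^2 < %.4f\n', ci_ratio(1), ci_ratio(2));
```

Run with the inputs of Examples 6.7 and 6.8, the two printed intervals should agree with the worked results to about three decimal places. For Problem 6.6, which asks for a 98% interval, alpha would be 0.02.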