Chapter 6 Sampling Distributions

In most experiments, we have more than one measurement for any given variable, each measurement being associated with one randomly selected member of a population. Hence we need to examine probabilities associated with events that specify conditions on two or more random variables.

Def: A set of random variables $X_1, X_2, \ldots, X_n$ constitutes a random sample of size $n$ from a finite population of size $N$ if each member of the sample, $X_i$, is chosen in such a way that every sample of size $n$ has the same probability of being chosen.

Def: A set of (continuous or discrete) random variables $X_1, X_2, \ldots, X_n$ is called a random sample of size $n$ if the r.v.'s have the same distribution and are independent. We say that $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.).

Note: We will also apply the term random sample to the set of observed values of the random variables. Prior to selecting the sample and making the measurements, we have $X_1, X_2, \ldots, X_n$, with each $X_i$ being an (unknown) random quantity having associated probability distribution $f(x)$. After selecting the sample and making the measurements, we have the observed values $x_1, x_2, \ldots, x_n$.

Note: In practice, it is often difficult to do random sampling. However, random sampling is basic to the use of the statistical inferential procedures that we will discuss later. These procedures are used for analyzing experimental data, for testing hypotheses, for estimating parameters (numerical characteristics of populations), and for performing quality control in manufacturing. In each situation, we must somehow obtain convincing evidence that the data collected do approximate the conditions of randomness.

Example: In a manufacturing situation, we have manufactured items coming off an assembly line. Assume that the population of items that have been completed is relatively large. We want to check the quality of these items by selecting a random sample of them and making measurements on each item in the sample. If the sample is random, then it has a good chance of being representative of the population, and we can obtain useful information about the quality of the entire population. For example, we are interested in knowing whether the average value of a certain measurement is close to the specified target value. It is very unlikely that the sample average will be exactly equal to the population average, but it is likely to be close.

The Sampling Distribution of the Sample Mean

Def: A statistic is a random variable which is a function of a random sample. The probability distribution associated with a statistic is called its sampling distribution.

Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from a population (probability distribution). The statistic

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

is called the sample mean. Since the $X_i$'s are random variables, $\bar{X}$ is also a random variable, with a sampling distribution. Some other examples of statistics are:

1) The sample variance, $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$,

2) The sample median, $\tilde{X}$.

Theorem 6.1: Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having mean $\mu$ and standard deviation $\sigma$. Then the mean of the sampling distribution of $\bar{X}$ is:

$E(\bar{X}) = E\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \mu$

The variance of the sampling distribution depends on the size of the population from which the sample is drawn. If the population is of infinite size, then

$\sigma_{\bar{X}}^2 = \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 = \frac{\sigma^2}{n}$.

Note: The quantity $\sigma_{\bar{X}} = \sigma / \sqrt{n}$ (the standard deviation of the sampling distribution of the sample mean) is also called the standard error of the mean. It provides us with a measure of the reliability of the sample mean as an estimate of the population mean. This term will be important when we discuss statistical inference.

Note: If the random sample was selected from a normal distribution (we write $X_1, X_2, \ldots, X_n \sim \text{Normal}(\mu, \sigma)$), then it can be shown that $\bar{X} \sim \text{Normal}(\mu, \sigma / \sqrt{n})$.

Example: On page 134, Exercise 5.27. If I randomly select a single assembled piece of machinery from the population of assembled pieces, the time for assembly will be a random variable $X$ having a Normal(µ = 12.9 min., σ = 2.0 min.) distribution. On the other hand, if I select a random sample of size 64 from the population, the distribution of $\bar{X}$, the average assembly time for the sample of pieces, will be Normal(µ = 12.9 min., $\sigma_{\bar{X}} = 2.0 / \sqrt{64} = 0.25$ min.). Note that the variability in the distribution of $\bar{X}$ is only one-eighth the variability in the distribution of $X$. This is an important concept.

The following theorem is EXTREMELY important (as well as astonishing). This theorem provides the basis for our procedures for doing statistical inference.

Theorem 6.3: (Central Limit Theorem) If $X_1, X_2, \ldots, X_n$ are a random sample from any distribution with mean $\mu$ and standard deviation $\sigma < +\infty$, then the limiting distribution of

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

as $n \to +\infty$ is standard normal.

Note: Nothing was said about the distribution from which the sample was selected except that it has finite standard deviation. The sample could be selected from a normal distribution, or from an exponential distribution, or from a Weibull distribution, or from a Bernoulli distribution, or from a Poisson distribution, or from any other distribution with finite standard deviation. See, e.g., the example on pages 179-180. See also the illustration on page 184.

Note: For what $n$ will the normal approximation be good? For most purposes, if $n \geq 30$, we will say that the approximation given by the Central Limit Theorem (CLT) works well.
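The standard error and the Central Limit Theorem can be checked numerically. The sketch below is illustrative only and is not part of the course materials: the helper names, the Exponential(1) population (mean 1, standard deviation 1), the seed, and the number of repetitions are all assumptions chosen for the demonstration. It first reproduces the standard error from the assembly-time example, then simulates sample means of size n = 30 from a clearly non-normal population.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard deviation of the sample mean (infinite population): sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each from n draws of an Exponential(1) population.

    The population and the seed are assumptions made for this illustration.
    """
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


# Assembly-time example from the notes: sigma = 2.0 min., n = 64.
se_assembly = standard_error(2.0, 64)  # 2.0 / 8 = 0.25

# CLT check: standardize the simulated means and see how many fall within +/- 1.96.
mu, sigma, n = 1.0, 1.0, 30
means = simulate_sample_means(n=n, reps=5000)
se = standard_error(sigma, n)
z_values = [(m - mu) / se for m in means]
share_within = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)

print(f"standard error, assembly example: {se_assembly}")
print(f"mean of sample means: {statistics.fmean(means):.3f} (theory {mu})")
print(f"sd of sample means:   {statistics.stdev(means):.3f} (theory {se:.3f})")
print(f"share with |Z| <= 1.96: {share_within:.3f} (standard normal: about 0.95)")
```

If the CLT approximation is working, the simulated share should land close to 0.95 even though the underlying population is strongly skewed.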

Example: p. 187, Exercise 6.15.

Example: The fracture strength of tempered glass averages 14 (measured in thousands of p.s.i.) and has a standard deviation of 2. What is the probability that the average fracture strength of 100 randomly selected pieces of tempered glass will exceed 14,500 p.s.i.?

Example: Shear strength measurements for spot welds have been found to have a standard deviation of 10 p.s.i. If 100 test welds are to be measured, what is the approximate probability that the sample mean will be within 1 p.s.i. of the true population mean?

The T Distribution

Using the above discussion (Central Limit Theorem, etc.) to draw conclusions about the value of the population mean, µ, from a measured value of the sample mean, $\bar{X}$, has a flaw. If we have to depend on sample data for information about the population mean, then we would tend not to know the value of the population standard deviation, either. We would also have to estimate σ. We need to modify our theory somewhat to take this complication into account. We introduce another probability distribution that allows us to use sample data alone to make inferences about the population mean.

Theorem 6.4: If $\bar{X}$ is the mean of a random sample of size $n$ taken from a normal distribution having mean µ and standard deviation σ, and if

$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

is the sample variance, then the random variable

$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$

has a t-distribution with degrees of freedom $\nu = n - 1$.

The t-distribution (which is actually a family of distributions, characterized by the degrees of freedom) has characteristics similar to those of the standard normal distribution, as we can see from the figure on page 187. Note that for large d.f., the $t(n-1)$ distribution is very close to the standard normal distribution. In fact, the standard normal distribution provides a good approximation to the $t(n-1)$ distribution for $n$ of size 30 or more.

Note: Cut-off values and various tail probabilities for the t-distribution, with various values for ν, may be found in Table 4 on page 516. Note that in order to use this table, we must know the degrees of freedom in the particular exercise. However, we will find these values using Excel's built-in functions for the t-distribution.

Example: page 188.

The Sampling Distribution of the Variance

The above discussion provides us with the tools to do inference about the value of a population mean. If we want to do inference about the value of a population variance, $\sigma^2$, then we need to discuss the sampling distribution of the sample statistic, $S^2$, that we use to estimate the population variance. For this, we need to introduce another family of probability distributions, the chi-square family.
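As a rough check on Theorem 6.4 above, the following sketch simulates the statistic $t = (\bar{X} - \mu)/(S/\sqrt{n})$ for small normal samples. The function names, the sample size n = 5, the seed, and the number of repetitions are assumptions made for illustration. The cut-off 2.776 is the standard tabulated two-sided 5% value for ν = 4 degrees of freedom; 1.96 is the corresponding standard normal cut-off.

```python
import math
import random
import statistics


def t_statistic(sample, mu):
    """Compute (xbar - mu) / (s / sqrt(n)), where s uses the n - 1 divisor."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return (xbar - mu) / (s / math.sqrt(n))


def simulate_t(n, reps, mu=0.0, sigma=1.0, seed=1):
    """Return `reps` simulated t statistics from Normal(mu, sigma) samples of size n."""
    rng = random.Random(seed)
    return [
        t_statistic([rng.gauss(mu, sigma) for _ in range(n)], mu)
        for _ in range(reps)
    ]


t_values = simulate_t(n=5, reps=20000)  # nu = n - 1 = 4 degrees of freedom

# Share of simulated statistics beyond the normal cut-off and beyond the t cut-off.
tail_normal_cut = sum(abs(t) > 1.96 for t in t_values) / len(t_values)
tail_t_cut = sum(abs(t) > 2.776 for t in t_values) / len(t_values)

print(f"share with |t| > 1.96:  {tail_normal_cut:.3f}")
print(f"share with |t| > 2.776: {tail_t_cut:.3f} (target 0.05)")
```

With so few degrees of freedom, noticeably more than 5% of the simulated values should fall beyond ±1.96, which is why the normal cut-off is not appropriate when σ is estimated from a small sample.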

Theorem 6.5: If $S^2$ is the variance of a random sample of size $n$ taken from a normal distribution with variance $\sigma^2$, then the random variable

$\chi^2 = \frac{(n-1) S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}$

has a chi-square distribution with degrees of freedom $\nu = n - 1$.

Note: Cut-off values and various tail probabilities for the chi-square distribution, with various values for ν, may be found in Table 5 on page 517. Note that in order to use this table, we must know the degrees of freedom in the particular exercise. However, we will find these values using Excel's built-in functions for the chi-square distribution.

Example: p. 190.

The F-Distribution

When we do analysis of experimental data, our conclusions about whether the experimental treatments had an effect will be based on a statistic which may be imagined as a signal-to-noise ratio, with the signal being the treatment effect (differences among the treatment groups) and the noise being the variability of the data within treatment groups. The sampling distribution of this statistic is given in the following theorem. This statistic may also be used to do inference about the differences between two population variances.
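Theorem 6.5 can also be explored by simulation. The sketch below is a minimal illustration under stated assumptions: the function names, the Normal(10, 2) population, the sample size n = 10, the seed, and the number of repetitions are not from the notes. It relies on the known fact that a chi-square distribution with ν degrees of freedom has mean ν and variance 2ν.

```python
import random
import statistics


def chi_square_statistic(sample, sigma2):
    """Compute (n - 1) * S^2 / sigma^2 for one sample."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma2


def simulate_chi_square(n, reps, mu=10.0, sigma=2.0, seed=2):
    """Return `reps` simulated values of (n - 1) S^2 / sigma^2 from normal samples."""
    rng = random.Random(seed)
    return [
        chi_square_statistic([rng.gauss(mu, sigma) for _ in range(n)], sigma ** 2)
        for _ in range(reps)
    ]


chi2_values = simulate_chi_square(n=10, reps=20000)  # nu = n - 1 = 9
chi2_mean = statistics.fmean(chi2_values)
chi2_var = statistics.variance(chi2_values)

print(f"simulated mean:     {chi2_mean:.2f} (theory nu = 9)")
print(f"simulated variance: {chi2_var:.2f} (theory 2 * nu = 18)")
```

The simulated values are all nonnegative and right-skewed, which is the shape to expect from the chi-square family.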

Theorem 6.6: If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$, respectively, taken from two normal distributions having the same variance, then the random variable

$F = \frac{S_1^2}{S_2^2}$

has an F distribution with parameters $\nu_1 = n_1 - 1$ (the numerator degrees of freedom) and $\nu_2 = n_2 - 1$ (the denominator degrees of freedom).

Note: Cut-off values and various tail probabilities for the F distribution, with various values for $\nu_1$ and $\nu_2$, may be found in Table 6 on pages 518-519 (note that this table is an abbreviated version of an F-table that would be used in practical situations). Note that in order to use this table, we must know the values of the two degrees-of-freedom parameters in the particular exercise. We may also find probabilities and quantiles using Excel. We will come back to the F distribution later in the course.
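The variance ratio in Theorem 6.6 can be simulated in the same way. This is a sketch under stated assumptions, not the course's own procedure: the function names, the sample sizes n1 = 8 and n2 = 12, the common standard deviation, the seed, and the number of repetitions are illustrative choices. It uses two standard facts about the F distribution: its mean is ν2/(ν2 - 2) when ν2 > 2, and the tabulated upper 5% cut-off for (ν1, ν2) = (7, 11) is approximately 3.01.

```python
import random
import statistics


def f_statistic(sample1, sample2):
    """Ratio of the two sample variances, S1^2 / S2^2."""
    return statistics.variance(sample1) / statistics.variance(sample2)


def simulate_f(n1, n2, reps, sigma=3.0, seed=3):
    """Return `reps` variance ratios from two independent normal samples
    that share the same standard deviation `sigma`."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        sample1 = [rng.gauss(0.0, sigma) for _ in range(n1)]
        sample2 = [rng.gauss(5.0, sigma) for _ in range(n2)]  # means may differ
        ratios.append(f_statistic(sample1, sample2))
    return ratios


f_values = simulate_f(n1=8, n2=12, reps=20000)  # nu1 = 7, nu2 = 11
f_mean = statistics.fmean(f_values)
upper_tail = sum(f > 3.01 for f in f_values) / len(f_values)

print(f"simulated mean of F: {f_mean:.3f} (theory 11/9 = {11 / 9:.3f})")
print(f"share with F > 3.01: {upper_tail:.3f} (target 0.05)")
```

The two populations are given different means on purpose: the ratio depends only on the variances, so a difference in means should not affect the result.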