6 Sampling Distributions

Statisticians use the word population to refer to the total number of (potential) observations under consideration. The population is just the set of all possible outcomes in our sample space (chapter 3). Therefore, a population may be finite (e.g. number of households in the US) or (effectively) infinite (e.g. number of stars in the universe).
e.g.
question: average number of TV sets per household in the US
population: number of TV sets in each household in the US
question: average number of TV sets per household in North America
population: number of TV sets in each household in Canada, the US and Mexico
question: probability that a star has planets
population: number of planets per star for all stars (past, present, future) in all galaxies in the universe
In answering questions (e.g. what is the mean, what is the variance, what is the probability) for a given population, one seldom answers the questions using the entire population. In practice the questions are answered from a subset (a sample) of the population. It is important to choose the sample in a way that does not bias the answers. This is the subject of an area of statistics referred to as experimental design (how to design the sample so that it adequately reflects the entire population).
e.g. in determining the probability of getting a pair in a poker hand, you would not sample only poker hands that contained two pairs. (Technically this would be an attempt to determine the probability P(pair) for the entire population by approximating it by a conditional probability P(pair | two pair).)
e.g. To determine the average length of logs moving on a conveyor belt at constant speed, one might decide to measure only the logs that pass a certain point on the conveyor belt every 10 minutes. Upon reflection, you realize that longer logs have a greater probability of being at the measuring point at the selected times, thus the sample would give a biased average length measure that would be too large.
e.g. to determine the expected lifetime of a tire, you only test it on smooth, paved roads?
e.g. to determine the fuel rating on cars, the EPA presumes that every car is driven 55 percent of the time in the city and 45 percent of the time on the highway!?
One way to ensure unbiased sampling is to ensure your subset is a random sample. Suppose our sample is to consist of n observations, x_1, x_2, …, x_n. We have to select the first observation x_1, then the second x_2, etc. We think of the procedure for picking x_k as selecting a value for a random variable X_k; that is, we think of picking values x_1, x_2, …, x_n for our sample as the process of picking values for random variables X_1, X_2, …, X_n. Using this thinking, we can define a random sample as follows:
finite population: A set of observations X_1, X_2, …, X_n constitutes a random sample of size n from a finite population of size N, if values for the set are chosen so that each subset of n of the N elements of the population has the same probability of being selected.
infinite population: A set of observations X_1, X_2, …, X_n constitutes a random sample of size n from the infinite population described by distribution (discrete) or density (continuous) f(x) if
1. each X_i is a RV whose distribution/density is given by f(x)
2. the RVs are independent
The phrase random sample is applied both to the RVs X_1, X_2, …, X_n and their values x_1, x_2, …, x_n.
How to achieve a random sample?
e.g. the population is finite (and relatively small). Label each element of the population 1, 2, …, N. Draw n numbers sequentially, in groups of 2 digits, from a random digits table.
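For a small finite population, the random digits table can be replaced by a pseudo-random generator. A minimal sketch using Python's stdlib, where the population size N, sample size n, and seed are illustrative choices:

```python
import random

# Hypothetical finite population labeled 1..N; draw a random sample of
# size n so that each n-subset is equally likely.
N, n = 50, 5
population = list(range(1, N + 1))

random.seed(42)  # fixed seed so the example is reproducible
sample = random.sample(population, n)  # sampling without replacement
print(sample)
```

`random.sample` draws without replacement, which matches the finite-population definition above (each subset of n of the N elements has the same probability of being selected).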
When the population size is large or infinite, this process can become practically impossible, and careful thought must be given to an, at least approximate, random sampling design.
e.g. areal sampling using a regular grid works if the underlying population (e.g. chemical contaminant concentration) is relatively homogeneous. Doesn't work if the underlying population is spatially concentrated.
e.g. replicate sampling in anomalous areas
6.2 Sampling Distribution of the Mean

For each sample x_1, x_2, …, x_n of n observations, we can compute a mean x̄. The mean value will vary with each of our samples. Thus we can think of the sample mean (mean value for each sample) as a random variable X̄ obeying some distribution function f(x̄; n). The distribution f(x̄; n) is referred to as the theoretical sampling distribution. We put aside for the moment the question of the form for f(x̄; n) and note that, in chapter 5.10, we have already computed the mean and variance for f(x̄; n) in the case of continuous RVs.
Theorem 6.1: If a random sample X_1, X_2, …, X_n of size n is taken from a population having mean μ and variance σ², then X̄ is a RV whose distribution f(x̄; n) has:
infinite population: mean value E(X̄) = μ and variance Var(X̄) = σ²/n
finite population: mean value E(X̄) = μ and variance Var(X̄) = (σ²/n) · (N − n)/(N − 1)
Note: The appearance of the term (N − n)/(N − 1) for the variance of X̄ in the finite population case is unexpected based upon the calculation in 5.10. The calculations in 5.10, when applied to a finite population, assume that n ≪ N. This correction factor, called the finite population correction (fpc) factor, is included to account for cases in which n ≈ N. Note that the fpc factor = 0 for n = N (i.e. Var(X̄) = 0 when n = N). This implies that, when one sample is taken using the entire population, X̄ exactly measures the population mean with no error (variance).
e.g. For N = 1,000 and n = 10, the fpc is fpc = 990/999 = 0.991
Note that the results in Theorem 6.1 are independent of what f(x̄; n) may actually be!!!
Apply Chebyshev's theorem to the RV X̄. Let ε = kσ/√n, i.e. k = ε√n/σ, giving
P(|X̄ − μ| > kσ/√n) < 1/k²
P(|X̄ − μ| > ε) < σ²/(nε²)
Therefore, for any (arbitrarily small but) non-zero value for ε, the probability that X̄ differs from μ can be made arbitrarily small by making n large enough. (We need n ≫ σ²/ε², which means n must get very large as ε gets small.) This observation is known as the law of large numbers (if you make the sample size n large enough, a single sample is sufficient to give a value for x̄ arbitrarily close to the population mean).
Theorem 6.2: Let X_1, X_2, …, X_n be a random sample, each having the same mean value μ and variance σ². Then for any ε > 0,
P(|X̄ − μ| > ε) → 0 as n → ∞
i.e. as the sample size gets large, the probability that the average from a single random sample differs from the true mean goes to zero. Again this result on X̄ is independent of what f(x̄; n) may actually be.
e.g. In an experiment, event A occurs with probability p. Repeat the experiment n times and compute the relative frequency of occurrence of A = (number of times A occurs in n trials)/n. Show that the relative frequency of A → p as n → ∞.
Consider each trial as an independent RV, X_1, X_2, …, X_n. Each X_i takes on two values, x_i = 0, 1, depending on whether A does not or does occur in experiment i. X_i has mean value E(X_i) = 0 · (1 − p) + 1 · p = p and variance Var(X_i) = E(X_i²) − E(X_i)² = 0² · (1 − p) + 1² · p − p² = p(1 − p). Then X_1 + X_2 + … + X_n records the number of times A occurs in n trials, and X̄ = (X_1 + X_2 + … + X_n)/n is in fact the relative frequency of occurrence of A. From Theorem 6.2 we have
P(|X̄ − p| > ε) < p(1 − p)/(nε²) → 0 for any p ∈ [0,1] as n → ∞
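The relative-frequency example above can be sketched numerically: as n grows, the observed frequency of A settles toward p. The value of p, the seed, and the sample sizes below are illustrative choices, not from the notes:

```python
import random

# Law of large numbers sketch: event A occurs with probability p; the
# relative frequency over n Bernoulli trials approaches p as n grows.
def relative_frequency(p, n, rng):
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n

rng = random.Random(0)  # fixed seed for reproducibility
p = 0.3
for n in (10, 1000, 100000):
    freq = relative_frequency(p, n, rng)
    # Theorem 6.2's bound: P(|freq - p| > eps) < p(1-p)/(n eps^2)
    print(n, round(freq, 4))
```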
√Var(X̄) = σ_X̄ = σ/√n is referred to as the standard error of the mean. To reduce the standard error by a factor of two, it is necessary to increase n by a factor of 4. Thus (unfortunately) increasing sample size decreases the standard error at a relatively slow rate. (e.g. if n goes from 25 to 2,500 (a factor of 100), the standard error decreases only by a factor of 10.)
While the results in Theorems 6.1 and 6.2 are independent of the form of the theoretical sampling distribution/density f(x̄; n), the actual form for f(x̄; n) depends on knowing the probability distribution which governs the population. In general it can be very difficult to compute the form of f(x̄; n). Two results are known, both presented as theorems.
Theorem 6.3 (central limit theorem): Let X̄ be the mean of a random sample of size n taken from a population having mean μ and variance σ². Then the associated RV, the standardized sample mean,
Z = (X̄ − μ)/(σ/√n)
is a RV whose distribution function approaches the standard normal distribution as n → ∞.
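The 1/√n scaling of the standard error can be checked directly; the σ value below is borrowed from the paint-can example later in the notes and is otherwise arbitrary:

```python
import math

# Standard error of the mean: se = sigma / sqrt(n). Only the 1/sqrt(n)
# scaling matters here; sigma is an arbitrary illustrative value.
def standard_error(sigma, n):
    return sigma / math.sqrt(n)

sigma = 31.5
# A factor-of-100 increase in n (25 -> 2,500) shrinks the standard
# error by only a factor of 10.
print(standard_error(sigma, 25) / standard_error(sigma, 2500))
```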
The central limit theorem says that, as n → ∞, the theoretical sampling distribution f(x̄; n) → a normal distribution (i.e. X̄ is normally distributed) with mean μ and variance σ²/n.
[Figure: the distribution f(x̄; n) of X̄ for samples of size n from a population with an exponential distribution]
[Figure: the distribution f(x̄; n) of X̄ for samples of size n from a population with a uniform distribution]
In practice, the distribution for X̄ is well approximated by a normal distribution for n as small as 25 to 30.
Practical use of the central limit theorem: You have a population whose mean μ and standard deviation σ you assume that you know (but whose density function f(x) you do not know). You sample the population with a sample of size n. From the sample you compute a mean value x̄. If the sample size n is sufficiently large, the central limit theorem will tell you the probability of getting the value x̄ given your assumptions on the values of μ and σ. To test your assumption, compute the standardized sample mean z using the measured x̄ and assumed values μ and σ. The central limit theorem states that the probability of getting the value x̄ is the same as the probability of getting the z-score z in a standard normal distribution.
Theorem (Normal populations): Let X̄ be the mean of a random sample of size n taken from a population that is normally distributed having mean μ and variance σ². Then the standardized sample mean
Z = (X̄ − μ)/(σ/√n)
has the standard normal distribution function regardless of the size of n (i.e. f(x̄; n) for X̄ is a normal density with mean μ and variance σ²/n).
Practical use of this theorem: You have a population whose distribution is (assumed to be) normal and whose mean μ and standard deviation σ you assume that you know. You sample the population with a sample of size n. From the sample you compute a mean value x̄. This theorem will tell you the probability of getting the value x̄ given your assumptions on normality and the values of μ and σ. To test your assumptions, compute the standardized sample mean z using the measured x̄ and assumed values μ and σ. This theorem states that the probability of getting the value x̄ is the same as the probability of getting the z-score z in a standard normal distribution.
e.g. 1-gallon paint cans (the population) from a particular manufacturer cover, on average, 513.3 sq. ft, with a standard deviation of 31.5 sq. ft. What is the probability that the mean area covered by a sample of 40 1-gallon cans will lie within 510.0 to 520.0 sq. ft?
Find the standardized sample means for the two limits of the range:
z_1 = (510.0 − 513.3)/(31.5/√40) = −0.66,  z_2 = (520.0 − 513.3)/(31.5/√40) = 1.34
Assuming the central limit theorem, we have from Table 3
P(510.0 < X̄ < 520.0) = P(−0.66 < Z < 1.34) = F(1.34) − F(−0.66) = 0.9099 − 0.2546 = 0.6553
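This computation can be checked without a normal table by writing the standard normal CDF in terms of the error function, Φ(z) = (1 + erf(z/√2))/2. A sketch using only Python's stdlib (the small difference from 0.6553 comes from not rounding the z-scores to two decimals before the table lookup):

```python
import math

# Standard normal CDF via the error function.
def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 513.3, 31.5, 40     # values from the paint-can example
se = sigma / math.sqrt(n)          # standard error of the mean
z1 = (510.0 - mu) / se
z2 = (520.0 - mu) / se
prob = phi(z2) - phi(z1)           # P(510.0 < Xbar < 520.0)
print(round(prob, 4))
```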
6.3 The Sampling Distribution of the Mean when σ is unknown (usual case)

In 6.2 we discussed aspects of the distribution of the sample mean X̄ (it has a distribution with mean μ, variance σ²/n (for continuous RVs), and the related RV
Z = (X̄ − μ)/(σ/√n)
the standardized sample mean, approaches the standard normal distribution as n → ∞). In practice σ is not known and we have to deal with the values
t = (x̄ − μ)/(s/√n)
where s is the sample standard deviation, s = √s², and s² is the sample variance
s² = Σ_{i=1}^{n} (x_i − x̄)² / (n − 1)
Similar to X̄, we define the random variable S², called the sample variance,
S² = Σ_{i=1}^{n} (X_i − X̄)² / (n − 1)
which has values s². In this section and the next, we are interested in the behavior of t and S² thought of as random variables.
Little is known about the behavior of the distribution for t when n is small unless we are sampling from a population governed by the normal distribution (a "normal population").
Theorem 6.4: If X̄ is the sample mean for a random sample of size n taken from a normal population having mean μ, then
t = (X̄ − μ)/(S/√n)
is a random variable having the t distribution with parameter v = n − 1.
Note: it is convention to use small t for the RV for the t distribution (breaking the convention to use capital letters for the RV and small letters for its values). We will use small t to stand for both the RV and its values.
The t distribution: a one-parameter family of RVs, with values defined on (−∞, ∞)
density function
f(t; v) = [Γ((v+1)/2) / (√(vπ) Γ(v/2))] · (1 + t²/v)^(−(v+1)/2)
mean value: 0 (for v > 1), otherwise undefined
variance: v/(v − 2) (for v > 2), ∞ for 1 < v ≤ 2, otherwise undefined
The t distribution is symmetric about 0, and very close to the standard normal distribution. In fact the t distribution → the standard normal distribution as v → ∞. The t distribution has heavier tails than the standard normal distribution (i.e. there is higher probability in the tails of the t distribution). It is often referred to as Student's t distribution.
[Figure: t densities for several values of v]
The parameter v in the t distribution is referred to as the (number of) degrees of freedom (df). Recall that the sum of the sample deviations Σ_{i=1}^{n} (x_i − x̄) is 0, hence only n − 1 of the deviations are independent of each other. Thus the RVs S² and, by the same reasoning, t both have n − 1 degrees of freedom.
Similar to the z_α for the standard normal distribution, we define the t_α for the t distribution. Because of the symmetry of the standard normal and t distributions we have
z_{1−α} = −z_α,  t_{1−α} = −t_α
Recall that Table 3 lists values of the cumulative standard normal distribution F(z) for various values of z. In contrast, Table 4 lists values of t_α for various values of α and v. (Recall, α is the probability in the right-hand tail above t_α.) By symmetry, the probability in the left-hand tail below −t_α is also α. Note that for v → ∞, t_α = z_α. The standard normal distribution provides a good approximation to the t distribution for samples of size 30 or more.
Practical use of Theorem 6.4: You have a population whose distribution is (assumed to be) normal and whose mean μ you assume that you know (but whose standard deviation you do not know). You sample the population with a sample of size n. From the sample you compute a sample mean value x̄ and the sample standard deviation s. Theorem 6.4 will tell you the probability of getting the values x̄ and s given your assumptions on normality and the value of μ. To test your assumption, compute the value t using the measured x̄ and s and the assumed value μ. Theorem 6.4 states that the probability of getting the values x̄, s is the same as the probability of getting the value t in a t distribution with v = n − 1.
e.g. A manufacturer's fuses (the population) will blow in 12.40 minutes on average when subjected to a 20% overload. A sample of 20 fuses are subjected to a 20% overload. The sample average and standard deviation were observed to be, respectively, 10.63 and 2.48 minutes. What is the probability of this observation given the manufacturer's claim?
t = (10.63 − 12.40)/(2.48/√20) = −3.19,  v = 20 − 1 = 19
From Table 4, for v = 19, we see that a t value of 2.861 already has only 0.5% probability (α = 0.005) of being exceeded. Consequently there is less than a 0.5% probability that a t value smaller than −2.861 will occur. Since the t value obtained in our sample of 20 is −3.19, we conclude that there is less than 0.5% probability of getting this result. We therefore suspect that the manufacturer's claim is incorrect, and that the manufacturer's fuses will blow in less than 12.40 minutes on average when subjected to a 20% overload.
If the population is not normal, studies have shown that the distribution of (X̄ − μ)/(S/√n) is fairly close to that of the t distribution as long as the population distribution is relatively bell-shaped and not too skewed. This can be checked using a normal scores plot on the population.
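The t statistic for the fuse example can be recomputed directly; all numbers are taken from the example above, and the table value 2.861 (v = 19, α = 0.005) is quoted from Table 4:

```python
import math

# Fuse example: t statistic under the manufacturer's claimed mean.
mu_claimed = 12.40
xbar, s, n = 10.63, 2.48, 20

t = (xbar - mu_claimed) / (s / math.sqrt(n))
df = n - 1
# t falls below -2.861, the v = 19 critical value for alpha = 0.005,
# so the observation has less than 0.5% probability under the claim.
print(round(t, 2), df)
```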
6.4 The Distribution of the Sample Variance S²

Theorem 6.5: Consider a random sample of size n taken from a normal population having variance σ². Then the RV
χ² = (n − 1)S²/σ² = Σ_{i=1}^{n} (X_i − X̄)²/σ²
has the chi-square distribution with parameter v = n − 1.
The chi-square distribution: a one-parameter family of RVs, with values defined on (0, ∞)
density function
f(x; v) = [1/(2^(v/2) Γ(v/2))] · x^(v/2 − 1) e^(−x/2)
mean value: v
variance: 2v
The chi-square distribution is just the gamma distribution with α = v/2, β = 2. Again, the parameter v is referred to as the (number of) degrees of freedom (df). We define the χ²_α notation similar to that of z_α and t_α. Just as for Table 4, Table 5 lists values of χ²_α for various values of α and v.
[Figure: chi-square densities for several values of v]
e.g. (the population) glass blanks from an optical firm suitable for grinding into lenses. The variance of the refractive index of the glass is 1.26 × 10⁻⁴. A random sample of size 20 is selected from any shipment, and if the variance of the refractive index of the sample exceeds 2 × 10⁻⁴, the sample is rejected. What is the probability of rejection assuming the underlying population is normal?
For the measured sample of 20:
χ² = (20 − 1)(2 × 10⁻⁴)/(1.26 × 10⁻⁴) = 30.2
From Table 5, for v = 19, 30.2 corresponds to a value α = 0.05. There is therefore a 5% probability of rejecting a shipment.
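The χ² statistic at the rejection boundary can be recomputed from the numbers above (the comparison value 30.144 is the standard Table 5 entry for v = 19, α = 0.05):

```python
# Glass-blank example: chi-square statistic at the rejection threshold.
n = 20
sigma2 = 1.26e-4        # assumed population variance of refractive index
s2_threshold = 2e-4     # sample variance at the rejection boundary

chi2 = (n - 1) * s2_threshold / sigma2
# Compare with the tabulated chi-square value 30.144 for v = 19,
# alpha = 0.05: the threshold sits almost exactly at the 5% tail.
print(round(chi2, 1))
```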
Practical use of Theorem 6.5: You have a population whose distribution is (assumed to be) normal and whose variance σ² you assume that you know. You sample the population with a sample of size n. From the sample you compute a sample variance s². Theorem 6.5 will tell you the probability of getting the value s² given your assumptions on normality and the value of σ². To test your assumption, compute the chi-square value χ² using the measured s² and the assumed value σ². Theorem 6.5 states that the probability of getting the value s² is the same as the probability of getting the value χ² in a chi-square distribution with v = n − 1.
Recap: The sample space (N outcomes y_1, …, y_N if finite), e.g. n throws each of k dice, generates sample values x_1, x_2, …, x_n for a RV, e.g. k-dice sums; each sample j consists of one such set of n values. Think of each x_i value as resulting from a RV X_i such that
1. each X_i has the same density f(x), mean μ, and variance σ²
2. the X_i are independent (a random sample)
The population of outcomes in the sample space generates values for the RVs.
Each sample generates a sample mean x̄ and a sample variance
s² = Σ_{i=1}^{n} (x_i − x̄)²/(n − 1)
Think of the sample means and variances as values for the RVs X̄ and S².
What are F_X̄, E(X̄), Var(X̄), F_{S²}, E(S²), Var(S²)?
Chapter 5 states:
E(X̄) = μ, Var(X̄) = σ²/n for an infinite population
E(X̄) = μ, Var(X̄) = (σ²/n)(N − n)/(N − 1) for a finite population
Chapter 6 addresses the questions on F_X̄, F_{S²}.
Law of large numbers (for a single sample and single value of X̄):
P(|X̄ − μ| > ε) < σ²/(nε²)
Central limit theorem:
Z = (X̄ − μ)/(σ/√n)
is a RV whose distribution F_Z → standard normal N(0,1) as n → ∞ (i.e. X̄ is a RV whose distribution F_X̄ → N(μ, σ²/n) as n → ∞)
If the X_i are normally distributed with mean μ and variance σ²,
Z = (X̄ − μ)/(σ/√n)
is a RV whose distribution F_Z = N(0,1) for all n, i.e. X̄ is a RV whose distribution F_X̄ = N(μ, σ²/n) for all n.
If the X_i are normally distributed with mean μ,
t = (X̄ − μ)/(S/√n)
is a RV whose distribution F_t is the t-distribution with df v = n − 1.
If the X_i are normally distributed with variance σ²,
χ² = (n − 1)S²/σ² = Σ_{i=1}^{n} (X_i − X̄)²/σ²
is a RV whose distribution F_{χ²} is the chi-square distribution with df v = n − 1.
Assume we have two populations. We may wish to inquire whether they have the same variance. Assume S₁² and S₂² are measured sample variances for each population.
Theorem 6.6: If S₁² and S₂² are the sample variances of independent random samples of respective sizes n₁ and n₂ taken from two normal populations having the same variance, then
F = S₁²/S₂²
is a RV having the F distribution with parameters v₁ = n₁ − 1 and v₂ = n₂ − 1.
The F distribution: a two-parameter family of RVs, with values defined on (0, ∞)
density function
f(x; v₁, v₂) = [1/B(v₁/2, v₂/2)] · (v₁/v₂)^(v₁/2) · x^(v₁/2 − 1) · (1 + v₁x/v₂)^(−(v₁+v₂)/2)
mean value: v₂/(v₂ − 2) for v₂ > 2
variance: 2v₂²(v₁ + v₂ − 2)/(v₁(v₂ − 2)²(v₂ − 4)) for v₂ > 4
The F distribution is similar to the beta distribution.
B(x, y) = ∫₀¹ t^(x−1) (1 − t)^(y−1) dt is the beta function.
[Figure: F densities for several values of v₁ and v₂]
The parameter v₁ is referred to as the numerator degrees of freedom (df of numerator). The parameter v₂ is referred to as the denominator degrees of freedom (df of denominator). As with z_α, t_α, etc. we define F_α. Values of F_α are given in Table 6 for various values of v₁ and v₂, for α = 0.05 (Table 6(a)) and α = 0.01 (Table 6(b)).
Practical use of Theorem 6.6: You have two populations whose distributions are (assumed to be) normal and whose variances you assume to be equal. You sample population 1 with a sample of size n₁ and population 2 with a sample of size n₂. From each sample you compute sample variances s₁² and s₂². Theorem 6.6 will tell you the probability of getting the ratio s₁²/s₂² given your assumptions on normality and equality of variance. To test your assumption, compute the value F. Theorem 6.6 states that the probability of getting the ratio s₁²/s₂² is the same as the probability of getting the value F in an F distribution with v₁ = n₁ − 1, v₂ = n₂ − 1.
e.g. Two random samples of size n₁ = 7 and n₂ = 13 are taken from the same normal population. What is the probability that the variance of the first sample will be at least 3 times that of the second?
For v₁ = 6 and v₂ = 12, Table 6(a) shows an F value of 3.00 for α = 0.05. Therefore there is a 5% probability that the variance of the first sample will be at least 3 times that of the second.
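This 5% figure can also be estimated by Monte Carlo: repeatedly draw two samples of sizes 7 and 13 from the same normal population and count how often S₁²/S₂² ≥ 3. The population parameters, seed, and replication count below are arbitrary illustrative choices; Theorem 6.6 predicts about 0.05:

```python
import random
import statistics

# Monte Carlo estimate of P(S1^2 / S2^2 >= 3) for samples of sizes
# n1 = 7 and n2 = 13 from the same normal population.
rng = random.Random(123)
n1, n2, reps = 7, 13, 20000

hits = 0
for _ in range(reps):
    sample1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
    sample2 = [rng.gauss(0.0, 1.0) for _ in range(n2)]
    # statistics.variance uses the n-1 denominator (sample variance)
    if statistics.variance(sample1) / statistics.variance(sample2) >= 3.0:
        hits += 1
print(hits / reps)
```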
6.5 Representations of normal distributions

Defining new random variables in terms of others is referred to as a representation.
chi-square: Let Z₁, Z₂, …, Z_v be independent standard normal RVs. Define the RV
χ²_v = Σ_{i=1}^{v} Z_i²
Then χ²_v has a chi-square distribution with v df. Thus we also see that the square of a standard normal RV is a chi-square RV (with 1 df).
Let
χ²₁ = Σ_{i=1}^{v₁} Z_i²  and  χ²₂ = Σ_{i=v₁+1}^{v₁+v₂} Z_i²
where the Z_i are independent standard normal RVs (and thus χ²₁ and χ²₂ are independent of each other). Then χ²₁ + χ²₂ has a chi-square distribution with v₁ + v₂ df. Thus we see that the sum of two independent chi-square RVs is also a chi-square RV with the sum of the individual df.
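The chi-square representation above is easy to check by simulation: summing v squared independent standard normals should give draws with mean v and variance 2v. The df value, seed, and replication count are illustrative choices:

```python
import random
import statistics

# Simulate chi-square draws as sums of v squared standard normals.
rng = random.Random(7)
v, reps = 5, 20000

draws = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v)) for _ in range(reps)]
# Sample mean should be close to v = 5, sample variance close to 2v = 10.
print(round(statistics.mean(draws), 2), round(statistics.variance(draws), 2))
```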
t distribution: Let Z be a standard normal RV and χ² be a chi-square RV with v df. Assume Z and χ² are independent. Then
t = Z/√(χ²/v)
has a t distribution with v df.
F distribution: Let χ²₁ and χ²₂ be chi-square RVs with df v₁ and v₂ respectively. Assume χ²₁ and χ²₂ are independent. Then
F_{v₁,v₂} = (χ²₁/v₁)/(χ²₂/v₂)
has an F distribution with v₁, v₂ df. Thus we see that
t² = Z²/(χ²/v)
is a RV with an F_{1,v} distribution.
e.g. Let X₁, X₂, …, X_n be independent normal RVs all having mean μ and standard deviation σ. Then Z_i = (X_i − μ)/σ is a standard normal RV for each i, and
√n Z̄ = (X̄ − μ)/(σ/√n)
is also a standard normal RV. Consider
Σ_{i=1}^{n} Z_i² = Σ_{i=1}^{n} (Z_i − Z̄ + Z̄)² = Σ_{i=1}^{n} (Z_i − Z̄)² + n Z̄²
(the cross term vanishes since Σ_{i=1}^{n} (Z_i − Z̄) = 0). Note that the LHS is chi-square with n df. The last term on the RHS, n Z̄² = ((X̄ − μ)/(σ/√n))², is chi-square with 1 df. This implies that the first term on the RHS is chi-square with n − 1 df. Thus we see that
(n − 1)S²/σ² = Σ_{i=1}^{n} (X_i − X̄)²/σ² = Σ_{i=1}^{n} (Z_i − Z̄)²
has a chi-square distribution with n − 1 df (as claimed in Theorem 6.5).
Let X_i be N(μ_i, σ_i²) for i = 1, 2, …, n be independent normal RVs. Then
X = Σ_{i=1}^{n} X_i
is normal with E(X) = Σ_{i=1}^{n} μ_i and Var(X) = Σ_{i=1}^{n} σ_i². A sum of independent normal RVs is a normal RV.
Let X_i be a chi-square RV with df v_i for i = 1, 2, …, n; assume the X_i are independent. Then
X = Σ_{i=1}^{n} X_i
is a chi-square RV with df v = Σ_{i=1}^{n} v_i. A sum of independent chi-square RVs is chi-square.
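The normal closure property can be sketched numerically: with X₁ ~ N(1, 2²) and X₂ ~ N(3, 1²) independent, X₁ + X₂ should behave like N(4, 5). The parameters, seed, and replication count are arbitrary illustrative choices:

```python
import random
import statistics

# Sum of independent normals: means add and variances add.
rng = random.Random(99)
reps = 20000

sums = [rng.gauss(1.0, 2.0) + rng.gauss(3.0, 1.0) for _ in range(reps)]
# Sample mean should be close to 1 + 3 = 4; sample variance close to
# 2^2 + 1^2 = 5.
print(round(statistics.mean(sums), 2), round(statistics.variance(sums), 2))
```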
Let X_i be a Poisson RV with parameter λ_i for i = 1, 2, …, n; assume the X_i are independent. Then
X = Σ_{i=1}^{n} X_i
is a Poisson RV with parameter λ = Σ_{i=1}^{n} λ_i. A sum of independent Poisson RVs is Poisson.