Chapter two: Hypothesis testing

Some basic concepts:

- Data: The raw material of statistics is data. For our purposes we may define data as numbers. The two kinds of numbers that we use in statistics are numbers that result from the taking of a measurement, and those that result from the process of counting.

- Biostatistics: The tools of statistics are employed in many fields: business, education, psychology, agriculture, and economics, to mention only a few. When the data analyzed are derived from the biological sciences and medicine, we use the term biostatistics to distinguish this particular application of statistical tools and concepts.

- Hypothesis testing: A hypothesis may be defined simply as a statement about one or more populations. The hypothesis is frequently concerned with the parameters of the populations about which the statement is made. By means of hypothesis testing one determines whether or not such statements are compatible with the available data. There are two statistical hypotheses involved in hypothesis testing, and these should be stated explicitly. The null hypothesis is the hypothesis to be tested. It is designated by the symbol $H_0$. The null hypothesis is sometimes referred to as a hypothesis of no difference, since it is a statement of agreement with (or no difference from) conditions presumed to be true in the population of interest. In the testing process the null hypothesis either is rejected or is not rejected. If the null hypothesis is not rejected, we will say that the data on which the test is based do not provide sufficient evidence to cause rejection. If the testing procedure leads to rejection, we will say that the data at hand are not compatible with the null hypothesis, but are supportive of some other hypothesis.

The alternative hypothesis is a statement of what we will believe is true if our sample data cause us to reject the null hypothesis. Usually the alternative hypothesis and the research hypothesis are the same, and in fact the two terms are used interchangeably. We shall designate the alternative hypothesis by the symbol $H_1$ (or $H_A$).

Rules for Stating Statistical Hypotheses: When hypotheses are of the type considered, an indication of equality (either $=$, $\le$, or $\ge$) must appear in the null hypothesis. Suppose, for example, that we want to answer the question: Can we conclude that a certain population mean is not 50? The null hypothesis and the alternative are

$$H_0: \mu = 50, \qquad H_1: \mu \ne 50$$

Suppose we want to know if we can conclude that the population mean is greater than 50. Our hypotheses are

$$H_0: \mu \le 50, \qquad H_1: \mu > 50$$

If we want to know if we can conclude that the population mean is less than 50, the hypotheses are

$$H_0: \mu \ge 50, \qquad H_1: \mu < 50$$

Distribution of test statistic: It has been pointed out that the key to statistical inference is the sampling distribution. We are reminded of this again when it becomes necessary to specify the probability distribution of the test statistic. The distribution of the test statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

for example, follows the standard normal distribution if the null hypothesis is true and the assumptions are met.

- Significance Level: The decision as to which values go into the rejection region and which ones go into the nonrejection region is made on the basis of the desired level of significance, designated by $\alpha$. The term level of significance reflects the fact that hypothesis tests are sometimes called significance tests, and a computed value of the test statistic that falls in the rejection region is said to be significant. The level of significance, $\alpha$, specifies the area under the curve of the distribution of the test statistic that is above the values on the horizontal axis constituting the rejection region. We select a small value of $\alpha$ in order to make the probability of rejecting a true null hypothesis small. The more frequently encountered values of $\alpha$ are 0.01, 0.05, and 0.10.

- Types of Errors: The error committed when a true null hypothesis is rejected is called the type I error. The type II error is the error committed when a false null hypothesis is not rejected. The probability of committing a type II error is designated by $\beta$. Whenever we reject a null hypothesis there is always the concomitant risk of committing a type I error, rejecting a true null hypothesis. Whenever we fail to reject a null hypothesis the risk of failing to reject a false null hypothesis is always present. We never know whether we have committed one of these errors when we reject or fail to reject a null hypothesis, since the true state of affairs is unknown. Figure 2-1 shows, for various conditions of a hypothesis test, the possible actions that an investigator may take and the conditions under which each of the two types of error will be made.

Figure (2-1): Conditions under which type I and type II errors may be committed
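As a rough illustration of how the level of significance fixes the rejection region, the following Python sketch assumes a test statistic that follows the standard normal distribution and a rejection region placed entirely in the upper tail. Both choices, and the function name `upper_tail_critical_value`, are assumptions made for this example only.

```python
from statistics import NormalDist


def upper_tail_critical_value(alpha: float) -> float:
    """Return the z value that leaves an area of `alpha` in the upper tail."""
    return NormalDist().inv_cdf(1.0 - alpha)


for alpha in (0.01, 0.05, 0.10):
    z_crit = upper_tail_critical_value(alpha)
    # Area under the standard normal curve above the critical value;
    # by construction this equals alpha.
    tail_area = 1.0 - NormalDist().cdf(z_crit)
    print(f"alpha = {alpha:.2f}: reject H0 when z >= {z_crit:.3f} "
          f"(tail area = {tail_area:.2f})")
```

A smaller level of significance pushes the critical value further into the tail, so a true null hypothesis is rejected less often.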

Purpose of Hypothesis Testing: The purpose of hypothesis testing is to assist administrators and clinicians in making decisions. The administrative or clinical decision usually depends on the statistical decision. If the null hypothesis is rejected, the administrative or clinical decision usually reflects this, in that the decision is compatible with the alternative hypothesis. The reverse is usually true if the null hypothesis is not rejected. The administrative or clinical decision, however, may take other forms, such as a decision to gather more data.

Figure (2-2): Steps in the hypothesis testing procedure

One-Sided Hypothesis Tests: The hypothesis test in the example () is a two-sided test (two-tailed test), so called because the rejection region is split between the two sides or tails of the distribution of the test statistic. A hypothesis test may be a one-sided test (one-tailed test), in which case all the rejection region is in one or the other tail of the distribution. Whether a one-sided or a two-sided test is used depends on the nature of the question being asked by the researcher. If both large and small values will cause rejection of the null hypothesis, a two-sided test is indicated. When either sufficiently small values only or sufficiently large values only will cause rejection of the null hypothesis, a one-sided test is indicated.

Figure (2-3): One- and two-sided tests
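The difference between the one-sided and two-sided rejection regions can be sketched in a few lines of Python. This is a minimal sketch that assumes a standard normal test statistic; the function `in_rejection_region`, its `alternative` labels, and the observed value 1.80 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def in_rejection_region(z: float, alpha: float, alternative: str) -> bool:
    """Decide whether z falls in the rejection region of a z test.

    `alternative` describes H1: "two-sided", "greater" or "less".
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        # alpha is split equally between the two tails.
        return abs(z) >= std_normal.inv_cdf(1.0 - alpha / 2.0)
    if alternative == "greater":
        # The whole rejection region sits in the upper tail.
        return z >= std_normal.inv_cdf(1.0 - alpha)
    if alternative == "less":
        # The whole rejection region sits in the lower tail.
        return z <= std_normal.inv_cdf(alpha)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


z_observed = 1.80  # hypothetical value of the test statistic
for alternative in ("two-sided", "greater", "less"):
    decision = in_rejection_region(z_observed, 0.05, alternative)
    print(f"{alternative}: reject H0 = {decision}")
```

With these hypothetical inputs the same statistic leads to rejection only under the upper-tailed alternative, because 1.80 lies between the one-sided cutoff (about 1.645) and the two-sided cutoff (about 1.96).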

2-1- Hypothesis testing: Single population mean.

2-1-1- Sampling from a population that is normally distributed: Population variance known.
When sampling is from a normally distributed population and the population variance is known, the test statistic is:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

which, when $H_0$ is true, is distributed as the standard normal.

Examples:

1. Researchers are interested in the mean age of a certain population. A random sample of 0 individuals drawn from the population of interest has a mean of 7. Assuming that the population is approximately normally distributed with variance 0, can we conclude that the mean is different from 30 years? Use $\alpha = 0.05$.

2. Boys of a certain age are known to have a mean weight of $\mu = 85$ pounds. A complaint is made that the boys living in an orphanage are underfed. $n = 5$ boys (of the same age) are weighed and found to have a mean weight of $\bar{x} = 80.94$ pounds. It is known that the population standard deviation $\sigma$ is .6. Based on the available data, what should be concluded concerning the complaint? Use $\alpha = 0.05$.
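The known-variance test statistic, the sample mean minus the hypothesized mean divided by $\sigma/\sqrt{n}$, can be sketched as follows. The function name `one_sample_z` and all numeric inputs are hypothetical; they are not the values of the examples above, and the two-sided decision rule at $\alpha = 0.05$ is just one possible choice.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


# Hypothetical inputs, not taken from the examples above.
z = one_sample_z(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided cutoff
reject_h0 = abs(z) >= z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Here the standard error is 6/6 = 1, so the statistic is 2.0, which exceeds the two-sided cutoff of about 1.96.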

2-1-2- Sampling from a population that is normally distributed: Population variance unknown.
When sampling is from an approximately normal population with an unknown variance, the test statistic is:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

which, when $H_0$ is true, is distributed as Student's t with $n - 1$ degrees of freedom.

Examples:

1. It is assumed that the mean systolic blood pressure is $\mu = 0$. In a Health Study, a sample of $n = 00$ people had an average systolic blood pressure of 30. mm Hg with a standard deviation of .. Is the group significantly different from the regular population? Use $\alpha = 0.05$.

2. A professor wants to know if her introductory statistics class has a good grasp of basic math. Six students are chosen at random from the class and given a math proficiency test. The professor wants the class to be able to score above 70 on the test. The six students get scores of 6, 9, 75, 68, 83, and 95. Can the professor have 90 percent confidence that the mean score for the class on the test would be above 70?
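When the variance is unknown, the statistic is built from the sample standard deviation and referred to Student's t with $n - 1$ degrees of freedom. The sketch below uses only the Python standard library, which has no t distribution, so the critical value is assumed to be read from a t table and supplied by hand. The function name `one_sample_t` and the five scores are hypothetical.

```python
import math
from statistics import mean, stdev


def one_sample_t(data: list[float], mu0: float) -> tuple[float, int]:
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    n = len(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    t = (mean(data) - mu0) / (s / math.sqrt(n))
    return t, n - 1


scores = [72, 78, 69, 81, 75]  # hypothetical data
t_stat, df = one_sample_t(scores, mu0=70)

T_CRIT = 2.132  # upper-tail t table value for alpha = 0.05 and 4 df
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
print("reject H0 in favour of mu > 70:", t_stat >= T_CRIT)
```

Passing the critical value in by hand keeps the sketch dependency-free; a statistics package with a t distribution could replace the table lookup.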

2-1-3- Sampling from a population that is not normally distributed.
If the sample on which we base our hypothesis test about a population mean comes from a population that is not normally distributed, we may, if our sample is large (greater than or equal to 30), take advantage of the central limit theorem and use $z$ as the test statistic. If the population standard deviation is not known, the usual practice is to use the sample standard deviation as an estimate. The test statistic is

$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

which, when $H_0$ is true, is distributed approximately as the standard normal distribution if $n$ is large. The rationale for using $s$ to replace $\sigma$ is that the large sample, necessary for the central limit theorem to apply, will yield a sample standard deviation that closely approximates $\sigma$.

Example: Can we conclude that the mean age at death of patients with homozygous sickle-cell disease is less than 30 years? A sample of 50 patients yielded the following ages in years:

5.5 45..7 0.8. 8. 9.7 8. 8. 7.6 45 66.4 67.4.5 6.7 6. 3.7 6.9 3.5.9 3. 9.6 9.7 3.5.6 4.4 0.7 30.9 36.6. 3.6 0.9 7.6 3.5 6.3 40. 3.7 4.8 33. 7. 36.7 3. 38 3.5.8.4

Let $\alpha = 0.05$. What assumptions are necessary?
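A minimal sketch of the large-sample procedure follows: it refuses samples smaller than 30 and otherwise computes $z$ with $s$ in place of $\sigma$. The function name `large_sample_z`, the `min_n` parameter, and the artificial sample of 40 values are assumptions made for illustration; the sample is not the list of ages given above.

```python
import math
from statistics import mean, stdev


def large_sample_z(data: list[float], mu0: float, min_n: int = 30) -> float:
    """z statistic with s in place of sigma; requires a large sample."""
    n = len(data)
    if n < min_n:
        raise ValueError(f"sample size {n} is below {min_n}; "
                         "the normal approximation may not hold")
    return (mean(data) - mu0) / (stdev(data) / math.sqrt(n))


# Hypothetical sample of 40 values, not the ages listed above.
sample = [26.0, 30.0] * 20
z = large_sample_z(sample, mu0=30.0)

Z_CRIT_LOWER = -1.645  # lower-tail cutoff for alpha = 0.05
print(f"z = {z:.3f}, reject H0 in favour of mu < 30: {z <= Z_CRIT_LOWER}")
```

The size check encodes the rule of thumb stated in the text; it does not verify any other assumption, such as random sampling.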

2-2- Hypothesis testing: The difference between two population means.
Hypothesis testing involving the difference between two population means is most frequently employed to determine whether or not it is reasonable to conclude that the two population means are unequal. In such cases, one or the other of the following hypotheses may be formulated:

1. $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$
2. $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$
3. $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$

It is possible, however, to test the hypothesis that the difference is equal to, greater than or equal to, or less than or equal to some value other than zero.

2-2-1- Sampling from normally distributed independent populations: Population variances known.
When each of two independent simple random samples has been drawn from a normally distributed population with a known variance, the test statistic for testing the null hypothesis of equal population means is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Examples:

1. Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome. The data consist of serum uric acid readings on individuals with Down's syndrome and 5 normal individuals. The sample means are 3.4 and 4.5. The populations are normally distributed with variance equal to () for the Down's syndrome population, and (.5) for the normal population. Are the population means equal? Use $\alpha = 0.05$.

2. The amount of a certain trace element in blood is known to vary with a standard deviation of 4. for male blood donors and 9.5 for female donors. Random samples of 75 male and 50 female donors yield concentration means of 8 and 33, respectively. What is the likelihood that the population means of concentrations of the element are the same for men and women? Use $\alpha = 0.05$.
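The two-sample statistic with known variances divides the observed difference in means, less the hypothesized difference, by the square root of $\sigma_1^2/n_1 + \sigma_2^2/n_2$. A minimal sketch, with a hypothetical function name `two_sample_z`, a hypothetical `delta0` parameter for the hypothesized difference, and made-up inputs:

```python
import math
from statistics import NormalDist


def two_sample_z(mean1: float, mean2: float, var1: float, var2: float,
                 n1: int, n2: int, delta0: float = 0.0) -> float:
    """z for H0: mu1 - mu2 = delta0 when both population variances are known."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - delta0) / standard_error


# Hypothetical summary values, not taken from the examples above.
z = two_sample_z(mean1=10.0, mean2=8.0, var1=9.0, var2=16.0, n1=9, n2=16)

z_crit = NormalDist().inv_cdf(1.0 - 0.05 / 2.0)  # two-sided, alpha = 0.05
print(f"z = {z:.3f}, reject H0 of equal means: {abs(z) >= z_crit}")
```

Each sample contributes a variance of 1 to the difference here, so the standard error is $\sqrt{2}$ and the statistic, about 1.414, does not reach the two-sided cutoff.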

2-2-2- Sampling from normally distributed independent populations: Population variances equal and unknown.
When the population variances are unknown, but assumed to be equal, it is appropriate to pool the sample variances by means of the following formula:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

When each of two independent simple random samples has been drawn from a normally distributed population and the two populations have equal but unknown variances, the test statistic is given by

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$

which, when $H_0$ is true, is distributed as Student's t with $n_1 + n_2 - 2$ degrees of freedom.

Examples:

1. A nurse was hired by a governmental ecology agency to investigate the impact of a lead smelter on the level of lead in the blood of children living near the smelter. Ten children were chosen at random from those living near the smelter. A comparison group of 7 children was randomly selected from those living in an area relatively free from possible lead pollution. Blood samples were taken from the children, and lead levels determined. The following are the results:

Lead Levels (Children Living Near Smelter, Children Living in Unpolluted Area): 8 9 6 3 8 4 5 7 7 9 4 5 8

Using $\alpha = 0.0$, suppose that the population variances are equal. What do you conclude?

2. A psychologist was interested in exploring whether or not male and female college students have different driving behaviors. There were a number of ways that she could quantify driving behaviors. She opted to focus on the fastest speed ever driven by an individual. Therefore, the particular statistical question she framed was as follows: Is the mean fastest speed driven by male college students different from the mean fastest speed driven by female college students? She conducted a survey of a random $n = 34$ male college students and a random $n = 9$ female college students. Here is a descriptive summary of the results of her survey:

Males: n = 34, Mean = 05.5, Standard deviation = 0.
Females: n = 9, Mean = 90.9, Standard deviation = .

Suppose that the variances of the two populations are equal. Is there sufficient evidence at the $\alpha = 0.05$ level to conclude that the mean fastest speed driven by male college students differs from the mean fastest speed driven by female college students?

2-2-3- Sampling from normally distributed independent populations: Population variances unequal and unknown.
When two independent simple random samples have been drawn from normally distributed populations with unknown and unequal variances, the test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

which, when $H_0$ is true, is distributed approximately as Student's t with $r$ degrees of freedom, where $r$, the adjusted degrees of freedom, is determined by the equation:

$$r = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

Examples:

1. A researcher examined subjects with hypertension and healthy control subjects. He assumed that the means of the two groups are equal. The sample sizes, means, and sample standard deviations are: The data constitute two independent random samples, one from a population of subjects with hypertension and the other from a control population. We assume that the values are approximately normally distributed in both populations. The population variances are unknown and unequal. Test the researcher's assumption, using $\alpha = 0.05$.

2. The average length of stay of a random sample of 5 patients in hospital (a) is 6 days, with a standard deviation of 3, and the average length of stay of a random sample of 0 patients in hospital (b) is 9 days, with a standard deviation of 4. If the population variances are unequal and unknown, test at the 5% level of significance the hypothesis that the average patient's stay in hospital (a) is equal to the average patient's stay in hospital (b).
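The two unknown-variance cases, the pooled-variance statistic (equal variances) and the unequal-variance statistic with its adjusted degrees of freedom $r$, can be placed side by side in a short sketch. The function names `pooled_t` and `welch_t`, the `delta0` parameter, and the summary values are hypothetical and are not taken from the examples.

```python
import math


def pooled_t(mean1, mean2, s1, s2, n1, n2, delta0=0.0):
    """Equal-variance case: return (t, degrees of freedom n1 + n2 - 2)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = ((mean1 - mean2) - delta0) / math.sqrt(sp2 / n1 + sp2 / n2)
    return t, n1 + n2 - 2


def welch_t(mean1, mean2, s1, s2, n1, n2, delta0=0.0):
    """Unequal-variance case: return (t, adjusted degrees of freedom r)."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # per-sample variance of the mean
    t = ((mean1 - mean2) - delta0) / math.sqrt(v1 + v2)
    r = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, r


# Hypothetical summary statistics for two samples of size 10.
args = dict(mean1=20.0, mean2=17.0, s1=3.0, s2=4.0, n1=10, n2=10)
t_pooled, df_pooled = pooled_t(**args)
t_welch, r = welch_t(**args)
print(f"pooled:  t = {t_pooled:.3f}, df = {df_pooled}")
print(f"unequal: t = {t_welch:.3f}, r = {r:.2f}")
```

With equal sample sizes the two statistics coincide and only the degrees of freedom differ (18 versus roughly 16.7 here); with unequal sample sizes the statistics themselves generally differ as well.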

2-2-4- Sampling from populations that are not normally distributed.
When sampling is from populations that are not normally distributed, the results of the central limit theorem may be employed if sample sizes are large (say, $\ge 30$). This will allow the use of normal theory since the distribution of the difference between sample means will be approximately normal. When each of two large independent simple random samples has been drawn from a population that is not normally distributed, the test statistic is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

If the population variances are known, they are used; but if they are unknown, as is the usual case, the sample variances, which are necessarily based on large samples, are used as estimates. Sample variances are not pooled, since equality of population variances is not a necessary assumption when the $z$ statistic is used.

Example: The objective of a study was to identify the role of various disease states and additional risk factors in the development of thrombosis. One focus of the study was to determine if there were different levels of the antibody in subjects with and without thrombosis. We wish to know if we may conclude, on the basis of these results, that, in general, persons with thrombosis have, on the average, higher IgG levels than persons without thrombosis.

2-3- Hypothesis testing: Paired comparisons.
A method frequently employed for assessing the effectiveness of a treatment or experimental procedure is one that makes use of related observations resulting from nonindependent samples. A hypothesis test based on this type of data is known as a paired comparisons test. Instead of performing the analysis with individual observations, we use $d_i$, the difference between pairs of observations, as the variable of interest. When the sample differences computed from the pairs of measurements constitute a simple random sample from a normally distributed population of differences, the test statistic for testing hypotheses about the population mean difference $\mu_d$ is

$$t = \frac{\bar{d} - \mu_{d_0}}{s_{\bar{d}}}$$

where

$$d_i = y_i - x_i, \quad i = 1, 2, \ldots, n, \qquad \bar{d} = \frac{\sum d_i}{n}, \qquad s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n - 1}, \qquad s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$$

When $H_0$ is true, the test statistic is distributed as Student's t with $n - 1$ degrees of freedom.

Examples:

1. Blood samples from $n = 0$ people were sent to each of two laboratories (Lab 1 and Lab 2) for cholesterol determinations. The resulting data are summarized here:

Subject, Lab 1, Lab 2: 96 38 68 87 3 30 98 4 6 85 5 30 45 6 80 30 7 90 99 8 5 0 9 57 3 0 35 87

Is there a statistically significant difference at the $\alpha = 0.0$ level, say, in the (population) mean cholesterol levels reported by Lab 1 and Lab 2?

2. The following data show the weekly sales volume of ten businesses before and after an intensive advertising campaign. Test the hypothesis that the volume of sales is identical before and after the campaign, at the 5% significance level.

Subject, Before campaign, After campaign: 8 0 5 8 3 0 3 4 0 8 5 7 6 5 7 6 30 8 5 0 9 6 9 0 0

2-4- Hypothesis testing: Testing proportion

2-4-1- Hypothesis testing: Single population proportion
Testing hypotheses about population proportions is carried out in much the same way as for means when the conditions necessary for using the normal curve are met. One-sided or two-sided tests may be made, depending on the question being asked. When a sample sufficiently large for application of the central limit theorem is available for analysis, the test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$

which, when $H_0$ is true, is distributed approximately as the standard normal. Here $q_0 = 1 - p_0$.
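The single-proportion statistic compares the sample proportion with the hypothesized proportion $p_0$, using a standard error computed from $p_0$ itself. A minimal sketch, assuming an upper-tailed alternative at $\alpha = 0.05$; the function name `one_proportion_z` and the counts are hypothetical:

```python
import math
from statistics import NormalDist


def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """z for H0: p = p0, with the standard error computed under H0."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)


# Hypothetical counts: 60 successes in a sample of 100, H0: p = 0.5.
z = one_proportion_z(successes=60, n=100, p0=0.5)

z_crit = NormalDist().inv_cdf(1.0 - 0.05)  # upper-tail cutoff
print(f"z = {z:.3f}, reject H0 in favour of p > 0.5: {z >= z_crit}")
```

The standard error uses $p_0$ rather than the sample proportion because the statistic is evaluated under the assumption that the null hypothesis is true.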

Example: Let $p$ equal the proportion of drivers who use a seat belt in a state that does not have a mandatory seat belt law. It was claimed that $p = 0.4$. An advertising campaign was conducted to increase this proportion. Two months after the campaign, $y = 04$ out of a random sample of $n = 590$ drivers were wearing seat belts. Was the campaign successful?

2-4-2- Hypothesis testing: The difference between two population proportions
The most frequent test employed relative to the difference between two population proportions is that their difference is zero. It is possible, however, to test that the difference is equal to some other value. Both one-sided and two-sided tests may be made. When the null hypothesis to be tested is $p_1 - p_2 = 0$, we are hypothesizing that the two population proportions are equal. We use this as justification for combining the results of the two samples to come up with a pooled estimate of the hypothesized common proportion. If this procedure is adopted, one computes

$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}$$

where $x_1$ and $x_2$ are the numbers in the first and second samples, respectively, possessing the characteristic of interest. This pooled estimate of $p = p_1 = p_2$ is used in computing the estimated standard error of the estimator, as follows:

$$\hat{\sigma}_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}}$$

The test statistic becomes

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_0}{\hat{\sigma}_{\hat{p}_1 - \hat{p}_2}}$$

which is distributed approximately as the standard normal if the null hypothesis is true.
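The pooled two-proportion procedure can be sketched as follows for the null hypothesis of zero difference. The function name `two_proportion_z` and the counts are hypothetical, and the sketch covers only the case where the hypothesized difference is zero, since that is what justifies pooling.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z for H0: p1 - p2 = 0 using the pooled estimate of the common p."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion
    standard_error = math.sqrt(p_bar * (1.0 - p_bar) / n1
                               + p_bar * (1.0 - p_bar) / n2)
    return (p1_hat - p2_hat) / standard_error


# Hypothetical counts: 60 of 100 in sample 1, 40 of 100 in sample 2.
z = two_proportion_z(x1=60, n1=100, x2=40, n2=100)

z_crit = NormalDist().inv_cdf(1.0 - 0.05 / 2.0)  # two-sided, alpha = 0.05
print(f"z = {z:.3f}, reject H0 of equal proportions: {abs(z) >= z_crit}")
```

For these counts the pooled proportion is 0.5 and the statistic is about 2.83, beyond the two-sided cutoff of about 1.96.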

Example: Time magazine reported the result of a telephone poll of 800 adult Americans. The question posed of the Americans who were surveyed was: "Should the federal tax on cigarettes be raised to pay for health care reform?" The results of the survey were: Is there sufficient evidence at the $\alpha = 0.05$ level, say, to conclude that the two populations, smokers and non-smokers, differ significantly with respect to their opinions?

Summary of the formulas