Lecture 12: Hypothesis Testing

9.07 Introduction to Statistics for Brain and Cognitive Sciences
Emery N. Brown

I. Objectives

1. Understand the hypothesis testing paradigm.
2. Understand how hypothesis testing procedures are constructed.
3. Understand how to do sample size calculations.
4. Understand the relation between hypothesis testing, confidence intervals, likelihood and Bayesian methods and their uses for inference purposes.

II. The Hypothesis Testing Paradigm and One-Sample Tests

A. One-Sample Tests

To motivate the hypothesis testing paradigm we first review two problems. In both cases there is a single sample of data.

Example (continued). Analysis of MEG Sensor Bias. Is there a noise bias in this SQUID sensor? Here we have a sample $x_1, \ldots, x_n$ where we model each $x_i \sim N(\mu, \sigma^2)$. We assume that $\mu$ is unknown and $\sigma^2$ is known. If there is bias, then $\mu \neq 0$, and $\mu > 0$ would suggest a positive bias whereas $\mu < 0$ would suggest a negative bias.

Example (continued). Behavioral Learning. Has the animal learned the task? We have data $x_1, \ldots, x_n$ and we recall that our model is $x_i \sim B(1, p)$ where $p$ is unknown. We define learning as performance greater than expected by chance. In particular, chance suggests $p = \tfrac{1}{2}$ (binary choice), learning suggests $p > \tfrac{1}{2}$, whereas impaired learning suggests $p < \tfrac{1}{2}$.

To define the hypothesis testing paradigm we state a series of definitions.

A hypothesis is the statement of a scientific question in terms of a proposed value of a parameter in a probability model.

Hypothesis testing is a process of establishing proof by falsification. It has two essential components: a null hypothesis and an alternative hypothesis.

The null hypothesis is a stated value of the parameter which defines the hypothesis we want to falsify. It is usually stated as a single value although it can be composite. We denote it as $H_0$.

The alternative hypothesis is the hypothesis whose veracity we wish to establish. It is usually defined by a value or set of values of the parameter that are different from the one specified in the null hypothesis. We denote it as $H_A$.

To establish the result, we carry out a test in an attempt to reject the null hypothesis. The test is a procedure based on the observed data that allows us to choose between the null and alternative hypothesis.

Example (continued). Analysis of MEG Sensor Bias. For this example the null hypothesis could be $H_0: \mu = 0$ and alternative hypotheses could be $H_A: \mu > 0$ or $H_A: \mu < 0$.

Example (continued). Behavioral Learning. Here the null hypothesis is $H_0: p = \tfrac{1}{2}$ and the alternative hypotheses could be one-sided as either $H_A: p > \tfrac{1}{2}$ or $H_A: p < \tfrac{1}{2}$, or two-sided $H_A: p \neq \tfrac{1}{2}$.

To investigate the hypothesis we require a test statistic. A test statistic is a statistic whose values will allow us to distinguish between the null and the alternative hypotheses.

Example (continued). Analysis of MEG Sensor Bias. For this example we recall that $\bar{x} \sim N(\mu, \sigma^2/n)$. Hence, we can choose $\bar{x}$, because of its distributional properties. In general, choosing the statistic is an important issue. Here there are two cases:

Case i): If $\bar{x} \gg 0$ or $\bar{x} \ll 0$ we conclude $\mu \neq 0$. That is, we would be willing to conclude $\mu \neq 0$ if $|\bar{x}|$ is sufficiently large. Indeed, the larger $\bar{x}$ is in absolute value, the more likely we are to conclude $\mu \neq 0$. If our data allow us to reach this conclusion, we say we reject the null hypothesis $H_0$.

Case ii): If $\bar{x}$ is close to 0, we say we fail to reject the null hypothesis $H_0$. We do not say we accept the null hypothesis because, in this case, we do not reach a conclusion.

There are two types of errors we can commit in hypothesis testing. If we reject $H_0$ when it is true, this is an error. It is called a Type I error. The probability of the error is denoted as $\alpha$. We write

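$$\alpha = \Pr(\text{rejecting } H_0 \mid H_0 \text{ is true}).$$

In the MEG sensor example we have

$$\alpha = \Pr(\bar{x} > c_\alpha \mid \mu = 0).$$

We choose $\alpha$ as small as possible. Typical values are $\alpha = 0.01$ and $\alpha = 0.05$. What is $c_\alpha$? To determine it, we compute

$$\Pr(\text{Type I error}) = \Pr(\text{rejecting } H_0 \mid H_0 \text{ true}) = \alpha.$$

We want $\alpha$ to be small:

$$\alpha = \Pr(\bar{x} > c_\alpha \mid \mu = \mu_0 = 0) = \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} > \frac{n^{1/2}(c_\alpha - \mu_0)}{\sigma}\right) = \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} > z_{1-\alpha}\right)$$

where $z_{1-\alpha}$ is a quantile of the standard Gaussian. Hence, we have

$$c_\alpha = \mu_0 + z_{1-\alpha}\,\frac{\sigma}{n^{1/2}}.$$

Some values of $z_{1-\alpha}$ for the corresponding values of $\alpha$ are

    α        z_{1-α}
    0.05     1.645
    0.025    1.960
    0.01     2.326

The area to the right of $z_{1-\alpha}$ or $c_\alpha$ is the critical region. It has probability content of $\alpha$. The value $c_\alpha$ is the cut-off value. This test based on the normal distribution is the z-test.

If $H_0$ is not true, i.e. $H_A$ is true, and we fail to reject $H_0$, then this is also an error. It is termed a Type II error or $\beta$ error and is defined as

$$\Pr(\text{Type II error}) = \Pr(\text{fail to reject } H_0 \mid H_A \text{ is true}).$$

Assume $H_A$ is true, i.e. $\mu = \mu_A > 0$. Then we have

$$\beta = \Pr(\bar{x} < c_\alpha \mid \mu = \mu_A) = \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_A)}{\sigma} < \frac{n^{1/2}(c_\alpha - \mu_A)}{\sigma}\right) = \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_A)}{\sigma} < z_\beta\right).$$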
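The cut-off value and the Type II error of the one-sided z-test can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the function names `cutoff` and `type_ii_error` and the input values are illustrative assumptions, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, standard deviation 1


def cutoff(mu0, sigma, n, alpha):
    """Cut-off c_alpha for the one-sided z-test of H0: mu = mu0 vs HA: mu > mu0."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha)  # quantile z_{1-alpha}
    return mu0 + z * sigma / sqrt(n)


def type_ii_error(mu0, mu_a, sigma, n, alpha):
    """beta = Pr(xbar < c_alpha | mu = mu_a) for the same one-sided test."""
    c = cutoff(mu0, sigma, n, alpha)
    return STD_NORMAL.cdf(sqrt(n) * (c - mu_a) / sigma)


# Illustrative (assumed) inputs: known sigma, 500 observations, alpha = 0.05,
# and an alternative mean of 0.1 in the same units as sigma.
c_alpha = cutoff(mu0=0.0, sigma=1.1, n=500, alpha=0.05)
beta = type_ii_error(mu0=0.0, mu_a=0.1, sigma=1.1, n=500, alpha=0.05)
print(f"c_alpha = {c_alpha:.4f}, beta = {beta:.3f}, 1 - beta = {1 - beta:.3f}")
```

Passing the quantile through `inv_cdf` rather than a table lookup keeps the sketch usable for any $\alpha$, at the cost of hiding the table values the lecture works with.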

We do not talk in terms of $\beta$ but $1 - \beta$, which is the power of the test. The power of the test is the probability of rejecting the null hypothesis when it is false. We compute it as

$$\text{power} = 1 - \beta = \Pr(\text{rejecting } H_0 \mid H_A \text{ is true}) = \Pr(\bar{x} > c_\alpha \mid \mu = \mu_A).$$

Remark 12.1. You should not carry out a test if the power is not at least 0.80 for important $H_A$.

Remark 12.2. Never report a negative result (failing to reject $H_0$) without reporting the power.

Remark 12.3. A statistical test is an information assay. As such, it is only as useful as it is powerful against an important alternative hypothesis.

If we reject $H_0$ we report the p-value, which is the smallest value of $\alpha$ for which we can reject $H_0$. The p-value is also the probability, computed under the null hypothesis, of all events that are at least as rare as the observed statistic. It is the probability of making a mistake if we reject the null hypothesis at the observed value when the null hypothesis is in fact true. An observed value of the test statistic has statistical significance if $p < \alpha$. Statistical significance does not imply scientific significance.

Example (continued). Analysis of MEG Sensor Bias. In this example assume we have $n = 500$ observations $x_1, \ldots, x_n$ and the standard deviation is $\sigma = 1.1$ f Tesla. Suppose we wish to test $H_0: \mu = 0$ against the alternative hypothesis $H_A: \mu > 0$ with $\alpha = 0.05$. If we have $\bar{x} = 0.11$ f Tesla then

$$z = \frac{n^{1/2}\,\bar{x}}{\sigma} = \frac{(500)^{1/2}(0.11)}{1.1} = 2.24.$$

From the Table of a standard normal distribution we have $z_{1-\alpha} = z_{0.95} = 1.645$ for $\alpha = 0.05$, and we see $p \approx 0.013$. Therefore, we reject $H_0$ and conclude that there is a positive bias in the magnetic field around this recording sensor.

Example (continued). Behavioral Learning. For this experiment our hypotheses are

$$H_0: p = \tfrac{1}{2} \qquad H_A: p > \tfrac{1}{2}.$$

We have $n = 40$ trials and observe $k = 22$ and $\hat{p} = k/n = 22/40 = 0.55$. Based on our Central Limit Theorem results in Lecture 7, we can analyze this question using the Gaussian approximation to the binomial provided $np > 5$ and $n(1-p) > 5$. Because we have

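$$np(1-p) = 40\left(\tfrac{1}{4}\right) = 10 > 5$$

we can use the Gaussian approximation to the binomial to test $H_0$. Notice that if $np(1-p) > 5$ then it must be that $np > 5$ and $n(1-p) > 5$. If we take $\alpha = 0.05$, our test statistic is

$$z = \frac{n^{1/2}(\hat{p} - p_0)}{[p_0(1-p_0)]^{1/2}} = \frac{(40)^{1/2}(0.55 - 0.5)}{\left(\tfrac{1}{4}\right)^{1/2}} = \frac{6.32(0.05)}{\tfrac{1}{2}} = 0.63$$

and $z_{1-\alpha} = 1.645$; hence $z < z_{1-\alpha}$ and we fail to reject $H_0$. We see that

$$c_\alpha = p_0 + z_{1-\alpha}\left[\frac{p_0(1-p_0)}{n}\right]^{1/2} = \frac{1}{2} + \frac{1.645}{2(6.32)} = 0.63.$$

What is the power of this test if the true probability of a correct response is $p_A = 0.72$?

$$\begin{aligned}
\text{power} &= \Pr(\text{rejecting } H_0 \mid p_A = 0.72) = \Pr(\hat{p} > 0.63 \mid p_A = 0.72) \\
&= \Pr\!\left(\frac{n^{1/2}(\hat{p} - p_A)}{[p_A(1-p_A)]^{1/2}} > \frac{n^{1/2}(c_\alpha - p_A)}{[p_A(1-p_A)]^{1/2}} \,\Big|\, p_A = 0.72\right) \\
&= 1 - \Phi\!\left[\frac{(40)^{1/2}(0.63 - 0.72)}{[(0.28)(0.72)]^{1/2}}\right] = 1 - \Phi\!\left[\frac{(6.32)(-0.09)}{0.449}\right] \\
&= 1 - \Phi[-1.27] = \Phi[1.27] = 0.90.
\end{aligned}$$

Therefore, if the true propensity to respond correctly were $p_A = 0.72$ then there was a probability of about 0.90 of rejecting the null hypothesis in this case.

B. One-Sample Test, Power and Sample Size Calculation

Given a null and a one-sided alternative hypothesis

$$H_0: \mu = \mu_0 \qquad H_A: \mu = \mu_A > \mu_0$$

we compute the power as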

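$$\begin{aligned}
\text{Power} &= \Pr(\text{rejecting } H_0 \mid H_A \text{ is true}) \\
&= \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} > z_{1-\alpha} \,\Big|\, \mu = \mu_A\right) \\
&= \Pr\!\left(\bar{x} > \mu_0 + \frac{z_{1-\alpha}\,\sigma}{n^{1/2}} \,\Big|\, \mu = \mu_A\right) \\
&= \Pr\!\left(\bar{x} - \mu_A > \mu_0 - \mu_A + \frac{z_{1-\alpha}\,\sigma}{n^{1/2}} \,\Big|\, \mu = \mu_A\right) \\
&= \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_A)}{\sigma} > \frac{n^{1/2}(\mu_0 - \mu_A)}{\sigma} + z_{1-\alpha} \,\Big|\, \mu = \mu_A\right) \\
&= 1 - \Phi\!\left(\frac{n^{1/2}(\mu_0 - \mu_A)}{\sigma} + z_{1-\alpha}\right) = \Phi\!\left(-z_{1-\alpha} - \frac{n^{1/2}(\mu_0 - \mu_A)}{\sigma}\right) \\
&= \Phi\!\left(z_\alpha + \frac{n^{1/2}(\mu_A - \mu_0)}{\sigma}\right)
\end{aligned}$$

because $z_\alpha = -z_{1-\alpha}$. Similarly, if we have

$$H_0: \mu = \mu_0 \qquad H_A: \mu = \mu_A < \mu_0$$

then it is easy to show that

$$\text{power} = \Phi\!\left[\frac{n^{1/2}(\mu_0 - \mu_A)}{\sigma} + z_\alpha\right].$$

In general, for a one-sided alternative,

$$\text{power} = \Phi\!\left[\frac{n^{1/2}\,|\mu_0 - \mu_A|}{\sigma} + z_\alpha\right].$$

We use these formulae to derive expressions for sample size calculations. Notice that

$$\text{power} = 1 - \beta = \Phi\!\left(\frac{n^{1/2}\,|\mu_0 - \mu_A|}{\sigma} + z_\alpha\right).$$

If we apply $\Phi^{-1}$ to the left and right hand sides of the equation above we get

$$\Phi^{-1}(1 - \beta) = \Phi^{-1}\!\left[\Phi\!\left(\frac{n^{1/2}\,|\mu_A - \mu_0|}{\sigma} + z_\alpha\right)\right]$$

$$z_{1-\beta} = \frac{n^{1/2}\,|\mu_A - \mu_0|}{\sigma} + z_\alpha$$

$$n^{1/2} = \frac{\sigma\,(z_{1-\beta} - z_\alpha)}{|\mu_A - \mu_0|}.$$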

Or, since $z_\alpha = -z_{1-\alpha}$, we obtain the sample size formula

$$n = \frac{\sigma^2\,(z_{1-\beta} + z_{1-\alpha})^2}{\Delta^2}$$

where $\Delta = |\mu_0 - \mu_A|$.

Example (continued). Analysis of MEG Sensor Bias. How many measurements should Steve Stufflebeam make daily to be at least 80% sure that if there is a positive drift of 0.1 f Tesla he can detect it with $\alpha = 0.05$? To answer this question, we apply our sample size formula with $z_{1-\alpha} = 1.645$, $z_{1-\beta} = 0.84$, $\sigma = 1.1$ f Tesla and $\Delta = 0.1$ f Tesla, and we obtain

$$n = \frac{(1.1)^2(1.645 + 0.84)^2}{(0.1)^2} = \frac{1.21\,(6.18)}{0.01} = 748.$$

Therefore, in our problem studied above, we should have taken around 750 measurements instead of 500.

Remark 12.4. The ratio $\sigma^2/\Delta^2$ is like the inverse of a signal-to-noise ratio. As we want a smaller Type I error (larger $z_{1-\alpha}$) and/or more power (larger $z_{1-\beta}$), $n$ increases. Similarly, $n$ increases with $\sigma$ and decreases with $\Delta$.

III. One-Sample Two-Sided Tests

If the alternative hypothesis is two-sided then we need to construct a two-sided test.

A. Two-Sided Tests

Example (continued). Analysis of MEG Sensor Bias. Suppose our null and alternative hypotheses for this problem are respectively

$$H_0: \mu = 0 \qquad H_A: \mu \neq 0.$$

This alternative hypothesis implies that we would reject $H_0$ for either a positive bias or negative bias. Under $H_0$ we have $\bar{x} \sim N(\mu_0, \sigma^2/n)$. We would therefore reject $H_0$ if $\bar{x} \gg \mu_0 = 0$ or if $\bar{x} \ll \mu_0 = 0$. We take as our test statistic $\bar{x}$ and we will reject $H_0$ if $|\bar{x}| \gg 0$. Pick $\alpha$ and take $\alpha = \alpha_1 + \alpha_2$. Since we do not favor $\mu > 0$ more than $\mu < 0$, we take $\alpha_1 = \alpha_2 = \alpha/2$. To reject $H_0$, we consider

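$$\begin{aligned}
\Pr(|\bar{x}| > c_{\alpha/2}) &= \Pr(\bar{x} > c_{\alpha/2} \text{ or } \bar{x} < -c_{\alpha/2}) \\
&= \Pr\!\left(\frac{n^{1/2}\,\bar{x}}{\sigma} > \frac{n^{1/2}\,c_{\alpha/2}}{\sigma} \text{ or } \frac{n^{1/2}\,\bar{x}}{\sigma} < -\frac{n^{1/2}\,c_{\alpha/2}}{\sigma}\right) \\
&= \Pr(z > z_{1-\alpha/2} \text{ or } z < -z_{1-\alpha/2}) \\
&= \Pr(|z| > z_{1-\alpha/2}),
\end{aligned}$$

where we have used $\mu_0 = 0$. This is a two-sided test because we reject $H_0$ for either very large positive or negative values of the test statistic. We reject $H_0$ for $|z| > z_{1-\alpha/2}$ or equivalently, we reject $H_0$ if

$$|\bar{x}| > c_{\alpha/2} = \mu_0 + z_{1-\alpha/2}\,\frac{\sigma}{n^{1/2}}.$$

Examples of $\alpha$ and $z_{1-\alpha/2}$ are

    α        z_{1-α/2}
    0.10     1.645
    0.05     1.960
    0.01     2.576

Example (continued). Analysis of MEG Sensor Bias. We consider

$$H_0: \mu = 0 \qquad H_A: \mu \neq 0.$$

Suppose we pick $\alpha = 0.05$, we assume $\sigma = 1.1$ and we compute $\bar{x} = 0.11$; then we have

$$z = \frac{n^{1/2}\,\bar{x}}{\sigma} = \frac{(500)^{1/2}(0.11)}{1.1} = 2.24$$

and $z_{1-\alpha/2} = z_{0.975} = 1.96$. Because $2.24 > 1.96$, we have $|z| > z_{1-\alpha/2}$ and hence, we reject $H_0$.

Example (continued). Behavioral Learning. We can perform a similar analysis for the learning example

$$H_0: p = \tfrac{1}{2} \qquad H_A: p \neq \tfrac{1}{2}.$$

This alternative implies either impaired learning or learning and would lead us to reject if $\hat{p} \gg \tfrac{1}{2}$ or $\hat{p} \ll \tfrac{1}{2}$. We have that under the Gaussian approximation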

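$$z = \frac{n^{1/2}(\hat{p} - p_0)}{[p_0(1-p_0)]^{1/2}}.$$

Hence, given $\alpha$ we reject $H_0$ if $z > z_{1-\alpha/2}$ or $z < -z_{1-\alpha/2}$, or equivalently if $\hat{p} > c_{\alpha/2}$ or $\hat{p} < 2p_0 - c_{\alpha/2}$, where

$$c_{\alpha/2} = p_0 + \left[\frac{p_0(1-p_0)}{n}\right]^{1/2} z_{1-\alpha/2}.$$

Given $\alpha = 0.10$, we obtain $z_{1-\alpha/2} = z_{0.95} = 1.645$ and since $\hat{p} = k/n = 22/40 = 0.55$, we have

$$z = \frac{n^{1/2}(\hat{p} - p_0)}{[p_0(1-p_0)]^{1/2}} = \frac{(40)^{1/2}(0.55 - 0.5)}{\left(\tfrac{1}{4}\right)^{1/2}} = \frac{6.32(0.05)}{\tfrac{1}{2}} = 0.63.$$

Because $|z| < z_{0.95}$ we fail to reject $H_0$.

B. Power for the Two-Sided Alternative

It is straightforward to show that for the mean of a Gaussian distribution with known variance, if the null hypothesis is $H_0: \mu = \mu_0$ versus the two-sided alternative $H_A: \mu \neq \mu_0$, the power of the two-sided test is defined as

$$\Pr\!\left(\left|\frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma}\right| > z_{1-\alpha/2} \,\Big|\, H_A \text{ is true}\right) = \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} > z_{1-\alpha/2} \text{ or } \frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} < -z_{1-\alpha/2} \,\Big|\, H_A \text{ is true}\right).$$

This simplifies to

$$\text{power} = \Phi\!\left[-z_{1-\alpha/2} + \frac{n^{1/2}(\mu_0 - \mu_A)}{\sigma}\right] + \Phi\!\left[-z_{1-\alpha/2} + \frac{n^{1/2}(\mu_A - \mu_0)}{\sigma}\right].$$

The corresponding sample size formula is

$$n = \frac{\sigma^2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}$$

where $\Delta = |\mu_A - \mu_0|$.

Example (continued). Analysis of MEG Sensor Bias. If Steve wanted to worry about both positive and negative drift, then the previous sample size calculation becomes, with $z_{1-\alpha/2} = z_{0.975} = 1.96$,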

$$n = \frac{(1.1)^2(1.96 + 0.84)^2}{(0.1)^2} = \frac{1.21\,(2.8)^2}{0.01} = \frac{1.21\,(7.84)}{0.01} = 949.$$

Remark 12.5. Notice that $(z_{0.975} + z_{0.8})^2 \approx 8$. Hence, $n \approx 8\sigma^2/\Delta^2 = 8/\text{SNR}$, where $\text{SNR} = \Delta^2/\sigma^2$.

Remark 12.6. In Homework Assignment 9 we will explore a similar formula for the binomial distribution.

C. Adjustments for the Gaussian Assumptions

1. One-Sample t-Test for Unknown $\sigma^2$

The z-test allows us to test hypotheses about the mean of a Gaussian distribution under the assumption that the variance is known. The t-test allows us to test the same hypotheses when the sample size is not large and the variance is unknown and must be estimated from the sample. The t-test was developed by Gosset in 1908 while he was working in the Guinness Brewery. Gosset wrote under the pseudonym of Student. For this reason it is still referred to as Student's t-test. The distribution was worked out later by R.A. Fisher.

Suppose $x_1, \ldots, x_n$ is a random sample from a Gaussian probability model $N(\mu, \sigma^2)$ and we wish to test the null hypothesis $H_0: \mu = \mu_0$ against the alternative $H_A: \mu \neq \mu_0$. Assume $\sigma^2$ is not known and $n$ is not large, say $5 < n < 20$. Therefore, as discussed in Lecture 8, we construct a t-test by estimating $\sigma^2$ with an unbiased estimate, and instead of a z-statistic we construct a t-statistic as

$$t = \frac{n^{1/2}(\bar{x} - \mu_0)}{s}$$

where

$$s^2 = (n-1)^{-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

and $t \sim t_{n-1}$, a t-distribution on $n - 1$ degrees of freedom. Recall that we showed in the practice problems for the second In Class Examination that $s^2$ is an unbiased estimate of $\sigma^2$. Given $\alpha$, to test $H_0: \mu = \mu_0$ against $H_A: \mu \neq \mu_0$ we reject $H_0$ for $|t| > t_{n-1,\,1-\alpha/2}$

or equivalently if either

$$\bar{x} > \mu_0 + t_{n-1,\,1-\alpha/2}\,\frac{s}{n^{1/2}} \quad \text{or} \quad \bar{x} < \mu_0 - t_{n-1,\,1-\alpha/2}\,\frac{s}{n^{1/2}}.$$

Example 12.1. Reaction Time Measurements. In a learning experiment, along with the correct and incorrect responses, we record the reaction times, which are the times it takes the animal to execute the task. In a previous study, once it had been determined that an animal had learned a task, it was found that the average reaction time was 10 seconds. On the 14 trials after the animal learned the task by the behavioral criteria, the average reaction time was 8.25 seconds. The sample standard deviation was 2.5. Is this animal's performance different from that previously reported? We have

$$H_0: \mu = 10.0 \qquad H_A: \mu \neq 10.0$$

$$t = \frac{n^{1/2}(\bar{x} - \mu_0)}{s} = \frac{(14)^{1/2}(8.25 - 10)}{2.5} = \frac{(3.74)(-1.75)}{2.5} = -2.62.$$

Now it follows from the Table of the t-distribution that $t_{13,\,0.975} = 2.16$. Because $|t| > t_{13,\,0.975}$ we reject $H_0$.

2. Binomial Exact Method

If $np_0(1-p_0) < 5$, we cannot use the Gaussian approximation to the binomial to test hypotheses about the binomial proportion. In this case, we base the test on the exact binomial probabilities. We have $x \sim B(n, p)$, we observe $k$ successes, and we take $\hat{p} = k/n$. The p-value depends on whether $\hat{p} \le p_0$ or $\hat{p} > p_0$. If $\hat{p} \le p_0$, then

$$p\text{-value} = 2 \times \Pr(\le k \text{ successes in } n \text{ trials} \mid H_0) = 2\sum_{j=0}^{k}\binom{n}{j}p_0^{\,j}(1-p_0)^{n-j}.$$

If $\hat{p} > p_0$, then

$$p\text{-value} = 2 \times \Pr(\ge k \text{ successes in } n \text{ trials} \mid H_0) = 2\sum_{j=k}^{n}\binom{n}{j}p_0^{\,j}(1-p_0)^{n-j}.$$

Example 12.2. A New Learning Experiment. Assume that we execute the learning experiment with $n = 20$ trials and there is a $\tfrac{1}{3}$ probability of a correct response by chance. Suppose we observe $k$ correct responses and $\hat{p} = k/n$. We want to test $H_0: p = \tfrac{1}{3}$ against $H_A: p \neq \tfrac{1}{3}$. We see that

$$np_0(1-p_0) = 20\left(\tfrac{1}{3}\right)\left(\tfrac{2}{3}\right) = \tfrac{40}{9} = 4.4 < 5.$$

We have $\hat{p} > \tfrac{1}{3}$ and hence $\hat{p} > p_0$, and we compute the p-value from

$$\frac{p}{2} = \Pr(x \ge k) = \sum_{j=k}^{20}\binom{20}{j}\left(\tfrac{1}{3}\right)^{j}\left(\tfrac{2}{3}\right)^{20-j},$$

or equivalently $p = 2\Pr(x \ge k)$. For the observed $k$ this p-value is small. Therefore, we reject $H_0$ and conclude that the animal is not performing at chance and most likely has learned.

Remark 12.7. An important topic that we have not considered is non-parametric tests. Each of the main parametric tests we considered, i.e., the z-test and t-test, has a non-parametric analog. It is important to use these non-parametric tests when the sample size is small and the Gaussian assumption on which most of the tests are based is not valid (Rosner, 2006).

Remark 12.8. The requirement of the Gaussian assumption for most of the standard hypothesis testing paradigms is very limiting. For this reason, we have spent more time in the course learning about principles of modern statistical analysis such as likelihood methods, the bootstrap and other Monte Carlo methods, Bayesian methods and, as we shall see in Lecture 16, the generalized linear model.

D. The Relation Among Confidence Intervals, Hypothesis Tests and p-Values

The three statistics we have considered thus far for two-sided hypothesis tests can be used to construct $100\%(1-\alpha)$ confidence intervals. These are

z-test (Gaussian mean, variance known):

$$\bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{n^{1/2}}$$

z-test (binomial proportion):

$$\hat{p} \pm z_{1-\alpha/2}\left[\frac{\hat{p}(1-\hat{p})}{n}\right]^{1/2}$$

t-test (Gaussian mean, variance unknown):

$$\bar{x} \pm t_{n-1,\,1-\alpha/2}\,\frac{s}{n^{1/2}}$$
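The three interval formulas above translate directly into code. The sketch below is illustrative only: the function names are assumptions, the inputs are example values, and the t quantile is passed in as `t_crit` (read from a t table) because the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def z_quantile(alpha):
    """Two-sided standard Gaussian quantile z_{1 - alpha/2}."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def ci_z_mean(xbar, sigma, n, alpha=0.05):
    """Interval for a Gaussian mean when the standard deviation sigma is known."""
    half = z_quantile(alpha) * sigma / sqrt(n)
    return xbar - half, xbar + half


def ci_proportion(k, n, alpha=0.05):
    """Gaussian-approximation interval for a binomial proportion."""
    p_hat = k / n
    half = z_quantile(alpha) * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half


def ci_t_mean(xbar, s, n, t_crit):
    """Interval for a Gaussian mean with unknown variance.

    t_crit is the table value t_{n-1, 1-alpha/2}, supplied by the caller.
    """
    half = t_crit * s / sqrt(n)
    return xbar - half, xbar + half


def rejects_null(interval, null_value):
    """Two-sided test by inversion: reject when the null value is outside."""
    low, high = interval
    return not (low <= null_value <= high)


# Illustrative inputs (assumed values chosen for the example).
mean_ci = ci_z_mean(xbar=0.11, sigma=1.1, n=500)
prop_ci = ci_proportion(k=22, n=40)
t_ci = ci_t_mean(xbar=10.0, s=2.0, n=16, t_crit=2.131)  # t_{15, 0.975} from a table
print(mean_ci, rejects_null(mean_ci, 0.0))
print(prop_ci, rejects_null(prop_ci, 0.5))
print(t_ci)
```

Returning the interval and testing by inclusion keeps the estimate and the decision in one object, which is the point this section makes about confidence intervals and tests.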

We can reject $H_0$ with $\alpha = 0.05$ if the value of the parameter under the null hypothesis is not in the $100\%(1-\alpha)$ confidence interval. In this way, a confidence interval provides a hypothesis test. A similar construction of a confidence bound can be used to carry out a one-sided test. Most importantly, the confidence interval tells us the reasonable range for the parameter that can be inferred from the data analysis. Confidence intervals report the analysis results on the physical scale on which the problem takes place. The p-value only tells us how likely the observed statistic would be under $H_0$. In this way, the hypothesis test simply provides a mechanism for making a decision. Confidence intervals are always more informative than p-values. Realizing this in the early 80s, the New England Journal of Medicine set out a recommendation obliging that all statistical results be reported in terms of confidence intervals. This is now the standard for publication of research articles in that journal. Indeed, p-values alone mean nothing!

Example (continued). Analysis of MEG Sensor Bias. To illustrate this point, we note that in this problem, the 95% confidence interval is

$$0.11 \pm 1.96\,\frac{1.1}{(500)^{1/2}} = 0.11 \pm 0.096 = [0.014,\ 0.206]$$

and we reject $H_0$ because 0 is not in the confidence interval. The p-value was $< 0.05$, which essentially tells us nothing about the magnitude of the bias in the sensor.

Example (continued). Behavioral Learning. In this problem, the 95% confidence interval, with $\hat{p}(1-\hat{p})$ replaced by its upper bound $\tfrac{1}{4}$ and $z_{0.975} \approx 2$, is

$$\hat{p} \pm 2\left[\frac{1}{4n}\right]^{1/2} = 0.55 \pm 2(0.079) = 0.55 \pm 0.158 = [0.392,\ 0.708].$$

Hence, the p-value is $> 0.05$. We fail to reject the null hypothesis. The confidence interval not only makes it clear that the null hypothesis is not rejected, it also shows what the reasonable range of uncertainty is for the animal's propensity to learn.

Remark 12.9. One of the true values of the hypothesis-testing paradigm is sample size calculations.
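As a small illustration of the sample size calculations this remark refers to, the following sketch evaluates $n = \sigma^2(z_{1-\beta} + z_{1-\alpha})^2/\Delta^2$ for a Gaussian mean with known variance, using $z_{1-\alpha/2}$ in place of $z_{1-\alpha}$ for a two-sided test. The function names and the inputs are assumptions made for the example; it uses unrounded quantiles and rounds up, so its answers can differ by one or two from hand calculations done with rounded table values.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_size(sigma, delta, alpha=0.05, power=0.80, two_sided=False):
    """Smallest n meeting the requested power for a z-test with known sigma."""
    tail = alpha / 2.0 if two_sided else alpha
    z_alpha = STD_NORMAL.inv_cdf(1.0 - tail)  # z_{1-alpha} or z_{1-alpha/2}
    z_beta = STD_NORMAL.inv_cdf(power)        # z_{1-beta}
    return ceil((sigma * (z_alpha + z_beta) / delta) ** 2)


def one_sided_power(n, sigma, delta, alpha=0.05):
    """Power of the one-sided z-test: Phi(sqrt(n) * delta / sigma + z_alpha)."""
    z_alpha = STD_NORMAL.inv_cdf(alpha)  # negative quantile, equals -z_{1-alpha}
    return STD_NORMAL.cdf(sqrt(n) * delta / sigma + z_alpha)


# Illustrative inputs (assumed): sigma = 1.1, detectable difference delta = 0.1.
n_one = sample_size(sigma=1.1, delta=0.1)
n_two = sample_size(sigma=1.1, delta=0.1, two_sided=True)
print(n_one, n_two, round(one_sided_power(n_one, 1.1, 0.1), 3))
```

Rounding up with `ceil` guarantees the achieved power is at least the requested value, which a plain rounding of the formula does not.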

IV. Two-Sample Tests

A. Two-Sample t-Test

Example 12.3. Reaction Time Analysis in a Learning Experiment. Suppose we have the reaction times on trial 50 of the 9 rats from the treatment group and 13 rats from the control group we studied in Homework Assignment 8-9. The mean reaction time for the treatment group was 15.5 sec with a standard deviation of 3.25 sec and the mean reaction time for the control group was 11.5 sec with a standard deviation of 3.1 sec. What can be said about the underlying mean differences in reaction times between the two groups?

Assume we have

$$x_i^T \sim N(\mu_T, \sigma_T^2), \quad i = 1, \ldots, n_T$$
$$x_j^c \sim N(\mu_c, \sigma_c^2), \quad j = 1, \ldots, n_c.$$

Take

$$H_0: \mu_T = \mu_c \qquad H_A: \mu_T \neq \mu_c.$$

We have

$$\bar{x}_T = 15.5 \text{ sec}, \quad s_T = 3.25 \text{ sec}, \quad \bar{x}_c = 11.5 \text{ sec}, \quad s_c = 3.1 \text{ sec}.$$

Under $H_0$ we have $E(\bar{x}_T - \bar{x}_c) = E(\bar{x}_T) - E(\bar{x}_c) = \mu_T - \mu_c = 0$, and

$$\bar{x}_T \sim N\!\left(\mu_T, \frac{\sigma_T^2}{n_T}\right), \qquad \bar{x}_c \sim N\!\left(\mu_c, \frac{\sigma_c^2}{n_c}\right)$$

where $n_T = 9$ and $n_c = 13$. Now under the independence assumption of the two samples

$$\operatorname{Var}(\bar{x}_T - \bar{x}_c) = \operatorname{Var}(\bar{x}_T) + \operatorname{Var}(\bar{x}_c) = \frac{\sigma_T^2}{n_T} + \frac{\sigma_c^2}{n_c}.$$

Hence, under $H_0$ and the assumption that $\sigma_T^2 = \sigma_c^2 = \sigma^2$,

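$$\bar{x}_T - \bar{x}_c \sim N\!\left(0,\ \sigma^2\left(\frac{1}{n_T} + \frac{1}{n_c}\right)\right).$$

If $\sigma^2$ were known, then we would have

$$\frac{\bar{x}_T - \bar{x}_c}{\left[\sigma^2\left(\frac{1}{n_T} + \frac{1}{n_c}\right)\right]^{1/2}} \sim N(0, 1)$$

and we could base our hypothesis test on this z-statistic. Since $\sigma^2$ is unknown, let us consider the estimate of $\sigma^2$ defined by

$$s^2 = \frac{(n_T - 1)s_T^2 + (n_c - 1)s_c^2}{n_T + n_c - 2}$$

where

$$s_T^2 = (n_T - 1)^{-1}\sum_{j=1}^{n_T}(x_j^T - \bar{x}_T)^2, \qquad s_c^2 = (n_c - 1)^{-1}\sum_{i=1}^{n_c}(x_i^c - \bar{x}_c)^2.$$

Notice that if we assume that $\sigma_T^2 = \sigma_c^2 = \sigma^2$ then

$$E(s^2) = \frac{(n_T - 1)E(s_T^2) + (n_c - 1)E(s_c^2)}{n_T + n_c - 2} = \frac{(n_T - 1)\sigma^2 + (n_c - 1)\sigma^2}{n_T + n_c - 2} = \sigma^2,$$

and $s^2$ is an unbiased estimate of $\sigma^2$.

Given $\alpha$ we can test $H_0$ with the following test statistic

$$t = \frac{\bar{x}_T - \bar{x}_c}{\left[s^2\left(\frac{1}{n_T} + \frac{1}{n_c}\right)\right]^{1/2}},$$

termed the two-sample t-statistic with equal variance, with $n = n_T + n_c - 2$ degrees of freedom. We reject $H_0$ at level $\alpha$ if

$$|t| > t_{n,\,1-\alpha/2}.$$

Example 12.3 (continued). For this problem we have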

$$s^2 = \frac{(n_T - 1)s_T^2 + (n_c - 1)s_c^2}{n_T + n_c - 2} = \frac{8(3.25)^2 + 12(3.1)^2}{20} = \frac{8(10.56) + 12(9.61)}{20} = \frac{199.8}{20} = 9.99 \approx 10$$

and hence,

$$t = \frac{15.5 - 11.5}{\left[10\left(\frac{1}{9} + \frac{1}{13}\right)\right]^{1/2}} = \frac{4}{(1.88)^{1/2}} = \frac{4}{1.37} = 2.92.$$

From the Table of the t-distribution we have $t_{20,\,0.975} = 2.086$. Since $|t| > t_{20,\,0.975}$ we reject $H_0$ and conclude that the mean reaction times of the two groups are different. The longer average reaction time for the treatment group suggests that learning may be impaired in that group.

B. Confidence Interval for the True Difference in the Means

Because our test statistic follows a t-distribution, we can construct a $100\%(1-\alpha)$ confidence interval for the true difference in the means as follows:

$$\bar{x}_T - \bar{x}_c \pm t_{n,\,1-\alpha/2}\; s\left(\frac{1}{n_T} + \frac{1}{n_c}\right)^{1/2}.$$

Example 12.3 (continued). Reaction Time Analysis in a Behavioral Experiment. If we apply this formula to the data from Example 12.3, we obtain with $\alpha = 0.05$ a 95% confidence interval for the true mean difference of

$$4 \pm 2.086\left[10\left(\tfrac{1}{9} + \tfrac{1}{13}\right)\right]^{1/2} = 4 \pm 2.086\,(1.37) = 4 \pm 2.86,$$

which is $[1.14,\ 6.86]$. The interval does not contain zero, as expected based on our hypothesis test. More importantly, we see that not only is it unlikely that the true mean difference is 0, but that the difference could be as small as 1.14 sec or as large as 6.86 sec.
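The pooled two-sample calculation above can be reproduced from the summary statistics alone. This is a minimal sketch: the function name `pooled_t_test` is an assumption, and the critical value $t_{20,\,0.975} = 2.086$ is taken from the t table as in the text, since the Python standard library has no t-distribution.

```python
from math import sqrt


def pooled_t_test(mean_t, sd_t, n_t, mean_c, sd_c, n_c, t_crit):
    """Equal-variance two-sample t-test from summary statistics.

    t_crit is the table value t_{n_t + n_c - 2, 1 - alpha/2}.
    Returns the t statistic, the degrees of freedom, and the confidence
    interval for the difference in means.
    """
    dof = n_t + n_c - 2
    # Pooled, unbiased estimate of the common variance.
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / dof
    std_err = sqrt(pooled_var * (1.0 / n_t + 1.0 / n_c))
    diff = mean_t - mean_c
    t_stat = diff / std_err
    interval = (diff - t_crit * std_err, diff + t_crit * std_err)
    return t_stat, dof, interval


# Summary statistics from Example 12.3 (treatment vs control reaction times).
t_stat, dof, interval = pooled_t_test(
    mean_t=15.5, sd_t=3.25, n_t=9,
    mean_c=11.5, sd_c=3.1, n_c=13,
    t_crit=2.086,
)
print(f"t = {t_stat:.2f} on {dof} df, 95% CI = [{interval[0]:.2f}, {interval[1]:.2f}]")
print("reject H0" if abs(t_stat) > 2.086 else "fail to reject H0")
```

The sketch keeps the pooled variance unrounded, so its intermediate values differ slightly from the hand calculation, which rounds $s^2$ to 10.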

Remark 12.10. If we cannot assume that the unknown variances in the two samples are equal, then we have to devise an alternative t-statistic that takes account of the unknown and estimated variances. It can be shown that an appropriate t-statistic in this case is

$$t = \frac{\bar{x}_T - \bar{x}_c}{\left[\dfrac{s_T^2}{n_T} + \dfrac{s_c^2}{n_c}\right]^{1/2}}$$

where the number of degrees of freedom is

$$d' = \frac{\left(\dfrac{s_T^2}{n_T} + \dfrac{s_c^2}{n_c}\right)^2}{\dfrac{(s_T^2/n_T)^2}{n_T - 1} + \dfrac{(s_c^2/n_c)^2}{n_c - 1}}.$$

This statistic uses the Satterthwaite approximation to the degrees of freedom for a t-statistic with unknown and unequal variance (Rosner, 2006).

C. Two-Sample Test for Binomial Proportions

Example (continued). Behavioral Learning Experiment. Let us assume that on Day 1 the rat had $k_1 = 22$ correct responses and on Day 2 we had $k_2 = 15$ correct responses. Day 1 is out of 40 trials and Day 2 is out of 20 trials. What can we say about the difference in performance between the two days?

We could treat the results from Day 1 as truth and compare Day 2 to Day 1. A more appropriate way to proceed is to treat the data from both days as if they were observed with uncertainty. For this we assume

$$x_j^1 \sim B(1, p_1), \quad j = 1, \ldots, n_1$$
$$x_j^2 \sim B(1, p_2), \quad j = 1, \ldots, n_2.$$

Take

$$H_0: p_1 = p_2 \qquad H_A: p_1 \neq p_2.$$

Our test statistic can be derived from the estimates of $p_1$ and $p_2$:

$$\hat{p}_1 = \frac{k_1}{n_1}, \qquad \hat{p}_2 = \frac{k_2}{n_2}.$$

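Under $H_0$ (so that $p_1 = p_2 = p$) and using the Gaussian approximation to the binomial,

$$\hat{p}_1 \approx N\!\left(p,\ \frac{p(1-p)}{n_1}\right), \qquad \hat{p}_2 \approx N\!\left(p,\ \frac{p(1-p)}{n_2}\right),$$

and if we assume the samples are independent we have the approximate z-statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\left[p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]^{1/2}} \approx N(0, 1)$$

where, since $p$ is unknown, we estimate it as $\hat{p}$ defined as

$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{k_1 + k_2}{n_1 + n_2}.$$

Hence, given a level $\alpha$, the approximate z-statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\left[\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]^{1/2}}.$$

We reject $H_0$ if $|z| > z_{1-\alpha/2}$, and the approximate p-value is $p = 2[1 - \Phi(|z|)]$.

An alternative form of $z$ that includes the continuity correction (Lecture 8) to make the Gaussian approximation to the binomial more accurate is defined as

$$z = \frac{|\hat{p}_1 - \hat{p}_2| - \left(\frac{1}{2n_1} + \frac{1}{2n_2}\right)}{\left[\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]^{1/2}}.$$

Example (continued). Behavioral Learning Experiment. For this problem, we have

$$\hat{p} = \frac{k_1 + k_2}{n_1 + n_2} = \frac{22 + 15}{40 + 20} = \frac{37}{60} = 0.6167, \qquad 1 - \hat{p} = 0.3833,$$

and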

$$z = \frac{0.55 - 0.75}{\left[(0.6167)(0.3833)\left(\frac{1}{40} + \frac{1}{20}\right)\right]^{1/2}} = \frac{-0.20}{[0.2364\,(0.075)]^{1/2}} = \frac{-0.20}{(0.01773)^{1/2}} = \frac{-0.20}{0.133} = -1.50.$$

Because $z_{1-\alpha/2} = z_{0.975} = 1.96$ we have $|z| < z_{0.975}$, so we fail to reject the null hypothesis of no difference in performance. Here the p-value is

$$p = 2[1 - \Phi(1.50)] = 2[0.0668] = 0.1336.$$

Similarly, the 95% confidence interval is

$$\hat{p}_1 - \hat{p}_2 \pm z_{0.975}\left[\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]^{1/2}$$

or

$$-0.20 \pm 1.96\,(0.133) = -0.20 \pm 0.26 = [-0.46,\ 0.06].$$

As expected, the 95% confidence interval includes zero, which explains why we fail to reject $H_0$. The confidence interval suggests that there may be evidence of improved performance on the second day relative to the first, since most of the interval includes negative values of the difference. The value of the continuity correction is

$$\frac{1}{2n_1} + \frac{1}{2n_2} = \frac{1}{80} + \frac{1}{40} = 0.0375.$$

This changes $|z|$ from 1.50 to $(0.20 - 0.0375)/0.133 = 1.22$, and the corrected p-value is $2[1 - \Phi(1.22)] = 0.22$.

Remark 12.11. The two-sample binomial data can also be analyzed as a contingency table:

             Correct      Incorrect                   Total
    Day 1    k_1          n_1 - k_1                   n_1
    Day 2    k_2          n_2 - k_2                   n_2
    Total    k_1 + k_2    n_1 + n_2 - (k_1 + k_2)     n_1 + n_2

A contingency table is a table consisting of two rows cross-classified by two columns. The contingency table analysis uses a test statistic based on a chi-squared distribution with one degree of freedom and gives the same result as the z-test. This is to be expected since we showed in Lecture 4 that the square of a standard Gaussian random variable is a chi-squared random variable with one degree of freedom. We will study this in Homework Assignment 9.

Remark 12.12. Use of the Gaussian approximation to the binomial to construct this z-test is valid provided $n_1 p_1 \ge 5$, $n_2 p_2 \ge 5$, $n_1(1 - p_1) \ge 5$ and $n_2(1 - p_2) \ge 5$. This corresponds to the condition

that the analysis of the contingency table using a chi-squared statistic is valid if the expected number of observations per cell is at least 5.

Remark 12.13. This problem could easily be analyzed using a Bayesian analysis in which $p_1$ and $p_2$ had uniform priors. We could then use the algorithm from Lecture 10 to compare the posterior densities of $p_1$ and $p_2$.

Remark 12.14. We can perform a likelihood analysis and compare the overlap in the likelihoods. We could alternatively construct $1 - \alpha$ confidence intervals for $p_1$ and $p_2$ separately using the likelihood theory and see if they overlap.

Remark 12.15. All the tests we have discussed here can be derived from the likelihood theory we presented by the likelihood ratio procedure. A detailed discussion of this approach is beyond the scope of this course. (See DeGroot and Schervish, 2002; Rice, 2007).

V. Summary

Hypothesis testing is a key part of classical statistics. It emphasizes procedures based primarily on the Gaussian distribution and Gaussian approximations. The hypothesis testing paradigm is very useful for prospective planning of studies using sample size formulae. Confidence intervals are always more informative than p-values. Confidence intervals report the analysis results on the physical scale on which the problem takes place. Many of the classical problems in hypothesis testing can now be carried out in a more informative way using more modern approaches such as Monte Carlo methods, bootstrapping, and Bayesian methods.

Acknowledgments

I am grateful to Julie Scott for technical assistance in preparing this lecture and to Jim Mutch for careful proofreading and comments.

References

DeGroot MH, Schervish MJ. Probability and Statistics, 3rd edition. Boston, MA: Addison Wesley, 2002.

Rice JA. Mathematical Statistics and Data Analysis, 3rd edition. Boston, MA, 2007.

Rosner B. Fundamentals of Biostatistics, 6th edition. Boston, MA: Duxbury Press, 2006.

MIT OpenCourseWare
9.07 Statistics for Brain and Cognitive Science, Fall 2016
For information about citing these materials or our Terms of Use, visit:
