Lecture 12: Hypothesis Testing

9.07 Introduction to Statistics for Brain and Cognitive Sciences
Emery N. Brown

Lecture 12: Hypothesis Testing

I. Objectives

1. Understand the hypothesis testing paradigm.
2. Understand how hypothesis testing procedures are constructed.
3. Understand how to do sample size calculations.
4. Understand the relation between hypothesis testing, confidence intervals, likelihood and Bayesian methods and their uses for inference purposes.

II. The Hypothesis Testing Paradigm and One-Sample Tests

A. One-Sample Tests

To motivate the hypothesis testing paradigm we first review two problems. In both cases there is a single sample of data.

Example 3.2 (continued). Analysis of MEG Sensor Bias. Is there a noise bias in this SQUID sensor? Here we have a sample $x_1, \ldots, x_n$ where we model each $x_i \sim N(\mu, \sigma^2)$. We assume that $\mu$ is unknown and $\sigma^2$ is known. If there is bias, then $\mu \neq 0$; $\mu > 0$ would suggest a positive bias, whereas $\mu < 0$ would suggest a negative bias.

Example 2.1 (continued). Behavioral Learning. Has the animal learned the task? We have data $x_1, \ldots, x_n$ and we recall that our model is $x_i \sim B(1, p)$, where $p$ is unknown. We define learning as performance greater than expected by chance. In particular, chance suggests $p = \tfrac{1}{2}$ (binary choice), learning suggests $p > \tfrac{1}{2}$, whereas impaired learning suggests $p < \tfrac{1}{2}$.

To define the hypothesis testing paradigm we state a series of definitions.

A hypothesis is the statement of a scientific question in terms of a proposed value of a parameter in a probability model. Hypothesis testing is a process of establishing proof by falsification. It has two essential components: a null hypothesis and an alternative hypothesis.

The null hypothesis is a stated value of the parameter which defines the hypothesis we want to falsify. It is usually stated as a single value although it can be composite. We denote it as $H_0$.

The alternative hypothesis is the hypothesis whose veracity we wish to establish. It is usually defined by a value or set of values of the parameter that are different from the one specified in the null hypothesis. We denote it as $H_A$.

To establish the result, we carry out a test in an attempt to reject the null hypothesis. The test is a procedure based on the observed data that allows us to choose between the null and alternative hypotheses.

Example 3.2 (continued). Analysis of MEG Sensor Bias. For this example the null hypothesis could be $H_0: \mu = 0$ and the alternative hypotheses could be $H_A: \mu > 0$ or $H_A: \mu < 0$.

Example 2.1 (continued). Behavioral Learning. Here the null hypothesis is $H_0: p = \tfrac{1}{2}$ and the alternative hypotheses could be one-sided, as either $H_A: p > \tfrac{1}{2}$ or $H_A: p < \tfrac{1}{2}$, or two-sided, $H_A: p \neq \tfrac{1}{2}$.

To investigate the hypothesis we require a test statistic. A test statistic is a statistic whose values will allow us to distinguish between the null and the alternative hypotheses.

Example 3.2 (continued). Analysis of MEG Sensor Bias. For this example we recall that $\bar{x} \sim N(\mu, \sigma^2/n)$. Hence, we can choose $\bar{x}$ as the test statistic because of its distributional properties. In general, choosing the statistic is an important issue. Here there are two cases:

Case i) If $\bar{x} \gg 0$ or $\bar{x} \ll 0$ we conclude $\mu \neq 0$. That is, we would be willing to conclude $\mu \neq 0$ if $|\bar{x}|$ is sufficiently large. Indeed, the larger $\bar{x}$ is in absolute value, the more likely we are to conclude $\mu \neq 0$. If our data allow us to reach this conclusion, we say we reject the null hypothesis $H_0$.

Case ii) If $\bar{x}$ is close to 0, we say we fail to reject the null hypothesis $H_0$. We do not say we accept the null hypothesis because, in this case, we do not reach a conclusion.

There are two types of errors we can commit in hypothesis testing. If we reject $H_0$ when it is true, this is an error. It is called a Type I error. The probability of the error is denoted as $\alpha$. We write

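$$\alpha = \Pr(\text{rejecting } H_0 \mid H_0 \text{ is true}).$$

In Example 3.2 we have

$$\alpha = \Pr(\bar{x} > c_\alpha \mid \mu = 0).$$

We choose $\alpha$ as small as possible. Typical values are $\alpha = 0.01$ and $\alpha = 0.05$. What is $c_\alpha$? To determine it, we compute Pr(Type I error) $= \Pr(\text{rejecting } H_0 \mid H_0 \text{ true}) = \alpha$. We want $\alpha$ to be small:

$$\alpha = \Pr(\bar{x} > c_\alpha \mid \mu = \mu_0 = 0) = \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} > \frac{n^{1/2}(c_\alpha - \mu_0)}{\sigma}\right) = \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} > z_{1-\alpha}\right),$$

where $z_{1-\alpha}$ is the $1-\alpha$ quantile of the standard Gaussian. Hence, we have

$$c_\alpha = \mu_0 + z_{1-\alpha}\,\frac{\sigma}{n^{1/2}}.$$

Some values of $z_{1-\alpha}$ for the corresponding values of $\alpha$ are

    alpha    z_{1-alpha}
    0.05     1.645
    0.025    1.96
    0.01     2.326

The area to the right of $z_{1-\alpha}$ (or of $c_\alpha$) is the critical region. It has probability content $\alpha$. The value $c_\alpha$ is the cut-off value. This test based on the normal distribution is the z-test.

If $H_0$ is not true, i.e. $H_A$ is true, and we fail to reject $H_0$, then this is also an error. It is termed a Type II error or $\beta$ error and is defined as

$$\Pr(\text{Type II error}) = \Pr(\text{fail to reject } H_0 \mid H_A \text{ is true}).$$

Assume $H_A$ is true, i.e. $\mu = \mu_A > 0$. Then we have

$$\beta = \Pr(\bar{x} < c_\alpha \mid \mu = \mu_A) = \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_A)}{\sigma} < \frac{n^{1/2}(c_\alpha - \mu_A)}{\sigma}\right) = \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_A)}{\sigma} < z_\beta\right).$$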
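The quantiles, the cut-off $c_\alpha$ and the Type II error above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the numerical setting ($\sigma = 2$, $n = 25$, $\mu_A = 1$) are hypothetical choices for illustration and do not come from the lecture.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, standard deviation 1


def z_quantile(p):
    """Return z_p, the p-th quantile of the standard Gaussian."""
    return STD_NORMAL.inv_cdf(p)


def cutoff(mu0, sigma, n, alpha):
    """Cut-off c_alpha for the one-sided test of H0: mu = mu0 against HA: mu > mu0."""
    return mu0 + z_quantile(1 - alpha) * sigma / n ** 0.5


def type_ii_error(mu0, mu_a, sigma, n, alpha):
    """beta = Pr(xbar < c_alpha | mu = mu_a) for one specific alternative mu_a > mu0."""
    c = cutoff(mu0, sigma, n, alpha)
    return STD_NORMAL.cdf(n ** 0.5 * (c - mu_a) / sigma)


# Quantiles corresponding to the small table of alpha values.
quantiles = {alpha: z_quantile(1 - alpha) for alpha in (0.05, 0.025, 0.01)}

# Hypothetical setting (an assumption for illustration only).
c_alpha = cutoff(mu0=0.0, sigma=2.0, n=25, alpha=0.05)
beta = type_ii_error(mu0=0.0, mu_a=1.0, sigma=2.0, n=25, alpha=0.05)

print(quantiles)
print(c_alpha, beta)
```

In this hypothetical setting the test would miss a true mean of 1 roughly one time in five, which is the kind of statement the Type II error is meant to quantify.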

We do not talk in terms of $\beta$ but of $1 - \beta$, which is the power of the test. The power of the test is the probability of rejecting the null hypothesis when it is false. We compute it as

$$\text{power} = 1 - \beta = \Pr(\text{rejecting } H_0 \mid H_A \text{ is true}) = \Pr(\bar{x} > c_\alpha \mid \mu = \mu_A).$$

Remark 12.1. You should not carry out a test if the power is not at least 0.80 for important $H_A$.

Remark 12.2. Never report a negative result (failing to reject $H_0$) without reporting the power.

Remark 12.3. A statistical test is an information assay. As such, it is only as useful as it is powerful against an important alternative hypothesis.

If we reject $H_0$ we report the p-value, which is the smallest value of $\alpha$ for which we can reject $H_0$. The p-value is also the probability of all events that are at least as rare as the observed statistic. That is, it is the probability, computed under the null hypothesis, of observing a test statistic at least as extreme as the one obtained; it is not the probability that the null hypothesis is true. An observed value of the test statistic has statistical significance if $p < \alpha$. Statistical significance does not imply scientific significance.

Example 3.2 (continued). Analysis of MEG Sensor Bias. In this example assume we have $n = 500$ observations $x_1, \ldots, x_n$ and the standard deviation is $\sigma = 1.1 \times 10^{-11}$ f Tesla. Suppose we wish to test $H_0: \mu = 0$ against the alternative hypothesis $H_A: \mu > 0$ with $\alpha = 0.05$. If we have $\bar{x} = 0.11 \times 10^{-11}$ f Tesla then

$$z = \frac{n^{1/2}\,\bar{x}}{\sigma} = \frac{(500)^{1/2}(0.11)}{1.1} = 2.24.$$

From the Table of a standard normal distribution we have $z_{1-\alpha} = z_{0.95} = 1.645$ for $\alpha = 0.05$, and we see $p = 0.013$. Therefore, we reject $H_0$ and conclude that there is a positive bias in the magnetic field around this recording sensor.

Example 2.1 (continued). Behavioral Learning. For this experiment our hypotheses are

$$H_0: p = \tfrac{1}{2} \qquad H_A: p > \tfrac{1}{2}.$$

We have $n = 40$ trials and observe $k = 22$, so that $\hat{p} = k/n = 22/40 = 0.55$. Based on our Central Limit Theorem results in Lecture 7, we can analyze this question using the Gaussian approximation to the binomial provided $np > 5$ and $n(1-p) > 5$. Because we have

$$n p_0 (1 - p_0) = 40 \times \tfrac{1}{4} = 10 > 5$$

we can use the Gaussian approximation to the binomial to test $H_0$. Notice that if $np(1-p) > 5$ then it must be that $np > 5$ and $n(1-p) > 5$. If we take $\alpha = 0.05$, our test statistic is

$$z = \frac{n^{1/2}(\hat{p} - p_0)}{[p_0(1-p_0)]^{1/2}} = \frac{(40)^{1/2}(0.55 - 0.5)}{(1/4)^{1/2}} = \frac{6.32\,(0.05)}{0.5} = 0.63.$$

Since $z_{1-\alpha} = 1.645$, we have $z < z_{1-\alpha}$ and we fail to reject $H_0$. We see that

$$c_\alpha = p_0 + z_{1-\alpha}\left[\frac{p_0(1-p_0)}{n}\right]^{1/2} = \frac{1}{2} + \frac{1.645}{2\,(6.32)} = 0.63.$$

What is the power of this test if the true probability of a correct response is $p_A = 0.72$?

$$\begin{aligned}
\text{power} &= \Pr(\text{rejecting } H_0 \mid p_A = 0.72) = \Pr(\hat{p} > 0.63 \mid p_A = 0.72) \\
&= \Pr\!\left(\frac{n^{1/2}(\hat{p} - p_A)}{[p_A(1-p_A)]^{1/2}} > \frac{n^{1/2}(c_\alpha - p_A)}{[p_A(1-p_A)]^{1/2}} \,\middle|\, p_A = 0.72\right) \\
&= 1 - \Phi\!\left[\frac{(40)^{1/2}(0.63 - 0.72)}{[(0.28)(0.72)]^{1/2}}\right] = 1 - \Phi\!\left[\frac{-(6.32)(0.09)}{0.449}\right] \\
&= 1 - \Phi[-1.27] = 1 - 0.10 = 0.90.
\end{aligned}$$

Therefore, if the true propensity to respond correctly were $p_A = 0.72$, then there was a probability of about 0.90 of rejecting the null hypothesis in this case.

B. One-Sample Test, Power and Sample Size Calculation

Given a null and a one-sided alternative hypothesis

$$H_0: \mu = \mu_0 \qquad H_A: \mu = \mu_A > \mu_0$$

we compute the power as

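$$\begin{aligned}
\text{power} &= \Pr(\text{rejecting } H_0 \mid H_A \text{ is true}) \\
&= \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} > z_{1-\alpha} \,\middle|\, \mu = \mu_A\right) \\
&= \Pr\!\left(\bar{x} > \mu_0 + \frac{z_{1-\alpha}\,\sigma}{n^{1/2}} \,\middle|\, \mu = \mu_A\right) \\
&= \Pr\!\left(\bar{x} - \mu_A > \mu_0 - \mu_A + \frac{z_{1-\alpha}\,\sigma}{n^{1/2}} \,\middle|\, \mu = \mu_A\right) \\
&= \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_A)}{\sigma} > \frac{n^{1/2}(\mu_0 - \mu_A)}{\sigma} + z_{1-\alpha} \,\middle|\, \mu = \mu_A\right) \\
&= 1 - \Phi\!\left(\frac{n^{1/2}(\mu_0 - \mu_A)}{\sigma} + z_{1-\alpha}\right) \\
&= \Phi\!\left(z_\alpha + \frac{n^{1/2}(\mu_A - \mu_0)}{\sigma}\right),
\end{aligned}$$

because $z_\alpha = -z_{1-\alpha}$. Similarly, if we have

$$H_0: \mu = \mu_0 \qquad H_A: \mu = \mu_A < \mu_0$$

then it is easy to show that

$$\text{power} = \Phi\!\left[\frac{n^{1/2}(\mu_0 - \mu_A)}{\sigma} + z_\alpha\right].$$

In general, for a one-sided alternative,

$$\text{power} = \Phi\!\left[\frac{n^{1/2}\,|\mu_0 - \mu_A|}{\sigma} + z_\alpha\right].$$

We use these formulae to derive expressions for sample size calculations. Notice that

$$\text{power} = 1 - \beta = \Phi\!\left(\frac{n^{1/2}\,|\mu_0 - \mu_A|}{\sigma} + z_\alpha\right).$$

If we apply $\Phi^{-1}$ to the left and right hand sides of the equation above we get

$$\Phi^{-1}(1 - \beta) = \Phi^{-1}\!\left[\Phi\!\left(\frac{n^{1/2}\,|\mu_A - \mu_0|}{\sigma} + z_\alpha\right)\right]$$

$$z_{1-\beta} = \frac{n^{1/2}\,|\mu_A - \mu_0|}{\sigma} + z_\alpha$$

$$n^{1/2} = \frac{\sigma\,(z_{1-\beta} - z_\alpha)}{|\mu_A - \mu_0|}.$$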
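The one-sided power formula can be evaluated directly. The sketch below uses only the Python standard library; the function names are hypothetical, the learning-example inputs ($n = 40$, $p_0 = 0.5$, $p_A = 0.72$) are the values as read from the lecture, and the Gaussian setting ($\sigma = 2$, $\mu_A - \mu_0 = 1$) is an assumption made purely to show how power grows with $n$.

```python
from statistics import NormalDist

_STD = NormalDist()   # standard Gaussian
PHI = _STD.cdf        # Phi(z)
Z = _STD.inv_cdf      # z_p = Phi^{-1}(p)


def power_one_sided_mean(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of the one-sided z-test for a Gaussian mean with known sigma.

    Implements power = Phi(z_alpha + sqrt(n) * |mu_a - mu0| / sigma),
    where z_alpha = -z_{1-alpha}.
    """
    return PHI(Z(alpha) + n ** 0.5 * abs(mu_a - mu0) / sigma)


def power_one_sided_proportion(p0, p_a, n, alpha=0.05):
    """Gaussian-approximation power for H0: p = p0 against HA: p = p_a > p0."""
    # Cut-off for p-hat computed under the null hypothesis.
    cutoff = p0 + Z(1 - alpha) * (p0 * (1 - p0) / n) ** 0.5
    # Probability of exceeding the cut-off when the true proportion is p_a.
    return 1 - PHI(n ** 0.5 * (cutoff - p_a) / (p_a * (1 - p_a)) ** 0.5)


# Learning example inputs as read from the lecture.
learning_power = power_one_sided_proportion(p0=0.5, p_a=0.72, n=40)

# Hypothetical Gaussian setting (not from the lecture): power versus n.
powers_by_n = {n: power_one_sided_mean(0.0, 1.0, 2.0, n) for n in (10, 25, 50)}

print(round(learning_power, 3))
print({n: round(p, 3) for n, p in powers_by_n.items()})
```

Reading the dictionary from left to right shows the point of the derivation: for a fixed alternative and fixed $\sigma$, the only lever left for raising the power is the sample size.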

Since $z_\alpha = -z_{1-\alpha}$, squaring the expression for $n^{1/2}$ gives the sample size formula

$$n = \frac{\sigma^2\,(z_{1-\beta} + z_{1-\alpha})^2}{\Delta^2},$$

where $\Delta = \mu_0 - \mu_A$.

Example 3.2 (continued). Analysis of MEG Sensor Bias. How many measurements should Steve Stufflebeam make daily to be at least 80% sure that if there is a positive drift of $0.1 \times 10^{-11}$ f Tesla he can detect it with $\alpha = 0.05$? To answer this question, we apply our sample size formula with $z_{0.95} = 1.645$, $z_{0.80} = 0.84$, $\sigma = 1.1 \times 10^{-11}$ f Tesla and we obtain

$$n = \frac{(1.1)^2 (1.645 + 0.84)^2}{(0.1)^2} = \frac{1.21\,(6.18)}{0.01} = 748.$$

Therefore, in our problem studied above, we should have taken around 750 measurements instead of 500.

Remark 12.4. The ratio $\sigma/\Delta$ is like the inverse of a signal-to-noise ratio. As we want a smaller Type I error (larger $z_{1-\alpha}$) and/or more power (larger $z_{1-\beta}$), $n$ increases. Similarly, $n$ increases with $\sigma$ and decreases with $\Delta$.

III. One-Sample Two-Sided Tests

If the alternative hypothesis is two-sided then we need to construct a two-sided test.

A. Two-Sided Tests

Example 3.2 (continued). Analysis of MEG Sensor Bias. Suppose our null and alternative hypotheses for this problem are respectively

$$H_0: \mu = 0 \qquad H_A: \mu \neq 0.$$

This alternative hypothesis implies that we would reject $H_0$ for either a positive bias or a negative bias. Under $H_0$ we have $\bar{x} \sim N(\mu_0, \sigma^2/n)$. We would therefore reject $H_0$ if $\bar{x} \gg \mu_0 = 0$ or if $\bar{x} \ll \mu_0 = 0$. We take as our test statistic $\bar{x}$ and we will reject $H_0$ if $|\bar{x}| \gg 0$. Pick $\alpha$ and take $\alpha = \alpha_1 + \alpha_2$. Since we do not favor $\mu > 0$ more than $\mu < 0$, we take $\alpha_1 = \alpha_2 = \alpha/2$. To reject $H_0$, we consider

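$$\begin{aligned}
\Pr(|\bar{x}| > c_{\alpha/2}) &= \Pr(\bar{x} > c_{\alpha/2} \text{ or } \bar{x} < -c_{\alpha/2}) \\
&= \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} > \frac{n^{1/2}(c_{\alpha/2} - \mu_0)}{\sigma} \text{ or } \frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} < \frac{n^{1/2}(-c_{\alpha/2} - \mu_0)}{\sigma}\right) \\
&= \Pr(z > z_{1-\alpha/2} \text{ or } z < -z_{1-\alpha/2}) \\
&= \Pr(|z| > z_{1-\alpha/2}).
\end{aligned}$$

This is a two-sided test because we reject $H_0$ for either very large positive or very large negative values of the test statistic. We reject $H_0$ for $|z| > z_{1-\alpha/2}$ or equivalently, with $\mu_0 = 0$, we reject $H_0$ if

$$|\bar{x}| > c_{\alpha/2} = \mu_0 + z_{1-\alpha/2}\,\frac{\sigma}{n^{1/2}}.$$

Examples of $\alpha$ and $z_{1-\alpha/2}$ are

    alpha    z_{1-alpha/2}
    0.10     1.645
    0.05     1.96
    0.01     2.58

Example 3.2 (continued). Analysis of MEG Sensor Bias. We consider

$$H_0: \mu = 0 \qquad H_A: \mu \neq 0.$$

Suppose we pick $\alpha = 0.05$, we assume $\sigma = 1.1$ and we compute $\bar{x} = 0.11$. Then we have

$$z = \frac{n^{1/2}\,\bar{x}}{\sigma} = \frac{(500)^{1/2}(0.11)}{1.1} = 2.24$$

and $z_{1-\alpha/2} = z_{0.975} = 1.96$. Because $2.24 > 1.96$, we have $|z| > z_{1-\alpha/2}$ and hence, we reject $H_0$.

Example 2.1 (continued). Behavioral Learning. We can perform a similar analysis for the learning example

$$H_0: p_0 = \tfrac{1}{2} \qquad H_A: p \neq \tfrac{1}{2}.$$

This alternative implies either impaired learning or learning and would lead us to reject if $\hat{p} \gg \tfrac{1}{2}$ or $\hat{p} \ll \tfrac{1}{2}$. We have that under the Gaussian approximation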

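$$z = \frac{n^{1/2}(\hat{p} - p_0)}{[p_0(1-p_0)]^{1/2}}.$$

Hence, given $\alpha$ we reject $H_0$ if $z > z_{1-\alpha/2}$ or $z < -z_{1-\alpha/2}$, or equivalently if

$$\hat{p} > c^{+}_{\alpha/2} \quad \text{or} \quad \hat{p} < c^{-}_{\alpha/2}, \qquad \text{where } c^{\pm}_{\alpha/2} = p_0 \pm \left[\frac{p_0(1-p_0)}{n}\right]^{1/2} z_{1-\alpha/2}.$$

Given $\alpha = 0.10$, we obtain $z_{1-\alpha/2} = z_{0.95} = 1.645$ and since $\hat{p} = k/n = 22/40$, we have

$$z = \frac{n^{1/2}(\hat{p} - p_0)}{[p_0(1-p_0)]^{1/2}} = \frac{(40)^{1/2}(0.55 - 0.5)}{(1/4)^{1/2}} = \frac{6.32\,(0.05)}{0.5} = 0.63.$$

Because $|z| < z_{0.95}$ we fail to reject $H_0$.

B. Power for the Two-Sided Alternative

It is straightforward to show that for the mean of a Gaussian distribution with known variance, if the null hypothesis is $H_0: \mu = \mu_0$ versus the two-sided alternative $H_A: \mu \neq \mu_0$, the power of the two-sided test is defined as

$$\Pr\!\left(\left|\frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma}\right| > z_{1-\alpha/2} \,\middle|\, H_A \text{ is true}\right) = \Pr\!\left(\frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} > z_{1-\alpha/2} \text{ or } \frac{n^{1/2}(\bar{x} - \mu_0)}{\sigma} < -z_{1-\alpha/2} \,\middle|\, H_A \text{ is true}\right).$$

This simplifies to

$$\text{power} = \Phi\!\left[-z_{1-\alpha/2} + \frac{n^{1/2}(\mu_0 - \mu_A)}{\sigma}\right] + \Phi\!\left[-z_{1-\alpha/2} + \frac{n^{1/2}(\mu_A - \mu_0)}{\sigma}\right].$$

The corresponding sample size formula is

$$n = \frac{\sigma^2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2},$$

where $\Delta = \mu_A - \mu_0$.

Example 3.2 (continued). Analysis of MEG Sensor Bias. If Steve wanted to worry about both positive and negative drift, then the previous sample size calculation becomes, with $z_{1-\alpha/2} = z_{0.975} = 1.96$,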

$$n = \frac{(1.1)^2 (1.96 + 0.84)^2}{(0.1)^2} = \frac{1.21\,(2.8)^2}{0.01} = \frac{1.21\,(7.84)}{0.01} = 949.$$

Remark 12.5. Notice that $(z_{0.975} + z_{0.8})^2 \approx 8$. Hence, $n \approx 8/\text{SNR}$, where $\text{SNR} = \Delta^2/\sigma^2$.

Remark 12.6. In Homework Assignment 9 we will explore a similar formula for the binomial distribution.

C. Adjustments for the Gaussian Assumptions

1. One-Sample t-Test for Unknown $\sigma^2$

The z-test allows us to test hypotheses about the mean of a Gaussian distribution under the assumption that the variance is known. The t-test allows us to test the same hypotheses when the sample size is not large and the variance is unknown and must be estimated from the sample. The t-test was developed by Gosset in 1908 while he was working in the Guinness Brewery. Gosset wrote under the pseudonym of Student. For this reason it is still referred to as Student's t-test. The distribution was worked out later by R.A. Fisher.

Suppose $x_1, \ldots, x_n$ is a random sample from a Gaussian probability model $N(\mu, \sigma^2)$ and we wish to test the null hypothesis $H_0: \mu = \mu_0$ against the alternative $H_A: \mu \neq \mu_0$. Assume $\sigma^2$ is not known and $n$ is not large, say $15 < n < 20$. Therefore, as discussed in Lecture 8, we construct a t-test by estimating $\sigma^2$ with an unbiased estimate, and instead of a z-statistic we construct a t-statistic as

$$t = \frac{n^{1/2}(\bar{x} - \mu_0)}{s}, \qquad \text{where } s^2 = (n-1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2,$$

and $t \sim t_{n-1}$, a t-distribution on $n - 1$ degrees of freedom. Recall that we showed in the practice problems for the second In-Class Examination that $s^2$ is an unbiased estimate of $\sigma^2$. Given $\alpha$, to test $H_0: \mu = \mu_0$ against $H_A: \mu \neq \mu_0$ we reject $H_0$ for $|t| > t_{n-1,\,1-\alpha/2}$.
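As a small illustration of the t-statistic just defined, the sketch below computes $t$ and its degrees of freedom from raw data using only the Python standard library. The sample values and the function name are hypothetical; the critical value $t_{7,\,0.975} = 2.365$ is taken from a standard t table, since the standard library has no t-distribution quantile function.

```python
import math
from statistics import mean, stdev


def one_sample_t(xs, mu0):
    """Return (t, degrees of freedom) for testing H0: mu = mu0.

    statistics.stdev divides by n - 1, which matches the unbiased
    variance estimate s^2 used in the t-statistic.
    """
    n = len(xs)
    xbar = mean(xs)
    s = stdev(xs)
    t = math.sqrt(n) * (xbar - mu0) / s
    return t, n - 1


# Hypothetical reaction times in seconds (not data from the lecture).
sample = [9.1, 7.8, 8.4, 10.2, 6.9, 8.8, 7.5, 9.6]

t_stat, df = one_sample_t(sample, mu0=10.0)

# Two-sided test at alpha = 0.05: compare |t| with t_{df, 0.975} from a table.
T_CRIT_7_DF = 2.365
reject_h0 = abs(t_stat) > T_CRIT_7_DF

print(t_stat, df, reject_h0)
```

Passing the critical value in from a table mirrors how the lecture itself proceeds; a library with t quantiles could replace the constant if one is available.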

Equivalently, we reject $H_0$ if either

$$\bar{x} > \mu_0 + t_{n-1,\,1-\alpha/2}\,\frac{s}{n^{1/2}} \quad \text{or} \quad \bar{x} < \mu_0 - t_{n-1,\,1-\alpha/2}\,\frac{s}{n^{1/2}}.$$

Example 12.1. Reaction Time Measurements. In a learning experiment, along with the correct and incorrect responses, we record the reaction times, which are the times it takes the animal to execute the task. In a previous study, once it had been determined that an animal had learned a task, it was found that the average reaction time was 10 seconds. On the 14 trials after the animal learned the task by the behavioral criteria, the average reaction time was 8.25 seconds. The sample standard deviation was $s$ = [value illegible in the source]. Is this animal's performance different from that previously reported? We have

$$H_0: \mu = 10.0 \qquad H_A: \mu \neq 10.0$$

$$t = \frac{n^{1/2}(\bar{x} - \mu)}{s} = \frac{(14)^{1/2}(8.25 - 10)}{s} = \frac{(3.74)(-1.75)}{s}.$$

Now it follows from the Table of the t-distribution that $t_{13,\,0.975} = 2.16$. Because $|t| > t_{13,\,0.975}$ we reject $H_0$.

2. Binomial Exact Method

If $n p_0(1 - p_0) < 5$, we cannot use the Gaussian approximation to the binomial to test hypotheses about the binomial proportion. In this case, we base the test on the exact binomial probabilities. We have $x \sim B(n, p)$, we observe $k$ successes, and we take $\hat{p} = k/n$. The p-value depends on whether $\hat{p} \le p_0$ or $\hat{p} > p_0$. If $\hat{p} \le p_0$, then

$$\frac{p}{2} = \Pr(\le k \text{ successes in } n \text{ trials} \mid H_0) = \sum_{j=0}^{k} \binom{n}{j} p_0^{\,j} (1 - p_0)^{n-j}.$$

If $\hat{p} > p_0$, then

$$\frac{p}{2} = \Pr(\ge k \text{ successes in } n \text{ trials} \mid H_0) = \sum_{j=k}^{n} \binom{n}{j} p_0^{\,j} (1 - p_0)^{n-j}.$$

Example 12.2. A New Learning Experiment. Assume that we execute the learning experiment with 20 trials and there is a $\tfrac{1}{3}$ probability of a correct response by chance. Suppose $k = 12$ and $\hat{p} = \tfrac{12}{20} = \tfrac{3}{5}$. We want to test $H_0: p = \tfrac{1}{3}$ against $H_A: p \neq \tfrac{1}{3}$. We see that

$$n p_0 (1 - p_0) = 20 \cdot \tfrac{1}{3} \cdot \tfrac{2}{3} = \tfrac{40}{9} = 4\tfrac{4}{9} < 5.$$

We have $\tfrac{12}{20} > \tfrac{1}{3}$ and hence $\hat{p} > p_0$, and we compute the p-value as

$$\frac{p}{2} = \Pr(x \ge 12) = \sum_{j=12}^{20} \binom{20}{j} \left(\tfrac{1}{3}\right)^{j} \left(\tfrac{2}{3}\right)^{20-j} = 0.0130,$$

or equivalently $p = 2(0.0130) = 0.0259$. Therefore, we reject $H_0$ and conclude that the animal is not performing at chance and most likely has learned.

Remark 12.7. An important topic that we have not considered is non-parametric tests. Each of the main parametric tests we considered, i.e., the z-test and the t-test, has a non-parametric analog. It is important to use these non-parametric tests when the sample size is small and the Gaussian assumption on which most of the tests are based is not valid (Rosner, 2006).

Remark 12.8. The requirement of the Gaussian assumption for most of the standard hypothesis testing paradigms is very limiting. For this reason, we have spent more time in the course learning about principles of modern statistical analysis such as likelihood methods, the bootstrap and other Monte Carlo methods, Bayesian methods and, as we shall see in Lecture 16, the generalized linear model.

D. The Relation Among Confidence Intervals, Hypothesis Tests and p-Values

The three statistics we have considered thus far for two-sided hypothesis tests can be used to construct $100(1-\alpha)\%$ confidence intervals. These are

z-test (Gaussian mean and variance known): $\bar{x} \pm z_{1-\alpha/2}\,\dfrac{\sigma}{n^{1/2}}$

z-test (binomial proportion): $\hat{p} \pm \left[\dfrac{\hat{p}(1-\hat{p})}{n}\right]^{1/2} z_{1-\alpha/2}$

t-test (Gaussian mean and variance unknown): $\bar{x} \pm t_{n-1,\,1-\alpha/2}\,\dfrac{s}{n^{1/2}}$
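Two calculations from this part of the lecture are easy to check numerically: the exact binomial p-value of Example 12.2 and the three confidence intervals just listed. The sketch below uses only the Python standard library. The function names are hypothetical, capping the doubled tail at 1 is a convenience added here rather than something the lecture states, and the t critical value must be supplied from a table. The inputs are the summary values as read from the lecture ($n = 500$, $\bar{x} = 0.11$, $\sigma = 1.1$ for the MEG example; $\hat{p} = 0.55$, $n = 40$ for the learning example).

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard Gaussian quantile function


def exact_binomial_p_value(k, n, p0):
    """Two-sided exact p-value: twice the binomial tail on the side where k/n falls."""
    def pmf(j):
        return comb(n, j) * p0 ** j * (1 - p0) ** (n - j)

    if k / n <= p0:
        tail = sum(pmf(j) for j in range(0, k + 1))   # Pr(at most k successes)
    else:
        tail = sum(pmf(j) for j in range(k, n + 1))   # Pr(at least k successes)
    return min(1.0, 2 * tail)  # cap at 1: an assumption added for this sketch


def ci_mean_known_sigma(xbar, sigma, n, alpha=0.05):
    """z interval for a Gaussian mean with known sigma."""
    half = Z(1 - alpha / 2) * sigma / sqrt(n)
    return xbar - half, xbar + half


def ci_proportion(p_hat, n, alpha=0.05):
    """z interval for a binomial proportion, using p_hat in the standard error."""
    half = Z(1 - alpha / 2) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def ci_mean_t(xbar, s, n, t_crit):
    """t interval; t_crit = t_{n-1, 1-alpha/2} is looked up in a table by the caller."""
    half = t_crit * s / sqrt(n)
    return xbar - half, xbar + half


p_exact = exact_binomial_p_value(k=12, n=20, p0=1 / 3)
meg_ci = ci_mean_known_sigma(xbar=0.11, sigma=1.1, n=500)
learning_ci = ci_proportion(p_hat=0.55, n=40)
# Hypothetical t example: xbar = 10, s = 2, n = 16, t_{15, 0.975} = 2.131.
t_ci = ci_mean_t(xbar=10.0, s=2.0, n=16, t_crit=2.131)

print(p_exact, meg_ci, learning_ci, t_ci)
```

Note that `ci_proportion` follows the listed formula and plugs $\hat{p}$ into the standard error; using $p_0(1 - p_0) = 1/4$ instead changes the learning interval only in the third decimal place.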

We can reject $H_0$ at level $\alpha$ (for example, $\alpha = 0.05$) if the value of the parameter under the null hypothesis is not in the $100(1-\alpha)\%$ confidence interval. In this way, a confidence interval provides a hypothesis test. A similar construction of a confidence bound can be used to carry out a one-sided test. Most importantly, the confidence interval tells us the reasonable range for the parameter that can be inferred from the data analysis. Confidence intervals report the analysis results on the physical scale on which the problem takes place. The p-value only tells us how likely the observed statistic would be under $H_0$. In this way, the hypothesis test simply provides a mechanism for making a decision. Confidence intervals are always more informative than p-values. Realizing this in the early 80s, the New England Journal of Medicine set out a recommendation obliging that all statistical results be reported in terms of confidence intervals. This is now the standard for publication of research articles in that journal. Indeed, p-values alone mean nothing!

Example 3.2 (continued). Analysis of MEG Sensor Bias. To illustrate this point, we note that in this problem, the 95% confidence interval is

$$0.11 \pm 1.96\,\frac{1.1}{(500)^{1/2}} = 0.11 \pm 0.096 = [0.014,\ 0.206]$$

and we reject $H_0$ because 0 is not in the confidence interval. The p-value was $< 0.05$, which essentially tells us nothing about the magnitude of the bias in the sensor.

Example 2.1 (continued). Behavioral Learning. In this problem, the 95% confidence interval is

$$\hat{p} \pm 1.96\left[\frac{1/4}{40}\right]^{1/2} = 0.55 \pm \frac{1.96}{2\,(6.32)} = 0.55 \pm 0.155 = [0.395,\ 0.705].$$

Because $p_0 = \tfrac{1}{2}$ lies inside this interval, the p-value is $> 0.05$ and we fail to reject the null hypothesis. The confidence interval not only makes it clear that the null hypothesis is not rejected, it also shows what the reasonable range of uncertainty is for the animal's propensity to learn.

Remark 12.9. One of the true values of the hypothesis-testing paradigm is sample size calculations.
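As a reminder of what such a calculation involves, here is a minimal sketch of the one-sided and two-sided sample size formulas from Sections II.B and III.B, together with the two-sided power formula, using only the Python standard library. The function names are hypothetical. The inputs $\sigma = 1.1$ and $\Delta = 0.1$ are the MEG values as read from the lecture; their common power-of-ten scale factor cancels in the ratio and is omitted.

```python
from statistics import NormalDist

_STD = NormalDist()
PHI = _STD.cdf      # Phi(z)
Z = _STD.inv_cdf    # z_p = Phi^{-1}(p)


def sample_size(sigma, delta, alpha=0.05, power=0.80, two_sided=False):
    """n = sigma^2 (z_{1-beta} + z_crit)^2 / delta^2 for a Gaussian mean, known sigma.

    z_crit is z_{1-alpha} for a one-sided test and z_{1-alpha/2} for a two-sided
    test. The result is real-valued; in practice it would be rounded up.
    """
    z_crit = Z(1 - alpha / 2) if two_sided else Z(1 - alpha)
    z_power = Z(power)  # z_{1-beta}
    return sigma ** 2 * (z_crit + z_power) ** 2 / delta ** 2


def power_two_sided(sigma, delta, n, alpha=0.05):
    """Two-sided power: Phi(-z + sqrt(n) delta / sigma) + Phi(-z - sqrt(n) delta / sigma)."""
    z_crit = Z(1 - alpha / 2)
    shift = n ** 0.5 * delta / sigma
    return PHI(-z_crit + shift) + PHI(-z_crit - shift)


n_one_sided = sample_size(sigma=1.1, delta=0.1)
n_two_sided = sample_size(sigma=1.1, delta=0.1, two_sided=True)

# Sanity check: with n rounded up to 950 the two-sided power should be about 0.80.
achieved_power = power_two_sided(sigma=1.1, delta=0.1, n=950)

print(n_one_sided, n_two_sided, achieved_power)
```

With unrounded quantiles the results come out a fraction above the lecture's 748 and 949, which were computed with table values rounded to two or three decimals.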

IV. Two-Sample Tests

A. Two-Sample t-Test

Example 12.3. Reaction Time Analysis in a Learning Experiment. Suppose we have the reaction times on trial 50 of the 9 rats from the treatment group and 13 rats from the control group we studied in Homework Assignment 8-9. The mean reaction time for the treatment group was 15.5 sec with a standard deviation of 3.25 sec and the mean reaction time for the control group was 11.5 sec with a standard deviation of 3.1 sec. What can be said about the underlying mean differences in reaction times between the two groups? Assume we have

$$x_i^{T} \sim N(\mu_T, \sigma_T^2), \quad i = 1, \ldots, n_T \qquad\qquad x_j^{c} \sim N(\mu_c, \sigma_c^2), \quad j = 1, \ldots, n_c.$$

Take

$$H_0: \mu_T = \mu_c \qquad H_A: \mu_T \neq \mu_c.$$

We have

$$\bar{x}_T = 15.5 \text{ sec}, \quad s_T = 3.25 \text{ sec}, \qquad \bar{x}_c = 11.5 \text{ sec}, \quad s_c = 3.1 \text{ sec}.$$

Under $H_0$ we have $E(\bar{x}_T - \bar{x}_c) = E(\bar{x}_T) - E(\bar{x}_c) = \mu_T - \mu_c = 0$ and hence,

$$\bar{x}_T \sim N\!\left(\mu_T, \frac{\sigma_T^2}{n_T}\right) \qquad \bar{x}_c \sim N\!\left(\mu_c, \frac{\sigma_c^2}{n_c}\right),$$

where $n_T = 9$ and $n_c = 13$. Now under the independence assumption of the two samples

$$\operatorname{Var}(\bar{x}_T - \bar{x}_c) = \operatorname{Var}(\bar{x}_T) + \operatorname{Var}(\bar{x}_c) = \frac{\sigma_T^2}{n_T} + \frac{\sigma_c^2}{n_c}.$$

Hence, under $H_0$ and the assumption that $\sigma_T^2 = \sigma_c^2 = \sigma^2$,

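$$\bar{x}_T - \bar{x}_c \sim N\!\left(0,\ \sigma^2\left(\frac{1}{n_T} + \frac{1}{n_c}\right)\right).$$

If $\sigma^2$ were known, then we would have

$$\frac{\bar{x}_T - \bar{x}_c}{\left[\sigma^2\left(\frac{1}{n_T} + \frac{1}{n_c}\right)\right]^{1/2}} \sim N(0, 1)$$

and we could base our hypothesis test on this z-statistic. Since $\sigma^2$ is unknown, let us consider the estimate of $\sigma^2$ defined by

$$s^2 = \frac{(n_T - 1)\,s_T^2 + (n_c - 1)\,s_c^2}{n_T + n_c - 2},$$

where

$$s_T^2 = (n_T - 1)^{-1} \sum_{j=1}^{n_T} (x_j^{T} - \bar{x}_T)^2 \qquad s_c^2 = (n_c - 1)^{-1} \sum_{i=1}^{n_c} (x_i^{c} - \bar{x}_c)^2.$$

Notice that if we assume that $\sigma_T^2 = \sigma_c^2 = \sigma^2$ then

$$E(s^2) = \frac{(n_T - 1)\,E(s_T^2) + (n_c - 1)\,E(s_c^2)}{n_T + n_c - 2} = \frac{(n_T - 1)\,\sigma^2 + (n_c - 1)\,\sigma^2}{n_T + n_c - 2} = \sigma^2,$$

and $s^2$ is an unbiased estimate of $\sigma^2$.

Given $\alpha$ we can test $H_0$ with the following test statistic

$$t = \frac{\bar{x}_T - \bar{x}_c}{\left[s^2\left(\frac{1}{n_T} + \frac{1}{n_c}\right)\right]^{1/2}},$$

termed the two-sample t-statistic with equal variance, with $n = n_T + n_c - 2$ degrees of freedom. We reject $H_0$ at level $\alpha$ if $|t| > t_{n,\,1-\alpha/2}$.

Example 12.3 (continued). For this problem we have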

$$s^2 = \frac{(n_T - 1)\,s_T^2 + (n_c - 1)\,s_c^2}{n_T + n_c - 2} = \frac{8\,(3.25)^2 + 12\,(3.1)^2}{20} = \frac{8\,(10.56) + 12\,(9.61)}{20} = \frac{84.48 + 115.32}{20} = \frac{199.8}{20} = 9.99$$

and hence, taking $s^2 \approx 10$,

$$t = \frac{15.5 - 11.5}{\left[10\left(\frac{1}{9} + \frac{1}{13}\right)\right]^{1/2}} = \frac{4}{(1.88)^{1/2}} = 2.92.$$

From the Table of the t-distribution we have $t_{20,\,0.975} = 2.086$. Since $t > t_{20,\,0.975}$ we reject $H_0$ and conclude that the mean reaction times of the two groups are different. The longer average reaction time for the treatment group suggests that learning may be impaired in that group.

B. Confidence Interval for the True Difference in the Means

Because our test statistic follows a t-distribution, we can construct a $100(1-\alpha)\%$ confidence interval for the true difference in the means as follows:

$$\bar{x}_T - \bar{x}_c \pm t_{n,\,1-\alpha/2}\; s\left(\frac{1}{n_T} + \frac{1}{n_c}\right)^{1/2}.$$

Example 12.3 (continued). Reaction Time Analysis in a Behavioral Experiment. If we apply this formula to the data from Example 12.3, we obtain with $\alpha = 0.05$ a 95% confidence interval for the true mean difference of

$$4 \pm 2.086\left[10\left(\frac{1}{9} + \frac{1}{13}\right)\right]^{1/2} = 4 \pm 2.086\,(1.37) = 4 \pm 2.86,$$

which is $[1.14,\ 6.86]$. The interval does not contain zero, as expected based on our hypothesis test. More importantly, we see that not only is it unlikely that the true mean difference is 0, but that the difference could be as small as 1.14 sec or as large as 6.86 sec.
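The pooled two-sample calculation above can be reproduced from the summary statistics alone. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, the summary values are those read from Example 12.3 (only the difference of the two means, 4 sec, enters the result), and the critical value $t_{20,\,0.975} = 2.086$ is the table value quoted in the text.

```python
from math import sqrt


def pooled_two_sample_t(mean_t, s_t, n_t, mean_c, s_c, n_c):
    """Equal-variance two-sample t-statistic computed from summary statistics.

    Returns (t, degrees of freedom, standard error of the mean difference).
    """
    df = n_t + n_c - 2
    # Pooled variance: weighted average of the two sample variances.
    s2 = ((n_t - 1) * s_t ** 2 + (n_c - 1) * s_c ** 2) / df
    se = sqrt(s2 * (1 / n_t + 1 / n_c))
    t = (mean_t - mean_c) / se
    return t, df, se


t_stat, df, se = pooled_two_sample_t(
    mean_t=15.5, s_t=3.25, n_t=9,    # treatment group
    mean_c=11.5, s_c=3.1, n_c=13,    # control group
)

T_CRIT_20_DF = 2.086                 # t_{20, 0.975}, from a t table
reject_h0 = abs(t_stat) > T_CRIT_20_DF

diff = 15.5 - 11.5
ci_95 = (diff - T_CRIT_20_DF * se, diff + T_CRIT_20_DF * se)

print(t_stat, df, reject_h0, ci_95)
```

Returning the standard error alongside $t$ lets the same call serve both the test and the confidence interval, which is the duality the lecture emphasizes.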

Remark 12.10. If we cannot assume that the unknown variances in the two samples are equal, then we have to devise an alternative t-statistic that takes account of the unknown and estimated variances. It can be shown that an appropriate t-statistic in this case is

$$t = \frac{\bar{x}_T - \bar{x}_c}{\left[\dfrac{s_T^2}{n_T} + \dfrac{s_c^2}{n_c}\right]^{1/2}},$$

where the number of degrees of freedom is

$$d' = \frac{\left(\dfrac{s_T^2}{n_T} + \dfrac{s_c^2}{n_c}\right)^2}{\dfrac{(s_T^2/n_T)^2}{n_T - 1} + \dfrac{(s_c^2/n_c)^2}{n_c - 1}}.$$

This statistic is the Satterthwaite approximation to the degrees of freedom for a t-statistic with unknown and unequal variance (Rosner, 2006).

C. Two-Sample Test for Binomial Proportions

Example 2.1 (continued). Behavioral Learning Experiment. Let us assume that on Day 1 the rat had $k_1 = 22$ correct responses and on Day 2 we had $k_2 = 15$ correct responses. Day 1 is out of 40 trials and Day 2 is out of 20 trials. What can we say about the difference in performance between the two days?

We could treat the results from Day 1 as truth and compare Day 2 to Day 1. A more appropriate way to proceed is to treat the data from both days as if they were observed with uncertainty. For this we assume

$$x_j^{1} \sim B(1, p_1), \quad j = 1, \ldots, n_1 \qquad\qquad x_j^{2} \sim B(1, p_2), \quad j = 1, \ldots, n_2.$$

Take

$$H_0: p_1 = p_2 \qquad H_A: p_1 \neq p_2.$$

Our test statistic can be derived from the estimates of $p_1$ and $p_2$:

$$\hat{p}_1 = \frac{k_1}{n_1} \qquad \hat{p}_2 = \frac{k_2}{n_2}.$$

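Under $H_0$ and using the Gaussian approximation to the binomial,

$$\hat{p}_1 \approx N\!\left(p, \frac{p(1-p)}{n_1}\right) \qquad \hat{p}_2 \approx N\!\left(p, \frac{p(1-p)}{n_2}\right),$$

and if we assume the samples are independent we have the approximate z-statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\left[p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]^{1/2}} \approx N(0, 1),$$

where, since $p$ is unknown, we estimate it as $\hat{p}$ defined as

$$\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{k_1 + k_2}{n_1 + n_2}.$$

Hence, given a level $\alpha$, the approximate z-statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\left[\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]^{1/2}}.$$

We reject $H_0$ if $|z| > z_{1-\alpha/2}$, and the approximate p-value is $p = 2[1 - \Phi(|z|)]$.

An alternative form of $z$ that includes the continuity correction (Lecture 8) to make the Gaussian approximation to the binomial more accurate is defined as

$$z = \frac{|\hat{p}_1 - \hat{p}_2| - \left(\dfrac{1}{2n_1} + \dfrac{1}{2n_2}\right)}{\left[\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]^{1/2}}.$$

Example 2.1 (continued). Behavioral Learning Experiment. For this problem, we have

$$\hat{p} = \frac{k_1 + k_2}{n_1 + n_2} = \frac{22 + 15}{60} = \frac{37}{60} = 0.6167 \qquad \text{and} \qquad 1 - \hat{p} = 0.3833.$$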
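The pooled z-statistic and its continuity-corrected variant defined above can be sketched as follows, using only the Python standard library. The function name and the `continuity` flag are hypothetical; the counts are the Day 1 and Day 2 values as read from the example (22 of 40 and 15 of 20). In this sketch the correction shrinks the difference toward zero and keeps its sign, so the corrected statistic stays comparable with the uncorrected one.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard Gaussian cumulative distribution function


def two_proportion_z(k1, n1, k2, n2, continuity=False):
    """Approximate z-test of H0: p1 = p2 using the pooled estimate of p.

    Returns (z, two-sided p-value).
    """
    p1, p2 = k1 / n1, k2 / n2
    p_pool = (k1 + k2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    diff = p1 - p2
    if continuity:
        correction = 1 / (2 * n1) + 1 / (2 * n2)
        # Shrink |diff| toward zero by the correction, never past zero.
        shrunk = max(abs(diff) - correction, 0.0)
        diff = shrunk if diff >= 0 else -shrunk
    z = diff / se
    p_value = 2 * (1 - PHI(abs(z)))
    return z, p_value


z_plain, p_plain = two_proportion_z(22, 40, 15, 20)
z_corr, p_corr = two_proportion_z(22, 40, 15, 20, continuity=True)

print(z_plain, p_plain)
print(z_corr, p_corr)
```

Both versions lead to the same decision at $\alpha = 0.05$ for these counts; the correction simply makes the approximate test somewhat more conservative.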

The test statistic is

$$z = \frac{0.55 - 0.75}{\left[(0.6167)(0.3833)\left(\frac{1}{40} + \frac{1}{20}\right)\right]^{1/2}} = \frac{-0.20}{\left[0.2364 \cdot \frac{3}{40}\right]^{1/2}} = \frac{-0.20}{(0.01773)^{1/2}} = \frac{-0.20}{0.133} = -1.50.$$

Because $z_{1-\alpha/2} = z_{0.975} = 1.96$ we have $|z| < z_{0.975}$, so we fail to reject the null hypothesis of no difference in performance. Here the p-value is

$$p = 2[1 - \Phi(1.50)] = 2[0.0668] = 0.1336.$$

Similarly, the 95% confidence interval is

$$\hat{p}_1 - \hat{p}_2 \pm z_{0.975}\left[\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]^{1/2}$$

or

$$-0.20 \pm 1.96\,(0.133) = -0.20 \pm 0.26 = [-0.46,\ 0.06].$$

As expected, the 95% confidence interval includes zero, which explains why we fail to reject $H_0$. The confidence interval suggests that there may be evidence of improved performance on the second day relative to the first, since most of the interval includes negative values of the difference. The value of the continuity correction is $\frac{1}{2n_1} + \frac{1}{2n_2} = \frac{1}{80} + \frac{1}{40} = 0.0375$. This reduces the magnitude of the difference from 0.20 to 0.1625, which changes $z$ from $-1.50$ to $z = -0.1625/0.133 = -1.22$, and the corrected p-value is 0.22.

Remark 12.11. The two-sample binomial data can also be analyzed as a contingency table

              Correct      Incorrect                Total
    Day 1     k_1          n_1 - k_1                n_1
    Day 2     k_2          n_2 - k_2                n_2
    Total     k_1 + k_2    n_1 + n_2 - (k_1 + k_2)  n_1 + n_2

A contingency table is a table consisting of two rows cross-classified by two columns. The contingency table analysis uses a test statistic based on a chi-squared distribution with one degree of freedom and gives the same result as the z-test. This is to be expected since we showed in Lecture 4 that the square of a standard Gaussian random variable is a chi-squared random variable with one degree of freedom. We will study this in Homework Assignment 9.

Remark 12.12. Use of the Gaussian approximation to the binomial to construct this z-test is valid provided $n_1 p_1 \ge 5$, $n_2 p_2 \ge 5$, $n_1(1 - p_1) \ge 5$ and $n_2(1 - p_2) \ge 5$. This corresponds to the condition

that the analysis of the contingency table using a chi-squared statistic is valid if the expected number of observations per cell is at least 5.

Remark 12.13. This problem could easily be analyzed using a Bayesian analysis in which $p_1$ and $p_2$ had uniform priors. We could then use Algorithm 10.1 to compare the posterior densities of $p_1$ and $p_2$.

Remark 12.14. We can perform a likelihood analysis and compare the overlap in the likelihoods. We could alternatively construct $1 - \alpha$ confidence intervals for $p_1$ and $p_2$ separately using the likelihood theory and see if they overlap.

Remark 12.15. All the tests we have discussed here can be derived from the likelihood theory we presented by the likelihood ratio procedure. A detailed discussion of this approach is beyond the scope of this course. (See DeGroot and Schervish, 2002; Rice, 2007.)

V. Summary

Hypothesis testing is a key part of classical statistics. It emphasizes procedures based primarily on the Gaussian distribution and Gaussian approximations. The hypothesis testing paradigm is very useful for prospective planning of studies using sample size formulae. Confidence intervals are always more informative than p-values. Confidence intervals report the analysis results on the physical scale on which the problem takes place. Many of the classical problems in hypothesis testing can now be carried out in a more informative way using more modern approaches such as Monte Carlo methods, bootstrapping, and Bayesian methods.

Acknowledgments

I am grateful to Julie Scott for technical assistance in preparing this lecture and to Jim Mutch for careful proofreading and comments.

References

DeGroot MH, Schervish MJ. Probability and Statistics, 3rd edition. Boston, MA: Addison Wesley, 2002.

Rice JA. Mathematical Statistics and Data Analysis, 3rd edition. Boston, MA, 2007.

Rosner B. Fundamentals of Biostatistics, 6th edition. Boston, MA: Duxbury Press, 2006.

MIT OpenCourseWare
https://ocw.mit.edu

9.07 Statistics for Brain and Cognitive Science
Fall 2016

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.