Null Hypothesis Significance Testing
p-values, significance level, power, t-tests
18.05 Spring 2017

Understand this figure
[Figure: the null pdf f(x | H0) (green curve) over an x-axis split into reject H0 / don't reject H0 / reject H0 regions, with the rejection region shaded red.]
x = test statistic
f(x | H0) = pdf of the null distribution = green curve
The rejection region is a portion of the x-axis.
Significance = probability over the rejection region = red area.
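A quick numerical illustration (a sketch in R, not from the slides): for a N(0,1) null distribution with two-sided rejection region |x| ≥ 1.96, the significance is the probability the null distribution puts on that region.

    # Sketch (assumed N(0,1) null, two-sided rejection region |x| >= 1.96):
    # significance = probability of the rejection region under H0.
    alpha <- pnorm(-1.96) + (1 - pnorm(1.96))
    alpha   # approximately 0.05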

Simple and composite hypotheses
Simple hypothesis: the sampling distribution is fully specified. Usually the parameter of interest has a specific value.
Composite hypothesis: the sampling distribution is not fully specified. Usually the parameter of interest has a range of values.
Example. A coin has probability θ of heads. Toss it 30 times and let x be the number of heads.
(i) H: θ = 0.4 is simple. x ~ binomial(30, 0.4).
(ii) H: θ > 0.4 is composite. x ~ binomial(30, θ) depends on which value of θ is chosen.
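A small R sketch (not part of the slides) of what "fully specified" means here: under the simple hypothesis the entire null pmf can be tabulated, while the composite hypothesis gives a different table for each admissible θ.

    # Simple hypothesis θ = 0.4: the null distribution of x is fully specified.
    x <- 0:30
    p_null <- dbinom(x, size = 30, prob = 0.4)
    round(p_null, 3)
    # Composite hypothesis θ > 0.4: one such table for every admissible θ,
    # e.g. dbinom(x, 30, 0.5), dbinom(x, 30, 0.6), ...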

Extreme data and p-values
Hypotheses: H0, HA.
Test statistic: x, computed from the data.
Null distribution: f(x | H0) (assumes the null hypothesis is true).
Sides: HA determines whether the rejection region is one- or two-sided.
Rejection region/significance: P(x in rejection region | H0) = α.
The p-value is a tool to check if the test statistic is in the rejection region. It is also a measure of the evidence for rejecting H0.
p-value: p = P(data at least as extreme as x | H0).
"Data at least as extreme" is defined by the sidedness of the rejection region.

Extreme data and p-values
Example. Suppose we have the right-sided rejection region shown below. Also suppose we see data with test statistic x = 4.2. Should we reject H0?
[Figure: null pdf f(x | H0); the rejection region lies to the right of the critical value c_α, and x = 4.2 is inside it.]
Answer: The test statistic is in the rejection region, so reject H0.
Alternatively: blue area < red area.
Significance: α = P(x in rejection region | H0) = red area.
p-value: p = P(data at least as extreme as x | H0) = blue area.
Since p < α we reject H0.

Extreme data and p-values
Example. Now suppose x = 2.1 as shown. Should we reject H0?
[Figure: null pdf f(x | H0); the rejection region lies to the right of the critical value c_α, and x = 2.1 is to its left, outside the rejection region.]
Answer: The test statistic is not in the rejection region: don't reject H0.
Alternatively: blue area > red area.
Significance: α = P(x in rejection region | H0) = red area.
p-value: p = P(data at least as extreme as x | H0) = blue area.
Since p > α we don't reject H0.

Critical values
The boundaries of the rejection region are called critical values. Critical values are labeled by the probability to their right. They are complementary to quantiles: c_0.1 = q_0.9.
Example: for a standard normal distribution, c_0.025 = 1.96 and c_0.975 = −1.96.
In R, for a standard normal distribution, c_0.025 = qnorm(0.975).
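A small R check (not from the slides): critical values are just quantiles with the label flipped, c_p = q_(1−p).

    qnorm(0.975)   # c_0.025 =  1.96
    qnorm(0.025)   # c_0.975 = -1.96
    qnorm(0.9)     # c_0.1 = q_0.9, approximately 1.28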

Two-sided p-values
These are trickier: what does "at least as extreme" mean in this case? Remember, the p-value is a trick for deciding if the test statistic is in the rejection region. If the significance (rejection) probability is split evenly between the left and right tails, then
p = 2 · min(left tail prob. of x, right tail prob. of x).
[Figure: null pdf f(x | H0) with a two-sided rejection region outside the critical values c_(1−α/2) and c_(α/2); the observed x lies between them.]
Here x is outside the rejection region, so p > α: do not reject H0.
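A minimal R sketch (not from the slides) of the two-sided rule, assuming a N(0,1) null distribution and an observed test statistic x = 1.5:

    x <- 1.5
    left  <- pnorm(x)        # left tail probability of x
    right <- 1 - pnorm(x)    # right tail probability of x
    p <- 2 * min(left, right)
    p                        # approximately 0.134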

Concept question
1. You collect data from an experiment and do a left-sided z-test with significance 0.1. You find the z-value is 1.8.
(i) Which of the following computes the critical value for the rejection region?
(a) pnorm(0.1, 0, 1)   (b) pnorm(0.9, 0, 1)   (c) pnorm(0.95, 0, 1)
(d) pnorm(1.8, 0, 1)   (e) 1 - pnorm(1.8, 0, 1)   (f) qnorm(0.05, 0, 1)
(g) qnorm(0.1, 0, 1)   (h) qnorm(0.9, 0, 1)   (i) qnorm(0.95, 0, 1)
(ii) Which of the above computes the p-value for this experiment?
(iii) Should you reject the null hypothesis? (a) Yes (b) No
Answer: (i) g. (ii) d. (iii) No. (Draw a picture!)
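A quick R check of these answers (not from the slides):

    qnorm(0.1, 0, 1)   # critical value, approximately -1.28; rejection region is z <= -1.28
    pnorm(1.8, 0, 1)   # left-sided p-value for z = 1.8, approximately 0.964
    # Since p > 0.1 (equivalently, z = 1.8 is not in the rejection region), do not reject H0.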

Error, significance level and power

                                True state of nature
                                H0                 HA
Our        Reject H0            Type I error       correct decision
decision   Don't reject H0      correct decision   Type II error

Significance level = P(type I error)
  = probability we incorrectly reject H0
  = P(test statistic in rejection region | H0)
  = P(false positive)

Power = probability we correctly reject H0
  = P(test statistic in rejection region | HA)
  = 1 − P(type II error)
  = P(true positive)

HA determines the power of the test.
Significance and power are both probabilities of the rejection region.
Want significance level near 0 and power near 1.

Table question: significance level and power
The rejection region is boxed in red (from the stated answers, the boxed region is x ≤ 2 or x ≥ 8). The corresponding probabilities for the different hypotheses are shown below it.

x                     0     1     2     3     4     5     6     7     8     9    10
H0: p(x | θ=0.5)   .001  .010  .044  .117  .205  .246  .205  .117  .044  .010  .001
HA: p(x | θ=0.6)   .000  .002  .011  .042  .111  .201  .251  .215  .121  .040  .006
HA: p(x | θ=0.7)   .000 .0001  .001  .009  .037  .103  .200  .267  .233  .121  .028

1. Find the significance level of the test.
2. Find the power of the test for each of the two alternative hypotheses.
Answer:
1. Significance level = P(x in rejection region | H0) = 0.11.
2. θ = 0.6: power = P(x in rejection region | HA) = 0.18.
   θ = 0.7: power = P(x in rejection region | HA) = 0.383.
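A short R check (not from the slides), assuming the boxed rejection region {0, 1, 2, 8, 9, 10} and x ~ binomial(10, θ):

    region <- c(0, 1, 2, 8, 9, 10)
    sum(dbinom(region, 10, 0.5))   # significance, approximately 0.11
    sum(dbinom(region, 10, 0.6))   # power at θ = 0.6, approximately 0.18
    sum(dbinom(region, 10, 0.7))   # power at θ = 0.7, approximately 0.38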

Concept question
1. The power of the test in the graph is given by the area of which region?
[Figure: the pdfs f(x | H0) and f(x | HA) drawn over the reject-H0 region and the non-reject-H0 region of the x-axis, with the areas under the curves labeled R1, R2, R3, R4.]
(a) R1   (b) R2   (c) R1 + R2   (d) R1 + R2 + R3
Answer: (c) R1 + R2. Power = P(rejection region | HA) = area of R1 + R2.

Concept question
2. Which test has higher power?
[Two figures: each shows f(x | H0) and f(x | HA) over a reject-H0 region and a do-not-reject-H0 region; the two graphs place the alternative distribution differently relative to the rejection region.]
(a) Top graph   (b) Bottom graph

Solution
Answer: (a) The top graph. Power = P(x in rejection region | HA). In the top graph almost all the probability of HA is in the rejection region, so the power is close to 1.

Discussion question
The null distribution for the test statistic x is N(4, 8²). The rejection region is {x ≥ 20}. What are the significance level and power of this test?
Answer: 20 is two standard deviations above the mean of 4. Thus,
P(x ≥ 20 | H0) ≈ 0.025.
This was a trick question: we can't compute the power without an alternative distribution.
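A one-line R check of the significance level (not from the slides):

    1 - pnorm(20, mean = 4, sd = 8)   # approximately 0.023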

One-sample t-test
Data: we assume normal data with both µ and σ unknown:
x1, x2, ..., xn ~ N(µ, σ²).
Null hypothesis: µ = µ0 for some specific value µ0.
Test statistic:
t = (x̄ − µ0) / (s/√n),   where   s² = (1/(n−1)) Σᵢ₌₁ⁿ (xᵢ − x̄)².
Here t is the Studentized mean and s² is the sample variance.
Null distribution: f(t | H0) is the pdf of T ~ t(n−1), the t distribution with n−1 degrees of freedom.
Two-sided p-value: p = P(|T| > |t|).
R command: pt(x, n-1) is the cdf of t(n−1).
http://mathlets.org/mathlets/t-distribution/
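A small R sketch of the one-sample t-test (not from the slides), using made-up data x and a hypothetical null value µ0 = 10:

    x <- c(9.2, 10.1, 9.8, 10.5, 9.4, 9.9)        # made-up data
    mu0 <- 10
    n <- length(x)
    t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # Studentized mean
    p_value <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided p-value
    c(t_stat, p_value)
    t.test(x, mu = mu0)                            # built-in check gives the same t and p-value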

Board question: z and one-sample t-test
For both problems use significance level α = 0.05.
Assume the data 2, 4, 4, 10 is drawn from a N(µ, σ²).
Suppose H0: µ = 0; HA: µ ≠ 0.
1. Is the test one- or two-sided? If one-sided, which side?
2. Assume σ² = 16 is known and test H0 against HA.
3. Now assume σ² is unknown and test H0 against HA.
Answer on next slide.

Solution
We have x̄ = 5, s² = (9+1+1+25)/3 = 12.
1. Two-sided. A standardized sample mean above or below 0 is consistent with HA.
2. We'll use the standardized mean z for the test statistic (we could also use x̄). The null distribution for z is N(0, 1). This is a two-sided test, so the rejection region is
(z ≤ z_0.975 or z ≥ z_0.025) = (−∞, −1.96] ∪ [1.96, ∞).
Since z = (x̄ − 0)/(4/2) = 2.5 is in the rejection region, we reject H0 in favor of HA.
Repeating the test using a p-value: p = P(|z| ≥ 2.5 | H0) = 0.012.
Since p < α we reject H0 in favor of HA.
Continued on next slide.

Solution continued
3. We'll use the Studentized mean t = (x̄ − µ0)/(s/√n) for the test statistic. The null distribution for t is t(3). For the data we have t = 5/√3. This is a two-sided test, so the p-value is
p = P(|t| ≥ 5/√3 | H0) = 0.06318.
Since p > α we do not reject H0.
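An R check of these numbers (not from the slides):

    x <- c(2, 4, 4, 10)
    # 2. z-test with known sigma^2 = 16:
    z <- (mean(x) - 0) / (4 / sqrt(4))
    2 * (1 - pnorm(abs(z)))              # z = 2.5, p approximately 0.012: reject H0
    # 3. t-test with sigma^2 unknown:
    t_stat <- (mean(x) - 0) / (sd(x) / sqrt(4))
    2 * (1 - pt(abs(t_stat), df = 3))    # t approximately 2.89, p approximately 0.063: do not reject H0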

Two-sample t-test: equal variances
Data: we assume normal data with µx, µy and the same (unknown) σ:
x1, ..., xn ~ N(µx, σ²),   y1, ..., ym ~ N(µy, σ²).
Null hypothesis H0: µx = µy.
Pooled variance:
s_p² = [((n−1)s_x² + (m−1)s_y²) / (n + m − 2)] · (1/n + 1/m).
Test statistic:
t = (x̄ − ȳ) / s_p.
Null distribution: f(t | H0) is the pdf of T ~ t(n + m − 2).
In general (so we can compute power) we have
((x̄ − ȳ) − (µx − µy)) / s_p ~ t(n + m − 2).
Note: there are more general formulas for unequal variances.

Board question: two-sample t-test
Real data from 1408 women admitted to a maternity hospital for (i) medical reasons or through (ii) unbooked emergency admission. The duration of pregnancy is measured in complete weeks from the beginning of the last menstrual period.
Medical: 775 obs. with x̄ = 39.08 and s² = 7.77.
Emergency: 633 obs. with x̄ = 39.60 and s² = 4.95.
1. Set up and run a two-sample t-test to investigate whether the duration differs for the two groups.
2. What assumptions did you make?

Solution
1. The pooled variance for this data is
s_p² = [(774(7.77) + 632(4.95)) / 1406] · (1/775 + 1/633) = 0.0187.
The t statistic for the null distribution is
(x̄ − ȳ) / s_p = −3.8064.
Rather than compute the two-sided p-value using 2*pt(-3.8064, 1406), we simply note that with 1406 degrees of freedom the t distribution is essentially standard normal and 3.8064 is almost 4 standard deviations. So
P(|t| ≥ 3.8064) = P(|z| ≥ 3.8064),
which is very small, much smaller than α = 0.05 or α = 0.01. Therefore we reject the null hypothesis in favor of the alternative that there is a difference in the mean durations.
Continued on next slide.
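An R sketch (not from the slides) reproducing this computation from the summary statistics alone:

    n <- 775; xbar <- 39.08; sx2 <- 7.77     # medical group
    m <- 633; ybar <- 39.60; sy2 <- 4.95     # emergency group
    sp2 <- ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2) * (1/n + 1/m)
    t_stat <- (xbar - ybar) / sqrt(sp2)
    p_value <- 2 * pt(-abs(t_stat), df = n + m - 2)
    c(sp2, t_stat, p_value)   # approximately 0.0187, -3.81, 1.4e-4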

Solution continued
2. We assumed the data were normal and that the two groups had equal variances. Given the big difference in the sample variances, this assumption might not be warranted.
Note: there are significance tests to check whether the data are normal and whether the two groups have the same variance.

Table discussion: Type I errors, Q1
1. Suppose a journal will only publish results that are statistically significant at the 0.05 level. What percentage of the papers it publishes contain type I errors?
Answer: With the information given we can't know this. The percentage could be anywhere from 0 to 100! See the next two questions.

Table discussion: Type I errors, Q2
2. Jerry desperately wants to cure diseases but he is terrible at designing effective treatments. He is, however, a careful scientist and statistician, so he randomly divides his patients into control and treatment groups. The control group gets a placebo and the treatment group gets the experimental treatment. His null hypothesis H0 is that the treatment is no better than the placebo. He uses a significance level of α = 0.05. If his p-value is less than α he publishes a paper claiming the treatment is significantly better than a placebo.
(a) Assuming that his treatments are never effective, what percentage of his experiments result in published papers?
(b) What percentage of his published papers contain type I errors, i.e., describe treatments that are no better than placebo?
Answer: (a) Since H0 is true in all of his experiments, roughly 5% of them, i.e. the significance level, will have p < 0.05 and be published.
(b) Since he's always wrong, all of his published papers contain type I errors.
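A simulation sketch in R (not from the slides; the group sizes are made up): when the treatment truly does nothing, about 5% of experiments clear the p < 0.05 bar, and every one of those publications is a type I error.

    set.seed(1)
    published <- replicate(10000, {
      control   <- rnorm(30)               # placebo group
      treatment <- rnorm(30)               # treatment group: no real effect
      t.test(treatment, control)$p.value < 0.05
    })
    mean(published)                        # approximately 0.05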

Table discussion: Type I errors, Q3
3. Efrat is a genius at designing treatments, so all of her proposed treatments are effective. She's also a careful scientist and statistician, so she too runs double-blind, placebo-controlled, randomized studies. Her null hypothesis is always that the new treatment is no better than the placebo. She also uses a significance level of α = 0.05 and publishes a paper if p < α.
(a) How could you determine what percentage of her experiments result in publications?
(b) What percentage of her published papers contain type I errors, i.e., describe treatments that are no better than placebo?
Answer: (a) The percentage that get published depends on the power of her tests against her treatments' true effects. If a treatment is only a tiny bit more effective than the placebo, then roughly 5% of those experiments will yield a publication. If it is a lot more effective than the placebo, then as many as 100% could be published.
(b) None of her published papers contain type I errors.