Notes on Hypothesis Testing, Type I and Type II Errors

Joatha Hore PA 818 Fall 6

Part 1. Hypothesis Testing

Suppose that a medical firm develops a new medicine that it claims will lead to a higher mean cure rate. Suppose the old cure rate was $\mu_0$. The firm claims that the new mean rate is $\mu_1 > \mu_0$. How can the firm verify its claim? By using hypothesis testing.

Null Hypothesis: The position that must get the benefit of the doubt. This is usually the conventional wisdom. In this case, $H_0: \mu \le \mu_0$.

Alternative Hypothesis: The claim we seek to prove, often called the Research Hypothesis. $H_a: \mu > \mu_0$.

Aim of the Researcher: Find evidence in favor of $H_a$. So he will reject $H_0$ if the calculated sample mean is high relative to $\mu_0$.

Important Step: Assume $\mu = \mu_0$. This is done to give the maximum chance for $H_0$ to be true if we get a high sample mean. Remember that the question of rejecting the null only arises if the sample mean is high.

Next: Assuming $\mu = \mu_0$, calculate the probability of observing a sample mean of $\bar{x}$ or higher. This is because even after giving the null so much benefit of the doubt, if $\Pr[\bar{X} \ge \bar{x}]$ is low, chances are that the null hypothesis doesn't hold. Essentially, we are asking what is the probability of observing a sample mean at least as high as the one we actually see, conditional on the null being true. If this probability is very low, then that is evidence that the null hypothesis does not hold.

Basically, the probability we want to compute is:

$$\Pr[\bar{X} \ge \bar{x} \mid H_0 \text{ true}] = \Pr[\bar{X} \ge \bar{x} \mid \mu = \mu_0] = \Pr\left[Z \ge \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right] = p$$

This is the p-value. If $p < \alpha$, then reject $H_0$.

So what is $\alpha$? $\alpha$ is a predetermined probability level that specifies the proportion of times we are willing to reject $H_0$ when $H_0$ is true. If we reject $H_0$ when $H_0$ is true, we are making an error, so $\alpha$ = proportion of times we are willing to make this error. How high $\alpha$ is depends on how costly (in terms of reputation, monetary cost, etc.) this error is likely to be. If the cost is very high, we should set $\alpha$ very low. Typically $\alpha$ = .10, .05, .025 or .01.

Suppose $p > \alpha$. Should we reject $H_0$? NO!! Suppose $p = .06$ and $\alpha = .05$. Then $p = .06$ implies that $\bar{X} \ge \bar{x}$ 6% of the time that $H_0$ is true. So if we reject $H_0$ because $\bar{X} \ge \bar{x}$, we will be committing an error 6% of the time. But $\alpha = .05$ implies that we are only willing to make these mistakes 5% of the time, so we should not reject $H_0$.
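The decision rule above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming a known population standard deviation and a normally distributed sample mean; the function names and the input values ($\mu_0 = 100$, $\sigma = 15$, $n = 36$, $\bar{x} = 104$) are hypothetical and do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_p_value(xbar, mu0, sigma, n):
    """Return Pr[X-bar >= xbar | mu = mu0], assuming sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))  # standardized sample mean
    return 1.0 - NormalDist().cdf(z)      # upper-tail area of the standard normal


def reject_null(p_value, alpha):
    """Reject H0 only when the p-value is strictly below alpha."""
    return p_value < alpha


# Hypothetical inputs, chosen only for illustration (not from the notes).
p = right_tailed_p_value(xbar=104, mu0=100, sigma=15, n=36)
print(round(p, 4), reject_null(p, alpha=0.05))
```

Calling `reject_null(0.06, 0.05)` returns `False`, which matches the case discussed above: a p-value of .06 is not enough to reject at $\alpha = .05$.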

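Alternative Method to Conduct Hypothesis Tests

We can also use critical values to conduct hypothesis tests. Here, we calculate a critical value, $c$, such that $\Pr[\bar{X} \ge c \mid \mu = \mu_0] = \alpha$:

$$\Pr\left[Z \ge \frac{c - \mu_0}{\sigma/\sqrt{n}}\right] = \alpha \quad\Rightarrow\quad \frac{c - \mu_0}{\sigma/\sqrt{n}} = z_\alpha \quad\Rightarrow\quad c = \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$$

Then, if $\bar{x} \ge c$, we can reject $H_0$. The interpretation of $c$ is that even if $H_0$ is true, by rejecting $H_0$ whenever $\bar{x} \ge c$ we commit an error no more than a proportion $\alpha$ of the time. This is within tolerable levels.

EXAMPLE 1. A store wants to install a new billing system that will be cost effective only if the mean monthly account exceeds $170. In a sample of 400 accounts, the sample mean monthly account was $178. Suppose $\sigma$ = $65 and $\alpha = .05$. Should the new system be installed?

$H_0: \mu \le 170$, $H_a: \mu > 170$

First, work out the p-value:

$$\Pr[\bar{X} > 178 \mid H_0 \text{ true}] = \Pr[\bar{X} > 178 \mid \mu = 170] = \Pr\left[Z > \frac{178 - 170}{65/\sqrt{400}}\right] = \Pr[Z > 2.46] = .0069 < .05 = \alpha$$

Since the p-value is lower than $\alpha$, we can reject the null hypothesis and install the new system. The interpretation is as follows. Given $\mu \le 170$, the probability of observing a sample mean of 178 or more is at most .69%. So it is reasonable to conclude that $\mu > 170$. This conclusion may be wrong, but if $H_0$ were true, a sample mean this high would arise less than 1% of the time. Hence there is little risk in rejecting $H_0$.

Alternatively, we could use the critical value approach to compute the rejection region. So we need to find $c$ such that:

$$\Pr[\bar{X} \ge c \mid \mu = 170] = \Pr\left[Z \ge \frac{c - 170}{65/\sqrt{400}}\right] = .05 \quad\Rightarrow\quad \frac{c - 170}{65/\sqrt{400}} = z_{.05} = 1.645 \quad\Rightarrow\quad c = 170 + 1.645 \cdot \frac{65}{\sqrt{400}} = 175.35$$

Then, since $\bar{x} = 178 > 175.35$, we can reject $H_0$.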
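As a cross-check on Example 1, the sketch below recomputes both the p-value and the critical value with Python's standard library. It assumes the sample mean is normally distributed with known $\sigma$; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Inputs from Example 1.
mu0, xbar, sigma, n, alpha = 170.0, 178.0, 65.0, 400, 0.05
se = sigma / sqrt(n)                          # standard error: 65 / 20 = 3.25

# p-value approach: upper-tail probability of the observed z statistic.
z = (xbar - mu0) / se
p_value = 1.0 - NormalDist().cdf(z)

# Critical-value approach: smallest sample mean that leads to rejection.
z_alpha = NormalDist().inv_cdf(1.0 - alpha)   # about 1.645
c = mu0 + z_alpha * se

print(f"z = {z:.2f}, p = {p_value:.4f}, c = {c:.2f}")
print("reject H0:", p_value < alpha, xbar >= c)
```

Both approaches lead to the same decision, since $\bar{x} \ge c$ holds exactly when $p \le \alpha$.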

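What we just did was a Right-Tailed Test ($H_0: \mu \le \mu_0$). A Left-Tailed Test is very similar. Then, $H_0: \mu \ge \mu_0$ and $H_a: \mu < \mu_0$. And so we reject $H_0$ if the sample mean is too LOW. The construction of a p-value is very similar to before:

$$\Pr[\bar{X} < \bar{x} \mid H_0 \text{ true}] = \Pr[\bar{X} < \bar{x} \mid \mu = \mu_0] = \Pr\left[Z < \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right] = p$$

And we reject $H_0$ if $p < \alpha$. The only difference in a Left-Tailed Test is that we want $\Pr[\bar{X} < \bar{x} \mid H_0 \text{ true}]$ instead of $\Pr[\bar{X} \ge \bar{x} \mid H_0 \text{ true}]$ in the Right-Tailed Test case (because low values of the sample mean provide evidence against the null). We can also use the critical value approach, where the critical value is $c = \mu_0 - z_\alpha \sigma/\sqrt{n}$ (instead of $c = \mu_0 + z_\alpha \sigma/\sqrt{n}$ in the Right-Tailed Test).

EXAMPLE 2. Say that $H_0: \mu \ge 22$ and $H_a: \mu < 22$. Suppose that $\sigma = 6$, $\alpha = .05$, $n = 220$ and $\bar{x} = 21.63$. Then the p-value is:

$$\Pr[\bar{X} < 21.63 \mid \mu = 22] = \Pr\left[Z < \frac{21.63 - 22}{6/\sqrt{220}}\right] = \Pr[Z < -0.91] = .1814$$

Since $p > \alpha$, we cannot reject the null hypothesis that $\mu \ge 22$. We can also use the critical value approach to construct the rejection region.

$$c = \mu_0 - z_\alpha \frac{\sigma}{\sqrt{n}} = 22 - 1.645 \cdot \frac{6}{\sqrt{220}} = 21.3346$$

Then, since $\bar{x} = 21.63 > 21.3346$, we cannot reject the null.

These were both One-Tailed Tests. Finally, we consider a Two-Tailed Test. $H_0: \mu = \mu_0$ and $H_a: \mu \ne \mu_0$. In the case of a Two-Tailed Test, we reject the null hypothesis if the sample mean turns out to be too big OR too small. The calculation of the p-value is slightly different from a One-Tailed Test. Now, we want to calculate

$$\Pr\left[\bar{X} < \mu_0 - |\bar{x} - \mu_0|\right] + \Pr\left[\bar{X} > \mu_0 + |\bar{x} - \mu_0|\right] = 2\Pr\left[Z > \frac{|\bar{x} - \mu_0|}{\sigma/\sqrt{n}}\right] = p$$

Then, reject $H_0$ if $p < \alpha$. We can also use critical values to derive the rejection region, as before. Now however, there must be two critical values, each leaving $\alpha/2$ in its tail:

$$c_1 = \mu_0 + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, \qquad c_2 = \mu_0 - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

So we reject the null if: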

$$\bar{x} \notin \left[\mu_0 - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \mu_0 + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right]$$

EXAMPLE 3. $H_0: \mu = 17.09$, $H_a: \mu \ne 17.09$. Suppose that $\sigma = 3.87$, $\alpha = .05$, $n = 100$ and $\bar{x} = 17.55$. So let's first compute the p-value:

$$\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{17.55 - 17.09}{3.87/\sqrt{100}} = 1.19$$

So, $p = 2\Pr[Z > 1.19] = 2(.1170) = .2340$. Since $p = .2340 > .05 = \alpha$, we cannot reject the null. Alternatively, we can find the rejection region using critical values:

$$c_1 = \mu_0 + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 17.09 + 1.96 \cdot \frac{3.87}{\sqrt{100}} = 17.09 + .75852 = 17.84852$$

$$c_2 = \mu_0 - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 17.09 - 1.96 \cdot \frac{3.87}{\sqrt{100}} = 17.09 - .75852 = 16.33148$$

Then since $\bar{x} = 17.55 \in [16.33148, 17.84852]$, we cannot reject the null.

Part 2. Type I and Type II Errors

              H0 True         H0 False
Accept H0     OK              Type II Error
Reject H0     Type I Error    OK

Type I errors are very serious: you do not want to make wild claims and be proven wrong later. $\alpha$ in our previous discussion is the probability of making a Type I error. Hypothesis tests are designed to make $\alpha$ low, since we choose a low value for $\alpha$!

Type II errors occur when the null hypothesis is false but we fail to reject it. These errors are not as serious, but we would like to avoid them.

There is a trade-off between Type I and Type II errors. If we do not change the sample size, the only way to reduce Type II errors is by increasing the probability of making Type I errors. However, both can be reduced by increasing $n$.

Type II errors are always computed for a given $\alpha$. Think of the probability of NOT making a Type II error. To compute this, we need to specify a value for $\mu$, $\mu_a$, which we (the researcher) believe is true instead of $\mu_0$. Then the probability of not committing a Type II error is:

$$p(\mu_a) = \Pr[\text{Reject } H_0 \text{ at significance } \alpha \mid H_0 \text{ false}] = \Pr[\bar{X} \text{ falls in the rejection region} \mid \mu = \mu_a]$$

So we have that: $\Pr[\text{Type II Error}] = 1 - p(\mu_a)$, where $p(\mu_a)$ = Power of the test given $\mu = \mu_a$.
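The trade-off between the two error types can be illustrated numerically. The sketch below computes the Type II error probability of a right-tailed test for several values of $\alpha$ and two sample sizes. All inputs ($\mu_0 = 100$, $\mu_a = 103$, $\sigma = 10$, $n = 25$ and $n = 100$) are hypothetical values picked for illustration, and the function name is not taken from the notes.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_ii_error(mu0, mu_a, sigma, n, alpha):
    """Pr[fail to reject H0 | mu = mu_a] for a right-tailed z-test."""
    se = sigma / sqrt(n)
    c = mu0 + STD_NORMAL.inv_cdf(1.0 - alpha) * se   # reject when xbar >= c
    power = 1.0 - STD_NORMAL.cdf((c - mu_a) / se)    # Pr[X-bar >= c | mu = mu_a]
    return 1.0 - power


alphas = (0.10, 0.05, 0.01)

# Hypothetical setting: mu0 = 100, mu_a = 103, sigma = 10 (not from the notes).
betas_n25 = [type_ii_error(100, 103, 10, 25, a) for a in alphas]
betas_n100 = [type_ii_error(100, 103, 10, 100, a) for a in alphas]

for a, b25, b100 in zip(alphas, betas_n25, betas_n100):
    print(f"alpha = {a:.2f}: Type II error = {b25:.3f} (n = 25), {b100:.3f} (n = 100)")
```

With $n$ held fixed, lowering $\alpha$ raises the Type II error probability, while the larger sample lowers it at every $\alpha$.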

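EXAMPLE 4. Suppose the old manufacturing process produces 80 units per hour. We want to evaluate the claim that a new manufacturing process produces 85 units per hour. Let $\alpha = .05$, $\sigma$ = [value missing] and $n = 50$ (the figures below imply $\sigma/\sqrt{50} \approx 2.43$). Hence, $H_0: \mu \le 80$ and $H_a: \mu > 80$.

So first, we compute the rejection region.

$$\Pr[\bar{X} \ge c \mid \mu = 80] = \Pr\left[Z \ge \frac{c - 80}{\sigma/\sqrt{50}}\right] = .05 \quad\Rightarrow\quad \frac{c - 80}{\sigma/\sqrt{50}} = z_{.05} = 1.645 \quad\Rightarrow\quad c = 80 + 1.645 \cdot \frac{\sigma}{\sqrt{50}} \approx 84$$

So we reject the null if $\bar{X}_{50} \ge 84$.

Next, find the probability of a Type II error given $\mu_a = 85$.

$$p(85) = \text{Probability of not committing a Type II error} = \Pr[\bar{X}_{50} \ge 84 \mid \mu = 85] = \Pr\left[Z \ge \frac{84 - 85}{\sigma/\sqrt{50}}\right] = .6595$$

So the power of this test is .6595, so the probability of a Type II error is $1 - .6595 = .3405$. Hence, even if the claim is correct, the statistical test will fail to show it 34% of the time.

We can reduce the probability of a Type II error by increasing $n$. Now, suppose $n = 100$. The rejection region is:

$$\Pr[\bar{X} \ge c \mid \mu = 80] = \Pr\left[Z \ge \frac{c - 80}{\sigma/\sqrt{100}}\right] = .05 \quad\Rightarrow\quad \frac{c - 80}{\sigma/\sqrt{100}} = z_{.05} = 1.645 \quad\Rightarrow\quad c = 80 + 1.645 \cdot \frac{\sigma}{\sqrt{100}} \approx 82.83$$

So the power is:

$$p(85) = \Pr[\bar{X}_{100} \ge 82.83 \mid \mu = 85] = \Pr\left[Z \ge \frac{82.83 - 85}{\sigma/\sqrt{100}}\right] = .8962$$

So the probability of a Type II error is $1 - p(85) = 1 - .8962 = .1038$. Hence, doubling the sample size reduced the probability of a Type II error from 34% to 10%.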
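The power figures in Example 4 can be approximately reproduced in Python. One caveat matters here: the numerical value of $\sigma$ is not legible in the notes, so the sketch backs it out from the rounded critical value $c = 84$ at $n = 50$. That back-calculation (giving $\sigma \approx 17.2$) is an assumption made for this illustration, not a value stated by the author, and the results differ slightly from the table-based figures in the text.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
mu0, mu_a, alpha = 80.0, 85.0, 0.05

# ASSUMPTION: sigma is not given legibly in the notes. It is inferred here
# from the rounded critical value c = 84 at n = 50 (c = mu0 + 1.645 * sigma / sqrt(50)).
SIGMA_ASSUMED = (84.0 - mu0) / 1.645 * sqrt(50)    # roughly 17.2


def power(n, sigma=SIGMA_ASSUMED):
    """Pr[reject H0 | mu = mu_a] for the right-tailed test with sample size n."""
    se = sigma / sqrt(n)
    c = mu0 + STD_NORMAL.inv_cdf(1.0 - alpha) * se  # critical value for this n
    return 1.0 - STD_NORMAL.cdf((c - mu_a) / se)


for n in (50, 100):
    print(f"n = {n}: power = {power(n):.4f}, Type II error = {1.0 - power(n):.4f}")
```

Under this assumed $\sigma$, the computed power is close to .66 for $n = 50$ and close to .90 for $n = 100$, in line with the worked example.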