PHP 2510: Principles of Biostatistics & Data Analysis
Lecture 10: Hypothesis Testing

In previous lectures we encountered the problem of estimating an unknown population mean and constructing confidence intervals for the mean. Our running example was cholesterol levels for people who go on a new diet that may help lower cholesterol. If we have a sample of these people and observe their cholesterol levels after they have stayed on the diet for some time, we can estimate the expected cholesterol level for people on this diet. Assuming $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$, we can estimate $\mu$ with $\bar{X}$ and construct a $(1-\alpha)100\%$ confidence interval
$$\bar{X} \pm t_{\alpha/2,\, df=n-1}\, S/\sqrt{n}.$$

The numerical example we gave was a sample of size 10:
174 178 196 181 181 197 185 167 173 176
with $\bar{X} = 180.8$ and $S_X^2 = 93$. A 95% confidence interval for $\mu$ is
$$\bar{X} \pm t_{\alpha/2,\, df=n-1}\, S/\sqrt{n} = 180.8 \pm 2.26\sqrt{93/10} = (173.9,\ 187.7).$$
If we know that the mean cholesterol level in the general population is 188 (for example, you looked it up on the CDC website), another question we may ask is: Is the sample mean we observe (180.8) compatible with the hypothesis that people on the diet actually have the same mean cholesterol level as the general population? What do we mean by compatible? In other words, if we had sampled 10 people randomly from the general population, instead of the diet sub-population, could we have observed 180.8 as well?
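A minimal Python sketch (assuming NumPy and SciPy are available) reproduces these numbers:

```python
# Sample mean, sample variance, and 95% t-based confidence interval
import numpy as np
from scipy import stats

x = np.array([174, 178, 196, 181, 181, 197, 185, 167, 173, 176])
n = len(x)
xbar = x.mean()                        # 180.8
s2 = x.var(ddof=1)                     # ~93 (sample variance)
tcrit = stats.t.ppf(0.975, df=n - 1)   # ~2.26

half_width = tcrit * np.sqrt(s2 / n)
print(xbar, s2, (xbar - half_width, xbar + half_width))   # ~(173.9, 187.7)
```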

To better illustrate the logic behind hypothesis testing, we will study a side example on whether the data are compatible with a hypothesis. Suppose we have a coin and the hypothesis is that this is a regular coin: heads and tails have equal probability when I flip it. We refer to this hypothesis as the null hypothesis and denote it by $H_0$. If I now flip it 10 times and see all heads, you may start doubting $H_0$, because it seems unlikely to observe 10 heads in a row if $H_0$ were true. How unlikely is it? The probability of observing 10 heads out of 10 flips under $H_0$ (a fair coin) is
$$\binom{10}{10} 0.5^{10} (1 - 0.5)^0 = 0.5^{10} = 0.00098.$$
This is a very small probability: not impossible, but rather unlikely. The data do not appear to be compatible with the hypothesis.
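As a quick check, the same probability in Python (assuming SciPy):

```python
# Probability of 10 heads in 10 flips of a fair coin
from scipy.stats import binom

print(0.5**10)                  # 0.0009765625
print(binom.pmf(10, 10, 0.5))   # same value from the Binomial(10, 0.5) pmf
```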

There are two possible explanations: either something really unlikely happened (unlikely $\ne$ impossible), or the hypothesis is wrong. Action: reject $H_0$.

Does this suggest that we should simply compute the probability of observing the data under the null hypothesis ($H_0$)? Suppose we flip a coin 200 times and observe 100 heads and 100 tails. It seems there is no reason to doubt that this coin is fair: $P(H) = P(T) = .5$. However, under the model $X \sim \text{Binomial}(200, .5)$,
$$P(X = 100) = \binom{200}{100} 0.5^{100} (1 - 0.5)^{100} = 0.056.$$
This is not a large probability itself. The probability of getting exactly 500 heads from 1000 flips is only .025. What should we do? We certainly are not willing to conclude that the observation is incompatible with the hypothesis in this case.

We need an alternative hypothesis, which we denote by $H_1$ or $H_a$. For example, if the coin is not fair, the alternative could be that $p \ne .5$, but we do not know whether $p > .5$ or $p < .5$. Under the $H_0$ model, which values are as or more extreme than the one observed? By more extreme, we mean data that would make you lean towards the alternative more than the data you observed.

# of heads    0        1       2      3     4     5     6     7     8      9       10
Probability   .000977  .00977  .0439  .117  .205  .246  .205  .117  .0439  .00977  .000977

Since our alternative is $p \ne .5$, extreme observations are either a small number of heads or a large number of heads: values that are far away from $E[X] = 10 \times .5 = 5$. As extreme as the observation 2 heads is 8 heads; more extreme than 2 heads is 0, 1, 9, or 10 heads. Now we can ask the question: what is the probability of observing something as or more extreme than the actual data, under $H_0$?
$$P_0(X = 0, 1, 2, 8, 9, \text{ or } 10) = P_0(X=0) + P_0(X=1) + P_0(X=2) + P_0(X=8) + P_0(X=9) + P_0(X=10) = .11$$
(We use the subscript 0 in $P_0$ to indicate that the probability is calculated under the $H_0$ model.)
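The same two-sided tail calculation in Python (assuming NumPy and SciPy):

```python
# Two-sided p-value for observing 2 heads in 10 flips of a fair coin
import numpy as np
from scipy.stats import binom

pmf = binom.pmf(np.arange(11), 10, 0.5)
extreme = [0, 1, 2, 8, 9, 10]    # as or more extreme than 2 under H1: p != .5
print(pmf[extreme].sum())        # ~0.109
```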

This probability says that, if a lot of people did the same experiment of flipping a fair coin 10 times, about 11% of them would see values as or more extreme than 2 heads. If you do not consider 11% particularly rare, then you may not be surprised when the observation is 2 heads.

We have actually done hypothesis testing already! Given the data (2 heads observed out of 10 flips of a coin):
- We tested the hypothesis $H_0$ that the coin is fair, against the alternative hypothesis $H_1$ that the coin is not fair.
- We computed the probability of observing results as or more extreme than the data, under $H_0$. This probability is referred to as the p-value.
- If the p-value is small, it means either something improbable has happened, or $H_0$ is problematic. We reject $H_0$ when the p-value is small.
- How small is small? Traditionally people have used .05 and .01. This number is called the significance level.

What if the alternative hypothesis is $p < .5$ instead of $p \ne .5$? In hypothesis testing, we think of either $H_0$ or $H_1$ as being true. $H_1$ is used to determine which values are as or more extreme under $H_0$.

# of heads    0        1       2      3     4     5     6     7     8      9       10
Probability   .000977  .00977  .0439  .117  .205  .246  .205  .117  .0439  .00977  .000977

Which values, compared to the actual data $X = 2$, would make you lean more towards the alternative hypothesis $p < .5$? These would be $X = 0$ and $X = 1$; 8, 9, and 10 are no longer extreme values if the alternative is $p < .5$ instead of $p \ne .5$. Now the p-value becomes
$$P_0(X = 0, 1, \text{ or } 2) = P_0(X=0) + P_0(X=1) + P_0(X=2) = 0.055.$$
When we use $H_1: p \ne .5$, we call it a two-sided test. When we use $H_1: p < .5$ or $H_1: p > .5$, we call it a one-sided test.
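The one-sided version in Python (assuming SciPy; `binomtest` requires SciPy ≥ 1.7):

```python
# One-sided p-value for H1: p < 0.5, having observed 2 heads in 10 flips
from scipy.stats import binom, binomtest

print(binom.cdf(2, 10, 0.5))                                  # P0(X <= 2) ~ 0.055
print(binomtest(2, n=10, p=0.5, alternative='less').pvalue)   # same result
```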

Now let's get back to our original example. From a random sample of 10 people who are on a new diet, we observed cholesterol levels
174 178 196 181 181 197 185 167 173 176
Can we test the hypothesis that people on the diet actually have the same mean cholesterol level as the general population?
$$H_0: \mu = 188$$
First, let's do a two-sided test: $H_1: \mu \ne 188$.

We start with the simplest case, as we did for confidence intervals. Let's assume that we know the standard deviation is 9.8. Under $H_0$, $\bar{X}_{10} \sim N(188,\ 9.8^2/10)$.

[Figure: density of $\bar{X}_{10}$ under $H_0$]

Now the question is: which values are as or more extreme?

[Figure: density of $\bar{X}_{10} \sim N(188,\ 9.8^2/10)$ with the observed value 180.8 marked]

Can you compute the probability of a value as or more extreme than 180.8?

$$P(\bar{X}_{10} \le 180.8) = P\left(\frac{\bar{X}_{10} - 188}{\sqrt{9.8^2/10}} \le \frac{180.8 - 188}{\sqrt{9.8^2/10}}\right) = P(Z \le -2.32) = .01$$
p-value $= 2 \times .01 = .02 < .05$, so we would reject $H_0$ at significance level 0.05.
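The same z calculation in Python (assuming NumPy and SciPy):

```python
# Two-sided z test for the mean, treating sigma = 9.8 as known
import numpy as np
from scipy.stats import norm

x = np.array([174, 178, 196, 181, 181, 197, 185, 167, 173, 176])
mu0, sigma = 188, 9.8

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))   # ~ -2.32
print(z, 2 * norm.cdf(-abs(z)))                    # two-sided p-value ~ 0.02
```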

For the observed sample mean 180.8, we have rejected $H_0$ at significance level 0.05. What if the observed mean were 182? What about 183? More generally, what are the values of $\bar{X}_{10}$ for which you would just reject $H_0$ at significance level .05?

What we know under $H_0$: $\dfrac{\bar{X}_{10} - 188}{\sqrt{9.8^2/10}}$ is a standard normal $Z$.

If $\dfrac{\bar{X}_{10} - 188}{\sqrt{9.8^2/10}} < -1.96$ or $\dfrac{\bar{X}_{10} - 188}{\sqrt{9.8^2/10}} > 1.96$, we would reject $H_0$ at significance level .05.

If $\dfrac{\bar{X}_{10} - 188}{\sqrt{9.8^2/10}} < -z_{.01/2} = -2.58$ or $\dfrac{\bar{X}_{10} - 188}{\sqrt{9.8^2/10}} > z_{.01/2} = 2.58$, we would reject $H_0$ at significance level .01.

[Figure: standard normal density with the two-sided rejection regions shaded]

We call $\dfrac{\bar{X}_{10} - 188}{\sqrt{9.8^2/10}}$ the test statistic, and the regions $(-\infty, -1.96), (1.96, \infty)$ or $(-\infty, -2.58), (2.58, \infty)$ critical regions. When the test statistic is inside the critical region, we reject $H_0$. We say $\dfrac{\bar{X}_{10} - 188}{\sqrt{9.8^2/10}}$ is a Z-statistic since it follows a standard normal distribution under the $H_0$ model. In our example, $\bar{X}_{10} = 180.8$, so
$$\frac{\bar{X}_{10} - 188}{\sqrt{9.8^2/10}} = \frac{180.8 - 188}{\sqrt{9.8^2/10}} = -2.32.$$
$-2.32$ is within the region $(-\infty, -1.96)$, but not within the regions $(-\infty, -2.58)$ or $(2.58, \infty)$. Thus we reject $H_0$ at the .05 level, but not at the .01 level.

What if we have a one-sided alternative?
$$H_0: \mu = 188 \qquad H_1: \mu < 188$$
Now which values are as or more extreme?

[Figure: density of $\bar{X}_{10}$ under $H_0$ with the lower tail shaded]

$$\frac{\bar{X}_{10} - 188}{\sqrt{9.8^2/10}} = \frac{180.8 - 188}{\sqrt{9.8^2/10}} = -2.32$$
p-value: $P(Z < -2.32) = .01$
Critical value for $\alpha = .05$: $P(Z < q) = .05 \Rightarrow q = -1.64$
Critical value for $\alpha = .01$: $P(Z < q) = .01 \Rightarrow q = -2.33$
For the one-sided test, we reject $H_0$ at significance level .05; with a p-value of about .01, the test statistic lands essentially on the .01 critical value, so the decision at the .01 level is borderline.

Summary of the steps in hypothesis testing, A:

In general:
1. Select the probability model
2. Set up the null and alternative hypotheses
3. Determine a test statistic
4. Determine the significance level and critical region
5. Reject $H_0$ if the test statistic is in the critical region

In the previous example:
1. Normal probability model with known variance
2. $H_0: \mu = 188$, $H_1: \mu < 188$
3. Z-statistic $\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = -2.32$
4. $\alpha = .05$, critical region $(-\infty, -1.64)$
5. $-2.32$ is in the critical region; reject $H_0$

Summary of the steps in hypothesis testing, B:

In general:
1. Select the probability model
2. Set up the null and alternative hypotheses
3. Determine a test statistic
4. Compute the p-value
5. Reject $H_0$ if the p-value is less than the significance level

In the previous example:
1. Normal probability model with known variance
2. $H_0: \mu = 188$, $H_1: \mu < 188$
3. Z-statistic $\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = -2.32$
4. $P(Z < -2.32) = .01$
5. $0.01 < \alpha = .05$; reject $H_0$

Now, what if we do not know the standard deviation? Can we still do hypothesis testing?
1. Normal probability model with unknown variance
2. $H_0: \mu = 188$, $H_1: \mu < 188$
We can no longer form the Z-statistic, but we can estimate $\sigma^2$ and form a T-statistic:
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{df=n-1}$$
From our data we have $S^2 = 93$, thus
$$T = \frac{180.8 - 188}{\sqrt{93/10}} = -2.36.$$
Method A: find the critical region from the t-distribution (df = 9). $t_{.05, df=9} = 1.83$, so the critical region is $(-\infty, -1.83)$. The test statistic is in the critical region; reject $H_0$ at significance level .05.
Method B: p-value $P(T < -2.36) = 0.02 < .05$; reject $H_0$ at significance level .05.
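A short Python sketch of the same one-sample t test (assuming NumPy and SciPy; the `alternative=` argument of `ttest_1samp` needs SciPy ≥ 1.6):

```python
# One-sided one-sample t test with unknown variance
import numpy as np
from scipy import stats

x = np.array([174, 178, 196, 181, 181, 197, 185, 167, 173, 176])
mu0 = 188

t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))   # ~ -2.36
p_less = stats.t.cdf(t_stat, df=len(x) - 1)                     # ~ 0.02
print(t_stat, p_less)

# Same test in one call:
print(stats.ttest_1samp(x, mu0, alternative='less'))
```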

Hypothesis testing for comparing two means: I
$$X_1, \ldots, X_{n_1} \sim N(\mu_X, \sigma_X^2) \quad (\sigma_X^2 \text{ known}), \qquad Y_1, \ldots, Y_{n_2} \sim N(\mu_Y, \sigma_Y^2) \quad (\sigma_Y^2 \text{ known})$$
$$H_0: \mu_X = \mu_Y, \text{ i.e. } \mu_X - \mu_Y = 0$$
$$T = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\sigma_X^2/n_1 + \sigma_Y^2/n_2}} \sim N(0, 1) \text{ under } H_0$$

Example: we had the cholesterol data from the last lecture on confidence intervals.
X: 174 178 196 181 181 197 185 167 173 176
Y: 212 204 204 201 194 218 205 180 207 195 189 198 190 193 194 183 208 202 189 213
$\bar{X} = 180.8$, $\bar{Y} = 199$. Suppose we know $\sigma_X^2 = \sigma_Y^2 = 9.8^2$.
$$T = \frac{180.8 - 199}{\sqrt{9.8^2/10 + 9.8^2/20}} = -4.79$$
One-sided critical region for $\alpha = .05$: $(-\infty, -1.64)$
One-sided critical region for $\alpha = .01$: $(-\infty, -2.33)$
Reject $H_0$. Or, compute the p-value $P(Z < -4.79) \approx 0$; reject $H_0$.
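In Python (assuming NumPy and SciPy), with the variances treated as known:

```python
# Two-sample z test with known, equal variances
import numpy as np
from scipy.stats import norm

xbar, ybar = 180.8, 199.0
n1, n2, sigma = 10, 20, 9.8

z = (xbar - ybar) / np.sqrt(sigma**2 / n1 + sigma**2 / n2)   # ~ -4.79
print(z, norm.cdf(z))                                        # one-sided p-value ~ 0
```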

Hypothesis testing for comparing two means: II
$$X_1, \ldots, X_{n_1} \sim N(\mu_X, \sigma_X^2), \quad Y_1, \ldots, Y_{n_2} \sim N(\mu_Y, \sigma_Y^2) \quad (\sigma_X^2, \sigma_Y^2 \text{ unknown but equal})$$
$$T = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{S_p^2/n_1 + S_p^2/n_2}} \sim t_{df = n_1 + n_2 - 2} \text{ under } H_0$$
Estimate the common variance by the pooled sample variance $S_p^2 = 100.4$ (if you forgot how this is done, review lecture 13). Form the test statistic
$$T = \frac{180.8 - 199}{\sqrt{100.4\,(1/10 + 1/20)}} = -4.69$$

One-sided critical region for $\alpha = .05$: $(-\infty, -t_{.05, df=28}) = (-\infty, -1.70)$
One-sided critical region for $\alpha = .01$: $(-\infty, -t_{.01, df=28}) = (-\infty, -2.47)$
Reject $H_0$. (Exercise: what would the critical regions be if we were doing two-sided tests?)
Or compute the p-value $P(t_{df=28} < -4.69) \approx 0$; reject $H_0$.

Hypothesis testing for comparing two means: III
$$X_1, \ldots, X_{n_1} \sim N(\mu_X, \sigma_X^2), \quad Y_1, \ldots, Y_{n_2} \sim N(\mu_Y, \sigma_Y^2) \quad (\sigma_X^2, \sigma_Y^2 \text{ unknown and unequal})$$
$$T = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{S_X^2/n_1 + S_Y^2/n_2}} \sim \text{Welch } t \text{ under } H_0$$
As we learned in previous lectures, the degrees of freedom for this distribution are not simple. For large samples the statistic converges to $N(0,1)$; for small samples we can be conservative and use $df = \min(n_1 - 1, n_2 - 1)$ if no computer is available.
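Both the pooled (II) and Welch (III) versions are available through `scipy.stats.ttest_ind`; a sketch on the cholesterol data (the `alternative=` argument needs SciPy ≥ 1.6):

```python
# Pooled and Welch two-sample t tests
import numpy as np
from scipy import stats

x = np.array([174, 178, 196, 181, 181, 197, 185, 167, 173, 176])
y = np.array([212, 204, 204, 201, 194, 218, 205, 180, 207, 195,
              189, 198, 190, 193, 194, 183, 208, 202, 189, 213])

pooled = stats.ttest_ind(x, y, equal_var=True, alternative='less')
welch = stats.ttest_ind(x, y, equal_var=False, alternative='less')
print(pooled.statistic, pooled.pvalue)   # t ~ -4.68 (slides round Ybar to 199, giving -4.69)
print(welch.statistic, welch.pvalue)     # Welch statistic with estimated df
```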

What happens when the two populations are not independent? What happens when we do not start with normal distributions?

Difference of means of paired observations
$$X \sim N(\mu_X, \sigma_X^2), \quad Y \sim N(\mu_Y, \sigma_Y^2)$$
where $X_k$ is paired with $Y_k$ (before/after, left-hand/right-hand, paired treated/control, ...).
$$H_0: \mu_X = \mu_Y, \qquad H_1: \mu_X \ne \mu_Y$$
But since $X$ and $Y$ are not independent, $\bar{X}$ and $\bar{Y}$ are not independent. We do not have the simple result $\bar{X} - \bar{Y} \sim N(\mu_X - \mu_Y,\ \sigma_X^2/n + \sigma_Y^2/n)$. (Why not?)

Solution: for each pair, we form the difference $D_k = X_k - Y_k$. Then $D_1, \ldots, D_n \sim N(\mu_X - \mu_Y, \sigma_D^2)$, and we estimate $\sigma_D^2$ with the sample variance $S_D^2$. The test statistic for testing $\mu_D = d_0$ is then
$$\frac{\bar{D} - d_0}{S_D/\sqrt{n}},$$
which follows a Student t distribution with $n - 1$ degrees of freedom. The most common test is for $H_0: \mu_D = 0$.

Example: suppose you wish to test the effect of Prozac on the well-being of depressed individuals, using a standardised well-being scale. Higher scores indicate greater well-being (that is, Prozac is having a positive effect). We assume that the scores are approximately normally distributed.

ID  Pre  Post
1   0    1
2   3    5
3   6    5
4   7    7
5   4    10
6   3    9
7   2    7
8   1    11
9   4    8

ID  Pre  Post  Difference (post - pre)
1   0    1     1
2   3    5     2
3   6    5     -1
4   7    7     0
5   4    10    6
6   3    9     6
7   2    7     5
8   1    11    10
9   4    8     4

$\bar{d} = 3.67$, $S_d^2 = 12.25$.
$$H_0: \mu_D = 0, \qquad H_1: \mu_D \ne 0$$
$$t = \frac{\bar{D} - 0}{S_D/\sqrt{n}} = \frac{3.67}{3.5/\sqrt{9}} = 3.143, \quad df = 8$$
Rejection region at $\alpha = .05$: $(2.306, \infty)$ and $(-\infty, -2.306)$
Rejection region at $\alpha = .01$: $(3.355, \infty)$ and $(-\infty, -3.355)$
Since 3.143 lies in the $\alpha = .05$ rejection region but not the $\alpha = .01$ region, we reject $H_0$ at the .05 level but not at the .01 level.
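The same paired analysis in Python (assuming NumPy and SciPy):

```python
# Paired t test on the Prozac data
import numpy as np
from scipy import stats

pre = np.array([0, 3, 6, 7, 4, 3, 2, 1, 4])
post = np.array([1, 5, 5, 7, 10, 9, 7, 11, 8])

res = stats.ttest_rel(post, pre)   # two-sided by default
print(res.statistic, res.pvalue)   # t ~ 3.14, p ~ 0.014

# Equivalent one-sample t test on the differences
d = post - pre
print(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))   # ~ 3.14
```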

When we do not start with normal distributions, the Central Limit Theorem is our friend again, as long as we have large samples.

Example (Bernoulli): suppose the incidence rate of disease W for children at age 5 was .0137 (137 per 10,000) in 2007. We want to know whether the incidence rate in Providence is the same as the national rate. A sample of 2000 children was randomly selected and their medical records queried to see whether they had caught the disease in 2007; 30 of them had the disease.

1. Select a probability model: Bernoulli($p$)
2. $H_0: p = .0137$, $H_1: p \ne .0137$
3. Determine a test statistic: 2000 is a large sample, so by the CLT $\bar{X}$ is approximately $N(p,\ p(1-p)/n)$.
4. Under $H_0$,
$$\frac{\bar{X} - p}{\sqrt{p(1-p)/n}} \approx N(0, 1), \quad \text{i.e.} \quad \frac{\bar{X} - .0137}{\sqrt{.0137(1 - .0137)/2000}} \approx N(0, 1).$$
We observe $\bar{X} = 30/2000 = .015$, so the test statistic is
$$\frac{.015 - .0137}{\sqrt{.0137(1 - .0137)/2000}} = 0.50$$

[Figure: standard normal density with the two-sided 2.5% rejection regions beyond ±1.96 and the observed statistic marked]

For the two-sided test, the critical regions for $\alpha = .05$ are $(-\infty, -1.96)$ and $(1.96, \infty)$. We do not reject $H_0$ at significance level 0.05 (and thus we certainly do not reject $H_0$ at any smaller significance level, such as .01). Or, we compute the p-value $2P(Z > 0.50) = 2 \times .31 = .62$; since $.62 > .05$, we do not reject $H_0$ at significance level .05.
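A sketch of this large-sample proportion test in Python (assuming NumPy and SciPy):

```python
# One-sample z test for a proportion (disease W example)
import numpy as np
from scipy.stats import norm

p0, n, cases = 0.0137, 2000, 30
phat = cases / n                              # 0.015

z = (phat - p0) / np.sqrt(p0 * (1 - p0) / n)  # ~ 0.50
print(z, 2 * norm.sf(abs(z)))                 # two-sided p-value ~ 0.62
```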

Example: suppose we want to compare the average daily visits for two emergency rooms, A and B. For each we record the daily visit count for a year. On average, there are 15.4 visits per day to ER A and 14.8 visits per day to ER B. Do these two ERs have the same daily visit rates?
1. Probability model: Poisson, for events that happen randomly over time
2. $H_0: \lambda_A = \lambda_B$, $H_1: \lambda_A \ne \lambda_B$
3. Test statistic: for either ER we observe $n = 365$ days, so by the CLT,
$$\bar{X}_A \approx N(\lambda_A, \lambda_A/n), \quad \bar{X}_B \approx N(\lambda_B, \lambda_B/n), \quad \bar{X}_A - \bar{X}_B \approx N(\lambda_A - \lambda_B,\ \lambda_A/n + \lambda_B/n)$$

$$\frac{(\bar{X}_A - \bar{X}_B) - (\lambda_A - \lambda_B)}{\sqrt{\lambda_A/n + \lambda_B/n}} \approx N(0, 1)$$
4. Under $H_0$, $\lambda_A - \lambda_B = 0$, so
$$\frac{\bar{X}_A - \bar{X}_B}{\sqrt{2\lambda/n}} \approx N(0, 1).$$
We can pool the estimate of $\lambda$ from the two samples and get $\hat{\lambda} = (15.4 \times 365 + 14.8 \times 365)/(365 + 365) = 15.1$.
Test statistic:
$$\frac{15.4 - 14.8}{\sqrt{2 \times 15.1/365}} = 2.09 > z_{.025} = 1.96$$
Or, p-value $= 2P(Z > 2.09) = 2 \times .018 = .036 < .05$. Reject $H_0$.
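The same calculation in Python (assuming NumPy and SciPy):

```python
# Large-sample z test comparing two Poisson rates
import numpy as np
from scipy.stats import norm

xbar_a, xbar_b, n = 15.4, 14.8, 365
lam_hat = (xbar_a * n + xbar_b * n) / (2 * n)      # pooled rate ~ 15.1

z = (xbar_a - xbar_b) / np.sqrt(2 * lam_hat / n)   # ~ 2.09
print(z, 2 * norm.sf(abs(z)))                      # two-sided p-value ~ 0.037
```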

Review: the logic of hypothesis testing. If the null hypothesis, rather than the alternative hypothesis, is true, should I be surprised by the data? I am surprised if the probability of observing a result as or more extreme than the data is small under $H_0$. This probability is called the p-value. By convention, most people reject the null hypothesis if the p-value is smaller than 0.05. The smaller the p-value, the stronger my doubt about $H_0$, and thus the more significant the result is against $H_0$.

Review: the procedure for hypothesis testing
1. Select the probability model
2. Set up the null and alternative hypotheses
3. Determine a test statistic
4. Compute the p-value
5. Reject $H_0$ if the p-value is less than the significance level

Possible results from hypothesis testing:

Decision             | $H_0$ true              | $H_1$ true
Reject $H_0$         | Type I error ($\alpha$) | Correct
Do not reject $H_0$  | Correct                 | Type II error ($\beta$)

Type I error: rejecting $H_0$ when $H_0$ is true.
Type II error: failing to reject $H_0$ when $H_0$ is false ($H_1$ is true).
Power: the ability to reject $H_0$ when $H_0$ is false.
$$P_0(\text{Reject } H_0) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = \alpha$$
This probability is the significance level (Type I error rate).
$$P_1(\text{Reject } H_0) = P(\text{Reject } H_0 \mid H_1 \text{ true}) = 1 - \beta$$
This probability is the power of the hypothesis test.

Consider a one-sided test first: $H_0: \mu = \mu_0 = 5$, $H_1: \mu = \mu_1 = 8 > \mu_0$. Suppose we have a normal model and know the variance is $10^2$. For a sample size of 100, we form the test statistic
$$T = \frac{\bar{X} - 5}{10/\sqrt{100}} = \bar{X} - 5.$$
Under $H_0$, we know $T$ follows the standard normal (Z) distribution.

[Figure: standard normal density with the region to the right of a cutoff $c$ shaded as the critical region]

For any decision rule of the form "reject $H_0$ if the test statistic is greater than $c$", the Type I error is the area of the shaded region to the right of $c$.

Demo 1: Type I error and the choice of critical region

What about the Type II error?
Under $H_1$, $\dfrac{\bar{X} - 8}{10/\sqrt{100}} = \bar{X} - 8 \sim N(0, 1)$, so under $H_1$, $\bar{X} - 5 = (\bar{X} - 8) + 3 \sim N(3, 1)$.

[Figure: densities of the test statistic under $H_0$ and $H_1$, with $\alpha$ and $\beta$ shaded on either side of the cutoff]

We can try to reduce the Type I error by using a larger cutoff, but this would increase the Type II error (reducing power). We can try to increase power by using a smaller cutoff, but this would increase the Type I error.

Demo 2: the trade-off between Type I and Type II error

Two-sided test:

[Figure: standard normal density with both two-sided critical regions shaded]

We have seen that for a hypothesis test there is a trade-off between Type I and Type II errors. For the same study design, we cannot reduce both of them simultaneously. The common practice is to fix the Type I error at a small level, such as .05 or .01, so that we know that at least we are not rejecting $H_0$ too often when we should not. What other factors affect power, for a given Type I error rate?

(1. Type I error rate) 2. Effect size: if $H_1$ is true, the larger the difference between $\mu_0$ (the null value) and $\mu_1$ (the alternative), the higher the power.

[Figure: densities under $H_0$ and two alternatives $H_1$; the more distant alternative places more area in the critical region]

3. Sample size: we know that $\bar{X} \approx N(\mu, \sigma^2/n)$. This means that as the sample size increases, the sample mean becomes more concentrated near the true mean, so the null and alternative hypotheses become easier to separate.

[Figure: densities under $H_0$ and $H_1$ for a small sample and for a large sample; the large-sample densities overlap less]

Demo 3

Computation of power
1. Write down the two hypotheses $H_0$ and $H_1$
2. Write down the probability model under each hypothesis
3. Determine the test statistic
4. For a given Type I error rate ($\alpha$), effect size, and sample size, determine the critical region (rejection region)
5. Compute the power, $1 - \beta$

Computation of power: one-sided test
Example: suppose we want to test whether the mean of a population is 12 or less than 12. We assume a normal distribution with known variance 36. What is the power of this test if the true mean is 10, the sample size is 25, and the significance level is .05?
1. $H_0: \mu = 12$; $H_1: \mu < 12$
2. Under $H_0$: $X \sim N(12, 36)$, $\bar{X} \sim N(12, 36/n)$. Truth: $X \sim N(10, 36)$, $\bar{X} \sim N(10, 36/n)$
3. Test statistic: $\dfrac{\bar{X} - 12}{\sqrt{36/25}} = \dfrac{\bar{X} - 12}{6/5}$
4. Since we have a one-sided test with $H_1: \mu < 12$, we reject $H_0$ when the test statistic is less than a cutoff $C$:
$$\alpha = 0.05 = P_0(\text{Reject } H_0) = P_0\left(\frac{\bar{X} - 12}{6/\sqrt{25}} < C\right) = P(Z < C)$$
From the Z-table, $C = -1.645$.
5. Under $H_1$:

$$\begin{aligned}
\text{POWER} &= P_1\left(\frac{\bar{X} - 12}{6/\sqrt{25}} < -1.645\right) \\
&= P_1\left(\frac{\bar{X} - 10}{6/5} + \frac{10 - 12}{6/5} < -1.645\right) \\
&= P_1\left(\frac{\bar{X} - 10}{6/5} < -1.645 - \frac{10 - 12}{6/5}\right) \\
&= P(Z < 0.0217) = 1 - P(Z > 0.0217) \approx .51
\end{aligned}$$
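A numerical check of this power calculation in Python (assuming NumPy and SciPy):

```python
# Power of the one-sided test H0: mu = 12 vs H1: mu < 12 when the true mean is 10
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma, n, alpha = 12, 10, 6, 25, 0.05
se = sigma / np.sqrt(n)            # 6/5

crit = norm.ppf(alpha)             # -1.645: reject when Z < crit
power = norm.cdf(crit - (mu1 - mu0) / se)
print(power)                       # ~ 0.51
```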

Example: suppose we want to test whether the mean of a population is 12 or greater than 12. We assume a normal distribution with known variance 36. What is the power of this test if the true mean is 10, the sample size is 25, and the significance level is .05?
1. $H_0: \mu = 12$; $H_1: \mu > 12$
2. Under $H_0$: $X \sim N(12, 36)$, $\bar{X} \sim N(12, 36/n)$. Truth: $X \sim N(10, 36)$, $\bar{X} \sim N(10, 36/n)$
3. Test statistic: $\dfrac{\bar{X} - 12}{6/5}$
4. Since we have a one-sided test with $H_1: \mu > 12$, we reject $H_0$ when the test statistic is greater than a cutoff $C$:
$$\alpha = 0.05 = P_0(\text{Reject } H_0) = P_0\left(\frac{\bar{X} - 12}{6/\sqrt{25}} > C\right) = P(Z > C)$$
From the Z-table, $C = 1.645$.
5. Under $H_1$:

$$\begin{aligned}
\text{POWER} &= P_1\left(\frac{\bar{X} - 12}{6/\sqrt{25}} > 1.645\right) \\
&= P_1\left(\frac{\bar{X} - 10}{6/5} + \frac{10 - 12}{6/5} > 1.645\right) \\
&= P_1\left(\frac{\bar{X} - 10}{6/5} > 1.645 - \frac{10 - 12}{6/5}\right) \\
&= P(Z > 3.31) \approx 0
\end{aligned}$$
When the truth is $\mu = 10$, the probability that you will be able to reject $H_0$ in a test of $\mu = 12$ versus $\mu > 12$ is nearly 0.

Example: suppose we want to test whether the mean of a population is 12 or not 12. We assume a normal distribution with known variance 36. What is the power of this test if the true mean is 10, the sample size is 25, and the significance level is .05?
1. $H_0: \mu = 12$; $H_1: \mu \ne 12$
2. Under $H_0$: $X \sim N(12, 36)$, $\bar{X} \sim N(12, 36/n)$. Truth: $X \sim N(10, 36)$, $\bar{X} \sim N(10, 36/n)$
3. Test statistic: $\dfrac{\bar{X} - 12}{6/5}$
4. Since we have a two-sided test, we reject $H_0$ when the absolute value of the test statistic is greater than a cutoff $C$:
$$\alpha = 0.05 = P_0(\text{Reject } H_0) = P_0\left(\left|\frac{\bar{X} - 12}{6/\sqrt{25}}\right| > C\right) = P(|Z| > C)$$
From the Z-table, $C = z_{.025} = 1.96$.
5. Under $H_1$:

$$\begin{aligned}
\text{POWER} &= P_1\left(\left|\frac{\bar{X} - 12}{6/\sqrt{25}}\right| > 1.96\right) \\
&= P_1\left(\frac{\bar{X} - 12}{6/5} > 1.96\right) + P_1\left(\frac{\bar{X} - 12}{6/5} < -1.96\right) \\
&= P_1\left(\frac{\bar{X} - 10}{6/5} > 1.96 - \frac{10 - 12}{6/5}\right) + P_1\left(\frac{\bar{X} - 10}{6/5} < -1.96 - \frac{10 - 12}{6/5}\right) \\
&= P(Z > 3.63) + P(Z < -0.29) \\
&= P(Z > 3.63) + P(Z > 0.29) = .386
\end{aligned}$$

In general, for $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$ with true mean $\mu_1$:
the power of the test of $H_0: \mu = \mu_0$ versus $H_1: \mu < \mu_0$ is
$$P\left(Z < \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} - z_\alpha\right)$$
the power of the test of $H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$ is
$$P\left(Z > \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} + z_\alpha\right)$$
the power of the test of $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ is
$$P\left(Z < \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} - z_{\alpha/2}\right) + P\left(Z > \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} + z_{\alpha/2}\right)$$
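These three formulas can be wrapped in a small helper function; a sketch in Python (assuming NumPy and SciPy), checked against the three worked examples:

```python
# Power of the z test of H0: mu = mu0 when the true mean is mu1
import numpy as np
from scipy.stats import norm

def power(mu0, mu1, sigma, n, alpha=0.05, alternative='two-sided'):
    shift = (mu0 - mu1) / (sigma / np.sqrt(n))
    if alternative == 'less':       # H1: mu < mu0
        return norm.cdf(shift - norm.ppf(1 - alpha))
    if alternative == 'greater':    # H1: mu > mu0
        return norm.sf(shift + norm.ppf(1 - alpha))
    z = norm.ppf(1 - alpha / 2)     # H1: mu != mu0
    return norm.cdf(shift - z) + norm.sf(shift + z)

print(power(12, 10, 6, 25, alternative='less'))      # ~ 0.51
print(power(12, 10, 6, 25, alternative='greater'))   # ~ 0.0005
print(power(12, 10, 6, 25))                          # ~ 0.385
```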

Exercise: now I give you the same setup, just with different numbers: a different $H_0$ mean (not 12, but 200), a different truth (not 10, but 180), a different variance (not 36, but 64), a different significance level (not .05, but .01), and a different sample size (not 25, but 49). Can you compute the power of the test of $H_0: \mu = 200$ versus $H_1: \mu > 200$?

Next topic: we now know that the Type I error, the Type II error ($1 -$ power), the effect size, and the sample size are all connected. Can we determine the necessary sample size when we need to meet certain requirements on the error rates?
For a one-sample two-sided test, $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$:
$$n = \frac{(z_{\alpha/2} + z_\beta)^2 \sigma^2}{(\mu_1 - \mu_0)^2}$$
For a one-sample one-sided test, $H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$ (or $\mu < \mu_0$):
$$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_1 - \mu_0)^2}$$
The smaller the error rates, the larger the required sample size; the larger the variance, the larger the required sample size; the larger the effect size, the smaller the required sample size.
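A sketch of these sample-size formulas in Python (assuming NumPy and SciPy); the $\alpha = .05$, power $= .80$ values below are illustrative conventions, not numbers from the lecture:

```python
# Required n for the one-sample z test, from the formulas above
import numpy as np
from scipy.stats import norm

def n_required(mu0, mu1, sigma, alpha=0.05, beta=0.20, two_sided=True):
    z_a = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    z_b = norm.ppf(1 - beta)
    return int(np.ceil((z_a + z_b) ** 2 * sigma ** 2 / (mu1 - mu0) ** 2))

# Detect a shift from mu0 = 12 to mu1 = 10 with sigma = 6, alpha = .05, power = .80
print(n_required(12, 10, 6, two_sided=True))    # 71
print(n_required(12, 10, 6, two_sided=False))   # 56
```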