Statistical Testing I. De gustibus non est disputandum


1 Statistical Testing I De gustibus non est disputandum

2 The Pepsi Challenge
"Take the Pepsi Challenge" was the motto of a marketing campaign by the Pepsi-Cola Company in the 1980s. A total of 100 Diet Coke drinkers were asked to blindly taste unmarked cups of Diet Pepsi and Diet Coke, and to select their favorite. A subsequent Pepsi TV commercial stated "... in recent blind taste tests, more than half of all Diet Coke drinkers surveyed said they preferred the taste of Diet Pepsi".
Assume that, out of the 100 Diet Coke drinkers, 56 preferred Diet Pepsi. Would this result support the claim that more than half of all Diet Coke drinkers prefer Diet Pepsi to Diet Coke?

3 "Scientific Method"
"The validity of knowledge is tied to the probability of falsification."
Karl Popper (1902-1994)
"Scientific propositions can be falsified empirically. On the other hand, unscientific claims are always 'right' and cannot be falsified at all."

4 Statistical Testing: New Knowledge Through Falsification
[Diagram: current knowledge ($H_0$) leads, through falsification, to new knowledge ($H_A$).]

5 Decision Making
- Scientific questions are often formulated in the form of mutually exclusive hypotheses (i.e. $H_0$ versus $H_A$) about one or more population parameters.
- A statistical test is a decision rule that allows a researcher to either reject $H_0$ ("statistically significant result") or maintain $H_0$ on the basis of sample data.

6 Statistical Testing: Null Hypothesis
The null hypothesis usually implies the opposite of what a researcher expects (or wishes) to be true. It often represents conservatism or common opinion.
$H_0$: The expected diastolic blood pressure of patients with a particular disease equals that of control individuals.

7 Statistical Testing: Alternative Hypothesis
The alternative hypothesis usually implies what a researcher expects (or wishes) to be true. The alternative hypothesis is regarded as established when the null hypothesis is rejected.
$H_A$: The expected diastolic blood pressure of patients with a particular disease differs from that of control individuals.

8 Blood Pressure and Myocardial Infarction
A study was carried out to assess whether the expected diastolic blood pressure (DBP) of patients with myocardial infarction (MI) differs from the expected DBP of control individuals, namely 80 mmHg.
$H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$

9 Statistical Testing: Procedure
- All information from the sample data is collapsed into a single numerical quantity, called the test statistic ($T$).
- The maintenance region of the test comprises all values of $T$ for which $H_0$ is maintained.
- The rejection region comprises all values of $T$ for which $H_0$ is rejected.
- The maintenance and rejection regions are demarcated by the critical values.

10 Statistical Testing: Procedure
[Figure: distribution of $T$ under $H_0$. Two critical values separate a central maintenance region from a rejection region in each tail. If $T$ falls in the maintenance region, maintain $H_0$; if $T$ falls in a rejection region, reject $H_0$.]

11 Statistical Testing: Possible Errors
A type I error is made when $H_0$ is rejected although it is true.
A type II error is made when $H_0$ is maintained although it is false.
truth    decision: maintain $H_0$    decision: reject $H_0$
$H_0$    correct                     type I error
$H_A$    type II error               correct

12 Statistical Testing: Significance Level
- A statistical test has significance level $\alpha$ if the probability of making a type I error is at most $\alpha$.
- Before data collection, the critical values of a test are chosen such that the test has a pre-specified significance level (e.g. 0.05).
- The choice of critical values depends upon the pre-specified significance level and the nature of $H_0$, but not the nature of $H_A$.

13 Blood Pressure and Myocardial Infarction
$H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$
The significance level of a test of $H_0$ versus $H_A$ limits the probability of erroneously claiming a difference in expected DBP between MI patients and control individuals.

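14 Statistical Testing: Critical Values
[Figure: distribution of $T$ under $H_0$. The critical values $c_{\alpha/2}$ and $c_{1-\alpha/2}$ each cut off a tail of probability $\alpha/2$.]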

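15 One-sample t-test: Procedure
Random variable: $X \sim N(\mu, \sigma^2)$, both parameters unknown
Hypotheses: $H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$
Test statistic: $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
Rejection region: $T \le t_{\alpha/2,\,n-1}$ or $T \ge t_{1-\alpha/2,\,n-1} = -t_{\alpha/2,\,n-1}$, where $n-1$ is the number of 'degrees of freedom' ($\nu$)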

16 Blood Pressure and Myocardial Infarction
A study was carried out to assess whether the expected diastolic blood pressure (DBP) of patients with myocardial infarction (MI) differs from the expected DBP of control individuals, namely 80 mmHg.
The following DBP values were observed in 9 patients with MI: 92, 87, 79, 87, 99, 82, 74, 83, 103
$\bar{x} = 87.33$ mmHg, $s = 9.34$ mmHg
$t = \dfrac{87.33 - 80}{9.34/\sqrt{9}} = 2.354$, $t_{0.975,\,8} = 2.306$
Since $t$ exceeds $t_{0.975,\,8}$, it falls in the rejection region and $H_0$ is rejected at significance level 0.05.
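The calculation on this slide can be retraced with a few lines of Python. This is only a sketch using the standard library; the function name `one_sample_t` is illustrative and not part of the slides, and the critical value 2.306 is simply copied from the slide rather than computed.

```python
import math
import statistics

# DBP values (mmHg) of the 9 MI patients, as listed on the slide.
dbp = [92, 87, 79, 87, 99, 82, 74, 83, 103]
mu_0 = 80.0      # expected DBP of control individuals under H_0
t_crit = 2.306   # t_{0.975, 8}, copied from the slide (not computed here)

def one_sample_t(values, mu_0):
    """Return (sample mean, sample standard deviation, t statistic)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # n-1 in the denominator
    t = (mean - mu_0) / (sd / math.sqrt(n))  # T = (mean - mu_0) / (S / sqrt(n))
    return mean, sd, t

mean, sd, t = one_sample_t(dbp, mu_0)
reject = abs(t) >= t_crit                    # two-sided rejection region
print(f"mean = {mean:.2f}, s = {sd:.2f}, t = {t:.3f}, reject H_0: {reject}")
```

Working with unrounded intermediate values gives a t statistic of about 2.355; the value 2.354 used later in the slides results from rounding the mean and standard deviation first.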

17 t-distribution Quantiles

18 Statistical Testing: Power
- The probability of making a type II error (i.e. of maintaining $H_0$ when, in fact, $H_A$ is true) is designated as $\beta$.
- The complementary probability $1-\beta$, i.e. the probability of avoiding a type II error, is called the power of a test.
- The power of a statistical test depends upon the nature of $H_A$, but not the nature of $H_0$.

19 Statistical Testing: Error Probabilities
decision          truth: $H_0$    truth: $H_A$
maintain $H_0$    $1-\alpha$      $\beta$
reject $H_0$      $\alpha$        $1-\beta$

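20 Statistical Testing: Critical Values
[Figure: distributions of $T$ under $H_0$ and under $H_A$. The critical values $c_{\alpha/2}$ and $c_{1-\alpha/2}$ cut off tails of probability $\alpha/2$ under $H_0$; the part of the $H_A$ distribution inside the maintenance region has probability $\beta$.]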

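21 Blood Pressure and Myocardial Infarction
$H_0: \mu = 80$ versus $H_A: \mu \neq 80$, with $\sigma = 10$ mmHg
Table: rejection probability $P_\mu(T \le -2.306 \text{ or } T \ge 2.306)$ as a function of the true $\mu$. Under $H_0$ ($\mu = 80$) it equals $\alpha = 0.05$; under $H_A$ it is the power $1-\beta$, tabulated for $\mu = 81$ (79), 85 (75) and 90 (70).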

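22 Statistical Testing: Effect Size and Power
[Figure: distributions of $T$ under $H_0$ and $H_A$ with critical values $c_{\alpha/2}$ and $c_{1-\alpha/2}$, showing how the type II error probability $\beta$ depends on the distance between the two distributions.]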

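23 Statistical Testing: Significance and Power
[Figure: distributions of $T$ under $H_0$ and $H_A$ with critical values $c_{\alpha'/2}$ and $c_{1-\alpha'/2}$ for a different significance level $\alpha'$, and the resulting type II error probability $\beta'$.]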

24 t-distribution Quantiles

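25 Blood Pressure and Myocardial Infarction
$H_0: \mu = 80$ versus $H_A: \mu \neq 80$, with $\sigma = 10$ mmHg
Table: rejection probability $P_\mu(T \le -2.896 \text{ or } T \ge 2.896)$ as a function of the true $\mu$. Under $H_0$ ($\mu = 80$) it equals $\alpha = 0.02$; under $H_A$ it is the power $1-\beta$, tabulated for $\mu = 81$ (79), 85 (75) and 90 (70).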
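The rejection probabilities in the two blood pressure power tables (critical values 2.306 and 2.896) can be approximated by simulation. The sketch below is not the method used for the slides; it assumes n = 9 observations (matching the 8 degrees of freedom of the quoted quantiles), draws normal samples with the stated σ = 10, and counts how often the two-sided t-test rejects. The function name `simulated_power`, the number of repetitions and the seed are arbitrary choices.

```python
import math
import random

def simulated_power(mu, t_crit, mu_0=80.0, sigma=10.0, n=9, reps=20000, seed=1):
    """Monte Carlo estimate of P_mu(|T| >= t_crit) for the one-sample t-test."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
        t = (mean - mu_0) / math.sqrt(var / n)
        if abs(t) >= t_crit:
            hits += 1
    return hits / reps

for mu in (80, 81, 85, 90):
    p05 = simulated_power(mu, t_crit=2.306)  # alpha = 0.05
    p02 = simulated_power(mu, t_crit=2.896)  # alpha = 0.02
    print(f"mu = {mu}: alpha 0.05 -> {p05:.3f}, alpha 0.02 -> {p02:.3f}")
```

For µ = 80 the estimates should be close to the nominal significance levels. For the other values of µ they approximate the power, which grows with the effect size and shrinks when the significance level is lowered.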

26 Alternative Hypotheses: Two-Sided
A two-sided alternative hypothesis does not specify a direction of the expected findings and usually
- reflects a lack of prior knowledge about realistic alternatives to the null hypothesis
- reads "is different from" or "deviates from"
$H_A$: The expected diastolic blood pressure of patients with a particular disease differs from that of control individuals.

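27 Alternative Hypotheses: Two-Sided
[Figure: distribution of $T$ under $H_0$ with critical values $c_{\alpha/2}$ and $c_{1-\alpha/2}$. The distribution under $H_A$ may lie on either side of $H_0$ (one side is marked "$H_A$ (?)"); $\beta$ is the probability under $H_A$ of falling in the maintenance region.]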

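28 Alternative Hypotheses: One-Sided
[Figure: distributions of $T$ under $H_0$ and $H_A$ with a single critical value $c_{1-\alpha}$. The upper tail under $H_0$ has probability $\alpha$; the part of the $H_A$ distribution below $c_{1-\alpha}$ has probability $\beta$.]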

29 Alternative Hypotheses: One-Sided
A one-sided alternative hypothesis specifies the direction of the expected findings and usually
- reflects common sense or suitable knowledge from previous scientific experiments
- reads "is larger than", "is heavier than" or "is longer than"
$H_A$: The expected diastolic blood pressure of patients with a particular disease exceeds that of control individuals.

30 Clinical Studies
In a clinical study, researchers often wish to compare the respective probability of therapeutic success between a new medication ($\pi_M$) and placebo ($\pi_P$).
$H_0: \pi_M \le \pi_P$ versus $H_A: \pi_M > \pi_P$
Significance level: upper limit for the probability of declaring a useless medication effective.
Power: probability of recognising an effective medication as effective.

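31 One-sample t-test: One-Sided
Random variable: $X \sim N(\mu, \sigma^2)$, both parameters unknown
Hypotheses: $H_0: \mu \ge \mu_0$ versus $H_A: \mu < \mu_0$, or $H_0: \mu \le \mu_0$ versus $H_A: \mu > \mu_0$
Test statistic: $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
Rejection region: $T \le t_{\alpha,\,n-1}$ (for $H_A: \mu < \mu_0$) or $T \ge t_{1-\alpha,\,n-1}$ (for $H_A: \mu > \mu_0$)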

32 t-distribution Quantiles

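33 Blood Pressure and Myocardial Infarction
$H_0: \mu \le 80$ versus $H_A: \mu > 80$, with $\sigma = 10$ mmHg
Table: for each true $\mu$, the rejection probability of the one-sided test, $P_\mu(T \ge 1.860)$, next to that of the two-sided test, $P_\mu(|T| \ge 2.306)$. Both critical values correspond to $\alpha = 0.05$ with 8 degrees of freedom ($t_{0.95,\,8} = 1.860$, $t_{0.975,\,8} = 2.306$). Under $H_0$ the rejection probability is limited by $\alpha$; under $H_A$ it is the power $1-\beta$.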

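34 One-sample t-test: Sample Size
Which sample size, $n$, is required to detect, at significance level $\alpha$, a given effect $\mu - \mu_0$ with power $1-\beta$?
one-sided: $n \ge \left[\dfrac{\sigma\,(z_{1-\alpha} + z_{1-\beta})}{\mu - \mu_0}\right]^2$
two-sided: $n \ge \left[\dfrac{\sigma\,(z_{1-\alpha/2} + z_{1-\beta})}{\mu - \mu_0}\right]^2$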
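As a rough illustration of the two sample size formulas, the following sketch evaluates them with normal quantiles from Python's standard library. The function name `required_n` is illustrative, and the example inputs (effect of 5 mmHg, α = 0.05, power 0.80) are assumptions chosen for demonstration; only σ = 10 is taken from the slides.

```python
import math
from statistics import NormalDist

def required_n(effect, sigma, alpha, power, two_sided=False):
    """Smallest integer n with n >= (sigma * (z_alpha + z_power) / effect)^2."""
    z = NormalDist().inv_cdf                  # standard normal quantile function
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_power = z(power)                        # z_{1-beta}, with power = 1 - beta
    return math.ceil((sigma * (z_alpha + z_power) / effect) ** 2)

# Assumed example: detect an effect of 5 mmHg with sigma = 10 mmHg.
n_one = required_n(effect=5, sigma=10, alpha=0.05, power=0.80)
n_two = required_n(effect=5, sigma=10, alpha=0.05, power=0.80, two_sided=True)
print(n_one, n_two)
```

Because the formulas use normal rather than t quantiles, they are approximations; for small n the t-test needs somewhat more observations than these numbers suggest.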

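35 One-sample t-test: Sample Size (one-sided)
[Figure: required sample size $n$ (axis up to 1000) plotted against the effect $\mu - \mu_0$ for $\sigma = 10$, a fixed significance level $\alpha$, and power $1-\beta$ = 0.90, 0.80, 0.70.]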

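36 One-sample t-test: Sample Size (two-sided)
[Figure: required sample size $n$ (axis up to 1000) plotted against the effect $\mu - \mu_0$ for $\sigma = 10$, a fixed significance level $\alpha$, and power $1-\beta$ = 0.90, 0.80, 0.70.]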

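37 The Pepsi Challenge
$H_0$: Pepsi does not taste better than Coke ($\pi \le 0.5$).
$H_A$: Pepsi tastes better than Coke ($\pi > 0.5$).
Let $T$ be the number of the 100 Diet Coke drinkers who prefer Diet Pepsi. For $\pi = 0.5$:
$P(T \ge 59) = \sum_{i=59}^{100} \binom{100}{i}\, 0.5^{i}\, 0.5^{100-i} \approx 0.044$
$P(T \ge 58) = \sum_{i=58}^{100} \binom{100}{i}\, 0.5^{i}\, 0.5^{100-i} \approx 0.067$
Critical value: $c_{0.05} = 59$
Conclusion: The number of Diet Coke drinkers who preferred Diet Pepsi (i.e. 56) was not significantly higher than the number who preferred Diet Coke (i.e. 44).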
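The binomial tail probabilities behind the critical value can be checked directly. The sketch below uses only `math.comb`; the helper names `upper_tail` and `critical_value` are illustrative and do not come from the slides.

```python
from math import comb

def upper_tail(k, n=100, p=0.5):
    """P(T >= k) for T ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def critical_value(alpha, n=100, p=0.5):
    """Smallest c such that P(T >= c) <= alpha under H_0."""
    for c in range(n + 1):
        if upper_tail(c, n, p) <= alpha:
            return c
    return n + 1  # no outcome is extreme enough to reject at this level

c = critical_value(0.05)
observed = 56
print(f"P(T >= 58) = {upper_tail(58):.4f}")
print(f"P(T >= 59) = {upper_tail(59):.4f}")
print(f"critical value = {c}, reject H_0: {observed >= c}")
```

With 56 of 100 drinkers preferring Diet Pepsi, the observed count stays below the critical value, so the one-sided test at the 5% level does not reject the null hypothesis.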

38 Statistics and Truth
Egon Pearson (1895-1980), Jerzy Neyman (1894-1981)
"No test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis."
Neyman J, Pearson E (1933) Phil Trans R Soc A, 231:

39 Statistics and Truth
Ronald A. Fisher (1890-1962)
"It would, therefore, add greatly to the clarity with which the tests of significance are regarded if it were generally understood that the tests of significance, when used accurately, are capable of rejecting or invalidating hypotheses, in so far as they are contradicted by the data: but that they are never capable of establishing them as certainly true."

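40 p Value
[Figure: distribution of $T$ under $H_0$; the p value is the tail area beyond the observed value $t_{obs}$.]
The p value is the probability, when the null hypothesis is correct, of obtaining the observed value $t_{obs}$ of $T$ or a value that is even less probable.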

41 p Value: Evidence Against $H_0$
[Table: ranges of the p value and the corresponding strength of evidence against $H_0$, graded as none, "moderate", "strong" and "very strong".]

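42 Blood Pressure and Myocardial Infarction
One-sided: $H_0: \mu \le 80$ versus $H_A: \mu > 80$, $p = P(T > 2.354) \approx 0.023$
Two-sided: $H_0: \mu = 80$ versus $H_A: \mu \neq 80$, $p = P(|T| > 2.354) \approx 0.046$
The Pepsi Challenge
$H_0: \pi \le 0.5$ versus $H_A: \pi > 0.5$, $p = P(X \ge 56) = \sum_{i=56}^{100} \binom{100}{i}\, 0.5^{i}\, 0.5^{100-i} \approx 0.136$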
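The t-test p values on this slide are tail areas of a t-distribution with 8 degrees of freedom. Python's standard library has no t-distribution, so the sketch below integrates the density numerically with Simpson's rule; this is one possible approach rather than the way the slide values were obtained, and the names `t_density` and `t_upper_tail` are illustrative.

```python
import math

def t_density(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus Simpson's rule integral over [0, t]."""
    h = t / steps                              # steps must be even
    total = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    return 0.5 - total * h / 3

p_one = t_upper_tail(2.354, df=8)   # one-sided: P(T > 2.354)
p_two = 2 * p_one                   # two-sided: P(|T| > 2.354), by symmetry
print(f"one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
```

As a plausibility check, the same function should return roughly 0.025 at the quantile 2.306 and roughly 0.01 at 2.896, the two critical values used earlier in the slides.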

43 Pravastatin and Cardiovascular Disease
Table columns: major cardiovascular outcome; placebo (n=2078); Pravastatin (n=2081); p
Table rows: non-fatal MI or death from CHD; CABG or PTCA (p < 0.001); stroke
CABG: coronary artery bypass grafting, PTCA: percutaneous transluminal coronary angioplasty
Sacks FM et al. (1996) N Engl J Med 335:

44 Negative Findings Negative findings are as important as positive findings because they reduce ignorance and may suggest interesting new hypotheses and lines of investigation. They are also necessary to guide future research in the field of interest (publication bias).

45 Summary
- Statistical problems are usually defined as mutually exclusive hypotheses about population parameters.
- Statistical tests are decision rules to either maintain or reject a given null hypothesis on the basis of sample data.
- When performing a statistical test, two types of error can occur through falsely rejecting either the null hypothesis or the alternative hypothesis.
- The probability of making a type I error is limited by the significance level of the test; the probability of avoiding a type II error is called the power of the test.
- The p value is a measure of the discrepancy between the data and the null hypothesis.
