E509A: Principle of Biostatistics. GY Zou

1 E509A: Principle of Biostatistics (Week 4: Inference for a single mean) GY Zou gzou@srobarts.ca

4 Example 5.4 (p. 183). A random sample of n = 16 has mean I.Q. 106 with standard deviation S = 12.4. What is the 95% CI?
$106 \pm t_{1-0.05/2,\,15} \times 12.4/\sqrt{16} = (99.4,\ 112.6)$
Example 5.5 (p. 184). A random sample of n = 65 has a mean number of visits over a 3-year period of 16 with S = 1.4. What is the 99% CI?
$16 \pm z_{1-0.01/2} \times 1.4/\sqrt{65} = (15.5,\ 16.5)$

5 Meaning of confidence interval
A confidence interval constructed from a single sample either covers the true parameter or it does not.
"We are unable to predict the result of any single observation before we have made it, but we can predict, with very considerable accuracy, the result of a long series." (Weldon 1906)
It is incorrect to say: "There is a 95% probability that the estimated interval [a, b] contains the unknown μ", because [a, b] changes from sample to sample, while μ is fixed. Imagine throwing a horseshoe in a dark room.
The $(1-\alpha)100\%$ refers to repetition: if the study were repeated 100 times, then of the 100 resulting $(1-\alpha)100\%$ confidence intervals we would expect $(1-\alpha)100$ to include the population parameter. Your samples from the Framingham Study will show this.
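The long-run reading of a confidence interval can be illustrated with a small simulation. This is only a sketch in Python (the slides themselves use SAS), and the population values mu = 100 and sigma = 15, the function name `coverage`, and the use of a z-interval with sigma treated as known are all assumptions made for illustration; they are not taken from the Framingham exercise.

```python
import random
from statistics import NormalDist

def coverage(mu=100.0, sigma=15.0, n=16, alpha=0.05, reps=10_000, seed=1):
    """Fraction of z-intervals (sigma treated as known) that cover mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    half = z * sigma / n ** 0.5               # same half-width for every sample
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half <= mu <= xbar + half:  # this interval either covers mu or not
            hits += 1
    return hits / reps

cov = coverage()
print(f"proportion of intervals covering mu: {cov:.3f}")
```

Each individual interval either covers mu or misses it; only the proportion over many repetitions is close to 0.95.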

6 Why not just construct a 100% confidence interval?

7 Sample size for estimating a population mean
Mindset: sample size estimation is used to distinguish between n and 3n, not between n and n + 3.

8 Since the CI is given by $\bar{X} \pm Z_{1-\alpha/2}\,\sigma/\sqrt{n}$, the uncertainty is $Z_{1-\alpha/2}\,\sigma/\sqrt{n}$. Denote it as E, i.e.
$E = Z_{1-\alpha/2}\, S/\sqrt{n}$
Thus
$n = \left( \frac{Z_{1-\alpha/2}\,\sigma}{E} \right)^2$
σ and E must be given by the researcher: from the literature, gut feeling, etc. Sometimes use σ = range/4.
The probability of achieving the target precision is only 50%.

9 Example 5.7 (p. 188). Hospital administration wants to estimate the mean time it takes for patients to get from one department to another. The margin of error is 5 minutes with 95% confidence. How big a sample does it need? Do a pilot study if there is nothing to rely on. Here σ = 17 from a pilot, thus
$n = \left( \frac{Z_{1-\alpha/2}\, S}{E} \right)^2 = \left( \frac{1.96 \times 17}{5} \right)^2 = 44.4$, which is rounded up to n = 45.
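The arithmetic of Example 5.7 can be checked with a short Python sketch. The function name `n_for_halfwidth` is hypothetical, and the calculation uses the normal quantile as in the formula above, so it need not agree exactly with the t-based PROC POWER run.

```python
import math
from statistics import NormalDist

def n_for_halfwidth(sigma, E, alpha=0.05):
    """Smallest n with z_{1-alpha/2} * sigma / sqrt(n) <= E (normal-based formula)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / E) ** 2)   # always round up to a whole subject

n = n_for_halfwidth(sigma=17, E=5)
print(n)
```

Halving E roughly quadruples n, which is why the planning question is about n versus 3n rather than n versus n + 3.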

10
options ls=64 nocenter;
proc power;
  onesamplemeans ci=t
    alpha = 0.05
    halfwidth = 5
    stddev = 17
    probwidth = 0.5
    ntotal = .;
run;
probwidth = desired probability of achieving the target precision

11 The POWER Procedure
Confidence Interval for Mean

Fixed Scenario Elements
Distribution           Normal
Method                 Exact
Alpha                  0.05
CI Half-Width          5
Standard Deviation     17
Nominal Prob(Width)    0.5
Number of Sides        2
Prob Type              Conditional

Computed N Total
Actual Prob(Width)     N Total

12 One afternoon at Rothamsted, Ronald A. Fisher poured a cup of tea and offered it to the woman standing beside him. She refused, remarking that she preferred milk to be in the cup before the tea was added. Fisher could not believe that there could be any difference in the taste, and so a trial was conducted. The woman correctly identified more than enough of those cups into which tea had been poured first to prove her case.

13 Assume 10 cups of tea were made without the woman knowing how they were made. The woman correctly identified 9 cups. Did she simply guess correctly, or is there indeed a difference in taste?
If the woman was guessing, each cup is a 50/50 chance, which gives $H_0: p = 0.5$ and
$\Pr(X = 9) = \frac{10!}{9!\,(10-9)!}\,(0.5)^{9}\,(1-0.5)^{10-9} = 0.0098$
Another piece: it would also count as evidence if the woman had identified all 10,
$\Pr(X = 10) = \frac{10!}{10!\,(10-10)!}\,(0.5)^{10}\,(1-0.5)^{10-10} = 0.0010$
Thus, if the woman had been guessing, the probability of correctly identifying at least 9 out of 10 just by chance is $0.0098 + 0.0010 \approx 0.011$.
This probability, of getting the observed result or one more extreme, is called the p-value.
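The tea-tasting p-value is a binomial upper tail, which a few lines of Python can reproduce. The helper names `binom_pmf` and `upper_tail` are hypothetical; only the numbers 9, 10 and 0.5 come from the slide.

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def upper_tail(k_obs, n, p):
    """Pr(X >= k_obs): the observed result or anything more extreme."""
    return sum(binom_pmf(k, n, p) for k in range(k_obs, n + 1))

p_value = upper_tail(9, 10, 0.5)   # 9 or 10 correct cups under pure guessing
print(f"Pr(X=9) = {binom_pmf(9, 10, 0.5):.4f}, "
      f"Pr(X=10) = {binom_pmf(10, 10, 0.5):.4f}, p-value = {p_value:.4f}")
```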

14 $p = \Pr(\text{observed} \mid H_0)$. It is NOT $\Pr(H_0 \mid \text{observed})$.

15 Hypothesis testing
Researchers always have some hypothesis, e.g., diabetics have raised BP, oral contraceptives may cause breast cancer, etc. Can we prove a hypothesis? No, one can always think of cases which have not yet arisen. Thus we set out to disprove a hypothesis; this is what we call hypothesis testing.

16 Three steps: Choose a significance level, α, of the test (also called the false positive error rate we are willing to accept); pretend the null hypothesis is true (so that we have a distribution as a benchmark), conduct the study, observe the data and compute the p-value; compare p and α and make a decision, reject $H_0$ or do not reject $H_0$.
α is selected before the study begins; p is calculated after the study.

17 What is the p-value? Suppose a study observed a test statistic of 2.05 and the p-value for testing $H_0: \mu = 0$ is 0.04. If we replicate the study 100 times and $H_0$ is true, then in 4 of these 100 studies we will have a statistic at least as extreme as 2.05.
In terms of conditional probability, $p = \Pr(\text{Data} \mid H_0)$. We know that $\Pr(\text{Data} \mid H_0) \neq \Pr(H_0 \mid \text{Data})$. Therefore, the p-value is NOT the probability of $H_0$ being true.

18 To repeat the message
In research, most of the time we collect evidence against $H_0$ (just like a prosecutor in a trial); we do NOT prove $H_0$. When our p-value is larger than 5%, we say we do not have sufficient evidence to suggest $H_1$, but we NEVER say we showed no effect or we proved $H_0$.
Hartung et al. Absence of evidence is not evidence of absence. Anesthesiology 58:
Donald Rumsfeld knows this, so should you. See

19 Example 5.9 (p. 196). The population mean cholesterol for males 50 years old is μ = 241. We wish to see if a modified diet could reduce it. n = 12 people are on the diet for 3 months. Set α = 0.05 and $H_0: \mu = 241$ versus $H_1: \mu < 241$. $\bar{X} = 235$ and S = 12.5. Assuming cholesterol is normally distributed,
$T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t_{12-1}$
$T = \frac{235 - 241}{12.5/\sqrt{12}} = -1.66 > t_{0.05,\,12-1} = -1.796$
Do not reject $H_0$. p = 0.063.
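The test statistic of Example 5.9 can be recomputed in a few lines of Python. This is a sketch: the function name `one_sample_t` is hypothetical, and the critical value -1.796 is the tabulated lower 5% point of the t distribution with 11 degrees of freedom, entered by hand because the Python standard library has no t quantile function.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """One-sample t statistic (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

T_CRIT = -1.796                         # assumed table value: lower 5% point, 11 df
t_stat = one_sample_t(xbar=235, mu0=241, s=12.5, n=12)
reject = t_stat < T_CRIT                # left-tailed test, H1: mu < 241
print(f"T = {t_stat:.2f}, reject H0: {reject}")
```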

20 Example (p. 198). Male entry-level salary is μ 0 = $ Wish to see if female entry salary is significantly different from this. Take a sample of size 10 and the observations are 1000; Set α = 0.05. Assuming a normal distribution,
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t_{10-1}$
$T = \frac{\bar{x} - \mu_0}{s/\sqrt{10}} = 1.02$
If $H_0: \mu = \mu_0$ is true, p =

21 SAS program
options nocenter ls=80 ps=100;
data salary;
  input salary @@;   /* @@ lets several observations share one data line */
  cards;
;
proc print;
proc ttest H0=29.5;  /* two-sided test of H0: mean = 29.5 */
run;

22 The SAS System    22:45 Saturday, September 24,
The TTEST Procedure

Statistics
Variable   N   Lower CL Mean   Mean   Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev   Std Err   Minimum   Maximum
salary

T-Tests
Variable   DF   t Value   Pr > |t|
salary

23 Sample size estimation
For a two-sided test,
$n = \left( \frac{Z_{1-\alpha/2} + Z_{1-\beta}}{(\mu_1 - \mu_0)/\sigma} \right)^2$,
where $1 - \beta$ is the power of the test, i.e., the probability of detecting a difference if such a difference does exist.
For a one-sided test,
$n = \left( \frac{Z_{1-\alpha} + Z_{1-\beta}}{(\mu_1 - \mu_0)/\sigma} \right)^2$

24 Type I, Type II errors

                  Truth
Decision          H0 is true           H0 is not true
Reject            Type I error (α)     Power (1 − β)
Don't reject                           Type II error (β)

Is there a Type III error?

                  Status
Test result       No disease (D−)      Disease (D+)
T+
T−

Pr(T+ | D+) = Sensitivity
Pr(T− | D−) = Specificity

25 We obtained Pr(D+ | T+) (or Pr(D− | T−)) using the knowledge of Pr(D), Pr(T+ | D+) and Pr(T− | D−). Question: can we do the same here, i.e., can we obtain Pr(D | T)? In other words, can we use data to obtain the probability of $H_0$ being true? Many people tried that, but

26 Example 5.12 (p. 210). Suppose we wish to conduct a study to test μ = 100 at a 5% level of significance with 80% power. A difference of 5 units would be worthwhile. σ = 9.5.
$n = \left( \frac{Z_{1-\alpha/2} + Z_{1-\beta}}{(\mu_1 - \mu_0)/\sigma} \right)^2 = \left( \frac{1.96 + 0.84}{5/9.5} \right)^2 = 28.3$, which is rounded up to n = 29.
proc power;
  onesamplemeans
    nullm = 100
    mean = 105
    ntotal = .
    stddev = 9.5
    power = .80;
run;
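The normal-based sample size formula for this example can be evaluated with the following Python sketch. The function name `n_for_test` and its `sides` argument are hypothetical. Because PROC POWER works with the t distribution, its computed N may differ somewhat from this z-based figure.

```python
import math
from statistics import NormalDist

def n_for_test(mu0, mu1, sigma, alpha=0.05, power=0.80, sides=2):
    """n = ((z_alpha + z_beta) / effect size)^2, rounded up (normal-based formula)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / sides)        # z_{1-alpha/2} if two-sided, z_{1-alpha} if one-sided
    z_beta = z(power)                     # z_{1-beta}
    effect = abs(mu1 - mu0) / sigma       # standardized difference
    return math.ceil(((z_alpha + z_beta) / effect) ** 2)

n_two = n_for_test(100, 105, 9.5)             # two-sided, alpha = 0.05, power = 0.80
n_one = n_for_test(100, 105, 9.5, sides=1)    # one-sided version of the same design
print(n_two, n_one)
```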

27 The POWER Procedure
One-sample t Test for Mean

Fixed Scenario Elements
Distribution          Normal
Method                Exact
Null Mean             100
Mean                  105
Standard Deviation    9.5
Nominal Power         0.8
Number of Sides       2
Alpha                 0.05

Computed N Total
Actual Power          N Total

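28 Power calculation
For a two-sided test,
$1 - \beta = \Pr\left( Z > Z_{1-\alpha/2} - \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}} \right)$
For a one-sided test,
$1 - \beta = \Pr\left( Z > Z_{1-\alpha} - \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}} \right)$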

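29 Example 5.11 (p. 208). $\mu_0 = 80$ and $\mu_1 = 85$. α = 5%. Two-sided test. n = 20 and σ = 9.5.
$Z_{1-\alpha/2} - \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}} = 1.96 - \frac{85 - 80}{9.5/\sqrt{20}} = -0.40$
$\Pr(Z > -0.40) = 1 - \Pr(Z < -0.40) = 1 - 0.3446 = 0.6554$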
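The normal-approximation power of Example 5.11 can be recomputed as below. This is a Python sketch with a hypothetical function name `power_z`. It keeps more decimals than the hand calculation, which rounds the cutoff to -0.40, and it is not the exact t-based method that PROC POWER reports under "One-sample t Test for Mean", so small differences between the three figures are expected.

```python
from statistics import NormalDist

def power_z(mu0, mu1, sigma, n, alpha=0.05, sides=2):
    """Pr(Z > z_crit - |mu1 - mu0| / (sigma / sqrt(n))) under the normal approximation."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / sides)
    shift = abs(mu1 - mu0) / (sigma / n ** 0.5)
    return 1 - nd.cdf(z_crit - shift)

pw = power_z(mu0=80, mu1=85, sigma=9.5, n=20)
print(f"approximate power: {pw:.3f}")
```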

30
proc power;
  onesamplemeans
    nullm = 80
    mean = 85
    ntotal = 20
    stddev = 9.5
    power = .;
run;

31 The POWER Procedure
One-sample t Test for Mean

Fixed Scenario Elements
Distribution          Normal
Method                Exact
Null Mean             80
Mean                  85
Standard Deviation    9.5
Total Sample Size     20
Number of Sides       2
Alpha                 0.05

Computed Power
Power                 0.608

32 The relationship between confidence interval and hypothesis testing: If a $(1-\alpha)100\%$ confidence interval contains the null hypothesis value, then the two-sided test does not reject the null hypothesis at the α level. This means that one can read off the hypothesis testing results by looking at a confidence interval.
Suppose your confidence interval for μ is (−0.2, 0.5) and you want to test $H_0: \mu = 0$: the interval contains 0, so don't reject $H_0$. The conclusion of testing $H_0: \mu = 0.51$ is just as clear: 0.51 lies outside the interval, so reject. In fact, you can do infinitely many tests with one confidence interval.
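Reading a two-sided test off a confidence interval amounts to a single comparison. A minimal Python sketch, with a hypothetical function name `two_sided_decision`, applied to the interval (-0.2, 0.5) from the slide:

```python
def two_sided_decision(ci, mu0):
    """Two-sided test at the level matching the CI: reject H0 iff mu0 is outside it."""
    lower, upper = ci
    return "do not reject H0" if lower <= mu0 <= upper else "reject H0"

ci = (-0.2, 0.5)
for mu0 in (0.0, 0.51):
    print(f"H0: mu = {mu0}: {two_sided_decision(ci, mu0)}")
```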

33 To look up a t critical value (quantile) with known degrees of freedom and probability, use crit = tinv(prob, df);
To look up a probability with known degrees of freedom and a calculated test statistic, use prob = probt(tcal, df);
data;
  prob = 0.95; df = 12;
  crit = tinv(prob, df);
  tcal = 1.812; df1 = 10;
  prob1 = probt(tcal, df1);
proc print;
run;
Obs   prob   df   crit   tcal   df1   prob1

34 SAS program for Ex 5.7
proc power;
  onesamplemeans ci=t
    alpha = 0.05
    halfwidth =
    stddev = 17
    probwidth = .50
    ntotal = .;
run;

35 The POWER Procedure
Confidence Interval for Mean

Fixed Scenario Elements
Distribution           Normal
Method                 Exact
Alpha                  0.05
Standard Deviation     17
Nominal Prob(Width)    0.5
Number of Sides        2
Prob Type              Conditional

Computed N Total
Index   Half-Width   Actual Prob(Width)   N Total

Considering many scenarios.

36 SAS program for Ex 5.12 (p. 210)
proc power;
  onesamplemeans
    nullmean = 100
    mean =
    sides = 1 2
    alpha = 0.05
    stddev = 9.5
    power =
    ntotal = .;
run;

37 The POWER Procedure
One-sample t Test for Mean

Fixed Scenario Elements
Distribution          Normal
Method                Exact
Null Mean             100
Alpha                 0.05
Standard Deviation    9.5

Computed N Total
Index   Sides   Mean   Nominal Power   Actual Power   N Total

38 The SAS documentation we have for PROC POWER may contain many errors; the corrected version can be obtained through

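39 The standardized quantity is our test statistic
$T = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} = \sqrt{n}\,\frac{\bar{X} - \mu_0}{S}$
If two-sided, look at Pr > |T|.
For any fixed nonzero difference $\bar{X} - \mu_0$, the larger the n, the smaller the p. With a large enough n, it is impossible not to reject $H_0$.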
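The effect of n on the p-value can be seen numerically. The Python sketch below holds a small observed difference fixed and lets n grow. The difference of 1 unit, S = 10, and the use of the normal distribution in place of the t distribution are assumptions made for illustration only.

```python
from statistics import NormalDist

def two_sided_p(diff, s, n):
    """Two-sided p-value for T = sqrt(n) * diff / s, using a normal approximation."""
    t = n ** 0.5 * diff / s
    return 2 * (1 - NormalDist().cdf(abs(t)))

sizes = (10, 100, 1000, 10000)
ps = [two_sided_p(diff=1.0, s=10.0, n=n) for n in sizes]
for n, p in zip(sizes, ps):
    print(f"n = {n:>5}: p = {p:.4f}")
```

The same tiny difference moves from clearly non-significant to overwhelmingly significant purely because n increases.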

40 Confidence Interval and Significance Testing
In theory, they are closely related. The confidence interval approach uses the sample statistic to find out which parameter values make this observed statistic most plausible. Significance testing fixes a parameter value and asks which sample statistics are consistent with the fixed parameter value.

41 Recall
Lower limit (L): the lowest parameter value that could make the observed $\bar{x}$ the 97.5% quantile cutoff point, i.e., a right-tail test.
$\frac{\bar{x} - L}{S/\sqrt{n}} = z_{0.975} \;\Rightarrow\; L = \bar{x} - 1.96\,S/\sqrt{n}$
Upper limit (U): the highest parameter value that could make the observed $\bar{x}$ the 2.5% quantile cutoff point, i.e., a left-tail test.
$\frac{\bar{x} - U}{S/\sqrt{n}} = z_{0.025} \;\Rightarrow\; U = \bar{x} + 1.96\,S/\sqrt{n}$
The values of the parameter inside the 95% confidence interval are precisely those which would not be contradicted by a two-sided test at the 5% level.

42 It is a coincidence that L and U are symmetric about $\bar{x}$ in this simplest case. In general L and U are asymmetric about the sample estimate, just as our faces are usually asymmetric about our noses. $\bar{x} - L$ and $U - \bar{x}$ are called margins of error.

43 Validity of a statistical procedure
In practice, one data set cannot tell you if the procedure is valid. One can use theory, a simulation study, or both to justify it.

44 If we want to know whether a sample size of 10 could make a confidence interval procedure for an exponential mean valid, we could draw 10,000 samples from an exponential distribution, each having 10 observations; use the procedure to construct a 95% CI from each sample, resulting in 10,000 CIs; count how many of these CIs cover the true mean. If the count is close to 9500, then the procedure is valid; otherwise, it is not.
If we want to know whether a hypothesis testing procedure is valid when the sample size is 10, draw repeated samples from a normal distribution, each having 10 observations; use the procedure to test the hypothesis with each sample at the 5% level, resulting in one conclusion per sample (either reject or not reject); count how many rejections there are. If the rejection rate is close to 5%, then the procedure is valid; otherwise it is not.
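Both validity checks can be sketched with one simulation routine. The Python code below is an illustration under stated assumptions: the procedure examined is the ordinary t-interval, the exponential distribution has mean 1, the normal distribution is standard, 10,000 repetitions are used in both checks, and 2.262 is the tabulated 97.5% point of the t distribution with 9 degrees of freedom. The function name `t_interval_coverage` is hypothetical.

```python
import math
import random

T_975_DF9 = 2.262   # assumed table value: 97.5% point of t with 9 df

def t_interval_coverage(draw, true_mean, n=10, reps=10_000, seed=2):
    """Fraction of 95% t-intervals, each from n draws, that cover true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [draw(rng) for _ in range(n)]
        m = sum(x) / n
        s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
        if abs(m - true_mean) <= T_975_DF9 * s / math.sqrt(n):
            hits += 1
    return hits / reps

# CI check: exponential data with true mean 1.
cov_exp = t_interval_coverage(lambda r: r.expovariate(1.0), true_mean=1.0)

# Test check: under H0 with normal data, the two-sided 5% t-test rejects
# exactly when the 95% t-interval misses the true mean.
cov_norm = t_interval_coverage(lambda r: r.gauss(0.0, 1.0), true_mean=0.0)
reject_rate = 1 - cov_norm

print(f"exponential coverage: {cov_exp:.3f}")
print(f"normal-data rejection rate: {reject_rate:.3f}")
```

A coverage well below 0.95 for the exponential case would indicate that n = 10 is too small for this particular procedure, while a rejection rate near 0.05 for normal data supports the test.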
