E509A: Principle of Biostatistics (Week 4: Inference for a single mean ) GY Zou gzou@srobarts.ca
Example 5.4 (p. 183). A random sample of n = 16 has mean I.Q. 106 with standard deviation S = 12.4. What is the 95% CI? 106 ± t_{1-0.05/2, 16-1} × 12.4/√16 = (99.4, 112.6). Example 5.5 (p. 184). A random sample of n = 65 has a mean number of visits over a 3-year period of 16 with S = 1.4. What is the 99% CI? 16 ± z_{1-0.01/2} × 1.4/√65 = (15.5, 16.5).
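These notes use SAS; as an illustrative sketch, the two interval calculations above can be reproduced in Python. The helper name `ci_mean` is mine, and the t critical value t_{0.975,15} = 2.1314 is taken from a t table (the standard library has no t quantile function).

```python
from math import sqrt
from statistics import NormalDist

def ci_mean(xbar, s, n, crit):
    """CI for a mean from summary statistics: xbar +/- crit * s / sqrt(n)."""
    half = crit * s / sqrt(n)
    return (xbar - half, xbar + half)

# Example 5.4: t-based 95% CI; t_{0.975,15} = 2.1314 (from a t table)
lo, hi = ci_mean(106, 12.4, 16, 2.1314)
print(round(lo, 1), round(hi, 1))          # 99.4 112.6

# Example 5.5: z-based 99% CI; z_{0.995} from the normal quantile function
z = NormalDist().inv_cdf(1 - 0.01 / 2)     # approx. 2.576
lo2, hi2 = ci_mean(16, 1.4, 65, z)
print(round(lo2, 2), round(hi2, 2))        # approx. (15.55, 16.45), i.e. (15.5, 16.5)
```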
meaning of confidence interval A confidence interval constructed from a single sample either covers or does not cover the true parameter. "We are unable to predict the result of any single observation before we have made it, but we can predict, with very considerable accuracy, the result of a long series." (Weldon 1906). It is incorrect to say: "There is a 95% probability that the estimated interval [a, b] contains the unknown μ," because [a, b] changes from sample to sample, while μ is fixed. Imagine throwing a horseshoe in a dark room. The (1 − α)100% refers to repetition: if the study were repeated 100 times, of the 100 resulting (1 − α)100% confidence intervals, we would expect (1 − α)100 of them to include the population parameter. Your samples from the Framingham Study will show this.
Why not just construct a 100% confidence interval?
Sample size for estimating a population mean Mindset: sample size estimation is meant to distinguish n from 3n, not n from n + 3.
Since the CI is given by X̄ ± z_{1-α/2} σ/√n, the uncertainty is z_{1-α/2} σ/√n; denote it by E, i.e., E = z_{1-α/2} σ/√n. Thus n = (z_{1-α/2} σ / E)². σ and E must be supplied by the researcher: from the literature, gut feeling, etc.; sometimes σ = range/4 is used. The probability of achieving the target precision is only 50%.
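This sample-size formula is easy to compute directly. A minimal Python sketch (the function name `n_for_halfwidth` is mine); it reproduces Example 5.7 below:

```python
from math import ceil
from statistics import NormalDist

def n_for_halfwidth(sigma, E, alpha=0.05):
    """Smallest n so the z-based CI half-width is at most E:
    n = (z_{1-alpha/2} * sigma / E)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / E) ** 2)

n = n_for_halfwidth(sigma=17, E=5)   # Example 5.7: (1.96 * 17 / 5)^2 = 44.4
print(n)                             # 45
```

Note this n only gives a 50% chance of actually achieving the target half-width, which is why PROC POWER's probwidth option matters.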
Example 5.7 (p. 188). A hospital administration wants to estimate the mean time it takes patients to get from one department to another, with a margin of error of 5 minutes at 95% confidence. How big a sample does it need? Do a pilot study if there is nothing else to rely on. Here σ = 17 from a pilot, thus n = (z_{1-α/2} σ / E)² = (1.96 × 17 / 5)² = 44.4 → 45.
options ls=64 nocenter;
proc power;
  onesamplemeans ci=t
    alpha = 0.05
    halfwidth = 5
    stddev = 17
    probwidth = 0.5
    ntotal = .;
run;

probwidth = desired probability of achieving the target precision
The POWER Procedure
Confidence Interval for Mean

Fixed Scenario Elements
Distribution          Normal
Method                Exact
Alpha                 0.05
CI Half-Width         5
Standard Deviation    17
Nominal Prob(Width)   0.5
Number of Sides       2
Prob Type             Conditional

Computed N Total
Actual Prob(Width)    N Total
0.525                 47
One afternoon at Rothamsted, Ronald A. Fisher poured a cup of tea and offered it to the woman standing beside him. She refused, remarking that she preferred the milk to be in the cup before the tea was added. Fisher could not believe there could be any difference in taste, so a trial was conducted. The woman correctly identified more than enough of the cups into which tea had been poured first to prove her case.
Assume 10 cups of tea were made without the woman knowing how they were made, and the woman correctly identified 9 cups. Was the woman guessing, or is there indeed a difference in taste? If she was guessing, each cup is a 50/50 chance, which gives H_0: p = 0.5.
Pr(X = 9) = 10!/(9!(10-9)!) × (0.5)^9 (1-0.5)^{10-9} = 0.0098
Another piece: it would also count if the woman had identified all 10:
Pr(X = 10) = 10!/(10!(10-10)!) × (0.5)^{10} (1-0.5)^{10-10} = 0.0010
Thus, if the woman had been guessing, the probability of correctly identifying 9 or more out of 10 just by chance is 0.0098 + 0.0010 = 0.0108. This probability, of getting the observed result or one more extreme, is called the p-value.
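The tail sum above is a binomial upper-tail probability, which can be sketched in Python (the helper name `binom_tail_p` is mine):

```python
from math import comb

def binom_tail_p(k, n, p=0.5):
    """One-sided p-value Pr(X >= k) for X ~ Binomial(n, p):
    sum of binomial probabilities from k through n."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

pval = binom_tail_p(9, 10)   # tea tasting: 9 or more correct out of 10
print(round(pval, 4))        # 0.0107 (the text gets 0.0108 by rounding each term)
```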
p = Pr(observed or more extreme | H_0). It is NOT Pr(H_0 | observed).
Hypothesis testing Researchers always have some hypothesis, e.g., diabetics have raised BP, oral contraceptives may cause breast cancer, etc. Can we prove a hypothesis? No: one can always think of cases which have not yet arisen. Thus we set out to disprove a hypothesis; this is what we call hypothesis testing.
Four steps: 1. Choose a significance level α for the test (also called the false positive error rate we are willing to accept); 2. Pretend the null hypothesis is true (so we have a distribution as a benchmark); 3. Conduct the study, observe the data, and compute the p-value; 4. Compare p with α and make a decision: reject H_0 or do not reject H_0. α is selected before the study begins; p is calculated after the study.
What is the p-value? Suppose a study observed a test statistic of 2.05 and the p-value for testing H_0: μ = 0 is 0.04. If we were to replicate the study 100 times and H_0 is true, then in about 4 of these 100 studies we would observe a statistic of at least 2.05. In terms of conditional probability, p = Pr(Data | H_0). We know that Pr(Data | H_0) ≠ Pr(H_0 | Data). Therefore, the p-value is NOT the probability of H_0 being true.
To repeat the message In research, most of the time we collect evidence against H_0 (just like a prosecutor in a trial); we do NOT prove H_0. When our p-value is larger than 5%, we say we do not have sufficient evidence to support H_1, but NEVER say we showed no effect or proved H_0. Hartung et al. 1983. Absence of evidence is not evidence of absence. Anesthesiology 58: 298-300. Donald Rumsfeld knows this; so should you. See http://www.defenselink.mil/news/feb2002/t02122002_t212sdv2.html
Example 5.9 (p. 196). The population mean cholesterol for males aged 50 is μ = 241. We wish to see if a modified diet could reduce it. n = 12 people go on the diet for 3 months. Set α = 0.05 and H_0: μ = 241 versus H_1: μ < 241. X̄ = 235 and S = 12.5. Assuming cholesterol is normally distributed,
T = (x̄ − μ_0)/(S/√n) ~ t_{12−1}
T = (235 − 241)/(12.5/√12) = −1.66 > −t_{0.95,12−1} = −1.796
Do not reject H_0. p = 0.063.
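The test statistic in Example 5.9 is a one-line computation. An illustrative Python sketch (the function name `t_stat` is mine; the critical value −1.796 is the tabled −t_{0.95,11}):

```python
from math import sqrt

def t_stat(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n)), with df = n - 1."""
    return (xbar - mu0) / (s / sqrt(n))

t = t_stat(235, 241, 12.5, 12)   # Example 5.9
print(round(t, 2))               # -1.66; compare with -t_{0.95,11} = -1.796
```

Since −1.66 > −1.796, the statistic does not fall in the rejection region, so H_0 is not rejected.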
Example 5.10 (p. 198). The male entry-level salary is μ_0 = $29,500. We wish to see if the female entry-level salary is significantly different from this. Take a sample of size 10; the observations (in $1000) are: 32 27 31 27 26 26 30 22 25 36. Set α = 0.05. Assuming a normal distribution,
T = (x̄ − μ_0)/(S/√n) ~ t_{10−1}
T = (28.2 − 29.5)/√(16.4/10) = −1.02
If H_0: μ = 29.5 (×$1000) is true, p = 0.3366.
SAS program
options nocenter ls=80 ps=100;
data salary;
  input salary @@;
cards;
32 27 31 27 26 26 30 22 25 36
;
proc print;
proc ttest H0=29.5;
run;
The SAS System    22:45 Saturday, September 24, 2005

The TTEST Procedure

Statistics
Variable  N   Lower CL Mean  Mean  Upper CL Mean  Lower CL Std Dev  Std Dev  Upper CL Std Dev  Std Err  Minimum  Maximum
salary    10  25.303         28.2  31.097         2.7855            4.0497   7.3932            1.2806   22       36

T-Tests
Variable  DF  t Value  Pr > |t|
salary    9   -1.02    0.3366
Sample size estimation For a two-sided test,
n = ( (z_{1−α/2} + z_{1−β}) / ((μ_1 − μ_0)/σ) )²,
where 1 − β is the power of the test, i.e., the probability of detecting a difference if such a difference does exist. For a one-sided test,
n = ( (z_{1−α} + z_{1−β}) / ((μ_1 − μ_0)/σ) )²
Type I and Type II errors

                 Truth
Decision         H_0 is true          H_0 is not true
Reject           Type I error (α)     Power (1 − β)
Don't reject     Correct              Type II error (β)

Is there a Type III error? Compare with diagnostic testing:

                 Status
Test result      No disease (D−)      Disease (D+)
T+
T−

Pr(T+ | D+) = Sensitivity, Pr(T− | D−) = Specificity
We obtained Pr(D+ | T+) (or Pr(D− | T−)) using knowledge of Pr(D), Pr(T+ | D+), and Pr(T− | D−). Question: can we do the same here, i.e., can we obtain Pr(H_0 | Data)? In other words, can we use data to obtain the probability of H_0 being true? Many people have tried, but ...
Example 5.14 (p. 210). Suppose we wish to conduct a study to test μ = 100 at a 5% level of significance with 80% power. A difference of 5 units would be worthwhile. σ = 9.5.
n = ( (z_{1−α/2} + z_{1−β}) / ((μ_1 − μ_0)/σ) )² = ( (1.96 + 0.84) / (5/9.5) )² = 28.33 → 29

proc power;
  onesamplemeans
    nullm = 100
    mean = 105
    ntotal = .
    stddev = 9.5
    power = .80;
run;
The POWER Procedure
One-Sample t Test for Mean

Fixed Scenario Elements
Distribution          Normal
Method                Exact
Null Mean             100
Mean                  105
Standard Deviation    9.5
Nominal Power         0.8
Number of Sides       2
Alpha                 0.05

Computed N Total
Actual Power   N Total
0.809          31
Power calculation For a two-sided test,
1 − β = Pr( Z > z_{1−α/2} − (μ_1 − μ_0)/(σ/√n) )
For a one-sided test,
1 − β = Pr( Z > z_{1−α} − (μ_1 − μ_0)/(σ/√n) )
Example 5.11 (p. 208). μ_0 = 80 and μ_1 = 85. α = 5%. Two-sided test. n = 20 and σ = 9.5.
z_{1−α/2} − (μ_1 − μ_0)/(σ/√n) = 1.96 − 5/(9.5/√20) = −0.40
Pr(Z > −0.40) = 1 − Pr(Z < −0.40) = 1 − 0.3446 = 0.6554
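The power formula can likewise be sketched in Python (the function name `power_two_sided` is mine). Working with the unrounded z value gives about 0.653; the text gets 0.6554 by rounding the shifted z to −0.40:

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided(mu0, mu1, sigma, n, alpha=0.05):
    """Normal-approximation power of a two-sided test:
    1 - beta = Pr(Z > z_{1-alpha/2} - (mu1 - mu0)/(sigma/sqrt(n)))."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    return 1 - nd.cdf(z_crit - shift)

pw = power_two_sided(80, 85, 9.5, 20)   # Example 5.11
print(round(pw, 3))                     # approx. 0.653
```

The exact t-based answer from PROC POWER (0.608) is lower, since the t test must estimate σ from only 20 observations.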
proc power;
  onesamplemeans
    nullm = 80
    mean = 85
    ntotal = 20
    stddev = 9.5
    power = .;
run;
The POWER Procedure
One-Sample t Test for Mean

Fixed Scenario Elements
Distribution          Normal
Method                Exact
Null Mean             80
Mean                  85
Standard Deviation    9.5
Total Sample Size     20
Number of Sides       2
Alpha                 0.05

Computed Power
Power   0.608
The relationship between confidence interval and hypothesis testing: if a (1 − α)100% confidence interval contains the null hypothesis value, then the two-sided test does not reject the null hypothesis at the α level. This means one can read off hypothesis testing results by looking at a confidence interval. Suppose your confidence interval for μ is (−0.2, 0.5) and you want to test H_0: μ = 0: do not reject H_0. It is also clear what the conclusion is for testing H_0: μ = 0.51. In fact, you can do infinitely many tests with one confidence interval.
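This read-off rule is mechanical, as a tiny Python sketch shows (the function name `reject_from_ci` is mine):

```python
def reject_from_ci(ci, mu0):
    """Two-sided test read off a CI: reject H0: mu = mu0
    exactly when mu0 falls outside the interval."""
    lo, hi = ci
    return not (lo <= mu0 <= hi)

ci = (-0.2, 0.5)
print(reject_from_ci(ci, 0.0))    # False: 0 is inside, do not reject
print(reject_from_ci(ci, 0.51))   # True: 0.51 is outside, reject
```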
To look up a t critical value (quantile) with known degrees of freedom and probability, use crit = tinv(prob, df); to look up a probability with known degrees of freedom and a calculated test statistic, use prob = probt(tcal, df);

data;
  prob = 0.95; df = 12;
  crit = tinv(prob, df);
  tcal = 1.812; df1 = 10;
  prob1 = probt(tcal, df1);
proc print;
run;

Obs  prob  df  crit     tcal   df1  prob1
1    0.95  12  1.78229  1.812  10   0.94996
SAS program for Ex 5.7
proc power;
  onesamplemeans ci=t
    alpha = 0.05
    halfwidth = 1 2 3 4 5 10
    stddev = 17
    probwidth = .50
    ntotal = .;
run;
The POWER Procedure
Confidence Interval for Mean

Fixed Scenario Elements
Distribution          Normal
Method                Exact
Alpha                 0.05
Standard Deviation    17
Nominal Prob(Width)   0.5
Number of Sides       2
Prob Type             Conditional

Computed N Total
Index  Half-Width  Actual Prob(Width)  N Total
1      1           0.507               1113
2      2           0.508               280
3      3           0.516               126
4      4           0.521               72
5      5           0.525               47
6      10          0.574               14

One run considers many scenarios.
SAS program for Ex 5.12 (p. 210)
proc power;
  onesamplemeans
    nullmean = 100
    mean = 105 103
    sides = 1 2
    alpha = 0.05
    stddev = 9.5
    power = .80 .90
    ntotal = .;
run;
The POWER Procedure
One-Sample t Test for Mean

Fixed Scenario Elements
Distribution          Normal
Method                Exact
Null Mean             100
Alpha                 0.05
Standard Deviation    9.5

Computed N Total
Index  Sides  Mean  Nominal Power  Actual Power  N Total
1      1      105   0.8            0.804         24
2      1      105   0.9            0.906         33
3      1      103   0.8            0.803         64
4      1      103   0.9            0.902         88
5      2      105   0.8            0.809         31
6      2      105   0.9            0.901         40
7      2      103   0.8            0.802         81
8      2      103   0.9            0.902         108
The SAS documentation we have for PROC POWER may contain many errors; a corrected version can be obtained from http://ftp.sas.com/techsup/download/stat/power.pdf
The standardized quantity is our test statistic:
T = (x̄ − μ_0)/(S/√n) = √n (X̄ − μ_0)/S
If two-sided, look at Pr > |T|. The larger the n, the smaller the p: with a large enough n, it becomes almost impossible not to reject H_0.
Confidence Interval and Significance Testing In theory, they are closely related. The confidence interval approach uses the sample statistic to find out which parameter values make the observed statistic plausible; significance testing fixes a parameter value and asks which sample statistics are consistent with that fixed value.
Recall Lower limit (L): the lowest parameter value that would make the observed x̄ the 97.5% quantile cutoff point, i.e., a right-tail test:
(x̄ − L)/(S/√n) = z_{0.975} ⟹ L = x̄ − 1.96 S/√n
Upper limit (U): the highest parameter value that would make the observed x̄ the 2.5% quantile cutoff point, i.e., a left-tail test:
(x̄ − U)/(S/√n) = z_{0.025} ⟹ U = x̄ + 1.96 S/√n
The values of the parameter inside the 95% confidence interval are precisely those which would not be contradicted by a two-sided test at the 5% level.
It is a coincidence that L and U are symmetric about x̄ in this simplest case. In general, L and U are asymmetric about the sample estimate, just as our faces are usually asymmetric about our noses. x̄ − L and U − x̄ are called margins of error.
Validity of a statistical procedure In practice, one data set cannot tell you whether a procedure is valid. One can use theory, a simulation study, or both to justify it.
If we want to know whether a sample size of 10 makes a confidence interval procedure for an exponential mean valid, we could: draw 10,000 samples from an exponential distribution, each with 10 observations; use the procedure to construct a 95% CI from each sample, resulting in 10,000 CIs; and count how many of these 10,000 CIs cover the true mean. If the count is close to 9,500, the procedure is valid; otherwise it is not. Similarly, to check whether a hypothesis testing procedure is valid when the sample size is 10: draw 10,000 samples from a normal distribution, each with 10 observations; use the procedure to test the hypothesis with each sample at the 5% level, resulting in 10,000 conclusions (reject or not reject); and count the rejections. If the rejection rate is close to 5%, the procedure is valid; otherwise it is not.
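The first simulation recipe can be sketched in Python (the function name `coverage_t_ci` and the choice of seed are mine; the t critical value t_{0.975,9} = 2.262 is from a t table). Applying the usual t-interval to skewed exponential data with n = 10 illustrates exactly the kind of invalidity the recipe is meant to detect:

```python
import random
from math import sqrt
from statistics import mean, stdev

def coverage_t_ci(n=10, reps=10000, true_mean=1.0, seed=1):
    """Simulated coverage of the nominal 95% t-interval for an
    exponential mean: fraction of reps whose CI covers true_mean."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    t_crit = 2.262                       # t_{0.975, 9} from a t table
    hits = 0
    for _ in range(reps):
        x = [rng.expovariate(1 / true_mean) for _ in range(n)]
        xbar, s = mean(x), stdev(x)
        half = t_crit * s / sqrt(n)
        if xbar - half <= true_mean <= xbar + half:
            hits += 1
    return hits / reps

cov = coverage_t_ci()
print(cov)   # noticeably below the nominal 0.95 for skewed data with n = 10
```

A coverage estimate well under 0.95 tells us the t-interval is not valid for this distribution at this sample size; repeating the experiment with normal data instead would give coverage close to 0.95.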