Biostatistics VHM 802 Course Winter 2016, Atlantic Veterinary College, PEI Henrik Stryhn. Index of Lecture 13


Index of Lecture 13

Page  Title
1     Practical information
2     Intro sample size issues
3     Statistical methods for sample size
4     Sample size based on estimation precision
5     Errors of type I-II and power
6     Sample size based on power or effect size
7     Non-central distributions
8     Two-way ANOVA with no interaction
9     Calculations for 2-way ANOVA example
10    Unequal sample sizes
11    Sample size misconceptions
12    Equivalence and non-inferiority testing in two-sample situations
13    Methods for equivalence analysis
14    Project presentations
15    Project reports

L13-0

Practical Information

Today's lecture: power and sample size for experimental studies
- partly well-known material from VHM 801, plus:
  - two-way ANOVA with/without interaction,
  - material on equivalence/non-inferiority studies,
- software demonstrations (Minitab and Stata) using interactive menus (no do-file!),
- revisiting complex experimental design [1] (L10 7/10).

Reading: GO Chapter 7 [2] and Section 10.3 (brief), suppl. "Notes on sample size calculations" (not part of course curriculum because mostly covered in textbook).

Schedule:
- next lecture (April 7): project presentations,
- last lecture (April 14): course wrap-up and course evaluation,
- deadlines: last home assignment (April 4), project report (April 8).

[1] Hasse diagrams not in course curriculum, see GO p. 296 for a summary.
[2] Discussion about non-central F-distributions and power curves may be skipped because our focus is on using software for power calculations.

L13-1

Intro Sample Size Issues

Factors affecting the size of an experimental study:
- # treatments to compare (incl. control),
- # factors and, for each factor, the number of levels,
- experimental design, e.g. block sizes in a block design,
- cost per experimental unit,
- management of experiment (e.g., size limitations).

Statistical considerations:
- size should be sufficient to detect (obtain statistical significance for) treatment differences of interest,
- avoid waste of experimental units,
- reduce sensitivity to errors (by taking replications).

Textbook example (GO 7.1-5, p. 150 ff.): patients with 3 different types of neurological diseases,
- VOR measurements as indicators of disease status,
- preliminary data (log scale): group means 2.82, 3.89 and 3.04, within-group variance 0.075.

Additional notes example: within-subject differences [3] in blood pressure with an assumed standard deviation of 10 mm Hg.

[3] E.g., differences computed from measurements before and after an intervention.

L13-2

Statistical Methods to Choose Sample Size

Fact: all procedures require a pre-decided statistical model and detailed prior knowledge (estimates or guesses) about the outcomes:
- size of effects or precision of interest,
- standard deviation of observations (for normal data [4]).

Overview of approaches for determining sample size:
- from desired precision (standard error, size of 95% CI) on selected estimate (treatment mean, contrast),
- from effect of interest, using Cox's rule (practical rule),
- from desired power of test for effect of interest, preferably using statistical software:
  - Minitab/Stata (or websites) for basic designs,
  - SAS version 9 for a range of complex designs,
  - specialised software for special/advanced designs [5]:
    - (active researcher): http://www.stat.uiowa.edu/~rlenth/power
    - (stats webpages): http://statpages.org/#power
    - (G*Power = free power calculation software): http://www.gpower.hhu.de/
  - or using simulation (some refs in the notes).

[4] For data with the variance estimated independently of the mean.
[5] Check also Stata's add-on packages (search sample size), and the UCLA intro webpage http://www.ats.ucla.edu/stat/seminars/intro_power/default.htm and the accompanying examples on http://www.ats.ucla.edu/stat/dae/.

L13-3

Sample Size based on Estimation Precision

Normal distribution models (without random effects):
- assume error std.dev. $\sigma$ and an estimate/guess of its value,
- assume parameter of interest $Par$ and estimate $Est$, with standard error $SE(Est) = \sigma A$, where $A = A(n)$ is a known constant (depending on number of obs. $n$),
- approximate [6] 95% CI: $Est \pm 2\sigma A(n)$.

Compute $n$ to achieve desired margin of error (or CI length) by solving with respect to $n$ in the relation: desired value $\ge 2\sigma A(n)$.

Blood pressure example:
- desired margin of error: 3 mm Hg,
- model: $n$ i.i.d. observations (differences!) from $N(\mu, \sigma^2)$,
- $Par$ = population mean, $Est$ = sample mean, $A = 1/\sqrt{n}$,
- solve: $3 \ge 2 \cdot 10/\sqrt{n} \Rightarrow n \ge (2 \cdot 10/3)^2 = 44.4$, so $n = 45$.

VOR example:
- desired CI of length 0.5 (margin of error = 0.25) for mean differences between (any) two groups,
- model: 3 independent samples of size $n$ from normal distributions $N(\mu_i, \sigma^2)$; use $\sigma = s_p = \sqrt{0.075} = 0.2739$,
- $Par = \mu_1 - \mu_2$, $Est = \bar{y}_1 - \bar{y}_2$, $A = \sqrt{2/n}$,
- solve: $0.25 \ge 2 \cdot 0.2739 \sqrt{2/n} \Rightarrow n \ge 2(2 \cdot 0.2739/0.25)^2 = 9.6$, or $n = 10$;
- rerun with 2 replaced by $t(.975, 27) = 2.052$; $n \ge 10.1$; choose $n = 11$ patients per group.

[6] Valid for $\sigma$ unknown and df large (say $\ge 40$), so that $t_{.975}(df) \approx z_{.975} \approx 2$.

L13-4
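
The precision-based calculations on this slide can be reproduced with a few lines of Python. This is only a minimal sketch, assuming $A(n) = \sqrt{a/n}$ with $a = 1$ for a single mean and $a = 2$ for a difference between two group means; the function name `n_for_margin` and its parameters are illustrative and not part of the course material.

```python
import math

def n_for_margin(sigma, margin, a_factor=1.0, multiplier=2.0):
    """Smallest n with multiplier * sigma * sqrt(a_factor / n) <= margin.

    a_factor = 1 for one mean, 2 for a difference of two group means
    (hypothetical helper, following the slide's Est +/- 2*sigma*A(n) rule).
    """
    return math.ceil(a_factor * (multiplier * sigma / margin) ** 2)

# Blood pressure example: sigma = 10 mm Hg, margin of error 3 mm Hg
n_bp = n_for_margin(sigma=10, margin=3)

# VOR example: sigma = sqrt(0.075), margin 0.25, difference of two means
sigma_vor = math.sqrt(0.075)
n_vor = n_for_margin(sigma=sigma_vor, margin=0.25, a_factor=2.0)

# Same, with the multiplier 2 replaced by t(.975, 27) = 2.052 from the slide
n_vor_t = n_for_margin(sigma=sigma_vor, margin=0.25, a_factor=2.0,
                       multiplier=2.052)

print(n_bp, n_vor, n_vor_t)  # 45 10 11
```

Rounding up with `math.ceil` reflects that the margin of error must not exceed the desired value.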

Errors of Type I-II and Power

Errors of type I and II:
- type I error: to reject $H_0$, when $H_0$ in reality true,
- type II error: to not reject $H_0$, when $H_0$ in reality false,
- definition of statistical tests involves only type I errors, which are controlled by the significance level ($\alpha$),
- power of statistical tests involves type II errors (below),
- overview:

  Conclusion        Truth about population
  from sample       H_0 true        H_a true (H_0 false)
  reject H_0        type I error    no error
  not reject H_0    no error        type II error

Power of a statistical test:
- involves a specific alternative, e.g. in the blood pressure example a difference of 3 mm Hg,
- definition: power = probability that the statistical test will reject $H_0$, when the specific alternative $H_a$ is true, = 1 - P(type II error),
- important for planning of experiments: what chance of a significant result?
- difficult to calculate in all models (use software!),
- typical values for planning of a study: 0.8, 0.9, 0.95.

L13-5

Sample Size based on Power or Effect Size

Cox's rule for informal sample size determination: [7]
- assume effect size $m_0 - m_a$, where $m_0$ and $m_a$ are (true, population) values of $Par$ under $H_0$ and $H_a$, respectively,
- rule: choose $n$ by solving: $|m_0 - m_a| = 3\sigma A(n)$,
- note: does not involve power or significance level.

Requirements for sample size calc. based on power: [8]
- statistical model / design and corresponding software,
- size of effect desired to be detected,
- standard deviation of model (normal data),
- desired value of power (0.8, or 80%, commonly used),
- signif. level and direction (one/two-sided) of test used.

Blood pressure example: true difference of 3 mm Hg,
- Cox's rule: $3 = 3 \cdot 10 \sqrt{1/n} \Rightarrow n = 100$,
- Minitab / Stata, power 0.8 & sign. level 0.05: $n = 90$.

VOR example: power for $n = 4$ and sign. level 0.01 with means as observed in preliminary data:
- Textbook / Stata menu: 0.930 (all means),
- Minitab / Stata fpower: 0.896 (maximal difference).

[7] Discussed in Christensen (1996): Analysis of Variance, Design, and Regression.
[8] Post-hoc power calculation (after study has been carried out) is controversial and not recommended, see page L13-11.

L13-6
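
For the blood pressure example, Cox's rule and a power-based calculation can be compared in a short Python sketch. The power-based function below uses the standard normal approximation $n \approx ((z_{1-\alpha/2} + z_{power})\,\sigma/\delta)^2$, which is an assumption made here to stay within the Python standard library; Minitab and Stata use the (non-central) t-distribution, so their answer ($n = 90$) is slightly larger. The function names are illustrative.

```python
import math
from statistics import NormalDist

def cox_n(sigma, effect):
    """Cox's rule for one mean: solve effect = 3 * sigma / sqrt(n)."""
    return math.ceil((3 * sigma / effect) ** 2)

def normal_approx_n(sigma, effect, power=0.80, alpha=0.05):
    """One-sample, two-sided test; normal approximation (not the exact
    t-based calculation done by Minitab/Stata)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil((z_sum * sigma / effect) ** 2)

n_cox = cox_n(sigma=10, effect=3)
n_power = normal_approx_n(sigma=10, effect=3)

print(n_cox, n_power)  # 100 88
```

The approximation gives 88 rather than 90 because it ignores the extra uncertainty from estimating $\sigma$.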

Non-Central Distributions

Several of the common ref. distrib. for tests ($t$, $\chi^2$, $F$ distrib.) have a non-centrality parameter ($\zeta$):
- $\zeta = 0$: usual ref. distrib. under null hypothesis $H_0$,
- $\zeta \ne 0$: distrib. of test statistic under specific alternative hypothesis $H_a$, where $\zeta$ measures the deviation from $H_0$,
- power is computed as tail areas in distrib. with $\zeta \ne 0$,
- example: non-central F-distribution in ANOVA table, see GO Figure 7.1, where $\zeta = \sum_i n_i \alpha_i^2 / \sigma^2$,
- blood pressure example: non-central t-distribution: $t = \hat\mu / SE(\hat\mu) \sim$ non-central $t$, where $\hat\mu \sim N(\delta, \sigma_{\hat\mu}^2)$ and $\zeta = \delta/\sigma_{\hat\mu}$,
- with $n = 90$ we get $\zeta = 3/(10/\sqrt{90}) = 2.846$ and $t^* = t_{.975,89} = 1.987$:

[Figure: Cumulative distribution functions for t(89), central and non-central; horizontal axis: x, vertical axis: cumulative probability.]

L13-7
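
The non-centrality parameter and the resulting power for the blood pressure example can be sketched in Python. Note the assumption: the standard library has no non-central t-distribution, so the code below replaces it by a normal distribution shifted by $\zeta$, which is only an approximation to the calculation described on the slide. The function name is illustrative.

```python
import math
from statistics import NormalDist

def approx_power_one_sample(delta, sigma, n, alpha=0.05):
    """Return (zeta, approximate power) for a two-sided one-sample test.

    Normal approximation: the test statistic is treated as N(zeta, 1)
    instead of non-central t with n - 1 degrees of freedom.
    """
    z = NormalDist()
    zeta = delta / (sigma / math.sqrt(n))   # non-centrality parameter
    crit = z.inv_cdf(1 - alpha / 2)         # about 1.96, not t = 1.987
    # Upper and lower tail areas of the shifted distribution
    power = (1 - z.cdf(crit - zeta)) + z.cdf(-crit - zeta)
    return zeta, power

zeta, power = approx_power_one_sample(delta=3, sigma=10, n=90)
print(round(zeta, 3), round(power, 2))  # 2.846 0.81
```

The approximate power of about 0.81 is close to the 0.80 targeted by the software calculation with $n = 90$.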

Sample Size for Two-way ANOVA

Example 13.5-6 in IPS (5th/6th/7th ed.): calcium supplementation as a treatment for osteoporosis in elderly people,
- factor: calcium (placebo, 800 mg/day),
- factor: vitamin D (placebo, 300 IU/day) [9],
- outcome: change in bone mineral density (BMD),
- assume $n$ subjects per calcium $\times$ vitamin D group, [10]
- assume calcium effect size of interest: $\delta = 5$ BMD units,
- assume within-group standard deviation for the change in BMD: $\sigma = 10$ units.

Statistical model (anticipated):

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$

where $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$ are the main effects and interaction, respectively, and $\varepsilon_{ijk} \sim N(0, \sigma^2)$.

Two possible situations:
- no interaction: sample size based on main effects,
- interaction, and different approaches for sample size:
  - based on contrast of interest for one factor within one level of other factor (a two-sample situation),
  - based on F-test for interaction,
  - based on contrast within interaction (for $2 \times 2$ factorial: equivalent to full interaction).

[9] Vitamin D is needed for the body to efficiently utilize calcium.
[10] In the notes, the number of subjects per group is denoted by c.

L13-8

Calculations for 2-way ANOVA Example

Sample size for calcium effect ($\delta$) in no interaction model:
- two-sample calc. (power 0.80, Minitab) gives $n = 64$ $\Rightarrow$ actual $n = 64/2 = 32$, because the two vitamin D groups both contribute to the sample size, [11]
- using Cox's rule: $SE = \sigma \sqrt{1/(2n) + 1/(2n)} = \delta/3 \Rightarrow n = (3\sigma/\delta)^2 = 36$.

Interaction model, two approaches:
- sample size for calcium effect in high (or low) vitamin D group: above two-sample calc. applies [11]: $n = 64$,
- sample size for interaction of size $\delta$:
  - interaction contrast [12] and SE: $\hat\gamma = \bar{Y}_{11} + \bar{Y}_{22} - \bar{Y}_{12} - \bar{Y}_{21}$, $SE(\hat\gamma) = \sigma \sqrt{1/n + 1/n + 1/n + 1/n} = \sigma \sqrt{4/n} = (\sqrt{2}\,\sigma) \sqrt{2/n}$,
  - last formula corresponds to a 2-sample comparison with standard deviation $\sqrt{2}\,\sigma$: $n = 127$ (power 0.80, Minitab).

[11] Actual power will be slightly less than 0.80 because the df is smaller in the two-way ANOVA than the two-sample situation.
[12] The contrast estimates the difference in calcium effect between the two vitamin D levels, or the difference in vitamin D effect between the two calcium levels.

L13-9
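
The numbers for the calcium example ($\sigma = 10$, $\delta = 5$) can be checked approximately in Python. This sketch assumes the normal-approximation formula $n \approx 2((z_{1-\alpha/2} + z_{power})\,\sigma/\delta)^2$ per group for a two-sample comparison; it is not the t-based calculation that Minitab performs, so the results fall one short of the slide's 64 and 127. Function and variable names are illustrative.

```python
import math
from statistics import NormalDist

def two_sample_n(sigma, delta, power=0.80, alpha=0.05):
    """Per-group n for a two-sided two-sample comparison
    (normal approximation, an assumption of this sketch)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (z_sum * sigma / delta) ** 2)

sigma, delta = 10, 5

# Calcium effect as a plain two-sample comparison
n_two_sample = two_sample_n(sigma, delta)

# No-interaction model: both vitamin D groups contribute, so halve it
n_main_effect = math.ceil(n_two_sample / 2)

# Interaction contrast: two-sample comparison with sd sqrt(2) * sigma
n_interaction = two_sample_n(math.sqrt(2) * sigma, delta)

# Cox's rule for the main effect: sigma * sqrt(1/n) = delta / 3
n_cox = math.ceil((3 * sigma / delta) ** 2)

print(n_two_sample, n_main_effect, n_interaction, n_cox)  # 63 32 126 36
```

Detecting an interaction of size $\delta$ needs about twice the per-group sample size of the two-sample comparison, since the variance of the contrast doubles.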

Unequal Sample Sizes

Generally speaking, it is most efficient (gives best precision) to have equal sample sizes in different groups of a factor.

Two special cases where unequal distributions across groups are useful:
- One control group and several treatment groups, all to be compared to control only(!): the optimal sample size for control is larger than for treatment groups,
  - $g$ treatments (incl. control), $n_c$ = size of control, $n_t$ = size for other treatments,
  - solve equation: $(n_c/n_t)^2 = g - 1$,
  - e.g., for $g = 5$ we get $n_c = 2 n_t$ (control group of double size!),
- factor with quantitative (and equidistant) levels where certain polynomial contrasts are of particular interest:
  - e.g., linear contrasts always give highest weights to the most extreme categories (Table D.6 in GO),
  - higher precision for linear contrast may be achieved by overrepresenting the most extreme categories. [13]

[13] The gain in precision is counterbalanced by reduced precision for other contrasts; for example, if only the two extreme categories are included, the linear contrast is estimated with best precision, but no other polynomial contrasts can be estimated.

L13-10
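
The rule $(n_c/n_t)^2 = g - 1$ can be checked numerically. The sketch below assumes the quantity to minimize is the variance factor $1/n_c + 1/n_t$ of a treatment-versus-control difference, with a fixed total number of units; the total of 60 is a hypothetical value chosen for illustration, and the function names are not from the course material.

```python
import math

def control_ratio(g):
    """Optimal n_c / n_t for g groups (incl. control), per the slide."""
    return math.sqrt(g - 1)

def best_allocation(total, g):
    """Brute-force search over integer allocations with
    n_c + (g - 1) * n_t = total, minimizing 1/n_c + 1/n_t."""
    best = None
    for n_t in range(1, total // (g - 1) + 1):
        n_c = total - (g - 1) * n_t
        if n_c < 1:
            continue
        var_factor = 1 / n_c + 1 / n_t
        if best is None or var_factor < best[2]:
            best = (n_c, n_t, var_factor)
    return best[0], best[1]

ratio = control_ratio(5)
n_c, n_t = best_allocation(total=60, g=5)  # hypothetical total of 60 units

print(ratio, n_c, n_t)  # 2.0 20 10
```

The search agrees with the formula: with five groups, the control group is twice the size of each treatment group.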

Sample Size Misconceptions

Common misconceptions [14] in sample size calculations:
- use of standard effect sizes (general definitions of small, medium and large effects, relative to std. dev. [15]): effects of interest should be determined exclusively from the context of your study,
- retrospective power calculation: after a study has been carried out and using its estimated values [15]:
  - power/sample size calculations aid in planning of new studies, not in interpreting results of data analysis,
  - confidence intervals give the best information about the unknown parameters from a study,
  - if $H_0$ was not rejected, the conclusion may be strengthened by an equivalence test (following slides), instead of arguing from the study's power.

Valid reasons to do statistical power calculations:
- planning a new study (that's the entire idea!),
- getting a sense of a new study's feasibility (given logistical and financial constraints),
- revising a pre-study sample calculation after the study has been carried out, if
  - some of the assumptions/settings turned out to be unrealistic, e.g. the standard deviation (but typically not the effect size),
  - insufficient information was available for a particular analysis.

[14] Largely based on Lenth (2001), The American Statistician 55, 187-193.
[15] Also the Stata manual does this, with literature references; this does not make the approach more valid...

L13-11

Equivalence and Non-Inferiority Testing in Two-Sample Situations

An equivalence test is for making a (statistical) statement that effects of two treatments (or other groups) are equivalent, in the sense that their effects differ at most by some pre-set, typically small amount.

A non-inferiority test is for making a (statistical) statement that the effect of one treatment is not inferior to that of another treatment, in the sense that its effect is either better or possibly not worse by more than some pre-set, typically small amount.

Comparison with the typical situation comparing treatment and control groups (say, with means $\mu_1$ and $\mu_2$),
- still relevant to test $H_0: \mu_1 = \mu_2$ against a one- or two-sided $H_a$, but the desired outcome is to not reject $H_0$ $\Rightarrow$ weak, non-quantitative and essentially useless conclusion,
- need to set up different statistical hypotheses, so that the desired outcome corresponds to rejection of the null hypothesis,
- typically, the new $H_0$ will involve a range of parameter values (instead of the usual single value), as will $H_a$,
- typically, the new $H_0$ and $H_a$ will require extra information from the study's context: the magnitude of a true difference between groups that is not considered important.

Example to illustrate ideas (from Minitab, possibly made-up data): protein content in cat food, comparing new ("Discount") and "Original" products, in two-sample study with 10 and 9 obs. and equivalence defined by a difference not exceeding 0.5 grams (per 100 grams).

L13-12

Methods for Equivalence Analyses

Main message: no new models needed, and typically a minor adjustment of standard methods is sufficient. [16]

General (approximate) method for equivalence testing of a difference parameter $\theta$ (e.g., $\theta = \mu_1 - \mu_2$ for a two-sample situation) up to an importance threshold $\delta > 0$: test the non-equivalence hypothesis $H_0^{(ne)}: |\theta| \ge \delta$ against $H_a^{(ne)}: |\theta| < \delta$, as follows (at a 5% significance level):
- compute a 90% CI (not 95% CI) for $\theta$,
- reject $H_0^{(ne)}$ if the CI is entirely inside the interval $(-\delta, \delta)$.

General method for non-inferiority test of $H_0^{(ni)}: \theta \ge \delta$ against $H_a^{(ni)}: \theta < \delta$ ($\Leftrightarrow \mu_2 > \mu_1 - \delta$): use same procedure as for testing the standard $H_0: \theta = \delta$ against the one-sided alternative.

Equivalence analysis in practice:
- Minitab 17 has built-in equivalence trial menu covering some standard designs (1-sample, 2-sample, cross-over), and additional menu for sample size calculation.
- Stata options less clear, possibly in pk (Pharmacokinetic) module, and also in add-on package tost.

[16] Reference textbook: Chow & Liu (2009), Design and Analysis of Bioavailability and Bioequivalence Studies. CRC Press.

L13-13
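
The confidence-interval version of the equivalence test can be sketched in Python: equivalence is concluded at the 5% level when the 90% CI for $\theta$ lies entirely inside $(-\delta, \delta)$. The estimates and standard errors below are hypothetical values made up for illustration (they are not the Minitab cat food data); the critical value 1.740 is the 0.95 quantile of the t-distribution with 17 df, matching a pooled two-sample analysis with 10 and 9 observations. The function names are illustrative.

```python
def confidence_interval(estimate, se, crit):
    """CI of the form estimate +/- crit * se."""
    half_width = crit * se
    return estimate - half_width, estimate + half_width

def is_equivalent(estimate, se, crit, delta):
    """True if the CI lies entirely inside (-delta, delta)."""
    low, high = confidence_interval(estimate, se, crit)
    return -delta < low and high < delta

T_95_DF17 = 1.740  # t quantile giving a 90% two-sided CI with 17 df
DELTA = 0.5        # equivalence threshold (grams per 100 grams)

# Hypothetical estimated differences and standard errors
case_a = is_equivalent(estimate=-0.10, se=0.15, crit=T_95_DF17, delta=DELTA)
case_b = is_equivalent(estimate=0.30, se=0.15, crit=T_95_DF17, delta=DELTA)

print(case_a, case_b)  # True False
```

In the second case the upper confidence limit exceeds 0.5, so non-equivalence cannot be rejected even though the estimated difference is below the threshold.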

Project Presentations

- scheduled for April 7, 9:00-11:00am,
- approx. 15 min. overview of problem, data, statistical analysis and conclusions,
  - statistical models/methods must be explained!
  - conclusions must be presented, including estimated effects,
  - reduce biological introduction and discussion to the essentials...,
- approx. 5 min. informal discussion, involving all course participants,
  - is it a good idea to assign opponents to each presentation?
  - both biological and statistical issues,
- use whiteboard, overhead, Powerpoint and Minitab/Stata (use my laptop or bring your own), as you like,
- any priorities on order? (otherwise random),
- marking scheme: no marks for presentation alone (only combined with report),
  - my main emphasis is on your understanding of what you did...,
  - format and layout are of minor importance.

L13-14

Project Reports

- manuscript-like layout: introduction, material and methods (in particular, statistical methods), results, discussion/conclusion,
- remember,
  - statistical methods must be described in more detail than you would do in an applied paper,
  - you need to document your analyses by suitable software listings or program files (e.g. a Stata do-file),
  - please attach a data set prepared for analysis,
  - the statistical analysis often comprises several parts/methods (contrary to statistics reported in papers that are usually restricted to a single method),
- not a pile of annotated Minitab/Stata listings,
  - listings may be put in an appendix (and could be numbered),
  - probably 5-10 pages of text,
- marked (30% of course mark), emphasis will be on:
  - problem and data description,
  - statistical models and their validation,
  - statistical inference,
  - conclusions and presentation of results,
- due date listed at course homepage & Moodle account.

L13-15