Power & Sample Size Calculation


Chapter 7: Power & Sample Size Calculation
Yibi Huang

Chapter 7: Power & Sample Size Calculation for CRDs
Section 10.3: Power & Sample Size for Factorial Designs

Power & Sample Size Calculation

When proposing an experiment (applying for funding, etc.), one nowadays needs to show that the proposed sample size (i.e., the number of experimental units) is
- neither so small that scientifically interesting effects will be swamped by random variation (i.e., the test has enough power to reject a false $H_0$),
- nor larger than necessary, wasting resources (time and money).

Errors and Power in Hypothesis Testing

A Type I error occurs when $H_0$ is true but is rejected.
A Type II error occurs when $H_0$ is false but is accepted.

The (significance) level of a test is the chance of making a Type I error, i.e., the chance of rejecting $H_0$ when it is true.
The power of a test is the chance of rejecting $H_0$ when $H_a$ is true:

$$\text{power} = 1 - P(\text{Type II error} \mid H_0 \text{ is false}) = 1 - \beta = P(\text{correctly reject } H_0 \mid H_0 \text{ is false}).$$

A good test should have a small level and a large power.

                                       H_0 is true                          H_0 is false
H_a is rejected (H_0 is accepted)      correct decision                     Type II error, β = P(Type II error)
H_0 is rejected (H_a is accepted)      Type I error, α = P(Type I error)    correct decision

Power Calculation for ANOVA

For a CRD with $g$ treatments, recall the means model

$$y_{ij} = \mu_i + \varepsilon_{ij}$$

and the effects model

$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij},$$

for $i = 1, \ldots, g$ and $j = 1, \ldots, n_i$. The two models are related via $\mu_i = \mu + \alpha_i$.

For CRDs, we use the means model more often. However, for power and sample size calculation, we need to use the effects model, and $\mu$ must be defined as

$$\mu = \frac{1}{N}\sum_{i=1}^{g}\sum_{j=1}^{n_i} \mu_i = \frac{1}{N}\sum_{i=1}^{g} n_i \mu_i,$$

which implies that $\sum_{i=1}^{g} n_i \alpha_i = 0$. Here $N = \sum_{i=1}^{g} n_i$ is the total number of experimental units.
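The weighted grand mean and the resulting zero-sum constraint can be checked numerically. The following is a minimal sketch in R; the group sizes and means are illustrative values (the same ones used in Example 1 of these notes), and the variable names are assumptions made for this sketch.

```r
# Illustrative group sizes and group means (values borrowed from Example 1).
n_i  <- c(5, 5, 5, 6, 4)
mu_i <- c(1.6, 0.6, 2, 0, 1)

N       <- sum(n_i)                # total number of experimental units
mu      <- sum(n_i * mu_i) / N     # grand mean, weighted by group size
alpha_i <- mu_i - mu               # treatment effects in the effects model

mu                                 # weighted grand mean
alpha_i                            # treatment effects
sum(n_i * alpha_i)                 # zero, up to floating-point rounding
```

Note that an unweighted average of the $\mu_i$ would in general not satisfy $\sum_i n_i\alpha_i = 0$ when the group sizes differ.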

Recall that the $H_0$ and $H_a$ for the ANOVA F-test are

$$H_0: \mu_1 = \cdots = \mu_g \quad \text{vs.} \quad H_a: \mu_i\text{'s are not all equal}$$

in terms of the means model, or equivalently, in terms of the effects model,

$$H_0: \alpha_1 = \cdots = \alpha_g = 0 \quad \text{vs.} \quad H_a: \text{not all } \alpha_i\text{'s are } 0.$$

Recall that we reject $H_0$ at level $\alpha$ if the test statistic $F = \dfrac{MS_{Trt}}{MSE}$ exceeds the critical value $F_{\alpha,\, g-1,\, N-g}$. So

$$\text{Power} = P(\text{reject } H_0 \mid H_a \text{ is true}) = P(F > F_{\alpha,\, g-1,\, N-g}).$$

To find $P(F > F_{\alpha,\, g-1,\, N-g})$, we must know the distribution of $F$.
What is the distribution of $F$ under $H_0$? It is $F_{g-1,\, N-g}$. And under $H_a$?

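For

$$F = \frac{MS_{Trt}}{MSE} = \frac{SS_{Trt}/(g-1)}{SSE/(N-g)},$$

recall that in Ch. 3 we have shown that, whether $H_0$ or $H_a$ holds, it is always true that

$$\frac{SSE}{\sigma^2} \sim \chi^2_{N-g}, \quad \text{and} \quad E(MSE) = \sigma^2.$$

For the numerator, $SS_{Trt}/\sigma^2$ has a $\chi^2_{g-1}$ distribution under $H_0$. But under $H_a$, it has a non-central chi-square distribution $\chi^2_{g-1,\,\delta}$ with non-centrality parameter

$$\delta = \frac{\sum_{i=1}^{g} n_i \alpha_i^2}{\sigma^2}.$$

Moreover,

$$E(SS_{Trt}) = (g-1)\sigma^2 + \underbrace{\sum_{i=1}^{g} n_i \alpha_i^2}_{\delta\sigma^2} = \begin{cases} (g-1)\sigma^2 & \text{under } H_0 \\ (g-1+\delta)\sigma^2 & \text{under } H_a \end{cases}$$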
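The expectation of $SS_{Trt}$ under $H_a$ can be checked by simulation. The sketch below is only an illustration under assumed values: the group sizes, means and $\sigma$ are taken from Example 1 of these notes, the seed and the number of replications are arbitrary, and normal errors are assumed as in the ANOVA model.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
n_i   <- c(5, 5, 5, 6, 4)            # illustrative group sizes
mu_i  <- c(1.6, 0.6, 2, 0, 1)        # illustrative group means under H_a
sigma <- 0.8
g     <- length(n_i)
N     <- sum(n_i)
group <- rep(seq_len(g), times = n_i)

alpha_i <- mu_i - sum(n_i * mu_i) / N          # treatment effects
delta   <- sum(n_i * alpha_i^2) / sigma^2      # non-centrality parameter

# Simulate many data sets and compute SS_Trt for each one.
ss_trt <- replicate(20000, {
  y    <- rnorm(N, mean = mu_i[group], sd = sigma)
  ybar <- tapply(y, group, mean)               # group means
  sum(n_i * (ybar - mean(y))^2)                # treatment sum of squares
})

mc_mean     <- mean(ss_trt)                    # Monte Carlo estimate of E(SS_Trt)
theory_mean <- (g - 1 + delta) * sigma^2       # (g - 1 + delta) * sigma^2
c(simulated = mc_mean, theoretical = theory_mean)
```

The two numbers should agree up to Monte Carlo error, which shrinks as the number of replications grows.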

Non-Central F-Distribution

Under $H_a$, it can be shown that $F = \dfrac{MS_{Trt}}{MSE}$ has a non-central F-distribution on $g-1$ and $N-g$ degrees of freedom, with non-centrality parameter $\delta$, denoted as

$$F \sim F_{g-1,\, N-g,\, \delta}, \quad \text{where } \delta = \frac{\sum_{i=1}^{g} n_i \alpha_i^2}{\sigma^2}.$$

Recall that $\alpha_i = \mu_i - \mu$, and

$$\mu = \frac{1}{N}\sum_{i=1}^{g}\sum_{j=1}^{n_i} \mu_i = \frac{1}{N}\sum_{i=1}^{g} n_i \mu_i.$$

The non-central F-distribution is also right-skewed, and the greater the non-centrality parameter $\delta$, the further the peak of the distribution is from 0.
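The effect of $\delta$ on the shape of the distribution can be seen numerically with R's `df()` and `pf()`, both of which accept an `ncp` argument. This is a rough sketch: the degrees of freedom (4 and 20) and the three values of $\delta$ are arbitrary illustrative choices, and the peak is located on a grid rather than computed exactly.

```r
g <- 5; N <- 25                                 # illustrative design (assumed values)
x <- seq(0.01, 15, by = 0.01)                   # grid on which the density is evaluated
f_crit <- qf(0.05, g - 1, N - g, lower.tail = FALSE)

deltas <- c(0, 5, 20)                           # delta = 0 is the central F under H_0

# Approximate location of the peak (mode) of each density.
peaks <- sapply(deltas, function(d) x[which.max(df(x, g - 1, N - g, ncp = d))])

# Probability of exceeding the level-0.05 critical value.
tails <- sapply(deltas, function(d) {
  pf(f_crit, g - 1, N - g, ncp = d, lower.tail = FALSE)
})

data.frame(delta = deltas, peak = peaks, upper_tail = tails)
```

As $\delta$ grows, the peak moves to the right and more of the probability mass lies beyond the critical value.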

Example 1: Power Calculation (1)

- $g = 5$ treatment groups
- group sizes: $n_1 = n_2 = n_3 = 5$, $n_4 = 6$, $n_5 = 4$
- assume $\sigma = 0.8$
- desired significance level $\alpha = 0.05$
- find the power of the test when $H_a$ is true with $\mu_1 = 1.6$, $\mu_2 = 0.6$, $\mu_3 = 2$, $\mu_4 = 0$, $\mu_5 = 1$.

Solution. The grand mean $\mu$ is

$$\mu = \frac{\sum_{i=1}^{g} n_i \mu_i}{N} = \frac{5 \times 1.6 + 5 \times 0.6 + 5 \times 2 + 6 \times 0 + 4 \times 1}{5 + 5 + 5 + 6 + 4} = \frac{25}{25} = 1.$$

The $\alpha_i$'s are thus $\alpha_i = \mu_i - \mu = \mu_i - 1$:

$$\alpha_1 = 0.6, \quad \alpha_2 = -0.4, \quad \alpha_3 = 1, \quad \alpha_4 = -1, \quad \alpha_5 = 0.$$

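Example 1: Power Calculation (2)

The non-centrality parameter is

$$\delta = \frac{\sum_{i=1}^{g} n_i \alpha_i^2}{\sigma^2} = \frac{5 \times 0.6^2 + 5 \times (-0.4)^2 + 5 \times 1^2 + 6 \times (-1)^2 + 4 \times 0^2}{0.8^2} = \frac{13.6}{0.64} = 21.25.$$

So

$$F = \frac{MS_{Trt}}{MSE} \sim \begin{cases} F_{g-1,\, N-g} = F_{4,\, 25-5} & \text{under } H_0 \\ F_{g-1,\, N-g,\, \delta} = F_{4,\, 25-5,\, 21.25} & \text{under } H_a \end{cases}$$

[Figure: densities of $F_{4,20}$ (under $H_0$) and $F_{4,20,21.25}$ (under $H_a$).]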

Example 1: Power Calculation (3)

The critical value for rejecting $H_0$ while keeping the significance level at $\alpha = 0.05$ is $F_{0.05,\,4,\,20} \approx 2.866$.

> qf(0.05, 4, 20, lower.tail=FALSE)
[1] 2.866081

When $H_0$ is true, the chance that $H_0$ is rejected is only 0.05 (the red shaded area).

[Figure: density of $F_{4,20}$, split at $F_{0.05,4,20}$ into an "Accept $H_0$" region and a "Reject $H_0$" region; the shaded upper-tail area is 0.05.]

Reminder: Don't confuse the following two:
- in most statistics books, $F_{\alpha,\, df_1,\, df_2}$ means the area of the upper tail is $\alpha$;
- in R, qf(alpha, df1, df2) means the area of the lower tail is alpha.
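The two tail conventions can be reconciled in a couple of lines. The sketch below uses only the base R functions `qf()` and `pf()`; the variable names are chosen for this illustration.

```r
alpha <- 0.05

# Upper-tail convention (as in most textbooks): area to the right is alpha.
upper <- qf(alpha, 4, 20, lower.tail = FALSE)

# R's default lower-tail convention: area to the left is 1 - alpha.
lower <- qf(1 - alpha, 4, 20)

c(upper, lower)                           # the same critical value, about 2.866
pf(upper, 4, 20, lower.tail = FALSE)      # recovers alpha = 0.05
```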

Example 1: Power Calculation (4)

If $H_a$ is true with $\mu_1 = 1.6$, $\mu_2 = 0.6$, $\mu_3 = 2$, $\mu_4 = 0$, $\mu_5 = 1$, then we know $F \sim F_{4,\,20,\,\delta=21.25}$.

The power to reject $H_0$ is the area under the density of $F_{4,\,20,\,\delta=21.25}$ beyond the critical value (the blue shaded area), which is 0.9249.

> pf(qf(.95, 4, 20), 4, 20, ncp=21.25, lower.tail=FALSE)
[1] 0.9249342

In the R code, ncp means the non-centrality parameter.

[Figure: density of $F_{4,20,21.25}$, split at $F_{0.05,4,20}$ into an "Accept $H_0$" region and a "Reject $H_0$" region; the shaded area beyond the critical value is the power.]
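The steps of Example 1 can be collected into one small function. This is a sketch only: `anova_power` is a hypothetical helper written for this illustration (it is not a base R function), and it assumes a CRD with known $\sigma$ and fully specified group means.

```r
# Hypothetical helper: power of the one-way ANOVA F-test in a CRD.
anova_power <- function(n_i, mu_i, sigma, alpha = 0.05) {
  g  <- length(n_i)
  N  <- sum(n_i)
  mu <- sum(n_i * mu_i) / N                        # weighted grand mean
  delta  <- sum(n_i * (mu_i - mu)^2) / sigma^2     # non-centrality parameter
  f_crit <- qf(alpha, g - 1, N - g, lower.tail = FALSE)
  power  <- pf(f_crit, g - 1, N - g, ncp = delta, lower.tail = FALSE)
  list(delta = delta, f_crit = f_crit, power = power)
}

# Example 1: unequal group sizes, sigma = 0.8, level 0.05.
res <- anova_power(n_i = c(5, 5, 5, 6, 4),
                   mu_i = c(1.6, 0.6, 2, 0, 1),
                   sigma = 0.8)
res$delta    # 21.25
res$power    # about 0.9249
```

Wrapping the calculation this way makes it easy to re-run the power calculation for other alternatives or group sizes.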

Example 1: Power Calculation (5)

Remark. The power of a test is not a single value, but a function of the parameters in $H_a$.
If the parameters $\mu_i$ change their values, the power of the test also changes.
It makes no sense to talk about the power of a test without specifying the parameters in $H_a$.

Power Of A Test Is Affected By...

The larger the non-centrality parameter

$$\delta = \frac{\sum_{i=1}^{g} n_i \alpha_i^2}{\sigma^2},$$

the greater the power. The power of a test will increase if
- the number of replicates $n_i$ per treatment increases,
- the magnitude of the treatment effects $\alpha_i$ increases (under the constraint $\sum_i n_i \alpha_i = 0$),
- the size of the noise $\sigma^2$ decreases.
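Each of the three statements above can be checked numerically for a balanced design. In this sketch, `balanced_power` is a hypothetical helper, and the effects, group sizes and values of $\sigma$ are arbitrary illustrative choices.

```r
# Hypothetical helper: power of the ANOVA F-test with n units in each of g groups.
balanced_power <- function(n, alpha_i, sigma, level = 0.05) {
  g      <- length(alpha_i)
  delta  <- n * sum(alpha_i^2) / sigma^2
  f_crit <- qf(level, g - 1, g * n - g, lower.tail = FALSE)
  pf(f_crit, g - 1, g * n - g, ncp = delta, lower.tail = FALSE)
}

effects <- c(0.5, -0.5, 1, -1, 0)      # illustrative effects; they sum to zero

# 1. More replicates per treatment (n = 2, ..., 8).
power_by_n <- sapply(2:8, balanced_power, alpha_i = effects, sigma = 0.8)

# 2. Larger treatment effects (effects scaled by 0.5, 1 and 1.5; n = 4).
power_by_scale <- sapply(c(0.5, 1, 1.5),
                         function(k) balanced_power(4, k * effects, 0.8))

# 3. Larger noise (sigma = 0.6, 0.8, 1.0; n = 4).
power_by_sigma <- sapply(c(0.6, 0.8, 1.0),
                         function(s) balanced_power(4, effects, s))

round(power_by_n, 3)        # increasing in n
round(power_by_scale, 3)    # increasing in effect size
round(power_by_sigma, 3)    # decreasing in sigma
```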

Example 2: Sample Size Calculation (1)

- $g = 5$ treatment groups of equal sample size $n_i = n$ for all $i$
- assume $\sigma = 0.8$
- desired significance level $\alpha = 0.05$

Assuming the design is balanced, so that all groups have the same sample size $n$, what is the minimal sample size $n$ per treatment needed to have power 0.95 when

$$\alpha_1 = 0.5, \quad \alpha_2 = -0.5, \quad \alpha_3 = 1, \quad \alpha_4 = -1, \quad \alpha_5 = 0?$$

Solution. The non-centrality parameter is

$$\delta = \frac{\sum_{i=1}^{g} n_i \alpha_i^2}{\sigma^2} = \frac{n}{0.8^2}\left[0.5^2 + (-0.5)^2 + 1^2 + (-1)^2 + 0^2\right] = \frac{2.5\,n}{0.8^2} \approx 3.9n.$$

So

$$F = \frac{MS_{Trt}}{MSE} \sim \begin{cases} F_{g-1,\, N-g} = F_{4,\, 5n-5} & \text{under } H_0 \\ F_{g-1,\, N-g,\, \delta} = F_{4,\, 5n-5,\, 3.9n} & \text{under } H_a \end{cases}$$

Recall that the critical value $F^*$ for rejecting $H_0$ at level $\alpha = 0.05$ is

$$F^* = F_{\alpha,\, g-1,\, N-g} = F_{0.05,\, 4,\, 5n-5},$$

which can be found in R via the command

> F.crit = qf(alpha, g-1, N-g, lower.tail=FALSE)    # syntax
> F.crit = qf(0.05, 4, 5*n-5, lower.tail=FALSE)     # sub-in the values

By definition,

$$\text{Power} = P(\text{reject } H_0 \mid H_a \text{ is true}) = P(\text{the non-central } F \text{ statistic} > F^*) = P(F_{g-1,\, N-g,\, \delta} > F^*) = P(F_{4,\, 5n-5,\, 3.9n} > F^*),$$

which can be found in R via the command

> pf(F.crit, g-1, N-g, ncp=delta, lower.tail=FALSE)    # syntax
> pf(F.crit, 4, 5*n-5, ncp=3.9*n, lower.tail=FALSE)    # sub-in the values

In the R code, ncp means the non-centrality parameter.

Now we have the R code to find the power of the ANOVA F-test when $n$ is known. Let's plug in different values of $n$ and see which is the smallest $n$ that makes the power at least 0.95.

> F.crit = qf(alpha, g-1, N-g, lower.tail=FALSE)       # syntax
> pf(F.crit, g-1, N-g, ncp=delta, lower.tail=FALSE)    # syntax
> n = 5
> F.crit = qf(0.05, 4, 5*n-5, lower.tail=FALSE)
> pf(F.crit, 4, 5*n-5, ncp=3.9*n, lower.tail=FALSE)
[1] 0.8994675    # less than 0.95, not high enough
> n = 6
> F.crit = qf(0.05, 4, 5*n-5, lower.tail=FALSE)
> pf(F.crit, 4, 5*n-5, ncp=3.9*n, lower.tail=FALSE)
[1] 0.9578791    # greater than 0.95, bingo!

So we need 6 replicates in each of the 5 treatment groups to ensure a power of 0.95 when $\alpha_1 = 0.5$, $\alpha_2 = -0.5$, $\alpha_3 = 1$, $\alpha_4 = -1$, $\alpha_5 = 0$.

Remark: Again, we have to specify $H_a$ fully to find the appropriate sample size.
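The trial-and-error search can be automated with a short loop. This is a sketch for the setting of Example 2; `power_at` is a hypothetical helper, and the non-centrality parameter is computed without rounding ($2.5n/0.64 = 3.90625n$ rather than $3.9n$), so the printed powers may differ slightly from those quoted in the notes.

```r
alpha_i <- c(0.5, -0.5, 1, -1, 0)    # treatment effects under H_a (Example 2)
sigma   <- 0.8
g       <- length(alpha_i)

# Hypothetical helper: power with n replicates per treatment.
power_at <- function(n) {
  delta  <- n * sum(alpha_i^2) / sigma^2            # unrounded: 3.90625 * n
  f_crit <- qf(0.05, g - 1, g * n - g, lower.tail = FALSE)
  pf(f_crit, g - 1, g * n - g, ncp = delta, lower.tail = FALSE)
}

n <- 2                                # smallest n that leaves error df > 0
while (power_at(n) < 0.95) n <- n + 1

n                                     # smallest n with power >= 0.95
power_at(n)                           # the power achieved at that n
```

Because the power increases with $n$ in this setting, stopping at the first $n$ that reaches the target gives the minimal sample size.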

But $\sigma^2$ is Unknown...

As $\sigma^2$ is usually unknown, here are a few ways to make a guess.
- Run a small pilot study to get an estimate of $\sigma$.
- Based on prior studies or knowledge about the experimental units, can you think of a range of plausible values for $\sigma$? If so, choose the biggest one.
- You could repeat the sample size calculation for various values of $\sigma$ to see how it affects the needed sample size.
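The last suggestion, repeating the calculation over a range of $\sigma$, might look as follows. This is a sketch: `min_n` is a hypothetical helper, the effects are those of Example 2, and the grid of $\sigma$ values is an arbitrary illustrative choice.

```r
# Hypothetical helper: smallest n per group reaching the target power.
min_n <- function(sigma, alpha_i, target = 0.95, level = 0.05, n_max = 1000) {
  g <- length(alpha_i)
  for (n in 2:n_max) {
    delta  <- n * sum(alpha_i^2) / sigma^2
    f_crit <- qf(level, g - 1, g * n - g, lower.tail = FALSE)
    power  <- pf(f_crit, g - 1, g * n - g, ncp = delta, lower.tail = FALSE)
    if (power >= target) return(n)
  }
  NA_integer_                          # target not reached within n_max
}

effects  <- c(0.5, -0.5, 1, -1, 0)     # effects from Example 2
sigmas   <- c(0.6, 0.8, 1.0, 1.2)      # plausible values of sigma (assumed)
n_needed <- sapply(sigmas, min_n, alpha_i = effects)

data.frame(sigma = sigmas, n_per_group = n_needed)
```

A table like this shows directly how sensitive the required sample size is to the guess for $\sigma$.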

How to Specify the $H_a$?

As the power of a test depends on the alternative hypothesis $H_a$, that is, on the $\alpha_i$'s, one may have to try several sets of $\alpha_i$'s to find the appropriate sample size. But how many $H_a$'s do we have to try? Here is a useful trick.

1. Suppose we would be interested if any two means differed by $D$ or more.
2. The smallest value of $\delta$ in this case occurs when two means differ by exactly $D$ and the other $g - 2$ means are halfway between.
3. So try $\alpha_1 = D/2$, $\alpha_2 = -D/2$, and $\alpha_i = 0$ for all other $i$.

Assuming equal sample sizes, the non-centrality parameter is

$$\delta = \frac{\sum_i n\alpha_i^2}{\sigma^2} = \frac{n(D^2/4 + D^2/4)}{\sigma^2} = \frac{nD^2}{2\sigma^2}.$$
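The trick above can be turned into a sample size calculation that needs only $D$, $\sigma$ and $g$. In this sketch, `power_for_difference` is a hypothetical helper, and the values $D = 2$, $\sigma = 0.8$, $g = 5$ and the target power 0.95 are illustrative assumptions.

```r
# Hypothetical helper: power under the least favourable alternative in which
# two means differ by exactly D and the other g - 2 means lie halfway between.
power_for_difference <- function(n, D, sigma, g, level = 0.05) {
  delta  <- n * D^2 / (2 * sigma^2)
  f_crit <- qf(level, g - 1, g * n - g, lower.tail = FALSE)
  pf(f_crit, g - 1, g * n - g, ncp = delta, lower.tail = FALSE)
}

D <- 2; sigma <- 0.8; g <- 5           # illustrative values (assumed)

n <- 2
while (power_for_difference(n, D, sigma, g) < 0.95) n <- n + 1

n                                      # replicates per treatment needed
power_for_difference(n, D, sigma, g)   # power achieved at that n
```

Since this alternative gives the smallest $\delta$ among all alternatives in which some pair of means differs by at least $D$, the resulting $n$ is sufficient for every such alternative.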

Power and Sample Size Calculation for Complete Block Designs

For complete block designs,

RCBD:                  $y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
Latin Square:          $y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ijk}$
Graeco-Latin Square:   $y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_l + \varepsilon_{ijkl}$

where the $\alpha_i$'s are the treatment effects, and $\beta_j$, $\gamma_k$, $\delta_l$ are effects of blocking factors. When $H_a$ is true, i.e., the treatments have different effects, the ANOVA F-statistic also has a non-central F-distribution:

$$F = \frac{MS_{trt}}{MSE} \sim \begin{cases} F_{g-1,\ \text{df of MSE}} & \text{under } H_0 \\ F_{g-1,\ \text{df of MSE},\ \delta} & \text{under } H_a \end{cases}$$

where the non-centrality parameter $\delta$ is

$$\delta = \frac{r\sum_i \alpha_i^2}{\sigma^2}, \quad \text{where } r = \text{number of replicates per treatment}.$$

Note that the denominator df changes from $N - g$ to the df of MSE.
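The same R calls apply once the error degrees of freedom are replaced by the df of MSE for the design. The sketch below assumes the standard error df, which are not stated above: $(g-1)(r-1)$ for an RCBD with $r$ blocks and $(g-1)(g-2)$ for a single $g \times g$ Latin square. `block_power` is a hypothetical helper, and the effects and $\sigma$ are illustrative; $\sigma$ here is the error standard deviation after the blocking factors have been accounted for.

```r
# Hypothetical helper: power of the treatment F-test in a complete block design.
block_power <- function(alpha_i, sigma, r, df_error, level = 0.05) {
  g      <- length(alpha_i)
  delta  <- r * sum(alpha_i^2) / sigma^2       # r = replicates per treatment
  f_crit <- qf(level, g - 1, df_error, lower.tail = FALSE)
  pf(f_crit, g - 1, df_error, ncp = delta, lower.tail = FALSE)
}

effects <- c(0.5, -0.5, 1, -1, 0)              # illustrative effects, sum to zero
g       <- length(effects)

# RCBD with 4 and with 6 blocks: error df = (g - 1)(r - 1) (assumed standard result).
rcbd_4 <- block_power(effects, sigma = 0.8, r = 4, df_error = (g - 1) * (4 - 1))
rcbd_6 <- block_power(effects, sigma = 0.8, r = 6, df_error = (g - 1) * (6 - 1))

# One 5 x 5 Latin square: r = g, error df = (g - 1)(g - 2) (assumed standard result).
latin <- block_power(effects, sigma = 0.8, r = g, df_error = (g - 1) * (g - 2))

c(rcbd_4 = rcbd_4, rcbd_6 = rcbd_6, latin_square = latin)
```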

Power and Sample Size Calculation for Balanced Factorial Designs

Section 10.3 in Oehlert's book

We may ignore the factorial structure and compute power and sample size for the overall null hypothesis of no model effects.
We will address methods for balanced data only, because most factorial experiments are designed to be balanced, and simple formulae for non-centrality parameters exist only for balanced data.
For factorial data, we usually test $H_0$ about main effects or interactions in addition to the overall $H_0$ of no model effects.

Non-Centrality Parameter for Balanced Factorial Designs

For example, consider a 3-way factorial design

$$y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \alpha\beta_{ij} + \beta\gamma_{jk} + \alpha\gamma_{ik} + \alpha\beta\gamma_{ijk} + \varepsilon_{ijkl}.$$

To calculate the power of the F-test of a main effect or an interaction effect under $H_a$, the non-centrality parameter can be calculated as follows:

1. Square the factorial effects (using the zero-sum constraint) and sum them,
2. Multiply this sum by the total number of data in the design divided by the number of levels in the effect, and
3. Divide that product by the error variance.
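The three steps translate directly into a few lines of R. In this sketch, `factorial_ncp` is a hypothetical helper, and the design ($2 \times 3 \times 2$ with 4 replicates), the effects and $\sigma$ are illustrative assumptions. For an interaction, "number of levels in the effect" is read as the number of level combinations (for example $ab$ for the AB interaction).

```r
# Hypothetical helper implementing the three steps for a balanced factorial design.
factorial_ncp <- function(effects, N, n_levels, sigma2) {
  # The check below only verifies that the effects sum to zero overall;
  # interaction effects should also sum to zero over each index.
  stopifnot(abs(sum(effects)) < 1e-8)
  ss <- sum(effects^2)                 # step 1: square the effects and sum them
  ss * (N / n_levels) / sigma2         # steps 2 and 3
}

# Illustrative design: a = 2, b = 3, c = 2 levels with r = 4 replicates.
a <- 2; b <- 3; c_lev <- 2; r <- 4
N <- a * b * c_lev * r                 # 48 observations in total

beta_j <- c(-0.5, 0, 0.5)              # main effects of factor B (assumed values)
delta_B <- factorial_ncp(beta_j, N = N, n_levels = b, sigma2 = 0.8^2)
delta_B                                # 0.5 * (48 / 3) / 0.64 = 12.5
```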

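E.g., consider an $a \times b \times c$ factorial design with $r$ replicates per treatment,

$$y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \alpha\beta_{ij} + \beta\gamma_{jk} + \alpha\gamma_{ik} + \alpha\beta\gamma_{ijk} + \varepsilon_{ijkl}.$$

To test the B main effects,

$$H_0: \beta_1 = \cdots = \beta_b = 0 \quad \text{vs.} \quad H_a: \text{not all } \beta_j \text{ are } 0,$$

the ANOVA F-statistic for the B main effect is

$$F = \frac{MS_B}{MSE} \sim \begin{cases} F_{b-1,\ \text{df of MSE}} & \text{under } H_0 \\ F_{b-1,\ \text{df of MSE},\ \delta} & \text{under } H_a \end{cases}$$

where the non-centrality parameter $\delta$ is

$$\delta = \left(\frac{N}{b}\sum_j \beta_j^2\right) \Big/ \sigma^2 = \left(rac\sum_j \beta_j^2\right) \Big/ \sigma^2.$$

Here $N = abcr$ is the total number of observations.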
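A power calculation for the B main effect might then look as follows. This is a sketch under assumptions not stated above: the full three-way model is fitted, so the df of MSE is taken to be $abc(r-1)$, and the design sizes, effects and $\sigma$ are illustrative.

```r
a <- 2; b <- 3; c_lev <- 2; r <- 4     # illustrative 2 x 3 x 2 design, 4 replicates
N      <- a * b * c_lev * r            # total number of observations
sigma  <- 0.8                          # assumed error standard deviation
beta_j <- c(-0.5, 0, 0.5)              # assumed B main effects (sum to zero)

# Non-centrality parameter, written both ways.
delta_N   <- (N / b) * sum(beta_j^2) / sigma^2
delta_rac <- r * a * c_lev * sum(beta_j^2) / sigma^2

# Error df for the full three-way model (assumption): abc(r - 1).
df_error <- a * b * c_lev * (r - 1)

f_crit  <- qf(0.05, b - 1, df_error, lower.tail = FALSE)
power_B <- pf(f_crit, b - 1, df_error, ncp = delta_N, lower.tail = FALSE)

c(delta = delta_N, df_error = df_error, power = power_B)
```

If some interaction terms were dropped from the model, the df of MSE would be larger and should be changed accordingly.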

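E.g., consider an $a \times b \times c$ factorial design with $r$ replicates per treatment,

$$y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \alpha\beta_{ij} + \beta\gamma_{jk} + \alpha\gamma_{ik} + \alpha\beta\gamma_{ijk} + \varepsilon_{ijkl}.$$

To test the AB interactions,

$$H_0: \text{all } \alpha\beta_{ij} = 0 \quad \text{vs.} \quad H_a: \text{not all } \alpha\beta_{ij} = 0,$$

the ANOVA F-statistic for the AB interactions is

$$F = \frac{MS_{AB}}{MSE} \sim \begin{cases} F_{(a-1)(b-1),\ \text{df of MSE}} & \text{under } H_0 \\ F_{(a-1)(b-1),\ \text{df of MSE},\ \delta} & \text{under } H_a \end{cases}$$

where the non-centrality parameter $\delta$ is

$$\delta = \left(\frac{N}{ab}\sum_{ij} (\alpha\beta_{ij})^2\right) \Big/ \sigma^2 = \left(rc\sum_{ij} (\alpha\beta_{ij})^2\right) \Big/ \sigma^2.$$

Here $N = abcr$ is the total number of observations.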
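For an interaction, the effects form a table whose rows and columns each sum to zero. The sketch below works through the AB interaction for an illustrative $2 \times 3 \times 2$ design with 4 replicates; the interaction effects, $\sigma$, and the error df $abc(r-1)$ (full model) are assumptions made for this illustration.

```r
a <- 2; b <- 3; c_lev <- 2; r <- 4     # illustrative design
sigma <- 0.8                           # assumed error standard deviation

# Assumed AB interaction effects: rows index A, columns index B.
ab_ij <- rbind(c( 0.4, 0, -0.4),
               c(-0.4, 0,  0.4))
rowSums(ab_ij)                         # zero-sum over j for each i
colSums(ab_ij)                         # zero-sum over i for each j

delta_AB <- r * c_lev * sum(ab_ij^2) / sigma^2     # rc * sum of squares / sigma^2

df_num   <- (a - 1) * (b - 1)
df_error <- a * b * c_lev * (r - 1)    # full-model error df (assumption)

f_crit   <- qf(0.05, df_num, df_error, lower.tail = FALSE)
power_AB <- pf(f_crit, df_num, df_error, ncp = delta_AB, lower.tail = FALSE)

c(delta = delta_AB, power = power_AB)
```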

Sample Size Should Be The Last Thing To Consider

In practice, sample size calculation is the last thing to do in designing an experiment. Consider the following before calculating the required sample size:
- asking the right question,
- choosing an appropriate response,
- thinking about what factors need to be blocked on,
- choosing the right design,
- setting up the right procedure to recruit subjects,
- and so on.

All of the above are more important than doing the right sample size calculation!
Sometimes, choosing a better design can substantially reduce the required sample size.