Split-Plot Designs. David M. Allen University of Kentucky. January 30, 2014


1 Introduction

In this talk we introduce the split-plot design and give an overview of how SAS determines the denominator degrees of freedom for various tests.

2 Drug-Alcohol Study

The drug-alcohol study presented here is based on an actual study. It has been scaled down to facilitate more explicit displays. The responses have been changed because the original data are proprietary. See Allen and Cady [1] for more discussion.

Background

Tranquilizers are one of the most prescribed classes of drugs. Unfortunately, the combination of tranquilizers and alcohol can compromise a driver's ability to operate a motor vehicle. It is desirable to develop a new tranquilizer that serves its intended purpose but does not combine with alcohol to give an undesirable effect.

This trial compares the effects of drug, the effects of alcohol, and the effects of their interaction. The drugs are A, a new drug; B, a currently popular drug; and C, a placebo. The response is the subject's performance on a simulated driving test. While multiple response measurements are recorded, the mean deviation (in feet) from the center of the driving lane is used here.

Randomization

Subjects are the whole-plot units. The alcohol and no-alcohol treatments are randomly assigned to the twelve subjects with the restriction that each treatment group has the same number of subjects. Separately for each subject, the order of drugs A, B, and C is randomized. There is an adequate interval of time between administrations of the different drugs to ensure there are no carry-over effects.

The data

                      Drugs
Alcohol  Subject    A      B      C
Yes      EAS       3.56   4.04   3.26
Yes      JBM       3.79   3.88   3.49
Yes      ARE       4.09   5.32   3.79
Yes      JBH       3.10   4.38   2.80
Yes      WJT       3.33   3.63   3.03
Yes      EEA       3.35   3.63   3.05
No       JWL       2.83   2.55   2.63
No       CJW       2.93   2.42   2.73
No       RDF       3.58   3.99   3.38
No       RLA       2.98   3.07   2.78
No       HW        2.32   2.15   2.12
No       AMR       2.73   3.23   2.53

The model

The model is

\[ y_{ijk} = \mu + \alpha_i + s_j + \delta_k + (\alpha\delta)_{ik} + \varepsilon_{ijk} \]

where $y_{ijk}$ is the observation on the response variable; $\mu$ is the overall mean; $\alpha_i$ is the effect of the $i$th level of alcohol; $s_j$ is the effect of the $j$th subject; $\delta_k$ is the effect of the $k$th drug; $(\alpha\delta)_{ik}$ is the effect of the interaction of the $i$th level of alcohol and the $k$th level of drug; and $\varepsilon_{ijk}$ is a random error.

We assume $s_j \sim N(0, \sigma_s^2)$, $\varepsilon_{ijk} \sim N(0, \sigma^2)$, and that these effects are mutually independent. All other effects are considered fixed parameters. The subject index runs over $j = 1, \ldots, 6$ for $i = 1$ and $j = 7, \ldots, 12$ for $i = 2$.

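Symbolic data

A dot in place of a subscript denotes the mean over that index.

                                      Drugs
Alcohol  Subject  A                    B                    C                    Mean
Yes      EAS      $y_{1,1,1}$          $y_{1,1,2}$          $y_{1,1,3}$          $y_{1,1,\cdot}$
Yes      JBM      $y_{1,2,1}$          $y_{1,2,2}$          $y_{1,2,3}$          $y_{1,2,\cdot}$
...      ...      ...                  ...                  ...                  ...
Yes      WJT      $y_{1,5,1}$          $y_{1,5,2}$          $y_{1,5,3}$          $y_{1,5,\cdot}$
Yes      EEA      $y_{1,6,1}$          $y_{1,6,2}$          $y_{1,6,3}$          $y_{1,6,\cdot}$
         Mean     $y_{1,\cdot,1}$      $y_{1,\cdot,2}$      $y_{1,\cdot,3}$      $y_{1,\cdot,\cdot}$
No       JWL      $y_{2,7,1}$          $y_{2,7,2}$          $y_{2,7,3}$          $y_{2,7,\cdot}$
No       CJW      $y_{2,8,1}$          $y_{2,8,2}$          $y_{2,8,3}$          $y_{2,8,\cdot}$
...      ...      ...                  ...                  ...                  ...
No       HW       $y_{2,11,1}$         $y_{2,11,2}$         $y_{2,11,3}$         $y_{2,11,\cdot}$
No       AMR      $y_{2,12,1}$         $y_{2,12,2}$         $y_{2,12,3}$         $y_{2,12,\cdot}$
         Mean     $y_{2,\cdot,1}$      $y_{2,\cdot,2}$      $y_{2,\cdot,3}$      $y_{2,\cdot,\cdot}$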

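Symbolic analysis of variance

Source        Degrees of  Sum of                  Mean                    Expected
              Freedom     Squares                 Square                  Mean Square
Alcohol        1          $SS_\alpha$             $MS_\alpha$             $\sigma^2 + 3\sigma_s^2 + Q(\alpha, (\alpha\delta))$
Subjects      10          $SS_s$                  $MS_s$                  $\sigma^2 + 3\sigma_s^2$
Drugs          2          $SS_\delta$             $MS_\delta$             $\sigma^2 + Q(\delta, (\alpha\delta))$
Alcohol*Drug   2          $SS_{(\alpha\delta)}$   $MS_{(\alpha\delta)}$   $\sigma^2 + Q((\alpha\delta))$
Residual      20          $SS_\varepsilon$        $MS_\varepsilon$        $\sigma^2$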

Numeric analysis of variance

Source        Degrees of  Sum of    Mean     F-
              Freedom     Squares   Square   statistic
Alcohol        1          5.8968    5.8968   10.11
Subjects      10          5.8340    0.5834
Drugs          2          1.8772    0.9386   13.29
Alcohol*Drug   2          0.8686    0.4343    6.15
Residual      20          1.4126    0.0706
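The numeric table can be checked directly from the twelve-subject data. The following is a minimal Python sketch, not the author's computation: it uses the usual balanced-data sums-of-squares formulas, and the function name `split_plot_anova` and the layout of `data` are assumptions made for illustration. The alcohol F-statistic uses the subject mean square as denominator; the drug and interaction F-statistics use the residual mean square, as the expected mean squares suggest.

```python
# Responses from "The data" slide: one row per subject, columns are drugs A, B, C.
# Rows 0-5 received alcohol, rows 6-11 did not.
data = [
    [3.56, 4.04, 3.26], [3.79, 3.88, 3.49], [4.09, 5.32, 3.79],
    [3.10, 4.38, 2.80], [3.33, 3.63, 3.03], [3.35, 3.63, 3.05],
    [2.83, 2.55, 2.63], [2.93, 2.42, 2.73], [3.58, 3.99, 3.38],
    [2.98, 3.07, 2.78], [2.32, 2.15, 2.12], [2.73, 3.23, 2.53],
]
groups = [range(0, 6), range(6, 12)]  # subject rows in each alcohol group


def split_plot_anova(data, groups):
    """Balanced split-plot ANOVA; returns {source: (df, SS, MS)}."""
    n_subj, n_drugs = len(data), len(data[0])
    n = n_subj * n_drugs
    cf = sum(map(sum, data)) ** 2 / n  # correction factor (grand total squared over n)
    ss_total = sum(y * y for row in data for y in row) - cf
    ss_alcohol = sum(
        sum(sum(data[j]) for j in g) ** 2 / (len(g) * n_drugs) for g in groups
    ) - cf
    ss_subjects = sum(sum(row) ** 2 for row in data) / n_drugs - cf - ss_alcohol
    ss_drugs = sum(
        sum(row[k] for row in data) ** 2 for k in range(n_drugs)
    ) / n_subj - cf
    ss_cells = sum(
        sum(data[j][k] for j in g) ** 2 / len(g)
        for g in groups for k in range(n_drugs)
    ) - cf
    ss_inter = ss_cells - ss_alcohol - ss_drugs
    ss_resid = ss_total - ss_alcohol - ss_subjects - ss_drugs - ss_inter

    a = len(groups)
    rows = [
        ("Alcohol", a - 1, ss_alcohol),
        ("Subjects", n_subj - a, ss_subjects),
        ("Drugs", n_drugs - 1, ss_drugs),
        ("Alcohol*Drug", (a - 1) * (n_drugs - 1), ss_inter),
        ("Residual", (n_subj - a) * (n_drugs - 1), ss_resid),
    ]
    return {name: (df, ss, ss / df) for name, df, ss in rows}


table = split_plot_anova(data, groups)
f_stats = {
    "Alcohol": table["Alcohol"][2] / table["Subjects"][2],
    "Drugs": table["Drugs"][2] / table["Residual"][2],
    "Alcohol*Drug": table["Alcohol*Drug"][2] / table["Residual"][2],
}

for name, (df, ss, ms) in table.items():
    print(f"{name:13s} {df:3d} {ss:8.4f} {ms:8.4f}")
print({k: round(v, 2) for k, v in f_stats.items()})
```

These formulas rely on the balance of the design (six subjects per group, every subject observed on every drug); they do not carry over to the data with missing values used later.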

3 Nested factors

A factor B is said to be nested within factor A if the levels of factor B are different within each level of factor A. In this case, we say factor A contains factor B.

An example

To facilitate explicit displays, we use a smaller version of the drug-alcohol study:

                                  Drug
Alcohol  Subject  SubWithin    A      B
Yes      dma      1           4.35   6.82
Yes      lwh      2           3.39   5.28
Yes      rla      3           5.48   7.12
No       clw      1           4.86   6.44
No       red      2           6.66   8.21
No       bbs      3           5.75   9.25
No       kmd      4           3.87   5.70

The levels of Subjects are completely different for the yes and no levels of Alcohol. We say that Subjects are nested within Alcohol and that Alcohol contains Subjects.

Coding

Sometimes a nested factor is coded such that the levels are unique only within levels of the containing factor. For example, the factor SubWithin in the above display is unique only within levels of Alcohol.

The remainder of this section deals with building the Z matrix. We assume Alcohol, Subject, and SubWithin are class variables.

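Building the Z matrix

\[
Z = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

We can build Z by putting Subject in a random statement. We call this the direct method.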

SAS notation

We can build Z by putting either of the equivalent terms, Alcohol*SubWithin or SubWithin(Alcohol), in a random statement. We call this the product method.

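With the product method, each row of Z is the direct product of the corresponding rows of the Alcohol indicator matrix and the SubWithin indicator matrix (written here with $\odot$ for this row-wise product):

\[
Z = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
= \begin{bmatrix}
1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\
0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1
\end{bmatrix}
\odot
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]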
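The two constructions can be mimicked outside SAS. The following Python sketch builds both versions of Z for the seven-subject example; the helper `indicator` and all variable names are illustrative assumptions, and the Alcohol levels are ordered Yes, No to match the displays above (SAS may order class levels differently).

```python
# Seven subjects from the small example, each observed on drugs A and B.
subjects = ["dma", "lwh", "rla", "clw", "red", "bbs", "kmd"]
alcohol = ["Yes", "Yes", "Yes", "No", "No", "No", "No"]
sub_within = [1, 2, 3, 1, 2, 3, 4]


def indicator(values, levels):
    """Return a 0/1 matrix with one row per value and one column per level."""
    return [[1 if v == level else 0 for level in levels] for v in values]


# One record per observation: two drugs per subject, 14 rows in all.
obs = [
    (s, a, w)
    for s, a, w in zip(subjects, alcohol, sub_within)
    for _drug in ("A", "B")
]

# Direct method: one column per unique subject code (14 x 7).
z_direct = indicator([o[0] for o in obs], subjects)

# Product method: row-wise product of the Alcohol and SubWithin indicators (14 x 8).
alcohol_ind = indicator([o[1] for o in obs], ["Yes", "No"])
within_ind = indicator([o[2] for o in obs], [1, 2, 3, 4])
z_product = [
    [a * w for a in a_row for w in w_row]
    for a_row, w_row in zip(alcohol_ind, within_ind)
]

# Columns of the product version that are identically zero.
zero_cols = [
    c for c in range(len(z_product[0])) if all(row[c] == 0 for row in z_product)
]
print(len(z_direct[0]), len(z_product[0]), zero_cols)
```

The single all-zero column corresponds to the combination Alcohol = Yes, SubWithin = 4, which never occurs because the Yes group has only three subjects. Dropping it recovers the direct-method matrix.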

An editorial

The product method has little to recommend it:

- A variable having a unique subject code must exist, for otherwise the randomization could not have been carried out. Why not use it?
- If there are unequal numbers of subjects in the alcohol groups, the product method will put one or more columns of all zeros in the design matrix. This increases computational time.

From the computational point of view, the worst possible specification is to combine the two methods. For example, Subject(Alcohol) would introduce fourteen columns in the design matrix, and one-half of them would be all zeros.

There is an additional consideration: SAS treats models specified by the product and direct methods differently.

4 Satterthwaite procedure

In this section we give the simplest form of the Satterthwaite approximation [3]. This approximation may be thought of as synthesizing a mean square.

The setup

Suppose a model depends on a vector of fixed effects, $\beta$, and two variances, $\sigma_1^2$ and $\sigma_2^2$. Our interest is in a linear function of the fixed effects, which we denote by $\delta$. Assume that we have a normally distributed estimator, $\hat\delta$, with variance $c_1\sigma_1^2 + c_2\sigma_2^2$, where $c_1$ and $c_2$ are known constants. Available are $SS_1$ and $SS_2$ such that $SS_1 \sim \sigma_1^2\,\chi^2(\nu_1)$ and $SS_2 \sim \sigma_2^2\,\chi^2(\nu_2)$. The symbolic analysis of variance in Section 2 gives an example of $SS_1$ and $SS_2$. $SS_1$, $SS_2$, and $\hat\delta$ are mutually independent.

The test statistic for the null hypothesis that $\delta$ equals a specified value $\delta_0$ is

\[ t = \frac{\hat\delta - \delta_0}{\sqrt{c_1 SS_1/\nu_1 + c_2 SS_2/\nu_2}}. \]

The question is: what is the distribution of $t$?

Decomposing t

The approach used here is to approximate the distribution of $t$ by a t-distribution. That reduces the problem to finding the degrees of freedom of the approximating t-distribution. Define

\[ Z = \frac{\hat\delta - \delta_0}{\sqrt{c_1\sigma_1^2 + c_2\sigma_2^2}} \]

and

\[ U = \frac{c_1\sigma_1^2}{\nu_1\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\,\frac{SS_1}{\sigma_1^2} + \frac{c_2\sigma_2^2}{\nu_2\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\,\frac{SS_2}{\sigma_2^2}; \]

then $t = Z/\sqrt{U}$. Under the null hypothesis, the distribution of $Z$ is standard normal.

It remains to approximate the distribution of $U$ by a chi-square divided by its degrees of freedom, i.e., to find a $\nu$ such that $U \sim \chi^2(\nu)/\nu$ is approximately satisfied.

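Degrees of freedom for approximating distribution

By approximately satisfied we mean that $U$ and $\chi^2(\nu)/\nu$ should have the same variance. Both already have expectation one, so this matches their first two moments. Now

\[ \operatorname{Var}(U) = \left[\frac{c_1\sigma_1^2}{\nu_1\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\right]^2 2\nu_1 + \left[\frac{c_2\sigma_2^2}{\nu_2\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\right]^2 2\nu_2 = 2\,\frac{c_1^2\sigma_1^4/\nu_1 + c_2^2\sigma_2^4/\nu_2}{(c_1\sigma_1^2 + c_2\sigma_2^2)^2} \]

and

\[ \operatorname{Var}\!\left[\chi^2(\nu)/\nu\right] = \frac{2}{\nu}. \]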

Equating these two variances and solving for $\nu$ gives

\[ \nu = \frac{(c_1\sigma_1^2 + c_2\sigma_2^2)^2}{c_1^2\sigma_1^4/\nu_1 + c_2^2\sigma_2^4/\nu_2}. \]
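The formula for $\nu$ is short enough to code directly. The Python sketch below is an illustration under stated assumptions: the function name `satterthwaite_df` and its argument order are hypothetical, and in practice the unknown variances would be replaced by the corresponding mean squares.

```python
def satterthwaite_df(c1, var1, nu1, c2, var2, nu2):
    """Degrees of freedom for c1*var1 + c2*var2 by the two-component formula."""
    numerator = (c1 * var1 + c2 * var2) ** 2
    denominator = (c1 * var1) ** 2 / nu1 + (c2 * var2) ** 2 / nu2
    return numerator / denominator


# Limiting case: only the first component contributes, so nu equals nu1.
only_first = satterthwaite_df(1.0, 2.0, 10, 0.0, 1.0, 20)

# Two equal contributions with equal degrees of freedom pool to nu1 + nu2.
pooled = satterthwaite_df(0.5, 1.0, 8, 0.5, 1.0, 8)

# A general case falls between min(nu1, nu2) and nu1 + nu2.
general = satterthwaite_df(1 / 3, 0.6, 10, 2 / 3, 0.07, 20)

print(only_first, pooled, general)
```

The limiting cases show why the approximation is sensible: a single mean square keeps its own degrees of freedom, and the result never exceeds the total $\nu_1 + \nu_2$.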

5 Estimation with balanced data

Estimators of linear combinations of fixed effects can be categorized in three ways:

1. estimators that are orthogonal to subjects;
2. estimators that involve only subject totals; and
3. other estimators.

We will illustrate a representative estimator from each category. The estimators discussed in this section are defined in terms of the notation given in the symbolic data display of Section 2.

Drug A versus Drug C

A comparison of Drug A with Drug C, averaged over possible interaction effects, is orthogonal to subjects. This is because each drug is used on each subject. The estimator of $\delta_1 - \delta_3$ is $(y_{1,\cdot,1} + y_{2,\cdot,1} - y_{1,\cdot,3} - y_{2,\cdot,3})/2$, and its variance is $\sigma^2/6$. The residual mean square is an estimator of $\sigma^2$ and is distributed proportional to chi-square. The t-distribution is used in the usual way for testing or for confidence intervals. A similar result is true for all contrasts among drug effects or among interaction effects.

Alcohol versus no alcohol

A comparison of alcohol with no alcohol, averaged over any interaction effects, involves only subject totals. The estimator of $\alpha_1 - \alpha_2$ is $y_{1,\cdot,\cdot} - y_{2,\cdot,\cdot}$, and its variance is $(3\sigma_s^2 + \sigma^2)/9$. The subject mean square is an estimator of $3\sigma_s^2 + \sigma^2$ and is distributed proportional to chi-square. The t-distribution is used in the usual way for testing or for confidence intervals.
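Both comparisons can be worked through numerically. The Python sketch below is an illustration only: it takes the mean squares 0.5834 and 0.0706 from the numeric analysis of variance as given inputs, and the helper names `cell_mean` and `group_mean` are assumptions introduced here.

```python
import math

# Responses from "The data" slide: rows are subjects, columns are drugs A, B, C.
data = [
    [3.56, 4.04, 3.26], [3.79, 3.88, 3.49], [4.09, 5.32, 3.79],
    [3.10, 4.38, 2.80], [3.33, 3.63, 3.03], [3.35, 3.63, 3.05],
    [2.83, 2.55, 2.63], [2.93, 2.42, 2.73], [3.58, 3.99, 3.38],
    [2.98, 3.07, 2.78], [2.32, 2.15, 2.12], [2.73, 3.23, 2.53],
]
alcohol_rows = range(0, 6)      # i = 1
no_alcohol_rows = range(6, 12)  # i = 2

# Mean squares copied from the numeric analysis of variance table.
ms_subjects = 0.5834   # estimates 3*sigma_s^2 + sigma^2, 10 df
ms_residual = 0.0706   # estimates sigma^2, 20 df


def cell_mean(rows, drug):
    """Mean response for one drug within one alcohol group."""
    return sum(data[j][drug] for j in rows) / len(rows)


def group_mean(rows):
    """Mean response over all drugs within one alcohol group."""
    return sum(sum(data[j]) for j in rows) / (len(rows) * len(data[0]))


A, C = 0, 2

# Drug A versus Drug C: variance sigma^2 / 6, tested against the residual (20 df).
drug_est = (cell_mean(alcohol_rows, A) + cell_mean(no_alcohol_rows, A)
            - cell_mean(alcohol_rows, C) - cell_mean(no_alcohol_rows, C)) / 2
drug_se = math.sqrt(ms_residual / 6)
drug_t = drug_est / drug_se

# Alcohol versus no alcohol: variance (3*sigma_s^2 + sigma^2) / 9, tested against subjects (10 df).
alc_est = group_mean(alcohol_rows) - group_mean(no_alcohol_rows)
alc_se = math.sqrt(ms_subjects / 9)
alc_t = alc_est / alc_se

print(f"A vs C:  estimate {drug_est:.4f}, se {drug_se:.4f}, t {drug_t:.3f}")
print(f"Alcohol: estimate {alc_est:.4f}, se {alc_se:.4f}, t {alc_t:.3f}")
```

As a consistency check, the square of the alcohol t-statistic agrees, up to rounding of the mean squares, with the F-statistic of 10.11 for Alcohol in the numeric analysis of variance.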

Response with Drug A and Alcohol

The estimated response for a subject on Drug A and Alcohol is $y_{1,\cdot,1}$, and its variance is $(\sigma^2 + \sigma_s^2)/6$. We estimate $\sigma^2 + \sigma_s^2$ by $\frac{1}{3} MS_s + \frac{2}{3} MS_\varepsilon$. Unfortunately, $\frac{1}{3} MS_s + \frac{2}{3} MS_\varepsilon$ is not distributed proportional to chi-square, so the usual confidence interval based on the t-distribution is not strictly valid.

We use the Satterthwaite procedure to find the degrees of freedom of the approximating chi-square distribution. The correspondence of notation is

\[ \sigma_1^2 = 3\sigma_s^2 + \sigma^2, \quad \sigma_2^2 = \sigma^2, \quad \nu_1 = 10, \quad \nu_2 = 20, \quad c_1 = 1/3, \quad c_2 = 2/3. \]

Since the variances are not known, substitute the corresponding mean squares. The result is $\nu = 15$ to the nearest integer. We proceed with the inference assuming a t-distribution with fifteen degrees of freedom.
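The substitution can be carried out by hand or in a few lines of code. This Python sketch plugs the mean squares from the numeric analysis of variance (0.5834 for subjects, 0.0706 for residual) into the degrees-of-freedom formula of Section 4; the variable names are illustrative.

```python
# Constants from the correspondence of notation.
c1, c2 = 1 / 3, 2 / 3
nu1, nu2 = 10, 20

# Mean squares substituted for the unknown variances.
ms_subjects = 0.5834   # stands in for sigma_1^2 = 3*sigma_s^2 + sigma^2
ms_residual = 0.0706   # stands in for sigma_2^2 = sigma^2

synthesized_ms = c1 * ms_subjects + c2 * ms_residual
nu = synthesized_ms ** 2 / (
    (c1 * ms_subjects) ** 2 / nu1 + (c2 * ms_residual) ** 2 / nu2
)

print(f"synthesized mean square = {synthesized_ms:.4f}, nu = {nu:.2f}")
```

The computed value is just under 15. It is dominated by the subject mean square, which carries only 10 degrees of freedom but contributes most of the synthesized mean square.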

6 SAS degrees of freedom options

On the estimate statement one may use the df option to specify the denominator degrees of freedom for the approximate t-distribution. However, except for simple tests with balanced data, most people will want SAS to provide the degrees of freedom. In this section we describe five different methods for determining denominator degrees of freedom that are accessible in SAS.

The containment method

The containment method is the default when the RANDOM statement is used. Otherwise, the containment method is invoked with the DDFM = CONTAIN option on the model statement.

Denote the fixed effect in question by A, and search the RANDOM effect list for the effects that syntactically contain A. Among the random effects that contain A, compute their rank contribution to the [X Z] matrix. The denominator degrees of freedom assigned to A is the smallest of these rank contributions. If A is not found on the random statement, the containment method is not invoked, and the denominator degrees of freedom are the residual degrees of freedom, here $n - \operatorname{rank}[X\ Z]$.

Note that for a nested model, specified by the direct method, the containment method will not be invoked.

The between-within method

The DDFM = BETWITHIN option is the default for REPEATED statement specifications (with no RANDOM statements). It is computed by dividing the residual degrees of freedom into between-subject and within-subject portions. PROC MIXED then checks whether a fixed effect changes within any subject. If so, it assigns within-subject degrees of freedom to the effect; otherwise, it assigns the between-subject degrees of freedom to the effect. If there are multiple within-subject effects containing classification variables, the within-subject degrees of freedom is partitioned into components corresponding to the subject-by-effect interactions.

The residual degrees of freedom

The denominator degrees of freedom are the residual degrees of freedom, $n - \operatorname{rank}(X)$. This will give exact tests for all effects that are orthogonal to the Z matrix, i.e., the split-plot treatment and its interaction with the whole-plot treatment.

The Satterthwaite method

The Satterthwaite method is a generalization of the Satterthwaite procedure described in Section 4. The generalization is discussed in considerable detail in another lecture.

The Kenward-Roger method

The Kenward-Roger method implements the method described in [2]. This method is in SAS starting with Version 8. The Kenward-Roger method uses the Satterthwaite method for determining the denominator degrees of freedom, but it modifies the estimator as well. Calling the Kenward-Roger method a denominator degrees of freedom method is a misnomer.

7 Comparison of degrees of freedom

In Section 5 we looked at three different estimators using traditional methods and taking advantage of the balanced data. In this section, we look at how SAS computes the denominator degrees of freedom for these estimates. We then remove some of the data and repeat the exercise.

Drug-Alcohol data with missing values

A period marks a missing observation.

                      Drugs
Alcohol  Subject    A      B      C
Yes      HW         .     4.04   3.26
Yes      JBM        .     3.88   3.49
Yes      JWL       4.09    .     3.79
Yes      JBH       3.10    .     2.80
Yes      ARE       3.33   3.63   3.03
Yes      EEA       3.35   3.63   3.05
No       DCJ       2.83   2.55    .
No       CJW       2.93   2.42   2.73
No       RDF        .     3.99   3.38
No       RLA       2.98    .     2.78
No       EAS       2.32   2.15   2.12
No       AMR       2.73   3.23   2.53

We have removed seven of the 36 observations, or 19.4%. Four are from the alcohol group, and three are from the no-alcohol group. Three observations are removed from each of the Drug A and Drug B groups, and one observation is removed from Drug C.

The SAS code

The SAS code used for this demonstration is

    proc mixed data = balanced;
      classes Alcohol Subject SubWithin Drug;
      model y = Alcohol Drug Alcohol*Drug / ddfm = contain;
      random Subject;
      /* the rest of the next statement is cut off in the source slide */
      estimate '1' intercept 1 Alcohol 1 0 Drug 1 Alcoho
      estimate '2' Alcohol -1 1;
      estimate '3' Drug 1 0 -1;
    run;

The highlighted parts of the code (on the original slide, apparently the data set name, the ddfm option, and the random effect) are changed from run to run. We use the balanced data and the data with missing observations. We use all five methods of computing the denominator degrees of freedom. We use both the direct and product methods of specifying the random effect.

Estimate 1: Drug A with no alcohol

                 Denominator degrees of freedom
Method           Balanced   Missing
Containment      20         13
Between-within   30         23
Residual         30         23
Satterthwaite    15         13.3
Kenward-Roger    15         13.3

Estimate 2: Alcohol versus no alcohol

                 Denominator degrees of freedom
Method           Balanced   Missing
Containment      20 (10)    13 (10)
Between-within   30         23
Residual         30         23
Satterthwaite    10         9.85
Kenward-Roger    10         9.85

For the containment method, the first number is for direct specification, and the number in parentheses is for product specification.

Estimate 3: Drug A versus Drug C

                 Denominator degrees of freedom
Method           Balanced   Missing
Containment      20         13
Between-within   30         23
Residual         30         23
Satterthwaite    20         13.2
Kenward-Roger    20         13.2
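The Residual rows and the direct-specification Containment entries in these tables are consistent with simple rank counts. The Python sketch below assumes that the residual method uses $n - \operatorname{rank}(X)$ and that containment under the direct method falls back to $n - \operatorname{rank}[X\ Z]$; that reading of the SAS behaviour is an assumption here, and the helpers `matrix_rank` and `df_counts` are hypothetical names. X is coded as six alcohol-by-drug cell indicators, which spans the same column space as the model statement.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def df_counts(missing=frozenset()):
    """Return (n - rank(X), n - rank([X Z])) for the observed cells."""
    x_rows, xz_rows = [], []
    for subj in range(12):
        alc = 0 if subj < 6 else 1          # first six subjects are the alcohol group
        for drug in range(3):
            if (subj, drug) in missing:
                continue
            cell = [0] * 6
            cell[alc * 3 + drug] = 1         # alcohol-by-drug cell indicator
            z = [0] * 12
            z[subj] = 1                      # subject indicator (direct method)
            x_rows.append(cell)
            xz_rows.append(cell + z)
    n = len(x_rows)
    return n - matrix_rank(x_rows), n - matrix_rank(xz_rows)


# (subject row, drug column) pairs shown as "." in the missing-values display.
missing = {(0, 0), (1, 0), (2, 1), (3, 1), (6, 2), (8, 0), (9, 1)}

balanced_df = df_counts()
missing_df = df_counts(missing)
print(balanced_df, missing_df)
```

Under these assumptions the counts are 30 and 20 for the balanced data and 23 and 13 for the data with missing values, matching the Residual and Containment rows. The Satterthwaite and Kenward-Roger values depend on the estimated variance components and are not reproduced by this count.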

References

[1] David M. Allen and Foster B. Cady. Analyzing Experimental Data by Regression. VanNostrand-Reinhold, Belmont, California, 1982.
[2] M. G. Kenward and J. H. Roger. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53:983-997, 1997.
[3] F. E. Satterthwaite. An approximate distribution of estimates of variance components. Biometrics Bulletin, 2:110-114, 1946.