Split-Plot Designs
David M. Allen
University of Kentucky
January 30, 2014

1 Introduction

In this talk we introduce the split-plot design and give an overview of how SAS determines the denominator degrees of freedom for various tests.

2 Drug-Alcohol Study

The drug-alcohol study presented here is based on an actual study. It has been scaled down to facilitate more explicit displays. The responses have been changed because the original data are proprietary. See Allen and Cady [1] for more discussion.

Background

Tranquilizers are one of the most prescribed classes of drugs. Unfortunately, the combination of tranquilizers and alcohol can compromise a driver's ability to operate a motor vehicle. It is desirable to develop a new tranquilizer that serves its intended purpose but does not combine with alcohol to give an undesirable effect.

This trial is to compare the effects of drug, the effects of alcohol, and the effects of their interaction. The drugs are

A  a new drug,
B  a currently popular drug, and
C  a placebo.

The response is the subject's performance on a simulated driving test. While multiple response measurements are recorded, the mean deviation (in feet) from the center of the driving lane is used here.

Randomization

Subjects are the whole-plot unit. The alcohol and no-alcohol treatments are randomly assigned to the twelve subjects with the restriction that there is the same number of subjects in each treatment group.

Separately for each subject, the order of drugs A, B, and C is randomized. There is an adequate interval of time between administration of the different drugs to ensure there are no carry-over effects.

The data

                         Drugs
Alcohol  Subject     A      B      C
Yes      EAS       3.56   4.04   3.26
Yes      JBM       3.79   3.88   3.49
Yes      ARE       4.09   5.32   3.79
Yes      JBH       3.10   4.38   2.80
Yes      WJT       3.33   3.63   3.03
Yes      EEA       3.35   3.63   3.05
No       JWL       2.83   2.55   2.63
No       CJW       2.93   2.42   2.73
No       RDF       3.58   3.99   3.38
No       RLA       2.98   3.07   2.78
No       HW        2.32   2.15   2.12
No       AMR       2.73   3.23   2.53

The model

The model is

\[ y_{ijk} = \mu + \alpha_i + s_j + \delta_k + (\alpha\delta)_{ik} + \varepsilon_{ijk} \]

where
$y_{ijk}$ is the observation on the response variable;
$\mu$ is the overall mean;
$\alpha_i$ is the effect of the $i$th level of alcohol;
$s_j$ is the effect of the $j$th subject;
$\delta_k$ is the effect of the $k$th drug;
$(\alpha\delta)_{ik}$ is the effect of the interaction of the $i$th level of alcohol and the $k$th level of drug; and
$\varepsilon_{ijk}$ is a random error.

We assume $s_j \sim N(0, \sigma_s^2)$, $\varepsilon_{ijk} \sim N(0, \sigma^2)$, and that these effects are mutually independent. All other effects are considered fixed parameters. We have $j = 1, \ldots, 6$ for $i = 1$, and $j = 7, \ldots, 12$ for $i = 2$.

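Symbolic data

                                  Drugs
Alcohol  Subject   A                          B                          C                          Mean
Yes      EAS       $y_{1,1,1}$                $y_{1,1,2}$                $y_{1,1,3}$                $\bar{y}_{1,1,\cdot}$
Yes      JBM       $y_{1,2,1}$                $y_{1,2,2}$                $y_{1,2,3}$                $\bar{y}_{1,2,\cdot}$
...      ...       ...                        ...                        ...                        ...
Yes      WJT       $y_{1,5,1}$                $y_{1,5,2}$                $y_{1,5,3}$                $\bar{y}_{1,5,\cdot}$
Yes      EEA       $y_{1,6,1}$                $y_{1,6,2}$                $y_{1,6,3}$                $\bar{y}_{1,6,\cdot}$
         Mean      $\bar{y}_{1,\cdot,1}$      $\bar{y}_{1,\cdot,2}$      $\bar{y}_{1,\cdot,3}$      $\bar{y}_{1,\cdot,\cdot}$
No       JWL       $y_{2,7,1}$                $y_{2,7,2}$                $y_{2,7,3}$                $\bar{y}_{2,7,\cdot}$
No       CJW       $y_{2,8,1}$                $y_{2,8,2}$                $y_{2,8,3}$                $\bar{y}_{2,8,\cdot}$
...      ...       ...                        ...                        ...                        ...
No       HW        $y_{2,11,1}$               $y_{2,11,2}$               $y_{2,11,3}$               $\bar{y}_{2,11,\cdot}$
No       AMR       $y_{2,12,1}$               $y_{2,12,2}$               $y_{2,12,3}$               $\bar{y}_{2,12,\cdot}$
         Mean      $\bar{y}_{2,\cdot,1}$      $\bar{y}_{2,\cdot,2}$      $\bar{y}_{2,\cdot,3}$      $\bar{y}_{2,\cdot,\cdot}$

A bar together with a dot in place of a subscript denotes the mean over that subscript.
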
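Symbolic analysis of variance

               Degrees of   Sum of                 Mean                   Expected
Source         Freedom      Squares                Square                 Mean Square
Alcohol         1           $SS_\alpha$            $MS_\alpha$            $\sigma^2 + 3\sigma_s^2 + Q(\alpha, (\alpha\delta))$
Subjects       10           $SS_s$                 $MS_s$                 $\sigma^2 + 3\sigma_s^2$
Drugs           2           $SS_\delta$            $MS_\delta$            $\sigma^2 + Q(\delta, (\alpha\delta))$
Alcohol*Drug    2           $SS_{(\alpha\delta)}$  $MS_{(\alpha\delta)}$  $\sigma^2 + Q((\alpha\delta))$
Residual       20           $SS_\varepsilon$       $MS_\varepsilon$       $\sigma^2$
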
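The expected mean squares indicate which ratio tests each effect: the denominator is the mean square whose expectation equals that of the numerator once the $Q$ term vanishes under the null hypothesis. As a sketch, using the degrees of freedom from the table and taking each ratio under its own null hypothesis:

```latex
\begin{align*}
F_{\alpha} &= \frac{MS_\alpha}{MS_s} \sim F(1, 10), \\
F_{\delta} &= \frac{MS_\delta}{MS_\varepsilon} \sim F(2, 20), \\
F_{(\alpha\delta)} &= \frac{MS_{(\alpha\delta)}}{MS_\varepsilon} \sim F(2, 20).
\end{align*}
```

The whole-plot factor (Alcohol) is therefore tested against Subjects, while the split-plot factor (Drugs) and the interaction are tested against the Residual.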
Numeric analysis of variance

               Degrees of   Sum of    Mean      F-
Source         Freedom      Squares   Square    statistic
Alcohol         1           5.8968    5.8968    10.11
Subjects       10           5.8340    0.5834
Drugs           2           1.8772    0.9386    13.29
Alcohol*Drug    2           0.8686    0.4343     6.15
Residual       20           1.4126    0.0706

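The numeric table can be checked by hand from the data. The following is a minimal sketch in Python (not the SAS used in the talk) that computes the sums of squares with the usual totals-squared formulas for balanced data; the helper names `sum_sq_of_totals` and `split_plot_anova` are made up for this illustration.

```python
# Drug-alcohol data as (alcohol, subject, [A, B, C]), copied from the data table.
DATA = [
    ("Yes", "EAS", [3.56, 4.04, 3.26]),
    ("Yes", "JBM", [3.79, 3.88, 3.49]),
    ("Yes", "ARE", [4.09, 5.32, 3.79]),
    ("Yes", "JBH", [3.10, 4.38, 2.80]),
    ("Yes", "WJT", [3.33, 3.63, 3.03]),
    ("Yes", "EEA", [3.35, 3.63, 3.05]),
    ("No", "JWL", [2.83, 2.55, 2.63]),
    ("No", "CJW", [2.93, 2.42, 2.73]),
    ("No", "RDF", [3.58, 3.99, 3.38]),
    ("No", "RLA", [2.98, 3.07, 2.78]),
    ("No", "HW", [2.32, 2.15, 2.12]),
    ("No", "AMR", [2.73, 3.23, 2.53]),
]


def sum_sq_of_totals(groups):
    """Sum over groups of (group total)^2 / (group size)."""
    return sum(sum(g) ** 2 / len(g) for g in groups)


def split_plot_anova(data):
    """Sums of squares for a balanced split-plot layout (illustrative helper)."""
    drugs = range(len(data[0][2]))
    alc_levels = sorted({alc for alc, _, _ in data})
    everything = [y for _, _, row in data for y in row]
    cf = sum(everything) ** 2 / len(everything)  # correction for the mean

    by_alcohol = [[y for a, _, row in data if a == lev for y in row]
                  for lev in alc_levels]
    by_subject = [row for _, _, row in data]
    by_drug = [[row[k] for _, _, row in data] for k in drugs]
    by_cell = [[row[k] for a, _, row in data if a == lev]
               for lev in alc_levels for k in drugs]

    ss = {"Alcohol": sum_sq_of_totals(by_alcohol) - cf}
    ss["Subjects"] = sum_sq_of_totals(by_subject) - sum_sq_of_totals(by_alcohol)
    ss["Drugs"] = sum_sq_of_totals(by_drug) - cf
    ss["Alcohol*Drug"] = (sum_sq_of_totals(by_cell) - cf
                          - ss["Alcohol"] - ss["Drugs"])
    total = sum(y * y for y in everything) - cf
    ss["Residual"] = total - sum(ss.values())
    return ss


DF = {"Alcohol": 1, "Subjects": 10, "Drugs": 2, "Alcohol*Drug": 2, "Residual": 20}
ss = split_plot_anova(DATA)
ms = {src: ss[src] / DF[src] for src in DF}
f_stat = {
    "Alcohol": ms["Alcohol"] / ms["Subjects"],            # whole-plot test
    "Drugs": ms["Drugs"] / ms["Residual"],                # split-plot test
    "Alcohol*Drug": ms["Alcohol*Drug"] / ms["Residual"],  # split-plot test
}
for src in DF:
    line = f"{src:13s} {DF[src]:3d} {ss[src]:8.4f} {ms[src]:8.4f}"
    if src in f_stat:
        line += f" {f_stat[src]:6.2f}"
    print(line)
```

Under these assumptions the printed values should agree with the numeric table to within rounding in the last digit.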
3 Nested factors

A factor B is said to be nested within factor A if the levels of factor B are different within each level of factor A. In this case, we say factor A contains factor B.

An example

To facilitate explicit displays, we use a smaller version of the drug-alcohol study:

                                  Drug
Alcohol  Subject  SubWithin     A      B
Yes      dma      1           4.35   6.82
Yes      lwh      2           3.39   5.28
Yes      rla      3           5.48   7.12
No       clw      1           4.86   6.44
No       red      2           6.66   8.21
No       bbs      3           5.75   9.25
No       kmd      4           3.87   5.70

The levels of Subjects are completely different for the yes and no levels of Alcohol. We say that Subjects are nested within Alcohol and that Alcohol contains Subjects.

Coding

Sometimes a nested factor is coded such that the levels are unique only within levels of the containing factor. For example, the factor SubWithin in the above display is unique only within levels of Alcohol.

The remainder of this section deals with building the Z matrix. We assume Alcohol, Subject, and SubWithin are class variables.

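Building the Z matrix

\[
Z = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]

We can build Z by putting Subject in a random statement. We call this the direct method.
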
SAS notation

We can build Z by putting either of the equivalent terms, Alcohol*SubWithin or SubWithin(Alcohol), in a random statement. We call this the product method.

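\[
Z = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
= \begin{pmatrix}
1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\
0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1
\end{pmatrix}
\odot
\begin{pmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]

Here $\odot$ denotes the row-wise product: each row of $Z$ consists of all products of an entry from the corresponding row of the Alcohol indicator matrix with an entry from the corresponding row of the SubWithin indicator matrix. The fourth column of $Z$ (alcohol yes, SubWithin 4) is all zeros.
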
An editorial

The product method has little to recommend it.

First, a variable having a unique subject code must exist, for otherwise the randomization could not have been carried out. Why not use it?

Second, if there are unequal numbers of subjects in the alcohol groups, the product method will put one or more columns of all zeros in the design matrix. This increases computational time.

From the computational point of view, the worst possible specification is to combine the two methods. For example, Subject(Alcohol) would introduce fourteen columns in the design matrix, and one-half of them would be all zeros.

There is an additional consideration: SAS treats models specified by the product and direct methods differently.

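The column counts quoted for the three specifications can be reproduced with a short sketch. This is Python rather than SAS, and it assumes, as the talk describes, that a crossed or nested random term gets one column for every combination of levels whether or not the combination occurs in the data. The function names are made up for this illustration.

```python
# Small example: (Alcohol, Subject, SubWithin); each subject receives drugs A and B.
SUBJECTS = [
    ("Yes", "dma", 1), ("Yes", "lwh", 2), ("Yes", "rla", 3),
    ("No", "clw", 1), ("No", "red", 2), ("No", "bbs", 3), ("No", "kmd", 4),
]
ROWS = [subj for subj in SUBJECTS for _drug in ("A", "B")]  # 14 observations


def levels(values):
    """Distinct values in order of first appearance."""
    return list(dict.fromkeys(values))


def main_effect_columns(values):
    """One 0/1 indicator column per level of a single factor."""
    return [[int(v == lev) for v in values] for lev in levels(values)]


def crossed_columns(outer, inner):
    """One column per (outer level, inner level) pair, including pairs
    that never occur in the data (these give all-zero columns)."""
    return [
        [int(o == lev_o and i == lev_i) for o, i in zip(outer, inner)]
        for lev_o in levels(outer)
        for lev_i in levels(inner)
    ]


def count_zero_columns(columns):
    """Number of columns whose entries are all zero."""
    return sum(1 for col in columns if not any(col))


alcohol = [r[0] for r in ROWS]
subject = [r[1] for r in ROWS]
subwithin = [r[2] for r in ROWS]

direct = main_effect_columns(subject)           # random Subject
product = crossed_columns(alcohol, subwithin)   # random Alcohol*SubWithin
combined = crossed_columns(alcohol, subject)    # random Subject(Alcohol)

for name, cols in [("direct", direct), ("product", product), ("combined", combined)]:
    print(name, len(cols), "columns,", count_zero_columns(cols), "all zero")
```

With the seven subjects of the small example, the sketch gives 7 columns for the direct method, 8 columns (one all zero) for the product method, and 14 columns (seven all zero) for the combined specification. After the all-zero column is dropped, the product-method columns coincide with the direct-method columns.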
4 Satterthwaite procedure

In this section we give the simplest form of the Satterthwaite approximation [3]. This approximation may be thought of as synthesizing a mean square.

The setup

Suppose a model depends on a vector of fixed effects, $\beta$, and two variances, $\sigma_1^2$ and $\sigma_2^2$. Our interest is in a linear function of the fixed effects, which we denote by $\delta$. Assume that we have a normally distributed estimator, $\hat\delta$, with variance $c_1\sigma_1^2 + c_2\sigma_2^2$, where $c_1$ and $c_2$ are known constants. Available are $SS_1$ and $SS_2$ such that $SS_1 \sim \sigma_1^2\,\chi^2(\nu_1)$ and $SS_2 \sim \sigma_2^2\,\chi^2(\nu_2)$. The symbolic analysis of variance table in Section 2 gives an example of $SS_1$ and $SS_2$. $SS_1$, $SS_2$, and $\hat\delta$ are mutually independent. The test statistic for the null hypothesis that $\delta$ equals a specified value $\delta_0$ is

\[ t = \frac{\hat\delta - \delta_0}{\sqrt{c_1 SS_1/\nu_1 + c_2 SS_2/\nu_2}}. \]

The question is: what is the distribution of $t$?

Decomposing t

The approach used here is to approximate the distribution of $t$ by a t-distribution. That reduces the problem to finding the degrees of freedom of the approximating t-distribution. Define

\[ Z = \frac{\hat\delta - \delta_0}{\sqrt{c_1\sigma_1^2 + c_2\sigma_2^2}} \]

and

\[ U = \frac{c_1\sigma_1^2}{\nu_1\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\,\frac{SS_1}{\sigma_1^2} + \frac{c_2\sigma_2^2}{\nu_2\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\,\frac{SS_2}{\sigma_2^2}; \]

then $t = Z/\sqrt{U}$. Under the null hypothesis, the distribution of $Z$ is standard normal.

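The identity $t = Z/\sqrt{U}$ follows by cancelling the unknown variances. A short verification, using only the definitions of $Z$, $U$, and $t$ given above:

```latex
\begin{align*}
U &= \frac{c_1\sigma_1^2}{\nu_1(c_1\sigma_1^2 + c_2\sigma_2^2)}\cdot\frac{SS_1}{\sigma_1^2}
   + \frac{c_2\sigma_2^2}{\nu_2(c_1\sigma_1^2 + c_2\sigma_2^2)}\cdot\frac{SS_2}{\sigma_2^2}
   = \frac{c_1 SS_1/\nu_1 + c_2 SS_2/\nu_2}{c_1\sigma_1^2 + c_2\sigma_2^2}, \\[4pt]
\frac{Z}{\sqrt{U}} &= \frac{\hat\delta - \delta_0}{\sqrt{c_1\sigma_1^2 + c_2\sigma_2^2}}
   \cdot \sqrt{\frac{c_1\sigma_1^2 + c_2\sigma_2^2}{c_1 SS_1/\nu_1 + c_2 SS_2/\nu_2}}
   = \frac{\hat\delta - \delta_0}{\sqrt{c_1 SS_1/\nu_1 + c_2 SS_2/\nu_2}} = t.
\end{align*}
```

Since $E(SS_i/\sigma_i^2) = \nu_i$, the first line also gives $E(U) = 1$, which equals the mean of $\chi^2(\nu)/\nu$ for every $\nu$. The mean therefore carries no information about $\nu$, and the variance has to be used to choose it.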
It remains to approximate the distribution of $U$ by a Chi-square divided by its degrees of freedom, i.e., there exists a $\nu$ such that $U \sim \chi^2(\nu)/\nu$ is approximately satisfied.

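Degrees of freedom for approximating distribution

By approximately satisfied we mean $U$ and $\chi^2(\nu)/\nu$ should have the same variance. Now

\[ \mathrm{Var}(U) = \left[\frac{c_1\sigma_1^2}{\nu_1\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\right]^2 2\nu_1 + \left[\frac{c_2\sigma_2^2}{\nu_2\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\right]^2 2\nu_2 = 2\,\frac{c_1^2\sigma_1^4/\nu_1 + c_2^2\sigma_2^4/\nu_2}{(c_1\sigma_1^2 + c_2\sigma_2^2)^2} \]

and

\[ \mathrm{Var}\left[\chi^2(\nu)/\nu\right] = \frac{2}{\nu}. \]
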
Equating these two variances and solving for $\nu$ gives

\[ \nu = \frac{(c_1\sigma_1^2 + c_2\sigma_2^2)^2}{c_1^2\sigma_1^4/\nu_1 + c_2^2\sigma_2^4/\nu_2}. \]

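The formula for $\nu$ is simple enough to code directly. A minimal sketch in Python follows; the function name `satterthwaite_df` and the inputs used in the two sanity checks are made up for illustration and are not data from the talk.

```python
def satterthwaite_df(c1, var1, nu1, c2, var2, nu2):
    """Degrees of freedom nu for which chi2(nu)/nu has the same variance as U.

    c1, c2      known constants in Var(delta_hat) = c1*var1 + c2*var2
    var1, var2  the two variances (in practice, the mean squares estimating them)
    nu1, nu2    degrees of freedom of SS_1 and SS_2
    """
    numerator = (c1 * var1 + c2 * var2) ** 2
    denominator = (c1 * var1) ** 2 / nu1 + (c2 * var2) ** 2 / nu2
    return numerator / denominator


# Sanity checks with made-up inputs:
only_first = satterthwaite_df(1.0, 2.0, 10, 0.0, 5.0, 20)  # second term absent
equal_parts = satterthwaite_df(1.0, 3.0, 8, 1.0, 3.0, 8)   # two equal pieces
print(only_first, equal_parts)
```

When the second term drops out, the result reduces to $\nu_1$ (here about 10), and when the two pieces are identical the degrees of freedom add (here 16). Both are what one would expect from an exact Chi-square.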
5 Estimation with balanced data

Estimators of linear combinations of fixed effects can be categorized in three ways:

1. estimators that are orthogonal to subjects;
2. estimators that involve only subject totals; and
3. other estimators.

We will illustrate a representative estimator from each category. The estimators discussed in this section are defined in terms of the notation given in the symbolic data table in Section 2.

Drug A versus Drug C

A comparison of Drug A with Drug C, averaged over possible interaction effects, is orthogonal to subjects. This is because each drug is used on each subject. The estimator of $\delta_1 - \delta_3$ is

\[ (\bar{y}_{1,\cdot,1} + \bar{y}_{2,\cdot,1} - \bar{y}_{1,\cdot,3} - \bar{y}_{2,\cdot,3})/2, \]

and its variance is $\sigma^2/6$. The residual mean square is an estimator of $\sigma^2$ and is distributed proportional to Chi-square. The t-distribution is used in the usual way for testing or confidence intervals.

A similar result is true for all contrasts among drug effects or among interaction effects.

Alcohol versus no alcohol

A comparison of alcohol with no alcohol, averaged over any interaction effects, involves only subject totals. The estimator of $\alpha_1 - \alpha_2$ is $\bar{y}_{1,\cdot,\cdot} - \bar{y}_{2,\cdot,\cdot}$, and its variance is $(3\sigma_s^2 + \sigma^2)/9$. The subject mean square is an estimator of $3\sigma_s^2 + \sigma^2$ and is distributed proportional to Chi-square. The t-distribution is used in the usual way for testing or confidence intervals.

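The two comparisons above can be evaluated numerically from the data table and the mean squares in the numeric analysis of variance. The following Python sketch (not SAS) does so; the helper names `cell_mean` and `group_mean` are made up for illustration, and the mean squares are the rounded values from the table.

```python
from math import sqrt

# Drug-alcohol data as (alcohol, subject, [A, B, C]), copied from the data table.
DATA = [
    ("Yes", "EAS", [3.56, 4.04, 3.26]), ("Yes", "JBM", [3.79, 3.88, 3.49]),
    ("Yes", "ARE", [4.09, 5.32, 3.79]), ("Yes", "JBH", [3.10, 4.38, 2.80]),
    ("Yes", "WJT", [3.33, 3.63, 3.03]), ("Yes", "EEA", [3.35, 3.63, 3.05]),
    ("No", "JWL", [2.83, 2.55, 2.63]), ("No", "CJW", [2.93, 2.42, 2.73]),
    ("No", "RDF", [3.58, 3.99, 3.38]), ("No", "RLA", [2.98, 3.07, 2.78]),
    ("No", "HW", [2.32, 2.15, 2.12]), ("No", "AMR", [2.73, 3.23, 2.53]),
]
MS_SUBJECTS = 0.5834  # subject mean square, from the numeric table
MS_RESIDUAL = 0.0706  # residual mean square, from the numeric table


def cell_mean(data, alcohol, drug_index):
    """Mean response for one alcohol level and one drug (0 = A, 1 = B, 2 = C)."""
    values = [row[drug_index] for a, _, row in data if a == alcohol]
    return sum(values) / len(values)


def group_mean(data, alcohol):
    """Mean response over all subjects and drugs for one alcohol level."""
    values = [y for a, _, row in data if a == alcohol for y in row]
    return sum(values) / len(values)


# Drug A versus Drug C: orthogonal to subjects, variance sigma^2 / 6.
drug_est = (cell_mean(DATA, "Yes", 0) + cell_mean(DATA, "No", 0)
            - cell_mean(DATA, "Yes", 2) - cell_mean(DATA, "No", 2)) / 2
drug_se = sqrt(MS_RESIDUAL / 6)
drug_t = drug_est / drug_se  # refer to a t-distribution with 20 df

# Alcohol versus no alcohol: subject totals only,
# variance (3 sigma_s^2 + sigma^2) / 9.
alc_est = group_mean(DATA, "Yes") - group_mean(DATA, "No")
alc_se = sqrt(MS_SUBJECTS / 9)
alc_t = alc_est / alc_se     # refer to a t-distribution with 10 df

print(f"Drug A - Drug C:      {drug_est:.4f}  se {drug_se:.4f}  t {drug_t:.2f}")
print(f"Alcohol - no alcohol: {alc_est:.4f}  se {alc_se:.4f}  t {alc_t:.2f}")
```

As a consistency check, the square of the alcohol t-statistic should reproduce the F-statistic for Alcohol in the numeric analysis of variance, up to rounding of the mean squares.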
Response with Drug A and Alcohol

The estimated response for a subject on Drug A and Alcohol is $\bar{y}_{1,\cdot,1}$, and its variance is $(\sigma_s^2 + \sigma^2)/6$. We estimate $\sigma_s^2 + \sigma^2$ by $\tfrac{1}{3} MS_s + \tfrac{2}{3} MS_\varepsilon$. Unfortunately, $\tfrac{1}{3} MS_s + \tfrac{2}{3} MS_\varepsilon$ is not distributed proportional to Chi-square, so the usual confidence interval based on the t-distribution is not strictly valid.

We use the Satterthwaite procedure to find the degrees of freedom of the approximating Chi-square distribution. The correspondence of notation is

$\sigma_1^2 = 3\sigma_s^2 + \sigma^2$, $\sigma_2^2 = \sigma^2$,
$\nu_1 = 10$, $\nu_2 = 20$,
$c_1 = 1/3$, $c_2 = 2/3$.

Since the variances are not known, substitute the corresponding mean squares. The result is $\nu \approx 15$. We proceed with the inference assuming a t-distribution with fifteen degrees of freedom.

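The arithmetic behind the fifteen degrees of freedom can be written out with the mean squares from the numeric analysis of variance, $MS_s = 0.5834$ and $MS_\varepsilon = 0.0706$, substituted for $\sigma_1^2$ and $\sigma_2^2$. A sketch, rounded at each step:

```latex
\begin{align*}
\hat\nu &= \frac{\left(\tfrac13 MS_s + \tfrac23 MS_\varepsilon\right)^2}
               {\left(\tfrac13 MS_s\right)^2/10 + \left(\tfrac23 MS_\varepsilon\right)^2/20} \\[4pt]
        &\approx \frac{(0.19447 + 0.04707)^2}{0.19447^2/10 + 0.04707^2/20}
         \approx \frac{0.05834}{0.003892} \approx 14.99 .
\end{align*}
```

Multiplying $c_1$ and $c_2$ by a common constant leaves $\nu$ unchanged, so only the relative weights $1/3$ and $2/3$ matter here.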
6 SAS degrees of freedom options

On the estimate statement one may use the df option to specify the denominator degrees of freedom for the approximate t-distribution. However, except for simple tests with balanced data, most people will want SAS to provide the degrees of freedom. In this section we describe five different methods for determining denominator degrees of freedom that are accessible in SAS.

The containment method

The containment method is the default when the RANDOM statement is used. Otherwise, the containment method is invoked with the DDFM = CONTAIN option on the model statement.

Denote the fixed effect in question by A, and search the RANDOM effect list for the effects that syntactically contain A. Among the random effects that contain A, compute their rank contribution to the [X Z] matrix. The denominator degrees of freedom assigned to A is the smallest of these rank contributions. If A is not found on the random statement, the containment method is not invoked, and the denominator degrees of freedom are the residual degrees of freedom.

Note that for a nested model specified by the direct method, the containment method will not be invoked.

The between-within method

The DDFM = BETWITHIN option is the default for REPEATED statement specifications (with no RANDOM statements). It is computed by dividing the residual degrees of freedom into between-subject and within-subject portions. PROC MIXED then checks whether a fixed effect changes within any subject. If so, it assigns within-subject degrees of freedom to the effect; otherwise, it assigns the between-subject degrees of freedom to the effect. If there are multiple within-subject effects containing classification variables, the within-subject degrees of freedom are partitioned into components corresponding to the subject-by-effect interactions.

The residual degrees of freedom

The denominator degrees of freedom are the residual degrees of freedom. This gives an exact test for all effects that are orthogonal to the Z matrix, i.e., the split-plot treatment and its interaction with the whole-plot treatment.

The Satterthwaite method

The Satterthwaite method is a generalization of the Satterthwaite procedure described in Section 4. The generalization is discussed in considerable detail in another lecture.

The Kenward-Roger method

The Kenward-Roger method implements the method described in [2]. This method is in SAS starting with Version 8. The Kenward-Roger method uses the Satterthwaite method for determining the denominator degrees of freedom, but it modifies the estimator as well. Calling the Kenward-Roger method a denominator degrees of freedom method is a misnomer.

7 Comparison of degrees of freedom

In Section 5 we looked at three different estimators using traditional methods and taking advantage of the balanced data. In this section, we look at how SAS computes the denominator degrees of freedom for these estimates. We then remove some of the data and repeat the exercise.

Drug-Alcohol data with missing values

                         Drugs
Alcohol  Subject     A      B      C
Yes      HW          .    4.04   3.26
Yes      JBM         .    3.88   3.49
Yes      JWL       4.09     .    3.79
Yes      JBH       3.10     .    2.80
Yes      ARE       3.33   3.63   3.03
Yes      EEA       3.35   3.63   3.05
No       DCJ       2.83   2.55     .
No       CJW       2.93   2.42   2.73
No       RDF         .    3.99   3.38
No       RLA       2.98     .    2.78
No       EAS       2.32   2.15   2.12
No       AMR       2.73   3.23   2.53

We have removed seven observations or 19.4%. Four are from the alcohol group, and three are from the no alcohol group. Three observations are removed from both the Drug A and Drug B groups, and one observation is removed from Drug C.

The SAS code

The SAS code used for this demonstration is

    proc mixed data = balanced;
      classes Alcohol Subject SubWithin Drug;
      model y = Alcohol Drug Alcohol*Drug / ddfm = conta...
      random Subject;
      estimate '1' intercept 1 Alcohol 1 0 Drug 1 Alcoho...
      estimate '2' Alcohol -1 1;
      estimate '3' Drug 1 0 -1;
    run;

The highlighted parts of the code (the data set, the ddfm option, and the random effect) are changed from run to run. We use the balanced data and the data with missing observations. We use all five methods of computing the denominator degrees of freedom. We use both the direct and product method of specifying the random effect.

Estimate 1: Drug A with no alcohol

                  Denominator degrees of freedom
Method            Balanced   Missing
Containment       20         13
Between-within    30         23
Residual          30         23
Satterthwaite     15         13.3
Kenward-Roger     15         13.3

Estimate 2: Alcohol versus no alcohol

                  Denominator degrees of freedom
Method            Balanced   Missing
Containment       20 (10)    13 (10)
Between-within    30         23
Residual          30         23
Satterthwaite     10         9.85
Kenward-Roger     10         9.85

For the containment method, the first number is for direct specification, and the number in parentheses is for product specification.

Estimate 3: Drug A versus Drug C

                  Denominator degrees of freedom
Method            Balanced   Missing
Containment       20         13
Between-within    30         23
Residual          30         23
Satterthwaite     20         13.2
Kenward-Roger     20         13.2

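Two of the rows in these tables depend only on the layout of the data and can be checked without SAS. The Python sketch below assumes that the Residual rows are n - rank(X) and that the containment values under direct specification are n - rank[X Z], where X holds the fixed effects and Z the subject indicators. The subject labels are taken from the missing-values table and reused for the balanced case, since only the pattern of observed cells matters. The function names are made up for illustration.

```python
from fractions import Fraction

DRUGS = ("A", "B", "C")
SUBJECTS = {
    "Yes": ["HW", "JBM", "JWL", "JBH", "ARE", "EEA"],
    "No": ["DCJ", "CJW", "RDF", "RLA", "EAS", "AMR"],
}
# (subject, drug) pairs shown as "." in the missing-values table.
MISSING = {("HW", "A"), ("JBM", "A"), ("JWL", "B"), ("JBH", "B"),
           ("DCJ", "C"), ("RDF", "A"), ("RLA", "B")}


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def residual_dfs(missing):
    """Return (n - rank(X), n - rank([X Z])) for the observed pattern."""
    obs = [(alc, s, d) for alc, subs in SUBJECTS.items()
           for s in subs for d in DRUGS if (s, d) not in missing]
    # Cell indicators span the same space as Alcohol, Drug, and Alcohol*Drug.
    cells = [(alc, d) for alc in SUBJECTS for d in DRUGS]
    subjects = [s for subs in SUBJECTS.values() for s in subs]
    x = [[int((alc, d) == c) for c in cells] for alc, _, d in obs]
    z = [[int(s == t) for t in subjects] for _, s, _ in obs]
    xz = [xr + zr for xr, zr in zip(x, z)]
    n = len(obs)
    return n - rank(x), n - rank(xz)


balanced = residual_dfs(set())
with_missing = residual_dfs(MISSING)
print("balanced:", balanced)
print("missing: ", with_missing)
```

Under these assumptions the sketch gives 30 and 20 for the balanced layout and 23 and 13 with the seven observations removed, which matches the Residual rows and the containment values for direct specification.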
References

[1] David M. Allen and Foster B. Cady. Analyzing Experimental Data by Regression. VanNostrand-Reinhold, Belmont, California, 1982.
[2] M. G. Kenward and J. H. Roger. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53:983-997, 1997.
[3] F. E. Satterthwaite. An approximate distribution of estimates of variance components. Biometrics Bulletin, 2:110-114, 1946.