Split-Plot Designs. David M. Allen University of Kentucky. January 30, 2014


1 Introduction

In this talk we introduce the split-plot design and give an overview of how SAS determines the denominator degrees of freedom for various tests.

2 Drug-Alcohol Study

The drug-alcohol study presented here is based on an actual study. It has been scaled down to facilitate more explicit displays. The responses have been changed because the original data are proprietary. See Allen and Cady [1] for more discussion.

Background

Tranquilizers are one of the most prescribed classes of drugs. Unfortunately, the combination of tranquilizers and alcohol can compromise a driver's ability to operate a motor vehicle. It is desirable to develop a new tranquilizer that serves its intended purpose but does not combine with alcohol to give an undesirable effect. This trial compares the effects of drug, the effects of alcohol, and the effects of their interaction. The drugs are A, a new drug; B, a currently popular drug; and C, a placebo. The response is the subject's performance on a simulated driving test. While multiple response measurements are recorded, the mean deviation (in feet) from the center of the driving lane is used here.

Randomization

Subjects are the whole-plot units. The alcohol and no-alcohol treatments are randomly assigned to the twelve subjects, with the restriction that each treatment group has the same number of subjects. Separately for each subject, the order of drugs A, B, and C is randomized. There is an adequate interval of time between administration of the different drugs to ensure there are no carry-over effects.

The data

Alcohol  Subject  Drug A  Drug B  Drug C
Yes      EAS
Yes      JBM
Yes      ARE
Yes      JBH
Yes      WJT
Yes      EEA
No       JWL
No       CJW
No       RDF
No       RLA
No       HW
No       AMR

The model

The model is

\[ y_{ijk} = \mu + \alpha_i + s_j + \delta_k + (\alpha\delta)_{ik} + \varepsilon_{ijk} \]

where
$y_{ijk}$ is the observation on the response variable;
$\mu$ is the over-all mean;
$\alpha_i$ is the effect of the $i$th level of alcohol;
$s_j$ is the effect of the $j$th subject;
$\delta_k$ is the effect of the $k$th drug;
$(\alpha\delta)_{ik}$ is the effect of the interaction of the $i$th level of alcohol and $k$th level of drug; and
$\varepsilon_{ijk}$ is a random error.

We assume $s_j \sim N(0, \sigma_s^2)$, $\varepsilon_{ijk} \sim N(0, \sigma^2)$, and that these effects are mutually independent. All other effects are considered fixed parameters. We have $j = 1, \ldots, 6$ for $i = 1$, and $j = 7, \ldots, 12$ for $i = 2$.
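To make the roles of the terms concrete, here is a minimal Python sketch that draws one data set from the model above. Every numeric value (the mean, the alcohol, drug, and interaction effects, and the two standard deviations) is hypothetical and chosen only for illustration, and the function name `simulate_split_plot` is an assumption, not part of the original study.

```python
import random

def simulate_split_plot(mu, alpha, delta, alpha_delta, sigma_s, sigma,
                        n_per_group, seed=0):
    """Draw y_ijk = mu + alpha_i + s_j + delta_k + (alpha delta)_ik + eps_ijk."""
    rng = random.Random(seed)
    rows = []
    subject = 0
    for i, a in enumerate(alpha):            # whole-plot factor: alcohol level i
        for _ in range(n_per_group):         # subjects are nested within alcohol
            subject += 1
            s_j = rng.gauss(0.0, sigma_s)    # one subject effect shared by all drugs
            for k, d in enumerate(delta):    # split-plot factor: drug k
                eps = rng.gauss(0.0, sigma)  # independent error per observation
                y = mu + a + s_j + d + alpha_delta[i][k] + eps
                rows.append((i + 1, subject, k + 1, y))
    return rows

# Hypothetical parameter values, for illustration only.
data = simulate_split_plot(
    mu=2.0, alpha=[0.5, 0.0], delta=[0.2, 0.4, 0.0],
    alpha_delta=[[0.1, 0.3, 0.0], [0.0, 0.0, 0.0]],
    sigma_s=0.6, sigma=0.3, n_per_group=6)
print(len(data), "observations on", len({row[1] for row in data}), "subjects")
```

Because the subject effect is drawn once per subject and reused for all three drugs, observations on the same subject are correlated, which is what separates the whole-plot and split-plot error terms.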

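Symbolic data

Alcohol  Subject  Drug A                   Drug B                   Drug C                   Mean
Yes      EAS      $y_{1,1,1}$              $y_{1,1,2}$              $y_{1,1,3}$              $\bar{y}_{1,1,\cdot}$
Yes      JBM      $y_{1,2,1}$              $y_{1,2,2}$              $y_{1,2,3}$              $\bar{y}_{1,2,\cdot}$
...
Yes      WJT      $y_{1,5,1}$              $y_{1,5,2}$              $y_{1,5,3}$              $\bar{y}_{1,5,\cdot}$
Yes      EEA      $y_{1,6,1}$              $y_{1,6,2}$              $y_{1,6,3}$              $\bar{y}_{1,6,\cdot}$
         Mean     $\bar{y}_{1,\cdot,1}$    $\bar{y}_{1,\cdot,2}$    $\bar{y}_{1,\cdot,3}$    $\bar{y}_{1,\cdot,\cdot}$
No       JWL      $y_{2,7,1}$              $y_{2,7,2}$              $y_{2,7,3}$              $\bar{y}_{2,7,\cdot}$
No       CJW      $y_{2,8,1}$              $y_{2,8,2}$              $y_{2,8,3}$              $\bar{y}_{2,8,\cdot}$
...
No       HW       $y_{2,11,1}$             $y_{2,11,2}$             $y_{2,11,3}$             $\bar{y}_{2,11,\cdot}$
No       AMR      $y_{2,12,1}$             $y_{2,12,2}$             $y_{2,12,3}$             $\bar{y}_{2,12,\cdot}$
         Mean     $\bar{y}_{2,\cdot,1}$    $\bar{y}_{2,\cdot,2}$    $\bar{y}_{2,\cdot,3}$    $\bar{y}_{2,\cdot,\cdot}$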

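Symbolic analysis of variance

Source        Degrees of Freedom  Sum of Squares        Mean Square           Expected Mean Square
Alcohol       1                   $SS_\alpha$           $MS_\alpha$           $\sigma^2 + 3\sigma_s^2 + Q(\alpha, (\alpha\delta))$
Subjects      10                  $SS_s$                $MS_s$                $\sigma^2 + 3\sigma_s^2$
Drugs         2                   $SS_\delta$           $MS_\delta$           $\sigma^2 + Q(\delta, (\alpha\delta))$
Alcohol*Drug  2                   $SS_{(\alpha\delta)}$ $MS_{(\alpha\delta)}$ $\sigma^2 + Q((\alpha\delta))$
Residual      20                  $SS_\varepsilon$      $MS_\varepsilon$      $\sigma^2$

Here $Q(\cdot)$ denotes a nonnegative quadratic function of the listed fixed effects.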
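The degrees of freedom in this table follow from the layout alone. The short Python sketch below reproduces them for a balanced design; the function name `split_plot_df` and its argument names are illustrative assumptions.

```python
def split_plot_df(a, n, d):
    """Degrees of freedom for a balanced split-plot design with
    a whole-plot levels, n subjects per whole-plot level, d split-plot levels."""
    return {
        "Alcohol": a - 1,
        "Subjects": a * (n - 1),               # subjects within alcohol
        "Drugs": d - 1,
        "Alcohol*Drug": (a - 1) * (d - 1),
        "Residual": a * (n - 1) * (d - 1),     # subject-by-drug within alcohol
    }

df = split_plot_df(a=2, n=6, d=3)
total = sum(df.values())                        # should equal a*n*d - 1
print(df, total)
```

With two alcohol levels, six subjects per level, and three drugs, the entries sum to 35, one less than the 36 observations.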

Numeric analysis of variance

Source        Degrees of Freedom  Sum of Squares  Mean Square  F-statistic
Alcohol
Subjects
Drugs
Alcohol*Drug
Residual

3 Nested factors

A factor B is said to be nested within factor A if the levels of factor B are different within each level of factor A. In this case, we say factor A contains factor B.

An example

To facilitate explicit displays, we use a smaller version of the drug-alcohol study:

Alcohol  Subject  SubWithin  Drug A  Drug B
Yes      dma
Yes      lwh
Yes      rla
No       clw
No       red
No       bbs
No       kmd

The levels of Subjects are completely different for the yes and no levels of Alcohol. We say that Subjects are nested within Alcohol and that Alcohol contains Subjects.

Coding

Sometimes a nested factor is coded such that the levels are unique only within levels of the containing factor. For example, the factor SubWithin in the above display is unique only within levels of Alcohol. The remainder of this section deals with building the Z matrix. We assume Alcohol, Subject, and SubWithin are class variables.

Building the Z matrix

We can build Z by putting Subject in a random statement. We call this the direct method.

[Display of the Z matrix for the direct method]

SAS notation

We can build Z by putting either of the equivalent terms, Alcohol*SubWithin or SubWithin(Alcohol), in a random statement. We call this the product method.

[Display of the Z matrix for the product method]

An editorial

The product method has little to recommend it:

A variable having a unique subject code must exist, for otherwise the randomization could not have been carried out. Why not use it?

If there are unequal numbers of subjects in the alcohol groups, the product method will put one or more columns of all zeros in the design matrix. This increases computational time.

From the computational point of view, the worst possible specification is to combine the two methods. For example, Subject(Alcohol) would introduce fourteen columns in the design matrix, and one-half of them would be all zeros.

There is an additional consideration: SAS treats models specified by the product and direct methods differently.
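The column counts quoted above can be checked with a small Python sketch that mimics the column layout described in the text for the smaller example (three subjects with alcohol, four without, two drugs). It is an illustration only and not SAS's internal code; the SubWithin codes 1, 2, ... restarting within each alcohol group are an assumed coding, and the helper names are hypothetical.

```python
# Smaller drug-alcohol example: (Alcohol, Subject, SubWithin).
# SubWithin restarts at 1 within each alcohol group (assumed coding).
subjects = [("Yes", "dma", 1), ("Yes", "lwh", 2), ("Yes", "rla", 3),
            ("No", "clw", 1), ("No", "red", 2), ("No", "bbs", 3), ("No", "kmd", 4)]
drugs = ["A", "B"]

# One observation (one row of Z) per subject-drug combination.
obs = [(alc, subj, sw) for (alc, subj, sw) in subjects for _ in drugs]

def indicator_matrix(keys, levels):
    """One 0/1 column per entry of `levels`; one row per entry of `keys`."""
    return [[1 if key == level else 0 for level in levels] for key in keys]

def zero_columns(matrix):
    """Count the columns that contain no 1 at all."""
    return sum(1 for col in zip(*matrix) if not any(col))

alcohol_levels = ["Yes", "No"]
subject_levels = [s for (_, s, _) in subjects]
subwithin_levels = sorted({sw for (_, _, sw) in subjects})

# Direct method: one column per level of Subject.
z_direct = indicator_matrix([s for (_, s, _) in obs], subject_levels)

# Product method: one column per Alcohol x SubWithin combination.
z_product = indicator_matrix(
    [(a, sw) for (a, _, sw) in obs],
    [(a, sw) for a in alcohol_levels for sw in subwithin_levels])

# Combined specification: one column per Alcohol x Subject combination.
z_combined = indicator_matrix(
    [(a, s) for (a, s, _) in obs],
    [(a, s) for a in alcohol_levels for s in subject_levels])

for name, z in [("direct", z_direct), ("product", z_product),
                ("combined", z_combined)]:
    print(name, len(z[0]), "columns,", zero_columns(z), "all zero")
```

Under these assumptions the direct method yields seven columns with none empty, the product method yields eight with one empty (the nonexistent fourth subject in the alcohol group), and the combined specification yields fourteen with seven empty.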

4 Satterthwaite procedure

In this section we give the simplest form of the Satterthwaite approximation [3]. This approximation may be thought of as synthesizing a mean square.

The setup

Suppose a model depends on a vector of fixed effects, $\beta$, and two variances, $\sigma_1^2$ and $\sigma_2^2$. Our interest is in a linear function of the fixed effects, which we denote by $\delta$. Assume that we have a normally distributed estimator, $\hat\delta$, with variance $c_1\sigma_1^2 + c_2\sigma_2^2$, where $c_1$ and $c_2$ are known constants. Available are $SS_1$ and $SS_2$ such that $SS_1 \sim \sigma_1^2\,\chi^2(\nu_1)$ and $SS_2 \sim \sigma_2^2\,\chi^2(\nu_2)$. You may look back to the symbolic analysis of variance for an example of $SS_1$ and $SS_2$. $SS_1$, $SS_2$, and $\hat\delta$ are mutually independent. The test statistic for the null hypothesis that $\delta$ is equal to a specified value $\delta_0$ is

\[ t = \frac{\hat\delta - \delta_0}{\sqrt{c_1\,SS_1/\nu_1 + c_2\,SS_2/\nu_2}}. \]

The question is: what is the distribution of $t$?

Decomposing t

The approach used here is to approximate the distribution of $t$ by a t-distribution. That reduces the problem to finding the degrees of freedom of the approximating t-distribution. Define

\[ Z = \frac{\hat\delta - \delta_0}{\sqrt{c_1\sigma_1^2 + c_2\sigma_2^2}} \]

and

\[ U = \frac{c_1\sigma_1^2}{\nu_1\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\,\frac{SS_1}{\sigma_1^2} + \frac{c_2\sigma_2^2}{\nu_2\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\,\frac{SS_2}{\sigma_2^2}, \]

then $t = Z/\sqrt{U}$. Under the null hypothesis, the distribution of $Z$ is standard normal.

It remains to approximate the distribution of $U$ by a Chi-square divided by its degrees of freedom, i.e. there exists a $\nu$ such that $U \sim \chi^2(\nu)/\nu$ is approximately satisfied.

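Degrees of freedom for approximating distribution

By approximately satisfied we mean that $U$ and $\chi^2(\nu)/\nu$ should have the same variance; both already have mean one. Now

\[ \operatorname{Var}(U) = \left[\frac{c_1\sigma_1^2}{\nu_1\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\right]^2 2\nu_1 + \left[\frac{c_2\sigma_2^2}{\nu_2\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\right]^2 2\nu_2 = 2\,\frac{c_1^2\sigma_1^4/\nu_1 + c_2^2\sigma_2^4/\nu_2}{(c_1\sigma_1^2 + c_2\sigma_2^2)^2} \]

and

\[ \operatorname{Var}\!\left[\chi^2(\nu)/\nu\right] = \frac{2}{\nu}. \]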

Equating these two variances and solving for $\nu$ gives

\[ \nu = \frac{(c_1\sigma_1^2 + c_2\sigma_2^2)^2}{c_1^2\sigma_1^4/\nu_1 + c_2^2\sigma_2^4/\nu_2}. \]
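As a quick numerical illustration, the formula for the approximate degrees of freedom can be written as a small Python function. The function name `satterthwaite_df` and the input values below are hypothetical; in practice the unknown variances are replaced by the corresponding mean squares.

```python
def satterthwaite_df(c1, v1, nu1, c2, v2, nu2):
    """Approximate degrees of freedom
    nu = (c1*v1 + c2*v2)^2 / ((c1*v1)^2/nu1 + (c2*v2)^2/nu2),
    where v1, v2 are the variances (or the mean squares estimating them)."""
    numerator = (c1 * v1 + c2 * v2) ** 2
    denominator = (c1 * v1) ** 2 / nu1 + (c2 * v2) ** 2 / nu2
    return numerator / denominator

# Only the first component contributes: nu reduces to nu1.
only_first = satterthwaite_df(1.0, 4.0, 10, 0.0, 1.0, 20)

# Equal variances with weights 1/3 and 2/3 on 10 and 20 degrees of freedom.
equal_vars = satterthwaite_df(1 / 3, 1.0, 10, 2 / 3, 1.0, 20)

print(only_first, equal_vars)
```

The first call returns the degrees of freedom of the single contributing component. In the second call, the weights are proportional to the degrees of freedom, so the result is their sum, which is the largest value the formula can give.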

5 Estimation with balanced data

Estimators of linear combinations of fixed effects can be categorized in three ways:
1. estimators that are orthogonal to subjects;
2. estimators that involve only subject totals; and
3. other estimators.

We will illustrate a representative estimator from each category. The estimators discussed in this section are defined in terms of the notation given in the symbolic data table.

Drug A versus Drug C

A comparison of Drug A with Drug C, averaged over possible interaction effects, is orthogonal to subjects. This is because each drug is used on each subject. The estimator of $\delta_1 - \delta_3$ is

\[ (\bar{y}_{1,\cdot,1} + \bar{y}_{2,\cdot,1} - \bar{y}_{1,\cdot,3} - \bar{y}_{2,\cdot,3})/2, \]

and its variance is $\sigma^2/6$. The residual mean square is an estimator of $\sigma^2$ and is distributed proportional to Chi-square. The t-distribution is used in the usual way for testing or confidence intervals. A similar result is true for all contrasts among drug effects or among interaction effects.

Alcohol versus no alcohol

A comparison of alcohol with no alcohol, averaged over any interaction effects, involves only subject totals. The estimator of $\alpha_1 - \alpha_2$ is $\bar{y}_{1,\cdot,\cdot} - \bar{y}_{2,\cdot,\cdot}$, and its variance is $(3\sigma_s^2 + \sigma^2)/9$. The subject mean square is an estimator of $3\sigma_s^2 + \sigma^2$ and is distributed proportional to Chi-square. The t-distribution is used in the usual way for testing or confidence intervals.

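Response with Drug A and Alcohol

The estimated response for a subject on Drug A and Alcohol is $\bar{y}_{1,\cdot,1}$, and its variance is $(\sigma_s^2 + \sigma^2)/6$. The divisor is a known constant, so the inference rests on estimating $\sigma_s^2 + \sigma^2$, which we do by $\tfrac{1}{3} MS_s + \tfrac{2}{3} MS_\varepsilon$. Unfortunately, $\tfrac{1}{3} MS_s + \tfrac{2}{3} MS_\varepsilon$ is not distributed proportional to Chi-square, so the usual confidence interval based on the t-distribution is not strictly valid.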
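The variances of the three estimators in this section can be checked mechanically. Under the model, the variance of any linear combination of the observations is $\sigma_s^2$ times the sum of squared per-subject weight totals plus $\sigma^2$ times the sum of squared weights. The Python sketch below applies that rule with exact fractions; the function and variable names are illustrative assumptions.

```python
from fractions import Fraction

def variance_coefficients(weights):
    """weights maps (subject, drug) -> coefficient on that observation.
    Returns (coefficient of sigma_s^2, coefficient of sigma^2)
    in the variance of the weighted sum of observations."""
    per_subject = {}
    for (subject, _drug), w in weights.items():
        per_subject[subject] = per_subject.get(subject, 0) + w
    coef_subject = sum(t * t for t in per_subject.values())
    coef_error = sum(w * w for w in weights.values())
    return coef_subject, coef_error

alcohol = range(1, 7)       # subjects 1-6 receive alcohol
no_alcohol = range(7, 13)   # subjects 7-12 do not
drugs = (1, 2, 3)           # 1 = A, 2 = B, 3 = C

# Drug A versus Drug C, averaged over both alcohol groups.
drug_contrast = {}
for j in range(1, 13):
    drug_contrast[(j, 1)] = Fraction(1, 12)
    drug_contrast[(j, 3)] = Fraction(-1, 12)

# Alcohol versus no alcohol: difference of two means of 18 observations.
alcohol_contrast = {(j, k): Fraction(1, 18) for j in alcohol for k in drugs}
alcohol_contrast.update(
    {(j, k): Fraction(-1, 18) for j in no_alcohol for k in drugs})

# Mean response on Drug A with alcohol: mean of six observations.
cell_mean = {(j, 1): Fraction(1, 6) for j in alcohol}

for name, w in [("drug A - drug C", drug_contrast),
                ("alcohol - no alcohol", alcohol_contrast),
                ("drug A with alcohol", cell_mean)]:
    print(name, variance_coefficients(w))
```

The first estimator has no subject-variance component, the second involves the two variances in the ratio 3 to 1 (the subject expected mean square), and the third involves them in the ratio 1 to 1, which matches no single line of the analysis of variance table.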

We use the Satterthwaite procedure to find the degrees of freedom of the approximating Chi-square distribution. The correspondence of notation is

$\sigma_1^2 = 3\sigma_s^2 + \sigma^2$
$\sigma_2^2 = \sigma^2$
$\nu_1 = 10$
$\nu_2 = 20$
$c_1 = 1/3$
$c_2 = 2/3$

Since the variances are not known, substitute the corresponding mean squares. The result is $\nu = 15$. We proceed with the inference assuming a t-distribution with fifteen degrees of freedom.
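Written out with the correspondence above, and with $MS_s$ and $MS_\varepsilon$ substituted for $\sigma_1^2$ and $\sigma_2^2$, the quantity being evaluated is the following. The hat on $\nu$ is notation introduced here to mark the plug-in estimate.

```latex
\[
\hat\nu
  = \frac{\left(\tfrac{1}{3} MS_s + \tfrac{2}{3} MS_\varepsilon\right)^2}
         {\left(\tfrac{1}{3} MS_s\right)^2 / 10
          + \left(\tfrac{2}{3} MS_\varepsilon\right)^2 / 20}
\]
```

The value always lies between the smaller of the two component degrees of freedom, 10, and their sum, 30.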

6 SAS degrees of freedom options

On the estimate statement one may use the df option to specify the denominator degrees of freedom for the approximate t-distribution. However, except for simple tests with balanced data, most people will want SAS to provide the degrees of freedom. In this section we describe five different methods for determining denominator degrees of freedom that are accessible in SAS.

The containment method

The containment method is the default when the RANDOM statement is used. Otherwise, the containment method is invoked with the DDFM = CONTAIN option on the model statement. Denote the fixed effect in question by A, and search the RANDOM effect list for the effects that syntactically contain A. Among the random effects that contain A, compute their rank contribution to the [X Z] matrix. The denominator degrees of freedom assigned to A is the smallest of these rank contributions. If A is not found in any effect on the random statement, the containment method is not invoked, and the denominator degrees of freedom are the residual degrees of freedom.

Note that for a nested model specified by the direct method, the containment method will not be invoked, because the random term Subject does not syntactically contain Alcohol.

The between-within method

The DDFM = BETWITHIN option is the default for REPEATED statement specifications (with no RANDOM statements). It is computed by dividing the residual degrees of freedom into between-subject and within-subject portions. PROC MIXED then checks whether a fixed effect changes within any subject. If so, it assigns within-subject degrees of freedom to the effect; otherwise, it assigns the between-subject degrees of freedom to the effect. If there are multiple within-subject effects containing classification variables, the within-subject degrees of freedom are partitioned into components corresponding to the subject-by-effect interactions.

The residual degrees of freedom

The denominator degrees of freedom are the residual degrees of freedom. This gives exact tests for all effects that are orthogonal to the Z matrix, that is, the split-plot treatment and its interaction with the whole-plot treatment.

The Satterthwaite method

The Satterthwaite method is a generalization of the Satterthwaite procedure described in Section 4. The generalization is discussed in considerable detail in another lecture.

The Kenward-Roger method

The Kenward-Roger method implements the method described in [2]. This method has been available in SAS since Version 8. The Kenward-Roger method uses the Satterthwaite method for determining the denominator degrees of freedom, but it modifies the estimator as well. Calling the Kenward-Roger method a denominator degrees of freedom method is therefore a misnomer.

7 Comparison of degrees of freedom

In Section 5 we looked at three different estimators using traditional methods and taking advantage of the balanced data. In this section, we look at how SAS computes the denominator degrees of freedom for these estimates. We then remove some of the data and repeat the exercise.

Drug-Alcohol data with missing values

Alcohol  Subject  Drug A  Drug B  Drug C
Yes      HW
Yes      JBM
Yes      JWL
Yes      JBH
Yes      ARE
Yes      EEA
No       DCJ
No       CJW
No       RDF
No       RLA
No       EAS
No       AMR

We have removed seven of the 36 observations, or 19.4%. Four are from the alcohol group, and three are from the no-alcohol group. Three observations are removed from each of the Drug A and Drug B groups, and one observation is removed from Drug C.

The SAS code

The SAS code used for this demonstration is

    proc mixed data = balanced;
      classes Alcohol Subject SubWithin Drug;
      model y = Alcohol Drug Alcohol*Drug / ddfm = conta
      random Subject;
      estimate 1 intercept 1 Alcohol 1 0 Drug 1 Alcoho
      estimate 2 Alcohol -1 1 ;
      estimate 3 Drug ;
    run;

The highlighted parts of the code are changed from run to run. We use the balanced data and the data with missing observations. We use all five methods of computing the denominator degrees of freedom. We use both the direct and product methods of specifying the random effect.

Estimate 1: Drug A with no alcohol

Denominator degrees of freedom

Method          Balanced  Missing
Containment
Between-within
Residual
Satterthwaite
Kenward-Roger

Estimate 2: Alcohol versus no alcohol

Denominator degrees of freedom

Method          Balanced  Missing
Containment     20(10)    13(10)
Between-within
Residual
Satterthwaite
Kenward-Roger

For the containment method, the first number is for direct specification, and the number in parentheses is for product specification.

Estimate 3: Drug A versus drug C

Denominator degrees of freedom

Method          Balanced  Missing
Containment
Between-within
Residual
Satterthwaite
Kenward-Roger

References

[1] David M. Allen and Foster B. Cady. Analyzing Experimental Data by Regression. VanNostrand-Reinhold, Belmont, California.
[2] M. G. Kenward and J. H. Roger. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53.
[3] F. E. Satterthwaite. An approximate distribution of estimates of variance components. Biometrics Bulletin, 2.
