
QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL
Published online 7 November 2006 in Wiley InterScience

Research

Analysis of Split-plot Designs: An Overview and Comparison of Methods

T. Næs (1,2,*), A. H. Aastveit (3) and N. S. Sahni (4)

(1) Matforsk, Oslovegen 1, 1430 Ås, Norway
(2) Department of Mathematics, University of Oslo, Blindern, Norway
(3) UMB, 1430 Ås, Norway
(4) Diagenic AS, Østensjøveien 15B, N-0661 Oslo, Norway

Split-plot designs are frequently needed in practice because of practical limitations and cost. They impose extra challenges on the experimenter, both when designing the experiment and when analysing the data, in particular for nonreplicated cases. This paper is an overview and discussion of some of the most important methods for analysing split-plot data. The focus is on estimation, testing and model validation. Two examples from an industrial context are given to illustrate the most important techniques.

Received 7 June 2004; Revised 15 May 2006

KEY WORDS: split-plot designs; fractional factorial designs; ANOVA; mixed models; response surface models

1. INTRODUCTION

The efficient use of experimental design methodology is an important requirement for fast and reliable progress in many branches of today's society. Factorial and fractional factorial designs are amongst the most frequently used design types [1]. They are simple, generally straightforward to generate and analyse, and can be used in a number of different situations. The basic requirements of randomization and blocking of the experimental runs may, however, be difficult to satisfy for economic or practical reasons. Various restrictions are therefore imposed on the structure of the design, resulting in data with a more complex error structure than that of completely randomized designs.
One of the most important and frequently used strategies of this type is the split-plot design, which has recently received much attention in the literature (see, for example, Montgomery [2], Letsinger et al. [3], Bingham and Sitter [4], Kowalski et al. [5], Box and Jones [6] and Bisgaard [7]). Classic split-plot designs, traditionally used in agricultural applications, are run in replicates in a number of blocks. Data from such split-plot designs can be analysed by adding an extra random error term to the analysis of variance (ANOVA) model. In modern industrial situations, however, where fast and inexpensive progress is essential, replicates may lead to experiments that are too large, expensive and time consuming. Attention has therefore shifted from replicated to unreplicated split-plot experiments (see, for example, Box and Jones [6] and Bisgaard [7]).

*Correspondence to: T. Næs, Matforsk, Oslovegen 1, 1430 Ås, Norway. E-mail: tormod.naes@matforsk.no

This paper is an overview and discussion of the available methodology for analysing data from split-plot designs, with the main focus on unreplicated industrial experiments. Situations with both continuous and categorical variables are covered. Several problems related to the use of the methodology, for instance model validation and outlier detection, are also discussed, and some areas for possible future research are indicated. The paper ends with two real-life examples of split-plot experiments in the food industry, illustrating some of the methodologies described.

2. EXAMPLES OF TYPICAL SPLIT-PLOT DESIGNS AND THEIR ANALYSES

2.1. Agricultural example with replicates

In this example, three different fertilizers and four irrigation systems are to be tested on two different varieties of potato. The response variable of interest is the yield per hectare (denoted by Y). For the purpose of experimentation, three fields, so-called blocks, are available. The blocks are assumed to be large enough that each potato variety can be tested with all three fertilizers and all four irrigation systems in each block, i.e. it is possible to split each block into 4 × 3 × 2 = 24 smaller (equally sized) sections or plots. Practical limitations of the irrigation system, however, mean that full randomization of the 24 combinations is not possible: the same irrigation system can only be applied to a strip of the experimental area, as shown in Figure 1. If the positions of the four irrigation systems are randomized within each block, and the three fertilizers and two potato varieties are randomized within each irrigation strip, the resulting design is an example of a classical split-plot design (see, for example, Montgomery [2], Weber and Skillings [8] and Mead et al. [9]). Irrigation is a so-called whole-plot factor and the other factors are so-called sub-plot factors.
Note that because the fertilizers and potato varieties are compared within each irrigation strip, there is reason to believe that in many cases the experimental error for these effects is smaller than it would be in a fully randomized experiment. This can lead to more reliable inference for sub-plot effects than for the same effects in a fully randomized experiment, which in some cases is an additional argument for running the experiment in split-plot mode. For the whole-plot effects, however, the situation is often the opposite.

These types of data are traditionally analysed using regular ANOVA: sums of squares are computed for all of the effects and divided by their degrees of freedom (DFs) to obtain mean squares (MSs), and ratios of MSs are then used in F-tests of the significance of the effects. The main difference between the ANOVA model for a fully randomized block experiment and that for a split-plot experiment is that an extra whole-plot error term must be added at the whole-plot level of randomization. If we let $i$ be the index for the whole-plot factor (irrigation method), $j$ and $k$ the indices for the sub-plot factors (fertilizer and potato variety) and $l$ the index for the replicate/block, a natural model for the split-plot experiment is

$$Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \delta_l + E_{il} + e_{ijkl}, \qquad i = 1,\dots,4;\; j = 1,2,3;\; k = 1,2;\; l = 1,2,3 \qquad (1)$$

Here $\alpha_i$ ($i = 1, \dots, 4$) are the irrigation effects (fixed), $\beta_j$ ($j = 1, 2, 3$) are the fertilizer effects (fixed), $\gamma_k$ ($k = 1, 2$) are the potato variety effects (fixed), $\delta_l$ ($l = 1, 2, 3$) are the replicate/block effects (random) and $E_{il}$ is the whole-plot random error term, which accounts for the interaction between irrigation system $i$ and block $l$, with $\mathrm{Var}(E_{il}) = \sigma^2_{wp}$. The term $e_{ijkl}$ is the regular residual (or sub-plot) error, with $\mathrm{Var}(e_{ijkl}) = \sigma^2_{sp}$. The remaining terms are two-factor interactions between the fixed effects.
A three-factor interaction effect is also possible, but it is not included in model (1). Random interactions between sub-plot factors and replicates could also be incorporated, but these are often considered negligible and are therefore omitted here. Because some of the effects are random, the tests of the different effects are based on different error terms. For this model, the whole-plot main effects are tested against the whole-plot error MS, while the sub-plot effects and the interactions between whole-plot and sub-plot factors are tested against the regular residual (sub-plot) MS.
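The sums of squares and F-ratios for model (1) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' software: it assumes a balanced layout stored in an array `y[i, j, k, l]` (irrigation, fertilizer, variety, block), and the function name `split_plot_anova` and the simulated effect sizes are hypothetical. Irrigation is tested against the irrigation × block (whole-plot error) MS, and all sub-plot terms against the residual MS, which here pools the omitted three-factor interaction and the sub-plot × block terms.

```python
import numpy as np

def split_plot_anova(y):
    """Sums of squares, DFs and F-ratios for model (1), balanced data.

    y has shape (I, J, K, L): irrigation i (whole plot), fertilizer j and
    variety k (sub-plot factors), block l."""
    I, J, K, L = y.shape
    m = y.mean()

    def means(keep):
        # Average over every axis not in `keep`; keep dims for broadcasting.
        drop = tuple(ax for ax in range(4) if ax not in keep)
        return y.mean(axis=drop, keepdims=True)

    def ss(effect):
        # Square the effect and sum it over all I*J*K*L observations.
        return float(np.sum(np.broadcast_to(effect, y.shape) ** 2))

    a, b, c, d = means((0,)), means((1,)), means((2,)), means((3,))
    ab, ac, bc, ad = means((0, 1)), means((0, 2)), means((1, 2)), means((0, 3))

    table = {
        "irrigation": (ss(a - m), I - 1),
        "block": (ss(d - m), L - 1),
        "wp_error": (ss(ad - a - d + m), (I - 1) * (L - 1)),  # E_il
        "fertilizer": (ss(b - m), J - 1),
        "variety": (ss(c - m), K - 1),
        "irr_x_fert": (ss(ab - a - b + m), (I - 1) * (J - 1)),
        "irr_x_var": (ss(ac - a - c + m), (I - 1) * (K - 1)),
        "fert_x_var": (ss(bc - b - c + m), (J - 1) * (K - 1)),
    }
    ss_total = float(np.sum((y - m) ** 2))
    # Residual (sub-plot error) = whatever the terms above do not explain.
    table["sp_error"] = (ss_total - sum(s for s, _ in table.values()),
                         y.size - 1 - sum(df for _, df in table.values()))

    ms = {name: s / df for name, (s, df) in table.items()}
    F = {"irrigation": ms["irrigation"] / ms["wp_error"]}  # whole-plot test
    for name in ("fertilizer", "variety", "irr_x_fert", "irr_x_var", "fert_x_var"):
        F[name] = ms[name] / ms["sp_error"]               # sub-plot tests
    return table, F

# Simulated data with hypothetical irrigation effects and both error strata.
rng = np.random.default_rng(1)
I, J, K, L = 4, 3, 2, 3
irrigation = np.array([0.0, 1.0, 2.0, 3.0]).reshape(I, 1, 1, 1)
E = rng.normal(0.0, 0.5, size=(I, 1, 1, L))   # whole-plot error, shared within a strip
e = rng.normal(0.0, 0.2, size=(I, J, K, L))   # sub-plot error
y = 10.0 + irrigation + E + e
table, F = split_plot_anova(y)
print(table["wp_error"][1], table["sp_error"][1])  # 6 46
```

P-values would follow by referring each ratio to an F distribution with the listed DFs (for example with `scipy.stats.f.sf`); the whole-plot test has only 3 and 6 DFs here, which illustrates why whole-plot effects are usually estimated less precisely.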

[Figure 1: field layout. Irrigation order by block: Block 1: I1, I2, I4, I3; Block 2: I3, I4, I2, I1; Block 3: I2, I3, I1, I4. Each irrigation strip contains the six fertilizer × variety combinations (F1–F3 × P1, P2) in random order.]

Figure 1. Illustration of the agricultural example with three replicates (blocks). The symbols I1–I4 represent the four irrigation systems, F1–F3 the three fertilizers and P1 and P2 the two potato varieties. The solid vertical lines indicate the borders between the strips of land treated by the same irrigation system. The four irrigation systems are randomized within each block, and the potato varieties and fertilizing regimes are randomized within each irrigation system.

2.2. Industrial example without replicates

Time and economic constraints often rule out extensive replication of industrial experiments. A more typical industrial example is therefore the following. Assume that, in a particular manufacturing process, one is interested in the effect on the yield of five different factors: A, B, C, D and E. Factors A and B are two process variables, while C, D and E represent three different ingredients. Further, the process variables A and B are expensive to change, so a completely randomized design is not feasible.

A split-plot design with, for instance, two levels for each of the five factors may then be used. This is done by conducting, in randomized order, the eight (2 × 2 × 2) experiments corresponding to all combinations of the ingredient variables C, D and E within each of the four (2 × 2) combinations of the process variables A and B. As above, the process variables are called whole-plot factors and the ingredient variables are called sub-plot factors. In some cases it may also be necessary to reduce the design further by using a fractional factorial design. Such designs run in split-plot mode were discussed by Bisgaard [7]. A reduced design of this kind is used in the first real-life example below. A possible model for such a split-plot experiment is

$$Y_{ijklm} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + E_{ij} + \gamma_k + \eta_l + \delta_m + W + e_{ijklm} \qquad (2)$$

where $\alpha_i$ and $\beta_j$ represent the whole-plot effects (process variables A and B), $(\alpha\beta)_{ij}$ represents the interaction between the whole-plot factors, $\gamma_k$, $\eta_l$ and $\delta_m$ represent the sub-plot effects (ingredients C, D and E) and $W$ comprises the sub-plot interactions and the interactions between whole-plot and sub-plot effects. All of these parameters are considered fixed effects. $E_{ij}$ is the random whole-plot error, with $\mathrm{Var}(E_{ij}) = \sigma^2_{wp}$, and $e_{ijklm}$ is the sub-plot error, with $\mathrm{Var}(e_{ijklm}) = \sigma^2_{sp}$. Model (2) is essentially the same as model (1), except that the whole-plot error $E_{ij}$ and the interaction between A and B, $(\alpha\beta)_{ij}$, now have the same indices $ij$. This means that if the whole-plot interaction is present (non-zero), the two effects $(\alpha\beta)_{ij}$ and $E_{ij}$ cannot be separated (identified) unless extra information is available. If, however, the interaction $(\alpha\beta)_{ij}$ is assumed to be zero, $E_{ij}$ is valid as a separate random whole-plot error term and a regular ANOVA can technically be used.
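To make the whole-plot test in this unreplicated design concrete, the Python/NumPy sketch below computes the whole-plot sums of squares for model (2) and uses the A × B interaction, assumed to be zero, as the whole-plot error term. It assumes ±1 coding of all factors and a response vector in which the eight sub-plot runs of each whole plot are stored contiguously; the function name `whole_plot_F` and the simulated data are hypothetical.

```python
import itertools
import numpy as np

def whole_plot_F(y, A, B):
    """F-ratios for A and B, using the AB interaction (1 DF) as whole-plot error.

    y, A, B are length-N arrays; A and B are coded -1/+1 and balanced."""
    N = len(y)

    def ss(x):
        # For a balanced +/-1 contrast, SS = N * effect**2 / 4,
        # where effect = mean(y | x = +1) - mean(y | x = -1).
        effect = y[x == 1].mean() - y[x == -1].mean()
        return N * effect**2 / 4

    ss_ab = ss(A * B)                       # assumed to contain only whole-plot error
    return ss(A) / ss_ab, ss(B) / ss_ab     # each F has 1 and 1 DF

# 2^2 whole plots (A, B), each containing the full 2^3 sub-plot design (C, D, E).
runs = [(a, b, c, d, e)
        for a, b in itertools.product([-1, 1], repeat=2)
        for c, d, e in itertools.product([-1, 1], repeat=3)]
X = np.array(runs)
rng = np.random.default_rng(0)
wp_error = np.repeat(rng.normal(0.0, 1.0, size=4), 8)   # one draw per whole plot
y = 5 + 2 * X[:, 0] + wp_error + rng.normal(0.0, 0.3, size=32)
F_A, F_B = whole_plot_F(y, X[:, 0], X[:, 1])
```

Each ratio has to be compared with an F(1, 1) distribution, whose critical values are very large; this is the practical meaning of the statement that the whole-plot test is weak in this example.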
Note, however, that the whole-plot main effects will then have only one DF for the denominator, so the whole-plot test is very weak in this example. Sub-plot effects and their interactions are tested against the sub-plot error. To avoid this problem, the number of whole-plot factors must generally be relatively large, say five or six, with the higher-order interactions among them set equal to zero. In cases with many design factors, probability plots can also be very useful [6], making it unnecessary to assume that any interactions are zero. Separate probability plots must be made for the whole-plot and sub-plot factors.

2.3. Categorical versus continuous variables

In the two examples presented in Sections 2.1 and 2.2, the experimental factors were treated as categorical variables. For the last example, however, it is equally interesting to model the output in more detail by treating the factors as continuous variables. To do so, one would normally need a more elaborate design that allows fitting, for instance, quadratic polynomials. One possibility is to run a central composite design (CCD) in the five factors instead of only the $2^5$ design considered above. If such a design is used in split-plot mode, different sub-plot combinations are used for each whole-plot combination, which can sometimes lead to more complex analyses [3]. An alternative is to use a full CCD for the whole-plot factors and cross it with a CCD in the sub-plot factors, but this strategy leads to rather large experiments. The second real-life example below is one in which the variables are continuous. Kowalski et al. [5] have discussed strategies for setting up fractions for such experiments. The methods of analysis described in Section 4 cover both the continuous and the categorical cases, with the main focus on unreplicated designs.

3. GENERAL MODEL STRUCTURE FOR SPLIT-PLOT DESIGNS

The two models in Section 2, and all other split-plot models, can be cast into the general framework

$$\mathbf{y} = \mathbf{D}\mathbf{b} + \boldsymbol{\delta} + \mathbf{e} \qquad (3)$$
where $\mathbf{y}$ is the vector of response measurements, $\mathbf{D}$ is the design matrix for the experimental factors, $\mathbf{b}$ is the vector of corresponding regression coefficients, $\boldsymbol{\delta}$ is the vector of whole-plot error terms and $\mathbf{e}$ is the vector of sub-plot errors (see, for example, Letsinger et al. [3]). The terms $\boldsymbol{\delta}$ and $\mathbf{e}$ can be combined into one error term, $\boldsymbol{\varepsilon} = \boldsymbol{\delta} + \mathbf{e}$. The elements of $\boldsymbol{\delta}$ are defined so that they are identical for runs within the same whole plot and uncorrelated between whole plots, while the elements of $\mathbf{e}$ are assumed mutually uncorrelated and uncorrelated with the elements of $\boldsymbol{\delta}$. If the whole-plot error variance is $\sigma^2_{wp}$ and the sub-plot error variance is $\sigma^2_{sp}$, the covariance matrix of $\boldsymbol{\varepsilon}$ can be written as

$$\mathbf{V} = \sigma^2_{wp}\mathbf{J} + \sigma^2_{sp}\mathbf{I} \qquad (4)$$

Here $\mathbf{J}$ is a block diagonal matrix with ones in the diagonal blocks and zeros elsewhere, and $\mathbf{I}$ is the identity matrix. Both matrices have dimension $p \times p$, where $p$ is the length of the response vector $\mathbf{y}$, i.e. the number of observations. Each diagonal block of $\mathbf{J}$ corresponds to the observations within the same whole plot.

For the second example discussed in Section 2.2 (split-plot without replicates), all of the effects in the design matrix $\mathbf{D}$ are fixed. For the first example, however, a random block effect is also present in the model, and this factor would then be part of the design matrix $\mathbf{D}$ (with a random coefficient). A model that avoids this ambiguity is the general mixed linear model

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e} \qquad (5)$$

where $\mathbf{X}\mathbf{b}$ represents only the fixed effects of the design, $\mathbf{Z}\mathbf{u}$ represents the random effects and $\mathbf{e}$ is the vector of uncorrelated residual errors. Note that the covariance matrix of $\mathbf{Z}\mathbf{u} + \mathbf{e}$ is identical to that in Equation (4) provided there is no random replicate effect in the model.

4. ALTERNATIVE WAYS TO ANALYSE SPLIT-PLOT EXPERIMENTS

The model in Equation (5) can be analysed in many different ways. If $\mathbf{V}$ is known, a natural approach is to estimate $\mathbf{b}$ by generalized least squares (GLS) (see, for example, Letsinger et al. [3]). The GLS estimator can be written as

$$\hat{\mathbf{b}}_{GLS} = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{y}$$

with covariance matrix $\mathrm{cov}(\hat{\mathbf{b}}_{GLS}) = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}$. These quantities are easily combined to produce tests and confidence intervals. If the elements of $\mathbf{V}$ are unknown, which they usually are, they have to be estimated and plugged into Equation (4). In the following we discuss a number of ways in which this can be achieved.

In some of the simplest and most frequently occurring split-plot situations, the ordinary least-squares (OLS) estimator is identical to the GLS estimator, and the estimates of the variance components are therefore needed only for the covariance matrix $\mathrm{cov}(\hat{\mathbf{b}}_{GLS}) = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}$. Letsinger et al. [3] provide a mathematical characterization of some important situations in which OLS = GLS. In particular, a distinction is made between so-called crossed and non-crossed designs, according to whether or not the same sub-plot experiments are repeated within all whole-plot combinations. It is shown that OLS = GLS for the crossed designs. An example of a non-crossed design type with the same property is the class of two-level fractional factorial designs analysed in the regular way by first-order models.

Method 1
The simplest approach is to ignore the split-plot structure in the design. The regular OLS estimate is then used for $\mathbf{b}$, and its covariance matrix is calculated as $(\mathbf{X}^T\mathbf{X})^{-1}\sigma^2_{sp}$. This approach has been investigated by simulation by Letsinger et al. [3] and by Kowalski et al. [5] for different types of experiments. In both cases, the results indicate that this approach can be recommended when the whole-plot error is small compared with the residual error. Letsinger et al.
[3] have argued that if the ratio $d = \sigma^2_{wp}/\sigma^2_{sp}$ is larger than unity, the OLS approach
should not be used. For $d$ smaller than 0.4, its properties appear to be quite good compared with some of the other choices discussed below. Næs et al. [10] have presented a method that can be used to check this assumption for mixture data prior to model building. The method is applicable when raw material combinations represent the whole plots and is based on systematically selected measurements of the chemical constituents of the mixture components and of the mixtures themselves. In some situations, it may be obvious from the design of the experiment that the whole-plot error is negligible. An example is provided by Hersleth et al. [11], who analysed consumer preferences for a number of food products. In that case it was clear from prior knowledge that the whole-plot error, which related to the design of the objects, was very small compared with the sub-plot error, which related to the random error in the consumer preference scores. The split-plot structure could therefore be neglected.

Method 2
Another, more generally applicable, approach is full restricted maximum likelihood (REML) estimation based on Equation (5) (see, for example, Hand and Crowder [12] and Little et al. [13]). REML is a well-accepted method for estimating variance components in mixed models; it maximizes the likelihood of linear combinations of the data that do not depend on the fixed effects (residual contrasts). This can be done using statistical software packages, e.g. SAS [14], as will be shown below. The method provides error variances that can be plugged into the equations above and used to compute t-values and the corresponding p-values for the effects or regression coefficients. The main advantage of this approach is that it can be used for all reasonably designed experiments. Software has also been developed that provides both DFs and significance tests directly. In the examples below, the SAS system is used.
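Whichever method supplies the variance components, the final step is the same: build $\mathbf{V}$ from Equation (4) and apply the GLS formulas given at the start of this section. The Python/NumPy sketch below does this under stated assumptions: whole plots are equally sized, runs are sorted by whole plot, and the variance values are illustrative rather than estimated. The helper names `split_plot_V` and `gls` are hypothetical. The usage example uses a crossed design, for which OLS and GLS should coincide.

```python
import itertools
import numpy as np

def split_plot_V(n_wp, m, s2_wp, s2_sp):
    """V = s2_wp * J + s2_sp * I (Equation (4)) for n_wp whole plots of m runs each."""
    J = np.kron(np.eye(n_wp), np.ones((m, m)))   # ones within each whole-plot block
    return s2_wp * J + s2_sp * np.eye(n_wp * m)

def gls(X, y, V):
    """Return b_GLS = (X'V^-1 X)^-1 X'V^-1 y and its covariance (X'V^-1 X)^-1."""
    Vinv_X = np.linalg.solve(V, X)               # V^-1 X without forming V^-1
    cov = np.linalg.inv(X.T @ Vinv_X)
    b = cov @ (Vinv_X.T @ y)                     # V is symmetric, so (V^-1 X)' = X'V^-1
    return b, cov

# Crossed design: the same 2^2 sub-plot runs (C, D) in each of 4 whole plots (A, B).
rows = [(1, a, b, c, d)
        for a, b in itertools.product([-1, 1], repeat=2)
        for c, d in itertools.product([-1, 1], repeat=2)]
X = np.array(rows, dtype=float)                  # columns: intercept, A, B, C, D
V = split_plot_V(n_wp=4, m=4, s2_wp=2.0, s2_sp=0.5)
y = np.random.default_rng(2).multivariate_normal(np.zeros(16), V)
b_gls, cov_gls = gls(X, y, V)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(b_gls, b_ols))  # True for this crossed design
```

The diagonal of `cov_gls` shows the two error strata directly: whole-plot coefficients such as A have variance $(m\sigma^2_{wp} + \sigma^2_{sp})/N$, sub-plot coefficients such as C have variance $\sigma^2_{sp}/N$.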
The software will also produce tests of composite hypotheses or contrasts. Exact DFs for the tests are generally not simple to obtain. A frequently used approximation is the so-called Satterthwaite approximation (see, for example, Satterthwaite [15] and Little et al. [13]), which is based on the design structure as well as on the actual data. The approximation formulae can be quite complex for large data sets, but are computed automatically by some software packages.

Method 3
The third possibility is iteratively reweighted GLS estimation. This method starts with the regular OLS estimator of the model parameters. From the estimated coefficients, estimators of the error variances can be obtained using formulae presented in Letsinger et al. [3] (p. 395, Equations (C6) and (C7)). These are then plugged into $\mathbf{V}$ using Equation (4), and $\mathbf{V}$ is used in the GLS formula to provide new regression coefficients. The process continues until convergence. Results in Letsinger et al. [3] indicate that Method 2 is sometimes superior to Method 3, and Method 3 will therefore not be pursued further here.

Method 4
The fourth approach is based on ANOVA. The sub-plot error variance is estimated by the regular residual mean square ($MS_{sp}$), and the whole-plot error variance is obtained from the difference between the sums of squares of the saturated model and of a reduced model containing only the fixed effects believed to be important (see Letsinger et al. [3]). This difference is thus thought of as representing the whole-plot error. The corresponding expected MS ($MS_{dif}$) is

$$E(MS_{dif}) = \sigma^2_{sp} + m\,\sigma^2_{wp} \qquad (6)$$

where $m$ is the number of sub-plot experiments per whole-plot combination. Again we assume the crossed situation described above. Simple calculations then show that

$$\hat{\sigma}^2_{wp} = (MS_{dif} - MS_{sp})/m \qquad (7)$$
is an unbiased estimator of the whole-plot error variance. Slightly modified formulae are needed for the non-crossed case (see Letsinger et al. [3], p. 395, Equations (C6) and (C7)). Note that this approach is essentially equivalent to the ANOVA described for the unreplicated example in Section 2.2. In that case, the MS of the non-modelled whole-plot interactions was used directly as the denominator of the F-tests (or, equivalently, t-tests) of the whole-plot effects. It is easy to show that the two tests are identical.

Method 5
The next method to be discussed is based on additional replicated measurements. If additional replicates are collected in a sensible way, the whole-plot and sub-plot error variances can be estimated using simple variance formulae. Such an approach was described by, for instance, Kowalski et al. [5]. One advantage of this approach is that the estimated error variances are completely model-independent; the disadvantage is that more experiments have to be carried out. Sub-plot replicates can often be obtained easily by replicating some of the sub-plot combinations for a given whole-plot configuration. If the variances differ between whole-plot combinations, it is recommended that this procedure be repeated for a number of whole-plot combinations. With $r$ different whole-plot combinations and $m$ sub-plot replicates within each of them, the sub-plot error variance estimate can be written as

$$S^2_{sp} = \frac{1}{r(m-1)}\sum_{i=1}^{r}\sum_{j=1}^{m}(y_{ij} - \bar{y}_{i\cdot})^2 \qquad (8)$$

Here $y_{ij}$ is the response measurement for whole-plot combination $i$ and sub-plot replicate $j$, and $\bar{y}_{i\cdot}$ is the average of $y_{ij}$ over $j$. For whole-plot variance estimation, however, the situation is different. Replicates involving only the whole-plot error are difficult to envision, because all replicates will always include a contribution from the residual random (here sub-plot) error.
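Equations (7) and (8) translate directly into code. In the Python/NumPy sketch below, the function names are hypothetical, and the replicate data for Equation (8) are assumed to be arranged as an r × m array with one row per whole-plot combination.

```python
import numpy as np

def wp_variance_anova(ms_dif, ms_sp, m):
    """Equation (7): whole-plot variance from the 'difference' mean square.

    The estimate can be negative in small experiments; truncating it at zero
    is a common convention."""
    return (ms_dif - ms_sp) / m

def pooled_sp_variance(y):
    """Equation (8): y[i, j] is replicate j (m columns) of whole-plot combination i (r rows)."""
    r, m = y.shape
    row_means = y.mean(axis=1, keepdims=True)        # ybar_i.
    return float(np.sum((y - row_means) ** 2) / (r * (m - 1)))

y = np.array([[10.0, 12.0, 11.0],
              [20.0, 21.0, 22.0]])
print(pooled_sp_variance(y))   # 1.0
```

For balanced replicates, Equation (8) is simply the average of the per-row sample variances, which is an easy cross-check.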
At the very least, measurement error will always be present, even if the sub-plot combinations can be kept in exactly the same position during the experimentation. A possible approach for estimating the whole-plot variance is the following. Select one combination of the experimental variables and repeat it $r$ times, resetting both the whole-plot and the sub-plot factors between experiments. These replicates will then differ only by the total random noise in the experiment, which is equal to $\delta + e$ and has variance $\sigma^2_{tot} = \sigma^2_{wp} + \sigma^2_{sp}$. The total error variance can be estimated using the simple formula

$$\hat{\sigma}^2_{tot} = \frac{1}{r-1}\sum_{i=1}^{r}(y_i - \bar{y})^2 \qquad (9)$$

This can, as for the sub-plot variance, be repeated for a number of experimental settings and pooled to give a more precise estimate. The whole-plot error variance can then be computed by subtracting the sub-plot variance estimate $\hat{\sigma}^2_{sp}$ from $\hat{\sigma}^2_{tot}$:

$$\hat{\sigma}^2_{wp} = \hat{\sigma}^2_{tot} - \hat{\sigma}^2_{sp} \qquad (10)$$

A problem with this approach, however, is that it is not obvious how to find the distribution of the estimated error variances in $\mathbf{V}$ and their DFs, which are needed for testing purposes. With many replicates this is not a major problem, because the estimated elements of $\mathbf{V}$ can then be treated as known.

Another approach, which provides both the whole-plot and sub-plot variances simultaneously, is to use a hierarchical design strategy. First, a whole-plot combination is selected and this combination is reset a number of times ($r$). For each of these resets, the same sub-plot combination is repeated a number of times ($m$). The data then contain information only about the noise structure and can be modelled as

$$y_{ij} = \mu + A_i + e_{ij} \qquad (11)$$
where $A_i$ corresponds to the whole-plot random error and $e_{ij}$ to the sub-plot error. The expected whole-plot MS for this model is the same expression as in Equation (6). The two error variances can be estimated using the standard ANOVA formulae

$$\hat{\sigma}^2_{sp} = MS_{sp} = \frac{1}{r(m-1)}\sum_{i=1}^{r}\sum_{j=1}^{m}(y_{ij} - \bar{y}_{i\cdot})^2 \qquad (12)$$

and

$$\hat{\sigma}^2_{wp} = \frac{MS_{wp} - MS_{sp}}{m} = \frac{1}{m}\left(\frac{m}{r-1}\sum_{i=1}^{r}(\bar{y}_{i\cdot} - \bar{y})^2 - \hat{\sigma}^2_{sp}\right) \qquad (13)$$

where $MS_{wp}$ and $MS_{sp}$ are the mean squares for the two effects in model (11). The two MSs have $r - 1$ and $r(m - 1)$ DFs, respectively. An advantage of this hierarchical approach is that the number of sub-plot replicates $m$ can in some cases be chosen such that the expectations of the MSs are exactly equal (up to a constant factor) to the variance elements of $\mathrm{cov}(\hat{\mathbf{b}}_{GLS}) = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}$. This means that the error variance estimates used in the tests have a known distribution and can be used directly for testing purposes. We comment further on this topic in Section 7, in the first of the examples.

Method 6
For factorial and fractional factorial split-plot designs with only two levels for each factor, it is also possible to use Q–Q plots (or other probability plots; see, for example, Box and Jones [6]). In split-plot models, the different effects have different variances, so the sub-plot and whole-plot effects must be plotted separately. In order to use such plots, however, one needs at least three factors (and their two-factor interactions); with fewer effects it is difficult to judge whether the points fall close to a straight line.

Method 7
The last technique to be mentioned is the so-called Lenth method [16] for analysing fractional factorial split-plot designs. This method is quite similar to the regular t-tests/F-tests mentioned in Method 4.
These tests divide the coefficient or effect estimates by a standard error, but instead of the regular standard error, a pseudo standard error (PSE) is computed. This corrects for the fact that a search strategy is being used. The method is based on first ranking all possible effect estimates and then searching for a split between the significant and non-significant effects by comparing them with the PSE. Tables have been developed to determine which factors are active. Extensions of the Lenth method to split-plot designs have been discussed by Loeppky and Sitter [17]. As for Method 6, the Lenth method must be applied separately to the whole-plot and sub-plot effects. The whole-plot effects are ranked and tested in the regular way, while the sub-plot effects are tested by a modified procedure that treats the situation as a block design. Loeppky and Sitter [17] developed tables of critical values of these tests for this particular situation.

5. FRACTIONAL FACTORIAL SPLIT-PLOT DESIGNS

Fractional factorial split-plot designs can be analysed by the methods described in Section 4, but there are a few additional aspects that we would like to mention before presenting the examples. Fractional factorial designs [1] are systematically selected subsets of full factorial designs that have as few runs as possible while still providing useful information. In this section we concentrate on the simplest of these designs, with only two levels for each factor, the so-called $2^{k-p}$ designs (see Section 2.2).
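For reference, Lenth's pseudo standard error (Method 7) is usually defined as $\mathrm{PSE} = 1.5 \cdot \mathrm{median}\{|c_j| : |c_j| < 2.5 s_0\}$ with $s_0 = 1.5 \cdot \mathrm{median}|c_j|$, where $c_j$ are the effect estimates. The Python/NumPy sketch below computes it separately for a set of whole-plot and a set of sub-plot effects, as the split-plot setting requires. The effect values and sub-plot labels are hypothetical, and the sketch does not reproduce the split-plot critical values of Loeppky and Sitter [17]; the ratios $|c_j|/\mathrm{PSE}$ it prints would still have to be compared with appropriate tabulated values.

```python
import numpy as np

def lenth_pse(effects):
    """Lenth's pseudo standard error for a vector of effect estimates."""
    c = np.abs(np.asarray(effects, dtype=float))
    s0 = 1.5 * np.median(c)                    # initial robust scale estimate
    return 1.5 * np.median(c[c < 2.5 * s0])    # re-estimate after trimming large effects

# Hypothetical effect estimates, kept apart by error stratum.
whole_plot = {"A": 8.1, "B": -0.6, "C": 0.9, "AB": 0.4, "AC": -0.7, "BC": 0.5, "ABC": -0.3}
sub_plot = {"p": 3.2, "q": 0.2, "r": -0.25, "pq": 0.15, "pr": -0.1, "qr": 0.3, "pqr": 0.05}

for name, group in (("whole-plot", whole_plot), ("sub-plot", sub_plot)):
    pse = lenth_pse(list(group.values()))
    ratios = {k: round(abs(v) / pse, 2) for k, v in group.items()}
    print(name, round(pse, 4), ratios)
```

Keeping the two groups apart matters: pooling them would mix effects with different variances (Equations (14) and (15) in this section), and a single PSE would then misrepresent both strata.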

Table I. Cartesian and split-plot confounding (crossed and non-crossed designs). A and B are whole-plot factors, while C and D are sub-plot factors. Both are $2^{4-1}$ experiments, but with different confounding. (a) Cartesian confounding: the same sub-plot experiments are repeated for all whole-plot combinations; a full factorial in A and B is crossed with a half fraction of the full design in C and D. The generator for the experiment is C = D. (b) Split-plot confounding: different sub-plot combinations are used for different whole-plot combinations. The generator for the experiment is D = ABC. An x marks a run.

(a) Cartesian confounding (C = D)
C/D \ A/B    Low/Low   High/Low   Low/High   High/High
Low/Low         x          x          x          x
High/Low        .          .          .          .
Low/High        .          .          .          .
High/High       x          x          x          x

(b) Split-plot confounding (D = ABC)
C/D \ A/B    Low/Low   High/Low   Low/High   High/High
Low/Low         x          .          .          x
High/Low        .          x          x          .
Low/High        .          x          x          .
High/High       x          .          .          x

Such designs used in a split-plot framework were considered thoroughly by Bisgaard [7], who also presented a number of examples with different types of confounding. In particular, Bisgaard [7] discussed a number of different ways to confound whole-plot and sub-plot effects with each other. An important distinction is made between so-called Cartesian confounding and split-plot confounding, closely related to the distinction between crossed and non-crossed designs [3] mentioned in Section 4. An advantage of the former type is that it leaves the interactions between sub-plot and whole-plot factors less aliased, which can be of some importance in robustness studies. The non-crossed designs are usually preferred, however, because they have higher resolution among the sub-plot factors [7]. Note that in both cases OLS = GLS, so the effects can be estimated as usual in factorial designs. An illustration of the two types of confounding is given in Table I. Fewer DFs are available for testing, error estimation and probability plotting than in full factorial split-plot designs, so for such designs it is important to use replicate structures.
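The two layouts in Table I can be generated from their generators. The Python/NumPy sketch below builds both $2^{4-1}$ designs in ±1 coding (Low = −1 and High = +1 is an assumption of the sketch) and collects the sub-plot (C, D) settings run inside each whole plot (A, B). The helper names are hypothetical. Design (a) should come out crossed (identical sub-plot sets in every whole plot) and design (b) non-crossed.

```python
import itertools
import numpy as np

def split_plot_fraction(generator):
    """Return 8 runs (A, B, C, D) in +/-1 coding; D is set by generator(A, B, C)."""
    runs = [(a, b, c, generator(a, b, c))
            for a, b in itertools.product([-1, 1], repeat=2)   # whole plots
            for c in (-1, 1)]                                  # two sub-plot runs each
    return np.array(runs)

design_a = split_plot_fraction(lambda a, b, c: c)          # Cartesian: C = D
design_b = split_plot_fraction(lambda a, b, c: a * b * c)  # split-plot: D = ABC

def subplot_sets(design):
    """Map each whole plot (A, B) to the set of (C, D) settings run inside it."""
    out = {}
    for a, b, c, d in design:
        out.setdefault((int(a), int(b)), set()).add((int(c), int(d)))
    return out

for label, design in (("(a)", design_a), ("(b)", design_b)):
    sets = subplot_sets(design)
    crossed = len({frozenset(s) for s in sets.values()}) == 1
    print(label, "crossed" if crossed else "non-crossed", sets)
```

In design (b) every run satisfies ABCD = +1, so the whole-plot interaction AB is aliased with the sub-plot interaction CD; by Bisgaard's rule discussed next, that contrast belongs to the whole-plot stratum.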
Another important problem is to identify which effects or contrasts must be tested against the whole-plot error variance and which against the sub-plot error variance. This problem was also discussed by Bisgaard [7], who gave a simple rule: all contrasts obtained by multiplying the basic generators of the whole-plot design in all possible ways must be tested against a whole-plot error variance given by (using A as a token whole-plot contrast; see also Bingham and Sitter [4])

$$\mathrm{Var}(\hat{A}) = \frac{4}{N}\left(2^n\sigma^2_{wp} + \sigma^2_{sp}\right) \qquad (14)$$

and all remaining contrasts against (using P as a token sub-plot contrast)

$$\mathrm{Var}(\hat{P}) = \frac{4}{N}\sigma^2_{sp} \qquad (15)$$

Here $N$ is the total number of experiments ($N = 2^{k-p}$) and $2^n$ is the number of sub-plot experiments per whole plot. Note that these formulae are identical to the corresponding elements of $\mathrm{cov}(\hat{\mathbf{b}}_{GLS}) = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}$.
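Equations (14) and (15) can be checked numerically against $(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}$. The Python/NumPy sketch below does this for the split-plot-confounded design of Table I(b) (D = ABC, two sub-plot runs per whole plot, so $2^n = 2$ and $N = 8$) with a saturated model. The variance values are arbitrary illustrations. The sketch assumes ±1 coding, in which an effect is twice the corresponding regression coefficient, so effect variances are four times the coefficient variances.

```python
import itertools
import numpy as np

s2_wp, s2_sp = 3.0, 0.5            # illustrative variance components
runs = [(a, b, c, a * b * c)       # columns A, B, C, D with generator D = ABC
        for a, b in itertools.product([-1, 1], repeat=2) for c in (-1, 1)]
A, B, C, D = np.array(runs, dtype=float).T
N, m = len(A), 2                   # N = 8 runs, m = 2**n = 2 runs per whole plot

# Saturated model: intercept, A, B, AB (= CD), C, D, AC (= BD), AD (= BC).
names = ["I", "A", "B", "AB", "C", "D", "AC", "AD"]
X = np.column_stack([np.ones(N), A, B, A * B, C, D, A * C, A * D])

# Equation (4): whole plots are consecutive pairs of runs.
V = s2_wp * np.kron(np.eye(N // m), np.ones((m, m))) + s2_sp * np.eye(N)
cov_b = np.linalg.inv(X.T @ np.linalg.solve(V, X))   # (X' V^-1 X)^-1
var_effect = 4 * np.diag(cov_b)                      # effect = 2 * coefficient

var_wp = 4 / N * (m * s2_wp + s2_sp)   # Equation (14)
var_sp = 4 / N * s2_sp                 # Equation (15)
for name, v in zip(names[1:], var_effect[1:]):
    stratum = "whole-plot" if np.isclose(v, var_wp) else "sub-plot"
    print(name, round(v, 4), stratum)
```

The classification reproduces Bisgaard's rule: A, B and AB (aliased with CD) fall in the whole-plot stratum, while C, D, AC and AD fall in the sub-plot stratum.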

Thus, whole-plot and sub-plot effects can be tested simply by computing the effects in the usual way and dividing them by the square roots of the estimated variances in Equations (14) and (15), respectively. These quantities are then usually compared with a t-distribution with the appropriate DFs, which depend on the estimation method used, as discussed above. The hierarchical replicate structure described in Section 4 is important here. Setting $m = 2^n$ makes the expectation of $MS_{wp}$ exactly equal to $\mathrm{Var}(\hat{A})$ apart from the factor $4/N$, so $(4/N)MS_{wp}$ can be used directly as the estimated variance in the tests of the whole-plot effects. These tests then have 1 and $r - 1$ DFs. For the sub-plot effects, $MS_{sp}$ can be used in the same way, giving tests with 1 and $r(m - 1)$ DFs. An example of the analysis of this type of experiment, using both ANOVA and probability plotting, is given in Section 7.

6. MISCELLANEOUS TOPICS

6.1. Analysis of longitudinal data (correlated sub-plot errors)

In some types of split-plot experiment, there may be structure or dependence among the sub-plot errors. A typical example is the following. Assume that a number of samples have been produced using a regular randomized factorial design. One sample from each combination is then split into two, and each half is stored at a different temperature. The samples are monitored and measured a number of times over a certain period. In such cases, the production factors are the whole-plot factors, and temperature and time are sub-plot factors. A whole-plot error has to be incorporated at the production factor level. The sub-plot errors in this case correspond to measurements taken at different times, so it is natural to assume a correlation between the sub-plot errors, i.e. between observations taken under the same production and storage conditions. This correlation should be taken into account when analysing the data.
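The covariance implied by such a design can be written down explicitly. The Python/NumPy sketch below assumes, for illustration, that each whole plot (production run) contains a fixed number of storage units, each measured at the same equally spaced time points, and that sub-plot errors within a storage unit follow a first-order autoregressive, AR(1), correlation $\rho^{|t-s|}$. AR(1) is only one simple choice of structure, and all dimensions, parameter values and function names are hypothetical.

```python
import numpy as np

def ar1_corr(n_times, rho):
    """AR(1) correlation matrix with entries rho**|t - s|."""
    t = np.arange(n_times)
    return rho ** np.abs(t[:, None] - t[None, :])

def longitudinal_split_plot_V(n_wp, n_units, n_times, s2_wp, s2_sp, rho):
    """Covariance of a shared whole-plot error plus AR(1) sub-plot errors.

    Observations are ordered by whole plot, then storage unit, then time."""
    per_wp = n_units * n_times
    J = np.kron(np.eye(n_wp), np.ones((per_wp, per_wp)))        # same production run
    R = np.kron(np.eye(n_wp * n_units), ar1_corr(n_times, rho))  # within one storage unit
    return s2_wp * J + s2_sp * R

# 3 production runs, each split into 2 storage units, each measured 4 times.
V = longitudinal_split_plot_V(n_wp=3, n_units=2, n_times=4, s2_wp=1.0, s2_sp=0.5, rho=0.6)
print(V.shape)   # (24, 24)
print(V[0, :5])  # same unit: decaying covariance; other unit, same run: s2_wp only
```

Compared with Equation (4), only the identity matrix has been replaced by a block diagonal AR(1) correlation matrix; in a REML analysis $\rho$ would be estimated together with the two variance components.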
Data of this type are often called longitudinal data 12,13,18. The general model for such an experiment can be written as

$Y = \mu + \text{production factors} + \text{whole-plot error} + \text{storage conditions} + \text{time} + \text{sub-plot errors} \qquad (16)$

with a correlation structure imposed on the sub-plot error terms. A simple choice for modelling the correlation is the AR(1) model 12, but many other alternatives exist. Useful criteria (for example, likelihood-based information criteria such as AIC) have been developed for selecting the best possible covariance structure. This type of model can be analysed using the REML procedure as described in Section 4. If the SAS computing system 14 is used, an extra statement is needed to specify the type of correlation structure required (in PROC MIXED, the REPEATED statement with a TYPE= option). An example of such a situation can be found in Bjerke et al. 19. For further discussion of this topic, see Little et al. 13 or the SAS/STAT User's Guide 14, version ….

6.2. The squared multiple correlation coefficient, $R^2$

For traditional regression models,

$R^2 = 1 - \frac{RSS}{SYY} \qquad (17)$

is often used for assessing model adequacy. Here RSS is the residual sum of squares and SYY is the total sum of squares (corrected for the mean). A value close to 1 is taken as an indicator of a good model (in particular if the number of observations in the y-vector is large compared with the number of design variables). The same criterion can technically also be used for split-plot models, but here it is not obvious how to define $R^2$ in the most useful way. One possibility is to use the $R^2$ obtained by fitting the fixed-effects model without taking the whole-plot errors into account. A problem with this approach, however, is that the errors are correlated, and it is not obvious how this influences the interpretation of $R^2$. Another possibility is to use the $R^2$ obtained with the whole-plot error incorporated in the model fitting. A problem with this is that some of the random error is moved from being part of the error to being part of the model structure. Other ad hoc measures along the same line of reasoning can also be envisioned. One possibility is to use the ratio of the whole-plot error variance to the sub-plot error variance. Another is to use the ratio of this ratio to the total variance of the data set. Together, these three ratios and their relative sizes provide important indications of model fit and of the relative error sizes. More research is needed on this issue.

6.3. Residuals

It is common practice to perform a residual analysis after fitting a regression model (see, for example, Cook and Weisberg 20). As for $R^2$, there are two types of residuals of interest in split-plot designs. One possibility is to use residuals from models with the whole-plot error included. These will only contain information about possible structure (for instance, outliers) at the sub-plot level of the design. Residuals obtained without the whole-plot error in the model will, however, contain information about possible unwanted structures at both the whole-plot and the sub-plot level. Both residual plots are easiest to interpret if observations for the same whole-plot combination are plotted adjacent to each other, as will be done for the examples in Sections 7 and 8. In practice, we recommend using both plots to assess tendencies at both error levels.

7. EXAMPLE 1: ANALYSING FRACTIONAL FACTORIAL SPLIT-PLOT DATA

The data for this example are taken from a larger pilot plant study of the production of mayonnaise 21. The design used was a regular split-plot fractional factorial design with seven factors. Two of the factors were ingredient factors (A and B) and five were process variables (a, b, c, d and e).
The two ingredient factors were used as whole-plot factors because of practical restrictions. The design used was a $2^{2} \times 2^{(5-2)}$ split-plot design, which means that the design in the whole-plot factors is a full factorial and that the total design is a $2^{(5+2)-2} = 2^{5}$ (32-run) design. The sub-plot part of the design was first generated as a regular fractional factorial design using e = abcd. The two designs were then combined using the two generators c = ABab and d = ABe, i.e. by split-plot (or non-crossed) confounding (see Table I and Bisgaard 7). The defining relation for the design is thus I = abcde = ABabc = ABde. The design is shown in Table II. The experiment was conducted over four days, with one whole-plot combination tested each day. In order to check the stability of the production process, a centre point was run twice every day (at random positions within the day). Note that even though this is a non-crossed situation, it preserves the OLS = GLS equivalence through its first-order structure (see Letsinger et al. 3). It can also be seen from the confounding pattern that the sub-plot interaction de is confounded with AB (since I = ABde). This means that the de interaction has a whole-plot error, although it has the appearance of a sub-plot effect. No other two-factor interactions of the sub-plot factors are confounded with whole-plot effects in this case, and they were therefore tested against the sub-plot error variance. The design was first analysed with a regular mixed model ANOVA using the REML procedure (the SAS code is given in Appendix A). The model involves all main effects and all possible two-factor interactions, plus whole-plot and sub-plot errors, i.e.

$Y_{ijklmno} = \mu + \alpha_i + \beta_j + \chi_k + \delta_l + \phi_m + \eta_n + \varphi_o + E_{ij} + W + e_{ijklmno} \qquad (18)$

Here, all of the Greek letters represent the fixed factor effects, $E_{ij}$ is the whole-plot error (see also model (2)) and $e_{ijklmno}$ is the residual error.
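Bisgaard's rule can be checked mechanically for this design: an effect must be tested against the whole-plot error if it, or any of its aliases, involves whole-plot factors only. The sketch below multiplies effect "words" modulo 2 (repeated letters cancel), using the defining relation I = abcde = ABabc = ABde with upper-case letters for the whole-plot factors. The function names are illustrative; the same routine also exposes the Be/Ad and Bd/Ae aliasing.

```python
from itertools import combinations


def multiply(*words):
    """Product of effect words: letters occurring an even number of times cancel."""
    result = set()
    for word in words:
        result ^= set(word)
    return frozenset(result)


def defining_subgroup(independent_words):
    """All non-identity words generated by the independent defining words."""
    subgroup = set()
    for r in range(1, len(independent_words) + 1):
        for combo in combinations(independent_words, r):
            subgroup.add(multiply(*combo))
    return subgroup


def aliases(effect, subgroup):
    """Every word aliased with `effect` through the defining relation."""
    return {multiply(effect, word) for word in subgroup}


def needs_whole_plot_error(effect, subgroup, wp_letters):
    """True if the effect or one of its aliases uses whole-plot letters only."""
    candidates = {frozenset(effect)} | aliases(effect, subgroup)
    return any(c and c <= wp_letters for c in candidates)


relation = defining_subgroup(["abcde", "ABabc"])  # ABde follows as their product
WP = frozenset("AB")
two_factor = ["".join(pair) for pair in combinations("abcde", 2)]
print([e for e in two_factor if needs_whole_plot_error(e, relation, WP)])
print(sorted("".join(sorted(w)) for w in aliases("Be", relation)))
```

Of the ten sub-plot two-factor interactions, only de is flagged, and the alias list for Be contains Ad, in line with the confounding pattern described in the text.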
In addition to the confounding between de and the whole-plot error AB, Be is confounded with Ad, and Bd with Ae. Therefore, de, Ad and Bd were all excluded from the model. W here represents all the two-factor interactions included in the model. In this paper we have focused on a single response, Y, related to the colour of the mayonnaise as assessed by a sensory panel. A full multivariate study of several other attributes is given in Sahni et al. 21. The results from the mixed model analysis are shown in Table III. Note that the estimates in this case are computed as regression coefficients of variables with values −1 and +1. This means that the estimates are half the size of the so-called effects of the variables (see, for example, Box et al. 1).

Table II. Design for Example 1. The first two variables are the whole-plot variables and the last five are the sub-plot variables. Columns: Obs, A, B, a, b, c, d, e.

As can be seen, the effects of a and c are significant. Both these effects are tested against the residual error term (DF = 6 for the denominator). Note that the whole-plot tests have only one DF for error, which makes them very weak. The normal probability plot for the sub-plot effects is given in Figure 2. The same effects as above, a and c, are found to be significant. As there are only two whole-plot effects, no probability plot is used for them. The two different $R^2$ values discussed above for this model were equal to 0.88 and …; the latter is based on a model with one more DF than the former. Both indicate that the model explains a large amount of the variability, but this is not particularly useful information here with so few DFs for error. Using the REML procedure, the two error variances were estimated to be 0.008 and 0.057, indicating that the whole-plot error is much smaller than the sub-plot error (d = 0.14; see Method 1). This ratio is much smaller than unity, and one can argue that regular LS estimation is quite appropriate here. The whole-plot error variance has a p-value of 0.3 (also provided by SAS), again indicating that the whole-plot error is small. Following Bisgaard 7, the variances for the whole-plot and sub-plot effects are given as

$\mathrm{Var}(\hat{A}) = \frac{4}{32}\left(2^{(5-2)}\sigma^2_{wp} + \sigma^2_{sp}\right) = \sigma^2_{wp} + \tfrac{1}{8}\sigma^2_{sp} \qquad (19)$

Table III. The mixed model ANOVA results for the mayonnaise data. Columns: Effect, Estimate, Standard error, DF, t-value, Pr > |t|. Rows: Intercept, A, B, a, b, c, d, e, a*b, a*c, a*d, a*e, b*c, b*d, b*e, c*d, c*e, A*a, A*b, A*c, A*e, B*a, B*b, B*c, B*e.

Figure 2. Probability plot for the sub-plot effects (and interactions with whole plots) in the fractional factorial example (normal probability plot, Percent versus effect value; the points for a and c are labelled).
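A normal probability plot such as Figure 2 pairs the ordered effect estimates with standard normal quantiles; effects that lie far from the straight line through the bulk of the points are judged active. The sketch below computes the plotting pairs with the common (i − 0.5)/m plotting positions (software packages differ in this choice) using `statistics.NormalDist` from the Python standard library. The effect values in the usage example are made up for illustration and are not the Table III estimates.

```python
from statistics import NormalDist


def normal_plot_points(effects):
    """Return (name, estimate, normal quantile) triples in increasing order.

    Uses plotting positions (i - 0.5) / m for i = 1..m (an assumed convention).
    """
    m = len(effects)
    ordered = sorted(effects.items(), key=lambda item: item[1])
    std_normal = NormalDist()
    return [(name, est, std_normal.inv_cdf((i - 0.5) / m))
            for i, (name, est) in enumerate(ordered, start=1)]


# Hypothetical effect estimates (NOT the values from Table III).
effects = {"a": 1.2, "b": 0.1, "c": -0.9, "d": 0.05, "e": -0.02}
for name, est, q in normal_plot_points(effects):
    print(f"{name:>2} {est:6.2f} {q:6.2f}")
```

Plotting the estimate against the quantile for each triple gives the probability plot; with these made-up values, a and c would stand out from the near-zero effects.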

For the sub-plot effects, the corresponding expression is

$\mathrm{Var}(\hat{P}) = \frac{4}{32}\sigma^2_{sp} = \tfrac{1}{8}\sigma^2_{sp} \qquad (20)$

The whole-plot and sub-plot variance estimates from REML (0.008 and 0.057, respectively) can be inserted directly into these equations. In this case, the corresponding standard errors (the square roots of the two variances) are equal to 0.12 and 0.084, respectively. These are exactly twice as large as the standard errors obtained in the REML procedure for the regression coefficients, which is to be expected since the effects are twice the coefficients.

If the hierarchical replicate structure in Section 4 (Method 5) is used for the present example, one would need $m = 8 = 2^{(5-2)}$ sub-plot replicates for each whole-plot setting in order for $MS_{wp}$ to be used directly in the whole-plot tests. The reason is that with this choice of m, $E(MS_{wp}) = \sigma^2_{sp} + 8\sigma^2_{wp}$, which is proportional to Equation (19). The use of simple t-tests is then straightforward.

Residuals for the two models (model (18) with and without the whole-plot error term) plotted versus whole-plot combination (WP) are presented in Figure 3. The residuals for blocks 1 and 2 in Figure 3(b) tend towards positive values, while the opposite is true for the other blocks. This may indicate that although the whole-plot error is weak, it is probably not zero. The other systematic patterns in the plots stem from the small number of DFs for error.

Figure 3. Analysing fractional factorial split-plot data. Residuals for model (a) with a whole-plot error term in the model and (b) without a whole-plot error term in the model.

The replicates of the centre point give a total variance equal to …. This is as low as the whole-plot error variance. A possible explanation for this very low value could be that the replicates are taken at the centre, which may be more stable than the corner points. At least, the small value indicates that the day effect is a negligible part of the whole-plot effect.

Lenth's method was also applied to the sub-plot effects and the interactions between whole-plot and sub-plot factors. All of these effects were computed and then divided by the pseudo standard error (PSE) to obtain the test statistic. This statistic was compared with the table in Loeppky and Sitter 17 corresponding to 32 experiments and 28 effects. The two main effects, a and c, plus the interaction Be were found to be significant at level 0.05 (using the individual error rates; see Loeppky and Sitter 17). The latter was, however, only marginally significant; at a level of 0.01 only the main effects were detected as significant.

8. EXAMPLE 2: ANALYSING MIXTURE-PROCESS SPLIT-PLOT DATA

The data for this experiment come from the same pilot plant as was used for Example 1 (Sahni et al. 21). The data consist of measurements of a rheological property (measured one day after production) of mayonnaise produced using different recipes and a single process unit. Ten different recipes were obtained by mixing three ingredients in different proportions (evenly spread over the actual sub-region of interest) and were treated at three different levels of a single process unit (temperature settings of a heat exchanger). All other factors were kept constant. The mixtures were the whole plots and the process variables the sub-plots.
The design is presented and discussed more thoroughly in Sahni et al. 22. Note that this is a crossed split-plot situation, leading to OLS = GLS. A possible model for the fixed part of this experiment is

$\mu + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2 + \beta_{13} X_1 X_3 + \beta_{23} X_2 X_3 + \alpha_1 W + \alpha_2 W^2 + \gamma_1 X_1 W + \gamma_2 X_2 W \qquad (21)$

where $X_1$, $X_2$ and $X_3$ correspond to the three ingredients, whose proportions sum to a constant, and W corresponds to the temperature. The model is obtained by multiplying a second-order mixture model by a second-degree process model 23 and reparametrizing in order to isolate the effects of the mixture and process components. In addition, all terms of order higher than 2 have been eliminated. Model (21) is an empirical model, but it agrees well with experience and with plots of the data. The random component of the model has a whole-plot contribution and a sub-plot contribution. The whole-plot error is nested under the $X_1$, $X_2$ and $X_3$ combinations, while the sub-plot error is the regular residual error term. Model (21) was then analysed using OLS, ignoring the split-plot structure, and using REML, accounting for the split-plot random structure. These two approaches correspond to methods (1) and (2) in Section 4. The results are given in Tables IV(a) and IV(b). It is clear from the REML table (Table IV(b)) that W, $W^2$ and the interaction between $X_1$ and W are significant at the 5% level. $X_2$ is almost significant at the same level (p = 0.076), indicating a possible effect of the mixture variables as well. There seems to be no second-order effect of the mixtures. There were, however, quite strong collinearities among some of the variables (variance inflation factors ranging from 1 to 1358; see Cook and Weisberg 20), due to a highly restricted experimental mixture region, which makes it worthwhile to test a reduced model. Table IV(c) shows the results after eliminating the products of the mixture variables from the analysis.
It can be seen that $X_1$, $X_2$, W, $W^2$ and the interaction between $X_1$ and W are now clearly significant. Comparing the OLS p-values in Table IV(a) with the REML p-values in Table IV(b), we see that the OLS p-values are higher for the sub-plot factors and lower for the whole-plot factors. Some of the nonlinear mixture (whole-plot) terms are even significant with LS. The LS tests have 20 DFs for all tests, while the DFs vary between four and 16 for the REML analysis. The whole plots have very few DFs, so the relatively large REML p-values for the whole-plot terms are not surprising. Note, however, that even though the DFs for the sub-plot tests are smaller than for OLS, the significances are improved. This reflects the fact that these tests are based on the smaller sub-plot residual variance.
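The degrees of freedom quoted above can be reproduced by simple stratum bookkeeping, assuming the crossed layout described earlier (10 recipes as whole plots, each run at 3 temperatures, giving 30 runs) and assigning the pure mixture terms of model (21) to the whole-plot stratum and every term involving W to the sub-plot stratum. This is the classical split-plot ANOVA count; depending on the DF method, SAS may give individual terms intermediate values, so treat this as a consistency check rather than a description of the authors' computation. The function and list names are illustrative.

```python
def split_plot_dfs(n_whole_plots, runs_per_whole_plot, wp_terms, sp_terms):
    """Error DFs for OLS (one pooled error) and for the two split-plot strata."""
    n_runs = n_whole_plots * runs_per_whole_plot
    ols_error = n_runs - 1 - len(wp_terms) - len(sp_terms)
    wp_error = n_whole_plots - 1 - len(wp_terms)       # between whole plots
    sp_error = n_runs - n_whole_plots - len(sp_terms)  # within whole plots
    return {"runs": n_runs, "OLS": ols_error,
            "whole-plot": wp_error, "sub-plot": sp_error}


# Fixed terms of model (21), split by stratum (the intercept is counted separately).
whole_plot_terms = ["X1", "X2", "X1*X2", "X1*X3", "X2*X3"]
sub_plot_terms = ["W", "W^2", "X1*W", "X2*W"]
print(split_plot_dfs(10, 3, whole_plot_terms, sub_plot_terms))
```

The call gives 30 runs, 20 OLS error DFs, 4 whole-plot DFs and 16 sub-plot DFs, which matches the 20 DFs for the LS tests and the range from four to 16 for the REML tests.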

Table IV. ANOVA tables for Example 2, analysing mixture-process split-plot data: (a) LS analysis; (b) full REML analysis; (c) reduced REML analysis. Table (a) has the columns Variable, DF, Parameter estimate, Standard error, t-value and Pr > |t|; tables (b) and (c) have the columns Effect, Estimate, Standard error, DF, t-value and Pr > |t|. Tables (a) and (b) list the intercept and the terms of model (21); table (c) lists the intercept, $X_1$, $X_2$, W, $W^2$, $W{*}X_1$ and $W{*}X_2$.

The variance components of the two random effects for the full model are 3.3 for the whole plot and 1.3 for the sub-plot. In other words, the whole-plot contribution appears substantial compared with the residual. The two sum to 4.6. For the smaller model, the two were 2.8 and 1.3. The standard errors of the whole-plot error variance in the two cases were 2.6 and 1.7, yielding the p-values 0.1 and …. This shows that, at least in the smaller model, the whole-plot error variance is significant at the 5% level. This again indicates that the whole-plot error is significant and that an OLS analysis (Method 1) is not valid. In addition to the data used for the example above, nine true replicates of one of the recipes were made. The total error variance based on these replicates was 3.6. This is smaller than the total error variance of the full model (4.6), but close to that of the reduced model (4.1), which indicates that the model is reasonable. Sixteen true sub-plot replicates were also made for this particular experiment. The variance of these (a model-free estimate of the sub-plot error) was 0.7.
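A p-value for a variance component can be approximated from its estimate and standard error with a one-sided Wald Z test, of the kind SAS PROC MIXED reports with the COVTEST option. Assuming that this is how the p-values above were obtained, the sketch below reproduces the value 0.1 for the full model from the estimate 3.3 and the standard error 2.6. The Wald approximation is crude for variance components estimated from few whole plots, so such p-values are best read as rough indications. The function name is illustrative.

```python
from statistics import NormalDist


def wald_z_variance_component(estimate, std_error):
    """One-sided Wald Z test of H0: variance component = 0 (large-sample approximation)."""
    z = estimate / std_error
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


z, p = wald_z_variance_component(3.3, 2.6)  # full model, whole-plot variance
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

With these inputs z is about 1.27 and the one-sided p-value about 0.10, consistent with the value reported for the full model.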
