QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL
Published online 7 November 2006 in Wiley InterScience

Research

Analysis of Split-plot Designs: An Overview and Comparison of Methods

T. Næs 1,2, A. H. Aastveit 3 and N. S. Sahni 4

1 Matforsk, Oslovegen 1, 1430 Ås, Norway
2 Department of Mathematics, University of Oslo, Blindern, Norway
3 UMB, 1430 Ås, Norway
4 Diagenic AS, Østensjøveien 15B, N-0661 Oslo, Norway

Split-plot designs are frequently needed in practice because of practical limitations and issues related to cost. This imposes extra challenges on the experimenter, both when designing the experiment and when analysing the data, in particular for nonreplicated cases. This paper is an overview and discussion of some of the most important methods for analysing split-plot data. The focus is on estimation, testing and model validation. Two examples from an industrial context are given to illustrate the most important techniques.

Received 7 June 2004; Revised 15 May 2006

KEY WORDS: split-plot designs; fractional factorial designs; ANOVA; mixed models; response surface models

1. INTRODUCTION

The efficient use of experimental design methodology is an important requirement for fast and reliable progress in many branches of today's society. Factorial and fractional factorial designs are amongst the most frequently used design types [1]. They are simple and generally straightforward to generate and analyse, and can be used in a number of different situations. The basic requirements of randomization and blocking of the experimental runs may, however, be difficult to satisfy for either economic or practical reasons. Various restrictions are therefore imposed on the structure of the design, resulting in data with a more complex error structure than that of completely randomized designs.
One of the most important and frequently used strategies of this type is the split-plot design, which has recently received much attention in the literature (see, for example, Montgomery [2], Letsinger et al. [3], Bingham and Sitter [4], Kowalski et al. [5], Box and Jones [6] and Bisgaard [7]). Classic split-plot designs, traditionally used in agricultural applications, are run in replicates in a number of blocks. Data from such split-plot designs can be analysed by adding an extra random error term to the analysis of variance (ANOVA) model. In modern industrial situations, however, where fast and inexpensive progress is essential, replicates may lead to experiments that are too large, expensive and time consuming. Therefore, attention has switched from replicated to unreplicated split-plot experiments (see, for example, Box and Jones [6] and Bisgaard [7]).

Correspondence to: T. Næs, Matforsk, Oslovegen 1, 1430 Ås, Norway. E-mail: tormod.naes@matforsk.no

This paper is an overview and discussion of the available methodology for analysing data from split-plot designs, with the main focus on industrial unreplicated experiments. Situations with both continuous and categorical variables are covered. Several problems related to the use of the methodology, for instance model validation and outlier detection, are also discussed. Some areas of possible future research are indicated. The paper ends with two real-life examples of split-plot experiments in the food industry, illustrating some of the different methodologies described.

2. EXAMPLES OF TYPICAL SPLIT-PLOT DESIGNS AND THEIR ANALYSES

2.1. Agricultural example with replicates

In this example, three different fertilizers and four irrigation systems are to be tested on two different varieties of potato. The yield per hectare is the response variable of interest (denoted by Y). For the purpose of experimentation, three fields, so-called blocks, are available. The blocks are assumed to be large enough that each potato variety can be tested with all three fertilizers and all four irrigation systems in each block, i.e. it is possible to split each block into 4 × 3 × 2 = 24 smaller (equally sized) sections or plots. Practical limitations of the irrigation system, however, imply that a full randomization of the 24 combinations is not possible; the same irrigation can only be used for strip sections of the whole experimental area, as shown in Figure 1. If the positions of the four irrigation systems are randomized within the block and the three fertilizers and two potato varieties are randomized within the irrigation system, the resulting design is an example of a classical split-plot design (see, for example, Montgomery [2], Weber and Skillings [8] and Mead et al. [9]). Irrigation is a so-called whole-plot factor and the other factors are so-called sub-plot factors.
Note that because the fertilizers and potato varieties are investigated within the irrigation system, there is reason to believe that in many cases the experimental error for these effects is smaller than it would be for fully randomized experiments. This can lead to more reliable inference for sub-plot effects than for the same effects in a fully randomized experiment. This aspect can in some cases be an additional argument for running the experiment in split-plot mode. For the whole-plot effects, however, the situation is often the opposite.

These types of data are traditionally analysed using regular ANOVA: sums of squares are computed for all of the effects and divided by their degrees of freedom (DFs) to obtain the mean squares (MSs), and ratios of MSs are then used in F-tests to check the significance of the effects. The main difference between the ANOVA model for a fully randomized block experiment and a split-plot experiment is that an extra whole-plot error term needs to be added at the whole-plot level of randomization. If we let i be the index for the whole-plot factor (irrigation method), j and k the indices for the sub-plot factors (fertilizer and potato variety) and l the index for the replicate/block, a natural model for the split-plot experiment is

$$Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \alpha\beta_{ij} + \alpha\gamma_{ik} + \beta\gamma_{jk} + \delta_l + E_{il} + e_{ijkl}, \quad i = 1, 2, 3, 4,\; j = 1, 2, 3,\; k = 1, 2,\; l = 1, 2, 3 \tag{1}$$

Here, $\alpha_i$, $i = 1, 2, 3, 4$, are the irrigation effects (fixed effect), $\beta_j$, $j = 1, 2, 3$, are the fertilizer effects (fixed effect), $\gamma_k$, $k = 1, 2$, are the potato variety effects (fixed effect), $\delta_l$, $l = 1, 2, 3$, are the replicate/block effects (random effect) and $E_{il}$ is the whole-plot random error term, which accounts for interaction between irrigation system i and block l ($\mathrm{Var}(E_{il}) = \sigma^2_{wp}$). $e_{ijkl}$ is the regular residual (or sub-plot) error with variance $\mathrm{Var}(e_{ijkl}) = \sigma^2_{sp}$. The other effects are two-factor interactions between the fixed effects.
A three-factor interaction effect is also possible, but not included in model (1). It is also possible to incorporate random interactions between sub-plot and replicate in the model, but these are often considered to be negligible and are therefore omitted here. As some of the effects are random, the tests for the different effects are based on different error terms. For this model, the whole-plot main effects are tested against the whole-plot MS error, while the sub-plot effects and interactions between whole plots and sub-plots are tested against the regular residual or sub-plot MS.
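The choice of error term for the whole-plot test can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the effect sizes, standard deviations and the function names `simulate` and `whole_plot_anova` are made up for illustration, and the sub-plot factor effects are set to zero so that only the whole-plot stratum of model (1) is exercised. It computes the F-ratio of the irrigation mean square against the whole-plot (irrigation × block) mean square.

```python
import random

# Dimensions matching model (1): 4 irrigation systems (whole-plot factor),
# 3 fertilizers x 2 varieties (sub-plot factors), 3 blocks.
N_IRR, N_FERT, N_VAR, N_BLOCK = 4, 3, 2, 3
M = N_FERT * N_VAR  # number of sub-plots per whole plot

def simulate(seed=0, sd_block=1.0, sd_wp=1.0, sd_sp=0.5):
    """Return y[i][l], the list of M sub-plot responses for irrigation i in
    block l. All numerical values are hypothetical illustration values and
    the sub-plot factor effects are taken to be zero."""
    rng = random.Random(seed)
    irr_effect = [0.0, 1.0, 2.0, 3.0]
    block = [rng.gauss(0, sd_block) for _ in range(N_BLOCK)]
    y = []
    for i in range(N_IRR):
        row = []
        for l in range(N_BLOCK):
            wp_err = rng.gauss(0, sd_wp)  # E_il, shared by the whole plot
            row.append([10 + irr_effect[i] + block[l] + wp_err
                        + rng.gauss(0, sd_sp) for _ in range(M)])
        y.append(row)
    return y

def whole_plot_anova(y):
    """Sums of squares in the whole-plot stratum and the whole-plot F-ratio."""
    wp_mean = [[sum(cell) / M for cell in row] for row in y]
    irr_mean = [sum(row) / N_BLOCK for row in wp_mean]
    blk_mean = [sum(wp_mean[i][l] for i in range(N_IRR)) / N_IRR
                for l in range(N_BLOCK)]
    grand = sum(irr_mean) / N_IRR
    cells = [(i, l) for i in range(N_IRR) for l in range(N_BLOCK)]
    ss_irr = M * N_BLOCK * sum((a - grand) ** 2 for a in irr_mean)
    ss_blk = M * N_IRR * sum((b - grand) ** 2 for b in blk_mean)
    ss_wp_err = M * sum(
        (wp_mean[i][l] - irr_mean[i] - blk_mean[l] + grand) ** 2
        for i, l in cells)
    ss_wp_total = M * sum((wp_mean[i][l] - grand) ** 2 for i, l in cells)
    df_irr = N_IRR - 1
    df_err = (N_IRR - 1) * (N_BLOCK - 1)
    f_ratio = (ss_irr / df_irr) / (ss_wp_err / df_err)
    return {"ss_irr": ss_irr, "ss_blk": ss_blk, "ss_wp_err": ss_wp_err,
            "ss_wp_total": ss_wp_total, "df": (df_irr, df_err), "F": f_ratio}

result = whole_plot_anova(simulate())
print(result["df"], round(result["F"], 2))
```

With four irrigation systems and three blocks, the whole-plot test has only 3 and 6 DFs, however many sub-plot observations are taken within each whole plot.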

Block 1
  I1      I2      I4      I3
  F2*P1   F1*P2   F1*P1   F3*P1
  F1*P2   F2*P1   F3*P2   F2*P1
  F2*P2   F1*P1   F2*P2   F1*P2
  F3*P2   F3*P2   F3*P1   F1*P1
  F2*P1   F3*P1   F1*P2   F2*P2
  F1*P1   F2*P2   F2*P1   F3*P2

Block 2
  I3      I4      I2      I1
  F3*P1   F2*P1   F2*P2   F3*P2
  F2*P2   F2*P2   F1*P2   F1*P1
  F3*P2   F1*P2   F1*P1   F2*P2
  F1*P1   F3*P2   F2*P1   F1*P2
  F1*P2   F3*P1   F3*P1   F2*P1
  F2*P1   F1*P1   F3*P2   F3*P1

Block 3
  I2      I3      I1      I4
  F2*P1   F1*P1   F3*P2   F1*P2
  F2*P2   F3*P2   F2*P1   F2*P1
  F1*P1   F1*P2   F3*P1   F3*P1
  F3*P2   F3*P1   F1*P2   F3*P2
  F1*P2   F2*P1   F1*P1   F2*P2
  F3*P1   F2*P2   F2*P2   F1*P1

Figure 1. Illustration of the agricultural example with three replicates (blocks). The symbols I1 to I4 represent the four irrigation systems, F1 to F3 the three fertilizers and P1 and P2 the two potato varieties. Each column within a block is a strip of land treated by the same irrigation system. The four irrigation systems are randomized and the potato varieties and fertilizing regimes are randomized within the irrigation system.

2.2. Industrial example without replicates

Time and economic constraints often prohibit complex replications of industrial experiments. An example more typical in industry is therefore the following. Assume that, in a particular manufacturing process, one is interested in the effect (on the yield) of five different factors: A, B, C, D and E. Factors A and B are related to two process variables while C, D and E represent three different ingredients. Further, the process variables A and B are expensive to change in a completely randomized manner, so that a fully randomized design is not feasible.

A split-plot design with, for instance, two levels for each of the five factors may then be used. This is done by randomizing and conducting the eight (2 × 2 × 2) experiments corresponding to all possible combinations of the ingredient variables C, D and E for each of the four (2 × 2) combinations of the two process variables A and B. As above, the process variables are called whole-plot factors and the ingredient variables are called sub-plot factors. In some cases it may also be necessary to reduce the design further by using a fractional factorial design. This type of design run in split-plot mode was discussed by Bisgaard [7]. In the first real-life example below, such a reduced design is used. A possible model for such a split-plot experiment is

$$Y_{ijklm} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + E_{ij} + \gamma_k + \eta_l + \delta_m + W + e_{ijklm} \tag{2}$$

where $\alpha_i$ and $\beta_j$ represent the whole-plot effects (process variables A and B), $\alpha\beta_{ij}$ represents the interactions between the different levels of the whole-plot effects, $\gamma_k$, $\eta_l$ and $\delta_m$ represent the sub-plot effects (ingredients) and W comprises sub-plot interactions and interactions between whole-plot and sub-plot effects. All of these parameters are considered as fixed effects. $E_{ij}$ is the random whole-plot error ($\mathrm{Var}(E_{ij}) = \sigma^2_{wp}$) and $e_{ijklm}$ is the sub-plot error ($\mathrm{Var}(e_{ijklm}) = \sigma^2_{sp}$). Note that model (2) is essentially the same as model (1) except that now the whole-plot error $E_{ij}$ and the interaction between A and B ($\alpha\beta_{ij}$) have the same indices ij. This means that if the whole-plot interaction is present (non-zero), the two effects ($\alpha\beta_{ij}$ and $E_{ij}$) cannot be separated (identified) unless extra information is available. If, however, the interaction $\alpha\beta_{ij}$ is assumed to be zero, $E_{ij}$ is valid as a separate random whole-plot error term and a regular ANOVA can technically be used.
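The two-stage randomization behind such a design can be sketched as follows. This is a minimal illustration only, assuming levels coded as -1 and +1; the function name `split_plot_run_order` and its arguments are hypothetical and not taken from any package. The whole-plot combinations are shuffled first, and the sub-plot combinations are then shuffled independently within each whole plot.

```python
import itertools
import random

def split_plot_run_order(whole_plot_factors, sub_plot_factors, seed=None):
    """Return a run order for a two-level full factorial split-plot design.

    The whole-plot settings are randomized once; within each whole plot the
    sub-plot settings are randomized independently, so the hard-to-change
    factors are reset only once per whole plot.
    """
    rng = random.Random(seed)
    levels = (-1, 1)
    whole_plots = list(itertools.product(levels, repeat=len(whole_plot_factors)))
    rng.shuffle(whole_plots)
    runs = []
    for wp in whole_plots:
        sub_plots = list(itertools.product(levels, repeat=len(sub_plot_factors)))
        rng.shuffle(sub_plots)
        for sp in sub_plots:
            setting = dict(zip(whole_plot_factors, wp))
            setting.update(zip(sub_plot_factors, sp))
            runs.append(setting)
    return runs

# Four whole plots (A, B) with eight sub-plot runs (C, D, E) in each.
runs = split_plot_run_order(["A", "B"], ["C", "D", "E"], seed=1)
for run in runs[:8]:
    print(run)
```

In a completely randomized design all 32 runs would be shuffled together, which would typically require many more changes of A and B.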
Note that the whole-plot main effects will, however, only have one DF for the denominator, so the whole-plot test is very weak for this example. Sub-plot effects and their interactions are tested against the sub-plot error. To obtain a stronger whole-plot test, the number of whole-plot factors must generally be relatively large, say five or six, and the higher-order interactions among them set equal to zero. In cases with many design factors, probability plots can also be very useful [6], making it unnecessary to assume that any interactions are equal to zero. Separate probability plots must be made for the whole-plot and sub-plot factors.

2.3. Categorical versus continuous variables

In the two examples presented in Sections 2.1 and 2.2, the experimental factors were considered as categorical variables. For the last example, however, it is equally interesting to model the output in a more detailed way by treating the factors as continuous variables. To be able to do so, one would normally need a more elaborate design allowing for fitting, for instance, quadratic polynomials. One possibility is to run a central composite design (CCD) in the five factors instead of only the $2^5$ design considered above. If such a design is used in split-plot mode, different sub-plot combinations are used for each whole-plot combination. This can sometimes lead to more complex analyses [3]. An alternative is to use a full CCD for the whole-plot factors and then cross this with a CCD in the sub-plot factors, but this strategy will lead to rather large experiments. The second real-life example below is an example where the variables are continuous. Kowalski et al. [5] have discussed strategies for setting up fractions for such experiments. The different methods of analysis described in Section 4 will cover both the continuous and the categorical cases, with the main focus on methods of analysing unreplicated designs.

3. GENERAL MODEL STRUCTURE FOR SPLIT-PLOT DESIGNS

The two models in Section 2 and all other split-plot models can be cast into the general model framework

$$\mathbf{y} = \mathbf{D}\mathbf{b} + \boldsymbol{\delta} + \mathbf{e} \tag{3}$$

where $\mathbf{y}$ is the vector of response measurements, $\mathbf{D}$ is the design matrix for the experimental factors, $\mathbf{b}$ is the vector of corresponding regression coefficients, $\boldsymbol{\delta}$ is the vector of whole-plot error terms and $\mathbf{e}$ is the vector of sub-plot errors (see, for example, Letsinger et al. [3]). The terms $\boldsymbol{\delta}$ and $\mathbf{e}$ can be combined into one error term: $\boldsymbol{\varepsilon} = \boldsymbol{\delta} + \mathbf{e}$. The elements of $\boldsymbol{\delta}$ are defined in such a way that they are identical for experiments within the same whole plot and uncorrelated between whole plots, while all of the elements of $\mathbf{e}$ are assumed to be uncorrelated with each other and with the elements of $\boldsymbol{\delta}$. If we let the whole-plot error variance be equal to $\sigma^2_{wp}$ and the sub-plot error variance be equal to $\sigma^2_{sp}$, the covariance matrix of the vector $\boldsymbol{\varepsilon} = \boldsymbol{\delta} + \mathbf{e}$ can be written as

$$\mathbf{V} = \sigma^2_{wp}\mathbf{J} + \sigma^2_{sp}\mathbf{I} \tag{4}$$

Here $\mathbf{J}$ is a block diagonal matrix with ones in the blocks and zeros elsewhere, and $\mathbf{I}$ is the identity matrix. Both matrices have dimension $p \times p$, where p is the length of the response vector $\mathbf{y}$, i.e. p is equal to the number of observations. Each block in the block diagonal matrix corresponds to the elements within the same whole plot.

For the second example, discussed in Section 2.2 (split-plot without replicates), all of the effects of the design matrix $\mathbf{D}$ are fixed. For the first example, however, a random block effect is also present in the model and this factor will then be a part of the design matrix $\mathbf{D}$ (with the corresponding coefficient being random). A model which avoids this ambiguity is the general mixed linear model

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e} \tag{5}$$

where $\mathbf{X}\mathbf{b}$ represents only the fixed effects of the design, $\mathbf{Z}\mathbf{u}$ represents the random effects and $\mathbf{e}$ is the vector of uncorrelated residual errors. Note that the covariance matrix of $\mathbf{Z}\mathbf{u} + \mathbf{e}$ is identical to that in Equation (4) given that there is no random replicate effect in the model.

4. ALTERNATIVE WAYS TO ANALYSE SPLIT-PLOT EXPERIMENTS

The model in Equation (5) can be analysed in many different ways. If $\mathbf{V}$ is known, a natural approach is to use a generalized least-squares (GLS) estimator to estimate $\mathbf{b}$ (see, for example, Letsinger et al. [3]). The GLS estimator can be written as

$$\hat{\mathbf{b}}_{GLS} = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{y}$$

with covariance matrix equal to $\mathrm{cov}(\hat{\mathbf{b}}_{GLS}) = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}$. These quantities can easily be combined to produce tests and confidence intervals. If the elements of $\mathbf{V}$ are unknown, which they usually are, they will have to be estimated and plugged into Equation (4). In the following we will discuss a number of ways in which this can be achieved.

In some of the simplest and most frequently occurring split-plot situations, the ordinary least-squares (OLS) estimator is identical to the GLS estimator, and the estimates of the variance components are therefore needed only for the covariance matrix $\mathrm{cov}(\hat{\mathbf{b}}_{GLS}) = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}$. Letsinger et al. [3] provide a mathematical characterization of some important situations when OLS = GLS. In particular, a distinction is made between the so-called crossed and non-crossed designs, corresponding to whether the same sub-plot experiments are repeated within all whole-plot combinations or not. It is shown that for the crossed designs OLS = GLS. An example of a non-crossed design type with the same property is the class of fractional factorial two-level designs analysed in the regular way by first-order models.

Method 1

The simplest approach is to ignore the split-plot structure in the design. In such cases, the regular OLS estimate is used for $\mathbf{b}$ and the covariance matrix is calculated as $(\mathbf{X}^T\mathbf{X})^{-1}\sigma^2_{sp}$. This approach has been investigated using simulations by Letsinger et al. [3] and by Kowalski et al. [5] for different types of experiments. In both cases, the results indicate that when the whole-plot error is small (compared to the residual error), this approach can be recommended. Letsinger et al.
[3] have argued that if the ratio $d = \sigma^2_{wp}/\sigma^2_{sp}$ is larger than unity, the OLS approach should not be used. For d smaller than 0.4, the properties seem to be quite good as compared with some of the other choices discussed below. Næs et al. [10] have presented an example of a method that can be used to check this assumption for mixture data prior to model building. The method is applicable when raw material combinations represent the whole plots and is based on systematically selected measurements of the chemical constituents of the mixture components and the mixtures themselves.

In some situations, it may be obvious from the design of the experiment that the whole-plot error is negligible. An example is provided by Hersleth et al. [11], where consumer preferences for a number of food products were analysed. In this case it was obvious from prior knowledge that the whole-plot error, which related to the design of the objects, was very small as compared to the sub-plot error, which was related to the random error in the consumer preference scores. Therefore, the split-plot structure could be neglected.

Method 2

Another and more generally applicable approach is to use full restricted maximum likelihood (REML; see, for example, Hand and Crowder [12] and Little et al. [13]) estimation based on Equation (5). REML is a well-accepted method for estimating variance components in mixed models, and is based on ML estimation of the residual error distributions. This can be done using a statistical software package, e.g. SAS [14], which will be shown below. The method provides error variances that can be plugged into the equations above and thus be used to compute t-values and their corresponding p-values for the effects or regression coefficients. The main advantage of this approach is that it can be used in all reasonably designed cases. Software has also been developed that provides both DFs and significance tests directly. In the examples below, the SAS system is used.
The software will also produce tests of composite hypotheses or contrasts. Exact DFs needed for the tests are generally not simple to obtain. An approximation often used is the so-called Satterthwaite approximation (see, for example, Satterthwaite [15] and Little et al. [13]), based on the design structure as well as the actual data. The approximation formulae can be quite complex for large data sets, but are easily and automatically available from some of the software packages.

Method 3

The third possibility is to use iteratively reweighted GLS estimation. This method starts with the regular OLS estimator for the model parameters. From the estimated coefficients, estimators for the error variances can be found using formulae presented in Letsinger et al. [3] (p. 395, Equations (C6) and (C7)). These can then be plugged into $\mathbf{V}$ using Equation (4), which can then be used in the GLS formula to provide new regression coefficients. The process continues until convergence. Results in Letsinger et al. [3] indicate that Method 2 is sometimes superior to Method 3, and Method 3 will therefore not be pursued further here.

Method 4

The fourth approach is based on ANOVA. The sub-plot error variance is obtained by simply using the regular residual mean square ($MS_{sp}$), and the whole-plot error variance is obtained by using the difference between the sums of squares for the saturated model and the reduced model containing only the fixed effects believed to be important (see Letsinger et al. [3]). The difference between the two models is thus thought of as representing the whole-plot error. The expectation of the corresponding MS ($MS_{dif}$) is

$$E(MS_{dif}) = \sigma^2_{sp} + m\sigma^2_{wp} \tag{6}$$

where m is the number of sub-plot experiments per whole-plot combination. Again we assume that we are in the crossed situation described above. Simple calculations then show that

$$\hat{\sigma}^2_{wp} = (MS_{dif} - MS_{sp})/m \tag{7}$$

is an unbiased estimator for the whole-plot error variance. Slightly modified formulae are needed for the non-crossed case (see Letsinger et al. [3], p. 395, Equations (C6) and (C7)). Note that this approach is essentially equivalent to the ANOVA described for the second unreplicated example in Section 2.2. In that case, the MS for the non-modelled whole-plot interactions was used directly as the denominator of the F-tests (or, equivalently, t-tests) of the whole-plot effects. It is easy to show that the two tests are identical.

Method 5

The next method to be discussed is based on using additional replicated measurements. If additional replications are collected in a sensible way, the variances of the whole-plot and the sub-plot errors can be estimated using simple variance formulae. Such an approach was described in, for instance, Kowalski et al. [5]. One of the advantages of this approach is that the estimated error variances are totally model-independent. The disadvantage is that more experiments have to be carried out.

Sub-plot replicates can often be obtained easily by just replicating some of the sub-plot combinations for a given whole-plot configuration. If variances are different for the different whole-plot combinations, it is recommended that this procedure is repeated for a number of whole-plot combinations. With r different whole-plot combinations and with m sub-plot replicates within each of them, the sub-plot error variance estimate can be written as

$$S^2_{sp} = \frac{1}{r(m-1)} \sum_{i=1}^{r} \sum_{j=1}^{m} (y_{ij} - \bar{y}_{i.})^2 \tag{8}$$

Here, $y_{ij}$ is the response measurement for whole-plot combination i and sub-plot replicate j, and $\bar{y}_{i.}$ is the average of $y_{ij}$ over j. For whole-plot variance estimation, however, the situation is different. Making replicates involving only the whole-plot error is difficult to envision because all replicates will always involve a contribution from the residual random (here sub-plot) error.
At least measurement error will always be present, even if sub-plot combinations can be kept in exactly the same position during the experimentation. A possible approach for estimating the whole-plot variance is the following. Select one combination of the experimental variables and repeat it r times, resetting both the whole-plot and sub-plot factors between experiments. These replicates will then be identical except for the total random noise in the experiment, which is equal to $\boldsymbol{\delta} + \mathbf{e}$ and has variance equal to $\sigma^2_{tot} = \sigma^2_{wp} + \sigma^2_{sp}$. The total error variance can be estimated using the simple formula

$$\hat{\sigma}^2_{tot} = \sum_{i=1}^{r} (y_i - \bar{y})^2/(r-1) \tag{9}$$

This can, as for the sub-plot variance, be repeated for a number of experimental settings and pooled to give a more precise estimate. The whole-plot error variance can then be computed by subtracting the sub-plot variance $\hat{\sigma}^2_{sp}$ from $\hat{\sigma}^2_{tot}$, giving

$$\hat{\sigma}^2_{wp} = \hat{\sigma}^2_{tot} - \hat{\sigma}^2_{sp} \tag{10}$$

A problem with this approach, however, is that it is not obvious how to find the distribution of the error variances in $\mathbf{V}$ and their DFs, which are needed for testing purposes. For the situation with many replicates, this is not a major problem, because the estimated elements of $\mathbf{V}$ can then be assumed known.

Another approach, which can be used to provide both whole-plot and sub-plot variances simultaneously, is to use a hierarchical design strategy. First, a whole-plot combination is selected and then this combination is reset a number of times (r). For each of these resets, the same sub-plot combination is repeated a number of times (m). The data will then contain information only about the noise structure and can be modelled as

$$y_{ij} = \mu + A_i + e_{ij} \tag{11}$$

where $A_i$ corresponds to the whole-plot random error and $e_{ij}$ to the sub-plot error. The expected whole-plot MS for this model is given by the same expression as in Equation (6). The variances of the two errors can be estimated using the standard ANOVA formulae

$$\hat{\sigma}^2_{sp} = MS_{sp} = \frac{1}{r(m-1)} \sum_{i=1}^{r} \sum_{j=1}^{m} (y_{ij} - \bar{y}_i)^2 \tag{12}$$

and

$$\hat{\sigma}^2_{wp} = (MS_{wp} - MS_{sp})/m = \frac{1}{m}\left( m \sum_{i=1}^{r} (\bar{y}_i - \bar{y})^2/(r-1) - \hat{\sigma}^2_{sp} \right) \tag{13}$$

where $MS_{wp}$ and $MS_{sp}$ are the mean squares for the two effects in model (11). With r resets and m repeats within each reset, the two MSs have DFs equal to $(r-1)$ and $r(m-1)$, respectively. An advantage of this hierarchical approach is that the number of sub-plot replicates m can in some cases be chosen in such a way that the expectations of the MSs are exactly equal to the variance elements of $\mathrm{cov}(\hat{\mathbf{b}}_{GLS}) = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}$. This means that estimates for the error variances used in the tests will have known distribution and can be used directly for testing purposes. We will comment further on this topic in Section 7 in the first of the examples.

Method 6

For factorial and fractional factorial split-plot designs with only two levels for each factor, it is also possible to use Q-Q plots (or other probability plots; see, for example, Box and Jones [6]). For split-plot models, the different effects have different variances, so the sub-plot and whole-plot effects must be compared separately. In order to use such plots, however, one needs at least three factors (and their two-factor interactions). If this is not the case, it is difficult to judge whether the points fall close to a straight line or not.

Method 7

The last technique that will be mentioned is the so-called Lenth method [16] for analysing fractional factorial split-plot designs. This method is quite similar to the regular t-tests/F-tests mentioned in Method 4.
These tests divide the coefficient or effect estimates by a standard error, but instead of using the regular standard error, a pseudo standard error (PSE) is computed. This corrects for the fact that a search strategy is being used. The method is based on first ranking all possible effect estimates and then searching for a split between the significant and non-significant effects by comparing them with the PSE. Tables have been developed to determine which factors are active. Extensions of the Lenth method to split-plot designs have been discussed by Loeppky and Sitter [17]. As for Method 6, the Lenth method must be used separately for the whole-plot and sub-plot effects. The whole-plot effects are ranked and tested in the regular way, while the sub-plot effects are tested by a modified procedure which considers the situation as a block design. Loeppky and Sitter [17] developed tables of critical values of these tests for this particular situation.

5. FRACTIONAL FACTORIAL SPLIT-PLOT DESIGNS

The fractional factorial designs can be analysed according to the methods described in Section 4, but there are a few additional aspects of these methods that we would like to mention before we present the examples. Fractional factorial designs [1] are systematically selected subsets of full factorial designs which have as few runs as possible while still providing useful information. In this section we concentrate on the simplest of these designs, having only two levels for each factor, the so-called $2^{k-p}$ designs (see Section 2.2).
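Returning briefly to Method 7 of Section 4, the pseudo standard error can be sketched in a few lines. The sketch below uses the commonly quoted definition of Lenth's PSE (1.5 times the median of the absolute effects, recomputed after trimming effects larger than 2.5 times the initial estimate); it does not implement the split-plot modification of Loeppky and Sitter, and the effect values and the function name `lenth_pse` are made up for illustration. For split-plot data it would be applied separately to the whole-plot and sub-plot effect estimates.

```python
import statistics

def lenth_pse(effects):
    """Pseudo standard error for a list of effect (contrast) estimates."""
    abs_effects = [abs(c) for c in effects]
    s0 = 1.5 * statistics.median(abs_effects)          # initial robust scale
    trimmed = [a for a in abs_effects if a < 2.5 * s0]  # drop likely active effects
    if not trimmed:
        return 0.0
    return 1.5 * statistics.median(trimmed)

# Hypothetical sub-plot effect estimates with one clearly active effect.
sub_plot_effects = [0.1, -0.2, 0.3, -0.4, 0.5, 8.0, -0.6]
pse = lenth_pse(sub_plot_effects)
ratios = [c / pse for c in sub_plot_effects]  # t-like ratios for tabulated cut-offs
print(round(pse, 3))
```

The ratios would then be compared with tabulated critical values rather than with an ordinary t-distribution.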

Table I. Cartesian and split-plot confounding (crossed and non-crossed designs). A and B are whole-plot factors while C and D are sub-plot factors. Both are $2^{4-1}$ experiments, but with different confounding. (a) Cartesian confounding. The same sub-plot experiments are repeated for all whole-plot combinations. A full factorial in A and B is combined with a half fraction of the full design of C and D. The generator for the experiment is C = D. (b) Split-plot confounding. Here different sub-plot combinations are represented for each whole-plot combination. The generator for the experiment is D = ABC. Cells marked x are the runs implied by the stated generators, taking Low = -1 and High = +1.

  (a)                        A/B
  C/D          Low/Low   High/Low   Low/High   High/High
  Low/Low         x         x          x          x
  High/Low
  Low/High
  High/High       x         x          x          x

  (b)                        A/B
  C/D          Low/Low   High/Low   Low/High   High/High
  Low/Low         x                               x
  High/Low                  x          x
  Low/High                  x          x
  High/High       x                               x

Such designs used in a split-plot framework were considered thoroughly by Bisgaard [7], who also presented a number of examples with different types of confounding. In particular, Bisgaard [7] discussed a number of different ways to confound whole-plot and sub-plot effects with each other. An important distinction is made between so-called Cartesian confounding and split-plot confounding, closely related to the distinction between crossed and non-crossed designs [3] mentioned in Section 4. An advantage of the former type is that it provides less aliased interactions between sub-plots and whole plots. This can be of some importance in robustness studies. The non-crossed designs are usually preferred because they have higher resolution among the sub-plot factors [7]. Note that for both cases, OLS = GLS. This means that the estimated effects can be computed as usual in factorial designs. An illustration of the two types of confounding is given in Table I.

There are fewer DFs available for testing, error estimation and probability plotting than for full factorial split-plot designs. Therefore, for such designs it is important to use the replicate structures.
Another important problem is to identify which effects or contrasts must be tested against a whole-plot error variance and which must be tested against a sub-plot variance. This problem was also discussed by Bisgaard [7], who gave a simple rule: all contrasts obtained by multiplying the basic generators for the whole-plot design in all possible ways will have to be tested against a whole-plot error variance given by (using A as a token whole-plot contrast; see also Bingham and Sitter [4])

$$\mathrm{Var}(\hat{A}) = \frac{4}{N}\left(2^n \sigma^2_{wp} + \sigma^2_{sp}\right) \tag{14}$$

and all remaining contrasts against (using P as a token sub-plot contrast)

$$\mathrm{Var}(\hat{P}) = \frac{4}{N}\sigma^2_{sp} \tag{15}$$

Here N is equal to the total number of experiments ($N = 2^{k-p}$) and $2^n$ is the number of sub-plot experiments for each whole plot. Note that these formulae are identical to the corresponding elements of $\mathrm{cov}(\hat{\mathbf{b}}_{GLS}) = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}$.
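The two variance formulae can be checked numerically for a small case. The sketch below assumes a full $2^2 \times 2^2$ split-plot design (whole-plot factors A and B, sub-plot factors C and D, so N = 16 and $2^n = 4$), hypothetical variance values, and the usual effect estimate $2\mathbf{x}^T\mathbf{y}/N$ for a contrast column $\mathbf{x}$ of ±1 entries, whose variance is $4\mathbf{x}^T\mathbf{V}\mathbf{x}/N^2$ with $\mathbf{V}$ from Equation (4). The function name `effect_variance` is illustrative only.

```python
import itertools

def effect_variance(contrast, whole_plot_ids, var_wp, var_sp):
    """Variance of the effect estimate 2 * x'y / N when the errors have
    covariance V = var_wp * J + var_sp * I (J block diagonal by whole plot)."""
    n_runs = len(contrast)
    # x' J x is the sum over whole plots of (sum of x within the whole plot)^2.
    sums = {}
    for x, wp in zip(contrast, whole_plot_ids):
        sums[wp] = sums.get(wp, 0) + x
    x_j_x = sum(s ** 2 for s in sums.values())
    x_v_x = var_wp * x_j_x + var_sp * n_runs  # x'x = N for a +/-1 contrast
    return 4.0 * x_v_x / n_runs ** 2

runs = list(itertools.product((-1, 1), repeat=4))   # columns A, B, C, D
wp_ids = [(a, b) for a, b, c, d in runs]            # whole plot = (A, B) setting
col_a = [a for a, b, c, d in runs]
col_c = [c for a, b, c, d in runs]
col_ac = [a * c for a, b, c, d in runs]

var_wp, var_sp = 2.0, 1.0                           # hypothetical values
var_a = effect_variance(col_a, wp_ids, var_wp, var_sp)
var_c = effect_variance(col_c, wp_ids, var_wp, var_sp)
var_ac = effect_variance(col_ac, wp_ids, var_wp, var_sp)
print(var_a, var_c, var_ac)
```

Under these assumptions the whole-plot contrast A has variance (4/16)(4 · 2 + 1) = 2.25, as in Equation (14), while the sub-plot contrast C and the whole-plot by sub-plot interaction AC both have variance (4/16) · 1 = 0.25, as in Equation (15), because their contrast entries sum to zero within every whole plot.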

Thus, whole-plot and sub-plot effects can be tested by simply computing their effects in the usual way and dividing them by the square roots of the estimated variances in Equations (14) and (15), respectively. These quantities are then usually compared with a t-distribution with the appropriate DFs, which depend on which estimation method is used, as discussed above. The hierarchical replicate structure described in Section 4 is important here. Setting $m = 2^n$, the expectation of $MS_{wp}$ becomes exactly equal to (except for the 4/N factor) $\mathrm{Var}(\hat{A})$, and $MS_{wp}$ can be used directly as the denominator in tests for the whole-plot effects. These will then be tests with 1 and $(r-1)$ DFs. For the sub-plot effects, $MS_{sp}$ can be used, giving rise to tests with 1 and $r(m-1)$ DFs. An example of the analysis of this type of experiment with the use of both ANOVA and probability plotting is given in Section 7.

6. MISCELLANEOUS TOPICS

6.1. Analysis of longitudinal data (correlated sub-plot errors)

In some types of split-plot experiment, there may be structures or dependencies among the sub-plot errors. A typical example of this is the following. Assume that a number of samples have been produced using a regular randomized factorial design. Then one sample from each combination is split into two, each stored at a different temperature. The samples are then monitored and measured a number of times over a time period of a certain length. In such cases, the production factors are the whole-plot factors and the temperature and time are sub-plot factors. A whole-plot error has to be incorporated at the production factor level. The sub-plot errors in this case correspond to measurements taken at different times. In such cases it is natural to assume a correlation between the sub-plot errors, i.e. between observations taken within the same production and storing conditions. This type of correlation should be taken into account when analysing the data.
Data of this type are often called longitudinal data 12,13,18. The general model for such an experiment can be written as

$$Y = \mu + \text{production factors} + \text{whole-plot error} + \text{storing conditions} + \text{time} + \text{sub-plot errors} \tag{16}$$

with a correlation structure imposed on the sub-plot error terms. A simple alternative for modelling the correlation is the AR(1) model 12, but many other alternatives also exist. Useful criteria have been developed for selecting the best possible covariance matrix. This type of model can be analysed by using the REML procedure as described in Section 4. If the SAS computing system 14 is used, an extra statement is needed to tell the computer what type of correlation structure is required. An example of such a situation can be found in Bjerke et al. 19. For further discussion of this topic, see Little et al. 13 or the SAS/STAT User's Guide 14.

6.2. The squared multiple correlation coefficient, $R^{2}$

For traditional regression models,

$$R^{2} = 1 - \frac{RSS}{SYY} \tag{17}$$

is often used for assessing model adequacy. Here RSS is the residual sum of squares and SYY is the total sum of squares (corrected for the mean) for the model. A value close to 1 is taken as an indicator of a good model (in particular, if the number of observations in the y-vector is large compared with the number of design variables in the design). The same criterion can technically also be used for split-plot models, but here it is not obvious how to define $R^{2}$ in the most useful way. One possibility is to use the $R^{2}$ obtained by fitting the fixed effects model without taking the whole-plot errors into account. A problem with this approach, however, is that the errors are correlated and it is not obvious how this influences the interpretation of $R^{2}$. Another possibility is to use the $R^{2}$ obtained with

the whole-plot error incorporated in the model fitting. A problem with this is that some of the random error is moved from being a part of the error to being a part of the model structure. Other ad hoc measures along the same line of reasoning can also be envisioned. One possibility is to use the ratio of the variance for the whole-plot error and the split-plot error. Another is to use the ratio of this ratio and the total variance of the data set. Together, these three ratios and their relative size will provide important indications of model fit and the relative error size. More research is needed on this issue.

6.3. Residuals

It is common practice to perform a residual analysis for a regression model (see, for example, Cook and Weisberg 20). As for $R^{2}$, in split-plot designs there are two types of residuals that are of interest. One possibility is to use residuals obtained for models with the whole-plot error included in the model. These will only contain information about possible structure (for instance, outliers) at the sub-plot level of the design. Residuals obtained without the whole-plot error in the model will, however, contain information about possible unwanted structures both at the whole-plot level and at the sub-plot level. Both residual plots are easiest to interpret if observations for the same whole-plot combinations are plotted adjacent to each other, as will be done for the examples in Sections 7 and 8. In practice, we recommend using both plots to assess tendencies at both error levels.

7. EXAMPLE 1: ANALYSING FRACTIONAL FACTORIAL SPLIT-PLOT DATA

The data for this example are taken from a larger pilot plant study of the production of mayonnaise 21. The design used was a regular split-plot fractional factorial design with seven factors. Two of the factors were ingredient factors (A and B) and five of them were process variables (a, b, c, d and e).
The two ingredient factors were used as whole-plot factors, due to practical restrictions. The design used was a $2^{2+(5-2)}$ design, which means that the design in the whole-plot factors is a full factorial, and that the total design is a $2^{(5+2)-2} = 32$ run design. The sub-plot part of the design was first generated as a regular fractional factorial design using e = abcd. Then the two designs were put together by using the two generators c = ABab and d = ABe, i.e. by using a split-plot (or non-crossed) confounding (see Table I and Bisgaard 7). The defining relation for the design is thus I = abcde = ABabc = ABde. The design is shown in Table II. The experiment was conducted over four days with one whole-plot combination tested each day. In order to control the stability of the production process, a centre point was repeated twice every day (randomly positioned within the day). Note that even though this is a non-crossed situation, it preserves the OLS = GLS equivalence through their first-order structure (see Letsinger et al. 3). It can also be noted from the confounding pattern that the sub-plot interaction de is confounded with AB. This means that the de interaction has a whole-plot error, although it has the appearance of a sub-plot effect. No other two-factor interactions of the sub-plot terms are in this case confounded with whole plots, and they were therefore tested against the sub-plot error variance.

The design was first analysed with a regular mixed model ANOVA using the REML procedure (the SAS code is given in Appendix A). The model involves all main effects, all possible two-factor interactions plus whole-plot and sub-plot errors, i.e.

$$Y_{ijklmno} = \mu + \alpha_{i} + \beta_{j} + \chi_{k} + \delta_{l} + \varphi_{m} + \eta_{n} + \phi_{o} + E_{ij} + W + e_{ijklmno} \tag{18}$$

Here, all of the Greek letters represent the fixed factor effects, $E_{ij}$ is the whole-plot error (see also model (2)) and $e_{ijklmno}$ is the residual error.
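The construction described above can be checked numerically. The following Python sketch, which is an illustration and not the authors' code, generates the 32 runs from the basic factors A, B, a, b and e, derives c and d from the generators c = ABab and d = ABe, and verifies the defining relation and the confounding of de with AB. The helper names `build_design` and `column` are hypothetical, and the factor levels are coded as -1 and +1.

```python
from itertools import product

def build_design():
    """Generate the 32 runs; c and d follow from the generators in the text."""
    runs = []
    for A, B, a, b, e in product((-1, 1), repeat=5):
        c = A * B * a * b   # generator c = ABab
        d = A * B * e       # generator d = ABe
        runs.append({"A": A, "B": B, "a": a, "b": b, "c": c, "d": d, "e": e})
    return runs

def column(runs, word):
    """Contrast column for an interaction word such as 'ABde' (case-sensitive)."""
    col = []
    for run in runs:
        value = 1
        for name in word:
            value *= run[name]
        col.append(value)
    return col

runs = build_design()

# Every word in the defining relation I = abcde = ABabc = ABde is constant +1.
for word in ("abcde", "ABabc", "ABde"):
    assert all(v == 1 for v in column(runs, word))

# The sub-plot interaction de is confounded with the whole-plot interaction AB.
assert column(runs, "de") == column(runs, "AB")

# Four whole-plot settings (A, B), each containing 8 sub-plot runs.
whole_plots = {}
for run in runs:
    whole_plots.setdefault((run["A"], run["B"]), []).append(run)
print(len(runs), "runs in", len(whole_plots), "whole plots")
```

The same helper shows that Be and Ad share a contrast column, as do Bd and Ae.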
In addition to the confounding between de and the whole-plot interaction AB, Be is confounded with Ad and Bd with Ae. Therefore, de, Ad and Bd were all excluded from the model. W here corresponds to the two-factor interactions included in the model. In this paper we have focused on a single response, Y, related to the colour of the mayonnaise as assessed by a sensory panel. A full multivariate study of several other attributes is given in Sahni et al. 21. The results from the mixed model analysis are shown in Table III. Note that the estimates in this case are computed as regression coefficients of variables with values −1 and 1. This means that the estimates are half the

size of the so-called effects of the variables (see, for example, Box et al. 1).

Table II. Design for Example 1. The first two variables are the whole-plot variables and the last five are the sub-plot variables. Columns: Obs, A, B, a, b, c, d, e.

As can be seen, there are significant effects of a and c. Both these effects are tested against the residual error term (DF = 6 for the denominator). Note that the whole-plot tests have only one DF for error, making them very weak. The normal probability plot for the sub-plot effects is given in Figure 2. The same effects as above, a and c, are found to be significant. As there are only two whole-plot effects, no probability plot is used for them. The first of the two $R^{2}$ values discussed above was equal to 0.88 for this model; the second is based on a model with one more DF than the first. They both indicate that the model explains a large amount of the variability, but this is not particularly useful information here with so few DFs for error. The two error variances were, using the REML procedure, estimated to be equal to 0.008 and 0.057, indicating that the whole-plot error is much smaller than the sub-plot error (d = 0.14, see Method 1). This ratio is much smaller than unity and one can argue that using a regular LS estimation is quite appropriate here. The whole-plot error variance has a p-value of 0.3 (also provided by SAS), again indicating that the whole-plot error is quite small. Following Bisgaard 7, the variances for the whole-plot and sub-plot effects are given as

$$\operatorname{Var}(\hat{A}) = \frac{4}{32}\left(2^{5-2}\sigma_{wp}^{2} + \sigma_{sp}^{2}\right) = \sigma_{wp}^{2} + \frac{1}{8}\sigma_{sp}^{2} \tag{19}$$

Table III. The mixed model ANOVA results for the mayonnaise data. Columns: Effect, Estimate, Standard error, DF, t-value, Pr > |t|. Rows: Intercept, A, B, a, b, c, d, e, a*b, a*c, a*d, a*e, b*c, b*d, b*e, c*d, c*e, A*a, A*b, A*c, A*e, B*a, B*b, B*c, B*e.

Figure 2. Probability plot for the sub-plot effects (and interactions with whole plots) in the fractional factorial example.

and

$$\operatorname{Var}(\hat{P}) = \frac{1}{8}\sigma_{sp}^{2} \tag{20}$$

The whole-plot and sub-plot variance estimates from REML (0.008 and 0.057, respectively) can be inserted directly into these equations. In this case, the square roots of the two variances, i.e. the standard errors of the effects, are equal to 0.12 and 0.084, respectively. These are exactly twice as large as the standard errors obtained in the REML procedure for the regression coefficients, which was to be expected because the effects are twice the regression coefficients. If the hierarchical replicate structure in Section 4 (Method 5) is used for the present example, one would need $m = 8 = 2^{5-2}$ sub-plot replicates for each whole-plot setting in order for $MS_{wp}$ to be used directly in the whole-plot tests. The reason is that with this choice of m, $E(MS_{wp}) = \sigma_{sp}^{2} + 8\sigma_{wp}^{2}$, which is proportional to Equation (19). The use of simple t-tests is then straightforward.

Figure 3. Analysing fractional factorial split-plot data. Residuals for model (18): (a) with a whole-plot error term in the model and (b) without a whole-plot error term in the model.

Residuals for the two models (model (18) with and without the whole-plot error term) plotted versus whole-plot combination (WP) are presented in Figure 3. The residuals for blocks 1 and 2 in Figure 3(b) tend towards

positive values while the opposite is true for the other blocks. This may indicate that although the whole-plot error is weak, it is probably not zero. The other systematic patterns in the plots stem from the small number of DFs for error. The replicates of the centre point give a total variance that is as low as the whole-plot error variance. A possible explanation of this very low value could be that the replicates are taken at the centre, which may be more stable than the corner points. At least, the small value indicates that the day effect is a negligible part of the whole-plot effect.

The Lenth method was also tried for the sub-plot effects and the interactions between whole-plot and sub-plot factors. All of these effects were computed and then divided by the PSE (pseudo standard error) to obtain the test statistic. This statistic was compared with the table in Loeppky and Sitter 17 corresponding to 32 experiments and 28 effects. It was found that the two main effects, a and c, plus the interaction Be were significant at level 0.05 (using the individual error rates; see Loeppky and Sitter 17). The latter was, however, only slightly so; at a level of 0.01 only the main effects were detected as significant.

8. EXAMPLE 2: ANALYSING MIXTURE-PROCESS SPLIT-PLOT DATA

The data for this experiment come from the same pilot plant as was used for Example 1 (Sahni et al. 21). The data consist of measurements of a rheological property (measured one day after production) of mayonnaise produced using different recipes and a single process unit. Ten different recipes were obtained by mixing three ingredients in different proportions (evenly spread over the actual sub-region of interest) and treated at three different levels of a single process unit (temperature settings of a heat exchanger). All other factors were kept constant. The mixtures were the whole plots and the process variables the sub-plots.
The design is presented and discussed more thoroughly in Sahni et al. 22. Note that this is a crossed split-plot situation leading to OLS = GLS. A possible model for the fixed part of this experiment is

$$\mu + \beta_{1}X_{1} + \beta_{2}X_{2} + \beta_{12}X_{1}X_{2} + \beta_{13}X_{1}X_{3} + \beta_{23}X_{2}X_{3} + \alpha_{1}W + \alpha_{2}W^{2} + \gamma_{1}X_{1}W + \gamma_{2}X_{2}W \tag{21}$$

where $X_{1}$, $X_{2}$ and $X_{3}$ correspond to the three ingredients, which sum to a constant, and W corresponds to the temperature. The model is obtained by multiplying a second-order mixture model with a second-degree process model 23 and by reparametrizing in order to isolate the effects of the mixture and process components. In addition, all terms with an order higher than 2 have been eliminated. Model (21) is an empirical model, but corresponds well with experience and plots of the data.

The random component of the model has a whole-plot contribution and a sub-plot contribution. The whole-plot error is nested under the $X_{1}$, $X_{2}$ and $X_{3}$ combinations while the sub-plot error is the regular residual error term. Model (21) was then analysed using OLS, ignoring the split-plot structure, and using REML, accounting for the split-plot random structure. These two approaches correspond to methods (1) and (2) in Section 4. The results are given in Tables IV(a) and IV(b). It is clear from the REML table (Table IV(b)) that W, $W^{2}$ and the interaction between $X_{1}$ and W are significant at the 5% level. $X_{2}$ is almost significant at the same level (p = 0.076), indicating a possible effect also of the mixture variables. There seems to be no second-order effect of the mixtures. There were, however, quite strong collinearities among some of the variables, with variance inflation factors ranging from 1 to 1358 (see Cook and Weisberg 20), due to a highly restricted experimental mixture region, thus making it worthwhile to test a reduced model. Table IV(c) shows the results after eliminating the products of the mixture variables from the analysis.
It can be seen that $X_{1}$, $X_{2}$, W, $W^{2}$ and the interaction between $X_{1}$ and W are now clearly significant. Comparing the OLS p-values in Table IV(a) with the REML p-values in Table IV(b), we see that the OLS p-values are higher for the sub-plot factors and lower for the whole-plot factors. Even some of the nonlinear mixture whole-plot terms are significant for LS. The LS tests have 20 DFs for all tests, while the DFs vary between four and 16 for the REML analysis. The whole plots have very few DFs, so the relatively large p-values are not surprising. Note, however, that even though the DFs for the sub-plots are smaller than for the OLS, the significances are improved. This corresponds to the fact that the residual variance is reduced.
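The direction of these differences can be illustrated with the expected mean squares of a balanced, crossed split-plot design. The sketch below rests on several assumptions that are not stated in this form in the text: the 20 residual DFs of the LS analysis are split into 4 whole-plot and 16 sub-plot DFs, each whole plot contains m = 3 sub-plot runs, and the variance components are the REML estimates reported for the full model (3.3 and 1.3). The function name is hypothetical.

```python
def expected_mean_squares(m, df_wp, df_sp, var_wp, var_sp):
    """Expected residual mean squares per stratum and for the pooled LS error."""
    ems_wp = var_sp + m * var_wp   # whole-plot stratum
    ems_sp = var_sp                # sub-plot stratum
    # LS ignores the strata and pools both residual sums of squares.
    ems_pooled = (df_wp * ems_wp + df_sp * ems_sp) / (df_wp + df_sp)
    return ems_wp, ems_sp, ems_pooled

# Assumed inputs: 3 runs per whole plot, 4 + 16 residual DFs, REML estimates 3.3 and 1.3.
ems_wp, ems_sp, ems_pooled = expected_mean_squares(3, 4, 16, 3.3, 1.3)
print(f"whole-plot stratum: {ems_wp:.2f}")
print(f"sub-plot stratum:   {ems_sp:.2f}")
print(f"pooled LS error:    {ems_pooled:.2f}")
```

The pooled value lies between the two stratum-specific mean squares, so an analysis that uses it for every test tends to understate the error for whole-plot terms and overstate it for sub-plot terms, which is the pattern seen in the p-values.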

Table IV. ANOVA tables based on Example 2, analysing mixture-process split-plot data: (a) LS analysis; (b) full REML analysis; (c) reduced REML analysis.
(a) Columns: Variable, DF, Parameter estimate, Standard error, t-value, Pr > |t|. Rows: Intercept, X1, X2, X1X2, X1X3, X2X3, W, W^2, W*X1, W*X2.
(b) Columns: Effect, Estimate, Standard error, DF, t-value, Pr > |t|. Rows: Intercept, X1, X2, X1X2, X1X3, X2X3, W, W^2, W*X1, W*X2.
(c) Columns: Effect, Estimate, Standard error, DF, t-value, Pr > |t|. Rows: Intercept, X1, X2, W, W^2, W*X1, W*X2.

The variance components of the two random effects for the full model are 3.3 for the whole plot and 1.3 for the sub-plot. In other words, the whole-plot contribution seems to be substantial compared with the residual. The two sum to 4.6. For the smaller model, the two were 2.8 and 1.3. The standard errors for the whole-plot error variance in the two cases were 2.6 and 1.7, yielding a p-value of 0.1 for the full model and a smaller p-value for the reduced model. This shows that, at least in the smaller model, the whole-plot error variance is significant at the 5% level. This again indicates that the whole-plot error is significant and that an OLS analysis is not valid (Method 1).

In addition to the data used for the example above, nine true replicates of one of the recipes were made. The total error variance based on these replicates was 3.6. This is smaller than the total error variance of 4.6 for the full model, but close to the total error variance of 4.1 for the reduced model. This indicates that the model is reasonable. Sixteen true sub-plot replicates were also made for this particular experiment. The variance (model-free sub-plot error) of these was equal to 0.7.
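The significance statements for the whole-plot variance components are consistent with a one-sided Wald test, in which the estimate divided by its standard error is referred to a standard normal distribution. Whether this is exactly the test reported by the software is an assumption here; the sketch below simply carries out that calculation on the rounded estimates and standard errors given above, using a hypothetical helper `wald_p_value`.

```python
import math

def wald_p_value(estimate, std_error):
    """One-sided p-value for H0: variance component = 0 (normal approximation)."""
    z = estimate / std_error
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail area of N(0, 1)

p_full = wald_p_value(3.3, 2.6)      # full model
p_reduced = wald_p_value(2.8, 1.7)   # reduced model
print(f"full model:    p = {p_full:.3f}")
print(f"reduced model: p = {p_reduced:.3f}")
```

Because the inputs are rounded, the results can only be expected to agree with the reported p-values to about one significant digit.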

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Reference: Chapter 6 of Montgomery(8e) Maghsoodloo

Reference: Chapter 6 of Montgomery(8e) Maghsoodloo Reference: Chapter 6 of Montgomery(8e) Maghsoodloo 51 DOE (or DOX) FOR BASE BALANCED FACTORIALS The notation k is used to denote a factorial experiment involving k factors (A, B, C, D,..., K) each at levels.

More information

Reference: Chapter 14 of Montgomery (8e)

Reference: Chapter 14 of Montgomery (8e) Reference: Chapter 14 of Montgomery (8e) 99 Maghsoodloo The Stage Nested Designs So far emphasis has been placed on factorial experiments where all factors are crossed (i.e., it is possible to study the

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Chapter 13 Experiments with Random Factors Solutions

Chapter 13 Experiments with Random Factors Solutions Solutions from Montgomery, D. C. (01) Design and Analysis of Experiments, Wiley, NY Chapter 13 Experiments with Random Factors Solutions 13.. An article by Hoof and Berman ( Statistical Analysis of Power

More information

Design of Engineering Experiments Part 5 The 2 k Factorial Design

Design of Engineering Experiments Part 5 The 2 k Factorial Design Design of Engineering Experiments Part 5 The 2 k Factorial Design Text reference, Special case of the general factorial design; k factors, all at two levels The two levels are usually called low and high

More information

3.4. A computer ANOVA output is shown below. Fill in the blanks. You may give bounds on the P-value.

3.4. A computer ANOVA output is shown below. Fill in the blanks. You may give bounds on the P-value. 3.4. A computer ANOVA output is shown below. Fill in the blanks. You may give bounds on the P-value. One-way ANOVA Source DF SS MS F P Factor 3 36.15??? Error??? Total 19 196.04 Completed table is: One-way

More information

How To: Analyze a Split-Plot Design Using STATGRAPHICS Centurion

How To: Analyze a Split-Plot Design Using STATGRAPHICS Centurion How To: Analyze a SplitPlot Design Using STATGRAPHICS Centurion by Dr. Neil W. Polhemus August 13, 2005 Introduction When performing an experiment involving several factors, it is best to randomize the

More information

Cost Penalized Estimation and Prediction Evaluation for Split-Plot Designs

Cost Penalized Estimation and Prediction Evaluation for Split-Plot Designs Cost Penalized Estimation and Prediction Evaluation for Split-Plot Designs Li Liang Virginia Polytechnic Institute and State University, Blacksburg, VA 24060 Christine M. Anderson-Cook Los Alamos National

More information

Chapter 11: Factorial Designs

Chapter 11: Factorial Designs Chapter : Factorial Designs. Two factor factorial designs ( levels factors ) This situation is similar to the randomized block design from the previous chapter. However, in addition to the effects within

More information

Lecture 9: Factorial Design Montgomery: chapter 5

Lecture 9: Factorial Design Montgomery: chapter 5 Lecture 9: Factorial Design Montgomery: chapter 5 Page 1 Examples Example I. Two factors (A, B) each with two levels (, +) Page 2 Three Data for Example I Ex.I-Data 1 A B + + 27,33 51,51 18,22 39,41 EX.I-Data

More information

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R.

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R. Methods and Applications of Linear Models Regression and the Analysis of Variance Third Edition RONALD R. HOCKING PenHock Statistical Consultants Ishpeming, Michigan Wiley Contents Preface to the Third

More information

I-Optimal Versus D-Optimal Split-Plot Response Surface Designs

I-Optimal Versus D-Optimal Split-Plot Response Surface Designs I-Optimal Versus D-Optimal Split-Plot Response Surface Designs BRADLEY JONES SAS Institute and Universiteit Antwerpen PETER GOOS Universiteit Antwerpen and Erasmus Universiteit Rotterdam Response surface

More information

Lecture 11: Nested and Split-Plot Designs

Lecture 11: Nested and Split-Plot Designs Lecture 11: Nested and Split-Plot Designs Montgomery, Chapter 14 1 Lecture 11 Page 1 Crossed vs Nested Factors Factors A (a levels)and B (b levels) are considered crossed if Every combinations of A and

More information

20g g g Analyze the residuals from this experiment and comment on the model adequacy.

20g g g Analyze the residuals from this experiment and comment on the model adequacy. 3.4. A computer ANOVA output is shown below. Fill in the blanks. You may give bounds on the P-value. One-way ANOVA Source DF SS MS F P Factor 3 36.15??? Error??? Total 19 196.04 3.11. A pharmaceutical

More information

Institutionen för matematik och matematisk statistik Umeå universitet November 7, Inlämningsuppgift 3. Mariam Shirdel

Institutionen för matematik och matematisk statistik Umeå universitet November 7, Inlämningsuppgift 3. Mariam Shirdel Institutionen för matematik och matematisk statistik Umeå universitet November 7, 2011 Inlämningsuppgift 3 Mariam Shirdel (mash0007@student.umu.se) Kvalitetsteknik och försöksplanering, 7.5 hp 1 Uppgift

More information

Explaining Correlations by Plotting Orthogonal Contrasts

Explaining Correlations by Plotting Orthogonal Contrasts Explaining Correlations by Plotting Orthogonal Contrasts Øyvind Langsrud MATFORSK, Norwegian Food Research Institute. www.matforsk.no/ola/ To appear in The American Statistician www.amstat.org/publications/tas/

More information

Design of Experiments SUTD - 21/4/2015 1

Design of Experiments SUTD - 21/4/2015 1 Design of Experiments SUTD - 21/4/2015 1 Outline 1. Introduction 2. 2 k Factorial Design Exercise 3. Choice of Sample Size Exercise 4. 2 k p Fractional Factorial Design Exercise 5. Follow-up experimentation

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

Stat 217 Final Exam. Name: May 1, 2002

Stat 217 Final Exam. Name: May 1, 2002 Stat 217 Final Exam Name: May 1, 2002 Problem 1. Three brands of batteries are under study. It is suspected that the lives (in weeks) of the three brands are different. Five batteries of each brand are

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Classes of Second-Order Split-Plot Designs

Classes of Second-Order Split-Plot Designs Classes of Second-Order Split-Plot Designs DATAWorks 2018 Springfield, VA Luis A. Cortés, Ph.D. The MITRE Corporation James R. Simpson, Ph.D. JK Analytics, Inc. Peter Parker, Ph.D. NASA 22 March 2018 Outline

More information

PROBLEM TWO (ALKALOID CONCENTRATIONS IN TEA) 1. Statistical Design

PROBLEM TWO (ALKALOID CONCENTRATIONS IN TEA) 1. Statistical Design PROBLEM TWO (ALKALOID CONCENTRATIONS IN TEA) 1. Statistical Design The purpose of this experiment was to determine differences in alkaloid concentration of tea leaves, based on herb variety (Factor A)

More information

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Unreplicated 2 k Factorial Designs

Unreplicated 2 k Factorial Designs Unreplicated 2 k Factorial Designs These are 2 k factorial designs with one observation at each corner of the cube An unreplicated 2 k factorial design is also sometimes called a single replicate of the

More information

OPTIMIZATION OF FIRST ORDER MODELS

OPTIMIZATION OF FIRST ORDER MODELS Chapter 2 OPTIMIZATION OF FIRST ORDER MODELS One should not multiply explanations and causes unless it is strictly necessary William of Bakersville in Umberto Eco s In the Name of the Rose 1 In Response

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

The 2 k Factorial Design. Dr. Mohammad Abuhaiba 1

The 2 k Factorial Design. Dr. Mohammad Abuhaiba 1 The 2 k Factorial Design Dr. Mohammad Abuhaiba 1 HoweWork Assignment Due Tuesday 1/6/2010 6.1, 6.2, 6.17, 6.18, 6.19 Dr. Mohammad Abuhaiba 2 Design of Engineering Experiments The 2 k Factorial Design Special

More information

Zellner s Seemingly Unrelated Regressions Model. James L. Powell Department of Economics University of California, Berkeley

Zellner s Seemingly Unrelated Regressions Model. James L. Powell Department of Economics University of California, Berkeley Zellner s Seemingly Unrelated Regressions Model James L. Powell Department of Economics University of California, Berkeley Overview The seemingly unrelated regressions (SUR) model, proposed by Zellner,

More information

Chap The McGraw-Hill Companies, Inc. All rights reserved.

Chap The McGraw-Hill Companies, Inc. All rights reserved. 11 pter11 Chap Analysis of Variance Overview of ANOVA Multiple Comparisons Tests for Homogeneity of Variances Two-Factor ANOVA Without Replication General Linear Model Experimental Design: An Overview

More information

Chapter 13: Analysis of variance for two-way classifications

Chapter 13: Analysis of variance for two-way classifications Chapter 1: Analysis of variance for two-way classifications Pygmalion was a king of Cyprus who sculpted a figure of the ideal woman and then fell in love with the sculpture. It also refers for the situation

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

Practical Statistics for the Analytical Scientist Table of Contents

Practical Statistics for the Analytical Scientist Table of Contents Practical Statistics for the Analytical Scientist Table of Contents Chapter 1 Introduction - Choosing the Correct Statistics 1.1 Introduction 1.2 Choosing the Right Statistical Procedures 1.2.1 Planning

More information

Answer Keys to Homework#10

Answer Keys to Homework#10 Answer Keys to Homework#10 Problem 1 Use either restricted or unrestricted mixed models. Problem 2 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean

More information

Assignment 9 Answer Keys

Assignment 9 Answer Keys Assignment 9 Answer Keys Problem 1 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean 26.00 + 34.67 + 39.67 + + 49.33 + 42.33 + + 37.67 + + 54.67

More information

Optimal Selection of Blocked Two-Level. Fractional Factorial Designs

Optimal Selection of Blocked Two-Level. Fractional Factorial Designs Applied Mathematical Sciences, Vol. 1, 2007, no. 22, 1069-1082 Optimal Selection of Blocked Two-Level Fractional Factorial Designs Weiming Ke Department of Mathematics and Statistics South Dakota State

More information

Unit 6: Fractional Factorial Experiments at Three Levels

Unit 6: Fractional Factorial Experiments at Three Levels Unit 6: Fractional Factorial Experiments at Three Levels Larger-the-better and smaller-the-better problems. Basic concepts for 3 k full factorial designs. Analysis of 3 k designs using orthogonal components

More information

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 Time allowed: 3 HOURS. STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 This is an open book exam: all course notes and the text are allowed, and you are expected to use your own calculator.

More information

Confounding and fractional replication in 2 n factorial systems

Confounding and fractional replication in 2 n factorial systems Chapter 20 Confounding and fractional replication in 2 n factorial systems Confounding is a method of designing a factorial experiment that allows incomplete blocks, i.e., blocks of smaller size than the

More information

Confidence Intervals, Testing and ANOVA Summary
