Unit 9: Confounding and Fractional Factorial Designs

STA 643: Advanced Experimental Design
Derek S. Young

Learning Objectives

- Understand what it means for a treatment to be confounded with blocks
- Know how generalized interactions are used in confounding
- Know how to construct and analyze incomplete block designs for $2^k$ and $3^k$ factorial designs
- Become familiar with half-fraction and quarter-fraction designs
- Understand how we use aliasing and design generators for fractional factorial designs
- Understand the resolution of a design
- Understand how to use the Plackett-Burman designs for the purpose of screening a large number of factors

Outline of Topics

1. Confounding
2. Fractional Factorial Designs
3. Design Resolution


Leveraging Factorial Treatment Designs

- Recall that we designed and analyzed experiments that involved $k$ factors, each at two levels; i.e., $2^k$ factorial designs.
- A major advantage of $2^k$ factorial designs is that they help determine whether the factors act independently or interact with one another as they affect the EUs.
- By far, $2^k$ and $3^k$ (i.e., three levels of each factor) factorial designs are the most common, but the number of treatment combinations increases rapidly as the number of factors increases.
- Thus, we turn to incomplete block designs to help control experimental error while still answering the questions we pose with a factorial design.
- We can leverage the factorial arrangement to provide effective incomplete block designs and subsequent analyses.

Example: Popcorn Experiment

- If you have ever made a bag of microwaveable popcorn, you probably noticed that there are always unpopped kernels at the bottom of the bag.
- An experiment involving (approximately) 3.5 ounce bags of popcorn was designed.
- Three factors are studied: brand of popcorn (a cheap brand and a costly brand), amount of time in the microwave (4 minutes and 6 minutes), and percent power of the microwave (75% and 100%). Thus, this is a $2^3$ factorial design.
- The weight of the remaining unpopped kernels (in ounces) after popping is recorded. See the table below.
- Note that the observation in the second row is 3.5 oz. For this bag, virtually none of the popcorn popped.
- The first column is the run order; i.e., the order in which the treatments were run.

    Run Order   Brand (A)   Time (B)   Power (C)   Kernels (Y)
        8       Cheap          4          75           3.1
        1       Costly         4          75           3.5
        2       Cheap          6          75           1.6
        4       Costly         6          75           1.2
        3       Cheap          4         100           0.7
        5       Costly         4         100           0.7
        7       Cheap          6         100           0.5
        6       Costly         6         100           0.3

Example: Popcorn Experiment

- Below is the design matrix for the full model with the two-way interactions and three-way interaction, reported in standard order.
- The low levels of each factor are denoted by a -; these are the levels Cheap, 4, and 75.
- The high levels of each factor are denoted by a +; these are the levels Costly, 6, and 100.

    Std. Order   I   A   B   C   AB   AC   BC   ABC   Y
        1        +   -   -   -   +    +    +    -     $y_{111} = 3.1$
        2        +   +   -   -   -    -    +    +     $y_{211} = 3.5$
        3        +   -   +   -   -    +    -    +     $y_{121} = 1.6$
        4        +   +   +   -   +    -    -    -     $y_{221} = 1.2$
        5        +   -   -   +   +    -    -    +     $y_{112} = 0.7$
        6        +   +   -   +   -    +    -    -     $y_{212} = 0.7$
        7        +   -   +   +   -    -    +    -     $y_{122} = 0.5$
        8        +   +   +   +   +    +    +    +     $y_{222} = 0.3$

Example: Popcorn Experiment

A few notes about the design matrix on the previous slide:

- The + and - symbols let us define contrasts with coefficients of +1 and -1 for each of the treatment combinations.
- The column I contains all +, which is used to estimate the grand mean with a divisor of $2^3$ (or more generally, $2^k$).
- We say that the matrix is in standard order because the pattern of +s and -s in the factor columns has been arranged in a particular way. Namely, column A alternates single - and + signs, column B has pairs of - signs followed by pairs of + signs, and column C has four - signs followed by four + signs. In general, the column for the $k$th factor has $2^{k-1}$ of the - signs followed by an equal number of + signs. Thus, you can construct a design matrix in standard order for an arbitrary $2^k$ factorial design.
- The coefficients for any interaction are the product of the columns of coefficients for the factors that comprise that interaction. For example, ABC in row 7 is found by (-1)(+1)(+1) = (-1), or simply, a - sign.
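The construction described in these notes can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `standard_order_design` and `add_interactions` are hypothetical helpers, not part of any package used in the course, and levels are coded as -1 and +1.

```python
from itertools import combinations

def standard_order_design(factors):
    """Return the runs of a 2^k design in standard order as dicts of -1/+1."""
    rows = []
    for run in range(2 ** len(factors)):
        # Bit j of the run index sets factor j, so the first factor
        # alternates every run, the second every two runs, and so on.
        rows.append({f: (1 if (run >> j) & 1 else -1)
                     for j, f in enumerate(factors)})
    return rows

def add_interactions(rows, factors):
    """Add column I and every interaction column (product of factor columns)."""
    out = []
    for row in rows:
        full = {"I": 1, **row}
        for size in range(2, len(factors) + 1):
            for combo in combinations(factors, size):
                sign = 1
                for f in combo:
                    sign *= row[f]
                full["".join(combo)] = sign
        out.append(full)
    return out

design = add_interactions(standard_order_design("ABC"), "ABC")
for i, row in enumerate(design, start=1):
    print(i, " ".join("+" if v == 1 else "-" for v in row.values()))
```

Row 7 of `design` has A = -1, B = +1, C = +1, so its ABC entry is the product -1, in line with the last note above.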

Example: Popcorn Experiment

- Below are the treatment means plots.
- We see that there is a potential interaction due to the magnitude of the responses.
- Recall that no interaction is present if lines at two levels of a treatment are parallel to each other.

[Figure: two treatment means plots of Amount of Kernels (oz) against Brand (cheap, costly), with one line per Time level (1 = 4 minutes, 2 = 6 minutes). Left panel: Power = 75%. Right panel: Power = 100%.]

Example: Popcorn Experiment

    B    C      Simple Effect of Cheap Brand to Costly Brand (A)
    4    75     $\hat\mu_{211} - \hat\mu_{111} = 3.5 - 3.1 = 0.4$
    6    75     $\hat\mu_{221} - \hat\mu_{121} = 1.2 - 1.6 = -0.4$
    4    100    $\hat\mu_{212} - \hat\mu_{112} = 0.7 - 0.7 = 0.0$
    6    100    $\hat\mu_{222} - \hat\mu_{122} = 0.3 - 0.5 = -0.2$

    A        C      Simple Effect of 4 Minutes to 6 Minutes (B)
    Cheap    75     $\hat\mu_{121} - \hat\mu_{111} = 1.6 - 3.1 = -1.5$
    Costly   75     $\hat\mu_{221} - \hat\mu_{211} = 1.2 - 3.5 = -2.3$
    Cheap    100    $\hat\mu_{122} - \hat\mu_{112} = 0.5 - 0.7 = -0.2$
    Costly   100    $\hat\mu_{222} - \hat\mu_{212} = 0.3 - 0.7 = -0.4$

    A        B    Simple Effect of 75% Power to 100% Power (C)
    Cheap    4    $\hat\mu_{112} - \hat\mu_{111} = 0.7 - 3.1 = -2.4$
    Costly   4    $\hat\mu_{212} - \hat\mu_{211} = 0.7 - 3.5 = -2.8$
    Cheap    6    $\hat\mu_{122} - \hat\mu_{121} = 0.5 - 1.6 = -1.1$
    Costly   6    $\hat\mu_{222} - \hat\mu_{221} = 0.3 - 1.2 = -0.9$

Note: $\hat\mu_{ijk} = y_{ijk}$

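Example: Popcorn Experiment

Grand Mean:

    $\bar{y} = \frac{1}{8} \sum_{i,j,k} y_{ijk} = 1.45$

Main Effects (average response at the high level minus average response at the low level):

    Factor A: $\frac{1}{4}(y_{211} + y_{221} + y_{212} + y_{222}) - \frac{1}{4}(y_{111} + y_{121} + y_{112} + y_{122}) = 1.425 - 1.475 = -0.050$
    Factor B: $\frac{1}{4}(y_{121} + y_{221} + y_{122} + y_{222}) - \frac{1}{4}(y_{111} + y_{211} + y_{112} + y_{212}) = 0.900 - 2.000 = -1.100$
    Factor C: $\frac{1}{4}(y_{112} + y_{212} + y_{122} + y_{222}) - \frac{1}{4}(y_{111} + y_{211} + y_{121} + y_{221}) = 0.550 - 2.350 = -1.800$

Two-Factor Interactions:

    AB: $\frac{1}{4}(y_{111} + y_{112} + y_{221} + y_{222}) - \frac{1}{4}(y_{211} + y_{212} + y_{121} + y_{122}) = 1.325 - 1.575 = -0.250$
    AC: $\frac{1}{4}(y_{111} + y_{121} + y_{212} + y_{222}) - \frac{1}{4}(y_{211} + y_{221} + y_{112} + y_{122}) = 1.425 - 1.475 = -0.050$
    BC: $\frac{1}{4}(y_{111} + y_{211} + y_{122} + y_{222}) - \frac{1}{4}(y_{121} + y_{221} + y_{112} + y_{212}) = 1.850 - 1.050 = 0.800$

Three-Factor Interaction:

    ABC: $\frac{1}{4}\left((y_{222} - y_{122}) - (y_{212} - y_{112})\right) - \frac{1}{4}\left((y_{221} - y_{121}) - (y_{211} - y_{111})\right) = -0.050 - (-0.200) = 0.150$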

Example: Popcorn Experiment

    Analysis of Variance Table

    Response: bullets
                Df Sum Sq Mean Sq  F value  Pr(>F)
    brand        1  0.005   0.005   0.1111 0.79517
    time         1  2.420   2.420  53.7778 0.08628 .
    power        1  6.480   6.480 144.0000 0.05293 .
    brand:time   1  0.125   0.125   2.7778 0.34404
    brand:power  1  0.005   0.005   0.1111 0.79517
    time:power   1  1.280   1.280  28.4444 0.11800
    Residuals    1  0.045   0.045
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

- Since this is a $2^3$ factorial design without replicates, we use the three-way interaction as our estimate of error. Moreover, the three-way interaction effect appears minor given that the effect calculated on the previous slide is only 0.150, which is smaller in magnitude than most of the other calculated effects.
- In the above ANOVA table, none of the two-way interactions is significant at the 0.05 level.
- While we could systematically remove one interaction at a time based on the highest p-value, we will simply remove all two-way interactions from the analysis and focus on the main effects.

Example: Popcorn Experiment

    Analysis of Variance Table

    Response: bullets
              Df Sum Sq Mean Sq F value  Pr(>F)
    brand      1  0.005  0.0050  0.0137 0.91232
    time       1  2.420  2.4200  6.6529 0.06137 .
    power      1  6.480  6.4800 17.8144 0.01347 *
    Residuals  4  1.455  0.3637
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

- In the above ANOVA table, brand is not statistically significant given that it has such a large p-value. This, again, is not unexpected given that the estimated main effect of -0.050 is small.
- So, we remove the effect due to brand.

Example: Popcorn Experiment

    Analysis of Variance Table

    Response: bullets
              Df Sum Sq Mean Sq F value   Pr(>F)
    time       1   2.42   2.420  8.2877 0.034635 *
    power      1   6.48   6.480 22.1918 0.005286 **
    Residuals  5   1.46   0.292
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

- In the above ANOVA table, we are left with both time and power as significant effects on the amount of kernels left after popping.
- In all of these ANOVA tables, each F statistic is the ratio of the effect's mean square to the MSE.

Contrasts

- In general, the contrast among treatment means (main effect or interaction) is

    $l_{AB} = \sum_t c_t \bar{y}_t$,

  where $t$ is the index over the different treatments and $c_t = \pm 1$ are the coefficients for the contrast, which are determined by the +s and -s in the column of the design matrix for the effect AB.
- The estimate of an effect and its standard error estimate for any contrast among treatment means in a complete $2^k$ factorial design are, respectively,

    $AB = \dfrac{l_{AB}}{2^{k-1}}$ and $s_{AB} = \sqrt{\dfrac{4\sigma^2}{r 2^k}}$,

  where $r$ is the number of replicates, which has been and will be 1 for most of our discussion.
- The 1 df SS for the effect is given by

    $SS(AB) = \dfrac{r (l_{AB})^2}{2^k}$
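As a rough check of these formulas, the sketch below applies them to the popcorn data with $k = 3$ and $r = 1$. The responses are taken from the design matrix in standard order; the helper names `sign` and `contrast` are hypothetical and chosen only for this illustration.

```python
# Popcorn responses in standard order: (1), a, b, ab, c, ac, bc, abc
y = [3.1, 3.5, 1.6, 1.2, 0.7, 0.7, 0.5, 0.3]
k, r = 3, 1  # three factors, one replicate

def sign(run, effect, factors="ABC"):
    """Coefficient c_t (+1 or -1) of `effect` for the run with index `run`."""
    s = 1
    for j, f in enumerate(factors):
        if f in effect:
            s *= 1 if (run >> j) & 1 else -1
    return s

def contrast(effect):
    """l = sum_t c_t * ybar_t (with r = 1 the treatment mean is the observation)."""
    return sum(sign(run, effect) * y[run] for run in range(2 ** k))

results = {}
for effect in ["A", "B", "C", "AB", "AC", "BC", "ABC"]:
    l = contrast(effect)
    results[effect] = {
        "contrast": l,
        "effect": l / 2 ** (k - 1),   # effect estimate l / 2^(k-1)
        "ss": r * l ** 2 / 2 ** k,    # 1 df sum of squares r l^2 / 2^k
    }

for effect, vals in results.items():
    print(f"{effect:>3}: effect = {vals['effect']:7.3f}, SS = {vals['ss']:6.3f}")
```

The sums of squares produced this way can be compared line by line with the Sum Sq column of the first ANOVA table, with the ABC sum of squares playing the role of the residual.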

Incomplete Block Designs for $2^k$ Factorials

A complete replication for a $2^k$ factorial with many factors may not be possible in a single, complete block; e.g.,

- If there is insufficient raw material in a manufactured batch to accommodate all treatments, then each batch of raw materials can be used as an incomplete block.
- If experimental error is too large with an RCBD in an agricultural field, then the variation among field plots can be controlled by using smaller blocks, which give more homogeneous groups of experimental plots.
- If a mother cat has a litter of kittens that is too small to receive all of the treatments in a particular experiment, then each litter could be used as an incomplete block.

Confounding

- Confounding occurs when some treatment effects (either main effects or interactions) are estimated by the same linear combination of the experimental observations as a blocking effect; i.e., the treatment effect is indistinguishable from the effect of the blocks with which it is confounded.
- Confounding naturally arises in full factorial designs that are run in blocks, where the block size is smaller than the number of different treatment combinations.
- Ordinarily the highest-order interaction effect in a $2^k$ factorial is chosen to be confounded with blocks.
- Usually, main effects and two-factor interactions are the effects of most interest, so confounding higher-order interactions means the other (important) effects are estimated without penalty.
- In other words, we usually avoid confounding main effects and two-factor interactions with blocks.

The Yates Notation

- Earlier we presented the design matrix for a $2^3$ factorial design.
- To make referring to specific treatments easier (instead of referring to their standard order number), we use a particular labeling convention.
- The first column in the design matrix below is how we refer to the treatments; i.e., each label combines the letters of all factors that are at their high level (+) in that treatment, written in lower case to avoid confusion with the effect names.
- The first treatment is (1), which simply means that none of the factors are at their high level.
- This labeling convention is known as Yates's notation.

    Treatment   I   A   B   C   AB   AC   BC   ABC
    (1)         +   -   -   -   +    +    +    -
    a           +   +   -   -   -    -    +    +
    b           +   -   +   -   -    +    -    +
    ab          +   +   +   -   +    -    -    -
    c           +   -   -   +   +    -    -    +
    ac          +   +   -   +   -    +    -    -
    bc          +   -   +   +   -    -    +    -
    abc         +   +   +   +   +    +    +    +

Rules for Standard Order

There is a simple algorithm for writing a $2^k$ factorial design in standard order:

1. The first column is I and all entries are set to +.
2. The second column is for factor A. The first entry is -, the second entry is +, and we alternate between the signs until all $2^k$ entries are filled.
3. The third column is for factor B. The first two entries are -, the second two entries are +, and we alternate between the signs with two in a row until all $2^k$ entries are filled.
4. The fourth column is for factor C. The first four entries are -, the second four entries are +, and we alternate between the signs with four in a row until all $2^k$ entries are filled.
5. In general, the column for the $j$th factor starts with $2^{j-1}$ - entries in a row, then has $2^{j-1}$ + entries in a row, and this pattern is repeated until all $2^k$ entries are filled.

Evens-Odds Rule

- In $2^k$ factorial designs, for any effect half of the treatments have a - and half of the treatments have a +.
- The treatments can be divided into two groups, each of $2^k/2 = 2^{k-1}$ EUs, using the evens-odds rule, which says that any treatment combination (labelled as in the first column of the design matrix in Yates's notation) that has an even number of letters receives one of the coefficients + or -, while treatments with an odd number of letters receive the other coefficient.
- The convention is that the treatment labelled (1) has zero letters, which is an even number.
- In this set-up, we confound the $k$-factor interaction with blocks.
- We will illustrate this confounding to achieve incomplete blocks from both $2^3$ and $2^4$ factorials.

$2^3$ Factorial

- The contrast for the three-factor interaction ABC is given by

    $l_{ABC} = (a + b + c + abc) + (-(1) - ab - ac - bc)$,

  where the first four treatments have a coefficient of $c_i = +1$ and the second four treatments have a coefficient of $c_i = -1$.
- Thus, the first four treatments with the + sign will be in Block 1 and the second four treatments with the - sign will be in Block 2.
- The four EUs in Block 1 are each randomly assigned one of the treatments a, b, c, or abc.
- The four EUs in Block 2 are each randomly assigned one of the treatments (1), ab, ac, or bc.

$2^3$ Factorial

By confounding the three-factor interaction with the blocking factor, we get the following incidence matrix:

                 Block
    Treatment    1    2
    (1)          0    1
    a            1    0
    b            1    0
    ab           0    1
    c            1    0
    ac           0    1
    bc           0    1
    abc          1    0

$2^4$ Factorial

- The contrast for the four-factor interaction ABCD is given by

    $l_{ABCD} = ((1) + ab + ac + ad + bc + bd + cd + abcd) + (-a - b - c - d - abc - abd - acd - bcd)$.

- The first eight treatments have a coefficient of $c_i = +1$ and the second eight treatments have a coefficient of $c_i = -1$.
- Thus, the first eight treatments with the + sign will be in Block 1 and the second eight treatments with the - sign will be in Block 2.
- The eight EUs in Block 1 are each randomly assigned one of the treatments (1), ab, ac, ad, bc, bd, cd, or abcd.
- The eight EUs in Block 2 are each randomly assigned one of the treatments a, b, c, d, abc, abd, acd, or bcd.

$2^4$ Factorial

By confounding the four-factor interaction with the blocking factor, we get the following incidence matrix:

                 Block
    Treatment    1    2
    (1)          1    0
    a            0    1
    b            0    1
    ab           1    0
    c            0    1
    ac           1    0
    bc           1    0
    abc          0    1
    d            0    1
    ad           1    0
    bd           1    0
    cd           1    0
    abd          0    1
    acd          0    1
    bcd          0    1
    abcd         1    0
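The evens-odds split used for the $2^3$ and $2^4$ factorials can be sketched as follows. This is a minimal illustration with hypothetical helper names (`yates_labels`, `evens_odds_blocks`); it relies only on the fact that the coefficient of the $k$-factor interaction is the product of $k$ signs, where each letter missing from the treatment label contributes a factor of -1.

```python
def yates_labels(k):
    """Treatment labels in standard order: (1), a, b, ab, c, ... for k factors."""
    letters = "abcdefghijklmnopqrstuvwxyz"[:k]
    labels = []
    for run in range(2 ** k):
        name = "".join(letters[j] for j in range(k) if (run >> j) & 1)
        labels.append(name or "(1)")
    return labels

def evens_odds_blocks(k):
    """Split treatments by the sign of the k-factor interaction.

    Returns (plus, minus): treatments whose coefficient is +1 and -1.
    """
    plus, minus = [], []
    for label in yates_labels(k):
        n_letters = 0 if label == "(1)" else len(label)
        # Each factor at its low level contributes -1 to the product.
        coef = (-1) ** (k - n_letters)
        (plus if coef == 1 else minus).append(label)
    return plus, minus

for k in (3, 4):
    plus, minus = evens_odds_blocks(k)
    print(f"k = {k}: Block 1 (+) = {plus}, Block 2 (-) = {minus}")
```

For odd $k$ the treatments with an odd number of letters carry the + sign, while for even $k$ it is the treatments with an even number of letters, which is why (1) lands in Block 2 for the $2^3$ factorial but in Block 1 for the $2^4$ factorial.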

ANOVA Overview for Confounded Design

- The SS are computed as usual for completely confounded designs, except for the exclusion of a SS partition for the interaction effect confounded with the block.
- SSBlk will include the confounded factorial effect.
- Below is the ANOVA table (with df only) for a $2^3$ factorial design with $b$ incomplete blocks and $r$ replicate groups. Note that ABC is confounded with the blocks, so the SSBlk includes the ABC effect.

    Source                       df
    Replicates                   $r - 1$
    Blocks within Replicates     $r(b - 1)$
    Treatments                   6
      A                          1
      B                          1
      C                          1
      AB                         1
      AC                         1
      BC                         1
    Error                        $r(2^3 - b) - 6$
    Total                        $r(2^3) - 1$

Method for Creating Incomplete Blocks

- For constructing incomplete block designs with chosen factorial effects confounded with blocks, we utilize residues modulo $m$, which means that for an integer $k$, the residue mod $m$ is the remainder when $k$ is divided by $m$.
- The residue $r$ for the integer $k$ mod $m$ is written as $k = r \pmod{m}$. For example, $7 = 1 \pmod{2}$ since 7 divided by 2 has a remainder of 1.
- With $2^k$ factorial designs, we work with residues mod 2, so the values we will be working with are 0 and 1.
- To determine the allocation of treatment combinations to incomplete blocks, we use the defining contrast

    $L = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_k x_k$,    (1)

  which is a linear function identifying the effect that is confounded with blocks, where $\alpha_i = I\{\text{the } i\text{th factor is present in the defining contrast}\}$ is an indicator.
- $L$ is evaluated for each treatment combination. The value of $x_i$ is the level of the $i$th factor (either 0 or 1 for low or high, respectively) in any treatment combination under consideration for allocation to an incomplete block.

Example: Creating Incomplete Blocks

- Suppose the defining contrast for a $2^4$ factorial in two blocks of eight EUs is the three-factor interaction ABD.
- A, B, and D are present in the defining contrast, so $\alpha_1 = \alpha_2 = \alpha_4 = 1$, while the absence of C means $\alpha_3 = 0$.
- The defining contrast for the allocation of treatments is then $L = x_1 + x_2 + x_4$.
- The treatment combinations with $L = 0 \pmod{2}$ are assigned to Block 1 and those with $L = 1 \pmod{2}$ are assigned to Block 2.
- The values and assignment are as follows:

    Treatment   $x_1$   $x_2$   $x_4$   $L = x_1 + x_2 + x_4$   $r \pmod{2}$   Block Assignment
    (1)           0       0       0              0                   0          Block 1
    a             1       0       0              1                   1          Block 2
    b             0       1       0              1                   1          Block 2
    ab            1       1       0              2                   0          Block 1
    c             0       0       0              0                   0          Block 1
    ac            1       0       0              1                   1          Block 2
    bc            0       1       0              1                   1          Block 2
    abc           1       1       0              2                   0          Block 1
    d             0       0       1              1                   1          Block 2
    ad            1       0       1              2                   0          Block 1
    bd            0       1       1              2                   0          Block 1
    cd            0       0       1              1                   1          Block 2
    abd           1       1       1              3                   1          Block 2
    acd           1       0       1              2                   0          Block 1
    bcd           0       1       1              2                   0          Block 1
    abcd          1       1       1              3                   1          Block 2

- Note that if $B_1$ and $B_2$ represent the block totals, then every treatment in one block has the same sign in the ABD column of the design matrix. Because ABD involves an odd number of factors, the treatments in $B_2$ (e.g., abd) have a + sign while the treatments in $B_1$ (e.g., (1)) have a - sign. Thus, the contrast among the block totals is equivalent to the ABD contrast: $l_{ABD} = B_2 - B_1$.
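The allocation by a single defining contrast can be sketched as below. The function name `allocate` is a hypothetical helper for this illustration; it evaluates $L = \sum_i \alpha_i x_i$ for each treatment combination and uses the residue mod 2 to pick the block.

```python
def allocate(defining, factors="ABCD"):
    """Split the 2^k treatments into two blocks using L mod 2.

    `defining` is the confounded effect written as a string, e.g. "ABD".
    Returns {0: [...], 1: [...]}, keyed by the residue of L mod 2.
    """
    k = len(factors)
    alpha = [1 if f in defining else 0 for f in factors]
    blocks = {0: [], 1: []}
    for run in range(2 ** k):
        x = [(run >> j) & 1 for j in range(k)]          # 0 = low, 1 = high
        L = sum(a * xi for a, xi in zip(alpha, x))       # defining contrast
        label = "".join(f.lower() for f, xi in zip(factors, x) if xi) or "(1)"
        blocks[L % 2].append(label)
    return blocks

blocks = allocate("ABD")
print("Block 1 (L = 0 mod 2):", blocks[0])
print("Block 2 (L = 1 mod 2):", blocks[1])
```

Here residue 0 is mapped to Block 1 and residue 1 to Block 2, following the convention of the example; the treatments are generated in standard order, so the ordering within each block may differ from the table.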

Use of Two Defining Contrasts

- We have, thus far, focused on $2^k$ factorials with two incomplete blocks of $2^{k-1}$ EUs each, with one defining contrast confounded with blocks.
- The use of two blocks with one factorial effect confounded may not reduce block size sufficiently.
- For example, a $2^6$ factorial with 64 treatments may require block sizes no larger than 16 EUs per block, which means four blocks per replicate.
- We can accomplish further reduction in block size by confounding an additional defining contrast with blocks.

Example: Creating Incomplete Blocks

- Suppose a $2^4$ factorial is to have its 16 treatment combinations placed in four blocks of $2^{4-2} = 4$ EUs each.
- Suppose that AC and BD are chosen to be confounded with blocks.
- The defining contrasts are then $L_1 = x_1 + x_3$ and $L_2 = x_2 + x_4$.
- Each treatment will provide a pair of residues modulo 2: $(L_1, L_2)$.
- The pairs (0,0), (0,1), (1,0), and (1,1) will be assigned to Block 1, Block 2, Block 3, and Block 4, respectively.

    Treatment   $x_1$   $x_2$   $x_3$   $x_4$   $L_1$   $L_2$   Residue   Block Assignment
    (1)           0       0       0       0       0       0     (0,0)     Block 1
    a             1       0       0       0       1       0     (1,0)     Block 3
    b             0       1       0       0       0       1     (0,1)     Block 2
    ab            1       1       0       0       1       1     (1,1)     Block 4
    c             0       0       1       0       1       0     (1,0)     Block 3
    ac            1       0       1       0       2       0     (0,0)     Block 1
    bc            0       1       1       0       1       1     (1,1)     Block 4
    abc           1       1       1       0       2       1     (0,1)     Block 2
    d             0       0       0       1       0       1     (0,1)     Block 2
    ad            1       0       0       1       1       1     (1,1)     Block 4
    bd            0       1       0       1       0       2     (0,0)     Block 1
    cd            0       0       1       1       1       1     (1,1)     Block 4
    abd           1       1       0       1       1       2     (1,0)     Block 3
    acd           1       0       1       1       2       1     (0,1)     Block 2
    bcd           0       1       1       1       1       2     (1,0)     Block 3
    abcd          1       1       1       1       2       2     (0,0)     Block 1
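With more than one defining contrast, the same idea extends to tuples of residues. The sketch below, using a hypothetical helper `block_by_residues`, groups the 16 treatments of the $2^4$ factorial by the pair $(L_1, L_2)$ for the defining contrasts AC and BD.

```python
def block_by_residues(defining_contrasts, factors="ABCD"):
    """Group treatments by the tuple of residues (L_1 mod 2, L_2 mod 2, ...)."""
    blocks = {}
    for run in range(2 ** len(factors)):
        x = {f: (run >> j) & 1 for j, f in enumerate(factors)}  # 0/1 levels
        residues = tuple(sum(x[f] for f in word) % 2
                         for word in defining_contrasts)
        label = "".join(f.lower() for f in factors if x[f]) or "(1)"
        blocks.setdefault(residues, []).append(label)
    return blocks

blocks = block_by_residues(["AC", "BD"])
for number, pair in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)], start=1):
    print(f"Block {number} {pair}: {blocks[pair]}")
```

Two defining contrasts give $2^2 = 4$ distinct residue pairs, hence four blocks of four treatments each.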

Example: Generalized Interactions

- Let us continue with our previous example for the purpose of defining a generalized interaction.
- ABCD is the generalized interaction confounded as a consequence of purposely confounding AC and BD with blocks.
- We can use more generic algebraic rules for determining generalized interactions. We do this by forming the product of the symbols for the defining contrasts with the exponent of any symbol reduced mod 2.
- The product of AC and BD is $AC \times BD = ABCD$.
- The generalized interaction of AC and BD, as determined by the symbol product, is ABCD since all exponents of the symbol product are $1 \pmod{2}$.
- If the contrasts ACD and BCD had been chosen for the original defining contrasts, then the product is $ACD \times BCD = ABCCDD = ABC^2D^2 = AB$, where $C^2$ and $D^2$ are dropped because their exponents are $0 \pmod{2}$.
- Therefore, the generalized interaction is AB when ACD and BCD are the defining contrasts.
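Reducing exponents mod 2 amounts to keeping the letters that appear an odd number of times, which is a symmetric difference of letter sets. A minimal sketch, with the hypothetical helper name `generalized_interaction`:

```python
def generalized_interaction(*effects):
    """Symbolic product of effects with exponents reduced mod 2.

    A letter survives only if it appears in an odd number of the effects,
    which is exactly the symmetric difference of the letter sets.
    """
    letters = set()
    for effect in effects:
        letters ^= set(effect)
    return "".join(sorted(letters)) or "I"

print(generalized_interaction("AC", "BD"))    # product of the first pair
print(generalized_interaction("ACD", "BCD"))  # product of the second pair
```

An empty product is reported as I, matching the convention that any effect multiplied by itself gives the identity.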

Incomplete Block Designs for $3^k$ Factorial Designs

- $3^k$ factorial designs make it possible to estimate linear and quadratic trends for quantitative factors and to provide more detailed descriptions of qualitative factors.
- One constraint is that the number of EUs in a $3^k$ factorial design increases by a power of 3 as more factors are added.
- Because of the large number of necessary EUs, incomplete block designs for $3^k$ factorials become very useful.
- The levels of a factor are (usually) represented by $x_i = 0, 1, 2$; e.g., a $3^2$ factorial design with factors A and B results in the following $t = 9$ treatment conditions:

                   A
              0    1    2
         0    00   10   20
    B    1    01   11   21
         2    02   12   22

Confounding for $3^k$ Factorial Designs

- The simplest incomplete block designs for $3^k$ factorials use three blocks of equal size.
- With three blocks, we have 2 df between blocks, so a treatment effect with 2 df must be confounded with blocks.
- We have 2 df for main effects, $2^2 = 4$ df for two-factor interactions, and so on.
- As before, we would not want to confound main effects with blocks.
- Defining contrasts in $3^k$ factorials and confounding with three or more factors are both done analogously to what we presented for $2^k$ factorials.
- Note that now we must work with residues mod 3.
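To give a feel for working mod 3, the sketch below places the nine treatments of a $3^2$ factorial into three blocks. The slides do not specify a defining contrast for this case, so the choice $L = x_1 + x_2 \pmod{3}$ (one 2 df component of the AB interaction) is an assumption made purely for illustration, as is the helper name `blocks_mod3`.

```python
from itertools import product

def blocks_mod3(alpha, k):
    """Allocate the 3^k treatments to three blocks by L = sum(alpha_i * x_i) mod 3.

    `alpha` holds the coefficients of the assumed defining contrast.
    """
    blocks = {0: [], 1: [], 2: []}
    for x in product(range(3), repeat=k):       # levels 0, 1, 2 for each factor
        L = sum(a * xi for a, xi in zip(alpha, x)) % 3
        blocks[L].append("".join(map(str, x)))  # label such as "12"
    return blocks

# Assumed defining contrast L = x1 + x2 (mod 3) for a 3^2 factorial
blocks = blocks_mod3((1, 1), 2)
for residue, treatments in blocks.items():
    print(f"L = {residue} (mod 3): {treatments}")
```

Each block contains every level of A and every level of B exactly once, so neither main effect is confounded with blocks under this assumed contrast.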


Utility of Fractional Treatment Designs

- Even when each factor is studied at two levels (e.g., $2^k$ factorial designs), the number of EUs necessary grows geometrically with the number of factors; e.g., $2^2 = 4$, $2^4 = 16$, $2^6 = 64$, $2^8 = 256$, $2^{10} = 1024$, etc.
- Fractional factorial designs use only one-half, one-quarter, or smaller fractions of the $2^k$ treatment combinations.
- Some of the reasons for using fractional factorial designs include:
  - the number of necessary treatments exceeds resources;
  - information is needed only from main effects and low-order interactions;
  - screening studies; and
  - working under the assumption that only a few effects are truly important (this is driven by what is called the factor sparsity hypothesis or sparsity of effects principle).

Half-Fraction Design

- The half-fraction design is referred to as a $2^{k-1}$ fractional factorial design because $\frac{1}{2} 2^k = 2^{k-1}$.
- The notation indicates that the design will include $k$ factors each at two levels, but we only use $2^{k-1}$ EUs.
- For incomplete blocks, we placed one replicate of a $2^k$ factorial design using a defining contrast that let us separate the treatment combinations into two sets.
- Each of the two sets was placed into one of the incomplete blocks according to the + and - coefficients of the treatments.
- Each block was half of a complete replication of the treatments and the defining contrast was confounded with blocks (although this still made it possible to estimate the remaining effects).
- We use the same principle for constructing fractional factorial designs, which is best illustrated by first presenting an example.

Example: Popcorn Experiment

- Recall the designed experiment involving (approximately) 3.5 ounce bags of popcorn. A $2^3$ factorial design was used.
- Factor A is the brand of popcorn: a cheap brand (-) and a costly brand (+).
- Factor B is the amount of time in the microwave: 4 minutes (-) and 6 minutes (+).
- Factor C is the percent power of the microwave: 75% (-) and 100% (+).
- The weight of the remaining unpopped kernels (in ounces) after popping is the response.
- The data are reported below, but this time we have divided the $t = 8$ treatments into two groups of four treatments using the defining contrast based on ABC.
- This particular division of treatments could have been used to construct an incomplete block design with the ABC interaction confounded with blocks.
- If researchers wanted to construct a half-replicate fractional factorial design, then they could use the four treatments in the top half of this table.

    Treatment   I   A   B   C   AB   AC   BC   ABC   Y
    a           +   +   -   -   -    -    +    +     3.5
    b           +   -   +   -   -    +    -    +     1.6
    c           +   -   -   +   +    -    -    +     0.7
    abc         +   +   +   +   +    +    +    +     0.3

    (1)         +   -   -   -   +    +    +    -     3.1
    ab          +   +   +   -   +    -    -    -     1.2
    ac          +   +   -   +   -    +    -    -     0.7
    bc          +   -   +   +   -    -    +    -     0.5

What is Sacrificed Through Fractional Designs?

- While we gain by reducing the size of the experiment, it does come at a price. Namely, we lose some information on the treatment effects.
- If we use a half-replicate of the $2^3$ factorial, then we lose the ability to estimate the three-factor interaction and each main effect is confounded (or aliased) with a two-factor interaction.
- Aliasing is mostly another term for confounding.
  - We usually use confounding in the context of an incomplete block design; e.g., when we confound a treatment effect with a block.
  - We usually use aliasing in the context of fractional designs; e.g., the main effect D is aliased with the three-factor interaction ABC in a $2^4$ design.

Example: Popcorn Experiment

- Recall that the contrast for an effect is found by $l = \sum_t c_t \bar{y}_t$, where $c_t = \pm 1$ corresponds to the sign in that effect's column of the design matrix for treatment $t$.
- If using only the top half of the table for the popcorn experiment, we see the contrasts for the main effects are:

    $l_A = a - b - c + abc = 3.5 - 1.6 - 0.7 + 0.3 = 1.5$
    $l_B = -a + b - c + abc = -3.5 + 1.6 - 0.7 + 0.3 = -2.3$
    $l_C = -a - b + c + abc = -3.5 - 1.6 + 0.7 + 0.3 = -4.1$

- Contrasts for the two-factor interactions are:

    $l_{BC} = a - b - c + abc = 3.5 - 1.6 - 0.7 + 0.3 = 1.5$
    $l_{AC} = -a + b - c + abc = -3.5 + 1.6 - 0.7 + 0.3 = -2.3$
    $l_{AB} = -a - b + c + abc = -3.5 - 1.6 + 0.7 + 0.3 = -4.1$

- Notice that $l_A = l_{BC}$, $l_B = l_{AC}$, and $l_C = l_{AB}$. Therefore, the above contrasts estimate the combined effects A+BC, B+AC, and C+AB, respectively. This means that we cannot differentiate between
  - the main effect of brand (A) and the interaction between time (B) and power (C);
  - the main effect of time (B) and the interaction between brand (A) and power (C); and
  - the main effect of power (C) and the interaction between brand (A) and time (B).
- Since the same contrast estimates two different effects, those effects are aliased. Specifically, A is aliased with BC (written as A=BC), B is aliased with AC (written as B=AC), and C is aliased with AB (written as C=AB).
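One way to see this aliasing without any response data is to compare the sign columns within the half fraction. The sketch below keeps the runs where the ABC coefficient is +1 and groups effects whose columns coincide; the helper `column` and the variable names are hypothetical and used only for this illustration.

```python
from itertools import combinations, product

factors = "ABC"

# All 2^3 runs coded -1/+1, then keep the half where ABC = +1.
runs = [dict(zip(factors, levels)) for levels in product([-1, 1], repeat=3)]
half = [run for run in runs if run["A"] * run["B"] * run["C"] == 1]

def column(effect, design):
    """Sign column of `effect` (product of its factor columns) over the design."""
    col = []
    for run in design:
        s = 1
        for f in effect:
            s *= run[f]
        col.append(s)
    return tuple(col)

# Main effects and two-factor interactions
effects = ["".join(c) for n in (1, 2) for c in combinations(factors, n)]

# Effects sharing an identical column cannot be separated in this fraction.
groups = {}
for e in effects:
    groups.setdefault(column(e, half), []).append(e)
alias_pairs = sorted(groups.values())
print(alias_pairs)
```

Because the columns are identical, any contrast computed from one effect in a pair is numerically the same as the contrast computed from the other, whatever the observed responses are.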

Design Generator and Defining Relation

- The higher-order interaction used as the defining contrast (i.e., such that $c_t \equiv 1$, the same coefficient for every treatment in the fraction) is known as the design generator.
- In the popcorn experiment, ABC is the design generator because all of its coefficients in the half fraction are the same (+), which makes its column the same as the identity column I.
- The defining relation for a design is the correspondence between an effect and the identity column I.
- In the popcorn experiment, I=ABC is thus the defining relation.
- The aliasing scheme for a design can be determined from the defining relation by multiplying the column on each side of the defining relation by successive columns of the design matrix, such that the multiplication is carried out term-by-term.

Multiplication Rules for Design Generation

It is helpful to establish some multiplication rules that allow us to work out the aliasing relationships:

1. When multiplying the column I by I, all entries remain +; i.e., $I \cdot I = I^2 = I$
2. When multiplying any column (say, AB) by I, the column entries are unchanged: $I \cdot (AB) = AB$
3. When multiplying any column by itself, the resulting column is I: $(AB) \cdot (AB) = I$

You can think of the above as being akin to identity properties.

Example: Popcorn Experiment

- We will use the defining relation I=ABC and the multiplication rules to show the aliasing relationships that we have already obtained.
- First we multiply the defining contrast by each main effect:

    $A \cdot ABC = A^2BC = BC$
    $B \cdot ABC = AB^2C = AC$
    $C \cdot ABC = ABC^2 = AB$

- Next we multiply the defining contrast by each two-factor interaction, which gives us the same relationships established above:

    $BC \cdot ABC = AB^2C^2 = A$
    $AC \cdot ABC = A^2BC^2 = B$
    $AB \cdot ABC = A^2B^2C = C$

- Note that if we had used the bottom half of the original table, then the defining relation would have been I=-ABC because all of the entries in the ABC column are - while all of the entries in the I column are +. The same logic applied above can be used to define the aliasing relationships when I=-ABC is the defining relation.
- Note that the quantities above are also called generalized interactions because we are taking an effect, symbolically multiplying it by a higher-order interaction, and then using our multiplication rules to reveal the resulting aliasing structure.
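The symbolic multiplication, including the sign that appears when the other half of the design is used, can be sketched as follows. Representing an effect as a `(sign, letters)` pair and the helper names `multiply` and `aliases` are assumptions made for this illustration.

```python
def multiply(word1, word2):
    """Multiply two signed effects, each given as (sign, letters).

    Squared letters cancel (rule 3), and "I" acts as the identity (rules 1-2).
    """
    s1, e1 = word1
    s2, e2 = word2
    letters = set(e1.replace("I", "")) ^ set(e2.replace("I", ""))
    return (s1 * s2, "".join(sorted(letters)) or "I")

def aliases(defining_word, effects):
    """Alias of each effect under the defining relation I = defining_word."""
    return {e: multiply((1, e), defining_word) for e in effects}

effects = ["A", "B", "C", "AB", "AC", "BC"]
top_half = aliases((+1, "ABC"), effects)     # defining relation I = ABC
bottom_half = aliases((-1, "ABC"), effects)  # defining relation I = -ABC

for e in effects:
    s, alias = bottom_half[e]
    print(f"I = ABC: {e} = {top_half[e][1]}    I = -ABC: {e} = {'-' if s < 0 else ''}{alias}")
```

Under I=-ABC the alias pairs are the same as under I=ABC, but each relationship picks up a negative sign; e.g., A=-BC.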

Constructing Half-Replicate $2^{k-1}$ Designs

- The half-fraction design is constructed with the highest-order interaction as the design generator; e.g., the $2^{3-1}$ half-fraction design has I=ABC, the $2^{4-1}$ half-fraction design has I=ABCD, etc.
- The treatment combinations are identified as follows:
  1. Write the design matrix (using +s and -s) in standard order for the first $k-1$ factors, as in a full $2^{k-1}$ design.
  2. Identify the $\pm$ coefficients for the $k$th factor by equating them to the coefficients for the highest-order interaction of those $k-1$ factors.
- For example, to construct a half-replicate $2^{3-1}$ design, we use the first four rows in standard order from the full $2^3$ design (only writing the columns for the first two factors) and then calculate C=AB (see the table below). Note that the treatments match those in the top half of the table for the popcorn experiment.

    A   B   C=AB   Treatment
    -   -    +     c
    +   -    -     a
    -   +    -     b
    +   +    +     abc
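The two construction steps can be sketched as below. The function name `half_fraction` is a hypothetical helper; factor letters are taken from the alphabet, so the sketch is only meant for small $k$ (it makes no attempt to skip the letter I, which would clash with the identity column for large designs).

```python
import string

def half_fraction(k):
    """Build the 2^(k-1) half fraction whose last factor equals the
    product of the first k-1 factors (e.g. C = AB, so that I = ABC)."""
    factors = string.ascii_uppercase[:k]
    base, last = factors[:-1], factors[-1]
    design = []
    for run in range(2 ** (k - 1)):
        # Step 1: standard order for the first k-1 factors.
        row = {f: (1 if (run >> j) & 1 else -1) for j, f in enumerate(base)}
        # Step 2: the k-th factor takes the sign of their interaction.
        prod = 1
        for f in base:
            prod *= row[f]
        row[last] = prod
        label = "".join(f.lower() for f in factors if row[f] == 1) or "(1)"
        design.append((label, row))
    return design

for label, row in half_fraction(3):
    print(" ".join("+" if v == 1 else "-" for v in row.values()), label)
```

Calling `half_fraction(4)` in the same way gives the eight runs of the half fraction with D=ABC, i.e., I=ABCD.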

Highest-Order Interaction

- The highest-order interaction of least interest is typically used to generate the half-replicate because the defining contrast chosen for the design generator cannot be estimated.
- For example, suppose that a $2^4$ factorial is generated with the defining relation I=ABCD.
- A half-replicate for a $2^4$ factorial requires 8 EUs and if using the defining relation I=ABCD, we get the following aliasing structure:

    A = BCD
    B = ACD
    C = ABD
    D = ABC
    AB = CD
    AC = BD
    AD = BC

Alternative Design

Suppose we use the defining relation I=ABD to generate a different $2^{4-1}$ fractional factorial design. Multiplying each effect by ABD gives the resulting aliasing structure:

    A = BD
    B = AD
    C = ABCD
    D = AB
    AC = BCD
    BC = ACD
    CD = ABC

Quarter-Fraction and Smaller-Fraction Designs

- When the number of factors is large, the number of treatments in a half-fraction design may still be prohibitive.
- We can proceed to halve our half-fraction design, which results in a quarter-fraction design.
- We can proceed in this manner of halving to obtain smaller and smaller designs; e.g., eighth-fraction design, sixteenth-fraction design, etc.
- Fractional designs are usually written as $2^{k-f}$ fractional designs, where $f$ is a positive integer strictly less than $k$.
- In general, for a $2^{k-f}$ fractional factorial design, there are $2^f$ terms in the defining relation, which consist of
  1. The constant term, I.
  2. The $f$ interaction terms used to define the $f$ successive fractionations.
  3. The $2^f - f - 1$ generalized interactions, constructed from the cross-products involving pairs, triples, and so on, of the $f$ interaction terms used to define the $f$ successive fractionations.
- Since there are $2^f$ terms in the defining relation for a $2^{k-f}$ fractional factorial design, we see that each factor effect is confounded with $2^f - 1$ other factor effects.

Example: $2^{4-2}$ Design

- To obtain the aliasing structure for this quarter-fraction design, we first note that the half-fraction design is based on the defining relation I=ABCD.
- We then fractionate this half-fraction design by using the defining relation I=AB. This means that I=ABCD=AB.
- Moreover, the generalized interaction of AB and ABCD is CD. Therefore, I=ABCD=AB=CD is the defining relation for the $2^{4-2}$ quarter-fraction design.
- To continue defining the aliasing structure, note that if we multiply each of the four quantities in the defining relation by A, we get A=BCD=B=ACD.
- With a little more work, it can be shown that the full aliasing scheme is given by

    I = ABCD = AB = CD (defining relation)
    A = BCD = B = ACD
    C = ABD = ABC = D
    AC = BD = BC = AD

- Since main effects are confounded with each other (A with B and C with D), this design is clearly undesirable. But let us proceed to construct the design matrix for this quarter-replicate $2^{4-2}$ design.
- First, let column A be written in standard order (-,+,-,+). Since A=B, B will also be (-,+,-,+).
- Since A is confounded with B, we would then treat C as our second column in the standard order routine. Therefore, C is (-,-,+,+), which is what B would have been had it not been confounded with A. Since C=D, D will also be (-,-,+,+).
- The design matrix with only the main effects written down (it is easy to specify the higher-order terms) is:

    A   B=A   C   D=C   Treatment
    -    -    -    -    (1)
    +    +    -    -    ab
    -    -    +    +    cd
    +    +    +    +    abcd
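The bookkeeping for a $2^{k-f}$ design can be sketched as follows: take all products of the $f$ generating words to obtain the $2^f - 1$ non-identity terms of the defining relation, then multiply an effect by each of them to obtain its aliases. The helper names `word_product`, `defining_relation`, and `alias_set` are hypothetical.

```python
from itertools import combinations

def word_product(*words):
    """Symbolic product with exponents reduced mod 2 (squared letters drop out)."""
    letters = set()
    for w in words:
        letters ^= set(w)
    return "".join(sorted(letters))

def defining_relation(generators):
    """All 2^f - 1 words of the defining relation (the constant I is omitted)."""
    words = set()
    for n in range(1, len(generators) + 1):
        for combo in combinations(generators, n):
            words.add(word_product(*combo))
    return sorted(words, key=lambda w: (len(w), w))

def alias_set(effect, generators):
    """Effects aliased with `effect`: its product with every defining word."""
    products = {word_product(effect, w) for w in defining_relation(generators)}
    return sorted(products, key=lambda w: (len(w), w))

generators = ["ABCD", "AB"]  # the two successive fractionations in the example
print("I =", " = ".join(defining_relation(generators)))
for effect in ["A", "C", "AC"]:
    print(effect, "=", " = ".join(alias_set(effect, generators)))
```

With $f = 2$ generators the defining relation has $2^2 - 1 = 3$ words besides I, so each effect has three aliases, in agreement with the counting rule for $2^{k-f}$ designs.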


Resolution

The (maximum) resolution of a fractional factorial design, denoted by $R$, is the number of factors involved in the lowest-order effect in the defining relation, excluding the constant I. The resolution is important in identifying the severity of the confounding scheme.

For example, in a $2^{4-1}$ fractional factorial design, we showed that the defining relation is I=ABCD. The resolution of this design is $R = 4$ because there are four factors involved. This tells us that the most severe cases of confounding will involve (i) a main effect and a three-factor interaction (e.g., A=BCD) and (ii) two two-factor interactions (e.g., AB=CD).

Roman numerals are commonly used to denote the resolution to avoid confusion with the number of factors. For example, in the $2^{4-1}$ fractional factorial design discussed above, we showed that $R = 4$. Therefore, we write this as a $2^{4-1}_{IV}$ fractional factorial design.

General Comments on Design Resolution

In general, the higher the resolution of the design, the less severe the degree of confounding. The resolution should never be less than III. Resolution II designs are not used since at least one pair of main effects will be confounded. Designs of resolution III, IV, and V are the most common.

- In designs of resolution III, no main effects are confounded with other main effects, some main effects are confounded with two-factor interactions, and two-factor interactions are confounded with other two-factor interactions.
- In designs of resolution IV, no main effects are confounded with other main effects or two-factor interactions, but some main effects are confounded with three-factor interactions and some two-factor interactions are confounded with other two-factor interactions.
- In designs of resolution V, no main effects or two-factor interactions are confounded with other main effects or two-factor interactions, but some main effects are confounded with four-factor interactions and some two-factor interactions are confounded with three-factor interactions.

Resolution for 3 to 8 Factors

    Number of                       Number    Defining Relation
    Factors    Fraction             of Runs   (Omitting Generalized Interactions)
    3          $2^{3-1}_{III}$      4         I=ABC
    4          $2^{4-1}_{IV}$       8         I=ABCD
    5          $2^{5-1}_{V}$        16        I=ABCDE
               $2^{5-2}_{III}$      8         I=ABD=ACE
    6          $2^{6-1}_{VI}$       32        I=ABCDEF
               $2^{6-2}_{IV}$       16        I=ABCE=BCDF
               $2^{6-3}_{III}$      8         I=ABD=ACE=BCF
    7          $2^{7-1}_{VII}$      64        I=ABCDEFG
               $2^{7-2}_{IV}$       32        I=ABCDF=ABDEG
               $2^{7-3}_{IV}$       16        I=ABCE=BCDF=ACDG
               $2^{7-4}_{III}$      8         I=ABD=ACE=BCF=ABCG
    8          $2^{8-2}_{V}$        64        I=ABCDG=ABEFH
               $2^{8-3}_{IV}$       32        I=ABCF=ABDG=BCDEH
               $2^{8-4}_{IV}$       16        I=BCDE=ACDF=ABCG=ABDH

*Note: Other defining relations can be used for some of these designs.
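The resolution listed for a design can be checked by expanding its defining relation (including the generalized interactions that the table omits) and taking the length of the shortest word. A minimal Python sketch follows; the helper names `multiply` and `resolution` are illustrative and not part of any package.

```python
from itertools import combinations

def multiply(*words):
    """Multiply effect words mod 2 (letters appearing an even number of times cancel)."""
    letters = set()
    for word in words:
        letters ^= set(word)
    return "".join(sorted(letters))

def resolution(generator_words):
    """Length of the shortest word in the full defining relation, excluding I."""
    words = [
        multiply(*subset)
        for size in range(1, len(generator_words) + 1)
        for subset in combinations(generator_words, size)
    ]
    return min(len(word) for word in words)

# A few of the tabulated designs, written as the words following "I=" in the table
print(resolution(["ABCDE"]))                       # 2^(5-1)
print(resolution(["ABCE", "BCDF"]))                # 2^(6-2)
print(resolution(["ABCE", "BCDF", "ACDG"]))        # 2^(7-3)
print(resolution(["ABD", "ACE", "BCF", "ABCG"]))   # 2^(7-4)
```

Expanding the full relation matters because a generalized interaction can be shorter than any of the words used to construct the fraction.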

Example: Peanut Solids

A scientist is studying the extraction of food solids from peanuts using water. The seven factors of interest include the pH level of the water, extraction time, and agitation speed. The scientist is able to run 16 different treatment combinations. Therefore, they conducted a single replicate of a $2^{7-3}$ design; i.e., an eighth-fraction design. Below is output for the design generation:

    $catlg.entry
    Design: 7-3.1
       16 runs, 7 factors,
       Resolution IV
       Generating columns: 7 11 13
       WLP (3plus): 0 7 0 0 0, 0 clear 2fis

A resolution IV design is achieved for this setting. The output also lists the generating columns. These are the positions of the effects in standard order (A, B, AB, C, AC, BC, ABC, D, ...), where the first entry is for factor A and not for the constant I. Therefore, 7, 11, and 13 correspond to ABC, ABD, and ACD, respectively. This means the design generator is given by E=ABC, F=ABD, and G=ACD. One can then cycle through the multiplication rules to obtain the aliasing scheme.

Example: Peanut Solids

Using the design generator given by E=ABC, F=ABD, and G=ACD, we can cycle through the multiplication rules to obtain the aliasing scheme. Below are the aliasing schemes for main effects as well as two-factor and three-factor interactions:

    $aliased
    $aliased$legend
    [1] "A=A" "B=B" "C=C" "D=D" "E=E" "F=F" "G=G"

    $aliased$main
    [1] "A=BCE=BDF=CDG=EFG" "B=ACE=ADF=CFG=DEG" "C=ABE=ADG=BFG=DEF" "D=ABF=ACG=BEG=CEF"
    [5] "E=ABC=AFG=BDG=CDF" "F=ABD=AEG=BCG=CDE" "G=ACD=AEF=BCF=BDE"

    $aliased$fi2
    [1] "AB=CE=DF" "AC=BE=DG" "AD=BF=CG" "AE=BC=FG" "AF=BD=EG" "AG=CD=EF" "BG=CF=DE"

    $aliased$fi3
    [1] "ABG=ACF=ADE=BCD=BEF=CEG=DFG"

If you take any of the aliased main effects and multiply both sides by that main effect, you recover part of the defining relation involving I. For example, if you take A=BCE=BDF=CDG=EFG and multiply through by A, you get the (partial) defining relation I=ABCE=ABDF=ACDG=AEFG. All of these words have four letters, which means the maximum resolution is IV, as we already showed on the previous slide.
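The multiplication rules for this example can be cycled through in a few lines of code. The sketch below is plain Python rather than the software that produced the output above; `aliases` and `defining_words` are illustrative names, and, like the output, the alias strings are truncated to effects of at most a chosen order.

```python
from itertools import combinations

# E=ABC, F=ABD, G=ACD rewritten as words of the defining relation
GENERATOR_WORDS = ["ABCE", "ABDF", "ACDG"]

def multiply(*words):
    """Multiply effect words mod 2 (letters appearing an even number of times cancel)."""
    letters = set()
    for word in words:
        letters ^= set(word)
    return "".join(sorted(letters))

# All 2^3 - 1 = 7 non-identity words of the defining relation
defining_words = sorted(
    multiply(*subset)
    for size in range(1, len(GENERATOR_WORDS) + 1)
    for subset in combinations(GENERATOR_WORDS, size)
)
print("I = " + " = ".join(defining_words))

def aliases(effect, max_order=3):
    """Effects confounded with `effect`, keeping only those of order <= max_order."""
    partners = sorted(multiply(effect, word) for word in defining_words)
    return [p for p in partners if len(p) <= max_order]

print("A=" + "=".join(aliases("A")))
print("AB=" + "=".join(aliases("AB", max_order=2)))
```

Every word in `defining_words` has four letters, which is the resolution IV property noted above.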

Analyzing Fractional Factorial Designs

We begin by estimating the effect of a factor in a fractional factorial design:

$$\widehat{AB} = \frac{2\, l_{AB}}{N} = \frac{2 \sum_t c_t \bar{y}_t}{N},$$

where $N$ is the total number of EUs (or runs) required for the experiment and $l_{AB} = \sum_t c_t \bar{y}_t$ is, again, the contrast for the effect AB.

If we let $C$ be our contrast matrix (which amounts to using our design matrix for the fractional factorial design, but where the - and + signs are replaced by -1 and 1, respectively), then the estimated slopes for the linear model are

$$\hat{\beta} = (C^{T} C)^{-1} C^{T} Y,$$

where $\hat{\beta}_{AB} = \sum_t c_t \bar{y}_t / N$; therefore, $\widehat{AB} = 2 \hat{\beta}_{AB}$.

The 1 df SS for an effect in a $2^{k-f}$ design is

$$SS(AB) = \frac{(l_{AB})^2}{2^{k-f}}.$$

Obviously, analyzing data collected using a fractional factorial design and fitting all of the effects does not leave us any df for estimating the error. However, we can plot the estimated factor effects and interactions from the $2^{k-f}$ design on a normal probability plot (QQ-plot) to help us decide which, if any, effects appear to be negligible. We could then drop those factors that appear negligible (while obeying the hierarchy principle) and combine those effects to represent experimental variation, thus allowing us to use ANOVA.

Example: Filtration Experiment

A chemical product is produced in a pressure vessel. A $2^{4-1}$ fractional factorial design is conducted in the pilot plant to study the factors thought to influence the filtration rate (Y) of this product. The four factors are temperature (A), pressure (B), concentration of formaldehyde (C), and stirring rate (D). Each of these has a low and a high level. In the $2^{4-1}$ design, the defining relation I=ABCD is used. The design matrix and data are given below:

    Treatment   I   A   B   C   D=ABC   Y
    (1)         +   -   -   -   -       45
    ad          +   +   -   -   +       100
    bd          +   -   +   -   +       45
    ab          +   +   +   -   -       65
    cd          +   -   -   +   +       75
    ac          +   +   -   +   -       60
    bc          +   -   +   +   -       80
    abcd        +   +   +   +   +       96

Example: Filtration Experiment

The full aliasing scheme for this $2^{4-1}$ fractional factorial design is as follows:

    I  = ABCD
    A  = BCD
    B  = ACD
    C  = ABD
    D  = ABC
    AB = CD
    AC = BD
    AD = BC

The 1 df SS (Type I SS) are then as follows:

    Call:
       aov(formula = out)

    Terms:
                        A     B     C     D   A:B   A:C   A:D
    Sum of Squares  722.0   4.5 392.0 544.5   2.0 684.5 722.0
    Deg. of Freedom     1     1     1     1     1     1     1

    Estimated effects may be unbalanced

In the above, the SS quantities that are small are those for B and AB. Hence, these are likely negligible effects. The estimated effects used in the calculation of the 1 df SS, which can be plotted on a normal probability plot, are given below:

       A1    B1    C1    D1 A1:B1 A1:C1 A1:D1
     19.0   1.5  14.0  16.5  -1.0 -18.5  19.0
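The effect estimates and 1 df SS above can be recomputed by hand from the contrasts, using the formulas for the estimated effect ($2 l / N$) and the sum of squares ($l^2 / 2^{k-f}$). The following Python sketch uses the filtration data from the design matrix; the variable and function names are illustrative, and it is a check of the arithmetic rather than a replacement for the ANOVA software.

```python
from math import prod

# Runs from the filtration design matrix as (A, B, C, Y); D is generated as D = ABC
runs = [
    (-1, -1, -1, 45), (1, -1, -1, 100), (-1, 1, -1, 45), (1, 1, -1, 65),
    (-1, -1, 1, 75), (1, -1, 1, 60), (-1, 1, 1, 80), (1, 1, 1, 96),
]
design = [({"A": a, "B": b, "C": c, "D": a * b * c}, y) for a, b, c, y in runs]
N = len(design)  # 2^(4-1) = 8 runs, single replicate

def contrast(effect):
    """Sum of responses weighted by the product of the +/-1 levels in the effect."""
    return sum(prod(levels[f] for f in effect) * y for levels, y in design)

effects = {}
sums_of_squares = {}
for effect in ["A", "B", "C", "D", "AB", "AC", "AD"]:
    l = contrast(effect)
    effects[effect] = 2 * l / N          # estimated effect
    sums_of_squares[effect] = l ** 2 / N  # 1 df sum of squares

for effect in effects:
    print(f"{effect:>2}: effect = {effects[effect]:6.1f}, SS = {sums_of_squares[effect]:6.1f}")
```

Each estimate here also estimates its alias (for example, the AD column is identical to the BC column), so the labels follow the convention used in the output above.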

Example: Filtration Experiment

The normal probability plot of the estimated factor effects and interactions for this experiment is shown in the figure. The further a value deviates from the reference line (i.e., the black line), the more substantial its effect. From this figure, it looks like B and AB are negligible, which is consistent with the Type I SS that we calculated.

[Figure: normal probability plot of the effect estimates. Horizontal axis: Effect Estimates (about -20 to 20); vertical axis: Normal Probability. Labeled points: AC, AB, B, C, D, AD, A.]

Example: Filtration Experiment

We next drop B and AB from the analysis. The variability explained by B and AB will make up our estimate of the experimental error. The resulting ANOVA table is presented below. Clearly, all of the remaining effects have a significant effect on filtration rate. Note that each F-test is constructed against the MSE, which is our estimate of the experimental error.

    Analysis of Variance Table

    Response: rate
              Df Sum Sq Mean Sq F value   Pr(>F)
    A          1  722.0  722.00  222.15 0.004471 **
    C          1  392.0  392.00  120.62 0.008189 **
    D          1  544.5  544.50  167.54 0.005916 **
    A:C        1  684.5  684.50  210.62 0.004714 **
    A:D        1  722.0  722.00  222.15 0.004471 **
    Residuals  2    6.5    3.25
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
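The pooled-error ANOVA can be retraced from the 1 df SS alone. The sketch below, in plain Python with illustrative names, pools the SS for B and AB into the residual and forms each F statistic. The p-value line relies on a special case that is an addition here, not something from the slides: for 1 and 2 degrees of freedom, the upper-tail probability of the F distribution has the closed form $1 - \sqrt{F/(F+2)}$. For other degrees of freedom a statistical library would be needed.

```python
from math import sqrt

# 1 df sums of squares from the full fit of the filtration experiment
ss = {"A": 722.0, "B": 4.5, "C": 392.0, "D": 544.5, "AB": 2.0, "AC": 684.5, "AD": 722.0}

dropped = ["B", "AB"]                        # effects judged negligible
ss_error = sum(ss[term] for term in dropped)  # pooled into the residual SS
df_error = len(dropped)
mse = ss_error / df_error

f_values = {}
p_values = {}
for term, value in ss.items():
    if term in dropped:
        continue
    f_stat = value / mse                      # each retained effect has 1 df, so MS = SS
    f_values[term] = f_stat
    # Closed form valid only for an F(1, 2) reference distribution (assumption noted above)
    p_values[term] = 1 - sqrt(f_stat / (f_stat + 2))

print(f"Residual SS = {ss_error}, df = {df_error}, MSE = {mse}")
for term in f_values:
    print(f"{term:>2}: F = {f_values[term]:7.2f}, p = {p_values[term]:.6f}")
```

With only 2 error df, the conclusions depend heavily on the decision to treat B and AB as noise, which is why the normal probability plot is examined first.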

Plackett-Burman Designs

A limitation of fractional factorial designs of resolution III is that they require the number of treatment combinations to be a power of 2. Plackett-Burman designs are two-level, resolution III designs that can be used for studying up to $N - 1$ factors in $N$ experimental trials, where $N$ is a multiple of 4. Valid run sizes of Plackett-Burman designs are, therefore, 4, 8, 12, 16, 20, etc.

When $N$ is a power of 2, Plackett-Burman designs are equivalent to the resolution III fractional factorial designs that we already presented. When $N$ is not a power of 2, the aliasing scheme is complicated. Analysis of Plackett-Burman designs is the same as what we presented for other fractional factorial designs.
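As an illustration of a run size that is not a power of 2, the 12-run Plackett-Burman design for up to 11 factors can be built by cyclically shifting a single generating row and appending a row of all low levels. The Python sketch below assumes the commonly tabulated generating row (+ + - + + + - - - + -), which is not given in these slides; the function names are illustrative. The check at the end confirms that every column is balanced and that all pairs of columns are orthogonal.

```python
from itertools import combinations

# Commonly tabulated generating row for N = 12 (an assumption, not from the slides)
FIRST_ROW = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]

def plackett_burman_12():
    """Return the 12 x 11 design: 11 cyclic shifts of FIRST_ROW plus a row of -1s."""
    n = len(FIRST_ROW)
    rows = [FIRST_ROW[-shift:] + FIRST_ROW[:-shift] for shift in range(n)]
    rows.append([-1] * n)
    return rows

def is_orthogonal(design):
    """True if every column sums to zero and every pair of columns has zero inner product."""
    columns = list(zip(*design))
    balanced = all(sum(col) == 0 for col in columns)
    pairwise = all(
        sum(x * y for x, y in zip(u, v)) == 0 for u, v in combinations(columns, 2)
    )
    return balanced and pairwise

design = plackett_burman_12()
for row in design:
    print(" ".join("+" if level == 1 else "-" for level in row))
print("orthogonal main-effect columns:", is_orthogonal(design))
```

Orthogonality of the main-effect columns is what allows each of the 11 main effects to be estimated with the same contrast formulas used for the fractional factorial designs.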

Final Comments

As we have seen, factorial experiments are very versatile treatment designs. In this lecture, we've only provided the rudimentary principles for constructing fractional factorials.

Fractional designs can also be constructed for $3^k$ factorials using the same principles as those for $2^k$ factorials. However, instead of dividing by powers of two (e.g., half-fraction, quarter-fraction, etc.), you now divide by powers of three (e.g., third-fraction, ninth-fraction, etc.).

Most statistical software can handle $3^{k-f}$ fractional factorial designs. The resolution of such designs can also be determined (or specified).

This is the end of Unit 9.