CHAPTER 7 - FACTORIAL ANOVA


Between-S Designs Factorial

Introduction to Factorial Designs
A 2 x 2 Factorial Example
Factorial ANOVA as Planned Contrasts
ONEWAY ANOVA and Factorial Design
Conventional Layout and Calculations for Factorial ANOVA
Some SPSS Analyses for Factorial Designs
SPSS MANOVA Analyses for Factorial Designs
A 2 × 4 Factorial Study of Aggression
SPSS Analyses for the Factorial Aggression Study
GLM Analysis of Aggression Study
Conclusions

INTRODUCTION TO FACTORIAL DESIGNS

One strength of ANOVA is that a single analysis can examine relations between a dependent variable and multiple independent variables or factors. For example, cognitive researchers might be interested in the separate and joint effects of concreteness and relatedness on learning pairs of words. Four groups of subjects could be tested, with one group receiving a list of abstract-unrelated words, one group a list of abstract-related words, one group a list of concrete-unrelated words, and one group a list of concrete-related words. ANOVA would determine the significance of both the concreteness effect (the difference between concrete and abstract words) and the relatedness effect (the difference between related and unrelated words). Moreover, the researchers could determine whether the effect of concreteness was the same for related and unrelated words (i.e., whether or not there was an interaction between the two factors).

Designs in which each level of one variable (e.g., concreteness) occurs (i.e., is crossed) with each level of another variable (e.g., relatedness) are called factorial designs. The factorial study of concreteness and relatedness allows researchers to determine whether subjects remember more concrete words than abstract, and also whether subjects remember more related words than unrelated words. Moreover, it would be possible to determine whether the effects of concreteness were the same for related and unrelated words, and whether the effects of relatedness were the same for concrete and abstract words. If the effects of one variable (e.g., concreteness) vary across the levels of another variable (e.g., relatedness), the two variables are said to interact with one another. For example, the difference between related and unrelated words (i.e., the effect of relatedness) might be greater for abstract words than for concrete words.
We will say much more about interaction in this and the next chapter, but one of the primary strengths of factorial designs is the ability to test interactions for significance. A second example of a basic factorial design would be a social psychology study of the effects of both attitude similarity and physical attractiveness on interpersonal liking. Participants could read descriptions of people who express similar or dissimilar attitudes to the reader, with the descriptions being accompanied by pictures of either an attractive or an unattractive person. The combination of levels for the two factors (i.e., crossing of similarity and attractiveness) in

this factorial study produces four treatment conditions or cells: similar attractive, similar unattractive, dissimilar attractive, and dissimilar unattractive. ANOVA would be able to examine the effects of each of these variables and their combinations on some measure of attraction (e.g., rated liking for the described person, proximity).

Factorial designs are often presented as tables with rows representing one variable and columns representing the second variable. The recall and liking examples are shown in Box 7.1. For the recall study, the left column represents concrete words and the right column abstract words. The top row is unrelated words and the bottom row is related words. Each condition is represented as a cell defined by the combination of levels of the two variables (e.g., C+U = concrete, unrelated).

    Concreteness by Relatedness
                  Concrete     Abstract
    Unrelated     C+U          A+U
    Related       C+R          A+R

    Attractiveness by Similarity
                  Attractive   Unattractive
    Similar       S+A          S+U
    Dissimilar    D+A          D+U

    Box 7.1. Table Representation of Factorial Designs.

In these examples, each factor (i.e., independent variable) has only two levels, but factorial designs can include factors with more than two levels. Factorial designs are often described in terms of the number of levels for each of the independent variables. The present studies would be referred to as 2 by 2 (often written 2 × 2) factorial designs. A 3 × 4 factorial design would include one variable with 3 levels (e.g., conservative, liberal, socialist) and one variable with 4 levels (e.g., Canada, USA, UK, Germany). There are a total of 12 conditions or cells in a 3 × 4 factorial design. Factorial designs can also include more than two variables. A 3 × 4 × 2 design includes 3 variables, one with 3 levels, one with 4 levels, and one with 2 levels. There would be a total of 3 × 4 × 2 = 24 cells or distinct conditions.
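The cell counts for crossed designs can be checked by enumerating every combination of levels. The following is a minimal sketch in Python; the factor names and the third factor (a two-level "gender" variable) are hypothetical labels chosen only for illustration, and the 10 subjects per cell is an assumed group size.

```python
from itertools import product
from math import prod

# Levels of each factor in a hypothetical 3 x 4 x 2 design.
# The third factor is an illustrative assumption, not part of the text's example.
factors = {
    "ideology": ["conservative", "liberal", "socialist"],
    "country": ["Canada", "USA", "UK", "Germany"],
    "gender": ["female", "male"],
}

# Crossing the factors: every level of each factor occurs with every
# level of every other factor, so each combination is one cell.
cells = list(product(*factors.values()))
n_cells = len(cells)

# The number of cells equals the product of the numbers of levels.
levels_product = prod(len(levels) for levels in factors.values())

# Between-subjects designs need a separate group of subjects for every cell.
n_per_cell = 10  # assumed group size
total_subjects = n_cells * n_per_cell

print(n_cells, levels_product, total_subjects)
```

Adding a factor multiplies, rather than adds to, the number of cells, which is why the subject requirements of between-subjects factorial designs grow so quickly.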
The number of conditions in a factorial study increases dramatically with the number of factors and the number of levels of each factor, often with impractical consequences for the total number of subjects. To perform a between-subjects study with 24 cells, for example, we would require 240 subjects just to have 10 subjects per cell. This is less of an issue for Within-Subject factorial designs, which do not inflate the number of subjects required for the study. For example, an efficient way to do the recall study would be to have 20 subjects study all four types

of word pairs (i.e., CU, CR, AU, and AR) within a single list. Such designs are considered in later chapters.

On the other hand, factorial designs can be economical because subjects contribute to every factor. Box 7.2 shows that 10 subjects per cell in the concreteness study would give 20 subjects per level for the comparison of concrete and abstract words, and 20 subjects per level for the comparison of related and unrelated words. It would take 80 subjects, rather than 40, to obtain this many subjects per level in two separate single-factor designs, as also shown in Box 7.2.

    Factorial Design
                 C      A      Total
    R           10     10       20
    U           10     10       20
    Total       20     20       40

    Separate Single-Factor Designs
                               Sub-Totals
    C    20     A    20            40
    R    20     U    20            40
    Total                          80

    Box 7.2. Number of Subjects in Factorial Designs.

The simplest factorial design involves two Between-S variables; that is, different subjects are observed in all cells of the design. Factorial designs with one or more Within-S variables are described in later chapters. The simplest of the two Between-S factorial designs would involve only two levels for each of the factors, and we begin with an example of this sort.

A 2 X 2 FACTORIAL EXAMPLE

Box 7.3 shows recall results for 4 independent groups of 5 subjects each, with the groups defined by whether they studied Concrete (C) words or Abstract (A) words, and Unrelated (U) words or Related (R) words. Subjects remembered an average of 6.0, 8.0, 3.0, and 7.0 words for CU, CR, AU, and AR words respectively. A Single-Factor, Between-S ANOVA is also shown (i.e., the four groups have been treated as four levels of a single independent variable, rather than as 4 cells defined by two levels for each of two independent variables). The omnibus F indicates that differences among the four group means account for a significant amount of variation in words recalled (SS Treatment = 70.00, F = 9.33, p < .05, η² = .636).
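The omnibus test for the four recall groups can be reproduced from summary statistics alone. The sketch below is an illustration in Python, not the chapter's own procedure: it uses the four cell means and group size given in the text, and it assumes the within-group sum of squares of 40.0 that the chapter's summary tables report (the raw scores are not needed for this check). Variable names are illustrative.

```python
# Cell means for the four groups, treated as a single factor with 4 levels.
means = {"CU": 6.0, "CR": 8.0, "AU": 3.0, "AR": 7.0}
n = 5            # subjects per group
ss_error = 40.0  # within-group SS, taken from the chapter's summary table

k = len(means)
grand_mean = sum(means.values()) / k  # equal n, so a simple average

# SS Treatment: n times the squared deviations of group means from the grand mean.
ss_treatment = n * sum((m - grand_mean) ** 2 for m in means.values())
ss_total = ss_treatment + ss_error

df_treatment = k - 1
df_error = n * k - k

ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error
f_ratio = ms_treatment / ms_error

# Proportion of total variability accounted for by the four-group difference.
eta_squared = ss_treatment / ss_total

print(ss_treatment, df_treatment, df_error, round(f_ratio, 2), round(eta_squared, 3))
```

The F ratio would then be compared with the tabled critical value for 3 and 16 degrees of freedom.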

Because there are four groups, most researchers would want to conduct more specific comparisons or contrasts to identify the source of the variability. One way to think about factorial designs is to view them as a particular type of planned contrast between various combinations of means. That is, we will partition SS Treatment into the effects of concreteness and of relatedness (i.e., the two main effects), and the effect of the interaction between these factors. One way to perform this partition is to use appropriate planned contrasts, which is especially easy in the 2 x 2 design.

                      C                  A
                  U       R          U       R
    y_j          6.0     8.0        3.0     7.0       y_G = 6.0
    y_j - y_G    0.0     2.0       -3.0     1.0       s_G = 2.406

    SS Total     = 110.00 = (20-1) 2.406²
    SS Treatment =  70.00 = 5 (0² + 2² + (-3)² + 1²)
    SS Error     =  40.00 = Σ SS_j

    Source        SS        df           MS       F       F.05
    Treatment     70.00     k-1 = 3      23.33    9.33    > 3.24
    Within        40.00     n-k = 16      2.50
    Total        110.00     n-1 = 19

    η² = 70.00/110.00 = .636      reject H0: μ_CU = μ_CR = μ_AU = μ_AR

    Box 7.3. Single-Factor ANOVA of Factorial Design.

Factorial ANOVA as Planned Contrasts

Factorial ANOVA represents one standard or conventional way of dividing the SS Treatment from a factorial design involving four groups into three (k - 1) meaningful single df contrasts. The three degrees of freedom for the treatment effect are separated into the effect of one factor (e.g., Concreteness here, df = 1), the effect of the other factor (e.g., Relatedness, df = 1), and the interaction between these two variables (df = 1). The three contrasts associated with the standard factorial partition are presented in Box 7.4.

Contrast 1 measures the effect of concreteness, averaged over relatedness, because it compares groups CR and CU to groups AR and AU (i.e., contrast 1 compares the mean for the two concrete groups to the mean for the two abstract groups). Note that CU and CR, the two concrete groups, are both coded +1, whereas AU and AR, the two abstract groups, are both coded -1.

            CU     CR     AU     AR
    y_j     6.0    8.0    3.0    7.0      L      SS_L     F      (F.05 = 4.49)               η²
    c_1j    +1     +1     -1     -1       4.0    20.0     8.0    Rej H0: μ_C = μ_A          .182
    c_2j    -1     +1     -1     +1       6.0    45.0    18.0    Rej H0: μ_R = μ_U          .409
    c_3j    -1     +1     +1     -1      -2.0     5.0     2.0    Do not Rej H0: ?           .045

    SS_L1 = (5 × 4.0²)/(1² + 1² + (-1)² + (-1)²) = 20.0, SS_L2 = ...
    Σ SS_L = 70.0 = SS Treatment
    Mean F = (8.0 + 18.0 + 2.0)/3 = 9.33 = F Treatment
    Σ η² = .636

    T-tests for contrasts
    L1: t =  2.83 =  4/SQRT(2.5(1²/5 + 1²/5 + (-1)²/5 + (-1)²/5)),   t² =  8.0
    L2: t =  4.24 =  6/SQRT(2.5((-1)²/5 + 1²/5 + (-1)²/5 + 1²/5)),   t² = 18.0
    L3: t = -1.41 = -2/SQRT(2.5((-1)²/5 + 1²/5 + 1²/5 + (-1)²/5)),   t² =  2.0

    Box 7.4. Traditional Factorial Contrasts.

Contrast 2 measures the effect of relatedness, averaged over concreteness. Contrast 2 compares CR and AR (both +1) to CU and AU (both -1), and therefore compares the average for the two related groups to the average for the two unrelated groups. Contrasts 1 and 2 are interpreted in exactly the same way as the effects for a single treatment variable in a oneway design. Indeed, we will see later that the SSs for these effects could be calculated by assuming in turn that the study involved only the different levels of concreteness, and only the different levels of relatedness.

The effects of single variables averaged across the levels of the other factors in factorial designs are called main effects. There will be one main effect for each factor in a factorial study. Main effects can be tested for significance using either t-tests or the equivalent Fs. Box 7.4 shows that the main effects of concreteness and relatedness are both significant, with the latter being somewhat stronger.
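The three factorial contrasts can be worked through numerically. The sketch below, in Python, is an illustration under the values given in the text (cell means in the order CU, CR, AU, AR, n = 5 per group, MS Error = 2.5); the function and variable names are illustrative choices rather than anything defined in the chapter.

```python
from math import sqrt

means = [6.0, 8.0, 3.0, 7.0]   # CU, CR, AU, AR
n = 5                          # subjects per group
ms_error = 2.5                 # pooled within-group variance from the ANOVA

# Main-effect codes: concrete vs abstract, related vs unrelated.
c_concreteness = [+1, +1, -1, -1]
c_relatedness = [-1, +1, -1, +1]
# The interaction codes are the products of the two main-effect codes.
c_interaction = [a * b for a, b in zip(c_concreteness, c_relatedness)]


def contrast_stats(coefs):
    """Return (L, SS_L, t) for one contrast with equal group sizes."""
    L = sum(c * m for c, m in zip(coefs, means))
    sum_c_squared = sum(c * c for c in coefs)
    ss = n * L ** 2 / sum_c_squared
    t = L / sqrt(ms_error * sum_c_squared / n)
    return L, ss, t


results = {
    "Concreteness": contrast_stats(c_concreteness),
    "Relatedness": contrast_stats(c_relatedness),
    "Interaction": contrast_stats(c_interaction),
}

# Because the contrasts are mutually orthogonal, their SSs sum to SS Treatment.
ss_sum = sum(ss for _, ss, _ in results.values())

for name, (L, ss, t) in results.items():
    # F for a single-df contrast equals t squared (and SS_L / MS Error).
    print(name, L, ss, round(t, 2), round(t ** 2, 1))
```

With one degree of freedom per contrast, each SS is also its own mean square, so dividing by MS Error gives the same F as squaring the t.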
The two main effects indicate that subjects recalled significantly more concrete than abstract words (y Concrete = 7.0, y Abstract = 5.0) and more words from related than from unrelated pairs (y Related = 7.5, y Unrelated = 4.5). The means being compared here are averaged across the level of the other variable (e.g., y Concrete is the average of CU and CR), which is what

defines a main effect.

Contrast 3 is more difficult to conceptualize. The actual coefficients are produced by multiplying the coefficients for contrasts 1 and 2, our two main effects. Contrast 3 represents the variability among the treatment means that is not accounted for by the main effects of concreteness and relatedness; note that SS C3j = SS Treatment - SS Concreteness - SS Relatedness. The additional variability in SS Treatment represented by SS C3j is due to the specific combination of concreteness and relatedness, and is referred to as the interaction between the two variables. It is often denoted as the product of the two (or more) variables involved in the interaction (e.g., SS C×R). Here this interaction effect is not significant.

Box 7.5 shows an alternative way to compute the factorial SSs. The two main effects are calculated as deviations of row and column means from y G. SS Concreteness represents the deviations from the grand mean of the column means for Concrete and Abstract words averaged across Relatedness (7.0 for Concrete and 5.0 for Abstract). These deviations are multiplied by n j = 10 because 10 observations go into each of the row and column means. SS Relatedness represents the deviations of the row means (4.5 and 7.5) from the grand mean.

                       CU     CR     AU     AR
    Concreteness       +1     +1     -1     -1
    Relatedness        -1     +1     -1     +1
    CxR Interaction    -1     +1     +1     -1

                CON     ABS     y_Rel
    UNR         6.0     3.0     4.5
    REL         8.0     7.0     7.5
    y_Conc      7.0     5.0     y_G = 6.0

    Contrast    SS      F = t²
    L1          20.0     8.0      SS = 10(7.0 - 6.0)² + 10(5.0 - 6.0)²
    L2          45.0    18.0      SS = 10(4.5 - 6.0)² + 10(7.5 - 6.0)²
    L3           5.0     2.0      SS = 10(5.5 - 6.0)² + 10(6.5 - 6.0)²

    Box 7.5. Contrasts as Variation in Row, Column, and Cell Means.

The SS for the interaction term (i.e., SS L3) is computed based on the deviations from y G of means coded +1, M CR+AU = (8.0 + 3.0)/2 = 5.5, and of means coded -1, M CU+AR = (6.0 + 7.0)/2 = 6.5, in Box 7.4 and Box 7.5.
Each mean includes both levels of the relatedness variable (R and U) and both levels of the concreteness variable (C and A); therefore, the main effects of concreteness and relatedness do NOT contribute to the differences between the interaction term means. A fuller analysis of this interaction SS is presented later.
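The same partition can be obtained from row and column means rather than contrast codes. The following Python sketch is an illustration that assumes equal cell sizes (n = 5) and the four cell means from the recall example; the helper name `marginal_mean` and the dictionary layout are illustrative choices.

```python
# Cell means keyed by (concreteness, relatedness).
cell_means = {
    ("C", "U"): 6.0, ("C", "R"): 8.0,
    ("A", "U"): 3.0, ("A", "R"): 7.0,
}
n = 5  # subjects per cell (equal n assumed throughout)

grand_mean = sum(cell_means.values()) / len(cell_means)


def marginal_mean(position, level):
    """Average the cell means that share one level of one factor."""
    values = [m for key, m in cell_means.items() if key[position] == level]
    return sum(values) / len(values)


# Each marginal mean is based on 2 cells, i.e. 2 * n = 10 observations.
n_marginal = 2 * n

ss_concreteness = sum(
    n_marginal * (marginal_mean(0, level) - grand_mean) ** 2 for level in ("C", "A")
)
ss_relatedness = sum(
    n_marginal * (marginal_mean(1, level) - grand_mean) ** 2 for level in ("U", "R")
)

# All between-cell variability, then the interaction by subtraction.
ss_treatment = sum(n * (m - grand_mean) ** 2 for m in cell_means.values())
ss_interaction = ss_treatment - ss_concreteness - ss_relatedness

print(ss_concreteness, ss_relatedness, ss_interaction, ss_treatment)
```

The subtraction step relies on equal cell sizes; with unequal n the three components are generally not additive in this simple way.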

ONEWAY ANOVA and Factorial Design

Because factorial ANOVA can be done as contrasts, the ONEWAY /CONTRAST command can be used to perform factorial ANOVA (even though ONEWAY is specifically designed for single factor designs). Box 7.6 presents the relevant commands to analyze the concreteness by relatedness study using recall scores and a group variable that corresponds to the four conditions.

    DATA LIST FREE / recall group.
    BEGIN DATA
    END DATA.
    ONEWAY recall BY group (1,4)
      /CONTR = 1 1 -1 -1
      /CONTR = -1 1 -1 1
      /CONTR = -1 1 1 -1.

                                SUM OF      MEAN       F        F
    SOURCE            D.F.      SQUARES     SQUARES    RATIO    PROB.
    BETWEEN GROUPS      3        70.00      23.33      9.33
    WITHIN GROUPS      16        40.00       2.50
    TOTAL              19       110.00

    CONTRAST COEFFICIENT MATRIX
                  Grp 1   Grp 2   Grp 3   Grp 4
    CONTRAST 1      1       1      -1      -1
    CONTRAST 2     -1       1      -1       1
    CONTRAST 3     -1       1       1      -1

    POOLED VARIANCE ESTIMATE
                  VALUE    S. ERROR    T VALUE    D.F.    T PROB.
    CONTRAST 1     4.00      1.41        2.83      16
    CONTRAST 2     6.00      1.41        4.24      16
    CONTRAST 3    -2.00      1.41       -1.41      16

    Box 7.6. Factorial ANOVA with ONEWAY Contrasts.

A ONEWAY ANOVA is performed for four groups (CU, CR, AU, and AR) and then contrasts test for the main effects of concreteness and relatedness, and their interaction (Contrasts 1 to 3, respectively). Note that the obtained statistics (MS Error, Ls, and ts) agree with our earlier hand calculations, and that the main effects of concreteness and relatedness are significant, whereas the interaction is not. From the contrasts, we could calculate SS Contrast, perform F tests that are equivalent to the reported t-tests (i.e., F Contrast = t² Contrast), and also calculate η² for the main effect and interaction terms. For example, SS Concreteness = (5 × 4.0²)/4 = 20.0, F Concreteness = 20.0/2.5 = 8.0 = 2.83², η² Concreteness = 20.0/110.0 = .182. Concreteness accounts for a significant 18.2% of the variability in recall scores. Similar calculations could be

performed for the contrasts corresponding to the main effect of relatedness and the interaction between concreteness and relatedness. Although independent groups factorial ANOVAs can be done using ONEWAY and /CONTRAST, the SPSS ONEWAY command is especially designed for single factor studies. Several SPSS programs handle factorial designs directly, but first we illustrate the standard layout and calculations for Factorial ANOVAs.

Conventional Layout and Calculations for Factorial ANOVA

The contrast approach is not the conventional way to perform factorial ANOVA, although the results of the conventional and contrast approaches are identical. Box 7.7 shows a more standard layout of the data for a factorial ANOVA. In Box 7.7, SS Concreteness is obtained from the deviations of the column means (averaged over relatedness) from the grand mean (+1.0 for concrete words and -1.0 for abstract). These deviations squared times the number of observations in each of the column means (5 Unrelated + 5 Related = 10 observations) gives the SS for the main effect of concreteness. Note that these operations conform to those we did in Box 7.5, where we combined together groups coded the same on the contrast and calculated

deviations of the averaged means from the grand mean. The main effect of relatedness is calculated in a similar fashion; the deviations of row means from the grand mean (-1.5 for unrelated and +1.5 for related) are squared and multiplied times the number of subjects contributing to the means (again 5 + 5 = 10 subjects). Calculation of these main effects is identical to the procedure for single factor designs, except that the second variable is ignored in performing the calculations and n is doubled because subjects from both levels of the second variable contribute to the main effect means.

                    CON                 ABS                 y_Rel        y_Rel - y_G
    UNR             y_CU = 6.0 (0)      y_AU = 3.0 (-3)     y_U = 4.5       -1.5
    REL             y_CR = 8.0 (2)      y_AR = 7.0 (1)      y_R = 7.5       +1.5
    y_Conc          y_C = 7.0           y_A = 5.0           y_G = 6.0
    y_Conc - y_G    +1.0                -1.0

    SS Treatment    =  70.0 = 5 (0² + 2.0² + (-3.0)² + 1.0²)
    SS Concreteness =  20.0 = 10 (1.0² + (-1.0)²) = n_C Σ (y_Conc - y_G)²
    SS Relatedness  =  45.0 = 10 ((-1.5)² + 1.5²) = n_R Σ (y_Rel - y_G)²
    SS C×R          =   5.0 = 70.0 - 20.0 - 45.0 = SS Trt - SS Conc - SS Rel
    SS Error        =  40.0 = 110.0 - 70.0 = SS Total - SS Treatment = Σ SS_j

    Factorial ANOVA Summary Table
    Source      SS       df                 MS      F       (F.05 = 4.49)
    Concrete    20.0     c-1 = 1            20.0     8.0    Rej H0: μ_C = μ_A
    Related     45.0     r-1 = 1            45.0    18.0    Rej H0: μ_R = μ_U
    C×R          5.0     (c-1)(r-1) = 1      5.0     2.0    DoNotRej H0: ?
    Error       40.0     n - c×r = 16        2.5
    Total      110.0     n-1 = 19

    Box 7.7. Traditional Factorial ANOVA.

The dfs for the main effects are determined by the number of levels for each factor (e.g., df Conc = 2-1 = 1 and df Rel = 2-1 = 1), again analogous to the oneway ANOVA. For the moment, SS C×R is calculated in two steps; first SS Treatment is obtained from the deviations of the four cell means about the grand mean. This represents the variability in all four cell means, which depends in part on main effects and in part on the unique combination of concreteness and relatedness (i.e., the interaction between the two variables). Subtracting the

two main effects, that is, SS Concreteness and SS Relatedness, from SS Treatment gives SS C×R, the SS for the interaction between concreteness and relatedness. This result agrees with the value obtained in our earlier contrast analysis. The df for the interaction term depends on the number of levels for each factor involved in the interaction. In the recall study, df C×R = (2-1)(2-1) = 1 because the concreteness and relatedness variables each have two levels. This again agrees with the fact that the contrast analysis needed only a single contrast (df = 1) for the interaction. The interaction term in a 3 × 4 factorial study would have (3-1)(4-1) = 6 degrees of freedom, which would be captured by 6 orthogonal contrasts.

SOME SPSS ANALYSES FOR FACTORIAL DESIGNS

We now consider several different SPSS approaches to this design, using ANOVA programs designed for 2 or more factors. We first examine the SPSS ANOVA procedure briefly, which is limited to Between-S Factors, and then concentrate on the more powerful and general SPSS MANOVA and GLM procedures.

SPSS ANOVA Analyses for Factorial Designs

    DATA LIST FREE / recall con rel.
    BEGIN DATA
    END DATA.
    ANOVA recall BY con(1 2) rel(1 2).

                                               Sum of Squares    df    Mean Square    F        Sig.
    recall   Main Effects         (Combined)        65.00          2      32.50       13.00
                                  con               20.00          1      20.00        8.00
                                  rel               45.00          1      45.00       18.00
             2-Way Interactions   con * rel          5.00          1       5.00        2.00
             Model                                  70.00          3      23.33        9.33
             Residual                               40.00         16       2.50
             Total                                 110.00         19       5.79

    Box 7.8. SPSS ANOVA Approach to Factorial Designs.

Box 7.8 shows the commands and output for the ANOVA command. The data are entered with separate variables for concreteness (CON = 1 or 2) and relatedness (REL = 1 or 2). The

general format of the ANOVA command is: ANOVA depvar BY indvar1(min,max) indvar2(min,max)..., where multiple independent variables (and the codes corresponding to their minimum and maximum values) are listed after the BY keyword. The ANOVA output produces SSs and MSs for main effects, interactions, and their various combinations. The ANOVA summary table must be read carefully, because effects are represented at more than one level. For example, note that SS Concreteness + SS Relatedness = SS Main Effects and that SS Main Effects + SS 2-Way = SS Explained. If the analysis included more than two variables, then there would be sets of SSs and Fs for Main Effects, Two-Way Interactions, and Three-Way Interactions, all of which would sum to Explained. To obtain η² for main effects and interactions, divide the SSs for the various effects by SS Total.

One limitation of the SPSS ANOVA command is that it cannot handle Within-S variables, except under rather limited circumstances. Also, ANOVA does not permit contrasts. We have already used two more powerful (and complex) analysis of variance programs in SPSS, namely MANOVA (for Multivariate ANOVA) and GLM (for General Linear Model). We examined these programs previously for single factor ANOVA, but their full power becomes more apparent with factorial and other complex designs.

SPSS MANOVA Analyses for Factorial Designs

The basic MANOVA is very similar to the ANOVA command (but has many optional subcommands, some of which will be introduced later). The format of the MANOVA command and some common options are shown in Box 7.9. MANOVA is only available by syntax and is not available in the SPSS menu system.

    MANOVA depvarlist BY factor(min,max) factor(min,max)...
      /WSFACTOR = wsfname(#levels) wsfname...
      /PRINT = CELLINFO(MEANS) SIGNIFICANCE (AVERF SINGLEDF UNIVARIATE)
      /OMEANS = TABLES(CONSTANT factor factor BY factor)
      /PMEANS = TABLES(CONSTANT factor factor BY factor)
      /CONTRAST (factor) = SPECIAL (k 1s contrast codes) POLYNOMIAL(degree)
      /CINTERVAL = INDIVIDUAL UNIVARIATE(BONFER SCHEFFE)
      /WSDESIGN...
      /DESIGN effect effect BY effect WITHIN W MWITHIN...

    Box 7.9. Partial Format for MANOVA Procedure.

A basic MANOVA simply requires specification of the dependent and independent

variables, with the minimum and maximum levels of the latter. Other commands in Box 7.9 will be discussed later. The DESIGN statement will be particularly important for focussed analyses discussed in later chapters. By default, MANOVA does a full factorial analysis (i.e., all main effects and interactions) when no DESIGN statement is included.

Box 7.10 shows the SPSS commands and output for a MANOVA analysis of the recall study. The separate independent variables are listed after the BY keyword, and their minimum and maximum levels are included in parentheses. The four cell means and the grand mean are reported because the /PRINT = CELLINFO subcommand was included. The default output does not present the means for the main effects of the two variables, although the optional OMEANS command (for observed means) provides that capability. No DESIGN subcommand was required because the analysis defaults to the full factorial, but it has been included to illustrate how users could specify the default partitioning of SS Total into SS Con, SS Rel, and SS ConXRel. The output would have been exactly as shown in Box 7.10 if the DESIGN subcommand had been omitted.

    MANOVA recall BY con(1 2) rel (1 2)
      /PRINT = CELLINFO
      /DESIGN con rel con BY rel.

    FACTOR               CODE      Mean    Std. Dev.     N
    CON                  1
      REL                1                                5
      REL                2                                5
    CON                  2
      REL                1                                5
      REL                2                                5
    For entire sample              6.00      2.41        20

    Source of Variation       SS      DF       MS        F     Sig of F
    WITHIN CELLS            40.00     16      2.50
    CON                     20.00      1     20.00     8.00
    REL                     45.00      1     45.00    18.00
    CON BY REL               5.00      1      5.00     2.00
    (Model)                 70.00      3     23.33     9.33
    (Total)                110.00     19      5.79

    R-Squared = .636
    Adjusted R-Squared = .568

    Box 7.10. SPSS MANOVA Analysis for Factorial Recall Study.

The summary table is similar to that for the SPSS ANOVA command, and leads to the identical results as earlier methods. The SS Error is called the WITHIN CELLS source and is presented first. The error term is followed by one line each for concreteness, relatedness, and

their interaction. SS Model would be the sum of these components and could be used to calculate η² for the aggregate of main and interaction effects. That is, the R-squared reported by MANOVA corresponds to the overall Model with both main effects and interaction. Separate η²s could be calculated for each effect by dividing the appropriate SS by SS Total.

Box 7.11 demonstrates the use of the OMEANS command to obtain observed means for all of the conditions of the study. The keyword CONSTANT requests the grand mean, the factor names produce main effect means, and the BY keyword produces cell means for the combinations of two or more factors. The OMEANS procedure produces both weighted and unweighted means, although the unweighted means have been deleted from Box 7.11 except for the Grand Mean (to illustrate what the output would look like).

    MANOVA recall BY con (1,2) rel (1,2)
      /OMEANS = TABLE(CONSTANT, con, rel, con BY rel).
    ...
    Combined Observed Grand Means
      Variable .. RECALL
        GMEAN    WGT
                 UNWGT
    Combined Observed Means for CON
      Variable .. RECALL
        CON 1    WGT
            2    WGT
    Combined Observed Means for REL
      Variable .. RECALL
        REL 1    WGT
            2    WGT
    Combined Observed Means for CON BY REL
      Variable .. RECALL
                 CON      1      2
        REL 1    WGT
            2    WGT

    Box 7.11. Grand, Main Effect, and Cell Means Using OMEANS.

When equal numbers of subjects appear in the various conditions, as in the present study, weighted and unweighted means are identical. When unequal numbers of subjects appear in conditions, the weighted means count each subject once; therefore, cells with more subjects contribute more heavily to weighted means. The unweighted means, however, weight each cell once; therefore, cells with few subjects contribute as much to unweighted means as cells with many subjects.

Another very powerful analysis of variance routine in SPSS is GLM. The GLM procedure will be illustrated later for a second study, described next.
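The distinction between weighted and unweighted means only shows up with unequal cell sizes. The sketch below, in Python, uses made-up cell means and cell sizes (they are hypothetical and do not come from the recall study, where every cell has n = 5) to show how the two kinds of marginal mean are formed.

```python
# Hypothetical cells for one level of a factor: (cell mean, cell n).
# These numbers are invented purely to illustrate unequal cell sizes.
cells = {
    "REL 1": (4.0, 2),
    "REL 2": (8.0, 6),
}

# Weighted mean: every subject counts once, so larger cells pull harder.
total_n = sum(n for _, n in cells.values())
weighted_mean = sum(mean * n for mean, n in cells.values()) / total_n

# Unweighted mean: every cell counts once, regardless of its size.
unweighted_mean = sum(mean for mean, _ in cells.values()) / len(cells)

print(weighted_mean, unweighted_mean)

# With equal cell sizes the two definitions coincide.
equal_cells = {"REL 1": (4.0, 5), "REL 2": (8.0, 5)}
equal_total_n = sum(n for _, n in equal_cells.values())
equal_weighted = sum(m * n for m, n in equal_cells.values()) / equal_total_n
equal_unweighted = sum(m for m, _ in equal_cells.values()) / len(equal_cells)
```

In the unequal case the weighted mean sits closer to the mean of the larger cell, whereas the unweighted mean is simply the midpoint of the two cell means.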

A 2 × 4 FACTORIAL STUDY OF AGGRESSION

To examine the relationship between aggression and gender, biological psychologists examined aggression levels in 8 groups of 9 rats. Four of the groups were males and four groups were females. Within each gender, nine animals were given each of the 0, 1, 2, or 3 mg doses of testosterone prenatally. Gender and dose varied factorially. Raw scores and descriptive statistics are shown in Box 7.12. The scores represent the amount of aggression shown by the animals as adults.

The eight cell means shown in Box 7.12 clearly demonstrate considerable variability, especially for the Females. The Male means all tend to be rather high across the levels of the Dose variable. The Female means, however, vary more as a function of Dose, and also vary more systematically. Aggression scores for Females rise from a low at 0 testosterone to a high at 3 units of testosterone, equalling the aggression level of Males.

    Box 7.12. Factorial Analysis of Aggression Study.

    Figure 7.1. Plot of Dose by Gender Interaction.

Figure 7.1 plots the means for the eight groups. The male and female dose lines are clearly not parallel, demonstrating visually the interaction that will be tested for significance by Analysis of Variance. That is, the effect of testosterone on aggression is

different for Females (strong positive effect) than for Males (no effect or slight negative effect). Administration of prenatal testosterone increased aggression in females and had little effect on aggression in males. The net effect was that testosterone increased aggression scores of Females to those of Males.

The total of 8 cells gives df = 8-1 = 7 for the combined treatment effect (i.e., main effects plus interaction). The main effect of Dose entails 4-1 = 3 df, and the main effect of Gender entails 2-1 = 1 df. That leaves df = 7 - 3 - 1 = 3 = (4-1)(2-1) for the interaction between Dose and Gender.

    SS Total     = (72-1) s_G²
    SS Error     = (9-1) Σ s_gd²
    SS Treatment = SS Total - SS Error = 9 Σ (y_gd - y_G)²
    SS Gender    = 36 Σ (y_g. - y_G)²
    SS Dose      = 18 Σ (y_.d - y_G)²
    SS G×D       = SS Treatment - SS Gender - SS Dose

    Source     df
    Gender      1     p < .001
    Dose        3     p > .10
    G×D         3     p < .003
    Error      64
    Total      71

    Box 7.13. Calculations for Factorial Anova of Gender by Dose Study.

Box 7.13 shows the cell means and various calculations relevant to this analysis. As just noted, the 8 cell means vary quite a bit about the grand mean, from a low for female rats not administered any prenatal testosterone to highs of 29 or more for several male groups. Of the total variability (calculated here from the SD for all 72 animals), the 8 treatment conditions accounted for 40.91%. Variability within groups (determined from the SDs for each group shown in Box 7.12) accounted for the remaining 59.09%. The treatment variability can be partitioned into three independent components, but this

requires calculation of some additional means. In addition to the cell means, Box 7.13 shows row and column means used to determine the main effects of Dose (rows) and Gender (columns). The deviations of these row and column means from the grand mean have also been calculated; these quantities are squared, summed, and multiplied times the number of observations in each row and column mean to compute the SSs for the main effects. Gender (i.e., deviations of male and female means, n j = 36, from the grand mean) explained 19.50% of SS Total and Dose explained 4.72% of the total. The remaining variability in SS Treatment, 16.69% of the total, resulted from the interaction of Gender and Dose. This is obtained by subtraction here, and later by direct computation.

The ANOVA table at the bottom of Box 7.13 summarizes the analysis. The dfs for the two main effects, 2-1 = 1 for Gender and 4-1 = 3 for Dose, and (2-1)(4-1) = 3 for the interaction, are used to compute MSs for each of the three effects. The MS Error is obtained from SS Error and df = 72 - (2 × 4) = 64 = 8 × (9-1). The main effect of Gender is significant, but the main effect of Dose is not. Interpretation of the main effects is complicated, however, by the significant Gender by Dose interaction. Examination of the means reveals that males and females differ dramatically in aggression at dose 0, but that differences in aggression between males and females are considerably reduced by the administration of prenatal testosterone (see Figure 7.1). Such qualification of main effects by another factor is what is meant by interaction.

SPSS Analyses for the Factorial Aggression Study

Box 7.14 shows the data entry commands for the aggression study. There are 72 cases, each with a value for gender (GEND = 1 or 2), dose (DOSE = 0, 1, 2, or 3), and aggression (AGGR). A subject number from 1 to 72 (SUBJ) has also been included but is not necessary. To conserve space only 8 cases, two per line, are shown in Box 7.14.

    DATA LIST FREE / subj gend dose aggr.
    BEGIN DATA
    END DATA.

    Box 7.14. Data Entry for Aggression Study.

The variability among the 8 cell means can be partitioned in various ways, as we will see in later chapters. The conventional approach to a factorial study, however, is to analyze variability between groups as a function of the main effect for

Gender, df = 2-1 = 1, the main effect for Dose, df = 4-1 = 3, and the interaction between Gender and Dose, df = 1 × 3 = 3, for a total of 7 degrees of freedom (8-1 = 7). This standard analysis can be done with MANOVA, ANOVA, GLM, or even ONEWAY with appropriate contrasts, although some calculations would be required for ONEWAY. As we see in a later chapter, regression can also perform factorial ANOVA given appropriate indicator variables. The MANOVA commands and output for the traditional factorial analysis of variance are shown in Box 7.15; no DESIGN subcommand was needed for the full factorial because that is what MANOVA does by default.

    MANOVA aggr BY gend(1 2) dose (0 3)
      /PRINT = CELLINFO(MEANS).

    FACTOR               CODE      Mean    Std. Dev.     N
    GEND                 1
      DOSE               0
      DOSE               1
      DOSE               2
      DOSE               3
    GEND                 2
      DOSE               0
      DOSE               1
      DOSE               2
      DOSE               3
    For entire sample

    Source of Variation       SS      DF       MS        F     Sig of F
    WITHIN CELLS
    GEND
    DOSE
    GEND BY DOSE

    R-Squared = .409
    Adjusted R-Squared = .345

    Box 7.15. Condensed Data Entry and Factorial ANOVA for Aggression Study.

As noted previously, the main effect of Gender is significant, p < .001, and accounts for 19.5% of the variation in aggression scores, η² Gend = SS Gend / SS Total = .195. Since there were only two groups (i.e., df = 1), we can conclude from the means that Males scored significantly higher than Females. The main effect of Dose was not significant by the standard analysis, p = .175, and accounted for little of the variability in aggression, η² Dose = .047. However, dose is a numerical factor and because the traditional ANOVA divides SS Dose by k - 1 = 3 df, the main effect may dilute any orderly effect (e.g., linear) of this variable.

An important consideration in interpreting the main effect of Dose (and Gender) is the significant interaction. The Gender by Dose interaction was significant, p = .001, and accounted for almost as much variation as the gender variable, η² G×D = .167. The significant gender by dose

interaction indicates that the effect of each variable was not constant across the levels of the other variable. As we noted earlier, dose had more effect on females than males, reducing and perhaps even eliminating gender differences in aggression (see Figure 7.1). Equivalently, gender demonstrates a strong relationship with aggression at low doses, but not at high doses; indeed, the relationship is slightly reversed at the highest dose.

As in the single-factor ANOVA, the omnibus F can be somewhat vague in its conclusions, especially with variables that have more than two levels, such as Dose in this study. The overall Dose effect with df = 3 was not significant, but perhaps some specific contrast (e.g., linear) would be. The omnibus conclusions are also vague because a significant interaction can occur given quite diverse outcomes to the study. Researchers would generally follow the omnibus F test with more precise comparisons of the main and interaction effects. The topic of multiple comparisons for factorial designs is addressed in the next two chapters.

GLM Analysis of Aggression Study

Box 7.16 shows the GLM commands and output for this study.

GLM aggr BY gend dose
  /PRINT = DESCR.

Descriptive output: columns GEND, DOSE, Mean, Std. Deviation, N; a Total row within each level of GEND and a block for GEND = Total.
Summary table: columns Source, Type III Sum of Squares, df, Mean Square, F, Sig.; rows Corrected Model (a), Intercept, GEND, DOSE, GEND * DOSE, Error, Total, Corrected Total.
a. R Squared = .409 (Adjusted R Squared = .345)

Box 7.16. GLM Analysis of Aggression Study.

One nice feature of GLM is that it reports all cell and marginal means, as well as the grand mean. The Total rows under GEND = 1 and 2 represent the overall means for males and females, respectively. The four means under GEND = Total correspond to the means for the four doses averaged across gender. Finally, the Total row for

GEND = Total presents the grand mean. The descriptive statistics in GLM, thus, provide everything necessary to compute the factorial ANOVA. The summary table reports the main effects, the interaction effect, and the error, as well as several aggregate rows. The Corrected Total row corresponds to SS Total, and the Corrected Model row to SS Treatment (aggregating main effects and interaction). The intercept value (not meaningful here) represents the deviation of the grand mean from 0; that is, SS Intercept = 72 × (y G - 0)², where 72 is the total number of subjects. GLM's SS Total = SS Intercept + SS Corrected Total = 50212, which in essence represents the squared deviation of each score from 0, summed over all scores. This is shown in the following box.

COMP aggr2 = (aggr - 0)**2.
DESCR aggr2 /STAT = SUM.

Output: N and Sum for aggr2.

Box. Calculation of GLM's SS Total.

Figure 7.2 shows the steps involved in running GLM from menus. Assuming the data have been entered (see Data Editor screen below the command screens), users select Analyze > GLM > Univariate to activate the middle screen. AGGR is selected and entered into the Dependent Variable box. GEND and DOSE are similarly entered into the Fixed Factor(s) box. Clicking on Options activates the top screen and allows users to request means for all of the effects and the grand mean (along with other options discussed in later chapters). Clicking Continue and OK would initiate the analysis (and produce a more detailed version of the syntax in Box 7.16, including various default commands).

Figure 7.2. Invoking GLM with Menus.
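The relation among the Intercept, Corrected Total, and Total rows of the GLM summary table can be checked with a few lines of arithmetic. The following is a minimal sketch in Python rather than SPSS; the `scores` list is hypothetical and is not the aggression data.

```python
# Hypothetical scores for illustration only; NOT the aggression data.
scores = [4.0, 7.0, 5.0, 9.0, 6.0, 8.0]

n = len(scores)
grand_mean = sum(scores) / n

# SS Intercept: N times the squared deviation of the grand mean from 0.
ss_intercept = n * (grand_mean - 0) ** 2

# SS Corrected Total: squared deviations of the scores from the grand mean.
ss_corrected_total = sum((y - grand_mean) ** 2 for y in scores)

# GLM's (uncorrected) Total: squared deviations of the scores from 0.
ss_total_uncorrected = sum((y - 0) ** 2 for y in scores)

print(ss_intercept, ss_corrected_total, ss_total_uncorrected)
```

Whatever scores are used, the first two quantities should add up to the third, which is why the Total row in GLM is larger than the Corrected Total row whenever the grand mean differs from 0.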

CONCLUSIONS

The standard factorial ANOVA described here can be applied to a variety of situations in which there are two or more factors of interest, irrespective of the number of levels for each factor. Whatever the number of factors and cells, the standard ANOVA partitions the variability among all of the cell means (i.e., SS Treatment) into main effect and interaction components. Each component can be tested for significance. We have also seen some of the ways in which SPSS accommodates factorial ANOVAs. As with single-factor ANOVAs, the omnibus F-tests for main effects and interactions may not permit very precise conclusions about the results (e.g., a main effect for the Dose factor with four levels would be ambiguous about the locus of the significant effect). More precise conclusions about main effects and interactions are possible using additional post hoc or planned comparisons discussed in Chapter 9. But first we want to present a notation for describing factorial calculations and to examine more closely the meaning of interactions and how that conceptualization maps onto our calculations.

Between-S Factorial More on Interactions

CHAPTER 8 - MORE ON FACTORIAL ANOVA AND INTERACTIONS

Identifying Interactions in Tables and Figures
  Interactions in Tables of Cell Means
  Visualizing Interactions
From Cell Means to SS Interaction
  Deviation of Observed Means from Predicted Cell Means if No Interaction
  Interaction as Cell Means Adjusted for Main Effects
  SS Interaction for the Aggression Study
  Using MANOVA to obtain Interaction Deviations
  Using GLM to obtain Interaction Deviations and SS
Notation and Formula for Between-S Factorials
  Notation for Factorial Designs
  Equations to Calculate SSs for Between-S Factorial
Conclusions

This chapter will focus on the interaction term in factorial ANOVA. To this point, SS Interaction has been calculated by subtraction (specifically, SS Interaction = SS Treatment - SS MainEffects) or by an appropriate contrast in the 2 × 2 design. This provides some rough understanding of what an interaction is (the variability in cell means that remains after row and column main effects are removed), but does not provide as full a conceptualization as is desirable. Moreover, calculation by subtraction provides no independent confirmation that in fact SS Treatment = SS MainEffects + SS Interaction, although that can be verified from SPSS output. Here we examine more closely the nature of interaction.

IDENTIFYING INTERACTIONS IN TABLES AND FIGURES

Recall that the basic definition of an interaction is that the effect of one variable depends on the level of a second variable (or more variables in higher-order interactions). We begin with several simple examples illustrating the presence or absence of interaction effects and how to identify these in tables of cell means or in corresponding plots of the means.

Interactions in Tables of Cell Means

In table format, an interaction can be determined by observing whether the effect of each variable is the same across the levels of the other variable. Box 8.1 shows the means from the recall study.

          CON    ABS    y Rel
UNR       6.0    3.0    4.5
REL       8.0    7.0    7.5
y Conc    7.0    5.0    y G = 6.0

Box 8.1. Cell Means from Recall Study.

Consider first the concreteness effect. The difference between concrete and abstract words is 3.0 (6.0 - 3.0) for unrelated words, but only 1.0 (8.0 - 7.0) for related words. The overall difference between concrete and abstract words is 2.0, which is the average of the two differences across the levels of the relatedness factor. A similar comparison could be done for the effect of relatedness at each level of concreteness.
The difference between related and unrelated words is 2.0 (8.0 - 6.0) for concrete words, but increases to 4.0 (7.0 - 3.0) for abstract words. The overall difference of 3.0 is the average of these differences across the levels of the concreteness factor. The difference in the differences indicates the presence of some interaction. The differences between levels of one factor would be identical at both levels of the other factor if there were absolutely no interaction. Of course, data are seldom an ideal fit to the underlying population values; hence a test of significance is used to determine when the

deviation from no interaction is sufficiently large to conclude it is unlikely to have occurred by chance. In this study, we have already seen that the observed interaction could plausibly have occurred by chance even if there were no interaction in the population (i.e., p > .05).

There is another way to think about this issue that generalizes more readily to factors with more than two levels. If there is absolutely no interaction between two variables, then we should observe that the differences between cell means are exactly the same as the differences between the main effect means. In Box 8.1, for example, the main effect of concreteness is 2.0 units (7.0 - 5.0). This same difference should be observed for both related and unrelated words (i.e., at each level of the other factor). As noted above, this was not the case. Observe again that the main effect of 2.0 is the average of the effects for unrelated words (3.0) and related words (1.0). Similarly, the main effect of relatedness is 3.0 (7.5 - 4.5), which is the average of the effects for concrete words (2.0) and abstract words (4.0).

Rather than talking in terms of differences between means (which works best when k = 2 for one or both factors), we could equivalently say that the deviation of each column mean from the grand mean should be observed again when the respective row mean is subtracted from each cell mean in that column. Because concrete words on average are 1.0 units above the grand mean (7.0 - 6.0), we would expect concrete-unrelated words to be 1.0 units above the mean for unrelated words, and concrete-related words to be 1.0 units above the mean for related words. But this is not the case. Concrete-unrelated words are 1.5 units (6.0 - 4.5) above the row mean for unrelated words, and concrete-related words are only .5 units (8.0 - 7.5) above the related row mean.
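The comparisons just described can be laid out as a short calculation. This is a sketch in Python, not part of the SPSS analyses; the four cell means are the values implied by the differences and deviations quoted in the text, and the variable names are illustrative only.

```python
# Recall-study cell means implied by the differences quoted in the text
# (an assumption of this sketch). Keys are (relatedness, concreteness).
cell = {
    ("UNR", "CON"): 6.0, ("UNR", "ABS"): 3.0,
    ("REL", "CON"): 8.0, ("REL", "ABS"): 7.0,
}
rows, cols = ["UNR", "REL"], ["CON", "ABS"]

row_mean = {r: sum(cell[r, c] for c in cols) / len(cols) for r in rows}
col_mean = {c: sum(cell[r, c] for r in rows) / len(rows) for c in cols}
grand = sum(cell.values()) / len(cell)

# Simple effect of concreteness (CON - ABS) at each level of relatedness.
conc_effect = {r: cell[r, "CON"] - cell[r, "ABS"] for r in rows}
# Main effect of concreteness: the difference between the column means.
conc_main = col_mean["CON"] - col_mean["ABS"]

# Deviation of each concrete cell from its row mean, to be compared with
# the deviation of the concrete column mean from the grand mean.
conc_dev = {r: cell[r, "CON"] - row_mean[r] for r in rows}
conc_main_dev = col_mean["CON"] - grand

print(conc_effect, conc_main)
print(conc_dev, conc_main_dev)
```

In both comparisons the main effect equals the average of the cell-level values, and the spread of those values around the main effect is what signals interaction.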
Note that the overall deviation of concrete words from the grand mean (i.e., the average of related and unrelated words) equals the average of the effects for unrelated (1.5) and related (.5) words. A similar analysis of each row and column effect would demonstrate that deviations from no interaction exist in these data.

Box 8.2 modifies this example slightly to show how the data would have looked had there been absolutely no interaction.

          CON    ABS    y Rel
UNR       5.5    3.5    4.5
REL       8.5    6.5    7.5
y Conc    7.0    5.0    y G = 6.0

Box 8.2. Revised No Interaction Result for Recall Study.

Note first that the grand mean, the row means, and the column means are identical to those in Box 8.1. Only the cell means have been changed. Now there is absolutely no

interaction. The difference between unrelated (4.5) and related (7.5) words is 3.0 units, which is exactly the same for both concrete (8.5 - 5.5 = 3.0) and abstract (6.5 - 3.5 = 3.0) words. Equivalently, for each row, the deviation of the row mean from the grand mean is identical to the deviation of each cell mean from its respective column mean. Related words, for example, are 1.5 units on average above the grand mean (7.5 - 6.0 = 1.5), which is observed both for the concrete (8.5 - 7.0 = 1.5) and abstract (6.5 - 5.0 = 1.5) words. Similar relationships hold for any row or any column in Box 8.2. We will see later that the deviations of the observed cell means from these no-interaction means produce SS Interaction.

Box 8.3 shows the cell means from the aggression study.

Box 8.3. Cell Means for Aggression Study. (Columns: Dose, the Male and Female cell means y gd, the dose means y .d, and the dose deviations y .d - y G; bottom rows: the gender means y g., the grand mean y G, and the gender deviations y g. - y G.)

The interaction can be clearly seen in several different ways. Consider first the gender effect. The difference between Males and Females overall is 3.944. If there were absolutely no interaction between Gender and Dose, then the difference between males and females would be the same for all four doses. But it is not. The difference between males and females is largest at dose 0, 3.333 at dose 1, 3.555 at dose 2, and reversed in direction at dose 3, values that are not even close to one another. The average of the four differences is 3.944, the main effect difference. Equivalently, the Males overall were 1.972 units above the grand mean. They should have been the same amount above the four row means for each dose, rather than differing across doses (e.g., 1.666 and 1.777 away from the row means for doses 1 and 2, respectively).

Because there are four levels to the dose variable, it is more difficult to consider interaction in terms of differences between means. There are six possible differences given four means. In such cases the approach based on deviations from the grand mean works best.
If there were no interaction, then each of the dose effects shown in the last column of Box 8.3 would be identical for the Male and Female columns, but that is not the case. Dose 0 overall deviated

-1.139 from the grand mean, whereas the corresponding deviations differ for Males (1.667) and Females. The average of the Male and Female values is -1.139, the main effect for that row.

What we have just been considering is equivalent to what you might have learned in more basic classes about interactions; namely, that interactions result in lines connecting the cell means that are not parallel to one another. Absolutely no interaction occurs when the lines are perfectly parallel. Note that cell means that are parallel are exactly the same distance from one another at each level of the other factor. We examine this in the next section.

Visualizing Interactions

Another way to conceptualize the interaction term in factorial ANOVA is to graph the individual cell means (i.e., the means for the combinations of conditions rather than the means for main effects). This allows us to see whether the effect of each variable is the same at each level of the other variable (i.e., no interaction), or whether the effect of each variable differs across the levels of the other variable (i.e., an interaction is present). Figure 8.1 shows the plot for the recall study. Concreteness is represented along the horizontal axis and Relatedness is represented by different symbols (o = Related and * = Unrelated). The vertical axis represents the dependent variable, Recall. An interaction exists when the difference between means for the levels of one variable varies across the levels of the other variable.

Figure 8.1. Plot of Cell Means.

As just noted, the difference between concrete and abstract words is somewhat larger for the unrelated words (y CU - y AU = 6.0 - 3.0 = 3.0) than for related words (y CR - y AR = 8.0 - 7.0 = 1.0). This appears in Figure 8.1 as a steeper decline from concrete to abstract words for unrelated pairs than for related pairs.
Alternatively, we could note that the difference between related and unrelated conditions is somewhat greater for abstract words (y AR - y AU = 7.0 - 3.0 = 4.0) than for concrete words (y CR - y CU = 8.0 - 6.0 = 2.0). In Figure 8.1 this is revealed by the larger difference between Related (o) and Unrelated

(*) for Abstract than Concrete words. Although there are deviations from no-interaction, previous analyses have already demonstrated that the interaction is not significant. A nonsignificant interaction indicates that the differences between the differences are not significant. Statistically, the effect of concreteness is essentially the same for related and unrelated words. Alternatively, the effect of relatedness is equivalent for concrete and abstract words.

Plots such as Figure 8.1 permit researchers to identify interactions because the lines joining the cell means will be parallel when there is absolutely no interaction between the variables, and will be markedly nonparallel when the variables interact. Figure 8.1 reveals that the lines connecting concrete and abstract scores are not absolutely parallel, so there is some interaction present, but the nonsignificant F for the interaction indicates that the deviations from parallel effects are not strong enough to reject the H0 of no interaction. The departure from parallel lines could have occurred simply by chance. A plot for the significant interaction in the Aggression study was shown in Chapter 7, and clearly demonstrated the non-parallel nature of the lines. Indeed, the lines crossed at the upper level where Females scored higher (slightly) than Males.

Box 8.4 illustrates some alternative possible outcomes for a 2 × 2 factorial study.
Box 8.4. Some Alternative Outcomes for 2 × 2 Factorial Study. (Each panel plots Y against Variable A, levels A1 and A2, with one line for B1, marked o, and one for B2, marked x.)
Outcome 1: SS A > 0 (A2 > A1 averaged over B); SS B > 0 (B2 > B1 averaged over A); SS AB = 0 (lines parallel; B2-B1 same for A1 and A2; A2-A1 same for B1 and B2).
Outcome 2: SS A = 0 (A1 = A2 averaged over B); SS B = 0 (B1 = B2 averaged over A); SS AB > 0 (lines not parallel; B2-B1 differs for A1 and A2; A2-A1 differs for B1 and B2).
Outcome 3: SS A > 0 (A2 > A1 averaged over B); SS B > 0 (B2 > B1 averaged over A); SS AB > 0 (lines not parallel; B2-B1 differs for A1 and A2; A2-A1 differs for B1 and B2).

As in Figure 8.1, the vertical axis represents the dependent variable y, the horizontal axis represents levels of one independent variable (e.g., A1 and A2), and different symbols are used to represent the levels of the second independent

variable (e.g., open versus closed circles, solid versus dashed lines). Outcome 1 in Box 8.4 shows a study with only main effects for the two variables; A2 is higher than A1 and B2 is higher than B1. The lines are parallel, indicating that the differences between levels of one variable remain constant across levels of the other variable. Outcome 2 shows a situation in which there is no main effect for either variable, but the two variables interact. This pattern occurs because the variables have an exactly opposite effect for each level of the other variable. B1 is lower than B2 for A1, by exactly the same amount that B2 is lower than B1 for A2. Outcome 3 shows an example where both main effects and the interaction would be greater than zero. Interpretation of the main effects is complicated by the fact that the variables have no effect at one level of the other variable. It is not the case that A2 is uniformly higher than A1, nor that B2 is uniformly higher than B1. B2 is greater than B1 for A1, but B2 equals B1 for A2.

Although plots of cell means help us to understand interactions, it is not obvious how they relate to the formula for the interaction SS. But there is indeed a relationship. Specifically, the SS for the interaction indicates how much the data would have to be adjusted in order for the lines to be parallel; that is, in order for there to be exactly 0 variability due to the interaction of the two factors. To see this relationship, we need to examine the interaction term more closely.

FROM CELL MEANS TO SS INTERACTION

The preceding discussion provides a good first step toward the direct calculation of the SS for the interaction term in ANOVA.
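The three patterns described for Box 8.4 can be mimicked numerically. The sketch below is in Python rather than SPSS; the function `two_by_two_ss` and all of the cell means passed to it are hypothetical, chosen only to reproduce parallel lines, crossing lines, and lines that converge at A2.

```python
def two_by_two_ss(a1b1, a2b1, a1b2, a2b2, n=1):
    """Return (SS_A, SS_B, SS_AB) for a balanced 2 x 2 table of cell means."""
    grand = (a1b1 + a2b1 + a1b2 + a2b2) / 4
    a1, a2 = (a1b1 + a1b2) / 2, (a2b1 + a2b2) / 2   # means for levels of A
    b1, b2 = (a1b1 + a2b1) / 2, (a1b2 + a2b2) / 2   # means for levels of B
    # Each marginal mean is based on 2 cells of n observations.
    ss_a = 2 * n * ((a1 - grand) ** 2 + (a2 - grand) ** 2)
    ss_b = 2 * n * ((b1 - grand) ** 2 + (b2 - grand) ** 2)
    # Interaction: cell mean minus the additive prediction from main effects.
    cells = [(a1b1, a1, b1), (a2b1, a2, b1), (a1b2, a1, b2), (a2b2, a2, b2)]
    ss_ab = n * sum((m - (a + b - grand)) ** 2 for m, a, b in cells)
    return ss_a, ss_b, ss_ab

# Hypothetical cell means chosen to mimic the three patterns.
outcome1 = two_by_two_ss(1, 3, 3, 5)   # parallel lines
outcome2 = two_by_two_ss(1, 5, 5, 1)   # crossing lines, equal marginal means
outcome3 = two_by_two_ss(1, 5, 5, 5)   # B2 flat, B1 rises to meet it at A2
print(outcome1, outcome2, outcome3)
```

The interaction term is zero only in the first call, where the B2-B1 difference is the same at A1 and A2; in the second call both main effects vanish even though the cell means differ considerably.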
We will carry this calculation out in two closely related ways: first, by calculating the deviation of the observed cell means from what the cell means would have been if there were no interaction in the data but the row, column, and grand means were the same as those observed; and second, by adjusting the observed cell means for the main effects.

Deviation of Observed Means from Predicted Cell Means if No Interaction

One precise way to think about interactions is in terms of deviations of cell means from the values expected given just the main effects of the two variables. That is, we can predict for each cell what the cell mean should be given the grand mean and the separate main effects of concreteness and relatedness. If there is absolutely no interaction between the variables, then the observed cell means should exactly equal these expected cell means. Any deviation of the

observed cell means from the expected cell means represents interaction. The analysis relevant to this conceptualization of interaction is shown in Box 8.5 for the recall study.

Add Main Effects
          y G  +  Rel   +  Conc  =  y cr'     y cr    y cr - y cr'
y UC' =   6.0  - 1.5    + 1.0    =  5.5       6.0       +.5
y UA' =   6.0  - 1.5    - 1.0    =  3.5       3.0       -.5
y RC' =   6.0  + 1.5    + 1.0    =  8.5       8.0       -.5
y RA' =   6.0  + 1.5    - 1.0    =  6.5       7.0       +.5
y cr' = predicted value for each CR cell from just main effects.
SS C×R = n cr Σ(y cr - y cr')² = 5[(.5)² + (-.5)² + (-.5)² + (.5)²] = 5.0

Box 8.5. Interaction as Deviation of Cell Means from Cell Means Predicted from Main Effects.

This calculation of SS Interaction involves adding the effects of Relatedness and Concreteness to the Grand Mean to obtain a predicted mean for each of the four cells (y cr' in Box 8.5). The difference between the observed and predicted cell means represents the interaction. The reasoning behind this view of interaction is that the variability in the cell means is due to both main effects and interaction effects. For example, the mean for concrete/related words (y CR = 8.0) is high partly because of the main effects of concreteness (concrete words = +1.0 relative to the grand mean) and relatedness (related words = +1.5 relative to the grand mean). If only main effects were operating, this cell mean would be 6.0 + 1.0 + 1.5 = 8.5, which is 2.5 units above the grand mean (+1.0 units because the words are concrete and +1.5 units because the words are related). But this predicted cell mean does not exactly equal the observed cell mean of 8.0. The observed cell mean is too low, either because of chance variation or because of a systematic interaction due to the unique combination of concreteness and relatedness. Squaring these deviations, summing them, and multiplying by the number of observations per cell gives us an overall estimate of how much scores in the four cells deviate from no interaction (SS C×R = 5.0).
That this value is small and nonsignificant indicates that deviations of cell means from expected means are not sufficiently large to reject the null hypothesis of no interaction between concreteness and relatedness.
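The prediction-and-deviation calculation for the recall study can be written out as a short sketch. It is given in Python rather than SPSS; the grand mean, main effects, and the 5 observations per cell are taken from the text, the observed cell means are those implied by the text, and the variable names are illustrative.

```python
n_per_cell = 5          # observations per cell, as in the 5(...) term above
grand = 6.0
rel_effect = {"UNR": -1.5, "REL": +1.5}    # row deviations from grand mean
conc_effect = {"CON": +1.0, "ABS": -1.0}   # column deviations from grand mean
observed = {("UNR", "CON"): 6.0, ("UNR", "ABS"): 3.0,
            ("REL", "CON"): 8.0, ("REL", "ABS"): 7.0}

# Predicted cell mean if only main effects operate: grand mean + row + column.
predicted = {(r, c): grand + rel_effect[r] + conc_effect[c]
             for (r, c) in observed}

# Interaction deviation for each cell: observed minus predicted.
deviation = {k: observed[k] - predicted[k] for k in observed}

# Square, sum, and multiply by the number of observations per cell.
ss_interaction = n_per_cell * sum(d ** 2 for d in deviation.values())
print(predicted[("REL", "CON")], ss_interaction)
```

Each deviation is half a unit in size, so the sum of squares is small relative to the main effects, in line with the nonsignificant interaction.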

The predicted values are presented in Box 8.6 in matrix format to better appreciate the relationship between this calculation and our earlier discussion.

          CON    ABS    y Rel
UNR       5.5    3.5    4.5
REL       8.5    6.5    7.5
y Conc    7.0    5.0    y G = 6.0

Box 8.6. Predicted Cell Means from Recall Study if No Interaction.

Note in particular that the values we have just computed agree exactly with those presented in Box 8.2 as an example of no interaction. The effects of concreteness and relatedness are identical at each level of the other factor. These predicted values are the cell means we would observe if there were no interaction and assuming the same row, column, and grand means. The deviations of the observed cell means from these predicted cell means represent the interaction; squaring these deviations, summing them, and multiplying by n j, the number of observations per cell, gives SS Interaction.

Interaction as Cell Means Adjusted for Main Effects

A closely related way to think about the interaction term is as the variability in cell means after the main effects have been removed (i.e., after adjusting the cell means for main effects). This approach is basically the reverse of the previous approach. Instead of predicting cell means from main effects, main effects are subtracted from cell means. Box 8.7 shows the calculations for the recall study.

Subtract Main Effects
          y cr  -  Rel     -  Conc    =  y cr'     y cr' - y G
y UC' =   6.0   - (-1.5)   - (+1.0)   =  6.5         +.5
y UA' =   3.0   - (-1.5)   - (-1.0)   =  5.5         -.5
y RC' =   8.0   - (+1.5)   - (+1.0)   =  5.5         -.5
y RA' =   7.0   - (+1.5)   - (-1.0)   =  6.5         +.5
SS C×R = n cr Σ(y cr' - y G)² = 5[(.5)² + (-.5)² + (-.5)² + (.5)²] = 5.0

Box 8.7. Interaction as Variability in Cell Means After Removal of Main Effects.

The adjustments in Box 8.7 remove all of the variability in the treatment means except for variability due to the interaction between concreteness and relatedness. The adjusted cell means have row and column means that equal the grand mean of 6.0. This is shown clearly in Box 8.8.

        Con    Abs    Ms
Unr     6.5    5.5    6.0
Rel     5.5    6.5    6.0
Ms      6.0    6.0    6.0 = M G

Box 8.8. Adjusted Cell Means.

Because there are no longer any main effects of concreteness or relatedness in the adjusted cell

means, whatever variability is left must be due to the interaction. If there were also no interaction in the original data (i.e., the plot of the original cell means produced parallel lines), then there would be no variation left in the cell means once main effects are removed; the cell means would all equal the grand mean. The remaining variability is the sum of the squared deviations of the adjusted cell means from the grand mean multiplied by the number of observations per cell; that is, SS C×R = 5(.5² + .5² + .5² + .5²) = 5.0. In the recall study, the adjusted cell means are not exactly equal to 6.0, the grand mean, but they are close and do not produce a significant F for the interaction.

The 2 × 2 is the simplest form of two-way factorial, which simplifies somewhat the conceptualization of the interaction (e.g., each of the main effects and the interaction has only one degree of freedom). But the logic is similar when one or both factors have more than two levels. In essence, we can determine the variability in the row and column means about the grand mean to determine the main effects, and determine the interaction by how much the cell means differ from what we would expect given 0 interaction. Let us examine a study in which one of the variables has more than two levels.

SS Interaction for the Aggression Study

Box 8.9 shows the calculation of the interaction SS for the aggression study by predicting cell means from main effects. Each predicted cell mean begins with the grand mean. Then the effects of gender are added (+1.972 for the four Male groups, and -1.972 for the four Female groups). Finally we add a Dose effect based on the deviation from the grand mean of the Dose means averaged across gender (e.g., -1.139 and -.694 for doses 0 and 1, respectively).
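The adjustment approach extends directly to tables with more than two rows or columns. The sketch below, in Python rather than SPSS, uses a hypothetical helper `adjusted_cell_means` that assumes a balanced design (equal n in every cell); it is applied here to the recall-study cell means implied by the text.

```python
def adjusted_cell_means(table):
    """Subtract row and column effects from a balanced table of cell means."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_eff = [sum(row) / n_cols - grand for row in table]
    col_eff = [sum(table[r][c] for r in range(n_rows)) / n_rows - grand
               for c in range(n_cols)]
    adjusted = [[table[r][c] - row_eff[r] - col_eff[c] for c in range(n_cols)]
                for r in range(n_rows)]
    return adjusted, grand

# Recall-study cell means implied by the text: rows UNR, REL; columns CON, ABS.
recall = [[6.0, 3.0],
          [8.0, 7.0]]
adjusted, grand = adjusted_cell_means(recall)

# With 5 observations per cell, SS for the interaction is n times the summed
# squared deviations of the adjusted cell means from the grand mean.
ss_cxr = 5 * sum((m - grand) ** 2 for row in adjusted for m in row)
print(adjusted, ss_cxr)
```

Because the row and column effects have been removed, every row and column of `adjusted` averages to the grand mean, and only interaction variability remains.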

The resulting predicted cell means represent the idealized outcome when there is absolutely no interaction between the variables, but the same main effects are observed. The fact that these predicted cell means differ so much from the observed cell means is why the interaction between Gender and Dose was significant. Main effects and chance do not account adequately for the variability in the 8 cell means. The main effects do a poor job because they assume that the effect of Gender is identical at all four Doses, and that the effect of Dose is identical for males and females. Neither of these (equivalent) assumptions is correct, which leads to poor estimation of the mean aggression in each cell, especially the cells that involve Dose 0 and Dose 3.

Box 8.9. Interaction as Deviation of Cell Means from Means Predicted by Main Effects. (Top panels: the Dose effects, y Dose - y G, and the Gender effects, y Gnd - y G. Main panel: for each of the 8 Gender by Dose cells, y G + Gender effect + Dose effect = y gd', followed by the observed y gd and the deviation y gd - y gd'. Final line: SS G×D = 9 Σ(y gd - y gd')².)

The final column in Box 8.9 represents the deviation of the observed cell means from the predicted cell means given no interaction. These values are squared, summed, and multiplied by 9, the number of observations in each cell. Box 8.10 shows the predicted cell means in table format, along with the row, column, and grand means for these predicted cell means. Note that the latter values are identical to those observed in the actual data.

Box 8.10. Predicted Cell Means and Computed Row, Column, and Grand Means. (Columns: Male, Female, Ms; rows: the four Doses and Ms.)

Of greatest relevance to

the notion of interaction is the fact that the effect of each factor is now identical at each level of the other factor. The effect of being Male overall is to be 1.972 units above the grand mean. This same difference now characterizes all four levels of the Dose factor. The same is true for the effect of being Female, and the effects for Dose 0, 1, 2, and 3. There is absolutely no interaction in these data. In other words, given the row, column, and grand means, these are what the cell means would have been if there were no interaction. The deviations of the observed cell means from these values therefore represent the amount of interaction in the data.

Using MANOVA to obtain Interaction Deviations

It is possible to get SPSS to print out values corresponding to the above calculations. The trick is to limit the analysis to main effects, and to ask for predicted means (PMEANS) based just on those main effects. Box 8.11 uses MANOVA to illustrate this for the aggression study.

MANOVA aggr BY gend(1 2) dose(0 3)
  /PMEANS = TABLE(gend BY dose)
  /DESIGN = gend dose.

Summary table: columns Source of Variation, SS, DF, MS, F, Sig of F; rows WITHIN+RESIDUAL, GEND, DOSE, (Model), (Total).
R-Squared = .242   Adjusted R-Squared = .197
Adjusted and Estimated Means: columns CELL, Obs. Mean, Adj. Mean, Est. Mean, Raw Resid., Std. Resid.
Combined Adjusted Means for GEND BY DOSE: GEND 1 and 2 by DOSE (UNWGT.).

Box 8.11. Using MANOVA to Produce Predicted Cell Means if No Interaction and Deviations of Observed from Predicted.
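The logic of a main-effects-only prediction can be illustrated outside SPSS. The following Python sketch uses hypothetical cell means for a 2 (gender) × 4 (dose) design; they are not the aggression data, and the variable names are illustrative. It shows that the predicted means have a constant gender difference across doses, while the observed means do not.

```python
# Hypothetical 2 (gender) x 4 (dose) cell means; NOT the aggression data.
means = {"M": [12.0, 10.0, 9.0, 5.0],
         "F": [4.0, 6.0, 7.0, 7.0]}
doses = range(4)

grand = sum(sum(v) for v in means.values()) / 8
gender_eff = {g: sum(v) / 4 - grand for g, v in means.items()}
dose_eff = [(means["M"][d] + means["F"][d]) / 2 - grand for d in doses]

# Cell means predicted from the grand mean and the two main effects only.
pred = {g: [grand + gender_eff[g] + dose_eff[d] for d in doses] for g in means}
# Residual = observed - predicted: the interaction deviation for each cell.
resid = {g: [means[g][d] - pred[g][d] for d in doses] for g in means}

pred_gap = [pred["M"][d] - pred["F"][d] for d in doses]
obs_gap = [means["M"][d] - means["F"][d] for d in doses]
print(pred_gap)   # constant across doses
print(obs_gap)    # varies across doses
print(resid)
```

The residuals play the role of the Raw Resid. column: they sum to zero within each gender and within each dose, so they carry no main-effect information.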

Because we do not want the full factorial design, which includes the interaction, we must include a DESIGN subcommand and indicate just the main effects (GEND and DOSE in our study). We also ask MANOVA to produce the predicted means with PMEANS =. The remaining part of the subcommand asks for the cell means corresponding to our gend BY dose interaction. Compare our predicted values from Box 8.9 with the Adj. Mean and Est. Mean columns in Box 8.11. They give identical values. These are predictions of the cell means based solely on main effects and the grand mean. The Raw Resid. column in Box 8.11 agrees with our final column in Box 8.9, which we used to compute SS Interaction. These values squared, multiplied by n j (the number of observations per cell), and summed produce our index of the degree of interaction in the data.

Using GLM to obtain Interaction Deviations and SS

GLM is even more powerful than MANOVA with respect to demonstrating the nature of the interaction deviations and associated SSs. GLM allows the user to save predicted (and residual) scores from the analysis using the /SAVE option. Box 8.12 shows the use of the /SAVE subcommand to save predicted scores from the full factorial analysis including both main effects and interaction (PINTER in the top analysis). The PINTER predicted values correspond to the cell means, as shown in the listing in Box 8.13 (only the first case in each group is shown), and in the plot of the PINTER values as a function of gender and dose (Figure 8.2).

GLM aggr BY gend dose
  /SAVE PRED(pinter).

Summary table: columns Source, Type III Sum of Squares, df, Mean Square, F, Sig.; rows Corrected Model (a), Intercept, gend, dose, gend * dose, Error, Total, Corrected Total.
a. R Squared = .409 (Adjusted R Squared = .345)

GLM aggr BY gend dose
  /DESIGN gend dose
  /SAVE PRED(pmain).

Summary table: columns Source, Type III Sum of Squares, df, Mean Square, F, Sig.; rows Corrected Model (a), Intercept, gend, dose, Error, Total, Corrected Total.
a. R Squared = .242 (Adjusted R Squared = .197)

Box 8.12. Saving Predicted Scores from Full Factorial (Top) and Analysis Without Interaction Term (Bottom).

Clearly the observed means include a considerable interaction effect that results in the effect of Gender varying across levels of Dose or, equivalently, the effect of Dose varying across levels of Gender. Box 8.12 also shows a second GLM analysis that includes only main effects (/DESIGN = dose gend) and then saves the predicted values, now not including the interaction (PMAIN). The PMAIN predicted values represent the predictions based solely on main effects and are shown in Figure 8.3 (and in the listing in Box 8.13 for the first case in each group). These are the expected values of the cells assuming no interaction between dose and gender. Note that the lines in Figure 8.3 are perfectly parallel to one another, indicating the complete absence of any interaction between dose and gender.

Figure 8.2. Plot of PINTER (Cell Means).

Figure 8.3. Plot of PMAIN Cell Values.

Note also that the

PMAIN means are identical to intermediate values computed earlier in the manual calculation of SS for the interaction (e.g., Box 8.9). The interaction deviations are the differences between the cell means (PINTER) and the cell values given no interaction (PMAIN). These values are computed and shown in Box 8.13, and agree with our earlier calculations. Squaring and summing these deviations across the 72 subjects produces the SS for the interaction term, SS G×D.

COMP inter = (pinter - pmain).
FORMAT gend dose (F1.0) subj aggr (F2.0).
LIST /CASES FROM 1 TO 72 BY 9.

Listing columns: subj, gend, dose, aggr, pinter, pmain, inter.

COMP inter2 = inter**2.
DESCR inter2 /STAT = SUM.

Output: N and Sum for inter2.

Box 8.13. Computation of SS Interaction Using Values Computed by GLM.

NOTATION AND FORMULA FOR BETWEEN-S FACTORIALS

To this point we have conducted factorial ANOVAs without introducing any additional notation, in part to demonstrate that the factorial ANOVA can be largely understood in terms of concepts and computations from a single-factor design, with some special considerations for the interaction. But to facilitate showing how to calculate SS Interaction directly, it will be helpful to introduce formulas for the various calculations involved in factorial analysis of variance. To do this, we will need to modify somewhat the notation that we have used so far.
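Returning briefly to the computation in Box 8.13, the per-subject logic (difference between the full-model prediction and the main-effects prediction, squared and summed over subjects) can be sketched in Python. The scores below are hypothetical, for a balanced 2 × 2 design with 2 subjects per cell, and the names `pinter` and `pmain` simply mirror the saved SPSS variables. The sketch also checks the direct result against calculation by subtraction.

```python
# Hypothetical scores for a balanced 2 x 2 design with 2 subjects per cell.
data = {("a1", "b1"): [2.0, 4.0], ("a2", "b1"): [5.0, 7.0],
        ("a1", "b2"): [6.0, 8.0], ("a2", "b2"): [5.0, 7.0]}

scores = [y for ys in data.values() for y in ys]
grand = sum(scores) / len(scores)

def level_mean(position, level):
    """Mean of all scores at one level of factor A (position 0) or B (1)."""
    ys = [y for key, vals in data.items() if key[position] == level
          for y in vals]
    return sum(ys) / len(ys)

ss_inter = ss_treat = ss_a = ss_b = 0.0
for (a, b), ys in data.items():
    pinter = sum(ys) / len(ys)                            # cell mean
    pmain = level_mean(0, a) + level_mean(1, b) - grand   # no-interaction value
    for _ in ys:                                          # one term per subject
        ss_inter += (pinter - pmain) ** 2
        ss_treat += (pinter - grand) ** 2
        ss_a += (level_mean(0, a) - grand) ** 2
        ss_b += (level_mean(1, b) - grand) ** 2

print(ss_inter, ss_treat - ss_a - ss_b)
```

In a balanced design the two printed values agree, which provides the independent confirmation that SS Treatment = SS MainEffects + SS Interaction.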

Notation for Factorial Designs

Box 8.14 illustrates the general notation that we will use for factorial designs. The two factors are labelled A and B, respectively (A is the column factor and B the row factor here). Uppercase A and B also refer to the number of levels of each factor (the value represented by k in the single factor designs), and the corresponding lowercase letters act as subscripts (instead of j as in the single factor designs) to represent levels of each factor (i.e., a = 1, 2, ..., A; b = 1, 2, ..., B). We will continue to use i and n (or N) to refer to subjects. S in Box 8.14 denotes the subject variable.

Box 8.14. Notation for Factorial Designs. [Layout: the columns are the levels a = 1, ..., A of Factor A and the rows are the levels b = 1, ..., B of Factor B. Each cell lists the observations $y_{abi}$ for its subjects, together with the cell mean $\bar{y}_{ab}$, $n_{ab}$, and $SS_{ab}$. The margins give the row means $\bar{y}_b$ with $n_b$, the column means $\bar{y}_a$ with $n_a$, and the grand mean $\bar{y}_G$ with N and $SS_G$.]

Each observation in the design is indicated in general by $y_{abi}$, where the three subscripts indicate the level of A, the level of B, and the individual subject (S) within that particular AB combination (e.g., the fifth observation in the third level of A and the second level of B would be $y_{325}$). Each cell of the design has $n_{ab}$ observations, representing the number of observations in each combination of a level of A and a level of B. The $n_{ab}$ observations in each cell can be averaged to produce a cell mean, indicated by $\bar{y}_{ab}$ in this notation. We can also calculate an SS for each cell, denoted by $SS_{ab}$, by summing across the $n_{ab}$ subjects the squared deviation of each cell observation from the cell mean. The SS within each cell is error variability, because the combination of conditions is identical for all subjects in a cell.
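The notation above can be made concrete with a small sketch. The scores below are hypothetical (they are not the data from any study in this book), and the names `cells` and `cell_summary` are introduced only for illustration. Each key is an (a, b) pair and each list holds the $y_{abi}$ for that cell.

```python
# Hypothetical scores keyed by (level of A, level of B); not data from the text.
cells = {
    (1, 1): [3.0, 5.0, 4.0],
    (2, 1): [6.0, 8.0, 7.0],
    (1, 2): [2.0, 4.0, 6.0],
    (2, 2): [9.0, 7.0, 8.0],
}


def cell_summary(scores):
    """Return (n_ab, cell mean, SS_ab) for the observations in one cell."""
    n = len(scores)
    mean = sum(scores) / n
    # SS_ab: squared deviations of each observation from its own cell mean.
    ss = sum((y - mean) ** 2 for y in scores)
    return n, mean, ss


summaries = {ab: cell_summary(ys) for ab, ys in cells.items()}

# Total number of observations and the grand mean over all observations.
N = sum(n for n, _, _ in summaries.values())
grand_mean = sum(sum(ys) for ys in cells.values()) / N

# Summing the within-cell SSs pools the error variability across cells.
ss_within = sum(ss for _, _, ss in summaries.values())

for (a, b), (n, mean, ss) in sorted(summaries.items()):
    print(f"a={a} b={b}: n_ab={n}, mean={mean:.2f}, SS_ab={ss:.2f}")
print(f"N={N}, grand mean={grand_mean:.2f}, pooled within-cell SS={ss_within:.2f}")
```

Keying the cells by (a, b) mirrors the subscripts in the notation, so the same structure extends to any number of levels of either factor.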

Equations to Calculate SSs for Between-S Factorial

Equations 8.1 to 8.5 summarize the calculations of the various SSs, MSs, Fs, and so on for the Between-S factorial design. Equation 8.1 shows the calculation of SS Error. Although the notation has changed, the operations are essentially those from the single factor design, but now treating each cell as a condition. The basic operation is subtracting the group mean from each observation, squaring and summing over the observations in each group, and then summing over all of the A x B groups. This amounts to calculating SS for each group (if not provided) and then summing the SSs. The SS for any cell can be obtained by taking its standard deviation, squaring it, and multiplying by the number of subjects in the group minus one. This is also shown in equation form in 8.1. The df for SS Error is the total number of subjects, N, minus the total number of groups, A x B in the factorial design.

Equation 8.1. Formula for SS Error for Between-S Factorial.

Calculations for the Main Effects of A and B are shown in Equations 8.2 and 8.3, respectively. The grand mean is subtracted from the mean for each level of A (or B), and the deviation is squared and then multiplied by the number of observations contributing to that mean. These products are summed over the levels of the factor. The resulting SS has df = # means - 1 (i.e., A - 1 or B - 1, depending on the factor). Also shown are the calculations for F and eta squared.

Equation 8.2. Analysis of Main Effect of A.

Equation 8.3. Analysis of Main Effect of B.

Equations 8.4 and 8.5 show the calculations relevant to the interaction. SS Interaction (or SS AxB in the equations) can be calculated by subtraction (Equation 8.4) or from the cell means using one of our two approaches (Equation 8.5). The SS for the interaction has df = (A - 1)(B - 1). The top formula in Equation 8.5 calculates the interaction by calculating the deviation of the

observed cell mean from the expected cell mean if there were no interaction, the part of the formula in {}. This predicted value is the grand mean, plus the effect of factor A (i.e., the deviation of the mean for each level of A from the grand mean), plus the effect of factor B (i.e., the deviation of the mean for each level of B from the grand mean). We then calculate the deviation of the observed cell mean from this predicted no-interaction cell mean to produce the deviations representing the interaction. Note how this corresponds to using GLM and MANOVA to obtain values for the calculation of SS Interaction, as well as to the approach in Box 8.9.

Equation 8.4. Analysis of Interaction Effect.

Equation 8.5. Formula for SS A x B for Between-S Factorial.

The second formula represents a second approach to calculating the interaction, essentially determining whether there is any variability left in the cell means when the main effects are removed. We first subtract the main effects of A and B from the observed cell means; this is represented by the part of the formula in {}. The grand mean is then subtracted from these adjusted cell means to produce the deviations representing the interaction. These values will be identical to those calculated using the top formula.
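The verbal descriptions of SS Error, the main effects, and the two versions of the interaction formula can be sketched in a few lines of code. This is only an illustration under stated assumptions: the cell means and standard deviations are hypothetical, the design is assumed to be balanced (equal n per cell), and names such as `ss_interaction` are made up for this example.

```python
# Hypothetical balanced design: 3 levels of A (columns), 2 levels of B (rows).
cell_means = [[2.0, 4.0, 9.0],
              [7.0, 5.0, 6.0]]
cell_sds = [[1.0, 2.0, 1.5],
            [2.0, 1.0, 1.5]]
n = 5  # observations per cell (assumed equal in every cell)

B, A = len(cell_means), len(cell_means[0])
N = n * A * B
grand = sum(map(sum, cell_means)) / (A * B)
mean_a = [sum(row[a] for row in cell_means) / B for a in range(A)]
mean_b = [sum(row) / A for row in cell_means]

# SS Error: (n - 1) times the squared SD of each cell, summed over cells.
ss_error = sum((n - 1) * sd ** 2 for row in cell_sds for sd in row)
df_error = N - A * B

# Main effects: squared deviation of each marginal mean from the grand mean,
# weighted by the number of observations contributing to that mean.
ss_a = sum(n * B * (m - grand) ** 2 for m in mean_a)
ss_b = sum(n * A * (m - grand) ** 2 for m in mean_b)


def ss_interaction(version):
    """SS A x B from cell means, using the first or second approach."""
    total = 0.0
    for b in range(B):
        for a in range(A):
            eff_a = mean_a[a] - grand
            eff_b = mean_b[b] - grand
            if version == 1:
                # Observed cell mean minus the no-interaction prediction.
                dev = cell_means[b][a] - (grand + eff_a + eff_b)
            else:
                # Cell mean with main effects removed, minus the grand mean.
                dev = (cell_means[b][a] - eff_a - eff_b) - grand
            total += n * dev ** 2
    return total


ss_ab_1 = ss_interaction(1)
ss_ab_2 = ss_interaction(2)
df_ab = (A - 1) * (B - 1)
f_ab = (ss_ab_1 / df_ab) / (ss_error / df_error)

print(f"SS A = {ss_a:.2f}, SS B = {ss_b:.2f}, SS Error = {ss_error:.2f}")
print(f"SS AxB (version 1) = {ss_ab_1:.2f}, (version 2) = {ss_ab_2:.2f}")
print(f"F AxB = {f_ab:.2f} with df = {df_ab}, {df_error}")
```

Both versions return the same SS because they rearrange the same deviation; the only difference is whether the main effects are added to the grand mean or subtracted from the cell mean.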

Box 8.15 illustrates the use of this second equation to calculate SS Interaction for the aggression study. The Gender and Dose effects are subtracted from the cell means to produce adjusted cell means with only interaction (i.e., main effects are removed). There is still variability in these adjusted cell means, as indicated by the fact that they are not all equal to the grand mean. Subtracting the grand mean from the adjusted cell means produces the same interaction deviations as using the other approaches. The final step is to square these deviations, multiply by the number of subjects per cell, and sum them over all the cells (i.e., over the levels of A and B).

Box 8.15. Using Second Version of Formula to Calculate SS Interaction. [For each Gender (M, F) by Dose cell, the box lists the Gender effect G, the Dose effect D, the cell mean, the adjusted cell mean, the grand mean, and the adjusted cell mean minus the grand mean. SS G x D is 9 times the sum of the squared deviations.]

Although the value for df Interaction may be less intuitive than df for main effects, notice that the interaction df consumes the remaining df. That is,

\[ df_{Total} = N - 1 = df_{Error} + df_A + df_B + df_{A \times B} = (N - AB) + (A - 1) + (B - 1) + (A - 1)(B - 1) = N - AB + A - 1 + B - 1 + AB - A - B + 1 = N - 1. \]

CONCLUSIONS

This chapter has focussed on the interaction term in factorial ANOVA. Interactions are extremely important in research, both for theoretical and applied reasons. A factor that impacts performance differently at different levels of another factor provides information about underlying theory and also practice. Take our aggression study as a (hypothetical) example. The observed pattern suggests on the theoretical side that testosterone contributes to differences in aggression between males and females. On the applied side, treatments for aggression (e.g., in

forensic settings) could be developed on the basis of the observed effect, perhaps especially for male offenders. Or perhaps female violent offenders have elevated levels of testosterone, making them more male-like in their aggressive tendencies. Each of the three tests of significance, two main effects and one interaction, may allow only vague conclusions depending on the number of levels of each factor and the pattern of results. The next chapter examines follow-up analyses for the main effects, and the following chapter examines alternative approaches to interactions that allow for more specific interpretations of significant effects.

CHAPTER 9 - FOLLOW-UP COMPARISONS FOR FACTORIAL ANOVA MAIN EFFECTS

Post Hoc Comparisons for Main Effects
Post Hoc Comparisons and the 2 x 2 Recall Study
Post Hoc Comparisons for the 2 x 4 Aggression Study
Supplementary SPSS Analyses for the 2 x 4 Post Hoc Tests
Planned Contrasts for Main Effects
Planned Contrasts for the 2 x 2 Recall Study
Planned Contrasts for the 2 x 4 Aggression Study
Manova and the Aggression Study
GLM and the Aggression Study
Main Effect Contrasts Using Cell Means
Conclusions

As in oneway ANOVA, omnibus F tests for main effects and interactions do not always permit specific conclusions about the outcome of a study. For example, a significant main effect for a factor having four levels does not indicate which groups differ significantly from one another. If the four levels were ordered along a meaningful dimension, for example, it would be interesting to know whether the linear or other polynomial effect of the treatment was significant. This could be especially important to determine if the main effect itself was not significant, but would also help to understand more clearly the specific nature of a significant difference where one does occur (i.e., what pattern of differences among the means produced the significant main effect). These concerns about main effects call for comparison procedures analogous to those used for the single-factor design. In the factorial design, however, follow-up analyses would be required for each main effect with more than two levels (i.e., df > 1). Factors with only two levels would not require follow-up analyses, as they do not involve multiple comparisons (only a single comparison is possible for two groups). The factorial ANOVA also raises specific statistical demands because of the interaction component. That variables interact significantly or not is too imprecise a conclusion for most research purposes. Researchers want to know what specific differences and non-differences contribute to the interaction. Moreover, omnibus F tests for interactions can be very insensitive (i.e., conservative with a high probability of a Type II Error). Focused contrasts and other special analyses may have to be performed in order to determine whether the effect of one variable in fact varies as a function of the levels of the other variable in the expected way. 
The present chapter describes focused follow-up analyses for main effects, and the next chapter covers follow-up analyses for interactions. Post hoc (i.e., pairwise) and planned comparisons for main effects operate similarly to those discussed for the single factor ANOVA. The only differences are that the means must be averaged over any other factors in the study, and the n used for the calculations equals the number of observations that contribute to the means being compared; that is, n is summed over the levels of other factors (in essence n A and n B in our formula). This procedure is often referred to as collapsing across the levels of one or more factors. If 10 males and 10 females are in each of three treatment groups, for example, then comparisons between pairs of treatment means

(ignoring or collapsed across gender) will involve 20 subjects per treatment mean, rather than 10.

POST HOC COMPARISONS FOR MAIN EFFECTS

The post hoc procedures that we consider in this book involve only pairwise comparisons, although other procedures are available for comparisons involving more than two groups (i.e., analogous to some of the comparisons that we do as planned comparisons). Let us first consider a design that would in fact not require any follow-up analyses because the two factors in the study involve only two levels each.

Post Hoc Comparisons and the 2 x 2 Recall Study

Because both factors in the recall study have only two levels, neither post hoc nor planned comparisons are required for the main effects. Any additional tests would be redundant with the omnibus ANOVA. Moreover, the various post hoc tests would all be equivalent to one another. For educational purposes, however, it is still worthwhile to examine how such tests would be done for the 2 x 2 design.

Calculations for the 2 x 2 recall study. The relevant calculations for pairwise comparisons of the concreteness effect are presented in Box 9.1. The mean for the concrete words averaged across low and high levels of relatedness was 7.0, and the mean for the abstract words was 5.0. These values are substituted in our formulas for the post hoc t and q statistics. The only difference from the single-factor Between-S comparisons is that the means and ns contributing to the various statistics are summed across the other factor, as shown in Box 9.1 (e.g., n j = 5 + 5 = 10).

Box 9.1. Pairwise Comparisons for Main Effect of Concreteness in the 2 x 2 Recall Study. [Table of cell means M ab, with n ab = 5 per cell, for Abstract and Concrete words by Unrelated and Related pairs, together with the marginal means M a, M b, M g and the marginal ns n a, n b.]

\[ t = \frac{7.0 - 5.0}{\sqrt{2.5(1/10 + 1/10)}} = 2.828 = \sqrt{F_{Omnibus}} = q/\sqrt{2} \]

\[ q = \frac{7.0 - 5.0}{\sqrt{2.5(1/10)}} = 4.0 = t\sqrt{2} \]
The MS Error of 2.5 in Box 9.1 is from analyses reported in earlier chapters. As shown in Box 9.1, the tests for k = 2 produce results identical to the overall ANOVA.
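The equivalence of t, q, and the omnibus F for k = 2 can be checked numerically. The sketch below uses only values quoted in the text (marginal means of 7.0 and 5.0, MS Error = 2.5, and n = 10 per marginal mean); the variable names are illustrative.

```python
import math

ms_error = 2.5                            # MS Error from the omnibus ANOVA
mean_concrete, mean_abstract = 7.0, 5.0   # means collapsed across relatedness
n_per_mean = 10                           # 5 + 5 observations per marginal mean

diff = mean_concrete - mean_abstract

# t for the difference between two means.
t = diff / math.sqrt(ms_error * (1 / n_per_mean + 1 / n_per_mean))

# Studentized range statistic q for the same pair of means.
q = diff / math.sqrt(ms_error / n_per_mean)

print(f"t = {t:.3f}, q = {q:.3f}")
print(f"t * sqrt(2) = {t * math.sqrt(2):.3f}  (should equal q)")
print(f"t squared   = {t ** 2:.3f}  (should equal the omnibus F)")
```

Because q differs from t only by the factor of the square root of 2, either statistic can be computed from the other, which is what allows a single t to be referred to the critical values of several post hoc procedures.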

The observed values would be compared to the critical values determined by the df associated with the MS Error. Because there are only two levels to the factor, all post hoc tests would be identical. Specifically, because k = 2, the stretch for the TUKEY and SNK procedures would both be 2, which is the same stretch used for the LSD procedure. Moreover, the Bonferroni test would use alpha = .05/1 = .05, because only one comparison is being made between the two groups. Hence it would also be identical to the other tests. These equivalencies nicely demonstrate that the differences among the various post hoc procedures only emerge when there are in fact multiple comparisons being made (i.e., when the number of levels of the factor is greater than two).

SPSS analyses. The various pairwise comparison procedures are not available in SPSS MANOVA, but can be done using GLM. Alternatively, they could be done by obtaining ts for multiple contrasts and multiplying them by $\sqrt{2}$ to get q, or by comparing the observed t to the critical q divided by $\sqrt{2}$. Pairwise comparison procedures for main effects are also available in other statistical packages. Although the post hoc comparisons for the 2 x 2 factorial involving concreteness and relatedness would be redundant with the omnibus ANOVA results, it would be educational to demonstrate these equivalencies using the GLM procedure. But GLM refuses to conduct any post hoc comparisons for this design because it requires factors with more than 2 levels before conducting post hoc tests. SPSS presents the following warning: Post hoc tests are not performed for CON because there are fewer than three groups.

When factors involve more than two levels, then it is possible to conduct post hoc tests. Under most circumstances involving post hoc comparisons, the omnibus F for the main effect should be significant, although that is not something that SPSS will check. 
The following post hoc analyses of the Dose variable in the Testosterone and Aggression study, for example, are undertaken despite the lack of significance in the omnibus F for Dose, p = .175.

Post Hoc Comparisons for the 2 x 4 Aggression Study

Earlier chapters presented data and analyses on the effects of gender and prenatal testosterone exposure on aggression. The cell means and ANOVA results are reproduced in Box 9.2. The Gender effect and the Dose by Gender interaction were significant. With only two levels (i.e., df = 1), gender cannot be partitioned into more specific components. The (nonsignificant) main effect of dose, however, should be examined more closely.

Box 9.2. Means and ANOVA for Aggression Study. [Cell means by Gender (M, F) and Dose with marginal means, followed by an ANOVA summary table with columns Source, SS, df, MS, F, Sig. and rows Gender, Dose, G x D, Error, Total.]

Calculations for 2 x 4 post hoc tests. The main effect of dose was not significant, so post hoc tests are technically not appropriate or at least a very conservative procedure should be used. Nonetheless, three of our tests are shown in Box 9.3 for illustration purposes (n d is the n of the Dose means that are being compared, averaged across gender = 9 + 9 = 18).

Box 9.3. Pairwise Comparisons for DOSE Main Effect. [Ordered main effect means with the stretch, the critical q (.05, df = 60), and the critical differences for the TUKEY, SNK, and LSD procedures.]

\[ DEN_q = \sqrt{MS_{Err}/n_d} = \sqrt{13.26/18} = .858, \quad df = 64 \]

The results are consistent with the nonsignificant F, although the LSD procedure (the most liberal test) does lead to the rejection of $H_0: \mu_0 = \mu_2$, the comparison for the two most extreme groups. With six t-tests, however, the probability of at least one Type I error is quite high, so considerable caution should be exercised in interpreting this effect. Since we know that the Bonferroni procedure is even more conservative than Tukey, we can safely anticipate that SPSS will report a nonsignificant Bonferroni result for all pairwise comparisons in this study.

GLM analyses for 2 x 4 post hoc tests. GLM can be used to perform the pairwise comparison procedures demonstrated in Box 9.3, as well as the Bonferroni correction described previously for the single-factor designs. The relevant GLM commands and (edited) output appear in Box 9.4. The tests are requested by adding the POSTHOC keyword, the name of the factor on which the tests are to be performed, and the desired test(s). As for the single-factor design, we will consider the LSD, SNK, TUKEY, and Bonferroni procedures, remembering that in practice just one of these tests would be specified depending on how liberal or conservative researchers had decided to be prior to the experiment.

GLM aggr BY gend dose /POSTHOC = dose (LSD SNK TUKEY BONFERRONI).
...
  [Multiple comparisons output with columns (I) DOSE, (J) DOSE, Mean Difference (I-J), Std. Error, Sig. for the LSD, Tukey HSD, and Bonferroni tests. * The mean difference is significant at the .050 level.]
  [Homogeneous Subsets output: Student-Newman-Keuls, Sig. = .194; Tukey HSD, Sig. = .194.]

Box 9.4. GLM and Post Hoc Comparisons for Factorial Design.

The results (with redundant lines and confidence intervals removed) in Box 9.4 are in the same format as results for single-factor designs. LSD and Bonferroni procedures are presented as probabilities associated with each pairwise comparison, the SNK test as homogeneous subsets of

means, and the TUKEY in both formats. The LSD output produces the same conclusions as our earlier calculations in Box 9.3. The difference between Dose 0 and Dose 2 is the only significant effect, p = .048, which is less than .05. The other ps presented in the last (i.e., Sig.) column do not approach significance, at least not using a standard nondirectional alpha = .05. The more conservative TUKEY and SNK tests produce no significant differences, as was also found in Box 9.3. The extremely conservative Bonferroni procedure produces even higher p values and again none of the differences are significant. Note that the p values for the Bonferroni are 6 times the p values for the LSD procedure (or 1.00 if 6 x p produces an impossible p value greater than 1.0). For example, 6 x .048 gives the Bonferroni p of .289 reported for Dose 0 versus Dose 2 (the rounded LSD p yields .288; the reported value presumably reflects the unrounded p). The ps are multiplied by six because there are (4 x 3) / 2 = 6 pairwise comparisons. As noted previously, if p for the observed statistic is less than alpha/6, the standard Bonferroni criterion, then 6 x p will be less than alpha, which is the way in which SPSS reports the Bonferroni.

The preceding analyses could have been obtained using menu commands, as shown in Figure 9.1. As described in previous chapters, Analyze > General Linear Model > Univariate would bring up the middle dialogue box in Figure 9.1. Variables would be entered for the Dependent Variable (aggr) and for the Fixed Factor(s) (gend, dose).

Figure 9.1. Post Hoc Comparisons for Main Effects Using Menu.
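The relation between the LSD and Bonferroni p values described above can be sketched in code. This is a minimal illustration, not SPSS's own routine: `two_tailed_p` approximates the t distribution by numerical integration (Simpson's rule) so that only the standard library is needed, and all function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p for an observed t, using Simpson's rule on the density."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # area under the density between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def n_pairwise(k):
    """Number of pairwise comparisons among k means."""
    return k * (k - 1) // 2


def bonferroni_p(p, n_comparisons):
    """Bonferroni-adjusted p: the unadjusted p times the number of tests, capped at 1."""
    return min(1.0, n_comparisons * p)


m = n_pairwise(4)                      # four dose levels
print(m)                               # number of pairwise comparisons
print(round(bonferroni_p(0.048, m), 3))
print(bonferroni_p(0.50, m))           # capped at 1.0
print(round(two_tailed_p(2.0, 64), 3))
```

Reporting the adjusted p (rather than comparing the unadjusted p to alpha divided by the number of tests) leads to the same decision, but lets every procedure be read against the same alpha.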

Selecting Post Hoc activates the top box. The factor on which users want Post Hoc tests conducted is selected and moved into the Post Hoc Tests for: box. The available Post Hoc tests become activated. In Figure 9.1 we have selected LSD, SNK, Tukey, and Bonferroni. Clicking Continue and OK would initiate the analyses shown previously.

GLM provides another way to conduct post hoc comparisons, although the available tests are more limited. The /EMMEANS option allows users to conduct supplementary analyses of main effects and interactions, including the possibility of performing pairwise comparisons. The procedure is illustrated in Box 9.5.

GLM aggr BY gend dose /EMMEANS = TABLE(dose) COMPARE(dose) ADJ(LSD).
...
  [Estimated Marginal Means output for dose: Mean, Std. Error, and 95% Confidence Interval (Lower Bound, Upper Bound).]
  [Pairwise Comparisons output with columns (I) dose, (J) dose, Mean Difference (I-J), Std. Error, Sig., and 95% Confidence Interval for Difference.]
  [Univariate Tests output with columns Sum of Squares, df, Mean Square, F, Sig. and rows Contrast and Error.]

Box 9.5. Post Hoc Comparisons Using EMMEANS Option in GLM.

The term EMMEANS is an abbreviation for Estimated Marginal Means. A TABLE of means is requested by listing variable labels for main effects (as here) or interactions in brackets. Including the COMPARE keyword provides pairwise tests of the subsequently named variable, and ADJ (for adjustment) indicates which test to use. The options are LSD (i.e., no adjustment), BONFERRONI, or SIDAK, a test that we have not covered. The results in Box 9.5 agree with the results obtained previously for the LSD procedure. Although of little

added benefit here, EMMEANS is useful for follow-up analyses of interactions (see next chapter) and for pairwise comparisons for Within-Subject factors (discussed in later chapters). The standard POSTHOC options cannot be used with Within-Subject factors.

Supplementary SPSS Analyses for the 2 x 4 Post Hoc Tests

Although the results in Box 9.4 provide all the information necessary to draw conclusions about the various pairwise tests, there are educational reasons to examine the above comparisons in slightly different ways. The first supplementary analysis, shown in Box 9.6, presents two of the six pairwise comparisons (Dose 0 versus Dose 2 and Dose 1 versus Dose 3) done as planned contrasts, along with a third contrast orthogonal to these two (i.e., Doses 0 & 2 versus Doses 1 & 3). Only the output for the Dose contrasts is shown.

MANOVA aggr BY gend(1 2) dose(0 3) /CONTRAST(dose) = SPECIAL( ).
...
  [Estimates for AGGR, individual univariate .9500 confidence intervals. DOSE output with columns Parameter, Coeff., Std. Err., t-value, Sig. t, Lower -95% CL- Upper.]

Box 9.6. Two Pairwise Contrasts for 2 x 4 Design.

Note that the p values for Parameter 3 (Dose 0 versus 2) and Parameter 4 (Dose 1 versus 3) duplicate the LSD ps shown in Box 9.4 for the corresponding pairwise comparisons. The LSD procedure is essentially a t test for the difference between two means (or the equivalent F or q test). The ts in Box 9.6 could be converted to qs by multiplying them by $\sqrt{2}$, which would permit comparisons to appropriate critical values of q for the SNK or TUKEY procedures (as well as the LSD test, which can be performed using either t or q). Additional analyses would be needed to obtain all six pairwise t values. The ps in Box 9.6 could be converted to Bonferroni ps by multiplying by the total number of pairwise comparisons, six in the present case.

The second supplementary analysis, which is shown in Box 9.7, computes LSD p values for the six qs computed in Box 9.3. 
The method involves converting the qs to ts and then computing the ps for a two-tailed t-test. Note that the six ps correspond to the values produced in

Box 9.4 (and Box 9.5) for the LSD comparisons, with small differences due to rounding. Box 9.7 demonstrates that the ps reported in Box 9.4 for the LSD comparisons are easily available given a calculated value for t (which we or SPSS can readily compute), its associated df, and a table of the t-distribution. We could also have had SPSS compute 6 x the p values computed in Box 9.7, which would give the Bonferroni values in Box 9.4. As noted in previous chapters, it would also be possible to compute the SNK and TUKEY p values, or compute the critical values of t or q for the various post hoc procedures that we have considered. These calculations would again confirm the progression from liberal to conservative as we proceed from LSD to SNK to Tukey to Bonferroni.

DATA LIST FREE / d1 d2 q df.
BEGIN DATA
END DATA.
COMP t = q/sqrt(2).
COMP p = 2*(1 - CDF.T(t, df)).
FORMAT t p (F6.3).
LIST.
  [listing columns: D1 D2 Q DF T P]

Box 9.7. Using SPSS to Calculate LSD p Values.

PLANNED CONTRASTS FOR MAIN EFFECTS

As noted for single-factor designs, planned contrasts often allow for a more sensible and parsimonious interpretation of the results than do post hoc pairwise comparisons. Moreover, such contrasts can be performed even if the omnibus F for the effect is not significant. We begin again with the 2 x 2 design, even though follow-up analyses for that design are superfluous.

Planned Contrasts for the 2 x 2 Recall Study

Box 9.8 shows several planned contrast calculations for the main effect of concreteness. The two means for abstract and concrete words are averaged across levels of relatedness giving 5.0 and 7.0, respectively. These means produce a contrast of 2.0, which can then be used to compute a SS for the contrast and associated F test, or an equivalent t test. The only change in the calculation of these statistics is to use the appropriate n j to compute SS L and t L.
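The contrast calculation just described can be sketched as a small helper. The numbers are those quoted in the text (marginal means 5.0 and 7.0, n j = 10, MS Error = 2.5); the function name `contrast_stats` is made up for this illustration.

```python
import math


def contrast_stats(means, coefs, n, ms_error):
    """Return (L, SS_L, F_L, t_L) for a contrast on means that each rest on n scores."""
    L = sum(c * m for c, m in zip(coefs, means))   # value of the contrast
    sum_c2 = sum(c * c for c in coefs)             # sum of squared coefficients
    ss_l = n * L ** 2 / sum_c2                     # SS for the contrast (df = 1)
    f_l = ss_l / ms_error                          # F with 1 numerator df
    t_l = L / math.sqrt(ms_error * sum_c2 / n)     # equivalent t
    return L, ss_l, f_l, t_l


# Abstract (-1) versus Concrete (+1), means collapsed across relatedness.
L, ss_l, f_l, t_l = contrast_stats([5.0, 7.0], [-1, 1], n=10, ms_error=2.5)
print(f"L = {L:.1f}, SS L = {ss_l:.1f}, F L = {f_l:.1f}, t L = {t_l:.3f}")
```

Note that n here is the number of observations behind each marginal mean (10), not the cell n of 5; using the cell n with marginal means would understate the SS by half.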

The bottom few lines compute the SS for concreteness in a somewhat different way. The four cell means are arranged in a row and four contrast coefficients contrast the two Abstract groups (coded -1) with the two Concrete groups (coded +1). The final SS L of 20.0 is the same as using the first procedure, as would be F L and t L. This approach allows us to see that the main effect is the effect of concreteness averaged across levels of relatedness, or equivalently, the Concrete minus Abstract word difference averaged across Related and Unrelated word types. We will see later that there are alternative views of such differences (or more generally of patterns for one factor over levels of the other factor) that are relevant to understanding interactions.

Box 9.8. Concreteness Effect as Planned Contrast. [Table of cell means M ab, with n ab = 5 per cell, for Abstract and Concrete words by Unrelated and Related pairs, with marginal means M a, M b and marginal ns n a, n b, and the contrast value L.]

\[ SS_L = \frac{nL^2}{\sum c_j^2} = \frac{10 \times 2.0^2}{(-1)^2 + 1^2} = 20.0 \]

\[ F_L = \frac{SS_L/1}{MS_{Error}} = \frac{20.0/1}{2.5} = 8.0 = t^2 \]

\[ t_L = \frac{L}{\sqrt{MS_{Error} \sum c_j^2/n_j}} = \frac{2.0 - 0}{\sqrt{2.5(1/10 + 1/10)}} = 2.828 = \sqrt{F} \]

Alternative calculation, with the four cell means in the order UA, UC, RA, RC and n ab = 5:

\[ SS_L = \frac{5 \times 4.0^2}{(-1)^2 + 1^2 + (-1)^2 + 1^2} = 20.0 \]

SPSS and Planned Contrasts for Between-S Factorial Designs. Both SPSS MANOVA and GLM are capable of planned contrasts, although the subcommands are somewhat different than for ONEWAY. As noted previously, the /CONTRAST subcommand in MANOVA requires (a) that the variable name appears in parentheses following the CONTRAST keyword, (b) that the keyword SPECIAL appears after the CONTRAST keyword if specific coefficients are to be listed, and (c) that k ones and another k-1 sets of k orthogonal coefficients all be contained in a single set of brackets after the keyword SPECIAL. For a 3-group design, for example, a legitimate contrast is: /CONTRAST (factor) = SPECIAL ( ), where factor

would be a valid independent variable having three levels. The k ones represent the grand mean, and the two sets of coefficients contrast group 1 vs. group 2 and groups 1 and 2 versus group 3. The CONTRAST subcommand defines the contrasts of interest. There are three ways to obtain the results of the tests. The default approach presents the results as t tests. To obtain SSs and Fs, include the /PRINT = SIGNIFICANCE(SINGLEDF) subcommand, or identify the specific effects to be tested on the DESIGN statement. Although the SINGLEDF and DESIGN approaches will report SSs and Fs for the requested contrasts, and the default option prints t-tests, the results of the t and F tests are equivalent (i.e., t² = F and the corresponding ps are identical).

Box 9.9 illustrates the use of the CONTRAST command to perform planned contrasts for the concreteness by relatedness study. As mentioned previously, these additional analyses are redundant with the overall ANOVA results because concreteness and relatedness each have only two levels. To compute the SSs for the contrasts, use the number of observations contributing to each mean and not just the number of observations in each cell. For example, SS Concreteness = 10 x 2.0² / 2 = 20.0, where 10 = 5 + 5 and 2.0 is the value of L for the contrast. A final observation is that CINTERVAL, which is in fact optional since this is the default output, produces results for more than just the concreteness contrast, which is what we requested. Several commands in MANOVA automatically report statistics for any single-df effect in the design, and may even automatically partition multi-df effects into single-df effects. The default CINTERVAL is one such command.

DATA LIST FREE/ recall con rel.
BEGIN DATA
END DATA.
MANOVA recall BY con (1,2) rel (1,2) /PRINT = CELL /CONTR (con) = SPECIAL( ).
  [Cell output with columns FACTOR, CODE, Mean, Std. Dev., N for the CON by REL cells (CON and ABS crossed with UNR and REL) and for the entire sample.]
  [ANOVA output with columns Source of Variation, SS, DF, MS, F, Sig of F and rows WITHIN CELLS, CON, REL, CON BY REL.]
  [Parameter output with columns Par., Coeff., Std. Err., t-value, Sig. t, Low -95% CL- Upper and rows CON (a), REL (b), CxR (c).]
  a: t² = 8.0 = F Conc
  b: t² = 18.0 = F Rel
  c: t² = 2.0 = F C x R

Box 9.9. Planned Contrasts and Confidence Intervals in MANOVA.

One note of caution. Because SPSS may initiate some contrasts of its own, as in the preceding analysis, it may not always be obvious exactly what contrasts SPSS is testing. Therefore, researchers should be as explicit as possible about the desired contrasts, and take care in reading the results. Do not assume that MANOVA is examining precisely the contrast of interest unless you have explicitly specified that contrast. We will save discussion of other MANOVA approaches to contrasts, and the GLM approach, for consideration of factors with more than two levels, as in the following example.
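The three single-df effects in a 2 x 2 design can also be written as contrasts on the four cell means, which makes it easy to check both their orthogonality and the F ratios noted under Box 9.9. In the sketch below the cell means are hypothetical: they were chosen only so that the marginal means (5.0 and 7.0) and the F ratios (8.0, 18.0, 2.0) match the values quoted in the text, and they should not be read as the study's actual cell means. The cell order UA, UC, RA, RC and n ab = 5 follow Box 9.8.

```python
# Hypothetical cell means in the order UA, UC, RA, RC (see the lead-in).
cell_means = [3.0, 6.0, 7.0, 8.0]
n_ab = 5          # observations per cell
ms_error = 2.5    # MS Error from the omnibus ANOVA

contrasts = {
    "concreteness": [-1, +1, -1, +1],   # Concrete minus Abstract
    "relatedness":  [-1, -1, +1, +1],   # Related minus Unrelated
    "interaction":  [+1, -1, -1, +1],   # difference of the differences
}


def dot(u, v):
    """Sum of products of two coefficient (or mean) vectors."""
    return sum(x * y for x, y in zip(u, v))


def contrast_f(coefs):
    """F for a single-df contrast on cell means with equal cell n."""
    L = dot(coefs, cell_means)
    ss_l = n_ab * L ** 2 / dot(coefs, coefs)
    return ss_l / ms_error


# Two contrasts are orthogonal (equal n) when their coefficient products sum to zero.
names = list(contrasts)
orthogonal = all(
    dot(contrasts[a], contrasts[b]) == 0
    for i, a in enumerate(names) for b in names[i + 1:]
)

f_values = {name: contrast_f(c) for name, c in contrasts.items()}
print("mutually orthogonal:", orthogonal)
for name, f in f_values.items():
    print(f"{name}: F = {f:.1f}")
```

Writing the contrasts out explicitly in this way is one guard against the caution raised above, since it leaves no doubt about which comparison each coefficient set is testing.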

Planned Contrasts for the 2 x 4 Aggression Study

Although educational about the continuity of planned contrasts with the omnibus main effects, the preceding analyses do not fully convey the capacity of planned contrasts to shed light on the pattern of differences that gives rise to significant main effects or, in other cases, on the capacity of planned contrasts to identify significant patterns in the data even when the main effect is not significant. We examine and demonstrate these strengths of planned comparisons for the Aggression study previously analyzed by pairwise comparisons. Box 9.10 reports the means for the study and the results of a traditional factorial analysis of variance, as discussed in earlier chapters.

MANOVA aggr BY gend(1 2) dose (0 3) /PRINT = CELLINFO(MEANS)
  [Cell output with columns FACTOR, CODE, Mean, Std. Dev., N for the four DOSE levels within GEND 1 and GEND 2, and for the entire sample.]
  [ANOVA output with columns Source of Variation, SS, DF, MS, F, Sig of F and rows WITHIN CELLS, GEND, DOSE, GEND BY DOSE.]

Box 9.10. Factorial ANOVA for Aggression Study.

Because DOSE is an ordered variable, polynomial analysis (also called trend analysis) is one appropriate approach to planned follow-up analyses. It seems likely that theory would predict a linear increase in aggression with increases in testosterone. It is also possible that some component of the relation may be nonlinear. One possible nonlinear relation, for example, is that the effect of additional testosterone decreases with increasing doses, with most of the difference occurring at the lower ends of the scale. Alternatively, small amounts of testosterone may have little effect, and changes in aggression may only be observed after high doses are introduced. If the researchers had planned analysis of the linear and nonlinear effects of dose, the orthogonal planned contrasts could be done even though the main effect of dose was not significant.

Computations for contrasts. 
The analysis for the linear trend in the dose effect is shown in Box 9.11. After means and ns are collapsed across gender, the analysis is identical to that performed for the single-factor design (note that n in the formula for SS L is equal to 18, the sum of 9 males plus 9 females). As illustrated later, a linear contrast for the main effect could also have

57 Main Effect Comparisons 9.15 been computed using the 8 cell means and duplicating the linear coefficients for Males and Females. SS L would then have been computed using n ab = 9, which would produce the same SS. Observe that the linear effect of dose is marginally significant (p =.073), even though the main effect of dose did not approach significance (p =.175 in Box 9.10). This occurs because the Linear effect with df = 1 accounts for 44.1 units of variability (its SS), out of the total SS Dose = 67.72, which had df = 4-1 = 3. Dividing 44.1 by 1 results in a much larger numerator for the F test (44.1 vs ). The remaining units of M F y Dose c Linear L Linear = c j y Dose = = 7.0 n Dose = 18 = SS Linear = nl 2 Linear/ c² = ² / 20 = 44.1 F Linear = 44.1 / = 3.33 p =.073 t Linear = 3.33 = Box Contrast for Linear Effect of Dose. variability( = 23.62) would be due to the quadratic and cubic effects. Also keep in mind that the.073 is based on a two-tailed test (i.e., aggression either increases or decreases) and therefore could be divided in half for the probability of a linear increase in aggression (i.e., p =.0365). This analysis shows the benefits of planning statistical analyses carefully to avoid both inappropriate conclusions about the effects or noneffects of variables and the theoretical errors associated with such conclusions. Manova and the Aggression Study The Dose effect was not significant (p =.175) in the standard analysis. Because DOSE is an ordered variable, however, trend or polynomial analysis is appropriate and might produce a significant effect for one of its components. As just noted, the SS Dose of units was divided by df Dose = 3 in the standard ANOVA. If most of the SS Dose is due to the linear, quadratic, or cubic effect, then the F corresponding to one of these contrasts could be significant. 
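The hand computation of the linear trend described earlier in this section can be checked with a few lines of Python. This is a minimal sketch rather than part of the original SPSS analyses. The function and variable names are illustrative, the coefficients -3, -1, 1, 3 are assumed to be the standard linear trend coefficients for four equally spaced levels (they are consistent with the divisor of 20 used in the text), and the eight-cell version assumes equal cell sizes, so that summing over both genders doubles the collapsed contrast.

```python
import math

# Assumed linear trend coefficients for four equally spaced dose levels.
LINEAR = [-3, -1, 1, 3]


def contrast_ss(contrast_value, n, coeffs):
    """Sum of squares for a contrast: n * L**2 / sum(c**2)."""
    return n * contrast_value ** 2 / sum(c ** 2 for c in coeffs)


# Values taken from the text.
L_collapsed = 7.0   # linear contrast on the four collapsed dose means
n_per_dose = 18     # 9 males + 9 females at each dose
n_per_cell = 9
ss_dose = 67.72     # omnibus SS for Dose, df = 3
f_linear = 3.33     # reported F for the linear contrast

# Route 1: collapsed means, n = 18, four coefficients.
ss_collapsed = contrast_ss(L_collapsed, n_per_dose, LINEAR)

# Route 2: eight cell means, n = 9, coefficients duplicated for M and F.
# With equal n, the contrast over the eight cells is twice the collapsed one.
L_cells = 2 * L_collapsed
ss_cells = contrast_ss(L_cells, n_per_cell, LINEAR * 2)

# Numerators of the two F tests: focused (df = 1) versus omnibus (df = 3).
ms_linear = ss_collapsed / 1
ms_dose = ss_dose / 3

# Variability left for the quadratic and cubic components.
ss_remaining = ss_dose - ss_collapsed

# With one numerator df, t is the square root of F.
t_linear = math.sqrt(f_linear)

print(round(ss_collapsed, 2), round(ss_cells, 2))
print(round(ms_linear, 2), round(ms_dose, 2))
print(round(ss_remaining, 2), round(t_linear, 2))
```

Both routes give the same sum of squares, which is the point made in the text about using either the collapsed means with n = 18 or the eight cell means with n = 9.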
The trend analysis can be done in MANOVA using either special contrast coefficients or the subcommand /CONTRAST(factor) = POLYNOMIAL, which requests polynomial contrasts for the specified variable (DOSE in the present example). As noted earlier, there are three ways to obtain the statistical results for the three contrasts from MANOVA.

The default MANOVA approach to polynomial contrasts is illustrated in the box captioned "MANOVA and Polynomial Contrasts." Polynomial contrasts are requested and their significance levels are reported as ts following the ANOVA summary table. Parameters are listed under the overall effect to which they contribute (Gender, Dose, Gender by Dose) and numbered sequentially as Parameters 2 to 8 (Parameter 1 is the constant or grand mean). Parameters 3, 4, and 5 correspond to the linear, quadratic, and cubic components for the main effect of Dose. Parameters 6, 7, and 8 reflect the interaction, and are discussed in the next chapter.

    MANOVA aggr BY gend(1 2) dose(0 3)
      /CONTR(dose) = POLY.

    (Output: ANOVA summary table with rows WITHIN CELLS, GEND, DOSE, GEND BY DOSE, (Model), and (Total); R-Squared = .409, Adjusted R-Squared = .345; then "Estimates for AGGR" with individual univariate .9500 confidence intervals, listing Coeff., Std. Err., t-value, Sig. t, and the lower and upper 95% CL for the GEND, DOSE, and GEND BY DOSE parameters.)

Box. MANOVA and Polynomial Contrasts.

Note that the t for the linear effect of Dose is 1.824, which agrees with the value calculated by hand for the linear contrast. The p associated with this t is .073. Note as well that the actual value for the linear contrast in Box 9.11, 1.5652, does not equal our computed value of 7.00. This is because MANOVA uses normalized coefficients for the polynomial (and some other built-in) contrasts; in fact, 7.00/SQRT(20) = 1.5652, where $20 = (-3)^2 + (-1)^2 + 1^2 + 3^2$. Because 1.5652 is a normalized contrast, $SS_{Linear} = 18 \times 1.5652^2 = 44.10$, the value calculated earlier and reported in various SPSS printouts. Normalized coefficients eliminate any

contribution of the magnitude of the contrast coefficients to the contrast because the sum of the normalized coefficients squared (i.e., the divisor in our formula for $SS_L$) equals 1.00; that is, $(-.6708)^2 + (-.2236)^2 + .2236^2 + .6708^2 = .45 + .05 + .05 + .45 = 1.00$.

Box 9.13 shows a second MANOVA approach to contrasts and the resulting analysis. As well as the CONTRAST(dose) = POLYNOMIAL subcommand, the DESIGN command requests the three single df contrasts for the Dose effect, referred to as DOSE(1), DOSE(2), and DOSE(3), rather than the default main effect of DOSE with df = 3. Given these commands, SPSS partitions the Dose effect into linear, quadratic, and cubic components. These contrasts can then be identified on the DESIGN statement as dose(1) for the linear, dose(2) for the quadratic, and dose(3) for the cubic, each with a corresponding SS, df = 1, an F, and a p value.

    MANOVA aggr BY gend(1 2) dose(0 3)
      /CONTRAST(dose) = POLYNOMIAL
      /DESIGN gend dose(1) dose(2) dose(3) gend BY dose

    (Output: ANOVA summary table with rows WITHIN+RESIDUAL, GEND, DOSE(1), DOSE(2), DOSE(3), GEND BY DOSE, (Model), and (Total); R-Squared = .409.)

Box 9.13. Explicit Requests for Polynomial Contrast Fs.

The results here agree with earlier computations. The linear effect of dose is marginally significant, having the same p value as the corresponding t in the default MANOVA output. In fact, SPSS would report those same t-tests given the commands in Box 9.13, but those results have been edited out. There are several benefits to the additional information provided in the ANOVA summary table, as opposed to the default t-test output. First, it is now possible to see explicitly the partitioning of $SS_{Dose}$ into the linear, quadratic, and cubic components. That is, $SS_{Dose} = SS_{Linear} + SS_{Quadratic} + SS_{Cubic} = 67.72$, the omnibus value reported earlier. Second, each of the SSs can be used to calculate an $\eta^2$ for each effect. For example, dividing $SS_{Linear}$ by $SS_{Total}$ gives $\eta^2$ for the linear effect, $\eta^2_{Linear} = 44.10 / SS_{Total} = .031$.

Box 9.14 demonstrates a third way to obtain significance tests for the polynomial contrasts. The PRINT = SIGNIF(SINGLEDF) subcommand requests SPSS to partition multiple df effects into single df effects and report the resulting SSs and Fs.

    MANOVA aggr BY gend(1 2) dose(0 3)
      /PRINT = CELLINFO(MEANS) SIGNIF(SINGLEDF)
      /CONTRAST(dose) = POLYNOMIAL

    (Output: ANOVA summary table with rows WITHIN CELLS, GEND, DOSE followed by its 1ST, 2ND, and 3RD Parameter, GEND BY DOSE followed by its 1ST, 2ND, and 3RD Parameter, (Model), and (Total).)

Box 9.14. SINGLEDF Keyword and Polynomial Contrasts.

The ANOVA table now includes three additional lines for the main effect of Dose, as well as for the Gender by Dose interaction. SPSS reports the three additional effects as the 1st, 2nd, and 3rd Parameters. These three parameters correspond to the linear, quadratic, and cubic contrasts (each of which has one df) because the CONTRAST command specified polynomial contrasts. As in earlier output, only the linear component approaches significance (p = .073, two-tailed). If different contrasts were specified, parameters 1, 2, and 3 would no longer correspond to polynomial effects. No DESIGN statement is needed because SPSS does a factorial analysis by default and it is the main effect of DOSE that is of primary interest here. The 1st, 2nd, and 3rd parameters under GEND BY DOSE represent the linear, quadratic, and cubic components of the interaction and are discussed in the next chapter.

The equivalence of the default t test analysis to the preceding ANOVA analyses can be demonstrated by comparing p values, by squaring ts (e.g., $t_{DLin}^2 = 1.824^2 = 3.33 = F_{DLin}$), or by

computing SSs from the coefficients for the contrasts. To compute SSs, it is important to recall that SPSS uses normalized contrast coefficients to produce the contrast coefficients in the printouts. Therefore, $nc^2$ gives the appropriate SSs; for example, $nc_{DLin}^2 = 18 \times 1.5652^2 = 44.10 = SS_{DoseLinear}$, where 18 = 2 x 9 (i.e., number of levels of the gender factor times number of subjects per cell). Some "trial and error" may be necessary to find the correct n and SSs, so procedures that explicitly produce SSs are preferred if one wants to calculate $\eta^2$, Fs, or other statistics based on SSs.

One interesting feature of the results in these analyses is that the linear component of the main effect for DOSE approached significance (p = .073), even though the omnibus F was not even close to significance (p = .175). This again emphasizes how a focused contrast with one degree of freedom is more sensitive to certain effects than the omnibus F with df > 1.

The preceding trend or polynomial analysis could also have been performed using explicit polynomial coefficients; that is, /CONTRAST(dose) = SPECIAL( ). The final results would have been the same (except perhaps for the actual value of the contrast using integer rather than normalized coefficients), but the POLYNOMIAL keyword saves the effort of determining or finding the appropriate coefficients. For many contrasts other than polynomials, however, it will be necessary to explicitly state contrasts using the SPECIAL keyword. The parameters in the output will correspond to the contrasts in the order that they are specified in the CONTRAST command.

GLM and the Aggression Study

Box 9.15 shows the corresponding analyses using GLM. The p values again accord with previous values, and the coefficients agree with previously reported values using normalized coefficients. GLM does not report the actual test statistics, but these could be readily calculated by dividing the contrast estimate by its standard error; for example, 1.5652/.858 = 1.824, which agrees with ts reported previously.

    GLM aggr BY gend dose
      /CONTRAST(dose) = POLY.
    ...

    Custom Hypothesis Tests
    Contrast     Hypothesized Value   Std. Error   Sig.
    Linear       0                    .858         .073
    Quadratic    0                    .858         .479
    Cubic        0                    .858         .263

    (The output also lists a Contrast Estimate for each contrast and a test table with Contrast and Error rows and columns Sum of Squares, df, Mean Square, F, and Sig.)

Box 9.15. Polynomial Contrasts for Aggression Study Using GLM.

The analysis presented in Box 9.15 could also have been initiated using the menus, as shown in Figure 9.2. The design has been specified, as described previously, and clicking on CONTRASTS brings up the top dialogue box. The DOSE variable has been selected, and the POLYNOMIAL option is highlighted. Clicking CHANGE will replace the NONE in parentheses following DOSE with POLYNOMIAL. CONTINUE and OK will run the analysis, which will duplicate the results presented in Box 9.15.

Figure 9.2. Menu Approach to Contrasts in GLM.

One shortcoming of the preceding GLM analyses is the failure to obtain SSs for each component of the polynomial contrasts. Without an SS for the linear component, for example, it would not be possible to determine a measure of the strength of that component (i.e., $SS_{Linear} / SS_{Total}$). There are several ways to obtain the SSs for each contrast with GLM, although some involve somewhat obscure commands. These ways essentially correspond to approaches that were introduced for planned comparisons with single factor designs.
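When only a normalized contrast estimate and its standard error are printed, the t statistic and the contrast SS can still be recovered by hand, as the text describes. The following Python sketch illustrates that arithmetic under stated assumptions: the names are illustrative, the integer coefficients -3, -1, 1, 3 are assumed to be the linear trend coefficients, and the normalized estimate is taken to be 7.00 divided by the square root of 20, as in the MANOVA discussion. It is not a substitute for the GLM commands that produce SSs directly.

```python
import math


def normalize(coeffs):
    """Scale contrast coefficients so that their squares sum to 1."""
    length = math.sqrt(sum(c * c for c in coeffs))
    return [c / length for c in coeffs]


# Assumed integer linear trend coefficients for four dose levels.
linear = [-3, -1, 1, 3]
unit_linear = normalize(linear)

# Values taken from the text.
L_integer = 7.0     # contrast computed with integer coefficients
n_per_dose = 18     # 2 levels of gender x 9 subjects per cell
std_error = 0.858   # standard error printed by GLM

# The normalized contrast is the integer contrast divided by sqrt(sum c**2).
L_normalized = L_integer / math.sqrt(sum(c * c for c in linear))

# Because the normalized coefficients have a squared sum of 1,
# the contrast SS reduces to n times the squared estimate.
ss_linear = n_per_dose * L_normalized ** 2

# The test statistic is the estimate divided by its standard error,
# and the corresponding single-df F is the square of that t.
t_linear = L_normalized / std_error
f_linear = t_linear ** 2

print([round(c, 4) for c in unit_linear])
print(round(L_normalized, 4), round(ss_linear, 2))
print(round(t_linear, 3), round(f_linear, 2))
```

Dividing the recovered SS by $SS_{Total}$ would then give the strength measure mentioned above, once the total SS is read from the ANOVA summary table.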
