MSc Business Administration
Research Methodology: Tools
Applied Data Analysis (with SPSS)

Lecture 09: Introduction to Analysis of Variance (ANOVA)
April 2014
Prof. Dr. Jürg Schwarz
Lic. phil. Heidi Bruderer Enzler

Slide 2
Contents

Aims of the Lecture ... 3
Typical Syntax ... 4
Introduction ... 5
  Example ... 5
Outline ... 9
Concepts of Analysis of Variance (ANOVA) ... 10
  Key Steps in Analysis of Variance ... 10
  Designs of ANOVA ... 11
  Sum of Squares ... 12
  Two-Way ANOVA ... 16
  Prerequisites of ANOVA ... 21
ANOVA with SPSS: Two Detailed Examples ... 22
  One-way ANOVA ... 22
  Two-way ANOVA ... 31

Slide 3
Aims of the Lecture

- You will understand the key steps in conducting an analysis of variance.
- You will understand the concept of sum of squares.
- You will understand the concept of multiple testing.
- You will understand the concept of interaction in a two-way analysis of variance.
- You can conduct an analysis of variance with SPSS.

In particular, you will know how to
- interpret the output:
  - significance of the overall model and of the factors
  - adjusted R squared and partial eta squared
  - interaction
- describe the output

Slide 4
Typical Syntax

Boxplot of the dependent variable split by experien:

    EXAMINE VARIABLES= BY experien
      /PLOT=BOXPLOT/STATISTICS/NOTOTAL.

Analysis of variance of the dependent variable by experien and position:

    UNIANOVA BY experien position
      /METHOD=SSTYPE(3)
      /INTERCEPT=INCLUDE
      /POSTHOC=experien(BONFERRONI)
      /PLOT=PROFILE(experien*position position*experien)
      /PRINT=DESCRIPTIVE
      /CRITERIA=ALPHA(.05)
      /DESIGN=experien position experien*position.

Callouts on the slide: variables in the model, post hoc test (/POSTHOC), profile plots (/PLOT).

Introduction

Slide 5
Example

Research in human resource management: survey of nurse salaries in hospitals

Nurse Salary [CHF/h]

                 Level of Experience
                 1       2       3       All
    All          36.-    38.-    42.-    39.-  (grand mean)

Data
- Subsample of n = 96 nurses
- Among other variables: work experience (3 levels) and salary (hourly wage in CHF/h)

Typical questions
- Does experience have an effect on the level of salary?
- Are the results only due to chance?
- What is the relation between work experience and salary?

Slide 6
Boxplot

[Figure: boxplot of salary by level of experience; the dashed line marks the grand mean.]

The boxplot indicates that salary may differ significantly depending on the level of experience.

Slide 7
Questions

Question in everyday language:
Does work experience have an effect on salary?

Research question:
Is there a relation between work experience and salary?
What kind of model is suitable for the relation?
Is analysis of variance the right model?

Statistical question:
Forming hypotheses
$H_0$: "No model" (= no significant factors)
$H_A$: "Model" (= significant factors)
Can we reject $H_0$?

Solution
Linear model with salary as the dependent variable ($y_{gk}$ = wage of nurse k in group g):

$y_{gk} = \bar{y} + \alpha_g + \varepsilon_{gk}$

$\bar{y}$ = grand mean
$\alpha_g$ = effect of group g
$\varepsilon_{gk}$ = random term

Slide 8
"How-to" in SPSS

Scales
- Dependent variable: metric
- Independent variable(s): categorical (factors); some of them may be metric (called covariates)

SPSS
Analyze > General Linear Model > Univariate...

Results
- Overall model significant ("Corrected Model": F(2, 93) = 46.193, p = .000).
- experien significant

Example interpretation:
There is a main effect of experience (levels 1, 2, 3) on salary, F(2, 93) = 46.193, p = .000.
The value of Adjusted R Squared = .488 shows that 48.8% of the variance in salary around the grand mean can be predicted by the model (here by experien).
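
As a plausibility check on the reported values, R squared and adjusted R squared can be recomputed from the F statistic and its degrees of freedom alone. The following is a minimal sketch in Python (not SPSS), assuming the standard identities for the overall F-test of a linear model; the function names are illustrative and not part of any SPSS output.

```python
def r_squared_from_f(f_value: float, df_model: int, df_error: int) -> float:
    """R^2 implied by the overall F-test: F = (R^2 / df_model) / ((1 - R^2) / df_error)."""
    return (df_model * f_value) / (df_model * f_value + df_error)


def adjusted_r_squared(r2: float, n: int, df_model: int) -> float:
    """Adjusted R^2 penalises R^2 for the number of estimated effects."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - df_model - 1)


# Values reported on the slide: F(2, 93) = 46.193 with n = 96 nurses.
r2 = r_squared_from_f(46.193, df_model=2, df_error=93)
adj_r2 = adjusted_r_squared(r2, n=96, df_model=2)

print(f"R squared = {r2:.3f}, adjusted R squared = {adj_r2:.3f}")
```

The adjusted value reproduces the .488 quoted in the example interpretation.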

Outline

Slide 9
Basic situation
Given: one metric dependent variable and one or more independent variables that are either categorical (called factors) or metric (called covariates).
Task: find a relationship between these variables.

Analysis of Variance (ANOVA)
ANOVA tests statistically whether or not the means of several groups are all equal.
ANOVA therefore generalizes the two-sample t-test to more than two groups.

Analysis of variance
- divides the observed variance into components due to different factors
- uses methods of inferential statistics to estimate the parameters

ANOVA differs from regression analysis:
- the independent variables (called factors) are categorical
- the interaction term is calculated automatically

Concepts of Analysis of Variance (ANOVA)

Slide 10
Key Steps in Analysis of Variance

1. Design of experiments
   ANOVA is typically used for analyzing the findings of experiments.
   One-way ANOVA, repeated measures ANOVA, multi-factorial ANOVA (two or more factor analysis of variance), mixed ANOVA
2. Calculating differences and sums of squares
   Differences between group means, individual values and the grand mean are squared and summed up.
   This leads to the fundamental equation of ANOVA.
   The test statistic for the significance test is calculated from the means of the sums of squares.
3. Prerequisites
   The data are independent.
   The variables are normally distributed.
   Homogeneity of variance between groups
4. Verification of the model and the factors
   Is the overall model significant (F-test)?
   Are the factors significant?
   Are the prerequisites met?
5. Checking measures
   Adjusted R squared / partial eta squared

Slide 11
Designs of ANOVA

- One-way ANOVA: one factor analysis of variance (this Lecture 09)
  1 dependent variable and 1 independent factor
- Multi-factorial ANOVA: two or more factor analysis of variance (this Lecture 09)
  1 dependent variable and 2 or more independent factors
- MANOVA: multivariate analysis of variance
  Extension of ANOVA used to include more than one dependent variable
- Repeated measures ANOVA (see Lecture 10)
  1 dependent variable, measured repeatedly under different conditions
- ANCOVA: analysis of covariance (see Lecture 10)
  The model includes a so-called covariate (metric variable)
- MANCOVA: multivariate analysis of covariance
- Mixed-design ANOVA is possible (e.g. two-way ANOVA with repeated measures)

Sum of Squares

Slide 12
Step by step

Survey on hospital nurse salaries: salaries differ by level of experience.

[Figure: individual nurse salaries [CHF/h] by level of experience (1, 2, 3). Marked values: group mean $\bar{y}_1$ = 35.9, mean of all nurses $\bar{y}$ = 38.6, mean of experience level 3 $\bar{y}_3$ = 41.6, and $y_{3i}$ = 42.7, the salary of the i-th nurse with experience level 3.]

Legend
- A + B: total variation from the mean of all nurses
- A: part of the variation due to experience level
- B: random part of the variation

Guess: What if $\bar{y}_1 \approx \bar{y}_2 \approx \bar{y}_3$?

Slide 13
Calculation of group effects

Linear model with salary as the dependent variable:

$y_{gk} = \bar{y} + \alpha_g + \varepsilon_{gk}$

$\bar{y}$ = grand mean
$\alpha_g$ = effect of group g (A)
$\varepsilon_{gk}$ = random term (B)

$y_{1k} = \bar{y} + (\bar{y}_1 - \bar{y}) + \varepsilon_{1k} = 38.6 + (35.9 - 38.6) + \varepsilon_{1k} = 38.6 - 2.7 + \varepsilon_{1k}$
$y_{2k} = \bar{y} + (\bar{y}_2 - \bar{y}) + \varepsilon_{2k} = 38.6 + (38.4 - 38.6) + \varepsilon_{2k} = 38.6 - 0.2 + \varepsilon_{2k}$
$y_{3k} = \bar{y} + (\bar{y}_3 - \bar{y}) + \varepsilon_{3k} = 38.6 + (41.6 - 38.6) + \varepsilon_{3k} = 38.6 + 3.0 + \varepsilon_{3k}$

Slide 14
Basic idea of ANOVA

If the group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ differ clearly, then $SS_{between}$ is large compared to $SS_{within}$.

The total sum of squared differences $SS_{total}$ is separated into two parts (SS is short for Sum of Squares):

- $SS_{between}$: part of the sum of squared differences due to groups ("between groups", treatments)
  (here: between levels of experience)
- $SS_{within}$: part of the sum of squared differences due to randomness ("within groups", also $SS_{error}$)
  (here: within each experience group)

Fundamental equation of ANOVA:

$\sum_{g=1}^{G} \sum_{k=1}^{K_g} (y_{gk} - \bar{y})^2 = \sum_{g=1}^{G} K_g (\bar{y}_g - \bar{y})^2 + \sum_{g=1}^{G} \sum_{k=1}^{K_g} (y_{gk} - \bar{y}_g)^2$

$SS_{total} = SS_{between} + SS_{within}$

g: index for groups from 1 to G (here: G = 3 levels of experience)
k: index for individuals within each group from 1 to $K_g$
(here: $K_1 = K_2 = K_3 = 32$, $K_{total} = K_1 + K_2 + K_3 = 96$ nurses)
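
The fundamental equation can be checked numerically. The following is a minimal sketch in Python on a small set of made-up wages (hypothetical values, not the survey data); the variable names are illustrative.

```python
from statistics import mean

# Hypothetical hourly wages [CHF/h] per experience level (NOT the survey data).
groups = {
    1: [34.0, 36.0, 37.5],
    2: [37.0, 38.5, 40.0],
    3: [40.0, 41.5, 43.0],
}

all_values = [y for values in groups.values() for y in values]
grand_mean = mean(all_values)
group_means = {g: mean(values) for g, values in groups.items()}

# Left-hand side: squared deviations of every value from the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

# First term on the right: group size times squared deviation of the group mean.
ss_between = sum(
    len(values) * (group_means[g] - grand_mean) ** 2
    for g, values in groups.items()
)

# Second term on the right: squared deviations of values from their own group mean.
ss_within = sum(
    (y - group_means[g]) ** 2
    for g, values in groups.items()
    for y in values
)

print(f"SS_total   = {ss_total:.3f}")
print(f"SS_between = {ss_between:.3f}")
print(f"SS_within  = {ss_within:.3f}")
print(f"difference = {ss_total - (ss_between + ss_within):.1e}")
```

Up to floating-point rounding, the difference in the last line is zero for any data set, which is exactly what the fundamental equation states.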

Slide 15
Significance testing of the model

If the group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ differ clearly, then $MS_b$ is large compared to $MS_w$.

The test statistic F for significance testing is computed as the ratio of the means of the sums of squares:

$MS_t = \frac{SS_t}{K_{total} - 1}$   (mean of $SS_{total}$)
$MS_b = \frac{SS_b}{G - 1}$   (mean of $SS_{between}$)
$MS_w = \frac{SS_w}{K_{total} - G}$   (mean of $SS_{within}$)

Calculating the test statistic F and significance testing for the global model:

$F = \frac{MS_b}{MS_w}$

F follows an F-distribution with (G - 1) and ($K_{total}$ - G) degrees of freedom.

The F-test tests the hypothesis that the group means are equal:

$H_0: \bar{y}_1 = \bar{y}_2 = \bar{y}_3$
$H_A: \bar{y}_i \neq \bar{y}_j$ for at least one pair ij

Two-Way ANOVA

Slide 16
Research in human resource management: survey of nurse salaries

Nurse Salary [CHF/h]

                 Level of Experience
    Position     1       2       3       All
    Office       35.-    37.-    39.-    37.-
    Hospital     37.-    40.-    44.-    40.-
    All          36.-    38.-    42.-    39.-

Now two factors are in the design:
- Work experience (level of experience 1-3): experien
- Work position (position in office or hospital): position

Typical questions
- Do work position and experience have an effect on salary? (main effects)
- What "interaction" exists between work position and experience? (interaction effects)

Slide 17
Main effects

The direct effect of an independent variable on the dependent variable is called a main effect.

In the example:
- The main effect of experien reveals that the nurses' salaries depend on their level of professional experience.
- The main effect of position reveals that the nurses' salaries depend on whether they work in the office or the hospital.

Profile plots are used as visualization:

[Figure: two profile plots with a y-axis from 0 to 45. Left: main effect of experien (x-axis: experien 1, 2, 3). Right: main effect of position (x-axis: position office, hospital).]

If the profile plot shows a (nearly) horizontal line, the main effect in question is presumably not significant.
(Attention: SPSS cuts off the lower area of the graph; the y-axis often does not start at 0!)

Slide 18
Interaction effects

An interaction between experience and position means that the two variables do not act independently of each other on the dependent variable.
The independent variables have a complex influence on the dependent variable.
The factors do not just function additively but act together in a different manner.

An interaction means that the effect of one factor depends on the value of another factor.

[Diagram: experience (factor A), position (factor B) and their interaction (factor A x B).]

Slide 19
Interaction effects

In the example, the interaction between experien and position means
- that the effect of work experience on salary is not the same for nurses who work in offices and for nurses who work in the hospital.
- that the difference in salary between nurses working in the hospital and nurses working in the office depends on the level of experience.

Profile plots:

[Figure: two profile plots with a y-axis from 0 to 45. Left: separate lines for position (hospital, office) over experien (1, 2, 3). Right: separate lines for experien (3, 2, 1) over position (office, hospital).]

If there is an interaction, the lines are not parallel. The more the lines deviate from being parallel, the more likely an interaction is.
If there is no interaction, the lines are parallel.

Slide 20
Sum of Squares (with interaction)

Again:
$SS_{total} = SS_{between} + SS_{within}$

With:
$SS_{between} = SS_{Experience} + SS_{Position} + SS_{Experience \times Position}$

It follows:
$SS_{total} = (SS_{Experience} + SS_{Position} + SS_{Experience \times Position}) + SS_{within}$

where $SS_{Experience \times Position}$ is the interaction of both factors acting simultaneously.
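
The "lines are not parallel" criterion can also be read off the cell means directly. The following is a minimal sketch in Python, using the rounded cell means from the table on slide 16; the variable names are illustrative, and the check is purely descriptive (whether the interaction is significant is decided by the F-test, not by this comparison).

```python
# Rounded cell means [CHF/h] from the two-way table on slide 16.
cell_means = {
    "office":   {1: 35.0, 2: 37.0, 3: 39.0},
    "hospital": {1: 37.0, 2: 40.0, 3: 44.0},
}

# Gap between the two profile lines at each level of experience.
gaps = {
    level: cell_means["hospital"][level] - cell_means["office"][level]
    for level in (1, 2, 3)
}

# Parallel lines would give the same gap at every level (purely additive effects).
lines_parallel = max(gaps.values()) - min(gaps.values()) < 1e-9

for level, gap in gaps.items():
    print(f"experience level {level}: hospital - office = {gap:.1f} CHF/h")
print("lines parallel:", lines_parallel)
```

The gap grows with experience, so the profile lines fan out instead of running in parallel.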

Slide 21
Prerequisites of ANOVA

0. Robustness
   ANOVA is relatively robust against violations of its prerequisites.
1. Sampling
   Random sample, no treatment effects (more in Lecture 10)
   A well-designed study avoids violating this assumption.
2. Distribution of residuals
   Residuals (= error) are normally distributed.
   Correction: transformation
3. Homogeneity of variances
   Residuals (= error) have constant variance (more in Lecture 10).
   Correction: weight variances
4. Balanced design
   Same sample size in all groups
   Correction: weight means
   SPSS automatically corrects unbalanced designs by using Sum of Squares "Type III".
   Syntax: /METHOD = SSTYPE(3)

ANOVA with SPSS: Two Detailed Examples

Slide 22
One-way ANOVA

SPSS: Analyze > General Linear Model > Univariate...

Slide 23
SPSS output ANOVA: Tests of Between-Subjects Effects I

- Significant overall model (called "Corrected Model")
- Significant constant (called "Intercept")
- Significant variable experien

Example interpretation for the main effect of experien:
There is a main effect of experience (levels 1, 2, 3) on salary, F(2, 93) = 46.193, p = .000.
The value of Adjusted R Squared (.488) shows that 48.8% of the variance in salary around the grand mean can be predicted by the model (here: variable experien).

Slide 24
SPSS output ANOVA: Tests of Between-Subjects Effects II

Allocation of sums of squares to terms in the SPSS output:
- "Grand mean"
- $SS_{between}$
- $SS_{within}$ (= $SS_{error}$)
- $SS_{total}$

$SS_{between}$ reflects the sum of squares of all factors in the model.
In this case (one-way analysis): $SS_{between} = SS_{experien}$

Slide 25
Partial Eta Squared (partial $\eta^2$)

Partial Eta Squared compares the amount of variation explained by a particular factor (all other variables fixed) to the amount of variation that is not explained by any other factor in the model.
This means that we are only considering variation that is not explained by other variables in the model.
Partial $\eta^2$ indicates what percentage of this variation is explained by a variable.

$\text{Partial } \eta^2 = \frac{SS_{Effect}}{SS_{Effect} + SS_{Error}}$

In the case of one-way ANOVA: partial $\eta^2$ is the proportion of the corrected total variation that is explained by the model (= $R^2$).

Example: Experience explains 49.8% of the previously unexplained variation.

Note: The values of partial $\eta^2$ do not sum up to 100%! (hence "partial")

Slide 26
"Intercept" in SPSS

In the case of ANOVA, "Intercept" in SPSS refers to the grand mean.
If the F-test for the grand mean is significant, this indicates that the grand mean differs significantly from 0.

In our example, partial $\eta^2$ is .996 and thus very large.
This indicates that the "grand mean" is large compared to the other variances.

But: The focus of ANOVA lies on group differences. The grand mean itself is secondary.
Thus partial $\eta^2$ of the "Intercept" is not interpreted.

Slide 27
Parameter estimates

SPSS artificially sets the effect of one group to 0 (default: last group), so that this group serves as the reference.

"SPSS coding" (as seen from a reference group, here experien = 3):
$y_{1k} = 41.6 - 5.7 + \varepsilon_{1k}$
$y_{2k} = 41.6 - 3.2 + \varepsilon_{2k}$
$y_{3k} = 41.6 + 0.0 + \varepsilon_{3k}$

Coding as on slide 13 (as seen from the grand mean):
$y_{1k} = 38.6 - 2.7 + \varepsilon_{1k}$
$y_{2k} = 38.6 - 0.2 + \varepsilon_{2k}$
$y_{3k} = 38.6 + 3.0 + \varepsilon_{3k}$

To get from the SPSS coding to the coding on slide 13, add 3.0 to each group effect and subtract 3.0 from the constant (41.6 - 3.0 = 38.6).

Slide 28
Multiple testing: Post hoc comparisons I

If $H_0$ is rejected at the 5% level, we conclude that not all group means are equal, accepting a 5% risk of wrongly rejecting a true $H_0$.

$H_0: \bar{y}_1 = \bar{y}_2 = \bar{y}_3$
$H_A: \bar{y}_i \neq \bar{y}_j$ for at least one pair ij

Which of the groups are different?
Why not simply compare the means pairwise?

[Cartoon: "Dr. Sorglos thinks the risk of falling is only 5%!"]

Example: In the case of a rope with 20 knots, each knot has a probability of failure of $\alpha$ = 5%.
All knots together, however, have a probability of failure of $1 - (1 - 0.05)^{20} = 0.64$.
The risk of a deadly fall therefore is 64%!
In order to keep this risk at the desired 5% level, the probability of failure of each knot may not exceed $\alpha_B = \alpha / \text{number of knots} = 5\% / 20 = 0.25\%$.

Cartoon: Dubben, H.-H. (2006): Der Hund, der Eier legt: Erkennen von Fehlinformation... 6. Auflage, Rowohlt, Hamburg.
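
The arithmetic of the rope example can be reproduced in a few lines. The following is a minimal sketch in Python; like the slide, it assumes that the knots fail independently of each other, and the variable names are illustrative.

```python
alpha = 0.05   # probability of failure of a single knot (a single test)
n_knots = 20   # number of knots (number of tests)

# Probability that at least one of the knots fails, assuming independence.
risk_uncorrected = 1 - (1 - alpha) ** n_knots

# Bonferroni: divide the tolerated overall risk evenly among the knots.
alpha_per_knot = alpha / n_knots
risk_corrected = 1 - (1 - alpha_per_knot) ** n_knots

print(f"overall risk without correction: {risk_uncorrected:.2f}")
print(f"allowed risk per knot:           {alpha_per_knot:.4f}")
print(f"overall risk with correction:    {risk_corrected:.3f}")
```

With the corrected per-knot level, the overall risk stays just below the desired 5%.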

Slide 29
Multiple testing: Post hoc comparisons II

There are several methods for comparing the groups.
All methods are similar, however, in that they address the problem of multiple testing.

Example: Bonferroni correction
If k means are compared pairwise, it becomes necessary to conduct $n = k(k - 1)/2$ tests.
In order to keep the significance level the same for the entire family of tests, each test must be conducted using error probability $\alpha / n$.

Slide 30
Multiple testing: Post hoc comparisons III

- Groups 1 and 2 have a significant difference (p = .000)
- Groups 2 and 3 have a significant difference (p = .000)
- Groups 3 and 1 have a significant difference (p = .000)

As a comparison: a t-test with groups 1 and 2 as independent samples also produces p = .000.
But the precise p-values show that the t-test is too optimistic:
- Bonferroni adjusted test (groups 1 and 2): $p = 1.3 \times 10^{-4}$
- t-test (groups 1 and 2): $p = 4.2 \times 10^{-8}$
The p-value of the t-test is considerably lower.
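
For the three experience levels of the example, the number of pairwise tests and the Bonferroni level per test follow directly from the formula on slide 29. The following is a minimal sketch in Python with illustrative variable names; it only covers the counting and the division of alpha, not the post hoc tests themselves.

```python
from itertools import combinations
from math import comb

alpha = 0.05
levels = [1, 2, 3]  # experience levels from the example

# All pairwise comparisons between the k group means.
pairs = list(combinations(levels, 2))

# n = k (k - 1) / 2 tests, each run at error probability alpha / n.
n_tests = comb(len(levels), 2)
alpha_per_test = alpha / n_tests

print("pairs to compare:", pairs)
print("number of tests: ", n_tests)
print(f"alpha per test:   {alpha_per_test:.4f}")
```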

Slide 31
Two-way ANOVA

SPSS: Analyze > General Linear Model > Univariate...

Slide 32
Interaction

The interaction term between fixed factors is calculated by default in ANOVA.

Example interpretation (in addition to the other required descriptions):
There is also an interaction of experience and position on salary, F(2, 90) = 18.991, p = .000, partial $\eta^2$ = .297.
The interaction term experien * position explains 29.7% of the previously unexplained variance.
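
The reported partial eta squared can be cross-checked against the F statistic. The following is a minimal sketch in Python, assuming the identity that follows from F = (SS_Effect / df_effect) / (SS_Error / df_error) together with the definition of partial eta squared; the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared = SS_Effect / (SS_Effect + SS_Error), rewritten in terms of F."""
    return (df_effect * f_value) / (df_effect * f_value + df_error)


# Interaction experien * position as reported on the slide: F(2, 90) = 18.991.
eta_interaction = partial_eta_squared(18.991, df_effect=2, df_error=90)

print(f"partial eta squared (interaction) = {eta_interaction:.3f}")
```

The result agrees with the .297 shown in the SPSS output.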

Slide 33
Interaction

Do different levels of experience influence the impact of different levels of position differently?
Yes: if experience has the values 2 or 3, the influence of position is larger.

[Figure: profile plot with separate lines for office and hospital.]

Simplified: the lines are not parallel.
Interpretation: Experience is more important in hospitals than in offices.

Slide 34
More on interaction

[Figure: six profile plots over experien; each panel is labelled with "Main effect of experien", "Main effect of position" and "Interaction".]