Research Methodology: Tools

MSc Business Administration
Research Methodology: Tools
Applied Data Analysis (with SPSS)
Lecture 09: Introduction to Analysis of Variance (ANOVA)
April 2014
Prof. Dr. Jürg Schwarz
Lic. phil. Heidi Bruderer Enzler

Slide 2: Contents
Aims of the Lecture ... 3
Typical Syntax ... 4
Introduction ... 5
  Example ... 5
Outline ... 9
Concepts of Analysis of Variance (ANOVA) ... 10
  Key Steps in Analysis of Variance ... 10
  Designs of ANOVA ... 11
  Sum of Squares ... 12
  Two-Way ANOVA ... 16
  Prerequisites of ANOVA ... 21
ANOVA with SPSS: Two Detailed Examples ... 22
  One-way ANOVA ... 22
  Two-way ANOVA ... 31

Slide 3: Aims of the Lecture
You will understand the key steps in conducting an analysis of variance.
You will understand the concept of sum of squares.
You will understand the concept of multiple testing.
You will understand the concept of interaction in a two-way analysis of variance.
You can conduct an analysis of variance with SPSS. In particular, you will know how to
- interpret the output: significance of the overall model and of the factors, adjusted R squared and partial eta squared, interaction
- describe the output

Slide 4: Typical Syntax

  * Boxplot of the dependent variable split by experien.
  EXAMINE VARIABLES=[dependent variable] BY experien
    /PLOT=BOXPLOT
    /STATISTICS
    /NOTOTAL.

  * Analysis of variance of the dependent variable by experien and position.
  UNIANOVA [dependent variable] BY experien position
    /METHOD=SSTYPE(3)
    /INTERCEPT=INCLUDE
    /POSTHOC=experien(BONFERRONI)
    /PLOT=PROFILE(experien*position position*experien)
    /PRINT=DESCRIPTIVE
    /CRITERIA=ALPHA(.05)
    /DESIGN=experien position experien*position.

Annotations on the slide:
/DESIGN: variables in the model
/POSTHOC: post hoc test
/PLOT=PROFILE: profile plots

Slide 5: Introduction
Example
Research in human resource management: survey of nurse salaries in hospitals

Nurse salary [CHF/h]
Level of experience    1      2      3      All
All                    36.-   38.-   42.-   39.- (grand mean)

Data
Subsample of n = 96 nurses
Among other variables: work experience (3 levels), salary (hourly wage in CHF/h)

Typical questions
Does experience have an effect on the level of salary?
Are the results only due to chance?
What is the relation between work experience and salary?

Slide 6: Boxplot
[Figure: boxplot of salary by level of experience, with the grand mean marked]
The boxplot indicates that salary may differ significantly depending on the level of experience.

Slide 7: Questions
Question in everyday language: Does work experience have an effect on salary?
Research question: Is there a relation between work experience and salary? What kind of model is suitable for the relation? Is analysis of variance the right model?
Statistical question: forming hypotheses
H0: "No model" (= no significant factors)
HA: "Model" (= significant factors)
Can we reject H0?

Solution
Linear model with salary as the dependent variable ($y_{gk}$ = wage of nurse k in group g):

$y_{gk} = \bar{y} + \alpha_g + \varepsilon_{gk}$

$\bar{y}$ = grand mean
$\alpha_g$ = effect of group g
$\varepsilon_{gk}$ = random term

Slide 8: "How-to" in SPSS
Scales
Dependent variable: metric
Independent variable(s): categorical (factors); some of them may be metric (called covariates)

SPSS: Analyze > General Linear Model > Univariate...

Results
Overall model significant ("Corrected Model": F(2, 93) = 46.193, p = .000 in the SPSS output, i.e. p < .001).
experien significant. Example interpretation: There is a main effect of experience (levels 1, 2, 3) on salary, F(2, 93) = 46.193, p = .000.
The value of Adjusted R Squared = .488 shows that 48.8% of the variance in salary around the grand mean can be predicted by the model (here by experien).

Slide 9: Outline
Basic situation
Given: one metric dependent variable and one or more independent variables with categorical scales (called factors), some of them possibly with metric scales (called covariates).
Task: find a relationship between the characteristics.

Analysis of Variance (ANOVA)
ANOVA tests statistically whether or not the means of several groups are all equal. ANOVA therefore generalizes the two-sample t-test to more than two groups.

Analysis of variance
- divides the observed variance into components due to different factors
- uses inferential statistics methods to estimate the parameters

ANOVA differs from regression analysis in that
- the independent variables (called factors) are categorical
- the interaction term is calculated automatically

Slide 10: Concepts of Analysis of Variance (ANOVA)
Key Steps in Analysis of Variance
1. Design of experiments
ANOVA is typically used for analyzing the findings of experiments.
One-way ANOVA, repeated measures ANOVA, multi-factorial ANOVA (two or more factor analysis of variance), mixed ANOVA
2. Calculating differences and sum of squares
Differences between group means, individual values and the grand mean are squared and summed up. This leads to the fundamental equation of ANOVA.
The test statistic for the significance test is calculated from the means of the sums of squares.
3. Prerequisites
Data are independent
Normally distributed variables
Homogeneity of variance between groups
4. Verification of the model and the factors
Is the overall model significant (F-test)? Are the factors significant? Are the prerequisites met?
5. Checking measures
Adjusted R squared / partial eta squared

Slide 11: Designs of ANOVA
One-way ANOVA: one-factor analysis of variance (this Lecture 09)
  1 dependent variable and 1 independent factor
Multi-factorial ANOVA: two or more factor analysis of variance (this Lecture 09)
  1 dependent variable and 2 or more independent factors
MANOVA: multivariate analysis of variance
  Extension of ANOVA used to include more than one dependent variable
Repeated measures ANOVA (see Lecture 10)
  1 independent variable, but measured repeatedly under different conditions
ANCOVA: analysis of covariance (see Lecture 10)
  Model includes a so-called covariate (metric variable)
MANCOVA: multivariate analysis of covariance
Mixed-design ANOVA is also possible (e.g. two-way ANOVA with repeated measures)

Slide 12: Sum of Squares
Step by step
Survey on hospital nurse salaries: salaries differ by level of experience.
Guess: How do the group means $\bar{y}_1$, $\bar{y}_2$ and $\bar{y}_3$ relate to one another?

[Figure: individual nurse salaries (salary in CHF/h) plotted against level of experience (1, 2, 3). Marked values: $\bar{y}_1$ = 35.9 (mean of experience level 1), $\bar{y}$ = 38.6 (mean of all nurses), $\bar{y}_3$ = 41.6 (mean of experience level 3), $y_{3i}$ = 42.7 (salary of the i-th nurse with experience level 3)]

Legend
Points: individual nurse salaries
A + B: total variation from the mean of all nurses
A: part of the variation due to experience level
B: random part of the variation

Slide 13: Calculation of group effects
Linear model with salary as the dependent variable:

$y_{gk} = \bar{y} + \alpha_g + \varepsilon_{gk}$

$\bar{y}$ = grand mean
$\alpha_g$ = effect of group g (A)
$\varepsilon_{gk}$ = random term (B)

$y_{1k} = \bar{y} + (\bar{y}_1 - \bar{y}) + \varepsilon_{1k} = 38.6 + (35.9 - 38.6) + \varepsilon_{1k} = 38.6 - 2.7 + \varepsilon_{1k}$
$y_{2k} = \bar{y} + (\bar{y}_2 - \bar{y}) + \varepsilon_{2k} = 38.6 + (38.4 - 38.6) + \varepsilon_{2k} = 38.6 - 0.2 + \varepsilon_{2k}$
$y_{3k} = \bar{y} + (\bar{y}_3 - \bar{y}) + \varepsilon_{3k} = 38.6 + (41.6 - 38.6) + \varepsilon_{3k} = 38.6 + 3.0 + \varepsilon_{3k}$

Slide 14: Basic idea of ANOVA
If the group means differ ($\bar{y}_1 \neq \bar{y}_2 \neq \bar{y}_3$), then $SS_{between}$ is large relative to $SS_{within}$.

The total sum of squared differences $SS_{total}$ is separated into two parts (SS is short for Sum of Squares):
$SS_{between}$: part of the sum of squared differences due to groups ("between groups", treatments); here: between levels of experience
$SS_{within}$: part of the sum of squared differences due to randomness ("within groups", also $SS_{error}$); here: within each experience group

Fundamental equation of ANOVA:

$$\underbrace{\sum_{g=1}^{G} \sum_{k=1}^{K_g} (y_{gk} - \bar{y})^2}_{SS_{total}} = \underbrace{\sum_{g=1}^{G} K_g (\bar{y}_g - \bar{y})^2}_{SS_{between}} + \underbrace{\sum_{g=1}^{G} \sum_{k=1}^{K_g} (y_{gk} - \bar{y}_g)^2}_{SS_{within}}$$

g: index for groups from 1 to G (here: G = 3 levels of experience)
k: index for individuals within each group from 1 to $K_g$ (here: $K_1 = K_2 = K_3 = 32$, $K_{total} = K_1 + K_2 + K_3 = 96$ nurses)
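The fundamental equation can be checked numerically. The following is a minimal sketch in Python, assuming a small set of invented hourly wages (not the lecture's data set of 96 nurses); the variable names are illustrative only.

```python
# Hypothetical hourly wages (CHF/h) per experience level; NOT the lecture's data.
groups = {
    1: [34.0, 36.0, 35.5, 37.0],
    2: [38.0, 39.5, 37.5, 39.0],
    3: [41.0, 42.5, 40.5, 43.0],
}


def mean(values):
    return sum(values) / len(values)


all_values = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_values)

# Group effects alpha_g = group mean minus grand mean (part A of the variation).
effects = {g: mean(ys) - grand_mean for g, ys in groups.items()}

# The three sums of squares of the fundamental equation.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)
ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

print("effects:", effects)
print("SS_total:", ss_total)
print("SS_between + SS_within:", ss_between + ss_within)
```

Up to floating-point rounding, the last two printed values agree, which is exactly what the fundamental equation states.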

Slide 15: Significance testing of the model
If the group means differ ($\bar{y}_1 \neq \bar{y}_2 \neq \bar{y}_3$), then $MS_b$ is large relative to $MS_w$.
The test statistic F for significance testing is computed as a ratio of mean sums of squares:

$MS_t = \frac{SS_t}{K_{total} - 1}$ (mean of $SS_{total}$)
$MS_b = \frac{SS_b}{G - 1}$ (mean of $SS_{between}$)
$MS_w = \frac{SS_w}{K_{total} - G}$ (mean of $SS_{within}$)

Calculating the test statistic F and significance testing for the global model:

$F = \frac{MS_b}{MS_w}$

F follows an F-distribution with (G - 1) and ($K_{total}$ - G) degrees of freedom.
The F-test tests the hypothesis that the group means are equal:
$H_0: \bar{y}_1 = \bar{y}_2 = \bar{y}_3$
$H_A: \bar{y}_i \neq \bar{y}_j$ for at least one pair i, j

Slide 16: Two-Way ANOVA
Research in human resource management: survey of nurse salaries

Nurse salary [CHF/h]
             Level of experience
Position     1      2      3      All
Office       35.-   37.-   39.-   37.-
Hospital     37.-   40.-   44.-   40.-
All          36.-   38.-   42.-   39.-

Now two factors are in the design:
Work experience (level of experience 1-3): experien
Work position (position in office or hospital): position

Typical questions
Do work position and experience have an effect on salary? (main effects)
What "interaction" exists between work position and experience? (interaction effects)
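The F test for the one-way model (slide 15) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPSS procedure itself: the function name one_way_f and the toy data are invented for this example.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [y for group in groups for y in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Sums of squares as in the fundamental equation of ANOVA.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    df_between = len(groups) - 1               # G - 1
    df_within = len(all_values) - len(groups)  # K_total - G

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Toy data (hypothetical): three groups with means 2, 3 and 4.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_value, df1, df2 = one_way_f(toy_groups)
print(f"F({df1}, {df2}) = {f_value:.3f}")  # F(2, 6) = 3.000
```

The p-value then comes from the F-distribution with (df1, df2) degrees of freedom, for instance via scipy.stats.f.sf(f_value, df1, df2) if SciPy is available.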

Slide 17: Main effects
The direct effect of an independent variable on the dependent variable is called a main effect.
In the example:
The main effect of experien reveals that the nurses' salaries depend on their level of professional experience.
The main effect of position reveals that the nurses' salaries depend on whether they work in the office or the hospital.
Profile plots are used as visualization:
[Figure: two profile plots of mean salary. Left: main effect of experien (x-axis: experien 1, 2, 3). Right: main effect of position (x-axis: position, office and hospital)]
If the profile plot shows a (nearly) horizontal line, the main effect in question is presumably not significant.
(Attention: SPSS cuts off the lower area of the graph; the y-axis often does not start at 0!)

Slide 18: Interaction effects
An interaction between experience and position means that there is a dependency between the two variables.
The independent variables have a complex influence on the dependent variable.
The factors do not just function additively but act together in a different manner.
An interaction means that the effect of one factor depends on the value of another factor.
[Diagram: experience (factor A), position (factor B) and their interaction (factor A x B)]

Slide 19: Interaction effects
In the example, the interaction between experien and position means
- that the effect of work experience on salary is not the same for nurses who work in offices and for nurses who work in the hospital
- that the difference in salary between nurses working in the hospital and nurses working in the office depends on the level of experience

Profile plots:
[Figure: two profile plots of mean salary. Left: x-axis experien (1, 2, 3), separate lines for position (hospital, office). Right: x-axis position (office, hospital), separate lines for experien (3, 2, 1)]
If there is an interaction, the lines are not parallel. The more the lines deviate from being parallel, the more likely an interaction is.
If there is no interaction, the lines are parallel.

Slide 20: Sum of Squares (with interaction)
Again: SS_total = SS_between + SS_within
With: SS_between = SS_Experience + SS_Position + SS_Experience x Position
It follows: SS_total = (SS_Experience + SS_Position + SS_Experience x Position) + SS_within
where SS_Experience x Position is the interaction of both factors simultaneously.
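The "lines are not parallel" criterion can be illustrated with the rounded cell means from the table on slide 16. The sketch below is purely descriptive (it is not a significance test), and the variable names are chosen for this example only.

```python
# Rounded cell means (CHF/h) from the two-way table on slide 16.
cell_means = {
    "office": {1: 35.0, 2: 37.0, 3: 39.0},
    "hospital": {1: 37.0, 2: 40.0, 3: 44.0},
}

# Salary gap between hospital and office at each level of experience.
gap = {
    level: cell_means["hospital"][level] - cell_means["office"][level]
    for level in (1, 2, 3)
}
print(gap)  # {1: 2.0, 2: 3.0, 3: 5.0}

# Parallel profile lines would mean the same gap at every level.
lines_parallel = len(set(gap.values())) == 1
print("parallel:", lines_parallel)  # parallel: False
```

The gap grows from 2 to 5 CHF/h with experience, so the profile lines fan out. Whether this pattern is statistically significant is what the interaction term in the ANOVA tests.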

Slide 21: Prerequisites of ANOVA
0. Robustness
ANOVA is relatively robust against violations of prerequisites.
1. Sampling
Random sample, no treatment effects (more in Lecture 10)
A well designed study avoids violation of this assumption.
2. Distribution of residuals
Residuals (= error) are normally distributed
Correction: transformation
3. Homogeneity of variances
Residuals (= error) have constant variance (more in Lecture 10)
Correction: weight variances
4. Balanced design
Same sample size in all groups
Correction: weight means
SPSS automatically corrects unbalanced designs by Sum of Squares "Type III"
Syntax: /METHOD = SSTYPE(3)

Slide 22: ANOVA with SPSS: Two Detailed Examples
One-way ANOVA
SPSS: Analyze > General Linear Model > Univariate...

Slide 23: SPSS output ANOVA, Tests of Between-Subjects Effects I
[SPSS output table "Tests of Between-Subjects Effects"]
Significant overall model (called "Corrected Model")
Significant constant (called "Intercept")
Significant variable experien
Example interpretation for the main effect of experien: There is a main effect of experience (levels 1, 2, 3) on salary, F(2, 93) = 46.193, p = .000.
The value of Adjusted R Squared (.488) shows that 48.8% of the variance in salary around the grand mean can be predicted by the model (here: variable experien).

Slide 24: SPSS output ANOVA, Tests of Between-Subjects Effects II
Allocation of sums of squares to terms in the SPSS output
[SPSS output table "Tests of Between-Subjects Effects", annotated with: "Grand mean", SS_between, SS_within (= SS_error), SS_total]
SS_between reflects the sum of squares of all factors in the model.
In this case (one-way analysis): SS_between = SS_experien

Slide 25: Partial Eta Squared (partial $\eta^2$)
Partial eta squared compares the amount of variation explained by a particular factor (all other variables fixed) to the amount of variation that is not explained by any other factor in the model.
This means we are only considering variation that is not explained by other variables in the model. Partial $\eta^2$ indicates what percentage of this variation is explained by a variable.

$\text{Partial } \eta^2 = \frac{SS_{Effect}}{SS_{Effect} + SS_{Error}}$

In the case of one-way ANOVA: partial $\eta^2$ is the proportion of the corrected total variation that is explained by the model (= $R^2$).
Example: Experience explains 49.8% of the previously unexplained variation.
Note: The values of partial $\eta^2$ do not sum up to 100%! (hence "partial")

Slide 26: "Intercept" in SPSS
In the case of ANOVA, "Intercept" in SPSS refers to the grand mean.
If the F-test for the grand mean is significant, this indicates that the grand mean differs significantly from 0.
In our example, partial $\eta^2$ is .996 and thus very large. This indicates that the "grand mean" is large compared to the other variances.
But: the focus of ANOVA lies on group differences. The grand mean itself is secondary. Thus partial $\eta^2$ of the "Intercept" is not interpreted.
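The reported effect sizes for experien can be cross-checked from the F statistic alone. The sketch below assumes only the values quoted in the lecture, F(2, 93) = 46.193 with n = 96, and uses the standard relation between F and the ratio SS_Effect / SS_Error; the variable names are illustrative.

```python
# Values reported in the lecture for the one-way model.
f_value = 46.193
df_effect = 2    # G - 1
df_error = 93    # K_total - G
n_total = 96     # number of nurses

# F = (SS_effect / df_effect) / (SS_error / df_error)
# => SS_effect / SS_error = F * df_effect / df_error
ss_ratio = f_value * df_effect / df_error

# Partial eta squared = SS_effect / (SS_effect + SS_error).
partial_eta_sq = ss_ratio / (ss_ratio + 1.0)

# In a one-way ANOVA partial eta squared equals R squared, so the
# adjusted R squared follows from the usual degrees-of-freedom correction.
adj_r_sq = 1.0 - (1.0 - partial_eta_sq) * (n_total - 1) / df_error

print(round(partial_eta_sq, 3))  # 0.498
print(round(adj_r_sq, 3))        # 0.488
```

Both results match the values quoted in the lecture: 49.8% for partial eta squared and .488 for Adjusted R Squared.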

Slide 27: Parameter estimates
SPSS sets the mean of one group artificially to 0 (default: last group).

"SPSS coding" (as seen from a reference group, here experien = 3):
$y_{1k} = 41.6 - 5.7 + \varepsilon_{1k}$
$y_{2k} = 41.6 - 3.2 + \varepsilon_{2k}$
$y_{3k} = 41.6 + 0.0 + \varepsilon_{3k}$

Coding as on slide 13 (as seen from the grand mean):
$y_{1k} = 38.6 - 2.7 + \varepsilon_{1k}$
$y_{2k} = 38.6 - 0.2 + \varepsilon_{2k}$
$y_{3k} = 38.6 + 3.0 + \varepsilon_{3k}$

To get from the SPSS coding to the coding of slide 13: add 3.0 to each group effect and subtract 3.0 from the constant (41.6 - 3.0 = 38.6).

Slide 28: Multiple testing, post hoc comparisons I
If H0 is rejected (at $\alpha$ = 5%), we conclude that not all group means are equal.
$H_0: \bar{y}_1 = \bar{y}_2 = \bar{y}_3$
$H_A: \bar{y}_i \neq \bar{y}_j$ for at least one pair i, j
Which of the groups are different?
Why not simply compare means pairwise?

[Cartoon] Dr. Sorglos thinks the risk of falling is only 5%!
Example: In the case of a rope with 20 knots, each knot has $\alpha$ = 5% as its probability of failure. All knots together, however, have a probability of failure of $1 - (1 - 0.05)^{20} = 0.64$. The risk of a deadly fall is therefore 64%!
In order to keep this risk at the desired 5% level, the probability of failure of each knot may not exceed $\alpha_B = \alpha / \text{number of knots} = 5\% / 20 = 0.25\%$.
Cartoon: Dubben, H.-H. (2006): Der Hund, der Eier legt: Erkennen von Fehlinformation... 6. Auflage, Rowohlt, Hamburg.
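The rope example is simple arithmetic and can be reproduced directly. The sketch below assumes, as the example implicitly does, that the knots fail independently of each other; the variable names are illustrative.

```python
alpha = 0.05   # probability of failure of a single knot
knots = 20     # number of knots in the rope

# Probability that at least one of the knots fails (independent knots).
familywise_risk = 1 - (1 - alpha) ** knots
print(round(familywise_risk, 2))  # 0.64

# Per-knot failure probability that keeps the overall risk near 5%.
alpha_per_knot = alpha / knots
print(alpha_per_knot)  # 0.25% per knot

# Overall risk when every knot meets the stricter per-knot level.
corrected_risk = 1 - (1 - alpha_per_knot) ** knots
print(corrected_risk < alpha)  # True
```

With the stricter per-knot level the overall risk stays just below 5%, which is the logic behind the Bonferroni correction.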

Slide 29: Multiple testing, post hoc comparisons II
There are several methods for comparing the groups. All methods are similar, however, in that they address the problem of multiple testing.
Example: Bonferroni correction
If k means are compared with each other, it becomes necessary to conduct n = k (k - 1) / 2 tests.
In order to keep the significance level the same for the entire test, each test must be conducted using the error probability $\alpha / n$.

Slide 30: Multiple testing, post hoc comparisons III
[SPSS output table with Bonferroni post hoc comparisons]
Groups 1 and 2 have a significant difference (p = .000)
Groups 2 and 3 have a significant difference (p = .000)
Groups 3 and 1 have a significant difference (p = .000)
As a comparison: a t-test with groups 1 and 2 as independent samples also produces p = .000.
But the precise p-values show that the t-test is too optimistic:
Bonferroni adjusted test (groups 1 and 2): p = $1.3 \times 10^{-4}$
t-test (groups 1 and 2): p = $4.2 \times 10^{-8}$
The p-value of the t-test is considerably lower.
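The Bonferroni rule from slide 29 can be sketched as follows. The raw p-values below are invented for illustration and are not taken from the SPSS output; the helper names n_pairwise_tests and bonferroni_adjust are likewise assumptions of this example. Multiplying each p-value by n is equivalent to testing each comparison at level alpha / n.

```python
def n_pairwise_tests(k):
    """Number of pairwise comparisons among k group means: k (k - 1) / 2."""
    return k * (k - 1) // 2


def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


alpha = 0.05
k = 3                                # three levels of experience
n_tests = n_pairwise_tests(k)        # 3 pairwise comparisons
alpha_per_test = alpha / n_tests     # error probability per single test

# Hypothetical raw p-values for the three pairwise comparisons.
raw_p = [0.004, 0.020, 0.300]
adjusted_p = bonferroni_adjust(raw_p)
significant = [p < alpha for p in adjusted_p]

print(n_tests, round(alpha_per_test, 4))
print([round(p, 3) for p in adjusted_p], significant)
```

In this made-up case the second comparison would pass an uncorrected 5% test (p = 0.020) but not the Bonferroni-adjusted one, which shows how the correction guards against over-optimistic pairwise tests.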

Slide 31: Two-way ANOVA
SPSS: Analyze > General Linear Model > Univariate...

Slide 32: Interaction
The interaction term between fixed factors is calculated by default in ANOVA.
Example interpretation (in addition to the description of the main effects): There is also an interaction of experience and position on salary, F(2, 90) = 18.991, p = .000, partial $\eta^2$ = .297.
The interaction term experien * position explains 29.7% of the previously unexplained variance.

Slide 33: Interaction
Do different levels of experience influence the impact of different levels of position differently?
Yes: if experience has the value 2 or 3, the influence of position is increased.
[Figure: profile plot with separate lines for office and hospital]
Simplified: the lines are not parallel.
Interpretation: Experience is more important in hospitals than in offices.

Slide 34: More on interaction
[Figure: six example profile plots over experien, each annotated with: Main effect of experien, Main effect of position, Interaction]