Analysis of Variance ANOVA ANalysis Of VAriance generalized t-test compares means of more than two groups fairly robust based on F distribution, compares variance two versions single ANOVA compare groups along 1 dim, eg school classes two-way ANOVA, etc compare groups along > 1 dim, eg school classes and sex ÊÊÇÈÉÊ 1 Analysis of Variance Typical applications single ANOVA compare time needed for lexical recognition in healthy adults, patients with Wernicke s aphasia, patients with Broca s aphasia two-way ANOVA compare lexical recognition time in male and female in same three groups ÊÊÇÈÉÊ 2
Comparing Multiple Means for two groups: t-test testing at p = 005 shows significance 1 time in 20 if there is no difference in population mean (effect of chance) but suppose there are 7 groups, ie, `7 2 = 21 pairs caution: several tests (on same data) run the risk of finding significance through sheer chance ÊÊÇÈÉÊ 3 Phony Significance through Multiple Tests Example: Suppose you run three tests, always seeking a result significant at 005 The chance of finding this in one of the three is Bonferroni s family-wise α-level α FW = 1 (1 α) n = 1 (1 05) 3 = 1 (95) 3 = 1 857 = 0143 to guarantee a family-wise alpha of 005, divide this by number of tests Example: 005/3 = 0017 (set α at 0017) note: 0983 3 095 ANOVA indicated, takes group effects into account ÊÊÇÈÉÊ 4
Analysis of Variance based on F distribution F distribution Moore & McCabe, 73, pp435-445 measures difference between two variances (variance = σ 2 ) always positive, since variance positive two degrees of freedom interesting, one for s 1, one for s 2 F = s2 1 s 2 2 ÊÊÇÈÉÊ 5 F -Test vs F Distribution F = s2 1 s 2 2 used in F -test H 0 : samples from same distribution (s 1 = s 2 ) H a : samples from diff distribution (s 1 s 2 ) value 1 indicates same variance values near 0 or + indicate diff F -test very sensitive to deviations from normal ANOVA uses F distribution, but is different ANOVA F -test! ÊÊÇÈÉÊ 6
F Distribution Critical area for F -distribution at p = 005 Note symmetry: P( s2 1 s 2 2 > x) = P( s2 2 s 2 1 < 1 x ) ÊÊÇÈÉÊ 7 F -test Example: height group sample mean std dev size boys 16 180cm 6cm girls 9 168 4 is the difference in std dev significant? (α = 005) examine F = s 2 boys s 2 girls degrees of freedom: s boys 16 1 s girls 9 1 ÊÊÇÈÉÊ 8
F -test Critical Area (for 2-Tailed Test) P(F(15, 8) > f) = α 2 = 005 2 = 0025 P(F(15, 8) < f) = 1 0025 P(F(15, 8) < 41) = 0975 from M&M, Tbl E, p706 (no values directly for P(F(df 1, df 2 ) > f)) P(F(15, 8) < x) = α 2 (= 0025) = P(F(8, 15) > x ) = α 2 x = 1 x = P(F(8, 15) > x ) = 0025 x = 1 x = P(F(8, 15) > 32)(tables) P(F(15, 8) < 1 32 ) = 0025 P(F(15, 8) < 031) = 0025 Reject H 0 if F < 031 or F > 41 Here, F = 62 42 = 225 (no evidence of diff in distribution) ÊÊÇÈÉÊ 9 ANOVA Analysis of Variance (ANOVA) most popular statistical test for numerical data several types single one-way two-, three-, n-way multiple ANOVA, MANOVA, repeated measures examines variation between-groups sex, age, within-subject, within-groups overall automatically corrects for looking at several relationships (like Bonferroni correction) uses F test, where F(n, m) fixes n typically at number of groups (less 1), m at number of subjects (data points) (less number of groups) ÊÊÇÈÉÊ 10
One-Way ANOVA to Analyse Function of Reviews Example: Gisela Redeker identified three roles for literary book reviews in newspapers Taalbeheersing 21(4), 1999, 295-310: communicate emotional reactions, subjective opinions communicate expert opinion, objective facts motivate reading and purchasing of book She investigated whether different reviewers emphasized different roles: Tom van Deel (Trouw), Arnold Heumakers (de Volkskrant), and Carel Peeters (Vrij Nederland) stylistic elements indicate one of the three functions, eg, ik, maar nee, ben ik bang, lijkt, eerlijk gezegd, ik bedoel, indicate subjective opinions; logical connectives want, temeer dat, and quotes indicate an objective point of view; etc Nb validating link between style and perspectives is important (see Redeker) ÊÊÇÈÉÊ 11 Redeker on Literary Criticism s Functions Gisela Redeker investigates role of lit criticism, asking whether different critics did not differ in the degree to which they emphasize one or another role Sample: reviews of the same books (by Hermans, Heijne and Mulisch), all published 1989-92 Similar in length Data: relative frequency of, eg, reader-oriented elements (per 1, 000 words) We are comparing three averages, asking whether their is a difference Because she compared more than two averages ANOVA is needed ÊÊÇÈÉÊ 12
Relative Frequency of Reader-Oriented Elements elements Critic van Deel Heumakers Peeters evocative 124 108 151 questions 32 0 61 dramquotes 69 129 82 intensifiers 259 301 382 ref to reader 36 57 117 Totals 260 298 397 H 0 : µ 1 = µ 2 = µ 3 H a : µ 1 µ 2 or µ 1 µ 3 or µ 2 µ 3 Results: statistically significant difference (p < 002) Similar comparisons for subjective style, and argumentative style (differences present, not statistically significant) ÊÊÇÈÉÊ 13 One-Way ANOVA Question: Are exam grades of four groups of foreign students Nederlands voor anderstaligen the same? More exactly, are four averages the same? H 0 : µ 1 = µ 2 = µ 3 = µ 4 H a : µ 1 µ 2 or µ 1 µ 3 or µ 3 µ 4 ie, alternative: at least one group has different mean for the question of whether any particular pair is the same, the t-test is appropriate for testing whether all language groups are the same, pairwise t-tests will exaggerate differences (increase the chance of type I error) we want to apply 1-way ANOVA ÊÊÇÈÉÊ 14
Data: Dutch Proficiency of Foreigners Four groups of ten each: Groups Eur Amer Af ri Asia 10 33 26 26 19 21 25 21 31 20 15 21 Mean 250 219 231 213 Samp SD 814 661 592 690 Samp Variance 6622 4366 3499 4757 ÊÊÇÈÉÊ 15 Anova Conditions Assumption: normal distribution per group, check with normal quantile plot, eg, for Europeans (and to be repeated for every group): NormalQ-Q Plotoftoets nlvoor anderstalige 20 1 5 1 0 5 Expected NormalValue 00-5 -1 0-1 5-20 0 10 20 30 40 Observed Value ÊÊÇÈÉÊ 16
Anova Conditions Normal Quantile plot for all values: 3 NormalQ-Q Plot of toets nl voor anderstalige 2 1 Expected Normal Value 0-1 -2-3 0 10 20 30 40 Observed Value ÊÊÇÈÉÊ 17 Anova Conditions ANOVA assumptions: normal distribution per subgroup same variance in subgroups: least sd > one-half of greatest sd independent observations: watch out for test-retest situations! Check differences in SD s! (some SPSS computing ) Valid Variable Std Dev N Label Europa 814 10 America 661 10 Africa 592 10 Azie 690 10 ÊÊÇÈÉÊ 18
Visualizing Anova Is there any significant difference in the means (of the groups being contrasted)? 40 30 toets nlvoor anderstalige 20 10 0 N = 10 Europa 10 America 10 Africa 10 Azie gebied van afkomst Take care that boxplots sketch medians, not means ÊÊÇÈÉÊ 19 Sketch of Anova Groups 1 2 3 4 = I Eur Amer Afri Asia x 1,j x 2,j x 3,j x 4,j Sample Mean x 1 x i 9 >= I number of groups >; For any data point x i,j (x i,j x) = ( x i x) + (x i,j x i ) total residue = group diff + error ANOVA question: is it sensible to include the group ( x i )? ÊÊÇÈÉÊ 20
Two Variances Reminder of high-school algebra: (a + b) 2 = a 2 + b 2 + 2ab A AB A 2 B B 2 AB ÊÊÇÈÉÊ 21 Two Variances (a + b) 2 = a 2 + b 2 + 2ab (x i,j x) = ( x i x) + (x i,j x i ) (x i,j x) 2 = ( x i x) 2 + (x i,j x i ) 2 + 2( x i x)(x i,j x i ) Sum over elements in i-th group: (x i,j x) 2 = ( x i x) 2 + (x i,j x i ) 2 + 2( x i x)(x i,j x i ) ÊÊÇÈÉÊ 22
Two Variances Note that this term must be zero: 2( x i x)(x i,j x i ) Since: 2( x i x)(x i,j x i ) = 2( x i x) (x i,j x i ) and (x i,j x i ) = 0 ÊÊÇÈÉÊ 23 Sketch of Anova (x i,j x) 2 = ( x i x) 2 + (x i,j x i ) 2 (+ 2( x i x)(x i,j x i ) = 0) Therefore: (x i,j x) 2 = ( x i x) 2 + (x i,j x i ) 2 SST = SSG + SSE ÊÊÇÈÉÊ 24
Anova Terminology For any data point x i,j (x i,j x) = ( x i x) + (x i,j x i ) total residue = group diff + error (x i,j x) 2 = ( x i x) 2 + (x i,j x i ) 2 SST SSG + SSE Total Sum of Squares = Group Sum of Squares + Error Sum of Squares and DFT DFG + DFE (n 1) = (I 1) + (n I) Total Degrees of Freedom = Group Degrees of Freedom + Error Degrees of Freedom ÊÊÇÈÉÊ 25 Variances are Mean Squared Differences to Mean Note that (x i,j x) 2 n 1 is a variance, and likewise SST/DFT SSG/DFG (=MSG) & SSE/DFE (=MSE) In ANOVA, we compare MSG (variance betwee groups) and MSE (variance within groups), ie we measure F = MSG MSE If this is large, differences between groups overshadow differences within groups ÊÊÇÈÉÊ 26
Two Variances H 0 : µ 1 = µ 2 = µ 3 = µ 4 ANOVA: calculate MSG (σ 2 between groups) and MSE (σ 2 within groups), ie measure we F = MSG MSE If this is large, differences between groups overshadow differences within groups ÊÊÇÈÉÊ 27 Two Variances 1 estimate pooled variance of population (MSE) P Pi G dfi s2 i i G df i = (n 1 1)s2 1 +(n 2 1)s2 2 +(n 3 1)s2 3 +(n 4 1)s2 4 (n 1 1)+(n 2 1)+(n 3 1)+(n 4 1) = 9 6622+9 4366+9 3499+9 4757 9+9+9+9 = 59598+39294+31491+42813 36 = 4811 estimates variance in groups (using df ), aka within-groups estimate of variance 2 suppose H 0 true (a) then group have sample means µ, variance σ 2 /10, (& sd σ/ 10) (b) 4 means, 250, 219, 231, 213, where s = 163, s 2 = 266 (c) s 2 estimate of σ 2 /10, ie, 10 s 2 is estimate of σ 2 ( s 2 = 266) (d) this is between-groups variance (MSG) ÊÊÇÈÉÊ 28
Interpreting Estimates via F if H 0 true, then we have two variances: between-groups estimate s 2 b (266) and within-groups estimate s 2 w (4811) and their ratio s2 b s 2 follows an F distribution with w ( groups 1)dF from s 2 b (3), (n 4)dF from s 2 w (36) in this case, 2662 4811 = 055 P(F(3, 30) > 292) = 005, (see tables), so no evidence of nonuniform behavior ÊÊÇÈÉÊ 29 ANOVA Summary Summary to-date (exam results for NL voor anderstalige) Source df SS M SS F between-g 3 799 266 F(3,36) = 055 within-g 36 17319 481 Total 39 18118 P(F(3, 30) > 292) = 005, (see tables) so no evidence of nonuniform behavior ÊÊÇÈÉÊ 30
SPSS Summary - - - - - O N E W A Y - - - - - Variable NL_NIVO toets nl voor anderstalige By Variable GROUP gebied van afkomst Analysis of Variance Sum of Mean F F Source DF Squares Squares Ratio Prob Between Groups 3 799 266 55 65 Within Groups 36 17319 481 Total 39 18118 ÊÊÇÈÉÊ 31 Other Questions ANOVA has H 0 : µ 1 = µ 2 = = µ n But sometimes particular contrasts important eg, are Europeans better (in learning Dutch)? Distinguish (in reporting results): prior contrasts questions asked before data collected and analyzed post-hoc (posterior) questions questions after collection and analysis data-snooping is exploratory, cannot contribute to hypothesis testing ÊÊÇÈÉÊ 32
Prior Contrasts Questions asked before collection and analysis eg, are Europeans better (in learning Dutch)? Another formulation: is H a : µ Eur (µ Am = µ Afr = µ Asia ) where H 0 : µ Eur = (µ Am + µ Afr + µ Asia ) Reformulation: 0 = µ Eur + 033µ Am + 033µ Afr + 033µ Asia SPSS requires this (reformulated) version ÊÊÇÈÉÊ 33 Prior Contrasts in SPSS (the mean of) every group gets a coefficient sum of coefficients is zero a t-test is carried out, & two-tailed p value is reported (as usual) Eur Am Afr Azie Contrast 1-10 3 3 3 Pooled Variance Estimate Value S Error T Value DF T Prob Contrast 1-29 253-115 36 260 No significant difference here (of course) Note: prior contrasts are legitimate as hypothesis tests as long as they are formulated before collection and analysis ÊÊÇÈÉÊ 34
Post-hoc Questions Assume H 0 rejected: which means are distinct? Data-snooping problem: in large set, some distinctions are likely to be stat significant But we can still look (we just cannot claim to have tested the hypothesis) We are asking whether m 1 m 2 is significantly larger, we apply a variant of the t-test The relevant sd is p MSE/n (differences among scores), but there s a correction since we re looking at half the scores in any one comparison ÊÊÇÈÉÊ 35 SD in Post-hoc ANOVA Questions NB SD (among diff in groups i and j): sd δ = v u tmse +N j N = + N j s 481 10+10 40 10 + 10 = s 481 2 20 = r 2405 20 = 49/ 20 and the t value is calculated as p/c where p is the desired significance level and c is number of comparisons For pairwise comparisons, c = `I 2 ÊÊÇÈÉÊ 36
Post-hoc Questions in SPSS SPSS Post-Hoc Bonferroni searches among all groupings for statistically significant ones - - - - - O N E W A Y - - - - - Variable NL_NIVO toets nl voor anderstalige By Variable GROUP gebied van afkomst Multiple Range Tests: Modified LSD (Bonferroni) test w signif level 05 The difference between two means is significant if MEAN(J)-MEAN(I) >= 49045 * RANGE * SQRT(1/N(I) + 1/N(J)) with the following value(s) for RANGE: 395 - No two groups significantly different at 05 level Homogeneous Subsets (highest \& lowest means not sig diff) Group Azie America Africa Europa Mean 213 219 231 250 but in this case there are none (of course) ÊÊÇÈÉÊ 37 How to Win at ANOVA Note ways in which F ratio increases (becomes more significant) F = MSG MSE 1 MSG increases, differences in means grow larger 2 MSE decreases, overall variation grows smaller ÊÊÇÈÉÊ 38
Two Models for Grouped Data First model x i,j = µ + ǫ i,j x i,j = µ + α i + ǫ i,j no group effect each datapoint represents error (ǫ) around a mean (µ) Second model real group effect each datapoint represents error (ǫ) around an overall mean (µ) combined with a group adjustment (α i ) ANOVA: is there sufficient evidence for α i? ÊÊÇÈÉÊ 39 Next: Two-way ANOVA ÊÊÇÈÉÊ 40