Experimental and Correlational Research

Lecture 6: one-way analysis of variance

Leary. Introduction to Behavioral Research Methods.
  pages 246-271 (chapters 10 and 11): conceptual statistics
Moore, McCabe, and Craig. Introduction to the Practice of Statistics.
  pages 637-655 (chapter 12): one-way analysis of variance
  pages 661-664 (chapter 12): multiple comparisons
Additional texts: 6, 7, and 8

Frank Busing, Leiden University, the Netherlands

Introduction

Topic: the relation between consumption of alcohol and partner selection in cafés, discos, or night clubs.

Research question: does the subjective perception of physical attractiveness become more inaccurate with an increased alcohol level?

Three experimental groups (independent, categorical variable: alcohol):
1. no alcohol group: some alcohol-free beers (no alcohol whatsoever)
2. low alcohol group: some regular beers
3. high alcohol group: some strong beers (nice Belgian tripels)

Dependent, interval variable (attractiveness): some objective measure of the attractiveness of the partner selected at the end of the evening (values between 0 and 100).

Note: participants are randomly assigned to groups.

Hypothesis

Research question: is there a difference in attractiveness scores between the no alcohol, low alcohol, and high alcohol populations?

Hypotheses (always in terms of population parameters):
  $H_0: \mu_1 = \mu_2 = \mu_3$
  $H_a$: at least one $\mu_i \neq \mu_j$

Are the population means the same, or is there a difference between at least two population means? Of course there is, but is the difference big enough to distinguish it from sampling variability? Or is the difference big enough to pass some critical value?

Note: $\mu_i$ is the mean of population $i$.
Note: this requires information on variation (CI) and sample size (sampling distribution).

Solution: t-tests

We might use a series of t-tests. In that case we have to do $k(k-1)/2$ tests (here 3):
1. test no alcohol versus low alcohol
2. test no alcohol versus high alcohol
3. test low alcohol versus high alcohol

This is possible, but it carries a serious problem (type I error, type II error, power), which will be solved later when discussing multiple comparisons.

Note: $k$ = number of groups.

Solution: F-test

[Figure: attractiveness scores (axis 0 to 90) for the no, low, and high alcohol groups and for the total sample, marking between-group variability, within-group variability, and total variability.]

Split the total sum of squares (SST) into a between-groups sum of squares (SSG) and a within-groups sum of squares (SSE):
  $SST = SSG + SSE$

The proportion of variance explained, or the variance accounted for (VAF) by the groups, is
  $VAF = SSG / SST$

Note: the sum of squares is used as a measure of variability.

Solution: F-test (continued)

[Figure: two panels showing between-group and within-group variability for the no, low, and high alcohol groups. Left panel: between/within is small. Right panel: between/within is large.]

Analysis of variance

One-way analysis of variance is an appropriate analysis method for a study with one quantitative outcome variable and one categorical explanatory variable.

Analysis of variance: SPSS output

ANOVA: Attractiveness of Date

                  Sum of Squares    df    Mean Square    F         Sig.
Between Groups    3101.042           2    1550.521       13.426    .000
Within Groups     5196.875          45     115.486
Total             8297.917          47

Hypotheses:
  $H_0: \mu_1 = \mu_2 = \mu_3$
  $H_a$: at least one $\mu_i \neq \mu_j$

Statistical conclusion: $H_0$ is rejected, since $F > F^*$ or, equivalently, $p < \alpha$.

Substantive conclusion: at least two alcohol groups differ significantly in the mean attractiveness of the partner at the end of the evening.

Notation

source            SS     DF     MS     F    p
between groups    SSG    DFG    MSG    F    p
within groups     SSE    DFE    MSE
total             SST    DFT

$k$ is the number of groups (sometimes denoted as $I$)
$n_i$ is the size of group $i$
$N = n_1 + \dots + n_k$ is the total sample size
$x_{ij}$ is the score of subject $j$ in group $i$
$\bar{x}_i$ is the mean of the scores for group $i$ (also denoted as $\bar{x}_{i\cdot}$)
$\bar{x}$ is the overall mean of the scores $x_{ij}$ (sometimes denoted as $\bar{x}_{\cdot\cdot}$)
$\sum_{i=1}^{k} \sum_{j=1}^{n_i}$ is the sum over all subjects, arranged per group

Total

source            SS                                          DF
between groups
within groups
total             $\sum_i \sum_j (x_{ij} - \bar{x})^2$        $N - 1$

  $SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$
  $DFT = N - 1$
  $SST / DFT$ = total variance

[Figure: scores (axis 0 to 90) for the no, low, and high alcohol groups and the total, with SSG, SSE, and SST marked.]

Note: each SS is the sum of $N$ squared differences.

Between groups

source            SS                                          DF         MS
between groups    $\sum_i n_i (\bar{x}_i - \bar{x})^2$        $k - 1$    SSG/DFG
within groups
total             $\sum_i \sum_j (x_{ij} - \bar{x})^2$        $N - 1$

  $SSG = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2 = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$
  $DFG = k - 1$
  $MSG = SSG / DFG$

[Figure: scores (axis 0 to 90) for the no, low, and high alcohol groups and the total, with SSE, SSG, and SST marked.]

Note: each SS is the sum of $N$ squared differences.

Within groups

source            SS                                            DF         MS
between groups    $\sum_i n_i (\bar{x}_i - \bar{x})^2$          $k - 1$    SSG/DFG
within groups     $\sum_i \sum_j (x_{ij} - \bar{x}_i)^2$        $N - k$    SSE/DFE
total             $\sum_i \sum_j (x_{ij} - \bar{x})^2$          $N - 1$

  $SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
  $DFE = N - k$
  $MSE = SSE / DFE$
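The three sums of squares can be checked on a small data set. The following is a minimal sketch, assuming invented scores (the `groups` values are hypothetical and are not the lecture data); it computes SST, SSG, and SSE directly from their definitions and shows that SST equals SSG + SSE.

```python
# Hypothetical scores for three groups; these numbers are invented for illustration.
groups = {
    "no": [60.0, 65.0, 70.0],
    "low": [62.0, 66.0, 67.0],
    "high": [45.0, 50.0, 43.0],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

# SST: squared deviations of every score from the overall mean.
sst = sum((x - grand_mean) ** 2 for x in all_scores)

# SSG: squared deviation of each group mean from the overall mean, weighted by group size.
# SSE: squared deviations of every score from its own group mean.
ssg = 0.0
sse = 0.0
for scores in groups.values():
    mean_i = sum(scores) / len(scores)
    ssg += len(scores) * (mean_i - grand_mean) ** 2
    sse += sum((x - mean_i) ** 2 for x in scores)

k, n_total = len(groups), len(all_scores)
msg = ssg / (k - 1)        # DFG = k - 1
mse = sse / (n_total - k)  # DFE = N - k
print(round(sst, 3), round(ssg + sse, 3))  # the two values should agree
```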

Wrapping up

source            SS                                            DF         MS         F          p
between groups    $\sum_i n_i (\bar{x}_i - \bar{x})^2$          $k - 1$    SSG/DFG    MSG/MSE    table
within groups     $\sum_i \sum_j (x_{ij} - \bar{x}_i)^2$        $N - k$    SSE/DFE
total             $\sum_i \sum_j (x_{ij} - \bar{x})^2$          $N - 1$

  $F = MSG / MSE$

and remember that
  $SST = SSG + SSE$
  $DFT = DFG + DFE = (k - 1) + (N - k) = N - 1$

$p$ can be found in the table of critical values (F-distribution) at $F(DFG, DFE)$.

Notes on the F-test

  $F = MSG / MSE$

If there is no effect of the independent variable (groups), then MSG estimates the same quantity as MSE, or even less, because the group means are about equal to the overall mean, and the ratio $F$ is about 1.0 or smaller.

If, on the other hand, there is an effect of the independent variable (groups), then MSG will be greater than MSE and the ratio $F > 1.0$, possibly even larger than some critical value $F^*$.

Critical values are found at $F(DFG, DFE) = F(\text{numerator}, \text{denominator})$.

The F-test in an ANOVA is an omnibus test: it is an overall test that checks for at least one difference.

Although the hypothesis is two-sided (differences in population means), the F-test in an ANOVA is always one-sided. After all, a significantly larger MSE is not what we are looking for.

Effect sizes

Suppose we found a significant result ($F > F^*$), so at least one population mean differs from another population mean.
Is there an effect? Does it mean anything? Is the difference big enough to have impact? What is the size of the effect?

Effect sizes are measures of association strength or proportionate reduction in error (PRE).

First, remember that
  test statistic = effect size × sample size

We may thus use an effect size, but with care.

Effect size usage: always look at the substantive significance of the results by placing them in a meaningful context and quantifying their contribution to knowledge.

Note: IQ versus length.

Effect sizes for one-way analysis of variance

$\eta^2$ (eta squared):
  $\eta^2 = SSG / SST$

$\eta^2$ is the analysis-of-variance name for $R^2$: $R^2 = VAF = COD = \eta^2$ (COD: coefficient of determination).
$\eta^2$ is based on sample statistics and often overestimates the population value; $\hat{\omega}^2$ corrects for this overestimation.

$\hat{\omega}^2$ (omega hat squared):
  $\hat{\omega}^2 = \dfrac{SSG - DFG \cdot MSE}{SST + MSE}$

$\hat{\omega}^2 < \eta^2$, since the numerator decreases and the denominator increases.

Note: effect sizes: small: .01; medium: .06; large: .14.
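Both effect sizes follow directly from the ANOVA table. A minimal sketch, assuming the sums of squares and mean square reported in the SPSS ANOVA table of this lecture; the function names are illustrative only.

```python
def eta_squared(ssg, sst):
    """Proportion of total variability accounted for by the groups."""
    return ssg / sst


def omega_squared(ssg, sst, dfg, mse):
    """Eta squared corrected for overestimation of the population value."""
    return (ssg - dfg * mse) / (sst + mse)


# Values taken from the ANOVA table for attractiveness of date.
eta2 = eta_squared(3101.042, 8297.917)
omega2 = omega_squared(3101.042, 8297.917, 2, 115.486)
print(f"eta^2 = {eta2:.3f}, omega^2 = {omega2:.3f}")
```

With the thresholds noted above (.01, .06, .14), both values would count as large effects.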

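One-way analysis of variance

General linear model for one-way ANOVA:
  $\underbrace{x_{ij}}_{\text{data}} = \underbrace{\mu + \alpha_i}_{\text{fit}} + \underbrace{\epsilon_{ij}}_{\text{residual}}$

Assumption: $\epsilon_{ij} \sim N(0, \sigma)$, where $\sigma$ is the common standard deviation.

Compare Moore, McCabe, and Craig: $x_{ij} = \mu_i + \epsilon_{ij}$, where $\mu_i = \mu + \alpha_i$.
Compare the multiple regression model: $y_i = \hat{y}_i + \epsilon_i$, where $\hat{y}_i = b_0 + \sum_j b_j x_{ij}$.

Components:
  $\mu$ is estimated by $\bar{x} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}$
  $\mu_i$ is estimated by $\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}$
  the effect parameter $\alpha_i$ is estimated by $\bar{x}_i - \bar{x}$

Degrees of freedom (DF)

source          size    mean     std. deviation
no alcohol      16      63.75     8.466
low alcohol     16      64.69     9.911
high alcohol    16      47.19    13.288
total           48      58.54    13.287

  $DFG = k - 1 = 2$
  $DFE = N - k = 45$
  $DFT = N - 1 = 47$

source            SS    DF    MS    F    p
between groups          2
within groups          45
total                  47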

Sum of squares between groups (SSG)

  $SSG = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2 = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 = \sum_{i=1}^{k} n_i \hat{\alpha}_i^2$

  effect parameter $\hat{\alpha}_1 = \bar{x}_1 - \bar{x} = 63.75 - 58.54 = 5.21$
  effect parameter $\hat{\alpha}_2 = \bar{x}_2 - \bar{x} = 64.69 - 58.54 = 6.15$
  effect parameter $\hat{\alpha}_3 = \bar{x}_3 - \bar{x} = 47.19 - 58.54 = -11.35$

  $SSG = 16 \cdot 5.21^2 + 16 \cdot 6.15^2 + 16 \cdot (-11.35)^2 = 3101.042$

Note: 3101.042 is the value from unrounded means; the rounded effect parameters shown here give about 3100.6.

source            SS          DF    MS    F    p
between groups    3101.042     2
within groups                 45
total                         47

Total sum of squares (SST)

  $\dfrac{SST}{DFT} = \dfrac{1}{N - 1} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = s^2_{total}$ = total variance

so
  $SST = (N - 1)\, s^2_{total} = 47 \cdot 13.287^2 = 8297.917$

Note: 8297.917 is the value from the unrounded standard deviation; the rounded value 13.287 gives about 8297.6.

source            SS          DF    MS    F    p
between groups    3101.042     2
within groups                 45
total             8297.917    47

Sum of squares within groups (SSE)

If $SST = SSG + SSE$, then
  $SSE = SST - SSG = 8297.917 - 3101.042 = 5196.875$
or use the fact that $MSE = s_p^2$ (the pooled sample variance).

source            SS          DF    MS    F    p
between groups    3101.042     2
within groups     5196.875    45
total             8297.917    47

Side-step: pooled sample variance

Within-groups sum of squares:
  $SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$

Suppose that, for each group, we divide the sum of squares by $n_i - 1$:
  $\sum_{i=1}^{k} \dfrac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 = \sum_{i=1}^{k} s_i^2$ = sum over group variances

Working in the opposite direction thus gives SSE from the group variances. For example, for 2 groups:
  $SSE = (n_1 - 1) s_1^2 + (n_2 - 1) s_2^2$
  $\dfrac{SSE}{DFE} = MSE = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = s_p^2$ = pooled sample variance

Note: see the book for an explanation of $t^2 = F$ for two groups.
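The same reasoning extends from two groups to the three alcohol groups. A minimal sketch, assuming the group sizes and the (rounded) standard deviations from the summary table; because of that rounding the result only approximates the tabled SSE of 5196.875.

```python
# Group sizes and standard deviations from the summary table (rounded values).
sizes = [16, 16, 16]
sds = [8.466, 9.911, 13.288]

# SSE is the sum over groups of (n_i - 1) times the group variance.
sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))

# DFE = N - k; dividing SSE by DFE gives MSE, the pooled sample variance.
dfe = sum(sizes) - len(sizes)
mse = sse / dfe

print(f"SSE = {sse:.1f}, DFE = {dfe}, MSE = {mse:.2f}")
```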

Wrapping up

  $MSG = SSG / DFG$
  $MSE = SSE / DFE$
and finally
  $F = MSG / MSE$

$p$ can be found in the table of critical values (F-distribution) at $F(DFG, DFE)$.

source            SS          DF    MS          F         p
between groups    3101.042     2    1550.521    13.426    0.000
within groups     5196.875    45     115.486
total             8297.917    47

Analysis in steps

1. Check assumptions
   independence of residuals
   homogeneity of variances
   normality of errors
2. Run analysis
   by hand or with SPSS
   compute effect size
3. Interpret results
   statistical conclusion based on F and effect size
   substantive conclusion
4. Perform additional tests
   multiple comparisons
5. Report results
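The last two columns of the table can be reproduced from the sums of squares and degrees of freedom. A minimal sketch: the lecture reads $p$ from a table of critical values, whereas this example uses a closed-form upper-tail probability that is valid only when the numerator degrees of freedom equal 2, which happens to be the case here. That shortcut is an addition for illustration, not part of the lecture.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ssg, dfg = 3101.042, 2
sse, dfe = 5196.875, 45

msg = ssg / dfg        # mean square between groups
mse = sse / dfe        # mean square within groups
f_value = msg / mse    # F statistic

# Upper-tail probability P(F >= f_value) for an F(2, dfe) distribution.
# This closed form holds only when DFG = 2; other cases need a table or a
# statistics library.
p_value = (1 + 2 * f_value / dfe) ** (-dfe / 2)

print(f"MSG = {msg:.3f}, MSE = {mse:.3f}, F = {f_value:.3f}, p = {p_value:.6f}")
```

The resulting $p$ is far below .001, which is why SPSS prints it as .000.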

Step 1: check assumptions

Independence of residuals (independent $\epsilon_{ij}$)
  why? dependence indicates a wrong (linear) model; independence is needed for correct estimation of parameters
  how? plot residuals against case number, predicted outcomes, and predictors

Homogeneity of population variances ($\sigma_1^2 = \dots = \sigma_k^2$)
  why? the pooled sample variance estimates the within-groups $\sigma^2$
  how? the largest standard deviation is less than twice the smallest (or use a test)

Normality of the error distribution ($\epsilon_{ij} \sim N(0, \sigma)$)
  why? for inferential purposes
  how? QQ-plot or (modified) box plot

If one of the assumptions fails:
1. re-check the data for outliers and other anomalies, or
2. transform the data, or
3. use nonparametric analysis techniques

Step 2: run analysis

Dependent variable (list): attractiveness of date
Independent variable (factor): alcohol consumption
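The rule of thumb for homogeneity of variances is easy to apply by hand. A minimal sketch, assuming the group standard deviations from the summary table of this lecture:

```python
# Group standard deviations from the summary table.
sds = {"no alcohol": 8.466, "low alcohol": 9.911, "high alcohol": 13.288}

# Rule of thumb: the largest standard deviation should be less than
# twice the smallest one.
ratio = max(sds.values()) / min(sds.values())
homogeneous = ratio < 2

print(f"largest/smallest = {ratio:.2f}, rule of thumb satisfied: {homogeneous}")
```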

Step 3: interpret results (tables)

Test of Homogeneity of Variances: Attractiveness of Date

Levene Statistic    df1    df2    Sig.
2.767               2      45     .074

ANOVA: Attractiveness of Date

                  Sum of Squares    df    Mean Square    F         Sig.
Between Groups    3101.042           2    1550.521       13.426    .000
Within Groups     5196.875          45     115.486
Total             8297.917          47

Compute by hand:
  effect size $\eta^2 = SSG / SST = 0.374$
  effect size $\hat{\omega}^2 = (SSG - DFG \cdot MSE) / (SST + MSE) = 0.341$

Step 3: interpret results (plots)

[Figure: one-way ANOVA means plot. Vertical axis: mean of attractiveness of date (45 to 65); horizontal axis: alcohol consumption (no, low, high). The plot shows the group means $\bar{x}_i$, the overall mean $\bar{x}$, and the effect parameters $\hat{\alpha}_i$.]

Step 4: multiple comparisons

Suppose the ANOVA omnibus F-test indicates at least one difference. Multiple comparisons are a series of two-sided tests to isolate the differences.

Two ways of testing all differences:
1. a series of two-sided t-tests
2. simultaneous confidence intervals

In this case there are $k = 3$ groups, so we test
1. $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \neq \mu_2$
2. $H_0: \mu_1 = \mu_3$ versus $H_a: \mu_1 \neq \mu_3$
3. $H_0: \mu_2 = \mu_3$ versus $H_a: \mu_2 \neq \mu_3$

In general, the number of t-tests or confidence intervals equals $k(k-1)/2$.

Note: multiple comparisons are also called post-hoc, a posteriori, or follow-up tests.

Multiple comparisons: problem

Problem: increased type I error $\alpha$ (alpha).

With a test-wise type I error $\alpha = .05$, when $H_0$ is true
  we are wrong in 5% of the cases: we reject $H_0$ while $H_0$ is true
  we are right in 95% of the cases: we do not reject $H_0$ while $H_0$ is true

The probability of making 3 correct decisions in a row is (chance rule 5)
  $.95 \times .95 \times .95 = .857$
The actual family-wise type I error is then given by
  $1 - .857 = .143$
which is much larger than .05.

Note: the problem is even more serious because these t-tests are not independent.
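The growth of the family-wise error with the number of groups can be made concrete. A minimal sketch, assuming independent tests (which, as noted above, the pairwise t-tests are not, so this is only an approximation); the function name is illustrative.

```python
def familywise_error(alpha, tests):
    """Probability of at least one type I error in a series of independent tests."""
    return 1 - (1 - alpha) ** tests


k = 3                       # number of groups
tests = k * (k - 1) // 2    # number of pairwise comparisons
fwe = familywise_error(0.05, tests)

print(f"{tests} tests at alpha = .05 give a family-wise error of {fwe:.3f}")
```

Calling `familywise_error(0.05, 10)` for five groups shows how quickly the problem grows with more comparisons.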

Multiple comparisons: solution

Solution: decrease $\alpha$. Lower the type I error $\alpha$ or, equivalently, increase the $p$ value for each test conducted. Multiple comparison procedures differ in the way these values are adjusted.

Bonferroni: either divide $\alpha$ or multiply $p$ by the number of tests.

Thus, with 3 tests, the Bonferroni $t^{**}$ is based on $\alpha = .05/3 \approx .0167$, and the family-wise type I error is then given by
  $1 - .9833 \times .9833 \times .9833 = .0492$
This is a little conservative ($.0492 < .05$), but much better than .143.

[Figure: t distribution with critical values $t^*$ and $t^{**}$ and an observed $t$. Either compare $p = .015$ with the type I error level $\alpha/3 = .0167$, or compare $p \times 3 = .045$ with $\alpha = .05$.]

Multiple comparisons: consequence

Consequence: increased type II error and decreased power. While lowering the type I error, we increase the type II error and lower the power, because we need bigger differences to find significant results.

[Figure: sampling distributions under $H_0$ and $H_a$, showing the type I error at $\alpha = .05$ and at $\alpha = .0167$, the type II error, and the power.]
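The two equivalent forms of the Bonferroni correction can be sketched as follows. The function names are illustrative, and the example p-value of .015 is the one shown in the figure on this slide.

```python
def bonferroni_alpha(alpha, tests):
    """Test-wise alpha: the family-wise alpha divided by the number of tests."""
    return alpha / tests


def bonferroni_p(p, tests):
    """Adjusted p-value: p times the number of tests, with a maximum of 1."""
    return min(1.0, p * tests)


tests = 3
alpha_per_test = bonferroni_alpha(0.05, tests)
p_adjusted = bonferroni_p(0.015, tests)

# Both comparisons lead to the same decision.
reject_by_alpha = 0.015 < alpha_per_test
reject_by_p = p_adjusted < 0.05

# Resulting family-wise error, assuming independent tests.
fwe = 1 - (1 - alpha_per_test) ** tests

print(f"alpha per test = {alpha_per_test:.4f}, adjusted p = {p_adjusted:.3f}, FWE = {fwe:.4f}")
```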

Multiple comparisons: two-sided t-tests

Note that the pooled sample variance uses the information on all groups, not only on the two groups that are compared.

  $t_{ij} = \dfrac{\bar{x}_i - \bar{x}_j}{s_p \sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}}$ = $t$ for group $i$ versus group $j$

If $|t_{ij}| \geq t^{**}$, the population means $\mu_i$ and $\mu_j$ are different.
$t^{**}$ uses the Bonferroni correction and is thus found at $t(1 - [\alpha/2]/\text{tests},\ DFE)$.

For example, compare the low and high alcohol groups with $n_2 = 16$, $n_3 = 16$, $\bar{x}_2 = 64.69$, $\bar{x}_3 = 47.19$, $s_p^2 = MSE = 115.486$, and $\alpha = 0.05$:

  $t_{23} = \dfrac{64.69 - 47.19}{\sqrt{115.486 \times 0.125}} = \dfrac{17.5}{3.799} = 4.606$

  $t^{**} = t(0.008333,\ 45) \approx t(0.005,\ 40) = 2.704$ (conservative choice)

Since $4.606 > 2.704$, the low and high alcohol population means differ.

Multiple comparisons: simultaneous confidence intervals

A simultaneous confidence interval for the difference between two means has margin of error

  $m = t^{**}\, s_p \sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}$

This margin of error is called the minimum significant difference (MSD).

For example, the MSD for the low and high alcohol groups with $n_2 = 16$, $n_3 = 16$, $s_p^2 = MSE = 115.486$, and $\alpha = 0.05$ is

  $m = t^{**}\, s_p \sqrt{\dfrac{1}{n_2} + \dfrac{1}{n_3}} = 2.704 \times \sqrt{115.486 \times 0.125} = 10.274$

If the absolute mean difference, $|\bar{x}_2 - \bar{x}_3| = |64.69 - 47.19| = 17.50$, is larger than the margin of error 10.274, the difference is significantly different from zero.
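Both worked examples share the same standard error, so they can be computed together. A minimal sketch, assuming the values quoted in the text; the critical value 2.704 is the conservative table value used above and is not computed here.

```python
import math

# Values quoted in the text for the low versus high alcohol comparison.
mse = 115.486                        # pooled sample variance
n_low, n_high = 16, 16
mean_low, mean_high = 64.69, 47.19
t_crit = 2.704                       # conservative table value t(0.005, 40)

# Standard error of the difference, based on the pooled variance of all groups.
se = math.sqrt(mse * (1 / n_low + 1 / n_high))

t_stat = (mean_low - mean_high) / se   # Bonferroni t-test statistic
msd = t_crit * se                      # minimum significant difference

print(f"SE = {se:.3f}, t = {t_stat:.3f}, MSD = {msd:.3f}")
print("significant:", abs(t_stat) >= t_crit)
```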

Multiple comparisons: simultaneous confidence intervals (continued)

If computed by hand, the minimum significant difference is especially useful when group sizes are equal. In that case, only one minimum significant difference needs to be computed.

Group means $\bar{x}_1 = 63.75$, $\bar{x}_2 = 64.69$, and $\bar{x}_3 = 47.19$ provide the following table with absolute mean differences:

                no alcohol    low alcohol    high alcohol
no alcohol      -              0.94          16.56
low alcohol      0.94         -              17.50
high alcohol    16.56         17.50          -

Absolute mean differences larger than the margin of error 10.274 are significantly different from zero.
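With equal group sizes, the whole table can be checked against the single minimum significant difference. A minimal sketch, assuming the group means and the margin of error 10.274 given in the text:

```python
from itertools import combinations

# Group means and the minimum significant difference from the text.
means = {"no alcohol": 63.75, "low alcohol": 64.69, "high alcohol": 47.19}
msd = 10.274  # valid for every pair because all group sizes are equal

# Compare each absolute mean difference with the MSD.
results = {}
for (name_a, mean_a), (name_b, mean_b) in combinations(means.items(), 2):
    diff = abs(mean_a - mean_b)
    results[(name_a, name_b)] = (round(diff, 2), diff > msd)

for pair, (diff, significant) in results.items():
    label = "significant" if significant else "not significant"
    print(f"{pair[0]} vs {pair[1]}: {diff:.2f} ({label})")
```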

Multiple comparisons: Bonferroni by SPSS

Multiple Comparisons: Attractiveness of Date (Bonferroni)

(I) Alcohol     (J) Alcohol     Mean Difference                         95% Confidence Interval
Consumption     Consumption     (I-J)              Std. Error    Sig.     Lower Bound    Upper Bound
No              Low               -.938            3.799         1.000    -10.39           8.51
No              High             16.563*           3.799          .000      7.11          26.01
Low             No                 .938            3.799         1.000     -8.51          10.39
Low             High             17.500*           3.799          .000      8.05          26.95
High            No              -16.563*           3.799          .000    -26.01          -7.11
High            Low             -17.500*           3.799          .000    -26.95          -8.05

*. The mean difference is significant at the 0.05 level.

Column Mean Difference (I-J): difference between the means of groups I and J
Column Std. Error: $s_p \sqrt{1/n_i + 1/n_j} = \sqrt{MSE} \sqrt{1/n_i + 1/n_j}$
Column Sig.: the t-test p-value times the number of tests, $p \times \text{tests}$ (with a maximum of 1)
Column 95% CI: $(\bar{x}_i - \bar{x}_j) \pm t^{**}\, SE$, where $t^{**}$ is based on $\alpha = (0.05/2)/3$

Other post-hoc tests

Many methods keep the family-wise type I error under control at the cost of an increased type II error (and a decreased power). The following list runs from most liberal to most conservative:
1. Fisher's least significant difference (LSD): no correction whatsoever
2. Duncan's new multiple range test: family-wise $\alpha = 1 - (1 - \alpha)^{\text{tests} - 1}$
3. Dunnett's test: reference group against the rest
4. Tukey's range test: based on the studentized range distribution $q$
5. Šidák correction: test-wise $\alpha = 1 - (1 - \alpha)^{1/\text{tests}}$
6. Bonferroni correction: test-wise $\alpha = \alpha/\text{tests}$
7. Scheffé's method

Methods are more or less sensitive to violation of assumptions such as homogeneity of variances, normality, and independence.

Step 5: report results

Report (in text):

There was a statistically significant difference between groups as determined by one-way ANOVA (F(2,45) = 13.426, p < .001). A Bonferroni post-hoc test revealed statistically significant differences between groups: the high alcohol group (47.19 ± 13.288) selected statistically significantly less attractive partners than the no alcohol group (63.75 ± 8.466, p < .001) and the low alcohol group (64.69 ± 9.911, p < .001). There was no statistically significant difference between the no alcohol and the low alcohol group (p = 1.000).

Overview

Finally:
Analysis of variance tests differences between groups on a numerical variable by comparing within-group and between-group variances in an F-test.
Test results are summarized in an ANOVA table.
Effect sizes can be computed from the ANOVA table.
Table content can be found using raw data; it can also be found using aggregated data (means and variances).
Post-hoc tests (afterwards) need special attention because of inflated type I errors.
An analysis of variance follows a number of commonly accepted steps.