One-way ANOVA Inference for one-way ANOVA

Size: px

Start display at page:

Download "One-way ANOVA Inference for one-way ANOVA"

Roderick Webster
5 years ago
Views:

1 One-way ANOVA Inference for one-way ANOVA IPS Chater W.H. Freeman and Comany

2 Objectives (IPS Chater 12.1) Inference for one-way ANOVA Comaring means The two-samle t statistic An overview of ANOVA The ANOVA model Testing hyotheses in one-way ANOVA The F-test The ANOVA table

3 The idea of ANOVA Reminders: A factor is a variable that can take one of several levels used to differentiate one grou from another. An exeriment has a one-way, or comletely randomized, design if several levels of one factor are being studied and the individuals are randomly assigned to its levels. Examle: Four levels of nematode quantity in seedling growth exeriment. Two seed secies and four levels of nematodes would be a two-way design. Analysis of variance (ANOVA) is the technique used to determine if there are any differences among the means of the treatment grous. One-way ANOVA is used for comletely randomized, one-way designs.

4 Comaring means We want to know if the observed differences in samle means are likely to have occurred by chance just because of random samling. This will likely deend on both the difference between the samle means and how much variability there is within each samle.

Reminder: Two-samle t statistic A two samle t-test assuming equal variance or an ANOVA comaring only two grous will give you the exact same -value (for a two-sided hyothesis).

5 Reminder: Two-samle t statistic A two samle t-test assuming equal variance or an ANOVA comaring only two grous will give you the exact same -value (for a two-sided hyothesis). H 0 : µ 1 = µ 2 H a : µ 1 µ 2 One-way ANOVA F-statistic H 0 : µ 1 = µ 2 H a : µ 1 µ 2 t-test assuming equal variance t-statistic F = t 2 and both -values are the same. But the t-test is more flexible: You may choose a one-sided alternative instead, or you may want to run a t-test assuming unequal variance if you are not sure that your two oulations have the same standard deviation σ.

6 An Overview of ANOVA We first examine the multile oulations or multile treatments to test for overall statistical significance as evidence of any difference among the arameters we want to comare using the ANOVA F-test If that overall test shows statistical significance, then a detailed follow-u analysis is legitimate. If we lanned our exeriment with secific alternative hyotheses in mind (before gathering the data), we can test them using contrasts. If we do not have secific alternatives, we can examine all air-wise arameter comarisons to define which arameters differ from which, using multile comarisons rocedures.

7 Nematodes and lant growth Do nematodes affect lant growth? A botanist reares 16 identical lanting ots and adds different numbers of nematodes into the ots. Seedling growth (in mm) is recorded two weeks later. Hyotheses: All µ i are the same (H 0 ) versus not All µ i are the same (H a )

8 The ANOVA model Random samling always roduces chance variations. Any factor effect would thus show u in our data as the factor-driven differences lus chance variations ( error ): Data = fit ( grou mean ) + residual ( error ) The one-way ANOVA model analyzes situations where chance variations are normally distributed N(0,σ) so that:

9 The ANOVA table Source of variation Sum of squares SS DF Mean square MS F P value Grous I -1 SSG/DFG MSG/MSE Tail area SSG = n i (x i x) 2 above F Error N - I SSE/DFE 2 SSE = (n i 1)s i Total SST=SSG+SSE N 1 = (x ij x) 2 R 2 = SSG/SST MSE = s Coefficient of determination Pooled standard deviation The sum of squares reresents variation in the data: SST = SSG + SSE. The degrees of freedom likewise reflect the ANOVA model: DFT = DFG + DFE.

Testing hyotheses in one-way ANOVA We have I indeendent SRSs, from I oulations or treatments. The i th oulation has a normal distribution with unknown mean µ i.

10 Testing hyotheses in one-way ANOVA We have I indeendent SRSs, from I oulations or treatments. The i th oulation has a normal distribution with unknown mean µ i. All I oulations have the same standard deviation σ, unknown. The ANOVA F statistic tests: F SSG ( I = SSE ( N 1) I) H 0 : µ 1 = µ 2 = = µ I H a : not all the µ i are equal. When H 0 is true, F has the F distribution with I 1 (numerator) and N I (denominator) degrees of freedom.

11 The ANOVA F-test The ANOVA F-statistic comares variation due to secific sources (levels of the factor) with variation among individuals who should be similar (individuals in the same samle). F = variation among samlemeans variation among individuals in samesamle Difference in means small relative to overall variability Difference in means large relative to overall variability è F tends to be small è F tends to be large Larger F-values tyically yield more significant results. How large deends on the degrees of freedom (I 1 and N I).

12 Checking our assumtions Each of the I oulations must be normally distributed (histograms or normal quantile lots). But the test is robust to normality deviations for large enough samle sizes, thanks to the central limit theorem. The ANOVA F-test requires that all oulations have the same standard deviation σ. Since σ is unknown, this can be hard to check. Practically: The results of the ANOVA F-test are aroximately correct when the largest samle standard deviation is no more than twice as large as the smallest samle standard deviation. (Equal samle sizes also make ANOVA more robust to deviations from the equal σ rule)

13 Do nematodes affect lant growth? Seedling growth x i s i 0 nematode nematodes nematodes nematodes Conditions required: equal variances: checking that largest s i no more than twice smallest s i Largest s i = 2.053; smallest s i = Indeendent SRSs Four grous obviously indeendent Distributions roughly normal It is hard to assess normality with only four oints er condition. But the ots in each grou are identical, and there is no reason to susect skewed distributions.

14 Excel outut for the one-way ANOVA Menu/Tools/DataAnalysis/AnovaSingleFactor Anova: Single Factor SUMMARY Grous Count Sum Average Variance 0 nematode nematodes nematodes nematodes numerator denominator ANOVA Source of Variation SS df MS F P-value F crit Between Grous Within Grous Total Here, the calculated F-value (12.08) is larger than F critical (3.49) for α=0.05. Thus, the test is significant at α=5% è Not all mean seedling lengths are the same; the number of nematodes is an influential factor.

SPSS outut for the one-way ANOVA ANOVA SeedlingLength Between Grous Within Grous Total Sum of Squares df Mean Square F Sig. 100.647 3 33.549 12.080.001 33.328 12 2.777 133.

15 SPSS outut for the one-way ANOVA ANOVA SeedlingLength Between Grous Within Grous Total Sum of Squares df Mean Square F Sig The ANOVA found that the amount of nematodes in ots significantly imacts seedling growth. The grah suggests that nematode amounts above 1,000 er ot are detrimental to seedling growth.

16 Using Table E The F distribution is asymmetrical and has two distinct degrees of freedom. This was discovered by Fisher, hence the label F. Once again, what we do is calculate the value of F for our samle data and then look u the corresonding area under the curve in Table E.

17 Table E df num = I 1 For df: 5,4 F df den = N I

ANOVA Source of Variation SS df MS F P-value F crit Between Grous 101 3 33.5 12.08 0.00062 3.

18 ANOVA Source of Variation SS df MS F P-value F crit Between Grous Within Grous Total F critical for α 5% is 3.49 F = > Thus < 0.001

19 Yogurt rearation and taste Yogurt can be made using three distinct commercial rearation methods: traditional, ultra filtration, and reverse osmosis. To study the effect of these methods on taste, an exeriment was designed where three batches of yogurt were reared for each of the three methods. A trained exert tasted each of the nine samles, resented in random order, and judged them on a scale of 1 to 10. Variables, hyotheses, assumtions, calculations? ANOVA table Source of variation SS df MS F P-value F crit Between grous 17.3 I-1= Within grous 4.6 N-I= Total 21.9

20 df num = I 1 df den = N I F

21 Comutation details F = MSG MSE = SSG ( I SSE ( N 1) I) MSG, the mean square for grous, measures how different the individual means are from the overall mean (weighted average of square distances of samle averages to the overall mean). SSG is the sum of squares for grous. MSE, the mean square for error is the ooled samle variance s 2 and estimates the common variance σ 2 of the I oulations (weighted average of the variances from each of the I samles). SSE is the sum of squares for error.

22 One-way ANOVA Comaring the means IPS Chater W.H. Freeman and Comany

23 Objectives (IPS Chater 12.2) Comaring the means Contrasts Multile comarisons Power of the one-way ANOVA test

24 You have calculated a -value for your ANOVA test. Now what? If you found a significant result, you still need to determine which treatments were different from which. You can gain insight by looking back at your lots (boxlot, mean ± s). There are several tests of statistical significance designed secifically for multile tests. You can choose a riori contrasts, or a osteriori multile comarisons. You can find the confidence interval for each mean µ i shown to be significantly different from the others.

25 Contrasts can be used only when there are clear exectations BEFORE starting an exeriment, and these are reflected in the exerimental design. Contrasts are lanned comarisons. Patients are given either drug A, drug B, or a lacebo. The lacebo is meant to rovide a baseline against which the other drugs can be comared. Multile comarisons should be used when there are no secific lanned comarisons. Those are a osteriori, air-wise tests of significance. We comare gas mileage for eight brands of SUVs. We have no rior knowledge to exect one brand to erform differently from the rest. Pairwise comarisons should be erformed here, but only if an ANOVA test on all eight brands reached statistical significance first. It is NOT aroriate to use a contrast test when suggested comarisons aear only after the data is collected.

26 Contrasts: lanned comarisons When an exeriment is designed to test one or more secific hyotheses that some treatments are different from other treatments, we can use contrasts to test for significant differences between these secific treatments, EVEN if the overall ANOVA F-test is not significant. Contrasts are more owerful than multile comarisons because they are more secific. They are more able to ick u a significant difference. You can use a t-test on the contrasts or calculate a t-confidence interval. The results are valid regardless of the results of your multile samle ANOVA test (you are still testing a valid hyothesis).

27 A contrast is a combination of oulation means of the form : ψ = a µ Where the coefficients a i have sum 0. The corresonding samle contrast is : c = a i x i The standard error of c is : SE c = s i 2 ai = n i i MSE a n 2 i i To test the null hyothesis H 0 : ψ = 0 use the t-statistic: t = c SE c With degrees of freedom DFE that is associated with s. The alternative hyothesis can be one- or two-sided. A level C confidence interval for the contrast ψ is : c ± t * SE c Where t* is the critical value defining the middle C% of the t distribution with DFE degrees of freedom.

28 Contrasts are not always readily available in statistical software ackages (when they are, you need to assign the coefficients a i ), or may be limited to comaring each samle to a control. If your software doesn t rovide an otion for contrasts, you can test your contrast hyothesis with a regular t-test using the formulas we just highlighted. Remember to use the ooled variance and degrees of freedom as they reflect the better estimate of the oulation variance.

29 Nematodes and lant growth Do nematodes affect lant growth? A botanist reares 16 identical lanting ots and adds different numbers of nematodes into the ots. Seedling growth (in mm) is recorded two weeks later. Nematodes Seedling growth , , , overall mean 8.03 x i One grou contains no nematodes at all. If the botanist lanned this grou as a baseline/control, then a contrast of all the nematode grous against the control would be valid.

30 Nematodes: lanned comarison Contrast of all the nematode grous against the control: Combined contrast hyotheses: H 0 : µ 1 = 1/3 (µ 2 + µ 3 + µ 4 ) H a : µ 1 > 1/3 (µ 2 + µ 3 + µ 4 ) x i s i G1: 0 nematode G2: 1,000 nematodes G3: 5,000 nematodes G4: 1,0000 nematodes Contrast coefficients: (+1 1/3 1/3 1/3) or ( ) c = a i xi = 3 * = SE c = s a n 2 i i = 2.78 * ( 1) + 3* t = c SE = df : N-I = 12 c In Excel: TDIST(3.6,12,1) = tdist(t, df, tails) (-value). Nematodes result in significantly shorter seedlings (alha 1%).

31 ANOVA vs. contrasts in SPSS ANOVA: SeedlingLength Between Grous Within Grous Total H 0 : all µ i are equal vs. H a : not all µ i are equal ANOVA Sum of Squares df Mean Square F Sig è not all µ i are equal Planned comarison: H 0 : µ 1 = 1/3 (µ 2 + µ 3 + µ 4 ) vs. H a : µ 1 > 1/3 (µ 2 + µ 3 + µ 4 ) à one tailed Contrast coefficients: ( ) Contrast 1 Contrast Coefficients NematodesLevel SeedlingLength Assume equal variances Does not assume equal variances Contrast 1 1 Contrast Tests Value of Contrast Std. Error t df Sig. (2-tailed) Nematodes result in significantly shorter seedlings (alha 1%).

32 Multile comarisons Multile comarison tests are variants on the two-samle t-test. They use the ooled standard deviation s = MSE, the ooled degrees of freedom DFE, and they comensate for the multile comarisons. We comute the t-statistic for all airs of means: A given test is significant (µ i and µ j significantly different), when t ij t** (df = DFE). The value of t** deends on which rocedure you choose to use.

33 The Bonferroni rocedure The Bonferroni rocedure erforms a number of airwise comarisons with t-tests and then multilies each -value by the number of comarisons made. This ensures that the robability of making one or more false rejections among all comarisons made is no greater than the chosen significance level α. As a consequence, the higher the number of air-wise comarisons you make, the more difficult it will be to show statistical significance for each test. But the chance of committing a tye I error also increases with the number of tests made. The Bonferroni rocedure lowers the working significance level of each test to comensate for the increased chance of tye I errors among all tests erformed.

34 Simultaneous confidence intervals We can also calculate simultaneous level C confidence intervals for all air-wise differences (µ i µ j ) between oulation means: CI 1 : ( xi x j ) ± t ** s + n i 1 n j q s is the ooled variance, MSE. q t** is the t critical with degrees of freedom DFE = N I, adjusted for multile, simultaneous comarisons (e.g., Bonferroni rocedure).

35 SYSTAT File contains variables: GROWTH NEMATODES$ Categorical values encountered during rocessing are: NEMATODES$ (four levels): 10K, 1K, 5K, none De Var: GROWTH N: 16 Multile R: Squared multile R: Analysis of Variance Source Sum-of-Squares df Mean-Square F-ratio P NEMATODES$ Error Post Hoc test of GROWTH Using model MSE of with 12 df. Matrix of airwise mean differences: Bonferroni Adjustment Matrix of airwise comarison robabilities:

36 SigmaStat One-Way Analysis of Variance Normality Test: Passed (P > 0.050) Equal Variance Test: Passed (P = 0.807) Grou Name N Missing Mean Std dev SEM None K K K Source of variation DF SS MS F P Between grous <0.001 Residual Total Power of erformed test with alha = 0.050: All Pairwise Multile Comarison Procedures (Bonferroni t-test): Comarisons for factor: Nematodes Comarison Diff of means t P P<0.050 None vs. 10K Yes None vs. 5K Yes None vs. 1K No 1K vs. 10K Yes 1K vs. 5K Yes 5K vs. 10K No

37 SPSS SeedlingLength Between Grous Within Grous Total ANOVA Sum of Squares df Mean Square F Sig Multile Comarisons Deendent Variable: SeedlingLength Bonferroni (I) NematodesLevel (J) NematodesLevel *. The mean difference is significant at the.05 level. Mean Difference 95% Confidence Interval (I-J) Std. Error Sig. Lower Bound Uer Bound * * * * * * * *

Teaching methods A study comares the reading comrehension ( COMP, a test score) of children randomly assigned to one of three teaching methods: basal, DRTA, and strategies.

38 Teaching methods A study comares the reading comrehension ( COMP, a test score) of children randomly assigned to one of three teaching methods: basal, DRTA, and strategies. We test: H 0 : µ Basal = µ DRTA = µ Strat vs. H a : H 0 not true The ANOVA test is significant (α 5%): we have found evidence that the three methods do not all yield the same oulation mean reading comrehension score.

39 What do you conclude? The three methods do not yield the same results: We found evidence of a significant difference between DRTA and basal methods (DRTA gave better results on average), but the data gathered does not suort the claim of a difference between the other methods (DRTA vs. strategies or basal vs. strategies).

40 Power The ower, or sensitivity, of a one-way ANOVA is the robability that the test will be able to detect a difference among the grous (i.e. reach statistical significance) when there really is a difference. Estimate the ower of your test while designing your exeriment to select samle sizes aroriate to detect an amount of difference between means that you deem imortant. Too small a samle is a waste of exeriment, but too large a samle is also a waste of resources. A ower of at least 80% is often suggested.

41 Power comutations ANOVA ower is affected by The significance level α The samle sizes and number of grous being comared The differences between grou means µ i The guessed oulation standard deviation You need to decide what alternative H a you would consider imortant, detect statistically for the means µ i, and to guess the common standard deviation σ (from similar studies or reliminary work). The ower comutations then require calculating a non-centrality aramenter λ, which follows the F distribution with DFG and DFE degrees of freedom to arrive at the ower of the test.

42 Systat: Power analysis Alha = 0.05 Model = Oneway If we anticiated a gradual decrease of Number of grous = 4 Within cell S.D. = 1.5 guessed seedling length for increasing amounts Mean(01) = 7.0 of nematodes in the ots and would Mean(02) = 8.0 consider gradual changes of 1 mm on Mean(03) = 9.0 Mean(04) = 10.0 average to be imortant enough to be Effect Size = reorted in a scientific journal Noncentrality arameter = * samle size SAMPLE SIZE POWER (er cell) then we would reach a ower of 80% or more when using six ots or more for each condition (Four ots er condition would only bring a ower of 55%.)

43 Systat: Power analysis If we anticiated that only large amounts of nematodes in the ots would lead to substantially shorter seedling lengths and would consider ste changes of 4 mm on average an imortant effect then we would reach a ower of 80% or more when using four ots or more in each grou (three ots er condition might be close enough though). Alha = 0.05 Power = 0.80 Model = One-way Number of grous = 4 Within cell S.D. = 1.7 Mean(01) = 6.0 Mean(02) = 6.0 Mean(03) = 10.0 Mean(04) = 10.0 Effect size = Noncentrality arameter = SAMPLE SIZE POWER (er cell) Total samle size = 16 guessed * samle size

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants.

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants. The idea of ANOVA Reminders: A factor is a variable that can take one of several levels used to differentiate one group from another. An experiment has a one-way, or completely randomized, design if several