One-factor analysis of variance (ANOVA)

1 One-factor analysis of variance (ANOVA) March 1, 2017 psych10.stanford.edu

2 Announcements / Action Items Schedule update: final R lab moved to Week 10 Optional Survey 5 coming soon, due on Saturday

3 Last time The surprisingness of an event depends on whether we were interested in that specific event occurring or in one of many possible events occurring, which has implications for multiple comparisons in hypothesis testing. Choosing the correct conditional probability matters; we can use Bayes' rule to find these correct conditional probabilities from partial information. We need to account for whether our observations are independent.

4 This time Recap: what types of questions do we ask when we compare two means? How much variance should we expect between group means? How can we compare multiple means?

6 Memorizing letters Can people memorize more letters when they are grouped to have conceptual meaning than when they are not? n = 50 college students were given a sequence of thirty letters to memorize in twenty seconds. Grouping variable: randomly assigned to either recognizable three-letter groupings (JFK-CIA-FBI-USA-) or an unrecognizable grouping (JFKC-IAF-BIU-). Response variable: participants immediately listed as many letters as they could remember; the score was the number of letters correctly listed before the first mistake.

7 What might we want to know? What's the best guess for # of items memorized, given that a person (past or future) is in a specific group? What's our best guess for the difference between (population) group means? What are the lowest and highest plausible guesses for the difference between (population) group means? Would it be likely to observe a difference between groups this large if there actually is no difference in the populations? Is recognizability of the words a major determinant of how many words people can memorize? Can these data address questions about a cause-and-effect relationship between recognizability and memory? What people and what situations do these findings generalize to? Lots of other things! Generate a question, then find a tool (not vice versa).

8 What (and why) might we want to know? What's the best guess for # of items memorized, given that a person (past or future) is in a specific group? The group sample means (11.15 words and words). What's our best guess for the difference between (population) group means? The difference in group sample means (3.17 words).

9 What (and why) might we want to know? Is recognizability of the words a major determinant of how many words people can memorize? r² is roughly (-1.48)² / ((-1.48)² + df) = .046; in our sample, it explains ~5% of the variability in number of words memorized.

10 What (and why) might we want to know? Would it be likely to observe a difference between groups this large if there actually is no difference in the populations? t = -1.48, p = .15: it is reasonably likely to observe this difference in sample means if there is no difference in population means, so we don't have enough evidence to infer that the population means are different.
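This two-group comparison is the familiar t-test. As a rough illustration (not the course's code, and with made-up data that only mimics the study's structure), a test like the one reported above could be run in R as follows:

```r
# Hedged sketch: a Student's t-test on hypothetical data shaped like the
# memorizing-letters study. `letters_df`, `score`, and `grouping` are assumed
# names; the numbers are invented and will not reproduce t = -1.48.
set.seed(1)
letters_df <- data.frame(
  score    = c(rnorm(25, mean = 11, sd = 7), rnorm(25, mean = 14, sd = 8)),
  grouping = rep(c("recognizable", "unrecognizable"), each = 25)
)

# Student's t-test with equal variances assumed (the version ANOVA maps onto)
t.test(score ~ grouping, data = letters_df, var.equal = TRUE)
```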

11 What (and why) might we want to know? What are the lowest and highest plausible guesses for the difference between (population) group means? Confidence interval: (-7.47 words, 1.14 words). There might be a small advantage for the unrecognizable strings, or there might be a really big advantage for the recognizable strings; probably worth studying some more.

12 What (and why) might we want to know? Can these data address questions about a cause-and-effect relationship between recognizability and memory? What people and what situations do these findings generalize to? An experiment with random assignment can support cause-and-effect conclusions; we might want to generalize to people similar to college students and to situations similar to memorizing letter strings and recalling them immediately afterward.

13 This time Recap: what types of questions do we ask when we compare two means? How much variance should we expect between group means? How can we compare multiple means?

14 A note on terminology Analysis of variance (ANOVA) is a broad term that describes a general procedure. This procedure can be used to analyze many situations, with multiple variables, termed factors. Today, I'm referring to a one-factor, independent-samples analysis of variance as simply ANOVA.

15 Reminder Each observation's deviation from the pooled mean splits into a piece we can explain (its group's mean relative to the pooled mean) and a piece we cannot explain (the observation relative to its group's mean): (x - x̄pooled) = (x̄group - x̄pooled) + (x - x̄group), i.e., total variance = variance we can explain + variance we cannot explain. (1) Sum of squares (SS): for each value, calculate its distance from the mean and square it, then sum these squared values. (2) Variance (s² or σ²): mean of these squared values. (3) Standard deviation (s or σ): square root of the variance.

16 Ratio of variances When comparing two means, a t-statistic is a ratio. Numerator: the difference in sample means that we observed. Denominator: a typical distance between sample means that we would expect to observe if the population means were equal. A difference can only compare two means; how can we summarize distances between many values? Variance: a typical squared distance from the mean. We'll look at a new ratio. Numerator: the variance in sample means that we observed. Denominator: the variance in sample means that we would expect to observe if the population means (and variances) were equal (after some rearranging).

17 We're imagining that our observed data are generated by the following process: we have a single distribution of values; we select n1 values as group 1, n2 values as group 2, etc.; but group is arbitrary, because all of the values came from the same distribution; any differences in means between groups are from sampling error. This is a distribution-of-means problem.

18 How can we calculate this ratio? The central limit theorem describes a distribution of sample means: it has a standard deviation equal to the population standard deviation divided by the square root of the sample size, and a variance equal to the population variance divided by the sample size (n). So the ratio becomes: (variance of sample means we observed) / (variance of sample means we would expect to observe if the population means and variances were equal) = (variance of sample means we observed) / (variance of individual values / sample size (n) corresponding to each sample mean) = (n × variance of sample means we observed) / (variance of individual values).
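The denominator rests on the claim that the variance of sample means is the population variance divided by n. A quick simulation sketch in R (with an arbitrary made-up population, not the study's) checks this, and shows why n × (variance of the observed group means) is comparable to the variance of individual values when the null is true:

```r
# Sketch: under the null, every group is just a sample from one distribution,
# so the variance of sample means should be sigma^2 / n (central limit theorem).
# n and sigma2 are arbitrary illustration values, not from the study.
set.seed(2)
n      <- 25     # observations per group
sigma2 <- 9      # population variance

# Draw many samples from the same population and record each sample's mean
sample_means <- replicate(10000, mean(rnorm(n, mean = 0, sd = sqrt(sigma2))))

var(sample_means)       # close to sigma2 / n = 0.36
n * var(sample_means)   # close to sigma2 = 9, the variance of individual values
```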

19 How can we calculate this ratio?
numerator: n × (variance of sample means we observed)
= n Ʃgroup (x̄group - x̄pooled)² / dfbetween, where dfbetween = # groups - 1
= Ʃgroup ngroup (x̄group - x̄pooled)² / dfbetween
= Ʃvalues (x̄group - x̄pooled)² / dfbetween
= SSbetween (aka SSexplained) / dfbetween
denominator: variance of individual values we observed
= Ʃvalues (xi - x̄group)² / dfwithin
= SSwithin (aka SSunexplained) / (# values - # groups)

20 F-ratio F = (SSbetween / dfbetween) / (SSwithin / dfwithin): a ratio of independent estimates of the population variance (if the null hypothesis is true). [figure: number of letters memorized plotted by observation, with the spread estimated using SSbetween and the spread estimated using SSwithin marked]

21 ANOVA table Where k = # of groups and N = total number of participants; all sums are over all individual observations.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | Ʃ(x̄group - x̄pooled)² | k - 1 | MSb = SSb / dfb | MSb / MSw
Within | Ʃ(xi - x̄group)² | N - k | MSw = SSw / dfw |
Total | Ʃ(xi - x̄pooled)² (= SSb + SSw) | N - 1 (= dfb + dfw) | |

22 SS with dplyr

23 SS with dplyr
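The code on these two slides was not captured in the transcription. As a minimal sketch of what an SS computation with dplyr can look like (the data frame and column names here are assumed, not the course's), the explained and unexplained sums of squares can be built from the pooled and group means:

```r
library(dplyr)

# Tiny made-up data set with the same structure as the study
memory_df <- data.frame(
  score    = c(10, 12, 9, 15, 14, 16),
  grouping = rep(c("recognizable", "unrecognizable"), each = 3)
)

memory_df %>%
  mutate(grand_mean = mean(score)) %>%       # x_pooled
  group_by(grouping) %>%
  mutate(group_mean = mean(score)) %>%       # x_group for each observation
  ungroup() %>%
  summarize(
    ss_between = sum((group_mean - grand_mean)^2),  # explained
    ss_within  = sum((score - group_mean)^2),       # unexplained
    ss_total   = sum((score - grand_mean)^2)        # = ss_between + ss_within
  )
```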

24 ANOVA table Where k = # of groups and N = total number of participants; all sums are over all individual observations.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | Ʃ(x̄group - x̄pooled)² | k - 1 | MSb = SSb / dfb | MSb / MSw
Within | Ʃ(xi - x̄group)² | N - k | MSw = SSw / dfw |
Total | Ʃ(xi - x̄pooled)² (= SSb + SSw) | N - 1 (= dfb + dfw) | |

25 ANOVA table Where k = # of groups and N = total number of participants; all sums are over all individual observations.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | | k - 1 | MSb = SSb / dfb | MSb / MSw
Within | | N - k | MSw = SSw / dfw |
Total | | N - 1 (= dfb + dfw) | |

26 ANOVA table Where k = # of groups and N = total number of participants; all sums are over all individual observations.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | | 1 | MSb = SSb / dfb | MSb / MSw
Within | | 49 | MSw = SSw / dfw |
Total | | 50 | |

27 ANOVA table Where k = # of groups and N = total number of participants; all sums are over all individual observations.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | | 1 | SSb / 1 | MSb / MSw
Within | | 49 | SSw / 49 |
Total | | 50 | |

28 ANOVA table Where k = # of groups and N = total number of participants; all sums are over all individual observations. Is 2.22 unlikely if the null hypothesis is true?
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | | 1 | SSb / 1 | MSb / MSw = 2.22
Within | | 49 | SSw / 49 |
Total | | 50 | |

29 F distributions F is a ratio: the observed variance in sample means divided by the expected variance in sample means, i.e., two independent estimates of the population variance. If there are no differences between groups (in the population): What would we expect F to be? It is a ratio of two things that we believe are equal, so about 1. Will it always be exactly that? No, due to sampling error. How high could F get? When should the numerator be greater than the denominator? There is no limit; the numerator should be greater than the denominator if the variability in our group means is the combination of random variability and different population means. How low could F get? When should the numerator be less than the denominator? 0; that happens only by random chance and is not consistent with any situation of interest, so we are only interested in the upper (right) tail.
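To make the "about 1 under the null" intuition concrete, here is a small simulation sketch in R (the group sizes and population are made up for illustration): sample every group from one distribution and record the resulting F-ratios.

```r
# Sketch: the distribution of F when the null hypothesis is true.
# k, n, and the population are arbitrary illustration choices.
set.seed(3)
k <- 2    # number of groups
n <- 25   # observations per group

null_f <- replicate(10000, {
  values <- rnorm(k * n)                  # one distribution, no group effect
  groups <- factor(rep(seq_len(k), each = n))
  summary(aov(values ~ groups))[[1]][["F value"]][1]
})

mean(null_f)            # hovers near 1 (slightly above)
quantile(null_f, 0.95)  # close to qf(0.95, df1 = k - 1, df2 = k * n - k)
```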

30 F distributions What could we do if we didn't know the shape of the sampling distribution of F statistics we could have observed if there were no differences between groups (i.e., if the null hypothesis were true)? It is already defined: a family of F distributions, indexed by dfbetween (df1) and dfwithin (df2); this order matters. We are only interested in the upper tail, which includes any ordering of differences between means.

31 Statistical significance of F Where k = # of groups and N = total number of participants; all sums are over all individual observations. Is 2.22 unlikely if the null hypothesis is true? No: p > α and Fobserved < Fcritical.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | | 1 | SSb / 1 | MSb / MSw = 2.22
Within | | 49 | SSw / 49 |
Total | | 50 | |
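As a sketch of that check in R (using the F of 2.22 and the degrees of freedom reported on the slide), the upper-tail p-value and the critical value come from pf() and qf():

```r
# Hedged sketch of the significance check for the reported F
f_obs <- 2.22
df1   <- 1    # df between, as on the slide
df2   <- 49   # df within, as on the slide

pf(f_obs, df1, df2, lower.tail = FALSE)  # p-value: area in the upper tail (~.14)
qf(0.95, df1, df2)                       # F critical at alpha = .05 (~4.04)
# p > .05 and f_obs < F critical, so we fail to reject H0
```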

32 Relationship with t-test What if the t-test and ANOVA give us different results? (Shouldn't they give us the same result?) We assumed the population variances were equal (in addition to the means), so the right comparison is to a Student's t-test that assumes equal variances between groups. When we have k = 2, F = t² (the F-statistic that we get from our sample is exactly equal to the square of the Student's t-statistic that we get from our sample).

34 Relationship between F and t If you're curious: the numerator of t² is (x̄1 - x̄2)² = ((x̄1 - x̄2) - (μ1 - μ2))² / 1 when H0 is true, which is an unbiased estimate of the variance of sample mean differences if H0 is true (proof omitted). The denominator of t² is s²x̄1-x̄2, the expected variance of sample mean differences if H0 is true (estimated from the individual observations). So t² is a ratio of variances, i.e., an F-statistic.
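A quick numerical check of F = t² (a sketch on made-up two-group data, not the study's):

```r
# Sketch: for k = 2 groups, the ANOVA F equals the square of Student's t
set.seed(4)
two_groups <- data.frame(
  y = c(rnorm(25, mean = 10, sd = 3), rnorm(25, mean = 12, sd = 3)),
  g = rep(c("a", "b"), each = 25)
)

t_stat <- t.test(y ~ g, data = two_groups, var.equal = TRUE)$statistic
f_stat <- summary(aov(y ~ g, data = two_groups))[[1]][["F value"]][1]

t_stat^2  # matches f_stat (up to floating-point rounding)
f_stat
```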

35 Effect size in ANOVA r² = variance we can explain / total variance = SSbetween / SStotal = .043 (from the SS column of the ANOVA table). Equivalently, r² = t² / (t² + df) = (-1.49)² / ((-1.49)² + df) = .043.

36 Mini recap F is a ratio of two variance estimates that we would expect to be equal if all population groups had the same mean and variance. If this ratio is large, it suggests that the group means are different from each other for reasons other than random variability. If we have two groups, this maps onto the (two-tailed) Student's t-test (the p-value is equal, and F = t²). Then why did we do this? The procedure is not restricted to two groups, and partitioning variance is a useful strategy for other questions that cannot be answered with a t-test.

37 This time Recap: what types of questions do we ask when we compare two means? How much variance should we expect between group means? How can we compare multiple means?

38 Comprehending passages Bransford & Johnson: 1. If the balloons popped, the sound wouldn't be able to carry since everything would be too far away from the correct floor. 2. A closed window would also prevent the sound from carrying, since most buildings tend to be well-insulated. 3. Since the whole operation depends on a steady flow of electricity, a break in the middle of the wire would also cause problems. 4. Of course, the fellow could shout, but the human voice is not loud enough to carry that far. 5. An additional problem is that a string could break on the instrument. 6. Then there could be no accompaniment to the message. 7. It is clear that the best situation would involve less distance. 8. Then there would be fewer potential problems. 9. With face-to-face contact, the least number of things could go wrong.

40 Comprehending passages Grouping variable: 57 participants randomly assigned to hear the passage alone (no picture), see a picture before hearing the passage, or see a picture after hearing the passage. Response variable: test of comprehension of the passage (ranges from 1-7). Two hypotheses, use family-wise α = .05. H0: μnone = μbefore = μafter. HA: it is not the case that μnone = μbefore = μafter (careful, this is not the same thing as μnone ≠ μbefore ≠ μafter).

41 Comprehending passages F (statistical significance): the ratio of (SSbetween / dfbetween) to (SSwithin / dfwithin). r² (effect size): the ratio of SSbetween to SStotal. What is the role of sample size (n) in each of these? [figure: comprehension scores by observation, with the between-group and within-group variability marked]
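As a rough simulation sketch of that question (all numbers made up, not the study's): holding the group means and spread fixed, increasing the sample size inflates F (through the degrees of freedom) while leaving r² roughly where it was.

```r
# Sketch: effect of sample size on F vs. r^2, using invented group means/SDs
set.seed(5)
f_and_r2 <- function(n_per_group) {
  d <- data.frame(
    y = c(rnorm(n_per_group, 4.0, 1.5),    # "none"   (made-up mean)
          rnorm(n_per_group, 6.0, 1.5),    # "before" (made-up mean)
          rnorm(n_per_group, 4.5, 1.5)),   # "after"  (made-up mean)
    g = rep(c("none", "before", "after"), each = n_per_group)
  )
  tab <- summary(aov(y ~ g, data = d))[[1]]
  c(F = tab[["F value"]][1], r2 = tab[["Sum Sq"]][1] / sum(tab[["Sum Sq"]]))
}

f_and_r2(19)    # roughly the study's group size
f_and_r2(190)   # ten times larger: F grows a lot, r^2 stays about the same
```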

42 ANOVA table Where k = # of groups and N = total number of participants; all sums are over all individual observations.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | Ʃ(x̄group - x̄pooled)² | k - 1 | MSb = SSb / dfb | MSb / MSw
Within | Ʃ(xi - x̄group)² | N - k | MSw = SSw / dfw |
Total | Ʃ(xi - x̄pooled)² (= SSb + SSw) | N - 1 (= dfb + dfw) | |

43 Sums of squares

44 ANOVA table Where k = # of groups and N = total number of participants; all sums are over all individual observations.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | | k - 1 | MSb = SSb / dfb | MSb / MSw
Within | | N - k | MSw = SSw / dfw |
Total | | N - 1 (= dfb + dfw) | |

45 ANOVA table Where k = # of groups and N = total number of participants; all sums are over all individual observations.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | | 2 | MSb = SSb / dfb | MSb / MSw
Within | | 54 | MSw = SSw / dfw |
Total | | 56 | |

46 ANOVA table Where k = # of groups and N = total number of participants; all sums are over all individual observations.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | | 2 | SSb / 2 | MSb / MSw
Within | | 54 | SSw / 54 = 1.75 |
Total | | 56 | |

47 ANOVA table Where k = # of groups and N = total number of participants; all sums are over all individual observations.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | | 2 | SSb / 2 | MSb / 1.75
Within | | 54 | SSw / 54 = 1.75 |
Total | | 56 | |

48 ANOVA table in R
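The R output on this slide was not transcribed. As a hedged sketch of how such a table is typically produced (the data frame, column names, and values below are assumed and made up, not the study's), aov() plus summary() gives the Df, Sum Sq, Mean Sq, F, and p columns:

```r
# Made-up data with the study's structure: a 1-7 comprehension score and a
# three-level condition (none / before / after)
passage_df <- data.frame(
  comprehension = c(3, 4, 3, 6, 5, 7, 3, 4, 4),
  condition     = factor(rep(c("none", "before", "after"), each = 3))
)

fit <- aov(comprehension ~ condition, data = passage_df)
summary(fit)   # the ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
```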

49 Effect size r² = SSbetween / SStotal = .27. We can explain 27% of the variance in comprehension scores by taking into account which condition a person was in.
Source | SS (sum of squares) | df (degrees of freedom) | MS (mean square = variance) | F
Between | | 2 | SSb / 2 | MSb / 1.75
Within | | 54 | SSw / 54 = 1.75 |
Total | | 56 | |

50 Comprehending passages Two hypotheses, use family-wise α = .05. H0: μnone = μbefore = μafter; we infer that it is not the case that μnone = μbefore = μafter (careful, this is not the same thing as μnone ≠ μbefore ≠ μafter). We have inferred that some population means are different, and we have described that taking condition into account explains 27% of the variance across scores. But we want to know which groups are different. Follow-up pairwise confidence intervals for differences in population means: 95% CI for the difference after - before: [-2.63, -0.85]; 95% CI for the difference after - none: [-1.03, +0.72]; 95% CI for the difference before - none: [+0.73, +2.42]. Which pairwise comparisons are significant at α = .05?
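One common way to get simultaneous pairwise intervals like these in R is Tukey's HSD on the fitted aov object (a sketch continuing the made-up `fit` and `passage_df` from the sketch above; the slide's exact interval method is not stated, so the numbers will differ):

```r
# Simultaneous 95% CIs for all pairwise differences between condition means
TukeyHSD(fit, conf.level = 0.95)

# An alternative: pairwise t-tests with a family-wise correction on the p-values
pairwise.t.test(passage_df$comprehension, passage_df$condition,
                p.adjust.method = "bonferroni")
```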

51 Recap We can make inferences based on analysis of ratios of variance we compare two estimates of variance that we would expect to be equal if there were no differences between groups in the population We can use analysis of variance to compare means between multiple groups (control family-wise α)

52 Quiz 3 Some comments. [histogram of quiz scores: Mean = 0.72, Median = 0.74, SD = 0.16, IQR = (truncated)]

53 Questions
