Introduction to the Analysis of Variance (ANOVA)


The Analysis of Variance (ANOVA) The analysis of variance (ANOVA) is a statistical technique for testing for differences between the means of multiple (more than two) groups. It is probably the most prevalent statistical technique used in psychological research. The ANOVA is a flexible technique that can be used with a variety of different research designs. In today's lecture, I will explain the logic behind the ANOVA and introduce the one-way between-groups ANOVA, which is an ANOVA in which the groups are defined along only one independent (or quasi-independent) variable.

The Analysis of Variance The purpose of ANOVA is much the same as that of the t tests presented in the preceding lectures: Are the mean differences obtained for sample data sufficiently large for us to conclude that there are mean differences between the populations from which the samples were obtained? The difference between ANOVA and the t tests is that ANOVA can be used in situations where two or more means are being compared, whereas the t tests are limited to situations where only two means are involved.

[Diagram: three populations (Instructor 1, Instructor 2, Instructor 3), each with µ and σ unknown, and a sample drawn from each population]

The Problem of Multiple Comparisons The ANOVA is necessary to protect researchers from an excessive experimentwise error rate in situations where a study is comparing more than two population means. Experimentwise error rate: the probability of making at least one Type I error across multiple comparisons. These situations would otherwise require a series of several t tests to evaluate all of the mean differences. (Remember, a t test can compare only two means at a time.) So why not just use multiple t tests?

The Problem of Multiple Comparisons Why not just use multiple t tests? Although each t test can be evaluated using a specific α-level (risk of Type I error), the α-levels accumulate over a series of tests, so the final familywise α-level can be quite large. Example: For 5 levels of the independent variable, there are 10 possible pairwise comparisons between group means: {1,2},{1,3},{1,4},{1,5},{2,3},{2,4},{2,5},{3,4},{3,5},{4,5}
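The count of pairwise comparisons follows directly from k(k − 1)/2; a minimal sketch in Python (illustrative, not part of the lecture) enumerates the same 10 pairs listed above:

```python
from itertools import combinations

k = 5  # levels of the independent variable

# All unordered pairs of group labels 1..k, i.e. k(k-1)/2 comparisons
pairs = list(combinations(range(1, k + 1), 2))

print(len(pairs))  # 10
print(pairs)       # [(1, 2), (1, 3), ..., (4, 5)]
```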

The Problem of Multiple Comparisons Assume H0 is true and α = 0.05. Then the probability of correctly retaining H0 in a single pairwise comparison is:
P(retain H0 | single comparison) = 1 − α = 0.95
However, we have to make 10 such comparisons. Using the multiplicative law of probability (remember that?), and assuming independent pairwise tests, the probability of correctly retaining the null in all 10 comparisons is:
P(retain H0 in all 10 comparisons) = (0.95)^10 ≈ 0.599
Therefore, α_experimentwise = 1 − P(retain H0 in all 10) = 1 − 0.599 = 0.401
We now have a 40% overall chance of making at least one Type I error!
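The arithmetic on this slide is easy to verify directly; a short sketch under the slide's own assumptions (α = 0.05, 10 independent comparisons):

```python
alpha = 0.05
n_comparisons = 10

# P(no Type I error on a single test) = 1 - alpha
p_single = 1 - alpha

# Assuming independent tests, P(no Type I error in all 10 tests)
p_all_correct = p_single ** n_comparisons     # ≈ 0.599

# Experimentwise (familywise) Type I error rate
alpha_experimentwise = 1 - p_all_correct      # ≈ 0.401

print(round(p_all_correct, 3))         # 0.599
print(round(alpha_experimentwise, 3))  # 0.401
```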

Intro to ANOVA

Null and Alternative Hypotheses in ANOVAs The omnibus null hypothesis is the null hypothesis in the ANOVA: that the population means of all groups being compared are equal i.e., for three groups, H 0 : μ 1 = μ 2 = μ 3 Alternative Hypothesis: at least one population mean is different from the others.

Omnibus Null Hypothesis: µ 1 = µ 2 = µ 3

The Logic of the Analysis of Variance The test statistic for ANOVA is an F-ratio, which is a ratio of two sample variances:
F = (variance including any treatment effects) / (variance without any treatment effects) = MS_between / MS_within
In the context of ANOVA, the sample variances are called mean squares, or MS values. The numerator, MS_between, measures the size of mean differences between samples from different treatment groups. The denominator, MS_within (or MS_error), measures the magnitude of differences that would be expected without any treatment effects.

The Logic of the Analysis of Variance Total variance is partitioned into:
Between-treatments variance — measures differences caused by: systematic treatment effects; sampling error
Within-treatments variance — measures differences caused by: sampling error

Assumptions of the ANOVA Normality of scores: i.e., we assume that the scores in all of our group populations are normally distributed. Since this matters primarily for the sampling distribution of the mean, the ANOVA is fairly robust to violations of this assumption, especially if the sample sizes are reasonably large. Homogeneity of variances: we assume that each population of scores has the same variance (the error variance), i.e., σ_1² = σ_2² = σ_3². ANOVA is fairly robust to violations of this assumption. Independence of observations: e.g., given the population parameters, knowing one person's score tells you nothing about another person's score. Violations of this assumption can have serious implications for an analysis.

The Logic of the ANOVA Regardless of whether or not the null hypothesis is true, the assumption of homogeneity of variances implies that all population variances are equal:
σ_1² = σ_2² = σ_3² = σ²
Thus, as we did for the independent-samples t-test, we can estimate this shared population variance by taking the average of the sample variances (the pooled variance):
σ̂²_within = s_p² = Avg(s_1², s_2², s_3²) = (s_1² + s_2² + s_3²) / 3 (assuming n_1 = n_2 = n_3)

The Logic of the ANOVA However, if all the population means are equal (under H0), then we have a second way to estimate the population variance: we can estimate it using the variance of the sample means. Recall that the Central Limit Theorem tells us how to compute the variance of sample means from the population variance:
σ_M² = σ² / n
We can rearrange this formula to solve for the population variance given the variance of sample means:
σ² = n · σ_M²
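The relation σ_M² = σ²/n can be checked by simulation; a small sketch in Python (the population parameters, sample size, and seed here are my own choices, not from the lecture):

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 100, 15, 20   # assumed population mean, SD, and sample size
n_samples = 20000            # number of simulated samples

# Draw many samples of size n and record each sample mean
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(n_samples)]

# The variance of the sample means should approximate sigma**2 / n = 11.25
var_of_means = statistics.variance(means)
print(var_of_means)
```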

The Logic of the ANOVA Of course, we don't have the variance of sample means either. However, we can estimate it by computing the variance of our three group means:
σ̂_M² = s_M² = Var(M_1, M_2, M_3)
Plugging this into the previous equation, our second estimate of the population variance is:
σ̂²_between = n · s_M²

The Logic of the ANOVA We now have two estimates of the population variance: An estimate computed from the sample variances, which should estimate the population variance regardless of whether H0 is true:
σ̂²_within = s_p² = Avg(s_1², s_2², s_3²)
A second estimate computed from the sample means, which only estimates the population variance if H0 is true:
σ̂²_between = n · s_M² = n · Var(M_1, M_2, M_3)

The Logic of the ANOVA The F-ratio used as the test statistic for the ANOVA is simply the ratio between these two estimates of the population variance:
F = MS_between / MS_within = σ̂²_between / σ̂²_within = n · Var(M_1, M_2, M_3) / Avg(s_1², s_2², s_3²)
If H0 is true, then these two estimates should be equal (on average). In this case, the ratio should be 1.0. However, if H0 is false, then the estimate in the numerator (which is based on the variability of sample means) will include the treatment effect in addition to differences in sample means expected by chance. In this case, the ratio should be greater than 1.0.

The F Distribution [Figure: the F distribution, with the critical region in the upper tail labeled "reject H0" and the region below the critical value labeled "retain H0"]


The Logic of the ANOVA A worked example with three samples:
Sample 1: n = 20, M = 65.40, s² = 12.18
Sample 2: n = 20, M = 70.95, s² = 33.18
Sample 3: n = 20, M = 71.20, s² = 63.52

Within-treatments estimate:
σ̂²_within = s_p² = (s_1² + s_2² + s_3²) / 3 = (12.18 + 33.18 + 63.52) / 3 = 36.29

Variance of the k = 3 sample means (M² values: 4277.16, 5033.90, 5069.44; grand mean = ΣM / k = 207.55 / 3 = 69.18):
ΣM = 207.55, ΣM² = 14380.50
SS_M = ΣM² − (ΣM)² / k = 14380.50 − 14359.00 = 21.50
s_M² = SS_M / df_M = 21.50 / 2 = 10.75

Between-treatments estimate:
σ̂²_between = n · s_M² = 20 × 10.75 = 215.0

F = σ̂²_between / σ̂²_within = 215.0 / 36.29 = 5.92
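The worked example can be reproduced in a few lines from the summary statistics alone; a sketch using Python's statistics module (which computes the same n − 1 sample variance used on the slide):

```python
from statistics import mean, variance

k, n = 3, 20
group_means = [65.40, 70.95, 71.20]
group_vars = [12.18, 33.18, 63.52]

# Within-treatments estimate: average of the sample variances (equal n)
ms_within = mean(group_vars)            # ≈ 36.29

# Between-treatments estimate: n times the variance of the sample means
ms_between = n * variance(group_means)  # ≈ 215.0

F = ms_between / ms_within              # ≈ 5.92
print(round(ms_within, 2), round(ms_between, 1), round(F, 2))
```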