ANOVA: a method to compare more than two samples simultaneously without inflating the Type I error rate (α)
Simplicity
Few assumptions
Adequate for highly complex hypothesis testing
09/30/12
Outline of this class
Data organization and layout
Repartitioning of variance
Definition of a linear model
Combining the linear model with the repartitioning of variances
Definition of a statistic (the F-test)
Data organization
Suppose that we want to investigate the average length of a fish species in three different lakes, because we suspect that there might be some form of local adaptation. We sample 5 fish (replicates) at each lake.
Data organization
First we establish how to measure length. This is an important part of experimental design!
Data organization
Then we collect the data
Data organization
Factor Lake has three levels: 1, 2 and 3
Data organization
We may represent the data as a table. Note that Lake is a classification criterion, that is, we can classify each fish according to the lake where it belongs.
Repartitioning the variance
Total variation: $SS_{total} = \sum_{i=1}^{a} \sum_{j=1}^{n} (x_{ij} - \bar{x})^2$
The analysis of variance was developed by Ronald Aylmer Fisher (1890–1962).
Why this formula?
Total variation = the sum of all the squared differences between each individual value and the grand mean (overall mean). But why square the differences? Because the raw deviations from the grand mean sum to zero and would simply cancel out; squaring makes every deviation positive and gives larger deviations more weight.
Expanding the total variation, the cross-product term sums to zero (= 0), and the total sum of squares splits into two parts:
$\sum_{i}\sum_{j} (x_{ij} - \bar{x})^2 = \sum_{i} n (\bar{x}_i - \bar{x})^2 + \sum_{i}\sum_{j} (x_{ij} - \bar{x}_i)^2$
Total variation = Among (between) treatments variation + Within treatments variation
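This partition can be checked numerically. A minimal Python sketch, using hypothetical fish lengths (not the actual data from these slides):

```python
# Illustrative data: 3 lakes x 5 fish (hypothetical lengths in cm)
samples = [
    [10, 12, 11, 13, 14],  # Lake 1
    [15, 14, 16, 13, 17],  # Lake 2
    [9, 11, 10, 12, 8],    # Lake 3
]

all_values = [x for lake in samples for x in lake]
grand_mean = sum(all_values) / len(all_values)

# Total SS: squared deviations of every fish from the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

# Among-lakes SS: squared deviations of each lake mean from the
# grand mean, weighted by the number of replicates n
ss_among = sum(
    len(lake) * (sum(lake) / len(lake) - grand_mean) ** 2 for lake in samples
)

# Within-lakes SS: squared deviations of each fish from its own lake mean
ss_within = sum(
    (x - sum(lake) / len(lake)) ** 2 for lake in samples for x in lake
)

# The two sides of the partition agree (up to floating-point error)
print(ss_total, ss_among + ss_within)
```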
Repartitioning the variance
What do these quantities measure? The within-treatments variation measures only random error among replicates, while the among-treatments variation measures random error plus any real differences between treatment means.
Repartitioning the variance
Why use analysis of variance to test hypotheses about the means? Because if the means really differ, the among-treatments variation will be larger than what random error alone would produce.
Defining a linear model
Any single measurement can be predicted if we know the mean (μ) of the treatment or sample where it belongs (i) and the error (e) associated with that particular replicate (j) in sample i:
$x_{ij} = \mu_i + e_{ij}$
An interesting property
Take sample 1 (Lake 1)
An interesting property
We can represent any sample in terms of its errors: each observation is its sample mean plus an error, $x_{ij} = \bar{x}_i + e_{ij}$, and within a sample the errors sum to zero, $\sum_j e_{ij} = 0$. We will make use of this property later on...
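This property is easy to verify. A small sketch with hypothetical lengths for the five fish of Lake 1 (illustrative values, not the slides' data):

```python
# Hypothetical lengths (cm) of the 5 fish sampled in Lake 1
lake1 = [10, 12, 11, 13, 14]
mean1 = sum(lake1) / len(lake1)  # sample mean

# Error of each replicate j: e_1j = x_1j - sample mean
errors = [x - mean1 for x in lake1]

# Every observation can be rebuilt from the mean and its error...
reconstructed = [mean1 + e for e in errors]

# ...and the errors always sum to zero within a sample
print(sum(errors))  # 0.0
```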
Back to the linear model
If the null hypothesis is true, all samples (treatments or levels) came from the same population:
$H_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_i = \ldots = \mu_a = \mu$
Defining the linear model
If the null hypothesis is false, some samples will deviate from the grand mean by an amount called $A_i$, so the model becomes $x_{ij} = \mu + A_i + e_{ij}$. The null hypothesis is then:
$H_0: A_1 = A_2 = A_3 = \ldots = A_i = \ldots = A_a = 0$
Joining the linear model and the repartitioning of variances
Where do we know this from?
We know that a sample can also be represented by the deviations of each replicate from the sample mean (the errors).
COVARIANCE
1st Assumption: individual observations are independent of each other (that is, no particular observation influences any other observation, in the same or in another sample): INDEPENDENCE OF OBSERVATIONS
If observations are independent, covariance is null (zero), so the cross-product (covariance) terms that appear when the squared sums are expanded drop out.
Let's focus on this term... This is the deviation of the sample means from the grand mean. (Remember the Central Limit Theorem?)
The central limit theorem says that the means of samples of size n drawn from a population with mean μ and variance σ² are (approximately) normally distributed with mean μ and variance σ²/n.
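The σ²/n result can be illustrated with a seeded simulation (hypothetical population parameters, not from the slides):

```python
import random

random.seed(42)

n = 5              # replicates per sample, as in the fish example
num_samples = 20000
mu, sigma = 50.0, 2.0  # hypothetical population mean and sd (variance = 4)

# Draw many samples of size n from the same normal population
# and record each sample's mean
means = []
for _ in range(num_samples):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

# The variance of the sample means should be close to sigma^2 / n = 0.8
mean_of_means = sum(means) / num_samples
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / (num_samples - 1)
print(var_of_means)  # close to 0.8
```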
2nd Assumption: sample variances are equal (homogeneous, or homoscedastic): HOMOGENEITY OF VARIANCES
$\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_a^2 = \sigma^2$
Using the same argument, since the sample variances are assumed equal (homoscedastic), they can be pooled into a single estimate of the error variance.
We now swap the order of the between-samples and within-samples terms, since Between followed by Within is the most common layout for an ANOVA table.
Introducing degrees of freedom
For a factor with a levels: a − 1
For the within-samples variation: a(n − 1)
For the total variation: an − 1
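For the fish example (a = 3 lakes, n = 5 fish per lake) these formulas can be checked with a minimal sketch:

```python
a = 3  # levels of factor Lake
n = 5  # replicates per level

df_among = a - 1          # among (between) lakes
df_within = a * (n - 1)   # within lakes (error)
df_total = a * n - 1      # total

# The df partition mirrors the SS partition: among + within = total
print(df_among, df_within, df_total)  # 2 12 14
```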
Introducing degrees of freedom and Mean Squares
Mean Square (MS) = Sum of Squares / Degrees of Freedom (SS/DF)
Revisiting the null hypothesis
If the null hypothesis is true, the sample means will be the same as the grand mean and the deviations from the latter ($A_i$) will be zero:
$H_0: A_1 = A_2 = A_3 = \ldots = A_i = \ldots = A_a = 0$
If the null hypothesis is true, the among-samples mean square and the within-samples mean square both estimate the same error variance σ², so their ratio is expected to be close to 1.
Choosing a statistical test
The adequate statistical test
3rd Assumption: the variable being sampled follows a normal distribution (often stated as: the population being sampled follows a normal distribution): NORMALITY OF THE SAMPLED POPULATION
If this is true, the ratio between two variances follows an F-distribution.
The F distribution
F ≈ 1: $H_0$ true
F > 1: $H_0$ false
ANOVA in action

Source of variation    SS       DF    MS       F       P
Lakes                  48.933    2    24.467   5.872   0.017
Error                  50.000   12     4.167
Total                  98.933   14

F > F_crit, so $H_0$ is rejected and $H_A$ is accepted: the average length of the fish species differs among lakes.
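The MS and F values in the table follow directly from the SS and DF columns. A sketch reproducing that arithmetic (the p-value itself requires the F distribution and is not recomputed here):

```python
# Sums of squares and degrees of freedom from the ANOVA table
ss_lakes, df_lakes = 48.933, 2
ss_error, df_error = 50.000, 12

# Mean squares: MS = SS / DF
ms_lakes = ss_lakes / df_lakes   # about 24.47
ms_error = ss_error / df_error   # about 4.17

# F statistic: ratio of the among-lakes MS to the error MS
f_stat = ms_lakes / ms_error

print(round(f_stat, 3))  # 5.872, matching the F in the table
```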