Stats fest: Analysis of variance


Stats fest 2007
Analysis of variance
murray.logan@sci.monash.edu.au

Single factor ANOVA: Aims
  Description
    Investigate differences between population means
  Explanation
    How much of the variation in the response variable (Y) is explained by differences in the predictor variable (X)

Single factor ANOVA: Data
  Dependent (response) variable
    Continuous
    Normally distributed
  Independent (predictor) variable
    Categorical (a factor)
    n sampling units (replicates) per group (= level)

Single factor ANOVA: Linear model
  Regression model
  ANOVA model
  ANOVA is equivalent to a multiple regression
  Where α1 is the effect of group 1, ...

Single factor ANOVA: Linear model
  Over-parameterized
    More parameters than there are groups
    Cannot include all the αi's
    Can only include p-1 terms in the model, plus the overall mean
    Must redefine the α terms

Single factor ANOVA: Contrasts
  Contrasts redefine the α terms
    Treatment contrasts (default behaviour of R)
    Can define own contrasts (planned comparisons)
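A minimal sketch of the point above, assuming the usual effects model y_ij = µ + α_i + ε_ij and using R's built-in PlantGrowth data (three groups of ten) as a stand-in; neither the notation nor the dataset comes from the slides. It shows that, under R's default treatment contrasts, only p-1 group terms enter the model alongside the intercept.

```r
# PlantGrowth ships with base R: weight ~ group, 3 groups of 10 (stand-in data).
fit <- lm(weight ~ group, data = PlantGrowth)

# Treatment contrasts: one dummy column per non-reference level (p - 1 = 2).
contrasts(PlantGrowth$group)

# Design matrix = intercept + p - 1 dummy variables, i.e. a multiple regression.
X <- model.matrix(fit)
head(X)

# Intercept = mean of group 1; other terms = differences from group 1.
grp_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
from_means <- c(grp_means[1], grp_means[-1] - grp_means[1])
cbind(coef = coef(fit), from_means = from_means)
```

The two columns printed last should agree, which is the sense in which the α terms are redefined rather than estimated one per group.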

Single factor ANOVA: Linear model
  Regression model
  ANOVA model
  Reduced model (when H0 is true, α = 0)
  H0: Population group means are all equal
    µ1 = µ2 = µ3 = ... = µ
    α1 = α2 = ... = 0   (where α1 = µ1 - µ)

Null hypotheses (H0): Parameter based
  Population mean = 0 (µ = 0)
  Population slope 1 = 0 (β1 = 0)
  Population slope 2 = 0 (β2 = 0)
  Use t-tests
  But what do these slopes mean in ANOVA?

Null hypotheses (H0): Model based (variance based)
  Compare fit
  Generate a statistic based on the ratio of fit of the full and reduced models
  F-ratio
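The model-based null hypothesis can be sketched as a direct comparison of a full and a reduced model. This is an illustration on the built-in PlantGrowth data (an assumed stand-in, not the slides' data); the variable names are hypothetical.

```r
full    <- lm(weight ~ group, data = PlantGrowth)  # one mean per group
reduced <- lm(weight ~ 1,     data = PlantGrowth)  # single overall mean (H0 true)

# Residual sums of squares of the two fits
rss_full    <- deviance(full)
rss_reduced <- deviance(reduced)

# F-ratio: improvement in fit per extra parameter, relative to residual variance
df_num   <- df.residual(reduced) - df.residual(full)
f_manual <- ((rss_reduced - rss_full) / df_num) / (rss_full / df.residual(full))

# The same statistic from R's model comparison and from the ANOVA table
cmp <- anova(reduced, full)
c(manual = f_manual, comparison = cmp$F[2], table = anova(full)[["F value"]][1])
```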

Analysis of variance (ANOVA)
  Partitioning total variance
  Residual = observed - expected

Analysis of variance (ANOVA)
  F-ratio (Fdistribution.R)
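For reference, the partition and the F-ratio named above can be written out as follows. This is the standard balanced one-way form with p groups of n replicates; the symbols y_ij (observation j in group i), \bar{y}_i (group mean) and \bar{y} (grand mean) are notation assumed here rather than taken from the slides.

```latex
\begin{align*}
\underbrace{\sum_{i=1}^{p}\sum_{j=1}^{n} (y_{ij} - \bar{y})^2}_{SS_{Total}}
  &= \underbrace{n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2}_{SS_{Groups}}
   + \underbrace{\sum_{i=1}^{p}\sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}_{SS_{Residual}} \\[4pt]
F &= \frac{MS_{Groups}}{MS_{Residual}}
   = \frac{SS_{Groups} / (p - 1)}{SS_{Residual} / \bigl(p(n - 1)\bigr)}
\end{align*}
```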

Analysis of variance (ANOVA)
  F-distributions

Analysis of variance (ANOVA)
  Weight of Lobelia seedlings grown in one of four composts (A, B, C & D)
  Biological hypothesis: seedlings grow differently in different composts
    There is an effect of compost type on the weight of Lobelia seedlings
  H0: µA = µB = µC = µD = µ
    Population group means are all equal

Analysis of variance (ANOVA)
  Analysis of Variance Table

  Response: WEIGHT
            Df Sum Sq Mean Sq F value    Pr(>F)
  COMPOST    3 29.480   9.827  6.7498 0.0009943
  Residuals 36 52.411   1.456
  ---
  F-ratio = Explained / Unexplained
  F-ratio = 9.827 / 1.456 = 6.749

Single factor ANOVA
  There is nothing on this page!
  Well why don't you delete it then?
  Why don't you get stuffed?
  Are you talking to me?
  Don't you look at me!!
  I could have a more intelligent conversation with a thimble full of pond scum!

Single factor ANOVA: Assumptions
  Independent observations
  Normality (residuals)
    Boxplot of response variable
  Homogeneity of variance (residuals)
    Spread of observations around regression line
    Residual plot
  > boxplot(response ~ FACTOR, data=data)
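The F-ratio and p-value in the Lobelia ANOVA table above can be re-derived from its degrees of freedom and sums of squares alone. A small sketch follows, with hypothetical variable names; small differences in the last digit come from the rounded sums of squares.

```r
# Values copied from the ANOVA table (COMPOST and Residuals rows)
df_groups <- 3;  ss_groups <- 29.480
df_resid  <- 36; ss_resid  <- 52.411

ms_groups <- ss_groups / df_groups   # explained variance
ms_resid  <- ss_resid  / df_resid    # unexplained variance
f_ratio   <- ms_groups / ms_resid

# Upper-tail probability of the F-distribution with (3, 36) df
p_value <- pf(f_ratio, df1 = df_groups, df2 = df_resid, lower.tail = FALSE)

# Critical F at alpha = 0.05, for comparison with the observed ratio
f_crit <- qf(0.95, df1 = df_groups, df2 = df_resid)

c(F = f_ratio, p = p_value, F_crit = f_crit)
```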

Single factor ANOVA: Fit linear model
  > *.lm <- lm(response ~ FACTOR, data=data)
  > *.aov <- aov(response ~ FACTOR, data=data)

Single factor ANOVA: Final checks (influence measures)
  Residual
    How much each Y value differs from expected
  Residual plot
  > plot(*.lm)
  OR
  > plot(*.aov)

Single factor ANOVA: Analysis sequence
  Design experiment/survey
  Collect data
  Test assumptions
  Fit linear model
    Estimate parameters
    Full vs reduced
  Partition variability into explained & unexplained
    r² = SS Groups / SS Total
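As a sketch of the last step in the sequence above, r² can be computed directly from the partitioned sums of squares and compared with the value R reports. The built-in PlantGrowth data are used here as an assumed stand-in for `data`.

```r
fit <- lm(weight ~ group, data = PlantGrowth)
tab <- anova(fit)                      # rows: group, Residuals

ss_groups <- tab["group", "Sum Sq"]    # explained
ss_total  <- sum(tab[["Sum Sq"]])      # explained + unexplained

r2_manual <- ss_groups / ss_total
r2_lm     <- summary(fit)$r.squared

c(manual = r2_manual, lm = r2_lm)
```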

Single factor ANOVA: Analysis sequence cont.
  Test H0s
    > summary(*.lm)
      α1 = µ = 0 (t-test)
      α2 - α1 = 0 (t-test)
      By default ... (t-test)
    Full vs reduced (explained vs unexplained)
      α1 = α2 = ... = 0
      F-ratio statistic = MS Groups / MS Residual
      F-distribution (df = p-1, p(n-1))
    > anova(*.aov)
  Conclusions
    Reject or do not reject H0

Planned comparisons
  Following a significant ANOVA
  Some group comparisons are more biologically meaningful than others
    Control vs Treatment 1, etc.
  Planned prior to data analysis!
  p-1 comparisons maximum
  All comparisons must be independent

Planned comparisons
  Consider four groups
  By default, this tests
    α1 = µ = 0
    α2 - α1 = 0
    α3 - α1 = 0
    α4 - α1 = 0
  This might not be useful
    All means are compared to group 1

Planned comparisons
  Consider four groups
  We can redefine α2, α3, α4
    p-1 alternative comparisons
    E.g. α2: group 2 vs group 4
    E.g. α2: average of group 1 & group 2 vs group 3
  Define specific comparisons
    Need to specify which groups contribute to each effect (α2, α3, α4)

Planned comparisons
  Consider four groups
  Define specific comparisons
    grp1 vs grp2
    Average of grp2 & grp3 vs grp4
  These are contrast coefficients

        Grp1  Grp2  Grp3  Grp4
    α2     1    -1     0     0
    α3     0   0.5   0.5    -1
    α4     ?     ?     ?     ?

Planned comparisons
  Consider four groups
  Define specific comparisons
    Linear trend
    Quadratic trend
    Cubic trend

        Grp1  Grp2  Grp3  Grp4
    α2    -3    -1     1     3
    α3    -1     1     1    -1
    α4    -1     3    -3     1
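The trend coefficients in the table above can be entered as a contrast matrix and checked for independence. The sketch below uses a hypothetical four-level factor `g`; it also checks the first two rows of the earlier table (grp1 vs grp2, and the grp2/grp3 average vs grp4), whose cross-product is not zero.

```r
# Columns = comparisons, rows = groups (coefficients from the trend table)
trend <- cbind(linear    = c(-3, -1,  1,  3),
               quadratic = c(-1,  1,  1, -1),
               cubic     = c(-1,  3, -3,  1))
rownames(trend) <- paste0("Grp", 1:4)

# Orthogonal if every off-diagonal cross-product is 0
cp_trend <- crossprod(trend)
cp_trend

# The two specific comparisons from the earlier table
pair <- cbind(a2 = c(1, -1, 0, 0), a3 = c(0, 0.5, 0.5, -1))
cp_pair <- crossprod(pair)
cp_pair            # off-diagonal is -0.5, so these two are not orthogonal

# Attaching the trend contrasts to a (hypothetical) four-level factor
g <- factor(rep(paste0("Grp", 1:4), each = 3))
contrasts(g) <- trend
contrasts(g)
```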

Planned comparisons: Defining contrasts
  > contrasts(factor) <- cbind(c(coefs), c(coefs))
  Where coefs are the contrast coefficients
  Ensure that the contrasts are independent (orthogonal)
  > crossprod(contrasts(partridge$group))
  All lower-left (triangle) values must be 0

  Orthogonal:
       [,1] [,2] [,3] [,4]
  [1,]    2    0    0    0
  [2,]    0   30    0    0
  [3,]    0    0    1    0
  [4,]    0    0    0    1

  Not orthogonal:
       [,1] [,2] [,3] [,4]
  [1,]    2    0    1    0
  [2,]    0   30   -4    0
  [3,]    1   -5    4    0
  [4,]    0    0    0    1

Post-hoc unplanned comparisons
  Following a significant ANOVA
  None of the possible comparisons appears more or less biologically meaningful than the others
  Compare all groups pairwise
    Lots of comparisons
    Not independent
  Control familywise Type I error rate at 0.05
    Adjust each comparison

Specific comparisons of means
  Familywise Type I error rate
    Probability of at least one Type I error among all comparisons
    Increases with increasing number of groups

    No. of groups                           3     5    10
    No. of comparisons                      3    10    45
    Familywise Type I error probability  0.14  0.40  0.90
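The familywise error table above can be reproduced with a short calculation. The sketch assumes each comparison is tested at α = 0.05 and treats the comparisons as independent, which is only an approximation for pairwise tests.

```r
alpha  <- 0.05
groups <- c(3, 5, 10)

# Number of pairwise comparisons among k groups: k(k - 1)/2
n_comp <- choose(groups, 2)

# P(at least one Type I error) = 1 - P(no Type I error in any comparison)
familywise <- 1 - (1 - alpha)^n_comp

data.frame(groups, n_comp, familywise = round(familywise, 2))
```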

Post-hoc unplanned comparisons
  Bonferroni corrections
    Divide the significance level (α) by the number of comparisons
  Tukey's test
    Adjusts the critical value (studentized range distribution) according to the number of groups compared
  > library(multcomp)
  > summary(simtest(dv ~ factor, data, type = "Tukey"))
  More comparisons, more brutal the adjustments
    Reduced power

Factors
  Fixed factors
    Only interested in the specific groups (levels) tested
    Can't generalize to other levels
    E.g. control/caged
    Levels selected to represent the levels of interest
  Random factors
    Want to extrapolate to all other possible levels
    E.g. Temperature, Site, pH
    Levels selected as random representatives

Nested ANOVA: Aims
  Investigate the effect of a main factor (A)
  Sub-replicates collected (Factor B) within A
    Random factor
  [Diagram: Factor A (Burnt, Unburnt) with sub-replicates (Factor B) nested within each level]
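For the post-hoc comparisons described above, a sketch using only base R functions is given here as an alternative to the `simtest` call, which comes from older releases of multcomp. The PlantGrowth data and the example p-values are assumptions made for illustration.

```r
fit <- aov(weight ~ group, data = PlantGrowth)

# Tukey's HSD: all pairwise differences with familywise-adjusted p-values
tuk <- TukeyHSD(fit)
tuk

# Pairwise t-tests with Bonferroni-adjusted p-values
pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                p.adjust.method = "bonferroni")

# Bonferroni by hand: multiplying each p-value by the number of comparisons
# is equivalent to dividing alpha by that number (made-up p-values)
p_raw <- c(0.01, 0.02, 0.04)
p_bon <- p.adjust(p_raw, method = "bonferroni")
p_bon
```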

Nested ANOVA: Data
  Dependent (response) variable
  Factor A (categorical variable)
    Main effect
    Fixed factor
  Factor B (categorical variable)
    Sub-replicates
    Random factor
    Factor B(A)

Nested ANOVA: Linear model
  H0: Population means of Factor A are equal
    No effect of Factor A
    µ1 = µ2 = ... = µ
    α1 = α2 = ... = 0
  H0: Population means of Factor B are equal
    No variance between levels of Factor B within Factor A
    Not really interested in testing whether there are differences between sub-replicates!

Nested ANOVA: ANOVA table
  Factor        MS         F-ratio
  Factor A      MS A       MS A / MS B(A)
  Factor B(A)   MS B(A)    MS B(A) / MS Resid
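The F-ratios in the nested ANOVA table above can be sketched on simulated data. Everything here is assumed for illustration: the burnt/unburnt layout, the effect sizes, and the variable names. The point is that Factor A is tested against MS B(A), not against the residual.

```r
set.seed(1)
# 2 levels of A, 3 sites (B) nested in each, 4 sub-replicates per site
d <- expand.grid(rep = 1:4, site = 1:3, A = c("burnt", "unburnt"))
d$A <- factor(d$A)
d$B <- interaction(d$A, d$site, drop = TRUE)   # site labels unique within A
site_eff <- rnorm(nlevels(d$B), sd = 1)        # random site effects
d$y <- 10 + 2 * (d$A == "burnt") + site_eff[as.integer(d$B)] +
  rnorm(nrow(d), sd = 0.5)

# Sequential table: A, then B within A, then residual
tab <- anova(lm(y ~ A + B, data = d))
f_A <- tab["A", "Mean Sq"] / tab["B", "Mean Sq"]      # MS A / MS B(A)
p_A <- pf(f_A, tab["A", "Df"], tab["B", "Df"], lower.tail = FALSE)

# Same test from the site means (the true replicates of A in a balanced design)
site_means <- aggregate(y ~ A + B, data = d, FUN = mean)
f_means <- anova(lm(y ~ A, data = site_means))[["F value"]][1]

# aov() with an error stratum for the sites gives the same test of A
summary(aov(y ~ A + Error(B), data = d))
c(F_A = f_A, F_from_means = f_means, p = p_A)
```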

Nested ANOVA
  Factor      MS            F-ratio
  A           MS A          MS A / MS B(A)
  B(A)        MS B(A)       MS B(A) / MS C(B(A))
  C(B(A))     MS C(B(A))    MS C(B(A)) / MS Resid
  [Diagram: Factor A (Burnt, Unburnt), with Factor B nested within A and Factor C nested within B]

Nested ANOVA: Assumptions
  As with single factor ANOVA
  Consider what the replicates are
    Blocks are the replicates of Factor A
    For Factor A, the averages of Factor B within A are the replicates
    These need to be used for boxplots

Factorial ANOVA
  Aims
    To investigate the effects of two or more factors (and their interactions) on a response variable
  Data
    Dependent (response) variable
    Factor A (categorical variable)
      Fixed or random factor
    Factor B (categorical variable)
      Fixed or random factor
    Every level of Factor A crossed with every level of Factor B

Factorial ANOVA
  Factor A: White or Gray
  Factor B: Stripes or no stripes

Factorial ANOVA: Linear model
  H0: Population means of Factor A are equal
    No effect of Factor A
    α1 = α2 = ... = 0
  H0: Population means of Factor B are equal
    No effect of Factor B
    β1 = β2 = ... = 0
  H0: No interaction between Factor A and Factor B

Factorial ANOVA: ANOVA table
  Factor  MS      F-ratio             F-ratio               F-ratio
                  (both fixed)        (A fixed, B random)   (both random)
  A       MS A    MS A / MS Resid     MS A / MS A:B         MS A / MS A:B
  B       MS B    MS B / MS Resid     MS B / MS Resid       MS B / MS A:B
  A:B     MS A:B  MS A:B / MS Resid   MS A:B / MS Resid     MS A:B / MS Resid

  Interpret interactions first!
    Simple main effects
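A sketch of the F-ratio table above, using R's built-in ToothGrowth data as an assumed stand-in (supplement type as Factor A, dose as Factor B, ten replicates per cell). Treating dose as random in the second calculation is purely illustrative of the alternative denominator.

```r
tg <- transform(ToothGrowth, dose = factor(dose))   # dose as a categorical factor

# Fully crossed two-factor model with interaction
tab <- anova(lm(len ~ supp * dose, data = tg))
tab

# Both factors fixed: every term is tested against the residual
f_A_fixed <- tab["supp", "Mean Sq"] / tab["Residuals", "Mean Sq"]

# A fixed, B random: A is tested against the A:B interaction instead
f_A_mixed <- tab["supp", "Mean Sq"] / tab["supp:dose", "Mean Sq"]
p_A_mixed <- pf(f_A_mixed, tab["supp", "Df"], tab["supp:dose", "Df"],
                lower.tail = FALSE)

c(F_fixed = f_A_fixed, F_mixed = f_A_mixed, p_mixed = p_A_mixed)
```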

Factorial ANOVA: Unbalanced designs (models), non-orthogonal
  Unequal sample sizes
    Source of unequal variance
    Complicate partitioning of variance
  When balanced
    SS Total = SS A + SS B + SS AB + SS Residual
  When unbalanced
    SS Total ≠ SS A + SS B + SS AB + SS Residual

Factorial ANOVA: Type I (regular, sequential) SS
  SS = improvement with each successive term
  Models fitted: RSS(MEAN), RSS(A), RSS(A,B), RSS(A,B,AB)
    SS A  = RSS(MEAN) - RSS(A)
    SS B  = RSS(A) - RSS(A,B)
    SS AB = RSS(A,B) - RSS(A,B,AB)
  Order important when unbalanced

Factorial ANOVA: Type II (hierarchical) SS
  SS = improvement based on a hierarchical model comparison
  Models fitted: RSS(A), RSS(B), RSS(A,B), RSS(A,B,AB)
    SS A  = RSS(B) - RSS(A,B)
    SS B  = RSS(A) - RSS(A,B)
    SS AB = RSS(A,B) - RSS(A,B,AB)
  Same as Type I! (SS B and SS AB)
  Order unimportant
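The Type I and Type II definitions above can be checked by differencing residual sums of squares directly. The sketch below unbalances the built-in ToothGrowth data by dropping six rows; the dataset, the rows dropped and the helper `rss()` are assumptions for illustration.

```r
tg   <- transform(ToothGrowth, dose = factor(dose))
tg_u <- tg[-(1:6), ]                         # remove 6 rows from one cell -> unbalanced

# Helper: residual sum of squares for a given model formula
rss <- function(f) deviance(lm(f, data = tg_u))

rss_mean <- rss(len ~ 1)
rss_A    <- rss(len ~ supp)
rss_B    <- rss(len ~ dose)
rss_A_B  <- rss(len ~ supp + dose)

ss_A_type1 <- rss_mean - rss_A               # Type I, A entered first
ss_A_type2 <- rss_B - rss_A_B                # Type II, A adjusted for B

# R's anova() is sequential (Type I), so term order matters when unbalanced
ss_A_first  <- anova(lm(len ~ supp * dose, data = tg_u))["supp", "Sum Sq"]
ss_A_second <- anova(lm(len ~ dose * supp, data = tg_u))["supp", "Sum Sq"]

c(type1 = ss_A_type1, A_first = ss_A_first,
  type2 = ss_A_type2, A_second = ss_A_second)
```

Entering A after B in a sequential table reproduces the Type II sum of squares for A, which is why the order of terms only matters for Type I.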

Factorial ANOVA: Type III (marginal, orthogonal) SS
  SS = improvement based on comparing models with and without the term
  Models fitted: RSS(B,AB), RSS(A,AB), RSS(A,B), RSS(A,B,AB)
    SS A  = RSS(B,AB) - RSS(A,B,AB)
    SS B  = RSS(A,AB) - RSS(A,B,AB)
    SS AB = RSS(A,B) - RSS(A,B,AB)
  Same as Type I! (SS AB)
  Order unimportant
  Best choice??

Randomized block: Aims
  Groups of main treatment blocked together
  Reduce unexplained variation without increasing sample size
  [Diagram: blocks (Factor B), each containing every level of Factor A]

Randomized block: Data
  Dependent variable
  Factor A
    Main effect
    Each level is observed within Factor B
  Factor B (blocking factor)
    Random factor

Randomized block
  When the levels of Factor A cannot occur simultaneously
    E.g. given at different times
  Called repeated measures
    E.g. measurements taken every five minutes from the same individual
    [Diagram: each individual measured at times 0, 5, 10, 15, 20, 30, 40]

Randomized block: Linear model
  H0: Population means of Factor B (blocks) are equal
    No difference between blocks
    β1 = β2 = ... = 0
  H0: Population means of Factor A are equal
    No effect of Factor A
    α1 = α2 = ... = 0

Randomized block: ANOVA table
  Factor     MS      F-ratio
  Factor B   MS B    MS B / MS Resid
  Factor A   MS A    MS A / MS Resid
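A sketch of the randomized block analysis above on simulated data, showing how including the blocking factor removes block-to-block variation from the residual. The design (three treatments in six blocks), effect sizes and names are all assumptions.

```r
set.seed(42)
# Every level of A observed once within each block (Factor B)
d <- expand.grid(A = factor(c("t1", "t2", "t3")),
                 B = factor(paste0("block", 1:6)))
block_eff <- rnorm(nlevels(d$B), sd = 2)       # random block effects
d$y <- 10 + c(0, 1, 2)[as.integer(d$A)] + block_eff[as.integer(d$B)] +
  rnorm(nrow(d), sd = 0.5)

# With blocks: MS B / MS Resid and MS A / MS Resid
tab_block <- anova(lm(y ~ B + A, data = d))
tab_block

# Ignoring blocks: block variation is left in the residual
tab_plain <- anova(lm(y ~ A, data = d))
tab_plain

# SS for A is unchanged (balanced design); the residual SS shrinks
c(resid_with_blocks    = tab_block["Residuals", "Sum Sq"],
  resid_without_blocks = tab_plain["Residuals", "Sum Sq"])
```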

Randomized block: Assumptions
  Normality, equal variance
  No block (B) by A interaction
    An interaction reduces the power of the test of A
    Difficult to interpret
    Diagnose with an interaction plot
  Sphericity
    Equal variances of differences between all pairs of treatments
    var of (group1 - group2) = var of (group2 - group3), etc.
    Non-sphericity increases the chance of Type I errors

Randomized block
  Sphericity likely to be met when treatments are randomly assigned to units within blocks
    E.g. treatments randomized within quadrats
    E.g. order of drugs given to rats is random
  Sphericity not likely to be met when the treatments are times
    Since we can't randomize time
    Time 1 is likely to be more similar to Time 2 than to Time 10
  Calculate the degree to which sphericity is not met
    Epsilon (<1)
      Greenhouse-Geisser (preferred)
      Huynh-Feldt
    Adjust degrees of freedom

Split-plot designs: Aims
  Combination of nested and blocking designs
  [Diagram: Factor A applied to whole plots (Factor B), with Factor C applied within each plot]
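The sphericity condition described above (equal variances of the differences between all pairs of treatments) can be inspected directly. This sketch uses simulated repeated measures on 12 hypothetical subjects at four time points; base R's `mauchly.test()` is shown as one formal test, applied to a multivariate fit of the wide-format data.

```r
set.seed(7)
n_subj <- 12
times  <- c("t0", "t5", "t10", "t15")
subj   <- rnorm(n_subj, sd = 1)               # subject (block) effects

# Wide format: one row per subject, one column per time
y <- sapply(seq_along(times),
            function(k) 5 + 0.5 * k + subj + rnorm(n_subj, sd = 0.5))
colnames(y) <- times

# Variance of the differences for every pair of times
pairs    <- combn(ncol(y), 2)
diff_var <- apply(pairs, 2, function(ix) var(y[, ix[1]] - y[, ix[2]]))
names(diff_var) <- apply(pairs, 2, function(ix) paste(times[ix], collapse = "-"))
round(diff_var, 2)      # roughly equal values are consistent with sphericity

# Mauchly's test of sphericity for the within-subject contrasts
mt <- mauchly.test(lm(y ~ 1), X = ~1)
mt
```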

Split-plot designs: Data
  Factor A (between plot effect)
    Main effect 1
  Factor B (plot)
    Random factor
  Factor C (within plot effect)
    Main effect 2
  Design components: nested, blocking

Split-plot designs: ANOVA
  Factor     MS           F-ratio
  A          MS A         MS A / MS B(A)
  B(A)       MS B(A)      MS B(A) / MS Resid
  C          MS C         MS C / MS B(A):C
  A:C        MS A:C       MS A:C / MS B(A):C
  B(A):C     MS B(A):C    MS B(A):C / MS Resid
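A sketch of the split-plot table above on simulated data, assuming one observation per plot and level of C, so that the B(A):C term is the lowest-level residual. The layout (two levels of A, three plots per level, three levels of C), effect sizes and names are all hypothetical.

```r
set.seed(3)
d <- expand.grid(C = factor(c("c1", "c2", "c3")), plot = 1:3,
                 A = factor(c("a1", "a2")))
d$B <- interaction(d$A, d$plot, drop = TRUE)   # plots nested within A
plot_eff <- rnorm(nlevels(d$B), sd = 1)        # random plot effects
d$y <- 10 + 1.5 * (d$A == "a2") + c(0, 0.5, 1)[as.integer(d$C)] +
  plot_eff[as.integer(d$B)] + rnorm(nrow(d), sd = 0.4)

# Sequential table; with one value per plot x C the residual row is B(A):C
tab <- anova(lm(y ~ A + B + C + A:C, data = d))
tab

f_A  <- tab["A",   "Mean Sq"] / tab["B",         "Mean Sq"]  # between plots
f_C  <- tab["C",   "Mean Sq"] / tab["Residuals", "Mean Sq"]  # within plots
f_AC <- tab["A:C", "Mean Sq"] / tab["Residuals", "Mean Sq"]

# aov() with a plot-level error stratum arranges the same tests
summary(aov(y ~ A * C + Error(B), data = d))
c(F_A = f_A, F_C = f_C, F_AC = f_AC)
```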

Split-plot designs: Assumptions
  Normality, equal variance
    Establish correct residuals
  Sphericity (for within plot effects)
    As for blocking designs
  No interaction between plot and within plot effects