One-way ANOVA (Single-Factor CRD)

STAT:5201 Week 3: Lecture 3

One-way ANOVA

We have already described a completely randomized design (CRD), in which treatments are randomly assigned to experimental units (EUs). There is no blocking and no nesting in a CRD. We will now take a closer look at the model for a CRD when there is only one factor.

We will consider two models (or parameterizations) for describing the single-factor CRD here. The first is called the cell means model and the second is called the effects model. We will mostly use the latter.

Notation

Let $Y_{ij}$ be the $j$th response in treatment $i$. We have $i = 1, 2, \ldots, g$ groups and $j = 1, 2, \ldots, n_i$, where the number of observations in each group does not have to be the same.

Let $\bar{Y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$ be the mean response in the $i$th treatment group (stated as "Y-bar" or a "cell mean").

Let $\bar{Y}_{\cdot\cdot} = \frac{1}{N}\sum_{i=1}^{g}\sum_{j=1}^{n_i} Y_{ij}$ be the grand mean response, or the overall mean.

$N = \sum_i n_i$ is the total number of observations in the study.

One-way ANOVA: Cell means model

Cell Means Model

$$Y_{ij} = \mu_i + \epsilon_{ij} \quad \text{with } \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2) \text{ for } i = 1, \ldots, g \text{ and } j = 1, \ldots, n_i$$

We have one mean parameter $\mu_i$ for each cell, or separate group. This is the same as $Y_{ij} \sim N(\mu_i, \sigma^2)$.

One-way ANOVA: Cell means model

Cell Means Model

$$Y_{ij} = \mu_i + \epsilon_{ij} \quad \text{with } \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2) \text{ for } i = 1, \ldots, g \text{ and } j = 1, \ldots, n_i$$

The estimates for the mean-structure parameters are simply the cell means:

$$\hat{\mu}_i = \bar{Y}_{i\cdot}$$

Estimate for the noise:

$$\hat{\sigma}^2 = \frac{\sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2}{N - g} \quad \text{where } N = \sum_i n_i$$

Positive Characteristics
- We do not need any constraints or restrictions for estimation because we use $g$ parameters to describe $g$ means.
- $\hat{\mu}_i$ is the estimated group mean.
- Estimates are easy, very intuitive.

Negative Characteristic
- The estimated parameters don't directly tell us how far a treatment mean is from the overall mean, nor how far a treatment mean is from another treatment mean (but we can get this information from our $\hat{\mu}_i$ values).

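One-way ANOVA: Cell means model

Cell Means Model

$$Y_{ij} = \mu_i + \epsilon_{ij} \quad \text{with } \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2) \text{ for } i = 1, \ldots, g \text{ and } j = 1, \ldots, n_i$$

The design matrix $X$ is of full rank for the cell means model. Again, very easy to work with, intuitive.

Example (1-way ANOVA with g = 3 and n = 2)

Suppose we have a one-way ANOVA framework with $g = 3$ and $n = 2$ for each group. In the cell means model, the design matrix is of rank 3 and has 6 rows and 3 columns, one column for each of $\mu_1$, $\mu_2$, $\mu_3$:

$$X = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$

Letting $Y = X\mu + \epsilon$, using OLS we have

$$\hat{\mu} = \begin{pmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \hat{\mu}_3 \end{pmatrix} = (X^\top X)^{-1} X^\top Y = \begin{pmatrix} \bar{Y}_{1\cdot} \\ \bar{Y}_{2\cdot} \\ \bar{Y}_{3\cdot} \end{pmatrix}$$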
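As a rough numerical check of the design-matrix result above, the sketch below builds $X$ for $g = 3$ and $n = 2$ and solves the normal equations in plain Python. The response values in `y` are made up for illustration (they are not from the lecture), and the shortcut of inverting $X^\top X$ entry by entry relies on it being diagonal, which holds for the cell means model.

```python
# Cell means model with g = 3 groups and n = 2 observations per group.
# NOTE: the responses below are hypothetical, chosen only for illustration.
y = [4.0, 6.0, 10.0, 12.0, 1.0, 3.0]

# Design matrix: one indicator column per group (6 rows, 3 columns).
X = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]
g = len(X[0])

# X^T X: for indicator columns this is diagonal with the group sizes n_i.
xtx = [[sum(row[a] * row[b] for row in X) for b in range(g)] for a in range(g)]

# X^T Y: the group totals.
xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(g)]

# Because X^T X is diagonal, (X^T X)^{-1} X^T Y is elementwise division.
mu_hat = [xty[a] / xtx[a][a] for a in range(g)]

print(xtx)     # [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
print(mu_hat)  # [5.0, 11.0, 2.0]
```

Each entry of `mu_hat` is just the group total divided by the group size, which is the cell mean.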

One-way ANOVA: Effects model

Effects Model

$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \quad \text{with } \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2) \text{ for } i = 1, \ldots, g \text{ and } j = 1, \ldots, n_i$$

In this model, we use $g + 1$ parameters to describe $g$ means. This is an overparameterization.
- One parameter $\mu$ for the overall mean.
- One parameter $\alpha_i$ for each group.

This is the same as $Y_{ij} \sim N(\mu + \alpha_i, \sigma^2)$.

One-way ANOVA: Effects model

Effects Model

$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \quad \text{with } \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2) \text{ for } i = 1, \ldots, g \text{ and } j = 1, \ldots, n_i$$

$\mu + \alpha_i$ represents the mean of group $i$. Because this is an overparameterization, we need a constraint to make the parameters identifiable (i.e. uniquely determined).

One option is to use the sum-to-zero constraints, which provide an intuitive interpretation of the parameters. For balanced data, this is

$$\sum_{i=1}^{g} \hat{\alpha}_i = 0$$

and we use the estimates

$$\hat{\mu} = \bar{Y}_{\cdot\cdot}, \qquad \hat{\alpha}_i = \bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}, \qquad \hat{\sigma}^2 = \frac{\sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2}{N - g}.$$

Here, $\hat{\mu}$ represents the overall mean and $\hat{\alpha}_i$ represents the distance that group $i$ is from the overall mean. Some $\hat{\alpha}_i$ values will be positive and some will be negative.
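To make the sum-to-zero estimates concrete, here is a minimal sketch that applies the formulas above to the circuit response-time data from the example later in these notes. The variable names are illustrative only.

```python
# Response times (ms) for three circuit types, from the lecture's example.
data = {
    1: [9, 12, 10, 8, 15],
    2: [20, 21, 23, 17, 30],
    3: [6, 5, 8, 16, 7],
}

N = sum(len(v) for v in data.values())
g = len(data)

# Grand mean and group means.
mu_hat = sum(sum(v) for v in data.values()) / N
group_means = {i: sum(v) / len(v) for i, v in data.items()}

# Sum-to-zero effects: deviation of each group mean from the grand mean.
alpha_hat = {i: m - mu_hat for i, m in group_means.items()}

# Pooled variance estimate with N - g degrees of freedom.
sse = sum((y - group_means[i]) ** 2 for i, v in data.items() for y in v)
sigma2_hat = sse / (N - g)

print(mu_hat)      # grand mean
print(alpha_hat)   # effects; they sum to zero up to floating-point rounding
print(sigma2_hat)  # pooled estimate of sigma^2
```

With these data the grand mean is 13.8 and the effects are roughly -3.0, 8.4 and -5.4, so type 2 sits well above the overall mean while types 1 and 3 sit below it.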

One-way ANOVA: Effects model

Effects Model

$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \quad \text{with } \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2) \text{ for } i = 1, \ldots, g \text{ and } j = 1, \ldots, n_i$$

Positive Characteristics
- Under the sum-to-zero constraints, the estimated parameters directly tell us how far a treatment mean is from the overall mean. The effects are just deviations from the grand mean.

Negative Characteristic
- We need a constraint or restriction on the parameters for estimation due to overparameterization.

* No statistical software uses sum-to-zero constraints by default, but we will use these when calculating estimates by hand (it's easiest).
* SAS uses a constraint that sets the last $\hat{\alpha}_i = 0$. By default, R sets the first $\hat{\alpha}_i = 0$. Under these constraints $\mu$ no longer represents the overall mean, but the mean of a specific reference group.

One-way ANOVA

The choice between these models (cell means model or effects model) or constraints does affect the interpretation of the parameters, but the important estimates are the same under any of these choices...

* Fitted $\hat{Y}$ values
* Differences between groups, or $\hat{\mu}_i - \hat{\mu}_j$
* Residual $\hat{\epsilon}_{ij}$ values

In a one-way ANOVA, we perceive a scenario where we have distinct group means, with normally distributed errors around the means, and we are interested in comparing group means. This perception is the same regardless of the model and constraint choices above.
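The invariance described above can be checked numerically. The sketch below takes the three group means from the circuit example in these notes (10.8, 22.2, 8.4) and writes them under three constraints: sum-to-zero, first level as reference (the R default mentioned earlier), and last level as reference (the SAS convention mentioned earlier). The list `results` is just an illustrative container.

```python
# Group means from the lecture's circuit example (types 1, 2, 3).
group_means = [10.8, 22.2, 8.4]
g = len(group_means)

# Sum-to-zero constraint (balanced data): mu is the average of the group means.
mu_sum = sum(group_means) / g
alpha_sum = [m - mu_sum for m in group_means]

# First-level reference: alpha_1 = 0, so mu is the mean of group 1.
mu_first = group_means[0]
alpha_first = [m - mu_first for m in group_means]

# Last-level reference: alpha_g = 0, so mu is the mean of group g.
mu_last = group_means[-1]
alpha_last = [m - mu_last for m in group_means]

# Fitted group means (mu + alpha_i) and the group 1 vs group 2 difference
# agree under every constraint, even though mu and alpha_i differ.
results = []
for mu, alpha in [(mu_sum, alpha_sum), (mu_first, alpha_first), (mu_last, alpha_last)]:
    fitted = [round(mu + a, 10) for a in alpha]
    diff_12 = round(alpha[0] - alpha[1], 10)
    results.append((round(mu, 10), fitted, diff_12))
    print(round(mu, 10), fitted, diff_12)
```

The first printed column (the value of $\mu$) changes from line to line, while the fitted means and the difference stay the same.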

One-way ANOVA

Unbalanced data in One-way ANOVA

If you have unbalanced data ($n_l \neq n_k$ for some $l, k$) and you are using the effects model, then the grand mean

$$\hat{\mu} = \bar{Y}_{\cdot\cdot} = \frac{\sum_i \sum_j Y_{ij}}{N} = \frac{\sum_i n_i \hat{\mu}_i}{N}$$

is a weighted average of the group means $\hat{\mu}_i = \bar{Y}_{i\cdot}$. $\hat{\mu}$ will be pulled toward the larger groups.

The sum-to-zero constraints are

$$\sum_{i=1}^{g} n_i \hat{\alpha}_i = 0$$

Estimates of the effects are given by the same formula, as deviations from the grand mean:

$$\hat{\alpha}_i = \hat{\mu}_i - \hat{\mu}$$

But most of the time we will have balanced data, so I will usually state the constraints on the board as

$$\sum_{i=1}^{g} \hat{\alpha}_i = 0$$

NOTE: if $n_i = n_j$ for all $i, j$ then $\sum_{i=1}^{g} n_i \hat{\alpha}_i = 0 \Rightarrow \sum_{i=1}^{g} \hat{\alpha}_i = 0$.
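A small numerical sketch may help show what changes with unbalanced data. The data below are hypothetical (not from the lecture) and deliberately lopsided, with group sizes 4, 2 and 1.

```python
# Hypothetical unbalanced data (not from the lecture): group sizes 4, 2, 1.
data = {
    "A": [10.0, 12.0, 11.0, 11.0],
    "B": [20.0, 22.0],
    "C": [30.0],
}

N = sum(len(v) for v in data.values())
group_means = {k: sum(v) / len(v) for k, v in data.items()}

# Grand mean of all observations...
grand_mean = sum(sum(v) for v in data.values()) / N
# ...equals the n_i-weighted average of the group means.
weighted = sum(len(v) * group_means[k] for k, v in data.items()) / N

# The unweighted average of the group means differs when the n_i vary.
unweighted = sum(group_means.values()) / len(group_means)

# Effects as deviations from the grand mean satisfy sum_i n_i * alpha_i = 0,
# but the unweighted sum of the effects is generally not zero.
alpha_hat = {k: group_means[k] - grand_mean for k in data}
weighted_sum = sum(len(v) * alpha_hat[k] for k, v in data.items())
plain_sum = sum(alpha_hat.values())

print(grand_mean, weighted, unweighted)
print(weighted_sum, plain_sum)
```

Here the grand mean (about 16.6) lies closer to the mean of the largest group, A, than the unweighted average of the three group means (about 20.7) does.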

One-way ANOVA: Sums of Squares

ANOVA - The partitioning of the sums of squares is called Analysis of Variance, or ANOVA.

In an ANOVA, we break down the total variability in the data into component parts, i.e. into the differing sources of variation.

Consider the one-factor experiment:

$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \quad \text{with } \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2) \text{ for } i = 1, \ldots, g \text{ and } j = 1, \ldots, n_i$$

We analyze such data as a 1-way ANOVA with only one factor, and the hypothesis test of interest is

$$H_0: \mu_1 = \cdots = \mu_g \quad \text{vs.} \quad H_1: \text{at least one group mean differs from the others}$$

If we reject this null hypothesis, we usually do follow-up comparisons to see which of the groups are significantly different from each other.

One-way ANOVA: Sums of Squares

Total Variation:

$$SS_{TOT} = \sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$$

This is the total sum of squares (corrected for the mean).

Variation due to Treatment:

$$SS_{TRT} = \sum_i n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$$

This is the treatment sum of squares. This quantifies how far the group means are from the overall mean.

Unexplained Variation:

$$SS_E = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2$$

This is the sum of squares for error. This quantifies how far the individual observations are from their group mean.

We know

$$SS_{TOT} = SS_{TRT} + SS_E \qquad \text{(*Fundamental ANOVA identity)}$$
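One way to see why the identity holds (a standard argument, sketched here rather than taken from the slides) is to split each deviation from the grand mean into a within-group part and a between-group part. The cross term vanishes because deviations from a group mean sum to zero within that group.

```latex
\begin{align*}
SS_{TOT}
  &= \sum_i \sum_j \left[ (Y_{ij} - \bar{Y}_{i\cdot}) + (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) \right]^2 \\
  &= \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2
   + \sum_i n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2
   + 2 \sum_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})
       \underbrace{\sum_j (Y_{ij} - \bar{Y}_{i\cdot})}_{=\,0} \\
  &= SS_E + SS_{TRT}
\end{align*}
```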

One-way ANOVA: Sums of Squares

At a minimum, an ANOVA table will list the sources of variation in the experiment and their degrees of freedom. We usually also include the sums of squares ($SS_x$) and the related mean squares ($MS_x$).

Here is a general layout for a 1-way ANOVA:
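The table itself did not survive in this transcript. A layout consistent with the sums of squares and degrees of freedom defined in these slides would look roughly like the following; the exact columns shown on the original slide are an assumption.

```latex
\begin{array}{lccc}
\text{Source} & \text{d.f.} & \text{SS} & \text{MS} \\
\hline
\text{Treatment} & g - 1 & SS_{TRT} & MS_{TRT} = SS_{TRT} / (g - 1) \\
\text{Error} & N - g & SS_E & MS_E = SS_E / (N - g) \\
\hline
\text{Total (corrected)} & N - 1 & SS_{TOT} &
\end{array}
```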

One-way ANOVA: Correcting for the mean

We almost always estimate an overall mean in our model, so we lose 1 degree of freedom (d.f.) right away. Thus, we essentially start with $N - 1$ d.f., and say the total sum of squares is "corrected for the mean."

Once we have our overall mean estimated (or $\hat{\mu}$), we only need $g - 1$ more parameters to describe the mean structure (i.e. to describe the $g$ cell means). Thus, we use $g - 1$ d.f. for Treatment.

The leftover $N - g$ d.f. are given for estimation of the error.

One-way ANOVA: Example

Example (Response time for circuit types)

Three different types of circuit are investigated for response time in milliseconds. Fifteen runs are completed in a balanced CRD (five per type) with the single factor of Type (1, 2, 3).

Circuit Type    Response Time
1                9  12  10   8  15
2               20  21  23  17  30
3                6   5   8  16   7

From D. C. Montgomery (2005). Design and Analysis of Experiments. Wiley: USA.
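The sums of squares for this example can be worked out directly from the definitions given earlier. The following is a minimal sketch in plain Python; it is not the SAS code used in the course, just a by-hand check with illustrative variable names.

```python
# Circuit response times (ms) from the example above, keyed by circuit type.
data = {
    1: [9, 12, 10, 8, 15],
    2: [20, 21, 23, 17, 30],
    3: [6, 5, 8, 16, 7],
}

g = len(data)
N = sum(len(v) for v in data.values())
grand_mean = sum(sum(v) for v in data.values()) / N
means = {i: sum(v) / len(v) for i, v in data.items()}

# Sums of squares: total (corrected), treatment, and error.
ss_tot = sum((y - grand_mean) ** 2 for v in data.values() for y in v)
ss_trt = sum(len(v) * (means[i] - grand_mean) ** 2 for i, v in data.items())
ss_e = sum((y - means[i]) ** 2 for i, v in data.items() for y in v)

# Mean squares use g - 1 and N - g degrees of freedom.
ms_trt = ss_trt / (g - 1)
ms_e = ss_e / (N - g)

print(f"{'Source':<10}{'DF':>4}{'SS':>10}{'MS':>10}")
print(f"{'Treatment':<10}{g - 1:>4}{ss_trt:>10.1f}{ms_trt:>10.1f}")
print(f"{'Error':<10}{N - g:>4}{ss_e:>10.1f}{ms_e:>10.1f}")
print(f"{'Total':<10}{N - 1:>4}{ss_tot:>10.1f}")
```

The group means come out as 10.8, 22.2 and 8.4 around a grand mean of 13.8, and the treatment and error sums of squares (543.6 and 202.8) add up to the corrected total of 746.4.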

One-way ANOVA: Example

Example (Response time for circuit types)

See handout for annotated output.

One-way ANOVA: MS_TRT and MS_E

Why is the ANOVA table useful? The MS values will be used to perform statistical tests.

Example (Response time for circuit types)

    Class Level Information
    Class    Levels    Values
    type          3    1 2 3

    Number of Observations Read    15
    Number of Observations Used    15

    Dependent Variable: time

    Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model               2       543.6000000    271.8000000      16.08    0.0004
    Error              12       202.8000000     16.9000000
    Corrected Total    14       746.4000000

We need to know what we EXPECT to get from $MS_{TRT}$ and $MS_E$...

$$E(MS_{TRT}) = \sigma^2 + \frac{\sum_{i=1}^{g} n_i \alpha_i^2}{g - 1} \qquad E(MS_E) = \sigma^2$$

If $H_0: \mu_1 = \mu_2 = \mu_3 \iff \alpha_i = 0 \ \forall i$ is true, then $E(MS_{TRT}) = \sigma^2$, and $MS_{TRT}$ and $MS_E$ should be similar.

If $H_A: \alpha_i \neq 0$ is true for at least one $i$, then $E(MS_{TRT}) > E(MS_E)$, so we expect $MS_{TRT} > MS_E$.

One-way ANOVA: MS_TRT and MS_E

We base our statistical test on the ratio $MS_{TRT} / MS_E$.

When $H_0$ is true,

$$F_o = \frac{MS_{TRT}}{MS_E} \sim F_{(g-1,\, N-g)}$$

and we expect a value near 1 for our $F_o$ (in general).

When $H_A$ is true, $F_o$ has a stochastically greater distribution than $F_{(g-1,\, N-g)}$, and we reject the null if $F_o > F_{(g-1,\, N-g,\, 0.95)}$.

One-way ANOVA: MS_TRT and MS_E

Example (Response time for circuit types)

Circuit data: $F_o = \frac{271.8}{16.9} = 16.08$, compared to $F_{(2,12)}$ to get the p-value.

The p-value is 0.0004.

This is only valid if the model assumptions are met (we'll return to checking the assumptions for this model soon).
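The p-value and the 0.95 critical value can be reproduced without statistical software in this particular case. The sketch below relies on a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2, as they do here; for other degrees of freedom a statistics library or a table would be needed.

```python
# Mean squares and degrees of freedom from the circuit example's ANOVA table.
ms_trt, ms_e = 271.8, 16.9
df_trt, df_e = 2, 12

f_obs = ms_trt / ms_e

# ASSUMPTION: the closed forms below are valid only for numerator d.f. = 2,
# where P(F > x) = (1 + 2x/d2) ** (-d2/2) for an F(2, d2) distribution.
assert df_trt == 2
p_value = (1 + 2 * f_obs / df_e) ** (-df_e / 2)

# Inverting the same formula gives the upper 5% critical value F(2, 12, 0.95).
alpha_level = 0.05
f_crit = (df_e / 2) * (alpha_level ** (-2 / df_e) - 1)

print(round(f_obs, 2))    # 16.08
print(round(p_value, 4))  # 0.0004
print(round(f_crit, 2))   # 3.89
print(f_obs > f_crit)     # True, so H0 is rejected at the 0.05 level
```

The observed ratio of about 16.08 is far above the critical value of about 3.89, in line with the p-value of 0.0004 reported in the output.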

One-way ANOVA: Full vs. Reduced Models

The overall F-test in a 1-way ANOVA is actually a test for comparing a full model and a reduced model that is nested in the full model.

- A reduced model is nested in a full model if it is a particular case of the full model.

NOTE: we will use the design term "nested" in another way later, so be aware of this.

The ANOVA table in the 1-way ANOVA compares a full model (requiring $g$ parameters to describe the mean structure) and a reduced model (requiring only 1 parameter to describe the mean structure). Thus, we can think of the F-test as comparing a full and reduced model.
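The full-versus-reduced view can be illustrated on the circuit data from the example in these notes. The sketch below fits both models by hand and plugs their residual sums of squares into the standard nested-model F statistic, $F = \frac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F / df_F}$. That general formula is not stated in the slides and is brought in here as an assumption about what the comparison amounts to; the helper `sse` is a hypothetical name.

```python
# Circuit response times (ms) from the example in these notes.
groups = [
    [9, 12, 10, 8, 15],
    [20, 21, 23, 17, 30],
    [6, 5, 8, 16, 7],
]
all_y = [y for grp in groups for y in grp]
N, g = len(all_y), len(groups)

def sse(values, fitted):
    """Residual sum of squares of observed values around fitted values."""
    return sum((y - f) ** 2 for y, f in zip(values, fitted))

# Reduced model: one common mean (1 mean parameter).
grand_mean = sum(all_y) / N
sse_reduced = sse(all_y, [grand_mean] * N)

# Full model: a separate mean per group (g mean parameters).
fitted_full = [sum(grp) / len(grp) for grp in groups for _ in grp]
sse_full = sse(all_y, fitted_full)

# Nested-model F statistic with error d.f. N - 1 (reduced) and N - g (full).
df_reduced, df_full = N - 1, N - g
f_stat = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

print(round(sse_reduced, 1), round(sse_full, 1), round(f_stat, 2))  # 746.4 202.8 16.08
```

The reduced model's residual sum of squares is the corrected total, the full model's is the error sum of squares, and their difference is the treatment sum of squares, so the statistic matches the F value in the ANOVA table.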

SIDENOTE: SAS Settings

1. On the first line of all my SAS code files, I set the following options:

    options linesize=79 nocenter nodate formchar="|----|+|---+=|-/\<>*";

2. I set my preferences to have SAS output the results in both listing and HTML format. The HTML output is nice because you automatically get HTML graphics generated, but I've also found the HTML output difficult to deal with at times (like when I'm trying to save pieces of it). Therefore, I also generate all my output as a listing.

If you are on the virtual desktop, I know you can choose this option by going to Tools > Options > Preferences... Click the Results tab, and check the box that says "Create Listing." Then click OK.

This listing output is just text, and you can easily copy and paste the pieces into LaTeX and use the verbatim environment to present it. If you copy and paste into Word, you might use a monospace font, such as Andale Mono or SAS Monospace. If you save your listing output, it will be saved as a .lst file.