1-Way ANOVA: ANOVA Situation, The F Statistic, Multiple Comparisons
MATH 143, Department of Mathematics and Statistics, Calvin College


An example ANOVA situation

Example (Treating Blisters)
- Subjects: 25 patients with blisters
- Treatments: Treatment A, Treatment B, Placebo (P)
- Measurement: number of days until the blisters heal

Data [and means]:
- A: 5, 6, 6, 7, 7, 8, 9, 10            [7.25]
- B: 7, 7, 8, 9, 9, 10, 10, 11          [8.875]
- P: 7, 9, 9, 10, 10, 10, 11, 12, 13    [10.11]

Question: Are these differences significant? (Or would we expect differences this large by random chance?)
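As a quick check, the group means above can be reproduced in a few lines of plain Python (a sketch, not part of the original notes):

```python
# Blister-healing data from the example above.
A = [5, 6, 6, 7, 7, 8, 9, 10]
B = [7, 7, 8, 9, 9, 10, 10, 11]
P = [7, 9, 9, 10, 10, 10, 11, 12, 13]

# Sample mean of each group; matches the bracketed values on the slide.
for name, grp in [("A", A), ("B", B), ("P", P)]:
    print(name, round(sum(grp) / len(grp), 3))
# prints: A 7.25, then B 8.875, then P 10.111
```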

Are these differences significant?

[Figure: side-by-side boxplots of days until healing for groups A, B, P, shown for two scenarios (Ex_1 and Ex_2) on a common vertical scale from 6 to 12 days.]

Whether differences between the groups are significant depends on
- the difference in the means
- the amount of variation within each group
- the sample sizes

Some notation for ANOVA

Entire data set:
- n = number of individuals all together
- k = number of groups
- y_ij = value for individual i in group j
- ȳ = sample mean of the quantitative variable for the entire data set
- s = sample standard deviation of the quantitative variable for the entire data set

Group j:
- n_j = number of individuals in group j
- μ_j = population mean for group j
- ȳ_j = sample mean for group j
- σ_j = population standard deviation for group j
- s_j = sample standard deviation for group j

The ANOVA model

    y_ij  =  μ_j  +  ε_ij        ε_ij ~ N(0, σ)
    DATA  =  FIT  +  ERROR

Relationship to regression: FIT gives a mean for each level of the categorical variable, but the means need not fall in a pattern as in regression. ANOVA can be thought of as regression with a categorical predictor.

Assumptions of this model: each group...
- is normally distributed about its group mean (μ_j)
- has the same standard deviation

The ANOVA model (take 2)

    y_ij  =  μ + τ_j  +  ε_ij        ε_ij ~ N(0, σ)
    DATA  =    FIT    +  ERROR

This time we express FIT as an overall average (μ) plus a treatment effect (τ_j) for each group.

The assumptions of this model are still the same: each group...
- is normally distributed about its group mean (μ + τ_j)
- has the same standard deviation

Checking the assumptions

Once again we look at residuals:

    residual_ij = e_ij = y_ij - ȳ_j

Normality check: normal quantile plots of residuals (also boxplots, histograms). ANOVA can handle some skewness, but is quite sensitive to outliers.

Equal variance check: the s_j's should be approximately equal.
- Rule of thumb: at most a 2:1 ratio between the largest and smallest of the group sample standard deviations (a 4:1 ratio between the variances is equivalent).

Remember that the assumptions are about the population, not the sample. These are hard to check when the data sets are small.
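The equal-variance rule of thumb is easy to check mechanically. A sketch for the blister data, using Python's statistics module (the data are repeated so the snippet is self-contained):

```python
from statistics import stdev

groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

# Sample standard deviation of each group.
sds = {name: stdev(grp) for name, grp in groups.items()}
ratio = max(sds.values()) / min(sds.values())

print({k: round(v, 3) for k, v in sds.items()})  # roughly A: 1.669, B: 1.458, P: 1.764
print("2:1 rule satisfied:", ratio < 2)          # ratio is about 1.21, well under 2
```

Here the largest-to-smallest ratio is about 1.21, so the equal-variance assumption looks reasonable for this example.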

What does ANOVA do?

At its simplest (there are extensions), ANOVA tests the following hypotheses:

    H_0: The means of all the groups are equal.
         μ_1 = μ_2 = ... = μ_k   (equivalently, τ_1 = τ_2 = ... = τ_k = 0)
    H_a: Not all the means are equal.

H_a doesn't say how or which means differ; we can follow up with multiple comparisons if we reject H_0.

A quick look at the ANOVA table

                 Df      SS      MS   F-stat   P-value
    treatment     2   34.74   17.37     6.45    0.0063
    Residuals    22   59.26    2.69
    Total        24   94.00
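This table comes from the blister data. The F statistic and p-value can be reproduced with scipy.stats.f_oneway; this is a sketch for checking the numbers, not the software used in the course (the notes mention StatCrunch and show Minitab-style output):

```python
from scipy.stats import f_oneway

A = [5, 6, 6, 7, 7, 8, 9, 10]
B = [7, 7, 8, 9, 9, 10, 10, 11]
P = [7, 9, 9, 10, 10, 10, 11, 12, 13]

# One-way ANOVA: F = MS_Tr / MS_E and its p-value from the F(2, 22) distribution.
result = f_oneway(A, B, P)
print(round(result.statistic, 2), round(result.pvalue, 4))  # 6.45 0.0063
```

Since p = 0.0063 < 0.05, we reject H_0: the group means are not all equal.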

The ANOVA test statistic (ingredients)

ANOVA measures two sources of variation in the data and compares their relative sizes.

Variation BETWEEN groups: for each data value, look at the difference between its group mean and the overall mean.

    SS_Tr = Σ (ȳ_j - ȳ)² = Σ (group mean - grand mean)²        MS_Tr = SS_Tr / (k - 1)

Variation WITHIN groups: for each data value, look at the difference between that value and the mean of its group.

    SS_E = Σ (y_ij - ȳ_j)² = Σ (residual)²                     MS_E = SS_E / (n - k)

(Both sums run over all n data values.)

The ANOVA test statistic (example)

We want to measure and compare the amounts of BETWEEN group and WITHIN group variation. For each data value, we calculate its contribution to each:
- BETWEEN group variation: (ȳ_j - ȳ)²
- WITHIN group variation: (y_ij - ȳ_j)²

Example (An Even Smaller Dataset)
Suppose we have three groups:
- Group 1: 5.3, 6.0, 6.7          [ȳ_1 = 6.00]
- Group 2: 5.5, 6.2, 6.4, 5.7     [ȳ_2 = 5.95]
- Group 3: 7.5, 7.2, 7.9          [ȳ_3 = 7.53]

Computing F

Overall mean: ȳ = 6.44, with n = 10 and k = 3. Then

    SS_Tr = 3(6.00 - 6.44)² + 4(5.95 - 6.44)² + 3(7.53 - 6.44)² ≈ 5.127     MS_Tr = 5.127 / 2 ≈ 2.564
    SS_E  = 0.980 + 0.530 + 0.247 ≈ 1.757                                   MS_E  = 1.757 / 7 ≈ 0.251

    F = MS_Tr / MS_E ≈ 2.564 / 0.251 ≈ 10.216
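The arithmetic above can be carried out directly in plain Python (a sketch; the data are copied from the slide):

```python
groups = [[5.3, 6.0, 6.7], [5.5, 6.2, 6.4, 5.7], [7.5, 7.2, 7.9]]

all_vals = [y for g in groups for y in g]
n, k = len(all_vals), len(groups)
grand = sum(all_vals) / n                       # overall mean: 6.44

# Between-group sum of squares: each value contributes (group mean - grand mean)^2.
ss_tr = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
# Within-group sum of squares: each value contributes (value - group mean)^2.
ss_e = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

F = (ss_tr / (k - 1)) / (ss_e / (n - k))
print(round(grand, 2), round(F, 3))             # 6.44 10.216
```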

Connections between SS_Tot, MS_Tot, and standard deviation

Ignoring groups for a moment:

    s²_y = Σ (y_ij - ȳ)² / (n - 1) = SS_Tot / DF_Tot = MS_Tot

So MS_Tot = s²_y and SS_Tot = (n - 1) s²_y. These measure the TOTAL variation (of y).
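A quick sketch verifying SS_Tot = (n - 1) s²_y on the blister data, where the ANOVA table reported Total SS = 94.00:

```python
from statistics import variance  # sample variance, n - 1 in the denominator

days = ([5, 6, 6, 7, 7, 8, 9, 10]              # group A
        + [7, 7, 8, 9, 9, 10, 10, 11]          # group B
        + [7, 9, 9, 10, 10, 10, 11, 12, 13])   # group P

n = len(days)
ss_tot = (n - 1) * variance(days)  # (n - 1) * s_y^2
print(round(ss_tot, 2))            # 94.0, the Total SS from the ANOVA table
```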

Connections between SS_E, MS_E, and standard deviation

If we just look at the data from group j:

    s²_j = Σ_i (y_ij - ȳ_j)² / (n_j - 1) = SS[within group j] / DF_j

So SS[within group j] = s²_j · DF_j, where DF_j = n_j - 1. This means

    SS_E = SS[within] = Σ_j SS[within group j] = Σ_j s²_j (n_j - 1) = Σ_j s²_j DF_j

Pooled estimate for σ

One of the ANOVA assumptions is that all groups have the same standard deviation. We can estimate this with a weighted average of the sample variances:

    s²_p = (DF_1 s²_1 + DF_2 s²_2 + ... + DF_k s²_k) / (DF_1 + DF_2 + ... + DF_k) = SS_E / DF_E = MS_E

So MS_E is the pooled estimate of the variance parameter σ². If we do this with only two groups, we get the 2-sample t-test with pooled estimate of variance (the check box in StatCrunch that we have been turning off). A similar thing can be done to estimate σ in the simple linear regression model.
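For the blister data, the weighted average of the group variances should match MS_E = 2.69 from the ANOVA table. A sketch checking this:

```python
from statistics import variance
from math import sqrt

groups = [
    [5, 6, 6, 7, 7, 8, 9, 10],             # A
    [7, 7, 8, 9, 9, 10, 10, 11],           # B
    [7, 9, 9, 10, 10, 10, 11, 12, 13],     # P
]

# Weighted average of the group variances, with weights DF_j = n_j - 1.
num = sum((len(g) - 1) * variance(g) for g in groups)
den = sum(len(g) - 1 for g in groups)
sp2 = num / den

print(round(sp2, 2), round(sqrt(sp2), 3))  # 2.69 1.641, i.e. MS_E and the pooled StDev
```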

Tukey's Honest Significant Differences

One method of adjusting the simultaneous confidence intervals for all the pairs is called Tukey's method.

    Tukey 95% Simultaneous Confidence Intervals

    A subtracted from:
             Lower     Upper
    B      -0.4365    3.6865
    P       0.8577    4.8645

    B subtracted from:
             Lower     Upper
    P      -0.7673    3.2395

This corresponds to a 98.01% CI for each pairwise difference. Only P vs A is significant in our example (its lower and upper limits have the same sign).
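The B - A interval can be reconstructed by hand. The sketch below hardcodes q ≈ 3.553, the 5% critical value of the studentized range for k = 3 groups and 22 error df (a table lookup, assumed here rather than computed), and uses the pooled standard deviation 1.641 reported in the output:

```python
from math import sqrt

# Group means and sizes for A and B (blister data).
mean_A, n_A = 7.250, 8
mean_B, n_B = 8.875, 8
sp = 1.641   # pooled standard deviation, sqrt(MS_E)
q = 3.553    # studentized range critical value for k = 3, df = 22 (table value, assumed)

# Tukey margin of error for the pairwise difference B - A.
margin = (q / sqrt(2)) * sp * sqrt(1 / n_A + 1 / n_B)
diff = mean_B - mean_A
print(round(diff - margin, 3), round(diff + margin, 3))  # close to (-0.4365, 3.6865)
```

The interval contains 0, so B vs A is not significant, matching the output above.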

Other views of Multiple Comparisons

    Tukey multiple comparisons of means
      95% family-wise confidence level

           diff       lwr      upr
    B-A  1.6250  -0.43650   3.6865
    P-A  2.8611   0.85769   4.8645
    P-B  1.2361  -0.76731   3.2395

    Individual 95% CIs For Mean Based on Pooled StDev
    Level  N    Mean  StDev  ----------+---------+---------+------
    A      8   7.250  1.669   (-------*-------)
    B      8   8.875  1.458           (-------*-------)
    P      9  10.111  1.764                   (------*-------)
                             ----------+---------+---------+------
                                       7.5       9.0       10.5
    Pooled StDev = 1.641