Notes for Week 13: Analysis of Variance (ANOVA), continued

Exam 3 is on Friday May 1. A part of one of the exam problems is on Prediction intervals:

When randomly sampling from a normal population, in order to predict a future value $X_{n+1}$ using its point estimate $\bar{X}$, the prediction error $\bar{X} - X_{n+1}$, being a linear combination of normals, is itself normally distributed with mean
$$E[\bar{X} - X_{n+1}] = \mu - \mu = 0$$
and variance
$$V[\bar{X} - X_{n+1}] = V[\bar{X}] + V[X_{n+1}] = \frac{\sigma^2}{n} + \sigma^2 = \sigma^2\left(1 + \frac{1}{n}\right).$$
Thus the standardized variable
$$Z = \frac{\bar{X} - X_{n+1}}{\sigma\sqrt{1 + 1/n}}$$
will be standard normal. Hence with probability $1 - \alpha$ we can say $-z_{\alpha/2} \le Z \le z_{\alpha/2}$, or, solving for the future value $X_{n+1}$, we find that the $100(1-\alpha)\%$ prediction interval (PI for short) for a single future observation $X_{n+1}$ is
$$\bar{X} - z_{\alpha/2}\,\sigma\sqrt{1 + \tfrac{1}{n}} \;\le\; X_{n+1} \;\le\; \bar{X} + z_{\alpha/2}\,\sigma\sqrt{1 + \tfrac{1}{n}}.$$
Note that the width of the prediction interval, which is proportional to the factor $\sqrt{1 + 1/n}$ (close to 1), is considerably wider (by approximately a factor of $\sqrt{n}$) than the width of the confidence interval for the mean, whose width is proportional to $1/\sqrt{n}$.

Completely randomized designs (or one-way classification): Given independent random samples from $k$ different populations, which could represent treatments, groups, etc., the experimenter wishes to test the null hypothesis
$$H_0 : \mu_1 = \mu_2 = \dots = \mu_k = \mu$$
that the (actual population) means of these samples are all the same. Denoting the $j$-th observation in the $i$-th sample by $y_{ij}$, we have the one-way classification scheme

  Observations                                   Means         Sums of squares
  Sample 1: $y_{11}, y_{12}, \dots, y_{1n_1}$    $\bar{y}_1$   $\sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1)^2$
  Sample 2: $y_{21}, y_{22}, \dots, y_{2n_2}$    $\bar{y}_2$   $\sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2)^2$
  ...
  Sample k: $y_{k1}, y_{k2}, \dots, y_{kn_k}$    $\bar{y}_k$   $\sum_{j=1}^{n_k} (y_{kj} - \bar{y}_k)^2$

The sum of all observations (the grand total), the total sample size $N$, and the overall sample mean or grand mean are
$$T_{\bullet} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} = \sum_{i=1}^{k} T_i \quad\text{where } T_i = \sum_{j=1}^{n_i} y_{ij}, \qquad N = n_1 + n_2 + \dots + n_k, \qquad \bar{y} = \frac{T_{\bullet}}{N}.$$
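To make the notation concrete, here is a minimal Python sketch (the three small groups are hypothetical, not data from these notes) that computes the sample totals $T_i$, sample means $\bar{y}_i$, the total sample size $N$, the grand total $T_{\bullet}$, and the grand mean $\bar{y}$:

```python
# Minimal sketch of the one-way classification notation (hypothetical data).
samples = [
    [2, 4, 3],      # sample 1: y_11, y_12, y_13
    [5, 7, 6, 6],   # sample 2
    [9, 8],         # sample 3
]

T = [sum(s) for s in samples]                 # sample totals T_i
ybar = [sum(s) / len(s) for s in samples]     # sample means ybar_i
N = sum(len(s) for s in samples)              # total sample size N = n_1 + ... + n_k
T_dot = sum(T)                                # grand total T_bullet
grand_mean = T_dot / N                        # grand mean ybar = T_bullet / N

print("T_i    =", T)       # [9, 24, 17]
print("ybar_i =", ybar)    # [3.0, 6.0, 8.5]
print("N =", N, "  T. =", T_dot, "  grand mean =", round(grand_mean, 3))
```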

Total sum of squares decomposition: (SSE = within-sample SS) + (SS(Tr) = between-sample SS), i.e.
total sum of squares = error sum of squares + treatment sum of squares,
$$\mathrm{SST} = \mathrm{SSE} + \mathrm{SS(Tr)}$$
$$\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 \;=\; \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \;+\; \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$$
Degrees of freedom: $N - 1 = (N - k) + (k - 1)$.
Mean squares (MS): treatment mean square MS(Tr) = SS(Tr)/(k-1); error mean square MSE = SSE/(N-k). To get a mean square from a sum of squares we divide each sum of squares by its number of degrees of freedom.

Derivation: The sum-of-squares identity follows with a little algebra upon squaring out $y_{ij} - \bar{y} = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})$ and summing the result, noting that the sum of the cross term vanishes by definition of the sample means, while the square of the last term is the same within a given sample of size $n_i$ since it doesn't depend on $j$.

With the correction term for the mean given by $C = T_{\bullet}^2 / N = N \bar{y}^2$, one has the

Shortcut formulas:
$$\mathrm{SST} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}^2 - C, \qquad \mathrm{SS(Tr)} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - C, \qquad \mathrm{SSE} = \mathrm{SST} - \mathrm{SS(Tr)},$$
where $T_i = \sum_j y_{ij} = n_i \bar{y}_i$ is the sum of the observations in the $i$-th sample.

Hypothesis test: Under the assumption of the null hypothesis $H_0 : \mu_1 = \mu_2 = \dots = \mu_k = \mu$ that the treatment means are the same, both the error and the treatment mean squares are unbiased estimates of $\sigma^2$. That is, one has $\sigma^2 = E[\mathrm{MS(Tr)}] = E[\mathrm{MSE}]$. Thus under $H_0$ these mean-square quantities behave very much like sample variances. Their ratio, upon which the test is based, is the F test statistic
$$F = F(k-1,\, N-k) = \frac{\mathrm{MS(Tr)}}{\mathrm{MSE}} = \frac{\mathrm{SS(Tr)}/(k-1)}{\mathrm{SSE}/(N-k)}$$
with $k-1$ numerator and $N-k$ denominator degrees of freedom. We reject $H_0$ at significance level $\alpha$ if the above F statistic exceeds the F-critical value $F_\alpha$ corresponding to these numbers of degrees of freedom.
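The shortcut formulas translate directly into a few lines of code. The following is a minimal Python sketch, not part of the original notes: the function name `one_way_anova`, the hypothetical sample data, and the cross-check against `scipy.stats.f_oneway` are my own choices.

```python
from scipy import stats

def one_way_anova(samples, alpha=0.05):
    """One-way ANOVA F test via the shortcut formulas; `samples` is a list of lists."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    T = [sum(s) for s in samples]                       # sample totals T_i
    C = sum(T) ** 2 / N                                 # correction term C = T.^2 / N
    SST = sum(y * y for s in samples for y in s) - C
    SSTr = sum(Ti ** 2 / len(s) for Ti, s in zip(T, samples)) - C
    SSE = SST - SSTr
    MSTr, MSE = SSTr / (k - 1), SSE / (N - k)
    F = MSTr / MSE
    F_crit = stats.f.ppf(1 - alpha, k - 1, N - k)       # F_alpha critical value
    return SST, SSTr, SSE, F, F_crit

# Hypothetical data; the F statistic should match scipy's built-in one-way ANOVA.
samples = [[2, 4, 3], [5, 7, 6, 6], [9, 8, 10]]
SST, SSTr, SSE, F, F_crit = one_way_anova(samples)
F_scipy, p_value = stats.f_oneway(*samples)
print(F, F_scipy, F > F_crit)   # both give the same F (up to rounding); reject H0 if F > F_crit
```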

Typically the results obtained by decomposing the total sum of squares into its parts are summarized in an analysis of variance table:

  Source of      Degrees of    Sum of      Mean square                F
  variation      freedom       squares
  Treatments     k - 1         SS(Tr)      MS(Tr) = SS(Tr)/(k-1)      MS(Tr)/MSE
  Error          N - k         SSE         MSE = SSE/(N-k)
  Total          N - 1         SST

Conclusion: Reject $H_0 : \mu_1 = \mu_2 = \dots = \mu_k = \mu$ if $F \ge F_\alpha$. The alternative hypothesis here is that at least two of the population means are different: $H_a : \mu_m \ne \mu_n$ for some $m \ne n$ (at least two means are unequal).

EXAMPLE 1  In an effort to determine the most effective way to teach safety principles to a group of employees at Weedco, four different methods were used. A sample of 20 employees was randomly assigned to one of the four groups. The first group was given programmed instruction booklets and worked through the course at their own pace. The second group attended lectures. The third group watched television presentations, and the fourth was divided into small discussion groups. At the end of the session, a test was given to the four groups. A high score of 10 was possible. The results were:

  TEST GRADES
  Programmed      Lecture    TV    Group
  instruction                      discussion
  6               8          7     5
  5               7          9     5
  6               8          6     6
  5               8          8     6
  6               8          9     5

The following is an analysis of variance Minitab software output with missing information:

  ANALYSIS OF VARIANCE ON GRADES
  SOURCE    DF    SS        MS
  TREAT           26.550    8.850
  ERROR
  TOTAL           36.550

a) Complete the missing values:
SSE = SST - SS(Tr) = 36.550 - 26.550 = 10.0.
N = 20 total observations, so N - 1 = 19 is the total DF (degrees of freedom). There are k = 4 groups, so k - 1 = 3 is the treatment DF and N - k = 16 is the error DF.
MSE = SSE/(N - k) = 10/16 = 0.625.
F = MS(Tr)/MSE = 8.850/0.625 = 14.16.
To summarize we have

  SOURCE    DF    SS        MS       F
  TREAT     3     26.550    8.850    14.16
  ERROR     16    10.0      0.625
  TOTAL     19    36.550

b) Test at the .05 level that there is no difference among the four means.
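Before answering part b), here is a short standalone Python check (not part of the original notes) of the values filled in for part a), computed straight from the grades table with the shortcut formulas:

```python
# Reproducing the missing Minitab entries for Example 1 from the grades table.
grades = [[6, 5, 6, 5, 6],   # programmed instruction
          [8, 7, 8, 8, 8],   # lecture
          [7, 9, 6, 8, 9],   # TV
          [5, 5, 6, 6, 5]]   # group discussion

k = len(grades)                                    # 4 groups
N = sum(len(g) for g in grades)                    # 20 observations
C = sum(y for g in grades for y in g) ** 2 / N     # correction term
SST = sum(y * y for g in grades for y in g) - C            # 36.55
SSTr = sum(sum(g) ** 2 / len(g) for g in grades) - C       # 26.55
SSE = SST - SSTr                                           # 10.0
MSTr, MSE = SSTr / (k - 1), SSE / (N - k)                  # 8.85 and 0.625
F = MSTr / MSE                                             # 14.16
print(round(SST, 2), round(SSTr, 2), round(SSE, 2), round(F, 2))
```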

$H_0 : \mu_1 = \mu_2 = \dots = \mu_k = \mu$
$H_a : \mu_m \ne \mu_n$ for some $m \ne n$.
Reject $H_0$ if $F \ge F_{.05}(3, 16) = 3.24$.
Decision: F = 14.16 is greater than 3.24, so we reject $H_0$ at significance level .05.

For these data the sample means are $\bar{y}_1 = 5.6$, $\bar{y}_2 = 7.8$, $\bar{y}_3 = 7.8$, $\bar{y}_4 = 5.4$; the largest difference is $\bar{y}_2 - \bar{y}_4 = 2.4$. To get a 95% confidence interval for the actual difference $\mu_2 - \mu_4$, for the sample variance $s^2$ we use the pooled MSE (the mean square for error), which is an unbiased estimator of $\sigma^2$. Re-working the statement
$$-t_{\alpha/2} \;\le\; t = \frac{(\bar{y}_2 - \bar{y}_4) - (\mu_2 - \mu_4)}{\sqrt{s^2\left(\frac{1}{n_2} + \frac{1}{n_4}\right)}} \;\le\; t_{\alpha/2},$$
where the number of degrees of freedom of the t variable is the same as the number N - k = 16 for $s^2 = \mathrm{MSE}$, and since
$$s^2\left(\frac{1}{n_2} + \frac{1}{n_4}\right) = 0.625 \cdot \frac{2}{5} = 0.25 = \frac{1}{4}, \quad\text{so}\quad \sqrt{s^2\left(\tfrac{1}{n_2} + \tfrac{1}{n_4}\right)} = \frac{1}{2},$$
we find our 95% CI for $\mu_2 - \mu_4$ is
$$\bar{y}_2 - \bar{y}_4 \pm t_{.025,16}\sqrt{s^2\left(\tfrac{1}{n_2} + \tfrac{1}{n_4}\right)} = 2.4 \pm 2.12 / 2 = [1.34,\, 3.46].$$
Note that we cannot be 95% sure that all $\binom{4}{2} = 6$ differences of means simultaneously lie in these six 95% confidence intervals, since knowing that 6 different events each have probability .95 does not allow us to conclude the same is true of their intersection. Bonferroni's method, discussed in Section 12.4, says that if we want a confidence interval statement for all 6 differences of means to hold simultaneously with probability $1 - \alpha$, then in the individual statements we should replace $\alpha$ by $\alpha/6$ to get our CI's for each difference.

EXAMPLE 2 (Problem 12.5 of text)  The following are the numbers of mistakes made on 5 successive days by 4 technicians working for a photographic laboratory:

  Technician I    Technician II    Technician III    Technician IV
  6               14               10                9
  14              9                12                12
  10              12               7                 8
  8               10               15                10
  11              14               11                11

Test at the level of significance $\alpha = .01$ whether the differences among the 4 sample means can be attributed to chance.

There are N = 20 total observations here, hence 19 total degrees of freedom. The grand mean is $\bar{y} = 10.65$. The four sample means are $\bar{y}_1 = 9.8$, $\bar{y}_2 = 11.8$, $\bar{y}_3 = 11$, $\bar{y}_4 = 10$, corresponding to the sample totals $T_1 = 49$, $T_2 = 59$, $T_3 = 55$, $T_4 = 50$. The grand total is $T_{\bullet} = 213$, so $C = 213^2/20 = 2268.45$.
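Before finishing Example 2 by hand, here is a minimal Python sketch (not part of the original notes) of the pairwise interval computed above for Example 1, together with the Bonferroni-adjusted critical value just mentioned; `scipy.stats.t.ppf` supplies the t critical values.

```python
from math import sqrt
from scipy import stats

# 95% CI for mu_2 - mu_4 in Example 1, using the pooled MSE as s^2.
MSE, df_error = 0.625, 16            # from the completed ANOVA table
ybar2, ybar4 = 7.8, 5.4              # lecture and group-discussion sample means
n2 = n4 = 5

t_crit = stats.t.ppf(0.975, df_error)              # t_.025,16, about 2.12
half = t_crit * sqrt(MSE * (1 / n2 + 1 / n4))      # about 2.12 * (1/2)
print(ybar2 - ybar4 - half, ybar2 - ybar4 + half)  # approximately [1.34, 3.46]

# Bonferroni: for all 6 pairwise differences simultaneously, use alpha/6 per interval.
t_bonf = stats.t.ppf(1 - 0.05 / (2 * 6), df_error)
print(t_bonf)                                      # larger than t_crit, so wider intervals
```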

$\sum_{i=1}^{4}\sum_{j=1}^{5} y_{ij}^2 = 2383$, so SST = 2383 - 2268.45 = 114.55 by the shortcut formula.
Since each $n_i = 5$, $\mathrm{SS(Tr)} = \sum_{i=1}^{4} \frac{T_i^2}{5} - C = 2281.4 - 2268.45 = 12.95$.
Thus SSE = SST - SS(Tr) = 114.55 - 12.95 = 101.6, and we have the ANOVA table

  SOURCE    DF    SS        MS        F
  TREAT     3     12.95     4.3167    0.67979
  ERROR     16    101.6     6.350
  TOTAL     19    114.55

Decision: Since (using Table 6(b) of Appendix B) $F = 0.67979 < F_{.01}(3, 16) = 5.29$, we do not reject $H_0$ at significance level .01.

Note that in this case, if we want to write down a confidence interval using the grand mean $\bar{y}$ to estimate the common mean $\mu$ belonging to all k = 4 populations, then for the pooled variance $s^2$, under the assumption of $H_0$, we can use the total sum of squares over the total degrees of freedom, or $s^2 = \mathrm{SST}/(N-1) = 114.55/19 = 6.028947$, which gives $s = 2.45539$. To get a 95% CI for $\mu$ (for a t critical value with N - 1 = 19 degrees of freedom) we solve the inequality
$$-t_{\alpha/2} \;\le\; t = \frac{\bar{y} - \mu}{s/\sqrt{N}} \;\le\; t_{\alpha/2},$$
giving the $100(1-\alpha)\%$ CI for $\mu$:
$$\bar{y} \pm t_{.025,19}\, \frac{s}{\sqrt{20}} = 10.65 \pm 2.093 \cdot \frac{2.45539}{\sqrt{20}} = 10.65 \pm 1.1491 = [9.501,\, 11.799].$$

EXAMPLE 3 (Problem 12.7 of text)  Given the following observations collected according to the one-way analysis of variance design:

  Treatment 1:   6   4   5
  Treatment 2:  13  10  13  12
  Treatment 3:   7   9  11
  Treatment 4:   3   6   1   4   1

a) Decompose each observation $y_{ij}$ as $y_{ij} = \bar{y} + (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i)$ and obtain the sum of squares and degrees of freedom for each component.

There are N = 15 observations, with total sum $T_{\bullet} = 105$, hence grand mean 105/15 = 7. The sample sums are $T_1 = 15$, $T_2 = 48$, $T_3 = 27$, $T_4 = 15$. The corresponding sample means are $\bar{y}_1 = 15/3 = 5$, $\bar{y}_2 = 48/4 = 12$, $\bar{y}_3 = 27/3 = 9$, and $\bar{y}_4 = 15/5 = 3$. In matrix notation we have $y_{ij} = \bar{y} + (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i)$:

   6  4  5         =  7 7 7        + (-2 -2 -2)          + ( 1 -1  0)
  13 10 13 12      =  7 7 7 7      + ( 5  5  5  5)       + ( 1 -2  1  0)
   7  9 11         =  7 7 7        + ( 2  2  2)          + (-2  0  2)
   3  6  1  4  1   =  7 7 7 7 7    + (-4 -4 -4 -4 -4)    + ( 0  3 -2  1 -2)
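The following short Python sketch (not part of the original notes) rebuilds the grand-mean, treatment-effect, and residual components for Example 3 and checks that they add back to the original observations:

```python
# Verify the Example 3 decomposition y_ij = ybar + (ybar_i - ybar) + (y_ij - ybar_i).
treatments = [[6, 4, 5], [13, 10, 13, 12], [7, 9, 11], [3, 6, 1, 4, 1]]

N = sum(len(t) for t in treatments)                    # 15 observations
grand = sum(y for t in treatments for y in t) / N      # grand mean = 7.0
means = [sum(t) / len(t) for t in treatments]          # [5.0, 12.0, 9.0, 3.0]

for t, m in zip(treatments, means):
    effect = m - grand                                 # ybar_i - ybar (constant within row i)
    residuals = [y - m for y in t]                     # y_ij - ybar_i
    rebuilt = [grand + effect + r for r in residuals]
    assert rebuilt == [float(y) for y in t]            # components add back to the data
    print(effect, residuals)
```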

$$\mathrm{SST} = (-1)^2 + (-3)^2 + (-2)^2 + 6^2 + 3^2 + 6^2 + 5^2 + 0^2 + 2^2 + 4^2 + (-4)^2 + (-1)^2 + (-6)^2 + (-3)^2 + (-6)^2 = 238$$
with N - 1 = 14 d.f. (the terms are the squared deviations $y_{ij} - \bar{y}$),
$$\mathrm{SS(Tr)} = 3(-2)^2 + 4(5)^2 + 3(2)^2 + 5(-4)^2 = 204$$
with k - 1 = 3 d.f., and
$$\mathrm{SSE} = \mathrm{SST} - \mathrm{SS(Tr)} = 34$$
with N - k = 11 d.f.

b) Construct an analysis of variance table and test the equality of treatments using $\alpha = .05$:

  SOURCE    DF    SS     MS        F
  TREAT     3     204    68        22
  ERROR     11    34     3.0909
  TOTAL     14    238

By Table 6(a) of Appendix B, $F = 22 \ge F_{.05}(3, 11) = 3.59$, so we reject $H_0$ at level .05 (said differently, F = 22 is significant at this level).

The model equation for the one-way classification is
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, 2, \dots, k; \quad j = 1, 2, \dots, n_i,$$
where the $\varepsilon_{ij}$ are independent normals with zero means and common variance $\sigma^2$. Here $\mu_i = \mu + \alpha_i$ gives the mean of the $i$-th population. The null hypothesis in this formulation says that, with $\alpha_i$ the effect of the $i$-th treatment, all the effects are zero:
$$H_0 : \alpha_1 = \alpha_2 = \dots = \alpha_k = 0.$$
Our best estimates of the parameters under a least-squares criterion are
$$\hat{\mu} = \bar{y} = \text{grand mean}, \qquad \hat{\alpha}_i = \bar{y}_i - \bar{y}, \qquad \hat{\mu}_i = \bar{y}_i.$$
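To tie the model formulation back to Example 3, here is a final Python sketch (not part of the original notes) computing the least-squares estimates $\hat{\mu}$, $\hat{\alpha}_i$, and $\hat{\mu}_i$ from the data:

```python
# Least-squares estimates for the one-way model Y_ij = mu + alpha_i + eps_ij (Example 3 data).
treatments = [[6, 4, 5], [13, 10, 13, 12], [7, 9, 11], [3, 6, 1, 4, 1]]

N = sum(len(t) for t in treatments)
mu_hat = sum(y for t in treatments for y in t) / N             # grand mean: 7.0
alpha_hat = [sum(t) / len(t) - mu_hat for t in treatments]     # effects: [-2.0, 5.0, 2.0, -4.0]
mu_i_hat = [mu_hat + a for a in alpha_hat]                     # treatment means: [5.0, 12.0, 9.0, 3.0]

print(mu_hat, alpha_hat, mu_i_hat)
# The effects, weighted by the sample sizes n_i, sum to zero by construction:
print(sum(len(t) * a for t, a in zip(treatments, alpha_hat)))  # 0.0
```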