Analysis of Covariance

In some experiments, the experimental units (subjects) are nonhomogeneous, or there is variation in the experimental conditions that is not due to the treatments. For example, suppose a study is designed to evaluate different methods of teaching reading to 8-year-old children. The response variable is each child's final score after participating in the reading program. However, the children participating in the study will have different reading abilities prior to entering the program, and many factors outside the school, such as socioeconomic variables associated with the child's family, may also influence a child's reading score. The variables that describe these differences among experimental units or experimental conditions are called covariates.

Analysis of covariance is a method by which the influence of the covariates on the treatment means is reduced. This will often result in increased power for tests of hypotheses. In an analysis of covariance, we estimate factor effects over and above the effect of the covariate. Thus, we obtain estimates of the differences among factor level means that would occur if all units had the same value of the covariate. The resulting means are called adjusted treatment means (or least squares means) and are calculated at the mean of the covariate for all observations.

For a clear interpretation of the results of an analysis of covariance, the covariate should be measured before the study; or, if measured during the study, it should not be influenced by the treatments in any way. The following example illustrates a case where the covariate is affected by the treatments.

Example of treatments affecting the covariate

A company was conducting a training school for engineers to teach them accounting and budgeting principles. Two teaching methods were used, and engineers were assigned at random to one of the two. At the end of the program, a score was obtained for each engineer reflecting the amount of learning. The analyst decided to use as a covariate the amount of time devoted to study (which the engineers were required to record). After conducting the analysis of covariance, the analyst found that training method had virtually no effect. The analyst was baffled by this finding until it was pointed out that the amount of study time was also affected by the treatments, and analysis indeed confirmed this. One of the training methods involved computer-assisted learning, which appealed to the engineers, so they spent more time studying and also learned more. In other words, both the learning score and the amount of study time were influenced by the treatment. As a result of the high correlation between the amount of study time and the learning score, the marginal treatment effect of the teaching methods on the amount of learning was small, and the test for treatment effects showed no significant difference between the two teaching methods.

Covariates commonly used with human subjects include prestudy attitudes, age, socioeconomic status, and aptitude.
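In symbols, the single-factor analysis of covariance model and the adjusted (least squares) treatment means described above can be written as follows. This is standard textbook notation (tau_i for the treatment effects, gamma for the common slope on the covariate); the symbols are not taken from this handout:

    Y_{ij} = \mu + \tau_i + \gamma\,(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)

The adjusted treatment mean evaluates each treatment at the overall covariate mean \bar{X}_{..}:

    \hat{\mu}_{Adj,i} = \bar{Y}_{i.} - \hat{\gamma}\,(\bar{X}_{i.} - \bar{X}_{..})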

The hypothesis test of interest in analysis of covariance is:

H0: µAdj,1 = µAdj,2 = ... = µAdj,t
Ha: At least two of the adjusted population means are unequal

Assumptions made for analysis of covariance:
1. For individuals with the same value of the covariate (X) and the same value of the categorical predictor variable, the dependent variable has a normal distribution.
2. Homogeneity of variance.
3. Observations are independent.
4. The relationship between the response and the covariate is linear.
5. The slopes of the different treatment regression lines are equal.
6. The treatments do not affect the covariate.

Analysis of Covariance in Minitab

Example: An experiment has been set up to determine the effectiveness of three new ergonomic designs for airplane control panels. Twenty-four pilots have been randomly selected for the experiment and assigned to training in a flight simulator that contains one of the control panels (eight pilots per panel). After completion of training on their respective control panels, the pilots are presented with eight emergency situations in the flight simulator. The emergency situations are presented in random order, and the total time (in seconds) required to make all emergency responses is recorded for each pilot. These data are found in the table below. The only factor of interest in this experiment is panel configuration, and the response variable is reaction time. The table also gives the number of years of experience each pilot has. The latter variable is not controlled by the experimenter, but this uncontrolled variable (or covariate) may influence reaction time. How can the effect of the covariate be accounted for in the analysis of the data?
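The sections that follow carry this analysis out in Minitab. For reference, the same common-slope ANCOVA model can be fit in other software; the sketch below uses Python with statsmodels, and the file name pilots.csv and the column names ReactionTime, YearsExp, and Panel are illustrative assumptions, not part of this handout.

# ANCOVA fit in Python/statsmodels: reaction time modeled by the panel factor
# plus the years-of-experience covariate with a common slope.
# The file name and column names below are assumptions for illustration.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

pilots = pd.read_csv("pilots.csv")   # 24 rows: ReactionTime, YearsExp, Panel (1, 2, 3)

ancova = smf.ols("ReactionTime ~ C(Panel) + YearsExp", data=pilots).fit()

# Type II ANOVA table: the Panel test is adjusted for the covariate, and vice versa
print(sm.stats.anova_lm(ancova, typ=2))
print(ancova.summary())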

Inputting Data:

Reaction Time   Years Experience   Panel   Indicator 1   Indicator 2
6.7             7.7                1.0     0.0           0.0
6.              7.4                1.0     0.0           0.0
6.0             0.7                1.0     0.0           0.0
5.9             22.                1.0     0.0           0.0
7.              6.                 1.0     0.0           0.0
7.7             4.                 1.0     0.0           0.0
6.0             6.5                1.0     0.0           0.0
6.4             8.8                1.0     0.0           0.0
5.8             .2                 2.0     1.0           0.0
6.5             2.6                2.0     1.0           0.0
6.8             4.                 2.0     1.0           0.0
6.              5.                 2.0     1.0           0.0
6.0             4.7                2.0     1.0           0.0
5.4             .                  2.0     1.0           0.0
5.7             4.6                2.0     1.0           0.0
5.4             8.                 2.0     1.0           0.0
6.0             .8                 3.0     0.0           1.0
6.5             8.2                3.0     0.0           1.0
7.0             7.0                3.0     0.0           1.0
7.0             6.0                3.0     0.0           1.0
7.2             0.9                3.0     0.0           1.0
6.8             .2                 3.0     0.0           1.0
6.6             8.9                3.0     0.0           1.0
7.4             .0                 3.0     0.0           1.0

Minitab Commands for Scatterplot:
GRAPH > SCATTERPLOT > WITH GROUPS > OK > Y-VARIABLE Reaction Time > X-VARIABLE Years of Experience > CATEGORICAL VARIABLE Panel > OK

[Scatterplot: Reaction Time for the Three Panels with Covariate Years of Experience; Reaction Time (y-axis, roughly 5.5 to 8.0) versus Years of Experience (x-axis, 0 to 25), points grouped by Panel 1, 2, 3.]
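If the grouped scatterplot is wanted outside Minitab, a matplotlib version is sketched below; as before, the file name and column names are assumptions, not part of the handout.

# Grouped scatterplot of Reaction Time against Years of Experience, one marker
# color per panel, paralleling the Minitab "Scatterplot With Groups" command above.
import pandas as pd
import matplotlib.pyplot as plt

pilots = pd.read_csv("pilots.csv")   # assumed columns: ReactionTime, YearsExp, Panel

fig, ax = plt.subplots()
for panel, grp in pilots.groupby("Panel"):
    ax.scatter(grp["YearsExp"], grp["ReactionTime"], label=f"Panel {int(panel)}")
ax.set_xlabel("Years of Experience")
ax.set_ylabel("Reaction Time (seconds)")
ax.set_title("Reaction Time for the Three Panels with Covariate Years of Experience")
ax.legend()
plt.show()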

Results from Performing a One-Way Analysis of Variance

Note: These results can be used for comparison later, to see whether any benefit is achieved by performing an analysis of covariance.

One-way ANOVA: Reaction Time versus Panel

Source  DF     SS     MS     F      P
Panel    2  2.790  1.395  4.62  0.022
Error   21  6.346  0.302
Total   23  9.136

H0: µ1 = µ2 = µ3
Ha: At least two of the treatment means are unequal
α = .05, F = 4.62, P = .022. Reject H0 in favor of Ha. The data provide sufficient evidence to conclude that at least two of the treatment means are unequal.

Results from Performing an Analysis of Covariance

MODEL Years of Experience Panel > COVARIATE Years of Experience > OK > OK

Analysis of Variance for Reaction Time, using Adjusted SS for Tests

Source               DF  Seq SS  Adj SS  Adj MS      F      P
Panel                 2  2.7900  3.8574  1.9287  15.88  0.000
Years of Experience   1  3.9175  3.9175  3.9175  32.26  0.000
Error                20  2.4288  2.4288  0.1214
Total                23  9.1363

H0: µAdj,1 = µAdj,2 = µAdj,3
Ha: At least two of the adjusted treatment means are unequal
α = .05, F = 15.88, P = .000. Reject H0 in favor of Ha. The data provide sufficient evidence to conclude that at least two of the adjusted treatment means are unequal.

Note: Compared with the one-way analysis of variance, the analysis of covariance provides stronger evidence against the null hypothesis.
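For readers working outside Minitab, the comparison between the unadjusted one-way test and the covariate-adjusted test can be reproduced along the following lines; the data set and column names are the same assumptions used in the earlier sketches.

# One-way ANOVA (Panel only) versus ANCOVA (Panel plus the covariate), mirroring
# the two Minitab analyses above.  File and column names are assumptions.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

pilots = pd.read_csv("pilots.csv")

oneway = smf.ols("ReactionTime ~ C(Panel)", data=pilots).fit()
ancova = smf.ols("ReactionTime ~ C(Panel) + YearsExp", data=pilots).fit()

print(sm.stats.anova_lm(oneway))            # F test for Panel, ignoring the covariate
print(sm.stats.anova_lm(ancova, typ=2))     # F test for Panel, adjusted for the covariate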

How Estimated Adjusted Treatment Means Are Calculated

COMMANDS: STAT > REGRESSION > REGRESSION > RESPONSE VARIABLE Reaction Time > PREDICTOR VARIABLES Years of Experience, Indicator 1, Indicator 2 > OK

Regression Analysis: Reaction Tim versus Years of Exp, Indicator 1, ...

The regression equation is
Reaction Time = 7.55 - 0.0885 Years of Experience - 0.853 Indicator 1 + 0.030 Indicator 2

Predictor                Coef  SE Coef      T      P
Constant                7.545    0.297    4.5  0.000
Years of Experience  -0.08846  0.01558  -5.68  0.000
Indicator 1           -0.8534   0.1836  -4.65  0.000
Indicator 2            0.0302   0.1806   0.17  0.869

S = 0.348480   R-Sq = 73.4%   R-Sq(adj) = 69.4%

From the output above, ŷ = 7.545 - .088 YrsExp - .853 Indicator1 + .030 Indicator2.

The estimated adjusted treatment means can be obtained using the following regression equations:
Control Panel 1: ŷ = 7.545 - .088 YrsExp - .853(0) + .030(0) = 7.545 - .088 YrsExp
Control Panel 2: ŷ = 7.545 - .088 YrsExp - .853(1) + .030(0) = 6.692 - .088 YrsExp
Control Panel 3: ŷ = 7.545 - .088 YrsExp - .853(0) + .030(1) = 7.575 - .088 YrsExp

Estimated adjusted treatment means are calculated at the overall mean of YrsExp, which is 9.42. Thus the estimated adjusted treatment mean for control panel 1 is
Control Panel 1: ŷ = 7.545 - .088 YrsExp = 7.545 - .088(9.42) = 6.7

Pairwise Comparisons Using the Bonferroni Procedure

MODEL Years of Experience Panel > COVARIATE Years of Experience > OK > COMPARISONS > TERMS Panel > Bonferroni > OK > OK
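The same adjusted means can be obtained from any fitted common-slope model by predicting each panel's response with the covariate held at its overall mean; a sketch in Python follows, again with the assumed file name and column names from the earlier sketches.

# Adjusted (least squares) treatment means: predict each panel's mean reaction time
# with Years of Experience fixed at its overall mean.  File and column names are
# illustrative assumptions.
import pandas as pd
import statsmodels.formula.api as smf

pilots = pd.read_csv("pilots.csv")
ancova = smf.ols("ReactionTime ~ C(Panel) + YearsExp", data=pilots).fit()

grid = pd.DataFrame({
    "Panel": sorted(pilots["Panel"].unique()),
    "YearsExp": pilots["YearsExp"].mean(),   # covariate held at its overall mean
})
grid["AdjustedMean"] = ancova.predict(grid)
print(grid)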

Grouping Information Using Bonferroni Method and 95.0% Confidence

Panel  N  Mean  Grouping
3      8  6.7   A
1      8  6.7   A
2      8  5.9   B

Means that do not share a letter are significantly different.

Bonferroni 95.0% Simultaneous Confidence Intervals
Response Variable: Reaction Time
All Pairwise Comparisons among Levels of Panel

Panel = 1 subtracted from:
Panel    Lower    Center     Upper
2       -1.333   -0.8534   -0.3738
3       -0.442    0.0302    0.5020

Panel = 2 subtracted from:
Panel    Lower    Center     Upper
3       0.4276    0.8836     1.340

Assessing the Reasonableness of the Normality and Equal Variance Assumptions

MODEL Years of Experience Panel > COVARIATE Years of Experience > GRAPHS > Histogram of Residuals, Normal Plot of Residuals, Residuals versus Fits > OK > OK

Assessing the Reasonableness of the Normality Assumption

[Normal probability plot of residuals and histogram of residuals (response is Reaction Time).]

From the above plots, the normality assumption is reasonable.
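Outside Minitab, the Bonferroni comparisons of adjusted means can be built directly from the fitted coefficients: with a common slope, each difference in adjusted means equals a contrast of the panel dummy coefficients. The sketch below assumes the same data set and column names as before, plus the default patsy parameter order (Intercept, Panel 2, Panel 3, YearsExp); that ordering is an assumption and should be checked against ancova.params.

# Bonferroni 95% simultaneous intervals for pairwise differences in adjusted means.
# With a common slope, these differences equal contrasts of the panel coefficients.
# File name, column names, and the parameter order noted below are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

pilots = pd.read_csv("pilots.csv")
ancova = smf.ols("ReactionTime ~ C(Panel) + YearsExp", data=pilots).fit()

b = ancova.params.to_numpy()        # assumed order: Intercept, Panel[T.2], Panel[T.3], YearsExp
V = ancova.cov_params().to_numpy()

contrasts = {                       # each vector picks off one difference in adjusted means
    "Panel 2 - Panel 1": np.array([0.0, 1.0, 0.0, 0.0]),
    "Panel 3 - Panel 1": np.array([0.0, 0.0, 1.0, 0.0]),
    "Panel 3 - Panel 2": np.array([0.0, -1.0, 1.0, 0.0]),
}
k = len(contrasts)
t_crit = stats.t.ppf(1 - 0.05 / (2 * k), ancova.df_resid)   # Bonferroni critical value

for label, c in contrasts.items():
    diff = c @ b
    se = np.sqrt(c @ V @ c)
    print(f"{label}: {diff:.4f} +/- {t_crit * se:.4f}")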

Assessing the Reasonableness of the Equal Variance Assumption

[Plot of residuals versus fitted values (response is Reaction Time).]

From the above plot, the equal variance assumption is reasonable.

Assessing the Reasonableness of the Equal Slopes Assumption

MODEL Years of Experience Panel Years of Experience*Panel > COVARIATE Years of Experience > OK > OK

Source                     DF  Seq SS  Adj SS  Adj MS      F      P
Panel                       2  2.7900  0.8226  0.4113   3.05  0.072
Years of Experience         1  3.9175  3.098   3.098    22.9  0.000
Panel*Years of Experience   2  0.0015  0.0015  0.0008   0.01  0.994
Error                      18  2.4272  2.4272  0.1348
Total                      23  9.1363

H0: Panel and Years of Experience do not interact (the slopes of the different treatment regression lines are equal)
Ha: Panel and Years of Experience do interact
α = .05, F = 0.01, p-value = .994. Fail to reject H0. The assumption of equal slopes for the different treatment regression lines is reasonable.
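The same equal-slopes check can be run outside Minitab by comparing the common-slope model with a model that adds the Panel-by-covariate interaction; the data set and column names below are the same assumptions as in the earlier sketches.

# Equal-slopes (parallelism) check: compare the common-slope ANCOVA model with a
# model that lets each panel have its own slope, via an F test on the interaction.
# File and column names are illustrative assumptions.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

pilots = pd.read_csv("pilots.csv")

common_slope = smf.ols("ReactionTime ~ C(Panel) + YearsExp", data=pilots).fit()
separate_slopes = smf.ols("ReactionTime ~ C(Panel) * YearsExp", data=pilots).fit()

# A large p-value for this comparison supports the equal-slopes assumption
print(sm.stats.anova_lm(common_slope, separate_slopes))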