Chapter 2 (Continued): Proofs for ANOVA

Proof of the ANOVA Identity

We are going to prove that SST = SSR + SSE. Writing

    Y_i − Ȳ = (Y_i − Ŷ_i) + (Ŷ_i − Ȳ),

squaring both sides and summing over i = 1, …, n, we get

    Σ(Y_i − Ȳ)² = Σ(Y_i − Ŷ_i)² + Σ(Ŷ_i − Ȳ)² + 2 Σ(Y_i − Ŷ_i)(Ŷ_i − Ȳ).

Noting that

    Y_i − Ŷ_i = Y_i − (b_0 + b_1 X_i) = Y_i − (Ȳ − b_1 X̄ + b_1 X_i) = (Y_i − Ȳ) − b_1(X_i − X̄),
    Ŷ_i − Ȳ = (b_0 + b_1 X_i) − Ȳ = (Ȳ − b_1 X̄) + b_1 X_i − Ȳ = b_1(X_i − X̄),

the product term in the above equation can be simplified as

    Σ(Y_i − Ŷ_i)(Ŷ_i − Ȳ) = Σ[(Y_i − Ȳ) − b_1(X_i − X̄)] b_1(X_i − X̄) = b_1 S_xy − b_1² S_xx = b_1[S_xy − b_1 S_xx].

And since b_1 = S_xy/S_xx, we have S_xy − b_1 S_xx = 0, so the right-hand side of the above equation is zero. Therefore,

    Σ(Y_i − Ȳ)² = Σ(Y_i − Ŷ_i)² + Σ(Ŷ_i − Ȳ)²,  i.e.  SST = SSE + SSR.

Expected Mean Squares

We are going to prove that

    E(MSE) = σ²  and  E(MSR) = σ² + β_1² S_xx.

Note that these results allow us to compare MSE and MSR in an average sense when testing H_0: β_1 = 0 against H_1: β_1 ≠ 0. Since E(MSR) > E(MSE) when β_1 ≠ 0, we can create a decision rule: reject H_0 if MSR/MSE is large. This gives the F-test of ANOVA.

To find E{MSE}, we may use the result quoted earlier that

    SSE/σ² ~ χ²(n − 2).

This gives E{SSE/σ²} = n − 2, and hence

    E{MSE} = E{SSE/(n − 2)} = (n − 2)σ²/(n − 2) = σ².

Alternatively, we will prove, without the normality assumption, that

    E{SSTO} = (n − 1)σ² + β_1² S_xx.

Using the expression for E{MSR}, this in turn provides

    E{SSE} = E{SST} − E{SSR} = (n − 1)σ² + β_1² S_xx − σ² − β_1² S_xx = (n − 2)σ²,

which implies that E{MSE} = σ².

Proof of E{SST} = (n − 1)σ² + β_1² S_xx

Using the model Y_i = β_0 + β_1 X_i + ε_i, we can write

    Ȳ = (1/n) Σ(β_0 + β_1 X_i + ε_i) = β_0 + β_1 X̄ + ε̄.

Hence,

    Y_i − Ȳ = β_1(X_i − X̄) + (ε_i − ε̄).

Squaring both sides and summing, we get

    SST = β_1² S_xx + S_εε + 2 β_1 S_xε,

where S_εε = Σ(ε_i − ε̄)² and S_xε = Σ(X_i − X̄)(ε_i − ε̄). Note that since S_εε/(n − 1) is the sample variance of ε_1, …, ε_n, which are i.i.d. with mean zero and variance σ², we have E{S_εε} = (n − 1)σ². For the expectation of the product term we see that

    E{β_1 Σ(X_i − X̄)(ε_i − ε̄)} = β_1 Σ(X_i − X̄) E{ε_i − ε̄} = 0,

since E{ε_i − ε̄} = 0 by the assumption on the errors. This proves that

    E{SST} = (n − 1)σ² + β_1² S_xx.
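As a numerical illustration of the identity (not part of the original proof; the simulated data, seed, and coefficient values below are assumptions chosen only for the example), the following Python sketch fits the least-squares line and confirms that the cross-product term vanishes, so that SST = SSE + SSR.

```python
import numpy as np

# Simulated illustrative data (all values are arbitrary assumptions)
rng = np.random.default_rng(0)
n = 25
X = rng.uniform(60, 75, size=n)             # e.g., heights
Y = -200 + 5.0 * X + rng.normal(0, 10, n)   # e.g., weights

# Least-squares estimates: b1 = Sxy/Sxx, b0 = Ybar - b1*Xbar
Sxx = np.sum((X - X.mean()) ** 2)
Sxy = np.sum((X - X.mean()) * (Y - Y.mean()))
b1 = Sxy / Sxx
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X

# Sums of squares and the cross-product term from the proof
SST = np.sum((Y - Y.mean()) ** 2)
SSE = np.sum((Y - Yhat) ** 2)
SSR = np.sum((Yhat - Y.mean()) ** 2)
cross = np.sum((Y - Yhat) * (Yhat - Y.mean()))

print(f"cross-product term = {cross:.3e}")               # zero up to rounding
print(f"SST = {SST:.3f},  SSE + SSR = {SSE + SSR:.3f}")   # identical
```

The cross-product term is zero (up to floating-point error) precisely because b_1 = S_xy/S_xx, exactly as used in the proof.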

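The expected mean squares can be illustrated by simulation in the same spirit. The sketch below is again only an illustration under assumptions (the fixed design points, β_0, β_1, σ, and the number of replications are arbitrary choices): the averages of MSE and MSR over repeated samples should be close to σ² and σ² + β_1² S_xx, respectively.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta0, beta1, sigma = 30, 2.0, 0.5, 3.0    # assumed parameter values
X = np.linspace(0, 10, n)                     # fixed design points
Sxx = np.sum((X - X.mean()) ** 2)

reps = 10_000
mse_vals, msr_vals = [], []
for _ in range(reps):
    Y = beta0 + beta1 * X + rng.normal(0, sigma, n)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    Yhat = b0 + b1 * X
    mse_vals.append(np.sum((Y - Yhat) ** 2) / (n - 2))   # MSE = SSE/(n-2)
    msr_vals.append(np.sum((Yhat - Y.mean()) ** 2))      # MSR = SSR/1

print(f"average MSE = {np.mean(mse_vals):.3f}  vs  sigma^2              = {sigma**2:.3f}")
print(f"average MSR = {np.mean(msr_vals):.3f}  vs  sigma^2 + b1^2 * Sxx = {sigma**2 + beta1**2 * Sxx:.3f}")
```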
Proof of E{SSR} = σ² + β_1² S_xx

For this result we note that Ŷ_i − Ȳ = b_1(X_i − X̄), hence

    SSR = Σ(Ŷ_i − Ȳ)² = b_1² Σ(X_i − X̄)² = b_1² S_xx.

Therefore E{SSR} = S_xx E{b_1²}. To evaluate E{b_1²}, use the formula

    Var(b_1) = E{b_1²} − (E{b_1})²,

which gives

    E{b_1²} = Var(b_1) + (E{b_1})².

Using the sampling properties of b_1, namely E{b_1} = β_1 and Var(b_1) = σ²/S_xx, we obtain from the above equation

    E{b_1²} = σ²/S_xx + β_1²,

hence

    E{SSR} = S_xx[σ²/S_xx + β_1²] = σ² + β_1² S_xx.

Equivalence of t and F for H_0: β_1 = 0 vs. H_1: β_1 ≠ 0

The test statistic t is given by

    t = b_1 / s{b_1}.

Using the formula s²{b_1} = MSE/S_xx, we find that

    t² = b_1² S_xx / MSE.

The numerator b_1² S_xx may be recognized to be SSR = MSR. Hence

    t² = MSR/MSE,

which is the usual F, the ANOVA F-test statistic. Since t²(ν) follows the F(1, ν) distribution, the critical region |t| > t(1 − α/2; n − 2) is equivalent to

    F > t²(1 − α/2; n − 2) = F(1 − α; 1, n − 2).

2.8 General Linear Test Approach

This approach is based on the fact that under restrictions on the model the sum of squared errors is generally larger than that without any restriction (because the SSE without any restrictions is the absolute minimum). The difference between these sums of squared errors is used to propose a test statistic for the hypothesis imposing the restrictions on the model. The model without any hypothesis is known as the full model, and the model under the hypothesis is called the reduced model. Let SSE(F) and SSE(R) denote the sums of squared errors under these two models. Since

    SSE(F) = Σ(Y_i − Ŷ_i)² = min Σ(Y_i − Ỹ_i)²  over any linear prediction Ỹ_i,

we have SSE(F) ≤ SSE(R). Under departures from H_0, the difference SSE(R) − SSE(F) is expected to be significantly large. Hence we may create a test statistic as

    F = { [SSE(R) − SSE(F)] / (df_R − df_F) } / { SSE(F) / df_F }.

This test statistic follows an F(df_R − df_F, df_F) distribution under the null hypothesis, when the errors are assumed normally distributed. Hence the decision rule to reject the null hypothesis is given by

    F > F(1 − α; df_R − df_F, df_F).

Testing H_0: β_1 = 0

In this case, SSE(F) = SSE. The reduced model becomes Y_i = β_0 + ε_i, in which case the least-squares estimator of β_0 becomes b_0(R) = Ȳ and Ŷ_i(R) = Ȳ; hence

    SSE(R) = Σ(Y_i − Ŷ_i(R))² = SST,

and the test statistic becomes

    F = { (SST − SSE) / [(n − 1) − (n − 2)] } / { SSE/(n − 2) } = MSR/MSE,

the usual F statistic.
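The reduction of the general linear test to the usual ANOVA F statistic, and the equivalence t² = F, can be verified numerically. The following sketch is illustrative only (the data are simulated, and the parameter choices and the use of scipy for the reference distributions are assumptions): it fits the full and reduced models by least squares, forms the general linear test statistic, and checks that it equals both MSR/MSE and the square of the t statistic for b_1.

```python
import numpy as np
from scipy import stats   # used only for the F and t reference distributions

rng = np.random.default_rng(2)
n = 24
X = rng.uniform(0, 10, n)
Y = 1.0 + 0.8 * X + rng.normal(0, 2, n)    # simulated data (assumed model)

# Full model: Y = beta0 + beta1*X + eps
Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
SSE_F, df_F = np.sum((Y - (b0 + b1 * X)) ** 2), n - 2

# Reduced model under H0: beta1 = 0, so the fitted value is Ybar and SSE(R) = SST
SSE_R, df_R = np.sum((Y - Y.mean()) ** 2), n - 1

# General linear test statistic
F = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)

# Equivalent t statistic for b1
MSE = SSE_F / df_F
t = b1 / np.sqrt(MSE / Sxx)

print(f"F = {F:.4f},  t^2 = {t**2:.4f}")                       # identical
print(f"p-value from F = {stats.f.sf(F, df_R - df_F, df_F):.4g}")
print(f"p-value from t = {2 * stats.t.sf(abs(t), df_F):.4g}")  # same p-value
```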

2.9 Descriptive Measures of Association

The goodness of fit of the line can be measured by the amount of the total variation that is attributed to the regression. For example, if SSR = SST then SSE = 0 and all the predicted values fall on the LS line. In this case we can say that the regression explains 100% of the variation in the Y_i's. SSR is termed the Explained Variation and SSE the Unexplained Variation.

Coefficient of Determination

It is defined as the ratio of the Explained Variation to the Total Variation,

    r² = SSR / SST.

Since the denominator SST = SSR + SSE ≥ SSR and all the sums of squares are non-negative,

    0 ≤ r² ≤ 1.

1. When all the values fall on the regression line, SSE = 0 and r² = 1. Hence the predictor variable accounts for all the variability.

2. When b_1 = 0, the predictor variable drops out from the model and we have SSE = SST, i.e. SSR = 0 and r² = 0. The variable X then does not play any role in explaining the variation in the Y's.

3. The above two cases are extreme. A value of r² closer to 1 is regarded as indicating a good fit. Usually r² is reported as a percentage; in multiple regression the analogous quantity R² is called the coefficient of multiple determination.

The Correlation Coefficient

It is defined by

    r = ±√r²,

where the positive sign corresponds to a positive slope (b_1 > 0) and the negative sign to a negative slope (b_1 < 0). It is clear that

    −1 ≤ r ≤ 1.

A computational formula is given by

    r = b_1 √(S_xx / S_yy).

The correlation coefficient is used more for describing the joint association between X and Y. And since r² < |r| whenever 0 < |r| < 1, r may give an impression of a closer relationship than r².

Example 2.9.1

The coefficient of determination for the height–weight data from the ANOVA table is

    r² = SSR / SSTO = 2930.8 / 5260.3 = .5570, i.e. 55.7%,

and r = +√.5570 = .7463, since b_1 > 0.

Adjusted R²

Since SSR and SSE carry different degrees of freedom, their ratio adjusted for degrees of freedom may be more appropriate as a measure of goodness of fit; it is given by

    Adjusted R² = 1 − [SSE/(n − 2)] / [SSTO/(n − 1)] = 1 − [(n − 1)/(n − 2)] · SSE/SSTO.

For the previous example,

    Adj. R² = 1 − (23/22) · (2329.5/5260.3) = .5359,

which tallies with the value in the ANOVA table.
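As a small worked check of these formulas, the sketch below recomputes the descriptive measures from the sums of squares quoted in Example 2.9.1 (SSR = 2930.8, SSE = 2329.5, SSTO = 5260.3); n = 24 is an assumption implied by the error degrees of freedom n − 2 = 22, and small differences from the rounded values quoted above are due only to rounding.

```python
# Descriptive measures of association from the ANOVA sums of squares
# (sums of squares taken from Example 2.9.1; n = 24 is assumed from df = 22)
SSR, SSE = 2930.8, 2329.5
SSTO = SSR + SSE        # 5260.3
n = 24

r2 = SSR / SSTO                                    # coefficient of determination
r = r2 ** 0.5                                      # + sign because b1 > 0
adj_r2 = 1 - (SSE / (n - 2)) / (SSTO / (n - 1))    # adjusted for degrees of freedom

print(f"r^2      = {r2:.4f}  ({100 * r2:.1f}%)")
print(f"r        = {r:.4f}")
print(f"Adj. R^2 = {adj_r2:.4f}")
```

Since (n − 1)/(n − 2) > 1, the adjusted R² is always somewhat smaller than r².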

Chapter 3. Diagnostics and Remedial Measures

Diagnostic Tools

Diagnostic tools are used to check for any irregularities in the data. Graphical techniques are visual aids for locating patterns in the data and for identifying any extreme or unusual observations.

3.1 Diagnostics for the Predictor Variable (X)

Dot Plots

These are basically frequency plots. Dots are placed above the variable line at the values taken by the variable, and dots are stacked on top of each other if a value is repeated in the data. These plots display the dispersion of the variable; it is desirable that the data be evenly dispersed. Figure 3.1 below represents the dot plot for the heights in the HtWt data. It shows that the data are evenly distributed and there are no outlying observations. It can be obtained using the Graph/Dotplot menu in MINITAB.

Sequence Plots

A sequence plot is the plot of each observation against its position in the data. Such plots are useful when the data are observed as a sequence in time. The points are connected to show the time sequence more effectively, and the plot can depict a time trend or some other pattern. Figure 3.2 gives a sequence plot for the height data, but in this case it does not have much meaning since the order of the data is arbitrary. Such a plot can be obtained using the Graph/Time Series Plot menu in MINITAB.

Stem-and-Leaf Plot

This plot is an alternative way to display the data. The main column, called the stem, generally displays the first n − 1 digits of the n-digit numbers in the data. The data are displayed by listing the last digit of each observation (called the leaf) beside the proper stem. To the left of the stem column may appear the frequency of the branch, meaning the number of observations in the corresponding row; the frequency of the branch containing the median is written in parentheses. The plot basically resembles a histogram displayed sideways and may bring out the symmetry or asymmetry of the distribution. [Note that symmetry of the distribution is preferred.] Figure 3.3 displays the distribution of the heights; the distribution is concentrated more towards the larger values. It can be obtained using the Graph/Stem-and-Leaf menu in MINITAB.

Box-and-Whisker Plot

This plot gives a box with the top boundary at the third quartile, the bottom boundary at the first quartile, and a line in the middle of the box signifying the median. Two lines protrude from the bottom and the top, giving the minimum and the maximum. This is known as the five-number summary of the data. The median being in the centre of the box signifies symmetry of the data, while any long whisker signifies an outlying observation. Figure 3.4 gives the box-and-whisker plot of the height data and indicates that the asymmetry noted in the stem-and-leaf plot is not severe. It can be obtained using the Graph/Boxplot menu in MINITAB.
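The displays above are described in terms of MINITAB menus; equivalent plots can be produced in other software. The following sketch is one possible rendering (assumed: Python with matplotlib, and a small made-up vector of heights standing in for the HtWt data) of the dot plot, sequence plot, and box-and-whisker plot.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data: a small vector of "heights" (made up for illustration only)
height = np.array([62, 64, 64, 65, 66, 66, 67, 67, 67, 68,
                   68, 69, 69, 69, 70, 70, 71, 71, 72, 73])

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Dot plot: stack one dot per occurrence of each repeated value
vals, counts = np.unique(height, return_counts=True)
for v, c in zip(vals, counts):
    axes[0].plot([v] * c, np.arange(1, c + 1), "o", color="black")
axes[0].set_title("Dot plot of height")
axes[0].set_xlabel("Height")
axes[0].set_yticks([])

# Sequence plot: observation against its position in the data
axes[1].plot(np.arange(1, len(height) + 1), height, marker="o")
axes[1].set_title("Sequence plot")
axes[1].set_xlabel("Index")
axes[1].set_ylabel("Height")

# Box-and-whisker plot: five-number summary
axes[2].boxplot(height)
axes[2].set_title("Box-and-whisker plot")
axes[2].set_ylabel("Height")

plt.tight_layout()
plt.show()
```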

3. Nonindependence The residuals in general are not independent as they are subject to the linear constraints (i) e i 0 (ii) X ie i 0. The dependency could, however, be ignored when n is large Semistudentized Residuals It may be helpful to stardize the residuals for residual analysis. The following form of stardization is useful: e i e i ē MSE These are known as Semistudentized Residuals because ei they are approximation to the stardized residual s.d.{ei}. Since the s.d.{e i } is complex varies for each X i, MSE is only an approximation to this stard deviation. Departures to be Studied from residuals 1. The regression function is not linear. 2. The variance of the error terms is not constant. 3. The error terms are not independent. 4. The model fits all but a few outlier observations. 5. The error terms are not normally distributed. 6. One or several important predictor variables are absent from the model. 17
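A short sketch of how such residual checks might be carried out follows; it is again an illustration under assumptions (simulated data and Python/matplotlib, with √MSE used as the common scale). It verifies the two linear constraints on the residuals, computes the semistudentized residuals, and plots them against the fitted values, the basic display used to look for the departures listed above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
n = 40
X = rng.uniform(0, 10, n)
Y = 3.0 + 1.5 * X + rng.normal(0, 2, n)   # simulated data (assumed model)

# Least-squares fit and residuals e_i = Y_i - Yhat_i
Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X
e = Y - Yhat

# Linear constraints on the residuals: sum e_i = 0 and sum X_i e_i = 0
print(f"sum e_i = {e.sum():.2e},  sum X_i e_i = {np.sum(X * e):.2e}")

# Semistudentized residuals: e* = e / sqrt(MSE), since e-bar = 0
MSE = np.sum(e ** 2) / (n - 2)
e_star = e / np.sqrt(MSE)

# Plot against fitted values to look for nonlinearity, nonconstant
# variance, and outliers (values beyond about +/- 3 are suspicious)
plt.scatter(Yhat, e_star)
plt.axhline(0, color="grey", linewidth=1)
plt.xlabel("Fitted value")
plt.ylabel("Semistudentized residual")
plt.title("Residuals vs. fitted values")
plt.show()
```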