Analysis of variance approach to regression

If x is useless, i.e. β₁ = 0, then E(Y_i) = β₀. In this case β₀ is estimated by Ȳ. The ith deviation about this grand mean can be written

Y_i − Ȳ = (Ŷ_i − Ȳ) + (Y_i − Ŷ_i),
(deviation about grand mean) = (explained by model) + (slop left over).

Our regression line explains how Y varies with x. We are interested in how much of the variability in Y_1, ..., Y_n is soaked up by the regression model. This can be represented mathematically by partitioning the total sum of squares (SSTO):

SSTO = Σ_{i=1}^n (Y_i − Ȳ)².

SSTO is a measure of the total (sample) variation of Y, ignoring x. Note: SSTO = (n − 1)S²_Y.

The sum of squares explained by the regression line is

SSR = Σ_{i=1}^n (Ŷ_i − Ȳ)².

The sum of squared errors measures how much Y varies around the regression line:

SSE = Σ_{i=1}^n (Y_i − Ŷ_i)².

It happily turns out that SSR + SSE = SSTO.
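As a small illustration, here is an R sketch that computes the three sums of squares from a fitted lm() and checks the identity SSR + SSE = SSTO. The data are simulated (not the Toluca data), so the numbers are illustrative only; later sketches reuse the same x, y, and fit objects.

```r
# Simulated data loosely resembling a lot-size/work-hours setting (purely illustrative)
set.seed(1)
x <- runif(25, 20, 120)
y <- 60 + 3 * x + rnorm(25, sd = 50)

fit  <- lm(y ~ x)
SSTO <- sum((y - mean(y))^2)            # total sum of squares, equals (n - 1) * var(y)
SSR  <- sum((fitted(fit) - mean(y))^2)  # variation explained by the regression line
SSE  <- sum(resid(fit)^2)               # left-over (error) sum of squares

c(SSTO = SSTO, SSR.plus.SSE = SSR + SSE)  # the two numbers agree
```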

An aside...

Let Y = (Y_1, Y_2, ..., Y_n)′, Ŷ = (b₀ + b₁x_1, b₀ + b₁x_2, ..., b₀ + b₁x_n)′, and 1_n = (1, 1, ..., 1)′.

Certainly this is always true:

(Ŷ − 1_n Ȳ) + (Y − Ŷ) = (Y − 1_n Ȳ).

This is just a vector version of the first expression on the first slide.

The Pythagorean theorem gives us the decomposition of the total sum of squares. Since Ŷ − 1_n Ȳ is orthogonal to Y − Ŷ (shown later), we have

‖Ŷ − 1_n Ȳ‖² + ‖Y − Ŷ‖² = ‖Y − 1_n Ȳ‖²,

that is, SSR + SSE = SSTO. It is worthwhile to verify that SSR = ‖Ŷ − 1_n Ȳ‖², etc.

Recall that for any vector v = (v_1, ..., v_n)′ the length of the vector is ‖v‖ = √(v_1² + ⋯ + v_n²) (thanks again, Pythagoras).

(end of aside)
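Continuing the same sketch (same simulated x, y, and fit), a quick numerical check of the orthogonality behind the Pythagorean decomposition:

```r
yhat <- fitted(fit)
ybar <- mean(y)

sum((yhat - ybar) * (y - yhat))           # inner product of (Yhat - 1 ybar) and (Y - Yhat): ~ 0
sum((yhat - ybar)^2) + sum((y - yhat)^2)  # ||Yhat - 1 ybar||^2 + ||Y - Yhat||^2 ...
sum((y - ybar)^2)                         # ... equals ||Y - 1 ybar||^2 = SSTO
```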

Restated: The variation in the data (SSTO) can be divided into two parts: the part explained by the model (SSR), and the slop that's left over, i.e. unexplained variability (SSE). Associated with each sum of squares are their degrees of freedom (df) and mean squares, forming a nice table:

Source       SS                                df      MS                 E(MS)
Regression   SSR  = Σ_{i=1}^n (Ŷ_i − Ȳ)²       1       MSR = SSR/1        σ² + β₁² Σ_{i=1}^n (x_i − x̄)²
Error        SSE  = Σ_{i=1}^n (Y_i − Ŷ_i)²     n − 2   MSE = SSE/(n − 2)  σ²
Total        SSTO = Σ_{i=1}^n (Y_i − Ȳ)²       n − 1
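In R (continuing the sketch), anova() on the fitted model prints essentially this table, with the expected mean squares omitted:

```r
anova(fit)   # rows: x (regression, df = 1) and Residuals (error, df = n - 2); MS = SS / df
```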

Note: E(MSR) > E(MSE) ⟺ β₁ ≠ 0. Loosely, we expect MSR to be larger than MSE when β₁ ≠ 0. So testing whether the simple linear regression model explains a significant amount of the variation in Y is equivalent to testing H₀: β₁ = 0 versus H_a: β₁ ≠ 0.

Consider the ratio MSR/MSE. If H₀: β₁ = 0 is true, then this should be near one. In fact

F* = MSR/MSE ~ F_{1, n−2}  when H₀: β₁ = 0 is true.

So E(F*) = (n − 2)/(n − 4), which goes to one as n → ∞ (when β₁ = 0).

This leads to an F-test of H₀: β₁ = 0 versus H_a: β₁ ≠ 0 using F* = MSR/MSE:

If F* > F_{1, n−2}(1 − α), then reject H₀: β₁ = 0 at significance level α.

Note: F* = (t*)², and the F-test is completely equivalent (is exactly the same) to the Wald t-test based on t* = b₁/se(b₁) for H₀: β₁ = 0.

SAS Example...
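A sketch of this equivalence in R, again on the simulated data: the overall F* statistic equals the square of the t statistic for b₁, and the α = 0.05 critical value comes from the F(1, n − 2) distribution.

```r
a     <- anova(fit)
Fstar <- a["x", "Mean Sq"] / a["Residuals", "Mean Sq"]  # F* = MSR / MSE
tstar <- coef(summary(fit))["x", "t value"]             # t* = b1 / se(b1)

c(Fstar = Fstar, tstar.squared = tstar^2)               # identical
qf(0.95, df1 = 1, df2 = df.residual(fit))               # F_{1, n-2}(1 - alpha) for alpha = 0.05
```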

General linear test

Note that if H₀: β₁ = 0 holds, our reduced model is Y_i = β₀ + ε_i. It can be shown that the least-squares estimate of β₀ in this reduced model is β̂₀ = Ȳ. Thus the SSE for the reduced model is

SSE(R) = Σ_{i=1}^n (Y_i − Ȳ)²,

which is the SSTO from the full model.

Note that SSE(R) can never be less than SSE(F), the sum of squared errors from the full model. Including a predictor can never explain less variation in Y, only as much or more. So... SSE(R) ≥ SSE(F).

If SSE(R) is only a little more than SSE(F), the predictor is not helping much (and so the reduced model may be adequate). We can generally test this with an F-test:

F* = [ (SSE(R) − SSE(F)) / (df_R − df_F) ] / [ SSE(F) / df_F ],

and reject H₀: reduced model holds if F* > F_{df_R − df_F, df_F}(1 − α).

This full model / reduced model idea and test will be used often in complex regression models with multiple predictors.
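In R, the general linear (full versus reduced model) test is what anova() computes when given two nested fits; continuing the sketch:

```r
fit_reduced <- lm(y ~ 1)      # reduced model: Y_i = beta_0 + eps_i, so b0 = Ybar
fit_full    <- fit            # full model:    Y_i = beta_0 + beta_1 x_i + eps_i
anova(fit_reduced, fit_full)  # F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]
```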

R² and r

The coefficient of determination is

R² = SSR/SSTO = 1 − SSE/SSTO,

the proportion of the total sample variation in Y that is explained by its linear relationship with x.

Note: 0 ≤ R² ≤ 1. R² = 1 ⟺ the data are perfectly linear. R² = 0 ⟺ the regression line is horizontal (b₁ = 0). The closer R² is to one, the stronger the linear relationship between x and Y.
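Continuing the sketch, R² computed both ways from the sums of squares matches the value reported by summary():

```r
c(SSR / SSTO, 1 - SSE / SSTO, summary(fit)$r.squared)  # all three agree
```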

[Figure: four scatterplots of y versus x illustrating R² = 0.9985658, R² = 0.7179004, R² = 0.9383456, and R² = 0.]

Note: Let

r = corr(x, Y) = Σ_{i=1}^n (x_i − x̄)(Y_i − Ȳ) / √[ Σ_{i=1}^n (x_i − x̄)² · Σ_{i=1}^n (Y_i − Ȳ)² ]

be the sample correlation between x and Y. Then R² = r². So R = √(SSR/SSTO) is equal to |r|.

Note: b₁ > 0 ⟺ r > 0 and b₁ < 0 ⟺ r < 0. So r = √(R²) · (b₁/|b₁|).

As usual:
r near 0: little linear association between x and Y
r near 1: strong positive, linear association between x and Y
r near −1: strong negative, linear association between x and Y
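A quick check of these relations in R (same simulated data):

```r
r  <- cor(x, y)           # sample correlation
b1 <- coef(fit)["x"]      # fitted slope
c(r.squared = r^2, R.squared = summary(fit)$r.squared)   # R^2 = r^2
c(r = unname(r),
  sign.relation = unname(sign(b1) * sqrt(summary(fit)$r.squared)))  # r = sqrt(R^2) * b1/|b1|
```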

Cautions about R² and r

R² could be close to one, but the E(Y_i) may not lie on a line. (Why? Which plot?)
R² may not be close to one, but a line may still be best for E(Y_i). (Why? Which plot?)
R² could be essentially zero, but x and Y could be highly related. (Why? Which plot?)

Example: Toluca data...

Correlation models

In the regression model:
- The x values are assumed to be known constants, and
- We generally want to predict Y from x.

If we simply have two continuous variables X and Y, with neither being a natural response/predictor, a correlation model can be used.

Example: For the Toluca data, say we are interested simply in determining whether lot size and work hours are linearly related.

If appropriate, we could assume that X and Y have a bivariate normal distribution with parameters µ_x, µ_y, σ_x, σ_y, and ρ. Then

(X_i, Y_i)′ ~ N₂(µ, Σ),  where µ = (µ_x, µ_y)′ and Σ = [ σ_x²  σ_xσ_yρ ; σ_xσ_yρ  σ_y² ].

Investigation of linear association between X and Y is done through inferences about ρ = corr(X_i, Y_i). A point estimator of ρ is r as defined a few slides back, the sample correlation. r is the MLE under normality, but also a consistent moment-based estimator in general (not assuming normality).

Testing H₀: ρ = 0 is equivalent to testing H₀: β₁ = 0 in the regression of Y on x.

A large-sample CI for ρ uses Fisher's z-transformation:

z′ = 0.5 log( (1 + r) / (1 − r) ).

In large samples, a (1 − α)100% CI for ζ = E(z′) is

z′ ± z(1 − α/2) √(1/(n − 3)).

Then use Table B.8 in the back of your text to back-transform the endpoints to get a CI for ρ. Here, z(1 − α/2) = Φ⁻¹(1 − α/2).
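A sketch of this interval in R, back-transforming with tanh rather than Table B.8 (tanh inverts the z-transformation); it continues the simulated data above and agrees with R's built-in cor.test():

```r
n      <- length(x)
r      <- cor(x, y)
zprime <- 0.5 * log((1 + r) / (1 - r))       # Fisher's z-transformation of r
half   <- qnorm(0.975) * sqrt(1 / (n - 3))   # z(1 - alpha/2) * sqrt(1 / (n - 3)), alpha = 0.05
tanh(c(zprime - half, zprime + half))        # back-transformed 95% CI for rho
cor.test(x, y)$conf.int                      # built-in check: same interval
```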

The Spearman rank correlation coefficient replaces the X values with their ranks, replaces the Y values with their ranks, and then carries out a (standard Pearson, described on the last slide) correlation analysis on the ranks.

Example: Toluca data in R.

Pearson (P) versus Spearman (S). The last plot takes the log of each Y in the second-to-last plot. What happens to the Spearman correlation?

[Figure: four scatterplots of y versus x with (P = 0.837, S = 0.952), (P = 0.936, S = 1), (P = 0.575, S = 0.7), and (P = 0.661, S = 0.7).]
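A sketch in R of the rank idea on the simulated data: Spearman's coefficient is just Pearson's r computed on the ranks, and it is unchanged by monotone transformations of Y (such as the log transformation in the figure):

```r
c(pearson  = cor(x, y),
  spearman = cor(x, y, method = "spearman"),
  by.hand  = cor(rank(x), rank(y)))          # last two agree
```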

Cautions about regression

When predicting future values, the conditions affecting Y and x should remain similar for the prediction to be trustworthy.

Beware of extrapolation: predicting Y_h for x_h far outside the range of x in the data. The relationship may not hold outside of the observed x-values.

Concluding that x and Y are linearly related (that β₁ ≠ 0) does not imply a cause-and-effect relationship between x and Y.

Beware of making multiple predictions or inferences simultaneously unless using an appropriate procedure (e.g. Scheffé's method). One needs to consider both the individual Type I error rate and the family error rate.

The least squares estimates are not unbiased if x is measured with error; in fact, the coefficients are biased towards zero. Slightly more advanced techniques are needed (see Section 4.2, p. 172).

We have not discussed model checking and diagnostics. These will come next, when we start adding more predictors to the model. For simple linear regression, in most cases a scatterplot tells us all we need to know about (i) a linear mean and (ii) homoscedastic (constant variance) errors. (iii) A QQ plot of the residuals e_1, ..., e_n can be examined to assess normality.
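For completeness, the basic graphical checks in R on the simulated fit:

```r
plot(x, y); abline(fit)                  # (i) linear mean and (ii) roughly constant spread
qqnorm(resid(fit)); qqline(resid(fit))   # (iii) QQ plot of the residuals e_1, ..., e_n
```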