Measuring the fit of the model - SSR


Once we've determined our estimated regression line, we'd like to know how well the model fits. How far/close are the observations to the fitted line? One way to do this is to take some measure of how big the errors/residuals are, where the residuals are given by e_i = y_i - ŷ_i. This measure is called the sum of squares due to error, SSE = Σ_{i=1}^n (y_i - ŷ_i)²; it is the quantity which the least squares procedure attempts to minimize. Note that this quantity depends on the units in which the dependent variable is measured.
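As a quick illustration (not from the notes), here is a minimal Python sketch with made-up data that fits the least squares line by hand and computes the residuals and SSE:

    import numpy as np

    # Hypothetical data, invented purely for illustration.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3, 13.8, 16.1])

    # Least squares estimates of slope and intercept.
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0 = y.mean() - b1 * x.mean()

    y_hat = b0 + b1 * x   # fitted values
    e = y - y_hat         # residuals e_i = y_i - y_hat_i
    sse = np.sum(e**2)    # sum of squares due to error
    print(f"SSE = {sse:.4f}")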

Measuring the fit of the model - R²

Another way to measure the fit of the model is to look at the proportion of the total variability in the dependent variable that can be explained by the independent variable. We can measure the total variability in the dependent variable using the total sum of squares, SST = Σ_{i=1}^n (y_i - ȳ)². We can measure the variability in the dependent variable that can be explained by the independent variable using the regression sum of squares, SSR = Σ_{i=1}^n (ŷ_i - ȳ)². This means that the proportion of total variability in the dependent variable that can be explained by the independent variable is SSR/SST. This quantity is called the coefficient of determination and is denoted R². Since SST = SSR + SSE, R² can also be written as (SST - SSE)/SST = 1 - SSE/SST.
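A short Python sketch (again with made-up data) that computes SST, SSR, and SSE, checks the decomposition SST = SSR + SSE, and evaluates both forms of R²:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3, 13.8, 16.1])

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x

    sst = np.sum((y - y.mean())**2)      # total sum of squares
    ssr = np.sum((y_hat - y.mean())**2)  # regression sum of squares
    sse = np.sum((y - y_hat)**2)         # error sum of squares

    print(np.isclose(sst, ssr + sse))    # SST = SSR + SSE
    print("R^2 =", ssr / sst, "=", 1 - sse / sst)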

Sample correlation

We'd like to have a way to estimate the true correlation, ρ, using the data. This is the sample correlation r, given by: r = Σ_{i=1}^n (x_i - x̄)(y_i - ȳ) / √(Σ_{i=1}^n (x_i - x̄)² · Σ_{i=1}^n (y_i - ȳ)²). This can be re-expressed in terms that we have used before. Remember that we can write β̂_1 as S_xy/S_xx. This yields r = S_xy/√(S_xx S_yy) = β̂_1 √(S_xx/S_yy). This relationship also means that we can write the regression equation given r, S_xx, S_yy, and the sample means of x and y. We know β̂_1 = r √(S_yy/S_xx). In the case of simple linear regression (one independent variable), the coefficient of determination R² = r².
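A Python sketch (made-up data, as before) that computes r and numerically checks the two identities on this slide, r = β̂_1 √(S_xx/S_yy) and r² = R²:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3, 13.8, 16.1])

    sxx = np.sum((x - x.mean())**2)
    syy = np.sum((y - y.mean())**2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))

    r = sxy / np.sqrt(sxx * syy)     # sample correlation
    b1 = sxy / sxx                   # least squares slope

    print(np.isclose(r, b1 * np.sqrt(sxx / syy)))   # r = b1 * sqrt(Sxx/Syy)

    # r^2 equals the coefficient of determination (SST = Syy here).
    b0 = y.mean() - b1 * x.mean()
    sse = np.sum((y - (b0 + b1 * x))**2)
    print(np.isclose(r**2, 1 - sse / syy))          # r^2 = R^2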

Inferences concerning the linear model parameters

The least squares estimates β̂_0 and β̂_1 obtained using our sample are only estimates of β_0 and β_1. How good are these estimators? What are their means, variances, etc.? How can we make a confidence interval/hypothesis test for these parameters?

Sampling distribution for slope estimate, β̂_1

E(β̂_1) = β_1, so β̂_1 is unbiased. Var(β̂_1) = σ²/S_xx, where S_xx = Σ_{i=1}^n (x_i - x̄)² and σ² = Var(Y) = Var(ε). The distribution of β̂_1 depends on the distribution of the error term ε: it is normally distributed if ε is normally distributed. We will generally be looking at models in which we assume that ε is normally distributed.
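A small Monte Carlo sketch in Python (parameters and design chosen arbitrarily for illustration) that checks unbiasedness and the variance formula for β̂_1 empirically:

    import numpy as np

    rng = np.random.default_rng(0)
    beta0, beta1, sigma = 1.0, 2.0, 0.5     # assumed "true" parameters
    x = np.linspace(0.0, 10.0, 20)          # fixed design points
    sxx = np.sum((x - x.mean())**2)

    b1_draws = np.empty(10_000)
    for i in range(b1_draws.size):
        y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
        b1_draws[i] = np.sum((x - x.mean()) * (y - y.mean())) / sxx

    print(b1_draws.mean())                  # close to beta1 = 2 (unbiased)
    print(b1_draws.var(), sigma**2 / sxx)   # empirical vs. theoretical variance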

Sampling distribution for intercept estimate, β̂_0

E(β̂_0) = β_0, so β̂_0 is unbiased. Var(β̂_0) = σ² Σ_{i=1}^n x_i² / (n S_xx), which can equivalently be written σ² (1/n + x̄²/S_xx), where S_xx = Σ_{i=1}^n (x_i - x̄)². The distribution of β̂_0 also depends on the distribution of the error term ε. It is normally distributed if ε is normally distributed.
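The same Monte Carlo check works for β̂_0; a sketch under the same arbitrary parameter assumptions as above:

    import numpy as np

    rng = np.random.default_rng(1)
    beta0, beta1, sigma = 1.0, 2.0, 0.5
    x = np.linspace(0.0, 10.0, 20)
    sxx = np.sum((x - x.mean())**2)

    b0_draws = np.empty(10_000)
    for i in range(b0_draws.size):
        y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
        b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
        b0_draws[i] = y.mean() - b1 * x.mean()

    print(b0_draws.mean())   # close to beta0 = 1 (unbiased)
    # empirical vs. theoretical variance sigma^2 * (1/n + xbar^2/Sxx)
    print(b0_draws.var(), sigma**2 * (1 / x.size + x.mean()**2 / sxx))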

Estimator for σ²

We rarely know σ², so we will need to estimate it based on the data. Since σ² represents the variance of the Y_i's around the line β_0 + β_1 X_i, it makes sense to estimate it using some function of the distances between the data points and the fitted line. This (unbiased) estimator for σ² is s² = SSE/(n - 2), where SSE = Σ_{i=1}^n (y_i - ŷ_i)². Given that β̂_0 and β̂_1 are normally distributed (with known σ²), when we substitute our estimate s² for σ², the standardized estimators have t distributions with n - 2 degrees of freedom. Knowledge of the sampling distributions of these statistics enables us to conduct hypothesis tests and form confidence intervals.
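A minimal Python sketch (same made-up data as earlier) computing s² = SSE/(n - 2):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3, 13.8, 16.1])
    n = x.size

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0 = y.mean() - b1 * x.mean()

    sse = np.sum((y - (b0 + b1 * x))**2)
    s2 = sse / (n - 2)      # two df lost to estimating beta0 and beta1
    s = np.sqrt(s2)         # estimate of sigma
    print(f"s^2 = {s2:.4f}, s = {s:.4f}")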

Hypothesis tests/CIs for coefficients

After fitting a linear model, we might ask whether there is sufficient evidence to conclude that the x variable is a useful predictor of the y variable. This is a hypothesis test with H_0: β_1 = 0 and H_A: β_1 ≠ 0. We can conduct the test as usual, formulating the test statistic as: T_{n-2} = (β̂_1 - 0) / (s/√S_xx). Of course, we can also use the same methodology to test hypotheses which involve another value of β_1 (instead of 0) or to test hypotheses involving β_0. Using the information about the sampling distributions of β̂_0 and β̂_1, we can form confidence intervals for these parameters. To find a (1 - α)100% confidence interval: β̂_i ± t_{α/2} SE(β̂_i).
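A Python sketch of the slope test and a 95% confidence interval, using SciPy's t distribution and the same made-up data (illustrative only):

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3, 13.8, 16.1])
    n = x.size

    sxx = np.sum((x - x.mean())**2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    s = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))

    se_b1 = s / np.sqrt(sxx)                        # SE of slope estimate
    t_stat = (b1 - 0.0) / se_b1                     # test of H0: beta1 = 0
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2) # two-sided p-value

    t_crit = stats.t.ppf(0.975, df=n - 2)           # 95% two-sided CI
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    print(t_stat, p_value, ci)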

Are the assumptions of the model met?

Suppose we use least squares to obtain an estimated regression line. In order to make inferences concerning the parameters β_0 and β_1, we need to make assumptions about the distribution/correlation of the residuals. One way to examine the truthfulness of the assumptions is to look at a scatter plot of the residuals (e = y - ŷ) vs. the fitted values (ŷ). They should form a cloud (no patterns), symmetric about 0, with fairly even variation of the residuals over the range of fitted values.
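A sketch of such a diagnostic plot in Python, assuming matplotlib is available (data made up, as before):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3, 13.8, 16.1])

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x     # fitted values
    e = y - y_hat           # residuals

    plt.scatter(y_hat, e)           # look for a patternless cloud
    plt.axhline(0.0, linestyle="--")
    plt.xlabel("fitted values")
    plt.ylabel("residuals")
    plt.title("Residuals vs. fitted values")
    plt.show()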

Confidence interval for E(Y)

Remember that our regression line is just an estimate for the expected value of the Y variable. This means we're estimating E(Y) = β_0 + β_1 x with β̂_0 + β̂_1 x, where x is just the value of X for which we want to estimate E(Y). We know that since β̂_0 and β̂_1 are unbiased estimators, the quantity β̂_0 + β̂_1 x is an unbiased estimator for E(Y). The standard error for our estimate is fairly complicated to derive (see pp. 502-4), but it can be shown to be s √(1/n + (x - x̄)²/S_xx). This yields a confidence interval for E(Y) = β_0 + β_1 x, of the form (β̂_0 + β̂_1 x) ± t_{α/2, n-2} s √(1/n + (x - x̄)²/S_xx), where s² = SSE/(n - 2).
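A Python sketch of this interval at an arbitrarily chosen value of x (all numbers made up):

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3, 13.8, 16.1])
    n = x.size

    sxx = np.sum((x - x.mean())**2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    s = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))

    x_star = 4.5                          # hypothetical value of X of interest
    fit = b0 + b1 * x_star                # point estimate of E(Y) at x_star
    se_mean = s * np.sqrt(1.0 / n + (x_star - x.mean())**2 / sxx)
    t_crit = stats.t.ppf(0.975, df=n - 2) # 95% confidence level
    print(fit - t_crit * se_mean, fit + t_crit * se_mean)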

Prediction interval for Y when X = x

Let's say that instead of a confidence interval for the mean E(Y), we want a confidence interval for a prediction of Y when X = x. Before, we were estimating a parameter, E(Y). Now we want to estimate the value of a random variable, the Y we observe at some specific time when X = x. Intuitively, we would estimate this value somewhere near the middle of the distribution of Y for X = x. The center of this distribution is E(Y) = β_0 + β_1 x, which is estimated by β̂_0 + β̂_1 x. So we have the same estimate for E(Y) as we do for a prediction of Y, but intuitively, the variance for a prediction must be larger. The S.E. is (again) fairly complicated to derive (see pp. 506-8). It can be shown to be s √(1 + 1/n + (x - x̄)²/S_xx), yielding a prediction interval for Y (when X = x and s² = SSE/(n - 2)): (β̂_0 + β̂_1 x) ± t_{α/2, n-2} s √(1 + 1/n + (x - x̄)²/S_xx).
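The corresponding Python sketch differs from the confidence interval only in the extra "1 +" under the square root (same made-up data and hypothetical x_star as above):

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3, 13.8, 16.1])
    n = x.size

    sxx = np.sum((x - x.mean())**2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    s = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))

    x_star = 4.5
    fit = b0 + b1 * x_star
    # extra "1 +" accounts for the variability of a new observation
    se_pred = s * np.sqrt(1.0 + 1.0 / n + (x_star - x.mean())**2 / sxx)
    t_crit = stats.t.ppf(0.975, df=n - 2)
    print(fit - t_crit * se_pred, fit + t_crit * se_pred)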