
Coefficient of Determination (ST 430/514)

The coefficient of determination, $R^2$, is defined as before:

$$R^2 = 1 - \frac{SS_E}{SS_{yy}} = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}.$$

The interpretation of $R^2$ is still the fraction of variance explained by the regression model. It measures the correlation between the dependent variable $Y$ and the independent variables jointly; $R^2$ is also the square of the multiple correlation, and is sometimes called the multiple $R^2$.

Adjusted Coefficient of Determination

Because the regression model is adapted to the sample data, it tends to explain more variance in the sample data than it will in new data. Rewrite:

$$1 - R^2 = \frac{SS_E}{SS_{yy}} = \frac{\frac{1}{n} \sum (y_i - \hat{y}_i)^2}{\frac{1}{n} \sum (y_i - \bar{y})^2}.$$

Both the numerator and the denominator are biased estimators of a variance.

Replace $\frac{1}{n}$ with the multipliers that give unbiased variance estimators:

$$\frac{1}{n-p} \sum (y_i - \hat{y}_i)^2 \quad \text{and} \quad \frac{1}{n-1} \sum (y_i - \bar{y})^2,$$

where, as before, $p = k + 1$ is the number of estimated $\beta$s. This defines the adjusted coefficient of determination:

$$R_a^2 = 1 - \frac{\frac{1}{n-p} \sum (y_i - \hat{y}_i)^2}{\frac{1}{n-1} \sum (y_i - \bar{y})^2} = 1 - \frac{n-1}{n-p} \cdot \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}.$$

$R_a^2 < R^2$, and for a poorly fitting model you may even find $R_a^2 < 0$!
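
As a quick check on these formulas, here is a minimal R sketch (simulated data, so all names are illustrative) computing $R^2$ and $R_a^2$ directly from the sums of squares; the results can be checked against summary(fit):

set.seed(42)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(30)          # simulated response
fit <- lm(y ~ x1 + x2, data = d)

n <- nrow(d)
p <- length(coef(fit))                          # p = k + 1 estimated betas
sse <- sum(residuals(fit)^2)                    # SS_E
ssyy <- sum((d$y - mean(d$y))^2)                # SS_yy

c(R2 = 1 - sse / ssyy,                          # compare summary(fit)$r.squared
  R2.adj = 1 - (n - 1) / (n - p) * sse / ssyy)  # compare summary(fit)$adj.r.squared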

Looking Ahead...

To assess how well a model will predict new data, you can use deleted residuals (see Section 8.6, The Jackknife):

- Delete one observation, say $y_i$;
- Refit the model, and use it to predict the deleted observation as $\hat{y}_{(i)}$;
- The deleted residual (or prediction residual) is $d_i = y_i - \hat{y}_{(i)}$.

More $R^2$ statistics (see Section 5.11, External Model Validation):

$$R^2_{\text{jackknife}} = 1 - \frac{\sum \left( y_i - \hat{y}_{(i)} \right)^2}{\sum (y_i - \bar{y})^2}, \qquad P^2 = 1 - \frac{\sum \left( y_i - \hat{y}_{(i)} \right)^2}{\sum \left( y_i - \bar{y}_{(i)} \right)^2}.$$
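
A minimal sketch of one deleted residual (simulated data, illustrative names): compute $\hat{y}_{(i)}$ the slow way, by refitting without observation $i$, and check it against the leverage shortcut $d_i = r_i / (1 - h_i)$ that the PRESS function on the next slide relies on:

set.seed(42)
d <- data.frame(x = rnorm(20))
d$y <- 1 + 2 * d$x + rnorm(20)               # simulated response
fit <- lm(y ~ x, data = d)

i <- 7                                       # any observation will do
fit.minus.i <- lm(y ~ x, data = d[-i, ])     # refit without observation i
d$y[i] - predict(fit.minus.i, d[i, ])        # deleted residual, by refitting

residuals(fit)[i] / (1 - hatvalues(fit)[i])  # leverage shortcut: same value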

A useful R function:

PRESS <- function(l) {
  r <- residuals(l)
  sse <- sum(r^2)
  d <- r / (1 - hatvalues(l))                 # deleted residuals via the leverage shortcut
  press <- sum(d^2)
  sst <- sse / (1 - summary(l)$r.squared)     # SS_yy, recovered from R^2
  n <- length(r)
  ssti <- sst * (n / (n - 1))^2               # sum of (y_i - ybar_(i))^2
  c(stat = press,
    pred.rmse = sqrt(press / n),
    pred.r.square = 1 - press / sst,          # jackknife R^2
    P.square = 1 - press / ssti)              # P^2
}

Estimation and Prediction

The multiple regression model may be used to make statements about the response that would be observed under a new set of conditions $x_{new} = (x_{1,new}, x_{2,new}, \dots, x_{k,new})$. As before, the statement may be about:

- $E(Y \mid x = x_{new})$, the expected value of $Y$ under the new conditions;
- a single new observation of $Y$ under the new conditions.

The point estimate of $E(Y \mid x = x_{new})$ and the point prediction of $Y$ when $x = x_{new}$ are both

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_{1,new} + \cdots + \hat{\beta}_k x_{k,new}.$$

The standard errors are different because, as always, $Y = E(Y) + \epsilon$: a single new observation carries the additional error variance $\sigma^2$, so its prediction interval is wider than the confidence interval for the mean.
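
The difference shows up directly in the widths of the intervals that predict() reports. A small sketch on simulated data (the clocks example on the following slides does the same with real data):

set.seed(1)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(30)       # simulated response
fit <- lm(y ~ x1 + x2, data = d)
new <- data.frame(x1 = 0.5, x2 = -1)

predict(fit, new, interval = "confidence")   # narrower: SE of the estimated mean only
predict(fit, new, interval = "prediction")   # wider: adds the error variance sigma^2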

Example: prices of grandfather clocks

Get the data and plot them:

clocks <- read.table("text/exercises&examples/gfclocks.txt", header = TRUE)
pairs(clocks[, c("PRICE", "AGE", "NUMBIDS")])

Fit the first-order model and summarize it:

clockslm <- lm(PRICE ~ AGE + NUMBIDS, clocks)
summary(clockslm)
PRESS(clockslm)

Check the residuals:

plot(clockslm)

Predictions are for an auction of a 150-year-old clock with 10 bidders.

95% confidence interval for $E(Y \mid \text{AGE} = 150, \text{NUMBIDS} = 10)$:

predict(clockslm, newdata = data.frame(AGE = 150, NUMBIDS = 10),
        interval = "confidence", level = 0.95)

95% prediction interval for $Y$ when AGE = 150 and NUMBIDS = 10:

predict(clockslm, newdata = data.frame(AGE = 150, NUMBIDS = 10),
        interval = "prediction", level = 0.95)

Interaction Models

One property of the first-order model

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

is that $\beta_i$ is the change in $E(Y)$ as $x_i$ increases by 1 with all the other independent variables held fixed, and it is the same regardless of the values of those other variables.

Not all real-world situations work like that. When the magnitude of the effect of one variable is affected by the level of another, we say that they interact.

A simple model for two factors that interact is

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2.$$

Rewrite this in two ways:

$$E(Y) = \beta_0 + (\beta_1 + \beta_3 x_2) x_1 + \beta_2 x_2 = \beta_0 + \beta_1 x_1 + (\beta_2 + \beta_3 x_1) x_2.$$

Holding $x_2$ fixed, the slope of $E(Y)$ against $x_1$ is $\beta_1 + \beta_3 x_2$. Holding $x_1$ fixed, the slope of $E(Y)$ against $x_2$ is $\beta_2 + \beta_3 x_1$.

We can fit this using $k = 3$, if we set $x_3 = x_1 x_2$:

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3.$$

Example: grandfather clocks again.

clockslm2 <- lm(PRICE ~ AGE * NUMBIDS, clocks)
summary(clockslm2)

Note: the formula PRICE ~ AGE * NUMBIDS specifies the interaction model, which includes the separate effects of AGE and NUMBIDS together with their product, which will be labeled AGE:NUMBIDS.

Output

Call:
lm(formula = PRICE ~ AGE * NUMBIDS, data = clocks)

Residuals:
     Min       1Q   Median       3Q      Max
-154.995  -70.431    2.069   47.880  202.259

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  320.4580   295.1413   1.086  0.28684
AGE            0.8781     2.0322   0.432  0.66896
NUMBIDS      -93.2648    29.8916  -3.120  0.00416 **
AGE:NUMBIDS    1.2978     0.2123   6.112 1.35e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 88.91 on 28 degrees of freedom
Multiple R-squared:  0.9539, Adjusted R-squared:  0.9489
F-statistic:   193 on 3 and 28 DF,  p-value: < 2.2e-16

Note that the t value on the AGE:NUMBIDS line is highly significant; that is, we strongly reject the null hypothesis $H_0: \beta_3 = 0$. Effectively, this test is a comparison of the interaction model with the original non-interactive (additive) model.

The other two t-statistics are usually irrelevant: if AGE:NUMBIDS is important, then both AGE and NUMBIDS should be included in the model, so do not test the corresponding null hypotheses.
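
To see the comparison-of-models view explicitly, the additive and interaction fits can be compared with a partial F test; this is a sketch assuming clockslm and clockslm2 have been fitted as on the earlier slides. With a single added term, the F statistic is exactly the squared t value, here $6.112^2 \approx 37.4$:

anova(clockslm, clockslm2)   # partial F test: additive model vs interaction model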

We can write the fitted model as

$$\widehat{E(\text{PRICE})} = 320 + (0.88 + 1.3 \times \text{NUMBIDS}) \times \text{AGE} - 93 \times \text{NUMBIDS},$$

meaning that the effect of age increases with the number of bidders.

Check the model:

plot(clockslm2)

More satisfactory than the additive model.
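
To make "the effect of age increases with the number of bidders" concrete, here is a short sketch (assuming clockslm2 from the previous slides) that evaluates the implied age slope $\hat{\beta}_1 + \hat{\beta}_3 \times \text{NUMBIDS}$ at a few bidder counts:

b <- coef(clockslm2)                     # (Intercept), AGE, NUMBIDS, AGE:NUMBIDS
bidders <- c(5, 10, 15)
b["AGE"] + b["AGE:NUMBIDS"] * bidders    # price increase per year of age, by bidder count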