Applied Regression Analysis Chapter 3: Multiple Linear Regression. Hongcheng Li, April 6, 2013

Outline
1. Recall simple linear regression
2. Parameter Estimation
3. Interpretations of Regression Coefficients
4. Properties of the Least Squares Estimators
5. Multiple correlation coefficient

Multiple Linear Regression I. In the last lesson, we learned the fitted equation ŵage = 0.90 + 0.54 exper. What about other variables besides experience that are related to wages? What about the level of education?

Multiple Linear Regression II. Consider the model wage = β_0 + β_1 educ + β_2 exper + ε, where educ is the level of education and exper is years of labor market experience. Multiple regression analysis is also useful for generalizing functional relationships between variables. For example, consider the relationship between consumption (cons) and family income (inc): cons = β_0 + β_1 inc + β_2 inc² + ε.

Multiple Linear Regression III. After setting x_1 = inc and x_2 = inc², this is still a multiple linear regression problem. In general,

Y = β_0 + β_1 X_1 + β_2 X_2 + ⋯ + β_p X_p + ε. (3.1)
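
Since the consumption data are not given in the text, here is a minimal R sketch with simulated data (all names and coefficient values are illustrative) showing that the quadratic model is fit as an ordinary multiple regression with x_1 = inc and x_2 = inc²:

# simulate a quadratic consumption-income relationship
set.seed(1)
inc  <- runif(100, 10, 100)
cons <- 5 + 0.8 * inc - 0.002 * inc^2 + rnorm(100, sd = 2)
fit  <- lm(cons ~ inc + I(inc^2))   # I() protects the square inside the formula
summary(fit)                        # two slope coefficients, one per "x"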

Multiple Linear Regression IV. According to equation (3.1), each observation can be written as y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ⋯ + β_p x_ip + ε_i.

Multiple Linear Regression V. The key assumption of multiple linear regression is E(ε | X_1, …, X_p) = 0.

2. Parameter Estimation

Parameter Estimation I. The errors can be written as ε_i = y_i − (β_0 + β_1 x_i1 + β_2 x_i2 + ⋯ + β_p x_ip). The sum of squares of these errors is

S(β_0, β_1, …, β_p) = Σ_{i=1}^n ε_i² = Σ_{i=1}^n (y_i − (β_0 + β_1 x_i1 + β_2 x_i2 + ⋯ + β_p x_ip))².

Parameter Estimation II. In the general case with p independent variables, we seek the estimates β̂_0, β̂_1, …, β̂_p in the fitted equation ŷ = β̂_0 + β̂_1 x_1 + β̂_2 x_2 + ⋯ + β̂_p x_p.

Parameter Estimation III. The OLS estimates of the p + 1 parameters are chosen to minimize the sum of squared residuals S(β_0, β_1, …, β_p). By a direct application of calculus, it can be shown that the least squares estimates β̂_0, β̂_1, …, β̂_p,

Parameter Estimation IV. which minimize S(β_0, β_1, …, β_p), are given by the solution of the following system of equations:

s_11 β̂_1 + s_12 β̂_2 + ⋯ + s_1p β̂_p = s_y1
s_12 β̂_1 + s_22 β̂_2 + ⋯ + s_2p β̂_p = s_y2
⋮
s_1p β̂_1 + s_2p β̂_2 + ⋯ + s_pp β̂_p = s_yp

Parameter Estimation V. where

s_ij = Σ_{α=1}^n (x_αi − x̄_i)(x_αj − x̄_j),  s_yj = Σ_{α=1}^n (y_α − ȳ)(x_αj − x̄_j),
x̄_j = (1/n) Σ_{α=1}^n x_αj,  ȳ = (1/n) Σ_{α=1}^n y_α,

Parameter Estimation VI. and

β̂_0 = ȳ − β̂_1 x̄_1 − β̂_2 x̄_2 − ⋯ − β̂_p x̄_p.

The equations in the above system are called the normal equations. β̂_0 is usually referred to as the intercept or constant term, and β̂_j, j = 1, 2, …, p, are usually referred to as the regression coefficients or partial coefficients.
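
As an illustration, here is a minimal R sketch (simulated data; all names are illustrative) that forms the centered cross-products s_ij and s_yj, solves the normal equations, and checks the answer against lm():

set.seed(2)
n <- 50
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- drop(1 + X %*% c(2, -1, 0.5) + rnorm(n))

Xc <- scale(X, center = TRUE, scale = FALSE)    # x_aj - x-bar_j
yc <- y - mean(y)                               # y_a - y-bar
S  <- t(Xc) %*% Xc                              # matrix of s_ij
sy <- t(Xc) %*% yc                              # vector of s_yj

beta_hat  <- solve(S, sy)                       # slopes from the normal equations
beta0_hat <- drop(mean(y) - colMeans(X) %*% beta_hat)

c(beta0_hat, beta_hat)                          # should agree with:
coef(lm(y ~ X))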

3. Interpretations of Regression Coefficients

Interpretations of Regression Coefficients I. 1. β_0 is the value of Y when X_1 = X_2 = ⋯ = X_p = 0, as in simple regression.

Interpretations of Regression Coefficients II. 2. β_j, j = 1, 2, …, p, has several interpretations: it is the change in Y corresponding to a unit change in X_j when all other predictor variables are held constant. The magnitude of the change does not depend on the values at which the other predictor variables are fixed. As a partial regression coefficient, β_j represents the contribution of X_j to the response variable Y after it has been adjusted for the other predictor variables.

Interpretations of Regression Coefficients III. 3. See p. 57 for an explanation of partial regression coefficients.

Check the data I–III. [Figures: scatter plot matrix of Y, X1, and X2; scatter plot of Y versus X1; scatter plot of Y versus X2.]

Explain: Partial regression coefficients I. Supervisor data:
> pairs(Y ~ X1 + X2, pch = 16, col = "blue", data = ch3)
> lm1 <- lm(Y ~ X1 + X2, data = ch3)
> summary(lm1)

Explain: Partial regression coefficients II.

Call:
lm(formula = Y ~ X1 + X2, data = ch3)

Residuals:
     Min       1Q   Median       3Q      Max
-12.7887  -5.6893  -0.0284   6.2745   9.9726

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.32762    7.16023   2.141   0.0415 *
X1           0.78034    0.11939   6.536 5.22e-07 ***
X2          -0.05016    0.12992  -0.386   0.7025
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.102 on 27 degrees of freedom
Multiple R-squared: 0.6831, Adjusted R-squared: 0.6596
F-statistic: 29.1 on 2 and 27 DF, p-value: 1.833e-07

4. Properties of the Least Squares Estimators

Properties of the Least Squares Estimators I. 1. The estimator β̂_j, j = 0, 1, …, p, is an unbiased estimate of β_j and has variance σ² c_jj, where c_jj is the jth diagonal element of (XᵀX)⁻¹. The least squares estimators are BLUE (best linear unbiased estimators: they have the smallest variance among all linear unbiased estimators).

Properties of the Least Squares Estimators II. 2. The estimator β̂_j, j = 0, 1, …, p, is normally distributed with mean β_j and variance σ² c_jj.

Properties of the Least Squares Estimators III. 3. W = SSE/σ² has a χ² distribution with n − p − 1 degrees of freedom, and the β̂_j's and σ̂² are distributed independently of each other.

Properties of the Least Squares Estimators IV. 4. The vector β̂ = (β̂_0, β̂_1, …, β̂_p) has a (p + 1)-variate normal distribution with mean vector β = (β_0, β_1, …, β_p) and variance-covariance matrix with elements σ² c_ij.
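
These properties can be checked by simulation; below is a minimal R sketch (simulated design, illustrative values) that verifies the unbiasedness of β̂_1 and the variance formula σ²c_11 by repeatedly redrawing the errors and refitting:

set.seed(3)
n <- 40; sigma <- 2
X    <- cbind(1, rnorm(n), rnorm(n))    # design matrix, intercept in column 1
beta <- c(1, 2, -1)
C    <- solve(t(X) %*% X)               # (X^T X)^{-1}; c_jj are its diagonal elements

est <- replicate(5000, {
  y <- X %*% beta + rnorm(n, sd = sigma)
  solve(t(X) %*% X, t(X) %*% y)[2]      # beta_1-hat for this replication
})
mean(est)            # close to beta_1 = 2 (unbiasedness)
var(est)             # close to sigma^2 * c_11:
sigma^2 * C[2, 2]    # c_11 sits at [2, 2] because beta_0 occupies position 1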

5. Multiple correlation coefficient

Multiple correlation coefficient I. 1. The strength of the linear relationship between Y and the set of predictors X_1, X_2, …, X_p can be assessed through examination of the scatter plot of Y versus Ŷ and

Multiple correlation coefficient II. 2. the correlation coefficient between Y and Ŷ:

Cor(Y, Ŷ) = Σ (y_i − ȳ)(ŷ_i − mean(ŷ)) / √( Σ (y_i − ȳ)² · Σ (ŷ_i − mean(ŷ))² ).

Multiple correlation coefficient III. 3. Goodness-of-fit: the coefficient of determination. SST: total sum of squares; SSR: regression (explained) sum of squares; SSE: error (residual) sum of squares.

Multiple correlation coefficient IV.

SST = Σ_{i=1}^n (y_i − ȳ)²,  SSR = Σ_{i=1}^n (ŷ_i − ȳ)²,  SSE = Σ_{i=1}^n (y_i − ŷ_i)²,

SST = SSR + SSE,

R² = SSR/SST = 1 − SSE/SST = 1 − Σ (y_i − ŷ_i)² / Σ (y_i − ȳ)².
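
These quantities are easy to verify numerically; here is a short R sketch reusing lm1 and ch3 from the session above:

yhat <- fitted(lm1)
SST  <- sum((ch3$Y - mean(ch3$Y))^2)   # total sum of squares
SSR  <- sum((yhat - mean(ch3$Y))^2)    # regression (explained) sum of squares
SSE  <- sum((ch3$Y - yhat)^2)          # error (residual) sum of squares
all.equal(SST, SSR + SSE)              # the decomposition holds
SSR / SST                              # R^2, matching the 0.6831 reported above
cor(ch3$Y, yhat)^2                     # squared multiple correlation, same value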

Inference for individual regression coefficients I. 1. H_0: β_j = β_j^0 (see p. 61). 2. Test statistic: t_j = (β̂_j − β_j^0) / s.e.(β̂_j).

Inference for individual regression coefficients II. 3. C.I. for β_j: the confidence limits for β_j with confidence coefficient 1 − α are given by β̂_j ± σ̂ √c_jj · t_(n−p−1, α/2).
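
A short R sketch of this test and interval, again reusing lm1 from the session above (testing H_0: β_1 = 0 for X1):

ct  <- coef(summary(lm1))               # coefficient table
est <- ct["X1", "Estimate"]
se  <- ct["X1", "Std. Error"]
df  <- df.residual(lm1)                 # n - p - 1 = 27 here
t1  <- est / se                         # t statistic for H0: beta_1 = 0
2 * pt(abs(t1), df, lower.tail = FALSE) # two-sided p-value
est + c(-1, 1) * qt(0.975, df) * se     # 95% C.I. computed by hand
confint(lm1, "X1")                      # the same interval from confint()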

Supervisor Performance I. The fitted regression equation is Ŷ = 10.787 + 0.613X_1 − 0.073X_2 + 0.320X_3 + 0.081X_4 + 0.038X_5 − 0.217X_6. 1. How to interpret the output:

Variable   Coefficient   s.e.      t-test   p-value
Constant   10.787        11.5890   0.93     0.3616
X1          0.613
X2         -0.073
X3          0.320
X4          0.081
X5          0.038
X6         -0.217
(s.e., t-tests, and p-values for X1–X6 not transcribed)

n = 30, R² = 0.73, R_a² = 0.60, σ̂ = 7.068, d.f. = 23.

Supervisor Performance II.
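
The fit behind this table can be reproduced along the following lines. This is a sketch only: it assumes the supervisor data sit in a data frame (here called supervisor, a hypothetical name) with columns Y and X1 through X6:

# supervisor: hypothetical data frame with columns Y, X1, ..., X6
lm6 <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, data = supervisor)
summary(lm6)   # coefficient table (estimates, s.e., t-tests, p-values),
               # R^2, adjusted R^2, sigma-hat, and residual d.f.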

Test of Hypothesis in a linear model I. Hypotheses of interest include: 1. All the regression coefficients associated with the predictor variables are zero. 2. Some of the regression coefficients are zero. 3. Some of the regression coefficients are equal to each other. 4. The regression parameters satisfy certain specified constraints.

Model Comparison I. The full model: Y = β_0 + β_1 X_1 + β_2 X_2 + ⋯ + β_p X_p + ε (Full Model, FM). If we set some of the regression coefficients to 0, we get a reduced model (RM); for example, setting β_k = 0 for a given k yields a reduced model. The number of distinct parameters to be estimated in the reduced model is smaller than the number of parameters to be estimated in the full model.

Model Comparison II. Accordingly, we wish to test H_0: the reduced model is adequate, against H_1: the full model is adequate. 1. What is a nested model? A set of models is said to be nested if they can be obtained from a larger model as special cases. 2. (p. 64) The sum of squares due to error associated with the FM (p + 1 parameters): SSE(FM) = Σ (y_i − ŷ_i)².

Model Comparison III. 3. (p. 64) The sum of squares due to error associated with the RM (k distinct parameters): SSE(RM) = Σ (y_i − ŷ_i)², where the ŷ_i are now the fitted values from the reduced model.

Model Comparison IV. By construction SSE(RM) ≥ SSE(FM); the point is how large the difference between the residual sums of squares is. If the difference is large, the reduced model is inadequate.

F = { [SSE(RM) − SSE(FM)] / (p + 1 − k) } / { SSE(FM) / (n − p − 1) }

H_0 is rejected if F ≥ F_(p+1−k, n−p−1; α), or, equivalently, if p(F) ≤ α.
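
In R this F-test is carried out by anova() on two nested fits; a sketch using the hypothetical supervisor data frame from above:

full    <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, data = supervisor)
reduced <- lm(Y ~ X1 + X3, data = supervisor)
anova(reduced, full)   # F = {[SSE(RM) - SSE(FM)]/(p + 1 - k)} / {SSE(FM)/(n - p - 1)}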

Testing all regression coefficients equal to zero I. RM (H_0): Y = β_0 + ε. FM (H_1): Y = β_0 + β_1 X_1 + ⋯ + β_p X_p + ε. The F-test reduces to

F = [(SST − SSE)/p] / [SSE/(n − p − 1)] = [SSR/p] / [SSE/(n − p − 1)] = MSR/MSE.
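
This overall F-statistic can be checked by hand; a short sketch reusing lm1 and ch3 from the session above (p = 2, n = 30):

SSE <- sum(resid(lm1)^2)
SST <- sum((ch3$Y - mean(ch3$Y))^2)
n <- nrow(ch3); p <- 2
Fstat <- ((SST - SSE) / p) / (SSE / (n - p - 1))   # MSR / MSE
Fstat                                              # 29.1 on 2 and 27 d.f., as in summary(lm1)
pf(Fstat, p, n - p - 1, lower.tail = FALSE)        # p-value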

Testing a subset of regression coefficients equal to zero I. An important goal in regression analysis is to arrive at adequate descriptions of observed phenomena in terms of as few meaningful variables as possible. Simplicity of description, or the principle of parsimony, is one of the important guiding principles in regression analysis.

Testing a subset of regression coefficients equal to zero II. 1. RM: Y = β_0 + β_1 X_1 + β_3 X_3 + ε, which corresponds to the hypothesis H_0: β_2 = β_4 = β_5 = β_6 = 0. 2. In simple regression, p = 1 and t_1 = β̂_1 / s.e.(β̂_1). Therefore,

Testing a subset of regression coefficients equal to zero III. F = t_1².

Testing the Equality of Regression Coefficients I. 1. H_0: β_1 = β_3 (= β_1′, say), with β_2 = β_4 = β_5 = β_6 = 0. 2. Under H_0, Y = β_0 + β_1′(X_1 + X_3) + ε.

Estimating and Testing of Regression Parameters under Constraints I. 1. H_0: β_1 + β_3 = 1, with β_2 = β_4 = β_5 = β_6 = 0. 2. Under H_0, Y = β_0 + β_1 X_1 + (1 − β_1) X_3 + ε, which can be rewritten as Y − X_3 = β_0 + β_1 (X_1 − X_3) + ε.
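
Both kinds of restrictions can be imposed directly in an R formula; a sketch using the hypothetical supervisor data frame from above:

full <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, data = supervisor)

# H0: beta_1 = beta_3 (others zero): one common slope on X1 + X3
eqfit <- lm(Y ~ I(X1 + X3), data = supervisor)
anova(eqfit, full)

# H0: beta_1 + beta_3 = 1 (others zero): regress Y - X3 on X1 - X3
confit <- lm(I(Y - X3) ~ I(X1 - X3), data = supervisor)
# confit's residuals are in the original Y units, so the F-test is done by hand;
# the FM has p + 1 = 7 parameters and the RM has k = 2
sse_fm <- sum(resid(full)^2)
sse_rm <- sum(resid(confit)^2)
Fstat  <- ((sse_rm - sse_fm) / (7 - 2)) / (sse_fm / df.residual(full))
pf(Fstat, 7 - 2, df.residual(full), lower.tail = FALSE)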

Predictions. 1. Suppose x_0 = (x_01, x_02, …, x_0p). The predicted value ŷ_0 corresponding to x_0 is given by ŷ_0 = β̂_0 + β̂_1 x_01 + β̂_2 x_02 + ⋯ + β̂_p x_0p. 2. The C.I. with confidence coefficient 1 − α is ŷ_0 ± t_(n−p−1, α/2) · s.e.(ŷ_0).
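
In R, the point prediction and interval come from predict(); a short sketch reusing lm1 (the new observation x_0 below is hypothetical):

x0 <- data.frame(X1 = 65, X2 = 50)                    # hypothetical new point
predict(lm1, newdata = x0)                            # y_0-hat
predict(lm1, newdata = x0, interval = "confidence")   # C.I. for the mean response
predict(lm1, newdata = x0, interval = "prediction")   # wider interval for a new y_0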

Homework. 1. P75: 3.1, 3.5. 2. 3.15.