
Outline

1. Multiple Linear Regression (Estimation, Inference, Diagnostics and Remedial Measures)
2. Special Topics for Multiple Regression
   - Extra Sums of Squares
   - Standardized Version of the Multiple Regression Model
3. Polynomial Regression Models
4. Interaction Regression Models
5. Model Selection
6. Weighted Least Squares and Ridge Regression
7. Principal Component Analysis

General Regression Model in Matrix Terms

$$
\mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix}
1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1}
\end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_{p-1} \end{bmatrix}, \quad
\boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}
$$

General Linear Regression in Matrix Terms

With $E(\boldsymbol{\epsilon}) = \mathbf{0}$ and

$$
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \qquad
\sigma^2\{\boldsymbol{\epsilon}\} = \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix} = \sigma^2 \mathbf{I},
$$

we have $E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta}$ and $\sigma^2\{\mathbf{Y}\} = \sigma^2 \mathbf{I}$.

Least Squares Solution

The matrix normal equations can be derived directly from the minimization, with respect to $\boldsymbol{\beta}$, of

$$Q = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$

Least Squares Solution

We can solve the normal equations

$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{y}$$

(provided the inverse of $\mathbf{X}'\mathbf{X}$ exists) by premultiplying both sides:

$$(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

and since $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X} = \mathbf{I}$,

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
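A minimal numerical sketch of this solution in NumPy, using simulated data (the design, coefficients, and noise level below are illustrative assumptions, not values from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                       # n observations, p columns of X (incl. intercept)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])  # illustrative true coefficients
y = X @ beta + rng.normal(scale=0.3, size=n)

# Solve the normal equations X'X b = X'y directly;
# np.linalg.solve is preferred over explicitly forming (X'X)^{-1}
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # close to beta
```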

Fitted Values and Residuals

Let the vector of fitted values be

$$\hat{\mathbf{y}} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix}$$

In matrix notation we then have $\hat{\mathbf{y}} = \mathbf{X}\mathbf{b}$.

Hat Matrix: Puts the Hat on y

We can also express the fitted values directly in terms of the $\mathbf{X}$ and $\mathbf{y}$ matrices:

$$\hat{\mathbf{y}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

Defining the hat matrix

$$\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}',$$

we can write $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$. The hat matrix plays an important role in diagnostics for regression analysis.

Hat Matrix Properties

1. The hat matrix is symmetric: $\mathbf{H}' = \mathbf{H}$.
2. The hat matrix is idempotent: $\mathbf{H}\mathbf{H} = \mathbf{H}$.

Important idempotent matrix property: for a symmetric and idempotent matrix $\mathbf{A}$, $\operatorname{rank}(\mathbf{A}) = \operatorname{trace}(\mathbf{A})$, the number of non-zero eigenvalues of $\mathbf{A}$.
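These properties are easy to verify numerically; a quick sketch on the same kind of simulated design as above (again an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(H, H.T))    # symmetric
print(np.allclose(H @ H, H))  # idempotent
print(np.trace(H))            # trace(H) = rank(H) = p = 3
```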

Residuals

The residuals, like the fitted values $\hat{\mathbf{y}}$, can be expressed as linear combinations of the response observations $Y_i$:

$$\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{H}\mathbf{y} = (\mathbf{I} - \mathbf{H})\mathbf{y}$$

Also remember $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\mathbf{b}$; these two expressions are equivalent.

Covariance of Residuals

Starting with $\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{y}$, we see that

$$\sigma^2\{\mathbf{e}\} = (\mathbf{I} - \mathbf{H})\,\sigma^2\{\mathbf{y}\}\,(\mathbf{I} - \mathbf{H})'$$

but

$$\sigma^2\{\mathbf{y}\} = \sigma^2\{\boldsymbol{\epsilon}\} = \sigma^2 \mathbf{I},$$

which means that

$$\sigma^2\{\mathbf{e}\} = \sigma^2 (\mathbf{I} - \mathbf{H})\,\mathbf{I}\,(\mathbf{I} - \mathbf{H})' = \sigma^2 (\mathbf{I} - \mathbf{H})(\mathbf{I} - \mathbf{H})'$$

and since $\mathbf{I} - \mathbf{H}$ is symmetric and idempotent, $\sigma^2\{\mathbf{e}\} = \sigma^2 (\mathbf{I} - \mathbf{H})$.

Quadratic Forms

In general, a quadratic form is defined by

$$\mathbf{y}'\mathbf{A}\mathbf{y} = \sum_i \sum_j a_{ij} Y_i Y_j, \qquad a_{ij} = a_{ji},$$

with $\mathbf{A}$ the matrix of the quadratic form. The ANOVA sums SSTO, SSE, and SSR can all be arranged into quadratic forms:

$$SSTO = \mathbf{y}'\Big(\mathbf{I} - \tfrac{1}{n}\mathbf{J}\Big)\mathbf{y}, \qquad SSE = \mathbf{y}'(\mathbf{I} - \mathbf{H})\mathbf{y}, \qquad SSR = \mathbf{y}'\Big(\mathbf{H} - \tfrac{1}{n}\mathbf{J}\Big)\mathbf{y},$$

where $\mathbf{J}$ is the $n \times n$ matrix of all ones.
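A short sketch computing the three sums of squares as quadratic forms and checking the ANOVA identity SSTO = SSE + SSR on simulated data (the data-generating values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

I = np.eye(n)
J = np.ones((n, n))                    # n x n matrix of ones
H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix

ssto = y @ (I - J / n) @ y
sse  = y @ (I - H) @ y
ssr  = y @ (H - J / n) @ y
print(np.isclose(ssto, sse + ssr))     # ANOVA identity holds
```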

Inference

We can derive the sampling variance of the estimator of the $\boldsymbol{\beta}$ vector by remembering that

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{A}\mathbf{y},$$

where $\mathbf{A} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is a constant matrix with $\mathbf{A}' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$. Using the standard matrix covariance operator, we see that

$$\sigma^2\{\mathbf{b}\} = \mathbf{A}\,\sigma^2\{\mathbf{y}\}\,\mathbf{A}'$$

Variance of b

Since $\sigma^2\{\mathbf{y}\} = \sigma^2 \mathbf{I}$, we can write

$$
\sigma^2\{\mathbf{b}\}
= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\sigma^2\mathbf{I}\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}
= \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}
= \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}
$$

Of course,

$$E(\mathbf{b}) = E\big((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\big) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}$$
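A small Monte Carlo sketch to check $\sigma^2\{\mathbf{b}\} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ empirically; the replication count and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 50, 3, 0.3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])

# Redraw the errors many times, keeping X fixed, and re-estimate b each time
bs = np.empty((2000, p))
for r in range(2000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    bs[r] = np.linalg.solve(X.T @ X, X.T @ y)

empirical   = np.cov(bs, rowvar=False)           # sample covariance of the b's
theoretical = sigma**2 * np.linalg.inv(X.T @ X)  # sigma^2 (X'X)^{-1}
print(np.max(np.abs(empirical - theoretical)))   # small, up to Monte Carlo error
```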

F-test for Regression

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$$
$$H_a: \text{not all } \beta_k \ (k = 1, \ldots, p-1) \text{ equal zero}$$

Test statistic:

$$F = \frac{MSR}{MSE}$$

Decision rule:
- If $F \le F(1-\alpha;\, p-1,\, n-p)$, conclude $H_0$.
- If $F > F(1-\alpha;\, p-1,\, n-p)$, conclude $H_a$.
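A sketch of the overall F-test on simulated data (illustrative values; scipy.stats.f supplies the F distribution):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
J = np.ones((n, n))
sse = y @ (np.eye(n) - H) @ y
ssr = y @ (H - J / n) @ y

F = (ssr / (p - 1)) / (sse / (n - p))  # MSR / MSE
p_value = stats.f.sf(F, p - 1, n - p)  # P(F(p-1, n-p) > F)
print(F, p_value)
```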

R² and Adjusted R²

The coefficient of multiple determination $R^2$ is defined as

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}, \qquad 0 \le R^2 \le 1$$

$R^2$ never decreases when more variables are added to the model. Therefore we also use the adjusted $R^2$:

$$R^2_a = 1 - \frac{SSE/(n-p)}{SSTO/(n-1)} = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO}$$

$R^2_a$ may decrease when $p$ is large. The coefficient of multiple correlation is $R = \sqrt{R^2}$, always the positive square root.
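A short sketch computing both quantities on the same kind of simulated fit (illustrative data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
sse = e @ e
ssto = np.sum((y - y.mean()) ** 2)

r2     = 1 - sse / ssto
r2_adj = 1 - (n - 1) / (n - p) * sse / ssto  # penalizes model size
print(r2, r2_adj)
```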

Inferences

The estimated variance-covariance matrix is

$$s^2\{\mathbf{b}\} = MSE \,(\mathbf{X}'\mathbf{X})^{-1}$$

Then we have

$$\frac{b_k - \beta_k}{s\{b_k\}} \sim t(n-p), \qquad k = 0, 1, \ldots, p-1$$

$1-\alpha$ confidence intervals: $b_k \pm t(1-\alpha/2;\, n-p)\, s\{b_k\}$

t Test

Tests for $\beta_k$:

$$H_0: \beta_k = 0 \qquad H_a: \beta_k \neq 0$$

Test statistic:

$$t = \frac{b_k}{s\{b_k\}}$$

Decision rule: if $|t| \le t(1-\alpha/2;\, n-p)$, conclude $H_0$; otherwise conclude $H_a$.
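A sketch covering both this slide and the previous one: standard errors $s\{b_k\}$, t statistics, and 95% confidence intervals on simulated data (illustrative values throughout):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
mse = e @ e / (n - p)

s_b = np.sqrt(mse * np.diag(XtX_inv))              # s{b_k}
t_stats = b / s_b                                  # t = b_k / s{b_k}
p_values = 2 * stats.t.sf(np.abs(t_stats), n - p)  # two-sided p-values
crit = stats.t.ppf(0.975, n - p)                   # t(1 - alpha/2; n - p), alpha = 0.05
ci = np.column_stack([b - crit * s_b, b + crit * s_b])
print(t_stats, p_values)
print(ci)
```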

Outline

1. Multiple Linear Regression (Estimation, Inference, Diagnostics and Remedial Measures)
2. Special Topics for Multiple Regression
   - Extra Sums of Squares
   - Standardized Version of the Multiple Regression Model
3. Polynomial Regression Models
4. Interaction Regression Models
5. Model Selection
6. Weighted Least Squares and Ridge Regression
7. Principal Component Analysis

Extra Sums of Squares

A topic unique to multiple regression. An extra sum of squares measures the marginal decrease in the error sum of squares when one or several predictor variables are added to the regression model, given that other variables are already in the model. Equivalently, one can view an extra sum of squares as measuring the marginal increase in the regression sum of squares.

Definitions

$$SSR(X_1 \mid X_2) = SSE(X_2) - SSE(X_1, X_2)$$

Equivalently,

$$SSR(X_1 \mid X_2) = SSR(X_1, X_2) - SSR(X_2)$$

We can switch the order of $X_1$ and $X_2$ in these expressions, and we can easily generalize the definitions to more than two variables:

$$SSR(X_3 \mid X_1, X_2) = SSE(X_1, X_2) - SSE(X_1, X_2, X_3)$$
$$SSR(X_3 \mid X_1, X_2) = SSR(X_1, X_2, X_3) - SSR(X_1, X_2)$$
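A sketch computing an extra sum of squares by fitting the two nested models directly (the simulated data and the sse helper are illustrative assumptions):

```python
import numpy as np

def sse(X, y):
    """Error sum of squares of the OLS fit of y on X (X includes the intercept column)."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

rng = np.random.default_rng(0)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

ones = np.ones(n)
X2  = np.column_stack([ones, x2])      # model with X2 only
X12 = np.column_stack([ones, x1, x2])  # model with X1 and X2

# SSR(X1 | X2) = SSE(X2) - SSE(X1, X2)
print(sse(X2, y) - sse(X12, y))
```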

ANOVA Table

Various software packages can provide extra sums of squares for a regression analysis. These are usually reported in the order in which the predictor variables are entered into the model.

[Figure: ANOVA table showing the sequential decomposition of SSR into extra sums of squares]

Test Whether a Single β_k = 0

Does $X_k$ provide a statistically significant improvement to the regression model fit? We can use the general linear test approach.

Example: a first-order model with three predictor variables,

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \epsilon_i$$

We want to carry out the hypothesis test

$$H_0: \beta_3 = 0 \qquad H_a: \beta_3 \neq 0$$

Test Whether a Single β_k = 0

For the full model we have $SSE(F) = SSE(X_1, X_2, X_3)$, with $df_F = n - 4$ degrees of freedom. The reduced model is

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i$$

and for this model we have $SSE(R) = SSE(X_1, X_2)$, with $df_R = n - 3$ degrees of freedom associated with the reduced model.

Test Whether a Single β_k = 0

The general linear test statistic is

$$F = \frac{SSE(R) - SSE(F)}{df_R - df_F} \Big/ \frac{SSE(F)}{df_F}$$

which becomes

$$F = \frac{SSE(X_1, X_2) - SSE(X_1, X_2, X_3)}{(n-3) - (n-4)} \Big/ \frac{SSE(X_1, X_2, X_3)}{n-4}$$

but $SSE(X_1, X_2) - SSE(X_1, X_2, X_3) = SSR(X_3 \mid X_1, X_2)$.

Test Whether a Single β_k = 0

The general linear test statistic is therefore

$$F = \frac{SSR(X_3 \mid X_1, X_2)}{1} \Big/ \frac{SSE(X_1, X_2, X_3)}{n-4} = \frac{MSR(X_3 \mid X_1, X_2)}{MSE(X_1, X_2, X_3)}$$

This extra sum of squares has one associated degree of freedom.
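A sketch of this single-coefficient general linear test, fitting the full and reduced models on simulated data (all data-generating values are illustrative):

```python
import numpy as np
from scipy import stats

def sse(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

rng = np.random.default_rng(0)
n = 50
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.8 * x3 + rng.normal(scale=0.3, size=n)

ones = np.ones(n)
full    = np.column_stack([ones, x1, x2, x3])  # SSE(F), df_F = n - 4
reduced = np.column_stack([ones, x1, x2])      # SSE(R), df_R = n - 3

# F = SSR(X3 | X1, X2) / 1  divided by  SSE(X1, X2, X3) / (n - 4)
F = (sse(reduced, y) - sse(full, y)) / (sse(full, y) / (n - 4))
p_value = stats.f.sf(F, 1, n - 4)
print(F, p_value)
```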

Test Whether Some β_k = 0

Another example:

$$H_0: \beta_2 = \beta_3 = 0 \qquad H_a: \text{not both } \beta_2 \text{ and } \beta_3 \text{ are zero}$$

The general linear test can be used again:

$$F = \frac{SSE(X_1) - SSE(X_1, X_2, X_3)}{(n-2) - (n-4)} \Big/ \frac{SSE(X_1, X_2, X_3)}{n-4}$$

But $SSE(X_1) - SSE(X_1, X_2, X_3) = SSR(X_2, X_3 \mid X_1)$, so the expression can be simplified.
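The same computation for this joint test, now with two numerator degrees of freedom (again on illustrative simulated data):

```python
import numpy as np
from scipy import stats

def sse(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

rng = np.random.default_rng(0)
n = 50
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.8 * x3 + rng.normal(scale=0.3, size=n)

ones = np.ones(n)
full    = np.column_stack([ones, x1, x2, x3])  # df_F = n - 4
reduced = np.column_stack([ones, x1])          # df_R = n - 2

# SSR(X2, X3 | X1) has (n - 2) - (n - 4) = 2 degrees of freedom
F = ((sse(reduced, y) - sse(full, y)) / 2) / (sse(full, y) / (n - 4))
p_value = stats.f.sf(F, 2, n - 4)
print(F, p_value)
```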

Tests Concerning Regression Coefficients

Summary:
- The general linear test can be used to determine whether or not a predictor variable (or set of variables) should be included in the model.
- The ANOVA SSEs can be used to compute the F test statistics.
- Unlike the examples given, some more general tests require fitting the model more than once.

Summary of Tests Concerning Regression Coefficients

- Test whether all $\beta_k = 0$
- Test whether a single $\beta_k = 0$
- Test whether some $\beta_k = 0$
- Tests involving relationships among the coefficients, for example:
  $H_0: \beta_1 = \beta_2$ vs. $H_a: \beta_1 \neq \beta_2$
  $H_0: \beta_1 = 3,\ \beta_2 = 5$ vs. $H_a$: otherwise

Key point in all tests: form the full model and the reduced model.