STAT5044: Regression and Anova


Inyoung Kim

Outline
1. Multiple Linear Regression

Basic Idea
An extra sum of squares is the marginal reduction in the error sum of squares when one or several predictor variables are added to the regression model, given that the other predictor variables are already in the model. Equivalently, an extra sum of squares measures the marginal increase in the regression sum of squares when one or several predictor variables are added to the model.

Example
A study of the relation of amount of body fat ($Y$) to several possible predictor variables, based on a sample of 20 healthy females 25-34 years old. The possible predictor variables are triceps skinfold thickness ($X_1$), thigh circumference ($X_2$), and midarm circumference ($X_3$). The amount of body fat for each of the 20 persons was obtained by a cumbersome and expensive procedure requiring the immersion of the person in water. It would therefore be very helpful if a regression model with some or all of these predictor variables could provide reliable estimates of the amount of body fat, since the measurements needed for the predictor variables are easy to obtain.

Multiple Linear Regression
When there are $m$ independent variables,
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_m x_{im} + \varepsilon_i,$$
where $\varepsilon_i \sim N(0,\sigma^2)$. A special case is polynomial regression, where $X_m = x^m$.

In matrix form, $Y = X\beta + \varepsilon$ with $X = [1\ x_1\ x_2\ \cdots\ x_m]$.

LSE: $\hat\beta = (X^t X)^{-1} X^t y \sim N(\beta,\ (X^t X)^{-1}\sigma^2)$
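As a minimal numerical sketch of the LSE, assuming simulated data (the sample size, design, and coefficient values below are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 3                        # sample size and number of predictors (illustrative)
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])  # [1, x1, x2, x3]
beta = np.array([10.0, 2.0, -1.0, 0.5])                     # assumed true coefficients
y = X @ beta + rng.normal(size=n)                           # y = X beta + eps

# LSE: beta_hat = (X^t X)^{-1} X^t y; lstsq is the numerically stable equivalent
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
beta_hat_stable, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                     # close to the true beta for moderate noise
```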

Multiple Regression
A set of methods for estimating model parameters and testing hypotheses about a model relating a response to explanatory variables:
- Is there a linear relationship?
- Which variables are important?
- Is there a best model to use?
- Predict a new value.
- Predict the value of the explanatory variable that causes a specified response.

Multiple Regression
Similar to SLR, but with multiple explanatory variables:
- graphical assessment is more difficult
- interpretation is more complex
Tools:
- scatterplot matrix
- sequential and partial tests
- variable selection methods
- residual analysis

Gauss-Markov Theorem
Suppose $E(Y) = X\beta$ and $\mathrm{Var}(Y) = \sigma^2 I$, and let $\hat\beta$ be the LS estimator of $\beta$. Then $\hat\Psi = C^t\hat\beta$ is an unbiased estimate of $C^t\beta$, and for any other linear unbiased estimate $\tilde\Psi$ of $C^t\beta$, $\mathrm{Var}(\tilde\Psi) \ge \mathrm{Var}(\hat\Psi)$.

NOTE: The normality assumption is NOT needed.

Testing $H_0$: there is no regression relationship between $Y$ and $X$
$H_0: \beta_1 = \beta_2 = \cdots = \beta_m = 0$
$H_1$: at least one parameter $\ne 0$

ANOVA table:

Source       SS                              df
regression   $\sum (\hat Y_i - \bar Y)^2$    $m$
error        $\sum (Y_i - \hat Y_i)^2$       $n-m-1$
total        $\sum (Y_i - \bar Y)^2$         $n-1$

Test statistic:
$$F_{m,\,n-m-1} = \frac{SSreg/m}{SSE/(n-m-1)}$$
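A sketch of this overall F test on simulated data (the data and names are illustrative; scipy is assumed available for the p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, m = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])
y = X @ np.array([10.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
ss_reg = np.sum((y_hat - y.mean()) ** 2)      # regression SS, df = m
sse = np.sum((y - y_hat) ** 2)                # error SS, df = n - m - 1
F = (ss_reg / m) / (sse / (n - m - 1))
p = stats.f.sf(F, m, n - m - 1)
print(F, p)                                   # reject H0 when p is small
```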

Testing $H_0: \beta_3 = 0$
$H_0: \beta_3 = 0$ vs $H_a: \beta_3 \ne 0$

Reduced model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$, with $SSE(R) = SSE(X_1, X_2)$
Full model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$, with $SSE(F) = SSE(X_1, X_2, X_3)$

Test statistic (what is $df_F$? what is $df_R$?):
$$F = \frac{(SSE(R) - SSE(F))/(df_R - df_F)}{SSE(F)/df_F}$$
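A generic sketch of this full-versus-reduced comparison; the helper names sse and general_linear_test are mine, not from the lecture, and scipy is assumed available for the p-value:

```python
import numpy as np
from scipy import stats

def sse(X, y):
    """Error sum of squares from the least-squares fit of y on X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

def general_linear_test(X_full, X_reduced, y):
    """F statistic and p-value for H0: the reduced model suffices."""
    n = len(y)
    df_f = n - X_full.shape[1]       # full-model error df (n - 4 in the slide's example)
    df_r = n - X_reduced.shape[1]    # reduced-model error df (n - 3 in the slide's example)
    F = ((sse(X_reduced, y) - sse(X_full, y)) / (df_r - df_f)) / (sse(X_full, y) / df_f)
    return F, stats.f.sf(F, df_r - df_f, df_f)
```

For $H_0: \beta_3 = 0$, X_reduced is just X_full with the $X_3$ column removed.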

Testing $H_0: \beta_k = 0$
$H_0: \beta_k = 0$ vs $H_a: \beta_k \ne 0$

Reduced model: $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{k-1} X_{i,k-1} + \beta_{k+1} X_{i,k+1} + \cdots + \beta_m X_{im} + \varepsilon_i$, with $SSE(R) = SSE(X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_m)$
Full model: $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \cdots + \beta_m X_{im} + \varepsilon_i$, with $SSE(F) = SSE(X_1, \ldots, X_k, \ldots, X_m)$

Test statistic (what is $df_F$? what is $df_R$?):
$$F = \frac{(SSE(R) - SSE(F))/(df_R - df_F)}{SSE(F)/df_F}$$

Testing $H_0: \beta_2 = \beta_3 = 0$
$H_0: \beta_2 = \beta_3 = 0$ vs $H_a$: not both $\beta_2$ and $\beta_3$ equal zero

Reduced model: $Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i$, with $SSE(R) = SSE(X_1)$
Full model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$, with $SSE(F) = SSE(X_1, X_2, X_3)$

$$SSE(X_1) - SSE(X_1, X_2, X_3) = SSR(X_2, X_3 \mid X_1)$$

Test statistic:
$$F = \frac{(SSE(X_1) - SSE(X_1,X_2,X_3))/((n-2)-(n-4))}{SSE(X_1,X_2,X_3)/(n-4)} = \frac{SSR(X_2,X_3 \mid X_1)/2}{SSE(X_1,X_2,X_3)/(n-4)} = \frac{MSR(X_2,X_3 \mid X_1)}{MSE(X_1,X_2,X_3)}$$

Test $H_0$: some $\beta_k = 0$
$H_0: \beta_q = \beta_{q+1} = \cdots = \beta_{p-1} = 0$ vs $H_a$: not all of the $\beta_k$ in $H_0$ equal zero.

Test statistic:
$$F = \frac{SSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1})/(p-q)}{SSE(X_1, \ldots, X_{p-1})/(n-p)} = \frac{MSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1})}{MSE}$$

NOTE:
$$SSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1}) = SSR(X_q \mid X_1, \ldots, X_{q-1}) + \cdots + SSR(X_{p-1} \mid X_1, \ldots, X_{p-2})$$
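A sketch applying this to $H_0: \beta_2 = \beta_3 = 0$, with the extra sum of squares computed directly (simulated data; all names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 20
x1, x2, x3 = rng.normal(size=(3, n))
y = 10 + 2 * x1 - x2 + 0.5 * x3 + rng.normal(size=n)

def sse(*cols):
    X = np.column_stack([np.ones(n), *cols])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

# Extra sum of squares SSR(X2, X3 | X1) and the F test of H0: beta2 = beta3 = 0
ssr_extra = sse(x1) - sse(x1, x2, x3)       # = SSR(X2 | X1) + SSR(X3 | X1, X2)
F = (ssr_extra / 2) / (sse(x1, x2, x3) / (n - 4))
print(F, stats.f.sf(F, 2, n - 4))
```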

Testing $H_0: \beta_1 = \beta_2$
Full model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$
Test: $H_0: \beta_1 = \beta_2$ vs $H_a: \beta_1 \ne \beta_2$

Let $\beta_c$ denote the common value of $\beta_1$ and $\beta_2$ under $H_0$. The reduced model is
$$Y_i = \beta_0 + \beta_c (X_{i1} + X_{i2}) + \beta_3 X_{i3} + \varepsilon_i,$$
where $X_{i1} + X_{i2}$ is the corresponding new X variable.

The test statistic (what is $df_R$? what is $df_F$?) is
$$F = \frac{(SSE(R) - SSE(F))/(df_R - df_F)}{SSE(F)/df_F}$$
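A sketch of this test under assumed simulated data: fit the reduced model on the combined column $X_{i1} + X_{i2}$ and compare SSEs (names illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 20
x1, x2, x3 = rng.normal(size=(3, n))
y = 10 + 2 * x1 + 2 * x2 + 0.5 * x3 + rng.normal(size=n)  # beta1 = beta2 here

def sse(X):
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2, x3])          # df_F = n - 4
X_reduced = np.column_stack([ones, x1 + x2, x3])      # df_R = n - 3
F = ((sse(X_reduced) - sse(X_full)) / 1) / (sse(X_full) / (n - 4))
print(F, stats.f.sf(F, 1, n - 4))                     # large p: no evidence beta1 != beta2
```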

Test whether some $\beta_k = 0$
The test statistic can be stated equivalently in terms of the coefficients of multiple determination for the full and reduced models, when these models contain the intercept term $\beta_0$, as follows:
$$F = \frac{(R^2_{y|x_1,\ldots,x_{p-1}} - R^2_{y|x_1,\ldots,x_{q-1}})/(p-q)}{(1 - R^2_{y|x_1,\ldots,x_{p-1}})/(n-p)}$$
where $R^2_{y|x_1,\ldots,x_{p-1}}$ is the coefficient of multiple determination when $Y$ is regressed on all $X$ variables, and $R^2_{y|x_1,\ldots,x_{q-1}}$ is the coefficient when $Y$ is regressed on $x_1, \ldots, x_{q-1}$ only.

Q: What is the coefficient of multiple determination?
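A sketch checking that this $R^2$ form reproduces the SSE-based F statistic (simulated data; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 20, 4, 2            # full: intercept + 3 predictors; reduced: intercept + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([10.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

def r2(Xm):
    y_hat = Xm @ np.linalg.lstsq(Xm, y, rcond=None)[0]
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

r2_full, r2_red = r2(X), r2(X[:, :q])
F_r2 = ((r2_full - r2_red) / (p - q)) / ((1 - r2_full) / (n - p))
print(F_r2)   # identical to the (SSE(R) - SSE(F)) form of the F statistic
```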

Coefficients of Partial Determination
Extra sums of squares are not only useful for tests on the regression coefficients of a multiple regression model; they are also encountered in descriptive measures of relationship called coefficients of partial determination. Recall that the coefficient of multiple determination, $R^2$, measures the proportionate reduction in the variation of $Y$ achieved by the introduction of the entire set of $X$ variables considered in the model. A coefficient of partial determination, in contrast, measures the marginal contribution of one $X$ variable when all others are already included in the model.

Calculation of Coefficients of Partial Determination
$$R^2_{y,x_1|x_2} = \frac{SSE(X_2) - SSE(X_1,X_2)}{SSE(X_2)} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)}$$
$$R^2_{y,x_2|x_1} = \frac{SSE(X_1) - SSE(X_1,X_2)}{SSE(X_1)} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}$$
$$R^2_{y,x_1|x_2,x_3} = \frac{SSR(X_1 \mid X_2,X_3)}{SSE(X_2,X_3)} \qquad R^2_{y,x_2|x_1,x_3} = \frac{SSR(X_2 \mid X_1,X_3)}{SSE(X_1,X_3)}$$
$$R^2_{y,x_3|x_1,x_2} = \frac{SSR(X_3 \mid X_1,X_2)}{SSE(X_1,X_2)} \qquad R^2_{y,x_4|x_1,x_2,x_3} = \frac{SSR(X_4 \mid X_1,X_2,X_3)}{SSE(X_1,X_2,X_3)}$$
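A sketch computing the first of these from fitted SSEs (simulated data; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20
x1, x2 = rng.normal(size=(2, n))
y = 10 + 2 * x1 - x2 + rng.normal(size=n)

def sse(*cols):
    X = np.column_stack([np.ones(n), *cols])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

# R^2_{y,x1|x2} = SSR(X1 | X2) / SSE(X2)
r2_partial = (sse(x2) - sse(x1, x2)) / sse(x2)
print(r2_partial)
```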

Coefficients of Partial Determination
The coefficients of partial determination can take on values between 0 and 1, as the definitions readily indicate.

Coefficients of Partial Determination
A coefficient of partial determination can be interpreted as a coefficient of simple determination. Suppose we regress $Y$ on $X_2$ and obtain the residuals
$$e_i(Y \mid X_2) = Y_i - \hat Y_i(X_2),$$
where $\hat Y_i(X_2)$ denotes the fitted values of $Y$ when $X_2$ is in the model. Suppose we further regress $X_1$ on $X_2$ and obtain the residuals
$$e_i(X_1 \mid X_2) = X_{i1} - \hat X_{i1}(X_2),$$
where $\hat X_{i1}(X_2)$ denotes the fitted values of $X_1$ in the regression of $X_1$ on $X_2$.

Coefficients of Partial Determination
The coefficient of simple determination $R^2$ between these two sets of residuals equals the coefficient of partial determination $R^2_{Y,X_1|X_2}$. This coefficient measures the relation between $Y$ and $X_1$ when both of these variables have been adjusted for their linear relationships to $X_2$.
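A sketch verifying this residual interpretation numerically (simulated data; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)           # x1 correlated with x2
y = 10 + 2 * x1 - x2 + rng.normal(size=n)

def fit(X, v):
    return X @ np.linalg.lstsq(X, v, rcond=None)[0]

Z = np.column_stack([np.ones(n), x2])
e_y = y - fit(Z, y)                          # e(Y | X2)
e_x1 = x1 - fit(Z, x1)                       # e(X1 | X2)
r2_resid = np.corrcoef(e_y, e_x1)[0, 1] ** 2 # simple R^2 between residual sets

def sse(*cols):
    X = np.column_stack([np.ones(n), *cols])
    r = y - fit(X, y)
    return r @ r

r2_partial = (sse(x2) - sse(x1, x2)) / sse(x2)
print(np.isclose(r2_resid, r2_partial))      # True
```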

Added variable plots / partial regression plots
The plot of the residuals $e_i(Y \mid X_2)$ against $e_i(X_1 \mid X_2)$ provides a graphical representation of the strength of the relationship between $Y$ and $X_1$, adjusted for $X_2$. Such plots of residuals are called added variable plots or partial regression plots.
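A sketch of such a plot (simulated data as above; matplotlib assumed available):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
n = 200
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 10 + 2 * x1 - x2 + rng.normal(size=n)

Z = np.column_stack([np.ones(n), x2])
def resid(v):                                # residuals from regressing v on [1, x2]
    return v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]

plt.scatter(resid(x1), resid(y))             # slope of this cloud ~ beta1 in the full fit
plt.xlabel("e(X1 | X2)")
plt.ylabel("e(Y | X2)")
plt.title("Added variable plot for X1")
plt.show()
```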

$R^2$ and partial correlation
$$R^2_{Y,X_2|X_1} = [r_{Y,X_2|X_1}]^2 = \frac{(r_{Y,X_2} - r_{X_1,X_2}\, r_{Y,X_1})^2}{(1 - r^2_{X_1X_2})(1 - r^2_{YX_1})}$$
$$R^2_{Y,X_2|X_1X_3} = [r_{Y,X_2|X_1X_3}]^2 = \frac{(r_{Y,X_2|X_3} - r_{X_1,X_2|X_3}\, r_{Y,X_1|X_3})^2}{(1 - r^2_{X_1X_2|X_3})(1 - r^2_{YX_1|X_3})}$$
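A sketch checking the first identity against the residual-based definition (simulated data; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1 + x1 + 2 * x2 + rng.normal(size=n)

def r(a, b):                                 # simple correlation
    return np.corrcoef(a, b)[0, 1]

num = (r(y, x2) - r(x1, x2) * r(y, x1)) ** 2
den = (1 - r(x1, x2) ** 2) * (1 - r(y, x1) ** 2)
r2_formula = num / den

# residual-based partial correlation for comparison
Z = np.column_stack([np.ones(n), x1])
def res(v):
    return v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]

r2_resid = r(res(y), res(x2)) ** 2
print(np.isclose(r2_formula, r2_resid))      # True
```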

Partial correlation
$$0 \le r^2_{y,x_k|x_1,\ldots,x_{k-1}} \le 1$$
$$r^2_{y,x_k|x_1,\ldots,x_{k-1}} = 1 \iff r^2_{y,\hat y(x_1,\ldots,x_{k-1},x_k)} = 1$$
$$r^2_{y,x_k|x_1,\ldots,x_{k-1}} = \frac{\{y^t (I - H_{[k-1]}) x_k\}^2}{\{y^t (I - H_{[k-1]}) y\}\{x_k^t (I - H_{[k-1]}) x_k\}}$$
where $H_{[k-1]} = X_{[k-1]}[X_{[k-1]}^t X_{[k-1]}]^{-1} X_{[k-1]}^t$ and $X_{[k-1]} = [1, x_1, \ldots, x_{k-1}]$.
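A sketch of the quadratic-form expression (simulated data; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
x1, x2 = rng.normal(size=(2, n))             # x2 plays the role of x_k
y = 1 + x1 + 0.5 * x2 + rng.normal(size=n)

Xk1 = np.column_stack([np.ones(n), x1])      # X_[k-1] = [1, x1]
H = Xk1 @ np.linalg.inv(Xk1.T @ Xk1) @ Xk1.T
M = np.eye(n) - H                            # I - H_[k-1]

r2 = (y @ M @ x2) ** 2 / ((y @ M @ y) * (x2 @ M @ x2))
print(r2)                                    # matches the residual-based partial r^2
```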

Recall: $R^2$
$R^2$: the coefficient of determination.
$$R^2 = \frac{SSreg}{S_{yy}} = 1 - \frac{SSE}{S_{yy}}$$
In the simple linear regression case,
$$R^2 = (r_{y,x})^2 = \frac{[\sum (x_i - \bar x)(y_i - \bar y)]^2}{S_{yy} S_{xx}} = r^2$$
In general,
$$R^2 = (r_{y,\hat y})^2 = \frac{[\sum (Y_i - \bar Y)(\hat Y_i - \bar{\hat Y})]^2}{\sum (Y_i - \bar Y)^2 \sum (\hat Y_i - \bar{\hat Y})^2}$$
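A sketch confirming the general identity $R^2 = (r_{y,\hat y})^2$ (simulated data; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
r2_def = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2
print(np.isclose(r2_def, r2_corr))           # True
```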

Proof
In general,
$$R^2 = (r_{y,\hat y})^2 = \frac{[\sum (Y_i - \bar Y)(\hat Y_i - \bar{\hat Y})]^2}{\sum (Y_i - \bar Y)^2 \sum (\hat Y_i - \bar{\hat Y})^2}$$
Sketch: with an intercept in the model, the residuals $e = Y - \hat Y$ are orthogonal to both the fitted values and the constant, so $\bar{\hat Y} = \bar Y$ and $\sum (Y_i - \bar Y)(\hat Y_i - \bar{\hat Y}) = \sum (\hat Y_i - \bar{\hat Y})^2 = SSreg$. Hence $(r_{y,\hat y})^2 = SSreg^2/(S_{yy} \cdot SSreg) = SSreg/S_{yy} = R^2$.