Need for Several Predictor Variables

Multiple regression. One of the most widely used tools in statistical analysis. The matrix expressions for multiple regression are the same as for simple linear regression.

Need for Several Predictor Variables. Often the response is best understood as a function of multiple input quantities. Examples: spam filtering - regress the probability of an email being a spam message against thousands of input variables; football prediction - regress the probability of a goal in some short time span against the current state of the game.

First-Order Model with Two Predictor Variables. When there are two predictor variables X 1 and X 2, the regression model Y i = β 0 + β 1 X i1 + β 2 X i2 + ɛ i is called a first-order model with two predictor variables. A first-order model is linear in the predictor variables. X i1 and X i2 are the values of the two predictor variables in the ith trial.

Functional Form of the Regression Surface. Assuming the noise is zero in expectation, E(Y) = β 0 + β 1 X 1 + β 2 X 2. This regression function is a plane, e.g. E(Y) = 10 + 2 X 1 + 5 X 2.

Loess example

Meaning of Regression Coefficients. β 0 is the intercept, the mean response when both X 1 and X 2 are zero; β 1 indicates the change in the mean response E(Y) per unit increase in X 1 when X 2 is held constant, and β 2 vice versa. Example: fix X 2 = 2, then E(Y) = 10 + 2 X 1 + 5(2) = 20 + 2 X 1. Fixing X 2 = 2 changes the intercept, but the response function is clearly still linear in X 1. In other words, all one-dimensional restrictions of the regression surface are lines.
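
As a concrete illustration, here is a minimal numpy sketch of this example; the coefficient values 10, 2 and 5 come from the slide, while the grid of X 1 values and the choice of X 2 slices are arbitrary.

import numpy as np

# Regression surface from the slide: E(Y) = 10 + 2*X1 + 5*X2
beta0, beta1, beta2 = 10.0, 2.0, 5.0

x1 = np.linspace(0.0, 4.0, 5)                # arbitrary grid of X1 values
for x2 in (0.0, 2.0):                        # slices of the surface at fixed X2
    ey = beta0 + beta1 * x1 + beta2 * x2
    # each slice is a line in X1; only the intercept (beta0 + beta2*x2) changes
    print("X2 =", x2, "intercept =", beta0 + beta2 * x2, "E(Y) =", ey)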

Terminology. 1. When the effect of X 1 on the mean response does not depend on the level of X 2 (and vice versa), the two predictor variables are said to have additive effects, or not to interact. 2. The parameters β 1 and β 2 are sometimes called partial regression coefficients.

Comments. 1. A planar response surface may not always be appropriate, but even when it is not, it is often a good approximation to the regression function in local regions of the input space. 2. The meaning of the parameters can be determined by taking partial derivatives of the regression function with respect to each of them.

First-order model with more than two predictor variables. Let there be p − 1 predictor variables. Then Y i = β 0 + β 1 X i1 + β 2 X i2 + ... + β p−1 X i,p−1 + ɛ i, which can also be written as Y i = β 0 + Σ_{k=1}^{p−1} β k X ik + ɛ i, and, defining X i0 = 1, as Y i = Σ_{k=0}^{p−1} β k X ik + ɛ i.

Geometry of the response surface. In this setting the response surface is a hyperplane. This is difficult to visualize, but the same intuitions hold: fixing all but one input variable, each coefficient β k tells how much the mean response grows or shrinks per unit increase in that one input variable.

General Linear Regression Model. We have arrived at the general linear regression model. In general the X 1, ..., X p−1 variables in the regression model do not have to represent different predictor variables, nor do they all have to be quantitative (continuous). The general model is Y i = Σ_{k=0}^{p−1} β k X ik + ɛ i, where X i0 = 1, with response function (when E(ɛ i) = 0) E(Y) = β 0 + β 1 X 1 + ... + β p−1 X p−1.

Qualitative (Discrete) Predictor Variables. Until now we have (implicitly) focused on quantitative (continuous) predictor variables. Qualitative (discrete) predictor variables often arise in the real world. Examples: patient sex: male/female/other; goal scored in last minute: yes/no; etc.

Example. Regression model to predict the length of hospital stay (Y) based on the age (X 1) and gender (X 2) of the patient. Define gender as X 2 = 1 if the patient is female and X 2 = 0 if the patient is male, and use the standard first-order regression model Y i = β 0 + β 1 X i1 + β 2 X i2 + ɛ i.

Example cont. Here X i1 is the patient's age and X i2 is the patient's gender. If X 2 = 0, the response function is E(Y) = β 0 + β 1 X 1; otherwise it is E(Y) = (β 0 + β 2) + β 1 X 1, which is just a parallel linear response function with a different intercept.
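
A small numpy sketch of this dummy-variable setup; the coefficient values and ages below are invented purely for illustration.

import numpy as np

# Hypothetical coefficients: beta0 = 20 (baseline), beta1 = 0.5 per year of age,
# beta2 = 3.0 added to the intercept when the patient is female (X2 = 1)
beta = np.array([20.0, 0.5, 3.0])
age = np.array([30.0, 45.0, 60.0])            # made-up ages

for female in (0, 1):                         # X2 = 0 (male) or X2 = 1 (female)
    X = np.column_stack([np.ones_like(age), age, np.full_like(age, float(female))])
    # Two parallel response functions: slope beta1 in age, intercepts beta0 and beta0 + beta2
    print("X2 =", female, "intercept =", beta[0] + beta[2] * female, "E(Y) =", X @ beta)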

Polynomial Regression. Polynomial regression models are special cases of the general regression model. They can contain squared and higher-order terms of the predictor variables, and the response function becomes curvilinear. For example, Y i = β 0 + β 1 X i + β 2 X i² + ɛ i, which clearly has the same form as the general regression model (take X i1 = X i and X i2 = X i²).

General Regression. Transformed variables: log Y, 1/Y. Interaction effects: Y i = β 0 + β 1 X i1 + β 2 X i2 + β 3 X i1 X i2 + ɛ i. Combinations: Y i = β 0 + β 1 X i1 + β 2 X i1² + β 3 X i2 + β 4 X i1 X i2 + ɛ i. Key point: all are linear in the parameters!
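
A sketch, with made-up data and parameters, of how such squared and interaction terms simply become extra columns of the design matrix, so the model stays linear in the parameters and ordinary least squares still applies.

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=50)              # made-up predictor values
x2 = rng.uniform(0, 5, size=50)

# Columns: 1, X1, X1^2, X2, X1*X2 (the "combinations" model from the slide)
X = np.column_stack([np.ones_like(x1), x1, x1**2, x2, x1 * x2])
beta_true = np.array([1.0, 2.0, -0.3, 0.5, 0.1])   # hypothetical parameters
y = X @ beta_true + rng.normal(scale=1.0, size=50)

# Ordinary least squares still applies because the model is linear in the betas
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)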

General Regression Model in Matrix Terms. Define Y = (Y 1, ..., Y n)′ as the n×1 vector of responses, X as the n×p design matrix whose ith row is (1, X i1, X i2, ..., X i,p−1), β = (β 0, β 1, ..., β p−1)′ as the p×1 vector of parameters, and ɛ = (ɛ 1, ..., ɛ n)′ as the n×1 vector of errors.

General Linear Regression in Matrix Terms. The model is Y = Xβ + ɛ with E(ɛ) = 0 and σ²{ɛ} = σ² I, a diagonal covariance matrix with σ² on the diagonal and zeros elsewhere. We then have E(Y) = Xβ and σ²{y} = σ² I.
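
A minimal numpy sketch of this matrix form; the design matrix, coefficients and noise level are all invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3                                 # n observations, p columns (incl. the intercept)

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design matrix with X i0 = 1
beta = np.array([1.0, 2.0, -1.5])             # hypothetical true parameters
sigma = 0.5

eps = rng.normal(scale=sigma, size=n)         # errors with E(eps) = 0 and covariance sigma^2 * I
y = X @ beta + eps                            # Y = X beta + eps, so E(Y) = X beta
print(y[:5])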

Least Squares Solution. The matrix normal equations can be derived directly by minimizing Q = (y − Xβ)′(y − Xβ) with respect to β.

Least Squares Solution. We can solve the normal equations X′Xb = X′y (if the inverse of X′X exists) as follows: (X′X)⁻¹X′Xb = (X′X)⁻¹X′y, and since (X′X)⁻¹X′X = I, we have b = (X′X)⁻¹X′y.
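
A sketch of computing b both via the explicit formula and via numerically preferable solvers; the data are simulated only to make the comparison concrete.

import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(scale=0.5, size=n)

# Textbook formula: b = (X'X)^{-1} X'y
b_formula = np.linalg.inv(X.T @ X) @ X.T @ y

# Equivalent but better conditioned: solve the normal equations, or use least squares directly
b_solve = np.linalg.solve(X.T @ X, X.T @ y)
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b_formula, b_solve), np.allclose(b_formula, b_lstsq))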

Fitted Values and Residuals. Let ŷ = (ŷ 1, ŷ 2, ..., ŷ n)′ be the vector of fitted values; in matrix notation we then have ŷ = Xb.

Hat Matrix - Puts the Hat on y. We can also express the fitted values directly in terms of the X and y matrices: ŷ = X(X′X)⁻¹X′y. We can further define H, the hat matrix, by ŷ = Hy with H = X(X′X)⁻¹X′. The hat matrix plays an important role in diagnostics for regression analysis.

Hat Matrix Properties. 1. The hat matrix is symmetric. 2. The hat matrix is idempotent, i.e. HH = H. Important idempotent matrix property: for a symmetric and idempotent matrix A, rank(A) = trace(A), the number of non-zero eigenvalues of A.
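
A quick numerical check of these properties on a simulated design matrix; the data are arbitrary, and trace(H) should equal p, the number of columns of X.

import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix H = X (X'X)^{-1} X'

print(np.allclose(H, H.T))                    # symmetric
print(np.allclose(H @ H, H))                  # idempotent: HH = H
print(np.trace(H), np.linalg.matrix_rank(H))  # trace(H) = rank(H) = p
print(np.allclose(H @ y, X @ np.linalg.lstsq(X, y, rcond=None)[0]))  # H y equals the fitted values X b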

Residuals. The residuals, like the fitted values ŷ, can be expressed as linear combinations of the response variable observations Y i: e = y − ŷ = y − Hy = (I − H)y. Also, remember e = y − ŷ = y − Xb; these are equivalent.

Covariance of the Residuals. Starting with e = (I − H)y, we see that σ²{e} = (I − H) σ²{y} (I − H)′. But σ²{y} = σ²{ɛ} = σ² I, which means that σ²{e} = σ²(I − H)I(I − H)′ = σ²(I − H)(I − H), and since I − H is symmetric and idempotent (check), we have σ²{e} = σ²(I − H).
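
A sketch checking σ²{e} = σ²(I − H) by simulation; all values are invented, and the empirical covariance of the residuals over many replications should approach σ²(I − H).

import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 20, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, -2.0, 0.5])

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H                                   # I - H, symmetric and idempotent

# Simulate many data sets and collect residual vectors e = (I - H) y
reps = 20000
E = np.empty((reps, n))
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    E[r] = M @ y

emp_cov = np.cov(E, rowvar=False)
print(np.max(np.abs(emp_cov - sigma**2 * M)))       # should be small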

ANOVA. We can express the ANOVA results in matrix form as well, starting with SSTO = Σ(Y i − Ȳ)² = Σ Y i² − (Σ Y i)²/n. In matrix terms Σ Y i² = y′y and (Σ Y i)²/n = (1/n) y′Jy, where J is the n×n matrix of all ones, leaving SSTO = y′y − (1/n) y′Jy.

SSE. Remember SSE = Σ e i² = Σ(Y i − Ŷ i)². In matrix form this is SSE = e′e = (y − Xb)′(y − Xb) = y′y − 2b′X′y + b′X′Xb; substituting b = (X′X)⁻¹X′y in the last term gives b′X′X(X′X)⁻¹X′y = b′IX′y = b′X′y, which when simplified yields SSE = y′y − b′X′y, or, substituting for b again, SSE = y′y − y′X(X′X)⁻¹X′y.

SSR. We know that SSR = SSTO − SSE, where SSTO = y′y − (1/n) y′Jy and SSE = y′y − b′X′y. From this, SSR = b′X′y − (1/n) y′Jy, and replacing b as before, SSR = y′X(X′X)⁻¹X′y − (1/n) y′Jy.
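
A sketch verifying these matrix expressions numerically on simulated data; SSTO should equal SSE + SSR and should match the usual scalar definition.

import numpy as np

rng = np.random.default_rng(5)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([3.0, 1.0, -2.0]) + rng.normal(scale=1.0, size=n)

J = np.ones((n, n))                                  # n x n matrix of ones
b = np.linalg.solve(X.T @ X, X.T @ y)

ssto = y @ y - (y @ J @ y) / n                       # y'y - (1/n) y'Jy
sse = y @ y - b @ X.T @ y                            # y'y - b'X'y
ssr = b @ X.T @ y - (y @ J @ y) / n                  # b'X'y - (1/n) y'Jy

print(np.isclose(ssto, sse + ssr))
print(np.isclose(ssto, np.sum((y - y.mean())**2)))   # matches the scalar definition of SSTO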

Quadratic Forms. The ANOVA sums of squares can be interpreted as quadratic forms. An example of a quadratic form is 5Y 1² + 6Y 1 Y 2 + 4Y 2². Note that this can be expressed in matrix notation as y′Ay with y = (Y 1, Y 2)′ and A = [[5, 3], [3, 4]], where A is always (in the case of a quadratic form) a symmetric matrix. The two off-diagonal entries must each equal half the coefficient of the cross-product term, since the Y 1 Y 2 and Y 2 Y 1 contributions are combined.

Quadratic Forms. In general, a quadratic form is defined by y′Ay = Σ_i Σ_j a ij Y i Y j, where a ij = a ji and A is the matrix of the quadratic form. The ANOVA sums SSTO, SSE and SSR can all be written as quadratic forms: SSTO = y′(I − (1/n)J)y, SSE = y′(I − H)y, SSR = y′(H − (1/n)J)y.
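
A quick numerical sketch of both claims: the small example with A = [[5, 3], [3, 4]], and the quadratic-form versions of SSTO, SSE and SSR; the data are simulated for illustration only.

import numpy as np

# Small example: 5*Y1^2 + 6*Y1*Y2 + 4*Y2^2 written as y'Ay with symmetric A
yv = np.array([2.0, -1.0])
A = np.array([[5.0, 3.0], [3.0, 4.0]])
print(np.isclose(yv @ A @ yv, 5*yv[0]**2 + 6*yv[0]*yv[1] + 4*yv[1]**2))

# ANOVA sums of squares as quadratic forms
rng = np.random.default_rng(6)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

I, J = np.eye(n), np.ones((n, n))
H = X @ np.linalg.inv(X.T @ X) @ X.T

ssto = y @ (I - J / n) @ y
sse = y @ (I - H) @ y
ssr = y @ (H - J / n) @ y
print(np.isclose(ssto, sse + ssr))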

Inference. We can derive the sampling variance of the estimator b of the β vector by remembering that b = (X′X)⁻¹X′y = Ay, where A = (X′X)⁻¹X′ is a constant matrix with A′ = X(X′X)⁻¹. Using the standard matrix covariance operator we see that σ²{b} = A σ²{y} A′.

Variance of b. Since σ²{y} = σ² I we can write σ²{b} = (X′X)⁻¹X′ σ²I X(X′X)⁻¹ = σ²(X′X)⁻¹X′X(X′X)⁻¹ = σ²(X′X)⁻¹ I = σ²(X′X)⁻¹. Of course, E(b) = E((X′X)⁻¹X′y) = (X′X)⁻¹X′E(y) = (X′X)⁻¹X′Xβ = β.
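
A simulation sketch of these two facts, checking that over repeated samples the average of b approaches β and its covariance approaches σ²(X′X)⁻¹; all numbers below are invented.

import numpy as np

rng = np.random.default_rng(7)
n, sigma = 30, 0.8
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])    # fixed design across replications
beta = np.array([2.0, -1.0, 0.5])

XtX_inv = np.linalg.inv(X.T @ X)
reps = 20000
B = np.empty((reps, 3))
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    B[r] = XtX_inv @ X.T @ y                                  # b = (X'X)^{-1} X'y

print(B.mean(axis=0))                                         # approximately beta (unbiasedness)
print(np.max(np.abs(np.cov(B, rowvar=False) - sigma**2 * XtX_inv)))  # approximately 0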