Lecture 2. The Simple Linear Regression Model: Matrix Approach


Outline: matrix algebra; matrix representation of the simple linear regression model.

Vectors and Matrices

Where it is necessary to consider a collection of numbers, e.g. daily temperatures, we gather the relevant numbers into an array. If the array is a single column or row, it is termed a vector. Matrices are arrays of rows and columns, and they are enclosed in square brackets.

For example, the matrix called X might be

X =
[ 1 0 0 0 ]
[ 1 0 0 0 ]
[ 1 1 0 0 ]
[ 1 0 1 0 ]
[ 1 0 0 1 ]
[ 1 0 0 1 ]
[ 1 0 0 1 ]

The dimensions of a matrix are denoted by the rows, followed by the columns. In this case we write X_{7,4}. Each number in the matrix is called an element.

The identity matrix I is a diagonal matrix whose elements on the leading diagonal are all 1s; every other element of I is 0, such that

I =
[ 1 0 ... 0 ]
[ 0 1 ... 0 ]
[    ...    ]
[ 0 0 ... 1 ]

We have the result IA = AI = A. In other words, I is the matrix equivalent of the number one in ordinary algebra.

Matrix Multiplication

We define a matrix product by C_{p,s} = A_{p,n} B_{n,s}. Each element of C is formed by multiplying the elements of a row of A by the corresponding elements of a column of B and summing the products.

Let's consider a simple example:

A =
[ 1 2 ]
[ 3 4 ]

B =
[ 5 ]
[ 6 ]

Then the product C = AB is given by:

C = [ 1x5 + 2x6 ]  =  [ 17 ]
    [ 3x5 + 4x6 ]     [ 39 ]
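The same calculation can be reproduced in R, where %*% denotes matrix multiplication. This is a small sketch using the matrices above; note that matrix() fills its entries column by column, so the rows of A are specified accordingly.

######## R code ########
A <- matrix(c(1, 3, 2, 4), nrow=2, ncol=2)  # rows (1 2) and (3 4)
B <- matrix(c(5, 6), nrow=2, ncol=1)
C <- A %*% B   # matrix product
print(C)       # elements 17 and 39, as above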

Matrix Representation of the Simple Linear Regression Model

The simple linear regression model was defined as

Y_i = β_0 + β_1 X_i + ε_i     (1)

where i = 1, ..., n and ε_i ~ N(0, σ²). Thus:

Y_1 = β_0 + β_1 X_1 + ε_1
Y_2 = β_0 + β_1 X_2 + ε_2
...
Y_n = β_0 + β_1 X_n + ε_n

We can define the observation vector Y, the X matrix, the vector of regression coefficients β, and the error vector ε as follows:

Y_{n,1} = [ Y_1, Y_2, ..., Y_n ]'

X_{n,2} =
[ 1 X_1 ]
[ 1 X_2 ]
[  ...  ]
[ 1 X_n ]

β_{2,1} = [ β_0, β_1 ]'

ε_{n,1} = [ ε_1, ε_2, ..., ε_n ]'

Thus (1) can be written in matrix form as

Y = Xβ + ε

where Y is a vector of responses, β is a vector of parameters (regression coefficients), and X is a matrix of constants.

Note that X is called the design matrix, and the first column of 1s in the design matrix is associated with the intercept. ε is a vector of independent normal random variables with expectation E(ε) = 0 and variance-covariance matrix

σ²(ε) =
[ σ²  0  ...  0  ]
[ 0   σ² ...  0  ]
[       ...      ]
[ 0   0  ...  σ² ]
 = σ² I

where I is the n x n identity matrix.
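As a small illustration (a sketch, using the X values from the example that follows), the design matrix can be built directly in R by binding a column of 1s to the predictor values:

######## R code ########
X <- c(1, 1, 3, 3)               # predictor values
Xmat <- cbind(1, X)              # first column of 1s for the intercept
colnames(Xmat) <- c("(Intercept)", "X")
print(Xmat)                      # a 4 x 2 design matrix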

Example: For the following simple data set, use R to fit the simple linear regression model and then represent it in matrix terms.

X: 1 1 3 3
Y: 1 3 4 6

######## R code ########
options(digits=3)
X <- c(1,1,3,3)
Y <- c(1,3,4,6)
# fit the linear model; include the argument x=TRUE
# so that the design matrix is stored with the fit
XY.lm <- lm(Y ~ X, x=TRUE)
# summary of coefficients
print(summary(XY.lm))

######## Output ########
Call:
lm(formula = Y ~ X)

Residuals:
 1  2  3  4
-1  1 -1  1

Coefficients:
            Estimate   SE     t   P(>|t|)
(Intercept)   0.50    1.58  0.32   0.78
X             1.50    0.71  2.12   0.17

Residual standard error: 1.41 on 2 df
Multiple R-squared: 0.692, Adjusted R-squared: 0.538
F-statistic: 4.5 on 1 and 2 DF, p-value: 0.168

Find Ŷ, the design matrix X, the coefficient vector β̂, and the residual vector ε̂. β̂ is the vector of coefficients estimated from the data.

######## R code ########
# print estimates of intercept and slope
print(XY.lm$coefficients)
# print the observed Y, the fitted values and the residuals
print(cbind(Y, XY.lm$fitted, XY.lm$residuals))
# print the design matrix
print(XY.lm$x)

######## Output ########
(Intercept)      X
        0.5    1.5

  Y
1 1  2 -1
2 3  2  1
3 4  5 -1
4 6  5  1

  (Intercept) X
1           1 1
2           1 1
3           1 3
4           1 3

We can thus write the fitted model in the form Y = Xβ̂ + ε̂, where

X =
[ 1 1 ]
[ 1 1 ]
[ 1 3 ]
[ 1 3 ]

β̂ = [ 0.5, 1.5 ]'

ε̂ = [ -1, 1, -1, 1 ]'

Hence, the full model for this example in matrix form becomes:

[ 1 ]   [ 1 1 ]            [ -1 ]
[ 3 ] = [ 1 1 ] [ 0.5 ]  + [  1 ]
[ 4 ]   [ 1 3 ] [ 1.5 ]    [ -1 ]
[ 6 ]   [ 1 3 ]            [  1 ]

Exercise: Use matrix multiplication to show that the fitted values, Ŷ, can be found using Ŷ = Xβ̂.

Solution:

Ŷ = Xβ̂ =
[ 1 1 ]            [ 2 ]
[ 1 1 ] [ 0.5 ]  = [ 2 ]
[ 1 3 ] [ 1.5 ]    [ 5 ]
[ 1 3 ]            [ 5 ]
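The same result can be obtained in R with %*%, using the design matrix stored in the fitted object (a sketch, assuming the XY.lm object created earlier with x=TRUE):

######## R code ########
X <- XY.lm$x                                  # stored design matrix
beta.hat <- matrix(XY.lm$coefficients, ncol=1)
Y.hat <- X %*% beta.hat                       # fitted values 2, 2, 5, 5
print(Y.hat)
print(XY.lm$fitted)                           # agrees with lm()'s fitted values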

Lecture 3. One-way ANOVA: Matrix Approach

Predictor Variable as Factor

In the previous lecture the predictor (or explanatory) variable was a quantitative variable. Suppose we now consider an experiment where the treatments correspond to different levels of a single factor. In other words, our predictor variable is a categorical variable.

Example: Let's look at the example from Lecture 2, but with a slight modification: X is now a factor.

X: A A B B
Y: 1 3 4 6

We now fit the model

Y_ij = μ_i + ε_ij

where i = 1, 2; j = 1, 2; and ε_ij ~ N(0, σ²).

We will compare two models in R: one where the intercept is included, and one where it is not.

First, fit the model with an intercept. The R default formula (Y ~ g) allows a comparison of group means. Let's produce the fitted values and the estimates for the means.

######## R code ########
X <- c("A","A","B","B")
# declare the predictor as a factor
g <- factor(X)
Y <- c(1,3,4,6)
# linear model; store the design matrix
xy.lm <- lm(Y ~ g, x=TRUE)
print(summary(xy.lm))
print(xy.lm$coefficients)
print(cbind(Y, xy.lm$fitted, xy.lm$residuals))

######## Output ########
Call:
lm(formula = Y ~ g)

Residuals:
 1  2  3  4
-1  1 -1  1

Coefficients:
            Estimate Std.Error t value Pr(>|t|)
(Intercept)    2.000     1.000   2.000    0.184
gB             3.000     1.414   2.121    0.168

Residual standard error: 1.414 on 2 DF
Multiple R-squared: 0.6923, Adjusted R-squared: 0.5385
F-statistic: 4.5 on 1 and 2 DF, p-value: 0.1679

(Intercept)  gB
          2   3

  Y
1 1  2 -1
2 3  2  1
3 4  5 -1
4 6  5  1

The fitted model in the form Ŷ = Xβ̂ is given by:

[ 2 ]   [ 1 0 ]
[ 2 ] = [ 1 0 ] [ 2 ]     (2)
[ 5 ]   [ 1 1 ] [ 3 ]
[ 5 ]   [ 1 1 ]

So the parameter estimates are

β̂ = [ μ̂_A, μ̂_B - μ̂_A ]'

The coefficients, from R, are: the mean for the base level (level A), and the difference between the mean for level B and the mean for level A. Hence, the estimate for μ_A is 2 and the estimate for μ_B is found by adding the two coefficients together: 2 + 3 = 5. Note that the default base level in R is decided alphanumerically.
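These two estimates can be recovered directly from the coefficient vector in R (a small sketch, assuming the xy.lm object fitted above):

######## R code ########
b <- xy.lm$coefficients
mu.A <- b[1]         # mean for base level A: 2
mu.B <- b[1] + b[2]  # mean for level B: 2 + 3 = 5
print(c(mu.A, mu.B))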

Now the form Y = Ŷ + ε̂ is given by

[ 1 ]   [ 2 ]   [ -1 ]
[ 3 ] = [ 2 ] + [  1 ]
[ 4 ]   [ 5 ]   [ -1 ]
[ 6 ]   [ 5 ]   [  1 ]

Now, let's exclude the intercept term. The formula (Y ~ g - 1) gives the individual group means and standard errors.

######## R code ########
# do not include an intercept
xy1.lm <- lm(Y ~ g - 1, x=TRUE)
print(summary(xy1.lm))
print(xy1.lm$coefficients)
print(cbind(Y, xy1.lm$fitted, xy1.lm$residuals))

######## Output ########
Call:
lm(formula = Y ~ g - 1)

Residuals:
 1  2  3  4
-1  1 -1  1

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
gA        2          1       2   0.1835
gB        5          1       5   0.0377

Residual standard error: 1.414 on 2 DF
Multiple R-squared: 0.9355, Adjusted R-squared: 0.871
F-statistic: 14.5 on 2 and 2 DF, p-value: 0.065

gA gB
 2  5

  Y
1 1  2 -1
2 3  2  1
3 4  5 -1
4 6  5  1

The fitted model in the form Ŷ = Xβ̂ is given by

[ 2 ]   [ 1 0 ]
[ 2 ] = [ 1 0 ] [ 2 ]
[ 5 ]   [ 0 1 ] [ 5 ]
[ 5 ]   [ 0 1 ]

So the parameters are β̂ = [ μ̂_A, μ̂_B ]', and we can obtain the individual estimates of μ_A and μ_B directly.

Compare this with (2) for the intercept model:

[ 2 ]   [ 1 0 ]
[ 2 ] = [ 1 0 ] [ 2 ]
[ 5 ]   [ 1 1 ] [ 3 ]
[ 5 ]   [ 1 1 ]

Note that the different parameterization does not affect the outcome. The form Y = Ŷ + ε̂ is the same as before:

[ 1 ]   [ 2 ]   [ -1 ]
[ 3 ] = [ 2 ] + [  1 ]
[ 4 ]   [ 5 ]   [ -1 ]
[ 6 ]   [ 5 ]   [  1 ]

Comments: Compare the design matrices for the two models:

print(xy.lm$x)   # model with intercept
  (Intercept) gB
1           1  0
2           1  0
3           1  1
4           1  1

print(xy1.lm$x)  # model without intercept
  gA gB
1  1  0
2  1  0
3  0  1
4  0  1

The first design matrix corresponds to the model Y ~ g and contains an intercept term corresponding to the base level A of factor g; the second column corresponds to the difference between level B and the base level. The second design matrix arises from the model Y ~ g - 1; it has no intercept and provides individual estimates of the means for levels A and B.

Why consider the two models? The first model, which includes an intercept (Y ~ g), allows us to test for differences between means. The model excluding the intercept term (Y ~ g - 1) provides individual estimates of the parameters but does not allow a test for differences.

Lecture 4. The General Linear Model

More complex models, where there is more than one explanatory variable (quantitative and/or qualitative). The simple linear regression model and one-way analysis of variance are special cases of the general linear model, with only one predictor variable.

The General Linear Regression Model

We will assume there are p - 1 predictor variables, X_1, X_2, ..., X_{p-1}, hence

Y_i = β_0 + β_1 X_{1,i} + β_2 X_{2,i} + ... + β_{p-1} X_{p-1,i} + ε_i     (3)

ε_i represents the random part of the model. As for simple linear regression, it is assumed that the ε_i ~ N(0, σ²) and are independently distributed. The mean response (or systematic part of the model) is then

μ_{Y_i} = β_0 + β_1 X_{1,i} + β_2 X_{2,i} + ... + β_{p-1} X_{p-1,i}.
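For example, a model with two quantitative predictors (so p = 3) is fitted in R in the same way as before. The data below are hypothetical, made up purely to illustrate the call:

######## R code ########
# hypothetical data for illustration only
X1 <- c(1, 2, 3, 4, 5, 6)
X2 <- c(2, 1, 4, 3, 6, 5)
Y  <- c(3, 4, 8, 9, 13, 14)
# fit Y = beta0 + beta1*X1 + beta2*X2 + error, storing the design matrix
fit <- lm(Y ~ X1 + X2, x=TRUE)
print(summary(fit))
print(fit$x)   # column of 1s, then the X1 and X2 columns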

When our model contains two predictor variables, we move from a straight line representation to a surface. For example, Y = β_0 + β_1 X_1 + β_2 X_2 + ε produces a flat surface.

[Figure: flat fitted surface plotted against axes X1, X2 and Y]

More complex general linear models produce twisted or curved surfaces. For example, Y = β_0 + β_1 X_1² + β_2 X_2² + ε produces a curved surface.

[Figure: curved fitted surface plotted against axes X1, X2 and Y]

Interpretation of Regression Coefficients

For simple linear regression the slope parameter, β_1, can be interpreted as the expected increase in the response variable, Y, when the predictor, X, is increased by one unit. In multiple regression, β_k is the expected change in the response when the value of X_k is increased by one unit, provided the other predictors remain unchanged.

Hence the parameters β_1, ..., β_{p-1} are called partial regression coefficients. Caution: trying to interpret partial regression coefficients by holding all other predictors constant is very dangerous when the predictor variables are correlated: a change in one predictor variable will result in changes to some (or all) of the other predictors.

Estimation of the Parameters

The parameters β_0, β_1, ..., β_{p-1} are unknown constants. The estimates will be denoted by β̂_0, β̂_1, ..., β̂_{p-1}. Hence Ŷ_i, the predicted response for the i-th observation, is given by:

Ŷ_i = β̂_0 + β̂_1 X_{1,i} + β̂_2 X_{2,i} + ... + β̂_{p-1} X_{p-1,i}

The i-th residual is then defined as

ε̂_i = observed - predicted response
    = Y_i - Ŷ_i
    = Y_i - (β̂_0 + β̂_1 X_{1,i} + β̂_2 X_{2,i} + ... + β̂_{p-1} X_{p-1,i})

To estimate σ², we use the residual mean square error, s². There are p parameters to be estimated for multiple linear regression, (β_0, β_1, ..., β_{p-1}), so s² has n - p = n - (p - 1) - 1 degrees of freedom.

Source      df
Regression  p - 1
Residual    n - p

(For simple linear regression p = 2 (β_0, β_1), so that s² has n - 2 degrees of freedom, as we saw in Chapter 1.)

Two Significance Tests for Regression Coefficients

Two types of hypothesis are of interest.

1. There is no relationship between the observed value, Y_i, and any of the predictors:

H_0: β_1 = β_2 = ... = β_{p-1} = 0
H_a: not all coefficients are equal to 0.

For this test we use the test statistic:

F = MSR / MSE ~ F_{p-1, n-p}

2. The second type of hypothesis of interest is that an individual coefficient is equal to zero. That is:

H_0: β_k = 0
H_a: β_k ≠ 0.

These hypotheses are tested using the test statistic:

T = β̂_k / se(β̂_k) ~ t_{n-p}

The t-tests are also obtained from the R output.
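Both tests appear in the summary() output of a fitted model. A sketch, continuing the hypothetical two-predictor fit from above: the F statistic reported at the bottom of the summary tests all the slopes simultaneously, and each row of the coefficient table gives the t test for an individual coefficient.

######## R code ########
# continuing the hypothetical fit: fit <- lm(Y ~ X1 + X2, x=TRUE)
s <- summary(fit)
print(s$fstatistic)     # F = MSR/MSE with (p-1, n-p) degrees of freedom
print(s$coefficients)   # estimate, std. error, t value, Pr(>|t|) for each beta
print(fit$df.residual)  # residual degrees of freedom, n - p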

Matrix Representation

The model can be written in matrix form as

Y = Xβ + ε

Note that this is the same representation we use for simple linear regression.

Here,

Y_{n,1} = [ Y_1, Y_2, ..., Y_n ]'

X_{n,p} =
[ 1  X_{1,1}  X_{2,1}  ...  X_{p-1,1} ]
[ 1  X_{1,2}  X_{2,2}  ...  X_{p-1,2} ]
[                 ...                 ]
[ 1  X_{1,n}  X_{2,n}  ...  X_{p-1,n} ]

β_{p,1} = [ β_0, β_1, ..., β_{p-1} ]'

ε_{n,1} = [ ε_1, ε_2, ..., ε_n ]'

The vectors Y and ε are the same as for the simple linear regression case. The vector β contains the extra regression coefficients corresponding to the additional predictor variables. The design matrix X contains an extra column of n observations for each of the additional predictor variables in the model.

The fitted values are represented, as before, by:

Ŷ = Xβ̂

Summary

We have already seen that simple linear regression, t-tests and one-way ANOVA are all examples of the general linear model. Other situations that we will consider in subsequent lectures include:

- models with more than one quantitative predictor variable
- models with more than one qualitative predictor (factorial designs)

- models with quantitative and qualitative predictors (sometimes called analysis of covariance)
- models with interaction terms
- polynomial regression, where the model contains squared and higher-order terms of the predictor variable(s)

All of these can be represented in matrix form as

Y = Xβ + ε