Lecture 2
The Simple Linear Regression Model: Matrix Approach
- Matrix algebra
- Matrix representation of the simple linear regression model
Vectors and Matrices
Where it is necessary to consider a collection of numbers, e.g. daily temperatures, we gather the relevant numbers into an array. If the array is a single column or row, it is termed a vector. Matrices are arrays of rows and columns, and they are enclosed in square brackets.
For example: the matrix called X might be
\[
X = \begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{bmatrix}
\]
The dimensions of the matrix are denoted by the number of rows, followed by the number of columns. In this case we write X_{7,4}. Each number in the matrix is called an element.
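As a quick illustration (not from the original slides), the same matrix can be built in R and its dimensions checked:

######## R code ########
# build the 7x4 matrix row by row (byrow = TRUE fills across rows)
X <- matrix(c(1,0,0,0,
              1,0,0,0,
              1,1,0,0,
              1,0,1,0,
              1,0,0,1,
              1,0,0,1,
              1,0,0,1), nrow = 7, byrow = TRUE)
dim(X)   # 7 4: rows first, then columns
X[3, 2]  # the element in row 3, column 2 (here 1)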
The identity matrix I is a diagonal matrix whose elements on the leading diagonal are all 1s. The other elements of I are 0s, such that
\[
I = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\]
We have the result IA = AI = A. In other words, I is the matrix equivalent of the number one in ordinary algebra.
Matrix Multiplication
We define a matrix product by C_{p,s} = A_{p,n} B_{n,s}; note that the number of columns of A must equal the number of rows of B. Matrix multiplication works by multiplying the elements of the rows of A by the elements of the columns of B and summing.
Let's consider a simple example:
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad
B = \begin{bmatrix} 5 \\ 6 \end{bmatrix}
\]
Then the product C = AB is given by:
\[
C = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\begin{bmatrix} 5 \\ 6 \end{bmatrix}
= \begin{bmatrix} 1 \cdot 5 + 2 \cdot 6 \\ 3 \cdot 5 + 4 \cdot 6 \end{bmatrix}
= \begin{bmatrix} 17 \\ 39 \end{bmatrix}
\]
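The same product can be checked in R with the matrix multiplication operator %*% (a small sketch, not in the original slides):

######## R code ########
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(5, 6), nrow = 2)  # a 2 x 1 column vector
C <- A %*% B                    # %*% performs matrix multiplication
C                               # the 2 x 1 matrix with elements 17 and 39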
Matrix Representation of the Simple Linear Regression Model
The simple linear regression model was defined as
\[
Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \qquad (1)
\]
where i = 1, ..., n and ε_i ~ N(0, σ²). Thus:
\[
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_1 + \epsilon_1 \\
Y_2 &= \beta_0 + \beta_1 X_2 + \epsilon_2 \\
&\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_n + \epsilon_n
\end{aligned}
\]
We can define the observation vector Y, the X matrix, the vector of regression coefficients β, and the residual vector ε as follows:
\[
Y_{n,1} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
X_{n,2} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}, \quad
\beta_{2,1} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad
\epsilon_{n,1} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
\]
Thus (1) can be written in matrix form as
\[
Y = X\beta + \epsilon
\]
where:
- Y is a vector of responses
- β is a vector of parameters (regression coefficients)
- X is a matrix of constants.
Note that X is called the design matrix, and the first column of 1s in the design matrix is associated with the intercept. ε is a vector of independent normal random variables with expectation E(ε) = 0 and variance-covariance matrix
\[
\sigma^2(\epsilon) = \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix} = \sigma^2 I
\]
where I is an n × n identity matrix.
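For concreteness, σ²I can be constructed in R with diag() (an illustrative sketch; the values of n and σ² below are made up):

######## R code ########
n <- 4                  # hypothetical sample size
sigma2 <- 2.5           # hypothetical error variance
V <- sigma2 * diag(n)   # diag(n) gives the n x n identity matrix
V                       # sigma^2 on the diagonal, 0 elsewhere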
Example: For the following simple data set, use R to fit the simple linear regression model and then represent it in matrix terms.

X | 1 1 3 3
Y | 1 3 4 6

######## R code ########
options(digits = 3)
X <- c(1, 1, 3, 3)
Y <- c(1, 3, 4, 6)
# fit the linear model
# include the argument "x = TRUE"
# to store the design matrix
XY.lm <- lm(Y ~ X, x = TRUE)
# summary of coefficients
print(summary(XY.lm))
######## Output ########
Call:
lm(formula = Y ~ X)

Residuals:
 1  2  3  4
-1  1 -1  1

Coefficients:
            Estimate   SE    t P(>|t|)
(Intercept)     0.50 1.58 0.32    0.78
X               1.50 0.71 2.12    0.17

Residual standard error: 1.41 on 2 df
Multiple R-Squ: 0.692, Adjusted R-squ: 0.538
F-statistic: 4.5 on 1 and 2 DF, p-value: 0.168
Find Ŷ, the design matrix X, the coefficient vector β̂, and the residual vector ε̂. β̂ is the vector of coefficients estimated from the data.

######## R code ########
# print estimates of intercept and slope
print(XY.lm$coefficients)
# print the observed Y, the fitted values
# and the residuals
print(cbind(Y, XY.lm$fitted, XY.lm$residuals))
# print the design matrix
print(XY.lm$x)
######## Output ########
(Intercept)           X
        0.5         1.5

  Y
1 1 2 -1
2 3 2  1
3 4 5 -1
4 6 5  1

  (Intercept) X
1           1 1
2           1 1
3           1 3
4           1 3
We can thus write the fitted model in the form Y = Xβ̂ + ε̂, where
\[
X = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 3 \\ 1 & 3 \end{bmatrix}, \quad
\hat{\beta} = \begin{bmatrix} 0.5 \\ 1.5 \end{bmatrix}, \quad
\hat{\epsilon} = \begin{bmatrix} -1 \\ 1 \\ -1 \\ 1 \end{bmatrix}
\]
Hence, the full model for this example in matrix form becomes:
\[
\begin{bmatrix} 1 \\ 3 \\ 4 \\ 6 \end{bmatrix}
= \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 3 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} 0.5 \\ 1.5 \end{bmatrix}
+ \begin{bmatrix} -1 \\ 1 \\ -1 \\ 1 \end{bmatrix}
\]
Exercise
Use matrix multiplication to show that the fitted values, Ŷ, can be found using Ŷ = Xβ̂.
Solution
\[
\hat{Y} = X\hat{\beta}
= \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 3 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} 0.5 \\ 1.5 \end{bmatrix}
= \begin{bmatrix} 0.5 + 1.5 \\ 0.5 + 1.5 \\ 0.5 + 4.5 \\ 0.5 + 4.5 \end{bmatrix}
= \begin{bmatrix} 2 \\ 2 \\ 5 \\ 5 \end{bmatrix}
\]
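The same check can be carried out numerically in R using the design matrix stored by x = TRUE (a sketch building on the XY.lm fit above):

######## R code ########
# multiply the stored design matrix by the estimated coefficients
Yhat <- XY.lm$x %*% XY.lm$coefficients
Yhat                                               # 2 2 5 5
all.equal(as.vector(Yhat), unname(fitted(XY.lm)))  # TRUE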
Lecture 3
One-Way ANOVA: Matrix Approach
Predictor Variable as Factor
In the previous lecture the predictor (or explanatory) variable was a quantitative variable. Suppose we now consider an experiment where the treatments correspond to different levels of a single factor. In other words, our predictor variable is a categorical variable.
Example: Let's look at the example from Lecture 2, but with a slight modification: X is now a factor.

X | A A B B
Y | 1 3 4 6

We now fit the model
\[
Y_{ij} = \mu_i + \epsilon_{ij}
\]
where i = 1, 2, j = 1, 2, and ε_ij ~ N(0, σ²).
We will compare two models in R:
- one where the intercept is included
- one where it is not.
First, fit the model with an intercept. The R default, Y ~ g, allows a comparison of group means. Let's produce the fitted values and the estimates for the means.
######## R code ########
X <- c("A", "A", "B", "B")
# declare predictor as factor
g <- factor(X)
Y <- c(1, 3, 4, 6)
# linear model
# store design matrix
XY.lm <- lm(Y ~ g, x = TRUE)
print(summary(XY.lm))
print(XY.lm$coefficients)
print(cbind(Y, XY.lm$fitted, XY.lm$residuals))
######## Output ########
Call:
lm(formula = Y ~ g)

Residuals:
 1  2  3  4
-1  1 -1  1

Coefficients:
            Estimate Std.Error     t Pr(>|t|)
(Intercept)    2.000     1.000 2.000    0.184
gB             3.000     1.414 2.121    0.168

Residual standard error: 1.414 on 2 DF
Multiple R-Squ: 0.6923, Adjusted R-squ: 0.5385
F-statistic: 4.5 on 1 and 2 DF, p-value: 0.1679

(Intercept)  gB
          2   3

  Y
1 1 2 -1
2 3 2  1
3 4 5 -1
4 6 5  1
The fitted model in the form Ŷ = Xβ̂ is given by:
\[
\begin{bmatrix} 2 \\ 2 \\ 5 \\ 5 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ 3 \end{bmatrix} \qquad (2)
\]
So the parameter estimates are
\[
\hat{\beta} = \begin{bmatrix} \hat{\mu}_A \\ \hat{\mu}_B - \hat{\mu}_A \end{bmatrix}
\]
The coefficients, from R, are:
- the mean for the base level (level A)
- the difference between the mean for level B and the mean for level A.

Hence, the estimate for μ_A is 2 and the estimate for μ_B is found by adding the two coefficients together: 2 + 3 = 5. Note that the default base level in R is decided alphanumerically.
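As an aside (not in the original slides), the base level can be changed with relevel(); a quick sketch using the g and Y objects defined above:

######## R code ########
# recover the estimate of mu_B from the intercept model
sum(XY.lm$coefficients)   # 2 + 3 = 5
# make level "B" the base level instead of the alphanumeric default "A"
g2 <- relevel(g, ref = "B")
coef(lm(Y ~ g2))          # the intercept is now the estimate of mu_B, i.e. 5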
Now the form Y = Ŷ + ε̂ is given by
\[
\begin{bmatrix} 1 \\ 3 \\ 4 \\ 6 \end{bmatrix}
= \begin{bmatrix} 2 \\ 2 \\ 5 \\ 5 \end{bmatrix}
+ \begin{bmatrix} -1 \\ 1 \\ -1 \\ 1 \end{bmatrix}
\]
Now, let's exclude the intercept term. The form Y ~ g - 1 gives the individual group means and standard errors.

######## R code ########
# do not include intercept
XY1.lm <- lm(Y ~ g - 1, x = TRUE)
print(summary(XY1.lm))
print(XY1.lm$coefficients)
print(cbind(Y, XY1.lm$fitted, XY1.lm$residuals))
######## Output ########
Call:
lm(formula = Y ~ g - 1)

Residuals:
 1  2  3  4
-1  1 -1  1

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
gA        2          1       2   0.1835
gB        5          1       5   0.0377

Residual standard error: 1.414 on 2 DF
Multiple R-Squ: 0.9355, Adjusted R-squ: 0.871
F-statistic: 14.5 on 2 and 2 DF, p-value: 0.065

gA gB
 2  5

  Y
1 1 2 -1
2 3 2  1
3 4 5 -1
4 6 5  1
The fitted model in the form Ŷ = Xβ̂ is given by
\[
\begin{bmatrix} 2 \\ 2 \\ 5 \\ 5 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ 5 \end{bmatrix}
\]
So the parameter estimates are
\[
\hat{\beta} = \begin{bmatrix} \hat{\mu}_A \\ \hat{\mu}_B \end{bmatrix}
\]
and we can obtain the individual estimates of μ_A and μ_B directly.
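These estimates are simply the group means, which can be confirmed directly (a small sketch using the g and Y vectors defined earlier):

######## R code ########
tapply(Y, g, mean)   # A: 2, B: 5, matching the no-intercept coefficients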
Compare this with (2) for the intercept model:
\[
\begin{bmatrix} 2 \\ 2 \\ 5 \\ 5 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ 3 \end{bmatrix}
\]
Note that the different parameterization does not affect the outcome. The form Y = Ŷ + ε̂ is the same as before:
\[
\begin{bmatrix} 1 \\ 3 \\ 4 \\ 6 \end{bmatrix}
= \begin{bmatrix} 2 \\ 2 \\ 5 \\ 5 \end{bmatrix}
+ \begin{bmatrix} -1 \\ 1 \\ -1 \\ 1 \end{bmatrix}
\]
Comments: Compare the design matrices for the two models:

print(XY.lm$x)   # model with intercept
  (Intercept) gB
1           1  0
2           1  0
3           1  1
4           1  1

print(XY1.lm$x)  # model without intercept
  gA gB
1  1  0
2  1  0
3  0  1
4  0  1
The first design matrix corresponds to the model Y ~ g and contains an intercept term corresponding to the base level A of factor g; its second column corresponds to the difference between level B and the base level. The second design matrix arises from the model Y ~ g - 1; it has no intercept and provides individual estimates of the means for levels A and B.
Why consider the two models?
- The first model, which includes an intercept (Y ~ g), allows us to test for differences between means.
- The model excluding the intercept term (Y ~ g - 1) provides individual estimates of the parameters but does not allow a test for differences.
Lecture 4
The General Linear Model
More complex models, where there is more than one explanatory variable (quantitative and/or qualitative). The simple linear regression model and one-way analysis of variance are special cases of the general linear model, with only one predictor variable.
The General Linear Regression Model
We will assume there are p - 1 predictor variables, X_1, X_2, ..., X_{p-1}, hence
\[
Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \ldots + \beta_{p-1} X_{p-1,i} + \epsilon_i \qquad (3)
\]
ε_i represents the random part of the model. As for simple linear regression, it is assumed that the ε_i ~ N(0, σ²) and are independently distributed. The mean response (or systematic part of the model) is then
\[
\mu_{Y_i} = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \ldots + \beta_{p-1} X_{p-1,i}.
\]
When our model contains two predictor variables, we move from a straight line representation to a surface. For example, Y = β_0 + β_1 X_1 + β_2 X_2 + ε produces a flat surface:

[Figure: flat regression plane plotted against axes X1, X2, and Y]
More complex general linear models produce twisted or curved surfaces. For example, Y = β_0 + β_1 X_1² + β_2 X_2² + ε produces:

[Figure: curved regression surface plotted against axes X1, X2, and Y]
Interpretation of regression coefficients
For simple linear regression the slope parameter, β_1, can be interpreted as the expected increase in the response variable, Y, when the predictor, X, is increased by one unit. In multiple regression, β_k is the expected change in the response when the value of X_k is increased by one unit, provided the other predictors remain unchanged.
Hence the parameters β_1, ..., β_{p-1} are called partial regression coefficients. Caution: interpreting partial regression coefficients by holding all other predictors constant is very dangerous when the predictor variables are correlated, since a change in one predictor variable will then be accompanied by changes in some (or all) of the other predictors.
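A small simulation (illustrative only, with made-up coefficients) shows the problem: when predictors are strongly correlated, they do not vary independently and the individual coefficient estimates become unstable:

######## R code ########
set.seed(1)
n <- 100
X1 <- rnorm(n)
X2 <- 0.9 * X1 + 0.1 * rnorm(n)   # X2 is strongly correlated with X1
Y  <- 2 + X1 + X2 + rnorm(n)      # hypothetical true model
cor(X1, X2)                       # close to 1
summary(lm(Y ~ X1 + X2))          # note the large standard errors on X1 and X2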
Estimation of the Parameters
The parameters β_0, β_1, ..., β_{p-1} are unknown constants. The estimates will be denoted by β̂_0, β̂_1, ..., β̂_{p-1}. Hence Ŷ_i, the predicted response for the i-th observation, is given by:
\[
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1,i} + \hat{\beta}_2 X_{2,i} + \ldots + \hat{\beta}_{p-1} X_{p-1,i}
\]
The i-th residual is then defined as
\[
\hat{\epsilon}_i = \text{observed} - \text{predicted response} = Y_i - \hat{Y}_i
= Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_{1,i} + \hat{\beta}_2 X_{2,i} + \ldots + \hat{\beta}_{p-1} X_{p-1,i})
\]
To estimate σ², we use the residual mean square error, s². There are p parameters to be estimated for multiple linear regression (β_0, β_1, ..., β_{p-1}), so s² has n - p = n - (p - 1) - 1 degrees of freedom.

Source     | df
Regression | p - 1
Residual   | n - p

(For simple linear regression p = 2 (β_0, β_1), so s² has n - 2 degrees of freedom, as we saw in Chapter 1.)
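As a sketch, s² can be computed by hand from a fitted model and compared with R's reported residual standard error (here using the XY.lm fit from Lecture 2, where n = 4 and p = 2):

######## R code ########
n <- length(XY.lm$residuals)            # 4 observations
p <- length(XY.lm$coefficients)         # 2 parameters (intercept and slope)
s2 <- sum(XY.lm$residuals^2) / (n - p)  # residual mean square error
sqrt(s2)                                # 1.41, the residual standard error on 2 df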
Two Significance Tests for Regression Coefficients
Two types of hypothesis are of interest.
1. H_0: there is no relationship between the observed value, Y_i, and any of the predictors:

H_0: β_1 = β_2 = ... = β_{p-1} = 0
H_a: not all coefficients are equal to 0.

For this test we use the test statistic:
\[
F = \frac{MSR}{MSE} \sim F_{p-1,\,n-p}
\]
2. The second type of hypothesis of interest is that an individual coefficient is equal to zero. That is:

H_0: β_k = 0
H_a: β_k ≠ 0.

These hypotheses are tested using the test statistic:
\[
T = \frac{\hat{\beta}_k}{se(\hat{\beta}_k)} \sim t_{n-p}
\]
The t-tests are also obtained from the R output.
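Both tests appear in R's summary output; as a sketch, the t statistic for the slope can be recomputed by hand (again using the XY.lm fit from Lecture 2):

######## R code ########
coefs <- summary(XY.lm)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)
t_slope <- coefs["X", "Estimate"] / coefs["X", "Std. Error"]
t_slope                                # 2.12, as in the summary output
2 * pt(abs(t_slope), df = 2, lower.tail = FALSE)  # two-sided p-value, n - p = 2 df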
Matrix Representation
The model can be written in matrix form as Y = Xβ + ε. Note that this is the same representation we use for simple linear regression.
Here,
\[
Y_{n,1} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
X_{n,p} = \begin{bmatrix}
1 & X_{1,1} & X_{2,1} & \cdots & X_{p-1,1} \\
1 & X_{1,2} & X_{2,2} & \cdots & X_{p-1,2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{1,n} & X_{2,n} & \cdots & X_{p-1,n}
\end{bmatrix}, \quad
\beta_{p,1} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}, \quad
\epsilon_{n,1} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
\]
The vectors Y and ε are the same as for the simple linear regression case. The vector β contains the extra regression coefficients corresponding to the additional predictor variables. The design matrix X contains extra columns of n observations for each of the additional predictor variables in the model.
The fitted values are represented, as before, by:
\[
\hat{Y} = X\hat{\beta}
\]
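As a closing sketch (hypothetical data with two quantitative predictors), the general model can be fitted in R and the matrix form Ŷ = Xβ̂ verified with model.matrix():

######## R code ########
set.seed(42)
n <- 10
X1 <- rnorm(n); X2 <- rnorm(n)
Y <- 1 + 2 * X1 - 0.5 * X2 + rnorm(n)  # made-up true coefficients
fit <- lm(Y ~ X1 + X2)
X <- model.matrix(fit)                 # the n x p design matrix: 1, X1, X2
Yhat <- X %*% coef(fit)                # fitted values via matrix multiplication
all.equal(as.vector(Yhat), unname(fitted(fit)))  # TRUE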
Summary
We have already seen that simple linear regression, t-tests, and one-way ANOVA are all examples of the general linear model. Other situations that we will consider in subsequent lectures include:
- models with more than one quantitative predictor variable
- models with more than one qualitative predictor (factorial designs)
- models with quantitative and qualitative predictors (sometimes called analysis of covariance)
- models with interaction terms
- polynomial regression, where the model contains squared and higher-order terms of the predictor variable(s).
All of these can be represented in matrix form as:
\[
Y = X\beta + \epsilon
\]