Sta. 113: Chapters 12 and 13 of Devore. March 12, 2010
Table of contents: 1 Simple Linear Regression, 2 Multiple Linear Regression
Model (Simple Linear Regression)

A simple linear regression model is given by

$$Y = \beta_0 + \beta_1 x + \varepsilon$$

where
- $Y$ is the response
- $x$ is the predictor
- $\beta_0$ is the unknown intercept of the line
- $\beta_1$ is the unknown slope of the line
- $\varepsilon \sim N(0, \sigma^2)$ is the noise with unknown variance $\sigma^2$
Model (Simple Linear Regression)

Notice that $Y$ is a random quantity due to $\varepsilon$ only:

$$E(Y) = \beta_0 + \beta_1 x, \qquad V(Y) = \sigma^2, \qquad Y \sim N(\beta_0 + \beta_1 x, \sigma^2)$$
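To make the distributional statement concrete, here is a minimal simulation sketch in Python; the parameter values, the grid of $x$ values, and the seed are all hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters, chosen only for illustration.
beta0, beta1, sigma = 1.0, 0.5, 0.3
x = np.linspace(1.0, 10.0, 20)

# Y is random only through the noise epsilon ~ N(0, sigma^2),
# so Y ~ N(beta0 + beta1 * x, sigma^2) at each fixed x.
eps = rng.normal(loc=0.0, scale=sigma, size=x.size)
y = beta0 + beta1 * x + eps
```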
Assumptions (Simple Linear Regression)

The model assumes:
- a linear underlying relationship between the response and the predictor
- normality of the random noise
- constant variance of the random noise throughout the data
- independence of the random noise
Least Squares

Find the line passing through the data points such that the sum of squared vertical distances from this line to the data points is minimized:

$$\min_{b_0, b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

Since this is a minimization problem, taking the derivatives with respect to $b_0$ and $b_1$ and setting them equal to zero results in two equations, called the normal equations:

$$n b_0 + \Big(\sum x_i\Big) b_1 = \sum y_i$$
$$\Big(\sum x_i\Big) b_0 + \Big(\sum x_i^2\Big) b_1 = \sum x_i y_i$$
Least Squares

Solving this system, we obtain

$$b_1 = \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad b_0 = \hat{\beta}_0 = \bar{y} - b_1 \bar{x}.$$
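As an illustration, the following Python sketch computes the estimates on small hypothetical data (not the lecture's example), both from the closed-form formulas above and by solving the normal equations directly; the two routes agree.

```python
import numpy as np

# Hypothetical illustration data.
x = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
y = np.array([1.1, 2.0, 2.8, 3.9, 4.7, 6.1])

n = len(x)
xbar, ybar = x.mean(), y.mean()

# Closed-form least squares estimates.
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
b0 = ybar - b1 * xbar

# Equivalently, solve the normal equations as a 2x2 linear system.
A = np.array([[n, x.sum()], [x.sum(), np.sum(x ** 2)]])
rhs = np.array([y.sum(), np.sum(x * y)])
b0_ne, b1_ne = np.linalg.solve(A, rhs)

assert np.allclose([b0, b1], [b0_ne, b1_ne])
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```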
How does LSE relate to MLE?

Notice that there is nothing probabilistic about least squares estimation. It is merely an optimization problem in which the sum of squared vertical distances from the actual points to a line is minimized. There is no underlying distributional assumption; in fact, nothing is treated as random. We just have a cloud of points and we pass a line through them.

In the beginning we made certain assumptions about the response: we said $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$. Assuming that the responses are normally distributed with mean $\beta_0 + \beta_1 x_i$ and variance $\sigma^2$ yields a likelihood over the unknown model parameters $\beta_0$, $\beta_1$, and $\sigma^2$. Maximizing this likelihood yields the MLE. It turns out that under the assumptions we made earlier, the maximum likelihood estimators for $\beta_0$ and $\beta_1$ are identical to the least squares estimators.
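The equivalence can be checked numerically: the sketch below maximizes the normal log-likelihood with a generic optimizer and compares the result to the closed-form least squares estimates, again on hypothetical data; agreement is up to optimizer tolerance.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical illustration data.
x = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
y = np.array([1.1, 2.0, 2.8, 3.9, 4.7, 6.1])

def neg_log_likelihood(theta):
    """Negative log-likelihood of Y_i ~ N(b0 + b1 * x_i, sigma^2)."""
    b0, b1, log_sigma = theta      # optimize log(sigma) to keep sigma > 0
    sigma = np.exp(log_sigma)
    resid = y - b0 - b1 * x
    return (0.5 * len(x) * np.log(2 * np.pi * sigma ** 2)
            + np.sum(resid ** 2) / (2 * sigma ** 2))

mle_b0, mle_b1, _ = minimize(neg_log_likelihood, x0=np.zeros(3)).x

# Least squares estimates for comparison.
b1_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_ls = y.mean() - b1_ls * x.mean()

print((mle_b0, mle_b1), (b0_ls, b1_ls))  # the two pairs agree
```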
Estimating the error variance

The maximum likelihood estimator for the error variance $\sigma^2$ is easily obtained as

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2}{n}.$$

Recall that this is a biased estimator for $\sigma^2$. To correct for the bias, we subtract from $n$ the number of parameters estimated prior to the estimation of $\sigma^2$. Thus, the unbiased estimator is obtained as

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2}{n - 2}.$$
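In code the two estimators differ only in the divisor. A sketch, reusing the hypothetical data and fit from above:

```python
import numpy as np

# Hypothetical illustration data and its least squares fit.
x = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
y = np.array([1.1, 2.0, 2.8, 3.9, 4.7, 6.1])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

n = len(x)
resid = y - b0 - b1 * x

sigma2_mle = np.sum(resid ** 2) / n   # biased MLE: divides by n
s2 = np.sum(resid ** 2) / (n - 2)     # unbiased: divides by n - 2,
                                      # since two parameters (b0, b1) were estimated first
```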
Example: Murder rate vs. unemployment percentage (figures and fitted-model output omitted)
The coefficient of determination, $R^2$

The coefficient of determination, denoted by $R^2$, is given by

$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.$$

It is interpreted as the proportion of observed variation in $y$ that is explained by the simple linear regression model.
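A short sketch computing $R^2$ from SSE and SST on the same hypothetical data:

```python
import numpy as np

# Hypothetical illustration data and its least squares fit.
x = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
y = np.array([1.1, 2.0, 2.8, 3.9, 4.7, 6.1])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

sse = np.sum((y - b0 - b1 * x) ** 2)  # unexplained variation
sst = np.sum((y - y.mean()) ** 2)     # total variation
r2 = 1 - sse / sst                    # proportion of variation explained
```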
Inferences about $\beta_1$

It can be shown that $b_1 = \hat{\beta}_1$ is normally distributed with mean $E(b_1) = \beta_1$ and variance $V(b_1) = \sigma^2 / S_{xx}$, where $S_{xx} = \sum (x_i - \bar{x})^2$. Thus the quantity

$$z = \frac{b_1 - \beta_1}{\sigma / \sqrt{S_{xx}}}$$

would be standard normally distributed. Since we do not know $\sigma^2$, replacing it by its estimator $s^2$ gives

$$t = \frac{b_1 - \beta_1}{s / \sqrt{S_{xx}}},$$

which has a $t$ distribution with $n - 2$ df.
Confidence interval and hypothesis test for $\beta_1$

A $100(1 - \alpha)\%$ CI for the slope $\beta_1$ of the true regression line is given by

$$b_1 \pm t_{\alpha/2, n-2} \, \frac{s}{\sqrt{S_{xx}}}.$$

We usually test the null hypothesis $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$, where the test statistic is

$$t = \frac{b_1}{s / \sqrt{S_{xx}}}.$$

Since under the null hypothesis $t$ has a $t$ distribution with $n - 2$ degrees of freedom, the null hypothesis is rejected if $t \geq t_{\alpha/2, n-2}$ or $t \leq -t_{\alpha/2, n-2}$. This can easily be turned into a one-sided test.
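A sketch of the CI and the two-sided test, using scipy.stats for the $t$ critical value and p-value; the data are the hypothetical points from before and $\alpha = 0.05$.

```python
import numpy as np
from scipy import stats

# Hypothetical illustration data and its least squares fit.
x = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
y = np.array([1.1, 2.0, 2.8, 3.9, 4.7, 6.1])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

se_b1 = s / np.sqrt(Sxx)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)  # 100(1-alpha)% CI for beta_1

t_stat = b1 / se_b1                              # test statistic for H0: beta_1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-sided p-value
```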
Inferences on $\mu_{Y \cdot x^*}$ and the prediction of future Y values (Simple Linear Regression)

Notice that once $b_0$ and $b_1$ are calculated, $b_0 + b_1 x^*$ is a point estimate of $\mu_{Y \cdot x^*}$ (the expected or true average value of $Y$ when $x = x^*$). The point estimate or prediction by itself gives no information concerning how precisely $\mu_{Y \cdot x^*}$ has been estimated or $Y$ predicted. This can be remedied by developing a CI for $\mu_{Y \cdot x^*}$ and a prediction interval (PI) for a single $Y$ value.
Inferences on $\mu_{Y \cdot x^*}$ and the prediction of future Y values (Simple Linear Regression)

A $100(1 - \alpha)\%$ CI for $\mu_{Y \cdot x^*}$, the expected value of $Y$ when $x = x^*$, is given by

$$b_0 + b_1 x^* \pm t_{\alpha/2, n-2} \, s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}.$$

A $100(1 - \alpha)\%$ PI for a future $Y$ observation to be made when $x = x^*$ is given by

$$b_0 + b_1 x^* \pm t_{\alpha/2, n-2} \, s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}.$$
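Both intervals in code, evaluated at a hypothetical new point $x^* = 7$; note that the PI differs from the CI only by the extra 1 under the square root, so it is always wider.

```python
import numpy as np
from scipy import stats

# Hypothetical illustration data and its least squares fit.
x = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
y = np.array([1.1, 2.0, 2.8, 3.9, 4.7, 6.1])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

x_star = 7.0                    # hypothetical new x value
y_hat = b0 + b1 * x_star        # point estimate / prediction
t_crit = stats.t.ppf(0.975, df=n - 2)

# CI half-width for the mean response E(Y | x = x*).
half_ci = t_crit * s * np.sqrt(1 / n + (x_star - x.mean()) ** 2 / Sxx)

# PI half-width for a single future Y at x = x* (extra 1 for the new noise term).
half_pi = t_crit * s * np.sqrt(1 + 1 / n + (x_star - x.mean()) ** 2 / Sxx)

print("CI:", (y_hat - half_ci, y_hat + half_ci))
print("PI:", (y_hat - half_pi, y_hat + half_pi))
```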
Model (Multiple Linear Regression)

A multiple linear regression model is given by

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \varepsilon$$

where
- $Y$ is the response
- $x_1, x_2, x_3, \ldots$ are the predictors
- $\beta_0, \beta_1, \beta_2, \ldots$ are unknown regression coefficients
- $\varepsilon \sim N(0, \sigma^2)$ is the noise with unknown variance $\sigma^2$
Model (Multiple Linear Regression)

When we have $n$ observations from such a model, i.e. $y = (y_1, y_2, \ldots, y_n)$, we define $X$ as the design matrix

$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$
Least Squares

The least squares solution $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p)$ is given by

$$\hat{\beta} = (X'X)^{-1} X'y.$$

Just like in the simple linear regression case, this is equivalent to the MLE under the aforementioned assumptions.
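A minimal numpy sketch on hypothetical data with two predictors: it builds the design matrix with a leading column of ones and computes $\hat{\beta}$ by solving the normal equations; np.linalg.lstsq computes the same solution more stably and is shown as a cross-check.

```python
import numpy as np

# Hypothetical data: n = 6 observations on p = 2 predictors.
X_raw = np.array([[1.0, 4.2], [2.0, 3.9], [3.0, 5.1],
                  [4.0, 4.8], [5.0, 6.0], [6.0, 5.7]])
y = np.array([3.1, 4.0, 5.6, 6.2, 7.9, 8.1])

# Design matrix: prepend a column of ones for the intercept.
X = np.column_stack([np.ones(len(y)), X_raw])

# beta_hat = (X'X)^{-1} X'y; solve() avoids forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with a numerically stabler least squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```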
Example: Cirrhosis data (figures and fitted-model output omitted)