Chapter 2 Multiple Regression I (Part 1)

Jane Jenkins

1 Regression with several predictor variables

The response $Y$ depends on several predictor variables $X_1, \dots, X_p$:

$$\overbrace{Y}^{\text{response}} \qquad \overbrace{X_1, X_2, \dots, X_p}^{\text{predictor variables}}$$

Observations (or design):

$$\begin{array}{c|c|cccc}
\text{obs} & Y & X_1 & X_2 & \cdots & X_p \\ \hline
1 & Y_1 & X_{11} & X_{12} & \cdots & X_{1p} \\
2 & Y_2 & X_{21} & X_{22} & \cdots & X_{2p} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
n & Y_n & X_{n1} & X_{n2} & \cdots & X_{np}
\end{array}$$

Thus, for individual $i$, the response is $Y_i$ and the predictor variables are $X_{i1}, X_{i2}, \dots, X_{ip}$.

2 Linear regression model with two predictor variables

The linear regression model assumes that, for any subject/individual, the response $Y_i$ and the predictors $X_{i1}, X_{i2}$ satisfy

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i,$$

where $E\varepsilon_i = 0$, or equivalently

$$E(Y_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2}.$$

Sometimes it is also written as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

where $E\varepsilon = 0$, or equivalently

$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2,$$

where $\beta_0, \beta_1, \beta_2$ are called regression coefficients: $\beta_0$ is called the intercept, $\beta_1$ the coefficient of $X_1$, and $\beta_2$ the coefficient of $X_2$.

For example (height in inches):

(Expected height of girl) = (Father's height) + 0.5 (Mother's height)
(Expected height of boy) = (Father's height) + 0.5 (Mother's height)

Meaning of the regression coefficients

$\beta_1$ indicates the change in the mean response $EY$ per unit increase in $X_1$ when $X_2$ is held constant.
$\beta_2$ indicates the change in the mean response $EY$ per unit increase in $X_2$ when $X_1$ is held constant.

Note that $X_1$ and $X_2$ usually have some correlation, so you need to know the difference between statistical and mathematical models [in a mathematical model, $X_1$ and $X_2$ can change freely; in a statistical model they may not be completely free to change independently of each other].

3 Linear regression model with p predictor variables

The linear regression model assumes that, for any subject/individual, the response $Y_i$ and the predictors $X_{i1}, \dots, X_{ip}$ satisfy

$$Y_i = \underbrace{\beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip}}_{\text{predictable}} + \underbrace{\varepsilon_i}_{\text{unpredictable}},$$

where $E\varepsilon_i = 0$, or equivalently

$$E(Y_i) = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip}.$$

This means that, for each individual, the expected value of the response is a function of the independent variables, while the actual value deviates from the expected value by the random error $\varepsilon_i$.

Sometimes the model is written as

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$$

where $E\varepsilon = 0$, or equivalently

$$E(Y) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p,$$

which is called a hyperplane, and $\beta_0, \beta_1, \dots, \beta_p$ are called regression coefficients.

Meaning of the regression coefficients

$\beta_k$ indicates the change in the mean response $EY$ per unit increase in $X_k$ when the other predictors remain constant.

The case $p = 1$ is the simple linear regression model that we have already studied.

We usually make the following assumptions:

(L) Linearity (implied in the model)
(I) Independence of error terms, thus $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ if $i \neq j$
(N) Normality of error terms: $\varepsilon_i \sim N(0, \sigma^2)$
(E) Equal/constant error variance: $\mathrm{Var}\{\varepsilon_i\} = \sigma^2$
(F) Fixed design: $X_{i1}, \dots, X_{ip}$ are known and nonrandom

There are $p+1$ coefficients $\beta_0, \dots, \beta_p$ and one common variance $\sigma^2$; together they are called the parameters of the model.

4 Some Examples

Here we give some examples that look different from the model above, or are nonlinear, but can be written as or transformed to linear regression models.

Qualitative predictor variables

Predictors are normally quantitative, but a qualitative predictor can also be included by coding it with dummy variables. For example, let $Y$ be a person's height, $X_1$ his/her father's height, $X_2$ his/her mother's height, and $S$ the gender of the person. We can denote the gender by

$$X_3 = \begin{cases} 1, & \text{if the person is male} \\ 0, & \text{if the person is female} \end{cases}$$

Then our model is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon.$$
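The dummy-variable coding can be tried out in R. The following is a minimal sketch on simulated data; the sample size, the coefficient values and the variable names (`father`, `mother`, `sex`, `male`) are assumptions made for illustration and do not come from the notes.

```r
# Simulated heights; every number below is an illustrative assumption.
set.seed(1)
n      <- 200
father <- rnorm(n, mean = 69, sd = 2.5)   # X1: father's height (inches)
mother <- rnorm(n, mean = 64, sd = 2.5)   # X2: mother's height (inches)
sex    <- sample(c("male", "female"), n, replace = TRUE)

# Dummy variable X3: 1 if the person is male, 0 if female
male <- as.numeric(sex == "male")

# Responses generated from the model with known coefficients (beta_3 = 5)
height <- 15 + 0.4 * father + 0.3 * mother + 5 * male + rnorm(n, sd = 2)

# Fit with the hand-made dummy variable
fit_dummy <- lm(height ~ father + mother + male)
coef(fit_dummy)    # the coefficient of `male` estimates beta_3

# R builds the same dummy automatically when given a factor
fit_factor <- lm(height ~ father + mother + factor(sex))
coef(fit_factor)   # "factor(sex)male" matches the `male` coefficient
```

With the factor version, R takes the alphabetically first level (`female`) as the baseline, which corresponds to $X_3 = 0$.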

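Polynomial regression models

For example,

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i,$$
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_k X_i^k + \varepsilon_i,$$
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i1}^2 + \beta_5 X_{i1} X_{i2} + \beta_6 X_{i3}^3 + \varepsilon_i,$$
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i1}^2 + \beta_5 X_{i1} X_{i2} + \beta_6 X_{i1}^4 + \varepsilon_i.$$

$X_{i1} X_{i2}$ is usually called the interaction of $X_1$ and $X_2$. How about $X_{i2} X_{i3}$?

Transformed models

After a variable transformation, the model becomes a linear regression model. Here are some examples.

(a) For the model $Y_i = a_0 \exp(\beta_1 X_{i1} + \cdots + \beta_p X_{ip})\,\xi_i$, let $Z_i = \log(Y_i)$, $\varepsilon_i = \log(\xi_i)$ and $\beta_0 = \log(a_0)$. Taking logarithms, the model becomes

$$Z_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \varepsilon_i.$$

(b) The model

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i1}^2 + \beta_5 X_{i1} X_{i2} + \beta_6 X_{i3}^3 + \varepsilon_i$$

can be written as

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 Z_{i4} + \beta_5 Z_{i5} + \beta_6 Z_{i6} + \varepsilon_i,$$

where $Z_{i4} = X_{i1}^2$, $Z_{i5} = X_{i1} X_{i2}$ and $Z_{i6} = X_{i3}^3$.

5 General linear regression model in matrix terms

Again, our general model can be written as

$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \varepsilon_i, \qquad i = 1, \dots, n,$$

or

$$\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_{11} + \cdots + \beta_p X_{1p} + \varepsilon_1, \\
Y_2 &= \beta_0 + \beta_1 X_{21} + \cdots + \beta_p X_{2p} + \varepsilon_2, \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_{n1} + \cdots + \beta_p X_{np} + \varepsilon_n
\end{aligned}$$

(with the 5 assumptions).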
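The polynomial and transformed models of Section 4 are linear in the coefficients, so they can be fitted like any other linear regression. The sketch below uses simulated data; the coefficient values and the names `z4`, `z5`, `y_mult` are assumptions chosen for illustration.

```r
# Simulated data; all numeric values are illustrative assumptions.
set.seed(2)
n  <- 100
x1 <- runif(n, 0, 5)
x2 <- runif(n, 0, 5)
y  <- 1 + 2 * x1 - 0.5 * x2 + 0.3 * x1^2 + 0.8 * x1 * x2 + rnorm(n)

# Direct specification: I() protects arithmetic inside a formula
fit_poly <- lm(y ~ x1 + x2 + I(x1^2) + I(x1 * x2))

# Equivalent specification with new predictors Z, as in example (b)
z4 <- x1^2
z5 <- x1 * x2
fit_z <- lm(y ~ x1 + x2 + z4 + z5)

# Both specifications give the same coefficient estimates
all.equal(unname(coef(fit_poly)), unname(coef(fit_z)))

# Example (a): Y = a0 * exp(b1 * x1) * xi becomes linear after taking logs
y_mult  <- 2 * exp(0.5 * x1) * exp(rnorm(n, sd = 0.1))
fit_log <- lm(log(y_mult) ~ x1)
coef(fit_log)   # intercept estimates log(a0), slope estimates b1
```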

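Let

$$\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ 1 & X_{21} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{pmatrix}, \quad
\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
\mathbf{E} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},$$

called the coefficient vector, the design matrix, the response vector and the random error vector, respectively.

It is easy to check that

$$\mathbf{X}\boldsymbol{\beta} = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ 1 & X_{21} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 X_{11} + \cdots + \beta_p X_{1p} \\ \beta_0 + \beta_1 X_{21} + \cdots + \beta_p X_{2p} \\ \vdots \\ \beta_0 + \beta_1 X_{n1} + \cdots + \beta_p X_{np} \end{pmatrix}.$$

The regression model can be written as

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{E}.$$

The (L-I-N-E) assumptions can be written as

$$E\{\mathbf{E}\} = \mathbf{0}, \qquad \mathrm{Var}\{\mathbf{E}\} = \begin{pmatrix} \sigma^2 & & \\ & \ddots & \\ & & \sigma^2 \end{pmatrix} = \sigma^2 \mathbf{I}, \qquad \mathbf{E} \sim N(\mathbf{0}, \sigma^2 \mathbf{I}).$$

6 Least squares estimation

Minimize

$$Q(b_0, \dots, b_p) = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{i1} - \cdots - b_p X_{ip})^2.$$

By calculus, we have the following $(p+1)$ normal equations (how?):

$$\begin{aligned}
\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{i1} - \cdots - b_p X_{ip}) &= 0, \\
\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{i1} - \cdots - b_p X_{ip}) X_{i1} &= 0, \\
&\;\;\vdots \\
\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{i1} - \cdots - b_p X_{ip}) X_{ip} &= 0.
\end{aligned}$$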
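As a sketch of the "(how?)" step: the normal equations follow from setting each partial derivative of $Q$ to zero. Using the same symbols as above,

```latex
\begin{align*}
\frac{\partial Q}{\partial b_0}
  &= -2 \sum_{i=1}^{n} \bigl(Y_i - b_0 - b_1 X_{i1} - \cdots - b_p X_{ip}\bigr) = 0, \\
\frac{\partial Q}{\partial b_k}
  &= -2 \sum_{i=1}^{n} \bigl(Y_i - b_0 - b_1 X_{i1} - \cdots - b_p X_{ip}\bigr) X_{ik} = 0,
  \qquad k = 1, \dots, p.
\end{align*}
```

Dividing each equation by $-2$ gives the $p+1$ normal equations, one for each coefficient $b_0, b_1, \dots, b_p$.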

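Let $\mathbf{b} = (b_0, b_1, \dots, b_p)'$. Then the normal equations can be written as

$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y}.$$

The solution, i.e. the estimator of the coefficient vector, is

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}.$$

The estimated model is

$$\hat{Y} = b_0 + b_1 X_1 + \cdots + b_p X_p.$$

Fitted values:

$$\hat{Y}_i = b_0 + b_1 X_{i1} + \cdots + b_p X_{ip}, \qquad i = 1, \dots, n.$$

(Fitted) residuals:

$$e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_{i1} + \cdots + b_p X_{ip}), \qquad i = 1, \dots, n.$$

Estimator of $\sigma^2$, denoted by $\hat{\sigma}^2$:

$$MSE = \sum_{i=1}^{n} e_i^2 \,/\, \{n - (p+1)\},$$

called the mean squared error. Why $(p+1)$? Because there are $p+1$ constraints: $p+1$ is the number of (free) coefficients, or more exactly the number of normal equations.

Dwaine Studios example

$Y$: sales; $X_1$: number of persons aged 16 or less; $X_2$: income. There are 21 observations.

[Numerical displays of $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{X}'\mathbf{X}$, $\mathbf{X}'\mathbf{Y}$ and $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ for the 21 observations; entries not recoverable.]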

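The estimated model is $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$ [numerical coefficients not recoverable].

[Table with columns obs, $X_1$, $X_2$, $Y$, fitted values $\hat{Y}_i$ and residuals $e_i$; entries not recoverable.]

$$\hat{\sigma}^2 = MSE = \frac{\sum_{i=1}^{21} e_i^2}{n - p - 1}, \qquad n - p - 1 = 21 - 3 = 18.$$

7 Unbiasedness of the estimators of coefficients

The estimator of the coefficient vector is unbiased, i.e.

$$E(\mathbf{b}) = \boldsymbol{\beta} \quad \text{and} \quad \mathrm{Var}(\mathbf{b}) = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}.$$

In detail,

$$E(b_k) = \beta_k \quad \text{and} \quad \mathrm{Var}(b_k) = \sigma^2 c_{k+1,k+1}, \qquad k = 0, 1, \dots, p,$$

where $c_{k+1,k+1}$ is the $(k+1, k+1)$th entry of $(\mathbf{X}'\mathbf{X})^{-1}$.

[Proof: Note that $E\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}$. Thus

$$E\{\mathbf{b}\} = E\{(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E\{\mathbf{Y}\} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}$$

and

$$\mathrm{Var}(\mathbf{b}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathrm{Var}(\mathbf{Y})\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\sigma^2\mathbf{I}\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}.]$$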
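The least squares formulas can be checked numerically. The sketch below solves the normal equations directly and compares the result with `lm`. The data are simulated (they are not the Dwaine Studios data), and all numeric values are assumptions for illustration.

```r
# Simulated data with n = 21 and p = 2; values are illustrative assumptions.
set.seed(3)
n  <- 21
x1 <- runif(n, 30, 90)
x2 <- runif(n, 15, 20)
y  <- 10 + 1.5 * x1 + 3 * x2 + rnorm(n, sd = 10)

X   <- cbind(1, x1, x2)          # design matrix with a column of ones
XtX <- t(X) %*% X                # X'X
XtY <- t(X) %*% y                # X'Y
b   <- solve(XtX, XtY)           # solves the normal equations X'X b = X'Y

p   <- ncol(X) - 1               # number of predictors
e   <- y - X %*% b               # residuals e = Y - Xb
mse <- sum(e^2) / (n - p - 1)    # MSE with n - (p + 1) degrees of freedom

# Compare with R's built-in fit
fit <- lm(y ~ x1 + x2)
cbind(by_hand = drop(b), from_lm = coef(fit))
c(mse_by_hand = mse, mse_from_lm = summary(fit)$sigma^2)
```

Here `solve(XtX, XtY)` is used instead of forming the inverse explicitly; both give the same $\mathbf{b}$, but solving the system is the more stable habit numerically.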

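8 Fitted values and residuals in matrix form

Fitted values:

$$\hat{\mathbf{Y}} = \begin{pmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{pmatrix} = \begin{pmatrix} b_0 + b_1 X_{11} + \cdots + b_p X_{1p} \\ b_0 + b_1 X_{21} + \cdots + b_p X_{2p} \\ \vdots \\ b_0 + b_1 X_{n1} + \cdots + b_p X_{np} \end{pmatrix} = \mathbf{X}\mathbf{b} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}.$$

Fitted residuals:

$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = (\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}')\mathbf{Y}.$$

Denote $\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ by $\mathbf{H}$; we have $\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$ and $\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$.

9 Variance-covariance matrix for residuals e

$$\mathrm{Var}\{\mathbf{e}\} = \mathrm{Var}\{(\mathbf{I} - \mathbf{H})\mathbf{Y}\} = (\mathbf{I} - \mathbf{H})\,\mathrm{Var}\{\mathbf{Y}\}\,(\mathbf{I} - \mathbf{H})'$$

$$\mathrm{Var}\{\mathbf{Y}\} = \mathrm{Var}\{\mathbf{E}\} = \sigma^2\mathbf{I}$$

$$(\mathbf{I} - \mathbf{H})' = \mathbf{I} - \mathbf{H}' = \mathbf{I} - \mathbf{H}$$

$$\mathbf{H}\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{H}$$

$$(\mathbf{I} - \mathbf{H})(\mathbf{I} - \mathbf{H}) = \mathbf{I} - 2\mathbf{H} + \mathbf{H}\mathbf{H} = \mathbf{I} - \mathbf{H}$$

Hence $\mathrm{Var}\{\mathbf{e}\} = \sigma^2(\mathbf{I} - \mathbf{H})$, which can be estimated by $\hat{\sigma}^2(\mathbf{I} - \mathbf{H})$, where

$$\hat{\sigma}^2 = MSE = \frac{\mathbf{e}'\mathbf{e}}{n - p - 1} = \frac{\mathbf{Y}'(\mathbf{I} - \mathbf{H})\mathbf{Y}}{n - p - 1}$$

and $E\hat{\sigma}^2 = E(MSE) = \sigma^2$. [The proof can be ignored.]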
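The properties of $\mathbf{H}$ used above (symmetry and $\mathbf{H}\mathbf{H} = \mathbf{H}$) can be verified numerically. This is a small sketch on simulated data; the data-generating values are assumptions for illustration.

```r
# Simulated data; values are illustrative assumptions.
set.seed(4)
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)
X  <- cbind(1, x1, x2)

H   <- X %*% solve(t(X) %*% X) %*% t(X)   # H = X (X'X)^{-1} X'
I_n <- diag(n)                            # n x n identity matrix

fit <- lm(y ~ x1 + x2)

all.equal(drop(H %*% y), unname(fitted(fit)))          # Y-hat = H Y
all.equal(drop((I_n - H) %*% y), unname(resid(fit)))   # e = (I - H) Y
all.equal(H, t(H))                                     # H is symmetric
all.equal(H %*% H, H)                                  # H H = H
all.equal((I_n - H) %*% (I_n - H), I_n - H)            # (I - H)(I - H) = I - H
```

Each `all.equal` call should return `TRUE` up to floating-point rounding.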

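10 Variance-covariance matrix for b

Recall $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, so

$$\mathrm{Var}\{\mathbf{b}\} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathrm{Var}\{\mathbf{Y}\}\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1},$$

where $\sigma^2$ can be estimated by $\hat{\sigma}^2 = MSE$. In other words, we estimate $\mathrm{Var}\{\mathbf{b}\}$ by $\hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1}$, denoted

$$s^2\{\mathbf{b}\} = \hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1}.$$

For the above example,

$$MSE = \frac{SSE}{n - p - 1} = \frac{\mathbf{e}'\mathbf{e}}{n - p - 1}, \qquad s^2\{\mathbf{b}\} = 121.16\,(\mathbf{X}'\mathbf{X})^{-1}$$

[numerical entries of $s^2\{\mathbf{b}\}$ not recoverable].

11 The distribution of estimators

If $\mathbf{E} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$ (i.e. the $\varepsilon_i$ are IID $N(0, \sigma^2)$), then the estimated coefficients satisfy

$$\mathbf{b} \sim N(\boldsymbol{\beta}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}).$$

Denote the $(i, j)$th entry of $(\mathbf{X}'\mathbf{X})^{-1}$ by $c_{ij}$; then

$$b_k \sim N(\beta_k, \sigma^2 c_{k+1,k+1}), \qquad k = 0, 1, \dots, p$$

(where $\mathbf{b} = (b_0, b_1, \dots, b_p)'$).

Let $s(b_k) = \sqrt{MSE \cdot c_{k+1,k+1}}$, called the standard error (SE) of $b_k$ (which can be found in the output of R). Then

$$\frac{b_k - \beta_k}{s(b_k)} \sim t(n - p - 1).$$

t-value:

$$t = \frac{b_k}{s(b_k)}.$$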
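The standard errors and t-values reported by R can be reproduced from $s^2\{\mathbf{b}\} = MSE\,(\mathbf{X}'\mathbf{X})^{-1}$. The sketch below uses simulated data (not the Dwaine Studios data); the numeric values and variable names are assumptions for illustration.

```r
# Simulated data with n = 21 and p = 2; values are illustrative assumptions.
set.seed(5)
n  <- 21
x1 <- runif(n, 30, 90)
x2 <- runif(n, 15, 20)
y  <- 10 + 1.5 * x1 + 3 * x2 + rnorm(n, sd = 10)
X  <- cbind(1, x1, x2)
p  <- ncol(X) - 1

fit <- lm(y ~ x1 + x2)
b   <- coef(fit)
mse <- sum(resid(fit)^2) / (n - p - 1)

C     <- solve(t(X) %*% X)    # (X'X)^{-1}, with entries c_ij
s2_b  <- mse * C              # estimated variance-covariance matrix of b
se_b  <- sqrt(diag(s2_b))     # s(b_k) = sqrt(MSE * c_{k+1,k+1})
t_val <- b / se_b             # t-values for H0: beta_k = 0

# Side-by-side with the columns printed by summary()
cbind(se_by_hand = se_b, t_by_hand = t_val,
      coef(summary(fit))[, c("Std. Error", "t value")])
```

The matrix `s2_b` should also agree with `vcov(fit)`, which is how R exposes the estimated variance-covariance matrix of the coefficients.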

12 Confidence interval for β_k

With $1 - \alpha$ confidence, the confidence interval for $\beta_k$ is

$$[\,b_k - s(b_k)\,t(1 - \alpha/2,\, n - p - 1),\;\; b_k + s(b_k)\,t(1 - \alpha/2,\, n - p - 1)\,].$$

For the Dwaine Studios example, the 95% confidence interval for $\beta_2$ is $[0.83, 17.90]$, where the quantile (critical value) $t(1 - \alpha/2, n - p - 1) = t(0.975, 21 - 3) = 2.101$ is used.

13 Test for β_k = 0

Our hypothesis is

$$H_0: \beta_k = 0, \qquad H_a: \beta_k \neq 0.$$

Under $H_0$,

$$t = \frac{b_k - \beta_k}{s(b_k)} = \frac{b_k}{s(b_k)} \sim t(n - p - 1).$$

For significance level $\alpha$, our criterion is:

If the calculated $|t| > t(1 - \alpha/2, n - p - 1)$, reject $H_0$.
If the calculated $|t| \le t(1 - \alpha/2, n - p - 1)$, accept (i.e. do not reject) $H_0$.

Similarly, we can do the test based on the p-value:

If p-value $< \alpha$, reject $H_0$.
If p-value $\ge \alpha$, accept (i.e. do not reject) $H_0$.

For the Dwaine Studios example, test $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$ with significance level 5%. Since $|t| = 6.868 > t(1 - \alpha/2, n - p - 1) = 2.101$, we reject $H_0$ (in other words, $\beta_1$ is significantly different from 0). Similarly, $H_0: \beta_0 = 0$ can be accepted, and $H_0: \beta_2 = 0$ should be rejected.
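The interval and the two test criteria can be computed step by step in R. This is a minimal sketch on simulated data with the same dimensions as the example ($n = 21$, $p = 2$); the data themselves are assumptions and are not the Dwaine Studios data.

```r
# Simulated data with n = 21 and p = 2; values are illustrative assumptions.
set.seed(6)
n  <- 21
x1 <- runif(n, 30, 90)
x2 <- runif(n, 15, 20)
y  <- 10 + 1.5 * x1 + 3 * x2 + rnorm(n, sd = 10)
fit   <- lm(y ~ x1 + x2)
p     <- 2
alpha <- 0.05

tab   <- coef(summary(fit))
b     <- tab[, "Estimate"]
se_b  <- tab[, "Std. Error"]
tcrit <- qt(1 - alpha / 2, df = n - p - 1)   # t(0.975, 18), about 2.101

# Confidence intervals b_k -/+ s(b_k) * t(1 - alpha/2, n - p - 1)
ci_by_hand <- cbind(lower = b - se_b * tcrit, upper = b + se_b * tcrit)
ci_by_hand
confint(fit, level = 1 - alpha)              # the same intervals from R

# Two-sided test of H0: beta_k = 0, by critical value and by p-value
reject_t <- abs(tab[, "t value"]) > tcrit
reject_p <- tab[, "Pr(>|t|)"] < alpha
cbind(reject_t, reject_p)
```

The two rejection rules are equivalent, so the columns `reject_t` and `reject_p` agree for every coefficient.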

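14 Prediction

For any new individual with predictor values $\mathbf{x}_{new} = (x_1, \dots, x_p)$, the predicted mean response is

$$\hat{Y}_{new} = \mathbf{x}'\mathbf{b}, \qquad \text{where } \mathbf{x} = (1, x_1, \dots, x_p)'.$$

We have $E\hat{Y}_{new} = EY_{new}$.

Note that if normal errors are assumed, i.e. the $\varepsilon_i$ are IID $N(0, \sigma^2)$, then

$$\hat{Y}_{new} \sim N\bigl(EY_{new},\; \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}\,\sigma^2\bigr).$$

Let

$$s^2(\hat{Y}_{new}) = \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}\,\hat{\sigma}^2 = \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}\,MSE.$$

We have

$$\frac{\hat{Y}_{new} - EY_{new}}{s(\hat{Y}_{new})} \sim t(n - p - 1).$$

With confidence $100(1 - \alpha)\%$, the CI for $E(Y_{new})$ is

$$[\,\hat{Y}_{new} - s(\hat{Y}_{new})\,t(1 - \alpha/2,\, n - p - 1),\;\; \hat{Y}_{new} + s(\hat{Y}_{new})\,t(1 - \alpha/2,\, n - p - 1)\,].$$

What about the prediction interval (PI) for the value $Y_{new}$ itself? With confidence $100(1 - \alpha)\%$, the PI for $Y_{new}$ is

$$[\,\hat{Y}_{new} - s(pred)\,t(1 - \alpha/2,\, n - p - 1),\;\; \hat{Y}_{new} + s(pred)\,t(1 - \alpha/2,\, n - p - 1)\,],$$

where

$$s^2(pred) = MSE + s^2(\hat{Y}_{new}) = MSE\,\{1 + \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}\}.$$

15 R code

    # fit the model with p predictors
    regression = lm(y ~ x1 + x2 + ... + xp)
    summary(regression)
    # new predictor values, one column per predictor
    Xnew = data.frame(x1 = c(), x2 = c(), ..., xp = c())
    # use interval = "confidence" for E(Y_new), interval = "prediction" for Y_new
    predict(regression, Xnew, interval = "confidence", level = 0.95)
    predict(regression, Xnew, interval = "prediction", level = 0.95)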
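A runnable version of the commands above, together with the interval formulas of Section 14 computed by hand, might look as follows. This is a sketch on simulated data; the data, the new point (`x1 = 65`, `x2 = 17`) and the names `cint` and `pint` are assumptions for illustration.

```r
# Simulated data with n = 21 and p = 2; values are illustrative assumptions.
set.seed(7)
n   <- 21
x1  <- runif(n, 30, 90)
x2  <- runif(n, 15, 20)
y   <- 10 + 1.5 * x1 + 3 * x2 + rnorm(n, sd = 10)
dat <- data.frame(y, x1, x2)
fit <- lm(y ~ x1 + x2, data = dat)

Xnew  <- data.frame(x1 = 65, x2 = 17)   # hypothetical new individual
x     <- c(1, 65, 17)                   # x = (1, x_1, x_2)'
X     <- model.matrix(fit)              # design matrix of the fitted model
mse   <- summary(fit)$sigma^2
tcrit <- qt(0.975, df = n - 2 - 1)

y_hat  <- sum(x * coef(fit))                       # x'b
q      <- drop(t(x) %*% solve(t(X) %*% X) %*% x)   # x'(X'X)^{-1} x
s_mean <- sqrt(mse * q)                            # s(Y-hat_new)
s_pred <- sqrt(mse * (1 + q))                      # s(pred)

cint <- c(y_hat - tcrit * s_mean, y_hat + tcrit * s_mean)   # CI for E(Y_new)
pint <- c(y_hat - tcrit * s_pred, y_hat + tcrit * s_pred)   # PI for Y_new

# Compare with predict()
rbind(by_hand = cint,
      predict(fit, Xnew, interval = "confidence", level = 0.95)[, c("lwr", "upr")])
rbind(by_hand = pint,
      predict(fit, Xnew, interval = "prediction", level = 0.95)[, c("lwr", "upr")])
```

The prediction interval is always wider than the confidence interval at the same point, because $s^2(pred)$ adds the error variance $MSE$ to $s^2(\hat{Y}_{new})$.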
