LINEAR REGRESSION MODELS W4315 HOMEWORK ANSWERS
March 9, 2010    Due: 03/04/10    Instructor: Frank Wood

1. (20 points) In order to get a maximum likelihood estimate of the parameters of a Box-Cox transformed simple linear regression model (Y_i^λ = β_0 + β_1 X_i + ε_i), we need to find the gradient of the likelihood with respect to its parameters (the gradient consists of the partial derivatives of the likelihood function w.r.t. all of the parameters). Derive the partial derivatives of the likelihood w.r.t. all parameters, assuming that ε_i ~ N(0, σ²). (N.B. the parameters here are λ, β_0, β_1, σ².)

(Extra Credit: Given this collection of partial derivatives (the gradient), how would you then proceed to arrive at final estimates of all the parameters? Hint: consider how to increase the likelihood function by making small changes in the parameter settings.)

The gradient of a multivariate function is defined to be the vector consisting of all the partial derivatives w.r.t. every single variable, so we first need to write down the full likelihood:

    L = ∏_i (1/√(2πσ²)) exp( −(y_i^λ − β_0 − β_1 x_i)² / (2σ²) )

The log-likelihood function is then

    l = −(n/2) log(2πσ²) − Σ_i (y_i^λ − β_0 − β_1 x_i)² / (2σ²)

Taking derivatives w.r.t. all four parameters, we have the following:

    ∂l/∂λ   = −(1/σ²) Σ_i (y_i^λ − β_0 − β_1 x_i) y_i^λ ln y_i        (1)
    ∂l/∂β_0 =  (1/σ²) Σ_i (y_i^λ − β_0 − β_1 x_i)                     (2)
    ∂l/∂β_1 =  (1/σ²) Σ_i (y_i^λ − β_0 − β_1 x_i) x_i                 (3)
    ∂l/∂σ²  = −n/(2σ²) + Σ_i (y_i^λ − β_0 − β_1 x_i)² / (2σ⁴)         (4)

Stacking equations (1)-(4) gives the gradient. (For the extra credit: starting from an initial guess, one can repeatedly take a small step in the direction of the gradient (gradient ascent) until the likelihood stops increasing.)
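As a sanity check on equations (1)-(4), the analytic gradient can be compared against finite differences of the log-likelihood. This is a sketch, not part of the original assignment; the data and evaluation point below are made up purely for illustration.

```python
# Numerical check of the gradient formulas (1)-(4): compare each analytic
# partial derivative of the log-likelihood with a central finite difference.
# The data here are hypothetical; y > 0 so y**lam and log(y) are defined.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.5, 5.0, 7.0, 8.5])

def loglik(lam, b0, b1, s2):
    u = y**lam - b0 - b1 * x          # residuals of the transformed model
    n = len(y)
    return -n / 2 * np.log(2 * np.pi * s2) - (u**2).sum() / (2 * s2)

def gradient(lam, b0, b1, s2):
    u = y**lam - b0 - b1 * x
    n = len(y)
    return np.array([
        -(u * y**lam * np.log(y)).sum() / s2,         # (1) dl/dlambda
        u.sum() / s2,                                 # (2) dl/dbeta0
        (u * x).sum() / s2,                           # (3) dl/dbeta1
        -n / (2 * s2) + (u**2).sum() / (2 * s2**2),   # (4) dl/dsigma^2
    ])

theta = np.array([0.7, 0.5, 1.2, 0.8])   # arbitrary evaluation point
eps = 1e-6
num = np.array([
    (loglik(*(theta + eps * e)) - loglik(*(theta - eps * e))) / (2 * eps)
    for e in np.eye(4)
])
print(np.allclose(gradient(*theta), num, atol=1e-4))  # True
```

The same `gradient` function could drive a gradient-ascent loop for the extra-credit part, stepping each parameter a small amount in the gradient direction until the log-likelihood stops increasing.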
2. (15 points)¹ Derive an extension of the Bonferroni inequality (4.2a), which is given as

    P(Ā_1 ∩ Ā_2) ≥ 1 − α − α = 1 − 2α,

for the case of three statements, each with statement confidence coefficient 1 − α.

Following the thread on page 155 in the textbook: suppose P(A_1) = P(A_2) = P(A_3) = α. Then

    P(Ā_1 ∩ Ā_2 ∩ Ā_3) = P( (A_1 ∪ A_2 ∪ A_3)^c )
                        = 1 − P(A_1 ∪ A_2 ∪ A_3)
                        = 1 − [P(A_1) + P(A_2) + P(A_3)] + [P(A_1 ∩ A_2) + P(A_1 ∩ A_3) + P(A_2 ∩ A_3)] − P(A_1 ∩ A_2 ∩ A_3)
                        ≥ 1 − P(A_1) − P(A_2) − P(A_3),

where the inequality holds because P(A_1 ∩ A_2) + P(A_1 ∩ A_3) + P(A_2 ∩ A_3) ≥ P(A_1 ∩ A_2 ∩ A_3). So we have

    P(Ā_1 ∩ Ā_2 ∩ Ā_3) ≥ 1 − P(A_1) − P(A_2) − P(A_3) = 1 − 3α.

3. (25 points)² Refer to the Consumer finance Problems 5.5 and 5.13.
a. Using matrix methods, obtain the following: (1) vector of estimated regression coefficients, (2) vector of residuals, (3) SSR, (4) SSE, (5) estimated variance-covariance matrix of b, (6) point estimate of E{Y_h} when X_h = 4, (7) s²{pred} when X_h = 4
b. From your estimated variance-covariance matrix in part (a), obtain the following: (1) s{b_0, b_1}; (2) s²{b_0}; (3) s{b_1}
c. Find the hat matrix H
d. Find s²{e}

(a) X = [1 4; 1 1; 1 2; 1 3; 1 3; 1 4];   Y = [16; 5; 10; 15; 13; 22]

    X'X = [ 6  17
           17  55 ]

    (X'X)^{-1} = (1/41)[ 55  −17
                        −17    6 ]

    (X'X)^{-1}X' = (1/41)[ −13  38  21  4  4  −13
                             7 −11  −5  1  1    7 ]

¹ This is Problem 4.22 in Applied Linear Regression Models (4th edition) by Kutner et al.
² This is Problem 5.24 in Applied Linear Regression Models (4th edition) by Kutner et al.
(1): β̂ = (X'X)^{-1}X'Y = (1/41)[ −13  38  21  4  4  −13;  7  −11  −5  1  1  7 ] [16; 5; 10; 15; 13; 22] = (1/41)[18; 189] = [0.4390; 4.6098]

(2): Residual e = Y − Xβ̂ = [16; 5; 10; 15; 13; 22] − [18.8780; 5.0488; 9.6585; 14.2683; 14.2683; 18.8780] = [−2.8780; −0.0488; 0.3415; 0.7317; −1.2683; 3.1220]

(3): SSR = Y'(H − (1/n)J)Y = 145.2073

(4): SSE = Y'(I − H)Y = 20.2927

(5): The estimated variance-covariance matrix of b: s²{b} = MSE·(X'X)^{-1} = (20.2927/4)·(1/41)[ 55  −17; −17  6 ] = [ 6.8055  −2.1035; −2.1035  0.7424 ]

(6): The point estimate of E{Y_h} at X_h = 4: Ŷ_h = X_h'b = 18.8780

(7): At X_h = 4, s²{pred} = MSE(1 + X_h'(X'X)^{-1}X_h) = 6.9292

(b) s{b_0, b_1} = −2.1035;   s²{b_0} = 6.8055;   s{b_1} = √0.7424 = 0.8616

(c) As calculated in part (a), the hat matrix is H = X(X'X)^{-1}X' =
    (1/41)[ 15  −6   1   8   8  15
            −6  27  16   5   5  −6
             1  16  11   6   6   1
             8   5   6   7   7   8
             8   5   6   7   7   8
            15  −6   1   8   8  15 ]

      = [  0.3659  −0.1463   0.0244   0.1951   0.1951   0.3659
          −0.1463   0.6585   0.3902   0.1220   0.1220  −0.1463
           0.0244   0.3902   0.2683   0.1463   0.1463   0.0244
           0.1951   0.1220   0.1463   0.1707   0.1707   0.1951
           0.1951   0.1220   0.1463   0.1707   0.1707   0.1951
           0.3659  −0.1463   0.0244   0.1951   0.1951   0.3659 ]

(d) s²{e} = MSE(I − H) =

      [  3.2171   0.7424  −0.1237  −0.9899  −0.9899  −1.8560
         0.7424   1.7323  −1.9798  −0.6187  −0.6187   0.7424
        −0.1237  −1.9798   3.7121  −0.7424  −0.7424  −0.1237
        −0.9899  −0.6187  −0.7424   4.2070  −0.8662  −0.9899
        −0.9899  −0.6187  −0.7424  −0.8662   4.2070  −0.9899
        −1.8560   0.7424  −0.1237  −0.9899  −0.9899   3.2171 ]

Matlab Code:
X=[1 4;1 1;1 2;1 3;1 3;1 4];
Y=[16;5;10;15;13;22];
J=ones(6,6);
I=eye(6,6);
[n,m]=size(Y);
Z=inv(X'*X);
H=X*Z*X';
beta=Z*X'*Y;
residual=Y-H*Y;
SSR=Y'*(H-(1/n)*J)*Y;
SSE=Y'*(I-H)*Y;
MSE=SSE/(n-2);
cov=MSE*Z;
s2_e=MSE*(I-H);
Xh=[1;4];
Yhhat=Xh'*beta;
s2_pred=MSE*(1+Xh'*Z*Xh);

4. (25 points)³ In a small-scale regression study, the following data were obtained: Assume

³ This is Problem 6.27 in Applied Linear Regression Models (4th edition) by Kutner et al.
    i:     1    2    3    4    5    6
    X_i1:  7    4   16    3   21    8
    X_i2: 33   41    7   49    5   31
    Y_i:  42   33   75   28   91   55

that regression model (6.1), which is

    Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ε_i        (5)

with independent normal error terms, is appropriate. Using matrix methods, obtain (a) b; (b) e; (c) H; (d) SSR; (e) s²{b}; (f) Ŷ_h when X_h1 = 10, X_h2 = 30; (g) s²{Ŷ_h} when X_h1 = 10, X_h2 = 30

(a) b = (X'X)^{-1}X'Y = [33.9321; 2.7848; −0.2644]

(b) e = Y − Xb = [−2.6996; −1.2300; −1.6374; −1.3299; −0.0900; 6.9868]

(c) H = X(X'X)^{-1}X' =

      [  0.2314   0.2517   0.2118   0.1489  −0.0548   0.2110
         0.2517   0.3124   0.0944   0.2663  −0.1479   0.2231
         0.2118   0.0944   0.7044  −0.3192   0.1045   0.2041
         0.1489   0.2663  −0.3192   0.6143   0.1414   0.1483
        −0.0548  −0.1479   0.1045   0.1414   0.9404   0.0163
         0.2110   0.2231   0.2041   0.1483   0.0163   0.1971 ]

(d) SSR = Y'(H − (1/n)J)Y = 3009.926

(e) s²{b} = MSE(X'X)^{-1} =

      [ 715.4711  −34.1589  −13.5949
        −34.1589    1.6617    0.6441
        −13.5949    0.6441    0.2625 ]
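The quantities in parts (a) through (e) can be cross-checked numerically. The NumPy sketch below mirrors the MATLAB code used elsewhere in these answers; the variable names are my own, and in practice np.linalg.solve or lstsq would be preferred over an explicit inverse.

```python
# Cross-check of parts (a)-(e) for the small-scale regression study.
import numpy as np

X = np.column_stack([np.ones(6),
                     [7, 4, 16, 3, 21, 8],      # X_1
                     [33, 41, 7, 49, 5, 31]])   # X_2
Y = np.array([42.0, 33, 75, 28, 91, 55])

n, p = X.shape
Z = np.linalg.inv(X.T @ X)
b = Z @ X.T @ Y                      # (a) coefficient vector
e = Y - X @ b                        # (b) residuals
H = X @ Z @ X.T                      # (c) hat matrix
J = np.ones((n, n))
SSR = Y @ (H - J / n) @ Y            # (d) regression sum of squares
SSE = Y @ (np.eye(n) - H) @ Y
MSE = SSE / (n - p)
cov_b = MSE * Z                      # (e) estimated variance-covariance of b

print(np.round(b, 4))                # approximately [33.9321, 2.7848, -0.2644]
```

Note that trace(H) equals p = 3 here, which is a handy check on the hat matrix (its diagonal entries must sum to the number of columns of X).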
(f) Ŷ_h = X_h'b = [1  10  30][33.9321; 2.7848; −0.2644] = 53.8471

(g) At X_h1 = 10 and X_h2 = 30, s²{Ŷ_h} = X_h' s²{b} X_h = 5.4246

Matlab Code:
X=[1 7 33;1 4 41;1 16 7;1 3 49;1 21 5;1 8 31];
Y=[42;33;75;28;91;55];
J=ones(6,6);
I=eye(6,6);
[n,m]=size(Y);
Z=inv(X'*X);
H=X*Z*X';
beta=Z*X'*Y;
residual=Y-H*Y;
SSR=Y'*(H-(1/n)*J)*Y;
SSE=Y'*(I-H)*Y;
MSE=SSE/(n-3);
cov=MSE*Z;
s2_e=MSE*(I-H);
Xh=[1;10;30];
Yhhat=Xh'*beta;
s2_yhat=Xh'*cov*Xh;

5. (15 points) Consider the classic regression model in matrix form, i.e. Y = Xβ + ε, where X is an n × p design matrix whose first column is an all-ones vector, ε ~ N(0, σ²I), and I is an identity matrix. Prove the following:
a. The residual sum of squares RSS = ê'ê can be written in the matrix form

    RSS = y'(I − X(X'X)^{-1}X')y        (6)

b. We call the RHS of (6) a sandwich. Prove that the matrix in the middle layer of the sandwich, N = I − X(X'X)^{-1}X', is an idempotent matrix.
c. Prove that the rank of N defined in part (b) is n − p. (N.B. p columns in the design matrix means there are p − 1 predictors plus 1 intercept term.)

Before handling the problem, make clear the dimensions of all the matrices involved.

(a) SSE = e'e = (y − Xb)'(y − Xb) = (y' − b'X')(y − Xb) = y'y − 2b'X'y + b'X'Xb. Substituting b = (X'X)^{-1}X'y into the last term gives b'X'Xb = b'X'X(X'X)^{-1}X'y = b'X'y, so

    SSE = y'y − 2b'X'y + b'X'y = y'y − b'X'y = y'(I − X(X'X)^{-1}X')y,

where the last step uses b'X' = ((X'X)^{-1}X'y)'X' = y'X(X'X)^{-1}X'.

(b) Let A = I − X(X'X)^{-1}X'. Then

    A² = AA = (I − X(X'X)^{-1}X')(I − X(X'X)^{-1}X')
            = I − 2X(X'X)^{-1}X' + X(X'X)^{-1}X'X(X'X)^{-1}X'
            = I − 2X(X'X)^{-1}X' + X(X'X)^{-1}X'
            = I − X(X'X)^{-1}X' = A.

Therefore, A is an idempotent matrix.

(c) Since A is a symmetric and idempotent matrix, rank(A) = trace(A). Let H = X(X'X)^{-1}X'. Then

    trace(A) = trace(I_{n×n} − H_{n×n}) = trace(I) − trace(H) = n − trace(H)
             = n − trace(X(X'X)^{-1}X') = n − trace((X'X)^{-1}_{p×p} X'_{p×n} X_{n×p}) = n − trace(I_{p×p}) = n − p,

using the cyclic property trace(BC) = trace(CB). So rank(A) = n − p.
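The three claims in problem 5 are easy to illustrate numerically. The sketch below uses a small randomly generated design matrix (my own choice of n = 8, p = 3, purely for illustration) and checks that N is idempotent, that its rank equals its trace equals n − p, and that y'Ny reproduces the residual sum of squares.

```python
# Numerical illustration of problem 5: N = I - X(X'X)^{-1}X' is idempotent,
# rank(N) = trace(N) = n - p, and y'Ny equals the residual sum of squares.
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3
# Design matrix with an all-ones first column, as in the problem statement.
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
y = rng.standard_normal(n)

N = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
b = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares coefficients
e = y - X @ b                            # residual vector

print(np.allclose(N @ N, N))                          # part (b): True
print(np.linalg.matrix_rank(N), round(np.trace(N)))   # part (c): both n - p = 5
print(np.isclose(y @ N @ y, e @ e))                   # part (a): True
```

Since a Gaussian random X is full rank with probability one, the rank check matches the trace exactly up to floating-point error.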