Math 3330: Solution to Midterm Exam

Question 1: (14 marks) Suppose the regression model is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \ldots, n$, where the $\varepsilon_i$ are iid $N(0, \sigma^2)$.

a. (2 marks) Compute the variances of $y_i$ for $i = 1, \ldots, n$.

Solution: $\mathrm{Var}(y_i) = \mathrm{Var}(\varepsilon_i) = \sigma^2$ for $i = 1, \ldots, n$.

b. (4 marks) Suppose we have $n = 2$ observations in the observation vector $y = (y_1, y_2)'$. Compute the variances of $y$ and $Ay$, where
\[ A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}. \]

Solution: $\mathrm{Var}(y) = \sigma^2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ (2 points), $\mathrm{Var}(Ay) = \sigma^2 AA' = \sigma^2 \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$ (2 points).

c. (4 marks) Given $x = x_{new}$, when the parameters $\beta_0$, $\beta_1$ and $\sigma^2$ are known, compute the $1 - \alpha$ prediction interval.

Solution: $E(y_{new}) = \beta_0 + \beta_1 x_{new}$ (1 point), and
\[ \frac{y_{new} - E(y_{new})}{\sigma} \sim N(0, 1). \] (1 point)
So the $1 - \alpha$ prediction interval is $E(y_{new}) \pm z(1 - \alpha/2)\,\sigma$. (2 points)

d. (4 marks) Given $x = x_{new}$, when the parameters $\beta_0$, $\beta_1$ and $\sigma^2$ are unknown, compute the $1 - \alpha$ prediction interval.

Solution: Let $\hat\beta_0$ and $\hat\beta_1$ be the least squares estimates of $\beta_0$ and $\beta_1$, respectively. Then $\hat y_{new} = \hat\beta_0 + \hat\beta_1 x_{new}$. (1 point) The $1 - \alpha$ prediction interval is
\[ \hat y_{new} \pm t(1 - \alpha/2;\, n - 2) \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_{new} - \bar x)^2}{\sum (x_i - \bar x)^2} \right)}. \] (3 points)
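The results in parts b and c of Question 1 can be checked numerically. The sketch below is only an illustration: it multiplies $A$ by its transpose to confirm the covariance factor, and evaluates the known-parameter prediction interval for made-up parameter values. The helper names (`transpose`, `matmul`, `known_param_pi`) and the numbers passed to `known_param_pi` are assumptions, not part of the exam.

```python
from statistics import NormalDist


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Part b: Var(Ay) = sigma^2 * A A', so only A A' needs to be computed.
A = [[1, 0], [1, 1]]
AAt = matmul(A, transpose(A))


def known_param_pi(beta0, beta1, sigma, x_new, alpha=0.05):
    """Part c: E(y_new) +/- z(1 - alpha/2) * sigma, all parameters known."""
    mean = beta0 + beta1 * x_new
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mean - z * sigma, mean + z * sigma


# Hypothetical parameter values, chosen only to exercise the formula.
lower, upper = known_param_pi(beta0=1.0, beta1=2.0, sigma=0.5, x_new=3.0)
print(AAt, lower, upper)
```

When the parameters are unknown (part d), the normal quantile is replaced by a $t$ quantile with $n - 2$ degrees of freedom and $\sigma$ by the estimated standard error of prediction; the Python standard library has no $t$ quantile function, so that case is not sketched here.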
Question 2: (12 marks) Suppose the regression model is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. We have $n = 6$ observations. The summary statistics are as follows: $\sum y_i = 8.5$, $\sum x_i = 6$, $\sum x_i^2 = 16$, $\sum x_i y_i = 15.5$, $\sum y_i^2 = 17.25$.

a. (5 marks) Compute the least squares point estimates of $\beta_0$ and $\beta_1$.

Solution: $b_1 = 0.7$ (3 points); $b_0 = 0.7166667$ (2 points).

b. (4 marks) Calculate SSE and MSE.

Solution: $SSE = 0.3083$ (2 points), $MSE = 0.0771$ (2 points).

c. (3 marks) Use a 5% level of significance to conduct the hypothesis test of $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$.

Solution: $t = 7.973$ (1 point). If $|t| > t(0.975; 4)$, we reject $H_0$; otherwise we conclude $H_0$. (2 points) Here $|t| = 7.973 > t(0.975; 4) = 2.776$, so $H_0$ is rejected.
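The numbers in Question 2 follow from the summary statistics through the usual corrected sums $S_{xx}$, $S_{xy}$ and $S_{yy}$. The sketch below reproduces them step by step; the variable names are illustrative choices, and the formulas are the standard simple linear regression identities rather than anything specific to this course.

```python
import math

# Summary statistics given in the question (n = 6 observations).
n = 6
sum_y, sum_x = 8.5, 6.0
sum_x2, sum_xy, sum_y2 = 16.0, 15.5, 17.25

# Corrected sums of squares and cross products.
sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
syy = sum_y2 - sum_y ** 2 / n

# Part a: least squares estimates.
b1 = sxy / sxx
b0 = sum_y / n - b1 * sum_x / n

# Part b: error sum of squares and mean square (n - 2 degrees of freedom).
sse = syy - b1 * sxy
mse = sse / (n - 2)

# Part c: t statistic for H0: beta1 = 0.
t_stat = b1 / math.sqrt(mse / sxx)

print(b1, b0, sse, mse, t_stat)
```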
Question 3: (8 marks) Consider the following simple linear regression model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $i = 1, \ldots, n$. Assume that $\varepsilon_1, \ldots, \varepsilon_n$ are independent and identically distributed as $N(0, \sigma^2)$. We have the following $n = 4$ data points.

x    3    0    1    5
y    1    3    2    0

a. (3 marks) Write down $y$ and the matrix $X$ for the regression in matrix form.

Solution: $y = (1, 3, 2, 0)'$ (1 point),
\[ X = \begin{pmatrix} 1 & 3 \\ 1 & 0 \\ 1 & 1 \\ 1 & 5 \end{pmatrix}. \] (2 points)

b. (3 marks) Obtain the least squares regression line using the matrix approach.

Solution: $\hat\beta_0 = 2.7966102$ (1 point), $\hat\beta_1 = -0.5762712$ (2 points).

c. (2 marks) Report $\hat y_1$.

Solution: $\hat y_1 = 1.06779661$ (2 points).
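The matrix approach in Question 3 amounts to solving the normal equations $\hat\beta = (X'X)^{-1} X'y$. A minimal sketch in plain Python follows, inverting the $2 \times 2$ matrix $X'X$ by hand; the variable names are illustrative. For these data the slope comes out negative, since $y$ decreases as $x$ increases.

```python
# Data from the question.
x = [3.0, 0.0, 1.0, 5.0]
y = [1.0, 3.0, 2.0, 0.0]

# Design matrix: a column of ones followed by the x values.
X = [[1.0, xi] for xi in x]
rows = range(len(X))

# X'X (2 x 2) and X'y (length 2).
xtx = [[sum(X[i][r] * X[i][c] for i in rows) for c in range(2)] for r in range(2)]
xty = [sum(X[i][r] * y[i] for i in rows) for r in range(2)]

# Inverse of a 2 x 2 matrix via its determinant.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
inv = [
    [xtx[1][1] / det, -xtx[0][1] / det],
    [-xtx[1][0] / det, xtx[0][0] / det],
]

# beta_hat = (X'X)^{-1} X'y, then the fitted value at the first data point.
beta = [inv[r][0] * xty[0] + inv[r][1] * xty[1] for r in range(2)]
y_hat_1 = beta[0] + beta[1] * x[0]

print(beta, y_hat_1)
```

Here $X'X = \begin{pmatrix} 4 & 9 \\ 9 & 35 \end{pmatrix}$ and $X'y = (6, 5)'$, so the estimates are exactly $165/59$ and $-34/59$.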
Question 4: (12 marks) Consider the following partial SAS output of a simple linear regression model.

a. (8 marks) Fill in the spaces marked with ***.

Solution:

The REG Procedure
Model: MODEL1
Dependent Variable: y

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1             72.88          72.88     1.4799    0.2395
Error              18            886.39         49.244
Corrected Total    19            959.27

Root MSE    7.017407    R-Square    0.07597

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1                 6.104             4.820      1.266       0.222
X             1                -2.422             1.991     -1.217       0.239

b. (2 marks) At the 5% level of significance, is there evidence that $x$ is useful in explaining $y$?

Solution: No, since the p-value 0.239 is greater than 0.05.

c. (2 marks) Construct a 95% confidence interval for $\beta_1$.

Solution: $-2.422 \pm t(0.975; 18) \times 1.991$
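The entries of the SAS table in Question 4 are tied together by simple identities, so most of them can be recomputed from a few others. The sketch below starts from the model and total sums of squares and their degrees of freedom; that starting point is an arbitrary choice, since the solution does not show which entries were originally blank. The critical value `T_CRIT_18 = 2.101` is taken from a standard $t$ table and is an assumption not stated in the exam.

```python
import math

# Starting values read from the ANOVA table (arbitrary choice of inputs).
ss_model, ss_total = 72.88, 959.27
df_model, df_total = 1, 19

# Remaining ANOVA entries.
df_error = df_total - df_model
ss_error = ss_total - ss_model
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error
root_mse = math.sqrt(ms_error)
r_square = ss_model / ss_total

# Slope row of the parameter estimates table: t = estimate / standard error.
b1, se_b1 = -2.422, 1.991
t_b1 = b1 / se_b1

# 95% confidence interval for beta1; 2.101 is t(0.975; 18) from a t table
# (an assumed table value, not given in the exam).
T_CRIT_18 = 2.101
ci = (b1 - T_CRIT_18 * se_b1, b1 + T_CRIT_18 * se_b1)

print(df_error, ss_error, ms_error, f_value, root_mse, r_square, t_b1, ci)
```

In simple linear regression the squared $t$ value for the slope equals the F value, and the interval contains zero, which agrees with the conclusion in part b.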
Question 5: (6 marks) Suppose that you fit the model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ to 15 data points and found F equal to 55.

a. (3 marks) Do the data provide sufficient evidence to indicate that the model contributes information for the prediction of $y$? Test using a 5% level of significance. ($F(0.95; 3, 11) = 3.59$, $F(0.95; 3, 12) = 3.49$.)

Solution: The hypothesis to be tested is $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_a$: at least one $\beta_i$ differs from zero. (1 point) The error degrees of freedom are $15 - 3 - 1 = 11$. Since $F = 55 > F(0.95; 3, 11) = 3.59$, $H_0$ is rejected. There is evidence that the model contributes information for the prediction of $y$. (2 points)

b. (3 marks) Use the value of F to calculate $R^2$. Interpret its value.

Solution: Use the fact that
\[ F = \frac{R^2 / 3}{(1 - R^2) / 11}. \] (1 point)
Solving for $R^2$ gives $R^2 = 3F / (3F + 11) = 165/176 = 0.9375$ (1 point), which means the total sum of squares of deviations of the $y$-values about their mean has been reduced by 93.75% by using the linear model to predict $y$. (1 point)
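The relation between F and $R^2$ used in part b of Question 5 can be inverted in closed form. The short sketch below does that for general $k$ predictors and $n$ observations; the function name `r_squared_from_f` is a hypothetical helper introduced for illustration.

```python
def r_squared_from_f(f_value, k, n):
    """Invert F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) for R^2."""
    df_error = n - k - 1
    return k * f_value / (k * f_value + df_error)


# Values from the question: F = 55, k = 3 predictors, n = 15 data points.
r2 = r_squared_from_f(55.0, k=3, n=15)

# Round trip: plugging R^2 back into the formula should recover F.
f_back = (r2 / 3) / ((1 - r2) / 11)

print(r2, f_back)
```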
Question 6: (8 marks)

a. (2 marks) Using a simple graph, show that the assumption of constant variance does not hold for a data set.

b. (2 marks) Using a simple graph, show that a point is an influential observation.

c. (2 marks) Using a simple graph, show that the assumption of normal populations does not hold for a data set.
d. (2 marks) We fit a simple linear regression model to a data set. The residuals are given as follows:

e    0.2  -0.2  0.2  -0.2  0.2  -0.2  0.2  -0.2  0.2  -0.2  0.2  -0.2  0.2  -0.2  0.2  -0.2

Compute the Durbin-Watson statistic.
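No worked solution accompanies part d. As a hedged sketch, the Durbin-Watson statistic is $d = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, and it can be evaluated directly. The code assumes the residual sequence is 16 values alternating between 0.2 and -0.2, starting with 0.2; the first value is partly illegible in the source, so that reading is an assumption.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Assumed reading of the residuals: 0.2, -0.2 repeated 8 times (16 values).
residuals = [0.2, -0.2] * 8
dw = durbin_watson(residuals)

print(dw)
```

Under that reading each of the 15 successive differences has square 0.16 and each of the 16 residuals has square 0.04, so $d = 2.4 / 0.64 = 3.75$. A value this close to 4 points to strong negative autocorrelation, which matches the alternating signs.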