Concordia University
Department of Mathematics and Statistics

Course Number: Statistics 360/1        Section: 40
Examination: Mid Term Test        Date: May 26, 2004        Time: Two Hours        Pages: 3
Instructor: Y.P. Chaubey        Course Examiner: Y.P. Chaubey        Marks: 80

Special Instructions: Closed Book Exam
1. Calculators are permitted.
2. Full credit will be given only for answering questions systematically.
3. Answer ANY THREE questions.
4. Tables showing the required percentage points appear on the last page.

(5+5) Q1.
(a) Consider a simple linear model with one dependent variable (Y) and one predictor variable (X). Given the independent observations (Xi, Yi), i = 1, 2, ..., n, show that the least squares estimators of the regression parameters β0, β1 are given by

    b1 = Sxy / Sxx,        b0 = Ȳ − b1 X̄,

where

    Sxy = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ)    and    Sxx = Σ_{i=1}^n (Xi − X̄)².

(b) Shown below are the number of galleys for a manuscript (X) and the dollar cost of correcting typographical errors (Y) in a random sample of recent orders handled by a firm specializing in technical manuscripts.

    X:    7.0    12.0    4.0    14.0    25.0    30.0
    Y:  128.0   213.0   75.0   250.0   446.0   540.0

Show that the best fitted regression line is given by ŷ = 1.60 + 17.9x.
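As a numerical check on part (b), the least squares formulas from part (a) can be applied directly to the data; a minimal Python sketch (not part of the exam paper):

```python
# Check the fitted line of Q. 1(b) with the least squares
# formulas of Q. 1(a): b1 = Sxy/Sxx, b0 = Ybar - b1*Xbar.
X = [7.0, 12.0, 4.0, 14.0, 25.0, 30.0]         # galleys
Y = [128.0, 213.0, 75.0, 250.0, 446.0, 540.0]  # cost ($)

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
Sxx = sum((x - xbar) ** 2 for x in X)

b1 = Sxy / Sxx           # slope
b0 = ybar - b1 * xbar    # intercept
print(f"y = {b0:.2f} + {b1:.1f} x")  # -> y = 1.60 + 17.9 x
```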
(5+5) Q2.
(a) Express Ŷh = b0 + b1 Xh as a linear function of the Yi's, and hence deduce for a normal linear regression model that Ŷh ~ N(µh, σ²{Ŷh}), where

    µh = β0 + β1 Xh,        σ²{Ŷh} = σ² ( 1/n + (Xh − X̄)² / Sxx ).

(b) The following output was obtained in studying the relation between the asking price (X) and the selling price (Y) in a survey of 51 houses. Two new houses are available on the market with asking prices of X = $40,000. Find a 90% confidence interval for the expected selling price for such a house.

Note: the value of X̄ required in your computations may be obtained using the formulae

    s²{b1} = MSE / Sxx    and    s²{b0} = MSE ( 1/n + X̄²/Sxx ) = MSE/n + X̄² s²{b1}.

The regression equation is
y = 5076 + 0.836 x

Predictor    Coef       StDev      T        P
Constant     5076       2193        2.31    0.025
x            0.83571    0.03266    25.59    0.000

S = 5517    R-Sq = 93.0%    R-Sq(adj) = 92.9%

(5+5) Q3.
(a) Use the result of Q. 2(a) to deduce that b0 ~ N(β0, σ² (1/n + X̄²/Sxx)). Outline the steps and facts in establishing that

    (b0 − β0) / √( MSE (1/n + X̄²/Sxx) )  ~  t(n − 2).
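For the confidence interval in Q. 2(b), one route is to recover X̄ and Sxx from the printed standard errors via the formulae in the note, then form the interval at Xh = 40,000. A Python sketch of this computation (the quantile 1.677 ≈ t(0.95; 49) is taken from the tables; treat the recovered figures as illustrative):

```python
from math import sqrt

# Recover Xbar and Sxx from the printed regression output,
# using s^2{b1} = MSE/Sxx and s^2{b0} = MSE/n + Xbar^2 * s^2{b1}.
n = 51
MSE = 5517.0 ** 2          # MSE = S^2
s2_b1 = 0.03266 ** 2
s2_b0 = 2193.0 ** 2
b0, b1 = 5076.0, 0.83571

Sxx = MSE / s2_b1
xbar = sqrt((s2_b0 - MSE / n) / s2_b1)

# 90% CI for the mean selling price at Xh = 40,000
Xh = 40000.0
y_hat = b0 + b1 * Xh
s_yhat = sqrt(MSE * (1.0 / n + (Xh - xbar) ** 2 / Sxx))
t = 1.677                  # t(0.95; 49) from the tables
lo, hi = y_hat - t * s_yhat, y_hat + t * s_yhat
print(f"Xbar ~ {xbar:.0f}, 90% CI: ({lo:.0f}, {hi:.0f})")
```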
(b) For the data in Q. 1(b), an analyst fitted the simpler model Yi = βXi + εi. Test an appropriate hypothesis to justify the use of the alternative model (use α = 0.05).

(5+5) Q4.
(a) Define the sums of squares denoted by SSTO, SSR and SSE, and establish the identity

    SSTO = SSR + SSE.

Show that SSR can be computed from the formula SSR = b1 Sxy.

(b) Compute the above sums of squares for the data in Q. 1(b). Present the ANOVA table for testing the null hypothesis H0: β1 = 0 at the 5% level of significance. Please specify the alternative, the test statistic, the decision rule and the conclusion clearly.

(6+4) Q5.
(a) Explain the General Linear Test and derive the F statistic of ANOVA from the General Linear Test for testing the null hypothesis H0: β1 = 0.

(b) Consider testing the null hypothesis H0: β0 = β1. Show that the sum of squares due to error under the reduced model is given by

    SSE(r) = Σi (Yi − Ŷi(r))²,

where

    Ŷi(r) = b(Xi + 1),        b = Σi Yi(Xi + 1) / Σi (Xi + 1)².

Give the expression for the F statistic along with the corresponding degrees of freedom. Simplify the expression as much as you can.

(6+4) Q6.
(a) For a simple linear model, prove that in the sample the residuals ei = Yi − Ŷi, i = 1, ..., n, are uncorrelated with the fitted values Ŷi, i = 1, ..., n. Explain how this information is used to detect departures from the model.
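The decomposition SSTO = SSR + SSE and the ANOVA F statistic can be verified numerically on the Q. 1(b) galleys data; a short Python sketch (not part of the exam paper):

```python
# ANOVA decomposition for the Q. 1(b) data, with SSR
# computed via the formula SSR = b1 * Sxy.
X = [7.0, 12.0, 4.0, 14.0, 25.0, 30.0]
Y = [128.0, 213.0, 75.0, 250.0, 446.0, 540.0]

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
Sxx = sum((x - xbar) ** 2 for x in X)
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

SSTO = sum((y - ybar) ** 2 for y in Y)
SSR = b1 * Sxy
SSE = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))

F = (SSR / 1) / (SSE / (n - 2))   # test H0: beta1 = 0
print(f"SSTO={SSTO:.1f}  SSR={SSR:.1f}  SSE={SSE:.1f}  F={F:.0f}")
# F far exceeds F(0.95; 1, 4) = 7.71, so H0 is rejected
```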
(5+5)Q 7. (b). Let r denote the sample correlation coefficient between X and Y. Show that the t statistic for testing H 0 : β 1 = 0 can be written as t = r (n 2). 1 r 2 (a.) Consider the data setup for testing Lack of Fit of the linear regression model as Y ij = β 0 + β 1 X j + ɛ ij, j = 1,..., c; i = 1,..., n j. Define SSP E and SSLF and prove the identity SSE = SSP E + SSLF. Use the General Linear Test approach to justify the test statistic F = MSLF MSP E (b.) Consider the following data on (X, Y ). Use the lack of fit test to determine if the simple linear model is appropriate here (Use α = 0.05.) X Y 75 28, 125 160,150 150 152 200 124,104 You may use the following regression output in aid to your computations. The regression equation is ynew = 65.4 + 0.367 xnew Predictor Coef SE Coef T P Constant 65.42 72.48 0.90 0.418 x 0.3674 0.4756 0.77 0.483 S = 51.60 R-Sq = 13.0% R-Sq(adj) = 0.0% Analysis of Variance Source DF SS MS F P Regression 1 1589 1589 0.60 0.483 Residual Error 4 10649 2662 Total 5 12238 4
(5+5) Q8.
(a) Let Ŷi(f) = b0f + b1f Xi and Ŷi(r) = b0r + b1r Xi denote the fitted values of Yi under the full model and the reduced model, respectively. Prove that

    SSE(r) − SSE(f) = Σi (Ŷi(r) − Ŷi(f))².

(b) Show that the estimators of the regression parameters in a simple linear model obtained through the maximum likelihood method are the same as those obtained by the least squares method when Y1, Y2, ..., Yn are normally distributed.

(5+5) Q9.
(a) Prove that MSE = SSE/(n − 2) is unbiased for σ².

(b) Outline the steps and facts in establishing that

    (Ŷh − Yh(new)) / √( s²{Ŷh} + MSE )  ~  t(n − 2).

Use the above result in providing a 95% prediction interval for predicting the cost when the number of galleys is 10, for the data in Q. 1(b).
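The prediction interval in Q. 9(b) can be computed for the Q. 1(b) galleys data at Xh = 10; a Python sketch (the quantile t(0.975; 4) = 2.776 is taken from the tables; not part of the exam paper):

```python
from math import sqrt

# 95% prediction interval at Xh = 10 galleys, using
# s^2{pred} = MSE * (1 + 1/n + (Xh - xbar)^2 / Sxx).
X = [7.0, 12.0, 4.0, 14.0, 25.0, 30.0]
Y = [128.0, 213.0, 75.0, 250.0, 446.0, 540.0]

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
Sxx = sum((x - xbar) ** 2 for x in X)
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
MSE = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y)) / (n - 2)

Xh = 10.0
y_hat = b0 + b1 * Xh
s_pred = sqrt(MSE * (1.0 + 1.0 / n + (Xh - xbar) ** 2 / Sxx))
t = 2.776                  # t(0.975; 4) from the tables
print(f"{y_hat:.1f} +/- {t * s_pred:.1f}")
```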