SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester



RESTRICTED OPEN BOOK EXAMINATION
(Not to be removed from the examination hall)

Data provided: "Statistics Tables" by H.R. Neave

PAS 371

Linear Models
2 hours

Marks will be awarded for your best three answers.

RESTRICTED OPEN BOOK EXAMINATION: Candidates may bring to the examination lecture notes and associated lecture material (but no textbooks) plus a calculator that conforms to University regulations.

There are 99 marks available on the paper.

Please leave this exam paper on your desk. Do not remove it from the hall.

Registration number from U-Card (9 digits), to be completed by student:


1 Four objects $O_1, O_2, O_3, O_4$ are weighed in a balance. Four weighings are made; the $(i, j)$-th element in the matrix below is $+1$ if object $O_i$ is placed in the left pan of the balance for weighing $j$, and it is $-1$ if it is placed in the right pan. We are required to estimate the weights of the four objects, given the weights $y_j$ required in the right pan to achieve balance ($j = 1, 2, 3, 4$):

(i) Formulate a regression model for this problem, expressing the observed weights $y_i$ in terms of the unknown weights $\beta_j$ of the four objects. (7 marks)

(ii) Showing your working explicitly, obtain expressions for the least-squares estimators of the weights of the four objects. (8 marks)

(iii) Evaluate these estimates for data $y_1 = 20.2$, $y_2 = 8.0$, $y_3 = 9.7$, $y_4 = 1.9$. (3 marks)

(iv) The whole experiment is now replicated $n$ times. Showing your working explicitly, calculate the new least-squares estimates. (15 marks)
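As a rough illustration of the mechanics behind parts (ii) and (iii), the sketch below solves the normal equations $(X^T X)\hat{\beta} = X^T y$ for a weighing design. The matrix `M` is a hypothetical $\pm 1$ design chosen only for illustration and is an assumption, not the matrix from the question; the helper name `least_squares` is likewise invented here. Only the values of `y` come from part (iii).

```python
# Hypothetical weighing matrix (an assumption, NOT taken from the exam paper):
# M[i][j] = +1 if object i is in the left pan for weighing j, -1 if in the right pan.
M = [
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
]

# Observed balancing weights quoted in part (iii).
y = [20.2, 8.0, 9.7, 1.9]


def least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
        + [sum(X[k][i] * y[k] for k in range(n))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; the weights are not estimable")
        A[col], A[pivot] = A[pivot], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Row j of the design matrix X describes weighing j, so X is the transpose of M.
X = [[M[i][j] for i in range(4)] for j in range(4)]
beta_hat = least_squares(X, y)
print(beta_hat)
```

For this particular hypothetical design the columns of $X$ are orthogonal, so $X^T X = 4I$ and the estimator reduces to $\hat{\beta} = X^T y / 4$; a different design matrix would need the general solve.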

2 In an agricultural experiment, measurements are collected on the volume (in mm$^3$), height (in cm) and diameter (in mm) at 4.5 ft above ground level for a sample of 31 black cherry trees in the Allegheny National Forest, Pennsylvania, USA. Denote the volume by $y$, the height by $h$ and the diameter by $d$. The following model was considered:

$$y_i = \alpha + \beta h_i + \gamma d_i + \delta d_i^2 + \epsilon_i.$$

(i) Write down the fitted model and discuss the suitability of this model, based on the S-Plus output given below. (6 marks)

    Coefficients:
                   Value  Std. Error  t value  Pr(>|t|)
    (Intercept)
    Height
    Diameter
    I(Diameter^2)

    Residual standard error: on 27 degrees of freedom
    Multiple R-Squared:
    F-statistic: on 3 and 27 degrees of freedom, the p-value is 0

(ii) Some further analysis showed that the standardized residuals and the diagonal elements of the hat matrix were as follows.

    Standardized residuals
    Hat values

(a) Calculate approximate variances of the non-standardized residuals $e_1$, $e_2$, $e_3$, $e_{30}$ and $e_{31}$. (5 marks)

(b) Using the standardized residuals, use an appropriate test to find out whether there are any outliers, and comment. (7 marks)

(c) Calculate the Cook's distance for observation $y_i$, with $i = 1, 2, 3, 30, 31$, and check whether these are influential observations. (10 marks)

(d) Summarize your conclusions from (b) and (c) and make recommendations for any further analysis. (5 marks)
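Parts (ii)(a) and (ii)(c) rest on two standard results: the raw residual $e_i$ has variance $\sigma^2 (1 - h_{ii})$, estimated by $s^2 (1 - h_{ii})$, and Cook's distance can be written as $D_i = r_i^2 h_{ii} / \{p (1 - h_{ii})\}$, where $r_i$ is the standardized residual and $p$ is the number of regression parameters (here $p = 4$). A minimal sketch of the arithmetic follows. The function names and the numerical inputs are made-up placeholders for illustration, not values from the S-Plus output, and any cut-off for "influential" should follow the lecture notes.

```python
def residual_variance(s, h):
    """Approximate variance of the raw residual e_i: s^2 * (1 - h_ii)."""
    return s ** 2 * (1.0 - h)


def cooks_distance(r, h, p):
    """Cook's distance from the standardized residual r_i and hat value h_ii."""
    return r ** 2 * h / (p * (1.0 - h))


P = 4  # parameters alpha, beta, gamma, delta in the fitted model

# Hypothetical inputs (assumptions, not exam data):
# residual standard error s, standardized residual r_i, hat value h_ii.
s_example, r_example, h_example = 2.0, 1.5, 0.2

var_e = residual_variance(s_example, h_example)
d_i = cooks_distance(r_example, h_example, P)
print(var_e, d_i)
```

The same two functions can be applied to each of observations 1, 2, 3, 30 and 31 once the residual standard error, standardized residuals and hat values are read off the output.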

3 The following table shows five observations of a response variable $y$ and two explanatory variables $x$ and $z$.

    y_i    x_i    z_i

(i) It is initially suggested that the following linear model is considered:

$$y_i = \alpha + \beta x_i + \gamma z_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$

Show that this model is overparameterised. Describe briefly the phenomenon that is behind this particular form of overparameterisation. How can overparameterisation be resolved for this particular data set? (8 marks)

(ii) Now consider the alternative model

$$y_i = \alpha + \beta x_i + \gamma x_i^2 + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$

Find the least squares estimate of $\boldsymbol{\beta} = (\alpha, \beta, \gamma)^T$ and provide 95% confidence intervals for $\gamma$ and for $\sigma^2$.

HINT: you can make use of the following inverse matrix result:

(20 marks)

(iii) Without doing any further calculations, with the information given in (i) and (ii), suggest a suitable model and give brief explanations. (5 marks)

4 (i) Explain briefly the role of the Akaike information criterion (AIC), and the S-Plus command step, in model reduction. (5 marks)

(ii) In a study of timber, the volume $v$ of usable timber when a tree is felled is studied in terms of the height $h$ and girth $g$ of the tree.

(a) In a polynomial regression, terms in $h$, $g$, $h^2$, $hg$, $g^2$ are introduced. When the term $hg^2$ is then introduced, it is found that this term is rejected as AIC is increased, not reduced. Why might this be? (5 marks)

(b) How would you decide between the model using $hg^2$ and that using $\{h, g, h^2, hg, g^2\}$, neither of which is nested within the other? (5 marks)

(iii) In a study of loss through evaporation in a petrol tank, the loss $y$ is measured in grams. The regressors thought relevant are:

$x_1$: the initial tank temperature (°F),
$x_2$: the temperature of the petrol when dispensed (°F),
$x_3$: the initial vapour pressure in the tank (pounds per square inch),
$x_4$: the vapour pressure of the petrol when dispensed (pounds per square inch).

The data set consists of 32 points $(y, x_1, x_2, x_3, x_4)$. An initial regression study suggests that interaction terms are not needed. The attached S-Plus output relates to three models under consideration.

(a) Discuss briefly the initial model; (5 marks)

(b) Discuss briefly the intermediate model; (5 marks)

(c) Discuss briefly the final model. (5 marks)

(d) Summarise your conclusions for use by the petrol company commissioning the study. (3 marks)

    Call: lm(formula = y ~ x1 + x2 + x3 + x4)
    Residuals:
     Min  1Q  Median  3Q  Max

    Coefficients:
                 Value  Std. Error  t value  Pr(>|t|)
    (Intercept)
    x1
    x2
    x3
    x4

    Residual standard error: on 27 degrees of freedom
    Multiple R-Squared:
    F-statistic: on 4 and 27 degrees of freedom, the p-value is 9.215e-015

**Using stepwise regression**

    > step(lm(y ~ x1 + x2 + x3 + x4))
    Start: AIC=
     y ~ x1 + x2 + x3 + x4

    Single term deletions

    Model:
    y ~ x1 + x2 + x3 + x4

    scale:

            Df  Sum of Sq  RSS  Cp
    <none>
    x1
    x2
    x3
    x4

    Step: AIC=
     y ~ x2 + x3 + x4

    Single term deletions

    Model:
    y ~ x2 + x3 + x4

    scale:

            Df  Sum of Sq  RSS  Cp
    <none>
    x2
    x3
    x4

    Call: lm(formula = y ~ x2 + x3 + x4)
    Residuals:
     Min  1Q  Median  3Q  Max

    Coefficients:
                 Value  Std. Error  t value  Pr(>|t|)
    (Intercept)
    x2
    x3
    x4

    Residual standard error: on 28 degrees of freedom
    Multiple R-Squared:
    F-statistic: on 3 and 28 degrees of freedom, the p-value is 8.882e-016

    Call: lm(formula = y ~ x2 + x4)
    Residuals:
     Min  1Q  Median  3Q  Max

    Coefficients:
                 Value  Std. Error  t value  Pr(>|t|)
    (Intercept)
    x2
    x4

    Residual standard error: on 29 degrees of freedom
    Multiple R-Squared:
    F-statistic: 151 on 2 and 29 degrees of freedom, the p-value is 4.441e-016

End of Question Paper
