EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

HIGHER CERTIFICATE IN STATISTICS, 2017

MODULE 4 : Linear models

Time allowed: One and a half hours

Candidates should answer THREE questions. Each question carries 20 marks. The number of marks allotted for each part-question is shown in brackets.

Graph paper and Official tables are provided.

Candidates may use calculators in accordance with the regulations published in the Society's "Guide to Examinations" (document Ex1).

The notation log denotes logarithm to base e. Logarithms to any other base are explicitly identified, e.g. log10.

Note also that $\binom{n}{r}$ is the same as ${}^{n}C_{r}$.

HC Module 4 2017

This examination paper consists of 8 printed pages. This front cover is page 1. Question 1 starts on page 3.

There are 4 questions altogether in the paper.

RSS 2017
1. A gas generation plant distills liquid air to produce oxygen. The percentage purity of the oxygen is thought to be linearly related to the amount of impurities in the air, as measured by the "pollution count" in parts per million by volume (ppm). The following data were collected on 15 successive days.

Purity (%), y              93.3  92.0  92.4  91.7  94.0  94.6  93.6  93.1
Pollution count (ppm), x   1.10  1.45  1.36  1.59  1.08  0.75  1.20  0.99

Purity (%), y              93.2  92.9  92.2  91.3  90.1  91.6  91.9
Pollution count (ppm), x   0.83  1.22  1.47  1.81  2.03  1.75  1.68

(i) Fit a linear regression to the data using $\sum x_i = 20.31$, $\sum y_i = 1387.9$, $S_{xx} = 1.959\,56$, $S_{xy} = -5.6846$ and $S_{yy} = 18.8693$, where $S_{xx} = \sum (x_i - \bar{x})^2$, $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $S_{yy} = \sum (y_i - \bar{y})^2$. (4)

(ii) Construct the ANOVA table for this model and perform the appropriate hypothesis test using the 0.1% significance level. Hence write down the value of $s^2$, the estimate of the error variance. (9)

(iii) Find a 95% confidence interval for the slope of the regression equation. (3)

(iv) Find a 90% confidence interval for the mean purity on a day when the pollution count is 1.00. (4)

[You may use the fact that the estimated variance for the predicted mean response at $x = x_0$ is $s^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)$.]
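The summary statistics quoted in part (i) can be checked directly against the tabulated data. The following is a minimal sketch in plain Python, not part of the examination paper; the names `purity`, `pollution` and `corrected_sums` are illustrative assumptions, and the data values are those read from the table in Question 1.

```python
# Data as tabulated in Question 1 (15 successive days).
purity = [93.3, 92.0, 92.4, 91.7, 94.0, 94.6, 93.6, 93.1,
          93.2, 92.9, 92.2, 91.3, 90.1, 91.6, 91.9]      # y, in %
pollution = [1.10, 1.45, 1.36, 1.59, 1.08, 0.75, 1.20, 0.99,
             0.83, 1.22, 1.47, 1.81, 2.03, 1.75, 1.68]   # x, in ppm


def corrected_sums(x, y):
    """Return (Sxx, Sxy, Syy), the corrected sums of squares and products."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxx, sxy, syy


sxx, sxy, syy = corrected_sums(pollution, purity)
print(f"sum x = {sum(pollution):.2f}, sum y = {sum(purity):.1f}")
print(f"Sxx = {sxx:.5f}, Sxy = {sxy:.4f}, Syy = {syy:.4f}")
```

The negative sign of $S_{xy}$ reflects the fact that purity tends to fall as the pollution count rises.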
2. (a) Sketch scatter diagrams to illustrate the following features of bivariate data. Comment briefly on each of your plots.

(i) Strong positive association, appropriately reflected by the product moment correlation coefficient. (2)

(ii) Weak negative association, appropriately reflected by the product moment correlation coefficient. (2)

(iii) Strong negative association, appropriately reflected by Spearman's rank correlation coefficient, but less satisfactorily by the product moment correlation coefficient. (2)

(iv) Strong association with a non-monotonic trend. (2)

(b) The following table shows diastolic (DBP) and systolic (SBP) blood pressure measurements (in mm Hg) for 10 randomly chosen cardiac patients.

DBP    55   60   70   75   80   85   90   95  105  110
SBP   125  115  120  135  105  145  130  200  190  150

$S_{xx} = 2962.5$, $S_{xy} = 3437.5$ and $S_{yy} = 8802.5$, where $S_{xx} = \sum (x_i - \bar{x})^2$, $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $S_{yy} = \sum (y_i - \bar{y})^2$.

(i) Calculate the sample product-moment correlation coefficient of these data. Test at the 1% level the null hypothesis that $\rho = 0$ against the alternative hypothesis that $\rho > 0$, where $\rho$ is the population value of the product moment correlation coefficient. State any assumptions made in performing the test. (6)

(ii) Calculate the value of Spearman's rank correlation coefficient for these data, and carry out the corresponding test. State your conclusions clearly. (6)
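Spearman's coefficient in part (b)(ii) is based on ranks rather than on the raw measurements. As a rough illustration of that first step only, the sketch below converts the blood pressure readings to ranks and lists the rank differences. The helper `ranks` is a hypothetical name, and it assumes there are no tied values, which holds for these data.

```python
# Blood pressure readings (mm Hg) for the 10 patients in Question 2(b).
dbp = [55, 60, 70, 75, 80, 85, 90, 95, 105, 110]
sbp = [125, 115, 120, 135, 105, 145, 130, 200, 190, 150]


def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


dbp_ranks = ranks(dbp)
sbp_ranks = ranks(sbp)
rank_diffs = [a - b for a, b in zip(dbp_ranks, sbp_ranks)]

# One row per patient: DBP, SBP, rank of DBP, rank of SBP, difference in ranks.
for row in zip(dbp, sbp, dbp_ranks, sbp_ranks, rank_diffs):
    print(*row, sep="\t")
```

With tied values the ranks would normally be averaged within each tie, which this sketch does not attempt.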
3. An experiment was conducted to examine the effect of different lighting conditions on the number of eggs laid by a certain breed of chickens. The treatments were

O : control (natural daylight),
E : extended day (natural daylight extended by artificial light to a total of 14 hours),
F : flashlight (natural daylight plus flashes of light every 20 seconds through the night).

Twelve pens each containing 6 chickens were randomly allocated to the three treatments. The total number of eggs laid in a given period was recorded as follows.

O   330  288  295  313
E   372  340  343  341
F   359  337  373  302

You are given that $\sum_{i=1}^{3} \sum_{j=1}^{4} y_{ij}^2 = 1\,337\,535$, where $y_{ij}$ is the number of eggs laid in the jth pen under the ith treatment.

(i) Write down an appropriate model for these data, explaining fully all the terms in the model. State any assumptions that are made for this model. (4)

(ii) Draw up an Analysis of Variance table for this model and test for differences in the treatments at the 10% significance level. (11)

(iii) Test whether there are treatment differences between the extended day (E) and flashlight (F) at the 5% significance level. (5)
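The quoted raw sum of squares can be confirmed from the table, along with the treatment totals that an Analysis of Variance calculation would start from. This is a small sketch only; the dictionary `eggs` and the variable names are illustrative assumptions.

```python
# Egg counts per pen for each lighting treatment in Question 3.
eggs = {
    "O": [330, 288, 295, 313],  # control
    "E": [372, 340, 343, 341],  # extended day
    "F": [359, 337, 373, 302],  # flashlight
}

# Treatment totals and grand total.
totals = {treatment: sum(counts) for treatment, counts in eggs.items()}
grand_total = sum(totals.values())

# Raw (uncorrected) sum of squares over all 12 pens.
raw_ss = sum(y * y for counts in eggs.values() for y in counts)

print("treatment totals:", totals)
print("grand total:", grand_total)
print("sum of y_ij squared:", raw_ss)
```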
4. Data for the first year box office receipts (Y) have been collected for a number of movies. A project to model these receipts collects data on the total production costs (X1), promotional costs (X2) and any associated book sales (X3), all data being measured in millions of US dollars. Consider the edited computer output from three regression models, labelled A, B and C, as given below.

(i) Briefly comment on the scatter plots, in Figures 1, 2 and 3, of the observed against fitted y for the three models. Relate your comments in each case to s, the square root of the mean square error. (4)

(ii) In Model A, test for the global significance, at the 5% level, of the regression model, stating clearly the null and alternative hypotheses that you are testing. State the null distribution of the test statistic. Interpret the statement "Multiple R-Squared: 0.9668" and explain how this quantity is calculated. (5)

(iii) By considering the output from all three models, say which of the explanatory variables should be included in the model and justify your answer using appropriate t tests. (6)

(iv) Which of the three models do you consider best describes the data? Justify your answer. (5)

Model A:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.6760     6.7602   1.135   0.2995
x1            3.6616     1.1178   3.276   0.0169 *
x2            7.6211     1.6573   4.598   0.0037 **
x3            0.8285     0.5394   1.536   0.1754
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.541 on 6 degrees of freedom
Multiple R-squared: 0.9668, Adjusted R-squared: 0.9502
F-statistic: 58.22 on 3 and 6 DF, p-value: 7.913e-05

Model B:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   11.848      6.765   1.751  0.12334
x1             4.228      1.153   3.667  0.00800 **
x2             7.436      1.806   4.117  0.00448 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.241 on 7 degrees of freedom
Multiple R-squared: 0.9537, Adjusted R-squared: 0.9405
F-statistic: 72.14 on 2 and 7 DF, p-value: 2.131e-05
Model C:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   23.163      9.625   2.407   0.0427 *
x2            12.669      1.771   7.155 9.66e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.17 on 8 degrees of freedom
Multiple R-squared: 0.8648, Adjusted R-squared: 0.8479
F-statistic: 51.19 on 1 and 8 DF, p-value: 9.665e-05

Figure 1. Model A: Scatterplot of Y vs fitted values
Figure 2. Model B: Scatterplot of Y vs fitted values
Figure 3. Model C: Scatterplot of Y vs fitted values
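The summary lines of the three outputs are linked by standard identities, which gives a quick consistency check on the printed figures. The sketch below recomputes the adjusted R-squared and the overall F statistic from the multiple R-squared of each model. It assumes n = 10 movies, inferred from the residual degrees of freedom plus the number of fitted coefficients; the function names are illustrative, and small discrepancies arise because the printed R-squared is rounded to four decimal places.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p explanatory variables and an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """Overall F statistic on (p, n - p - 1) degrees of freedom, expressed via R-squared."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


n = 10  # assumption: residual df + number of coefficients (e.g. 6 + 4 for Model A)

# (multiple R-squared, number of explanatory variables) as printed in the outputs
models = {"A": (0.9668, 3), "B": (0.9537, 2), "C": (0.8648, 1)}

for name, (r2, p) in models.items():
    print(f"Model {name}: adjusted R2 = {adjusted_r2(r2, n, p):.4f}, "
          f"F = {overall_f(r2, n, p):.2f} on {p} and {n - p - 1} DF")
```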