SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester


RESTRICTED OPEN BOOK EXAMINATION
(Not to be removed from the examination hall)

Data provided: "Statistics Tables" by H.R. Neave

PAS 371

SCHOOL OF MATHEMATICS AND STATISTICS

Autumn Semester 2008-09

Linear Models                                  2 hours

Marks will be awarded for your best three answers. There are 99 marks available on the paper.

Candidates may bring to the examination lecture notes and associated lecture material (but no textbooks), plus a calculator that conforms to University regulations.

Please leave this exam paper on your desk; do not remove it from the hall.

Registration number from U-Card (9 digits), to be completed by student.


1. Four objects O_1, O_2, O_3, O_4 are weighed in a balance. Four weighings are made; the (i, j)-th element in the matrix below is +1 if object O_i is placed in the left pan of the balance for weighing j, and it is -1 if it is placed in the right pan. We are required to estimate the weights of the four objects, given the weights y_j required in the right pan to achieve balance (j = 1, 2, 3, 4):

    1  1  1  1
    1  1  1  1
    1  1  1  1
    1  1  1  1

(i) Formulate a regression model for this problem, expressing the observed weights y_i in terms of the unknown weights β_j of the four objects. (7 marks)

(ii) Showing your working explicitly, obtain expressions for the least-squares estimators of the weights of the four objects. (8 marks)

(iii) Evaluate these estimates for data y_1 = 20.2, y_2 = 8.0, y_3 = 9.7, y_4 = 1.9. (3 marks)

(iv) The whole experiment is now replicated n times. Showing your working explicitly, calculate the new least-squares estimates. (15 marks)
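The least-squares mechanics of parts (ii)-(iii) can be sketched numerically. The sign pattern of the printed design matrix did not survive transcription, so the matrix below is an assumed 4x4 Hadamard-type weighing design, used purely as an illustration; the computation β̂ = (X'X)^{-1} X'y is the same for any full-rank design.

```python
import numpy as np

# Hypothetical design: the +/-1 pattern of the printed matrix was lost,
# so a standard symmetric 4x4 Hadamard weighing design is assumed here.
X = np.array([[1.,  1.,  1.,  1.],
              [1., -1.,  1., -1.],
              [1.,  1., -1., -1.],
              [1., -1., -1.,  1.]])

y = np.array([20.2, 8.0, 9.7, 1.9])  # observed balancing weights y_1..y_4

# Least-squares estimator beta_hat = (X'X)^{-1} X'y.
# For this design X'X = 4I, so beta_hat reduces to X'y / 4.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)

# Part (iv): replicating the experiment n times stacks n copies of X, so
# X'X becomes 4nI and beta_hat is the average of the per-replicate estimates.
```

Under this assumed design the four estimated weights come out to tidy values, which is the usual appeal of orthogonal weighing designs: each estimator is an equally weighted contrast of all four observations.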

2. In an agricultural experiment, measurements are collected on the volume (in mm^3), height (in cm) and diameter (in mm) at 4.5 ft above ground level for a sample of 31 black cherry trees in the Allegheny National Forest, Pennsylvania, USA. Denote the volume by y, the height by h and the diameter by d. The following model was considered:

    y_i = α + β h_i + γ d_i + δ d_i^2 + ε_i.

(i) Write down the fitted model and discuss the suitability of this model, based on the S-Plus output given below. (6 marks)

Coefficients:
                   Value  Std. Error  t value  Pr(>|t|)
(Intercept)      10.1751      1.5214   6.6880    0.0000
Height           -0.0644      0.0221  -2.9134    0.0071
Diameter          0.3287      0.0306  10.7234    0.0000
I(Diameter^2)    -0.0017      0.0004  -4.5288    0.0001

Residual standard error: 0.6068 on 27 degrees of freedom
Multiple R-Squared: 0.9664
F-statistic: 258.5 on 3 and 27 degrees of freedom, the p-value is 0

2 (continued)

(ii) Some further analysis showed that the standardized residuals and the diagonal elements of the hat matrix were as follows.

Standardized residuals:
      1        2        3        4        5        6        7        8
-1.0397  -1.1000  -0.9346   0.0293   0.2623   0.2508   0.6173   0.3776
      9       10       11       12       13       14       15       16
-0.8651  -0.0481  -1.3087  -0.0857  -0.2599  -0.4780   1.6633   1.7043
     17       18       19       20       21       22       23       24
-1.7923   1.6481   1.2791   1.0997  -0.8744   0.7333  -1.1277   0.5589
     25       26       27       28       29       30       31
 0.1761  -1.3299  -0.9426  -1.1155   0.8063   0.9430   2.0158

Hat values:
 [1] 0.160950 0.192763 0.225723 0.065416 0.131057 0.166165
 [7] 0.120374 0.058250 0.081715 0.048201 0.063211 0.048424
[13] 0.047078 0.075931 0.052287 0.041557 0.136463 0.180705
[19] 0.070914 0.210029 0.070187 0.071404 0.101612 0.144019
[25] 0.098835 0.113444 0.113520 0.138115 0.102000 0.100692
[31] 0.768945

(a) Calculate approximate variances of the non-standardized residuals e_1, e_2, e_3, e_30 and e_31. (5 marks)

(b) Using the standardized residuals, use an appropriate test to find out whether there are any outliers, and comment. (7 marks)

(c) Calculate the Cook's distance for observation y_i, with i = 1, 2, 3, 30, 31, and check whether these are influential observations. (10 marks)

(d) Summarize your conclusions from (b) and (c) and make recommendations for any further analysis. (5 marks)
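The quantities asked for in (a) and (c) follow from standard diagnostic formulas: Var(e_i) ≈ σ̂²(1 - h_ii), and Cook's distance computed from the standardized residual as D_i = r_i² h_ii / (p(1 - h_ii)), with p = 4 fitted parameters and σ̂ = 0.6068 taken from the output above. A minimal sketch using the printed values for observations 1 and 31:

```python
# Diagnostic formulas, assuming p = 4 fitted parameters (alpha, beta, gamma, delta)
# and the residual standard error sigma_hat = 0.6068 from the S-Plus output.
sigma = 0.6068
p = 4

# Printed standardized residuals r_i and hat values h_ii for observations 1 and 31.
r = {1: -1.0397, 31: 2.0158}
h = {1: 0.160950, 31: 0.768945}

for i in (1, 31):
    var_e = sigma**2 * (1 - h[i])              # approx. Var(e_i) = sigma^2 (1 - h_ii)
    cook = r[i]**2 * h[i] / (p * (1 - h[i]))   # Cook's D from the standardized residual
    print(i, var_e, cook)
```

Observation 31 combines a large hat value (0.768945) with the largest standardized residual, so its Cook's distance is far above the usual D > 1 flag, whereas observation 1 is unremarkable on both counts.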

3. The following table shows five observations of a response variable y and two explanatory variables x and z.

    y_i   x_i   z_i
     2     1     10
     3     4     40
     3     3     30
     1     1     10
     7    10    100

(i) It is initially suggested that a linear model is considered as

    y_i = α + β x_i + γ z_i + ε_i,    ε_i ~ N(0, σ^2).

Show that this model is overparameterised. Describe briefly the phenomenon that is behind this particular form of overparameterisation. How can overparameterisation be resolved for this particular data set? (8 marks)

(ii) Now consider the alternative model

    y_i = α + β x_i + γ x_i^2 + ε_i,    ε_i ~ N(0, σ^2).

Find the least squares estimate of β = (α, β, γ)^T and provide 95% confidence intervals for γ and for σ^2. HINT: you can make use of the following inverse matrix result:

    (   5    19    127 )^-1   (  1.375  -0.669   0.054 )
    (  19   127  1093  )    = ( -0.669   0.413  -0.035 )
    ( 127  1093  10339 )      (  0.054  -0.035   0.003 )

(20 marks)

(iii) Without doing any further calculations, with the information given in (i) and (ii), suggest a suitable model and give brief explanations. (5 marks)
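The structure of this question can be checked numerically: in the table z = 10x exactly, which is the collinearity behind part (i), and for the quadratic model of part (ii) the cross-product matrix X'X is precisely the matrix inverted in the hint. A sketch of both facts:

```python
import numpy as np

x = np.array([1., 4., 3., 1., 10.])
z = np.array([10., 40., 30., 10., 100.])   # note z = 10x exactly
y = np.array([2., 3., 3., 1., 7.])

# Model (i): the columns (1, x, z) are linearly dependent because z = 10x,
# so the design matrix has rank 2 and the model is overparameterised.
X1 = np.column_stack([np.ones(5), x, z])
print(np.linalg.matrix_rank(X1))           # 2, not 3

# Model (ii): quadratic in x; X'X is the matrix inverted in the hint.
X2 = np.column_stack([np.ones(5), x, x**2])
XtX = X2.T @ X2
print(XtX)

# Least-squares estimate beta_hat = (X'X)^{-1} X'y
beta_hat, *_ = np.linalg.lstsq(X2, y, rcond=None)
print(beta_hat)
```

Because the rank deficiency in model (i) is exact, no least-squares solution is unique there; dropping either x or z (they carry identical information) resolves it, as the question asks.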

4. (i) Explain briefly the role of the Akaike information criterion (AIC), and the S-Plus command step, in model reduction. (5 marks)

(ii) In a study of timber, the volume v of usable timber when a tree is felled is studied in terms of the height h and girth g of the tree.

(a) In a polynomial regression, terms in h, g, h^2, hg, g^2 are introduced. When the term hg^2 is then introduced, it is found that this term is rejected as AIC is increased, not reduced. Why might this be? (5 marks)

(b) How would you decide between the model using hg^2 and that using {h, g, h^2, hg, g^2}, neither of which is nested within the other? (5 marks)

(iii) In a study of loss through evaporation in a petrol tank, the loss y is measured in grams. The regressors thought relevant are:

x_1: the initial tank temperature (°F),
x_2: the temperature of the petrol when dispensed (°F),
x_3: the initial vapour pressure in this tank (pounds per square inch),
x_4: the vapour pressure of the petrol when dispensed (pounds per square inch).

The data set consists of 32 points (y, x_1, x_2, x_3, x_4). An initial regression study suggests that interaction terms are not needed. The attached S-Plus output relates to three models under consideration.

(a) Discuss briefly the initial model; (5 marks)
(b) Discuss briefly the intermediate model; (5 marks)
(c) Discuss briefly the final model. (5 marks)
(d) Summarise your conclusions for use by the petrol company commissioning the study. (3 marks)

4 (continued)

Call: lm(formula = y ~ x1 + x2 + x3 + x4)

Residuals:
    Min      1Q   Median     3Q    Max
 -5.799  -1.211  -0.1308    1.3  5.205

Coefficients:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)   0.9591      1.8812   0.5098    0.6143
x1           -0.0385      0.0914  -0.4211    0.6770
x2            0.2233      0.0679   3.2914    0.0028
x3           -3.6639      2.7406  -1.3369    0.1924
x4            8.3536      2.6620   3.1381    0.0041

Residual standard error: 2.754 on 27 degrees of freedom
Multiple R-Squared: 0.9247
F-statistic: 82.93 on 4 and 27 degrees of freedom, the p-value is 9.215e-015

Using stepwise regression:

> step(lm(y ~ x1 + x2 + x3 + x4))
Start: AIC = 280.7007
y ~ x1 + x2 + x3 + x4

Single term deletions

Model: y ~ x1 + x2 + x3 + x4
scale: 7.586504

        Df  Sum of Sq       RSS        Cp
<none>                204.8356  280.7007
x1       1    1.34549  206.1811  266.8731
x2       1   82.18732  287.0229  347.7150
x3       1   13.55965  218.3953  279.0873
x4       1   74.70807  279.5437  340.2357
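The Cp column of the deletion table can be reproduced from the RSS values alone, assuming the S-Plus criterion here takes the Mallows-type form Cp = RSS + 2p·scale, with scale = 7.586504 as reported and p the number of fitted coefficients (5 for the full model, 4 after a single deletion). A sketch of that check:

```python
# Reproducing the Cp column of the first single-term-deletion table, assuming
# Cp = RSS + 2 * p * scale with p = number of fitted coefficients.
scale = 7.586504  # 'scale' reported in the step() output

# (RSS, p) pairs read from the table: full model keeps 5 coefficients,
# each single-term deletion keeps 4.
rows = {
    "<none>": (204.8356, 5),
    "- x1":   (206.1811, 4),
    "- x2":   (287.0229, 4),
    "- x3":   (218.3953, 4),
    "- x4":   (279.5437, 4),
}

cp = {name: rss + 2 * p * scale for name, (rss, p) in rows.items()}
for name, value in cp.items():
    print(name, round(value, 4))
```

The reproduced values match the printed column to rounding error, and they show why x1 is deleted first: removing it barely raises RSS (by 1.345) while the 2p·scale penalty drops, giving the smallest Cp of any candidate model.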

4 (continued)

Step: AIC = 266.8731
y ~ x2 + x3 + x4

Single term deletions

Model: y ~ x2 + x3 + x4
scale: 7.586504

        Df  Sum of Sq       RSS        Cp
<none>                206.1811  266.8731
x2       1   83.80282  289.9839  335.5030
x3       1   32.29814  238.4793  283.9983
x4       1   88.12139  294.3025  339.8215

Call: lm(formula = y ~ x2 + x3 + x4)

Residuals:
    Min     1Q    Median     3Q    Max
 -5.864  -1.32  -0.06399  1.555  4.947

Coefficients:
              Value  Std. Error  t value  Pr(>|t|)
(Intercept)  1.0313      1.8457   0.5588    0.5808
x2           0.2145      0.0636   3.3735    0.0022
x3          -4.3911      2.0967  -2.0943    0.0454
x4           8.6799      2.5091   3.4594    0.0018

Residual standard error: 2.714 on 28 degrees of freedom
Multiple R-Squared: 0.9242
F-statistic: 113.9 on 3 and 28 degrees of freedom, the p-value is 8.882e-016

4 (continued)

Call: lm(formula = y ~ x2 + x4)

Residuals:
    Min      1Q  Median     3Q   Max
 -7.938  -1.453   0.333  1.803  5.28

Coefficients:
              Value  Std. Error  t value  Pr(>|t|)
(Intercept)  1.1859      1.9032   0.0977    0.9228
x2           0.2750      0.0599   4.5943    0.0001
x4           8.5991      0.6768   5.3179    0.0000

Residual standard error: 2.868 on 29 degrees of freedom
Multiple R-Squared: 0.9124
F-statistic: 151 on 2 and 29 degrees of freedom, the p-value is 4.441e-016

End of Question Paper
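As a closing consistency check on this final model: the F-statistic and R-squared in any lm summary are linked by F = (R²/k) / ((1 - R²)/(n - k - 1)), here with k = 2 regressors and n = 32 observations. A quick verification against the printed output:

```python
# Check that the reported F-statistic for y ~ x2 + x4 is consistent with
# the reported Multiple R-Squared, using F = (R^2/k) / ((1-R^2)/(n-k-1)).
r2 = 0.9124   # Multiple R-Squared from the output
k = 2         # regressors: x2 and x4
n = 32        # observations

f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f_stat))   # agrees with the reported F-statistic of 151
```

The agreement confirms the two summary figures describe the same fit, a useful sanity check when transcribing regression output by hand.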