Apart from this page, you are not permitted to read the contents of this question paper until instructed to do so by an invigilator.

Similar documents
MTH6134 / MTH6134P: Statistical Modelling II

MTH4107 / MTH4207: Introduction to Probability

Apart from this page, you are not permitted to read the contents of this question paper until instructed to do so by an invigilator.

School of Mathematical Sciences. Question 1. Best Subsets Regression

ASTM109 Stellar Structure and Evolution Duration: 2.5 hours

School of Mathematical Sciences. Question 1

MTH4101: Calculus II [SOLUTIONS]

TMA4255 Applied Statistics V2016 (5)

SPA7023P/SPA7023U/ASTM109 Stellar Structure and Evolution Duration: 2.5 hours

PHY214 Thermal & Kinetic Physics Duration: 2 hours 30 minutes

PHY413 Quantum Mechanics B Duration: 2 hours 30 minutes

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

2.4.3 Estimatingσ Coefficient of Determination 2.4. ASSESSING THE MODEL 23

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable,

Chapter 26 Multiple Regression, Logistic Regression, and Indicator Variables

Examination paper for TMA4255 Applied statistics

Inference for Regression Inference about the Regression Model and Using the Regression Line

(4) 1. Create dummy variables for Town. Name these dummy variables A and B. These 0,1 variables now indicate the location of the house.

BSc/MSci MidTerm Test

STAT 212 Business Statistics II 1

[4+3+3] Q 1. (a) Describe the normal regression model through origin. Show that the least square estimator of the regression parameter is given by

EXAM IN TMA4255 EXPERIMENTAL DESIGN AND APPLIED STATISTICAL METHODS

Regression Models - Introduction

Concordia University (5+5)Q 1.

Model Building Chap 5 p251

23. Inference for regression

YOU ARE NOT PERMITTED TO READ THE CONTENTS OF THIS QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY AN INVIGILATOR.

INFERENCE FOR REGRESSION

Chapter 12: Multiple Regression

Question Possible Points Score Total 100

Models with qualitative explanatory variables p216

The simple linear regression model discussed in Chapter 13 was written as

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013

STATISTICS 110/201 PRACTICE FINAL EXAM

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

Multiple Regression Examples

Weighted Least Squares

Analysis of Bivariate Data

Ch 3: Multiple Linear Regression

Ph.D. Preliminary Examination Statistics June 2, 2014

2. Regression Review

MATH 644: Regression Analysis Methods

28. SIMPLE LINEAR REGRESSION III

SMAM 314 Practice Final Examination Winter 2003

Lecture 18: Simple Linear Regression

This document contains 3 sets of practice problems.

STAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 6:00 PM

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments.

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

Contents. 2 2 factorial design 4

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation

Math Section MW 1-2:30pm SR 117. Bekki George 206 PGH

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Ch 13 & 14 - Regression Analysis

Multiple Linear Regression

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

STAT 360-Linear Models

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION

STAT-UB.0103 Exam APRIL.11 SQUARE Version Solutions

1 Introduction to Minitab

Examination paper for TMA4255 Applied statistics

Confidence Interval for the mean response

Math 423/533: The Main Theoretical Topics

Introduction to Regression

13 Simple Linear Regression

Regression Models - Introduction

1 Least Squares Estimation - multiple regression.

SMAM 314 Computer Assignment 5 due Nov 8,2012 Data Set 1. For each of the following data sets use Minitab to 1. Make a scatterplot.

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

Multiple Regression Methods

Business 320, Fall 1999, Final

Final Exam Bus 320 Spring 2000 Russell

Chapter 1. Linear Regression with One Predictor Variable

14 Multiple Linear Regression

ECONOMETRICS I. Cheating and the violation of any of the above instructions, lead to the cancellation of the student s paper.

Simple Linear Regression: A Model for the Mean. Chap 7

MSci EXAMINATION. Date: XX th May, Time: 14:30-17:00

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

Multiple Linear Regression

Quantitative Methods I: Regression diagnostics

3 Multiple Linear Regression

Chapter 14. Multiple Regression Models. Multiple Regression Models. Multiple Regression Models

Conditions for Regression Inference:

Introduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

Six Sigma Black Belt Study Guides

MULTIPLE LINEAR REGRESSION IN MINITAB

Basic Business Statistics 6 th Edition

Basic Business Statistics, 10/e

SEAT NUMBER: STUDENT NUMBER: SURNAME: {FAMILY NAME) OTHER NAMES:

Solution: X = , Y = = = = =

Stat 500 Midterm 2 8 November 2007 page 0 of 4

Inferences for linear regression (sections 12.1, 12.2)

Oregon Hill Wireless Survey Regression Model and Statistical Evaluation. Sky Huvard

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

SMAM 319 Exam1 Name. a B.The equation of a line is 3x + y =6. The slope is a. -3 b.3 c.6 d.1/3 e.-1/3

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Institutionen för matematik och matematisk statistik Umeå universitet November 7, Inlämningsuppgift 3. Mariam Shirdel

(1) The explanatory or predictor variables may be qualitative. (We ll focus on examples where this is the case.)

CHAPTER EIGHT Linear Regression

Transcription:

B. Sc. Examination by course unit 2014 MTH5120 Statistical Modelling I Duration: 2 hours Date and time: 16 May 2014, 1000h 1200h Apart from this page, you are not permitted to read the contents of this question paper until instructed to do so by an invigilator. You should attempt all questions. Marks awarded are shown next to the questions. Calculators ARE permitted in this examination. The unauthorized use of material stored in pre-programmable memory constitutes an examination offence. Please state on your answer book the name and type of machine used. Statistical functions provided by the calculator may be used provided that you state clearly where you have used them. The New Cambridge Statistical Tables are provided. Complete all rough workings in the answer book and cross through any work which is not to be assessed. Important note: the Academic Regulations state that possession of unauthorized material at any time by a student who is under examination conditions is an assessment offence and can lead to expulsion from QMUL. Please check now to ensure you do not have any notes, mobile phones or unauthorised electronic devices on your person. If you have any, then please raise your hand and give them to an invigilator immediately. Please be aware that if you are found to have hidden unauthorised material elsewhere, including toilets and cloakrooms, it will be treated as being found in your possession. Unauthorised material found on your mobile phone or other electronic device will be considered the same as being in possession of paper notes. Disruption caused by mobile phones is also an examination offence. Exam papers must not be removed from the examination room. Examiner(s): B Bogacka, L I Pettit TURN OVER

Page 2 MTH5120 (2014) Question 1 [26 marks] The Environmental Protection Agency has been interested in evaluating the fuel efficiency of cars. In an observational study n = 38 recordings of miles per gallon (Y ) and two explanatory variables: weight (X 1 ) and displacement (X 2 ) of engine were taken. After the power transformation Y λ with λ = 0.5, a second order model Y λ i = β 0 + β 1 x 1i + β 2 x 2i + β 11 x 2 1i + β 22 x 2 2i + β 12 x 1i x 2i + ε i was fitted, assuming ε i iid N (0, σ 2 ). A part of the MINITAB output is given below. y^-0.5 = - 0.205-0.0377 x1 + 0.0142 x2-0.0147 x1^2-0.0220 x2^2 + 0.0371 x1x2 Coefficients Predictor Coef SE Coef T P Constant -0.204908 0.002717-75.41 0.000 x1-0.037662 0.007113-5.29 0.000 x2 0.014222 0.008231 1.73 0.094 x1^2-0.01468 0.01242-1.18 0.246 x2^2-0.02199 0.01183-1.86 0.072 x1x2 0.03707 0.02379 1.56 0.129 S = 0.00915622 R-Sq = 90.6% R-Sq(adj) = 89.1% PRESS = 0.00442312 R-Sq(pred) = 84.44% Analysis of Variance Source DF SS MS F P Regression 5 0.0257390 0.0051478 61.40 0.000 Residual Error 32 0.0026828 0.0000838 Total 37 0.0284218 Source DF Seq SS x1 1 0.0242711 x2 1 0.0011307 x1^2 1 0.0000107 x2^2 1 0.0001230 x1x2 1 0.0002035 (a) Write down the null and alternative hypotheses for the parameter β 22 tested in the table of Coefficients. What do you conclude? [4] (b) Write down the null and alternative hypotheses tested in the Analysis of Variance table. What do you conclude? [4] (c) Test, at the significance level α = 0.1, the null hypothesis H 0 : β 11 = β 22 = β 12 = 0 versus H 1 : H 0. Write down the test statistic and explain your notation. [11] Question 1 continues on the next page.

MTH5120 (2014) Page 3 The reduced model for the transformed variable was fitted and a part of the MINITAB output is given below. y^-0.5 = - 0.206-0.0426 x1 + 0.0178 x2 Predictor Coef SE Coef T P Constant -0.206296 0.001507-136.90 0.000 x1-0.042571 0.004928-8.64 0.000 x2 0.017837 0.004928 3.62 0.001 S = 0.00928900 R-Sq = 89.4% R-Sq(adj) = 88.8% PRESS = 0.00360448 R-Sq(pred) = 87.32% (d) List three indicators shown in the numerical output which suggest an improvement in the model fit compared to the full model and briefly justify your choice. [3] (e) Briefly comment on the residual plots shown above. [4] Question 2 [24 marks] The Least Squares Estimator of β 1, β1, in the Simple Linear Model Y i = β 0 + β 1 x i + ε i, where ε i iid N (0, σ 2 ), can be written as n β 1 = c i Y i, where c i = x i x S xx and S xx = Show that β 1 i=1 n (x i x) 2. (a) is normally distributed, [4] (b) is unbiased for β 1, [10] (c) and has variance equal to σ 2 /S xx. [10] i=1 TURN OVER

Page 4 MTH5120 (2014) Question 3 [24 marks] In the investigation of the dependence of heat rate (Y, in KJ/KW/h) of gas turbines on cycle speed (X 1, in revolutions per minute), cycle pressure ratio (X 2 ), inlet temperature (X 3, in C ) and exhaust gas temperature (X 4, in C ), n = 67 observations were recorded. X 1 - X 4 are potential explanatory variables for a multiple linear regression model of the response variable Y. A part of the MINITAB numerical output is displayed below. Response is y Mallows x x x x Vars R-Sq R-Sq(adj) Cp S 1 2 3 4 1 71.2 70.8 156.9 862.01 X 1 64.1 63.5 211.5 963.13 X 2 87.3 86.9 36.5 578.32 X X 2 84.8 84.3 55.4 631.88 X X 3 91.9 91.5 3.0 464.98 X X X 3 90.2 89.7 16.3 512.27 X X X 4 91.9 91.4 5.0 468.63 X X X X (a) Based on all the information above, suggest, with justification, the two best models for describing the relationship between the response and explanatory variables. Indicate which one of the models might be better and why. [8] A part of the MINITAB numerical output for the model including all available explanatory variables is given below. (b) Give the definition of the variance inflation factor (VIF). What impact on the statistical inference can a high variance inflation factor make? [4] (c) Briefly comment on the values of the VIF shown in the output below. [3] y = 14398 + 0.106 x1-4.4 x2-9.03 x3 + 12.1 x4 Predictor Coef SE Coef T P VIF Constant 14397.9 784.6 18.35 0.000 x1 0.10551 0.01107 9.53 0.000 1.818 x2-4.36 30.08-0.15 0.885 4.897 x3-9.033 1.529-5.91 0.000 13.264 x4 12.054 3.308 3.64 0.001 6.407 S = 468.635 R-Sq = 91.9% R-Sq(adj) = 91.4% A part of the MINITAB numerical output for the model including X 1, X 3, X 4 is given on the next page. (d) Briefly comment on the values of the VIF. [3] (e) Based on the MINITAB output for both models compare the accuracy of estimation of the model parameters. [2]

MTH5120 (2014) Page 5 y = 14360 + 0.105 x1-9.22 x3 + 12.4 x4 Predictor Coef SE Coef T P VIF Constant 14359.7 733.3 19.58 0.000 x1 0.10515 0.01071 9.82 0.000 1.727 x3-9.2226 0.7869-11.72 0.000 3.570 x4 12.426 2.071 6.00 0.000 2.551 S = 464.980 R-Sq = 91.9% R-Sq(adj) = 91.5% Plots of leverage values and Cook s distance values for both models are given below. (f) Briefly comment on the impact of removing X 2 from the full model on the potentially influential observations. [4] Model including all four explanatory variables. Model without X 2. F 0.5;4,62 = 0.8484 F 0.5;3,63 = 0.7973 TURN OVER

Page 6 MTH5120 (2014) Question 4 [26 marks] Consider the model Y = Xβ + ε, where Y is an n-dimensional vector of response variables, E(ε) = 0, Var(ε) = σ 2 I, β is a p-dimensional vector of unknown, constant parameters and X is an (n p)- dimensional design matrix. We define the vector of residuals as e = Y Ŷ, where Ŷ = HY and H = X(X T X) 1 X T. Show the following properties of matrix H: (a) H is symmetric and HH = H, [3] (b) (I H)(I H) = I H. [2] Furthermore, show the following properties of the vector of residuals: (c) E(e) = 0, [3] (d) Var(e) = σ 2 (I H), [3] (e) SS E = Y T (I H)Y, where SS E = e T e, [3] (f) E(SS E ) = (n p)σ 2, [10] (g) E(MS E ) = σ 2. [2] End of Paper