School of Mathematical Sciences
MTH5120 Statistical Modelling I
Practical 8 and Assignment 7 Solutions

Question 1

Figure 1: The residual plots do not contradict the model assumptions of normality, constant variance and linearity.

Regression Analysis: Y versus X1, X2, X3, X4, X5

Y = -2.35 + 1.73 X1 - 0.00887 X2 + 0.180 X3 + 0.0140 X4 + 0.0620 X5

Predictor       Coef     SE Coef      T      P
Constant     -2.3521      0.7732  -3.04  0.005
X1            1.7256      0.4325   3.99  0.000
X2         -0.008866    0.002882  -3.08  0.004
X3            0.1799      0.1151   1.56  0.128
X4          0.013982    0.007252   1.93  0.063
X5           0.06203     0.05176   1.20  0.240

S = 0.366788   R-Sq = 91.3%   R-Sq(adj) = 89.9%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       5  45.1395  9.0279  67.11  0.000
Residual Error  32   4.3051  0.1345
Total           37  49.4446

Source  DF   Seq SS
X1       1  42.4219
X2       1   1.7170
X3       1   0.5000
X4       1   0.3073
X5       1   0.1932

Comments:

T-tests: The hypotheses are H0: βi = 0, that is, that the coefficient of Xi is zero given that all other explanatory variables are in the model. The p-values for testing β1 and β2 are smaller than 0.01, hence we can reject the null hypothesis for each of these two parameters. The intercept is also highly significant. The p-value for testing β4 is 0.063, hence we can reject the null hypothesis at the significance level α = 0.1 but not at α = 0.05; there is weak evidence against the null hypothesis. The p-values for testing β3 and β5 are larger than 0.1, hence there is no evidence against the null hypothesis for either β3 or β5.

F-test: H0: β1 = β2 = β3 = β4 = β5 = 0 versus H1: H0 is not true. H0 can be rejected at α = 0.001, that is, the overall regression is highly significant. This means that at least one of the explanatory variables explains a significant part of the variability of the response variable.

Sequential sums of squares: SS(β1) is much larger than SS(β2 | β1), that is, adding X2 to the model given that X1 is already in the model does not increase the regression sum of squares very much. Also, SS(β3 | β1, β2) is very small, so adding X3 to the model given that X1 and X2 are already there will not increase SS_R by a large amount. Similarly, SS(β4 | β1, β2, β3) and SS(β5 | β1, β2, β3, β4) are very small. However, we do need to check their statistical significance.

Figure 2: The matrix plot shows that there is a clear relationship between Y and each of X1-X4, but not X5. Apart from X5, which seems not to be related to any of the variables, all explanatory variables show some dependencies among themselves. This may make the model fitting more difficult.
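As a cross-check, the Minitab output above can be reproduced in other software. The following is a minimal sketch using Python's statsmodels, assuming the data sit in a hypothetical file petrol.csv with columns Y, X1, ..., X5; the Type I (sequential) ANOVA table corresponds to Minitab's Seq SS column.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data file; the column names Y, X1, ..., X5 are assumed.
df = pd.read_csv("petrol.csv")

# Full model: ordinary least squares of Y on X1, ..., X5.
full = smf.ols("Y ~ X1 + X2 + X3 + X4 + X5", data=df).fit()

# Coefficient table with t-tests, S, R-Sq, R-Sq(adj) and the overall F-test.
print(full.summary())

# Sequential (Type I) sums of squares: SS(b1), SS(b2 | b1), SS(b3 | b1, b2), ...
print(sm.stats.anova_lm(full, typ=1))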

Figure 3: The residual plots do not indicate any contradiction to the assumptions of normality, constant variance and linearity.

Regression Analysis: Y versus X1, X2, X4

Y = -1.33 + 2.09 X1 - 0.00781 X2 + 0.0105 X4

Predictor       Coef     SE Coef      T      P
Constant     -1.3313      0.4538  -2.93  0.006
X1            2.0900      0.3488   5.99  0.000
X2         -0.007812    0.002256  -3.46  0.001
X4          0.010453    0.005898   1.77  0.085

S = 0.377956   R-Sq = 90.2%   R-Sq(adj) = 89.3%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       3  44.588  14.863  104.04  0.000
Residual Error  34   4.857   0.143
Total           37  49.445

Source  DF  Seq SS
X1       1  42.422
X2       1   1.717
X4       1   0.449

Comments: We have three predictors in the model. The p-values for testing β1 and β2 are smaller than 0.01, hence we can reject the null hypothesis for each of these parameters. The intercept is also highly significant. However, the p-value for β4 (0.085) shows that this parameter is only weakly significant.

F-test: H0: β1 = β2 = β4 = 0 versus H1: H0 is not true. H0 can be rejected at α = 0.001, that is, the overall regression is highly significant. This means that at least one of the three predictors explains a significant part of the variability of the response variable.

As before, the sequential sums of squares show that SS(β1) is much larger than SS(β2 | β1), that is, adding X2 to the model given that X1 is already in the model does not increase the regression sum of squares very much. SS(β4 | β1, β2) is smaller than SS(β2 | β1), although the statistical significance of the parameters still needs to be tested.

Figure 4: The residual plots do not contradict the model assumptions of normality, constant variance and linearity.

Regression Analysis: Y versus X1, X2

Y = -1.30 + 2.45 X1 - 0.00782 X2

Predictor       Coef     SE Coef      T      P
Constant     -1.2961      0.4670  -2.78  0.009
X1            2.4497      0.2922   8.38  0.000
X2         -0.007821    0.002324  -3.37  0.002

S = 0.389345   R-Sq = 89.3%   R-Sq(adj) = 88.7%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       2  44.139  22.069  145.59  0.000
Residual Error  35   5.306   0.152
Total           37  49.445

Source  DF  Seq SS
X1       1  42.422
X2       1   1.717

Comments: The p-values for testing β1 and β2 are smaller than 0.01, hence we can reject the null hypothesis for each of these parameters. The intercept is also highly significant.

F-test: H0: β1 = β2 = 0 versus H1: H0 is not true. H0 can be rejected at α = 0.001, that is, the overall regression is highly significant. This means that at least one of the two explanatory variables explains a significant part of the variability of the response variable.

As before, the sequential sums of squares show that SS(β1) is much larger than SS(β2 | β1), that is, adding X2 to the model given that X1 is already in the model does not increase the regression sum of squares very much; but again, the statistical significance of the parameters needs to be tested.
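To set the three candidate models side by side, a short continuation of the sketch above (using the same hypothetical data frame df) collects S, R-Sq and R-Sq(adj) for each of them:

import numpy as np

# The three models considered above, fitted to the same (assumed) data frame df.
models = {
    "X1,...,X5": smf.ols("Y ~ X1 + X2 + X3 + X4 + X5", data=df).fit(),
    "X1,X2,X4":  smf.ols("Y ~ X1 + X2 + X4", data=df).fit(),
    "X1,X2":     smf.ols("Y ~ X1 + X2", data=df).fit(),
}
for name, m in models.items():
    s = np.sqrt(m.mse_resid)  # residual standard error, Minitab's S
    print(f"{name}: S = {s:.4f}, R-Sq = {m.rsquared:.1%}, R-Sq(adj) = {m.rsquared_adj:.1%}")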

Question 2

The question is: can we reduce the set of regressors X1, X2, ..., X_{p-1} to, say, X1, X2, ..., X_{q-1} (renumbering if necessary), where q < p, by omitting X_q, X_{q+1}, ..., X_{p-1}? This can be done by testing the hypothesis

H0: β_q = β_{q+1} = ... = β_{p-1} = 0 versus H1: H0 is not true.

The test function is

F = (SS_extra / (p - q)) / MS_E, which under H0 follows the F_{p-q, n-p} distribution,

where SS_extra = SS_R - SS_R^red; SS_R and MS_E are the regression sum of squares and the mean square error obtained in the full model, and SS_R^red is the regression sum of squares obtained in the reduced model.

(a) Here we compare two models: the full model including all explanatory variables X1, X2, X3, X4, X5 and the reduced model including only X1, X2 and X4. The number of parameters in the reduced model is q = 4 and in the full model is p = 6, that is, p - q = 2. The null and alternative hypotheses are

H0: β3 = β5 = 0 versus H1: H0 is not true.

To obtain the value of the test statistic we calculate

SS_extra = SS_R - SS_R^red = 45.1395 - 44.588 = 0.5515,
F_obs = (0.5515 / 2) / 0.1345 = 2.05019.

Comparing this value with the critical value F_{α; p-q, n-p} = F_{0.1; 2, 32} = 2.47651, we see that there is no evidence to reject the null hypothesis.

(b) Here we compare two models: the full model including all explanatory variables X1, X2, X3, X4, X5 and the reduced model including only X1 and X2. Here q = 3 and p = 6, that is, p - q = 3. The null and alternative hypotheses are

H0: β3 = β4 = β5 = 0 versus H1: H0 is not true.

To obtain the value of the test statistic we calculate

SS_extra = SS_R - SS_R^red = 45.1395 - 44.139 = 1.0005,
F_obs = (1.0005 / 3) / 0.1345 = 2.47955.

We have F_{0.1; 3, 32} = 2.26345 and F_{0.05; 3, 32} = 2.90112. Hence we would reject the null hypothesis at the significance level α = 0.1 but not at α = 0.05. There is weak evidence against the null hypothesis.

It is not very clear which model would be better for fitting the petrol consumption. The predictor X4 is not highly significant when X1 and X2 are in the model; also, there is only weak evidence against the null hypothesis H0: β3 = β4 = β5 = 0, while there is no evidence against H0: β3 = β5 = 0. The values of adjusted R² are not very different, and both are high. However, the model with X1 and X2 only is more parsimonious.
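The two partial F-tests can also be checked numerically. The sketch below (Python again, with scipy assumed available) recomputes the observed F statistics from the sums of squares reported in the Minitab output and looks up the critical values; the same nested comparisons could be run directly from the data with statsmodels' anova_lm(reduced, full).

from scipy import stats

n, p = 38, 6                            # observations; parameters in the full model
SSR_full, MSE_full = 45.1395, 0.1345    # from the full-model ANOVA table

def partial_f(SSR_red, q):
    # F statistic for H0: the p - q coefficients omitted from the full model are all zero.
    ss_extra = SSR_full - SSR_red
    f_obs = (ss_extra / (p - q)) / MSE_full
    crit10 = stats.f.ppf(0.90, p - q, n - p)   # critical value at alpha = 0.10
    crit05 = stats.f.ppf(0.95, p - q, n - p)   # critical value at alpha = 0.05
    return f_obs, crit10, crit05

# (a) reduced model with X1, X2, X4 (q = 4): H0: beta3 = beta5 = 0
print(partial_f(44.588, 4))    # roughly (2.05, 2.48, 3.29)

# (b) reduced model with X1, X2 (q = 3): H0: beta3 = beta4 = beta5 = 0
print(partial_f(44.139, 3))    # roughly (2.48, 2.26, 2.90)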