ST430 Exam 2 Solutions

Date: November 9, 2015
Name:

Guidelines: You may use one page of notes (front and back of a standard A4 sheet). Laptops and textbooks are not permitted, but you may use a calculator. Giving or receiving assistance from other students is not allowed. Show your work to receive partial credit; partial credit is given only for work written on the exam. The exam is worth 25 points in total. Good luck!

1. Consider the regression with response $Y$ = teacher's performance rating and two continuous predictors: $X_1$ = the teacher's years of experience and $X_2$ = the teacher's age. We consider three models:

Model 1: $Y = \beta_0 + \beta_1 X_1 + \epsilon$
Model 2: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$
Model 3: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon$

(a) (1 point) Explain how the interpretation of $\beta_1$ differs between Model 1 and Model 2.

Answer: In Model 1, $\beta_1$ is the effect of the teacher's years of experience without regard to age. In Model 2, $\beta_1$ is the effect of the teacher's years of experience while accounting for the teacher's age, that is, with age held constant.

(b) (1 point) Give the change in the expected response if age ($X_2$) increases by one while years of experience ($X_1$) is held constant for Model 2.

Answer: $[\beta_0 + \beta_1 X_1 + \beta_2 (X_2 + 1)] - [\beta_0 + \beta_1 X_1 + \beta_2 X_2] = \beta_2$.

(c) (1 point) Give the change in the expected response if age ($X_2$) increases by one while years of experience ($X_1$) is held constant for Model 3.

Answer: $[\beta_0 + \beta_1 X_1 + \beta_2 (X_2 + 1) + \beta_3 X_1 (X_2 + 1)] - [\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2] = \beta_2 + \beta_3 X_1$.

(d) (1 point) If age ($X_2$) has a positive effect on performance rating for teachers in their first year ($X_1 = 0$) but no effect for teachers in their tenth year ($X_1 = 9$), should $\beta_3$ in Model 3 be positive, negative or zero?

Answer: The effect of age ($X_2$) is $\beta_2 + \beta_3 X_1$. A positive effect in the first year means that $\beta_2 + \beta_3 \cdot 0 = \beta_2 > 0$. No effect in the tenth year means that $\beta_2 + \beta_3 \cdot 9 = 0$. It follows that $\beta_3 = -\beta_2 / 9$, which must be negative.
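As a quick numerical illustration of part (d), the following Python sketch evaluates the age effect $\beta_2 + \beta_3 X_1$ under Model 3. The coefficient values are hypothetical, chosen only so that the two conditions in the question hold; they are not estimates from any data, and the function name `age_effect` is made up for this example.

```python
# Hypothetical coefficients (assumed for illustration, not estimated from data).
beta2 = 0.45            # age effect for first-year teachers (X1 = 0); must be > 0
beta3 = -beta2 / 9      # chosen so that the age effect vanishes at X1 = 9


def age_effect(x1: float) -> float:
    """Change in expected rating per extra year of age under Model 3."""
    return beta2 + beta3 * x1


for years in (0, 5, 9):
    print(f"X1 = {years}: age effect = {age_effect(years):.3f}")
```

Any positive choice of `beta2` gives the same pattern: the age effect shrinks linearly with experience and reaches zero at $X_1 = 9$, which is only possible with a negative interaction coefficient.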

2. Consider the sale prices of residential properties (denoted by $Y$, in units of 1,000 dollars) in one neighborhood as a function of two predictors: land value (denoted by $X_1$, in units of 1,000 dollars) and improvement value (denoted by $X_2$, in units of 1,000 dollars). Consider the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon$. Below are the summary and ANOVA table of the fitted model.

    Call:
    lm(formula = Y ~ X1 * X2)

    Residuals:
         Min       1Q   Median       3Q      Max
    -117.848  -45.208   -1.167   24.546  228.616

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 22.782754  45.196951   0.504   0.6163
    X1           0.108923   0.833835   0.131   0.8966
    X2           1.213412   0.191601   6.333 5.71e-08 ***
    X1:X2        0.004930   0.002282   2.160   0.0354 *

    Residual standard error: 62.64 on 52 degrees of freedom
    Multiple R-squared:  0.9357, Adjusted R-squared:  0.932
    F-statistic: 252.4 on 3 and 52 DF,  p-value: < 2.2e-16

    Analysis of Variance Table

    Response: Y
              Df  Sum Sq Mean Sq  F value  Pr(>F)
    X1         1 1814019 1814019 462.3601 < 2e-16 ***
    X2         1 1138831 1138831 290.2670 < 2e-16 ***
    X1:X2      1   18307   18307   4.6662 0.03539 *
    Residuals 52  204016    3923

(a) (1 point) Give the estimated least squares regression equation.

Answer: $\hat{Y} = 22.782754 + 0.108923 X_1 + 1.213412 X_2 + 0.004930 X_1 X_2$.
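The fitted equation in (a) can be evaluated directly. The short Python sketch below (Python is used here only as a calculator; the exam output itself comes from R) hard-codes the coefficient estimates from the summary table and evaluates the fitted surface at a land value of 50 and an improvement value of 100, both in thousands of dollars. It also checks that the squared t value of the interaction term agrees with its F value in the ANOVA table, up to rounding. The function name `predict_price` is made up for this illustration.

```python
# Coefficient estimates copied from the R summary table above.
B0, B1, B2, B3 = 22.782754, 0.108923, 1.213412, 0.004930


def predict_price(land: float, improvement: float) -> float:
    """Fitted sale price in thousands of dollars (inputs also in thousands)."""
    return B0 + B1 * land + B2 * improvement + B3 * land * improvement


price = predict_price(50, 100)
print(f"Fitted price: {price:.4f} thousand dollars")

# For a single-df term entered last, the sequential F value equals t squared.
t_interaction = 0.004930 / 0.002282   # estimate / standard error
print(f"t = {t_interaction:.3f}, t^2 = {t_interaction**2:.3f} (ANOVA F: 4.6662)")
```

The small gap between the squared t value and the tabulated F value comes from the rounded estimate and standard error printed in the summary.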

(b) (2 points) Test whether there is an interaction effect between land value and improvement value: specify the null and alternative hypotheses, give the test statistic, specify its null distribution, and state a conclusion.

Answer: The null hypothesis is $H_0: \beta_3 = 0$ and the alternative is $H_a: \beta_3 \neq 0$. The test statistic is the t-statistic, whose null distribution is a t-distribution with 52 degrees of freedom. The t-statistic has a value of 2.16. Since the p-value is 0.0354 < 0.05, we reject the null hypothesis at the 0.05 significance level.

An alternative test statistic is the F-statistic, whose null distribution is an F-distribution with 1 and 52 degrees of freedom. The F-statistic has a value of 4.6662. Since the p-value is 0.03539 < 0.05, we reject the null hypothesis at the 0.05 significance level.

(c) (1 point) Based on the model, what would you predict for the sale price of a residential property with land value 50,000 dollars and improvement value 100,000 dollars?

Answer: The variables are in units of 1,000 dollars, hence $X_1 = 50$ and $X_2 = 100$. The predicted sale price is $22.782754 + 0.108923 \times 50 + 1.213412 \times 100 + 0.004930 \times 50 \times 100 = 174.2201$, which corresponds to 174,220.1 dollars.

3. Consider a two-factor regression model for the mean performance of a diesel engine as a function of fuel type (three types: $F_1$, $F_2$ and $F_3$) and engine brand (two brands: $B_1$ and $B_2$).

(a) (1 point) Give the main effects model without interactions. Identify and code all dummy variables.

Answer: Let $X_1$ be 1 if the fuel type is $F_2$ and 0 if not; let $X_2$ be 1 if the fuel type is $F_3$ and 0 if not. Let $X_3$ be 1 if the engine brand is $B_2$ and 0 if the engine brand is $B_1$. Then the main effects model is $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$.

(b) (2 points) Interpret all of model (a)'s parameters (the $\beta_j$'s) in the context of the problem.

Answer:
$\beta_0$ is the mean performance of engines of fuel type $F_1$ and brand $B_1$.
$\beta_1$ is the difference in mean performance between engines of fuel types $F_2$ and $F_1$ when the brands are the same.
$\beta_2$ is the difference in mean performance between engines of fuel types $F_3$ and $F_1$ when the brands are the same.
$\beta_3$ is the difference in mean performance between engines of brands $B_2$ and $B_1$ when the fuel types are the same.

(c) (2 points) Give the two-factor model with interactions. Identify and code all dummy variables and specify the reference group.

Answer: With the same coding of dummy variables as in (a), the model is $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_3 + \beta_5 X_2 X_3$, and the reference group is engines of fuel type $F_1$ and brand $B_1$.

(d) (2 points) Fill in the following table of expected responses $E(Y)$ using your model from (c). Note: the entries in the table should be combinations of the $\beta_j$'s.

    E(Y)             Engine brand B_1      Engine brand B_2
    Fuel type F_1    β_0                   β_0 + β_3
    Fuel type F_2    β_0 + β_1             β_0 + β_1 + β_3 + β_4
    Fuel type F_3    β_0 + β_2             β_0 + β_2 + β_3 + β_5

(e) (4 points) The analysis of variance table for model (a) is below:

    Analysis of Variance Table

    Response: PERFORM
              Df  Sum Sq Mean Sq F value  Pr(>F)
    FUEL       2  170.17   85.08  0.4501 0.65280
    BRAND      1  688.09  688.09  3.6397 0.09285 .
    Residuals  8 1512.41  189.05

And the analysis of variance table for model (c) is below:

    Analysis of Variance Table

    Response: PERFORM
               Df  Sum Sq Mean Sq F value    Pr(>F)
    FUEL        2  170.17   85.08  7.5443 0.0230306 *
    BRAND       1  688.09  688.09 61.0130 0.0002323 ***
    FUEL:BRAND  2 1444.74  722.37 64.0526 8.956e-05 ***
    Residuals   6   67.67   11.28

Based on the above outputs and models (a) and (c), conduct a test of the interaction effect between fuel type and engine brand: specify the null and alternative hypotheses, calculate the test statistic, specify the null distribution of the test statistic, and state a conclusion. Note: at the 0.05 level, the cutoff value here is 5.14.

Answer: Based on model (c), the null hypothesis is $H_0: \beta_4 = \beta_5 = 0$ and the alternative is $H_a$: at least one of $\beta_4$ and $\beta_5$ is non-zero. The test statistic is the F-statistic, defined as

$$F = \frac{(SSE_r - SSE_c)/2}{SSE_c/6} = \frac{(1512.41 - 67.67)/2}{67.67/6} = 64.05,$$

where $SSE_r$ and $SSE_c$ are the residual sums of squares of the reduced model (a) and the complete model (c), respectively.
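The arithmetic of this partial F statistic can be checked with a few lines of Python. The sketch takes the two residual sums of squares and their degrees of freedom from the ANOVA tables above and uses the 0.05 cutoff of 5.14 quoted in the question; the helper name `partial_f` is an assumption of this example, not something from the exam.

```python
def partial_f(sse_reduced: float, df_reduced: int,
              sse_complete: float, df_complete: int) -> float:
    """Nested-model F statistic computed from residual sums of squares."""
    numerator = (sse_reduced - sse_complete) / (df_reduced - df_complete)
    denominator = sse_complete / df_complete
    return numerator / denominator


# Residual rows of the ANOVA tables for model (a) and model (c).
f_stat = partial_f(1512.41, 8, 67.67, 6)
cutoff = 5.14  # F(2, 6) critical value at the 0.05 level, as given in the exam

print(f"F = {f_stat:.2f}")
print("reject H0" if f_stat > cutoff else "do not reject H0")
```

The numerator degrees of freedom are the difference in residual degrees of freedom (8 - 6 = 2), which matches the two interaction coefficients being tested.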

The null distribution is an F-distribution with 2 and 6 degrees of freedom. Since 64.05 > 5.14, we reject the null hypothesis at the 0.05 significance level.

4. (a) (2 points) Suppose $Y$ is the response, $X_1$, $X_2$ are two continuous predictors, and $X_3$ is a categorical predictor with three levels: A, B and C. Give the complete second order model using all predictors.

Answer: Let $X_3 = 1$ if the categorical predictor has level B and 0 if not; let $X_4 = 1$ if the categorical predictor has level C and 0 if not. Then the complete second order model is

$$
\begin{aligned}
E(Y) = {} & \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_1^2 + \beta_5 X_2^2 \\
& + \beta_6 X_3 + \beta_7 X_4 \\
& + \beta_8 X_1 X_3 + \beta_9 X_1 X_4 + \beta_{10} X_2 X_3 + \beta_{11} X_2 X_4 \\
& + \beta_{12} X_1 X_2 X_3 + \beta_{13} X_1 X_2 X_4 \\
& + \beta_{14} X_1^2 X_3 + \beta_{15} X_1^2 X_4 + \beta_{16} X_2^2 X_3 + \beta_{17} X_2^2 X_4.
\end{aligned}
$$

(b) (1 point) We would like to test whether there is an interaction effect between the continuous predictors and the categorical predictor in model (a) above. Specify the null and alternative hypotheses.

Answer: $H_0: \beta_8 = \beta_9 = \cdots = \beta_{17} = 0$; $H_a$: at least one of $\beta_8, \ldots, \beta_{17}$ is not zero.

(c) (1 point) Give the formula for Akaike's information criterion AIC (explain any abbreviation or symbol).

Answer: The formula is $n \log(\hat{\sigma}^2) + 2(k + 1)$, where $n$ is the sample size, $k$ is the number of predictors, and $\hat{\sigma}^2$ is the estimated error variance.

(d) (1 point) Explain why the AIC criterion rewards models that fit the data well and rewards models with a small number of predictors.

Answer: Models with small AIC are preferred. Models that fit the data well have small $\hat{\sigma}^2$ and hence small AIC; models with a small number of predictors have a small penalty term $2(k + 1)$ and hence also small AIC.

(e) (1 point) Explain why the stepwise variable selection procedure generally should not be used for building second order models.

Answer: When we build second order models, if a second order term is included in the model then all of its lower order terms have to be included as well. The stepwise variable selection procedure does not necessarily follow this rule.
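To make parts (c) and (d) concrete, here is a minimal Python sketch of the AIC formula $n \log(\hat{\sigma}^2) + 2(k + 1)$ given in (c). The sample size and variance estimates are invented for illustration and do not come from the exam data; the function name `aic` is likewise just a label for this example.

```python
import math


def aic(n: int, sigma2_hat: float, k: int) -> float:
    """AIC as defined in part (c): n * log(sigma2_hat) + 2 * (k + 1)."""
    return n * math.log(sigma2_hat) + 2 * (k + 1)


n = 50  # hypothetical sample size

# Same fit, more predictors: the penalty term raises AIC.
print(aic(n, sigma2_hat=4.0, k=3), aic(n, sigma2_hat=4.0, k=6))

# Same number of predictors, better fit: the log-variance term lowers AIC.
print(aic(n, sigma2_hat=4.0, k=3), aic(n, sigma2_hat=2.5, k=3))
```

With the fit held fixed, each extra predictor adds exactly 2 to the criterion, so an added term is only worthwhile if it lowers $n \log(\hat{\sigma}^2)$ by more than that.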