Multiple Regression: Example

Cobb-Douglas Production Function

The Cobb-Douglas production function for observed economic data i = 1, ..., n may be expressed as

$$O_i = e^{\beta_0} \, l_i^{\beta_1} \, c_i^{\beta_2} \, u_i$$

where

$O_i$ is output
$l_i$ is labour input
$c_i$ is capital input
$u_i$ is a random error term

Cobb-Douglas Production Function (cont.)

Taking natural logs, we have that

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$$

where

$Y_i = \ln(O_i)$ is log output
$x_{i1} = \ln(l_i)$ is log labour input
$x_{i2} = \ln(c_i)$ is log capital input
$\epsilon_i = \ln(u_i)$ is a random error term

We will term this model the "complete" model.
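As a minimal sketch of how the transformed data might be set up in R (assuming a hypothetical data frame raw with columns output, labour and capital; these names are illustrative and not from the original source):

> # 'raw', 'output', 'labour', 'capital' are hypothetical names
> Cobb <- data.frame(y  = log(raw$output),   # Y_i = ln(O_i)
+                    x1 = log(raw$labour),   # x_{i1} = ln(l_i)
+                    x2 = log(raw$capital))  # x_{i2} = ln(c_i)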

Data: 50 US states plus the District of Columbia; manufacturing sector, 2005.

[Figure: scatterplots of y against x1 and of y against x2.]

Note also that x1 and x2 are highly positively correlated:

> cor(x1,x2)
[1] 0.960402
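The two scatterplots can be reproduced along the following lines (a sketch, not the original plotting code; it assumes the data frame Cobb used below):

> par(mfrow = c(1, 2))        # two panels side by side
> plot(y ~ x1, data = Cobb)   # log output against log labour input
> plot(y ~ x2, data = Cobb)   # log output against log capital input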

Analysis in R

 1 > fit12 <- lm(y ~ x1 + x2, data=Cobb); summary(fit12)
 2 Coefficients:
 3              Estimate Std. Error t value Pr(>|t|)
 4 (Intercept)  3.88760    0.39623   9.812 4.70e-13 ***
 5 x1           0.46833    0.09893   4.734 1.98e-05 ***
 6 x2           0.52128    0.09689   5.380 2.18e-06 ***
 7 ---
 8 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 9
10 Residual standard error: 0.2668 on 48 degrees of freedom
11 Multiple R-squared:  0.9642,    Adjusted R-squared:  0.9627
12 F-statistic: 645.9 on 2 and 48 DF,  p-value: < 2.2e-16
13
14 > summary(fit12)$sigma
15 [1] 0.2667521

We see from this analysis that

$$SS_{Res} \equiv SS_{Res}(\beta_0, \beta_1, \beta_2) = (n - p)\,\hat{\sigma}^2 = 48 \times 0.2667521^2 = 3.41552,$$

which can be extracted as

16 > summary(fit12)$df[2] * summary(fit12)$sigma^2
17 [1] 3.41552
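Equivalently (a quick check, not part of the original slides), the residual sum of squares can be read off the fit directly; for an lm fit, deviance() returns exactly this quantity:

> deviance(fit12)           # residual sum of squares, 3.41552
> sum(residuals(fit12)^2)   # the same quantity computed by hand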

Analysis in R: anova

18 > anova(fit12)
19 Analysis of Variance Table
20
21 Response: y
22           Df Sum Sq Mean Sq  F value    Pr(>F)
23 x1         1 89.865  89.865 1262.915 < 2.2e-16 ***
24 x2         1  2.060   2.060   28.947 2.183e-06 ***
25 Residuals 48  3.416   0.071
26 ---
27 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we have the decomposition

$$SS_R(\beta_1, \beta_2 \mid \beta_0) = SS_R(\beta_1 \mid \beta_0) + SS_R(\beta_2 \mid \beta_0, \beta_1)$$

where

line 23 (Sum Sq): $SS_R(\beta_1 \mid \beta_0) = 89.865$
line 24 (Sum Sq): $SS_R(\beta_2 \mid \beta_0, \beta_1) = 2.060$

Note from line 25 (Sum Sq), $SS_{Res}(\beta_0, \beta_1, \beta_2) = 3.416$ as before.

Analysis in R: anova

28 > fit21 <- lm(y ~ x2 + x1, data=Cobb)
29 > anova(fit21)
30 Analysis of Variance Table
31
32 Response: y
33           Df Sum Sq Mean Sq  F value    Pr(>F)
34 x2         1 90.330  90.330 1269.450 < 2.2e-16 ***
35 x1         1  1.595   1.595   22.412 1.981e-05 ***
36 Residuals 48  3.416   0.071
37 ---
38 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we have the decomposition

$$SS_R(\beta_1, \beta_2 \mid \beta_0) = SS_R(\beta_2 \mid \beta_0) + SS_R(\beta_1 \mid \beta_0, \beta_2)$$

where

line 34 (Sum Sq): $SS_R(\beta_2 \mid \beta_0) = 90.330$
line 35 (Sum Sq): $SS_R(\beta_1 \mid \beta_0, \beta_2) = 1.595$

Again from line 36 (Sum Sq), $SS_{Res}(\beta_0, \beta_1, \beta_2) = 3.416$ as before.
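Note that the two orderings decompose the same overall regression sum of squares: $SS_R(\beta_1, \beta_2 \mid \beta_0) = 89.865 + 2.060 = 90.330 + 1.595 = 91.925$. A quick check (not on the original slides):

> sum(anova(fit12)[1:2, "Sum Sq"])   # 89.865 + 2.060 = 91.925
> sum(anova(fit21)[1:2, "Sum Sq"])   # 90.330 + 1.595 = 91.925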

The F-tests carried out using anova are partial F-tests. From the first analysis:

39 > anova(fit12)
40 Analysis of Variance Table
41 Response: y
42           Df Sum Sq Mean Sq  F value    Pr(>F)
43 x1         1 89.865  89.865 1262.915 < 2.2e-16 ***
44 x2         1  2.060   2.060   28.947 2.183e-06 ***
45 Residuals 48  3.416   0.071

The test on line 43 is the comparison of the models

"Reduced": $E[Y_i \mid x_i] = \beta_0$
"Full": $E[Y_i \mid x_i] = \beta_0 + \beta_1 x_{i1}$

whilst recognizing that $x_2$ may also be used to estimate $\sigma^2$.

We compute

$$F = \frac{(SS_{Res}(\beta_0) - SS_{Res}(\beta_0, \beta_1))/r}{SS_{Res}(\beta_0, \beta_1, \beta_2)/(n - p)}$$

where

$p = 3$ (number of coefficients in the "complete" model)
$r = 1$ (number of coefficients set to zero in the "full" model to obtain the "reduced" model)

We may access these elements in R as follows:

46 > SSRes0 <- anova(lm(y ~ 1, data=Cobb))[1,2]
47 > MSRes012 <- anova(lm(y ~ x1 + x2, data=Cobb))[3,3]
48 > SSRes01 <- anova(lm(y ~ x1, data=Cobb))[2,2]
49 > F <- ((SSRes0 - SSRes01)/1)/MSRes012

The anova function returns a table that can be indexed like a matrix, using the R notation [1,2], [3,3] and [2,2] respectively. This yields

50 > SSRes0
51 [1] 95.34013
52 > MSRes012
53 [1] 0.07115667
54 > SSRes01
55 [1] 5.475317
56 > F
57 [1] 1262.915

which matches the result on line 43 (F value).

The test on line 44 is the comparison of the models

"Reduced": $E[Y_i \mid x_i] = \beta_0 + \beta_1 x_{i1}$
"Full": $E[Y_i \mid x_i] = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$

We compute

$$F = \frac{(SS_{Res}(\beta_0, \beta_1) - SS_{Res}(\beta_0, \beta_1, \beta_2))/r}{SS_{Res}(\beta_0, \beta_1, \beta_2)/(n - p)}$$

where

$p = 3$ (number of coefficients in the "complete" model)
$r = 1$ (number of coefficients set to zero in the "full" model to obtain the "reduced" model)

We may access these elements in R as follows:

58 > SSRes01 <- anova(lm(y ~ x1, data=Cobb))[2,2]
59 > MSRes012 <- anova(lm(y ~ x1 + x2, data=Cobb))[3,3]
60 > SSRes012 <- anova(lm(y ~ x1 + x2, data=Cobb))[3,2]
61 > F <- ((SSRes01 - SSRes012)/1)/MSRes012
62 >
63 > SSRes01
64 [1] 5.475317
65 > MSRes012
66 [1] 0.07115667
67 > SSRes012
68 [1] 3.41552
69 > F
70 [1] 28.94735

which matches the result on line 44 (F value).
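Both partial F-tests can also be obtained in a single call by passing the sequence of nested fits to anova(), which uses the residual mean square of the largest model as the common denominator (a sketch, not from the original slides):

> fit0 <- lm(y ~ 1,  data=Cobb)
> fit1 <- lm(y ~ x1, data=Cobb)
> anova(fit0, fit1, fit12)   # F values 1262.915 and 28.947, as on lines 43 and 44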

The F-value on line 34 performs the partial F-test for testing

"Reduced": $E[Y_i \mid x_i] = \beta_0$
"Full": $E[Y_i \mid x_i] = \beta_0 + \beta_2 x_{i2}$

whilst recognizing that $x_1$ may also be used to estimate $\sigma^2$, using the statistic

$$F = \frac{(SS_{Res}(\beta_0) - SS_{Res}(\beta_0, \beta_2))/r}{SS_{Res}(\beta_0, \beta_1, \beta_2)/(n - p)}$$

71 > SSRes0 <- anova(lm(y ~ 1, data=Cobb))[1,2]
72 > MSRes012 <- anova(lm(y ~ x1 + x2, data=Cobb))[3,3]
73 > SSRes02 <- anova(lm(y ~ x2, data=Cobb))[2,2]
74 > (F <- ((SSRes0 - SSRes02)/1)/MSRes012)
75 [1] 1269.45

The F-value on line 35 performs the partial F-test for testing

"Reduced": $E[Y_i \mid x_i] = \beta_0 + \beta_2 x_{i2}$
"Full": $E[Y_i \mid x_i] = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$

using the statistic

$$F = \frac{(SS_{Res}(\beta_0, \beta_2) - SS_{Res}(\beta_0, \beta_1, \beta_2))/r}{SS_{Res}(\beta_0, \beta_1, \beta_2)/(n - p)}$$

76 > SSRes02 <- anova(lm(y ~ x2, data=Cobb))[2,2]
77 > MSRes012 <- anova(lm(y ~ x1 + x2, data=Cobb))[3,3]
78 > SSRes012 <- anova(lm(y ~ x1 + x2, data=Cobb))[3,2]
79 > (F <- ((SSRes02 - SSRes012)/1)/MSRes012)
80 [1] 22.41237
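In each case the p-value follows from the F distribution with $(r, n - p) = (1, 48)$ degrees of freedom; for example, for the statistic on line 79:

> pf(22.41237, df1 = 1, df2 = 48, lower.tail = FALSE)   # 1.981e-05, matching line 35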

The conclusions of the above analyses are that:

- when we start with x1 in the model and try to add x2, there is a significant improvement in fit; we see this from line 44: the p-value is 2.183e-06;
- when we start with x2 in the model and try to add x1, there is a significant improvement in fit; we see this from line 35: the p-value is 1.981e-05.

Note that, if we considered x2 irrelevant from the start, we might omit it from any analysis and consider the alternative "complete" model

$$Y_i = \beta_0 + \beta_1 x_{i1} + \epsilon_i.$$

Then to test

"Reduced": $E[Y_i \mid x_i] = \beta_0$
"Full": $E[Y_i \mid x_i] = \beta_0 + \beta_1 x_{i1}$

we would compute

$$F = \frac{(SS_{Res}(\beta_0) - SS_{Res}(\beta_0, \beta_1))/r}{SS_{Res}(\beta_0, \beta_1)/(n - p)}$$

where now $p = 2$.
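Mirroring the earlier computations (a sketch, not from the original slides; the denominator now uses $SS_{Res}(\beta_0, \beta_1)$ with $n - p = 49$ degrees of freedom):

> SSRes0  <- anova(lm(y ~ 1,  data=Cobb))[1,2]
> SSRes01 <- anova(lm(y ~ x1, data=Cobb))[2,2]
> (F <- ((SSRes0 - SSRes01)/1)/(SSRes01/49))   # 804.22, matching the output below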

 81 > summary(lm(y ~ x1, data=Cobb))
 82 Coefficients:
 83             Estimate Std. Error t value Pr(>|t|)
 84 (Intercept)  4.99902    0.42371   11.80 6.29e-16 ***
 85 x1           0.97950    0.03454   28.36  < 2e-16 ***
 86 ---
 87 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 88
 89 Residual standard error: 0.3343 on 49 degrees of freedom
 90 Multiple R-squared:  0.9426,    Adjusted R-squared:  0.9414
 91 F-statistic: 804.2 on 1 and 49 DF,  p-value: < 2.2e-16
 92
 93 > anova(lm(y ~ x1, data=Cobb))
 94 Analysis of Variance Table
 95
 96 Response: y
 97           Df Sum Sq Mean Sq F value    Pr(>F)
 98 x1         1 89.865  89.865  804.22 < 2.2e-16 ***
 99 Residuals 49  5.475   0.112
100 ---
101 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The numerical result (804.22) on lines 91 (F-statistic) and 98 (F value) is different from that on lines 43 and 57 (1262.915). Both F-tests compare

"Reduced": $E[Y_i \mid x_i] = \beta_0$
"Full": $E[Y_i \mid x_i] = \beta_0 + \beta_1 x_{i1}$

however, the results on lines 43 and 57 acknowledge a possible influence of x2; this leads to a reduction in the $MS_{Res}$ quantity that appears in the denominator of the F-statistic.

To assess the importance of each of the variables x1 and x2 directly, we may use the drop1 command:

102 > fit12 <- lm(y ~ x1 + x2, data=Cobb)
103 > drop1(fit12, test="F")
104 Single term deletions
105
106 Model:
107 y ~ x1 + x2
108        Df Sum of Sq    RSS     AIC F value    Pr(>F)
109 <none>              3.4155 -131.88
110 x1      1    1.5948 5.0103 -114.34  22.412 1.981e-05 ***
111 x2      1    2.0598 5.4753 -109.81  28.947 2.183e-06 ***
112 ---
113 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

reproducing the results on lines 35 and 44 respectively.
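These marginal F statistics are the squares of the t statistics reported in summary(fit12) on lines 5 and 6 ($4.734^2 \approx 22.41$ and $5.380^2 \approx 28.95$), which can be confirmed with:

> coef(summary(fit12))[c("x1","x2"), "t value"]^2   # approximately 22.41 and 28.95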