Tests of Linear Restrictions

1. Linear Restrictions in Regression Models

In this tutorial, we consider tests of general linear restrictions on regression coefficients. In other tutorials, we examine some specific tests of linear restrictions, including:

1. F-test for regression fit: a joint F-test on the regression coefficients for all of the explanatory variables. The hypotheses are:

$$H_0: \beta_1 = 0,\ \beta_2 = 0,\ \beta_3 = 0,\ \ldots,\ \beta_k = 0$$

$$H_A: \text{At least one } \beta_j \neq 0$$

This is a test with $k$ linear restrictions, all of which are exclusion restrictions.

2. F-test for a subset of regression coefficients: a joint F-test on exclusion restrictions for a subset of the regression coefficients. For example, suppose we jointly consider the variables $x_2$ and $x_3$. The hypotheses are:

$$H_0: \beta_2 = 0,\ \beta_3 = 0$$

$$H_A: \text{At least one of } \beta_2 \neq 0 \text{ and } \beta_3 \neq 0$$

The p-value for the F-test for regression fit is included in the summary regression output in R. For both of the above tests, the null hypothesis restricts the regression model and the alternative hypothesis corresponds to the general unrestricted regression. Both tests therefore involve two regressions, where the restricted model is nested in the unrestricted model. Whenever one has nested regression models and wishes to compare the explanatory power of the unrestricted model versus the restricted model, the following F-test applies:

$$F = \frac{(SSR_r - SSR_u)/q}{SSR_u/(n - k - 1)}$$

where $q$ is the number of restrictions, $SSR_r$ is the residual sum of squares from the restricted regression model, $SSR_u$ is the residual sum of squares from the unrestricted model, $n$ is the sample size, and $k$ is the number of explanatory variables in the unrestricted model.
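
To make this formula concrete, here is a minimal sketch that computes the statistic by hand in R. The data frame dat and the variables y, x1, x2, and x3 are hypothetical placeholders; the anova() function introduced below performs the same computation automatically.

unres <- lm(y ~ x1 + x2 + x3, data = dat)  # unrestricted model
res   <- lm(y ~ x1, data = dat)            # restricted: beta_2 = beta_3 = 0

ssr_u <- sum(resid(unres)^2)  # SSR_u, unrestricted residual sum of squares
ssr_r <- sum(resid(res)^2)    # SSR_r, restricted residual sum of squares
q     <- 2                    # number of restrictions
dfree <- df.residual(unres)   # n - k - 1

fstat <- ((ssr_r - ssr_u) / q) / (ssr_u / dfree)
pval  <- pf(fstat, df1 = q, df2 = dfree, lower.tail = FALSE)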

2. General Linear Restrictions

2.1 Example: Regression coefficients equal to each other

Suppose we have a regression model with two explanatory variables and we want to test whether the two regression coefficients are equal to each other:

$$H_0: \beta_1 = \beta_2$$

$$H_1: \beta_1 \neq \beta_2$$

The unrestricted regression model takes the form:

$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \varepsilon_i$$

It can be estimated in R with a call to lm() of the form:

unres_lm <- lm(y ~ x1 + x2)

The restricted model takes the form,

$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_1 x_{2,i} + \varepsilon_i$$

where we substituted $\beta_1$ in for $\beta_2$. Combining the terms with like coefficients gives:

$$y_i = \beta_0 + \beta_1 (x_{1,i} + x_{2,i}) + \varepsilon_i$$

This restricted regression can be estimated with a call to lm() of the form:

res_lm <- lm(y ~ I(x1 + x2))

where the call to the function I() tells lm() to treat the sum x1 + x2 as a single variable, thereby forcing a single, identical coefficient on each of x1 and x2.

2.2 Example: Coefficients proportional to one another

Suppose again we have a regression model with two explanatory variables, but we are interested in whether one coefficient is twice the magnitude of the other. The null and alternative hypotheses are:

$$H_0: \beta_1 = 2\beta_2$$

$$H_1: \beta_1 \neq 2\beta_2$$

The unrestricted regression takes the same form as in the example above. The restricted regression can be written in the form,

$$y_i = \beta_0 + 2\beta_2 x_{1,i} + \beta_2 x_{2,i} + \varepsilon_i$$

where $2\beta_2$ was substituted in place of $\beta_1$. Combining like coefficients gives $y_i = \beta_0 + \beta_2 (2x_{1,i} + x_{2,i}) + \varepsilon_i$, so the restricted regression can be estimated with a call to lm() of the form,

res_lm <- lm(y ~ I(2*x1 + x2))

where again the call to I() is used to combine the variables x1 and x2 into a single variable that is consistent with our restriction.

2.3 F-test for linear restrictions

The F-test determines whether the unrestricted regression has significantly more explanatory power than the restricted regression. If so, the test will be statistically significant, and the conclusion will be that the restrictions described in the null hypothesis do not hold. The following call to anova() compares the two nested regression models and performs the F-test:

anova(unres_lm, res_lm)
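
Before moving to real data, here is a small simulation sketch (all data below are simulated, not from the tutorial) in which the null restriction of Section 2.1 holds by construction, so the F-test should fail to reject it most of the time:

set.seed(123)  # hypothetical simulated example
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)  # beta_1 = beta_2 = 0.5, H0 true

unres_lm <- lm(y ~ x1 + x2)     # unrestricted model
res_lm   <- lm(y ~ I(x1 + x2))  # restricted model: one common coefficient

anova(unres_lm, res_lm)  # a large p-value is typical here, since H0 is true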

3. Example: Monthly Earnings and Years of Education

Let us examine a data set that explores the relationship between total monthly earnings (MonthlyEarnings) and a number of factors that may influence monthly earnings, including each person's IQ (IQ), a measure of knowledge of their job (Knowledge), years of education (YearsEdu), years of experience (YearsExperience), years at current job (Tenure), mother's education (MomEdu), and father's education (DadEdu).

The code below downloads a CSV file that includes data on the above variables from 1980 for 935 individuals and assigns it to a data frame that we name wages.

wages <- read.csv("http://murraylax.org/datasets/wage2.csv")

The following call to lm() estimates an unrestricted multiple regression predicting monthly earnings based on the seven explanatory variables given above.

lm_unres <- lm(wages$MonthlyEarnings ~ wages$IQ + wages$Knowledge +
  wages$YearsEdu + wages$YearsExperience + wages$Tenure +
  as.numeric(wages$MomEdu) + as.numeric(wages$DadEdu))

The as.numeric() calls around mother's education and father's education are necessary because R would otherwise interpret these variables as categorical, given how the data were coded.

3.1 Regression coefficients equal to each other

Let us test the hypothesis that a mother's number of years of education has the same influence on monthly earnings as a father's number of years of education. The null and alternative hypotheses are:

$$H_0: \beta_{MomEdu} = \beta_{DadEdu}$$

$$H_1: \beta_{MomEdu} \neq \beta_{DadEdu}$$

The restricted regression is estimated with the following call to lm():

lm_res <- lm(wages$MonthlyEarnings ~ wages$IQ + wages$Knowledge +
  wages$YearsEdu + wages$YearsExperience + wages$Tenure +
  I(as.numeric(wages$MomEdu) + as.numeric(wages$DadEdu)))
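
As an aside, the car package (assumed installed; it is not used elsewhere in this tutorial) provides linearHypothesis(), which tests linear restrictions directly from the unrestricted fit. A sketch, using hypothetical helper columns MomEduNum and DadEduNum so that the coefficient names stay simple:

library(car)  # provides linearHypothesis()

wages2 <- transform(wages,
  MomEduNum = as.numeric(MomEdu),
  DadEduNum = as.numeric(DadEdu))

fit <- lm(MonthlyEarnings ~ IQ + Knowledge + YearsEdu + YearsExperience +
  Tenure + MomEduNum + DadEduNum, data = wages2)

# Test H0: the coefficients on MomEduNum and DadEduNum are equal
linearHypothesis(fit, "MomEduNum = DadEduNum")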

We compare the relative explanatory power of the restricted and unrestricted regressions with the following call to anova():

anova(lm_unres, lm_res)

Analysis of Variance Table

Model 1: wages$MonthlyEarnings ~ wages$IQ + wages$Knowledge + wages$YearsEdu +
    wages$YearsExperience + wages$Tenure + as.numeric(wages$MomEdu) +
    as.numeric(wages$DadEdu)
Model 2: wages$MonthlyEarnings ~ wages$IQ + wages$Knowledge + wages$YearsEdu +
    wages$YearsExperience + wages$Tenure + I(as.numeric(wages$MomEdu) +
    as.numeric(wages$DadEdu))
  Res.Df       RSS Df Sum of Sq      F Pr(>F)
1    927 123432640
2    928 123518160 -1    -85520 0.6423 0.4231

The p-value for the F-test is 0.4231. This far exceeds a significance level of 0.05, so we fail to reject the null hypothesis. We fail to find statistical evidence that the coefficient for mother's education is different from the coefficient for father's education.

3.2 Regression coefficients proportional to each other

Let us examine the unrestricted coefficient estimates:

summary(lm_unres)

Call:
lm(formula = wages$MonthlyEarnings ~ wages$IQ + wages$Knowledge +
    wages$YearsEdu + wages$YearsExperience + wages$Tenure +
    as.numeric(wages$MomEdu) + as.numeric(wages$DadEdu))

Residuals:
    Min      1Q  Median      3Q     Max
-850.67 -235.04  -46.71  189.00 2235.79

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
(Intercept)               -467.194    118.967  -3.927 9.24e-05 ***
wages$IQ                     3.609      0.965   3.739 0.000196 ***
wages$Knowledge              8.002      1.829   4.374 1.36e-05 ***
wages$YearsEdu              46.778      7.292   6.415 2.24e-10 ***
wages$YearsExperience       12.077      3.247   3.719 0.000212 ***
wages$Tenure                 6.589      2.460   2.678 0.007534 **
as.numeric(wages$MomEdu)    -3.693      2.032  -1.817 0.069495 .
as.numeric(wages$DadEdu)    -1.189      1.925  -0.618 0.537007
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 364.9 on 927 degrees of freedom
Multiple R-squared:  0.1918,    Adjusted R-squared:  0.1856
F-statistic: 31.42 on 7 and 927 DF,  p-value: < 2.2e-16

The return to monthly income from an additional year of education is $46.78. This appears to be more than double the combined effect of an additional year of experience ($12.08) and an additional year of tenure ($6.59). Let us see if there is statistical evidence for such a claim. The F-test is only a two-sided test, so we test the hypotheses:

$$H_0: \beta_{YearsEdu} = 2(\beta_{YearsExperience} + \beta_{Tenure})$$

$$H_1: \beta_{YearsEdu} \neq 2(\beta_{YearsExperience} + \beta_{Tenure})$$
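
Because this null hypothesis imposes a single linear restriction, it can also be tested directly from the unrestricted fit using the estimated coefficients and their covariance matrix. Here is a minimal sketch using only base R's coef() and vcov(); with one restriction, the squared t statistic equals the F statistic that the nested-model comparison below reports.

# Restriction: 1*YearsEdu - 2*YearsExperience - 2*Tenure = 0
b <- coef(lm_unres)
V <- vcov(lm_unres)

r <- setNames(rep(0, length(b)), names(b))
r["wages$YearsEdu"]        <-  1
r["wages$YearsExperience"] <- -2
r["wages$Tenure"]          <- -2

est   <- sum(r * b)                          # estimated restriction value
se    <- sqrt(as.numeric(t(r) %*% V %*% r))  # its standard error
tstat <- est / se
tstat^2                                      # equals the F statistic below
2 * pt(-abs(tstat), df.residual(lm_unres))   # two-sided p-value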

Substituting the restriction in the null hypothesis into the model and combining like coefficients yields a restricted model that can be estimated with the following call to lm():

lm_res <- lm(wages$MonthlyEarnings ~ wages$IQ + wages$Knowledge +
  I(wages$YearsExperience + 2*wages$YearsEdu) +
  I(wages$Tenure + 2*wages$YearsEdu) +
  as.numeric(wages$MomEdu) + as.numeric(wages$DadEdu))

Finally, we call anova() to compare the restricted and unrestricted models.

anova(lm_res, lm_unres)

Analysis of Variance Table

Model 1: wages$MonthlyEarnings ~ wages$IQ + wages$Knowledge + I(wages$YearsExperience +
    2 * wages$YearsEdu) + I(wages$Tenure + 2 * wages$YearsEdu) +
    as.numeric(wages$MomEdu) + as.numeric(wages$DadEdu)
Model 2: wages$MonthlyEarnings ~ wages$IQ + wages$Knowledge + wages$YearsEdu +
    wages$YearsExperience + wages$Tenure + as.numeric(wages$MomEdu) +
    as.numeric(wages$DadEdu)
  Res.Df       RSS Df Sum of Sq      F Pr(>F)
1    928 123616450
2    927 123432640  1    183811 1.3805 0.2403

The p-value is equal to 0.2403. We fail to reject the null hypothesis, so we fail to find statistical evidence that the return to education on monthly earnings is different from twice the combined impact of experience and tenure.
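
As a closing check, the F statistic and p-value in this table can be reproduced by hand from the formula in Section 1, with the RSS values copied from the output above:

ssr_r <- 123616450  # RSS from the restricted model (Model 1)
ssr_u <- 123432640  # RSS from the unrestricted model (Model 2)
q     <- 1          # one linear restriction
dfree <- 927        # residual degrees of freedom, unrestricted model

fstat <- ((ssr_r - ssr_u) / q) / (ssr_u / dfree)
fstat                                    # approximately 1.38
pf(fstat, q, dfree, lower.tail = FALSE)  # approximately 0.24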