LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

October 25, 2016

ON TODAY'S LECTURE

- We will explain how multiple hypotheses are tested in a regression model
- We will define the notion of the overall significance of a regression
- We will introduce a measure of the goodness of fit of a regression (R²)

Readings for this week: Studenmund, Chapters 5.5 & 2.4; Wooldridge, Chapters 4 & 3

TESTING MULTIPLE HYPOTHESES

- Suppose we have the model y_i = β0 + β1 x_i1 + β2 x_i2 + β3 x_i3 + ε_i
- Suppose we want to test multiple linear hypotheses in this model
- For example, we want to see if the following restrictions on the coefficients hold jointly: β1 + β2 = 1 and β3 = 0
- We cannot use a t-test in this case (a t-test can be used for only one hypothesis at a time)
- We will use an F-test

RESTRICTED VS. UNRESTRICTED MODEL

- We can reformulate the model by plugging in the restrictions as if they were true (the model under H0)
- We call this model the restricted model, as opposed to the unrestricted model
- The unrestricted model is y_i = β0 + β1 x_i1 + β2 x_i2 + β3 x_i3 + ε_i
- Substituting β3 = 0 and β2 = 1 - β1 gives y_i = β0 + β1 x_i1 + (1 - β1) x_i2 + ε_i, i.e. y_i - x_i2 = β0 + β1 (x_i1 - x_i2) + ε_i
- The restricted model is therefore y*_i = β0 + β1 x*_i + ε_i, where y*_i = y_i - x_i2 and x*_i = x_i1 - x_i2

IDEA OF THE F-TEST

- If the restrictions are true, the restricted model fits the data in the same way as the unrestricted model: the residuals are nearly the same
- If the restrictions are false, the restricted model fits the data poorly: the residuals from the restricted model are much larger than those from the unrestricted model
- The idea is thus to compare the residuals from the two models

IDEA OF THE F-TEST

- How do we compare the residuals of the two models? We calculate the sum of squared residuals in each model and test whether the difference between the two sums is (statistically) equal to zero
- H0: the difference is zero (residuals in the two models are the same, the restrictions hold)
- HA: the difference is positive (residuals in the restricted model are bigger, the restrictions do not hold)
- Sum of squared residuals: SSE = Σ_{i=1}^{n} (y_i - ŷ_i)² = Σ_{i=1}^{n} e_i²

F-TEST

The test statistic is defined as

  F = [(SSE_R - SSE_U) / J] / [SSE_U / (n - k)] ~ F(J, n - k)

where:
- SSE_R ... sum of squared residuals from the restricted model
- SSE_U ... sum of squared residuals from the unrestricted model
- J ... number of restrictions
- n ... number of observations
- k ... number of estimated coefficients (including the intercept)
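Once both regressions have been run, the statistic is simple arithmetic. A minimal Python sketch (the SSE values, sample size, and restriction count below are hypothetical numbers, not taken from the lecture):

```python
def f_statistic(sse_r, sse_u, J, n, k):
    """F-statistic comparing a restricted and an unrestricted model.

    sse_r, sse_u: sums of squared residuals of the two models
    J: number of restrictions; n: observations; k: coefficients (incl. intercept)
    """
    return ((sse_r - sse_u) / J) / (sse_u / (n - k))

# hypothetical values: 2 restrictions, 54 observations, 4 coefficients
f = f_statistic(sse_r=120.0, sse_u=100.0, J=2, n=54, k=4)
print(f)  # 5.0
```

The resulting value is then compared with the critical value of the F(J, n - k) distribution.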

F-TEST

[Figure: density of the F(J, n - k) distribution; the rejection region is the upper 5% tail, to the right of the critical value F_{J, n-k, 0.95}]

EXAMPLE

- We had the model y_i = β0 + β1 x_i1 + β2 x_i2 + β3 x_i3 + ε_i
- We wanted to test H0: β1 + β2 = 1 and β3 = 0 vs. HA: β1 + β2 ≠ 1 or β3 ≠ 0
- Under H0, we obtained the restricted model y*_i = β0 + β1 x*_i + ε_i, where y*_i = y_i - x_i2 and x*_i = x_i1 - x_i2

EXAMPLE

- We run the regression of the unrestricted model and obtain SSE_U
- We run the regression of the restricted model and obtain SSE_R
- We have k = 4 and J = 2
- We construct the F-statistic F = [(SSE_R - SSE_U) / 2] / [SSE_U / (n - 4)]
- We find the critical value of the F distribution with 2 and n - 4 degrees of freedom at the 95% confidence level
- If F > F_{2, n-4, 0.95}, we reject the null hypothesis: we reject that the restrictions hold jointly
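The whole procedure can be sketched in Python on simulated data. Everything below (the data-generating coefficients, sample size, and the use of numpy for the least-squares fits) is an illustrative assumption, not part of the lecture; the data are generated so that H0 holds:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# simulate data satisfying the restrictions: beta1 + beta2 = 1 and beta3 = 0
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 0.4 * x1 + 0.6 * x2 + 0.0 * x3 + rng.normal(size=n)

def sse(y, X):
    """Sum of squared residuals from an OLS fit of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

ones = np.ones(n)
# unrestricted: y on (1, x1, x2, x3); restricted: y - x2 on (1, x1 - x2)
sse_u = sse(y, np.column_stack([ones, x1, x2, x3]))
sse_r = sse(y - x2, np.column_stack([ones, x1 - x2]))

J, k = 2, 4
F = ((sse_r - sse_u) / J) / (sse_u / (n - k))
print(F)  # small when H0 holds; reject if F exceeds the F(2, 196) 95% critical value
```

Because the restricted parameter space is contained in the unrestricted one, SSE_R ≥ SSE_U always, so F is never negative.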

OVERALL SIGNIFICANCE OF THE REGRESSION

- Usually, we are interested in knowing whether the model has some explanatory power, i.e. whether the independent variables indeed explain the dependent variable
- We test this using the F-test of the joint significance of all (k - 1) slope coefficients:

  H0: β1 = 0, β2 = 0, ..., β_{k-1} = 0
  vs.
  HA: βj ≠ 0 for at least one j = 1, ..., k - 1

OVERALL SIGNIFICANCE OF THE REGRESSION

- Unrestricted model: y_i = β0 + β1 x_i1 + β2 x_i2 + ... + β_{k-1} x_{i,k-1} + ε_i
- Restricted model: y_i = β0 + ε_i
- F-statistic: F = [(SSE_R - SSE_U) / (k - 1)] / [SSE_U / (n - k)] ~ F(k - 1, n - k)
- Number of restrictions = k - 1
- This F-statistic and the corresponding p-value are part of standard regression output
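Because the restricted model y_i = β0 + ε_i fits only the mean of y, its sum of squared residuals equals SST. A small Python sketch with hypothetical numbers:

```python
def overall_f(sst, sse, n, k):
    """F-test of the joint significance of all k-1 slope coefficients.

    Under H0 the restricted model is y = beta0 + e, so SSE_R = SST.
    """
    return ((sst - sse) / (k - 1)) / (sse / (n - k))

# hypothetical regression with SST = 200, SSE = 100, n = 54, k = 4
f = overall_f(200.0, 100.0, 54, 4)
print(round(f, 3))  # 16.667
```

A value this large would far exceed the usual F(3, 50) critical values, so the regression would be judged jointly significant.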

EXAMPLE

[Slide shows regression output; not preserved in the transcription]

GOODNESS OF FIT MEASURE

- We know that education and experience have a significant influence on wages
- But how important are they in determining wages? How much of the difference in wages between people is explained by differences in education and experience?
- How well does variation in the independent variable(s) explain variation in the dependent variable?
- These are the questions answered by the goodness of fit measure, R²

TOTAL AND EXPLAINED VARIATION

- Total variation in the dependent variable: Σ_{i=1}^{n} (y_i - ȳ)²
- Predicted value of the dependent variable = the part that is explained by the independent variables: ŷ_i = β̂0 + β̂1 x_i (case of the regression line, for simplicity of notation)
- Explained variation in the dependent variable: Σ_{i=1}^{n} (ŷ_i - ȳ)²

GOODNESS OF FIT - R²

Denote:
- SST = Σ_{i=1}^{n} (y_i - ȳ)² ... Total Sum of Squares
- SSR = Σ_{i=1}^{n} (ŷ_i - ȳ)² ... Regression Sum of Squares

Define the measure of the goodness of fit:

  R² = SSR / SST = (explained variation in y) / (total variation in y)

GOODNESS OF FIT - R²

- In all models: 0 ≤ R² ≤ 1
- R² tells us what percentage of the total variation in the dependent variable is explained by the variation in the independent variable(s)
- R² = 0.3 means that the independent variables can explain 30% of the variation in the dependent variable
- A higher R² means a better fit of the regression model (not necessarily a better model!)

DECOMPOSING THE VARIANCE

For models with an intercept, R² can be rewritten using the decomposition of variance:

  Σ_{i=1}^{n} (y_i - ȳ)² = Σ_{i=1}^{n} (ŷ_i - ȳ)² + Σ_{i=1}^{n} e_i²

- SST = Σ_{i=1}^{n} (y_i - ȳ)² ... Total Sum of Squares
- SSR = Σ_{i=1}^{n} (ŷ_i - ȳ)² ... Regression Sum of Squares
- SSE = Σ_{i=1}^{n} e_i² ... Sum of Squared Residuals

VARIANCE DECOMPOSITION AND R²

- Variance decomposition: SST = SSR + SSE
- Intuition: the total variation can be divided into the explained variation and the unexplained variation; the true value y_i is the sum of the fitted (explained) ŷ_i and the residual (unexplained) e_i: y_i = ŷ_i + e_i
- We can rewrite R²: R² = SSR / SST = (SST - SSE) / SST = 1 - SSE / SST
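The decomposition and the two equivalent R² formulas can be checked numerically on a toy data set (the numbers below are made up purely for illustration):

```python
# verify SST = SSR + SSE and R^2 = SSR/SST = 1 - SSE/SST on a toy data set
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.7]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# OLS slope and intercept of the regression line yhat = b0 + b1 * x
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)
ssr = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))

assert abs(sst - (ssr + sse)) < 1e-9   # SST = SSR + SSE (model has an intercept)
print(ssr / sst, 1 - sse / sst)        # the two R^2 formulas agree
```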

ADJUSTED R²

- The sum of squared residuals (SSE) decreases when additional explanatory variables are introduced into the model, whereas the total sum of squares (SST) remains the same
- Hence R² = 1 - SSE/SST increases whenever we add explanatory variables: models with more variables automatically have a better fit
- To deal with this problem, we define the adjusted R²:

  R²_adj = 1 - [SSE / (n - k)] / [SST / (n - 1)]   (≤ R²)

  (k is the number of coefficients including the intercept)
- This measure introduces a penalty for including more explanatory variables
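A small Python sketch of the adjustment with hypothetical numbers, showing the degrees-of-freedom penalty at work:

```python
def r2_adjusted(sse, sst, n, k):
    """Adjusted R^2: penalizes extra regressors via a degrees-of-freedom correction."""
    return 1 - (sse / (n - k)) / (sst / (n - 1))

# hypothetical fit: 4 estimated coefficients, only 31 observations
sse, sst, n, k = 30.0, 100.0, 31, 4
print(1 - sse / sst)                          # 0.7   (plain R^2)
print(round(r2_adjusted(sse, sst, n, k), 4))  # 0.6667 (adjusted downward)
```

The adjusted value is always below the plain R² whenever k > 1, and the gap widens as more regressors are added to a small sample.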

EXAMPLE

[Slide shows regression output; not preserved in the transcription]

F-TEST - REVISITED

Let us recall the F-statistic:

  F = [(SSE_R - SSE_U) / J] / [SSE_U / (n - k)] ~ F(J, n - k)

We can use the formula R² = 1 - SSE/SST to rewrite the F-statistic in R² form:

  F = [(R²_U - R²_R) / J] / [(1 - R²_U) / (n - k)] ~ F(J, n - k)

We can use this R² form of the F-statistic only under the condition that SST_U = SST_R (the dependent variable is the same in the restricted and unrestricted models)
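A quick numerical check, with hypothetical sums of squares, that the two forms give the same value when SST_U = SST_R:

```python
def f_from_sse(sse_r, sse_u, J, n, k):
    """F-statistic in the sum-of-squared-residuals form."""
    return ((sse_r - sse_u) / J) / (sse_u / (n - k))

def f_from_r2(r2_u, r2_r, J, n, k):
    """F-statistic in the R^2 form (valid only when SST_U = SST_R)."""
    return ((r2_u - r2_r) / J) / ((1 - r2_u) / (n - k))

# same total sum of squares in both models, so the two forms must agree
sst = 200.0
sse_u, sse_r = 100.0, 120.0                    # hypothetical values
r2_u, r2_r = 1 - sse_u / sst, 1 - sse_r / sst  # 0.5 and 0.4

J, n, k = 2, 54, 4
print(round(f_from_sse(sse_r, sse_u, J, n, k), 10))  # 5.0
print(round(f_from_r2(r2_u, r2_r, J, n, k), 10))     # 5.0
```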

SUMMARY

- We showed how restrictions are incorporated in regression models
- We explained the idea of the F-test
- We defined the notion of the overall significance of a regression
- We introduced the measure of the goodness of fit, R²
- We learned how the total variation in the dependent variable can be decomposed