Econometrics. 7) Endogeneity


30C00200 Econometrics 7) Endogeneity Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen

Today's topics
- Common types of endogeneity
  - Simultaneity
  - Omitted variables
  - Measurement errors
- Introduction to instrumental variables

Assumptions <-> properties

Property                  Required assumptions
Finite sample properties
  Unbiasedness            Exogeneity
  Efficiency              Exogeneity, No autocorrelation, Homoscedasticity
Asymptotic properties
  Consistency             Exogeneity, No autocorrelation
  Asymptotic normality    Exogeneity, No autocorrelation, Homoscedasticity

OLS estimator of slope $\beta_1$

$$b_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x})\,\varepsilon_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \beta_1 + \frac{\text{Est.Cov}(x, \varepsilon)}{\text{Est.Var}(x)}$$

Interpretation:
- The OLS estimator $b_1$ is equal to the true parameter $\beta_1$ plus an error term
- The greater the sample covariance of $x$ and $\varepsilon$, the greater the error in $b_1$
- The greater the sample variance of $x$, the smaller the error in $b_1$
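The decomposition above can be checked numerically. The following is a minimal sketch on simulated data, assuming hypothetical parameter values (`beta0 = 1`, `beta1 = 2`) and standard normal draws that are not part of the lecture; it only illustrates that the OLS slope equals the true slope plus the ratio Est.Cov(x, ε) / Est.Var(x).

```python
import random

def mean(values):
    return sum(values) / len(values)

def sample_cov(a, b):
    """Sample covariance with the 1/(n-1) normalisation."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(0)
n = 500
beta0, beta1 = 1.0, 2.0  # hypothetical true parameters (assumption)

x = [random.gauss(0.0, 1.0) for _ in range(n)]
eps = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps)]

# OLS slope: Est.Cov(x, y) / Est.Var(x)
b1 = sample_cov(x, y) / sample_cov(x, x)

# Error term of the decomposition: Est.Cov(x, eps) / Est.Var(x)
error_term = sample_cov(x, eps) / sample_cov(x, x)

print("OLS slope b1:          ", b1)
print("beta1 + error term:    ", beta1 + error_term)
```

In real data ε is unobserved, so the error term cannot be computed; the simulation only makes the identity visible.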

Classic examples of endogeneity
Exogeneity: $\text{Cov}(x, \varepsilon) = 0$
MLR.4 Zero Conditional Mean: $E[\varepsilon \mid x] = 0$
Examples of violations (endogeneity):
- Simultaneity bias
  - Example 1: supply and demand functions are not identifiable from the observed data of prices and quantities.
  - Example 2: a firm takes its productivity ($\varepsilon$) into account in its use of inputs $x$.
- Omitted variable bias
  - An omitted explanatory factor attributed to $\varepsilon$ is correlated with $x$
- Measurement errors in regressors $x$
  - Both the proxy variable and the disturbance contain the measurement error, and are hence correlated

Simultaneity in demand equation
Suppose we want to estimate the demand function D(p), given a time series of observed market prices p and demanded quantities q.
Problem: in market equilibrium, the price is determined endogenously such that supply equals demand. We cannot distinguish changes in the demand function from changes in the supply function.
[Figure: axes p and q, with demand curves D1 and D2, supply curves S1 and S2, and the fitted regression line.]
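The identification problem can be illustrated with a small simulation. This is a sketch under assumed linear demand and supply curves with hypothetical coefficients (`a`, `b`, `c`, `d`) and independent standard normal shocks; none of these values come from the lecture. Observed prices and quantities are equilibrium points, and the OLS line through them recovers neither curve.

```python
import random

def mean(values):
    return sum(values) / len(values)

def ols_slope(x, y):
    """Slope of the OLS line of y on x: Est.Cov(x, y) / Est.Var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

random.seed(1)
n = 20000
a, b = 10.0, 1.0  # hypothetical demand curve: q = a - b*p + u
c, d = 2.0, 1.0   # hypothetical supply curve: q = c + d*p + w

prices, quantities = [], []
for _ in range(n):
    u = random.gauss(0.0, 1.0)      # demand shock
    w = random.gauss(0.0, 1.0)      # supply shock
    p = (a - c + u - w) / (b + d)   # equilibrium price: demand equals supply
    q = c + d * p + w               # equilibrium quantity
    prices.append(p)
    quantities.append(q)

slope = ols_slope(prices, quantities)
print(f"true demand slope: {-b}, true supply slope: {d}, OLS slope: {slope:.3f}")
```

With equal shock variances the OLS slope in this setup lands roughly halfway between the demand slope and the supply slope, so it estimates neither.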

Simultaneity in production function
Consider the production model by Cobb and Douglas (1928)

$$\ln y_i = \beta_0 + \beta_1 \ln x_{1i} + \beta_2 \ln x_{2i} + \varepsilon_i$$

where $y_i$ is the output of firm $i$ and $x_{1i}, x_{2i}$ are the inputs (e.g., labor, capital) of firm $i$. In this model, $\exp(\varepsilon_i)$ can be interpreted as the productivity of firm $i$.
Marschak and Andrews (1944) argue that rational firm managers can learn the productivity level of their firm over time, and adjust their inputs $x$ accordingly. If input choices depend on $\varepsilon_i$, then the exogeneity assumption is clearly violated.
This is a common problem in observational data where rational decision makers choose the x-variables endogenously.

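Omitted variable bias
Suppose the underlying data generating process is governed by the equation

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$$

However, suppose we ignore $x_{2i}$ and estimate the equation

$$y_i = \beta_0 + \beta_1 x_{1i} + \tilde{\varepsilon}_i, \quad \text{where } \tilde{\varepsilon}_i = \beta_2 x_{2i} + \varepsilon_i.$$

Applying the rules of covariance, we have

$$b_1 = \beta_1 + \frac{\text{Est.Cov}(x_1, \tilde{\varepsilon})}{\text{Est.Var}(x_1)} = \beta_1 + \beta_2 \frac{\text{Est.Cov}(x_1, x_2)}{\text{Est.Var}(x_1)} + \frac{\text{Est.Cov}(x_1, \varepsilon)}{\text{Est.Var}(x_1)}$$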

Omitted variable bias

$$b_1 = \beta_1 + \beta_2 \frac{\text{Est.Cov}(x_1, x_2)}{\text{Est.Var}(x_1)} + \frac{\text{Est.Cov}(x_1, \varepsilon)}{\text{Est.Var}(x_1)}$$

We see that omitting $x_2$ does not cause bias if $x_1$ and $x_2$ are uncorrelated (or if $\beta_2 = 0$).
The bias is equal to

$$E(b_1) - \beta_1 = \beta_2 \frac{E[\text{Est.Cov}(x_1, x_2)]}{E[\text{Est.Var}(x_1)]} = \beta_2 \frac{\text{Cov}(x_1, x_2)}{\text{Var}(x_1)}$$

Note that the direction of the bias depends on the sign of $\beta_2$ and that of the correlation of $x_1$ and $x_2$.
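The bias formula can be illustrated on simulated data. The sketch below assumes hypothetical parameters (`beta1 = 1`, `beta2 = 2`) and constructs `x2` so that Cov(x1, x2) / Var(x1) = 0.5; these values are illustrative and not taken from the lecture. The short regression that omits `x2` then yields a slope close to beta1 + beta2 * 0.5.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sample_cov(a, b):
    """Sample covariance with the 1/(n-1) normalisation."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(2)
n = 20000
beta0, beta1, beta2 = 1.0, 1.0, 2.0  # hypothetical true parameters (assumption)

x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [0.5 * v + random.gauss(0.0, 1.0) for v in x1]  # x2 is correlated with x1
eps = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [beta0 + beta1 * a + beta2 * b + e for a, b, e in zip(x1, x2, eps)]

var_x1 = sample_cov(x1, x1)

# Short regression of y on x1 only (x2 omitted)
b1_short = sample_cov(x1, y) / var_x1

# In-sample decomposition from the slide
decomposition = (beta1
                 + beta2 * sample_cov(x1, x2) / var_x1
                 + sample_cov(x1, eps) / var_x1)

# Population counterpart: beta1 + beta2 * Cov(x1, x2) / Var(x1)
population_value = beta1 + beta2 * 0.5

print("short-regression slope:", round(b1_short, 3))
print("decomposition:         ", round(decomposition, 3))
print("beta1 + bias:          ", population_value)
```

Setting the coefficient 0.5 in the construction of `x2` to zero makes the two regressors uncorrelated, and the short-regression slope then stays close to `beta1`.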

Measurement errors - examples
- Self-reported data (survey responses) are subjective, and may be biased
  - Example: condition of apartment (weak, satisfactory, good)
- Human errors in data processing
  - Example: typing errors
- In many situations, we need to use imperfect proxy variables to capture effects that are difficult or impossible to measure
  - Example: IQ test score as a proxy for ability

Model of measurement errors
- True but unobserved variable: $y_i$
- Measurement error represented by random variable $v_i$, with $E[v] = 0$ and $\text{Var}[v] = \sigma_v^2$
- Observed proxy: $q_i = y_i + v_i$

Effect of measurement errors in y
Assume the correct model is:

$$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$$

Suppose we use the proxy $q$ as the dependent variable instead of $y$. Noting that $y_i = q_i - v_i$, we have

$$y_i = q_i - v_i = \beta_1 + \beta_2 x_i + \varepsilon_i$$

Using the proxy $q_i$ as the dependent variable, the model becomes

$$q_i = \beta_1 + \beta_2 x_i + (\varepsilon_i + v_i)$$

Effect of measurement errors in y
Using the proxy $q$ as the dependent variable, the model is

$$q_i = \beta_1 + \beta_2 x_i + (\varepsilon_i + v_i)$$

- Interpretation: measurement error in the dependent variable is just an additional source of error attributed to the disturbance $\varepsilon$
  - Recall: this was one reason for introducing the disturbance term $\varepsilon$ in the first place
- Assuming $x$ and $v$ are statistically independent (uncorrelated), measurement errors in $y$ do not affect the statistical properties of the OLS estimator
  - OLS remains unbiased, efficient, and consistent
- Note: the errors $v$ increase the variance of the disturbance term. The larger $\text{Var}(v)$, the larger the standard errors of the OLS coefficients.
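A small Monte Carlo experiment can illustrate both points: the slope estimate stays centred on the true value, but its sampling spread grows with Var(v). The sketch below assumes hypothetical parameters (intercept 1, slope 2), standard normal x and ε, and a measurement error with standard deviation 2; the function name `simulate_slopes` is introduced only for this illustration.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope of the OLS line of y on x: Est.Cov(x, y) / Est.Var(x)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(noise_sd, replications=500, n=200, seed=3):
    """OLS slope estimates when y is observed with error of std. dev. noise_sd."""
    rng = random.Random(seed)
    beta1, beta2 = 1.0, 2.0  # hypothetical intercept and slope (assumption)
    slopes = []
    for _ in range(replications):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [beta1 + beta2 * xi + rng.gauss(0.0, 1.0) for xi in x]
        q = [yi + rng.gauss(0.0, noise_sd) for yi in y]  # proxy q = y + v
        slopes.append(ols_slope(x, q))
    return slopes

clean = simulate_slopes(noise_sd=0.0)  # y measured without error
noisy = simulate_slopes(noise_sd=2.0)  # y measured with error v

for label, slopes in (("no error in y", clean), ("error in y", noisy)):
    print(f"{label:>14}: mean slope = {statistics.fmean(slopes):.3f}, "
          f"std. dev. of slope = {statistics.stdev(slopes):.3f}")
```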

Effect of measurement errors in x
Assume next that the dependent variable $y$ is free of error. However, the explanatory variable $x$ is subject to error:
- True regressor: $S_i$
- Random measurement error: $v_i$
- Imperfect proxy: $x_i = S_i + v_i$

Effect of measurement errors in x
Assume the true model is:

$$y_i = \beta_0 + \beta_1 S_i + \varepsilon_i$$

Suppose we use a proxy $x$ as the explanatory variable instead of $S$. Assume $x_i = S_i + v_i$, where $v$ is random measurement error. Noting that $S_i = x_i - v_i$, we have

$$y_i = \beta_0 + \beta_1 (x_i - v_i) + \varepsilon_i = \beta_0 + \beta_1 x_i + (\varepsilon_i - \beta_1 v_i)$$

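Effect of measurement errors in x
OLS estimator for the slope $\beta_1$:

$$b_1 = \frac{\text{Est.Cov}(x, y)}{\text{Est.Var}(x)}$$

Inserting $x = S + v$ and $y = \beta_0 + \beta_1 S + \varepsilon$, and neglecting the sample covariances between $S$, $v$, and $\varepsilon$, we find that

$$b_1 = \frac{\text{Est.Cov}(S + v, y)}{\text{Est.Var}(S + v)} \approx \beta_1 \frac{\text{Est.Var}(S)}{\text{Est.Var}(S) + \text{Est.Var}(v)} = \beta_1 - \beta_1 \frac{\text{Est.Var}(v)}{\text{Est.Var}(S) + \text{Est.Var}(v)}$$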

Effect of measurement errors in x
Expected value:

$$E(b_1) \approx \beta_1 - \beta_1 \frac{\text{Var}(v)}{\text{Var}(S) + \text{Var}(v)}$$

- Since variances are non-negative, the measurement error $v$ makes the OLS estimator biased towards zero
  - For positive $\beta_1$, the OLS estimator is downward biased
  - For negative $\beta_1$, the OLS estimator is upward biased
- The bias does not disappear as $n$ increases: the OLS estimator is statistically inconsistent.
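The bias towards zero, and the fact that it does not shrink with the sample size, can be seen in a simulation. This sketch assumes a hypothetical slope `beta1 = 2` and Var(S) = Var(v) = 1, so the formula above predicts a slope estimate near 2 - 2 * 1/(1 + 1) = 1; the function name `slope_with_noisy_regressor` is introduced only for this illustration.

```python
import random

def ols_slope(x, y):
    """Slope of the OLS line of y on x: Est.Cov(x, y) / Est.Var(x)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

beta0, beta1 = 1.0, 2.0  # hypothetical true parameters (assumption)
var_S, var_v = 1.0, 1.0  # assumed variances of true regressor and error

def slope_with_noisy_regressor(n, seed):
    """Regress y on the proxy x = S + v instead of the true regressor S."""
    rng = random.Random(seed)
    S = [rng.gauss(0.0, var_S ** 0.5) for _ in range(n)]
    x = [s + rng.gauss(0.0, var_v ** 0.5) for s in S]      # imperfect proxy
    y = [beta0 + beta1 * s + rng.gauss(0.0, 1.0) for s in S]
    return ols_slope(x, y)

# Value predicted by the attenuation formula
predicted = beta1 - beta1 * var_v / (var_S + var_v)

estimates = {n: slope_with_noisy_regressor(n, seed=4) for n in (1000, 10000, 50000)}
for n, b1 in estimates.items():
    print(f"n = {n:>6}: b1 = {b1:.3f} (formula: {predicted}, true beta1: {beta1})")
```

Increasing `n` tightens the estimate around the attenuated value, not around the true slope.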

Measurement errors as a source of endogeneity problem
Measurement error in the regressor is another example of a violation of the exogeneity assumption $\text{Cov}(x, \varepsilon) = 0$:

$$y_i = \beta_0 + \beta_1 x_i + (\varepsilon_i - \beta_1 v_i) = \beta_0 + \beta_1 (S_i + v_i) + (\varepsilon_i - \beta_1 v_i)$$

- The measurement error $v$ is included in both the regressor $x$ and the disturbance term, so the exogeneity assumption fails
- The OLS estimator is biased and inconsistent

Conclusions: Effects of measurement errors
- Measurement errors in the dependent variable $y$ can be harmlessly attributed to the disturbance term
  - Still, precise data are preferred, as the errors in $y$ increase the variance and the standard errors of the OLS estimator
- Measurement errors in the explanatory variables $x$ are more problematic:
  - The OLS estimator is biased toward zero
  - The OLS estimator is inconsistent

Standard econometric solution to endogeneity: instrumental variables
Suppose the regressor $x$ correlates with the disturbance term $\varepsilon$. We can try to find an instrumental variable $z_i$ that is
- Highly correlated with $x_i$: $\text{Cov}(x_i, z_i) \gg 0$
- Uncorrelated with the disturbance $\varepsilon_i$: $\text{Cov}(\varepsilon_i, z_i) = 0$
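To see why these two conditions help, consider the simple instrumental variable estimator for a single regressor and a single instrument, Est.Cov(z, y) / Est.Cov(z, x). This ratio is the textbook form and is not written out on the slide. The sketch below uses simulated data with hypothetical parameters (`beta1 = 2`) and an unobserved factor that enters both `x` and the disturbance; all of these are assumptions made for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sample_cov(a, b):
    """Sample covariance with the 1/(n-1) normalisation."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(6)
n = 20000
beta0, beta1 = 1.0, 2.0  # hypothetical true parameters (assumption)

x, y, z = [], [], []
for _ in range(n):
    zi = random.gauss(0.0, 1.0)       # instrument: drives x, unrelated to the disturbance
    ci = random.gauss(0.0, 1.0)       # unobserved factor entering both x and the disturbance
    xi = zi + ci + random.gauss(0.0, 1.0)
    ei = ci + random.gauss(0.0, 1.0)  # disturbance, correlated with x through ci
    x.append(xi)
    y.append(beta0 + beta1 * xi + ei)
    z.append(zi)

b_ols = sample_cov(x, y) / sample_cov(x, x)  # biased: Cov(x, eps) != 0
b_iv = sample_cov(z, y) / sample_cov(z, x)   # simple IV estimator

print(f"true beta1 = {beta1}, OLS = {b_ols:.3f}, IV = {b_iv:.3f}")
```

In this setup Cov(x, ε) / Var(x) = 1/3, so OLS overshoots by about that amount, while the IV ratio stays close to the true slope.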

Example: hedonic model of housing market Good condition of the apartment may influence the price, but it is difficult to measure the condition objectively. The data provides a self-reported evaluation (poor, satisfactory, good), which may be biased.

Regression Statistics
Multiple R          0.869856
R Square            0.75665
Adjusted R Square   0.752007
Standard Error      53672.36
Observations        642

ANOVA
            df    SS        MS        F         Significance F
Regression  12    5.63E+12  4.69E+11  162.9793  6.3E-184
Residual    629   1.81E+12  2.88E+09
Total       641   7.45E+12

                    Coefficients  Standard Error  t Stat     P-value    Lower 95%  Upper 95%
Intercept           57230.46      16729.81        3.420867   0.000665   24377.41   90083.5
Size m2             4246.643      227.7986        18.64209   4.42E-62   3799.305   4693.981
Bedrooms            -35271.4      5328.539        -6.61934   7.74E-11   -45735.3   -24807.5
Age                 -2732.05      146.6195        -18.6336   4.9E-62    -3019.97   -2444.12
Elevator            5714.083      5077.285        1.125421   0.26084    -4256.4    15684.56
1st floor           -8235.33      5811.223        -1.41714   0.156936   -19647.1   3176.413
Top floor           9726.951      5449.142        1.785043   0.074736   -973.762   20427.66
Loc. Espoonlahti    -15838.1      11255.1         -1.40719   0.159864   -37940.2   6263.998
Loc. Kivenlahti     -18496.3      8562.531        -2.16014   0.031139   -35310.9   -1681.65
Loc. Leppävaara     -4865.67      6511.45         -0.74725   0.455193   -17652.5   7921.141
Loc. Olari          354.1232      8057.83         0.043948   0.96496    -15469.4   16177.63
Loc. Tapiola        129275.4      7124.512        18.14515   1.71E-59   115284.7   143266.1
Condition 1,2,3     16726.68      4626.302        3.615561   0.000324   7641.81    25811.54

Instrumental variable approach
Step 1: Find additional instruments (e.g., energy class, sauna) to explain the stated condition.

$$\text{Condition}_i = \beta_1 + \beta_2 \text{EnergyClass}_i + \beta_3 \text{Sauna}_i + \beta_4 x_i + \dots + \varepsilon_i$$

Apply the estimated model to form a prediction for condition:

$$\text{PredictedCondition}_i = b_1 + b_2 \text{EnergyClass}_i + b_3 \text{Sauna}_i + b_4 x_i + \dots$$

Regression Statistics
Multiple R          0.443
R Square            0.196
Adjusted R Square   0.179
Standard Error      0.462
Observations        642.000

ANOVA
            df       SS        MS        F         Significance F
Regression  13.000   32.705    2.516     11.768    0.000
Residual    628.000  134.257   0.214
Total       641.000  166.961

                    Coefficients  Standard Error  t Stat     P-value    Lower 95%  Upper 95%
Intercept           2.923         0.089           32.850     0.000      2.748      3.098
Sauna               0.031         0.053           0.572      0.567      -0.074     0.135
Energy class A-C    -0.096        0.086           -1.109     0.268      -0.265     0.074
Size m2             -0.001        0.002           -0.724     0.469      -0.005     0.002
Bedrooms            0.037         0.046           0.792      0.429      -0.054     0.128
Age                 -0.011        0.001           -7.656     0.000      -0.014     -0.008
Elevator            0.091         0.044           2.080      0.038      0.005      0.176
1st floor           -0.022        0.050           -0.433     0.665      -0.120     0.077
Top floor           0.017         0.047           0.362      0.717      -0.075     0.109
Loc. Espoonlahti    0.047         0.097           0.483      0.629      -0.144     0.238
Loc. Kivenlahti     -0.009        0.074           -0.123     0.903      -0.155     0.136
Loc. Leppävaara     -0.020        0.057           -0.349     0.727      -0.132     0.092
Loc. Olari          0.084         0.069           1.210      0.227      -0.052     0.220
Loc. Tapiola        0.047         0.062           0.766      0.444      -0.074     0.168

RESIDUAL OUTPUT
Observation  Predicted Condition 1,2,3  Residuals
1            2.4790                     -0.4790
2            2.4316                     0.5684
3            2.4806                     -0.4806
4            2.4431                     -0.4431
5            2.4439                     0.5561
6            2.4949                     -1.4949
7            2.5245                     0.4755
8            2.5736                     -0.5736
9            2.3231                     -0.3231
10           2.4944                     0.5056
11           2.5331                     -0.5331
12           2.5037                     -0.5037
13           2.3949                     0.6051
14           2.6470                     -0.6470
15           2.4500                     -0.4500

Instrumental variable approach
Step 2: Use the predicted condition in the original regression model instead of the condition

$$\text{Price}_i = \beta_1 + \beta_2 x_{2i} + \dots + \beta_K \text{PredictedCondition}_i + \varepsilon_i$$
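The two steps can be sketched on simulated data. The example below is a simplified stand-in for the housing application, not a reproduction of it: it uses a single hypothetical instrument and no control variables, a true condition that is reported with error, and an assumed price effect of 2. All variable names and parameter values are illustrative assumptions.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

random.seed(7)
n = 20000
effect_of_condition = 2.0  # hypothetical true coefficient (assumption)

true_condition = [random.gauss(0.0, 1.0) for _ in range(n)]
# Self-reported condition: true condition plus measurement error
reported = [s + random.gauss(0.0, 1.0) for s in true_condition]
# Hypothetical instrument: related to the true condition, unrelated to the
# reporting error and to the price disturbance
instrument = [s + random.gauss(0.0, 1.0) for s in true_condition]
price = [10.0 + effect_of_condition * s + random.gauss(0.0, 1.0)
         for s in true_condition]

# Naive OLS with the error-ridden reported condition (attenuated)
_, b_ols = ols(reported, price)

# Step 1: explain the reported condition with the instrument and predict it
a1, g1 = ols(instrument, reported)
predicted_condition = [a1 + g1 * zi for zi in instrument]

# Step 2: use the predicted condition in the price regression
_, b_two_stage = ols(predicted_condition, price)

print(f"true effect = {effect_of_condition}, OLS = {b_ols:.3f}, "
      f"two-stage = {b_two_stage:.3f}")
```

One caveat on the design: the standard errors printed by an ordinary second-stage regression are generally not valid, because they ignore that the predicted regressor is itself estimated; dedicated two-stage least squares routines correct for this.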

Regression Statistics
Multiple R          0.867774
R Square            0.753032
Adjusted R Square   0.748321
Standard Error      54069.82
Observations        642

ANOVA
            df    SS        MS        F         Significance F
Regression  12    5.61E+12  4.67E+11  159.8243  6.3E-182
Residual    629   1.84E+12  2.92E+09
Total       641   7.45E+12

                           Coefficients  Standard Error  t Stat     P-value    Lower 95%  Upper 95%
Intercept                  626423.5      271822.8        2.304529   0.021518   92633.54   1160213
Size m2                    3926.654      275.5462        14.25044   3.9E-40    3385.552   4467.756
Bedrooms                   -26620.9      6768.741        -3.93292   9.33E-05   -39913     -13328.9
Age                        -4960.2       1072.248        -4.62598   4.53E-06   -7065.82   -2854.58
Elevator                   23590.85      9938.087        2.373782   0.017906   4075.007   43106.69
1st floor                  -12667.3      6223.733        -2.03533   0.042236   -24889.2   -445.532
Top floor                  12876.1       5691.009        2.262534   0.024005   1700.427   24051.78
Loc. Espoonlahti           -7596.31      11999.68        -0.63304   0.526936   -31160.6   15967.98
Loc. Kivenlahti            -21160        8718.879        -2.42692   0.015507   -38281.7   -4038.39
Loc. Leppävaara            -7501.13      6678.863        -1.12312   0.261817   -20616.7   5614.435
Loc. Olari                 16238.13      11100.13        1.462877   0.144001   -5559.68   38035.93
Loc. Tapiola               137428.2      8161.734        16.83811   8.59E-53   121400.7   153455.7
Predicted Condition 1,2,3  -177624       92752.38        -1.91504   0.055941   -359766    4517.712

Next time Wed 30 Sept Topic: Instrumental variables