Mgmt 469. Causality and Identification

Similar documents
Using Instrumental Variables to Find Causal Effects in Public Health

Simultaneous Equation Models Learning Objectives Introduction Introduction (2) Introduction (3) Solving the Model structural equations

8. Instrumental variables regression

Instrumental Variables and the Problem of Endogeneity

Ec1123 Section 7 Instrumental Variables

SIMULTANEOUS EQUATION MODEL

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

Practical Regression: Noise, Heteroskedasticity, and Grouped Data

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Rockefeller College University at Albany

1 Motivation for Instrumental Variable (IV) Regression


An explanation of Two Stage Least Squares

Multiple Linear Regression CIVL 7012/8012

Instrumental variables estimation using heteroskedasticity-based instruments

WISE International Masters

ECO Class 6 Nonparametric Econometrics

Wooldridge, Introductory Econometrics, 3d ed. Chapter 16: Simultaneous equations models. An obvious reason for the endogeneity of explanatory

The F distribution. If: 1. u 1,,u n are normally distributed; and 2. X i is distributed independently of u i (so in particular u i is homoskedastic)

Lecture #11: Introduction to the New Empirical Industrial Organization (NEIO) -

Applied Statistics and Econometrics

Econometrics Problem Set 11

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015

ECO 310: Empirical Industrial Organization Lecture 2 - Estimation of Demand and Supply

Topics in Applied Econometrics and Development - Spring 2014

Instrumental Variables

Notes 11: OLS Theorems ECO 231W - Undergraduate Econometrics

Handout 12. Endogeneity & Simultaneous Equation Models

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case

ECON Introductory Econometrics. Lecture 17: Experiments

Applied Health Economics (for B.Sc.)

Motivation for multiple regression

Simple Regression Model (Assumptions)

EMERGING MARKETS - Lecture 2: Methodology refresher

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.

Instrumental variables estimation using heteroskedasticity-based instruments

Chapter 7. Hypothesis Tests and Confidence Intervals in Multiple Regression

Empirical approaches in public economics

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

Simultaneous Equation Models

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.

Wooldridge, Introductory Econometrics, 2d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Dealing With Endogeneity

Session 3-4: Estimating the gravity models

Introduction to Panel Data Analysis

Hypothesis Tests and Confidence Intervals in Multiple Regression

Applied Statistics and Econometrics. Giuseppe Ragusa Lecture 15: Instrumental Variables

Development. ECON 8830 Anant Nyshadham

Simultaneous equation model (structural form)

4 Instrumental Variables Single endogenous variable One continuous instrument. 2

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems

The regression model with one stochastic regressor (part II)

An overview of applied econometrics

Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear

Lecture Notes Part 7: Systems of Equations

An example to start off with

ECON 497 Midterm Spring

The gravity models for trade research

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

Click to edit Master title style

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Honors General Exam Part 3: Econometrics Solutions. Harvard University

Applied Statistics and Econometrics

Econometrics Homework 4 Solutions

ECONOMETRICS HONOR S EXAM REVIEW SESSION

An Introduction to Econometrics. A Self-contained Approach. Frank Westhoff. The MIT Press Cambridge, Massachusetts London, England

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

2. Linear regression with multiple regressors

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C =

ECON 4551 Econometrics II Memorial University of Newfoundland. Panel Data Models. Adapted from Vera Tabakova s notes

4 Instrumental Variables Single endogenous variable One continuous instrument. 2

Multiple Regression Analysis. Basic Estimation Techniques. Multiple Regression Analysis. Multiple Regression Analysis

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

The Simple Linear Regression Model

Instrumental Variables

Session IV Instrumental Variables

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

Introduction to Machine Learning Fall 2017 Note 5. 1 Overview. 2 Metric

1. You have data on years of work experience, EXPER, its square, EXPER2, years of education, EDUC, and the log of hourly wages, LWAGE

Job Training Partnership Act (JTPA)

Outline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"


Sampling and Sample Size. Shawn Cole Harvard Business School

HOW TO TEST ENDOGENEITY OR EXOGENEITY: AN E-LEARNING HANDS ON SAS

Using Matching, Instrumental Variables and Control Functions to Estimate Economic Choice Models

1 Correlation and Inference from Regression

1 A Non-technical Introduction to Regression

THE AUSTRALIAN NATIONAL UNIVERSITY. Second Semester Final Examination November, Econometrics II: Econometric Modelling (EMET 2008/6008)

Econ 300/QAC 201: Quantitative Methods in Economics/Applied Data Analysis. 18th Class 7/2/10

Instrumental Variables, Simultaneous and Systems of Equations

Chapter 6 Stochastic Regressors

Transcription:

Mgmt 469 Causality and Identification As you have learned by now, a key issue in empirical research is identifying the direction of causality in the relationship between two variables. This problem often arises when it is unclear in a regression whether X causes Y or Y causes X. An example from my own research experience helps to clarify the problem and its solution. The Problem of Supplier Induced Demand Historically, MDs have controlled the medical process. Analysts have long noted that medical prices are higher in markets that have more MDs per capita. Some have hypothesized that when MDs who face competition induce demand" so as to maintain their incomes. The theory of demand inducement is almost dogma among health policy analysts, especially in Canada and Europe. The supplier induced demand (SID) hypothesis may be stated as follows: The SID hypothesis: When the supply of physicians increases, the demand for their services increases. That is, increased supply causes increased demand. Early proponents of SID proposed testing it with the following model: Q d = g(x d, P, Q s ) + ε, where Q d = the quantity of physician services provided (e.g., number of surgeries), X d = one or more demand shifters, P = price Q s = the number of physicians per capita, and ε = the error term. One might want to test this model by gathering the required data and running OLS. A positive, statistically significant coefficient on Q s might be considered evidence of SID. It turns out that this would be a very bad approach. The reason: doubts about causality 1

Consider the following, somewhat related example: There is a positive correlation between construction workers and the number of office buildings under construction. What is the assumed direction of causality? The fundamental causality question in supplier induced demand is: Do more MDs cause high demand (X causes Y), or does high demand cause more MDs (Y causes X)? Unless this causality is disentangled, you cannot interpret the coefficient on Q s. X is an endogenous regressor; i.e., the value of X is correlated with the underlying error in the model. More on endogeneity The concern is this problem is that Y might cause X Suppose you have an observation for which underlying error is positive. This will give a larger value for Y Y causes X and if Y is larger, then X will be larger This implies that X and the error are positively correlated; thus, there is endogeneity bias 2

Identification in a model with endogeneity bias The trick to resolving causality in a model with endogeneity bias is to find one or more variables that have two features: They must be correlated with X They must be uncorrelated with the underlying error - This implies that they are not caused by Y. - Nor can they be variables that you might have included as control variables on the RHS of your original regression These variables are called instruments. The technique for using them in regression is known as instrumental variables (IV) regression or two-stage least squares (TSLS) regression. The following thought experiment may clarify the intuition behind this approach Suppose we could randomly allocate MDs to different places, making sure to allocated more MDs to some places than others. We are now sure that Q s is not related to any factor that might affect Y (and therefore affect X). Thus, Q s is truly exogenous If we run our regression and find that per capita utilization is higher in the places that have more MDs, we have confirmed SID. Causality is not in doubt. 3

IV and TSLS seek the real world equivalent of such randomization. In other words, we need some instruments that cause real world MDs to locate in certain places for reasons that have nothing to do with demand. If demand is high in places that score high on these instruments, then we conclude that MDs do cause demand (instruments MD supply demand) The hard part is finding these instruments - Perhaps we could examine the number and quality of golf courses. It seems doubtful that golf courses are directly related to demand. - If we saw that areas with high golf courses have high demand, we would blame inducement. - Our logic would then be: (golf courses MD supply demand). If we want to use IV/TSLS estimate Q d = g(x d, Q s, P) and resolve causality, we need to proceed as follows: Stage one: regress Q s = f(x s, X d, P) We include the identifiers X s. These are variables that plausibly shift Q s but are otherwise unrelated to Q d. To maximize efficiency, we also include any other variables on the RHS of the original model, in this case X d and P. Recover the predicted values of Q s. Call the predicted values Q s. Stage two: regress Q d = g(x d, Q s, P) The coefficient on Q s indicates the extent to which Q s affects Q d. It is purged of any other relationship between Q d and Q s thanks to using instruments in stage one. Note: the stage 2 regression must be adjusted to account for the fact that Q s is a noisy estimate of Q s. A good software package like Stata will do this for us. In Stata, we type regress Q d X d P (X s X d P) 4

To summarize: TSLS/IV is necessary if causality is in doubt TSLS/IV requires identifiers for the RHS variable whose causative impact is in question. Good identifiers have the following characteristics - They are correlated with the problematic RHS variable They have no causative relationship with the LHS variable - that is, they should not belong in X d or be caused by Y. Second stage replaces the actual value of the problematic variable with its predicted value from the first stage 5

Victor Fuchs used TSLS in his study of inducement. His first stage estimates predicted physician supply by location. He uses predicted supply in the second stage estimate of demand, and obtains an inducement elasticity of.28. (That is, for every 10 percent increase in the supply of surgeons, the number of surgeries per capita would increase by 2.8 percent.) Later studies using more control variables find elasticities of about.10. These findings have led to calls for controls on MD supply and specialist training 6

The SID literature has subsequently reexamined these studies, assessing whether the instruments are valid The key identifiers are $ hotel/capital (Fuchs) and weather conditions (later authors) Are these identifiers? - Most of the time, we rely on theory to be our guide. Do these cause MD locations, but are otherwise unrelated to demand. It is difficult to see why these are any more related to MD supply than they are to demand. - In cases such as these, we can employ econometric techniques to test our instruments. Econometrician Jerry Hausman has developed several tests for identifiers. I describe Hausman tests below. I published a paper that showed flaws in the Fuchs identifiers. - I tested SID in a market where we would be surprised to see it the market for childbirths - I used the same identifiers as Fuchs and others and found an elasticity of about.10. More OB/GYN MDs appeared to lead to more childbirths! The method appears to find inducement even when there should not be any. - This is not a problem with TSLS per se, but does show the need for good identifiers. 7

Implementation Issues (optional) Minimum requirements for TSLS - General rule of thumb is that R 2 from the identifiers should be about.20 or higher. - Should have at least 50 observations Testing the identifiers - To test for endogeneity, you can perform a Hausman test - Hausman tests generally examine whether certain excluded variables are correlated with dependent variables. 1) Regress Y = a + bx + cz + dx * (where X are the original values of the predictors whose causality is in doubt, and X * are the predicted values) 2) Test significance of d using t or F statistic. If d is significant, then you should reject hypothesis that X variables are exogenous, and seek out better identifiers. Here is another Hausman test that tells you whether you have chosen good identifiers 1) Regress Y on the endogenous RHS variable (e.g., #MDs), X d and those variables in X s that are not identifiers. 2) Take the residuals and correlate them with the identifiers. - If the correlation is significant, then the identifiers are predictors of Y, and are not really identifiers - they belong in X d - I did this with weather and hotels and found that they belonged in X d 8

TSLS and Omitted Variable Bias (optional) Because simultaneity bias and omitted variable bias are different flavors of the same problem (endogeneity bias), they share the same solution. Suppose that you are regressing Y on X but are concerned that X is correlated with some omitted factor Z. Suppose further that you cannot collect data on Z. Your coefficient on X may suffer from omitted variable bias. If you can find one or more instruments for X - variables that cause X but have no causative relationship with Z - then you can eliminate the bias. You would use the methods described above, using the predicted value of X instead of the actual value. The strength of TSLS and IV regression is that they avoid bias. The weakness is that you need good instruments. Never trust OLS results when IV/TSLS is required. But do not always expect to be able to perform IV/TSLS 9