Lecture #12. Instrumental variables regression. Causal parameters III



Demand experiment, market data analysis & simultaneous causality 2

Simultaneous causality. Your task is to estimate the demand function for a homogeneous good sold at a unit price. How would you do this in an experiment? By changing the price yourself (at random) and observing how many units are sold at each price. 3

Experiment. What does choosing prices at random mean?
1. We offer different randomized prices to individual consumers.
2. We offer different randomized prices, each to a group of consumers. 4

Experiment. We record the quantity sold at different prices. We study the outcomes. 5

Experiment. Let's run 10,000 experiments. We choose the price at random as follows:
. set obs 10000
number of observations (_N) was 0, now 10,000
. set seed 85729476
. gen p_exp = 103.5131 + invnorm(uniform()) * 5
6

[Figures, slides 7-12: scatter plots of the experimental data, p_exp (vertical axis) against q_exp (horizontal axis), built up progressively from a single observation (p_exp = 97.901564, q_exp = 33.978439) to the full sample. The plot on slide 12 adds the fitted p-q linear regression line.]

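Regression analysis
. regr q_exp p_exp
      Source |       SS           df       MS      Number of obs   =    10,000
-------------+----------------------------------   F(1, 9998)      =  62500.28
       Model |  27537.4966         1  27537.4966   Prob > F        =    0.0000
    Residual |  4405.09861     9,998  .440597981   R-squared       =    0.8621
-------------+----------------------------------   Adj R-squared   =    0.8621
       Total |  31942.5952     9,999  3.19457898   Root MSE        =    .66378
------------------------------------------------------------------------------
       q_exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       p_exp |  -.3332208   .0013329  -250.00   0.000    -.3358335   -.3306081
       _cons |   66.66225   .1381827   482.42   0.000     66.39138    66.93311
------------------------------------------------------------------------------
13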
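The experiment and the regression of quantity on price can be mimicked outside Stata. The following is a minimal Python sketch (standard library only), not the lecture's code: the demand parameters `A`, `B` and the shock standard deviation `SD_EPS` are hypothetical values chosen for illustration, and only the price distribution (mean 103.5131, standard deviation 5) is taken from the slide.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical demand parameters (assumptions, not the lecture's values):
# Q = A - B * P + eps
A, B, SD_EPS = 200 / 3, 1 / 3, 0.66

rng = random.Random(12345)
n_obs = 10_000

# Randomized prices, mirroring: gen p_exp = 103.5131 + invnorm(uniform()) * 5
p_exp = [103.5131 + 5 * rng.gauss(0, 1) for _ in range(n_obs)]
# Quantity follows the assumed demand function plus a demand shock
q_exp = [A - B * p + rng.gauss(0, SD_EPS) for p in p_exp]

a_hat, slope_hat = ols(p_exp, q_exp)            # regress q_exp on p_exp
b_hat = -slope_hat                              # demand is Q = a - b P
alpha_hat, beta_hat = a_hat / b_hat, 1 / b_hat  # inverse demand P = alpha - beta Q

print(f"slope on price: {slope_hat:.4f} (assumed true value {-B:.4f})")
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.3f}")
```

Because the price is randomized, it is independent of the demand shock by construction, so the regression slope should land close to the assumed value of `-B`.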

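Robustness analysis
. regr q_exp p_exp*
      Source |       SS           df       MS      Number of obs   =    10,000
-------------+----------------------------------   F(2, 9997)      =  31252.50
       Model |  27538.1636         2  13769.0818   Prob > F        =    0.0000
    Residual |  4404.43159     9,997  .440575332   R-squared       =    0.8621
-------------+----------------------------------   Adj R-squared   =    0.8621
       Total |  31942.5952     9,999  3.19457898   Root MSE        =    .66376
------------------------------------------------------------------------------
       q_exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       p_exp |  -.3824384   .0400223    -9.56   0.000    -.4608901   -.3039867
      p_exp2 |   .0002378   .0001933     1.23   0.219     -.000141    .0006166
       _cons |   69.20313   2.069639    33.44   0.000     65.14622    73.26004
------------------------------------------------------------------------------
14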

Parameters for inverse demand function
. sum alpha_exp beta_exp
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   alpha_exp |     10,000    200.1629           0   200.1629   200.1629
    beta_exp |     10,000    3.003041           0   3.003041   3.003041
15

Market outcomes. Assume you are an outside observer of a market, say a prospective buyer of a firm or the competition authority. You cannot run experiments. You would still want to know demand (e.g., to calculate price-cost margins or consumer surplus). 16

Market outcomes. We collect data from the market. We observe pairs $(P_i, Q_i)$, where $i$ indexes markets. Let's think about how such pairs are determined, using a simple monopoly model. 17

Market outcome data 18

[Figures, slides 19-25: scatter plots of the market data, p (vertical axis) against q (horizontal axis), built up progressively from a single observation (p = 102.93862, q = 32.288517, shown next to the experimental observation p_exp = 97.901564, q_exp = 33.978439) to the full sample. The plot on slide 24 adds the fitted p-q linear regression line. The plot on slide 25 compares the fitted values from the market data with those from the experimental data.]

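Regression using market data
. regr p q
      Source |       SS           df       MS      Number of obs   =    10,000
-------------+----------------------------------   F(1, 9998)      =    135.06
       Model |  297.257566         1  297.257566   Prob > F        =    0.0000
    Residual |  22004.1539     9,998  2.20085556   R-squared       =    0.0133
-------------+----------------------------------   Adj R-squared   =    0.0132
       Total |  22301.4115     9,999  2.23036418   Root MSE        =    1.4835
------------------------------------------------------------------------------
           p |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           q |  -.3447938    .029668   -11.62   0.000    -.4029491   -.2866384
       _cons |   114.6012      .9542   120.10   0.000     112.7308    116.4716
------------------------------------------------------------------------------
26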

Parameter comparison for inverse demand function
. sum alpha_exp beta_exp alpha_ols beta_ols
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   alpha_exp |     10,000    200.1629           0   200.1629   200.1629
    beta_exp |     10,000    3.003041           0   3.003041   3.003041
   alpha_ols |     10,000    114.6012    1.42e-14   114.6012   114.6012
    beta_ols |     10,000    .3446774           0   .3446774   .3446774
27

Challenge with market data. How is price determined? By the firm(s). Firm(s) take market conditions into account:
1. demand
2. (variable) costs of production
3. competition. 28

Challenge with market data. Price-quantity pairs are a leading example of simultaneous causality. This generalizes to more complicated markets with
1. differentiated goods
2. multiproduct firms
3. endogenous entry and exit
4. dynamic considerations
5. advertising
6. 29

Market data - solution. We need to address simultaneous causality. To do so, we need to understand and exploit the determinants of price and quantity. How did the experiment solve the problem? By having the researcher shift (= change) prices instead of the firm. 30

Linear monopoly. Demand function:
$Q_i = a - b P_i + \varepsilon_i$
$a$ = average intercept
$b$ = slope
$\varepsilon_i$ = market-specific deviation from the average intercept.
NOTE: we assume the firm observes all these parameters. 31

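Linear monopoly. Inverse demand function:
$P_i = \frac{a}{b} - \frac{1}{b} Q_i + \frac{1}{b} \varepsilon_i = \alpha - \beta Q_i + \tilde{\varepsilon}_i$, where $\alpha = a/b$, $\beta = 1/b$ and $\tilde{\varepsilon}_i = \varepsilon_i / b$. 32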

Linear monopoly. Supply function: how do we get it? We need to specify the costs of production. We assume a constant marginal cost: $c_i = c_0 + c_1 z_i + \eta_i$ 33

Linear monopoly. $c_i = c_0 + c_1 z_i + \eta_i$
$c_0$ = average marginal cost when $z_i = 0$
$z_i$ = a component of marginal cost that varies across markets (cost of raw materials per unit of output, cost of labor per unit of output, ...)
$\eta_i$ = shock to marginal cost, i.e., the deviation from the average. 34

Linear monopoly. Profit function: $\max_{P_i} \pi_i = (P_i - c_i) Q_i$
Equilibrium price: $P_i = \frac{a}{2b} + \frac{1}{2} c_i + \frac{1}{2b} \varepsilon_i$ 35

Linear monopoly. The equilibrium price $P_i = \frac{a}{2b} + \frac{1}{2} (c_0 + c_1 z_i + \eta_i) + \frac{1}{2b} \varepsilon_i$ is a function of
1. (fixed) demand parameters $a$ and $b$
2. (fixed) supply side parameters $c_0$, $c_1$
3. variable cost determinant $z_i$
4. cost shock $\eta_i$
5. demand shock $\varepsilon_i$ 36

Linear monopoly. The equilibrium quantity $Q_i = \frac{a}{2} - \frac{b}{2} c_i + \frac{1}{2} \varepsilon_i$ is a function of
1. (fixed) demand parameters $a$ and $b$
2. (fixed) supply side parameters $c_0$, $c_1$
3. variable cost determinant $z_i$
4. cost shock $\eta_i$
5. demand shock $\varepsilon_i$ 37
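For readers who want the intermediate step, the equilibrium price and quantity follow from the monopolist's first-order condition. The derivation below is a sketch that assumes the linear demand function and the constant marginal cost defined on the earlier slides, with $b > 0$ so that the second-order condition ($-2b < 0$) holds.

```latex
\begin{align*}
\pi_i &= (P_i - c_i)\,(a - b P_i + \varepsilon_i) \\
\frac{\partial \pi_i}{\partial P_i} &= (a - b P_i + \varepsilon_i) - b\,(P_i - c_i) = 0 \\
\Rightarrow \quad P_i &= \frac{a}{2b} + \frac{1}{2} c_i + \frac{1}{2b} \varepsilon_i \\
Q_i &= a - b \left( \frac{a}{2b} + \frac{1}{2} c_i + \frac{1}{2b} \varepsilon_i \right) + \varepsilon_i
     = \frac{a}{2} - \frac{b}{2} c_i + \frac{1}{2} \varepsilon_i
\end{align*}
```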

Market data - challenge. Both the equilibrium price and the equilibrium quantity are functions of
1. the demand shock $\varepsilon_i$ and
2. the supply shock $\eta_i$.
The result is simultaneous causality. 38
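To see how simultaneity distorts a regression on market data, one can simulate the linear monopoly model and regress price on quantity, as the lecture does. The Python sketch below (standard library only) uses hypothetical parameter values throughout: `A`, `B`, `C0`, `C1` and the three standard deviations are assumptions chosen for illustration, not the lecture's data-generating process.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical parameters (assumptions for illustration only)
A, B = 200 / 3, 1 / 3                     # demand: Q = A - B * P + eps
C0, C1 = 2.0, 1.0                         # marginal cost: c = C0 + C1 * z + eta
SD_Z, SD_ETA, SD_EPS = 2.0, 1.0, 0.66


def simulate_market(n, seed):
    """Draw n markets from the linear monopoly model; return lists (p, q, z)."""
    rng = random.Random(seed)
    p, q, z = [], [], []
    for _ in range(n):
        z_i = rng.gauss(0, SD_Z)          # observed cost shifter
        eta_i = rng.gauss(0, SD_ETA)      # unobserved cost shock
        eps_i = rng.gauss(0, SD_EPS)      # unobserved demand shock
        c_i = C0 + C1 * z_i + eta_i       # marginal cost
        p_i = A / (2 * B) + c_i / 2 + eps_i / (2 * B)  # equilibrium price
        q_i = A - B * p_i + eps_i         # quantity from the demand function
        p.append(p_i)
        q.append(q_i)
        z.append(z_i)
    return p, q, z


p, q, z = simulate_market(10_000, seed=2024)

alpha_ols, slope_pq = ols(q, p)           # regress p on q, as on the slide
beta_ols = -slope_pq                      # inverse demand is P = alpha - beta * Q

print(f"beta_ols = {beta_ols:.3f}, assumed true beta = {1 / B:.3f}")
print(f"alpha_ols = {alpha_ols:.1f}, assumed true alpha = {A / B:.1f}")
```

Under these assumptions the OLS estimate of the inverse-demand slope comes out far below the true value, because the price responds to the same demand shock that moves the quantity.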

Market data - solution. We want to learn the demand curve. Could we mimic the experimental approach with market data? Needed: something that shifts the firm's (supply) decision at random. Random = without being affected by the demand shock $\varepsilon_i$. 39

Experiment #2. Imagine the firm still sets the price, but we choose (at random) $z_i = z_i^{exp}$. Recall that $c_i = c_0 + c_1 z_i + \eta_i$, so we shift the firm's marginal cost. 40

Experiment #2. Now the firm sets each time the price $P_i = \frac{a}{2b} + \frac{1}{2} (c_0 + c_1 z_i^{exp} + \eta_i) + \frac{1}{2b} \varepsilon_i$. This is equivalent to running an experiment: the change in price is due to a (known) change in $z_i^{exp}$. 41

Experiment #2. Imagine we raise $z_i^{exp}$ by 1 unit. By how much does
1. marginal cost $c_i = c_0 + c_1 z_i + \eta_i$ change? By $c_1$.
2. price change? By $\frac{1}{2} c_1$ (by the equation on slide #41).
3. quantity demanded change? By $-b \frac{1}{2} c_1$ (by the equation on slide #37). 42

Experiment #2. How could we get the slope of the demand function from these changes? By dividing the change in quantity by the change in price: $\frac{-b \frac{1}{2} c_1}{\frac{1}{2} c_1} = -b$ 43

Experiment #2. How could we get those numbers?
1. Regress $P_i$ on $z_i^{exp}$ to get $\frac{1}{2} c_1$:
$P_i = \gamma_0 + \gamma_1 z_i + e_i$, with $\gamma_1 = \frac{1}{2} c_1$ (by the equation on slide #41). 44

Experiment #2.
2. Regress $Q_i$ on $z_i^{exp}$ to get $-b \frac{1}{2} c_1$:
$Q_i = \mu_0 + \mu_1 z_i + w_i$, with $\mu_1 = -b \frac{1}{2} c_1$ (by the equation on slide #37).
These are called reduced form equations. 45

Experiment #2. When would this work?
1. $z_i^{exp}$ has to impact the firm's decision, i.e., have an effect on $c_i$: $c_1$ cannot be (insignificantly different from) zero.
2. $z_i^{exp}$ must not have an effect on $Q_i$ directly, but only via $c_i$. 46

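Let's regress P on z
. regr p z
      Source |       SS           df       MS      Number of obs   =    10,000
-------------+----------------------------------   F(1, 9998)      =   7814.03
       Model |   9783.4964         1   9783.4964   Prob > F        =    0.0000
    Residual |  12517.9151     9,998  1.25204192   R-squared       =    0.4387
-------------+----------------------------------   Adj R-squared   =    0.4386
       Total |  22301.4115     9,999  2.23036418   Root MSE        =    1.1189
------------------------------------------------------------------------------
           p |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   .4986242   .0056407    88.40   0.000     .4875673    .5096812
       _cons |   101.0138   .0304066  3322.11   0.000     100.9542    101.0734
------------------------------------------------------------------------------
. scalar red_1 = _b[z]
47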

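Let's regress Q on z
. regr q z
      Source |       SS           df       MS      Number of obs   =    10,000
-------------+----------------------------------   F(1, 9998)      =   8117.89
       Model |  1120.46318         1  1120.46318   Prob > F        =    0.0000
    Residual |  1379.96352     9,998  .138023957   R-squared       =    0.4481
-------------+----------------------------------   Adj R-squared   =    0.4481
       Total |  2500.42671     9,999  .250067677   Root MSE        =    .37152
------------------------------------------------------------------------------
           q |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |  -.1687428   .0018729   -90.10   0.000    -.1724139   -.1650716
       _cons |   33.00446   .0100957  3269.17   0.000     32.98467    33.02425
------------------------------------------------------------------------------
. scalar red_2 = _b[z]
48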

Let's calculate the slope of the demand function
. scalar b_red = red_2 / red_1
. scalar list b_red
     b_red = -.33841667
49
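The ratio of the two reduced-form slopes can be reproduced on simulated data. The Python sketch below (standard library only) follows the same three steps as the Stata commands; all parameter values (`A`, `B`, `C0`, `C1` and the standard deviations) are hypothetical assumptions, so the numbers will not match the slides exactly.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical parameters (assumptions for illustration only)
A, B = 200 / 3, 1 / 3                     # demand: Q = A - B * P + eps
C0, C1 = 2.0, 1.0                         # marginal cost: c = C0 + C1 * z + eta
SD_Z, SD_ETA, SD_EPS = 2.0, 1.0, 0.66


def simulate_market(n, seed):
    """Draw n markets from the linear monopoly model; return lists (p, q, z)."""
    rng = random.Random(seed)
    p, q, z = [], [], []
    for _ in range(n):
        z_i = rng.gauss(0, SD_Z)
        eta_i = rng.gauss(0, SD_ETA)
        eps_i = rng.gauss(0, SD_EPS)
        c_i = C0 + C1 * z_i + eta_i
        p_i = A / (2 * B) + c_i / 2 + eps_i / (2 * B)
        q_i = A - B * p_i + eps_i
        p.append(p_i)
        q.append(q_i)
        z.append(z_i)
    return p, q, z


p, q, z = simulate_market(10_000, seed=7)

_, red_1 = ols(z, p)      # reduced form for price:    slope should be near C1 / 2
_, red_2 = ols(z, q)      # reduced form for quantity: slope should be near -B * C1 / 2
b_red = red_2 / red_1     # ratio recovers the demand slope -B

print(f"red_1 = {red_1:.4f}, red_2 = {red_2:.4f}, b_red = {b_red:.4f}")
```

The ratio works because the cost shifter `z` moves price and quantity only through marginal cost, so the common factor `C1 / 2` cancels.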

Instrumental variable. Instrumental variable = a variable that causes variation in price (explanatory variable X) but does not affect demand (dependent variable Y) directly. If the variable cost component $z_i$ varies at random, i.e., without affecting demand directly, market data allows us to use the experimental approach indirectly. 50

Approach #2. Could we proceed differently?
1. Regress $P_i$ on $z_i$. Calculate the predicted price $\hat{P}_i = \hat{\gamma}_0 + \hat{\gamma}_1 z_i$.
2. Regress $Q_i$ on $\hat{P}_i$ to estimate $b$ (and $a$).
The equation $Q_i = a - b P_i + \varepsilon_i$ is a structural equation. 51

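Regress P on z, create predicted P
. regr p z
      Source |       SS           df       MS      Number of obs   =    10,000
-------------+----------------------------------   F(1, 9998)      =   7814.03
       Model |   9783.4964         1   9783.4964   Prob > F        =    0.0000
    Residual |  12517.9151     9,998  1.25204192   R-squared       =    0.4387
-------------+----------------------------------   Adj R-squared   =    0.4386
       Total |  22301.4115     9,999  2.23036418   Root MSE        =    1.1189
------------------------------------------------------------------------------
           p |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   .4986242   .0056407    88.40   0.000     .4875673    .5096812
       _cons |   101.0138   .0304066  3322.11   0.000     100.9542    101.0734
------------------------------------------------------------------------------
. predict p_hat
(option xb assumed; fitted values)
52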

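Regress Q on predicted P
. regr q p_hat
      Source |       SS           df       MS      Number of obs   =    10,000
-------------+----------------------------------   F(1, 9998)      =   8117.89
       Model |  1120.46318         1  1120.46318   Prob > F        =    0.0000
    Residual |  1379.96352     9,998  .138023957   R-squared       =    0.4481
-------------+----------------------------------   Adj R-squared   =    0.4481
       Total |  2500.42671     9,999  .250067677   Root MSE        =    .37152
------------------------------------------------------------------------------
           q |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       p_hat |  -.3384167    .003756   -90.10   0.000    -.3457793   -.3310541
       _cons |   67.18923    .388817   172.80   0.000     66.42707    67.95139
------------------------------------------------------------------------------
53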

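Why / how do these approaches work?
1. Regress $P_i$ on $z_i$ to get $\frac{1}{2} c_1 = \frac{\operatorname{cov}(P_i, z_i)}{\operatorname{var}(z_i)}$.
2. Regress $Q_i$ on $z_i$ to get $-b \frac{1}{2} c_1 = \frac{\operatorname{cov}(Q_i, z_i)}{\operatorname{var}(z_i)}$.
Dividing the second by the first: $-b = \frac{\operatorname{cov}(Q_i, z_i)}{\operatorname{cov}(P_i, z_i)}$ 54

Why / how do these approaches work? Regress $Q_i$ on $\hat{P}_i$; the slope coefficient estimates $-b$:
$-\hat{b} = \frac{\operatorname{cov}(Q_i, \hat{P}_i)}{\operatorname{var}(\hat{P}_i)} = \frac{\operatorname{cov}(Q_i, \hat{\gamma}_0 + \hat{\gamma}_1 z_i)}{\operatorname{var}(\hat{\gamma}_0 + \hat{\gamma}_1 z_i)}$
$-\hat{b} = \frac{\hat{\gamma}_1 \operatorname{cov}(Q_i, z_i)}{\hat{\gamma}_1^2 \operatorname{var}(z_i)}$; recall $\hat{\gamma}_1 = \frac{\operatorname{cov}(P_i, z_i)}{\operatorname{var}(z_i)}$
$-\hat{b} = \frac{\operatorname{var}(z_i)}{\operatorname{cov}(P_i, z_i)} \cdot \frac{\operatorname{cov}(Q_i, z_i)}{\operatorname{var}(z_i)}$ 55

Why / how do these approaches work?
$-\hat{b} = \frac{\operatorname{var}(z_i)}{\operatorname{cov}(P_i, z_i)} \cdot \frac{\operatorname{cov}(Q_i, z_i)}{\operatorname{var}(z_i)} = \frac{\operatorname{cov}(Q_i, z_i)}{\operatorname{cov}(P_i, z_i)}$ 56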
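The algebra above says that the two-step regression and the covariance ratio are the same number in any sample, not only in expectation. The Python sketch below (standard library only) checks this numerically on simulated data; the parameter values are hypothetical assumptions, and the helper names `ols`, `cov` and `simulate_market` are introduced here for illustration.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cov(x, y):
    """Sample covariance of x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical parameters (assumptions for illustration only)
A, B = 200 / 3, 1 / 3
C0, C1 = 2.0, 1.0
SD_Z, SD_ETA, SD_EPS = 2.0, 1.0, 0.66


def simulate_market(n, seed):
    """Draw n markets from the linear monopoly model; return lists (p, q, z)."""
    rng = random.Random(seed)
    p, q, z = [], [], []
    for _ in range(n):
        z_i = rng.gauss(0, SD_Z)
        eta_i = rng.gauss(0, SD_ETA)
        eps_i = rng.gauss(0, SD_EPS)
        c_i = C0 + C1 * z_i + eta_i
        p_i = A / (2 * B) + c_i / 2 + eps_i / (2 * B)
        q_i = A - B * p_i + eps_i
        p.append(p_i)
        q.append(q_i)
        z.append(z_i)
    return p, q, z


p, q, z = simulate_market(5_000, seed=99)

# Covariance ratio: cov(Q, z) / cov(P, z)
slope_ratio = cov(q, z) / cov(p, z)

# Two-step approach: first stage, fitted prices, then second stage
g0, g1 = ols(z, p)
p_hat = [g0 + g1 * zi for zi in z]
_, slope_two_step = ols(p_hat, q)

print(f"covariance ratio: {slope_ratio:.6f}")
print(f"two-step slope:   {slope_two_step:.6f}")
```

Both numbers agree up to floating-point rounding, which is what the cancellation of $\operatorname{var}(z_i)$ in the derivation implies.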

2sls / instrumental variables regression. In practice, you want to use the so-called Two Stage Least Squares (2sls) or instrumental variables regression command. In Stata: ivregress. The manual approach and ivregress produce the same point estimates, but the latter corrects the standard errors. This is important, as the manual approach yields standard errors that are too small: it ignores the uncertainty in the estimated parameters $\hat{\gamma}_0$ and $\hat{\gamma}_1$ used to calculate $\hat{P}_i$. 57
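The difference between the two standard errors can be sketched directly. The Python code below (standard library only) computes the naive second-stage standard error, which uses residuals based on the fitted price, and a corrected one, which uses residuals based on the actual price. It assumes homoskedastic errors and divides the corrected residual variance by n; both choices, like all parameter values, are assumptions of this sketch rather than something stated on the slides.

```python
import math
import random


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical parameters (assumptions for illustration only)
A, B = 200 / 3, 1 / 3
C0, C1 = 2.0, 1.0
SD_Z, SD_ETA, SD_EPS = 2.0, 1.0, 0.66


def simulate_market(n, seed):
    """Draw n markets from the linear monopoly model; return lists (p, q, z)."""
    rng = random.Random(seed)
    p, q, z = [], [], []
    for _ in range(n):
        z_i = rng.gauss(0, SD_Z)
        eta_i = rng.gauss(0, SD_ETA)
        eps_i = rng.gauss(0, SD_EPS)
        c_i = C0 + C1 * z_i + eta_i
        p_i = A / (2 * B) + c_i / 2 + eps_i / (2 * B)
        q_i = A - B * p_i + eps_i
        p.append(p_i)
        q.append(q_i)
        z.append(z_i)
    return p, q, z


p, q, z = simulate_market(10_000, seed=314)
n = len(p)

# First stage and fitted prices
g0, g1 = ols(z, p)
p_hat = [g0 + g1 * zi for zi in z]

# Second stage: same point estimates as 2SLS
a_hat, slope_hat = ols(p_hat, q)
mean_p_hat = sum(p_hat) / n
sxx_hat = sum((ph - mean_p_hat) ** 2 for ph in p_hat)

# Naive: residuals use the FITTED price (what a manual second stage reports)
rss_naive = sum((qi - a_hat - slope_hat * ph) ** 2 for qi, ph in zip(q, p_hat))
se_naive = math.sqrt(rss_naive / (n - 2) / sxx_hat)

# Corrected: residuals use the ACTUAL price
rss_2sls = sum((qi - a_hat - slope_hat * pi) ** 2 for qi, pi in zip(q, p))
se_2sls = math.sqrt(rss_2sls / n / sxx_hat)

print(f"slope = {slope_hat:.4f}, naive se = {se_naive:.5f}, corrected se = {se_2sls:.5f}")
```

In the lecture's own output the same pattern appears: the manual second stage reports a standard error of .003756 for the slope, while ivregress reports .0067601 for the identical point estimate.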

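2sls estimation of demand
. ivregress 2sls q (p = z)
Instrumental variables (2SLS) regression          Number of obs   =     10,000
                                                  Wald chi2(1)    =    2506.07
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =          .
                                                  Root MSE        =     .66866
------------------------------------------------------------------------------
           q |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.3384167   .0067601   -50.06   0.000    -.3516663   -.3251671
       _cons |   67.18923   .6997938    96.01   0.000     65.81766     68.5608
------------------------------------------------------------------------------
Instrumented:  p
Instruments:   z
58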

Requirements for an instrument. Think of our normal regression $Y = \beta_0 + \beta_1 X + u$.
1. Instrument relevance: the instrument $Z$ has to affect the (endogenous) explanatory variable $X$ of the equation of interest ("2nd stage equation") in the equation $X = \alpha_0 + \alpha_1 Z + v$.
2. Instrument exogeneity: the instrument $Z$ must not be correlated with the error term of the equation of interest, i.e., $\operatorname{cov}(Z, u) = 0$. 59

Instrument relevance / Weak instrument. Relevance = the instrument $z$ needs to be correlated enough with the endogenous explanatory variable $X$. What happens when $\operatorname{cov}(z, X) \to 0$? $\hat{\beta}_1$ becomes undefined! 60
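The reason is visible in the formula for the estimator with one instrument, which is the general version of the covariance ratio derived for the demand example (there $Y = Q$, $X = P$ and $Z = z$). The hat notation for sample covariances is introduced here for illustration.

```latex
\hat{\beta}_1^{IV} = \frac{\widehat{\operatorname{cov}}(Z, Y)}{\widehat{\operatorname{cov}}(Z, X)}
```

As the covariance between the instrument and $X$ approaches zero, the denominator vanishes, so small sampling errors in the numerator are blown up and the estimate becomes erratic.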

Instrument relevance / Weak instrument. You want to check that your instrument is relevant, i.e., that you don't have a weak instrument. Rule of thumb: the F-statistic of $Z$ when you regress $X$ on $Z$ (and possible further controls) should exceed 10. Note: with 1 instrument, the F-statistic is the square of the t-statistic. 61
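The note that F equals the squared t-statistic can be checked on the lecture's own first-stage output: the t-statistic on z is 88.40, and 88.40 squared is about 7814.6, against a reported F of 7814.03; the small gap is consistent with the t-statistic being rounded to two decimals. The Python sketch below (standard library only) computes both statistics from scratch on simulated data; all parameter values are hypothetical assumptions.

```python
import math
import random


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical parameters (assumptions for illustration only)
A, B = 200 / 3, 1 / 3
C0, C1 = 2.0, 1.0
SD_Z, SD_ETA, SD_EPS = 2.0, 1.0, 0.66

rng = random.Random(61)
n = 10_000
z, p = [], []
for _ in range(n):
    z_i = rng.gauss(0, SD_Z)
    c_i = C0 + C1 * z_i + rng.gauss(0, SD_ETA)
    # Equilibrium price of the linear monopoly model
    p.append(A / (2 * B) + c_i / 2 + rng.gauss(0, SD_EPS) / (2 * B))
    z.append(z_i)

# First stage: regress the endogenous price on the instrument
g0, g1 = ols(z, p)
mean_z = sum(z) / n
szz = sum((zi - mean_z) ** 2 for zi in z)
rss = sum((pi - g0 - g1 * zi) ** 2 for pi, zi in zip(p, z))
ess = g1 ** 2 * szz                     # explained sum of squares
s2 = rss / (n - 2)                      # residual variance estimate

f_stat = ess / s2                       # F(1, n - 2) for the single instrument
t_stat = g1 / math.sqrt(s2 / szz)       # t-statistic on the instrument

print(f"F = {f_stat:.2f}, t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
print("passes rule of thumb (F > 10):", f_stat > 10)
```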

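. estat firststage
  First-stage regression summary statistics
  --------------------------------------------------------------------------
               |            Adjusted      Partial
      Variable |   R-sq.       R-sq.        R-sq.    F(1,9998)   Prob > F
  -------------+------------------------------------------------------------
             p |  0.4387      0.4386       0.4387      7814.03    0.0000
  --------------------------------------------------------------------------
62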

Instrument relevance / Weak instrument There are more sophisticated tests. There are ways of allowing for weak instruments. We leave all that for later courses. 63

Instrument correlated with error = the exogeneity assumption fails. The result is a biased estimate of $\beta_1$. This is similar to omitted variable bias. 64
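The size of the problem can be read off the large-sample limit of the estimator. The expression below is a sketch for the case of one instrument and one endogenous regressor, obtained by substituting $Y = \beta_0 + \beta_1 X + u$ into the covariance ratio; the superscript $IV$ is notation introduced here.

```latex
\hat{\beta}_1^{IV} \;\xrightarrow{\;p\;}\;
\frac{\operatorname{cov}(Z, Y)}{\operatorname{cov}(Z, X)}
= \frac{\operatorname{cov}(Z, \beta_0 + \beta_1 X + u)}{\operatorname{cov}(Z, X)}
= \beta_1 + \frac{\operatorname{cov}(Z, u)}{\operatorname{cov}(Z, X)}
```

The second term is zero only under instrument exogeneity. It also shows how the two requirements interact: with a weak instrument, even a small correlation between $Z$ and $u$ is divided by a small number.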

Instrument correlated with error. What can be done?
1. A strong story for why there is no correlation between the instrument and the error.
2. With multiple instruments, one may do tests.
3. There are ways of allowing for (some) correlation to check the robustness of your results (for later). 65