Lecture #12: Instrumental variables regression. Causal parameters III (slide 1)
Demand experiment, market data analysis & simultaneous causality (slide 2)
Simultaneous causality (slide 3)
Your task is to estimate the demand function for a homogeneous good sold at a unit price. How would you do this in an experiment? By changing the price yourself (at random) and observing how many units are sold at each price.
Experiment (slide 4)
What does choosing prices at random mean?
1. We offer different randomized prices to individual consumers.
2. We offer a different randomized price to each group of consumers.
Experiment (slide 5)
We record the quantity sold at different prices. We study the outcomes.
Experiment (slide 6)
Let's run 10,000 experiments. We choose the price at random as follows:

    . set obs 10000
    number of observations (_N) was 0, now 10,000
    . set seed 85729476
    . gen p_exp = 103.5131 + invnorm(uniform()) * 5
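The slides do not show how the quantity q_exp was generated. The following is a minimal sketch of such an experiment, assuming a linear demand curve with hypothetical parameters a = 66.65 and b = 1/3 and a demand shock with standard deviation 0.66. The seed and these values are assumptions for illustration, and rnormal() is used in place of invnorm(uniform()).

```stata
* Sketch of a randomized-price experiment. All parameter values are assumptions.
clear
set obs 10000
set seed 12345                              // arbitrary seed, not the lecture's
scalar a = 66.65                            // assumed demand intercept
scalar b = 1/3                              // assumed demand slope
generate p_exp = 103.5131 + rnormal(0, 5)   // price chosen at random by the researcher
generate eps   = rnormal(0, 0.66)           // assumed demand shock, independent of p_exp
generate q_exp = a - b*p_exp + eps          // quantity demanded at the randomized price
regress q_exp p_exp                         // slope estimates -b because p_exp is random
```

Because the price is drawn independently of the demand shock, the regression slope should land close to -b.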
[Figures, slides 7-12: scatter plot of experimental quantity q_exp against price p_exp, built up progressively from the first observation (p_exp = 97.901564, q_exp = 33.978439); slide 12 adds the p-q linear regression line.]
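Regression analysis (slide 13)

    . regr q_exp p_exp

          Source |       SS           df       MS      Number of obs   =     10,000
    -------------+----------------------------------   F(1, 9998)      =   62500.28
           Model |  27537.4966         1  27537.4966   Prob > F        =     0.0000
        Residual |  4405.09861     9,998  .440597981   R-squared       =     0.8621
    -------------+----------------------------------   Adj R-squared   =     0.8621
           Total |  31942.5952     9,999  3.19457898   Root MSE        =     .66378

    ------------------------------------------------------------------------------
           q_exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           p_exp |  -.3332208   .0013329  -250.00   0.000    -.3358335   -.3306081
           _cons |   66.66225   .1381827   482.42   0.000     66.39138    66.93311
    ------------------------------------------------------------------------------

Robustness analysis (slide 14)

    . regr q_exp p_exp*

          Source |       SS           df       MS      Number of obs   =     10,000
    -------------+----------------------------------   F(2, 9997)      =   31252.50
           Model |  27538.1636         2  13769.0818   Prob > F        =     0.0000
        Residual |  4404.43159     9,997  .440575332   R-squared       =     0.8621
    -------------+----------------------------------   Adj R-squared   =     0.8621
           Total |  31942.5952     9,999  3.19457898   Root MSE        =     .66376

    ------------------------------------------------------------------------------
           q_exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           p_exp |  -.3824384   .0400223    -9.56   0.000    -.4608901   -.3039867
          p_exp2 |   .0002378   .0001933     1.23   0.219     -.000141    .0006166
           _cons |   69.20313   2.069639    33.44   0.000     65.14622    73.26004
    ------------------------------------------------------------------------------

Parameters for inverse demand function (slide 15)

    . sum alpha_exp beta_exp

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
       alpha_exp |     10,000    200.1629           0   200.1629   200.1629
        beta_exp |     10,000    3.003041           0   3.003041   3.003041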
Market outcomes (slide 16)
Assume you are an outside observer of a market, say a prospective buyer of a firm or the competition authority. You cannot run experiments. You would still want to know demand (e.g., to calculate price-cost margins or consumer surplus).
Market outcomes (slide 17)
We collect data from the market. We observe pairs $(P_i, Q_i)$, where $i$ indexes markets. Let's think about how such pairs are determined, using a simple monopoly model.
Market outcome data (slide 18)
[Figures, slides 19-24: scatter plot of market quantity q against market price p, built up progressively from the first observation (p = 102.93862, q = 32.288517), which slide 19 shows next to the first experimental observation (p_exp = 97.901564, q_exp = 33.978439); slide 24 adds the p-q linear regression line.]
[Figure, slide 25: fitted values from the market data and from the experimental data plotted together.]
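Regression using market data (slide 26)

    . regr p q

          Source |       SS           df       MS      Number of obs   =     10,000
    -------------+----------------------------------   F(1, 9998)      =     135.06
           Model |  297.257566         1  297.257566   Prob > F        =     0.0000
        Residual |  22004.1539     9,998  2.20085556   R-squared       =     0.0133
    -------------+----------------------------------   Adj R-squared   =     0.0132
           Total |  22301.4115     9,999  2.23036418   Root MSE        =     1.4835

    ------------------------------------------------------------------------------
               p |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               q |  -.3447938    .029668   -11.62   0.000    -.4029491   -.2866384
           _cons |   114.6012      .9542   120.10   0.000     112.7308    116.4716
    ------------------------------------------------------------------------------

Parameter comparison for inverse demand function (slide 27)

    . sum alpha_exp beta_exp alpha_ols beta_ols

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
       alpha_exp |     10,000    200.1629           0   200.1629   200.1629
        beta_exp |     10,000    3.003041           0   3.003041   3.003041
       alpha_ols |     10,000    114.6012    1.42e-14   114.6012   114.6012
        beta_ols |     10,000    .3446774           0   .3446774   .3446774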
Challenge with market data (slide 28)
How is price determined? By the firm(s). Firm(s) take market conditions into account:
1. demand
2. (variable) costs of production
3. competition
Challenge with market data (slide 29)
Price-quantity pairs are a leading example of simultaneous causality. This generalizes to more complicated markets with
1. differentiated goods
2. multiproduct firms
3. endogenous entry and exit
4. dynamic considerations
5. advertising
6. ...
Market data - solution (slide 30)
We need to address simultaneous causality, so we need to understand and exploit the determinants of price and quantity.
How did the experiment solve the problem? By having the researcher shift (= change) prices instead of the firm.
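Linear monopoly (slide 31)
Demand function: $Q_i = a - bP_i + \varepsilon_i$
- $a$ = average intercept
- $b$ = slope
- $\varepsilon_i$ = market-specific deviation from the average intercept
NOTE: we assume the firm observes all these parameters.
Linear monopoly (slide 32)
Inverse demand function: $P_i = \frac{a}{b} - \frac{1}{b} Q_i + \frac{1}{b}\varepsilon_i = \alpha - \beta Q_i + \tilde{\varepsilon}_i$, with $\alpha = a/b$, $\beta = 1/b$ and $\tilde{\varepsilon}_i = \varepsilon_i / b$.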
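The inversion can be written out step by step. The symbol $\tilde{\varepsilon}_i$ for the rescaled demand shock is notation introduced here for clarity and is not taken from the slides.

```latex
\begin{align*}
Q_i   &= a - b P_i + \varepsilon_i \\
b P_i &= a - Q_i + \varepsilon_i \\
P_i   &= \frac{a}{b} - \frac{1}{b} Q_i + \frac{1}{b}\varepsilon_i
       = \alpha - \beta Q_i + \tilde{\varepsilon}_i,
\qquad \alpha = \frac{a}{b},\quad \beta = \frac{1}{b},\quad
\tilde{\varepsilon}_i = \frac{\varepsilon_i}{b}.
\end{align*}
```

As a rough check against the numbers on slide 15, $\beta = 3.003041$ implies $b = 1/\beta \approx 0.333$ and $\alpha = 200.1629$ implies $a = \alpha/\beta \approx 66.65$, close to the experimental regression estimates on slide 13 (slope -.3332, intercept 66.66).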
Linear monopoly (slide 33)
Supply function: how do we get it? We need to specify the costs of production. Assume constant marginal cost: $c_i = c_0 + c_1 z_i + \eta_i$
Linear monopoly (slide 34)
$c_i = c_0 + c_1 z_i + \eta_i$
- $c_0$ = average marginal cost when $z_i = 0$
- $z_i$ = a component of marginal cost that varies across markets (cost of raw materials per unit of output, cost of labor per unit of output, ...)
- $\eta_i$ = shock to marginal cost, i.e., the deviation from the average
Linear monopoly (slide 35)
Profit maximization: $\max_{P_i} \pi_i = (P_i - c_i) Q_i$
Equilibrium price: $P_i = \frac{a}{2b} + \frac{1}{2} c_i + \frac{1}{2b}\varepsilon_i$
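The equilibrium price follows from the first-order condition of the profit maximization problem. A short derivation, using only the demand function $Q_i = a - bP_i + \varepsilon_i$ and the constant marginal cost $c_i$ defined on the preceding slides:

```latex
\begin{align*}
\pi_i &= (P_i - c_i)(a - bP_i + \varepsilon_i) \\
\frac{\partial \pi_i}{\partial P_i}
      &= (a - bP_i + \varepsilon_i) - b\,(P_i - c_i) = 0 \\
2bP_i &= a + b\,c_i + \varepsilon_i \\
P_i   &= \frac{a}{2b} + \frac{1}{2}\,c_i + \frac{1}{2b}\,\varepsilon_i
\end{align*}
```

The second derivative is $-2b$, so this is a maximum whenever demand slopes downward ($b > 0$).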
Linear monopoly (slide 36)
Equilibrium price: $P_i = \frac{a}{2b} + \frac{1}{2}(c_0 + c_1 z_i + \eta_i) + \frac{1}{2b}\varepsilon_i$
It is a function of
1. (fixed) demand parameters $a$ and $b$
2. (fixed) supply-side parameters $c_0$, $c_1$
3. variable cost determinant $z_i$
4. cost shock $\eta_i$
5. demand shock $\varepsilon_i$
Linear monopoly (slide 37)
Equilibrium quantity: $Q_i = \frac{a}{2} - \frac{b}{2} c_i + \frac{1}{2}\varepsilon_i$
It is a function of
1. (fixed) demand parameters $a$ and $b$
2. (fixed) supply-side parameters $c_0$, $c_1$
3. variable cost determinant $z_i$
4. cost shock $\eta_i$
5. demand shock $\varepsilon_i$
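The equilibrium quantity is obtained by substituting the equilibrium price back into the demand function:

```latex
\begin{align*}
Q_i &= a - bP_i + \varepsilon_i \\
    &= a - b\left(\frac{a}{2b} + \frac{1}{2}\,c_i + \frac{1}{2b}\,\varepsilon_i\right) + \varepsilon_i \\
    &= \frac{a}{2} - \frac{b}{2}\,c_i + \frac{1}{2}\,\varepsilon_i
\end{align*}
```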
Market data - challenge (slide 38)
Both the equilibrium price and the equilibrium quantity are functions of
1. the demand shock $\varepsilon_i$ and
2. the supply shock $\eta_i$.
This is simultaneous causality.
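To see what simultaneity does to a naive regression, here is a minimal simulation sketch of the linear monopoly model. All parameter values (a = 66.65, b = 1/3, c0 = 2, c1 = 1, and the shock standard deviations) and the seed are assumptions chosen for illustration. They are not the values behind the lecture's data set.

```stata
* Sketch: market data from the linear monopoly model. All values are assumptions.
clear
set obs 10000
set seed 2024                         // arbitrary seed
scalar a  = 66.65                     // assumed demand intercept
scalar b  = 1/3                       // assumed demand slope
scalar c0 = 2                         // assumed average marginal cost at z = 0
scalar c1 = 1                         // assumed effect of z on marginal cost
generate z   = rnormal(0, 2)          // observed cost shifter
generate eta = rnormal(0, 1)          // cost shock
generate eps = rnormal(0, 0.5)        // demand shock
generate mc  = c0 + c1*z + eta        // marginal cost c_i
generate p   = a/(2*b) + mc/2 + eps/(2*b)   // equilibrium price set by the firm
generate q   = a - b*p + eps                // equilibrium quantity (on the demand curve)
regress q p                           // OLS slope mixes demand and supply variation
```

Under these assumed values the OLS slope should come out well above the true -1/3 (closer to zero), because the price responds to the demand shock eps.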
Market data - solution (slide 39)
We want to learn the demand curve. Could we mimic the experimental approach with market data?
Needed: something that shifts the firm's (supply) decision at random. Random = without being affected by the demand shock $\varepsilon_i$.
Experiment #2 (slide 40)
Imagine the firm still sets the price, but we choose (at random) $z_i = z_i^{exp}$.
Recall that $c_i = c_0 + c_1 z_i + \eta_i$, so we shift the firm's marginal cost.
Experiment #2 (slide 41)
Now the firm sets, each time, the price $P_i = \frac{a}{2b} + \frac{1}{2}(c_0 + c_1 z_i^{exp} + \eta_i) + \frac{1}{2b}\varepsilon_i$
This is equivalent to running an experiment: the change in price is due to a (known) change in $z_i^{exp}$.
Experiment #2 (slide 42)
Imagine we raise $z_i^{exp}$ by 1 unit. By how much does
1. marginal cost $c_i = c_0 + c_1 z_i + \eta_i$ change? $c_1$
2. price change? $\frac{1}{2} c_1$ (by eqn on slide #41)
3. quantity demanded change? $-b \frac{1}{2} c_1$ (by eqn on slide #37)
Experiment #2 (slide 43)
How could we get the slope of the demand function from these changes? By dividing the change in quantity by the change in price: $\frac{-b \frac{1}{2} c_1}{\frac{1}{2} c_1} = -b$
Experiment #2 (slide 44)
How could we get those numbers?
1. Regress $P_i$ on $z_i^{exp}$ to get $\frac{1}{2} c_1$:
$P_i = \gamma_0 + \gamma_1 z_i + e_i$, with $\gamma_1 = \frac{1}{2} c_1$ (by eqn on slide #41)
Experiment #2 (slide 45)
2. Regress $Q_i$ on $z_i^{exp}$ to get $-b \frac{1}{2} c_1$:
$Q_i = \mu_0 + \mu_1 z_i + w_i$, with $\mu_1 = -b \frac{1}{2} c_1$ (by eqn on slide #37)
These are called reduced form equations.
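The two reduced form equations can be read off by substituting $c_i = c_0 + c_1 z_i + \eta_i$ into the equilibrium price and quantity. The grouping into intercept, slope and error term below is a worked version of that substitution:

```latex
\begin{align*}
P_i &= \underbrace{\frac{a}{2b} + \frac{c_0}{2}}_{\gamma_0}
     + \underbrace{\frac{c_1}{2}}_{\gamma_1} z_i
     + \underbrace{\frac{1}{2}\eta_i + \frac{1}{2b}\varepsilon_i}_{e_i} \\
Q_i &= \underbrace{\frac{a}{2} - \frac{b\,c_0}{2}}_{\mu_0}
     + \underbrace{\left(-\frac{b\,c_1}{2}\right)}_{\mu_1} z_i
     + \underbrace{\frac{1}{2}\varepsilon_i - \frac{b}{2}\eta_i}_{w_i} \\
\frac{\mu_1}{\gamma_1} &= \frac{-b\,c_1/2}{c_1/2} = -b
\end{align*}
```

Both error terms contain only $\eta_i$ and $\varepsilon_i$, so each regression recovers its slope as long as $z_i$ is unrelated to these shocks.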
Experiment #2 (slide 46)
When would this work?
1. $z_i^{exp}$ has to impact the firm's decision, i.e., have an effect on $c_i$: $c_1$ cannot be (insignificantly different from) zero.
2. $z_i^{exp}$ may not have an effect on $Q_i$ directly, but only via $c_i$.
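Let's regress P on z (slide 47)

    . regr p z

          Source |       SS           df       MS      Number of obs   =     10,000
    -------------+----------------------------------   F(1, 9998)      =    7814.03
           Model |   9783.4964         1   9783.4964   Prob > F        =     0.0000
        Residual |  12517.9151     9,998  1.25204192   R-squared       =     0.4387
    -------------+----------------------------------   Adj R-squared   =     0.4386
           Total |  22301.4115     9,999  2.23036418   Root MSE        =     1.1189

    ------------------------------------------------------------------------------
               p |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               z |   .4986242   .0056407    88.40   0.000     .4875673    .5096812
           _cons |   101.0138   .0304066  3322.11   0.000     100.9542    101.0734
    ------------------------------------------------------------------------------

    . scalar red_1 = _b[z]

Let's regress Q on z (slide 48)

    . regr q z

          Source |       SS           df       MS      Number of obs   =     10,000
    -------------+----------------------------------   F(1, 9998)      =    8117.89
           Model |  1120.46318         1  1120.46318   Prob > F        =     0.0000
        Residual |  1379.96352     9,998  .138023957   R-squared       =     0.4481
    -------------+----------------------------------   Adj R-squared   =     0.4481
           Total |  2500.42671     9,999  .250067677   Root MSE        =     .37152

    ------------------------------------------------------------------------------
               q |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               z |  -.1687428   .0018729   -90.10   0.000    -.1724139   -.1650716
           _cons |   33.00446   .0100957  3269.17   0.000     32.98467    33.02425
    ------------------------------------------------------------------------------

    . scalar red_2 = _b[z]

Let's calculate b (slide 49)

    . scalar b_red = red_2 / red_1
    . scalar list b_red
         b_red = -.33841667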
Instrumental variable (slide 50)
Instrumental variable = a variable that causes variation in price (the explanatory variable X) but does not affect demand (the dependent variable Y) directly.
If the variable cost component $z_i$ varies at random, i.e., without affecting demand directly, market data allow us to use the experimental approach indirectly.
Approach #2 (slide 51)
Could we proceed differently?
1. Regress $P_i$ on $z_i$. Calculate the predicted price $\hat{P}_i = \hat{\gamma}_0 + \hat{\gamma}_1 z_i$.
2. Regress $Q_i$ on $\hat{P}_i$ to get $b$ (and $a$).
The equation $Q_i = a - bP_i + \varepsilon_i$ is a structural equation.
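Regress P on z, create predicted P (slide 52)

    . regr p z

          Source |       SS           df       MS      Number of obs   =     10,000
    -------------+----------------------------------   F(1, 9998)      =    7814.03
           Model |   9783.4964         1   9783.4964   Prob > F        =     0.0000
        Residual |  12517.9151     9,998  1.25204192   R-squared       =     0.4387
    -------------+----------------------------------   Adj R-squared   =     0.4386
           Total |  22301.4115     9,999  2.23036418   Root MSE        =     1.1189

    ------------------------------------------------------------------------------
               p |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               z |   .4986242   .0056407    88.40   0.000     .4875673    .5096812
           _cons |   101.0138   .0304066  3322.11   0.000     100.9542    101.0734
    ------------------------------------------------------------------------------

    . predict p_hat
    (option xb assumed; fitted values)

Regress Q on predicted P (slide 53)

    . regr q p_hat

          Source |       SS           df       MS      Number of obs   =     10,000
    -------------+----------------------------------   F(1, 9998)      =    8117.89
           Model |  1120.46318         1  1120.46318   Prob > F        =     0.0000
        Residual |  1379.96352     9,998  .138023957   R-squared       =     0.4481
    -------------+----------------------------------   Adj R-squared   =     0.4481
           Total |  2500.42671     9,999  .250067677   Root MSE        =     .37152

    ------------------------------------------------------------------------------
               q |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           p_hat |  -.3384167    .003756   -90.10   0.000    -.3457793   -.3310541
           _cons |   67.18923    .388817   172.80   0.000     66.42707    67.95139
    ------------------------------------------------------------------------------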
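Why / how do these approaches work? (slide 54)
1. Regress $P_i$ on $z_i$ to get $\frac{1}{2} c_1 = \frac{\operatorname{cov}(P_i, z_i)}{\operatorname{var}(z_i)}$.
2. Regress $Q_i$ on $z_i$ to get $-b \frac{1}{2} c_1 = \frac{\operatorname{cov}(Q_i, z_i)}{\operatorname{var}(z_i)}$.
Dividing the second by the first: $-b = \frac{\operatorname{cov}(Q_i, z_i)}{\operatorname{cov}(P_i, z_i)}$
Why / how do these approaches work? (slide 55)
Regress $Q_i$ on $\hat{P}_i$:
$-b = \frac{\operatorname{cov}(Q_i, \hat{P}_i)}{\operatorname{var}(\hat{P}_i)} = \frac{\operatorname{cov}(Q_i, \hat{\gamma}_0 + \hat{\gamma}_1 z_i)}{\operatorname{var}(\hat{\gamma}_0 + \hat{\gamma}_1 z_i)} = \frac{\hat{\gamma}_1 \operatorname{cov}(Q_i, z_i)}{\hat{\gamma}_1^2 \operatorname{var}(z_i)}$
Recall $\hat{\gamma}_1 = \frac{\operatorname{cov}(P_i, z_i)}{\operatorname{var}(z_i)}$, so
$-b = \frac{\operatorname{var}(z_i)}{\operatorname{cov}(P_i, z_i)} \cdot \frac{\operatorname{cov}(Q_i, z_i)}{\operatorname{var}(z_i)}$
Why / how do these approaches work? (slide 56)
$-b = \frac{\operatorname{var}(z_i)}{\operatorname{cov}(P_i, z_i)} \cdot \frac{\operatorname{cov}(Q_i, z_i)}{\operatorname{var}(z_i)} = \frac{\operatorname{cov}(Q_i, z_i)}{\operatorname{cov}(P_i, z_i)}$
Both approaches therefore give the same estimate of the demand slope.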
2SLS / instrumental variables regression (slide 57)
In practice, you want to use the so-called Two-Stage Least Squares (2SLS) or instrumental variables regression command. In Stata: ivregress.
The manual approach and ivregress produce the same point estimates, but the latter corrects the standard errors. This is important, as the manual approach yields standard errors that are too small: it ignores the uncertainty in the estimated parameters $\hat{\gamma}_0$ and $\hat{\gamma}_1$ used to calculate $\hat{P}_i$.
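2SLS estimation of demand (slide 58)

    . ivregress 2sls q (p = z)

    Instrumental variables (2SLS) regression          Number of obs   =     10,000
                                                      Wald chi2(1)    =    2506.07
                                                      Prob > chi2     =     0.0000
                                                      R-squared       =          .
                                                      Root MSE        =     .66866

    ------------------------------------------------------------------------------
               q |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               p |  -.3384167   .0067601   -50.06   0.000    -.3516663   -.3251671
           _cons |   67.18923   .6997938    96.01   0.000     65.81766     68.5608
    ------------------------------------------------------------------------------
    Instrumented:  p
    Instruments:   z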
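The comparison between the manual two-step approach and ivregress can be reproduced on simulated data. The sketch below generates data from the linear monopoly model with assumed parameter values (a = 66.65, b = 1/3, c0 = 2, c1 = 1, arbitrary seed and shock variances), none of which are taken from the lecture's data set, and stores both slope estimates and their standard errors.

```stata
* Sketch: manual two-step estimation versus ivregress. All values are assumptions.
clear
set obs 10000
set seed 4711                                   // arbitrary seed
scalar a = 66.65                                // assumed demand intercept
scalar b = 1/3                                  // assumed demand slope
generate double z   = rnormal(0, 2)             // cost shifter, used as instrument
generate double eta = rnormal(0, 1)             // cost shock
generate double eps = rnormal(0, 0.5)           // demand shock
generate double mc  = 2 + 1*z + eta             // marginal cost, assumed c0 = 2, c1 = 1
generate double p   = a/(2*b) + mc/2 + eps/(2*b)   // monopoly price
generate double q   = a - b*p + eps                // quantity on the demand curve

* Manual approach: first stage, fitted price, second stage
regress p z
predict double p_hat, xb
regress q p_hat
scalar b_manual  = _b[p_hat]
scalar se_manual = _se[p_hat]

* Same point estimate, but standard errors based on the residual q - a - b*p
ivregress 2sls q (p = z)
scalar b_iv  = _b[p]
scalar se_iv = _se[p]

display "manual:    " b_manual "  (se " se_manual ")"
display "ivregress: " b_iv     "  (se " se_iv ")"
```

The two slopes should coincide, while the manual standard error is smaller because the second-stage residual is computed with the fitted price instead of the actual price. The lecture's own output shows the same pattern: Root MSE .37152 in the manual second stage versus .66866 from ivregress.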
Requirements for an instrument (slide 59)
Think of our normal regression $Y = \beta_0 + \beta_1 X + u$.
1. Instrument relevance: the instrument $Z$ has to affect the (endogenous) explanatory variable $X$ of the equation of interest (the "2nd stage equation") in the equation $X = \alpha_0 + \alpha_1 Z + v$.
2. Instrument exogeneity: the instrument $Z$ may not be correlated with the error term of the equation of interest, i.e., $\operatorname{cov}(Z, u) = 0$.
Instrument relevance / Weak instrument (slide 60)
Relevance = the instrument $z$ needs to be correlated enough with the endogenous explanatory variable $X$.
What happens when $\operatorname{cov}(z, X) \approx 0$? $\hat{\beta}_1$ becomes undefined!
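The reason is visible in the covariance-ratio form of the estimator derived earlier for the demand slope. Written in the general notation of slide 59 (the hat and the superscript IV are notation added here):

```latex
\begin{equation*}
\hat{\beta}_1^{IV}
  = \frac{\widehat{\operatorname{cov}}(Z, Y)}{\widehat{\operatorname{cov}}(Z, X)}
\end{equation*}
```

The covariance between the instrument and the endogenous regressor sits in the denominator. If it is exactly zero the ratio is undefined, and if it is merely close to zero, small sampling errors in the numerator are blown up into large swings in the estimate.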
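Instrument relevance / Weak instrument (slide 61)
You want to check that your instrument is relevant, i.e., that you don't have a weak instrument.
Rule of thumb: the F-statistic of Z when you regress X on Z (and possible further controls) should be > 10.
Note: with 1 instrument, the F-statistic is the square of the t-statistic.
First-stage statistics (slide 62)

    . estat firststage

      First-stage regression summary statistics
      --------------------------------------------------------------------------
                   |            Adjusted      Partial
          Variable |   R-sq.       R-sq.        R-sq.     F(1,9998)   Prob > F
      -------------+------------------------------------------------------------
                 p |  0.4387      0.4386       0.4387       7814.03     0.0000
      --------------------------------------------------------------------------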
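The note about the F-statistic and the t-statistic can be checked against the first-stage regression of p on z shown earlier, where the t-statistic on z is reported as 88.40:

```latex
\begin{equation*}
t^2 = 88.40^2 \approx 7814.6 \quad \text{versus} \quad F(1, 9998) = 7814.03
\end{equation*}
```

The small gap comes from the t-statistic being displayed with only two decimals. Either way, the F-statistic is far above the rule-of-thumb threshold of 10.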
Instrument relevance / Weak instrument (slide 63)
There are more sophisticated tests. There are ways of allowing for weak instruments. We leave all that for later courses.
Instrument correlated with error (slide 64)
= the exogeneity assumption fails. This leads to a biased estimate of $\beta_1$. Similar to omitted variable bias.
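The size of the problem can be seen from the large-sample limit of the instrumental variables estimator. Starting from $Y = \beta_0 + \beta_1 X + u$ and taking the covariance with $Z$ on both sides (a standard textbook step, not shown on the slides):

```latex
\begin{align*}
\operatorname{cov}(Z, Y) &= \beta_1 \operatorname{cov}(Z, X) + \operatorname{cov}(Z, u) \\
\operatorname{plim}\, \hat{\beta}_1^{IV}
  = \frac{\operatorname{cov}(Z, Y)}{\operatorname{cov}(Z, X)}
  &= \beta_1 + \frac{\operatorname{cov}(Z, u)}{\operatorname{cov}(Z, X)}
\end{align*}
```

The second term vanishes only if $\operatorname{cov}(Z, u) = 0$. It also shows why a weak instrument makes matters worse: a small $\operatorname{cov}(Z, X)$ magnifies any remaining correlation between the instrument and the error.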
Instrument correlated with error (slide 65)
What can be done?
1. Have a strong story for why there is no correlation between the instrument and the error.
2. With multiple instruments, tests may be possible.
3. There are ways of allowing for (some) correlation to check the robustness of your results (for later).