Instrumental Variable Regression


Topic 6: Instrumental Variable Regression. ARE/ECN 240A Graduate Econometrics. Professor: Òscar Jordà

Outline of this topic: randomized experiments, natural experiments, and causation; instrumental variables: causation versus correlation; endogeneity bias; derivation of IV with GMM; properties of IV; tests.

Experiments and quasi-experiments. The ideal environment is one in which we can control an experiment and randomly assign a treatment across individual units. Think of a pharmaceutical trial where you could control every single variable in your patients' lives and randomly assign a treatment and a placebo. Experiments are expensive and, in economics, usually rare, so we often have to rely on quasi-experiments or natural experiments: situations where something similar to a random treatment assignment took place.

Puerto Rico and Hurricane Betsy. Puerto Rico and the effects of class size on earnings: in 1956 a hurricane (Betsy) traversed the island of Puerto Rico. There were not many casualties but much infrastructure damage. As a result, children along the hurricane's path had to attend neighboring schools, effectively doubling class sizes in some districts but not others. The hurricane thus assigned at random which school districts had their class sizes doubled.

Program evaluation. In statistics/econometrics this refers to studies designed to assess the effects of public policy, e.g. class size reduction, anti-smoking campaigns, etc. Three types of experiments: 1. Clinical drug trial: does a proposed drug lower cholesterol? Y = cholesterol level; X = treatment or control group (or dose of drug). 2. Job training program (Job Training Partnership Act): Y = has a job or not (or Y = wage income); X = went through the experimental program or not. 3. Class size effect (Tennessee class size experiment): Y = test score (Stanford Achievement Test); X = class size treatment group (regular, regular + aide, small).

Causality. An ideal randomized controlled experiment assigns subjects to treatment and control groups. More generally, the treatment level X is randomly assigned in $y_i = \beta x_i + \varepsilon_i$. Random assignment ensures that $x_i$ and $\varepsilon_i$ are independent and hence $E(\varepsilon_i \mid x_i) = 0$, which is what we need for $\hat\beta$ to be unbiased. Suppose $x_i$ is a binary variable (1 for treatment, 0 for control); then

Treatment Effect. The average treatment effect is simply $ATE = E(y_i \mid x_i = 1) - E(y_i \mid x_i = 0)$, estimated by $\widehat{ATE} = \bar{y}_{x=1} - \bar{y}_{x=0}$. This is sometimes called the differences estimator. In practice it can fail for several reasons: omitted variable bias (we omit important explanatory variables correlated with x), errors-in-variables bias, and simultaneous causality bias.
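To make the differences estimator concrete, here is a minimal simulated Stata sketch (the data-generating process and variable names are illustrative, not from the lecture): the OLS slope on a randomly assigned binary treatment equals the difference in group means.

* minimal sketch with simulated data (assumed DGP, not the course data)
clear
set seed 1
set obs 1000
gen treat = runiform() < 0.5          // randomly assigned binary treatment
gen y = 2 + 1.5*treat + rnormal()     // true ATE = 1.5
regress y treat                       // slope estimates ybar(treat=1) - ybar(treat=0)
ttest y, by(treat)                    // the same difference in group means (up to sign)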

Omitted Variables Bias. Suppose the data are generated by the model $y = X\beta + W\delta + \varepsilon$, but you estimate the model $y = X\beta + u$. Then $\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + W\delta + \varepsilon) = \beta + (X'X)^{-1}(X'W)\delta + (X'X)^{-1}(X'\varepsilon)$. So the bias depends on the correlation between X and W and on $\delta$. Notice that $E(u \mid X) = E(W\delta + \varepsilon \mid X) \neq 0$ in general.
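A hedged simulation sketch of the formula above (the DGP and variable names are illustrative): when the omitted w is correlated with x, the short regression picks up part of w's effect.

* illustrative simulation of omitted variable bias (assumed DGP)
clear
set seed 2
set obs 2000
gen w = rnormal()
gen x = 0.8*w + rnormal()             // x is correlated with the omitted w
gen y = 1 + 2*x + 3*w + rnormal()     // true coefficient on x is 2
regress y x w                         // long regression: roughly 2
regress y x                           // short regression: biased upward by (X'X)^-1 (X'W) delta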

Errors-in-variables. Suppose the data are generated by $y_i^0 = \beta_1 + \beta_2 x_i^0 + \varepsilon_i^0$, but we observe our variables with error, e.g. $x_i = x_i^0 + v_{xi}$ and $y_i = y_i^0 + v_{yi}$. For simplicity (although perhaps not realistically) assume the measurement errors are i.i.d. $D(0, \sigma_j^2)$, $j = x, y$. Substituting the observed $y_i$, $x_i$ into the original regression, we have

Errors-in-Variables (cont.). $y_i = \beta_1 + \beta_2(x_i - v_{xi}) + \varepsilon_i^0 + v_{yi} = \beta_1 + \beta_2 x_i + \{\varepsilon_i^0 + v_{yi} - \beta_2 v_{xi}\} \equiv \beta_1 + \beta_2 x_i + u_i$. Measurement error clearly increases the error variance, since $V(u_i) = \sigma_\varepsilon^2 + \sigma_y^2 + \beta_2^2\sigma_x^2$. But this is a minor problem compared to $E(u_i \mid x_i) = E(\varepsilon_i^0 + v_{yi} - \beta_2 v_{xi} \mid x_i^0 + v_{xi}) = -\beta_2 E(v_{xi} \mid x_i) \neq 0$.
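The classical consequence is attenuation: with measurement error only in x, the OLS slope converges to $\beta_2\,\sigma^2_{x^0}/(\sigma^2_{x^0} + \sigma^2_{v_x})$, i.e. it is biased toward zero. A hedged simulation sketch (illustrative DGP and names):

* illustrative simulation of attenuation bias from measurement error in x (assumed DGP)
clear
set seed 3
set obs 5000
gen xtrue = rnormal()
gen y = 1 + 2*xtrue + rnormal()       // true slope = 2
gen xobs = xtrue + rnormal()          // mismeasured regressor, noise variance 1
regress y xtrue                       // roughly 2
regress y xobs                        // roughly 1 = 2 * 1/(1+1): attenuated toward zero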

Simultaneous Causality. Classic example: a regression of the quantity of wheat (Q) on the price of wheat (P). Suppose you want to estimate the elasticity of demand for wheat: $\ln(q_i) = \beta_0 + \beta_1 \ln(p_i) + u_i$. In principle $\hat\beta_1$ would be an estimate of this elasticity (why?). In practice, it is not. To see this, suppose we have data on quantities and prices for different years.

Simultaneous Causality Observations of quantities and prices over time:

Simultaneous Causality The interaction of supply and demand generates the following scatter of points

Simultaneous Causality. Now suppose that I use the variable rainfall as a way to isolate shifts in the supply curve out of this cloud of points (assuming that rainfall does nothing to change the demand for wheat):

What is the common thread? In all of these situations, the regression parameters are estimated with bias. The source of the bias is that the residuals of the regression we actually estimate are correlated with the regressors: $E(X_i'\varepsilon_i) \neq 0$. Omitted variable bias: the regressors are correlated with the omitted variables. In the regression of test scores on class size, omitting parental income will bias the effect of class size (our policy variable of interest), since richer districts tend to have smaller classes and richer parents have more (unmeasured) resources to help their children do well in school.

The common thread (cont.). Errors-in-variables: we saw that even in the benign case where the measurement error is random and well behaved we had bias. Of course, this only gets worse if the measurement error is not random (e.g. its size is related to the value the regressor takes). Simultaneous causality: perhaps the most fundamental source of bias (often called endogeneity bias). Many variables are jointly determined, so some work needs to be done to isolate the direction of causality that we want to measure.

The Solution: Instrumental Variables. $Z_i$ is an $l \times 1$ vector such that $E(Z_i'\varepsilon_i) = 0$ and $E(Z_i'X_i) \neq 0$, with $l \geq k$. Remarks: Not all elements of $X_i$ need be endogenous; those that are exogenous can be used as elements of $Z_i$. In fact, if all the $X_i$ were exogenous, we would recover the usual moment condition we used for the method-of-moments derivation of OLS. If $l = k$ we say the model is just identified; otherwise ($l > k$) we say the model is overidentified.
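For the scalar just-identified case, the moment condition pins down the estimand directly; a short worked sketch (assuming variables are expressed as deviations from their means, so the orthogonality condition can be read as a zero covariance):

$$
\begin{aligned}
y_i &= \beta x_i + \varepsilon_i, \qquad E(z_i\varepsilon_i) = 0, \quad \mathrm{cov}(z_i, x_i) \neq 0,\\
\mathrm{cov}(z_i, y_i) &= \beta\,\mathrm{cov}(z_i, x_i) + \mathrm{cov}(z_i, \varepsilon_i) = \beta\,\mathrm{cov}(z_i, x_i),\\
\Rightarrow\quad \beta &= \frac{\mathrm{cov}(z_i, y_i)}{\mathrm{cov}(z_i, x_i)}, \qquad \hat\beta_{IV} = \frac{s_{zy}}{s_{zx}}.
\end{aligned}
$$

This is the same ratio that reappears on the instrument-relevance slide below, and it makes clear why a near-zero $\mathrm{cov}(z_i, x_i)$ (a weak instrument) is a problem.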

Two Stage Least Squares. So the instruments are correlated with the part of the regressors that is uncorrelated with the error term. We can get at that part by regressing the regressors on the instruments: $X = Z\Pi + U$, where $X$ is $n \times k$, $Z$ is $n \times l$, and $\Pi$ is $l \times k$. Then $\hat{X} = Z\hat\Pi = Z(Z'Z)^{-1}Z'X = P_z X$, with $P_z$ idempotent. Next, plug $\hat{X}$ into the linear regression.

TSLS (continued). $Y = X\beta + \varepsilon = (Z\Pi + U)\beta + \varepsilon = Z\Pi\beta + (U\beta + \varepsilon) = Z\lambda + V$. Notice that $E(Z'V) = E(Z'U)\beta + E(Z'\varepsilon) = 0$, so $\hat\lambda = (Z'Z)^{-1}Z'Y$, and from before $\hat\Pi = (Z'Z)^{-1}Z'X$. Let's think of the just-identified case, i.e. $l = k$: then $\lambda = \Pi\beta$ with $\lambda$ a $k \times 1$ vector and $\Pi$ a $k \times k$ invertible matrix, so $\beta = \Pi^{-1}\lambda$ and $\hat\beta_{IV} = \left[(Z'Z)^{-1}Z'X\right]^{-1}(Z'Z)^{-1}Z'Y = (Z'X)^{-1}(Z'Y)$.

TSLS. Now using the method of moments: $E(Z'\varepsilon) = 0 \Rightarrow E(Z'(Y - X\beta)) = 0 \Rightarrow E(Z'Y) = E(Z'X)\beta$. And by the analogy principle, $\hat\beta_{IV} = \left(\frac{Z'X}{n}\right)^{-1}\left(\frac{Z'Y}{n}\right) = (Z'X)^{-1}(Z'Y)$. However, what happens when we have overidentification, i.e. $l > k$? Then $\hat\Pi$, which is $l \times k$, is not directly invertible.

TSLS (cont.). However, notice that $\hat\lambda = \hat\Pi\hat\beta$, i.e. $(Z'Z)^{-1}Z'Y = (Z'Z)^{-1}(Z'X)\hat\beta$, with $\hat\Pi$ now $l \times k$. Pre-multiplying both sides by $X'Z$: $X'Z(Z'Z)^{-1}Z'Y = X'Z(Z'Z)^{-1}(Z'X)\hat\beta$, a $k \times 1$ system with a $k \times k$ matrix multiplying $\hat\beta$. Hence $\hat\beta = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}X'Z(Z'Z)^{-1}(Z'Y)$, i.e. $\hat\beta_{IV} = (X'P_zX)^{-1}(X'P_zY)$.

TSLS: The General Case. Let's go back. First stage: $X = Z\Pi + U$, $\hat{X} = Z\hat\Pi = Z(Z'Z)^{-1}Z'X = P_zX$. Second stage: do OLS on the auxiliary regression $Y = \hat{X}\beta + V$, so $\hat\beta_{IV} = (\hat{X}'\hat{X})^{-1}\hat{X}'Y = (X'P_zX)^{-1}(X'P_zY)$, since $P_zP_z = P_z$.
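As a concrete illustration of the closed form $(X'P_zX)^{-1}X'P_zY$, a hedged Mata sketch; the variable names y, x, w, z1, z2 are hypothetical placeholders, with x endogenous and w an included exogenous regressor (so w also appears in Z as its own instrument):

* hedged sketch: TSLS via the matrix formula, hypothetical variable names
mata:
    y  = st_data(., "y")
    n  = rows(y)
    X  = st_data(., ("x", "w")), J(n, 1, 1)         // endogenous x, exogenous w, constant
    Z  = st_data(., ("z1", "z2", "w")), J(n, 1, 1)  // excluded instruments + included exogenous vars
    Pz = Z * invsym(Z'Z) * Z'                       // projection onto the column space of Z
    b  = invsym(X'Pz*X) * (X'Pz*y)                  // (X'PzX)^-1 (X'PzY)
    b
end

This only reproduces the point estimates; the standard errors still need the correction discussed on the next slide.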

Remarks. When doing TSLS by hand, note that the standard errors from the second-stage regression (the one where you substitute $X$ with $\hat{X}$) are incorrect. The reason is that the usual OLS formulas in the second stage do not take into account the estimation error in $\hat{X}$. As we will see in more detail, the same issues with heteroskedasticity can arise in instrumental variable regression as well.
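In current Stata the built-in estimator does both stages and reports correct (optionally heteroskedasticity-robust) standard errors in one step; a minimal sketch using the variable names from the cigarette example that follows (ivregress is the modern counterpart of the older ivreg command used there):

* one-step TSLS with correct robust SEs (variables from the example below)
ivregress 2sls lpackpc (lravgprs = rtaxso) if year == 1995, vce(robust)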

Example: Demand for Cigarettes. We want to estimate $\ln(q_i^d) = \beta_0 + \beta_1\ln(p_i) + \varepsilon_i$. Data: a panel of observations across states and time; annual cigarette consumption and average prices paid (including tax); 48 continental US states, 1985-1995. Proposed instrumental variable: $Z_i$ = general sales tax per pack in the state = $SalesTax_i$. Is this a valid instrument? (1) Relevant? (2) Exogenous?

STATA example: First Stage Instrument = Z = rtaxso = general sales tax (real $/pack) X Z. reg lravgprs rtaxso if year==1995, r; Regression with robust standard errors Number of obs = 48 F( 1, 46) = 40.39 Prob > F = 0.0000 R-squared = 0.4710 Root MSE =.09394 ------------------------------------------------------------------------------ Robust lravgprs Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- rtaxso.0307289.0048354 6.35 0.000.0209956.0404621 _cons 4.616546.0289177 159.64 0.000 4.558338 4.674755 ------------------------------------------------------------------------------ X-hat. predict lravphat; Now we have the predicted values from the 1st stage

Second Stage Y X-hat. reg lpackpc lravphat if year==1995, r; Regression with robust standard errors Number of obs = 48 F( 1, 46) = 10.54 Prob > F = 0.0022 R-squared = 0.1525 Root MSE =.22645 ------------------------------------------------------------------------------ Robust lpackpc Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- lravphat -1.083586.3336949-3.25 0.002-1.755279 -.4118932 _cons 9.719875 1.597119 6.09 0.000 6.505042 12.93471 ------------------------------------------------------------------------------ These coefficients are the TSLS estimates. The standard errors are wrong because they ignore the fact that the first stage was estimated.

Using STATA's built-in ivreg command Y X Z. ivreg lpackpc (lravgprs = rtaxso) if year==1995, r; IV (2SLS) regression with robust standard errors Number of obs = 48 F( 1, 46) = 11.54 Prob > F = 0.0014 R-squared = 0.4011 Root MSE =.19035 ------------------------------------------------------------------------------ Robust lpackpc Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- lravgprs -1.083587.3189183-3.40 0.001-1.725536 -.4416373 _cons 9.719876 1.528322 6.36 0.000 6.643525 12.79623 ------------------------------------------------------------------------------ Instrumented: lravgprs This is the endogenous regressor Instruments: rtaxso This is the instrumental variable ------------------------------------------------------------------------------ OK, the change in the SEs was small this time...but not always! Summarizing: $\widehat{\ln(Q_i^d)} = 9.72 - 1.08\,\ln(p_i)$, n = 48, with standard errors 1.53 (intercept) and 0.32 (slope).

Elaborating on the Example. $\ln(Q_i^{cigarettes}) = \beta_0 + \beta_1\ln(P_i^{cigarettes}) + \beta_2\ln(Income_i) + u_i$, with $Z_{1i}$ = general sales tax and $Z_{2i}$ = cigarette-specific tax. Endogenous variable: $\ln(P_i^{cigarettes})$ (one X). Included exogenous variable: $\ln(Income_i)$ (one W). Instruments (excluded exogenous variables): general sales tax, cigarette-specific tax (two Zs). Is the demand elasticity $\beta_1$ overidentified, exactly identified, or underidentified?

Cigarette Demand: One Instrument Y W X Z. ivreg lpackpc lperinc (lravgprs = rtaxso) if year==1995, r; IV (2SLS) regression with robust standard errors Number of obs = 48 F( 2, 45) = 8.19 Prob > F = 0.0009 R-squared = 0.4189 Root MSE =.18957 ------------------------------------------------------------------------------ Robust lpackpc Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- lravgprs -1.143375.3723025-3.07 0.004-1.893231 -.3935191 lperinc.214515.3117467 0.69 0.495 -.413375.842405 _cons 9.430658 1.259392 7.49 0.000 6.894112 11.9672 ------------------------------------------------------------------------------ Instrumented: lravgprs Instruments: lperinc rtaxso STATA lists ALL the exogenous regressors as instruments, slightly different terminology than we have been using ------------------------------------------------------------------------------ Running IV as a single command yields correct SEs. Use the r option for heteroskedasticity-robust SEs.

Cigarette Demand: Two Instruments Y W X Z1 Z2. ivreg lpackpc lperinc (lravgprs = rtaxso rtax) if year==1995, r; IV (2SLS) regression with robust standard errors Number of obs = 48 F( 2, 45) = 16.17 Prob > F = 0.0000 R-squared = 0.4294 Root MSE =.18786 ------------------------------------------------------------------------------ Robust lpackpc Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- lravgprs -1.277424.2496099-5.12 0.000-1.780164 -.7746837 lperinc.2804045.2538894 1.10 0.275 -.230955.7917641 _cons 9.894955.9592169 10.32 0.000 7.962993 11.82692 ------------------------------------------------------------------------------ Instrumented: lravgprs Instruments: lperinc rtaxso rtax STATA lists ALL the exogenous regressors as instruments, slightly different terminology than we have been using ------------------------------------------------------------------------------

IV as Generalized Method of Moments. Rather than examining the properties of the just-identified and the overidentified TSLS-IV estimators separately, it is easier to examine IV using GMM. Let $Z_i$, $X_i$ be $1 \times l$ and $1 \times k$ vectors respectively, and suppose $E(Z_i'\varepsilon_i) = 0$. Hence define $g_i(y_i, X_i, Z_i, \beta) = Z_i'(y_i - X_i\beta)$. From the population moment condition, the sample analog is $\frac{1}{n}\sum_{i=1}^{n} g_i(y_i, X_i, Z_i, \beta) = g_n(\beta) = \frac{Z'(Y - X\beta)}{n} = 0$.

GMM. The generic GMM objective function is $\hat{Q}_n(\beta) = g_n(\beta)'\hat{W}g_n(\beta)$ for some $l \times l$ weighting matrix with $\hat{W} \xrightarrow{p} W$. Notice that $\hat{G} = Z'X$ (proportional to the Jacobian $\partial g_n(\beta)/\partial\beta$). So if I apply directly (a big if, since I have not checked the assumptions) the results for extremum estimators: $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N\!\left(0,\ (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\right)$, where $\Omega = E(Z_i'Z_i\varepsilon_i^2)$; if homoskedastic, $\Omega = \sigma^2(Z'Z)$.

GMM in detail. We choose $\beta$ to minimize $\left(\frac{Z'(Y - X\beta)}{n}\right)'\hat{W}\left(\frac{Z'(Y - X\beta)}{n}\right)$. First-order conditions: $-2\left(\frac{Z'X}{n}\right)'\hat{W}\left(\frac{Z'(Y - X\beta)}{n}\right) = 0$, i.e. $X'Z\hat{W}Z'Y = X'Z\hat{W}Z'X\hat\beta$, so $\hat\beta_{GMM} = (X'Z\hat{W}Z'X)^{-1}X'Z\hat{W}Z'Y$. Recall: $\hat\beta_{IV} = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}X'Z(Z'Z)^{-1}(Z'Y)$.

GMM (cont.). Intriguing: if I choose $\hat{W} = (Z'Z)^{-1}$, then $\hat\beta_{GMM} = \hat\beta_{IV}$. In fact, since $\hat{G} = G = Z'X$, then under homoskedasticity $\Omega = \sigma^2(Z'Z)$ and hence $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N\!\left(0,\ (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\right) = N\!\left(0,\ \sigma^2[X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}(Z'Z)(Z'Z)^{-1}Z'X[X'Z(Z'Z)^{-1}Z'X]^{-1}\right) = N\!\left(0,\ \sigma^2(X'Z(Z'Z)^{-1}Z'X)^{-1}\right) = N\!\left(0,\ \sigma^2(X'P_zX)^{-1}\right)$.

GMM (cont.). Ask yourself: why (and when) would choosing $\hat{W} = (Z'Z)^{-1}$ be optimal? In general, we know that the optimal choice of weighting matrix is $\hat{W} = \left(\frac{1}{n}\sum_{i=1}^{n} g_i(y_i, X_i, Z_i, \hat\beta)\,g_i(y_i, X_i, Z_i, \hat\beta)'\right)^{-1}$, which converges in probability to $W = \left[E(g_i(\beta)g_i(\beta)')\right]^{-1} = \left[E(Z_i'Z_i\varepsilon_i^2)\right]^{-1}$.
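With $\hat{W} = (Z'Z)^{-1}$ GMM reduces to 2SLS; with a heteroskedasticity-robust weighting matrix it is the two-step efficient GMM estimator. A hedged Stata sketch, assuming the cigarette-demand data from the examples above are in memory:

* two-step efficient GMM (robust weighting matrix) vs. 2SLS, overidentified case
ivregress gmm  lpackpc lperinc (lravgprs = rtaxso rtax) if year == 1995, wmatrix(robust)
ivregress 2sls lpackpc lperinc (lravgprs = rtaxso rtax) if year == 1995, vce(robust)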

A Recap: A Lot to Digest. Take GMM as the primitive way of thinking about the instrumental variable problem and TSLS as a special case. Under just-identification (and linearity in the regression), things are simple: the choice of weighting matrix is irrelevant because we have the same number of moments as parameters (unique solution). Hence $\hat\beta_{IV} = \hat\beta_{GMM} = (Z'X)^{-1}(Z'Y)$.

Recap Continued. In the over-identified case we have more moment conditions than parameters, hence the weighting matrix matters. Under certain conditions, $\hat\beta_{TSLS} = \hat\beta_{GMM} = (X'P_zX)^{-1}(X'P_zY)$, and from the usual GMM results $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0, \sigma^2(X'P_zX)^{-1})$. But there are very specific assumptions for this to work: two critical assumptions about the instruments.

Assumptions on the Instruments. 1. Instrument relevance: it is easiest to think about why this matters in the context of TSLS. If the Z do not explain the X, then from the first-stage regression $X = Z\Pi + U$ we get a lousy $\hat{X}$ for the second stage. 2. Instrument exogeneity: it had better be the case that $E(Z'\varepsilon) = 0$; otherwise there is no advantage over using the X themselves.

Distribution of GMM Estimator. Assume that $\hat{W} \xrightarrow{p} W$ and let $E(Z_i'X_i) = Q$. Further assume that $\Omega = E(Z_i'Z_i\varepsilon_i^2) = E(g_ig_i')$, where $g_i = Z_i'\varepsilon_i$. Then $\left(\frac{1}{n}X'Z\right)\hat{W}\left(\frac{1}{n}Z'X\right) \xrightarrow{p} Q'WQ$ and $\left(\frac{1}{n}X'Z\right)\hat{W}\left(\frac{1}{\sqrt{n}}Z'\varepsilon\right) \xrightarrow{d} Q'W\,N(0, \Omega)$. Hence $\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N(0, V)$ with $V = (Q'WQ)^{-1}(Q'W\Omega WQ)(Q'WQ)^{-1}$.

Efficient GMM. If $W = \Omega^{-1}$, then $\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N(0, (Q'\Omega^{-1}Q)^{-1})$. In small samples one can use $\hat{W} = \left(\frac{1}{n}\sum_{i=1}^{n}(\hat g_i - \bar g)(\hat g_i - \bar g)'\right)^{-1}$ with $\hat g_i = Z_i'\hat\varepsilon_i$. In the linear model, under some assumptions, one could use $\hat{W} = (Z'Z)^{-1}$ as a first-step weighting matrix to obtain an initial $\hat\beta$, and hence construct $\hat\varepsilon_i$ and thus $\hat g_i$.

Assessing IV: Instrument Relevance. Let's think back to TSLS, in the just-identified case $\hat\beta_{IV} = (Z'X)^{-1}(Z'Y)$. In the special case $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with $E(z_i\varepsilon_i) = 0$ and $E(z_i^2\varepsilon_i^2) = \sigma_\varepsilon^2\sigma_z^2$, we have $\hat\beta_{IV} = \frac{s_{zy}}{s_{zx}} \xrightarrow{p} \frac{\sigma_{zy}}{\sigma_{zx}}$, with $V(\hat\beta_{IV}) = \frac{\sigma_\varepsilon^2\sigma_z^2}{(\sigma_{zx})^2}$. So if $\sigma_{zx} \to 0$, then $\hat\beta_{IV}$ blows up, and $V(\hat\beta_{IV}) \to \infty$ as well.

What Happens Asymptotically. When $E(z_ix_i) = 0$, then by the CLT $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} z_ix_i \xrightarrow{d} N_1 \sim N(0, E(z_i^2x_i^2))$ and $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} z_i\varepsilon_i \xrightarrow{d} N_2 \sim N(0, E(z_i^2\varepsilon_i^2))$. Therefore $\hat\beta - \beta = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} z_i\varepsilon_i}{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} z_ix_i} \xrightarrow{d} \frac{N_2}{N_1} \sim \text{Cauchy}$.

Weak Identification. The Cauchy distribution has a mean, variance, and in fact higher moments that are not defined (although its mode and median are). Let's think of a less extreme case, where there is correlation between the instrument and the endogenous variable, but it is weak. One way to model this asymptotically (in a one-variable, one-instrument case): $y_i = \beta x_i + \varepsilon_i$, $x_i = \pi z_i + u_i$, with $\pi = n^{-1/2}c$.

Weak Identification (cont.). Clearly, this device ensures that asymptotically the instrument and the endogenous variable are uncorrelated. Specifically, $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} z_ix_i = \frac{\pi}{\sqrt{n}}\sum_{i=1}^{n} z_i^2 + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} z_iu_i = \frac{1}{n}\sum_{i=1}^{n} z_i^2\,c + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} z_iu_i \xrightarrow{d} Qc + N_1$, so that $\hat\beta - \beta \xrightarrow{d} \frac{N_2}{Qc + N_1}$.

Remarks. $\hat\beta$ is inconsistent for $\beta$. The asymptotic distribution of $\hat\beta$ is non-normal. Standard t-tests have non-standard distributions.
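A hedged Monte Carlo sketch of these remarks (illustrative DGP and names, not the course data): with a first-stage coefficient near zero, the sampling distribution of the 2SLS estimator is far from normal and pulled toward the inconsistent OLS limit.

* hedged Monte Carlo sketch of weak-instrument behavior (assumed DGP)
capture program drop weakivsim
program define weakivsim, rclass
    clear
    set obs 200
    gen z = rnormal()
    gen u = rnormal()
    gen x = 0.05*z + u                   // tiny first-stage coefficient: weak instrument
    gen y = 1 + 0.5*x + u + rnormal()    // true beta = 0.5; shared u makes x endogenous
    ivregress 2sls y (x = z)
    return scalar b = _b[x]
end
set seed 4
simulate b = r(b), reps(500) nodots: weakivsim
summarize b, detail                      // heavy tails; center pulled away from 0.5 toward the OLS limit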

Example: Sampling Distribution of TSLS with Weak Instruments. Light line: strong instruments; dark line: weak instruments.

Some things to check in TSLS. One way to check the strength of the instruments is with the F-statistic of the first-stage regressions $X = Z\Pi + U$ (with $X$ $n \times k$, $Z$ $n \times l$, $\Pi$ $l \times k$, $U$ $n \times k$). At a minimum, you should reject the null that the coefficients on the instruments (excluding any included exogenous variables) are jointly zero. Staiger and Stock (1997) and Stock and Wright (1998) are two standard references.
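Stata also reports first-stage diagnostics (first-stage F, partial R-squared) automatically after 2SLS; a minimal sketch with the cigarette-demand variables used in the examples:

* first-stage diagnostics after 2SLS (variables from the cigarette example)
ivregress 2sls lpackpc lperinc (lravgprs = rtaxso rtax) if year == 1995, vce(robust)
estat firststage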

Overidentification Test (a.k.a. J-Test). We have checked one of the conditions for IV to work: instrument relevance. To check the other, exogeneity, we rely on having more moment conditions than parameters. The reason is that we then have a situation where, if the model is correctly specified, all moment conditions should be satisfied; if it is not, some will be violated.

Overidentification Test (cont.). Recall the GMM objective function $\hat{Q}_n(\beta) = g_n(\beta)'\hat{W}g_n(\beta)$, where $\frac{1}{n}\sum_{i=1}^{n} g_i(y_i, X_i, Z_i, \beta) = g_n(\beta) = \frac{Z'(Y - X\beta)}{n}$. When $l > k$, $g_n(\hat\beta)$ is not exactly zero in the sample even if the model is correctly specified; it only approaches zero as $n \to \infty$, because the moment conditions hold in the population, not in the sample.

Overidentification Test (cont.). The specification test on the model is therefore based on the distance between the sample moment conditions and zero. It is important to note that to apply the test one must use the optimal weighting matrix. This is because we rely on the asymptotic result that $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g_i(\beta) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Z_i'\varepsilon_i \xrightarrow{d} N(0, \Omega)$, where $\Omega = E(g_i(\beta)g_i(\beta)') = E(Z_i'Z_i\varepsilon_i^2)$.

Overidentification Test (cont.). Hence $n\hat{Q}_n(\hat\beta) = n\,g_n(\hat\beta)'\hat{W}g_n(\hat\beta) \xrightarrow{d} \chi^2_{l-k}$, as long as $\hat{W} = \left(\frac{1}{n}\sum_{i=1}^{n} g_i(\hat\beta)g_i(\hat\beta)'\right)^{-1} = \left(\frac{1}{n}\sum_{i=1}^{n} Z_i'Z_i\hat\varepsilon_i^2\right)^{-1}$. Remark: careful, this test tends to have low power.

The Distance Statistic. Hypothesis tests could be constructed with the asymptotic covariance of the GMM estimate and the Wald principle, as we have often seen. However, if the hypotheses are nonlinear, it is often better to use the GMM criterion function directly. Suppose we want to test $H_0: h(\beta) = 0$ for $h: \mathbb{R}^k \to \mathbb{R}^r$.

The Distance Statistic (cont.). Let the estimate under the alternative be $\hat\beta = \arg\min_\beta \hat{Q}_n(\beta)$ and the estimate under the null be $\tilde\beta = \arg\min_{\beta:\, h(\beta) = 0} \hat{Q}_n(\beta)$. Define $D = n(\hat{Q}_n(\tilde\beta) - \hat{Q}_n(\hat\beta))$. Then $D \geq 0$ and $D \xrightarrow{d} \chi^2_r$. Further, if $h$ is linear then $D$ equals the Wald statistic.

Fixed-effects model of cigarette demand. $\ln(Q_{it}^{cigarettes}) = \alpha_i + \beta_1\ln(P_{it}^{cigarettes}) + \beta_2\ln(Income_{it}) + u_{it}$, $i = 1, \dots, 48$, $t = 1985, 1986, \dots, 1995$. $\alpha_i$ reflects unobserved omitted factors that vary across states but not over time, e.g. attitudes toward smoking. Still, $corr(\ln(P_{it}^{cigarettes}), u_{it})$ is plausibly nonzero because of supply/demand interactions. Estimation strategy: use panel data regression methods to eliminate $\alpha_i$; use TSLS to handle simultaneous causality bias; use T = 2 with 1985-1995 changes (the "changes" method), which looks at the long-term response, not short-term dynamics (short- vs. long-run elasticities).

The changes method (when T = 2). One way to model long-term effects is to consider 10-year changes, between 1985 and 1995. Rewrite the regression in changes form: $\ln(Q_{i,1995}^{cigarettes}) - \ln(Q_{i,1985}^{cigarettes}) = \beta_1[\ln(P_{i,1995}^{cigarettes}) - \ln(P_{i,1985}^{cigarettes})] + \beta_2[\ln(Income_{i,1995}) - \ln(Income_{i,1985})] + (u_{i,1995} - u_{i,1985})$. Create 10-year change variables, for example: 10-year change in log price = $\ln(P_{i,1995}) - \ln(P_{i,1985})$. Then estimate the demand elasticity by TSLS, using 10-year changes in the instrumental variables as well (constructed as on the next slide).
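The next slide constructs the 10-year changes with explicit [_n-10] lags, which presumes the data are sorted by state and then year. A hedged alternative sketch using Stata's panel time-series operators (it assumes a numeric state identifier, so encode a string state variable first; the new variable names are hypothetical to avoid clashing with existing ones):

* hedged alternative: 10-year changes via time-series operators (hypothetical names)
xtset state year
gen lnpack  = log(packpc)
gen dlnpack = lnpack - L10.lnpack     // 10-year change in log packs per capita, within state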

STATA: Cigarette demand. First create 10-year change variables: 10-year change in log price = $\ln(P_{it}) - \ln(P_{i,t-10}) = \ln(P_{it}/P_{i,t-10})$. gen dlpackpc = log(packpc/packpc[_n-10]); _n-10 is the 10-yr lagged value. gen dlavgprs = log(avgprs/avgprs[_n-10]);. gen dlperinc = log(perinc/perinc[_n-10]);. gen drtaxs = rtaxs-rtaxs[_n-10];. gen drtax = rtax-rtax[_n-10];. gen drtaxso = rtaxso-rtaxso[_n-10];

Use TSLS to estimate the demand elasticity using the 10-year changes specification Y W X Z. ivreg dlpackpc dlperinc (dlavgprs = drtaxso), r; IV (2SLS) regression with robust standard errors Number of obs = 48 F( 2, 45) = 12.31 Prob > F = 0.0001 R-squared = 0.5499 Root MSE =.09092 ------------------------------------------------------------------------------ Robust dlpackpc Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- dlavgprs -.9380143.2075022-4.52 0.000-1.355945 -.5200834 dlperinc.5259693.3394942 1.55 0.128 -.1578071 1.209746 _cons.2085492.1302294 1.60 0.116 -.0537463.4708446 ------------------------------------------------------------------------------ Instrumented: dlavgprs Instruments: dlperinc drtaxso ------------------------------------------------------------------------------ NOTE: - All the variables Y, X, W, and Zs are in 10-year changes - Estimated elasticity = -.94 (SE = .21), surprisingly elastic! - Income elasticity small, not statistically different from zero - Must check whether the instrument is relevant

Check instrument relevance: compute the first-stage F. reg dlavgprs drtaxso dlperinc, r; Regression with robust standard errors Number of obs = 48 F( 2, 45) = 16.84 Prob > F = 0.0000 R-squared = 0.5146 Root MSE =.06334 ------------------------------------------------------------------------------ Robust dlavgprs Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- drtaxso.0254611.0043876 5.80 0.000.016624.0342982 dlperinc -.2241037.2188815-1.02 0.311 -.6649536.2167463 _cons.5321948.0295315 18.02 0.000.4727153.5916742 ------------------------------------------------------------------------------. test drtaxso; ( 1) drtaxso = 0 F( 1, 45) = 33.67 Prob > F = 0.0000 We didn't need to run test here because with m = 1 instrument the F-statistic is the square of the t-statistic: 5.80*5.80 = 33.67. First-stage F = 33.7 > 10, so the instrument is not weak. Can we check instrument exogeneity? No: l = k.

Check instrument relevance: compute the first-stage F (two instruments) X Z1 Z2 W. reg dlavgprs drtaxso drtax dlperinc, r; Regression with robust standard errors Number of obs = 48 F( 3, 44) = 66.68 Prob > F = 0.0000 R-squared = 0.7779 Root MSE =.04333 ------------------------------------------------------------------------------ Robust dlavgprs Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- drtaxso.013457.0031405 4.28 0.000.0071277.0197863 drtax.0075734.0008859 8.55 0.000.0057879.0093588 dlperinc -.0289943.1242309-0.23 0.817 -.2793654.2213767 _cons.4919733.0183233 26.85 0.000.4550451.5289015 ------------------------------------------------------------------------------. test drtaxso drtax; ( 1) drtaxso = 0 ( 2) drtax = 0 F( 2, 44) = 88.62 Prob > F = 0.0000 88.62 > 10, so the instruments aren't weak.

What about two instruments (cig-only tax, sales tax)?. ivreg dlpackpc dlperinc (dlavgprs = drtaxso drtax), r; IV (2SLS) regression with robust standard errors Number of obs = 48 F( 2, 45) = 21.30 Prob > F = 0.0000 R-squared = 0.5466 Root MSE =.09125 ------------------------------------------------------------------------------ Robust dlpackpc Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- dlavgprs -1.202403.1969433-6.11 0.000-1.599068 -.8057392 dlperinc.4620299.3093405 1.49 0.142 -.1610138 1.085074 _cons.3665388.1219126 3.01 0.004.1209942.6120834 ------------------------------------------------------------------------------ Instrumented: dlavgprs Instruments: dlperinc drtaxso drtax ------------------------------------------------------------------------------ drtaxso = general sales tax only; drtax = cigarette-specific tax only. Estimated elasticity is -1.2, even more elastic than using the general sales tax only. With l > k, we can test the overidentifying restrictions.

Test the overidentifying restrictions. predict e, resid; Computes the residuals from the most recently estimated regression (the previous TSLS regression). reg e drtaxso drtax dlperinc; Regress e on the Zs and Ws Source SS df MS Number of obs = 48 -------------+------------------------------ F( 3, 44) = 1.64 Model.037769176 3.012589725 Prob > F = 0.1929 Residual.336952289 44.007658007 R-squared = 0.1008 -------------+------------------------------ Adj R-squared = 0.0395 Total.374721465 47.007972797 Root MSE =.08751 ------------------------------------------------------------------------------ e Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- drtaxso.0127669.0061587 2.07 0.044.000355.0251789 drtax -.0038077.0021179-1.80 0.079 -.008076.0004607 dlperinc -.0934062.2978459-0.31 0.755 -.6936752.5068627 _cons.002939.0446131 0.07 0.948 -.0869728.0928509 ------------------------------------------------------------------------------. test drtaxso drtax; ( 1) drtaxso = 0 ( 2) drtax = 0 F( 2, 44) = 2.47 Prob > F = 0.0966 Compute the J-statistic, which is m*F, where F tests whether the coefficients on the instruments are jointly zero, so J = 2*2.47 = 4.93. ** WARNING: this F test uses the wrong d.f. **

The correct degrees of freedom for the J-statistic is l - k: J = lF, where F is the F-statistic testing the coefficients on $Z_{1i}, \dots, Z_{li}$ in a regression of the TSLS residuals against $Z_{1i}, \dots, Z_{li}, W_{1i}, \dots, W_{ri}$. Under the null hypothesis that all the instruments are exogenous, J has a chi-squared distribution with l - k degrees of freedom. Here, J = 4.93, distributed chi-squared with d.f. = 1; the 5% critical value is 3.84, so reject at the 5% significance level. In STATA:. dis "J-stat = " r(df)*r(F) " p-value = " chiprob(r(df)-1,r(df)*r(F)); J-stat = 4.9319853 p-value =.02636401 (J = 2*2.47 = 4.93; p-value from the chi-squared(1) distribution). Now what???
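A built-in alternative to the hand-rolled J computation above (assuming the same 10-year-change variables are in memory) is the overidentification test reported after ivregress; the exact form of the statistic depends on the VCE specified, so the number may not match the hand computation exactly:

* built-in overidentification test after 2SLS (same variables as above)
ivregress 2sls dlpackpc dlperinc (dlavgprs = drtaxso drtax), vce(robust)
estat overid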

Tabular summary of these results:

How should we interpret the J-test rejection? The J-test rejects the null hypothesis that both instruments are exogenous. This means that either rtaxso is endogenous, or rtax is endogenous, or both. The J-test doesn't tell us which! You must exercise judgment. Why might rtax (the cig-only tax) be endogenous? Political forces: a history of smoking or lots of smokers creates political pressure for low cigarette taxes. If so, the cig-only tax is endogenous. This reasoning doesn't apply to the general sales tax, so use just one instrument, the general sales tax.

The Demand for Cigarettes: Summary of Empirical Results. Use the estimated elasticity based on TSLS with the general sales tax as the only instrument: elasticity = -.94, SE = .21. This elasticity is surprisingly large (not inelastic): a 1% increase in prices reduces cigarette sales by nearly 1%. This is much more elastic than the conventional wisdom in the health economics literature. This is a long-run (ten-year change) elasticity. What would you expect a short-run (one-year change) elasticity to be: more or less elastic?