Econometrics of Policy Evaluation (Geneva summer school)

Michael Lechner, Slide 1 Econometrics of Policy Evaluation (Geneva summer school) Michael Lechner Swiss Institute for Empirical Economic Research (SEW) University of St. Gallen Switzerland June 2016 Overview over causality and research designs

Michael Lechner, Slide 2 1 The causal problem 2 Matching 3 Instrumental variables 4 Regression discontinuity design (RDD) 5 Differences-in-differences methods (DiD)

1 The causal problem Michael Lechner, Slide 3

Michael Lechner, Slide 4 The causal problem 1
How does an outcome (Y) change if some variable (D) changes?
Example: Effect of participation in university sports (D) on the educational outcomes (grades, time to degree) of students (Y).
There will be a positive correlation (corr(y,d) > 0)!
Does this mean that better students do sports (selection effect due to confounding variables), or that doing sports makes students better (causal effect)?
Policy relevance for a (non-marketing-oriented) university: the causal effect!
One possible question for a researcher to answer: How much did the participants in university sports benefit, on average, from participating?
Key ingredient for knowing the answer: Which grades would the participants have obtained had they NOT participated?

Michael Lechner, Slide 5 The causal problem 2
Data alone can never answer a causal question.
$Y^1$: Grade if participating in university sports.
$Y^0$: Grade if NOT participating in university sports.
Average causal effect for participants: $ATET = E(Y^1 - Y^0 \mid D=1) = E(Y^1 \mid D=1) - E(Y^0 \mid D=1)$
Use the data to estimate $E(Y^1 \mid D=1)$ as the mean grade of participants. Assume this mean grade is 4.5.
BUT: There is no information about $E(Y^0 \mid D=1)$ [it may be anywhere between 1 and 6].
Thus, without further information (assumptions), the causal effect is only known to lie between -1.5 (= 4.5 - 6) and 3.5 (= 4.5 - 1).
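
The bound calculation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `atet_bounds` and its parameters are hypothetical, and the grade range 1 to 6 and the mean of 4.5 are taken from the slide.

```python
def atet_bounds(mean_treated_outcome, y_min, y_max):
    """Worst-case bounds on ATET = E(Y1|D=1) - E(Y0|D=1).

    E(Y1|D=1) is estimated by the mean outcome of participants.
    E(Y0|D=1) is unobserved; we only know it lies in [y_min, y_max].
    """
    lower = mean_treated_outcome - y_max  # counterfactual as high as possible
    upper = mean_treated_outcome - y_min  # counterfactual as low as possible
    return lower, upper


lower, upper = atet_bounds(4.5, y_min=1.0, y_max=6.0)
print(lower, upper)  # -1.5 3.5
```

The width of the interval always equals the range of the outcome, whatever the data say, which is why identifying assumptions are needed to narrow it.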

Michael Lechner, Slide 6 The causal problem 3
Data alone can never answer a causal question: the causal effect is not identified from the data alone.
Therefore, so-called identifying assumptions are required, and these cannot be tested with the data.
Whether such assumptions make sense must be decided on the basis of substantive knowledge of the phenomenon under investigation.
Thus, choosing a convincing research design is one of the most important tasks in any empirical analysis (and is different from the more technical task of choosing a suitable econometric estimator for that research design).

Michael Lechner, Slide 7 The causal problem 4
An ideally designed and perfectly implemented experiment solves the problem because it removes any confounding.
Thus, it provides a sufficiently strong set of assumptions to identify the causal effect.
Example: Randomly selected students must be forced to do sports.
In this case, participants and non-participants differ only with respect to participation and, thus, there cannot be any confounding.
In this course, we discuss alternative approaches that might be used with experimental and non-experimental data.
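
A small simulation may help to see why randomization removes confounding. Everything in this sketch is an assumption made for illustration: the data-generating process, the unobserved variable `ability`, the true effect of 0.3, and the function name `simulate` do not come from the slides.

```python
import random
from statistics import mean


def simulate(randomized, n=20000, true_effect=0.3, seed=0):
    """Return the naive difference in mean grades (participants minus others).

    Hypothetical data-generating process: 'ability' is an unobserved
    confounder that raises grades and, without randomization, also raises
    the probability of doing sports.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)
        if randomized:
            d = rng.random() < 0.5                   # coin flip, unrelated to ability
        else:
            d = ability + rng.gauss(0.0, 1.0) > 0.0  # better students select into sports
        y = 4.0 + 0.5 * ability + true_effect * d + rng.gauss(0.0, 0.5)
        (treated if d else untreated).append(y)
    return mean(treated) - mean(untreated)


print("self-selection:", round(simulate(randomized=False), 2))
print("randomized:    ", round(simulate(randomized=True), 2))
```

Under self-selection the naive comparison mixes the causal effect with the ability gap between the groups; under randomization it should be close to the assumed true effect.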

Michael Lechner, Slide 8 1 The causal problem 2 Matching

Michael Lechner, Slide 9 Matching 1
Matching solves the identification problem by assuming that a sufficiently rich data set is available.
Remove the influence of all confounders by conditioning on them.
Example ATET: Use the outcomes of non-treated who appear 'identical in all relevant dimensions' to the treated:
$ATET = E(Y^1 \mid D=1) - E(Y^0 \mid D=1) = E(Y^1 \mid D=1) - E[E(Y \mid X, D=0) \mid D=1]$
Can be viewed as similar to regression methods, but with (almost) no functional form assumptions, thus allowing for flexible effect heterogeneity.
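
For a single discrete confounder X, the ATET expression on this slide can be sketched as exact matching on cells of X. This is a minimal illustration, not the estimator used in the course: the function name `atet_exact_matching`, the record layout, and the toy data are all made up.

```python
from collections import defaultdict
from statistics import mean


def atet_exact_matching(records):
    """Estimate E(Y1|D=1) - E[E(Y|X,D=0)|D=1] by exact matching on X.

    records: list of (x, d, y) tuples with discrete x and d in {0, 1}.
    """
    # Step 1: mean outcome of the non-treated within each cell of X.
    control_y = defaultdict(list)
    for x, d, y in records:
        if d == 0:
            control_y[x].append(y)
    cell_mean = {x: mean(ys) for x, ys in control_y.items()}

    # Step 2: compare each treated outcome with the non-treated mean of its cell.
    diffs = []
    for x, d, y in records:
        if d == 1:
            if x not in cell_mean:
                raise ValueError(f"no non-treated observation with x={x!r}")
            diffs.append(y - cell_mean[x])
    return mean(diffs)


# Made-up toy data: x is prior achievement, d is sports participation, y is the grade.
toy = [
    ("high", 1, 5.0), ("high", 1, 5.5), ("high", 0, 5.0),
    ("low", 1, 4.0), ("low", 1, 4.5),
    ("low", 0, 3.5), ("low", 0, 4.0), ("low", 0, 3.0),
]
print(atet_exact_matching(toy))
```

The error raised for empty cells reflects the need for comparable non-treated observations for every treated one. With many or continuous confounders, exact cells are no longer feasible and other matching approaches are needed.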

Michael Lechner, Slide 10 1 The causal problem 2 Matching 3 Instrumental variable

Michael Lechner, Slide 11 IV 1
Confounding cannot be removed by conditioning on observables.
But there is another variable (Z) for which the causal effects on D and Y can be estimated by conditioning on observables.
There is a causal effect Z → Y only because there are causal effects Z → D and D → Y. There is no direct effect of Z on Y.
If Z is binary: (Z → Y) / (Z → D), i.e. the effect of Z on Y divided by the effect of Z on D, gives the causal effect for compliers.
2SLS and IV are parametric versions of this. Can be generalized.
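
For a binary instrument without covariates, the ratio on this slide can be sketched as follows. The function name `wald_estimate` and the toy data are hypothetical; the sketch assumes Z is as good as randomly assigned, so that simple differences in means estimate the effects of Z.

```python
from statistics import mean


def wald_estimate(z, d, y):
    """(Effect of Z on Y) / (effect of Z on D) for a binary instrument Z."""
    y_z1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y_z0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    d_z1 = mean([di for zi, di in zip(z, d) if zi == 1])
    d_z0 = mean([di for zi, di in zip(z, d) if zi == 0])
    first_stage = d_z1 - d_z0      # effect of Z on D
    reduced_form = y_z1 - y_z0     # effect of Z on Y
    if first_stage == 0:
        raise ValueError("instrument does not shift D; ratio undefined")
    return reduced_form / first_stage


# Made-up toy data: z is the instrument, d is participation, y is the grade.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [5.0, 5.0, 4.0, 4.0, 5.0, 4.0, 4.0, 3.0]
print(wald_estimate(z, d, y))
```

A first stage close to zero makes the ratio very unstable, so in practice the strength of the instrument should be checked before interpreting the result.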

Michael Lechner, Slide 12 IV 2
Back to the example: Use randomized incentives for sports. This fulfills the conditions.
The causal effect is identified for the subgroup of people who react to the financial incentive by changing their sports behaviour:
$LATE(Z) = \frac{E[E(Y \mid Z=1, X=x)] - E[E(Y \mid Z=0, X=x)]}{E[E(D \mid Z=1, X=x)] - E[E(D \mid Z=0, X=x)]}$
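
With a discrete X, the LATE expression on this slide can be sketched by computing the conditional means within cells of X and averaging them over the distribution of X. The function name `late_with_covariates`, the record layout, and the toy data are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def late_with_covariates(records):
    """Ratio of X-averaged effects of Z on Y and of Z on D.

    records: list of (x, z, d, y) tuples; x discrete, z and d binary.
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, z, d, y in records:
        cells[x][z].append((d, y))

    n = len(records)
    numerator = denominator = 0.0
    for x, by_z in cells.items():
        if not by_z[0] or not by_z[1]:
            raise ValueError(f"cell x={x!r} lacks observations for one value of Z")
        weight = (len(by_z[0]) + len(by_z[1])) / n   # share of the sample in this cell
        effect_on_y = mean([y for _, y in by_z[1]]) - mean([y for _, y in by_z[0]])
        effect_on_d = mean([d for d, _ in by_z[1]]) - mean([d for d, _ in by_z[0]])
        numerator += weight * effect_on_y
        denominator += weight * effect_on_d
    if denominator == 0:
        raise ValueError("instrument does not shift D on average")
    return numerator / denominator


# Made-up toy data: (x, z, d, y).
toy_iv = [
    ("a", 1, 1, 5.0), ("a", 1, 1, 5.0), ("a", 0, 0, 4.0), ("a", 0, 0, 4.0),
    ("b", 1, 1, 4.0), ("b", 1, 0, 3.0), ("b", 0, 0, 3.0), ("b", 0, 0, 3.0),
]
print(late_with_covariates(toy_iv))
```

Note that the averaging over X is done separately in the numerator and the denominator before taking the ratio, as in the formula.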

Michael Lechner, Slide 13 1 The causal problem 2 Matching 3 Instrumental variable 4 Regression discontinuity design (RDD)

Michael Lechner, Slide 14 RDD 1
Interest is in the effect of repeating the first year in university on overall grades.
Those who fail and those who don't fail differ in many dimensions that are not observable in your data (matching is likely to fail).
Suppose that:
An almost continuous scale determines the grade.
Examiners are not perfect. Thus, there is some small randomness in the values of the underlying scale (and thus in the grades).

Michael Lechner, Slide 15 RDD 2
Therefore, just around the cut-off, students are approximately identical.
No confounding in the neighbourhood of the cut-off.
Thus, the identification problem is solved.
Also works in cases in which the cut-off is not strict but influences the probability of participation to some extent.
Estimation: Same as LATE, but local around the cut-off.
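
"Same as LATE, but local around the cut-off" can be sketched as a ratio of jumps in mean Y and mean D, computed only from observations within a small window around the cut-off. This is a deliberately simple local-mean version with a hypothetical function name, bandwidth, and toy data; it is not necessarily the estimator used later in the course.

```python
from statistics import mean


def local_wald_rdd(running, d, y, cutoff, bandwidth):
    """Jump in mean Y at the cut-off divided by the jump in mean D.

    Only observations with cutoff - bandwidth <= running < cutoff + bandwidth
    are used. In the strict (sharp) case the jump in D is +1 or -1.
    """
    above = [(di, yi) for r, di, yi in zip(running, d, y)
             if cutoff <= r < cutoff + bandwidth]
    below = [(di, yi) for r, di, yi in zip(running, d, y)
             if cutoff - bandwidth <= r < cutoff]
    if not above or not below:
        raise ValueError("no observations on one side of the cut-off")
    jump_y = mean([yi for _, yi in above]) - mean([yi for _, yi in below])
    jump_d = mean([di for di, _ in above]) - mean([di for di, _ in below])
    if jump_d == 0:
        raise ValueError("treatment probability does not jump at the cut-off")
    return jump_y / jump_d


# Made-up toy data: exam score, repeating the first year (1 = repeats), later overall grade.
score = [3.8, 3.9, 3.95, 4.0, 4.05, 4.2, 3.0, 5.5]
repeat = [1, 1, 1, 0, 0, 0, 1, 0]
grade = [4.6, 4.4, 4.5, 4.0, 4.2, 3.8, 3.0, 5.5]
print(local_wald_rdd(score, repeat, grade, cutoff=4.0, bandwidth=0.25))
```

The bandwidth involves a trade-off: a narrow window makes students on both sides more comparable but leaves fewer observations.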

Michael Lechner, Slide 16 1 The causal problem 2 Matching 3 Instrumental variable 4 Regression discontinuity design (RDD) 5 Differences-in-differences methods (DiD)

Michael Lechner, Slide 17 DiD and panel methods 1
Have observations before and after D happened (no panel needed).
Example about university sports: Compare grades of participants before and after participation.
Concern: Grades generally change during the course of study (trend).
Use the before-after comparison of non-participants to remove this trend.
Non-parametric identification impossible.
Essentially similar to matching.
Not much discussed here; see my survey (2010) for methods appropriate in a non-linear or semiparametric setting. See also the panel section of Econometric Methods.
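
The two before-after comparisons on this slide can be sketched as a simple difference of differences in group means. The function name `did_estimate`, the record layout, and the toy grades are made up; the sketch assumes that, without participation, the grades of participants would have followed the same trend as those of non-participants.

```python
from statistics import mean


def did_estimate(records):
    """(After - before) for participants minus (after - before) for non-participants.

    records: list of (group, period, y) tuples;
    group 1 = participants, group 0 = non-participants;
    period 1 = after, period 0 = before. Repeated cross-sections suffice.
    """
    def cell_mean(group, period):
        ys = [y for g, t, y in records if g == group and t == period]
        if not ys:
            raise ValueError(f"no observations for group={group}, period={period}")
        return mean(ys)

    change_participants = cell_mean(1, 1) - cell_mean(1, 0)
    change_others = cell_mean(0, 1) - cell_mean(0, 0)   # estimate of the trend
    return change_participants - change_others


# Made-up toy grades.
toy_did = [
    (1, 0, 4.0), (1, 0, 4.2), (1, 1, 4.6), (1, 1, 4.8),
    (0, 0, 3.9), (0, 0, 4.1), (0, 1, 4.2), (0, 1, 4.4),
]
print(round(did_estimate(toy_did), 2))
```

Level differences between the two groups drop out of the estimate; only differences in their changes over time matter.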

Michael Lechner, Slide 18 Next we consider approaches that can be used to estimate the various conditional expectations that play a role for the different research designs