Lecture: Difference-in-Difference (DID)

Motivation
Q: How can we show that a new medicine is effective?
Naive answer: Give the new medicine to some patients (the treatment group) and see what happens.
This before-and-after comparison is naive because something may happen even without the new medicine. Think about self-healing.
Better answer: Give the new medicine to some patients and compare them to another group of patients who do not get the medicine (the control group).
It is better because the control group shows what would have happened to the treatment group in the absence of treatment (the counterfactual).
This answer is not perfect because the two groups may differ in other ways (confounding factors).

Perfect Answers
Perfect answer 1: Give the new medicine to a treatment group that is the same as the control group in all other respects. Here, confounding factors are held constant and ceteris paribus holds.
Perfect answer 2: Randomly assign patients to the treatment and control groups, and compare the two groups. Randomization ensures that the two groups are otherwise comparable. Put differently, there is no systematic difference in confounding factors; the only systematic difference is getting the new medicine.

Reality
In reality, we can rarely find enough patients who are the same in all other respects.
And sometimes it is impossible to run a randomized controlled trial. Can we randomly assign students to selective schools like Harvard and evaluate the effect on future earnings?

DID Based on Observed Data
Let $D_A$ be the after-treatment dummy (or time dummy) and $D_T$ be the treatment dummy. The dependent variable $y$ measures the outcome.
Consider the pooled regression that includes $D_A$, $D_T$, and their interaction term:
$$y = \beta_0 + \beta_1 D_A + \beta_2 D_T + \beta_3 (D_A D_T) + \beta_4 x + u \tag{1}$$
1. $\beta_0$ measures the average before outcome for the control group.
2. $\beta_1$ measures (average after outcome - average before outcome) for the control group.
3. $\beta_2$ measures (average before outcome for the treatment group - average before outcome for the control group).
4. $\beta_3$ is DID, measuring (average after outcome for the treatment group - average after outcome for the control group) - (average before outcome for the treatment group - average before outcome for the control group).
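
The four interpretations above can be checked numerically. The following is a minimal sketch, assuming made-up toy data and dropping the control variable $x$ from regression (1); it fits the pooled regression by least squares with NumPy and compares $\hat{\beta}_3$ with the difference-in-differences of the four cell averages.

```python
import numpy as np

# Hypothetical toy data (not from the lecture): two observations per cell.
# Columns: D_A (after dummy), D_T (treatment dummy), y (outcome).
data = np.array([
    [0, 0, 10.0], [0, 0, 12.0],   # control group, before
    [1, 0, 13.0], [1, 0, 15.0],   # control group, after
    [0, 1, 14.0], [0, 1, 16.0],   # treatment group, before
    [1, 1, 22.0], [1, 1, 24.0],   # treatment group, after
])
d_a, d_t, y = data[:, 0], data[:, 1], data[:, 2]

# Design matrix for y = b0 + b1*D_A + b2*D_T + b3*(D_A*D_T) + u.
# Assumption: the covariate x in (1) is omitted to keep the sketch small.
X = np.column_stack([np.ones_like(y), d_a, d_t, d_a * d_t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def cell_mean(after, treated):
    """Average outcome in one (period, group) cell."""
    return y[(d_a == after) & (d_t == treated)].mean()

# DID computed directly from the four cell averages.
did_by_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print("OLS coefficients (b0, b1, b2, b3):", np.round(beta, 6))
print("DID from cell averages:", did_by_means)
```

Without $x$, the regression is saturated in the two dummies, so the fitted coefficients reproduce the cell averages exactly and $\hat{\beta}_3$ coincides with the hand-computed DID.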

Big Picture 1

Big Picture 2

Remarks
We can allow for observable differences between the two groups by controlling for $x$.
We need to assume that the unobservable characteristics are the same for the two groups.
The validity of DID relies upon this crucial assumption!

Heuristic Proof
Denote the true causal effect (treatment effect) by $\beta$. For the treatment group, we compare the before and after outcomes by running the regression
$$y_{\text{treatment group}} = \beta_0 + \beta_1 D_A + u \tag{2}$$
$$\hat{\beta}_1 = \beta + \text{bias}_{\text{treatment group}} \tag{3}$$
where the bias is due to confounding factors. Similarly, we can do the same for the control group:
$$y_{\text{control group}} = \alpha_0 + \alpha_1 D_A + u \tag{4}$$
$$\hat{\alpha}_1 = 0 + \text{bias}_{\text{control group}} \tag{5}$$
Assuming
$$\text{bias}_{\text{treatment group}} = \text{bias}_{\text{control group}} \tag{6}$$
it follows that
$$\text{DID} = \hat{\beta}_1 - \hat{\alpha}_1 = \beta \tag{7}$$
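
The cancellation in (6) and (7) can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original proof: the true effect, the common shock (playing the role of the bias), the baselines, and the noise level are all invented for illustration. With a single dummy regressor, the OLS slope on $D_A$ equals the after-minus-before difference in averages, so the sketch computes the slopes that way.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000          # observations per group and period (assumed)
true_effect = 2.0    # beta, the causal effect (assumed)
common_shock = 1.5   # confounding change that hits both groups equally (assumed)

def before_after_slope(baseline, effect):
    """Slope on D_A from regressing y on a constant and D_A within one group.

    With one dummy regressor, the OLS slope is mean(after) - mean(before).
    """
    before = baseline + rng.normal(0.0, 1.0, n)
    after = baseline + common_shock + effect + rng.normal(0.0, 1.0, n)
    return after.mean() - before.mean()

# Treatment group: slope estimates beta + bias, as in (3).
beta1_hat = before_after_slope(baseline=5.0, effect=true_effect)
# Control group: slope estimates 0 + bias, as in (5).
alpha1_hat = before_after_slope(baseline=3.0, effect=0.0)

# The bias is the same in both groups, so it cancels in the difference, as in (7).
did = beta1_hat - alpha1_hat
print("before-after, treatment group:", round(beta1_hat, 3))
print("before-after, control group:  ", round(alpha1_hat, 3))
print("DID:", round(did, 3))
```

The two groups start from different baselines here, yet DID still recovers the assumed effect, because only the before-to-after change needs to be biased equally in both groups.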

Bottom Line
For DID to work, the treatment and control groups need to be as similar as possible. Make sure you do not compare apples to oranges.
They can differ in $x$, for which we have data.
The two groups cannot differ in characteristics that are unobservable.
DID fails if people choose to receive the treatment based on unobserved characteristics. For example, conscientious parents may intentionally send their kids to small classes (and so receive treatment). This can cause self-selection bias and invalidate DID.
It is nice to have multiple control groups. Significantly different DID estimates can raise a red flag.
Even better, consider the synthetic control method proposed by Alberto Abadie.
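
The multiple-control-group check can be sketched as follows. The group names and average outcomes are hypothetical, and the sketch compares point estimates only; judging whether two DID estimates are significantly different would also require standard errors, which are not computed here.

```python
def did(treat_before, treat_after, ctrl_before, ctrl_after):
    """Difference-in-differences from four average outcomes."""
    return (treat_after - ctrl_after) - (treat_before - ctrl_before)

# Hypothetical average outcomes as (before, after); not real data.
treatment = (15.0, 23.0)
controls = {
    "control A": (11.0, 14.0),
    "control B": (9.0, 16.0),
}

# One DID estimate per candidate control group.
estimates = {
    name: did(treatment[0], treatment[1], before, after)
    for name, (before, after) in controls.items()
}

# A large gap between estimates suggests at least one control group is a poor match.
spread = max(estimates.values()) - min(estimates.values())
print(estimates, "spread:", spread)
```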

(Optional) Synthetic Control Method 1
We may consider the synthetic control method (SCM) when we have multiple control groups.
The basic idea of SCM is to construct a single, non-existent combined (synthetic) control group, which is a weighted average of the original control groups.
We want to find the optimal weights so that, before treatment, the synthetic control group looks as similar as possible to the treatment group in terms of observable characteristics.
For instance, suppose we want to investigate how the 2011 earthquake and tsunami affected Japan (the treatment group), and we focus on GDP (the outcome). To do so, we need to find control countries that did not suffer the earthquake or tsunami and have similar GDP. Suppose the UK and Germany satisfy both conditions, so we have two control groups. The idea of SCM is to combine them into an imaginary country called UG, a weighted average of the UK and Germany that mimics Japan as closely as possible.
We can construct the synthetic control (SC) according to the Cobb-Douglas production function $Y = K^{\beta} L^{\alpha}$, where $Y$ is GDP, $K$ is capital, $L$ is labor, and $\beta$, $\alpha$ are the input shares.

(Optional) Synthetic Control Method 2
Mathematically, SC is a weighted average of the UK and Germany:
$$\text{SC} = w \cdot \text{UK} + (1 - w) \cdot \text{Germany}$$
We try to find the optimal weights $(w, 1 - w)$ that minimize the squared difference (why not the absolute value?) between SC and the treatment group in terms of $K$ and $L$ (observable characteristics):
$$\min_{w} \; \beta \left( \bar{K}_{\text{Japan}} - w \bar{K}_{\text{UK}} - (1 - w) \bar{K}_{\text{Germany}} \right)^2 + \alpha \left( \bar{L}_{\text{Japan}} - w \bar{L}_{\text{UK}} - (1 - w) \bar{L}_{\text{Germany}} \right)^2 \tag{8}$$
where $\bar{K}$, $\bar{L}$ denote averages (before 2011). We can download a Stata package called synth to solve (8). After $w$ is obtained, the estimated treatment effect at period $T$ (after 2011) is
$$\text{GDP}_{\text{Japan},T} - w \, \text{GDP}_{\text{UK},T} - (1 - w) \, \text{GDP}_{\text{Germany},T} \tag{9}$$
Exercise: how should we modify (8) if we add France (another control) and technology (another input)? What if $\alpha$ and $\beta$ are unknown?
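
With only two control countries, objective (8) is a quadratic in the single weight $w$, so it can be minimized by hand. The sketch below is an illustration, not the algorithm used by the synth package. It sets the derivative of (8) to zero and then restricts $w$ to $[0, 1]$, which is an added assumption so that SC stays a proper weighted average. All capital, labor, GDP, and input-share numbers are invented.

```python
def objective(w, k, l, beta, alpha):
    """Objective (8): weighted squared gaps in average capital and labor."""
    gap_k = k["Japan"] - w * k["UK"] - (1 - w) * k["Germany"]
    gap_l = l["Japan"] - w * l["UK"] - (1 - w) * l["Germany"]
    return beta * gap_k**2 + alpha * gap_l**2

def sc_weight(k, l, beta, alpha):
    """Weight on the UK that minimizes (8); Germany gets 1 - w.

    Assumes the UK and Germany differ in at least one input,
    otherwise the denominator below is zero.
    """
    a_k, b_k = k["Japan"] - k["Germany"], k["UK"] - k["Germany"]
    a_l, b_l = l["Japan"] - l["Germany"], l["UK"] - l["Germany"]
    # First-order condition of the quadratic objective in w.
    w = (beta * a_k * b_k + alpha * a_l * b_l) / (beta * b_k**2 + alpha * b_l**2)
    # Assumption: clip to [0, 1]; for a convex quadratic in one variable,
    # the clipped value is the constrained minimizer.
    return min(1.0, max(0.0, w))

# Hypothetical pre-2011 averages and input shares (not real data).
k_bar = {"Japan": 10.0, "UK": 8.0, "Germany": 12.0}
l_bar = {"Japan": 6.0, "UK": 5.0, "Germany": 8.0}
beta, alpha = 0.3, 0.7

w = sc_weight(k_bar, l_bar, beta, alpha)

# Hypothetical post-2011 GDP in period T, used in the treatment effect (9).
gdp_t = {"Japan": 100.0, "UK": 90.0, "Germany": 110.0}
effect = gdp_t["Japan"] - w * gdp_t["UK"] - (1 - w) * gdp_t["Germany"]

print("weight on UK:", round(w, 4), "weight on Germany:", round(1 - w, 4))
print("estimated treatment effect at T:", round(effect, 4))
```

With more control countries, as in the exercise, there is one weight per country and no simple one-line solution, which is where a numerical solver such as the one in synth comes in.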