150C Causal Inference

Size: px
Start display at page:

Download "150C Causal Inference"

Transcription

1 150C Causal Inference Panel Methods Jonathan Mummolo Stanford Jonathan Mummolo (Stanford) 150C Causal Inference 1 / 59

2 Unobserved Group Effects 7 6 SD(Economic Output) for all data = 346 Range(Economic Output) for all data = 1,034 Country 5 Democracy (1 7 Scale) Country 2 Country 3 Country 4 2 Country Economic Output ($ billions) Jonathan Mummolo (Stanford) 150C Causal Inference 2 / 59

3 Outline Panel Setup 1 Panel Setup 2 Unobserved Effects Model and Pooled OLS 3 Fixed Effects Regression 4 Modeling Time 5 Modeling Dynamic Effects Jonathan Mummolo (Stanford) 150C Causal Inference 3 / 59

4 Panel Setup Panel Setup Let y and x (x 1, x 2,..., x K ) be observable random variables and c be an unobservable random variable We are interested in the partial effects of variable x j in the population regression function E[y x 1, x 2,..., x K, c] We observe a sample of i = 1, 2,..., N cross-sectional units for t = 1, 2,...T time periods (a balanced panel) For each unit i, we denote the observable variables for all time periods as {(y it, x it ) : t = 1, 2,..., T } x it (x it1, x it2,..., x itk ) is a 1 K vector Typically assume that cross-sectional units are i.i.d. draws from the population: {y i, x i, c i } N i=1 i.i.d. (cross-sectional independence) y i (y i1, y i2,..., y it ) and x i (x i1, x i2,..., x it ) Consider asymptotic properties with T fixed and N Jonathan Mummolo (Stanford) 150C Causal Inference 4 / 59

5 Panel Setup Panel Setup Single unit: y i = y i1. y it. y it T 1 X i = x i,1,1 x i,1,2 x i,1,j x i,1,k.... x i,t,1 x i,t,2 x i,t,j x i,t,k.... x i,t,1 x i,t,2 x i,t,j x i,t,k T K Panel with all units: y = y 1. y i. y N NT 1 X = X 1. X i. X N NT K Jonathan Mummolo (Stanford) 150C Causal Inference 5 / 59

6 Outline Unobserved Effects Model and Pooled OLS 1 Panel Setup 2 Unobserved Effects Model and Pooled OLS 3 Fixed Effects Regression 4 Modeling Time 5 Modeling Dynamic Effects Jonathan Mummolo (Stanford) 150C Causal Inference 6 / 59

7 Unobserved Effects Model and Pooled OLS Unobserved Effects Model: Farm Output For a randomly drawn cross-sectional unit i, the model is given by y it = x it β + c i + ε it, t = 1, 2,..., T y it : output of farm i in year t x it : 1 K vector of variable inputs for farm i in year t, such as labor, fertilizer, etc. plus an intercept β: K 1 vector of marginal effects of variable inputs c i : farm effect, i.e. the sum of all time-invariant inputs known to farmer i (but unobserved for the researcher), such as soil quality, managerial ability, etc. often called: unobserved effect, unobserved heterogeneity, etc. ε it : time-varying unobserved inputs, such as rainfall, unknown to the farmer at the time the decision on the variable inputs x it is made often called: idiosyncratic error What happens when we regress y it on x it? Jonathan Mummolo (Stanford) 150C Causal Inference 7 / 59

8 Unobserved Effects Model and Pooled OLS Pooled OLS When we ignore the panel structure and regress y it on x it we get y it = x it β + v it, t = 1, 2,..., T with composite error v it c i + ε it Main assumption to obtain consistent estimates for β is: Jonathan Mummolo (Stanford) 150C Causal Inference 8 / 59

9 Unobserved Effects Model and Pooled OLS Pooled OLS When we ignore the panel structure and regress y it on x it we get y it = x it β + v it, t = 1, 2,..., T with composite error v it c i + ε it Main assumption to obtain consistent estimates for β is: E[v it x i1, x i2,..., x it ] = E[v it x it ] = 0 for t = 1, 2,..., T x it are strictly exogenous: the composite error v it in each time period is uncorrelated with the past, current, and future regressors But: labour input x it likely depends on soil quality c i and so we have omitted variable bias and ˆβ is not consistent No correlation between x it and v it implies no correlation between unobserved effect c i and x it for all t Violations are common: whenever we omit a time-constant variable that is correlated with the regressors Jonathan Mummolo (Stanford) 150C Causal Inference 8 / 59

10 Unobserved Effects Model and Pooled OLS Unobserved Effects Model: Program Evaluation Program evaluation model: y it = prog it β + c i + ε it, y it : log wage of individual i in year t t = 1, 2,..., T prog it : indicator coded 1 if individual i participants in training program at t and 0 otherwise β: effect of program c i : sum of all time-invariant unobserved characteristics that affect wages, such as ability, etc. What happens when we regress y it on prog it? Jonathan Mummolo (Stanford) 150C Causal Inference 9 / 59

11 Unobserved Effects Model and Pooled OLS Unobserved Effects Model: Program Evaluation Program evaluation model: y it = prog it β + c i + ε it, y it : log wage of individual i in year t t = 1, 2,..., T prog it : indicator coded 1 if individual i participants in training program at t and 0 otherwise β: effect of program c i : sum of all time-invariant unobserved characteristics that affect wages, such as ability, etc. What happens when we regress y it on prog it? ˆβ not consistent since prog it is likely correlated with c i (e.g. ability) Always ask: Is there a time-constant unobserved variable (c i ) that is correlated with the regressors? If yes, pooled OLS is problematic Additional problem: v it c i + ε it are serially correlated for same i since c i is present in each t and thus pooled OLS standard errors are invalid Jonathan Mummolo (Stanford) 150C Causal Inference 9 / 59

12 Outline Fixed Effects Regression 1 Panel Setup 2 Unobserved Effects Model and Pooled OLS 3 Fixed Effects Regression 4 Modeling Time 5 Modeling Dynamic Effects Jonathan Mummolo (Stanford) 150C Causal Inference 10 / 59

13 Fixed Effects Regression Fixed Effect Regression Our unobserved effects model is: y it = x it β + c i + ε it, t = 1, 2,..., T If we have data on multiple time periods, we can think of c i as fixed effects or nuisance parameters to be estimated Jonathan Mummolo (Stanford) 150C Causal Inference 11 / 59

14 Fixed Effects Regression Fixed Effects Regression Estimating a regression with the time-demeaned variables ÿ it y it ȳ i and ẍ it x it x i is numerically equivalent to a regression of y it on x it and unit specific dummy variables. Fixed effects estimator is often called the within estimator because it only uses the time variation within each cross-sectional unit. Even better, the regression with the time-demeaned variables is consistent for β even when Cov[x it, c i ] 0, because time-demeaning eliminates the unobserved effects: y it = x it β + c i + ε it ȳ i = x i β + c i + ε i (y it ȳ i ) = (x it x i )β + (c i c i ) + (ε it ε i ) ÿ it = ẍ it β + ε it Jonathan Mummolo (Stanford) 150C Causal Inference 12 / 59

15 Fixed Effects Regression Fixed Effects Regression: Main Results Identification assumptions: 1 E[ε it x i1, x i2,..., x it, c i ] = 0, t = 1, 2,..., T regressors are strictly exogenous conditional on the unobserved effect allows x it to be arbitrarily related to c i 2 regressors vary over time for at least some i and are not collinear Fixed effects estimator: 1 Demean and regress ÿ it on ẍ it (need to correct degrees of freedom) 2 Regress y it on x it and unit dummies (dummy variable regression) 3 Regress y it on x it with canned fixed effects routine R: plm(y~x, model = within, data = data) Properties (under assumptions 1-2): ˆβ FE is consistent: plim ˆβ FE,N = β N ˆβ FE also unbiased conditional on X Jonathan Mummolo (Stanford) 150C Causal Inference 13 / 59

16 Fixed Effects Regression Fixed Effects Regression: Main Issues Inference: Standard errors have to be clustered by panel unit (e.g. farm) to allow correlation in the ε it s for the same i. R: coeftest(mod, vcov=function(x) vcovhc(x, cluster="group", type="hc1")) Yields valid inference as long as number of clusters is reasonably large Typically we care about β, but unit fixed effects c i could be of interest plm routine demeans the data before running the regression and therefore does not estimate ĉ i Jonathan Mummolo (Stanford) 150C Causal Inference 14 / 59

17 Fixed Effects Regression Example: Direct Democracy and Naturalizations Do minorities fare worse under direct democracy than under representative democracy? Hainmueller and Hangartner (2016, AJPS) examine data on naturalization requests of immigrants in Switzerland, where municipalities vote on naturalization applications in: referendums (direct democracy) elected municipality councils (representative democracy) Annual panel data from 1,400 municipalities for the period y it : naturalization rate = # naturalizations it / eligible foreign population it 1 x it : 1 if municipality used representative democracy, 0 if municipality used direct democracy in year t Jonathan Mummolo (Stanford) 150C Causal Inference 15 / 59

18 Fixed Effects Regression Naturalization Panel Data Long Format > d <- read.dta("swiss_panel_long.dta") > print(d[30:40,],digits=2) muniid muni_name year nat_rate repdem 30 2 Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Bonstetten Bonstetten Jonathan Mummolo (Stanford) 150C Causal Inference 16 / 59

19 Pooled OLS Fixed Effects Regression > summary(lm(nat_rate~repdem,data=d)) Call: lm(formula = nat_rate ~ repdem, data = d) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) <2e-16 *** repdem <2e-16 *** Jonathan Mummolo (Stanford) 150C Causal Inference 17 / 59

20 Fixed Effects Regression Time-Demeaning for Fixed Effects: y it ÿ it > library(plyr) > d <- ddply(d,.(muniid), transform, + nat_rate_demean = nat_rate - mean(nat_rate), + nat_rate_mean = mean(nat_rate), + repdem_demean = repdem - mean(repdem)) > > print(d[20:38, + c("muniid","muni_name","year","nat_rate","nat_rate_mean","nat_rate_demean","repdem","repdem_demean") + ],digits=2) muniid muni_name year nat_rate nat_rate_mean nat_rate_demean repdem repdem_demean 20 2 Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Affoltern A.A Jonathan Mummolo (Stanford) 150C Causal Inference 18 / 59

21 Fixed Effects Regression Fixed Effects Regression with Demeaned Data > summary(lm(nat_rate_demean~repdem_demean,data=d)) Call: lm(formula = nat_rate_demean ~ repdem_demean, data = d) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) 6.266e e repdem_demean 3.023e e <2e-16 *** Jonathan Mummolo (Stanford) 150C Causal Inference 19 / 59

22 Fixed Effects Regression Fixed Effects Regression with Canned Routine > library(plm) > library(lmtest) > d <- plm.data(d, indexes = c("muniid", "year")) > mod_fe <- plm(nat_rate~repdem,data=d,model="within") > coeftest(mod_fe, vcov=function(x) vcovhc(x, cluster="group", type="hc1")) t test of coefficients: Estimate Std. Error t value Pr(> t ) repdem < 2.2e-16 *** --- Jonathan Mummolo (Stanford) 150C Causal Inference 20 / 59

23 Fixed Effects Regression Fixed Effects Regression with Dummies > mod_du <- plm(nat_rate~repdem+as.factor(muniid),data=d,model="pooling") > coeftest(mod_du, vcov=function(x) vcovhc(x, cluster="group", type="hc1")) t test of coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e e e+01 < 2.2e-16 *** repdem e e e+01 < 2.2e-16 *** as.factor(muniid) e e e+07 < 2.2e-16 *** as.factor(muniid) e e e+07 < 2.2e-16 *** as.factor(muniid) e e e+07 < 2.2e-16 *** as.factor(muniid) e e e+07 < 2.2e-16 *** as.factor(muniid) e e e+07 < 2.2e-16 *** as.factor(muniid) e e e as.factor(muniid) e e e+02 < 2.2e-16 *** as.factor(muniid) e e e+02 < 2.2e-16 *** as.factor(muniid) e+00 NA NA NA as.factor(muniid) e e e+01 < 2.2e-16 ***. Jonathan Mummolo (Stanford) 150C Causal Inference 21 / 59

24 Fixed Effects Regression Applying Fixed Effects We can use fixed effects for other data structures to restrict comparisons to within unit variation Matched pairs Twin fixed effects to control for unobserved effects of family background Cluster fixed effects in hierarchical data School fixed effects to control for unobserved effects of school Jonathan Mummolo (Stanford) 150C Causal Inference 22 / 59

25 Fixed Effects Regression Problems that (even) fixed effects do not solve y it = x it β + c i + ε it, t = 1, 2,..., T Where y it is murder rate and x it is police spending per capita What happens when we regress y on x and city fixed effects? Jonathan Mummolo (Stanford) 150C Causal Inference 23 / 59

26 Fixed Effects Regression Problems that (even) fixed effects do not solve y it = x it β + c i + ε it, t = 1, 2,..., T Where y it is murder rate and x it is police spending per capita What happens when we regress y on x and city fixed effects? ˆβ FE inconsistent unless strict exogeneity conditional on c i holds E[ε it x i1, x i2,..., x it, c i ] = 0, t = 1, 2,..., T implies ε it uncorrelated with past, current, and future regressors Most common violations: 1 Time-varying omitted variables economic boom leads to more police spending and less murders can include time-varying controls, but avoid post-treatment bias 2 Simultaneity if city adjusts police based on past murder rate, then spending t+1 is correlated with ε t (since higher ε t leads to higher murder rate at t) strictly exogenous x cannot react to what happens to y in the past or the future! Fixed effects do not obviate need for good research design! Jonathan Mummolo (Stanford) 150C Causal Inference 23 / 59

27 Outline Modeling Time 1 Panel Setup 2 Unobserved Effects Model and Pooled OLS 3 Fixed Effects Regression 4 Modeling Time 5 Modeling Dynamic Effects Jonathan Mummolo (Stanford) 150C Causal Inference 24 / 59

28 Common Shocks or Causation? Jonathan Mummolo (Stanford) 150C Causal Inference 25 / 59

29 Naturalization Rate Over Time Mean Nat Rate Year Jonathan Mummolo (Stanford) 150C Causal Inference 26 / 59

30 Representative Democracy Over Time representative democracy (0/1) Year Jonathan Mummolo (Stanford) 150C Causal Inference 27 / 59

31 Adding Time Effects Reconsider our unobserved effects model: y it = x it β + c i + ε it, t = 1, 2,..., T Fixed effects assumption: E[ε it x i, c i ] = 0, t = 1, 2,..., T : regressors are strictly exogenous conditional on the unobserved effect Typical violation: Common shocks that affect all units in the same way and are correlated with x it. Trends in farming technology or climate affect productivity Trends in immigration inflows affect naturalization rates We can allow for such common shocks by including time effects into the model Jonathan Mummolo (Stanford) 150C Causal Inference 28 / 59

32 Fixed Effects Regression: Adding Time Effects Linear time trend: y it = x it β + c i + t + ε it, t = 1, 2,..., T Linear time trend common to all units Time fixed effects: y it = x it β + c i + t t + ε it, t = 1, 2,..., T Common shock in each time period Generalized difference-in-differences model Unit specific linear time trends: y it = x it β + c i + g i t + t t + ε it, t = 1, 2,..., T Linear time trends that vary by unit Jonathan Mummolo (Stanford) 150C Causal Inference 29 / 59

33 Modeling Time Effects Jonathan Mummolo (Stanford) 150C Causal Inference 30 / 59

34 Modeling Time Effects > d$time <- as.numeric(d$year) > d[1:21,c("muniid","muni_name","year","time")] muniid muni_name year time 1 1 Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Aeugst A.A Affoltern A.A Affoltern A.A Jonathan Mummolo (Stanford) 150C Causal Inference 31 / 59

35 Fixed Effects Regression: Linear Time Trend > mod_fe <- plm(nat_rate~repdem+time,data=d,model="within") > coeftest(mod_fe, vcov=function(x) vcovhc(x, cluster="group", type="hc1")) t test of coefficients: Estimate Std. Error t value Pr(> t ) repdem ** time < 2.2e-16 *** Jonathan Mummolo (Stanford) 150C Causal Inference 32 / 59

36 Fixed Effects Regression: Year Fixed Effects > mod_fe <- plm(nat_rate~repdem+year,data=d,model="within") > coeftest(mod_fe, vcov=function(x) vcovhc(x, cluster="group", type="hc1")) t test of coefficients: Estimate Std. Error t value Pr(> t ) repdem e-05 *** year * year year e-05 *** year e-05 *** year e-06 *** year e-07 *** year e-08 *** year e-07 *** year < 2.2e-16 *** year e-12 *** year < 2.2e-16 *** year < 2.2e-16 *** year < 2.2e-16 *** year < 2.2e-16 *** year < 2.2e-16 *** year e-14 *** year < 2.2e-16 *** year e-07 *** --- Jonathan Mummolo (Stanford) 150C Causal Inference 33 / 59

37 ~ Modeling Time 'I( Unit Specific Time Trends240 Often Chapter 5 Eliminate Results ", AOt" TABLE Estimated effects of labor regulation on the performance of firms in Indian states Labor regulation (lagged) (.064) Log development expenditure per capita Log installed electricity capacity per capita (1) (2) (3) (4) (.051).240 (.128).089 (.061) Log state population.720 (.96) (.039).184 (.119).082 (.054) (1.192).0002 (.020).241 (.106).023 (.033) (2.326) Fixed Effe the estimates in colum specific covariates, suc and state population. addition of state-level the minimum wage stu Besley and Burgess es specific trends kills the in column 4. Apparen in states where output trend therefore drives Congress majority We've la beled the two time because this is th (.01) (.010) econometrics. But the Hard left majority (.017) (.009) of states, the subscrip some of which are af Janata majority For example, Kugler, J (.026) (.033) effects of age-specific e Regional majority Likewise, instead of t other types ofcharacte (.009) (.023) (1999), who studied State-specific trends Adjusted R2 No.93 No.93 No.94 Yes.95 laws on teen pregnan birth. Regardless of t Notes: Adapted from Besley and Burgess (2004), table IV. The table reports regression DD estimates of the effects of labor regulation on productivity. The dependent variable is log manufacturing output per capita. All models include always set up an impl question of whether careful consideration. state and year effects. Robust standard errors clustered at the state level are One potential pitfa reported parentheses. State amendments to the Industrial Disputes Act are labor regulationcoded increased 1 = pro-worker, in0 states = neutral, -1 where = pro-employer output and then was cumulated declining anyway position of the treatm over the period to generate the labor regulation measure. Log of installed result of treatment. electrical capacity is measured in kilowatts, and log development expenditure is real per capita state spending on social and economic services. Congress, Jonathan Mummolo (Stanford) 150C Causal Inference hard left, Janata, and regional majority are counts of the number of years and time comparisons of the generosity 34 / of 59 pu

38 Fixed Effects Regression: Unit Specific Time Trends > mod_fe <- plm(nat_rate~ repdem+muniid*time+year,data=d,model="within") > coeftest(mod_fe, vcov=function(x) vcovhc(x, cluster="group", type="hc1")) t test of coefficients: Estimate Std. Error t value Pr(> t ) repdem e-05 *** muniid2:time e e e+07 < 2.2e-16 *** muniid3:time e e e+07 < 2.2e-16 ***. Jonathan Mummolo (Stanford) 150C Causal Inference 35 / 59

39 Unit Specific Quadratic Time Trends > d$time2 <- d$time^2 > mod_fe <- plm(nat_rate~repdem+ muniid*time+muniid*time^2+year,data=d,model="within") > coeftest(mod_fe, vcov=function(x) vcovhc(x, cluster="group", type="hc1")) t test of coefficients: Estimate Std. Error t value Pr(> t ) repdem e-05 *** muniid2:time e < 3.4e-16 *** muniid2:time e e+07 < 5.6e-16 ***. Jonathan Mummolo (Stanford) 150C Causal Inference 36 / 59

40 Interpreting FE Results Sensibly Using fixed effects often removes large amounts of variation from the data Jonathan Mummolo (Stanford) 150C Causal Inference 37 / 59

41 Interpreting FE Results Sensibly Using fixed effects often removes large amounts of variation from the data With unit fixed effects, we are left explaining changes in y within units over time, after removing all variation between units Jonathan Mummolo (Stanford) 150C Causal Inference 37 / 59

42 Interpreting FE Results Sensibly Using fixed effects often removes large amounts of variation from the data With unit fixed effects, we are left explaining changes in y within units over time, after removing all variation between units With unit and time fixed effects, we are left explaining changes in y within units over time net of common time shocks Jonathan Mummolo (Stanford) 150C Causal Inference 37 / 59

43 Interpreting FE Results Sensibly Using fixed effects often removes large amounts of variation from the data With unit fixed effects, we are left explaining changes in y within units over time, after removing all variation between units With unit and time fixed effects, we are left explaining changes in y within units over time net of common time shocks Important to consider realistic counterfactuals when interpreting results, i.e. a one-unit or one SD increase in x may no longer be of interest because x never shifts this much within a unit over time Jonathan Mummolo (Stanford) 150C Causal Inference 37 / 59

44 Interpreting FE Results Sensibly Using fixed effects often removes large amounts of variation from the data With unit fixed effects, we are left explaining changes in y within units over time, after removing all variation between units With unit and time fixed effects, we are left explaining changes in y within units over time net of common time shocks Important to consider realistic counterfactuals when interpreting results, i.e. a one-unit or one SD increase in x may no longer be of interest because x never shifts this much within a unit over time Could end up extrapolating beyond the coverage in the data, yielding unreliable counterfactual estimates Jonathan Mummolo (Stanford) 150C Causal Inference 37 / 59

45 Interpreting FE Results Sensibly Using fixed effects often removes large amounts of variation from the data With unit fixed effects, we are left explaining changes in y within units over time, after removing all variation between units With unit and time fixed effects, we are left explaining changes in y within units over time net of common time shocks Important to consider realistic counterfactuals when interpreting results, i.e. a one-unit or one SD increase in x may no longer be of interest because x never shifts this much within a unit over time Could end up extrapolating beyond the coverage in the data, yielding unreliable counterfactual estimates More important when treatment is continuous Jonathan Mummolo (Stanford) 150C Causal Inference 37 / 59

46 Interpreting FE Results Sensibly Using fixed effects often removes large amounts of variation from the data With unit fixed effects, we are left explaining changes in y within units over time, after removing all variation between units With unit and time fixed effects, we are left explaining changes in y within units over time net of common time shocks Important to consider realistic counterfactuals when interpreting results, i.e. a one-unit or one SD increase in x may no longer be of interest because x never shifts this much within a unit over time Could end up extrapolating beyond the coverage in the data, yielding unreliable counterfactual estimates More important when treatment is continuous Useful first step: examine (plot, summarize) the demeaned data Jonathan Mummolo (Stanford) 150C Causal Inference 37 / 59

47 Interpreting FE Results Sensibly Using fixed effects often removes large amounts of variation from the data With unit fixed effects, we are left explaining changes in y within units over time, after removing all variation between units With unit and time fixed effects, we are left explaining changes in y within units over time net of common time shocks Important to consider realistic counterfactuals when interpreting results, i.e. a one-unit or one SD increase in x may no longer be of interest because x never shifts this much within a unit over time Could end up extrapolating beyond the coverage in the data, yielding unreliable counterfactual estimates More important when treatment is continuous Useful first step: examine (plot, summarize) the demeaned data Also a good way to check functional form assumptions: is the bivariate relationship of interest still linear after conditioning on covariates? Jonathan Mummolo (Stanford) 150C Causal Inference 37 / 59

48 Pooled Data Modeling Time 7 6 SD(Economic Output) for all data = 346 Range(Economic Output) for all data = 1,034 Country 5 Democracy (1 7 Scale) Country 2 Country 3 Country 4 2 Country Economic Output ($ billions) Jonathan Mummolo (Stanford) 150C Causal Inference 38 / 59

49 Reduction in Variation 3 SD(Economic Output) for demeaned data = 24.6 Range(Economic Output) for demeaned data = Mean Deviated Democracy 2 1 One SD Increase in Economic Output Min. to Max. Increase in Economic Output Mean Deviated Economic Output ($ billions) Jonathan Mummolo (Stanford) 150C Causal Inference 39 / 59

50 Example: Healy and Malhotra (2009) Estimates effect of federal disaster relief ($) on presidential vote share Jonathan Mummolo (Stanford) 150C Causal Inference 40 / 59

51 Example: Healy and Malhotra (2009) Estimates effect of federal disaster relief ($) on presidential vote share Observations on all U.S. counties over several election years Jonathan Mummolo (Stanford) 150C Causal Inference 40 / 59

52 Example: Healy and Malhotra (2009) Estimates effect of federal disaster relief ($) on presidential vote share Observations on all U.S. counties over several election years > head(d[,c("name_state","county","year", "all_current_relief","incum_vote")]) name_state county year all_current_relief incum_vote 1 AL AUTAUGA AL AUTAUGA AL AUTAUGA AL AUTAUGA AL AUTAUGA AL BALDWIN Jonathan Mummolo (Stanford) 150C Causal Inference 40 / 59

53 Example: Healy and Malhotra (2009) Pooled OLS Result > summary(lm(incum_vote~all_current_relief, data=d)) Call: lm(formula = incum_vote ~ all_current_relief, data = d) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) <2e-16 *** all_current_relief <2e-16 *** Jonathan Mummolo (Stanford) 150C Causal Inference 41 / 59

54 Bivariate Plot Pooled Ln(Disaster Relief per Capita) Incumbent Vote % OLS fit LOESS fit Jonathan Mummolo (Stanford) 150C Causal Inference 42 / 59

55 Example: Healy and Malhotra (2009) County Fixed Effects > summary(m1<-lm(incum_vote~ all_current_relief +factor(fips), data=d))$coefficients[1:2,] Estimate Std. Error t value Pr(> t ) (Intercept) e-16 all_current_relief e+00 Jonathan Mummolo (Stanford) 150C Causal Inference 43 / 59

56 County FE Pooled Ln(Disaster Relief per Capita) Incumbent Vote % County FE Ln(Disaster Relief per Capita) Within Counties Incumbent Vote % within Counties OLS fit LOESS fit Jonathan Mummolo (Stanford) 150C Causal Inference 44 / 59

57 What About Multi-Way Fixed Effects? How do we visualize data with multi-way fixed effects? Jonathan Mummolo (Stanford) 150C Causal Inference 45 / 59

58 What About Multi-Way Fixed Effects? How do we visualize data with multi-way fixed effects? Can residualize the data and then plot Jonathan Mummolo (Stanford) 150C Causal Inference 45 / 59

59 What About Multi-Way Fixed Effects? How do we visualize data with multi-way fixed effects? Can residualize the data and then plot Useful exercise with all multivariate regressions Jonathan Mummolo (Stanford) 150C Causal Inference 45 / 59

60 Frisch-Waugh-Lovell Theorem Suppose we are interested in estimating β 1 in the following model: y = α + β 1 x + δ 2 z 1 + β 3 z β k z k + ε where x is the treatment and z 1 -z k are a set of control variables. β 1 can also be estimated through the following multi-step process. Jonathan Mummolo (Stanford) 150C Causal Inference 46 / 59

61 Frisch-Waugh-Lovell Theorem Suppose we are interested in estimating β 1 in the following model: y = α + β 1 x + δ 2 z 1 + β 3 z β k z k + ε where x is the treatment and z 1 -z k are a set of control variables. β 1 can also be estimated through the following multi-step process. 1. Regress x on an intercept and z 1 -z k and store the residuals, r xz = (x ˆx xz ) Jonathan Mummolo (Stanford) 150C Causal Inference 46 / 59

62 Frisch-Waugh-Lovell Theorem Suppose we are interested in estimating β 1 in the following model: y = α + β 1 x + δ 2 z 1 + β 3 z β k z k + ε where x is the treatment and z 1 -z k are a set of control variables. β 1 can also be estimated through the following multi-step process. 1. Regress x on an intercept and z 1 -z k and store the residuals, r xz = (x ˆx xz ) 2. Regress y on an intercept and z 1 -z k and store the residuals, r yz = (y ŷ yz ) Jonathan Mummolo (Stanford) 150C Causal Inference 46 / 59

63 Frisch-Waugh-Lovell Theorem Suppose we are interested in estimating β 1 in the following model: y = α + β 1 x + δ 2 z 1 + β 3 z β k z k + ε where x is the treatment and z 1 -z k are a set of control variables. β 1 can also be estimated through the following multi-step process. 1. Regress x on an intercept and z 1 -z k and store the residuals, r xz = (x ˆx xz ) 2. Regress y on an intercept and z 1 -z k and store the residuals, r yz = (y ŷ yz ) 3. Estimate r yz = θ + β 1 r xz + v Jonathan Mummolo (Stanford) 150C Causal Inference 46 / 59

64 Frisch-Waugh-Lovell Theorem Suppose we are interested in estimating β 1 in the following model: y = α + β 1 x + δ 2 z 1 + β 3 z β k z k + ε where x is the treatment and z 1 -z k are a set of control variables. β 1 can also be estimated through the following multi-step process. 1. Regress x on an intercept and z 1 -z k and store the residuals, r xz = (x ˆx xz ) 2. Regress y on an intercept and z 1 -z k and store the residuals, r yz = (y ŷ yz ) 3. Estimate r yz = θ + β 1 r xz + v Then ˆβ 1 = ˆβ 1 Jonathan Mummolo (Stanford) 150C Causal Inference 46 / 59

65 Frisch-Waugh-Lovell Theorem Suppose we are interested in estimating β 1 in the following model: y = α + β 1 x + δ 2 z 1 + β 3 z β k z k + ε where x is the treatment and z 1 -z k are a set of control variables. β 1 can also be estimated through the following multi-step process. 1. Regress x on an intercept and z 1 -z k and store the residuals, r xz = (x ˆx xz ) 2. Regress y on an intercept and z 1 -z k and store the residuals, r yz = (y ŷ yz ) 3. Estimate r yz = θ + β 1 r xz + v Then ˆβ 1 = ˆβ 1 (Intuition: Remove all variation explained by z 1 z k that is shared with x. What relationship between x and y remains?) Jonathan Mummolo (Stanford) 150C Causal Inference 46 / 59

66 Frisch-Waugh-Lovell Theorem Suppose we are interested in estimating β 1 in the following model: y = α + β 1 x + δ 2 z 1 + β 3 z β k z k + ε where x is the treatment and z 1 -z k are a set of control variables. β 1 can also be estimated through the following multi-step process. 1. Regress x on an intercept and z 1 -z k and store the residuals, r xz = (x ˆx xz ) 2. Regress y on an intercept and z 1 -z k and store the residuals, r yz = (y ŷ yz ) 3. Estimate r yz = θ + β 1 r xz + v Then ˆβ 1 = ˆβ 1 (Intuition: Remove all variation explained by z 1 z k that is shared with x. What relationship between x and y remains?) In one-way FE only, equivalent to demeaning the data within each cross-sectional unit Jonathan Mummolo (Stanford) 150C Causal Inference 46 / 59

67 Healy and Malhotra (2009) 1. Regress Disaster Relief on an intercept and county and year dummies, and store the residuals, r xz = (x ˆx xz ) Jonathan Mummolo (Stanford) 150C Causal Inference 47 / 59

68 Healy and Malhotra (2009) 1. Regress Disaster Relief on an intercept and county and year dummies, and store the residuals, r xz = (x ˆx xz ) 2. Regress Vote Share on an intercept and county and year dummies and store the residuals, r yz = (y ŷ yz ) Jonathan Mummolo (Stanford) 150C Causal Inference 47 / 59

69 Healy and Malhotra (2009) 1. Regress Disaster Relief on an intercept and county and year dummies, and store the residuals, r xz = (x ˆx xz ) 2. Regress Vote Share on an intercept and county and year dummies and store the residuals, r yz = (y ŷ yz ) 3. Estimate r yz = θ + β 1 r xz + v Jonathan Mummolo (Stanford) 150C Causal Inference 47 / 59

70 County and Year FE Dummy Model: > m1b<-lm(incum_vote~ all_current_relief+ factor(fips)+factor(year), data=d) > summary(m1b)$coefficients[1:5,] Estimate Std. Error t value Pr(> t ) (Intercept) e-30 all_current_relief e-119 factor(fips) e-01 factor(fips) e-01 factor(fips) e-01 Jonathan Mummolo (Stanford) 150C Causal Inference 48 / 59

71 County and Year FE Residualized Model: m3b<-lm(all_current_relief~factor(fips)+factor(year), data=d) m4b<-lm(incum_vote ~factor(fips)+factor(year), data=d) > summary(lm(m4b$residuals~m3b$residuals))$coefficients Estimate Std. Error t value Pr(> t ) (Intercept) e e e+00 m3b$residuals e e e-148 Jonathan Mummolo (Stanford) 150C Causal Inference 49 / 59

72 County and Year FE Pooled Ln(Disaster Relief per Capita) Incumbent Vote % SD(X)= County FE Ln(Disaster Relief per Capita) Within Counties Incumbent Vote % Within Counties SD(X*)= County and Year FE Ln(Disaster Relief per Capita) Within Counties/Years Incumbent Vote % within Counties SD(X**)=1.004 OLS fit LOESS fit Jonathan Mummolo (Stanford) 150C Causal Inference 50 / 59

73 Interpreting Results Modeling Time Dummy Model: > m1b<-lm(incum_vote~ all_current_relief+ factor(fips)+factor(year), data=d) > summary(m1b)$coefficients[1:5,] Estimate Std. Error t value Pr(> t ) (Intercept) e-30 all_current_relief e-119 With county and year FE, more reasonable to discuss a one-sd shift in X = disaster relief after residualizing with respect to county and year dummies * 2.17 = 2.26 additional points for incumbent using SD of X * 2.17 = 3.97 additional points for incumbent using SD of X =.569 meaning the substantive impact is cut in half once we consider a plausible shift in X. Jonathan Mummolo (Stanford) 150C Causal Inference 51 / 59

74 Outline Modeling Dynamic Effects 1 Panel Setup 2 Unobserved Effects Model and Pooled OLS 3 Fixed Effects Regression 4 Modeling Time 5 Modeling Dynamic Effects Jonathan Mummolo (Stanford) 150C Causal Inference 52 / 59

75 Modeling Dynamic Effects Leads and Lags Let D it be a binary indicator coded 1 if unit i switched from control to treatment between t and t 1; 0 otherwise Lags: D it 1 : unit switched between t 1 and t 2 Leads: D it+1 : unit switches between t + 1 and t Include lags and leads into the fixed effects model: y it = D it+2 β 2 + D it+1 β 1 + D it β 0 + D it 1 β 1 + D it 2 β 2 + c i + ε it Interpretation of coefficients: Leads β 1, β 2, etc. test for anticipation effects (should be zero!) Switch β 0 tests for immediate effect Lags β 1, β 2, etc. test for long-run effects highest lag dummy can be coded 1 for all post-switch years Jonathan Mummolo (Stanford) 150C Causal Inference 53 / 59

76 Modeling Dynamic Effects Lags and Leads of Switch to Representative Democracy > d[970:989,c(1:3,5,12:ncol(d))] muniid year muni_name repdem switcht lag1 lag2 lag3 lead1 lead2 lead3 lead4 lead Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Hagenbuch Pfungen Jonathan Mummolo (Stanford) 150C Causal Inference 54 / 59

77 Modeling Dynamic Effects Dynamic Effect of Switching to Representative Democracy > mod_all <- plm(nat_rate~lag3+lag2+lag1+switcht+ lead1+lead2+lead3+lead4+lead5+ + year,data=d,model="within") > coeftest(mod_all, vcov=function(x) vcovhc(x, cluster="group", type="hc1")) t test of coefficients: Estimate Std. Error t value Pr(> t ) lag * lag ** lag *** switcht lead lead lead lead lead year * Jonathan Mummolo (Stanford) 150C Causal Inference 55 / 59

78 Modeling Dynamic Effects Dynamic Effect of Switching to Representative Democracy Dynamic Panel Estimates Jonathan Mummolo (Stanford) 150C Causal Inference 56 / 59

79 Switching Plot Modeling Dynamic Effects Naturalization Rate (%) Direct Democracy Representative Democracy Years Before / After Change from Direct to Representative Democracy Jonathan Mummolo (Stanford) 150C Causal Inference 57 / 59

80 Modeling Dynamic Effects Heterogeneous Treatment Effects So far we have assumed that the treatment effect is constant across units Can allow for heterogeneous treatment effects by including interaction of treatment with other regressors y it = treat it α 0 + (treat it x it )α 1 + x it β + c i + t + ε it, t = 1, 2,..., T Often the treatment is interacted with a time-invariant regressor: y it = treat it α 0 + (treat it x i )α 1 + c i + t + ε it, t = 1, 2,..., T Note: The lower order term on the time-invariant x i is collinear with the fixed effects and drops out Jonathan Mummolo (Stanford) 150C Causal Inference 58 / 59

81 Modeling Dynamic Effects Heterogeneous Effect of Direct Democracy Effect: Direct to Representative Democracy 95 % Confidence Interval Change in Naturalization Rate (%) Quartiles SVP Vote Share Municipality Level SVP Vote Share (%) Jonathan Mummolo (Stanford) 150C Causal Inference 59 / 59

Econometrics. Week 6. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 6. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 6 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 21 Recommended Reading For the today Advanced Panel Data Methods. Chapter 14 (pp.

More information

EMERGING MARKETS - Lecture 2: Methodology refresher

EMERGING MARKETS - Lecture 2: Methodology refresher EMERGING MARKETS - Lecture 2: Methodology refresher Maria Perrotta April 4, 2013 SITE http://www.hhs.se/site/pages/default.aspx My contact: maria.perrotta@hhs.se Aim of this class There are many different

More information

Lecture 9: Panel Data Model (Chapter 14, Wooldridge Textbook)

Lecture 9: Panel Data Model (Chapter 14, Wooldridge Textbook) Lecture 9: Panel Data Model (Chapter 14, Wooldridge Textbook) 1 2 Panel Data Panel data is obtained by observing the same person, firm, county, etc over several periods. Unlike the pooled cross sections,

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information

150C Causal Inference

150C Causal Inference 150C Causal Inference Difference in Differences Jonathan Mummolo Stanford Selection on Unobservables Problem Often there are reasons to believe that treated and untreated units differ in unobservable characteristics

More information

Controlling for Time Invariant Heterogeneity

Controlling for Time Invariant Heterogeneity Controlling for Time Invariant Heterogeneity Yona Rubinstein July 2016 Yona Rubinstein (LSE) Controlling for Time Invariant Heterogeneity 07/16 1 / 19 Observables and Unobservables Confounding Factors

More information

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data Panel data Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data - possible to control for some unobserved heterogeneity - possible

More information

Gov 2000: 13. Panel Data and Clustering

Gov 2000: 13. Panel Data and Clustering Gov 2000: 13. Panel Data and Clustering Matthew Blackwell Harvard University mblackwell@gov.harvard.edu November 28, 2016 Where are we? Where are we going? Up until now: the linear regression model, its

More information

Linear Panel Data Models

Linear Panel Data Models Linear Panel Data Models Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania October 5, 2009 Michael R. Roberts Linear Panel Data Models 1/56 Example First Difference

More information

Lecture 4: Linear panel models

Lecture 4: Linear panel models Lecture 4: Linear panel models Luc Behaghel PSE February 2009 Luc Behaghel (PSE) Lecture 4 February 2009 1 / 47 Introduction Panel = repeated observations of the same individuals (e.g., rms, workers, countries)

More information

Econ 582 Fixed Effects Estimation of Panel Data

Econ 582 Fixed Effects Estimation of Panel Data Econ 582 Fixed Effects Estimation of Panel Data Eric Zivot May 28, 2012 Panel Data Framework = x 0 β + = 1 (individuals); =1 (time periods) y 1 = X β ( ) ( 1) + ε Main question: Is x uncorrelated with?

More information

Short T Panels - Review

Short T Panels - Review Short T Panels - Review We have looked at methods for estimating parameters on time-varying explanatory variables consistently in panels with many cross-section observation units but a small number of

More information

Dealing With Endogeneity

Dealing With Endogeneity Dealing With Endogeneity Junhui Qian December 22, 2014 Outline Introduction Instrumental Variable Instrumental Variable Estimation Two-Stage Least Square Estimation Panel Data Endogeneity in Econometrics

More information

Topic 10: Panel Data Analysis

Topic 10: Panel Data Analysis Topic 10: Panel Data Analysis Advanced Econometrics (I) Dong Chen School of Economics, Peking University 1 Introduction Panel data combine the features of cross section data time series. Usually a panel

More information

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63 1 / 63 Panel Data Models Chapter 5 Financial Econometrics Michael Hauser WS17/18 2 / 63 Content Data structures: Times series, cross sectional, panel data, pooled data Static linear panel data models:

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

10 Panel Data. Andrius Buteikis,

10 Panel Data. Andrius Buteikis, 10 Panel Data Andrius Buteikis, andrius.buteikis@mif.vu.lt http://web.vu.lt/mif/a.buteikis/ Introduction Panel data combines cross-sectional and time series data: the same individuals (persons, firms,

More information

Applied Quantitative Methods II

Applied Quantitative Methods II Applied Quantitative Methods II Lecture 10: Panel Data Klára Kaĺıšková Klára Kaĺıšková AQM II - Lecture 10 VŠE, SS 2016/17 1 / 38 Outline 1 Introduction 2 Pooled OLS 3 First differences 4 Fixed effects

More information

Difference-in-Differences Methods

Difference-in-Differences Methods Difference-in-Differences Methods Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 1 Introduction: A Motivating Example 2 Identification 3 Estimation and Inference 4 Diagnostics

More information

Final Exam. Economics 835: Econometrics. Fall 2010

Final Exam. Economics 835: Econometrics. Fall 2010 Final Exam Economics 835: Econometrics Fall 2010 Please answer the question I ask - no more and no less - and remember that the correct answer is often short and simple. 1 Some short questions a) For each

More information

Econometrics Problem Set 11

Econometrics Problem Set 11 Econometrics Problem Set WISE, Xiamen University Spring 207 Conceptual Questions. (SW 2.) This question refers to the panel data regressions summarized in the following table: Dependent variable: ln(q

More information

Non-linear panel data modeling

Non-linear panel data modeling Non-linear panel data modeling Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini May 2010 Laura Magazzini (@univr.it) Non-linear panel data modeling May 2010 1

More information

Advanced Econometrics

Advanced Econometrics Based on the textbook by Verbeek: A Guide to Modern Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna May 16, 2013 Outline Univariate

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. The Basic Methodology 2. How Should We View Uncertainty in DD Settings?

More information

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover). Formatmall skapad: 2011-12-01 Uppdaterad: 2015-03-06 / LP Department of Economics Course name: Empirical Methods in Economics 2 Course code: EC2404 Semester: Spring 2015 Type of exam: MAIN Examiner: Peter

More information

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL INTRODUCTION TO BASIC LINEAR REGRESSION MODEL 13 September 2011 Yogyakarta, Indonesia Cosimo Beverelli (World Trade Organization) 1 LINEAR REGRESSION MODEL In general, regression models estimate the effect

More information

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid Applied Economics Panel Data Department of Economics Universidad Carlos III de Madrid See also Wooldridge (chapter 13), and Stock and Watson (chapter 10) 1 / 38 Panel Data vs Repeated Cross-sections In

More information

1 Estimation of Persistent Dynamic Panel Data. Motivation

1 Estimation of Persistent Dynamic Panel Data. Motivation 1 Estimation of Persistent Dynamic Panel Data. Motivation Consider the following Dynamic Panel Data (DPD) model y it = y it 1 ρ + x it β + µ i + v it (1.1) with i = {1, 2,..., N} denoting the individual

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Impact Evaluation Workshop 2014: Asian Development Bank Sept 1 3, 2014 Manila, Philippines

Impact Evaluation Workshop 2014: Asian Development Bank Sept 1 3, 2014 Manila, Philippines Impact Evaluation Workshop 2014: Asian Development Bank Sept 1 3, 2014 Manila, Philippines Session 15 Regression Estimators, Differences in Differences, and Panel Data Methods I. Introduction: Most evaluations

More information

Week 2: Pooling Cross Section across Time (Wooldridge Chapter 13)

Week 2: Pooling Cross Section across Time (Wooldridge Chapter 13) Week 2: Pooling Cross Section across Time (Wooldridge Chapter 13) Tsun-Feng Chiang* *School of Economics, Henan University, Kaifeng, China March 3, 2014 1 / 30 Pooling Cross Sections across Time Pooled

More information

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE Chapter 6. Panel Data Joan Llull Quantitative Statistical Methods II Barcelona GSE Introduction Chapter 6. Panel Data 2 Panel data The term panel data refers to data sets with repeated observations over

More information

Applied Statistics and Econometrics. Giuseppe Ragusa Lecture 15: Instrumental Variables

Applied Statistics and Econometrics. Giuseppe Ragusa Lecture 15: Instrumental Variables Applied Statistics and Econometrics Giuseppe Ragusa Lecture 15: Instrumental Variables Outline Introduction Endogeneity and Exogeneity Valid Instruments TSLS Testing Validity 2 Instrumental Variables Regression

More information

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 7: Cluster Sampling Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of roups and

More information

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Kosuke Imai Department of Politics Princeton University November 13, 2013 So far, we have essentially assumed

More information

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9. Section 7 Model Assessment This section is based on Stock and Watson s Chapter 9. Internal vs. external validity Internal validity refers to whether the analysis is valid for the population and sample

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 1 Jakub Mućk Econometrics of Panel Data Meeting # 1 1 / 31 Outline 1 Course outline 2 Panel data Advantages of Panel Data Limitations of Panel Data 3 Pooled

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

EC327: Advanced Econometrics, Spring 2007

EC327: Advanced Econometrics, Spring 2007 EC327: Advanced Econometrics, Spring 2007 Wooldridge, Introductory Econometrics (3rd ed, 2006) Chapter 14: Advanced panel data methods Fixed effects estimators We discussed the first difference (FD) model

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Multiple Linear Regression CIVL 7012/8012

Multiple Linear Regression CIVL 7012/8012 Multiple Linear Regression CIVL 7012/8012 2 Multiple Regression Analysis (MLR) Allows us to explicitly control for many factors those simultaneously affect the dependent variable This is important for

More information

Linear Regression. Junhui Qian. October 27, 2014

Linear Regression. Junhui Qian. October 27, 2014 Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency

More information

ECON Introductory Econometrics. Lecture 17: Experiments

ECON Introductory Econometrics. Lecture 17: Experiments ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.

More information

ECO375 Tutorial 8 Instrumental Variables

ECO375 Tutorial 8 Instrumental Variables ECO375 Tutorial 8 Instrumental Variables Matt Tudball University of Toronto Mississauga November 16, 2017 Matt Tudball (University of Toronto) ECO375H5 November 16, 2017 1 / 22 Review: Endogeneity Instrumental

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 3 Jakub Mućk Econometrics of Panel Data Meeting # 3 1 / 21 Outline 1 Fixed or Random Hausman Test 2 Between Estimator 3 Coefficient of determination (R 2

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

150C Causal Inference

150C Causal Inference 150C Causal Inference Instrumental Variables: Modern Perspective with Heterogeneous Treatment Effects Jonathan Mummolo May 22, 2017 Jonathan Mummolo 150C Causal Inference May 22, 2017 1 / 26 Two Views

More information

Econometrics Homework 4 Solutions

Econometrics Homework 4 Solutions Econometrics Homework 4 Solutions Question 1 (a) General sources of problem: measurement error in regressors, omitted variables that are correlated to the regressors, and simultaneous equation (reverse

More information

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II Jeff Wooldridge IRP Lectures, UW Madison, August 2008 5. Estimating Production Functions Using Proxy Variables 6. Pseudo Panels

More information

α version (only brief introduction so far)

α version (only brief introduction so far) Econometrics I KS Module 8: Panel Data Econometrics Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: June 18, 2018 α version (only brief introduction so far) Alexander

More information

Statistical Models for Causal Analysis

Statistical Models for Causal Analysis Statistical Models for Causal Analysis Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 Three Modes of Statistical Inference 1. Descriptive Inference: summarizing and exploring

More information

Lecture 6: Dynamic panel models 1

Lecture 6: Dynamic panel models 1 Lecture 6: Dynamic panel models 1 Ragnar Nymoen Department of Economics, UiO 16 February 2010 Main issues and references Pre-determinedness and endogeneity of lagged regressors in FE model, and RE model

More information

Linear Models in Econometrics

Linear Models in Econometrics Linear Models in Econometrics Nicky Grant At the most fundamental level econometrics is the development of statistical techniques suited primarily to answering economic questions and testing economic theories.

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

Difference-in-Differences Estimation

Difference-in-Differences Estimation Difference-in-Differences Estimation Jeff Wooldridge Michigan State University Programme Evaluation for Policy Analysis Institute for Fiscal Studies June 2012 1. The Basic Methodology 2. How Should We

More information

Specification testing in panel data models estimated by fixed effects with instrumental variables

Specification testing in panel data models estimated by fixed effects with instrumental variables Specification testing in panel data models estimated by fixed effects wh instrumental variables Carrie Falls Department of Economics Michigan State Universy Abstract I show that a handful of the regressions

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University Joint

More information

Multiple Equation GMM with Common Coefficients: Panel Data

Multiple Equation GMM with Common Coefficients: Panel Data Multiple Equation GMM with Common Coefficients: Panel Data Eric Zivot Winter 2013 Multi-equation GMM with common coefficients Example (panel wage equation) 69 = + 69 + + 69 + 1 80 = + 80 + + 80 + 2 Note:

More information

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover). STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods in Economics 2 Course code: EC2402 Examiner: Peter Skogman Thoursie Number of credits: 7,5 credits (hp) Date of exam: Saturday,

More information

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017 Econometrics with Observational Data Introduction and Identification Todd Wagner February 1, 2017 Goals for Course To enable researchers to conduct careful quantitative analyses with existing VA (and non-va)

More information

Panel Data: Linear Models

Panel Data: Linear Models Panel Data: Linear Models Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Panel Data: Linear Models 1 / 45 Introduction Outline What

More information

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

More information

Panel data methods for policy analysis

Panel data methods for policy analysis IAPRI Quantitative Analysis Capacity Building Series Panel data methods for policy analysis Part I: Linear panel data models Outline 1. Independently pooled cross sectional data vs. panel/longitudinal

More information

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28 Why IV estimation? So far, in OLS, we assumed independence.

More information

Eco517 Fall 2014 C. Sims FINAL EXAM

Eco517 Fall 2014 C. Sims FINAL EXAM Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,

More information

Problem 13.5 (10 points)

Problem 13.5 (10 points) BOSTON COLLEGE Department of Economics EC 327 Financial Econometrics Spring 2013, Prof. Baum, Mr. Park Problem Set 2 Due Monday 25 February 2013 Total Points Possible: 210 points Problem 13.5 (10 points)

More information

Basic econometrics. Tutorial 3. Dipl.Kfm. Johannes Metzler

Basic econometrics. Tutorial 3. Dipl.Kfm. Johannes Metzler Basic econometrics Tutorial 3 Dipl.Kfm. Introduction Some of you were asking about material to revise/prepare econometrics fundamentals. First of all, be aware that I will not be too technical, only as

More information

ES103 Introduction to Econometrics

ES103 Introduction to Econometrics Anita Staneva May 16, ES103 2015Introduction to Econometrics.. Lecture 1 ES103 Introduction to Econometrics Lecture 1: Basic Data Handling and Anita Staneva Egypt Scholars Economic Society Outline Introduction

More information

Fixed Effects Models for Panel Data. December 1, 2014

Fixed Effects Models for Panel Data. December 1, 2014 Fixed Effects Models for Panel Data December 1, 2014 Notation Use the same setup as before, with the linear model Y it = X it β + c i + ɛ it (1) where X it is a 1 K + 1 vector of independent variables.

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 4 Jakub Mućk Econometrics of Panel Data Meeting # 4 1 / 30 Outline 1 Two-way Error Component Model Fixed effects model Random effects model 2 Non-spherical

More information

Wooldridge, Introductory Econometrics, 3d ed. Chapter 16: Simultaneous equations models. An obvious reason for the endogeneity of explanatory

Wooldridge, Introductory Econometrics, 3d ed. Chapter 16: Simultaneous equations models. An obvious reason for the endogeneity of explanatory Wooldridge, Introductory Econometrics, 3d ed. Chapter 16: Simultaneous equations models An obvious reason for the endogeneity of explanatory variables in a regression model is simultaneity: that is, one

More information

Econ 1123: Section 5. Review. Internal Validity. Panel Data. Clustered SE. STATA help for Problem Set 5. Econ 1123: Section 5.

Econ 1123: Section 5. Review. Internal Validity. Panel Data. Clustered SE. STATA help for Problem Set 5. Econ 1123: Section 5. Outline 1 Elena Llaudet 2 3 4 October 6, 2010 5 based on Common Mistakes on P. Set 4 lnftmpop = -.72-2.84 higdppc -.25 lackpf +.65 higdppc * lackpf 2 lnftmpop = β 0 + β 1 higdppc + β 2 lackpf + β 3 lackpf

More information

What s New in Econometrics? Lecture 14 Quantile Methods

What s New in Econometrics? Lecture 14 Quantile Methods What s New in Econometrics? Lecture 14 Quantile Methods Jeff Wooldridge NBER Summer Institute, 2007 1. Reminders About Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile Regression

More information

CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M.

CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Introduction 2. Linear

More information

Gov 2002: 9. Differences in Differences

Gov 2002: 9. Differences in Differences Gov 2002: 9. Differences in Differences Matthew Blackwell October 30, 2015 1 / 40 1. Basic differences-in-differences model 2. Conditional DID 3. Standard error issues 4. Other DID approaches 2 / 40 Where

More information

ECON Introductory Econometrics. Lecture 16: Instrumental variables

ECON Introductory Econometrics. Lecture 16: Instrumental variables ECON4150 - Introductory Econometrics Lecture 16: Instrumental variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 12 Lecture outline 2 OLS assumptions and when they are violated Instrumental

More information

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015 Introduction to causal identification Nidhiya Menon IGC Summer School, New Delhi, July 2015 Outline 1. Micro-empirical methods 2. Rubin causal model 3. More on Instrumental Variables (IV) Estimating causal

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

Panel Data Model (January 9, 2018)

Panel Data Model (January 9, 2018) Ch 11 Panel Data Model (January 9, 2018) 1 Introduction Data sets that combine time series and cross sections are common in econometrics For example, the published statistics of the OECD contain numerous

More information

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Ninth ARTNeT Capacity Building Workshop for Trade Research Trade Flows and Trade Policy Analysis Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis" June 2013 Bangkok, Thailand Cosimo Beverelli and Rainer Lanz (World Trade Organization) 1 Selected econometric

More information

Panel Data Exercises Manuel Arellano. Using panel data, a researcher considers the estimation of the following system:

Panel Data Exercises Manuel Arellano. Using panel data, a researcher considers the estimation of the following system: Panel Data Exercises Manuel Arellano Exercise 1 Using panel data, a researcher considers the estimation of the following system: y 1t = α 1 + βx 1t + v 1t. (t =1,..., T ) y Nt = α N + βx Nt + v Nt where

More information

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade Denis Chetverikov Brad Larsen Christopher Palmer UCLA, Stanford and NBER, UC Berkeley September

More information

Problem Set - Instrumental Variables

Problem Set - Instrumental Variables Problem Set - Instrumental Variables 1. Consider a simple model to estimate the effect of personal computer (PC) ownership on college grade point average for graduating seniors at a large public university:

More information

Beyond the Target Customer: Social Effects of CRM Campaigns

Beyond the Target Customer: Social Effects of CRM Campaigns Beyond the Target Customer: Social Effects of CRM Campaigns Eva Ascarza, Peter Ebbes, Oded Netzer, Matthew Danielson Link to article: http://journals.ama.org/doi/abs/10.1509/jmr.15.0442 WEB APPENDICES

More information

Gov 2000: 9. Regression with Two Independent Variables

Gov 2000: 9. Regression with Two Independent Variables Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics

More information

Econometrics (60 points) as the multivariate regression of Y on X 1 and X 2? [6 points]

Econometrics (60 points) as the multivariate regression of Y on X 1 and X 2? [6 points] Econometrics (60 points) Question 7: Short Answers (30 points) Answer parts 1-6 with a brief explanation. 1. Suppose the model of interest is Y i = 0 + 1 X 1i + 2 X 2i + u i, where E(u X)=0 and E(u 2 X)=

More information

Analysis of Panel Data: Introduction and Causal Inference with Panel Data

Analysis of Panel Data: Introduction and Causal Inference with Panel Data Analysis of Panel Data: Introduction and Causal Inference with Panel Data Session 1: 15 June 2015 Steven Finkel, PhD Daniel Wallace Professor of Political Science University of Pittsburgh USA Course presents

More information

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral IAE, Barcelona GSE and University of Gothenburg Gothenburg, May 2015 Roadmap Deviations from the standard

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 16, 2013 Outline Introduction Simple

More information

Econ Spring 2016 Section 9

Econ Spring 2016 Section 9 Econ 140 - Spring 2016 Section 9 GSI: Fenella Carpena March 31, 2016 1 Assessing Studies Based on Multiple Regression 1.1 Internal Validity Threat to Examples/Cases Internal Validity OVB Example: wages

More information

STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Friday, June 5, 009 Examination time: 3 hours

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

More on Roy Model of Self-Selection

More on Roy Model of Self-Selection V. J. Hotz Rev. May 26, 2007 More on Roy Model of Self-Selection Results drawn on Heckman and Sedlacek JPE, 1985 and Heckman and Honoré, Econometrica, 1986. Two-sector model in which: Agents are income

More information

LECTURE 9: GENTLE INTRODUCTION TO

LECTURE 9: GENTLE INTRODUCTION TO LECTURE 9: GENTLE INTRODUCTION TO REGRESSION WITH TIME SERIES From random variables to random processes (cont d) 2 in cross-sectional regression, we were making inferences about the whole population based

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

New Developments in Econometrics Lecture 16: Quantile Estimation

New Developments in Econometrics Lecture 16: Quantile Estimation New Developments in Econometrics Lecture 16: Quantile Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. Review of Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Endogeneity b) Instrumental

More information