Impact Evaluation Workshop 2014: Asian Development Bank, Sept 1-3, 2014, Manila, Philippines


1 Impact Evaluation Workshop 2014: Asian Development Bank, Sept 1-3, 2014, Manila, Philippines

Session 15: Regression Estimators, Differences in Differences, and Panel Data Methods

2 I. Introduction

Most randomized evaluations in developing countries were conducted on new programs that did not exist before the randomized trial was conducted. In contrast, impact evaluations that are not based on randomized trials are almost always conducted on programs that existed before the evaluation was planned.

There are two ways in which participation is not random:

1. The communities in which the programs exist are not randomly chosen.
2. The participants in the program are not randomly assigned.

3 This session has three objectives:

1. Explain how very simple ordinary least squares (OLS) estimates of program effects could lead to biased estimates of program impacts.
2. Present four commonly used regression methods to estimate program impacts, including the assumptions needed for those estimation methods to produce unbiased and consistent estimates:
   - The cross-section estimator
   - The before-after estimator
   - The difference-in-differences estimator
   - The within estimator
3. Present a case study exploiting panel data methods that allow for correlation between time and the treatment.

Let's start with some very simple examples (well, let's skip them here).

4 Example: A Before-After Estimator

The before-after estimator obtains a program's impact by comparing outcomes measured after the program started with outcomes measured before it started.

Consider a program that provides loans to poor farmers so that they can buy fertilizer to increase their maize production. In the year before the program started, farmers who later enrolled in the program harvested an average of 1,000 kg of maize per hectare (ha). One year after the program started, maize yields had increased to 1,200 kg/ha. The before-after estimator finds a program impact of 200 (= 1,200 - 1,000) kg/ha.

Question: Is 200 kg/ha a plausible estimate of the program's impact? Consider two cases:

5 A: Rainfall was normal during the year before the program started, but a drought occurred in the year the program was launched.

B: A drought occurred in the year before the program started, but rainfall returned to normal during the year the program was launched.

6 Note: The before-after estimator assumes Counterfactual C. Counterfactuals A and B pick up impacts of factors other than the program (e.g. weather changes).

7 Example: A Cross-Section (Enrolled and Nonenrolled) Estimator

The cross-section estimator obtains a program's impact by comparing outcomes of participants and non-participants after the program started.

Consider the microfinance program again. Now suppose that the only data we have were collected one year after the program started (i.e. no "before" data). One year after the program began, the farmers who enrolled in the program harvested an average of 1,100 kg of maize per ha, while those who did not enroll harvested an average of 1,000 kg/ha. The cross-section estimator calculates a program impact of a 100 (= 1,100 - 1,000) kg/ha increase in maize yields.

Question: Is 100 kg/ha a plausible estimate of the program impact?

8 Consider the following case scenarios:

A: More productive farmers were more likely to obtain the loan because they were more likely to be able to pay it back.

B: Farmers in the program reside in areas where the quality of land is lower (e.g. they needed more fertilizer to compensate for low land quality).

Example: A Difference-in-Differences Estimator

Consider the microfinance example again. A drought occurred the year before the program started, but rainfall was normal the year the program was launched. Assume that all farmers were affected by the drought, and that they were affected in a similar way. Assume also that we have data on maize yields both one year after the program was launched and before it was launched, for both enrolled and nonenrolled farmers.

Before the program, the farmers who later enrolled harvested 1,000 kg of maize per ha, and they harvested 1,150 kg/ha one year after the program started. Farmers who did not enroll harvested 900 kg/ha before the program began, and 1,000 kg/ha one year after the program started.

9 A DID estimator combines the before-after and cross-sectional (enrolled-nonenrolled) estimators:

$\Delta_1 - \Delta_2 = [(\text{enrolled, after}) - (\text{enrolled, before})] - [(\text{nonenrolled, after}) - (\text{nonenrolled, before})]$

This yields an estimated impact of (1,150 - 1,000) - (1,000 - 900) = 50 kg/ha.

Intuition: $\Delta_1$ and $\Delta_2$ remove the influence of time-invariant factors, e.g. land quality; $\Delta_1 - \Delta_2$ removes the influence of the common time trend due to, say, the drought.

The following figure illustrates these three simple estimators:

Cross-section: Estimated effect = C - D (ignores fixed factors, e.g. land quality, between groups)
Before-after: Estimated effect = C - A (ignores time trend)
DID: Estimated effect = (C - A) - (D - B) = C - E (accounts for both)
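As a quick arithmetic check, the three simple estimators can be computed from the four group means in the maize example. This is only a minimal sketch: the dictionary layout and the function names are illustrative choices, not part of the workshop materials.

```python
# Average maize yields (kg/ha) from the microfinance example:
# keys are (group, period); values are group means.
yields = {
    ("enrolled", "before"): 1000,
    ("enrolled", "after"): 1150,
    ("nonenrolled", "before"): 900,
    ("nonenrolled", "after"): 1000,
}


def before_after(y):
    """Change over time for enrolled farmers only (ignores the time trend)."""
    return y[("enrolled", "after")] - y[("enrolled", "before")]


def cross_section(y):
    """Enrolled minus nonenrolled after the program (ignores fixed differences)."""
    return y[("enrolled", "after")] - y[("nonenrolled", "after")]


def diff_in_diff(y):
    """Change for enrolled farmers minus change for nonenrolled farmers."""
    change_enrolled = y[("enrolled", "after")] - y[("enrolled", "before")]
    change_nonenrolled = y[("nonenrolled", "after")] - y[("nonenrolled", "before")]
    return change_enrolled - change_nonenrolled


print("before-after:", before_after(yields))    # 150
print("cross-section:", cross_section(yields))  # 150
print("DID:", diff_in_diff(yields))             # 50
```

In this example the before-after and cross-section estimates happen to coincide at 150 kg/ha, while DID nets out both the pre-existing 100 kg/ha gap between the groups and the 100 kg/ha recovery from the drought.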

10 II. Parameters of Interest and Sources of Bias

Recall the two most common parameters of interest for impact evaluation:

1. ATE: the average effect of the program for all persons in the population: $ATE \equiv E[Y_1 - Y_0] = E[\Delta]$
2. ATT: the average effect of the program for program participants: $ATT \equiv E[Y_1 - Y_0 \mid P = 1] = E[\Delta \mid P = 1]$

Recall also that sometimes it is possible to go further by estimating ATE and ATT for a person with characteristics X (a vector of observable variables):

$ATE(X) \equiv E[Y_1 - Y_0 \mid X] = E[\Delta \mid X]$
$ATT(X) \equiv E[Y_1 - Y_0 \mid P = 1, X] = E[\Delta \mid P = 1, X]$

If the individuals who take the program tend to be the ones that receive the greatest benefit from it, then we would expect ATT(X) > ATE(X).

11 In general, the difference between the mean of observed Y for program participants (P = 1 group) and the mean of observed Y for program non-participants (P = 0 group) will not give a consistent (unbiased) estimate of either ATE(X) or ATT(X).

To see how bias comes about, assume that for any person in the population the values of $Y_1$ (the value of Y if that person participates in the program) and $Y_0$ (the value of Y if that person does not participate in the program) can be expressed as simple linear functions of the X variables for that person, plus an error term:

$Y_1 = X\beta_1 + U_1$
$Y_0 = X\beta_0 + U_0$

where we assume that $E[U_1 \mid X] = E[U_0 \mid X] = 0$.

The observed value of Y can be written as $Y = PY_1 + (1 - P)Y_0$, where P equals 1 if the person participates in the program and equals 0 if he/she does not participate.

12 Note that this setup is quite general, and it allows the program impact to work through X and U. Note that:

$Y = PY_1 + (1 - P)Y_0$

Substituting our values for $Y_0$ and $Y_1$ above, we have:

$Y = P(X\beta_1 + U_1) + (1 - P)(X\beta_0 + U_0)$

Regrouping terms, we have:

$Y = X\beta_0 + P(X\beta_1 - X\beta_0) + \{U_0 + P(U_1 - U_0)\}$

What does this expression for Y have to do with ATE(X) and ATT(X)? In fact, it is easy to manipulate this expression to show the relationships. To begin, recall that $ATE(X) = E[Y_1 - Y_0 \mid X]$. Substituting the above expressions for $Y_1$ and $Y_0$ and rearranging terms, we have:

13 $ATE(X) = E[Y_1 - Y_0 \mid X]$
$= E[(X\beta_1 + U_1) - (X\beta_0 + U_0) \mid X]$
$= E[(X\beta_1 - X\beta_0) \mid X] + E[U_1 \mid X] - E[U_0 \mid X]$

Recalling that $E[U_1 \mid X] = E[U_0 \mid X] = 0$ (assumption made above), we have

$ATE(X) = X\beta_1 - X\beta_0$

This implies that the above expression for Y can be written as:

$Y = X\beta_0 + P \cdot ATE(X) + \{U_0 + P(U_1 - U_0)\}$

You can also show (by adding and subtracting $P \cdot E[U_1 - U_0 \mid X, P = 1]$) that:

$Y = X\beta_0 + P \cdot ATT(X) + \{U_0 + P(U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1])\}$
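The add-and-subtract step can be written out explicitly. The sketch below uses only the definitions already given, together with the fact that $ATT(X) = E[Y_1 - Y_0 \mid X, P = 1] = X\beta_1 - X\beta_0 + E[U_1 - U_0 \mid X, P = 1]$:

```latex
\begin{aligned}
Y &= X\beta_0 + P(X\beta_1 - X\beta_0) + U_0 + P(U_1 - U_0) \\
  &= X\beta_0 + P\bigl(X\beta_1 - X\beta_0 + E[U_1 - U_0 \mid X, P = 1]\bigr)
     + U_0 + P\bigl(U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1]\bigr) \\
  &= X\beta_0 + P \cdot ATT(X)
     + \bigl\{U_0 + P\bigl(U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1]\bigr)\bigr\}
\end{aligned}
```

The second line adds $P \cdot E[U_1 - U_0 \mid X, P = 1]$ inside the first bracket and subtracts it inside the second, so the value of Y is unchanged.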

14 These two expressions show us how bias can arise when trying to estimate ATE(X) and ATT(X).

To estimate ATE(X), the expression

$Y = X\beta_0 + P \cdot ATE(X) + \{U_0 + P(U_1 - U_0)\}$

suggests that we regress Y on X and P, and the coefficient on P will be ATE(X). However, this will yield unbiased and consistent estimates of ATE(X) only if the error term $\{U_0 + P(U_1 - U_0)\}$ is uncorrelated with X and P! In other words, we need to assume that:

$E[U_0 + P(U_1 - U_0) \mid X, P] = 0$

Similarly, to estimate ATT(X) the above expression

$Y = X\beta_0 + P \cdot ATT(X) + \{U_0 + P(U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1])\}$

suggests that the same regression yields an estimate of ATT(X) if the following holds:

$E[U_0 + P(U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1]) \mid X, P] = 0$

15 Note that $E[P(U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1]) \mid X, P] = 0$. This can be seen by considering the two possible values of P. If P = 0 then the expression equals 0. If P = 1 the expression becomes $E[(U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1]) \mid X, P = 1]$, which equals $E[U_1 - U_0 \mid X, P = 1] - E[U_1 - U_0 \mid X, P = 1] = 0$. So the only concern in estimating ATT(X) is whether $E[U_0 \mid X, P] = 0$.

Therefore, to estimate ATT(X), we need assumptions that imply $E[U_0 \mid X, P] = 0$. Similarly, estimating ATE(X) requires assumptions that imply $E[U_0 + P(U_1 - U_0) \mid X, P] = 0$.

Consider the following assumptions:

(A.1) Conditional on X, the program effect is the same for everyone ($U_1 = U_0$).
(A.2) Conditional on X, the program effect varies across individuals ($U_1 \neq U_0$), but $U_1 - U_0$ does not predict program participation.
(A.3) Conditional on X, the program effect varies across individuals and $U_1 - U_0$ does predict who participates in the program.

16 Note that Assumptions (A.1) and (A.2) imply that ATE(X) = ATT(X):

$ATT(X) \equiv E[Y_1 - Y_0 \mid X, P = 1]$
$= E[X\beta_1 - X\beta_0 + U_1 - U_0 \mid X, P = 1]$
$= X\beta_1 - X\beta_0 + E[U_1 - U_0 \mid X, P = 1]$

Under Assumption (A.1), the last term equals 0. It also equals 0 under Assumption (A.2) because that assumption implies that $E[U_1 - U_0 \mid X, P = 1] = E[U_1 - U_0 \mid X]$, which also equals zero. Thus $ATT(X) = X\beta_1 - X\beta_0$ under either Assumption (A.1) or Assumption (A.2), and $ATE(X) = X\beta_1 - X\beta_0$ as well because $E[U_1 - U_0 \mid X] = 0$.

Under assumptions (A.1) and (A.2), ATE = ATT and potential bias arises only if $E[U_0 \mid X, P] \neq 0$.

Under assumption (A.3), bias in estimating ATE can arise if either $E[U_0 \mid X, P] \neq 0$ or $E[U_1 - U_0 \mid X, P] \neq 0$ (see p.13).

17 III. Cross-Section Estimator

The cross-section estimator uses data on a group of nonparticipants to impute counterfactual outcomes for program participants. The data for both groups are collected during the same time period, after the program has started.

We now modify the notation to allow for a time subscript:

$Y_{1it}$ = value of Y for person i at time t if he/she participates in the program at time t
$Y_{0it}$ = value of Y for person i at time t if he/she does not participate in the program at time t

The data requirements of this estimator are minimal: it requires data only on participants ($P_{it} = 1$) and non-participants ($P_{it} = 0$) for some time period t after the participants started their involvement in the program.

18 The cross-section estimator can be defined as the OLS estimate of:

$Y_{it} = X_{it}\beta_0 + P_{it} ATT(X_{it}) + \varepsilon_{it}$

where $\varepsilon_{it} = U_{0it} + P_{it}(U_{1it} - U_{0it} - E[U_{1it} - U_{0it} \mid X_{it}, P_{it} = 1])$.

That is, $Y_{it}$ is regressed on $X_{it}$ and on $P_{it}$ interacted with $X_{it}$, and the coefficients on the interaction terms provide estimates of the average treatment effect on the treated (ATT) for people with characteristics $X_{it}$.

In practice, it is often assumed that treatment effects are the same across different $X_{it}$, so that $Y_{it}$ is regressed on $X_{it}$ and the indicator $P_{it}$, and the single coefficient on P is interpreted as the treatment effect. Recall that under assumptions (A.1) or (A.2), ATE(X) and ATT(X) are the same parameter.

Consistency of the cross-section regression estimator requires that the error term $\varepsilon_{it}$ not be correlated with either $X_{it}$ or $P_{it}$, i.e. that $E[\varepsilon_{it} \mid P_{it}, X_{it}] = 0$. This restriction is violated, and thus the cross-section regression estimator is biased and inconsistent, if people select into the program based on expectations about their own gain from the program (the situation described in (A.3)).

19 To see this, consider that unobservable characteristics such as motivation, intellectual ability, or other advantages are likely to be present and correlated both with participation in (or access to) the treatment and with the outcome variable, introducing bias into the estimates of the treatment effects.

Even though this strong assumption is likely to be violated, the cross-section estimator is commonly used because of its minimal data requirements. When the necessary data are available, the other three regression estimators (before-after, difference-in-differences, and within) are preferred, although each of them places additional requirements on the data.
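To illustrate the selection problem, the simulation below is a minimal sketch under invented assumptions: a constant program effect of 50 (as under (A.1)), no X variables, and an unobserved "ability" term that raises both the untreated outcome and the probability of enrolling. None of the numbers come from the workshop materials.

```python
import random
import statistics

random.seed(0)

TRUE_EFFECT = 50.0  # assumed constant program effect (kg/ha), as under (A.1)
N = 20000

treated, untreated = [], []
for _ in range(N):
    ability = random.gauss(0, 1)  # unobservable, part of U_0
    # More able farmers are more likely to enroll (selection on U_0).
    enrolls = ability + random.gauss(0, 1) > 0
    y0 = 1000 + 100 * ability + random.gauss(0, 50)  # outcome without program
    y = y0 + TRUE_EFFECT if enrolls else y0
    (treated if enrolls else untreated).append(y)

# With no X variables, OLS of Y on a constant and P is a difference in means.
cross_section_estimate = statistics.fmean(treated) - statistics.fmean(untreated)
print(f"true effect: {TRUE_EFFECT:.0f}, "
      f"cross-section estimate: {cross_section_estimate:.1f}")
```

Because $E[U_0 \mid P = 1] > E[U_0 \mid P = 0]$ in this setup, the cross-section estimate lands well above the true effect, no matter how large the sample is.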

20 IV. The Before-After Estimator

Suppose that we have panel data, that is, data collected from the same people for 2 or more time periods, and that we observe only program participants. For both of the potential outcomes ($Y_1$ and $Y_0$), assume the same linear model used above:

$Y_{1it} = X_{it}\beta_1 + U_{1it}$
$Y_{0it} = X_{it}\beta_0 + U_{0it}$

The $X_{it}$ variables may either be fixed (e.g. gender) or time-varying (e.g. age), but they are assumed to be unaffected by an individual's participation in the program. The error terms $U_{1it}$ and $U_{0it}$ are assumed to satisfy $E[U_{1it} \mid X_{it}] = E[U_{0it} \mid X_{it}] = 0$.

Suppose the intervention took place in period $t^*$. For $t < t^*$, none of the individuals had yet participated in the program, so we observe $Y_{0it}$ and $P_{it} = 0$. For $t > t^*$, we observe $Y_{1it}$ and $P_{it} = 1$.

21 Thus, the observed outcome at time t can be written as:

$Y_{it} = X_{it}\beta_0 + P_{it}\Delta(X_{it}) + U_{0it}$

where $P_{it}$ denotes having participated in the program and $\Delta(X_{it}) = X_{it}\beta_1 - X_{it}\beta_0 + U_{1it} - U_{0it}$ is the treatment impact for individual i (note that it is not an average treatment effect because it is for a single person).

The evaluation problem can be viewed as a missing data problem, because each person is observed in only one of two potential states (treated or untreated) at any point in time and the missing state needs to be imputed. The before-after estimator addresses the missing data problem by using pre-program data to impute the missing counterfactual outcome.

Let $t$ and $t'$ denote two time periods, one before and one after the program intervention. Suppose that we want to estimate the impact of the program on a person who participates between those two time periods.

22 In the notation of the panel data model, we can define the $ATT(X_{it'})$ parameter as:

$ATT(X_{it'}) = E[\Delta(X_{it'}) \mid P_{it'} = 1, P_{it} = 0, X_{it'}]$
$= E[X_{it'}\beta_1 + U_{1it'} - X_{it'}\beta_0 - U_{0it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}]$ (all evaluated at $t'$)

where the conditioning on $P_{it'} = 1$ and $P_{it} = 0$ indicates that the person was not in the program at time $t$ but did participate in the program by time $t'$.

The before-after estimator for $ATT(X_{it'})$ can be written as follows:

$Y_{it'} - Y_{it} = X_{it'}\beta_1 - X_{it}\beta_0 + U_{1it'} - U_{0it}$ (for participants only; the outcome equation at $t'$ minus the one at $t$)

We can derive how this may be estimated using OLS, as follows:

$= X_{it'}\beta_1 - X_{it}\beta_0 + E[\Delta(X_{it'}) \mid P_{it'} = 1, P_{it} = 0, X_{it'}] - E[\Delta(X_{it'}) \mid P_{it'} = 1, P_{it} = 0, X_{it'}] + U_{1it'} - U_{0it}$

$= X_{it'}\beta_1 - X_{it}\beta_0 + ATT(X_{it'}) - E[X_{it'}\beta_1 - X_{it'}\beta_0 + U_{1it'} - U_{0it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}] + U_{1it'} - U_{0it}$

$= X_{it'}\beta_1 - X_{it}\beta_0 - X_{it'}\beta_1 + X_{it'}\beta_0 + ATT(X_{it'}) - E[U_{1it'} - U_{0it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}] + U_{1it'} - U_{0it}$

$= (X_{it'} - X_{it})\beta_0 + ATT(X_{it'}) - E[U_{1it'} - U_{0it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}] + U_{1it'} - U_{0it'} + U_{0it'} - U_{0it}$

23 The last expression implies that one can use OLS to estimate the following:

$Y_{it'} - Y_{it} = (X_{it'} - X_{it})\beta_0 + ATT(X_{it'}) + \varepsilon_{it'}$

where $\varepsilon_{it'} = (U_{1it'} - U_{0it'} - E[U_{1it'} - U_{0it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}]) + U_{0it'} - U_{0it}$

Thus, the treatment impact can be obtained from a regression of the difference $Y_{it'} - Y_{it}$ on $(X_{it'} - X_{it})$ and also on $X_{it'}$ in levels (the latter because $X_{it'}$ enters $ATT(X_{it'}) = E[X_{it'}\beta_1 + U_{1it'} - X_{it'}\beta_0 - U_{0it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}]$). The coefficients on $X_{it'}$, along with the constant term, provide estimates of $ATT(X_{it'})$, controlling for any time-varying X variables. If the regressors X are not time-varying, then the regression simplifies to regressing $Y_{it'} - Y_{it}$ on $X_{it'}$.

Note, however, that this estimation strategy does not allow for estimation of time-specific intercepts that are unrelated to program participation. The $\beta_0$ have to be assumed to be non-time-varying, or else they cannot be separately identified from the treatment effect.

24 Consistent estimation of the $ATT(X_{it'})$ term requires $E[\varepsilon_{it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}] = 0$. In fact, the term in parentheses in the expression for $\varepsilon_{it'}$ has a conditional mean of 0 by construction:

$E[U_{1it'} - U_{0it'} - E[U_{1it'} - U_{0it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}] \mid P_{it'} = 1, P_{it} = 0, X_{it'}] = 0$

so the key assumption needed for the before-after estimator to be an unbiased and consistent estimator is the following:

$E[U_{0it'} - U_{0it} \mid P_{it'} = 1, P_{it} = 0, X_{it'}] = 0$

A special case where this assumption is satisfied is when $U_{0it}$ can be decomposed into a fixed effect error structure:

$U_{0it} = f_i + v_{it}$

where $f_i$ is fixed over time and $v_{it}$ satisfies $E[v_{it'} - v_{it} \mid P_{it'} = 1, P_{it} = 0, X_{it'}] = 0$.

25 Intuitively, this assumption allows selection into the program to be based on unobservable characteristics that are time-invariant (called $f_i$ here), which could be correlated with $P_{it'}$ but are then differenced out of the expression $U_{0it'} - U_{0it}$. Thus a before-after estimation strategy allows for person-specific permanent unobservables that affect the program participation decision.

The regression as described above has one pre-program and one post-program observation for each person, and the model is estimated only for people who eventually participate in the program. If there are more than two periods of data available, the model can also be estimated as a standard fixed effects regression (taking deviations from means), making use of all the data available.

26 V. Difference-in-Differences (DID) Estimators

The difference-in-differences (DID) estimator measures the impact of the program intervention by the difference in the before-after change in outcomes between participants and nonparticipants.

To see how it works, recall that $t$ is a time period before the program started and $t'$ is some time period after it started. Define a (time-invariant) indicator variable, denoted by $I_i$, that equals 1 for participants (those for whom $P_{it} = 0$ and $P_{it'} = 1$) and 0 for non-participants (for whom $P_{it} = P_{it'} = 0$).

The DID estimator is the OLS estimate of $ATT(X_{it'})$ in the following regression equation:

$Y_{it'} - Y_{it} = X_{it'}\beta_0 - X_{it}\beta_0 + I_i ATT(X_{it'}) + \varepsilon_{it'}$

where $\varepsilon_{it'} = P_{it'}(U_{1it'} - U_{0it'} - E[U_{1it'} - U_{0it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}]) + U_{0it'} - U_{0it}$

27 Note that this regression equation is identical to that for the before-after estimator, except that now it is estimated using both participant and nonparticipant observations.

The DID estimator addresses an important shortcoming of the before-after estimator in that it allows for time-specific intercepts that are common across groups (which can be included in $X_{it}\beta_0$). These time effects are identified separately from the treatment effects because of the inclusion of the nonparticipant observations (recall that with the before-after estimator, the constant term was attributed to the treatment effect, which is not the case here).

The DID estimator is unbiased and consistent if $E[\varepsilon_{it'} \mid P_{it'}, X_{it'}] = 0$, which would be satisfied under a fixed effect error structure.

With more than two time periods, the DID estimator can be implemented using a panel data fixed effects regression.
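The contrast between the before-after and DID estimators can be seen in a small simulation. This is a sketch under invented assumptions: no X variables, a constant program effect of 50, a common time effect of 80, and selection into the program on a time-invariant unobservable $f_i$. With no X variables, both regressions reduce to means of the differenced outcome.

```python
import random
import statistics

random.seed(1)

TRUE_EFFECT = 50.0   # assumed constant program effect
TIME_EFFECT = 80.0   # common shock between periods (e.g. rainfall returns to normal)
N = 20000

change_participants, change_nonparticipants = [], []
for _ in range(N):
    f_i = random.gauss(0, 1)                      # time-invariant unobservable
    participates = f_i + random.gauss(0, 1) > 0   # selection on f_i
    y_before = 1000 + 100 * f_i + random.gauss(0, 30)
    y_after = (1000 + 100 * f_i + TIME_EFFECT
               + (TRUE_EFFECT if participates else 0.0) + random.gauss(0, 30))
    change = y_after - y_before                   # f_i is differenced out
    (change_participants if participates else change_nonparticipants).append(change)

# Before-after uses participants only, so the time effect is attributed to the program.
before_after = statistics.fmean(change_participants)
# DID subtracts the nonparticipants' change, which removes the common time effect.
did = before_after - statistics.fmean(change_nonparticipants)
print(f"before-after: {before_after:.1f}  DID: {did:.1f}  true: {TRUE_EFFECT:.0f}")
```

Differencing removes $f_i$ for both estimators, but only DID separates the common time effect from the treatment effect, which is the point made about time-specific intercepts.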

28 The data required to implement the DID estimator can be either panel data or repeated cross-section data on both participants and nonparticipants. If it is implemented using repeated cross-section data, stronger assumptions are needed on the error term.

There are also ways of specifying the DID estimator as a levels equation rather than a differenced equation. For example, it can be estimated using the regression:

$Y_{it} = X_{it}\beta_0 + \tau_t + f_i + P_{it} ATT(X_{it}) + \varepsilon_{it}$ for each period from $t$ to $t'$

where $\varepsilon_{it} = U_{0it} + P_{it}(U_{1it} - U_{0it} - E[U_{1it} - U_{0it} \mid P_{it} = 1, X_{it}])$.

In this equation, $\tau_t$ indicates a time-specific intercept and $f_i$ is an individual-level fixed effect (an indicator variable for each individual). Alternatively, the model could be estimated in deviation-from-mean form, in which case the fixed effect term would not need to be included since it will be differenced out.

29 If repeated cross-section data are available rather than longitudinal data, then it is not possible to estimate fixed effects. In that case, we need to impose a stronger assumption on the error term, namely that $E[\varepsilon_{it} \mid P_{it}, X_{it}] = 0$, which requires that $E[U_{0it} \mid P_{it}, X_{it}] = 0$. This means that people cannot select into the program based on their $U_{0it}$ values. This is in contrast to panel data, in which time-invariant unobservables are differenced out.

The main advantage of longitudinal (before-after or difference-in-differences) estimators over cross-sectional methods is that they allow for unobservable determinants of program participation decisions that are correlated with outcomes.

However, the fixed effects error structure that is imposed to justify application of these estimators requires that the unobservables correlated with program participation be time-invariant; it does not allow for unobservables that both vary over time and are correlated with participation. For example, we might expect there to be correlated unobserved earnings shocks that make people more likely to participate in a social program (such as a public works program) and that would not be captured by a fixed effects error structure.

30 VI. Extension: Within Estimators (one-way fixed effects)

Within estimators identify program impacts from changes in outcomes within some unit, such as within a family, a school or a village. The before-after and difference-in-differences estimators can also be viewed as within estimators, where the variation exploited is the change over time within a given individual. This section describes other kinds of within estimators.

Let $Y_{0ijt}$ and $Y_{1ijt}$ denote the outcomes for individual i, who is a member of unit j and is observed at time t. For simplicity, at first assume that $U_{1ijt} = U_{0ijt}$. Assume a linear model for these two outcomes:

$Y_{ijt} = X_{ijt}\beta_0 + P_{ijt} ATT(X_{ijt}) + \varepsilon_{ijt}$

Assume that the error term $\varepsilon_{ijt}$ ($= U_{0ijt} = U_{1ijt}$) can be decomposed as:

$\varepsilon_{ijt} = \theta_j + v_{ijt}$

31 where $\theta_j$ represents the unobservables that are assumed to be fixed for individuals within the same unit, and the $v_{ijt}$'s are independent and identically distributed (i.i.d.).

Taking differences between two individuals, denoted by $i$ and $i'$, from the same unit j observed in the same time period t gives:

$Y_{ijt} - Y_{i'jt} = (X_{ijt} - X_{i'jt})\beta_0 + (P_{ijt} - P_{i'jt}) ATT(X_{ijt}) + (v_{ijt} - v_{i'jt})$

To estimate $ATT(X_{ijt})$, regress $Y_{ijt} - Y_{i'jt}$ on $X_{ijt} - X_{i'jt}$ and on interaction terms between $P_{ijt} - P_{i'jt}$ and $X_{ijt}$. Consistency and unbiasedness of the OLS estimator of $ATT(X_{ijt})$ requires that:

$E[v_{ijt} - v_{i'jt} \mid X_{ijt}, X_{i'jt}, P_{ijt}, P_{i'jt}] = 0$

This assumption implies that, within a particular unit, the individual who gets the treatment is selected without any influence of the error term $v_{ijt}$.
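The simulation below sketches why removing $\theta_j$ matters. It is a sketch under invented assumptions: one cross-section, no X variables, a constant effect of 5, and a program targeted at units with low $\theta_j$ but assigned randomly within each unit. Instead of pairwise differences it subtracts unit means, which removes $\theta_j$ in the same way.

```python
import random
import statistics

random.seed(2)

TRUE_EFFECT = 5.0
N_UNITS, UNIT_SIZE = 500, 10

rows = []  # (unit id, P, Y)
for j in range(N_UNITS):
    theta_j = random.gauss(0, 2)              # unobservable shared within unit j
    p_treat = 0.8 if theta_j < 0 else 0.2     # program targets low-theta units
    for _ in range(UNIT_SIZE):
        p = 1 if random.random() < p_treat else 0  # random within the unit
        y = 10 + theta_j + TRUE_EFFECT * p + random.gauss(0, 1)
        rows.append((j, p, y))

# Pooled cross-section: difference in means, which ignores theta_j.
pooled = (statistics.fmean(y for _, p, y in rows if p == 1)
          - statistics.fmean(y for _, p, y in rows if p == 0))

# Within estimator: subtract unit means, then regress demeaned Y on demeaned P.
by_unit = {}
for j, p, y in rows:
    by_unit.setdefault(j, []).append((p, y))
num = den = 0.0
for members in by_unit.values():
    p_bar = statistics.fmean(p for p, _ in members)
    y_bar = statistics.fmean(y for _, y in members)
    for p, y in members:
        num += (p - p_bar) * (y - y_bar)
        den += (p - p_bar) ** 2
within = num / den
print(f"pooled: {pooled:.2f}  within: {within:.2f}  true: {TRUE_EFFECT}")
```

Units in which everyone (or no one) is treated contribute nothing to the within estimate, since their demeaned P is zero; identification comes only from units with both treated and untreated members.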

32 Comments on the Within Estimator

1. Because it relies on comparing the outcomes of treated and untreated persons, the approach implicitly assumes that there are no spillover effects from treating one individual onto other individuals within the same unit.
2. In the more general version of the model, where $U_{1ijt} \neq U_{0ijt}$, one must also assume that the individual in the unit that receives the treatment is selected without any influence of that individual's idiosyncratic gain from the program. That is, the program may be targeted at specific units (e.g. families or villages), but within those units, the selection of participants into the program should be unrelated to their idiosyncratic gains from the program (unrelated to $U_{1ijt} - U_{0ijt}$).
3. As with the before-after and difference-in-differences estimation approaches, the within estimator just described allows treatment to be selective across units. That is, it allows $E[\varepsilon_{ijt} \mid P_{ijt}, X_{ijt}] \neq 0$, because treatment selection can be based on the unobserved heterogeneity term $\theta_j$ (heterogeneity shared among individuals within a unit).
4. When the variation being exploited for identification of the treatment effect is variation within a family, village, or school at a single point in time, then the within estimator can be implemented with a single cross-section of data.

33 Discuss:

- What are some sources of heterogeneity that might be shared by all individuals in a community?
- What are some that might vary within a community?
- What are some advantages of using the within estimator?

34 VI. Extension: Two-Way Fixed Effects and More

$Y_{it} = X_{it}\beta_0 + \tau_t + f_i + P_{it} ATT(X_{it}) + \varepsilon_{it}$ for each period from $t$ to $t'$

(with time-specific intercept $\tau_t$ and individual fixed effect $f_i$)

- What if P is correlated with time-variant unobservables?
- What if the program enrollment expands over time?
- What if the treatment effect varies over time?

The following case study illustrates some possible extensions when long panel datasets are available.

35 Case Study: Does Microfinance Reduce Rural Poverty? (Berhane and Gardebroek 2011, AJAE)

Background: The Dedebit Credit and Saving Institution (DECSI) in northern Ethiopia provides financial services for production purposes. It officially launched credit and saving programs in 1997 and expanded quickly into almost all villages in Tigray. By 2000, it was providing loans to 210,000 borrowers with 1.4 million credit transactions, amounting to 447 million Ethiopian birr (ETB) in total outstanding loans and ETB 74 million in total savings. As of 2002, its network of 9 branches and 96 subbranches, with headquarters in the capital city of the regional state, covered more than 91% of the villages in the region and extended loans to about half a million borrowers.

To study the impact of microfinance on poverty reduction, a four-round survey with three-year intervals (1997-2006) was administered to 400 randomly selected rural borrowers and nonborrowers. The dataset covers household and village level information ranging from household characteristics, consumption, assets, credit, and savings to village infrastructure, markets, and credit contracts.

36 This analysis is based on a balanced panel of 351 households, of which 211 borrowed and 140 did not borrow in the 1997 survey.


38 Empirical Method: Consider first the following model for impact evaluation:

(1) $C_{it} = X_{it}\beta + \mathrm{prog}_{it}\gamma + M_i\alpha + u_{it}$, t = 1, 2, ..., T; i = 1, 2, ..., N

where the outcome variable $C_{it}$, per capita consumption for household i at time t, is determined by a vector of observable household, village, and MFI level characteristics $X_{it}$, a program participation variable $\mathrm{prog}_{it}$, and a vector $M_i$ of time-invariant unobservable variables.

The program participation variable is usually defined as a dummy variable. However, given the nature of the data, the authors define $\mathrm{prog}_{it}$ as the number of years the household has been in a borrowing relationship, in order to account for the degree or intensity of participation.

Panel data models that allow program participation decisions to be correlated with unobservables affecting outcome variables reduce the bias that such correlation would otherwise cause. Three such models were used in the study: the standard fixed effects model, the random trend model, and a flexible random trend model.

39 The standard fixed effects estimator of equation (1) provides a consistent estimate of the borrowing impact, γ, under the assumption that all unobservables that influence the outcome of interest are time-invariant, so that they can be removed by a within or first-difference transformation.

However, if such individual-specific unobservables change over time, the estimate for γ is still biased. There are two potential reasons for such effects. First, unobserved negative economic shocks affecting households' input endowments may pressure households into input-bridging borrowing or repeat borrowing to settle earlier debts. Second, credit may have lasting effects on the unobservables on which selection is based. For example, unobserved household characteristics such as entrepreneurial abilities, which may condition credit demand, may change over time depending on previous exposure to microfinance credit.

The individual-specific linear trend model allows both household-specific time-invariant unobservables and individual trends of time-varying unobservables to correlate linearly with program participation. This model remedies bias from time-invariant factors and linear trends in time-varying factors, but not from any remaining nonlinear factors.

40 (2) $C_{it} = X_{it}\beta + \mathrm{prog}_{it}\gamma + M_i\alpha + g_i t + u_{it}$

where $g_i$ is an individual trend parameter which, in addition to the level effects $M_i$, captures individual-specific growth rates over time. A consistent estimate for γ, the treatment effect of an additional year of borrowing, can be obtained by eliminating the linear trend in time-varying unobservables as well as the time-invariant unobservables that can potentially bias γ.

Equation (2) is first-differenced to eliminate $M_i$, which gives a standard fixed effects model:

(3) $C^*_{it} = X^*_{it}\beta + \mathrm{prog}^*_{it}\gamma + g^*_i + u^*_{it}$; t = 1, 2, ..., T

where $C^*_{it} = C_{it} - C_{i,t-1}$, $X^*_{it} = X_{it} - X_{i,t-1}$, $\mathrm{prog}^*_{it} = \mathrm{prog}_{it} - \mathrm{prog}_{i,t-1}$, $u^*_{it} = u_{it} - u_{i,t-1}$ and $g^*_i = g_i t - g_i (t - 1) = g_i$.

Equation (3) can then be estimated consistently using a standard fixed effects approach. Alternatively, one can second-difference equation (3) to eliminate $g^*_i$ and estimate by pooled OLS. Note that γ can be estimated consistently from this specification only if T > 3.
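The difference between the standard fixed effects estimator and the random trend estimator can be sketched in a small simulation. Everything here is an invented illustration, not the authors' code or data: there are no X variables or year dummies, γ is set to 2, and households with faster underlying growth $g_i$ start borrowing earlier. The random trend estimate is obtained by first-differencing and then applying the within (demeaning) transformation to the differenced data.

```python
import random
import statistics

random.seed(3)

GAMMA = 2.0          # assumed effect of one more period of borrowing
N, T = 2000, 4       # households, survey rounds t = 0..3


def demeaned_slope(panel):
    """Pooled OLS slope of y on x after subtracting household means."""
    num = den = 0.0
    for xs, ys in panel:
        x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
        num += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        den += sum((x - x_bar) ** 2 for x in xs)
    return num / den


levels, first_diffs = [], []
for _ in range(N):
    m_i = random.gauss(0, 1)   # time-invariant unobservable (level effect)
    g_i = random.gauss(0, 1)   # household-specific trend
    latent = g_i + random.gauss(0, 1)
    # Faster-growing households start borrowing earlier; start == T means never.
    start = 1 if latent > 1 else 2 if latent > 0 else 3 if latent > -1 else T
    prog = [max(0, t - start + 1) for t in range(T)]  # rounds borrowed so far
    cons = [GAMMA * prog[t] + m_i + g_i * t + random.gauss(0, 1) for t in range(T)]
    levels.append((prog, cons))
    d_prog = [prog[t] - prog[t - 1] for t in range(1, T)]
    d_cons = [cons[t] - cons[t - 1] for t in range(1, T)]
    first_diffs.append((d_prog, d_cons))

fe = demeaned_slope(levels)                 # removes M_i only
random_trend = demeaned_slope(first_diffs)  # removes M_i, then g_i
print(f"standard FE: {fe:.2f}  random trend: {random_trend:.2f}  true: {GAMMA}")
```

In this setup the standard fixed effects estimate is pushed upward because $g_i t$ remains in the error and is correlated with accumulated borrowing, while the two-step transformation removes both $M_i$ and $g_i$.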

41 An advantage of long panel data sets is that they enable one to estimate the impact of long-term rather than one-shot program participation. In addition to shifting the levels in each borrowing year, repeated participation may affect the rate of change of the outcome variables relative to nonparticipation. This can be accounted for by including $\mathrm{dumprog}_{it} \cdot t$ in equation (2):

(4) $C_{it} = X_{it}\beta + \gamma_1 \mathrm{prog}_{it} + \gamma_2 \mathrm{dumprog}_{it} \cdot t + M_i\alpha + g_i t + u_{it}$

where $\mathrm{dumprog}_{it}$ is a dummy equal to 1 if individual i participated in credit at time t. This specification provides impact estimates that are robust to random periodical changes by allowing the individual-specific trend to vary with participation over time. Estimation follows the same procedures as for equation (2).

42 A more flexible specification allows program indicators to reflect the frequency of participation in each year. This is done by replacing $\mathrm{prog}_{it}$ and $\mathrm{dumprog}_{it} \cdot t$ in equation (4) with a series of program indicators for each loan cycle for which the participant has been in the program:

(5) $C_{it} = X_{it}\beta + \gamma_1 \mathrm{prog1}_{it} + \ldots + \gamma_k \mathrm{progk}_{it} + g_i t + M_i\alpha + u_{it}$

where $\mathrm{progj}_{it} = 1$ if household i has been in the program for exactly j years in year t and zero otherwise; k is the maximum number of (observed) years a household can be in the program.

Program indicators attach more weight to differences between households' degree of participation regardless of year of participation. More weight is also given to the timing of participation within each indicator. Estimation follows the same procedures as for equations (2) and (4).
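To make the two participation measures concrete, the sketch below builds the cumulative intensity variable and the exact-duration indicators from a household's borrowing history. The function name, the input format, and the use of survey rounds as the unit of duration are assumptions made for illustration; the study's actual variable construction may differ.

```python
def program_indicators(borrowed_by_round, k):
    """Return, for each survey round, (prog, [prog1, ..., progk]).

    borrowed_by_round: list of booleans, True if the household borrowed in that round.
    prog: cumulative number of rounds borrowed so far (intensity of participation).
    progj: 1 if the household has borrowed in exactly j rounds so far, else 0.
    """
    out = []
    cumulative = 0
    for borrowed in borrowed_by_round:
        cumulative += int(borrowed)
        dummies = [1 if cumulative == j else 0 for j in range(1, k + 1)]
        out.append((cumulative, dummies))
    return out


# Hypothetical household that borrows in rounds 2, 3 and 4 of a four-round panel.
history = [False, True, True, True]
for round_no, (prog, dummies) in enumerate(program_indicators(history, k=4), start=1):
    print(round_no, prog, dummies)
```

With a single coefficient on the cumulative variable, each extra period of borrowing is restricted to have the same effect; the separate indicators let the effect of the first, second, and later periods differ.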

43 Estimation results [coefficient values are not preserved in this transcript; only significance stars remain]

Columns:
(a) Standard FE (Eqn. 1)
(b) Individual Trend (Eqn. 2)
(c) Indiv. Trend + Trend Based on Participation (Eqn. 4)
(d) Flexible Random Trend Model (Eqn. 5)

Rows, with the significance stars that survive:
No. of years borrowed: ***, **, **
One year borrowing: **
Two years borrowing: **
Three years borrowing
Four years borrowing: **
Random trend*borrowing: **
Year 2006 dummy: ***, ***, ***, ***
Age of HH head
Age²
Cultivated land size (in Tsimad = 0.25 hectare)
Land size²
Intercept
Within R²
N

44 VII. Extension: Difference-in-Differences Matching

Matching estimators assume that outcomes are independent of program participation after conditioning on observables. However, for a variety of reasons, there may be systematic differences between participant and nonparticipant outcomes, even after conditioning on observables. Such differences may arise, for example, because of program selectivity on unmeasured characteristics (such as motivation) or because of systematic differences in the level of outcomes across the different communities in which the participants and nonparticipants reside.

A difference-in-differences (DID) matching strategy, as defined in Heckman, Ichimura and Todd (1997, 1998), allows program participation to be based on unobservables as long as the unobservables do not vary over time. This approach is analogous to the standard difference-in-differences regression estimator, but it reweights the participant and nonparticipant observations according to the weighting functions implied by matching estimators.

45 To see how this works, we start with the following independence assumption:

$(\Delta Y_0, \Delta Y_1) \perp P \mid Z$

where $\Delta Y_0 = Y_{0t} - Y_{0t'}$, $\Delta Y_1 = Y_{1t} - Y_{1t'}$, $t'$ and $t$ are time periods before and after the program enrollment date, respectively, and $\perp$ indicates statistical independence. This is a key assumption of the DID matching approach. Intuitively, it means that $P$ does not help predict the change $\Delta Y_0$ (i.e. $Y_{0t} - Y_{0t'}$) conditional on $Z$. Thus, individuals cannot select into the program based on anticipated changes in $Y_0$.

This estimator also requires the support condition:

$0 < \Pr[P = 1 \mid Z] < 1$

If interest centers on the ATT(X) parameter, then the matching independence assumption needs to be made only for $\Delta Y_0$.

46 As with cross-sectional matching, nonparametric weighting can be used to construct matches. The local linear DID estimator is given by:

$ATT_{KDM} = \frac{1}{n_1} \sum_{i \in I_1 \cap S_P} \Bigl\{ (Y_{1ti} - Y_{1t'i}) - \sum_{j \in I_0 \cap S_P} W_{ij}\,(Y_{0tj} - Y_{0t'j}) \Bigr\}$

where the weights correspond to the local linear weights defined in Session T9. If repeated cross-section data are available, instead of longitudinal data, the estimator can be implemented as:

$ATT_{KDM} = \frac{1}{n_{1t}} \sum_{i \in I_{1t} \cap S_P} \Bigl\{ Y_{1ti} - \sum_{j \in I_{0t} \cap S_P} W_{ij}\,Y_{0tj} \Bigr\} - \frac{1}{n_{1t'}} \sum_{i \in I_{1t'} \cap S_P} \Bigl\{ Y_{1t'i} - \sum_{j \in I_{0t'} \cap S_P} W_{ij}\,Y_{0t'j} \Bigr\}$

where $I_{1t}$, $I_{1t'}$, $I_{0t}$, $I_{0t'}$ denote the treatment and comparison group datasets in each time period.
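The longitudinal version of the estimator can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the estimator from the slides: it uses normalized Gaussian kernel weights on an estimated propensity score in place of the local linear weights from Session T9, it takes the propensity scores as given, and it does not impose the common support restriction. The function name `did_matching_att`, the bandwidth value, and the example numbers are all hypothetical.

```python
import numpy as np

def did_matching_att(dy_treated, p_treated, dy_control, p_control, bandwidth=0.1):
    """Kernel-weighted DID matching estimate of the ATT.

    dy_treated, dy_control: before-to-after outcome changes for each
        participant i and each nonparticipant j.
    p_treated, p_control: propensity scores, assumed already estimated.
    """
    dy_treated = np.asarray(dy_treated, dtype=float)
    dy_control = np.asarray(dy_control, dtype=float)
    p_treated = np.asarray(p_treated, dtype=float)
    p_control = np.asarray(p_control, dtype=float)

    # Gaussian kernel in the propensity-score distance between i and j.
    # (Assumption: simple kernel weights, not local linear weights.)
    dist = (p_treated[:, None] - p_control[None, :]) / bandwidth
    kernel = np.exp(-0.5 * dist ** 2)

    # Normalize so the weights W_ij sum to one over j for each participant i.
    weights = kernel / kernel.sum(axis=1, keepdims=True)

    # Matched counterfactual change for each participant.
    matched_change = weights @ dy_control

    # Average of (own change - matched comparison change) over participants.
    return float(np.mean(dy_treated - matched_change))

# Hypothetical data: every comparison unit changes by 1, every participant by 3.
att = did_matching_att(
    dy_treated=[3.0, 3.0], p_treated=[0.6, 0.7],
    dy_control=[1.0, 1.0, 1.0], p_control=[0.5, 0.6, 0.8],
)
print(att)
```

In this toy example the estimate is close to 2, the gap between the two groups' changes, regardless of how the weights are distributed. With real data the bandwidth choice and the common support restriction both matter and would need to be handled explicitly.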

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data Panel data Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data - possible to control for some unobserved heterogeneity - possible

More information

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Kosuke Imai Department of Politics Princeton University November 13, 2013 So far, we have essentially assumed

More information

Causality and Experiments

Causality and Experiments Causality and Experiments Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania April 13, 2009 Michael R. Roberts Causality and Experiments 1/15 Motivation Introduction

More information

Controlling for Time Invariant Heterogeneity

Controlling for Time Invariant Heterogeneity Controlling for Time Invariant Heterogeneity Yona Rubinstein July 2016 Yona Rubinstein (LSE) Controlling for Time Invariant Heterogeneity 07/16 1 / 19 Observables and Unobservables Confounding Factors

More information

An example to start off with

An example to start off with Impact Evaluation Technical Track Session IV Instrumental Variables Christel Vermeersch Human Development Human Network Development Network Middle East and North Africa Region World Bank Institute Spanish

More information

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015 Introduction to causal identification Nidhiya Menon IGC Summer School, New Delhi, July 2015 Outline 1. Micro-empirical methods 2. Rubin causal model 3. More on Instrumental Variables (IV) Estimating causal

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information

Chapter 60 Evaluating Social Programs with Endogenous Program Placement and Selection of the Treated

Chapter 60 Evaluating Social Programs with Endogenous Program Placement and Selection of the Treated See discussions, stats, and author profiles for this publication at: http://www.researchgate.net/publication/222400893 Chapter 60 Evaluating Social Programs with Endogenous Program Placement and Selection

More information

Empirical approaches in public economics

Empirical approaches in public economics Empirical approaches in public economics ECON4624 Empirical Public Economics Fall 2016 Gaute Torsvik Outline for today The canonical problem Basic concepts of causal inference Randomized experiments Non-experimental

More information

Applied Quantitative Methods II

Applied Quantitative Methods II Applied Quantitative Methods II Lecture 10: Panel Data Klára Kaĺıšková Klára Kaĺıšková AQM II - Lecture 10 VŠE, SS 2016/17 1 / 38 Outline 1 Introduction 2 Pooled OLS 3 First differences 4 Fixed effects

More information

Principles Underlying Evaluation Estimators

Principles Underlying Evaluation Estimators The Principles Underlying Evaluation Estimators James J. University of Chicago Econ 350, Winter 2019 The Basic Principles Underlying the Identification of the Main Econometric Evaluation Estimators Two

More information

Evaluating Social Programs with Endogenous Program Placement and Selection of the Treated 1

Evaluating Social Programs with Endogenous Program Placement and Selection of the Treated 1 Evaluating Social Programs with Endogenous Program Placement and Selection of the Treated 1 Petra E. Todd University of Pennsylvania March 19, 2006 1 This chapter is under preparation for the Handbook

More information

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. The Basic Methodology 2. How Should We View Uncertainty in DD Settings?

More information

Missing dependent variables in panel data models

Missing dependent variables in panel data models Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units

More information

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid Applied Economics Panel Data Department of Economics Universidad Carlos III de Madrid See also Wooldridge (chapter 13), and Stock and Watson (chapter 10) 1 / 38 Panel Data vs Repeated Cross-sections In

More information

Difference-in-Differences Methods

Difference-in-Differences Methods Difference-in-Differences Methods Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 1 Introduction: A Motivating Example 2 Identification 3 Estimation and Inference 4 Diagnostics

More information

Impact Evaluation Technical Workshop:

Impact Evaluation Technical Workshop: Impact Evaluation Technical Workshop: Asian Development Bank Sept 1 3, 2014 Manila, Philippines Session 19(b) Quantile Treatment Effects I. Quantile Treatment Effects Most of the evaluation literature

More information

Beyond the Target Customer: Social Effects of CRM Campaigns

Beyond the Target Customer: Social Effects of CRM Campaigns Beyond the Target Customer: Social Effects of CRM Campaigns Eva Ascarza, Peter Ebbes, Oded Netzer, Matthew Danielson Link to article: http://journals.ama.org/doi/abs/10.1509/jmr.15.0442 WEB APPENDICES

More information

Development. ECON 8830 Anant Nyshadham

Development. ECON 8830 Anant Nyshadham Development ECON 8830 Anant Nyshadham Projections & Regressions Linear Projections If we have many potentially related (jointly distributed) variables Outcome of interest Y Explanatory variable of interest

More information

1 Impact Evaluation: Randomized Controlled Trial (RCT)

1 Impact Evaluation: Randomized Controlled Trial (RCT) Introductory Applied Econometrics EEP/IAS 118 Fall 2013 Daley Kutzman Section #12 11-20-13 Warm-Up Consider the two panel data regressions below, where i indexes individuals and t indexes time in months:

More information

Quantitative Economics for the Evaluation of the European Policy

Quantitative Economics for the Evaluation of the European Policy Quantitative Economics for the Evaluation of the European Policy Dipartimento di Economia e Management Irene Brunetti Davide Fiaschi Angela Parenti 1 25th of September, 2017 1 ireneb@ec.unipi.it, davide.fiaschi@unipi.it,

More information

Econometrics I. by Kefyalew Endale (AAU)

Econometrics I. by Kefyalew Endale (AAU) Econometrics I By Kefyalew Endale, Assistant Professor, Department of Economics, Addis Ababa University Email: ekefyalew@gmail.com October 2016 Main reference-wooldrigde (2004). Introductory Econometrics,

More information

Econometric Causality

Econometric Causality Econometric (2008) International Statistical Review, 76(1):1-27 James J. Heckman Spencer/INET Conference University of Chicago Econometric The econometric approach to causality develops explicit models

More information

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Matthew Harding and Carlos Lamarche January 12, 2011 Abstract We propose a method for estimating

More information

Lecture 9. Matthew Osborne

Lecture 9. Matthew Osborne Lecture 9 Matthew Osborne 22 September 2006 Potential Outcome Model Try to replicate experimental data. Social Experiment: controlled experiment. Caveat: usually very expensive. Natural Experiment: observe

More information

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9. Section 7 Model Assessment This section is based on Stock and Watson s Chapter 9. Internal vs. external validity Internal validity refers to whether the analysis is valid for the population and sample

More information

Difference-in-Differences Estimation

Difference-in-Differences Estimation Difference-in-Differences Estimation Jeff Wooldridge Michigan State University Programme Evaluation for Policy Analysis Institute for Fiscal Studies June 2012 1. The Basic Methodology 2. How Should We

More information

Instrumental Variables

Instrumental Variables Instrumental Variables Yona Rubinstein July 2016 Yona Rubinstein (LSE) Instrumental Variables 07/16 1 / 31 The Limitation of Panel Data So far we learned how to account for selection on time invariant

More information

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63 1 / 63 Panel Data Models Chapter 5 Financial Econometrics Michael Hauser WS17/18 2 / 63 Content Data structures: Times series, cross sectional, panel data, pooled data Static linear panel data models:

More information

PSC 504: Differences-in-differeces estimators

PSC 504: Differences-in-differeces estimators PSC 504: Differences-in-differeces estimators Matthew Blackwell 3/22/2013 Basic differences-in-differences model Setup e basic idea behind a differences-in-differences model (shorthand: diff-in-diff, DID,

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

Ability Bias, Errors in Variables and Sibling Methods. James J. Heckman University of Chicago Econ 312 This draft, May 26, 2006

Ability Bias, Errors in Variables and Sibling Methods. James J. Heckman University of Chicago Econ 312 This draft, May 26, 2006 Ability Bias, Errors in Variables and Sibling Methods James J. Heckman University of Chicago Econ 312 This draft, May 26, 2006 1 1 Ability Bias Consider the model: log = 0 + 1 + where =income, = schooling,

More information

Chapter 6 Stochastic Regressors

Chapter 6 Stochastic Regressors Chapter 6 Stochastic Regressors 6. Stochastic regressors in non-longitudinal settings 6.2 Stochastic regressors in longitudinal settings 6.3 Longitudinal data models with heterogeneity terms and sequentially

More information

Selection on Observables: Propensity Score Matching.

Selection on Observables: Propensity Score Matching. Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017

More information

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL INTRODUCTION TO BASIC LINEAR REGRESSION MODEL 13 September 2011 Yogyakarta, Indonesia Cosimo Beverelli (World Trade Organization) 1 LINEAR REGRESSION MODEL In general, regression models estimate the effect

More information

Lecture 9: Panel Data Model (Chapter 14, Wooldridge Textbook)

Lecture 9: Panel Data Model (Chapter 14, Wooldridge Textbook) Lecture 9: Panel Data Model (Chapter 14, Wooldridge Textbook) 1 2 Panel Data Panel data is obtained by observing the same person, firm, county, etc over several periods. Unlike the pooled cross sections,

More information

Analysis of Panel Data: Introduction and Causal Inference with Panel Data

Analysis of Panel Data: Introduction and Causal Inference with Panel Data Analysis of Panel Data: Introduction and Causal Inference with Panel Data Session 1: 15 June 2015 Steven Finkel, PhD Daniel Wallace Professor of Political Science University of Pittsburgh USA Course presents

More information

Dealing With Endogeneity

Dealing With Endogeneity Dealing With Endogeneity Junhui Qian December 22, 2014 Outline Introduction Instrumental Variable Instrumental Variable Estimation Two-Stage Least Square Estimation Panel Data Endogeneity in Econometrics

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL

IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL BRENDAN KLINE AND ELIE TAMER Abstract. Randomized trials (RTs) are used to learn about treatment effects. This paper

More information

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28 Why IV estimation? So far, in OLS, we assumed independence.

More information

Panel data methods for policy analysis

Panel data methods for policy analysis IAPRI Quantitative Analysis Capacity Building Series Panel data methods for policy analysis Part I: Linear panel data models Outline 1. Independently pooled cross sectional data vs. panel/longitudinal

More information

Econometrics. Week 6. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 6. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 6 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 21 Recommended Reading For the today Advanced Panel Data Methods. Chapter 14 (pp.

More information

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017 Econometrics with Observational Data Introduction and Identification Todd Wagner February 1, 2017 Goals for Course To enable researchers to conduct careful quantitative analyses with existing VA (and non-va)

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 2 Jakub Mućk Econometrics of Panel Data Meeting # 2 1 / 26 Outline 1 Fixed effects model The Least Squares Dummy Variable Estimator The Fixed Effect (Within

More information

y it = α i + β 0 ix it + ε it (0.1) The panel data estimators for the linear model are all standard, either the application of OLS or GLS.

y it = α i + β 0 ix it + ε it (0.1) The panel data estimators for the linear model are all standard, either the application of OLS or GLS. 0.1. Panel Data. Suppose we have a panel of data for groups (e.g. people, countries or regions) i =1, 2,..., N over time periods t =1, 2,..., T on a dependent variable y it and a kx1 vector of independent

More information

Part VII. Accounting for the Endogeneity of Schooling. Endogeneity of schooling Mean growth rate of earnings Mean growth rate Selection bias Summary

Part VII. Accounting for the Endogeneity of Schooling. Endogeneity of schooling Mean growth rate of earnings Mean growth rate Selection bias Summary Part VII Accounting for the Endogeneity of Schooling 327 / 785 Much of the CPS-Census literature on the returns to schooling ignores the choice of schooling and its consequences for estimating the rate

More information

Ch 7: Dummy (binary, indicator) variables

Ch 7: Dummy (binary, indicator) variables Ch 7: Dummy (binary, indicator) variables :Examples Dummy variable are used to indicate the presence or absence of a characteristic. For example, define female i 1 if obs i is female 0 otherwise or male

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43 Panel Data March 2, 212 () Applied Economoetrics: Topic March 2, 212 1 / 43 Overview Many economic applications involve panel data. Panel data has both cross-sectional and time series aspects. Regression

More information

Linear Models in Econometrics

Linear Models in Econometrics Linear Models in Econometrics Nicky Grant At the most fundamental level econometrics is the development of statistical techniques suited primarily to answering economic questions and testing economic theories.

More information

Click to edit Master title style

Click to edit Master title style Impact Evaluation Technical Track Session IV Click to edit Master title style Instrumental Variables Christel Vermeersch Amman, Jordan March 8-12, 2009 Click to edit Master subtitle style Human Development

More information

Econ 582 Fixed Effects Estimation of Panel Data

Econ 582 Fixed Effects Estimation of Panel Data Econ 582 Fixed Effects Estimation of Panel Data Eric Zivot May 28, 2012 Panel Data Framework = x 0 β + = 1 (individuals); =1 (time periods) y 1 = X β ( ) ( 1) + ε Main question: Is x uncorrelated with?

More information

Statistical Models for Causal Analysis

Statistical Models for Causal Analysis Statistical Models for Causal Analysis Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 Three Modes of Statistical Inference 1. Descriptive Inference: summarizing and exploring

More information

Treatment Effects. Christopher Taber. September 6, Department of Economics University of Wisconsin-Madison

Treatment Effects. Christopher Taber. September 6, Department of Economics University of Wisconsin-Madison Treatment Effects Christopher Taber Department of Economics University of Wisconsin-Madison September 6, 2017 Notation First a word on notation I like to use i subscripts on random variables to be clear

More information

General motivation behind the augmented Solow model

General motivation behind the augmented Solow model General motivation behind the augmented Solow model Empirical analysis suggests that the elasticity of output Y with respect to capital implied by the Solow model (α 0.3) is too low to reconcile the model

More information

Fixed Effects Models for Panel Data. December 1, 2014

Fixed Effects Models for Panel Data. December 1, 2014 Fixed Effects Models for Panel Data December 1, 2014 Notation Use the same setup as before, with the linear model Y it = X it β + c i + ɛ it (1) where X it is a 1 K + 1 vector of independent variables.

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE Chapter 6. Panel Data Joan Llull Quantitative Statistical Methods II Barcelona GSE Introduction Chapter 6. Panel Data 2 Panel data The term panel data refers to data sets with repeated observations over

More information

Differences in Differences (DD) Empirical Methods. Prof. Michael R. Roberts. Copyright Michael R. Roberts

Differences in Differences (DD) Empirical Methods. Prof. Michael R. Roberts. Copyright Michael R. Roberts Differences in Differences (DD) Empirical Methods Prof. Michael R. Roberts 1 Topic Overview Introduction» Intuition and examples» Experiments» Single Difference Estimators DD» What is it» Identifying Assumptions

More information

EMERGING MARKETS - Lecture 2: Methodology refresher

EMERGING MARKETS - Lecture 2: Methodology refresher EMERGING MARKETS - Lecture 2: Methodology refresher Maria Perrotta April 4, 2013 SITE http://www.hhs.se/site/pages/default.aspx My contact: maria.perrotta@hhs.se Aim of this class There are many different

More information

Econometrics of causal inference. Throughout, we consider the simplest case of a linear outcome equation, and homogeneous

Econometrics of causal inference. Throughout, we consider the simplest case of a linear outcome equation, and homogeneous Econometrics of causal inference Throughout, we consider the simplest case of a linear outcome equation, and homogeneous effects: y = βx + ɛ (1) where y is some outcome, x is an explanatory variable, and

More information

Potential Outcomes Model (POM)

Potential Outcomes Model (POM) Potential Outcomes Model (POM) Relationship Between Counterfactual States Causality Empirical Strategies in Labor Economics, Angrist Krueger (1999): The most challenging empirical questions in economics

More information

Panel data panel data set not

Panel data panel data set not Panel data A panel data set contains repeated observations on the same units collected over a number of periods: it combines cross-section and time series data. Examples The Penn World Table provides national

More information

Policy-Relevant Treatment Effects

Policy-Relevant Treatment Effects Policy-Relevant Treatment Effects By JAMES J. HECKMAN AND EDWARD VYTLACIL* Accounting for individual-level heterogeneity in the response to treatment is a major development in the econometric literature

More information

Non-linear panel data modeling

Non-linear panel data modeling Non-linear panel data modeling Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini May 2010 Laura Magazzini (@univr.it) Non-linear panel data modeling May 2010 1

More information

Week 2: Pooling Cross Section across Time (Wooldridge Chapter 13)

Week 2: Pooling Cross Section across Time (Wooldridge Chapter 13) Week 2: Pooling Cross Section across Time (Wooldridge Chapter 13) Tsun-Feng Chiang* *School of Economics, Henan University, Kaifeng, China March 3, 2014 1 / 30 Pooling Cross Sections across Time Pooled

More information

Next, we discuss econometric methods that can be used to estimate panel data models.

Next, we discuss econometric methods that can be used to estimate panel data models. 1 Motivation Next, we discuss econometric methods that can be used to estimate panel data models. Panel data is a repeated observation of the same cross section Panel data is highly desirable when it is

More information

6. Assessing studies based on multiple regression

6. Assessing studies based on multiple regression 6. Assessing studies based on multiple regression Questions of this section: What makes a study using multiple regression (un)reliable? When does multiple regression provide a useful estimate of the causal

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

Econometric Methods for Ex Post Social Program Evaluation

Econometric Methods for Ex Post Social Program Evaluation Econometric Methods for Ex Post Social Program Evaluation Petra E. Todd 1 1 University of Pennsylvania January, 2013 Chapter 1: The evaluation problem Questions of interest in program evaluations Do program

More information

Impact Evaluation of Rural Road Projects. Dominique van de Walle World Bank

Impact Evaluation of Rural Road Projects. Dominique van de Walle World Bank Impact Evaluation of Rural Road Projects Dominique van de Walle World Bank Introduction General consensus that roads are good for development & living standards A sizeable share of development aid and

More information

Determining Changes in Welfare Distributions at the Micro-level: Updating Poverty Maps By Chris Elbers, Jean O. Lanjouw, and Peter Lanjouw 1

Determining Changes in Welfare Distributions at the Micro-level: Updating Poverty Maps By Chris Elbers, Jean O. Lanjouw, and Peter Lanjouw 1 Determining Changes in Welfare Distributions at the Micro-level: Updating Poverty Maps By Chris Elbers, Jean O. Lanjouw, and Peter Lanjouw 1 Income and wealth distributions have a prominent position in

More information

Basic econometrics. Tutorial 3. Dipl.Kfm. Johannes Metzler

Basic econometrics. Tutorial 3. Dipl.Kfm. Johannes Metzler Basic econometrics Tutorial 3 Dipl.Kfm. Introduction Some of you were asking about material to revise/prepare econometrics fundamentals. First of all, be aware that I will not be too technical, only as

More information

Short T Panels - Review

Short T Panels - Review Short T Panels - Review We have looked at methods for estimating parameters on time-varying explanatory variables consistently in panels with many cross-section observation units but a small number of

More information

Econ 673: Microeconometrics Chapter 12: Estimating Treatment Effects. The Problem

Econ 673: Microeconometrics Chapter 12: Estimating Treatment Effects. The Problem Econ 673: Microeconometrics Chapter 12: Estimating Treatment Effects The Problem Analysts are frequently interested in measuring the impact of a treatment on individual behavior; e.g., the impact of job

More information

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning Økonomisk Kandidateksamen 2004 (I) Econometrics 2 Rettevejledning This is a closed-book exam (uden hjælpemidler). Answer all questions! The group of questions 1 to 4 have equal weight. Within each group,

More information

1. The OLS Estimator. 1.1 Population model and notation

1. The OLS Estimator. 1.1 Population model and notation 1. The OLS Estimator OLS stands for Ordinary Least Squares. There are 6 assumptions ordinarily made, and the method of fitting a line through data is by least-squares. OLS is a common estimation methodology

More information

PhD/MA Econometrics Examination January 2012 PART A

PhD/MA Econometrics Examination January 2012 PART A PhD/MA Econometrics Examination January 2012 PART A ANSWER ANY TWO QUESTIONS IN THIS SECTION NOTE: (1) The indicator function has the properties: (2) Question 1 Let, [defined as if using the indicator

More information

Notes on Heterogeneity, Aggregation, and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market

Notes on Heterogeneity, Aggregation, and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market Notes on Heterogeneity, Aggregation, and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market Heckman and Sedlacek, JPE 1985, 93(6), 1077-1125 James Heckman University of Chicago

More information

Introduction to Econometrics. Regression with Panel Data

Introduction to Econometrics. Regression with Panel Data Introduction to Econometrics The statistical analysis of economic (and related) data STATS301 Regression with Panel Data Titulaire: Christopher Bruffaerts Assistant: Lorenzo Ricci 1 Regression with Panel

More information

Session IV Instrumental Variables

Session IV Instrumental Variables Impact Evaluation Session IV Instrumental Variables Christel M. J. Vermeersch January 008 Human Development Human Network Development Network Middle East and North Africa Middle East Region and North Africa

More information

Efficiency of repeated-cross-section estimators in fixed-effects models

Efficiency of repeated-cross-section estimators in fixed-effects models Efficiency of repeated-cross-section estimators in fixed-effects models Montezuma Dumangane and Nicoletta Rosati CEMAPRE and ISEG-UTL January 2009 Abstract PRELIMINARY AND INCOMPLETE Exploiting across

More information

Gov 2002: 9. Differences in Differences

Gov 2002: 9. Differences in Differences Gov 2002: 9. Differences in Differences Matthew Blackwell October 30, 2015 1 / 40 1. Basic differences-in-differences model 2. Conditional DID 3. Standard error issues 4. Other DID approaches 2 / 40 Where

More information

Panel Data. STAT-S-301 Exercise session 5. November 10th, vary across entities but not over time. could cause omitted variable bias if omitted

Panel Data. STAT-S-301 Exercise session 5. November 10th, vary across entities but not over time. could cause omitted variable bias if omitted Panel Data STAT-S-301 Exercise session 5 November 10th, 2016 Panel data consist of observations on the same n entities at two or mor time periods (T). If two variables Y, and X are observed, the data is

More information
