Impact Evaluation Workshop 2014: Asian Development Bank, Sept 1-3, 2014, Manila, Philippines

Session 15: Regression Estimators, Differences in Differences, and Panel Data Methods

I. Introduction

Most randomized evaluations in developing countries have been conducted on new programs that did not exist before the randomized trial was conducted. In contrast, impact evaluations that are not based on randomized trials are almost always conducted on programs that existed before the evaluation was planned. There are two ways in which participation is not random:

1. The communities in which the programs exist are not randomly chosen.
2. The participants in the program are not randomly assigned.

This session has three objectives:

1. Explain how very simple ordinary least squares (OLS) estimates can lead to biased estimates of program impacts.
2. Present four commonly used regression methods for estimating program impacts, including the assumptions needed for those methods to produce unbiased and consistent estimates:
   - The cross-section estimator
   - The before-after estimator
   - The difference-in-differences estimator
   - The within estimator
3. Present a case study exploiting panel data methods that allow for correlation between time and the treatment.

Let's start with some very simple examples (well, let's skip them here).

Example: A Before-After Estimator

The before-after estimator obtains a program's impact by comparing outcomes measured after the program started with outcomes measured before it started.

Consider a program that provides loans to poor farmers so that they can buy fertilizer to increase their maize production. In the year before the program started, farmers who later enrolled in the program harvested an average of 1,000 kg of maize per hectare (ha). One year after the program started, maize yields had increased to 1,200 kg/ha. The before-after estimator finds a program impact of 200 (= 1,200 - 1,000) kg/ha.

Question: Is 200 kg/ha a plausible estimate of the program's impact? Consider two cases:

A: Rainfall was normal during the year before the program started, but a drought occurred in the year the program was launched.
B: A drought occurred in the year before the program started, but rainfall returned to normal during the year the program was launched.

Note: The before-after estimator assumes Counterfactual C, in which yields would have stayed at their pre-program level without the program. Counterfactuals A and B pick up impacts of factors other than the program (e.g. weather changes).

Example: A Cross-Section (Enrolled and Nonenrolled) Estimator

The cross-section estimator obtains a program's impact by comparing outcomes of participants and nonparticipants after the program started.

Consider the microfinance program again. Now suppose that the only data we have were collected one year after the program started (i.e. no "before" data). One year after the program began, the farmers who enrolled in the program harvested an average of 1,100 kg of maize per ha, while those who did not enroll harvested an average of 1,000 kg/ha. The cross-section estimator calculates a program impact of a 100 (= 1,100 - 1,000) kg/ha increase in maize yields.

Question: Is 100 kg/ha a plausible estimate of the program impact?

Consider the following scenarios:

A: More productive farmers were more likely to obtain the loan because they were more likely to be able to repay it.
B: Farmers in the program reside in areas where the quality of land is lower (e.g. they needed more fertilizer to compensate for low land quality).

Example: A Difference-in-Differences Estimator

Consider the microfinance example again. A drought occurred the year before the program started, but rainfall was normal the year the program was launched. Assume that all farmers were affected by the drought, and that they were affected in a similar way. Assume also that we have data on maize yields both before the program was launched and one year after, for both enrolled and nonenrolled farmers.

Before the program, the farmers who later enrolled harvested 1,000 kg of maize per ha, and they harvested 1,150 kg/ha one year after the program started. Farmers who did not enroll harvested 900 kg/ha before the program began, and 1,000 kg/ha one year after the program started.

A DID estimator combines the before-after and cross-sectional (enrolled-nonenrolled) estimators:

$\Delta_1 - \Delta_2 = [(\text{enrolled, after}) - (\text{enrolled, before})] - [(\text{nonenrolled, after}) - (\text{nonenrolled, before})]$

This yields an estimated impact of (1,150 - 1,000) - (1,000 - 900) = 50 kg/ha.

Intuition: $\Delta_1$ and $\Delta_2$ each remove the influence of time-invariant factors, e.g. land quality; $\Delta_1 - \Delta_2$ removes the influence of the common time trend due to, say, the drought.

The following figure illustrates these three simple estimators:

Cross-section: Estimated effect = C - D (ignores fixed factors, e.g. land quality, between groups)
Before-after: Estimated effect = C - A (ignores time trend)
DID: Estimated effect = (C - A) - (D - B) = C - E (accounts for both)
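The arithmetic of the three simple estimators can be checked with a few lines of Python. This is only a sketch using the four mean yields from the maize example; the dictionary layout and function names are illustrative choices, not part of the original slides.

```python
# Mean maize yields (kg/ha) from the worked example; the layout is an
# illustrative assumption, only the four numbers come from the text.
yields = {
    ("enrolled", "before"): 1000,
    ("enrolled", "after"): 1150,
    ("nonenrolled", "before"): 900,
    ("nonenrolled", "after"): 1000,
}

def before_after(y):
    # Change over time for enrolled farmers only.
    return y[("enrolled", "after")] - y[("enrolled", "before")]

def cross_section(y):
    # Enrolled versus nonenrolled, both measured after the program started.
    return y[("enrolled", "after")] - y[("nonenrolled", "after")]

def did(y):
    # Change for enrolled farmers minus change for nonenrolled farmers.
    change_enrolled = y[("enrolled", "after")] - y[("enrolled", "before")]
    change_nonenrolled = y[("nonenrolled", "after")] - y[("nonenrolled", "before")]
    return change_enrolled - change_nonenrolled

print(before_after(yields), cross_section(yields), did(yields))
```

With these numbers the before-after and cross-section comparisons both give 150 kg/ha, while DID gives 50 kg/ha once the common recovery from the drought and the pre-existing gap between groups are netted out.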

II. Parameters of Interest and Sources of Bias

Recall the two most common parameters of interest for impact evaluation:

1. ATE: the average effect of the program for all persons in the population: $\mathrm{ATE} \equiv E[Y_1 - Y_0] = E[\Delta]$
2. ATT: the average effect of the program for program participants: $\mathrm{ATT} \equiv E[Y_1 - Y_0 \mid P = 1] = E[\Delta \mid P = 1]$

Recall also that it is sometimes possible to go further by estimating ATE and ATT for a person with characteristics X (a vector of observable variables):

$\mathrm{ATE}(X) \equiv E[Y_1 - Y_0 \mid X] = E[\Delta \mid X]$
$\mathrm{ATT}(X) \equiv E[Y_1 - Y_0 \mid P = 1, X] = E[\Delta \mid P = 1, X]$

If the individuals who take up the program tend to be the ones who receive the greatest benefit from it, then we would expect ATT(X) > ATE(X).

In general, the difference between the mean of observed Y for program participants (the P = 1 group) and the mean of observed Y for nonparticipants (the P = 0 group) will not give a consistent (or unbiased) estimate of either ATE(X) or ATT(X).

To see how bias comes about, assume that for any person in the population the values of $Y_1$ (the value of Y if that person participates in the program) and $Y_0$ (the value of Y if that person does not participate) can be expressed as simple linear functions of the X variables for that person, plus an error term:

$Y_1 = X\beta_1 + U_1$
$Y_0 = X\beta_0 + U_0$

where we assume that $E[U_1 \mid X] = E[U_0 \mid X] = 0$. The observed value of Y can be written as $Y = PY_1 + (1 - P)Y_0$, where P equals 1 if the person participates in the program and 0 if he/she does not.

Note that this setup is quite general: it allows the program impact to work through both X and U. Note that:

$Y = PY_1 + (1 - P)Y_0$

Substituting the expressions for $Y_0$ and $Y_1$ above, we have:

$Y = P(X\beta_1 + U_1) + (1 - P)(X\beta_0 + U_0)$

Regrouping terms, we have:

$Y = X\beta_0 + P(X\beta_1 - X\beta_0) + \{U_0 + P(U_1 - U_0)\}$

What does this expression for Y have to do with ATE(X) and ATT(X)? It is easy to manipulate it to show the relationships. To begin, recall that $\mathrm{ATE}(X) = E[Y_1 - Y_0 \mid X]$. Substituting the above expressions for $Y_1$ and $Y_0$ and rearranging terms, we have:

$\mathrm{ATE}(X) = E[Y_1 - Y_0 \mid X]$
$= E[(X\beta_1 + U_1) - (X\beta_0 + U_0) \mid X]$
$= E[(X\beta_1 - X\beta_0) \mid X] + E[U_1 \mid X] - E[U_0 \mid X]$

Recalling that $E[U_1 \mid X] = E[U_0 \mid X] = 0$ (the assumption made above), we have

$= (X\beta_1 - X\beta_0) + 0 - 0$

This implies that the above expression for Y can be written as:

$Y = X\beta_0 + P \cdot \mathrm{ATE}(X) + \{U_0 + P(U_1 - U_0)\}$

You can also show (by adding and subtracting $P \cdot E[U_1 - U_0 \mid X, P = 1]$) that:

$Y = X\beta_0 + P \cdot \mathrm{ATT}(X) + \{U_0 + P(U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1])\}$

These two expressions show how bias can arise when trying to estimate ATE(X) and ATT(X). To estimate ATE(X), the expression

$Y = X\beta_0 + P \cdot \mathrm{ATE}(X) + \{U_0 + P(U_1 - U_0)\}$

suggests that we regress Y on X and P, and the coefficient on P will be ATE(X). However, this will yield unbiased and consistent estimates of ATE(X) only if the error term $\{U_0 + P(U_1 - U_0)\}$ is uncorrelated with X and P. In other words, we need to assume that:

$E[U_0 + P(U_1 - U_0) \mid X, P] = 0$

Similarly, to estimate ATT(X), the expression

$Y = X\beta_0 + P \cdot \mathrm{ATT}(X) + \{U_0 + P(U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1])\}$

suggests that the same regression yields an estimate of ATT(X) if the following holds:

$E[U_0 + P(U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1]) \mid X, P] = 0$

Note that $E[P(U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1]) \mid X, P] = 0$. This can be seen by considering the two possible values of P. If P = 0, the expression equals 0. If P = 1, the expression becomes $E[U_1 - U_0 - E[U_1 - U_0 \mid X, P = 1] \mid X, P = 1]$, which equals $E[U_1 - U_0 \mid X, P = 1] - E[U_1 - U_0 \mid X, P = 1] = 0$.

So the only concern in estimating ATT(X) is whether $E[U_0 \mid X, P] = 0$. Therefore, to estimate ATT(X), we need assumptions that imply $E[U_0 \mid X, P] = 0$. Similarly, estimating ATE(X) requires assumptions that imply $E[U_0 + P(U_1 - U_0) \mid X, P] = 0$.

Consider the following assumptions:

(A.1) Conditional on X, the program effect is the same for everyone ($U_1 = U_0$).
(A.2) Conditional on X, the program effect varies across individuals ($U_1 \neq U_0$), but $U_1 - U_0$ does not predict program participation.
(A.3) Conditional on X, the program effect varies across individuals and $U_1 - U_0$ does predict who participates in the program.

Note that Assumptions (A.1) and (A.2) each imply that ATE(X) = ATT(X):

$\mathrm{ATT}(X) \equiv E[Y_1 - Y_0 \mid X, P = 1]$
$= E[X\beta_1 - X\beta_0 + U_1 - U_0 \mid X, P = 1]$
$= X\beta_1 - X\beta_0 + E[U_1 - U_0 \mid X, P = 1]$

Under Assumption (A.1), the last term equals 0. It also equals 0 under Assumption (A.2), because that assumption implies that $E[U_1 - U_0 \mid X, P = 1] = E[U_1 - U_0 \mid X]$, which also equals zero. Thus $\mathrm{ATT}(X) = X\beta_1 - X\beta_0$ under either Assumption (A.1) or Assumption (A.2), and $\mathrm{ATE}(X) = X\beta_1 - X\beta_0$ as well because $E[U_1 - U_0 \mid X] = 0$.

Under assumption (A.1) or (A.2), ATE = ATT and potential bias arises only if $E[U_0 \mid X, P] \neq 0$. Under assumption (A.3), bias in estimating ATE can arise if either $E[U_0 \mid X, P] \neq 0$ or $E[U_1 - U_0 \mid X, P] \neq 0$ (see the expression for Y in terms of ATE(X) above).
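A small simulation can make the condition $E[U_0 \mid X, P] = 0$ concrete. The sketch below is an illustration under assumed values, not part of the original session: there are no X variables, the true effect is fixed at 2 for everyone (Assumption A.1), and `naive_difference` is a hypothetical helper that compares mean observed Y across participants and nonparticipants.

```python
import random

def mean(values):
    return sum(values) / len(values)

def naive_difference(n, select_on_u0, effect=2.0, seed=0):
    """Mean observed Y of participants minus mean observed Y of nonparticipants."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        u0 = rng.gauss(0.0, 1.0)     # unobservable that shifts Y0
        noise = rng.gauss(0.0, 1.0)  # unrelated to outcomes
        # Participation depends on U0 only when select_on_u0 is True.
        p = (u0 + noise > 0) if select_on_u0 else (noise > 0)
        y0 = 1.0 + u0
        y1 = y0 + effect             # (A.1): U1 = U0, same gain for everyone
        if p:
            treated.append(y1)       # we observe Y1 for participants
        else:
            untreated.append(y0)     # and Y0 for nonparticipants
    return mean(treated) - mean(untreated)

random_assignment = naive_difference(20000, select_on_u0=False)
selection_on_u0 = naive_difference(20000, select_on_u0=True)
print(round(random_assignment, 2), round(selection_on_u0, 2))
```

When participation is unrelated to $U_0$, the comparison should land close to the assumed effect of 2; when people with high $U_0$ are more likely to participate, it overstates the effect even though ATE = ATT here.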

III. Cross-Section Estimator

The cross-section estimator uses data on a group of nonparticipants to impute counterfactual outcomes for program participants. The data for both groups are collected during the same time period, after the program has started. We now modify the notation to allow for a time subscript:

$Y_{1it}$ = value of Y for person i at time t if he/she participates in the program at time t
$Y_{0it}$ = value of Y for person i at time t if he/she does not participate in the program at time t

The data requirements of this estimator are minimal: it requires data only on participants ($P_{it} = 1$) and nonparticipants ($P_{it} = 0$) for some time period t after the participants started their involvement in the program.

The cross-section estimator can be defined as the OLS estimate of:

$Y_{it} = X_{it}\beta_0 + P_{it}\,\mathrm{ATT}(X_{it}) + \varepsilon_{it}$

where $\varepsilon_{it} = U_{0it} + P_{it}(U_{1it} - U_{0it} - E[U_{1it} - U_{0it} \mid X_{it}, P_{it} = 1])$.

That is, $Y_{it}$ is regressed on $X_{it}$ and on $P_{it}$ interacted with $X_{it}$, and the coefficients on the interactions provide estimates of the average treatment effect on the treated (ATT) for people with characteristics $X_{it}$. In practice, it is often assumed that treatment effects are the same across different $X_{it}$, so that $Y_{it}$ is regressed on $X_{it}$ and the indicator $P_{it}$, and the single coefficient on $P_{it}$ is interpreted as the treatment effect. Recall that under assumption (A.1) or (A.2), ATE(X) and ATT(X) are the same parameter.

Consistency of the cross-section regression estimator requires that the error term $\varepsilon_{it}$ not be correlated with either $X_{it}$ or $P_{it}$, i.e. that $E[\varepsilon_{it} \mid P_{it}, X_{it}] = 0$. This restriction is violated, and thus the cross-section regression estimator is biased and inconsistent, if people select into the program based on expectations about their own gain from the program (the case described by assumption (A.3)).

To see this, consider that unobservable characteristics such as motivation, intellectual ability, or other advantages are likely to be present and correlated both with participation in (or access to) the treatment and with the outcome variable, introducing bias into the estimates of the treatment effects.

Even though this strong assumption is likely to be violated, the cross-section estimator is commonly used because of its minimal data requirements. Where the data permit, the other three regression estimators (before-after, difference-in-differences, and within) are preferred, although each of them places additional requirements on the data.

IV. The Before-After Estimator

Suppose that we have panel data, that is, data collected from the same people for two or more time periods, and that we observe only program participants. For both of the potential outcomes ($Y_1$ and $Y_0$), assume the same linear model used above:

$Y_{1it} = X_{it}\beta_1 + U_{1it}$
$Y_{0it} = X_{it}\beta_0 + U_{0it}$

The $X_{it}$ variables may be either fixed (e.g. gender) or time-varying (e.g. age), but they are assumed to be unaffected by an individual's participation in the program. The error terms $U_{1it}$ and $U_{0it}$ are assumed to satisfy $E[U_{1it} \mid X_{it}] = E[U_{0it} \mid X_{it}] = 0$.

Suppose the intervention took place in period $t^*$. For $t < t^*$, none of the individuals had yet participated in the program, so we observe $Y_{0it}$ and $P_{it} = 0$. For $t > t^*$, we observe $Y_{1it}$ and $P_{it} = 1$.

Thus, the observed outcome at time t can be written as:

$Y_{it} = X_{it}\beta_0 + P_{it}\,\Delta(X_{it}) + U_{0it}$

where $P_{it}$ denotes having participated in the program and $\Delta(X_{it}) = X_{it}\beta_1 - X_{it}\beta_0 + U_{1it} - U_{0it}$ is the treatment impact for individual i (note that it is not an average treatment effect because it is for a single person).

The evaluation problem can be viewed as a missing data problem, because each person is observed in only one of two potential states (treated or untreated) at any point in time and the missing state needs to be imputed. The before-after estimator addresses the missing data problem by using pre-program data to impute the missing counterfactual outcome.

Let $t$ and $t'$ denote two time periods, one before and one after the program intervention. Suppose that we want to estimate the impact of the program on a person who participates between those two time periods.

In the notation of the panel data model, we can define the $\mathrm{ATT}(X_{it'})$ parameter as:

$\mathrm{ATT}(X_{it'}) = E[\Delta(X_{it'}) \mid P_{it'} = 1, P_{it} = 0, X_{it'}]$
$= E[(X_{it'}\beta_1 + U_{1it'}) - (X_{it'}\beta_0 + U_{0it'}) \mid P_{it'} = 1, P_{it} = 0, X_{it'}]$ (all evaluated at $t'$)

where the conditioning on $P_{it'} = 1$ and $P_{it} = 0$ indicates that the person was not in the program at time $t$ but did participate in the program by time $t'$.

The before-after estimator for $\mathrm{ATT}(X_{it'})$ can be written as follows:

$Y_{it'} - Y_{it} = X_{it'}\beta_1 - X_{it}\beta_0 + U_{1it'} - U_{0it}$ (for participants only, using the equation for $Y_1$ at $t'$ and the equation for $Y_0$ at $t$)

We can derive how this may be estimated using OLS. Writing $E[\,\cdot \mid \cdot\,]$ as shorthand for conditioning on $P_{it'} = 1, P_{it} = 0, X_{it'}$:

$Y_{it'} - Y_{it} = X_{it'}\beta_1 - X_{it}\beta_0 + E[\Delta(X_{it'}) \mid \cdot] - E[\Delta(X_{it'}) \mid \cdot] + U_{1it'} - U_{0it}$
$= X_{it'}\beta_1 - X_{it}\beta_0 + \mathrm{ATT}(X_{it'}) - E[X_{it'}\beta_1 - X_{it'}\beta_0 + U_{1it'} - U_{0it'} \mid \cdot] + U_{1it'} - U_{0it}$
$= X_{it'}\beta_1 - X_{it}\beta_0 - X_{it'}\beta_1 + X_{it'}\beta_0 + \mathrm{ATT}(X_{it'}) - E[U_{1it'} - U_{0it'} \mid \cdot] + U_{1it'} - U_{0it}$
$= (X_{it'} - X_{it})\beta_0 + \mathrm{ATT}(X_{it'}) - E[U_{1it'} - U_{0it'} \mid \cdot] + U_{1it'} - U_{0it'} + U_{0it'} - U_{0it}$

The last expression implies that one can use OLS to estimate the following:

$Y_{it'} - Y_{it} = (X_{it'} - X_{it})\beta_0 + \mathrm{ATT}(X_{it'}) + \varepsilon_{it'}$

where $\varepsilon_{it'} = (U_{1it'} - U_{0it'} - E[U_{1it'} - U_{0it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}]) + U_{0it'} - U_{0it}$.

Thus, the treatment impact can be obtained from a regression of the difference $Y_{it'} - Y_{it}$ on $(X_{it'} - X_{it})$ and also on $X_{it'}$ in levels, because $X_{it'}$ enters through $\mathrm{ATT}(X_{it'}) = E[(X_{it'}\beta_1 + U_{1it'}) - (X_{it'}\beta_0 + U_{0it'}) \mid P_{it'} = 1, P_{it} = 0, X_{it'}]$. The coefficients on $X_{it'}$, along with the constant term, provide estimates of $\mathrm{ATT}(X_{it'})$, controlling for any time-varying X variables. If the regressors X are not time-varying, then the regression simplifies to regressing $Y_{it'} - Y_{it}$ on $X_{it}$.

Note, however, that this estimation strategy does not allow for estimation of time-specific intercepts that are unrelated to program participation. The $\beta_0$ have to be assumed to be constant over time, or else they cannot be separately identified from the treatment effect.

Consistent estimation of the $\mathrm{ATT}(X_{it'})$ term requires $E[\varepsilon_{it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}] = 0$. In fact, the term in parentheses in the expression for $\varepsilon_{it'}$ has a conditional mean of 0 by construction:

$E[U_{1it'} - U_{0it'} - E[U_{1it'} - U_{0it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}] \mid P_{it'} = 1, P_{it} = 0, X_{it'}] = 0$

so the key assumption needed for the before-after estimator to be unbiased and consistent is the following:

$E[U_{0it'} - U_{0it} \mid P_{it'} = 1, P_{it} = 0, X_{it'}] = 0$

A special case in which this assumption is satisfied is when $U_{0it}$ can be decomposed into a fixed-effect error structure:

$U_{0it} = f_i + v_{it}$

where $f_i$ is fixed over time and $v_{it}$ satisfies $E[v_{it'} - v_{it} \mid P_{it'} = 1, P_{it} = 0, X_{it'}] = 0$.

Intuitively, this assumption allows selection into the program to be based on unobservable characteristics that are time-invariant (called $f_i$ here), which could be correlated with $P_{it}$ but are then differenced out of the expression $U_{0it'} - U_{0it}$. Thus a before-after estimation strategy allows for person-specific permanent unobservables that affect the program participation decision.

The regression as described above has one pre-program and one post-program observation for each person, and the model is estimated only for people who eventually participate in the program. If more than two periods of data are available, the model can also be estimated as a standard fixed-effects regression (taking deviations from means), making use of all the data available.
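The role of the fixed effect $f_i$ and the cost of ruling out time-specific intercepts can be illustrated with simulated participants. The sketch below rests on assumed values (a true effect of 200, normally distributed $f_i$ and $v_{it}$, no X variables); `before_after_estimate` and its `common_trend` argument are hypothetical names introduced for illustration.

```python
import random

def before_after_estimate(n, effect, common_trend, seed=1):
    """Mean of Y_after - Y_before over n simulated participants."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n):
        f_i = rng.gauss(1000.0, 100.0)         # permanent person-specific effect
        y_before = f_i + rng.gauss(0.0, 20.0)  # Y0 observed before the program
        # Y1 observed after the program: f_i, any common trend, and the effect.
        y_after = f_i + common_trend + effect + rng.gauss(0.0, 20.0)
        changes.append(y_after - y_before)     # f_i cancels in the difference
    return sum(changes) / len(changes)

no_trend = before_after_estimate(5000, effect=200.0, common_trend=0.0)
with_trend = before_after_estimate(5000, effect=200.0, common_trend=100.0)
print(round(no_trend, 1), round(with_trend, 1))
```

Without a common trend the estimate should be close to the assumed effect of 200, however dispersed the $f_i$ are. With a common trend of 100 (for example, recovery from a drought), the before-after estimate absorbs the trend and lands near 300.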

V. Difference-in-Differences (DID) Estimators

The difference-in-differences (DID) estimator measures the impact of the program intervention by the difference in the before-after change in outcomes between participants and nonparticipants.

To see how it works, recall that $t$ is a time period before the program started and $t'$ is some time period after it started. Define a (time-invariant) indicator variable, denoted by $I_i$, that equals 1 for participants (those for whom $P_{it} = 0$ and $P_{it'} = 1$) and 0 for nonparticipants (for whom $P_{it} = P_{it'} = 0$). The DID estimator is the OLS estimate of $\mathrm{ATT}(X_{it'})$ in the following regression equation:

$Y_{it'} - Y_{it} = X_{it'}\beta_0 - X_{it}\beta_0 + I_i\,\mathrm{ATT}(X_{it'}) + \varepsilon_{it'}$

where $\varepsilon_{it'} = P_{it'}(U_{1it'} - U_{0it'} - E[U_{1it'} - U_{0it'} \mid P_{it'} = 1, P_{it} = 0, X_{it'}]) + U_{0it'} - U_{0it}$.

Note that this regression equation is identical to that for the before-after estimator, except that it is now estimated using both participant and nonparticipant observations. The DID estimator addresses an important shortcoming of the before-after estimator in that it allows for time-specific intercepts that are common across groups (which can be included in $X_{it}\beta_0$). These time effects are identified separately from the treatment effects because of the inclusion of the nonparticipant observations (recall that with the before-after estimator, the constant term was attributed to the treatment effect, which is not the case here).

The DID estimator is unbiased and consistent if $E[\varepsilon_{it'} \mid P_{it'}, X_{it'}] = 0$, which would be satisfied under a fixed-effect error structure. With more than two time periods, the DID estimator can be implemented using a panel data fixed-effects regression.

The data required to implement the DID estimator can be either panel data or repeated cross-section data on both participants and nonparticipants. If it is implemented using repeated cross-section data, stronger assumptions are needed on the error term.

There are also ways of specifying the DID estimator as a levels equation rather than a differenced equation. For example, it can be estimated using the regression:

$Y_{it} = X_{it}\beta_0 + \tau_t + f_i + P_{it}\,\mathrm{ATT}(X_{it}) + \varepsilon_{it}$

for every period from the pre-program period through the post-program period, where $\varepsilon_{it} = U_{0it} + P_{it}(U_{1it} - U_{0it} - E[U_{1it} - U_{0it} \mid P_{it} = 1, X_{it}])$.

In this equation, $\tau_t$ is a time-specific intercept and $f_i$ is an individual-level fixed effect (an indicator variable for each individual). Alternatively, the model could be estimated in deviation-from-mean form, in which case the fixed-effect term would not need to be included since it would be differenced out.

If repeated cross-section data are available rather than longitudinal data, then it is not possible to estimate fixed effects. In that case, we need to impose a stronger assumption on the error term, namely that $E[\varepsilon_{it} \mid P_{it}, X_{it}] = 0$, which requires that $E[U_{0it} \mid P_{it}, X_{it}] = 0$. This means that people cannot select into the program based on their $U_{0it}$ values. This is in contrast to panel data, in which time-invariant unobservables are differenced out.

The main advantage of longitudinal (before-after or difference-in-differences) estimators over cross-sectional methods is that they allow for unobservable determinants of program participation decisions that are correlated with outcomes. However, the fixed-effects error structure that is imposed to justify these estimators requires that the unobservables correlated with participation be time-invariant; it does not allow for unobservables that both vary over time and are correlated with participation. For example, we might expect unobserved earnings shocks to make people more likely to participate in a social program (such as a public works program), and such shocks would not be captured by a fixed-effects error structure.
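The differenced DID regression can be sketched with simulated panel data. Everything here is an illustrative assumption: there are no X variables, the true effect is 50, a common trend of 100 affects everyone, and farmers with a higher permanent effect $f_i$ are more likely to enroll. `ols_slope` is a hand-written helper for a one-regressor OLS with a constant.

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def mean(values):
    return sum(values) / len(values)

rng = random.Random(2)
effect, trend = 50.0, 100.0
indicator, change, level_after = [], [], []
for _ in range(4000):
    f_i = rng.gauss(950.0, 80.0)
    # Selection on the permanent unobservable f_i.
    enrolled = 1 if f_i + rng.gauss(0.0, 80.0) > 950.0 else 0
    y_before = f_i + rng.gauss(0.0, 20.0)
    y_after = f_i + trend + effect * enrolled + rng.gauss(0.0, 20.0)
    indicator.append(enrolled)
    change.append(y_after - y_before)
    level_after.append(y_after)

# DID: regress the change in Y on the participation indicator I_i.
did_estimate = ols_slope(indicator, change)
# With a single binary regressor this equals the difference in mean changes.
diff_in_mean_changes = (
    mean([c for c, d in zip(change, indicator) if d == 1])
    - mean([c for c, d in zip(change, indicator) if d == 0])
)
# Cross-section comparison on post-program levels, for contrast.
cross_section_estimate = ols_slope(indicator, level_after)
print(round(did_estimate, 1), round(cross_section_estimate, 1))
```

In this setup the DID estimate should be close to 50, because both $f_i$ and the common trend drop out, while the cross-section comparison is pushed well above 50 by selection on $f_i$.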

VI. Extension: Within Estimators (one-way fixed effects)

Within estimators identify program impacts from changes in outcomes within some unit, such as within a family, a school or a village. The before-after and difference-in-differences estimators can also be viewed as within estimators, where the variation exploited is the change over time within a given individual. This section describes other kinds of within estimators.

Let $Y_{0ijt}$ and $Y_{1ijt}$ denote the outcomes for individual i, who is a member of unit j and is observed at time t. For simplicity, at first assume that $U_{1ijt} = U_{0ijt}$. Assume a linear model for the observed outcome:

$Y_{ijt} = X_{ijt}\beta_0 + P_{ijt}\,\mathrm{ATT}(X_{ijt}) + \varepsilon_{ijt}$

Assume that the error term $\varepsilon_{ijt}$ ($= U_{0ijt} = U_{1ijt}$) can be decomposed as:

$\varepsilon_{ijt} = \theta_j + v_{ijt}$

where $\theta_j$ represents the unobservables that are assumed to be fixed for individuals within the same unit, and the $v_{ijt}$ are independent and identically distributed (i.i.d.). Taking differences between two individuals, denoted by $i$ and $i'$, from the same unit j observed in the same time period t gives:

$Y_{ijt} - Y_{i'jt} = (X_{ijt} - X_{i'jt})\beta_0 + (P_{ijt} - P_{i'jt})\,\mathrm{ATT}(X_{ijt}) + (v_{ijt} - v_{i'jt})$

To estimate $\mathrm{ATT}(X_{ijt})$, regress $Y_{ijt} - Y_{i'jt}$ on $X_{ijt} - X_{i'jt}$ and on interaction terms between $P_{ijt} - P_{i'jt}$ and $X_{ijt}$. Consistency and unbiasedness of the OLS estimator of $\mathrm{ATT}(X_{ijt})$ require that:

$E[v_{ijt} - v_{i'jt} \mid X_{ijt}, X_{i'jt}, P_{ijt}, P_{i'jt}] = 0$

This assumption implies that, within a particular unit, the individual who gets the treatment is selected without any influence of the error term $v_{ijt}$.

Comments on the Within Estimator

1. Because it relies on comparing the outcomes of treated and untreated persons, the approach implicitly assumes that there are no spillover effects from treating one individual onto other individuals within the same unit.
2. In the more general version of the model, where $U_{1ijt} \neq U_{0ijt}$, one must also assume that the individual in the unit who receives the treatment is selected without any influence of that individual's idiosyncratic gain from the program. That is, the program may be targeted at specific units (e.g. families or villages), but within those units the selection of participants into the program should be unrelated to their idiosyncratic gains from the program (unrelated to $U_{1ijt} - U_{0ijt}$).
3. As with the before-after and difference-in-differences estimation approaches, the within estimator just described allows treatment to be selective across units. That is, it allows $E[\varepsilon_{ijt} \mid P_{ijt}, X_{ijt}] \neq 0$, because treatment selection can be based on the unobserved heterogeneity term $\theta_j$ (heterogeneity shared among individuals within a unit).
4. When the variation being exploited for identification of the treatment effect is variation within a family, village, or school at a single point in time, the within estimator can be implemented with a single cross-section of data.
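The way the within transformation removes $\theta_j$ can be sketched with a tiny noise-free data set. The numbers are invented for illustration: four units with $\theta_j$ = 0, 10, 20, 30, five members each, a constant effect of 3, and more treated members in units with higher $\theta_j$. Deviations from unit means are used here in place of pairwise differences; both remove $\theta_j$.

```python
def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

theta = {0: 0.0, 1: 10.0, 2: 20.0, 3: 30.0}   # unit-level unobservables
rows = []                                     # (unit j, P, Y)
for j, theta_j in theta.items():
    for i in range(5):
        p = 1 if i < j + 1 else 0             # unit j has j + 1 treated members
        rows.append((j, p, theta_j + 3.0 * p))

# Naive pooled comparison: contaminated by theta_j.
naive_estimate = ols_slope([p for _, p, _ in rows], [y for _, _, y in rows])

# Within transformation: subtract unit means of P and Y, then regress.
p_dev, y_dev = [], []
for j in theta:
    unit = [(p, y) for jj, p, y in rows if jj == j]
    p_bar = sum(p for p, _ in unit) / len(unit)
    y_bar = sum(y for _, y in unit) / len(unit)
    p_dev += [p - p_bar for p, _ in unit]
    y_dev += [y - y_bar for _, y in unit]
within_estimate = ols_slope(p_dev, y_dev)
print(naive_estimate, within_estimate)
```

Because treatment is more common in high-$\theta_j$ units, the pooled comparison gives 13 rather than 3, while the within estimate recovers the assumed effect of 3.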

Discuss:
- What are some sources of heterogeneity that might be shared by all individuals in a community?
- What are some that might vary within a community?
- What are some advantages of using the within estimator?

VI. Extension: Two-Way Fixed Effects and More

$Y_{it} = X_{it}\beta_0 + \tau_t + f_i + P_{it}\,\mathrm{ATT}(X_{it}) + \varepsilon_{it}$, for every period from the pre-program period through the post-program period

- What if P is correlated with time-varying unobservables?
- What if program enrollment expands over time?
- What if the treatment effect varies over time?

The following case study illustrates some possible extensions when long panel datasets are available.

Case Study: Does Microfinance Reduce Rural Poverty? (Berhane and Gardebroek 2011, AJAE)

Background: The Dedebit Credit and Saving Institution (DECSI) in northern Ethiopia provides financial services for production purposes. It officially launched credit and saving programs in 1997 and expanded quickly into almost all villages in Tigray. By 2000, it was providing loans to 210,000 borrowers, with 1.4 million credit transactions amounting to 447 million Ethiopian birr (ETB) in total outstanding loans and ETB 74 million in total savings. As of 2002, its network of 9 branches and 96 subbranches, with headquarters in the capital city of the regional state, covered more than 91% of the villages in the region and extended loans to about half a million borrowers.

To study the impact of microfinance on poverty reduction, a four-round survey at three-year intervals (1997-2006) was administered to 400 randomly selected rural borrowers and nonborrowers. The dataset covers household-level and village-level information ranging from household characteristics, consumption, assets, credit, and savings to village infrastructure, markets, and credit contracts.

This analysis is based on a balanced panel of 351 households, of which 211 borrowed and 140 did not borrow in the 1997 survey.

Empirical Method: Consider first the following model for impact evaluation:

(1) $C_{it} = X_{it}\beta + prog_{it}\gamma + M_i\alpha + u_{it}$, $t = 1, 2, \dots, T$; $i = 1, 2, \dots, N$

where the outcome variable $C_{it}$, per capita consumption for household i at time t, is determined by a vector of observable household-, village-, and MFI-level characteristics $X_{it}$, a program participation variable $prog_{it}$, and a vector $M_i$ of time-invariant unobservable variables.

The program participation variable is usually defined as a dummy variable. However, given the nature of the data, the authors define $prog_{it}$ as the number of years the household has been in a borrowing relationship, in order to account for the degree or intensity of participation.

Panel data models that allow program participation decisions to be correlated with unobservables affecting outcome variables reduce the resulting selection bias. Three such models were used in the study: the standard fixed effects model, the random trend model, and a flexible random trend model.

The standard fixed effects estimator of equation (1) provides a consistent estimate of the borrowing impact, γ, under the assumption that all unobservables that influence the outcome of interest are time-invariant, so that they can be removed by a within or first-difference transformation. However, if such individual-specific unobservables change over time, the estimate of γ is still biased.

There are two potential reasons for such effects. First, unobserved negative economic shocks affecting households' input endowments may pressure households into borrowing to bridge input needs or into repeat borrowing to settle earlier debts. Second, credit may have lasting effects on the unobservables on which selection is based. For example, unobserved household characteristics such as entrepreneurial ability, which may condition credit demand, may change over time depending on previous exposure to microfinance credit.

The individual-specific linear trend model allows both household-specific time-invariant unobservables and individual trends in time-varying unobservables to correlate linearly with program participation. This model remedies bias from time-invariant factors and from linear trends in time-varying factors, but not from any remaining nonlinear factors.

(2) $C_{it} = X_{it}\beta + prog_{it}\gamma + M_i\alpha + g_i t + u_{it}$

where $g_i$ is an individual trend parameter which, in addition to the level effects $M_i$, captures individual-specific growth rates over time.

A consistent estimate of γ, the treatment effect of an additional year of borrowing, can be obtained by eliminating both the linear trend in time-varying unobservables and the time-invariant unobservables that can potentially bias γ. Equation (2) is first-differenced to eliminate $M_i$, which gives a standard fixed effects model:

(3) $C_{it}^* = X_{it}^*\beta + prog_{it}^*\gamma + g_i^* + u_{it}^*$; $t = 1, 2, \dots, T$

where $C_{it}^* = C_{it} - C_{it-1}$, $X_{it}^* = X_{it} - X_{it-1}$, $prog_{it}^* = prog_{it} - prog_{it-1}$, $u_{it}^* = u_{it} - u_{it-1}$ and $g_i^* = g_i t - g_i(t - 1) = g_i$.

Equation (3) is then consistently estimated using a standard fixed effects approach. One can also difference equation (3) a second time to eliminate $g_i^*$ and estimate by pooled OLS. Note that γ can be estimated consistently from this specification only if T > 3.
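The effect of differencing twice can be checked on a small noise-free panel. This is a sketch on invented data, not the study's data or code: four hypothetical households with T = 4, no X variables, an assumed γ of 150, and household-specific $M_i$ and $g_i$. It uses the second-difference variant with pooled OLS rather than fixed effects on the first-differenced equation.

```python
def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def diff(series):
    """First difference of a list: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]

gamma = 150.0
households = [
    # (M_i, g_i, years of borrowing prog_it for t = 1..4); invented values
    (100.0, 10.0, [0, 0, 1, 2]),
    (200.0, 30.0, [0, 1, 2, 3]),
    (50.0, 20.0, [0, 1, 1, 2]),
    (80.0, 0.0, [0, 0, 0, 0]),
]

fd_x, fd_y, sd_x, sd_y = [], [], [], []
for m_i, g_i, prog in households:
    # Equation (2) without X and without noise.
    c = [gamma * p + m_i + g_i * t for t, p in enumerate(prog, start=1)]
    d_prog, d_c = diff(prog), diff(c)   # first differences remove M_i
    fd_x += d_prog
    fd_y += d_c
    sd_x += diff(d_prog)                # second differences also remove g_i
    sd_y += diff(d_c)

first_diff_estimate = ols_slope(fd_x, fd_y)
second_diff_estimate = ols_slope(sd_x, sd_y)
print(round(first_diff_estimate, 2), round(second_diff_estimate, 2))
```

In this example, households with steeper trends $g_i$ also borrow in more periods, so pooled OLS on first differences overstates γ, whereas the second-difference estimate returns the assumed 150. Households whose borrowing grows by the same amount every period contribute no identifying variation after the second difference.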

An advantage of long panel data sets is that they enable one to estimate the impact of long-term rather than one-shot program participation. In addition to shifting the levels in each borrowing year, repeated participation may affect the rate of change of the outcome variables relative to nonparticipation. This can be accounted for by including $dumprog_{it} \cdot t$ in equation (3):

(4) $C_{it} = X_{it}\beta + \gamma_1\, prog_{it} + \gamma_2\, dumprog_{it} \cdot t + M_i\alpha + g_i t + u_{it}$

where $dumprog_{it}$ is a dummy equal to 1 if individual i participated in credit at time t. This specification provides impact estimates that are robust to random periodic changes by allowing the individual-specific trend to vary with participation over time. Estimation follows the same procedures as for equation (2).

A more flexible specification allows program indicators to reflect the frequency of participation in each year. This is done by replacing $prog_{it}$ and $dumprog_{it} \cdot t$ in equation (4) with a series of program indicators for each loan cycle for which the participant has been in the program:

(5) $C_{it} = X_{it}\beta + \gamma_1\, prog1_{it} + \dots + \gamma_k\, progk_{it} + g_i t + M_i\alpha + u_{it}$

where $progj_{it} = 1$ if household i has been in the program for exactly j years in year t and zero otherwise; k is the maximum number of (observed) years a household can be in the program.

Program indicators attach more weight to differences in households' degree of participation, regardless of the year of participation. More weight is also given to the timing of participation within each indicator. Estimation follows the same procedures as for equations (2) and (4).

Columns:
(1) Standard FE (Eqn. 1)
(2) Individual Trend (Eqn. 2)
(3) Indiv. Trend + Trend Based on Participation (Eqn. 4)
(4) Flexible Random Trend Model (Eqn. 5)

                                            (1)           (2)           (3)           (4)
No. of years borrowed                       414.665***    199.317**     160.738**
One year borrowing                                                                    273.936**
Two years borrowing                                                                   319.132**
Three years borrowing                                                                 310.697
Four years borrowing                                                                  665.024**
Random trend*borrowing                                                  33.858**
Year 2006 dummy                             264.098***    323.439***    324.497***    326.079***
Age of HH head                              10.216        2.003         1.632         2.578
Age^2                                       0.059         0.022         0.017         0.027
Cultivated land size (Tsimad = 0.25 ha)     11.735        0.496         1.739         0.887
Land size^2                                 0.066         0.139         0.193         0.175
Intercept                                   289.897       130.553       113.738       16.268
Within R^2                                  0.215         0.164         0.169         0.170
N                                           1404          702           702           702

VII. Extension: Difference-in-Differences Matching

Matching estimators assume that outcomes are independent of program participation after conditioning on observables. However, for a variety of reasons, there may be systematic differences between participant and nonparticipant outcomes even after conditioning on observables. Such differences may arise, for example, because of program selectivity on unmeasured characteristics (such as motivation) or because of systematic differences in the level of outcomes across the different communities in which participants and nonparticipants reside.

A difference-in-differences (DID) matching strategy, as defined in Heckman, Ichimura and Todd (1997, 1998), allows program participation to be based on unobservables as long as the unobservables do not vary over time. This approach is analogous to the standard difference-in-differences regression estimator, but it reweights the participant and nonparticipant observations according to the weighting functions implied by matching estimators.

To see how this works, we need to start with the following independence assumption:

$(\Delta Y_0, \Delta Y_1) \perp P \mid Z$

where $\Delta Y_0 = Y_{0t'} - Y_{0t}$, $\Delta Y_1 = Y_{1t'} - Y_{1t}$, $t$ and $t'$ are time periods before and after the program enrollment date, respectively, and $\perp$ indicates statistical independence.

This is a key assumption of the DID matching approach. Intuitively, it means that P does not help predict changes in the value of $Y_0$ (i.e. $Y_{0t'} - Y_{0t}$) conditional on Z. Thus, individuals cannot select into the program based on anticipated changes in $Y_0$.

This estimator also requires the support condition:

$0 < \Pr[P = 1 \mid Z] < 1$

If interest centers on the ATT(X) parameter, then the matching independence assumption needs to be made only for $\Delta Y_0$.

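As with cross-sectional matching, nonparametric weighting can be used to construct matches. The local linear DID estimator is given by:

$\widehat{\mathrm{ATT}}_{KDM} = \frac{1}{n_1} \sum_{i \in I_1 \cap S_P} \Big\{ (Y_{1t'i} - Y_{1ti}) - \sum_{j \in I_0 \cap S_P} W_{ij}\,(Y_{0t'j} - Y_{0tj}) \Big\}$

where $I_1$ and $I_0$ denote the participant and comparison groups, $S_P$ denotes the region of common support, and the weights $W_{ij}$ correspond to the local linear weights defined in Session T9.

If repeated cross-section data are available instead of longitudinal data, the estimator can be implemented as:

$\widehat{\mathrm{ATT}}_{KDM} = \frac{1}{n_{1t'}} \sum_{i \in I_{1t'} \cap S_P} \Big\{ Y_{1t'i} - \sum_{j \in I_{0t'} \cap S_P} W_{ij}\,Y_{0t'j} \Big\} - \frac{1}{n_{1t}} \sum_{i \in I_{1t} \cap S_P} \Big\{ Y_{1ti} - \sum_{j \in I_{0t} \cap S_P} W_{ij}\,Y_{0tj} \Big\}$

where $I_{1t}$, $I_{1t'}$, $I_{0t}$, $I_{0t'}$ denote the treatment and comparison group datasets in each time period.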
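The structure of the panel version of the estimator can be sketched in a few lines. This sketch makes several simplifying assumptions that are not in the slides: it uses normalized Gaussian kernel weights instead of local linear weights, it takes each unit's matching score (for example, an estimated propensity score) as given, and it omits common-support trimming. The function names and the toy data are hypothetical.

```python
import math

def kernel_weights(score_i, comparison_scores, bandwidth):
    """Normalized Gaussian kernel weights W_ij; they sum to 1 for each i."""
    raw = [
        math.exp(-0.5 * ((s - score_i) / bandwidth) ** 2)
        for s in comparison_scores
    ]
    total = sum(raw)
    return [w / total for w in raw]

def did_matching_att(treated, comparison, bandwidth):
    """DID matching estimate of ATT.

    treated, comparison: lists of (score, change in Y between t and t').
    """
    comp_scores = [s for s, _ in comparison]
    comp_changes = [d for _, d in comparison]
    gaps = []
    for score_i, change_i in treated:
        weights = kernel_weights(score_i, comp_scores, bandwidth)
        # Weighted average change among comparison units close to unit i.
        matched_change = sum(w * d for w, d in zip(weights, comp_changes))
        gaps.append(change_i - matched_change)
    return sum(gaps) / len(gaps)

# Toy data: one participant and two equally distant comparison units.
treated = [(0.5, 150.0)]
comparison = [(0.3, 90.0), (0.7, 110.0)]
att = did_matching_att(treated, comparison, bandwidth=0.1)
print(round(att, 6))
```

In the toy data the two comparison units receive equal weight, so the matched change is 100 and the estimate is 50. In practice the weights would be local linear, the scores would be estimated, and participants outside the common support region would be dropped before averaging.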