Estimating the causal effect of fertility on economic wellbeing: data requirements, identifying assumptions and estimation methods


Empir Econ (2013) 44:355-385
DOI 10.1007/s00181-010-0356-9

Estimating the causal effect of fertility on economic wellbeing: data requirements, identifying assumptions and estimation methods

Bruno Arpino · Arnstein Aassve
Department of Decision Sciences, DONDENA Centre for Research on Social Dynamics, Bocconi University, Via Roentgen, 20136 Milan, Italy
e-mail: bruno.arpino@unibocconi.it

Received: 21 April 2008 / Accepted: 20 November 2009 / Published online: 14 March 2010
© Springer-Verlag 2010

Abstract  This article asks to what extent fertility has a causal effect on households' economic wellbeing, an issue that has received considerable interest in development studies and policy analysis. Only recently, however, has this literature begun to give importance to adequate modelling for the estimation of causal effects. We discuss several strategies for causal inference, stressing that their validity must be judged on the assumptions we can plausibly formulate in a given application, which in turn depends on the richness of the available data. We contrast methods relying on the unconfoundedness assumption, which include regression and propensity score matching, with instrumental variable methods. This discussion has a general importance, representing a set of guidelines for choosing an appropriate strategy of analysis, and is valid for both cross-sectional and panel data.

Keywords  Fertility · Poverty · Causal inference · Unconfoundedness · Instrumental variables · VLSMS

JEL Classification  D19 · I32 · J13

1 Introduction

There is a strong positive correlation between poverty and family size in most developing countries (Schoumaker and Tabutin 1999). Not much is known, however, about the extent to which fertility has a causal impact on households' wellbeing. Needless to say, the issue is of critical importance for implementing sound policies. This article considers

different strategies for establishing the causal effect of fertility on households' wellbeing. We take a quasi-experimental approach in which fertility is considered as a treatment and the outcome is the equivalised household consumption expenditure. We adopt the potential outcomes framework (Neyman 1923; Rubin 1974, 1978), where recorded childbearing events are used as a measure of fertility. Consequently, each household $i$ has two potential outcomes: $Y_{i1}$ if it experiences a childbearing event between two points in time (treated) and $Y_{i0}$ otherwise (untreated or control). However, childbearing is, at least in part, down to individual choice, giving rise to self-selection: households that choose to have more children (self-selected into the treatment) may be very different from households that choose to have fewer children, irrespective of the treatment. Hence, if we observe that the first group of households has on average lower per capita expenditure, we cannot necessarily assert that this is due to fertility, since the two groups of households are likely to differ in many other characteristics, such as education. Thus, a simple difference in the average consumption (or income) of the two groups of households gives a biased estimate.

We discuss different strategies to deal with the self-selection problem, stressing that their validity must be judged on the assumptions we can formulate in a given application, which in turn depends on the richness of the available data. A key distinction is between situations where we can assume that selection depends only on characteristics observed by the researcher (selection on observables) and situations where one or more of the relevant characteristics are unobserved (selection on unobservables). In the first case, we compare units with similar characteristics that differ only by treatment status. For these units the observed difference in the outcome can reasonably be attributed to the treatment. Propensity score matching (PSM) relies on the selection on observables assumption, which is referred to as the unconfoundedness assumption (UNC). Multiple regression also relies on this assumption, though its identifying assumption can be stated in a weaker way (see, e.g. Wooldridge 2002).

The empirical analyses use the Vietnam Living Standard Measurement Survey, a rich panel data set first surveyed in 1992/1993 with a follow-up in 1997/1998. Exploiting the longitudinal structure of the data, we develop our estimators in a pre-post treatment setting. This has several advantages. First, covariates are measured before the exposure to the treatment, which makes it more likely that they are not affected by the treatment (e.g. Rosenbaum 1984; Imbens 2004). A second advantage is that the lagged value of the outcome variable, $Y_{t_1}$, can be included in the set of matching covariates, all of which are measured at the first wave. This is important because the household's level of living standard prior to treatment is relevant both for the probability of experiencing a childbearing event between the two waves and for the consumption expenditure level at the second wave, $Y_{t_2}$. Having information at two points in time, the dependent variable can be defined as the difference between the levels of the outcome after and before the treatment. In particular, we match households in the treatment group with households in the control group having similar first-period values, and compare their changes in outcomes. An advantage of taking the difference in the pre- and post-treatment outcomes is that it helps remove residual imbalance in the average values of $Y_{t_1}$ between the treated and control group. Moreover, it is likely (in our application at least) that the variance is lower when the outcome is defined as a change rather than as a level. Hence, the resulting estimator will be more efficient than one relying on levels. Importantly, we stress that specifying the outcome as a difference does not change the estimand: the interest remains the effect of childbearing events between the two waves on the consumption expenditure level at the second wave. Our approach is useful in the sense that the general discussion of methods based on selection on observables, compared to those based on unobservables, applies independently of whether the application is based on longitudinal or cross-sectional data.

The standard solution to deal with selection on unobservables is to use an instrumental variable (IV) method, which of course relies on the availability of a good instrument, which in our case should be a variable that influences fertility and has no direct impact on consumption expenditure. However, even if such a variable is available, the estimator can be unsatisfactory. The reason is that, unless we are willing to impose very strong assumptions, IV estimates refer only to the unobserved sub-sample of the population that reacts to the chosen instrument, i.e. the compliers (Imbens and Angrist 1994; Angrist et al. 1996). The corresponding parameter estimate is, consequently, the local average treatment effect (LATE), which, in the presence of heterogeneous treatment effects, may differ from the average treatment effect (ATE) and the average treatment effect on the treated (ATT) that usually are the parameters of interest. This is important for policy analysis, since only if the instrument coincides with a variable of real policy relevance can we argue that the estimated LATE has direct policy usefulness (Heckman 1997). Moreover, LATE estimates based on different instruments are generally different, because the identified sub-populations of compliers are different.

We implement the IV approach, demonstrating its benefits and drawbacks, by using two very different instruments. The first instrument is the sex composition of existing children. This is a widely used instrument (see, e.g. Angrist and Evans 1998; Chun and Oh 2002; Gupta and Dubey 2003) and is based on the fact that parents in Vietnam tend to have a strong preference for boys, especially in the North (Haughton and Haughton 1995; Johansson 1996, 1998; Belanger 2002). Since the preference for sons is a widespread phenomenon among Vietnamese households, we expect the proportion of compliers to be rather high. The second instrument is the availability of contraception at the community level. This is similar to other well-used instruments related to the availability of services in the neighbourhood or its distance from the dwelling (examples include McClellan et al. (1994), who use proximity to cardiac care centres, or Card (1995), who uses college proximity). An interesting aspect of this instrument is that it corresponds to a potential policy variable on which policy makers can act to reduce fertility and, through it, make an impact on poverty. However, areas without availability of contraceptives in Vietnam are few (Nguyen-Dinh 1997; Duy et al. 2001; Anh and Thang 2002), which means that it cannot be considered a general policy tool.

From a statistical point of view, a key difference between the two instruments is that the second one cannot be considered as randomised, because the availability of contraception is related to other characteristics of the community, which in turn may influence households' wellbeing. As a consequence, detailed control for covariates

is required, which is usually accomplished by imposing a functional form and additive separability in the error term. However, these and other strong assumptions can be avoided by implementing a non-parametric approach, such as the one suggested by Frölich (2007).

Whereas in our application we have access to valid instruments, this is not always the case. In those situations, IV estimators cannot be used and it becomes important to implement sensitivity analyses for estimators based on selection on observables. So far, this is not very common in the applied literature, but it is a critical tool for assessing the credibility of the identifying assumption. The key idea of the approach is to evaluate how strong the associations among an unmeasured variable, the treatment and the outcome must be in order to undermine the results of the analysis based on the UNC. If the results are highly sensitive, the validity of the identifying assumption becomes questionable. Among the different approaches for sensitivity analysis proposed in the literature, we discuss and apply those suggested by Rosenbaum (1987b) and Ichino et al. (2008).

The article is organised as follows. Section 2 reviews the statistical issues, Sect. 3 provides background information about the application, Sect. 4 shows the results, and Sect. 5 concludes.

2 Causal inference in observational studies under the potential outcomes approach

The potential outcomes approach was introduced by Neyman (1923) and extended by Rubin (1974) to observational studies. We invoke the stable unit treatment value assumption (SUTVA) (Rubin 1980), which states that the potential outcomes for each unit are not affected by the treatments assigned to any other units and that there are no hidden versions of the treatment. Potential outcomes are denoted by $Y_1$, to indicate the outcome that would have resulted if the unit was exposed to the treatment, and $Y_0$ if it was exposed to the control (Rosenbaum and Rubin 1983a). Since each unit receives only the treatment or the control, either $Y_1$ or $Y_0$ is observed for each unit. Assume that we have a random sample of $N$ units under study, $\{d_i, y_i, x_i\}_{i=1}^{N}$. $D$ represents the treatment indicator, taking the value 1 for treated units and 0 for untreated units (the controls), $Y$ indicates the observed outcome, and $X$ indicates the set of covariates or confounders.¹ The two causal parameters usually of interest are the ATE and the ATT, which are defined as:

$$\text{ATE} = E(Y_1 - Y_0) \quad (1)$$

$$\text{ATT} = E(Y_1 - Y_0 \mid D = 1). \quad (2)$$

The ATE is the expected effect of the treatment on a randomly drawn unit from the population, while the ATT gives the expected effect of the treatment on a randomly drawn unit from the population of the treated. The ATT is consequently the parameter that tends to be of interest to policy makers (Heckman et al. 1997).

¹ As a convention, capital letters usually denote random variables, whereas small letters indicate their realisations. For simplicity, population units are usually not indexed by unit indicators unless this is necessary for clarity.
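To make the self-selection problem of Sect. 1 concrete, the following minimal simulation (in Python; all variable names and parameter values are illustrative, not taken from the paper) contrasts the naive treated-control mean difference with the true ATE and ATT when selection into treatment depends on a characteristic that also drives the outcome:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A background characteristic (e.g. education) that raises consumption
# and lowers the chance of a childbearing event.
x = rng.normal(size=n)

# Potential outcomes: a heterogeneous, mostly negative treatment effect.
y0 = 10 + 2 * x + rng.normal(size=n)
y1 = y0 - 1 + 0.5 * x            # effect of treatment = -1 + 0.5 * x

# Self-selection: high-x units are less likely to be treated.
p = 1 / (1 + np.exp(1.0 * x))    # P(D = 1 | x), decreasing in x
d = rng.binomial(1, p)

y = np.where(d == 1, y1, y0)     # observed outcome

ate = (y1 - y0).mean()
att = (y1 - y0)[d == 1].mean()
naive = y[d == 1].mean() - y[d == 0].mean()

print(f"ATE   = {ate:.2f}")      # about -1.0
print(f"ATT   = {att:.2f}")      # about -1.2 (the treated have lower x)
print(f"naive = {naive:.2f}")    # much more negative: selection bias
```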

2.1 Identifying assumptions

Situations where selection depends only on observed characteristics must be distinguished from those where selection also depends on unobserved characteristics. The selection on observables assumption is also known as the UNC and represents the fundamental identifying assumption for a large range of empirical studies:²

Assumption A.1 (Unconfoundedness) $(Y_1, Y_0) \perp D \mid X$

where $\perp$, in the notation introduced by Dawid (1979), indicates independence. Assumption A.1 implies that, after conditioning on the variables influencing both the selection and the outcome, the dependence between potential outcomes and the treatment is cancelled out. Regression and matching techniques, as well as stratification and weighting methods, all rely on this assumption. In regression analysis, it suffices to assume that conditional independence of the potential outcomes from the treatment holds in expected values (see, e.g. Wooldridge 2002, p. 607). That is, we can substitute assumption A.1 with the weaker: $E(Y_1 \mid D, X) = E(Y_1 \mid X)$ and $E(Y_0 \mid D, X) = E(Y_0 \mid X)$. The fundamental idea behind these methods is to compare treated units with control units that are similar in their characteristics. Another assumption, termed overlap, is also required.

Assumption A.2 (Overlap) $0 < P(D = 1 \mid X) < 1$

where $P(D = 1 \mid X)$ is the conditional probability of receiving the treatment given the covariates $X$. Assumption A.2 implies equality of the support of $X$ in the two groups of treated and controls (i.e. $\text{Supp}(X \mid D = 1) = \text{Supp}(X \mid D = 0)$), which guarantees that the ATE is well defined (Heckman et al. 1997). If the assumption does not hold, then it is possible that for some values of the covariates there are no comparable units.

The most common approach to deal with selection on unobservables is to exploit the availability of an IV, a variable assumed to impact the selection into treatment but to have no direct influence on the outcome. The concrete possibility of using an IV method relies, of course, on the availability of such a variable. In practice, instruments are often difficult to find. In this case, a sensitivity analysis becomes very useful because it can be used to assess the importance of a violation of the UNC for the estimated causal effect. Of course, it does not represent an alternative to the IV approach.

² The unconfoundedness assumption is sometimes referred to as the conditional independence or the exogeneity assumption (Imbens 2004).
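Assumption A.2 can be probed directly in the data. Below is a minimal sketch (the function name and the logistic specification are our own illustration, not the paper's code) that estimates the propensity score and inspects the common support by treatment group; units whose score falls outside the overlapping range have no comparable units:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def common_support(X, d):
    """Estimate e(X) = P(D=1|X) and report the overlap region."""
    ps = LogisticRegression(max_iter=1000).fit(X, d).predict_proba(X)[:, 1]
    lo = max(ps[d == 1].min(), ps[d == 0].min())
    hi = min(ps[d == 1].max(), ps[d == 0].max())
    inside = (ps >= lo) & (ps <= hi)
    print(f"common support: [{lo:.3f}, {hi:.3f}]; "
          f"{(~inside).sum()} units outside")
    return ps, inside

# Illustrative data: two confounders, selection depends on both.
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 2))
d = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1]))))
ps, inside = common_support(X, d)
```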

As with methods based on the UNC, IV methods also impose a range of critical identifying assumptions. Let us consider a binary instrument, indicated by $Z$. In randomised settings, the level of the instrument can be seen as the assignment to the treatment, which may differ from the treatment actually taken because of non-compliance. Under SUTVA, we indicate with $D_z$ and $Y_{z,d}$, respectively, the binary potential treatment indicator and the potential outcomes for a unit. The identifying assumptions for the estimation of causal effects using the availability of one IV were clarified by Angrist et al. (1996, AIR in the following). Apart from SUTVA and the randomisation of the instrument, the fundamental assumptions are:

Assumption B.1 (Exclusion Restriction) $Y_{z,d} = Y_d$

Assumption B.2 (Nonzero Average Causal Effect of Z on D) $E[D_1 - D_0] \neq 0$

Assumption B.3 (Monotonicity) $D_{i1} \geq D_{i0}$ for all $i = 1, \ldots, N$.

The exclusion restriction means that the instrument $Z$ impacts $Y$ only through $D$ and corresponds to the validity of the instrument. Assumption B.2 requires that for at least some units the instrument changes the treatment status, and corresponds to the hypothesis of nonzero correlation between the instrument and the endogenous variable (i.e. the relevance of the instrument). The assumption of monotonicity is critical when comparing the IV approach to methods based on the UNC. To see how, we have to characterise units by the way they might react to the level of the instrument. A first group is termed compliers, defined as units that are induced to take the treatment by the instrument: $D_{i1} - D_{i0} = 1$. Other units may not be influenced by the instrument and are defined either as always-takers, where $D_{i1} = D_{i0} = 1$ (they take the treatment whatever the level of the instrument), or never-takers, if $D_{i1} = D_{i0} = 0$ (they always take the control). Finally, we might encounter defiers, units that do the opposite of their assignment status. The monotonicity assumption implies that there are no defiers and is crucial for identification, since otherwise the treatment effect for those who shift from non-participation to participation when $Z$ shifts from 0 to 1 can be cancelled out by the treatment effect of those who shift from participation to non-participation (Imbens and Angrist 1994). Importantly, the monotonicity assumption, like the exclusion restriction and the UNC, is untestable, and its plausibility has to be evaluated in the context of the given application. AIR demonstrate that under the aforementioned assumptions we can only identify the average causal effect calculated on the sub-population of compliers, which is termed the LATE:

$$\text{LATE} = E[Y_1 - Y_0 \mid D_1 - D_0 = 1]. \quad (3)$$
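With a randomised binary instrument, the LATE in (3) is estimated by the Wald ratio, i.e. the effect of $Z$ on $Y$ divided by the effect of $Z$ on $D$ (this anticipates the Wald estimator mentioned in Sect. 2.2.3). A minimal sketch, with made-up compliance shares and effect sizes chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

z = rng.binomial(1, 0.5, size=n)          # randomised binary instrument

# Compliance types: 40% compliers, 30% always-takers, 30% never-takers
# (no defiers, so monotonicity B.3 holds by construction).
t = rng.choice(["c", "a", "n"], size=n, p=[0.4, 0.3, 0.3])
d = np.where(t == "a", 1, np.where(t == "n", 0, z))

# Potential outcomes: effect is 2 for compliers, 1 for the other types,
# so the ATE (1.4) differs from the LATE (2).
effect = np.where(t == "c", 2.0, 1.0)
y = rng.normal(size=n) + d * effect

itt_y = y[z == 1].mean() - y[z == 0].mean()   # effect of Z on Y
itt_d = d[z == 1].mean() - d[z == 0].mean()   # effect of Z on D
print(f"first stage = {itt_d:.2f}")           # about 0.40 (share of compliers)
print(f"Wald LATE   = {itt_y / itt_d:.2f}")   # about 2.0, compliers only
```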

Critically important for empirical work is that, in the case of heterogeneous treatment effects, the LATE is in general different from the ATE and the ATT, which tend to be the parameters of interest. This is because the LATE refers only to the sub-population of compliers, while the ATE and the ATT are defined, respectively, on the whole population and on the sub-population of the treated. Moreover, a serious drawback of the LATE is that the sub-population of compliers is not identifiable from the data. Finally, the estimated LATE depends on the instrument used, because different instruments identify different sub-populations of compliers.

In specific applications the LATE becomes an interesting parameter for policy. Suppose that the policy maker wants to know the (average) causal effect of $D$ on $Y$ when a change in $D$ is obtained by manipulating it through $Z$. In this case, the interest lies in the (average) causal effect of $D$ on $Y$ for units that react to the policy intervention on $Z$ (the compliers). In this situation, however, the policy maker cannot identify who the compliers are, but can only estimate the size of this group. The presumption in such cases is that the average causal effect calculated on the sub-population of units whose behaviour was modified by assignment is likely to be informative about sub-populations that will comply in the future.

2.2 Strategies for the estimation of causal effects

We discuss here three strategies for the estimation of causal effects. The first is based on assumptions A.1 and A.2, and includes regression and PSM. The second strategy consists of combining these methods with a sensitivity analysis. In essence, the sensitivity analysis assesses the robustness of the estimates when we suspect failure of the UNC assumption. The third method is the IV approach. Rather than discussing the technical details of each estimator in depth, we present the general ideas and limitations of the different techniques. For a formal comparison of these methods, we refer to Blundell et al. (2005) and Imbens (2004).

2.2.1 Strategy 1: methods based on the UNC

In the standard multivariate regression model, we assume a linear relationship between the outcome and the independent variables and homogeneity of treatment effects; in fact, in the simplest regression model the treatment variable is not interacted with the covariates and its coefficient is the same for all units. This model constrains the ATE to coincide with the ATT, and if treatment effects are heterogeneous we are not able to estimate the two quantities separately.³ Moreover, if the true model is nonlinear, the OLS estimates of the treatment effects are in general biased. In parametric regression, the overlap assumption is not required insofar as we can be sure that the model is correctly specified. Otherwise, the comparison of treated and control units outside the common support relies heavily on linear extrapolation. Of course, the standard model can be extended and made flexible to overcome these limitations, as the sketch below and the following discussion illustrate.

³ In general, the ATE and the ATT are expected to differ if the distributions of the covariates in the treated and control groups are different and if the treatment interacts with (at least some of) the covariates.
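One such extension interacts the treatment with the covariates (the fully interacted linear model discussed in the next paragraph), which allows the ATE and the ATT to be recovered separately under unconfoundedness. A minimal sketch with invented data (names and values are illustrative):

```python
import numpy as np

def film_effects(y, d, X):
    """Fully interacted linear model: regress y on (1, X, d, d*X).
    Returns the (ATE, ATT) implied by the fitted coefficients."""
    Z = np.column_stack([np.ones_like(d), X, d, d[:, None] * X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    k = X.shape[1]
    c, g = beta[k + 1], beta[k + 2:]          # main effect and interactions
    ate = c + X.mean(axis=0) @ g              # effect averaged over everyone
    att = c + X[d == 1].mean(axis=0) @ g      # effect averaged over the treated
    return ate, att

# Illustrative data with a heterogeneous effect (so ATE != ATT).
rng = np.random.default_rng(3)
X = rng.normal(size=(20_000, 2))
p = 1 / (1 + np.exp(-X[:, 0]))               # selection on observables only
d = rng.binomial(1, p).astype(float)
effect = 1.0 + 0.5 * X[:, 0]
y = X @ np.array([2.0, -1.0]) + d * effect + rng.normal(size=20_000)

print("ATE, ATT:", film_effects(y, d, X))    # roughly (1.0, 1.2)
```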

For example, the common support problem can be circumvented by first estimating the common support and then running the regression conditional on it. Moreover, we can avoid assuming homogeneous treatment effects by including a complete set of interactions between each of the covariates $X$ and the treatment indicator $D$. This gives rise to the so-called fully interacted linear model (FILM in the following; see Goodman and Sianesi 2005). Since in the FILM, unlike in a fully saturated model, covariates are not recoded into qualitative variables, this approach is still parametric with respect to the way continuous covariates enter the regression function and interact with the treatment. Also, the linearity assumption can be avoided by using a non-parametric method, such as a kernel estimator (see Härdle and Linton 1994), which allows the functional form linking the outcome and the independent variables to be determined by the data themselves. Non-parametric methods, however, have computational drawbacks when the set of covariates is large and many of them are multi-valued or, worse, continuous. This problem, known as the curse of dimensionality, is also relevant for matching methods.

A popular way to overcome the dimensionality problem is to implement the matching on the basis of a univariate propensity score (Rosenbaum and Rubin 1983a). This is defined as the conditional probability of receiving the treatment given pre-treatment characteristics: $e(X) \equiv \Pr\{D = 1 \mid X\} = E\{D \mid X\}$. When the propensity scores are balanced across the treatment and control groups, the distributions of all covariates $X$ are balanced in expectation across the two groups (the balancing property of the propensity score). Therefore, matching on the propensity score is equivalent to matching on $X$. Once the propensity score is estimated, several methods of matching are available. The most common ones are kernel (Gaussian and Epanechnikov), nearest neighbour, radius and stratification matching (for a discussion of these methods see Caliendo and Kopeinig 2005; Smith and Todd 2005; Becker and Ichino 2002). Asymptotically, all PSM estimators should yield the same results (Smith 2000), while in small samples the choice of the matching algorithm can be important, and generally a trade-off between bias and variance arises (Caliendo and Kopeinig 2005). As noted by Bryson et al. (2002), it is sensible to try a number of approaches: if they give similar results, the choice is irrelevant; otherwise, further investigation is needed to reveal the source of the disparity. As will be explained in Sect. 4, we adopt this pragmatic approach and assess the sensitivity of the results with respect to the matching method. Consistent with many other previous studies (see, e.g. Smith and Todd 2005), the different estimators yield very similar results (both in terms of point estimates and standard errors). The analysis in Sect. 4 is based on a nearest neighbour matching method, meaning that for each treated (control) unit the algorithm finds the control (treated) unit with the closest propensity score. We use the variant with replacement, implying that we allow a control (treated) unit to be used more than once as a match for units in the treated (control) sample. Among the other methods we tried (nearest neighbour without replacement, k-nearest neighbour, radius and kernel), this approach guarantees the best quality of matches, because only units with the closest propensity score are matched, but at the cost of higher variance (Caliendo and Kopeinig 2005).
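The following sketch implements this nearest neighbour matching on the propensity score, with replacement (a simplified illustration in Python; the function names are ours, and it omits refinements such as calipers and proper standard errors). It returns the matched-pair ATT of equation (4) below:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_pscore(X, d):
    """Propensity score e(X) from a logistic regression."""
    return LogisticRegression(max_iter=1000).fit(X, d).predict_proba(X)[:, 1]

def nn_match_att(y, d, ps):
    """1-nearest-neighbour matching on the propensity score, with
    replacement: each treated unit is matched to the control unit
    with the closest score; returns the ATT of eq. (4)."""
    treated = np.flatnonzero(d == 1)
    controls = np.flatnonzero(d == 0)
    # index m(i) of the closest control for each treated unit
    m = controls[np.abs(ps[treated][:, None]
                        - ps[controls][None, :]).argmin(axis=1)]
    return np.mean(y[treated] - y[m])
```

In practice, a balance check, comparing the covariate means of the treated units with those of their matched controls, would accompany this step before any effect is reported.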
Focusing on the estimation of the ATT, to estimate the treatment effect for a treated unit $i$, the observed outcome $y_{i1}$ is compared to the outcome $y_{j0}$ of the matched unit $j$ in the untreated sample. The ATT estimator can be written as:

$$\widehat{\text{ATT}} = \frac{1}{n_D} \sum_{i : d_i = 1} \left[ y_{i1} - y_{m(i)0} \right], \quad (4)$$

where $n_D$ is the number of treated units that find a match in the untreated group and $m(i)$ indicates the matched control for treated unit $i$.

Under assumptions A.1 and A.2, regression and matching techniques can be used with cross-sectional data to estimate the ATE and the ATT, in which case $Y$, $X$ and $D$ are all measured at the same time. Longitudinal data available for at least two time points offer some important practical advantages. First, one is in a better position to measure covariates before the exposure to the treatment. As is well known, one should only control for covariates that are not affected by the treatment itself (e.g. Rosenbaum 1984). Hence, being able to measure variables before the treatment makes this condition more likely to hold (Imbens 2004). To make this explicit, we indicate the covariates as $X_{t_1}$ and the outcome as $Y_{t_2}$. The treatment indicator, $D$, measures childbearing events between $t_1$ and $t_2$, and the ATT estimator can be written as:

$$\widehat{\text{ATT}} = \frac{1}{n_D} \sum_{i : d_i = 1} \left[ y_{i1,t_2} - y_{m(i)0,t_2} \right].$$

A second advantage is that we can include in the matching set the outcome variable of interest measured before the exposure to the treatment. In our application, where the outcome is the consumption expenditure (see Sect. 4), we include this variable, $Y_{t_1}$, measured at the first wave, in the conditioning set. This reflects the household's level of living standards prior to treatment, and is likely to be relevant both for the probability of experiencing a childbearing event between the two waves and for the consumption expenditure level at the second wave, $Y_{t_2}$. The UNC assumption can then be written more explicitly as:

Assumption A.3 (Unconfoundedness) $(Y_{1t_2}, Y_{0t_2}) \perp D \mid X_{t_1}, Y_{t_1}$

As noted by Athey and Imbens (2006), assumption A.3 implies that individuals in the treatment group should be matched with individuals in the control group with similar (identical, if the matching could be perfect) first-period outcomes, as well as other pre-treatment characteristics, and that their second-period outcomes should be compared. However, perfect matching is not feasible, and matching on the propensity score guarantees that, on average, covariates (including $Y_{t_1}$ in our case) are balanced between the matched treated and control groups. Importantly, taking the difference in the pre- and post-treatment outcomes helps reduce any remaining imbalance in $Y_{t_1}$. This approach is similar in spirit to the bias-correction proposal of Abadie and Imbens (2002) for reducing bias due to residual imbalance in covariates after matching. The fact that the dependent variable is now defined as the difference in the levels of the outcome after and before the treatment implies that the ATT estimator can be written as:

$$\widehat{\text{ATT}} = \frac{1}{n_D} \sum_{i : d_i = 1} \left[ \left( y_{iD,t_2} - y_{iD,t_1} \right) - \left( y_{m(i)U,t_2} - y_{m(i)U,t_1} \right) \right], \quad (5)$$

where the subscripts $D$ and $U$ make explicit that the first two outcomes in (5) are measured on treated units and the other two on untreated units. From formula (5) we can see that if the matching is exact on the variable $Y_{t_1}$, then the estimate obtained using the difference as outcome (5) is exactly equal to that in (4). Even if the matching is not exact, as long as the PSM works well (i.e. we succeed in balancing $Y_{t_1}$), the two estimators are expected to give similar results. It is worth noting that although estimator (5) uses as dependent variable the change rather than the level at time $t_2$, the estimands of interest, namely the ATE and the ATT as defined in (1) and (2), are the same. For example, for the ATT we can note that:

$$\text{ATT} = E(Y_{1t_2} - Y_{0t_2} \mid D = 1) = E[(Y_{1t_2} - Y_{t_1}) - (Y_{0t_2} - Y_{t_1}) \mid D = 1].$$

Another advantage of taking the difference is that we expect the resulting estimator to be more efficient. This is likely to be the case in our application (and, we suspect, in many others), since there will be more heterogeneity in the levels of consumption expenditure at time $t_2$ than in the consumption growth between the two waves. In other words, the variable $Y_{t_2}$ is likely to have a higher variance than $(Y_{t_2} - Y_{t_1})$, although this is not true in general.

A related literature motivates the advantages of considering the difference in the pre- and post-treatment levels of the outcome as a way to improve the robustness of the matching method through the elimination of possible time-invariant unobservables (Heckman et al. 1997; Smith and Todd 2005; Aassve et al. 2007). The resulting estimator is similar to ours, apart from the fact that $Y_{t_1}$ is not included in the set of matching covariates. This estimator is labelled matching-difference-in-differences (MDID) and relies on an identifying assumption that is different from A.3. For example, for the ATT the identifying assumption can be written as:⁴ $(Y_{0t_2} - Y_{0t_1}) \perp D \mid X_{t_1}$. As noted by Athey and Imbens (2006, p. 448), the two assumptions coincide under special conditions imposed on the unobserved components. Otherwise the two identifying strategies, even though similar, are different, and A.3 remains a selection on observables assumption. The choice is a subject-matter issue and depends on what the researcher believes is the best identifying strategy for his/her application. We use A.3 as a starting point and compare treated and control units with similar background characteristics $X$ and initial values of the outcome, instead of relying on an assumption of conditional parallel trends in the outcome as with the MDID. Having maintained an unconfoundedness-type assumption, the discussion in this section also applies to cross-sectional studies. To deal with the possible presence of unobservables, we discuss methods for sensitivity analysis and IV methods.

⁴ If only the ATE is to be identified, the assumption can be stated in a weaker form as mean independence instead of full independence (e.g. Heckman et al. 1997).
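Estimators (4) and (5) differ only in the outcome handed to the matching step. Reusing estimate_pscore and nn_match_att from the sketch above (again an illustration with invented data, not the paper's code), the pre-post version matches on a score that includes the baseline outcome and feeds in the change in the outcome:

```python
import numpy as np

# Illustrative data: the baseline outcome y_t1 drives selection.
rng = np.random.default_rng(4)
n = 10_000
x = rng.normal(size=(n, 2))
y_t1 = 5 + x @ np.array([1.0, 0.5]) + rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(0.5 * (y_t1 - 5))))      # poorer -> more likely treated
y_t2 = y_t1 - 0.8 * d + 0.3 * x[:, 0] + rng.normal(size=n)   # true ATT = -0.8

# Propensity score on pre-treatment covariates INCLUDING y_t1 (assumption A.3).
ps = estimate_pscore(np.column_stack([x, y_t1]), d)

att_levels = nn_match_att(y_t2, d, ps)          # eq. (4): levels at t2
att_change = nn_match_att(y_t2 - y_t1, d, ps)   # eq. (5): pre-post change
print(att_levels, att_change)   # similar point estimates; (5) is usually less noisy
```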

2.2.2 Strategy 2: sensitivity analysis of methods based on the UNC

The UNC becomes implausible once one or more relevant confounders are unobserved. If an instrument is available, one can proceed with the IV estimator discussed in the next sub-section. Several approaches have been proposed in the literature to deal with situations where instruments are not available and where the plausibility of unconfoundedness is doubtful. One approach is to implement indirect tests of the UNC assumption, relying on the estimation of a pseudo causal effect that is known to be zero (Imbens 2004). A first type focuses on estimating the causal effect of the treatment of interest on a variable that is known to be unaffected by it. Another type of test relies on the presence of multiple control groups (Rosenbaum 1987a; Heckman et al. 1997), which arise, for example, when rules for eligibility are in place. The presence of ineligibility rules is also the basis for the bias-correction method proposed by Costa Dias et al. (2008).

An important alternative to the indirect tests is the implementation of sensitivity analyses. The fundamental idea of this approach is to relax unconfoundedness with the aim of assessing how strong an unmeasured variable must be in order to undermine the implications of the matching analysis. If the results are highly sensitive, then the validity of the identifying assumption becomes questionable and alternative estimation strategies must be considered. Different approaches for sensitivity analysis have been proposed in the literature. Rosenbaum and Rubin (1983b) and Imbens (2003) propose methods to assess the sensitivity of ATE estimates in parametric regression models. Here, we apply the approaches suggested by Rosenbaum (1987b) and Ichino et al. (2008, IMN in the following), which do not rely on any parametric model for the estimation of the treatment effects. The underlying hypothesis in all of these approaches is that assignment to treatment may be confounded given the set of observed covariates, but is unconfounded given the observed covariates and an unobservable covariate $U$: $(Y_1, Y_0) \perp D \mid X, U$.

In Rosenbaum's approach, sensitivity is measured using only the relation between the unobserved covariate and the treatment assignment. To describe the approach briefly, we link the probability of receiving the treatment, $\pi$, to the observed characteristics $X$ and an unobserved covariate $U$ through a logistic regression function:

$$\log\left(\frac{\pi}{1 - \pi}\right) = \kappa(X) + \gamma U, \quad \text{with } 0 \leq U \leq 1.$$

Under these assumptions, Rosenbaum shows that the odds ratio between two units $i$ and $j$ with the same $X$ values can be bounded in the following way:

$$\frac{1}{\Gamma} \leq \frac{\pi_i / (1 - \pi_i)}{\pi_j / (1 - \pi_j)} \leq \Gamma, \quad \text{where } \Gamma = e^{\gamma}.$$

If $\Gamma = 1$, unconfoundedness holds and no hidden bias exists. Increasing values of $\Gamma$ imply an increasingly important role for unobservables in the selection into treatment. Rosenbaum suggests progressively increasing the value of $\Gamma$ in order to assess the degree of association required to overturn, or change substantially, the p-values of statistical tests of no treatment effect. If this happens only at high values of $\Gamma$, the results of the analysis based on the UNC are sensitive to the presence of an unobservable only if it is strongly associated with treatment selection. The plausibility of the presence of such an unobservable has to be judged by the researcher, depending on the richness of the information included in the analysis.
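A minimal sketch of this logic uses the sign test on matched-pair differences (a simplified special case of the signed-rank statistics in Rosenbaum 2002; the implementation is ours, not the paper's). Under hidden bias of magnitude $\Gamma$, the probability that the treated unit of a pair has the higher outcome lies between $1/(1+\Gamma)$ and $\Gamma/(1+\Gamma)$, which bounds the p-value of the test of no effect:

```python
import numpy as np
from scipy.stats import binom

def rosenbaum_sign_test_bounds(diffs, gammas=(1.0, 1.5, 2.0, 3.0)):
    """Upper-bound p-values of the sign test on matched-pair
    differences (treated minus control) for several values of Gamma."""
    diffs = np.asarray(diffs)
    diffs = diffs[diffs != 0]            # ties carry no sign information
    n = len(diffs)
    pos = int((diffs > 0).sum())         # pairs where the treated unit is higher
    for g in gammas:
        p_plus = g / (1 + g)             # max P(treated higher) under bias Gamma
        # one-sided upper bound on P(#positive pairs >= pos)
        p_upper = binom.sf(pos - 1, n, p_plus)
        print(f"Gamma = {g:.1f}: upper-bound p-value = {p_upper:.4f}")

# Illustrative matched-pair differences with a positive treatment effect.
rng = np.random.default_rng(5)
rosenbaum_sign_test_bounds(rng.normal(0.4, 1.0, size=200))
```

The value of $\Gamma$ at which the upper-bound p-value first crosses a conventional significance level summarises how strong a hidden bias would have to be to overturn the conclusion.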

Unlike Rosenbaum's, the approach by IMN assesses the sensitivity of point estimates of the ATT under different possible scenarios of deviation from the UNC.⁵ The underlying hypothesis is, as in the previous approaches, that assignment to treatment may be confounded given the set of observable covariates, but is unconfounded given the observed covariates and an unobservable covariate $U$. The procedure can be summarised in the following steps: (1) calculate the ATT using PSM on $X$; (2) simulate a variable $U$ representing a potential unobserved confounder; (3) include $U$ together with $X$ in the matching set and calculate the ATT; (4) repeat steps 2 and 3 several times (e.g. 1,000) and calculate the average ATT, to be compared with the baseline estimate obtained in (1) under the UNC.

In the simulation process, IMN assume that $U$ and the outcome are binary variables. In the case of continuous outcomes, as in our application, a transformation is needed so that the outcome takes the value 1 if it is above a certain threshold (the median, for example) and 0 otherwise; alternatively, one could consider other outcome variables, such as poverty status, which essentially is a dichotomous transformation of consumption expenditure.⁶ However, this transformed variable is only required to simulate the values of $U$ (step 2); it is not used as the outcome variable when estimating the ATT (step 3). Since all the variables involved in the simulation are binary, the distribution of $U$ is specified by four key parameters:

$$p_{kw} = P(U = 1 \mid D = k, Y = w) = P(U = 1 \mid D = k, Y = w, X), \quad k, w = 0, 1 \quad (6)$$

where it is assumed that $U$ is independent of $X$ conditional on $D$ and $Y$. In order to choose the signs of the associations between $U$, $Y_0$ and $D$, IMN note that if $q = p_{01} - p_{00} > 0$ then $U$ has a positive effect on $Y_0$ (conditional on $X$), whereas if $s = p_1 - p_0 > 0$, where $p_k = P(U = 1 \mid D = k)$, then $U$ has a positive effect on $D$. If we set $p_u = P(U = 1)$ and $q = p_{11} - p_{10}$, the four parameters $p_{kw}$ are uniquely identified by specifying the values of $q$ and $s$. Hence, by changing the values of $q$ and $s$ we can produce different scenarios for $U$. For example, if we want to mimic the effect of unobserved ability, we can set $q$ to a positive value (a positive effect on consumption) and $s$ to a negative value (a negative effect on fertility). It is important to note that with this approach we can only choose the signs of the associations of $U$ with $D$ and $Y_0$ through the values of $q$ and $s$. However, for increasingly higher absolute values of $q$ and $s$, the strength of the associations increases. Therefore, the idea is to use this sensitivity analysis as in the Rosenbaum approach. The difference is that now, by progressively increasing the values of both $q$ and $s$, we can increase the levels of association between $U$ and both the treatment and the outcome, instead of the treatment only.

⁵ Under the assumption of an additive treatment effect, Rosenbaum also derives bounds on the Hodges-Lehmann point estimate of the treatment effect (see Rosenbaum 2002 for details).

⁶ For more details on the simulations, see Ichino et al. (2008), and Nannicini (2007) for details on the Stata module sensatt, which implements this method.
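A compact sketch of steps (2)-(4), reusing estimate_pscore and nn_match_att from the matching sketch above (illustrative only; the published analyses use the Stata module sensatt, and all parameter values here are made up):

```python
import numpy as np

def imn_simulated_att(y, d, X, p_kw, rep=200, seed=0):
    """IMN-style sensitivity check: repeatedly simulate a binary
    confounder U with P(U=1 | D=k, Y_bin=w) = p_kw[k][w], add it to
    the matching set, and return the average re-estimated ATT."""
    rng = np.random.default_rng(seed)
    y_bin = (y > np.median(y)).astype(int)   # binarised outcome, used only to draw U
    probs = np.array([[p_kw[k][w] for w in (0, 1)] for k in (0, 1)])
    atts = []
    for _ in range(rep):
        u = rng.binomial(1, probs[d, y_bin])              # step 2
        ps = estimate_pscore(np.column_stack([X, u]), d)  # step 3
        atts.append(nn_match_att(y, d, ps))
    return np.mean(atts)                                  # step 4

# Scenario mimicking unobserved ability: U raises the outcome (q > 0)
# and lowers the probability of treatment (s < 0).
# p_kw[k][w] = P(U=1 | D=k, Y=w); the values are invented.
scenario = {0: {0: 0.3, 1: 0.6}, 1: {0: 0.2, 1: 0.5}}
# att_sim = imn_simulated_att(y_t2, d, np.column_stack([x, y_t1]), scenario)
```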

In order to have an easily interpretable measure of these associations, IMN propose the following parameters:

$$\Gamma = \frac{1}{\text{rep}} \sum_{r=1}^{\text{rep}} \left[ \frac{\Pr(Y = 1 \mid D = 0, U_r = 1, X) / \Pr(Y = 0 \mid D = 0, U_r = 1, X)}{\Pr(Y = 1 \mid D = 0, U_r = 0, X) / \Pr(Y = 0 \mid D = 0, U_r = 0, X)} \right]$$

and

$$\Lambda = \frac{1}{\text{rep}} \sum_{r=1}^{\text{rep}} \left[ \frac{\Pr(D = 1 \mid U_r = 1, X) / \Pr(D = 0 \mid U_r = 1, X)}{\Pr(D = 1 \mid U_r = 0, X) / \Pr(D = 0 \mid U_r = 0, X)} \right]$$

where rep indicates the number of replications. The parameter $\Gamma$ is the average odds ratio from the logit model of $P(Y = 1 \mid D = 0, U, X)$, calculated over the replications of the simulation procedure. It is, in other words, a measure of the effect of $U$ on $Y$, and in this sense an outcome effect. The parameter $\Lambda$ is the average odds ratio from the logit model of $P(D = 1 \mid U, X)$. This is a measure of the effect of $U$ on $D$, and therefore a measure of the selection effect. At each replication of the simulation exercise, together with the two mentioned odds ratios, the ATT is estimated using as covariates the set $X$ and the simulated $U$. The final simulated ATT estimate is the average of the estimates obtained over all the replications.⁷

2.2.3 Strategy 3: IV methods

When the UNC is implausible and an instrument is available, one would naturally implement IV methods. The way they are implemented depends on whether the available instrument can be thought of as randomised or not. In the previous discussion, we assumed that the instrument is randomised, which means that there is no need to control for covariates. In this case, AIR show that the LATE can be estimated simply by the Wald estimator. However, in many applications $Z$ is not randomly assigned and can be confounded with $D$ or with $Y$, or both. The implication is that in these contexts the IV assumptions, such as the exclusion restriction, can usually be thought of as reliable only conditional on a set of covariates. In other words, in these situations $Z$ can be considered unconfounded only conditional on covariates. The conventional approach to accommodating covariates in IV estimation consists of parametric or semi-parametric methods, two-stage least squares being the most common (classic examples include Card (1995) and Angrist and Krueger (1991)). A serious drawback of these methods is that most of them impose additive separability in the error term, which amounts to ruling out unobserved heterogeneity in the treatment effects. One approach that overcomes the strong assumptions used by the aforementioned IV methods is the non-parametric approach suggested by Frölich (2007).

⁷ A complementary approach, proposed by Manski (1990), consists of dropping the UNC assumption entirely and constructing bounds for the ATT that rely on alternative identifying assumptions, for example that the outcome is bounded. IMN show how this approach is related to their sensitivity analysis and argue that non-parametric bounds are too conservative a method, with bound calculations relying on extreme circumstances that are implausible. Moreover, in our application the outcome is continuous and has no natural bounds.

The identifying assumptions in this case are basically the same as in the case of a randomised instrument, but stated in terms of conditioning on covariates. In this way, we can identify the conditional LATE, which is the LATE defined for units with specific observed characteristics. The marginal LATE is identified as follows:⁸

$$\text{LATE} = \frac{\int \left( E[Y \mid X, Z = 1] - E[Y \mid X, Z = 0] \right) dF_X}{\int \left( E[D \mid X, Z = 1] - E[D \mid X, Z = 0] \right) dF_X}. \quad (7)$$

When the number of covariates included in the set $X$ is large, non-parametric estimation of equation (7) becomes difficult, especially in small samples. An alternative is to make use of the aforementioned balancing property of the propensity score, which allows us to substitute the high-dimensional set $X$ in (7) with a univariate variable: $\pi = P(Z = 1 \mid X)$.

⁸ It is important to note that a common support assumption is needed, as stated by Frölich: $\text{Supp}(X \mid Z = 1) = \text{Supp}(X \mid Z = 0)$. However, here we give only some intuition about the assumptions underlying this method. For a detailed and more formal discussion we refer to Frölich's paper.
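Under an unconfounded instrument, both integrals in (7) can be estimated by inverse-probability weighting on the instrument propensity $\pi(X)$, giving a simple plug-in version of the estimand. A sketch (our own illustration; Frölich (2007) develops more refined non-parametric estimators):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def late_with_covariates(y, d, z, X):
    """Plug-in for eq. (7): weight by the instrument propensity
    pi(X) = P(Z=1|X), then take the ratio of weighted contrasts."""
    pi = LogisticRegression(max_iter=1000).fit(X, z).predict_proba(X)[:, 1]
    w = z / pi - (1 - z) / (1 - pi)      # signed inverse-probability weights
    num = np.mean(w * y)                 # integral of E[Y|X,Z=1] - E[Y|X,Z=0]
    den = np.mean(w * d)                 # integral of E[D|X,Z=1] - E[D|X,Z=0]
    return num / den
```

In practice one would trim or inspect extreme values of $\pi(X)$ first, reflecting the common support condition in footnote 8.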

3 Fertility and economic wellbeing and the Vietnamese context

Our application is concerned with estimating the causal effect of fertility on economic wellbeing. The interrelationship between the two has received considerable interest in development studies and the economics literature. The traditional micro-economic framework considers children an essential part of the household's work force, as they generate income. This is especially true for male children. In rural underdeveloped regions of the world, which rely largely on a low level of farming technology and where households have little or no access to state benefits, this argument makes a great deal of sense (Admassie 2002). In this setting households will have a high demand for children. The downside is that a large number of children participating in household production hampers investment in human capital (Moav 2005). There are of course important supply-side considerations in this regard: rural areas in developing countries have poor access to both education and contraceptives, both limiting the extent to which couples are able to make choices about fertility outcomes (Easterlin and Crimmins 1985). As households attain higher levels of income and wealth, they also have fewer children, either due to a quantity-quality trade-off, as suggested by Becker and Lewis (1973), or due to an increase in the opportunity cost of women earning a higher income, as suggested by Willis (1973).

An important aspect with regard to Vietnam is that the country has experienced a tremendous decline in fertility over the past two decades, and at present one can safely claim that the country has completed the fertility transition. The figures speak for themselves: in 1980 the total fertility rate (TFR) was 5.0; in 2003 it was 1.9. Contraceptive availability and knowledge is widespread, and family planning programmes were initiated already in the 1960s (Scornet 2007).⁹

⁹ An important factor in this change was the introduction of the Doi Moi (renewal) policy in the late eighties, which consisted of the replacement of collective farms by the allocation of land to individual households; the legalisation of many forms of private economic activity; the removal of price controls; and the legalisation and encouragement of Foreign Direct Investment (FDI). Since the introduction of Doi Moi, the country has embarked on a remarkable economic recovery, followed by a substantial poverty reduction (Glewwe et al. 2002).

In light of our technical discussion in Sect. 2, the key issue in this application is that fertility decisions can be driven by both observed and unobserved selection. In terms of observables, predicting their effects is relatively straightforward within an economics framework. The key is to understand the drivers behind women's perceived opportunity cost of childbearing. Higher education and labour force participation among women increase women's opportunity cost, producing a negative effect on fertility. They also increase women's income level and hence consumption expenditure. Typically, any increase in the opportunity costs dominates the positive income effect. Increased education among men, and therefore higher earnings, translates into a positive income effect, and hence has a positive effect on fertility (Ermisch 1989). However, empirical analysis shows that there is not necessarily a positive relationship between income and family size (i.e. the number of children), the key explanation being that couples make trade-offs between quantity and quality (Becker and Lewis 1973), especially as the country in question develops and passes through the fertility transition.

As for the unobservables, these can operate through different mechanisms. The key unobserved variables are ability and aspirations, and they play an important role in our application. In general, we would expect those with higher ability or aspirations in terms of work and career to have lower fertility because of their higher opportunity cost. Thus, ability is negatively correlated with fertility but positively associated with consumption expenditure. Moreover, fertility is commonly measured in terms of childbearing events, as we do here. However, childbearing outcomes are the direct result of contraceptive practices, which are typically unmeasured in household surveys. Better knowledge and higher uptake of contraceptives reduce unwanted pregnancies, which reduces fertility. Moreover, unobserved ability is positively associated with contraceptive use, which reinforces the negative effect of ability on fertility. Fertility is of course based on the joint decision of a couple, and not the woman alone. Hence, behind the childbearing outcomes there is also a bargaining process taking place. Again, unobserved ability may play an important role: high-ability women may have stronger bargaining power, either as a result of the ability itself (e.g. they are better negotiators) or through the effect higher ability has on their labour supply and hence earnings. Whereas ability works through different mechanisms, the prediction of its effect is rather clear, in the sense that high ability is associated with lower fertility but higher income and hence consumption expenditure. Consequently, its omission implies a negative bias in the estimation of the effect of fertility on consumption expenditure.

The data we use come from the Vietnam Living Standard Measurement Survey (VLSMS), first surveyed in 1992/1993 with a follow-up in 1997/1998. The longitudinal nature of the data set allows us to measure whether any woman in the household experienced another birth between the two waves. The treatment is then defined as a binary
variable taking the value 1 if the household experiences a childbearing event between the two waves (treated) and 0 otherwise (untreated or control). The outcome of interest is the equivalised consumption expenditure level in the second wave. In the empirical implementation presented in the next section, we control for a range of explanatory variables measured in the first wave. The data otherwise follow the standard format of the World Bank LSMS, including detailed information about education, employment, fertility, expenditure and incomes. The survey also provides detailed community information from a separate community questionnaire. This information is available for the 120 rural communities sampled and consists of data on health, schooling and main economic activities. The availability of this information is important for two reasons. First, characteristics of the communities where households reside are likely to influence both economic wellbeing and fertility and, hence, are potentially relevant confounders. Second, from this information we obtain an interesting IV, represented by the availability of contraceptives in the community.

The conventional approximation of the household's welfare is the household's observed consumption expenditure, which requires detailed information on consumption behaviour and its expenditure pattern (Coudouel et al. 2002; Deaton and Zaidi 2002). The expenditure variables are calculated by the World Bank procedure, which is readily available with the VLSMS. We choose a relatively simple equivalence scale, giving each child aged 0-14 in the household a weight of 0.65 relative to adults.¹⁰ Table 1 shows a simple descriptive analysis highlighting a clear negative association between the number of children and economic wellbeing.

Table 1  Average equivalised household consumption expenditure at the two waves and its growth, by number of children born between the two waves

Number of children     Observations   Average        Average        Average consumption
born between the                      consumption    consumption    growth, 1997-1992
two waves                             in 1992        in 1997
0                      1,232          970            2,436          1,466
1                        581          856            1,892          1,036
2                        182          790            1,755            965
3                         28          571            1,154            583
At least 1               791          832            1,835          1,004
Total                  2,023          916            2,201          1,285

Notes: We consider the number of children of all household members born between the two waves and still alive at the second wave. All consumption measures are valued in dongs, rescaled using 1992 prices, and expressed in thousands of dongs. The 2,023 households represented in the table are selected by taking only households with at least one married woman aged between 15 and 40 in the first wave.

Our choice of covariates is based mainly on dimensions that are important both for the household's standard of living and for fertility behaviour, and hence are potential confounders that have to be included in the conditioning set $X$ to make the UNC plausible. All these variables can theoretically have an impact on change in consumption

¹⁰ We assessed the robustness of the results to the imposed equivalence scale. Results are consistent with those presented here for reasonable equivalence scales. This analysis is available from the authors upon request.