Fertility and the Health of Children: An Application of a Nonparametric Conditional LATE Estimator


Daniel J. Henderson, State University of New York at Binghamton
Daniel L. Millimet, Southern Methodist University
Christopher Parmeter, State University of New York at Binghamton
Le Wang, Southern Methodist University

December 2005

Abstract

Although the theoretical trade-off between the quantity and quality of children is well-established, empirical evidence supporting such a causal relationship is limited. This paper applies a recently developed nonparametric estimator of the conditional local average treatment effect to assess the sensitivity of the quantity-quality trade-off to functional form and parametric assumptions. Using data from the Indonesia Family Life Survey and controlling for the potential endogeneity of fertility, we find evidence supporting the trade-off. Moreover, the magnitude and statistical significance of the results differ from the standard two-stage least squares estimates.

JEL: C14, D10, I12, O12

Keywords: Local Average Treatment Effects, Nonparametric Regression, Health, Human Capital, Fertility

The authors are grateful to Markus Frölich and participants at the Advances in Econometrics conference for helpful comments. GAUSS code used in the paper is available upon request. Corresponding address: Daniel Millimet, Department of Economics, Box 0496, Southern Methodist University, Dallas, TX Tel: (214) Fax: (214) millimet@mail.smu.edu.

1 Introduction

In the usual treatment effects set-up, one is interested in identifying the average impact of a binary variable (the "treatment") on a particular outcome. When the treatment is assigned randomly, the average treatment effect (ATE) is estimated by the difference in means. However, when the treatment is not assigned randomly, and is correlated with unobservables that also impact the outcome of interest, an exogenous variable (the "instrument") is required such that the instrument is correlated with the treatment, but uncorrelated with the outcome conditional on the treatment. With such an instrument and assuming a constant treatment effect, the usual Wald estimator, given by the ratio of the effect of the instrument on the outcome to the effect of the instrument on the probability of treatment, identifies the ATE.

However, the conditions required in this relatively simple framework may break down for two reasons. First, the treatment effect may not be constant across observations. Second, finding an exogenous instrument (i.e., one that is uncorrelated with the outcome conditional only on treatment status) may be unrealistic. In the case of non-constant treatment effects, Imbens and Angrist (1994) introduced the concept of the local average treatment effect (LATE). Under certain conditions (discussed below), Angrist et al. (1996) show that the Wald estimator may be interpreted as the ATE for the subpopulation of observations whose treatment assignment is determined by the instrument, referred to as compliers. In the second case, one solution is to incorporate additional covariates into the model such that an instrument is available that is uncorrelated with the outcome conditional on treatment status and the additional covariates. While adding covariates improves the likelihood that an instrument will be exogenous, the instrument must maintain its correlation with the treatment variable conditional on the covariates as well.
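To make the Wald ratio concrete, the following is a minimal sketch in Python on simulated data. It is not the authors' GAUSS code; the data-generating process, the function name `wald_estimate`, and the constant effect of -0.5 are all assumptions chosen for illustration. It contrasts the naive difference in means by treatment status with the ratio of the reduced-form effect to the first-stage effect.

```python
import numpy as np

def wald_estimate(y, d, z):
    """Effect of Z on Y divided by effect of Z on Pr(D = 1)."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return reduced_form / first_stage

# Hypothetical data: the unobservable u drives both treatment and outcome,
# so treatment is endogenous; the true (constant) effect is -0.5.
rng = np.random.default_rng(0)
n = 200_000
z = rng.integers(0, 2, n)                    # binary instrument
u = rng.normal(size=n)                       # unobserved confounder
d = (2.0 * z + u + rng.normal(size=n) > 1.0).astype(float)
y = -0.5 * d + u + rng.normal(size=n)

naive = y[d == 1].mean() - y[d == 0].mean()  # contaminated by u
wald = wald_estimate(y, d, z)
print(f"naive difference in means: {naive:.3f}")
print(f"Wald estimate:             {wald:.3f}")
```

In this simulated design the naive comparison is pushed upward by the confounder, while the Wald ratio should land close to the assumed effect because the instrument is independent of `u` by construction.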
Assuming one has an instrument satisfying the necessary conditions, the final issue confronted is how one estimates the conditional means required by the Wald estimator. Until recently, parametric or semiparametric methods have been relied on exclusively (see, e.g., Angrist et al. 2000; Yau and Little 2001; Abadie 2003). However, Frölich (2005a) proves the feasibility of nonparametric estimation of the relevant conditional means. In this paper, we review the general method proposed in Frölich (2005a), as well as the actual nonparametric techniques based on Generalized Kernel Estimation developed in Li and Racine (2004) and Racine and Li (2004) used to operationalize Frölich's estimator of the

conditional LATE. We then apply this estimator to test the theoretical quantity-quality trade-off: an increase in the quantity of children in a household has a negative causal effect on the quality of children. Here, the treatment is defined as a household having more than two children (relative to the control of having only two children), and the instrument is an indicator of whether the first two children are of the same gender. The outcome of interest is a measure of child health, weight-for-age, and the covariates included in the model measure several individual and family attributes. The model is estimated using data on roughly 5,400 children aged ten and under from the 2000 wave of the Indonesian Family Life Survey (IFLS). For comparison, we also utilize more traditional Ordinary Least Squares (OLS) and Two-Stage Least Squares (TSLS) estimators.

The results yield two main conclusions. First, the nonparametric conditional LATE estimator reveals a negative conditional LATE of residing in a larger household. Moreover, the effect is statistically and economically significant; in the subpopulation of compliers, going from a household with only two children to one with more than two children causes a decrease in weight-for-age by an average of 1.8 standard deviations. Second, two-stage least squares estimation, based on a linear functional form, yields a larger, but statistically insignificant, point estimate. Thus, the flexibility afforded by the nonparametric approach matters, at least in this example.

The remainder of the paper is organized as follows. Section 2 discusses the nonparametric estimator. Section 3 discusses the application. Section 4 concludes.

2 Nonparametric c-LATE Estimation

2.1 The c-LATE Estimator

We are interested in identifying the causal effect of a binary treatment on an outcome of interest. The fundamental problem with such identification is one of incomplete information (Rosenbaum and Rubin 1983).
While one observes whether the treatment occurs and the outcome conditional on the treatment, the counterfactual is unobserved. Let $Y_i^1$ denote the outcome of observation $i$ if treatment occurs (given by $D_i = 1$) and let $Y_i^0$ denote the outcome if treatment does not occur ($D_i = 0$). Thus, the treatment effect for observation $i$, $\gamma_i$, is given by the difference in the potential outcomes, $Y_i^1 - Y_i^0$. Moreover, if both potential outcomes were observable, the average treatment effect (ATE), $\gamma_{ATE} = E[Y_i^1 - Y_i^0]$, as well as any other summary statistic of the distribution of the treatment effect, would be trivial to estimate. However, given that only $Y_i = D_i Y_i^1 + (1 - D_i) Y_i^0$ is observed for each observation, estimation

of the ATE is no longer straightforward unless treatment assignment is random. Absent data from a randomized experiment, estimators of the ATE (or other mean treatment effect parameters) can be classified into two categories: selection on observables and selection on unobservables. In the former, it is assumed that the econometrician observes a vector of attributes for each observation, $X_i$, such that the distribution of the potential outcomes is independent of treatment assignment conditional on $X$. Formally, this unconfoundedness or conditional independence assumption is given by

$$Y^1, Y^0 \perp D \mid X \tag{1}$$

If (1) is unlikely to hold, one falls into the selection on unobservables case. It is this case we consider.^1

^1 For an introduction to selection on observable estimators, see Wooldridge (2002), Imbens (2004), and Lee (2005).

When (1) does not hold, estimation of mean treatment effect parameters typically requires an instrumental variable (IV), $Z$. When $Z$ is a binary variable and the following conditions hold:

(A1) FIRST-STAGE: $E[D \mid Z]$ is a non-trivial function of $Z$
(A2) MEAN INDEPENDENCE: $E[Y^j_{i,Z_i} \mid Z_i = 0] = E[Y^j_{i,Z_i} \mid Z_i = 1]$, $j = 0, 1$
(A3) CONSTANT TREATMENT EFFECT: $\gamma_i = \gamma \;\; \forall i$

the ATE is given by

$$\gamma_{ATE} = E[Y_i^1 - Y_i^0] = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]} \tag{2}$$

Replacing the expectations in (2) with their sample counterparts results in the usual Wald estimator. In practice, however, the independence assumption, (A2), may not hold unconditionally, and the assumption of a constant treatment effect, (A3), may be unrealistic (Heckman 1997).

When (A3) does not hold, one must distinguish between different subpopulations in terms of how their treatment assignment responds to variation in the instrument. Following Angrist et al. (1996), every observation must belong to one of four types $\tau \in \{n, c, d, a\}$ given the binary nature of $D$ and $Z$:

NEVER-TAKER: $\tau_i = n$ if $D_{0i} = 0$ and $D_{1i} = 0$
COMPLIER: $\tau_i = c$ if $D_{0i} = 0$ and $D_{1i} = 1$
DEFIER: $\tau_i = d$ if $D_{0i} = 1$ and $D_{1i} = 0$

ALWAYS-TAKER: $\tau_i = a$ if $D_{0i} = 1$ and $D_{1i} = 1$

where $D_{ji}$ is the value of $D$ for observation $i$ given $Z_i = j$, $j = 0, 1$. Never- (always-) takers never (always) receive the treatment regardless of the value of the instrument. The treatment assignment of compliers and defiers, on the other hand, is manipulated by the instrument. Thus, when the treatment effect is not constant, one cannot learn anything about the treatment effect for never- and always-takers. However, under the following conditions

(B1) MONOTONICITY (no defiers): $\Pr(D_{1i} \geq D_{0i}) = 1$
(B2) FIRST-STAGE (population of compliers has positive probability): $\Pr(D_{1i} = 1) > \Pr(D_{0i} = 1)$
(B3) UNCONFOUNDED TYPE: $\Pr(\tau_i = t \mid Z_i = 0) = \Pr(\tau_i = t \mid Z_i = 1)$, $t \in \{n, c, a\}$
(B4) MEAN INDEPENDENCE WITHIN SUBPOPULATIONS: $E[Y^0_{i,Z_i} \mid Z_i = 0, \tau_i = t] = E[Y^0_{i,Z_i} \mid Z_i = 1, \tau_i = t]$, $t \in \{n, c\}$; $E[Y^1_{i,Z_i} \mid Z_i = 0, \tau_i = t] = E[Y^1_{i,Z_i} \mid Z_i = 1, \tau_i = t]$, $t \in \{a, c\}$

the ATE for the subpopulation of compliers, also known as the local average treatment effect (LATE), is given by

$$\gamma = E[Y_i^1 - Y_i^0 \mid \tau_i = c] = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]} \tag{3}$$

Replacing expectations with their sample counterparts yields the same estimator as in (2); the only difference is in the interpretation (see Lee (2005) for more details).

When neither (A2) nor (A3) holds, one may identify the conditional LATE (c-LATE), conditional on a vector of observables, $X$, under certain conditions (Angrist and Imbens 1995). The c-LATE evaluated at a particular $x$, $\gamma(x)$, is defined as

$$\gamma(x) = E[Y_i^1 - Y_i^0 \mid X_i = x, \tau_i = c] = \frac{E[Y \mid X = x, Z = 1] - E[Y \mid X = x, Z = 0]}{E[D \mid X = x, Z = 1] - E[D \mid X = x, Z = 0]} \tag{4}$$

The overall c-LATE, $\gamma$, is a weighted average of $\gamma(x)$, where the weights come from the density of $x$ in the subpopulation of compliers; this is given by

$$\gamma = \int \gamma(x) \, dF_{x \mid \tau = c} \tag{5}$$

where $F_{x \mid \tau = c}$ denotes the distribution of $x$ in the subpopulation of compliers. If the following conditions hold:

(C1) MONOTONICITY (no defiers): $\Pr(D_{1i} \geq D_{0i} \mid X) = 1$
(C2) FIRST-STAGE (population of compliers has positive probability): $\Pr(D_{1i} = 1 \mid X) > \Pr(D_{0i} = 1 \mid X)$
(C3) UNCONFOUNDED TYPE: $\Pr(\tau_i = t \mid X_i = x, Z_i = 0) = \Pr(\tau_i = t \mid X_i = x, Z_i = 1)$, $t \in \{n, c, a\}$, $\forall x \in \mathrm{Supp}(X)$
(C4) MEAN INDEPENDENCE WITHIN SUBPOPULATIONS: $E[Y^0_{i,Z_i} \mid X_i = x, Z_i = 0, \tau_i = t] = E[Y^0_{i,Z_i} \mid X_i = x, Z_i = 1, \tau_i = t]$, $t \in \{n, c\}$, $\forall x \in \mathrm{Supp}(X)$; $E[Y^1_{i,Z_i} \mid X_i = x, Z_i = 0, \tau_i = t] = E[Y^1_{i,Z_i} \mid X_i = x, Z_i = 1, \tau_i = t]$, $t \in \{a, c\}$, $\forall x \in \mathrm{Supp}(X)$
(C5) COMMON SUPPORT: $\mathrm{Supp}(X \mid Z = 0) = \mathrm{Supp}(X \mid Z = 1)$

equations (4) and (5) may be estimated either parametrically, semiparametrically, or nonparametrically. Given that estimates of $\gamma(x)$ and $\gamma$ may be sensitive to the choice of functional form assumed or the additive separability imposed in semiparametric models, we utilize a fully nonparametric approach recently developed in Frölich (2005a).

To proceed in the estimation of $\gamma$, we require specific knowledge of the distribution of the covariates for the subpopulation of compliers. However, since the subpopulation of compliers is unknown, $dF_{x \mid \tau = c}$ is unknown. Relying on Bayes' theorem simplifies matters; we can replace $dF_{x \mid \tau = c}$ with

$$\frac{\Pr(\tau = c \mid X = x) \, dF_x}{\Pr(\tau = c)} \tag{6}$$

in (5). Noting that $\Pr(\tau = c \mid X = x) = E[D \mid X = x, Z = 1] - E[D \mid X = x, Z = 0]$ and substituting (4) for $\gamma(x)$, (5) may be rewritten as

$$\gamma = \frac{\int \left( E[Y \mid X = x, Z = 1] - E[Y \mid X = x, Z = 0] \right) f(x) \, dx}{\Pr(\tau = c)} \tag{7}$$

This representation of $\gamma$ still depends on an unknown, $\Pr(\tau = c)$. However, replacing $\Pr(\tau = c) = \int \Pr(\tau = c \mid X = x) f(x) \, dx = \int \left( E[D \mid X = x, Z = 1] - E[D \mid X = x, Z = 0] \right) f(x) \, dx$ in (7) yields

$$\gamma = \frac{\int \left( E[Y \mid X = x, Z = 1] - E[Y \mid X = x, Z = 0] \right) f(x) \, dx}{\int \left( E[D \mid X = x, Z = 1] - E[D \mid X = x, Z = 0] \right) f(x) \, dx} \tag{8}$$

which is nonparametrically identified.
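The step from (5) to (8) can be checked numerically when the covariate is discrete, so that the integrals become sums. The sketch below uses a hypothetical three-valued covariate; every number in it is made up for illustration and none comes from the IFLS data. It verifies that averaging $\gamma(x)$ over the complier distribution of $x$, as in (5) and (6), gives the same value as the ratio in (8).

```python
import numpy as np

# Hypothetical population with a three-valued covariate X (made-up numbers).
f_x = np.array([0.5, 0.3, 0.2])      # Pr(X = x)
m1 = np.array([1.0, 0.4, -0.2])      # E[Y | X = x, Z = 1]
m0 = np.array([1.2, 0.9, 0.1])       # E[Y | X = x, Z = 0]
mu1 = np.array([0.60, 0.70, 0.50])   # E[D | X = x, Z = 1]
mu0 = np.array([0.50, 0.45, 0.40])   # E[D | X = x, Z = 0]

# Share of compliers at each x, and the c-LATE at each x, equation (4).
p_c_given_x = mu1 - mu0
gamma_x = (m1 - m0) / p_c_given_x

# Equations (5)-(6): weight gamma(x) by the complier distribution of x.
p_c = np.sum(p_c_given_x * f_x)
complier_weights = p_c_given_x * f_x / p_c
gamma_weighted = np.sum(gamma_x * complier_weights)

# Equation (8): ratio of the two population-weighted differences.
gamma_ratio = np.sum((m1 - m0) * f_x) / np.sum((mu1 - mu0) * f_x)

print(gamma_weighted, gamma_ratio)
```

The two printed values coincide because the complier share in the denominator of $\gamma(x)$ cancels against the same term in the Bayes weights, which is exactly why (8) no longer requires knowing who the compliers are.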

Define the following conditional mean functions:

$$m_1(x) = E[Y \mid X = x, Z = 1], \qquad m_0(x) = E[Y \mid X = x, Z = 0]$$
$$\mu_1(x) = E[D \mid X = x, Z = 1], \qquad \mu_0(x) = E[D \mid X = x, Z = 0]$$

and let $\hat{m}_1(x)$, $\hat{m}_0(x)$, $\hat{\mu}_1(x)$, and $\hat{\mu}_0(x)$ be their corresponding nonparametric conditional mean estimators. Then, a suitable estimate for (8) is

$$\hat{\gamma} = \frac{\sum_{i=1}^{n} \left[ \hat{m}_1(X_i) - \hat{m}_0(X_i) \right]}{\sum_{i=1}^{n} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \right]} \tag{9}$$

where $n$ is the sample size. This estimator evaluates $\hat{m}_1(x)$, $\hat{m}_0(x)$, $\hat{\mu}_1(x)$, and $\hat{\mu}_0(x)$ at every $X_i$. However, for any particular observation, $Z_i = 0$ or $1$, and so it is reasonable to use the observed values $Y_i$ and $D_i$ as estimates of $E[Y_i \mid X = X_i, Z = z]$ and $E[D_i \mid X = X_i, Z = z]$, respectively, when $Z_i = z$ (due to the expected efficiency gains). The nonparametric average c-LATE estimator, $\hat{\gamma}$, now becomes

$$\hat{\gamma} = \frac{\sum_{i: Z_i = 1} \left[ Y_i - \hat{m}_0(X_i) \right] - \sum_{i: Z_i = 0} \left[ Y_i - \hat{m}_1(X_i) \right]}{\sum_{i: Z_i = 1} \left[ D_i - \hat{\mu}_0(X_i) \right] - \sum_{i: Z_i = 0} \left[ D_i - \hat{\mu}_1(X_i) \right]} \tag{10}$$

which is equivalent to the ratio of two matching estimators (see Lee (2005) for an introduction to matching estimators).

The conditional means in (10) may be estimated using any nonparametric estimator, provided that the estimator satisfies the assumptions given in Frölich (2005a) required for $\hat{\gamma}$ to be $\sqrt{n}$-consistent and asymptotically normal. Frölich (2005a) also gives additional conditions under which $\hat{\gamma}$ achieves its semiparametric efficiency bound. Thus, while power series, polynomial series, splines, and kernel techniques are all viable estimation methods, we utilize recently developed Generalized Kernel Estimation, which we now discuss.

2.2 Generalized Kernel Estimation

Given the type of data typically encountered in applications of program evaluation or general treatments, we utilize Generalized Kernel Estimation developed in Li and Racine (2004) and Racine and Li (2004)

to estimate (10). The benefit of this approach over typical kernel methods is that it allows for smoothing of both continuous and categorical (ordered and unordered) variables, which are common in datasets.

Our objective is to estimate $m_z(x) = E[Y \mid X = x, Z = z]$ and $\mu_z(x) = E[D \mid X = x, Z = z]$ nonparametrically for each $z$. Given that $Z$ is binary, we need to estimate four models: $m_z(x)$ and $\mu_z(x)$, $z = 0, 1$. Each model is estimated using only the appropriate subsample. In other words, $\hat{m}_0(x)$ and $\hat{\mu}_0(x)$ are estimated using the subsample where $Z = 0$; similarly for $\hat{m}_1(x)$ and $\hat{\mu}_1(x)$. Once the four models are estimated, we are able to predict the missing counterfactual for each observation. Specifically, we utilize $\hat{m}_0(x)$ and $\hat{\mu}_0(x)$ to estimate the missing counterfactuals, $E[Y_i \mid X = X_i, Z_i = 0]$ and $E[D_i \mid X = X_i, Z_i = 0]$, respectively, for the subsample where $Z = 1$; for the subsample where $Z = 0$, we utilize $\hat{m}_1(x)$ and $\hat{\mu}_1(x)$. This yields an estimate of the expected change from moving from $Z = 1$ to $Z = 0$, and vice versa.

To obtain each of these estimates, we first use a Nadaraya-Watson (Local-Constant Least-Squares) type estimator, which estimates the conditional expectation of a variable, and then predict the missing counterfactuals. Formally,

$$\hat{m}_0(x_i) = \hat{E}[Y \mid X = x_i, Z = 0] = \frac{\sum_{j: Z_j = 0} y_j \, K\!\left( \frac{x_j - x_i}{h} \right)}{\sum_{j: Z_j = 0} K\!\left( \frac{x_j - x_i}{h} \right)} \tag{11}$$

where $K(\cdot)$ is the kernel function and $h$ is the bandwidth. We then evaluate $\hat{m}_0(x_i)$ for each observation $i$, $i = 1, 2, \ldots, n_1$, where $n_1$ is the number of observations with $Z = 1$. Similarly,

$$\hat{m}_1(x_i) = \hat{E}[Y \mid X = x_i, Z = 1] = \frac{\sum_{j: Z_j = 1} y_j \, K\!\left( \frac{x_j - x_i}{h} \right)}{\sum_{j: Z_j = 1} K\!\left( \frac{x_j - x_i}{h} \right)} \tag{12}$$

and we then evaluate $\hat{m}_1(x_i)$ for each observation $i$, $i = 1, 2, \ldots, n_0$, where $n_0$ is the number of observations with $Z = 0$. The extensions for $\mu_0(x)$ and $\mu_1(x)$ are trivial and are given by

$$\hat{\mu}_0(x_i) = \hat{E}[D \mid X = x_i, Z = 0] = \frac{\sum_{j: Z_j = 0} D_j \, K\!\left( \frac{x_j - x_i}{h} \right)}{\sum_{j: Z_j = 0} K\!\left( \frac{x_j - x_i}{h} \right)} \tag{13}$$

and

$$\hat{\mu}_1(x_i) = \hat{E}[D \mid X = x_i, Z = 1] = \frac{\sum_{j: Z_j = 1} D_j \, K\!\left( \frac{x_j - x_i}{h} \right)}{\sum_{j: Z_j = 1} K\!\left( \frac{x_j - x_i}{h} \right)} \tag{14}$$

respectively.

We purposely skipped the description of the kernel function, $K(\cdot)$, as this is what separates the Generalized Kernel Estimation procedure from the standard model. In the standard model, the kernel function is designed for continuous variables and satisfies several properties. First, for large values of $h^{-1}(x_j - x_i)$, $K\!\left( \frac{x_j - x_i}{h} \right)$ should be small; small weights are assigned to data points which are far away from $x_i$. Second, since the bandwidth $h$ goes to zero as the sample size grows towards infinity, $K\!\left( \frac{x_j - x_i}{h} \right)$ goes to zero for all $x_j \neq x_i$. Finally, the kernel function must integrate to unity. Given these requirements, kernels are frequently chosen to be well-known density functions, with the standard normal being one of the most popular.

Much of this changes when the data are discrete. In order to smooth categorical data, a kernel function that is designed specifically for discrete data must be used. These types of kernels assign weights of unity when the discrete regressors $x_j$ and $x_i$ are identical, and weights which are functions of their associated bandwidth when they differ. Specifically, (a variation of) the Aitchison and Aitken (1976) kernel function for unordered categorical variables equals one if $x_j = x_i$ and $\lambda^u$ otherwise, where $\lambda^u$ is the bandwidth for the unordered discrete regressor. The Wang and Van Ryzin (1981) kernel function for ordered categorical data equals one if $x_j = x_i$ and $(\lambda^o)^{|x_j - x_i|}$ otherwise, where $\lambda^o$ is the bandwidth for the ordered discrete regressor. The difference between the two is that the unordered kernel treats deviations from $x_i$ equally, whereas the ordered kernel weights the values differently depending on the distance between $x_j$ and $x_i$. For instance, a time trend is an example of an ordered discrete variable, whereas a variable identifying separate countries in a cross-country panel data set is an example of an unordered discrete variable.
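The pieces described so far (the kernels for continuous and discrete regressors, the local-constant fits in (11) through (14), and the ratio in (10)) can be put together in a short sketch. The Python code below is an illustration on simulated data and is not the authors' GAUSS implementation. The function names, the data-generating process, and the fixed bandwidths `h=0.3` and `lam=0.2` are assumptions; in particular, the paper selects a separate set of bandwidths for each of the four fits by a data-driven criterion, whereas one hand-picked pair is reused here for brevity. Weights for the continuous and the unordered regressor are combined by multiplication.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density, used for the continuous regressor.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def unordered_kernel(xj, xi, lam):
    # Variation of Aitchison-Aitken: 1 on a match, lam otherwise.
    return np.where(xj == xi, 1.0, lam)

def ordered_kernel(xj, xi, lam):
    # Wang-Van Ryzin: lam ** |xj - xi|, which equals 1 on a match.
    return lam ** np.abs(xj - xi)

def nw_fit(x_train, t_train, x_eval, h, lam):
    """Local-constant estimate of E[t | x] at each row of x_eval.
    Column 0 of x is continuous, column 1 is an unordered category."""
    fitted = np.empty(len(x_eval))
    for i, (xc, xu) in enumerate(x_eval):
        w = (gaussian_kernel((x_train[:, 0] - xc) / h)
             * unordered_kernel(x_train[:, 1], xu, lam))
        fitted[i] = np.sum(w * t_train) / np.sum(w)
    return fitted

def c_late(y, d, z, x, h=0.3, lam=0.2):
    """Equation (10); each fit uses only the opposite-Z subsample."""
    z1, z0 = (z == 1), (z == 0)
    m0 = nw_fit(x[z0], y[z0], x[z1], h, lam)    # counterfactual Y for Z_i = 1
    m1 = nw_fit(x[z1], y[z1], x[z0], h, lam)    # counterfactual Y for Z_i = 0
    mu0 = nw_fit(x[z0], d[z0], x[z1], h, lam)   # counterfactual D for Z_i = 1
    mu1 = nw_fit(x[z1], d[z1], x[z0], h, lam)   # counterfactual D for Z_i = 0
    num = np.sum(y[z1] - m0) - np.sum(y[z0] - m1)
    den = np.sum(d[z1] - mu0) - np.sum(d[z0] - mu1)
    return num / den

# Simulated data (illustrative only): endogenous D, constant effect of -1.
rng = np.random.default_rng(1)
n = 6000
x = np.column_stack([rng.normal(size=n), rng.integers(0, 3, n)])
z = rng.integers(0, 2, n)
u = rng.normal(size=n)                          # unobserved confounder
d = (1.5 * z + 0.5 * x[:, 0] + u > 0.75).astype(float)
y = -1.0 * d + 0.5 * x[:, 0] + 0.3 * (x[:, 1] == 1) + u + rng.normal(size=n)

gamma_hat = c_late(y, d, z, x)
print(f"c-LATE estimate: {gamma_hat:.3f}")
```

The `ordered_kernel` helper is included only to show the Wang and Van Ryzin weighting; the simulated design has no ordered regressor, so `nw_fit` does not call it. Extending `nw_fit` to more regressors would mean multiplying in one kernel term per column.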
For a data set containing continuous, unordered, and ordered discrete variables, $K(\cdot)$ can be constructed using the product kernel (Pagan and Ullah 1999). Here,

$$K\!\left( \frac{x_j - x_i}{h} \right) = \prod_{s=1}^{q} \frac{1}{\lambda^c_s} \, l^c\!\left( \frac{x^c_{sj} - x^c_{si}}{\lambda^c_s} \right) \prod_{s=1}^{r} l^u\!\left( x^u_{sj}, x^u_{si}, \lambda^u_s \right) \prod_{s=1}^{p} l^o\!\left( x^o_{sj}, x^o_{si}, \lambda^o_s \right) \tag{15}$$

where $l^c$, $l^u$, and $l^o$ represent the continuous (Gaussian), unordered (Aitchison and Aitken), and ordered (Wang and Van Ryzin) kernel functions with bandwidths $\lambda^c$, $\lambda^u$, and $\lambda^o$, evaluating the continuous ($x^c$), unordered ($x^u$), and ordered ($x^o$) regressors, respectively.

After choosing the kernel function, the final issue to be resolved is the choice of bandwidths. Because it is believed that the choice of the continuous kernel function matters little in the estimation of the

conditional mean, selection of the bandwidths is considered to be the most salient factor when performing nonparametric estimation. As indicated above, the bandwidth controls the amount by which the data are smoothed. Large values of $h$ will lead to large amounts of smoothing, resulting in low variance, but high bias. Small values of $h$, on the other hand, will lead to less smoothing, resulting in high variance, but low bias. This trade-off is a well-known dilemma in applied nonparametric econometrics and thus we often resort to automatic determination procedures to estimate the bandwidths. Although there exist many selection methods, we utilize Hurvich et al.'s (1998) Expected Kullback-Leibler ($\mathrm{AIC}_c$) criterion.^2

^2 Achieving $\sqrt{n}$-consistency actually requires one to undersmooth in order to reduce the bias terms to be of lower order. Nevertheless, conventional methods, such as the $\mathrm{AIC}_c$ criterion, appear to perform well in finite samples. See Frölich (2004, 2005b).

The basic idea behind the procedure is that we want to estimate the bandwidths by minimizing a particular objective function. Specifically, we wish to minimize

$$\mathrm{AIC}_c(\lambda^c, \lambda^u, \lambda^o) = \log\left( \hat{\sigma}^2 \right) + \frac{1 + \mathrm{tr}(H)/n_z}{1 - [\mathrm{tr}(H) + 2]/n_z} \tag{16}$$

where

$$\hat{\sigma}^2 = \frac{1}{n_z} \sum_{j: Z_j = z} \left( y_j - \hat{m}_z^{-j}(x_j) \right)^2 = \frac{1}{n_z} \, y'(I - H)'(I - H) y$$

and $H$ is given by $\hat{y} = \hat{m}_z(x) = Hy$, or

$$\hat{\sigma}^2 = \frac{1}{n_z} \sum_{j: Z_j = z} \left( D_j - \hat{\mu}_z^{-j}(x_j) \right)^2 = \frac{1}{n_z} \, D'(I - H)'(I - H) D$$

and $H$ is given by $\hat{D} = \hat{\mu}_z(x) = HD$. Here $\hat{m}_z^{-j}(x_j) = H y_{-j}$ and $\hat{\mu}_z^{-j}(x_j) = H D_{-j}$ are the commonly used leave-one-out estimators of $m_z(x)$ and $\mu_z(x)$, respectively.

The $\mathrm{AIC}_c$ criterion depends crucially on the leave-one-out estimators. These estimators are obtained by omitting one observation at a time, estimating the model each time on the sample of size $n_z - 1$, and predicting the expected value of $y$ (or $D$) given the $x_j$'s for the omitted observation. After performing this estimation for each observation in the data, the sum of the squared differences between the true $y$ (or $D$) and the expected value of $y$ (or $D$) is used to calculate $\hat{\sigma}^2$. The set of bandwidths that minimize

the $\mathrm{AIC}_c$ function are those that are utilized in the final estimation. Obviously, as the sample size grows and the number of regressors increases, computation time increases dramatically. However, it is highly recommended that one use these types of techniques as opposed to a rule-of-thumb selection, especially in the presence of discrete data, as no rule-of-thumb selection criterion exists.

Once we have estimated the four separate sets of bandwidths, corresponding to the four conditional expectation estimates, we can estimate $\hat{m}_0(x)$, $\hat{m}_1(x)$, $\hat{\mu}_0(x)$, and $\hat{\mu}_1(x)$ and obtain our estimate of $\hat{\gamma}$ in (10). The standard error of $\hat{\gamma}$ is obtained via bootstrapping (199 repetitions).

3 Application to Fertility and the Health of Children

3.1 Motivation

A long-standing issue in labor and development economics is the so-called quantity-quality trade-off. According to the hypothesis, there is a negative relationship between child outcomes (quality) and the number of children in a household (quantity) (Becker and Tomes 1976; Becker and Lewis 1973; Willis 1973; Becker 1960).^3 The trade-off is typically assumed to originate from parental preferences for equal levels of quality across children (Rosenzweig and Wolpin 1980). Empirical tests of this model concentrate on estimating demand equations for child-specific outcomes (e.g., the level of schooling), where the number of children is one potential influence on demand. Such studies document a negative relationship between sibship size and human capital investments (see, e.g., Conley and Glauber 2005; Glick et al. 2005; Lee 2004; Ahn et al. 1998; Parish and Willis 1993; Hanushek 1992; Knodel and Wongsith 1991; Rosenzweig and Wolpin 1980), although a few find either no effect (Black et al. 2005; Kaestner 1997; Mock and Leslie 1984) or even a positive effect (Qian 2005; Hossain 1990; Chernichovsky 1985; Gomes 1984).
^3 Throughout the text, we use the terms household and family interchangeably. In the data section, we explicitly define our empirical measures.

Empirical assessments of the quantity-quality trade-off have focused mainly on education as a measure of child quality. Here, we use health status for several reasons. First, Thomas and Frankenberg (2002) state that adult stature is largely determined during the fetal and early childhood periods, and Thomas et al. (1990, 1991) note the direct relationship between child anthropometric measures and the probability of survival as well as skill development. Second, researchers have documented a positive

association between adult health and labor market outcomes at both the microeconomic and macroeconomic levels. Specifically, many studies have found a positive impact of height on individual hourly earnings (Strauss and Thomas (1998) provide an excellent review), Fogel (1994) documents the parallel historical increases in height and economic growth, Bloom et al. (2001) document a causal connection between life expectancy and economic growth, and López-Casasnovas et al. (2005) offer a detailed theoretical and empirical account of the linkages between health and economic development (see also Wolpin 1997).

Given the importance of children's health, a small literature has developed investigating its determinants. In many of these studies, household size enters the empirical analysis as an important control, although the estimated relationship is not of primary interest and the issue of causation versus correlation is often ignored. A recent, notable exception is Glick et al. (2005). The authors use Romanian data on twins to isolate the causal effect of fertility on child health and school enrollment, finding sizeable negative effects that increase in magnitude after accounting for the endogeneity of sibship size. Here, we use a binary instrument based on the gender composition of children in the household, as utilized in, for example, Butcher and Case (1994), Angrist and Evans (1998), Cruces and Galiani (2004), Conley and Glauber (2005), and Millimet and Wang (2005).

3.2 Data

The data are obtained from the Indonesian Family Life Survey (IFLS), a large-scale longitudinal survey conducted jointly by RAND and the Center for Population and Policy Studies (CPPS) at the University of Gadjah Mada. The IFLS provides a rich data source based on a sample of households representing about 83% of the Indonesian population living in 13 of the nation's 26 provinces. Three waves exist presently: 1993, 1997, and 2000 (see Strauss et al. (2004a, 2004b) for a complete description of the surveys). We utilize the 2000 wave, which is the most current survey containing physical assessments of children, and form a sample of roughly 5,400 children aged ten and under, with at least one identifiable birth parent in the survey, and who come from a household with at least two children.

We assess the empirical relevance of the trade-off using weight as the basis of our measure of health status, $Y$. However, since children are still growing, comparing anthropometric data from children of different ages is complicated (Vidmar et al. 2004). Thus, we standardize the raw data on weight to the reference population for the child's age and sex utilizing the 1990 British Growth Reference data.
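The text does not spell out how the standardization is computed. As a hedged illustration only, growth references of this kind are commonly distributed as age- and sex-specific L (skewness), M (median), and S (coefficient of variation) parameters, from which a standard deviation score is obtained by a Box-Cox type transformation. The sketch below assumes that approach; the function name `lms_zscore` and the parameter values are hypothetical and are not taken from the 1990 British Growth Reference tables or from the paper.

```python
import math

def lms_zscore(value, L, M, S):
    """Standard deviation score of `value` given LMS reference parameters."""
    if L == 0:
        # Limiting case of the Box-Cox transformation.
        return math.log(value / M) / S
    return ((value / M) ** L - 1.0) / (L * S)

# Hypothetical reference parameters for one age-sex cell (made-up values).
L_ref, M_ref, S_ref = -0.2, 20.0, 0.12

z_at_median = lms_zscore(20.0, L_ref, M_ref, S_ref)   # child at the median
z_light = lms_zscore(16.0, L_ref, M_ref, S_ref)       # child below the median
print(z_at_median, z_light)
```

A child at the reference median scores zero, and lighter children receive negative scores, so the outcome is comparable across ages and sexes in units of reference-population standard deviations.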

This is a frequently used measure of health, and captures persistent long-term malnutrition as well as short-term health shocks.^4

The quantity of children is defined as the number of children belonging to a given set of parents. Assuming that couples take into account their spouses' fertility history, this definition includes any children from previous marriages. For example, if one (or both) parents were previously married and entered the current marriage with children from the previous union, then these children are used when computing the number of siblings. Our definition excludes children belonging to other couples who may share a common residence. Finally, our definition includes children belonging to the couple, but not currently residing in the household, if the child resided in the household during the 1993 or 1997 wave.

Once the number and identity of siblings is established, we create the binary treatment variable, $D$, equal to one if the household has more than two children and zero otherwise (recall, the sample is limited to households with at least two children). To estimate $\gamma$, we define $Z$ based on the gender of children; $Z$ is equal to one if the first two children are of the same sex (zero otherwise).

To obtain the conditional LATE, we condition on several unordered and ordered discrete variables, as well as continuous variables. While the sex composition of children seems at first glance to be uncorrelated with all other potential determinants of child health, Angrist and Evans (1998) show that it may be correlated with the gender of the first two children. Moreover, even in randomized experiments, controlling for important potentially confounding variables may be advisable since the randomization only balances the confounders in expectation (Imai and van Dyk 2004).
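The definitions of the treatment and the instrument translate directly into code. The sketch below is a hypothetical helper, not part of the authors' data pipeline: the function name and the input format (a list of the children's sexes in birth order) are assumptions made for illustration.

```python
def treatment_and_instrument(child_sexes):
    """Return (D, Z) for one set of parents.

    child_sexes: sexes of the couple's children in birth order,
                 e.g. ["F", "F", "M"] (hypothetical input format).
    D = 1 if there are more than two children, 0 otherwise.
    Z = 1 if the first two children are of the same sex, 0 otherwise.
    """
    if len(child_sexes) < 2:
        raise ValueError("sample is limited to households with at least two children")
    d = int(len(child_sexes) > 2)
    z = int(child_sexes[0] == child_sexes[1])
    return d, z

print(treatment_and_instrument(["F", "F", "M"]))  # three children, first two girls
print(treatment_and_instrument(["M", "F"]))       # two children, mixed sexes
```

Only the first two births enter $Z$, which is why the sexes of later children play no role in the instrument.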
^4 Cogill (2003, p. 11) notes that weight-for-age reflects both past (chronic) and/or present (acute) undernutrition. See also Thomas et al. (1991, 1996).

Thus, we condition on the following variables:

Unordered Discrete: gender of the child, gender of the first child, and province (North Sumatra, West Sumatra, South Sumatra, Lampung, Jakarta, West Java, Central Java, Yogyakarta, East Java, Bali, West Nusa Tenggara, South Kalimantan, and South Sulawesi);

Ordered Discrete: mother's education (less than primary school, junior high school, senior high school, university or other), father's education (less than primary school, junior high school, senior high school, university, and other), birth order;

Continuous: age of the child in months, mother's weight, and father's weight.^5,6

^5 Missing values of mother's and father's weight are replaced with sample means. University and Other are pooled for mother's education due to sample size in the cells. Birth order is coded as 1, 2, 3, 4, and 5 or more for the same reason. Finally, gender of the second child is excluded; otherwise the common support condition (C5) would be violated.

^6 Although the sample is restricted to children aged ten years and under, the age in months of some children in the sample indicates they are older than ten. This discrepancy arises because the sample is chosen based on the answer to the question concerning the child's age in years at the time of the 2000 wave. This question is answered by a household member older than age 18. To more carefully control for age in the analysis, we wish to measure the child's age in months. To do so, we utilize survey information on the exact month and year of the child's birth and the month and year of the survey. Discrepancies arise due to the fact that many individuals do not recall their exact birthdate. Strauss et al. (2004b, p. 23) state: "In Indonesia, as in many developing countries, however, not everyone knows his/her birthdate or age accurately. Therefore, reported birthdate and ages across waves do not always match for a respondent, and there may even be discrepancies between books within a wave."

The inclusion of parental weight is noteworthy. As noted in Thomas et al. (1990, 1996), parental health status, as reflected in anthropometric measures, may have a direct effect on child health through its impact on birthweight. In addition, parental measures may partially capture household unobservables, lending further credibility to the use of $Z$ as a valid instrument.

Table 1 presents summary statistics. On average, children in a household with two children are healthier (in terms of standardized weight), consonant with the quantity-quality trade-off, although they are also nearly a year younger. Table 2 presents a cross-tabulation of the treatment variable and the instrument.

3.3 Results

Results are presented in Table 3. For comparison, OLS and TSLS results are also displayed. The OLS results indicate that moving from a two-child household to one with more than two children is associated with a small, statistically significant reduction in weight-for-age ($\gamma = -0.256$, s.e. $= 0.040$). Treating fertility as endogenous, but excluding the control variables, we find a much larger adverse impact of fertility on weight-for-age ($\gamma = -1.635$, s.e. $= 0.793$). This corresponds to equation (2). Moreover, we also find that the coefficient on the instrument is positive and statistically significant in the first-stage regressions, as suggested by the results in Table 2 ($F = 14.19$, $p = 0.00$).

When we re-estimate the OLS and TSLS models including the control variables, the OLS estimate is cut roughly in half, but remains statistically significant ($\gamma = -0.145$, s.e. $= 0.055$). The TSLS estimate, on the other hand, becomes even larger in magnitude, but is no longer statistically significant ($\gamma = -2.718$, s.e. $= 1.736$). However, the first-stage continues

15 to indicate a strong statistical effect of the instrument on the treatment (F =6.96, p =0.01). Finally, our nonparametric estimate of the conditional LATE, corresponding to equation (5), is similar to the TSLS estimate, but is now statistically significant at the p < 0.05 level (γ = 2.358, s.e. = 0.936). This represents an effect approximately equal to 1.8 standard deviations. Thus, for the subpopulation of households whose fertility decisions are influenced by the instrument, we find strong evidence of the quantity-quality trade-off. Furthermore, given the differences in magnitude and significance in the TSLS and Generalized Kernel estimates, this suggests the linear functional form assumptions imposed in the former yield moderately misleading results. 4 Conclusion Applying recent nonparametric advances, we revisit the issue of a negative causal effect of sibship size on the quality of children, where quality is measured by children s weight-for-age. The methodology allows for heterogeneous treatment effects as well as unordered discrete, ordered discrete, and continuous covariates in the model. The results indicate two main conclusions. First, the nonparametric estimator reveals a negative conditional LATE of additional siblings. Moreover, the effect is statistically and economically significant; in the subpopulation of compliers, going from a household with only two children to one with more than two children causes a decrease in weight-for-age by an average of 1.8 standard deviations. Second, two-stage least squares estimation, based on a linear functional form, yields a moderately larger, but statistically insignificant, point estimate. Thus, the flexibility afforded by the nonparametric approach matters, at least in this application. 14
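The significance statements in Section 3.3 can be checked informally from the reported coefficients and standard errors. The sketch below is only an illustration: it assumes a standard normal reference distribution for each ratio (including the bootstrap-based Generalized Kernel standard error), and it takes the signs of the coefficients as negative because the text describes each effect as a reduction. The dictionary keys and the helper name `two_sided_p` are hypothetical labels, not taken from the paper.

```python
import math

# (estimate, standard error) pairs as reported in Section 3.3.
# Assumption: signs are negative, following the text's description
# of each effect as a reduction in weight-for-age.
reported = {
    "OLS, no controls": (-0.256, 0.040),
    "TSLS, no controls": (-1.635, 0.793),
    "OLS, with controls": (-0.145, 0.055),
    "TSLS, with controls": (-2.718, 1.736),
    "Generalized kernel, with controls": (-2.358, 0.936),
}


def two_sided_p(t):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


summary = {}
for name, (estimate, std_err) in reported.items():
    t_ratio = estimate / std_err
    summary[name] = (t_ratio, two_sided_p(t_ratio))
    print(f"{name:36s} t = {t_ratio:6.2f}   p = {summary[name][1]:.3f}")
```

Under these assumptions, the TSLS estimate with controls has a ratio below 1.96 in absolute value, while the Generalized Kernel estimate exceeds it, which matches the pattern described in the text.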

References

[1] Abadie, A. (2003), Semiparametric Instrumental Variable Estimation of Treatment Response Models, Journal of Econometrics, 113,
[2] Ahn, T.S., J. Knodel, D. Lam, and J. Freidman (1998), Family Size and Children's Education in Vietnam, Demography, 35,
[3] Aitchison, J. and C.G.G. Aitken (1976), Multivariate Binary Discrimination by Kernel Method, Biometrika, 63,
[4] Angrist, J. and W. Evans (1998), Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size, American Economic Review, 88,
[5] Angrist, J., K. Graddy, and G.W. Imbens (2000), The Interpretation of Instrumental Variables Estimators in Simultaneous Equations Models with an Application to the Demand for Fish, Review of Economic Studies, 67,
[6] Angrist, J. and G.W. Imbens (1995), Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity, Journal of the American Statistical Association, 90,
[7] Angrist, J.D., G.W. Imbens, and D.B. Rubin (1996), Identification of Causal Effects Using Instrumental Variables, Journal of the American Statistical Association, 91,
[8] Becker, G.S. (1960), An Economic Analysis of Fertility, in Demographic and Economic Change in Developed Countries, Universities-National Bureau Conference Series, No. 11, Princeton, NJ: Princeton University Press.
[9] Becker, G.S. and H.G. Lewis (1973), On the Interaction Between the Quantity and Quality of Children, Journal of Political Economy, 82, S
[10] Becker, G.S. and N. Tomes (1976), Child Endowments and the Quantity and Quality of Children, Journal of Political Economy, 84, S
[11] Black, S.E., P.J. Devereux, and K.G. Salvanes (2005), The More the Merrier? The Effect of Family Composition on Children's Education, Quarterly Journal of Economics, 120,

[12] Bloom, D.E., D. Canning, and J. Sevilla (2001), The Effect of Health on Economic Growth: Theory and Evidence, NBER Working Paper No
[13] Butcher, K.F. and A. Case (1994), The Effect of Sibling Sex Composition on Women's Education and Earnings, Quarterly Journal of Economics, 109,
[14] Chernochovsky, D. (1985), Socioeconomic and Demographic Aspects of School Enrollment and Attendance in Rural Botswana, Economic Development and Cultural Change, 33,
[15] Cogill, B. (2003), Anthropometric Indicators Measurement Guide, Food and Nutrition Technical Assistance Project, Academy for Educational Development, Washington, D.C.
[16] Conley, D. and R. Glauber (2005), Parental Education Investment and Children's Academic Risk: Estimates of the Impact of Sibship Size and Birth Order from Exogenous Changes in Fertility, NBER Working Paper
[17] Cruces, G. and S. Galiani (2004), Fertility and Female Labor Supply in Latin America: New Causal Evidence, unpublished manuscript, London School of Economics.
[18] Fogel, R. (1994), Economic Growth, Population Theory and Physiology: the Bearing of Long-Term Processes on the Making of Economic Policy, American Economic Review, 84,
[19] Frölich, M. (2004), Finite Sample Properties of Propensity-Score Matching and Weighting Estimators, Review of Economics and Statistics, 86,
[20] Frölich, M. (2005a), Nonparametric IV Estimation of Local Average Treatment Effects with Covariates, Journal of Econometrics, forthcoming.
[21] Frölich, M. (2005b), Matching Estimators and Optimal Bandwidth Choice, Statistics and Computing, 15,
[22] Glick, P., A. Marini, and D.E. Sahn (2005), Estimating the Consequences of Changes in Fertility on Child Health and Education in Romania: An Analysis Using Twins Data, Cornell University Food and Nutrition Policy Program Working Paper No
[23] Gomes, M. (1984), Family Size and Educational Attainment in Kenya, Population and Development Review, 10,

[24] Hanushek, E. (1992), The Trade-off Between Child Quantity and Quality, Journal of Political Economy, 100,
[25] Heckman, J. (1997), Instrumental Variables - A Study of Implicit Behavioral Assumptions Used in Making Program Evaluations, Journal of Human Resources, 32,
[26] Hurvich, C.M., J.S. Simonoff, and C.-L. Tsai (1998), Smoothing Parameter Selection in Nonparametric Regression Using an Improved Akaike Information Criterion, Journal of the Royal Statistical Society, Series B, 60,
[27] Imai, K. and D.A. van Dyk (2004), Causal Inference With General Treatment Regimes: Generalizing the Propensity Score, Journal of the American Statistical Association, 99,
[28] Imbens, G.W. (2004), Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review, Review of Economics & Statistics, 86, 4-29
[29] Imbens, G.W. and J. Angrist (1994), Identification and Estimation of Local Average Treatment Effects, Econometrica, 62,
[30] Kaestner, R. (1997), Are Brothers Really Better? Sibling Sex Composition and Educational Achievement Revisited, Journal of Human Resources, 32,
[31] Knodel, J. and M. Wongsith (1991), Family Size and Children's Education in Thailand: Evidence from a National Sample, Demography, 28,
[32] Lee, J. (2004), Sibling Size and Investment in Children's Education: An Asian Instrument, IZA Discussion Paper No
[33] Lee, M.-J. (2005), Micro-Econometrics for Policy, Program, and Treatment Effects, Oxford: Oxford University Press.
[34] Li, Q. and J. Racine (2004), Cross-Validated Local Linear Nonparametric Regression, Statistica Sinica, 14,
[35] López-Casasnovas, G., B. Rivera, and L. Currais (2005), Health and Economic Growth: Findings and Policy Implications, Cambridge, MA: MIT Press.

[36] Millimet, D.L. and L. Wang (2005), Is the Quantity-Quality Trade-off Really a Trade-off for All? Southern Methodist University, Department of Economics Working Paper No
[37] Mock, P.R. and J. Leslie (1984), Childhood Malnutrition and Schooling in the Terai Region of Nepal, Journal of Development Economics, 20,
[38] N ©, Nonparametric software by Jeff Racine (
[39] Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge: Cambridge University Press.
[40] Parish, W. and R. Willis (1993), Daughters, Education, and Family Budgets: Taiwan Experiences, Journal of Human Resources, 28,
[41] Qian, N. (2005), Quantity-Quality: The Positive Effect of Family Size on School Enrollment in China, unpublished manuscript, Department of Economics, MIT.
[42] Racine, J. and Q. Li (2004), Nonparametric Estimation of Regression Functions with Both Categorical and Continuous Data, Journal of Econometrics, 119,
[43] Rosenbaum, P. and D. Rubin (1983), The Central Role of the Propensity Score in Observational Studies on Causal Effects, Biometrika, 70,
[44] Rosenzweig, M. and K.I. Wolpin (1980), Testing the Quantity-Quality Fertility Model: The Use of Twins as a Natural Experiment, Econometrica, 48,
[45] Rosenzweig, M. and K.I. Wolpin (2000), Natural Natural Experiments in Economics, Journal of Economic Literature, 38,
[46] Strauss, J., K. Beegle, B. Sikoki, A. Dwiyanto, Y. Herawatt, and F. Witoelar (2004a), The Third Wave of the Indonesia Family Life Survey: Overview and Field Report, Rand Labor and Population Working Paper series.
[47] Strauss, J., K. Beegle, B. Sikoki, A. Dwiyanto, Y. Herawatt, and F. Witoelar (2004b), User's Guide for the Indonesia Family Life Survey, Wave 3, Rand Labor and Population Working Paper series.

[48] Strauss, J. and D. Thomas (1998), Health, Nutrition and Economic Development, Journal of Economic Literature, 36,
[49] Thomas, D. and E. Frankenberg (2002), Health, Nutrition and Prosperity: a Microeconomic Perspective, Bulletin of the World Health Organization, 80,
[50] Thomas, D., V. Lavy, and J. Strauss (1996), Public Policy and Anthropometric Outcomes in the Côte d'Ivoire, Journal of Public Economics, 61,
[51] Thomas, D., J. Strauss, and M. Henriques (1990), Child Survival, Height for Age and Household Characteristics in Brazil, Journal of Development Economics, 33,
[52] Thomas, D., J. Strauss, and M. Henriques (1991), How Does Mother's Education Affect Child Height?, Journal of Human Resources, 26,
[53] Vidmar, S., J. Carlin, K. Hesketh and T. Cole (2004), Standardizing Anthropometric Measures in Children and Adolescents with New Functions for egen, Stata Journal, 4,
[54] Wang, M.C. and J. Van Ryzin (1981), A Class of Smooth Estimators for Discrete Estimation, Biometrika, 68,
[55] Willis, R.J. (1973), A New Approach to the Economic Theory of Fertility Behavior, Journal of Political Economy, 81, S
[56] Wolpin, K.I. (1997), Determinants and Consequences of the Mortality and Health of Infants and Children, Chapter 10 in M.R. Rosenzweig and O. Stark (eds.) Handbook of Population and Family Economics, Vol. 1A, Elsevier Science B.V.
[57] Wooldridge, J.M. (2002), Econometric Analysis of Cross Section and Panel Data, Cambridge, MA: MIT Press.
[58] Yau, L. and R. Little (2001), Inference for the Complier-Average Causal Effect from Longitudinal Data Subject to Noncompliance and Missing Data, with Application to a Job Training Assessment for the Unemployed, Journal of American Statistical Association, 96,

Table 1. Summary Statistics.

Columns: Mean, SD, Min, Max, and Obs, reported separately for households with more than two children and households with two children.

Variables:
Weight for age (z-score)
First two children are same sex (1 = yes)
Age in months
Gender (1 = male)
First Child's Gender (1 = male)
Birth Order
Father's Education: Less than elementary school; Junior high school; Senior high school; University; Other
Mother's Education: Less than elementary school; Junior high school; Senior high school; University or Other
Father's Weight
Mother's Weight

Notes: Data from the Indonesia Family Life Survey, wave 3. Appropriate sample weights utilized. Additional control indicating residence in one of 13 provinces not shown.

Table 2. Cross-Tabulation of Treatment Variable and Instrument.

Layout: treatment (D) by instrument (Z), with row and column totals.

Note: D equals one if the household has more than two children; zero otherwise. Z equals one if the first two children are of the same gender; zero otherwise.

Table 3. Estimates of Average Treatment Effects: OLS, Two-Stage Least Squares, and Generalized Kernel Estimation. Outcome: Weight-for-Age.

Model                                                        Coefficient   SE
OLS (No Controls): ATE                                       -0.256*       (0.040)
Two-Stage Least Squares (No Controls): LATE                  -1.635*       (0.793)
    F-test                                                   F = 14.19 [p = 0.00]
OLS (With Controls): Conditional ATE                         -0.145*       (0.055)
Two-Stage Least Squares (With Controls): Conditional LATE    -2.718        (1.736)
    F-test                                                   F = 6.96 [p = 0.01]
Generalized Kernel: Conditional LATE                         -2.358*       (0.936)

NOTES: Control variables include those listed in Table 1, where birth order, parental education, and province are converted into multiple dummy variables. F-test refers to the test of significance of the instrument(s) in the first stage. Appropriate sample weights used. Robust standard errors used for OLS and TSLS estimates; bootstrap standard errors used for the GK estimate (199 repetitions). * indicates statistical significance at the 95% level.
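The contrast between the OLS and instrumental-variable rows of Table 3 can be illustrated with a small simulation. The sketch below is not the paper's Generalized Kernel estimator and uses no covariates; it is the plain Wald ratio for a binary instrument applied to simulated data. Everything about the data-generating process (the effect of -1.0, the coefficient 0.6 on the instrument, the variable names, and the helpers `wald_estimate` and `ols_slope`) is an assumption made purely for illustration.

```python
import numpy as np


def wald_estimate(y, d, z):
    """Effect of the instrument on the outcome divided by its effect on treatment."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return reduced_form / first_stage


def ols_slope(y, d):
    """Slope from a simple regression of the outcome on the treatment."""
    y, d = np.asarray(y, dtype=float), np.asarray(d, dtype=float)
    return np.cov(d, y)[0, 1] / np.var(d, ddof=1)


rng = np.random.default_rng(0)
n = 200_000

z = rng.integers(0, 2, n)   # hypothetical binary instrument (e.g. same-sex first two children)
u = rng.normal(size=n)      # unobservable that drives both treatment and outcome
v = rng.normal(size=n)      # idiosyncratic shock to the treatment decision

# Treatment: the instrument shifts everyone weakly toward treatment (monotonicity).
d = (0.6 * z + u + v > 0).astype(int)

# Outcome: assumed constant treatment effect of -1.0, confounded through u.
true_effect = -1.0
y = true_effect * d + 0.8 * u + rng.normal(size=n)

ols = ols_slope(y, d)
wald = wald_estimate(y, d, z)
print(f"OLS slope:     {ols:.3f}")
print(f"Wald estimate: {wald:.3f}")
```

In this simulated setting the OLS slope is pulled toward zero by the unobservable, while the Wald ratio stays close to the assumed effect, mirroring the direction of the gap between the OLS and TSLS rows.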
