Bias from Censored Regressors


Roberto Rigobon and Thomas M. Stoker

September 2005

Abstract

We study the bias that arises from using censored regressors in the estimation of linear models. We present results on bias in OLS regression estimators with exogenous censoring, and in IV estimators when the censored regressor is endogenous. Bound censoring, such as top- and bottom-coding, results in expansion bias, or estimated effects that are too large. Independent random censoring results in bias that varies with the estimation method: attenuation bias in OLS estimators and expansion bias in IV estimators. We note how large biases can result when there are several regressors, and how that problem is particularly severe when a 0-1 variable is used in place of a continuous regressor.

1. Introduction

When the values of the dependent variable of a linear regression model are bounded and censored, the OLS estimates of the regression coefficients are biased. This fact is a standard lesson covered in econometrics textbooks, and it has stimulated a great deal of work on how to estimate coefficients when there is a censored dependent variable. On the other hand, even though the same is true when the values of the regressors are censored (namely, that estimates of coefficients are biased), there has been very little study of this phenomenon in the literature. This is a little odd, because in practice one appears to encounter censoring in regressors, or independent variables, as often as, or more often than, censoring in dependent variables. Moreover, as we show in this paper, the biases implied by censored regressors can be very insidious for practical work: in many cases expansion bias occurs, so that estimated effects are larger than the associated true effects. When one is trying to discover the most important influences in an empirical problem, having estimates that are too large can be at least as troublesome as having estimates that are too small or of the wrong sign.
Sloan School of Management, MIT, 50 Memorial Drive, Cambridge, MA 02142 USA. We are grateful for valuable comments from several seminar audiences, and want to specifically thank Elie Tamer, Mitali Das, Jinyong Hahn, Jerry Hausman and Whitney Newey.

In this paper we present many results on bias from estimating with censored regressors in a linear model.[1] We have results for various censoring structures, including our two primary examples of independent random censoring and censoring to an upper or lower bound. We cover OLS estimators for the case where the censored regressor is exogenous, and IV estimators for the case where the censored regressor is endogenous. We derive explicit formulae and illustrate the different biases graphically for bivariate models, and we cover the often severe transmission of bias that occurs when there are several regressors. While the majority of the exposition focuses on single-value censoring, we close with some results on the biases that arise when a 0-1 variable is used in place of a continuous regressor.

Censoring is very prevalent in observed data. There are several sources of censoring that are useful to keep in mind. One source is pure measurement error, albeit of a nonstandard form. Consider how often variables are observed in ranges, including unlimited top and bottom categories. For instance, observed household income is often recorded in increments of one thousand or five thousand dollars, and may have a top-coded response of, say, "$100,000 and above." Nonresponses are sometimes recorded at a bound value. For instance, household financial wealth (e.g. stock holdings) may be recorded with a lower bound of zero, and some of those zero values may be genuine zeros or may be nonresponses.

A second source of censoring occurs because observed data do not match the economic concept of interest, and are a censored version of it. Suppose we are interested in the impact of cash constraints on firm investment behavior. One cannot observe how cash-constrained a firm actually is, only imperfect reflections of it, such as the fact that dividends are paid or new debt was recently issued. Consider observed dividends as a measure of cash constraints.
If dividends are large and positive, the firm is very likely not cash constrained. However, dividends are never observed to be negative. Zero values could represent either a firm with minimal cash constraints or a firm that is struggling with major cash constraints. As such, observed dividends are a censored version of "lack of cash constraints," which is the concept of interest to a study of firm investment decisions.

While reflective of the above sources, it is useful to note how dummy (0-1) variables fit into our discussion. Consider the classical problem of returns to education. Suppose individual log-wages are regressed on a 0-1 variable indicating whether the individual has a college degree. Suppose that the appropriate measure of education is the number of years in school. Alternatively, suppose that there are separate discernible returns to high school, college (possibly varying with major or specialization) and post-graduate study. In either case, the 0-1 variable that indicates a college degree is a censored version of the appropriate education measure. This raises a large number of issues for the interpretation of the estimated coefficients of 0-1 variables, as well

[1] Initial versions of a few of the results in this paper have previously been reported in the unpublished working papers Rigobon and Stoker (2005a, 2005b). All such results have been refined and updated, and many are extended in this version.

as coefficients of other variables in the equation. Here we relate the bias from 0-1 censoring to our discussion of independent censoring and bound censoring. We note how biases can force the coefficient of a 0-1 variable to have the wrong sign, as well as generate large expansion bias in the coefficients of correlated regressors.

The econometrics literature has essentially no coverage of biases from censored regressors, and relatively little coverage of estimation with censored regressors. An exception is Manski and Tamer (2002), who study identification and consistent estimation with interval data. The statistical literature on missing data problems covers some situations of censored regressors, with most results applicable to data missing at random. See the survey by Little (1992) and Little and Rubin (2002) for coverage of this large literature, and Ridder and Moffitt (2003) for a survey of the related literature on data combination.[2] Bound censoring (top-coding and bottom-coding) is a nonignorable data coarsening in the sense of Heitjan and Rubin (1990, 1991).

We have collected many bias results in one paper to point out both some similarities across procedures and some striking differences. For example, with bound censoring, one expects expansion bias with OLS and IV estimators, or effects that are measured as being too large. In contrast, independent random censoring leads to attenuation bias in OLS estimators, but to expansion bias in IV estimators. Finally, we will note how the transmission of bias with correlated regressors can result in enormous bias in estimation.

The exposition begins by introducing the framework. Results on bias in OLS estimators of bivariate models are presented first. These are followed by results on bias in IV estimation when the censored regressor is endogenous, including an illustration with the case where the instrument represents random assignment. Results on bias with several regressors are then discussed.
Finally, some results on 0-1 censoring are derived and illustrated. We present some concluding remarks, discussing related work on testing and estimation.

2. Bias with Censored Regressors

We are interested in bias from using a censored regressor in estimating a linear model. We assume that the true model is an equation of the form

    y_i = α + β·x_i + γ'w_i + ε_i,   i = 1, ..., n    (1)

where x_i is a single regressor of interest, and w_i is a k-vector of predictor variables. We assume that the distribution of (x_i, w_i', ε_i) is nonsingular and has finite second moments. For studying regression estimators, we assume E(ε_i | x_i, w_i) = 0.

[2] Recent contributions include Chen, Hong and Tamer (2005), Chen, Hong and Tarozzi (2004), Horowitz and Manski (1998, 2000), Liang, Wang, Robins and Carroll (2004), Tripathi (2003, 2004), Mahajan (2004) and Ichimura and Martinez-Sanchis (2005).

The problem is that we do not observe x_i, but rather a censored version of it. Consider a censoring process described by an indicator d_i, and denote the probability of censoring as p = Pr{d = 1}. When the regressor is censored, it is set to the value τ. That is, we observe

    x_i^cen = (1 − d_i)·x_i + d_i·τ    (2)

where x_i^cen is the censored version of x_i. The question of interest is what happens if we estimate the model

    y_i = a + b·x_i^cen + f'w_i + u_i,   i = 1, ..., n.    (3)

How are the estimates â, b̂, f̂ biased as estimators of α, β, γ?

While the censoring process can be quite general, for our analysis of regression we exclude the problems that arise from censoring the dependent variable y_i. We assume that the regressor x_i is exogenously censored, by assuming that

    E(ε_i | d_i, x_i, w_i) = 0.    (4)

This is equivalent to assuming that ε_i and d_i are uncorrelated, conditional on x_i and w_i. We will drop this assumption in Section 2.2, when x_i is assumed to be endogenous. It is also useful to note that (2) represents single-value censoring, to τ, which we assume for now but relax at various points in the exposition.

There are two primary examples we will allude to throughout the text. First is independent random censoring, where d_i is taken as independent of x_i and w_i.[3] Second is bound censoring, as in top-coding with censoring above an upper bound,

    d_i = 1[x_i > τ],    (5)

or bottom-coding with censoring below a lower bound,[4]

    d_i = 1[x_i < τ].    (6)

Many of our results will extend immediately to the case of double bounding, where there is top-coding with upper bound τ_1 together with bottom-coding with lower bound τ_0. Double

[3] The term "random censoring" is sometimes used to describe a situation where a dependent variable is bound-censored, and the bound varies randomly across observations. Here, we will use this term as we have indicated, for censoring of a regressor, where the censoring process is either independent of or substantially uncorrelated with the regressor value.
[4] In the parlance of the missing-data literature (cf. Little and Rubin (2002)), our notion of random censoring is analogous to "missing completely at random," or MCAR. Top-coding and bottom-coding involve censoring determined by the value of the regressor, so they are analogous to "not missing at random" processes, or NMAR, where in addition the censoring threshold is given by the censoring value.
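To make the notation concrete, the censoring schemes in (2), (5) and (6) can be sketched in a few lines of Python. This is our own illustration, not part of the paper; the regressor distribution, the value τ = 1 and the 30% censoring rate are arbitrary choices, and the names x, d and x_cen mirror x_i, d_i and x_i^cen.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)   # illustrative regressor, E(x) = 0
tau = 1.0                      # the single censoring value in (2)

# Independent random censoring: d is drawn independently of x.
d_rand = rng.random(x.size) < 0.3
x_cen_rand = np.where(d_rand, tau, x)    # (2): (1 - d) * x + d * tau

# Bound censoring (top-coding), (5): d = 1[x > tau].
d_top = x > tau
x_cen_top = np.where(d_top, tau, x)

print(x_cen_top.max())   # top-coded data never exceed tau
```

Bottom-coding, (6), is the same sketch with `d = x < tau`.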

bounding provides a simple example of two-value censoring.[5]

Much of the intuition about bias is available when the model is bivariate, which also facilitates good pictures. For the next two subsections, we focus on that specialization.

2.1. Censoring in Bivariate Regression

We now assume that the true model is a bivariate regression,

    y_i = α + β·x_i + ε_i,   i = 1, ..., n    (7)

where we have dropped w_i from (1), and make the same distributional assumptions as above, including E(ε_i | x_i) = 0. We are interested in the (asymptotic) bias in the OLS coefficient b̂ from estimating the model

    y_i = a + b·x_i^cen + u_i,   i = 1, ..., n    (8)

where x_i^cen is the censored version of x_i given in (2), namely

    b̂ = Σ_i (x_i^cen − x̄^cen)(y_i − ȳ) / Σ_i (x_i^cen − x̄^cen)².    (9)

It is useful to express the bias in proportional terms, as

    plim b̂ = β·(1 + λ).    (10)

There is no bias iff λ = 0. Attenuation bias refers to the situation where −1 < λ < 0; expansion bias refers to the situation where λ > 0.

We begin by discussing our two primary examples. Start with independent censoring, where d_i is independent of the value of x_i (and y_i). Figure 1 illustrates this situation with τ > E(x). The small circles indicate the censored data points, including the block of points with x_i = τ. Since regressor values are randomly censored from the scatter, the observations with x_i = τ have center of mass below the regression line. The net result is attenuation bias, with the estimated slope lower than the true slope. Note how the bias will be sensitive to the value of τ. If we move τ to the right, the center of mass of the points with x_i = τ does not change, and so there will be more attenuation bias in the OLS coefficient. If we move τ to the left of E(x), the center of mass will now be above the regression line, and attenuation bias still results. When

[5] In our notation, double bounding (with τ_0 < τ_1) is represented as x_i^cen = (1 − d_0i − d_1i)·x_i + d_0i·τ_0 + d_1i·τ_1, where d_0i = 1[x_i < τ_0] and d_1i = 1[x_i > τ_1].

τ = E(x), there is no such tendency, and for that situation we expect no bias. These graphical arguments are all verified formally below.

Now consider censoring to an upper bound, or top-coding. Figure 2 shows a scatterplot of data with and without top-coding of the regressor. Again, the small circles are the resulting data points when the regressor is censored at the upper bound. This results in expansion bias, as the model estimated with the censored regressor clearly has a steeper slope, because of the "pile-up" of observations at the bound. One reason for positive bias is that there are positive effects on y_i associated with x_i in the tail that are not captured by x_i^cen. It is clear that an analogous situation of expansion bias would arise with censoring to a lower bound, or bottom-coding. In each of these cases, the extent of bias will be sensitive to the censoring value τ: more censoring implies more bias. Finally, all these features are true with double bounding, as illustrated in Figure 3.

We now turn to some specific results. Proposition 1 gives a general characterization of the bias in bivariate regression.

Proposition 1. OLS Bias, Bivariate Regression: Provided that 0 < p < 1, we have

    plim b̂ = β·(1 + λ)    (11)

where

    λ = p·(1 − p)·(E(x | d = 1) − τ)·(τ − E(x | d = 0)) / Var(x^cen).    (12)

Consequently, the bias term λ satisfies:

1. λ = 0 if and only if τ = E(x | d = 1) or τ = E(x | d = 0);
2. λ > 0 if E(x | d = 0) < τ < E(x | d = 1) or E(x | d = 1) < τ < E(x | d = 0); otherwise λ < 0.

The proof of Proposition 1 is direct, and given in Appendix A. It is clearly important to consider the censoring process d_i and the censoring value τ separately. For instance, the sign and extent of the bias are affected by the position of τ: there is no bias if τ equals either conditional mean E(x | d = 1) or E(x | d = 0), there is expansion bias if τ is between the means, and there is attenuation bias in all other circumstances. Our primary examples are covered as corollaries of this basic result.
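Proposition 1 is easy to check by simulation. The sketch below is our own, with arbitrary choices β = 2, τ = 1 and a standard-normal regressor: it top-codes x, runs OLS on the censored data, and compares the slope with the prediction β·(1 + λ) of (11)-(12), estimating the conditional means in (12) from the simulated sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
alpha, beta, tau = 1.0, 2.0, 1.0

x = rng.normal(size=n)
y = alpha + beta * x + rng.normal(size=n)

# Top-code the regressor at tau, then run OLS of y on the censored x.
d = x > tau
x_cen = np.where(d, tau, x)
b_hat = np.cov(x_cen, y)[0, 1] / np.var(x_cen)

# Bias term lambda from (12), with the conditional means estimated from
# the sample: top-coding puts tau between them, so lambda > 0.
p = d.mean()
lam = p * (1 - p) * (x[d].mean() - tau) * (tau - x[~d].mean()) / np.var(x_cen)

print(b_hat, beta * (1 + lam))   # the two agree; both exceed beta
```

With these values roughly 16% of the sample is censored and the slope is inflated by about 12%, the expansion bias of condition 2.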
With independent random censoring, we have E(x | d = 1) = E(x | d = 0) = E(x), so the numerator of (12) is negative unless τ = E(x). So, in regression, random censoring implies

attenuation bias. Clearly, this result only requires zero correlation between d_i and x_i. The explicit formula for the bias in this case is given in the following:

Corollary 2. OLS Bias, Uncorrelated Censoring: Suppose Cov(x, d) = 0 and Cov(x², d) = 0. Then plim b̂ = β·(1 + λ_ran), where

    λ_ran = −p·(τ − E(x))² / [Var(x) + p·(τ − E(x))²].    (13)

We have λ_ran ≤ 0, with equality holding only if τ = E(x).

For bound censoring, condition 2 of Proposition 1 applies. With top-coding, we clearly have E(x | d = 0) < τ < E(x | d = 1), and with bottom-coding, we clearly have E(x | d = 1) < τ < E(x | d = 0), so that bound censoring implies expansion bias. We summarize this as

Corollary 3. OLS Bias, Bound Censoring:

1. Suppose the regressor is top-coded at τ_1, with d_i = 1[x_i > τ_1]; then

    plim b̂ = β·(1 + λ_1)    (14)

where λ_1 > 0.

2. Suppose the regressor is bottom-coded at τ_0, with d_i = 1[x_i < τ_0]; then

    plim b̂ = β·(1 + λ_0)    (15)

where λ_0 > 0.

3. Suppose the regressor is top-coded at τ_1 and bottom-coded at τ_0, with τ_0 < τ_1; then

    plim b̂ = β·(1 + λ)    (16)

where λ > λ_1 + λ_0, with λ_1 and λ_0 as above.

The result on double bounding says that the bias from two-sided censoring is greater than the sum of the biases associated with censoring only to the upper bound and only to the lower bound separately. This reflects the fact that with two-sided censoring, the censored points are in two separated groups and therefore have more influence on the estimated regression coefficient.
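For Corollary 2 the bias has the closed form (13), which can be checked directly. The following is again our own sketch: x is standard normal, so E(x) = 0 and Var(x) = 1, and the values τ = 1.5 and p = 0.3 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
beta, tau, p = 2.0, 1.5, 0.3

x = rng.normal(size=n)                    # E(x) = 0, Var(x) = 1
y = 1.0 + beta * x + rng.normal(size=n)

# Independent random censoring to the value tau.
d = rng.random(n) < p
x_cen = np.where(d, tau, x)
b_hat = np.cov(x_cen, y)[0, 1] / np.var(x_cen)

# Closed-form attenuation bias from (13); here E(x) = 0 and Var(x) = 1.
lam_ran = -p * tau**2 / (1.0 + p * tau**2)

print(b_hat, beta * (1 + lam_ran))   # OLS slope is attenuated toward 0
```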

The specific formula for the bias from top-coding and bottom-coding depends on expectations over truncated distributions. We can get a sense of the size of the bias by computing it for a specific distribution. In that spirit, Figure 4 presents the expansion bias from one-sided and two-sided bound censoring under the assumption that x_i is normally distributed.[6] This is computed for different levels of censoring (p), which is equivalent to setting different bound limits (τ). The solid line displays the bias λ of (16) from top- and bottom-coding under the assumption of symmetric (two-sided) censoring, with probability p censored in each tail, and it is plotted against the total probability of censoring 2p. The dashed line is the expansion bias λ_1 of (14) from using top-coded data (one-sided censoring), which is plotted against the same total censoring probability. For instance, plotted over 2p = 0.20 are the two-sided bias from censoring 10% in each tail and the one-sided bias from censoring 20% in the upper tail. For comparison, the diagonal (2p, 2p) is included as the dotted line.

We see that the bias is roughly linear in 2p for low censoring levels, up to around 30% of the data censored. After that the bias rises nonlinearly, but a bias that doubles the coefficient value involves a lot of censoring: 60% or more of the data. The two-sided bias is greater than the one-sided bias over the whole range of probabilities. So, in this sense, censoring on both sides induces more than twice the bias of censoring on one side only. As noted above, this makes sense because with two-sided censoring the censored points are in two separated groups and therefore have more influence on the estimated regression coefficient.

2.2. Endogenous Censored Regressors

We now consider the case where the regressor that is censored is endogenous, and where we have a valid instrument for the uncensored regressor.
The censoring induces bias in IV estimators that has a quite different structure than the bias in OLS coefficients. We now cover these issues in the bivariate format. We give intuition for the case of random assignment, and then follow with some general results.

As above, we assume that the true model is

    y_i = α + β·x_i + ε_i,   i = 1, ..., n    (17)

where x_i is the (uncensored) endogenous regressor, ε_i is the disturbance, and z_i denotes a valid instrument for x_i. In particular, we assume that {(x_i, z_i, ε_i) | i = 1, ..., n} is an i.i.d. random sample from a distribution with finite second moments, with E(ε) = 0, Cov(z, x) ≠ 0 and Cov(z, ε) = 0. This implies that

    Cov(z, y) / Cov(z, x) = β    (18)

[6] The bias formulae themselves are complicated and do not admit an obvious interpretation. Because of that, we give a brief derivation and show the formulae in Appendix B.

which is the consistent limit of the IV estimator if there were no censoring. Instead of x_i, assume we observe

    x_i^cen = (1 − d_i)·x_i + d_i·τ    (19)

where again d_i represents a general censoring process with p = Pr{d = 1} ≠ 0. We assume that d_i is such that Cov(z, x^cen) exists and is nonzero.[7] When we use x_i^cen in estimation, we compute the IV estimator ĉ that uses z_i to instrument x_i^cen. We have

    ĉ = Σ_i (z_i − z̄)(y_i − ȳ) / Σ_i (z_i − z̄)(x_i^cen − x̄^cen) → Cov(z, y) / Cov(z, x^cen)    (20)

where we have assumed Cov(z, x^cen) ≠ 0, since otherwise ĉ has no probability limit. From (18), the asymptotic limit of ĉ is

    plim ĉ = β·Cov(z, x) / Cov(z, x^cen).    (21)

The bias of the IV estimator is expressed in proportional form as

    plim ĉ = β·(1 + λ), with λ = Cov(z, x^o) / Cov(z, x^cen),    (22)

where x_i = x_i^cen + x_i^o, with

    x_i^o = d_i·(x_i − τ)    (23)

the part of x_i lost by censoring.

The intuition on why the IV estimator is biased is very clear. The instrument z_i is valid in the "true" data, meaning that it is correlated with the variable of interest x_i and uncorrelated with the disturbance ε_i. The censored regressor x_i^cen is also correlated with the disturbance, and the difference x_i^o is now omitted and correlated with the instrument, making z_i invalid as an instrument in the equation with x_i^cen.

2.2.1. Random Assignment

We can get further intuition from the case where the instrument represents random assignment into two groups. Assume z is a binary instrument indicating groups 0 and 1, and

[7] For discussing asymptotic bias, we do not need to assume the analog of the exogenous censoring condition (4) for the IV case, namely E(ε_i | d_i, z_i, w_i) = 0, where w_i refers to additional exogenous regressors. This condition is necessary when discussing approaches to testing and estimation that rely on estimation with complete cases (observations with d_i = 0).

denote the probability of being in group 1 as q = Pr{z = 1} ≠ 0. Here Cov(z, x) ≠ 0 implies E(x | z = 1) ≠ E(x | z = 0), or that the grouping is associated with a shift in the mean of x. Likewise Cov(z, ε) = 0 implies E(ε | z = 1) = E(ε | z = 0) = 0, or that there is no shift in the mean of ε associated with the grouping. The IV estimator with instrument z_i is now the group-difference estimator of Wald (1940).[8] Analogous to (18), we have

    [E(y | z = 1) − E(y | z = 0)] / [E(x | z = 1) − E(x | z = 0)] = β.    (24)

When we use the censored regressor x_i^cen instead of x_i, the IV estimator is

    ĉ = (ȳ_1 − ȳ_0) / (x̄_1^cen − x̄_0^cen)

where ȳ_0, x̄_0^cen are averages over group 0 and ȳ_1, x̄_1^cen are the averages over group 1. We have

    plim ĉ = [E(y | z = 1) − E(y | z = 0)] / [E(x^cen | z = 1) − E(x^cen | z = 0)]    (25)
           = β·(1 + λ)    (26)

where

    λ = [E(x^o | z = 1) − E(x^o | z = 0)] / [E(x^cen | z = 1) − E(x^cen | z = 0)]    (27)

and again x_i^o = d_i·(x_i − τ) is the part of x_i omitted by the censoring. Since

    E(x | z = 1) − E(x | z = 0) = [E(x^cen | z = 1) − E(x^cen | z = 0)] + [E(x^o | z = 1) − E(x^o | z = 0)],    (28)

the size and sign of the IV bias is determined by how the censoring operates on the two different assignment groups. When the data that are censored have broadly similar structure to the uncensored data, there is expansion bias. Namely, λ > 0 when the mean shifts E(x^o | z = 1) − E(x^o | z = 0) and E(x^cen | z = 1) − E(x^cen | z = 0) are of the same sign. There is attenuation bias (λ < 0) only when the mean shifts are of opposite signs. We now turn to our two primary examples, independent censoring and bound censoring.

[8] For instance, if x_i were not censored, the group-difference estimator is β̂ = (ȳ_1 − ȳ_0)/(x̄_1 − x̄_0), where ȳ_1 = Σ_i z_i·y_i / Σ_i z_i is the average of y_i for group 1, ȳ_0 = Σ_i (1 − z_i)·y_i / Σ_i (1 − z_i) is the average of y_i for group 0, etc.
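The group-difference estimator and the effect of censoring on it can be simulated directly. The sketch below is ours, not the paper's: a binary instrument z shifts the mean of x, the disturbance also enters x (so x is endogenous), and the regressor is then randomly censored to an illustrative value τ = 4 with p = 0.3. The uncensored Wald estimator recovers β, while the censored one exhibits expansion bias, as (26)-(27) predict.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
alpha, beta, tau, p, q = 2.0, 0.5, 4.0, 0.3, 0.6

# Random assignment z shifts the mean of x; eps enters x as well,
# so x is endogenous and OLS would be inconsistent.
z = (rng.random(n) < q).astype(int)
eps = rng.normal(size=n)
x = 2.0 + 4.0 * z + 2.0 * rng.normal(size=n) + eps
y = alpha + beta * x + eps

def wald(y, w, z):
    """Group-difference (Wald) estimator: IV with a binary instrument."""
    return (y[z == 1].mean() - y[z == 0].mean()) / (
        w[z == 1].mean() - w[z == 0].mean())

# Independent random censoring of the regressor to the value tau.
d = rng.random(n) < p
x_cen = np.where(d, tau, x)

c_hat_uncensored = wald(y, x, z)    # consistent for beta = 0.5
c_hat = wald(y, x_cen, z)           # biased away from zero

print(c_hat_uncensored, c_hat)
```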

When d is independent of z and x, we can compute the IV bias explicitly. We have

    E(x^o | z = 1) − E(x^o | z = 0) = E(d)·[E(x | z = 1) − E(x | z = 0)]    (29)

so that, from (27) and (28), the proportional IV bias is

    λ = p·[E(x | z = 1) − E(x | z = 0)] / ((1 − p)·[E(x | z = 1) − E(x | z = 0)]) = p/(1 − p) > 0.    (30)

This bias expression is remarkable in its simplicity, and there are several important features to note. First, there is expansion bias in IV estimates from random censoring, which is the opposite of the attenuation bias noted for OLS estimates. Second, the IV bias does not depend on the distribution of x, the assignment grouping, or the censoring value. It depends only on p, the percentage censored, and is nonlinear and strictly convex in p.[9] Later we will show that the IV bias expression (30) is valid under much more general conditions than random assignment.

Figures 5 and 6 illustrate IV bias with independent censoring.[10] Figure 5 shows the uncensored data, including the grouping by z value. There is substantial overlap between the groups. Also illustrated are the group means and the uncensored IV (group-difference) estimator.[11] Figure 6 shows what happens with 30% random censoring. The censored data are shown as small circles, with the censoring value τ = 4. The IV fit is clearly steeper, which illustrates the positive IV bias.[12] Mechanically, the within-group means of x are both shifted toward τ but the within-group means of y are unchanged, so that the slope (25) is increased. It is worth mentioning that the slope bias does not depend on the specific censoring value, and the same tilting occurs whether τ = 0 or τ = 6.[13]

For bound censoring, assume that we have top-coding, with

    d_i = 1[x_i > τ].    (31)

Figure 7 illustrates how expansion bias arises from top-coding. The same data are used here as in Figures 5 and 6, and now censoring occurs for values greater than τ = 6.
[9] For instance, 25% censoring is associated with 33.3% proportional bias, and 50% censoring is associated with 100% proportional bias.

[10] The specification is as follows. The true model is y_i = 2 + 0.5·x_i + ε_i. The probability of z_i = 1 is q = 0.6. For z_i = 0, (x_i, ε_i) is joint normal with mean (2, 0), where the standard error of x_i is 2, the standard error of ε_i is 1, and the correlation between x_i and ε_i is 0.5. For z_i = 1, (x_i, ε_i) is joint normal with mean (6, 0) and the same variance parameters.

[11] See note 8.

[12] The value of the uncensored estimator is β̂ = 0.46 and the censored estimator is ĉ = 0.64. The actual fraction of data censored is 0.28.

[13] We verified this but do not include an additional figure.

Clearly, much more

censoring occurs for observations with z = 1 than for those with z = 0. Because of top-coding, the mean of the x values for z = 1 is shifted more to the left than than the mean of x values for the z = 0 group, which gives the positive IV bias. 14 We would expect expansion bias to be the typical case with top-coding, but it not assured. If the z = 0 group had much wider dispersion than the z = 1 group, then top-coding could involve censoring more of the right tail of the z = 0 group than that of the z = 1 group. If the mean of x values for the z = 0 group is shifted more than the mean for z = 1, then attenuation bias would result. 15 To learn more, we can apply probability arithmetic to express the numerator of as E (x o j z = 1) E (x o j z = 0) = p [E (x j d = 1; z = 1) E (x j d = 1; z = 0)] (32) Cov (z; d) + [E (x j d = 1; z = 1) ] q Cov (z; d) + [E (x j d = 1; z = 0) ] ; 1 q for the case of random assignment, but with a general censoring indicator d. Applying (32) to top-coding shows that the IV bias depends on the censoring percentage p, the distributions of the x s within the groups and the relative locations of the group distributions. If the mean of the observations censored from the z = 1 group is at least as large as the mean from the z = 0 group, and at least as much censoring occurs in the z = 1 group as the z = 0 group (Cov (z; d) 0), then all the RHS terms of (32) are positive, and expansion bias ( > 0) arises in the IV estimator. The case of random assignment further illustates how there is no IV bias only in very unusual circumstances. The IV bias is zero only when E (x o j z = 1) E (x o j z = 0) = 0: (33) In other words, the mean shift (28) that identi es occurs entirely in the mean of the censored regressor x cen, so that the censoring doesn t remove any information of value for identifying. We can get some more primitive conditions for zero bias from (32). Zero IV bias arises if the 14 Here we have ^ = :46 as before, and ^d = :59 with top-coding. 
The fraction of data censored is.34. 15 With the mixture of normals design of the gures, one could solve for how much the standard deviation of x conditional on z = 0 has to increase relative to that of z = 1 to generate attenuation bias. We note here some casual observations. Quadrupling the standard error of x to 8 for z = 0 produces attenuation bias as the rule. Tripling to 6 produces simulated results with each kind of bias: some with attenuation bias, some with small bias and some with expansion bias. 12

mean of the censored observations is the same for both groups,

E(x | d = 1, z = 1) = E(x | d = 1, z = 0),   (34)

and either Cov(z, d) = 0 or γ = E(x | d = 1). This is more involved than just matching the mean of the censored observations: either the instrument must be uncorrelated with the censoring, or the censoring value must be exactly equal to the mean of the censored observations. Again, it seems difficult to conceive of censoring mechanisms that would produce these circumstances without assuming specific formulas for the underlying distributions.

2.2.2. Results on IV Bias in Bivariate Models

We now return to the setting of a general instrument z. The IV bias is given in (22), repeated here as

plim β̂_c = β (1 + λ), with λ = Cov(z, x^o) / Cov(z, x^cen).   (35)

We now give a general characterization of IV bias, and follow that with results on random censoring and bound censoring. Let Δ_z and Δ_x denote the following mean differences between censored and uncensored observations:

Δ_z ≡ E(z | d = 1) - E(z | d = 0) = Cov(z, d) / (p (1 - p)),   (36)
Δ_x ≡ E(x | d = 1) - E(x | d = 0) = Cov(x, d) / (p (1 - p)).   (37)

IV bias is characterized in Proposition 5, which follows from the formulae in Lemma 4. The proofs of these and subsequent results are collected in Appendix A.

Lemma 4. We have

Cov(z, x^cen) = (1 - p) Cov(z, x | d = 0) + p (1 - p) Δ_z [γ - E(x | d = 0)],   (38)
Cov(z, x^o) = p Cov(z, x | d = 1) + p (1 - p) Δ_z [E(x | d = 1) - γ],   (39)
Cov(z, x) = p Cov(z, x | d = 1) + (1 - p) Cov(z, x | d = 0) + p (1 - p) Δ_z Δ_x.   (40)
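The decompositions in Lemma 4 are population identities, so they can be checked mechanically on simulated data. A minimal sketch (the joint-normal design and the top-coding point are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
gamma = 1.0

# Correlated (z, x), with bound censoring so that d depends on x
z = rng.standard_normal(n)
x = 0.8 * z + 0.6 * rng.standard_normal(n)
d = x > gamma                       # top-coding indicator
x_cen = np.where(d, gamma, x)
x_o = d * (x - gamma)

p = d.mean()
cov = lambda a, b: np.cov(a, b)[0, 1]
dz = z[d].mean() - z[~d].mean()     # Delta_z
dx = x[d].mean() - x[~d].mean()     # Delta_x

# Right-hand sides of (38), (39), (40), which should match up to O(1/n):
eq38 = (1 - p) * cov(z[~d], x[~d]) + p * (1 - p) * dz * (gamma - x[~d].mean())
eq39 = p * cov(z[d], x[d]) + p * (1 - p) * dz * (x[d].mean() - gamma)
eq40 = p * cov(z[d], x[d]) + (1 - p) * cov(z[~d], x[~d]) + p * (1 - p) * dz * dx
print(cov(z, x_cen) - eq38, cov(z, x_o) - eq39, cov(z, x) - eq40)
```

All three printed discrepancies are at the level of sampling noise, reflecting that (38)-(40) are the binary-group case of the law of total covariance.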

Proposition 5. IV Bias: Bivariate Models. The proportional IV bias, λ = Cov(z, x^o) / Cov(z, x^cen), is

λ = (p / (1 - p)) · [Cov(z, x | d = 1) + (1 - p) Δ_z (E(x | d = 1) - γ)] / [Cov(z, x | d = 0) + p Δ_z (γ - E(x | d = 0))].   (41)

The expression (41) shows that IV bias depends on the distribution of (z, x) for censored and uncensored data through conditional means and covariances, but also on the value γ that the data is censored to. This mirrors the same point stressed above for bias in OLS regression with censored regressors.

Now consider independent random censoring. There must be nonzero expansion bias in this case, and in fact the bias formula (30) holds quite generally. We have the following corollary, which includes independent censoring under much weaker correlation assumptions:

Corollary 6. IV Bias: Uncorrelated Censoring. If Cov(z, d) = 0, Cov(x, d) = 0 and Cov(zx, d) = 0, then

λ = p / (1 - p),   (42)

with λ > 0 always.

The conditions are equivalent to assuming that z, x and zx are mean independent of d, and they clearly include statistical independence of (z, x) and d. The bias is always positive and varies directly with the amount of censoring p, but it is not affected by the distribution of (z, x) or the censoring point. This is in strong contrast to the attenuation bias induced in OLS estimators (or the zero bias when γ = E(x)). That is, with IV estimation, expansion bias is always implied, regardless of the censoring value.¹⁶

The role of the distribution and of the censoring point are clarified by some results on "partial" lack of correlation, as outlined in the following corollary.

Corollary 7. IV Bias: Partially Uncorrelated Censoring.

1. Uncorrelated with Instrument z. If Cov(z, d) = 0, then

λ = (p / (1 - p)) · Cov(z, x | d = 1) / Cov(z, x | d = 0).   (43)

16. While not related to censoring, Black, Berger and Scott (2000) have a similar finding for a specific model of errors-in-variables, where the bias in OLS coefficients is in the opposite direction of the bias in IV coefficient estimates.

2. Uncorrelated with Predictor x. If Cov(x, d) = 0, then

λ = (p / (1 - p)) · [Cov(z, x | d = 1) + (1 - p) φ] / [Cov(z, x | d = 0) - p φ],   (44)

where φ ≡ Δ_z [E(x) - γ].

When z is uncorrelated with the censoring, the bias under independent censoring is scaled by the ratio of the covariances of z and x over censored and uncensored values. Expansion bias results if the covariances are of the same sign. If the covariance between z and x is much lower in the censored data than in the uncensored data, then the IV bias will be smaller than with independence.¹⁷ When only the regressor x is uncorrelated with the censoring, there is a further modification to the formula for bias under independence. Correlation of the censoring indicator d with the instrument z induces a shift φ in the numerator and denominator of the proportional bias. The shift depends on the censoring value: if z is correlated with d, the shift φ vanishes only when the censoring is to the mean, γ = E(x).

We now consider bound censoring, where censoring is correlated with the regressor by construction. Consider top-coding, where

d_i = 1[x_i > γ],   (45)

and assume that Cov(z, x) > 0 without loss of generality. The intuition developed for random assignment seems applicable in general. That is, with top-coding, d is positively correlated with x, and x is positively correlated with z, so it seems natural that z and d will be positively correlated and expansion bias in the IV estimate will result. We expect this to be the typical case, but unfortunately it is possible for attenuation bias to result if there is a large difference between the distributions of (x, z) for censored observations and uncensored ones. The sense in which those distributions differ is spelled out in Corollary 8. The results for top-coding are very similar to the results for bottom-coding, where

d_i = 1[x_i < γ],   (46)

and so we list them together in the corollary.

Corollary 8. IV Bias: Bound Censoring. Suppose Cov(z, x^cen) > 0.

1. If x is top-coded at γ and Cov(z, d) > 0, or

17. It is worth noting that zero correlation between z and d can include situations where d is correlated with the disturbance ε.

2. If x is bottom-coded at γ and Cov(z, d) < 0,

then λ > 0 if and only if

Cov(z, x | d = 1) > -(1 - p) Δ_z [E(x | d = 1) - γ].   (47)

Clearly, λ > 0 if Cov(z, x | d = 1) ≥ 0, in either case. With top-coding, we have Δ_z > 0 and E(x | d = 1) - γ > 0, and with bottom-coding, we have Δ_z < 0 and E(x | d = 1) - γ < 0. In either case the right-hand side of (47) is negative. Therefore, Corollary 8 says that expansion bias in IV estimators is the norm with bound censoring, unless the correlation between z and x in the censored data is substantially negative.¹⁸ That is, with top-coding the correlation structure of (z, x) for the high x values has to be radically different, essentially the opposite, of the correlation structure for the low x values. An analogous statement applies for bottom-coding.¹⁹

It would be desirable to discover some primitive conditions that are closely associated with IV expansion bias when there is bound censoring. Unfortunately, all the primitive conditions that the authors have discovered are much stronger than the tenets of Corollary 8. Of those, there is one set of conditions worth mentioning, which essentially assures the positive relation between the regressor, instrument and censoring discussed above. This is where z is mean-monotonic in x; namely,

E(z | x = x_1) ≥ E(z | x = x_0) for any values x_1 ≥ x_0.   (48)

This condition guarantees all the conditions of Corollary 8, as summarized in

Corollary 9. IV Bias: Bound Censoring with Mean Monotonicity of z in x. Suppose (48) holds. Then λ > 0

1. if x is top-coded at γ, or
2. if x is bottom-coded at γ.

Mean-monotonicity gives a uniform structure to the covariance of z and x over ranges of x values. It is not a counterintuitive condition. For instance, if (z, x) is joint normally distributed, then z is mean-monotonic in x. But it clearly implies a strong relationship between the endogenous regressor and the instrument.
18. The condition that Cov(z, x^cen) > 0 just assumes that the observed predictor is correlated with z with the same sign as the uncensored variable.
19. As noted for OLS estimators, there will be greater IV bias with both top-coding and bottom-coding than with either one alone.
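Since a joint normal (z, x) satisfies the mean-monotonicity condition (48), Corollary 9 is straightforward to illustrate by simulation (the design below, with β = 1 and 20% top-coding, is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Joint normal (z, x): z is mean-monotonic in x, so Corollary 9 predicts
# expansion bias (lambda > 0) under top-coding.
z = rng.standard_normal(n)
x = 0.7 * z + rng.standard_normal(n)
y = 1.0 + x + rng.standard_normal(n)         # beta = 1

gamma = np.quantile(x, 0.8)                  # top-code the upper 20%
x_cen = np.minimum(x, gamma)                 # so x^o = x - x_cen here

cov = lambda a, b: np.cov(a, b)[0, 1]
lam = cov(z, x - x_cen) / cov(z, x_cen)      # proportional IV bias, positive
b_iv_cen = cov(z, y) / cov(z, x_cen)         # = beta * (1 + lambda)
print(lam, b_iv_cen)
```

The IV fit using the top-coded regressor is steeper than the truth, exactly as the covariance accounting of Lemma 4 implies.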

2.2.3. Relation to OLS Bias with a Censored Dependent Variable

Every student of econometrics is familiar with the picture showing that the OLS estimator of a linear model coefficient is downward biased when the dependent variable is censored by bottom-coding (or top-coding).²⁰ It is useful to note that expansion bias in IV estimators is equivalent to downward bias of OLS in the first-stage regression due to bound censoring of the dependent variable x. Suppose again that Cov(z, x) > 0. We have expansion bias, λ > 0, if and only if Cov(z, x^cen) < Cov(z, x). This is the same as

Cov(z, x^cen) / Var(z) < Cov(z, x) / Var(z).   (49)

The right-hand side of (49) is the limit of the OLS coefficient of the first-stage regression, of x_i regressed on z_i and a constant. The left-hand side is the limit of the OLS coefficient of the first-stage regression using the censored variable x_i^cen. The conditions for IV expansion bias λ > 0 are exactly the same as the conditions for downward bias arising in OLS coefficients when the dependent variable is censored, where bias here is defined generally as the difference in the OLS estimator limits with and without censoring. In other words, in examining IV expansion bias we have equivalently been characterizing the conditions that give downward bias in OLS coefficients with dependent variable censoring, doing so without making any assumptions about the form of a "model" between x and z. One immediate corollary is that the first-stage OLS coefficient will be downward biased with bound censoring when z is mean-monotonic in x.

2.3. Several Regressors and Transmission of Bias

The case where there are several regressors is the most relevant to empirical practice. Bias from censoring one variable can contaminate the estimates of coefficients of other variables, which we refer to as bias transmission. This is important when the censored variable represents an effect of great interest, but it is perhaps of more importance when the censored variable is not the primary focus of interest. We may, in fact, care the most about the coefficients of the other variables, and include the censored variable as a generalized control. With bias transmission, the estimates of the coefficients of primary interest can be badly wrong because of the inclusion of an imperfect control. We now focus on this situation for censoring as we have studied it above. In Section 3, we focus on the same problem when we use a 0-1 variable in place of a continuous regressor.

20. See Davidson and MacKinnon (2004), Figure 11.3, page 482, among many others.

It is often very difficult to obtain specific results on coefficient bias when there are several regressors.²¹ When one regressor is censored, we are able to get some broad insight as well as a couple of specific results, as presented below. We focus only on OLS estimators, although results of a similar nature should be available for IV estimators. We return to the basic model (1), where we assume a single additional regressor w_i, given as

y_i = α + β x_i + δ w_i + ε_i,  i = 1, ..., n.   (50)

The regressor x_i is censored as x_i^cen = (1 - d_i) x_i + d_i γ, and we are interested in what happens when we estimate the model

y_i = a + b x_i^cen + f w_i + u_i,  i = 1, ..., n.   (51)

How are the OLS estimates b̂ and f̂ biased as estimators of β and δ? In broad terms, the issue centers on how well w_i proxies x_i for the censored observations. Write (50) as

y_i = α + β x_i^cen + δ w_i + β x_i^o + ε_i,  i = 1, ..., n,   (52)

with x_i^o = d_i (x_i - γ) as before. If w_i and x_i are only slightly correlated, then f̂ will estimate δ with little bias, and b̂ will estimate β with the bias appropriate for bivariate regression. However, if w_i and x_i are highly correlated, then f̂ will be very biased as an estimate of δ, as w_i acts to proxy for x_i in the censored data. That is the basic logic: biases arise that are consistent with the censored values being proxied by other included regressors.

We can sharpen this logic somewhat. There is no bias transmission when f̂ is consistent for δ, which occurs if Cov(x^cen, w) = 0. We can develop this as

Cov(x^cen, w) = Cov(x, w) - Cov(x^o, w)
    = (1 - p) [Cov(x, w | d = 0) + p {E(w | d = 1) - E(w | d = 0)} {γ - E(x | d = 0)}].   (53)

Notice that it is not sufficient for w_i to be uncorrelated with x_i in the uncensored data. We must either have censoring to the mean, γ = E(x | d = 0), or w_i uncorrelated with the censoring, E(w | d = 1) = E(w | d = 0). On the nature of bias when w_i and x_i are correlated, we appeal to our primary examples.
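The two escape routes from transmission noted after (53) can be seen in a small experiment in which x and w are independent but the censoring is driven by w, so that w is correlated with the censoring indicator (all numbers below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# x independent of w, with the censoring driven by w
x = 2.0 + rng.standard_normal(n)             # E(x) = 2
w = rng.standard_normal(n)
y = 1.0 + x + w + rng.standard_normal(n)     # beta = delta = 1
d = w > 1.0                                  # E(w | d=1) != E(w | d=0)

def f_hat(gamma):
    # OLS coefficient on w when x is censored to gamma
    x_cen = np.where(d, gamma, x)
    X = np.column_stack([np.ones(n), x_cen, w])
    return np.linalg.lstsq(X, y, rcond=None)[0][2]

print(f_hat(0.0))   # gamma != E(x | d=0): transmission biases the w coefficient
print(f_hat(2.0))   # gamma = E(x | d=0) = 2: no transmission, f ≈ delta = 1
```

Even with Cov(x, w | d = 0) = 0 by construction, the w coefficient is badly biased unless the censoring is to the mean, exactly as (53) indicates.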
Consider random censoring, where we assume that d_i is statistically independent of x and w.

21. Notable exceptions include the work on multivariate errors-in-variables of Klepper and Leamer (1984) and on bias with polynomial terms of Griliches and Ringstad (1970).

With calculations similar to those applied in Corollary 2, it is easy to show that

plim b̂ = β [Var(x) Var(w) - Cov(x, w)²] / D,   (54)
plim f̂ = δ + β p Cov(x, w) [Var(x) + (γ - E(x))²] / D,   (55)

where D ≡ [Var(x) + p (γ - E(x))²] Var(w) - (1 - p) Cov(x, w)². This gives the asymptotic bias as

plim b̂ - β = -β p [(γ - E(x))² Var(w) + Cov(x, w)²] / D,   plim f̂ - δ = β p Cov(x, w) [Var(x) + (γ - E(x))²] / D.   (56)

This reflects attenuation bias (as found before) if there is low correlation between w_i and x_i, as well as the transmission of bias if w_i and x_i are highly correlated. Moreover, recall that Corollary 2 showed that there is zero bias for bivariate regression when γ = E(x). That is not true in multivariate regression with a correlated regressor: equation (56) shows that there is nonzero bias if Cov(x, w) ≠ 0, even when γ = E(x).

For bound censoring, the exact bias formulas are too complicated to admit easy interpretation (and too tedious to derive here). Some interpretation is possible if we expand the exact bias in p and examine the leading terms. In particular, suppose we have top-coding with d_i = 1[x_i > γ], and assume E(x) = 0, E(w) = 0 for simplicity. Then we have²²

plim b̂ - β ≈ (p β / [Var(x) Var(w) - Cov(x, w)²]) [Var(w) E - Cov(x, w) C],   (57)
plim f̂ - δ ≈ (p β / [Var(x) Var(w) - Cov(x, w)²]) [Var(x) C - Cov(x, w) E],

with

E = Var(x | d = 1) + E(x | d = 1) {E(x | d = 1) - γ},   C = Cov(x, w | d = 1) + E(w | d = 1) {E(x | d = 1) - γ}.   (58)

Now, suppose that w and x are positively correlated (in censored and uncensored data), and that β > 0. Because of top-coding, we have E > 0, and because of the correlation we have C > 0. Thus the bias in each coefficient is a difference of positive terms. With low correlation between w and x, the term C is small and the expansion bias term E dominates for b̂, together with

22. See Rigobon and Stoker (2005a) for details on this derivation.

a negative bias term induced for f̂. As the correlation increases, expansion bias in b̂ decreases and, in essence, transmits to bias in f̂. This demonstrates the basic insight we discussed above. It is possible for the bias to have either sign, depending on the correlation between x and w. For extremely large correlation, the positive expansion bias in b̂ can be wholly reversed, and a positive bias arises for f̂. In that case, it appears that w is doing a better job of proxying for x than the censored x^cen is. We now illustrate these cases.

2.3.1. Bias Transmission with Normal Regressors

To clarify the intuition of the previous derivation, we assumed that the underlying (uncensored) regressors x and w were joint normally distributed.²³ We computed the biases by simulation for different levels of censoring and different correlations between x and w. The biases resulting from top-coding are presented in Table 1. There are four levels of censoring, from mild (10%) to severe (60%). There are five correlation values between x and w, from no correlation (0%) to moderate correlation (50%, 75%) and finally extreme correlation (95%). For the coefficient b̂ of the censored regressor, the zero-correlation case shows the highest expansion bias, increasing with censoring. These values are consistent with those for bivariate regression, and there is no bias in f̂, the coefficient of the other regressor. As correlation is increased, expansion bias in b̂ is reduced and bias emerges in f̂. The amount of transmission is quite substantial with moderate correlation, and we have included the cases of -50% and 50% correlation to illustrate the symmetry in the structure of these biases. Finally, with extreme correlation, the bias is reversed for b̂ and there is large bias in f̂. In this case w does a better job of representing the omitted x than the censored version x^cen does. This is all in line with the intuition discussed above.
Table 2 presents the bias results for an alternative situation, where x is taken to be independently censored to its mean γ = E(x), with the additional regressor w in the equation. With zero correlation there are no biases, which is consistent with the bivariate result for independent censoring to the mean. As the correlation is increased, bias emerges in f̂, as w proxies for x for the censored observations, and we have a resulting attenuation bias in b̂. This phenomenon increases monotonically in both the censoring level and the correlation value, with moderate correlation and censoring resulting in large bias in f̂ and b̂. In particular, censoring to the mean eliminates bias only in situations equivalent to bivariate regression; with several correlated regressors, huge bias can result in this case. With extreme correlation, even random censoring of 10% can result in coefficient biases of 50%.

23. Specifically, we set α = 1, β = 1, and δ = 1, and assumed that x and w have the same variance.
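The asymptotic formulas (54)-(56) can be checked against a Table 2 cell directly. A sketch for 20% censoring to the mean with correlation 50% (unit-variance normal regressors and β = δ = 1, consistent with footnote 23, are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
beta = delta = 1.0
rho, p, gamma = 0.5, 0.2, 0.0        # censor to gamma = E(x) = 0

x = rng.standard_normal(n)
w = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
y = 1.0 + beta * x + delta * w + rng.standard_normal(n)

d = rng.random(n) < p                # independent random censoring
x_cen = np.where(d, gamma, x)
X = np.column_stack([np.ones(n), x_cen, w])
_, b_hat, f_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Limits from (54)-(55) with Var(x) = Var(w) = 1, Cov(x, w) = rho, gamma = E(x):
den = (1 + p * gamma**2) - (1 - p) * rho**2
b_lim = beta * (1 - rho**2) / den            # = 0.9375: about -6% bias
f_lim = delta + beta * p * rho * (1 + gamma**2) / den   # = 1.125: about +12.5% bias
print(b_hat, b_lim, f_hat, f_lim)
```

The simulated coefficients match the limits closely, and the implied biases line up with the 20% / 50% cell of Table 2 (-6.3% and +12.6%).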

                                Correlation
Censoring   Bias of     -50%      0%      50%     75%     95%
10%         b̂           6.2%    7.4%    6.2%    3.2%  -17.6%
            f̂          -2.0%    0.0%    2.0%    5.7%   24.3%
20%         b̂          12.2%   15.5%   12.2%    5.0%  -32.8%
            f̂          -5.2%    0.0%    5.2%   11.8%   43.8%
40%         b̂          26.2%   34.3%   26.2%    7.4%  -52.5%
            f̂         -12.2%    0.0%   12.0%   26.8%   68.0%
60%         b̂          45.5%   62.8%   45.5%   12.8%  -61.8%
            f̂         -20.9%    0.0%   20.9%   41.3%   80.5%

Table 1: Coefficient Biases: One Top-Coded Regressor.

                                Correlation
Censoring   Bias of     -50%      0%      50%     75%     95%
10%         b̂          -3.1%    0.0%   -3.1%  -11.3%  -47.9%
            f̂          -6.3%    0.0%    6.3%   15.1%   50.2%
20%         b̂          -6.3%    0.0%   -6.3%  -20.6%  -64.8%
            f̂         -12.6%    0.0%   12.6%   27.5%   68.4%
40%         b̂         -11.7%    0.0%  -11.7%  -33.4%  -78.6%
            f̂         -23.6%    0.0%   23.6%   45.3%   82.6%
60%         b̂         -16.7%    0.0%  -16.7%  -44.1%  -84.9%
            f̂         -33.3%    0.0%   33.3%   58.2%   89.1%

Table 2: Coefficient Biases: One Regressor Independently Censored to Mean.
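Any cell of Table 1 can be reproduced along the following lines; the sketch targets the 20% censoring / 50% correlation cell (unit variances and β = δ = 1 are assumptions consistent with footnote 23):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
rho = 0.5

x = rng.standard_normal(n)
w = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
y = 1.0 + x + w + rng.standard_normal(n)

x_cen = np.minimum(x, np.quantile(x, 0.8))   # top-code the upper 20%
X = np.column_stack([np.ones(n), x_cen, w])
_, b_hat, f_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_hat - 1, f_hat - 1)   # compare with Table 1's 12.2% and 5.2%
```

Expansion bias appears in the top-coded coefficient and a smaller positive bias transmits to the coefficient of w, as in the corresponding Table 1 cell (differences at this level reflect simulation noise).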

3. Censoring a Regressor to a 0-1 Variable

Our final topic is to consider the implications of replacing a continuous regressor with a dummy (0-1) variable indicating low and high values. This is a severe form of two-value censoring, where almost all of the information in the original continuous variable is lost. Nevertheless, empirical practice is replete with examples of the use of dummy variables where an underlying continuous variable may or may not be more appropriate. Here, we wish to open this area of inquiry with a few general points in line with the ones we have raised above.

3.1. Bivariate Case

Begin with the original bivariate framework (7). With the true model

y_i = α + β x_i + ε_i,  i = 1, ..., n,   (59)

we are interested in the results from estimating

y_i = a + b D_i + u_i,  i = 1, ..., n,   (60)

where D_i is a dummy variable indicating whether x_i exceeds a threshold γ,

D_i = 1[x_i > γ].   (61)

Obviously, this represents two-value censoring (0, 1), and all information about x_i is lost except whether it is above the threshold or not. We denote Pr{D = 1} ≡ P.

From the practical point of view it should be clear, even from the design, that b and β are not the same. However, in practice, it is common to interpret one coefficient as a measure of the other. In our example about returns to years of schooling, the question is how the return β is related to the "college effect" measured by estimating (60). We now formalize how they differ.

For OLS estimators, we have the expressions

plim â = E(y | D = 0) = α + β E(x | D = 0),
plim b̂ = E(y | D = 1) - E(y | D = 0) = β [E(x | D = 1) - E(x | D = 0)].   (62)

The OLS slope coefficient b̂ measures β up to a positive scale. So, if the only question of interest concerns the sign of β, then estimation with 0-1 censoring allows a consistent answer to that question. Any further interpretation of the value of b̂ relative to β depends on the size of the between-group difference E(x | D = 1) - E(x | D = 0), which is an unknown aspect of the distribution of the uncensored continuous variable x.
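Equation (62) says that regressing y on the dummy recovers β only up to the unknown scale E(x | D = 1) - E(x | D = 0). A minimal check (standard normal x, threshold γ = 0, and β = .5 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
alpha, beta = 1.0, 0.5

x = rng.standard_normal(n)
y = alpha + beta * x + rng.standard_normal(n)
D = x > 0.0                                  # dummy: x above the threshold

# OLS slope on a constant and the dummy = group-mean difference, per (62)
b_hat = y[D].mean() - y[~D].mean()
scale = x[D].mean() - x[~D].mean()           # = 2 * sqrt(2/pi) ≈ 1.596 here
print(b_hat, beta * scale)
```

Here b̂ is roughly 0.8, about 1.6 times β: the sign is right, but the magnitude is driven entirely by the between-group spread of x, which the analyst typically does not observe.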

The same sort of bias arises for bivariate IV estimators. Suppose that z_i is a valid instrument for x_i in (59) and that ĉ is the IV estimator using z_i to instrument D_i in (60). It is straightforward to show that

plim ĉ = β [E(x | D = 1) - E(x | D = 0)] + β · [(1 - P) Cov(x, z | D = 0) + P Cov(x, z | D = 1)] / (P (1 - P) [E(z | D = 1) - E(z | D = 0)]).   (63)

Now there is an additional bias term, depending upon the within-group interaction of the instrument z_i with x_i. If z_i and x_i are positively correlated, then it is natural to expect that the final covariance term is positive (higher x_i broadly associated with higher z_i values), in which case the IV estimator would estimate a term with the same sign as β. Any additional interpretation of the value of ĉ relative to β would now require knowing features of the joint distribution of x_i and z_i. However, it is easy to construct examples where the covariance term of (63) is negative and outweighs the first term, so that the overall bias scaling is negative. In those cases, the censoring of x_i to D_i gives a completely wrong depiction of the relation between y_i and x_i.

3.2. Multivariate Case

Given the extreme nature of 0-1 censoring, one might expect the biggest bias issues to arise when there are additional regressors in the equation. If an additional regressor w_i is correlated with x_i, then the censoring will cause w_i to proxy for x_i. The resulting transmission of bias is likely to be more extreme than in the cases studied earlier (since there x_i was observed for a positive fraction of the data). This will contaminate the coefficient of w_i, and will likely affect the bias in the coefficient of D_i as well. We now verify this intuition.

Now the true model is (50), reproduced as

y_i = α + β x_i + δ w_i + ε_i,  i = 1, ..., n.   (64)

We are interested in the OLS estimates â, b̂, f̂ of the coefficients of
y_i = a + b D_i + f w_i + u_i,  i = 1, ..., n,   (65)

where D_i is observed instead of x_i.

To develop the bias in this case, note first that regressing a variable on D_i leaves the within-group deviation as residual; e.g., the residual of y_i regressed on D_i is

ỹ_i = y_i - (1 - D_i) ȳ_0 - D_i ȳ_1,

where ȳ_1 = Σ_i D_i y_i / Σ_i D_i is the average of y_i for D = 1 and ȳ_0 = Σ_i (1 - D_i) y_i / Σ_i (1 - D_i) is the average for D = 0.
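The first step of this development, that OLS on a constant and D_i fits the two group means exactly, so the residual is the within-group deviation, is a finite-sample identity. A minimal check:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
y = rng.standard_normal(n)
D = (rng.random(n) < 0.4).astype(float)

# OLS residual of y on (1, D) versus the explicit within-group deviation
X = np.column_stack([np.ones(n), D])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
y0, y1 = y[D == 0].mean(), y[D == 1].mean()
y_tilde = y - np.where(D == 1, y1, y0)
print(np.abs(resid - y_tilde).max())   # ≈ 0 (numerical precision only)
```

This is why the multivariate bias analysis below can be conducted entirely in terms of group means and within-group moments.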