Econometrica, Vol. 75, No. 5 (September, 2007), 1513-1518

IDENTIFICATION OF MARGINAL EFFECTS IN NONSEPARABLE MODELS WITHOUT MONOTONICITY

BY STEFAN HODERLEIN AND ENNO MAMMEN¹

Nonseparable models do not impose any type of additivity between the unobserved part and the observable regressors, and are therefore ideal for many economic applications. To identify these models using the entire joint distribution of the data as summarized in regression quantiles, monotonicity in unobservables has frequently been assumed. This paper establishes that in the absence of monotonicity, the quantiles identify local average structural derivatives of nonseparable models.

KEYWORDS: Nonparametric, partial identification, nonseparable model, quantile regression, nonparametric instrumental variable, weak axiom.

1. INTRODUCTION

IN MANY CLASSICAL ECONOMETRIC MODELS the error terms have been modeled as additively separable because this allows the application of powerful statistical tools. However, additive separability has often no real foundation in economic theory, and hence models with weakly separable error terms have become increasingly popular. In the most basic setup, such a model takes the form

$$Y = \phi(X, A), \tag{1}$$

where $Y$ and $X$ are observable real valued random $p$- and $d$-vectors, $\phi$ is a smooth measurable function, and $A$ is an unobservable random vector. Often, the relationship between $Y$ and some or all of the regressors $X$ is of key economic interest, while $A$ captures omitted factors and all types of unobserved heterogeneity. This model (1) was first analyzed by Roehrig (1988). Recent contributions that avoid assuming monotonicity are by Imbens and Newey (2003), Altonji and Matzkin (2005), and Hoderlein (2007). See also Hoderlein and Mammen (2006) for more references.

In this paper we will be concerned with the following question: What can we learn from the entire joint distribution of observables $(Y, X)$ about marginal effects of $\phi$ with respect to economically important variables in the structural model (1)? To be clear, we are not looking for a minimal set of assumptions that make the structural model (1) completely identifiable. Instead we look at functionals of the structural model that are still identified without invoking any major assumption on the structure of $\phi$ or the dimensionality of $A$.

¹ Stefan Hoderlein dedicates this paper to the memory of Eduardo Hoderlein. We are indebted to the coeditor Whitney Newey, to three anonymous referees, to Andrew Chesher, Joel Horowitz, Oliver Linton, Rosa Matzkin, and Jim Powell, as well as to seminar participants at the ESWC, EMS Oslo, Bergen, Berlin, Göttingen, Heidelberg, Frankfurt, Madrid, Mannheim, Northwestern, Strassburg, Tübingen, and UCL/IFS for helpful comments. The usual disclaimer applies.

The main result of this paper establishes that for scalar $Y$ the conditional expectations of derivatives of $\phi$ with respect to $X$, given all available data, that is, $X$ and $Y$, are identified under a weak conditional independence assumption. This result does not imply that every single individual's marginal effect in a heterogeneous population is identified (e.g., $\partial_{x_1}\phi$), but it shows that the average marginal effects over all individuals with the same observable data outcomes (covariates and response) are identified. We call this quantity the local average structural derivative (LASD). For instance, by use of quantile derivatives we can identify the average marginal effect of a marginal increase in schooling on wage for a subpopulation defined by different values of the covariates, say age and number of children, as well as at different quantiles of the wage distribution. Consequently, if policy makers want to target the average wage of young low income families with many children by marginally increasing their schooling, then it would be exactly the LASD they would have to consider.

Of course, in this example the obvious question is that of endogeneity of the regressors. In Hoderlein and Mammen (2006) we discussed the economic rationale of our approach in more detail and extended it precisely to include endogenous regressors. Moreover, we discussed semiparametric modeling, nonparametric estimation, and testing (including the associated large sample theory), and we provided an application.

2. IDENTIFICATION: BASIC RESULT

In this section we establish formally what can be learned from data about the marginal effect of one regressor, say $x_1$, on the dependent variable $y$. Summarizing the setup, in model (1), let $Y$ be a random scalar and let $X$ be a random $d$-vector. $A$ is an unobserved random variable that takes values in a Borel space $\mathcal{A}$, that is, a set that is homeomorphic to a Borel subset of the unit interval endowed with the Borel $\sigma$-field.
This includes the case that $A$ is a random element of a Polish space, for example, a random piecewise continuous (utility) function. Moreover, let $\partial_{x_1}$ denote the partial derivative with respect to the first component of $x$. We can now formulate the central question as follows: What can be learned about $\partial_{x_1}\phi$ from the observed data $(Y, X)$?

To answer this question, we introduce some notation and assumptions. Let $k_\alpha(x)$ denote the conditional $\alpha$-quantile of $Y$ given $X = x$, that is, for $0 < \alpha < 1$ the quantity $k_\alpha(x)$ is defined by

$$P[Y \le k_\alpha(x) \mid X = x] = \alpha.$$

For fixed values $x^* \in \mathbb{R}^d$ and $0 < \alpha < 1$ we will make use of the following assumptions:

ASSUMPTION A1: The random variables $A$ and $X_1$ are conditionally independent given $X_2, \ldots, X_d$.

ASSUMPTION A2: The conditional distribution of $Y$ given $X$ is absolutely continuous with respect to the Lebesgue measure for $x_1$ in a neighborhood of $x_1^*$ and for $x_{-1} = x_{-1}^*$. Here we use the notation $x_{-1} = (x_2, \ldots, x_d)$. The density $f_{Y|X}(y \mid x_1, x_{-1}^*)$ of $Y$ given $X$ is continuous in $(y, x_1)$ at the point $(y, x_1) = (k_\alpha(x^*), x_1^*)$. The conditional density $f_{Y|X}(y \mid x^*)$ of $Y$ given $X = x^*$ is bounded in $y \in \mathbb{R}$.

ASSUMPTION A3: $k_\alpha(x)$ is partially differentiable with respect to the first component at $x = x^*$. Moreover, there exists a measurable function $\Delta$ that satisfies

$$P\big[\,|\phi(x_1^* + \delta, x_{-1}^*, A) - \phi(x^*, A) - \delta\,\Delta(A)| \ge \varepsilon\delta \,\big|\, X = x^*\big] = o(\delta)$$

for $\delta \to 0$ and fixed $\varepsilon > 0$. We write $\partial_{x_1}\phi(x^*, a)$ for $\Delta(a)$ and $\partial_{x_1}\phi$ for $\Delta(A)$.

ASSUMPTION A4: The conditional distribution of $(Y, \partial_{x_1}\phi)$ given $X$ is absolutely continuous with respect to the Lebesgue measure for $x = x^*$. For the conditional density $f_{Y, \partial_{x_1}\phi | X}$ of $(Y, \partial_{x_1}\phi)$ given $X$, we require that $f_{Y, \partial_{x_1}\phi | X}(y, y' \mid x^*) \le C g(y')$, where $C$ is a constant and $g$ is a positive density on $\mathbb{R}$ with finite mean (i.e., $\int |y'|\, g(y')\, dy' < \infty$).

REMARK 2.1: The only substantial assumption is Assumption A1, a (conditional) independence assumption that is weaker than full joint independence of $A$ and $X$ usually assumed in this literature. The other assumptions are common regularity conditions. Note that all other regressors may be arbitrarily correlated with the unobservables $A$ and they may be discrete. For a discussion of the economic content of these assumptions, including an example, we refer again to Hoderlein and Mammen (2006).

Under these assumptions, we obtain the following theorem on the relation between the derivative of the conditional quantile and the marginal effect of the nonseparable function $\phi$. The proof can be found in the Appendix.

THEOREM 2.1: For fixed values $x^* \in \mathbb{R}^d$ and $0 < \alpha < 1$, assume that Assumptions A1-A4 hold.² Then

$$E[\partial_{x_1}\phi(x^*, A) \mid X = x^*, Y = k_\alpha(x^*)] = \partial_{x_1} k_\alpha(x^*).$$

REMARK 2.2: This theorem states that we can identify an average over the marginal effects $\partial_{x_1}\phi$ from the data. The derivative of the quantile is the best approximation (in the sense of minimizing $L_2$ distance) to the true marginal effects given all the information at our disposal.
Obviously, from this quantity, weighted population averages can be obtained where the weights may depend on the dependent variable. In Hoderlein and Mammen (2006), we extended this result further by treating the case of endogenous regressors in a control function setup. However, we also showed by counterexample that an extension to systems of equations is not possible without further assumptions.

² Here the versions of the conditional expectation and of the conditional quantile are calculated with versions of the conditional and unconditional densities that fulfill Assumptions A1-A4.

REMARK 2.3: Let us compare our model to the case where $A$ is univariate, without loss of generality uniformly distributed on $[0, 1]$, and $\phi$ is strictly monotone in $A$. Under this additional monotonicity assumption the function $\phi(x, \alpha)$ is identified by $k_\alpha(x)$, and hence $\partial_{x_1}\phi(x, \alpha) = \partial_{x_1} k_\alpha(x)$. However, this assumption has severe consequences. Take as an example consumer demand: By observing the expenditure $Y$ of an individual for only one value of the covariate $X$, the expenditure for the same individual is determined for all values of $X$. Moreover, any two individuals with the same expenditure at one $X$ have identical demands for all $X$. Finally, together with conditional independence, monotonicity implies that the ordering of individuals remains strictly preserved, that is, they stay at exactly the same percentile of expenditure for any level of the covariates. As is obvious from these examples, invoking the assumption of monotonicity should be carefully considered in general.

To see the differences from the monotonicity approach formally, suppose for the moment that we were interested in the closest approximation to the square of the derivative $\partial_{x_1}\phi(x, A)$. Given all our information, that is, $X$ and $Y$, the best $L_2$ approximation to this quantity is

$$\begin{aligned}
E[\partial_{x_1}\phi(x, A)^2 \mid X = x, Y = y]
&= E[\partial_{x_1}\phi(x, A) \mid X = x, Y = y]^2 + \operatorname{Var}[\partial_{x_1}\phi(x, A) \mid X = x, Y = y] \\
&= \partial_{x_1} k_\alpha(x)^2 + \operatorname{Var}[\partial_{x_1}\phi(x, A) \mid X = x, Y = y] \\
&\ge \partial_{x_1} k_\alpha(x)^2,
\end{aligned}$$

with equality holding only if $\partial_{x_1}\phi$ is degenerate in the sense that it is a function of $(X, Y)$ only; else, $E[\partial_{x_1}\phi(x, A)^2 \mid X = x, Y = y] > \partial_{x_1} k_\alpha(x)^2$, that is, we can only bound the quantity of interest.
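
As a purely illustrative check of Theorem 2.1 and of the variance gap in Remark 2.3, consider a hypothetical Gaussian random-coefficient model $Y = A_1 + A_2 X$ with scalar $X$ and $(A_1, A_2)$ bivariate normal and independent of $X$. This model and all parameter values below are assumptions made for the sketch and do not come from the paper. Here $A$ is two-dimensional, so scalar monotonicity is not available, and $\partial_x\phi = A_2$. The sketch compares a finite-difference derivative of the conditional quantile with $E[A_2 \mid X = x, Y = k_\alpha(x)]$ computed from the standard Gaussian conditioning formula, and reports the conditional variance that makes the inequality for the squared derivative strict.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical illustration (not from the paper): random-coefficient model
#   Y = A1 + A2 * X,  (A1, A2) bivariate normal and independent of X,
# so the marginal effect of x is the random coefficient A2.
MU1, MU2 = 1.0, 0.5            # assumed means of A1 and A2
S1, S2, RHO = 1.0, 0.8, -0.3   # assumed standard deviations and correlation


def var_y(x):
    """Var(Y | X = x) in the assumed model."""
    return S1**2 + 2 * RHO * S1 * S2 * x + (S2 * x) ** 2


def quantile(alpha, x):
    """Conditional alpha-quantile k_alpha(x) of Y given X = x."""
    z = NormalDist().inv_cdf(alpha)
    return MU1 + MU2 * x + z * sqrt(var_y(x))


def quantile_derivative(alpha, x, h=1e-6):
    """Central finite-difference approximation to the derivative of k_alpha in x."""
    return (quantile(alpha, x + h) - quantile(alpha, x - h)) / (2 * h)


def lasd(alpha, x):
    """E[A2 | X = x, Y = k_alpha(x)] via the Gaussian conditioning formula."""
    cov = RHO * S1 * S2 + S2**2 * x  # Cov(A2, Y | X = x)
    y = quantile(alpha, x)
    return MU2 + cov / var_y(x) * (y - MU1 - MU2 * x)


def conditional_variance(x):
    """Var(A2 | X = x, Y = y); free of y in the Gaussian case."""
    cov = RHO * S1 * S2 + S2**2 * x
    return S2**2 - cov**2 / var_y(x)


if __name__ == "__main__":
    x = 1.5
    for alpha in (0.1, 0.5, 0.9):
        print(alpha, quantile_derivative(alpha, x), lasd(alpha, x),
              conditional_variance(x))
```

In this sketch the first two printed columns should agree up to finite-difference error at every quantile, while the positive conditional variance in the last column is the amount by which the conditional second moment of the marginal effect exceeds the squared quantile derivative.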

The obvious conclusion is that for general nonlinear functions of the derivative, the monotonicity approach still point identifies the nonlinear function of the derivative, while without monotonicity we will, in general, not even be able to give bounds for the best approximation.

Dept. of Economics, University of Mannheim, L 7, 3-5, 68131 Mannheim, Germany; stefan_hoderlein@yahoo.com
and
Dept. of Economics, University of Mannheim, L 7, 3-5, 68131 Mannheim, Germany; emammen@rumms.uni-mannheim.de.

Manuscript received October, 2005; final revision received April, 2007.
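
APPENDIX

PROOF OF THEOREM 2.1: For simplicity of exposition, we concentrate on the scalar $x$ case and we write $X = X_1$. By definition of $k_\alpha(x)$, for $\delta > 0$,

$$0 = P[Y \le k_\alpha(x^* + \delta) \mid X = x^* + \delta] - P[Y \le k_\alpha(x^*) \mid X = x^*] = A_1 + A_2 + A_3, \tag{2}$$

where

$$\begin{aligned}
A_1 &= P[\phi(x^* + \delta, A) \le k_\alpha(x^* + \delta) \mid X = x^* + \delta] - P[\phi(x^* + \delta, A) \le k_\alpha(x^*) \mid X = x^* + \delta], \\
A_2 &= P[\phi(x^* + \delta, A) \le k_\alpha(x^*) \mid X = x^* + \delta] - P[\phi(x^* + \delta, A) \le k_\alpha(x^*) \mid X = x^*], \\
A_3 &= P[\phi(x^* + \delta, A) \le k_\alpha(x^*) \mid X = x^*] - P[\phi(x^*, A) \le k_\alpha(x^*) \mid X = x^*].
\end{aligned}$$

By Assumption A1 we have $A_2 = 0$. For $A_1$ we get

$$A_1 = \int_{k_\alpha(x^*)}^{k_\alpha(x^* + \delta)} f_{Y|X}(y \mid x^* + \delta)\, dy = \delta\, \partial_x k_\alpha(x^*)\, f_{Y|X}(k_\alpha(x^*) \mid x^*) + o(\delta) \tag{3}$$

by Assumptions A2 and A3 for $\delta \to 0$. For $A_3$ we get, for $\delta \to 0$,

$$\begin{aligned}
A_3 &= P[k_\alpha(x^*) < Y \le k_\alpha(x^*) + Y - \phi(x^* + \delta, A) \mid X = x^*] \\
&\quad - P[k_\alpha(x^*) + Y - \phi(x^* + \delta, A) < Y \le k_\alpha(x^*) \mid X = x^*] \\
&= P[k_\alpha(x^*) \le Y \le k_\alpha(x^*) - \delta\, \partial_x\phi \mid X = x^*] \\
&\quad - P[k_\alpha(x^*) - \delta\, \partial_x\phi \le Y \le k_\alpha(x^*) \mid X = x^*] + o(\delta),
\end{aligned} \tag{4}$$

where in the last step we have used Assumptions A3 and A4. By some more simple algebra,

$$\begin{aligned}
A_3 &= P\Big[Y \ge k_\alpha(x^*),\ \partial_x\phi \le -\frac{Y - k_\alpha(x^*)}{\delta} \,\Big|\, X = x^*\Big]
 - P\Big[Y \le k_\alpha(x^*),\ \partial_x\phi \ge -\frac{Y - k_\alpha(x^*)}{\delta} \,\Big|\, X = x^*\Big] + o(\delta) \\
&= \int_{k_\alpha(x^*)}^{\infty} \int_{-\infty}^{-[y - k_\alpha(x^*)]/\delta} f_{Y, \partial_x\phi | X}(y, y' \mid x^*)\, dy'\, dy
 - \int_{-\infty}^{k_\alpha(x^*)} \int_{-[y - k_\alpha(x^*)]/\delta}^{\infty} f_{Y, \partial_x\phi | X}(y, y' \mid x^*)\, dy'\, dy + o(\delta) \\
&= \delta \int_{-\infty}^{0} \int_{y'}^{0} f_{Y, \partial_x\phi | X}(k_\alpha(x^*) - \delta u, y' \mid x^*)\, du\, dy'
 - \delta \int_{0}^{\infty} \int_{0}^{y'} f_{Y, \partial_x\phi | X}(k_\alpha(x^*) - \delta u, y' \mid x^*)\, du\, dy' + o(\delta) \\
&= \delta \int_{-\infty}^{0} (-y')\, f_{Y, \partial_x\phi | X}(k_\alpha(x^*), y' \mid x^*)\, dy'
 - \delta \int_{0}^{\infty} y'\, f_{Y, \partial_x\phi | X}(k_\alpha(x^*), y' \mid x^*)\, dy' + o(\delta) \\
&= -\delta\, E[\partial_x\phi \mid Y = k_\alpha(x^*), X = x^*]\, f_{Y|X}(k_\alpha(x^*) \mid x^*) + o(\delta).
\end{aligned}$$

From (2)-(4) and $A_2 = 0$ we get the statement of the theorem. Q.E.D.

REFERENCES

ALTONJI, J., AND R. MATZKIN (2005): "Cross Section and Panel Data Estimators for Nonseparable Models with Endogenous Regressors," Econometrica, 73, 1053-1103.

HODERLEIN, S. (2007): "How Many Consumers Are Rational?" Working Paper, Mannheim University.

HODERLEIN, S., AND E. MAMMEN (2006): "Partial Identification and Nonparametric Estimation of Local Average Structural Derivatives," Working Paper, Mannheim University.

IMBENS, G., AND W. NEWEY (2003): "Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity," Working Paper, MIT.

ROEHRIG, C. (1988): "Conditions for Identification in Nonparametric and Parametric Models," Econometrica, 56, 433-447.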