Identification of Average Effects under Magnitude and Sign Restrictions on Confounding


Identification of Average Effects under Magnitude and Sign Restrictions on Confounding

Karim Chalak† Boston College

August 1, 2014

Abstract: This paper studies measuring the average effects of X on Y in structural systems by imposing magnitude and sign restrictions on confounding without requiring (conditional) exogeneity of causes, treatment, or instruments. We study the identification of covariate-conditioned average random coefficients, average nonparametric discrete and marginal effects, local and marginal treatment effects, as well as average treatment effects for the population, treated, and untreated. We characterize the omitted variables bias, due to confounders U, of regression and instrumental variables methods for the identification of these various average effects, thereby generalizing the classic linear regression omitted variable bias representation. Then, using proxies W for U, we ask how the average direct effects of U on Y compare in magnitude and sign to those of U on W. Exogeneity and proportional confounding are limit cases yielding full identification. Alternatively, the effects of X on Y are partially identified in sharp bounded intervals if W is sufficiently sensitive to U, and sharp upper or lower bounds may obtain otherwise. After studying estimation and inference, we apply this method to study the financial return to education and the black-white wage gap.

Keywords: causality, confounding, endogeneity, omitted variable, partial identification, proxy.

Department of Economics, Boston College (University of Virginia starting Fall 2014). chalak@bc.edu.
† Acknowledgments: I thank the participants in the Northwestern Junior Festival of Recent Developments in Microeconometrics, Harvard Causal Inference Seminar, 2012 California Econometrics Conference, 2013 BU-BC Green Line Econometrics Conference, 2013 North American Winter Meeting of the Econometric Society, New York Camp Econometrics VIII, 23rd annual meeting of the Midwest Econometrics Group, 9th Greater New York Metropolitan Area Econometrics Colloquium, Cowles Foundation Conference on Econometrics, and the seminars at BC, the Federal Reserve Bank of Cleveland, UCSD, UCLA, USC, University of Pittsburgh, IUPUI, University of Wisconsin-Milwaukee, Oxford, Royal Holloway University of London, University of Leicester, UVA, and University of Montreal, as well as Kate Antonovics, Andrew Beauchamp, Stéphane Bonhomme, Donald Cox, Julie Cullen, Stefan Hoderlein, Arthur Lewbel, Matthew Masten, and Elie Tamer for helpful comments. I thank Rossella Calvi, Daniel Kim, and Tao Yang for excellent research assistance. Any errors are the author's responsibility.

1 Introduction

This paper studies identifying and estimating average causal effects in structural systems by imposing restrictions on the magnitude and sign of confounding, without requiring conditional exogeneity of causes, treatment, or instruments given covariates. In particular, we study the full (point) and partial identification of covariate-conditioned average random coefficients, average nonparametric discrete and marginal effects, local and marginal treatment effects, as well as average treatment effects for the population, treated, and untreated.

To illustrate the paper's main ideas, consider a Mincer (1974) earning structural equation, frequently employed in empirical work (see e.g. the discussion in Card, 1999), given by

Y = α_Y + X'β + U δ_Y,   (1a)

where Y denotes the logarithm of hourly wage, X denotes observed determinants of wage including years of education, and the scalar U, commonly referred to as "ability" in the literature, denotes unobserved skill. Thus, both X and U are potential structural determinants (causes) of Y, albeit realizations of Y and X are observed whereas those of U are not. As discussed below, we emphasize that this paper's approach does not require a linear or parametric specification. Nevertheless, in order to introduce the main ideas in their simplest form, we let U be scalar and consider constant slope coefficients β and δ_Y for now, but allow for a random intercept α_Y which may be correlated with X. We also leave implicit conditioning on covariates. Our object of interest here is β, the vector of (average) direct effects of the elements of X on Y (e.g. the average financial return to education). Because U is freely associated with X and may cause Y (e.g. education choices and wage may depend on ability), we say that U is an unobserved confounder and X is endogenous. The researcher observes realizations of a vector Z of potential instruments that are uncorrelated with α_Y but possibly freely correlated with ability U and therefore invalid.
This allows for the possibility that a potential instrument for education, e.g. proximity to a college, may be correlated with ability U, e.g. due to unobserved parental characteristics or choices. We let Z and X have the same dimension; in particular, Z may equal X. Suppose that the researcher observes realizations of a proxy W for U that is possibly error-laden and given by

W = α_W + U δ_W,   (1b)

where, for now, we consider a constant slope coefficient δ_W and a random intercept α_W which may be correlated with X. For example, W may denote the logarithm of a test score commonly used as a proxy for ability, such as IQ (Intelligence Quotient) or KWW (Knowledge of the World of Work). This parsimonious specification facilitates comparing the slope coefficients on U in the Y and W equations while maintaining the commonly used log-level specification for

the wage equation. In particular, δ_Y and δ_W denote respectively the semi-elasticities of wage and test score with respect to unobserved ability (i.e. 100 δ_Y % and 100 δ_W % are the average approximate percentage changes in wage and test score respectively due directly to a unit or percentile increase in U). Alternatively, one could consider standardizing the variables in these two equations, in which case the slope coefficients on standardized ability denote standard deviation shifts in wage and test score respectively due to a standard deviation shift in ability.

Let Z be uncorrelated with α_W; thus, the proxy W is informative, in the sense that any correlation between Z and W arises solely due to U.¹ Define Z̃ ≡ Z − E(Z). We then have

E(Z̃Y) = E(Z̃X')β + E(Z̃U)δ_Y and E(Z̃W) = E(Z̃U)δ_W.

Provided E(Z̃X') is nonsingular and δ_W ≠ 0, we obtain

β = E(Z̃X')⁻¹E(Z̃Y) − E(Z̃X')⁻¹E(Z̃W)(δ_Y/δ_W).

This expression for β involves two linear instrumental variables (IV) regression estimands, E(Z̃X')⁻¹E(Z̃Y) and E(Z̃X')⁻¹E(Z̃W). It also involves the unknown ratio δ_Y/δ_W of the (average) direct effect of U on Y to that of U on W. Importantly, the IV regression omitted variable bias (or inconsistency) E(Z̃X')⁻¹E(Z̃W)(δ_Y/δ_W) in measuring β is known up to this ratio. As we show, similar expressions for the average effects of X on Y obtain, under suitable assumptions, in the cases of random slope coefficients and nonparametric effects. We ask the following questions:

1. How does the average direct effect of U on Y compare in magnitude to that of U on W?

2. How does the average direct effect of U on Y compare in sign to that of U on W?

The answers to these questions impose restrictions on the magnitude and sign of confounding which fully or partially identify the average effects of X on Y. The paper does not require particular answers to these questions. Instead, it characterizes the mapping² from every possible answer to the corresponding identification region for the average effects of X on Y.
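Since the text is prose only, the identity above can be illustrated with a small numerical sketch. The following Python simulation is hypothetical: the data generating process and every parameter value are invented, and the helper name `iv_slope` is ours. It shows that R_{Y.X|Z} is biased for β when Z is correlated with U, while subtracting the proxy-based term with the true ratio δ_Y/δ_W recovers β.

```python
# Minimal simulation of the scalar model (1a, 1b); all numbers are invented.
# It checks that beta = R_{Y.X|Z} - R_{W.X|Z} * (delta_Y / delta_W) even
# though Z is an invalid instrument (correlated with the confounder U).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta, delta_Y, delta_W = 0.08, 0.3, 0.6      # assumed "true" effects

U = rng.normal(size=n)                       # unobserved ability
Z = 0.5 * U + rng.normal(size=n)             # invalid instrument: Cov(Z, U) != 0
X = 0.7 * Z + 0.8 * U + rng.normal(size=n)   # endogenous cause (e.g. education)
Y = 1.0 + beta * X + delta_Y * U + rng.normal(size=n)   # wage equation (1a)
W = 0.5 + delta_W * U + rng.normal(size=n)              # error-laden proxy (1b)

def iv_slope(a, x, z):
    """Sample analog of the scalar IV estimand E(z~ x~)^{-1} E(z~ a~)."""
    zt = z - z.mean()
    return (zt @ (a - a.mean())) / (zt @ (x - x.mean()))

R_Y = iv_slope(Y, X, Z)                      # biased for beta
R_W = iv_slope(W, X, Z)                      # the identifiable bias component
beta_hat = R_Y - R_W * (delta_Y / delta_W)   # corrects the bias exactly
print(R_Y, beta_hat)
```

Here the correction uses the true ratio δ_Y/δ_W = 0.5; the paper's point is that when this ratio is only restricted in magnitude or sign, the same identity delivers bounds rather than a point.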
In particular, exogeneity is a limiting special case, which can obtain if the average direct effect δ_Y of U on Y is zero, yielding full (point) identification. Proportional confounding is another limiting case, in which the average direct effect of U on Y equals a known proportion of that of U on W, also yielding full identification. Alternatively, weaker restrictions on how the average direct effect of U on Y compares in magnitude and/or sign to that of U on W partially identify elements of β,

¹ More generally, we let U denote the vector of unobservables that drive Y or the proxies W and are thought to be freely correlated with Z (conditional on covariates S), and we absorb into α_Y and α_W respectively the unobserved drivers of Y and W that are (conditionally) uncorrelated with Z. It then suffices to have as many proxies W as confounders U. Recall that Z may equal X.

² Leamer (1983) suggests the slogan "the mapping is the message."

yielding sharp bounded intervals when the proxy W is sufficiently sensitive to the confounder U, and sharp lower or upper bounds otherwise.

Sometimes, economic theory and evidence can provide guidance to answering these questions in particular contexts. For example, in the earning equation illustrative example, it may be reasonable to assume that, given the observables, wage is on average less elastic or sensitive to unobserved ability than the test score is, i.e. |δ_Y| ≤ |δ_W|. Specifically, given the observed characteristics, a change in U may, on average, directly cause a higher percentage change in the test score than in wage. Moreover, we sometimes further assume that ability, on average, directly affects wage and the test score in the same direction, i.e. 0 ≤ δ_Y/δ_W. These assumptions are in accord with several theoretical and empirical findings. For instance, Cawley, Heckman, and Vytlacil (2001) find that the fraction of wage variance explained by measures of cognitive ability is modest and that personality traits are correlated with earnings primarily through schooling attainment. Provided that ability measures, such as IQ or KWW, are sufficiently associated with unobserved ability U, this suggests that the average direct effects of U on Y may be modest. Second, when ability is not revealed to employers, they may statistically discriminate based on observables such as education (see e.g. Altonji and Pierret, 2001; Arcidiacono, Bayer, and Hizmo, 2010). This also suggests a modest average direct effect of U on Y. Third, the empirical findings in this paper corroborate the assumption |δ_Y| ≤ |δ_W| since allowing |δ_Y| to be larger than |δ_W| often extends the estimated identification regions to include a negative average return to education and a black-white wage gap in favor of blacks, which is inconsistent with the general findings in the literature. Last, the assumptions underlying partial identification, such as |δ_Y| ≤ |δ_W| or 0 ≤ δ_Y ≤ δ_W or 0 ≤ δ_Y/δ_W ≤ 1, are a weakening of the commonly employed assumption of exogeneity.
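Under the restriction 0 ≤ δ_Y/δ_W ≤ 1 in the scalar model, the identity β = R_{Y.X|Z} − R_{W.X|Z}(δ_Y/δ_W) traces out an interval of candidate values for β. A hypothetical sketch of the resulting bounds follows (the design and all numbers are invented; this is an illustration, not the paper's estimator):

```python
# Sketch: interval for beta when rho = delta_Y/delta_W is only known to lie
# in [0, 1], using beta = R_{Y.X|Z} - R_{W.X|Z} * rho. Invented numbers.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta, rho, delta_W = 0.08, 0.5, 0.6          # assumed true values
U = rng.normal(size=n)
Z = 0.5 * U + rng.normal(size=n)             # invalid instrument
X = 0.7 * Z + 0.8 * U + rng.normal(size=n)
W = delta_W * U + rng.normal(size=n)         # error-laden proxy
Y = beta * X + rho * delta_W * U + rng.normal(size=n)  # delta_Y = rho * delta_W

def iv_slope(a, x, z):
    zt = z - z.mean()
    return (zt @ (a - a.mean())) / (zt @ (x - x.mean()))

R_Y, R_W = iv_slope(Y, X, Z), iv_slope(W, X, Z)
# Sweeping rho over [0, 1] yields the interval between R_Y and R_Y - R_W.
lo, hi = sorted((R_Y - R_W, R_Y))
print(lo, hi)
```

The true β = 0.08 lies inside [lo, hi], and the interval's width |R_{W.X|Z}| shrinks as the instrument's correlation with the confounder weakens.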
Specifically, with Z and U freely (conditionally) correlated, exogeneity requires the coefficient δ_Y on U to be zero; we allow but don't require this.

More generally, this paper's approach doesn't require a linear or parametric specification; Sections 4 and 5 give key general results. In particular, Section 4 studies the structural system:

Y = r(X, S, U, U_Y) and W = q(S, U, U_W),   (2)

where the vectors of unobservables U_Y and of confounders U interact nonseparably with X and a vector of observed covariates S to drive Y according to the unknown nonparametric structural function r. Further, U interacts with the unobserved vector U_W and the covariates S to drive the vector of proxies W according to the unknown function q. Unlike U_Y and U_W, U may statistically depend on X given the covariates S. Here, we study the identification of conditional average discrete and marginal causal effects of X on Y given covariates S under magnitude and sign restrictions on confounding. For example, in the nonseparable nonparametric case, one can apply a probability transformation, e.g. for continuous response, proxy, and confounder, to

rewrite the response and proxy equations such that U, Y, and W have uniform distributions. The researcher can then contrast the average sign and magnitude of the percentile changes in the response and proxy due to a percentile change in the confounder.

To illustrate, consider first the additively separable case in which

Y = r(X, S, U_Y) + U'δ_Y and W = α_W + Δ_W'U,   (3)

where δ_Y is a vector of random coefficients that can depend on S and U_Y, and where α_W is a vector, and Δ_W a matrix, of random coefficients that can depend on S and U_W. When δ_Y = 0 and U_Y ⊥ X | S,³ we obtain the specification for the Y equation studied in e.g. Altonji and Matzkin (2005), Hoderlein and Mammen (2007), and Imbens and Newey (2009), yielding full identification of various average effects of X on Y. Section 4 studies the full and partial identification of conditional average effects of X on Y in systems with U_Y ⊥ X | S = s but where U may depend on X given S, first in the additively separable case in equations (3) with the random vector δ_Y possibly nonzero, and second when the effect of U on Y in the nonseparable equations (2) is possibly nonzero. In Section 3, we focus on the special case in which r(X, S, U_Y) = r_0(S, U_Y) + Σ_{j=1}^{k} X_j r_j(S, U_Y) ≡ α_Y + X'β, with α_Y and β denoting random intercept and slope coefficients. In this case, we study the identification of conditional averages of β under magnitude and sign restrictions on confounding, while allowing for X and the potential instruments Z to be freely (conditionally) correlated with U and for δ_Y to be nonzero. Appendix A contains additional extensions in the random coefficients case to allow for a panel structure and proxies included in the Y equation.

Last, Section 5 studies treatment effects and augments equations (2) and (3) from Section 4 with a threshold crossing equation generating the binary treatment X:

X = 1{U_X ≤ ζ(Z, S)},

where U_X is an unobserved variable and the function ζ is unknown.
For example, when δ_Y = 0 in the separable equations (3) and (U_X, U_Y) ⊥ Z | S, we obtain the specification for the X and Y equations studied e.g. in Imbens and Angrist (1994) and Heckman and Vytlacil (2005). Section 5 studies the full and partial identification of conditional local and marginal treatment effects as well as conditional average treatment effects for the population, treated, and untreated, when (U_X, U_Y) ⊥ Z | S = s but U may depend on Z given S, in both the separable case in equations (3) with δ_Y possibly nonzero and the nonseparable case in equations (2) when the effect of U on Y may be nonzero.

³ Throughout, we use A ⊥ B | S to denote conditional independence as in Dawid (1979). Further, we write A ⊥ B | S = s to denote conditional independence at S = s.

This paper's method provides a simple alternative to the common practice which informally assumes that conditioning on proxies for confounders ensures conditional exogeneity. For example, consider the illustrative linear example in equations (1a, 1b) with Cov(X, (α_Y, α_W)') = 0. Then conditioning on W doesn't ensure that the coefficient on X from a regression of Y on (1, X', W)' identifies β. In particular, substituting for scalar U gives

Y = (α_Y − α_W (δ_Y/δ_W)) + X'β + W (δ_Y/δ_W),   (4)

and the possible conditional correlation⁴ between X and α_W given W leads to regression bias. Indeed, more generally, conditioning on W may, but need not, attenuate the regression bias (see e.g. Wickens, 1972; Battistin and Chesher, 2009; and Ogburn and VanderWeele, 2012). Note, however, that this regression consistently estimates (β', δ_Y/δ_W)' if one assumes that Cov(U, α_Y) = 0 and α_W = 0, so that the proxy W = U δ_W is a perfect rescaling of U. We don't require these assumptions. Rather than conditioning on mismeasured proxies, this paper employs proxies to bound the nonparametric regression bias (see Section 4).

Furthermore, this paper's method provides a practical alternative to IV methods when potential instruments may be weak or (conditionally) endogenous. In particular, we don't require Cov(Z, (α_Y, U)') = 0 in equations (1a, 1b). Instead we need only impose Cov(Z, (α_Y, α_W)') = 0. Moreover, β is under-identified in equation (4) from the linear illustrative example since Z and X have the same dimension and thus there are fewer exogenous instruments for (X', W)' than needed for full identification. Similar difficulties arise in more general nonlinear cases.

As discussed above, imposing sign and magnitude restrictions on confounding is a weakening of exogeneity. Manski and Pepper (2000) employ alternative assumptions to bound nonparametric average effects. In particular, they assume known bounds on the range of Y and that E[r(x, s, U, U_Y) | Z = z, S = s] is monotonic in z. They also consider the assumption that r is monotonic in x.
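Returning to the proxy-conditioning point of equation (4), a hypothetical simulation (all numbers invented) illustrates why regressing Y on (1, X, W), that is, "controlling for" the noisy proxy, does not recover β: X remains correlated with the composite intercept given W.

```python
# OLS of Y on (1, X, W) with an error-laden proxy W; the coefficient on X
# stays biased away from beta even though W is "controlled for". Invented DGP.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta, delta_Y, delta_W = 0.08, 0.3, 0.6
U = rng.normal(size=n)
X = U + rng.normal(size=n)                   # endogenous: depends on ability
Y = beta * X + delta_Y * U + rng.normal(size=n)
W = delta_W * U + rng.normal(size=n)         # noisy proxy for U

design = np.column_stack([np.ones(n), X, W])
coef = np.linalg.lstsq(design, Y, rcond=None)[0]
print(coef[1])                               # not close to beta = 0.08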
Okumura and Usui (2014) combine the last two assumptions for Z = X along with the assumption that r is concave in x. Also, several recent papers employ alternative assumptions to partially identify linear or parametric effects of endogenous variables.⁵ For example, Altonji, Conley, Elder, and Taber (2011) assume that the selection on unobservables occurs similarly to that on observables. Also, Reinhold and Woutersen (2009) and Nevo and Rosen (2012) assume that the correlation between the potential instrument and U and that between the endogenous variable and U have the same sign, and then further assume that the potential instrument is less correlated with U than the endogenous variable is. Lewbel

⁴ From W = α_W + U δ_W, we have that α_W is generally correlated with U given W. Since X (or Z) and U are freely correlated, it follows that α_W is generally correlated with X (or Z) given W.

⁵ See also e.g. Klepper and Leamer (1984) and Leamer (1987), who study linear systems where the measurement errors in the explanatory variables are uncorrelated with the explanatory variables and with the disturbances in the explained equations (as well as sometimes with each other).

(2012) restricts the covariance of Z and the product of heteroskedastic error terms. Bontemps, Magnac, and Maurin (2012) provide additional examples and a general treatment of set identified linear models. We don't require the assumptions in these papers. Instead, we employ proxies to identify average effects under magnitude and sign restrictions on confounding. Of course, which identifying assumption is more appropriate depends on the context.

After deriving sharp identification regions for the average direct effects of X on Y in Sections 3, 4, and 5 under magnitude and sign restrictions on confounding, Section 6 studies estimation and inference. Last, Section 7 applies these results to study the return to education and the black-white wage gap. Using the data in Card (1995), we employ restrictions on confounding (|δ_Y/δ_W| ≤ 1 and 0 ≤ δ_Y/δ_W ≤ 1) to partially identify in sharp bounded intervals the covariate-conditioned average financial incremental return to each year of education as well as the average black-white wage gap. Importantly, we don't require that instruments or regressors are conditionally exogenous. Generally, we find that regression estimates, which would be consistent under exogeneity (δ_Y = 0), provide an upper bound on the average return to education (e.g. 19.5% for the return to the 16th year) and black-white wage gap (−17.8%), and that the regression-based bound estimates are generally narrower than the IV-based ones, with especially narrower confidence intervals. Note that if one assumes that the proxy (log(KWW)) is a perfect rescaling of U then δ_Y is estimated to be 0.203 with 95% confidence interval (CI) [0.141, 0.264]. Allowing for error-laden proxies, the regression-based estimated sharp identification region for the black-white wage gap under sign and magnitude restrictions on confounding (0 ≤ δ_Y/δ_W ≤ 1) is relatively wide, [−17.8%, 1.9%] with a 95% CI [−21.0%, 5.4%].
Thus, under these weaker-than-exogeneity assumptions on confounding, this data set is inconclusive about the extent of discrimination in the labor market. In contrast, the average return to education for the black subpopulation may differ slightly from the nonblack subpopulation, if at all. Further, we find nonlinearity in the return to education, with the 12th, 16th, and 18th years, corresponding to obtaining a high school, college, and possibly a graduate degree, yielding a high average return. For example, under sign and magnitude restrictions on confounding, the estimated identification region for the average return to the 16th year is [13.3%, 19.5%] with 95% CI [7.5%, 25.1%], whereas that for the 13th year is [0.7%, 7.8%] with 95% CI [−3.4%, 11.6%]. This nonlinearity may partly explain why, contrary to the expected direction of ability bias, linear IV estimates of the average return to education often exceed linear regression estimates. In particular, both types of estimates are weighted averages of yearly incremental returns for different subpopulations, and the large IV estimates may reflect the relatively high return to graduation years for the subpopulation whose graduation outcomes depend on instruments such as proximity to college (see e.g. Card 1995, 1999). Section 8 concludes, and mathematical proofs are gathered in Appendix B.

2 Data Generation

The next assumption defines the data generating process.

Assumption 1 (S.1) (i) Let M ≡ (S', Z', X', W', Y)' be a random vector with unknown distribution P ∈ 𝒫, where Z is ℓ×1, X is k×1, W is m×1, and Y is 1×1. (ii) Let a structural system generate the unobserved vectors U_W and U_Y of countable dimension and confounders U (l×1), collected in L ≡ (U_W', U_Y', U')', covariates S, potential instruments Z, causes X, proxies W, and response Y such that

Y = r(X, S, U, U_Y) and W = q(S, U, U_W),

where r and q are unknown real- and vector-valued measurable functions respectively and E‖(Y, W')'‖ < ∞. Realizations of M are observed whereas those of L are not.

S.1(i) defines the notation for observables. S.1(ii) imposes structure on the data generating process. We distinguish between the observed (or measured) variables M and the unobserved (or latent) variables L. The vectors of unobservables U_Y and of confounders U may interact nonseparably with the causes of interest X and covariates S to impact the response Y according to the nonparametric structural function r. We allow but do not require the availability of covariates S; if these are absent, set S = 1. Further, we allow but don't require elements of S to directly affect Y; if r doesn't directly depend on S, these may nevertheless serve as conditioning variables. We observe realizations of a vector W of proxies for U. We sometimes allow for W and X to have common elements. Analogously to U_Y, the vector U_W interacts with S and U to generate the proxies W according to q. Last, we also observe realizations of a vector of potential instruments Z possibly equal to, or containing elements of, X. Importantly, unlike U_W and U_Y, U may statistically depend on Z or X given S, thereby creating difficulties for the identification of the effects of X on Y. Thus, elements of Z may but need not be valid instruments since these may be included in the Y equation and are freely (conditionally) correlated with U.
2.1 Causal Effects

For⁶ (x, s, u, u_y) and (x*, s, u, u_y) in 𝒳 × 𝒮 × 𝒰 × 𝒰_Y, the direct effect of X on Y at (x, x*) given (s, u, u_y) is

β(x, x*; s, u, u_y) ≡ r(x*, s, u, u_y) − r(x, s, u, u_y).

Further, when r is differentiable in a particular cause of interest, we set k = 1 to denote this cause by X and we subsume, without loss of generality, the remaining elements of X into S. Then

β(x; s, u, u_y) ≡ (∂/∂x) r(x, s, u, u_y)

is the direct marginal effect of X on Y at x given (s, u, u_y). If U enters r separably from X

⁶ Throughout, for random vectors A and B, we denote the support of A by 𝒜 and that of A given B = b by 𝒜_b.

then β(x; s, u, u_y) = β(x; s, u_y) does not depend on U. Further, if the effect of X on Y is linear, β(x; s, u_y) does not depend on x, and we obtain the random coefficient specification β(S, U_Y). Similarly, for (x, s, u, u_y) and (x, s, u*, u_y) in 𝒳 × 𝒮 × 𝒰 × 𝒰_Y, the direct effect of U on Y at (u, u*) given (x, s, u_y) is

δ_Y(u, u*; x, s, u_y) ≡ r(x, s, u*, u_y) − r(x, s, u, u_y).

Further, for l = 1 and r differentiable in u, the direct marginal effect of U on Y at u given (x, s, u_y) is

δ_Y(u; x, s, u_y) ≡ (∂/∂u) r(x, s, u, u_y).

If U enters r separably from X and its effect on Y is linear, we obtain the random coefficient specification δ_Y(S, U_Y). Analogously, for (s, u, u_w) and (s, u*, u_w) in 𝒮 × 𝒰 × 𝒰_W, the direct effects of U on W_h, for h = 1, …, m, are

δ_{W,h}(u, u*; s, u_w) ≡ q_h(s, u*, u_w) − q_h(s, u, u_w),

and the direct marginal effect of U on W_h is

δ_{W,h}(u; s, u_w) ≡ (∂/∂u) q_h(s, u, u_w).

Thus, causal effects, such as β, δ_Y, and δ_W, are features of the structural system and can derive from economic theory, whereas the observability of X and U is an empirical matter.

In this paper, we're interested in measuring certain (conditional) averages of β(x, x*; s, u, u_y) and β(x; s, u, u_y), such as the conditional average direct effect of X on Y at (x, x*) given X = x* and S = s:

β̄(x, x* | x*, s) ≡ E[β(x, x*; s, U, U_Y) | X = x*, S = s] ≡ E[r(x*, S, U, U_Y) − r(x, S, U, U_Y) | X = x*, S = s].

For instance, for binary treatment X, averaging β̄(0, 1 | 1, s) over the distribution of S given X = 1 gives the average treatment effect on the treated β̄(0, 1 | 1). As another example, we're interested in measuring the conditional average direct marginal effect of X on Y at x given S = s:

β̄(x | s) ≡ E[β(x; S, U, U_Y) | S = s] = E[(∂/∂x) r(x, S, U, U_Y) | S = s].

It's also useful to give a succinct notation for the average direct effects of U on Y and W. For example,

δ̄_Y(u, x | s) ≡ E[δ_Y(u; x, S, U_Y) | S = s] = E[(∂/∂u) r(x, s, u, U_Y) | S = s].
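The conditional average effects just defined can be made concrete with a toy Monte Carlo. The structural function r and all distributions below are invented for illustration only; for binary X, the object β̄(0, 1 | 1, s) is the treated units' average direct gain from moving X from 0 to 1.

```python
# Toy Monte Carlo for beta(0, 1 | 1, s): the average direct effect of moving
# X from 0 to 1, averaged over (U, U_Y) among treated units. Invented model.
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

def r(x, s, u, u_y):
    # invented structural function with a U-dependent effect of X
    return (0.5 + 0.2 * u) * x + s + u + u_y

U = rng.normal(size=n)                               # confounder
U_Y = rng.normal(size=n)
X = (U + rng.normal(size=n) > 0).astype(float)       # treatment depends on U

treated = X == 1
s = 1.0                                              # fix the covariate value
att = np.mean(r(1.0, s, U[treated], U_Y[treated])
              - r(0.0, s, U[treated], U_Y[treated]))
print(att)   # exceeds the unconditional average effect 0.5, since treated
             # units have above-average U
```

Because treatment here selects on U and the effect of X rises with U, the treated-units average exceeds the population average effect, which is exactly the kind of confounding the paper's restrictions address.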
Similarly, for scalar W,

δ̄_W(u, u* | s) ≡ E[δ_W(u, u*; s, U_W) | S = s] ≡ E[q(s, u*, U_W) − q(s, u, U_W) | S = s].

3 Identification of Average Random Coefficients

While the paper's method doesn't require a linear or parametric effect of X on Y, we find it instructive to begin our analysis of the identification of average effects under magnitude and sign restrictions on confounding by studying linear structures with random coefficients. We

relax the linearity assumption when studying the identification of conditional average nonparametric effects in Section 4 and of local, marginal, and average treatment effects in Section 5. Specifically, we impose the following assumption in Section 3.

Assumption 2 (S.2) Linearity: Assume S.1 with ‖Cov[Z, (Y, W')']‖ < ∞ and

Y = r(X, S, U, U_Y) = r_0(S, U_Y) + Σ_{j=1}^{k} X_j r_j(S, U_Y) + Σ_{g=1}^{l} U_g r_{k+g}(S, U_Y) ≡ α_Y + X'β + U'δ_Y,

and let the h-th component q_h of q be given by

W_h = q_h(S, U, U_W) = q_{h,0}(S, U_W) + Σ_{g=1}^{l} U_g q_{h,g}(S, U_W) ≡ α_{W,h} + U'δ_{W,h},

so that stacking W_h, h = 1, …, m, into W gives

W = α_W + Δ_W'U.

We collect the random coefficients⁷ in (α_W', vec(Δ_W)', α_Y, β', δ_Y')', of dimensions m×1, lm×1, 1×1, k×1, and l×1 respectively.

Thus, under S.2, for each observation i in a sample, we have Y_i = α_{Y,i} + X_i'β_i + U_i'δ_{Y,i} and W_i = α_{W,i} + Δ_{W,i}'U_i. This allows for random effects for each individual or unit and encompasses constant slope coefficients as a special case. We suppress the index i when referring to the population. In particular, β is the vector of random direct effects of the elements of X on Y. To simplify the exposition, we set S = 1 in Sections 3.2 and 3.3 and study identifying the average effect E(β) of X on Y. Section 3.4 accommodates covariates S and studies the identification of the conditional average effect β̄(S) ≡ E(β | S) of X on Y given S.

⁷ We slightly abuse notation in the linear case and collect in β and δ_Y the random direct effects of the elements of X and U on Y. Similarly, we collect in Δ_W the random direct effects of the elements of U on those of W.

3.1 IV Regression Notation

Throughout this paper, for a generic random vector A with finite mean, we write:

Ā ≡ E(A) and Ã ≡ A − Ā.

For example, we write β̄ ≡ E(β) and δ̄_Y ≡ E(δ_Y). Further, we employ the following succinct notation for IV regression coefficients and residuals. For a generic random vector A and vectors B and C of equal dimension with E(C̃Ã') finite and E(C̃B̃') finite and nonsingular, we let

R_{A.B|C} ≡ E(C̃B̃')⁻¹E(C̃Ã') and ε_{A.B|C} ≡ Ã − R_{A.B|C}'B̃

denote the linear IV regression estimand and residual respectively, so that by construction E(C̃ε_{A.B|C}') = 0. For example, for k = ℓ, R_{Y.X|Z} ≡ E(Z̃X̃')⁻¹E(Z̃Ỹ) is the vector of slope coefficients associated with X in a linear IV regression of Y on (1, X')' using instruments (1, Z')'. In the special case where B = C, we obtain the linear regression coefficients and residuals:

R_{A.B} ≡ R_{A.B|B} ≡ E(B̃B̃')⁻¹E(B̃Ã') and ε_{A.B} ≡ ε_{A.B|B} ≡ Ã − R_{A.B}'B̃.

3.2 Characterization and Full Identification

We begin by characterizing β̄ and studying conditions for its full identification. Section 3.3 studies the partial identification of elements of β̄ under sign and magnitude restrictions on confounding. For illustration, consider the example from the Introduction with scalar proxy W and confounder U and constant slope coefficients:

Y = α_Y + X'β + U δ_Y and W = α_W + U δ_W.

Using the succinct IV regression notation, recall that, provided E(Z̃X̃') is nonsingular, δ_W ≠ 0, and Cov(Z, (α_Y, α_W)') = 0, we have:

β = R_{Y.X|Z} − R_{W.X|Z} (δ_Y/δ_W).

In particular, the IV regression (omitted variable) bias R_{W.X|Z}(δ_Y/δ_W) depends on the ratio δ_Y/δ_W of the (average) direct effect of U on the response Y to that of U on the proxy W. Theorem 3.1 extends this result to allow for vectors U and W and for random slope coefficients. We set S = 1 here; Section 3.4 explicitly conditions on covariates.

Theorem 3.1 Assume S.2 with S = 1, ℓ = k, m = l, and that (i) E(Z̃X̃') and Δ̄_W are nonsingular, (ii) Cov(α_Y, Z) = 0, E(β̃ | Z, X) = 0, and E(δ̃_Y | U, Z) = 0, and (iii) Cov(α_W, Z) = 0 and E(Δ̃_W | U, Z) = 0. Let ρ ≡ Δ̄_W⁻¹ δ̄_Y; then

β̄ = R_{Y.X|Z} − R_{W.X|Z} ρ.

We let B ≡ R_{Y.X|Z} − β̄ = R_{W.X|Z} ρ denote the IV regression bias (or inconsistency) in measuring β̄. When Z = X, Theorem 3.1 gives that β̄ = R_{Y.X} − R_{W.X} ρ. Next, we discuss the conditions in Theorem 3.1. Condition (i) requires ℓ = k and m = l, with E(Z̃X̃') and Δ̄_W nonsingular. More generally, ℓ ≥ k and m ≥ l suffice for full or partial identification, and one may use a weighted combination of the resulting moments. In particular,

having ℓ ≥ k + m may fully identify β̄. For example, if ℓ = k + m and m = l then (β̄', ρ')' = R_{Y.(X',W')'|Z} provided E[Z̃(X̃', W̃')'] is nonsingular.⁸ We do not require this many instruments here; as such, β̄ is under-identified. In particular, we let ℓ = k, with Z possibly equal to X.

Conditions (ii) and (iii) are implied by the assumption that the coefficients are mean independent⁹ of (U, Z, X), or the stronger assumption that (U_W, U_Y) ⊥ (U, Z, X). Note that when β, δ_Y, and Δ_W are constants, conditions (ii) and (iii) reduce to Cov(Z, (α_Y, α_W')') = 0. In particular, condition (ii) imposes assumptions on the random coefficients in the Y equation. It requires that Z is uncorrelated with α_Y, that β is mean independent¹⁰ of (Z, X), and that δ_Y is mean independent of (U, Z). Roughly speaking, (ii) isolates U as the source of the difficulty in identifying β̄. Had U been observed, with ℓ = k + l and E(Z̃(X̃', Ũ')) nonsingular, (ii) would permit identifying the average slope coefficients via IV regression. Note that linearity and condition (ii) can (indirectly) restrict how the return to education depends on ability U (see e.g. Card 1999). In the linear case, and if valid instruments are available, one can consider IV methods for the correlated random coefficient model, e.g. Wooldridge (1997, 2003) and Heckman and Vytlacil (1998). Similarly, linearity and condition (ii) can restrict how δ_Y relates to Z (and X), e.g. in learning models where the return to ability can vary with experience and depend on educational attainment (e.g. Altonji and Pierret, 2001; Arcidiacono, Bayer, and Hizmo, 2010). Sections 4 and 5 weaken these restrictions in Theorem 3.1 and give identification results for the case in which X and U can interact to determine Y via the nonseparable specification in S.1. Importantly, however, the conditions in Theorem 3.1 do not restrict the joint distribution of (U, Z', X') other than requiring that E(Z̃X̃') is nonsingular. In particular, Z and X can be freely correlated with U and thus endogenous.
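The R_{A.B|C} estimand and ε_{A.B|C} residual of Section 3.1 translate directly into sample analogs. The helper names below are ours and the simulated design is invented; with a valid instrument, as here, the IV slopes are consistent, and the residual is orthogonal to the demeaned instruments by construction.

```python
# Sample analogs of the IV regression estimand R_{A.B|C} and residual
# eps_{A.B|C} from Section 3.1; the residual is orthogonal to the
# (demeaned) instruments in sample by construction.
import numpy as np

def R(A, B, C):
    """R_{A.B|C}: sample analog of E(C~ B~')^{-1} E(C~ A~')."""
    Ct, Bt, At = (M - M.mean(0) for M in (C, B, A))
    return np.linalg.solve(Ct.T @ Bt, Ct.T @ At)

def eps(A, B, C):
    """eps_{A.B|C}: A~ minus its IV fit B~ R_{A.B|C} (row-wise)."""
    At, Bt = A - A.mean(0), B - B.mean(0)
    return At - Bt @ R(A, B, C)

rng = np.random.default_rng(3)
n = 10_000
Z = rng.normal(size=(n, 2))                      # here a valid instrument
X = Z + rng.normal(size=(n, 2))
Y = X @ np.array([[1.0], [2.0]]) + rng.normal(size=(n, 1))

r_hat = R(Y, X, Z)                               # IV slopes, near (1, 2)
orth = (Z - Z.mean(0)).T @ eps(Y, X, Z) / n      # ~ 0 by construction
print(r_hat.ravel(), orth.ravel())
```

Setting C = B in these helpers reproduces the ordinary regression case R_{A.B} and ε_{A.B} of the text.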
Condition (iii) imposes restrictions on the random coefficients in the W equation. It requires that Z is uncorrelated with α_W and that the elements of Δ_W are mean independent of (U, Z). This linearly relates Cov(Z, W) to Cov(Z, U) via the matrix Δ̄_W. Note that we do not directly restrict the dependence between α_W and U. Nevertheless, even when Δ_W is constant and α_W is independent of U, recall from the Introduction that the coefficient on X from a regression of Y on (1, X', W')' need not identify β̄.

We note that one means for full identification generates sufficiently many valid instruments by imposing restrictions involving the components of W, Δ_W, and U. For example, let U, W_1, and W_2 be scalars and suppose that

Y = α_Y + X'β + U δ_Y, W_1 = α_{W,1} + U δ_{W,1}, and W_2 = α_{W,2} + U δ_{W,2},

⁸ For example, if δ_Y and Δ_W are constant, substituting for U gives Y = (α_Y − α_W'ρ) + X'β + W'ρ.

⁹ In the linear case, having (α_Y, α_W')' be mean independent of Z may fully identify β̄ by generating a sufficient number of instruments as functions of Z. Indeed, under such stronger (mean) independence assumptions involving Z, one can dispense with linearity, as we show in Sections 4 and 5.

¹⁰ We employ the unnecessary mean independence assumptions in conditions (ii) and (iii) of Theorem 3.1 because of their simple interpretation. However, E[Z̃X̃'β̃] = 0, E[Z̃Ũ'δ̃_Y] = 0, and E[Z̃Ũ'Δ̃_W] = 0 suffice.

so that there are two proxies for U, with δ_{W_1}, δ_{W_2} ≠ 0 and Corr[(U, β_{W_2})′, (β_Y, β_{W_1})′] = 0. Then Y = (β_Y − (δ_Y/δ_{W_1})β_{W_1}) + X′β + (δ_Y/δ_{W_1})W_1, and (β̄′, δ̄_Y/δ̄_{W_1})′ may be fully identified from an IV regression of Y on (1, X′, W_1)′ using instruments (1, Z′, W_2)′ (see e.g. Blackburn and Neumark, 1992). We do not require such restrictions here, allowing, for example, for components of β_W (e.g. test taking skills) to be correlated. (Appendix A studies the case of multiple proxies for U that are components of X.)

To illustrate the consequences of Theorem 3.1, consider the example from the Introduction with scalar U and W. Observe that R_{Y.X|Z} fully identifies β̄ under exogeneity. In this case, the IV regression bias disappears either because U does not directly determine Y on average, in particular δ̄_Y = 0, or because Z and U are uncorrelated, and thus R_{W.X|Z} = 0. Alternatively, shape restrictions on the effects of U on Y and W can fully identify β̄. In particular, this occurs under signed proportional confounding, in which case the sign of the ratio δ ≡ δ̄_Y/δ̄_W of the average direct effect of U on Y to that of U on W is known and its magnitude equals a known constant |d|. For example, under equiconfounding, |d| = 1, and U directly affects Y and W equally on average. In this case, β̄ is fully identified under positive (δ̄_Y = δ̄_W) or negative (δ̄_Y = −δ̄_W) equiconfounding by β̄ = R_{Y−W.X|Z} or β̄ = R_{Y+W.X|Z} respectively (see Chalak, 2012).

More generally, U may be a vector of potential confounders. Often, to each confounder U_h corresponds a proxy W_h = β_{W_h} + U_h δ_{W_h}, so that W = β_W + ΔU with Δ = diag(δ_{W_1}, ..., δ_{W_m}). In this case,

β̄ = R_{Y.X|Z} − R_{W.X|Z} δ = R_{Y.X|Z} − Σ_{h=1}^m (δ̄_{Y,h}/δ̄_{W_h}) R_{W_h.X|Z}.

As before, under exogeneity β̄ = R_{Y.X|Z}, whereas under e.g. positive equiconfounding δ̄_{Y,h}/δ̄_{W_h} = 1 for h = 1, ..., m and β̄ = R_{Y.X|Z} − Σ_{h=1}^m R_{W_h.X|Z}. Corollary 3.2 extends these full identification results to allow for a general matrix Δ̄, with δ possibly equal to a known vector d of constants. However, it is useful throughout to keep in mind the leading one-to-one case where Δ̄ is a diagonal matrix and δ has the straightforward interpretation δ_h = δ̄_{Y,h}/δ̄_{W_h}.
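The two-proxy construction above lends itself to a quick numerical check. The following sketch is a hypothetical simulation (all coefficient values invented, not the paper's data): a just-identified IV regression of Y on (1, X, W_1)′ with instruments (1, Z, W_2)′ recovers the slope β and the ratio δ_Y/δ_{W_1}, even though Z, X, and both proxies are all correlated with U.

```python
import numpy as np

# Hypothetical simulation of the two-proxy design (values invented for illustration).
rng = np.random.default_rng(3)
n = 200_000
U = rng.normal(size=n)                               # scalar unobserved confounder
Z = 0.5 * U + rng.normal(size=n)                     # Z is itself correlated with U
X = 0.7 * Z + 0.4 * U + rng.normal(size=n)           # endogenous cause of interest
Y = 1.0 + 2.0 * X + 0.8 * U + rng.normal(size=n)     # target slope beta = 2
W1 = 0.2 + 1.0 * U + rng.normal(scale=0.5, size=n)   # first proxy: enters as a regressor
W2 = -0.1 + 1.5 * U + rng.normal(scale=0.5, size=n)  # second proxy: used as an instrument

regs = np.column_stack([np.ones(n), X, W1])
inst = np.column_stack([np.ones(n), Z, W2])
coef = np.linalg.solve(inst.T @ regs, inst.T @ Y)    # just-identified IV
print(coef.round(2))                                 # [constant, slope on X, d_Y/d_W1]
```

The key requirement is that W_2 relates to W_1 only through U and that both proxies' measurement errors are unrelated to the Y equation's disturbance.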
We use subscripts to denote vector elements. For example, β̄_j and R_{Y.X|Z,j} are the j-th elements of β̄ and R_{Y.X|Z}, and δ_h and d_h are the h-th elements of δ and d, respectively.

Corollary 3.2 Assume the conditions of Theorem 3.1 and let j = 1, ..., k. (i) If B_j = 0 (exogeneity) then β̄_j = R_{Y.X|Z,j}. (ii) If δ = d (signed proportional confounding) then β̄_j = R_{Y.X|Z,j} − Σ_{h=1}^m d_h R_{W_h.X|Z,j}.

Thus, it suffices for exogeneity that δ̄_Y = 0 or R_{W.X|Z} = 0. In particular, if one fails to reject the null hypothesis R_{W.X|Z,j} = 0 against the alternative R_{W.X|Z,j} ≠ 0, say via a t-test in the scalar proxy case, then one cannot reject, under Theorem 3.1's assumptions, that R_{Y.X|Z,j} identifies β̄_j. Further, signed proportional confounding with known |d_h| and sign(d_h), h = 1, ..., m, point identifies β̄.
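To make Corollary 3.2(ii) concrete, here is a minimal simulation sketch (hypothetical values, not the paper's application) with a scalar confounder, Z = X, and positive equiconfounding (d = 1): the uncorrected regression estimand R_{Y.X} is biased by R_{W.X}·d, and subtracting that term removes the omitted variable bias.

```python
import numpy as np

# Hypothetical simulation: Y = b_Y + X*beta + U*d_Y, proxy W = b_W + U*d_W,
# with X endogenous (correlated with U) and d_Y/d_W = 1 (positive equiconfounding).
rng = np.random.default_rng(0)
n = 200_000
U = rng.normal(size=n)
X = 0.8 * U + rng.normal(size=n)            # Corr(X, U) != 0: X is endogenous
beta, d_Y, d_W = 1.0, 0.5, 0.5              # true slope and confounding coefficients
Y = 2.0 + beta * X + d_Y * U + rng.normal(size=n)
W = 0.3 + d_W * U + rng.normal(scale=0.1, size=n)

def r_estimand(a, x):
    """Regression estimand R_{A.X} (slope of A on demeaned X); here Z = X."""
    xt = x - x.mean()
    return (xt @ (a - a.mean())) / (xt @ xt)

R_YX = r_estimand(Y, X)                     # biased for beta by R_{W.X} * (d_Y/d_W)
R_WX = r_estimand(W, X)
beta_corrected = R_YX - 1.0 * R_WX          # Corollary 3.2(ii) with d = 1
print(round(R_YX, 2), round(beta_corrected, 2))
```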

3.3 Partial Identification

In the absence of conditions leading to full identification, magnitude and sign restrictions on the average direct effects of U on Y and W partially identify the elements of β̄. To illustrate, consider the return to education example from the Introduction with scalar U (ability) and proxy W (log(KWW)). As discussed above, exogeneity (δ̄_Y = 0) and signed proportional confounding (δ̄_Y = d δ̄_W) are limit cases securing full identification. Next, we derive sharp identification regions for the elements of β̄ under weaker sign and magnitude restrictions. In particular, we ask how the average direct effect δ̄_Y of U on Y compares in magnitude and sign to the average effect δ̄_W of U on W.

Suppose that |δ̄_Y| ≤ |δ̄_W|, so that the magnitude of the average direct effect of U on Y is not larger than that of U on W. Here, W is, on average, at least as directly responsive to U as Y is. Assume further that the sign of δ is known. For example, suppose that δ ≥ 0, so that U affects Y and W on average in the same direction. Thus, in this case δ ∈ D = [0, 1]. Then the expression for β̄ gives the following identification regions for β̄_j, j = 1, ..., k, which depend on the sign of R_{W.X|Z,j}:

β̄_j ∈ B_j([0, 1] | R_{W.X|Z,j} ≤ 0) = [R_{Y.X|Z,j}, R_{Y.X|Z,j} − R_{W.X|Z,j}],
β̄_j ∈ B_j([0, 1] | R_{W.X|Z,j} ≥ 0) = [R_{Y.X|Z,j} − R_{W.X|Z,j}, R_{Y.X|Z,j}].

Symmetrically, if we maintain the magnitude restriction |δ| ≤ 1 and assume that δ ≤ 0, so that δ ∈ D = [−1, 0], we obtain

β̄_j ∈ B_j([−1, 0] | R_{W.X|Z,j} ≤ 0) = [R_{Y.X|Z,j} + R_{W.X|Z,j}, R_{Y.X|Z,j}],
β̄_j ∈ B_j([−1, 0] | R_{W.X|Z,j} ≥ 0) = [R_{Y.X|Z,j}, R_{Y.X|Z,j} + R_{W.X|Z,j}].

Instead, if 1 ≤ |δ̄_Y|/|δ̄_W|, so that W is on average at most as directly responsive to U as Y is, and δ ≥ 0, so that U affects Y and W on average in the same direction, then δ ∈ D = [1, +∞) and we obtain the following identification regions for β̄_j, j = 1, ..., k:

β̄_j ∈ B_j([1, +∞) | R_{W.X|Z,j} ≤ 0) = [R_{Y.X|Z,j} − R_{W.X|Z,j}, +∞),
β̄_j ∈ B_j([1, +∞) | R_{W.X|Z,j} ≥ 0) = (−∞, R_{Y.X|Z,j} − R_{W.X|Z,j}].

Note that these identification regions exclude the IV estimand R_{Y.X|Z,j}.
Symmetrically, if we assume that |δ| ≥ 1 and δ ≤ 0, so that δ ∈ D = (−∞, −1], we obtain

β̄_j ∈ B_j((−∞, −1] | R_{W.X|Z,j} ≤ 0) = (−∞, R_{Y.X|Z,j} + R_{W.X|Z,j}],
β̄_j ∈ B_j((−∞, −1] | R_{W.X|Z,j} ≥ 0) = [R_{Y.X|Z,j} + R_{W.X|Z,j}, +∞).
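The cases above share one mechanism: β̄_j = R_{Y.X|Z,j} − R_{W.X|Z,j}·δ is monotone in δ, so the endpoints of D map to the endpoints of the identification region. A minimal sketch (the estimand values and the helper `sharp_interval` are hypothetical, for illustration only):

```python
import math

def sharp_interval(R_Yj, R_Wj, d_lo, d_hi):
    """Sharp bounds for beta_j = R_Yj - R_Wj * delta with delta in [d_lo, d_hi].
    Endpoints may be -inf/+inf (assumes R_Wj != 0 when an endpoint is infinite)."""
    ends = [R_Yj - R_Wj * d_lo, R_Yj - R_Wj * d_hi]
    return min(ends), max(ends)

R_Yj, R_Wj = 0.10, 0.04                            # hypothetical estimands, R_Wj >= 0
print(sharp_interval(R_Yj, R_Wj, 0.0, 1.0))        # D = [0,1]:  [R_Yj - R_Wj, R_Yj]
print(sharp_interval(R_Yj, R_Wj, -1.0, 0.0))       # D = [-1,0]: [R_Yj, R_Yj + R_Wj]
print(sharp_interval(R_Yj, R_Wj, 1.0, math.inf))   # D = [1,+inf): lower endpoint -inf
```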

Wider intervals obtain under either magnitude or sign (but not both) restrictions on the average direct effects δ̄_Y and δ̄_W. In particular, if |δ̄_Y| ≤ |δ̄_W|, so that W is on average at least as directly responsive to U as Y is, then δ ∈ D = [−1, 1] and β̄_j is partially identified as follows:

β̄_j ∈ B_j([−1, 1]) = [R_{Y.X|Z,j} − |R_{W.X|Z,j}|, R_{Y.X|Z,j} + |R_{W.X|Z,j}|].

Note that B_j([−1, 1]) is twice as large as B_j([0, 1] | sign(R_{W.X|Z,j})) or B_j([−1, 0] | sign(R_{W.X|Z,j})). Also, the closer Z is to exogeneity, the smaller |R_{W.X|Z,j}| is, and the tighter these three identification regions are. Alternatively, if |δ̄_Y| ≥ |δ̄_W|, so that W is, on average, at most as directly responsive to U as Y is, then δ ∈ D = (−∞, −1] ∪ [1, +∞) and

β̄_j ∈ B_j((−∞, −1] ∪ [1, +∞)) = (−∞, R_{Y.X|Z,j} − |R_{W.X|Z,j}|] ∪ [R_{Y.X|Z,j} + |R_{W.X|Z,j}|, +∞).

In this case, the farther Z is from exogeneity, the larger |R_{W.X|Z,j}| is, and the more informative B_j((−∞, −1] | sign(R_{W.X|Z,j})), B_j([1, +∞) | sign(R_{W.X|Z,j})), and B_j((−∞, −1] ∪ [1, +∞)) are. Alone, sign restrictions determine the direction of the IV regression omitted variable bias. In particular, we have

B_j((−∞, 0] | R_{W.X|Z,j} ≥ 0) = B_j([0, +∞) | R_{W.X|Z,j} ≤ 0) = [R_{Y.X|Z,j}, +∞),
B_j((−∞, 0] | R_{W.X|Z,j} ≤ 0) = B_j([0, +∞) | R_{W.X|Z,j} ≥ 0) = (−∞, R_{Y.X|Z,j}].

The above identification regions for β̄_j, under magnitude or sign restrictions on confounding (or both) with scalars U and W, are sharp. Thus, any point in these regions is feasible under the maintained assumptions. In particular, given S.2 and the distribution of the observables, for each element b of B_j(D | sign(R_{W.X|Z,j})) or B_j(D), one can construct constants d_Y and d_W (which, being constant, satisfy the conditions on δ_Y and Δ in Theorem 3.1) such that d_Y/d_W ∈ D. For example, for R_{W.X|Z,j} ≠ 0, it suffices to let d_Y/d_W = R_{W.X|Z,j}⁻¹(R_{Y.X|Z,j} − b). These identification regions obtain in part by asking how the average direct effects δ̄_Y and δ̄_W compare in magnitude. If this comparison is ambiguous, a researcher may be more confident imposing a lower bound d_L and upper bound d_H on δ̄_Y/δ̄_W, so that δ ∈ D = [d_L, d_H].
In this case, similar sharp identification regions, involving d_L R_{W.X|Z,j} and d_H R_{W.X|Z,j}, derive, with exogeneity or signed proportional confounding as limit cases.

Further, suppose more generally that U is a vector and there is a proxy W_h = β_{W_h} + U_h δ_{W_h} for each confounder U_h, h = 1, ..., m, so that Δ = diag(δ_{W_1}, ..., δ_{W_m}). Then β̄_j = R_{Y.X|Z,j} − Σ_{h=1}^m δ_h R_{W_h.X|Z,j}, and magnitude and/or sign restrictions on δ_h = δ̄_{Y,h}/δ̄_{W_h} ∈ D_h, h = 1, ..., m, yield the sharp identification regions B_j(×_{h=1}^m D_h | sign(R_{W_h.X|Z,j}), h = 1, ..., m) for β̄_j, j = 1, ..., k, defined next. Corollary 3.3 derives sharp identification regions for a general matrix Δ̄ and restrictions [11] δ_h ∈ D_h = [d_{L,h}, d_{H,h}], h = 1, ..., m (we allow for d_{L,h} = −∞ and/or d_{H,h} = +∞, yielding

Footnote 11: Here, we focus on interval restrictions. One can consider other types of restrictions on δ_h.

identification regions that are either half-open intervals or the real line). To facilitate the exposition in subsequent sections, we let A ≡ R_{W.X|Z}, so that A_{jh} ≡ R_{W_h.X|Z,j}. Also, when the elements (A_{j1}, ..., A_{jm}) have different signs, we partition these, without loss of generality, such that A_{jh} ≤ 0 for h = 1, ..., g and A_{jh} ≥ 0 for h = g + 1, ..., m. Otherwise, if all the elements have a common sign, we omit the irrelevant inequalities and sums.

Corollary 3.3 Assume the conditions of Theorem 3.1 and that δ_h ∈ D_h = [d_{L,h}, d_{H,h}], h = 1, ..., m. Then, for j = 1, ..., k, β̄_j ∈ B_j(×_{h=1}^m D_h | sign(A_{jh}), h = 1, ..., m), defined as follows, and these bounds are sharp:

B_j([d_{L,1}, d_{H,1}] × ... × [d_{L,m}, d_{H,m}] | A_{j1} ≤ 0, ..., A_{jg} ≤ 0, A_{j,g+1} ≥ 0, ..., A_{jm} ≥ 0)
= [R_{Y.X|Z,j} − Σ_{h=1}^g A_{jh} d_{L,h} − Σ_{h=g+1}^m A_{jh} d_{H,h}, R_{Y.X|Z,j} − Σ_{h=1}^g A_{jh} d_{H,h} − Σ_{h=g+1}^m A_{jh} d_{L,h}].

Note that these sharp identification regions may, but need not, contain R_{Y.X|Z,j}. Sharp identification regions under either magnitude or sign restrictions (but not both) on confounding derive by setting the vectors d_L and d_H suitably. Further, note that different potential instruments or proxies may lead to different identification regions for β̄_j, in which case β̄_j is identified in the intersection of these regions, provided it is nonempty.

3.4 Conditioning on Covariates

We extend the results in Sections 3.2 and 3.3 to accommodate conditioning on covariates S. For this, for a generic random vector A with E(A) finite, let Ā(S) ≡ E(A|S) and Ã(S) ≡ A − Ā(S). For example, for s ∈ Supp(S), β̄(s) ≡ E(β|S = s) denotes the average direct effects of X on Y given covariates S = s.
Further, for generic random vectors B and C of equal dimension with E[C̃(S)(Ã(S)′, B̃(S)′) | S = s] finite and E(C̃(S)B̃(S)′ | S = s) nonsingular, we define the conditional linear IV regression estimand and residual

R_{A.B|C}(s) ≡ E(C̃(S)B̃(S)′ | S = s)⁻¹ E(C̃(S)Ã(S) | S = s) and ε_{A.B|C}(s) ≡ Ã(s) − B̃(s)′ R_{A.B|C}(s),

so that by construction E(C̃(S) ε_{A.B|C}(S) | S = s) = 0. For example, for k = ℓ we write R_{Y.X|Z}(s) ≡ E(Z̃(S)X̃(S)′ | S = s)⁻¹ E(Z̃(S)Ỹ(S) | S = s). When B = C, we obtain R_{A.B}(s) ≡ R_{A.B|B}(s) and ε_{A.B}(s) ≡ ε_{A.B|B}(s). This notation reduces to that defined in Section 3.1 when S is degenerate, S = 1, in which case we leave S implicit.

Theorem 3.4 Assume S.2 with ℓ = k and m = l, and that, for s ∈ Supp(S),

(i) E(Z̃(S)X̃(S)′ | S = s) and Δ̄(s) are nonsingular,
(ii) Cov(β_Y, Z | S = s) = 0, E(β̃(S) | Z, X, S = s) = 0, and E(δ̃_Y(S) | U, Z, S = s) = 0,
(iii) Cov(β_W, Z | S = s) = 0 and E(Δ̃(S) | U, Z, S = s) = 0.
Let δ(s) ≡ Δ̄(s)⁻¹ δ̄_Y(s); then

β̄(s) = R_{Y.X|Z}(s) − R_{W.X|Z}(s) δ(s).

Here, B(s) ≡ R_{Y.X|Z}(s) − β̄(s) = R_{W.X|Z}(s) δ(s) denotes the conditional IV regression bias in measuring β̄(s). When Z = X, Theorem 3.4 gives β̄(s) = R_{Y.X}(s) − R_{W.X}(s) δ(s).

Conditions (ii) and (iii) in Theorem 3.4 are implied by the stronger (unnecessary under linearity) condition (U_W, U_Y) ⊥ (U, Z, X) | S. In particular, the covariate-conditioned conditions (ii) and (iii) in Theorem 3.4 weaken their unconditional uncorrelation and mean independence analogs in Theorem 3.1. Further, conditioning on covariates S may render Z closer to exogeneity and the conditional IV regression bias smaller. Theorem 3.4 reduces to Theorem 3.1 when S is degenerate, S = 1.

If the conditions in Theorem 3.4 hold for almost every s ∈ Supp(S), then an expression for β̄ derives by averaging the expression for β̄(s) over the distribution of S. If, in addition, the conditional average effects β̄(S), Δ̄(S), and δ̄_Y(S) are constant, so that the mean independence assumptions involving β, Δ, and δ_Y in Theorem 3.4 hold unconditionally, the law of iterated expectations gives

β̄ = E(Z̃(S)X̃(S)′)⁻¹ E(Z̃(S)Ỹ(S)) − E(Z̃(S)X̃(S)′)⁻¹ E(Z̃(S)W̃(S)′) δ.

This expression involves two estimands from IV regressions of Ỹ(S) and W̃(S) respectively on X̃(S) using instrument Z̃(S). Further, if the conditional expectations Z̄(S), X̄(S), W̄(S), and Ȳ(S) are affine functions of S, we obtain

β̄ = E(ε_{Z.S} ε_{X.S}′)⁻¹ E(ε_{Z.S} ε_{Y.S}) − E(ε_{Z.S} ε_{X.S}′)⁻¹ E(ε_{Z.S} ε_{W.S}′) δ.

Using partitioned regressions (Frisch and Waugh, 1933), the two residual-based IV estimands in the above expression for β̄ can be recovered from R_{Y.(X′,S′)′|(Z′,S′)′} and R_{W.(X′,S′)′|(Z′,S′)′} as the coefficients associated with X (i.e. as the coefficients on X in IV regressions of Y and W respectively on (1, X′, S′)′ using instruments (1, Z′, S′)′).
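The partitioned-regression recovery can be checked numerically. The sketch below is a hypothetical simulation (coefficient values invented): by Frisch–Waugh logic, the coefficient on X from a just-identified IV regression of Y on (1, X, S)′ using instruments (1, Z, S)′ coincides with the residual-based IV estimand built from ε_{Y.S}, ε_{X.S}, and ε_{Z.S}.

```python
import numpy as np

# Hypothetical simulation with affine conditional means in the scalar covariate S.
rng = np.random.default_rng(1)
n = 50_000
S = rng.normal(size=n)
U = 0.5 * S + rng.normal(size=n)                 # confounder related to S
Z = 0.7 * S + rng.normal(size=n)                 # conditionally (on S) exogenous here
X = 0.6 * Z + 0.4 * U + 0.3 * S + rng.normal(size=n)
Y = 1.0 + 2.0 * X + 0.8 * U + 0.5 * S + rng.normal(size=n)

def iv(y, x_mat, z_mat):
    """Just-identified linear IV: solve E(z x')^{-1} E(z y) in sample form."""
    return np.linalg.solve(z_mat.T @ x_mat, z_mat.T @ y)

one = np.ones(n)
full = iv(Y, np.column_stack([one, X, S]), np.column_stack([one, Z, S]))
coef_on_X_full = full[1]                         # coefficient associated with X

def resid(a, s):
    """Residual from an affine regression of a on (1, s): eps_{A.S}."""
    d = np.column_stack([one, s])
    return a - d @ np.linalg.lstsq(d, a, rcond=None)[0]

coef_on_X_fw = iv(resid(Y, S), resid(X, S)[:, None], resid(Z, S)[:, None])[0]
print(round(coef_on_X_full, 3), round(coef_on_X_fw, 3))
```

The two numbers agree up to floating-point error, illustrating that partialling out S from Y, X, and Z leaves the coefficient on X unchanged.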
From Theorem 3.4, we have that β̄_j(s) is fully identified under conditional exogeneity (B_j(s) = 0) or conditional signed proportional confounding (δ(s) = d(s), a vector of known or estimable functions of s). Otherwise, the expression for β̄(s) can be used, along with sign and magnitude restrictions on the elements of δ(s) ≡ Δ̄(s)⁻¹ δ̄_Y(s), involving the conditional average direct effects of U on Y and W given S = s, to partially identify β̄_j(s). This yields the sharp

identification regions B_j(×_{h=1}^m D_h(s) | sign(A_{jh}(s)), h = 1, ..., m) for β̄_j(s), defined as in Corollary 3.3 with D_h(s) ≡ [d_{L,h}(s), d_{H,h}(s)], where d_{L,h}(s) and d_{H,h}(s) are known or estimable functions of s for h = 1, ..., m, and with R_{Y.X|Z}(s) replacing R_{Y.X|Z} and A(s) ≡ R_{W.X|Z}(s) replacing A.

Appendix A contains extensions of the results in Section 3 on the identification of average coefficients under magnitude and sign restrictions on confounding. Section A.1 studies a panel structure with individual and time-varying random coefficients without requiring fixed effects. Section A.2 studies cases where the proxies W are a component X_1 of X, included in the Y equation.

4 Identification of Average Nonparametric Effects

We extend the analysis in Section 3 by removing the linearity assumption S.2 and studying the identification of average nonparametric effects, with Y = r(X, S, U, U_Y) as specified in S.1. As discussed in Section 2.1, here we study the identification of the conditional average direct effect of X on Y at (x, x*) given X = x* and S = s:

β(x, x*|x*, s) ≡ E[r(x*, s, U, U_Y) − r(x, s, U, U_Y) | X = x*, S = s],

such as the conditional average effect of the treatment on the treated β(0, 1|1, s), as well as the conditional average direct marginal effect of X on Y at x given X = x and S = s:

β(x|x, s) ≡ E[(∂/∂x) r(x, s, U, U_Y) | X = x, S = s],

where we set k = 1 to label the cause of interest by X. This also enables studying the identification of averages of β(x, x*|x*, s) and β(x|x, s), such as the average direct effect of X on Y at (x, x*) given X = x*, β(x, x*|x*) = E[β(x, x*|x*, S) | X = x*], and the conditional average marginal effect β̄(s) = E[β(X|X, s) | S = s]. In studying the identification of β(x, x*|x*, s) and β(x|x, s), we find it useful to employ the following notation for the difference and derivative of a nonparametric regression.
Specifically, for generic random vectors A and B with E(A) finite and b and b* in the support of B given covariates S = s, we let

R^N_{A.B}(b, b*; s) ≡ E(A | B = b*, S = s) − E(A | B = b, S = s).

Further, when B is a scalar and the derivative exists, we define

R^N_{A.B}(b; s) ≡ (∂/∂b) E(A | B = b, S = s).
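For binary X, R^N_{A.X}(0, 1) is simply a contrast of conditional means. The following hypothetical simulation sketch (invented values; S degenerate and left implicit) evaluates it for Y and for a proxy W, and, anticipating the Section 4.1 results, applies the confounding correction R^N_{Y.X} − R^N_{W.X}·δ under an assumed ratio δ = 1:

```python
import numpy as np

# Hypothetical simulation: binary treatment X selected on the confounder U.
rng = np.random.default_rng(2)
n = 100_000
U = rng.normal(size=n)
X = (0.8 * U + rng.normal(size=n) > 0).astype(float)   # selection on U
Y = 1.0 * X + 0.5 * U + rng.normal(size=n)             # true effect of X is 1.0
W = 0.5 * U + rng.normal(scale=0.1, size=n)            # proxy: delta = 0.5/0.5 = 1

def RN(a, x, x0, x1):
    """Difference estimand R^N_{A.X}(x0, x1) = E(A | X = x1) - E(A | X = x0)."""
    return a[x == x1].mean() - a[x == x0].mean()

delta = 1.0                                            # assumed confounding ratio
beta_01 = RN(Y, X, 0.0, 1.0) - RN(W, X, 0.0, 1.0) * delta
print(round(RN(Y, X, 0.0, 1.0), 2), round(beta_01, 2))
```

The raw contrast R^N_{Y.X}(0, 1) overstates the effect because treated units have higher U; subtracting the proxy's contrast removes the confounding under the assumed ratio.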

4.1 Additive Separability

We begin our analysis by studying the case in which U enters r separably. In particular, we impose the following assumption; we remove separability in Section 4.2.

Assumption 3 (S.3) Additive Separability: Assume S.1 with

Y = r(X, S, U, U_Y) = r̃(X, S, U_Y) + Σ_{g=1}^l U_g r_g(S, U_Y) ≡ r̃(X, S, U_Y) + U′δ_Y, where δ_Y ≡ (r_1(S, U_Y), ..., r_l(S, U_Y))′,

and W generated as in S.2: W = β_W + Δ′U, with β_W (m×1), Δ (l×m), and δ_Y (l×1), and adjust the random coefficients (β_W′, vec(Δ)′, δ_Y′)′ correspondingly.

Under S.3, when δ_Y = 0 and U_Y ⊥ X|S, we obtain the nonparametric specification for the Y equation imposed e.g. in Altonji and Matzkin (2005), Hoderlein and Mammen (2007), and Imbens and Newey (2009) [12], in which case certain average effects of X on Y are point identified. Next, we maintain that U_Y ⊥ X|S = s but allow U to freely (conditionally) depend on X, with δ_Y possibly nonzero. We then employ proxies W to fully or partially identify average effects of X on Y under magnitude and sign restrictions on confounding. For s ∈ Supp(S) and x, x* ∈ Supp(X), note that S.3 and U_Y ⊥ X|S = s give

β(x, x*|x*, s) = E[r̃(x*, s, U_Y) − r̃(x, s, U_Y) | S = s] ≡ β(x, x*|s) and β(x|x, s) = E[(∂/∂x) r̃(x, s, U_Y) | S = s] ≡ β(x|s).

Theorem 4.1 gives conditions under which β(x, x*|s) and β(x|s) depend on the unknown δ(s) ≡ Δ̄(s)⁻¹ δ̄_Y(s), involving the conditional average direct effects of U on Y and W.

Theorem 4.1 Assume S.3 with m = l and that, for s ∈ Supp(S),
(i) Δ̄(s) is nonsingular,
(ii) U_Y ⊥ X|S = s and E(δ̃_Y(S) | U, X, S = s) = 0,
(iii) E(β̃_W(S) | X, S = s) = 0 and E(Δ̃(S) | U, X, S = s) = 0.
Let δ(s) ≡ Δ̄(s)⁻¹ δ̄_Y(s). Then for x, x* ∈ Supp(X),

β(x, x*|s) = R^N_{Y.X}(x, x*; s) − R^N_{W.X}(x, x*; s)′ δ(s).

Footnote 12: Similar to Imbens and Newey (2009), one can consider covariates S_2 and a scalar unobservable S_1 recoverable from the choice equation X = q(Z, S_2, S_1), with q monotonic in S_1, such that (U_Y, S_1) ⊥ Z | S_2, yielding U_Y ⊥ X|S with S = (S_1, S_2′)′. We allow but do not require this possibility.

Set k = 1 and suppose further that (iv.a) E(W | X = x, S = s) exists and is finite, and (iv.b) for all x† in a nonempty open neighborhood N(x) ⊆ Supp(X), (∂/∂x) r̃(x†, s, u_y) exists for a.e. [13] u_y, and there is a function φ_s(U_Y) with E(φ_s(U_Y) | S = s) < ∞ such that |(∂/∂x) r̃(x†, s, u_y)| < φ_s(u_y) for a.e. u_y. Then

β(x|s) = R^N_{Y.X}(x; s) − R^N_{W.X}(x; s)′ δ(s).

Conditions (ii) and (iii) of Theorem 4.1 strengthen their counterparts in Theorem 3.4 with Z = X. They are implied by the stronger condition [14] (U_W, U_Y) ⊥ (U, X) | S. Note that (ii) allows for nonzero δ_Y and thus weakens the assumption "U_Y ⊥ X|S with δ_Y = 0" often employed in the literature. Condition (iii) ensures that W is an informative proxy, so that the mean dependence of W on X given S = s arises solely due to U. Condition (iv) ensures that the relevant derivatives exist and that (∂/∂x) E[r̃(x, s, U_Y) | S = s] = E[(∂/∂x) r̃(x, s, U_Y) | S = s] (see e.g. White and Chalak, 2013).

The conditional nonparametric regression biases for β(x, x*|s) and β(x|s) are

B(x, x*|s) = R^N_{W.X}(x, x*; s)′ δ(s) and B(x|s) = R^N_{W.X}(x; s)′ δ(s).

As in the linear case, β(x, x*|s) and β(x|s) are fully identified under conditional exogeneity or signed proportional confounding. In particular, if B(x, x*|s) = 0 or B(x|s) = 0 (conditional exogeneity) then β(x, x*|s) = R^N_{Y.X}(x, x*; s) or β(x|s) = R^N_{Y.X}(x; s). This obtains e.g. if δ̄_Y(s) = 0 or if U is conditionally mean independent of X, E(Ũ(S) | X, S = s) = 0. Alternatively, if δ(s) = d(s) with d_h(s), h = 1, ..., m, known or estimable (conditional signed proportional confounding) then

β(x, x*|s) = R^N_{Y.X}(x, x*; s) − Σ_{h=1}^m R^N_{W_h.X}(x, x*; s) d_h(s) and β(x|s) = R^N_{Y.X}(x; s) − Σ_{h=1}^m R^N_{W_h.X}(x; s) d_h(s).

From Theorem 4.1, observe that another avenue for full identification of β(x, x*|s) or β(x|s) is to impose m restrictions on β(·, ·|s) or β(·|s). For example, if m = l = 1 and one assumes that β(x†, x‡|s) = 0 for some x†, x‡ ∈ Supp(X), as occurs e.g.
if a nondegenerate component of X is excluded from r̃, and thus from the Y equation, with R^N_{W.X}(x†, x‡; s) ≠ 0, then δ(s) = R^N_{Y.X}(x†, x‡; s) / R^N_{W.X}(x†, x‡; s), and β(x, x*|s) is thus fully identified. Analogous restrictions fully identify β(x|s). We do not require such restrictions.

Footnote 13: In this context, the qualifier "almost every" (a.e.) means that the condition can fail for u_y belonging to a measurable set V having P[U_Y ∈ V | S = s] = 0 (see e.g. White and Chalak, 2013).
Footnote 14: Had U been observed, U_Y ⊥ (U, X) | S would suffice to identify the average effects of X on Y, as would (U, U_Y) ⊥ X | S with U unobserved. Here, U is unobserved and can depend on X.


More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College December 2016 Abstract Lewbel (2012) provides an estimator

More information

Nonparametric Identi cation of a Binary Random Factor in Cross Section Data

Nonparametric Identi cation of a Binary Random Factor in Cross Section Data Nonparametric Identi cation of a Binary Random Factor in Cross Section Data Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College Original January 2009, revised March

More information

Part VII. Accounting for the Endogeneity of Schooling. Endogeneity of schooling Mean growth rate of earnings Mean growth rate Selection bias Summary

Part VII. Accounting for the Endogeneity of Schooling. Endogeneity of schooling Mean growth rate of earnings Mean growth rate Selection bias Summary Part VII Accounting for the Endogeneity of Schooling 327 / 785 Much of the CPS-Census literature on the returns to schooling ignores the choice of schooling and its consequences for estimating the rate

More information

Robust Con dence Intervals in Nonlinear Regression under Weak Identi cation

Robust Con dence Intervals in Nonlinear Regression under Weak Identi cation Robust Con dence Intervals in Nonlinear Regression under Weak Identi cation Xu Cheng y Department of Economics Yale University First Draft: August, 27 This Version: December 28 Abstract In this paper,

More information

Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Chapter 2. Dynamic panel data models School of Economics and Management - University of Geneva Christophe Hurlin, Université of Orléans University of Orléans April 2018 C. Hurlin (University of Orléans)

More information

Regression. Econometrics II. Douglas G. Steigerwald. UC Santa Barbara. D. Steigerwald (UCSB) Regression 1 / 17

Regression. Econometrics II. Douglas G. Steigerwald. UC Santa Barbara. D. Steigerwald (UCSB) Regression 1 / 17 Regression Econometrics Douglas G. Steigerwald UC Santa Barbara D. Steigerwald (UCSB) Regression 1 / 17 Overview Reference: B. Hansen Econometrics Chapter 2.20-2.23, 2.25-2.29 Best Linear Projection is

More information

Instrumental Variables. Ethan Kaplan

Instrumental Variables. Ethan Kaplan Instrumental Variables Ethan Kaplan 1 Instrumental Variables: Intro. Bias in OLS: Consider a linear model: Y = X + Suppose that then OLS yields: cov (X; ) = ^ OLS = X 0 X 1 X 0 Y = X 0 X 1 X 0 (X + ) =)

More information

What do instrumental variable models deliver with discrete dependent variables?

What do instrumental variable models deliver with discrete dependent variables? What do instrumental variable models deliver with discrete dependent variables? Andrew Chesher Adam Rosen The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP10/13 What

More information

An Extended Class of Instrumental Variables for the Estimation of Causal E ects

An Extended Class of Instrumental Variables for the Estimation of Causal E ects An Extended Class of Instrumental Variables for the Estimation of Causal E ects Karim Chalak y Boston College Halbert White UC San Diego Current Draft: November 2009 First Draft: March 2005 Abstract We

More information

Chapter 1. GMM: Basic Concepts

Chapter 1. GMM: Basic Concepts Chapter 1. GMM: Basic Concepts Contents 1 Motivating Examples 1 1.1 Instrumental variable estimator....................... 1 1.2 Estimating parameters in monetary policy rules.............. 2 1.3 Estimating

More information

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations. Exercises for the course of Econometrics Introduction 1. () A researcher is using data for a sample of 30 observations to investigate the relationship between some dependent variable y i and independent

More information

Economics Introduction to Econometrics - Fall 2007 Final Exam - Answers

Economics Introduction to Econometrics - Fall 2007 Final Exam - Answers Student Name: Economics 4818 - Introduction to Econometrics - Fall 2007 Final Exam - Answers SHOW ALL WORK! Evaluation: Problems: 3, 4C, 5C and 5F are worth 4 points. All other questions are worth 3 points.

More information

Sensitivity checks for the local average treatment effect

Sensitivity checks for the local average treatment effect Sensitivity checks for the local average treatment effect Martin Huber March 13, 2014 University of St. Gallen, Dept. of Economics Abstract: The nonparametric identification of the local average treatment

More information

Non-parametric Identi cation and Testable Implications of the Roy Model

Non-parametric Identi cation and Testable Implications of the Roy Model Non-parametric Identi cation and Testable Implications of the Roy Model Francisco J. Buera Northwestern University January 26 Abstract This paper studies non-parametric identi cation and the testable implications

More information

Unconditional Quantile Regression with Endogenous Regressors

Unconditional Quantile Regression with Endogenous Regressors Unconditional Quantile Regression with Endogenous Regressors Pallab Kumar Ghosh Department of Economics Syracuse University. Email: paghosh@syr.edu Abstract This paper proposes an extension of the Fortin,

More information

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation Inference about Clustering and Parametric Assumptions in Covariance Matrix Estimation Mikko Packalen y Tony Wirjanto z 26 November 2010 Abstract Selecting an estimator for the variance covariance matrix

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Most of this course will be concerned with use of a regression model: a structure in which one or more explanatory

More information

When is it really justifiable to ignore explanatory variable endogeneity in a regression model?

When is it really justifiable to ignore explanatory variable endogeneity in a regression model? Discussion Paper: 2015/05 When is it really justifiable to ignore explanatory variable endogeneity in a regression model? Jan F. Kiviet www.ase.uva.nl/uva-econometrics Amsterdam School of Economics Roetersstraat

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

Models, Testing, and Correction of Heteroskedasticity. James L. Powell Department of Economics University of California, Berkeley

Models, Testing, and Correction of Heteroskedasticity. James L. Powell Department of Economics University of California, Berkeley Models, Testing, and Correction of Heteroskedasticity James L. Powell Department of Economics University of California, Berkeley Aitken s GLS and Weighted LS The Generalized Classical Regression Model

More information

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1 Markov-Switching Models with Endogenous Explanatory Variables by Chang-Jin Kim 1 Dept. of Economics, Korea University and Dept. of Economics, University of Washington First draft: August, 2002 This version:

More information

Multiple Linear Regression CIVL 7012/8012

Multiple Linear Regression CIVL 7012/8012 Multiple Linear Regression CIVL 7012/8012 2 Multiple Regression Analysis (MLR) Allows us to explicitly control for many factors those simultaneously affect the dependent variable This is important for

More information

Granger Causality and Dynamic Structural Systems

Granger Causality and Dynamic Structural Systems Granger Causality and Dynamic Structural Systems Halbert White and Xun Lu Department of Economics University of California, San Diego June 26, 2008 Abstract We analyze the relations between Granger (G)

More information

Econometrics Midterm Examination Answers

Econometrics Midterm Examination Answers Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i

More information

Nonparametric Welfare Analysis for Discrete Choice

Nonparametric Welfare Analysis for Discrete Choice Nonparametric Welfare Analysis for Discrete Choice Debopam Bhattacharya University of Oxford September 26, 2014. Abstract We consider empirical measurement of exact equivalent/compensating variation resulting

More information

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47 ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with

More information

DEPARTMENT OF ECONOMICS DISCUSSION PAPER SERIES

DEPARTMENT OF ECONOMICS DISCUSSION PAPER SERIES ISSN 1471-0498 DEPARTMENT OF ECONOMICS DISCUSSION PAPER SERIES DYNAMIC BINARY OUTCOME MODELS WITH MAXIMAL HETEROGENEITY Martin Browning and Jesus M. Carro Number 426 April 2009 Manor Road Building, Oxford

More information

The returns to schooling, ability bias, and regression

The returns to schooling, ability bias, and regression The returns to schooling, ability bias, and regression Jörn-Steffen Pischke LSE October 4, 2016 Pischke (LSE) Griliches 1977 October 4, 2016 1 / 44 Counterfactual outcomes Scholing for individual i is

More information

13 Endogeneity and Nonparametric IV

13 Endogeneity and Nonparametric IV 13 Endogeneity and Nonparametric IV 13.1 Nonparametric Endogeneity A nonparametric IV equation is Y i = g (X i ) + e i (1) E (e i j i ) = 0 In this model, some elements of X i are potentially endogenous,

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

An Instrumental Variable Model of Multiple Discrete Choice

An Instrumental Variable Model of Multiple Discrete Choice An Instrumental Variable Model of Multiple Discrete Choice Andrew Chesher y UCL and CeMMAP Adam M. Rosen z UCL and CeMMAP February, 20 Konrad Smolinski x UCL and CeMMAP Abstract This paper studies identi

More information

Chapter 6: Endogeneity and Instrumental Variables (IV) estimator

Chapter 6: Endogeneity and Instrumental Variables (IV) estimator Chapter 6: Endogeneity and Instrumental Variables (IV) estimator Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans December 15, 2013 Christophe Hurlin (University of Orléans)

More information

Parametric Inference on Strong Dependence

Parametric Inference on Strong Dependence Parametric Inference on Strong Dependence Peter M. Robinson London School of Economics Based on joint work with Javier Hualde: Javier Hualde and Peter M. Robinson: Gaussian Pseudo-Maximum Likelihood Estimation

More information

More on Specification and Data Issues

More on Specification and Data Issues More on Specification and Data Issues Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Specification and Data Issues 1 / 35 Functional Form Misspecification Functional

More information

IDENTIFICATION IN DIFFERENTIATED PRODUCTS MARKETS USING MARKET LEVEL DATA. Steven T. Berry and Philip A. Haile. January 2010 Revised May 2012

IDENTIFICATION IN DIFFERENTIATED PRODUCTS MARKETS USING MARKET LEVEL DATA. Steven T. Berry and Philip A. Haile. January 2010 Revised May 2012 IDENTIFICATION IN DIFFERENTIATED PRODUCTS MARKETS USING MARKET LEVEL DATA By Steven T. Berry and Philip A. Haile January 2010 Revised May 2012 COWLES FOUNDATION DISCUSSION PAPER NO. 1744R COWLES FOUNDATION

More information

Supplemental Material 1 for On Optimal Inference in the Linear IV Model

Supplemental Material 1 for On Optimal Inference in the Linear IV Model Supplemental Material 1 for On Optimal Inference in the Linear IV Model Donald W. K. Andrews Cowles Foundation for Research in Economics Yale University Vadim Marmer Vancouver School of Economics University

More information

Applied Health Economics (for B.Sc.)

Applied Health Economics (for B.Sc.) Applied Health Economics (for B.Sc.) Helmut Farbmacher Department of Economics University of Mannheim Autumn Semester 2017 Outlook 1 Linear models (OLS, Omitted variables, 2SLS) 2 Limited and qualitative

More information

The regression model with one stochastic regressor (part II)

The regression model with one stochastic regressor (part II) The regression model with one stochastic regressor (part II) 3150/4150 Lecture 7 Ragnar Nymoen 6 Feb 2012 We will finish Lecture topic 4: The regression model with stochastic regressor We will first look

More information

Sharp identified sets for discrete variable IV models

Sharp identified sets for discrete variable IV models Sharp identified sets for discrete variable IV models Andrew Chesher Konrad Smolinski The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP11/10 Sharp identi ed sets for

More information

Bias from Censored Regressors

Bias from Censored Regressors Bias from Censored Regressors Roberto Rigobon Thomas M. Stoker September 2005 Abstract We study the bias that arises from using censored regressors in estimation of linear models. We present results on

More information

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 Revised March 2012

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 Revised March 2012 SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER By Donald W. K. Andrews August 2011 Revised March 2012 COWLES FOUNDATION DISCUSSION PAPER NO. 1815R COWLES FOUNDATION FOR

More information

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner Econometrics II Nonstandard Standard Error Issues: A Guide for the Practitioner Måns Söderbom 10 May 2011 Department of Economics, University of Gothenburg. Email: mans.soderbom@economics.gu.se. Web: www.economics.gu.se/soderbom,

More information

Föreläsning /31

Föreläsning /31 1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +

More information

Instrumental Variable Models for Discrete Outcomes. Andrew Chesher Centre for Microdata Methods and Practice and UCL. Revised November 13th 2008

Instrumental Variable Models for Discrete Outcomes. Andrew Chesher Centre for Microdata Methods and Practice and UCL. Revised November 13th 2008 Instrumental Variable Models for Discrete Outcomes Andrew Chesher Centre for Microdata Methods and Practice and UCL Revised November 13th 2008 Abstract. Single equation instrumental variable models for

More information

Empirical Methods in Applied Microeconomics

Empirical Methods in Applied Microeconomics Empirical Methods in Applied Microeconomics Jörn-Ste en Pischke LSE November 2007 1 Nonlinearity and Heterogeneity We have so far concentrated on the estimation of treatment e ects when the treatment e

More information

A New Approach to Robust Inference in Cointegration

A New Approach to Robust Inference in Cointegration A New Approach to Robust Inference in Cointegration Sainan Jin Guanghua School of Management, Peking University Peter C. B. Phillips Cowles Foundation, Yale University, University of Auckland & University

More information

Instrumental variable models for discrete outcomes

Instrumental variable models for discrete outcomes Instrumental variable models for discrete outcomes Andrew Chesher The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP30/08 Instrumental Variable Models for Discrete Outcomes

More information

Partial Identi cation in Monotone Binary Models: Discrete Regressors and Interval Data.

Partial Identi cation in Monotone Binary Models: Discrete Regressors and Interval Data. Partial Identi cation in Monotone Binary Models: Discrete Regressors and Interval Data. Thierry Magnac Eric Maurin y First version: February 004 This revision: December 006 Abstract We investigate identi

More information

Identification of Regression Models with Misclassified and Endogenous Binary Regressor

Identification of Regression Models with Misclassified and Endogenous Binary Regressor Identification of Regression Models with Misclassified and Endogenous Binary Regressor A Celebration of Peter Phillips Fourty Years at Yale Conference Hiroyuki Kasahara 1 Katsumi Shimotsu 2 1 Department

More information

NBER WORKING PAPER SERIES NONPARAMETRIC IDENTIFICATION OF MULTINOMIAL CHOICE DEMAND MODELS WITH HETEROGENEOUS CONSUMERS

NBER WORKING PAPER SERIES NONPARAMETRIC IDENTIFICATION OF MULTINOMIAL CHOICE DEMAND MODELS WITH HETEROGENEOUS CONSUMERS NBER WORKING PAPER SERIES NONPARAMETRIC IDENTIFICATION OF MULTINOMIAL CHOICE DEMAND MODELS WITH HETEROGENEOUS CONSUMERS Steven T. Berry Philip A. Haile Working Paper 15276 http://www.nber.org/papers/w15276

More information

STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Friday, June 5, 009 Examination time: 3 hours

More information

Quantitative Techniques - Lecture 8: Estimation

Quantitative Techniques - Lecture 8: Estimation Quantitative Techniques - Lecture 8: Estimation Key words: Estimation, hypothesis testing, bias, e ciency, least squares Hypothesis testing when the population variance is not known roperties of estimates

More information

Trimming for Bounds on Treatment Effects with Missing Outcomes *

Trimming for Bounds on Treatment Effects with Missing Outcomes * CENTER FOR LABOR ECONOMICS UNIVERSITY OF CALIFORNIA, BERKELEY WORKING PAPER NO. 5 Trimming for Bounds on Treatment Effects with Missing Outcomes * David S. Lee UC Berkeley and NBER March 2002 Abstract

More information

ECON2285: Mathematical Economics

ECON2285: Mathematical Economics ECON2285: Mathematical Economics Yulei Luo Economics, HKU September 17, 2018 Luo, Y. (Economics, HKU) ME September 17, 2018 1 / 46 Static Optimization and Extreme Values In this topic, we will study goal

More information

Identification in Nonparametric Limited Dependent Variable Models with Simultaneity and Unobserved Heterogeneity

Identification in Nonparametric Limited Dependent Variable Models with Simultaneity and Unobserved Heterogeneity Identification in Nonparametric Limited Dependent Variable Models with Simultaneity and Unobserved Heterogeneity Rosa L. Matzkin 1 Department of Economics University of California, Los Angeles First version:

More information

Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels.

Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels. Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels. Pedro Albarran y Raquel Carrasco z Jesus M. Carro x June 2014 Preliminary and Incomplete Abstract This paper presents and evaluates

More information

Nonparametric Estimation of Wages and Labor Force Participation

Nonparametric Estimation of Wages and Labor Force Participation Nonparametric Estimation of Wages Labor Force Participation John Pepper University of Virginia Steven Stern University of Virginia May 5, 000 Preliminary Draft - Comments Welcome Abstract. Model Let y

More information

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y

More information