Nonparametric Identification of Regression Models Containing a Misclassified Dichotomous Regressor Without Instruments


Nonparametric Identification of Regression Models Containing a Misclassified Dichotomous Regressor Without Instruments

Xiaohong Chen (Yale University), Yingyao Hu (Johns Hopkins University), Arthur Lewbel (Boston College)

First version November 2006; revised February 2008.

Abstract: We observe a dependent variable and some regressors, including a mismeasured binary regressor. We provide identification of the nonparametric regression model containing this misclassified dichotomous regressor. We obtain identification without parameterizations or instruments, by assuming the model error isn't skewed.

Keywords: misclassification error; identification; nonparametric regression. JEL codes: C14, C20.

Xiaohong Chen: Department of Economics, Yale University, Box 208281, New Haven, CT 06520-8281, USA. Tel 203-432-5852. Email xiaohong.chen@yale.edu. Yingyao Hu: Department of Economics, Johns Hopkins University, 440 Mergenthaler Hall, 3400 N. Charles Street, Baltimore, MD 21218, USA. Tel 410-516-7610. Email yhu@jhu.edu. Arthur Lewbel: Department of Economics, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA. Tel 617-522-3678. Email lewbel@bc.edu.

1 Motivation

We provide identification of a nonparametric regression model with a dichotomous regressor subject to misclassification error. The available sample information consists of a dependent variable and a set of regressors, one of which is binary and error-ridden with misclassification error that has an unknown distribution. Our identification strategy does not parameterize any regression or distribution functions, and does not require additional sample information such as instrumental variables, repeated measurements, or an auxiliary sample. Our main identifying assumption is that the regression model error has zero conditional third moment. The results include a closed-form solution for the unknown distributions and the regression function.

Dichotomous (binary) variables, such as union status, smoking behavior, and having a college degree or not, are involved in many economic models. Measurement errors in dichotomous variables take the form of misclassification errors, i.e., some observations where the variable is actually a one may be misclassified as a zero, and vice versa. A common source of misclassification errors is self-reporting, where people may have psychological or economic incentives to misreport dichotomous variables (see Bound, Brown, and Mathiowetz (2001) for a survey). Misclassification may also arise from ordinary coding or reporting errors, e.g., Kane, Rouse, and Staiger (1999) report substantial classification errors in both self reports and transcript reports of educational attainment. Unlike ordinary mismeasured regressors, misclassified regressors cannot possess the properties of classically mismeasured variables; in particular, classification errors are not independent of the underlying true regressor, and are in general not mean zero.
As with ordinary mismeasured regressors, estimated regressions with a misclassified regressor are inconsistent, and the latent true regression model based just on conditionally mean-zero model errors is generally not identified in the presence of a misclassified regressor. To identify the latent model, we must either impose additional assumptions or possess additional sample information. One popular additional assumption is that the measurement error distribution belongs to some parametric family. Additional sample information often used to obtain identification includes an instrumental variable or a repeated measurement in the same sample, or a secondary sample. See, e.g., Carroll, Ruppert, Stefanski, and Crainiceanu (2006) and Chen, Hong, and Nekipelov (2007) for detailed recent reviews of existing approaches to measurement error problems.

In this note we obtain identification without parameterizing errors and without auxiliary information like instrumental variables, repeated measurements, or a secondary sample. A related result is Chen, Hu, and Lewbel (2007). We show here that, given some mild regularity conditions, a nonparametric mean regression with a misclassified binary regressor is identified (and can be solved in closed form) if the latent regression error has zero conditional third moment, as would be the case if the regression error were symmetric. We also briefly discuss how simple estimators might be constructed based on our identification method.

2 Identification

We are interested in a regression model of the form

$$Y = m(X^*, W) + \varepsilon, \qquad E(\varepsilon \mid X^*, W) = 0, \tag{2.1}$$

where $Y$ is the dependent variable, $X^* \in \mathcal{X} = \{0, 1\}$ is the dichotomous regressor subject to misclassification error, and $W$ is an error-free covariate vector. We are interested in the nonparametric identification of the regression function $m(\cdot)$. The regression error $\varepsilon$ need not be independent of the regressors $X^*$ and $W$, so we have the conditional density functions

$$f_{Y|X^*,W}(y \mid x^*, w) = f_{\varepsilon|X^*,W}(y - m(x^*, w) \mid x^*, w). \tag{2.2}$$

In a random sample, we observe $(X, Y, W) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{W}$, where $X$ is a proxy, i.e., a mismeasured version, of $X^*$. We assume:

Assumption 2.1 $f_{Y|X^*,W,X}(y \mid x^*, w, x) = f_{Y|X^*,W}(y \mid x^*, w)$ for all $(x, x^*, y, w) \in \mathcal{X} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{W}$.

This assumption implies that the measurement error in $X$ is independent of the dependent variable $Y$ conditional on the true value $X^*$ and the covariate $W$, and so $X$ is independent of the regression error $\varepsilon$ conditional on $X^*$ and $W$. This is analogous to the classical measurement error assumption of having the measurement error independent of the regression model error. This assumption may be problematic in applications where the same individual who provides the source of misclassification by supplying $X$ also helps determine the outcome $Y$; however, this is a standard assumption in the literature on mismeasured and misclassified regressors. See, e.g., Li (2002), Schennach (2004), Mahajan (2006), Lewbel (2007a), and Hu (2006).

By construction, the relationship between the observed density and the latent ones is as follows:

$$f_{Y|X,W}(y \mid x, w) = \sum_{x^*} f_{Y|X^*,W,X}(y \mid x^*, w, x)\, f_{X^*|X,W}(x^* \mid x, w) = \sum_{x^*} f_{\varepsilon|X^*,W}(y - m(x^*, w) \mid x^*, w)\, f_{X^*|X,W}(x^* \mid x, w). \tag{2.3}$$

Using the fact that $X$ and $X^*$ are 0-1 dichotomous, define the following simplifying notation: $m_0(w) = m(0, w)$, $m_1(w) = m(1, w)$, $\mu_0(w) = E(Y \mid X = 0, w)$, $\mu_1(w) = E(Y \mid X = 1, w)$, $p(w) = f_{X^*|X,W}(1 \mid 0, w)$, and $q(w) = f_{X^*|X,W}(0 \mid 1, w)$. Equation (2.3) is then equivalent to

$$\begin{pmatrix} f_{Y|X,W}(y \mid 0, w) \\ f_{Y|X,W}(y \mid 1, w) \end{pmatrix} = \begin{pmatrix} 1 - p(w) & p(w) \\ q(w) & 1 - q(w) \end{pmatrix} \begin{pmatrix} f_{\varepsilon|X^*,W}(y - m_0(w) \mid 0, w) \\ f_{\varepsilon|X^*,W}(y - m_1(w) \mid 1, w) \end{pmatrix}. \tag{2.4}$$

Since $f_{\varepsilon|X^*,W}$ has zero mean, we obtain

$$\mu_0(w) = (1 - p(w))\, m_0(w) + p(w)\, m_1(w) \quad \text{and} \quad \mu_1(w) = q(w)\, m_0(w) + (1 - q(w))\, m_1(w). \tag{2.5}$$

Assume:

Assumption 2.2 $m_1(w) \neq m_0(w)$ for all $w \in \mathcal{W}$.

This assumption means that $X^*$ has a nonzero effect on the conditional mean of $Y$, and so is a relevant explanatory variable, given $W$. We may now solve equation (2.5) for $p(w)$ and $q(w)$, yielding

$$p(w) = \frac{\mu_0(w) - m_0(w)}{m_1(w) - m_0(w)} \quad \text{and} \quad q(w) = \frac{m_1(w) - \mu_1(w)}{m_1(w) - m_0(w)}. \tag{2.6}$$

Without loss of generality, we assume:

Assumption 2.3 For all $w \in \mathcal{W}$: (i) $\mu_1(w) > \mu_0(w)$; (ii) $p(w) + q(w) < 1$.

Assumption 2.3(i) is not restrictive because one can always redefine $X$ as $1 - X$ if needed. Assumption 2.3(ii) implies that the ordering of $m_1(w)$ and $m_0(w)$ is the same as that of $\mu_1(w)$ and $\mu_0(w)$ because

$$1 - p(w) - q(w) = \frac{\mu_1(w) - \mu_0(w)}{m_1(w) - m_0(w)}.$$

The intuition of Assumption 2.3(ii) is that the total misclassification probability is not too large, so that $\mu_1(w) > \mu_0(w)$ implies $m_1(w) > m_0(w)$ (see, e.g., Lewbel (2007a) for a further discussion of this assumption). In summary, we have

$$m_1(w) \geq \mu_1(w) > \mu_0(w) \geq m_0(w).$$
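The closed-form map in equations (2.5)-(2.6) is easy to sanity-check numerically. The sketch below uses made-up values for $m_0$, $m_1$, $p$, $q$ (they come from no data set and are purely illustrative): it generates the observed conditional means and then inverts them back.

```python
# Sanity check of equations (2.5)-(2.6): pick hypothetical true values,
# generate the observed conditional means, then invert them back.
m0, m1 = 1.0, 3.0          # latent regression values m(0,w), m(1,w)
p, q = 0.15, 0.10          # misclassification probabilities, p + q < 1

# Observed conditional means implied by equation (2.5)
mu0 = (1 - p) * m0 + p * m1
mu1 = q * m0 + (1 - q) * m1

# Closed-form inversion from equation (2.6)
p_rec = (mu0 - m0) / (m1 - m0)
q_rec = (m1 - mu1) / (m1 - m0)

assert abs(p_rec - p) < 1e-12 and abs(q_rec - q) < 1e-12
# Assumption 2.3(ii): 1 - p - q equals (mu1 - mu0)/(m1 - m0)
assert abs((1 - p - q) - (mu1 - mu0) / (m1 - m0)) < 1e-12
```

With these values the observed means are $\mu_0 = 1.3$ and $\mu_1 = 2.8$, squeezed toward each other relative to $m_0 = 1$ and $m_1 = 3$, exactly the attenuation pattern the summary inequality describes.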

The condition $p(w) + q(w) \neq 1$ also guarantees that the matrix $\begin{pmatrix} 1 - p(w) & p(w) \\ q(w) & 1 - q(w) \end{pmatrix}$ in equation (2.4) is invertible. If we then plug into equation (2.4) the expressions for $p(w)$ and $q(w)$ in equation (2.6), we obtain, for $j = 0, 1$,

$$f_{\varepsilon|X^*,W}(y - m_j(w) \mid j, w) = \frac{\mu_1(w) - m_j(w)}{\mu_1(w) - \mu_0(w)}\, f_{Y|X,W}(y \mid 0, w) + \frac{m_j(w) - \mu_0(w)}{\mu_1(w) - \mu_0(w)}\, f_{Y|X,W}(y \mid 1, w). \tag{2.7}$$

Equation (2.7) is our vehicle for identification. Given any information about the distribution of the regression error $\varepsilon$, equation (2.7) provides the link between that information and the unknowns $m_0(w)$ and $m_1(w)$, along with the observable density $f_{Y|X,W}$ and observable conditional means $\mu_0(w)$ and $\mu_1(w)$. The specific assumption about $\varepsilon$ that we use to obtain identification is this:

Assumption 2.4 $E(\varepsilon^3 \mid X^*, W) = 0$.

A sufficient, though much stronger than necessary, condition for this assumption to hold is that $f_{\varepsilon|X^*,W}$ be symmetric for each $x^* \in \mathcal{X}$ and $w \in \mathcal{W}$. Notice that the regression model error $\varepsilon$ need not be independent of the regressors $X^*, W$; in particular, our assumptions permit $\varepsilon$ to have heteroskedasticity of completely unknown form. Let $\phi$ denote the characteristic function and

$$\phi_{\varepsilon|X^*=j,w}(t) = \int e^{it\varepsilon} f_{\varepsilon|X^*,W}(\varepsilon \mid j, w)\, d\varepsilon, \qquad \phi_{Y|X=j,w}(t) = \int e^{ity} f_{Y|X,W}(y \mid j, w)\, dy.$$

Then equation (2.7) implies that for any real $t$,

$$\ln\left[e^{itm_j(w)}\, \phi_{\varepsilon|X^*=j,w}(t)\right] = \ln\left[\frac{\mu_1(w) - m_j(w)}{\mu_1(w) - \mu_0(w)}\, \phi_{Y|X=0,w}(t) + \frac{m_j(w) - \mu_0(w)}{\mu_1(w) - \mu_0(w)}\, \phi_{Y|X=1,w}(t)\right]. \tag{2.8}$$

Notice that

$$\frac{\partial^3}{\partial t^3} \ln\left[e^{itm_j(w)}\, \phi_{\varepsilon|X^*=j,w}(t)\right]\Big|_{t=0} = \frac{\partial^3}{\partial t^3} \ln \phi_{\varepsilon|X^*=j,w}(t)\Big|_{t=0} = -i\, E\left(\varepsilon^3 \mid X^* = j, W = w\right).$$

Assumption 2.4 therefore implies that for $j = 0, 1$,

$$G(m_j(w)) = 0, \tag{2.9}$$
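The key step behind equations (2.8)-(2.9) is that the third derivative of a log characteristic function at $t = 0$ is a third cumulant, which is invariant to the location shift $e^{itm_j(w)}$ and equals $-iE(\varepsilon^3)$ for a mean-zero error. A minimal symbolic check of this step, using a normal (hence symmetric) error purely for tractability:

```python
import sympy as sp

t = sp.Symbol('t', real=True)
m = sp.Symbol('m', real=True)      # location shift m_j(w)
s = sp.Symbol('s', positive=True)  # error standard deviation

# Characteristic function of Y = m + eps with eps ~ N(0, s^2)
phi = sp.exp(sp.I * t * m) * sp.exp(-t**2 * s**2 / 2)

# log phi = i*t*m - t^2 s^2 / 2 (expand_log with force collapses the product)
logphi = sp.expand_log(sp.log(phi), force=True)

# First derivative at 0 recovers the mean (times i) ...
first = sp.diff(logphi, t, 1).subs(t, 0)
assert sp.simplify(first - sp.I * m) == 0

# ... while the third derivative at 0 is unaffected by the shift m and
# equals -i E(eps^3) = 0 for a symmetric error, as used in (2.9).
third = sp.diff(logphi, t, 3).subs(t, 0)
assert sp.simplify(third) == 0
```

The same shift-invariance holds for any error distribution with finite third moment, which is why only the zero-third-moment restriction of Assumption 2.4, not full symmetry, is needed.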

where

$$G(z) \equiv i\, \frac{\partial^3}{\partial t^3} \ln\left[\frac{\mu_1(w) - z}{\mu_1(w) - \mu_0(w)}\, \phi_{Y|X=0,w}(t) + \frac{z - \mu_0(w)}{\mu_1(w) - \mu_0(w)}\, \phi_{Y|X=1,w}(t)\right]\Big|_{t=0}.$$

This equation shows that the unknowns $m_0(w)$ and $m_1(w)$ are two roots of the cubic function $G(\cdot)$ in equation (2.9). Suppose the three roots of this equation are $r_a(w) \leq r_b(w) \leq r_c(w)$. In fact, we have

$$r_a(w) \leq m_0(w) \leq \mu_0(w) < \mu_1(w) \leq m_1(w) \leq r_c(w),$$

which implies bounds on $m_0(w)$ and $m_1(w)$. To obtain point identification of $m_j(w)$, we need to be able to uniquely define which roots of the cubic function $G(\cdot)$ correspond to $m_0(w)$ and $m_1(w)$. This is provided by the following assumption.

Assumption 2.5 Assume

$$E\left[(Y - \mu_0(w))^3 \mid X = 0, W = w\right] \geq 0 \geq E\left[(Y - \mu_1(w))^3 \mid X = 1, W = w\right]$$

and, when an equality with $X = j$ holds, assume $\frac{dG(z)}{dz}\big|_{z = \mu_j(w)} > 0$.

It follows from Assumption 2.5 that

$$r_a(w) \leq \mu_0(w) < r_b(w) < \mu_1(w) \leq r_c(w).$$

Since $m_0(w) \leq \mu_0(w) < \mu_1(w) \leq m_1(w)$, we then have point identification by $m_0(w) = r_a(w)$ and $m_1(w) = r_c(w)$. Note that Assumption 2.5 is directly testable from the data. Based on the definition of skewness of a distribution and $\mu_0(w) < E(Y \mid W = w) < \mu_1(w)$, Assumption 2.5 implies that the distributions $f_{Y|X,W}(y \mid 1, w)$ and $f_{Y|X,W}(y \mid 0, w)$ are skewed towards the unconditional mean $E(Y \mid W = w)$ compared with each conditional mean. An analogous result is Lewbel (1997), who obtains identification in a classical measurement error context without auxiliary data by exploiting skewness in a different way.

Notice that Assumption 2.4 implies that $E[(Y - m_0(w))^3 \mid X^* = 0, W = w] = 0$ and $E[(Y - m_1(w))^3 \mid X^* = 1, W = w] = 0$. Assumption 2.5 then implies

$$E\left[(Y - \mu_0(w))^3 \mid X = 0, W = w\right] \geq E\left[(Y - m_0(w))^3 \mid X^* = 0, W = w\right]$$

and

$$E\left[(Y - \mu_1(w))^3 \mid X = 1, W = w\right] \leq E\left[(Y - m_1(w))^3 \mid X^* = 1, W = w\right].$$
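At the population level, the whole argument above can be sketched in a few lines: pick hypothetical values of $m_0$, $m_1$, $p$, $q$ and symmetric (normal) error laws, compute the implied observed conditional moments, build the cubic from the coefficients displayed later in equation (A.6), and confirm that its extreme roots recover $m_0$ and $m_1$ while the middle root lies between the $\mu$'s. All numeric values here are invented for illustration:

```python
import numpy as np

# Hypothetical design: m(0,w)=1, m(1,w)=3, misclassification rates
# p=0.15, q=0.10, and normal (hence symmetric) errors with std 0.8, 1.2.
m0, m1, p, q, s0, s1 = 1.0, 3.0, 0.15, 0.10, 0.8, 1.2

def mix(w0, w1, k):
    """k-th raw moment of Y given observed X, a (w0, w1) mixture over X*."""
    raw = {1: (m0, m1),
           2: (m0**2 + s0**2, m1**2 + s1**2),
           3: (m0**3 + 3 * m0 * s0**2, m1**3 + 3 * m1 * s1**2)}[k]
    return w0 * raw[0] + w1 * raw[1]

mu0, nu0, om0 = (mix(1 - p, p, k) for k in (1, 2, 3))   # moments given X = 0
mu1, nu1, om1 = (mix(q, 1 - q, k) for k in (1, 2, 3))   # moments given X = 1

d = mu1 - mu0
# Coefficients of the cubic G(z) = 2 z^3 + a2 z^2 + a1 z + a0, as in (A.6)
a2 = -3 * (nu1 - nu0) / d
a1 = (3 * mu0 * nu1 - 3 * mu1 * nu0 + om1 - om0) / d
a0 = (mu1 * om0 - mu0 * om1) / d

roots = np.sort(np.roots([2.0, a2, a1, a0]).real)
# Assumption 2.5 selects the extreme roots: smallest = m0, largest = m1
assert np.allclose(roots[[0, 2]], [m0, m1])
assert mu0 < roots[1] < mu1   # the middle root lies between the mu's
```

For this design the cubic factors as $2(z-1)(z-2.6)(z-3)$, so $r_b = 2.6$ indeed sits strictly between $\mu_0 = 1.3$ and $\mu_1 = 2.8$, as the root-ordering result requires.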

The third moments on the left-hand sides are observed from the data, while the right-hand sides contain the latent third moments. We may treat the third moments $E[(Y - \mu_j(w))^3 \mid X = j, W = w]$ as naive estimators of the true moments $E[(Y - m_j(w))^3 \mid X^* = j, W = w]$. Assumption 2.4 implies that the latent third moments are known to be zero. Assumption 2.5 implies that the sign of the bias of the naive estimator is different in the two subsamples corresponding to $X = 0$ and $X = 1$. We leave the detailed proof to the appendix and summarize the result as follows.

Theorem 2.1 Suppose that Assumptions 2.1-2.5 hold in equation (2.1). Then the density $f_{Y,X,W}$ uniquely determines $f_{Y|X^*,W}$ and $f_{X^*,X,W}$.

Identification of the distributions $f_{Y|X^*,W}$ and $f_{X^*,X,W}$ by Theorem 2.1 immediately implies that the regression function $m(X^*, W)$, the conditional distribution of the regression error, $f_{\varepsilon|X^*,W}$, and the conditional distribution of the misclassification error (the difference between $X$ and $X^*$) are all identified.

3 Conclusions and Possible Estimators

We have shown that a nonparametric regression model containing a dichotomous misclassified regressor can be identified without any auxiliary data like instruments, repeated measurements, or a secondary sample (such as validation data), and without any parametric restrictions. The only identifying assumptions are some regularity conditions and the assumption that the regression model error has zero conditional skewness. We have focused on identification, so we conclude by briefly describing how estimators might be constructed based on our identification method. One possibility would be to substitute consistent estimators of the conditional means $\mu_j(w)$ and characteristic functions $\phi_{Y|X=j,w}(t)$ into equation (2.9), and solve the resulting cubic equation for estimates of $m_j(w)$.
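A minimal Monte Carlo sketch of this plug-in idea, with no covariate $W$, normal errors, and invented misreporting rates (none of these design choices come from the paper): estimate the conditional moments of $Y$ given the observed $X$, form the cubic with the coefficients of equation (A.6), and take its extreme roots as estimates of $m(0)$ and $m(1)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
xstar = rng.binomial(1, 0.5, n)                       # true binary regressor
eps = rng.normal(0.0, 0.8 + 0.4 * xstar)              # symmetric, heteroskedastic
y = 1.0 + 2.0 * xstar + eps                           # so m(0) = 1, m(1) = 3
flip = rng.random(n) < np.where(xstar == 1, 0.2, 0.1) # misreporting rates
x = np.where(flip, 1 - xstar, xstar)                  # misclassified proxy

# Sample conditional moments of Y given the observed X
mu, nu, om = [np.array([np.mean(y[x == j] ** k) for j in (0, 1)])
              for k in (1, 2, 3)]

d = mu[1] - mu[0]
coef = [2.0,
        -3 * (nu[1] - nu[0]) / d,
        (3 * mu[0] * nu[1] - 3 * mu[1] * nu[0] + om[1] - om[0]) / d,
        (mu[1] * om[0] - mu[0] * om[1]) / d]
roots = np.sort(np.roots(coef).real)
m0_hat, m1_hat = roots[0], roots[2]   # extreme roots, per Assumption 2.5
assert abs(m0_hat - 1.0) < 0.2 and abs(m1_hat - 3.0) < 0.2
```

The estimator is noisy because it works off third moments, so large samples (or smoothing over $w$ in the covariate case) matter in practice; the sample size above is chosen generously to make the illustration stable.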
Another possibility is to observe that, based on the proof of our main theorem, the identifying equations can be written in terms of conditional mean-zero expectations as, for $j = 0, 1$,

$$E\left[\left(Y - \mu_j(W)\right) I(X = j) \mid W = w\right] = 0,$$
$$E\left[\left(Y^2 - \nu_j(W)\right) I(X = j) \mid W = w\right] = 0,$$
$$E\left[\left(Y^3 - \omega_j(W)\right) I(X = j) \mid W = w\right] = 0,$$
$$E\left[2 m_j(W)^3 - \frac{3(\nu_1(W) - \nu_0(W))}{\mu_1(W) - \mu_0(W)} m_j(W)^2 + \frac{3\mu_0(W)\nu_1(W) - 3\mu_1(W)\nu_0(W) + \omega_1(W) - \omega_0(W)}{\mu_1(W) - \mu_0(W)} m_j(W) + \frac{\mu_1(W)\omega_0(W) - \mu_0(W)\omega_1(W)}{\mu_1(W) - \mu_0(W)} \,\Big|\, W = w\right] = 0,$$

where $\nu_j(w) = E(Y^2 \mid X = j, W = w)$ and $\omega_j(w) = E(Y^3 \mid X = j, W = w)$.

See the Appendix, particularly equation (A.6). We might then apply Ai and Chen (2003) to these conditional moments to obtain sieve estimates of $m_j(w)$, $\mu_j(w)$, $\nu_j(w)$, and $\omega_j(w)$. Alternatively, the local GMM estimator of Lewbel (2007b) could be employed. If $w$ is discrete or empty, or if these functions of $w$ are finitely parameterized, then these estimators could be reduced to ordinary GMM.

References

[1] Ai, C. and X. Chen (2003), "Efficient Estimation of Models With Conditional Moment Restrictions Containing Unknown Functions," Econometrica, 71, 1795-1844.

[2] Bound, J., C. Brown, and N. Mathiowetz (2001), "Measurement Error in Survey Data," in Handbook of Econometrics, vol. 5, ed. by J. J. Heckman and E. Leamer, Elsevier Science.

[3] Carroll, R., D. Ruppert, L. Stefanski, and C. Crainiceanu (2006), Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition. Chapman & Hall/CRC, New York.

[4] Chen, X., H. Hong, and D. Nekipelov (2007), "Measurement Error Models," survey prepared for the Journal of Economic Literature.

[5] Chen, X., Y. Hu, and A. Lewbel (2007), "Nonparametric Identification and Estimation of Nonclassical Errors-in-Variables Models Without Additional Information," unpublished manuscript.

[6] Hu, Y. (2006), "Identification and Estimation of Nonlinear Models with Misclassification Error Using Instrumental Variables," working paper, University of Texas at Austin.

[7] Kane, T. J., C. E. Rouse, and D. Staiger (1999), "Estimating Returns to Schooling When Schooling is Misreported," NBER working paper #7235.

[8] Lewbel, A. (1997), "Constructing Instruments for Regressions With Measurement Error When No Additional Data are Available, With an Application to Patents and R&D," Econometrica, 65, 1201-1213.

[9] Lewbel, A. (2007a), "Estimation of Average Treatment Effects with Misclassification," Econometrica, 75, 537-551.

[10] Lewbel, A. (2007b), "A Local Generalized Method of Moments Estimator," Economics Letters, 94, 124-128.

[11] Li, T. (2002), "Robust and Consistent Estimation of Nonlinear Errors-in-Variables Models," Journal of Econometrics, 110, 1-26.

[12] Mahajan, A. (2006), "Identification and Estimation of Regression Models with Misclassification," Econometrica, 74, 631-665.

[13] Schennach, S. (2004), "Estimation of Nonlinear Models with Measurement Error," Econometrica, 72, 33-75.

4 Appendix

Proof. (Theorem 2.1) First, we introduce notation as follows: for $j = 0, 1$, let $m_j(w) = m(j, w)$, $\mu_j(w) = E(Y \mid X = j, W = w)$, $p(w) = f_{X^*|X,W}(1 \mid 0, w)$, $q(w) = f_{X^*|X,W}(0 \mid 1, w)$, $\nu_j(w) = E(Y^2 \mid X = j, W = w)$, and $\omega_j(w) = E(Y^3 \mid X = j, W = w)$.

We start the proof with equation (2.3), which is equivalent to

$$\begin{pmatrix} f_{Y|X,W}(y \mid 0, w) \\ f_{Y|X,W}(y \mid 1, w) \end{pmatrix} = \begin{pmatrix} 1 - p(w) & p(w) \\ q(w) & 1 - q(w) \end{pmatrix} \begin{pmatrix} f_{\varepsilon|X^*,W}(y - m_0(w) \mid 0, w) \\ f_{\varepsilon|X^*,W}(y - m_1(w) \mid 1, w) \end{pmatrix}.$$

Since $f_{\varepsilon|X^*,W}$ has zero mean (equation (2.1)), we have

$$\mu_0(w) = (1 - p(w))\, m_0(w) + p(w)\, m_1(w), \qquad \mu_1(w) = q(w)\, m_0(w) + (1 - q(w))\, m_1(w). \tag{A.1}$$

By Assumption 2.2, we may solve for $p(w)$ and $q(w)$ as follows:

$$p(w) = \frac{\mu_0(w) - m_0(w)}{m_1(w) - m_0(w)} \quad \text{and} \quad q(w) = \frac{m_1(w) - \mu_1(w)}{m_1(w) - m_0(w)}. \tag{A.2}$$

We also have

$$1 - p(w) - q(w) = \frac{\mu_1(w) - \mu_0(w)}{m_1(w) - m_0(w)}.$$

As discussed before, Assumption 2.3 implies that $m_1(w) \geq \mu_1(w) > \mu_0(w) \geq m_0(w)$ and

$$\begin{pmatrix} f_{\varepsilon|X^*,W}(y - m_0(w) \mid 0, w) \\ f_{\varepsilon|X^*,W}(y - m_1(w) \mid 1, w) \end{pmatrix} = \frac{1}{1 - p(w) - q(w)} \begin{pmatrix} 1 - q(w) & -p(w) \\ -q(w) & 1 - p(w) \end{pmatrix} \begin{pmatrix} f_{Y|X,W}(y \mid 0, w) \\ f_{Y|X,W}(y \mid 1, w) \end{pmatrix}.$$

Plugging in the expressions for $p(w)$ and $q(w)$ in equation (A.2), we have

$$f_{\varepsilon|X^*,W}(y - m_j(w) \mid j, w) = \frac{\mu_1(w) - m_j(w)}{\mu_1(w) - \mu_0(w)}\, f_{Y|X,W}(y \mid 0, w) + \frac{m_j(w) - \mu_0(w)}{\mu_1(w) - \mu_0(w)}\, f_{Y|X,W}(y \mid 1, w). \tag{A.3}$$

Let $\phi$ denote the characteristic function, with $\phi_{\varepsilon|X^*=j,w}(t) = \int e^{it\varepsilon} f_{\varepsilon|X^*,W}(\varepsilon \mid j, w)\, d\varepsilon$ and $\phi_{Y|X=j,w}(t) = \int e^{ity} f_{Y|X,W}(y \mid j, w)\, dy$. Equation (A.3) implies that for any real $t$,

$$e^{itm_j(w)}\, \phi_{\varepsilon|X^*=j,w}(t) = \frac{\mu_1(w) - m_j(w)}{\mu_1(w) - \mu_0(w)}\, \phi_{Y|X=0,w}(t) + \frac{m_j(w) - \mu_0(w)}{\mu_1(w) - \mu_0(w)}\, \phi_{Y|X=1,w}(t).$$

We then consider the log transform

$$\ln\left[e^{itm_j(w)}\, \phi_{\varepsilon|X^*=j,w}(t)\right] = \ln\left[\frac{\mu_1(w) - m_j(w)}{\mu_1(w) - \mu_0(w)}\, \phi_{Y|X=0,w}(t) + \frac{m_j(w) - \mu_0(w)}{\mu_1(w) - \mu_0(w)}\, \phi_{Y|X=1,w}(t)\right]. \tag{A.4}$$

Assumption 2.4 implies that for $j = 0, 1$,

$$0 = i\, \frac{\partial^3}{\partial t^3} \ln\left[\frac{\mu_1(w) - m_j(w)}{\mu_1(w) - \mu_0(w)}\, \phi_{Y|X=0,w}(t) + \frac{m_j(w) - \mu_0(w)}{\mu_1(w) - \mu_0(w)}\, \phi_{Y|X=1,w}(t)\right]\Big|_{t=0}. \tag{A.5}$$

When $t = 0$, we have $\phi_{Y|X=j,w}(0) = 1$, $\frac{\partial}{\partial t}\phi_{Y|X=j,w}(0) = i\mu_j$, $\frac{\partial^2}{\partial t^2}\phi_{Y|X=j,w}(0) = -\nu_j$, and $\frac{\partial^3}{\partial t^3}\phi_{Y|X=j,w}(0) = -i\omega_j$. Furthermore, equation (A.5) implies $G(m_j(w)) = 0$, where

$$G(z) \equiv 2z^3 - \frac{3(\nu_1(w) - \nu_0(w))}{\mu_1(w) - \mu_0(w)}\, z^2 + \frac{3\mu_0(w)\nu_1(w) - 3\mu_1(w)\nu_0(w) + \omega_1(w) - \omega_0(w)}{\mu_1(w) - \mu_0(w)}\, z + \frac{\mu_1(w)\omega_0(w) - \mu_0(w)\omega_1(w)}{\mu_1(w) - \mu_0(w)}. \tag{A.6}$$

This cubic equation has two real roots, $m_0(w)$ and $m_1(w)$, and has all real coefficients; therefore its third root is also real. Suppose the three roots are $r_a(w) \leq r_b(w) \leq r_c(w)$ for each given $w$. Since $m_0(w) \neq m_1(w)$, we will never have $r_a(w) = r_b(w) = r_c(w)$. If the second largest of the three roots is between $\mu_0(w)$ and $\mu_1(w)$, i.e., $\mu_0(w) < r_b(w) < \mu_1(w)$, then we know the largest root $r_c(w)$ equals $m_1(w)$ and the smallest root $r_a(w)$ equals $m_0(w)$, because $m_1(w) \geq \mu_1(w) > \mu_0(w) \geq m_0(w)$. Given the shape of the cubic function, we know

$$G(z) \begin{cases} < 0 & \text{if } z < r_a(w) \\ > 0 & \text{if } r_a(w) < z < r_b(w) \\ < 0 & \text{if } r_b(w) < z < r_c(w) \\ > 0 & \text{if } r_c(w) < z. \end{cases}$$

That means the second largest of the three roots, $r_b(w)$, is between $\mu_0(w)$ and $\mu_1(w)$ if $G(\mu_1(w)) < 0$ and $G(\mu_0(w)) > 0$. It is tedious but straightforward to show that $G(\mu_1(w)) = E[(Y - \mu_1(w))^3 \mid X = 1, W = w]$ and $G(\mu_0(w)) = E[(Y - \mu_0(w))^3 \mid X = 0, W = w]$. Therefore, Assumption 2.5 implies that $G(\mu_1(w)) \leq 0$ and $G(\mu_0(w)) \geq 0$. Given the graph of the cubic function $G(\cdot)$, if $G(\mu_1(w)) < 0$, then $\mu_1(w) \leq m_1(w)$ implies that $m_1(w)$ equals the largest of the three roots. When $G(\mu_0(w)) > 0$, then $m_0(w) \leq \mu_0(w)$ implies that $m_0(w)$ equals the smallest of the three roots.
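The "tedious but straightforward" algebra can be delegated to a computer algebra system. The check below takes the cumulant expansion $G(z) = 2z^3 - 3z\bar\nu(z) + \bar\omega(z)$, where $\bar\nu(z)$ and $\bar\omega(z)$ are the second and third moments of the $z$-weighted mixture in (A.5) (the cumulant step itself is taken from the derivation above, not re-derived here), and verifies symbolically that it expands to exactly the coefficients in (A.6) and that $G(\mu_j(w))$ equals the centered third moment $E[(Y - \mu_j(w))^3 \mid X = j, W = w]$.

```python
import sympy as sp

z, mu0, mu1, nu0, nu1, om0, om1 = sp.symbols('z mu0 mu1 nu0 nu1 om0 om1')

w0 = (mu1 - z) / (mu1 - mu0)   # mixture weight on the X = 0 moments
w1 = (z - mu0) / (mu1 - mu0)   # mixture weight on the X = 1 moments
nubar = w0 * nu0 + w1 * nu1    # second moment of the z-mixture
obar = w0 * om0 + w1 * om1     # third moment of the z-mixture

# G via the cumulant expansion, versus the displayed coefficients of (A.6)
G_cumulant = 2 * z**3 - 3 * z * nubar + obar
G_A6 = (2 * z**3
        - 3 * (nu1 - nu0) / (mu1 - mu0) * z**2
        + (3 * mu0 * nu1 - 3 * mu1 * nu0 + om1 - om0) / (mu1 - mu0) * z
        + (mu1 * om0 - mu0 * om1) / (mu1 - mu0))
assert sp.simplify(G_cumulant - G_A6) == 0

# G(mu_j) equals the centered third moment om_j - 3 mu_j nu_j + 2 mu_j^3
assert sp.simplify(G_cumulant.subs(z, mu0) - (om0 - 3 * mu0 * nu0 + 2 * mu0**3)) == 0
assert sp.simplify(G_cumulant.subs(z, mu1) - (om1 - 3 * mu1 * nu1 + 2 * mu1**3)) == 0
```

At $z = \mu_0$ the mixture weights collapse to $(1, 0)$, which is why $G(\mu_0(w))$ reduces to the observable skewness-type moment of the $X = 0$ subsample, and symmetrically at $z = \mu_1$.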

In case $E[(Y - \mu_1(w))^3 \mid X = 1, W = w] = 0$, i.e., $G(\mu_1(w)) = 0$, $\mu_1(w)$ is a root of $G(\cdot)$. Given the graph of the cubic function $G(\cdot)$, we know

$$\frac{dG(z)}{dz} \begin{cases} \geq 0 & \text{at } z = r_a(w) \\ \leq 0 & \text{at } z = r_b(w) \\ \geq 0 & \text{at } z = r_c(w). \end{cases}$$

That means the condition $\frac{dG(z)}{dz}\big|_{z = \mu_1(w)} > 0$ guarantees that $\mu_1(w)$ is the largest root and equal to $m_1(w)$. If $E[(Y - \mu_0(w))^3 \mid X = 0, W = w] = 0$, i.e., $G(\mu_0(w)) = 0$, then $\mu_0(w)$ is a root of $G(\cdot)$. The condition $\frac{dG(z)}{dz}\big|_{z = \mu_0(w)} > 0$ guarantees that $\mu_0(w)$ is the smallest root and equal to $m_0(w)$. In summary, Assumption 2.5 guarantees that $m_0(w)$ and $m_1(w)$ can be identified out of the three directly estimable roots.

After we have identified $m_0(w)$ and $m_1(w)$, $p(w)$ and $q(w)$ (or $f_{X^*|X,W}$) are identified from equation (A.2), and the density $f_{\varepsilon|X^*,W}$ (or $f_{Y|X^*,W}$) is also identified from equation (A.3). Since $X$ and $W$ are observed in the data, identification of $f_{X^*|X,W}$ implies that of $f_{X^*,X,W}$. Thus, we have identified the latent densities $f_{Y|X^*,W}$ and $f_{X^*,X,W}$ from the observed density $f_{Y,X,W}$ under Assumptions 2.1-2.5.