Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity


Zhengyu Zhang
School of Economics, Shanghai University of Finance and Economics
zy.zhang@mail.shufe.edu.cn
This Version: April 2013

Abstract

This article introduces a new identification and estimation strategy for partially linear regression models with a general form of unknown heteroscedasticity, i.e., Y* = X′β₀ + m(Z) + U and U = σ(X, Z)ε, where ε is independent of (X, Z) and the functional forms of both m(·) and σ(·) are left unspecified. We show that in such a model, β₀ and m(·) can be exactly identified, while the scale function σ(·) can be identified up to scale, as long as σ(X, Z) permits sufficient nonlinearity in X. A two-stage estimation procedure motivated by the identification strategy is described and its large sample properties are formally established. Moreover, our strategy is flexible enough to allow for various degrees of censoring in the dependent variable. Simulation results show that the proposed estimator performs reasonably well in finite samples.

JEL Classification: C13; C14; C24
Keywords: Censored regression model; Partially linear regression; Heteroscedasticity; Quantile regression

1 Introduction

Regression models with a partially linear structure have received a great deal of attention in both the econometrics and statistics literature. Their popularity stems from a flexible specification, which allows some variables to be linearly related to the response variable without imposing stringent restrictions on variables whose relationship to the response may be difficult to parameterize. On the other hand, while fully nonparametric modeling could reduce the possibility of misspecification with greater robustness, the curse of dimensionality sets in as the number of independent variables increases. As an intermediate strategy, the partially linear specification mitigates the curse of dimensionality by allowing flexibility with respect to a small number of covariates while restricting other covariates to enter through a linear-index structure. A baseline partially linear regression model can be expressed as

Y* = X′β₀ + m(Z) + U, (1.1)

where Y* is the (latent) scalar dependent variable, (X, Z) ∈ R^{p_X + p_Z} is the vector of covariates, β₀ is a p_X × 1 vector to be estimated, the functional form of m(·) is unknown, and U is the unobserved error term. The literature on estimation of (1.1) has been rather extensive. Several approaches have been proposed, including Powell's (1987) pairwise difference estimator, Robinson's (1988) partitioned regression estimator, and Lee's (2003) average quantile regression estimator, to name only a few.¹ Despite the abundance of this literature, work on limited dependent variable versions of this model has been comparatively limited. For the case in which the error distribution is parametrically specified, Honoré and Powell (2005) and Aradillas-Lopez et al. (2007) showed how the pairwise difference approach can be extended to partially linear versions of several limited dependent variable models (e.g., logit, censored, and Poisson regression models).
Similarly, Ai and McFadden (1997) considered the estimation of a wide class of latent partially linear models including the censored regression model, but they imposed a parametric form on the error distribution, resulting in inconsistent estimates if that distribution is misspecified. For limited-dependent-variable partially linear models, there are relatively few estimators for the case in which the errors are not parametrically specified. Chen and Khan (2001) proposed an estimation procedure for a partially linear censored regression model under a conditional quantile restriction. Although quantile regression is flexible enough to accommodate conditional heteroscedasticity, the form of conditional heteroscedasticity it can accommodate is rather limited if β₀ is to be identified. More recently, Abrevaya and Shin (2011) provided a rank-based estimation methodology for a general partially linear regression model in which no parametric assumptions are made on the error disturbance.

¹ These estimators, however, are difficult to extend to their limited dependent variable versions. See the discussion below.

However, their rank-based estimator requires independence between the error term and the regressors, and thus cannot accommodate heteroscedastic errors. In light of these studies, in this paper we consider the identification and estimation of a partially linear censored regression model that features multiplicative heteroscedasticity of unknown functional form. That is, the model is specified by (1.1) with the observed dependent variable Y generated by

Y = max{0, Y*},² (1.2)

and the disturbance U represented as the product of a nonparametrically specified scale function of the regressors and a homoscedastic error ε,

U = σ(X, Z)ε. (1.3)

We impose no parametric or location restriction on the distribution of ε beyond requiring that it is independent of (X, Z) and has a continuous distribution with a density that is bounded, positive, and continuous on R. Moreover, we impose no parametric restriction on the functional form of σ(·) beyond requiring that it is positive and satisfies certain smoothness properties, as detailed in the next section.³,⁴

Broadly speaking, the general form of heteroscedasticity specified by (1.3) has posed substantial difficulty for model identification and estimation. For identification, it is straightforward to see that the regressor X should not contain a constant term. Moreover, it can be argued that both β₀ and m(·) generally cannot be identified if no functional restriction is imposed on σ(·). To see this, assume σ(X, Z) = X′γ₀ + σ₁(Z) with σ₁(·) left unspecified. Then the model (1.1) is observationally equivalent to the model Y* = X′(β₀ + γ₀) + (m(Z) + σ₁(Z)) + Ũ with Ũ = (X′γ₀ + σ₁(Z))(ε − 1). The major insight of this paper is that for a partially linear censored regression model with a general form of heteroscedasticity like (1.1)-(1.3), both β₀ and m(·) can be exactly identified if there exists sufficient nonlinearity in σ(X, Z) with respect to X.
The basic intuition behind this result can be understood by assuming σ(X, Z) is additively separable, i.e., σ(X, Z) = σ₁(Z) + σ₂(X).⁵

² The assumption that the dependent variable is censored from below at zero by (1.2) is made for convenience. Our identification and estimation strategy extends, with slight modification, to any two-sided fixed censoring, i.e., Y = c₋I(Y* ≤ c₋) + Y*I(c₋ < Y* < c₊) + c₊I(Y* ≥ c₊) with c₋ < c₊.
³ Our identification strategy for β₀ requires only a weak condition on σ(·). See the discussion below; also see Assumption 2.1 in Section 2.
⁴ Our model (1.1)-(1.3) can also be regarded as arising from a heteroscedastic sample selection model in which m(Z) corrects for sample selection bias related to a set of regressors Z. For such a specification, see Ahn and Powell (1993).
⁵ This additive separability is assumed here for heuristic illustration but is not formally required for our identification strategy.

If so, then

the model (1.1) is observationally distinguishable from Y* = X′β₀ + aσ₂(X) + (m(Z) + aσ₁(Z)) + Ũ with Ũ = (σ₂(X) + σ₁(Z))(ε − a) for any a ≠ 0, as long as σ₂(X) contains nonlinear terms in X beyond the linear one. Formally, our identification strategy exploits variation in conditional quantile functions of the dependent variable evaluated at different quantile indices, an idea used by Chen and Khan (2000) for the linear censored regression model. Extending this idea to partially linear censored regression models complicates the analysis in that there now exist two nuisance nonlinear components, m(·) and σ(·), both of which must be eliminated successively in order to identify β₀, whereas the linear model contains only the single nuisance component σ(·). This feature makes the resulting two-stage estimation procedure look somewhat like a double pairwise difference estimator, with a limiting distribution more complex than in the linear model.

The rest of the paper is structured as follows. Our identification strategy for β₀ and m(·) under a weak condition on σ(·) is motivated and described in Section 2. Estimation of these parameters is considered in Section 3. Monte Carlo simulation results are reported in Section 4. Section 5 concludes, and all technical details are collected in the Appendix.

2 Identification

We first consider the identification of β₀. Identification of the other two infinite-dimensional parameters, m(·) and σ(·), will be considered later. To fix ideas, first consider the uncensored model in which the dependent variable Y* in (1.1) is directly observed by the researcher. The extension to the censored case (1.2) will be discussed later.
Our general idea for identifying β₀ is to eliminate the two nuisance nonlinear components in the regression model, m(·) and σ(·), successively by exploiting variation in q_Y*^τ(x, z), the quantile functions of Y* conditional on X and Z, which satisfy

P(Y* ≤ q_Y*^τ(x, z) | X = x, Z = z) = τ (2.1)

for any x ∈ X, z ∈ Z and τ ∈ (0, 1), where X ⊆ R^{p_X} and Z ⊆ R^{p_Z} are the supports of X and Z, respectively. It follows from (1.1), (1.3) and (2.1) that

q_Y*^τ(X_i, Z_i) = X_i′β₀ + m(Z_i) + σ(X_i, Z_i) q_ε^τ (2.2)

and

q_Y*^τ(X_j, Z_i) = X_j′β₀ + m(Z_i) + σ(X_j, Z_i) q_ε^τ (2.3)

for any pair of indices i, j = 1, …, n, i ≠ j, and 0 < τ < 1, where q_ε^τ is the τ-th quantile of ε. Denote q_ii^τ = q_Y*^τ(X_i, Z_i), q_ji^τ = q_Y*^τ(X_j, Z_i), Δq^τ = q_ii^τ − q_ji^τ, σ_ii = σ(X_i, Z_i) and σ_ji = σ(X_j, Z_i). Taking the difference between (2.2) and (2.3) thus eliminates m(Z_i) and gives

Δq^τ = (X_i − X_j)′β₀ + (σ_ii − σ_ji) q_ε^τ. (2.4)

To identify β₀, we then need to eliminate the remaining nuisance component σ_ii − σ_ji in (2.4). Following the idea of Chen and Khan (2000), we do this by exploiting the variation of (2.4) with respect to the quantile index τ. That is, for any two different quantile indices 0 < τ1 < τ2 < 1, we have

Δq^{τ1} = (X_i − X_j)′β₀ + (σ_ii − σ_ji) q_ε^{τ1} (2.5)

Δq^{τ2} = (X_i − X_j)′β₀ + (σ_ii − σ_ji) q_ε^{τ2}, (2.6)

and solving out σ_ii − σ_ji by pairwise differencing gives

σ_ii − σ_ji = (Δq^{τ1} − Δq^{τ2}) / (q_ε^{τ1} − q_ε^{τ2}). (2.7)

Combining (2.5)-(2.7), we arrive at the following equation, which serves as the basis for our identification strategy:

Δq^{τ1τ2} = (X_i − X_j)′β₀ + (Δq^{τ1} − Δq^{τ2}) α₀^{τ1τ2}, (2.8)

where Δq^{τ1τ2} = (Δq^{τ1} + Δq^{τ2})/2 and

α₀^{τ1τ2} = (1/2) (q_ε^{τ1} + q_ε^{τ2}) / (q_ε^{τ1} − q_ε^{τ2}). (2.9)

Notice that for any given 0 < τ1, τ2 < 1, it follows from (2.7) that (Δq^{τ1} − Δq^{τ2}) is proportional to σ(X_i, Z_i) − σ(X_j, Z_i). Thus β₀ is identified from the regression (2.8) if the following assumption is satisfied:

Assumption 2.1. The following (p_X + 1) × (p_X + 1) matrix has full rank:

V = E[ ((X₁ − X₂)′, σ(X₁, Z₁) − σ(X₂, Z₁))′ ((X₁ − X₂)′, σ(X₁, Z₁) − σ(X₂, Z₁)) ].

Put differently, Assumption 2.1 says that β₀ is identified if there is no multicollinearity in the generated regressor vector ((X₁ − X₂)′, σ(X₁, Z₁) − σ(X₂, Z₁)), which amounts to the existence of sufficient nonlinearity in σ(X, Z) with respect to X.

Remark 2.1. This identification condition fails if, e.g., σ(X, Z) depends on X only, i.e., σ(X, Z) = σ(X), and σ(X) is linear in X. However, this is unlikely to occur, since a

function that is linear in X is not a regular specification for a scale function σ(X) that is supposed to be positive on the support of X.⁶

Remark 2.2. One should not expect to achieve an efficiency gain by including additional (Δq^{τ1} − Δq^{τ2})'s with different pairs (τ1, τ2) in the regression (2.8), because all of these differences are proportional to σ(X₁, Z₁) − σ(X₂, Z₁), which does not depend on (τ1, τ2).

Remark 2.3. It is instructive to compare our identification condition with that of Chen and Khan (2000) for the linear censored regression model. In their model, the sufficient condition for identification is that E[(X′, σ(X))′(X′, σ(X))] has full rank (their Assumption FR), which differs from Assumption 2.1 in that ours accounts for the partially linear structure of the model.

We show below that the identification strategy introduced above is flexible enough to deal with the censored model, that is, the case in which Y* cannot be observed and only the censored variable Y is observed, according to (1.2). In this case, the key observation that makes our identification strategy still work is the well-known equivariance property of conditional quantiles, which has been exploited by Powell (1984, 1986) to deal with censored linear quantile regression models. The property says that the τ-th conditional quantile of the observed response variable Y is given by

q_Y^τ(·) = max{ q_Y*^τ(·), 0 }, (2.10)

implying that q_Y^τ(·) is as informative as q_Y*^τ(·) whenever q_Y*^τ(·) > 0, in which case q_Y^τ(·) = q_Y*^τ(·). Based on this observation, it is straightforward to see that the regression equation (2.8), which lays the foundation for our identification strategy, still holds conditional on the events q_Y*^{τ1}(X_i, Z_i) > 0 and q_Y*^{τ1}(X_j, Z_i) > 0.⁷ We end this section by summarizing the above reasoning in the following sufficient condition for identification of β₀.

Assumption 2.2.
The following (p_X + 1) × (p_X + 1) matrix has full rank:

V = E[ I(q_Y*^{τ1}(X₁, Z₁) > 0) I(q_Y*^{τ1}(X₂, Z₁) > 0) ((X₁ − X₂)′, σ(X₁, Z₁) − σ(X₂, Z₁))′ ((X₁ − X₂)′, σ(X₁, Z₁) − σ(X₂, Z₁)) ].

Next we consider identification of the nonlinear component m(·). Fixing z ∈ Z, we have from (2.2) that

q_Y*^τ(X_i, z) = X_i′β₀ + m(z) + σ(X_i, z) q_ε^τ (2.11)

for i = 1, …, n. If σ(·) were known, it is straightforward to see that we could identify m(z) by regressing the [q_Y*^τ(X_i, z) − X_i′β₀]'s on the (1, σ(X_i, z))'s, given that β₀ is identified. Apparently,

⁶ One oft-touted specification for σ(X) is the exponential function σ(X) = exp(X′γ₀).
⁷ The monotonicity of the conditional quantile function with respect to the quantile index implies that q_Y*^{τ2}(X_i, Z_i) > 0 and q_Y*^{τ2}(X_j, Z_i) > 0 as long as τ2 > τ1, q_Y*^{τ1}(X_i, Z_i) > 0 and q_Y*^{τ1}(X_j, Z_i) > 0.

this seems infeasible, since σ(·) is not known for now. However, we can still solve out σ(X_i, z) by pairwise comparison of (2.11) evaluated at two different quantile indices 0 < τ1 < τ2 < 1, i.e.,

q_Y*^{τ1}(X_i, z) = X_i′β₀ + m(z) + σ(X_i, z) q_ε^{τ1}, (2.12)

q_Y*^{τ2}(X_i, z) = X_i′β₀ + m(z) + σ(X_i, z) q_ε^{τ2}, (2.13)

which gives

σ(X_i, z) = (q_Y*^{τ1}(X_i, z) − q_Y*^{τ2}(X_i, z)) / (q_ε^{τ1} − q_ε^{τ2}). (2.14)

Combining (2.12)-(2.14), we arrive at

m(z) = q_Y*^{τ1τ2}(X_i, z) − X_i′β₀ − (q_Y*^{τ1}(X_i, z) − q_Y*^{τ2}(X_i, z)) α₀^{τ1τ2}, (2.15)

where q_Y*^{τ1τ2}(X_i, z) = (q_Y*^{τ1}(X_i, z) + q_Y*^{τ2}(X_i, z))/2 and α₀^{τ1τ2} is the same as in (2.9). Since each component on the right-hand side of (2.15) either can be or has been identified in the preceding steps, m(z) is also identified. In the presence of censoring in Y*, the identification equation (2.15) still works, since all the conditional quantile functions involved in (2.15) are identified conditional on the event that they are positive, by reference to (2.10).

Given the identification of β₀ and m(·), it is straightforward to see that σ(x, z) is also identified up to scale for any (x, z) ∈ X × Z. That is, under the normalization q_ε^τ = 1 for a sufficiently large τ,

σ(x, z) = q_Y*^τ(x, z) − x′β₀ − m(z). (2.16)

3 Estimation

3.1 A two-stage estimator

To estimate β₀, the identification strategy introduced in the preceding section can be translated into the following two-stage procedure. The first stage nonparametrically fits the conditional quantile function q_Y^τ(x, z) at various in-sample and out-of-sample points for τ = τ1, τ2. These nonparametric estimates are then used to construct the generated dependent variables and regressors in the regression equation (2.8), i.e.,

Δq̂_ij^{τ1} = q̂_Y^{τ1}(X_i, Z_i) − q̂_Y^{τ1}(X_j, Z_i),
Δq̂_ij^{τ2} = q̂_Y^{τ2}(X_i, Z_i) − q̂_Y^{τ2}(X_j, Z_i),
Δq̂_ij^{τ1τ2} = (Δq̂_ij^{τ1} + Δq̂_ij^{τ2}) / 2.
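Before turning to the second stage, the identification equation (2.8) underlying these generated variables can be checked numerically. The sketch below is illustrative and not from the paper: it replaces first-stage estimates with exact conditional quantiles, and the choices σ(x, z) = exp(x) + z² + 0.5, m(z) = sin(z), ε ~ N(0, 1), and (τ1, τ2) = (0.3, 0.8) are toy assumptions. Because the σ chosen is nonlinear in x (Assumption 2.1 holds), least squares on (2.8) across pairs recovers β₀ and α₀^{τ1τ2} exactly.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
beta0 = 1.0                                   # true coefficient on a scalar X
m = lambda z: np.sin(z)                       # nuisance component m(.)
sigma = lambda x, z: np.exp(x) + z**2 + 0.5   # scale function, nonlinear in x

tau1, tau2 = 0.3, 0.8
q1, q2 = norm.ppf(tau1), norm.ppf(tau2)       # quantiles of eps ~ N(0, 1)

# exact conditional quantile: q_Y*^tau(x, z) = x*beta0 + m(z) + sigma(x, z)*q_eps^tau
def qY(x, z, q_eps):
    return x * beta0 + m(z) + sigma(x, z) * q_eps

n = 100
X, Z = rng.normal(size=n), rng.normal(size=n)
ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
keep = ii != jj
xi, xj, zi = X[ii[keep]], X[jj[keep]], Z[ii[keep]]

dq1 = qY(xi, zi, q1) - qY(xj, zi, q1)   # eq. (2.4) at tau1: m(Z_i) drops out
dq2 = qY(xi, zi, q2) - qY(xj, zi, q2)   # eq. (2.4) at tau2
lhs = (dq1 + dq2) / 2                   # left-hand side of eq. (2.8)
R = np.column_stack([xi - xj, dq1 - dq2])
theta, *_ = np.linalg.lstsq(R, lhs, rcond=None)
beta_hat, alpha_hat = theta
alpha0 = 0.5 * (q1 + q2) / (q1 - q2)    # eq. (2.9)
```

Because (2.8) holds exactly at the population quantiles, `beta_hat` and `alpha_hat` match β₀ and α₀^{τ1τ2} up to machine precision; sampling error enters only once the quantiles are replaced by first-stage estimates.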

In the second stage, we regress the Δq̂_ij^{τ1τ2}'s on the (X_i − X_j)'s and the (Δq̂_ij^{τ1} − Δq̂_ij^{τ2})'s for i, j = 1, …, n, i ≠ j, conditional on q̂_Y^{τ1}(X_i, Z_i) > 0 and q̂_Y^{τ1}(X_j, Z_i) > 0. The estimated coefficient on (X_i − X_j) is then the estimator of β₀. Before proceeding to a discussion of the asymptotic properties of these estimators, we first discuss each of the two stages in greater detail: the nonparametric estimation procedure adopted in the first stage, and some technical complications associated with the second-stage ordinary least squares.

3.1.1 Stage 1: local polynomial quantile regression

The first stage of our estimation procedure involves nonparametrically estimating the conditional quantiles of Y given the regressors W = (X, Z). While there are numerous methods for estimating conditional regression quantiles in the statistics and econometrics literature, we use the local polynomial regression estimator introduced by Chaudhuri (1991a,b). This estimation procedure is computationally simple, as it involves minimization of a globally convex objective function which can be handled using linear programming methods. Moreover, it allows simple control of the order of the bias through appropriate selection of the polynomial order. This also parallels other two-stage procedures that have used local polynomial quantile regression in their first stage, e.g., Lee (2003) and Chen and Khan (2000).

We assume that the regressor vector W_i = (X_i, Z_i) can be partitioned into (W_i^d, W_i^c), where W_i^d and W_i^c are the p_{W^d}-dimensional discrete and p_{W^c}-dimensional continuously distributed components of W_i, respectively, with p_{W^d} = p_{X^d} + p_{Z^d} and p_{W^c} = p_{X^c} + p_{Z^c}. Let C_n(x, z) denote the p_W = p_X + p_Z dimensional cube centered at (x, z) associated with a bandwidth h. By W_j ∈ C_n(x, z) we mean that W_j^d = w^d = (x^d, z^d) and that W_j^c = (X_j^c, Z_j^c) lies in the p_{W^c}-dimensional cube centered at w^c with side length 2h.
Let d denote the order of differentiability of the quantile functions with respect to w^c = (x^c, z^c). Define A(d) = {(α_1, …, α_{p_{W^c}}) : Σ_{l=1}^{p_{W^c}} α_l ≤ d}, where each α_l is a non-negative integer. Let #A(d) be the number of elements in A(d) and, for convenience, let the first element of A(d) be (0, …, 0). Then for s ≤ d, the local polynomial fitting of q_Y^τ(x, z) amounts to solving the minimization problem

b̂_n = argmin_b Σ_{k=1}^n I((X_k, Z_k) ∈ C_n(x, z)) ρ_τ( Y_k − Σ_{l=1}^{#A(s)} b_l (W_k^c − w^c)^{α_l} ),

where b = (b_1, …, b_{#A(s)})′, (W_k^c − w^c)^{α_l} is shorthand for the product of the components of (W_k^c − w^c), each raised to the corresponding component of α_l ∈ A(s), and ρ_τ(u) = u(τ − I(u ≤ 0)) is the check function. Among the #A(s) components of b̂_n, it is the coefficient on the constant term, b̂_1, i.e., the first component of b̂_n, that estimates the conditional quantile function q_Y^τ(x, z); the other components are regarded as nuisance parameters.
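The minimization above is a standard check-function problem and, as noted, can be cast as a linear program. The following sketch (illustrative; it assumes a scalar continuous regressor and a local linear fit, s = 1, and uses scipy's `linprog` rather than any specific solver from the literature) shows the LP formulation: the residual is split into positive and negative parts u⁺, u⁻ with weights τ and 1 − τ.

```python
import numpy as np
from scipy.optimize import linprog

def local_linear_cq(y, w, w0, tau, h):
    """Local linear (s = 1) quantile fit at w0 for a scalar continuous w:
    minimize sum over |w_k - w0| <= h of rho_tau(y_k - b1 - b2*(w_k - w0)),
    written as an LP by splitting each residual into u_plus - u_minus."""
    sel = np.abs(w - w0) <= h                  # observations in the cube C_n(w0)
    yk, dk = y[sel], (w - w0)[sel]
    nk = yk.size
    D = np.column_stack([np.ones(nk), dk])     # local polynomial design matrix
    # decision variables: [b1, b2, u_plus (nk entries), u_minus (nk entries)]
    c = np.concatenate([np.zeros(2), tau * np.ones(nk), (1 - tau) * np.ones(nk)])
    A_eq = np.hstack([D, np.eye(nk), -np.eye(nk)])   # D b + u+ - u- = y
    bounds = [(None, None)] * 2 + [(0, None)] * (2 * nk)
    res = linprog(c, A_eq=A_eq, b_eq=yk, bounds=bounds, method="highs")
    return res.x[0]                            # b1 estimates q_Y^tau(w0)

# sanity check on noise-free data, where q_Y^tau(w) = 1 + 2w for every tau
rng = np.random.default_rng(1)
w = rng.uniform(-1.0, 1.0, size=300)
y = 1.0 + 2.0 * w
q_hat = local_linear_cq(y, w, w0=0.2, tau=0.5, h=0.3)   # close to 1.4
```

On noise-free linear data the check loss can be driven to zero, so the intercept recovers the conditional quantile exactly; with noisy data the same program returns the local quantile fit.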

3.1.2 Stage 2: weighted least squares

The second stage of our estimator treats the values estimated in the first stage as raw data and adopts a weighted least-squares device to estimate β₀, which amounts to the minimization problem

(β̂_n, α̂_n) = argmin_{β,α} Σ_{i≠j} [ Δq̂_ij^{τ1τ2} − (X_i − X_j)′β − (Δq̂_ij^{τ1} − Δq̂_ij^{τ2})α ]² ω̂_ii ω̂_ji,

where ω̂_ii = ω(q̂_Y^{τ1}(X_i, Z_i)) and ω̂_ji = ω(q̂_Y^{τ1}(X_j, Z_i)), with the trimming function ω(q) = I(q > 0) ensuring that individual summands for which q̂_Y^{τ1}(X_i, Z_i) ≤ 0 or q̂_Y^{τ1}(X_j, Z_i) ≤ 0 do not contribute to the weighted least squares. Moreover, instead of the zero-one trimming function, one may follow Chen and Khan (2000) (also Buchinsky and Hahn (1998)) and adopt a smoothed trimming function for technical convenience: for some small positive number c, ω(q) takes the value zero for q < c, and otherwise 0 < ω(q) ≤ 1.

Let θ = (β′, α)′. Our estimator of θ₀ = (β₀′, α₀^{τ1τ2})′ can then be written as

θ̂_n = Ŝ_1n⁻¹ Ŝ_2n, (3.1.1)

where

Ŝ_1n = Σ_{i≠j} [ (X_i − X_j)′, Δq̂_ij^{τ1} − Δq̂_ij^{τ2} ]′ [ (X_i − X_j)′, Δq̂_ij^{τ1} − Δq̂_ij^{τ2} ] ω̂_ii ω̂_ji,

Ŝ_2n = Σ_{i≠j} [ (X_i − X_j)′, Δq̂_ij^{τ1} − Δq̂_ij^{τ2} ]′ Δq̂_ij^{τ1τ2} ω̂_ii ω̂_ji.

3.2 Large sample properties

In this subsection, we show that the estimator (3.1.1) is √n-consistent and derive its limiting distribution. To do this, we introduce the following assumptions, most of which are adapted from Chen and Khan (2000).

Assumption 3.1. The data {(Y_i, X_i, Z_i)}, i = 1, …, n, are i.i.d. and generated by (1.1)-(1.3).

Assumption 3.2. The random variable ε is independent of (X, Z) and has a continuous distribution with a density that is bounded, positive, and continuous on R.

Assumption 3.3. For any w = (x, z) ∈ X × Z, the density f_{W^c|W^d}(w^c|w^d) of W^c = (X^c, Z^c) conditional on W^d = (X^d, Z^d) is finite and uniformly bounded away from zero.

For some ϱ ∈ (0, 1] and any function f(·) defined on a set D, we write f(·) ∈ C^ϱ(D) if there exists a positive constant C such that |f(x₁) − f(x₂)| ≤ C‖x₁ − x₂‖^ϱ

for any x₁, x₂ ∈ D, where ‖x‖ = (x′x)^{1/2}. This definition was introduced by Chaudhuri et al. (1997) to describe the order of smoothness of the various functions involved in local polynomial quantile regression.

Assumption 3.4. For some ϱ ∈ (0, 1], f_W(·,·) ∈ C^ϱ(X × Z), where f_W(·,·) is the marginal density function of (X, Z). σ(x, z) is continuously differentiable in w^c = (x^c, z^c) of order η₁, with η₁-th order derivatives in C^ϱ(X × Z). m(z) is continuously differentiable in z^c of order η₂, with η₂-th order derivatives in C^ϱ(Z).

Assumption 3.5. The trimming function satisfies ω(·) ≥ 0; ω(q) = 0 if and only if q < c, with c a small positive number; and ω(·) is differentiable with bounded derivative.

Assumption 3.6. The bandwidth satisfies h = O(n^{−ξ}), with

ξ ∈ ( 1 / [2(ϱ + min{η₁, η₂})], 1 / [3(p_{X^c} + p_{Z^c})] ).

Assumption 3.1 describes the data generating process. Assumptions 3.2-3.4 impose restrictions on the primitives of the model, including the disturbance, the distribution of the regressors, and the smoothness of the unknown functions. Assumption 3.5 places regularity conditions on the trimming function, and Assumption 3.6 concerns the choice of bandwidth in the nonparametric estimation. Conditions of similar form can be found elsewhere in the literature on semiparametric models involving local polynomial regression, e.g., Assumption 6 in Lee (2003) and Assumption BC in Chen and Khan (2000). In general, such bandwidth conditions are required to keep the order of the residual terms resulting from the first-stage local polynomial quantile regression sufficiently small, so that plugging the first-stage fits into the second stage does not affect the optimal convergence rate of the final estimator.

To present the limiting distribution, more notation is needed. Let κ₁ = q_ε^{τ2}/(q_ε^{τ1} − q_ε^{τ2}) and κ₂ = q_ε^{τ1}/(q_ε^{τ1} − q_ε^{τ2}). For any τ1 and τ2, define

ψ_1^{τ1}(z) = E( ω(q_Y^{τ1}(X, Z)) | Z = z ),
ψ_2^{τ1}(z) = E( X ω(q_Y^{τ1}(X, Z)) | Z = z ),
ψ_3^{τ1τ2}(z) = E( ω(q_Y^{τ1}(X, Z)) q_Y^{τ2}(X, Z) | Z = z ).
Let V_{1i} and V_{2i} denote the residuals Y_i − q_Y^{τ1}(X_i, Z_i) and Y_i − q_Y^{τ2}(X_i, Z_i), respectively, and denote the joint densities of (V₁, X, Z) and (V₂, X, Z) by f_{V₁XZ}(·,·,·) and f_{V₂XZ}(·,·,·), respectively. Writing ω_k^{τ1} = ω(q_Y^{τ1}(X_k, Z_k)), define the following zero-mean random vectors:

π_{1k} = ω_k^{τ1} [τ1 − I(Y_k ≤ q_Y^{τ1}(X_k, Z_k))] (f_{XZ}(X_k, Z_k)/f_{V₁XZ}(0, X_k, Z_k)) [X_k ψ_1^{τ1}(Z_k) − ψ_2^{τ1}(Z_k)],

π_{2k} = ω_k^{τ1} [τ1 − I(Y_k ≤ q_Y^{τ1}(X_k, Z_k))] (f_{XZ}(X_k, Z_k)/f_{V₁XZ}(0, X_k, Z_k)) [(q_Y^{τ1}(X_k, Z_k) − q_Y^{τ2}(X_k, Z_k)) ψ_1^{τ1}(Z_k) − (ψ_3^{τ1τ1}(Z_k) − ψ_3^{τ1τ2}(Z_k))],

π_{3k} = ω_k^{τ1} [τ1 − I(Y_k ≤ q_Y^{τ1}(X_k, Z_k))] (f_X(X_k) f_Z(Z_k)/f_{V₁XZ}(0, X_k, Z_k)) [ψ_2^{τ1}(Z_k) − X_k ψ_1^{τ1}(Z_k)],

π_{4k} = ω_k^{τ1} [τ1 − I(Y_k ≤ q_Y^{τ1}(X_k, Z_k))] (f_X(X_k) f_Z(Z_k)/f_{V₁XZ}(0, X_k, Z_k)) [(ψ_3^{τ1τ1}(Z_k) − ψ_3^{τ1τ2}(Z_k)) − (q_Y^{τ1}(X_k, Z_k) − q_Y^{τ2}(X_k, Z_k)) ψ_1^{τ1}(Z_k)],

π_{5k} = ω_k^{τ1} [τ2 − I(Y_k ≤ q_Y^{τ2}(X_k, Z_k))] (f_{XZ}(X_k, Z_k)/f_{V₂XZ}(0, X_k, Z_k)) [X_k ψ_1^{τ1}(Z_k) − ψ_2^{τ1}(Z_k)],

π_{6k} = ω_k^{τ1} [τ2 − I(Y_k ≤ q_Y^{τ2}(X_k, Z_k))] (f_{XZ}(X_k, Z_k)/f_{V₂XZ}(0, X_k, Z_k)) [(q_Y^{τ1}(X_k, Z_k) − q_Y^{τ2}(X_k, Z_k)) ψ_1^{τ1}(Z_k) − (ψ_3^{τ1τ1}(Z_k) − ψ_3^{τ1τ2}(Z_k))],

π_{7k} = ω_k^{τ1} [τ2 − I(Y_k ≤ q_Y^{τ2}(X_k, Z_k))] (f_X(X_k) f_Z(Z_k)/f_{V₂XZ}(0, X_k, Z_k)) [ψ_2^{τ1}(Z_k) − X_k ψ_1^{τ1}(Z_k)],

π_{8k} = ω_k^{τ1} [τ2 − I(Y_k ≤ q_Y^{τ2}(X_k, Z_k))] (f_X(X_k) f_Z(Z_k)/f_{V₂XZ}(0, X_k, Z_k)) [(ψ_3^{τ1τ1}(Z_k) − ψ_3^{τ1τ2}(Z_k)) − (q_Y^{τ1}(X_k, Z_k) − q_Y^{τ2}(X_k, Z_k)) ψ_1^{τ1}(Z_k)].

Proposition 1. Suppose that Assumptions 2.2 and 3.1-3.6 hold. Then as n → ∞, we have

√n (θ̂_n − θ₀) →d N(0, V̄⁻¹ Ω V̄⁻¹),

where

V̄ = diag(I_{p_X}, q_ε^{τ1} − q_ε^{τ2}) V diag(I_{p_X}, q_ε^{τ1} − q_ε^{τ2})

with V given by Assumption 2.2, Ω = E(Π_k Π_k′), and

Π_k = κ₁ [ (π_{1k}′, π_{2k})′ + (π_{3k}′, π_{4k})′ ] + κ₂ [ (π_{5k}′, π_{6k})′ + (π_{7k}′, π_{8k})′ ].

The limiting distribution of our estimator is considerably more complex than that in Chen and Khan (2000), due to the presence of the partially linear structure in our model. For large-sample inference, a consistent estimate of the asymptotic variance matrix is needed. Given the consistent estimator θ̂_n of θ₀, it follows from (2.8) and the proof of Proposition 1 that V can be estimated by

V̂_n = (1/(n(n−1))) Σ_{i≠j} [ (X_i − X_j)′, Δq̂_ij^{τ1} − Δq̂_ij^{τ2} ]′ [ (X_i − X_j)′, Δq̂_ij^{τ1} − Δq̂_ij^{τ2} ] ω̂_ii ω̂_ji.
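The second-stage objects Ŝ_1n, Ŝ_2n, and V̂_n are all weighted pairwise sums and are cheap to assemble in matrix form. The sketch below is illustrative only: it replaces the first-stage quantile fits with exact conditional quantiles (toy design with scalar X, σ(x, z) = exp(x) + 0.5, m(z) = sin(z), ε ~ N(0, 1), (τ1, τ2) = (0.5, 0.8)) and applies the zero-one trimming on positivity of the τ1-quantiles, as in the censored case.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, beta0 = 150, 1.0
tau1, tau2 = 0.5, 0.8
q1, q2 = norm.ppf(tau1), norm.ppf(tau2)       # note q1 = 0 at the median

# exact conditional quantiles of Y* (stand-in for the first-stage estimates)
qY = lambda x, z, q: x * beta0 + np.sin(z) + (np.exp(x) + 0.5) * q

X, Z = rng.normal(size=n), rng.normal(size=n)
ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
keep = ii != jj
xi, xj, zi = X[ii[keep]], X[jj[keep]], Z[ii[keep]]

dq1 = qY(xi, zi, q1) - qY(xj, zi, q1)         # generated regressand pieces
dq2 = qY(xi, zi, q2) - qY(xj, zi, q2)
lhs = (dq1 + dq2) / 2
# zero-one trimming: keep pairs whose tau1-quantiles are both positive
w = ((qY(xi, zi, q1) > 0) & (qY(xj, zi, q1) > 0)).astype(float)

R = np.column_stack([xi - xj, dq1 - dq2])     # generated regressor matrix
S1 = (R * w[:, None]).T @ R                   # S_1n (unnormalized)
S2 = (R * w[:, None]).T @ lhs                 # S_2n
theta_n = np.linalg.solve(S1, S2)             # theta_n = S1^{-1} S2, eq. (3.1.1)
V_n = S1 / (n * (n - 1))                      # pairwise-average estimate of V
```

Because the population relation (2.8) holds exactly at the true quantiles, `theta_n` here equals (β₀, α₀^{τ1τ2}) up to machine precision whatever subset of pairs survives the trimming; the sampling noise studied in Proposition 1 comes entirely from the first-stage estimates that this sketch abstracts away.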

To estimate Ω, first consider the estimation of κ₁ and κ₂. By (2.9), α̂_n is a consistent estimator of α₀^{τ1τ2} = (1/2)(q_ε^{τ1} + q_ε^{τ2})/(q_ε^{τ1} − q_ε^{τ2}) = κ₂ − 1/2. Thus we estimate κ₂ by κ̂_{2n} = α̂_n + 1/2 and κ₁ by κ̂_{1n} = κ̂_{2n} − 1. Given consistent estimators π̂_{lk} of the π_{lk}, l = 1, …, 8, Ω can then be estimated by

Ω̂_n = (1/n) Σ_{k=1}^n Π̂_k Π̂_k′, where Π̂_k = κ̂_{1n} [ (π̂_{1k}′, π̂_{2k})′ + (π̂_{3k}′, π̂_{4k})′ ] + κ̂_{2n} [ (π̂_{5k}′, π̂_{6k})′ + (π̂_{7k}′, π̂_{8k})′ ].

Remark 3.1. As argued by Chen and Khan (2000), the selection of the quantile pair (τ1, τ2) in practice is governed by two factors. First, for the full rank condition (Assumption 2.1 or 2.2) to be satisfied in finite samples, it is clearly necessary that both quantiles be significantly larger than the level of censoring in the data. However, precision of the estimators is sacrificed if the quantiles get arbitrarily close to one, as the density of the residuals becomes very small there. Optimally balancing these two considerations is a difficult problem, as it involves knowledge of the regressor distribution, the parameter β₀, the scale function σ(·), and the density of the homoscedastic component ε.

Given the estimators β̂_n and α̂_n of (β₀, α₀^{τ1τ2}), (2.15) implies that m(z) for any given z ∈ Z can be estimated by

m̂_n(z) = Σ_{i=1}^n ω̂_i [ q̂_Y^{τ1τ2}(X_i, z) − X_i′β̂_n − (q̂_Y^{τ1}(X_i, z) − q̂_Y^{τ2}(X_i, z)) α̂_n ] / Σ_{i=1}^n ω̂_i,

where ω̂_i = ω(q̂_Y^{τ1}(X_i, z)) and

q̂_Y^{τ1τ2}(X_i, z) = (q̂_Y^{τ1}(X_i, z) + q̂_Y^{τ2}(X_i, z)) / 2.

Moreover, it follows from (2.16) that σ(x, z) for any given (x, z) ∈ X × Z can be estimated by

σ̂_n(x, z) = (q̂_Y^{τ1}(x, z) − x′β̂_n − m̂_n(z)) / 2 + (q̂_Y^{τ2}(x, z) − x′β̂_n − m̂_n(z)) / (2 q̂_ε^{τ2})

for a sufficiently large τ1 so that q_Y^{τ1}(x, z) > 0, where q_ε^{τ1} can be normalized to one and q̂_ε^{τ2} = (2α̂_n − 1)/(2α̂_n + 1).

4 Monte Carlo Simulation

We conduct a Monte Carlo experiment to evaluate the finite sample performance of our two-stage estimator. We simulate from models of the form

Y = max(0, a + Xβ₀ + m(Z) + U),

Table 1. Simulation Results

Panel A: σ(X, Z) = exp(X)
                        n = 200                         n = 400
(τ1, τ2)      (0.5, 0.6)    (0.5, 0.8)      (0.5, 0.6)    (0.5, 0.8)
Censor deg.   Bias   Std    Bias   Std      Bias   Std    Bias   Std
15%          .0663  .1386  .0541  .1427    .0473  .1096  .0442  .1072
30%          .0624  .1583  .0682  .1621    .0528  .1136  .0542  .1205
45%          .0742  .1884  .0768  .1825    .0624  .1435  .0647  .1414

Panel B: σ(X, Z) = X^2
                        n = 200                         n = 400
(τ1, τ2)      (0.5, 0.6)    (0.5, 0.8)      (0.5, 0.6)    (0.5, 0.8)
Censor deg.   Bias   Std    Bias   Std      Bias   Std    Bias   Std
15%          .0473  .1347  .0483  .1436    .0392  .1053  .0435  .1082
30%          .0527  .1529  .0663  .1580    .0453  .1127  .0581  .1185
45%          .0682  .1822  .0724  .1858    .0524  .1372  .0673  .1367

Panel C: σ(X, Z) = Z^2
                        n = 200                         n = 400
(τ1, τ2)      (0.5, 0.6)    (0.5, 0.8)      (0.5, 0.6)    (0.5, 0.8)
Censor deg.   Bias   Std    Bias   Std      Bias   Std    Bias   Std
15%          .0779  .1856  .0726  .1845    .0594  .1368  .0603  .1412
30%          .0825  .2067  .0863  .2184    .0764  .1593  .0728  .1641
45%          .0872  .2465  .0836  .2373    .0793  .1842  .0835  .1857

where X and Z are two scalar random variables distributed as standard normal N(0, 1), β_0 = 1, m(Z) = sin(0.5πZ), and the intercept a varies across designs to keep the degree of censoring at a given level. For example, letting −a be the 30% empirical quantile of the latent outcomes Y*_i = X_iβ_0 + m(Z_i) + U_i results in about a 30% degree of censoring. The disturbance is U = σ(X, Z)ε, where ε is distributed as chi-square with one degree of freedom, χ²(1), and we consider the following three forms of σ(·):

DGP1: σ(X, Z) = C_1 exp(X),
DGP2: σ(X, Z) = C_2 X^2,
DGP3: σ(X, Z) = C_3 Z^2,

where the constants C_l, l = 1, 2, 3, are set to keep the (unconditional) variance of the disturbance U at one. Among these scale functions, the third violates the sufficient condition for regular identification of our estimator. Nonetheless, as the simulation results show, even in this case our estimator performs relatively well in finite samples. For these designs, we consider n ∈ {200, 400} and censoring degrees of 15%, 30%, and 45%. As discussed in Remark 3.1, to ensure that the full rank condition is satisfied in finite samples, the quantile pair should be significantly larger than the level of censoring in the data. For the current study, we consider (τ_1, τ_2) ∈ {(0.5, 0.6), (0.5, 0.8)} in order to explore how sensitive our estimator is to the selection of the quantile pair.

To implement our estimator, we need to choose the bandwidth h. Generally speaking, selecting the bandwidth in any two-step estimator is a difficult problem, but there are procedures that incorporate the undersmoothing prescribed by the theory. Examples include the procedures used in Horowitz (1992) and Buchinsky and Hahn (1998), both of which perform well in the simulation studies they consider. For our study, we consider bandwidths that decrease to zero at the rate n^{-1/6}, i.e., h = Â n^{-1/6}, as this rate is consistent with the guidelines in Assumption 3.6 when ϱ = 1 and p_{X^c} + p_{Z^c} = 2.
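For concreteness, the designs above can be simulated as follows (a sketch under our own naming; the scale-normalization constants C_l are omitted, and the intercept is set from the empirical quantile of the latent outcome so that roughly the target fraction of observations is censored at zero):

```python
import numpy as np

def simulate_design(n, scale, censor=0.30, seed=0):
    """Draw from Y = max(0, a + X*beta0 + m(Z) + U) with U = sigma(X,Z)*eps.

    scale  : callable sigma(x, z), e.g. lambda x, z: np.exp(x)
    censor : target degree of censoring; a is minus the empirical
             `censor`-quantile of the latent index, so roughly that
             fraction of the Y's is censored at zero.
    """
    rng = np.random.default_rng(seed)
    beta0 = 1.0
    X = rng.standard_normal(n)
    Z = rng.standard_normal(n)
    eps = rng.chisquare(1, size=n)          # homoscedastic component
    U = scale(X, Z) * eps                   # multiplicative heteroscedasticity
    latent = X * beta0 + np.sin(0.5 * np.pi * Z) + U
    a = -np.quantile(latent, censor)        # pins down the censoring degree
    Y = np.maximum(0.0, a + latent)
    return X, Z, Y
```

For example, `simulate_design(400, lambda x, z: np.exp(x), censor=0.30)` yields a sample in which roughly 30% of the observations are censored at zero.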
Following Chen and Khan (2000), the bandwidth constant Â is selected by a grid search over the interval [1, 4], and the value which minimizes the sum of squared residuals is chosen. The first stage of our estimator involves nonparametrically fitting the conditional quantile function q^τ_Y(X, Z) at the in-sample points (X_i, Z_i) and at out-of-sample points (X_j, Z_i). To ease the computational burden, q^τ_Y(X, Z) is not evaluated at every out-of-sample point (X_j, Z_i), i, j = 1, ..., n, i ≠ j. Instead, the nonparametric conditional quantile function is estimated at (X_j, Z_i) only if |X_j − X_i| ≤ 0.5σ̂_X, with σ̂_X the empirical standard deviation of the X_i's. The rationale is that any local fitting technique makes use of only the sample information around the point being evaluated, so fits at out-of-sample points that are close to in-sample points already exploit essentially all the relevant information in the dataset.

For the trimming function ω(·), we adopt the following, which has been used by Buchinsky and Hahn (1998) and also Chen and Khan (2000):
\[
\omega(q) = I(c < q < 3c)\,
\frac{\dfrac{e^{q-2c}}{1+e^{q-2c}} - \dfrac{e^{-c}}{1+e^{-c}}}{\dfrac{e^{c}}{1+e^{c}} - \dfrac{e^{-c}}{1+e^{-c}}}
+ I(q \ge 3c),
\]
with c some positive constant. This trimming function takes values in (0, 1] whenever q > c and is differentiable, thus satisfying Assumption 3.5. As regards the trimming value, we choose c to be the 5% empirical quantile of the q̂^{τ_1}_Y(X_i, Z_i)'s conditional on q̂^{τ_1}_Y(X_i, Z_i) > 0. According to Buchinsky and Hahn (1998), changing the trimming value c had virtually no effect on the performance of their estimator.8

For each case, we run 1000 replications and report the empirical bias and standard deviation. The simulation results are summarized in Table 1. From Table 1, we see that these estimators are slightly downward biased; the biases usually decline with the sample size but increase moderately with the degree of censoring. The estimators deliver acceptable precision even when the dataset is heavily censored (e.g., 45% censoring). Their standard deviation declines with the sample size but increases with the degree of censoring; the magnitude of this decline is generally consistent with √n-asymptotics, in line with our theoretical prediction. The estimators are quite robust to the choice of quantile pair, in terms of both bias and standard deviation, as long as the quantile indices significantly exceed the degree of censoring. In particular, for DGP3, which violates our sufficient identification condition, our estimator continues to perform reasonably well.

5 Conclusion

In this paper, we consider the identification and estimation of the finite dimensional parameters as well as the infinite dimensional unknown functions in a partially linear censored regression model with multiplicative heteroscedasticity of unknown form.
Our identification strategy exploits variation in the conditional quantile functions of the dependent variable with respect to the quantile index, and a two-stage estimation procedure based on this identification strategy is defined. The proposed estimator is shown to be √n-consistent, and its limiting distribution is formally derived. Monte Carlo simulations indicate that our estimator performs reasonably well in finite samples.

References

Abrevaya, J. and Shin, Y., 2011, Rank estimation of partially linear index models, Econometrics Journal, 14, 409-437.

8 Moreover, Buchinsky and Hahn found that replacing the smooth function ω(q) with the indicator function I(q > 0) also had only a minor effect on the performance of the estimator.

Ahn, H. and Powell, J.L., 1993, Semiparametric estimation of censored selection models with a nonparametric selection mechanism, Journal of Econometrics, 58, 3-29.

Ai, C. and McFadden, D., 1997, Estimation of some partially specified nonlinear models, Journal of Econometrics, 76, 1-37.

Aradillas-Lopez, A., Honore, B. and Powell, J.L., 2007, Pairwise difference estimation with nonparametric control variables, International Economic Review, 48, 1119-1158.

Buchinsky, M. and Hahn, J., 1998, An alternative estimator for the censored quantile regression model, Econometrica, 66, 627-651.

Chaudhuri, P., 1991a, Nonparametric estimates of regression quantiles and their local Bahadur representation, Annals of Statistics, 19, 760-777.

Chaudhuri, P., 1991b, Global nonparametric estimation of conditional quantiles and their derivatives, Journal of Multivariate Analysis, 39, 246-269.

Chaudhuri, P., Doksum, K. and Samarov, A., 1997, On average derivative quantile regression, Annals of Statistics, 25, 715-744.

Chen, S. and Khan, S., 2000, Estimating censored regression models in the presence of nonparametric multiplicative heteroscedasticity, Journal of Econometrics, 98, 283-316.

Chen, S. and Khan, S., 2001, Semiparametric estimation of a partially linear censored regression model, Econometric Theory, 17, 567-590.

Honore, B. and Powell, J.L., 2005, Pairwise difference estimation of nonlinear models, in Identification and Inference for Econometric Models, edited by Andrews, D.W.K. and Stock, J.H., chapter 22, Cambridge University Press.

Horowitz, J.L., 1992, A smoothed maximum score estimator for the binary response model, Econometrica, 60, 505-531.

Lee, S., 2003, Efficient semiparametric estimation of a partially linear quantile regression model, Econometric Theory, 19, 1-31.

Powell, J.L., 1984, Least absolute deviations estimation for the censored regression model, Journal of Econometrics, 25, 303-325.

Powell, J.
L., 1986, Censored regression quantiles, Journal of Econometrics, 32, 143-155.

Powell, J.L., 1987, Semiparametric estimation of bivariate latent variable models, working paper, Social Research Institute, University of Wisconsin, Madison, WI.

Robinson, P., 1988, Root-n-consistent semiparametric regression, Econometrica, 56, 931-954.

Appendix: Proofs

Lemma A.1. Under Assumptions 3.1-3.4 and 3.6,
\[
\max_{(x,z)\in\mathcal{X}\times\mathcal{Z}:\ q^{\tau}_Y(x,z)\ge c/2}\big|\hat q^{\tau}_Y(x,z) - q^{\tau}_Y(x,z)\big| = o_p\big(n^{-1/4}\big).
\]

Proof. See Lemma 1 in Chen and Khan (2000) and Lemma 4.3-(i) in Chaudhuri et al. (1997).

Lemma A.2. Under Assumptions 3.1-3.4 and 3.6, let A_n = {(x, z) ∈ X × Z : \(\hat q^{\tau}_Y(x, z) \ge c/2\) and \(q^{\tau}_Y(x, z) \le c\)} for some c > 0. Then
\[
P\big((X, Z) \in A_n\big) \le C_1 \exp\big(-C_2\, n h^{p_w} c\big),
\]
with C_1, C_2 two positive constants.

Proof. See Chen and Khan (2000)'s Lemma 2.

Proof of Proposition 1. Since
\[
\sqrt{n}\,\big(\hat\theta_n - \theta_0\big) = \Big(\frac{1}{n(n-1)}\hat S_{1n}\Big)^{-1}\frac{\sqrt{n}}{n(n-1)}\big(\hat S_{2n} - \hat S_{1n}\theta_0\big),
\]
we prove the proposition by showing the following results successively. (i) We show that \(\frac{1}{n(n-1)}\hat S_{1n} \to_p V\), where V is the finite matrix with full rank defined in Proposition 1. (ii) We establish the asymptotic representation for \(\frac{\sqrt{n}}{n(n-1)}(\hat S_{2n} - \hat S_{1n}\theta_0)\). Result (i) follows from the law of large numbers for U-statistics and Assumption 2.2, and the proof is essentially the same as that of Lemma 3 in Chen and Khan (2000). Now we consider (ii). Write \(\hat q^{\tau}_{ii} = \hat q^{\tau}_Y(X_i, Z_i)\) and \(\hat q^{\tau}_{ji} = \hat q^{\tau}_Y(X_j, Z_i)\), and let
\[
\hat W_{ij} = \begin{pmatrix} X_i - X_j \\ \big(\hat q^{\tau_1}_{ii} - \hat q^{\tau_2}_{ii}\big) - \big(\hat q^{\tau_1}_{ji} - \hat q^{\tau_2}_{ji}\big) \end{pmatrix},
\]
with \(W_{ij}\) its population counterpart. Decompose
\[
\frac{\sqrt{n}}{n(n-1)}\big(\hat S_{2n} - \hat S_{1n}\theta_0\big) = \frac{\sqrt{n}}{n(n-1)}\sum_{i\ne j}\hat\omega_{ii}\hat\omega_{ji}\,\hat W_{ij}\big[\hat q^{\tau_1\tau_2}_{ii} - \hat q^{\tau_1\tau_2}_{ji} - \hat W_{ij}'\theta_0\big]
\]
into \(T_{1n} + T_{2n} + T_{3n} - T_{4n}\), where
\[
T_{1n} = \frac{\sqrt{n}}{n(n-1)}\sum_{i\ne j}\hat\omega_{ii}\hat\omega_{ji}\big(\hat W_{ij} - W_{ij}\big)\big[\hat q^{\tau_1\tau_2}_{ii} - \hat q^{\tau_1\tau_2}_{ji} - \hat W_{ij}'\theta_0\big],
\]
\[
T_{2n} = \frac{\sqrt{n}}{n(n-1)}\sum_{i\ne j}\hat\omega_{ii}\hat\omega_{ji}\,W_{ij}\big[\big(\hat q^{\tau_1\tau_2}_{ii} - q^{\tau_1\tau_2}_{ii}\big) - \big(\hat q^{\tau_1\tau_2}_{ji} - q^{\tau_1\tau_2}_{ji}\big)\big],
\]
\[
T_{3n} = \frac{\sqrt{n}}{n(n-1)}\sum_{i\ne j}\hat\omega_{ii}\hat\omega_{ji}\,W_{ij}\big[q^{\tau_1\tau_2}_{ii} - q^{\tau_1\tau_2}_{ji} - W_{ij}'\theta_0\big],
\]
\[
T_{4n} = \frac{\sqrt{n}}{n(n-1)}\sum_{i\ne j}\hat\omega_{ii}\hat\omega_{ji}\,W_{ij}\big(\hat W_{ij} - W_{ij}\big)'\theta_0.
\]
First, it follows from Lemma A.1, Eq. (2.8), the boundedness of ω(·), and the identity
\[
\hat q^{\tau_1\tau_2}_{ii} - \hat q^{\tau_1\tau_2}_{ji} - \hat W_{ij}'\theta_0 = \big[\big(\hat q^{\tau_1\tau_2}_{ii} - q^{\tau_1\tau_2}_{ii}\big) - \big(\hat q^{\tau_1\tau_2}_{ji} - q^{\tau_1\tau_2}_{ji}\big)\big] - \big(\hat W_{ij} - W_{ij}\big)'\theta_0 + \big[q^{\tau_1\tau_2}_{ii} - q^{\tau_1\tau_2}_{ji} - W_{ij}'\theta_0\big]
\]

that \(T_{1n} = o_p(1)\). Moreover, it follows from (2.8), Lemma A.2 and the proof of Lemma 4 in Chen and Khan (2000) that \(T_{3n} = o_p(1)\). Thus it remains to consider \(T_{2n}\) and \(T_{4n}\). It follows from
\[
\hat q^{\tau_1\tau_2} - q^{\tau_1\tau_2} = \frac{1}{2}\big[\big(\hat q^{\tau_1} - q^{\tau_1}\big) + \big(\hat q^{\tau_2} - q^{\tau_2}\big)\big]
\]
and (2.9) that \(T_{2n} - T_{4n}\) can be written as
\[
T_{2n} - T_{4n} = \frac{\sqrt{n}}{n(n-1)}\sum_{i\ne j}\hat\omega_{ii}\hat\omega_{ji}\,W_{ij}\big[\kappa_1\big(\hat q^{\tau_1} - q^{\tau_1}\big) + \kappa_2\big(\hat q^{\tau_2} - q^{\tau_2}\big)\big],
\]
where \(\kappa_1 = -q^{\tau_2}/(q^{\tau_1} - q^{\tau_2})\) and \(\kappa_2 = q^{\tau_1}/(q^{\tau_1} - q^{\tau_2})\). Since \(\hat q^{\tau} - q^{\tau} = (\hat q^{\tau}_{ii} - q^{\tau}_{ii}) - (\hat q^{\tau}_{ji} - q^{\tau}_{ji})\) for τ = τ_1 and τ_2, we are in a position to consider the limiting distributions of
\[
\Gamma_{1n} = \frac{\sqrt{n}}{n(n-1)}\sum_{i\ne j}\hat\omega_{ii}\hat\omega_{ji}\,W_{ij}\big(\hat q^{\tau_1}_{ii} - q^{\tau_1}_{ii}\big),\qquad
\Gamma_{2n} = \frac{\sqrt{n}}{n(n-1)}\sum_{i\ne j}\hat\omega_{ii}\hat\omega_{ji}\,W_{ij}\big(\hat q^{\tau_1}_{ji} - q^{\tau_1}_{ji}\big),
\]
\[
\Gamma_{3n} = \frac{\sqrt{n}}{n(n-1)}\sum_{i\ne j}\hat\omega_{ii}\hat\omega_{ji}\,W_{ij}\big(\hat q^{\tau_2}_{ii} - q^{\tau_2}_{ii}\big),\qquad
\Gamma_{4n} = \frac{\sqrt{n}}{n(n-1)}\sum_{i\ne j}\hat\omega_{ii}\hat\omega_{ji}\,W_{ij}\big(\hat q^{\tau_2}_{ji} - q^{\tau_2}_{ji}\big).
\]
First consider \(\Gamma_{1n}\). By a mean value expansion and Lemma A.2, we can replace \(\hat\omega_{ii}\hat\omega_{ji}\) with \(\omega_{ii}\omega_{ji}\) without affecting the dominant term in \(\Gamma_{1n}\), i.e., we can show
\[
\Gamma_{1n} = \frac{\sqrt{n}}{n(n-1)}\sum_{i\ne j}\omega_{ii}\omega_{ji}\,W_{ij}\big(\hat q^{\tau_1}_{ii} - q^{\tau_1}_{ii}\big) + o_p(1).
\]
The next step is to plug into \(\Gamma_{1n}\) a Bahadur-type expansion of \(\hat q^{\tau_1}_{ii} - q^{\tau_1}_{ii}\). Fortunately, this type of expansion has already been well established in Chaudhuri (1991a, b) and Chaudhuri et al. (1997). According to Chaudhuri (1991a), \(\hat q^{\tau_1}_{ii} - q^{\tau_1}_{ii}\) permits the following asymptotic linear representation:
\[
\hat q^{\tau_1}_{ii} - q^{\tau_1}_{ii} = \#C_n^{-1}(X_i, Z_i)\,e_1^T G_n^{-1}(X_i, Z_i)\sum_{k\ne i,j}\big[\tau_1 - I\big(Y_k \le q^{\tau_1}_{ii}\big)\big]\,I\big[(X_k, Z_k)\in C_n(X_i, Z_i)\big]\,L(h, X_k - X_i, Z_k - Z_i) + R_n(X_i, Z_i),
\]

where the remainder term satisfies \(\max_{(X_i, Z_i)\in\mathcal{X}\times\mathcal{Z}}|R_n(X_i, Z_i)| = o_p(n^{-1/2})\) by Assumption 3.6 and Lemma 4.1 in Chaudhuri et al. (1997); \(\#C_n(x, z)\) is the number of elements in \(C_n(x, z)\); \(L(h, X_k - x, Z_k - z)\) denotes the \(\#A(s)\)-dimensional vector
\[
L(h, X_k - x, Z_k - z) = \Big[1,\ \big\{h^{-[\alpha]}\big(W_k - w\big)^{[\alpha]},\ 1 \le [\alpha] \le s\big\}\Big],
\]
with \(W_k = (X_k, Z_k)\) and \(w = (x, z)\); \(e_1\) is a \(\#A(s)\)-dimensional column vector with the first component being one and the remaining components being zero; and the \(\#A(s)\times\#A(s)\) matrix \(G_n(X_i, Z_i)\) is the density-weighted conditional mean of the outer product of \(L(h, X_k - X_i, Z_k - Z_i)\) given that \((X_k, Z_k)\in C_n(X_i, Z_i)\). Let \(f_n(x, z) = (1/n)E[\#C_n(X, Z)\,|\,X = x, Z = z]\). Plugging the representation for \(\hat q^{\tau_1}_{ii} - q^{\tau_1}_{ii}\) into \(\Gamma_{1n}\) gives
\[
\Gamma_{1n} = \frac{\sqrt{n}}{n(n-1)(n-2)}\sum_{i\ne j\ne k}\omega_{ii}\omega_{ji}\,W_{ij}\,f_n^{-1}(X_i, Z_i)\,e_1^T G_n^{-1}(X_i, Z_i)L(h, X_k - X_i, Z_k - Z_i)\big[\tau_1 - I\big(Y_k \le q^{\tau_1}_{ii}\big)\big]I\big[(X_k, Z_k)\in C_n(X_i, Z_i)\big] + o_p(1),
\]
which is a third-order U-statistic. Applying the projection theorem for U-statistics, and following the proof of Lemma 5 in Chen and Khan (2000), we can show that
\[
\frac{\sqrt{n}}{n(n-1)(n-2)}\sum_{i\ne j\ne k}\omega_{ii}\omega_{ji}\,X_i\,f_n^{-1}(X_i, Z_i)\,e_1^T G_n^{-1}(X_i, Z_i)L(h, X_k - X_i, Z_k - Z_i)\big[\tau_1 - I\big(Y_k \le q^{\tau_1}_{ii}\big)\big]I\big[(X_k, Z_k)\in C_n(X_i, Z_i)\big]
= \frac{1}{\sqrt{n}}\sum_{k=1}^n \omega_{kk}X_k\big[\tau_1 - I\big(Y_k \le q^{\tau_1}_{kk}\big)\big]f^{-1}_{V_1|XZ}(0, X_k, Z_k)f_{XZ}(X_k, Z_k)\,\psi^{\tau_1}_1(Z_k) + o_p(1),
\]
\[
\frac{\sqrt{n}}{n(n-1)(n-2)}\sum_{i\ne j\ne k}\omega_{ii}\omega_{ji}\,X_j\,f_n^{-1}(X_i, Z_i)\,e_1^T G_n^{-1}(X_i, Z_i)L(h, X_k - X_i, Z_k - Z_i)\big[\tau_1 - I\big(Y_k \le q^{\tau_1}_{ii}\big)\big]I\big[(X_k, Z_k)\in C_n(X_i, Z_i)\big]
= \frac{1}{\sqrt{n}}\sum_{k=1}^n \omega_{kk}\big[\tau_1 - I\big(Y_k \le q^{\tau_1}_{kk}\big)\big]f^{-1}_{V_1|XZ}(0, X_k, Z_k)f_{XZ}(X_k, Z_k)\,\psi^{\tau_1}_2(Z_k) + o_p(1),
\]
\[
\frac{\sqrt{n}}{n(n-1)(n-2)}\sum_{i\ne j\ne k}\omega_{ii}\omega_{ji}\,\big(q^{\tau_1}_{ii} - q^{\tau_2}_{ii}\big)\,f_n^{-1}(X_i, Z_i)\,e_1^T G_n^{-1}(X_i, Z_i)L(h, X_k - X_i, Z_k - Z_i)\big[\tau_1 - I\big(Y_k \le q^{\tau_1}_{ii}\big)\big]I\big[(X_k, Z_k)\in C_n(X_i, Z_i)\big]
= \frac{1}{\sqrt{n}}\sum_{k=1}^n \omega_{kk}\big(q^{\tau_1} - q^{\tau_2}\big)\big[\tau_1 - I\big(Y_k \le q^{\tau_1}_{kk}\big)\big]f^{-1}_{V_1|XZ}(0, X_k, Z_k)f_{XZ}(X_k, Z_k)\,\psi^{\tau_1}_1(Z_k) + o_p(1),
\]
and
\[
\frac{\sqrt{n}}{n(n-1)(n-2)}\sum_{i\ne j\ne k}\omega_{ii}\omega_{ji}\,\big(q^{\tau_1}_{ji} - q^{\tau_2}_{ji}\big)\,f_n^{-1}(X_i, Z_i)\,e_1^T G_n^{-1}(X_i, Z_i)L(h, X_k - X_i, Z_k - Z_i)\big[\tau_1 - I\big(Y_k \le q^{\tau_1}_{ii}\big)\big]I\big[(X_k, Z_k)\in C_n(X_i, Z_i)\big]
= \frac{1}{\sqrt{n}}\sum_{k=1}^n \omega_{kk}\big[\tau_1 - I\big(Y_k \le q^{\tau_1}_{kk}\big)\big]f^{-1}_{V_1|XZ}(0, X_k, Z_k)f_{XZ}(X_k, Z_k)\big(\psi^{\tau_1\tau_1}_3(Z_k) - \psi^{\tau_1\tau_2}_3(Z_k)\big) + o_p(1).
\]

Combining these results together, we can establish that
\[
\Gamma_{1n} = \frac{1}{\sqrt{n}}\sum_{k=1}^n\begin{pmatrix}\pi_{1k}\\ \pi_{2k}\end{pmatrix} + o_p(1)
\]
by referring to the various notations introduced prior to Proposition 1. Similarly, we can establish that
\[
\Gamma_{2n} = \frac{1}{\sqrt{n}}\sum_{k=1}^n\begin{pmatrix}\pi_{3k}\\ \pi_{4k}\end{pmatrix} + o_p(1),\qquad
\Gamma_{3n} = \frac{1}{\sqrt{n}}\sum_{k=1}^n\begin{pmatrix}\pi_{5k}\\ \pi_{6k}\end{pmatrix} + o_p(1),\qquad
\Gamma_{4n} = \frac{1}{\sqrt{n}}\sum_{k=1}^n\begin{pmatrix}\pi_{7k}\\ \pi_{8k}\end{pmatrix} + o_p(1),
\]
which completes the proof.