Nonparametric Identification and Estimation of Truncated Regression Models with Heteroskedasticity


Songnian Chen (a), Xun Lu (a), Xianbo Zhou (b), and Yahong Zhou (c)

(a) Department of Economics, Hong Kong University of Science and Technology
(b) Lingnan College, Sun Yat-sen University
(c) School of Economics, Shanghai University of Finance and Economics

April 3, 2014

Abstract

We consider nonparametric identification and estimation of truncated regression models with unknown conditional heteroskedasticity. The existing methods (e.g., Chen (2010)) that ignore heteroskedasticity often result in inconsistent estimators of the regression function. In this paper, we show that both the regression and heteroskedasticity functions are identified in a location-scale setting. Based on our constructive identification results, we propose kernel-based estimators of the regression and heteroskedasticity functions and show that the estimators are asymptotically normally distributed. Our simulations demonstrate that our new method performs well in finite samples. In particular, we confirm that in the presence of heteroskedasticity, our new estimator of the regression function has a much smaller bias than Chen's (2010) estimator.

Key Words: Truncated data; Nonparametric regression; Heteroskedasticity; Kernel estimator.
JEL Classification: C14, C24

We would like to thank Arthur Lewbel and Liangjun Su for their helpful comments. Address correspondence to: Songnian Chen, Department of Economics, Hong Kong University of Science and Technology; E-mail: snchen@ust.hk.

1 Introduction

Consider the nonparametric location-scale model

    Y* = m(X) + σ(X)ε,    (1.1)

where X is an observable d-dimensional vector of regressors, m(·) and σ(·) are unknown regression (location) and positive heteroskedasticity (scale) functions, respectively, and ε is an unobservable error term that is independent of X. This model has been extensively studied in statistics and econometrics (see, e.g., Fan and Gijbels (1996) for kernel estimators and Newey (1997) for series estimators). However, in empirical research, data are often truncated. That is, Y* is latent, and the observed data (Y, X) are drawn from the conditional distribution of (Y*, X) given the event {Y* > 0}. The goal of this paper is to provide identification and estimation results for the two unknown functions m(·) and σ(·) in the presence of data truncation.

The multiplicative heteroskedasticity in equation (1.1) is commonly used in empirical research. For example, almost all applications of weighted least squares estimators assume multiplicative heteroskedasticity. It is also commonly assumed in nonparametric censored models (see, e.g., Fan and Gijbels, 1994, Van Keilegom and Akritas, 1999, and Chen, Dahl and Khan, 2005). In two closely related papers, Lewbel and Linton (2002, LL hereafter) and Chen (2010, Chen hereafter) consider nonparametric truncated regression models with homoskedasticity, i.e., σ(X) is assumed to be a constant. When data are truncated, ignoring the presence of heteroskedasticity is likely to lead to inconsistent estimators of the regression function. This is in sharp contrast with the case without truncation, where ignoring heteroskedasticity only results in a possible efficiency loss. To the best of our knowledge, none of the existing methods can be applied to truncated regression models with the form of heteroskedasticity in equation (1.1).
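To fix ideas, the truncated sampling scheme just described is easy to mimic by simulation: draw (Y*, X) from the location-scale model and keep only observations with Y* > 0. The sketch below uses purely illustrative choices, not part of the model itself: standard normal ε, X uniform on [-1, 2], and m(x) = 0.8 + x, σ(x) = 0.6(1 + x²) (these particular functions reappear later as DGP 1 in the simulations).

```python
import numpy as np

rng = np.random.default_rng(0)

def m(x):
    return 0.8 + x                        # illustrative location function

def sigma(x):
    return 0.6 * (1.0 + x**2)             # illustrative scale function

def draw_truncated(n, rng):
    """Draw (Y, X) from the distribution of (Y*, X) conditional on Y* > 0."""
    ys, xs = [], []
    while len(ys) < n:
        x = rng.uniform(-1.0, 2.0)        # X ~ U[-1, 2]
        y_star = m(x) + sigma(x) * rng.standard_normal()
        if y_star > 0:                    # truncation: only Y* > 0 is observed
            ys.append(y_star)
            xs.append(x)
    return np.array(ys), np.array(xs)

y, x = draw_truncated(500, rng)
```

Any estimator of m(·) and σ(·) can use only the retained pairs (y, x); the rejected draws with Y* ≤ 0 are never observed, which is precisely what makes naive regression of Y on X inconsistent.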
As discussed in Chen, the identification condition in Chen is weaker than that in LL, and the finite-sample performance of Chen's method is demonstrated to be better than that of LL's. Hence, we generalize Chen's approach to deal with the case of heteroskedasticity. Although this paper mainly builds on Chen, the identification conditions and asymptotic properties are substantially different from those in Chen. The complication arises from the fact that here we have to identify and estimate two unknown functions jointly. The key insight behind our identification results is that the conditional survival function of Y given X = x, after being appropriately rescaled, matches the survival function of the error term. Thus, at

[Footnote 1] Both LL and Chen briefly discuss a form of heteroskedasticity in their concluding remarks. They consider Y* = m(X) + ε, where X = (X1, X2). They assume that X1 and ε are conditionally independent given X2. Thus, ε is allowed to depend on X2 in an arbitrary way but cannot depend on X1. Here, we allow ε to depend on both X1 and X2.
[Footnote 2] There is also a relatively large body of literature on the parametric or semiparametric estimation of truncated regression models; see, e.g., Powell (1986), Lee (1992), Honoré and Powell (1994), Newey (2001, 2004), Cosslett (2004), Khan and Lewbel (2007), and Chen and Zhou (2012).
[Footnote 3] It is not clear how to extend the method in LL to the heteroskedasticity in equation (1.1).

two different points, say X = x0 and X = x, the conditional survival functions of Y should also be matched, after being appropriately rescaled. Specifically, we choose a reference point x0 and impose the location and scale normalization conditions m(x0) = 0 and σ(x0) = 1. Let x be an evaluation point, so that the parameter of interest is (m(x), σ(x)). Let S(·, x) be the conditional survival function of Y given (X = x, Y* > 0):

    S(t, x) ≡ P(Y* > t | X = x, Y* > 0) = P(Y > t | X = x),

which can be estimated from the truncated data. We can show that when m(x) ≥ 0,

    S(t, x0) = S(σ(x)t + m(x), x) / S(m(x), x)    for all t ≥ 0.    (1.2)

The left-hand side of equation (1.2) is the conditional survival function of Y given the reference point X = x0, whereas the right-hand side is the conditional survival function of Y given X = x, appropriately rescaled by the true parameters m(x) and σ(x). Equation (1.2) holds because both sides are equal to a rescaled survival function of the error term. Similarly, we can show that when m(x) < 0,

    S(t, x) = S((t − m(x))/σ(x), x0) / S(−m(x)/σ(x), x0)    for all t ≥ 0.    (1.3)

Thus, the true parameter (m(x), σ(x)) must satisfy equations (1.2) and (1.3). Further, we show that (m(x), σ(x)) is the unique solution to equations (1.2) and (1.3) if the error term ε does not follow certain special distributions. Building on equations (1.2) and (1.3), we propose a kernel-based estimator of (m(x), σ(x)) that minimizes a criterion function. Our estimator is consistent and asymptotically normally distributed with a typical nonparametric convergence rate. We conduct Monte Carlo simulations to examine the finite-sample performance of our new estimator and find that it substantially outperforms Chen's estimator in most cases, especially when the variation of the heteroskedasticity function is large.

The rest of the paper is organized as follows. In Section 2, we discuss the conditions for the identification of (m(·), σ(·)).
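The matching conditions (1.2) and (1.3) can be verified in closed form when ε is standard normal, since then S(t, x) = Φ̄((t − m(x))/σ(x)) / Φ̄(−m(x)/σ(x)), where Φ̄ is the standard normal survival function. The sketch below checks both equations at arbitrary illustrative parameter values, (m(x), σ(x)) = (1.3, 0.7) for the m(x) ≥ 0 case and (−0.8, 0.7) for the m(x) < 0 case; it is a sanity check of the algebra, not part of the estimation method.

```python
import math

def Phi_bar(t):
    """Standard normal survival function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def S(t, m, s):
    """Truncated conditional survival S(t, x) at a point x with m(x) = m and
    sigma(x) = s, i.e. P(Y > t | Y* > 0, X = x), under standard normal errors."""
    return Phi_bar((t - m) / s) / Phi_bar(-m / s)

for t in [0.0, 0.5, 1.0, 2.0]:
    # equation (1.2), with m(x) = 1.3 >= 0 and the normalization m(x0)=0, sigma(x0)=1
    lhs = S(t, 0.0, 1.0)
    rhs = S(0.7 * t + 1.3, 1.3, 0.7) / S(1.3, 1.3, 0.7)
    assert abs(lhs - rhs) < 1e-12
    # equation (1.3), with m(x) = -0.8 < 0
    lhs = S(t, -0.8, 0.7)
    rhs = S((t + 0.8) / 0.7, 0.0, 1.0) / S(0.8 / 0.7, 0.0, 1.0)
    assert abs(lhs - rhs) < 1e-12
```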
In Section 3, we propose an estimation procedure based on our identification results and study its asymptotic properties. Section 4 provides simulation evidence on the finite-sample performance of our new estimator. Section 5 contains discussions of several related issues, including how to identify and estimate the error distribution, censored models, and truncated models in panel data; how to relax the independence assumption to a conditional independence assumption; and how to perform model specification testing. Section 6 concludes. All mathematical proofs are relegated to the Appendix.

2 Identification

Suppose that the observations consist of data {Yi, Xi; i = 1, ..., n} from the conditional distribution of (Y*, X) given the event {Y* > 0}, where the latent dependent variable Y* is generated as in equation (1.1). The unknown parameters here are {m(·), σ(·), F(·)}, where F(·) is the distribution of ε. Apparently, m(·) and σ(·) are not identified without suitable normalizations: equation (1.1) can be written as

    Y* = [m(X) + c1 σ(X)] + [σ(X)/c2] ε*,

where c1 and c2 are two arbitrary constants and ε* ≡ c2(ε − c1). Thus, {m(·), σ(·), F_ε(·)} is observationally equivalent to {m(·) + c1 σ(·), σ(·)/c2, F_{ε*}(·)}, where F_{ε*}(·) is the distribution of ε*. Essentially, we need location and scale normalizations. For this, we choose a reference point x0 ∈ 𝒳 such that P(Y* > 0 | X = x0) > 0, where 𝒳 is the support of X. Without loss of generality, we impose the following normalization conditions: m(x0) = 0 and σ(x0) = 1. (See Remark 2.1 below for the general case in which m(x0) = c1 and σ(x0) = c2 for two known constants c1 and c2.)

With the normalization above, we now show how to identify m(x) and σ(x) for a given evaluation point x ∈ 𝒳 such that P(Y* > 0 | X = x) > 0. Let S̄(·) and λ(·) be the survival function and hazard function of ε, respectively, i.e.,

    S̄(t) ≡ 1 − F(t) and λ(t) ≡ −d(ln S̄(t))/dt = −(1/S̄(t)) dS̄(t)/dt.

Let S(·, x) denote the truncated conditional survival function of Y given X = x. That is, for t ≥ 0,

    S(t, x) ≡ P(Y > t | Y* > 0, X = x) = P(m(X) + σ(X)ε > t | X = x) / P(m(X) + σ(X)ε > 0 | X = x).

Further, it follows that

    S(t, x) = P(ε > (t − m(x))/σ(x) | X = x) / P(ε > −m(x)/σ(x) | X = x) = S̄((t − m(x))/σ(x)) / S̄(−m(x)/σ(x))    for all t ≥ 0,    (2.1)

where the second equality follows from the assumption that ε and X are independent. Equation (2.1)

[Footnote 4] In this section on identification, we suppress the observation subscript i for notational convenience.
[Footnote 5] If P(Y* > 0 | X = x) = 0, neither m(x) nor σ(x) can be identified.
provides the key link between the conditional survival function of Y and the survival function of the error term ε.

Consider two cases for m(x): (i) m(x) ≥ 0 and (ii) m(x) < 0. For case (i), by equation (2.1), we have

    S(σ(x)t + m(x), x) / S(m(x), x) = S̄(t)/S̄(0) and S(t, x0) = S̄(t)/S̄(0)    for all t ≥ 0.

Therefore,

    S(t, x0) = S(σ(x)t + m(x), x) / S(m(x), x)    for all t ≥ 0.    (2.2)

This suggests that when m(x) ≥ 0, the true parameter (m(x), σ(x)) must satisfy equation (2.2). Note that S(t, x) is always identified for t ≥ 0; thus, equation (2.2) forms the basis for the identification of (m(x), σ(x)) when m(x) ≥ 0. For case (ii), again by equation (2.1), we have

    S((t − m(x))/σ(x), x0) / S(−m(x)/σ(x), x0) = S̄((t − m(x))/σ(x)) / S̄(−m(x)/σ(x)) and S(t, x) = S̄((t − m(x))/σ(x)) / S̄(−m(x)/σ(x))    for all t ≥ 0.

Hence,

    S(t, x) = S((t − m(x))/σ(x), x0) / S(−m(x)/σ(x), x0)    for all t ≥ 0.    (2.3)

Equation (2.3) forms the basis for the identification of (m(x), σ(x)) when m(x) < 0. Combining equations (2.2) and (2.3) suggests that (m(x), σ(x)) must satisfy T(m(x), σ(x)) = 0, where

    T(m, σ) ≡ T1(m, σ) 1{m ≥ 0} + T2(m, σ) 1{m < 0},    (2.4)

    T1(m, σ) ≡ ∫_0^∞ [S(t, x0) − S(σt + m, x)/S(m, x)]² dt,
    T2(m, σ) ≡ ∫_0^∞ [S(t, x) − S((t − m)/σ, x0)/S(−m/σ, x0)]² dt,    (2.5)

and 1{·} is the usual indicator function. To formalize the discussion above, we make the following assumptions.

Assumption A1. (i) The data {Y, X} are generated based on equation (1.1) conditional on Y* > 0. (ii) ε is independent of X with a finite first moment. (iii) m(x0) = 0 and σ(x0) = 1, where x0 ∈ 𝒳 is a reference point with P(Y* > 0 | X = x0) > 0. (iv) The evaluation point x ∈ 𝒳 satisfies P(Y* > 0 | X = x) > 0 and σ(x) > 0. (v) lim_{t→∞} t S̄(t) = 0.

Define τ_x ≡ max{0, −m(x)/σ(x)}.

Assumption A2. The survival function of ε, S̄(·), satisfies the property that there does not exist a pair (δ, γ) with γ > 0 and (δ, γ) ≠ (0, 1) such that

    S̄(t)/S̄(0) = S̄(γt + δ)/S̄(δ)    for all t ∈ [τ_x, ∞).    (2.6)

Lemma 2.1. (Identification) Suppose that A1 and A2 hold. Then (m(x), σ(x)) is identified as the unique solution to T(m, σ) = 0.
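Lemma 2.1 can be illustrated numerically: under standard normal errors (which satisfy A2), one can build the population criterion from the closed-form survival functions and confirm that it vanishes only at the truth. The sketch below does this for an illustrative true value (m(x), σ(x)) = (1.0, 0.8), so that the m ≥ 0 branch T1 of (2.5) applies; the Riemann-sum approximation of the integral, the grids, and the tolerances are all arbitrary choices, and the paper's actual estimator replaces these population survival functions with kernel estimates.

```python
import math

def Phi_bar(t):
    """Standard normal survival function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def S(t, m, s):
    """Truncated conditional survival, equation (2.1), under normal errors."""
    return Phi_bar((t - m) / s) / Phi_bar(-m / s)

m0, s0 = 1.0, 0.8    # illustrative true (m(x), sigma(x)), with m(x) >= 0

def T1(m, s, dt=0.05, tmax=10.0):
    """Riemann-sum version of T1 in equation (2.5) for candidates with m >= 0.
    The data survival S(., x) is generated by the true parameters (m0, s0)."""
    total, t = 0.0, dt
    while t < tmax:
        diff = S(t, 0.0, 1.0) - S(s * t + m, m0, s0) / S(m, m0, s0)
        total += diff * diff * dt
        t += dt
    return total

cands = [(m0 + dm, s0 + ds)
         for dm in (-0.4, -0.2, 0.0, 0.2, 0.4)
         for ds in (-0.3, -0.15, 0.0, 0.15, 0.3)]
best = min(cands, key=lambda p: T1(p[0], p[1]))
assert best == (m0, s0)    # the truth minimizes T1 on this coarse grid
```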

In general, identification fails when A2 does not hold, and the lack of identification is not particular to any specific method. To see this, suppose that A1 holds and that there exists (δ, γ) ∈ ℝ² such that (δ, γ) ≠ (0, 1) and

    S̄(t)/S̄(0) = S̄(γt + δ)/S̄(δ)    for all t ∈ ℝ.    (2.7)

Then (m(x), σ(x)) and (m(x) + δσ(x), γσ(x)) are observationally equivalent; thus (m(x), σ(x)) is not identified.

Chen's identification result for the homoskedastic case is a special case of our Lemma 2.1, in which he imposes σ(x) = 1 and sets γ = 1. In particular, Chen imposes the condition that there does not exist any δ > 0 such that

    S̄(t)/S̄(0) = S̄(t + δ)/S̄(δ)    for all t ∈ [max{0, −m(x)}, ∞),    (2.8)

which corresponds to our equation (2.6) with γ = 1. Chen shows that the condition in equation (2.8) can be satisfied under more primitive conditions. Because we have two functions, m(·) and σ(·), to be identified, our condition in equation (2.6) is stronger than Chen's in equation (2.8). Note that when ε has a positive density on the real line, equation (2.6) in A2 is equivalent to

    λ(t) = γλ(γt + δ)    for all t ∈ [τ_x, ∞),

and equation (2.8) for Chen's homoskedastic case is equivalent to λ(t) = λ(t + δ) for all t ∈ [max{0, −m(x)}, ∞).

We now discuss the identification assumptions A1 and A2. A1(i) describes the data generating process for the truncated regression model with multiplicative heteroskedasticity. A1(ii) is the key independence assumption that is indispensable for identification. A1(iii) and (iv) impose conditions on the reference and evaluation points, respectively. A1(v) is a condition that guarantees that the integral in T(·, ·) is finite. A2 is an important assumption that is not innocuous; thus, it is worth providing a more detailed discussion. In fact, A2 can be violated if the error term ε follows certain special distributions. Below, we provide two concrete examples in which A2 is violated: ε has unbounded support from the right in Example 1 and bounded support from the right in Example 2.

Example 1.
Suppose ε has support [K, ∞), where K is finite or −∞. Its survival function S̄(·) can be written as

    S̄(t) = c3 (t + c2)^(−c1)    for all t ≥ τ_x,    (2.9)

where c1 > 0, c2 > 0, and c3 > 0 are some constants. Equivalently, its hazard function λ(·) can be written as

    λ(t) = c1/(t + c2)    for all t ≥ τ_x.

Then, for any γ > 0, we can find (δ, γ) such that δ = c2(γ − 1) and show that

    S̄(γt + δ)/S̄(δ) = (γt + δ + c2)^(−c1)/(δ + c2)^(−c1) = (γ(t + c2))^(−c1)/(γc2)^(−c1) = (t + c2)^(−c1)/c2^(−c1) = S̄(t)/S̄(0),

and

    γλ(γt + δ) = γc1/(γt + δ + c2) = γc1/(γ(t + c2)) = c1/(t + c2) = λ(t).

Therefore, this example violates A2. In this case, both (m(x), σ(x)) and (m(x) + δσ(x), γσ(x)) solve the equation T(m, σ) = 0; thus, (m(x), σ(x)) is not identified using our approach. Note that this example includes a location-scale family of Pareto distributions as a special case.

Example 2. Suppose that the support of ε is [K, K̄], where K < τ_x < K̄ and K̄ is finite. Further, its survival function S̄(·) can be written as

    S̄(t) = c2 (K̄ − t)^(c1) if τ_x ≤ t < K̄, and S̄(t) = 0 if t ≥ K̄,    (2.10)

where c1 > 0 and c2 > 0 are some constants. Its corresponding hazard function λ(·) can be written as

    λ(t) = c1/(K̄ − t)    if τ_x ≤ t < K̄.

For any γ > 0, find (δ, γ) such that δ = (1 − γ)K̄. Then,

    S̄(γt + δ)/S̄(δ) = (K̄ − γt − δ)^(c1)/(K̄ − δ)^(c1) = (γ(K̄ − t))^(c1)/(γK̄)^(c1) = S̄(t)/S̄(0)    if τ_x ≤ t < K̄,

and S̄(γt + δ)/S̄(δ) = 0 = S̄(t)/S̄(0) if t ≥ K̄. Thus, Example 2 violates A2. It can also be seen that

    γλ(γt + δ) = γc1/(K̄ − γt − δ) = γc1/(γ(K̄ − t)) = c1/(K̄ − t) = λ(t)    if τ_x ≤ t < K̄,

and λ(t) is not defined when t ≥ K̄. The uniform distribution is a special case of this example (c1 = 1).

The next two lemmas essentially show that if we rule out these two kinds of special distributions in Examples 1 and 2, A2 holds.

[Footnote 6] Suppose E follows a Pareto distribution on [E0, ∞) with E0 > 0, i.e., P(E > t) = (E0/t)^(c1) for all t ≥ E0, where c1 > 0. Consider ε = aE − c2, where a > 0 and c2 > 0. Then the survival function of ε is S̄(t) = c3 (t + c2)^(−c1) for all t ≥ aE0 − c2, where c3 = (aE0)^(c1).

Lemma 2.2. (The case of ε having unbounded support from the right)

(i) Suppose that ε has support [K, ∞), where K is finite or −∞, and lim_{t→∞} t^r S̄(t) exists (finite or infinite) for all r > 0. Further, suppose the survival function S̄(·) cannot be written as in equation (2.9). Then, equation (2.6) holds only if γ = 1.

(ii) Suppose that, in addition, λ(·) is not a periodic function, in the sense that there does not exist any δ > 0 with λ(t) = λ(t + δ) for all t ∈ [τ_x, ∞). Then, equation (2.6) holds only if (δ, γ) = (0, 1); that is, A2 holds.

Lemma 2.2(i) states that if the distribution of the error term does not take the particular form of equation (2.9), then σ(x) is identified. Once σ(x) is identified, the identification of m(x) is identical to that in Chen; Lemma 2.2(ii) thus follows directly from Chen. The additional condition in Lemma 2.2(ii) can be satisfied under more primitive conditions. For example, if X is continuously distributed, m(·) is continuous over the support of X, and λ(·) is not constant on [τ_x, ∞), then m(x) is identified (Chen, Lemma 1(iii)).

Lemma 2.3. (The case of ε having bounded support from the right) Suppose that ε has bounded support [K, K̄], where K < τ_x < K̄, K is finite or −∞, and K̄ is finite, and lim_{t↑K̄} (K̄ − t)^(−r) S̄(t) exists (finite or infinite) for all r > 0. Further, suppose S̄(·) cannot be written as in equation (2.10). Then, equation (2.6) holds only if (δ, γ) = (0, 1); that is, A2 holds.

Lemma 2.3 shows that if the error term has bounded support from the right and its distribution does not satisfy equation (2.10), then both m(x) and σ(x) are identified.

Remark 2.1. Consider the general normalization m(x0) = c1 and σ(x0) = c2, where c1 and c2 are two known constants. Then equation (1.1) can be written as Y* = m̃(X) + σ̃(X)ε̃, where m̃(x) ≡ m(x) − (c1/c2)σ(x), σ̃(x) ≡ σ(x)/c2, and ε̃ ≡ c2ε + c1. By construction, m̃(x0) = 0 and σ̃(x0) = 1. Thus, m̃(·) and σ̃(·) are identified as discussed above. Therefore, m(·) and σ(·) are identified as

    (m(x), σ(x))′ = C (m̃(x), σ̃(x))′, where C = [1, c1; 0, c2].

Remark 2.2.
Chen assumes homoskedasticity and proposes an estimator of m(x) based on the following two equations:

    S(t, x0) = S(t + m(x), x)/S(m(x), x)    for all t ≥ 0 if m(x) ≥ 0,    (2.11)

and

    S(t, x) = S(t − m(x), x0)/S(−m(x), x0)    for all t ≥ 0 if m(x) < 0.    (2.12)

That is, σ(x) is forced to be 1 in our equations (2.2) and (2.3). In the presence of heteroskedasticity, equations (2.11) and (2.12) are not necessarily true. Thus, Chen's estimator of m(x) is in general inconsistent in the presence of heteroskedasticity.

3 Estimation

We consider nonparametric estimation of the location function m(x) and the heteroskedasticity scale function σ(x) in this section. Our estimator of (m(x), σ(x)) is based on the identification result that (m(x), σ(x)) is the unique minimizer of T(m, σ) defined in equation (2.4). The objective function for our estimation is constructed by replacing the various elements in the expression of T(m, σ) with their consistent estimators. Specifically, our estimator (m̂(x), σ̂(x)) is defined as the minimizer of the following function:

    T_n(m, σ) = T_{n1}(m, σ) w(m) + T_{n2}(m, σ) w^c(m)

over (m, σ) in a compact parameter space M ⊂ ℝ², where w(m) is a smoothed version of the indicator function 1{m ≥ 0}, w^c(m) = 1 − w(m), and T_{n1}(m, σ) and T_{n2}(m, σ) are the sample analogues of T1(m, σ) and T2(m, σ) (equation (2.5)), respectively, with some minor adjustments for technical reasons. Specifically,

    T_{n1}(m, σ) = ∫ [S_n(t, x0)/S_n(ε0, x0) − S_n(σt + m, x)/S_n(σε0 + m, x)]² w1(t) dt

and

    T_{n2}(m, σ) = ∫ [S_n(t, x)/S_n(ε0, x) − S_n((t − m)/σ, x0)/S_n((ε0 − m)/σ, x0)]² w1(t) dt,

where ε0 is a small positive constant, w1(·) is a non-negative weighting function, and

    S_n(t, x) = Σ_{i=1}^n K_y((Y_i − t)/h_y) k((X_i − x)/h) / Σ_{i=1}^n k((X_i − x)/h)

is a smoothed version of the nonparametric estimator of S(t, x) based on the random sample {Y_i, X_i : i = 1, 2, ..., n}; here k(·) is a kernel function with bandwidth h, and K_y(t) = ∫_{−∞}^t k_y(v) dv is a smoothed step function with smoothing parameter h_y.

To study the large-sample properties of our estimator, we make the following assumptions.

Assumption B1. {Y_i, X_i : i = 1, 2, ..., n} is a random sample from the distribution of (Y*, X) conditional on Y* > 0 with P0 ≡ P(Y* > 0) > 0. The error term ε has a positive density on the real line.

Assumption B2.
(i) The density function of X, p(·), the regression function m(·), and the positive heteroskedasticity scale function σ(·) are continuously differentiable up to order q in some neighborhoods of x and x0, with both p(x) and p(x0) being positive; the cumulative distribution function of ε, F(·), is continuously differentiable up to order q̄ = max{q, q_y}, for q_y to be specified below; and

these derivatives are continuously bounded. (ii) The parameter (m(x), σ(x)) is an interior point of the compact parameter space M ⊂ ℝ². (iii) The weight function w(·) is twice continuously differentiable and satisfies 0 ≤ w(m) ≤ 1, w(m) = 0 for m ≤ −ε0/2, and w(m) = 1 for m ≥ ε0/2. (iv) The weight function w1(·) is continuous, non-negative, and integrable, with w1(t) = 0 for t ∈ [0, ε0] and w1(t) > 0 for t > ε0.

Assumption B3. Let k_y(t) = dK_y(t)/dt. The kernel function k(·) is continuously differentiable, and k_y(·) is twice continuously differentiable, both with bounded support. Moreover, they are, respectively, q-th and q_y-th order kernel functions, i.e.,

    ∫ v^j k(v) dv = 1 if j = 0, = 0 if 0 < |j| < q, and = B_k if |j| = q;
    ∫ u^r k_y(u) du = 1 if r = 0, = 0 if 0 < r < q_y, and = B_{k_y} if r = q_y,

where v = (v_1, ..., v_d)′, j = (j_1, ..., j_d)′, |j| = Σ_{i=1}^d j_i, and v^j = v_1^{j_1} ··· v_d^{j_d}.

Assumption B4. The sequence of bandwidths (h, h_y) satisfies nh^d h_y^{3/2}/ln n → ∞, h_y^{q_y} = o(h^q), h^{2q}/h_y = o(1), and nh^{d+2q} = O(1) as n → ∞.

B1-B4 are standard in the nonparametric estimation literature and almost identical to those in Chen. B1 assumes that the cross-sectional data constitute a random sample. B2 mainly imposes smoothness and boundedness conditions. B3 and B4 specify conditions on the kernel functions and the rates of the bandwidths, respectively. For example, when d = 1 and second-order kernels (q = q_y = 2) are used, the bandwidths can be chosen as h ∝ n^{−1/5} and h_y ∝ n^{−1/3}.

Theorem 3.1. Under Assumptions A1-A2 and B1-B4, m̂(x) and σ̂(x) are consistent estimators of m(x) and σ(x), respectively, and asymptotically normally distributed:

    √(nh^d) [(m̂(x), σ̂(x))′ − (m(x), σ(x))′ − h^q b(x)] →d N(0, Ω(x)),

where b(x) and Ω(x) are defined in the proof. For the purpose of conducting statistical inference, we can estimate the asymptotic variance by replacing its various elements with their sample analogues.

Remark 3.1.
With the general normalization m(x0) = c1 and σ(x0) = c2, (m(x), σ(x)) can be estimated by

    (m̂*(x), σ̂*(x))′ = C (m̂(x), σ̂(x))′, where C = [1, c1; 0, c2].

It is easy to show that

    √(nh^d) [(m̂*(x), σ̂*(x))′ − (m(x), σ(x))′ − C h^q b(x)] →d N(0, C Ω(x) C′).

Remark 3.2. Here, we consider the case in which the regressor X is continuous. Discrete regressors can be easily accommodated. Specifically, suppose that X = (X^c, X^d), where X^c is continuous and X^d is discrete. To estimate S(t, x) = S(t, (x^c, x^d)), we can simply stratify the sample by each distinct discrete outcome; that is, S(t, (x^c, x^d)) can be estimated using the data with X^d = x^d. More sophisticated methods, such as smoothing across the discrete outcomes, can also be applied (see, e.g., Li and Racine (2003)).

4 Monte Carlo Simulations

In this section, we examine the finite-sample performance of our estimator.

4.1 Data generating processes and implementation

We consider four data generating processes (DGPs). In all four DGPs, Y* = m(X) + σ(X)ε, where m(x) = 0.8 + x. The heteroskedasticity functions are chosen as follows:

DGP 1: σ(x) = 0.6(1 + x²);
DGP 2: σ(x) = 0.6 exp(0.4x²);
DGP 3: σ(x) = 0.6/(0.1 + x²);
DGP 4: σ(x) = 0.6/(0.1 + |x|).

In all four DGPs, X is uniformly distributed on [−1, 2], and ε is independent of X. For each DGP, we consider three distributions for ε: (i) the standard normal distribution, (ii) the logistic distribution standardized to have unit variance, and (iii) the distribution of (e − 1.2)/0.865, where e is a mixture of transformed standard exponential random variables e_a and e_b. The specifications of m(·) and ε are exactly the same as those in Chen. Figures 1(a)-1(d) show the heteroskedasticity functions for the four DGPs, respectively.

In all the simulations, we choose the reference point x0 = 0. We impose the normalization m(x0) = 0.8 for all four DGPs, and σ(x0) = 0.6 for DGPs 1 and 2 and σ(x0) = 6 for DGPs 3 and 4. (See Remarks 2.1 and 3.1 for how to handle the general normalization.) We choose the evaluation points x = −0.9, −0.6, −0.3, 0.3, 0.6, 0.9, 1.0, 1.3, 1.6, and 1.9 in the support of X, [−1, 2]. The weight function is chosen as w1(t) = 1{0 ≤ t < y_{0.975}}, where y_{0.975} is the 0.975th sample quantile of {Y_i}. We let w(m) = 1{m ≥ 0} and ε0 = 0. We adopt the standard normal density function as the kernel function k(·). The bandwidth h is set to h = c s_x n^{−1/5}, where s_x is the sample standard deviation of {X_i} and c = 1.06 according to Silverman's (1986) rule of thumb. Different values of the constant c, such as 0.6, 0.8, and 1.0, produce similar results; thus they are not reported here. It appears that the performance

[Footnote 7] The results for these different values of c are available upon request.
The weight function is chosen as w (t) = f t < y :975 g; where y :975 is the :975th sample quantile of fy i g. We let w(m) = fm g and " =. We adopt the standard normal density function as the kernel function, k(): The bandwidth h is set to h = c s x n =5 where s x is the sample standard deviation of fx i g and c = :6 according to Silverman s (986) rule-of-thumb. Di erent values of the constant c; such as :6; :8; and :; produce similar results, thus they are not reported here. 7 It appears that the performance 7 The results for these di erent values of c are available upon request.

of our estimator is not very sensitive to the choice of bandwidth. The same phenomenon is also reported for Chen's estimator in the case of homoskedasticity. For each DGP, we report the bias, standard deviation (SD), and root mean squared error (RMSE) of the estimators of m(x) and σ(x). For m(x), we compare our estimator with Chen's estimator, which ignores heteroskedasticity. We consider the sample sizes n = 400 and n = 800. The number of replications is 800.

[Figure 1: Heteroskedasticity functions. Panels (a)-(d) plot σ(x) for DGPs 1-4, respectively, over x ∈ [−1, 2].]

4.2 Simulation results

The simulation results for DGP 1 are reported in Tables 1 and 2 for n = 400 and n = 800, respectively. As shown in Figure 1(a), the range of the heteroskedasticity function σ(·) is [0.6, 3]. In general, our estimator performs well and substantially outperforms Chen's estimator. In terms of RMSE, our estimator has a much smaller value than Chen's at most of the evaluation points, except those close to 0. As expected, Chen's estimator has a much larger bias than ours in the presence of heteroskedasticity. For example, when ε is a standard normal random variable and n = 400, at the evaluation point x = 1, the bias of Chen's estimator (.55) is about 8 times as large as that of ours (−.58). When x is close to 0, say in the area [−0.3, 0.3], the bias of Chen's estimator is relatively small, which is not surprising. Essentially, both our estimator of (m(x), σ(x)) and Chen's estimator of m(x) rely only on the information in two local areas: the area around the evaluation point x and that around the reference point x0. Intuitively, Chen's estimator, which ignores heteroskedasticity, performs well if the following three conditions are met: (i) σ(·) is flat around the evaluation point x; (ii) σ(·) is flat around the reference

point x₀; and (iii) σ(x) is close to σ(x₀). It turns out that when x ∈ [−0.3, 0.3] and x₀ = 0, these three conditions approximately hold. As shown in Figure 1(a), the heteroskedasticity function is almost flat on x ∈ [−0.3, 0.3], with a range of [0.6, 0.654]. Hence, ignoring the heteroskedasticity in this area does not result in a large bias.

Tables 3 and 4 present the results for DGP 2 for n = 4 and n = 8, respectively. The range of the heteroskedasticity function in this case is [0.6, 2.97], similar to that in DGP 1. For x ∈ [−0.6, 0.6], σ(x) is quite flat, with a range of [0.6, 0.693]. Again, our estimator performs much better than Chen's in the region of large heteroskedasticity.

Tables 5 and 6 report the simulation results for DGP 3. As shown in Figure 1(c), the range of the heteroskedasticity function is quite large: [0.5, 6]. In this DGP, our estimator is much better than Chen's at all evaluation points. For example, when ε is standard normally distributed and n = 4, at x = 0.9, the bias, standard deviation, and RMSE of Chen's estimator are about 43, 4, and 5 times as large as those of ours, respectively. In contrast to the two previous DGPs, Chen's estimator is much worse than ours even in the area where the heteroskedasticity function is relatively flat, say around x = 0.9. As explained above, Chen's estimator works well when the three conditions are met. In this case, conditions (ii) and (iii) are clearly not satisfied: in the neighborhood of the reference point x₀ = 0 the heteroskedasticity function varies substantially, and σ(x) is quite different from σ(x₀) when x is close to 0.9. This suggests that when the heteroskedasticity function varies substantially over the support of X, the performance of Chen's estimator is in general poor, even in regions where the heteroskedasticity function is locally flat.

Tables 7 and 8 report the simulation results for DGP 4.
As shown in Figure 1(d), the range of the heteroskedasticity function σ(·) is also quite wide: [0.3, 6]. Strictly speaking, Assumption B2(i) is not satisfied here, as the heteroskedasticity function σ(·) is not differentiable at x = 0. Even so, our estimator performs well. As in DGP 3, our estimator substantially outperforms Chen's at all evaluation points. This DGP confirms that Chen's estimator performs poorly when the heteroskedasticity function has substantial global variation.

5 Discussion and Extensions

5.1 The error distribution

Our discussion so far has focused entirely on the identification and estimation of (m(·), σ(·)). Another object of interest is the distribution of the error term ε, F(·). We now briefly discuss how to identify and estimate F(·). Let

H(u) ≡ E( Y/σ(X) | m(X)/σ(X) = u, Y > 0 ) = E( Y/σ(X) | m(X)/σ(X) = u ),

and let H′(·) be its derivative. Note that given the identification of m(·) and σ(·), both H(·) and H′(·) are identified. The next lemma shows how to identify F(·).

Lemma 5.1 Suppose that Assumptions A1 and A2 hold. Then

F(e) = 1 − exp( − ∫ from −e to ∞ of [1 − H′(u)] / H(u) du ).   (5.1)

LL provide an alternative identification result for F(·) in the case of homoskedasticity. Based on our constructive identification result for F(·), we can propose a corresponding estimator. Let Ĥ(·) be the nonparametric regression of Yᵢ/σ̂(Xᵢ) on m̂(Xᵢ)/σ̂(Xᵢ), and let Ĥ′(·) be its derivative. The estimator is then the sample analogue of equation (5.1):

F̂(e) = 1 − exp( − ∫ from −e to ∞ of [1 − Ĥ′(u)] / Ĥ(u) du ).

5.2 Specification testing using the truncated data

Our estimation method provides a foundation for various model specification tests. First, we may want to test the hypothesis of homoskedasticity. Under the null, σ(x) = σ₀ for some σ₀ > 0 and all x ∈ X. With our normalization σ(x₀) = 1, the null becomes σ(x) = 1 for all x ∈ X. Simply inspecting the estimates of σ(·) at different evaluation points provides useful information about whether homoskedasticity is plausible. Alternatively, following Härdle and Mammen (1993), we can base a test statistic on the L₂ distance between the estimators under the null and under the alternative. That is, we test the null hypothesis E[(σ(X) − 1)² a(X)] = 0, where a(·) is a positive weight function. Its sample analogue can be used as a test statistic:

T_n1 = (1/n) Σᵢ₌₁ⁿ (σ̂(Xᵢ) − 1)² a(Xᵢ).

We may also be interested in whether the location function m(·) satisfies certain parametric restrictions. Hence, we test the hypothesis m(x) = m(x, θ₀), where m(·, ·) is a function known up to the finite-dimensional parameter θ₀. We can again construct an L₂-distance-based test statistic:

T_n2 = (1/n) Σᵢ₌₁ⁿ [m̂(Xᵢ) − m(Xᵢ, θ̂)]² a(Xᵢ),

where θ̂ is a parametric estimator of θ₀ and a(·) is a positive weight function. Given the asymptotic theory for (m̂(·), σ̂(·)), the asymptotic distributions of T_n1 and T_n2 can be established.
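The homoskedasticity statistic is simple to compute once an estimate of σ(·) is available. Below is a minimal sketch; the estimator sigma_hat and the weight a(·) are hypothetical placeholders, not the kernel estimator of Section 3:

```python
import numpy as np

def homoskedasticity_stat(sigma_hat, X, a=None):
    """L2-type statistic: n^{-1} * sum_i (sigma_hat(X_i) - 1)^2 * a(X_i).

    sigma_hat : callable evaluating the estimated heteroskedasticity
                function (normalized so that sigma(x0) = 1).
    a         : optional positive weight function; defaults to a(x) = 1.
    """
    X = np.asarray(X, dtype=float)
    w = np.ones(len(X)) if a is None else a(X)
    return float(np.mean((sigma_hat(X) - 1.0) ** 2 * w))

# Under exact homoskedasticity (sigma_hat identically 1) the statistic is 0.
X = np.linspace(-1.0, 2.0, 50)
T_n1 = homoskedasticity_stat(lambda x: np.ones(len(x)), X)
```

Large values of the statistic relative to its null distribution (obtained, for instance, by a bootstrap) would point toward heteroskedasticity.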
We leave the detailed analysis for future work.
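Separately, the constructive formula for F(·) in Lemma 5.1 can be checked numerically. The sketch below assumes the reading H(u) = E[Y/σ(X) | m(X)/σ(X) = u, Y > 0], which for standard normal ε has the closed form H(u) = u + φ(u)/Φ(u); the recovered distribution should then coincide with Φ:

```python
import numpy as np
from math import erf, sqrt, pi

Phi = np.vectorize(lambda u: 0.5 * (1.0 + erf(u / sqrt(2.0))))  # N(0,1) cdf

def phi(u):
    """N(0,1) density."""
    return np.exp(-u * u / 2.0) / sqrt(2.0 * pi)

def H(u):
    # Assumed reading of the lemma's H: for standard normal errors,
    # H(u) = u + E[eps | eps > -u] = u + phi(u) / Phi(u) (Mills ratio).
    return u + phi(u) / Phi(u)

def F_recovered(e, upper=12.0, n=20001):
    """F(e) = 1 - exp(-int_{-e}^{inf} (1 - H'(u)) / H(u) du), tail cut at `upper`."""
    u = np.linspace(-e, upper, n)
    d = 1e-5                             # step for the central-difference H'
    integrand = (1.0 - (H(u + d) - H(u - d)) / (2.0 * d)) / H(u)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u))
    return 1.0 - np.exp(-integral)
```

For standard normal errors, F_recovered(e) agrees with Φ(e) up to numerical integration error, consistent with the identification argument.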

5.3 Conditional independence assumption

We may relax the independence assumption to conditional independence. Specifically, let X ≡ (X₁, X₂). Then equation (1.1) becomes

Y* = m(X₁, X₂) + σ(X₁, X₂) ε.

We assume that X₁ and ε are conditionally independent given X₂. This setup is similar to that in LL and Chen, except that σ(X₁, X₂) here depends on both X₁ and X₂, whereas it depends only on X₂ in LL and Chen. Fixing a reference point x₁⁰ for X₁, we can show that when m(x₁, x₂) − m(x₁⁰, x₂) σ(x₁, x₂)/σ(x₁⁰, x₂) ≥ 0,

S̄(t; x₁⁰, x₂) = S̄( [σ(x₁, x₂)/σ(x₁⁰, x₂)] (t − m(x₁⁰, x₂)) + m(x₁, x₂); x₁, x₂ ) / S̄( m(x₁, x₂) − [σ(x₁, x₂)/σ(x₁⁰, x₂)] m(x₁⁰, x₂); x₁, x₂ ),

and when m(x₁, x₂) − m(x₁⁰, x₂) σ(x₁, x₂)/σ(x₁⁰, x₂) < 0,

S̄(t; x₁, x₂) = S̄( [σ(x₁⁰, x₂)/σ(x₁, x₂)] (t − m(x₁, x₂)) + m(x₁⁰, x₂); x₁⁰, x₂ ) / S̄( m(x₁⁰, x₂) − [σ(x₁⁰, x₂)/σ(x₁, x₂)] m(x₁, x₂); x₁⁰, x₂ ).

Consequently, we can identify and estimate m(x₁, x₂) − m(x₁⁰, x₂) σ(x₁, x₂)/σ(x₁⁰, x₂) and σ(x₁, x₂)/σ(x₁⁰, x₂) in a similar manner as in Sections 2 and 3.

5.4 Censored regression models

We can also consider censored regression models. Suppose that the observable dependent variable is generated as

Y = max{Y*, 0} = max{m(X) + σ(X) ε, 0}.   (5.2)

We can show that for all t ≥ 0,

S( σ(x) t + m(x); x ) = S(t; x₀) if m(x) ≥ 0, and S( (t − m(x))/σ(x); x₀ ) = S(t; x) if m(x) < 0.

Hence, the true parameter (m(x), σ(x)) must satisfy T̃(m(x), σ(x)) = 0, where

T̃(m, σ) ≡ ∫ [ S(t; x₀) − S(σt + m; x) ]² dt · 1{m ≥ 0} + ∫ [ S(t; x) − S( (t − m)/σ; x₀ ) ]² dt · 1{m < 0}.

The estimator can be constructed as the minimizer of the sample analogue of T̃(m, σ):

T̃ₙ(m, σ) = ∫ [ Sₙ(t; x₀) − Sₙ(σt + m; x) ]² w₁(t) dt · w(m) + ∫ [ Sₙ(t; x) − Sₙ( (t − m)/σ; x₀ ) ]² w₁(t) dt · wᶜ(m),

where Sₙ(·), w₁(·), w(·), and wᶜ(·) are the same as those defined above. The asymptotic theory for these estimators can be established similarly. Several nonparametric estimators for censored regression models already exist in the literature: LL consider the homoskedastic case, while Fan and Gijbels (1994), Van Keilegom and Akritas (1999), and Chen, Dahl, and Khan (2005) provide various estimators of (m(·), σ(·)) for the heteroskedastic case, as in equation (5.2). The estimator discussed here can be viewed as an alternative approach.

5.5 Panel data case

Chen also considers truncated nonparametric regression with panel data. Specifically, he considers the following equation for the latent variable Y*ₜ:

Y*ₜ = α + m(Xₜ) + εₜ for t = 1, 2,   (5.3)

where α is a fixed effect and εₜ is the idiosyncratic error term. For simplicity, we consider only two time periods. The key identification assumption is the conditional pairwise exchangeability condition that (ε₁, ε₂) and (ε₂, ε₁) are identically distributed conditional on (X₁, X₂, α) (see, e.g., Honoré (1992) and Honoré and Kyriazidou (2000)). Strictly speaking, Chen's estimator allows a certain form of heteroskedasticity, as εₜ can depend on α or on time-invariant elements of Xₜ. Naturally, we may extend equation (5.3) to

Y*ₜ = α + m(Xₜ) + σ(Xₜ) εₜ for t = 1, 2,   (5.4)

where σ(·) is the heteroskedasticity function. However, it turns out that identification of equation (5.4) is difficult. The main reason is that the conditional pairwise exchangeability condition does not imply that (Y₁/σ(X₁), Y₂/σ(X₂)) given (X₁ = x₁, X₂ = x₂) is distributed symmetrically around the 45-degree line through (m(x₁)/σ(x₁), m(x₂)/σ(x₂)), which is the key to achieving identification in equation (5.4). Alternatively, we may consider the following equation:

Y*ₜ = m(Xₜ) + σ(Xₜ)(α + εₜ) for t = 1, 2.   (5.5)

This equation requires that the fixed effect and the idiosyncratic error term εₜ share the same heteroskedasticity function.
Admittedly, this is restrictive. Thus, we only briefly discuss how to identify and estimate equation (5.5). Define ε*ₜ ≡ α + εₜ, t = 1, 2. Let 𝒳ᵖ be the support of X ≡ (X₁, X₂). Suppose that P(Y*₁ > 0, Y*₂ > 0 | X₁ = x₁, X₂ = x₂) > 0 for any (x₁, x₂) ∈ 𝒳ᵖ. For any s₁ > 0, s₂ > 0, and (x₁, x) ∈ 𝒳ᵖ, define the joint conditional survival function of (Y₁, Y₂) under truncation:

G(s₁, s₂; x₁, x) = P( Y₁ > s₁, Y₂ > s₂ | X₁ = x₁, X₂ = x )
  = S^p( (s₁ − m(x₁))/σ(x₁), (s₂ − m(x))/σ(x); x₁, x ) / S^p( −m(x₁)/σ(x₁), −m(x)/σ(x); x₁, x ),   (5.6)

where S^p(s₁, s₂; x₁, x) = P(ε*₁ > s₁, ε*₂ > s₂ | X₁ = x₁, X₂ = x) is the conditional joint survival function of (ε*₁, ε*₂) given X₁ = x₁ and X₂ = x. We impose the same normalization as in the cross-sectional case: m(x₀) = 0 and σ(x₀) = 1. Then, under conditional pairwise exchangeability, equation (5.6) implies that for m(x) ≥ 0,

G( s₁, σ(x) s₂ + m(x); x₀, x ) = G( s₂, σ(x) s₁ + m(x); x₀, x ),

and for m(x) < 0,

G( (s₁ − m(x))/σ(x), s₂; x₀, x ) = G( (s₂ − m(x))/σ(x), s₁; x₀, x ).

For any (m, σ) in the parameter space M ⊂ R², define two location-and-scale-shifted versions of G(s₁, s₂; x₀, x):

G¹_{m,σ}(s₁, s₂; x₀, x) ≡ G(s₁, σ s₂ + m; x₀, x) and G²_{m,σ}(s₁, s₂; x₀, x) ≡ G( (s₁ − m)/σ, s₂; x₀, x ).

It is easy to show that G¹_{m(x),σ(x)}(s₁, s₂; x₀, x) is pairwise exchangeable in (s₁, s₂) if m(x) ≥ 0, and that G²_{m(x),σ(x)}(s₁, s₂; x₀, x) is pairwise exchangeable in (s₁, s₂) if m(x) < 0. Define

Tᵖ(m, σ) ≡ Tᵖ¹(m, σ) 1{m ≥ 0} + Tᵖ²(m, σ) 1{m < 0},

where

Tᵖ¹(m, σ) ≡ ∫∫ [ G¹_{m,σ}(s₁, s₂; x₀, x) − G¹_{m,σ}(s₂, s₁; x₀, x) ]² w₁(s₁) w₁(s₂) ds₁ ds₂

and

Tᵖ²(m, σ) ≡ ∫∫ [ G²_{m,σ}(s₁, s₂; x₀, x) − G²_{m,σ}(s₂, s₁; x₀, x) ]² w₁(s₁) w₁(s₂) ds₁ ds₂.

Then the true parameter (m(x), σ(x)) must satisfy Tᵖ(m(x), σ(x)) = 0. Under conditions similar to those in the cross-sectional case, we can show that (m(x), σ(x)) is the unique solution of Tᵖ(m, σ) = 0; thus identification is achieved. For the estimation of (m(x), σ(x)), suppose that {(Yᵢₜ, Xᵢₜ): i = 1, 2, …, n; t = 1, 2} is a random sample generated from the conditional distribution of {(Yₜ, Xₜ): t = 1, 2} given the event {Y₁ > 0 and Y₂ > 0}. We define our nonparametric estimator (m̂ₚ(x), σ̂ₚ(x)) of (m(x), σ(x)) as the minimizer of the objective function

Tᵖₙ(m, σ) = Tᵖ¹ₙ(m, σ) w(m) + Tᵖ²ₙ(m, σ) wᶜ(m),

where

Tᵖ¹ₙ(m, σ) = ∫∫ [ Gₙ(s₁, σ s₂ + m; x₀, x) − Gₙ(s₂, σ s₁ + m; x₀, x) ]² w₁(s₁) w₁(s₂) ds₁ ds₂

and

Tᵖ²ₙ(m, σ) = ∫∫ [ Gₙ( (s₁ − m)/σ, s₂; x₀, x ) − Gₙ( (s₂ − m)/σ, s₁; x₀, x ) ]² w₁(s₁) w₁(s₂) ds₁ ds₂,

with

Gₙ(s₁, s₂; x₀, x) = (1/(n h^{2d})) Σᵢ₌₁ⁿ K_y( (Yᵢ₁ − s₁)/h_y ) K_y( (Yᵢ₂ − s₂)/h_y ) k( (Xᵢ₁ − x₀)/h ) k( (Xᵢ₂ − x)/h ).

Similar to the cross-sectional case, we can establish

√(n hᵈ) [ (m̂ₚ(x) − m(x), σ̂ₚ(x) − σ(x))′ − h^q Bₚ(x) ] →d N(0, Ωₚ(x)),

where Bₚ(x) and Ωₚ(x) are the bias term and the asymptotic variance, respectively.⁸

6 Conclusion

In this paper, we establish conditions for the nonparametric identification of the regression and heteroskedasticity functions in truncated regression models with unknown heteroskedasticity. Building on the constructive identification results, a kernel-based estimation procedure is proposed and shown to have desirable asymptotic properties. Our Monte Carlo simulations demonstrate that, in the presence of heteroskedasticity, our new estimator performs much better than the existing method that ignores the heteroskedasticity.

⁸ The specific formulas for Bₚ(x) and Ωₚ(x) are available upon request.
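For completeness, the Monte Carlo design of Section 4 can be sketched in a few lines. Truncated samples are drawn by rejection on {Y > 0}, and the bandwidth follows the rule of thumb h = 1.06·s_x·n^(−1/5) used in Section 4.1; the location and scale functions below are hypothetical stand-ins, not the paper's DGPs:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_truncated_sample(n, m, sigma):
    """Draw (y_i, x_i) from Y = m(X) + sigma(X)*eps given Y > 0 by rejection
    sampling, with eps ~ N(0, 1) and X ~ Uniform[-1, 2]."""
    xs, ys = [], []
    while len(ys) < n:
        x = rng.uniform(-1.0, 2.0)
        y = m(x) + sigma(x) * rng.standard_normal()
        if y > 0:                 # keep only observations passing truncation
            xs.append(x)
            ys.append(y)
    return np.array(ys), np.array(xs)

# Hypothetical location/scale functions (placeholders for the DGPs of Figure 1).
m = lambda x: 1.0 + x
sigma = lambda x: 0.6 + 0.8 * x ** 2

y, x = draw_truncated_sample(400, m, sigma)

# Rule-of-thumb bandwidth paired with a standard normal kernel k.
h = 1.06 * x.std(ddof=1) * len(x) ** (-1.0 / 5.0)
```

Rejection sampling mimics the sampling scheme of the model: only draws with Y > 0 enter the sample, so the retained (y, x) pairs follow the truncated conditional distribution.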

Table : DGP, n = 4 Our estimator Our estimator Chen s estimator x (x) Bias SD RMSE m (x) Bias SD RMSE Bias SD RMSE " standard normal -.9.86 -.26.324.383 -..275.644.7.766.36.846 -.6.86 -.62.266.274.2.66.523.527.293.93.35 -.3.654.8.227.228.5 -.59.445.449.67.38.53.3.654.35.97.2. -.36.334.336.52.48.57.6.86.45.257.26.4 -.54.44.47.248.249.35.9.86.22.325.325.7 -.66.52.56.84.5.955..2.3.355.355.8 -.58.573.575.55.565.96.3.64 -.4.442.442 2. -.39.699.699.668.477.735.6 2.36 -.83.547.553 2.4.27.824.823.625.48.676.9 2.766 -.32.62.69 2.7.93.969.973.29.575.347 " logistic -.9.86 -.7.38.352 -..24.6.637.728.43.845 -.6.86 -.27.267.268.2 -.5.474.474.257.28.337 -.3.654.22.23.24.5 -.63.374.379.46.3.39.3.654.47.83.89. -.54.258.263.7.3.4.6.86.26.237.238.4 -.37.35.37.99.35.372.9.86.38.32.34.7 -.7.384.39.772.589.97..2.29.328.329.8 -.68.442.447.32.649.29.3.64 -.6.43.43 2. -.44.562.563.5.542.596.6 2.36 -.34.544.545 2.4 -.75.62.625.54.59.587.9 2.766 -.29.65.72 2.7..87.86.87.729.393 " mixture -.9.86.23.445.445 -..98.44.425.364.698.787 -.6.86 -.3.244.245.2 -.64.87.97 -..263.263 -.3.654.39.74.78.5 -.9.9.2.2.69.69.3.654 -.8.24.24. -.5.52.52 -.2.43.44.6.86 -.67.44.59.4 -.3.7.72 -.32.7..9.86 -.49.89.24.7 -.28.76.8.8.398.398..2 -.8.22.27.8 -.29.93.98.48.48.482.3.64 -.298.259.394 2. -.49.29.38.45.722.735.6 2.36 -.44.382.563 2.4 -.68.49.63.26.66.672.9 2.766 -.728.59.889 2.7 -.42.25.249.54.773.774 9

Table 2: DGP, n = 8 Our estimator Our estimator Chen s estimator x (x) Bias SD RMSE m (x) Bias SD RMSE Bias SD RMSE " standard normal -.9.86 -.39.283.36 -..66.69.63.759.39.823 -.6.86 -.2.232.232.2 -.3.497.497.33.38.333 -.3.654.4.82.82.5 -.5.358.362.64.8.26.3.654.4.44.45. -.2.223.224.33.94.99.6.86.22.9.92.4 -.4.294.297.28.8.276.9.86.3.233.233.7 -.34.36.36.775.435.888..2.8.26.26.8 -.47.4.44.72.54.89.3.64 -.2.343.343 2. -.54.54.543.769.4.84.6 2.36 -..45.45 2.4 -.56.72.74.72.296.747.9 2.766 -.236.549.597 2.7.4.83.83.353.395.4 " logistic -.9.86 -.27.26.289 -..96.54.549.76.378.8 -.6.86..27.27.2 -.76.44.42.247.37.282 -.3.654.7.65.66.5 -.58.297.32.45.92.2.3.654.5.3.32. -.6.56.57.2.88.9.6.86.6.56.57.4 -.27.76.78.5.86.24.9.86.9.29.29.7 -.35.242.245.759.592.963..2..228.228.8 -.33.29.292.3.64.84.3.64 -.23.36.37 2. -.45.37.373.58.489.655.6 2.36 -.22.428.428 2.4 -.77.489.494.639.44.69.9 2.766 -.93.558.59 2.7 -.66.64.643.323.52.45 " mixture -.9.86.39.346.347 -..2.33.352.24.73.76 -.6.86 -.32.75.78.2 -.6.84.93 -.53.244.249 -.3.654.2.6.7.5..5.5.2.53.53.3.654 -.3.9.92. -.4.37.37 -..3.33.6.86 -.58.22.35.4 -..5.5 -.3.79.85.9.86 -.28.54.2.7 -.2.62.65 -.24.3.3..2 -.56.64.226.8 -.24.69.72 -.2.373.373.3.64 -.248.99.39 2. -.35.89.95.54.557.559.6 2.36 -.385.262.465 2.4 -.57.5.29.57.642.644.9 2.766 -.628.383.736 2.7 -.2.5.87.44.722.722 2

Table 3: DGP 2, n = 4 Our estimator Our estimator Chen s estimator x (x) Bias SD RMSE m (x) Bias SD RMSE Bias SD RMSE " standard normal -.9.83 -.87.263.277 -..52.55.527.394.27.478 -.6.693 -.25.222.223.2.39.424.425.42.95.24 -.3.622.4.22.224.5 -.85.43.42.7.52.53.3.622.34.9.94. -.38.35.37.3.27.27.6.693.22.222.223.4 -.3.342.342.8.69.87.9.83.23.26.262.7 -..376.376.27.275.386..895.2.263.264.8 -..354.354.45.343.53.3.8.35.323.324 2. -.42.442.443.39.53.59.6.67.2.435.435 2.4 -.37.59.59.522.4.574.9 2.543 -.257.57.625 2.7 -.7.73.729.267.477.354 " logistic -.9.83 -.88.264.278 -..22.58.522.357.383.524 -.6.693.2.25.25.2 -.34.382.384.88.22.229 -.3.622.22.22.23.5 -.55.345.349.9.38.38.3.622.5.6.6...98.98..6.7.6.693.4.82.8.4.3.22.22.52.44.53.9.83..28.28.7..239.239.23.284.349..895 -..227.227.8..267.267.323.4.54.3.8.25.3.3 2. -.36.354.355.999.67.74.6.67.46.458.46 2.4 -.89.53.5.42.474.48.9 2.543 -.278.572.635 2.7 -.9.65.657.6.643.327 " mixture -.9.83.52.347.35 -..23.325.348.259.532.592 -.6.693.2.26.27.2 -.63.75.86 -.3.98.99 -.3.622.35.57.6.5 -.24.23.26 -.6.8.8.3.622.6.5.5..2.49.49 -.4.42.42.6.693 -.9.3.3.4 -..74.74 -.4.49.5.9.83 -.6.52.63.7 -.7.9.92 -.26.3.34..895 -.77.64.82.8 -..86.86 -.27.84.86.3.8 -.48.29.256 2. -.28.92.96 -.5.368.368.6.67 -.292.276.4 2.4 -.53.26.36.6.57.58.9 2.543 -.736.38.828 2.7 -.28.88.227.97.588.595 2

Table 4: DGP 2, n = 8 Our estimator Our estimator Chen s estimator x (x) Bias SD RMSE m (x) Bias SD RMSE Bias SD RMSE " standard normal -.9.83 -.85.233.248 -..27.487.53.38.95.427 -.6.693 -.8.94.94.2 -.6.395.395.37.43.98 -.3.622.27.75.77.5 -.6.333.338.27.4.7.3.622.2.22.22..2.8.8.5.94.95.6.693 -.4.3.3.4.2.8.8.72.8.38.9.83.2.57.57.7.8.24.24.247.83.37..895..78.78.8..264.264.362.25.44.3.8 -.4.232.232 2..4.33.33.34.46.3.6.67.26.347.348 2.4 -.52.46.463.643.322.674.9 2.543 -.26.54.554 2.7 -.52.653.655.382.36.427 " logistic -.9.83 -.62.2.22 -..87.423.43.366.248.442 -.6.693.2.78.79.2 -.64.336.342.7.67.24 -.3.622.39.58.63.5 -.72.273.282.9.4.6.3.622.6.3.3. -.2.25.25.8.78.78.6.693.5.7.7.4..3.3.53.6.9.9.83.6.34.34.7 -.6.55.55.77.27.28..895.4.42.42.8 -.6.54.54.288.322.432.3.8.3.2.2 2. -.5.22.22.95.547.97.6.67.22.38.38 2.4 -.28.32.33.483.4.539.9 2.543 -.64.497.523 2.7 -.3.494.54.39.437.39 " mixture -.9.83.55.273.278 -..6.258.279.7.47.433 -.6.693..52.52.2 -.55.5.6 -.52.9.97 -.3.622.22.8.9.5 -.7.66.66 -.5.7.7.3.622.7.92.92...36.36 -.5.3.3.6.693 -.3.4.5.4 -.4.4.4 -.6.33.37.9.83 -.6.8.33.7 -.5.47.49 -.38.36.52..895 -.7.25.43.8 -.7.49.5 -.34.6.65.3.8 -.43.56.22 2. -.27.65.7 -.34.295.296.6.67 -.24.224.329 2.4 -.47.9.2.4.446.446.9 2.543 -.622.3.69 2.7 -.8.35.79.3.55.552 22

Table 5: DGP 3, n = 4 Our estimator Our estimator Chen s estimator x (x) Bias SD RMSE m (x) Bias SD RMSE Bias SD RMSE " standard normal -.9.66.336.39.5 -. -.6.64.63 2.3 2.396 3.32 -.6.32.5.654.828.2 -.85.93.92 -.735 2.746 2.84 -.3 3.56.588.4.284.5.276.6.623-2.68.569 2.595.3 3.56.522.996.28..4.398.397-2.769.32 3.59.6.32.468.66.8.4 -.27.944.98-3.47.968 3.62.9.66.86.336.384.7 -.32.474.475-3.636.94 3.756..546.74.324.366.8 -.4.429.43-3.676.959 3.799.3.336.26.68.2 2. -..23.23-3.697.28 3.837.6.228.68.2.24 2.4 -.78.28.5-3.678.9 3.836.9.62.98.6.24 2.7 -.9.82.27-3.68.94 3.8 " logistic -.9.66.32.348.468 -..28.526.526 2.53 2.4 3.249 -.6.32.534.582.792.2 -.23.848.874.42 2.882 2.97 -.3 3.56.69.38.242.5.65.455.463 -.289 2.85 2.535.3 3.56.666.32.23. -.86.43.44-2.5.784 2.758.6.32.432.63.762.4 -.6.9.95-2.897.69 3.38.9.66.74.27.38.7.58.377.38-3.24.324 3.476..546.38.24.282.8.64.29.298-3.25.248 3.48.3.336.96.32.62 2..6.56.67-3.36.77 3.59.6.228.56.78.74 2.4 -.33.5.9-3.76.283 3.425.9.62.98.98.282 2.7 -.57.22.98-3.6.248 3.264 " mixture -.9.66.552.6.8 -..79.324.37 2.227.28 2.569 -.6.32.558.84.984.2.56.422.425.969.222 2.37 -.3 3.56.966.34.626.5.27.825.834.32.256.69.3 3.56.624.88.338..3.84.824.924.52.762.6.32.438.678.8.4.22.33.396.43.89.938.9.66.36.396.54.7.98.25.285 -.372.87.95..546.294.342.45.8.25.85.276 -.475.789.85.3.336.26.98.294 2..67.37.26 -.829.639.835.6.228.86.38.234 2.4.38.6.73 -.95.594.854.9.62.86.44.234 2.7 -..266.266 -.852.425.659 23

Table 6: DGP 3, n = 8 Our estimator Our estimator Chen s estimator x (x) Bias SD RMSE m (x) Bias SD RMSE Bias SD RMSE " standard normal -.9.66.3.336.45 -. -.82.594.599 2.289 2.476 3.367 -.6.32.468.492.678.2 -.283.84.885 -.549 2.262 2.737 -.3 3.56.486.894.4.5.283.558.58-2.64.32 2.839.3 3.56.5.948.74. -.97.57.569-3.267.789 3.36.6.32.354.582.678.4 -.83.977.99-3.698.65 3.754.9.66.38.228.264.7.6.324.323-3.783.827 3.872..546.96.26.234.8.36.294.296-3.832.864 3.927.3.336.9.26.56 2. -..5.5-3.792.945 3.98.6.228.38.42.44 2.4 -.77.54.94-3.83.972 3.95.9.62.92.3.92 2.7 -.87.47.93-3.658.4 3.83 " logistic -.9.66.264.38.44 -..3.539.538 2.798 2.5 3.445 -.6.32.456.432.624.2 -.8.739.758 -.639 2.737 2.84 -.3 3.56.63.8.26.5.62.527.524-2.2.637 2.669.3 3.56.624.9.92. -.73.525.53-2.94.226 3.6.6.32.38.498.588.4 -.35.684.684-3.59.724 3.592.9.66.44.92.24.7.55.37.3-3.589.42 3.737..546.26.92.234.8.3.275.276-3.63.62 3.756.3.336.72.4.32 2..49.49.57-3.566.2 3.738.6.228.38.36.44 2.4 -.55.58.79-3.33.22 3.54.9.62.8.3.86 2.7 -.6.47.67-2.945.244 3.96 " mixture -.9.66.44.36.546 -..72.266.36 2.523.7 2.75 -.6.32.42.63.756.2.34.424.424 2.258.957 2.45 -.3 3.56.798.88.428.5.34.85.84.64.36.964.3 3.56.6.22.272..23.84.846.48.98.92.6.32.384.56.642.4.27.37.48.5.77 2.56.9.66.24.264.354.7.234.85.298 -.75.844.848..546.2.98.288.8.22.66.275 -.32.789.8.3.336.86.86.264 2..93.47.242 -.694.537.683.6.228.56.2.86 2.4.48.7.83 -.698.244.424.9.62.62.68.234 2.7.36.3.34 -.659.937.44 24

Table 7: DGP 4, n = 4 Our estimator Our estimator Chen s estimator x (x) Bias SD RMSE m (x) Bias SD RMSE Bias SD RMSE " standard normal -.9.6.248.474.338 -..26.56.578 2.65 2.729 3.82 -.6.858.836.63.944.2.45.545.546.38 3.9 3.356 -.3.5 3.26.858 3.24.5.57.934.935 -.488.888.948.3.5 2.466.69 2.562..28.552.59 -.792.72.876.6.858.242.72.428.4.362.552.66 -.43 2.543 2.92.9.6.84.648.32.7.298.363.469 -.582 2.392 2.866..546.732.642.978.8.284.347.448 -.443 2.325 2.734.3.426.666.564.87 2..93.28.34 -.2.972 2.2.6.354.57.3.648 2.4..78.2 -.692.379.542.9.3.564.282.63 2.7 -.74.56.72 -.88.7.42 " logistic -.9.6.278.54.392 -..238.446.55 2.872 2.465 3.783 -.6.858.896.594.986.2.4.564.565.735 2.77 3.267 -.3.5 3.2.86 3.32.5 -.63.866.867 -.4.93.97.3.5 2.54.642 2.598..92.44.48 -.573.663.757.6.858.278.56.38.4.337.383.59 -.937 2.399 2.573.9.6.864.6.5.7.268.3.49 -.46 2.259 2.53..546.792.384.876.8.236.236.334 -.39 2.66 2.445.3.426.654.33.732 2..24.98.284 -.797.722.896.6.354.62.276.672 2.4..57.86 -.72.344.524.9.3.576.264.63 2.7 -.76.39.58 -.755.26.355 " mixture -.9.6.554.648.68 -..236.224.325.895.495 2.43 -.6.858.974.678 2.88.2.32.297.299.572.46.945 -.3.5 3.8.4 3.8.5 -.25.54.555.643.96.8.3.5 2.562.726 2.664..262.29.39.464.938.46.6.858.662.54.746.4.267.78.32.775.278.493.9.6.326.792.542.7.284.259.384.286.299.329..546.224.42.296.8.282.58.323.3.22.227.3.426.44.378. 2..284.5.32.7.799.8.6.354.86.336.882 2.4.28.3.232 -.7.749.748.9.3.74.336.792 2.7..2.2 -.3.894.93 25

Table 8: DGP 4, n = 8 Our estimator Our estimator Chen s estimator x (x) Bias SD RMSE m (x) Bias SD RMSE Bias SD RMSE " standard normal -.9.6.99.324.38 -..267.45.485 3.8 2.582 4.8 -.6.858.548.432.68.2.6.456.455.566 3.289 3.64 -.3.5 2.928.732 3.8.5 -.57.99. -.769 2.423 2.539.3.5 2.9.5 2.25..256.479.543 -.38 2.6 2.57.6.858.2.42.98.4.342.38.467-2.54 2.272 3.29.9.6.66.348.744.7.3.3.425-2.346.88 3.5..546.66.32.684.8.267.228.35-2.85.858 2.867.3.426.546.258.66 2..92.89.269 -.38.338.692.6.354.492.246.546 2.4.3.78.22 -.9.937.35.9.3.474.228.528 2.7 -.32.65.68 -.987.88.275 " logistic -.9.6.4.288.56 -..235.39.395 3.498.948 4.3 -.6.858.578.426.638.2 -.2.478.477 2.55 2.666 3.688 -.3.5 2.94.62 3..5 -.59.869.883 -.83 2.48 2.479.3.5 2.9.42 2.232..263.432.55 -.79 2.229 2.34.6.858.68.366.28.4.38.279.422 -.62 2.24 2.729.9.6.726.324.798.7.247.227.335 -.737.93 2.575..546.678.36.744.8.23.26.36 -.572.868 2.44.3.426.564.246.62 2..2.72.264 -.969.273.599.6.354.522.24.564 2.4.4.47.86 -.832.885.24.9.3.486.222.534 2.7 -.34.46.5 -.889.724.46 " mixture -.9.6.296.438.368 -..242.89.37 2.365.78 2.598 -.6.858.62.48.692.2.29.222.223.979.83 2.46 -.3.5 2.64.966 2.88.5 -.67.57.5.75.794.48.3.5 2.394.582 2.466..233.248.34.67.48.495.6.858.5.396.548.4.256.8.33.443.36.836.9.6.6.36.58.7.277.54.37.54.83.3..546.38.282.8.8.3.56.338.422.38.9.3.426.882.24.92 2..289.34.38.64.647.667.6.354.72.228.738 2.4.246.6.268.86.599.64.9.3.594.24.63 2.7.27.78.82..46.46 26

7 Appendix

Proof of Lemma 2.1. We first show that T(m, σ) in equation (2.1) is well defined for any m and any σ ≥ c_σ, where c_σ is a positive constant. First, consider T₁(m, σ) for m ≥ 0. It is clear that

T₁(m, σ) = ∫₀^∞ [ S̄(σt + m; x)/S̄(m; x) − S̄(t; x₀) ]² dt
  ≤ ∫₀^∞ [ S̄(t; x₀) + S̄(σt + m; x)/S̄(m; x) ] dt
  = (1/S(0)) ∫₀^∞ S(t) dt + (σ(x)/σ) (1/S(aₘ)) ∫ from aₘ to ∞ of S(u) du
  = E[ε | ε > 0] + (σ(x)/σ) E[ ε − aₘ | ε > aₘ ] < ∞,

where aₘ ≡ (m − m(x))/σ(x) and S(·) denotes the survival function of ε. The inequality holds because both terms inside the square lie in [0, 1]; the second equality uses the change of variables u = (σt + m − m(x))/σ(x); the last two equalities use integration by parts together with lim_{t→∞} t S(t) = 0; and the last inequality follows from the existence of the first moment of ε. Similarly, we can show that T₂(m, σ) is well defined for m < 0.

Clearly, T(m(x), σ(x)) = 0. Without loss of generality, assume that m(x) ≥ 0. We now show that (m(x), σ(x)) is the unique solution of T(m, σ) = 0. Suppose to the contrary that there is a pair (m̃ₓ, σ̃ₓ) ≠ (m(x), σ(x)) with σ̃ₓ > 0 such that T(m̃ₓ, σ̃ₓ) = 0; without loss of generality, further assume that m̃ₓ ≥ m(x). It follows from T₁(m̃ₓ, σ̃ₓ) = 0 that for all t ≥ 0,

S̄(t; x₀) = S̄(σ̃ₓ t + m̃ₓ; x)/S̄(m̃ₓ; x), or S(t)/S(0) = S( (σ̃ₓ t + m̃ₓ − m(x))/σ(x) ) / S( (m̃ₓ − m(x))/σ(x) ).

Define αₓ ≡ (m̃ₓ − m(x))/σ(x) and βₓ ≡ σ̃ₓ/σ(x); then αₓ ≥ 0, βₓ > 0, and (αₓ, βₓ) ≠ (0, 1). Then we have

S(t)/S(0) = S(βₓ t + αₓ)/S(αₓ) for all t ≥ 0.

Therefore, S(·) can be expressed in the form of equation (2.6) for all t ≥ 0, which leads to a contradiction.

Proof of Lemma 2.2. For part (i), without loss of generality, assume that m(x) ≥ 0. It is easy to see that if β = 1, then α = 0. Thus, we consider β ≠ 1. We prove the lemma by showing (a) that β cannot be less than 1 and (b) that β cannot be greater than 1.

Consider (a). Suppose, to the contrary, that β < 1. Let t̄ ≡ α/(1 − β) > 0; then equation (2.6) implies that

[1 − F(t̄)] [F(α) − F(0)] = 0.

This equation holds either if F(α) = F(0) or if F(t̄) = 1. F(t̄) = 1 contradicts the assumption that ε has unbounded support from the right. F(α) = F(0) implies that F(t) = F(t + α) for all t ≥ 0.