
INSTITUT DE STATISTIQUE, BIOSTATISTIQUE ET SCIENCES ACTUARIELLES (ISBA)

UNIVERSITÉ CATHOLIQUE DE LOUVAIN

DISCUSSION PAPER 2011/2

ESTIMATION OF THE ERROR DENSITY IN A SEMIPARAMETRIC TRANSFORMATION MODEL

SAMB, R., HEUCHENNE, C. and I. VAN KEILEGOM

Estimation of the Error Density in a Semiparametric Transformation Model

Rawane Samb (Université catholique de Louvain), Cédric Heuchenne (University of Liège and Université catholique de Louvain), Ingrid Van Keilegom (Université catholique de Louvain)

September 27, 2011

Abstract

Consider the semiparametric transformation model $\Lambda_{\theta_0}(Y) = m(X) + \varepsilon$, where $\theta_0$ is an unknown finite dimensional parameter, the functions $\Lambda_{\theta_0}$ and $m$ are smooth, $\varepsilon$ is independent of $X$, and $E(\varepsilon) = 0$. We propose a kernel-type estimator of the density of the error $\varepsilon$, and prove its asymptotic normality. The estimated errors, which lie at the basis of this estimator, are obtained from a profile likelihood estimator of $\theta_0$ and a nonparametric kernel estimator of $m$. The practical performance of the proposed density estimator is evaluated in a simulation study.

Key Words: Density estimation; Kernel smoothing; Nonparametric regression; Profile likelihood; Transformation model.

R. Samb acknowledges financial support from IAP research network P6/03 of the Belgian Government (Belgian Science Policy). C. Heuchenne acknowledges financial support from IAP research network P6/03 of the Belgian Government (Belgian Science Policy), and from the contract "Projet d'Actions de Recherche Concertées" (ARC) 11/16-039 of the Communauté française de Belgique, granted by the Académie universitaire Louvain. I. Van Keilegom acknowledges financial support from IAP research network P6/03 of the Belgian Government (Belgian Science Policy), from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC Grant agreement No. 203650, and from the contract "Projet d'Actions de Recherche Concertées" (ARC) 11/16-039 of the Communauté française de Belgique, granted by the Académie universitaire Louvain.

1 Introduction

Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be independent replicates of the random vector $(X, Y)$, where $Y$ is a univariate dependent variable and $X$ is a one-dimensional covariate. We assume that $Y$ and $X$ are related via the semiparametric transformation model

$$\Lambda_{\theta_0}(Y) = m(X) + \varepsilon, \qquad (1.1)$$

where $\varepsilon$ is independent of $X$ and has mean zero. We assume that $\{\Lambda_\theta : \theta \in \Theta\}$ (with $\Theta \subset \mathbb{R}^p$ compact) is a parametric family of strictly increasing functions defined on an unbounded subset $D$ of $\mathbb{R}$, while $m$ is the unknown regression function, belonging to an infinite dimensional parameter set $\mathcal{M}$. We assume that $\mathcal{M}$ is a space of functions endowed with a norm $\|\cdot\|_{\mathcal{M}}$. We denote by $\theta_0 \in \Theta$ and $m \in \mathcal{M}$ the true unknown finite and infinite dimensional parameters. Define the regression function $m_\theta(x) = E[\Lambda_\theta(Y) \mid X = x]$ for each $\theta \in \Theta$, and let $\varepsilon_\theta = \varepsilon(\theta) = \Lambda_\theta(Y) - m_\theta(X)$.

In this paper, we are interested in the estimation of the probability density function (p.d.f.) $f_\varepsilon$ of the residual term $\varepsilon = \Lambda_{\theta_0}(Y) - m(X)$. To this end, we first obtain estimators $\hat\theta$ and $\hat m_\theta$ of the parameter $\theta_0$ and the function $m_\theta$, and second, form the semiparametric regression residuals $\hat\varepsilon_i(\hat\theta) = \Lambda_{\hat\theta}(Y_i) - \hat m_{\hat\theta}(X_i)$. To estimate $\theta_0$ we use the profile likelihood (PL) approach developed in Linton, Sperlich and Van Keilegom (2008), whereas $m_\theta$ is estimated by means of a Nadaraya-Watson-type estimator (Nadaraya, 1964; Watson, 1964).

To our knowledge, the estimation of the density of $\varepsilon$ in model (1.1) has not yet been investigated in the statistical literature. This estimation may be very useful in various regression problems. Indeed, taking transformations of the data may induce normality and error variance homogeneity in the transformed model, so the estimation of the error density in the transformed model may be used for testing these hypotheses.

Taking transformations of the data has been an important part of statistical practice for many years. A major contribution to this methodology was made by Box and Cox (1964), who proposed a parametric power family of transformations that includes the logarithm and the identity. They suggested that the power transformation, when applied to the dependent variable in a linear regression model, might induce normality and homoscedasticity. Much effort has been devoted to the investigation of the Box-Cox transformation since its introduction. See, for example, Amemiya (1985), Horowitz (1998), Chen, Lockhart and Stephens (2002), Shin (2008), and Fitzenberger, Wilke and Zhang (2010). Other dependent variable transformations have been suggested, for example the Zellner and Revankar (1969) transform and the Bickel and Doksum

(1981) transform. The transformation methodology has been quite successful, and a large literature exists on this topic for parametric models. See Carroll and Ruppert (1988) and Sakia (1992) and references therein.

The estimation of (functionals of) the error distribution and density under simplified versions of model (1.1) has received considerable attention in the statistical literature in recent years. Consider e.g. model (1.1) but with $\Lambda_{\theta_0} \equiv \mathrm{id}$, i.e. the response is not transformed. Under this model, Escanciano and Jacho-Chavez (2010) considered the estimation of the (marginal) density of the response $Y$ via the estimation of the error density. Akritas and Van Keilegom (2001) estimated the cumulative distribution function of the regression error in a heteroscedastic model with univariate covariates. The estimator they proposed is based on nonparametrically estimated regression residuals, and its weak convergence was proved. The results obtained by Akritas and Van Keilegom (2001) have been generalized by Neumeyer and Van Keilegom (2010) to the case of multivariate covariates. Müller, Schick and Wefelmeyer (2004) investigated linear functionals of the error distribution in nonparametric regression. Cheng (2005) established the asymptotic normality of an estimator of the error density based on estimated residuals. The estimator he proposed is constructed by splitting the sample into two parts: the first part is used for the estimation of the residuals, while the second part is used for the construction of the error density estimator. Efromovich (2005) proposed an adaptive estimator of the error density, based on a density estimator proposed by Pinsker (1980). Finally, Samb (2010) also considered the estimation of the error density, but his approach is more closely related to the one in Akritas and Van Keilegom (2001).

In order to achieve the objective of this paper, namely the estimation of the error density under model (1.1), we first need to estimate the transformation parameter $\theta_0$. To this end, we make use of the results in Linton, Sperlich and Van Keilegom (2008). In the latter paper, the authors first discuss the nonparametric identification of model (1.1), and second, estimate the transformation parameter $\theta_0$ under the considered model. For the estimation of this parameter, they propose two approaches. The first approach uses a semiparametric profile likelihood (PL) estimator, while the second is based on a mean squared distance from independence estimator (MD), using the estimated distributions of $X$, $\varepsilon$ and $(X, \varepsilon)$. Linton, Sperlich and Van Keilegom (2008) derived the asymptotic distributions of their estimators under certain regularity conditions, and proved that both estimators of $\theta_0$ are root-$n$ consistent. The authors also showed that, in practice, the performance of the PL method is better than that of the MD approach. For this reason, the PL method will be used in this paper for the estimation of $\theta_0$.

The rest of the paper is organized as follows. Section 2 presents our estimator of the error density and groups some notations and technical assumptions. Section 3 describes the asymptotic results of the paper.

A simulation study is given in Section 4, while Section 5 is devoted to some general conclusions. Finally, the proofs of the asymptotic results are collected in Section 6.

2 Definitions and assumptions

2.1 Construction of the estimators

The approach proposed here for the estimation of $f_\varepsilon$ is based on a two-step procedure. In a first step, we estimate the finite dimensional parameter $\theta_0$ by the profile likelihood (PL) method developed in Linton, Sperlich and Van Keilegom (2008). The basic idea of this method is to replace all unknown expressions in the likelihood function by their nonparametric kernel estimates. Under model (1.1), we have

$$P(Y \le y \mid X) = P(\Lambda_{\theta_0}(Y) \le \Lambda_{\theta_0}(y) \mid X) = P(\varepsilon_{\theta_0} \le \Lambda_{\theta_0}(y) - m_{\theta_0}(X) \mid X) = F_\varepsilon(\Lambda_{\theta_0}(y) - m_{\theta_0}(X)).$$

Here, $F_\varepsilon(t) = P(\varepsilon \le t)$, and so

$$f_{Y|X}(y \mid x) = f_\varepsilon(\Lambda_{\theta_0}(y) - m_{\theta_0}(x))\,\Lambda'_{\theta_0}(y),$$

where $f_\varepsilon$ and $f_{Y|X}$ are the densities of $\varepsilon$, and of $Y$ given $X$, respectively. Then, the log-likelihood function is

$$\sum_{i=1}^n \big\{\log f_{\varepsilon_\theta}(\Lambda_\theta(Y_i) - m_\theta(X_i)) + \log \Lambda'_\theta(Y_i)\big\}, \qquad \theta \in \Theta,$$

where $f_{\varepsilon_\theta}$ is the density function of $\varepsilon_\theta$. Now, let

$$\hat m_\theta(x) = \frac{\sum_{j=1}^n \Lambda_\theta(Y_j)\, K_1\big(\frac{X_j - x}{h}\big)}{\sum_{j=1}^n K_1\big(\frac{X_j - x}{h}\big)} \qquad (2.1)$$

be the Nadaraya-Watson estimator of $m_\theta(x)$, and let

$$\hat f_{\varepsilon_\theta}(t) = \frac{1}{ng} \sum_{i=1}^n K_2\Big(\frac{\hat\varepsilon_i(\theta) - t}{g}\Big), \qquad (2.2)$$

where $\hat\varepsilon_i(\theta) = \Lambda_\theta(Y_i) - \hat m_\theta(X_i)$. Here, $K_1$ and $K_2$ are kernel functions, and $h$ and $g$ are bandwidth sequences. Then, the PL estimator of $\theta_0$ is defined by

$$\hat\theta = \arg\max_{\theta \in \Theta} \sum_{i=1}^n \big[\log \hat f_{\varepsilon_\theta}(\Lambda_\theta(Y_i) - \hat m_\theta(X_i)) + \log \Lambda'_\theta(Y_i)\big]. \qquad (2.3)$$

Recall that $\hat m_\theta(X_i)$ converges to $m_\theta(X_i)$ at a slower rate for those $X_i$ which are close to the boundary of the support $\mathcal{X}$ of the covariate $X$. That is why we assume implicitly that the proposed estimator (2.3) of $\theta_0$ trims the observations $X_i$ outside a subset $\mathcal{X}_0$ of $\mathcal{X}$. Note that we keep the root-$n$ consistency of $\hat\theta$ proved in Linton, Sperlich and Van Keilegom (2008) by trimming the covariates outside $\mathcal{X}_0$, but in this case the resulting asymptotic variance is different from the one obtained in the latter paper.
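To make the two-step construction concrete, here is a minimal numerical sketch of (2.1)-(2.3). It is our own illustration, not the authors' code: it uses Gaussian kernels for $K_1$ and $K_2$, omits the boundary trimming just discussed, and maximizes the criterion over a user-supplied grid, as the simulations in Section 4 also do. All function names are hypothetical.

```python
import numpy as np

def nw_regression(x_eval, X, Z, h):
    """Nadaraya-Watson estimate (2.1) of E[Z | X = x] at the points x_eval."""
    w = np.exp(-0.5 * ((X[None, :] - x_eval[:, None]) / h) ** 2)  # Gaussian K_1
    return (w * Z[None, :]).sum(axis=1) / w.sum(axis=1)

def kde(t_eval, eps, g):
    """Kernel density estimate (2.2) of the residual density at t_eval (Gaussian K_2)."""
    u = (eps[None, :] - t_eval[:, None]) / g
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(eps) * g * np.sqrt(2 * np.pi))

def profile_loglik(theta, X, Y, Lam, dLam_dy, h, g):
    """Criterion (2.3): sum_i log f_hat_{eps_theta}(eps_hat_i(theta)) + log Lam'_theta(Y_i)."""
    Z = Lam(theta, Y)                     # transformed responses Lambda_theta(Y_i)
    eps = Z - nw_regression(X, X, Z, h)   # residuals eps_hat_i(theta)
    return np.sum(np.log(kde(eps, eps, g))) + np.sum(np.log(dLam_dy(theta, Y)))

def pl_estimator(X, Y, Lam, dLam_dy, grid, h, g):
    """Grid-search maximizer of the profile likelihood over candidate theta values."""
    vals = [profile_loglik(th, X, Y, Lam, dLam_dy, h, g) for th in grid]
    return grid[int(np.argmax(vals))]
```

The callables `Lam` and `dLam_dy` stand for $\Lambda_\theta(y)$ and $\Lambda'_\theta(y)$; any parametric family satisfying the assumptions below can be plugged in (the Manly family of Section 4 is used in a later sketch).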

In a second step, we use the above estimator $\hat\theta$ to build the estimated residuals $\hat\varepsilon_i(\hat\theta) = \Lambda_{\hat\theta}(Y_i) - \hat m_{\hat\theta}(X_i)$. Then, our proposed estimator $\hat f_\varepsilon(t)$ of $f_\varepsilon(t)$ is defined by

$$\hat f_\varepsilon(t) = \frac{1}{nb} \sum_{i=1}^n K_3\Big(\frac{\hat\varepsilon_i(\hat\theta) - t}{b}\Big), \qquad (2.4)$$

where $K_3$ is a kernel function and $b$ is a bandwidth sequence, not necessarily the same as the kernel $K_2$ and the bandwidth $g$ used in (2.2). Observe that this estimator is a feasible estimator, in the sense that it does not depend on any unknown quantity, as is desirable in practice. This contrasts with the unfeasible ideal kernel estimator

$$\tilde f_\varepsilon(t) = \frac{1}{nb} \sum_{i=1}^n K_3\Big(\frac{\varepsilon_i - t}{b}\Big), \qquad (2.5)$$

which depends in particular on the unknown regression errors $\varepsilon_i = \varepsilon_i(\theta_0) = \Lambda_{\theta_0}(Y_i) - m(X_i)$. It is however intuitively clear that $\hat f_\varepsilon(t)$ and $\tilde f_\varepsilon(t)$ will be very close for $n$ large enough, as will be illustrated by the results given in Section 3.
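For completeness, here is a sketch of the estimators (2.4) and (2.5) themselves, using the biweight kernel $K(x) = \frac{15}{16}(1 - x^2)^2\,\mathbf{1}(|x| \le 1)$ that the simulations of Section 4 employ (the function names are ours): fed the estimated residuals $\hat\varepsilon_i(\hat\theta)$ it returns $\hat f_\varepsilon$, and fed the true errors it returns the unfeasible $\tilde f_\varepsilon$.

```python
import numpy as np

def biweight(u):
    """K(u) = (15/16)(1 - u^2)^2 on [-1, 1], the kernel used in Section 4."""
    return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)

def f_eps_hat(t_eval, residuals, b):
    """Estimator (2.4) (or (2.5) when fed true errors): (nb)^{-1} sum_i K_3((e_i - t)/b)."""
    u = (residuals[None, :] - np.atleast_1d(t_eval)[:, None]) / b
    return biweight(u).sum(axis=1) / (len(residuals) * b)
```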

2.2 Notations

When there is no ambiguity, we use $\varepsilon$ and $m$ to indicate $\varepsilon_{\theta_0}$ and $m_{\theta_0}$. Moreover, $\mathcal{N}(\theta_0)$ represents a neighborhood of $\theta_0$. For the kernels $K_j$ ($j = 1, 2, 3$), let $\mu(K_j) = \int v^2 K_j(v)\,dv$, and let $K_j^{(p)}$ be the $p$th derivative of $K_j$. For any function $\varphi_\theta(y)$, denote $\dot\varphi_\theta(y) = \partial\varphi_\theta(y)/\partial\theta = (\partial\varphi_\theta(y)/\partial\theta_1, \ldots, \partial\varphi_\theta(y)/\partial\theta_p)^t$ and $\varphi'_\theta(y) = \partial\varphi_\theta(y)/\partial y$. Also, let $\|A\| = (A^t A)^{1/2}$ be the Euclidean norm of any vector $A$.

For any functions $m$, $r$, $f$, $\varphi$ and $q$, and any $\theta \in \Theta$, let $s = (m, r, f, \varphi, q)$, $s_\theta = (m_\theta, \dot m_\theta, f_{\varepsilon_\theta}, f'_{\varepsilon_\theta}, \dot f_{\varepsilon_\theta})$, $\varepsilon_i(\theta, m) = \Lambda_\theta(Y_i) - m(X_i)$, and define

$$G_n(\theta, s) = n^{-1} \sum_{i=1}^n \bigg\{ \frac{1}{f\{\varepsilon_i(\theta, m)\}} \Big[ \varphi\{\varepsilon_i(\theta, m)\}\,\{\dot\Lambda_\theta(Y_i) - r(X_i)\} + q\{\varepsilon_i(\theta, m)\} \Big] + \frac{\dot\Lambda'_\theta(Y_i)}{\Lambda'_\theta(Y_i)} \bigg\},$$

$G(\theta, s) = E[G_n(\theta, s)]$ and $\dot G(\theta_0, s_{\theta_0}) = \frac{\partial}{\partial\theta} G(\theta, s_\theta)\big|_{\theta = \theta_0}$.

2.3 Technical assumptions

The assumptions we need for the asymptotic results are listed below for convenient reference.

(A1) The function $K_j$ ($j = 1, 2, 3$) is symmetric, has compact support, $\int v^k K_j(v)\,dv = 0$ for $k = 1, \ldots, q_j - 1$ and $\int v^{q_j} K_j(v)\,dv \ne 0$ for some $q_j \ge 4$, $K_j$ is twice continuously differentiable, and $\int K_3^{(1)}(v)\,dv = 0$.

(A2) The bandwidth sequences $h$, $g$ and $b$ satisfy $nh^{2q_1} = o(1)$, $ng^{2q_2} = o(1)$ (where $q_1$ and $q_2$ are defined in (A1)), $(nb^5)^{-1} = O(1)$, $nb^3h^2(\log h^{-1})^{-2} \to \infty$ and $ng^6(\log g^{-1})^{-2} \to \infty$.

(A3) (i) The support $\mathcal{X}$ of the covariate $X$ is a compact subset of $\mathbb{R}$, and $\mathcal{X}_0$ is a subset with non-empty interior, whose closure lies in the interior of $\mathcal{X}$.
(ii) The density $f_X$ is bounded away from zero and infinity on $\mathcal{X}$, and has continuous second order partial derivatives on $\mathcal{X}$.

(A4) The function $m_\theta(x)$ is twice continuously differentiable with respect to $\theta$ on $\mathcal{X} \times \mathcal{N}(\theta_0)$, and the functions $m_\theta(x)$ and $\dot m_\theta(x)$ are $q_1$ times continuously differentiable with respect to $x$ on $\mathcal{X} \times \mathcal{N}(\theta_0)$. All these derivatives are bounded, uniformly in $(x, \theta) \in \mathcal{X} \times \mathcal{N}(\theta_0)$.

(A5) The error $\varepsilon = \Lambda_{\theta_0}(Y) - m(X)$ has finite fourth moment and is independent of $X$.

(A6) The distribution $F_{\varepsilon_\theta}(t)$ is $q_3 + 1$ times (respectively three times) continuously differentiable with respect to $t$ (respectively $\theta$), and

$$\sup_{\theta, t}\Big|\frac{\partial^{k+l}}{\partial t^k\,\partial\theta_1^{l_1}\cdots\partial\theta_p^{l_p}}\, F_{\varepsilon_\theta}(t)\Big| < \infty$$

for all $k$ and $l$ such that $0 \le k + l \le 2$, where $l = l_1 + \ldots + l_p$ and $\theta = (\theta_1, \ldots, \theta_p)^t$.

(A7) The transformation $\Lambda_\theta(y)$ is three times continuously differentiable with respect to both $\theta$ and $y$, and there exists an $\alpha > 0$ such that

$$E\Big[\sup_{\theta' : \|\theta' - \theta\| \le \alpha}\Big|\frac{\partial^{k+l}}{\partial y^k\,\partial\theta_1'^{\,l_1}\cdots\partial\theta_p'^{\,l_p}}\, \Lambda_{\theta'}(Y)\Big|\Big] < \infty$$

for all $\theta \in \Theta$, and for all $k$ and $l$ such that $0 \le k + l \le 3$, where $l = l_1 + \ldots + l_p$ and $\theta = (\theta_1, \ldots, \theta_p)^t$. Moreover, $\sup_{x \in \mathcal{X}} E[\dot\Lambda_{\theta_0}^4(Y) \mid X = x] < \infty$.

(A8) For all $\eta > 0$, there exists $\epsilon(\eta) > 0$ such that

$$\inf_{\|\theta - \theta_0\| > \eta} \|G(\theta, s_\theta)\| \ge \epsilon(\eta) > 0.$$

Moreover, the matrix $\dot G(\theta_0, s_{\theta_0})$ is non-singular.

(A9) (i) $E(\Lambda_{\theta_0}(Y)) = 1$, $\Lambda_{\theta_0}(0) = 0$, and the set $\{x \in \mathcal{X}_0 : m'(x) \ne 0\}$ has nonempty interior.

(ii) Assume that $\varphi(x, t) = \dot\Lambda_{\theta_0}(\Lambda_{\theta_0}^{-1}(m(x) + t))\, f_\varepsilon(t)$ is continuously differentiable with respect to $t$ for all $x$, and that

$$\sup_{s : |t - s| \le \delta} E\Big|\frac{\partial \varphi}{\partial s}(X, s)\Big| < \infty \qquad (2.6)$$

for all $t \in \mathbb{R}$ and for some $\delta > 0$.

Assumptions (A1), part of (A2), (A3)(ii), (A4), (A6), (A7) and (A8) are used by Linton, Sperlich and Van Keilegom (2008) to show that the PL estimator $\hat\theta$ of $\theta_0$ is root-$n$ consistent. The differentiability of $K_j$ up to second order imposed in assumption (A1) is used to expand the two-step kernel estimator $\hat f_\varepsilon(t)$ in (2.4) around the unfeasible one $\tilde f_\varepsilon(t)$. Assumptions (A3)(ii) and (A4) impose that all the functions to be estimated have bounded derivatives. The last assumption in (A2) is useful for obtaining the uniform convergence of the Nadaraya-Watson estimator of $m_{\theta_0}$ in (2.1) (see for instance Einmahl and Mason, 2005). This assumption is also needed in the study of the difference between the feasible estimator $\hat f_\varepsilon(t)$ and the unfeasible estimator $\tilde f_\varepsilon(t)$. Finally, (A9)(i) is needed for identifying the model (see Vanhems and Van Keilegom (2011)).

3 Asymptotic results

In this section we are interested in the asymptotic behavior of the estimator $\hat f_\varepsilon(t)$. To this end, we first investigate its asymptotic representation, which will be needed to show its asymptotic normality.

Theorem 3.1. Assume (A1)-(A9). Then,

$$\hat f_\varepsilon(t) - f_\varepsilon(t) = \frac{1}{nb} \sum_{i=1}^n K_3\Big(\frac{\varepsilon_i - t}{b}\Big) - f_\varepsilon(t) + R_n(t),$$

where $R_n(t) = o_P((nb)^{-1/2})$ for all $t \in \mathbb{R}$.

This result is important, since it shows that, provided the bias term is negligible, the estimation of $\theta_0$ and $m(\cdot)$ has asymptotically no effect on the behavior of the estimator $\hat f_\varepsilon(t)$. Therefore, this estimator is asymptotically equivalent to the unfeasible estimator $\tilde f_\varepsilon(t)$, based on the unknown true errors $\varepsilon_1, \ldots, \varepsilon_n$.

Our next result gives the asymptotic normality of the estimator $\hat f_\varepsilon(t)$.

Theorem 3.2. Assume (A1)-(A9). In addition, assume that $nb^{2q_3 + 1} = O(1)$. Then,

$$\sqrt{nb}\,\big(\hat f_\varepsilon(t) - f_\varepsilon^b(t)\big) \stackrel{d}{\longrightarrow} N\Big(0,\ f_\varepsilon(t) \int K_3^2(v)\,dv\Big),$$

where

$$f_\varepsilon^b(t) = f_\varepsilon(t) + \frac{b^{q_3}}{q_3!}\, f_\varepsilon^{(q_3)}(t) \int v^{q_3} K_3(v)\,dv.$$
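Although the paper does not pursue bandwidth selection, Theorem 3.2 has the standard consequence (a routine computation we add here for orientation only): the pointwise asymptotic mean squared error of $\hat f_\varepsilon(t)$ is

$$\mathrm{AMSE}\{\hat f_\varepsilon(t)\} = \frac{f_\varepsilon(t) \int K_3^2(v)\,dv}{nb} + \Big(\frac{b^{q_3}}{q_3!}\, f_\varepsilon^{(q_3)}(t) \int v^{q_3} K_3(v)\,dv\Big)^2,$$

which is minimized by $b \asymp n^{-1/(2q_3+1)}$, yielding the rate $n^{-2q_3/(2q_3+1)}$ (hence $n^{-8/9}$ for the minimal order $q_3 = 4$ allowed by (A1)). Note that the extra condition $nb^{2q_3+1} = O(1)$ in Theorem 3.2 says precisely that the bias $f_\varepsilon^b(t) - f_\varepsilon(t)$ is at most of the same order as the stochastic term $(nb)^{-1/2}$.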

The proofs of Theorems 3.1 and 3.2 are given in Section 6.

4 Simulations

In this section, we investigate the performance of our method for different models and different sample sizes. Consider

$$\Lambda_{\theta_0}(Y) = \beta_0 + \beta_1 X^2 + \beta_2 \sin(\pi X) + \sigma_e\,\varepsilon, \qquad (4.1)$$

where $\Lambda_\theta$ is the Manly (1976) transformation

$$\Lambda_\theta(y) = \begin{cases} \dfrac{e^{\theta y} - 1}{\theta}, & \theta \ne 0, \\[4pt] y, & \theta = 0, \end{cases}$$

$\theta_0 \in [-0.5, 1.5]$, $X$ is uniformly distributed on the interval $[-0.5, 0.5]$, and $\varepsilon$ is independent of $X$ and has a standard normal distribution restricted to the interval $[-3, 3]$. We study three different model settings. For each of them, $\beta_0 = 3\sigma_e + 2$. The other parameters are chosen as follows:

Model 1: $\beta_1 = 5$, $\beta_2 = 2$, $\sigma_e = 1.5$;
Model 2: $\beta_1 = 3.5$, $\beta_2 = 1.5$, $\sigma_e = 1$;
Model 3: $\beta_1 = 2.5$, $\beta_2 = 1$, $\sigma_e = 0.5$.

The parameters and the error distribution have been chosen in such a way that the transformed variable $\Lambda_{\theta_0}(Y)$ is positive, to avoid problems when generating the variable $Y$. Our simulations are done for $\theta_0 = 0$, $0.5$ or $1$. The estimator of $\theta_0$ is chosen from a grid on the interval $[-0.5, 1.5]$ with step size $0.0625$. We used the kernel $K(x) = \frac{15}{16}(1 - x^2)^2\,\mathbf{1}(|x| \le 1)$ for both the regression function and the density estimators. The results are based on 100 random samples of size $n = 50$ or $n = 100$, and we worked with the bandwidths $h = 0.3\,n^{-1/5}$ and $b = g = r_n$, where $r_n = 1.06\,\mathrm{std}(\hat\varepsilon)\,n^{-1/5}$, which is Silverman's (1986) rule of thumb bandwidth for univariate density estimation. Here $\mathrm{std}(\hat\varepsilon)$ is the average of the standard deviations of $\hat\varepsilon$ over the 100 samples.
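The design above can be reproduced in a few lines. The sketch below (our own illustration, reusing `pl_estimator`, `nw_regression` and `f_eps_hat` from the earlier sketches) draws one sample from Model 1 with $\theta_0 = 0.5$ and computes $\hat\theta$ and $\hat f_\varepsilon$; unlike the paper, it uses a fixed pilot bandwidth $g$ and takes $\mathrm{std}(\hat\varepsilon)$ from the single sample rather than averaging over 100 replications.

```python
import numpy as np

rng = np.random.default_rng(1)

# Manly transformation, its y-derivative and its inverse (theta != 0 branch).
Lam     = lambda th, y: np.expm1(th * y) / th if th != 0 else y
dLam_dy = lambda th, y: np.exp(th * y) if th != 0 else np.ones_like(y)
Lam_inv = lambda th, z: np.log1p(th * z) / th if th != 0 else z

# Model 1 of Section 4: beta_0 = 3*sigma_e + 2, beta_1 = 5, beta_2 = 2, sigma_e = 1.5.
n, theta0, sigma_e = 100, 0.5, 1.5
X   = rng.uniform(-0.5, 0.5, n)
eps = np.clip(rng.standard_normal(n), -3, 3)              # N(0,1) restricted to [-3, 3]
Z   = (3 * sigma_e + 2) + 5 * X**2 + 2 * np.sin(np.pi * X) + sigma_e * eps
Y   = Lam_inv(theta0, Z)                                  # Z > 0, so the inverse exists

h    = 0.3 * n ** (-1 / 5)                                # regression bandwidth
grid = np.arange(-0.5, 1.5 + 1e-9, 0.0625)                # theta grid, step 0.0625
theta_hat = pl_estimator(X, Y, Lam, dLam_dy, grid, h, g=0.5)  # crude pilot g

Zh  = Lam(theta_hat, Y)
res = Zh - nw_regression(X, X, Zh, h)                     # estimated residuals
b   = 1.06 * res.std() * n ** (-1 / 5)                    # Silverman rule of thumb
t   = np.array([-1.0, 0.0, 1.0])
print(theta_hat, f_eps_hat(t, res, b))                    # hat f_eps at t = -1, 0, 1
```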

Table 1 shows the values of the mean, the standard deviation and the mean squared error of $\hat\theta$ for the considered models, sample sizes and values of $\theta_0$. We observe that the results for the different models are quite similar and, as expected, the results are better for $n = 100$ than for $n = 50$.

Table 2 shows the mean squared error (MSE) of the estimator $\hat f_\varepsilon(t)$ of the density of the standardized (pseudo-estimated) error $\hat\varepsilon = (\Lambda_{\hat\theta}(Y) - \hat m_{\hat\theta}(X))/\sigma_e$, for sample sizes $n = 50$ and $n = 100$ and for $t = -1$, $0$ and $1$. Results for $\tilde f_\varepsilon(t)$ have also been obtained, but are not reported here. Indeed, Figure 1, displaying $\hat f_\varepsilon(t)$, shows that, even though the residuals are standardized for each simulation (with known $\sigma_e$), better behavior is observed for models with smaller $\sigma_e$. Moreover, we observe that for $\theta_0 = 0$ there is very little difference between the curve of $\hat f_\varepsilon$ and the one of the standard normal density. On the other hand, for $\theta_0 = 0.5$ and $\theta_0 = 1$, we notice an important difference between the two curves under Models 1 and 2, but the difference is less important under Model 3.

                mean(theta_hat)            std(theta_hat)             MSE(theta_hat)
  n    theta_0  Mod. 1  Mod. 2  Mod. 3     Mod. 1  Mod. 2  Mod. 3     Mod. 1  Mod. 2  Mod. 3
  50   0        0.006   0.0065  0.0071     0.0116  0.0161  0.029      0.0064  0.0124  0.0277
       0.5      0.787   0.754   0.907      0.0417  0.048   0.0486     0.078   0.0867  0.1140
       1        0.8197  0.8449  0.8658     0.0792  0.0796  0.0798     0.506   0.492   0.409
  100  0        0.0055  0.0148  0.0170     0.0057  0.0078  0.0116     0.002   0.0059  0.012
       0.5      0.4596  0.4621  0.4728     0.0246  0.254   0.0270     0.064   0.0676  0.0752
       1        0.9196  0.9545  0.9749     0.0401  0.047   0.048      0.2092  0.1999  0.167

Table 1: Approximation of the mean, the standard deviation and the mean squared error of $\hat\theta$ for the three regression models. All numbers are calculated based on 100 random samples.

5 Conclusions

In this paper we have studied the estimation of the density of the error in a semiparametric transformation model. The regression function in this model is unspecified (except for some smoothness assumptions), whereas the transformation (of the dependent variable in the model) is supposed to belong to a parametric family of monotone transformations. The proposed estimator is a kernel-type estimator, and we have shown its asymptotic normality. The finite sample performance of the estimator is illustrated by means of a simulation study.

It would be interesting to explore various possible applications of the results in this paper. For example, one could use the results on the estimation of the error density to test hypotheses concerning e.g. the normality

of the errors, the homoscedasticity of the error variance, or the linearity of the regression function, all of which are important features in the context of transformation models.

6 Proofs

Proof of Theorem 3.1. Write $\hat f_\varepsilon(t) - f_\varepsilon(t) = [\hat f_\varepsilon(t) - \bar f_\varepsilon(t)] + [\bar f_\varepsilon(t) - f_\varepsilon(t)]$, where

$$\bar f_\varepsilon(t) = \frac{1}{nb} \sum_{i=1}^n K_3\Big(\frac{\hat\varepsilon_i(\theta_0) - t}{b}\Big)$$

and $\hat\varepsilon_i(\theta_0) = \Lambda_{\theta_0}(Y_i) - \hat m_{\theta_0}(X_i)$, $i = 1, \ldots, n$. In a completely similar way as was done for Lemma A.1 in Linton, Sperlich and Van Keilegom (2008), it can be shown that

$$\bar f_\varepsilon(t) - f_\varepsilon(t) = \frac{1}{nb} \sum_{i=1}^n K_3\Big(\frac{\varepsilon_i - t}{b}\Big) - f_\varepsilon(t) + o_P((nb)^{-1/2}) \qquad (6.1)$$

for all $t \in \mathbb{R}$. Note that the remainder term in Lemma A.1 in the above paper equals a sum of i.i.d. terms of mean zero, plus a $o_P(n^{-1/2})$ term. Hence, the remainder term in that paper is $O_P(n^{-1/2})$, whereas we write $o_P((nb)^{-1/2})$ in (6.1). Therefore, the result of the theorem follows if we prove that $\hat f_\varepsilon(t) - \bar f_\varepsilon(t) = o_P((nb)^{-1/2})$. To this end, write

$$\hat f_\varepsilon(t) - \bar f_\varepsilon(t) = \frac{1}{nb^2} \sum_{i=1}^n \big(\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0)\big)\, K_3^{(1)}\Big(\frac{\hat\varepsilon_i(\theta_0) - t}{b}\Big) + \frac{1}{2nb^3} \sum_{i=1}^n \big(\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0)\big)^2\, K_3^{(2)}\Big(\frac{\hat\varepsilon_i(\theta_0) + \beta\big(\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0)\big) - t}{b}\Big) \qquad (6.2)$$

for some $\beta \in (0, 1)$. In what follows, we will show that each of the terms above is $o_P((nb)^{-1/2})$.

First consider the last term of (6.2). Since $\Lambda_\theta(y)$ and $\hat m_\theta(x)$ are both twice continuously differentiable with respect to $\theta$, a second order Taylor expansion gives, for some $\theta_1$ between $\theta_0$ and $\hat\theta$ (to simplify the notation, we assume here that $p = \dim(\theta) = 1$),

$$\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0) = \Lambda_{\hat\theta}(Y_i) - \Lambda_{\theta_0}(Y_i) - \big(\hat m_{\hat\theta}(X_i) - \hat m_{\theta_0}(X_i)\big) = (\hat\theta - \theta_0)\big(\dot\Lambda_{\theta_0}(Y_i) - \dot{\hat m}_{\theta_0}(X_i)\big) + \tfrac{1}{2}(\hat\theta - \theta_0)^2\big(\ddot\Lambda_{\theta_1}(Y_i) - \ddot{\hat m}_{\theta_1}(X_i)\big).$$

Therefore, since $\hat\theta - \theta_0 = o_P((nb)^{-1/2})$ by Theorem 4.1 in Linton, Sperlich and Van Keilegom (2008) (as before, we work with a slower rate than what is shown in the latter paper, since this leads to weaker conditions on the bandwidths), assumptions (A2) and (A7) imply that

$$\frac{1}{nb^3} \sum_{i=1}^n \big(\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0)\big)^2\, K_3^{(2)}\Big(\frac{\hat\varepsilon_i(\theta_0) + \beta\big(\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0)\big) - t}{b}\Big) = o_P\big((nb^3)^{-1}\big),$$

which is $o_P((nb)^{-1/2})$, since $(nb^5)^{-1} = O(1)$ under (A2).

For the first term of (6.2), the decomposition of $\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0)$ given above yields

$$\frac{1}{nb^2} \sum_{i=1}^n \big(\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0)\big)\, K_3^{(1)}\Big(\frac{\hat\varepsilon_i(\theta_0) - t}{b}\Big) = \frac{\hat\theta - \theta_0}{nb^2} \sum_{i=1}^n \big(\dot\Lambda_{\theta_0}(Y_i) - \dot{\hat m}_{\theta_0}(X_i)\big)\, K_3^{(1)}\Big(\frac{\hat\varepsilon_i(\theta_0) - t}{b}\Big) + o_P((nb)^{-1/2})$$

$$= \frac{\hat\theta - \theta_0}{nb^2} \sum_{i=1}^n \big(\dot\Lambda_{\theta_0}(Y_i) - \dot m_{\theta_0}(X_i)\big)\, K_3^{(1)}\Big(\frac{\varepsilon_i - t}{b}\Big) + o_P((nb)^{-1/2}),$$

where the last equality follows from a Taylor expansion applied to $K_3^{(1)}$, the fact that $\dot{\hat m}_{\theta_0}(x) - \dot m_{\theta_0}(x) = O_P((nh)^{-1/2}(\log h^{-1})^{1/2})$ uniformly in $x \in \mathcal{X}_0$ by Lemma 6.1, and the fact that $nhb^3(\log h^{-1})^{-1} \to \infty$ under (A2). Further, write

$$E\Big[\sum_{i=1}^n \big(\dot\Lambda_{\theta_0}(Y_i) - \dot m_{\theta_0}(X_i)\big)\, K_3^{(1)}\Big(\frac{\varepsilon_i - t}{b}\Big)\Big] = n\,E\Big[\dot\Lambda_{\theta_0}(Y_i)\, K_3^{(1)}\Big(\frac{\varepsilon_i - t}{b}\Big)\Big] - n\,E\big[\dot m_{\theta_0}(X_i)\big]\, E\Big[K_3^{(1)}\Big(\frac{\varepsilon_i - t}{b}\Big)\Big] =: A_n - B_n.$$

We will only show that the first term above is $O(nb^2)$ for any $t \in \mathbb{R}$; the proof for the other term is similar. Let $\phi(x, t) = \dot\Lambda_{\theta_0}(\Lambda_{\theta_0}^{-1}(m(x) + t))$ and set $\varphi(x, t) = \phi(x, t)\, f_\varepsilon(t)$. Then, applying a Taylor expansion to $\varphi(x, \cdot)$, it follows that (for some $\beta \in (0, 1)$)

$$A_n = n\,E\Big[\dot\Lambda_{\theta_0}\big(\Lambda_{\theta_0}^{-1}(m(X_i) + \varepsilon_i)\big)\, K_3^{(1)}\Big(\frac{\varepsilon_i - t}{b}\Big)\Big] = n \iint \varphi(x, e)\, K_3^{(1)}\Big(\frac{e - t}{b}\Big) f_X(x)\,dx\,de = nb \iint \varphi(x, t + bv)\, K_3^{(1)}(v)\, f_X(x)\,dx\,dv$$

$$= nb \iint \Big[\varphi(x, t) + bv\, \frac{\partial\varphi}{\partial t}(x, t + \beta b v)\Big] K_3^{(1)}(v)\, f_X(x)\,dx\,dv = nb^2 \iint v\, \frac{\partial\varphi}{\partial t}(x, t + \beta b v)\, K_3^{(1)}(v)\, f_X(x)\,dx\,dv,$$

since $\int K_3^{(1)}(v)\,dv = 0$, and this is bounded by $C\,nb^2 \sup_{s : |t - s| \le \delta} E\big|\frac{\partial\varphi}{\partial s}(X, s)\big| = O(nb^2)$ by assumption (A9)(ii). Hence, Tchebychev's inequality ensures that

$$\frac{\hat\theta - \theta_0}{nb^2} \sum_{i=1}^n \big(\dot\Lambda_{\theta_0}(Y_i) - \dot m_{\theta_0}(X_i)\big)\, K_3^{(1)}\Big(\frac{\varepsilon_i - t}{b}\Big) = \frac{\hat\theta - \theta_0}{nb^2}\, O_P\big(nb^2 + (nb)^{1/2}\big) = o_P((nb)^{-1/2}),$$

since $nb^3 \to \infty$ by (A2). Substituting this in (6.2) yields

$$\frac{1}{nb^2} \sum_{i=1}^n \big(\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0)\big)\, K_3^{(1)}\Big(\frac{\hat\varepsilon_i(\theta_0) - t}{b}\Big) = o_P((nb)^{-1/2})$$

for any $t \in \mathbb{R}$. This completes the proof.

since K (1) (v)dv = 0, and this is ounded y Kn2 sup s: t s δ E φ s (X, s) = O(n2 ) y assumption (A9)(ii). Hence, Tcheychev s inequality ensures that ( θ θ o ) ( ) 2 ( Λ θo (Y i ) ṁ θo (X i ))K (1) εi t = ( θ θ o ) n 2 O P (n 2 + (n) 1/2 ) = o P ((n) 1/2 ), since n /2 y (A2). Sustituting this in (6.2), yields 1 ( ) n 2 ( ε i ( θ) ε i (θ o ))K (1) εi (θ o ) t = o P ((n) 1/2 ), for any t R. This completes the proof. Proof of Theorem.2. It follows from Theorem.1 that f ε (t) f ε (t) = [ f ε (t) E f ε (t)] + [E f ε (t) f ε (t)] + o P ((n) 1/2 ). (6.) The first term on the right hand side of (6.) is treated y Lyapounov s Central Limit Theorem (LCT) for triangular arrays (see e.g. Billingsley 1968, Theorem 7.). To this end, let f in (t) = 1 ( ) K εi t. Then, under (A1), (A2) and (A5) it can e easily shown that n E fin (t) E f in (t) Cn 2 f ε (t) K (v) dv + o ( n 2) ( n Var f ) /2 ( in (t) n 1 f ε (t) K(v)dv 2 + o ( = O((n) 1/2 ) = o(1), n 1)) /2 for some C > 0. Hence, the LCT ensures that This gives f ε (t) E f ε (t) Var f ε (t) = f ε (t) E f ε (t) Var f1n (t) ( n fε (t) E f ) ( d ε (t) N 0, f ε (t) n d N (0, 1). ) K(v)dv 2. (6.4) For the second term of (6.), straightforward calculations show that E f ε (t) f ε (t) = q q! f (q ) ε (t) v q K (v)dv + o( q ). Comining this with (6.4) and (6.), we otain the desired result. 12

Lemma 6.1. Assume (A1)-(A5) and (A7). Then,

$$\sup_{x \in \mathcal{X}_0} |\hat m_{\theta_0}(x) - m_{\theta_0}(x)| = O_P\big((nh)^{-1/2}(\log h^{-1})^{1/2}\big), \qquad \sup_{x \in \mathcal{X}_0} |\dot{\hat m}_{\theta_0}(x) - \dot m_{\theta_0}(x)| = O_P\big((nh)^{-1/2}(\log h^{-1})^{1/2}\big).$$

Proof. We will only give the proof for $\dot{\hat m}_{\theta_0}(x) - \dot m_{\theta_0}(x)$, the proof for $\hat m_{\theta_0}(x) - m_{\theta_0}(x)$ being very similar. Let $c_n = (nh)^{-1/2}(\log h^{-1})^{1/2}$, and define

$$\hat{\dot r}_{\theta_0}(x) = \frac{1}{nh} \sum_{j=1}^n \dot\Lambda_{\theta_0}(Y_j)\, K_1\Big(\frac{X_j - x}{h}\Big), \qquad \dot r_{\theta_0}(x) = E[\hat{\dot r}_{\theta_0}(x)], \qquad \bar f_X(x) = E[\hat f_X(x)],$$

where $\hat f_X(x) = (nh)^{-1} \sum_{j=1}^n K_1\big(\frac{X_j - x}{h}\big)$. Then,

$$\sup_{x \in \mathcal{X}_0} |\dot{\hat m}_{\theta_0}(x) - \dot m_{\theta_0}(x)| \le \sup_{x \in \mathcal{X}_0} \Big|\dot{\hat m}_{\theta_0}(x) - \frac{\dot r_{\theta_0}(x)}{\bar f_X(x)}\Big| + \sup_{x \in \mathcal{X}_0} \frac{1}{\bar f_X(x)}\, \big|\dot r_{\theta_0}(x) - \bar f_X(x)\, \dot m_{\theta_0}(x)\big|. \qquad (6.5)$$

Since $E[\dot\Lambda_{\theta_0}^4(Y) \mid X = x] < \infty$ uniformly in $x \in \mathcal{X}$ by assumption (A7), a similar proof as was given for Theorem 2 in Einmahl and Mason (2005) ensures that

$$\sup_{x \in \mathcal{X}_0} \Big|\dot{\hat m}_{\theta_0}(x) - \frac{\dot r_{\theta_0}(x)}{\bar f_X(x)}\Big| = O_P(c_n).$$

Consider now the second term of (6.5). Since $E[\dot\varepsilon(\theta_0) \mid X] = 0$, where $\dot\varepsilon(\theta_0) = \frac{d}{d\theta}(\Lambda_\theta(Y) - m_\theta(X))\big|_{\theta = \theta_0}$, we have

$$\dot r_{\theta_0}(x) = h^{-1} E\Big[\{\dot m_{\theta_0}(X) + \dot\varepsilon(\theta_0)\}\, K_1\Big(\frac{X - x}{h}\Big)\Big] = h^{-1} E\Big[\dot m_{\theta_0}(X)\, K_1\Big(\frac{X - x}{h}\Big)\Big] = \int \dot m_{\theta_0}(x + hv)\, K_1(v)\, f_X(x + hv)\,dv,$$

from which it follows that

$$\dot r_{\theta_0}(x) - \bar f_X(x)\, \dot m_{\theta_0}(x) = \int \big[\dot m_{\theta_0}(x + hv) - \dot m_{\theta_0}(x)\big]\, K_1(v)\, f_X(x + hv)\,dv.$$

Hence, a Taylor expansion applied to $\dot m_{\theta_0}(\cdot)$ yields

$$\sup_{x \in \mathcal{X}_0} \big|\dot r_{\theta_0}(x) - \bar f_X(x)\, \dot m_{\theta_0}(x)\big| = O(h^{q_1}) = O(c_n),$$

since $nh^{2q_1 + 1}(\log h^{-1})^{-1} = O(1)$ by (A2). This proves that the second term of (6.5) is $O_P(c_n)$, since it can easily be shown that $\bar f_X(x)$ is bounded away from 0 and infinity, uniformly in $x \in \mathcal{X}_0$, using (A3)(ii).
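Finally, the uniform rate in Lemma 6.1 can be eyeballed numerically (again our own check, with a second-order Gaussian kernel rather than the higher-order $K_1$ of (A1)): the ratio of the interior sup-error of the Nadaraya-Watson estimator to $c_n = (nh)^{-1/2}(\log h^{-1})^{1/2}$ should remain of moderate size as $n$ grows, the bandwidth keeping bias and stochastic error of comparable order.

```python
import numpy as np

rng = np.random.default_rng(3)
m = lambda x: np.sin(2 * x) + x**2              # an arbitrary smooth regression function

for n in [500, 2000, 8000, 32000]:
    h = 0.3 * n ** (-1 / 5)
    X = rng.uniform(-1, 1, n)
    Y = m(X) + rng.standard_normal(n)
    x0 = np.linspace(-0.8, 0.8, 200)            # interior region, playing the role of X_0
    w = np.exp(-0.5 * ((X[None, :] - x0[:, None]) / h) ** 2)
    m_hat = (w * Y[None, :]).sum(axis=1) / w.sum(axis=1)
    c_n = np.sqrt(np.log(1 / h) / (n * h))
    print(n, np.abs(m_hat - m(x0)).max() / c_n)  # ratio should stay bounded
```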

References

[1] Akritas, M.G. and Van Keilegom, I. (2001). Non-parametric estimation of the residual distribution. Scand. J. Statist., 28, 549-567.

[2] Amemiya, T. (1985). Advanced Econometrics. Harvard University Press, Cambridge.

[3] Bickel, P.J. and Doksum, K. (1981). An analysis of transformations revisited. J. Amer. Statist. Assoc., 76, 296-311.

[4] Billingsley, P. (1968). Convergence of Probability Measures. Wiley.

[5] Box, G.E.P. and Cox, D.R. (1964). An analysis of transformations. J. Roy. Statist. Soc. - Ser. B, 26, 211-252.

[6] Carroll, R.J. and Ruppert, D. (1988). Transformation and Weighting in Regression. Chapman and Hall, New York.

[7] Chen, G., Lockhart, R.A. and Stephens, A. (2002). Box-Cox transformations in linear models: Large sample theory and tests of normality (with discussion). Canad. J. Statist., 30, 177-234.

[8] Cheng, F. (2005). Asymptotic distributions of error density and distribution function estimators in nonparametric regression. J. Statist. Plann. Infer., 128, 327-349.

[9] Efromovich, S. (2005). Estimation of the density of the regression errors. Ann. Statist., 33, 2194-2227.

[10] Einmahl, U. and Mason, D.M. (2005). Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist., 33, 1380-1403.

[11] Escanciano, J.C. and Jacho-Chavez, D. (2010). $\sqrt{n}$-uniformly consistent density estimation in nonparametric regression models (submitted).

[12] Fitzenberger, B., Wilke, R.A. and Zhang, X. (2010). Implementing Box-Cox quantile regression. Econometric Rev., 29, 158-181.

[13] Horowitz, J.L. (1998). Semiparametric Methods in Econometrics. Springer-Verlag, New York.

[14] Linton, O., Sperlich, S. and Van Keilegom, I. (2008). Estimation of a semiparametric transformation model. Ann. Statist., 36, 686-718.

[15] Manly, B.F. (1976). Exponential data transformations. The Statistician, 25, 37-42.

[16] Müller, U.U., Schick, A. and Wefelmeyer, W. (2004). Estimating linear functionals of the error distribution in nonparametric regression. J. Statist. Plann. Infer., 119, 75-93.

[17] Nadaraya, E.A. (1964). On a regression estimate. Teor. Verojatnost. i Primenen., 9, 157-159.

[18] Neumeyer, N. and Van Keilegom, I. (2010). Estimating the error distribution in nonparametric multiple regression with applications to model testing. J. Multiv. Anal., 101, 1067-1078.

[19] Pinsker, M.S. (1980). Optimal filtering of a square integrable signal in Gaussian white noise. Problems Inform. Transmission, 16, 52-68.

[20] Sakia, R.M. (1992). The Box-Cox transformation technique: a review. The Statistician, 41, 169-178.

[21] Samb, R. (2010). Contribution à l'estimation nonparamétrique de la densité des erreurs de régression. Doctoral Thesis. Available at http://arxiv.org/PS_cache/arxiv/pdf/1011/1011.0674v1.pdf.

[22] Shin, Y. (2008). Semiparametric estimation of the Box-Cox transformation model. Econometrics J., 11, 517-537.

[23] Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability, Chapman and Hall, London.

[24] Vanhems, A. and Van Keilegom, I. (2011). Semiparametric transformation model with endogeneity: a control function approach (submitted).

[25] Watson, G.S. (1964). Smooth regression analysis. Sankhyā - Ser. A, 26, 359-372.

[26] Zellner, A. and Revankar, N.S. (1969). Generalized production functions. Rev. Economic Studies, 36, 241-250.

                         Mean squared error of f_eps_hat(t)
  n    theta_0   t       Model 1   Model 2   Model 3
  50   0        -1       0.0026    0.0025    0.0019
                 0       0.0040    0.007     0.0028
                 1       0.0026    0.002     0.0017
       0.5      -1       0.006     0.0048    0.0025
                 0       0.0527    0.072     0.0147
                 1       0.0062    0.0046    0.0020
       1        -1       0.0078    0.0048    0.0024
                 0       0.0564    0.014     0.01
                 1       0.0049    0.000     0.0019
  100  0        -1       0.0012    0.0011    0.0008
                 0       0.009     0.005     0.0026
                 1       0.0017    0.0015    0.0012
       0.5      -1       0.0015    0.0014    0.0011
                 0       0.0075    0.0057    0.001
                 1       0.0021    0.0018    0.0012
       1        -1       0.0024    0.0018    0.0012
                 0       0.0110    0.0052    0.0270
                 1       0.0019    0.0016    0.0012

Table 2: Mean squared error of $\hat f_\varepsilon(t)$ for the three regression models. All numbers are calculated based on 100 random samples.

[Figure 1: a 3 x 3 grid of density plots; columns titled Model 1, Model 2 and Model 3, x-axes from -4 to 4, y-axes labeled "Densities".]

Figure 1: Curves of the pointwise average of $\hat f_\varepsilon$ over 100 random samples of size $n = 100$ (solid curve) and of the standard normal density (dashed curve), for $\theta_0 = 0$ (first row), $\theta_0 = 0.5$ (second row) and $\theta_0 = 1$ (third row).