Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model


1. Introduction

The varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004; Fan and Huang, 2005) is a useful semiparametric model that models the key covariates linearly and the remaining covariates nonparametrically. Suppose we are given a random sample {(x_i, z_i, U_i, Y_i); 1 ≤ i ≤ n}, where Y_i is the response variable and (x_i, z_i, U_i) are covariates. The varying-coefficient partially linear model assumes the conditional linear structure

    Y = x^T α(U) + z^T β + ϵ,    (1.1)

where x = (X_1, ..., X_p)^T, α(U) = {α_1(U), ..., α_p(U)}^T, β = (β_1, ..., β_q)^T, and E(ϵ | x, z, U) = 0. We allow ϵ to depend on (x, z, U). In this article we assume U is a scalar; the results can be extended to multivariate U, although that extension may be of limited practical use because of the curse of dimensionality.

Note that if β = 0, then (1.1) becomes a standard varying-coefficient model. If α_j(·) = 0 for j = 2, ..., p and X_1 = 1, then (1.1) becomes the standard partially linear model (Wahba, 1984; Green and Silverman, 1994). If, in addition, β = 0, then (1.1) becomes the nonparametric regression model.

In this article we employ local linear techniques to estimate α(u_0); other nonparametric smoothers, such as splines, could also be used. We propose to obtain an initial consistent estimate of α(u_0) by minimizing the loss function

    l_1(a, b, β) = Σ_{i=1}^n ρ[Y_i − {a^T x_i + b^T x_i (U_i − u_0) + z_i^T β}] K_h(U_i − u_0),    (1.2)

where K_h(·) = K(·/h)/h for a kernel function K and a bandwidth h.
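To make the local-linear criterion concrete, (1.2) might be evaluated as in the following sketch. This is our own illustration, not code from the paper; the function names, the Epanechnikov kernel, and the Huber choice of ρ are assumptions.

```python
import numpy as np

def k_h(t, h):
    """Scaled Epanechnikov kernel K_h(t) = K(t/h)/h, supported on [-h, h]."""
    u = t / h
    return 0.75 * np.maximum(1.0 - u**2, 0.0) / h

def rho_huber(t, c=1.345):
    """Huber loss: quadratic for |t| <= c, linear beyond."""
    a = np.abs(t)
    return np.where(a <= c, 0.5 * t**2, c * a - 0.5 * c**2)

def l1_loss(a, b, beta, x, z, U, Y, u0, h):
    """Local-linear robust loss l_1(a, b, beta) of (1.2) at the point u0."""
    resid = Y - (x @ a + (x * (U - u0)[:, None]) @ b + z @ beta)
    return np.sum(rho_huber(resid) * k_h(U - u0, h))
```

Minimizing l1_loss over (a, b, β), for instance with a general-purpose optimizer, gives the initial estimate α̂^(0)(u_0) = â^(0); in a noiseless model with linear coefficient functions, the loss is exactly zero at the true local parameters.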

Note that the traditional least squares criterion corresponds to ρ(t) = t^2. In this article we mainly consider robust criteria ρ(·); median regression corresponds to the L_1 loss. The Huber function (Huber, 1981) is one commonly used robust loss, with

    ψ_c(t) = ρ'(t) = max{−c, min(c, t)},

where the tuning constant c regulates the amount of robustness. Huber (1981) recommends using c = 1.345 in practice; this choice produces a relative efficiency of approximately 95% when the error density is normal. Another possibility for ψ(·) is Tukey's bisquare function

    ψ_c(t) = t {1 − (t/c)^2}_+^2,

which downweights the tail contribution of t through a biweight function. In the parametric robustness literature, the choice c = 4.685, which also produces 95% efficiency, is recommended.

Let {â^(0), b̂^(0), β̂^(0)} be the minimizer of (1.2), and set α̂^(0)(u_0) = â^(0). We will prove that α̂^(0)(u_0) and β̂^(0) are (nh)^{1/2}-consistent. Note that the global parameter β is estimated locally here; to improve efficiency, β should be estimated globally. Replacing α(U) by α̂^(0)(U) in (1.1), we estimate β by minimizing

    l_2(β) = Σ_{i=1}^n ρ[Y_i − {x_i^T α̂^(0)(U_i) + z_i^T β}].    (1.3)

Denote by β̂^(1) the minimizer of (1.3). The root-n consistency of β̂^(1) will be proved in the next section.

Note that α̂^(0)(u) obtained from (1.2) must account for the uncertainty in β, since α(u) and β are estimated locally and simultaneously. We can therefore further improve the efficiency of the estimate of α(u_0) by minimizing the loss function

    l_3(a, b) = Σ_{i=1}^n ρ[Y_i − {a^T x_i + b^T x_i (U_i − u_0) + z_i^T β̂^(1)}] K_h(U_i − u_0),    (1.4)

where β̂^(1) is obtained from (1.3). Let {â^(1), b̂^(1)} be the minimizer of (1.4).
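For reference, the two robust ψ choices introduced above can be written out directly. This is our own illustrative sketch, not code from the paper.

```python
import numpy as np

def psi_huber(t, c=1.345):
    """Huber psi: the identity on [-c, c], clipped to +/-c outside."""
    return np.maximum(-c, np.minimum(c, t))

def psi_tukey(t, c=4.685):
    """Tukey bisquare psi: t{1 - (t/c)^2}^2_+, redescending to 0 for |t| >= c."""
    w = np.maximum(1.0 - (t / c) ** 2, 0.0)
    return t * w ** 2
```

The key qualitative difference is that Huber's ψ merely bounds the influence of large residuals, while Tukey's ψ redescends, so gross outliers receive essentially zero influence.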

Then α̂^(1)(u_0) = â^(1). Since β̂^(1) is root-n consistent, α̂^(1)(u_0) has the same efficiency as if β were known; therefore the one-step backfitting estimate α̂^(1)(u_0) is more efficient than α̂^(0)(u_0). The formal result is given in the next section. After replacing α(U) by α̂^(1)(U) in (1.1), we can then estimate β by minimizing

    l_4(β) = Σ_{i=1}^n ρ[Y_i − {x_i^T α̂^(1)(U_i) + z_i^T β}].    (1.5)

Denote by β̂^(2) the minimizer of (1.5). Since α̂^(1)(u) is more efficient than α̂^(0)(u), it is of interest to check whether β̂^(2) is more efficient than β̂^(1). In addition, since the efficiency of α̂^(1)(u) cannot be improved any further, it is natural to expect that further iteration will not improve the efficiency of β̂^(2). We will prove in the next section that neither of these two claims is necessarily true, and we further demonstrate that the second claim is in fact wrong in many common situations.

2. Computation

In this section we consider how to minimize (1.2)-(1.5). In general, consider minimizing

    l(θ) = Σ_{i=1}^n ρ(Y_i − x_i^T θ) K_h(U_i − u_0).    (2.1)

The estimate θ̂ solves

    0 = Σ_{i=1}^n ρ'(Y_i − x_i^T θ) x_i K_h(U_i − u_0) = Σ_{i=1}^n W(r_i)(Y_i − x_i^T θ) x_i K_h(U_i − u_0),    (2.2)

where W(r_i) = ρ'(r_i)/r_i and r_i = Y_i − x_i^T θ. If the weights W(r_i) are given, (2.2) is a weighted least squares problem with an explicit solution. We can therefore iterate between updating W(r_i) and updating θ. Given the initial

estimate θ^(0), at the (k+1)th step,

    θ^(k+1) = {Σ_{i=1}^n W(r_i^(k)) K_h(U_i − u_0) x_i x_i^T}^{−1} Σ_{i=1}^n W(r_i^(k)) K_h(U_i − u_0) x_i Y_i,

where r_i^(k) = Y_i − x_i^T θ^(k). This computational method can be applied to minimize any of the objective functions (1.2)-(1.5).

Note that the recommended values of c for the ρ(·) functions introduced in Section 1 assume that the errors have unit variance. In practice we therefore need to account for the scale of the data. For example, for Huber's ψ function we can use c = 1.345σ, where σ is the standard deviation of ϵ. In practice, σ can be estimated by the median absolute deviation (MAD) of the residuals ϵ̂_i from an initial fit,

    σ̂ = Median(|ϵ̂_i − Median(ϵ̂_i)|)/0.675,

where the ϵ̂_i can be obtained using the L_1 loss (median regression) for ρ(·); median regression can be computed with an existing R package (for example, quantreg) or with the iterative method introduced in this section.

3. Main Asymptotic Results

3.1 Initial estimate using the varying-coefficient model

To establish the asymptotic properties of the local minimizer {ã, b̃, β̃} of (1.2), let θ = (a^T, h b^T, β^T)^T and x̃_i = {x_i^T, x_i^T (U_i − u_0)/h, z_i^T}^T. Then

    l_1(θ) = Σ_{i=1}^n K_h(U_i − u_0) ρ(Y_i − x̃_i^T θ).    (3.1)
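Before turning to the asymptotics, the weighted least squares iteration of Section 2 can be sketched as follows, with the MAD-based scale folded into Huber's cutoff. This is our own implementation under those assumptions, not the paper's code.

```python
import numpy as np

def wls(x, Y, w):
    """Weighted least squares: argmin_theta sum_i w_i (Y_i - x_i^T theta)^2."""
    xw = x * w[:, None]
    return np.linalg.solve(xw.T @ x, xw.T @ Y)

def irls(x, Y, kernel_w, c0=1.345, n_iter=20):
    """Iterate theta^(k+1) = (sum W K x x^T)^{-1} sum W K x Y, with Huber
    weights W(r) = psi_c(r)/r = min(1, c/|r|) and cutoff c = c0 * sigma_hat."""
    theta = wls(x, Y, kernel_w)  # plain (kernel-weighted) least squares start
    for _ in range(n_iter):
        r = Y - x @ theta
        sigma = np.median(np.abs(r - np.median(r))) / 0.675  # MAD scale estimate
        c = c0 * max(sigma, 1e-8)
        W = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))  # Huber W(r)
        theta = wls(x, Y, kernel_w * W)
    return theta
```

With kernel_w set to K_h(U_i − u_0) and the rows of x replaced by the local design {x_i^T, x_i^T(U_i − u_0), z_i^T}, this computes a local step such as (1.2) or (1.4); with kernel_w identically 1 it computes the global steps (1.3) and (1.5).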

Denote by f(u) the marginal density of U. Let γ_j(U) = E{ρ''(ϵ)(X_j x^T, 0_{1×p}, X_j z^T)^T | U},

    A(x, z) = [ x x^T      0_{p×p}     x z^T
                0_{p×p}    µ_2 x x^T   0_{p×q}
                z x^T      0_{q×p}     z z^T ]

and

    Ã(x, z) = [ ν_0 x x^T   0_{p×p}     ν_0 x z^T
                0_{p×p}     ν_2 x x^T   0_{p×q}
                ν_0 z x^T   0_{q×p}     ν_0 z z^T ],    (3.2)

where µ_l = ∫ u^l K(u) du, ν_l = ∫ u^l K^2(u) du, and 0_{a×b} is an a × b matrix with all elements 0. Let A(u_0) = E{ρ''(ϵ) A(x, z) | U = u_0} and Ã(u_0) = E{ρ'(ϵ)^2 Ã(x, z) | U = u_0}.

Theorem 3.1. Suppose θ̂^(0) is the minimizer of (3.1). Then, as n → ∞, h → 0, and nh → ∞, we have

a)

    θ̂^(0) − θ = n^{−1} {f(u_0) A(u_0)}^{−1} Σ_{i=1}^n ρ'(Y_i − x̃_i^T θ) x̃_i K_h(U_i − u_0) + O_p(a_n),    (3.3)

where a_n = h^2 + (nh)^{−1/2} log^{1/2}(1/h), and the O_p(a_n) term holds uniformly for all u_0 in the support of U.

b)

    (nh)^{1/2} {θ̂^(0) − θ − bias(θ̂^(0))} →_d N{0, f(u_0)^{−1} A(u_0)^{−1} Ã(u_0) A(u_0)^{−1}},

where

    bias(θ̂^(0)) = (1/2) µ_2 h^2 A(u_0)^{−1} Σ_{j=1}^p α_j''(u_0) γ_j(u_0) + o_p(h^2).    (3.4)

3.2 One-step estimate

One step for β:

Theorem 3.2. Suppose β̂^(1) is the minimizer of (1.3). As n → ∞, nh^4 → 0, and nh^2/log(1/h) → ∞,

    n^{1/2}(β̂^(1) − β) →_d N(0, Σ_1),

where

    Σ_1 = B^{−1} var{ρ'(ϵ)z − E(ρ''(ϵ) z x^T | U) ψ(x, Y, z, U)} B^{−1},

    B = E{ρ''(ϵ) z z^T},    (3.5)

and ψ(x_j, Y_j, z_j, U_j) is the p × 1 vector consisting of the first p entries of A(U_j)^{−1} {ρ'(ϵ_j) x̃_j}.

One step for α(·): Let ξ = {α(u_0)^T, h α'(u_0)^T}^T and x̌_i = {x_i^T, x_i^T (U_i − u_0)/h}^T. Then the one-step estimate ξ̂^(1) = {α̂^(1)(u_0)^T, h α̂^(1)'(u_0)^T}^T is the minimizer of

    l_3(ξ) = Σ_{i=1}^n ρ(Y_i − x̌_i^T ξ − z_i^T β̂^(1)) K_h(U_i − u_0).

Let Δ(U) = E{ρ''(ϵ) x x^T | U}, Δ̃(U) = E{ρ'(ϵ)^2 x x^T | U}, η_j(U) = E{ρ''(ϵ) x X_j | U}, and

    H = diag(1, µ_2) ⊗ Δ(u_0),    H̃ = diag(ν_0, ν_2) ⊗ Δ̃(u_0),    (3.6)

where ⊗ denotes the Kronecker product.

7 Theorem 3.3. As n, h 0, and nh, [ nh ˆα(1) (U 0 ) α(u 0 ) bias{ ˆα (1) (U 0 )} ] { d N 0, ν 0 f(u 0 ) 1 1 (U 0 ) (U } 0 ) 1 (U 0 ), bias{ ˆα (1) (U 0 )} = 1 2 µ 2h 2 1 (U 0 ) p j=1 α j (U 0 )η j (U 0 ) + o p (h 2 ). (3.7) 3.3 Two step estimate Let ω (x j, Y j, z j, U j ) = E { ρ (ϵ)zx T U = Uj } 1 (U j )x j ρ (ϵ j ), (3.8) and M = E [E { ρ (ϵ)zx } { T U E ρ (ϵ)xx } T 1 { U E ρ (ϵ)zx } ] T T U. (3.9) Theorem 3.4. Suppose ˆβ (2) is the minimizer of (1.5). n(ˆβ(2) β) d N(0, Σ 2 ), Σ 2 = B 1 var [ ρ (ϵ i )z i + ω (x i, Y i, z i, U i ) MB 1 { ρ (ϵ i )z i + ω(x i, Y i, z i, U i )} ] B Full iteration Let ˆβ full be the estimate by full iteration of backing fitting algorithm. 7

8 Theorem 3.5. As n, nh 4 0, and nh 2 / log(1/h), n(ˆβfull β) d N(0, Σ full ), Σ full = (M + B) 1 var [ ρ (ϵ i )z i + ω (x i, Y i, z i, U i )] (M + B) 1, ω and M are defined in (3.8) and (3.9). 3.5 Partially linear models It is very difficult to compare Σ 1, Σ 2, and Σ full for general varying coefficient partially linear model due to their complicated expressions. In this section, similar to Carroll et al. (1997), we use a simpler partially linear model to compare different proposed estimators. Example: Considering the partially linear model (i.e., x = 1), we assume ρ(t) = t 2 /2, ϵ is independent of U and z with var(ϵ) = σ 2, and E(z) = 0 (see, Carroll et al. (1997); Opsomer and Ruppert (1999)). Then B = E(zz T ) = var(z) M = E [ E(z U)E(z T U) ] = var{e(z U)} = var(q(u)) ω (x i, Y i, z i, U i ) = E(z U i )ϵ i ω(x i, Y i, z i, U i ) = E(z U i )e T 1 A 1 (1, z T i ) T ϵ i. q(u) = E(z U), e 1 is the unit vector with 1 in the first position and 0 8

elsewhere, and

    A = [ 1            E(z^T | U_i)
          E(z | U_i)   E(z z^T | U_i) ].

For the one-step estimate, note that since cov{z, E(z | U) e_1^T A^{−1} (1, z^T)^T} = 0, we have

    Σ_1 = B^{−1} var{ρ'(ϵ) z + ω(x, Y, z, U)} B^{−1}
        = B^{−1} σ^2 var{z − E(z | U) e_1^T A^{−1} (1, z^T)^T} B^{−1}
        = σ^2 [var(z)^{−1} + var(z)^{−1} E{q(U) q(U)^T e_1^T A^{−1} e_1} var(z)^{−1}],

where

    e_1^T A^{−1} e_1 = {1 − q(U)^T E(z z^T | U)^{−1} q(U)}^{−1} = 1 + q(U)^T var(z | U)^{−1} q(U).

For the two-step estimator, note that

    cov{ρ'(ϵ_i) z_i + ω*(x_i, Y_i, z_i, U_i), ρ'(ϵ_i) z_i + ω(x_i, Y_i, z_i, U_i)}
        = σ^2 var(z) − σ^2 var{E(z | U)} + σ^2 var{E(z | U)} = σ^2 var(z).

Therefore,

    Σ_2 = σ^2 B^{−1} (E{var(z | U)} + 2 M B^{−1} var(z) + M B^{−1} [var(z) + E{q(U) q(U)^T a_{11}(U)}] B^{−1} M) B^{−1}
        = σ^2 B^{−1} [var(z) + var{q(U)}] B^{−1} + B^{−1} M Σ_1 M B^{−1}
        = σ^2 [var(z)^{−1} + var(z)^{−1} var{q(U)} var(z)^{−1}] + var(z)^{−1} var{q(U)} Σ_1 var{q(U)} var(z)^{−1},

where a_{11}(U) = e_1^T A^{−1} e_1. Note that B − M = E{var(z | U)} and

    var{ρ'(ϵ_i) z_i + ω*(x_i, Y_i, z_i, U_i)} = var{ϵ_i z_i − E(z | U_i) ϵ_i}    (3.10)
        = σ^2 E{var(z | U)},    (3.11)

so Σ_full = σ^2 E{var(z | U)}^{−1}.

When z is independent of U, q(U) = 0 and E{var(z | U)} = var(z). Therefore Σ_1 = Σ_2 = Σ_full = σ^2 var(z)^{−1}, i.e., the one-step estimator is as efficient as the fully iterated estimator. Carroll et al. (1997) derived a similar result for partially linear models.

Now assume only that q(U) = E(z | U) is independent of var(z | U) (for example, var(z | U) does not depend on U). Note that var(z) = var{q(U)} + E{var(z | U)}, var{q(U)} = E{q(U) q(U)^T}, E{var(z | U)^{−1}} ≥ E{var(z | U)}^{−1}, and

    E{q(U) q(U)^T var(z | U)^{−1} q(U) q(U)^T}
        = var{q(U)} E{var(z | U)^{−1}} var{q(U)} + E[{q(U) q(U)^T − var{q(U)}} var(z | U)^{−1} {q(U) q(U)^T − var{q(U)}}]
        ≥ var{q(U)} E{var(z | U)}^{−1} var{q(U)}.

Then

    Σ_1 = σ^2 [var(z)^{−1} + var(z)^{−1} var{q(U)} var(z)^{−1}    (3.12)
              + var(z)^{−1} E{q(U) q(U)^T var(z | U)^{−1} q(U) q(U)^T} var(z)^{−1}]
        ≥ σ^2 [var(z)^{−1} + var(z)^{−1} var{q(U)} var(z)^{−1}
              + var(z)^{−1} var{q(U)} E{var(z | U)}^{−1} var{q(U)} var(z)^{−1}]
        = σ^2 (var(z)^{−1} + var(z)^{−1} [var(z) − E{var(z | U)}] var(z)^{−1}
              + var(z)^{−1} [var(z) − E{var(z | U)}] E{var(z | U)}^{−1} [var(z) − E{var(z | U)}] var(z)^{−1})
        = σ^2 E{var(z | U)}^{−1} = Σ_full.    (3.13)

Equality holds only when q(U) q(U)^T var(z | U)^{−1} q(U) q(U)^T does not depend on U, i.e., is constant.

For the two-step estimator, since Σ_1 ≥ σ^2 E{var(z | U)}^{−1}, we have

    Σ_2 ≥ σ^2 [var(z)^{−1} + var(z)^{−1} var{q(U)} var(z)^{−1}
              + var(z)^{−1} var{q(U)} E{var(z | U)}^{−1} var{q(U)} var(z)^{−1}]
        = σ^2 E{var(z | U)}^{−1} = Σ_full,

and again equality holds only when q(U) q(U)^T var(z | U)^{−1} q(U) q(U)^T is constant. Therefore, the full iteration is at least as efficient as both the one-step and two-step estimators.

In addition,

    σ^{−2} var{q(U)} Σ_1 var{q(U)} = var{q(U)} [var(z)^{−1} + var(z)^{−1} E{q(U) q(U)^T} var(z)^{−1}
        + var(z)^{−1} E{q(U) q(U)^T var(z | U)^{−1} q(U) q(U)^T} var(z)^{−1}] var{q(U)},

and for any positive definite matrix C, we have

    C − var{q(U)} var(z)^{−1} C var(z)^{−1} var{q(U)} ≥ 0,

noting that var(z) ≥ var{q(U)} and var(z) C^{−1} var(z) ≥ var{q(U)} C^{−1} var{q(U)}. Based on (3.13), we have

    σ^{−2} var(z) (Σ_1 − Σ_2) var(z)
        = E{q(U) q(U)^T var(z | U)^{−1} q(U) q(U)^T} − σ^{−2} var{q(U)} Σ_1 var{q(U)}
        ≥ var{q(U)} E{var(z | U)}^{−1} var{q(U)} − var{q(U)} var(z)^{−1} var{q(U)}
          − var{q(U)} var(z)^{−1} var{q(U)} var(z)^{−1} var{q(U)}
          − var{q(U)} var(z)^{−1} var{q(U)} E{var(z | U)}^{−1} var{q(U)} var(z)^{−1} var{q(U)}
        = var{q(U)} [E{var(z | U)}^{−1} − E{var(z | U)}^{−1}] var{q(U)} = 0.

Therefore Σ_1 ≥ Σ_2, i.e., the two-step estimator is at least as efficient as the one-step estimator.

In summary, if z and U are independent, all three estimators have the same efficiency, and the one-step estimator suffices. In the general situation where q(U) = E(z | U) is independent of var(z | U), we only have Σ_1 ≥ Σ_2 ≥ Σ_full, although the three estimators share the same convergence rate. The above results are proved only for the partially linear model; we conjecture that they also hold for the varying-coefficient partially linear model, but the formal generalization needs further research.

Appendix

The following technical conditions are imposed. They are not the weakest possible conditions, but they facilitate the proofs.

(A1) α(u) has a continuous second derivative on its support D.

(A2) f(u) has a continuous first derivative.

(A3) x and z have bounded supports.

(A4) A(u) and Ã(u) are continuous, and A(u) is positive definite.

(A5) K(·) is a probability density, symmetric about 0, with compact support [−1, 1].

(A6) E{ρ'(ϵ)} = 0 and E{ρ''(ϵ)} > 0.

(A7) E|ρ'(ϵ)|^3 < ∞.

Proof of Theorem 3.1: Suppose θ̂^(0) is the minimizer of (3.1). Let θ̂*^(0) = γ_n^{−1}(θ̂^(0) − θ), where γ_n = (nh)^{−1/2} and θ = {α(u_0)^T, h α'(u_0)^T, β^T}^T is the true value. Then θ̂*^(0) minimizes

    l_1*(θ*) = h Σ_{i=1}^n K_h(U_i − u_0) [ρ(Y_i − x̃_i^T (θ + γ_n θ*)) − ρ(Y_i − x̃_i^T θ)].

Based on a Taylor expansion, we have

    l_1*(θ*) = V_n^T θ* + (1/2) θ*^T W_n θ* {1 + o_p(1)},

where

    V_n = −h γ_n Σ_{i=1}^n ρ'(Y_i − x̃_i^T θ) x̃_i K_h(U_i − u_0),

    W_n = h γ_n^2 Σ_{i=1}^n ρ''(Y_i − x̃_i^T θ) x̃_i x̃_i^T K_h(U_i − u_0).

Note that

    W_n = E{K_h(U − u_0) ρ''(ϵ) x̃ x̃^T} {1 + o_p(1)} = f(u_0) A(u_0) + o_p(1).

Therefore, we have

    l_1*(θ*) = V_n^T θ* + (1/2) θ*^T f(u_0) A(u_0) θ* + o_p(1).

By applying the convexity lemma (Pollard, 1991), we obtain

    θ̂*^(0) = −{f(u_0) A(u_0)}^{−1} V_n + o_p(1).    (3.14)

Note that

    E(V_n) = −(nh)^{1/2} E[K_h(U − u_0) ρ''(ϵ) {(1/2) Σ_{j=1}^p α_j''(u_0) (U − u_0)^2 X_j} x̃] {1 + o(1)}
           = −{(nh)^{1/2}/2} µ_2 h^2 f(u_0) Σ_{j=1}^p α_j''(u_0) γ_j(u_0) {1 + o(1)},

    cov(V_n) = h^2 γ_n^2 n E[K_h^2(U − u_0) ρ'(ϵ)^2 x̃ x̃^T] {1 + o(1)} = f(u_0) Ã(u_0) {1 + o(1)}.

Hence, based on the central limit theorem, by checking the Liapounov condition, we have

    (nh)^{1/2} {θ̂^(0) − θ − bias(θ̂^(0))} →_d N{0, f(u_0)^{−1} A(u_0)^{−1} Ã(u_0) A(u_0)^{−1}},

where bias(θ̂^(0)) is defined in (3.4). Based on Lemma A.1 and result (A.6) of Carroll et al. (1997), we also have a

stronger representation of (3.14):

    θ̂^(0)(u) − θ(u) = n^{−1} {f(u) A(u)}^{−1} Σ_{i=1}^n ρ'(Y_i − x̃_i^T θ) x̃_i K_h(U_i − u) + o_p(a_{n1})    (3.15)

holds uniformly over all u ∈ D, where a_{n1} = γ_n h^2 + γ_n^2 log(1/h).

Proof of Theorem 3.2: Denote β̂*^(1) = n^{1/2}(β̂^(1) − β), where β is the true value. Then β̂*^(1) minimizes

    l_2*(β*) = Σ_{i=1}^n {ρ(Y_i − x_i^T α̂^(0)(U_i) − z_i^T (β + n^{−1/2} β*)) − ρ(Y_i − x_i^T α̂^(0)(U_i) − z_i^T β)}.    (3.16)

By a Taylor expansion and some calculation,

    l_2*(β*) = −A_n^T β* + (1/2) β*^T B_n β* + o_p(1),    (3.17)

where

    A_n = n^{−1/2} Σ_{i=1}^n ρ'(Y_i − x_i^T α̂^(0)(U_i) − z_i^T β) z_i,

    B_n = n^{−1} Σ_{i=1}^n ρ''(Y_i − x_i^T α̂^(0)(U_i) − z_i^T β) z_i z_i^T.

For B_n, it can be shown that

    B_n = E{ρ''(ϵ) z_i z_i^T} + o_p(1) = B + o_p(1).

Then, we have

    l_2*(β*) = −A_n^T β* + (1/2) β*^T B β* + o_p(1).    (3.18)

Next, we expand A_n as

    A_n = n^{−1/2} Σ_{i=1}^n ρ'(Y_i − x_i^T α(U_i) − z_i^T β) z_i + T_{n1} + O_p(a_{1n}),

where a_{1n} = n^{1/2} ‖α̂^(0) − α‖^2 and

    T_{n1} = −n^{−1/2} Σ_{i=1}^n ρ''(Y_i − x_i^T α(U_i) − z_i^T β) z_i x_i^T {α̂^(0)(U_i) − α(U_i)}.    (3.19)

Let ψ(x_j, Y_j, z_j, U_j) be the p × 1 vector consisting of the first p entries of A(U_j)^{−1} {ρ'(Y_j − x_j^T α(U_j) − z_j^T β) x̃_j}. Based on the representation (3.15) and the condition nh^2/log(1/h) → ∞, we have O_p(n^{1/2} a_{n1}) = o_p(1). Since x̃_j^T θ(U_i) − x_j^T α(U_j) − z_j^T β = O{(U_i − U_j)^2}, we have

    T_{n1} = −n^{−3/2} Σ_{i=1}^n Σ_{j=1}^n ρ''(ϵ_i) z_i x_i^T f(U_i)^{−1} ψ(x_j, Y_j, z_j, U_j) K_h(U_i − U_j) + O_p(n^{1/2} h^2)
           = T_{n2} + O_p(n^{1/2} h^2).

It can be shown, by calculating the second moment, that

    T_{n2} − T_{n3} →_p 0,    (3.20)

where T_{n3} = n^{−1/2} Σ_{j=1}^n ω(x_j, Y_j, z_j, U_j), with

    ω(x_j, Y_j, z_j, U_j) = −E{ρ''(ϵ) z x^T | U = U_j} ψ(x_j, Y_j, z_j, U_j).

By the condition nh^4 → 0, we know

    A_n = n^{−1/2} Σ_{i=1}^n {ρ'(Y_i − x_i^T α(U_i) − z_i^T β) z_i + ω(x_i, Y_i, z_i, U_i)} + o_p(1).

By (3.18) and the quadratic approximation lemma,

    β̂*^(1) = B^{−1} A_n + o_p(1).    (3.21)

Note that E{ρ'(ϵ) | x, z, U} = 0 and E{ω(x, Y, z, U)} = 0. So, based on the central limit theorem, we have

    n^{1/2}(β̂^(1) − β) →_d N(0, B^{−1} Σ B^{−1}),

where Σ = var{ρ'(Y − x^T α(U) − z^T β) z + ω(x, Y, z, U)}.

Proof of Theorem 3.3: We have

    (nh)^{1/2} (ξ̂^(1) − ξ) = f(u_0)^{−1} H^{−1} Λ̂_n + o_p(1),    (3.22)

where

    Λ̂_n = (h/n)^{1/2} Σ_{i=1}^n ρ'(Y_i − x̌_i^T ξ − z_i^T β̂^(1)) x̌_i K_h(U_i − u_0).

It can be calculated that

    Λ̂_n = (h/n)^{1/2} Σ_{i=1}^n ρ'(Y_i − x̌_i^T ξ − z_i^T β) x̌_i K_h(U_i − u_0) + D_n + o_p(1),

where

    D_n = −(h/n)^{1/2} Σ_{i=1}^n ρ''(Y_i − x̌_i^T ξ − z_i^T β) x̌_i z_i^T (β̂^(1) − β) K_h(U_i − u_0).

Since n^{1/2}(β̂^(1) − β) = O_p(1), it can be shown that D_n = o_p(1). Hence

    (nh)^{1/2} (ξ̂^(1) − ξ) = f(u_0)^{−1} H^{−1} Λ_n + o_p(1),

where

    Λ_n = (h/n)^{1/2} Σ_{i=1}^n ρ'(Y_i − x̌_i^T ξ − z_i^T β) x̌_i K_h(U_i − u_0).

We can show that

    var(Λ_n) → f(u_0) H̃

and

    E(Λ_n) = {(nh)^{1/2}/2} µ_2 h^2 f(u_0) Σ_{j=1}^p α_j''(u_0) (1, 0)^T ⊗ η_j(u_0) {1 + o(1)}.

So, the asymptotic bias of the resulting estimator α̂^(1)(u_0) is

    bias{α̂^(1)(u_0)} = (1/2) µ_2 h^2 Δ^{−1}(u_0) Σ_{j=1}^p α_j''(u_0) η_j(u_0) + o_p(h^2),

and the asymptotic variance of α̂^(1)(u_0) is

    cov{α̂^(1)(u_0)} = {ν_0/(nh)} f(u_0)^{−1} Δ^{−1}(u_0) Δ̃(u_0) Δ^{−1}(u_0) {1 + o_p(1)}.

Based on the central limit theorem, the asymptotic normality

    (nh)^{1/2} [α̂^(1)(u_0) − α(u_0) − bias{α̂^(1)(u_0)}] →_d N{0, ν_0 f(u_0)^{−1} Δ^{−1}(u_0) Δ̃(u_0) Δ^{−1}(u_0)},

where the bias is given in (3.7), can also be proved under the regularity conditions.

Proof of Theorem 3.4: Based on the one-step result (3.22),

    α̂^(1)(U_i) − α(U_i) ≈ n^{−1} {f(U_i)}^{−1} Δ^{−1}(U_i) Σ_{j=1}^n ρ'(Y_j − x̌_j^T ξ − z_j^T β) x_j K_h(U_j − U_i)    (3.23)
        − n^{−1} {f(U_i)}^{−1} Δ^{−1}(U_i) D_n*,    (3.24)

where

    D_n* = Σ_{j=1}^n ρ''(Y_j − x̌_j^T ξ − z_j^T β) x_j z_j^T (β̂^(1) − β) K_h(U_j − U_i).

From (3.19), with α̂^(0) replaced by α̂^(1), we have

    T_{n1} = −n^{−3/2} Σ_{i=1}^n Σ_{j=1}^n ρ''(ϵ_i) z_i x_i^T Δ^{−1}(U_i) ρ'(ϵ_j) x_j K_h(U_i − U_j)/f(U_i)
             + n^{−3/2} Σ_{i=1}^n Σ_{j=1}^n ρ''(ϵ_i) z_i x_i^T Δ^{−1}(U_i) ρ''(ϵ_j) x_j z_j^T (β̂^(1) − β) K_h(U_i − U_j)/f(U_i) + O_p(n^{1/2} h^2)
           = T_{n2} + T_{n3} n^{1/2}(β̂^(1) − β) + O_p(n^{1/2} h^2).

It can be shown, by calculating the second moment, that

    T_{n2} − T_{n4} →_p 0,    (3.25)

where T_{n4} = n^{−1/2} Σ_{j=1}^n ω*(x_j, Y_j, z_j, U_j), with

    ω*(x_j, Y_j, z_j, U_j) = −E{ρ''(ϵ) z x^T | U = U_j} Δ^{−1}(U_j) x_j ρ'(ϵ_j).

In addition, we have

    T_{n3} = E[E{ρ''(ϵ) z x^T | U} Δ^{−1}(U) E{ρ''(ϵ) z x^T | U}^T] {1 + o_p(1)} = M {1 + o_p(1)}.

So

    n^{1/2}(β̂^(2) − β) = B^{−1} A_n + o_p(1),    (3.26)

where B = E{ρ''(ϵ) z z^T},

    A_n = n^{−1/2} Σ_{i=1}^n {ρ'(ϵ_i) z_i + ω*(x_i, Y_i, z_i, U_i)} + M n^{1/2}(β̂^(1) − β),    (3.27)

and

    n^{1/2}(β̂^(1) − β) = B^{−1} n^{−1/2} Σ_{i=1}^n {ρ'(ϵ_i) z_i + ω(x_i, Y_i, z_i, U_i)}.

So

    n^{1/2}(β̂^(2) − β) →_d N(0, Σ_2),

where

    Σ_2 = B^{−1} var[ρ'(ϵ_i) z_i + ω*(x_i, Y_i, z_i, U_i) + M B^{−1} {ρ'(ϵ_i) z_i + ω(x_i, Y_i, z_i, U_i)}] B^{−1}.

Proof of Theorem 3.5: At convergence, β̂^(1) = β̂, so

    (I − B^{−1} M) n^{1/2}(β̂ − β) = B^{−1} n^{−1/2} Σ_{i=1}^n {ρ'(ϵ_i) z_i + ω*(x_i, Y_i, z_i, U_i)}.

So

    n^{1/2}(β̂_full − β) →_d N(0, Σ_full),

where

    Σ_full = (B − M)^{−1} var[ρ'(ϵ_i) z_i + ω*(x_i, Y_i, z_i, U_i)] (B − M)^{−1}.

References

Carroll, R. J., Fan, J., Gijbels, I. and Wand, M. P. (1997). Generalized Partially Linear Single-Index Models. Journal of the American Statistical Association, 92,

Fan, J. and Huang, T. (2005). Profile Likelihood Inferences on Semiparametric Varying-Coefficient Partially Linear Models. Bernoulli, 11,

Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. London: Chapman and Hall.

Opsomer, J. D. and Ruppert, D. (1999). A Root-n Consistent Backfitting Estimator for Semiparametric Additive Modeling. Journal of Computational and Graphical Statistics, 8,

Wahba, G. (1984). Partial Spline Models for Semiparametric Estimation of Functions of Several Variables. In Statistical Analysis of Time Series, Proceedings of the Japan-U.S. Joint Seminar, Tokyo: Institute of Statistical Mathematics.

Xia, Y., Zhang, W., and Tong, H. (2004). Efficient Estimation for Semivarying-Coefficient Models. Biometrika, 91,

Zhang, W., Lee, S. Y., and Song, X. (2002). Local Polynomial Fitting in Semivarying Coefficient Models. Journal of Multivariate Analysis, 82,


More information

DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA

DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Statistica Sinica 18(2008), 515-534 DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Kani Chen 1, Jianqing Fan 2 and Zhezhen Jin 3 1 Hong Kong University of Science and Technology,

More information

The BLP Method of Demand Curve Estimation in Industrial Organization

The BLP Method of Demand Curve Estimation in Industrial Organization The BLP Method of Demand Curve Estimation in Industrial Organization 9 March 2006 Eric Rasmusen 1 IDEAS USED 1. Instrumental variables. We use instruments to correct for the endogeneity of prices, the

More information

Final Exam November 24, Problem-1: Consider random walk with drift plus a linear time trend: ( t

Final Exam November 24, Problem-1: Consider random walk with drift plus a linear time trend: ( t Problem-1: Consider random walk with drift plus a linear time trend: y t = c + y t 1 + δ t + ϵ t, (1) where {ϵ t } is white noise with E[ϵ 2 t ] = σ 2 >, and y is a non-stochastic initial value. (a) Show

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

F & B Approaches to a simple model

F & B Approaches to a simple model A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES. Liping Zhu, Mian Huang, & Runze Li. The Pennsylvania State University

SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES. Liping Zhu, Mian Huang, & Runze Li. The Pennsylvania State University SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES Liping Zhu, Mian Huang, & Runze Li The Pennsylvania State University Technical Report Series #10-104 College of Health and Human Development

More information

Functional Latent Feature Models. With Single-Index Interaction

Functional Latent Feature Models. With Single-Index Interaction Generalized With Single-Index Interaction Department of Statistics Center for Statistical Bioinformatics Institute for Applied Mathematics and Computational Science Texas A&M University Naisyin Wang and

More information

Cointegration Lecture I: Introduction

Cointegration Lecture I: Introduction 1 Cointegration Lecture I: Introduction Julia Giese Nuffield College julia.giese@economics.ox.ac.uk Hilary Term 2008 2 Outline Introduction Estimation of unrestricted VAR Non-stationarity Deterministic

More information

Lecture 6: Discrete Choice: Qualitative Response

Lecture 6: Discrete Choice: Qualitative Response Lecture 6: Instructor: Department of Economics Stanford University 2011 Types of Discrete Choice Models Univariate Models Binary: Linear; Probit; Logit; Arctan, etc. Multinomial: Logit; Nested Logit; GEV;

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

Estimation in Semiparametric Single Index Panel Data Models with Fixed Effects

Estimation in Semiparametric Single Index Panel Data Models with Fixed Effects Estimation in Semiparametric Single Index Panel Data Models with Fixed Effects By Degui Li, Jiti Gao and Jia Chen School of Economics, The University of Adelaide, Adelaide, Australia Abstract In this paper,

More information

SIMULTANEOUS CONFIDENCE BANDS AND HYPOTHESIS TESTING FOR SINGLE-INDEX MODELS

SIMULTANEOUS CONFIDENCE BANDS AND HYPOTHESIS TESTING FOR SINGLE-INDEX MODELS Statistica Sinica 24 (2014), 937-955 doi:http://dx.doi.org/10.5705/ss.2012.127 SIMULTANEOUS CONFIDENCE BANDS AND HYPOTHESIS TESTING FOR SINGLE-INDEX MODELS Gaorong Li 1, Heng Peng 2, Kai Dong 2 and Tiejun

More information

Introduction to Estimation Methods for Time Series models Lecture 2

Introduction to Estimation Methods for Time Series models Lecture 2 Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:

More information

Goodness-of-fit tests for the cure rate in a mixture cure model

Goodness-of-fit tests for the cure rate in a mixture cure model Biometrika (217), 13, 1, pp. 1 7 Printed in Great Britain Advance Access publication on 31 July 216 Goodness-of-fit tests for the cure rate in a mixture cure model BY U.U. MÜLLER Department of Statistics,

More information

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation Preface Nonparametric econometrics has become one of the most important sub-fields in modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric

More information

On robust and efficient estimation of the center of. Symmetry.

On robust and efficient estimation of the center of. Symmetry. On robust and efficient estimation of the center of symmetry Howard D. Bondell Department of Statistics, North Carolina State University Raleigh, NC 27695-8203, U.S.A (email: bondell@stat.ncsu.edu) Abstract

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain 0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Web-based Supplementary Material for. Dependence Calibration in Conditional Copulas: A Nonparametric Approach

Web-based Supplementary Material for. Dependence Calibration in Conditional Copulas: A Nonparametric Approach 1 Web-based Supplementary Material for Dependence Calibration in Conditional Copulas: A Nonparametric Approach Elif F. Acar, Radu V. Craiu, and Fang Yao Web Appendix A: Technical Details The score and

More information

NONPARAMETRIC MIXTURE OF REGRESSION MODELS

NONPARAMETRIC MIXTURE OF REGRESSION MODELS NONPARAMETRIC MIXTURE OF REGRESSION MODELS Huang, M., & Li, R. The Pennsylvania State University Technical Report Series #09-93 College of Health and Human Development The Pennsylvania State University

More information

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation

More information

Asymptotic standard errors of MLE

Asymptotic standard errors of MLE Asymptotic standard errors of MLE Suppose, in the previous example of Carbon and Nitrogen in soil data, that we get the parameter estimates For maximum likelihood estimation, we can use Hessian matrix

More information

SIMULTANEOUS CONFIDENCE INTERVALS FOR SEMIPARAMETRIC LOGISTICS REGRESSION AND CONFIDENCE REGIONS FOR THE MULTI-DIMENSIONAL EFFECTIVE DOSE

SIMULTANEOUS CONFIDENCE INTERVALS FOR SEMIPARAMETRIC LOGISTICS REGRESSION AND CONFIDENCE REGIONS FOR THE MULTI-DIMENSIONAL EFFECTIVE DOSE Statistica Sinica 20 (2010), 637-659 SIMULTANEOUS CONFIDENCE INTERVALS FOR SEMIPARAMETRIC LOGISTICS REGRESSION AND CONFIDENCE REGIONS FOR THE MULTI-DIMENSIONAL EFFECTIVE DOSE Jialiang Li 1, Chunming Zhang

More information

A PARADOXICAL EFFECT OF NUISANCE PARAMETERS ON EFFICIENCY OF ESTIMATORS

A PARADOXICAL EFFECT OF NUISANCE PARAMETERS ON EFFICIENCY OF ESTIMATORS J. Japan Statist. Soc. Vol. 34 No. 1 2004 75 86 A PARADOXICAL EFFECT OF NUISANCE PARAMETERS ON EFFICIENCY OF ESTIMATORS Masayuki Henmi* This paper is concerned with parameter estimation in the presence

More information

Inference for High Dimensional Robust Regression

Inference for High Dimensional Robust Regression Department of Statistics UC Berkeley Stanford-Berkeley Joint Colloquium, 2015 Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example Table of Contents 1 Background 2 Main Results 3 OLS:

More information

Clustering as a Design Problem

Clustering as a Design Problem Clustering as a Design Problem Alberto Abadie, Susan Athey, Guido Imbens, & Jeffrey Wooldridge Harvard-MIT Econometrics Seminar Cambridge, February 4, 2016 Adjusting standard errors for clustering is common

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Robust estimators for additive models using backfitting

Robust estimators for additive models using backfitting Robust estimators for additive models using backfitting Graciela Boente Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires and CONICET, Argentina Alejandra Martínez Facultad de Ciencias

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Maria Ponomareva University of Western Ontario May 8, 2011 Abstract This paper proposes a moments-based

More information

TESTING SERIAL CORRELATION IN SEMIPARAMETRIC VARYING COEFFICIENT PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL

TESTING SERIAL CORRELATION IN SEMIPARAMETRIC VARYING COEFFICIENT PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL Jrl Syst Sci & Complexity (2009) 22: 483 494 TESTIG SERIAL CORRELATIO I SEMIPARAMETRIC VARYIG COEFFICIET PARTIALLY LIEAR ERRORS-I-VARIABLES MODEL Xuemei HU Feng LIU Zhizhong WAG Received: 19 September

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS

ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS Richard L. Smith Department of Statistics and Operations Research University of North Carolina Chapel Hill, N.C.,

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity Local linear multiple regression with variable bandwidth in the presence of heteroscedasticity Azhong Ye 1 Rob J Hyndman 2 Zinai Li 3 23 January 2006 Abstract: We present local linear estimator with variable

More information

KANSAS STATE UNIVERSITY

KANSAS STATE UNIVERSITY ROBUST MIXTURES OF REGRESSION MODELS by XIUQIN BAI M.S., Kansas State University, USA, 2010 AN ABSTRACT OF A DISSERTATION submitted in partial fulfillment of the requirements for the degree DOCTOR OF PHILOSOPHY

More information

SEMIPARAMETRIC ESTIMATION OF CONDITIONAL HETEROSCEDASTICITY VIA SINGLE-INDEX MODELING

SEMIPARAMETRIC ESTIMATION OF CONDITIONAL HETEROSCEDASTICITY VIA SINGLE-INDEX MODELING Statistica Sinica 3 (013), 135-155 doi:http://dx.doi.org/10.5705/ss.01.075 SEMIPARAMERIC ESIMAION OF CONDIIONAL HEEROSCEDASICIY VIA SINGLE-INDEX MODELING Liping Zhu, Yuexiao Dong and Runze Li Shanghai

More information

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St.

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St. Regression Graphics R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Abstract This article, which is based on an Interface tutorial, presents an overview of regression

More information

1 Appendix A: Matrix Algebra

1 Appendix A: Matrix Algebra Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Calibration Estimation for Semiparametric Copula Models under Missing Data

Calibration Estimation for Semiparametric Copula Models under Missing Data Calibration Estimation for Semiparametric Copula Models under Missing Data Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Economics and Economic Growth Centre

More information

Adaptive Piecewise Polynomial Estimation via Trend Filtering

Adaptive Piecewise Polynomial Estimation via Trend Filtering Adaptive Piecewise Polynomial Estimation via Trend Filtering Liubo Li, ShanShan Tu The Ohio State University li.2201@osu.edu, tu.162@osu.edu October 1, 2015 Liubo Li, ShanShan Tu (OSU) Trend Filtering

More information

This paper is not to be removed from the Examination Halls

This paper is not to be removed from the Examination Halls ~~ST104B ZA d0 This paper is not to be removed from the Examination Halls UNIVERSITY OF LONDON ST104B ZB BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the Social Sciences,

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

Robust estimation, efficiency, and Lasso debiasing

Robust estimation, efficiency, and Lasso debiasing Robust estimation, efficiency, and Lasso debiasing Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics WHOA-PSI workshop Washington University in St. Louis Aug 12, 2017 Po-Ling

More information

1. Stochastic Processes and Stationarity

1. Stochastic Processes and Stationarity Massachusetts Institute of Technology Department of Economics Time Series 14.384 Guido Kuersteiner Lecture Note 1 - Introduction This course provides the basic tools needed to analyze data that is observed

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

DavidAllanBurt. Dissertation submitted to the Faculty of the. Virginia Polytechnic Institute and State University

DavidAllanBurt. Dissertation submitted to the Faculty of the. Virginia Polytechnic Institute and State University Bandwidth Selection Concerns for Jump Point Discontinuity Preservation in the Regression Setting Using M-smoothers and the Extension to Hypothesis Testing DavidAllanBurt Dissertation submitted to the Faculty

More information

Short Questions (Do two out of three) 15 points each

Short Questions (Do two out of three) 15 points each Econometrics Short Questions Do two out of three) 5 points each ) Let y = Xβ + u and Z be a set of instruments for X When we estimate β with OLS we project y onto the space spanned by X along a path orthogonal

More information

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas 0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity

More information

Topic 12 Overview of Estimation

Topic 12 Overview of Estimation Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Spatial Process Estimates as Smoothers: A Review

Spatial Process Estimates as Smoothers: A Review Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information