USING A BIMODAL KERNEL FOR A NONPARAMETRIC REGRESSION SPECIFICATION TEST

Size: px

Start display at page:

Download "USING A BIMODAL KERNEL FOR A NONPARAMETRIC REGRESSION SPECIFICATION TEST"

Dwayne Townsend
5 years ago
Views:

1 Statistica Sinica 25 (2015), doi: USING A BIMODAL KERNEL FOR A NONPARAMETRIC REGRESSION SPECIFICATION TEST Cheolyong Park 1, Tae Yoon Kim 1, Jeongcheol Ha 1, Zhi-Ming Luo 1 and Sun Young Hwang 2 1 Keimyung University and 2 Sookmyung Womens University Abstract: For a nonparametric regression model with a fixed design, we consider the model specification test based on a kernel. We find that a bimodal kernel is useful for the model specification test with a correlated error, whereas a conventional unimodal kernel is useful only for an iid error. Another finding is that the model specification test suffers from a convergence rate change depending on whether the errors are correlated or not. These results are verified by deriving an asymptotic null distribution and asymptotic (local) power, and by performing a simulation. The validity of the bimodal kernel for testing is demonstrated with the drum roller data (see Laslett (1994) and Altman (1994)). Key words and phrases: bimodal kernel, convergence rate change, correlated error, nonparametric specification test. 1. Introduction Suppose that we are concerned with the nonparametric regression function specification test given Y i = m(x i ) + η i (i = 1,..., n), (1.1) where m is a smooth function defined on [0,1], x i = i/n, and {η i } is a zeromean, covariance stationary process. Here, the design points grow closer as the sample size increases; while, the error process remains the same. Refer to Kim et al. (2009) and references therein for detailed discussions about this model. The testing problem under consideration is whether m(x) belongs to a specific parametric family. This can be described as versus H 0 : m(x) = g(x, γ 0 ) for all x [0, 1] with some γ 0 B R q H 1 : m(x) g(x, γ) for some x [0, 1] with all γ B R q. For testing H 0 nonparametrically, we consider a kernel-based test statistic ( T n = (n 2 h) 1 xi x ) j K ˆϵ i ˆϵ j, (1.2) h

2 1146 C. PARK, T. Y. KIM, J. HA, Z.-M. LUO AND S. Y. HWANG where K is a kernel satisfying (C1) below, ϵ i = Y i g(x i, γ) and ˆϵ i = Y i g(x i, ˆγ), with a consistent estimator ˆγ of γ 0. Here, T n is based on the average squared error d A ( ˆm, m) = n 1 ( ˆm(xi ) m(x i ) ) 2, (1.3) where ˆm(x) = (nh) 1 n K((x x i)/h)y i. In recent years, much research has been performed on applying T n to the model specification test for a random design regression model. See, for example, Fan and Li (1999) Zheng (1996), Luo, Kim, and Song (2011), and Khmaladze and Koul (2004). For the model specification test for the fixed design regression model, not many results are available. See, e.g., Eubank and Spiegelman (1990) or the monograph by Hart (1997), which studies nonparametric lack-of-fit test with iid errors. We study T n as a nonparametric regression specification test for the fixed design regression model, particularly when errors are correlated. The major strengths of T n is its consistency, since the existing parametric tests fail to be consistent against all deviations from the null. It is well known that when a nonparametric method such as ˆm is used to recover m, correlated errors cause trouble. See Opsomer, Wang, and Yang (2001) for a detailed discussion of this. We demonstrate that an analogous size distortion problem arises for a nonparametric specification test when errors are correlated. As a possible solution, we recommend the use of bimodal kernel K with K(0) = 0. In addition, we find that T n shows a power rate change, that yields a continuous but non-monotonic power function over the hypothesis domain. We proceed as follows. Section 2 proposes the specification test with bimodal kernel and demonstrates its usefulness with the drum roller data (Laslett (1994) and Altman (1994)). Section 3 discusses the power rate change of T n and its impact. Some other tests are discussed there. Section 4 reports on simulations that check our theoretical results. All proofs are deferred to the Appendix. 2. Size Distortion and Bimodal Kernel We need the following assumptions. (C1) K is a square integrable symmetric probability density function with support [ κ, κ] for some κ > 0, and K is Lipschitz continuous. (C2) Errors are a geometrically strong mixing sequence with mean zero and E η i r < for some r > 4. (C3) nh 3/2 and h 0. (C4) (i) g (1) (x, ) and g (2) (x, ) are continuous in x [0, 1] and dominated by a bounded function M g (x), where g (1) (x, ) and g (2) (x, ) are the first and

3 NONPARAMETRIC REGRESSION SPECIFICATION 1147 second partial derivatives with respect to γ, respectively. (ii) g (1) (x, γ) 2 = 0 for γ in a neighborhood of γ = lim pˆγ. (C5) The mean function m supported on the interval [0,1], and has a uniformly continuous and square integrable second derivative m (x) on the interval (0,1). Here, (C1) is a standard assumption on the kernel. If, in addition, K(0) = 0, the kernel is bimodal. For iid error, r = 4 suffices for (C2). If M b a be the σ-field generated by {ξ(t) : a t b}, then {ξ(t) : t R} is strong mixing if α(τ) = sup{ P (A B) P (A)P (B) : A M 0 and B M τ } = O(ρ τ ) for some 0 < ρ < 1 when τ. If (C4) is a standard assumption adopted in non-linear regression models: If γ = argmin γ B E(Y i g(x i, γ)) 2 and ˆγ = argmin γ B n (Y i g(x i, γ)) 2, then under H 0, γ = γ 0. And, uder (C4), γ = lim pˆγ and ˆγ γ = O p (n 1/2 ) under both H 0 and H 1. One may refer to Fan and Li (1999) for these results. (C5) is needed when H 1 holds. Theorem 1. Let (C1) (C4) and H 0 hold. If K is a bimodal kernel with K(0) = 0, then nh 1/2 T n ˆσ 0 N(0, 1) (2.1) in distribution, where ˆσ 2 0 σ0 2 = is a consistent estimator of [ (E(ϵ) K 2 2 (u)du ) ( 2 + j= ) 2 ] E(ϵ 0 ϵ j ). (2.2) Remark 1. Theorem 1 suggests an asymptotic one-sided test for H 0 versus H 1 : Reject H 0 at the significance level α if nh 1/2 T n /ˆσ 0 > z α, where z α is the upper α-percentile of the standard normal distribution. Because V ar( ϵ i ) = n(eϵ E(ϵ 0 ϵ j )) + o(n), one possible estimator of σ0 2 is to use the block bootstrap variance estimator of V n = n ˆϵ i, or V ar (V n ), where ˆϵ i = Y i ĝ(x i, ˆγ): ˆσ 0 2 = j=1 K 2 (u)du[(v ar V n n )2 + ( ˆϵ 2 i n )2 ]. (2.3)

4 1148 C. PARK, T. Y. KIM, J. HA, Z.-M. LUO AND S. Y. HWANG Table 1. Testing results for drum roller data. T u 230 T b 230 T u 1150 T b 1150 p-value Z score Remark 2. Correlated error causes size distortion to T n when a unimodal kernel is employed and a bimodal kernel can correct it. Indeed, we can show that ( n 1 P [nh 1/2 T n /ˆσ 0 > z α ] = P Z z α 2h 1/2 K(0) Eϵ 0 ϵ i ˆσ 0 ) + o(1), (2.4) where Z is N(0, 1). Verification of (2.4) is given in the Appendix. It indicates that the size inflated (deflated) by Eϵ 0ϵ i > 0(< 0) leads to a frequent (infrequent) rejection of H 0 unless a bimodal kernel with K(0) = 0 is used, and that when a unimodal kernel is used, the size distortion problem may be avoided by over-smoothing (large h). For iid error, Theorem 1 holds trivially for a unimodal kernel because E(η 0 η i ) = 0 for any i = 0. In order to demonstrate the usefulness of a bimodal kernel for T n, the drum roller data analyzed by Laslett (1994) and Altman (1994) are considered (the data are available on Statlib). As noted there, the data appear to exhibit a significant short-range positive correlation; so testing using a bimodal kernel is warranted. Figure 1 shows two fits to the two datasets using cubic B spline basis functions of order 5, where n = 1,150 represents the full dataset and n = 230 uses every fifth observation. We took the fit with n = 230 as ĝ 230 (x) for x [0, 1], and then, use it as g 0 (x, γ) = 5 j=1 γ jb j (x) = ĝ 230 (x), where B j (x) is the j-th cubic B-spline basis function. Testing H 0 : m(x) = g 0 (x, γ) for x [0, 1] with n = 230 or n = 1,150, we employed Tn u with the unimodal kernel and Tn b with the bimodal kernel. The testing results are summarized in Table 1. Table 1 shows that Tn u rejects H 0 and Tn b accepts it regardless of size n, and that the effect of a bimodal kernel is much more significant for n =1,150 than for n = 230, indicating that the size distortion is more severe for a large n in our case. 3. Power Rate Change We show that we reveal that T n has different convergence rates under H 1, and that such rate change complicates power under the local alternatives near H 0. Theorem 2. Let (C1) (C5) and H 1 hold. If K is a bimodal kernel with K(0) = 0, then n 1/2 (T n µ) 2 1/2 σ 1 N(0, 1) (3.1)

5 NONPARAMETRIC REGRESSION SPECIFICATION 1149 Figure 1. Scatterplots of drum roller data with fitted lines based on five bases of cubic B-splines for n =1,150 (left) and n = 230 (right). in distribution, where µ = (m(x) g(x, γ)) 2 dx and σ1 2 = Cov[(Y 0 m(x 0 ))(g(x 0, γ) m(x 0 )), (Y i m(x i ))(g(x i, γ) m(x i ))]. i= Theorems 1 and 2 have the test convergence rate at nh 1/2 under H 0, and n 1/2 under H 1. From Theorem 2, the test is consistent with rate n 1/2, not nh 1/2, as suggested by Theorem 1. As n, P ( Z n 1/2 (2 1/2 σ 1 ) 1 (z α ˆσ 0 (nh 1/2 ) 1 µ) ) 1. This is related to the fact that T n is a good approximation to d A ( ˆm, m). Since d A is decomposed as bias and variance components, and bias part goes to zero under H 0 but remains as a constant under H 1, the constant dominating d A (and hence T n ) under H 1 forces the rate change. We investigate the power rate change issue at a finer level of the local alternatives. Assume a sequence of local alternatives H 1n : m(x t ) = g(x t, γ 0 )+δ n l(x t ), where the known function l( ) is continuously differentiable and bounded by an integrable function M( ). We have σ 1 δ n under H 1n. Theorem 3. Let (C1) (C5) and H 1n hold. If K is a bimodal kernel with K(0) = 0, then (i) If nhδn 2 0, (nh 1/2 )(T n δnµ 2 1 )/σ 0 N(0, 1), where µ 1 = l 2 (x)dx. (ii) If nhδn 2, n 1/2 δn 1 (T n δnµ 2 1 )/(2 1/2 σ 2 ) N(0, 1), where σ2 2 = i= Cov[(Y 0 m(x 0 ))l(x 0 ), (Y i m(x i ))l(x i )]. (iii) If nhδ 2 n λ > 0, nh 1/2 (T n λ(nh) 1 µ 1 )/(σ σ2 2 /λ)1/2 N(0, 1). Remark 3. From Theorem 3, when the local alternative δ n is of the rate n 1/2 h 1/4, the test starts to have (local) power. The power function or rate, say

6 1150 C. PARK, T. Y. KIM, J. HA, Z.-M. LUO AND S. Y. HWANG s n, is a continuous, but not monotonic, function of δ n. This indicates that no particular asymptotic discontinuity is caused by the power rate change. Verification of these facts is given in the Appendix. Remark 4. Hart (1997) considered an analogy to T n with iid error and established a result corresponding to our Theorem 1. He also verified that when local alternative δ n is of the rate n 1/2 h 1/4, the test starts to have (local) power, but did not address the rate change and size distortion due to dependence. According to Hart (1997) (see Section there), if one employs the sup norm, it would be possible to consider the test statistic R n = sup x [0,1] (nhˆσ2 r) 1/2 ( x xi ) K ˆϵ i, (3.2) h where ˆσ 2 r is an estimator of σ 2 r = K 2 (u)du E(ϵ 0ϵ j ), but R n would suffer from the inflated size distortion when errors are correlated. Glesser and Moore (1983) mentioned that tests based on empiric distribution function are subject to size distortion due to positive dependence, for example. Some numerical work on this point is given in Table 5 of Section 4. Remark 5. Following the approach of Khmaladze and Koul (2004), it would be possible to consider a goodness-of-fit test for our problem, ξ n (y) = n 1/2 I(x i B)[I{ˆϵ i y} F ϵ (y)], where B [0, 1] and F ϵ is the distribution function of ϵ if the underlying distribution of F ϵ. They show that ξ n is an asymptotic distribution-free goodness-of-fit test and derive Brownian motion as its asymptotic distribution under iid conditions. If one replaces the indicator I by the kernel K in ξ n (y) above, then we have W n (y) = (nh) 1/2 ( x xi K h )[ L ( y ˆϵi h ) ] F ϵ (y), where L(y) = y K(x)dx. If the sup norm is applied to W n(y) for obtaining goodness-of-fit test under positive dependence, the test would again suffer from size distortion. 4. Simulation In this section, we show through simulations that the size and power of T n are affected by correlated error and furthermore, that such distortions of T n might be effectively handled by a bimodal kernel. We recommend the use of such a kernel

7 NONPARAMETRIC REGRESSION SPECIFICATION 1151 for a nonparametric regression specification test. As before, T u n and T b n denote the test statistic T n with a unimodal kernel and a bimodal kernel, respectively. The simulated regression setting was concerned with testing H 0 : m(x) = 300x 3 (1 x) 3 for all x [0, 1] versus, H 1 : m(x) γx 3 (1 x) 3 for some x [0, 1] with all γ R. The regression errors for the model Y j = m(x j ) + ϵ j for j = 1,..., n were produced by an AR(1) process ϵ j = ϕϵ j ϕ 2 Z j, where x j = j/n with n = 100, 400, 800, Z j s was pseudo iid normal random variables N(0, 1), and ϵ 1 was N(0, 1). The AR(1) parameters ϕ = 0.95, 0.9, 0.8,..., 0,..., 0.8, 0.9, Here ϕ = and 0.95 were added in order to consider severely correlated errors. The kernel functions used were K(x) = 630(4x 2 1) 2 x 4 I( 1/2 x 1/2) as a bimodal kernel for Tn, b and K(x) = (15/16)(1 x 2 ) 2 I( 1 x 1) as a unimodal kernel for Tn u. In addition, block bootstrap estimate ˆσ 0 2 with block length n 1/3 (refer to (2.3)) was employed. Thus, our test rejected H 0 if nh 1/2 T n /ˆσ 0 > z α. Table 2 provides the simulation results regarding the size distortion and its correction by a bimodal kernel. Tables 3 and 4 provide simulation results regarding how the power of T n is affected by correlated errors when m(x) = 300x 3 (1 x) or m(x) = 300x 3 (1 x) Table 2 calculates the size of Tn u and Tn b for various values of ϕ when H 0 is true. Indeed, we generated data from Y j = γx 3 j (1 x j) 3 + ϵ j for j = 1,..., n, where γ = 300 and γ is estimated via least squares, assuming m(x) = γx 3 (1 x) 3. From Table 2, one sees that Tn u accepts H 0 almost always when the errors are correlated negatively (i.e., ϕ < 0), whereas it rejects H 0 too often when ϕ > 0.3. Further, the size distortion of Tn u becomes more severe as the errors becomes more severely correlated. This verifies the size distortion of Tn u due to the correlated errors. Table 2 also confirms that Tn b corrects the size distortion reasonably well when the errors are positively correlated (ϕ > 0.3), as suggested by Theorem 1 and (2.4). When errors are negatively correlated or severely positively correlated at small n, Tn b does not provide a sufficient correction to the size distortion. In addition, there is not so much difference between Tn u and Tn b in simulated size for iid errors or ϕ = 0. Table 3 calculates the powers of Tn u and Tn b for various values of ϕ when H 1 is true or when m(x) = 300x 3 (1 x) From Table 3, one may observe that Tn u rejects H 0 : m(x) = 300x 3 (1 x) 3 less frequently and Tn b improves it when ϕ 0.7 at n = 100. Such improvements disappear as n increases or ϕ increases to zero. When ϕ is positive, Tn u outdoes Tn b significantly in power, which suggests that Tn b corrects the inflated power of Tn u at the cost of the reduced power. One sees that such ineludible adjustments of Tn b remain strong across n as ϕ gets close to 1. Table 4 calculates the powers of Tn u and Tn b by considering a more distant

8 1152 C. PARK, T. Y. KIM, J. HA, Z.-M. LUO AND S. Y. HWANG Table 2. Simulated size (%) for Tn u with a unimodal kernel and Tn b with a bimodal kernel at size α = 0.05 when m(x) = 300x 3 (1 x) 3 for x [0, 1] and h = n 1/5. ϕ T100 u T100 b T200 u T200 b T400 u T400 b T800 u T800 b Table 3. Simulated power (%) for Tn u with a unimodal kernel and Tn b with a bimodal kernel at size α = 0.05 when m(x) = 300x 3 (1 x) for x [0, 1] and h = n 1/5. ϕ T100 u T100 b T200 u T200 b T400 u T400 b T800 u T800 b

9 NONPARAMETRIC REGRESSION SPECIFICATION 1153 Table 4. Simulated power (%) for T u n with a unimodal kernel and T b n with a bimodal kernel at size α = 0.05 when m(x) = 300x 3 (1 x) for x [0, 1] and h = n 1/5. ϕ T100 u T100 b T200 u T200 b T400 u T400 b T800 u T800 b Table 5. Simulated mean and standard deviation for R n = max j=1,...,n (nhˆσ 2 r) 1/2 n K((x j x i )/h)ˆϵ i with a unimodal kernel when m(x) = 300x 3 (1 x) 3 for x [0, 1] and h = n 1/5. ϕ n = 100 n = 200 n = 400 n =

10 1154 C. PARK, T. Y. KIM, J. HA, Z.-M. LUO AND S. Y. HWANG m(x) = 300x 3 (1 x) as H 1. From Table 4, the overall adjustments made by Tn b tend to disappear as n increases. From Tables 3 and 4, one can infer that, as we have more distant m and large n, Tn u and Tn b achieve similar powers whether the errors are correlated or not. Conclusively, our simulation recommends the use of Tn b because it corrects the size distortion due to correlated error reasonably well, and performs similarly to Tn u for distant H 1 and large n, irrespective of error correlatedness. Table 5 presents the mean and standard deviation of R n = max j=1,...,n (nhˆσ2 r) 1/2 ( xj x ) i K ˆϵ i h for various values of ϕ and n = 100, 200, 400, and 800 when H 0 is true or m(x) = 300x 3 (1 x) 3. There, as ϕ increases from 0 to 0.95 (or dependence gets strong), both mean and standard deviation of R n clearly increase. This shows strong effect of dependency on R n and hence likely size distortion, as discussed in Remark 4. Also as n increases, the mean and standard deviation of R n tend to grow. Acknowledgement TY Kim s research was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF C00046) and by the 2012 Bisa research grant of Keimyung University. J Ha s research has been supported by the Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Science, ICT and Future Planning( ). S.Y. Hwang s work was supported by a grant from the National Research Foundation of Korea (NRF ). Appendix Verification of (2.4). Assume that H 0 is true and let T 1n = (n 2 h) 1 K ij ϵ i ϵ j, where i, j = 1,..., n, K ij = K((i j)/nh); then, from the proof of Theorem 1 below, it suffices to check that (2.4) holds for T 1n. By using (C1) (C3) and Lemma A.1 below T 1n = (n 2 h) 1{ K ij [ϵ i ϵ j Eϵ i ϵ j ] + } K ij Eϵ i ϵ j = (n 2 h) 1{ K ij [ϵ i ϵ j Eϵ i ϵ j ] + K(0) Eϵ i ϵ j

11 + NONPARAMETRIC REGRESSION SPECIFICATION 1155 = (n 2 h) 1{ [K ij K(0)]Eϵ i ϵ j } +O ((n 3 h 2 ) 1 = (n 2 h) 1 = (n 2 h) 1 K ij [ϵ i ϵ j Eϵ i ϵ j ] + K(0) i j Eϵ i ϵ j ) Eϵ i ϵ j } n 1 K ij [ϵ i ϵ j Eϵ i ϵ j ]+2(nh) 1 K(0) Eϵ 0 ϵ i +O((n 2 h 2 ) 1 ) n 1 K ij [ϵ i ϵ j Eϵ i ϵ j ]+2(nh) 1 K(0) Eϵ 0 ϵ i +o((nh 1/2 ) 1 ) n 1 = U 1n + 2(nh) 1 K(0) Eϵ 0 ϵ i + o((nh 1/2 ) 1 ). Since U n1 can be shown to converges to Z in distribution (see the proof of Theorem 1), we have [ nh 1/2 T ] ( n 1 n P > z α = P Z z α 2h 1/2 K(0) ˆσ 0 Eϵ 0 ϵ i ˆσ 0 ) + o(1). Verification of Remark 3. If δ n = (n 1/2 h 1/4 ) ϵ for some ϵ, then by (i) of Theorem 3 we have [ nh 1/2 T ( n p n0 = P > z α ] = P Z z α nh1/2 δ 2 ) nµ 1 ˆσ 0 ˆσ 0 ( = P Z z α (nh1/2 ) (1 ϵ) µ ) 1, ( ) ϵ 0 < ϵ < 1, 1 P Z z α µ 1 ˆσ σ 0, ϵ = 1, (A.1) 0 P (Z z α ), ϵ > 1, where 0 < ϵ 0 = (1 η)/(1 η/2) if h = n η for some 0 < η < 1. Note that nhδn 2 λ > 0 when ϵ = ϵ 0 and that nhδn 2 when 0 < ϵ < ϵ 0. Thus, (A.1) indicates that if the local alternative is of a rate slower than n 1/2 h 1/4 (or ϵ 0 < ϵ < 1), the test or p n0 has asymptotic power 1 with rate s n = (nh 1/2 ) 1 ϵ. If the local alternative is of a rate faster than n 1/2 h 1/4 (or 1 < ϵ), the test has trivial power. In order to prove that the power rate s n is a continuous, but a nonmonotonic function of δ n, we observe that if 0 < ϵ < ϵ 0, then by (ii) of Theorem 3, we have [ nh 1/2 T ] ( ) n p n0 = P > z α = P Z (2 1/2 σ 2 ) 1 (z αˆσ 0 (n 1/2 h 1/2 δ n ) 1 n 1/2 δ n µ 1 ) ˆσ 0

12 1156 C. PARK, T. Y. KIM, J. HA, Z.-M. LUO AND S. Y. HWANG ( ) = P Z (2 1/2 σ 2 ) 1 (z αˆσ 0 (n 1/2 h 1/4 ) (1 ϵ) h 1/4 n (1 ϵ)/2 h ϵ/4 µ 1 ) 1.(A.2) If ϵ = ϵ 0, then by (iii) of Theorem 3, we have [ nh 1/2 T ] ( ( n 2σ 2 ) p n0 = P > z α = P Z 2 ˆσ 0 λ + σ2 0 1/2(zαˆσ 0 λh 1/2 µ 1 )) 1. (A.3) Here p n0 has power rate s n = h 1/2 under the local alternative rate of δ n with ϵ = ϵ 0 and, in addition, it has power rate s n = n 1/2 (nh) ϵ/2 under the local alternative rate δ n with 0 < ϵ < 1. Thus, (A.1) (A.3) may be summarized in terms of power rate s n as follows; 0 ϵ 1; (nh 1/2 ) (1 ϵ) = n (1 η/2)(1 ϵ), ϵ 0 = 1 η 1 η/2 < ϵ < 1; s n = h 1/2 = n η/2, ϵ = ϵ 0 ; (A.4) n (1 ϵ)/2 h ϵ/4 = n 1/2 (1 η/2)ϵ/2 0 < ϵ < ϵ 0 ; n 1/2 ϵ = 0. From (A.4), as ϵ decreases to ϵ 0 (or the rate of the local alternative slows down), the power rate s n slows down to h 1/2 (or ϵ = ϵ 0 ) and then increases to n 1/2. Proof of Theorem 1. Under H 0, ˆϵ i = ϵ i [g(x i, ˆγ) g(x i, γ 0 )], and we can rewrite T n as T n = 1 n 2 h + 1 n 2 h ϵ i ϵ j K ij 2 n 2 h [g(x i, ˆγ) g(x i, γ 0 )]ϵ j K ij [g(x i, ˆγ) g(x i, γ 0 )][g(x j, ˆγ) g(x j, γ 0 )]K ij = T 1n 2T 2n + T 3n, (A.5) where K ij = K( i j nh ). We prove Theorem by showing that (i) nh1/2 T 1n N(0, σ0 2) in distribution; (ii) T 2n = o p ((nh 1/2 ) 1 ); (iii) T 3n = O p (n 1 ). Proof of (i). First note that T 1n = (n 2 h) 1{ K ij [ϵ i ϵ j Eϵ i ϵ j ] + = (n 2 h) 1{ = (n 2 h) 1 K ij [ϵ i ϵ j Eϵ i ϵ j ] + K ij [ϵ i ϵ j Eϵ i ϵ j ] + O((n 3 h 2 ) 1 K ij Eϵ i ϵ j } [K ij K(0)]Eϵ i ϵ j } i j Eϵ i ϵ j )

13 NONPARAMETRIC REGRESSION SPECIFICATION 1157 = (n 2 h) 1 K ij [ϵ i ϵ j Eϵ i ϵ j ] + O((n 2 h 2 ) 1 ) = (n 2 h) 1 K ij [ϵ i ϵ j Eϵ i ϵ j ] + o((nh 1/2 ) 1 ). (A.6) We have used K(0) = 0, (A.1) (A.3), and Lemma A.1. Let U 1n = (n 2 h) 1 Then the proof of (i) is complete if one shows K ij ϵ i ϵ j. (n 2 h) 1/2 [U 1n EU 1n ] N(0, σ 2 0). (A.7) We sketch the proof because its details are established by following the proof of Theorem 2 of Kim et al. (2014) which is a CLT for reduced U statistics under dependence. Here U 1n is basically a reduced degenerate U statistics in its structure because K is compactly supported by (A.1) and K(0) = 0. The reduced U statistics is defined as U nr = 1 i j κ n Ψ(Z i, Z j ), (A.8) N(κ n ) where Ψ is a symmetric function, N(κ n ) is the number of distinct pairs satisfying 1 i j κ n and 1 κ n n. Following the verification of (15) of Lemma 1 of Kim et al. (2014), we can obtain the variance of n 2 hu 1n as follows. V ar( = + K ij ϵ i ϵ j ) K 2 ije(ϵ 2 0)E(ϵ 2 1) + all different indices,i,j,k,l k=1,k i,j K ij K kl E(ϵ i ϵ k )E(ϵ j ϵ l ). K ij K ik E(ϵ 2 0)E(ϵ j ϵ k ) Refer to σ3 2 of (2) of Kim et al. (2014). Using KijE(ϵ 2 2 0)E(ϵ 2 1) = 2n 2 h K 2 [E(ϵ 2 0)] 2 + o(n 2 h), k=1,k i,j K ij K ik E(ϵ 2 0)E(ϵ j ϵ k )

14 1158 C. PARK, T. Y. KIM, J. HA, Z.-M. LUO AND S. Y. HWANG = 4n 2 h K 2 E(ϵ 2 0) j all different indices,i,j,k,l = 8n 2 h K 2 i E(ϵ 0 ϵ j ) + o(n 2 h), K ij K kl E(ϵ i ϵ k )E(ϵ j ϵ l ) E(ϵ 0 ϵ i ) j E(ϵ 0 ϵ j ) + o(n 2 h), we have ( V ar K ij ϵ i ϵ j )=n 2 h [ K 2 (u)du (E(ϵ) 2 ) 2 +( j= ] E(ϵ 0 ϵ j )) 2 +o(n 2 h). For establishing a CLT for nh 1/2 T 1n, one can follow the proof of Theorem 2 of Kim et al. (2014). Indeed, since E( K ij ϵ i ϵ j ) r = i 1 K i1 i 2 K i2r 1 i 2r E(ϵ i1 ϵ i2 ϵ i2r 1 ϵ i2r ), i 2 i 2r the proof of Theorem 2 of Kim et al. (2014) applies in a straightforward fashion, and the proof of (i) is complete. Proof of (ii). Using g(x t, ˆγ) g(x t, γ 0 ) = g (1) (x t, γ 0 )(ˆγ γ 0 )+1/2g (2) (x t, γ)(ˆγ γ 0 ) 2, where γ is between ˆγ and γ 0, we get T 2n = 1 ( n 2 (ˆγ γ 0 ) ϵ s g (1) (x t, γ 0 )K ts + (ˆγ γ 0 ) 2 h t s t s ϵ s g (2) (x t, γ) K ) ts. 2 Then it is not hard to check that, under the conditions of the Theorem, ϵ s g (1) (x t, γ 0 )K ts = O p (n 3/2 h) and ϵ s g (2) (x t, γ)k ts = O p (n 3/2 h). t s t s (A.9) Also refer to Theorem 2 of Kim, Luo, and Ha (2012). Thus 1 n 2 h (ˆγ γ 0) ϵ s g (1) (x t, γ 0 )K ts = O p (n 1 ) = o p ((nh 1/2 ) 1 ) t s 1 n 2 h (ˆγ γ 0) 2 ϵ s g (2) (x t, γ)k ts = O p (n 1 ) = o p ((nh 1/2 ) 1 ). t s We have also used (ˆγ γ 0 ) = O p (n 1/2 ) and H 0. This completes the proof of (ii).

15 NONPARAMETRIC REGRESSION SPECIFICATION 1159 Proof of (iii). This follows from the Mean Value Theorem, (A4)(i), and the fact that ˆγ γ 0 = O p (n 1/2 ). It is trivial to check that s t M g(x t )M g (x s )K st = O(n 2 h). This completes the proof of (iii). Proof of Theorem 2. Since, under H 1, ˆϵ i = θ i [g(x i, ˆγ) g(x i, γ )] where θ i = Y i g(x i, γ ) = η i + m(x i ) g(x i, γ ), we can rewrite T n as T n = 1 n 2 h + 1 n 2 h i j θ i θ j K ij 2 n 2 h [g(x i, ˆγ) g(x i, γ )]θ i K ij i j [g(x i, ˆγ) g(x i, γ )][g(x j, ˆγ) g(x j, γ )]K ij i j = T 1n 2T 2n + T 3n. (A.10) Now, T 1n may be rewritten as T 1n = T 11n + 2T 12n + T 13n where T 11n = 1 n 2 K ij (θ i Eθ i )(θ j Eθ j ), h T 12n = 1 n 2 h T 13n = 1 n 2 h i j K ij (θ i Eθ i )Eθ j, i j K ij Eθ i Eθ j, i j and Eθ i = m(x i ) g(x i, γ ). Now one can show that under the conditions of the Theorem T 11n = O p (n 1 h 1/2 ) = o p (n 1/2 ), T 12n = 1 η i (m(x i ) g(x i, γ )) + o p (n 1/2 ), n i T 13n = [m(x) g(x, γ )] 2 dx + o(1), (A.11) (A.12) (A.13) by using Theorem 2 of Kim, Luo, and Ha (2012). Application of the CLT for triangular array of random variables (see Lemma A.2 of Kim et al. (2014)) yields n 1/2 (σ 1 ) 1 T 12n N(0, 1) (A.14) in distribution where σ1 2 = Cov[(Y 0 m(x 0 ))(g(x 0, γ) m(x 0 )), (Y i m(x i ))(g(x i, γ) m(x i ))]. i= From the above results we have n 1/2 (2σ 1 ) 1 T 1n N(µ, 1) where µ = [m(x) g(x, γ )] 2 dx. Using ˆγ γ = o p (1) under H 1, it is easy to see that T 2n =

16 1160 C. PARK, T. Y. KIM, J. HA, Z.-M. LUO AND S. Y. HWANG o p (n 1/2 ) and T 3n = o p (n 1/2 ), which proves n 1/2 (2σ 1 ) 1 T n N(µ, 1) where µ = [m(x) g(x, γ )] 2 dx. Proof of Theorem 3. (i) If nhδn 2 0, then nh 1/2 T 12n = O p (n 1/2 h 1/2 δ n ) = o p (1) and nh 1/2 T 13n = nh 1/2 [m(x) g(x, γ )] 2 dx + o(1) = nh 1/2 δnµ o(1). Application of Theorem 2 of Kim, Luo, and Ha (2012) yields nh 1/2 T 11n N(0, σ0 2). Thus under H 1n we have nh 1/2 (T n δnµ 2 1 ) N(0, σ0 2 ) in distribution. (ii) If nhδn 2, then n 1/2 δn 1 T 11n = o p (1) since T 11n = O p (n 1 h 1/2 ). Application of CLT for triangular array of random variables (see Lemma A.2 of Kim et al. (2014)) to T 12n yields n 1/2 δn 1 T 12n N(0, σ2 2) where σ2 2 = Cov[(Y 0 m(x 0 ))l(x 0 ), (Y i m(x i ))l(x i )]. i= Then since n 1/2 δn 1 T 13n = n 1/2 δn 1 [ [m(x) g(x, γ )] 2 dx + o(1)], under H 1n we have n 1/2 δn 1 (T n δnµ 2 1 ) N(0, 2σ2 2 ) in distribution. (iii) If nhδn 2 λ > 0, then nh 1/2 T 11n N(0, σ0 2) and nh1/2 T 12n N(0, σ2 2/λ). It is then easy to check that, under the conditions the of Theorem, nh 1/2 T 13n = λh 1/2 [µ 1 + o(1)]. Thus under H 1n we have nh 1/2 (T n λ(nh) 1 µ 1 ) N(0, σ σ2 2 /λ) in distribution. This can be proved by Cramer-Wold device, as in Lemma 6 of Kim, Luo, and Kim (2011). Lemma A.1. Let ζ i M t i s i be α-mixing random variables, where s 1 < t 1 < s 2 < t 2 < < t m and s i+1 t i τ for all i. Assume that, for a positive integer l, ζ i pi = (E η i p i ) 1/p i <, for some p i > 1 with q = m p 1 i < 1. Then m m m E ζ i E ζ i 10(m 1)α(τ) 1 q ζ i pi. For complex valued random variables, it holds that m m m E ζ i E ζ i 40(m 1)α(τ) 1 q ζ i pi. Proof of Lemma A.1. The proof can be found at Theorem 7.4 of Roussas and Ioannides (1987).

17 NONPARAMETRIC REGRESSION SPECIFICATION 1161 References Altman, N. S. (1994). Krige, smooth, both or neither? Technical Report. Biometrics Unit, Cornell Univ. Eubank, R. and Spiegelman, S. (1990). Testing the goodness of fit of a linear model via nonparametric regression techniques, J. Ameri. Statist. Assoc. 85, Fan, Y. and Li, Q. (1999). Central limit theorem for degenerate U-statistics of absolutely regular processes with applications to model specification testing. J. Nonpara. Statist. 10, Glesser, L. J. and Moore, D. S. (1983). The effect of dependence on chi-squared and empiric distribution tests of fit. Ann. Statist. 11, Hart, J. D. (1997). Nonparametric Smoothing and Lack-of-Fit Tests. Springer. Khmaladze, E. V. and Koul, H. L. (2004). Martingale transforms goodness-of-fit tests in regression models. Ann. Statist. 32, Kim, T. Y., Ha, J., Hwang, S. Y., Park, C. and Luo, Z. (2014). Central limit theorems for reduced U-statistics under dependence and their usefulness. Austral. N. Z. J. Statist. 55, Kim, T. Y., Luo, Z. and Ha, J. (2012). Moment bounds for quadratic forms useful in nonparametric functional estimation under dependency. Quantitative Bioscience 31, 1-5. Kim, T. Y., Luo, Z. and Kim, C. (2011). Central limit theorem for degenerate variable U- statistics under dependence. J. Nonpara. Statist. 23, Kim, T. Y., Park, B. U., Moon, M. S. and Kim, C. (2009). Using bimodal kernel for inference in nonparametric regression with correlated errors. J. Multivariate Anal. 100, Laslett, G. (1994). Kriging and splines: An empirical comparison of their predictive performance in some applications. J. Ameri. Statist. Assoc. 89, Luo, Z., Kim, T. Y. and Song, G. M. (2011). Central limit theorem for quadratic errors of Nadaraya-Waston regression estimator under dependence. J. Korean Stat. Soc. 40, Opsomer, J., Wang, Y. and Yang, Y. (2001). Nonparametric regression with correlated errors. Statist. Sci. 16, Roussas, G. G. and Ioannides, D. (1987). Moment inequalities for mixing sequences of random variables. Stochas. Anal. and Appl. 5, Zheng, J. X. (1996). A consistent test of functional form via nonparametric estimation techniques. J. Econometrics 75, Department of Statistics, Keimyung University, Daegu , Korea. cypark1@kmu.ac.kr Department of Statistics, Keimyung University, Daegu , Korea. tykim@kmu.ac.kr Department of Statistics, Keimyung University, Daegu , Korea. jeicy@kmu.ac.kr Department of Statistics, Keimyung University, Daegu , Korea. zhimingluo@kmu.ac.kr Department of Statistics, Sookmyung Womens University, Seoul , Korea. shwang@sookmyung.ac.kr (Received January 2014; accepted July 2014)

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (