Kernel estimation for time series: An asymptotic theory


Stochastic Processes and their Applications 120 (2010)

Kernel estimation for time series: An asymptotic theory

Wei Biao Wu, Yinxiao Huang, Yibi Huang

Department of Statistics, The University of Chicago, 5734 S. University Avenue, Chicago, IL 60637, United States

Received 18 June 2009; received in revised form 4 August 2010; accepted 4 August 2010. Available online 8 August 2010.

Abstract

We consider kernel density and regression estimation for a wide class of nonlinear time series models. Asymptotic normality and uniform rates of convergence of kernel estimators are established under mild regularity conditions. Our theory is developed under the new framework of predictive dependence measures, which are directly based on the data-generating mechanisms of the underlying processes. The imposed conditions are different from the classical strong mixing conditions and are related to the sensitivity measure in the prediction theory of nonlinear time series. © 2010 Elsevier B.V. All rights reserved.

Keywords: Kernel estimation; Nonlinear time series; Regression; Central limit theorem; Martingale; Markov chains; Linear processes; Sensitivity measure; Prediction theory; Mean concentration function; Fejér kernel

1. Introduction

Since the seminal work of Engle [13] on ARCH models (autoregressive models with conditional heteroscedasticity) and Tong [37] on TAR (threshold autoregressive) models, nonlinear time series has received considerable attention, and a variety of new nonlinear time series models have been proposed. Empirical evidence has been found in many disciplines, including computer networks, communication, econometrics, electrical engineering, finance, geology and hydrology, that the underlying random processes exhibit nonlinearity, so the classical ARMA (autoregressive moving-average) and ARIMA (autoregressive integrated moving-average) based models would be inappropriate.
See the excellent monographs of Priestley [29], Tong [38], Fan and Yao [16] and Tsay [40] for examples of nonlinear time series and the related statistical inference.

Corresponding author. E-mail addresses: wbwu@galton.uchicago.edu (W.B. Wu), yhuang@galton.uchicago.edu (Y. Huang), yibi@galton.uchicago.edu (Y. Huang). /$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi: /j.spa

A fundamental problem in the study of nonlinear time series is that of unveiling the data-generating mechanisms that govern the observed time series. Nonparametric methods provide a powerful way to infer the underlying mechanisms, and only mild structural assumptions are needed. An important nonparametric procedure is the kernel method. There is an extensive literature on the kernel estimation theory for independent and identically distributed (iid) observations; see, for example, [36,8,43,28,25,14]. Further references can be found in [16]. In time series analysis, however, observations are typically dependent. The dependence is the rule rather than the exception and is actually one of the primary goals of study. In the literature a commonly adopted framework for dependence is the strong mixing condition, which asserts that the observations are asymptotically independent as the lags increase. Specifically, a stationary process $\{X_t\}_{t \in \mathbb{Z}}$ is said to be strong mixing if the strong mixing coefficients
\[
\alpha_n := \sup\{ |P(A \cap B) - P(A)P(B)| : A \in \mathcal{A}_{-\infty}^0,\ B \in \mathcal{A}_n^{\infty} \} \to 0, \tag{1}
\]
where $\mathcal{A}_i^j = \sigma(X_i, \ldots, X_j)$, $i \le j$. Variants of strong mixing conditions include the $\rho$-mixing, $\psi$-mixing and $\beta$-mixing conditions, among others [4]. A variety of asymptotic results have been derived under various mixing rates; it is impossible to give a complete list of references here. Representative results are [32,35,5,17] and [3], among others. Rosenblatt [31], Yu [49], Neumann [26] and Neumann and Kreiss [27] deal with $\beta$-mixing processes. Further references are given in the excellent reviews by Härdle et al. [20] and Tjøstheim [39]. A comprehensive account of nonparametric time series analysis is presented in [16], where numerous asymptotic results are given under various strong mixing conditions.
This paper advances the nonparametric estimation theory for nonlinear time series under a new framework which is different from the one based on the classical strong mixing conditions. In particular, we shall implement the dependence measures proposed in [46] and present a unified asymptotic theory for kernel density and regression estimators. A huge class of time series models can be represented in the form
\[
X_n = J(\ldots, \varepsilon_{n-1}, \varepsilon_n), \tag{2}
\]
where $J$ is a measurable function and $\varepsilon_n$, $n \in \mathbb{Z}$, are iid random variables; see [41,33,22,29,38]. Clearly (2) defines a stationary and causal process. We interpret (2) as a physical system with $\mathcal{F}_n = (\ldots, \varepsilon_{n-1}, \varepsilon_n)$ being the input, $J$ being a filter and $X_n$ being the output. Then it is natural to interpret the dependence as the degree of dependence of the output $X_n$ on the input $\mathcal{F}_n$, which is a sequence of innovations that drive the system.

The paper is organized as follows. Section 2 introduces predictive dependence measures [cf. (8) and (10)], which basically quantify the degree of dependence of outputs on inputs. With those dependence measures, we present an asymptotic theory in Sections 2 and 3 for kernel density and regression estimation of time series. Section 4 contains applications to linear and nonlinear processes. Proofs are given in Section 6. Our results have several interesting features: (i) the predictive dependence measures have nice input/output interpretations and they are directly related to the data-generating mechanisms; (ii) with the martingale theory, the predictive dependence measures are easy to work with; (iii) on the basis of the dependence measures, sharp results can be obtained; and (iv) our conditions have a close connection with the sensitivity measure, an important quantity appearing in the prediction theory of stochastic processes. We expect our method and framework to be useful for other problems in time series analysis.

We now introduce some notation. For a random variable $X$, write $X \in \mathcal{L}^p$, $p > 0$, if $\|X\|_p := [E(|X|^p)]^{1/p} < \infty$, and write $\|\cdot\| := \|\cdot\|_2$. We say that a function $g$ is Lipschitz continuous on a set $A$ with index $0 < \iota \le 1$ if there exists a constant $C_g < \infty$ such that $|g(x) - g(x')| \le C_g |x - x'|^{\iota}$ for all $x, x' \in A$; in this case, write $g \in \mathcal{C}^{\iota}(A)$. The notation $C$ denotes a constant whose value may vary from line to line. For a sequence of random variables $(\eta_n)$ and a sequence of positive numbers $(d_n)$, write $\eta_n = o_{\mathrm{a.s.}}(d_n)$ if $\eta_n / d_n$ converges to 0 almost surely and $\eta_n = O_{\mathrm{a.s.}}(d_n)$ if $\eta_n / d_n$ is almost surely bounded. The relations $o_P$ and $O_P$ are defined similarly. Let $N(\mu, \sigma^2)$ denote a normal distribution with mean $\mu$ and variance $\sigma^2$.

2. Kernel density estimation

A prerequisite for density estimation is that the marginal density of the process $\{X_t\}$ exists. Unfortunately, $X_t$ given in (2) does not always have a density. A simple sufficient condition for the existence of the marginal density is that the conditional density exists. Recall that $\mathcal{F}_n = (\ldots, \varepsilon_{n-1}, \varepsilon_n)$. For $i \in \mathbb{Z}$, $l \in \mathbb{N}$, let $F_l(x \mid \mathcal{F}_i) = P(X_{i+l} \le x \mid \mathcal{F}_i)$ be the $l$-step-ahead conditional distribution function of $X_{i+l}$ given $\mathcal{F}_i$ and $f_l(x \mid \mathcal{F}_i) = \frac{d}{dx} F_l(x \mid \mathcal{F}_i)$ be the conditional density.

Condition 1. There exists a constant $c_0 < \infty$ such that
\[
\sup_x f_1(x \mid \mathcal{F}_0) \le c_0 \quad \text{almost surely}. \tag{3}
\]

Under Condition 1, it is easily seen that $X_i$ has a density $f$ satisfying $f(x) = E[f_1(x \mid \mathcal{F}_0)] \le c_0$. Following [30], given the data $X_1, \ldots, X_n$, the kernel density estimator of $f$ at $x_0$ is
\[
f_n(x_0) = \frac{1}{n b_n} \sum_{t=1}^{n} K\Big( \frac{x_0 - X_t}{b_n} \Big) = \frac{1}{n} \sum_{t=1}^{n} K_{b_n}(x_0 - X_t), \tag{4}
\]
where the kernel $K$ satisfies $\int K(u)\,du = 1$, $K_b(x) = K(x/b)/b$, and $b = b_n$ is a sequence of bandwidths satisfying the natural condition
\[
b_n \to 0 \quad \text{and} \quad n b_n \to \infty. \tag{5}
\]
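The estimator (4) can be sketched in a few lines. The Epanechnikov kernel, the Gaussian test sample and the bandwidth choice below are illustrative assumptions, not prescriptions from the theory; any kernel integrating to 1 and any bandwidths satisfying (5) would do.

```python
import numpy as np

def kernel_density(x0, data, bn):
    """Kernel density estimate (4): f_n(x0) = (1/(n*b_n)) * sum_t K((x0 - X_t)/b_n)."""
    # Epanechnikov kernel: integrates to 1, bounded, with bounded support
    K = lambda u: np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    u = (np.asarray(x0) - data[:, None]) / bn   # shape (n, len(x0))
    return K(u).mean(axis=0) / bn

rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal(n)     # iid N(0,1) sample as a simple stand-in for {X_t}
bn = n ** (-1 / 5)             # b_n -> 0 and n*b_n -> infinity, as required by (5)
fhat = kernel_density([0.0, 1.0], X, bn)
```

On this sample, `fhat` should be close to the standard normal density values 0.3989 and 0.2420 at the two grid points, up to the usual bias of order $b_n^2$ and stochastic error of order $(n b_n)^{-1/2}$.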
2.1. Dependence measures

To study asymptotic properties of the density estimate $f_n$, it is necessary to impose appropriate dependence conditions on the underlying process $\{X_t\}$. Instead of the traditional strong mixing conditions, we shall use a different dependence measure. Let $\{\varepsilon_i'\}$ be an iid copy of $\{\varepsilon_i\}$, let $\mathcal{F}_i^* = (\ldots, \varepsilon_{-2}, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_i)$ if $i \ge 0$ and $\mathcal{F}_i^* = \mathcal{F}_i$ if $i < 0$, and let $X_i^* = J(\mathcal{F}_i^*)$. Namely, $\mathcal{F}_i^*$ (resp. $X_i^*$) is a coupled version of $\mathcal{F}_i$ (resp. $X_i$) with $\varepsilon_0$ replaced by the iid copy $\varepsilon_0'$. If $f_k(x \mid \mathcal{F}_0)$ does not depend on $\varepsilon_0$, then $f_k(x \mid \mathcal{F}_0) = f_k(x \mid \mathcal{F}_0^*)$. So the quantity $\sup_x |f_k(x \mid \mathcal{F}_0) - f_k(x \mid \mathcal{F}_0^*)|$, a distance between the two conditional (predictive) distributions $[X_k \mid \mathcal{F}_0]$ and $[X_k^* \mid \mathcal{F}_0^*]$, measures the contribution of the innovation $\varepsilon_0$ in predicting the future output $X_k$ given $\mathcal{F}_0$, by perturbing the input via coupling. For a formal definition, let $p > 1$, $k \ge 0$ and
\[
\theta_{k,p}(x) = \| f_{1+k}(x \mid \mathcal{F}_0) - f_{1+k}(x \mid \mathcal{F}_0^*) \|_p. \tag{6}
\]

Note that $E[f_1(x \mid \mathcal{F}_k) \mid \mathcal{F}_0] = f_{1+k}(x \mid \mathcal{F}_0)$. Then
\[
\theta_{k,p}(x) = \| E[ f_1(x \mid \mathcal{F}_k) - f_1(x \mid \mathcal{F}_k^*) \mid \mathcal{F}_0, \mathcal{F}_0^* ] \|_p \le \| f_1(x \mid \mathcal{F}_k) - f_1(x \mid \mathcal{F}_k^*) \|_p. \tag{7}
\]
Define the sup-distance
\[
\theta_p(k) = \sup_x \theta_{k,p}(x) \tag{8}
\]
and the $\mathcal{L}^p$ integral distance
\[
\bar\theta_p(k) = \Big[ \int_{\mathbb{R}} \theta_{k,p}^p(x)\,dx \Big]^{1/p}. \tag{9}
\]
Certainly there are other kinds of distances between probability densities, such as the total variation distance, the Hellinger distance and the Kullback–Leibler divergence. It turns out that in our problem it is more convenient to use the supremum distance (8) and the $\mathcal{L}^p$ distance (9). If $f_1(\cdot \mid \mathcal{F}_0) \in \mathcal{C}^1$, we define the following distances on the derivatives:
\[
\psi_{k,p}(x) = \| f_{1+k}'(x \mid \mathcal{F}_0) - f_{1+k}'(x \mid \mathcal{F}_0^*) \|_p, \quad \psi_p(k) = \sup_x \psi_{k,p}(x), \quad \bar\psi_p(k) = \Big[ \int_{\mathbb{R}} \psi_{k,p}^p(x)\,dx \Big]^{1/p}. \tag{10}
\]
These quantities play an important role in the study of asymptotic properties of $f_n$: they allow us to derive central limit theorems, uniform convergence rates and $\mathcal{L}^p$ bounds for $f_n(x) - f(x)$ in a very natural way. They are easy to work with since they are directly related to the data-generating mechanism of $X_k$. In Section 4 we calculate them for the widely used linear processes and for some nonlinear time series. In defining our dependence measures, we require that the processes are of form (2). Such a requirement is not needed for the classical strong mixing conditions, nor for the weak dependence condition of [10].

2.2. $\mathcal{L}^p$ bounds

Let $p > 1$ and $p' = \min(2, p)$. For a real sequence $a = \{a_i\}_{i \in \mathbb{Z}}$, define
\[
S_p(n; a) = \sum_{j \in \mathbb{Z}} \Big( \sum_{i=1+j}^{n+j} |a_i| \Big)^{p'}. \tag{11}
\]
Let $\theta_p = \{\theta_p(k)\}_{k \in \mathbb{Z}}$, where $\theta_p(k) = 0$ if $k < 0$; similarly define $\bar\theta_p$, $\psi_p$ and $\bar\psi_p$. Let
\[
\Theta_p(n) = S_p(n; \theta_p), \quad \bar\Theta_p(n) = S_p(n; \bar\theta_p), \quad \Psi_p(n) = S_p(n; \psi_p), \quad \bar\Psi_p(n) = S_p(n; \bar\psi_p). \tag{12}
\]
Theorem 1 provides upper bounds for the sup-norm and the integral $\mathcal{L}^p$ norm of $f_n(x) - E[f_n(x)]$ in terms of $\Theta_p(n)$ and $\bar\Theta_p(n)$, respectively.

Theorem 1. Let $p > 1$ and $p' = \min(2, p)$. Assume Condition 1, $\int |K(v)|\,dv < \infty$ and $\sup_v |K(v)| < \infty$. Then
\[
\sup_x \| f_n(x) - E[f_n(x)] \|_p = O\big[ (n b_n)^{-1/2} + \Theta_p^{1/p'}(n)/n \big], \tag{13}
\]

and
\[
\Big[ \int_{\mathbb{R}} \| f_n(x) - E[f_n(x)] \|_p^p\,dx \Big]^{1/p} = O\big[ (n b_n)^{1/p' - 1} + \bar\Theta_p^{1/p'}(n)/n \big]. \tag{14}
\]

In Theorem 1, the presence of $\Theta_p(n)$ and $\bar\Theta_p(n)$ is due to the dependence. In the special case $p = 2$, the quantity $S_2(n; a)$ is interestingly related to Fejér's kernel in Fourier analysis. Let $\mathbf{i} = \sqrt{-1}$ be the imaginary unit. For a nonnegative sequence $(a_j)$, let $g(u) = \sum_{j \in \mathbb{Z}} a_j e^{\mathbf{i} j u}$, $u \in \mathbb{R}$, be its Fourier transform. Clearly the Fourier transform of the sequence $\{a_{j+1} + \cdots + a_{j+n}\}_{j \in \mathbb{Z}}$ is $g(u) \sum_{k=1}^{n} e^{\mathbf{i} k u}$. By Parseval's identity, we have the Fejér kernel representation
\[
2\pi S_2(n; a) = \int_0^{2\pi} |g(u)|^2 \Big| \sum_{k=1}^{n} e^{\mathbf{i} k u} \Big|^2 du = \int_0^{2\pi} |g(u)|^2 \frac{\sin^2(nu/2)}{\sin^2(u/2)}\,du. \tag{15}
\]
If the nonnegative sequence $(a_j)$ is summable, assume $\sum_j a_j = 1$ and let the random variable $U$ have the distribution $P(U = j) = a_j$. Then $S_2(n; a)/n = \int_{\mathbb{R}} P^2(t < U \le t + n)\,dt / n$; the latter quantity is the mean concentration function of $U$ [21]. So it is natural to view $S_p(n; a)$ as a generalized mean concentration function. Lemma 1 provides the magnitude of $S_p(n; a)$ for summable and regularly varying sequences, and Corollary 1 specializes to short- and long-range dependent processes, respectively.

Lemma 1. Let $a = \{a_i\}_{i \in \mathbb{Z}}$ be a real sequence, $p > 1$ and $p' = \min(2, p)$. (i) If $\sum_{i \in \mathbb{Z}} |a_i| < \infty$, then $S_p(n; a) = O(n)$. (ii) If $a_i = O[\,|i|^{-\beta} l(|i|)\,]$, where $1/p' < \beta < 1$ and $l$ is a slowly varying function, then $S_p(n; a) = O\{ n [n^{1-\beta} l(n)]^{p'} \}$.

Proof. (i) Let $c_1 = \sum_{i \in \mathbb{Z}} |a_i|$. Then $(\sum_{i=1+j}^{n+j} |a_i|)^{p'} \le c_1^{p'-1} \sum_{i=1+j}^{n+j} |a_i|$, so $S_p(n; a) \le n c_1^{p'}$. (ii) Write $S_p(n; a) = I_n + II_n$, where $I_n = \sum_{j : |j| \ge 2n}$ and $II_n = \sum_{j : |j| < 2n}$ [cf. (11)]. By Karamata's theorem, $\sum_{l=1}^{n} |a_l| = O[n^{1-\beta} l(n)]$, so $II_n = O\{ n [n^{1-\beta} l(n)]^{p'} \}$. If $|j| \ge 2n$, then $\sum_{i=1+j}^{n+j} |a_i| = n\,O[\,|j|^{-\beta} l(|j|)\,]$, and hence $I_n = O\{ n [n^{1-\beta} l(n)]^{p'} \}$ by another application of Karamata's theorem. □

Corollary 1. Assume that the conditions in Theorem 1 hold. (i) If
\[
\sum_{k=0}^{\infty} \theta_p(k) < \infty, \tag{16}
\]
then $\Theta_p(n) = O(n)$ and the bound in (13) becomes $O((n b_n)^{-1/2})$. Similarly, if
\[
\sum_{k=0}^{\infty} \bar\theta_p(k) < \infty, \tag{17}
\]
then $\bar\Theta_p(n) = O(n)$.
(ii) Let $\theta_p(k) = k^{-\beta} l(k)$, where $1/p' < \beta < 1$ and $l$ is a slowly varying function. Then $\Theta_p(n) \asymp n [n^{1-\beta} l(n)]^{p'}$ and the bound in (13) becomes $O[ (n b_n)^{-1/2} + n^{1/p' - \beta} l(n) ]$. The same bound holds for $\bar\Theta_p^{1/p'}(n)/n$ in (14) if $\bar\theta_p(k) = k^{-\beta} l(k)$.
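The Fejér representation (15) is easy to verify numerically for a finitely supported nonnegative sequence. The sequence and window length below are arbitrary illustrative choices; the direct sum follows (11) with $p' = 2$ and the integral is evaluated by the midpoint rule.

```python
import numpy as np

def S2_direct(n, a):
    """S_2(n; a) = sum over j of (a_{j+1} + ... + a_{j+n})^2, cf. (11) with p' = 2,
    for a nonnegative sequence supported on indices 0..len(a)-1."""
    L = len(a)
    return sum(sum(a[i] for i in range(max(j + 1, 0), min(j + n, L - 1) + 1)) ** 2
               for j in range(-n, L))     # windows outside this range miss the support

def S2_fejer(n, a, M=100000):
    """(1/2pi) * integral over [0, 2pi) of |g(u)|^2 sin^2(nu/2)/sin^2(u/2), cf. (15).
    The Fejer-type factor is computed as |sum_{k=1}^n e^{iku}|^2 to avoid the u=0 limit."""
    u = (np.arange(M) + 0.5) * 2 * np.pi / M        # midpoint rule on [0, 2pi)
    g = sum(a[j] * np.exp(1j * j * u) for j in range(len(a)))
    D = sum(np.exp(1j * k * u) for k in range(1, n + 1))
    return float((np.abs(g) ** 2 * np.abs(D) ** 2).mean())

a, n = [0.5, 0.3, 0.2], 4
lhs, rhs = S2_direct(n, a), S2_fejer(n, a)
```

Since $\sum_j a_j = 1$ here, Lemma 1(i) also predicts $S_2(n; a) \le n$, which the direct computation confirms.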

Note that the quantity $\theta_p(k) = \sup_x \| f_{1+k}(x \mid \mathcal{F}_0) - f_{1+k}(x \mid \mathcal{F}_0^*) \|_p$ measures the contribution of the innovation $\varepsilon_0$ in predicting $X_{k+1}$ given $\mathcal{F}_0$. Condition (16) indicates that the cumulative contribution of the input $\varepsilon_0$ in predicting the future values $\{X_k\}_{k \ge 1}$ is finite, thus suggesting short-range dependence; see [46] for more discussion. The other condition (17) can be similarly interpreted in terms of the $\mathcal{L}^p$ integral norm.

Corollary 1(ii) shows the interesting dichotomous phenomenon [7,44]: if $b_n = o[ n^{2\beta - 1 - 2/p'} l^{-2}(n) ]$, then the first term $(n b_n)^{-1/2}$ dominates and the bound is the same as the one obtained under short-range dependence. If, on the other hand, the bandwidth $b_n$ is so large that $n^{2\beta - 1 - 2/p'} l^{-2}(n) = o(b_n)$, then the second term $n^{1/p' - \beta} l(n)$ dominates. The overall bound thus depends on the interplay between the bandwidth $b_n$ and the long-range dependence parameter $\beta$.

Corollary 2. Let the conditions in Theorem 1 be satisfied. Assume $f \in \mathcal{C}^2$, $\int u^2 |K(u)|\,du < \infty$ and $\int u K(u)\,du = 0$. (i) If $\sup_x |f''(x)| < \infty$, then
\[
\sup_x \| f_n(x) - f(x) \|_p = O\big[ b_n^2 + (n b_n)^{-1/2} + \Theta_p^{1/p'}(n)/n \big]. \tag{18}
\]
(ii) If $\int_{\mathbb{R}} |f''(u)|^p\,du < \infty$, then
\[
\Big[ \int_{\mathbb{R}} \| f_n(x) - f(x) \|_p^p\,dx \Big]^{1/p} = O\big[ b_n^2 + (n b_n)^{1/p' - 1} + \bar\Theta_p^{1/p'}(n)/n \big]. \tag{19}
\]

Proof. Let $B_n(x) = E f_n(x) - f(x)$ be the bias. Since $\int u K(u)\,du = 0$,
\[
B_n(x) = \int_{\mathbb{R}} K(u) [ f(x - b_n u) - f(x) + b_n u f'(x) ]\,du.
\]
Case (i) is well known and easily follows from Taylor's expansion. For (ii), we have
\[
|B_n(x)| \le \int_{\mathbb{R}} \int_0^1 |K(u)|\, b_n^2 u^2\, | f''(x - b_n u t) |\,dt\,du = O(b_n^2) \int_{\mathbb{R}} \int_0^1 | f''(x - b_n u t) |\, \tilde K(u)\,du\,dt,
\]
where $\tilde K(v) = v^2 |K(v)| / \int u^2 |K(u)|\,du$. Then (19) follows from
\[
\int_{\mathbb{R}} |B_n(x)|^p\,dx = O(b_n^{2p}) \int_{\mathbb{R}} \int_0^1 \int_{\mathbb{R}} | f''(x - b_n u t) |^p\, \tilde K(u)\,du\,dt\,dx = O(b_n^{2p}). \qquad \square
\]

2.3. Uniform bounds

Theorem 2. Assume that, for some $\iota, a > 0$, $K \in \mathcal{C}^{\iota}$ is a bounded function with bounded support, and that $X_i \in \mathcal{L}^a$. Further assume Condition 1, $\bar\Theta_2(n) + \bar\Psi_2(n) = O[n^{\alpha} l(n)]$, where $\alpha \ge 1$ and $l$ is a slowly varying function, and $\log n = o(n b_n)$.
Then
\[
\sup_x | f_n(x) - E[f_n(x)] | = O\bigg( \sqrt{ \frac{\log n}{n b_n} } + n^{\alpha/2 - 1} \tilde l(n) \bigg) \quad \text{almost surely}, \tag{20}
\]
where $\tilde l$ is another slowly varying function.

If $\int u K(u)\,du = 0$ and $\sup_x |f''(x)| < \infty$, then the bias $B_n(x) = E f_n(x) - f(x)$ satisfies $\sup_x |B_n(x)| = O(b_n^2)$. If additionally $\alpha < 8/5$, then for $b_n \asymp (n^{-1} \log n)^{1/5}$, Theorem 2 yields
\[
\sup_x | f_n(x) - f(x) | = O\big( (n^{-1} \log n)^{2/5} \big) \quad \text{almost surely}. \tag{21}
\]
Stute [34] showed that, if the $X_i$ are iid, then $(n / \log n)^{2/5} \sup_{|x| \le c} | f_n(x) - f(x) | / \sqrt{f(x)}$ converges almost surely to a non-zero constant if $\inf_{|x| \le c} f(x) > 0$, $c > 0$. So (21) gives the optimal convergence rate $(n^{-1} \log n)^{2/5}$. Section 5 contains a comparison of Theorem 2 with results obtained under strong mixing conditions. Bickel and Rosenblatt [2] obtained a deep result on the asymptotic distributional properties of $\sup_{0 \le x \le 1} | f_n(x) - E[f_n(x)] |$ for iid random variables $X_i$; their result was generalized by Neumann [26] to geometrically $\beta$-mixing processes; see also [23] for some recent contributions.

3. Kernel regression estimation

Nonparametric techniques play an important role in assessing the relationship between predictors and responses when the form of the functional relation is unknown. A popular nonparametric procedure is the Nadaraya–Watson estimator. To formulate the regression problem, we consider the model
\[
Y_n = G(X_n, \eta_n), \tag{22}
\]
where $\eta_n$, $n \in \mathbb{Z}$, are also iid and $\eta_n$ is independent of $\mathcal{F}_n = (\ldots, \varepsilon_{n-1}, \varepsilon_n)$. An important special case of (22) is the autoregressive model
\[
X_{n+1} = R(X_n, \varepsilon_{n+1}), \tag{23}
\]
on letting $\eta_n = \varepsilon_{n+1}$ and $Y_n = X_{n+1}$. Given the data $(X_i, Y_i)$, $0 \le i \le n$, let
\[
T_n(x) = \frac{1}{n} \sum_{t=1}^{n} Y_t K_{b_n}(x - X_t). \tag{24}
\]
Then the Nadaraya–Watson estimator of the regression function
\[
g(x_0) = E(Y_n \mid X_n = x_0) = E[G(x_0, \eta_0)] \tag{25}
\]
has the form
\[
g_n(x_0) = \frac{T_n(x_0)}{f_n(x_0)}. \tag{26}
\]
Here we shall present an asymptotic theory for $g_n(x_0)$. In particular, under mild regularity conditions on $G$ and $f$, we shall provide a central limit theorem and a uniform convergence rate for $g_n(x_0) - g(x_0)$.
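A minimal sketch of (24) and (26) on a simulated chain of form (23). The map $R(x, e) = 0.5 \sin(x) + e$ and all tuning constants are hypothetical choices for illustration; since the innovations have mean zero, the regression function is $g(x) = 0.5 \sin(x)$.

```python
import numpy as np

def nadaraya_watson(x0, X, Y, bn):
    """Nadaraya-Watson estimate g_n(x0) = T_n(x0) / f_n(x0), cf. (24) and (26)."""
    K = lambda u: np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)  # Epanechnikov
    w = K((x0 - X) / bn)
    return float((w * Y).sum() / w.sum())

# Toy chain of form (23): X_{n+1} = R(X_n, eps_{n+1}) with R(x, e) = 0.5*sin(x) + e,
# so g(x) = E(X_{n+1} | X_n = x) = 0.5*sin(x); here Y_n = X_{n+1}.
rng = np.random.default_rng(1)
n = 20000
X = np.empty(n + 1)
X[0] = 0.0
eps = 0.3 * rng.standard_normal(n)
for t in range(n):
    X[t + 1] = 0.5 * np.sin(X[t]) + eps[t]
ghat = nadaraya_watson(0.5, X[:-1], X[1:], bn=0.15)
```

On this run, `ghat` should be close to $0.5 \sin(0.5) \approx 0.2397$, with error governed by the bias and variance terms quantified in Theorems 3 and 4.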
Let $V_p(x) = E[ |G(x, \eta_n)|^p ]$ and
\[
\sigma^2(x) = V_2(x) - g^2(x). \tag{27}
\]
The following regularity condition on $K$ is needed.

Condition 2. The kernel $K$ is symmetric and bounded on $\mathbb{R}$: $\sup_u |K(u)| \le K_0$ and $\int K(u)\,du = 1$; moreover, $K$ has bounded support, namely $K(x) = 0$ if $|x| \ge c$ for some $c > 0$.

Theorem 3. Let $p > 2$. Assume Conditions 1 and 2, $V_2, g \in \mathcal{C}(\mathbb{R})$, and that $V_p(x)$ is bounded in a neighborhood of $x_0$. Further assume that
\[
b_n \Theta_2(n) = o(n) \quad \text{and} \quad n b_n \to \infty. \tag{28}
\]
Let $\kappa = \int K^2(u)\,du$. Then
\[
\sqrt{n b_n}\, \{ T_n(x_0) - E[T_n(x_0)] \} \Rightarrow N[\,0,\ V_2(x_0) f(x_0) \kappa\,]. \tag{29}
\]

In Theorem 3, (29) can be used to prove central limit theorems for kernel density and Nadaraya–Watson estimates (cf. Corollary 3). For $G \equiv 1$, $T_n(x) = f_n(x)$ is the kernel density estimate and one has (29) with $V_2 \equiv 1$. Wu [46] obtained asymptotic normality of $f_n$ under the condition $\sum_{k=0}^{\infty} \theta_2(k) < \infty$; in this case $\Theta_2(n) = O(n)$ and (28) is automatically satisfied under the natural bandwidth condition (5). Clearly condition (28) also allows long-memory processes. Wu and Mielniczuk [44] considered the special cases of short- and long-memory linear processes.

Corollary 3. Let $f(x_0) > 0$. Then under the conditions of Theorem 3, we have
\[
\sqrt{n b_n} \bigg( g_n(x_0) - \frac{E T_n(x_0)}{E f_n(x_0)} \bigg) \Rightarrow N[\,0,\ \sigma^2(x_0) \kappa / f(x_0)\,]. \tag{30}
\]

Proof. Let $\nu_n = \nu_n(x_0) = E T_n(x_0)$ and $\mu_n = \mu_n(x_0) = E f_n(x_0)$. Since $f(x_0) > 0$, $K$ has bounded support and $g$ is continuous, $\nu_n / \mu_n \to g(x_0)$. Observe that
\[
T_n(x_0) - f_n(x_0) \frac{\nu_n}{\mu_n} = \{ T_n(x_0) - f_n(x_0) g(x_0) - \nu_n + \mu_n g(x_0) \} + [ f_n(x_0) - \mu_n ] [ g(x_0) - \nu_n / \mu_n ] =: A_n + B_n. \tag{31}
\]
Applying Theorem 3 with $G \equiv 1$ in place of $G$, we have $B_n \sqrt{n b_n} = o_P(1)$ and $f_n(x_0) \to f(x_0)$ in probability. Hence, again by Theorem 3, $\sqrt{n b_n}\, A_n \Rightarrow N[0, \sigma^2(x_0) \kappa f(x_0)]$, which by Slutsky's theorem yields (30). □

Theorem 4. Let $p > 1$ and $p' = \min(2, p)$. Assume that $Y_i \in \mathcal{L}^p$, $g \in \mathcal{C}(\mathbb{R})$, $V_p(\cdot)$ is bounded in an open interval containing $[-m, m]$, $m > 0$, and the kernel $K \in \mathcal{C}^{\iota}$, $\iota > 0$, satisfies Condition 2. (i) Let $z_n = n^{1/p} \log n + (n b_n \log n)^{1/2}$. Then
\[
\sup_{x \in [-m, m]} | T_n(x) - E[T_n(x)] | = O_{\mathrm{a.s.}}\Big( \frac{z_n}{n b_n} \Big) + O_P\Big[ \frac{ \sqrt{ \bar\Theta_2(n) + \bar\Psi_2(n) } }{ n } \Big]. \tag{32}
\]
(ii) If additionally $\bar\Theta_2(n) + \bar\Psi_2(n) = O[n^{\alpha} l(n)]$, where $\alpha \ge 1$ and $l$ is a slowly varying function, then the left-hand side of (32) has the bound $O_{\mathrm{a.s.}}[ z_n / (n b_n) ] + o_{\mathrm{a.s.}}[ n^{\alpha/2 - 1} \tilde l(n) ]$, where $\tilde l$ is a slowly varying function depending on $l$.

In the kernel estimation theory it is routine to compute the bias $\nu_n(x)/\mu_n(x) - g(x)$. If $f, g \in \mathcal{C}^2$ and $K$ satisfies Condition 2, then it is easily seen that the bias is of order $O(b_n^2)$. Clearly $\sup_{x \in [-m, m]} | f_n(x) - \mu_n(x) |$ has the same bound as the one in (32). By (31) and Theorem 4, we have:

Corollary 4. Assume that $f, g \in \mathcal{C}^2$, $\inf_{|x| \le m} f(x) > 0$, $\sup_{|x| \le m} |g''(x)| < \infty$, and that $K \in \mathcal{C}^{\iota}$, $\iota > 0$, satisfies Condition 2. Then for any $m > 0$,
\[
\sup_{|x| \le m} | g_n(x) - g(x) | = O_{\mathrm{a.s.}}\Big( \frac{z_n}{n b_n} \Big) + O_P\Big[ \frac{ \sqrt{ \bar\Theta_2(n) + \bar\Psi_2(n) } }{ n } \Big] + O(b_n^2). \tag{33}
\]

Corollary 4 allows long-memory as well as heavy-tailed processes. As in Lemma 1(ii), if $\bar\theta_2(k) + \bar\psi_2(k) = k^{-\beta} l(k)$, where $1/2 < \beta < 1$ and $l$ is a slowly varying function, then $\bar\Theta_2(n) + \bar\Psi_2(n) = O\{ n [n^{1-\beta} l(n)]^2 \}$. The term that dominates the sum will vary with the choice of $b_n$, again suggesting a dichotomy. Now consider the short-memory case in which $\sum_{k=0}^{\infty} [\bar\theta_2(k) + \bar\psi_2(k)] < \infty$. Then $\bar\Theta_2(n) + \bar\Psi_2(n) = O(n)$ and the bound in (33) becomes $O_P[ z_n / (n b_n) ] + O_P(b_n^2)$. If $p \le 5/2$, the latter bound achieves its minimum when $b_n \asymp (n^{1/p - 1} \log n)^{1/3}$; if $p > 5/2$, the minimal bound is achieved when $b_n \asymp (n^{-1} \log n)^{1/5}$.

4. Applications

To apply the results of Sections 2 and 3, we need to calculate $\theta_p(k)$, $\bar\theta_p(k)$ and $\psi_p(k)$ defined in (8)–(10). It is usually not difficult to calculate them since they are directly related to the underlying data-generating mechanism. Sections 4.1–4.3 consider linear processes, iterated random functions and chains with infinite memory, respectively.

4.1. Linear processes

Let $\varepsilon_i$ be iid random variables with density $f_{\varepsilon}$, and let $(a_i)$ be real coefficients such that
\[
X_t = \sum_{i=0}^{\infty} a_i \varepsilon_{t-i} \tag{34}
\]
is a well-defined random variable. Important special cases of (34) include ARMA and fractional ARIMA models. Assume that $\varepsilon_i \in \mathcal{L}^q$, $q > 0$, and that $f_{\varepsilon}$ satisfies
\[
c_2 := \sup_x [ f_{\varepsilon}(x) + | f_{\varepsilon}'(x) | + | f_{\varepsilon}''(x) | ] < \infty. \tag{35}
\]
Then both $\theta_p(k)$ and $\psi_p(k)$ are of order $O[ |a_{k+1}|^{\min(1, q/p)} ]$; see Lemma 3 in [47]. For completeness we include that simple argument here. Let $a_0 = 1$, $Y_k = X_{k+1} - \varepsilon_{k+1}$ and $D_k = a_{k+1} (\varepsilon_0 - \varepsilon_0')$, $k \ge 0$. Then $f_1(x \mid \mathcal{F}_k) = f_{\varepsilon}(x - Y_k)$ and
\[
\theta_{k,p}(x) = \| E[ f_1(x \mid \mathcal{F}_k) \mid \mathcal{F}_0 ] - E[ f_1(x \mid \mathcal{F}_k^*) \mid \mathcal{F}_0^* ] \|_p \le \| f_{\varepsilon}(x - Y_k) - f_{\varepsilon}(x - Y_k^*) \|_p \le 2 c_2 \| \min(1, |D_k|) \|_p \le 2 c_2 \{ E[ |D_k|^{\min(q,p)} ] \}^{1/p} = O[ |a_{k+1}|^{\min(1, q/p)} ] \tag{36}
\]
since $\sup_x [ f_{\varepsilon}(x) + | f_{\varepsilon}'(x) | ] < \infty$. If, additionally, $\sup_x | f_{\varepsilon}''(x) | < \infty$, then the same bound holds for $\psi_p(k)$.
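For the linear process (34) the coupling behind (6) is explicit: replacing $\varepsilon_0$ by an iid copy shifts $X_k$ by exactly $a_k (\varepsilon_0 - \varepsilon_0')$, which is why the predictive dependence measures inherit the decay of the coefficients. A numerical sketch with a truncated process and hypothetical geometrically decaying coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
a = 0.7 ** np.arange(50)              # hypothetical coefficients a_i = 0.7**i, truncated
eps = rng.standard_normal(200)        # holds eps_{-49}, ..., eps_{150}; eps_0 sits at index 49
eps_star = eps.copy()
eps_star[49] = rng.standard_normal()  # replace eps_0 by an iid copy eps_0'

def lin(series, k):
    # truncated linear process X_k = sum_{i=0}^{49} a_i eps_{k-i}
    return sum(a[i] * series[k - i + 49] for i in range(50))

k = 7
gap = lin(eps, k) - lin(eps_star, k)       # X_k - X_k^*
exact = a[k] * (eps[49] - eps_star[49])    # a_k (eps_0 - eps_0'): the only differing term
```

The two quantities agree up to floating-point error, illustrating why $\theta_p(k)$ is driven by $|a_{k+1}|$ in (36).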
It is worth mentioning that in our setting heavy-tailed distributions are allowed. To deal with $\bar\theta_p(k)$, we shall impose the following analogue of (35):
\[
I_0 := \int_{\mathbb{R}} [ f_{\varepsilon}^p(x) + | f_{\varepsilon}'(x) |^p + | f_{\varepsilon}''(x) |^p ]\,dx < \infty. \tag{37}
\]
Let $t \in \mathbb{R}$ and $p > 1$. By Hölder's inequality, since $f_{\varepsilon}(x + t) - f_{\varepsilon}(x) = \int_0^t f_{\varepsilon}'(x + u)\,du$,
\[
\int_{\mathbb{R}} | f_{\varepsilon}(x + t) - f_{\varepsilon}(x) |^p\,dx \le |t|^{p-1} \int_{\mathbb{R}} \int_0^{|t|} | f_{\varepsilon}'(x + u) |^p\,du\,dx \le I_0 |t|^p.
\]

It is easily seen that the above integral is also bounded by $2^p I_0$. Then, by (36),
\[
\int_{\mathbb{R}} \theta_{k,p}^p(x)\,dx \le E \int_{\mathbb{R}} | f_{\varepsilon}(x - Y_k) - f_{\varepsilon}(x - Y_k^*) |^p\,dx \le E[ \min( 2^p I_0,\ I_0 | a_{k+1} \varepsilon_0 - a_{k+1} \varepsilon_0' |^p ) ] = O[ |a_{k+1}|^{\min(p, q)} ]. \tag{38}
\]
With (36) and (38), we are able to give bounds for $\Theta_p(n) = S_p(n; \theta_p)$ and $\bar\Theta_p(n) = S_p(n; \bar\theta_p)$. Consider the special case $p = q = 2$. If the short-range dependence condition
\[
\sum_{i=0}^{\infty} |a_i| < \infty \tag{39}
\]
holds, then $\bar\Theta_2(n) + \bar\Psi_2(n) = O(n)$, and, under the mild bandwidth condition $b_n + 1/(n b_n) = O(n^{-\delta})$, $\delta > 0$, the bound in (20) becomes $O[ (\log n)^{1/2} / (n b_n)^{1/2} ]$. Note that the optimal bound (21) continues to hold for long-range dependent processes with $a_n = O(n^{-\beta})$, $9/10 < \beta < 1$, in which case by Corollary 1 we have $\bar\Theta_2(n) + \bar\Psi_2(n) = O(n^{3 - 2\beta})$ and (21) follows from elementary calculations.

For short-memory linear processes, Wu and Mielniczuk [44] proved a central limit theorem for $f_n(x)$ by assuming that $f_{\varepsilon}$ is Lipschitz continuous and $\varepsilon_i$ has a finite second moment. The former condition is weaker than (35), while in our setting we allow $E(\varepsilon_0^2) = \infty$. For long-memory linear processes, using Ho and Hsing's [19] empirical process theory, Wu and Mielniczuk [44] discovered dichotomous and trichotomous phenomena for $f_n(x)$ under various choices of bandwidths. Since there is no empirical process theory for long-memory nonlinear processes, our general approach here is unable to reproduce Wu and Mielniczuk's dichotomy and trichotomy results.

4.2. Iterated random functions

Consider the nonlinear time series defined by the recursion
\[
X_n = R_{\varepsilon_n}(X_{n-1}), \tag{40}
\]
where $R$ is a bivariate measurable function. For different forms of $R$, one can get threshold autoregressive models [38], autoregressive models with conditional heteroscedasticity [13], random coefficient models [24] and exponential autoregressive models [18], among others.
Diaconis and Freedman [9] showed that (40) has a unique stationary distribution if there exist $\alpha > 0$ and $x_0$ such that $L_{\varepsilon_0} + | R_{\varepsilon_0}(x_0) | \in \mathcal{L}^{\alpha}$ and $E[ \log( L_{\varepsilon_0} ) ] < 0$, where
\[
L_{\varepsilon_0} = \sup_{x \ne x'} \frac{ | R_{\varepsilon_0}(x) - R_{\varepsilon_0}(x') | }{ | x - x' | }. \tag{41}
\]
In this case, by iterating (40), we see that $X_n$ is of form (2). Due to the Markovian structure, we can write $f_k(x \mid \mathcal{F}_0) = f_k(x \mid X_0)$, where $f_k(x \mid X_0)$ is the conditional density of $X_k$ at $x$ given $X_0$. Let $f_k'(y \mid x) = \partial f_k(y \mid x) / \partial y$. For $p > 1$ and $k \in \mathbb{N}$, define
\[
I_{k,p}(x) = \Big[ \int_{\mathbb{R}} \Big| \frac{\partial}{\partial x} f_k(y \mid x) \Big|^p dy \Big]^{1/p} \quad \text{and} \quad J_{k,p}(x) = \Big[ \int_{\mathbb{R}} \Big| \frac{\partial}{\partial x} f_k'(y \mid x) \Big|^p dy \Big]^{1/p}. \tag{42}
\]
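When the Lipschitz constant in (41) is almost surely below 1, two copies of the chain driven by the same innovations but different values of $\varepsilon_0$ couple geometrically fast. A minimal sketch with the hypothetical affine map $R_e(x) = 0.5 x + e$ (so $L_{\varepsilon_0} = 0.5$ deterministically):

```python
import numpy as np

rng = np.random.default_rng(3)
eps = rng.standard_normal(30)
x, x_star = 1.0, -2.0            # two states that differ after perturbing eps_0
gaps = []
for e in eps:                    # run both chains with the SAME later innovations
    x = 0.5 * x + e              # R_e(x) = 0.5*x + e, Lipschitz constant 0.5 < 1
    x_star = 0.5 * x_star + e
    gaps.append(abs(x - x_star))
```

After $n$ further steps the gap is exactly $0.5^n |X_0 - X_0^*|$, matching the bound $\|X_n - X_n^*\|_p = O(r^n)$ with $r = \|L_{\varepsilon_0}\|_p = 0.5$ discussed below.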

In nonlinear prediction theory, a key problem is the sensitivity to initial values. In particular, one needs to study the distance between the $k$-step-ahead predictive distributions $[X_k \mid X_0 = x]$ and $[X_k \mid X_0 = x + \delta]$, which results from a drift $\delta$ in the initial value. A natural way to quantify the sensitivity is to use the $\mathcal{L}^p$ distance
\[
\Delta_k(x, \delta) := \Big[ \int_{\mathbb{R}} | f_k(y \mid x + \delta) - f_k(y \mid x) |^p\,dy \Big]^{1/p}. \tag{43}
\]
Fan and Yao [16, p. 466] considered the case $p = 2$. Under certain regularity conditions, $\lim_{\delta \to 0} \Delta_k(x, \delta) / |\delta| = I_{k,p}(x)$; $J_{k,p}(x)$ can be similarly interpreted as a prediction sensitivity measure. Wu [48] applied the sensitivity measure to empirical processes. Proposition 1 shows the relation between $\bar\theta_p(k)$ and $I_{k,p}$.

Proposition 1. Let $\tau_{k,p}(a, b) = | \int_a^b I_{k,p}(x)\,dx |$, $k \ge 1$. Then (i) $\bar\theta_p(k-1) \le \| \tau_{k,p}(X_0, X_0^*) \|_p$ and (ii) $\bar\theta_p(k-1) \le 2 \| \tau_{1,p}(X_{k-1}, X_{k-1}^*) \|_p$.

Proof. Let $q = p/(p-1)$ and $\lambda(x) = I_{k,p}^{1/q}(x)$. By Hölder's inequality,
\[
| f_k(y \mid X_0) - f_k(y \mid X_0^*) |^p = \Big| \int_{X_0^*}^{X_0} \frac{\partial f_k(y \mid x)}{\partial x}\,dx \Big|^p \le \Big| \int_{X_0^*}^{X_0} \frac{ | \partial f_k(y \mid x) / \partial x |^p }{ \lambda^p(x) }\,dx \Big| \cdot \Big| \int_{X_0^*}^{X_0} \lambda^q(x)\,dx \Big|^{p/q}.
\]
Since $\int_{\mathbb{R}} | \partial f_k(y \mid x) / \partial x |^p\,dy = I_{k,p}^p(x)$ and $I_{k,p}^p(x) / \lambda^p(x) = I_{k,p}(x) = \lambda^q(x)$, integrating over $y \in \mathbb{R}$ gives
\[
\int_{\mathbb{R}} | f_k(y \mid X_0) - f_k(y \mid X_0^*) |^p\,dy \le \tau_{k,p}(X_0, X_0^*)^{1 + p/q} = \tau_{k,p}(X_0, X_0^*)^p,
\]
and (i) follows. By (7), (ii) similarly follows. □

If $r = \| L_{\varepsilon_0} \|_p < 1$, then $\| X_n - X_n^* \|_p = O(r^n)$ [45]. If additionally $\sup_x I_{1,p}(x) < \infty$, then by Proposition 1(ii), $\bar\theta_p(n) = O(r^n)$.

When $p = 1$, the quantity $\tau_{k,1}(X_0, X_0^*)$ in Proposition 1 is closely related to the $\tau$-dependence coefficient in [11]. Let $\Lambda_1(\mathbb{R})$ be the set of 1-Lipschitz functions from $\mathbb{R}$ to $\mathbb{R}$. Then their $\tau$ coefficient $\tau(\sigma(X_0), X_k)$ is
\[
E \sup_{g \in \Lambda_1(\mathbb{R})} | E[ g(X_k) \mid X_0 ] - E g(X_k) | = E \sup_{g \in \Lambda_1(\mathbb{R})} \Big| \int_{\mathbb{R}} g(y) [ f_k(y \mid X_0) - f(y) ]\,dy \Big|.
\]
If $\sup_{x, y} [ f_k(y \mid x) + f(y) ] < \infty$, then $\tau(\sigma(X_0), X_k) \le C E \int_{\mathbb{R}} | f_k(y \mid X_0) - f(y) |\,dy$. Note that $E f_k(y \mid X_0) = f(y)$. Then $\tau(\sigma(X_0), X_k) \le C E[ \tau_{k,1}(X_0, X_0^*) ]$.

4.3. Chains with infinite memory

Doukhan and Wintenberger [12] introduced a model for chains with infinite memory:
\[
X_{k+1} = F(X_k, X_{k-1}, \ldots;\ \varepsilon_{k+1}), \tag{44}
\]
where the $\varepsilon_k$ are iid innovations.
Here we consider a special form of (44):
\[
X_{k+1} = G(X_k, X_{k-1}, \ldots) + \varepsilon_{k+1}, \tag{45}
\]
where, as in [12], we assume that $G$ satisfies
\[
| G(x_1, x_2, \ldots) - G(x_1', x_2', \ldots) | \le \sum_{j=1}^{\infty} \omega_j | x_j - x_j' |, \quad \text{where } \omega_j \ge 0. \tag{46}
\]
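Iterating the Lipschitz bound (46) along a coupling majorizes the coupling distances by a sequence solving the renewal-type recursion (48) below, whose generating function satisfies $A(s) = a_0 / (1 - \Omega(s))$. This identity is easy to check numerically; the weights $\omega_i$ below are an arbitrary illustrative choice with $\Omega(1) < 1$.

```python
import numpy as np

omega = np.array([0.0, 0.4, 0.3, 0.2])   # omega_1..omega_3 (index 0 unused); Omega(1) = 0.9
a = [1.0]                                 # a_0 = delta_2(0), normalized to 1
for k in range(200):                      # a_{k+1} = sum_{i=1}^{k+1} omega_i a_{k+1-i}
    a.append(sum(omega[i] * a[k + 1 - i] for i in range(1, min(k + 1, 3) + 1)))
a = np.array(a)

s = 0.9
A_series = float((a * s ** np.arange(len(a))).sum())   # A(s) = sum_k a_k s^k, truncated
Omega_s = sum(omega[i] * s ** i for i in range(1, 4))  # Omega(s) = sum_i omega_i s^i
A_closed = a[0] / (1 - Omega_s)                        # A(s) = a_0 / (1 - Omega(s))
```

Since $\Omega(1) < 1$ here, the $a_k$ decay geometrically and 200 terms already make the truncated series agree with the closed form to high accuracy.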

Under suitable conditions on $(\omega_j)_{j \ge 1}$, iterations of (45) lead to a stationary solution $X_k$ of form (2). For such processes we are also able to provide a bound for $\Theta_p(n)$ and other similar quantities, so our theorems are applicable. For simplicity let $p = 2$ and assume $\varepsilon_k \in \mathcal{L}^2$. Let $\delta_2(k) = \| X_k - J(\mathcal{F}_k^*) \|$. For $k \ge 0$, by (45) and (46), we have
\[
\delta_2(k+1) \le \sum_{i=1}^{k+1} \omega_i \delta_2(k+1-i). \tag{47}
\]
Define the sequence $(a_k)_{k \ge 0}$ recursively by $a_0 = \delta_2(0)$ and
\[
a_{k+1} = \sum_{i=1}^{k+1} \omega_i a_{k+1-i}. \tag{48}
\]
Then $S_2(n; \delta_2(\cdot)) \le S_2(n; a)$. Let $A(s) = \sum_{k=0}^{\infty} a_k s^k$ and $\Omega(s) = \sum_{i=1}^{\infty} \omega_i s^i$, $0 \le s \le 1$. By (48), we have $A(s) = a_0 + A(s) \Omega(s)$; hence $A(s) = a_0 (1 - \Omega(s))^{-1}$. Assume that, as $s \uparrow 1$, $1 - \Omega(s) \asymp (1 - s)^d$ with $d \in (0, 1/2)$. Note that in our setting $\Omega(1) = \sum_{j=1}^{\infty} \omega_j = 1$, while $\Omega(1) < 1$ is required in [12]; hence we can allow stronger dependence. As in (15), with elementary manipulations, we have
\[
2\pi S_2(n; a) = \int_0^{2\pi} | A(e^{\mathbf{i} u}) |^2 \frac{ \sin^2(nu/2) }{ \sin^2(u/2) }\,du = O(n^{1+2d}).
\]
Assume that the density function of $\varepsilon_i$ satisfies (35). As in Section 4.1, let $Y_k = X_{k+1} - \varepsilon_{k+1}$. Following the calculation in (36), we have $\theta_{k,2} = O( \| Y_k - Y_k^* \| ) = O( \delta_2(k+1) )$. Hence $\Theta_2(n) = O(n^{1+2d})$. If, as in [12], $\Omega(1) < 1$, then $A(1) < \infty$ and $\Theta_2(n) = O(n)$. The other quantities $\bar\Theta_2(n)$, $\Psi_2(n)$ and $\bar\Psi_2(n)$ can be dealt with similarly.

5. A comparison with earlier results

For strong mixing processes, uniform error bounds for kernel density estimates $\sup_x | f_n(x) - f(x) |$ have been discussed by Bosq [3] and Fan and Yao [16], among others. Bosq obtained a bound of the form $(n^{-1} \log n)^{2/5} \log \cdots \log n$ under the assumption that the process is exponentially strong mixing. Fan and Yao [16, p. 208] improved Bosq's result by showing that, if the strong mixing coefficients satisfy $\alpha(n) = O(n^{-\chi})$ with $\chi > 5/2$, and
\[
n^{2\chi - 5} b_n^{2\chi + 5} / (\log n)^{(2\chi + 1)/4} \to \infty \quad \text{and} \quad b_n \to 0, \tag{49}
\]
then over a compact interval $[c_1, c_2]$ the following weak upper bound holds:
\[
\sup_{x \in [c_1, c_2]} | f_n(x) - E[f_n(x)] | = O_P\bigg( \sqrt{ \frac{ \log n }{ n b_n } } \bigg). \tag{50}
\]

We now compare our results with those of Fan and Yao for linear processes. It is not easy to obtain a sharp bound for the strong mixing coefficients $\alpha_k$. Consider the special case in which $a_k \asymp k^{-\delta}$, $k \in \mathbb{N}$. By the result of Withers [42], one can get $\alpha_k = O(k^{4/3 - 2\delta/3})$; restrictive conditions on $\delta$ are needed to ensure strong mixing. To apply Fan and Yao's result, one needs $\delta > 23/4$ to ensure the strong mixing condition $\alpha(n) = O(n^{-\chi})$ with $\chi > 5/2$. In comparison, our Theorem 2 only requires $\delta > 1$ [cf. condition (39)].

Doukhan and Louhichi [10] introduced an interesting weak dependence measure for stationary processes. In their sense a sequence $\{Z_t\}$ is weakly dependent if there exists a sequence $\tau(n) \downarrow 0$ such that, for any $u$-tuple $(i_1, \ldots, i_u) \in \mathbb{Z}^u$ and $v$-tuple $(j_1, \ldots, j_v) \in \mathbb{Z}^v$ with $i_1 \le \cdots \le i_u < i_u + n \le j_1 \le \cdots \le j_v$, $u, v \in \mathbb{N}$, and any functions $h, k$ with finite Lipschitz moduli and $\|h\|_{\infty}, \|k\|_{\infty} \le 1$,
\[
| \operatorname{cov}( h(Z_{i_1}, \ldots, Z_{i_u}),\ k(Z_{j_1}, \ldots, Z_{j_v}) ) | \le [ u \operatorname{Lip}(h) + v \operatorname{Lip}(k) ]\, \tau(n). \tag{51}
\]
Here a function $h : \mathbb{R}^u \to \mathbb{R}$ is said to have a finite Lipschitz modulus if
\[
\operatorname{Lip}(h) := \sup_{x \ne y} \frac{ | h(x) - h(y) | }{ \| x - y \|_1 } < \infty.
\]
Ango Nze et al. [1] discussed asymptotic properties of nonparametric estimates under (51) and showed that if the weak dependence coefficients satisfy $\tau(n) = O(n^{-r})$ with $r > 4$, then (29) holds, and if $\tau(n) \le a^n$ for some $0 < a < 1$, then
\[
\sup_{x \in [-m, m]} | T_n(x) - E[T_n(x)] | = O\bigg( \frac{ (\log n)^2 }{ \sqrt{n b_n} } \bigg) \quad \text{almost surely}.
\]
Note that the bound in Theorem 4 is sharper. Consider the linear process $Z_t = \sum_{i=0}^{\infty} a_i \varepsilon_{t-i}$, where the $\varepsilon_t$ are iid with compact support, $[-1, 1]$ say, and $A_0 := \sum_{i=0}^{\infty} |a_i| < \infty$. Assume $A_0 \le 1$ and let $h(x) = k(x) = \min( \max(x, -1), 1 )$. Then $h(Z_t) = k(Z_t) = Z_t$ and $2 \tau(n) \ge \operatorname{cov}(Z_0, Z_n) = E(\varepsilon_0^2) \sum_{i=0}^{\infty} a_i a_{n+i}$. As a special case, let $a_n \asymp n^{-\beta}$; then $\tau(n) \ge c n^{1 - 2\beta}$ for some $c > 0$. The condition of Ango Nze et al. thus requires $\beta > 2.5$, whereas $\beta > 1$ is sufficient for Theorem 3. For the almost sure convergence, the result of Ango Nze et al. requires exponential decay of $\tau(n)$, which forces $a_n$ to decay exponentially as well, while Theorem 4 only requires that the $a_n$ be summable.

6. Proofs

This section provides proofs of the results of the previous sections. Lemma 2 easily follows from Burkholder's inequality, so we omit the details. Lemma 3 below gives a bound for $H_n(x) = \sum_{i=1}^{n} [ f_1(x \mid \mathcal{F}_{i-1}) - f(x) ]$; it allows long-memory processes.

Lemma 2. Let $D_i$, $i \in \mathbb{Z}$, be martingale differences with $D_i \in \mathcal{L}^p$, $p > 1$, and let $p' = \min(2, p)$. Then there exists a constant $c_p > 0$ such that $\| \sum_i D_i \|_p^{p'} \le c_p \sum_i \| D_i \|_p^{p'}$.
Lemma 3. Let $H_n(x) = \sum_{i=1}^{n} [ f_1(x \mid \mathcal{F}_{i-1}) - f(x) ]$, $H_n'(x) = \frac{d}{dx} H_n(x)$, $p > 1$ and $p' = \min(2, p)$. Then we have:
(i) $\sup_x \| H_n(x) \|_p^{p'} \le c_p \Theta_p(n)$;
(ii) $\int_{\mathbb{R}} \| H_n(x) \|_p^p\,dx = O[ \bar\Theta_p^{p/p'}(n) ]$;
(iii) $\sup_x \| H_n'(x) \|_p^{p'} \le c_p \Psi_p(n)$;
(iv) $\int_{\mathbb{R}} \| H_n'(x) \|_p^p\,dx = O[ \bar\Psi_p^{p/p'}(n) ]$;
(v) $E \sup_x H_k^2(x) \le \int_{\mathbb{R}} E[ H_k^2(u) + | H_k'(u) |^2 ]\,du = O[ \bar\Theta_2(k) + \bar\Psi_2(k) ]$; and
(vi) if $\bar\Theta_2(n) + \bar\Psi_2(n) = O( n^{\alpha} l(n) )$, where $l$ is slowly varying, then $\sup_x | H_n(x) | = O_{\mathrm{a.s.}}( n^{\alpha/2} \tilde l(n) )$, where $\tilde l$ is another slowly varying function.

Proof. Define the projection operators P_k, k ∈ Z, by P_k V = E(V|F_k) − E(V|F_{k−1}), V ∈ L^1. Note that P_k H_n(x), k = ..., n − 1, n, are martingale differences. By Theorem 1 in [46], ‖P_0 f_1(x|F_i)‖_p ≤ θ_{i,p}(x). So (i) follows from Lemma 2 in view of

‖H_n(x)‖_p^{p′} ≤ c_p Σ_{k=−∞}^n ‖P_k H_n(x)‖_p^{p′} ≤ c_p Σ_{k=−∞}^n [Σ_{i=1}^n ‖P_k f_1(x|F_i)‖_p]^{p′}
  ≤ c_p Σ_{k=−∞}^n [Σ_{i=1}^n θ_{i−k,p}(x)]^{p′} ≤ c_p Θ_p(n). (52)

To prove (ii), let δ_k = Σ_{i=1}^n θ_p(i − k). By Hölder's inequality,

∫ [Σ_{i=1}^n θ_{i−k,p}(x)]^p dx ≤ ∫ Σ_{i=1}^n [θ_{i−k,p}^p(x) / θ_p^{p−1}(i − k)] [Σ_{i=1}^n θ_p(i − k)]^{p−1} dx = δ_k^p.

If 1 < p ≤ 2, (ii) follows from (52). If p > 2, then p′ = 2. Let q = 1/(1 − 2/p). Again by Hölder's inequality and (52),

∫ ‖H_n(x)‖_p^p dx ≤ c_p^{p/2} ∫ [Σ_{k=−∞}^n (Σ_{i=1}^n θ_{i−k,p}(x))^2]^{p/2} dx
  ≤ c_p^{p/2} ∫ Σ_{k=−∞}^n (Σ_{i=1}^n θ_{i−k,p}(x))^p δ_k^{2−p} [Σ_{k=−∞}^n δ_k^2]^{p/2−1} dx = O[Θ_p^{p/2}(n)].

Cases (iii) and (iv) can be proved similarly. Case (v) follows easily from the inequality sup_x H_k^2(x) ≤ ∫ [H_k^2(u) + H_k′(u)^2] du. For (vi), define H_n* = max_{1≤k≤n} sup_x |H_k(x)|. Let ℓ_0(n) = ℓ(n) if α > 1, and ℓ_0(n) = Σ_{j: 2^j ≤ n} ℓ^{1/2}(2^j) if α = 1; let ℓ♯(n) = (log n)^{1/2+ϵ} ℓ_0(n), where ϵ > 0. Then ℓ♯ is also slowly varying. By Lemma 4 in [47], we get

‖H_{2^d}*‖ ≤ Σ_{j=0}^d 2^{(d−j)/2} ‖sup_x |H_{2^j}(x)|‖ = Σ_{j=0}^d O(1) 2^{(d + j(α−1))/2} ℓ^{1/2}(2^j) = O(2^{dα/2} ℓ_0(2^d)), (53)

which by the Borel–Cantelli lemma implies H_{2^d}* = o_{a.s.}[2^{dα/2} ℓ♯(2^d)] as d → ∞, since

Σ_{d=1}^∞ P(H_{2^d}* > 2^{dα/2} ℓ♯(2^d)) ≤ Σ_{d=1}^∞ ‖H_{2^d}*‖^2 / [2^{dα} ℓ♯^2(2^d)] = Σ_{d=1}^∞ O(1) / (log 2^d)^{1+2ϵ} < ∞. (54)

Hence H_n* = o_{a.s.}[n^{α/2} ℓ♯(n)], since H_n* is non-decreasing and ℓ♯ is slowly varying. □

6.1. Proof of Theorem 1

Let H_n(x) = Σ_{t=1}^n [f_1(x|F_{t−1}) − f(x)]. Write n{f_n(x) − E[f_n(x)]} = P_n(x) + Q_n(x), where

P_n(x) = Σ_{t=1}^n {K_{b_n}(x − X_t) − E[K_{b_n}(x − X_t)|F_{t−1}]}, (55)

Q_n(x) = Σ_{t=1}^n {E[K_{b_n}(x − X_t)|F_{t−1}] − E[K_{b_n}(x − X_t)]} = ∫ K_{b_n}(x − u) H_n(u) du = ∫ K(u) H_n(x − b_n u) du. (56)

By Hölder's inequality,

|Q_n(x)|^p ≤ ∫ K(u) |H_n(x − b_n u)|^p du [∫ K(u) du]^{p−1}. (57)

By (57) and Lemma 3(i),

sup_x ‖Q_n(x)‖_p^{p′} ≤ O(1) sup_x ‖H_n(x)‖_p^{p′} = O[Θ_p(n)]. (58)

Similarly, by Lemma 3(ii),

∫ ‖Q_n(x)‖_p^p dx ≤ O(1) ∫ ‖H_n(x)‖_p^p dx = O[Θ_p^{p/p′}(n)]. (59)

It remains to deal with the martingale part P_n(x). Let D_{i,1}(x) = K((x − X_i)/b_n) − E[K((x − X_i)/b_n)|F_{i−1}] and define recursively D_{i,k+1}(x) = D_{i,k}^2(x) − E[D_{i,k}^2(x)|F_{i−1}], k ∈ N. Then D_{i,k}(x), i = 1, 2, ..., are martingale differences. Let l ∈ N be fixed. We now show that, for any k,

A_{n,k}(2^l) := ∫ E|Σ_{i=1}^n D_{i,k}(x)|^{2^l} dx = O[(n b_n)^{2^{l−1}}] (60)

as n → ∞. To this end, we shall apply the induction method. If l = 1, (60) follows easily since the D_{i,k}(x) are orthogonal and E[D_{i,k}^2(x)] ≤ C E[K^{2^k}((x − X_i)/b_n)] = C b_n ∫ K^{2^k}(u) f(x − b_n u) du. Let l ≥ 2. By the Burkholder–Davis–Gundy inequality [6],

A_{n,k}(2^l) ≤ C ∫ E|Σ_{i=1}^n D_{i,k}^2(x)|^{2^{l−1}} dx
  ≤ C ∫ E|Σ_{i=1}^n D_{i,1+k}(x)|^{2^{l−1}} dx + C ∫ E|Σ_{i=1}^n E[D_{i,k}^2(x)|F_{i−1}]|^{2^{l−1}} dx. (61)

By the induction hypothesis, the first integral in (61) is of order O((n b_n)^{2^{l−2}}). For the second one, note that E[D_{i,k}^2(x)|F_{i−1}] ≤ C E[K^{2^k}((x − X_i)/b_n)|F_{i−1}] ≤ C b_n ∫ K^{2^k}(u) f_1(x − b_n u|F_{i−1}) du. Under Condition 1, since ∫ K(v) dv < ∞ and sup_v K(v) < ∞, the second term in (61) is of order O((n b_n)^{2^{l−1}}) in view of the inequality |Σ_{i=1}^n a_i|^p ≤ n^{p−1} Σ_{i=1}^n |a_i|^p, p ≥ 1. Hence (60) follows, and it further implies that, for all p ∈ (2^l, 2^{l+1}),

A_{n,k}(p) ≤ [A_{n,k}(2^l)]^{2 − p/2^l} [A_{n,k}(2^{l+1})]^{p/2^l − 1} = O((n b_n)^{p/2}) (62)

by Hölder's inequality. So, by (59), (14) follows if p ≥ 2. If 1 < p < 2, by Lemma 2,

A_{n,1}(p) ≤ n C ∫ ‖D_{1,1}(x)‖_p^p dx ≤ n C ∫ E|K((x − X_1)/b_n)|^p dx = O(n b_n).

So (14) holds if 1 < p < 2 in view of (59).

Using an induction argument similar to that of (60) and (61), we have sup_x ‖Σ_{i=1}^n D_{i,k}(x)‖_p = O((n b_n)^{1/2}) for all p = 2^l, l ∈ N. Hence by (58), (13) follows. □

6.2. Proof of Theorem 2

Let Δ_n = sup_{|x| ≥ n^{5/a}} |f_n(x) − E f_n(x)|. Since K is bounded with bounded support and X_1 ∈ L^a, we have by Markov's inequality that

b_n E(Δ_n) ≤ 2 E[sup_{|x| ≥ n^{5/a}} K((x − X_1)/b_n)] = O(1) P(|X_1| ≥ n^{5/a}/2) = O(n^{−5}). (63)

By the Borel–Cantelli lemma, since E(Δ_n) √(n b_n) is summable, Δ_n √(n b_n) = o_{a.s.}(1). Write n{f_n(x) − E[f_n(x)]} = P_n(x) + Q_n(x) as in Theorem 1. From (56), sup_x |Q_n(x)| ≤ O(1) sup_x |H_n(x)|. By Lemma 3(vi), sup_x |H_n(x)| = O_{a.s.}(n^{α/2} ℓ♯(n)). It remains to consider the behavior of P_n(x) over x ∈ [−n^{5/a}, n^{5/a}]. Let

Z_t(x) = K_{b_n}(x − X_t) − E[K_{b_n}(x − X_t)|F_{t−1}] (64)

be the summands of P_n(x). Let l = ⌈n^{1+5/a+1/ι}⌉ and, for x ∈ R, let x̃_l = ⌊xl⌋/l. Observe that |Z_t| ≤ 2K_0/b_n and E(Z_t^2|F_{t−1}) ≤ b_n^{−1} ∫ K^2(u) f_1(x − b_n u|F_{t−1}) du ≤ b_n^{−1} c_1, where c_1 = c_0 ∫ K^2(u) du. Let τ_n = √(n b_n^{−1} log n) and λ = 30 c_1 (1/a + 1/ι + 1). Since log n = o(n b_n), by the inequality of Freedman [15],

P(|P_n(x)| ≥ λ τ_n) ≤ 2 exp[−λ^2 τ_n^2 / (4 K_0 b_n^{−1} λ τ_n + 2 n b_n^{−1} c_1)] = O(n^{−λ/(3 c_1)}).

Hence P(max_{|x̃_l| ≤ n^{5/a}} |P_n(x̃_l)| > λ τ_n) = O(n^{5/a} l n^{−λ/(3 c_1)}) = o(n^{−2}), which by the Borel–Cantelli lemma implies max_{|x| ≤ n^{5/a}} |P_n(x)| = O_{a.s.}(τ_n), since n[n^{5/a}/(l b_n)]^ι = O(√n), K ∈ C^ι and sup_x |P_n(x) − P_n(x̃_l)| = O(n[n^{5/a}/(l b_n)]^ι) = O(τ_n). □

6.3. Proof of Theorem 3

Recall that G_i = (..., η_{i−1}, η_i; F_i). Let ζ_{n,t} = √(b_n/n) K_{b_n}(x_0 − X_t) and ξ_{n,t} = ζ_{n,t} Y_t. Then ξ_{n,t} is G_t-measurable. Let d_{n,t} = ξ_{n,t} − E(ξ_{n,t}|G_{t−1}). Then (29) follows from

Σ_{t=1}^n d_{n,t} ⇒ N[0, V^2(x_0) f(x_0) κ] (65)

and

L_n := Σ_{t=1}^n [E(ξ_{n,t}|G_{t−1}) − E ξ_{n,t}] = o_P(1). (66)

For (66), since K satisfies Condition 2 and

E(ξ_{n,t}|G_{t−1}) = √(b_n/n) ∫ g(x_0 − b_n u) K(u) f_1(x_0 − b_n u|F_{t−1}) du, (67)

by Lemma 3(i), (28), and E|H_n(x)| ≤ ‖H_n(x)‖,

E|L_n| ≤ √(b_n/n) ∫ K(u) |g(x_0 − b_n u)| E|H_n(x_0 − b_n u)| du = O[√(b_n Θ_2(n)/n)] = o(1).

Next we shall apply the martingale central limit theorem to show (65). Let δ = p/2 − 1. By (67), n E[E(ξ_{n,1}|G_0)^2] = O(b_n). Again by Lemma 3(i) and (28),

Σ_{t=1}^n [E(ξ_{n,t}^2|G_{t−1}) − E(ξ_{n,t}^2)] = n^{−1} ∫ V^2(x_0 − b_n u) K^2(u) H_n(x_0 − b_n u) du = O_P[Θ_2^{1/2}(n)/n] = o_P(1).

Since E(d_{n,t}^2|G_{t−1}) = E(ξ_{n,t}^2|G_{t−1}) − E^2(ξ_{n,1}|G_0) and

n E(ξ_{n,t}^2) = b_n E[Y_t^2 K_{b_n}^2(x_0 − X_t)] = b_n E[V^2(X_t) K_{b_n}^2(x_0 − X_t)] = ∫ V^2(x_0 − b_n u) K^2(u) f(x_0 − b_n u) du → V^2(x_0) f(x_0) κ,

we have Σ_{t=1}^n E(d_{n,t}^2|G_{t−1}) → V^2(x_0) f(x_0) κ in probability. Note that

n ‖d_{n,1}‖_p^p ≤ 2^p n ‖ξ_{n,1}‖_p^p = 2^p b_n^{1+δ} n^{−δ} E{V_p(X_t) K_{b_n}^p(x_0 − X_t)} = O[(n b_n)^{−δ}] → 0.

Then Lindeberg's condition is fulfilled. □

6.4. Proof of Theorem 4

Write n{T_n(x) − E[T_n(x)]} = D_n(x) + M_n(x) + N_n(x), where

D_n(x) = Σ_{t=1}^n [Y_t − g(X_t)] K_{b_n}(x − X_t), (68)
M_n(x) = Σ_{t=1}^n {K_{b_n}(x − X_t) g(X_t) − E[K_{b_n}(x − X_t) g(X_t)|F_{t−1}]}, and (69)
N_n(x) = Σ_{t=1}^n {E[K_{b_n}(x − X_t) g(X_t)|F_{t−1}] − E[K_{b_n}(x − X_t) g(X_t)]}. (70)

Then Theorem 4 follows from Proposition 2 and Lemma 4 below.

Proposition 2. Let the conditions of Theorem 4 be fulfilled. Recall z_n = n^{1/p} log n + (n b_n log n)^{1/2}. Then (i) sup_{x ∈ [−m, m]} |D_n(x)| = O_{a.s.}(z_n/b_n) and (ii) sup_{x ∈ [−m, m]} |M_n(x)| = O_{a.s.}(z_n/b_n).

Proof of Proposition 2. Let n̄ = 2^{⌈log n / log 2⌉}, Y_t′ = Y_t 1_{|Y_t| ≤ n̄^{1/p}} and Y_t″ = Y_t − Y_t′. Recall G_i = (..., η_{i−1}, η_i; F_i). Let

Z_t(x) = Z_{t,n}(x) = K_{b_n}(x − X_t)[Y_t′ − E(Y_t′|G_{t−1})], I_n(x) = Σ_{t=1}^n Z_t(x) and II_n(x) = D_n(x) − I_n(x). (71)
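The limit (65) can be checked by simulation. The following Monte Carlo sketch is illustrative only: the AR(1) model, the Gaussian kernel, the bandwidth, and the replication count are our own choices, and it treats the density-estimation analogue of the statement. It compares the variance of the scaled, centered kernel density estimator at x_0 with the limiting value f(x_0)κ, where κ = ∫ K^2(u) du = 1/(2√π) for the Gaussian kernel.

```python
import numpy as np

# Monte Carlo sketch (illustrative model and parameters of our own
# choosing): sqrt(n b_n) {f_n(x0) - E f_n(x0)} should have variance close
# to f(x0) * kappa, with kappa = 1/(2 sqrt(pi)) for the Gaussian kernel.
rng = np.random.default_rng(1)
a, n, b, x0, reps = 0.5, 2000, 0.3, 0.0, 400
sigma = 1.0 / np.sqrt(1.0 - a * a)     # stationary std dev of the AR(1)

def phi(u, s=1.0):
    return np.exp(-((u / s) ** 2) / 2.0) / (s * np.sqrt(2.0 * np.pi))

vals = np.empty(reps)
for r in range(reps):
    x = np.empty(n)
    x[0] = rng.normal(scale=sigma)
    for t in range(1, n):
        x[t] = a * x[t - 1] + rng.normal()
    vals[r] = np.mean(phi((x0 - x) / b)) / b     # f_n(x0) for this path

z = np.sqrt(n * b) * (vals - vals.mean())        # scaled, centered estimates
kappa = 1.0 / (2.0 * np.sqrt(np.pi))
target = phi(x0, sigma) * kappa                  # limiting variance f(x0)*kappa
```

With a finite bandwidth the empirical variance of z sits somewhat below the limit, but it approaches target as b shrinks and n b grows, in line with the martingale CLT argument.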

Elementary calculations show that there exists a constant C_p such that

Σ_{d=1}^∞ 2^{−d/p} E{Σ_{t=1}^{2^d} [|Y_t″| + E(|Y_t″| |G_{t−1})]} = Σ_{d=1}^∞ 2 · 2^{d − d/p} E[|Y_1| 1_{|Y_1| > 2^{d/p}}] ≤ C_p E[|Y_1|^p].

By the Borel–Cantelli lemma, Σ_{t=1}^{2^d} [|Y_t″| + E(|Y_t″| |G_{t−1})] = o_{a.s.}(2^{d/p}) as d → ∞. Since K is bounded, we have sup_x |II_n(x)| = o_{a.s.}(n^{1/p}/b_n).

We shall now deal with I_n(x). Observe that E[Z_t(x)|G_{t−1}] = 0 and |Z_t| is bounded by c_1 n̄^{1/p}/b_n, where c_1 = 2K_0. Also,

E[Z_t^2(x) | F_t, η_{t−1}, η_{t−2}, ...] ≤ K_{b_n}^2(x − X_t) E[(Y_t′)^2 | F_t, η_{t−1}, η_{t−2}, ...] ≤ K_{b_n}^2(x − X_t) V_p(X_t) (n̄^{1/p})^{2−p}.

Since V_p(·) is bounded on an open interval containing [−m, m], E[K_{b_n}^2(x − X_t) V_p(X_t)|G_{t−1}] ≤ c_2 b_n^{−1}, where c_2 is chosen such that c_0 ∫ K^2(u) V_p(x − b_n u) du ≤ c_2. For all x ∈ [−m, m], by the inequality of Freedman [15],

P[|I_n(x)| ≥ λ z_n/b_n] ≤ 2 exp[−λ^2 z_n^2 b_n^{−2} / (2(c_1 n̄^{1/p}/b_n)(λ z_n/b_n) + 2 c_2 n^{1+(2−p)/p}/b_n)].

For x ∈ R let x̃_l = ⌊xl⌋/l, where l = n^{8/ι}. Then E[sup_x |I_n(x) − I_n(x̃_l)|] = O(l^{−ι}/b_n^2) E|Y_0′| = O(n^{−5}). Using the argument in the proof of Theorem 2, we obtain (i) by letting λ = 16(c_2^{1/2} + c_1)(1 + ι^{−1}). (ii) can be proved similarly; the truncation argument is not needed since K_{b_n}(x − X_t) g(X_t) is bounded for x ∈ [−m, m]. □

Lemma 4. Under Condition 2, for any m > 0, E[sup_{|x| ≤ m} N_n^2(x)] = O[Θ_2(n) + Ψ_2(n)]. Furthermore, if Θ_2(n) + Ψ_2(n) = O(n^α ℓ(n)), where ℓ is a slowly varying function, then sup_{|x| ≤ m} |N_n(x)| = O_{a.s.}(n^{α/2} ℓ♯(n)), where ℓ♯(n) can be chosen as the one in Lemma 3(vi).

Proof. Let H_n(x) = Σ_{i=1}^n [f_1(x|F_{i−1}) − f(x)]. Since K has bounded support and g is bounded on compact sets, we have sup_{|x| ≤ m} |N_n(x)| = O(1) sup_{|u| ≤ m′} |H_n(u)| for some finite m′ > m, in view of

N_n(x) = ∫ K(u) g(x − b_n u) H_n(x − b_n u) du. (72)

Observe that sup_{|u| ≤ m′} |H_n(u)| ≤ |H_n(0)| + ∫_{−m′}^{m′} |H_n′(x)| dx. Then the result follows from Lemma 3(i), (iii) and (vi). □
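To see the objects of Theorem 4 in action, the following sketch computes the Nadaraya–Watson ratio T_n(x)/f_n(x) for simulated data and reports its uniform error over [−m, m]. It is an illustration under an arbitrary model of our own choosing (AR(1) covariate, g(x) = sin x, Gaussian kernel, hand-picked bandwidth), not the paper's setting or rates.

```python
import numpy as np

# Sketch (illustrative; g, the AR(1) driver, kernel and bandwidth are our
# own choices): the Nadaraya-Watson ratio estimates g(x) = E(Y | X = x)
# uniformly over the compact interval [-m, m].
rng = np.random.default_rng(2)
a, n, b, m = 0.5, 4000, 0.25, 1.5

x = np.empty(n)
x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - a * a))
for t in range(1, n):
    x[t] = a * x[t - 1] + rng.normal()
y = np.sin(x) + 0.3 * rng.normal(size=n)         # Y_t = g(X_t) + noise

def phi(u):
    return np.exp(-u * u / 2.0) / np.sqrt(2.0 * np.pi)

grid = np.linspace(-m, m, 61)
W = phi((grid[:, None] - x[None, :]) / b)        # kernel weights K((x - X_t)/b)
g_hat = (W @ y) / W.sum(axis=1)                  # Nadaraya-Watson estimate

sup_err = float(np.max(np.abs(g_hat - np.sin(grid))))  # uniform error on [-m, m]
```

Shrinking b (while keeping n b log n large) drives sup_err down, which is the qualitative content of the uniform rate z_n/(n b_n) obtained from Proposition 2 and Lemma 4.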
Acknowledgements

We are grateful to two referees and an Associate Editor for their many helpful comments. This work was supported in part by NSF grants DMS and DMS.

References

[1] P. Ango Nze, P. Bühlmann, P. Doukhan, Weak dependence beyond mixing and asymptotics for nonparametric regression, Ann. Statist. 30 (2002).
[2] P.J. Bickel, M. Rosenblatt, On some global measures of the deviations of density function estimates, Ann. Statist. 1 (1973).

[3] D. Bosq, Nonparametric Statistics for Stochastic Processes: Estimation and Prediction, in: Lecture Notes in Statist., vol. 110, Springer, New York.
[4] R.C. Bradley, Basic properties of strong mixing conditions. A survey and some open questions, Probab. Surveys 2 (2005).
[5] J.V. Castellana, M.R. Leadbetter, On smoothed probability density estimation, Stochastic Process. Appl. 21 (1986).
[6] Y.S. Chow, H. Teicher, Probability Theory, second ed., Springer, New York.
[7] S. Csörgő, J. Mielniczuk, Density estimation under long-range dependence, Ann. Statist. 23 (1995).
[8] L. Devroye, L. Györfi, Nonparametric Density Estimation: The L_1 View, Wiley, New York.
[9] P. Diaconis, D. Freedman, Iterated random functions, SIAM Rev. 41 (1999).
[10] P. Doukhan, S. Louhichi, A new weak dependence condition and applications to moment inequalities, Stochastic Process. Appl. 84 (1999).
[11] J. Dedecker, C. Prieur, Coupling for τ-dependent sequences and applications, J. Theor. Probab. 17 (2004).
[12] P. Doukhan, O. Wintenberger, Weakly dependent chains with infinite memory, Stochastic Process. Appl. 118 (2008).
[13] R. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50 (1982).
[14] R. Eubank, Nonparametric Regression and Spline Smoothing, second ed., Marcel Dekker, New York.
[15] D.A. Freedman, On tail probabilities for martingales, Ann. Probab. 3 (1975).
[16] J.Q. Fan, Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Methods, Springer, New York.
[17] L. Györfi, W. Härdle, P. Sarda, P. Vieu, Nonparametric Curve Estimation from Time Series, in: Lecture Notes in Statist., vol. 60, Springer, Berlin.
[18] V. Haggan, T. Ozaki, Modelling nonlinear random vibrations using an amplitude-dependent autoregressive time series model, Biometrika 68 (1981).
[19] H.C. Ho, T. Hsing, On the asymptotic expansion of the empirical process of long memory moving averages, Ann. Statist. 24 (1996).
[20] W. Härdle, H. Lütkepohl, R. Chen, A review of nonparametric time series analysis, Int. Stat. Rev. 65 (1997).
[21] T. Kawata, The function of mean concentration of a chance variable, Duke Math. J. 8 (1941).
[22] G. Kallianpur, Some ramifications of Wiener's ideas on nonlinear prediction, in: P. Masani (Ed.), Norbert Wiener, Collected Works, 3, MIT Press, Cambridge, 1981.
[23] W. Liu, W.B. Wu, Simultaneous nonparametric inference of time series, Ann. Statist. 38 (2010).
[24] D.F. Nicholls, B.G. Quinn, Random Coefficient Autoregressive Models: An Introduction, Springer, New York.
[25] E.A. Nadaraya, Nonparametric Estimation of Probability Densities and Regression Curves, Kluwer Academic Pub.
[26] M.H. Neumann, Strong approximation of density estimators from weakly dependent observations by density estimators from independent observations, Ann. Statist. 26 (1998).
[27] M.H. Neumann, J.P. Kreiss, Regression-type inference in nonparametric autoregression, Ann. Statist. 26 (1998).
[28] B.L.S. Prakasa Rao, Nonparametric Functional Estimation, Academic Press, New York.
[29] M.B. Priestley, Non-linear and Non-stationary Time Series Analysis, Academic Press, New York.
[30] M. Rosenblatt, Remarks on some nonparametric estimates of a density function, Ann. Math. Statist. 27 (1956).
[31] M. Rosenblatt, Density estimates and Markov sequences, in: M.L. Puri (Ed.), Nonparametric Techniques in Statistical Inference, Cambridge Univ. Press, London, 1970.
[32] P.M. Robinson, Nonparametric estimators for time series, J. Time Ser. Anal. 4 (1983).
[33] M. Rosenblatt, A comment on a conjecture of N. Wiener, Statist. Probab. Lett. 79 (2009).
[34] W. Stute, A law of the logarithm for kernel density estimators, Ann. Probab. 10 (1982).
[35] R.S. Singh, A. Ullah, Nonparametric time-series estimation of joint DGP, conditional DGP, and vector autoregression, Econometric Theory 1 (1985).
[36] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall, London.
[37] H. Tong, Threshold models in non-linear time series analysis, in: Lecture Notes in Statistics, vol. 21, Springer, New York.
[38] H. Tong, Non-linear Time Series: A Dynamical System Approach, Oxford Univ. Press.
[39] D. Tjøstheim, Nonlinear time series, a selective review, Scand. J. Statist. 21 (1994).
[40] R. Tsay, Analysis of Financial Time Series, Wiley, New York, 2005.

[41] N. Wiener, Nonlinear Problems in Random Theory, MIT Press, John Wiley, 1958.
[42] C.S. Withers, Conditions for linear processes to be strongly mixing, Z. Wahrsch. Verw. Gebiete 57 (1981).
[43] M.P. Wand, M.C. Jones, Kernel Smoothing, Chapman and Hall, London.
[44] W.B. Wu, J. Mielniczuk, Kernel density estimation for linear processes, Ann. Statist. 30 (2002).
[45] W.B. Wu, X. Shao, Limit theorems for iterated random functions, J. Appl. Probab. 41 (2004).
[46] W.B. Wu, Nonlinear system theory: another look at dependence, Proc. Natl Acad. Sci. 102 (2005).
[47] W.B. Wu, On the Bahadur representation of sample quantiles for dependent sequences, Ann. Statist. 33 (2005).
[48] W.B. Wu, Empirical processes of stationary sequences, Statist. Sinica 18 (2008).
[49] B. Yu, Density estimation in the L∞ norm for dependent data with applications to the Gibbs sampler, Ann. Statist. 21 (1993).


More information

On the asymptotic normality of frequency polygons for strongly mixing spatial processes. Mohamed EL MACHKOURI

On the asymptotic normality of frequency polygons for strongly mixing spatial processes. Mohamed EL MACHKOURI On the asymptotic normality of frequency polygons for strongly mixing spatial processes Mohamed EL MACHKOURI Laboratoire de Mathématiques Raphaël Salem UMR CNRS 6085, Université de Rouen France mohamed.elmachkouri@univ-rouen.fr

More information

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Statistica Sinica 13(2003), 1201-1210 ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Liqun Wang University of Manitoba Abstract: This paper studies a minimum distance moment estimator for

More information

Change detection problems in branching processes

Change detection problems in branching processes Change detection problems in branching processes Outline of Ph.D. thesis by Tamás T. Szabó Thesis advisor: Professor Gyula Pap Doctoral School of Mathematics and Computer Science Bolyai Institute, University

More information

Self-normalized laws of the iterated logarithm

Self-normalized laws of the iterated logarithm Journal of Statistical and Econometric Methods, vol.3, no.3, 2014, 145-151 ISSN: 1792-6602 print), 1792-6939 online) Scienpress Ltd, 2014 Self-normalized laws of the iterated logarithm Igor Zhdanov 1 Abstract

More information

(2m)-TH MEAN BEHAVIOR OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS UNDER PARAMETRIC PERTURBATIONS

(2m)-TH MEAN BEHAVIOR OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS UNDER PARAMETRIC PERTURBATIONS (2m)-TH MEAN BEHAVIOR OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS UNDER PARAMETRIC PERTURBATIONS Svetlana Janković and Miljana Jovanović Faculty of Science, Department of Mathematics, University

More information

Some functional (Hölderian) limit theorems and their applications (II)

Some functional (Hölderian) limit theorems and their applications (II) Some functional (Hölderian) limit theorems and their applications (II) Alfredas Račkauskas Vilnius University Outils Statistiques et Probabilistes pour la Finance Université de Rouen June 1 5, Rouen (Rouen

More information

A Note on the Approximation of Perpetuities

A Note on the Approximation of Perpetuities Discrete Mathematics and Theoretical Computer Science (subm.), by the authors, rev A Note on the Approximation of Perpetuities Margarete Knape and Ralph Neininger Department for Mathematics and Computer

More information

Bickel Rosenblatt test

Bickel Rosenblatt test University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance

More information

On a class of stochastic differential equations in a financial network model

On a class of stochastic differential equations in a financial network model 1 On a class of stochastic differential equations in a financial network model Tomoyuki Ichiba Department of Statistics & Applied Probability, Center for Financial Mathematics and Actuarial Research, University

More information

A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis

A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis Jon A. Wellner and Vladimir Koltchinskii Abstract. Proofs are given of the limiting null distributions of the

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

Stability of optimization problems with stochastic dominance constraints

Stability of optimization problems with stochastic dominance constraints Stability of optimization problems with stochastic dominance constraints D. Dentcheva and W. Römisch Stevens Institute of Technology, Hoboken Humboldt-University Berlin www.math.hu-berlin.de/~romisch SIAM

More information

Mean convergence theorems and weak laws of large numbers for weighted sums of random variables under a condition of weighted integrability

Mean convergence theorems and weak laws of large numbers for weighted sums of random variables under a condition of weighted integrability J. Math. Anal. Appl. 305 2005) 644 658 www.elsevier.com/locate/jmaa Mean convergence theorems and weak laws of large numbers for weighted sums of random variables under a condition of weighted integrability

More information

Applied Mathematics Letters

Applied Mathematics Letters Applied Mathematics Letters 25 (2012) 545 549 Contents lists available at SciVerse ScienceDirect Applied Mathematics Letters journal homepage: www.elsevier.com/locate/aml On the equivalence of four chaotic

More information

Stochastic volatility models: tails and memory

Stochastic volatility models: tails and memory : tails and memory Rafa l Kulik and Philippe Soulier Conference in honour of Prof. Murad Taqqu 19 April 2012 Rafa l Kulik and Philippe Soulier Plan Model assumptions; Limit theorems for partial sums and

More information

The Convergence Rate for the Normal Approximation of Extreme Sums

The Convergence Rate for the Normal Approximation of Extreme Sums The Convergence Rate for the Normal Approximation of Extreme Sums Yongcheng Qi University of Minnesota Duluth WCNA 2008, Orlando, July 2-9, 2008 This talk is based on a joint work with Professor Shihong

More information

QED. Queen s Economics Department Working Paper No. 1244

QED. Queen s Economics Department Working Paper No. 1244 QED Queen s Economics Department Working Paper No. 1244 A necessary moment condition for the fractional functional central limit theorem Søren Johansen University of Copenhagen and CREATES Morten Ørregaard

More information

Complete moment convergence of weighted sums for processes under asymptotically almost negatively associated assumptions

Complete moment convergence of weighted sums for processes under asymptotically almost negatively associated assumptions Proc. Indian Acad. Sci. Math. Sci.) Vol. 124, No. 2, May 214, pp. 267 279. c Indian Academy of Sciences Complete moment convergence of weighted sums for processes under asymptotically almost negatively

More information

Goodness-of-fit tests for the cure rate in a mixture cure model

Goodness-of-fit tests for the cure rate in a mixture cure model Biometrika (217), 13, 1, pp. 1 7 Printed in Great Britain Advance Access publication on 31 July 216 Goodness-of-fit tests for the cure rate in a mixture cure model BY U.U. MÜLLER Department of Statistics,

More information

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012 Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM

More information

Tail bound inequalities and empirical likelihood for the mean

Tail bound inequalities and empirical likelihood for the mean Tail bound inequalities and empirical likelihood for the mean Sandra Vucane 1 1 University of Latvia, Riga 29 th of September, 2011 Sandra Vucane (LU) Tail bound inequalities and EL for the mean 29.09.2011

More information

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( )

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( ) Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio (2014-2015) Etienne Tanré - Olivier Faugeras INRIA - Team Tosca October 22nd, 2014 E. Tanré (INRIA - Team Tosca) Mathematical

More information

Wittmann Type Strong Laws of Large Numbers for Blockwise m-negatively Associated Random Variables

Wittmann Type Strong Laws of Large Numbers for Blockwise m-negatively Associated Random Variables Journal of Mathematical Research with Applications Mar., 206, Vol. 36, No. 2, pp. 239 246 DOI:0.3770/j.issn:2095-265.206.02.03 Http://jmre.dlut.edu.cn Wittmann Type Strong Laws of Large Numbers for Blockwise

More information

Empirical Processes of Dependent Random Variables

Empirical Processes of Dependent Random Variables The University of Chicago Department of Statistics TECHNICAL EPOT SEIES Empirical Processes of Dependent andom Variables Wei Biao Wu TECHNICAL EPOT NO. 552 September 2, 2004 5734 S. University Avenue Chicago,

More information

Zeros of lacunary random polynomials

Zeros of lacunary random polynomials Zeros of lacunary random polynomials Igor E. Pritsker Dedicated to Norm Levenberg on his 60th birthday Abstract We study the asymptotic distribution of zeros for the lacunary random polynomials. It is

More information

Malaya J. Mat. 2(1)(2014) 35 42

Malaya J. Mat. 2(1)(2014) 35 42 Malaya J. Mat. 2)204) 35 42 Note on nonparametric M-estimation for spatial regression A. Gheriballah a and R. Rouane b, a,b Laboratory of Statistical and stochastic processes, Univ. Djillali Liabs, BP

More information

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Statistica Sinica 19 (2009), 71-81 SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Song Xi Chen 1,2 and Chiu Min Wong 3 1 Iowa State University, 2 Peking University and

More information

On probabilities of large and moderate deviations for L-statistics: a survey of some recent developments

On probabilities of large and moderate deviations for L-statistics: a survey of some recent developments UDC 519.2 On probabilities of large and moderate deviations for L-statistics: a survey of some recent developments N. V. Gribkova Department of Probability Theory and Mathematical Statistics, St.-Petersburg

More information

Lecture 6 Basic Probability

Lecture 6 Basic Probability Lecture 6: Basic Probability 1 of 17 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 6 Basic Probability Probability spaces A mathematical setup behind a probabilistic

More information

ISSN Asymptotic Confidence Bands for Density and Regression Functions in the Gaussian Case

ISSN Asymptotic Confidence Bands for Density and Regression Functions in the Gaussian Case Journal Afrika Statistika Journal Afrika Statistika Vol 5, N,, page 79 87 ISSN 85-35 Asymptotic Confidence Bs for Density egression Functions in the Gaussian Case Nahima Nemouchi Zaher Mohdeb Department

More information

Asymptotics for posterior hazards

Asymptotics for posterior hazards Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and

More information

Invariance principles for fractionally integrated nonlinear processes

Invariance principles for fractionally integrated nonlinear processes IMS Lecture Notes Monograph Series Invariance principles for fractionally integrated nonlinear processes Xiaofeng Shao, Wei Biao Wu University of Chicago Abstract: We obtain invariance principles for a

More information

Weak dependence for infinite ARCH-type bilinear models. hal , version 1-3 May 2007

Weak dependence for infinite ARCH-type bilinear models. hal , version 1-3 May 2007 Author manuscript, published in "Statistics A Journal of Theoretical and Applied Statistics 41, 1 (2007) 31-45" DOI : 10.1080/02331880601107064 Statistics: A Journal of Theoretical and Applied Statistics

More information

GARCH processes probabilistic properties (Part 1)

GARCH processes probabilistic properties (Part 1) GARCH processes probabilistic properties (Part 1) Alexander Lindner Centre of Mathematical Sciences Technical University of Munich D 85747 Garching Germany lindner@ma.tum.de http://www-m1.ma.tum.de/m4/pers/lindner/

More information

Entropy and Ergodic Theory Lecture 15: A first look at concentration

Entropy and Ergodic Theory Lecture 15: A first look at concentration Entropy and Ergodic Theory Lecture 15: A first look at concentration 1 Introduction to concentration Let X 1, X 2,... be i.i.d. R-valued RVs with common distribution µ, and suppose for simplicity that

More information

On Reparametrization and the Gibbs Sampler

On Reparametrization and the Gibbs Sampler On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department

More information

ON THE COMPLETE CONVERGENCE FOR WEIGHTED SUMS OF DEPENDENT RANDOM VARIABLES UNDER CONDITION OF WEIGHTED INTEGRABILITY

ON THE COMPLETE CONVERGENCE FOR WEIGHTED SUMS OF DEPENDENT RANDOM VARIABLES UNDER CONDITION OF WEIGHTED INTEGRABILITY J. Korean Math. Soc. 45 (2008), No. 4, pp. 1101 1111 ON THE COMPLETE CONVERGENCE FOR WEIGHTED SUMS OF DEPENDENT RANDOM VARIABLES UNDER CONDITION OF WEIGHTED INTEGRABILITY Jong-Il Baek, Mi-Hwa Ko, and Tae-Sung

More information

On Optimal Stopping Problems with Power Function of Lévy Processes

On Optimal Stopping Problems with Power Function of Lévy Processes On Optimal Stopping Problems with Power Function of Lévy Processes Budhi Arta Surya Department of Mathematics University of Utrecht 31 August 2006 This talk is based on the joint paper with A.E. Kyprianou:

More information

ERRATA: Probabilistic Techniques in Analysis

ERRATA: Probabilistic Techniques in Analysis ERRATA: Probabilistic Techniques in Analysis ERRATA 1 Updated April 25, 26 Page 3, line 13. A 1,..., A n are independent if P(A i1 A ij ) = P(A 1 ) P(A ij ) for every subset {i 1,..., i j } of {1,...,

More information

Almost Sure Convergence of the General Jamison Weighted Sum of B-Valued Random Variables

Almost Sure Convergence of the General Jamison Weighted Sum of B-Valued Random Variables Acta Mathematica Sinica, English Series Feb., 24, Vol.2, No., pp. 8 92 Almost Sure Convergence of the General Jamison Weighted Sum of B-Valued Random Variables Chun SU Tie Jun TONG Department of Statistics

More information

OPTIMAL SOLUTIONS TO STOCHASTIC DIFFERENTIAL INCLUSIONS

OPTIMAL SOLUTIONS TO STOCHASTIC DIFFERENTIAL INCLUSIONS APPLICATIONES MATHEMATICAE 29,4 (22), pp. 387 398 Mariusz Michta (Zielona Góra) OPTIMAL SOLUTIONS TO STOCHASTIC DIFFERENTIAL INCLUSIONS Abstract. A martingale problem approach is used first to analyze

More information

A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS

A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS Statistica Sinica 20 2010, 365-378 A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS Liang Peng Georgia Institute of Technology Abstract: Estimating tail dependence functions is important for applications

More information

A Note on the Central Limit Theorem for a Class of Linear Systems 1

A Note on the Central Limit Theorem for a Class of Linear Systems 1 A Note on the Central Limit Theorem for a Class of Linear Systems 1 Contents Yukio Nagahata Department of Mathematics, Graduate School of Engineering Science Osaka University, Toyonaka 560-8531, Japan.

More information

Asymptotic efficiency of simple decisions for the compound decision problem

Asymptotic efficiency of simple decisions for the compound decision problem Asymptotic efficiency of simple decisions for the compound decision problem Eitan Greenshtein and Ya acov Ritov Department of Statistical Sciences Duke University Durham, NC 27708-0251, USA e-mail: eitan.greenshtein@gmail.com

More information

On an Effective Solution of the Optimal Stopping Problem for Random Walks

On an Effective Solution of the Optimal Stopping Problem for Random Walks QUANTITATIVE FINANCE RESEARCH CENTRE QUANTITATIVE FINANCE RESEARCH CENTRE Research Paper 131 September 2004 On an Effective Solution of the Optimal Stopping Problem for Random Walks Alexander Novikov and

More information

Variance Function Estimation in Multivariate Nonparametric Regression

Variance Function Estimation in Multivariate Nonparametric Regression Variance Function Estimation in Multivariate Nonparametric Regression T. Tony Cai 1, Michael Levine Lie Wang 1 Abstract Variance function estimation in multivariate nonparametric regression is considered

More information