Asymptotic properties of density estimates for linear processes: application of projection method


Nonparametric Statistics, Vol. 17, No. 1, January–February 2005

Asymptotic properties of density estimates for linear processes: application of projection method

ARTUR BRYK† and JAN MIELNICZUK‡*

†Warsaw University of Technology, Pl. Politechniki 1, Warsaw, Poland
‡Institute of Computer Science, Polish Academy of Sciences, Ordona 21, Warsaw, Poland

(Received December 2003; revised in final form June 2004)

We specify conditions under which a kernel density estimate for a linear process is weakly and strongly consistent, and we establish rates of its pointwise and uniform convergence. In particular, it is proved that for short-range dependent data of size $n$ and bandwidth $b_n$, the rate of convergence is $O((\log n/(nb_n))^{1/2} + b_n^2)$. The results are established using the projection method introduced in this setup by Ho and Hsing (Ho, H. C. and Hsing, T. (1996). On asymptotic expansion of the empirical process of long-memory moving averages. Annals of Statistics, 24) and Wu (Wu, W. B. (2001). Nonparametric estimation for stationary processes. Ph.D. thesis, University of Michigan).

Keywords: Long- and short-range dependence; Kernel density estimators; Linear process; Projection method; Rates of convergence

1. Introduction

Let $(X_t)$ be a stationary sequence with marginal density $f$, and let $\hat f_n(x)$ be a kernel density estimate of $f(x)$ based on the $n$ observations $X_1, X_2, \ldots, X_n$, given by
$$\hat f_n(x) = \frac{1}{nb_n}\sum_{t=1}^{n} K\!\left(\frac{x - X_t}{b_n}\right),$$
where the kernel $K$ is some function, not necessarily positive, such that $\int K(s)\,ds = 1$, and the bandwidths (smoothing parameters) satisfy the natural conditions $b_n \to 0$ and $nb_n \to \infty$. The majority of asymptotic properties of $\hat f_n(x)$ have been established for independent $(X_t)$; recently, however, there has been increasing interest in the dependent case.
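As a concrete illustration (ours, not part of the original paper), the estimator $\hat f_n$ defined above can be computed directly. The following Python sketch uses a Gaussian kernel, which integrates to 1, and a bandwidth satisfying $b_n \to 0$ and $nb_n \to \infty$:

```python
import numpy as np

def kernel_density_estimate(x, sample, bandwidth):
    """Evaluate f_hat_n(x) = (n * b_n)^{-1} * sum_t K((x - X_t) / b_n)
    with a standard Gaussian kernel K (which integrates to 1)."""
    u = (x - sample) / bandwidth                    # (x - X_t)/b_n for each observation
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)    # Gaussian kernel values
    return k.sum() / (sample.size * bandwidth)

# A bandwidth with b_n -> 0 and n*b_n -> infinity, e.g. b_n = n^{-1/5}:
rng = np.random.default_rng(0)
sample = rng.standard_normal(10_000)
b_n = sample.size ** (-1 / 5)
estimate = kernel_density_estimate(0.0, sample, b_n)
# The true N(0,1) density at 0 is about 0.3989; the estimate should be close.
```

The Gaussian kernel is only one admissible choice; as the text notes, $K$ need not even be positive, provided it integrates to 1.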
In one line of research, weak dependence assumptions (usually various types of mixing conditions) are imposed on $(X_t)$, and it is shown that the results for independent data carry over to this more general case;

‡Affiliated also with the Polish-Japanese Institute of Information Technology, Warsaw, Poland.
*Corresponding author. Email: miel@ipipan.waw.pl

see ref. [1] for an overview of such results. Alternatively, one can consider a flexible submodel which restricts the class of considered stationary processes but not the strength of their dependence, and try to check when the analogy with the independent case breaks down. This is especially interesting in the case of estimation of local parameters such as a density function, because the answer depends not only on the strength of dependence, but also on the effective number of observations used in the estimation, which is determined by the magnitude of the smoothing parameter. For representative examples of this line of research, we refer to refs [2–5]. Studied models include transformed stationary Gaussian sequences and infinite-order moving averages, among others. In this article, we consider the latter, assuming throughout that
$$X_t = \sum_{i=0}^{\infty} c_i \eta_{t-i}, \qquad t = 1, 2, \ldots, \qquad (1)$$
where $(\eta_i)_{i=-\infty}^{\infty}$ are i.i.d. random variables with mean 0 and finite variance $\sigma^2$, and $(c_i)$ is such that $\sum_{i=0}^{\infty} c_i^2 < \infty$. We additionally assume that $c_0 = 1$ and that the density $f_1$ of $\eta_1$ exists. If $c_i = L(i)i^{-\beta}$, where $1/2 < \beta < 1$ and $L(\cdot)$ is slowly varying at $\infty$, a routine calculation based on the Karamata theorem implies that
$$r_X(i) := \mathrm{Cov}(X_0, X_i) \sim C(\beta)L^2(i)\,i^{-(2\beta-1)}\sigma^2,$$
where $C(\beta) := \int_0^{\infty}(x + x^2)^{-\beta}\,dx$ and $a_n \sim b_n$ means $\lim_{n\to\infty} a_n/b_n = 1$. Thus, in this case, the sum of the absolute values of the covariances diverges. This property is called long-range dependence (LRD) or long memory, in contrast to the short-range dependence (SRD) case of absolutely summable covariances. Note that if $\sum_{i=0}^{\infty}|c_i| < \infty$, or $\beta > 1$ in the hyperbolic decay condition given earlier, then $(X_t)$ is SRD. Let $\sigma_n^2 = \mathrm{Var}(X_1 + \cdots + X_n)$ and put $c_i = 0$ for $i < 0$. Then it is easily seen that $\sigma_n^2 = \sigma^2\sum_{k=-\infty}^{n}(\sum_{t=1}^{n} c_{t-k})^2$ is $O(n)$ when $\sum_{i=0}^{\infty}|c_i| < \infty$, and $\sigma_n^2 \sim D(\beta)\,n^{2-(2\beta-1)}L^2(n)$, where $D(\beta) := \sigma^2[(2-2\beta)(3/2-\beta)]^{-1}C(\beta)$. In this article, we study conditions under which the kernel density estimate is consistent in different senses and investigate the rates of its a.s. convergence.
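The SRD/LRD dichotomy can be illustrated numerically. The sketch below (our illustration; the infinite moving average in (1) is truncated at a finite lag) simulates the linear process with hyperbolic coefficients $c_i = (i+1)^{-\beta}$ and compares the summed absolute empirical autocovariances for a long-memory value $\beta = 0.7$ against a short-memory value $\beta = 1.5$:

```python
import numpy as np

def simulate_linear_process(n, beta, m=5000, rng=None):
    """Simulate X_t = sum_{i=0}^{m} c_i * eta_{t-i} with c_i = (i+1)^{-beta},
    truncating the infinite moving average (1) at lag m."""
    if rng is None:
        rng = np.random.default_rng(1)
    c = (np.arange(m + 1) + 1.0) ** (-beta)   # c_0 = 1, hyperbolic decay
    eta = rng.standard_normal(n + m)          # i.i.d. innovations, mean 0, variance 1
    # Each X_t uses innovations eta_{t-m}, ..., eta_t:
    return np.convolve(eta, c, mode="valid")[:n]

def sum_abs_autocov(x, max_lag):
    """Sum of absolute empirical autocovariances up to max_lag."""
    x = x - x.mean()
    n = x.size
    return sum(abs(np.dot(x[:n - k], x[k:]) / n) for k in range(max_lag + 1))

x_lrd = simulate_linear_process(20_000, beta=0.7)   # covariances decay like i^{-0.4}
x_srd = simulate_linear_process(20_000, beta=1.5)   # absolutely summable covariances
# For beta = 0.7 the partial sums of |r_X(i)| grow without bound;
# for beta = 1.5 they stay bounded, so the first sum dominates the second.
```

The contrast between the two summed-autocovariance values mirrors the divergent versus convergent behaviour of $\sum_i |r_X(i)|$ described above.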
In particular, we prove that, under natural conditions on the bandwidths, $\hat f_n(x)$ is always weakly (i.e., in probability) consistent when the density of $\eta_1$ is Lipschitz and $K \in L^2$. Moreover, for absolutely summable $(c_i)$, the rate of a.s. convergence of $\hat f_n(x) - f(x)$ to 0 is $O((\log n/(nb_n))^{1/2} + b_n^2)$. This parallels the result for independent data. A method which turns out to be particularly well suited to investigating such problems is the projection method, two variants of which were introduced in the considered setup by Ho and Hsing [3] and Wu [6]. The method is briefly discussed in section 2. Both variants are presented there, as each of them turns out to be useful, depending on the particular problem considered. Applications of the projection method to many statistical problems, including the asymptotic behaviour of empirical processes and quantiles for linear processes and iterated dynamical systems, have been studied in depth by Wu in a series of papers [7–9]. A detailed analysis of the asymptotic distributions of $\hat f_n(x) - E\hat f_n(x)$ is given by Wu and Mielniczuk [5]. Their results imply (cf. the proof of their Theorem 2) that when the density $f_1$ of $\eta_1$ is three times differentiable with bounded, continuous and integrable derivatives, then $\hat f_n(x) - E\hat f_n(x) = O_P((nb_n)^{-1/2} + \sigma_n/n)$. The method of proof also relied on the martingale representation given in equation (5). The present article indicates that the projection method is useful as well for investigating rates of a.s. and uniform convergence. Results similar to Theorem 4(a) on rates of uniform convergence, based on a strong mixing condition, were established by Bosq [1] and Fan and Yao [10]. In particular, it follows from Theorem 5.3 of Chanda [11] that if the strong mixing coefficients $\alpha(n)$ of an underlying stationary process decay geometrically, $b_n \sim cn^{-\gamma}$ for $0 < \gamma < 1$, $f$ is bounded on $[a,b]$ and $K$ satisfies a Lipschitz condition, then $\sup_{x\in[a,b]}|\hat f_n(x) - E\hat f_n(x)| = O((\log n/(nb_n))^{1/2})$ in probability.
Thus, the rate of convergence of the random component of $\hat f_n(x)$ coincides with that

of Theorem 4(a). More generally, $\alpha(n) \le Cn^{-\beta}$ with $\beta > 5/2$ implies the same rate of convergence, provided the sequence $(b_n)$ tends to 0 sufficiently slowly. One gets further insight into this problem by considering conditions under which linear processes are strongly mixing. There are many such results in the literature, e.g., refs [11–14]. In particular, for $c_n \sim cn^{-\delta}$, under certain regularity conditions on the density of $\eta_i$, it follows from Withers [13] that the strong mixing coefficients satisfy
$$\alpha(k) = O\!\left[\sum_{n=k}^{\infty}\max\!\left(C_n^{1/3},\ (C_n|\log C_n|)^{1/2}\right)\right] = O(k^{(4-2\delta)/3}),$$
where $C_k = \sum_{i=k}^{\infty} c_i^2$. Thus, when $c_n \sim cn^{-\delta}$, the conditions of Fan and Yao are satisfied for $\delta > 23/4$. At the same time, Theorem 4(a) does not require any condition on the decay rate of $(c_n)$; the sole condition is $(c_n) \in \ell^1$. Let us also note that, as mixing coefficients measure departure from independence, asymptotic results based on the mixing approach concern the weak dependence case, and handling the LRD case under such conditions seems infeasible. In this article, we restrict ourselves to linear processes and try to quantify the effect of dependence on the rates of convergence of $\hat f_n(\cdot)$.

2. Projection method

Let $(V_t)$ be a strictly stationary sequence of real random variables such that $E|V_1| < \infty$, and let $V_t$ be $\mathcal F_t$-measurable, where $(\mathcal F_t)_{t=-\infty}^{\infty}$ is an increasing sequence of $\sigma$-fields such that $\bigcap_{i=-\infty}^{t}\mathcal F_i$ is trivial for any $t$. In this section, we do not assume that $V_t$ is an infinite-order moving average. The main tool used in the article is the projection method, which exploits in different ways the basic equality
$$V_t - EV_t = \sum_{k=-\infty}^{t}\left\{E(V_t\mid\mathcal F_k) - E(V_t\mid\mathcal F_{k-1})\right\}, \qquad (2)$$
holding a.s. owing to the fact that $E(V_t\mid\mathcal F_j) \to E(V_t\mid\bigcap_{i=-\infty}^{t}\mathcal F_i) = E(V_t)$ a.s. as $j \to -\infty$. Let $P_k U = E(U\mid\mathcal F_k) - E(U\mid\mathcal F_{k-1})$ denote the projection difference. Note that the equality $E(U\mid\mathcal F_k) = P_k U + E(U\mid\mathcal F_{k-1})$ yields an orthogonal decomposition of $E(U\mid\mathcal F_k)$ with respect to $\mathcal F_{k-1}$. It follows from equation (2) that
$$\sum_{t=1}^{n}(V_t - EV_t) = \sum_{t=1}^{n}\sum_{k=-\infty}^{t} P_k V_t. \qquad (3)$$
There are two possible ways of regrouping the summands in equation (3). The first consists in writing it formally as
$$\sum_{t=1}^{n}(V_t - EV_t) = \sum_{k=-\infty}^{n}\sum_{t=\max(1,k)}^{n} P_k V_t =: \sum_{k=-\infty}^{n} U_{n,k}, \qquad (4)$$
using $P_k V_l = 0$ for $l < k$, whereas the second represents equation (3) as
$$\sum_{t=1}^{n}(V_t - EV_t) = \sum_{k=0}^{\infty}\sum_{t=1}^{n} P_{t-k} V_t =: \sum_{k=0}^{\infty} W_{n,k}. \qquad (5)$$
In order to observe the main difference between equations (4) and (5), note that the $U_{n,k}$ in equation (4) are sums of not necessarily uncorrelated summands, and $U_{n,k}$ and $U_{n,k'}$ are uncorrelated for $k \ne k'$. On the other hand, the summands $P_{t-k}V_t$, $t = 1,\ldots,n$, of $W_{n,k}$ are

uncorrelated, being martingale differences with respect to $(\mathcal F_{t-k})_{t=1}^{n}$, but $W_{n,k}$ may be correlated with $W_{n,k'}$ for $k \ne k'$. In what follows, $\|Y\| = (E(Y^2))^{1/2}$ denotes the $L^2$-norm of a random variable $Y$. Observe that equality (4) leads to a more precise bound for the $L^2$-norm of the centred sum of the $V_t$ than equation (5). Namely, using the stationarity of $(V_t)$, we get [6]
$$\left\|\sum_{t=1}^{n}(V_t - EV_t)\right\|^2 = \sum_{k=-\infty}^{n}\|U_{n,k}\|^2 \le \sum_{k=-\infty}^{n}\left(\sum_{t=\max(1,k)}^{n}\|P_1 V_{t-k+1}\|\right)^2, \qquad (6)$$
whereas analogous handling of equation (5) yields
$$\left\|\sum_{t=1}^{n}(V_t - EV_t)\right\| \le \sum_{k=0}^{\infty}\left\|\sum_{t=1}^{n}P_{t-k}V_t\right\| = n^{1/2}\sum_{k=1}^{\infty}\|P_1 V_k\|,$$
which is obviously weaker than equation (6). This is the reason why representation (4), but not equation (5), is used to bound the variance, i.e., the $L^2$-norm, of the centred density estimate in Lemma 1. In particular, if $\|P_1 V_t\| \le \theta_t$ for $t \ge 1$, the last expression in equation (6) is not greater than
$$\sum_{k=-\infty}^{0}\left(\sum_{t=1}^{n}\theta_{t-k+1}\right)^2 + \sum_{k=1}^{n}\left(\sum_{t=k}^{n}\theta_{t-k+1}\right)^2 \le \sum_{k=1}^{\infty}(\Theta_{n+k} - \Theta_k)^2 + n\Theta_n^2 =: \Lambda_n^2, \qquad (7)$$
where $\Theta_n = \sum_{i=1}^{n}\theta_i$. In the case when $\sum_{i=1}^{\infty}\theta_i < \infty$, we have $\Lambda_n^2 = O(n)$, as $\sum_{k=1}^{\infty}(\Theta_{n+k} - \Theta_k)^2 \le n(\sum_{i=1}^{\infty}\theta_i)^2$. However, observe that it is often beneficial to exploit the martingale structure of the $W_{n,k}$ in equation (5), e.g., by taking advantage of known exponential inequalities for sums of martingale differences [15]. This is the approach used in Theorem 3 to study the rates of convergence of $\hat f_n(x)$. When representation (5) is used, the summand $W_{n,0}$ usually has to be handled differently from the terms $W_{n,k}$ for $k > 0$, as it involves the random variable $V_t$ itself, which, in contrast to the remaining terms, is not conditionally averaged. This leads to the natural decomposition $W_{n,0} + \sum_{k>0}W_{n,k}$, in which the second term is the sum of the random variables $\tilde V_t = E(V_t\mid\mathcal F_{t-1}) - EV_t$, $t = 1,\ldots,n$, for which decomposition (4) may be employed. The idea of using martingale approximations to investigate properties of $S_n = \sum_{t=1}^{n}V_t$ dates back to Gordin and Lifšic [16], who considered this problem for $V_t = g(Y_t)$, where $(Y_t)$ is an ergodic Markov chain.
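Returning to the linear process (1), the projection differences can be written down explicitly: $E(X_t\mid\mathcal F_k) = \sum_{i\ge t-k} c_i\eta_{t-i}$ for $k \le t$, so $P_k X_t = c_{t-k}\eta_k$, and decomposition (2) telescopes exactly. The following sketch (our own sanity check, using a finite-order moving average so that all sums are finite) verifies this numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 50                                  # finite-order MA: c_i = 0 for i > m
c = 0.8 ** np.arange(m + 1)             # c_0 = 1, absolutely summable coefficients
eta = rng.standard_normal(200)          # innovations eta_0, ..., eta_199 (0-based)

t = 150                                 # a fixed time point
X_t = sum(c[i] * eta[t - i] for i in range(m + 1))

# Projection differences P_k X_t = E(X_t | F_k) - E(X_t | F_{k-1}) = c_{t-k} * eta_k,
# which are nonzero only for t - m <= k <= t:
P = {k: c[t - k] * eta[k] for k in range(t - m, t + 1)}

# Equation (2): X_t - E(X_t) equals the sum of its projection differences
# (here E(X_t) = 0, since the innovations are centred).
reconstruction = sum(P.values())
```

Note also that here $U_{n,k} = (\sum_{t=\max(1,k)}^{n} c_{t-k})\eta_k$, so the $U_{n,k}$ for different $k$ involve distinct independent innovations and are indeed uncorrelated, as claimed after equation (4).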
Note that for $X_t$ defined in equation (1), $\hat f_n(x)$ is a special case of this setup, obtained by letting $Y_t = (\ldots,\eta_{t-1},\eta_t)$ and defining $g = g_n$ by $g_n(Y_t) = n^{-1}K_{b_n}(x - X_t)$. They proved that if $E(S_n\mid X_1) \to h(X_1)$ in $L^2$, then $h$ is a solution of the Poisson equation $g(x) = h(x) - Qh(x)$, where $Qh(x) = \int h(y)Q(x;dy)$ and $Q(x,B)$ is the transition function of $(Y_t)$. Then $S_n = \sum_{k=1}^{n}M_k + R_n$, where $M_k = h(Y_k) - Qh(Y_{k-1})$ is a martingale difference with respect to $\mathcal F_{k-1}$ and $R_n = Qh(Y_0) - Qh(Y_n)$ is a remainder term. However, for the problems considered in this article, weaker conditions are obtained when representation (5) is studied directly than those pertaining to the existence of a solution of the Poisson equation for $g_n$ defined earlier.

3. Consistency and rates of almost sure convergence of kernel density estimates

In Theorem 1, we investigate conditions under which weak and strong consistency of $\hat f_n$ hold. In both cases, no decay rate of the coefficients $c_i$ is needed. Theorem 2 provides an asymptotic

representation of the centred $\hat f_n(x)$. In Theorem 3, the rate of a.s. convergence of $\hat f_n(x)$ is established for the SRD and the LRD case. The rate of uniform weak consistency over any finite interval is established in Theorem 4. The projection method is applied to the rows of the row-wise stationary array $Y_{tn} = n^{-1}(K_{b_n}(x - X_t) - E\hat f_n(x))$, $t = 1,2,\ldots,n$, $n \in \mathbb N$, where $K_b(\cdot) = b^{-1}K(\cdot/b)$, and $\mathcal F_t = \sigma(\ldots,\eta_{t-1},\eta_t)$ is the $\sigma$-field generated by all innovations up to the moment $t$.

THEOREM 1
(a) Assume that the density $f_1$ of $\eta_1$ exists and is Lipschitz continuous with Lipschitz constant $A$, $nb_n \to \infty$ and $\int K^2(s)\,ds < \infty$. Then $\hat f_n(x) \to f(x)$ in probability.
(b) Assume, moreover, that $f_1$ is bounded, $nb_n/\log n \to \infty$ and $\int|vK(v)|\,dv < \infty$. Then $\hat f_n(x) \to f(x)$ a.s.

The following theorem provides a decomposition of $\hat f_n(x) - E\hat f_n(x)$ with an explicit bound on the remainder term for $c_i$'s decaying hyperbolically. A weaker result, under more stringent conditions on the density $f_1$ and with remainder term $O_P(n^{-1/2}) + o_P(\sigma_n/n)$, was proved by Wu and Mielniczuk [5]. Let $\Lambda_n$ be defined as in equation (7) with $\theta_t = |c_{t-1}|(\sum_{i=t-1}^{\infty}c_i^2)^{1/2}$, and let $f * g$ denote the convolution of $f(\cdot)$ and $g(\cdot)$. A sequence $(Z_n)_{n=1}^{\infty}$ of random variables converges to 0 completely if $\sum_{n=1}^{\infty}P(|Z_n| > \varepsilon) < \infty$ for any $\varepsilon > 0$.

THEOREM 2 Assume that $f_1$ is a bounded, twice continuously differentiable function with bounded derivatives, and $K$ satisfies the assumptions of Theorem 1(b). Then
$$\hat f_n(x) - E\hat f_n(x) = M_n(x) - (K_{b_n}*f)'(x)\,n^{-1}\sum_{t=1}^{n}X_{t,t-1} + O_P\!\left(\frac{\Lambda_n}{n}\right), \qquad (8)$$
where $M_n(x) := \sum_{t=1}^{n}\{Y_{tn} - E(Y_{tn}\mid\mathcal F_{t-1})\}$, $X_{t,s} := E(X_t\mid\mathcal F_s)$ and $\Lambda_n$ is defined in equation (7). $M_n(x)$ is a sum of martingale differences converging completely to 0 provided $nb_n/\log n \to \infty$. If $c_i = L(i)i^{-\beta}$ for some $\beta > 1/2$, then $\Lambda_n = O(n^{1/2}\sum_{i=1}^{n}i^{1/2-2\beta}L^2(i) + n^{2-2\beta}L^2(n))$, which is $O(n^{2-2\beta}L^2(n))$ for $\beta < 3/4$ and $O(n^{1/2})$ for $\beta > 3/4$. If $\beta > 3/4$ and $E\eta_1^4 < \infty$, the second term in equation (8) converges completely to 0.
From Theorem, it follows that for twice continuously differentiable f we have fˆ n (x) f(x)= O P ((nb n ) 1/ + σ n /n + bn ). Next, we study a.s. rates and uniform rates in probability. Let C () (R) denote a family of twice continuously differentiable real functions. THEOREM 3 Assume that conditions of Theorem 1(b) are satisfied, f 1 C (R), E η 1 p < for some p>, K is symmetric and K(v) v dv<. (a) If i=1 c i < then ( (log ) ) fˆ n 1/ n (x) f(x) =O + bn a.s. (9) nb n (b) If c i = L(i)i β for 1/ <β<1, then ( (log ) ) fˆ n 1/ n (x) f(x) =O + σ n(log n) 1/ + bn a.s. (10) nb n n

THEOREM 4 Assume that the conditions of Theorem 1(b) are satisfied, $K$ is bounded, symmetric and Lipschitz continuous with Lipschitz constant $L$, $f_1$ is twice continuously differentiable with bounded derivatives, $f_1'' \in L^1(\mathbb R)$ and $f^{(2)} \in L^2(\mathbb R)$.
(a) If $\sum_{i=1}^{\infty}|c_i| < \infty$, then
$$\sup_{x\in[a,b]}|\hat f_n(x) - f(x)| = O_P\!\left(\left(\frac{\log n}{nb_n}\right)^{1/2} + b_n^2\right).$$
(b) If $c_i = L(i)i^{-\beta}$ for $1/2 < \beta < 1$, then
$$\sup_{x\in[a,b]}|\hat f_n(x) - f(x)| = O_P\!\left(\left(\frac{\log n}{nb_n}\right)^{1/2} + \frac{\sigma_n}{n} + b_n^2\right)$$
for any finite interval $[a,b]$.

4. Proofs

The proofs of Theorems 1–4 proceed through the following lemmas. Let us stress that both variants of the projection method turn out to be useful here. We apply bound (6) to $\hat f_n(x) - E\hat f_n(x)$ to prove Theorem 1(a). The decomposition (5),
$$\hat f_n(x) - E\hat f_n(x) = W_{n,0} + \sum_{k>0}W_{n,k} =: M_n(x) + N_n(x),$$
is employed to prove Theorem 1(b). The martingale structure of $M_n(x)$ is used to establish its a.s. convergence to 0, whereas the term $N_n(x)$ is analysed using the ergodic theorem. Furthermore, we take advantage of bound (6) again to find a suitable approximation for $N_n(x)$, needed to prove Theorem 2. An exponential inequality for the sum of martingale differences $M_n(x)$ is used to prove Theorem 3, whereas the Burkholder inequality is employed to study the term $N_n(x)$.

LEMMA 1 Assume that the assumptions of Theorem 1(a) are satisfied. Then
$$\mathrm{Var}\,\hat f_n(x) \le \frac{8C_1^2\|\eta_1\|^2}{n^2}\sum_{k=-\infty}^{n}\left(\sum_{t=\max(1,k)}^{n}|c_{t-k}|\right)^2 + \frac{2}{n}\,\mathrm{Var}\,K_{b_n}(x - X_1), \qquad (11)$$
where $C_1 = A\int|K(s)|\,ds$.

Observe that the bound in equation (11) can be written as $C\,\mathrm{Var}((\bar X_1 + \cdots + \bar X_n)/n) + D\,\mathrm{Var}\,\hat f_n^0(x)$, where $\bar X_t = \sum_{i=0}^{\infty}|c_i|\eta_{t-i}$ and $\hat f_n^0(x)$ is a kernel density estimate based on an i.i.d. sample of size $n$ with density $f$. This follows upon noting that $\bar X_1 + \cdots + \bar X_n = \sum_{k=-\infty}^{n}(\sum_{t=\max(1,k)}^{n}|c_{t-k}|)\eta_k$. Only the first term of the bound is influenced by the dependence of the underlying sequence $(X_t)$.

LEMMA 2 Let $f_1$ be a bounded density, $\int K^2(s)\,ds < \infty$ and $nb_n/\log n \to \infty$. Then $M_n(x) = \sum_{t=1}^{n}\{Y_{tn} - E(Y_{tn}\mid\mathcal F_{t-1})\} \to 0$ completely.

LEMMA 3 Assume that the conditions of Theorem 2 are satisfied. Then
$$\left\|N_n(x) + (K_{b_n}*f)'(x)\,n^{-1}\sum_{t=1}^{n}X_{t,t-1}\right\| = O\!\left(\frac{\Lambda_n}{n}\right),$$
where $N_n(x) = \hat f_n(x) - E\hat f_n(x) - M_n(x)$ and $X_{t,s} = E(X_t\mid\mathcal F_s)$. Let $S_n = \sum_{t=1}^{n}X_t$.

LEMMA 4 Assume that $E(\eta_1^{2p}) < \infty$ for some $p \ge 1$. Then $E|S_n|^{2p} = O((\mathrm{Var}\,S_n)^p)$.

Proof of Theorem 1(a). Writing $X_t = X_{t,t-1} + \eta_t$, it is easy to see that continuity of $f_1$ implies continuity of $f$; thus, in view of the Bochner lemma, $\mathrm{Var}\,K_b(x - X_1) \sim f(x)b^{-1}\int K^2(s)\,ds$ as $b \to 0$, whence the second term on the right-hand side of equation (11) tends to 0. Moreover,
$$\frac{1}{n^2}\sum_{k=-\infty}^{n}\left(\sum_{t=\max(1,k)}^{n}|c_{t-k}|\right)^2 = \frac{1}{n^2}\sum_{t,t'=1}^{n}\sum_{j=0}^{\infty}|c_j c_{j+|t-t'|}| \le \frac{2}{n}\sum_{i=0}^{n-1}\left(\sum_{j=0}^{\infty}c_j^2\right)^{1/2}\left(\sum_{j=0}^{\infty}c_{j+i}^2\right)^{1/2} \to 0,$$
as $\sum_{j=0}^{\infty}c_j^2 < \infty$ implies that the averaged terms tend to 0. Thus, in view of equation (11), $\mathrm{Var}\,\hat f_n(x) \to 0$, and the weak consistency of $\hat f_n(x)$ follows, as $E\hat f_n(x) \to f(x)$ in view of the continuity of $f$ at $x$.

Proof of Theorem 1(b). Observe that, in view of Lemma 2, it is enough to prove that $N_n(x) \to 0$ a.s. Note that
$$N_n(x) = \int K(v)\,n^{-1}\sum_{t=1}^{n}f_1(x - X_{t,t-1} - b_n v)\,dv - E\hat f_n(x).$$
Observe that
$$|N_n(x)| \le \int|K(v)|\,n^{-1}\sum_{t=1}^{n}\left|f_1(x - X_{t,t-1} - b_n v) - f_1(x - X_{t,t-1})\right|dv + \left|n^{-1}\sum_{t=1}^{n}f_1(x - X_{t,t-1}) - f(x)\right| + \left|E\hat f_n(x) - f(x)\right|. \qquad (12)$$
The last term on the right-hand side tends to 0 in view of the continuity of $f$. As $(f_1(x - X_{t,t-1}))$ is ergodic, being an instantaneous transformation of the ergodic sequence $(X_{t,t-1})$, the second term also tends to 0 a.s. in view of the ergodic theorem, as $Ef_1(x - X_{t,t-1}) = f(x)$. The first term is bounded by $Ab_n\int|K(v)v|\,dv \to 0$, using the fact that $f_1$ is Lipschitz.

Proof of Theorem 2. Observe that, in view of Lemmas 2 and 3, we have that
$$\hat f_n(x) - E\hat f_n(x) - M_n(x) + (K_{b_n}*f)'(x)\,n^{-1}\sum_{t=1}^{n}X_{t,t-1} = O_P\!\left(\frac{\Lambda_n}{n}\right)$$

and $M_n(x) \to 0$ completely. In order to establish the bound on $\Lambda_n$, observe that $\Lambda_n^2 = O(n\Theta_n^2 + \sum_{i=n+1}^{\infty}(\Theta_{n+i} - \Theta_i)^2)$ and, in view of the Karamata theorem, $\theta_i = O(i^{1/2-2\beta}L^2(i))$. Thus, we have
$$\sum_{i=n+1}^{\infty}(\Theta_{n+i} - \Theta_i)^2 = \sum_{i=n+1}^{\infty}O((n\theta_i)^2) = n^2\sum_{i=n+1}^{\infty}O(i^{1-4\beta}L^4(i)) = O(n^{4-4\beta}L^4(n)).$$
We now prove that, for $\beta > 3/4$ and $E\eta_1^4 < \infty$, $n^{-1}\sum_{t=1}^{n}X_{t,t-1}$ tends to 0 completely. Applying Lemma 4 with $X_{t,t-1}$ in place of $X_t$, $S_n = \sum_{t=1}^{n}X_{t,t-1}$ and $p = 2$, we get
$$P\!\left(\left|\frac{S_n}{n}\right| \ge \varepsilon\right) \le \frac{ES_n^4}{(n\varepsilon)^4} = O\!\left(\frac{(\mathrm{Var}\,S_n)^2}{n^4}\right).$$
Observe that, reasoning analogously to the proof of Theorem 1(a), we get $\mathrm{Var}\,S_n = O(n)$ when the $c_i$'s are absolutely summable and, as discussed in section 1, $\mathrm{Var}\,S_n \sim n^{2-(2\beta-1)}L^2(n)$ for $1/2 < \beta < 1$, whence for $\beta > 3/4$, $(\mathrm{Var}\,S_n)^2/n^4$ is summable. Thus, for such cases, $P(|S_n/n| \ge \varepsilon)$ is summable, and the conclusion follows from the Borel–Cantelli lemma. The case $\beta \ge 1$ is treated analogously.

Proof of Lemma 1. In order to prove inequality (11), we use bound (6). Let $U_t = K_{b_n}(x - X_t) - EK_{b_n}(x - X_t)$ and note that $P_k U_t = 0$ for $t < k$. Thus,
$$n^2\,\mathrm{Var}\,\hat f_n(x) \le \sum_{k=-\infty}^{n}\left(\sum_{t=\max(1,k)}^{n}\|P_1 U_{t-k+1}\|I\{t > k\} + \|P_1 U_1\|I\{t = k\}\right)^2 \le 2\sum_{k=-\infty}^{n}\left(\sum_{t>k}\|P_1 U_{t-k+1}\|\right)^2 + 2n\|P_1 U_1\|^2.$$
As $\|P_1 U_1\|^2 = EU_1^2 - E(E(U_1\mid\mathcal F_0))^2 \le EU_1^2$, it is enough to prove, for $t > 1$,
$$\|P_1 U_t\| \le C_1\left(|c_{t-1}|(E\eta_1^2)^{1/2} + |c_{t-1}|E|\eta_1|\right) \le 2C_1|c_{t-1}|\,\|\eta_1\|.$$
To this end, observe that, writing
$$X_t = \sum_{i=t-s}^{\infty}c_i\eta_{t-i} + \sum_{i=0}^{(t-s)-1}c_i\eta_{t-i} =: X_{t,s} + R_{t,s},$$
we have for $t > 1$
$$P_1 U_t = \int K_{b_n}(x - z - X_{t,1})f_{t-1}(z)\,dz - \int K_{b_n}(x - z - X_{t,0})f_t(z)\,dz, \qquad (13)$$
where $f_{t-s}(\cdot)$ denotes the density of $R_{t,s}$. Moreover, an easy induction argument implies that $f_t$ is Lipschitz continuous with Lipschitz constant $A$ when $f_1$ has this property. Note that

equation (13) can be written as
$$\int K(z)\{f_{t-1}(x - X_{t,1} - b_n z) - f_t(x - X_{t,1} - b_n z)\}\,dz + \int K(z)\{f_t(x - X_{t,1} - b_n z) - f_t(x - X_{t,0} - b_n z)\}\,dz =: I + II.$$
Using the Lipschitz continuity of $f_t$, we have $|II| \le C_1|c_{t-1}\eta_1|$, whence $\|II\| \le C_1|c_{t-1}|(E\eta_1^2)^{1/2}$. Moreover, as $f_t(s)$ can be written as the convolution $f_{t-1}*\tilde f(s)$, where $\tilde f$ is the density of $c_{t-1}\eta_1$, we have that, for $s = x - X_{t,1} - b_n z$,
$$|I| \le \int|K(z)|\int\left|f_{t-1}(s) - f_{t-1}(s - y)\right|\tilde f(y)\,dy\,dz \le A\int|K(z)|\,dz\int|y|\tilde f(y)\,dy = C_1 E|c_{t-1}\eta_1|.$$

Proof of Lemma 2. We use a special case of the Freedman exponential inequality [15], stating that for a sum $\tilde S_n = nM_n(x)$ of bounded martingale differences $T_i$, $|T_i| \le B$, we have for $\lambda > 0$
$$E\exp(\lambda\tilde S_n) \le \exp(\beta B^{-2}e(B\lambda)),$$
where $e(\lambda) = e^{\lambda} - \lambda - 1$ and $\beta$ is a bound on the sum of conditional variances of the $T_i$, such that $P(\sum_{i=1}^{n}E(T_i^2\mid\mathcal F_{i-1}) \le \beta) = 1$. In our case,
$$E(T_i^2\mid\mathcal F_{i-1}) = E(K_{b_n}^2(x - X_i)\mid\mathcal F_{i-1}) - \left(E(K_{b_n}(x - X_i)\mid\mathcal F_{i-1})\right)^2 \le \frac{1}{b_n^2}\int K^2\!\left(\frac{x - X_{i,i-1} - s}{b_n}\right)f_1(s)\,ds \le \frac{\int K^2(s)\,ds\,\sup f_1}{b_n}$$
and $B = 2\sup|K|/b_n$. Thus $\sum_{i=1}^{n}E(T_i^2\mid\mathcal F_{i-1}) \le nC/b_n$ for some positive $C$, and for $\lambda > 0$
$$P(M_n \ge \varepsilon) \le \frac{E\exp(\lambda\tilde S_n)}{\exp(n\lambda\varepsilon)} \le \exp\!\left\{\frac{nC}{b_n}B^{-2}e(B\lambda) - n\lambda\varepsilon\right\}. \qquad (14)$$
Let $\lambda_n = \varepsilon_n b_n/C$, where $\varepsilon_n = (8C\log n/(nb_n))^{1/2}$. Note that, as $e(\lambda) \sim \lambda^2/2$ for $\lambda \to 0$, we have $e(B\lambda_n) < (3/4)(B\lambda_n)^2$ for sufficiently large $n$. Thus, for such $n$, the last bound is smaller than
$$\exp(6\log n - 8\log n) = \exp(-2\log n). \qquad (15)$$
Thus, the Borel–Cantelli lemma implies that $M_n(x) \ge (8C\log n/(nb_n))^{1/2}$ occurs only finitely often almost surely. Analogously, we show that $\sum_n P(-M_n(x) \ge \varepsilon_n) < \infty$, whence $M_n(x) \to 0$ completely.

Proof of Lemma 3. The proof uses the main ideas of the proof of Lemma 4 of Mielniczuk and Wu [17]. Observe that $nN_n(x) = \sum_{t=1}^{n}K_b * T_t(x)$, where $T_t(x) = f_1(x - X_{t,t-1}) - f(x)$,

10 130 A. Bryk and J. Mielniczuk where denotes convolution. We will prove that P 1 K b T t (x) + K b f (x)c t 1 η 1 =O( c t 1 C 1/ t 1 ), (16) where C t = i=t c i. This will prove the lemma in view of equation (7). Writing K b (x s)ξ s ds ( = E K b (x s)k b (x s )ξ s ξ s ds ds ) for any family of random variables (ξ s ) s R and using Cauchy inequality, we have K b (x s)ξ s ds sup ξ s. s Thus, in order to prove equation (16) in view of P 1 T t (x) = f t 1 (x X t,1 ) f t (x X t,0 ),it is enough to show that sup s f t 1 (s X t,1 ) f t (s X t,0 ) + f (s)c t 1 η 1 =O( c t 1 C 1/ t 1 ). (17) In view of Lemma 1 by Mielniczuk and Wu [17], f (x) = Ef t 1 (x X t,1). Thus sup f (s) f t 1 s as sup s,t f t () (s) <. Moreover Thus equation (17) follows from (s) sup s E f t 1 (s X t,1) f t 1 (s) sup f t 1 (s X t,1) f t 1 (s) =O(C1/ t 1 ) s sup f t 1 (s) f t 1 (s X t,0) =O(C 1/ t ). s sup f t 1 (s X t,1 ) f t (s X t,0 ) + f t 1 (s X t,0)c t 1 η 1 =O( c t 1 ). (18) s Let η 1 be a copy of η 1 independent of (η i ) i= and X t,1 = X t,1 c t 1 η 1 + c t 1 η 1. Random variable on the left-hand side of equation (18) can be written as Adding to equation (19) E(f t 1 (s X t,1 ) f t 1 (s X t,1 ) + f t 1 (s X t,0)c t 1 η 1 F 1 ) (19) E(f t 1 (s X t,0 ) f t 1 (s X t,0 ) f t 1 (s X t,0)c t 1 η 1 F 1) = 0, we see that equation (18) follows from two term Taylor expansion of f t 1 (s X t,0 ) f t 1 (s X t,1 ) and f t 1 (s X t,0 ) f t 1 (s Xt,1 ).

Proof of Lemma 4. $S_n = \sum_{t=1}^{n}\sum_{i=-\infty}^{n}c_{t-i}\eta_i =: \sum_{i=-\infty}^{n}d_{n,i}\eta_i$ is a sum of martingale differences, as the $\eta_i$ are independent. Thus, the Burkholder inequality implies
$$E|S_n|^{2p} \le C_p E\left(\sum_{i=-\infty}^{n}(d_{n,i}\eta_i)^2\right)^{p}$$
and, in view of the Minkowski inequality, we have
$$\|S_n\|_{2p}^2 \le C_p^{1/p}\left\|\sum_{i=-\infty}^{n}(d_{n,i}\eta_i)^2\right\|_{p} \le C_p^{1/p}\sum_{i=-\infty}^{n}\left\|(d_{n,i}\eta_i)^2\right\|_{p} = C_p^{1/p}\sum_{i=-\infty}^{n}d_{n,i}^2\,\|\eta_1\|_{2p}^2 = C_p^{1/p}\,(\mathrm{Var}\,S_n)\,\frac{\|\eta_1\|_{2p}^2}{\sigma^2},$$
from which the statement of the lemma follows.

Proof of Theorem 3. (a) The proof of Lemma 2 indicates that $M_n(x) = O((\log n/(nb_n))^{1/2})$ a.s. As $E\hat f_n(x) - f(x) = O(b_n^2)$, it is enough to show that $N_n(x) = O((\log n/n)^{1/2} + b_n^2)$ a.s., provided the $c_i$'s are absolutely summable. We bound $N_n(x)$ as in the proof of Theorem 1(b) and note that the first and the third terms of the bound are $O(b_n^2)$. Let
$$H_n(x) = \sum_{t=1}^{n}\{f_1(x - X_{t,t-1}) - f(x)\} =: \sum_{t=1}^{n}T_t(x).$$
We first show that
$$\|P_1 T_t(x)\|_p = O(|c_{t-1}|). \qquad (20)$$
To this end, consider a coupled version of $X_{t,t-1}$: $X_{t,t-1}' = X_{t,t-1} - c_{t-1}\eta_1 + c_{t-1}\eta_1'$, where $\eta_1'$ has the same distribution as $\eta_1$ and is independent of $(\eta_i)$. Then $E(f_1(x - X_{t,t-1})\mid\mathcal F_0) = E(f_1(x - X_{t,t-1}')\mid\mathcal F_1)$, and as $f_1$ satisfies a Lipschitz condition, we have
$$\|P_1 T_t(x)\|_p = \left\|E\left(f_1(x - X_{t,t-1}) - f_1(x - X_{t,t-1}')\mid\mathcal F_1\right)\right\|_p \le A|c_{t-1}|\,\|\eta_1 - \eta_1'\|_p = O(|c_{t-1}|),$$
in view of $E|\eta_1|^p < \infty$. Consider the decomposition $H_n(x) = \sum_{j=-\infty}^{n}P_j H_n(x)$ into martingale differences $P_j H_n(x)$. It follows from the Burkholder inequality that
$$E|H_n(x)|^p \le C_p E\left(\sum_{j=-\infty}^{n}(P_j H_n(x))^2\right)^{p/2}$$
and, moreover,
$$\|P_j H_n(x)\|_p \le \sum_{t=\max(1,j)}^{n}\|P_1 T_{t-j+1}(x)\|_p \le C\sum_{t=\max(1,j)}^{n}|c_{t-j}|.$$

Thus,
$$\left(E|H_n(x)|^p\right)^{2/p} \le C\left\|\sum_{j=-\infty}^{n}(P_j H_n(x))^2\right\|_{p/2} \le C\sum_{j=-\infty}^{n}\|P_j H_n(x)\|_p^2 \le C\sum_{j=-\infty}^{n}\left(\sum_{t=\max(1,j)}^{n}|c_{t-j}|\right)^2. \qquad (21)$$
As the $c_i$'s are absolutely summable, we get $E|H_n(x)|^p \le Cn^{p/2}$. We apply the Móricz [18] inequality, stating that if, for some non-negative $a_i$ and $r > 0$, $q > 1$, $E|X_1 + \cdots + X_i|^r \le (\sum_{j=1}^{i}a_j)^q$, then $E(\max_{i\le n}|X_1 + \cdots + X_i|^r) \le C_{rq}(\sum_{j=1}^{n}a_j)^q$; we use it with $r = p$ and $q = p/2$. This, together with equation (21), implies that $E(\max_{i\le n}|H_i(x)|^p) \le Cn^{p/2}$. Thus $H_n(x) = O(n^{1/2}(\log n)^{1/2})$ a.s. It follows, by the Borel–Cantelli lemma, from the observation that
$$P\left(\max_{i\le 2^k}|H_i(x)| \ge \epsilon(2^k k)^{1/2}\right) \le \frac{E\max_{i\le 2^k}|H_i(x)|^p}{\epsilon^p(2^k k)^{p/2}} = O(k^{-p/2}),$$
which is summable as $p > 2$.
(b) The proof of (b) follows the same argument after noting that the analogue of equation (21) is $(E|H_n(x)|^p)^{2/p} = O(\sigma_n^2)$. Moreover, a two-fold application of the Karamata theorem yields that the Móricz inequality may be applied with $a_i = i^{2-2\beta}L^2(i)$. It yields $E\max_{i\le n}|H_i(x)|^p \le C\sigma_n^p$.

Proof of Theorem 4. We first prove that
$$\sup_{x\in[a,b]}|M_n(x)| = O\!\left(\left(\frac{\log n}{nb_n}\right)^{1/2}\right)$$
almost surely. Let $\epsilon_n = 3(16C\log n/(nb_n))^{1/2}$, where $C$ is defined in the proof of Lemma 2, and let $a = x_0 < x_1 < \cdots < x_{l_n} = b$ be an equipartition of $[a,b]$ with $l_n = [6L(b-a)/(b_n^2\epsilon_n)] + 1$. For $x \in [a,b]$, let $x_{k(x)}$ denote the point of the partition closest to $x$. Observe that
$$P\left(\sup_{x\in[a,b]}|M_n(x)| \ge \epsilon_n\right) \le P\left(\sup_{x\in[a,b]}|M_n(x) - M_n(x_{k(x)})| \ge \frac{\epsilon_n}{3}\right) + P\left(\max_{0\le k\le l_n}|M_n(x_k)| \ge \frac{\epsilon_n}{3}\right).$$
It is easy to see that, in view of the Lipschitz continuity of $K$, we have $|M_n(x) - M_n(x_{k(x)})| \le 2L(b-a)/(b_n^2 l_n) \le \epsilon_n/3$, whence the first probability on the right-hand side is 0. The second term is trivially bounded by
$$(l_n + 1)\max_{0\le k\le l_n}P\left(|M_n(x_k)| \ge 3^{-1}\epsilon_n\right). \qquad (22)$$
Choosing $\lambda_n = \epsilon_n b_n/(3C)$ and reasoning as in the proof of Lemma 2, we get that expression (22) is bounded by $(l_n + 1)\exp(-4\log n) \le C(n/(b_n^3\log n))^{1/2}\exp(-4\log n)$, which is summable, and

the conclusion follows from the Borel–Cantelli lemma. Next, we show that, in case (b), $\sup_{x\in[a,b]}|N_n(x)| = O_P(\sigma_n/n + b_n^2)$; the reasoning in case (a) is similar. Bounding $N_n(x)$ as in the proof of Theorem 1(b), we see that the first and the third terms of the bound are $O(b_n^2)$ uniformly in $x\in[a,b]$. In order to bound the second term, let, as in the proof of Theorem 3, $H_n(x) = \sum_{t=1}^{n}\{f_1(x - X_{t,t-1}) - f(x)\}$ and $h_n(x) = H_n'(x)$. From the proof of Lemma 5 in Wu [9] it follows that, provided $f^{(2)} \in L^2(\mathbb R)$ and $f_1'$ is bounded, we have
$$E\int_{\mathbb R}h_n^2(x)\,dx = O\left(\sum_{j=-\infty}^{n}\left(\sum_{t=\max(1,j)}^{n}c_{t-j}\right)^2\right) = O(\sigma_n^2).$$
Moreover, observe that
$$\left(E\sup_{x\in[a,b]}|H_n(x) - H_n(a)|\right)^2 \le \left(E\int_a^b|h_n(x)|\,dx\right)^2 \le (b-a)\int_a^b Eh_n^2(x)\,dx.$$
The last two equations imply, via the Markov inequality, that $\sup_{x\in[a,b]}|H_n(x) - H_n(a)| = O_P(\sigma_n)$. The proof of Theorem 3 implies that $H_n(a) = O_P(\sigma_n)$, and the conclusion follows via the triangle inequality.

Acknowledgements

We thank Wei Biao Wu for his insightful comments on handling the term $H_n(x)$ in the proof of Theorem 4, and a referee for helpful remarks.

References

[1] Bosq, D., 1998, Nonparametric Statistics for Stochastic Processes: Estimation and Prediction. Lecture Notes in Statistics, Vol. 110 (New York: Springer).
[2] Hall, P. and Hart, J.D., 1990, Convergence rates in density estimation for data from infinite-order moving average processes. Probability Theory and Related Fields, 87.
[3] Ho, H.C. and Hsing, T., 1996, On asymptotic expansion of the empirical process of long-memory moving averages. Annals of Statistics, 24.
[4] Csörgő, S. and Mielniczuk, J., 1999, Random design regression under long-range dependent errors. Bernoulli, 5.
[5] Wu, W.B. and Mielniczuk, J., 2002, Kernel density estimation for linear processes. Annals of Statistics, 30.
[6] Wu, W.B., 2001, Nonparametric estimation for stationary processes. PhD thesis, University of Michigan.
[7] Wu, W.B., 2002, Central limit theorems for functionals of linear processes.
Statistica Sinica, 12.
[8] Wu, W.B., 2003a, Empirical processes of long-memory sequences. Bernoulli, 9.
[9] Wu, W.B., 2003b, On the Bahadur representation of sample quantiles for dependent sequences, preprint.
[10] Fan, J. and Yao, Q., 2003, Nonlinear Time Series: Nonparametric and Parametric Methods (New York: Springer).
[11] Chanda, K., 1974, Strong mixing properties of linear stochastic processes. Journal of Applied Probability, 11.
[12] Gorodetskii, V.V., 1977, On the strong mixing property for linear sequences. Theory of Probability and its Applications, 22.
[13] Withers, C.S., 1981, Conditions for linear processes to be strongly mixing. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 57.
[14] Pham, T.D. and Tran, L.T., 1985, Some mixing properties of time series models. Stochastic Processes and their Applications, 19.
[15] Freedman, D., 1975, On tail probabilities for martingales. Annals of Probability, 3.
[16] Gordin, M.I. and Lifšic, B., 1978, The central limit theorem for stationary Markov processes. Doklady Akademii Nauk SSSR, 19.
[17] Mielniczuk, J. and Wu, W.B., 2004, On random-design model with dependent errors. Statistica Sinica, to appear.
[18] Móricz, F., 1976, Moment inequalities and the strong laws of large numbers. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 35.

More information

QED. Queen s Economics Department Working Paper No. 1244

QED. Queen s Economics Department Working Paper No. 1244 QED Queen s Economics Department Working Paper No. 1244 A necessary moment condition for the fractional functional central limit theorem Søren Johansen University of Copenhagen and CREATES Morten Ørregaard

More information

On the convergence of sequences of random variables: A primer

On the convergence of sequences of random variables: A primer BCAM May 2012 1 On the convergence of sequences of random variables: A primer Armand M. Makowski ECE & ISR/HyNet University of Maryland at College Park armand@isr.umd.edu BCAM May 2012 2 A sequence a :

More information

Random Bernstein-Markov factors

Random Bernstein-Markov factors Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit

More information

Central Limit Theorem for Non-stationary Markov Chains

Central Limit Theorem for Non-stationary Markov Chains Central Limit Theorem for Non-stationary Markov Chains Magda Peligrad University of Cincinnati April 2011 (Institute) April 2011 1 / 30 Plan of talk Markov processes with Nonhomogeneous transition probabilities

More information

On a class of stochastic differential equations in a financial network model

On a class of stochastic differential equations in a financial network model 1 On a class of stochastic differential equations in a financial network model Tomoyuki Ichiba Department of Statistics & Applied Probability, Center for Financial Mathematics and Actuarial Research, University

More information

Spectral representations and ergodic theorems for stationary stochastic processes

Spectral representations and ergodic theorems for stationary stochastic processes AMS 263 Stochastic Processes (Fall 2005) Instructor: Athanasios Kottas Spectral representations and ergodic theorems for stationary stochastic processes Stationary stochastic processes Theory and methods

More information

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)

More information

Bahadur representations for bootstrap quantiles 1

Bahadur representations for bootstrap quantiles 1 Bahadur representations for bootstrap quantiles 1 Yijun Zuo Department of Statistics and Probability, Michigan State University East Lansing, MI 48824, USA zuo@msu.edu 1 Research partially supported by

More information

Convergence at first and second order of some approximations of stochastic integrals

Convergence at first and second order of some approximations of stochastic integrals Convergence at first and second order of some approximations of stochastic integrals Bérard Bergery Blandine, Vallois Pierre IECN, Nancy-Université, CNRS, INRIA, Boulevard des Aiguillettes B.P. 239 F-5456

More information

Selected Exercises on Expectations and Some Probability Inequalities

Selected Exercises on Expectations and Some Probability Inequalities Selected Exercises on Expectations and Some Probability Inequalities # If E(X 2 ) = and E X a > 0, then P( X λa) ( λ) 2 a 2 for 0 < λ

More information

Kernel estimation for time series: An asymptotic theory

Kernel estimation for time series: An asymptotic theory Stochastic Processes and their Applications 120 (2010) 2412 2431 www.elsevier.com/locate/spa Kernel estimation for time series: An asymptotic theory Wei Biao Wu, Yinxiao Huang, Yibi Huang Department of

More information

Stat 5101 Notes: Algorithms

Stat 5101 Notes: Algorithms Stat 5101 Notes: Algorithms Charles J. Geyer January 22, 2016 Contents 1 Calculating an Expectation or a Probability 3 1.1 From a PMF........................... 3 1.2 From a PDF...........................

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 218. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Gaussian vectors and central limit theorem

Gaussian vectors and central limit theorem Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N Problem 1. Let f : A R R have the property that for every x A, there exists ɛ > 0 such that f(t) > ɛ if t (x ɛ, x + ɛ) A. If the set A is compact, prove there exists c > 0 such that f(x) > c for all x

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Chanseok Park 1 Department of Mathematical Sciences, Clemson University, Clemson, SC 29634 Short Title: Smooth

More information

Mod-φ convergence I: examples and probabilistic estimates

Mod-φ convergence I: examples and probabilistic estimates Mod-φ convergence I: examples and probabilistic estimates Valentin Féray (joint work with Pierre-Loïc Méliot and Ashkan Nikeghbali) Institut für Mathematik, Universität Zürich Summer school in Villa Volpi,

More information

Quantile Processes for Semi and Nonparametric Regression

Quantile Processes for Semi and Nonparametric Regression Quantile Processes for Semi and Nonparametric Regression Shih-Kang Chao Department of Statistics Purdue University IMS-APRM 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile Response

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

1. Aufgabenblatt zur Vorlesung Probability Theory

1. Aufgabenblatt zur Vorlesung Probability Theory 24.10.17 1. Aufgabenblatt zur Vorlesung By (Ω, A, P ) we always enote the unerlying probability space, unless state otherwise. 1. Let r > 0, an efine f(x) = 1 [0, [ (x) exp( r x), x R. a) Show that p f

More information

Reduction principles for quantile and Bahadur-Kiefer processes of long-range dependent linear sequences

Reduction principles for quantile and Bahadur-Kiefer processes of long-range dependent linear sequences Reduction principles for quantile and Bahadur-Kiefer processes of long-range dependent linear sequences Miklós Csörgő Rafa l Kulik January 18, 2007 Abstract In this paper we consider quantile and Bahadur-Kiefer

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4. (*. Let independent variables X,..., X n have U(0, distribution. Show that for every x (0,, we have P ( X ( < x and P ( X (n > x as n. Ex. 4.2 (**. By using induction or otherwise,

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

Goodness-of-fit tests for the cure rate in a mixture cure model

Goodness-of-fit tests for the cure rate in a mixture cure model Biometrika (217), 13, 1, pp. 1 7 Printed in Great Britain Advance Access publication on 31 July 216 Goodness-of-fit tests for the cure rate in a mixture cure model BY U.U. MÜLLER Department of Statistics,

More information

Continued Fraction Digit Averages and Maclaurin s Inequalities

Continued Fraction Digit Averages and Maclaurin s Inequalities Continued Fraction Digit Averages and Maclaurin s Inequalities Steven J. Miller, Williams College sjm1@williams.edu, Steven.Miller.MC.96@aya.yale.edu Joint with Francesco Cellarosi, Doug Hensley and Jake

More information

If g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get

If g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get 18:2 1/24/2 TOPIC. Inequalities; measures of spread. This lecture explores the implications of Jensen s inequality for g-means in general, and for harmonic, geometric, arithmetic, and related means in

More information

Part II Probability and Measure

Part II Probability and Measure Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

1. Stochastic Processes and Stationarity

1. Stochastic Processes and Stationarity Massachusetts Institute of Technology Department of Economics Time Series 14.384 Guido Kuersteiner Lecture Note 1 - Introduction This course provides the basic tools needed to analyze data that is observed

More information

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek J. Korean Math. Soc. 41 (2004), No. 5, pp. 883 894 CONVERGENCE OF WEIGHTED SUMS FOR DEPENDENT RANDOM VARIABLES Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek Abstract. We discuss in this paper the strong

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 15. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

On large deviations for combinatorial sums

On large deviations for combinatorial sums arxiv:1901.0444v1 [math.pr] 14 Jan 019 On large deviations for combinatorial sums Andrei N. Frolov Dept. of Mathematics and Mechanics St. Petersburg State University St. Petersburg, Russia E-mail address:

More information

Optimal global rates of convergence for interpolation problems with random design

Optimal global rates of convergence for interpolation problems with random design Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289

More information

LARGE DEVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILED DEPENDENT RANDOM VECTORS*

LARGE DEVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILED DEPENDENT RANDOM VECTORS* LARGE EVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILE EPENENT RANOM VECTORS* Adam Jakubowski Alexander V. Nagaev Alexander Zaigraev Nicholas Copernicus University Faculty of Mathematics and Computer Science

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Conditional independence, conditional mixing and conditional association

Conditional independence, conditional mixing and conditional association Ann Inst Stat Math (2009) 61:441 460 DOI 10.1007/s10463-007-0152-2 Conditional independence, conditional mixing and conditional association B. L. S. Prakasa Rao Received: 25 July 2006 / Revised: 14 May

More information

RENORMALIZATION OF DYSON S VECTOR-VALUED HIERARCHICAL MODEL AT LOW TEMPERATURES

RENORMALIZATION OF DYSON S VECTOR-VALUED HIERARCHICAL MODEL AT LOW TEMPERATURES RENORMALIZATION OF DYSON S VECTOR-VALUED HIERARCHICAL MODEL AT LOW TEMPERATURES P. M. Bleher (1) and P. Major (2) (1) Keldysh Institute of Applied Mathematics of the Soviet Academy of Sciences Moscow (2)

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Connection to Branching Random Walk

Connection to Branching Random Walk Lecture 7 Connection to Branching Random Walk The aim of this lecture is to prepare the grounds for the proof of tightness of the maximum of the DGFF. We will begin with a recount of the so called Dekking-Host

More information

A Note on the Central Limit Theorem for a Class of Linear Systems 1

A Note on the Central Limit Theorem for a Class of Linear Systems 1 A Note on the Central Limit Theorem for a Class of Linear Systems 1 Contents Yukio Nagahata Department of Mathematics, Graduate School of Engineering Science Osaka University, Toyonaka 560-8531, Japan.

More information

I forgot to mention last time: in the Ito formula for two standard processes, putting

I forgot to mention last time: in the Ito formula for two standard processes, putting I forgot to mention last time: in the Ito formula for two standard processes, putting dx t = a t dt + b t db t dy t = α t dt + β t db t, and taking f(x, y = xy, one has f x = y, f y = x, and f xx = f yy

More information

µ X (A) = P ( X 1 (A) )

µ X (A) = P ( X 1 (A) ) 1 STOCHASTIC PROCESSES This appendix provides a very basic introduction to the language of probability theory and stochastic processes. We assume the reader is familiar with the general measure and integration

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

On detection of unit roots generalizing the classic Dickey-Fuller approach

On detection of unit roots generalizing the classic Dickey-Fuller approach On detection of unit roots generalizing the classic Dickey-Fuller approach A. Steland Ruhr-Universität Bochum Fakultät für Mathematik Building NA 3/71 D-4478 Bochum, Germany February 18, 25 1 Abstract

More information

A NOTE ON THE COMPLETE MOMENT CONVERGENCE FOR ARRAYS OF B-VALUED RANDOM VARIABLES

A NOTE ON THE COMPLETE MOMENT CONVERGENCE FOR ARRAYS OF B-VALUED RANDOM VARIABLES Bull. Korean Math. Soc. 52 (205), No. 3, pp. 825 836 http://dx.doi.org/0.434/bkms.205.52.3.825 A NOTE ON THE COMPLETE MOMENT CONVERGENCE FOR ARRAYS OF B-VALUED RANDOM VARIABLES Yongfeng Wu and Mingzhu

More information

Asymptotic normality of the L k -error of the Grenander estimator

Asymptotic normality of the L k -error of the Grenander estimator Asymptotic normality of the L k -error of the Grenander estimator Vladimir N. Kulikov Hendrik P. Lopuhaä 1.7.24 Abstract: We investigate the limit behavior of the L k -distance between a decreasing density

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary

More information

Estimation of arrival and service rates for M/M/c queue system

Estimation of arrival and service rates for M/M/c queue system Estimation of arrival and service rates for M/M/c queue system Katarína Starinská starinskak@gmail.com Charles University Faculty of Mathematics and Physics Department of Probability and Mathematical Statistics

More information

Limiting behaviour of moving average processes under ρ-mixing assumption

Limiting behaviour of moving average processes under ρ-mixing assumption Note di Matematica ISSN 1123-2536, e-issn 1590-0932 Note Mat. 30 (2010) no. 1, 17 23. doi:10.1285/i15900932v30n1p17 Limiting behaviour of moving average processes under ρ-mixing assumption Pingyan Chen

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Introduction to Stochastic processes

Introduction to Stochastic processes Università di Pavia Introduction to Stochastic processes Eduardo Rossi Stochastic Process Stochastic Process: A stochastic process is an ordered sequence of random variables defined on a probability space

More information

Consistency of the maximum likelihood estimator for general hidden Markov models

Consistency of the maximum likelihood estimator for general hidden Markov models Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models

More information

Lecture 4: Numerical solution of ordinary differential equations

Lecture 4: Numerical solution of ordinary differential equations Lecture 4: Numerical solution of ordinary differential equations Department of Mathematics, ETH Zürich General explicit one-step method: Consistency; Stability; Convergence. High-order methods: Taylor

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4.1 (*. Let independent variables X 1,..., X n have U(0, 1 distribution. Show that for every x (0, 1, we have P ( X (1 < x 1 and P ( X (n > x 1 as n. Ex. 4.2 (**. By using induction

More information

Weak invariance principles for sums of dependent random functions

Weak invariance principles for sums of dependent random functions Available online at www.sciencedirect.com Stochastic Processes and their Applications 13 (013) 385 403 www.elsevier.com/locate/spa Weak invariance principles for sums of dependent random functions István

More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

Probability Background

Probability Background Probability Background Namrata Vaswani, Iowa State University August 24, 2015 Probability recap 1: EE 322 notes Quick test of concepts: Given random variables X 1, X 2,... X n. Compute the PDF of the second

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

Nonlinear time series

Nonlinear time series Based on the book by Fan/Yao: Nonlinear Time Series Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 27, 2009 Outline Characteristics of

More information

NOTES AND PROBLEMS IMPULSE RESPONSES OF FRACTIONALLY INTEGRATED PROCESSES WITH LONG MEMORY

NOTES AND PROBLEMS IMPULSE RESPONSES OF FRACTIONALLY INTEGRATED PROCESSES WITH LONG MEMORY Econometric Theory, 26, 2010, 1855 1861. doi:10.1017/s0266466610000216 NOTES AND PROBLEMS IMPULSE RESPONSES OF FRACTIONALLY INTEGRATED PROCESSES WITH LONG MEMORY UWE HASSLER Goethe-Universität Frankfurt

More information

Section 27. The Central Limit Theorem. Po-Ning Chen, Professor. Institute of Communications Engineering. National Chiao Tung University

Section 27. The Central Limit Theorem. Po-Ning Chen, Professor. Institute of Communications Engineering. National Chiao Tung University Section 27 The Central Limit Theorem Po-Ning Chen, Professor Institute of Communications Engineering National Chiao Tung University Hsin Chu, Taiwan 3000, R.O.C. Identically distributed summands 27- Central

More information

STAT 7032 Probability Spring Wlodek Bryc

STAT 7032 Probability Spring Wlodek Bryc STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,

More information

Extreme Value Analysis and Spatial Extremes

Extreme Value Analysis and Spatial Extremes Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models

More information

Complete Moment Convergence for Weighted Sums of Negatively Orthant Dependent Random Variables

Complete Moment Convergence for Weighted Sums of Negatively Orthant Dependent Random Variables Filomat 31:5 217, 1195 126 DOI 1.2298/FIL175195W Published by Faculty of Sciences and Mathematics, University of Niš, Serbia Available at: http://www.pmf.ni.ac.rs/filomat Complete Moment Convergence for

More information

An almost sure invariance principle for additive functionals of Markov chains

An almost sure invariance principle for additive functionals of Markov chains Statistics and Probability Letters 78 2008 854 860 www.elsevier.com/locate/stapro An almost sure invariance principle for additive functionals of Markov chains F. Rassoul-Agha a, T. Seppäläinen b, a Department

More information

Product measure and Fubini s theorem

Product measure and Fubini s theorem Chapter 7 Product measure and Fubini s theorem This is based on [Billingsley, Section 18]. 1. Product spaces Suppose (Ω 1, F 1 ) and (Ω 2, F 2 ) are two probability spaces. In a product space Ω = Ω 1 Ω

More information

GAUSSIAN PROCESSES; KOLMOGOROV-CHENTSOV THEOREM

GAUSSIAN PROCESSES; KOLMOGOROV-CHENTSOV THEOREM GAUSSIAN PROCESSES; KOLMOGOROV-CHENTSOV THEOREM STEVEN P. LALLEY 1. GAUSSIAN PROCESSES: DEFINITIONS AND EXAMPLES Definition 1.1. A standard (one-dimensional) Wiener process (also called Brownian motion)

More information

Random Process Lecture 1. Fundamentals of Probability

Random Process Lecture 1. Fundamentals of Probability Random Process Lecture 1. Fundamentals of Probability Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2016 1/43 Outline 2/43 1 Syllabus

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

The main results about probability measures are the following two facts:

The main results about probability measures are the following two facts: Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a

More information

Lecture 4: Stochastic Processes (I)

Lecture 4: Stochastic Processes (I) Miranda Holmes-Cerfon Applied Stochastic Analysis, Spring 215 Lecture 4: Stochastic Processes (I) Readings Recommended: Pavliotis [214], sections 1.1, 1.2 Grimmett and Stirzaker [21], section 9.4 (spectral

More information

Stat 5101 Lecture Slides: Deck 7 Asymptotics, also called Large Sample Theory. Charles J. Geyer School of Statistics University of Minnesota

Stat 5101 Lecture Slides: Deck 7 Asymptotics, also called Large Sample Theory. Charles J. Geyer School of Statistics University of Minnesota Stat 5101 Lecture Slides: Deck 7 Asymptotics, also called Large Sample Theory Charles J. Geyer School of Statistics University of Minnesota 1 Asymptotic Approximation The last big subject in probability

More information

Sums of exponentials of random walks

Sums of exponentials of random walks Sums of exponentials of random walks Robert de Jong Ohio State University August 27, 2009 Abstract This paper shows that the sum of the exponential of an oscillating random walk converges in distribution,

More information

Stochastic Models (Lecture #4)

Stochastic Models (Lecture #4) Stochastic Models (Lecture #4) Thomas Verdebout Université libre de Bruxelles (ULB) Today Today, our goal will be to discuss limits of sequences of rv, and to study famous limiting results. Convergence

More information

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1 Random Walks and Brownian Motion Tel Aviv University Spring 011 Lecture date: May 0, 011 Lecture 9 Instructor: Ron Peled Scribe: Jonathan Hermon In today s lecture we present the Brownian motion (BM).

More information

Adaptive Kernel Estimation of The Hazard Rate Function

Adaptive Kernel Estimation of The Hazard Rate Function Adaptive Kernel Estimation of The Hazard Rate Function Raid Salha Department of Mathematics, Islamic University of Gaza, Palestine, e-mail: rbsalha@mail.iugaza.edu Abstract In this paper, we generalized

More information

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part : Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section

More information

Refining the Central Limit Theorem Approximation via Extreme Value Theory

Refining the Central Limit Theorem Approximation via Extreme Value Theory Refining the Central Limit Theorem Approximation via Extreme Value Theory Ulrich K. Müller Economics Department Princeton University February 2018 Abstract We suggest approximating the distribution of

More information

Asymptotic Inference for High Dimensional Data

Asymptotic Inference for High Dimensional Data Asymptotic Inference for High Dimensional Data Jim Kuelbs Department of Mathematics University of Wisconsin Madison, WI 53706-1388 Anand N. Vidyashankar Department of Statistical Science Cornell University

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

Almost Sure Convergence of the General Jamison Weighted Sum of B-Valued Random Variables

Almost Sure Convergence of the General Jamison Weighted Sum of B-Valued Random Variables Acta Mathematica Sinica, English Series Feb., 24, Vol.2, No., pp. 8 92 Almost Sure Convergence of the General Jamison Weighted Sum of B-Valued Random Variables Chun SU Tie Jun TONG Department of Statistics

More information