Asymptotic distributions of some scale estimators in nonlinear models with long memory errors having infinite variance¹

by Hira L. Koul and Donatas Surgailis
Michigan State University and Vilnius University

Abstract

To obtain scale invariant M estimators of the regression parameters in a regression model, one needs a robust scale invariant estimator of a scale parameter. Two such estimators are the median of the absolute residuals, $s_1$, and the median of the absolute differences of pairwise residuals, $s_2$. The asymptotic distributions of these estimators in regression models with finite error variance are known when the errors are either i.i.d. or form a long memory stationary process. Since M estimators are robust against heavy tail error distributions, it is natural to ask whether these scale estimators remain consistent under heavy tail error distributions. This paper derives their limiting distributions when the errors form a linear long memory stationary process with $\alpha$-stable ($1<\alpha<2$) innovations and moving average coefficients decaying as $j^{d-1}$, $0<d<1-1/\alpha$. We prove that $s_2$ has an $\alpha^*$-stable limit distribution with $\alpha^*=\alpha(1-d)<\alpha$, while the convergence rate of $s_1$ is generally worse than that of $s_2$. The proof is based on a second order asymptotic expansion of the empirical process of the stated infinite variance stationary sequence, derived in this paper.

1 Introduction and Summary

Let $p\ge q\ge 1$ be fixed integers, $n\ge p$ be an integer, and let $\Omega$ be an open subset of the $p$-dimensional Euclidean space $\mathbb{R}^p$, $\mathbb{R}=\mathbb{R}^1$. Let $\{z_{ni},\,i=1,2,\dots,n\}$ be arrays of known constants and let $g$ be a known real valued function on $\Omega\times\mathbb{R}^q$. In the non-linear regression model of interest here, one observes an array of random variables $\{X_{ni},\,i=1,2,\dots,n\}$ such that for some $\beta_0\in\Omega$,

(1.1)  $X_{ni}=g(\beta_0,z_{ni})+\varepsilon_i$,  $1\le i\le n$,

where the errors $\varepsilon_i$ are given by the moving average

(1.2)  $\varepsilon_i=\sum_{j\le i}b_{i-j}\zeta_j$,  $b_j\sim c_0\,j^{-(1-d)}\ (j\to\infty)$,  $d<1/2$,  $c_0>0$.

¹ AMS 1991 subject classification: Primary 62G05; Secondary 62G35. Key words and phrases: median, absolute residuals, pairwise absolute differences.

When $\zeta_j$, $j\in\mathbb{Z}:=\{0,\pm 1,\dots\}$, are i.i.d. with zero mean and finite variance, $\varepsilon_i$, $i\in\mathbb{Z}$, is a well-defined strictly stationary process for all $d<1/2$, and it has long memory in the sense that the sum of its lagged auto-covariances diverges for all $0<d<1/2$.

For convenience, from now on we shall write $g_{ni}(\beta)$ for $g(\beta,z_{ni})$ and $g_{ni}$ for $g_{ni}(\beta_0)$. Write $\mathrm{med}\{x_i;\,1\le i\le m\}$ for the median of a given set of real numbers $\{x_i,\,1\le i\le m\}$. Let $\hat\beta$ be an estimator of $\beta_0$, $r_{n,i}\equiv X_{ni}-g_{ni}(\hat\beta)$, and define the scale estimators

(1.3)  $s_1:=\mathrm{med}\{|r_{n,i}|;\,1\le i\le n\}$,  $s_2:=\mathrm{med}\{|r_{n,i}-r_{n,j}|;\,1\le i<j\le n\}$.

Observe that $s_1$ estimates the median $\sigma_1$ of the distribution of $|\varepsilon_1|$ while $s_2$ estimates the median $\sigma_2$ of the distribution of $|\varepsilon_1-\varepsilon_1'|$, where $\varepsilon_1'$ is an independent copy of $\varepsilon_1$. The fact that each of these estimators estimates a different scale parameter is not a point of concern if our goal is only to use them in arriving at scale invariant robust estimators of $\beta_0$.

A class of M estimators of $\beta_0$ is a priori robust against heavy tail error distributions, cf. Huber (1981). In order to make these estimators scale invariant, it is desirable to use $s_1$ or $s_2$ in their derivation. Koul (2002) derived the asymptotic distributions of these two estimators when the errors are i.i.d. or form a stationary long memory moving average process with i.i.d. zero mean finite variance innovations. In the latter case it was observed that $n^{1/2-d}(s_1-\sigma_1)=o_p(1)$ when the marginal error distribution function (d.f.) is symmetric around zero. The same fact remains true for $s_2$ without the symmetry assumption. The claim made in that paper that the rate of consistency of $s_1$ for the median $\sigma_1$ is faster than that of $s_2$ for $\sigma_2$ is erroneous, since the factor $\int[f(\sigma_2+x)-f(-\sigma_2+x)]f(x)\,dx$ that appears on page 9 of that paper in the analysis of $K_n(\sigma_2)$ vanishes for any square integrable $f$.

In this paper we prove that under general (nonsymmetric) long memory moving average errors in (1.2) with $\alpha$-stable innovations, $1<\alpha<2$ and $0<d<1-1/\alpha$, the convergence rate of $s_2$ is better than that of $s_1$, in the sense that $n^{1-1/\alpha-d}(s_1-\sigma_1)=O_p(1)$ while $n^{1-1/\alpha^*}(s_2-\sigma_2)$ has an $\alpha^*$-stable limit distribution, see Theorem 2.2, where

(1.4)  $\alpha^*:=\alpha(1-d)$  and  $1-1/\alpha-d<1-1/\alpha^*$.

In the case of symmetric errors, $s_1$ also has an $\alpha^*$-stable limit distribution and the convergence rates of $s_1$ and $s_2$ are the same; see Theorem 2.1(ii). In view of these facts, one may prefer to use $s_2$ over $s_1$ in the computation of a scale invariant M estimator of $\beta_0$.
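For concreteness, the following minimal Python sketch (an illustration of (1.3) only, not part of the formal development; the function and variable names are ours) computes $s_1$ and $s_2$ from a given vector of residuals.

import numpy as np

def scale_estimates(residuals):
    """Return (s1, s2) of (1.3): the median of the absolute residuals and
    the median of the absolute pairwise differences of the residuals."""
    r = np.asarray(residuals, dtype=float)
    s1 = np.median(np.abs(r))
    i, j = np.triu_indices(len(r), k=1)      # all index pairs i < j
    s2 = np.median(np.abs(r[i] - r[j]))
    return s1, s2

# toy usage with an arbitrary heavy-tailed residual vector
rng = np.random.default_rng(0)
s1, s2 = scale_estimates(rng.standard_t(df=3, size=200))
print(s1, s2)

Note that the pairwise construction of $s_2$ costs $O(n^2)$ time and memory, which is immaterial for this sketch but worth keeping in mind for large samples.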

Proceeding a bit more precisely, we assume that the i.i.d. innovations $\zeta_j$, $j\in\mathbb{Z}$, of (1.2) belong to the domain of attraction of an $\alpha$-stable law, $1<\alpha<2$, and that the d.f. $G$ of $\zeta_0$ has zero mean and satisfies the tail regularity condition

(1.5)  $\lim_{x\to\infty}x^{\alpha}G(-x)=c_-$,  $\lim_{x\to\infty}x^{\alpha}(1-G(x))=c_+$,

for some $1<\alpha<2$ and some constants $0\le c_\pm<\infty$, $c_++c_->0$. From Ibragimov and Linnik (1971, Thm. 2.6.1) it follows that the above assumptions in particular imply that

(1.6)  $n^{-1/\alpha}\sum_{j=1}^{n}\zeta_j\to_D Z$,

where $Z$ is an $\alpha$-stable r.v. with characteristic function

(1.7)  $Ee^{iuZ}=e^{-|u|^{\alpha}\omega(\alpha,u)}$,  $u\in\mathbb{R}$,  $i:=\sqrt{-1}$,
       $\omega(\alpha,u):=\Gamma(2-\alpha)(c_++c_-)\,\dfrac{|\cos(\pi\alpha/2)|}{\alpha-1}\,\Big(1-i\,\dfrac{c_+-c_-}{c_++c_-}\,\mathrm{sgn}(u)\tan(\pi\alpha/2)\Big)$.

In addition, the weights $b_j$, $j\ge 0$, satisfy the asymptotics in (1.2) for

(1.8)  $0<d<1-1/\alpha$.

It follows, say from Hult and Samorodnitsky (2008, Remark 3.3), that under these assumptions,

(1.9)  $\sum_{j=0}^{\infty}|b_j|=\infty$,  $\sum_{j=0}^{\infty}|b_j|^{\alpha}<\infty$,

the linear process $\varepsilon_i$ of (1.2) is well defined in the sense of convergence in probability, and its marginal d.f. $F$ satisfies

(1.10)  $\lim_{x\to\infty}x^{\alpha}F(-x)=B_-$,  $\lim_{x\to\infty}x^{\alpha}(1-F(x))=B_+$,

where
$B_-:=\sum_{j=0}^{\infty}\big((b_j)_+^{\alpha}c_-+(b_j)_-^{\alpha}c_+\big)$,  $B_+:=\sum_{j=0}^{\infty}\big((b_j)_+^{\alpha}c_++(b_j)_-^{\alpha}c_-\big)$,  $(b_j)_\pm:=\max(0,\pm b_j)$.

Note that (1.10) implies $E|\varepsilon_0|^{\alpha}=\infty$ and $E|\varepsilon_0|^{r}<\infty$ for every $r<\alpha$; in particular, $E\varepsilon_0^2=\infty$ and $E\varepsilon_0=0$. Because of these facts and (1.9), this process will be called a long memory moving average process with infinite variance. In the sequel, we refer to the assumptions in (1.2), (1.5), and (1.8) as the standard assumptions about the errors under consideration.

We also note that under (1.5), the upper bound $d<1-1/\alpha$ in (1.8) is necessary for the convergence of the series in (1.2) almost surely and in probability; see, e.g., Samorodnitsky and Taqqu (1994, Ex. 1.26). Since the interval of admissible $d$ in (1.8) shrinks as $\alpha$ decreases, thicker tails imply that only a smaller degree of memory can be accommodated. The class of moving averages satisfying these assumptions includes ARFIMA$(p,d,q)$ processes with $\alpha$-stable innovations, where $d$ satisfies (1.8). See Kokoszka and Taqqu (1995) for a detailed discussion of the properties of stable ARFIMA series.
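To fix ideas, here is a minimal simulation sketch of the error process (1.2) under the standard assumptions (our own illustration; the truncation lag m, the concrete weights $b_j=c_0(1+j)^{-(1-d)}$ and all names are ours). It uses scipy's levy_stable generator for the innovations; with loc=0 and $\alpha>1$ the innovations have zero mean in scipy's default S1 parameterization, and any skewness beta in [-1, 1] is compatible with (1.5).

import numpy as np
from scipy.stats import levy_stable

def simulate_errors(n, alpha=1.5, d=0.2, c0=1.0, beta=1.0, m=5000, seed=0):
    """Approximate eps_1,...,eps_n of (1.2) with b_j = c0*(1+j)^{-(1-d)},
    truncating the infinite moving average at m lags."""
    assert 1 < alpha < 2 and 0 < d < 1 - 1 / alpha   # condition (1.8)
    rng = np.random.default_rng(seed)
    zeta = levy_stable.rvs(alpha, beta, loc=0.0, scale=1.0,
                           size=n + m, random_state=rng)
    b = c0 * (1.0 + np.arange(m)) ** (-(1.0 - d))    # weights b_0, ..., b_{m-1}
    # eps[t] = sum_{j=0}^{m-1} b[j] * zeta[t-j]; keep full-overlap part only
    return np.convolve(zeta, b, mode="valid")[:n]

eps = simulate_errors(n=1000)
print(eps[:5])

Such a series can then be plugged into (1.1) together with any known regression function $g$ to produce the observations $X_{ni}$. The assert above reflects (1.8): the heavier the tails (smaller $\alpha$), the smaller the admissible memory parameter $d$.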

Section 2 describes the asymptotic distributions of the standardized $s_1$, $s_2$ along with the needed assumptions (Theorems 2.1 and 2.2), with the proofs appearing in Section 4. Section 3 discusses first and second order asymptotic expansions of the residual empirical process. Section 5 (Appendix A) contains the proofs of the asymptotic expansions of Section 3, while Section 6 (Appendix B) contains the proofs of the two auxiliary results needed in Section 5.

2 Asymptotic distributions

This section describes the asymptotic distributions of suitably standardized $s_1$ and $s_2$ under the above set up. Let

(2.1)  $a:=1-d-1/\alpha$.

Assume there exists a $p$-vector of functions $\dot g$ on $\mathbb{R}^p\times\mathbb{R}^q$ such that the following holds, where $\dot g_{ni}(s)\equiv\dot g(s,z_{ni})$. For every $s\in\mathbb{R}^p$ and for every $0<k<\infty$,

(2.2)  $\sup_{1\le i\le n,\,\|u\|\le k}\,n^{a}\big|g_{ni}(s+n^{-a}u)-g_{ni}(s)-n^{-a}u'\dot g_{ni}(s)\big|=o(1)$.

In addition, assume that

(2.3)  $\max_{1\le i\le n}\|\dot g_{ni}(\beta_0)\|=O_p(1)$.

About the d.f. $G$ we assume that its third derivative $G^{(3)}$ exists and satisfies

(2.4)  $|G^{(3)}(x)|\le C(1+|x|)^{-\alpha}$,  $x\in\mathbb{R}$,

(2.5)  $|G^{(3)}(x)-G^{(3)}(y)|\le C|x-y|(1+|x|)^{-\alpha}$,  $x,y\in\mathbb{R}$, $|x-y|<1$.

These conditions are satisfied if $G$ is an $\alpha$-stable d.f., which follows from the asymptotic expansion of the stable density, see e.g. Christoph and Wolf (1992, Th. 1.5) or Ibragimov and Linnik (1971, (2.4.25) and the remark at the end of Ch. 2, §4); in that case, (2.4)-(2.5) hold with $\alpha+2$ in place of $\alpha$. Conditions (2.4)-(2.5) imply the existence and smoothness of the marginal probability density $f(x)=dF(x)/dx$, see Lemma 5.2, (5.8) below. They are not vital for our results and can be relaxed by assuming a weak decay condition at infinity on the characteristic function of $G$ as in (GKS, (10.2.1)); however, they simplify some technical derivations below.

Before stating our results we need to recall the following fact. Let $\bar\varepsilon_n:=n^{-1}\sum_{i=1}^{n}\varepsilon_i$. Astrauskas (1983), Avram and Taqqu (1986, 1992), and Kasahara and Maejima (1988) describe the limiting distribution of $\bar\varepsilon_n$ under the standard conditions as follows. Let

(2.6)  $c^{\dagger}=c_0\Big(\int_{-\infty}^{1}\Big(\int_{0}^{1}(t-s)_+^{-(1-d)}\,dt\Big)^{\alpha}ds\Big)^{1/\alpha}$.

Then, with $Z$ as in (1.7),

(2.7)  $n^{1-d-1/\alpha}\bar\varepsilon_n=n^{-d-1/\alpha}\sum_{i=1}^{n}\varepsilon_i\to_D c^{\dagger}Z$.

Let $\gamma_\pm(x):=f(x)\pm f(-x)$, $x\ge 0$. We are now ready to state the following two theorems.

Theorem 2.1 Suppose the regression model (1.1), (1.2) holds with $g$ satisfying (2.2) and (2.3), and with the innovation d.f. $G$ satisfying (2.4) and (2.5). In addition, suppose $\hat\beta$ is an estimator of $\beta_0$ such that

(2.8)  $n^{1-d-1/\alpha}\|\hat\beta-\beta_0\|=O_p(1)$.

(i) If, in addition, $f(\sigma_1)\ne f(-\sigma_1)$, then for every $x\in\mathbb{R}$,

(2.9)  $P\big(n^{1-d-1/\alpha}(s_1-\sigma_1)\le x\sigma_1\big)=P\Big(n^{1-d-1/\alpha}\Big(-\bar\varepsilon_n+\Big(n^{-1}\sum_{i=1}^{n}\dot g_{ni}\Big)'(\hat\beta-\beta_0)\Big)\gamma_-(\sigma_1)\ge -x\sigma_1\gamma_+(\sigma_1)\Big)+o(1)$.

(ii) If, in addition, $f(\sigma_1)=f(-\sigma_1)$, then for every $x\in\mathbb{R}$,

(2.10)  $P\big(n^{1-1/\alpha^*}(s_1-\sigma_1)\le x\sigma_1\big)\to P\big(Z_1^*\ge -x\sigma_1\gamma_+(\sigma_1)\big)$,

where $Z_1^*:=Z^*(\sigma_1)-Z^*(-\sigma_1)$ and $Z^*(x)$, $x\in\mathbb{R}$, is the $\alpha^*$-stable process defined in (3.12) below.

Theorem 2.2 Suppose the regression model (1.1), (1.2) and the estimator $\hat\beta$ satisfy the same assumptions as in Theorem 2.1. Then

$P\big(n^{1-1/\alpha^*}(s_2-\sigma_2)\le x\sigma_2\big)\to P\big(Z_2^*\le x\big)$,  $x\in\mathbb{R}$,

where $Z_2^*$ is an $\alpha^*$-stable r.v. defined in (4.28) below.

Remark 2.1 Koul and Surgailis (2001) verify (2.8) for a class of M-estimators when $g(\beta,z)=\beta'z$ with the errors in (1.1) satisfying the above standard conditions. Using arguments similar to those appearing in Koul (1996, 1996a), (2.8) can be shown to hold for a class of M-estimators of $\beta_0$ in the general regression model (1.1) with the errors satisfying the above standard conditions, under some additional smoothness conditions on $g$.
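As a quick numerical illustration of the normalizations in Theorems 2.1 and 2.2 (our own check, not part of the paper), the snippet below evaluates $a=1-d-1/\alpha$, $\alpha^*=\alpha(1-d)$ and $a^*=1-1/\alpha^*$ for a few $(\alpha,d)$ pairs satisfying (1.8) and confirms the ordering $a<a^*<2a$ recorded in (4.2) below; in particular, $s_1$ in case (i) converges at the slower rate $n^{-a}$ while $s_2$ converges at the rate $n^{-a^*}$.

for alpha, d in [(1.2, 0.1), (1.5, 0.2), (1.8, 0.3)]:
    assert 0 < d < 1 - 1 / alpha           # condition (1.8)
    a = 1 - d - 1 / alpha                   # rate exponent for s1, case (i)
    alpha_star = alpha * (1 - d)            # alpha* of (1.4)
    a_star = 1 - 1 / alpha_star             # rate exponent for s2
    assert a < a_star < 2 * a               # the inequalities (4.2)
    print(f"alpha={alpha}, d={d}: a={a:.3f}, alpha*={alpha_star:.3f}, a*={a_star:.3f}")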

3 Asymptotic expansions of the residual EP

The proofs of Theorems 2.1 and 2.2 use asymptotic expansions of certain residual empirical processes, which we shall describe in this section. Accordingly, introduce the process

(3.1)  $V_n^{(\xi)}(x):=n^{-1}\sum_{i=1}^{n}\big(I(\varepsilon_i\le x+\xi_{ni})-F(x+\xi_{ni})+f(x+\xi_{ni})\varepsilon_i\big)$,  $x\in\mathbb{R}$,

where $(\xi)\equiv(\xi_{ni};\,1\le i\le n)$ are non-random real-valued arrays. Write $V_n^{(0)}$ for $V_n^{(\xi)}$ when all $\xi_{ni}\equiv 0$. Let $\beta_0$ be as in (1.1), and define

(3.2)  $F_n(x,\beta):=n^{-1}\sum_{i=1}^{n}I(X_{ni}\le x+g_{ni}(\beta))=n^{-1}\sum_{i=1}^{n}I(\varepsilon_i\le x+d_{ni}(\beta))$,  $x\in\mathbb{R}$,
       $d_{ni}(\beta):=g_{ni}(\beta)-g_{ni}(\beta_0)$,  $1\le i\le n$,  $\beta\in\mathbb{R}^p$.

Note that under (1.1), $F_n(x,\beta_0)$ is equivalent to the first term in $V_n^{(0)}(x)$. Consider the assumptions

(3.3)  $n^{-d-1/\alpha}\max_{1\le i\le n}|\xi_{ni}|=O(1)$,

(3.4)  $\max_{1\le i\le n}|\xi_{ni}|=o(1)$,

(3.5)  $n^{-d-1/\alpha}\sum_{i=1}^{n}|\xi_{ni}|=O(1)$.

Let $\Rightarrow_{D(\bar{\mathbb{R}})}$ denote the weak convergence of stochastic processes in the space $D(\bar{\mathbb{R}})$ with the sup-topology, where $\bar{\mathbb{R}}:=[-\infty,\infty]$.

Theorem 3.1 Suppose the standard assumptions and conditions (2.4) and (2.5) hold. Then the following hold.

(a) If, in addition, (3.3) holds, then there exists $\kappa>0$ such that, for any $\epsilon>0$,

(3.6)  $P\Big[\sup_{x\in\mathbb{R}}n^{1-d-1/\alpha}\big|V_n^{(\xi)}(x)\big|>\epsilon\Big]\le Cn^{-\kappa}$.

In particular, under (1.1), for any $\epsilon>0$,

(3.7)  $P\Big[\sup_{x\in\mathbb{R}}n^{1-d-1/\alpha}\big|F_n(x,\beta_0)-F(x)+f(x)\bar\varepsilon_n\big|>\epsilon\Big]\le Cn^{-\kappa}$.

Moreover, with $Z$ as in (1.7) and $c^{\dagger}$ as in (2.6),

(3.8)  $n^{1-d-1/\alpha}\big(F_n(x,\beta_0)-F(x)\big)\Rightarrow_{D(\bar{\mathbb{R}})}-c^{\dagger}f(x)Z$.

(b) If, in addition, (3.4) and (3.5) hold, then

(3.9)  $\sup_{x\in\mathbb{R}}\Big|n^{-d-1/\alpha}\sum_{i=1}^{n}\big\{I(\varepsilon_i\le x+\xi_{ni})-I(\varepsilon_i\le x)-\xi_{ni}f(x)\big\}\Big|=o_p(1)$.

In the case of finite variance, the fact (3.6) for $V_n^{(0)}$, known as the uniform reduction principle (URP), was derived in Dehling and Taqqu (1989), Ho and Hsing (1996) and Giraitis, Koul and Surgailis (1996). For more on this see Giraitis, Koul and Surgailis (2012) (GKS) and the references therein.

While the first term of the expansion (3.7) is the degenerate process $-f(x)\bar\varepsilon_n$, which has a Gaussian asymptotic distribution, the higher order terms of order $k$, $2\le k<[1/(1-2d)]$, have complicated limits expressed in terms of multiple Itô-Wiener integrals. Similar asymptotic expansions for $V_n^{(\xi)}$ were derived in Koul and Surgailis (2002). In the case of infinite variance, the URP (3.6) was established in Koul and Surgailis (2001) (KS) with the innovations in (1.2) having symmetric $\alpha$-stable distributions. In the present paper, this result is extended to asymmetric innovations.

The above approximations extend to various statistical functionals, see GKS and KS. However, when the first order approximation vanishes, as in the case of $s_2$, a second order approximation of the residual EP by a degenerate $\alpha^*$-stable process, with $\alpha^*$ as in (1.4), can be used to obtain the limiting distribution of a suitably standardized $s_2$, as is seen later in this paper. In order to describe this approximation we need some more notation. Let $Z_+^*$, $Z_-^*$ denote independent copies of a totally skewed $\alpha^*$-stable r.v. $Z^*$ with characteristic function

(3.10)  $Ee^{iuZ^*}=e^{-c^*|u|^{\alpha^*}(1-i\,\mathrm{sgn}(u)\tan(\pi\alpha^*/2))}$,  $u\in\mathbb{R}$,  $c^*:=\Gamma(2-\alpha^*)\cos(\pi\alpha^*/2)/(1-\alpha^*)$.

The above choice of parameters implies $P(Z^*>x)\sim x^{-\alpha^*}$ and $P(Z^*<-x)=o(x^{-\alpha^*})$ as $x\to\infty$; see Ibragimov and Linnik (1971, Ch. 2). Also denote

(3.11)  $\psi_\pm^*(x):=\dfrac{c_0^{1/(1-d)}}{1-d}\int_{0}^{\infty}\big(F(x\mp s)-F(x)\pm f(x)s\big)\,s^{-1-\frac{1}{1-d}}\,ds$,

(3.12)  $Z^*(x):=c_+^{1/\alpha^*}\psi_+^*(x)Z_+^*+c_-^{1/\alpha^*}\psi_-^*(x)Z_-^*$,  $x\in\mathbb{R}$.

Theorem 3.2 (a) Suppose the standard assumptions, conditions (2.4), (2.5) and

(3.13)  $\max_{1\le i\le n}|\xi_{ni}|=O(n^{-\epsilon})$,  for some $\epsilon>0$,

hold. Then

(3.14)  $n^{1-1/\alpha^*}V_n^{(\xi)}(x)\Rightarrow_{D(\bar{\mathbb{R}})}Z^*(x)$.

In particular, under (1.1),

(3.15)  $n^{1-1/\alpha^*}\big(F_n(x,\beta_0)-F(x)+f(x)\bar\varepsilon_n\big)\Rightarrow_{D(\bar{\mathbb{R}})}Z^*(x)$.

(b) If, in addition,

(3.16)  $\max_{1\le i\le n}|\xi_{ni}|=O(n^{d+1/\alpha-1})$

holds, then

(3.17)  $\sup_{x\in\mathbb{R}}\Big|n^{-1/\alpha^*}\sum_{i=1}^{n}\big\{I(\varepsilon_i\le x+\xi_{ni})-I(\varepsilon_i\le x)-\xi_{ni}f(x)\big\}\Big|=o_p(1)$.
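For readers who wish to simulate the limit laws, the following sketch (ours; it assumes scipy's default S1 parameterization, whose characteristic function for a stability index different from 1 has the form appearing in (3.10)) draws samples of the totally skewed $\alpha^*$-stable variable $Z^*$ by matching (3.10) with scale $(c^*)^{1/\alpha^*}$, skewness 1 and location 0.

import numpy as np
from scipy.stats import levy_stable
from scipy.special import gamma

def sample_Z_star(alpha, d, size, seed=0):
    """Draw from the law of Z* in (3.10) for given (alpha, d)."""
    a_star = alpha * (1 - d)                              # alpha* of (1.4)
    c_star = gamma(2 - a_star) * np.cos(np.pi * a_star / 2) / (1 - a_star)
    rng = np.random.default_rng(seed)
    return levy_stable.rvs(a_star, 1.0, loc=0.0,
                           scale=c_star ** (1 / a_star),
                           size=size, random_state=rng)

print(sample_Z_star(alpha=1.5, d=0.2, size=5))

Independent copies $Z_+^*$ and $Z_-^*$ obtained this way can then be combined as in (3.12), with the coefficients $c_\pm^{1/\alpha^*}\psi_\pm^*(x)$ supplied separately.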

The proofs of Theorems 3.1 and 3.2 are given in Section 5. We remark that from the regularity properties of $F$ (see Lemma 5.2, (5.8)) it follows that the functions $\psi_\pm^*$ in (3.11) are well-defined and continuously differentiable on $\mathbb{R}$; moreover, $|\psi_\pm^*(x)|<C(1+|x|)^{-\gamma}$ for some $\gamma>1$, and hence $\lim_{|x|\to\infty}\psi_\pm^*(x)=0$. We also note that $\psi_\pm^*(x)$ agree, up to a multiplicative factor, with the Marchaud (left and right) fractional derivatives of $F(x)$ of order $1/(1-d)\in(1,2)$.

Remark 3.1 The $\alpha^*$-stable limit in (3.15) is related to the limit results for nonlinear functions of moving averages in Surgailis (2002, 2004), and also to the limits of power variations of certain continuous-time Lévy driven processes discussed in Basse-O'Connor, Lachièze-Rey and Podolskij (2016).

4 Proofs of Theorems 2.1 and 2.2

This section contains the proofs of Theorems 2.1 and 2.2. Before proceeding further we need to introduce some additional notation. With $\alpha^*$ as in (1.4), let

(4.1)  $a^*=1-\dfrac{1}{\alpha^*}$.

Note that
$a^*=1-\dfrac{1}{\alpha(1-d)}=\dfrac{\alpha(1-d)-1}{\alpha(1-d)}$,  $a=1-d-\dfrac{1}{\alpha}=\dfrac{\alpha(1-d)-1}{\alpha}=a^*(1-d)$.

Also, $0<d<1-1/\alpha$ and $\alpha<2$ imply $0<(2-\alpha)/\alpha<1-2d<1$. Therefore $a^*-2a=a^*-2a^*(1-d)=-a^*(1-2d)<0$. We also have $a<a^*$. For easy reference we summarize these facts as the following inequalities:

(4.2)  $a<a^*<2a$,  $0<d<1-1/\alpha$.

Proof of Theorem 2.1. Let $p_1(y):=F(y)-F(-y)$, $y\ge 0$. Define $\sigma_1$ as the unique solution of $p_1(\sigma_1)=1/2$. Because $\sigma_1$ is the median of the distribution of $|\varepsilon_1|$, the derivation of the asymptotic distribution of $s_1$ is facilitated by analysing the process

(4.3)  $S(y):=\sum_{i=1}^{n}I(|r_i|\le y)$,  $y\ge 0$,

where from now on we write $r_i$ for $r_{n,i}$, for all $1\le i\le n$. The investigation of the asymptotic behavior of this process in turn is facilitated by that of $F_n(x,\beta)$ of (3.2). Let
$\mu_n(x,\beta):=n^{-1}\sum_{i=1}^{n}F(x+d_{ni}(\beta))$,  $x\in\mathbb{R}$, $\beta\in\mathbb{R}^p$.

Rewrite r i ε i d ni ( β ), and note that because of the continuity of F, n 1 S(y) = F n (y, β ) F n ( y, β ), n 1, y 0, with probability (w.p.) 1. We use the following decomposition: Let D n (x, β) := { } I(ε i x + d ni (β)) I(ε i x) f(x)d ni (β), x R, β R p. Then, w.p.1, for all y 0, we have (4.4) S(y) np 1 (y) = n[v n (0) (y) V n (0) ( y)] (f(y) f( y))n ε n +D n (y, β) D n ( y, β) + (f(y) f( y)) d ni ( β). In both cases (i) and (ii) of Theorem 2.1, the behavior of all terms on the r.h.s. of (4.4) except for D n (y, β) can be rather easily obtained from Theorem 3.1, Theorem 3.2 and the conditions in (2.2), (2.3) on the regression model. We shall prove that D n (±y, β) is asymptotically negligible, uniformly in y 0, in probability. Fix a 0 < k < and let t R p be such that t k. Let (4.5) ξ ni := g ni (β 0 + n a t) g ni (β 0 ). Recall (2.1) and write ξ ni = (g ni (β 0 +n a t) g ni (β 0 ) n a t ġ ni )+n a t ġ ni, where ġ ni ġ ni (β 0 ). By (2.2) and (2.3), for any ϵ > 0, N s.t. for all n > N, (4.6) sup ξ ni ϵn a + k n a max ġ ni = O(n a ). 1 i n, t k 1 i n This then verifies the condition (3.16) for the ξ ni of (4.5), for each t. From (3.17) we obtain n 1/α D n (x, β 0 + n a t) = o p (1), for all x R, t k. This in turn together with the monotonicity of F n, F and the compactness of the set {t R p ; t k} and an argument as in Koul (2002a, ch. 8) now yields that for every 0 < k <, (4.7) n 1/α sup D n (x, β 0 + n a t) = o p (1) t k,x R implying by assumption (2.8) on β that (4.8) sup D n (x, β) = o p (n 1/α ). x R 9

By (4.2), a < a, which is equivalent to 1/α < d + 1/α = (a 1). Thus (4.8) implies (4.9) sup x R n a 1 D n (x, β) = n 1/α +a 1 sup n 1/α D n (x, β) = o p (1). x R Proof of (2.9) and (2.10). From the definition (4.3), we obtain that for any y > 0, (4.10) {s 1 y} = {S(y) (n + 1)2 1 }, n odd, {S(y) n2 1 } {s 1 y} {S(y) n2 1 1}, n even. Thus, to study the asymptotic distribution of s 1, it suffices to study those of S(y), y 0. Case (i): f(σ 1 ) f( σ 1 ). Let P n (x) denote the l.h.s. of (2.9). Let t n := (n a x + 1)σ 1. Assume n is large enough so that t n > 0. Then ( ) P n (x) = P S(t n ) (n + 1)/2, n odd P (S(t n ) n/2) P n (x) P (S(t n ) n2 1 1), n even. It thus suffices to analyze P (S(t n ) n2 1 + b), b R. Let ] S 1 (y) := n [n a 1 S(y) p 1 (y), y 0, s n := n a [2 1 + n 1 b p 1 (t n )]. Then P (S(t n ) n2 1 + b) = P ( S 1 (t n ) s n ). Recall p1 (σ 1 ) = 1/2 and hence s n = n a [p 1 (t n ) p 1 (σ 1 )] + n d 1/α b = n a [p 1 ((n a x + 1)σ 1 ) p 1 (σ 1 )] + n d 1/α b = xσ 1 [f(σ 1 ) + f( σ 1 )] + o(1). Next, by (2.2), (3.7), (4.4), (4.9), and uniform continuity of f, proving (2.9). S 1 (t n ) = [f(t n ) f( t n )] (n a ε ) n + n a 1 d ni ( β) +n a [V n (0) (t n ) V n (0) ( t n )] + n a 1 [D n (t n, β) D n ( t n, β)] = [f(σ 1 ) f( σ 1 )] (n a ε n + n a ( ˆβ β 0 ) ( ) ) n 1 ġ ni + o p (1), Case (ii): f(σ 1 ) = f( σ 1 ). Let t n := (n a x + 1)σ 1, where a is as in (4.1). Similarly to case (i), it suffices to analyze P (S(t n) n2 1 + b), b R. Let ] S1(y) := n [n a 1 S(y) p 1 (y), y 0, s n := n a [2 1 + n 1 b p 1 (t n)]. 10

Then P (S(t n) n2 1 + b) = P ( S 1(t n) s n), where s n = xσ 1 [f(σ 1 ) + f( σ 1 )] + o(1). On the other hand, using f(t n) f( t n) = O(n a ) and (2.3), (3.15), (4.4), (4.8), we obtain ) S 1 (t n) = [f(t n) f( t n)] (n a ε n + n a 1 d ni ( β) +n a [V n (0) (t n) V n (0) ( t n)] + n a 1 [D n (t n, β) D n ( t n, β)] = Z (σ 1 ) Z ( σ 1 ) + o p (1), proving (2.10) and completing the proof of Theorem 2.1. Proof of Theorem 2.2. Let (4.11) p 2 (y) := [F (y + x) F ( y + x)]df (x), y 0. Define σ 2 as the unique solution of (4.12) p 2 (σ 2 ) = 1/2. As noted in the introduction, σ 2 is median of the distribution of ε 1 ε 1, where ε 1 is an independent copy of ε 1. Recall r i r n,i = ε i d ni ( β ). The derivation of the asymptotic distribution of s 2 is facilitated by analysing the process (4.13) T (y) := I ( r i r j y), y 0. 1 i<j n Because of the continuity of F, 2n 1 T (y) = n [F n (y + x, β) F n ( y + x, β)]f n (dx, β) 1, n 1, y 0, with probability 1. Let Q n (x) := P (n a (s 2 σ 2 ) xσ 2 ), x n := (xn a + 1)σ 2 > 0. From the definition of s 2 in (1.3), we obtain ( ) Q n (x) = P T (x n) (N + 1)/2, N odd P (T (x n) N/2) Q n (x) P (T (x n) N2 1 1), N even. It thus suffices to analyze P (T (x n) N/2 + b), N := n(n 1)/2, b R. Let Then 2T (y) T 1 (y) := n a [ + 1 n 2 n ] na p 2 (y), kn := (N + 2b)na n 2 P (T (x n) N/2 + b) = P (T 1 (x n) k n). + na n na p 2 (x n). 11

Asymptotics of k n. Note Nn a /n 2 n a /2 = n a p 2 (σ 2 ). Therefore k n = n a [p 2 (x n) p 2 (σ 2 )] + o(1). But [ ( ) ] n a [p 2 (x n) p 2 (σ 2 )] = n a p 2 (n a x + 1)σ 2 p 2 (σ 2 ) [{ = n a F ( (n a x + 1)σ 2 + z ) F ( σ 2 + z )} { F ( (n a x + 1)σ 2 ) + z ) F ( σ 2 + z )}] df (z). Because F has uniformly continuous and bounded density f, n a {F (±(n a x + 1)σ 2 ) + z) F (±σ 2 + z)}df (z) = ±xσ 2 f(±σ 2 + z)df (z) + o(1), and, hence, (4.14) k n = xσ 2 [f(σ 2 + z) + f( σ 2 + z))df (z) + o(1). Asymptotics of T 1 (x n). Let From the definition of T 1, T 1 (y) = n a W n(x, β) := n a [F n (x, β) µ n (x, β)], x R, β R p. = [F n (y + z, β ) F n ( y + z, β )]F n (dz, β ) n a p 2 (y) [Wn(y + z, β ) Wn( y + z, β )]F n (dz, β ) + [µ n (y + z, β ) µ n ( y + z, β )]Wn(dz, β ) + n a [µ n (y + z, β ) µ n ( y + z, β )]µ n (dz, β ) n a p 2 (y) = E 1 (y) + E 2 (y) + E 3 (y), say. Integration by parts and a change of variable formula shows that E 2 (y) E 1 (y). We shall now approximate E 1 and E 3. But, first note that (2.2), (2.3) and (2.8) imply that (4.15) Also note (4.16) max d ni( β ) = O p (n a ). 1 i n [f(y + z) f( y + z)]f(z)dz = 0, y 0. 12

Let n,ij d ni ( β ) d nj ( β ). Observe that by (2.2), (2.3), (4.17) n 1 d 1/α n,ij = O p (1), j=1 n,ij = 0, w.p. 1. j=1 The smoothness of F and (4.11) together with (4.16) imply that E 3 (y) := n a [µ n (y + z, β ) µ n ( y + z, β )]µ n (dz, β ) n a p 2 (y) = n a 2 = n a 2 { F (y + z + n,ij) F ( y + z + n,ij) j=1 j=1 f(z)dz n,ij 0 } F (y + z) + F ( y + z) df (z) (f(y + z + u) f(y + z))du. Using the inequalities f(x) + f (x) C(1 + x ) γ, x R for any 1 < γ < α, see (5.8), and (4.18) (1 + w + z ) γ dw C(1 + z ) γ (v v γ ), z R, v > 0, w <v for 1 < γ 2, from eqn. (10.2.33) in GKS which are often used in this paper, we obtain n,ij 0 n,ij γ n,ij 2 (f(y + z + u) f(y + z))du C C 2 (1 + y + z ) n,ij(1 + y + z ) γ. γ This in turn implies sup E 3 (y) Cn a 2 y>0 Cn a 2 according to (4.15) and (4.2). 2 n,ij sup (1 + z ) γ (1 + y + z ) γ dz y>0 2 n,ij = O p (n a 2a ) = o p (1), j=1 j=1 Next, let V n (z, β) := F n (z, β) F (z) and Kn(y, β) := [Wn(y + z, β) Wn( y + z, β)]df (z), U 1 (y, β) := [Wn(y + z, β) Wn( y + z, β)]dv n (z, β). Then E 1 (y) = K n(y, β) + U 1 (y, β). 13

Kn(x n) D From (3.15) and (4.16) we have for Kn(y) Kn(y, β 0 ) that (4.19) [Z (σ 2 + z) Z ( σ 2 + z)]df (z). We shall prove that (4.20) (4.21) sup Kn(y, β) Kn(y) = o p (1), y>0 sup U 1 (y, β) = o p (1). y>0 To prove (4.20) note that (4.22) Kn(y, β 0 + n a t) Kn(y; β 0 ) [ { = n 1/α I(εi y + z + ξ ni ) I(ε i y + z) ξ ni f(y + z) } { I(εi y + z + ξ ni ) I(ε i y + z) ξ ni f( y + z) }] df (z), where ξ ni are as (4.5). In view of (4.6) these ξ ni satisfy (3.16). Hence (4.16) implies that for any t R p, the left hand side of (4.22) tends to zero in probability, uniformly in y R. Whence, (4.20) follows using condition (2.8) and the compactness argument, similar to (4.7). Next, consider (4.21). We shall first prove (4.23) sup U 1 (y, β 0 ) = o p (1). y>0 We have U 1 (y, β 0 ) = n 1 1/α (F n (y + z) F (y + z) + f(y + z) ε n )dv n (z, β 0 ) n 1 1/α (F n ( y + z) F ( y + z) + f( y + z) ε n )dv n (z, β 0 ) +n 1 1/α ε n [f( y + z) f(y + z)]dv n (z, β 0 ) = I 1 (y) I 2 (y) + I 3 (y), say, where sup y>0 I i (y) = o p (1), i = 1, 2 according to (3.15), and sup y>0 I 3 (y) = O p (n a 2a ) = o p (1) follows from (2.7), (3.6), and (4.2). Thus, sup y>0 U 1 (y, β 0 ) = o p (1). Next, we shall prove that for every 0 < b <, (4.24) sup U 1 (y, β 0 + n a t) U 1 (y; β 0 ) = o p (1). y>0, t b 14

To this end, first fix a t b and let Ṽn(y, t) = V n (y, β 0 + n a t). Similarly as in (4.22) write U 1 (y, β 0 + n a t) U 1 (y; β 0 ) { n 1/α I(εi y + z + ξ ni ) I(ε i y + z) ξ ni f(y + z) } dṽn(z, t) + n 1/α { I(εi y + z + ξ ni ) I(ε i y + z) ξ ni f( y + z) } dṽn(z, t) + n 1/α ξ ni (f(y + z) f( y + z))dṽn(z, t) = J 1 (y, t) + J 2 (y, t) + J 3 (y, t), say, where sup y>0 J i (y, t) = o p (1), i = 1, 2 follow by (3.17) and the fact that the variation of z Ṽn(z, t) on R does not exceed 2, Ṽn(z, t) being the difference of two d.f.s. Relation sup y>0 J 3 (y, t) = O p (n a 2a ) = o p (1) follows from (2.7), (3.6) and (4.2) as above. Uniformity with respect to t R p, t b is obtained by using an argument as in Koul (2002, Ch. 8) and the monotonicity of the indicator function. From (4.20), (4.21), (4.23), (4.24) we conclude that for i = 1, (4.25) E i (x n) = K n(x n, β 0 ) + o p (1). Then from (4.19) we finally obtain (4.26) T 1 (x n) D 2 [Z (σ 2 + z) Z ( σ 2 + z)]df (z). Next, let h(z) := f(σ 2 + z) + f( σ 2 + z), z R, and Z2 1 (4.27) := [Z (σ 2 + z) Z ( σ 2 + z)]df (z). 2σ 2 h(z)df (z) The facts together with (4.14) and (4.26) in turn imply that P (n a (s 2 σ 2 ) xσ 2 ) = P (T 1 (x n) kn) P ( 2 [Z (σ 2 + z) Z ( σ 2 + z)]df (z) xσ 2 (f(σ 2 + z) + f( σ 2 + z))df (z) ) = P (Z 2 x). Let Z± be as in (3.10), ψ±(x) be as in (3.11), and define c 1/α ± ϖ ± := [ψ 2σ 2 h(z)df (z) ±(σ 2 + z) ψ±( σ 2 + z)]df (z). Then, in view of the definition of Z (x) in (3.12), the r.v. Z 2 in (4.27) can be represented as (4.28) Z 2 = ϖ + Z + + ϖ Z. Clearly, Z 2 in (4.28) has α -stable distribution. This completes the proof of Theorem 2.2. 15

5 Appendix A: Proofs of Theorems 3.1 and 3.2 We use several lemmas whose proofs are given in Section 6 (Appendix B). Lemma 5.1 Let η i, i 1 be independent r.v. s satisfying Eη i = 0 for all i and such that for some 1 < α < 2, (5.1) K := sup i 1 sup x α P ( η i > x) <. x>0 Then a iη i converges in L p, p < α for any real sequence a i, i 1, a i α < and, for 2(α 1) p < α, (5.2) E ( ( ) p/p a i η i p < C a i α + a i ), α p := 2α p > α, where the constant C = C(p, α, K) < depends only on p, α, K. (5.3) We need some more notation. For any integer j 0, define the truncated moving averages ε i,j := 0 k j b k ζ i k, ε i,j := k>j b k ζ i k, ε i,j := k 0,k j b k ζ i k. Thus, ε i,j + ε i,j = ε i, ε i,j = ε i b j ζ i j. Let F j (x) := P (ε i,j x), F j (x) := P ( ε i,j x), F j (x) := P (ε i,j x) be the corresponding marginal d.f.s. Also introduce ( (5.4) F l,j (x) := P 0 k j:k l ) b k ζ k x = P (ε i,j b i l ζ l x), 0 l < j. W.l.g., assume that F j and F l,j are not degenerate for any j 0, 0 l < j. Then P (ε i x + ξ ni ζ i j ) = F j (x + ξ ni b j ζ i j ) and ηns(x; ζ s ) := ( n s Fi s(x + ξ ni b i s ζ s ) F (x + ξ ni ) + f(x + ξ ni )b i s ζ s ). The proofs of Lemmas 5.3 5.5 use certain regularity properties of the d.f.s F (x), F j (x), F j (x), F l,j (x) given in the following lemma, whose proof is given in Section 6 (Appendix B) below. Related results can be found in (KS, Lemma 4.2), Surgailis (2002, Lemma 4.1), (GKS, Lemma 10.2.4). For 1 < γ 2 introduce g γ (x) := (1 + x ) γ, x R and a finite measure µ γ on R by (5.5) µ γ (x, y) := y x g γ(u)du, x < y. 16

Note the elementary inequalities: for any x, y R g γ (x + y) Cg γ (x)(1 y ) γ, y (5.6) g 0 γ(x + w)dw Cgγ (x)( y y γ ), see (GKS, Lemma 10.2.3). We shall also use the inequality y x g γ(u + v)du C(µ γ (x, y)) 1/γ v, v 1, (5.7) where C does not depend on x, y, v; see Surgailis (2002, p.264). Given a function g(x), x R and points x < y, z R we use the notation g(x, y) := g(y) g(x), g((x, y) + z) := g(y + z) g(x + z). Lemma 5.2 Suppose standard assumptions and conditions (2.4), (2.5) hold. Then for any p = 1, 2, 3 the d.f. F, F j, F j, F l,j, j 0, 0 l < j are p times continuously differentiable. Moreover, for any 1 < γ < α, p = 1, 2, 3 there exists a constant C > 0 such that for any x, y R, x y 1, j 0, 0 l < j (5.8) (5.9) (5.10) F (p) (x) + F (p) j (x) + (Fj ) (p) (x) + F (p) l,j (x) Cg γ(x), F (p) (x, y) + F (p) j (x, y) + (Fj ) (p) (x, y) + F (p) l,j (x, y) C x y g γ(x), 2 (Fj ) (p) (x) F (p) (x) C b j α g γ (x). p=1 Moreover, for any x < y, j 0, j 2 > j 1 0, z, z 1, z 2, ξ R, ξ 1 (5.11) (5.12) (5.13) (5.14) F ((x, y) b j z) F (x, y) + f(x, y)b j z C min { µ γ (x, y) b j γ, (µ γ (x, y)) 1/γ b j }, Fj ((x, y) b j z) F (x, y) + f(x, y)b j z Cµ γ (x, y) b j γ (1 + z ) γ, ξ ( f((x, y) + v bj z) Ef((x, y) + v b j ζ) + f ((x, y) + v)b j z ) dv 0 w b j1 z 1 w 2 b j2 z 2 C ξ µ γ (x, y) b j γ (1 + z ) γ, F j 1,j 2 ((x, y) + w 1 + w 2 + z)dw 2 dw 2 Cµ γ (x, y) b j1 b j2 (1 + z 1 ) γ (1 + z 2 ) γ (1 + z ) γ. Proof of Theorem 3.1. (a) The proof of (3.6) given in (KS, Thm. 2.1) for the case c + = c of (1.5) applies verbatim in the general case of c ± in (1.5). (We note that KS does not assume the distribution of ζ 0 symmetric around 0.) (b) The sum on the l.h.s. of (3.9) can be written as 4 L ni(x), where L n1 (x) := nv n (ξ) (x), L n2 (x) := nv n (0) (x) and L n3 (x) := n (F (x + ξ ni) F (x) f(x)ξ ni ), L n4 (x) := n (f(x) f(x + ξ ni))ε i. 17

Then sup x R n d 1/α L ni (x) = o p (1), i = 1, 2 follow from (3.6). Using F (x) C, see (KS, Lemma 4.1) and (3.4), (3.5) we obtain sup x R n d 1/α L n3 (x) Cn d 1/α n ξ2 ni = o(n d 1/α n ξ ni ) = o(1). To evaluate L n4 (x), let a ns (x) := n d 1/α n (f(x) f(x + ξ ni ))b i s and δ n,ξ := max 1 i n ξ ni. Then n d 1/α L n4 (x) = n s= a ns(x)ζ s. Moreover, using f(x) f(x + ξ ni ) Cδ n,ξ = o(1), see (3.4), and the bound in Lemma 5.1, (5.2), with p < α sufficiently close to α, we obtain (5.15) E n d 1/α L n4 (x) p C ( n s= a ns(x) α + ( s= a ns(x) α) p/p ). But, for any fixed x R, (5.16) n s= a ns(x) α Cδ α n,ξ n dα 1 n s= ( n (i s)d 1 + ) α = O(δ α n,ξ ) = o(1). Similarly, for any x < y using f (x) C/(1 + x ) r, x R with 1 < r < α, see (KS, (4.4)), we obtain E n d 1/α L n4 (x, y) p C ( n s= a ns(x, y) α + ( s= a ns(x, y) α) p/p ), and (5.17) n s= a ns(x, y) α C ( max 1 i n f(x + ξ ni, y + ξ ni ) α C(µr (x, y)) α. Because µ r (x, y) := y x (1 + z ) r dz is a finite continuous measure on R, it follows that E n d 1/α L n4 (x, y) p C(µ r (x, y)) pα/p, x < y, where pα/p > 1 provided p < α is chosen sufficiently close to α. Therefore using the well-known tightness criterion (see e.g. GKS, Lemma 4.4.1) we conclude that the sequence of random processes {n d 1/α L n4 (x), x R}, n 1 is tight in D( R), implying sup x R n d 1/α L n4 (x) = o p (1). This ends the proof of part (b) and Theorem 3.1. Proof of Theorem 3.2. Proof of (a). We follow the proof in Surgailis (2002, Thm. 2.1) and (2004, Thm. 2.1). The crucial decomposition leading to (3.14) is (5.18) R n (x) := V n (ξ) (x) Zn(x), where ] Zn(x) := n 1 ζs E [I(ε i x + ξ ni ) F (x + ξ ni ) + f(x + ξ ni )ε i. s<i We will show that Z n(x) is the main term and R n (x) is the remainder term. Note that Z n(x) = n 1 s<n η ns(x; ζ s ), where η ns(x; ζ s ) := s ] E [I(ε i x + ξ ni ) F (x + ξ ni ) + f(x + ξ ni )ε ζs i. We shall approximate the above quantity by η(x; ζ s ), where (5.19) η(x; z) := ( i=s F (x bi s z) EF (x b i s ζ s ) + f(x)b i s z ) = ( j=0 F (x bj z) EF (x b j ζ 0 ) + f(x)b j z ) 18

and let (5.20) Z n (x) := n 1 n s=1 η(x; ζ s), x R. Because Z n (x) is a sum of i.i.d. r.v. s, its α -stable limit will follow from the classical central limit theorem for independent r.v. s with heavy-tailed distribution. The proof of (3.14) or of part (a) follows from the following three lemmas. Lemma 5.3 n 1 1/α Z n (x) = D( R) Z (x). Lemma 5.4 n 1 1/α sup x R Z n(x) Z n (x) p 0. Lemma 5.5 n 1 1/α sup x R R n (x) p 0. Proof of (b). As in the proof of Theorem 3.1(b), decompose the sum on the l.h.s. of (3.17) as 4 L ni(x). By (3.14), sup x n 1/α L n1 (x) + L n2 (x) p 0. Next, by Taylor s expansion and condition (3.13), sup x L n3 (x) = O( n ξ2 ni) = o(n 1/α ), since 1/α + 1 > 2d + 2/α, see above. ( The term L n4 (x) can be estimated as in (5.15), (5.16), yielding E n 1/α L n4 (x) p C n dα+1 α/α δn,ξ α + ( n dα+1 α/α δn,ξ) ) α p/p with δ n,ξ C max 1 i n ξ ni = O(n d+1/α 1 ), see (3.13), hence E n 1/α L n4 (x) p = o(1) for any x R. The proof of sup x n 1/α L n4 (x) = o p (1) is similar as in Theorem 3.1(b) by showing the bound E n 1/α L n4 (x, y) p C(µ(x, y)) pα/p, x < y which follows as in (5.17) but uses a more accurate bound: n s= a ns(x, y) α C ( max 1 i n f(x + ξ ni, y + ξ ni f(x, y) ) α C(µ(x, y)) α max 1 i n ξ ni α together with condition (3.13). This proves part (b) and ends the proof of Theorem 3.2. 6 Appendix B: proofs of auxiliary lemmas Here we present the proofs of Lemmas 5.1-5.5 used in the proofs of Theorems 3.1 and 3.2. Proof of Lemma 5.1. W.l.g. let a i 0, i 1. Split η i = η + i + η i, η+ i := η i I( η i 1/ a i ) Eη i I( η i 1/ a i ), η i := η i I( η i > 1/ a i ) Eη i I( η i > 1/ a i ). Then (η ± i, i 1) are sequences of independent zero mean r.v. s and E a iη i p CE a iη + i p + CE a iη i p. By the well-known moment inequality, see e.g. (GKS, Lemma 2.5.2) E a iη i p CE a iη i p + C ( E a ) iη + p/p i p C a i p E η i p + C ( E a ) i p E η + p/p i p. 19

Then (5.2) follows from (6.1) E η i p C a i α p, E η + i p C a i α p, i 1. Note E η i p 2E η i p I( η i > 1/ a i ), E η + i p 2E η i p I( η i 1/ a i ). Hence (5.1) implies E η i p 2 1/ a i 2K a i α p + 2pK Similarly, E η + i p 2 x p dp ( η i > x) = 2 a i p P ( η i > 1/ a i ) + 2p 1/ ai 0 1/ a i x p 1 α dx C a i α p, i 1. x p dp ( η i > x) 1/ a i 1/ ai = 2 a i p P ( η i > 1/ a i ) + 2p x p 1 P ( η i > x)dx 2K a i α p + 2p K 1/ ai 0 0 x p 1 α dx C a i α p, i 1. x p 1 P ( η i > x)dx proving (6.1) and the lemma, too. Proof of Lemma 5.2. The first part of the proposition including (5.8), (5.9) follows by the argument in (KS, Lemma 4.2) with minor changes. The proof of (5.10) also proceeds as in the proof (KS, (4.6)), as follows. Since F (x) = F j (x b j y)dg(y) and Eζ = ydg(y) = 0 so F (p) (x) (F j ) (p) (x) = ((F j ) (p) (x yb j ) (F j ) (p) (x) + yb j (F j ) (p+1) (x))dg(y) and F (p) (x) (F j ) (p) (x) 4 J i(x) where J 1 (x) := yb j 1 dg(y) yb j 0 ((F j ) (p+1) (x + u) (F j ) (p+1) (x))du, J 2 (x) := yb j >1 (F j ) (p) (x yb j )dg(y), J 3 (x) := yb j >1 (F j ) (p) (x)dg(y), J 4 (x) := (f j ) (x)b j yb j >1 ydg(y). To evaluate J 1 (x) for p = 1, 2 use (5.9) yielding yb j 0 (F j ) (p+1) (x, x+u)du Cg γ (x) yb j 0 udu Cg γ (x)(yb j ) 2 and then J 1 (x) Cb 2 jg γ (x) yb j 1 y2 dg(y) C b j α g γ (x) follows from P ( ζ > x) Cx α. Next, using (5.8) and (5.6) we obtain J i (x) C yb j >1 (g γ(x) + g γ (x b j y))dy Cg γ (x) yb j >1 (1 + b jy γ )dg(y) C b j α g γ (x), i = 2, 3, J 4 (x) Cg γ (x) b j yb j >1 y dg(y) C b j α g γ (x), 20

thereby proving (5.10). Consider (5.11). Write the l.h.s. of (5.11) as b j z dw y (f (v + w) f (v))dv 0 x dw y f w b j (v + w) f (v) dv =: I z x j (x, y; z). Let b j z 1 then by (5.8) y x f (v + w) f (v) dv Cµ γ (x, y) w and hence I j (x, y; z) Cµ γ (x, y) w b j z w dw Cb2 jz 2 µ γ (x, y) C b j z γ µ γ (x, y). Next, let b j z > 1 then the l.h.s. of (5.11) does not exceed F ((x, y) b j z) F (x, y) + E F ((x, y) b j ζ) F (x, y) + f(x, y) b j z, where f(x, y) b j z Cµ γ (x, y) b j z C(µ γ (x, y)) 1/γ b j z follows from (5.9), γ > 1 and the boundedness of µ γ. Next, using (5.8) and (5.7) for b j z > 1 we obtain F ((x, y) b j z) F (x, y) y (f(u b x jz) + f(u))du C((µ γ (x, y)) 1/γ b j z + µ γ (x, y)) C(µ γ (x, y)) 1/γ b j z. This proves (5.11). Consider (5.12). Since F (x) = EF j (x b j ζ) the l.h.s. of (5.12) can be written as L 1 (x, y; z)+l 2 (x, y; z), where L 2 (x, y; z) := (f j (x, y) f(x, y))b j z Cµ γ (x, y) b j α+1 z Cµ γ (x, y) b j γ (1 + z ) γ according to (5.10), and L 1 (x, y; z) := E bj z b j ζ dw y x ((f j ) (v + w) (f j ) (v))dv L1 (x, y; z) + E L 1 (x, y; ζ), where L 1 (x, y; z) := b j z 0 dw y x (f j ) (v + w) (f j ) (v) dv. Let b j z 1 then L 1 (x, y; z) Cµ γ (x, y) w b j z w dw Cb2 jz 2 µ γ (x, y) C b j z γ µ γ (x, y) as in the proof of (5.11) above. Next, let b j z > 1 then L 1 (x, y; z) C y x dv b j z 0 (g γ (v+w)+g γ (w))dw C b j z γ y x g γ(v)dv = Cµ γ (x, y) b j z γ follows from (5.8) and (5.6). Hence L 1 (x, y; z) Cµ γ (x, y) b j z γ, E L 1 (x, y; ζ) Cµ γ (x, y) b j γ E ζ γ Cµ γ (x, y) b j γ since γ < α, proving (5.12). Next, the l.h.s. of (5.13) can be written as T j (x, y; z, ξ) ET j (x, y; ζ, ξ), where T j (x, y; z, ξ) := dg(u) ξ dv b j z dw y (f 0 b j (t + v + w) f (v + w))dt, and hence u x where T j (x, y; z, ξ) T j (x, y; z, ξ) + E T j (x, y; ζ, ξ), T j (x, y; z, ξ) := v ξ dv w b j z dw y x f (t + v + w) f (t + v) dt. Let b j z 1 then by (5.9) and (5.6), y f (t + v, t + v + w) dt C w µ x γ (x v, y v) C w µ γ (x, y) for v ξ 1 implying T j (x, y; z, ξ) C ξ (b j z) 2 µ γ (x, y) C ξ b j z γ µ γ (x, y). Next, let b j z > 1 then by (5.8) and (5.6) we obtain that T j (x, y; z, ξ) C ξ dv { dw 0 w b j z y g x γ(t + v + w)dt + y g x γ(t + v)dt } C ξ b j z γ µ γ (x, y), proving (5.13). Finally, consider (5.14). In view of (5.8), the l.h.s. of (5.14) can be bounded by C y dξ dw x w b j1 z 1 1 g w 2 b j2 z 2 γ(ξ + w 1 + w 2 + z)dw 2 =: U j1,j 2 (x, y; z 1, z 2, z). Then using repeatedly inequality (5.6) we get y U j1,j 2 (x, y; z 1, z 2, z) C( b j2 z 2 b j2 z 2 γ ) dξ g γ (ξ + w 1 + z)dw 1 21 x w b j1 z 1

proving (5.14) and the lemma, too. Proof of Lemma 5.3. y C( b j1 z 1 b j1 z 1 γ )( b j2 z 2 b j2 z 2 γ ) g γ (ξ + z)dξ x C( b j1 z 1 b j1 z 1 γ )( b j2 z 2 b j2 z 2 γ )(1 z γ )µ γ (x, y), From (5.11) we get F (x b j z) EF (x b j ζ 0 ) + f(x)b j z C b j r (1 + z ) r with any 1 < r < α sufficiently close to α so that the series in (5.19) absolutely converges for any z R: η(x; z) C(1 + z ) r j=0 b j r C(1 + z ) r. We shall next prove the existence of the limits (6.2) lim 1 z ± z 1 d η(x; z) = ψ ± (x), with ψ ±(x) given in (3.11). Note the integrals in (3.11) converge for 0 < d < 1/2 since F is twice differentiable. To prove (6.2), let (6.3) η(x; z) := j=0 ( F (x bj z) F (x) + f(x)b j z ), where F (x b j z) F (x) + f(x)b j z u < b j z f(x + u) f(x) du Cg r(x) b j z r, for any 1 < r < α follows from (5.9), (5.6), and hence the series in (6.3) converges for any x, z R and E η(x; ζ) < C. Since η(x; z) = η(x; z) E η(x; ζ) it suffices to prove (6.2) for η(x; z) replaced by η(x; z). We have for z > 0 that z 1 1 d η(x; z) = ψ + (x) + ϕ(x; z), where ϕ(x; z) = 0 { F (x zb[uz 1/(1 d) ]) F (x c 0 u 1 d ) + f(x) ( zb [uz 1/(1 d) ] c 0 u 1 d )} du 0 as z, by the dominated convergence theorem. The limit as z in (6.2) follows analogously. Relations (6.2) and (1.5) imply tail relations P (η(x, ζ 0 ) > y) γ + (x)y α, P (η(x, ζ 0 ) < y) γ (x) y α, as y with γ ± (x) 0 written in terms of c ±, ψ ±(x). See Surgailis (2004, Lemma 3.2), (2002, Lemma 3.1) for details. Hence, by the classical CLT for i.i.d. r.v. s and the Cramér-Wold device, finite-dimensional distributions of n 1 1/α Z n (x) converge weakly to those of Z. We shall now prove the tightness of the process n 1 1/α Z n. By the well-known tightness criterion in (Billingsley, 1968, Thm. 15.6), it suffices to show that there exist r > 1 and a finite continuous measure µ such that for all ϵ > 0, x < y (6.4) P ( Z n (x, y) > ϵn 1/α 1 ) ϵ α (µ(x, y)) r, where Z n (x, y) = Z n (y) Z n (x). Let η(x, y; z) = η(y; z) η(x; z), x < y. By the definition of Z n in (5.20), P ( Z n (x, y) > ϵn 1/α 1 ) np ( η(x, y; ζ) > ϵn 1/α ). Hence (6.4) follows from P ( η(x, y; ζ > ϵn 1/α ) (nϵ α ) 1 (µ(x, y)) r, or (6.5) P ( η(x, y; ζ) > w) w α (µ(x, y)) r, w > 0. 22

In turn, (6.5) follows from P ( ζ > x) C/x α, x > 0 and the fact that there exist 1 < γ < α, C < such that for all x < y, z > 1, with µ γ as in (5.5), (6.6) η(x, y; z) C z 1/(1 d) (µ γ (x, y)) 1/γ(1 d). Indeed, (6.6) entails P ( η(x, y; ζ) > w) P (C ζ 1/(1 d) (µ γ (x, y)) 1/γ(1 d) > w) = P ( ζ > Cw 1 d /(µ γ (x, y)) 1/γ ) Cw α (µ γ (x, y)) α/γ implying (6.5) with r = α/γ > 1. See Surgailis (2002, proof of Lemma 3.2). Let us show (6.6). Since η(x, y; z) = η(x, y; z) E η(x, y; ζ) η(x, y; z) + E η(x, y; ζ) and E ζ 1/(1 d) < for 1/(1 d) < α, it suffices to prove (6.6) for η(x, y; z) instead of η(x, y; z) as this implies E η(x, y; z) C(µ γ (x, y)) 1/γ(1 d) E ζ 1/(1 d) C(µ γ (x, y)) 1/γ(1 d) z 1/(1 d), z 1. Using (5.11) with χ := z (µ γ (x, y)) 1/γ we obtain η(x, y; z) C min { b j z (µ γ (x, y)) 1/γ, b j z γ µ γ (x, y) } j=0 Cχ γ j>χ 1/(1 d) j (1 d)γ + Cχ C(µ γ (x, y)) 1/γ(1 d) z 1/(1 d), j (1 d) + 0 j χ 1/(1 d) proving (6.6), (6.4) and the required tightness of the process n 1 1/α Z n (x). Lemma 5.3 is proved. Proof of Lemma 5.4. measure µ on R such that We shall prove that there exist 1 < r < α, κ > 0 and a finite (6.7) E Z n(x, y) Z n (x, y) r µ(x, y)n r(1/α 1) κ, x < y. Lemma 5.4 follows from (6.7) using the chaining argument as in KS and Surgailis (2002). To prove (6.7), similarly to Surgailis (2004, proof of (3.6)) decompose (6.8) n(z n(x) Z n (x)) = 4 V ni(x), where V n1 (x) := s 0 ϕ n1,s(x; ζ s ), V ni (x) := n s=1 ϕ ni,s(x; ζ s ), i = 2, 3, 4 with { ϕ n1,s (x; z) := F i s (x + ξ ni b i s z) F (x + ξ ni ) + f(x + ξ ni )b i s z }, ϕ n2,s (x; z) := ϕ n3,s (x; z) := n ϕ n4,s (x; z) := n i=n+1 i=s i=s { F (x bi s z) EF (x b i s ζ 0 ) + f(x)b i s z }, { F i s (x + ξ ni b i s z) F (x + ξ ni b i s z) F (x + ξ ni ) + EF (x + ξ ni b i s ζ 0 ) }, { F (x + ξni b i s z) F (x b i s z) EF (x + ξ ni b i s ζ 0 ) + EF (x b i s ζ 0 ) + (f(x + ξ ni ) f(x))b i s (z Eζ 0 ) }. 23

Note the for each 1 i 4, ϕ ni,s (x; ζ s ), s Z, are zero mean independent r.v. s. It suffices to prove that for some r i > 1, γ i > 1, κ i > 0, (6.9) E V ni (x, y) r i C(µ γi (x, y)) r i n r i/α κ i, x < y, 1 i 4. Indeed, (6.9) imply (6.7) with r = min{r i, 1 i 4} > 1, κ = min{κ i r/r i, 1 i 4} and µ = Cµ γ, γ = min{γ i, 1 i 4} which follows from E V ni (x, y) r (E V ni (x, y) r i ) r/r i C(µ γi (x, y)) r (n r i/α κ i ) r/r i Cµ γ (x, y)n r/α κ. Because of the last fact we will not indicate the subscript in r i, γ i, κ i in the subsequent proof of (6.9). Consider (6.9) for i = 1. Then E V n1 (x, y) r C s 0 E ϕ n1(x, y; ζ s ) r, 1 r 2. Moreover, w.l.g. we can restrict the proof to ξ ni 0. Hence, using Minkowski s inequality, E V n1 (x, y) r n+s { } r C E F j ((x, y) b j ζ 0 ) F (x, y) + f(x, y)b j ζ 0 C s=1 j=s ( n+s s=1 j=s E 1/r ) r. F j ((x, y) b j ζ 0 ) F (x, y) + f(x, y)b j ζ 0 r Whence and from (5.12) with γ = α and 1 < r < α < α < α < αr and α sufficiently close to α so that (1 d)α /r > 1 follows from r < α, we obtain E V n1 (x, y) r C(µ γ (x, y)) r ( n+s s=1 j=s b j ) α /r r { ( C(µ γ (x, y)) r j ) (1 d)α /r r ( ) } + ns (1 d)α /r r s=1 j=s s>n C(µ γ (x, y)) r n 1+r α (1 d), implying (6.9) for i = 1 since 1 + r α (1 d) < r/α follows by noting that for α = α the last inequality becomes r(1 1/α ) < α 1, or r < α, and hence it holds also for all α < α sufficiently close to α. Next, consider (6.9) for i = 2. Use (5.11) with γ = α and r, α as above to obtain E V n2 (x, y) r ( C(µ γ (x, y)) r E 1/r ) r F ((x, y) b j ζ 0 ) EF ((x, y) b j ζ 0 ) + f(x, y)b j ζ 0 r s=1 j=s ( C(µ γ (x, y)) r j ) (1 d)α /r r C(µ γ (x, y)) r n 1+r α (1 d), s=1 j=s proving (6.9) for i = 2. To prove (6.9) for i = 3, note that (5.13) implies 24

y Fj ((x, y) b j z + ξ ni ) F ((x, y) b j z + ξ ni ) C b j α (1 + u b j z + ξ ni ) γ du x C b j α µ γ (x, y)(1 + b j z ) γ and hence E F j ((x, y) b j z + ξ ni ) F ((x, y) b j z + ξ ni ) r C b j αr (µ γ (x, y)) r (1 + E ζ γr ) C(µ γ (x, y)) r b j αr for r, γ > 1 satisfying rγ < α. Moreover, we may choose α < r < α. Then, since j=0 b j α <, we obtain E V n3 (x, y) r C(µ γ (x, y)) r n s=1 ( j=1 b j α) r C(µγ (x, y)) r n, proving (6.9) for i = 3, because r/α > 1. Consider (6.9) for i = 4. Note ϕ n4,s (x, y; z) can be rewritten as { ξni dg(u) dv ( f((x, y)+v b i s z) f((x, y)+v b i s u)+f ((x, y)+v)b i s (z u) )}. i=s 0 Hence by (5.13) with γ = 1/(1 d) (1, α) and 1 < r < α we obtain E ϕ n4,s (x, y; ζ) r C ( n i=s ξ ni µ γ (x, y) b i s γ E 1/r (1 + ζ ) γr ) r C(µ γ (x, y)) r max 1 i n ξ ni r ( n i=s b i s 1/(1 d) ) r, by noting that E ζ γr < since γr = r/(1 d) < α is equivalent to r < α. Note also that for the above choice b j γ = O(j 1 ). Then E V n4 (x, y) r C(µ γ (x, y)) r max 1 i n ξ ni r n ( n s=1 j=1 j 1) r C(µ γ (x, y)) r max 1 i n ξ ni r n(log n) r. Thus, (6.9) for i = 4 holds due to condition (3.13) since one can choose r < α and κ > 0 so that r/α κ is arbitrary close to 1. This ends the proof of (6.9) and that of Lemma 5.4. Proof of Lemma 5.5. Rewrite (6.10) R n (x) = V n (ξ) (x) Zn(x) = n 1 h ni (x), where h ni (x) := I(ε i x + ξ ni ) F (x + ξ ni ) + f(x + ξ ni )ε i E [ ] I(ε i x + ξ ni ) F (x + ξ ni ) + f(x + ξ ni )ε i ζ s s i Similarly as in the proof of the previous lemma, it suffices to prove that there exist 1 < r < α, κ > 0 and a finite measure µ on R such that for all n 1 and x < y (6.11) E h ni (x, y) r µ(x, y)n (r/α ) κ. 25

Recall the notation ε i,j, ε i,j, ε i,j, in (5.3) and F j (x) := P (ε i,j x), F j (x) := P ( ε i,j x), F j (x) := P (ε i,j x). Then (6.12) h ni (x) = s i U ni,s (x), where U ni,s (x) := F i s 1 (x + ξ ni b i s ζ s ε i,i s ) F i s (x + ξ ni ε i,t s ) F i s (x + ξ ni b i s ζ s ) + F (x + ξ ni ) with the convention F 1 (x) := I(x 0). Then n h ni(x, y) = s n M ns(x, y), where (6.13) M ns (x, y) := s U ni,s (x, y), s n is a martingale difference sequence with E[M ns (x, y) ζ u, u s] = 0. Hence by the well-known martingale moment inequality, see e.g. (GKS, Lemma 2.5.2), (6.14) E h ni (x, y) r C E M ns (x, y) r s n C s n ( To proceed further, we need a good bound of E U ni,s (x, y) r. s E 1/r U ni,s (x, y) r ) r. Towards this end, noting that E[U ni,s (x, y) ζ s ] = 0, we expand U ni,s (x, y) similarly as in (6.12) but according to the conditional probability P [ ζ s ], see Surgailis (2004, (4.8)): U ni,s (x, y) = u<s(e[u ns (x, y) ζ s, ζ v, v u] E[U ns (x, y) ζ s, ζ v, v u 1]) = u<s W n,i,s,u (x, y), where W n,i,s,u (x) := F i s,i u 1 (x + ξ ni b i s ζ s b i u ζ u ε i,i u ) F i s,i u (x + ξ ni b i s ζ s ε i,t u ) F i s 1 (x + ξ ni b i u ζ u ε i,i u ) + F i s (x + ξ ni ε i,i u ). Similarly to Surgailis (2004, (4.12)) W n,i,s,u (x, y) can be rewritten as W n,i,s,u (x, y) = E 0[ H(b i s ζ + b i u η + ε) H(b i s ζ 0 + b i s η + ε) H(b i s ζ + b i s η 0 + ε) + H(b i s ζ 0 + b i s η 0 + ε) ] = E 0 [ b i s ζ b i s ζ 0 bi u η ] H (2) (z 1 + z 2 + ε)dz 1 dz 2, b i u η 0 where, for fixed n, u < s < i, x < y, H(z) := F i s,i u 1 ((x, y) + ξ ni z), ζ := ζ s, η := ζ u, ε := ε i,i u, 26

(ζ 0, η 0 ) is an independent copy of (ζ, η) and E 0 denotes the expectation w.r.t. (ζ 0, η 0 ) only; H (2) (z) = d 2 H(z)/dz 2. From (5.14) for any 1 < γ < α we get (6.15) W n,i,s,u (x, y) Cµ γ (x, y) b i s b i u ( ζ γ + 1)( η γ + 1), with probability 1 and nonrandom C <, and hence for any 1 < r < α, 1 < γ < α, rγ < α E[ W n,i,s,u (x, y) r ζ s ] C(µ γ (x, y)) r b i s r b i u r ( ζ s + 1) rγ. Then using the above mentioned martingale moment inequality w.r.t. P [ ζ s ] we obtain E[ U ni,s (x, y) r ζ s ] C E[ W n,i,s,u (x, y) r ζ s ] u<s C(µ γ (x, y)) r b i s r ( ζ s + 1) rγ u<s (i u) r(1 d) C(µ γ (x, y)) r (i s) 1 2r(1 d) ( ζ s + 1) rγ. Hence (6.16) E 1/r U ni,s (x, y) r Cµ γ (x, y)(i s) 1/r 2(1 d). Substituting (6.16) into (6.14) we obtain E (6.17) h ni (x, y) { r Cµ γ (x, y) s n ( ) r ( ) r } (i s) 1/r 2(1 d) + n i 1/r 2(1 d) Cµ γ (x, y)n 2+r 2r(1 d), for r < α sufficiently close to α. (Note that for any r < α we can find γ > 1 such that rγ < 1.) Also note for such r (6.18) 2 + r 2r(1 d) < r/α = r/α(1 d) Indeed, (6.18) holds for r = α since 2 + α 2α(1 d) < 1/(1 d) is equivalent to (1 2d)(α(1 d) 1) > 0, and therefore (6.18) holds for r < α sufficiently close to α by continuity. Relations (6.17) and (6.18) prove (6.11) and thereby complete the proof of Lemma 5.5. 7 References Astrauskas, A. (1983). Limit theorems for sums of linearly generated random variables. Lithuanian Math. J., 23, 127 134. Avram, F. and Taqqu, M.S. (1986). Weak convergence of moving averages with infinite variance. In: Eberlein, E. and M.S. Taqqu (eds), Dependence in Probability and Statistics, pp. 399-415. Birkhäuser, Boston. 27

Avram, F. and Taqqu, M.S. (1992). Weak convergence of sums of moving averages in the α-stable domain of attraction. Ann. Probab., 20, 483 503. Basse-O Connor, A., Lachièze-Rey, R. and Podolskij, M. (2016). Power variation for a class of stationary increments Lévy driven moving averages. Preprint. Christoph, G. and Wolf, W. (1992). Convergence Theorems with Stable Limit Law. Akademie-Verlag, Berlin. Dehling, H. and Taqqu, M.S. (1989). The empirical process of some long range dependent sequences with an application to U-statistics. Ann. Statist., 17, 1767 1783. Giraitis, L., Koul, H.L. and Surgailis, D. (1996). Asymptotic normality of regression estimators with long memory errors. Statist. Probab. Letters, 29, 317 335. Giraitis, L., Koul, H.L. and Surgailis, D. (2012). Large Sample Inference for Long Memory Processes. Imperial College Press, London. Ho, H.C. and Hsing, T. (1996). On the asymptotic expansion of the empirical process of long memory moving averages. Ann. Statist., 24, 992-1024. Huber, P.J. (1981). Robust Statistics. Wiley, New York. Hult, H. and Samorodnitsky, G. (2008). Tail probabilities for infinite series of regularly varying random vectors. Bernoulli, 14, 838 864. Ibragimov, I.A. and Linnik, Yu.V. (1971). Independent and Stationary Sequences of Random Variables. Wolters Noordhoff, Groningen. Kasahara, Y. and Maejima, M. (1988). Weighted sums of i.i.d. random variables attracted to integrals of stable processes. Probab. Th. Rel. Fields, 78, 75 96. Kokoszka, P.S. and Taqqu, M.S. (1995). Fractional ARIMA with stable innovations. Stoch. Proc. Appl. 60, 19 47. Koul, H.L. (2002). Asymptotic distributions of some scale estimators in nonlinear models. Metrika, 55, 75 90. Koul, H.L. (2002a). Weighted Empirical Processes in Dynamic Nonlinear Models. 2nd Edition. Lecture Notes Series in Statistics, 166, Springer, New York, N.Y., USA. Koul, H.L. and Surgailis, D. (2001). Asymptotics of empirical processes of long memory moving averages with infinite variance. Stochastic Process. Appl., 91, 309 336. Koul, H.L. and Surgailis, D. (2002). Asymptotic expansion of the empirical process of long memory moving averages. In: H. Dehling, T. Mikosch and M. Sorensen (eds.), Empirical Process Techniques for Dependent Data, pp. 213 239. Birkhäuser: Boston. Samorodnitsky, G. and Taqqu, M.S. (1994). Stable Non-Gaussian Random Processes. Chapman and Hall, New York. Surgailis, D. (2002). Stable limits of empirical processes of long memory moving averages with infinite variance. Stochastic Process. Appl., 100, 255 274. Surgailis, D. (2004). Stable limits of sums of bounded functions of long memory moving averages with finite variance. Bernoulli, 10, 327 355. 28