Nonparametric Change-Point Tests for Long-Range Dependent Data

Size: px

Start display at page:

Download "Nonparametric Change-Point Tests for Long-Range Dependent Data"

Eustacia Lyons
5 years ago
Views:

1 Nonparametric Change-Point Tests for Long-ange Dependent Data Herold Dehling Fakultät für Mathematik, uhr-universität Bochum Aeneas ooch Fakultät für Mathematik, uhr-universität Bochum Murad S. Taqqu Department of Mathematics and Statistics, Boston University March 2, 202 Abstract We propose a nonparametric change-point test for long-range dependent data, which is based on the Wilcoxon two-sample test. We derive the asymptotic distribution of the test statistic under the null-hypothesis that no change occured. In a simulation study, we compare the power of our test with the power of a test which is based on differences of means. The results of the simulation study show that in the case of Gaussian data, our test has only slightly smaller power than the difference-of-means test. For heavy-tailed data, our test outperforms the difference-of-means test. unning Head: Change-Point Tests for LD Data Keywords: Change-point problem, functional limit theorem, long memory, long-range dependent data, nonparametric test, Wilcoxon two-sample rank test.

2 Introduction and Statement of Main esults In this paper we study change-point problems in long-range dependent time series. Specifically we consider the model where the observations are generated by a stochastic process X i i of the type X i = µ i + ɛ i, where ɛ i i is a long-range dependent stationary process with mean zero, and where µ i i are the unknown means. We focus on the case when ɛ i i is an instantaneous functional of a stationary Gaussian process with non-summable covariances, i.e. ɛ i = Gξ i, i. We assume that ξ i i is a mean-zero Gaussian process with Eξi 2 = and long-range dependence, that is, with autocovariance function ρk = k D Lk, k, where 0 < D <, and where Lk is a slowly varying function. Moreover, G : is a measurable function satisfying EGξ i = 0. Based on observations X,..., X n, we wish to test the hypothesis H : µ =... = µ n that there is no change in the means of the data against the alternative A : µ =... = µ k µ k+ =... = µ n for some k {,..., n }. We shall refer to this test problem as H, A. In the case of independent observations, change-point problems have been much investigated, and both parametric as well as nonparametric tests have been studied. For an excellent survey, see the monograph by Csörgő and Horváth 997. There are also many results for weakly dependent observations. E.g., Ling 2007 studies the asymptotic distribution of the Wald test for parameter changes in near epoch dependent processes. Aue, Hörmann, Horváth and eimherr 2009 study tests for detecting breaks in the covariance structure of multivariate time series. Wied, Krämer and Dehling 20 study change-points in the correlation structure of near epoch dependent processes. In the case of long-range dependent data, much less is known. Csörgő and Horváth 997 study so-called cusum tests cumulative sum tests for changes in the mean, which are based 2

3 on statistics of the type max k n k X i n i= X i. Krämer, Sibbertsen and Kleiber 2002 observe that the presence of long-range dependence might lead to false rejection of the hypothesis of stationarity in change-point tests. Berkes, Horváth, Kokoszka and Shao 2006 propose cusum type tests for discriminating between long-range dependence and changes in the mean of a weakly dependent process. Giraitis, Leipus and Surgailis 996 consider change-point problems for linear processes where they test the hypothesis of stationarity against the alternative of some break. Their tests are based on supremum functionals of the empirical process. Horvath and Kokoszka 997 study cusum tests for change points for Gaussian processes. Wang 2003 studies cusum tests for change-points in multivariate processes that are subordinated to a Gaussian process. Wang 2008b studies cusum tests for linear processes. In the present paper we want to investigate rank tests for changes in the mean. Antoch, Hušková, Janic and Ledwina 2008 studied rank tests for i.i.d. observations. Wang 2008a investigated rank tests for change-points in long-range dependent linear processes. In this paper, we study rank tests for changes in the mean of long-range dependent processes that have a representation as instantaneous nonlinear functionals of stationary Gaussian processes. Tests for change-point problems usually start from the two-sample problem that one obtains when the change-point is regarded as being known, that is, where for a given k {,..., n }, the alternative is A k : µ =... = µ k µ k+ =... = µ n. For the test problem H, A k, the Wilcoxon two-sample rank test is a commonly used nonparametric test. The Wilcoxon test rejects for large and small values of the test statistic i= W k,n = {Xi X j }. i= j=k+ W k,n counts the number of times the second part of the sample exceeds the first part of the sample. Note that this is the Mann-Whitney U-statistic representation of Wilcoxon s rank test statistic; see e.g. Lehmann 975. Tests for the change-point problem H, A, where one does not suppose that k is known, can be based on the vector of test statistics W,n,..., W n,n. In a parametric model, the generalized likelihood ratio test rejects for large values of the maximum of the test statistics 3

4 obtained for the two-sample problems H, A k ; see e.g. Csörgő and Horvath 997. Also in the nonparametric case, a common procedure is to reject the null hypothesis for large values of W n = max k=,...,n W k,n, though one could in principle take other functions of W,n,..., W n,n. In order to set the critical values of the test, we need to know the distribution of W n under the null hypothesis of no change. Since the exact distribution is hard to obtain, we consider the asymptotic distribution for a large sample size. To this end we consider the process W n,[n λ] = [nλ] i= j=[nλ]+ {Xi X j }, 0 λ, parametrized by λ. After centering and normalizing, we obtain the process W n λ = n d n [nλ] i= j=[nλ]+ {Xi X j } F xdf x, 0 λ. 2 Observe that the centering constant F xdf x equals E {X X }, where X is an independent copy of X. The latter is the proper normalization as the dependence between X i and X j vanishes asymptotically when j i. If the distribution function F is continuous, we get F xdf x =. Note that under the null hypothesis, the distribution 2 of W n λ does not depend on the common mean µ := µ =... = µ n. Thus, we may assume without loss of generality that µ = 0 and hence that X i = Gξ i. In order to analyze the asymptotic distribution of the process W n λ 0 λ, we apply the empirical process invariance principle of Dehling and Taqqu 989 to the sequence Gξ i i. We consider the Hermite expansion {Gξi x} F x = q= J q x H q ξ i, q! where H q is the q-th order Hermite polynomial and where J q x = EH q ξ i {Gξi x}. We define the Hermite rank of the class of functions { {Gξi x} F x, x }, by m := min{q : J q x 0 for some x }, 3 and the normalizing constants d 2 n = Var H m ξ j. 4 j= 4

5 We make the further assumption that 0 < D <, in which case m d 2 n c m n 2 md L m n, 5 where c m = 2m! D m2 D m. Here we use the symbol a n b n to denote that a n /b n as n. Then by the two-parameter empirical process invariance principle of Dehling and Taqqu 989, d n [nλ]f [nλ] x F x x [, ],λ [0,] Jm x m! Z m λ, x [, ],λ [0,] weakly in D[, ] [0, ], where Z m λ λ [0,] is an m-th order Hermite process, defined in Taqqu 979, formula.3. A spectral representation of Z m is given in formula.7 of Dehling and Taqqu 989. Observe the separation of variables in the limit: the limiting process Z m x, λ is expressible as Jmx m! Z m λ. The process Z m λ λ 0 is self-similar with parameter H = md 2 2,, that is, Z m c λ and c H Z m λ have the same finite-dimensional distributions for all constants c > 0. The process Z m λ is not Gaussian when m 2. When m =, Z λ is the standard Gaussian fractional Brownian motion, often denoted B H λ. It is Gaussian with mean zero and covariance E Z λ Z λ 2 = 2 { λ 2H + λ 2H 2 λ λ 2 2H} 6 Theorem Suppose that ξ i i is a stationary Gaussian process with mean zero, variance, and autocovariance function as in with 0 < D <. Moreover let G : be a m measurable function and define X k = Gξ k. Assume that X k has a continuous distribution function F. Let m denote the Hermite rank of the class of functions {Gξi x} F x, x, as defined in 3 and let d n > 0 satisfy 5. Then n d n [nλ] i= j=[nλ]+ {Xi X j } F xdf x, 0 λ, 7 converges in distribution towards the process m! Z mλ λz m J m xdf x, 0 λ. Do not confuse the index H, which is called Hurst parameter with the other H s used in this paper to denote hypothesis and Hermite polynomial. 5

6 2 Proof of Theorem We introduce the empirical distribution functions of the first k and the last n k observations, respectively: F k x = k F k+,n x = i= n k {Xi x} i=k+ {Xi x} so that [nλ]f [nλ] x = [nλ] i= {X i x} and n [nλ]f [nλ]+,n = n i=[nλ]+ {X i x}. Then we obtain the following representation [nλ] i= j=[nλ]+ = [nλ] {Xi X j } F xdf x j=[nλ]+ F [nλ] X j F xdf x = [nλ]n [nλ] F [nλ] xdf [nλ]+,n x F xdf x = [nλ]n [nλ] F [nλ] x F xdf [nλ]+,n x +[nλ]n [nλ] F xdf [nλ]+,n F x = [nλ]n [nλ] F [nλ] x F xdf [nλ]+,n x [nλ]n [nλ] F [nλ]+,n x F xdf x, where in the third line we used the relation EKX = KxdF x, where F and E are respectively, the empirical distribution and the empirical mean of X, and where, in the final step, we have used integration by parts, namely G df = F dg, if F and G are two distribution functions. We now apply the empirical process non-central limit theorem of Dehling and Taqqu 989, which states that in the space D[, ] [0, ], equipped with the supremum norm, d n D [nλ]f nλ x F x JxZλ x [, ],λ [0,] x [, ],λ [0,], 9 8 where Jx = J m x and Zλ = Z m λ/m!. 6

7 Here, D denotes convergence in distribution of random variables with values in the nonseparable metric space D = D[, ] [0, ], as defined e.g. in Shorack and Wellner 986, Section 2.3. Given D-valued random variables W n n and W, we say that W n converges in distribution to W if EfW n EfW holds for all functions f : D that are bounded, uniformly continuous with respect to the supremum norm on D and measurable with respect to the σ-field D B on D that is generated by the open balls. We can then apply the Skorohod-Dudley-Wichura representation theorem; see Theorem in Shorack and Wellner 986. Note that C[, ] [0, ] is a separable subset of D, and that P JxZλ x [, ],λ [0,] C[, ] [0, ] =, since F is assumed to be continuous. Thus the conditions of the Skorohod-Dudley-Wichura theorem are met with M s = C[, ] [0, ]. Hence we may assume without loss of generality that convergence holds almost surely with respect to the supremum norm on the function space D[0, ] [, ], i.e. d n [nλ]f nλ x F x JxZλ 0 a.s. 0 sup λ,x Note that, by definition, n [nλ]f [nλ]+,n x F x = nf n x F x [nλ]f [nλ] x F x. Applying 0 to each of the terms on the right-hand side of, we obtain the following limit theorem for the two parameter empirical process of the observations X [nλ]+,..., X n. d n n [nλ]f [nλ]+,n x F x JxZ Zλ 0. 2 sup λ,x Thus we get, for the first term in the right-hand side of 8, [nλ]n [nλ] F [nλ] x F xdf [nλ]+,n x λ JxZλdF x 3 n d n = n [nλ] d n [nλ]f [nλ] x F xdf [nλ]+,n x n [nλ] JxZλdF x n n { } n [nλ] + λ JxZλdF x n = n [nλ] { d n [nλ]f [nλ] x F x JxZλ } df [nλ]+,n x n + n [nλ] JxZλdF [nλ]+,n F x n { } n [nλ] + λ JxZλdF x. n 7

8 The first term on the right-hand side converges to 0 almost surely, by 0. The third term converges to zero as sup 0 λ n [nλ] n λ 0. egarding the second term, note that JxdF x = EJX i and hence n [nλ] n = Zλ n = Zλ n JxZλdF [nλ]+,n F x i=[nλ]+ JX i EJX i JX i EJX i [nλ] JX i EJX i n. i= By the ergodic theorem, n n i= JX i EJX i 0, almost surely. Thus, given ɛ > 0, we can find n 0 = n 0 ɛ, ω such that, for all n n 0, n i= JX i EJX i ɛ. i= Hence, for n max{n 0, max ɛ k n 0 k i= JX i EJX i }, we have max JX k n i EJX i i= max max JX i EJX i k n 0, max JX i EJX i n 0 k n and thus ɛ n. k n i= max JX i EJX i = on, as n, a.s. 4 i= Hence 3 converges to zero almost surely, uniformly in λ [0, ]. egarding the second term on the right-hand side of 8 we obtain [nλ]n [nλ] F [nλ]+,n F xdf x λ JxZ ZλdF x 5 n d n = [nλ] { d n n [nλ]f [λn]+,n x F x JxZ Zλ } df x n λ [nλ] JxZ ZλdF x. n 8 i=

9 Both terms on the right-hand side converge to zero a.s., uniformly in λ [0, ]. For the first term, this follows from 2. For the second term, this holds since sup 0 λ [nλ] n λ 0, as n. Using 8 and the fact that the right-hand sides of 3 and 5 converge to zero uniformly in 0 λ, we have proved that the normalized Wilcoxon two-sample test statistic 7 converges in distribution towards λzλjxdf x λz ZλJxdF x, 0 λ. Finally, we observe that λzλjxdf x and thus we have established Theorem. λz ZλJxdF x = Zλ λz JxdF x, emark In our proof we have used an integration by parts in order to express our test statistic as a functional of the empirical process. A similar integration by parts technique was used by Dehling and Taqqu 989, 99. Note that in the present paper, we use a onedimensional integration by parts formula, whereas Dehling and Taqqu use a two-dimensional integration by parts. The latter would not work here, as the kernel {x y} does not have locally bounded variation. 2 As a corollary to Theorem, we obtain for fixed λ [0, ] the asymptotic distribution of the Wilcoxon two-sample test, where the two samples are X,..., X [nλ] X [nλ]+,..., X n. After centering and normalization, the corresponding test statistic is U n = n d n [nλ] i= j=[nλ]+ {Xi X j } F xdf x. From Theorem we obtain that U n converges in distribution to This can be rewritten as Zm λ λ Z m m! J m xdf x. λzm λ λz m Z m λ m! J m xdf x, an expression which is useful for the next remark. 9

10 3 A different two-sample problem arises when we observe samples from two independent long-range dependent processes X i i and X i i with identical joint distributions. In this case, the samples are X,..., X [nλ] X,..., X n [nλ]. The normalized two-sample Wilcoxon test statistic for this case is U n = n d n [nλ] i= n [nλ] j= {Xi X j } F xdf x. Following the proof of Theorem, and making the appropriate changes where needed, one can show that U n converges in distribution towards m! λz mλ λz m λ J m xdf x, where Z mλ is an independent copy of Z m λ 0 λ. Note that the limit distributions in the two models are different, as the joint distribution of Z m λ, Z m Z m λ is different from that of Z m λ, Z m λ. This is a result of the fact that the Hermite process does not have independent increments. Observe that this is in contrast to the short-range dependent case. In this situation, the Wilcoxon two-sample test statistic has the same distribution in both models; see Dehling and Fried 202. oughly speaking, the dependence washes away in the limit for short-range dependence, but not for long-range dependence. 3 Application In what follows, we will consider the model X i = µ i + Gξ i, i =,..., n, 6 where ξ i i is a mean-zero Gaussian process with Varξ i = and autocovariance function ρk satisfying. We wish to test the hypothesis H : µ =... = µ n 7 against the alternative A : µ =... = µ k µ k+ =... = µ n for some k {,..., n }. 0

11 3. Wilcoxon-type rank test The change-point test based on Wilcoxon s rank test will reject the null hypothesis for large values of W n = n d n max k n i= j=k+ We first note an invariance property of the test statistic W n. {Xi X j } 2. 8 Lemma The test statistic W n is invariant under strictly monotone transformations of the data, i.e. n d n max k n i= j=k+ {GXi GX j } 2 = n d n for all strictly monotone functions G :. max k n i= j=k+ {Xi X j } 2 Proof. If G is strictly increasing, this is obvious, as GX i GX j if and only if X i X j. If G is strictly decreasing, GX i GX j if and only if X j X i, and thus {GXi GX j } = {Xi X j }. Hence we get i= j=k+ {GXi GX j } = 2 i= j=k+ {Xi X j }, 2 and thus the lemma is proved. Since X i = µ i + Gξ i, under the null hypothesis, W n = max n d n k n {Gξi Gξ j } 2. 9 i= j=k+ Thus, applying Theorem and the continuous mapping theorem, under the null hypothesis, W n converges in distribution, as n, to sup Z m λ λ Z m m! m! 0 λ J m xdf x. In order to set critical values for the asymptotic test based on W n, we need to calculate the distribution of the last expression.

12 In what follows, we will assume that G is a strictly monotone function. In this case, combining 9 and Lemma we get that W n = max n d n k n Note that in this case, J x = Eξ {ξ x} = i= j=k+ x {ξi ξ j } 2. tϕtdt = ϕx, where ϕt = 2π e t2 /2 denotes the standard normal density function. In the last step, we have used the fact that ϕ t = t ϕt. Thus J x 0 for all x and hence the class of functions { {ξ x}, x } has Hermite rank m =. Moreover, as F is the normal distribution function we obtain J xdf x = We have thus proved the following theorem. ϕxϕxdx = ϕs 2 ds = 2 π. 20 Theorem 2 Let ξ i i be a stationary Gaussian process with mean zero, variance and autocovariance function as in with 0 < D <. Moreover, let G : be a strictly monotone function and define X i = µ i + Gξ i. Then, under the null hypothesis H: µ =... = µ n, the test statistic W n, as defined in 8, converges in distribution towards 2 π sup Z λ λz, 0 λ where Z λ λ 0 denotes the standard fractional Brownian motion process with Hurst parameter H = D/2 /2,. The normalizing constant in 8 satisfies n d n /2 Ln D D/2 n 2 D/2. Insert Figure here We have evaluated the distribution of sup 0 λ Z λ λz by simulating 0, 000 realizations of a standard fractional Brownian motion Z j t 0 t, j 0, 000, t =, 0 i 000, using the farma package in. For each realization, we have i 000 2

13 calculated M j := max i 000 Z j i i Zj as a numerical approximation to sup 0 λ Z j λ λ Z j. The empirical distribution F M x := 0, 000 #{ j 0, 000 : M j x} of these 0, 000 maxima was used as approximation to the distribution of sup 0 λ Z λ λz ; see Figure for the estimated probability density and the empirical distribution function, for the Hurst parameter H = 0.7. We have calculated the corresponding upper α-quantiles q α := inf{x : F M x α} 2 for H = 0.6, 0.7, 0.9; that is, D = 2 2H = 0.8, 0.6, 0.2; see Table. Insert Table here 3.2 A difference-of-means test As an alternative, we also consider a test based on differences of means of the observations. We consider the test statistic where D n := D k,n := n d n max D k,n, 22 k n i= j=k+ X i X j. One would reject the null hypothesis 7 if D n is large. To obtain the asymptotic distribution of the test statistic, we apply the functional non-central limit theorem of Taqqu 979 and obtain that n d n [nλ] i= j=[nλ]+ X i X j = [nλ] n [λn] Gξ i [λn] n d n i= w a m λ Z mλ Zm λ m! m! = a m m! Z mλ λ Z m, j=[nλ]+ Z mλ m! Gξ j where m denotes the Hermite rank of Gξ and where a m = EGξH m ξ is the Hermite coefficient. Applying the continuous mapping theorem, we obtain the following theorem concerning the asymptotic distribution of the D n. 3

14 Theorem 3 Let ξ i i be a stationary Gaussian process with mean zero, variance and autocovariance function as in with 0 < D < /m. Moreover, let G : be a measurable function satisfying EG 2 ξ < and define X i = µ i + Gξ i. Then, under the null hypothesis H : µ =... = µ n, the test statistic D n, as defined in 22, converges in distribution towards a m m! sup Z m λ λz m, 0 λ where Z m λ denotes the m-th order Hermite process with Hurst parameter H = Dm/2 /2,. emark 2 i Observe that D k,n may be rewritten as kn k D k,n = X i n d n k n k i= i=k+ Thus, the difference-of-means test is indeed equal to the cusum test, which was investigated by Horvath and Kokoszka 997 and Wang Theorem 3 can be obtained as a corollary of the results of the above mentioned papers. We have presented the short proof in the present paper for the ease of reference. X i ii For a strictly increasing function G, the Hermite rank is, since EGξH ξ = GsH sϕsds = s ϕsgs G sds > 0. Similarly, for a stricly decreasing function G we obtain EGξH ξ < 0. Thus, in these cases the test statistic D n converges under the null hypothesis towards a sup Z λ λz, 0 λ where Z is fractional Brownian motion with index H = D/2. Note that in this case, up to a norming constant, the limit distribution of the difference-of-means test is the same as for the test based on Wilxocon s rank statistics Difference-of-means test under fractional Gaussian noise We want to obtain a lower bound for the power of the difference-of-means test for fractional Gaussian noise, i.e. for the model X i = µ i + ξ i, i =,..., n. 4

15 While the distribution of the difference-of-means test statistic D n, as defined in 22, is not explicitly known, one can calculate the exact distribution of D k,n = n d n i= j=k+ X i X j, for the case of fractional Gaussian noise that we consider here. ecall that fractional Gaussian noise can be obtained by differencing fractional Brownian motion, that is ξ k = B H k B H k, 23 where B H t t 0 is the standard fractional Brownian motion with Hurst parameter H, which we denoted Z. Its covariance is given in 6. The random variables ξ k have then mean zero, variance and autocovariances ρk H2H k 2H 2 = where D = 2 2H, and thus they have long-range dependence. Consider the following alternative D Dk D, 24 2 H λ,h : EX i = 0 for i =,..., [nλ] and EX i = h for i = [nλ] +,..., n. We shall compute the exact distribution of D k,n under H λ,h and thus obtain a lower bound for the power of the test based on D n, defined in 22, since P D n q α P D [nλ],n q α, where {D n q α } is the rejection region and q α is given in 2. First note that d n = n H and thus nd n = n +H, where H is again the Hurst coefficient. D k,n has a normal distribution with mean ED k,n = n +H EX i EX j. i= j=k+ Thus a small calculation shows that { k n [nλ] h n ED k,n = +H n k [nλ] h n +H if k [nλ] if k [nλ]. 5

16 Note that max k n ED k,n = ED [nλ],n = n +H n [nλ] [nλ] h n H λ λ h. Since the variance of D k,n is not changed by the level shift, we get VarD k,n = Var = Var n +H n +H i= j=k+ ξ i ξ j n k ξ i k i= j=k+ ξ j = Var n n kb Hk kb +H H n B H k = Var n n B Hk kb +H H n. By the self-similarity of fractional Brownian motion, we finally get k VarD k,n = Var B H k n n B H k = Var B H + k2 n n Var B H 2 k k 2 n Cov B H, B H n 2H k k = + k2 n n k 2H + k 2H 2 n n n 2H 2 2H+ k k k = + k n n n n + k k 2H n n 2H k = k k k + k k 2H. n n n n n n Defining we thus obtain σ 2 λ = λ 2H λ λ λ + λ λ 2H, VarD k,n = σ 2 k/n. The variance is maximal for k = n/2, in which case we obtain VarD n/2,n = 2 2H The distribution of D k,n gives a lower bound for the power of the difference-of-means test at 6

17 the alternative H λ,h considered above. We have P D n q α P D [nλ],n q α = P D [nλ],n q α + P D [nλ],n q α D[nλ],n + n H λ λh σ2 λ = P +P Φ D[nλ],n + n H λ λh σ2 λ q α + n H λ λh + Φ σ2 λ q α + n H λ λh σ2 λ q α + n H λ λh σ2 λ q α + n H λ λh, σ2 λ where Φ is the c.d.f. of a standard normal random variable. E.g., for H = 0.7, λ = 2, we get q 0.05 = 0.87 using Table and thus n 0.3 h/4 P D n q 0.05 Φ Φ h n 0.3. σ2 /2 In this way, for sample size n = 500 and level shift h = we get Φ as lower bound on the power of the difference-of-means test. For the same sample size, but h = 0.5, we get the lower bound Φ Simulations In this section, we will present the results of a simulation study involving the Wilcoxon type rank test 8 and the difference-of-means test 22. We first investigate whether these tests reach their asymptotic level when applied in a finite sample setting, for sample sizes ranging from n = 0 to n =, 000. Secondly, we compare the power of the two tests for sample size n = 500 at various different alternatives A k : µ =... = µ k µ k+ =... = µ n. We let both the break point k and the level shift h := µ k+ µ k vary. Specifically, we choose k = 25, 50, 50, 250 and h = 0.5,, 2. As a basis for our simulations, we have taken realizations ξ,..., ξ n of a fractional Gaussian noise fgn process with Hurst parameter H; respectively D = 2 2H, see 23 and 24. We have repeated each simulation 0, 000 times. 7

18 5. Normally distributed data In our first simulations, we took Gt = t, so that X i i is fgn. F is then the c.d.f. Φ of a standard normal random variable. In order to determine the critical values for the test statistics W n and D n, we have applied Theorem 2 and Theorem 3. Since G is strictly increasing, Theorem 2 yields that, under the null hypothesis, W n has approximately the same distribution as 2 π sup Z λ λz. 0 λ Since G is the first Hermite polynomial, its Hermite rank is m = and the associated Hermite coefficient is a =. Hence, Theorem 3 yields that, under the null hypothesis, the test statistic D n has approximately the same distribution as sup Z λ λz, λ [0,] We have calculated asymptotic critical values for both tests by using the upper 5%-quantiles of sup λ [0,] Z λ λz, as given in Table. Thus the Wilcoxon-type test rejected the null hypothesis when W n 2 π q α, while the difference-of-means test rejected when D n q α, where q α is given in 2. Insert Table 2 here We have checked whether the tests reach their asymptotic level of 5% and counted the number of rejections of the null hypothesis in 0, 000 simulations, where the null hypothesis was true. We see in Table 2 that both tests perform well already for moderate sample sizes of n = 50, with the notable exception of the Wilcoxon-type test when H = 0.9, i.e. when we have very strong dependence. In that case, convergence in Theorem 2 appears to be very slow so that the asymptotic critical values are misleading when applied in a finite sample setting. Insert Figure 2 here In order to analyze how well the tests detect break points, we have introduced a level 8

19 shift h at time [nλ], i.e. we consider the time series { ξi for i =,..., [nλ] X i = ξ i + h for i = [nλ] +,..., n. We have done this for several choices of λ and h, for sample size n = 500. As Table 3 shows, both tests detect breaks very well and the better, the larger the level shift is and the more in the middle the shift takes place. When the break occurs in the middle, both tests perform equally well. Breaks at the beginning are better detected by the difference-of-means test. Insert Table 3 here 5.2 Heavy-Tailed Data In the second set of simulations, we took Pareto distributed data. Note that the Paretoβ, k distribution has distribution function k β F x = x if x k 0 else, where the scale parameter k is the smallest possible value for x and where β is a shape parameter. The Pareto distribution has finite expected value when β > and finite variance when β > 2. The expected value and the variance are given by µ = EX = βk β, β > σ 2 = VarX = βk 2 β 2 β 2, β > 2. In order to obtain Paretoβ, k-distributed X = Gξ, we take G to be the quantile transform, i.e. Gt = k Φt /β where Φ denotes the standard normal distribution function, so that for x k P X i x = P Gξ i x = P Φξ i /β x β k = P ξ i Φ = k x Since we want the X i to be centered, we will in fact take Gt = k Φt /β 9 β k. x βk β. 25

20 If we want X i to be standardized to have mean 0 and variance, we will consider Z = X µ/σ and take Gt = βk 2 β 2 β 2 /2 kφt /β βk β 26 The corresponding distribution function of Z is then β k if z k µ σz+µ σ F Z z = 0 else. 27 and its density function is f Z z = { k β βσσz + µ β if z k µ σ 0 else 28 Pareto3, Data: We first performed simulations with Pareto3, data, i.e. heavy-tailed data with finite variance. In this case, β = 3, k = and we have EX = 3 2 and VarX = 3 4. For a better comparison with the simulations involving fractional Gaussian noise, we also standardize the data, i.e. we consider see 26, Gt = Φt /3 3. 3/4 2 The probability density function of the standardized X is given by see 28, fx = 2 x + 3 if x 3 0 else. G is again strictly decreasing, and by the above results, the Hermite rank of the class of functions {I {Gξi x} F x, x } is m =, the Hermite rank of G itself is m = and J x df x = 2 π, see 20. Numerical integration yields 4 a = E[ξGξ] = s Φs /3 ϕs ds Table 4 shows the observed level of the tests, for various sample sizes and various Hurst parameters. For sample sizes up to n =, 000, the difference-of-means test has level larger than 0%. We conjecture that this is due to the slow convergence in Theorem 3. This conjecture is supported by the outcomes of simulations with sample sizes n = 2, 000 and n = 0, 000; see Table 4. 20

21 Insert Table 4 here Table 5 gives the observed power of the difference-of-means test and the Wilcoxontype test, for sample size n = 500 and various values of the break points and height of level shift. The results show that the Wilcoxon-type test has larger power than the difference-ofmeans test for small level shift h, but that the difference-of-means test outperforms the Wilcoxon type test for larger level shifts. Insert Table 5 here However, the above comparison is not really meaningful, since the difference-of-means test has a realized level of approximately 0% while the Wilcoxon-type test has level close to 5%; see Table 4. Thus we have calculated the finite sample 5%-quantiles of the distribution of the difference-of-means test, using a Monte-Carlo simulation; see Table 6. For example, for n = 500 and H = 0.7, the corresponding critical value is Thus we reject the null hypothesis of no break point if the difference-of-means test statistic is greater than Insert Table 6 here The value of 0.70 should be contrasted to the asymptotic n = value of To obtain the asymptotic critical values, we proceeded as follows: according to Theorem 3, the asymptotic distribution, under the null hypothesis, of the test statistic D n equals the distribution of a sup Zλ λz. 0 λ Thus the asymptotic upper α-quantiles of D n can be calculated as a q α, where q α is the upper α-quantile of the distribution of sup 0 λ Zλ λz, as tabulated in Table. Insert Table 7 here We have then calculated the power of the difference-of-means test through simulation, 2

22 with n = 500, H = 0.7 and the finite sample quantile critical value of 0.70 rather than the asymptotic value of 0.59 see Table 6. Table 7 shows the power of the test. We can now compare the results of the Wilcoxon-type test given in the right-hand side of Table 5 with the finite sample difference-of-means test results given in Table 7. We see that the Wilcoxontype test has better power than the difference-of-means test, except for large level shifts at an early time. Such changes are detected more often by the difference-of-means test. Pareto2, Data: We now choose k = and β = 2, so that the X have finite expectation, but infinite variance. In order to have centered data, we take G as in 25, i.e. Gt = Φt 2. Insert Table 8 here We will now compare both tests, i.e. the Wilcoxon-type test and the difference of means test, although the latter can strictly speaking not be applied because it requires data with finite variance, respectively G L 2, N. By Theorem 2, under the null hypothesis of no change, the Wilcoxon-type test statistic W n converges in distribution towards 2 π sup Z λ λz. 0 λ In fact, as a consequence of Lemma, even the finite sample distribution of W n is the same as for normally distributed data. Table 8 gives the measured level of the Wilcoxon-type test the asymptotic level is 5% and Table 9 suggests it has good power, especially for small shifts in the middle of the observations. Insert Table 9 here Let us now consider the difference-of-means test. Note that, strictly speaking, Theorem 3 cannot be applied in the case of the Pareto data with β = 2 because it requires the variance of the data to be finite. It is interesting nevertheless to use the asymptotic test suggested by Theorem 3. Since G is strictly monotone, the Hermite rank of G is m = as well, by the 22

23 remark following Theorem 3. Using numerical integration, we have calculated a = E [ξgξ] = sφs /2 2ϕs ds = s Φs /2 ϕs ds We clearly see in Table 8 that the difference-of-means test very often falsely rejects the null hypothesis, that is detects breaks where there are none, while the Wilcoxon-type test is robust. Table 9 shows that both tests have good power, but again, the Wilcoxon tests is clearly better, especially for small shifts in the middle of the observations. 6 Conclusion We have investigated the properties of a nonparametric change-point test in the presence of long-range dependent data. Our test is based on Wilcoxon s two sample rank test statistic. We analytically derive the asymptotic distribution of our test statistic, and we perform a simulation study to calculate its power. The simulation study shows that already for moderate sample sizes, our test attains the asymptotic level. For LD Gaussian data, our test has almost the same power as a difference-of-means test. For heavy-tailed data, our new test outperforms the difference-of-means test. In a subsequent paper, we plan to derive the asymptotic distribution of our test statistic under local alternatives. Acknowledgement. Herold Dehling and Aeneas ooch were supported in part by the Collaborative esearch Grant 823, Project C3 Analysis of Structural Change in Dynamic Processes, of the German esearch Foundation. Murad S. Taqqu was supported in part by the NSF grants DMS and DMS at Boston University. The authors wish to thank the referee of this paper for his/her careful reading of the manuscript and for insightful and constructive comments. eferences [] Antoch, J., Hušková, M., Janic, M.A. & Ledwina, T Data driven rank tests for the change point problem. Metrika 68, 5. [2] Aue, A., Hörmann, S., Horváth, L. & eimherr, M Break detection in the covariance structure of multivariate time series. The Annals of Statistics 37, [3] Berkes, I., Horváth, L., Kokoszka, P. & Shao, Q.-M On discriminating between long-range dependence and changes in the mean. The Annals of Statistics 34,

24 [4] Csörgő, M. & Horváth, L Limit Theorems in Change-Point Analysis. J. Wiley & Sons, Chichester. [5] Dehling, H. & Fried, Asymptotic Distribution of Two-Sample Empirical U-Quantiles with Applications to obust Tests for Shifts in Location. Journal of Multivariate Analysis 05, [6] Dehling, H. & Taqqu, M.S The empirical process of some long-range dependent sequences with an application to U-statistics. The Annals of Statistics 7, [7] Dehling, H. & Taqqu, M.S. 99. Bivariate symmetric statistics of long-range dependent observations. Journal of Statistical Planning and Inference 28, [8] Giraitis, L., Leipus,. & Surgailis, D The change-point problem for dependent observations. Journal of Statistical Planning and Inference 53, [9] Horváth, L. & Kokoszka, P The effect of long-range dependence on change-point estimators. Journal of Statistical Planning and Inference 64, [0] Krämer, W. & Sibbertsen, P. & Kleiber, C Long memory versus structural change in financial time series. Allgemeines Statistisches Archiv 86, [] Lehmann, E Nonparametrics: Statistical Methods Based on anks. Holden-Day, San Francisco. [2] Ling, S Testing for change points in time series models and limiting theorems for NED sequences. The Annals of Statistics 35, [3] Shorack, G.. & Wellner, J.A Empirical Processes with Applications to Statistics. John Wiley & Sons, New York. [4] Taqqu, M.S Convergence of integrated processes of arbitrary Hermite rank. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 50, [5] Wang, L Limit theorems in change-point problems with multivariate long-raneg dependent observations. Statistics & Decisions 2, [6] Wang, L. 2008a. Change-Point Detection with ank Statistics in Long-Memory Time- Series Models. Australian & New Zealand Journal of Statistics 50, [7] Wang, L. 2008b. Change-in-mean problem for long memory time series models with applications. Journal of Statistical Computation and Simulation 78, [8] Wied, D., Krämer, W. & Dehling, H. 20. Testing for a change in correlation at an unknown point in time using an extended functional delta method. Econometric Theory, Available on CJO 20 doi:0.07/s Herold Dehling, Fakultät für Mathematik, uhr-universität Bochum, Universitätsstraße 50, 4480 Bochum, Germany. herold.dehling@rub.de 24

25 Figure : Probability density left and distribution function right of sup 0 λ Zλ λz ; calculations based on 0, 000 simulation runs. 25

26 Figure 2: fgn without breaks top left and with a jump after observation 50 this is [λn] with λ = 0.3 of height h = 0.5 top right, h = bottom left and h = 2 bottom right. 26

27 Table : Upper α-quantiles of sup 0 λ Zλ λz, where Z is a standard fbm, for different LD parameters H, based on 0, 000 repetitions. H / α Table 2: Level of difference-of-means test left and level of Wilcoxon-type test right for fgn time series with LD parameter H; 0, 000 simulation runs. Both tests have asymptotically level 5%. n / H n / H

28 Table 3: Power of difference-of-means test left and power of Wilcoxon-type test right for n = 500 observations of fgn with LD parameter H = 0.7, different break points [λn] and different level shifts h. Both tests have asymptotically level 5%. The calculations are based on 0,000 simulation runs. h / λ h / λ Table 4: Level of difference-of-means test left and level of Wilcoxon-type test right for standardised Pareto3,-transformed fgn with LD parameter H; 0, 000 simulation runs. Both tests have asymptotically level 5%. n / H n / H Table 5: Power of difference-of-means test left and power of Wilcoxon-type test right for n = 500 observations of standardised Pareto3,-transformed fgn with LD parameter H = 0.7, different break points [λn] and different level shifts h. Both tests have asymptotically level 5%.The calculations are based on 0,000 simulation runs. h / λ h / λ

29 Table 6: 5%-quantiles of the finite sample distribution of the difference-of-means test under the null hypothesis for Pareto3,-transformed fgn with different LD parameter H and different sample sizes n. The calculations are based on 0,000 simulation runs. H / n ,000 2,000 0, Table 7: Power of the difference-of-means test, based on the finite sample quantiles, for n = 500 observations of Pareto3,-transformed fgn with LD parameter H = 0.7, different break points [λn] and different level shifts h left. The calculations are based on 0,000 simulation runs. h / λ Table 8: Level of difference-of-means test left and level of Wilcoxon-type test right for Pareto2,-transformed fgn with LD parameter H; 0, 000 simuation runs. n / H n / H

30 Table 9: Power of difference-of-means test left and power of Wilcoxon-type test right for n = 500 observations of Pareto2,-transformed fgn with LD parameter H = 0.7, different break points [λn] and different level shifts h. The calculations are based on 0,000 simulation runs. h / λ h / λ

Statistica Sinica Preprint No: SS R3

Statistica Sinica Preprint No: SS-2015-0435R3 Title Subsampling for General Statistics under Long Range Dependence Manuscript ID SS-2015.0435 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202015.0435