Large sample distribution for fully functional periodicity tests

Siegfried Hörmann, Institute for Statistics, Graz University of Technology. Based on joint work with Piotr Kokoszka (Colorado State) and Gilles Nisol (ULB), and with Clément Cerovecki (ULB) and Vaidotas Characiejus (ULB). Sept 21, 2018.

Pollution data
N = 167 observations of PM10 measured in Graz (Austria) from 10/1/15 to 03/15/16.
[Figure: PM10 by day of week, MO to SU.]
Is there a significant difference in the daily average with respect to the day of the week?

Pollution data
We use ANOVA: $H_0: \mu_{MO} = \cdots = \mu_{SU}$ vs. $H_A$: $H_0$ does not hold.
We obtain a p-value of 0.72 $\Rightarrow$ do not reject $H_0$.

Pollution data
Same time period with NO (nitrogen monoxide).
[Figure: NO by day of week, MO to SU.]
We obtain a p-value of 0.25 $\Rightarrow$ do not reject $H_0$.

Dependence
Autocorrelation functions of daily averages of PM10 and NO.
[Figure: ACF of the series PM10 and NO, lags 0 to 20.]

Functional data
PM10 data in Graz are available in 30-minute resolution. $\Rightarrow$ Functional time series.
Goal: check whether there is an underlying periodic signal.
[Figure: PM10 against time.]

Weekday averages
Intraday mean curves from half-hourly observations.
[Figure: PM10 mean curves and NO mean curves against time, one curve per weekday, Monday to Sunday.]

Three models for periodic functional data
$$Y_t(u) = \mu(u) + [\alpha \cos(t\theta) + \beta \sin(t\theta)]\, w_0(u) + Z_t(u), \tag{1}$$
$$Y_t(u) = \mu(u) + s_t\, w_0(u) + Z_t(u), \tag{2}$$
$$Y_t(u) = \mu(u) + w_t(u) + Z_t(u), \tag{3}$$
with $s_t = s_{t+d}$ and $w_t(u) = w_{t+d}(u)$, and
$$\sum_{t=1}^{d} s_t = 0 \quad \text{and} \quad \sum_{t=1}^{d} w_t(u) = 0.$$

Settings
Part 1: $\theta$ is known.
Part 2: $\theta$ is unknown.
Noise: We start with $Z_t$ i.i.d. $N(0, \Gamma)$. Later we generalize to very general stationary processes.

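Projections
Choose a set of functions $v_1(u), \dots, v_p(u)$ and compute
$$Y_{tk} = \int Y_t(u)\, v_k(u)\, du,$$
which gives rise to the vector models
$$\mathbf{Y}_t = \boldsymbol{\mu} + [\alpha \cos(t\theta) + \beta \sin(t\theta)]\, \mathbf{w}_0 + \mathbf{Z}_t;$$
$$\mathbf{Y}_t = \boldsymbol{\mu} + s_t\, \mathbf{w}_0 + \mathbf{Z}_t;$$
$$\mathbf{Y}_t = \boldsymbol{\mu} + \mathbf{w}_t + \mathbf{Z}_t.$$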
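The projection step can be sketched numerically. This is a minimal illustration, assuming the curves are observed on a common grid and that the integral is approximated by the trapezoid rule; the helper name `project_curves` and the example functions are hypothetical, not part of the talk.

```python
import numpy as np

def project_curves(Y, V, u):
    """Scores Y_tk = integral of Y_t(u) * v_k(u) du, via the trapezoid rule.

    Y: (N, m) curves on the grid u; V: (p, m) projection functions on the same grid.
    Returns an (N, p) array of scores.
    """
    du = np.diff(u)                        # grid spacings, shape (m-1,)
    prod = Y[:, None, :] * V[None, :, :]   # (N, p, m) pointwise products
    return 0.5 * np.sum((prod[..., 1:] + prod[..., :-1]) * du, axis=-1)

# Assumed example: project onto the constant function and sqrt(2) * sin(2 pi u)
u = np.linspace(0.0, 1.0, 201)
V = np.vstack([np.ones_like(u), np.sqrt(2) * np.sin(2 * np.pi * u)])
Y = np.vstack([3.0 * np.ones_like(u), np.sqrt(2) * np.sin(2 * np.pi * u)])
scores = project_curves(Y, V, u)   # rows are approximately (3, 0) and (0, 1)
```

Each row of `scores` is the vector $\mathbf{Y}_t$ that enters the multivariate models.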

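LR tests for projections
Let us assume that $d$ is odd and set $q = (d-1)/2$, $\vartheta_k = 2\pi k/d$.
Theorem. Assume $\Sigma = \mathrm{Var}(\mathbf{Z}_1) > 0$ is known. The LR-test statistics are:
$$T^{M1} := \big\| A(\vartheta_1)\, \Sigma^{-1} A'(\vartheta_1) \big\|,$$
$$T^{M2} := \big\| A(\vartheta_1, \dots, \vartheta_q)\, \Sigma^{-1} A'(\vartheta_1, \dots, \vartheta_q) \big\|,$$
$$T^{M3} := \big\| A(\vartheta_1, \dots, \vartheta_q)\, \Sigma^{-1} A'(\vartheta_1, \dots, \vartheta_q) \big\|_{tr}.$$
Here $A(\theta_{i_1}, \dots, \theta_{i_k}) = [R(\theta_{i_1}), \dots, R(\theta_{i_k}), C(\theta_{i_1}), \dots, C(\theta_{i_k})]$, where
$$R(\theta) + iC(\theta) = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} \mathbf{Y}_k e^{-ik\theta}.$$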

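Model 3: relation to MANOVA
Note that
$$T^{M3} = \big\| A(\vartheta_1, \dots, \vartheta_q)\, \Sigma^{-1} A'(\vartheta_1, \dots, \vartheta_q) \big\|_{tr} \propto T^{MAV},$$
where
$$T^{MAV} = \sum_{k=1}^{d} \frac{N}{d}\, (\bar{\mathbf{Y}}_k - \bar{\mathbf{Y}})' \Sigma^{-1} (\bar{\mathbf{Y}}_k - \bar{\mathbf{Y}}), \qquad \bar{\mathbf{Y}}_k = \frac{1}{n} \sum_{t=1}^{n} \mathbf{Y}_{(t-1)d+k}, \quad 1 \le k \le d.$$
(Balanced design: $N = nd$.)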
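The link between the DFT-based statistic and the MANOVA statistic can be checked numerically. The sketch below assumes the $1/\sqrt{N}$ scaling of the DFT, a balanced design, and simulated Gaussian data with an arbitrary illustrative $\Sigma$; all concrete values are assumptions. Under this scaling the proportionality constant works out to 2; a different DFT scaling would change the constant.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, p = 7, 30, 3                 # period, number of cycles, dimension (assumed values)
N = n * d                          # balanced design
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])  # illustrative covariance, treated as known
Y = rng.multivariate_normal(np.zeros(p), Sigma, size=N)   # rows Y_1, ..., Y_N
Sigma_inv = np.linalg.inv(Sigma)

# DFT at theta_j = 2 pi j / d, j = 1, ..., q, with 1/sqrt(N) scaling
q = (d - 1) // 2
t = np.arange(1, N + 1)
theta = 2 * np.pi * np.arange(1, q + 1) / d
D = np.exp(-1j * np.outer(theta, t)) @ Y / np.sqrt(N)   # (q, p), row j = R + iC at theta_j
A = np.vstack([D.real, D.imag])                         # (2q, p), rows R(theta_j), C(theta_j)
T_M3 = np.trace(A @ Sigma_inv @ A.T)                    # trace norm of a psd matrix = trace

# MANOVA statistic from the d within-period means
Ybar_k = Y.reshape(n, d, p).mean(axis=0)                # (d, p), mean over cycles
diff = Ybar_k - Y.mean(axis=0)
T_MAV = (N / d) * np.einsum("kp,pq,kq->", diff, Sigma_inv, diff)

print(2 * T_M3, T_MAV)   # the two numbers agree up to rounding
```

The identity follows from Parseval's relation for the $d$ within-period means, together with the symmetry $|D(\vartheta_{d-j})| = |D(\vartheta_j)|$ for real data.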

Remarks
o The approach is easy and purely multivariate.
o In practice we replace $\Sigma$ by an estimator.
o NOTE: If we plug in $\hat{\Sigma}_A$, this test is asymptotically equivalent to Wilks' Lambda.
o For all three tests explicit null distributions are available.
o The downside of this method: we need a good basis which picks up the signal.

Remarks
The standard multivariate test of MacNeill for Model 1 is
$$S^{M1} := \| D_N(\vartheta_1) \|^2 = \big\| A(\vartheta_1)\, \Sigma^{-1} A'(\vartheta_1) \big\|_{tr}.$$
(Here $D_N$ is the DFT of the standardized data.)

Rescaling
The tests are computed from the standardized data $\Sigma^{-1/2}\mathbf{Y}_1, \dots, \Sigma^{-1/2}\mathbf{Y}_N$.
$\Rightarrow$ asymptotically pivotal tests;
$\Rightarrow$ component processes with larger variation do not conceal potential periodic patterns in components with little variance.
While this is clearly a very desirable property for multivariate data samples, one may favor a different perspective for functional data.

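Fully functional approach
Let $A(\theta_{i_1}, \dots, \theta_{i_k})$ be a $2k$-vector of functions computed from the functional DFT
$$\frac{1}{\sqrt{N}} \sum_{t=1}^{N} Y_t(u)\, e^{-it\theta}.$$
Fully functional test for Model (3):
$$T^{F3} := \big\| A(\vartheta_1, \dots, \vartheta_q)\, A'(\vartheta_1, \dots, \vartheta_q) \big\|_{tr}.$$
ANOVA model: Again
$$T^{F3} \propto T^{FAV} = \sum_{k=1}^{d} \frac{N}{d}\, \| \bar{Y}_k - \bar{Y} \|^2.$$
The asymptotic distribution of $T^{F3}$ (a functional of DFTs) can be derived and then carried over to $T^{FAV}$.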
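The functional ANOVA statistic is straightforward to compute from discretised curves. The following is a minimal sketch, assuming curves on an equispaced grid of $[0, 1]$ (so that squared $L^2$ norms are approximated by grid averages) and a balanced design; the function name `functional_anova_stat` is hypothetical. It computes the statistic only: the null distribution is not pivotal, so no critical values are produced here.

```python
import numpy as np

def functional_anova_stat(Y, d):
    """T = sum_{k=1}^d (N/d) * ||Ybar_k - Ybar||^2 for curves Y of shape (N, m)."""
    N, m = Y.shape
    if N % d != 0:
        raise ValueError("balanced design required: N must be a multiple of d")
    n = N // d
    Ybar_k = Y.reshape(n, d, m).mean(axis=0)         # (d, m): mean curve per position in the period
    Ybar = Y.mean(axis=0)                            # overall mean curve
    norms2 = np.mean((Ybar_k - Ybar) ** 2, axis=1)   # squared L2 norms as grid averages
    return (N / d) * norms2.sum()

# Noise-free check: Y_t(u) = cos(2 pi t / d) for all u gives T = N / 2
d, n, m = 7, 30, 48
t = np.arange(1, n * d + 1)
Y = np.cos(2 * np.pi * t / d)[:, None] * np.ones(m)
T = functional_anova_stat(Y, d)   # N / 2 = 105 up to rounding
```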

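Theorem. Under $L^2$-$m$-approximability the functional ANOVA test statistic satisfies
$$\sum_{k=1}^{d} \frac{N}{d}\, \| \bar{Y}_k - \bar{Y} \|^2 \xrightarrow{d} 2 \sum_{k=1}^{(d-1)/2} \Xi_k,$$
where $\Xi_k \overset{ind.}{\sim} \mathrm{HExp}(\lambda_1(\vartheta_k), \lambda_2(\vartheta_k), \dots)$ (hypoexponential), and $\lambda_\ell(\vartheta_k)$ are the eigenvalues of the spectral density operator $\mathcal{F}_{\vartheta_k}$.
Remarks:
1. non-pivotal test;
2. consistent estimation of the spectral density operator and its eigenvalues is possible;
3. good empirical approximation.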

Empirical levels
Synthetic but realistic data: PM residuals are resampled and used to generate a functional MA(5) process. 1,000 repetitions.

                        α = 5%                              α = 10%
           N    T^{M1}  T^{M2}  T^{M3}  T^{F3}    T^{M1}  T^{M2}  T^{M3}  T^{F3}
FF        210                             5.0                               8.3
          420                             4.7                               9.9
p = 1     210     5.0     3.9     3.9               9.6     8.7     8.7
          420     5.9     4.4     4.4               9.8     9.9     9.9
p = 2     210     6.4     4.3     3.9              11.2     8.7     8.0
          420     6.0     5.4     4.1              10.6     9.8     9.1
p = 3     210     6.8     3.8     3.9              12.2     7.9     6.9
          420     5.8     5.5     4.2              10.6     9.6     7.9
p = 5     210     7.4     6.4     6.2              15.5    11.6    11.4
          420     6.7     6.4     5.7              12.5    12.2    10.7

Table 1: Empirical size (in %) at the nominal levels α = 5% and α = 10% for dependent time series with sample sizes N = 210 (top rows) and N = 420 (bottom rows).

Power
To study a realistic alternative we add the weekday means back to the previous time series. The signal-to-noise ratio in this example is approximately 1/30.

                     N = 210                             N = 420
         T^{M1}  T^{M2}  T^{M3}  T^{F3}    T^{M1}  T^{M2}  T^{M3}  T^{F3}
FF                                72.9                              99.9
p = 1     14.3    26.5    26.5              21.7    56.6    56.6
p = 2     50.2    89.4    89.9              76.9    99.7    99.8
p = 3     73.4    96.4    98.1              92.2     100     100
p = 5    99.22     100     100               100     100     100

Table 2: Empirical rejection rates (in %) when testing at the nominal level α = 5%. Results are based on 1,000 Monte Carlo simulation runs.

Real data tests for NO data

                   T^{M1}       T^{M2}       T^{M3}       T^{F3}
FF (100%)                                                  0.006
v(u) ≡ 1            0.305        0.099        0.099
p = 1 (68%)         0.247        0.076        0.076
p = 2 (81%)         0.495        0.204        0.172
p = 3 (87%)       < 10^{-5}    < 10^{-5}    < 10^{-5}
p = 5 (97%)       < 10^{-5}    < 10^{-5}    < 10^{-5}

Table 3: The p-values for the NO data. In parentheses, the percentage of variance explained by the first p principal components on which the curves are projected.

Uniform Test
Assume now that we would like to test for a periodic pattern when the frequency $\theta$ is unknown. We consider a test based on
$$M_N = \max_{j=1,\dots,q} \| D_N(\theta_j) \|^2 = \max_{j=1,\dots,q} \big\| A(\theta_j)\, A'(\theta_j) \big\|_{tr}.$$

Scalar setting
In the scalar setting, we know that
$$M_N - \log q \xrightarrow{d} G, \qquad q \sim N/2, \tag{4}$$
where $G$ is the standard Gumbel distribution.
Simple explanation: if the data are i.i.d. Gaussian, then $|D_N(\theta_j)|^2 \overset{iid}{\sim} \mathrm{Exp}(1)$.
Without Gaussianity we typically only have
$$\big( |D_N(\omega_1)|^2, \dots, |D_N(\omega_q)|^2 \big) \xrightarrow{d} (E_1, \dots, E_q), \qquad E_j \overset{iid}{\sim} \mathrm{Exp}(1),$$
if $q$ is fixed.
Nevertheless, Davis and Mikosch (1999) obtained (4) for i.i.d. sequences and linear processes, and Lin and Liu (2009) later extended it to a more general notion of dependence.
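The Gumbel limit in the Gaussian case can be illustrated with a small Monte Carlo experiment. This is a sketch under assumed settings (standard normal data, $N = 512$, 2,000 replications, Fourier frequencies $2\pi j/N$ excluding $0$ and $\pi$); none of these values come from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 512, 2000
q = N // 2 - 1                             # frequencies 2*pi*j/N, j = 1, ..., q

X = rng.standard_normal((reps, N))         # iid N(0, 1) samples, one row per replication
D = np.fft.rfft(X, axis=1) / np.sqrt(N)    # DFT with 1/sqrt(N) scaling
I = np.abs(D[:, 1:q + 1]) ** 2             # |D_N(omega_j)|^2, iid Exp(1) in this Gaussian case
Z = I.max(axis=1) - np.log(q)              # M_N - log q for each replication

# Compare the empirical distribution function with the Gumbel cdf exp(-exp(-x))
xs = np.array([-1.0, 0.0, 1.0, 2.0])
emp = (Z[:, None] <= xs).mean(axis=0)
gum = np.exp(-np.exp(-xs))
print(np.round(emp, 3), np.round(gum, 3))
```

In this Gaussian setting $M_N$ is exactly the maximum of $q$ independent $\mathrm{Exp}(1)$ variables, so $P(M_N - \log q \le x) = (1 - e^{-x}/q)^q$, which is already close to $e^{-e^{-x}}$ for moderate $q$.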

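Main results
For $p \ge 1$, we define
$$M_N^p = \max_{j=1,\dots,q} \| D_N^p(\omega_j) \|^2, \qquad D_N^p(\omega) \overset{K.L.}{=} N^{-1/2} \sum_{t=1}^{N} \sum_{k=1}^{p} \langle X_t, v_k \rangle\, v_k\, e^{-it\omega},$$
where the $v_k$'s are the eigenfunctions of the operator $C_0$ (truncated Karhunen-Loève expansion), associated with the eigenvalues $\lambda_1 > \cdots > \lambda_p$.
Theorem 1. $\dfrac{M_N^p - b_N^p}{\lambda_1} \xrightarrow{d} G$, for a fixed $p \ge 1$.
Theorem 2. $\dfrac{M_N^{p_N} - b_N^{p_N}}{\lambda_1} \xrightarrow{d} G$, for some $p = p_N \to +\infty$.
Theorem 3. $\dfrac{M_N - b_N}{\lambda_1} \xrightarrow{d} G$,
where
$$b_N^p = \lambda_1 \log q - \lambda_1 \sum_{j=2}^{p} \log(1 - \lambda_j/\lambda_1), \qquad b_N = \lim_{p \to +\infty} b_N^p,$$
to be compared with $\lambda_1 \log q + E\|X_0\|^2$.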
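The centring sequence and the resulting asymptotic p-value are simple to evaluate once eigenvalues are available. The sketch below assumes the form $b_N^p = \lambda_1 \log q - \lambda_1 \sum_{j=2}^{p} \log(1 - \lambda_j/\lambda_1)$ and a strictly largest eigenvalue $\lambda_1$; the function names `centering` and `gumbel_pvalue` are hypothetical.

```python
import math

def centering(eigvals, q, p=None):
    """b_N^p = lam_1 * log(q) - lam_1 * sum_{j=2}^p log(1 - lam_j / lam_1).

    Assumes lam_1 is strictly larger than all other eigenvalues.
    With p=None, all supplied eigenvalues are used.
    """
    lam = sorted(eigvals, reverse=True)
    if p is not None:
        lam = lam[:p]
    lam1 = lam[0]
    return lam1 * math.log(q) - lam1 * sum(math.log(1.0 - l / lam1) for l in lam[1:])

def gumbel_pvalue(M, eigvals, q, p=None):
    """Asymptotic p-value 1 - exp(-exp(-x)) with x = (M - b_N^p) / lam_1."""
    lam1 = max(eigvals)
    x = (M - centering(eigvals, q, p)) / lam1
    return 1.0 - math.exp(-math.exp(-x))

# With eigenvalues (1, 0.5) and q = 100: b = log(100) - log(0.5) = log(200)
b = centering([1.0, 0.5], 100)
```

The correction term reflects the tail of a sum of independent exponentials with means $\lambda_j$: $P(\Xi > x) \approx \prod_{j \ge 2} (1 - \lambda_j/\lambda_1)^{-1} e^{-x/\lambda_1}$ for large $x$.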

Idea of proof (with $\lambda_1 = 1$)
Write
$$M_N - b_N = \underbrace{M_N - M_N^p}_{A_1} + \underbrace{M_N^p - b_N^p}_{A_2} + \underbrace{b_N^p - b_N}_{A_3}.$$
Obviously $A_3 \to 0$ when $p \to \infty$. We can show that $A_1 \to 0$ if $p \to \infty$ fast enough. The challenge is $A_2$! We need $p$ to grow slowly enough, and we hope that the set of sequences $p = p_N$ that work for both $A_2$ and $A_1$ is not empty.

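Idea of proof
Let $\tilde{M}_N^p$ be computed as $M_N^p$, but with a Gaussian sequence $Y_1, \dots, Y_N$ such that $\mathrm{Var}(Y_1) = \mathrm{Var}(X_1)$, and define
$$\rho_{N,p} = \sup_{x \in \mathbb{R}} \big| P(M_N^p \le x) - P(\tilde{M}_N^p \le x) \big|.$$
Then
$$\big| P(M_N^p - b_N^p \le x) - e^{-e^{-x}} \big| \le \rho_{N,p} + \big| P(\tilde{M}_N^p - b_N^p \le x) - e^{-e^{-x}} \big|.$$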

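Gaussian approximation
We have that
$$\{ M_N^p \le x \} = \{ S_N^X \in A \},$$
where
$$\mathbf{X}_t = \begin{pmatrix} \langle X_t, v_1 \rangle \cos(t\omega_1) \\ \langle X_t, v_1 \rangle \sin(t\omega_1) \\ \vdots \\ \langle X_t, v_p \rangle \sin(t\omega_q) \end{pmatrix} \in \mathbb{R}^{2pq} \qquad (2pq \approx Np),$$
and $A$ is $2p$-sparsely convex, i.e. a finite intersection of convex sets whose indicator functions each depend on at most $2p$ components.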

Gaussian approximation
We use a result of Chernozhukov, Chetverikov, and Kato (2017) that bounds
$$\sup_{A \in \mathcal{A}^{sp}} \big| P(S_N^X \in A) - P(S_N^Y \in A) \big|,$$
where $\mathcal{A}^{sp}$ is the class of $s$-sparsely convex subsets of $\mathbb{R}^d$.
We had to work around this a bit, since the bound contains one constant that depends (translated into our setup) in an unspecified way on $\lambda_p$. This does not matter if $p$ is fixed!
In the end we deduce that
$$\rho_{N,p} \lesssim \frac{p^3 \log(N)}{\lambda_p^{1/2}\, N^{1/6}}.$$

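Conclusion of the proof
In particular Theorem 3 holds if:
o $\lambda_k \asymp \rho^k$, $\rho < 1$, and $E\|X_1\|^s < \infty$ with $s > 10$. ($p_N \asymp \log(N)$)
o $\lambda_k \asymp 1/k^{\nu}$, $\nu > 1$, and $E\|X_1\|^s < \infty$ with $s$ very big. ($p_N \asymp N^{\gamma}$)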

Comments
o All the results are still valid when $b_N$ is replaced by $\hat{b}_N$.
o We work under $E\|X_1\|^4 < \infty$, but when $p$ is fixed, we can use a truncation argument to relax it to $E\|X_1\|^{2+\epsilon} < \infty$. Hence, we have extended the result of Davis and Mikosch (1999).
o If $p$ is fixed, we have also considered the dependent case, i.e. multivariate linear processes $X_t = \sum_{l \ge 1} \Psi_l(Z_{t-l})$, where the $\Psi_l$'s are matrices and $(Z_t)_{t \in \mathbb{Z}}$ is an i.i.d. sequence in $H$. In that case, we rather work with
$$L_N = \max_{j=1,\dots,q} \big\| \mathcal{F}_{\omega_j}^{-1/2} D_N(\omega_j) \big\|^2.$$

Uniform test
Consider a functional time series $X_t = s_t + \varepsilon_t$, where $t \mapsto s_t$ is periodic with unknown frequency $\omega$ and $(\varepsilon_t)_{t \in \mathbb{Z}}$ is i.i.d. noise in $H$. We wish to test
$$H_0: s_t = 0 \ \ \forall t \qquad \text{vs.} \qquad H_1: \exists\, t \text{ such that } \|s_t\| > 0.$$
From the previous result, we deduce that under $H_0$,
$$T = \frac{M_N - b_N}{\lambda_1} \xrightarrow{d} G.$$
On the other hand, $T \xrightarrow{P} +\infty$ under $H_1$.
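A rough end-to-end sketch of such a test on discretised curves is given below. It rests on several assumptions that are not taken from the talk: an equispaced grid on $[0, 1]$, i.i.d. noise, eigenvalues estimated from the empirical covariance of the sample, and the centring computed from those estimates. The function name `uniform_periodicity_test` and the simulated example are hypothetical.

```python
import numpy as np

def uniform_periodicity_test(X, p=None):
    """X: (N, m) array, row t = curve X_t on an equispaced grid of [0, 1].

    Returns (T, p_value, j_hat); j_hat indexes the maximising frequency 2*pi*j_hat/N.
    """
    N, m = X.shape
    q = (N - 1) // 2
    Xc = X - X.mean(axis=0)
    # Functional DFT at omega_j = 2*pi*j/N, j = 1, ..., q (one complex curve per frequency)
    D = np.fft.fft(Xc, axis=0)[1:q + 1] / np.sqrt(N)
    norms2 = np.mean(np.abs(D) ** 2, axis=1)        # squared L2 norms as grid averages
    M = norms2.max()
    # Eigenvalues of the discretised empirical covariance operator, largest first
    lam = np.clip(np.linalg.eigvalsh(Xc.T @ Xc / (N * m))[::-1], 0.0, None)[:p]
    b = lam[0] * np.log(q) - lam[0] * np.sum(np.log(1.0 - lam[1:] / lam[0]))
    T = (M - b) / lam[0]
    p_value = -np.expm1(-np.exp(-T))                # 1 - exp(-exp(-T)), Gumbel upper tail
    return T, p_value, int(norms2.argmax()) + 1

# Assumed example: two-component Gaussian noise plus a signal with period 8
rng = np.random.default_rng(2)
N, m = 200, 50
u = np.linspace(0.0, 1.0, m)
t = np.arange(1, N + 1)
noise = (rng.standard_normal((N, 1)) * np.ones(m)
         + 0.5 * rng.standard_normal((N, 1)) * np.sqrt(2) * np.sin(2 * np.pi * u))
signal = np.cos(2 * np.pi * 25 * t / N)[:, None] * np.ones(m)
T, p_value, j_hat = uniform_periodicity_test(noise + signal)
```

Under the alternative the covariance estimate is inflated by the signal, which makes the sketch somewhat conservative; with dependent noise the spectral density operator would have to replace the covariance operator.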

Simulation
We consider 3 different alternative hypotheses:
o low frequency, $s_t^{(1)} = .5 \cos(t\, 2\pi/40)$
o high frequency, $s_t^{(2)} = .5 \cos(t\, 2\pi/7)$
o mixture, $s_t^{(3)} = .3 \cos(t\, 2\pi/40) + .3 \cos(t\, 2\pi/20) + .3 \cos(t\, 2\pi/7)$
[Figure: plot against time.]

Simulations
Rejection rates under the null and 3 different alternatives at level α = 0.05 with increasing sample size, computed with 1,000 replications.

           N = 100   N = 200   N = 500   N = 1000
X^{(0)}      0.06      0.05      0.06      0.06
X^{(1)}      0.18      0.87      0.97      1.00
X^{(2)}      0.31      0.49      0.97      1.00
X^{(3)}      0.14      0.39      0.87      1.00

$X_t^{(i)} = s_t^{(i)} + \varepsilon_t$, $t = 1, \dots, N$, with $s_t^{(0)} := 0$, and the curves $\varepsilon_t$ are bootstrapped from the PM10 data set.

Simulations
We set
$$\hat{\omega} = \operatorname*{argmax}_{\omega_j,\ j=1,\dots,q} \| D_N(\omega_j) \|^2.$$
[Figure: panels for $X^{(0)}$, $X^{(1)}$, $X^{(2)}$, $X^{(3)}$, each with axis range 0.0 to 3.0, for n = 100, 200, 500, 1000.]

Summary
o We have proposed several tests for periodicity of functional data.
o We distinguish between a projection-based approach and a fully functional approach.
o The fully functional approach is theoretically and practically appealing.
o The projection-based approach is very powerful, but needs tuning.
o We developed our tests from the LR approach, which also improves the multivariate theory.
o Extension to dependent data.
o Extension to unknown length of period.