Testing the Equality of Covariance Operators in Functional Samples

Scandinavian Journal of Statistics, Vol. 40: 138–152, 2013
Board of the Foundation of the Scandinavian Journal of Statistics. Published by Blackwell Publishing Ltd.

STEFAN FREMDT and JOSEF G. STEINEBACH
Mathematical Institute, University of Cologne

LAJOS HORVÁTH
Department of Mathematics, University of Utah

PIOTR KOKOSZKA
Department of Statistics, Colorado State University

ABSTRACT. We propose a non-parametric test for the equality of the covariance structures in two functional samples. The test statistic has a chi-square asymptotic distribution with a known number of degrees of freedom, which depends on the level of dimension reduction needed to represent the data. A detailed analysis of the asymptotic properties is developed. Finite sample performance is examined by a simulation study and an application to egg-laying curves of fruit flies.

Key words: asymptotic distribution, covariance operator, functional data, quadratic forms, two sample problem.

1. Introduction

The last decade has seen increasing interest in methods of functional data analysis, which offer novel and effective tools for problems where curves can naturally be viewed as data objects. The books by Ramsay & Silverman (2005) and Ramsay et al. (2009) offer comprehensive introductions to the subject, the collection edited by Ferraty & Romain (2011) reviews recent developments with a focus on advances in the relevant theory, while the monographs of Bosq (2000), Ferraty & Vieu (2006) and Horváth & Kokoszka (2012) develop the field in several important directions. Despite the emergence of many alternative ways of looking at functional data, and of many dimension reduction approaches, the functional principal components (FPCs) remain the most important starting point for many functional data analysis procedures; Reiss & Ogden (2007), Gervini (2008), Yao & Müller (2010) and Gabrys et al. (2010) are just a handful of illustrative references.
The FPCs are the eigenfunctions of the covariance operator. This paper focuses on testing whether the covariance operators of two functional samples are equal. By the Karhunen–Loève expansion, this is equivalent to testing whether both samples have the same FPCs (eigenfunctions together with their eigenvalues). Benko et al. (2009) developed bootstrap procedures for testing the equality of specific FPCs. Panaretos et al. (2010) proposed a test of the type we consider, but assuming that the curves are Gaussian. The main result of Panaretos et al. (2010) follows as a corollary of our more general approach (theorem 2). A generalization to non-Gaussian data was discussed in Panaretos et al. (2010, 2011). See also Boente et al. (2011), who recently studied a related approach together with a corresponding bootstrap procedure. Despite their importance, two sample problems for functional data have received relatively little attention. In addition to the work of Benko et al. (2009) and Panaretos et al. (2010), the relevant references are Horváth et al. (2009) and Horváth et al. (2012), who focus, respectively, on the regression kernels in functional linear models and on the mean of functional data exhibiting temporal dependence. For a recent contribution, see also Gaines et al. (2011), who

use a likelihood ratio-type approach for testing the equality of two covariance operators. Clearly, if some population parameters of two functional samples are different, estimating them from the pooled sample may lead to spurious conclusions. Given the importance of the FPCs, a relatively simple non-parametric procedure for testing the equality of the covariance operators is called for.

The remainder of this paper is organized as follows. Section 2 sets out the notation and definitions. The construction of the test statistic and its asymptotic properties are developed in section 3. Section 4 reports the results of a simulation study and illustrates the procedure by an application to egg-laying curves of Mediterranean fruit flies. The proofs of the asymptotic results of section 3 are given in section 5.

2. Preliminaries

Let $X_1, X_2, \dots, X_N$ be independent, identically distributed random variables with values in $L^2[0,1]$, the Hilbert space of square-integrable $\mathbb{R}$-valued functions on $[0,1]$, and set $EX_i(t) = \mu(t)$ and $\mathrm{cov}(X_i(t), X_i(s)) = C(t,s)$. We assume that another sample $X^*_1, X^*_2, \dots, X^*_M$ is also available and let $\mu^*(t) = EX^*_i(t)$ and $C^*(t,s) = \mathrm{cov}(X^*_i(t), X^*_i(s))$ for $t, s \in [0,1]$. We wish to test the null hypothesis
$$H_0: C = C^*$$
against the alternative $H_A$ that $H_0$ does not hold. A crucial assumption for the asymptotics of our test procedure is that
$$\Theta_{N,M} = \frac{N}{N+M} \to \Theta \in (0,1) \quad \text{as } N, M \to \infty. \tag{1}$$
For the construction of our test procedure, we will use an estimate [cf. (4)] of the asymptotic pooled covariance operator R of the two samples, which is defined by the kernel
$$R(t,s) = \Theta C(t,s) + (1-\Theta) C^*(t,s).$$
For samples $X_i$ and $X^*_j$ of Gaussian random functions, this approach was applied successfully by Panaretos et al. (2010) to construct an asymptotic test for the equality of two covariance operators (see also Panaretos et al., 2011).
Denote by $(\lambda_1, \varphi_1), (\lambda_2, \varphi_2), \dots$ the eigenvalue/eigenfunction pairs of R, which are defined by
$$\lambda_k \varphi_k(t) = R\varphi_k(t) = \int_0^1 R(t,s)\varphi_k(s)\,ds, \qquad t \in [0,1],\ 1 \le k < \infty. \tag{2}$$
Throughout this paper, we assume
$$\lambda_1 > \lambda_2 > \cdots > \lambda_p > \lambda_{p+1}, \tag{3}$$
i.e. there exist at least p distinct (positive) eigenvalues. Under assumption (3), we can choose $\varphi_1, \dots, \varphi_p$ satisfying (2) uniquely (up to signs) if we require $\|\varphi_i\| = 1$, where $\|\cdot\|$ always denotes the $L^2$-norm, e.g. for $x \in L^2([0,1])$, $\|x\| = \big(\int_0^1 x^2(t)\,dt\big)^{1/2}$. Thus, under (3), $\{\varphi_i, 1 \le i \le p\}$ is an orthonormal system that can be extended to an orthonormal basis $\{\varphi_i, 1 \le i < \infty\}$.
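Numerically, the eigenproblem (2) is usually solved after discretizing the kernel. The following is a minimal sketch, assuming curves on an equispaced grid with spacing h and a Riemann-sum approximation of the integral in (2) (neither choice is prescribed here): the operator becomes the matrix $h\,R(t_a, t_b)$, and the eigenvectors are rescaled to unit $L^2$ norm. As a check, it uses the Brownian motion kernel $R(t,s) = \min(t,s)$, whose eigenvalues $1/((k-1/2)^2\pi^2)$ are known in closed form. The helper name `kernel_eigen` is hypothetical.

```python
import numpy as np

def kernel_eigen(K, h, p):
    """Leading p eigenpairs of the integral operator with kernel K on a grid.

    K[a, b] approximates R(t_a, t_b) on an equispaced grid with spacing h, so
    (R phi)(t_a) is approximated by sum_b K[a, b] * phi(t_b) * h.
    """
    lam, vecs = np.linalg.eigh(K * h)       # symmetric eigenproblem, ascending order
    order = np.argsort(lam)[::-1][:p]       # lambda_1 >= lambda_2 >= ...
    phi = vecs[:, order] / np.sqrt(h)       # unit L^2 norm: h * sum(phi**2) = 1
    return lam[order], phi

# Brownian motion kernel R(t, s) = min(t, s) on the grid t_a = a / n.
n = 500
h = 1.0 / n
t = np.arange(1, n + 1) / n
lam, phi = kernel_eigen(np.minimum.outer(t, t), h, p=3)
exact = 1.0 / ((np.arange(1, 4) - 0.5) ** 2 * np.pi ** 2)
print(np.round(lam, 4), np.round(exact, 4))
```

On this grid the discretized eigenvalues exceed the exact ones by a relative amount of roughly $1/n$, so refining the grid is the simplest way to tighten the approximation.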

If $H_0$ holds, then $(\lambda_i, \varphi_i)$, $1 \le i < \infty$, are also the eigenvalues/eigenfunctions of the covariance operators C of the first and $C^*$ of the second sample. To construct a test statistic which converges under $H_0$, we can therefore pool the two samples, as explained in section 3.

3. The test and the asymptotic results

Along the lines of Panaretos et al. (2010), our procedure is also based on projecting the observations onto a suitably chosen finite-dimensional space. To define this space, introduce the empirical pooled covariance operator $\hat R_{N,M}$ defined by the kernel
$$\hat R_{N,M}(t,s) = \frac{1}{N+M}\Big\{\sum_{k=1}^N (X_k(t) - \bar X_N(t))(X_k(s) - \bar X_N(s)) + \sum_{k=1}^M (X^*_k(t) - \bar X^*_M(t))(X^*_k(s) - \bar X^*_M(s))\Big\}, \tag{4}$$
where
$$\bar X_N(t) = \frac{1}{N}\sum_{k=1}^N X_k(t) \quad \text{and} \quad \bar X^*_M(t) = \frac{1}{M}\sum_{k=1}^M X^*_k(t)$$
are the sample mean functions. Let $(\hat\lambda_i, \hat\varphi_i)$ denote the eigenvalues/eigenfunctions of $\hat R_{N,M}$, i.e.
$$\hat\lambda_i \hat\varphi_i(t) = \hat R_{N,M}\hat\varphi_i(t) = \int_0^1 \hat R_{N,M}(t,s)\hat\varphi_i(s)\,ds, \qquad t \in [0,1],\ 1 \le i \le N+M,$$
with $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots$. We can and will assume that the $\hat\varphi_i$ form an orthonormal system. We consider the projections
$$\hat a_k(i) = \langle X_k - \bar X_N, \hat\varphi_i\rangle = \int_0^1 (X_k(t) - \bar X_N(t))\hat\varphi_i(t)\,dt \tag{5}$$
and
$$\hat a^*_k(j) = \langle X^*_k - \bar X^*_M, \hat\varphi_j\rangle = \int_0^1 (X^*_k(t) - \bar X^*_M(t))\hat\varphi_j(t)\,dt, \tag{6}$$
where $\langle\cdot,\cdot\rangle$ denotes the inner product of two elements of the Hilbert space $L^2[0,1]$. To test $H_0$, we compare the matrices $\hat\Delta_N$ and $\hat\Delta^*_M$ with entries
$$\hat\Delta_N(i,j) = \frac{1}{N}\sum_{k=1}^N \hat a_k(i)\hat a_k(j), \qquad 1 \le i, j \le p,$$
and
$$\hat\Delta^*_M(i,j) = \frac{1}{M}\sum_{k=1}^M \hat a^*_k(i)\hat a^*_k(j), \qquad 1 \le i, j \le p.$$
We note that $\hat\Delta_N(i,j) - \hat\Delta^*_M(i,j)$ is the projection of $\hat C_N(t,s) - \hat C^*_M(t,s)$ in the direction of $\hat\varphi_i(t)\hat\varphi_j(s)$, where
$$\hat C_N(t,s) = \frac{1}{N}\sum_{k=1}^N (X_k(t) - \bar X_N(t))(X_k(s) - \bar X_N(s))$$

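and
$$\hat C^*_M(t,s) = \frac{1}{M}\sum_{k=1}^M (X^*_k(t) - \bar X^*_M(t))(X^*_k(s) - \bar X^*_M(s))$$
are the empirical covariances of the two samples. We create the vector $\hat\xi_{N,M}$ by stacking the columns of $\hat\Delta_N - \hat\Delta^*_M$ on and below the diagonal:
$$\hat\xi_{N,M} = \mathrm{vech}\big(\hat\Delta_N - \hat\Delta^*_M\big) = \begin{pmatrix} \hat\Delta_N(1,1) - \hat\Delta^*_M(1,1) \\ \hat\Delta_N(2,1) - \hat\Delta^*_M(2,1) \\ \vdots \\ \hat\Delta_N(p,p) - \hat\Delta^*_M(p,p) \end{pmatrix}. \tag{7}$$
For the properties of the vech operator, we refer to Abadir & Magnus (2005). Next, we estimate the asymptotic covariance matrix of $(NM/(N+M))^{1/2}\hat\xi_{N,M}$. Note that, in general, this estimate differs from the one used in the Gaussian case (cf. Panaretos et al., 2010, and theorem 2). Let
$$\begin{aligned}
\hat L_{N,M}(k,k') ={}& (1-\Theta_{N,M})\Big\{\frac{1}{N}\sum_{\ell=1}^N \hat a_\ell(i)\hat a_\ell(j)\hat a_\ell(i')\hat a_\ell(j') - \langle \hat C_N\hat\varphi_i, \hat\varphi_j\rangle\langle \hat C_N\hat\varphi_{i'}, \hat\varphi_{j'}\rangle\Big\} \\
&+ \Theta_{N,M}\Big\{\frac{1}{M}\sum_{\ell=1}^M \hat a^*_\ell(i)\hat a^*_\ell(j)\hat a^*_\ell(i')\hat a^*_\ell(j') - \langle \hat C^*_M\hat\varphi_i, \hat\varphi_j\rangle\langle \hat C^*_M\hat\varphi_{i'}, \hat\varphi_{j'}\rangle\Big\},
\end{aligned}$$
where $i, j, i', j'$ depend on $k, k'$ (see below), and $\hat C_N$ ($\hat C^*_M$) is interpreted as an operator, defined by
$$\hat C_N\hat\varphi_i = \int_0^1 \hat C_N(\cdot, s)\hat\varphi_i(s)\,ds.$$
(An analogous definition holds for $\hat C^*_M$.) From this definition it follows that
$$\langle \hat C_N\hat\varphi_i, \hat\varphi_j\rangle = \frac{1}{N}\sum_{\ell=1}^N \hat a_\ell(i)\hat a_\ell(j).$$
There are other ways to estimate the asymptotic covariance matrix. For example, one can use $\hat L^*_{N,M}(k,k')$ instead of $\hat L_{N,M}(k,k')$, where $\hat L^*_{N,M}(k,k')$ is defined like $\hat L_{N,M}(k,k')$, but $\langle \hat C_N\hat\varphi_i, \hat\varphi_j\rangle$ and $\langle \hat C^*_M\hat\varphi_i, \hat\varphi_j\rangle$ are replaced with 0 if $i \ne j$ and with $\hat\lambda_i$ if $i = j$. In the same spirit, $\langle \hat C_N\hat\varphi_{i'}, \hat\varphi_{j'}\rangle$ and $\langle \hat C^*_M\hat\varphi_{i'}, \hat\varphi_{j'}\rangle$ are replaced with 0 if $i' \ne j'$ and with $\hat\lambda_{i'}$ if $i' = j'$.

The index $(i,j)$ is computed from k in the following way. Let
$$\tilde k = \frac{p(p+1)}{2} - k + 1, \qquad \tilde i = p - i + 1 \qquad \text{and} \qquad \tilde j = p - j + 1. \tag{8}$$
Reversing the order in this way turns the column-wise vech ordering of the lower triangle into the column-wise ordering of an upper triangular matrix $(a_{\tilde i, \tilde j})$, in which position $\tilde k$ of column $\tilde j$ holds the entry in row $\tilde i$. Then, for column $\tilde j$, we have $(\tilde j - 1)\tilde j/2 < \tilde k \le \tilde j(\tilde j + 1)/2$. Thus,
$$\tilde j = \Big\lceil \sqrt{2\tilde k + 1/4} - 1/2 \Big\rceil \qquad \text{and} \qquad \tilde i = \tilde k - \frac{(\tilde j - 1)\tilde j}{2},$$
where $\lceil r \rceil = \min\{k \in \mathbb{Z} : k \ge r\}$ for $r \in \mathbb{R}$. Consequently, the index $(i,j)$ can be computed from k via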

$$j = p + 1 - \Big\lceil \sqrt{2\Big(\frac{p(p+1)}{2} - k + 1\Big) + \frac{1}{4}} - \frac{1}{2} \Big\rceil \qquad \text{and} \qquad i = k + p - pj + \frac{j(j-1)}{2}. \tag{9}$$
With the above notation, we can formulate the main result of this paper in the non-Gaussian case. The latter case was mentioned briefly (without any mathematical details) in the concluding remarks of Panaretos et al. (2010) (see also Panaretos et al., 2011).

Theorem 1. We assume that $H_0$, (1) and (3) hold, and
$$\int_0^1 E(X_1(t))^4\,dt < \infty, \qquad \int_0^1 E(X^*_1(t))^4\,dt < \infty. \tag{10}$$
Then,
$$\frac{NM}{N+M}\,\hat\xi^T_{N,M}\hat L^{-1}_{N,M}\hat\xi_{N,M} \xrightarrow{\mathcal{D}} \chi^2_{p(p+1)/2}, \qquad N, M \to \infty,$$
where $\chi^2_{p(p+1)/2}$ stands for a $\chi^2$ random variable with $p(p+1)/2$ degrees of freedom.

Theorem 1 implies that the null hypothesis is rejected if the test statistic
$$\hat T_1 = \frac{NM}{N+M}\,\hat\xi^T_{N,M}\hat L^{-1}_{N,M}\hat\xi_{N,M}$$
exceeds a critical quantile of the chi-square distribution with $p(p+1)/2$ degrees of freedom. If both samples are Gaussian random processes, the quadratic form $\hat\xi^T_{N,M}\hat L^{-1}_{N,M}\hat\xi_{N,M}$ can be replaced with the normalized sum of the squares of $\hat\Delta_N(i,j) - \hat\Delta^*_M(i,j)$, as stated in theorem 2 (cf. Panaretos et al., 2010).

Theorem 2. If $X_1$, $X^*_1$ are Gaussian processes and the conditions of theorem 1 are satisfied, then, as $N, M \to \infty$,
$$\hat T_2 = \frac{NM}{2(N+M)}\sum_{1 \le i, j \le p}\frac{\big(\hat\Delta_N(i,j) - \hat\Delta^*_M(i,j)\big)^2}{\hat\lambda_i\hat\lambda_j} \xrightarrow{\mathcal{D}} \chi^2_{p(p+1)/2}.$$
Observe that the statistic $\hat T_2$ can be written as
$$\hat T_2 = \frac{NM}{N+M}\Big\{\sum_{1 \le i < j \le p}\frac{\big(\hat\Delta_N(i,j) - \hat\Delta^*_M(i,j)\big)^2}{\hat\lambda_i\hat\lambda_j} + \sum_{i=1}^p \frac{\big(\hat\Delta_N(i,i) - \hat\Delta^*_M(i,i)\big)^2}{2\hat\lambda_i^2}\Big\}.$$
Next, we discuss the consistency of the testing procedure based on theorem 1. Analogously to the definition of $\hat\xi_{N,M}$, we define the vector $\xi = (\xi(1), \dots, \xi(p(p+1)/2))^T$ using the columns of the matrix
$$D = \Big(\int_0^1\!\!\int_0^1 (C(t,s) - C^*(t,s))\varphi_i(t)\varphi_j(s)\,dt\,ds\Big)_{i,j = 1, \dots, p} \tag{11}$$
instead of $\hat\Delta_N - \hat\Delta^*_M$, i.e. $\xi = \mathrm{vech}(D)$.
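The construction (4)–(7), the estimate $\hat L_{N,M}$, the index map (8)–(9) and the statistics of theorems 1 and 2 can be combined into a single routine. The sketch below is an illustration under stated assumptions, not the authors' code: both samples are observed on the same equispaced grid with spacing h, all integrals are Riemann sums, and the helper names `vech_index` and `covariance_statistics` are hypothetical. The bracketed terms of $\hat L_{N,M}$ are computed as sample covariances (divisor N or M) of the products $\hat a_\ell(i)\hat a_\ell(j)$, which is the same quantity.

```python
import math
import numpy as np

def vech_index(k, p):
    """1-based vech position k -> 1-based matrix index (i, j), i >= j, via (8)-(9)."""
    k_tilde = p * (p + 1) // 2 - k + 1
    j = p + 1 - math.ceil(math.sqrt(2 * k_tilde + 0.25) - 0.5)
    i = k + p - p * j + j * (j - 1) // 2
    return i, j

def covariance_statistics(X, Y, p, h):
    """Return (T1, T2, df) for samples X (N x n) and Y (M x n) on a common grid.

    Assumption: equispaced grid with spacing h; integrals are Riemann sums.
    """
    N, M = len(X), len(Y)
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)      # centred curves
    R_hat = (Xc.T @ Xc + Yc.T @ Yc) / (N + M)             # kernel (4) on the grid
    lam, vecs = np.linalg.eigh(R_hat * h)                 # discretized operator
    order = np.argsort(lam)[::-1][:p]                     # p largest eigenvalues
    lam, phi = lam[order], vecs[:, order] / math.sqrt(h)  # unit L^2 norm
    a, b = Xc @ phi * h, Yc @ phi * h                     # scores (5) and (6)
    Delta = a.T @ a / N - b.T @ b / M                     # Delta_N - Delta*_M
    df = p * (p + 1) // 2
    pairs = [vech_index(k, p) for k in range(1, df + 1)]
    xi = np.array([Delta[i - 1, j - 1] for i, j in pairs])  # vech, as in (7)
    # Column k of A (B) holds the products a_l(i) a_l(j) for the k-th vech pair.
    A = np.column_stack([a[:, i - 1] * a[:, j - 1] for i, j in pairs])
    B = np.column_stack([b[:, i - 1] * b[:, j - 1] for i, j in pairs])
    theta = N / (N + M)
    L_hat = ((1 - theta) * (A.T @ A / N - np.outer(A.mean(0), A.mean(0)))
             + theta * (B.T @ B / M - np.outer(B.mean(0), B.mean(0))))
    scale = N * M / (N + M)
    T1 = float(scale * xi @ np.linalg.solve(L_hat, xi))
    T2 = float(scale / 2 * np.sum(Delta ** 2 / np.outer(lam, lam)))
    return T1, T2, df

# Usage: two independent samples of Brownian-motion-like curves (H0 holds).
rng = np.random.default_rng(0)
n, h = 100, 1.0 / 100
X = np.cumsum(rng.normal(scale=math.sqrt(h), size=(200, n)), axis=1)
Y = np.cumsum(rng.normal(scale=math.sqrt(h), size=(200, n)), axis=1)
T1, T2, df = covariance_statistics(X, Y, p=2, h=h)
print(f"T1 = {T1:.2f}, T2 = {T2:.2f}, df = {df}")
```

Replacing `Y` by `3 * Y` multiplies the second covariance operator by 9; in that case $\hat T_1$ should typically lie far above the usual $\chi^2_3$ quantiles, in line with theorem 3.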

Theorem 3. We assume that $H_A$, (1), (3) and (10) hold. Then there exist random variables $\hat h_1 = \hat h_1(N,M), \dots, \hat h_{p(p+1)/2} = \hat h_{p(p+1)/2}(N,M)$, taking values in $\{-1, 1\}$, such that, as $N, M \to \infty$,
$$\max_{1 \le i \le p(p+1)/2} \big|\hat\xi_{N,M}(i) - \hat h_i\xi(i)\big| = o_P(1) \tag{12}$$
and therefore
$$\|\hat\xi_{N,M}\| \xrightarrow{P} \|\xi\|, \tag{13}$$
where $\|\cdot\|$ denotes the Euclidean norm. If $\xi \ne 0$ and the p largest eigenvalues of C and $C^*$ are positive, we also have
$$\hat T_1 \xrightarrow{P} \infty, \qquad N, M \to \infty. \tag{14}$$
The assumption that the p largest eigenvalues of C and $C^*$ are positive implies that the random functions $X_i$, $i = 1, \dots, N$, and $X^*_j$, $j = 1, \dots, M$, are not contained in a $(p-1)$-dimensional subspace.

Remark 1. The application of the test requires the selection of the number p of empirical FPCs to be used. A rule of thumb is to choose p so that the first p empirical FPCs in each sample (i.e. those calculated as the eigenfunctions of $\hat C_N$ and $\hat C^*_M$) explain about 85–90 per cent of the variance in each sample. Choosing p too large generally degrades the finite sample performance of tests of this type; for this reason, we do not study asymptotics as p tends to infinity. It is often illustrative to apply the test for a range of values of p; each p specifies a level of relevance of differences in the curves or kernels. A good practical approach is to look at the Karhunen–Loève approximations of the curves in both samples, and to choose a p which gives approximation errors that can be considered unimportant. Cross validation has also been suggested in the literature, but its properties have not been investigated in detail. For a more formal discussion of this selection, see also section 3.3 in Panaretos et al. (2010).

4. A simulation study and an application

We first describe the results of a simulation study designed to evaluate the finite sample properties of the tests based on the statistics $\hat T_1$ and $\hat T_2$. The emphasis is on verifying the advantage of a non-parametric procedure, i.e. on assessing its robustness to violations of the assumption of normality.
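Applying the rejection rule that follows theorem 1, or counting rejections to obtain empirical sizes, only requires upper-tail probabilities of $\chi^2_{p(p+1)/2}$, whose degrees of freedom are always an integer. A stdlib-only sketch, using the standard closed-form series for integer degrees of freedom (our choice; any library chi-square survival function would serve equally well). The names `chi2_sf` and `reject` are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) for an integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{j=0}^{df/2 - 1} h^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{(df-1)/2} h^(j - 1/2) / Gamma(j + 1/2)
    term, total = math.sqrt(h) / math.gamma(1.5), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= h / (j + 0.5)          # Gamma(j + 3/2) = (j + 1/2) Gamma(j + 1/2)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total

def reject(T, p, alpha=0.05):
    """Reject H0 when the chi^2_{p(p+1)/2} p-value of statistic T is below alpha."""
    return chi2_sf(T, p * (p + 1) // 2) < alpha

print(chi2_sf(7.814727903251178, 3))   # upper 5% point for p = 2
print(reject(20.0, p=2), reject(1.0, p=2))
```

In a size study, the empirical size at level alpha is simply the fraction of replications for which `reject` returns `True` under $H_0$.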
We simulated Gaussian curves as Brownian motions and Brownian bridges, and non-Gaussian curves via
$$X(t) = A\sin(\pi t) + B\sin(2\pi t) + C\sin(4\pi t), \tag{15}$$
where $A = 5Y_1$, $B = 3Y_2$, $C = Y_3$ and $Y_1, Y_2, Y_3$ are independent $t_5$-distributed random variables (and similarly $X^*(t)$ for the second sample). All curves were simulated at equidistant points in the interval $[0,1]$ and transformed into functional data objects using the Fourier basis with 49 basis functions. For each data generating process, we used one thousand replications. Table 1 shows the empirical sizes for non-Gaussian data. The test based on $\hat T_2$ has severely inflated size, due to the violation of the assumption of normality. As documented in Panaretos et al. (2010), and confirmed by our own simulations, this test has very good empirical size when the data are Gaussian. The test based on $\hat T_1$ is conservative, especially for smaller sample sizes. This is true for both Gaussian and non-Gaussian data; the empirical size of this test differs little across data-generating processes.
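The data-generating processes can be sketched as follows. Assumptions beyond the text: the grid of 101 points, the seed, and the omission of the Fourier-basis smoothing (49 basis functions) applied to the simulated curves; the names `simulate_model15` and `simulate_brownian` are hypothetical. The $t_5$ variables come from NumPy's `Generator.standard_t`, and the Brownian bridge is obtained as $W(t) - tW(1)$.

```python
import numpy as np

def simulate_model15(n_curves, t, rng, df=5):
    """Curves X(t) = A sin(pi t) + B sin(2 pi t) + C sin(4 pi t) as in (15),
    with A = 5 Y1, B = 3 Y2, C = Y3 and Y1, Y2, Y3 independent t_df variables."""
    Y = rng.standard_t(df, size=(n_curves, 3))
    basis = np.vstack([np.sin(np.pi * t), np.sin(2 * np.pi * t), np.sin(4 * np.pi * t)])
    return (Y * np.array([5.0, 3.0, 1.0])) @ basis

def simulate_brownian(n_curves, t, rng, bridge=False):
    """Brownian motions on the grid t (t[0] = 0), or bridges W(t) - t W(1) (t[-1] = 1)."""
    dW = rng.normal(scale=np.sqrt(np.diff(t)), size=(n_curves, len(t) - 1))
    W = np.concatenate([np.zeros((n_curves, 1)), np.cumsum(dW, axis=1)], axis=1)
    if bridge:
        W = W - np.outer(W[:, -1], t)
    return W

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)          # grid size chosen for illustration only
X = simulate_model15(50, t, rng)        # first sample
X_star = simulate_model15(50, t, rng)   # second sample under H0
X_alt = 0.8 * simulate_model15(50, t, rng)   # scaled alternative X*(t) = c X(t), c = 0.8
W = simulate_brownian(50, t, rng)
B = simulate_brownian(50, t, rng, bridge=True)
```

The scaled sample `X_alt` mirrors the power experiments, in which the second sample is $cX(t)$ for several values of c.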

Table 1. Empirical sizes of the tests based on statistics $\hat T_1$ and $\hat T_2$ for non-Gaussian data. The curves in each sample were generated according to (15).
(Layout: rows for two values of p, among them p = 3, each with three sample sizes N = M; columns for the nominal levels 10%, 5% and 1%, first for $\hat T_1$, then for $\hat T_2$.)

Table 2 gives an example of the empirical power of the test based on statistic $\hat T_1$. The test was carried out for two equally sized samples, at three different sample sizes, consisting of realizations of (15) for the first sample and of scaled versions of (15), i.e. $X^*(t) = cX(t)$, for the second sample. The results are displayed for a selection of values of the scaling parameter c. In all cases the power increases with the sample size. As expected, the power approaches 1 faster for larger deviations of c from 1, i.e. from the null hypothesis. Because the test based on $\hat T_2$ has inflated size in the non-Gaussian case (cf. Table 1), its power is (misleadingly) higher than that of the test based on $\hat T_1$, so it is not displayed here. We also studied a Monte Carlo version of the test based on the statistic
$$\hat T_3 = \frac{NM}{N+M}\,\hat\xi^T_{N,M}\hat\xi_{N,M}$$
and found that its finite sample properties were similar to those of the test based on $\hat T_1$.

Table 2. Power of the test based on statistic $\hat T_1$ for non-Gaussian data. The curves in the equally sized samples were generated according to (15) in the first sample and as a scaled version of (15) in the second sample, i.e. $X^*(t) = cX(t)$.
(Layout: four panels, for c = 0.8, c = 0.9 and two further values of c; each panel reports, for two values of p (among them p = 3) and several sample sizes N = M, the empirical power at nominal levels 10%, 5% and 1%.)

We now describe the results of applying both tests to an interesting data set consisting of egg-laying trajectories of Mediterranean fruit flies (medflies). The data were kindly made available to us by Hans Georg Müller. This data set has been studied extensively in the biological and statistical literature; see Müller & Stadtmüller (2005) and references therein. We consider 534 egg-laying curves of medflies that lived at least 34 days, but we only consider the egg-laying activities on the first 30 days. We examined two versions of these egg-laying curves. The curves are scaled such that the functions in either version are defined on the interval $[0,1]$. Version 1 curves (denoted $X_i(t)$) are the absolute counts of eggs laid by fly i on day 30t. Version 2 curves (denoted $Y_i(t)$) are the counts of eggs laid by fly i on day 30t relative to the total number of eggs laid in the lifetime of fly i. The 534 flies are classified into long-lived, i.e. those that lived 44 days or longer, and short-lived, i.e. those that died before the end of the 43rd day after birth. In the data set, there are 256 short-lived and 278 long-lived flies. This classification naturally defines two samples:

Sample 1: the egg-laying curves $X_i(t)$ (resp. $Y_i(t)$), $0 < t \le 1$, $i = 1, 2, \dots, 256$, of the short-lived flies.
Sample 2: the egg-laying curves $X^*_j(t)$ (resp. $Y^*_j(t)$), $0 < t \le 1$, $j = 1, 2, \dots, 278$, of the long-lived flies.

The egg-laying curves are very irregular; Fig. 1 shows ten (smoothed) curves of short- and long-lived flies for version 1, and Fig. 2 shows ten (smoothed) curves for version 2 (both using a B-spline basis for the representation).

Fig. 1. Ten randomly selected smoothed egg-laying curves of short-lived medflies (left panel) and ten such curves for long-lived medflies (right panel).

Fig. 2. Ten randomly selected smoothed egg-laying curves of short-lived medflies (left panel) and ten such curves for long-lived medflies (right panel), relative to the number of eggs laid in the fly's lifetime.

Table 3 shows the p-values for the absolute egg-laying counts (version 1). For the statistic $\hat T_1$, the null hypothesis cannot be rejected irrespective of the choice of p. For the statistic $\hat T_2$, the result of the test varies depending on the choice of p. As explained in section 3, the usual recommendation is to use values of p which explain 85 to 90 per cent of the variance.
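The rule of thumb of Remark 1, and the fractions $f_p$ reported with the medfly results, amount to a cumulative sum of eigenvalues sorted in decreasing order. A minimal sketch with hypothetical helper names; it can be fed the pooled eigenvalues $\hat\lambda_k$, as in the definition of $f_p$, or the eigenvalues of $\hat C_N$ and $\hat C^*_M$ separately, as Remark 1 suggests.

```python
def fraction_explained(eigenvalues):
    """Cumulative fractions f_p = (lam_1 + ... + lam_p) / (sum of all lam_k).

    Assumes the eigenvalues are non-negative and sorted in decreasing order.
    """
    total = sum(eigenvalues)
    fractions, running = [], 0.0
    for lam in eigenvalues:
        running += lam
        fractions.append(running / total)
    return fractions

def choose_p(eigenvalues, target=0.85):
    """Smallest p with f_p >= target (rule of thumb: target between 0.85 and 0.90)."""
    for p, f in enumerate(fraction_explained(eigenvalues), start=1):
        if f >= target:
            return p
    return len(eigenvalues)

eig = [5.0, 2.0, 1.5, 1.0, 0.5]        # made-up eigenvalues for illustration
print(fraction_explained(eig))         # cumulative fractions f_1, ..., f_5
print(choose_p(eig, 0.8), choose_p(eig, 0.9))
```

Running the test for every p between the choices returned for targets 0.85 and 0.90 gives the "range of values of p" recommended in Remark 1.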

Table 3. The p-values (in per cent) of the tests based on statistics $\hat T_1$ and $\hat T_2$ applied to the absolute medfly data. Here $f_p$ denotes the fraction of the sample variance explained by the first p FPCs, i.e. $f_p = \big(\sum_{k=1}^p \hat\lambda_k\big)\big/\big(\sum_{k=1}^{N+M}\hat\lambda_k\big)$.
(Columns: p; p-values of $\hat T_1$ and $\hat T_2$; $f_p$.)

For such values of p, $\hat T_2$ leads to a clear rejection. Since this test, however, has inflated size, we conclude that there is little evidence that the covariance structures of version 1 curves for long- and short-lived flies are different.

For the version 2 curves, the statistic $\hat T_2$ yields p-values equal to zero (in machine precision), potentially indicating that the covariance structures for the short- and long-lived flies are different. The assumption of a normal distribution is, however, questionable, as the QQ-plots in Fig. 3 show. These QQ-plots are constructed for the inner products $\langle Y_i, e_k\rangle$ (and analogously $\langle X_i, e_k\rangle$), computed within one sample at a time (we cannot pool the data to construct QQ-plots because we test whether the stochastic structures are different), where $e_k$ is the kth element of the Fourier basis. The normality of a functional sample implies the normality of all projections onto a complete orthonormal system. For $\langle X_i, e_k\rangle$, the QQ-plots show a strong deviation from a straight line for some projections. Almost all projections $\langle Y_i, e_k\rangle$ have QQ-plots indicating a strong deviation from normality. It is therefore important to apply the non-parametric test based on the statistic $\hat T_1$. The corresponding p-values for version 2 are displayed in Table 4. For most values of p, these p-values indicate the rejection of $H_0$. Many of them hover around the 5 per cent level, but since the test is conservative, we can with confidence view them as favouring $H_A$.

Fig. 3. Normal QQ-plots for the scores of the version 2 medfly data with respect to the first two Fourier basis functions. Left: sample 1; right: sample 2.

Table 4. The p-values (in per cent) of the test based on statistic $\hat T_1$ applied to the relative medfly data; $f_p$ denotes the fraction of the sample variance explained by the first p FPCs, i.e. $f_p = \big(\sum_{k=1}^p \hat\lambda_k\big)\big/\big(\sum_{k=1}^{N+M}\hat\lambda_k\big)$.
(Columns: p; p-value of $\hat T_1$; $f_p$, in two side-by-side blocks.)

The above application confirms the properties of the statistics established through the simulation study. It shows that while there is little evidence that the covariance structures for the absolute counts are different, there is strong evidence that they are different for the relative counts.

5. Proofs of the results of section 3

The proof of theorem 1 follows from several lemmas, which we establish first. We can and will assume without loss of generality that $\mu(t) = \mu^*(t) = 0$ for all $t \in [0,1]$. We will use the identity
$$N^{-1/2}\sum_{k=1}^N (X_k(t) - \bar X_N(t))(X_k(s) - \bar X_N(s)) = N^{-1/2}\sum_{k=1}^N X_k(t)X_k(s) - N^{1/2}\bar X_N(t)\bar X_N(s), \tag{16}$$
and an analogous identity for the second sample. Our first lemma establishes bounds in probability which will often be used in the proofs.

Lemma 1. Under the assumptions of theorem 1, as $N, M \to \infty$,
$$\Big\|N^{-1/2}\sum_{k=1}^N \big(X_k(t)X_k(s) - C(t,s)\big)\Big\| = O_P(1), \tag{17}$$
$$\|N^{1/2}\bar X_N\| = O_P(1), \tag{18}$$
$$\Big\|M^{-1/2}\sum_{k=1}^M \big(X^*_k(t)X^*_k(s) - C^*(t,s)\big)\Big\| = O_P(1), \tag{19}$$
$$\|M^{1/2}\bar X^*_M\| = O_P(1), \tag{20}$$
where here and in the sequel the notation $\|\cdot\|$ is also used for the corresponding norm in $L^2([0,1]^2)$.
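Identity (16) is purely algebraic and can be checked on discretized data; the second part of the sketch illustrates the scaling in (18), namely that $N^{1/2}\|\bar X_N\|$ neither vanishes nor grows as N increases. Assumptions for illustration only: grid values are independent standard normals (a crude stand-in for mean-zero random functions), and the $L^2$ norm is a Riemann sum with spacing $h = 1/n$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 400, 50
X = rng.normal(size=(N, n))      # row k: X_k evaluated at n grid points (illustrative)
Xbar = X.mean(axis=0)

# Left side of (16) for all grid pairs (t, s).
lhs = (X - Xbar).T @ (X - Xbar) / np.sqrt(N)
# Right side of (16): N^{-1/2} sum_k X_k(t) X_k(s) - N^{1/2} Xbar(t) Xbar(s).
rhs = X.T @ X / np.sqrt(N) - np.sqrt(N) * np.outer(Xbar, Xbar)
print("max |lhs - rhs| =", np.max(np.abs(lhs - rhs)))

# Scaling in (18): sqrt(N) * ||Xbar_N|| stays of order one as N grows.
h = 1.0 / n
scaled_norms = []
for N_sim in (100, 1000, 10000):
    Xbar_sim = rng.normal(size=(N_sim, n)).mean(axis=0)
    scaled_norms.append(np.sqrt(N_sim) * np.sqrt(h * np.sum(Xbar_sim ** 2)))
print(np.round(scaled_norms, 3))
```

Without the factor $N^{1/2}$ the printed norms would shrink like $N^{-1/2}$, which is why the mean correction in lemma 2 contributes only $O_P(N^{-1/2})$.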

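Proof. These are classical estimates and can easily be obtained by a straightforward calculation of second moments. Note, for example, that, by independence and since the summands have mean zero,
$$E\int_0^1\!\!\int_0^1 \Big[N^{-1/2}\sum_{k=1}^N \big(X_k(t)X_k(s) - C(t,s)\big)\Big]^2 dt\,ds = \int_0^1\!\!\int_0^1 E\big(X_1(t)X_1(s) - C(t,s)\big)^2 dt\,ds,$$
which is finite by (10) and the Cauchy–Schwarz inequality, so, by Markov's inequality, we have
$$\Big\|N^{-1/2}\sum_{k=1}^N \big(X_k(t)X_k(s) - C(t,s)\big)\Big\| = O_P(1).$$
Similar arguments yield (18)–(20). See also Dauxois et al. (1982) for an early reference.

Lemma 2 shows that the estimation of the mean functions, cf. the definition of the projections $\hat a_k(i)$ and $\hat a^*_k(j)$ in (5) and (6), has an asymptotically negligible effect.

Lemma 2. Under the assumptions of theorem 1, for all $1 \le i, j \le p$, as $N, M \to \infty$,
$$N^{1/2}\hat\Delta_N(i,j) = N^{-1/2}\sum_{k=1}^N \langle X_k, \hat\varphi_i\rangle\langle X_k, \hat\varphi_j\rangle + O_P\big(N^{-1/2}\big)$$
and
$$M^{1/2}\hat\Delta^*_M(i,j) = M^{-1/2}\sum_{k=1}^M \langle X^*_k, \hat\varphi_i\rangle\langle X^*_k, \hat\varphi_j\rangle + O_P\big(M^{-1/2}\big).$$
Proof. Using (16) and (18), we have by the Cauchy–Schwarz inequality
$$\begin{aligned}
\Big|N^{1/2}\int_0^1\!\!\int_0^1 \bar X_N(t)\bar X_N(s)\hat\varphi_i(t)\hat\varphi_j(s)\,dt\,ds\Big|
&= N^{-1/2}\Big|\int_0^1 N^{1/2}\bar X_N(t)\hat\varphi_i(t)\,dt\Big|\,\Big|\int_0^1 N^{1/2}\bar X_N(s)\hat\varphi_j(s)\,ds\Big| \\
&\le N^{-1/2}\Big(\int_0^1 \big(N^{1/2}\bar X_N(t)\big)^2 dt\Big)^{1/2}\Big(\int_0^1 \hat\varphi_i^2(t)\,dt\Big)^{1/2}\Big(\int_0^1 \big(N^{1/2}\bar X_N(s)\big)^2 ds\Big)^{1/2}\Big(\int_0^1 \hat\varphi_j^2(s)\,ds\Big)^{1/2} \\
&= N^{-1/2}\int_0^1 \big(N^{1/2}\bar X_N(t)\big)^2 dt = O_P\big(N^{-1/2}\big).
\end{aligned}$$
The second part can be proven in the same way.

We now state bounds on the distances between the estimated and the population eigenvalues and eigenfunctions. These bounds hold under the null hypothesis and extend the corresponding one-sample bounds.

Lemma 3. If the conditions of theorem 1 are satisfied, then, as $N, M \to \infty$,
$$\max_{1 \le i \le p}|\hat\lambda_i - \lambda_i| = O_P\big((N+M)^{-1/2}\big) \quad \text{and} \quad \max_{1 \le i \le p}\|\hat\varphi_i - \hat c_i\varphi_i\| = O_P\big((N+M)^{-1/2}\big),$$
where $\hat c_i = \hat c_i(N,M) = \mathrm{sign}(\langle\hat\varphi_i, \varphi_i\rangle)$.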
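Lemma 3, and the role of the signs $\hat c_i$, can be illustrated by simulation. The sketch below rests on simplifying assumptions (a single sample of Brownian motions rather than a pooled pair, a grid of 200 points, Riemann-sum integrals; the helper name is hypothetical). The population eigenfunctions of $\min(t,s)$ are $\sqrt{2}\sin((k - 1/2)\pi t)$; the estimated ones are aligned with $\hat c_i = \mathrm{sign}(\langle\hat\varphi_i, \varphi_i\rangle)$, and their $L^2$ errors should shrink roughly like $N^{-1/2}$. Without the alignment an error close to 2 can occur for any N, because eigenvector solvers return an arbitrary sign.

```python
import numpy as np

def leading_eigenfunctions(X, p, h):
    """Leading p eigenfunctions (unit L^2 norm) of the sample covariance of X on a grid."""
    Xc = X - X.mean(axis=0)
    lam, vecs = np.linalg.eigh(Xc.T @ Xc / len(X) * h)
    return vecs[:, np.argsort(lam)[::-1][:p]] / np.sqrt(h)

rng = np.random.default_rng(4)
n = 200
h = 1.0 / n
t = np.arange(1, n + 1) / n
k = np.arange(1, 4)
phi = np.sqrt(2) * np.sin(np.outer(t, (k - 0.5) * np.pi))   # eigenfunctions of min(t, s)

errors = {}
for N in (100, 400, 1600):
    W = np.cumsum(rng.normal(scale=np.sqrt(h), size=(N, n)), axis=1)  # Brownian motions
    phi_hat = leading_eigenfunctions(W, 3, h)
    c_hat = np.sign(h * np.sum(phi_hat * phi, axis=0))      # c_i = sign <phi_hat_i, phi_i>
    errors[N] = np.sqrt(h * np.sum((phi_hat - c_hat * phi) ** 2, axis=0))
    print(N, np.round(errors[N], 3))
```

Quadrupling N should roughly halve each error, which is the $(N+M)^{-1/2}$ rate of lemma 3 in this one-sample setting.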

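Proof. These estimates are also well known (cf., e.g., Bosq, 2000, lemma 4.3 and assertion (4.43), or Horváth & Kokoszka, 2012, lemmas 2.2–2.3). Note that the constant implicit in the first rate does not depend on p, whereas the one in the second rate may depend on the projection dimension p (through the spectral gaps in (3)).

Lemma 3 now allows us to replace the estimated eigenfunctions by their population counterparts. The random signs $\hat c_i$ must appear in the formulation of lemma 4, but they cancel in the subsequent results.

Lemma 4. If the conditions of theorem 1 are satisfied, then, for all $1 \le i, j \le p$, as $N, M \to \infty$,
$$\Big(\frac{NM}{N+M}\Big)^{1/2}\big(\hat\Delta_N(i,j) - \hat\Delta^*_M(i,j)\big) = \Big(\frac{NM}{N+M}\Big)^{1/2}\Big\{\frac{1}{N}\sum_{k=1}^N \langle X_k, \hat c_i\varphi_i\rangle\langle X_k, \hat c_j\varphi_j\rangle - \frac{1}{M}\sum_{k=1}^M \langle X^*_k, \hat c_i\varphi_i\rangle\langle X^*_k, \hat c_j\varphi_j\rangle\Big\} + o_P(1).$$
Proof. We write
$$\frac{1}{N}\sum_{k=1}^N \langle X_k, \hat\varphi_i\rangle\langle X_k, \hat\varphi_j\rangle - \int_0^1\!\!\int_0^1 C(t,s)\hat\varphi_i(t)\hat\varphi_j(s)\,dt\,ds = N^{-1/2}\int_0^1\!\!\int_0^1 N^{-1/2}\sum_{k=1}^N \big(X_k(t)X_k(s) - C(t,s)\big)\hat\varphi_i(t)\hat\varphi_j(s)\,dt\,ds.$$
Using lemmas 1–3 we get
$$\begin{aligned}
&\Big|\int_0^1\!\!\int_0^1 N^{-1/2}\sum_{k=1}^N \big(X_k(t)X_k(s) - C(t,s)\big)\big(\hat\varphi_i(t)\hat\varphi_j(s) - \hat c_i\varphi_i(t)\hat c_j\varphi_j(s)\big)\,dt\,ds\Big| \\
&= \Big|\int_0^1\!\!\int_0^1 N^{-1/2}\sum_{k=1}^N \big(X_k(t)X_k(s) - C(t,s)\big)\big\{(\hat\varphi_i(t) - \hat c_i\varphi_i(t))\hat\varphi_j(s) + \hat c_i\varphi_i(t)(\hat\varphi_j(s) - \hat c_j\varphi_j(s))\big\}\,dt\,ds\Big| \\
&\le \Big(\int_0^1\!\!\int_0^1 \Big[N^{-1/2}\sum_{k=1}^N \big(X_k(t)X_k(s) - C(t,s)\big)\Big]^2 dt\,ds\Big)^{1/2}\Big(\int_0^1\!\!\int_0^1 (\hat\varphi_i(t) - \hat c_i\varphi_i(t))^2\hat\varphi_j^2(s)\,dt\,ds\Big)^{1/2} \\
&\quad + \Big(\int_0^1\!\!\int_0^1 \Big[N^{-1/2}\sum_{k=1}^N \big(X_k(t)X_k(s) - C(t,s)\big)\Big]^2 dt\,ds\Big)^{1/2}\Big(\int_0^1\!\!\int_0^1 \varphi_i^2(t)(\hat\varphi_j(s) - \hat c_j\varphi_j(s))^2\,dt\,ds\Big)^{1/2} \\
&= \Big\|N^{-1/2}\sum_{k=1}^N \big(X_k(t)X_k(s) - C(t,s)\big)\Big\|\big(\|\hat\varphi_i - \hat c_i\varphi_i\| + \|\hat\varphi_j - \hat c_j\varphi_j\|\big) = o_P(1).
\end{aligned}$$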

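Similar arguments give that
$$\int_0^1\!\!\int_0^1 M^{-1/2}\sum_{k=1}^M \big(X^*_k(t)X^*_k(s) - C^*(t,s)\big)\big(\hat\varphi_i(t)\hat\varphi_j(s) - \hat c_i\varphi_i(t)\hat c_j\varphi_j(s)\big)\,dt\,ds = o_P(1).$$
Since $C = C^*$ under $H_0$, the terms involving C and $C^*$ cancel, and, in view of lemma 2, the lemma is proven.

The previous lemmas isolated the main terms in the differences $\hat\Delta_N(i,j) - \hat\Delta^*_M(i,j)$. The following lemma describes the limits of these main terms (without the random signs).

Lemma 5. If the conditions of theorem 1 are satisfied, then, as $N, M \to \infty$,
$$\{\Delta_{N,M}(i,j),\ 1 \le i, j \le p\} \xrightarrow{\mathcal{D}} \{\Delta(i,j),\ 1 \le i, j \le p\},$$
where
$$\Delta_{N,M}(i,j) = \Big(\frac{NM}{N+M}\Big)^{1/2}\Big\{\frac{1}{N}\sum_{k=1}^N \langle X_k, \varphi_i\rangle\langle X_k, \varphi_j\rangle - \frac{1}{M}\sum_{k=1}^M \langle X^*_k, \varphi_i\rangle\langle X^*_k, \varphi_j\rangle\Big\},$$
and $\{\Delta(i,j), 1 \le i, j \le p\}$ is a Gaussian matrix with $E\Delta(i,j) = 0$ and
$$\begin{aligned}
E\Delta(i,j)\Delta(i',j') ={}& (1-\Theta)\big\{E\big(\langle X_1, \varphi_i\rangle\langle X_1, \varphi_j\rangle\langle X_1, \varphi_{i'}\rangle\langle X_1, \varphi_{j'}\rangle\big) - E\big(\langle X_1, \varphi_i\rangle\langle X_1, \varphi_j\rangle\big)E\big(\langle X_1, \varphi_{i'}\rangle\langle X_1, \varphi_{j'}\rangle\big)\big\} \\
&+ \Theta\big\{E\big(\langle X^*_1, \varphi_i\rangle\langle X^*_1, \varphi_j\rangle\langle X^*_1, \varphi_{i'}\rangle\langle X^*_1, \varphi_{j'}\rangle\big) - E\big(\langle X^*_1, \varphi_i\rangle\langle X^*_1, \varphi_j\rangle\big)E\big(\langle X^*_1, \varphi_{i'}\rangle\langle X^*_1, \varphi_{j'}\rangle\big)\big\}.
\end{aligned}$$
Proof. First we note that
$$E\langle X_1, \varphi_i\rangle\langle X_1, \varphi_j\rangle = E\langle X^*_1, \varphi_i\rangle\langle X^*_1, \varphi_j\rangle = \begin{cases} 0 & \text{if } i \ne j, \\ \lambda_i & \text{if } i = j. \end{cases}$$
Since $E(\langle X_1, \varphi_i\rangle\langle X_1, \varphi_j\rangle)^2 < \infty$ and $E(\langle X^*_1, \varphi_i\rangle\langle X^*_1, \varphi_j\rangle)^2 < \infty$, the multivariate central limit theorem implies the result.

Finally, we need an asymptotic approximation to the covariances $\hat L_{N,M}(k,k')$. Let
$$\begin{aligned}
L_{N,M}(k,k') ={}& (1-\Theta_{N,M})\Big\{\frac{1}{N}\sum_{\ell=1}^N a_\ell(i)a_\ell(j)a_\ell(i')a_\ell(j') - \langle \hat C_N\varphi_i, \varphi_j\rangle\langle \hat C_N\varphi_{i'}, \varphi_{j'}\rangle\Big\} \\
&+ \Theta_{N,M}\Big\{\frac{1}{M}\sum_{\ell=1}^M a^*_\ell(i)a^*_\ell(j)a^*_\ell(i')a^*_\ell(j') - \langle \hat C^*_M\varphi_i, \varphi_j\rangle\langle \hat C^*_M\varphi_{i'}, \varphi_{j'}\rangle\Big\},
\end{aligned}$$
where $a_\ell(i) = \langle X_\ell, \varphi_i\rangle$ and $a^*_\ell(i) = \langle X^*_\ell, \varphi_i\rangle$, and $i, j, i', j'$ are determined from k and k' as in (8) and (9).

Lemma 6. If the conditions of theorem 1 are satisfied, then for all $1 \le k, k' \le p(p+1)/2$,
$$\hat L_{N,M}(k,k') - \hat c_i\hat c_j\hat c_{i'}\hat c_{j'}L_{N,M}(k,k') = o_P(1), \qquad N, M \to \infty,$$
where $(i,j)$ and $(i',j')$ are determined from k and k' as in (8) and (9).

Proof. The result follows from lemma 3 along the lines of the proof of lemma 4.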
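The proof of theorem 1 starts from the observation that the random signs $\hat c_i$ do not affect the quadratic form. By lemmas 4 and 6, flipping signs multiplies $\hat\xi_{N,M}(k)$ by $\hat c_i\hat c_j$ and $\hat L_{N,M}(k,k')$ by $\hat c_i\hat c_j\hat c_{i'}\hat c_{j'}$, i.e. $\hat\xi \mapsto S\hat\xi$ and $\hat L \mapsto S\hat L S$ for a diagonal matrix S with entries $\pm 1$; since $S^{-1} = S$, the form $\xi^T L^{-1}\xi$ is unchanged. A numerical sketch with an arbitrary positive definite matrix (all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3
pairs = [(i, j) for j in range(p) for i in range(j, p)]   # vech order (7), 0-based
q = len(pairs)                                              # p(p+1)/2

xi = rng.normal(size=q)                  # stand-in for xi_hat
G = rng.normal(size=(q, q))
L = G @ G.T + q * np.eye(q)              # stand-in for a positive definite L_hat

c = np.array([1.0, -1.0, -1.0])          # random signs c_1, ..., c_p
S = np.diag([c[i] * c[j] for i, j in pairs])   # xi(k) -> c_i c_j xi(k)

original = xi @ np.linalg.solve(L, xi)
flipped = (S @ xi) @ np.linalg.solve(S @ L @ S, S @ xi)
print(original, flipped)
```

Entry $(k,k')$ of `S @ L @ S` equals $c_ic_jc_{i'}c_{j'}L(k,k')$, exactly the sign factor appearing in lemma 6.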

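Proof of theorem 1. According to lemma 2 and lemmas 4–6, the asymptotic distribution of $\hat\xi^T_{N,M}\hat L^{-1}_{N,M}\hat\xi_{N,M}$ does not depend on the signs $\hat c_1, \dots, \hat c_p$, so it is sufficient to prove the result for $\hat c_1 = \cdots = \hat c_p = 1$. The law of large numbers yields
$$L_{N,M}(k,k') \xrightarrow{P} L(k,k'), \tag{21}$$
where
$$\begin{aligned}
L(k,k') ={}& (1-\Theta)\big\{E\big(a_1(i)a_1(j)a_1(i')a_1(j')\big) - E\big(a_1(i)a_1(j)\big)E\big(a_1(i')a_1(j')\big)\big\} \\
&+ \Theta\big\{E\big(a^*_1(i)a^*_1(j)a^*_1(i')a^*_1(j')\big) - E\big(a^*_1(i)a^*_1(j)\big)E\big(a^*_1(i')a^*_1(j')\big)\big\}.
\end{aligned} \tag{22}$$
The result then follows from lemmas 2, 4 and 5.

Proof of theorem 2. In the case of Gaussian observations, $\Delta(i,j)$, $1 \le i \le j \le p$, are independent normal random variables with mean 0 and
$$E\Delta^2(i,j) = \begin{cases} \lambda_i\lambda_j & \text{if } i \ne j, \\ 2\lambda_i^2 & \text{if } i = j. \end{cases}$$
Now the result follows from lemmas 1–5. For more details, we refer to Panaretos et al. (2010).

Proof of theorem 3. First, we observe that by the law of large numbers
$$\int_0^1\!\!\int_0^1 \big(\hat R_{N,M}(t,s) - R(t,s)\big)^2 dt\,ds = o_P(1).$$
Hence, using the results in chapter VI of Gohberg et al. (1990), we get
$$\max_{1 \le i \le p}|\hat\lambda_i - \lambda_i| = o_P(1) \tag{23}$$
and
$$\max_{1 \le i \le p}\|\hat\varphi_i - \hat c_i\varphi_i\| = o_P(1), \tag{24}$$
where $\hat c_i = \hat c_i(N,M) = \mathrm{sign}(\langle\hat\varphi_i, \varphi_i\rangle)$. Relations (23) and (24) show that the conclusions of lemma 3 remain true, with $o_P(1)$ in place of the rates. It follows from the law of large numbers and (24) that for all $1 \le i, j \le p$
$$\begin{aligned}
&\Big|\hat\Delta_N(i,j) - \hat\Delta^*_M(i,j) - \hat c_i\hat c_j\int_0^1\!\!\int_0^1 (C(t,s) - C^*(t,s))\varphi_i(t)\varphi_j(s)\,dt\,ds\Big| \\
&= \Big|\int_0^1\!\!\int_0^1 \big(\hat C_N(t,s) - \hat C^*_M(t,s)\big)\hat\varphi_i(t)\hat\varphi_j(s)\,dt\,ds - \hat c_i\hat c_j\int_0^1\!\!\int_0^1 (C(t,s) - C^*(t,s))\varphi_i(t)\varphi_j(s)\,dt\,ds\Big| \\
&\le \Big|\int_0^1\!\!\int_0^1 \big\{(\hat C_N(t,s) - C(t,s)) - (\hat C^*_M(t,s) - C^*(t,s))\big\}\hat\varphi_i(t)\hat\varphi_j(s)\,dt\,ds\Big| \\
&\quad + \Big|\int_0^1\!\!\int_0^1 (C(t,s) - C^*(t,s))\big(\hat\varphi_i(t)\hat\varphi_j(s) - \hat c_i\varphi_i(t)\hat c_j\varphi_j(s)\big)\,dt\,ds\Big| \\
&\le \|\hat C_N - C\| + \|\hat C^*_M - C^*\| + \|C - C^*\|\,\big\|\hat\varphi_i(t)\hat\varphi_j(s) - \hat c_i\varphi_i(t)\hat c_j\varphi_j(s)\big\| = o_P(1),
\end{aligned}$$
where the fact that $\|\varphi_i\| = 1 = \|\hat\varphi_i\|$ was used. Hence, the proof of (12) is complete, with $\hat h_k = \hat c_i\hat c_j$ for the index $(i,j)$ corresponding to k. It is also clear that (12) implies (13).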

Next, we observe that lemma 6 and (21) remain true under the alternative. By some lengthy calculations, it can be verified that L given in (22) is positive definite, so that (14) follows from (13).

Acknowledgements

Research partially supported by NSF grants at the University of Utah and at Colorado State University, and by a DFG grant at the University of Cologne.

References

Abadir, K. M. & Magnus, J. R. (2005). Matrix algebra. Cambridge University Press, New York.
Benko, M., Härdle, W. & Kneip, A. (2009). Common functional principal components. Ann. Statist. 37, 1–34.
Boente, G., Rodriguez, D. & Sued, M. (2011). Testing the equality of covariance operators. In Recent advances in functional data analysis and related topics (ed. F. Ferraty), Physica-Verlag, Heidelberg.
Bosq, D. (2000). Linear processes in function spaces. Springer, New York.
Dauxois, J., Pousse, A. & Romain, Y. (1982). Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. J. Multivariate Anal. 12.
Ferraty, F. & Romain, Y. (eds) (2011). The Oxford handbook of functional data analysis. Oxford University Press, Oxford.
Ferraty, F. & Vieu, P. (2006). Nonparametric functional data analysis: theory and practice. Springer, New York.
Gabrys, R., Horváth, L. & Kokoszka, P. (2010). Tests for error correlation in the functional linear model. J. Amer. Statist. Assoc. 105, 1113–1125.
Gaines, G., Kaphle, K. & Ruymgaart, F. (2011). Application of a delta-method for random operators to testing equality of two covariance operators. Math. Meth. Statist. 20.
Gervini, D. (2008). Robust functional estimation using the spatial median and spherical principal components. Biometrika 95.
Gohberg, I., Goldberg, S. & Kaashoek, M. A. (1990). Classes of linear operators. Operator theory: advances and applications, 49. Birkhäuser, Basel.
Horváth, L. & Kokoszka, P. (2012). Inference for functional data with applications. Springer Series in Statistics. Springer, New York (in press).
Horváth, L., Kokoszka, P. & Reeder, R. (2012). Estimation of the mean of functional time series and a two sample problem. J. Roy. Statist. Soc. Ser. B (in press).
Horváth, L., Kokoszka, P. & Reimherr, M. (2009). Two sample inference in functional linear models. Canad. J. Statist. 37.
Müller, H. G. & Stadtmüller, U. (2005). Generalized functional linear models. Ann. Statist. 33.
Panaretos, V. M., Kraus, D. & Maddocks, J. H. (2010). Second-order comparison of Gaussian random functions and the geometry of DNA minicircles. J. Amer. Statist. Assoc. 105.
Panaretos, V. M., Kraus, D. & Maddocks, J. H. (2011). Second-order inference for functional data with application to DNA minicircles. In Recent advances in functional data analysis and related topics (ed. F. Ferraty), Physica-Verlag, Heidelberg.
Ramsay, J. O. & Silverman, B. W. (2005). Functional data analysis. Springer, New York.
Ramsay, J., Hooker, G. & Graves, S. (2009). Functional data analysis with R and MATLAB. Springer, New York.
Reiss, P. T. & Ogden, R. T. (2007). Functional principal component regression and functional partial least squares. J. Amer. Statist. Assoc. 102.
Yao, F. & Müller, H. G. (2010). Functional quadratic regression. Biometrika 97.

Received February …, in final form February …

Josef G. Steinebach, Mathematical Institute, University of Cologne, Weyertal 86-90, 50931 Köln, Germany.
E-mail: jost@math.uni-koeln.de
