On Cramér-von Mises test based on local time of switching diffusion process

Anis Gassem
Laboratoire de Statistique et Processus, Université du Maine, 72085 Le Mans Cedex 9, France
e-mail: Anis.Gassem@univ-lemans.fr

Abstract

We consider a Cramér-von Mises test for the hypothesis that the observed diffusion process has a sign-type trend coefficient. It is shown that the limit distribution of the proposed test statistic is given by an integral-type functional of a continuous Gaussian process. We provide the Karhunen-Loève expansion on $[0, 1]$ of the corresponding limiting process. This representation of the limit statistic allows us to find the threshold.

AMS Classification subjects: 62G10, 62G20, 62M02.

Key Words: Cramér-von Mises test, Gaussian process, asymptotic properties, Karhunen-Loève expansion.

1 Introduction

We consider the Cramér-von Mises (C-vM) test when the basic model is an ergodic diffusion process, i.e., the observations $X^T = \{X_t,\ 0 \le t \le T\}$ come from the stochastic differential equation

$$dX_t = S(X_t)\,dt + dW_t, \qquad X_0, \quad 0 \le t \le T, \tag{1}$$

with initial value $X_0$, Wiener process $\{W_t,\ t \ge 0\}$ and trend coefficient $S(\cdot)$ unknown to the observer. Diffusion processes of this type are widely used as models in many different fields such as biology, physics, economics and finance. Let us introduce the stochastic differential equation

$$dX_t = S_0(X_t)\,dt + dW_t, \qquad X_0, \quad 0 \le t \le T,$$

where $S_0(x)$ is some known function. We have two hypotheses:

$$H_0 : S(\cdot) = S_0(\cdot) \quad \text{against the alternative} \quad H_1 : S(\cdot) \ne S_0(\cdot).$$

We observe the trajectory $X^T$ of (1) and we test the hypothesis $H_0$. Our goal is to study the C-vM test in this problem.

The problem considered for this stochastic model is similar to the C-vM test that is well known in classical statistics (see, e.g., Anderson and Darling [1, 2], Durbin and Knott [9], Durbin [10] and Stephens [24]). Indeed, consider i.i.d. observations $X^n = (X_1, \ldots, X_n)$ with c.d.f. $F(x)$ and the basic hypothesis $H_0 : F(x) = F_0(x)$. The C-vM statistic is

$$W_n^2 = n \int_{-\infty}^{\infty} \left[\hat F_n(x) - F_0(x)\right]^2 dF_0(x), \qquad \hat F_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1_{\{X_i < x\}},$$

where $\hat F_n(x)$ is the empirical distribution function. Anderson and Darling generalized this test by adding a nonnegative weight function. Let us denote by $\beta(t)$ a Brownian bridge, i.e., a continuous Gaussian process with

$$E\beta(t) = 0, \qquad K(t, s) = E\beta(t)\beta(s) = t \wedge s - ts.$$

Then the limit behavior of this statistic can be described with the help of this process as follows:

$$W_n^2 = n \int_0^1 \left(\hat F_n(t) - t\right)^2 dt = \int_0^1 \beta_n^2(t)\,dt \Longrightarrow \int_0^1 \beta^2(t)\,dt.$$

Hence the C-vM test $\psi_n(X^n) = 1_{\{W_n^2 > c_\alpha\}}$, with the constant $c_\alpha$ defined by the equation

$$P\left\{\int_0^1 \beta^2(t)\,dt > c_\alpha\right\} = \alpha,$$

is of asymptotic size $\alpha$. The classical solution of this problem requires a preliminary computation of the sequence of eigenvalues $\lambda_1 > \lambda_2 > \ldots$ and eigenfunctions $f_1(t), f_2(t), \ldots$ of the Fredholm operator

$$f(t) \mapsto \int_0^1 K(t, s)\, f(s)\,ds.$$

By Mercer's theorem we have $K(t, s) = \sum_{k=1}^{\infty} \lambda_k f_k(s) f_k(t)$. Moreover, it is possible to expand $\beta(\cdot)$ into the Karhunen-Loève (KL) series (see, e.g., Kac and Siegert [15], Kac [14], Ash and Gardner [3])

$$\beta(t) = \sum_{k=1}^{\infty} \sqrt{\lambda_k}\, \xi_k f_k(t), \qquad W^2 = \sum_{k=1}^{\infty} \lambda_k \xi_k^2, \tag{2}$$

where $\{\xi_k : k \ge 1\}$ is a sequence of independent normal $N(0, 1)$ random variables. The relations (2) are used to approximate the value $c_\alpha$. For certain Gaussian processes, such expansions were obtained with the help of special functions: trigonometric functions in the case of the Brownian bridge, and Legendre polynomials for the Anderson-Darling process. More recently, Bessel functions were used by Deheuvels and Martynov [7], Deheuvels [6] and Gassem [12]. More about Cramér-von Mises theory and the use of KL expansions can be found in [1], [21], [8].

In the present work we are interested in the C-vM test based on the observation $X^T = \{X_t,\ 0 \le t \le T\}$, solution of (1):

$$W_T^2 = T \int_{-\infty}^{\infty} \left[f_T(x) - f_{S_0}(x)\right]^2 dF_{S_0}(x), \qquad f_T(x) = \frac{\Lambda_T(x)}{T},$$

where $f_T(x)$ is the local time type estimator of the density (LTE) (see [18], Section 1.1.3). This test was proposed in [4], but it is not distribution free. Although this statistic converges in distribution (under the hypothesis) to a functional of a Gaussian process [18], the choice of the threshold $c_\alpha$ for the test $\Phi_T(X^T) = 1_{\{W_T^2 > c_\alpha\}}$ is not easy because of the structure of the covariance. To avoid this difficulty, a weighting of the statistic was introduced to make the test asymptotically distribution free (see Kutoyants [19]).

The purpose of the paper is to study the C-vM test for the hypothesis that the observed process (1) is a switching diffusion, i.e., the trend coefficient $S_0(x) = -\operatorname{sgn}(x)$ is a discontinuous function taking just the two values $+1$ and $-1$, and the observed process has the form

$$dX_t = -\operatorname{sgn}(X_t)\,dt + dW_t, \qquad X_0, \quad 0 \le t \le T. \tag{3}$$

This work is a continuation of a study started by Gassem in [12], where the limit distribution of the test for this model was studied. Here we provide the KL expansion of the corresponding limiting process and we present the threshold $c_\alpha$ of the test.
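For the classical i.i.d. statistic $W_n^2$ recalled above, the integral reduces to a finite sum over the order statistics. The following Python sketch illustrates this under stated assumptions: the names `cvm_statistic` and `exp_cdf` are ours, the exponential law is only an illustrative choice of $F_0$, and the value 0.461 is the standard tabulated asymptotic 5% quantile of $\int_0^1 \beta^2(t)\,dt$, not a number taken from this paper.

```python
import math
import random


def cvm_statistic(sample, cdf0):
    """Classical Cramer-von Mises statistic W_n^2 for an i.i.d. sample.

    Uses the standard computational form of n * int (F_n - F_0)^2 dF_0:
    W_n^2 = 1/(12 n) + sum_i (F_0(X_(i)) - (2 i - 1) / (2 n))^2.
    """
    n = len(sample)
    u = sorted(cdf0(x) for x in sample)  # F_0 applied to the order statistics
    return 1.0 / (12 * n) + sum(
        (u[i] - (2 * i + 1) / (2.0 * n)) ** 2 for i in range(n)
    )


def exp_cdf(x):
    """C.d.f. of the standard exponential law (illustrative choice of F_0)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.expovariate(1.0) for _ in range(200)]
    w2 = cvm_statistic(data, exp_cdf)
    c_05 = 0.461  # tabulated asymptotic 5% threshold (assumption, see text)
    print("W_n^2 =", w2, "reject H_0:", w2 > c_05)
```

The same decision rule, with a different limit law and threshold, is what the test $\Phi_T$ applies to the diffusion observations.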
2 Model of observation

Suppose that the observed process is (1), where the trend coefficient $S(x)$ satisfies the conditions of existence and uniqueness of the solution, and this solution has ergodic properties, i.e., there exists an invariant probability distribution $F_S(x)$ and, for any integrable (w.r.t. this distribution) function

$h(x)$, the law of large numbers holds:

$$\frac{1}{T} \int_0^T h(X_t)\,dt \longrightarrow \int_{-\infty}^{\infty} h(x)\,dF_S(x).$$

These conditions can be found in Durrett [11] and Kutoyants [18]. The invariant density function $f_S(x)$ is defined by the equality

$$f_S(x) = \frac{1}{G(S)} \exp\left\{2 \int_0^x S(v)\,dv\right\}, \qquad x \in \mathbb{R},$$

where $G(S)$ is the normalizing constant. This density function can be estimated by the LTE

$$f_T(x) = \frac{\Lambda_T(x)}{T}.$$

The local time $\Lambda_T(x)$ admits the Tanaka-Meyer representation (see [22])

$$\Lambda_T(x) = |X_T - x| - |X_0 - x| - \int_0^T \operatorname{sgn}(X_t - x)\,dX_t,$$

and has recently started to play an important role in statistical inference [18], [5], [16]. Note that this estimator is $\sqrt{T}$-asymptotically normal as $T \to \infty$ (see [18], Proposition 1.51):

$$\sqrt{T}\left(f_T(x) - f_S(x)\right) \Longrightarrow N\left(0, R_{f_S}(x, x)\right).$$

Moreover, this normed difference converges weakly to a limit Gaussian process (see [18], Theorem 4.13) with zero mean and covariance function

$$R_{f_S}(x, y) = 4 f_S(x) f_S(y)\, E\left(\frac{\left[1_{\{\xi > x\}} - F_S(\xi)\right]\left[1_{\{\xi > y\}} - F_S(\xi)\right]}{f_S(\xi)^2}\right).$$

3 Choice of the threshold

3.1 C-vM test

It is easy to see that (3) is an ergodic diffusion process with the stationary density

$$f_{S_0}(x) = e^{-2|x|}, \qquad x \in \mathbb{R}.$$

Suppose that we observe the trajectory $X^T = \{X_t,\ 0 \le t \le T\}$ of (1). To test the hypothesis $H_0$ we can then use the LTE to construct a goodness-of-fit test based on the C-vM statistic

$$W_T^2 = T \int_{-\infty}^{\infty} \left[f_T(x) - f_{S_0}(x)\right]^2 dF_{S_0}(x) = \int_0^1 \eta_T^2(s)\,ds.$$
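The ergodic behaviour described above can be illustrated by simulation. The sketch below is not the author's procedure: it assumes an Euler-Maruyama discretization of the switching diffusion $dX_t = -\operatorname{sgn}(X_t)\,dt + dW_t$ and replaces the local time estimator $\Lambda_T(x)/T$ by the fraction of time spent in a small window around $x$, divided by the window width. The function names, step size, window half-width `h` and seed are all hypothetical choices.

```python
import math
import random


def simulate_switching_diffusion(T=2000.0, dt=0.01, x0=0.0, seed=0):
    """Euler-Maruyama path of dX = -sgn(X) dt + dW on [0, T]."""
    rng = random.Random(seed)
    n_steps = int(round(T / dt))
    sqrt_dt = math.sqrt(dt)
    x = x0
    path = [x]
    for _ in range(n_steps):
        drift = -1.0 if x > 0 else (1.0 if x < 0 else 0.0)  # S_0(x) = -sgn(x)
        x += drift * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def occupation_density(path, dt, x, h=0.05):
    """Window approximation of Lambda_T(x) / T: time spent in (x-h, x+h) / (2 h T)."""
    total_time = dt * (len(path) - 1)
    time_in_window = dt * sum(1 for y in path[:-1] if abs(y - x) < h)
    return time_in_window / (2.0 * h * total_time)


if __name__ == "__main__":
    dt = 0.01
    path = simulate_switching_diffusion(T=2000.0, dt=dt)
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        # Compare the estimate with the stationary density exp(-2|x|).
        print(x, occupation_density(path, dt, x), math.exp(-2.0 * abs(x)))
```

Under the stationary density $e^{-2|x|}$ one has $E|X| = 1/2$, which gives a simple sanity check on the simulated path via the law of large numbers.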

The transformation $Y_t = F_{S_0}(X_t)$ simplifies the writing, because by the Itô formula the diffusion process $Y_t$ satisfies the differential equation

$$dY_t = f_{S_0}(X_t)\left[2 S_0(X_t)\,dt + dW_t\right], \qquad Y_0 = F_{S_0}(X_0),$$

with reflecting boundaries at 0 and 1, and under hypothesis $H_0$ it has the uniform invariant distribution on $[0, 1]$. Fix a number $\alpha \in (0, 1)$ and define the class $K_\alpha$ of tests of asymptotic level $1 - \alpha$ as follows:

$$K_\alpha = \left\{\varphi_T : \lim_{T \to \infty} E_{S_0}\, \varphi_T(X^T) \le \alpha\right\}.$$

Let us denote by $c_\alpha$ the value defined by the equation

$$P\left\{\int_0^1 \eta_f(s)^2\,ds > c_\alpha\right\} = P\left(W^2 > c_\alpha\right) = \alpha. \tag{4}$$

We have the following

Proposition 3.1. The C-vM test $\Phi_T(X^T) = 1_{\{W_T^2 > c_\alpha\}}$ belongs to $K_\alpha$.

Proof. The main idea of the proof is to show (under $H_0$), using the method of Ibragimov-Khasminskii (Theorem A22 in [13]), that the distribution of $W_T^2$ converges to the distribution of $W^2$. The detailed proof is given in [12].

3.2 KL expansion of Gaussian process $\eta_f(t)$

In this section we establish the KL expansion of the Gaussian process $\eta_f(t)$, $0 \le t \le 1$. The resulting representation of the limit statistic defined by equation (4) allows us to find the threshold $c_\alpha$. Under hypothesis $H_0$ the process $\eta_f(t)$ admits the following representation:

$$\eta_f(t) = 2\left(1 - |2t - 1|\right) \int_0^1 \frac{1_{\{s > t\}} - s}{1 - |2s - 1|}\,dW_s.$$

The last integral is with respect to a double-sided Wiener process, i.e., $W_s = W_+(s)$, $s \ge 1/2$, and $W_s = W_-(-s)$, $s \le 1/2$, where $W_+(\cdot)$ and $W_-(\cdot)$ are two independent Wiener processes. By the law of large numbers we have

$$\sqrt{T}\left(f_T(x) - f_{S_0}(x)\right) = 2 f_{S_0}(x)\, W\left(\frac{1}{T} \int_0^T \left(\frac{1_{\{X_t > x\}} - F_{S_0}(X_t)}{f_{S_0}(X_t)}\right)^2 dt\right) \longrightarrow 2 f_{S_0}(x)\, W\left(\int_{-\infty}^{\infty} \left(\frac{1_{\{y > x\}} - F_{S_0}(y)}{f_{S_0}(y)}\right)^2 dF_{S_0}(y)\right),$$

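where $W(u)$, $u \ge 0$, is some Wiener process. Making the transformation $t = F_{S_0}(x)$ we obtain

$$\eta_T(t) \Longrightarrow 2\left(1 - |2t - 1|\right) W\left(\int_0^1 \left(\frac{1_{\{s > t\}} - s}{1 - |2s - 1|}\right)^2 ds\right) = 2\left(1 - |2t - 1|\right) \int_0^1 \frac{1_{\{s > t\}} - s}{1 - |2s - 1|}\,dW_s = \eta_f(t).$$

The covariance function of $\eta_f(\cdot)$, for $0 \le s, t \le 1$, can be written as follows:

$$R_f(s, t) = 4\left(1 - |2t - 1|\right)\left(1 - |2s - 1|\right) \int_0^1 \frac{\left(1_{\{u > t\}} - u\right)\left(1_{\{u > s\}} - u\right)}{\left(1 - |2u - 1|\right)^2}\,du$$

$$= 4\left(1 - |2t - 1|\right)\left(1 - |2s - 1|\right) \left\{\int_0^{s \wedge t} \frac{u^2}{\left(1 - |2u - 1|\right)^2}\,du - \int_{s \wedge t}^{s \vee t} \frac{u(1 - u)}{\left(1 - |2u - 1|\right)^2}\,du + \int_{s \vee t}^{1} \frac{(1 - u)^2}{\left(1 - |2u - 1|\right)^2}\,du\right\}.$$

Let $s \le t$; then by a direct calculation we have

$$R_f(s, t) = \begin{cases} 4st \ln(4st) + 4s(1 - t), & s, t \le 1/2, \\ 4s(1 - t) \ln\left(4s(1 - t)\right) + 4s(1 - t), & s \le 1/2,\ t > 1/2, \\ 4(1 - s)(1 - t) \ln\left(4(1 - s)(1 - t)\right) + 4s(1 - t), & s, t > 1/2. \end{cases}$$

Therefore, we can write

$$R_f(s, t) = \left(1 - |2s - 1|\right)\left(1 - |2t - 1|\right) \ln\left[\left(1 - |2s - 1|\right)\left(1 - |2t - 1|\right)\right] - 4\left(1_{\{s > t\}} - s\right)\left(1_{\{t > s\}} - t\right), \qquad 0 \le s, t \le 1. \tag{5}$$

Of course,

$$R_f(t, t) = 2\left(1 - |2t - 1|\right)^2 \ln\left(1 - |2t - 1|\right) + 4t(1 - t), \qquad 0 \le t \le 1. \tag{6}$$

By the Karhunen-Loève theorem, we have the following

Proposition 3.2. The Gaussian process $\eta_f(t)$ has a KL expansion given by

$$\eta_f(t) = \sum_{n=1}^{\infty} \frac{\xi_n}{n\pi}\, \sqrt{2}\, \operatorname{sgn}(1/2 - t)\, \sin\left(n\pi\left(1 - |2t - 1|\right)\right) + \sum_{n=1}^{\infty} \frac{\xi_n^*}{\nu_n}\, c_n \left[\frac{\alpha(\nu_n) - \nu_n \alpha'(\nu_n)}{\sin(\nu_n) - \nu_n \cos(\nu_n)}\, \sin\left(\nu_n\left(1 - |2t - 1|\right)\right) - \alpha\left(\nu_n\left(1 - |2t - 1|\right)\right)\right], \qquad 0 \le t \le 1, \tag{7}$$

in which $c_n$ is the constant that normalizes the function in square brackets to unit norm in $L^2[0, 1]$, and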

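where $\{\xi_n,\ n \ge 1\}$, $\{\xi_n^*,\ n \ge 1\}$ denote two independent sequences of independent and identically distributed $N(0, 1)$ random variables and $\nu_n$, $n = 1, 2, \ldots$, are the positive zeros of $f(\cdot)$, defined by the equation

$$f(t) = \operatorname{Si}(t)\left[\alpha(t) - t\,\alpha'(t)\right] - G(t)\left[\sin(t) - t \cos(t)\right],$$

where

$$\alpha(t) = \operatorname{Ci}(t) \sin(t) - \operatorname{Si}(t) \cos(t), \qquad \alpha'(t) = \frac{d}{dt}\alpha(t), \qquad G(t) = \int_0^t \frac{\alpha(s)}{s}\,ds,$$

with

$$\operatorname{Ci}(t) = \gamma + \ln(t) + \int_0^t \frac{\cos(s) - 1}{s}\,ds, \qquad \operatorname{Si}(t) = \int_0^t \frac{\sin(s)}{s}\,ds,$$

the cosine and sine integrals respectively, and $\gamma = 0.577215\ldots$ Euler's constant.

Proof. To determine the eigenfunctions, one must solve the integral equation

$$\lambda_n \psi_n(t) = \int_0^1 R_f(t, s)\, \psi_n(s)\,ds, \qquad t \in [0, 1]. \tag{8}$$

Because the covariance function $R_f(s, t)$ involves an absolute value, we take $\psi_n(t) = \psi_{1,n}(t)\, 1_{\{t < 1/2\}} + \psi_{2,n}(t)\, 1_{\{t \ge 1/2\}}$. Here, the $\lambda_n$, $n = 1, 2, \ldots$, are eigenvalues yet to be determined. We have the boundary conditions

$$\psi_{1,n}(0) = \psi_{2,n}(1) = 0, \qquad \psi_{1,n}(1/2) = \psi_{2,n}(1/2), \tag{9}$$

$$\lambda_n \psi_n(1/2) = \int_0^1 \left(1 - |2s - 1|\right)\left[1 + \ln\left(1 - |2s - 1|\right)\right] \psi_n(s)\,ds. \tag{10}$$

Taking the derivative of equation (8) with respect to $t$, we obtain

$$\lambda_n \psi_n'(t) = \int_0^1 \frac{\partial R_f(t, s)}{\partial t}\, \psi_n(s)\,ds, \qquad t \in [0, 1].$$

Differentiating the last equation once more, we obtain

$$\lambda_n \psi_n''(t) + 4\, \psi_n(t) = \frac{4\, C_n}{1 - |2t - 1|}, \qquad t \in [0, 1], \tag{11}$$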

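where

$$C_n = \int_0^1 \left(1 - |2s - 1|\right) \psi_n(s)\,ds. \tag{12}$$

Let $\nu_n = 1/\sqrt{\lambda_n}$; then the last equation can be written as follows:

$$\psi_{1,n}''(t) + 4\nu_n^2\, \psi_{1,n}(t) = \frac{2\nu_n^2 C_n}{t}, \qquad t < 1/2,$$

$$\psi_{2,n}''(t) + 4\nu_n^2\, \psi_{2,n}(t) = \frac{2\nu_n^2 C_n}{1 - t}, \qquad t \ge 1/2.$$

When $C_n = 0$ the equation (11) has the solution

$$\psi_n(t) = \begin{cases} \psi_{1,n}(t) = A_{1,n} \sin(2\nu_n t) + B_{1,n} \cos(2\nu_n t), & t < 1/2, \\ \psi_{2,n}(t) = A_{2,n} \sin(2\nu_n t) + B_{2,n} \cos(2\nu_n t), & t \ge 1/2. \end{cases}$$

Now, using the Lagrange method (variation of constants), we have

$$\begin{cases} \dot A_{1,n} \sin(2\nu_n t) + \dot B_{1,n} \cos(2\nu_n t) = 0, \\ \dot A_{1,n} \cos(2\nu_n t) - \dot B_{1,n} \sin(2\nu_n t) = \dfrac{\nu_n C_n}{t}, \end{cases} \quad \Longrightarrow \quad \begin{cases} \dot A_{1,n} = \nu_n C_n \dfrac{\cos(2\nu_n t)}{t}, \\ \dot B_{1,n} = -\nu_n C_n \dfrac{\sin(2\nu_n t)}{t}. \end{cases}$$

Then the first equation has a solution of the following form:

$$\psi_{1,n}(t) = A_{1,n} \sin(2\nu_n t) + B_{1,n} \cos(2\nu_n t) + \nu_n C_n\, \alpha(2\nu_n t), \qquad t < 1/2,$$

where

$$\alpha(t) = \operatorname{Ci}(t) \sin(t) - \operatorname{Si}(t) \cos(t),$$

with

$$\operatorname{Ci}(t) = \gamma + \ln(t) + \int_0^t \frac{\cos(s) - 1}{s}\,ds, \qquad \operatorname{Si}(t) = \int_0^t \frac{\sin(s)}{s}\,ds.$$

In the same manner, for the second equation we have

$$\begin{cases} \dot A_{2,n} \sin(2\nu_n t) + \dot B_{2,n} \cos(2\nu_n t) = 0, \\ \dot A_{2,n} \cos(2\nu_n t) - \dot B_{2,n} \sin(2\nu_n t) = \dfrac{\nu_n C_n}{1 - t}, \end{cases} \quad \Longrightarrow \quad \begin{cases} \dot A_{2,n} = \nu_n C_n \dfrac{\cos(2\nu_n t)}{1 - t}, \\ \dot B_{2,n} = -\nu_n C_n \dfrac{\sin(2\nu_n t)}{1 - t}. \end{cases}$$

We use the following equalities:

$$\cos(2\nu_n t) = \cos\left(2\nu_n(1 - t)\right) \cos(2\nu_n) + \sin\left(2\nu_n(1 - t)\right) \sin(2\nu_n),$$

$$\sin(2\nu_n t) = \cos\left(2\nu_n(1 - t)\right) \sin(2\nu_n) - \sin\left(2\nu_n(1 - t)\right) \cos(2\nu_n).$$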

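Then the second equation has a solution of the following form:

$$\psi_{2,n}(t) = A_{2,n} \sin(2\nu_n t) + B_{2,n} \cos(2\nu_n t) + \nu_n C_n\, \alpha\left(2\nu_n(1 - t)\right), \qquad t \ge 1/2.$$

Therefore, using the boundary conditions (9), the general solution is of the following form:

$$\psi_n(t) = \left(A_{1,n} 1_{\{t < 1/2\}} - A_{2,n} 1_{\{t \ge 1/2\}}\right) \sin\left(\nu_n\left(1 - |2t - 1|\right)\right) + \nu_n C_n\, \alpha\left(\nu_n\left(1 - |2t - 1|\right)\right), \qquad t \in [0, 1]. \tag{13}$$

Now, returning to equation (10), replacing $\psi_n(s)$ by this general form and making the change of variable $u = \nu_n s$, we have

$$\psi_n(1/2) = \frac{\nu_n^2}{2} \int_0^1 s\left(1 + \ln(s)\right)\left[(A_{1,n} - A_{2,n}) \sin(\nu_n s) + 2 C_n \nu_n\, \alpha(\nu_n s)\right] ds$$

$$= \frac{1}{2}(A_{1,n} - A_{2,n}) \int_0^{\nu_n} u\left[1 - \ln(\nu_n) + \ln(u)\right] \sin(u)\,du + C_n \nu_n \int_0^{\nu_n} u\left[1 - \ln(\nu_n) + \ln(u)\right] \alpha(u)\,du.$$

By a simple integration by parts we have

$$\int_0^{\nu_n} u\left[1 - \ln(\nu_n) + \ln(u)\right] \sin(u)\,du = 2 \sin(\nu_n) - \nu_n \cos(\nu_n) - \operatorname{Si}(\nu_n),$$

$$\int_0^{\nu_n} u\left[1 - \ln(\nu_n) + \ln(u)\right] \alpha(u)\,du = 2\alpha(\nu_n) - \nu_n \alpha'(\nu_n) - G(\nu_n),$$

where

$$\alpha'(t) = \frac{d}{dt}\alpha(t) = \operatorname{Si}(t) \sin(t) + \operatorname{Ci}(t) \cos(t), \qquad G(t) = \int_0^t \frac{\alpha(s)}{s}\,ds.$$

Finally we have

$$(A_{1,n} - A_{2,n})\left[\sin(\nu_n) - \nu_n \cos(\nu_n) - \operatorname{Si}(\nu_n)\right] + 2 C_n \nu_n\left[\alpha(\nu_n) - \nu_n \alpha'(\nu_n) - G(\nu_n)\right] = 0. \tag{14}$$

In the same manner, for equality (12) we obtain

$$C_n = \frac{1}{2\nu_n^2} \int_0^{\nu_n} u\left[(A_{1,n} - A_{2,n}) \sin(u) + 2 C_n \nu_n\, \alpha(u)\right] du = \frac{1}{2\nu_n^2}(A_{1,n} - A_{2,n})\left[\sin(\nu_n) - \nu_n \cos(\nu_n)\right] + \frac{1}{\nu_n^2} C_n \nu_n\left[\alpha(\nu_n) - \nu_n \alpha'(\nu_n) + \nu_n\right],$$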

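and we have

$$(A_{1,n} - A_{2,n})\left(\sin(\nu_n) - \nu_n \cos(\nu_n)\right) + 2 C_n \nu_n\left(\alpha(\nu_n) - \nu_n \alpha'(\nu_n)\right) = 0. \tag{15}$$

Using the boundary conditions (9) and equalities (14), (15), we get the linear system

$$\begin{pmatrix} \sin(\nu_n) & \sin(\nu_n) & 0 \\ \operatorname{Si}(\nu_n) & -\operatorname{Si}(\nu_n) & 2\nu_n G(\nu_n) \\ \sin(\nu_n) - \nu_n \cos(\nu_n) & -\left(\sin(\nu_n) - \nu_n \cos(\nu_n)\right) & 2\nu_n\left(\alpha(\nu_n) - \nu_n \alpha'(\nu_n)\right) \end{pmatrix} \begin{pmatrix} A_{1,n} \\ A_{2,n} \\ C_n \end{pmatrix} = 0, \tag{16}$$

which has non-zero solutions in $(A_{1,n}, A_{2,n}, C_n)$ if and only if

$$\det \begin{pmatrix} \sin(\nu_n) & \sin(\nu_n) & 0 \\ \operatorname{Si}(\nu_n) & -\operatorname{Si}(\nu_n) & 2\nu_n G(\nu_n) \\ \sin(\nu_n) - \nu_n \cos(\nu_n) & -\left(\sin(\nu_n) - \nu_n \cos(\nu_n)\right) & 2\nu_n\left(\alpha(\nu_n) - \nu_n \alpha'(\nu_n)\right) \end{pmatrix} = -4\nu_n \sin(\nu_n)\, f(\nu_n) = 0, \tag{17}$$

where

$$f(\nu_n) = \operatorname{Si}(\nu_n)\left(\alpha(\nu_n) - \nu_n \alpha'(\nu_n)\right) - G(\nu_n)\left(\sin(\nu_n) - \nu_n \cos(\nu_n)\right). \tag{18}$$

Next we consider the function $\psi_n(t)$ of the form (13) with $A_{1,n}$, $A_{2,n}$, $C_n$ fulfilling (16), and we denote by $\nu_{1,n}$, $\nu_{2,n}$, $n = 1, 2, \ldots$, the solutions of $\sin(\nu_n) = 0$ and $f(\nu_n) = 0$ respectively. We have the following two cases.

Case 1: When $\nu_n = \nu_{1,n} = n\pi$ for some $n = 1, 2, \ldots$, we have $\sin(\nu_{1,n}) = 0$, so that (16) implies $A_{1,n} = A_{2,n}$ and $C_n = 0$. The requirement that $\int_0^1 \psi_n^2(t)\,dt = 1$ therefore yields

$$\psi_n(t) = \sqrt{2}\, \operatorname{sgn}(1/2 - t)\, \sin\left(n\pi\left(1 - |2t - 1|\right)\right), \qquad t \in [0, 1].$$

Case 2: When $\nu_n = \nu_{2,n}$ for some $n = 1, 2, \ldots$, we have $f(\nu_n) = 0$, so that (16) implies

$$A_{1,n} = -C_n \nu_{2,n}\, \frac{\alpha(\nu_{2,n}) - \nu_{2,n} \alpha'(\nu_{2,n})}{\sin(\nu_{2,n}) - \nu_{2,n} \cos(\nu_{2,n})} = -A_{2,n}.$$

The requirement that $\int_0^1 \psi_n^2(t)\,dt = 1$ then fixes $C_n$, and $\psi_n(t)$ is the normalized function proportional to

$$\frac{\alpha(\nu_{2,n}) - \nu_{2,n} \alpha'(\nu_{2,n})}{\sin(\nu_{2,n}) - \nu_{2,n} \cos(\nu_{2,n})}\, \sin\left(\nu_{2,n}\left(1 - |2t - 1|\right)\right) - \alpha\left(\nu_{2,n}\left(1 - |2t - 1|\right)\right), \qquad t \in [0, 1],$$

and this ends the proof.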
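The Case 1 eigenpairs can be checked numerically against the integral equation (8). The sketch below is only an illustration under stated assumptions: the function names are ours, the integral is approximated by a midpoint rule with a hypothetical number of cells `m`, and the term $-4(1_{\{s>t\}} - s)(1_{\{t>s\}} - t)$ of (5) is rewritten as $4(s \wedge t - st)$, which coincides with it for $s \ne t$ and agrees with (6) on the diagonal.

```python
import math


def w(t):
    """Shorthand for 1 - |2t - 1|."""
    return 1.0 - abs(2.0 * t - 1.0)


def cov_rf(s, t):
    """Covariance R_f(s, t) from (5), for 0 < s, t < 1 (min form, see text)."""
    p = w(s) * w(t)
    return p * math.log(p) + 4.0 * (min(s, t) - s * t)


def psi_case1(n, t):
    """Case 1 eigenfunction sqrt(2) sgn(1/2 - t) sin(n pi (1 - |2t - 1|))."""
    sign = 1.0 if t < 0.5 else (-1.0 if t > 0.5 else 0.0)
    return math.sqrt(2.0) * sign * math.sin(n * math.pi * w(t))


def apply_kernel(func, t, m=4000):
    """Midpoint-rule approximation of int_0^1 R_f(t, s) func(s) ds."""
    h = 1.0 / m
    return h * sum(
        cov_rf(t, (k + 0.5) * h) * func((k + 0.5) * h) for k in range(m)
    )


if __name__ == "__main__":
    for n in (1, 2, 3):
        lam = 1.0 / (n * math.pi) ** 2  # eigenvalue 1 / nu_{1,n}^2
        for t in (0.1, 0.3, 0.8):
            lhs = lam * psi_case1(n, t)
            rhs = apply_kernel(lambda s: psi_case1(n, s), t)
            print(n, t, lhs, rhs)  # the two columns should nearly coincide
```

The check works because the logarithmic part of $R_f$ is symmetric about $s = 1/2$ and therefore integrates to zero against the antisymmetric functions of Case 1.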

3.3 Simulation

As a direct consequence of (7), under hypothesis $H_0$ the distribution of $\int_0^1 \eta_f^2(t)\,dt$ coincides with the distribution of a quadratic form; we obtain the identity

$$\int_0^1 \eta_f^2(t)\,dt \stackrel{d}{=} \sum_{n=1}^{\infty} \frac{\xi_n^2}{n^2 \pi^2} + \sum_{n=1}^{\infty} \frac{\xi_n^{*2}}{\nu_n^2}, \tag{19}$$

which allows one to evaluate the distribution of $\int_0^1 \eta_f^2(t)\,dt$ by a number of methods (see, e.g., Smirnov [23], Martynov [20], and the references therein). We may check from the formulas (7) and (19) that

$$\int_0^1 R_f(t, t)\,dt = E \int_0^1 \eta_f^2(t)\,dt = \sum_{n=1}^{\infty} \frac{1}{n^2 \pi^2} + \sum_{n=1}^{\infty} \frac{1}{\nu_n^2} = \frac{1}{6} + \frac{5}{18} = \frac{4}{9}, \tag{20}$$

where for the first sum we have used Euler's formula $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$, and the second sum was evaluated numerically (secant method). To check (20), we infer from (6) that

$$\int_0^1 R_f(t, t)\,dt = 2 \int_0^1 \left[2t(1 - t) + t^2 \ln(t)\right] dt = 2\left(\frac{1}{3} - \frac{1}{9}\right) = \frac{4}{9}. \tag{21}$$

We see therefore that (20) and (21) are in agreement. The random variable $\int_0^1 \eta_f^2(t)\,dt$ is a weighted sum of independent $\chi_1^2$ components. Thus, calculating the $1 - \alpha$ quantiles of this random variable, we obtain the threshold $c_\alpha$ of the C-vM test defined in equation (4).

[Figure: Threshold choice of $\int_0^1 \eta_f^2(t)\,dt$; the threshold $c_\alpha$ is plotted against $\alpha$.]

In particular, if α =, 5 we obtain c,5 = 2, 422.
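The computation described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's program: the zeros $\nu_n$ of $f$ are located by sign changes on a grid with linear interpolation (rather than the secant method mentioned above), $\operatorname{Si}$, $\operatorname{Ci}$ and $G$ are accumulated by the trapezoidal rule, both series in (19) are truncated, and the quantile is estimated by plain Monte Carlo. All function names, grid steps, truncation levels and the seed are hypothetical choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def zeros_of_f(t_max=100.0, h=1e-3):
    """Positive zeros (up to t_max) of
    f(t) = Si(t) [a(t) - t a'(t)] - G(t) [sin t - t cos t]."""
    si = cin = g_reg = 0.0
    # Limits at t = 0 of sin(t)/t, (cos(t) - 1)/t and a(t)/t - ln(t).
    y_si, y_cin, y_g = 1.0, 0.0, EULER_GAMMA - 1.0
    roots, t_prev, f_prev = [], 0.0, 0.0
    for k in range(1, int(round(t_max / h)) + 1):
        t = k * h
        s, c, log_t = math.sin(t), math.cos(t), math.log(t)
        z_si, z_cin = s / t, (c - 1.0) / t
        si += 0.5 * h * (y_si + z_si)      # Si(t), trapezoidal rule
        cin += 0.5 * h * (y_cin + z_cin)   # int_0^t (cos s - 1)/s ds
        ci = EULER_GAMMA + log_t + cin     # Ci(t)
        a = ci * s - si * c                # alpha(t)
        da = ci * c + si * s               # alpha'(t)
        z_g = a / t - log_t                # integrand of G minus its log part
        g_reg += 0.5 * h * (y_g + z_g)
        big_g = g_reg + t * log_t - t      # add back int_0^t ln(s) ds
        f = si * (a - t * da) - big_g * (s - t * c)
        if t_prev > 0.5 and f_prev * f < 0.0:
            roots.append(t_prev - f_prev * h / (f - f_prev))  # interpolate
        y_si, y_cin, y_g, t_prev, f_prev = z_si, z_cin, z_g, t, f
    return roots


def trace_from_variance(m=20000):
    """Midpoint-rule value of int_0^1 R_f(t, t) dt using (6)."""
    total = 0.0
    for k in range(m):
        t = (k + 0.5) / m
        wt = 1.0 - abs(2.0 * t - 1.0)
        total += 2.0 * wt * wt * math.log(wt) + 4.0 * t * (1.0 - t)
    return total / m


def kl_weights(n_first=50, t_max=100.0):
    """Truncated list of the weights 1/(n pi)^2 and 1/nu_n^2 appearing in (19)."""
    first = [1.0 / (n * math.pi) ** 2 for n in range(1, n_first + 1)]
    return first + [1.0 / (v * v) for v in zeros_of_f(t_max)]


def threshold(alpha, weights, n_sim=4000, seed=0):
    """Monte Carlo estimate of the (1 - alpha) quantile of sum_k w_k xi_k^2."""
    rng = random.Random(seed)
    draws = sorted(
        sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in weights)
        for _ in range(n_sim)
    )
    return draws[min(n_sim - 1, int((1.0 - alpha) * n_sim))]


if __name__ == "__main__":
    nus = zeros_of_f()
    print("partial sum of 1/nu_n^2:", sum(1.0 / (v * v) for v in nus))
    print("integral of R_f(t, t):", trace_from_variance())
    wts = kl_weights()
    for alpha in (0.1, 0.05, 0.01):
        print(alpha, threshold(alpha, wts))
```

Because the series are truncated, the partial sum of $1/\nu_n^2$ approaches $5/18$ from below, and the Monte Carlo thresholds carry both truncation and sampling error; larger `t_max`, `n_first` and `n_sim` reduce these at the cost of run time.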

Acknowledgments

The author is very grateful to Professor Y. Kutoyants for suggesting this problem and for fruitful discussions, and to Professor M. Kleptsyna for important help.

References

[1] Anderson, T. W. and Darling, D. A., 1952. Asymptotic theory of certain goodness of fit criteria based on stochastic processes. The Annals of Mathematical Statistics, 23, 193-212.
[2] Anderson, T. W. and Darling, D. A., 1954. A test of goodness of fit. Journal of the American Statistical Association, 49, 765-769.
[3] Ash, R. B. and Gardner, M. F., 1975. Topics in Stochastic Processes. Academic Press, New York.
[4] Dachian, S. and Kutoyants, Yu. A., 2007. On the Goodness-of-Fit Testing for Some Continuous Time Processes. Statistical Models and Methods for Biomedical and Technical Systems, F. Vonta et al. (Eds), Birkhäuser, Boston, 395-413.
[5] Bosq, D. and Davydov, Y., 1998. Local time and density estimation in continuous time. Math. Methods Statist., 8, 1, 22-45.
[6] Deheuvels, P., 2006. Karhunen-Loève expansions for mean-centered Wiener processes. IMS Lecture Notes-Monograph Series, High Dimensional Probability, Vol. 51, 62-76.
[7] Deheuvels, P. and Martynov, G. V., 2003. Karhunen-Loève expansions for weighted Wiener processes and Brownian bridges via Bessel functions. Prog. Probab., 55, 57-93.
[8] Deheuvels, P. and Martynov, G. V., 1996. Cramér-von Mises-Type Tests with Applications to Tests of Independence for Multivariate Extreme-Value Distributions. Commun. Statist. Theory Meth., 25(4), 871-908.
[9] Durbin, J. and Knott, M., 1972. Components of Cramér-von Mises statistics. II. Journal of the Royal Statistical Society B, 34, 290-307.
[10] Durbin, J., 1973. Distribution Theory for Tests Based on the Sample Distribution Function. SIAM, Philadelphia.

[11] Durrett, R., 1996. Stochastic Calculus: A Practical Introduction. CRC Press, Boca Raton.
[12] Gassem, A., 2008. Goodness-of-Fit test for switching diffusion, prepublication 8-7, Université du Maine (http://www.univ-lemans.fr/sciences/ statist/download/gassem/article anis.pdf).
[13] Ibragimov, I. A. and Khasminskii, R. Z., 1981. Statistical Estimation: Asymptotic Theory. Springer-Verlag, New York.
[14] Kac, M., 1951. On some connections between probability theory and differential integral equations. Proc. Second Berkeley Sympos. Math. Statist. Probab., 18-215.
[15] Kac, M. and Siegert, A. J. F., 1947. An explicit representation of a stationary Gaussian process. Ann. Math. Statist., 18, 438-442.
[16] Kutoyants, Yu. A., 1997. Some problems of nonparametric estimation by observations of ergodic diffusion process. Statist. Probab. Lett., 32, 311-320.
[17] Kutoyants, Yu. A., 2000. On parameter estimation for switching ergodic diffusion processes. CRAS Paris, t. 330, Série 1, 925-930.
[18] Kutoyants, Yu. A., 2004. Statistical Inference for Ergodic Diffusion Processes. Springer-Verlag, London.
[19] Kutoyants, Yu. A., 2009. On the Goodness-of-fit Testing for Ergodic Diffusion Process, prepublication 9-2, Université du Maine (http://www. univ-lemans.fr/sciences/statist/download/kutoyants/gof-ed.pdf).
[20] Martynov, G. V., 1975. Computation of the distribution function of quadratic forms in normal random variables. Theor. Probab. Appl., 20, 797-89.
[21] Martynov, G. V., 1992. Statistical Tests Based on Empirical Processes and Related Question. J. Soviet. Math., 61, 2195-2271.
[22] Revuz, D. and Yor, M., 1991. Continuous Martingales and Brownian Motion. Springer, N.Y.
[23] Smirnov, N. V., 1937. On the distribution of the ω² criterion. Rec. Math., 6, 3-26.
[24] Stephens, M. A., 1974. Components of goodness-of-fit statistics. Annales de l'Institut Henri Poincaré, Section B, 10, 37-54.