Bernoulli 3(4), 1997, 445–456

Efficiency of the empirical distribution for ergodic diffusion

YURY A. KUTOYANTS
Laboratoire de Statistique et Processus, Université du Maine, 72017 Le Mans, France. e-mail: kutoyants@univ-lemans.fr

We consider the problem of stationary distribution function estimation at a given point by the observations of an ergodic diffusion process on the interval [0, T] as T → ∞. First we introduce a lower (minimax) bound on the risk of all estimators and then we prove that the empirical distribution function attains this bound. Hence this estimator is asymptotically efficient in the sense of the given bound.

Keywords: diffusion process; minimax bound; nonparametric estimation.

1. Introduction

We consider the problem of estimation of a one-dimensional distribution function F(x) by the observation of a diffusion process {X_t, 0 ≤ t ≤ T} as T → ∞. We suppose that the process X_t, t ≥ 0, possesses ergodic properties with invariant measure P and F(x) = P((−∞, x]). We introduce a lower (minimax) bound on the risks of all estimators, then we define the asymptotically efficient estimators as estimators attaining this bound, and finally we show that the empirical distribution function is asymptotically efficient in this problem.

The same program was already realized for several other models of observations. For independent identically distributed random variables this was done by Dvoretzky et al. (1956) (see also Millar (1983, Section VIII) and references therein); then Penev (1991) proved the asymptotic efficiency of the empirical distribution function for exponentially ergodic Markov chains with state space [0, 1] (the more general case was treated by van der Vaart and Wellner (1990)), and further generalizations were given by Greenwood and Wefelmeyer (1995).
The difference between these results lies in the types of model, the regularity conditions and the choice of the definitions of asymptotic optimality; common to all of them is the possibility of n^{1/2}-consistent estimation of the underlying distribution. An exhaustive description of such situations has been given in Bickel et al. (1993). Our statement of the problem can be termed semiparametric because we estimate the one-dimensional parameter τ = F(x), i.e., the value of the unknown function at one point only (Bickel 1993, p. 59).

The diffusion process X_t, t ≥ 0, is supposed to be a solution of the stochastic differential equation

    dX_t = S(X_t) dt + σ(X_t) dW_t,   X_0 = x_0,   0 ≤ t ≤ T,                (1)

1350–7265 © 1997 Chapman & Hall
where {W_t, t ≥ 0} is a standard Wiener process. The trend coefficient S(·) is unknown to the observer and the diffusion coefficient σ(·)² is a known positive function. Recall that the diffusion coefficient can be estimated without error by the observation of (1) (see, for example, Genon-Catalot and Jacod 1994). We suppose that the functions S(·) and σ(·) satisfy the global Lipschitz condition

    |S(x) − S(y)| + |σ(x) − σ(y)| ≤ L|x − y|,

so (1) has a unique strong solution (see, for example, Liptser and Shiryayev 1977, Theorem 4.6). This condition will not be used directly, so any other condition providing that property of the solution can replace the given one. We assume for simplicity of exposition that the functions S(·) and σ(·) are continuous. We suppose that

    ∫_0^y S(v) σ(v)^{−2} dv → −∞   as |y| → ∞,

and the condition

    G(S) ≡ G = ∫_{−∞}^{∞} σ(y)^{−2} exp{2 ∫_0^y S(v) σ(v)^{−2} dv} dy < ∞

are fulfilled. The class of such functions S(·) we denote by Θ. These conditions provide the existence of a stationary distribution

    F(x) = G(S)^{−1} ∫_{−∞}^{x} σ(y)^{−2} exp{2 ∫_0^y S(v) σ(v)^{−2} dv} dy

for the diffusion process (1) (Mandl 1968). Under this condition the random process X_t, t ≥ 0, has ergodic properties, i.e., for any measurable function g(·) with E|g(ξ)| < ∞ (ξ has F(·) as distribution function)

    P{ lim_{T→∞} T^{−1} ∫_0^T g(X_t) dt = E g(ξ) } = 1

(Mandl 1968, p. 93). Below we are interested in the problem of nonparametric (S(·) is unknown) estimation of the function F(x). The empirical distribution function (EDF)

    F̂_T(x) = T^{−1} ∫_0^T 1{X_t < x} dt                                        (2)

is a natural estimator of the function F(x). Recall that the EDF

    F̂_n(x) = n^{−1} Σ_{j=1}^{n} 1{X_j < x}

in the case of independent identically distributed observations {X_1, …, X_n} is uniformly consistent, asymptotically normal and asymptotically efficient. In this paper it is shown that for the ergodic diffusion process (1) the EDF (2) has similar properties.
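As a numerical illustration (our addition, not part of the paper), the EDF (2) can be computed for a simulated path of (1). The Ornstein-Uhlenbeck choice S(x) = −x, σ(x) ≡ 1 is an assumption made here for concreteness; its stationary law is N(0, 1/2), so F(0) = 1/2. The Euler-Maruyama scheme is used as a discrete approximation of (1).

```python
import numpy as np

# Illustration only: the EDF (2) for a simulated ergodic diffusion.
# Assumed model (not from the paper): Ornstein-Uhlenbeck,
#   dX_t = -X_t dt + dW_t,  stationary law N(0, 1/2), so F(0) = 1/2.
rng = np.random.default_rng(0)
dt, T = 0.01, 1000.0
n = int(T / dt)
x = np.empty(n + 1)
x[0] = 0.0
dw = rng.normal(0.0, np.sqrt(dt), n)
for t in range(n):
    x[t + 1] = x[t] - x[t] * dt + dw[t]      # Euler-Maruyama step for (1)

def edf(path, y):
    """F^_T(y) = (1/T) int_0^T 1{X_t < y} dt, as a discrete time average."""
    return float(np.mean(path < y))

print(edf(x, 0.0))    # close to F(0) = 0.5
```

With T = 1000 the observed deviation of F̂_T(0) from 1/2 is of order T^{−1/2}, in agreement with the T^{1/2} rate discussed below.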
The density function will be denoted as

    f(y) = G(S)^{−1} σ(y)^{−2} exp{2 ∫_0^y S(v) σ(v)^{−2} dv}.                  (3)

2. Lower bound

To derive the lower minimax bound we shall follow the same approach which was applied in the density estimation problem for this model of observations by Kutoyants (1995). The idea of approximating this nonparametric bound by a family of bounds for parametric families and then choosing the worst parametric family was suggested by Stein (1956) and was realized by Levit (1973) and Koshevnik and Levit (1976) (see also the monographs by Ibragimov and Khasminskii (1981, Chapter 4) and Bickel et al. (1993, Chapter 3)). In the present work we take the same approach. So, as in the work of Kutoyants (1995), we fix some function S(·) ∈ Θ and suppose that the observed process is

    dX_t = [S(X_t) + H(X_t)] dt + σ(X_t) dW_t,   X_0 = x_0,   0 ≤ t ≤ T,        (4)

where H(·) is such that S(·) + H(·) ∈ Θ. Introduce the set

    U_δ = {H(·): sup_x |H(x)| ≤ δ, S(·) + H(·) ∈ Θ}

and denote by {P_H^{(T)}, H(·) ∈ U_δ} the corresponding family of measures P_H^{(T)} induced by the process (4) in the space C[0, T] of continuous functions on [0, T]. The mathematical expectation with respect to this measure will be denoted by E_H. Further consider the distribution function

    F_H(x) = G(S + H)^{−1} ∫_{−∞}^{x} σ(y)^{−2} exp{2 ∫_0^y [S(v) + H(v)] σ(v)^{−2} dv} dy,

with corresponding normalizing constant G(S + H), and put

    I_x = 4 E[ { F(ξ ∧ x){1 − F(ξ ∨ x)} / (σ(ξ) f(ξ)) }² ],                      (5)

where ξ ∧ x = min(ξ, x) and ξ ∨ x = max(ξ, x). Introduce the set

    Θ_0 = {S(·): sup_{H(·)∈U_δ} G(S + H) < ∞, S(·) ∈ Θ}.

The set Θ_0 is not empty. For example, if σ(·) ≡ 1 and S(y) sgn y ≤ −γ_0 for all large values of |y| with some γ_0 > 0, then S(·) ∈ Θ_0 for any δ < γ_0. We suppose that the loss function ℓ(·) has the following `usual' properties: ℓ(y, z) = ℓ(y − z), y, z ∈ R; ℓ(·) is non-negative on R; ℓ(0) = 0; ℓ(·) is continuous at z = 0 but is not identically 0; ℓ(·) is symmetric and non-decreasing on R_+; finally, the function ℓ(z) grows as z → ∞ more slowly than any one of the functions exp(εz²), ε > 0 (see, for example, Ibragimov and Khasminskii 1981).
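For a concrete model, the normalizing constant G(S), the density (3), the distribution function and the quantity I_x of (5) can all be evaluated by quadrature. The following sketch is our illustration, not the paper's; the Ornstein-Uhlenbeck model S(x) = −x, σ ≡ 1 is assumed, for which f(y) = exp(−y²)/√π and G(S) = √π in closed form.

```python
import numpy as np

# Illustration (assumed model S(x) = -x, sigma = 1): evaluate G(S), the
# density (3), the distribution function F and the quantity I_x of (5)
# at x = 0 by quadrature on a grid.
def trap(y, grid):
    """Trapezoidal rule for int y d(grid)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(grid)) / 2.0)

v = np.linspace(-8.0, 8.0, 16001)
dv = v[1] - v[0]
S = -v
# 2 * int_0^y S(u) du, by a cumulative trapezoidal rule anchored at u = 0
cum = np.concatenate(([0.0], np.cumsum((S[1:] + S[:-1]) * dv / 2.0)))
i0 = len(v) // 2                      # grid point v = 0
expo = 2.0 * (cum - cum[i0])          # equals -v**2 for this model
G = trap(np.exp(expo), v)             # G(S); exact value is sqrt(pi)
f = np.exp(expo) / G                  # density (3)
F = np.cumsum(f) * dv                 # distribution function (crude cumsum)
m = np.minimum(F, F[i0]) * (1.0 - np.maximum(F, F[i0]))   # F(v^x){1-F(v v x)}
# Integrate only on |v| <= 4.5: further out the true contribution is
# negligible while 1 - F is dominated by rounding of the cumulative sum.
ok = np.abs(v) <= 4.5
I_x = 4.0 * trap((m[ok] ** 2) / f[ok], v[ok])
print(G, F[i0], I_x)
```

For this model the quadrature gives I_x of about 0.35; this is the variance constant entering the bound (6) below.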
Theorem 1. Let S(·) ∈ Θ_0 and I_x > 0; then

    lim_{δ→0} lim inf_{T→∞} inf_{F̄_T} sup_{H(·)∈U_δ} E_H ℓ[T^{1/2}{F̄_T(x) − F_H(x)}] ≥ (2π)^{−1/2} ∫ ℓ(u I_x^{1/2}) e^{−u²/2} du,        (6)

where the inf is taken over all possible estimators F̄_T(x).

Proof. The supremum on U_δ is estimated from below by the supremum on some parametric family with a special parametrization passing through the model with S(·). For this parametric model we apply the Hajek–Le Cam inequality, then maximize the right-hand side of the last inequality and find the worst parametric family (with minimal Fisher information). This last quantity (risk) is just the right-hand side of (6).

Let us introduce the parametric family of functions

    S_h(x) = S(x) + (h − τ) ψ(x) σ(x)²,

where h ∈ (τ − γ, τ + γ), γ > 0, and the function ψ(·) has a compact support, S_h(·) ∈ Θ. Then, for small γ,

    (h − τ) ψ(·) σ(·)² ∈ U_δ.

The corresponding family of stochastic differential equations is

    dX_t = [S(X_t) + (h − τ) ψ(X_t) σ(X_t)²] dt + σ(X_t) dW_t,   X_0 = x_0,   0 ≤ t ≤ T,        (7)

with h ∈ (τ − γ, τ + γ) and the corresponding family of measures is {P_h^{(T)}, h ∈ (τ − γ, τ + γ)}. We can consider the problem of the estimation of h based on observation of (7). Under our assumptions the stochastic process (7) also has the ergodic properties if h = τ; hence the family of measures {P_h^{(T)}, h ∈ (τ − γ, τ + γ)} is locally asymptotically normal at the point h = τ, i.e., the likelihood ratio admits the representation (see, for example, Kutoyants 1984, Theorem 3.3.8)

    (dP_{τ+T^{−1/2}u}^{(T)} / dP_τ^{(T)})(X^T) = exp{u Δ_T − (u²/2) I_ψ + r_T},

where u ∈ R,

    I_ψ = ∫ ψ(x)² σ(x)² f(x) dx,   P_τ-lim_{T→∞} r_T = 0,

    Δ_T = T^{−1/2} ∫_0^T ψ(X_t) σ(X_t) dW_t,   L_τ{Δ_T} ⇒ N(0, I_ψ).

Thus by the Hajek–Le Cam inequality

    lim_{δ→0} lim inf_{T→∞} inf_{h_T} sup_{|h−τ|<δ} E_h ℓ{T^{1/2}(h_T − h)} ≥ E ℓ(ζ I_ψ^{−1/2}),        (8)

with ζ ~ N(0, 1), and, as was noted by Ibragimov and Khasminskii (1981, p. 217), the difference h_T − h can be replaced by h_T − h + o(|h − τ|) without changing the right-hand side of (8).
We put

    F_h(x) = G_h^{−1} ∫_{−∞}^{x} σ(y)^{−2} exp{2 ∫_0^y S(v) σ(v)^{−2} dv + 2(h − τ) ∫_0^y ψ(v) dv} dy.

The function ψ(·) has a compact support; hence we can expand F_h(x) in powers of h − τ in the vicinity of the point τ and obtain

    F_h(x) = F(x) + 2(h − τ) [ ∫_{−∞}^{x} ( ∫_0^y ψ(v) dv ) f(y) dy − F(x) ∫_{−∞}^{∞} ( ∫_0^y ψ(v) dv ) f(y) dy ] + o(|h − τ|)
           = F(x) + 2(h − τ) E{[1{ξ < x} − F(x)] Ψ(ξ)} + o(|h − τ|),

where ξ as before has the distribution function F(·) and

    Ψ(ξ) = ∫_0^{ξ} ψ(y) dy.

Let us put τ = F(x), τ_h = F_h(x) and introduce the class K of functions ψ(·) satisfying the equality

    E{[1{ξ < x} − F(x)] Ψ(ξ)} = 1/2.

Then for ψ(·) ∈ K we have the expansion

    F_h(x) = F(x) + 2(h − τ) E{[1{ξ < x} − F(x)] Ψ(ξ)} + o(|h − τ|) = h + o(|h − τ|),

or τ_h = h + o(|h − τ|). Let the function ψ(·) ∈ K; then

    sup_{H(·)∈U_δ} E_H ℓ[T^{1/2}{F̄_T(x) − F_H(x)}] ≥ sup_{|h−τ|<γ} E_h ℓ[T^{1/2}{F̄_T(x) − F_h(x)}]
        = sup_{|h−τ|<γ} E_h ℓ[T^{1/2}{h_T − h + o(|h − τ|)}],

where h_T = F̄_T(x) is an arbitrary estimator of h. So we can choose γ = γ(δ) → 0 as δ → 0 in such a way that for any estimator F̄_T(x)

    lim_{δ→0} lim inf_{T→∞} sup_{H(·)∈U_δ} E_H ℓ[T^{1/2}{F̄_T(x) − F_H(x)}]
        ≥ lim_{γ→0} lim inf_{T→∞} sup_{|h−τ|<γ} E_h ℓ[T^{1/2}{h_T − h + o(|h − τ|)}]
        ≥ (2π)^{−1/2} ∫ ℓ(u I_ψ^{−1/2}) e^{−u²/2} du.

The last integral is a monotonically decreasing function of I_ψ = E ψ(ξ)² σ(ξ)²; hence to find the worst parametric family (defined by ψ(·)) we have to minimize I_ψ on the class K.
Fubini's theorem allows us to write

    E{[1{ξ < x} − F(x)] Ψ(ξ)} = ∫_{−∞}^{x} Ψ(y) f(y) dy − F(x) ∫_{−∞}^{∞} Ψ(y) f(y) dy.

Exchanging the order of integration in both terms, we have

    ∫_{−∞}^{x} Ψ(y) f(y) dy = ∫ ψ(v) {F(x) − F(v ∧ x)} 1{v ≥ 0} dv − ∫ ψ(v) F(v ∧ x) 1{v < 0} dv,

    ∫_{−∞}^{∞} Ψ(y) f(y) dy = ∫ ψ(v) {1 − F(v)} 1{v ≥ 0} dv − ∫ ψ(v) F(v) 1{v < 0} dv,

so that

    E{[1{ξ < x} − F(x)] Ψ(ξ)} = ∫ ψ(v) [ {F(x) − 1} F(v) 1{v < x} + {F(v) − 1} F(x) 1{v ≥ x} ] dv.

The function ψ(·) belongs to the class K; hence by the Cauchy–Schwarz inequality

    1/4 = [ E{[1{ξ < x} − F(x)] Ψ(ξ)} ]² = [ ∫ ψ(v) {F(v ∨ x) − 1} F(v ∧ x) dv ]²
        ≤ ∫ ψ(v)² σ(v)² f(v) dv × ∫ [ {F(v ∨ x) − 1} F(v ∧ x) ]² σ(v)^{−2} f(v)^{−1} dv
        = E ψ(ξ)² σ(ξ)² × E[ ( {1 − F(ξ ∨ x)} F(ξ ∧ x) / (σ(ξ) f(ξ)) )² ].

Therefore

    I_ψ ≥ [ 4 E{ {1 − F(ξ ∨ x)} F(ξ ∧ x) / (σ(ξ) f(ξ)) }² ]^{−1} = I_x^{−1}

for all ψ(·) ∈ K, and we obtain the equality for the function

    ψ_x(v) = C σ(v)^{−2} {1 − F(v ∨ x)} F(v ∧ x) f(v)^{−1},

with some constant C ≠ 0. This function does not have compact support, but we can introduce the sequence ψ_N(v) = ψ_x(v) 1{|v| < N}, N → ∞, to obtain (6).   □
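The extremal step above can be checked numerically: for ψ proportional to {1 − F(v ∨ x)}F(v ∧ x)/(σ(v)²f(v)), with the constant fixed by the constraint defining K, one should find I_ψ I_x = 1, i.e., the minimal Fisher information equals I_x^{−1}. A sketch (our illustration, under an assumed Ornstein-Uhlenbeck model S(x) = −x, σ ≡ 1, at the point x = 0, using the closed forms f(y) = exp(−y²)/√π and F(y) = {1 + erf(y)}/2):

```python
import numpy as np
from math import erf

# Illustration: for the minimizing psi of the proof, I_psi * I_x = 1.
# Assumed model: Ornstein-Uhlenbeck, S(x) = -x, sigma = 1; the point is x = 0.
def trap(y, grid):
    """Trapezoidal rule for int y d(grid)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(grid)) / 2.0)

v = np.linspace(-8.0, 8.0, 16001)
f = np.exp(-v ** 2) / np.sqrt(np.pi)              # stationary density
F = np.array([(1.0 + erf(t)) / 2.0 for t in v])   # distribution function
i0 = len(v) // 2                                  # grid point v = 0
m = np.minimum(F, F[i0]) * (1.0 - np.maximum(F, F[i0]))  # F(v^x){1-F(v v x)}
A = trap(m ** 2 / f, v)                           # int m^2/(sigma^2 f) dv
I_x = 4.0 * A                                     # the quantity (5)
C = 1.0 / (2.0 * A)                               # makes int psi m dv = 1/2
psi = C * m / f                                   # extremal psi (sigma = 1)
I_psi = trap(psi ** 2 * f, v)                     # Fisher information
print(I_psi * I_x)                                # equals 1 up to rounding
```

The product I_ψ I_x returns 1 up to rounding, confirming that this ψ attains the Cauchy–Schwarz lower bound.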
The inequality (6) suggests that we introduce the following definition of an asymptotically efficient estimator.

Definition. Let the conditions of Theorem 1 be fulfilled; then we say that the estimator F̄_T(x) is locally asymptotically minimax (LAM) for the loss function ℓ(·) if for any x ∈ R

    lim_{δ→0} lim_{T→∞} sup_{H(·)∈U_δ} E_H ℓ[T^{1/2}{F̄_T(x) − F_H(x)}] = E ℓ(ζ I_x^{1/2}),

where L(ζ) = N(0, 1).

We shall show below that for the polynomial loss functions ℓ(·) the EDF is LAM.

3. Empirical distribution function

By the law of large numbers the EDF is a consistent estimator of the value F(x), and even uniformly consistent.

Theorem (Glivenko–Cantelli).

    P{ lim_{T→∞} sup_x |F̂_T(x) − F(x)| = 0 } = 1.

Proof. The proof is well known and coincides with the proof of this theorem in the i.i.d. case.   □

The normed difference

    W_T(x) = T^{1/2} [F̂_T(x) − F(x)] = T^{−1/2} ∫_0^T {1{X_t < x} − F(x)} dt

is asymptotically normal by the following central limit theorem.

Lemma 1. Let the integrals

    E h(ξ) = ∫_{−∞}^{∞} h(y) f(y) dy = 0

and

    D(h, h) = 4G ∫_{−∞}^{∞} p(s) ( ∫_{−∞}^{s} h(z) f(z) dz )² ds < ∞,

where

    p(s) = exp{ −2 ∫_0^s S(v) σ(v)^{−2} dv },

converge absolutely; then

    L{ T^{−1/2} ∫_0^T h(X_t) dt } ⇒ N(0, D(h, h)).                              (9)

Proof. For the proof see Mandl (1968, pp. 93 and 94) (see also Lanska 1979 for this formulation).   □

Hence the EDF is a uniformly consistent and asymptotically normal estimator of the one-dimensional distribution function F(x). Below we calculate its limit variance and compare it with the bound (6). By Lemma 1 the limit variance of W_T(x) is D(h_x, h_x) with

    h_x(y) = 1{y < x} − F(x).

This quantity is calculated in the Appendix and is equal to

    D(h_x, h_x) = 4 E[ { F(ξ ∧ x){1 − F(ξ ∨ x)} / (σ(ξ) f(ξ)) }² ] = I_x.

Therefore, if the convergence is uniform, then the EDF is LAM for the bounded loss functions.

Remark. The central limit theorem (9) was proved by Mandl (1968) with the help of the central limit theorem for sums of independent identically distributed random variables, and the conditions of uniform asymptotic normality for such sums can be found in Ibragimov and Khasminskii (1981, Appendix 1).

The asymptotic efficiency of the EDF can also be proved for the polynomial loss functions ℓ(u) = |u|^p with p ≥ 2. To show this we shall use another form of Lemma 1 and we have to strengthen the conditions. Let us introduce the functions

    I(H) = 4 E_H[ { (F_H(ξ ∧ x) − F_H(x) F_H(ξ)) / (σ(ξ) f_H(ξ)) }² ]

and

    g(y) = 2 ∫_0^y {F_H(v ∧ x) − F_H(v) F_H(x)} σ(v)^{−2} f_H(v)^{−1} dv

and introduce the following condition.

(C1) There exists a number p_0 > 2 such that

    sup_{H(·)∈U_δ} E_H |g(ξ)|^{p_0} < ∞,   sup_{H(·)∈U_δ} E_H | {F_H(ξ ∧ x) − F_H(ξ) F_H(x)} / (σ(ξ) f_H(ξ)) |^{p_0} < ∞
and the law of large numbers

    P_H-lim_{T→∞} T^{−1} ∫_0^T 4 [ {F_H(X_t ∧ x) − F_H(x) F_H(X_t)} / (σ(X_t) f_H(X_t)) ]² dt = I(H)

is uniform on H(·) ∈ U_δ.

Theorem 2. Let the condition (C1) be fulfilled, and let S(·) ∈ Θ_0, I_x > 0 and I(H) be continuous at the point H(·) = 0; then the EDF F̂_T(x) is LAM for the loss functions ℓ(u) = |u|^p with p < p_0.

Proof. By the Itô formula we have

    T^{−1/2} ∫_0^T [1{X_t < x} − F_H(x)] dt = T^{−1/2} {g(X_T) − g(X_0)} − 2 T^{−1/2} ∫_0^T {F_H(X_t ∧ x) − F_H(X_t) F_H(x)} / (σ(X_t) f_H(X_t)) dW_t.

The last stochastic integral is, by condition (C1), asymptotically normal, uniformly on H(·) ∈ U_δ:

    2 T^{−1/2} ∫_0^T {F_H(X_t ∧ x) − F_H(X_t) F_H(x)} / (σ(X_t) f_H(X_t)) dW_t ⇒ N(0, I(H))

(Kutoyants 1984, Theorem 3.3.3), and the random variables

    η_T(x) = ℓ[T^{1/2}{F̂_T(x) − F_H(x)}]

are uniformly integrable; for any p < p_0,

    sup_{H(·)∈U_δ} E_H |η_T(x)|^{p_0/p} ≤ C_1 T^{−p_0/2} E_H |g(X_T) − g(X_0)|^{p_0}
        + C_2 E_H T^{−1} ∫_0^T | 2 {F_H(X_t ∧ x) − F_H(X_t) F_H(x)} / (σ(X_t) f_H(X_t)) |^{p_0} dt ≤ C.

Therefore

    sup_{H(·)∈U_δ} E_H ℓ[T^{1/2}{F̂_T(x) − F_H(x)}] → sup_{H(·)∈U_δ} E ℓ{ζ I(H)^{1/2}},

and the LAM property of the EDF now follows from the continuity of I(H).   □

4. Concluding remarks

It would be interesting to have a similar lower bound for loss functions such as ℓ(sup_x T^{1/2}|F̂_T(x) − F(x)|) and to prove the LAM property of the EDF in this situation, as was done in the i.i.d. case.

Another problem closely related with this model of observations is the estimation of the density function f(x). As was shown by Castellana and Leadbetter (1986), the rate of convergence of kernel-type estimators is T^{1/2}. It can be shown that the lower bound is similar to that given above in (6) but with

    I_x = 4 f(x)² E[ { 1{ξ > x} − F(ξ) }² / (σ(ξ) f(ξ))² ],

and the kernel-type estimators as well as the unbiased estimator

    f̄_T(x) = (2 / (T σ(x)²)) ∫_0^T 1{X_t < x} dX_t

of the density are LAM in this problem (Kutoyants 1995). All these results, including those given in the present paper, were given (without detailed proofs) in Kutoyants (1997).

Appendix

Below we calculate the limit variance D(h_x, h_x) of the EDF F̂_T(x) as follows. First we have

    ∫_{−∞}^{s} h_x(z) f(z) dz = F(s ∧ x) − F(s) F(x).

Then, because G p(s) = {σ(s)² f(s)}^{−1},

    D(h_x, h_x) = 4G ∫_{−∞}^{∞} p(s) {F(s ∧ x) − F(s) F(x)}² ds = 4 ∫_{−∞}^{∞} {F(s ∧ x) − F(s) F(x)}² σ(s)^{−2} f(s)^{−1} ds.

For s < x we have s ∧ x = s and F(s ∧ x) − F(s)F(x) = F(s){1 − F(x)}, while for s ≥ x we have F(s ∧ x) − F(s)F(x) = F(x){1 − F(s)}. Therefore

    D(h_x, h_x) = 4{1 − F(x)}² ∫_{−∞}^{x} F(s)² σ(s)^{−2} f(s)^{−1} ds + 4F(x)² ∫_{x}^{∞} {1 − F(s)}² σ(s)^{−2} f(s)^{−1} ds
        = 4 ∫_{−∞}^{∞} F(s ∧ x)² {1 − F(s ∨ x)}² σ(s)^{−2} f(s)^{−1} ds
        = 4 E[ { F(ξ ∧ x){1 − F(ξ ∨ x)} / (σ(ξ) f(ξ)) }² ] = I_x.

Acknowledgements

The author is indebted to the referee for pointing out several references and for advice improving the presentation of the paper.

References

Bickel, P.J. (1993) Estimation in semiparametric models. In C.R. Rao (ed.), Multivariate Analysis: Future Directions, pp. 55–73. Amsterdam: Elsevier.
Bickel, P.J., Klaassen, C.A.J., Ritov, Y. and Wellner, J.A. (1993) Efficient and Adaptive Estimation for Semiparametric Models. Baltimore, MD: Johns Hopkins University Press.
Castellana, J.V. and Leadbetter, M.R. (1986) On smoothed density estimation for stationary processes. Stochastic Process. Appl., 21, 179–193.
Dvoretzky, A., Kiefer, J. and Wolfowitz, J. (1956) Asymptotic minimax character of the sample distribution function and the classical multinomial estimator. Ann. Math. Statist., 27, 642–669.
Genon-Catalot, V. and Jacod, J. (1994) On the estimation of the diffusion coefficient for diffusion processes. Scand. J. Statist., 21, 193–221.
Greenwood, P.E. and Wefelmeyer, W. (1995) Efficiency of empirical estimators for Markov chains. Ann. Statist., 23, 132–143.
Ibragimov, I.A. and Khasminskii, R.Z. (1981) Statistical Estimation: Asymptotic Theory. New York: Springer-Verlag.
Koshevnik, Ya.A. and Levit, B.Ya. (1976) On a non-parametric analogue of the information matrix. Theor. Probab. Appl., 21, 738–753.
Kutoyants, Yu.A. (1984) Parameter Estimation for Stochastic Processes. Berlin: Heldermann.
Kutoyants, Yu.A. (1995) On density estimation by the observations of ergodic diffusion process. Preprint 95-8, January, Université du Maine, Le Mans.
Kutoyants, Yu.A. (1997) Some problems of nonparametric estimation by the observations of ergodic diffusion process. Statist. Probab. Lett. To appear.
Lanska, V. (1979) Minimum contrast estimation in diffusion processes. J. Appl. Probab., 16, 65–75.
Levit, B.Ya. (1973) On optimality of some statistical estimates. In Proceedings of the Prague Symposium on Asymptotic Statistics, Vol. 2, pp. 215–238.
Liptser, R.S. and Shiryayev, A.N. (1977) Statistics of Random Processes, Vol. 1. New York: Springer-Verlag.
Mandl, P. (1968) Analytical Treatment of One-Dimensional Markov Processes. Prague: Academia; Berlin: Springer.
Millar, P.W. (1983) The minimax principle in asymptotic statistical theory. In P.L. Hennequin (ed.), École d'Été de Probabilités de Saint-Flour XI, 1981, Lecture Notes in Math. 976, pp. 75–266. Berlin: Springer.
Penev, S. (1991) Efficient estimation of the stationary distribution for exponentially ergodic Markov chains. J. Statist. Plan. Inference, 27, 105–123.
Stein, C. (1956) Efficient nonparametric testing and estimation. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 187–195.
van der Vaart, A.W. and Wellner, J.A. (1990) Prohorov and continuous mapping theorems in the Hoffmann-Jørgensen weak convergence theory with applications to convolution and asymptotic minimax theorems. Technical Report 157, Department of Statistics, University of Washington, Seattle, WA.

Received March 1995 and revised November 1996