
Efficient estimation for semiparametric semi-Markov processes

Priscilla E. Greenwood (Arizona State University), Ursula U. Müller (Universität Bremen), Wolfgang Wefelmeyer (Universität Siegen)

Abstract. We consider semiparametric models of semi-Markov processes with arbitrary state space. Assuming that the process is geometrically ergodic, we characterize efficient estimators, in the sense of Hájek and Le Cam, for arbitrary real-valued smooth functionals of the distribution of the embedded Markov renewal process. We construct efficient estimators of the parameter and of linear functionals of the distribution. In particular we treat the two cases in which we have a parametric model for the transition distribution of the embedded Markov chain and an arbitrary conditional distribution of the inter-jump times, and vice versa.

1 Introduction

Suppose we observe a semi-Markov process $Z_t$, $t \geq 0$, with embedded Markov renewal process $(X_0, T_0), (X_1, T_1), \dots$, on a time interval $0 \leq t \leq n$. The transition distribution of the Markov renewal process factors as $D(x, dy, ds) = Q(x, dy) R(x, y, ds)$, where $Q(x, dy)$ is the transition distribution of the embedded Markov chain $X_0, X_1, \dots$, and $R(x, y, ds)$ is the conditional distribution of the inter-jump time $S_j = T_j - T_{j-1}$ given $X_{j-1} = x$ and $X_j = y$. We assume that the embedded Markov chain is geometrically ergodic. We write $P(dx, dy, ds)$ for the joint stationary law of $(X_{j-1}, X_j, S_j)$, and $P_1(dx)$ and $P_2(dx, dy)$ for its marginals. We are interested in estimation of functionals of $Q$ and $R$. Our results hold also for observations $(X_0, T_0), \dots, (X_n, T_n)$ of the embedded Markov renewal process.

For discrete state space and the fully parametric or nonparametric cases, the asymptotic distribution of maximum likelihood estimators, Bayes estimators, and empirical estimators has been studied by Taga [31], Pyke and Schaufele [27], Hatori [13], McLean and Neuts [20], Moore and Pyke [22], Ouhbi and Limnios [23, 24] and, with censoring, by Lagakos, Sommer and Zelen [18], Gill [5], Voelkel and Crowley [32] and Phelan [25, 26]. We focus primarily on semiparametric models and on the construction of efficient estimators.

The simplest semiparametric models are obtained by specifying a parametric form for one of the factors of $D(x, dy, ds) = Q(x, dy) R(x, y, ds)$. In one case, we assume a parametric model $Q_\vartheta$ for the transition distribution of the embedded Markov chain and leave the conditional distribution of the inter-jump times unspecified (model Q). In a second case, we assume a parametric model $R_\vartheta$ for the conditional distribution of the inter-jump times and leave the transition distribution of the embedded Markov chain unspecified (model R). The estimation problems connected with these two models are specific to the semi-Markov setting; in particular, they have no non-trivial counterpart for Markov chains. To keep the paper readable and short, we concentrate on the two simple models above. More general models, involving possibly infinite-dimensional parameters, perhaps on both factors simultaneously, could be treated along the same lines.

We want to estimate $\vartheta$ and linear functionals of the form
\[ Ef(X_{j-1}, X_j, S_j) = \iiint P_1(dx) Q(x, dy) R(x, y, ds) f(x, y, s) = P_1 Q R f, \]
with $Q = Q_\vartheta$ or $R = R_\vartheta$ parametric. Interesting applications are estimation of probabilities $P(X_{j-1} \in A, X_j \in B, S_j \leq c)$, $P(X_{j-1} \in A, X_j \in B)$ and $P(X_{j-1} \in A)$, and of ratios $P(S_j \leq c \mid X_{j-1} \in A, X_j \in B)$ and $P(X_j \in B \mid X_{j-1} \in A)$. We can also treat expectations $E S_j$ and conditional expectations $E(S_j \mid X_{j-1} \in A, X_j \in B)$ and $E(X_j \mid X_{j-1} \in A)$.

Natural estimators for $\vartheta$ are the maximum likelihood estimators based on the conditional distributions $Q_\vartheta$ or $R_\vartheta$. We show that they are efficient in our two models. In particular, they are adaptive in the sense that knowing the nonparametric factor of $Q(x, dy) R(x, y, ds)$ cannot give estimators with smaller asymptotic variance.

A natural estimator for a linear functional $Ef(X_{j-1}, X_j, S_j)$ is the empirical estimator
\[ \frac{1}{N_n} \sum_{j=1}^{N_n} f(X_{j-1}, X_j, S_j), \]
where $N_n = \max\{ j : T_j \leq n \}$.

Greenwood and Wefelmeyer [8] have shown that this estimator is efficient in the fully nonparametric semi-Markov model; see also Greenwood and Wefelmeyer [7] for Markov step processes. We construct better, efficient, estimators for our two semiparametric models Q and R.

For our first model, the functional $Ef(X_{j-1}, X_j, S_j)$ can be written
\[ Ef(X_{j-1}, X_j, S_j) = P_\vartheta Q_\vartheta R f = \iint P_\vartheta(dx) Q_\vartheta(x, dy) R_{xy} f \]
with $R_{xy} f = \int R(x, y, ds) f(x, y, s)$. To exploit the structure of the model, we use a plug-in estimator, i.e. we replace the conditional expectation $Rf$ by a kernel estimator $\hat R f$. By what we refer to as the plug-in principle, we expect that $P_\vartheta Q_\vartheta \hat R f$ will converge at the parametric rate $n^{-1/2}$ under appropriate conditions on the kernel and the bandwidth, even though the kernel estimator has a slower rate of convergence. In a second step, we replace the parameter $\vartheta$ by an estimator $\hat\vartheta$. This results in the estimator $P_{\hat\vartheta} Q_{\hat\vartheta} \hat R f$ for $Ef(X_{j-1}, X_j, S_j)$. It is efficient if an efficient estimator $\hat\vartheta$ is used for $\vartheta$.

Related plug-in estimators have been used in other, mainly nonparametric, contexts before. For quadratic functionals of densities with i.i.d. observations see Hall and Marron [12], Bickel and Ritov [2], Eggermont and LaRiccia [4] and the references there. Similar results exist for regression models; see e.g. Goldstein and Messer [6] and Efromovich and Samarov [3]. In semiparametric time series models with independent innovations, the stationary density can be written as a smooth functional of the innovation density and the parameters; $n^{1/2}$-consistent and efficient plug-in estimators are constructed in Saavedra and Cao [28] and Schick and Wefelmeyer [29, 30].

For our second model, the functional $Ef(X_{j-1}, X_j, S_j)$ can be written
\[ Ef(X_{j-1}, X_j, S_j) = P_2 R_\vartheta f = \iint P_2(dx, dy) \int R_\vartheta(x, y, ds) f(x, y, s). \]
Here we can estimate the nonparametric part $P_2$ by the empirical distribution based on the embedded Markov chain. Again we replace $\vartheta$ by an estimator $\hat\vartheta$. We show that the resulting estimator
\[ \frac{1}{N_n} \sum_{j=1}^{N_n} \int R_{\hat\vartheta}(X_{j-1}, X_j, ds) f(X_{j-1}, X_j, s) \]
is efficient if $\hat\vartheta$ is efficient.

The paper is organized as follows. In Section 2 we state local asymptotic normality for arbitrary semi-Markov models and characterize efficient estimators for smooth functionals on such models. In Section 3 we construct efficient estimators of $\vartheta$ and $Ef(X_{j-1}, X_j, S_j)$ for model Q, and in Section 4 for model R. Throughout the paper, the discussion will be informal.
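Before the formal development, a minimal simulation sketch may help fix the observation scheme and the empirical estimator of this introduction. The two-state chain, the transition matrix and the state-dependent exponential inter-jump law below are hypothetical choices for illustration, not part of the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_semi_markov(n_time, rng):
    """Simulate a toy semi-Markov process on states {0, 1} up to time n_time.
    Hypothetical model: Q(0, .) = (0.7, 0.3), Q(1, .) = (0.6, 0.4), and
    S_j | (X_{j-1} = x, X_j = y) ~ Exponential with rate 1 + x + y.
    Returns the embedded chain X_0, X_1, ... and the jump times T_0, T_1, ..."""
    Q = np.array([[0.7, 0.3], [0.6, 0.4]])
    X, T = [0], [0.0]
    while T[-1] <= n_time:
        y = rng.choice(2, p=Q[X[-1]])
        X.append(y)
        T.append(T[-1] + rng.exponential(1.0 / (1.0 + X[-2] + y)))
    return np.array(X), np.array(T)

def empirical_estimator(f, X, T, n_time):
    """N_n^{-1} sum_{j=1}^{N_n} f(X_{j-1}, X_j, S_j), N_n = max{j : T_j <= n_time}."""
    N_n = np.searchsorted(T, n_time, side="right") - 1  # T is increasing
    S = np.diff(T[: N_n + 1])                            # S_j = T_j - T_{j-1}
    return np.mean([f(X[j - 1], X[j], S[j - 1]) for j in range(1, N_n + 1)])

X, T = simulate_semi_markov(1000.0, rng)
print(empirical_estimator(lambda x, y, s: s, X, T, 1000.0))  # estimates m = E S_j
```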

2 Characterization of efficient estimators

In this section we consider general semi-Markov models described by families of distributions $Q(x, dy)$ and $R(x, y, ds)$. To calculate asymptotic variance bounds and characterize efficient estimators, we fix $Q$ and $R$ and introduce a local model at $(Q, R)$ by perturbing $Q$ as $Q_{nu}(x, dy) \doteq Q(x, dy)(1 + n^{-1/2} u(x, y))$ and $R$ as $R_{nv}(x, y, ds) \doteq R(x, y, ds)(1 + n^{-1/2} v(x, y, s))$. Since $Q_{nu}$ and $R_{nv}$ are again conditional distributions, the function $u$ will vary in some subset $U_0$ of $U = \{ u \in L_2(P_2) : Q_x u = 0 \}$, and the function $v$ will vary in some subset $V_0$ of $V = \{ v \in L_2(P) : R_{xy} v = 0 \}$. Here $Q_x u = \int Q(x, dy) u(x, y)$ and $R_{xy} v = \int R(x, y, ds) v(x, y, s)$. Similarly, we will write $D_x v = \iint D(x, dy, ds) v(x, y, s)$. The sets $U_0$ and $V_0$ are called the tangent spaces for $Q$ and $R$. For simplicity we take them linear and closed. Note that $U_0$ and $V_0$ are orthogonal subspaces of $L_2(P)$. The perturbations $Q_{nu} \doteq Q(1 + n^{-1/2} u)$ and $R_{nv} \doteq R(1 + n^{-1/2} v)$ are meant in the sense that $Q_{nu}$ and $R_{nv}$ are Hellinger differentiable with derivatives $u$ and $v$. For appropriate versions in arbitrary Markov step models and in nonparametric semi-Markov models see Höpfner, Jacod and Ladelli [14] and Greenwood and Wefelmeyer [8].

We assume that $D(x, dy, \{0\}) = 0$, that the mean inter-jump time $m = E S_j$ is finite, and that the embedded Markov chain is positive Harris recurrent. Then
\[ \frac{N_n}{n} \to \frac{1}{m} \quad \text{a.s.} \tag{2.1} \]
Furthermore, the following law of large numbers and martingale central limit theorem hold. For $f \in L_2(P)$ we have
\[ \frac{1}{N_n} \sum_{j=1}^{N_n} f(X_{j-1}, X_j, S_j) \to P f \quad \text{a.s.} \tag{2.2} \]

For $w \in L_2(P)$ with $D_x w = 0$ we have
\[ n^{-1/2} \sum_{j=1}^{N_n} w(X_{j-1}, X_j, S_j) \Rightarrow m^{-1/2} L, \tag{2.3} \]
where $L$ is a normal random variable with mean zero and variance $P w^2$.

Now write $M^{(n)}$ for the distribution of $Z_t$, $0 \leq t \leq n$, if $Q$ and $R$ are in effect, and $M_{uv}^{(n)}$ if $Q_{nu}$ and $R_{nv}$ are. Similarly as in Höpfner, Jacod and Ladelli [14] and Greenwood and Wefelmeyer [8], and using the orthogonality of $U_0$ and $V_0$, we obtain local asymptotic normality: For $u \in U_0$ and $v \in V_0$,
\[ \log \frac{d M_{uv}^{(n)}}{d M^{(n)}} = H_n - \frac{1}{2} \sigma^2(u, v) + o_p(1), \tag{2.4} \]
where
\[ H_n = n^{-1/2} \sum_{j=1}^{N_n} \big( u(X_{j-1}, X_j) + v(X_{j-1}, X_j, S_j) \big), \qquad \sigma^2(u, v) = m^{-1} (P_2 u^2 + P v^2), \]
and $H_n$ is asymptotically normal with mean zero and variance $\sigma^2(u, v)$.

We want to estimate functionals of $(Q, R)$. A real-valued functional $\varphi(Q, R)$ is said to be differentiable at $(Q, R)$ with gradient $(g, h)$ if $g \in U$, $h \in V$, and the functional has a linear approximation in terms of the inner product from the LAN norm,
\[ n^{1/2} \big( \varphi(Q_{nu}, R_{nv}) - \varphi(Q, R) \big) \to m^{-1} \big( P_2(ug) + P(vh) \big), \quad u \in U_0, \ v \in V_0. \tag{2.5} \]
The projection $(g_0, h_0)$ of $(g, h)$ onto $U_0 \times V_0$ is called the canonical gradient of $\varphi$. An estimator $\hat\varphi$ is called regular for $\varphi$ at $(Q, R)$ with limit $L$ if $L$ is a random variable such that
\[ n^{1/2} \big( \hat\varphi - \varphi(Q_{nu}, R_{nv}) \big) \Rightarrow L \quad \text{under } M_{uv}^{(n)}, \quad u \in U_0, \ v \in V_0. \tag{2.6} \]
The convolution theorem of Hájek [11] and Le Cam [19] says that $L$ is distributed as the convolution of a normal random variable with mean zero and variance
\[ \sigma^2(g_0, h_0) = m^{-1} (P_2 g_0^2 + P h_0^2) \]
with another random variable. This justifies calling $\hat\varphi$ efficient if it has this asymptotic variance. An estimator $\hat\varphi$ is called asymptotically linear with influence function $(a, b)$ if $a \in U$, $b \in V$, and
\[ n^{1/2} \big( \hat\varphi - \varphi(Q, R) \big) = n^{-1/2} \sum_{j=1}^{N_n} \big( a(X_{j-1}, X_j) + b(X_{j-1}, X_j, S_j) \big) + o_p(1). \tag{2.7} \]

With these definitions, $\hat\varphi$ is regular and efficient if and only if it is asymptotically linear with influence function equal to the canonical gradient:
\[ n^{1/2} \big( \hat\varphi - \varphi(Q, R) \big) = n^{-1/2} \sum_{j=1}^{N_n} \big( g_0(X_{j-1}, X_j) + h_0(X_{j-1}, X_j, S_j) \big) + o_p(1). \tag{2.8} \]
A reference for this characterization in the i.i.d. case is Bickel, Klaassen, Ritov and Wellner [1]; for semi-Markov processes parametrized by $D$ see Greenwood and Wefelmeyer [8].

We point out that the orthogonality of $U_0$ and $V_0$ implies that functionals of one of the factors of $Q(x, dy) R(x, y, ds)$ can be estimated adaptively with respect to the other factor in the following sense. Suppose $\varphi(Q, R)$ depends only on $Q$. Then (2.5) holds with $h = 0$, and the canonical gradient is of the form $(g_0, 0)$. Suppose now that $\hat\varphi$ is efficient in a model with $R$ completely unspecified. Then it will remain efficient for any submodel for $R$, in particular when $R$ is known. The same holds with the roles of $Q$ and $R$ interchanged. We apply this observation to estimation of $\vartheta$ in models Q and R in Sections 3 and 4.

We will also need a version of the central limit theorem (2.3) for functions that are not conditionally centered. Suppose that the embedded Markov chain is geometrically ergodic in the $L_2$ sense. For $k \in L_2(P_2)$ define
\[ (Ak)(x, y) = \sum_{i=0}^{\infty} \big( Q_y^i k - Q_x^{i+1} k \big). \]
Set $f_0(x, y, s) = f(x, y, s) - R_{xy} f$. Then
\[ n^{-1/2} \sum_{j=1}^{N_n} \big( f(X_{j-1}, X_j, S_j) - P_1 Q R f \big) = n^{-1/2} \sum_{j=1}^{N_n} \big( A R f(X_{j-1}, X_j) + f_0(X_{j-1}, X_j, S_j) \big) + o_p(1). \tag{2.9} \]
Note that $Q_x A k = 0$ for $k \in L_2(P_2)$. For Markov chains, the above martingale approximation goes back to Gordin [9] and Gordin and Lifšic [10]; see Meyn and Tweedie [21], Section 17.4. For semi-Markov processes we refer to Greenwood and Wefelmeyer [8]. From (2.3) we obtain that the above standardized sum is asymptotically normal with variance $m^{-1} \big( P_2 (A R f)^2 + P f_0^2 \big)$.
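For a finite state space the function $Ak$ can be computed exactly rather than by truncating the series: writing $K(x) = Q_x k$, the sum reduces, under the reading of $Q_y^i k$ used below (which is our assumption), to $(Ak)(x, y) = k(x, y) - K(x) + h(y) - (Qh)(x)$, where $h$ solves the Poisson equation $h - Qh = K - \pi K$ and $\pi$ is the stationary law. A numerical sketch with a hypothetical two-state $Q$ and $k$:

```python
import numpy as np

def operator_A(Q, k):
    """Compute (Ak)(x, y) = sum_{i>=0} (Q_y^i k - Q_x^{i+1} k) for a finite-state
    chain: Q is a (d, d) transition matrix, k a (d, d) array with k[x, y] = k(x, y).

    Assumed reduction: Ak(x, y) = k(x, y) - K(x) + h(y) - (Qh)(x), with
    K(x) = sum_y Q[x, y] k[x, y] and h solving the Poisson equation
    h - Qh = K - (pi K) 1, via the fundamental matrix Z = (I - Q + 1 pi)^{-1}."""
    d = Q.shape[0]
    evals, evecs = np.linalg.eig(Q.T)            # stationary law: Q' pi = pi
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi /= pi.sum()
    K = (Q * k).sum(axis=1)
    Z = np.linalg.inv(np.eye(d) - Q + np.outer(np.ones(d), pi))
    h = Z @ (K - pi @ K)
    Qh = Q @ h
    return k - K[:, None] + h[None, :] - Qh[:, None]

# sanity check from the text: Q_x(Ak) = 0, i.e. each row of Q * (Ak) sums to zero
Q = np.array([[0.7, 0.3], [0.6, 0.4]])
k = np.array([[0.0, 1.0], [2.0, -1.0]])
print((Q * operator_A(Q, k)).sum(axis=1))        # approximately [0, 0]
```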

To calculate gradients of linear functionals $Ef(X_{j-1}, X_j, S_j)$, we need the following perturbation expansion due to Kartashov [15, 16, 17]:
\[ n^{1/2} \big( P_{nu} Q_{nu} k - P_1 Q k \big) \to P_2(k B u) = P_2(u A k), \quad k \in L_2(P_2), \tag{2.10} \]
where $P_{nu}$ denotes the stationary law of $Q_{nu}$ and $B$ is the adjoint of $A$. We will not need the explicit form of $B$. The perturbation expansion implies that $\varphi(Q, R) = P f = P_1 Q R f$ is differentiable for $f \in L_2(P)$:
\[ n^{1/2} \big( P_{nu} Q_{nu} R_{nv} f - P_1 Q R f \big) \to P_2(u A R f) + P(v f_0), \quad u \in U_0, \ v \in V_0. \tag{2.11} \]
Here we have used that $U$ and $V$ are orthogonal. For a proof of (2.11) we refer to Greenwood and Wefelmeyer [8]. Note that there we do not factor $D$, and have local parameters $h(x, y, s)$ which here are written $u(x, y) + v(x, y, s)$.

3 Model Q

In this section we consider model Q, in which we have a parametric family $Q_\vartheta$ for $Q$ and leave $R$ unspecified. For simplicity we assume that $\vartheta$ is one-dimensional. A natural estimator for $\vartheta$ is the maximum likelihood estimator based on $Q_\vartheta$. Suppose $Q_\vartheta(x, dy)$ has density $q_\vartheta(x, y)$ with respect to some dominating measure $\nu_Q(x, dy)$, and that $q_\vartheta$ has derivative $\dot q_\vartheta$ with respect to $\vartheta$. Write $\lambda_\vartheta = \dot q_\vartheta / q_\vartheta$ for the score function. The maximum likelihood estimator $\hat\vartheta$ solves the estimating equation
\[ \sum_{j=1}^{N_n} \lambda_\vartheta(X_{j-1}, X_j) = 0. \]
A stochastic expansion of $\hat\vartheta$ is now obtained by the usual arguments. First we recall two well-known relations for $\lambda_\vartheta$ and $\dot\lambda_\vartheta$, namely
\[ 0 = \frac{d}{d\vartheta} (\nu_Q q_\vartheta) = \nu_Q \dot q_\vartheta = Q_\vartheta \lambda_\vartheta, \]
\[ 0 = \frac{d}{d\vartheta} (Q_\vartheta \lambda_\vartheta) = \frac{d}{d\vartheta} \nu_Q(\lambda_\vartheta q_\vartheta) = \nu_Q(\dot\lambda_\vartheta q_\vartheta + \lambda_\vartheta \dot q_\vartheta) = Q_\vartheta(\lambda_\vartheta^2 + \dot\lambda_\vartheta). \]
Write $P_{2\vartheta} = P_\vartheta \otimes Q_\vartheta$, where $P_\vartheta$ is the stationary law of the embedded chain under $Q_\vartheta$, and let $I = P_{2\vartheta} \lambda_\vartheta^2$ denote the Fisher information. We obtain from the second relation that $I = -P_{2\vartheta} \dot\lambda_\vartheta$. From the law of large numbers (2.2) and (2.1) we obtain by Taylor expansion that $\hat\vartheta$ is asymptotically linear with influence function $(m I^{-1} \lambda_\vartheta, 0)$:
\[ n^{1/2} (\hat\vartheta - \vartheta) = m I^{-1} n^{-1/2} \sum_{j=1}^{N_n} \lambda_\vartheta(X_{j-1}, X_j) + o_p(1). \tag{3.1} \]
From the martingale central limit theorem (2.3) we conclude that $n^{1/2}(\hat\vartheta - \vartheta)$ is asymptotically normal with variance $m I^{-1}$.
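A small numerical sketch of this estimating equation. The two-state family $q_\vartheta(x, y) = \vartheta$ for $y \neq x$ and $1 - \vartheta$ for $y = x$ is hypothetical; in this family the MLE is simply the fraction of switches, so the root-finder only stands in for the general case:

```python
import numpy as np
from scipy.optimize import brentq

def mle_model_Q(X, score, lo=1e-6, hi=1 - 1e-6):
    """Solve sum_{j=1}^{N_n} lambda_theta(X_{j-1}, X_j) = 0 in theta."""
    total_score = lambda th: sum(score(th, X[j - 1], X[j]) for j in range(1, len(X)))
    return brentq(total_score, lo, hi)   # the total score is decreasing in theta here

# hypothetical family: q_theta(x, y) = theta if y != x, 1 - theta if y == x,
# with score lambda_theta = dot q_theta / q_theta
def score(th, x, y):
    return 1.0 / th if y != x else -1.0 / (1.0 - th)

rng = np.random.default_rng(1)
theta0, X = 0.3, [0]
for _ in range(2000):                    # embedded chain under Q_{theta0}
    X.append(1 - X[-1] if rng.random() < theta0 else X[-1])
print(mle_model_Q(X, score))             # close to theta0 = 0.3
```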

To prove semiparametric efficiency of $\hat\vartheta$, we must interpret $\vartheta$ as a functional of $(Q_\vartheta, R)$ through $\varphi(Q_\vartheta, R) = \vartheta$. The local model for $Q_\vartheta$ is obtained by perturbing $\vartheta$ as $\vartheta_{na} = \vartheta + n^{-1/2} a$ and $Q_\vartheta$ as $Q_{\vartheta_{na}} \doteq Q_\vartheta(1 + n^{-1/2} a \lambda_\vartheta)$. Hence the tangent space $U_0$ for $Q_\vartheta$ consists of all functions of the form $a \lambda_\vartheta$, $a \in \mathbb{R}$. The canonical gradient $(g_0, h_0)$ of $\vartheta$ is therefore of the form $(a_0 \lambda_\vartheta, 0)$, where $a_0$ is determined from (2.5) by
\[ a = m^{-1} P_2(a \lambda_\vartheta \cdot a_0 \lambda_\vartheta) = a a_0 m^{-1} I, \quad a \in \mathbb{R}. \]
This gives $a_0 = m I^{-1}$ and $g_0 = m I^{-1} \lambda_\vartheta$. Since $\hat\vartheta$ has influence function $(m I^{-1} \lambda_\vartheta, 0)$ by (3.1), it is efficient by characterization (2.8). Note that $\hat\vartheta$ is adaptive with respect to $R$ in the sense that it remains efficient even if we know $R$.

Now we consider estimation of a linear functional $Ef(X_{j-1}, X_j, S_j) = P_{2\vartheta} R f$ with $f \in L_2(P_{2\vartheta} \otimes R)$. A natural estimator is the empirical estimator $N_n^{-1} \sum_{j=1}^{N_n} f(X_{j-1}, X_j, S_j)$. We have $A R f \in U$ and $f_0 \in V$. From (2.9) we obtain that the empirical estimator is asymptotically linear with influence function $(m A R f(x, y), m f_0(x, y, s))$ and asymptotic variance $m \big( P_2 (A R f)^2 + P_2 R f_0^2 \big)$. If nothing were known about $Q$, the empirical estimator would be efficient; see Greenwood and Wefelmeyer [8]. Since we have assumed a parametric model $Q_\vartheta$, we can construct better estimators exploiting the structure of the model.

We assume that the state space is the real line, and that $P$ has Lebesgue density $p$. Let $p_1$ and $q$ be the densities of $P_1$ and $Q$. Then $p_2(x, y) = p_1(x) q(x, y)$ is the density of $P_2$. We write $R f = a / p_2$ with $a(x, y) = \int p(x, y, s) f(x, y, s) \, ds$, and estimate $R f$ by $\hat R f = \hat a / \hat p_2$ with kernel estimators
\[ \hat a(x, y) = \frac{1}{N_n} \sum_{j=1}^{N_n} \frac{1}{b^2} \, k\Big( \frac{x - X_{j-1}}{b}, \frac{y - X_j}{b} \Big) f(x, y, S_j), \]
\[ \hat p_2(x, y) = \frac{1}{N_n} \sum_{j=1}^{N_n} \frac{1}{b^2} \, k\Big( \frac{x - X_{j-1}}{b}, \frac{y - X_j}{b} \Big), \]
where $k$ is a mean zero density and $b = b_n$ is a bandwidth that tends to zero at a rate to be determined later. Our estimator for $P_{2\vartheta} R f$ is $P_{2\hat\vartheta} \hat R f$, with $\hat\vartheta$ an $n^{1/2}$-consistent estimator of $\vartheta$. We prove that it is asymptotically linear if $f$ is differentiable. Under appropriate smoothness assumptions on $p$, a modified proof will cover discontinuous $f$, in particular indicator functions.
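A sketch of the kernel estimators $\hat a$, $\hat p_2$ and of the ratio $\hat R f$ at a point. The product Gaussian kernel and the placeholder data are assumptions; the remaining integration of $\hat R f$ against $P_{2\hat\vartheta}$ depends on the parametric model and is omitted here.

```python
import numpy as np

def Rf_hat(x, y, X, S, f, b):
    """Kernel plug-in estimator hat R f = hat a / hat p_2 of the conditional
    expectation R_{xy} f.  X: embedded chain X_0, ..., X_{N_n};
    S: inter-jump times S_1, ..., S_{N_n}; b: bandwidth."""
    k = lambda u, v: np.exp(-0.5 * (u ** 2 + v ** 2)) / (2.0 * np.pi)  # mean zero density
    w = k((x - X[:-1]) / b, (y - X[1:]) / b)    # kernel weights at (X_{j-1}, X_j)
    a_hat = np.mean(w * f(x, y, S)) / b ** 2    # estimates a(x, y) = int p(x, y, s) f ds
    p2_hat = np.mean(w) / b ** 2                # estimates p_2(x, y)
    return a_hat / p2_hat

# placeholder data, and the bandwidth b = n^{-1/4} suggested in the text
rng = np.random.default_rng(2)
X = rng.normal(size=501)
S = rng.exponential(scale=1.0 + X[:-1] ** 2)
print(Rf_hat(0.1, -0.2, X, S, lambda x, y, s: s, b=500 ** -0.25))
```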

To calculate the influence function of $P_{2\hat\vartheta} \hat R f$, we write
\[ \hat R f = R f + \frac{\hat a - a}{\hat p_2} - \frac{\hat p_2 - p_2}{\hat p_2} \, R f. \]
Then our estimator is approximated as
\[ P_{2\vartheta} \hat R f \doteq P_{2\vartheta} R f + \iint dx \, dy \, \big( \hat a(x, y) - a(x, y) \big) - \iint dx \, dy \, \big( \hat p_2(x, y) - p_2(x, y) \big) R_{xy} f. \]
Let $b = n^{-1/4}$. Since the kernel $k$ integrates to one and has mean zero, a change of variables $u = (x - X_{j-1})/b$, $v = (y - X_j)/b$ and a Taylor expansion give
\[ \iint dx \, dy \, \hat a(x, y) = \frac{1}{N_n} \sum_{j=1}^{N_n} \iint du \, dv \, k(u, v) f(X_{j-1} + bu, X_j + bv, S_j) = \frac{1}{N_n} \sum_{j=1}^{N_n} f(X_{j-1}, X_j, S_j) + o_p(n^{-1/2}). \tag{3.2} \]
Similarly,
\[ \iint dx \, dy \, \hat p_2(x, y) R_{xy} f = \frac{1}{N_n} \sum_{j=1}^{N_n} \iint du \, dv \, k(u, v) \int R(X_{j-1} + bu, X_j + bv, ds) f(X_{j-1} + bu, X_j + bv, s) = \frac{1}{N_n} \sum_{j=1}^{N_n} R_{X_{j-1}, X_j} f + o_p(n^{-1/2}). \tag{3.3} \]
With the notation $f_0(x, y, s) = f(x, y, s) - R_{xy} f$, these two expansions lead to
\[ P_{2\hat\vartheta} \hat R f = P_{2\hat\vartheta} R f + \frac{1}{N_n} \sum_{j=1}^{N_n} f_0(X_{j-1}, X_j, S_j) + o_p(n^{-1/2}). \tag{3.4} \]
It remains to expand $P_{2\hat\vartheta} R f$. With $Q_{\vartheta_{na}} \doteq Q_\vartheta(1 + n^{-1/2} a \lambda_\vartheta)$ and the perturbation expansion (2.10) for $u = a \lambda_\vartheta$ and $a = n^{1/2}(\hat\vartheta - \vartheta)$, a Taylor expansion gives
\[ P_{2\hat\vartheta} R f = P_{2\vartheta} R f + P_2(\lambda_\vartheta A R f)(\hat\vartheta - \vartheta) + o_p(n^{-1/2}). \tag{3.5} \]

Suppose now that $\hat\vartheta$ is efficient. Then it has influence function $(m I^{-1} \lambda_\vartheta, 0)$ by (3.1). Together with (3.4) and (3.5) we obtain that
\[ n^{1/2} \big( P_{2\hat\vartheta} \hat R f - P_{2\vartheta} R f \big) = m \, n^{-1/2} \sum_{j=1}^{N_n} \big( I^{-1} P_2(\lambda_\vartheta A R f) \lambda_\vartheta(X_{j-1}, X_j) + f_0(X_{j-1}, X_j, S_j) \big) + o_p(1). \]
Hence by (2.3) our estimator is asymptotically normal with variance
\[ m \big( I^{-1} (P_2(\lambda_\vartheta A R f))^2 + P_2 R f_0^2 \big). \]
Note that by the Cauchy-Schwarz inequality,
\[ I^{-1} (P_2(\lambda_\vartheta A R f))^2 \leq P_2 (A R f)^2. \]
Since the empirical estimator has asymptotic variance $m (P_2 (A R f)^2 + P_2 R f_0^2)$, our estimator is better unless $A R f$ is proportional to $\lambda_\vartheta$.

Now we prove that our estimator $P_{2\hat\vartheta} \hat R f$ is efficient. By the characterization (2.8) of efficient estimators, we must show that the influence function of $P_{2\hat\vartheta} \hat R f$ equals the canonical gradient of the functional $\varphi(Q_\vartheta, R) = P_{2\vartheta} R f$. Let $\vartheta_{na} = \vartheta + n^{-1/2} a$ and $R_{nv} \doteq R(1 + n^{-1/2} v)$. Then $Q_{\vartheta_{na}} \doteq Q_\vartheta(1 + n^{-1/2} a \lambda_\vartheta)$, and the perturbation expansion (2.11) implies
\[ n^{1/2} \big( P_{2\vartheta_{na}} R_{nv} f - P_{2\vartheta} R f \big) \to a P_2(\lambda_\vartheta A R f) + P_2 R(v f_0), \quad a \in \mathbb{R}, \ v \in V. \]
Since $R$ is unspecified and hence the tangent space $V_0$ for $R$ is $V$, the canonical gradient of $P_{2\vartheta} R f$ is of the form $(a_0 \lambda_\vartheta, m f_0)$, where $a_0$ is determined from (2.5) by
\[ a P_2(\lambda_\vartheta A R f) = a a_0 m^{-1} I, \quad a \in \mathbb{R}. \]
This gives $a_0 = m I^{-1} P_2(\lambda_\vartheta A R f)$. Hence $P_{2\hat\vartheta} \hat R f$ is efficient by characterization (2.8).

We end this section with some comments. If we set $f(x, y, s) = s$, we obtain an efficient estimator for $P_{2\vartheta} R f = E S_j = m$, the mean inter-jump time. If the inter-jump time distribution does not depend on the states, then our estimator is asymptotically equivalent to the empirical estimator $N_n^{-1} \sum_{j=1}^{N_n} S_j$.

If the state space is discrete, we can replace $\hat R f = \hat a / \hat p_2$ by the simpler estimator $\tilde R f = \tilde a / \tilde p_2$ with
\[ \tilde a(x, y) = \frac{1}{N_n} \sum_{j=1}^{N_n} \mathbf{1}(X_{j-1} = x, X_j = y) f(x, y, S_j), \qquad \tilde p_2(x, y) = \frac{1}{N_n} \sum_{j=1}^{N_n} \mathbf{1}(X_{j-1} = x, X_j = y). \]
The analysis of $P_{2\hat\vartheta} \tilde R f$ then simplifies in (3.2) and (3.3). Some examples would be estimation of $P(a, b, [0, c])$, $P_2(a, b)$, $P_1(a)$ and of ratios $R(a, b, [0, c])$ and $Q(a, b)$.
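In the discrete case $\tilde R f(x, y)$ is just the empirical mean of $f(x, y, S_j)$ over the observed transitions from $x$ to $y$; a short sketch with placeholder data (it assumes the pair $(x, y)$ occurs in the sample):

```python
import numpy as np

def Rf_tilde(x, y, X, S, f):
    """Discrete-state estimator tilde R f(x, y) = tilde a(x, y) / tilde p_2(x, y):
    the average of f(x, y, S_j) over transitions with X_{j-1} = x, X_j = y."""
    mask = (X[:-1] == x) & (X[1:] == y)
    return np.mean(f(x, y, S[mask]))

rng = np.random.default_rng(4)
X = rng.integers(0, 2, size=1001)                # placeholder embedded chain
S = rng.exponential(1.0 + X[:-1] + X[1:])        # placeholder inter-jump times
print(Rf_tilde(0, 1, X, S, lambda x, y, s: s))   # estimates R_{01} f for f(x, y, s) = s
```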

4 Model R

In this section we consider model R, in which we have a parametric family $R_\vartheta$ for $R$ and leave $Q$ unspecified. Again we assume that $\vartheta$ is one-dimensional. We proceed as in Section 3. A natural estimator for $\vartheta$ is the maximum likelihood estimator based on $R_\vartheta$. We assume that $R_\vartheta(x, y, ds)$ has density $r_\vartheta(x, y, s)$ with respect to some dominating measure $\nu_R(x, y, ds)$, and write $\mu_\vartheta = \dot r_\vartheta / r_\vartheta$ for the score function. We have $R_\vartheta \mu_\vartheta = 0$ and $R_\vartheta(\mu_\vartheta^2 + \dot\mu_\vartheta) = 0$. In particular, the Fisher information $J = P_2 R_\vartheta \mu_\vartheta^2$ equals $-P_2 R_\vartheta \dot\mu_\vartheta$. The maximum likelihood estimator solves the estimating equation
\[ \sum_{j=1}^{N_n} \mu_\vartheta(X_{j-1}, X_j, S_j) = 0. \]
As in Section 3 we obtain that $\hat\vartheta$ is asymptotically linear, now with influence function $(0, m J^{-1} \mu_\vartheta)$:
\[ n^{1/2} (\hat\vartheta - \vartheta) = m J^{-1} n^{-1/2} \sum_{j=1}^{N_n} \mu_\vartheta(X_{j-1}, X_j, S_j) + o_p(1). \tag{4.1} \]
Hence $n^{1/2}(\hat\vartheta - \vartheta)$ is asymptotically normal with variance $m J^{-1}$.

To prove efficiency of $\hat\vartheta$, we interpret $\vartheta$ as a functional of $(Q, R_\vartheta)$ through $\varphi(Q, R_\vartheta) = \vartheta$. The local model for $R_\vartheta$ is described by perturbing $\vartheta$ as $\vartheta_{na} = \vartheta + n^{-1/2} a$ and $R_\vartheta$ as $R_{\vartheta_{na}} \doteq R_\vartheta(1 + n^{-1/2} a \mu_\vartheta)$. So the tangent space for $R_\vartheta$ consists of all functions of the form $a \mu_\vartheta$, $a \in \mathbb{R}$, and the canonical gradient $(g_0, h_0)$ of $\vartheta$ is of the form $(0, a_0 \mu_\vartheta)$, where $a_0$ is determined from (2.5) by
\[ a = m^{-1} P_2 R_\vartheta(a \mu_\vartheta \cdot a_0 \mu_\vartheta) = a a_0 m^{-1} J, \quad a \in \mathbb{R}. \]

This gives $a_0 = m J^{-1}$ and $h_0 = m J^{-1} \mu_\vartheta$. Since $\hat\vartheta$ is asymptotically linear with influence function $(0, m J^{-1} \mu_\vartheta)$, it is efficient by characterization (2.8) and adaptive with respect to $Q$.

To estimate $Ef(X_{j-1}, X_j, S_j) = P_2 R_\vartheta f$, we can again use the empirical estimator. However, a better estimator is
\[ \hat P_2 R_{\hat\vartheta} f = \frac{1}{N_n} \sum_{j=1}^{N_n} \int R_{\hat\vartheta}(X_{j-1}, X_j, ds) f(X_{j-1}, X_j, s). \]
Here $\hat P_2$ stands for the empirical distribution
\[ \frac{1}{N_n} \sum_{j=1}^{N_n} \delta_{(X_{j-1}, X_j)}(dx, dy), \]
where $\delta_{(X_{j-1}, X_j)}$ is the one-point distribution on $(X_{j-1}, X_j)$. With $R_{\vartheta_{na}} \doteq R_\vartheta(1 + n^{-1/2} a \mu_\vartheta)$ and $a = n^{1/2}(\hat\vartheta - \vartheta)$, a Taylor expansion gives
\[ \hat P_2 R_{\hat\vartheta} f = P_2 R_\vartheta f + \big( \hat P_2 R_{\hat\vartheta} f - P_2 R_{\hat\vartheta} f \big) + \big( P_2 R_{\hat\vartheta} f - P_2 R_\vartheta f \big) \]
\[ = P_2 R_\vartheta f + \frac{1}{N_n} \sum_{j=1}^{N_n} \Big( \int R_\vartheta(X_{j-1}, X_j, ds) f(X_{j-1}, X_j, s) - P_2 R_\vartheta f \Big) + P_2 R_\vartheta(\mu_\vartheta f)(\hat\vartheta - \vartheta) + o_p(n^{-1/2}). \]
Since $R_\vartheta \mu_\vartheta = 0$, we have $P_2 R_\vartheta(\mu_\vartheta f) = P_2 R_\vartheta(\mu_\vartheta f_0)$ and hence
\[ n^{1/2} \big( \hat P_2 R_{\hat\vartheta} f - P_2 R_\vartheta f \big) = m \, n^{-1/2} \sum_{j=1}^{N_n} \Big( \int R_\vartheta(X_{j-1}, X_j, ds) f(X_{j-1}, X_j, s) - P_2 R_\vartheta f \Big) + P_2 R_\vartheta(\mu_\vartheta f_0) \, n^{1/2}(\hat\vartheta - \vartheta) + o_p(1). \]
Suppose that $\hat\vartheta$ is efficient for $\vartheta$. Then $\hat\vartheta$ is asymptotically linear with influence function $(0, m J^{-1} \mu_\vartheta)$; see (4.1). From the martingale approximation (2.9) we see that $\hat P_2 R_{\hat\vartheta} f$ then has influence function $(m A R_\vartheta f, m J^{-1} P_2 R_\vartheta(\mu_\vartheta f_0) \mu_\vartheta)$. Hence $\hat P_2 R_{\hat\vartheta} f$ is asymptotically normal with variance
\[ m \big( P_2 (A R_\vartheta f)^2 + J^{-1} (P_2 R_\vartheta(\mu_\vartheta f_0))^2 \big). \]
Note that by the Cauchy-Schwarz inequality,
\[ J^{-1} (P_2 R_\vartheta(\mu_\vartheta f_0))^2 \leq P_2 R_\vartheta f_0^2. \]
Hence our estimator is better than the empirical estimator unless $f_0$ is proportional to $\mu_\vartheta$.
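A sketch of $\hat P_2 R_{\hat\vartheta} f$ under a hypothetical parametric family: $R_\vartheta(x, y, ds)$ exponential with mean $\vartheta (1 + |y - x|)$, for which the MLE and the conditional expectation $\int R_\vartheta(x, y, ds) \, s$ are available in closed form.

```python
import numpy as np

def plug_in_model_R(X, f_bar, theta_hat):
    """hat P_2 R_{theta_hat} f: average the fitted conditional expectation
    f_bar(x, y, theta) = int R_theta(x, y, ds) f(x, y, s) over the observed
    pairs (X_{j-1}, X_j), j = 1, ..., N_n."""
    return np.mean([f_bar(X[j - 1], X[j], theta_hat) for j in range(1, len(X))])

# hypothetical family: R_theta(x, y, ds) exponential with mean theta * (1 + |y - x|);
# for f(x, y, s) = s the conditional expectation is theta * (1 + |y - x|), and the
# score equation gives theta_hat = N_n^{-1} sum_j S_j / (1 + |X_j - X_{j-1}|)
def f_bar(x, y, theta):
    return theta * (1.0 + abs(y - x))

rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=1001).astype(float)  # placeholder embedded chain
c = 1.0 + np.abs(np.diff(X))
S = rng.exponential(2.0 * c)                     # true theta = 2.0
theta_hat = np.mean(S / c)                       # MLE in this family
print(plug_in_model_R(X, f_bar, theta_hat))      # efficient estimator of m = E S_j
```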

Now we prove that $\hat P_2 R_{\hat\vartheta} f$ is efficient. Let $\vartheta_{na} = \vartheta + n^{-1/2} a$ and $Q_{nu} \doteq Q(1 + n^{-1/2} u)$. Then $R_{\vartheta_{na}} \doteq R_\vartheta(1 + n^{-1/2} a \mu_\vartheta)$, and (2.11) implies
\[ n^{1/2} \big( P_{2nu} R_{\vartheta_{na}} f - P_2 R_\vartheta f \big) \to P_2(u A R_\vartheta f) + a P_2 R_\vartheta(\mu_\vartheta f_0), \quad u \in U, \ a \in \mathbb{R}. \]
Since $Q$ is unspecified and hence the tangent space $U_0$ for $Q$ is $U$, the canonical gradient of $P_2 R_\vartheta f$ is of the form $(m A R_\vartheta f, a_0 \mu_\vartheta)$, where $a_0$ is determined by
\[ a P_2 R_\vartheta(\mu_\vartheta f_0) = a a_0 m^{-1} J, \quad a \in \mathbb{R}. \]
This gives $a_0 = m J^{-1} P_2 R_\vartheta(\mu_\vartheta f_0)$. Hence $\hat P_2 R_{\hat\vartheta} f$ is efficient by characterization (2.8).

For example, if we set $f(x, y, s) = s$, we obtain an efficient estimator
\[ \frac{1}{N_n} \sum_{j=1}^{N_n} \int R_{\hat\vartheta}(X_{j-1}, X_j, ds) \, s \]
of the mean inter-jump time $m = E S_j$. It is better than the empirical estimator $N_n^{-1} \sum_{j=1}^{N_n} S_j$ unless $s - \int R_\vartheta(x, y, ds) \, s$ is proportional to $\mu_\vartheta(x, y, s)$. If the inter-jump time distribution does not depend on the states, i.e. $R_\vartheta(x, y, ds) = R_\vartheta(ds)$, then our estimator is equivalent to the simpler estimator $\int R_{\hat\vartheta}(ds) \, s$, which is better than the empirical estimator $N_n^{-1} \sum_{j=1}^{N_n} S_j$ unless $s - \int R_\vartheta(ds) \, s$ is proportional to $\mu_\vartheta(s)$, i.e. unless the inter-jump time distribution $R_\vartheta$ is exponential with scale parameter $\vartheta$.

Acknowledgment. Work supported by NSERC, Canada.

References

[1] Bickel, P. J.; Klaassen, C. A. J.; Ritov, Y.; Wellner, J. A., Efficient and Adaptive Estimation for Semiparametric Models. Springer, New York, 1998.
[2] Bickel, P. J.; Ritov, Y., Estimating integrated squared density derivatives: sharp best order of convergence estimates. Sankhyā Ser. A, 50 (1988) 381–393.
[3] Efromovich, S.; Samarov, A., Adaptive estimation of the integral of squared regression derivatives. Scand. J. Statist., 27 (2000) 335–351.
[4] Eggermont, P. P. B.; LaRiccia, V. N., Maximum Penalized Likelihood Estimation, Vol. I: Density Estimation. Springer, New York, 2001.
[5] Gill, R. D., Nonparametric estimation based on censored observations of a Markov renewal process. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 53 (1980) 97–116.

[6] Goldstein, L.; Messer, K., Optimal plug-in estimators for nonparametric functional estimation. Ann. Statist., 20 (1992) 1306–1328.
[7] Greenwood, P. E.; Wefelmeyer, W., Nonparametric estimators for Markov step processes. Stochastic Process. Appl., 52 (1994) 1–16.
[8] Greenwood, P. E.; Wefelmeyer, W., Empirical estimators for semi-Markov processes. Math. Meth. Statist., 5 (1996) 299–315.
[9] Gordin, M. I., The central limit theorem for stationary processes. Soviet Math. Dokl., 10 (1969) 1174–1176.
[10] Gordin, M. I.; Lifšic, B. A., The central limit theorem for stationary Markov processes. Soviet Math. Dokl., 19 (1978) 392–394.
[11] Hájek, J., A characterization of limiting distributions of regular estimates. Z. Wahrsch. Verw. Gebiete, 14 (1970) 323–330.
[12] Hall, P.; Marron, J. S., Estimation of integrated squared density derivatives. Statist. Probab. Lett., 6 (1987) 109–115.
[13] Hatori, H., A limit theorem on (J, X)-processes. Kōdai Math. Sem. Reports, 18 (1966) 317–321.
[14] Höpfner, R.; Jacod, J.; Ladelli, L., Local asymptotic normality and mixed normality for Markov statistical models. Probab. Theory Related Fields, 86 (1990) 105–129.
[15] Kartashov, N. V., Criteria for uniform ergodicity and strong stability of Markov chains with a common phase space. Theory Probab. Math. Statist., 30 (1985) 71–89.
[16] Kartashov, N. V., Inequalities in theorems of ergodicity and stability for Markov chains with common phase space. I. Theory Probab. Appl., 30 (1985) 247–259.
[17] Kartashov, N. V., Strong Stable Markov Chains. VSP, Utrecht, 1996.
[18] Lagakos, S. W.; Sommer, C. J.; Zelen, M., Semi-Markov models for partially censored data. Biometrika, 65 (1978) 311–317.
[19] Le Cam, L., Limits of experiments. Proc. Sixth Berkeley Symp. Math. Statist. Probab., 1 (1971) 245–261.
[20] McLean, R. A.; Neuts, M. F., The integral of a step function defined on a semi-Markov process. SIAM J. Appl. Math., 15 (1967) 726–737.
[21] Meyn, S. P.; Tweedie, R. L., Markov Chains and Stochastic Stability. Springer, London, 1993.
[22] Moore, E. H.; Pyke, R., Estimation of the transition distributions of a Markov renewal process. Ann. Inst. Statist. Math., 20 (1968) 411–424.
[23] Ouhbi, B.; Limnios, N., Nonparametric estimation for semi-Markov kernels with applications to reliability analysis. Appl. Stochastic Models Data Anal., 12 (1996) 209–220.
[24] Ouhbi, B.; Limnios, N., Nonparametric estimation for semi-Markov processes based on its hazard rate functions. Stat. Inference Stoch. Process., 2 (1999) 151–173.

[25] Phelan, M. J., Bayes estimation from a Markov renewal process. Ann. Statist., 18 (1990) 603–616.
[26] Phelan, M. J., Estimating the transition probability from censored Markov renewal processes. Statist. Probab. Lett., 10 (1990) 43–47.
[27] Pyke, R.; Schaufele, R., The existence and uniqueness of stationary measures for Markov renewal processes. Ann. Math. Statist., 37 (1966) 1439–1462.
[28] Saavedra, A.; Cao, R., On the estimation of the marginal density of a moving average process. Canad. J. Statist., 28 (2000) 799–815.
[29] Schick, A.; Wefelmeyer, W., Root-n consistent and optimal density estimators for moving average processes. Technical Report, Department of Mathematical Sciences, Binghamton University, 2002. http://math.binghamton.edu/anton/preprint.html.
[30] Schick, A.; Wefelmeyer, W., Functional convergence and optimality of plug-in estimators for stationary densities of moving average processes. Technical Report, Department of Mathematical Sciences, Binghamton University, 2002. http://math.binghamton.edu/anton/preprint.html.
[31] Taga, Y., On the limiting distributions in Markov renewal processes with finitely many states. Ann. Inst. Statist. Math., 15 (1963) 1–10.
[32] Voelkel, J. G.; Crowley, J., Nonparametric inference for a class of semi-Markov processes with censored observations. Ann. Statist., 12 (1984) 142–160.