Iterated Random Functions: Convergence Theorems

C. D. Fuh
Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, ROC

ABSTRACT

Iterated random functions are used to draw pictures, to simulate large Ising models, and to represent likelihood functions of hidden Markov models, among other applications. They offer a method for studying the steady-state distribution of a Markov chain, and there is a simple unifying idea: iterated random Lipschitz functions converge if the functions are contracting on average. To be more precise, let $(\mathbb{X}, d)$ be a complete separable metric space and $(F_n)_{n \ge 0}$ a sequence of i.i.d. random functions from $\mathbb{X}$ to $\mathbb{X}$ which are uniformly Lipschitz, that is, $L_n = \sup_{x \ne y} d(F_n(x), F_n(y))/d(x, y) < \infty$ a.s. Provided the mean contraction assumption $E \log L_1 < 0$ holds and $E \log^+ d(F_1(x_0), x_0) < \infty$ for some $x_0 \in \mathbb{X}$, it is known that the forward iterations $M_n^x = F_n \circ \cdots \circ F_1(x)$, $n \ge 0$, converge weakly to a unique stationary distribution $\pi$ for each $x \in \mathbb{X}$. The associated backward iterations $\hat{M}_n^x = F_1 \circ \cdots \circ F_n(x)$ are a.s. convergent to a random variable $\hat{M}$ which does not depend on $x$ and has distribution $\pi$.

In this paper, we describe the essential results about the asymptotic behavior of the iterated random functions $M_n^x$. To start with, we summarize recent results on the stochastic stability of iterated random functions. Then we study limit theorems for additive functionals of a Markov chain that can be constructed as an iterated random function, including the ergodic theorem, the central limit theorem, quick convergence, Edgeworth expansions and renewal theorems. Three prototypical methods are introduced to prove the limit theorems: the regeneration method, the Poisson equation, and spectral theory for the transition operator. Several examples are given for illustration.

AMS 2000 subject classifications: 60J05, 60J15, 60K05, 60G17.

Keywords and phrases: random function, Lipschitz map, Markov chain, Poisson equation, forward iterations, backward iterations, stationary distribution, Prokhorov metric, level $\gamma$ ladder epoch, moment generating function, product of random matrices, Liapunov exponent, Harris recurrence, total variation, $w$-ergodicity, geometric ergodicity, uniform ergodicity, strict contraction, drift condition, central limit theorem, quick convergence, Edgeworth expansion, renewal theorem.

Research partially supported by NSC M.

1 Introduction

Iterated random functions (IRF) have a wide range of applications, including perfect simulation, the generation of fractal images, data compression, queuing theory, autoregressive processes and the likelihood representation of hidden Markov models, among others. The reader is referred to Duflo (1997) and to Diaconis and Freedman (1999) for excellent recent surveys with extensive lists of the relevant literature. In this paper we study the theoretical aspects of iterated random functions and summarize recent limit theorems from the literature.

To be more precise, a sequence of the form
$$M_n = F(\theta_n, M_{n-1}), \qquad n \ge 0, \tag{1.1}$$
is called an iterated random function (IRF) of i.i.d. Lipschitz maps provided that

1. $M_0, \theta_1, \theta_2, \ldots$ are independent random elements on a common probability space $(\Omega, \mathcal{U}, P)$;
2. $\theta_1, \theta_2, \ldots$ are identically distributed with common distribution $\Lambda$ and take values in a second countable measurable space $(\Theta, \mathcal{A})$;
3. $M_0, M_1, \ldots$ take values in a complete separable metric space $(\mathbb{X}, d)$ with Borel $\sigma$-field $\mathcal{B}(\mathbb{X})$;
4. $F : (\Theta \times \mathbb{X}, \mathcal{A} \otimes \mathcal{B}(\mathbb{X})) \to (\mathbb{X}, \mathcal{B}(\mathbb{X}))$ is jointly measurable and Lipschitz continuous in the second argument.

Let $\mathbb{X}_0$ be a dense subset of $\mathbb{X}$ and $M(\mathbb{X}_0, \mathbb{X})$ the space of all mappings $f : \mathbb{X}_0 \to \mathbb{X}$, endowed with the product topology and product $\sigma$-field. Then the space $L_{\mathrm{Lip}}(\mathbb{X}, \mathbb{X})$ of all Lipschitz continuous mappings $f : \mathbb{X} \to \mathbb{X}$, properly embedded, forms a Borel subset of $M(\mathbb{X}_0, \mathbb{X})$, and the mappings
$$L_{\mathrm{Lip}}(\mathbb{X}, \mathbb{X}) \times \mathbb{X} \ni (f, x) \mapsto f(x) \in \mathbb{X}, \qquad L_{\mathrm{Lip}}(\mathbb{X}, \mathbb{X}) \ni f \mapsto l(f) := \sup_{x \ne y} \frac{d(f(x), f(y))}{d(x, y)}$$
are Borel; see Lemma 5.1 in Diaconis and Freedman (1999) for details. Hence
$$L_n := l(F(\theta_n, \cdot)), \qquad n \ge 0,$$
are also measurable and form a sequence of i.i.d. random variables. In the following, we write $F_n(x)$ for $F(\theta_n, x)$. Let $F_{k:n} := F_k \circ \cdots \circ F_n$, $F_{n:k} := F_n \circ \cdots \circ F_k$ and $F_{n:n-1}$ the identity on $\mathbb{X}$ for all $1 \le k \le n$. Hence
$$M_n = F_n(M_{n-1}) = F_{n:1}(M_0) \tag{1.2}$$
for all $n \ge 0$.
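To make the recursion (1.1) concrete, the following is a minimal simulation sketch in Python. It assumes a hypothetical random affine map $F(\theta, x) = ax + b$ on the real line with $\theta = (a, b)$; this choice is only for illustration and is not the general setting of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(theta, x):
    # One random affine Lipschitz map F(theta, x) = a*x + b with theta = (a, b);
    # its Lipschitz constant is |a|, so E log|a| < 0 gives mean contraction.
    a, b = theta
    return a * x + b

def forward_iteration(x0, n, rng):
    # Forward iterations M_n = F_n(M_{n-1}) = F_{n:1}(M_0), as in (1.1)-(1.2).
    x = x0
    for _ in range(n):
        theta = (rng.uniform(-0.9, 0.9), rng.normal())  # i.i.d. driving sequence (theta_n)
        x = F(theta, x)
    return x

# Independent replications give approximate draws from the stationary law pi.
samples = [forward_iteration(x0=10.0, n=200, rng=rng) for _ in range(5000)]
print("approx. stationary mean/std:", np.mean(samples), np.std(samples))
```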

Closely related to these forward iterations, and in fact a key tool in their analysis, is the following sequence of backward iterations:
$$\hat{M}_n := F_{1:n}(M_0), \qquad n \ge 0. \tag{1.3}$$
The connection is established by the identity
$$P_x(M_n \in \cdot) = P_x(\hat{M}_n \in \cdot) \quad \text{for all } n \ge 0.$$
Put also $M_n^x := F_{n:1}(x)$ and $\hat{M}_n^x := F_{1:n}(x)$ for $x \in \mathbb{X}$, and note that $P((M_n^x, \hat{M}_n^x)_{n \ge 0} \in \cdot) = P_x((M_n, \hat{M}_n)_{n \ge 0} \in \cdot)$. The reason for introducing these additional sequences is that we will compare $\hat{M}_n^x$ and $\hat{M}_n^y$, or $M_n^x$ and $M_n^y$, for different $x, y$.

Regarding stochastic stability, it is known that the forward iterations $M_n^x = F_n \circ \cdots \circ F_1(x)$, $n \ge 0$, converge weakly to a unique stationary distribution $\pi$ for each $x \in \mathbb{X}$, while the associated backward iterations $\hat{M}_n^x = F_1 \circ \cdots \circ F_n(x)$ converge a.s. to a random variable $\hat{M}$ which does not depend on $x$ and has distribution $\pi$, provided the mean contraction assumption $E \log L_1 < 0$ holds and $E \log^+ d(F_1(x_0), x_0) < \infty$ for some $x_0 \in \mathbb{X}$.

The theory of additive functionals of iterated random functions gives rise to general results of which typical examples are the ergodic theorem and the central limit theorem; the results described here can be considered as an infinite-dimensional extension of this theory. What is new in this situation is the non-commutativity of the iteration, and thus we are led to study a certain amount of Markov chain theory. Clearly, by definition (1.1), $(M_n)_{n \ge 0}$ constitutes a temporally homogeneous Markov chain with state space $\mathbb{X}$ and transition kernel $P$, given by
$$P(x, B) = \Lambda(F(\cdot, x) \in B)$$
for $x \in \mathbb{X}$ and $B \in \mathcal{B}(\mathbb{X})$. The $n$-step transition kernel is denoted $P^n$. For $x \in \mathbb{X}$, let $P_x$ be the probability measure on the underlying measurable space under which $M_0 = x$ a.s. The associated expectation is denoted $E_x$, as usual. For an arbitrary distribution $\nu$ on $\mathbb{X}$, we put $P_\nu(\cdot) := \int P_x(\cdot)\, \nu(dx)$ with associated expectation $E_\nu$. We use $P$ and $E$ for probabilities and expectations, respectively, that do not depend on the initial distribution.

It is known (cf. Alsmeyer, 2003) that the Markov chain induced by iterated random functions is Harris recurrent on a set $H$, and $w$-ergodic for a suitable weight function $w$ if extra moment conditions are assumed. The results may be derived more easily from related results in Meyn and Tweedie (1993, Chapter 17) if $(M_n)_{n \ge 0}$ is further irreducible (with respect to some measure on $\mathcal{B}(\mathbb{X})$), in which case it is even positive Harris recurrent on some $P$-absorbing set. However, many IRF of i.i.d. Lipschitz functions are not irreducible but only weak Feller chains. It is this fact that complicates the necessary arguments in the general situation.
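The contrast between forward and backward iterations in (1.2)-(1.3) can be seen numerically. The following sketch (continuing the hypothetical affine-map example above) applies the same maps $F_1, \ldots, F_n$ in the two opposite orders: the backward values settle down pathwise to a limit $\hat{M}$, while the forward values keep moving and only converge in distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
thetas = [(rng.uniform(-0.9, 0.9), rng.normal()) for _ in range(60)]  # theta_1, ..., theta_60

def F(theta, x):
    a, b = theta
    return a * x + b

def forward(x, ths):
    # M_n^x = F_n(F_{n-1}(... F_1(x))): oldest map applied first.
    for th in ths:
        x = F(th, x)
    return x

def backward(x, ths):
    # \hat M_n^x = F_1(F_2(... F_n(x))): newest map applied first (innermost).
    for th in reversed(ths):
        x = F(th, x)
    return x

for n in (5, 15, 30, 60):
    print(n, round(forward(0.0, thetas[:n]), 6), round(backward(0.0, thetas[:n]), 6))
# Expected pattern: the backward column stabilizes as n grows; the forward column does not.
```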

In this paper, we study limit theorems for additive functionals of a Markov chain that can be constructed as an iterated random function, including stochastic stability, the ergodic theorem, the central limit theorem, quick convergence, Edgeworth expansions and renewal theorems. Three prototypical methods are introduced to prove these limit theorems: the regeneration method, the Poisson equation, and spectral theory for the transition operator. To start with, we introduce the regeneration method to prove rates of convergence and the ergodic theorem for IRF in Section 2. Secondly, without the assumption of irreducibility, we apply the Poisson equation method to prove the central limit theorem and quick convergence in Section 3. To prove Edgeworth expansions and renewal theorems, we need to impose an irreducibility assumption, for which two types of conditions are considered here. A density hypothesis on $\Lambda$ leads to a situation in the context of Harris recurrence; another natural hypothesis is the positivity of the functions in the support of $\Lambda$, which yields contraction properties that also lead to precise results. In Section 4, we state results on Harris recurrence and $w$-ergodicity for iterated random functions, and introduce a sufficient condition, based on the density hypothesis, for irreducibility. In Section 5, we study the hypothesis of positivity for the functions in the support of $\Lambda$, on which basis we develop our spectral theory. Edgeworth expansions and renewal theorems, considered in Sections 6 and 7 respectively, then follow from the established Markov chain theory. Illustrative examples are included in Section 8. The first two satisfy the density assumption, while the third satisfies the positivity assumption. The fourth example satisfies neither.

2 Stochastic stability and ergodic theorem

In this section, we summarize results on stochastic stability and rates of convergence for iterations of i.i.d. mean contraction random Lipschitz functions. An ergodic theorem is also given.

A central question for an IRF $(M_n)_{n \ge 0}$ is under which conditions it stabilizes, that is, converges to a stationary distribution $\pi$. Elton (1990) showed, in the more general situation of a stationary sequence $(F_n)_{n \ge 1}$, that this holds true whenever $E \log^+ l(F_1)$ and $E \log^+ d(F_1(x_0), x_0)$ are both finite for some (and then all) $x_0 \in \mathbb{X}$ and the Liapunov exponent
$$l := \lim_{n \to \infty} n^{-1} \log l(F_{n:1}),$$
which exists by Kingman's subadditive ergodic theorem, is a.s. negative. His results for i.i.d. $F_1, F_2, \ldots$ under the slightly stronger assumptions $E \log l(F_1) < 0$ and $E \log^+ d(F_1(x_0), x_0) < \infty$ for some $x_0 \in \mathbb{X}$ are restated in Theorem 2.1. The basic idea is to consider the backward iterations $\hat{M}_n^x = F_{1:n}(x)$ and to prove their a.s. convergence to a limit $\hat{M}$ which does not depend on $x$ and which has distribution $\pi$. The obvious inequality
$$d(\hat{M}_{n+m}^x, \hat{M}_n^x) \le \Big(\prod_{k=1}^n l(F_k)\Big)\, d(F_{n+1:n+m}(x), x) \quad \text{a.s.}, \tag{2.1}$$
valid for all $n, m \ge 0$ and $x \in \mathbb{X}$, forms a key tool in the necessary analysis.

Alsmeyer and Fuh (2001) build on that same inequality together with the simple observation that
$$\log\Big(\prod_{k=1}^n l(F_k)\Big) = \sum_{k=1}^n \log l(F_k), \qquad n \ge 0, \tag{2.2}$$
is an ordinary zero-delayed random walk and thus perfectly amenable to renewal-theoretic (regeneration) arguments. Under the mean contraction assumption $E \log l(F_1) < 0$, it has negative drift whence, for arbitrary $\gamma \in (0, 1)$, the level $\log\gamma$ ladder epochs
$$\sigma_0(\gamma) \equiv 0, \qquad \sigma_n(\gamma) := \inf\Big\{ k > \sigma_{n-1}(\gamma) : \sum_{j=\sigma_{n-1}(\gamma)+1}^{k} \log l(F_j) \le \log\gamma \Big\}, \quad n \ge 1, \tag{2.3}$$
are all a.s. finite and constitute an ordinary discrete renewal process. As a consequence, the subsequence $(M_{\sigma_n(\gamma)})_{n \ge 0}$ again forms an IRF of i.i.d. Lipschitz maps which is furthermore strictly contractive because, by construction, $l(F_{1:\sigma_1(\gamma)}) \le \gamma < 1$. For the associated backward iterations $\hat{M}_{\sigma_n(\gamma)}^x = F_{1:\sigma_n(\gamma)}(x)$, inequality (2.1) hence takes the very strong form
$$d(\hat{M}_{\sigma_{n+m}(\gamma)}^x, \hat{M}_{\sigma_n(\gamma)}^x) \le \gamma^n\, d(F_{\sigma_n(\gamma)+1:\sigma_{n+m}(\gamma)}(x), x) \tag{2.4}$$
for all $n, m \ge 0$ and $x \in \mathbb{X}$, and suggests the following procedure to prove convergence results for $(M_n)_{n \ge 0}$ and its associated sequence of backward iterations:

Step 1. Given a set of conditions, find out what kind of results hold true for the strictly contractive IRF $(M_{\sigma_n(\gamma)})_{n \ge 0}$ for any $\gamma \in (0, 1)$.

Step 2. Analyze the excursions of $(M_n)_{n \ge 0}$ between two successive ladder epochs $\sigma_k(\gamma)$ and $\sigma_{k+1}(\gamma)$ and adjust the results with respect to $(M_n)_{n \ge 0}$ if necessary.

The stability results in this section are taken from Alsmeyer and Fuh (2001). They focus on estimates for $d(\hat{M}, \hat{M}_n)$ under $P_x$, $x \in \mathbb{X}$, and $d(M_n^x, M_n^y)$ for $x, y \in \mathbb{X}$. The latter distance may be viewed as the coupling rate of the forward iterations at time $n$ when started at different values $x$ and $y$. The two sets of conditions we will consider are that, for some $p > 0$ and some $x_0 \in \mathbb{X}$, either
$$E \log^{p+1}(1 + L_1) < \infty \quad \text{and} \quad E \log^{p+1}(1 + d(F_1(x_0), x_0)) < \infty \tag{2.5}$$
or
$$E L_1^p < \infty \quad \text{and} \quad E\, d(F_1(x_0), x_0)^p < \infty \tag{2.6}$$
holds.
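The ladder-epoch construction in (2.3) is easy to simulate. The sketch below, again for the hypothetical affine maps with Lipschitz constants $L_k = |a_k|$, records the epochs at which the running sum of $\log L_k$ drops by another $\log\gamma$, and compares the mean spacing with $\log\gamma / E\log L_1$ (see (2.11) below); between consecutive epochs the composed map is guaranteed to contract by a factor of at most $\gamma$.

```python
import numpy as np

rng = np.random.default_rng(2)

def ladder_epochs(log_L, gamma):
    # Level log(gamma) ladder epochs sigma_n(gamma) of the random walk sum(log L_k), as in (2.3).
    epochs, running = [0], 0.0
    for k, x in enumerate(log_L, start=1):
        running += x
        if running <= np.log(gamma):
            epochs.append(k)
            running = 0.0  # restart the walk after each ladder epoch
    return epochs

log_L = np.log(np.abs(rng.uniform(-0.9, 0.9, size=2000)))  # log Lipschitz constants, E log L_1 < 0
eps = ladder_epochs(log_L, gamma=0.5)
gaps = np.diff(eps)
print("number of epochs:", len(eps) - 1)
print("mean spacing mu(gamma):", gaps.mean(), " vs  log(gamma)/E[log L_1]:",
      np.log(0.5) / log_L.mean())
```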

Two major conclusions will concern the distance between $P^n(x, \cdot)$, $x \in \mathbb{X}$, and $\pi$ in the Prokhorov metric associated with $d$. Following Diaconis and Freedman (1999), the latter is also denoted $d$ and is defined, for two probability measures $\lambda_1, \lambda_2$ on $\mathbb{X}$, as the infimum over all $\delta > 0$ such that
$$\lambda_1(B) < \lambda_2(B^\delta) + \delta \quad \text{and} \quad \lambda_2(B) < \lambda_1(B^\delta) + \delta \quad \text{for all } B \in \mathcal{B}(\mathbb{X}),$$
where $B^\delta := \{x \in \mathbb{X} : d(x, y) < \delta \text{ for some } y \in B\}$. We will show that, for all $x \in \mathbb{X}$ and $n \ge 0$,
$$d(P^n(x, \cdot), \pi) \le A_x (n + 1)^{-p} \tag{2.7}$$
if (2.5) holds, and
$$d(P^n(x, \cdot), \pi) \le A_x r^n \tag{2.8}$$
for some $r \in (0, 1)$ not depending on $x$ and $n$, if (2.6) is true.

Now let $\sigma_1(\gamma)$ be as defined in (2.3) for $\gamma \in (0, 1)$, i.e.
$$\sigma_1(\gamma) := \inf\{n \ge 1 : L_{1:n} \le \gamma\} = \inf\Big\{n \ge 1 : \sum_{k=1}^n \log L_k \le \log\gamma\Big\}. \tag{2.9}$$
Provided $E \log L_1 < 0$, a condition which will always be in force throughout, $\sigma_1(\gamma)$ is an a.s. finite first passage time with finite mean $\mu(\gamma)$. It also has finite variance $\theta(\gamma)^2$, say, if $E \log^2(1 + L_1) < \infty$. Let further
$$\log \gamma^* := \inf_{\gamma \in (0,1)} \frac{\log\gamma}{\mu(\gamma)}. \tag{2.10}$$
If $E|\log L_1| < \infty$, then it is well known from renewal theory that
$$\frac{\log\gamma}{E \log L_1} \le \mu(\gamma) \le \frac{\log\gamma}{E \log L_1}\,(1 + o(1)) \qquad (\gamma \to 0). \tag{2.11}$$
It is now easily checked that in this case
$$\log \gamma^* = \lim_{\gamma \to 0} \frac{\log\gamma}{\mu(\gamma)} = E \log L_1. \tag{2.12}$$

Theorem 2.1. Given an IRF $(M_n)_{n \ge 0}$ of i.i.d. Lipschitz maps, suppose
$$E \log L_1 < 0 \quad \text{and} \quad E \log^+ d(F_1(x_0), x_0) < \infty \tag{2.13}$$
for some $x_0 \in \mathbb{X}$. Then the following assertions hold:

(a) $\hat{M}_n$ converges a.s. to a random element $\hat{M}$ with distribution $\pi$ which does not depend on the initial distribution.
(b) For each $\gamma \in (\gamma^*, 1)$, $\lim_{n \to \infty} P_x(d(\hat{M}, \hat{M}_n) > \gamma^n) = 0$ for all $x \in \mathbb{X}$.
(c) $M_n$ converges in distribution to $\pi$ under every $P_x$, $x \in \mathbb{X}$.
(d) $\pi$ is the unique stationary distribution of $(M_n)_{n \ge 0}$, and $(\hat{M}_n)_{n \ge 0}$ is a stationary sequence under $P_\pi$.
(e) $(M_n)_{n \ge 0}$ is ergodic under $P_\pi$.

Theorem 2.2. Given the situation of Theorem 2.1 and additionally condition (2.5) for some $p > 0$, the following assertions hold:

(a) For each $\gamma \in (\gamma^*, 1)$,
$$\sum_{n \ge 1} n^{p-1}\, P_x\big(d(\hat{M}, \hat{M}_n) > \gamma^n\big) \le c_\gamma \big(1 + \log^p(1 + d(x, x_0))\big) \quad \text{and} \quad \lim_{n \to \infty} n^p\, P_x\big(d(\hat{M}, \hat{M}_n) > \gamma^n\big) = 0$$
for all $x \in \mathbb{X}$ and some $c_\gamma \in (0, \infty)$.
(b) For each $\gamma \in (\gamma^*, 1)$,
$$\limsup_{n \to \infty}\; n^{p \wedge 1}\Big(\frac{1}{n} \log d(\hat{M}, \hat{M}_n) - \log\gamma\Big) \le 0 \quad P_x\text{-a.s.}$$
for all $x \in \mathbb{X}$. In case $0 < p \le 1$ this remains true for $\gamma = \gamma^*$.
(c) If $p = 1$, then $\lim_{n \to \infty} \gamma^{-n} d(\hat{M}, \hat{M}_n) = 0$ $P_x$-a.s. for all $x \in \mathbb{X}$ and all $\gamma \in (\gamma^*, 1)$.
(d) $d(P^n(x, \cdot), \pi) \le A_x (n + 1)^{-p}$ for all $n \ge 0$, $x \in \mathbb{X}$ and a positive constant $A_x$ of the form $\max\{A, 2d(x, x_0)\}$, where $A$ depends neither on $x$ nor on $n$.
(e) $\int_{\mathbb{X}} \log^p(1 + d(x, x_0))\, \pi(dx) = \int_0^\infty p\, t^{p-1}\, \pi\big(x : \log(1 + d(x, x_0)) > t\big)\, dt < \infty$.

Theorem 2.3. Given the situation of Theorem 2.1 and additionally condition (2.6) for some $p > 0$, the following assertions hold:

(a) For each $\gamma \in (\gamma^*, 1)$, $\lim_{n \to \infty} \alpha_\gamma^{-n} P_x(d(\hat{M}, \hat{M}_n) > \gamma^n) = 0$ for all $x \in \mathbb{X}$ and some $\alpha_\gamma \in (0, 1)$.
(b) There exists $\eta > 0$ such that for each $q \in (0, \eta)$,
$$\limsup_{n \to \infty}\; \sup_{x \in \mathbb{X}}\; \frac{\alpha_q^{-n}}{(1 + d(x, x_0))^q}\, E_x\, d(\hat{M}, \hat{M}_n)^q = 0$$
for some $\alpha_q \in (0, 1)$. The same holds true for $q = \eta$ with $\alpha_q = 1$.

(c) $d(P^n(x, \cdot), \pi) \le A_x r^n$ for all $n \ge 0$, some $r \in (0, 1)$ and a constant $A_x$ of the form $\max\{A, d(x, x_0)\}$. The constants $r$ and $A$ depend neither on $x$ nor on $n$.
(d) $\int_{\mathbb{X}} d(x, x_0)^\eta\, \pi(dx) = \int_0^\infty \eta\, t^{\eta - 1}\, \pi(x : d(x, x_0) > t)\, dt < \infty$ for some $\eta > 0$.

Let us mention that the constants $c_\gamma$, $\alpha_\gamma$, $\alpha_q$, $A_x$ and $r$ in the previous theorems generally also depend on the exponent $p > 0$ of the respective moment condition assumed.

The assertions of the previous two theorems on $d(\hat{M}, \hat{M}_n)$ are easily translated into similar results on $d(M_n^x, M_n^y)$ for the forward iterations started at different values $x$ and $y$. Essentially, this only takes the observation that $(M_n^x, M_n^y)$ and $(\hat{M}_n^x, \hat{M}_n^y)$ are identically distributed for all $x, y \in \mathbb{X}$ and $n \ge 0$, and that $d(\hat{M}_n^x, \hat{M}_n^y) \le d(\hat{M}, \hat{M}_n^x) + d(\hat{M}, \hat{M}_n^y)$. We summarize the results in the following two corollaries.

Corollary 2.1. Given the situation of Theorem 2.2, the following assertions hold:

(a) For each $\gamma \in (\gamma^*, 1)$,
$$\sum_{n \ge 1} n^{p-1}\, P\big(d(M_n^x, M_n^y) > \gamma^n\big) \le c_\gamma \big(1 + \log^p(1 + d(x, x_0)) + \log^p(1 + d(y, x_0))\big)$$
and $\lim_{n \to \infty} n^p\, P(d(M_n^x, M_n^y) > \gamma^n) = 0$ for all $x, y \in \mathbb{X}$ and some $c_\gamma \in (0, \infty)$.
(b) For each $\gamma \in (\gamma^*, 1)$,
$$\limsup_{n \to \infty}\; n^{p \wedge 1}\Big(\frac{1}{n} \log d(M_n^x, M_n^y) - \log\gamma\Big) \le 0 \quad \text{a.s.}$$
for all $x, y \in \mathbb{X}$. In case $0 < p \le 1$ this remains true for $\gamma = \gamma^*$.
(c) If $p = 1$, then $\lim_{n \to \infty} \gamma^{-n} d(M_n^x, M_n^y) = 0$ a.s. for all $x, y \in \mathbb{X}$ and all $\gamma \in (\gamma^*, 1)$.

Corollary 2.2. Given the situation of Theorem 2.3, the following assertions hold:

(a) For each $\gamma \in (\gamma^*, 1)$, $\lim_{n \to \infty} \alpha_\gamma^{-n} P(d(M_n^x, M_n^y) > \gamma^n) = 0$ for all $x, y \in \mathbb{X}$ and some $\alpha_\gamma \in (0, 1)$.
(b) There exists $\eta > 0$ such that for each $q \in (0, \eta)$,
$$\limsup_{n \to \infty}\; \sup_{x, y \in \mathbb{X}}\; \frac{\alpha_q^{-n}}{\big(1 + d(x, x_0) \vee d(y, x_0)\big)^q}\, E\, d(M_n^x, M_n^y)^q = 0$$
for some $\alpha_q \in (0, 1)$. The same holds true for $q = \eta$ with $\alpha_q = 1$.
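Corollaries 2.1 and 2.2 are coupling statements: forward iterations started from different points come together geometrically fast. A quick numerical check, still in the hypothetical affine-map setting, drives two starting points with the same $\theta_n$ and watches $d(M_n^x, M_n^y)$ shrink.

```python
import numpy as np

rng = np.random.default_rng(3)

x, y = -50.0, 50.0
for n in range(1, 61):
    a, b = rng.uniform(-0.9, 0.9), rng.normal()  # the same theta_n drives both chains
    x, y = a * x + b, a * y + b
    if n % 10 == 0:
        # d(M_n^x, M_n^y) <= (L_1 ... L_n) * d(x, y), so it decays geometrically a.s.
        print(n, abs(x - y))
```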

3 Central limit theorem and quick convergence: the Poisson equation approach

In this section we show that additive functionals obtained from iterated random functions, suitably normalized, converge to a normal distribution. The machinery we develop to prove this result rests on the stability theory developed in Section 2. These techniques are extremely appealing as well as powerful, and can lead to much further insight into the asymptotic behavior of iterated random functions. Here we focus on two results: the central limit theorem and quick convergence.

Let $g \in L_0^2(\pi)$ be a square integrable function with mean 0, i.e.
$$\int_{\mathbb{X}} g\, d\pi = 0 \quad \text{and} \quad \|g\|_2^2 = \int_{\mathbb{X}} g^2\, d\pi < \infty. \tag{3.1}$$
Consider the sequence
$$S_n(g) := g(M_1) + \cdots + g(M_n), \qquad n \ge 1, \tag{3.2}$$
which may be viewed as a Markov random walk with driving chain $(M_n)_{n \ge 0}$. By constructing a solution $h \in L^2(\pi)$ to the Poisson equation
$$h = g + P h, \tag{3.3}$$
where $P h(x) := \int_{\mathbb{X}} h(y)\, P(x, dy)$, and a subsequent decomposition of $S_n(g)$ into a martingale and a stochastically bounded sequence, Benda (1998) showed that $S_n(g)/\sqrt{n}$ is asymptotically normal as $n \to \infty$ under $P_x$ for $\pi$-almost all $x \in \mathbb{X}$, if
$$g \in L_{\mathrm{Lip}}(\mathbb{X}, \mathbb{R}), \quad E L_1^2 < 1 \quad \text{and} \quad E\, d(F_1(x_0), x_0)^2 < \infty. \tag{3.4}$$
It was observed by Wu and Woodroofe (2000) that these conditions may be relaxed if the integrability assumption on $g$ is slightly strengthened to $g \in L_0^2(\pi) \cap L^r(\pi)$ for some $r > 2$. Their further assumptions are $E \log L_1 < 0$, (2.6), and a $\pi$-square-integrability condition on a certain local Lipschitz constant for $g$ with respect to a flattened metric $\psi \circ d$. The main point is that this allows discontinuous $g$, for instance suitable indicator functions.

A main purpose of this section is to summarize the results of Benda (1998) and of Wu and Woodroofe (2000) on the asymptotic normality of $S_n(g)/\sqrt{n}$, and to apply the results of Alsmeyer (1990) and of Fuh and Zhang (2000) on the quick convergence of $n^{-1} S_n(g)$ to 0. As to the above-mentioned local Lipschitz constant for $g$, we will show that its integrability (instead of square integrability) with respect to $\pi$ suffices. We will further give sufficient conditions for the $\beta$-quick convergence of $n^{-1} S_n(g)$ to 0. The concept of quick convergence was introduced by Strassen (1967). A sequence $(Z_n)_{n \ge 0}$ is said to converge $\beta$-quickly ($\beta > 0$) to a constant $\mu$ if
$$E\big(\sup\{n \ge 0 : |Z_n - \mu| \ge \varepsilon\}\big)^\beta < \infty \tag{3.5}$$
for all $\varepsilon > 0$.
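For a finite-state chain, the Poisson equation (3.3) can be solved directly, which makes the decomposition used below easy to experiment with. The following sketch is a toy illustration with an arbitrary 3-state transition matrix; it is not the construction used in Theorem 3.1, which works on a general state space.

```python
import numpy as np

# Toy 3-state chain (rows sum to 1) and a centered function g.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
# Stationary distribution pi: left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

g = np.array([1.0, -2.0, 0.5])
g = g - pi @ g                      # enforce pi(g) = 0

# Solve h - P h = g on the subspace {pi(h) = 0} via the fundamental matrix.
Pi = np.outer(np.ones(3), pi)       # rank-one projector onto constants
h = np.linalg.solve(np.eye(3) - P + Pi, g)

print("residual of h = g + P h :", np.max(np.abs(h - (g + P @ h))))
print("pi(h) =", pi @ h)
```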

Plainly, $Z_n \to \mu$ $\beta$-quickly implies $Z_n \to \mu$ a.s. Put $N_\varepsilon := \sup\{n \ge 0 : |Z_n - \mu| \ge \varepsilon\}$. Since (3.5) then reads $E N_\varepsilon^\beta < \infty$ for all $\varepsilon > 0$, the $\beta$-quick convergence of $Z_n$ to $\mu$ holds if, and only if,
$$\sum_{n \ge 1} n^{\beta - 1}\, P(N_\varepsilon \ge n) = \sum_{n \ge 1} n^{\beta - 1}\, P\Big(\sup_{j \ge n} |Z_j - \mu| \ge \varepsilon\Big) < \infty \tag{3.6}$$
for all $\varepsilon > 0$.

Our results will be stated in Theorem 3.1 and Corollaries 3.1 and 3.2. As in Benda (1998) and Wu and Woodroofe (2000), the bulk of the work is to verify the existence of a solution to the Poisson equation (3.3). This is the content of Theorem 3.1. The asymptotic normality of $S_n(g)/\sqrt{n}$ (Corollary 3.1) then follows as in Benda (1998) by applying a martingale central limit theorem, while the $\beta$-quick convergence of $S_n(g)/n$ to 0 for suitable $\beta$ (Corollary 3.2) will be obtained by using a result from Alsmeyer (1990) and from Fuh and Zhang (2000). Some preliminary considerations are needed before presenting our results.

A. Flattening the metric. In order to solve the Poisson equation (3.3) for a given function $g$, the particular complete separable metric $d$ given on the space $\mathbb{X}$ is not essential but may rather be altered to our convenience. This was observed by Wu and Woodroofe (2000), who therefore consider flattened variations of $d$ obtained by composing $d$ with an arbitrary nondecreasing, concave function $\psi : [0, \infty) \to [0, \infty)$ with $\psi(0) = 0$ and $\psi(t) > 0$ for all $t > 0$. Let $\Psi$ be the collection of all such functions. It is easy to see that $d_\psi := \psi \circ d$ is again a complete metric. Possible choices from $\Psi$ include $\psi_p(t) := t^p$ for any $0 < p \le 1$ as well as $\psi_\infty(t) := \frac{t}{1+t}$. The latter choice leads to a bounded metric $d_\psi$ satisfying
$$d_\psi(x, y) \le d(x, y) \le 2 d_\psi(x, y) \tag{3.7}$$
for all $(x, y) \in \{(u, v) \in \mathbb{X}^2 : d(u, v) \le 1\}$. This shows that the behavior of $d$ and $d_\psi$ is essentially the same for small values. Notice further that $\psi_\infty \circ \psi \in \Psi$ with
$$\lim_{t \to 0} \frac{\psi_\infty \circ \psi(t)}{\psi(t)} = 1 \tag{3.8}$$
for all $\psi \in \Psi$.

B. Integrable local Lipschitz constant. One can further relax the global Lipschitz continuity of $g$ needed in Benda (1998) and instead be satisfied with a $\pi$-almost sure local Lipschitz continuity (with respect to a flattened metric $d_\psi$) in combination with an integrability condition on the local Lipschitz constant. To make this precise, let $\psi \in \Psi$.

For a measurable $g : \mathbb{X} \to \mathbb{R}$, define its local Lipschitz constant at $x \in \mathbb{X}$ with respect to $d_\psi$ as
$$l_\psi(g, x) = \sup_{y : 0 < d(x, y) \le 1} \frac{|g(x) - g(y)|}{d_\psi(x, y)} \tag{3.9}$$
and, for $r \in [1, \infty]$,
$$\|g\|_{r, \psi} = \|l_\psi(g, \cdot)\|_r, \tag{3.10}$$
where $\|\cdot\|_r$ denotes the usual norm on $L^r(\pi)$. It is easily seen that $\|\cdot\|_{r,\psi}$ defines a (pseudo-)norm on the space
$$L^r_{\psi, 0}(\pi) = \Big\{ g \in L^r(\pi) : \int_{\mathbb{X}} g(x)\, \pi(dx) = 0 \text{ and } \|g\|_{r,\psi} < \infty \Big\} \tag{3.11}$$
and that $L^r_{\psi,0}(\pi) = L^r_{\psi_\infty \circ \psi, 0}(\pi)$ with $\frac{1}{2}\|\cdot\|_{r, \psi_\infty\circ\psi} \le \|\cdot\|_{r,\psi} \le \|\cdot\|_{r, \psi_\infty\circ\psi}$ on this space (use (3.7) and (3.8)). Possibly after replacing $\psi$ with $\psi_\infty \circ \psi$, we may therefore always assume $\psi$ to be bounded when dealing with elements of $L^r_{\psi,0}(\pi)$. Plainly, all globally Lipschitz functions, i.e. all $g \in L_{\mathrm{Lip}}(\mathbb{X}, \mathbb{R})$, are elements of $L^r_{\psi,0}(\pi)$ for any $\psi \in \Psi$. However, $g$ need not be continuous in order to be an element of some $L^r_{\psi,0}(\pi)$. As pointed out in Wu and Woodroofe (2000), if $g = 1_B$ is the indicator function of some $B \in \mathcal{B}(\mathbb{X})$, then
$$l_\psi(1_B, x) = \frac{1}{d_\psi(\partial B, x)}, \tag{3.12}$$
where $\partial B$ denotes the topological boundary of $B$ and $d_\psi(\partial B, x) := \inf_{y \in \partial B} d_\psi(x, y)$. They further show that, if $B(x, R) = \{y : d(x, y) \le R\}$ is the closed $R$-ball with center $x \in \mathbb{X}$, $\psi(t) = t^{1/4}$ and $\lambda$ denotes Lebesgue measure, then, for each $x \in \mathbb{X}$,
$$1_{B(x,R)} - \pi(B(x, R)) \in L^2_{\psi, 0}(\pi) \quad \text{for } \lambda\text{-almost all } R > 0;$$
see their Theorem 3.

Theorem 3.1. Let $r \in (1, \infty]$ with conjugate exponent $s \ge 1$, given by $\frac{1}{r} + \frac{1}{s} = 1$. Let also $\psi \in \Psi$ satisfy
$$\int_0^1 \frac{\psi(t)}{t}\, dt < \infty. \tag{3.13}$$
If $E \log L_1 < 0$ and (2.6) holds for some $p > s$, then each $g \in L^r(\pi) \cap L^1_{\psi, 0}(\pi)$ admits a solution $h \in L^r_0(\pi)$ to the Poisson equation $h = g + P h$.

We remark that all examples of $\psi \in \Psi$ mentioned in Section 8 satisfy condition (3.13). With the help of the Poisson equation, one may write
$$S_n(g) = W_n + R_n, \qquad n \ge 1, \tag{3.14}$$

where
$$W_n := \sum_{k=1}^n \big(h(M_k) - P h(M_{k-1})\big), \qquad n \ge 0, \tag{3.15}$$
forms a zero-mean martingale under $P_\pi$ with stationary increments from $L^r(\pi)$, and
$$R_n := P h(M_0) - P h(M_n), \qquad n \ge 1, \tag{3.16}$$
is stochastically $L^r$-bounded under $P_\pi$ in the sense that
$$\sup_{n \ge 1} P_\pi(|R_n| > t) \le 2 P_\pi(Z > t) \tag{3.17}$$
for all $t > 0$ and some $Z \in L^r(\pi)$; take any random variable $Z \ge 0$ with distribution function $P_\pi(|P h(M_0)| \le t/2)$ for $t \ge 0$.

In the stationary regime, that is under $P_\pi$, the following central limit theorem now follows exactly as in Benda (1998) from Theorem 3.1 and a martingale central limit theorem. However, an additional argument is needed to show that the same result holds true under $P_x$ for $\pi$-almost all $x \in \mathbb{X}$. While this extension is not considered in Wu and Woodroofe (2000), its proof in Benda (1998) fails to work here because it draws on the continuity of $g$ and a moment condition like (2.5).

Corollary 3.1. Given the assumptions of Theorem 3.1 with $r \ge 2$ and $p > s$, $S_n(g)/\sqrt{n}$ is asymptotically normal with mean 0 and variance $s^2(g) := \int (h^2 - (P h)^2)\, d\pi$ under $P_\pi$ as well as under $P_x$ for $\pi$-almost all $x \in \mathbb{X}$.

So if $g \in L^2(\pi) \cap L^1_{\psi, 0}(\pi)$, we need moment condition (2.6) for some $p > 2$ to conclude the asymptotic normality of $S_n(g)/\sqrt{n}$. By using the existence of a solution to the Poisson equation (3.3), the following corollary is taken from Theorem 2 in Fuh and Zhang (2000).

Corollary 3.2. Given the assumptions of Theorem 3.1 with $p > s > 1$, $S_n(g)/n$ converges $\beta$-quickly to 0 for $\beta = r - 1$, i.e.
$$\sum_{n \ge 1} n^{r-2}\, P_\pi\Big(\sup_{j \ge n} j^{-1} |S_j(g)| \ge \varepsilon\Big) < \infty \tag{3.18}$$
for all $\varepsilon > 0$.
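Corollary 3.1 can be checked by simulation. The sketch below estimates the law of $S_n(g)/\sqrt{n}$ for the hypothetical affine-map IRF with $g(x) = x$, which has mean 0 under $\pi$ in this example because the driving coefficients are centered; the sample variance estimates the asymptotic variance $s^2(g)$.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_S_over_sqrt_n(n, burn=200):
    # One realization of S_n(g)/sqrt(n) with g(x) = x for the affine-map IRF;
    # E[a] = E[b] = 0 here, so the stationary mean of M is 0 and pi(g) = 0.
    x = 0.0
    for _ in range(burn):                      # forget the initial condition
        x = rng.uniform(-0.9, 0.9) * x + rng.normal()
    s = 0.0
    for _ in range(n):
        x = rng.uniform(-0.9, 0.9) * x + rng.normal()
        s += x
    return s / np.sqrt(n)

vals = np.array([simulate_S_over_sqrt_n(1000) for _ in range(300)])
print("sample mean:", vals.mean(), " sample variance (estimates s^2(g)):", vals.var())
```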

4 Harris recurrence of iterated random functions

Let $M_n = F(\theta_n, M_{n-1})$, $n \ge 0$, be the iterated random function defined in Section 1. The ergodic theorem, as stated in Theorem 2.1(e), implies for each $B \in \mathcal{B}(\mathbb{X})$ that
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^n 1_B(M_k) = \pi(B) \tag{4.1}$$
$P_\pi$-a.s. and thus also $P_x$-a.s. for $\pi$-almost all $x \in \mathbb{X}$. Hence, if $\pi(B) > 0$, then
$$P_x(M_n \in B \text{ i.o.}) = 1 \tag{4.2}$$
for $\pi$-almost all $x \in \mathbb{X}$, and we would like to conclude that every $\pi$-positive set $B$ is recurrent. Unfortunately, the $\pi$-null set of $x \in \mathbb{X}$ for which (4.2) fails to hold in general depends on the set $B$. On the other hand, if it does not, we infer the $\pi$-irreducibility of the chain $(M_n)_{n \ge 0}$ on some $H$ with $\pi(H) = 1$ and then, because of (4.2) for each $\pi$-positive $B$, further its Harris recurrence on $H$. If additionally aperiodicity holds, this in turn implies that $P_x(M_n \in \cdot)$ converges to $\pi$ in total variation for every $x \in H$, which, of course, is a much stronger conclusion than Elton's result restated in Theorem 2.1. With regard to a further analysis of IRF, for instance the rate of convergence towards stationarity (in total variation), it also gives access to the highly developed theory of irreducible and Harris recurrent Markov chains on general state spaces.

Given an IRF of i.i.d. Lipschitz maps satisfying the conditions of an a.s. negative Liapunov exponent and condition (2.13), two questions will be considered in this section and discussed in various examples in Section 8. First, we state a sufficient condition for $H = \mathbb{X}$ in Theorem 4.1. These conditions are quite often easy to check in applications when the stationary distribution is known to some extent; see Section 8 for several examples. Second, we deal with the convergence towards stationarity for Harris recurrent IRF. Under additional moment conditions on $L_1$ and $d(F_1(x_0), x_0)$, we will show $w$-regularity and $w$-ergodicity for suitable functions $w$ in Theorem 4.2, and provide polynomial as well as geometric rates of convergence towards stationarity in Theorem 4.3. Theorems 4.1 to 4.3 are taken from Alsmeyer (2003).

A set $B \in \mathcal{B}(\mathbb{X})$ is called $\pi$-full if $\pi(B) = 1$, and $P$-absorbing if $P(x, B) = 1$ for all $x \in B$. For the definitions of irreducibility, Harris recurrence and related notions for Markov chains on general state spaces not explicitly repeated here, we refer to the standard monograph by Meyn and Tweedie (1993). If $(M_n)_{n \ge 0}$ is a Harris chain on a set $H$, this set is called a Harris set (for $(M_n)_{n \ge 0}$). It is well known that in this case there always exists a maximal absorbing set with this property, called the maximal Harris set. Our next theorem contains some information on when this latter set is the whole space $\mathbb{X}$. Let $\mathrm{int}(B)$ denote the interior of a set $B \in \mathcal{B}(\mathbb{X})$.

Theorem 4.1. Suppose $(M_n)_{n \ge 0}$ is an IRF of i.i.d. Lipschitz maps which has a.s. negative Liapunov exponent $l$ and satisfies (2.13). Let $\pi$ denote its stationary distribution. Suppose $(M_n)_{n \ge 0}$ is Harris recurrent with maximal Harris set $H$. Then the following assertions hold:

(a) Either $\pi(\mathrm{int}(H)) = 0$, or $H = \mathbb{X}$.
(b) Suppose there exist a $\pi$-positive set $\mathbb{X}_0$ and a $\sigma$-finite measure $\lambda$ on $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ such that each $P(x, \cdot)$, $x \in \mathbb{X}_0$, possesses a $\lambda$-continuous component. If furthermore $\pi(\mathrm{int}(\mathbb{X}_0)) > 0$ and $\mathrm{int}(\mathrm{supp}\,\pi) \ne \emptyset$, then $H = \mathbb{X}$.

As already mentioned above, Theorem 4.1 implies, by invoking the ergodic theorem for aperiodic, positive Harris chains (see Meyn and Tweedie, 1993), that
$$\lim_{n \to \infty} \|P_x(M_n \in \cdot) - \pi\| = 0 \tag{4.3}$$
for all $x \in H$, where $\|\cdot\|$ denotes the total variation distance. A weaker metric, considered in Diaconis and Freedman (1999) and Alsmeyer and Fuh (2001), is the Prokhorov metric associated with $d$; see also Theorems 2.2 and 2.3 in Section 2. If $(M_n)_{n \ge 0}$ is Harris recurrent, it is natural to ask, in view of Theorem 2.2(d) and Theorem 2.3(c), whether or not similar conclusions hold when the Prokhorov distance is replaced by the total variation distance. The positive answer is provided in Theorem 4.3 for the case $H = \mathbb{X}$ and under the additional assumption that the support of the stationary distribution $\pi$ has nonempty interior.

Weaker conclusions, stated as Theorem 4.2 and concerning the $w$-regularity of $(M_n)_{n \ge 0}$, can be obtained considerably more easily. Following Meyn and Tweedie (1993), a set $C \in \mathcal{B}(\mathbb{X})$ is called $w$-regular for a function $w : \mathbb{X} \to [1, \infty)$ if for each $\pi$-positive $B \in \mathcal{B}(\mathbb{X})$
$$\sup_{x \in C} E_x\Big(\sum_{n=0}^{\varrho(B) - 1} w(M_n)\Big) < \infty, \quad \text{where } \varrho(B) := \inf\{n \ge 1 : M_n \in B\}.$$
$(M_n)_{n \ge 0}$ is called $w$-regular on a $P$-absorbing set $H$ if it is $\pi$-irreducible and $H$ admits a countable cover of $w$-regular sets. Define the $w$-norm $\|\nu\|_w$ of a signed measure $\nu$ as
$$\|\nu\|_w := \sup_{|g| \le w} |\nu(g)|, \qquad \nu(g) := \int g\, d\nu.$$
$(M_n)_{n \ge 0}$ is called $w$-ergodic on $H$ if it is positive Harris on $H$ with invariant distribution $\pi$ satisfying $\pi(w) < \infty$ and if
$$\lim_{n \to \infty} \|P^n(x, \cdot) - \pi\|_w = 0$$
for all $x \in H$.
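For an irreducible aperiodic chain on a finite state space, the total variation convergence (4.3) and its geometric rate can be computed exactly from the powers of the transition matrix. The toy sketch below does this for the 3-state matrix used earlier; it only illustrates the kind of decay the theorems describe, not the general state-space argument.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

Pn = np.eye(3)
for n in range(1, 16):
    Pn = Pn @ P
    # total variation distance sup_x ||P^n(x, .) - pi|| = (1/2) max_x sum_y |P^n(x, y) - pi(y)|
    tv = 0.5 * np.max(np.abs(Pn - pi).sum(axis=1))
    if n % 3 == 0:
        print(n, tv)
# The printed distances decay geometrically, matching (4.3) and the rates in Theorem 4.3.
```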

Now put
$$w(x) := 1 + \log^p(1 + d(x, x_0)) \tag{4.4}$$
provided (2.5) holds for some $p > 0$, and
$$w(x) := 1 + d(x, x_0)^\eta \tag{4.5}$$
provided (2.6) holds for some $p > 0$ and $0 < \eta \le p$ is such that $\int_{\mathbb{X}} d(x, x_0)^\eta\, d\pi(x) < \infty$. By using Meyn and Tweedie's main result on $w$-regularity, the following result is now immediate and hence stated without proof.

Theorem 4.2. Let $(M_n)_{n \ge 0}$ be an IRF of i.i.d. Lipschitz maps satisfying Elton's conditions. Suppose further that $(M_n)_{n \ge 0}$ is an aperiodic positive Harris chain on a $\pi$-full, absorbing set $H$ and that either (2.5) or (2.6) holds for some $p > 0$. Then $H$ may be chosen such that $(M_n)_{n \ge 0}$ is $w$-regular and $w$-ergodic on $H$ with $w$ according to (4.4), respectively (4.5).

It is to be understood that the Harris set $H$ on which $(M_n)_{n \ge 0}$ is $w$-regular need not be the maximal Harris set.

Theorem 4.3. Let $(M_n)_{n \ge 0}$ be an IRF of i.i.d. Lipschitz maps with a.s. negative Liapunov exponent $l$ and stationary distribution $\pi$. Suppose further that $(M_n)_{n \ge 0}$ is a positive Harris chain on the whole of $\mathbb{X}$ and that $\mathrm{int}(\mathrm{supp}\,\pi) \ne \emptyset$. Then the following assertions hold:

(a) If $(M_n)_{n \ge 0}$ satisfies (2.5) for some $p > 0$, then
$$\sum_{n \ge 1} n^{p-1}\, \|P_x(M_n \in \cdot) - \pi\| < \infty \tag{4.6}$$
as well as
$$\lim_{n \to \infty} n^p\, \|P_x(M_n \in \cdot) - \pi\| = 0 \tag{4.7}$$
for all $x \in \mathbb{X}$.
(b) If $(M_n)_{n \ge 0}$ satisfies (2.6) for some $p > 0$, then
$$\sum_{n \ge 0} r^{-n}\, \|P_x(M_n \in \cdot) - \pi\|_w < \infty \tag{4.8}$$
for all $x \in \mathbb{X}$ and some $r \in (0, 1)$ not depending on $x \in \mathbb{X}$, where $w$ is defined as in (4.5).

5 Spectral decomposition and characteristic functions of Markov random walks

It was shown in Section 4 that the Markov chain $(M_n)_{n \ge 0}$ induced by the iterated random functions is Harris recurrent on a set $H$. Under the assumption $H = \mathbb{X}$ and the moment assumption (2.6), $(M_n)_{n \ge 0}$ is geometrically $w$-ergodic with $w$ defined in (4.5). In this section, we introduce the culminating form of the geometric ergodicity theorem and show that such convergence can be viewed as geometric convergence in an operator norm; that is, the convergence is bounded independently of the starting point. In the following, we study spectral theory for uniformly ergodic Markov chains with respect to a general norm, and apply it to iterated random functions in the next two sections. The material of this section is similar to that of Fuh and Lai (2001) and Fuh and Lai (2003); we include it here for completeness.

Let $\{(X_n, S_n), n \ge 0\}$ be a Markov random walk on $\mathbb{X} \times \mathbb{R}^d$. For ease of notation, denote $P(x, A) = P(x, A \times \mathbb{R}^d)$. For all transition probability kernels $P(x, A)$, $Q(x, A)$, $x \in \mathbb{X}$, $A \in \mathcal{A}$, and for all measurable functions $h(x)$, $x \in \mathbb{X}$, define $Qh$ and $PQ$ by $Qh(x) = \int Q(x, dy)\, h(y)$ and $PQ(x, A) = \int P(x, dy)\, Q(y, A)$, respectively. Let $\mathcal{N}$ be the Banach space of measurable functions $h : \mathbb{X} \to \mathbb{C}$ ($:=$ the set of complex numbers) with norm $\|h\| < \infty$. We introduce the Banach space $\mathcal{B}$ of transition probability kernels $Q$ such that the operator norm $\|Q\| = \sup\{\|Qg\| : \|g\| \le 1\}$ is finite. Two prototypical norms used in the literature are the sup-norm and the $L^p$-norm for $1 < p < \infty$. Another two norms commonly used in applications are the weighted variation norm and the bounded Lipschitz norm, described as follows:

1. Let $w : \mathbb{X} \to [1, \infty)$ be a measurable function and define, for all measurable functions $h$, the weighted variation norm
$$\|h\|_w = \sup_{x \in \mathbb{X}} |h(x)|/w(x), \tag{5.1}$$
and set $\mathcal{N}_w = \{h : \|h\|_w < \infty\}$. The corresponding norm in $\mathcal{B}_w$ is of the form $\|Q\|_w = \sup_{x \in \mathbb{X}} \int |Q|(x, dy)\, w(y)/w(x)$.

2. Let $(\mathbb{X}, d)$ be a metric space. For any continuous function $h$ on $\mathbb{X}$, the Lipschitz seminorm is defined by $\|h\|_L := \sup_{x \ne y} |h(x) - h(y)|/d(x, y)$. Denote the supremum norm by $\|h\|_\infty = \sup_{x \in \mathbb{X}} |h(x)|$. Let the bounded Lipschitz norm be
$$\|h\|_{BL} := \|h\|_L + \|h\|_\infty \tag{5.2}$$
and set $\mathcal{N}_{BL} = \{h : \|h\|_{BL} < \infty\}$. Here BL stands for bounded Lipschitz.

Denote by $P^n(x, A) = P(X_n \in A \mid X_0 = x)$ the transition probabilities over $n$ steps. The kernel $P^n$ is the $n$-fold power of $P$. Define the Cesàro averages $P^{(n)} = n^{-1} \sum_{j=0}^{n-1} P^j$, where $P^0 = P^{(0)} = I$ and $I$ is the identity operator on $\mathcal{B}$.
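The two norms in (5.1)-(5.2) are straightforward to approximate numerically on a grid. The sketch below does so for a sample function on the real line with $w(x) = 1 + |x|$ and the Euclidean metric; the grid-based suprema are, of course, only lower bounds for the true suprema.

```python
import numpy as np

def weighted_variation_norm(h, xs, w):
    # ||h||_w = sup_x |h(x)| / w(x), approximated over the grid xs, cf. (5.1).
    return np.max(np.abs(h(xs)) / w(xs))

def bounded_lipschitz_norm(h, xs):
    # ||h||_BL = ||h||_L + ||h||_inf, with the Lipschitz seminorm over grid pairs, cf. (5.2).
    vals = h(xs)
    diffs = np.abs(vals[:, None] - vals[None, :])
    dists = np.abs(xs[:, None] - xs[None, :])
    lip = np.max(diffs[dists > 0] / dists[dists > 0])
    return lip + np.max(np.abs(vals))

h = lambda x: np.sin(x) / (1.0 + 0.1 * np.abs(x))
w = lambda x: 1.0 + np.abs(x)
xs = np.linspace(-20, 20, 801)

print("||h||_w  ~", weighted_variation_norm(h, xs, w))
print("||h||_BL ~", bounded_lipschitz_norm(h, xs))
```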

Definition 1. A Markov chain $\{X_n, n \ge 0\}$ is said to be uniformly ergodic (or strongly stable) with respect to a given norm if there exists a stochastic kernel $\Pi$ such that $\|P^{(n)} - \Pi\| \to 0$ as $n \to \infty$ in the induced operator norm in $\mathcal{B}$. The Markov chain $\{X_n, n \ge 0\}$ is called $w$-uniformly ergodic in the case of the weighted variation norm.

The Markov chain $\{X_n, n \ge 0\}$ is assumed to be irreducible (with respect to some measure on $\mathcal{A}$), aperiodic and strongly stable. Theorem 1.1 of Kartashov (1996) implies that $P$ has a unique stationary projector $\Pi$, in the sense that $\Pi^2 = \Pi = P\Pi = \Pi P$, and $\Pi(x, A) = \pi(A)$ for all $x \in \mathbb{X}$ and $A \in \mathcal{A}$. The following assumptions will be used in this section; here $\xi_n := S_n - S_{n-1}$ denotes the $n$th increment of the Markov random walk.

C1. There exist a natural number $n$, a measure $\alpha$ on $\mathcal{A}$ and a measurable function $h$ on $\mathbb{X}$ such that $\int \pi(dx)\, h(x) > 0$, $\alpha(\mathbb{X}) = 1$, $\int \alpha(dx)\, h(x) > 0$, and the kernel $T(x, A) = P^n(x, A) - h(x)\alpha(A)$ is nonnegative.
C2. $\sup_{\|h\| \le 1} \big\| E[h(X_1) \mid X_0 = \cdot\,] \big\| < \infty$.
C3. $\sup_x E_x |\xi_1|^2 < \infty$ and $\sup_{\|h\| \le 1} \big\| E[\,|\xi_1|^r h(X_1) \mid X_0 = \cdot\,] \big\| < \infty$ for some $r \ge 3$.
C4. Let $\nu$ be an initial distribution of the Markov chain $\{X_n, n \ge 0\}$ and assume that, for some $r \ge 1$, $\|\nu\| := \sup_{\|h\| \le 1} \int_{\mathbb{X}} |h(x)|\, E_x|\xi_1|^r\, \nu(dx) < \infty$.

Remarks: 1. Condition C1 is a mixing condition on the Markov chain $\{X_n, n \ge 0\}$, and it is satisfied for Harris recurrent Markov chains. An example on page 9 of Kartashov (1996) shows that there exists a Markov chain which is uniformly ergodic with respect to a given norm but not Harris recurrent. C2 is a condition that guarantees that the operators defined in (5.4)-(5.5) below are bounded. C3 and C4 are moment conditions. We also note that, by an argument similar to that in Section 3 of Jensen (1987), the $X_1$ and $\xi_1$ appearing in C2-C4 can be relaxed to $X_t$ and $\xi_t$ for some fixed $t > 1$.
2. Theorem 2.2 and Corollary 2.1 of Kartashov (1996) show that under Condition C1, a Markov chain $X$ with transition probability kernel $P$ is uniformly ergodic with respect to a given norm if and only if there exist $\gamma > 0$ and $0 < \rho < 1$ such that for all $n \ge 1$
$$\|P^n - \Pi\| \le \gamma \rho^n. \tag{5.3}$$
When the Markov chain $\{X_n, n \ge 0\}$ is $w$-uniformly ergodic, (5.3) is satisfied without Condition C1.

For $d \times 1$ vectors $\theta$, define the linear operators $P_\theta$, $\mathbf{P}$, $\nu_\theta$ and $Q$ on $\mathcal{N}$ by
$$(P_\theta h)(x) = \int h(y)\, e^{i\theta' s}\, P(x, dy \times ds) = E[h(X_1)\, e^{i\theta' S_1} \mid X_0 = x], \tag{5.4}$$
$$(\mathbf{P} h)(x) = \int h(y)\, P(x, dy \times ds) = E[h(X_1) \mid X_0 = x], \tag{5.5}$$

$$\nu_\theta h = E_\nu\{h(X_0)\, e^{i\theta' S_1}\}, \qquad Qh = \int h(y)\, \pi(dy). \tag{5.6}$$
Condition C2 ensures that $P_\theta$ and $\mathbf{P}$ are bounded linear operators on $\mathcal{N}$, and (5.3) implies that
$$\|\mathbf{P}^n - Q\| = \sup_{h \in \mathcal{N},\, \|h\| \le 1} \|\mathbf{P}^n h - Qh\| \le \gamma \rho^n. \tag{5.7}$$
For a bounded linear operator $T : \mathcal{N} \to \mathcal{N}$, the resolvent set is defined as $\{z \in \mathbb{C} : (zI - T)^{-1} \text{ exists}\}$ and $(zI - T)^{-1}$ is called the resolvent (when the inverse exists). From (5.7), it follows that for $z \ne 1$ and $|z| > \rho$,
$$R(z) := \frac{Q}{z - 1} + \sum_{n=0}^{\infty} \frac{\mathbf{P}^n - Q}{z^{n+1}} \tag{5.8}$$
is well defined. Since $R(z)(zI - \mathbf{P}) = I = (zI - \mathbf{P})R(z)$, the resolvent of $\mathbf{P}$ is $R(z)$. Moreover, by C3 and an argument similar to the proof of Lemma 2.2 of Jensen (1987), there exist $K > 0$ and $\eta > 0$ such that for $|\theta| \le \eta$, $|z - 1| > (1 - \rho)/6$ and $|z| > \rho + (1 - \rho)/6$,
$$\|P_\theta - \mathbf{P}\| \le K|\theta|, \tag{5.9}$$
$$R_\theta(z) := \sum_{n=0}^{\infty} R(z)\{(P_\theta - \mathbf{P})R(z)\}^n \quad \text{is well defined.} \tag{5.10}$$
Since $R_\theta(z)(zI - P_\theta) = R_\theta(z)\{(zI - \mathbf{P}) - (P_\theta - \mathbf{P})\} = I = (zI - P_\theta)R_\theta(z)$, the resolvent of $P_\theta$ is $R_\theta(z)$. For $|\theta| \le \eta$ the spectrum (which is the complement of the resolvent set) of $P_\theta$ therefore lies inside the two circles
$$C_1 = \{z : |z - 1| = (1 - \rho)/3\} \quad \text{and} \quad C_2 = \{z : |z| = \rho + (1 - \rho)/3\}. \tag{5.11}$$
Hence, by the spectral decomposition theorem (cf. Riesz and Sz.-Nagy (1955), page 421), $\mathcal{N} = \mathcal{N}_1(\theta) \oplus \mathcal{N}_2(\theta)$ and
$$Q_\theta := \frac{1}{2\pi i} \oint_{C_1} R_\theta(z)\, dz, \qquad I - Q_\theta := \frac{1}{2\pi i} \oint_{C_2} R_\theta(z)\, dz \tag{5.12}$$
are parallel projections of $\mathcal{N}$ onto the subspaces $\mathcal{N}_1(\theta)$, $\mathcal{N}_2(\theta)$, respectively. Moreover, by an argument similar to the proof of Lemma 2.3 of Jensen (1987), there exists $0 < \delta \le \eta$ such that $\mathcal{N}_1(\theta)$ is one-dimensional for $|\theta| \le \delta$ and
$$\sup_{|\theta| \le \delta} \|Q_\theta - Q\| < 1. \tag{5.13}$$
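On a finite state space the perturbed operator $P_\theta$ is just a matrix whose $(x, y)$ entry is $P(x, y)$ times the conditional characteristic function of the increment, and its top eigenvalue $\lambda(\theta)$ can be computed directly. The toy sketch below, for a 3-state chain with hypothetical Gaussian increments whose mean depends on the current state, shows $\lambda(0) = 1$ and $|\lambda(\theta)| < 1$ for $\theta \ne 0$, which is the separation of the spectrum exploited in (5.11)-(5.13).

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
m = np.array([1.0, -0.5, 0.3])      # increment xi ~ N(m[x], 1) given the current state x (toy choice)

def P_theta(theta):
    # (P_theta)(x, y) = P(x, y) * E[exp(i*theta*xi) | X_0 = x], cf. (5.4).
    cf = np.exp(1j * theta * m - 0.5 * theta**2)   # Gaussian characteristic function
    return P * cf[:, None]

for theta in (0.0, 0.1, 0.3, 0.6):
    eigvals = np.linalg.eigvals(P_theta(theta))
    lam = eigvals[np.argmax(np.abs(eigvals))]      # eigenvalue of maximal modulus, lambda(theta)
    print(theta, np.abs(lam))
# |lambda(0)| = 1, and |lambda(theta)| drops below 1 for theta != 0, isolating the top eigenvalue.
```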

For $|\theta| \le \delta$, let $\lambda(\theta)$ be the eigenvalue of $P_\theta$ with corresponding eigenspace $\mathcal{N}_1(\theta)$. Since $Q_\theta$ is the parallel projection onto the subspace $\mathcal{N}_1(\theta)$ in the direction of $\mathcal{N}_2(\theta)$,
$$P_\theta Q_\theta h = \lambda(\theta) Q_\theta h \quad \text{for } h \in \mathcal{N}. \tag{5.14}$$
Letting $\nu$ denote the initial distribution of $(X_0, S_0)$ and defining the operator $\nu_\theta$ by (5.6), we then have, for $h \in \mathcal{N}$,
$$E_\nu\{e^{i\theta' S_n} h(X_n)\} = \nu_\theta P_\theta^n h = \nu_\theta P_\theta^n \{Q_\theta + (I - Q_\theta)\} h = \lambda^n(\theta)\, \nu_\theta Q_\theta h + \nu_\theta P_\theta^n (I - Q_\theta) h. \tag{5.15}$$
Suppose C4 also holds. An argument similar to the proof of Lemma 2.4 of Jensen (1987) shows that there exist $K > 0$ and $0 < \delta' < \delta$ such that for $|\theta| \le \delta'$,
$$|\nu_\theta P_\theta^n (I - Q_\theta) h| \le K \|h\|\, |\theta|\, \{(1 + 2\rho)/3\}^n. \tag{5.16}$$
We next consider the summand $\lambda^n(\theta)\, \nu_\theta Q_\theta h$ in (5.15). Suppose that C3 holds with $r \ge 3$ and let $[r]$ denote the integer part of $r$. Then, analogously to Lemma 2.5 of Jensen (1987), $\lambda(\theta)$ has the Taylor expansion
$$\lambda(\theta) = 1 + \sum_{(j_1, \ldots, j_d):\, 1 \le j_1 + \cdots + j_d \le [r]} i^{\,j_1 + \cdots + j_d}\, \lambda_{j_1, \ldots, j_d}\, \theta_1^{j_1} \cdots \theta_d^{j_d} / (j_1! \cdots j_d!) + \Delta(\theta) \tag{5.17}$$
in some neighborhood of the origin, where $\Delta(\theta) = O(|\theta|^r)$ as $\theta \to 0$. Assume furthermore that C4 holds. Then, analogously to Lemma 2.6 of Jensen (1987), $\nu_\theta Q_\theta h_1$ (with $h_1 \equiv 1$) has continuous partial derivatives of order $[r] - 2$ in some neighborhood of the origin. Moreover, there exist constants $K$ and $0 < \delta'' < \delta'$ such that for $|\theta| < \delta''$ and $l \le r - 2$, we have
$$\frac{d^l}{d\theta^l}\, \nu_\theta Q_\theta h_1 = \sum_{j=1}^{r-3} \frac{\beta_j}{(j-1)!}\, \theta^{j-1} + cK|\theta|^{r-2-l}, \tag{5.18}$$
where $|c| \le 1$ and $\beta_j$, $j = 0, 1, \ldots, r-3$, are constants with $\beta_0 = 1$. Furthermore, $\nu_\theta P_\theta^n (I - Q_\theta) h$ has continuous partial derivatives of order $[r]$ in some neighborhood of the origin, and the norm of any such partial derivative converges to 0 geometrically fast. In summary, we have

Theorem 5.1. Let $\{(X_n, S_n), n \ge 0\}$ be the Markov random walk defined above, satisfying Conditions C1-C4. Then there exists a $\delta > 0$ such that for all $\theta \in \mathbb{R}^d$ with $|\theta| < \delta$, we have
$$P_\theta = \lambda(\theta) Q_\theta + P_\theta(I - Q_\theta), \tag{5.19}$$

and
(i) $\lambda(\theta)$ is the unique eigenvalue of maximal modulus of $P_\theta$;
(ii) $Q_\theta$ is a rank-one projection such that $Q_\theta(I - Q_\theta) = (I - Q_\theta)Q_\theta = 0$;
(iii) the mappings $\lambda(\theta)$, $Q_\theta$ and $I - Q_\theta$ are analytic for $|\theta| < \delta$;
(iv) $|\lambda(\theta)| > \frac{2 + \rho}{3}$, and for each $p \in \mathbb{N}$, the set of positive integers, there exists $c > 0$ such that for each $n \in \mathbb{N}$,
$$\Big\| \frac{d^p}{d\theta^p} P_\theta^n (I - Q_\theta) \Big\| \le c \Big( \frac{1 + 2\rho}{3} \Big)^n;$$
(v) defining $\gamma_j = \lim_{n \to \infty} (1/n) E_x \log \|T_n^{(j)}\|$ as the upper Liapunov exponent, it follows that
$$\gamma_j = \frac{\partial \lambda(\alpha)}{\partial \alpha_j}\Big|_{\alpha = 0} = \int E_x\big(\log \|M_1^{(j)} \bar{u}\| / \|\bar{u}\|\big)\, dm(x, \bar{u}).$$

Remarks: 1. Under C1-C3 with respect to a norm $\|\cdot\|$, together with the assumption $E_x\{|S_1|^r\} < \infty$, it can be shown by an argument similar to the proof of Lemma 2.7 of Jensen (1987) that $\nu_\theta P_\theta^n (I - Q_\theta) h$ has continuous partial derivatives of order $[r]$ in some neighborhood of the origin. Moreover, analogously to (5.16), the norm of any such partial derivative converges to 0 geometrically fast, by an argument similar to the proof of Lemma 2.4 of Jensen (1987).
2. For the special case $\xi_t = g(X_t)$ with $g : \mathbb{X} \to \mathbb{R}$, the representation (5.15)-(5.17) of the characteristic function $E_\nu(e^{i\theta S_n})$ was first obtained by Nagaev (1957) under the uniform ergodicity condition
$$\sup_{A, x, y} |P(X_m \in A \mid X_0 = x) - P(X_m \in A \mid X_0 = y)| < 1 \quad \text{for some } m \ge 1. \tag{5.20}$$
As noted by Nagaev, (5.20) implies the existence of a stationary distribution $\pi$ and a uniform geometric rate of convergence to the stationary distribution,
$$\sup_{A, x} |P(X_n \in A \mid X_0 = x) - \pi(A)| \le \gamma \rho^n, \tag{5.21}$$
for some $\gamma > 0$, $0 < \rho < 1$ and all $n \ge 1$. Jensen (1987) first clarified Nagaev's arguments and then considered the more general case $\xi_t = g(X_{t-1}, X_t)$ with $g : \mathbb{X} \times \mathbb{X} \to \mathbb{R}$. Noting that the moment condition $\sup_x E[|g(x, X_1)|^r \mid X_0 = x] < \infty$ required in Nagaev's arguments for such $\xi_t$ is not satisfied in most cases where $g$ depends on both $X_{t-1}$ and $X_t$, he extended Nagaev's representation (5.15)-(5.17) to the case where (5.20) holds (also in the case of the $L^p$-norm for $1 < p < \infty$) and
$$\sup_x E[|g(X_m, X_{m+1})|^r \mid X_0 = x] < \infty \quad \text{for some } m \ge 1 \text{ and } r \ge 3, \tag{5.22}$$
$$\int E[|g(X_{t-1}, X_t)|^{r-2} \mid X_0 = x]\, \nu(dx) < \infty \quad \text{for } 1 \le t \le m. \tag{5.23}$$

Instead of introducing a delay $m$ as in (5.22) and (5.23), we broaden the scope of applicability of Nagaev's representation theory by using a general norm not only in the moment condition C3 but also in the ergodicity condition. In Sections 6, 7 and 8, the usefulness of this idea is discussed further and illustrated with examples of iterated random functions.

6 Rate of convergence theorems: asymptotic expansion

We showed in Section 4 that iterated random functions, under some regularity conditions, satisfy Harris recurrence and $w$-ergodicity. A sufficient condition for $\pi$-irreducibility is also given in Theorem 4.1. Section 5 provides the spectral theory for the irreducible Markov operator. In this section, we apply the results on Harris recurrent and strongly stable Markov chains to obtain Edgeworth expansions for iterated random functions. Edgeworth expansions for irreducible Harris recurrent Markov chains can be found in Hipp (1985), Malinovskii (1987) and Jensen (1989), while an Edgeworth expansion for strongly stable Markov chains is in Fuh and Lai (2003).

By the Harris recurrence assumption, we can assume, without loss of generality, that the state space $\mathbb{X}$ has an atom $A_0$, that is, $\pi(A_0) > 0$ and $P(x, \cdot) = P(y, \cdot)$ for all $x, y \in A_0$. We may then define the stopping times $T_0 = \inf\{n \ge 0 : M_n \in A_0\}$, $T_k = \inf\{n > T_{k-1} : M_n \in A_0\}$ for $k \ge 1$, and $\tau_k = T_k - T_{k-1}$. Also we let $0$ denote a fixed point in $A_0$.

We adapt the notation from Section 3. For $j = 1, \ldots, d$, let $g_j \in L_0^2(\pi)$ be a square integrable function with mean 0, i.e.
$$\int_{\mathbb{X}} g_j\, d\pi = 0 \quad \text{and} \quad \|g_j\|_2^2 = \int_{\mathbb{X}} g_j^2\, d\pi < \infty.$$
Denote $g = (g_1, \ldots, g_d)$ and consider the sequence
$$S_n := S_n(g) := g(M_1) + \cdots + g(M_n), \qquad n \ge 1,$$
which may be viewed as a Markov random walk with driving chain $(M_n)_{n \ge 0}$. We want to obtain an asymptotic expansion of the distribution of the sum $S_n(g)$. For this we define the random variables
$$Z_k = \sum_{j=T_{k-1}+1}^{T_k} g(M_j) \tag{6.1}$$
for $k = 1, 2, \ldots$. The uniform Cramér condition for $(Z_1, \tau_1)$ under $P_0$ states that for any $c > 0$ there exists a $\delta < 1$ such that
$$\big| E_0\{\exp(i u' Z_1 + i v \tau_1)\} \big| \le \delta \tag{6.2}$$
for all $v \in \mathbb{R}$ and all $u \in \mathbb{R}^d$ with $|u| > c$.
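The regeneration idea behind (6.1) is easiest to see on a finite chain, where a single state can serve as the atom $A_0$. The sketch below splits a trajectory of the 3-state toy chain at returns to state 0 and forms the block sums $Z_k$; by the Markov property these blocks are i.i.d. under $P_0$, which is what makes the classical i.i.d. Edgeworth machinery applicable.

```python
import numpy as np

rng = np.random.default_rng(5)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
g = np.array([1.0, -2.0, 0.5])        # toy additive functional

def blocks(n_steps, x0=0):
    # Split S_n = sum g(M_j) into regeneration blocks Z_k over excursions between visits to the atom {0}.
    x, Z, current = x0, [], 0.0
    for _ in range(n_steps):
        x = rng.choice(3, p=P[x])
        current += g[x]
        if x == 0:                     # return to the atom: close the block, cf. (6.1)
            Z.append(current)
            current = 0.0
    return np.array(Z)

Z = blocks(200_000)
print("number of blocks:", len(Z))
print("block mean and variance:", Z.mean(), Z.var())
```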

We define a uniformity class $\mathcal{B}_c$ of Borel sets in the following way:
$$\mathcal{B}_c = \{B \in \mathcal{B}^d : \Phi\{(\partial B)^\varepsilon\} < c\varepsilon \text{ for all } \varepsilon > 0\}, \tag{6.3}$$
where $\Phi$ is the standard normal distribution in $\mathbb{R}^d$ and $(\partial B)^\varepsilon = \{x : B(x, \varepsilon) \cap B \ne \emptyset \text{ and } B(x, \varepsilon) \cap B^c \ne \emptyset\}$, with $B(x, \varepsilon)$ a ball centered at $x$ and with radius $\varepsilon$. The following theorem is taken from Theorem 1 of Jensen (1989).

Theorem 6.1. Suppose $(M_n)_{n \ge 0}$ is an IRF of i.i.d. Lipschitz maps which has a.s. negative Liapunov exponent $l$ and satisfies (2.13). Let $\pi$ denote its stationary distribution. Suppose there exist a $\pi$-positive set $\mathbb{X}_0$ and a $\sigma$-finite measure $\lambda$ on $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ such that each $P(x, \cdot)$, $x \in \mathbb{X}_0$, possesses a $\lambda$-continuous component, and suppose furthermore that $\pi(\mathrm{int}(\mathbb{X}_0)) > 0$ and $\mathrm{int}(\mathrm{supp}\,\pi) \ne \emptyset$. Assume further that $A_0$ is a positive Harris recurrent atom and let $0 \in A_0$. Let $\mu$ be the initial distribution of $M_0$. Assume for some $s \ge 3$ the moment conditions
$$E_\mu(T_0^{s-2}) < \infty, \qquad E_\mu\Big(\Big(\sum_{j=0}^{T_0} |g(M_j)|\Big)^{s-2}\Big) < \infty, \tag{6.4}$$
$$E_0(\tau_1^{s}) < \infty, \qquad E_0\Big(\Big(\sum_{j=1}^{T_1} |g(M_j)|\Big)^{s}\Big) < \infty, \tag{6.5}$$
and assume that under $P$ the covariance of $(Z_1, \tau_1)$ is positive definite and that $(Z_1, \tau_1)$ satisfies the uniform Cramér condition (6.2). Then
$$P\Big(\frac{1}{\sqrt{n}} \sum_{j=1}^n \{g(M_j) - \pi(g)\} \in B\Big) = \int_B \sum_{r=0}^{s-3} \varphi_\Sigma(x)\, n^{-r/2}\, q_r(x)\, dx + O\big(n^{-(s-2)/2}\big)$$
uniformly for $B \in \mathcal{B}_c$. Here $\varphi_\Sigma$ is the density of the normal distribution with mean zero and covariance $\Sigma$, $q_r$ is a polynomial in $x$, and
$$\Sigma = E_\pi\{g(M_1) - \pi(g)\}'\{g(M_1) - \pi(g)\} + C, \quad \text{where} \quad C = \sum_{n=2}^{\infty} E_\pi\{g(M_1) - \pi(g)\}'\{g(M_n) - \pi(g)\}.$$

In the second part of this section, we will introduce the Edgeworth expansion for strongly stable Markov chains, and then apply it to additive functionals of IRF.

As in (5.3), the Markov chain $\{X_n, n \ge 0\}$ is geometrically mixing in the sense that there exist $\gamma > 0$ and $0 < \rho < 1$ such that for all $x \in \mathbb{X}$, $k \ge 0$ and $n \ge 1$ and for all real-valued measurable functions $g, h \in \mathcal{N}$,
$$\big| E_x\{g(X_k)\, h(X_{k+n})\} - \{E_x g(X_k)\}\{E_x h(X_{k+n})\} \big| \le \gamma \rho^n. \tag{6.6}$$
Let $\tilde{g}, \tilde{h}$ be real-valued measurable functions on $\mathbb{X} \times \mathbb{X}$. Since $E_x \tilde{h}(X_k, X_{k+1}) = E_x h(X_k)$, where $h(z) = E_z\{\tilde{h}(z, X_1)\}$, the same proof as that of Theorem 2.2 of Kartashov (1996) can be used to show that there exist $\gamma_1 > 0$ and $0 < \rho_1 < 1$ such that for all $x \in \mathbb{X}$, $k \ge 0$ and $n \ge 1$ and for all measurable $\tilde{g}, \tilde{h}$ with $\sup_y \tilde{g}^2(\cdot, y) \in \mathcal{N}$ and $\sup_y \tilde{h}^2(\cdot, y) \in \mathcal{N}$,
$$\big| E_x\{\tilde{g}(X_k, X_{k+1})\, \tilde{h}(X_{k+n}, X_{k+n+1})\} - \{E_x g(X_k)\}\{E_x h(X_{k+n})\} \big| \le \gamma_1 \rho_1^n. \tag{6.7}$$
To establish asymptotic expansions for Markov random walks, we shall make use of (6.7) in conjunction with the following extension of the conditional Cramér (strongly nonlattice) condition (cf. Fuh and Lai (2003)): there exists $m \ge 1$ such that
$$\limsup_{|\theta| \to \infty} \big| E\{\exp(i\theta' S_m) \mid X_0, X_m\} \big| < 1. \tag{6.8}$$
Next, we assume that the following strong mixing condition holds:
$$\big| E_\nu\{\tilde{g}(X_k, X_{k+1})\, \tilde{h}(X_{k+n}, X_{k+n+1})\} - \{E_\nu g(X_k)\}\{E_\pi h(X_{k+n})\} \big| \le \gamma_1 \rho_1^n. \tag{6.9}$$

Remarks: 1. When the norm is the weighted variation norm, we do not need the strong mixing condition (6.9); cf. page 653 of Fuh and Lai (2001).
2. In the special case where $S_1$ is independent of $(X_0, X_1)$, so that the transition kernel $P(x, A \times B)$ of the Markov random walk can be factorized as $P_1(x, A) P_2(B)$, (6.8) reduces to the condition that the random variable $S_1$ is strongly nonlattice:
$$\limsup_{|\theta| \to \infty} \Big| \int \exp(i\theta' s)\, P_2(ds) \Big| < 1.$$

In addition to (6.8) and (6.9), we shall assume that C1-C4 in Section 5 hold for some integer $r \ge 3$. Let
$$\mu = \int E_x S_1\, \pi(dx) \quad (= \lambda'(0)), \tag{6.10}$$
and let $V = \big(\partial^2 \lambda(\theta)/\partial\theta_i \partial\theta_j\,\big|_{\theta=0}\big)_{1 \le i, j \le d}$ be the Hessian matrix of $\lambda$ at 0. By Theorem 5.1,
$$\lim_{n \to \infty} n^{-1} E_\nu\{(S_n - n\mu)(S_n - n\mu)'\} = V, \tag{6.11}$$
where $'$ denotes the transpose.

Let $\psi_n(\theta) = E_\nu(e^{i\theta' S_n})$ and let $h_1 \in \mathcal{N}$ be the constant function $h_1 \equiv 1$. Then by Theorem 5.1 and the fact that $\nu_\theta Q_\theta h_1$ has continuous partial derivatives of order $r - 2$ in some neighborhood of $\theta = 0$, we have the Taylor series expansion of $\psi_n(\theta/\sqrt{n})$ for $|\theta|/\sqrt{n} \le \varepsilon$ (some sufficiently small positive number):
$$\psi_n(\theta/\sqrt{n}) = \Big\{1 + \sum_{j=1}^{r-2} n^{-j/2}\, \pi_j(i\theta)\Big\}\, e^{-\theta' V \theta/2} + o\big(n^{-(r-2)/2}\big), \tag{6.12}$$
where $\pi_j(i\theta)$ is a polynomial in $i\theta$ of degree $3j$, whose coefficients are smooth functions of the partial derivatives of $\lambda(\theta)$ at $\theta = 0$ up to order $j + 2$ and those of $\nu_\theta Q_\theta h_1$ at $\theta = 0$ up to order $j$. Letting $D$ denote the $d \times 1$ vector whose $k$th component is the partial differentiation operator $D_k$ with respect to the $k$th coordinate, define the differential operator $\pi_j(-D)$. As in the case of sums of i.i.d. zero-mean random vectors (cf. Bhattacharya and Rao, 1976), we obtain an Edgeworth expansion for the formal density of the distribution of $S_n$ by replacing the $\pi_j(i\theta)$ and $e^{-\theta' V \theta/2}$ in (6.12) by $\pi_j(-D)$ and $\varphi_V(y)$, respectively, where $\varphi_V$ is the density function of the $d$-variate normal distribution with mean 0 and covariance matrix $V$. The following two theorems are taken from Fuh and Lai (2003).

Theorem 6.2. Let $r \ge 3$ be an integer. Assume C1-C4, (6.8) and (6.9) hold (or C1-C4 and (6.8) hold in the $w$-uniformly ergodic case). Let $\varphi_{j,V} = \pi_j(-D)\varphi_V$ for $j = 1, \ldots, r - 2$. For $0 < \alpha \le 1$ and $c > 0$, let $\mathcal{B}_{\alpha, c}$ be the class of all Borel subsets $B$ of $\mathbb{R}^d$ such that $\int_{(\partial B)^\varepsilon} \varphi_V(y)\, dy \le c\varepsilon^\alpha$ for every $\varepsilon > 0$, where $\partial B$ denotes the boundary of $B$ and $(\partial B)^\varepsilon$ denotes its $\varepsilon$-neighborhood. Then
$$\sup_{B \in \mathcal{B}_{\alpha,c}} \Big| P_\nu\{(S_n - n\mu)/\sqrt{n} \in B\} - \int_B \Big\{\varphi_V(y) + \sum_{j=1}^{r-2} n^{-j/2}\, \varphi_{j,V}(y)\Big\}\, dy \Big| = o\big(n^{-(r-2)/2}\big). \tag{6.13}$$

Next, we apply Theorem 6.2 to the case of iterated random functions. The following theorem shows that $M_n$ is a strongly stable Markov chain under some moment conditions.

Theorem 6.3. Given an IRF $(M_n)_{n \ge 0}$ of i.i.d. Lipschitz maps, suppose that for some $p > 0$,
$$E \log L_1 < 0, \quad E L_1^p < \infty \quad \text{and} \quad E\, d(F_1(x_0), x_0)^p < \infty \tag{6.14}$$
for some $x_0 \in \mathbb{X}$, and that the assumptions of Theorem 6.1 hold. Then $(M_n)_{n \ge 0}$ is ergodic with stationary distribution $\pi$ and uniformly ergodic with respect to the norm
$$\|h\|_{wL} := \|h\|_w + \|h\|_{BL} := \sup_{x \in \mathbb{X}} \frac{|h(x)|}{1 + d(x_0, x)^p} + \sup_{x \ne y} \frac{|h(x) - h(y)|}{d(x, y)^q}, \tag{6.15}$$
for $q \in (0, p)$ and some $x_0 \in \mathbb{X}$.

Here $wL$ represents a combination of the weighted variation norm with $w(x) = 1 + d(x_0, x)^p$ and the bounded Lipschitz norm. Furthermore, there exist $\gamma > 0$ and $0 < \alpha_q < 1$ such that
$$\|\mathbf{P}^n - Q\|_{wL} = \sup_{\|h\| = 1} \|\mathbf{P}^n h - Qh\|_{wL} \le \gamma \alpha_q^n, \tag{6.16}$$
where $\mathbf{P}$, $Q$ are defined as in (5.4)-(5.6).

Under the negative Liapunov exponent assumption $E \log L_1 < 0$ and the moment conditions $E L_1^2 < 1$ and $E\, d(F_1(x_0), x_0)^2 < \infty$ for some $x_0 \in \mathbb{X}$, Benda (1998) and Wu and Woodroofe (2000) proved the central limit theorem for $S_n(g)/\sqrt{n} := \sum_{t=1}^n g(M_t)/\sqrt{n}$ for iterated random functions. In this section, we study the asymptotic expansion of $S_n(g)/\sqrt{n}$ for a given function $g$. Note that the method used in Benda (1998) and Wu and Woodroofe (2000) is based on the idea of the Poisson equation, and no irreducibility assumption is needed in their argument. Here, we apply Theorem 4.1 to an aperiodic, irreducible and uniformly ergodic (with respect to the $wL$ norm) Markov chain that can be constructed as an iterated random function.

7 Renewal theorems

In this section, we summarize the results from Fuh and Lai (2001) to state $d$-dimensional renewal theorems, with an estimate on the rate of convergence, for the Markov random walks induced by iterated random functions. Although the norm considered in Fuh and Lai (2001) is the weighted variation norm (5.1), the spectral theory from Section 5 can be used to generalize their results to a general norm without any difficulty.

Let $\{(X_n, S_n), n \ge 0\}$ be the Markov random walk considered in Section 5. In the one-dimensional case, let $g : \mathbb{X} \times \mathbb{R} \to \mathbb{R}$. The classical Markov renewal theorem states that, under certain regularity conditions,
$$E_\nu\Big(\sum_{k=0}^{\infty} g(X_k, b - S_k)\Big) \;\to\; \int_{\mathbb{X}} \int_{\mathbb{R}} g(x, s)\, ds\, d\pi(x) \Big/ \int_{\mathbb{X}} E_x \xi_1\, d\pi(x) \tag{7.1}$$
as $b \to \infty$. In Theorem 7.4 we establish rates of convergence for (7.1), generalizing Stone's (1965) results in the i.i.d. case, while in further theorems we extend the results to multidimensional Markov renewal theory (for the case $d > 1$) with convergence rates, where the Markov random walks are induced by iterated random functions. Our approach uses the Fourier transform of the Markov transition operator and Schwartz's theory of distributions, developed in Section 5.
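The renewal limit (7.1) can be estimated by simulation when the increments are positive. The sketch below uses the 3-state toy chain with hypothetical exponential increments whose mean depends on the current state, and the kernel $g(x, t) = 1\{0 \le t < 1\}$, for which the right-hand side of (7.1) reduces to $1 / \int E_x \xi_1\, d\pi(x)$ (the expected number of renewal points in a unit window far from the origin).

```python
import numpy as np

rng = np.random.default_rng(6)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
mean_inc = np.array([1.0, 2.0, 0.5])   # increment xi_k ~ Exponential with mean mean_inc[X_k]

w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))]); pi /= pi.sum()
limit = 1.0 / (pi @ mean_inc)          # RHS of (7.1) for g(x, t) = 1{0 <= t < 1}

def count_hits(b):
    # sum_k g(X_k, b - S_k) = number of k with b - 1 < S_k <= b, for one trajectory.
    x, s, hits = 0, 0.0, 0
    while s <= b:
        x = rng.choice(3, p=P[x])
        s += rng.exponential(mean_inc[x])
        if b - 1.0 < s <= b:
            hits += 1
    return hits

b = 300.0
est = np.mean([count_hits(b) for _ in range(1000)])
print("Monte Carlo estimate:", est, "  renewal-theorem limit:", limit)
```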


More information

Lebesgue Measure on R n

Lebesgue Measure on R n CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

From now on, we will represent a metric space with (X, d). Here are some examples: i=1 (x i y i ) p ) 1 p, p 1.

From now on, we will represent a metric space with (X, d). Here are some examples: i=1 (x i y i ) p ) 1 p, p 1. Chapter 1 Metric spaces 1.1 Metric and convergence We will begin with some basic concepts. Definition 1.1. (Metric space) Metric space is a set X, with a metric satisfying: 1. d(x, y) 0, d(x, y) = 0 x

More information

An introduction to some aspects of functional analysis

An introduction to some aspects of functional analysis An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms

More information

Convergence Rates for Renewal Sequences

Convergence Rates for Renewal Sequences Convergence Rates for Renewal Sequences M. C. Spruill School of Mathematics Georgia Institute of Technology Atlanta, Ga. USA January 2002 ABSTRACT The precise rate of geometric convergence of nonhomogeneous

More information

MODULUS OF CONTINUITY OF THE DIRICHLET SOLUTIONS

MODULUS OF CONTINUITY OF THE DIRICHLET SOLUTIONS MODULUS OF CONTINUITY OF THE DIRICHLET SOLUTIONS HIROAKI AIKAWA Abstract. Let D be a bounded domain in R n with n 2. For a function f on D we denote by H D f the Dirichlet solution, for the Laplacian,

More information

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June

More information

215 Problem 1. (a) Define the total variation distance µ ν tv for probability distributions µ, ν on a finite set S. Show that

215 Problem 1. (a) Define the total variation distance µ ν tv for probability distributions µ, ν on a finite set S. Show that 15 Problem 1. (a) Define the total variation distance µ ν tv for probability distributions µ, ν on a finite set S. Show that µ ν tv = (1/) x S µ(x) ν(x) = x S(µ(x) ν(x)) + where a + = max(a, 0). Show that

More information

8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains

8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains 8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains 8.1 Review 8.2 Statistical Equilibrium 8.3 Two-State Markov Chain 8.4 Existence of P ( ) 8.5 Classification of States

More information

QUALIFYING EXAMINATION Harvard University Department of Mathematics Tuesday August 31, 2010 (Day 1)

QUALIFYING EXAMINATION Harvard University Department of Mathematics Tuesday August 31, 2010 (Day 1) QUALIFYING EXAMINATION Harvard University Department of Mathematics Tuesday August 31, 21 (Day 1) 1. (CA) Evaluate sin 2 x x 2 dx Solution. Let C be the curve on the complex plane from to +, which is along

More information

Existence and Uniqueness

Existence and Uniqueness Chapter 3 Existence and Uniqueness An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect

More information

Necessary and sufficient conditions for strong R-positivity

Necessary and sufficient conditions for strong R-positivity Necessary and sufficient conditions for strong R-positivity Wednesday, November 29th, 2017 The Perron-Frobenius theorem Let A = (A(x, y)) x,y S be a nonnegative matrix indexed by a countable set S. We

More information

SUMMARY OF RESULTS ON PATH SPACES AND CONVERGENCE IN DISTRIBUTION FOR STOCHASTIC PROCESSES

SUMMARY OF RESULTS ON PATH SPACES AND CONVERGENCE IN DISTRIBUTION FOR STOCHASTIC PROCESSES SUMMARY OF RESULTS ON PATH SPACES AND CONVERGENCE IN DISTRIBUTION FOR STOCHASTIC PROCESSES RUTH J. WILLIAMS October 2, 2017 Department of Mathematics, University of California, San Diego, 9500 Gilman Drive,

More information

Practical conditions on Markov chains for weak convergence of tail empirical processes

Practical conditions on Markov chains for weak convergence of tail empirical processes Practical conditions on Markov chains for weak convergence of tail empirical processes Olivier Wintenberger University of Copenhagen and Paris VI Joint work with Rafa l Kulik and Philippe Soulier Toronto,

More information

CHAPTER II THE HAHN-BANACH EXTENSION THEOREMS AND EXISTENCE OF LINEAR FUNCTIONALS

CHAPTER II THE HAHN-BANACH EXTENSION THEOREMS AND EXISTENCE OF LINEAR FUNCTIONALS CHAPTER II THE HAHN-BANACH EXTENSION THEOREMS AND EXISTENCE OF LINEAR FUNCTIONALS In this chapter we deal with the problem of extending a linear functional on a subspace Y to a linear functional on the

More information

{σ x >t}p x. (σ x >t)=e at.

{σ x >t}p x. (σ x >t)=e at. 3.11. EXERCISES 121 3.11 Exercises Exercise 3.1 Consider the Ornstein Uhlenbeck process in example 3.1.7(B). Show that the defined process is a Markov process which converges in distribution to an N(0,σ

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

Lecture 7. µ(x)f(x). When µ is a probability measure, we say µ is a stationary distribution.

Lecture 7. µ(x)f(x). When µ is a probability measure, we say µ is a stationary distribution. Lecture 7 1 Stationary measures of a Markov chain We now study the long time behavior of a Markov Chain: in particular, the existence and uniqueness of stationary measures, and the convergence of the distribution

More information

INVERSE FUNCTION THEOREM and SURFACES IN R n

INVERSE FUNCTION THEOREM and SURFACES IN R n INVERSE FUNCTION THEOREM and SURFACES IN R n Let f C k (U; R n ), with U R n open. Assume df(a) GL(R n ), where a U. The Inverse Function Theorem says there is an open neighborhood V U of a in R n so that

More information

JUHA KINNUNEN. Harmonic Analysis

JUHA KINNUNEN. Harmonic Analysis JUHA KINNUNEN Harmonic Analysis Department of Mathematics and Systems Analysis, Aalto University 27 Contents Calderón-Zygmund decomposition. Dyadic subcubes of a cube.........................2 Dyadic cubes

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

Functional Analysis I

Functional Analysis I Functional Analysis I Course Notes by Stefan Richter Transcribed and Annotated by Gregory Zitelli Polar Decomposition Definition. An operator W B(H) is called a partial isometry if W x = X for all x (ker

More information

B. Appendix B. Topological vector spaces

B. Appendix B. Topological vector spaces B.1 B. Appendix B. Topological vector spaces B.1. Fréchet spaces. In this appendix we go through the definition of Fréchet spaces and their inductive limits, such as they are used for definitions of function

More information

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM c 2007-2016 by Armand M. Makowski 1 ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM 1 The basic setting Throughout, p, q and k are positive integers. The setup With

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Lecture 12. F o s, (1.1) F t := s>t

Lecture 12. F o s, (1.1) F t := s>t Lecture 12 1 Brownian motion: the Markov property Let C := C(0, ), R) be the space of continuous functions mapping from 0, ) to R, in which a Brownian motion (B t ) t 0 almost surely takes its value. Let

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

THE INVERSE FUNCTION THEOREM

THE INVERSE FUNCTION THEOREM THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)

More information

Multivariate Differentiation 1

Multivariate Differentiation 1 John Nachbar Washington University February 23, 2017 1 Preliminaries. Multivariate Differentiation 1 I assume that you are already familiar with standard concepts and results from univariate calculus;

More information

Exercises from other sources REAL NUMBERS 2,...,

Exercises from other sources REAL NUMBERS 2,..., Exercises from other sources REAL NUMBERS 1. Find the supremum and infimum of the following sets: a) {1, b) c) 12, 13, 14, }, { 1 3, 4 9, 13 27, 40 } 81,, { 2, 2 + 2, 2 + 2 + } 2,..., d) {n N : n 2 < 10},

More information

HOPF S DECOMPOSITION AND RECURRENT SEMIGROUPS. Josef Teichmann

HOPF S DECOMPOSITION AND RECURRENT SEMIGROUPS. Josef Teichmann HOPF S DECOMPOSITION AND RECURRENT SEMIGROUPS Josef Teichmann Abstract. Some results of ergodic theory are generalized in the setting of Banach lattices, namely Hopf s maximal ergodic inequality and the

More information

Some Results on the Ergodicity of Adaptive MCMC Algorithms

Some Results on the Ergodicity of Adaptive MCMC Algorithms Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship

More information

Real Analysis Problems

Real Analysis Problems Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.

More information

Math 209B Homework 2

Math 209B Homework 2 Math 29B Homework 2 Edward Burkard Note: All vector spaces are over the field F = R or C 4.6. Two Compactness Theorems. 4. Point Set Topology Exercise 6 The product of countably many sequentally compact

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M.

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M. Lecture 10 1 Ergodic decomposition of invariant measures Let T : (Ω, F) (Ω, F) be measurable, and let M denote the space of T -invariant probability measures on (Ω, F). Then M is a convex set, although

More information

Selected Exercises on Expectations and Some Probability Inequalities

Selected Exercises on Expectations and Some Probability Inequalities Selected Exercises on Expectations and Some Probability Inequalities # If E(X 2 ) = and E X a > 0, then P( X λa) ( λ) 2 a 2 for 0 < λ

More information

Topological vectorspaces

Topological vectorspaces (July 25, 2011) Topological vectorspaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ Natural non-fréchet spaces Topological vector spaces Quotients and linear maps More topological

More information

Strong approximation for additive functionals of geometrically ergodic Markov chains

Strong approximation for additive functionals of geometrically ergodic Markov chains Strong approximation for additive functionals of geometrically ergodic Markov chains Florence Merlevède Joint work with E. Rio Université Paris-Est-Marne-La-Vallée (UPEM) Cincinnati Symposium on Probability

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 15. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3 Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,

More information

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011 LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic

More information

Chapter 2. Markov Chains. Introduction

Chapter 2. Markov Chains. Introduction Chapter 2 Markov Chains Introduction A Markov chain is a sequence of random variables {X n ; n = 0, 1, 2,...}, defined on some probability space (Ω, F, IP), taking its values in a set E which could be

More information

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Hilbert Spaces Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Vector Space. Vector space, ν, over the field of complex numbers,

More information

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Section 2.6 (cont.) Properties of Real Functions Here we first study properties of functions from R to R, making use of the additional structure

More information

1 Continuity Classes C m (Ω)

1 Continuity Classes C m (Ω) 0.1 Norms 0.1 Norms A norm on a linear space X is a function : X R with the properties: Positive Definite: x 0 x X (nonnegative) x = 0 x = 0 (strictly positive) λx = λ x x X, λ C(homogeneous) x + y x +

More information

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents MATH 3969 - MEASURE THEORY AND FOURIER ANALYSIS ANDREW TULLOCH Contents 1. Measure Theory 2 1.1. Properties of Measures 3 1.2. Constructing σ-algebras and measures 3 1.3. Properties of the Lebesgue measure

More information

Consistency of the maximum likelihood estimator for general hidden Markov models

Consistency of the maximum likelihood estimator for general hidden Markov models Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models

More information

Harmonic Functions and Brownian motion

Harmonic Functions and Brownian motion Harmonic Functions and Brownian motion Steven P. Lalley April 25, 211 1 Dynkin s Formula Denote by W t = (W 1 t, W 2 t,..., W d t ) a standard d dimensional Wiener process on (Ω, F, P ), and let F = (F

More information

Analysis Qualifying Exam

Analysis Qualifying Exam Analysis Qualifying Exam Spring 2017 Problem 1: Let f be differentiable on R. Suppose that there exists M > 0 such that f(k) M for each integer k, and f (x) M for all x R. Show that f is bounded, i.e.,

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

Hartogs Theorem: separate analyticity implies joint Paul Garrett garrett/

Hartogs Theorem: separate analyticity implies joint Paul Garrett  garrett/ (February 9, 25) Hartogs Theorem: separate analyticity implies joint Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ (The present proof of this old result roughly follows the proof

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

A regeneration proof of the central limit theorem for uniformly ergodic Markov chains

A regeneration proof of the central limit theorem for uniformly ergodic Markov chains A regeneration proof of the central limit theorem for uniformly ergodic Markov chains By AJAY JASRA Department of Mathematics, Imperial College London, SW7 2AZ, London, UK and CHAO YANG Department of Mathematics,

More information

ON MARKOV AND KOLMOGOROV MATRICES AND THEIR RELATIONSHIP WITH ANALYTIC OPERATORS. N. Katilova (Received February 2004)

ON MARKOV AND KOLMOGOROV MATRICES AND THEIR RELATIONSHIP WITH ANALYTIC OPERATORS. N. Katilova (Received February 2004) NEW ZEALAND JOURNAL OF MATHEMATICS Volume 34 (2005), 43 60 ON MARKOV AND KOLMOGOROV MATRICES AND THEIR RELATIONSHIP WITH ANALYTIC OPERATORS N. Katilova (Received February 2004) Abstract. In this article,

More information

Part II Probability and Measure

Part II Probability and Measure Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Segment Description of Turbulence

Segment Description of Turbulence Dynamics of PDE, Vol.4, No.3, 283-291, 2007 Segment Description of Turbulence Y. Charles Li Communicated by Y. Charles Li, received August 25, 2007. Abstract. We propose a segment description for turbulent

More information

SUPPLEMENT TO PAPER CONVERGENCE OF ADAPTIVE AND INTERACTING MARKOV CHAIN MONTE CARLO ALGORITHMS

SUPPLEMENT TO PAPER CONVERGENCE OF ADAPTIVE AND INTERACTING MARKOV CHAIN MONTE CARLO ALGORITHMS Submitted to the Annals of Statistics SUPPLEMENT TO PAPER CONERGENCE OF ADAPTIE AND INTERACTING MARKO CHAIN MONTE CARLO ALGORITHMS By G Fort,, E Moulines and P Priouret LTCI, CNRS - TELECOM ParisTech,

More information

Tools from Lebesgue integration

Tools from Lebesgue integration Tools from Lebesgue integration E.P. van den Ban Fall 2005 Introduction In these notes we describe some of the basic tools from the theory of Lebesgue integration. Definitions and results will be given

More information

CHAPTER I THE RIESZ REPRESENTATION THEOREM

CHAPTER I THE RIESZ REPRESENTATION THEOREM CHAPTER I THE RIESZ REPRESENTATION THEOREM We begin our study by identifying certain special kinds of linear functionals on certain special vector spaces of functions. We describe these linear functionals

More information

Gaussian Random Field: simulation and quantification of the error

Gaussian Random Field: simulation and quantification of the error Gaussian Random Field: simulation and quantification of the error EMSE Workshop 6 November 2017 3 2 1 0-1 -2-3 80 60 40 1 Continuity Separability Proving continuity without separability 2 The stationary

More information

Combinatorics in Banach space theory Lecture 12

Combinatorics in Banach space theory Lecture 12 Combinatorics in Banach space theory Lecture The next lemma considerably strengthens the assertion of Lemma.6(b). Lemma.9. For every Banach space X and any n N, either all the numbers n b n (X), c n (X)

More information

16 1 Basic Facts from Functional Analysis and Banach Lattices

16 1 Basic Facts from Functional Analysis and Banach Lattices 16 1 Basic Facts from Functional Analysis and Banach Lattices 1.2.3 Banach Steinhaus Theorem Another fundamental theorem of functional analysis is the Banach Steinhaus theorem, or the Uniform Boundedness

More information

Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms

Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Yan Bai Feb 2009; Revised Nov 2009 Abstract In the paper, we mainly study ergodicity of adaptive MCMC algorithms. Assume that

More information

Probability and Measure

Probability and Measure Probability and Measure Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Convergence of Random Variables 1. Convergence Concepts 1.1. Convergence of Real

More information

MATH 722, COMPLEX ANALYSIS, SPRING 2009 PART 5

MATH 722, COMPLEX ANALYSIS, SPRING 2009 PART 5 MATH 722, COMPLEX ANALYSIS, SPRING 2009 PART 5.. The Arzela-Ascoli Theorem.. The Riemann mapping theorem Let X be a metric space, and let F be a family of continuous complex-valued functions on X. We have

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704

More information

Mathematical Analysis Outline. William G. Faris

Mathematical Analysis Outline. William G. Faris Mathematical Analysis Outline William G. Faris January 8, 2007 2 Chapter 1 Metric spaces and continuous maps 1.1 Metric spaces A metric space is a set X together with a real distance function (x, x ) d(x,

More information