The first divisible sum

A.D. Barbour and R. Grübel

Universität Zürich and Technische Universiteit Delft (now: Universität-Gesamthochschule-Paderborn)

Abstract. We consider the distribution of the first sum of a sequence of i.i.d. random variables which is divisible by $d$. Divided by $d$, this distribution is known to converge to a geometric distribution as $d \to \infty$. We obtain results on the rate of convergence, using two contrasting approaches. In the first, Stein's method is adapted to geometric limit distributions. The second method is based on the theory of Banach algebras. Each method is shown to have its merits.

Keywords: discrete renewal theory, Stein's method, Banach algebras.

AMS Classification: 60K05, 60F05

Postal addresses: Institut für Angewandte Mathematik, Rämistrasse 74, CH-8001 Zürich, and Department of Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, NL-2628 CD Delft (now: Fachbereich 17 (Mathematik-Informatik), Universität-Gesamthochschule-Paderborn, 33095 Paderborn, Germany).

1. Introduction.

[Note: There is a mistake in the very first sentence that had an effect similar to that of a missing semicolon in certain computer programs – it fouled the whole thing up. For me, $\mathbb{N}_0$ (the N with the naught) has always been $\{0, 1, 2, \dots\}$, but of course it should then be $\mathbb{N}$ ($= \{1, 2, \dots\}$) here. I suggest that we insert something like: "We will write $\mathbb{N}$ for the set $\{1, 2, 3, \dots\}$ of strictly positive integers and $\mathbb{Z}_+$ for the set $\{0, 1, 2, \dots\}$ of non-negative integers." Many of the referee's comments will become vacuous once $\mathbb{N}$ and $\mathbb{N}_0$ are interpreted as above.]

Let $(X_i : i \in \mathbb{N})$ be a sequence of independent and identically distributed random variables with values in $\mathbb{N}$, and let $p_n = P(X_1 = n)$. We assume that $\mu = EX_1$ is finite and that the sequence $p = (p_n : n \in \mathbb{N})$ is aperiodic, i.e. $\gcd\{n \in \mathbb{N} : p_n > 0\} = 1$. We are interested in the distribution of $Y_d$, the (normalized) first sum that is divisible by $d$:
$$Y_d = \frac{1}{d}\, S_{\tau_d}, \qquad \tau_d = \inf\{n \in \mathbb{N} : d \mid S_n\}, \qquad S_n = \sum_{i=1}^n X_i.$$

[Note: Maybe we should indeed add something about exceptional sets where $Y_d$ is not defined. I suggest something like: "Here we take the infimum of an empty set to be infinite; on $\{\tau_d = \infty\}$ we set $Y_d = 0$. Under our assumptions $\tau_d$ will be finite with probability 1, which means that this is irrelevant for our results, as these refer to distributions." This might also be the right place to add some motivation, e.g. "The variable $dY_d$ arises as the time between observed renewals in a periodically inspected self-renewing aggregate."]

In Grübel (1985) it was shown that $Y_d$ converges in distribution to $\mathrm{Ge}(1/\mu)$. Here $\mathrm{Ge}(\theta)$ (or $\mathrm{Ge}_\theta$) denotes the geometric distribution on $\mathbb{N}$ with parameter $\theta$, i.e. $Y \sim \mathrm{Ge}(\theta)$ means $P(Y = n) = (1-\theta)^{n-1}\theta$ for all $n \in \mathbb{N}$. In the present note we investigate the associated rate of convergence and establish some related upper bounds. We use two rather different techniques: Stein's method (used here for the first time in connection with the geometric distribution) in Section 2, and an analytic approach in Section 3. The latter is based on results from the theory of commutative Banach algebras; see Gelfand, Raikov and Shilov (1964). A general reference for Stein's method is Stein (1986).
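As a quick numerical illustration of the limit result (an addition for this edit, not part of the original paper), the following Python sketch simulates $Y_d$ and estimates its total variation distance to $\mathrm{Ge}(1/\mu)$; the distribution of $X_1$, the seed and the sample size are all arbitrary choices made for the example.

```python
import random

def sample_Y_d(p, d, rng):
    """One draw of Y_d = S_{tau_d} / d, where tau_d is the first n with d | S_n
    and the X_i are i.i.d. with P(X_1 = i) = p[i]."""
    support, weights = list(range(1, len(p))), p[1:]
    s = 0
    while True:
        s += rng.choices(support, weights)[0]   # one renewal step X_i
        if s % d == 0:                          # first partial sum divisible by d
            return s // d

rng = random.Random(1)
p = [0.0, 0.2, 0.5, 0.3]                        # illustrative: P(X=1), P(X=2), P(X=3)
mu = sum(n * p[n] for n in range(len(p)))       # mu = E X_1 = 2.1
theta, d = 1.0 / mu, 50
samples = [sample_Y_d(p, d, rng) for _ in range(20000)]

# empirical TV distance to Ge(1/mu), computed over the observed range
kmax = max(samples)
emp = [samples.count(k) / len(samples) for k in range(1, kmax + 1)]
geo = [(1 - theta) ** (k - 1) * theta for k in range(1, kmax + 1)]
print("TV estimate:", 0.5 * sum(abs(e - g) for e, g in zip(emp, geo)))
```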

2. Stein's method.

Let $X_0$ be an additional random variable, defined on the same probability space as, and independent of, the $X_i$'s. The process $(\xi_n : n \in \mathbb{Z}_+)$ of forward recurrence times associated with the delayed renewal process $(X_0 + S_n : n \in \mathbb{Z}_+)$ is then defined by
$$\xi_n = \min\{X_0 + S_i : i \in \mathbb{Z}_+,\; X_0 + S_i \ge n\} - n.$$
This process is a Markov chain with state space $\mathbb{Z}_+$, and $Y_d$ defined as $d^{-1}\inf\{n \in d\mathbb{N} : \xi_n = 0\}$ is the same random variable as in Section 1 when $X_0 = 0$ a.s. [Note: I would change this sentence into e.g.: "This process is a Markov chain with state space $\mathbb{Z}_+$. Let $Y_d = d^{-1}\inf\{n \in d\mathbb{N} : \xi_n = 0\}$; this generalizes the setup introduced in the introduction, where $X_0 = 0$."]

If instead $X_0 \sim \pi$, where $\pi$ is the probability distribution on $\mathbb{Z}_+$ defined by $\pi_k = P(X_1 > k)/\mu$, then $(\xi_n)$ is stationary, and all $\xi_n$'s have distribution $\pi$. We will write $P_\pi, E_\pi$ for the corresponding probability and expectation respectively, and $P_i, E_i$ for probability and expectation conditional on $X_0 = i$. [Note: Maybe the last sentence should have a John the Baptist type introduction: "Following the custom in Markov chain theory, ...".]

We are interested in the distribution of $Y_d$ under $P = P_0$ or, more precisely, in the total variation distance between this distribution and the geometric distribution with mean $\mu$,
$$\|\mathcal{L}_0(Y_d) - \mathrm{Ge}(1/\mu)\|_{TV} = \sup_{A \subset \mathbb{N}} \bigl| P_0(Y_d \in A) - \mathrm{Ge}_{1/\mu}(A) \bigr|.$$

Our upper bound will be a multiple of the tail $P(\tau_{\pi,0} > d)$ of the coupling time of any pair of $\xi$-processes, one of which is stationary and the other starting with $X_0 = 0$; the coupling could be chosen to make $P(\tau_{\pi,0} > d) = \|P_0^{\xi_d} - \pi\|_{TV}$, or the processes could run independently until coupling. [Note: Shall we change this? I offer: "Our upper bound will be a multiple of the tail of the coupling time of any pair of $\xi$-processes, one of which is stationary and the other starting with $X_0 = 0$. By this we mean the following: if $(\Omega, \mathcal{F}, P)$ is a probability space, if $\xi^{(1)}$ and $\xi^{(2)}$ are two processes on this space such that the distribution of $\xi^{(1)}$ under $P$ is the same as the distribution of $\xi$ under $P_\pi$, and the distribution of $\xi^{(2)}$ equals the distribution of $\xi$ under $P_0$, and if further $\tau$ denotes the first index $k$ with $\xi^{(1)}_k = \xi^{(2)}_k$ ($\tau = \infty$ if no coupling takes place), then our upper bound will involve the tail probabilities of $\tau$ under $P$. To make the dependence on the two different initial distributions of $\xi^{(1)}$ and $\xi^{(2)}$ clear, we will write $P(\tau_{\pi,0} > n)$ for these probabilities. There are, of course, many such constructions. It is known that $(\Omega, \mathcal{F}, P)$, $\xi^{(1)}$ and $\xi^{(2)}$ can be chosen such that $P(\tau_{\pi,0} > d) = \|P_0^{\xi_d} - \pi\|_{TV}$ (see e.g. ?? and ??). Our upper bound does not depend on any specific construction; we could use, for example, an independent coupling where $P$ is the product of $P_\pi$ and $P_0$."]

Since
$$\|\mathcal{L}_\pi(Y_d) - \mathcal{L}_0(Y_d)\|_{TV} \le P(\tau_{\pi,0} > d), \tag{1}$$
it is enough, if only the order of approximation is of interest, to consider the stationary process. Due to stationarity, the process $(\xi_{d+n} : n \in \mathbb{Z}_+)$ has the same distribution as the original $(\xi_n : n \in \mathbb{Z}_+)$. Let $Y_d'$ be the normalized first divisible sum for the shifted process; $Y_d$ and $Y_d'$ have the same conditional distributions, in the sense that, for all $k$, $\mathcal{L}(Y_d \mid \xi_0 = k) = \mathcal{L}(Y_d' \mid \xi_d = k)$. On $\{Y_d > 1\}$ we have $Y_d' = Y_d - 1$, and a decomposition with respect to the value of $\xi_d$ gives the first of the following equalities, for $f$ any bounded function on $\mathbb{N}$:
$$E_\pi f(Y_d) = \pi_0 f(1) + \sum_{j \ge 1} \pi_j E_j f(Y_d + 1) = \pi_0 f(1) + (1 - \pi_0) E_\pi f(Y_d + 1) + \pi_0 \bigl( E_\pi f(Y_d + 1) - E_0 f(Y_d + 1) \bigr),$$
so that
$$E_\pi \bigl\{ (1 - \pi_0) f(Y_d + 1) + \pi_0 f(1) - f(Y_d) \bigr\} = \pi_0 \bigl\{ E_0 f(Y_d + 1) - E_\pi f(Y_d + 1) \bigr\}.$$
From (1) it follows that
$$\bigl| E_\pi f(Y_d + 1) - E_0 f(Y_d + 1) \bigr| \le P(\tau_{\pi,0} > d) \sup_{j,k \in \mathbb{N}} |f(j) - f(k)|.$$
Hence, if we define an operator $A$ acting on functions $f : \mathbb{N} \to \mathbb{R}$ by
$$(Af)(j) = (1 - \theta) f(j+1) + \theta f(1) - f(j) \quad \text{for all } j \in \mathbb{N},$$
with $\theta = \pi_0 = 1/\mu$, then the above yields
$$\bigl| E_\pi Af(Y_d) \bigr| \le \frac{1}{\mu}\, P(\tau_{\pi,0} > d) \sup_{j,k \in \mathbb{N}} |f(j) - f(k)|.$$

[Note: What is the referee complaining about? Maybe we should replace "the above yields" by "this upper bound, together with the previous equality, gives". Alternatively, we could number the formulas. Maybe I am overlooking something.]

This bound will be very useful if, for each $A \subset \mathbb{N}$, we can find a bounded function $f = f_A$ such that
$$(1 - \theta) f(j+1) + \theta f(1) - f(j) = 1_A(j) - \mathrm{Ge}_\theta(A), \tag{2}$$
since with this $f$ we have $E_\pi Af(Y_d) = P_\pi(Y_d \in A) - \mathrm{Ge}_\theta(A)$. Equation (2) can be considered as a Stein equation for geometric limit distributions; a heuristic explanation is given at the end of the section. Let $h_A(j) = 1_A(j) - \mathrm{Ge}_\theta(A)$. It is easy to check that $Af = h$ is solved by
$$f(j) = f(1) + (1 - \theta)^{-j} \sum_{i=1}^{j-1} (1 - \theta)^i h(i) \quad \text{for all } j > 1,$$
with any choice of $f(1)$. For $h_A$ as above it holds that $\sum_{i=1}^\infty (1 - \theta)^i h_A(i) = 0$, so that the corresponding solution can equally be written as
$$f_A(j) = f_A(1) - \sum_{i=0}^\infty (1 - \theta)^i h_A(j + i),$$
from which the bound $\sup_{j,k \in \mathbb{N}} |f_A(j) - f_A(k)| \le 1/\theta = \mu$ for all $A \subset \mathbb{N}$ is immediate, so that
$$\|\mathcal{L}_\pi(Y_d) - \mathrm{Ge}(\theta)\|_{TV} \le P(\tau_{\pi,0} > d).$$
Combining the above considerations gives us the following result.

Theorem 1. With $\tau_{\pi,0}$ any coupling time as above, we have
$$\|\mathcal{L}_0(Y_d) - \mathrm{Ge}(1/\mu)\|_{TV} \le 2\, P(\tau_{\pi,0} > d).$$

The literature contains numerous results on the tails of coupling times. Pitman (1974) shows that $EX_1 < \infty$ implies that $P(\tau_{\pi,0} < \infty) = 1$, and Lindvall (1979) has results showing that $EX_1^{1+\gamma} < \infty$ for some $\gamma > 0$ implies $P(\tau_{\pi,0} > d) = o(d^{-\gamma})$ for the independent coupling. Combining this with the above theorem we obtain the following corollary.

Corollary 2. Let $\gamma \ge 0$. If $EX_1^{1+\gamma} < \infty$ then $\|\mathcal{L}_0(Y_d) - \mathrm{Ge}(1/\mu)\|_{TV} = o(d^{-\gamma})$.
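The explicit solution of the Stein equation is easy to check numerically. The following sketch (an illustration added here, not from the paper) implements $f_A$ with the choice $f_A(1) = 0$ via the truncated series $f_A(j) = -\sum_{i \ge 0} (1-\theta)^i h_A(j+i)$ and verifies that it satisfies equation (2); the set $A$ and the value of $\theta$ are arbitrary choices.

```python
theta = 1.0 / 2.1                       # theta = 1/mu for the running example

def geo_mass(A, theta):
    """Ge_theta(A) for a finite set A of positive integers."""
    return sum((1 - theta) ** (k - 1) * theta for k in A)

def h_A(j, A, theta):
    return (1.0 if j in A else 0.0) - geo_mass(A, theta)

def f_A(j, A, theta, terms=3000):
    """Solution of the Stein equation (2) with f_A(1) = 0, series truncated."""
    return -sum((1 - theta) ** i * h_A(j + i, A, theta) for i in range(terms))

A = {2, 5, 7}
for j in (1, 2, 5, 9):
    lhs = (1 - theta) * f_A(j + 1, A, theta) + theta * f_A(1, A, theta) - f_A(j, A, theta)
    print(j, round(lhs - h_A(j, A, theta), 12))   # ~0: f_A solves (2)
```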

The following is a heuristic explanation for the success of $A$ in the present context: we can think of a sequence of successive $Y_d$'s being embedded in the renewal process $(S_n : n \in \mathbb{N})$. The value of the current $Y_d$ either increases by one if a multiple of $d$ is passed, or a hit occurs, in which case a new run for the next $Y_d$ begins. This structure resembles that of a birth–catastrophe process, by which we mean a Markov chain on $\mathbb{N}$ with transition probabilities $p_{i,i+1} = 1 - \theta$, $p_{i,1} = \theta$ for all $i \in \mathbb{N}$. The stationary distribution of such a process is $\mathrm{Ge}_\theta$; further, it is easy to check that $A$ is the associated generator. The latter fact can also be used to solve (2); see Barbour (1988) for more on the connection between Stein equations and generators of Markov processes.

3. An analytic approach.

The quantity of interest is the distribution of $Y_d$, and this distribution depends solely on the distribution of $X_1$. Identifying distributions on $\mathbb{Z}_+$ with the associated sequences of point masses leads us to regard this dependence as a mapping from and to $\ell_1$, the space of all summable sequences with index set $\mathbb{Z}_+$. We will analyze the functional $\ell_1 \ni p \mapsto p^{(d)} \in \ell_1$, where $p^{(d)}_k = P(Y_d = k)$, by decomposing it into several simpler functionals. A key idea is the relation to renewal sequences, which has also been used in Grübel (1985). We assume that $EX_1^{1+\gamma} < \infty$ for some fixed $\gamma \ge 0$.

The renewal sequence $(u_n : n \in \mathbb{Z}_+)$ associated with a distribution $p$ can be defined recursively by $u_0 = 1$, $u_n = \sum_{i=1}^n p_i u_{n-i}$; $u_n$ is the probability that one of the partial sums of the $X$-sequence lands on $n$. For a geometric distribution with parameter $\theta$, symbolized by the sequence $p^{[\theta]}$, the associated renewal sequence $u^{[\theta]}$ has $u^{[\theta]}_n = \theta$ for all $n \in \mathbb{N}$. Renewal sequences are useful in the present context because of the following: $u^{(d)}$, the renewal sequence associated with $p^{(d)}$, is related to $u$ by $u^{(d)}_n = u_{dn}$. Hence a first decomposition of the functional consists of three steps: pass to the renewal sequence of the distribution of $X_1$, take every $d$-th element, then find the distribution associated with this renewal sequence.

For this to work we need a thorough understanding of the relationship between distributions and the associated renewal sequences. Here we can build upon a sizeable literature, beginning with one of the original proofs of the discrete renewal theorem given by Erdős, Feller and Pollard (1949). Again, a decomposition into simpler steps is crucial: let $\Sigma p$ and $\Delta u$ be the sequences given by $(\Sigma p)_n = \sum_{k > n} p_k$ for all $n \in \mathbb{Z}_+$, and $(\Delta u)_0 = u_0 = 1$, $(\Delta u)_n = u_n - u_{n-1}$ for all $n \in \mathbb{N}$, respectively. For a finite-mean, aperiodic $p$ these sequences are both summable and are convolution inverses of each other, i.e. $\Sigma p * \Delta u = \delta$, where $\delta_0 = 1$, $\delta_n = 0$ for all $n \in \mathbb{N}$. [Note: Now I suggest: replace the full stop by a semicolon and continue with "aperiodicity, for example, is used to show that the value 0 is not in the range of the Fourier transform of $\Sigma p$; see Erdős et al. (1949) for details."]
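Since the recursion for $(u_n)$ and the relation $u^{(d)}_n = u_{dn}$ are completely explicit, they are easy to compute. The sketch below (illustrative, with the same assumed distribution as before) shows the renewal sequence of $p^{(d)}$ approaching the constant renewal sequence $u^{[\theta]}_n = \theta = 1/\mu$.

```python
def renewal_sequence(p, N):
    """u_0 = 1, u_n = sum_{i=1}^{n} p_i u_{n-i}: the probability that some
    partial sum of the X-sequence lands on n.  Here p[i] = P(X_1 = i), p[0] = 0."""
    u = [1.0] + [0.0] * N
    for n in range(1, N + 1):
        u[n] = sum(p[i] * u[n - i] for i in range(1, min(n, len(p) - 1) + 1))
    return u

p = [0.0, 0.2, 0.5, 0.3]                    # illustrative distribution, mu = 2.1
d, N = 10, 6
u = renewal_sequence(p, d * N)
u_d = [u[d * n] for n in range(1, N + 1)]   # u^{(d)}_n = u_{dn}, n >= 1
print([round(x, 4) for x in u_d])           # entries approach theta = 1/2.1 ~ 0.4762
```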

The diagram
$$p \longrightarrow \Sigma p \longrightarrow \Delta u \longrightarrow u \longrightarrow u^{(d)} \longrightarrow \Delta u^{(d)} \longrightarrow \Sigma p^{(d)} \longrightarrow p^{(d)}$$
summarizes all the necessary steps.

We now introduce the space $\ell_1(\gamma)$ of all sequences $(a_n : n \in \mathbb{Z}_+)$ with the property that $\|a\|_\gamma < \infty$, where $\|a\|_\gamma = \sum_{n=0}^\infty (1+n)^\gamma |a_n|$. Endowed with this norm, $\ell_1(\gamma)$ becomes a Banach algebra; in particular, the norm inequality $\|a * b\|_\gamma \le \|a\|_\gamma \|b\|_\gamma$ holds for all $a, b \in \ell_1(\gamma)$. Let $I$ be the set of invertible elements in $\ell_1$. A well-known result from the theory of Banach algebras (see e.g. Gelfand, Raikov and Shilov (1964), Section 19) states that, if $a \in I \cap \ell_1(\gamma)$, then $a^{(-1)} \in \ell_1(\gamma)$: taking the inverse does not lead out of $\ell_1(\gamma)$. Hence: $EX_1^{1+\gamma} < \infty$ means $\Sigma p \in \ell_1(\gamma)$, which implies $\Delta u \in \ell_1(\gamma)$.

Using the relationship between $u$ and $u^{(d)}$ we see that
$$\bigl(\Delta u^{(d)} - \Delta u^{[\theta]}\bigr)_n = \begin{cases} 0, & \text{if } n = 0, \\ u_d - \theta, & \text{if } n = 1, \\ \sum_{k=1}^d (\Delta u)_{(n-1)d+k}, & \text{if } n > 1, \end{cases}$$
which gives, with $\theta = 1/\mu = \lim_{n\to\infty} u_n$,
$$\begin{aligned}
\|\Delta u^{(d)} - \Delta u^{[\theta]}\|_\gamma
&\le 2^\gamma |u_d - \theta| + \sum_{n=2}^\infty (1+n)^\gamma \Bigl| \sum_{k=1}^d (\Delta u)_{(n-1)d+k} \Bigr| \\
&\le 2^\gamma \sum_{k>d} |(\Delta u)_k| + 2^\gamma d^{-\gamma} \sum_{n=2}^\infty \sum_{k=1}^d (1+nd)^\gamma \bigl|(\Delta u)_{(n-1)d+k}\bigr| \\
&\le 2^\gamma d^{-\gamma} \sum_{k>d} (1+k)^\gamma |(\Delta u)_k| + 2^\gamma d^{-\gamma} \Bigl( \sup_{k>d} \Bigl(\frac{1+k+d}{1+k}\Bigr)^\gamma \Bigr) \sum_{k>d} (1+k)^\gamma |(\Delta u)_k| \\
&= o(d^{-\gamma}),
\end{aligned}$$
since $\Delta u \in \ell_1(\gamma)$ implies that the tail sums $\sum_{k>d} (1+k)^\gamma |(\Delta u)_k|$ tend to zero. From the norm inequality and simple algebra we obtain
$$\|\Sigma p^{(d)} - \Sigma p^{[\theta]}\|_\gamma = \bigl\| \Sigma p^{(d)} * \Sigma p^{[\theta]} * \bigl(\Delta u^{(d)} - \Delta u^{[\theta]}\bigr) \bigr\|_\gamma \le \|\Sigma p^{(d)}\|_\gamma \, \|\Sigma p^{[\theta]}\|_\gamma \, \|\Delta u^{(d)} - \Delta u^{[\theta]}\|_\gamma.$$
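The $o(d^{-\gamma})$ behaviour can be seen numerically by evaluating $d^\gamma \|\Delta u^{(d)} - \Delta u^{[\theta]}\|_\gamma$ directly. A brief sketch, reusing renewal_sequence and $p$ from the example above (for this finite-support distribution the decay is in fact much faster than any power of $d$):

```python
def gamma_norm(a, gamma):
    """||a||_gamma = sum_n (1+n)^gamma |a_n|, the weighted l_1(gamma) norm."""
    return sum((1 + n) ** gamma * abs(x) for n, x in enumerate(a))

theta, gamma, N = 1 / 2.1, 1.0, 40
for d in (5, 10, 20, 40):
    u = renewal_sequence(p, d * N)
    u_d = [u[d * n] for n in range(N + 1)]                  # u^{(d)}, with u^{(d)}_0 = 1
    Du_d = [u_d[0]] + [u_d[n] - u_d[n - 1] for n in range(1, N + 1)]
    Du_g = [1.0, theta - 1.0] + [0.0] * (N - 1)             # Delta u^{[theta]}
    diff = [a - b for a, b in zip(Du_d, Du_g)]
    print(d, d ** gamma * gamma_norm(diff, gamma))          # -> 0, i.e. o(d^{-gamma})
```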

The middle factor does not depend on $d$, and the last factor is $o(d^{-\gamma})$. To bound the first factor we apply another piece of analysis: the set $I \cap \ell_1(\gamma)$ is $\|\cdot\|_\gamma$-open in $\ell_1(\gamma)$, and $a \mapsto a^{(-1)}$ is continuous on this set (Gelfand, Raikov and Shilov (1964), Section 2). We know that $\Delta u^{(d)}$ converges to $\Delta u^{[\theta]}$, which is in $I \cap \ell_1(\gamma)$, with respect to $\|\cdot\|_\gamma$ (we even have a rate); hence, for any given $\epsilon > 0$ we can find a $d(\epsilon)$ such that $\|(\Delta u^{(d)})^{(-1)} - (\Delta u^{[\theta]})^{(-1)}\|_\gamma \le \epsilon$ for all $d \ge d(\epsilon)$. This implies $\sup_{d \in \mathbb{N}} \|\Sigma p^{(d)}\|_\gamma < \infty$, so that we have proved the following theorem.

Theorem 3. If, for some $\gamma \ge 0$, $EX_1^{1+\gamma} < \infty$, then
$$\sum_{k=1}^\infty k^\gamma \Bigl| P(Y_d \ge k) - \Bigl(1 - \frac{1}{\mu}\Bigr)^{k-1} \Bigr| = o(d^{-\gamma}).$$

It is easy to see that this implies $o(d^{-\gamma})$-convergence of the total variation distance between the distribution of $Y_d$ and the geometric distribution with mean $\mu$; hence we reobtain Corollary 2. In fact, the theorem amplifies the total variation result in a manner discussed in Chapter 2.4 of Barbour, Holst and Janson (1992). It is also interesting to note that the weights $k^\gamma$ are in some sense optimal: as $dY_d \ge X_1$, the left-hand side of the equality in the theorem would become infinite for any $\gamma' > \gamma$ unless $EX_1^{1+\gamma'} < \infty$.

4. Synthesis.

For mathematicians, the meaning of the word "analytic" is twofold, and it may be a coincidence that both meanings can be attached to the approach of the previous section. We should perhaps point out that the dissection into simple parts, such as given in the diagram, also points the way towards an algorithm for obtaining the distribution of $Y_d$ for a given distribution of $X_1$ (a direct version is sketched below); see Grübel (1988) for the computation of renewal sequences and related problems.

A somewhat typical aspect of the approach in Section 3 is the fact that the switch to analysis, away from the stochastic model, occurs at a very early stage. From a probabilistic point of view, Stein's method is more appealing, as more use is made of the stochastic aspects of the problem. This is perhaps more than a matter of taste, especially when it comes to generalizing to non-independent models, where Stein's method often has no direct competitors. For instance, if the $X_i$ arise as the intervals between the points of a stationary, mixing point process $\xi$, the inequality in Theorem 1 can be proved in much the same way: $P_0$ is now the Palm distribution, and of the $\xi$-processes used in the definition of $\tau_{\pi,0}$, one has the Palm distribution and the other the stationary distribution. On the other hand, for independent $X_i$'s, the analytic approach gives a somewhat sharper result.
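The following sketch assembles such an algorithm directly from the diagram of Section 3. It is an illustration added for this edit, under the same assumed distribution of $X_1$ as before, and is a naive $O(N^2 d^2)$ implementation rather than the FFT-based approach of Grübel (1988).

```python
import numpy as np

def conv_inverse(a, N):
    """First N+1 coefficients of the convolution inverse of a (needs a[0] != 0)."""
    b = np.zeros(N + 1)
    b[0] = 1.0 / a[0]
    for n in range(1, N + 1):
        s = sum(a[k] * b[n - k] for k in range(1, n + 1) if k < len(a))
        b[n] = -s / a[0]
    return b

def first_divisible_law(p, d, N):
    """P(Y_d = k), k = 1..N, following the diagram
    p -> Sigma p -> Delta u -> u -> u^{(d)} -> Delta u^{(d)} -> Sigma p^{(d)} -> p^{(d)}."""
    M = d * N
    Sp = np.array([float(sum(p[n + 1:])) for n in range(M + 1)])  # (Sigma p)_n = P(X_1 > n)
    Du = conv_inverse(Sp, M)               # Delta u = (Sigma p)^{(-1)}
    u = np.cumsum(Du)                      # u_n = sum_{m <= n} (Delta u)_m
    Du_d = np.diff(u[::d], prepend=0.0)    # Delta u^{(d)}; first entry is u_0 = 1
    Sp_d = conv_inverse(Du_d, N)           # Sigma p^{(d)} = (Delta u^{(d)})^{(-1)}
    return [Sp_d[k - 1] - Sp_d[k] for k in range(1, N + 1)]

p = [0.0, 0.2, 0.5, 0.3]                   # illustrative P(X_1 = n); mu = 2.1
law = first_divisible_law(p, d=25, N=30)
theta = 1 / 2.1
err = max(abs(law[k - 1] - (1 - theta) ** (k - 1) * theta) for k in range(1, 31))
print("max pointwise distance to Ge(1/mu):", err)   # small already for moderate d
```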

The contrast between the two approaches is not confined to this example. In proving bounds for the approximation in the central limit theorem, one has a similar choice between the operator technique of Trotter (1959) and Fourier methods on the one hand, and methods such as Stein's (1970) on the other. In the Poisson context, one has the operator technique of Deheuvels and Pfeifer (1986) and the complex-analytic method of Uspensky (1931) as opposed to the Stein–Chen method (Chen, 1975). In these cases, too, Stein's method adapts more easily to dependent settings than the analytic methods.

References

Barbour, A.D. (1988) Stein's method and Poisson process convergence. Journal of Applied Probability 25A, 175–184.

Barbour, A.D., Holst, L. and Janson, S. (1992) Poisson Approximation. Clarendon Press, Oxford.

Chen, L.H.Y. (1975) Poisson approximation for dependent trials. Annals of Probability 3, 534–545.

Deheuvels, P. and Pfeifer, D. (1986) A semigroup approach to Poisson approximation. Annals of Probability 14, 665–678.

Erdős, P., Feller, W. and Pollard, H. (1949) A property of power series with positive coefficients. Bulletin of the American Mathematical Society 55, 201–204.

Gelfand, I.M., Raikov, D.A. and Shilov, G.E. (1964) Commutative Normed Rings. Chelsea, New York. (Russian original: Moscow, 1960.)

Grübel, R. (1985) An application of the renewal theoretic selection principle: the first divisible sum. Metrika 32, 327–337.

Grübel, R. (1988) The fast Fourier transform in applied probability theory. Nieuw Archief voor Wiskunde (4) 7, 289–300.

Lindvall, T. (1979) On coupling of discrete renewal processes. Z. Wahrscheinlichkeitstheorie verw. Gebiete 48, 57–70.

Pitman, J.W. (1974) Uniform rates of convergence for Markov chain transition probabilities. Z. Wahrscheinlichkeitstheorie verw. Gebiete 29, 193–227.

Stein, C. (1970) A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability 2, 583–602.

Stein, C. (1986) Approximate Computation of Expectations. IMS Lecture Notes – Monograph Series, Vol. 7, IMS, Hayward, California.

Trotter, H.F. (1959) An elementary proof of the central limit theorem. Archiv der Mathematik 9, 226–234.

Uspensky, J.V. (1931) On Ch. Jordan's series for probability. Annals of Mathematics 32, 306–312.