A strong law of large nubers for artingale arrays Yves F. Atchadé arxiv:0905.2761v1 [ath.pr] 17 May 2009 March 2009 Abstract: We prove a artingale triangular array generalization of the Chow-Birnbau- Marshall s inequality. The result is used to derive a strong law of large nubers for artingale triangular arrays whose rows are asyptotically stable in a certain sense. To illustrate, we derive a siple proof, based on artingale arguents, of the consistency of kernel regression with dependent data. Another application can be found in [1] where the new inequality is used to prove a strong law of large nubers for adaptive Markov Chain Monte Carlo ethods. AMS 2000 subject classifications: Priary 60J27, 60J35, 65C40. Keywords and phrases: Martingales and Martingale arrays, Strong law of large nubers, Kernel regression. 1. Strong law of large nubers for artingale arrays Let Ω, F, P be a probability space and E the expectation operator with respect to P. Let {D n,i, F n,i, 1 i n}, n 1 be a artingale-difference array. That is for eac 1, {F n,i, 1 i n} is a non-decreasing sequence of sub-siga-algebra of F, for any 1 i n, E D n,i < and E D n,i F n,i 1 = 0. We assue throughout the paper that F n,0 = {,Ω} for all n 0. We introduce the partial sus k M n,k := D n,i, 1 k n, n 1. i=1 For eac 1, {M n,k, F n,k, 1 k n} is a artingale. Let {c n, n 1} be a non-increasing sequence of positive nubers. We are interested in conditions under which c n M n,n converges alost surely to zero. Martingales and artingale arrays play an iportant role in Probability and Statistics as valuable tools for liit theory. Much is known on the liit theory of artingales see e.g. [4] but coparatively little work has been done on the law of large nubers for artingale arrays. Departent of Statistics, University of Michigan, eail: yvesa@uich.edu 1
/SLLN for artingales arrays 2 One of the ost effective approach to proving the strong law of large nubers for artingales is via the Kologorov s inequality for artingales obtained by Chow [3] and Birnbau-Marshall [2]. Theore 1.1 Chow-Birnbau-Marshall s inequality. Let {S k, F k, k 1} be a sub-artingale and {c k,k 1} a non-increasing real-valued sequence. For p 1 and n N P sup c S 1 n N c p N E[ S N p ] + N 1 c p c p +1 E[ S p ]. The following theore gives an extension to artingale arrays. We introduce the sequence k S n,k = M n,k = D n,i, 1 k n and n S n,k = D n,i + k D j,j, k > n. i=1 i=1 R n := n 1 j=1 D n,j D n 1,j. Theore 1.2. Let {D n,i, F n,i, 1 i n}, n 1 be a artingale-difference array and {c n, n 1} a non-increasing sequence of positive nubers. Assue that F n,i = F i for all i,n. For n N, p 1 and λ > 0 2 p λ p P ax c M, > λ n N c p N E S n,n p + N 1 j=n c p j cp j+1 E S n,j p + E c j R j p. 1 Proof. For n N, we have M, = M 1, 1 + D, + R leading to the decoposition M, = M n,n + D j,j + R j = S n, + We note that {S n,, F, 1 N} is a artingale. We also introduce Z = c p n S n,n p + It is easy to check that Z has the alternative for Z = c p S n, p + 1 j=n c p j S n,j p S n,j 1 p. c p j cp j+1 S n,j p. R j.
/SLLN for artingales arrays 3 Since { S n, p, F, 1 N} is a sub-artingale and {c k, k 1} is non-increasing, we have E Z F 1 Z 1, that is {Z, F, n N} is a sub-artingale. For n N, we introduce the sets A 1 := {c S n, > λ/2}, A 2 := {c R j > λ/2} and B := {c j M j,j λ,j = n,..., 1, c M, > λ}. We have: λ p P ax c M, > λ n N N N = E λ p N 1 B E λ p 1 B A 1 + E λ p 1 B A 2 N E 2 p c M, p 1 B A 1 + E 2 p c R j p 1 B A 2 N E 2 p Z 1 B A 1 + 2 p E c j R j p [ N ] E 2 p E Z N 1 B A 1 F + 2 p E c j R j p 2 p E Z N + c j R j p. In any situations, one deals with artingale arrays whose rows are asyptotically stable in the sense that the sequence E [ R n ] converges to zero as n increases to infinity. Theore 1.2 can be used to prove a strong law of large nubers for such artingale arrays. Corollary 1.1. Let {D n,i, F n,i, 1 i n}, n 1 be a artingale-difference array and {c n, n 1} a non-increasing sequence of positive nubers. Assue that F n,i = F i for all i,n. Suppose that there exists p 1 such that for any n 0 1 c p ne [ S n0,n p ] + li n k=n c p k cp k+1 E [ S n,k p ] = 0, and c n E 1/p [ R n p ] <. 2 n 1 Then c n M n,n converges alost surely to zero. Reark 1.1. With respect to the process {R n } in Theore 1.2, we point out that, because of the assuption F n,j = F j, the sequence { k j=1 D n,j D n 1,j, F k, 1 k n 1} is a also artingale. The conditions in Corollary 1.1 are expressed in ters of oents of artingales. These oents can be nicely bounded by oents of the artingale differences. We give one such
/SLLN for artingales arrays 4 bound in the next proposition. It is a consequence of the Burkholder s inequality [4], Theore 2.10 and soe classical convexity inequalities. We oit the details. Proposition 1.1. Let {D n,i, F n,i, 1 i n}, n 1 be a artingale-difference array. For any p > 1, where C = 18pq 1/2 p, p 1 + q 1 = 1. k E [ M n,k p ] Ck axp/2,1 1 E D n,j p, 3 j=1 2. Kernel regression with Markov chains As an application, we prove the strong consistency of the Nadaraya-Watson estiator for nonparaetric regression where the data arises fro a non-stationary Markov chain. The approach of the proof can be adapted to study other kernel ethods or other statistical soothing procedures with dependent data. We assue the following structure for the data. {X i,ǫ i, i 0} is a joint R 2 -valued Markov chain on soe probability space Ω, F, P such that P X n,ǫ n A B X k,ǫ k, k n 1 = A px n 1,zqz,Bdz, for transition probability densities p and q. p is the transition probability density of the arginal Markov chain {X i, i 0} and qx,a = p ε n A X n = x is the transition probability density of the error ter ε n. All densities are with respect to the Lebesgue easure denoted dx. We assue that p has an invariant distribution π that is πx = R πypy,xdy, x R and πx ǫqx, ǫdǫ dx = 0. 4 R R We consider the dependent variable Y i = rx i + ǫ i, i 0. We are interested in estiating the regression function r. Note that the error ters ǫ i are correlated and we do not assue that E ǫ i = 0 unless, as assued in 4, the Markov chain {X i, i 0} is in stationarity. For the reader s convenience, we will soeties use the notation EUY X = x to denote the integral R U rx + ǫqx,ǫdǫ, whenever such integral is well-defined. A popular
/SLLN for artingales arrays 5 nonparaetric estiator for r is the Nadaraya-Watson estiator ni=1 Y i K x0 X i ˆr n x 0 =, x 0 R. 5 ni=1 K x0 X i where K is the kernel a nonnegative function such that R Kxdx = 1 and > 0 the bandwidth. Let ψ : R R be a easurable function. We study the alost sure convergence of ˆr n,ψ x 0 = 1 n n x0 X i ψy i K h i=1 n as n. We can then deduce the convergence of the Nadaraya-Watson estiator by setting ψx = x for the nuerator and ψx = 1 for the denoinator. Let µ be the distribution of X 0, the initial distribution of the Markov chain. We write P for the Markov kernel induced by p which operates on nonnegative bounded easurable functions as Pfx = R px,yfydy. The iterates operators of P are defined as P 0 fx = fx and for n 1, P n fx = PP n 1 fx. We will assue that P is geoetrically ergodic. That is B1 P is φ-irreducible, aperiodic and there exist a function V : R [1,, λ 0,1, b 0, such that for soe sall set C. PV x λv x + b1 C x, This assuption is a well known stability assuption for Markov kernels extensively studied in [5]. One iportant consequence of B1 that we will use is the following. For any α 0,1], there exists Cα < such that for all n 0, sup P n fx fxπxdx Cαρn V α x, x R 6 f V α 1 R fx where f V α := sup x R V α x. A proof can be found [5], Chapter 15. We assue that µv := R V xµxdx <. By iterating the drift condition B1, it is easy to see that On the function ψ, we assue that sup x R sup E [V X n ] µv + b/1 λ <. 7 n 0 [ ] V 1/2 x1 + x E [ ψy X = x] <, and supv xe ψ 2 Y X = x x R <. 8,
/SLLN for artingales arrays 6 On the kernel K, we assue that supkx <, x R Kx Kx li x Kx = 0, and sup x x =x x x <. 9 On the sequence {, n 0}, we assue that: n β, with β 0,1/4. 10 Theore 2.1. Assue B1, 8-10 and that the function x πxe ψy X = x is continuous at x 0. Then li n ˆr n,ψ x 0 = πx 0 E ψy X = x 0 with P-probability one. Proof. Throughout the proof, x 0 R is fixed and C will denote a finite constant whose actual value ight differ fro one appearance to the next. Define F n := σx k,y k, k n}. For h > 0, define F h x,y = ψyk x 0 x h, fh x = K x 0 x h E ψy X = x, and g h x = l 0 P l f h x, where P l fx := P l fx R fxπxdx. By 8, the boundedness of K and the geoetric ergodicity assuption 6, g h is well-defined and satisfies g h V 1/2 C1 ρ 1. It is also wellknown that g h solves the Poisson equation for f h and P. In other words, we have g h x Pg h x = f h x, 11 where f h x = f h x R f hxπxdx. Siilarly, define H h x,y = F h x,y + Pg h x. It is left to the reader to check that E [H h X n,y n X n 1 = x,y n 1 = y] = Pf h x+p 2 g h x = Pg h x+ R f hxπxdx using 11. It follows that F h x,y πxf h xdx = H h x,y E [H h X n,y n X n 1 = x,y n 1 = y], x,y R. 12 Using 12, we can decopose ˆr n,ψ x 0 as ˆr n,ψ x 0 = 1 K R x0 x E [ψy X = x]πxdx + 1 n n D n,k k=1 + n 1 E [H hn X 1,Y 1 F 0 ] E [H hn X n+1,y n+1 F n ], where D n,k = H hn X k,y k E [H hn X k,y k F k 1 ].
/SLLN for artingales arrays 7 Under the stated assuptions, it is a standard result of kernel estiation that li n 1 R See e.g. [6] for a proof. x0 x K E [ψy X = x]πxdx = E ψy X = x 0 πx 0. We deduce fro the drift condition B1 and 8 that sup E [ H h X n,y n F n 1 ] CV 1/2 X n 1, n 1 for soe finite constant C that does not depend on h. Cobined with 7 we get for any δ > 0, k 1 P n 1 E [H hn X 1,Y 1 F 0 ] E [H hn X n+1,y n+1 F n ] > δ Cδ 2 n 21 β <. n 1 This easily iplies that the ter n 1 E [H hn X 1,Y 1 F 0 ] E [H hn X n+1,y n+1 F n ] converges alost surely to zero. Lastly, the process {D n,k, F k k n} is a artingale-difference array. Again by 8, the boundedness of K, the drift condition B1, E D n,k 2 [ E H hn X k,y k 2] CE V X k. [ Then using 7, we obtain that sup n 0 sup 0 k n E D n,k 2] <. This iplies, in the notations of Theore 1.2, that E [ S n, 2] C, for soe finite constant C that does not depend on n nor. Moreover, we can write D n,j D n 1,j = H hn X j,y j H hn 1 X j,y j E H hn X j,y j H hn 1 X j,y j F j 1, and we note that Hhn H hn 1 x,y = ψy K x0 x + l 1 By the Lipschitz condition on K and 8, K x0 x 1 K E [ψy X = x] x0 x x0 x K C 1 Therefore H hn H hn 1 x,y C h 1 x ψy + V 1/2 x fro which we deduce using 8 and 7 that E P K l x0 x x0 x K E [ψy X = x]. n 1 h 1 n h 1 1 n 1 h 1 n V 1/2 x. D n,j D n 1,j 2 = On 2 1+β uniforly in j which iplies as in Proposition 1.1 that E 1/2 n 1 j=1 D n,j D n 1,j 2 = On 1/2+β which together with E [ S n, 2] C proves 2, since β < 1/4. We can therefore conclude that n 1 n k=1 D n,k 0, P-alost surely, which ends the proof.
/SLLN for artingales arrays 8 References [1] Atchade, Y. F. and Fort, G. To appear. Liit theores for soe adaptive cc a;goriths with sub-geoetric kernels. Bernoulli. [2] Birnbau, Z. W. and W., M. A. 1961. Soe ultivarite chebyshev inequalities with extensions to continuous paraeter processes. Ann. Math. Statist. 32 687 703. [3] Chow, Y. S. 1960. A artingale inequality and the law of large nubers. Proc. Aer. Math. Soc. 11 107 111. [4] Hall, P. and Heyde, C. C. 1980. Martingale Liit theory and its application. Acadeic Press, New York. [5] Meyn, S. P. and Tweedie, R. L. 1993. Markov chains and stochastic stability. Springer- Verlag London Ltd., London. [6] Prakasa, B. L. S., R. 1983. Nonparaetric functional estiation. Acadeic Press, New York.