arxiv: v1 [math.pr] 17 May 2009

Similar documents
Asymptotics of weighted random sums

The degree of a typical vertex in generalized random intersection graph models

1 Bounding the Margin

Solutions of some selected problems of Homework 4

3.8 Three Types of Convergence

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

Tail estimates for norms of sums of log-concave random vectors

A Simple Regression Problem

Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms

1. INTRODUCTION AND RESULTS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I

Generalized AOR Method for Solving System of Linear Equations. Davod Khojasteh Salkuyeh. Department of Mathematics, University of Mohaghegh Ardabili,

Metric Entropy of Convex Hulls

A Bernstein-Markov Theorem for Normed Spaces

FAST DYNAMO ON THE REAL LINE

Computational and Statistical Learning Theory

Some Classical Ergodic Theorems

Tail Estimation of the Spectral Density under Fixed-Domain Asymptotics

Statistics and Probability Letters

LORENTZ SPACES AND REAL INTERPOLATION THE KEEL-TAO APPROACH

A Martingale Central Limit Theorem

Generalized eigenfunctions and a Borel Theorem on the Sierpinski Gasket.

Prerequisites. We recall: Theorem 2 A subset of a countably innite set is countable.

The Methods of Solution for Constrained Nonlinear Programming

arxiv: v1 [math.co] 19 Apr 2017

arxiv: v1 [math.nt] 14 Sep 2014

New upper bound for the B-spline basis condition number II. K. Scherer. Institut fur Angewandte Mathematik, Universitat Bonn, Bonn, Germany.

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory

Shannon Sampling II. Connections to Learning Theory

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Non-uniform Berry Esseen Bounds for Weighted U-Statistics and Generalized L-Statistics

STRONG LAW OF LARGE NUMBERS FOR SCALAR-NORMED SUMS OF ELEMENTS OF REGRESSIVE SEQUENCES OF RANDOM VARIABLES

On weighted averages of double sequences

Alireza Kamel Mirmostafaee

Estimating Parameters for a Gaussian pdf

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint.

A CLT FOR MULTI-DIMENSIONAL MARTINGALE DIFFERENCES IN A LEXICOGRAPHIC ORDER GUY COHEN. Dedicated to the memory of Mikhail Gordin

Distributed Subgradient Methods for Multi-agent Optimization

A generalization of Dobrushin coefficient

}, (n 0) be a finite irreducible, discrete time MC. Let S = {1, 2,, m} be its state space. Let P = [p ij. ] be the transition matrix of the MC.

Max-Product Shepard Approximation Operators

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

Polygonal Designs: Existence and Construction

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13

Characterization of the Line Complexity of Cellular Automata Generated by Polynomial Transition Rules. Bertrand Stone

arxiv: v2 [math.nt] 5 Sep 2012

Some Proofs: This section provides proofs of some theoretical results in section 3.

Necessity of low effective dimension

VARIATION FOR THE RIESZ TRANSFORM AND UNIFORM RECTIFIABILITY

IN modern society that various systems have become more

The Hilbert Schmidt version of the commutator theorem for zero trace matrices

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008

STA205 Probability: Week 8 R. Wolpert

NON-COMMUTATIVE GRÖBNER BASES FOR COMMUTATIVE ALGEBRAS

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution

Local asymptotic powers of nonparametric and semiparametric tests for fractional integration

ON THE COMPLETE CONVERGENCE FOR WEIGHTED SUMS OF DEPENDENT RANDOM VARIABLES UNDER CONDITION OF WEIGHTED INTEGRABILITY

In this chapter, we consider several graph-theoretic and probabilistic models

E0 370 Statistical Learning Theory Lecture 5 (Aug 25, 2011)

SUPPLEMENT TO PAPER CONVERGENCE OF ADAPTIVE AND INTERACTING MARKOV CHAIN MONTE CARLO ALGORITHMS

arxiv:math/ v1 [math.nt] 15 Jul 2003

On Conditions for Linearity of Optimal Estimation

Polytopes and arrangements: Diameter and curvature

Some Results on the Ergodicity of Adaptive MCMC Algorithms

L p moments of random vectors via majorizing measures

Bootstrapping Dependent Data

arxiv: v1 [cs.cc] 19 Jan 2016

Analysis of Polynomial & Rational Functions ( summary )

Learnability and Stability in the General Learning Setting

arxiv: v1 [cs.ds] 3 Feb 2014

Research Article Some Formulae of Products of the Apostol-Bernoulli and Apostol-Euler Polynomials

Note on generating all subsets of a finite set with disjoint unions

The Distribution of the Covariance Matrix for a Subset of Elliptical Distributions with Extension to Two Kurtosis Parameters

THE AVERAGE NORM OF POLYNOMIALS OF FIXED HEIGHT

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion

The Asymptotics of the Mahalanobis. Depth-Based Theil-Sen Estimators in. Multiple Linear Models

Conditional formulae for Gibbs type exchangeable random partitions

Study on Markov Alternative Renewal Reward. Process for VLSI Cell Partitioning

The concavity and convexity of the Boros Moll sequences

Bernoulli Numbers. Junior Number Theory Seminar University of Texas at Austin September 6th, 2005 Matilde N. Lalín. m 1 ( ) m + 1 k. B m.

APPROXIMATION BY MODIFIED SZÁSZ-MIRAKYAN OPERATORS

Closed-form evaluations of Fibonacci Lucas reciprocal sums with three factors

Zdzis law Brzeźniak and Tomasz Zastawniak

Stochastic Process (ENPC) Monday, 22nd of January 2018 (2h30)

Question 1. Question 3. Question 4. Graduate Analysis I Exercise 4

APPROXIMATION BY GENERALIZED FABER SERIES IN BERGMAN SPACES ON INFINITE DOMAINS WITH A QUASICONFORMAL BOUNDARY

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs

4 = (0.02) 3 13, = 0.25 because = 25. Simi-

Approximation by Piecewise Constants on Convex Partitions

arxiv: v2 [math.co] 3 Dec 2008

EXISTENCE OF ASYMPTOTICALLY PERIODIC SOLUTIONS OF SCALAR VOLTERRA DIFFERENCE EQUATIONS. 1. Introduction

On the Navier Stokes equations

The Weierstrass Approximation Theorem

On the Use of A Priori Information for Sparse Signal Approximations

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

Asynchronous Gossip Algorithms for Stochastic Optimization

A note on the growth rate in the Fazekas Klesov general law of large numbers and on the weak law of large numbers for tail series

Covariance-based orthogonality tests for regressors with unknown persistence

On Uniform Convergence of Sine and Cosine Series. under Generalized Difference Sequence of. p-supremum Bounded Variation Sequences

A note on the multiplication of sparse matrices

Transcription:

A strong law of large nubers for artingale arrays Yves F. Atchadé arxiv:0905.2761v1 [ath.pr] 17 May 2009 March 2009 Abstract: We prove a artingale triangular array generalization of the Chow-Birnbau- Marshall s inequality. The result is used to derive a strong law of large nubers for artingale triangular arrays whose rows are asyptotically stable in a certain sense. To illustrate, we derive a siple proof, based on artingale arguents, of the consistency of kernel regression with dependent data. Another application can be found in [1] where the new inequality is used to prove a strong law of large nubers for adaptive Markov Chain Monte Carlo ethods. AMS 2000 subject classifications: Priary 60J27, 60J35, 65C40. Keywords and phrases: Martingales and Martingale arrays, Strong law of large nubers, Kernel regression. 1. Strong law of large nubers for artingale arrays Let Ω, F, P be a probability space and E the expectation operator with respect to P. Let {D n,i, F n,i, 1 i n}, n 1 be a artingale-difference array. That is for eac 1, {F n,i, 1 i n} is a non-decreasing sequence of sub-siga-algebra of F, for any 1 i n, E D n,i < and E D n,i F n,i 1 = 0. We assue throughout the paper that F n,0 = {,Ω} for all n 0. We introduce the partial sus k M n,k := D n,i, 1 k n, n 1. i=1 For eac 1, {M n,k, F n,k, 1 k n} is a artingale. Let {c n, n 1} be a non-increasing sequence of positive nubers. We are interested in conditions under which c n M n,n converges alost surely to zero. Martingales and artingale arrays play an iportant role in Probability and Statistics as valuable tools for liit theory. Much is known on the liit theory of artingales see e.g. [4] but coparatively little work has been done on the law of large nubers for artingale arrays. Departent of Statistics, University of Michigan, eail: yvesa@uich.edu 1

/SLLN for artingales arrays 2 One of the ost effective approach to proving the strong law of large nubers for artingales is via the Kologorov s inequality for artingales obtained by Chow [3] and Birnbau-Marshall [2]. Theore 1.1 Chow-Birnbau-Marshall s inequality. Let {S k, F k, k 1} be a sub-artingale and {c k,k 1} a non-increasing real-valued sequence. For p 1 and n N P sup c S 1 n N c p N E[ S N p ] + N 1 c p c p +1 E[ S p ]. The following theore gives an extension to artingale arrays. We introduce the sequence k S n,k = M n,k = D n,i, 1 k n and n S n,k = D n,i + k D j,j, k > n. i=1 i=1 R n := n 1 j=1 D n,j D n 1,j. Theore 1.2. Let {D n,i, F n,i, 1 i n}, n 1 be a artingale-difference array and {c n, n 1} a non-increasing sequence of positive nubers. Assue that F n,i = F i for all i,n. For n N, p 1 and λ > 0 2 p λ p P ax c M, > λ n N c p N E S n,n p + N 1 j=n c p j cp j+1 E S n,j p + E c j R j p. 1 Proof. For n N, we have M, = M 1, 1 + D, + R leading to the decoposition M, = M n,n + D j,j + R j = S n, + We note that {S n,, F, 1 N} is a artingale. We also introduce Z = c p n S n,n p + It is easy to check that Z has the alternative for Z = c p S n, p + 1 j=n c p j S n,j p S n,j 1 p. c p j cp j+1 S n,j p. R j.

/SLLN for artingales arrays 3 Since { S n, p, F, 1 N} is a sub-artingale and {c k, k 1} is non-increasing, we have E Z F 1 Z 1, that is {Z, F, n N} is a sub-artingale. For n N, we introduce the sets A 1 := {c S n, > λ/2}, A 2 := {c R j > λ/2} and B := {c j M j,j λ,j = n,..., 1, c M, > λ}. We have: λ p P ax c M, > λ n N N N = E λ p N 1 B E λ p 1 B A 1 + E λ p 1 B A 2 N E 2 p c M, p 1 B A 1 + E 2 p c R j p 1 B A 2 N E 2 p Z 1 B A 1 + 2 p E c j R j p [ N ] E 2 p E Z N 1 B A 1 F + 2 p E c j R j p 2 p E Z N + c j R j p. In any situations, one deals with artingale arrays whose rows are asyptotically stable in the sense that the sequence E [ R n ] converges to zero as n increases to infinity. Theore 1.2 can be used to prove a strong law of large nubers for such artingale arrays. Corollary 1.1. Let {D n,i, F n,i, 1 i n}, n 1 be a artingale-difference array and {c n, n 1} a non-increasing sequence of positive nubers. Assue that F n,i = F i for all i,n. Suppose that there exists p 1 such that for any n 0 1 c p ne [ S n0,n p ] + li n k=n c p k cp k+1 E [ S n,k p ] = 0, and c n E 1/p [ R n p ] <. 2 n 1 Then c n M n,n converges alost surely to zero. Reark 1.1. With respect to the process {R n } in Theore 1.2, we point out that, because of the assuption F n,j = F j, the sequence { k j=1 D n,j D n 1,j, F k, 1 k n 1} is a also artingale. The conditions in Corollary 1.1 are expressed in ters of oents of artingales. These oents can be nicely bounded by oents of the artingale differences. We give one such

/SLLN for artingales arrays 4 bound in the next proposition. It is a consequence of the Burkholder s inequality [4], Theore 2.10 and soe classical convexity inequalities. We oit the details. Proposition 1.1. Let {D n,i, F n,i, 1 i n}, n 1 be a artingale-difference array. For any p > 1, where C = 18pq 1/2 p, p 1 + q 1 = 1. k E [ M n,k p ] Ck axp/2,1 1 E D n,j p, 3 j=1 2. Kernel regression with Markov chains As an application, we prove the strong consistency of the Nadaraya-Watson estiator for nonparaetric regression where the data arises fro a non-stationary Markov chain. The approach of the proof can be adapted to study other kernel ethods or other statistical soothing procedures with dependent data. We assue the following structure for the data. {X i,ǫ i, i 0} is a joint R 2 -valued Markov chain on soe probability space Ω, F, P such that P X n,ǫ n A B X k,ǫ k, k n 1 = A px n 1,zqz,Bdz, for transition probability densities p and q. p is the transition probability density of the arginal Markov chain {X i, i 0} and qx,a = p ε n A X n = x is the transition probability density of the error ter ε n. All densities are with respect to the Lebesgue easure denoted dx. We assue that p has an invariant distribution π that is πx = R πypy,xdy, x R and πx ǫqx, ǫdǫ dx = 0. 4 R R We consider the dependent variable Y i = rx i + ǫ i, i 0. We are interested in estiating the regression function r. Note that the error ters ǫ i are correlated and we do not assue that E ǫ i = 0 unless, as assued in 4, the Markov chain {X i, i 0} is in stationarity. For the reader s convenience, we will soeties use the notation EUY X = x to denote the integral R U rx + ǫqx,ǫdǫ, whenever such integral is well-defined. A popular

/SLLN for artingales arrays 5 nonparaetric estiator for r is the Nadaraya-Watson estiator ni=1 Y i K x0 X i ˆr n x 0 =, x 0 R. 5 ni=1 K x0 X i where K is the kernel a nonnegative function such that R Kxdx = 1 and > 0 the bandwidth. Let ψ : R R be a easurable function. We study the alost sure convergence of ˆr n,ψ x 0 = 1 n n x0 X i ψy i K h i=1 n as n. We can then deduce the convergence of the Nadaraya-Watson estiator by setting ψx = x for the nuerator and ψx = 1 for the denoinator. Let µ be the distribution of X 0, the initial distribution of the Markov chain. We write P for the Markov kernel induced by p which operates on nonnegative bounded easurable functions as Pfx = R px,yfydy. The iterates operators of P are defined as P 0 fx = fx and for n 1, P n fx = PP n 1 fx. We will assue that P is geoetrically ergodic. That is B1 P is φ-irreducible, aperiodic and there exist a function V : R [1,, λ 0,1, b 0, such that for soe sall set C. PV x λv x + b1 C x, This assuption is a well known stability assuption for Markov kernels extensively studied in [5]. One iportant consequence of B1 that we will use is the following. For any α 0,1], there exists Cα < such that for all n 0, sup P n fx fxπxdx Cαρn V α x, x R 6 f V α 1 R fx where f V α := sup x R V α x. A proof can be found [5], Chapter 15. We assue that µv := R V xµxdx <. By iterating the drift condition B1, it is easy to see that On the function ψ, we assue that sup x R sup E [V X n ] µv + b/1 λ <. 7 n 0 [ ] V 1/2 x1 + x E [ ψy X = x] <, and supv xe ψ 2 Y X = x x R <. 8,

/SLLN for artingales arrays 6 On the kernel K, we assue that supkx <, x R Kx Kx li x Kx = 0, and sup x x =x x x <. 9 On the sequence {, n 0}, we assue that: n β, with β 0,1/4. 10 Theore 2.1. Assue B1, 8-10 and that the function x πxe ψy X = x is continuous at x 0. Then li n ˆr n,ψ x 0 = πx 0 E ψy X = x 0 with P-probability one. Proof. Throughout the proof, x 0 R is fixed and C will denote a finite constant whose actual value ight differ fro one appearance to the next. Define F n := σx k,y k, k n}. For h > 0, define F h x,y = ψyk x 0 x h, fh x = K x 0 x h E ψy X = x, and g h x = l 0 P l f h x, where P l fx := P l fx R fxπxdx. By 8, the boundedness of K and the geoetric ergodicity assuption 6, g h is well-defined and satisfies g h V 1/2 C1 ρ 1. It is also wellknown that g h solves the Poisson equation for f h and P. In other words, we have g h x Pg h x = f h x, 11 where f h x = f h x R f hxπxdx. Siilarly, define H h x,y = F h x,y + Pg h x. It is left to the reader to check that E [H h X n,y n X n 1 = x,y n 1 = y] = Pf h x+p 2 g h x = Pg h x+ R f hxπxdx using 11. It follows that F h x,y πxf h xdx = H h x,y E [H h X n,y n X n 1 = x,y n 1 = y], x,y R. 12 Using 12, we can decopose ˆr n,ψ x 0 as ˆr n,ψ x 0 = 1 K R x0 x E [ψy X = x]πxdx + 1 n n D n,k k=1 + n 1 E [H hn X 1,Y 1 F 0 ] E [H hn X n+1,y n+1 F n ], where D n,k = H hn X k,y k E [H hn X k,y k F k 1 ].

/SLLN for artingales arrays 7 Under the stated assuptions, it is a standard result of kernel estiation that li n 1 R See e.g. [6] for a proof. x0 x K E [ψy X = x]πxdx = E ψy X = x 0 πx 0. We deduce fro the drift condition B1 and 8 that sup E [ H h X n,y n F n 1 ] CV 1/2 X n 1, n 1 for soe finite constant C that does not depend on h. Cobined with 7 we get for any δ > 0, k 1 P n 1 E [H hn X 1,Y 1 F 0 ] E [H hn X n+1,y n+1 F n ] > δ Cδ 2 n 21 β <. n 1 This easily iplies that the ter n 1 E [H hn X 1,Y 1 F 0 ] E [H hn X n+1,y n+1 F n ] converges alost surely to zero. Lastly, the process {D n,k, F k k n} is a artingale-difference array. Again by 8, the boundedness of K, the drift condition B1, E D n,k 2 [ E H hn X k,y k 2] CE V X k. [ Then using 7, we obtain that sup n 0 sup 0 k n E D n,k 2] <. This iplies, in the notations of Theore 1.2, that E [ S n, 2] C, for soe finite constant C that does not depend on n nor. Moreover, we can write D n,j D n 1,j = H hn X j,y j H hn 1 X j,y j E H hn X j,y j H hn 1 X j,y j F j 1, and we note that Hhn H hn 1 x,y = ψy K x0 x + l 1 By the Lipschitz condition on K and 8, K x0 x 1 K E [ψy X = x] x0 x x0 x K C 1 Therefore H hn H hn 1 x,y C h 1 x ψy + V 1/2 x fro which we deduce using 8 and 7 that E P K l x0 x x0 x K E [ψy X = x]. n 1 h 1 n h 1 1 n 1 h 1 n V 1/2 x. D n,j D n 1,j 2 = On 2 1+β uniforly in j which iplies as in Proposition 1.1 that E 1/2 n 1 j=1 D n,j D n 1,j 2 = On 1/2+β which together with E [ S n, 2] C proves 2, since β < 1/4. We can therefore conclude that n 1 n k=1 D n,k 0, P-alost surely, which ends the proof.

/SLLN for artingales arrays 8 References [1] Atchade, Y. F. and Fort, G. To appear. Liit theores for soe adaptive cc a;goriths with sub-geoetric kernels. Bernoulli. [2] Birnbau, Z. W. and W., M. A. 1961. Soe ultivarite chebyshev inequalities with extensions to continuous paraeter processes. Ann. Math. Statist. 32 687 703. [3] Chow, Y. S. 1960. A artingale inequality and the law of large nubers. Proc. Aer. Math. Soc. 11 107 111. [4] Hall, P. and Heyde, C. C. 1980. Martingale Liit theory and its application. Acadeic Press, New York. [5] Meyn, S. P. and Tweedie, R. L. 1993. Markov chains and stochastic stability. Springer- Verlag London Ltd., London. [6] Prakasa, B. L. S., R. 1983. Nonparaetric functional estiation. Acadeic Press, New York.