arXiv: v1 [math.PR] 19 Sep 2007

A tail inequality for suprema of unbounded empirical processes with applications to Markov chains

Radosław Adamczak (Institute of Mathematics, Polish Academy of Sciences)

June 11, 2008

Abstract. We present an easy extension of Talagrand's inequality for suprema of empirical processes to processes generated by variables with finite $\psi_1$ norm. We apply it to some geometrically ergodic Markov chains to derive versions of Bernstein's and Talagrand's inequalities. We also obtain a bounded difference inequality for symmetric statistics of such Markov chains.

AMS 2000 Subject Classification: Primary 60E15, Secondary 60J05.

Keywords: concentration inequalities, empirical processes, Markov chains

1 Introduction

Talagrand's celebrated tail inequality for suprema of empirical processes indexed by uniformly bounded classes of functions [20] has proved to be a very important tool in infinite dimensional probability, machine learning and M-estimation. It drew considerable attention, resulting in several simplified proofs (see [8, 2, 16]). Some counterparts in terms of moment estimates have also been obtained for processes which are not necessarily bounded [4, 3]. These estimates bound the $p$-th moment of the supremum of an empirical process (a sum of independent Banach space valued random variables) under the assumption that the summands have $p$-th moments. So far, however, there have been no counterparts of Talagrand's inequality for summands with bounded exponential Orlicz norms.

Such inequalities can be obtained from the abovementioned moment estimates via Pisier's inequality (see (5) below). However, this method results in a non-optimal dependence on $p$ in the moment estimates or, equivalently, in an incorrect order of decay in the tail inequalities. In the first part of the article we show that such an Orlicz norm counterpart of Talagrand's inequality may be obtained very easily from the original inequality for the uniformly bounded case. This is done by combining the approach to moment estimates from [4] with another exponential inequality by Talagrand. The estimates obtained in this way are of independent interest from the point of view of empirical processes. It turns out that they may also be applied to geometrically ergodic Markov chains, giving (in a sense optimal) versions of Bernstein's and Talagrand's inequalities.

Extending classical inequalities for sums of independent variables to Markov chains is a challenging problem in concentration of measure theory. The inequalities known so far are well suited for uniformly ergodic Markov chains. They were usually proven via quite complicated information theoretic inequalities connected with transportation of measure (a more detailed description of known results is given in Section 7). To the best of our knowledge, there has been no approach using the natural idea of regeneration and splitting, which has turned out to be very fruitful in the analysis of limit theorems [14]. The i.i.d. blocks appearing in the decomposition of a Markov chain may be arbitrarily long, so one cannot directly apply the classical Bernstein's or Talagrand's inequalities for uniformly bounded variables. However, in many cases of interest the random length of the i.i.d. blocks has a finite $\psi_1$ Orlicz norm, so our inequalities may be applied in this situation. The organisation of the article is as follows.
First, as already mentioned, we prove an inequality for sums of independent random variables and show that it is in a sense optimal (Sections 2 and 3). Then we briefly recall the theory of regeneration for Markov chains and present the counterparts of Bernstein's and Talagrand's inequalities, followed by a discussion of optimality (Sections 4 and 5). In Section 6 we prove some other inequalities for Markov chains, based again on the regeneration technique. In Section 7 we give a short discussion relating our inequalities to other results on the concentration of measure phenomenon for Markov chains.

Throughout the article, $K$ denotes universal constants and $K_\alpha$ denotes constants depending only on a parameter $\alpha$. In both cases the values of the constants may change from line to line. We finish this introduction by recalling the classical definition of exponential Orlicz norms.

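Definition 1. For $\alpha>0$, define the function $\psi_\alpha\colon\mathbb R_+\to\mathbb R_+$ by $\psi_\alpha(x)=\exp(x^\alpha)-1$. For a random variable $X$, define also the Orlicz norm
$$\|X\|_{\psi_\alpha}=\inf\{\lambda>0\colon\mathbb E\,\psi_\alpha(|X|/\lambda)\le1\}.$$

We will use the following basic fact in the sequel. By Chebyshev's inequality, for $t\ge0$,
$$\mathbb P(|X|\ge t)\le2\exp\Big(-\Big(\frac t{\|X\|_{\psi_\alpha}}\Big)^\alpha\Big).$$

2 A tail inequality for suprema of empirical processes

We now formulate the main result of this section. It gives moment estimates for suprema of empirical processes under the assumption that the summands have finite $\psi_\alpha$ Orlicz norm.

Theorem 1. Let $X_1,\dots,X_n$ be independent random variables with values in a measurable space $(S,\mathcal B)$, and let $\mathcal F$ be a countable class of measurable functions $f\colon S\to\mathbb R$. Assume that $\mathbb Ef(X_i)=0$ for every $f\in\mathcal F$ and every $i$. Assume also that, for some $\alpha\in(0,1]$ and all $i$, $\|\sup_{f\in\mathcal F}|f(X_i)|\|_{\psi_\alpha}<\infty$. Let
$$Z=\sup_{f\in\mathcal F}\Big|\sum_{i=1}^nf(X_i)\Big|.$$
Define moreover
$$\sigma^2=\sup_{f\in\mathcal F}\sum_{i=1}^n\mathbb Ef(X_i)^2.$$
Then for all $t\ge1$,
$$\mathbb P\Big(Z\ge K\,\mathbb EZ+K\sqrt t\,\sigma+K_\alpha t^{1/\alpha}\Big\|\max_{i\le n}\sup_{f\in\mathcal F}|f(X_i)|\Big\|_{\psi_\alpha}\Big)\le e^{-t}. \tag{1}$$

The proof is an easy combination of the classical Hoffmann-Jørgensen inequality with two deep results on empirical processes.

Theorem 2 (Talagrand, [20]). In the setting of Theorem 1, assume that $|f(X_i)|\le a$ for all $f\in\mathcal F$ and all $i\le n$. Then, for all $t\ge1$,
$$\mathbb P\big(Z\ge K(\mathbb EZ+\sqrt t\,\sigma+ta)\big)\le e^{-t}.$$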
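Definition 1 and the tail estimate that follows it can be checked numerically. Below is a minimal sketch that assumes $X$ is uniformly distributed on a finite sample (an empirical law). The helper names `psi`, `orlicz_norm` and `tail_bound` are hypothetical. Since $\lambda\mapsto\mathbb E\psi_\alpha(|X|/\lambda)$ is non-increasing, the infimum in Definition 1 can be located by bisection. The returned $\lambda$ always satisfies the defining constraint, so the tail estimate holds for it exactly.

```python
import math

def psi(x: float, alpha: float) -> float:
    """psi_alpha(x) = exp(x**alpha) - 1, returned as +inf when exp would overflow."""
    y = x ** alpha
    return math.inf if y > 700.0 else math.expm1(y)

def orlicz_norm(sample, alpha: float, rel_tol: float = 1e-12) -> float:
    """||X||_{psi_alpha} for X uniform on the finite list `sample`."""
    xs = [abs(v) for v in sample]
    if max(xs) == 0.0:
        return 0.0

    def excess(lam: float) -> float:
        # E psi_alpha(|X| / lam) - 1, a non-increasing function of lam
        return sum(psi(v / lam, alpha) for v in xs) / len(xs) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:          # double hi until the constraint holds
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:    # bisection; hi always satisfies the constraint
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return hi

def tail_bound(t: float, norm: float, alpha: float) -> float:
    """Right-hand side of P(|X| >= t) <= 2 exp(-(t / ||X||_{psi_alpha})**alpha)."""
    return 2.0 * math.exp(-((t / norm) ** alpha))

if __name__ == "__main__":
    sample = [0.0, 1.0, 2.0, 5.0]
    lam = orlicz_norm(sample, alpha=1.0)
    for t in (1.0, 3.0, 5.0):
        empirical = sum(abs(v) >= t for v in sample) / len(sample)
        print(f"t={t}: P(|X|>=t)={empirical:.3f}, bound {tail_bound(t, lam, 1.0):.3f}")
```

For a constant $|X|=c$ the norm is $c/(\log2)^{1/\alpha}$, which gives a quick sanity check of the bisection.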

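Theorem 3 (Ledoux, Talagrand, [10]). In the setting of Theorem 1, we have
$$\|Z\|_{\psi_\alpha}\le K_\alpha\Big(\|Z\|_1+\Big\|\max_i\sup_{f\in\mathcal F}|f(X_i)|\Big\|_{\psi_\alpha}\Big).$$

Proof of Theorem 1. The proof is based on an argument from [4]. We will show that for $p\ge2$,
$$\|Z\|_p\le K\,\mathbb EZ+K\sqrt p\,\sigma+K_\alpha p^{1/\alpha}\Big\|\max_i\sup_{f\in\mathcal F}|f(X_i)|\Big\|_{\psi_\alpha}, \tag{2}$$
which implies (1) via the Chebyshev inequality. Let $\varepsilon_1,\dots,\varepsilon_n$ be independent Rademacher variables, independent of $(X_i)$. Define also
$$M=24\,\mathbb E\max_i\sup_{f\in\mathcal F}|f(X_i)|\le K_\alpha\Big\|\max_i\sup_{f\in\mathcal F}|f(X_i)|\Big\|_{\psi_\alpha}.$$
We now use the classical symmetrization arguments, integration by parts and Theorem 2. Theorem 2 is applied to the truncated summands, which are bounded by $M$. This gives
$$\begin{aligned}
\mathbb EZ^p&\le2^p\,\mathbb E\sup_f\Big|\sum_i\varepsilon_if(X_i)\Big|^p\\
&\le2^{2p-1}\mathbb E\sup_f\Big|\sum_i\varepsilon_if(X_i)\mathbf 1_{\{\sup_f|f(X_i)|\le M\}}\Big|^p+2^{2p-1}\mathbb E\sup_f\Big|\sum_i\varepsilon_if(X_i)\mathbf 1_{\{\sup_f|f(X_i)|>M\}}\Big|^p\\
&\le K^p\Big[\Big(\mathbb E\sup_f\Big|\sum_i\varepsilon_if(X_i)\mathbf 1_{\{\sup_f|f(X_i)|\le M\}}\Big|\Big)^p+p^{p/2}\sigma^p+K_\alpha^pp^p\Big\|\max_i\sup_f|f(X_i)|\Big\|_{\psi_\alpha}^p\\
&\qquad+K_\alpha^pp^{p/\alpha}\Big\|\sup_f\Big|\sum_i\varepsilon_if(X_i)\mathbf 1_{\{\sup_f|f(X_i)|>M\}}\Big|\Big\|_{\psi_\alpha}^p\Big].
\end{aligned}\tag{3}$$
By the contraction principle (or just Jensen's inequality applied conditionally on the $X_i$'s) and symmetrization inequalities, we have
$$\mathbb E\sup_f\Big|\sum_i\varepsilon_if(X_i)\mathbf 1_{\{\sup_f|f(X_i)|\le M\}}\Big|\le2\,\mathbb EZ. \tag{4}$$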

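Moreover, by the Chebyshev inequality,
$$\mathbb P\Big(\sup_f\Big|\sum_i\varepsilon_if(X_i)\mathbf 1_{\{\sup_f|f(X_i)|>M\}}\Big|>0\Big)\le\mathbb P\Big(\max_i\sup_f|f(X_i)|>M\Big)\le\frac1{24}=\frac1{3\cdot8}.$$
Hence, by the Hoffmann-Jørgensen inequality (see e.g. [10], Chapter 6),
$$\mathbb E\sup_f\Big|\sum_i\varepsilon_if(X_i)\mathbf 1_{\{\sup_f|f(X_i)|>M\}}\Big|\le6\,\mathbb E\max_i\sup_f|f(X_i)|\mathbf 1_{\{\sup_f|f(X_i)|>M\}}\le K_\alpha\Big\|\max_i\sup_f|f(X_i)|\Big\|_{\psi_\alpha}.$$
Together with Theorem 3, this gives
$$\Big\|\sup_f\Big|\sum_i\varepsilon_if(X_i)\mathbf 1_{\{\sup_f|f(X_i)|>M\}}\Big|\Big\|_{\psi_\alpha}\le K_\alpha\Big\|\max_i\sup_f|f(X_i)|\Big\|_{\psi_\alpha}.$$
To finish the proof, it is now enough to combine the above estimate with (3) and (4).

3 A counterexample

We now present a simple example showing that one cannot replace $\|\max_i\sup_f|f(X_i)|\|_{\psi_\alpha}$ with $\max_i\|\sup_f|f(X_i)|\|_{\psi_\alpha}$. With such a modification the inequality fails even in the real valued case, i.e. when $\mathcal F$ is a singleton. For simplicity we consider only the case $\alpha=1$.

Consider a sequence $Y_1,Y_2,\dots$ of i.i.d. real random variables such that $\mathbb P(Y_i=r)=e^{-r}=1-\mathbb P(Y_i=0)$. Let $\varepsilon_1,\varepsilon_2,\dots$ be a Rademacher sequence, independent of $(Y_i)_i$. Define finally $X_i=\varepsilon_iY_i$. We have
$$\mathbb Ee^{|X_i|}=e^re^{-r}+1-e^{-r}\le2,$$
so $\|X_i\|_{\psi_1}\le1$. Moreover, $\mathbb E|X_i|^2=r^2e^{-r}$. Assume now that for all $n,r\in\mathbb N$ and $t\ge0$,
$$\mathbb P\Big(\Big|\sum_{i=1}^nX_i\Big|\ge K\big(\sqrt{nt}\,\|X_1\|_2+t\|X_1\|_{\psi_1}\big)\Big)\le Ke^{-t},$$
where $K$ is an absolute constant. Take $r$ sufficiently large and apply the above inequality with $n\approx e^r/r^2$ and $t\approx r$ (up to constant factors). This implies that
$$\mathbb P\Big(\Big|\sum_{i=1}^nX_i\Big|\ge r\Big)\le Ke^{-r/K}.$$
On the other hand, by Lévy's inequality,
$$2\,\mathbb P\Big(\Big|\sum_{i=1}^nX_i\Big|\ge r\Big)\ge\mathbb P\Big(\max_{1\le i\le n}|X_i|\ge r\Big)\ge\frac12\min\big(n\,\mathbb P(|X_1|\ge r),1\big)\ge\frac1{2r^2},$$
which gives a contradiction for large $r$.

Remark. A small modification of the above argument shows that one cannot hope for an inequality
$$\mathbb P\Big(Z\ge K\,\mathbb EZ+K\sqrt t\,\sigma+t\log^\beta n\,\max_i\Big\|\sup_{f\in\mathcal F}|f(X_i)|\Big\|_{\psi_1}\Big)\le Ke^{-t}$$
with $\beta<1$. For $\beta=1$, this inequality follows from Theorem 1 via Pisier's inequality [17]
$$\Big\|\max_{i\le n}|X_i|\Big\|_{\psi_\alpha}\le K_\alpha\max_{i\le n}\|X_i\|_{\psi_\alpha}\log^{1/\alpha}n. \tag{5}$$

4 Applications to Markov chains

We now focus on applications of the above results to Markov chains. Our aim is to obtain Bernstein type deviation inequalities for $f(X_1)+\dots+f(X_n)$ and for suprema of such variables, where $(X_i)_i$ is a Markov chain satisfying some additional assumptions. From now on we work with a class of Markov chains on $S$ with transition kernel $P=P(x,A)$. These chains satisfy the so-called minorization assumption: there exist a positive $m\in\mathbb N$, $\delta>0$, a set $C\in\mathcal B$ (a "small set") and a probability measure $\nu$ on $S$ for which
$$\forall_{x\in C}\ \forall_{A\in\mathcal B}\quad P^m(x,A)\ge\delta\nu(A) \tag{6}$$
and
$$\forall_{x\in S}\ \exists_n\quad P^{nm}(x,C)>0. \tag{7}$$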
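For a chain on a finite state space, the best constants in (6) with $m=1$ can be read off the transition matrix. The measure $\nu$ is proportional to the pointwise minimum of the rows $P(x,\cdot)$, $x\in C$, and $\delta$ is the total mass of that minimum. The following sketch computes these constants and checks (7) by backward reachability. It assumes a finite state space, $m=1$ and a NumPy transition matrix. The function names and the example matrix are hypothetical.

```python
import numpy as np

def minorization(P: np.ndarray, C: list) -> tuple:
    """Largest delta and matching nu with P(x, .) >= delta * nu(.) for all x in C (m = 1)."""
    envelope = P[C, :].min(axis=0)      # pointwise minimum of the rows P(x, .), x in C
    delta = float(envelope.sum())
    if delta == 0.0:
        return 0.0, None                # no minorization with this C and m = 1
    return delta, envelope / delta

def satisfies_7(P: np.ndarray, C: list) -> bool:
    """Condition (7) with m = 1: for every x there is n >= 1 with P^n(x, C) > 0."""
    reach = set(np.flatnonzero(P[:, C].sum(axis=1) > 0).tolist())   # P(x, C) > 0
    changed = True
    while changed:                      # add every x with P(x, reach) > 0 until stable
        changed = False
        for x in range(P.shape[0]):
            if x not in reach and reach and P[x, sorted(reach)].sum() > 0:
                reach.add(x)
                changed = True
    return len(reach) == P.shape[0]

if __name__ == "__main__":
    P = np.array([[0.5, 0.5, 0.0],
                  [0.2, 0.3, 0.5],
                  [0.0, 0.6, 0.4]])
    delta, nu = minorization(P, [0, 1])
    print(delta, nu, satisfies_7(P, [0, 1]))   # delta = 0.5, nu = (0.4, 0.6, 0)
```

For a general state space this envelope has no closed form. The sketch only shows what the pair $(\delta,\nu)$ in (6) means in the simplest setting.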

One can show that in such a situation, if the chain admits an invariant measure $\pi$, then this measure is unique and satisfies $\pi(C)>0$ (see [14]). Moreover, under some conditions on the initial distribution $\xi$, the chain can be extended to a new, so-called split chain $(\tilde X_n,R_n)\in S\times\{0,1\}$. This split chain has the following properties.

Properties of the split chain

- $(\tilde X_n)_n$ is again a Markov chain with transition kernel $P$ and initial distribution $\xi$. Hence, for our purpose of estimating tail probabilities, we may and will identify $X_n$ and $\tilde X_n$.
- Define
$$T_1=\inf\{n>0\colon R_{nm}=1\},\qquad T_{i+1}=\inf\{n>0\colon R_{(T_1+\dots+T_i+n)m}=1\}.$$
Then $T_1,T_2,\dots$ are well defined and independent; moreover, $T_2,T_3,\dots$ are i.i.d.
- Define $S_i=T_1+\dots+T_i$. Then the "blocks"
$$Y_0=(X_1,\dots,X_{mT_1+m-1}),\qquad Y_i=(X_{m(S_i+1)},\dots,X_{mS_{i+1}+m-1}),\quad i>0,$$
form a one-dependent sequence, i.e. for all $i$, $\sigma((Y_j)_{j<i})$ and $\sigma((Y_j)_{j>i})$ are independent. Moreover, the sequence $Y_1,Y_2,\dots$ is stationary. If $m=1$, then the variables $Y_0,Y_1,\dots$ are independent. In consequence, for $f\colon S\to\mathbb R$, the variables
$$Z_i=Z_i(f)=\sum_{j=m(S_i+1)}^{mS_{i+1}+m-1}f(X_j),\qquad i\ge1,$$
constitute a one-dependent stationary sequence (an i.i.d. sequence if $m=1$). Additionally, if $f$ is $\pi$-integrable, then
$$\mathbb EZ_i=\delta^{-1}\pi(C)^{-1}m\int f\,d\pi.$$
- The distribution of $T_1$ depends only on $\xi,P,C,\delta,\nu$, whereas the law of $T_2$ depends only on $P,C,\delta$ and $\nu$.
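The index bookkeeping in the properties above is easy to get wrong, especially for $m>1$. The sketch below recovers $T_i$, $S_i$ and the inclusive index ranges of the blocks $Y_0,Y_1,\dots$ from the flags. It assumes the flags $R_j$ are given as a Python list indexed from $1$, with entry $0$ unused. The function name is hypothetical. Only flags at multiples of $m$ matter, exactly as in the definition of $T_1$.

```python
def regeneration_blocks(R, m):
    """Regeneration times T_i, partial sums S_i and block index ranges.

    R[j] is the flag R_j for j = 1, ..., len(R) - 1 (R[0] is ignored).
    By the definitions above, S_1 < S_2 < ... are exactly the k >= 1 with
    R_{km} = 1, T_1 = S_1 and T_{i+1} = S_{i+1} - S_i. Ranges are inclusive:
    Y_0 = [1, m*S_1 + m - 1] and Y_i = [m*(S_i + 1), m*S_{i+1} + m - 1].
    """
    horizon = (len(R) - 1) // m                    # largest k with R_{km} observed
    S = [k for k in range(1, horizon + 1) if R[k * m] == 1]
    T = [b - a for a, b in zip([0] + S, S)]        # T_1 = S_1, T_{i+1} = S_{i+1} - S_i
    blocks = []
    if S:
        blocks.append((1, m * S[0] + m - 1))       # Y_0
        for a, b in zip(S, S[1:]):                 # Y_i, i >= 1
            blocks.append((m * (a + 1), m * b + m - 1))
    return T, S, blocks

if __name__ == "__main__":
    R = [0] * 13
    for j in (4, 5, 10, 12):        # R_5 = 1 is ignored for m = 2 (5 is not a multiple of 2)
        R[j] = 1
    print(regeneration_blocks(R, 2))
```

Consecutive ranges are contiguous: $Y_0$ ends at $mS_1+m-1$ and $Y_1$ starts at $m(S_1+1)$. This is why the blocks tile the time axis.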

We refrain from specifying the construction of this new chain in full generality, as well as the conditions under which (6) and (7) hold. For a complete exposition we refer the reader to the classical monograph [14] or to the survey article [18]. Here we only sketch the construction for $m=1$, to give its "flavour". Informally speaking, at each step $i$ with $X_i=x$:

- If $x\notin C$, we generate the next value of the chain according to the measure $P(x,\cdot)$.
- If $x\in C$, we toss a coin with probability of success equal to $\delta$. In the case of success ($R_i=1$) we draw the next sample according to the measure $\nu$. Otherwise ($R_i=0$) we draw it according to
$$\frac{P(x,\cdot)-\delta\nu(\cdot)}{1-\delta}.$$

The construction works provided that the chain, starting from the initial distribution, reaches $C$ almost surely. It also requires that the chain, starting from an arbitrary point of $C$, returns to $C$ infinitely often. When $R_i=1$, one usually says that the chain regenerates, as the distribution in the next step is again $\nu$ (for $m=1$; after $m$ steps in general).

Regeneration assumption. In what follows we work under the assumption that the chain admits such a representation. We will not, however, take advantage of the explicit construction. Instead, we will use the properties stated in the points above. A similar approach is quite common in the literature. For a recurrent chain on a countable state space admitting a stationary distribution, the above assumption is always satisfied with $m=1$ and $\delta=1$. For $C$ we can take $\{x\}$, where $x$ is an arbitrary element of the state space. In that case the construction of the split chain also becomes trivial.

Assumption on the regeneration time. To derive concentration of measure inequalities, we also assume that $\|T_1\|_{\psi_1}<\infty$ and $\|T_2\|_{\psi_1}<\infty$. At the end of the article we present examples for which this assumption is satisfied and relate the obtained inequalities to known results.

We start with deviation inequalities for sums of real valued random variables.
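The informal description of the construction for $m=1$ translates directly into a sampler. The sketch below records $X_i$ and $R_i$. It is an illustration under stated assumptions, not the general construction of [14]: the state space is finite, $0<\delta\le1$, and $P(x,\cdot)\ge\delta\nu$ on $C$. The function name and the example kernel are hypothetical. The first coordinate should again be a Markov chain with kernel $P$, and the final lines estimate its transition frequencies to check this empirically.

```python
import numpy as np

def split_chain(P, C, delta, nu, x0, n_steps, rng):
    """Simulate (X_i, R_i), i = 1..n_steps, of the m = 1 split chain.

    Assumes P[x] >= delta * nu entrywise for x in C and 0 < delta <= 1.
    R_i = 1 means the coin tossed at X_i in C succeeded, so X_{i+1} ~ nu.
    """
    C = set(C)
    X = np.empty(n_steps, dtype=int)
    R = np.zeros(n_steps, dtype=int)
    x = x0
    for i in range(n_steps):
        X[i] = x
        if x in C and rng.random() < delta:      # success: regenerate from nu
            R[i] = 1
            w = nu
        elif x in C:                             # failure: residual kernel
            w = np.clip(P[x] - delta * nu, 0.0, None)
            w = w / w.sum()
        else:                                    # outside C: ordinary step
            w = P[x]
        x = int(rng.choice(len(w), p=w))
    return X, R

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.6, 0.4]])
C, delta, nu = [0, 1], 0.5, np.array([0.4, 0.6, 0.0])
X, R = split_chain(P, C, delta, nu, x0=2, n_steps=20_000,
                   rng=np.random.default_rng(0))

# Empirical transition frequencies of the first coordinate; they should be close to P.
counts = np.zeros_like(P)
np.add.at(counts, (X[:-1], X[1:]), 1)
P_hat = counts / counts.sum(axis=1, keepdims=True)
```

The mixture $\delta\nu+(1-\delta)\frac{P(x,\cdot)-\delta\nu}{1-\delta}=P(x,\cdot)$ is what makes the first coordinate Markov with kernel $P$. The coin only decides which of the two components is used.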

Consider now a function $f\colon S\to\mathbb R$ such that $\|f\|_\infty\le a$. Denote also $\tau=\max\{\|T_1\|_{\psi_1},\|T_2\|_{\psi_1}\}$. For convenience, let us set
$$Z_0=\sum_{i=1}^{\min(mT_1+m-1,\,n)}f(X_i).$$
For a fixed number $n$, consider the random variable
$$Z=f(X_1)+\dots+f(X_n)=Z_0+Z_1+\dots+Z_N+\sum_{i=(S_{N+1}+1)m}^nf(X_i), \tag{8}$$
where $N=\sup\{i\in\mathbb N\colon mS_{i+1}+m-1\le n\}$ and $\sup\emptyset=0$ (note that $N$ is a random variable). The terms in (8) have the following meaning:

- $Z_0$ is the sum up to the first regeneration time;
- $Z_1,\dots,Z_N$ are identically distributed blocks between consecutive regeneration times, included in the interval $[1,n]$;
- the last term corresponds to the initial segment of the last block.

The sum $Z_1+\dots+Z_N$ is empty in two cases. The first is when there has been no regeneration up to time $n$ (i.e. $mT_1+m-1>n$). The second is when there has been only one regeneration ($mT_1+m-1\le n$ and $m(T_1+T_2)+m-1>n$). The last sum on the right-hand side is empty if there has been no regeneration or if the last full block ends at $n$.

We have $|Z_0|\le2aT_1m$, so by the remark after Definition 1,
$$\mathbb P(|Z_0|\ge t)\le\mathbb P\Big(T_1\ge\frac t{2am}\Big)\le2\exp\Big(-\frac t{2am\tau}\Big). \tag{9}$$
Let us now look at the variable $M_n=n-m(S_{N+1}+1)+1$, the number of summands in the last sum of (8). If $M_n>t$, then
$$S_{N+1}<\frac{n-t+1}m-1\le\frac{n-t}m.$$

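Therefore
$$\begin{aligned}
\mathbb P(M_n>t)&\le\sum_{1\le k<\frac{n-t+1}m-1}\mathbb P(S_{N+1}=k)=\sum_{1\le k<\frac{n-t+1}m-1}\sum_{l=1}^k\mathbb P(S_l=k\ \&\ N+1=l)\\
&=\sum_{1\le k<\frac{n-t+1}m-1}\sum_{l=1}^k\mathbb P\big(S_l=k\ \&\ m(k+T_{l+1})+m-1>n\big)\\
&=\sum_{1\le k<\frac{n-t+1}m-1}\sum_{l=1}^k\mathbb P(S_l=k)\,\mathbb P\Big(T_2>\frac{n+1}m-1-k\Big)\\
&\le\sum_{1\le k<\frac{n-t+1}m-1}\mathbb P\Big(T_2>\frac{n+1}m-1-k\Big)\le\sum_{1\le k<\frac{n-t+1}m-1}2\exp\Big(-\frac1\tau\Big(\frac{n+1}m-1-k\Big)\Big)\\
&\le2\exp\Big(-\frac1\tau\Big(\frac{n+1}m-1\Big)\Big)\frac{\exp(1/\tau)\exp\big(\frac1\tau\big(\frac{n-t+1}m-1\big)\big)}{\exp(1/\tau)-1}\le K\tau\exp\Big(-\frac t{m\tau}\Big).
\end{aligned}$$
The steps are justified as follows:

- the first equality follows from the fact that $S_{N+1}\ge N+1$;
- the second equality follows from the definition of $N$;
- the third equality follows from the fact that $T_1,T_2,\dots$ are independent and $T_2,T_3,\dots$ are i.i.d.;
- the second inequality follows from the fact that $S_i\neq S_j$ for $i\neq j$ (see the properties of the split chain at the beginning of the section);
- in the last two steps we summed a geometric series and used $e^{1/\tau}\le e$ and $(e^{1/\tau}-1)^{-1}\le\tau$.

The last two bounds hold since $T_2\ge1$ implies $\tau\ge\|T_2\|_{\psi_1}\ge1/\log2>1$.

Note that if $t>2m\tau\log\tau$, then
$$\tau\exp\Big(-\frac t{m\tau}\Big)\le\exp\Big(-\frac t{2m\tau}\Big)\le\exp\Big(-\frac t{Km\tau\log\tau}\Big).$$
In the last inequality we used the fact that $\tau>c$ for some universal constant $c>1$. On the other hand, if $t<2m\tau\log\tau$, then
$$1\le e\cdot\exp\Big(-\frac t{2m\tau\log\tau}\Big).$$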

Therefore we obtain, for $t\ge0$,
$$\mathbb P(M_n>t)\le K\exp\Big(-\frac t{Km\tau\log\tau}\Big). \tag{10}$$
Now,
$$\mathbb P\Big(\Big|\sum_{i=(S_{N+1}+1)m}^nf(X_i)\Big|>t\Big)\le\mathbb P(M_n>t/a)\le K\exp\Big(-\frac t{Kam\tau\log\tau}\Big). \tag{11}$$

We would like to apply to $Z_1+\dots+Z_N$ the inequalities for sums of independent random variables obtained in the previous section. Since we have only one-dependence here, we will split the sum, treating even and odd indices separately. The number of summands is random, but clearly not larger than $n$. Since the variables $Z_i$ are equidistributed, we can use maximal inequalities by Montgomery-Smith (recalled below) to reduce this random sum to a deterministic one. Write $N=N_n$ to stress the dependence on $n$. By the LLN,
$$\lim_{n\to\infty}\frac{N_n}n=\frac1{m\,\mathbb ET_2}\quad\text{a.s.} \tag{12}$$
In consequence, the asymptotic variance (the variance in the CLT with normalization $\sqrt n$) equals
$$m^{-1}(\mathbb ET_2)^{-1}\big(\operatorname{Var}Z_1+2\,\mathbb EZ_1Z_2\big);$$
we again refer the reader to [14], Chapter 17, for details. We would like our estimate to recover the asymptotic variance, at least for $m=1$ and modulo an absolute constant in the exponent. Therefore, for large $n$, we would like the exponent to be comparable with
$$\frac{t^2}{n\,(m\,\mathbb ET_2)^{-1}\operatorname{Var}Z_1}.$$
To achieve this, we have to provide a quantitative version of (12). To this aim we use the classical Bernstein's inequality, in its version for $\psi_1$ variables.

Lemma 1 (Bernstein's $\psi_1$ inequality, see [21], Lemma … and the subsequent remark). Let $Y_1,\dots,Y_n$ be independent random variables such that $\mathbb EY_i=0$ and $\|Y_i\|_{\psi_1}\le\tau$. Then for every $t>0$,
$$\mathbb P\Big(\Big|\sum_{i=1}^nY_i\Big|>t\Big)\le2\exp\Big(-\frac1K\min\Big(\frac{t^2}{n\tau^2},\frac t\tau\Big)\Big).$$
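For $m=1$, the decomposition (8) and the random index $N$ can be computed from a path. The sketch below splits $f(X_1)+\dots+f(X_n)$ into $Z_0$, the complete blocks $Z_1,\dots,Z_N$ and the final incomplete segment. It then checks (12) numerically for $m=1,2$. It is a minimal illustration with hypothetical function names and 1-based time indices. The $T_i$ are i.i.d. geometric stand-ins for regeneration times, an assumption not tied to any particular chain. `decompose` covers $m=1$ only, while `count_blocks` follows the general definition of $N$.

```python
import numpy as np

def decompose(fx, R):
    """Split f(X_1) + ... + f(X_n) as in (8) when m = 1.

    fx[i-1] = f(X_i), R[i-1] = R_i. Returns (Z_0, [Z_1, ..., Z_N], last_segment).
    """
    S = [i + 1 for i, r in enumerate(R) if r == 1]      # regeneration times <= n
    if not S:                                           # no regeneration up to n
        return sum(fx), [], 0
    z0 = sum(fx[:S[0]])                                 # X_1 .. X_{S_1}
    blocks = [sum(fx[a:b]) for a, b in zip(S, S[1:])]   # X_{S_k+1} .. X_{S_{k+1}}
    return z0, blocks, sum(fx[S[-1]:])                  # X_{S_{N+1}+1} .. X_n

def count_blocks(T, n, m=1):
    """N = sup{i >= 0 : m*S_{i+1} + m - 1 <= n}, with sup of the empty set = 0."""
    S = np.cumsum(T)                                    # S[j] = S_{j+1}
    ok = np.flatnonzero(m * S + m - 1 <= n)
    return int(ok[-1]) if ok.size else 0

# LLN (12): N_n / n -> 1 / (m E T_2); here T_i ~ Geometric(p), so E T_2 = 1/p.
rng = np.random.default_rng(1)
p, n = 0.25, 200_000
T = rng.geometric(p, size=n + 2)                        # enough draws, since each T_i >= 1
ratios = {m: count_blocks(T, n, m) / n for m in (1, 2)}
```

With $p=0.25$ the limits in (12) are $0.25$ for $m=1$ and $0.125$ for $m=2$. The simulated ratios should be close to these values.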

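Assume now that $n/(m\,\mathbb ET_2)\ge1$. We have
$$\begin{aligned}
\mathbb P\big(N>3n/(m\,\mathbb ET_2)\big)&\le\mathbb P\big(m(T_2+\dots+T_{\lfloor3n/(m\mathbb ET_2)\rfloor+1})\le n\big)\\
&=\mathbb P\Big(\sum_{i=2}^{\lfloor3n/(m\mathbb ET_2)\rfloor+1}(T_i-\mathbb ET_2)\le\frac nm-\Big\lfloor\frac{3n}{m\mathbb ET_2}\Big\rfloor\mathbb ET_2\Big)\\
&\le\mathbb P\Big(\sum_{i=2}^{\lfloor3n/(m\mathbb ET_2)\rfloor+1}(T_i-\mathbb ET_2)\le\frac nm-\frac{3n}{2m}\Big)=\mathbb P\Big(\sum_{i=2}^{\lfloor3n/(m\mathbb ET_2)\rfloor+1}(T_i-\mathbb ET_2)\le-\frac n{2m}\Big)
\end{aligned}$$
(here we used $\lfloor x\rfloor\ge x/2$ for $x\ge1$). We have $\|T_2-\mathbb ET_2\|_{\psi_1}\le2\|T_2\|_{\psi_1}\le2\tau$. Therefore Bernstein's inequality (Lemma 1) gives
$$\begin{aligned}
\mathbb P\big(N>3n/(m\,\mathbb ET_2)\big)&\le2\exp\Big(-\frac1K\min\Big(\frac{(n/2m)^2}{\lfloor3n/(m\mathbb ET_2)\rfloor\tau^2},\frac n{m\tau}\Big)\Big)\\
&\le2\exp\Big(-\frac1K\min\Big(\frac{n\,\mathbb ET_2}{m\tau^2},\frac n{m\tau}\Big)\Big)=2\exp\Big(-\frac1K\frac{n\,\mathbb ET_2}{m\tau^2}\Big),
\end{aligned}$$
where the equality follows from the fact that $\mathbb ET_2\le\tau$. If $n/(m\,\mathbb ET_2)<1$, then also $n\,\mathbb ET_2/(m\tau^2)<1$. Thus, finally,
$$\mathbb P\big(N>3n/(m\,\mathbb ET_2)\big)\le K\exp\Big(-\frac1K\frac{n\,\mathbb ET_2}{m\tau^2}\Big). \tag{13}$$
We are now in a position to apply inequalities for sums of independent variables. We will need the following maximal inequality by Montgomery-Smith [15].

Lemma 2. Let $Y_1,\dots,Y_n$ be i.i.d. Banach space valued random variables. Then for some universal constant $C$ and every $t>0$,
$$\mathbb P\Big(\max_{k\le n}\Big\|\sum_{i=1}^kY_i\Big\|>t\Big)\le C\,\mathbb P\Big(\Big\|\sum_{i=1}^nY_i\Big\|>t/C\Big).$$

Note that $|Z_i|\le amT_{i+1}$, so $\|Z_i\|_{\psi_1}\le am\|T_2\|_{\psi_1}\le am\tau$. Denote moreover $R=\lfloor3n/(m\,\mathbb ET_2)\rfloor$. We now combine inequality (13), Lemma 2, Theorem 1 (with $\alpha=1$) and Pisier's estimate (5). This gives
$$\begin{aligned}
\mathbb P(|Z_1+\dots+Z_N|>2t)&\le\mathbb P(|Z_1+\dots+Z_N|>2t\ \&\ N\le R)+K\exp\Big(-\frac1K\frac{n\,\mathbb ET_2}{m\tau^2}\Big)\\
&\le\mathbb P\big(|Z_1+Z_3+\dots+Z_{2\lfloor(N-1)/2\rfloor+1}|>t\ \&\ N\le R\big)\\
&\quad+\mathbb P\big(|Z_2+Z_4+\dots+Z_{2\lfloor N/2\rfloor}|>t\ \&\ N\le R\big)+K\exp\Big(-\frac1K\frac{n\,\mathbb ET_2}{m\tau^2}\Big)\\
&\le\mathbb P\Big(\max_{k\le\lfloor(R-1)/2\rfloor}|Z_1+Z_3+\dots+Z_{2k+1}|>t\Big)+\mathbb P\Big(\max_{k\le\lfloor R/2\rfloor}|Z_2+\dots+Z_{2k}|>t\Big)+K\exp\Big(-\frac1K\frac{n\,\mathbb ET_2}{m\tau^2}\Big)\\
&\le C\,\mathbb P\big(|Z_1+Z_3+\dots+Z_{2\lfloor(R-1)/2\rfloor+1}|>t/C\big)+C\,\mathbb P\big(|Z_2+\dots+Z_{2\lfloor R/2\rfloor}|>t/C\big)+K\exp\Big(-\frac1K\frac{n\,\mathbb ET_2}{m\tau^2}\Big)\\
&\le K\exp\Big(-\frac1K\min\Big(\frac{t^2}{n(m\,\mathbb ET_2)^{-1}\operatorname{Var}Z_1},\frac t{\log(3n/(m\mathbb ET_2))\,am\tau}\Big)\Big)+K\exp\Big(-\frac1K\frac{n\,\mathbb ET_2}{m\tau^2}\Big).
\end{aligned}$$
Combining the above estimate with (8), (9) and (11), we obtain
$$\begin{aligned}
\mathbb P(|Z|>4t)&\le K\exp\Big(-\frac1K\min\Big(\frac{t^2}{n(m\,\mathbb ET_2)^{-1}\operatorname{Var}Z_1},\frac t{\log(3n/(m\mathbb ET_2))\,am\tau}\Big)\Big)\\
&\quad+K\exp\Big(-\frac1K\frac{n\,\mathbb ET_2}{m\tau^2}\Big)+2\exp\Big(-\frac t{2am\tau}\Big)+K\exp\Big(-\frac t{Kam\tau\log\tau}\Big).
\end{aligned}$$
For $t>na/4$ the left-hand side of the above inequality is equal to $0$. Using also the facts that $\mathbb ET_2\ge1$ and $\tau>1$, we finally obtain the following theorem.

Theorem 4. Let $X_1,X_2,\dots,X_n$ be a Markov chain with values in $S$, satisfying the Regeneration assumption and such that $\|T_1\|_{\psi_1},\|T_2\|_{\psi_1}\le\tau$. Consider a function $f\colon S\to\mathbb R$ such that $\|f\|_\infty\le a$ and $\mathbb E_\pi f=0$, where $\pi$ is the unique invariant measure for the chain. Then for all $t>0$,
$$\mathbb P\big(|f(X_1)+\dots+f(X_n)|>t\big)\le K\exp\Big(-\frac1K\min\Big(\frac{t^2}{n(m\,\mathbb ET_2)^{-1}\operatorname{Var}Z_1},\frac t{\tau^2am\log n}\Big)\Big).$$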
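For orientation only, the two terms in the minimum in Theorem 4 balance at an explicit level $t_*$. The universal constant $K$ is not specified, so it is ignored here.
$$\frac{t^2\,m\,\mathbb ET_2}{n\operatorname{Var}Z_1}=\frac t{\tau^2am\log n}\iff t=t_*:=\frac{n\operatorname{Var}Z_1}{m^2\,\mathbb ET_2\,\tau^2a\log n}.$$
For $t\le t_*$ the bound is therefore of Gaussian type, with the asymptotic variance $n(m\,\mathbb ET_2)^{-1}\operatorname{Var}Z_1$ as the scale. For $t\ge t_*$ it is exponential, with scale $\tau^2am\log n$.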

Remark. It is clear from the above proof that one can separate the dependence on $\|T_1\|_{\psi_1}$ and on $\|T_2\|_{\psi_1}$ in the tail estimate. We are interested rather in the behaviour with respect to $n$, and these quantities are constant for a fixed chain. We therefore prefer to use the single parameter $\tau$, so as not to complicate the final formula.

We now prove a version of the above theorem for suprema of empirical processes, under the additional assumption that $m=1$.

Theorem 5. In the setting of Theorem 4, let $\mathcal F$ be a countable class of measurable functions $f\colon S\to\mathbb R$ such that $\|f\|_\infty\le a$ and $\mathbb E_\pi f=0$. Assume additionally that $m=1$. Define the random variable
$$Z=\sup_{f\in\mathcal F}\Big|\sum_{i=1}^nf(X_i)\Big|$$
and the asymptotic weak variance
$$\sigma^2=\sup_{f\in\mathcal F}\operatorname{Var}Z_1(f)/\mathbb ET_2.$$
Then, for all $t\ge1$,
$$\mathbb P(Z\ge K\,\mathbb EZ+t)\le K\exp\Big(-\frac1K\min\Big(\frac{t^2}{n\sigma^2},\frac t{\tau^3(\mathbb ET_2)^{-1}a\log n}\Big)\Big).$$

Remark. In the above theorem the dependence on the chain is worse than in Theorem 4: the denominator contains $\tau^3(\mathbb ET_2)^{-1}$ instead of $\tau^2$. This is the result of just one step in the argument presented below. At present, however, we do not know how to improve this dependence or how to extend the result to $m>1$.

We now pass to the proof of Theorem 5. Assume that $\sup_{f\in\mathcal F}\|f\|_\infty\le a$. Similarly as in the real valued case (see inequalities (9) and (11)), we have
$$\mathbb P\Big(\sup_f|Z_0(f)|\ge t\Big)\le\mathbb P(T_1\ge t/a)\le2\exp\Big(-\frac t{a\tau}\Big), \tag{14}$$
$$\mathbb P\Big(\sup_f\Big|\sum_{i=S_{N+1}+1}^nf(X_i)\Big|>t\Big)\le\mathbb P(M_n>t/a)\le K\exp\Big(-\frac t{Ka\tau\log\tau}\Big). \tag{15}$$

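In the case $m=1$, splitting $Z_1+\dots+Z_N$ into sums over even and odd indices is not necessary, since the summands are independent. Lemma 2 is valid for Banach space valued variables. We can therefore repeat the argument from the proof of Theorem 4 and obtain, for $R=\lfloor3n/\mathbb ET_2\rfloor$,
$$\begin{aligned}
\mathbb P\Big(Z\ge K\,\mathbb E\sup_f\Big|\sum_{i=1}^RZ_i\Big|+t\Big)&\le K\exp\Big(-\frac1K\min\Big(\frac{t^2}{n\sigma^2},\frac t{\tau^2a\log n}\Big)\Big)\\
&\le K\exp\Big(-\frac1K\min\Big(\frac{t^2}{n\sigma^2},\frac t{\tau^3(\mathbb ET_2)^{-1}a\log n}\Big)\Big).
\end{aligned}$$
Thus Theorem 5 will follow if we prove that
$$\mathbb E\sup_f\Big|\sum_{i=1}^RZ_i\Big|\le K\,\mathbb E\sup_f\Big|\sum_{i=1}^nf(X_i)\Big|+K\tau^3a/\mathbb ET_2 \tag{16}$$
(recall that $K$ may change from line to line).

Proof of (16). We use the triangle inequality, the fact that the blocks $Y_i=(X_{S_i+1},\dots,X_{S_{i+1}})$, $i\ge1$, are i.i.d., and Jensen's inequality. These give
$$\mathbb E\sup_f\Big|\sum_{i=1}^RZ_i\Big|\le12\,\mathbb E\sup_f\Big|\sum_{i=1}^{\lceil n/(4\mathbb ET_2)\rceil}Z_i\Big|\le12\,\mathbb E\sup_f\Big|\sum_{i=1}^{\lfloor n/(4\mathbb ET_2)\rfloor}Z_i\Big|+12a\tau, \tag{17}$$
where in the last inequality we used the fact that $\mathbb E\sup_f|Z_i|\le a\,\mathbb ET_{i+1}\le a\tau$. We split the expectation on the right-hand side into two parts, depending on the size of the variable $N$. Let us first consider the quantity
$$\mathbb E\sup_f\Big|\sum_{i=1}^{\lfloor n/(4\mathbb ET_2)\rfloor}Z_i\Big|\mathbf 1_{\{N<\lfloor n/(4\mathbb ET_2)\rfloor\}}.$$
Assume that $n/(4\mathbb ET_2)\ge1$. Then, using Bernstein's inequality, we obtain
$$\begin{aligned}
\mathbb P\big(N<\lfloor n/(4\mathbb ET_2)\rfloor\big)&=\mathbb P\big(T_1+\dots+T_{\lfloor n/(4\mathbb ET_2)\rfloor+1}>n\big)\\
&\le\mathbb P(T_1>n/2)+\mathbb P\Big(\sum_{i=2}^{\lfloor n/(4\mathbb ET_2)\rfloor+1}T_i>\frac n2\Big)\\
&\le2e^{-n/(2\tau)}+\mathbb P\Big(\sum_{i=2}^{\lfloor n/(4\mathbb ET_2)\rfloor+1}(T_i-\mathbb ET_2)>\frac n2-\Big\lfloor\frac n{4\mathbb ET_2}\Big\rfloor\mathbb ET_2\Big)\\
&\le2e^{-n/(2\tau)}+\mathbb P\Big(\sum_{i=2}^{\lfloor n/(4\mathbb ET_2)\rfloor+1}(T_i-\mathbb ET_2)>\frac n4\Big)\\
&\le2e^{-n/(2\tau)}+2\exp\Big(-\frac1K\min\Big(\frac{n\,\mathbb ET_2}{\tau^2},\frac n\tau\Big)\Big)\le Ke^{-n\mathbb ET_2/(K\tau^2)}.
\end{aligned}$$
If $n/(4\mathbb ET_2)<1$, the above estimate holds trivially. Therefore
$$\begin{aligned}
\mathbb E\sup_f\Big|\sum_{i=1}^{\lfloor n/(4\mathbb ET_2)\rfloor}Z_i\Big|\mathbf 1_{\{N<\lfloor n/(4\mathbb ET_2)\rfloor\}}&\le a\sum_{i=1}^{\lfloor n/(4\mathbb ET_2)\rfloor}\mathbb E\,T_{i+1}\mathbf 1_{\{N<\lfloor n/(4\mathbb ET_2)\rfloor\}}\\
&\le an\|T_2\|_2\,\mathbb P\big(N<\lfloor n/(4\mathbb ET_2)\rfloor\big)^{1/2}\le Ka\tau ne^{-n\mathbb ET_2/(K\tau^2)}\le Ka\tau^3/\mathbb ET_2.
\end{aligned}\tag{18}$$
Now we bound the remaining part, i.e.
$$\mathbb E\sup_f\Big|\sum_{i=1}^{\lfloor n/(4\mathbb ET_2)\rfloor}Z_i\Big|\mathbf 1_{\{N\ge\lfloor n/(4\mathbb ET_2)\rfloor\}}.$$
Recall that $Y_0=(X_1,\dots,X_{T_1})$ and $Y_i=(X_{S_i+1},\dots,X_{S_{i+1}})$ for $i\ge1$. Consider the filtration $(\mathcal F_i)_{i\ge0}$ defined by $\mathcal F_i=\sigma(Y_0,\dots,Y_i)$. Here we regard the blocks $Y_i$ as random variables with values in the disjoint sum $\bigcup_{i=1}^\infty S^i$. This space carries the natural $\sigma$-field, i.e. the $\sigma$-field generated by $\bigcup_{i=1}^\infty\mathcal B^{\otimes i}$ (recall that $\mathcal B$ denotes our reference $\sigma$-field in $S$). Note further that $T_i$ is measurable with respect to $\sigma(Y_{i-1})$ for $i\ge1$. For $i\ge1$ we have
$$\{N+1\le i\}=\{T_1+\dots+T_{i+1}>n\}\in\mathcal F_i$$
and $\{N+1\le0\}=\emptyset$, so $N+1$ is a stopping time with respect to the filtration $(\mathcal F_i)$. Thus we have
$$\begin{aligned}
\mathbb E\sup_f\Big|\sum_{i=1}^{\lfloor n/(4\mathbb ET_2)\rfloor}Z_i\Big|\mathbf 1_{\{N\ge\lfloor n/(4\mathbb ET_2)\rfloor\}}&=\mathbb E\sup_f\Big|\sum_{i=1}^{\lfloor n/(4\mathbb ET_2)\rfloor\wedge(N+1)}Z_i\Big|\mathbf 1_{\{N+1>\lfloor n/(4\mathbb ET_2)\rfloor\}}\\
&\le\mathbb E\Big[\mathbb E\Big(\sup_f\Big|\sum_{i=1}^{N+1}Z_i\Big|\,\Big|\,\mathcal F_{\lfloor n/(4\mathbb ET_2)\rfloor\wedge(N+1)}\Big)\mathbf 1_{\{N+1>\lfloor n/(4\mathbb ET_2)\rfloor\}}\Big]\\
&=\mathbb E\sup_f\Big|\sum_{i=1}^{N+1}Z_i\Big|\mathbf 1_{\{N+1>\lfloor n/(4\mathbb ET_2)\rfloor\}}\\
&\le a\tau+\mathbb E\sup_f\Big|\sum_{i=0}^{N+1}Z_i\Big|\\
&\le a\tau+\mathbb E\sup_f\Big|\sum_{i=1}^nf(X_i)\Big|+\mathbb E\sup_f\Big|\sum_{i=n+1}^{S_{N+2}}f(X_i)\Big|.
\end{aligned}$$
The first inequality uses Doob's optional sampling theorem, together with the fact that $\sup_f|\sum_{i=1}^kZ_i|$ is a submartingale with respect to $(\mathcal F_k)$. Note that $Z_i=Z_i(f)$ is measurable with respect to $\sigma(Y_i)$ for $i\in\mathbb N$ and $f\in\mathcal F$. The second equality follows from the fact that $\{N+1>\lfloor n/(4\mathbb ET_2)\rfloor\}\in\mathcal F_{\lfloor n/(4\mathbb ET_2)\rfloor\wedge(N+1)}$. Indeed, for $i\ge\lfloor n/(4\mathbb ET_2)\rfloor$ we have
$$\{N+1>\lfloor n/(4\mathbb ET_2)\rfloor\ \&\ \lfloor n/(4\mathbb ET_2)\rfloor\wedge(N+1)\le i\}=\{N+1>\lfloor n/(4\mathbb ET_2)\rfloor\}\in\mathcal F_{\lfloor n/(4\mathbb ET_2)\rfloor}\subseteq\mathcal F_i,$$
whereas for $i<\lfloor n/(4\mathbb ET_2)\rfloor$ this set is empty.

Now combine the above estimate with (17) and (18), and take into account the inequality $\tau\ge\mathbb ET_2\ge1$. It is then easy to see that, to finish the proof of (16), it is enough to show that
$$\mathbb E\sup_f\Big|\sum_{i=n+1}^{S_{N+2}}f(X_i)\Big|\le Ka\tau^3/\mathbb ET_2. \tag{19}$$
This in turn will follow if we prove that $\mathbb E(S_{N+2}-n)\le K\tau^3/\mathbb ET_2$.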
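The overshoot $S_{N+2}-n$ controlled in (19) can be explored numerically. The sketch below assumes $m=1$ and i.i.d. geometric regeneration times. This corresponds to starting the chain from $\nu$, so that $T_1$ has the law of $T_2$. For this law, $\|T_2\|_{\psi_1}$ has the closed form $1/\log(2/(2-p))$, obtained by solving $\mathbb Ee^{T_2/\lambda}=2$. The function names are hypothetical. The comparison with $\tau\log\tau$ only illustrates the order of magnitude in the bound $\mathbb E(S_{N+2}-n)\le K\tau\log\tau$ derived below, since $K$ is unspecified.

```python
import math
import numpy as np

def psi1_norm_geometric(p: float) -> float:
    """||T||_{psi_1} for T ~ Geometric(p) on {1, 2, ...}.

    E exp(sT) = p e^s / (1 - (1 - p) e^s); setting it equal to 2 gives
    e^s = 2 / (2 - p), and the norm is 1 / s.
    """
    return 1.0 / math.log(2.0 / (2.0 - p))

def overshoot(T, n: int) -> int:
    """S_{N+2} - n with N = sup{i >= 0 : S_{i+1} <= n} (m = 1, sup of empty set = 0)."""
    S = np.cumsum(T)                         # S[j] = S_{j+1}
    ok = np.flatnonzero(S <= n)
    N = int(ok[-1]) if ok.size else 0
    return int(S[N + 1]) - n                 # S_{N+2} is stored at index N + 1

rng = np.random.default_rng(2)
p, n, trials = 0.2, 1_000, 4_000
tau = psi1_norm_geometric(p)
mean_overshoot = np.mean([overshoot(rng.geometric(p, size=n + 2), n)
                          for _ in range(trials)])
```

By memorylessness the overshoot is again geometric here, so its mean is close to $1/p$. This is well below $\tau\log\tau$, consistent with the bound.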

Recall that (10) states, under our assumptions ($m=1$), that $\mathbb P(n-S_{N+1}>t)\le K\exp(-t/(K\tau\log\tau))$ for $t\ge0$. We have
$$\begin{aligned}
\mathbb P(S_{N+2}-n>t)&\le\mathbb P(n-S_{N+1}>t)+\mathbb P(n-S_{N+1}\le t\ \&\ S_{N+2}-n>t\ \&\ N>0)+\mathbb P(S_{N+2}-n>t\ \&\ N=0)\\
&\le Ke^{-t/(K\tau\log\tau)}+\sum_{0\le k\le t\wedge n}\mathbb P(S_{N+1}=n-k\ \&\ T_{N+2}>t+k\ \&\ N>0)+\mathbb P(T_1+T_2>t)\\
&\le Ke^{-t/(K\tau\log\tau)}+\sum_{0\le k\le t\wedge n}\sum_{l=2}^{n-k}\mathbb P(S_l=n-k\ \&\ T_{l+1}>t+k)+2e^{-t/(2\tau)}\\
&\le Ke^{-t/(K\tau\log\tau)}+\sum_{0\le k\le t\wedge n}\sum_{l=2}^{n-k}\mathbb P(S_l=n-k)\,\mathbb P(T_{l+1}>t+k)+2e^{-t/(2\tau)}\\
&\le Ke^{-t/(K\tau\log\tau)}+2(t+1)e^{-t/\tau}+2e^{-t/(2\tau)}\le Ke^{-t/(K\tau\log\tau)}.
\end{aligned}$$
This implies that $\mathbb E(S_{N+2}-n)\le K\tau\log\tau\le K\tau^3/\mathbb ET_2$, which proves (19). Thus (16) is shown and Theorem 5 follows.

5 Another counterexample

If we do not pay attention to constants, the inequalities of the previous section differ from the classical Bernstein's inequality for sums of i.i.d. bounded variables mainly by the additional factor $\log n$. We now argue that under the assumptions of Theorems 4 and 5 this additional factor is indispensable. More precisely, we construct a Markov chain on a countable state space that satisfies the assumptions of Theorem 4 with $m=1$. For this chain and any $\beta<1$, there is no constant $K$ such that
$$\mathbb P\big(|f(X_1)+\dots+f(X_n)|\ge t\big)\le K\exp\Big(-\frac1K\min\Big(\frac{t^2}{n\operatorname{Var}Z_1},\frac t{\log^\beta n}\Big)\Big) \tag{20}$$
for all $n$ and all functions $f\colon S\to\mathbb R$ with $\|f\|_\infty\le1$ and $\mathbb E_\pi f=0$.

The state space of the chain will be the set
$$S=\{0\}\cup\bigcup_{n=1}^\infty\{n\}\times\{1,2,\dots,n\}\times\{+1,-1\}.$$

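The transition probabilities are as follows:
$$\begin{aligned}
p_{(n,k,s),(n,k+1,s)}&=1&&\text{for }n=1,2,\dots,\ k=1,2,\dots,n-1,\ s=-1,+1,\\
p_{(n,n,s),0}&=1&&\text{for }n=1,2,\dots,\ s=-1,+1,\\
p_{0,(n,1,s)}&=\frac1{2A}e^{-n}&&\text{for }n=1,2,\dots,\ s=-1,+1,
\end{aligned}$$
where $A=\sum_{n=1}^\infty e^{-n}$. In other words, whenever a particle is at $0$, it chooses one of countably many loops. It then travels deterministically along that loop until the next return to $0$. It is easy to check that this chain has a stationary distribution $\pi$ given by
$$\pi_0=\frac A{A+\sum_{n=1}^\infty ne^{-n}},\qquad\pi_{(n,i,s)}=\frac1{2A}e^{-n}\pi_0.$$
This chain satisfies the minorization condition (6) with $C=\{0\}$, $\nu(\{x\})=p_{0,x}$, $\delta=1$ and $m=1$. The random variable $T_1$ is now just the time of the first visit to $0$. The variables $T_2,T_3,\dots$ are the times between consecutive visits to $0$. A loop of length $n$ takes $n+1$ steps to return to $0$. Hence $\mathbb P(T_2=n+1)=e^{-n}/A$, and so $\|T_2\|_{\psi_1}<\infty$. If we start the chain from the initial distribution $\nu$, then $T_1$ has the same law as $T_2$, so $\tau=\|T_2\|_{\psi_1}=\|T_1\|_{\psi_1}$.

Let us now assume that there is a constant $K$ such that (20) holds. Since we work with a fixed chain, in what follows the letter $K$ also denotes constants depending on our chain (which again may differ at different occurrences). In particular, we can apply (20) to the function $f=f_r$, where $r$ is a large integer, given by the formula
$$f_r(0)=0,\qquad f_r((n,i,s))=s\mathbf 1_{\{n\ge r\}}.$$
We have $\mathbb E_\pi f_r=0$. Moreover,
$$\operatorname{Var}Z_1(f_r)=\sum_{n=r}^\infty n^2\frac{e^{-n}}A\le Kr^2e^{-r}.$$
Therefore (20) gives
$$\mathbb P\big(|f_r(X_1)+\dots+f_r(X_n)|\ge K(re^{-r/2}\sqrt{nt}+t\log^\beta n)\big)\le e^{-t} \tag{21}$$
for $t\ge1$ and $n\in\mathbb N$.

Recall that $S_i=T_1+\dots+T_i$. By Bernstein's inequality (Lemma 1), we have for large $n$
$$\begin{aligned}
\mathbb P(S_{\lfloor n/(3\mathbb ET_2)\rfloor}>n)&=\mathbb P(T_1+\dots+T_{\lfloor n/(3\mathbb ET_2)\rfloor}>n)=\mathbb P\Big(\sum_{i=1}^{\lfloor n/(3\mathbb ET_2)\rfloor}(T_i-\mathbb ET_i)>n-\Big\lfloor\frac n{3\mathbb ET_2}\Big\rfloor\mathbb ET_2\Big)\\
&\le\mathbb P\Big(\sum_{i=1}^{\lfloor n/(3\mathbb ET_2)\rfloor}(T_i-\mathbb ET_i)>n/2\Big)\le2\exp\Big(-\frac1K\min\Big(\frac{n^2}{n\|T_2\|_{\psi_1}^2},\frac n{\|T_2\|_{\psi_1}}\Big)\Big)=2e^{-n/K}.
\end{aligned}$$
From the above estimate we get the following bound, for some integer $L$ and for $n$ large enough and divisible by $L$:
$$\begin{aligned}
&\mathbb P\Big(\Big|\sum_{i=0}^{n/L}Z_i(f_r)\Big|\ge2K\big(re^{-r/2}\sqrt{nt}+t\log^\beta n\big)\Big)\\
&\le2e^{-n/K}+\mathbb P\Big(\Big|\sum_{i=0}^{n/L}Z_i(f_r)\Big|\ge2K\big(re^{-r/2}\sqrt{nt}+t\log^\beta n\big)\ \&\ S_{n/L+1}\le n\Big)\\
&\le2e^{-n/K}+\mathbb P\Big(\Big|\sum_{i=1}^nf_r(X_i)\Big|\ge K\big(re^{-r/2}\sqrt{nt}+t\log^\beta n\big)\Big)\\
&\quad+\mathbb P\Big(\Big|\sum_{i=S_{n/L+1}+1}^nf_r(X_i)\Big|\ge K\big(re^{-r/2}\sqrt{nt}+t\log^\beta n\big)\ \&\ S_{n/L+1}\le n\Big)\\
&\le2e^{-n/K}+e^{-t}+\sum_{k\le n}\mathbb P\Big(\Big|\sum_{i=k+1}^nf_r(X_i)\Big|\ge K\big(re^{-r/2}\sqrt{nt}+t\log^\beta n\big)\ \&\ S_{n/L+1}=k\Big)\\
&=2e^{-n/K}+e^{-t}+\sum_{k\le n}\mathbb E\Big[\mathbf 1_{\{S_{n/L+1}=k\}}\,\mathbb P\Big(\Big|\sum_{i=1}^{n-k}f_r(X_i)\Big|\ge K\big(re^{-r/2}\sqrt{nt}+t\log^\beta n\big)\Big)\Big]\\
&\le2e^{-n/K}+e^{-t}+e^{-t}\sum_{k\le n}\mathbb E\mathbf 1_{\{S_{n/L+1}=k\}}\le2e^{-n/K}+2e^{-t},
\end{aligned}$$
where in the third and fourth inequalities we used (21), and in the equality the Markov property. For $n\approx r^{-2}e^r$ and $t\ge1$, we obtain
$$\mathbb P\big(|Z_0(f_r)+\dots+Z_{n/L}(f_r)|\ge Kt\log^\beta n\big)\le2e^{-t}+2e^{-n/K}. \tag{22}$$
On the other hand, $\mathbb P(|Z_i(f_r)|\ge r)>\frac1{2A}e^{-r}$. Therefore
$$\mathbb P\Big(\max_{i\le n/L}|Z_i(f_r)|\ge r\Big)\ge\frac12\min\big(ne^{-r}/(2AL),1\big).$$
Since the $Z_i(f_r)$ are symmetric, Lévy's inequality gives
$$2\,\mathbb P\big(|Z_0(f_r)+\dots+Z_{n/L}(f_r)|\ge r\big)\ge\frac12\min\big(ne^{-r}/(2AL),1\big)\ge\frac c{r^2}.$$
On the other hand, we apply (22) with $t=K^{-1}r/\log^\beta n\approx K^{-1}r^{1-\beta}\ge1$. This gives
$$\mathbb P\big(|Z_0(f_r)+\dots+Z_{n/L}(f_r)|\ge r\big)\le2e^{-r^{1-\beta}/K}+2e^{-e^r/(Kr^2)},$$
which is a contradiction.

6 A bounded difference type inequality for symmetric functions

We now present an inequality for more general statistics of the chain. We keep the same assumptions on the chain as above, with the additional restriction that $m=1$. Under these assumptions we prove a version of the bounded difference inequality for symmetric functions (see e.g. [9] for the classical i.i.d. case). Consider a measurable function $f\colon S^n\to\mathbb R$ which is invariant under permutations of its arguments, i.e.
$$f(x_1,\dots,x_n)=f(x_{\sigma(1)},\dots,x_{\sigma(n)}) \tag{23}$$
for all permutations $\sigma$ of the set $\{1,\dots,n\}$. Assume also that $f$ is $L$-Lipschitz with respect to the Hamming distance, i.e.
$$|f(x_1,\dots,x_n)-f(y_1,\dots,y_n)|\le L\,\#\{i\colon x_i\neq y_i\}. \tag{24}$$
Then we have the following theorem.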

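Theorem 6. Let $X_1,\dots,X_n$ be a Markov chain with values in $S$, satisfying the Regeneration assumption with $m=1$ and $\|T_1\|_{\psi_1},\|T_2\|_{\psi_1}\le\tau$. Then for every function $f\colon S^n\to\mathbb R$ satisfying (23) and (24), we have
$$\mathbb P\big(|f(X_1,\dots,X_n)-\mathbb Ef(X_1,\dots,X_n)|\ge t\big)\le2\exp\Big(-\frac1K\frac{t^2}{nL^2\tau^2}\Big)$$
for all $t\ge0$.

To prove the above theorem, we need the following lemma.

Lemma 3. Let $\varphi\colon\mathbb R\to\mathbb R$ be a convex function and $G=f(Y_1,\dots,Y_n)$, where $Y_1,\dots,Y_n$ are independent random variables with values in a measurable space $E$. Let $(\tilde Y_1,\dots,\tilde Y_n)$ be an independent copy of $(Y_1,\dots,Y_n)$, and denote $G^i=f(Y_1,\dots,Y_{i-1},\tilde Y_i,Y_{i+1},\dots,Y_n)$. Assume that
$$|G-G^i|\le F_i(Y_i,\tilde Y_i)$$
for some functions $F_i\colon E^2\to\mathbb R$, $i=1,\dots,n$. Then
$$\mathbb E\varphi(G-\mathbb EG)\le\mathbb E\varphi\Big(\sum_{i=1}^n\varepsilon_iF_i(Y_i,\tilde Y_i)\Big), \tag{25}$$
where $\varepsilon_1,\dots,\varepsilon_n$ is a sequence of independent Rademacher variables, independent of $(Y_i)_{i=1}^n$ and $(\tilde Y_i)_{i=1}^n$.

Proof. We use induction with respect to $n$. For $n=0$ the statement is obvious, since both the left-hand and the right-hand side of (25) equal $\varphi(0)$. Assume therefore that the lemma is true for $n-1$. Denote by $\mathbb E_X$ integration with respect to the variable $X$. Then
$$\begin{aligned}
\mathbb E\varphi(G-\mathbb EG)&=\mathbb E\varphi(G-\mathbb E_{\tilde Y_n}G^n+\mathbb E_{Y_n}G-\mathbb EG)\le\mathbb E\varphi(G-G^n+\mathbb E_{Y_n}G-\mathbb EG)\\
&=\mathbb E\varphi(G^n-G+\mathbb E_{Y_n}G-\mathbb EG)=\mathbb E\varphi(\varepsilon_n(G-G^n)+\mathbb E_{Y_n}G-\mathbb EG)\\
&\le\mathbb E\varphi(\varepsilon_nF_n(Y_n,\tilde Y_n)+\mathbb E_{Y_n}G-\mathbb EG).
\end{aligned}$$
The first inequality follows from Jensen's inequality and the equalities follow from symmetry. The last inequality follows from the contraction principle, applied conditionally on $(Y_i)_i$ and $(\tilde Y_i)_i$. Now denote $Z=\mathbb E_{Y_n}G$ and $Z^i=\mathbb E_{Y_n}G^i$. For $i=1,\dots,n-1$ we have
$$|Z-Z^i|=|\mathbb E_{Y_n}G-\mathbb E_{Y_n}G^i|\le\mathbb E_{Y_n}|G-G^i|\le F_i(Y_i,\tilde Y_i).$$
Thus, for fixed $Y_n$, $\tilde Y_n$ and $\varepsilon_n$, we can apply the induction assumption with $\mathbb E_{Y_n}G$ in place of $G$. We use the convex function $t\mapsto\varphi(\varepsilon_nF_n(Y_n,\tilde Y_n)+t)$ in place of $\varphi$. This gives
$$\mathbb E\varphi(G-\mathbb EG)\le\mathbb E\varphi\Big(\sum_{i=1}^nF_i(Y_i,\tilde Y_i)\varepsilon_i\Big).$$

Lemma 4. In the setting of Lemma 3, assume that $\|F_i(Y_i,\tilde Y_i)\|_{\psi_1}\le\tau$ for all $i$. Then for all $t>0$,
$$\mathbb P\big(|f(Y_1,\dots,Y_n)-\mathbb Ef(Y_1,\dots,Y_n)|\ge t\big)\le2\exp\Big(-\frac1K\min\Big(\frac{t^2}{n\tau^2},\frac t\tau\Big)\Big).$$

Proof. For $p\ge1$,
$$\|f(Y_1,\dots,Y_n)-\mathbb Ef(Y_1,\dots,Y_n)\|_p\le\Big\|\sum_{i=1}^n\varepsilon_iF_i(Y_i,\tilde Y_i)\Big\|_p\le K(\sqrt{pn}\,\tau+p\tau).$$
The first inequality follows from Lemma 3 with $\varphi(x)=|x|^p$. The second follows from Bernstein's inequality and integration by parts. Now, by Chebyshev's inequality we get
$$\mathbb P\big(|f(Y_1,\dots,Y_n)-\mathbb Ef(Y_1,\dots,Y_n)|\ge K(\sqrt{tn}+t)\tau\big)\le e^{-t}\quad\text{for }t\ge1.$$
Up to the constant in the exponent, this is equivalent to the statement of the lemma. Note that if we may change the constant in the exponent, the constant in front of the exponential can be chosen arbitrarily, provided it is bigger than $1$.

Proof of Theorem 6. Consider the disjoint sum $E=\bigcup_{i=1}^\infty S^i$ and a function $\tilde f\colon E^n\to\mathbb R$ defined by $\tilde f(y_1,\dots,y_n)=f(x_1,\dots,x_n)$. Here the $x_i$'s are defined by the condition
$$\begin{aligned}
y_1&=(x_1,\dots,x_{t_1})\in S^{t_1},\\
y_2&=(x_{t_1+1},\dots,x_{t_1+t_2})\in S^{t_2},\\
&\ \ \vdots\\
y_n&=(x_{t_1+\dots+t_{n-1}+1},\dots,x_{t_1+\dots+t_n})\in S^{t_n}.
\end{aligned}\tag{26}$$
Only the first $n$ coordinates are used. Since each $t_i\ge1$, there are always at least $n$ of them.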
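The map $\tilde f$ in (26) reads only the first $n$ coordinates of the concatenated blocks. The following sketch checks the rearrangement fact on which the proof of Theorem 6 relies. After one block is replaced, the first $n$ coordinates can be permuted so that at most $\max(T_i,\tilde T_i)$ of them differ. The helper names are hypothetical. Block contents drawn from a small alphabet and geometric block lengths are assumptions of the sketch. For the symmetric, Hamming 1-Lipschitz example $f(x)=\#\{\text{distinct values among }x_1,\dots,x_n\}$, this gives $|G-G^i|\le\max(T_i,\tilde T_i)$.

```python
import random
from collections import Counter
from itertools import chain

def first_n(blocks, n):
    """x_1, ..., x_n read off the concatenated blocks y_1, y_2, ..., as in (26)."""
    return list(chain.from_iterable(blocks))[:n]

def rearranged_hamming(x, y):
    """Smallest Hamming distance between x and some permutation of y (len(x) == len(y))."""
    common = sum((Counter(x) & Counter(y)).values())   # size of the multiset intersection
    return len(x) - common

def n_distinct(x):
    """A symmetric function that is 1-Lipschitz for the Hamming distance (L = 1)."""
    return len(set(x))

def count_violations(trials=500, n=12, seed=0):
    """Replace one random block and count failures of the two bounds (expected: none)."""
    rng = random.Random(seed)

    def block():
        length = 1
        while rng.random() < 0.5:                       # geometric block length >= 1
            length += 1
        return [rng.randrange(5) for _ in range(length)]

    bad = 0
    for _ in range(trials):
        ys = [block() for _ in range(n)]
        i = rng.randrange(n)
        ys_i = ys[:i] + [block()] + ys[i + 1:]
        x, x_i = first_n(ys, n), first_n(ys_i, n)
        bound = max(len(ys[i]), len(ys_i[i]))           # max(T_i, T~_i)
        if rearranged_hamming(x, x_i) > bound:
            bad += 1
        if abs(n_distinct(x) - n_distinct(x_i)) > 1 * bound:
            bad += 1
    return bad
```

The bound holds because the blocks after position $i$ contribute a common prefix to both sequences, so only the replaced block's share can differ.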

Let now $T_1,\dots,T_n$ be the regeneration times of the chain and set
$$Y_i=(X_{T_1+\dots+T_{i-1}+1},\dots,X_{T_1+\dots+T_i}),\qquad i=1,\dots,n.$$
This changes the enumeration with respect to previous sections, but there is no longer any need to distinguish the initial block. Then $Y_1,\dots,Y_n$ are independent $E$-valued random variables (recall the assumption $m=1$). Moreover, $f(X_1,\dots,X_n)=\tilde f(Y_1,\dots,Y_n)$. Let now $(\tilde Y_1,\dots,\tilde Y_n)$ be an independent copy of the sequence $(Y_1,\dots,Y_n)$. Define $G$ and $G^i$ as in Lemma 3 for the function $\tilde f$. Define also $\tilde T_i=j$ if and only if $\tilde Y_i\in S^j$. Let $\tilde X^i_1,\dots,\tilde X^i_{T_1+\dots+T_{i-1}+\tilde T_i+T_{i+1}+\dots+T_n}$ correspond to $(Y_1,\dots,Y_{i-1},\tilde Y_i,Y_{i+1},\dots,Y_n)$ in the same way as in (26). The sequence $\tilde X^i_1,\dots,\tilde X^i_n$ can be rearranged so that the Hamming distance of the new sequence from $(X_1,\dots,X_n)$ does not exceed $\max(T_i,\tilde T_i)$. Since $f$ is invariant under permutations of its arguments and $L$-Lipschitz with respect to the Hamming distance, we have
$$|G-G^i|\le L\max(T_i,\tilde T_i)=:F(Y_i,\tilde Y_i).$$
Moreover, $\|F(Y_i,\tilde Y_i)\|_{\psi_1}\le2L\tau$, so by Lemma 4 we obtain
$$\mathbb P\big(|f(X_1,\dots,X_n)-\mathbb Ef(X_1,\dots,X_n)|\ge t\big)=\mathbb P\big(|\tilde f(Y_1,\dots,Y_n)-\mathbb E\tilde f(Y_1,\dots,Y_n)|\ge t\big)\le2\exp\Big(-\frac1K\min\Big(\frac{t^2}{nL^2\tau^2},\frac t{L\tau}\Big)\Big).$$
But Jensen's inequality and (24) imply that $|f(X_1,\dots,X_n)-\mathbb Ef(X_1,\dots,X_n)|\le Ln$. Thus, for $t>Ln$, the left-hand side of the above inequality is equal to $0$. For $t\le Ln$, the inequality $\tau>1$ gives
$$\frac{t^2}{nL^2\tau^2}\le\frac t{L\tau},$$
which proves the theorem.

7 A few words on connections with previous results

Concentration of measure inequalities for general functions of Markov chains were investigated by Marton [11], Samson [19] and, more recently, by Kontorovich and Ramanan [6]. They actually consider more general mixing processes. They give estimates on the deviation of a random variable from the mean or median in terms of mixing coefficients. When specialized to Markov chains, their estimates yield inequalities in the spirit of Theorem 6 for general (not necessarily symmetric) functions of uniformly ergodic Markov chains. To obtain their results, Marton and Samson used transportation inequalities, whereas the approach of Kontorovich and Ramanan was based on martingales. In all cases the bounds include sums of expressions of the form
$$\sup_{x,y\in S}\|P^i(x,\cdot)-P^i(y,\cdot)\|_{TV}.$$
Therefore these results are not well suited for Markov chains which are not uniformly ergodic (like the chain in Section 5), since for such chains the summands are bounded from below by a constant. It would be interesting to know whether, in results of this type, the supremum of the total variation distances can be replaced by some other norm, for instance a kind of average.

Inequalities of the bounded difference type for sums $f(X_1)+\dots+f(X_n)$, where the $X_i$'s form a uniformly ergodic Markov chain, were also obtained by Glynn and Ormoneit [5]. Their method was to analyse the Poisson equation associated with the chain. Their result has been complemented by an information theoretic approach in Kontoyiannis et al. [7].

Estimates for sums in terms of the variance appeared in the work of Samson [19]. In particular, he proved a Bernstein type inequality for uniformly ergodic Markov chains. The main difference from Theorem 4 is that Samson's result contains $\sum_{i=1}^n\mathbb Ef(X_i)^2$ instead of the asymptotic variance multiplied by $n$. As the example from Section 5 shows, the asymptotic variance may be much smaller than the second moment of $f(X_i)$. The asymptotic variance for $f_r$ decreases exponentially with $r$, whereas $\mathbb Ef_r(X_i)^2=1$ (here, however, the chain is not uniformly ergodic). On the other hand, Samson's result has no additional $\log n$ factor. Samson also presents an analogous result for empirical processes of uniformly ergodic chains, again without the $\log n$ factor. His bounds use a kind of strong variance instead of the supremum of variances.
Replacing it with $\sup_i \mathbb{E}f(X_i)^2$ is stated in Samson's work as an open problem, which to the best of our knowledge has not yet been solved.

Finally, we would like to comment on the assumptions of our main theorems concerning Markov chains. We assume that the Orlicz norms $\|T_1\|_{\psi_1}$ and $\|T_2\|_{\psi_1}$ are finite, which is equivalent to the existence of a number $\kappa > 1$ such that
$$\mathbb{E}_\xi \kappa^{T_1} < \infty, \qquad \mathbb{E}_\nu \kappa^{T_1} < \infty,$$
where $\xi$ is the initial distribution of the chain and $\nu$ the minorizing measure from condition (6). This is satisfied, for instance, if $m = 1$ and the chain satisfies the drift condition, i.e. if there is a measurable function $V\colon S \to [1,\infty)$, together with constants $\lambda < 1$ and $K < \infty$, such that
$$PV(x) = \int_S V(y)\,P(x,dy) \le \begin{cases} \lambda V(x) & \text{for } x \notin C,\\ K & \text{for } x \in C,\end{cases}$$
and $V$ is $\xi$- and $\nu$-integrable (see e.g. [1], Propositions 4.1 and 4.4; see also [18], [14]). For $m > 1$ one can similarly consider the kernel $P^m$ instead of $P$; however, in this case our inequalities are restricted to averages of real valued functions, as in Theorem 4. Such drift conditions have gained considerable attention in Markov Chain Monte Carlo theory, as they imply geometric ergodicity of the chain.

References

[1] Baxendale P. H. Renewal theory and computable convergence rates for geometrically ergodic Markov chains. Ann. Appl. Probab., no. 1B.
[2] Bousquet O. A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris, no. 6.
[3] Bousquet O., Boucheron S., Lugosi G., Massart P. Moment inequalities for functions of independent random variables. Ann. Probab. 33, no. 2.
[4] Giné E., Latała R., Zinn J. Exponential and moment inequalities for U-statistics. In: High Dimensional Probability II, Progr. Probab. 47. Birkhäuser Boston, Boston, MA.
[5] Glynn P. W., Ormoneit D. Hoeffding's inequality for uniformly ergodic Markov chains. Statist. Probab. Lett. 56 (2002), no. 2.
[6] Kontorovich L., Ramanan K. Concentration inequalities for dependent random variables via the martingale method. Preprint, available at

[7] Kontoyiannis I., Lastras-Montano L., Meyn S. P. Relative entropy and exponential deviation bounds for general Markov chains. IEEE International Symposium on Information Theory.
[8] Ledoux M. On Talagrand's deviation inequalities for product measures. ESAIM: Probability and Statistics 1 (1996).
[9] Ledoux M. The concentration of measure phenomenon. Mathematical Surveys and Monographs 89. American Mathematical Society, Providence, RI.
[10] Ledoux M., Talagrand M. Probability in Banach spaces. Isoperimetry and processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3), 23. Springer-Verlag, Berlin.
[11] Marton K. A measure concentration inequality for contracting Markov chains. Geom. Funct. Anal., no. 3.
[12] Marton K. Erratum to: "A measure concentration inequality for contracting Markov chains". Geom. Funct. Anal., no. 3.
[13] Marton K. Measure concentration for a class of random processes. Probab. Theory Related Fields, no. 3.
[14] Meyn S. P., Tweedie R. L. Markov chains and stochastic stability. Communications and Control Engineering Series. Springer-Verlag London, Ltd., London.
[15] Montgomery-Smith S. J. Comparison of sums of independent identically distributed random vectors. Probab. Math. Statist., no. 2.
[16] Panchenko D. Symmetrization approach to concentration inequalities for empirical processes. Ann. Probab., no. 4.
[17] Pisier G. Some applications of the metric entropy condition to harmonic analysis. In: Banach spaces, harmonic analysis, and probability theory, Lecture Notes in Math. 995, Springer, Berlin.

[18] Roberts G. O., Rosenthal J. S. General state space Markov chains and MCMC algorithms. Probab. Surv.
[19] Samson P.M. Concentration of measure inequalities for Markov chains and Φ-mixing processes. Ann. Probab., no. 1.
[20] Talagrand M. New concentration inequalities in product spaces. Invent. Math., no. 3.
[21] van der Vaart A. W., Wellner J. A. Weak convergence and empirical processes. With applications to statistics. Springer Series in Statistics. Springer-Verlag, New York.

Radosław Adamczak
Institute of Mathematics, Polish Academy of Sciences
Śniadeckich 8, P.O.Box
Warszawa 10, Poland
R.Adamczak@impan.gov.pl

A tail inequality for suprema of unbounded empirical processes with applications to Markov chains

Electronic Journal of Probability, Vol. 13 (2008), Paper no. 34, pages 1000-1034. Journal URL: http://www.math.washington.edu/~ejpecp/