A tail inequality for suprema of unbounded empirical processes with applications to Markov chains
Electronic Journal of Probability, Paper no. 34.

A tail inequality for suprema of unbounded empirical processes with applications to Markov chains

Radosław Adamczak
Institute of Mathematics, Polish Academy of Sciences
Śniadeckich 8, P.O. Box, Warszawa 10, Poland
R.Adamczak@impan.gov.pl

Abstract. We present a tail inequality for suprema of empirical processes generated by variables with finite ψ_α norms and apply it to some geometrically ergodic Markov chains to derive similar estimates for empirical processes of such chains, generated by bounded functions. We also obtain a bounded difference inequality for symmetric statistics of such Markov chains.

Key words: concentration inequalities, empirical processes, Markov chains.
AMS 2000 Subject Classification: Primary 60E15; Secondary 60J05.
Submitted to EJP on September 19, 2007; final version accepted May 27.
Research partially supported by MEiN Grant 1 PO3A.
1 Introduction

Let us consider a sequence X_1, X_2, ..., X_n of random variables with values in a measurable space (S, B) and a countable class F of measurable functions f: S → R. Define moreover the random variable

    Z = sup_{f∈F} Σ_{i=1}^n f(X_i).

In recent years a lot of effort has been devoted to describing the behaviour, in particular the concentration properties, of the variable Z under various assumptions on the sequence X_1, ..., X_n and the class F. Classically, one considers the case of i.i.d. or independent random variables X_i and uniformly bounded classes of functions, although there are also results for unbounded functions or for sequences of variables possessing some mixing properties.

The aim of this paper is to present tail inequalities for the variable Z under two different types of assumptions, relaxing the classical conditions. In the first part of the article we consider the case of independent variables and unbounded functions satisfying, however, some integrability assumptions. The main result of this part is Theorem 4, presented in Section 2. In the second part we keep the assumption of uniform boundedness of the class F but relax the condition on the underlying sequence of variables, by considering a class of Markov chains satisfying classical small set conditions with exponentially integrable regeneration times. If the small set assumption is satisfied for the one-step transition kernel, the regeneration technique for Markov chains together with the results for independent variables and unbounded functions allows us to derive tail inequalities for the variable Z (Theorem 7, presented in Section 3.2). In a more general situation, when the small set assumption is satisfied only by the m-skeleton chain, our results are restricted to sums of real variables, i.e. to the case of F being a singleton (Theorem 6). Finally, in Section 3.4, using similar arguments, we derive a bounded difference type inequality for Markov chains satisfying the same small set assumptions.
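For readers who prefer something concrete, the quantity Z is straightforward to compute when F is finite. The following sketch is purely illustrative and not from the paper: the data and the three-function class are invented, and for a finite class the supremum is just a maximum over the per-function sums.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100)            # a sample X_1, ..., X_n with S = R

# A small finite class F of measurable functions f: S -> R (invented).
F = [np.sin, np.cos, lambda x: np.tanh(x)]

# Z = sup over f in F of the sum of f(X_i); for finite F this is a max.
sums = [float(np.sum(f(X))) for f in F]
Z = max(sums)

# By construction Z dominates every individual empirical sum.
assert all(Z >= s for s in sums)
print(Z)
```

For countable infinite classes the supremum is approximated by exhausting finite subclasses; measurability of Z is exactly why the paper restricts to countable F.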
We will start by describing known results for bounded classes of functions and independent random variables, beginning with the celebrated Talagrand inequality. They will serve us both as tools and as a point of reference for presenting our results.

1.1 Talagrand's concentration inequalities

In the paper [24], Talagrand proved the following inequality for empirical processes.

Theorem 1 (Talagrand, [24]). Let X_1, ..., X_n be independent random variables with values in a measurable space (S, B) and let F be a countable class of measurable functions f: S → R, such that ||f||_∞ ≤ a < ∞ for every f ∈ F. Consider the random
variable

    Z = sup_{f∈F} Σ_{i=1}^n f(X_i).

Then for all t ≥ 0,

    P(Z ≥ EZ + t) ≤ K exp( -(t/(Ka)) log(1 + ta/V) ),        (1)

where V = E sup_{f∈F} Σ_{i=1}^n f(X_i)^2 and K is an absolute constant. In consequence, for all t ≥ 0,

    P(Z ≥ EZ + t) ≤ K_1 exp( -t^2/(K_1(V + at)) )        (2)

for some universal constant K_1. Moreover, the above inequalities hold when replacing Z by -Z.

Inequalities (1) and (2) may be considered functional versions of, respectively, Bennett's and Bernstein's inequalities for sums of independent random variables, and just as in the classical case, one of them implies the other. Let us note that Bennett's inequality recovers both the subgaussian and the Poisson behaviour of sums of independent random variables, corresponding to classical limit theorems, whereas Bernstein's inequality recovers the subgaussian behaviour for small values of t and exhibits exponential behaviour for larger values of t.

The above inequalities proved to be a very important tool in infinite dimensional probability, machine learning and M-estimation. They drew considerable attention, resulting in several simplified proofs and different versions. In particular, there has been a series of papers, starting from the work by Ledoux [10], exploring concentration of measure for empirical processes with the use of logarithmic Sobolev inequalities with discrete gradients. The first explicit constants were obtained by Massart [16], who proved in particular the following

Theorem 2 (Massart, [16]). Let X_1, ..., X_n be independent random variables with values in a measurable space (S, B) and let F be a countable class of measurable functions f: S → R, such that ||f||_∞ ≤ a < ∞ for every f ∈ F. Consider the random variable Z = sup_{f∈F} Σ_{i=1}^n f(X_i). Assume moreover that for all f ∈ F and all i, Ef(X_i) = 0, and let σ^2 = sup_{f∈F} Σ_{i=1}^n Ef(X_i)^2. Then for all η > 0 and t ≥ 0,

    P( Z ≥ (1+η)EZ + σ√(2K_1 t) + K_2(η) a t ) ≤ e^{-t}

and

    P( Z ≤ (1-η)EZ - σ√(2K_3 t) - K_4(η) a t ) ≤ e^{-t},

where K_1 = 4, K_2(η) = 2.5 + 32/η, K_3 = 5.4, K_4(η) = 2.5 + 43.2/η.

Similar, more refined results were obtained subsequently by Bousquet [3] and Klein and Rio [7].
The latter article contains an inequality for suprema of empirical processes with the best known constants.
Theorem 3 (Klein–Rio, [7], Theorems 1.1, 1.2). Let X_1, X_2, ..., X_n be independent random variables with values in a measurable space (S, B) and let F be a countable class of measurable functions f: S → [-a, a], such that for all i, Ef(X_i) = 0. Consider the random variable

    Z = sup_{f∈F} Σ_{i=1}^n f(X_i).

Then, for all t ≥ 0,

    P(Z ≥ EZ + t) ≤ exp( -t^2/(2σ^2 + 4aEZ + 3at) )

and

    P(Z ≤ EZ - t) ≤ exp( -t^2/(2σ^2 + 4aEZ + 3at) ),

where

    σ^2 = sup_{f∈F} Σ_{i=1}^n Ef(X_i)^2.

The reader may notice that, contrary to the original Talagrand result, the estimates of Theorems 2 and 3 use the weak variance σ^2 rather than the strong parameter V of Theorem 1. This stems from several reasons, e.g. the statistical relevance of the parameter σ and the analogy with the concentration of Gaussian processes, which by the CLT (in the case of Donsker classes of functions) correspond to the limiting behaviour of empirical processes. One should also note that by the contraction principle we have σ^2 ≤ V ≤ σ^2 + 16aEZ (see [12], Lemma 6.6). Thus, usually one would like to describe the subgaussian behaviour of the variable Z in terms of σ; however, the price to be paid is the additional summand of the form ηEZ.

Let us also remark that if one does not pay attention to constants, the inequalities presented in Theorems 2 and 3 follow from Talagrand's inequality just by the aforementioned estimate V ≤ σ^2 + 16aEZ and, in the case of Theorem 2, the inequality between the geometric and the arithmetic mean.

1.2 Notation, basic definitions

In the article, by K we will denote universal constants and by C(α, β), K_α constants depending only on α, β (or only on α, respectively), where α, β are some parameters. In both cases the values of the constants may change from line to line. We will also use the classical definition of exponential Orlicz norms.

Definition 1. For α > 0, define the function ψ_α: R_+ → R_+ by the formula ψ_α(x) = exp(x^α) - 1. For a random variable X, define also the Orlicz norm

    ||X||_{ψ_α} = inf{ λ > 0: E ψ_α(|X|/λ) ≤ 1 }.
Let us also note a basic fact that we will use in the sequel, namely that by Chebyshev's inequality, for t ≥ 0,

    P(|X| ≥ t) ≤ 2 exp( -(t/||X||_{ψ_α})^α ).

Remark. For α < 1 the above definition does not give a norm but only a quasi-norm. This can be fixed by changing the function ψ_α near zero to make it convex, which would give an equivalent norm. It is however widely accepted in the literature to use the word norm also for the quasi-norm given by our definition.

2 Tail inequality for suprema of empirical processes corresponding to classes of unbounded functions

2.1 The main result for the independent case

We will now formulate our main result in the setting of independent variables, namely tail estimates for suprema of empirical processes under the assumption that the summands have finite ψ_α Orlicz norm.

Theorem 4. Let X_1, ..., X_n be independent random variables with values in a measurable space (S, B) and let F be a countable class of measurable functions f: S → R. Assume that for every f ∈ F and every i, Ef(X_i) = 0, and that for some α ∈ (0, 1] and all i, || sup_{f∈F} |f(X_i)| ||_{ψ_α} < ∞. Let

    Z = sup_{f∈F} Σ_{i=1}^n f(X_i).

Define moreover

    σ^2 = sup_{f∈F} Σ_{i=1}^n Ef(X_i)^2.

Then, for all 0 < η < 1 and δ > 0, there exists a constant C = C(α, η, δ), such that for all t ≥ 0,

    P( Z ≥ (1+η)EZ + t ) ≤ exp( -t^2/(2(1+δ)σ^2) ) + 3 exp( -( t / (C || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α}) )^α )

and

    P( Z ≤ (1-η)EZ - t ) ≤ exp( -t^2/(2(1+δ)σ^2) ) + 3 exp( -( t / (C || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α}) )^α ).
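For completeness, the quoted tail bound is a standard one-line consequence of Markov's inequality applied to the increasing function ψ_α (this derivation is supplied here for the reader; it is not spelled out in the text):

```latex
P(|X| \ge t)
  = P\Big(\psi_\alpha\big(|X|/\|X\|_{\psi_\alpha}\big)
          \ge \psi_\alpha\big(t/\|X\|_{\psi_\alpha}\big)\Big)
  \le \frac{E\,\psi_\alpha\big(|X|/\|X\|_{\psi_\alpha}\big)}
           {\psi_\alpha\big(t/\|X\|_{\psi_\alpha}\big)}
  \le \frac{1}{\exp\!\big((t/\|X\|_{\psi_\alpha})^\alpha\big)-1}
  \le 2\exp\!\big(-(t/\|X\|_{\psi_\alpha})^\alpha\big),
```

where the last step uses 1/(e^u - 1) ≤ 2e^{-u} for u ≥ log 2, while for u < log 2 the right-hand side exceeds 1 and the bound is trivial.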
Remark. The above theorem may thus be considered a counterpart of Massart's result (Theorem 2). It is written in a slightly different manner, reflecting the use of Theorem 3 in the proof, but it is easy to see that if one disregards the constants, it yields another version in the flavour of the inequalities presented in Theorem 2.

Let us note that some weaker inequalities (e.g. not recovering the proper power α in the subexponential decay of the tail) may be obtained by combining the Pisier inequality (see (13) below) with moment estimates for empirical processes proven by Giné, Latała and Zinn [5] and later obtained by a different method also by Boucheron, Bousquet, Lugosi and Massart [2]. These moment estimates first appeared in the context of tail inequalities for U-statistics and were later used in statistics, in model selection. They are however also of independent interest, as extensions of classical Rosenthal inequalities for p-th moments of sums of independent random variables, with the dependence on p stated explicitly.

The proof of Theorem 4 is a combination of the classical Hoffmann-Jørgensen inequality with Theorem 3 and another deep result due to Talagrand.

Theorem 5 (Ledoux–Talagrand, [12]). In the setting of Theorem 4, we have

    ||Z||_{ψ_α} ≤ K_α ( ||Z||_1 + || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α} ).

We will also need the following corollary to Theorem 3, which was derived in [4]. Since the proof is very short, we will present it here for the sake of completeness.

Lemma 1. In the setting of Theorem 3, for all 0 < η ≤ 1 and δ > 0 there exists a constant C = C(η, δ), such that for all t ≥ 0,

    P( Z ≥ (1+η)EZ + t ) ≤ exp( -t^2/(2(1+δ)σ^2) ) + exp( -t/(Ca) )

and

    P( Z ≤ (1-η)EZ - t ) ≤ exp( -t^2/(2(1+δ)σ^2) ) + exp( -t/(Ca) ).

Proof. It is enough to notice that for all δ > 0,

    exp( -t^2/(2σ^2 + 4aEZ + 3at) ) ≤ exp( -t^2/(2(1+δ)σ^2) ) + exp( -t^2/((1+δ^{-1})(4aEZ + 3ta)) )

and to use this inequality together with Theorem 3 for t + ηEZ instead of t, which gives C = (1 + 1/δ)(3 + 2/η).
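The first step of the proof of Lemma 1 rests on the elementary observation (standard, spelled out here for the reader) that for A, B > 0 one has A + B ≤ max{(1+δ)A, (1+δ^{-1})B}: indeed, either B ≤ δA, in which case A + B ≤ (1+δ)A, or B > δA, in which case A + B ≤ (1+δ^{-1})B. Consequently,

```latex
\exp\Big(-\frac{t^2}{A+B}\Big)
  \le \max\Big\{\exp\Big(-\frac{t^2}{(1+\delta)A}\Big),\,
                \exp\Big(-\frac{t^2}{(1+\delta^{-1})B}\Big)\Big\}
  \le \exp\Big(-\frac{t^2}{(1+\delta)A}\Big)
    + \exp\Big(-\frac{t^2}{(1+\delta^{-1})B}\Big),
```

which is applied with A = 2σ² and B = 4aEZ + 3at.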
Proof of Theorem 4. Without loss of generality we may and will assume that

    t / || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α} > K(α, η, δ),        (3)

otherwise we can make the theorem trivial by choosing the constant C = C(α, η, δ) large enough. The conditions on the constant K(α, η, δ) will be imposed later on in the proof.

Let ε = ε(δ) > 0 (its value will be determined later) and for all f ∈ F consider the truncated functions

    f_1(x) = f(x) 1_{ {sup_{f∈F} |f(x)| ≤ ρ} }

(the truncation level ρ will also be fixed later). Define also the functions

    f_2(x) = f(x) - f_1(x) = f(x) 1_{ {sup_{f∈F} |f(x)| > ρ} }.

Let F_i = {f_i: f ∈ F}, i = 1, 2. We have

    Z ≤ sup_{f_1∈F_1} Σ_{i=1}^n ( f_1(X_i) - Ef_1(X_i) ) + sup_{f_2∈F_2} Σ_{i=1}^n ( f_2(X_i) - Ef_2(X_i) )        (4)

and

    Z ≥ sup_{f_1∈F_1} Σ_{i=1}^n ( f_1(X_i) - Ef_1(X_i) ) - sup_{f_2∈F_2} | Σ_{i=1}^n ( f_2(X_i) - Ef_2(X_i) ) |,        (5)

where we used the fact that Ef_1(X_i) + Ef_2(X_i) = 0 for all f ∈ F. Similarly, by Jensen's inequality, we get

    E sup_{f_1∈F_1} Σ_{i=1}^n ( f_1(X_i) - Ef_1(X_i) ) ≤ EZ + 2 E sup_{f_2∈F_2} | Σ_{i=1}^n f_2(X_i) |
    and
    EZ ≤ E sup_{f_1∈F_1} Σ_{i=1}^n ( f_1(X_i) - Ef_1(X_i) ) + 2 E sup_{f_2∈F_2} | Σ_{i=1}^n f_2(X_i) |.        (6)

Denoting

    A = E sup_{f_1∈F_1} Σ_{i=1}^n ( f_1(X_i) - Ef_1(X_i) )   and   B = E sup_{f_2∈F_2} | Σ_{i=1}^n f_2(X_i) |,
we get by (4) and (6),

    P( Z ≥ (1+η)EZ + t )
      ≤ P( sup_{f_1∈F_1} Σ ( f_1(X_i) - Ef_1(X_i) ) ≥ (1+η)EZ + (1-ε)t ) + P( sup_{f_2∈F_2} | Σ ( f_2(X_i) - Ef_2(X_i) ) | ≥ εt )
      ≤ P( sup_{f_1∈F_1} Σ ( f_1(X_i) - Ef_1(X_i) ) ≥ (1+η)A - 4B + (1-ε)t ) + P( sup_{f_2∈F_2} | Σ ( f_2(X_i) - Ef_2(X_i) ) | ≥ εt )        (7)

and similarly by (5) and (6),

    P( Z ≤ (1-η)EZ - t )
      ≤ P( sup_{f_1∈F_1} Σ ( f_1(X_i) - Ef_1(X_i) ) ≤ (1-η)EZ - (1-ε)t ) + P( sup_{f_2∈F_2} | Σ ( f_2(X_i) - Ef_2(X_i) ) | ≥ εt )
      ≤ P( sup_{f_1∈F_1} Σ ( f_1(X_i) - Ef_1(X_i) ) ≤ (1-η)A - (1-ε)t + 2B ) + P( sup_{f_2∈F_2} | Σ ( f_2(X_i) - Ef_2(X_i) ) | ≥ εt ).        (8)

We would like to choose the truncation level ρ in a way which would allow us to bound the first summands on the right-hand sides of (7) and (8) with Lemma 1 and the other summands with Theorem 5. To this end let us set

    ρ = 8 E max_{1≤i≤n} sup_{f∈F} |f(X_i)| ≤ K_α || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α}.        (9)

Let us notice that by the Chebyshev inequality and the definition of the class F_2, we have

    P( max_{k≤n} sup_{f_2∈F_2} | Σ_{i=1}^k f_2(X_i) | > 0 ) ≤ P( max_{1≤i≤n} sup_{f∈F} |f(X_i)| > ρ ) ≤ 1/8
and thus by the Hoffmann-Jørgensen inequality (see e.g. [12], Chapter 6, Proposition 6.8, inequality (6.8)), we obtain

    B = E sup_{f_2∈F_2} | Σ_{i=1}^n f_2(X_i) | ≤ 8 E max_{1≤i≤n} sup_{f∈F} |f(X_i)|.        (10)

In consequence,

    E sup_{f_2∈F_2} | Σ_{i=1}^n ( f_2(X_i) - Ef_2(X_i) ) | ≤ 2B ≤ 16 E max_{1≤i≤n} sup_{f∈F} |f(X_i)| ≤ K_α || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α}.

We also have

    || max_{1≤i≤n} sup_{f_2∈F_2} | f_2(X_i) - Ef_2(X_i) | ||_{ψ_α}
      ≤ K_α max_{1≤i≤n} sup_{f_2∈F_2} |Ef_2(X_i)| + K_α || max_{1≤i≤n} sup_{f_2∈F_2} |f_2(X_i)| ||_{ψ_α}
      ≤ K_α || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α}

(recall that with our definitions, for α < 1, ||·||_{ψ_α} is a quasi-norm, which explains the presence of the constant K_α in the first inequality). Thus, by Theorem 5, we obtain

    || sup_{f_2∈F_2} Σ_{i=1}^n ( f_2(X_i) - Ef_2(X_i) ) ||_{ψ_α} ≤ K_α || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α},

which implies

    P( sup_{f_2∈F_2} | Σ_{i=1}^n ( f_2(X_i) - Ef_2(X_i) ) | ≥ εt ) ≤ 2 exp( -( εt / (K_α || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α}) )^α ).        (11)

Let us now choose ε < 1/10 and such that

    (1 - 5ε)^2 ≥ (1 + δ/2)/(1 + δ).        (12)

Since ε is a function of δ, in view of (9) and (10), we can choose the constant K(α, η, δ) in (3) large enough to ensure that

    B ≤ 8 E max_{1≤i≤n} sup_{f∈F} |f(X_i)| ≤ εt.
Notice moreover that for every f ∈ F, we have

    Σ_{i=1}^n E( f_1(X_i) - Ef_1(X_i) )^2 ≤ Σ_{i=1}^n Ef_1(X_i)^2 ≤ Σ_{i=1}^n Ef(X_i)^2 ≤ σ^2.

Thus, using inequalities (7), (8), (11) and Lemma 1 applied with η and δ/2, we obtain

    P( Z ≥ (1+η)EZ + t ), P( Z ≤ (1-η)EZ - t )
      ≤ exp( -t^2(1-5ε)^2/(2(1+δ/2)σ^2) ) + exp( -(1-5ε)t/(K(η,δ)ρ) )
        + 2 exp( -( εt / (K_α || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α}) )^α ).

Since ε < 1/10, using (9) one can see that for t satisfying (3) with K(α, η, δ) large enough, we have

    exp( -(1-5ε)t/(K(η,δ)ρ) ) ≤ exp( -( t / (C(α,η,δ) || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α}) )^α )

(note that the above inequality holds for all t if α = 1). Therefore, for such t,

    P( Z ≥ (1+η)EZ + t ), P( Z ≤ (1-η)EZ - t )
      ≤ exp( -t^2(1-5ε)^2/(2(1+δ/2)σ^2) ) + 3 exp( -( t / (C(α,η,δ) || max_{1≤i≤n} sup_{f∈F} |f(X_i)| ||_{ψ_α}) )^α ).

To finish the proof it is now enough to use (12).

Remark. We would like to point out that the use of the Hoffmann-Jørgensen inequality in a similar context is well known. Such applications appeared in the proof of the aforementioned moment estimates for empirical processes by Giné, Latała and Zinn [5], in the proof of Theorem 5, and recently in the proof of Fuk–Nagaev type inequalities for empirical processes used by Einmahl and Li to investigate generalized laws of the iterated logarithm for Banach space valued variables [4]. As for using Theorem 5 to control the remainder after truncating the original random variables, it was recently used in a somewhat similar way by Mendelson and Tomczak-Jaegermann (see [17]).
2.2 A counterexample

We will now present a simple example showing that in Theorem 4 one cannot replace || max_i sup_{f∈F} |f(X_i)| ||_{ψ_α} with max_i || sup_{f∈F} |f(X_i)| ||_{ψ_α}. With such a modification, the inequality fails to be true even in the real valued case, i.e. when F is a singleton. For simplicity we will consider only the case α = 1.

Consider a sequence Y_1, Y_2, ... of i.i.d. real random variables such that

    P(Y_i = r) = e^{-r} = 1 - P(Y_i = 0).

Let ε_1, ε_2, ... be a Rademacher sequence, independent of (Y_i)_i. Define finally X_i = ε_i Y_i. We have

    E e^{|X_i|} = e^{-r} e^{r} + (1 - e^{-r}) ≤ 2,

so ||X_i||_{ψ_1} ≤ 1. Moreover, EX_i^2 = r^2 e^{-r}. Assume now that we have, for all n, r ∈ N and t ≥ 0,

    P( | Σ_{i=1}^n X_i | ≥ K( √(nt) ||X_1||_2 + t ||X_1||_{ψ_1} ) ) ≤ Ke^{-t},

where K is an absolute constant (which would hold if the corresponding version of Theorem 4 were true). For sufficiently large r, the above inequality applied with n ≈ e^r r^{-2} and t ≈ r implies that

    P( | Σ_{i=1}^n X_i | ≥ r ) ≤ Ke^{-r/K}.

On the other hand, by Lévy's inequality, we have

    2 P( | Σ_{i=1}^n X_i | ≥ r ) ≥ P( max_{1≤i≤n} |X_i| ≥ r ) ≥ (1/2) min( nP(|X_1| ≥ r), 1 ) ≥ r^{-2}/2,

which gives a contradiction for large r.

Remark. A small modification of the above argument shows that one cannot hope for an inequality

    P( Z ≥ K( EZ + tσ + t log^β(n) max_i || sup_{f∈F} |f(X_i)| ||_{ψ_1} ) ) ≤ Ke^{-t}

with β < 1. For β = 1, this inequality follows from Theorem 4 via Pisier's inequality [21],

    || max_{i≤n} |Y_i| ||_{ψ_α} ≤ K_α log^{1/α}(n) max_{i≤n} || Y_i ||_{ψ_α}        (13)

for independent real variables Y_1, ..., Y_n.
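The log factor in Pisier's inequality (13) is already saturated, for α = 1, by i.i.d. standard exponential variables: a classical computation gives E max_{i≤n} Y_i = H_n = Σ_{k≤n} 1/k ≈ log n, while each ||Y_i||_{ψ_1} is a constant. The quick numeric check below is an illustration added by the editor, not part of the original argument:

```python
import math

def harmonic(n: int) -> float:
    """H_n = sum_{k=1}^n 1/k; equals E max of n i.i.d. Exp(1) variables."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 1000, 100000):
    ratio = harmonic(n) / math.log(n)
    print(n, round(ratio, 3))
# The ratio H_n / log n decreases towards 1, so the expected maximum
# really grows like log n, matching the log factor in (13).
```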
3 Applications to Markov chains

We will now turn to the other class of inequalities we are interested in. We are again concerned with random variables of the form

    Z = sup_{f∈F} Σ_{i=1}^n f(X_i),

but this time we assume that the class F is uniformly bounded and we drop the assumption of independence of the underlying sequence X_1, ..., X_n. To be more precise, we will assume that X_1, ..., X_n form a Markov chain satisfying some additional conditions, which are rather classical in the Markov chain and Markov Chain Monte Carlo literature.

The organization of this part is as follows. First, before stating the main results, we will present all the structural assumptions we will impose on the chain. At the same time we will introduce some notation which will be used in the sequel. Next, we present our results (Theorems 6 and 7), followed by the proofs (which are quite straightforward but technical) and a discussion of optimality (Section 3.3). At the end, in Section 3.4, we will also present a bounded differences type inequality for Markov chains.

3.1 Assumptions on the Markov chain

Let X_1, X_2, ... be a homogeneous Markov chain on S, with transition kernel P = P(x, A), satisfying the so-called minorization condition, stated below.

Minorization condition. We assume that there exist a positive integer m, a constant δ > 0, a set C ∈ B (a "small set") and a probability measure ν on S for which

    ∀ x ∈ C  ∀ A ∈ B    P^m(x, A) ≥ δν(A)        (14)

and

    ∀ x ∈ S  ∃ n    P^{nm}(x, C) > 0,        (15)

where P^i(·, ·) is the transition kernel of the chain after i steps. One can show that in such a situation, if the chain admits an invariant measure π, then this measure is unique and satisfies π(C) > 0 (see [18]). Moreover, under some conditions on the initial distribution ξ, the chain can be extended to a new, so-called split chain (X̃_n, R_n) ∈ S × {0, 1}, satisfying the following properties.

Properties of the split chain

(P1) (X̃_n)_n is again a Markov chain with transition kernel P and initial distribution ξ (hence for our purposes of estimating tail probabilities we may and will identify X̃_n and X_n),
(P2) if we define T_1 = inf{n > 0: R_{nm} = 1}, T_{i+1} = inf{n > 0: R_{(T_1+...+T_i+n)m} = 1}, then T_1, T_2, ... are well defined and independent; moreover T_2, T_3, ... are i.i.d.,

(P3) if we define S_i = T_1 + ... + T_i, then the blocks

    Y_0 = (X_1, ..., X_{mT_1+m-1}),   Y_i = (X_{mS_i+1}, ..., X_{mS_{i+1}+m-1}),   i > 0,

form a one-dependent sequence (i.e. for all i, σ((Y_j)_{j<i}) and σ((Y_j)_{j>i}) are independent). Moreover, the sequence Y_1, Y_2, ... is stationary. If m = 1, then the variables Y_0, Y_1, ... are independent. In consequence, for f: S → R, the variables

    Z_i = Z_i(f) = Σ_{j=mS_i+1}^{mS_{i+1}+m-1} f(X_j),   i ≥ 1,

constitute a one-dependent stationary sequence (an i.i.d. sequence if m = 1). Additionally, if f is π-integrable (recall that π is the unique stationary measure of the chain), then

    EZ_i = δ^{-1} π(C)^{-1} m ∫ f dπ.        (16)

(P4) the distribution of T_1 depends only on ξ, P, C, δ, ν, whereas the law of T_2 depends only on P, C, δ and ν.

We refrain from specifying the construction of this new chain in full generality, as well as the conditions under which (14) and (15) hold, and refer the reader to the classical monograph [18] or the survey article [22] for a complete exposition. Here we will only sketch the construction for m = 1, to give its flavour. Informally speaking, at each step i, if we have X_i = x and x ∉ C, we generate the next value of the chain according to the measure P(x, ·). If x ∈ C, then we toss a coin with probability of success equal to δ. In the case of success (R_i = 1), we draw the next sample according to the measure ν; otherwise (R_i = 0), according to

    ( P(x, ·) - δν(·) ) / (1 - δ).

When R_i = 1, one usually says that the chain regenerates, as the distribution in the next step (for m = 1; after m steps in general) is again ν. Let us remark that for a recurrent chain on a countable state space, admitting a stationary distribution, the Minorization condition is always satisfied with m = 1 and δ = 1 (for C we can take {x}, where x is an arbitrary element of the state space). Also, the construction of the split chain becomes trivial.
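The m = 1 construction sketched above is easy to simulate. The toy script below is a sketch under invented parameters (a two-state chain with small set C = {0}, δ = 0.5 and a measure ν chosen so that (14) holds; none of these numbers come from the paper). It runs the split chain and records the regeneration times at which R_i = 1:

```python
import random

random.seed(1)

# Toy two-state chain; all numbers are invented for illustration.
P = {0: [0.6, 0.4], 1: [0.3, 0.7]}   # transition kernel P(x, .)
C = {0}                              # small set
delta = 0.5
nu = [0.8, 0.2]                      # chosen so that P(0, .) >= delta * nu, cf. (14)
residual = [(P[0][s] - delta * nu[s]) / (1 - delta) for s in (0, 1)]

def step(x):
    """One step of the split chain: returns (next state, regeneration bit R)."""
    if x in C and random.random() < delta:
        return random.choices((0, 1), weights=nu)[0], 1    # success: draw from nu
    w = residual if x in C else P[x]                       # otherwise residual kernel
    return random.choices((0, 1), weights=w)[0], 0

x, regen_times = 0, []
for i in range(1, 10001):
    x, r = step(x)
    if r:
        regen_times.append(i)

gaps = [t2 - t1 for t1, t2 in zip(regen_times, regen_times[1:])]
print(len(regen_times), sum(gaps) / len(gaps))
```

By property (P2) the gaps between regenerations (the variables T_i) are i.i.d., so their empirical mean stabilizes; the trajectory between consecutive regeneration times forms exactly the independent blocks Y_i used throughout this section.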
Before we proceed, let us present the general idea our approach is based on. To derive our estimates we will need two types of assumptions.

Regeneration assumption. We will work under the assumption that the chain admits a representation as above (properties (P1) to (P4)). We will not, however, take advantage of the explicit construction. Instead, we will use only the properties stated in the points above. A similar approach is quite common in the literature.

Assumption of exponential integrability of the regeneration time. To derive concentration of measure inequalities, we will also assume that ||T_1||_{ψ_1} < ∞ and ||T_2||_{ψ_1} < ∞. At the end of the article we will present examples for which this assumption is satisfied and relate the obtained inequalities to known results.

The regenerative properties of the chain allow us to decompose it into one-dependent (independent if m = 1) blocks of random length, making it possible to reduce the analysis of the chain to sums of independent random variables (this approach is by now classical; it has been successfully used in the analysis of limit theorems for Markov chains, see [18]). Since we are interested in non-asymptotic estimates of exponential type, we have to impose some additional conditions on the regeneration time, which give us control over the random length of the one-dependent blocks. This is the reason for introducing the assumption of exponential integrability, which after some technical steps allows us to apply the inequalities for unbounded empirical processes presented in Section 2.

3.2 Main results concerning Markov chains

Having established all the notation, we are ready to state our main results on Markov chains. As announced in the introduction, our results depend on the parameter m in the Minorization condition.
If m = 1 we are able to obtain tail inequalities for empirical processes (Theorem 7), whereas for m > 1 the result is restricted to linear statistics of Markov chains (Theorem 6), which formally corresponds to empirical processes indexed by a singleton. The variables T_i and Z_1 appearing in the theorems were defined in the previous section (see the properties (P2) and (P3) of the split chain).

Theorem 6. Let X_1, X_2, ... be a Markov chain with values in S, satisfying the Minorization condition and admitting a unique stationary distribution π. Assume also that ||T_1||_{ψ_1}, ||T_2||_{ψ_1} ≤ τ. Consider a function f: S → R, such that ||f||_∞ ≤ a and E_π f = 0. Define also the random variable

    Z = Σ_{i=1}^n f(X_i).
Then for all t > 0,

    P( |Z| > t ) ≤ K exp( -(1/K) min( t^2 / (n m (ET_2)^{-1} Var(Z_1)), t / (τ^2 a m log n) ) ).        (17)

Theorem 7. Let X_1, X_2, ... be a Markov chain with values in S, satisfying the Minorization condition with m = 1 and admitting a unique stationary distribution π. Assume also that ||T_1||_{ψ_1}, ||T_2||_{ψ_1} ≤ τ. Consider moreover a countable class F of measurable functions f: S → R, such that ||f||_∞ ≤ a and E_π f = 0 for all f ∈ F. Define the random variable

    Z = sup_{f∈F} Σ_{i=1}^n f(X_i)

and the asymptotic weak variance

    σ^2 = sup_{f∈F} Var(Z_1(f)) / ET_2.

Then, for all t ≥ 1,

    P( Z ≥ KEZ + t ) ≤ K exp( -(1/K) min( t^2/(nσ^2), t/(τ^3 (ET_2)^{-1} a log n) ) ).

Remarks.

1. As was mentioned in the previous section, chains satisfying the Minorization condition admit at most one stationary measure.

2. In Theorem 7, the dependence on the chain is worse than in Theorem 6, i.e. we have τ^3 (ET_2)^{-1} instead of τ^2 in the denominator. This is the result of just one step in the argument we present below; however, at the moment we do not know how to improve this dependence or extend the result to m > 1.

3. Another remark we would like to make is related to the limit behaviour of the Markov chain. Let us notice that the asymptotic variance (the variance in the CLT for n^{-1/2}(f(X_1) + ... + f(X_n))) equals

    m^{-1} (ET_2)^{-1} ( Var(Z_1) + 2E Z_1 Z_2 ),

which for m = 1 reduces to

    (ET_2)^{-1} Var(Z_1)

(we again refer the reader to [18], Chapter 17 for details). Thus, for m = 1 our estimates reflect the asymptotic behaviour of the variable Z.
Let us now pass to the proofs of the above theorems. For a function f: S → R let us define

    Z_0 = Z_0(f) = Σ_{i=1}^{mT_1+m-1} f(X_i)

and recall the variables S_i and

    Z_i = Z_i(f) = Σ_{j=mS_i+1}^{mS_{i+1}+m-1} f(X_j),   i ≥ 1,

defined in the previous section (see property (P3) of the split chain). Recall also that the Z_i form a one-dependent stationary sequence for m > 1 and an i.i.d. sequence for m = 1. Using this notation, we have

    f(X_1) + ... + f(X_n) = Z_0 + Z_1 + ... + Z_N + Σ_{i=m(S_{N+1}+1)}^{n} f(X_i),        (18)

with

    N = sup{ i ∈ N: mS_{i+1} + m - 1 ≤ n },        (19)

where sup ∅ = 0 (note that N is a random variable). Thus Z_0 represents the sum up to the first regeneration time; then Z_1, ..., Z_N are identically distributed blocks between consecutive regeneration times, included in the interval [1, n]; finally, the last term corresponds to the initial segment of the last block. The sum Z_1 + ... + Z_N is empty if up to time n there has not been any regeneration (i.e. mT_1 + m - 1 > n) or there has been only one regeneration (mT_1 + m - 1 ≤ n and m(T_1 + T_2) + m - 1 > n). The last sum on the right-hand side is empty if there has been no regeneration or if the last full block ends at n.

We will first bound the initial and the last summand in the decomposition (18). To achieve this we will not need the assumption that f is centered with respect to the stationary distribution π. In consequence, the same bounds may be applied in the proofs of both Theorem 6 and Theorem 7.

Lemma 2. If ||T_1||_{ψ_1} ≤ τ and ||f||_∞ ≤ a, then for all t ≥ 0,

    P( |Z_0| ≥ t ) ≤ 2 exp( -t/(2amτ) ).

Proof. We have |Z_0| ≤ 2aT_1 m, so by the remark after Definition 1,

    P( |Z_0| ≥ t ) ≤ P( T_1 ≥ t/(2am) ) ≤ 2 exp( -t/(2amτ) ).
The next lemma provides a similar bound for the last summand on the right-hand side of (18). It is a little bit more complicated, since it involves additional dependence on the random variable N.

Lemma 3. If ||T_1||_{ψ_1}, ||T_2||_{ψ_1} ≤ τ, then for all t ≥ 0,

    P( n - mS_{N+1} > t ) ≤ K exp( -t/(Kmτ log τ) ).

In consequence, if ||f||_∞ ≤ a, then

    P( | Σ_{i=m(S_{N+1}+1)}^{n} f(X_i) | > t ) ≤ K exp( -t/(Kamτ log τ) ).

Proof. Let us consider the variable M_n = n - mS_{N+1}. If M_n > t, then S_{N+1} < (n - t + 1)/m. Therefore

    P( M_n > t ) = Σ_{k < (n-t+1)/m} P( S_{N+1} = k )
                 = Σ_{k < (n-t+1)/m} Σ_{l=1}^{k} P( S_l = k & N + 1 = l )
                 = Σ_{k < (n-t+1)/m} Σ_{l=1}^{k} P( S_l = k & m(k + T_{l+1}) + m - 1 > n )
                 = Σ_{k < (n-t+1)/m} Σ_{l=1}^{k} P( S_l = k ) P( T_2 > (n+1)/m - 1 - k )
                 ≤ Σ_{k < (n-t+1)/m} P( T_2 > (n+1)/m - 1 - k )
                 ≤ Σ_{k < (n-t+1)/m} 2 exp( -((n+1)/m - 1 - k)/τ )
                 ≤ Kτ exp( -t/(mτ) ),

where the first equality follows from the fact that S_{N+1} ≥ N + 1, the second from the definition of N, the third from the fact that T_1, T_2, ... are independent and T_2, T_3, ...
are i.i.d., and finally the second inequality from the fact that S_i ≠ S_j for i ≠ j (see the properties (P2) and (P3) of the split chain); the last inequality sums a geometric series.

Let us notice that if t > 2mτ log τ, then

    τ exp( -t/(mτ) ) ≤ exp( -t/(2mτ) ) ≤ exp( -t/(Kmτ log τ) ),

where in the last inequality we have used the fact that τ ≥ c for some universal constant c > 1. On the other hand, if t ≤ 2mτ log τ, then

    exp( -t/(2mτ log τ) ) ≥ 1/e,

so the asserted bound is trivial for K large enough. Therefore we obtain, for t ≥ 0,

    P( M_n > t ) ≤ K exp( -t/(Kmτ log τ) ),

which proves the first part of the lemma. Now,

    P( | Σ_{i=m(S_{N+1}+1)}^{n} f(X_i) | > t ) ≤ P( M_n > t/a ) ≤ K exp( -t/(Kamτ log τ) ).

Before we proceed with the proofs of Theorems 6 and 7, we would like to make some additional comments regarding our approach. As already mentioned, thanks to property (P3) of the split chain, we may apply to Z_1 + ... + Z_N the inequalities for sums of independent random variables obtained in Section 2 (since for m > 1 we have only one-dependence, we will split the sum, treating even and odd indices separately). The number of summands is random, but clearly not larger than n. Since the variables Z_i are equidistributed, we can reduce this random sum to a deterministic one by applying the following maximal inequality of Montgomery-Smith [19].

Lemma 4. Let Y_1, ..., Y_n be i.i.d. Banach space valued random variables. Then for some universal constant K and every t > 0,

    P( max_{k≤n} | Σ_{i=1}^k Y_i | > t ) ≤ K P( | Σ_{i=1}^n Y_i | > t/K ).

Remark. The use of regeneration methods makes our proof similar to the proof of the CLT for Markov chains. In this context, the above lemma can be viewed as a counterpart of the Anscombe theorem (they are quite different statements, but both are used to handle the random number of summands).
One could now apply Lemma 4 directly, using the fact that N ≤ n. Then, however, one would not get the asymptotic variance in the exponent (see the remark after Theorem 7). The form of this variance is a consequence of the aforementioned Anscombe theorem and the fact that by the LLN we have (denoting N = N_n to stress the dependence on n)

    lim_{n→∞} N_n/n = 1/(m ET_2)   a.s.        (20)

Therefore, to obtain an inequality which (at least up to universal constants and for m = 1) reflects the limiting behaviour of the variable Z, we will need a quantitative version of (20), given in the following lemma.

Lemma 5. If ||T_1||_{ψ_1}, ||T_2||_{ψ_1} ≤ τ, then

    P( N > 3n/(mET_2) ) ≤ K exp( -(1/K) nET_2/(mτ^2) ).

To prove the above estimate, we will use the classical Bernstein inequality (actually its version for ψ_1 variables).

Lemma 6 (Bernstein's ψ_1 inequality; see [25]). If Y_1, ..., Y_n are independent random variables such that EY_i = 0 and ||Y_i||_{ψ_1} ≤ τ, then for every t > 0,

    P( | Σ_{i=1}^n Y_i | > t ) ≤ 2 exp( -(1/K) min( t^2/(nτ^2), t/τ ) ).

Proof of Lemma 5. Assume first that n/(mET_2) ≥ 1. We have

    P( N > 3n/(mET_2) ) ≤ P( m(T_1 + ... + T_{⌊3n/(mET_2)⌋+1}) ≤ n )
      ≤ P( Σ_{i=2}^{⌊3n/(mET_2)⌋+1} T_i ≤ n/m )
      ≤ P( Σ_{i=2}^{⌊3n/(mET_2)⌋+1} (T_i - ET_2) ≤ n/m - ⌊3n/(mET_2)⌋ ET_2 )
      ≤ P( Σ_{i=2}^{⌊3n/(mET_2)⌋+1} (T_i - ET_2) ≤ n/m - 3n/(2m) )
      = P( Σ_{i=2}^{⌊3n/(mET_2)⌋+1} (T_i - ET_2) ≤ -n/(2m) ).

We have ||T_2 - ET_2||_{ψ_1} ≤ 2||T_2||_{ψ_1} ≤ 2τ; therefore Bernstein's inequality (Lemma 6)
gives

    P( N > 3n/(mET_2) ) ≤ 2 exp( -(1/K) min( (n/2m)^2 / ((3n/(mET_2)) τ^2), (n/2m)/τ ) )
      ≤ 2 exp( -(1/K) min( nET_2/(mτ^2), n/(mτ) ) )
      = 2 exp( -(1/K) nET_2/(mτ^2) ),

where the equality follows from the fact that ET_2 ≤ τ. If n/(mET_2) < 1, then also nET_2/(mτ^2) < 1, and thus the asserted estimate holds trivially. This proves the lemma.

We are now in a position to prove Theorem 6.

Proof of Theorem 6. Let us notice that |Z_i| ≤ amT_{i+1}, so for i ≥ 1, ||Z_i||_{ψ_1} ≤ am||T_2||_{ψ_1} ≤ amτ. Additionally, by (16), for i ≥ 1, EZ_i = 0. Denote now R = ⌊3n/(mET_2)⌋. Lemma 5, Lemma 4 and Theorem 4 (with α = 1), combined with Pisier's estimate (13), give

    P( |Z_1 + ... + Z_N| > 2t )        (21)
      ≤ P( |Z_1 + ... + Z_N| > 2t & N ≤ R ) + K exp( -(1/K) nET_2/(mτ^2) )
      ≤ P( |Z_1 + Z_3 + ... | > t & N ≤ R ) + P( |Z_2 + Z_4 + ...| > t & N ≤ R ) + K exp( -(1/K) nET_2/(mτ^2) )
      ≤ P( max_{k≤⌈R/2⌉} |Z_1 + Z_3 + ... + Z_{2k+1}| > t ) + P( max_{k≤⌊R/2⌋} |Z_2 + Z_4 + ... + Z_{2k}| > t ) + K exp( -(1/K) nET_2/(mτ^2) )
      ≤ KP( |Z_1 + Z_3 + ... + Z_{2⌈R/2⌉+1}| > t/K ) + KP( |Z_2 + Z_4 + ... + Z_{2⌊R/2⌋}| > t/K ) + K exp( -(1/K) nET_2/(mτ^2) )
      ≤ K exp( -(1/K) min( t^2/(n m (ET_2)^{-1} Var(Z_1)), t/(log(3n/(mET_2)) amτ) ) ) + K exp( -(1/K) nET_2/(mτ^2) ).
Combining the above estimate with (18), Lemma 2 and Lemma 3, we obtain

    P( |S_n| > 4t ) ≤ K exp( -(1/K) min( t^2/(n m (ET_2)^{-1} Var(Z_1)), t/(log(3n/(mET_2)) amτ) ) )
                      + K exp( -(1/K) nET_2/(mτ^2) ) + 2 exp( -t/(2amτ) ) + K exp( -t/(Kamτ log τ) ).

For t > na/4, the left-hand side of the above inequality is equal to 0; therefore, using the fact that ET_2 ≥ 1 and τ > 1, we obtain (17).

The proof of Theorem 7 is quite similar; however, it involves some additional technicalities related to the presence of EZ in our estimates.

Proof of Theorem 7. Let us first notice that, similarly as in the real valued case, we have

    P( sup_{f∈F} |Z_0(f)| ≥ t ) ≤ P( T_1 ≥ t/a ) ≤ 2 exp( -t/(aτ) );

moreover, Lemma 3 applied to the function x ↦ sup_{f∈F} |f(x)| gives

    P( sup_{f∈F} | Σ_{i=S_{N+1}+1}^{n} f(X_i) | > t ) ≤ K exp( -t/(Kaτ log τ) ).

One can also see that since we assume m = 1, the splitting of Z_1 + ... + Z_N into sums over even and odd indices is not necessary (by property (P3) of the split chain, the summands are independent). Using the fact that Lemma 4 is valid for Banach space valued variables, we can repeat the argument from the proof of Theorem 6 and obtain, for R = ⌊3n/ET_2⌋,

    P( Z ≥ K E sup_{f∈F} | Σ_{i=1}^{R} Z_i(f) | + t ) ≤ K exp( -(1/K) min( t^2/(nσ^2), t/(τ^2 a log n) ) ).

Thus, Theorem 7 will follow if we prove that

    E sup_{f∈F} | Σ_{i=1}^{R} Z_i(f) | ≤ K E sup_{f∈F} | Σ_{i=1}^{n} f(X_i) | + Kτ^3 a/ET_2        (22)

(recall that K may change from line to line).
From the triangle inequality, the fact that Y_i = (X_{S_i+1}, ..., X_{S_{i+1}}), i ≥ 1, are i.i.d. and Jensen's inequality, it follows that

    E sup_{f∈F} | Σ_{i=1}^{R} Z_i(f) | ≤ 12 E sup_{f∈F} | Σ_{i=1}^{⌊n/(4ET_2)⌋} Z_i(f) | + 12aτ,        (23)

where in the last inequality we used the fact that E sup_{f∈F} |Z_i(f)| ≤ E aT_{i+1} ≤ aτ. We will split the expectation on the right-hand side into two parts, depending on the size of the variable N. Let us first consider the quantity

    E sup_{f∈F} | Σ_{i=1}^{⌊n/(4ET_2)⌋} Z_i(f) | 1_{ {N < ⌊n/(4ET_2)⌋} }.

Assume that n/(4ET_2) ≥ 1. Then, using Bernstein's inequality, we obtain

    P( N < ⌊n/(4ET_2)⌋ ) ≤ P( T_1 + ... + T_{⌊n/(4ET_2)⌋+1} > n )
      ≤ P( T_1 > n/2 ) + P( Σ_{i=2}^{⌊n/(4ET_2)⌋+1} T_i > n/2 )
      ≤ 2e^{-n/(2τ)} + P( Σ_{i=2}^{⌊n/(4ET_2)⌋+1} (T_i - ET_2) > n/2 - ⌊n/(4ET_2)⌋ ET_2 )
      ≤ 2e^{-n/(2τ)} + P( Σ_{i=2}^{⌊n/(4ET_2)⌋+1} (T_i - ET_2) > n/4 )
      ≤ 2e^{-n/(2τ)} + 2 exp( -(1/K) min( nET_2/τ^2, n/τ ) ).

If n/(4ET_2) < 1, the above estimate holds trivially. Therefore

    E sup_{f∈F} | Σ_{i=1}^{⌊n/(4ET_2)⌋} Z_i(f) | 1_{ {N < ⌊n/(4ET_2)⌋} }
      ≤ a Σ_{i=1}^{⌊n/(4ET_2)⌋} E[ T_{i+1} 1_{ {N < ⌊n/(4ET_2)⌋} } ]
      ≤ a n ||T_2||_2 ( P( N < ⌊n/(4ET_2)⌋ ) )^{1/2}
      ≤ Kaτn e^{-nET_2/(Kτ^2)} ≤ Kaτ^3/ET_2.        (24)

Now we will bound the remaining part, i.e.

    E sup_{f∈F} | Σ_{i=1}^{⌊n/(4ET_2)⌋} Z_i(f) | 1_{ {N ≥ ⌊n/(4ET_2)⌋} }.
Recall that $Y_0 = (X_1,\dots,X_{T_1})$, $Y_i = (X_{S_i+1},\dots,X_{S_{i+1}})$ for $i \ge 1$, and consider the filtration $(\mathcal F_i)_{i\ge 0}$ defined as $\mathcal F_i = \sigma(Y_0,\dots,Y_i)$, where we regard the blocks $Y_i$ as random variables with values in the disjoint union $\bigcup_{i=1}^\infty S^i$, with the natural $\sigma$-field, i.e. the $\sigma$-field generated by $\bigcup_{i=1}^\infty \mathcal B^i$ (recall that $\mathcal B$ denotes our $\sigma$-field of reference in $S$). Let us further notice that $T_i$ is measurable with respect to $\sigma(Y_{i-1})$ for $i \ge 1$. We have, for $i \ge 1$, $\{N+1 \le i\} = \{T_1 + \dots + T_{i+1} > n\} \in \mathcal F_i$ and $\{N+1 \le 0\} = \emptyset$, so $N+1$ is a stopping time with respect to the filtration $(\mathcal F_i)$. Thus we have

$$\begin{aligned}
E\sup_{f\in\mathcal F}\Big|\sum_{i=1}^{n/4E\widetilde T_2} Z_i(f)\Big|\,1_{\{N \ge n/4E\widetilde T_2\}} &= E\sup_{f\in\mathcal F}\Big|\sum_{i=1}^{(n/4E\widetilde T_2)\wedge(N+1)} Z_i(f)\Big|\,1_{\{N+1 > n/4E\widetilde T_2\}}\\
&\le E\Big[E\Big[\sup_{f\in\mathcal F}\Big|\sum_{i=1}^{N+1} Z_i(f)\Big|\,\Big|\,\mathcal F_{(n/4E\widetilde T_2)\wedge(N+1)}\Big]1_{\{N+1 > n/4E\widetilde T_2\}}\Big]\\
&= E\sup_{f\in\mathcal F}\Big|\sum_{i=1}^{N+1} Z_i(f)\Big|\,1_{\{N+1 > n/4E\widetilde T_2\}}\\
&\le a\tau + E\sup_{f\in\mathcal F}\Big|\sum_{i=0}^{N+1} Z_i(f)\Big|\\
&\le a\tau + E\sup_{f\in\mathcal F}\Big|\sum_{i=1}^{n} f(X_i)\Big| + E\sup_{f\in\mathcal F}\Big|\sum_{i=n+1}^{S_{N+2}} f(X_i)\Big|,
\end{aligned}$$

where in the first inequality we used Doob's optional sampling theorem together with the fact that $\sup_{f\in\mathcal F}|\sum_{i=1}^{k} Z_i(f)|$ is a submartingale with respect to $(\mathcal F_i)$ (notice that $Z_i$ is measurable with respect to $\sigma(Y_i)$ for $i \in \mathbb N$ and $f \in \mathcal F$). The second equality follows from the fact that $\{N+1 > n/4E\widetilde T_2\} \in \mathcal F_{(n/4E\widetilde T_2)\wedge(N+1)}$. Indeed, for $i \ge n/4E\widetilde T_2$, we have
$$\{N+1 > n/4E\widetilde T_2\ \&\ (n/4E\widetilde T_2)\wedge(N+1) \le i\} = \{N+1 > n/4E\widetilde T_2\} \in \mathcal F_{n/4E\widetilde T_2} \subseteq \mathcal F_i,$$
whereas for $i < n/4E\widetilde T_2$ this set is empty.

Now, combining the above estimate with (23) and (24) and taking into account the inequality $\tau \ge E\widetilde T_2 \ge 1$, it is easy to see that to finish the proof of (22) it is enough to
show that
$$E\sup_{f\in\mathcal F}\Big|\sum_{i=n+1}^{S_{N+2}} f(X_i)\Big| \le Ka\tau^3/E\widetilde T_2. \tag{25}$$
This in turn will follow if we prove that $E(S_{N+2} - n) \le K\tau^3/E\widetilde T_2$. Recall now the first part of Lemma 3, stating (under our assumptions, $m = 1$) that
$$P(n - S_{N+1} > t) \le K\exp\Big(-\frac{t}{K\tau\log\tau}\Big) \quad\text{for } t \ge 0.$$
We have
$$\begin{aligned}
P(S_{N+2} - n > t) &\le P(n - S_{N+1} > t) + P(n - S_{N+1} \le t\ \&\ S_{N+2} - n > t\ \&\ N > 0) + P(S_{N+2} - n > t\ \&\ N = 0)\\
&\le Ke^{-t/K\tau\log\tau} + \sum_{k=0}^{t\wedge n} P(S_{N+1} = n-k\ \&\ T_{N+2} > t+k\ \&\ N > 0) + P(T_1 + T_2 > t)\\
&\le Ke^{-t/K\tau\log\tau} + \sum_{k=0}^{t\wedge n}\sum_{l=2}^{n-k} P(S_l = n-k\ \&\ T_{l+1} > t+k) + 2e^{-t/2\tau}\\
&\le Ke^{-t/K\tau\log\tau} + \sum_{k=0}^{t\wedge n}\sum_{l=2}^{n-k} P(S_l = n-k)P(T_{l+1} > t+k)\\
&\le Ke^{-t/K\tau\log\tau} + 2(t+1)e^{-t/\tau} \le Ke^{-t/K\tau\log\tau}.
\end{aligned}$$
This implies that $E(S_{N+2} - n) \le K\tau\log\tau \le K\tau^3/E\widetilde T_2$, which proves (25). Thus (22) is shown and Theorem 7 follows. $\square$

Remark. Two natural questions to ask in regard to Theorem 7 are, first, whether the constant $K$ in front of the expectation can be reduced to $1+\eta$ (as in Massart's Theorem 2 or in Theorem 4) and, second, whether one can reduce the constant $K$ in the Gaussian part to $2(1+\delta)$ as in Theorem 4.

3.3 Another counterexample

If we do not pay attention to constants, the main difference between the inequalities presented in the previous section and the classical Bernstein inequality for sums of i.i.d. bounded variables is the presence of the additional factor $\log n$. We would now like to argue that, under the assumptions of Theorems 6 and 7, this additional factor is indispensable. To be more precise, we will construct a Markov chain on a countable state space, satisfying the assumptions of Theorem 6 with $m = 1$ and such that for $\beta < 1$ there is
no constant $K$ such that
$$P(|f(X_1) + \dots + f(X_n)| \ge t) \le K\exp\Big(-\frac{1}{K}\min\Big(\frac{t^2}{n\operatorname{Var} Z_1(f)},\ \frac{t}{\log^\beta n}\Big)\Big) \tag{26}$$
for all $n$ and all functions $f\colon S \to \mathbb R$ with $\|f\|_\infty \le 1$ and $E_\pi f = 0$.

The state space of the chain will be the set
$$S = \{0\} \cup \bigcup_{n=1}^\infty \{n\}\times\{1,2,\dots,n\}\times\{+1,-1\}.$$
The transition probabilities are as follows:
$$p_{(n,k,s),(n,k+1,s)} = 1 \quad\text{for } n = 1,2,\dots,\ k = 1,2,\dots,n-1,\ s = -1,+1,$$
$$p_{(n,n,s),0} = 1 \quad\text{for } n = 1,2,\dots,\ s = -1,+1,$$
$$p_{0,(n,1,s)} = \frac{1}{2A}e^{-n} \quad\text{for } n = 1,2,\dots,\ s = -1,+1,$$
where $A = \sum_{n=1}^\infty e^{-n}$. In other words, whenever a particle is at 0, it chooses one of countably many loops and travels deterministically along it until the next return to 0. It is easy to check that this chain has a stationary distribution $\pi$, given by
$$\pi_0 = \frac{A}{A + \sum_{n=1}^\infty ne^{-n}}, \qquad \pi_{(n,i,s)} = \frac{1}{2A}e^{-n}\pi_0.$$
The chain satisfies the minorization condition (14) with $C = \{0\}$, $\nu(\{x\}) = p_{0,x}$, $\delta = 1$ and $m = 1$. The random variable $T_1$ is now just the time of the first visit to 0, and $T_2, T_3,\dots$ indicate the times between consecutive visits to 0. Moreover $P(T_2 = n) = e^{-n}/A$, so $\|T_2\|_{\psi_1} < \infty$. If we start the chain from the initial distribution $\nu$, then $T_1$ has the same law as $T_2$, so $\tau = \|T_2\|_{\psi_1} = \|T_1\|_{\psi_1}$.

Let us now assume that for some $\beta < 1$ there is a constant $K$ such that (26) holds. Since we work with a fixed chain, in what follows we will use the letter $K$ also to denote constants depending on our chain (the value of $K$ may again differ at different occurrences). We can in particular apply (26) to the function $f = f_r$, where $r$ is a large integer, given by the formula
$$f(0) = 0, \qquad f((n,i,s)) = s\,1_{\{n \ge r\}}.$$
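As a sanity check on the loop-chain construction above, the claimed stationary distribution can be verified numerically. The sketch below (an illustration only; variable names are ours, and the countable state space is truncated at loops of length 40, where the neglected mass $e^{-n}$ is negligible) checks that $\pi$ is a probability distribution and that the inflow into each state equals its $\pi$-mass:

```python
import math

N_MAX = 40  # truncate the countable state space; mass e^{-n} beyond this is negligible
A = sum(math.exp(-n) for n in range(1, N_MAX + 1))      # A = sum_n e^{-n}
B = sum(n * math.exp(-n) for n in range(1, N_MAX + 1))  # sum_n n e^{-n}

pi0 = A / (A + B)                 # pi(0) = A / (A + sum n e^{-n})
def pi(n, k, s):                  # pi((n, k, s)) = e^{-n} pi0 / (2A)
    return math.exp(-n) / (2 * A) * pi0

# normalization: pi sums to 1
total = pi0 + sum(pi(n, k, s)
                  for n in range(1, N_MAX + 1)
                  for k in range(1, n + 1) for s in (-1, 1))
assert abs(total - 1.0) < 1e-10

# stationarity at 0: inflow comes from all loop endpoints (n, n, s)
inflow_0 = sum(pi(n, n, s) for n in range(1, N_MAX + 1) for s in (-1, 1))
assert abs(inflow_0 - pi0) < 1e-10

# stationarity at (n, 1, s): entered from 0 with probability e^{-n}/(2A)
for n in range(1, N_MAX + 1):
    for s in (-1, 1):
        assert abs(pi0 * math.exp(-n) / (2 * A) - pi(n, 1, s)) < 1e-15
print("stationarity checks passed")
```

The checks for the states $(n,k,s)$ with $k \ge 2$ are analogous (each is entered deterministically from $(n,k-1,s)$, which carries the same mass).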
We have $E_\pi f_r = 0$. Moreover
$$\operatorname{Var} Z_1(f_r) = \sum_{n=r}^\infty \frac{n^2 e^{-n}}{A} \le Kr^2 e^{-r}.$$
Therefore (26) gives
$$P\big(|f_r(X_1) + \dots + f_r(X_n)| \ge Kre^{-r/2}\sqrt{nt} + t\log^\beta n\big) \le e^{-t} \tag{27}$$
for $t \ge 1$ and $n \in \mathbb N$.

Recall that $S_i = T_1 + \dots + T_i$. By Bernstein's inequality (Lemma 6), we have for large $n$,
$$\begin{aligned}
P(S_{n/3E\widetilde T_2} > n) &= P(T_1 + \dots + T_{n/3E\widetilde T_2} > n) = P\Big(\sum_{i=1}^{n/3E\widetilde T_2}(T_i - ET_i) > n - \frac{n}{3E\widetilde T_2}E\widetilde T_2\Big)\\
&\le P\Big(\sum_{i=1}^{n/3E\widetilde T_2}(T_i - ET_i) > n/2\Big) \le 2\exp\Big(-\frac{1}{K}\min\Big(\frac{n^2}{n\|T_2\|_{\psi_1}^2},\ \frac{n}{\|T_2\|_{\psi_1}}\Big)\Big) = 2e^{-n/K}.
\end{aligned}$$
From the above estimate, for some integer $L$ and $n$ large enough, divisible by $L$,
$$\begin{aligned}
&P\Big(\Big|\sum_{i=0}^{n/L} Z_i(f_r)\Big| \ge 2Kre^{-r/2}\sqrt{nt} + t\log^\beta n\Big)\\
&\quad\le 2e^{-n/K} + P\Big(\Big|\sum_{i=0}^{n/L} Z_i(f_r)\Big| \ge 2Kre^{-r/2}\sqrt{nt} + t\log^\beta n\ \&\ S_{n/L+1} \le n\Big)\\
&\quad\le 2e^{-n/K} + P\Big(\Big|\sum_{i=1}^{n} f_r(X_i)\Big| \ge Kre^{-r/2}\sqrt{nt} + t\log^\beta n\Big) + P\Big(\Big|\sum_{i=S_{n/L+1}+1}^{n} f_r(X_i)\Big| \ge Kre^{-r/2}\sqrt{nt} + t\log^\beta n\ \&\ S_{n/L+1} \le n\Big)\\
&\quad\le 2e^{-n/K} + e^{-t} + \sum_{k \le n} E\Big[1_{\{S_{n/L+1}=k\}}\,P_k\Big(\Big|\sum_{i=1}^{n-k} f_r(X_i)\Big| \ge Kre^{-r/2}\sqrt{nt} + t\log^\beta n\Big)\Big]\\
&\quad\le 2e^{-n/K} + e^{-t} + e^{-t}\sum_{k \le n} E\,1_{\{S_{n/L+1}=k\}} \le 2e^{-n/K} + 2e^{-t},
\end{aligned}$$
where in the third and fourth inequalities we used (27) and, in the conditioning step, the Markov property. For $n \ge r^2 e^r$ and $t \ge 1$, we obtain
$$P\big(|Z_0(f_r) + \dots + Z_{n/L}(f_r)| \ge Kt\log^\beta n\big) \le 2e^{-t} + 2e^{-n/K}. \tag{28}$$

On the other hand, we have $P(|Z_i(f_r)| \ge r) > \frac{1}{2A}e^{-r}$. Therefore
$$P\Big(\max_{i \le n/L} |Z_i(f_r)| > r\Big) \ge \frac{1}{2}\min\big(ne^{-r}/2AL,\ 1\big).$$
Since the variables $Z_i(f_r)$ are symmetric, by Lévy's inequality we get
$$2P\big(|Z_0(f_r) + \dots + Z_{n/L}(f_r)| \ge r\big) \ge \frac{1}{2}\min\big(ne^{-r}/2AL,\ 1\big),$$
and for $n \ge r^2 e^r$ (and $r$ large) the minimum equals 1, whereas (28) applied for $t = K^{-1}r/\log^\beta n \ge K^{-1}r^{1-\beta} \ge 1$ gives
$$P\big(|Z_0(f_r) + \dots + Z_{n/L}(f_r)| \ge r\big) \le 2e^{-r^{1-\beta}/K} + 2e^{-r^2e^r/K},$$
which gives a contradiction.
3.4 A bounded difference type inequality for symmetric functions

Now we will present an inequality for more general statistics of the chain. Under the same assumptions on the chain as above (with the additional restriction that $m = 1$), we will prove a version of the bounded difference inequality for symmetric functions (see e.g. [11] for the classical i.i.d. case). Let us consider a measurable function $f\colon S^n \to \mathbb R$ which is invariant under permutations of its arguments, i.e.
$$f(x_1,\dots,x_n) = f(x_{\sigma(1)},\dots,x_{\sigma(n)}) \tag{29}$$
for all permutations $\sigma$ of the set $\{1,\dots,n\}$. Let us also assume that $f$ is $L$-Lipschitz with respect to the Hamming distance, i.e.
$$|f(x_1,\dots,x_n) - f(y_1,\dots,y_n)| \le L\#\{i\colon x_i \neq y_i\}. \tag{30}$$
Then we have the following

Theorem 8. Let $X_1, X_2,\dots$ be a Markov chain with values in $S$, satisfying the Minorization condition with $m = 1$ and admitting a unique stationary distribution $\pi$. Assume also that $\|T_1\|_{\psi_1}, \|T_2\|_{\psi_1} \le \tau$. Then, for every function $f\colon S^n \to \mathbb R$ satisfying (29) and (30), we have
$$P\big(|f(X_1,\dots,X_n) - Ef(X_1,\dots,X_n)| \ge t\big) \le 2\exp\Big(-\frac{1}{K}\,\frac{t^2}{nL^2\tau^2}\Big)$$
for all $t \ge 0$.

To prove the above theorem, we will need the following

Lemma 7. Let $\varphi\colon \mathbb R \to \mathbb R$ be a convex function and $G = f(Y_1,\dots,Y_n)$, where $Y_1,\dots,Y_n$ are independent random variables with values in a measurable space $E$ and $f\colon E^n \to \mathbb R$ is a measurable function. Denote $G_i = f(Y_1,\dots,Y_{i-1},\widetilde Y_i,Y_{i+1},\dots,Y_n)$, where $\widetilde Y_1,\dots,\widetilde Y_n$ is an independent copy of $Y_1,\dots,Y_n$. Assume moreover that $|G - G_i| \le F_i(Y_i,\widetilde Y_i)$ for some functions $F_i\colon E^2 \to \mathbb R$, $i = 1,\dots,n$. Then
$$E\varphi(G - EG) \le E\varphi\Big(\sum_{i=1}^n \varepsilon_i F_i(Y_i,\widetilde Y_i)\Big), \tag{31}$$
where $\varepsilon_1,\dots,\varepsilon_n$ is a sequence of independent Rademacher variables, independent of $(Y_i)_{i=1}^n$ and $(\widetilde Y_i)_{i=1}^n$.
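The symmetrized bound (31) of Lemma 7 can be sanity-checked by exact enumeration on a toy example (this is an illustration, not part of the proof): take $n = 3$, the $Y_i$ uniform on $\{0,1\}$, $f = \max$, $\varphi(x) = x^2$ and $F_i(y,\tilde y) = |y - \tilde y|$, which dominates $|G - G_i|$ since $\max$ moves by at most $|y - \tilde y|$ when one coordinate changes:

```python
from itertools import product

def phi(x):          # a convex test function
    return x * x

n = 3
f = max              # |f(..., y, ...) - f(..., y', ...)| <= |y - y'|

bits = (0, 1)
# EG over the uniform distribution on {0,1}^n
EG = sum(f(y) for y in product(bits, repeat=n)) / 2**n

# left-hand side of (31): E phi(G - EG)
lhs = sum(phi(f(y) - EG) for y in product(bits, repeat=n)) / 2**n

# right-hand side: E phi(sum_i eps_i F_i(Y_i, Ytilde_i)) with F_i(y, yt) = |y - yt|,
# computed exactly by summing over all (Y, Ytilde, eps) configurations
rhs, count = 0.0, 0
for y in product(bits, repeat=n):
    for yt in product(bits, repeat=n):
        for eps in product((-1, 1), repeat=n):
            rhs += phi(sum(e * abs(a - b) for e, a, b in zip(eps, y, yt)))
            count += 1
rhs /= count

assert lhs <= rhs + 1e-12   # the inequality of Lemma 7
print(lhs, rhs)             # 0.109375 1.5
```

Here the left-hand side is just the variance of $\max(Y_1,Y_2,Y_3)$, namely $7/64$, while the right-hand side evaluates to $3/2$, so the bound holds with room to spare, as expected for a worst-case symmetrization.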
Proof. Induction with respect to $n$. For $n = 0$ the statement is obvious, since both the left-hand and the right-hand side of (31) equal $\varphi(0)$. Let us therefore assume that the lemma is true for $n - 1$. Then, denoting by $E_X$ integration with respect to the variable $X$,
$$\begin{aligned}
E\varphi(G - EG) &= E\varphi\big(G - E_{\widetilde Y_n}G_n + E_{Y_n}G - EG\big)\\
&\le E\varphi\big(G - G_n + E_{Y_n}G - EG\big) = E\varphi\big(G_n - G + E_{Y_n}G - EG\big)\\
&= E\varphi\big(\varepsilon_n(G - G_n) + E_{Y_n}G - EG\big) \le E\varphi\big(\varepsilon_n F_n(Y_n,\widetilde Y_n) + E_{Y_n}G - EG\big),
\end{aligned}$$
where the first equality uses $E_{\widetilde Y_n}G_n = E_{Y_n}G$, the first inequality is Jensen's inequality, the remaining equalities follow from symmetry, and the last inequality follows from the contraction principle (or simply the convexity of $\varphi$), applied conditionally on $(Y_i)_i$, $(\widetilde Y_i)_i$. Now, denoting $Z = E_{Y_n}G$, $Z_i = E_{Y_n}G_i$, we have for $i = 1,\dots,n-1$,
$$|Z - Z_i| = |E_{Y_n}G - E_{Y_n}G_i| \le E_{Y_n}|G - G_i| \le F_i(Y_i,\widetilde Y_i),$$
and thus, for fixed $Y_n$, $\widetilde Y_n$ and $\varepsilon_n$, we can apply the induction assumption to the function $t \mapsto \varphi(\varepsilon_n F_n(Y_n,\widetilde Y_n) + t)$ instead of $\varphi$ and $E_{Y_n}G$ instead of $G$, to obtain
$$E\varphi(G - EG) \le E\varphi\Big(\sum_{i=1}^n \varepsilon_i F_i(Y_i,\widetilde Y_i)\Big). \qquad\square$$

Lemma 8. In the setting of Lemma 7, if for all $i$ we have $\|F_i(Y_i,\widetilde Y_i)\|_{\psi_1} \le \tau$, then for all $t > 0$,
$$P\big(|f(Y_1,\dots,Y_n) - Ef(Y_1,\dots,Y_n)| \ge t\big) \le 2\exp\Big(-\frac{1}{K}\min\Big(\frac{t^2}{n\tau^2},\ \frac{t}{\tau}\Big)\Big).$$

Proof. For $p \ge 1$,
$$\|f(Y_1,\dots,Y_n) - Ef(Y_1,\dots,Y_n)\|_p \le \Big\|\sum_{i=1}^n \varepsilon_i F_i(Y_i,\widetilde Y_i)\Big\|_p \le K(\sqrt{pn}\,\tau + p\tau),$$
where the first inequality follows from Lemma 7 and the second one from Bernstein's inequality (Lemma 6) and integration by parts. Now, by Chebyshev's inequality, we get
$$P\big(|f(Y_1,\dots,Y_n) - Ef(Y_1,\dots,Y_n)| \ge K(\sqrt{tn}\,\tau + t\tau)\big) \le e^{-t}$$
for $t \ge 1$, which is, up to the constant in the exponent, equivalent to the statement of the lemma (note that if we can change the constant in the exponent, the choice of the constant in front of the exponential is arbitrary, provided it is bigger than 1). $\square$

Proof of Theorem 8. Consider the disjoint union
$$E = \bigcup_{i=1}^\infty S^i$$
and a function $g\colon E^n \to \mathbb R$ defined as
$$g(y_1,\dots,y_n) = f(x_1,\dots,x_n),$$
where the $x_i$'s are defined by the condition
$$y_1 = (x_1,\dots,x_{t_1}) \in S^{t_1},\quad y_2 = (x_{t_1+1},\dots,x_{t_1+t_2}) \in S^{t_2},\quad \dots,\quad y_n = (x_{t_1+\dots+t_{n-1}+1},\dots,x_{t_1+\dots+t_n}) \in S^{t_n} \tag{32}$$
(note that $t_1 + \dots + t_n \ge n$, so only the first $n$ coordinates $x_1,\dots,x_n$ enter $f$).

Let now $T_1,\dots,T_n$ be the regeneration times of the chain and set $Y_i = (X_{T_1+\dots+T_{i-1}+1},\dots,X_{T_1+\dots+T_i})$ for $i = 1,\dots,n$ (we change the enumeration with respect to the previous sections, but there is no longer a need to distinguish the initial block). Then $Y_1,\dots,Y_n$ are independent $E$-valued random variables (recall the assumption $m = 1$). Moreover, we have $f(X_1,\dots,X_n) = g(Y_1,\dots,Y_n)$.

Let now $\widetilde Y_1,\dots,\widetilde Y_n$ be an independent copy of the sequence $Y_1,\dots,Y_n$. Define $G$ and $G_i$ as in Lemma 7 (for the function $g$). Define also $\widetilde T_i = j$ iff $\widetilde Y_i \in S^j$, and let $\widetilde X_{i,1},\dots,\widetilde X_{i,\,T_1+\dots+T_{i-1}+\widetilde T_i+T_{i+1}+\dots+T_n}$ correspond to $Y_1,\dots,Y_{i-1},\widetilde Y_i,Y_{i+1},\dots,Y_n$ in the same way as in (32). Let us notice that we can rearrange the sequence $\widetilde X_{i,1},\dots,\widetilde X_{i,n}$ in such a way that the Hamming distance of the new sequence from $X_1,\dots,X_n$ will not exceed $\max(T_i,\widetilde T_i)$. Since the function $f$ is invariant under permutations of its arguments and $L$-Lipschitz with respect to the Hamming distance, we have
$$|G - G_i| \le L\max(T_i,\widetilde T_i) =: F_i(Y_i,\widetilde Y_i).$$
Moreover $\|F_i(Y_i,\widetilde Y_i)\|_{\psi_1} \le 2L\tau$, so by Lemma 8 we obtain
$$P\big(|f(X_1,\dots,X_n) - Ef(X_1,\dots,X_n)| \ge t\big) = P\big(|g(Y_1,\dots,Y_n) - Eg(Y_1,\dots,Y_n)| \ge t\big) \le 2\exp\Big(-\frac{1}{K}\min\Big(\frac{t^2}{nL^2\tau^2},\ \frac{t}{L\tau}\Big)\Big).$$
But from Jensen's inequality and (30) it follows that $|f(X_1,\dots,X_n) - Ef(X_1,\dots,X_n)| \le Ln$; thus for $t > Ln$ the left-hand side of the above inequality is equal to 0, whereas for $t \le Ln$ the inequality $\tau > 1$ gives
$$\frac{t^2}{nL^2\tau^2} \le \frac{t}{L\tau},$$
which proves the theorem. $\square$
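The block decomposition used in the proof above can be illustrated with a short sketch (helper names are ours): a trajectory is cut into regeneration blocks of lengths $T_i$, the blocks concatenate back to the trajectory, and replacing one block by an independent copy of the same length changes at most $\max(T_i,\widetilde T_i)$ coordinates, which is what feeds the Hamming-Lipschitz condition:

```python
def split_blocks(xs, lengths):
    """Cut xs into consecutive blocks of the given lengths (the blocks Y_i)."""
    blocks, pos = [], 0
    for t in lengths:
        blocks.append(xs[pos:pos + t])
        pos += t
    assert pos == len(xs)
    return blocks

def hamming(a, b):
    """Hamming distance between two sequences of equal length."""
    return sum(x != y for x, y in zip(a, b))

xs = list("abcdefgh")                 # a trajectory X_1, ..., X_8
blocks = split_blocks(xs, [3, 2, 3])  # regeneration times T = (3, 2, 3)
assert [x for b in blocks for x in b] == xs   # blocks concatenate back to xs

# replace the middle block by an independent copy of the same length:
new = blocks[0] + list("XY") + blocks[2]
assert hamming(xs, new) <= max(2, 2)  # at most max(T_i, T~_i) coordinates differ
print(hamming(xs, new))               # 2
```

In the proof the replacement block may of course have a different length, which is why a rearrangement (permitted by the symmetry of $f$) is needed before the Hamming bound applies; the equal-length case shown here is the simplest instance.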
3.5 A few words on connections with other results

First we would like to comment on the assumptions of our main theorems concerning Markov chains. We assume that the Orlicz norms $\|T_1\|_{\psi_1}$ and $\|T_2\|_{\psi_1}$ are finite, which is equivalent to the existence of a number $\kappa > 1$ such that $E_\xi \kappa^{T_1} < \infty$, $E_\nu \kappa^{T_1} < \infty$, where $\xi$ is the initial distribution of the chain and $\nu$ the minorizing measure from condition (14). This is true for instance if $m = 1$ and the chain satisfies the drift condition, i.e. if there is a measurable function $V\colon S \to [1,\infty)$, together with constants $\lambda < 1$ and $K < \infty$, such that
$$PV(x) = \int_S V(y)P(x,dy) \le \begin{cases} \lambda V(x) & \text{for } x \notin C,\\ K & \text{for } x \in C, \end{cases}$$
and $V$ is $\xi$- and $\nu$-integrable (see e.g. [1], Propositions 4.1 and 4.4; see also [22], [18]). For $m > 1$ one can similarly consider the kernel $P^m$ instead of $P$; however, in this case our inequalities are restricted to averages of real valued functions, as in Theorem 6. Such drift conditions have gained considerable attention in the Markov Chain Monte Carlo theory, as they imply geometric ergodicity of the chain.

Concentration of measure inequalities for general functions of Markov chains were investigated by Marton [13], Samson [23] and, more recently, by Kontorovich and Ramanan [8]. They actually consider more general mixing processes and give estimates on the deviation of a random variable from its mean or median in terms of mixing coefficients. When specialized to Markov chains, their estimates yield inequalities in the spirit of Theorem 8 for general (not necessarily symmetric) functions of uniformly ergodic Markov chains (see [18], Chapter 16, for the definition). To obtain their results, Marton and Samson used transportation inequalities, whereas Kontorovich's and Ramanan's approach was based on martingales. In all cases the bounds involve sums of expressions of the form
$$\sup_{x,y\in S} \|P^i(x,\cdot) - P^i(y,\cdot)\|_{TV},$$
where $P^i$ is the $i$-step transition function of the chain.
These results are not well suited for Markov chains which are not uniformly ergodic (like the chain in Section 3.3), since for such chains the summands are bounded from below by a constant, which spoils the dependence on $n$ in the estimates. It would be interesting to know if, in results of this type, the supremum of the total variation distances can be replaced by some other norm, for instance a kind of average. This would allow one to extend the estimates to some classes of non-uniformly ergodic Markov chains.

Inequalities of the bounded difference type for sums $f(X_1) + \dots + f(X_n)$, where the $X_i$'s form a uniformly ergodic Markov chain, were also obtained by Glynn and Ormoneit [6]. Their method was to analyze the Poisson equation associated with the chain. Their result has been complemented by an information theoretic approach in Kontoyiannis et al. [9].
Estimates for sums in terms of the variance appeared in the work by Samson [23], who presents a result for empirical processes of uniformly ergodic chains. He gives a real concentration inequality around the mean, and not just a tail bound as in Theorem 7. The coefficient responsible for the subgaussian behavior of the tail is $E\sum_{i=1}^n \sup_{f\in\mathcal F} f(X_i)^2$. Replacing it with $V = E\sup_{f\in\mathcal F}\sum_i f(X_i)^2$ (which would correspond to the original Talagrand inequality) is stated in Samson's work as an open problem, which to the best of our knowledge has not yet been solved. Additionally, in Samson's estimate there is no $\log n$ factor, which is present in Theorems 6 and 7. Since we have shown that in our setting this factor is indispensable, we would like to comment on the differences between Samson's results and ours.

Obviously, the first difference is the setting. Although non-uniformly ergodic chains may satisfy our assumptions $\|T_1\|_{\psi_1}, \|T_2\|_{\psi_1} < \infty$, the Minorization condition need not hold for them with $m = 1$, which restricts our results to linear statistics of the chain (Theorem 6). However, there are many examples of non-uniformly ergodic chains for which one cannot apply Samson's result but which satisfy our assumptions. Such chains have been considered in the MCMC theory.

When specialized to sums of real variables, Samson's result can be considered a counterpart of the Bernstein inequality, valid for uniformly ergodic Markov chains. The subgaussian part of the estimate is controlled by $\sum_{i=1}^n Ef(X_i)^2$, which can be much bigger than the asymptotic variance and therefore does not reflect the limiting behaviour of $f(X_1) + \dots + f(X_n)$. Consider for instance a chain consisting of the origin connected with finitely many loops, in which, similarly as in the example from Section 3.3, the randomness appears only at the origin (i.e. after the choice of the loop, the particle travels along it deterministically until the next return to the origin).
Then one can easily construct a function $f$ with values in $\{\pm 1\}$, centered with respect to the stationary distribution and such that its asymptotic variance is equal to zero, whereas $\sum_{i=1}^n Ef(X_i)^2 = n$ for all $n$ (this happens for instance if the sum of the values of $f$ along each loop vanishes). In consequence, $n^{-1/2}(f(X_1) + \dots + f(X_n))$ converges weakly to the Dirac mass at 0 and we have $P(|f(X_1) + \dots + f(X_n)| \ge \sqrt{n}\,t) \to 0$ for all $t > 0$, which is not recovered by Samson's estimate. One can also construct other examples of a similar flavour, in which the asymptotic variance is nonzero but still much smaller than $E\sum_{i=1}^n f(X_i)^2$.

On the other hand, Samson's results do not require the condition $E_\pi f = 0$ and, as already mentioned, in the case of empirical processes they provide a two-sided concentration around the mean. As for the $\log n$ factor, at present we do not know if, at the cost of replacing the asymptotic variance with $\sum_{i=1}^n Ef(X_i)^2$, one can eliminate it in our setting.

Summarizing, our inequalities, when compared to known results, have both advantages and disadvantages. On the one hand, when specialized to uniformly ergodic
Markov chains, they do not recover the full generality or strength of previous estimates (for instance, Theorem 8 is restricted to symmetric statistics and $m = 1$); on the other hand, they may be applied to Markov chains arising in statistical applications which are not uniformly ergodic and are therefore beyond the scope of the estimates presented above. Another property which, in our opinion, makes the estimates of Theorems 6 and 7 interesting, at least from the theoretical point of view, is the fact that for $m = 1$ the coefficient responsible for the Gaussian level of concentration corresponds to the variance of the limiting Gaussian distribution.

Acknowledgements

The author would like to thank Witold Bednorz and Krzysztof Łatuszyński for their useful comments concerning the results presented in this article, as well as the anonymous Referee, whose remarks helped improve their presentation.

References

[1] Baxendale P. H. Renewal theory and computable convergence rates for geometrically ergodic Markov chains. Ann. Appl. Probab. 15 (2005), no. 1B.

[2] Boucheron S., Bousquet O., Lugosi G., Massart P. Moment inequalities for functions of independent random variables. Ann. Probab. 33 (2005), no. 2.

[3] Bousquet O. A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 (2002), no. 6.

[4] Einmahl U., Li D. Characterization of LIL behavior in Banach space. To appear in Trans. Amer. Math. Soc.

[5] Giné E., Latała R., Zinn J. Exponential and moment inequalities for U-statistics. In High Dimensional Probability II, Progr. Probab. 47, Birkhäuser, Boston, MA, 2000.

[6] Glynn P. W., Ormoneit D. Hoeffding's inequality for uniformly ergodic Markov chains. Statist. Probab. Lett. 56 (2002), no. 2.

[7] Klein T., Rio E. Concentration around the mean for maxima of empirical processes. Ann. Probab. 33 (2005), no. 3.

[8] Kontorovich L., Ramanan K. Concentration Inequalities for Dependent Random Variables via the Martingale Method. To appear in Ann. Probab.
[9] Kontoyiannis I., Lastras-Montaño L., Meyn S. P. Relative entropy and exponential deviation bounds for general Markov chains. In Proceedings of the IEEE International Symposium on Information Theory, 2005.
9.8 Graphing Rational Functions Lets begin with a deinition. Deinition: Rational Function A rational unction is a unction o the orm ( ) ( ) ( ) P where P and Q are polynomials. Q An eample o a simple rational
More informationRATIONAL FUNCTIONS. Finding Asymptotes..347 The Domain Finding Intercepts Graphing Rational Functions
RATIONAL FUNCTIONS Finding Asymptotes..347 The Domain....350 Finding Intercepts.....35 Graphing Rational Functions... 35 345 Objectives The ollowing is a list o objectives or this section o the workbook.
More informationNotes 15 : UI Martingales
Notes 15 : UI Martingales Math 733 - Fall 2013 Lecturer: Sebastien Roch References: [Wil91, Chapter 13, 14], [Dur10, Section 5.5, 5.6, 5.7]. 1 Uniform Integrability We give a characterization of L 1 convergence.
More information14.1 Finding frequent elements in stream
Chapter 14 Streaming Data Model 14.1 Finding frequent elements in stream A very useful statistics for many applications is to keep track of elements that occur more frequently. It can come in many flavours
More informationNotes 6 : First and second moment methods
Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative
More informationn! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2
Order statistics Ex. 4. (*. Let independent variables X,..., X n have U(0, distribution. Show that for every x (0,, we have P ( X ( < x and P ( X (n > x as n. Ex. 4.2 (**. By using induction or otherwise,
More informationRuelle Operator for Continuous Potentials and DLR-Gibbs Measures
Ruelle Operator or Continuous Potentials and DLR-Gibbs Measures arxiv:1608.03881v4 [math.ds] 22 Apr 2018 Leandro Cioletti Departamento de Matemática - UnB 70910-900, Brasília, Brazil cioletti@mat.unb.br
More informationarxiv: v1 [math.oc] 1 Apr 2012
arxiv.tex; October 31, 2018 On the extreme points o moments sets arxiv:1204.0249v1 [math.oc] 1 Apr 2012 Iosi Pinelis Department o Mathematical ciences Michigan Technological University Houghton, Michigan
More informationSOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS
SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary
More informationLecture 1 Measure concentration
CSE 29: Learning Theory Fall 2006 Lecture Measure concentration Lecturer: Sanjoy Dasgupta Scribe: Nakul Verma, Aaron Arvey, and Paul Ruvolo. Concentration of measure: examples We start with some examples
More informationHOPF S DECOMPOSITION AND RECURRENT SEMIGROUPS. Josef Teichmann
HOPF S DECOMPOSITION AND RECURRENT SEMIGROUPS Josef Teichmann Abstract. Some results of ergodic theory are generalized in the setting of Banach lattices, namely Hopf s maximal ergodic inequality and the
More informationA note on the convex infimum convolution inequality
A note on the convex infimum convolution inequality Naomi Feldheim, Arnaud Marsiglietti, Piotr Nayar, Jing Wang Abstract We characterize the symmetric measures which satisfy the one dimensional convex
More informationLecture 17 Brownian motion as a Markov process
Lecture 17: Brownian motion as a Markov process 1 of 14 Course: Theory of Probability II Term: Spring 2015 Instructor: Gordan Zitkovic Lecture 17 Brownian motion as a Markov process Brownian motion is
More informationSections of Convex Bodies via the Combinatorial Dimension
Sections of Convex Bodies via the Combinatorial Dimension (Rough notes - no proofs) These notes are centered at one abstract result in combinatorial geometry, which gives a coordinate approach to several
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More informationChapter 7. Markov chain background. 7.1 Finite state space
Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider
More informationWeb Appendix for The Value of Switching Costs
Web Appendix or The Value o Switching Costs Gary Biglaiser University o North Carolina, Chapel Hill Jacques Crémer Toulouse School o Economics (GREMAQ, CNRS and IDEI) Gergely Dobos Gazdasági Versenyhivatal
More information1 Relative degree and local normal forms
THE ZERO DYNAMICS OF A NONLINEAR SYSTEM 1 Relative degree and local normal orms The purpose o this Section is to show how single-input single-output nonlinear systems can be locally given, by means o a
More informationOn Picard value problem of some difference polynomials
Arab J Math 018 7:7 37 https://doiorg/101007/s40065-017-0189-x Arabian Journal o Mathematics Zinelâabidine Latreuch Benharrat Belaïdi On Picard value problem o some dierence polynomials Received: 4 April
More informationk-protected VERTICES IN BINARY SEARCH TREES
k-protected VERTICES IN BINARY SEARCH TREES MIKLÓS BÓNA Abstract. We show that for every k, the probability that a randomly selected vertex of a random binary search tree on n nodes is at distance k from
More informationRigorous pointwise approximations for invariant densities of non-uniformly expanding maps
Ergod. Th. & Dynam. Sys. 5, 35, 8 44 c Cambridge University Press, 4 doi:.7/etds.3.9 Rigorous pointwise approximations or invariant densities o non-uniormly expanding maps WAEL BAHSOUN, CHRISTOPHER BOSE
More informationDiscrete Mathematics. On the number of graphs with a given endomorphism monoid
Discrete Mathematics 30 00 376 384 Contents lists available at ScienceDirect Discrete Mathematics journal homepage: www.elsevier.com/locate/disc On the number o graphs with a given endomorphism monoid
More informationLARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011
LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic
More informationLecture 4 Lebesgue spaces and inequalities
Lecture 4: Lebesgue spaces and inequalities 1 of 10 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 4 Lebesgue spaces and inequalities Lebesgue spaces We have seen how
More informationarxiv: v1 [math.gt] 20 Feb 2008
EQUIVALENCE OF REAL MILNOR S FIBRATIONS FOR QUASI HOMOGENEOUS SINGULARITIES arxiv:0802.2746v1 [math.gt] 20 Feb 2008 ARAÚJO DOS SANTOS, R. Abstract. We are going to use the Euler s vector ields in order
More information1. Stochastic Processes and filtrations
1. Stochastic Processes and 1. Stoch. pr., A stochastic process (X t ) t T is a collection of random variables on (Ω, F) with values in a measurable space (S, S), i.e., for all t, In our case X t : Ω S
More informationBrownian Motion. 1 Definition Brownian Motion Wiener measure... 3
Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................
More informationFeedback Linearization
Feedback Linearization Peter Al Hokayem and Eduardo Gallestey May 14, 2015 1 Introduction Consider a class o single-input-single-output (SISO) nonlinear systems o the orm ẋ = (x) + g(x)u (1) y = h(x) (2)
More informationAsymptotically Efficient Estimation of the Derivative of the Invariant Density
Statistical Inerence or Stochastic Processes 6: 89 17, 3. 3 Kluwer cademic Publishers. Printed in the Netherlands. 89 symptotically Eicient Estimation o the Derivative o the Invariant Density NK S. DLLYN
More informationIndependence of some multiple Poisson stochastic integrals with variable-sign kernels
Independence of some multiple Poisson stochastic integrals with variable-sign kernels Nicolas Privault Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang Technological
More informationChapter 1. Poisson processes. 1.1 Definitions
Chapter 1 Poisson processes 1.1 Definitions Let (, F, P) be a probability space. A filtration is a collection of -fields F t contained in F such that F s F t whenever s
More informationNON-AUTONOMOUS INHOMOGENEOUS BOUNDARY CAUCHY PROBLEMS AND RETARDED EQUATIONS. M. Filali and M. Moussi
Electronic Journal: Southwest Journal o Pure and Applied Mathematics Internet: http://rattler.cameron.edu/swjpam.html ISSN 83-464 Issue 2, December, 23, pp. 26 35. Submitted: December 24, 22. Published:
More informationNotes on Gaussian processes and majorizing measures
Notes on Gaussian processes and majorizing measures James R. Lee 1 Gaussian processes Consider a Gaussian process {X t } for some index set T. This is a collection of jointly Gaussian random variables,
More informationScattered Data Approximation of Noisy Data via Iterated Moving Least Squares
Scattered Data Approximation o Noisy Data via Iterated Moving Least Squares Gregory E. Fasshauer and Jack G. Zhang Abstract. In this paper we ocus on two methods or multivariate approximation problems
More informationCHOW S LEMMA. Matthew Emerton
CHOW LEMMA Matthew Emerton The aim o this note is to prove the ollowing orm o Chow s Lemma: uppose that : is a separated inite type morphism o Noetherian schemes. Then (or some suiciently large n) there
More informationIntroduction to Dynamical Systems
Introduction to Dynamical Systems France-Kosovo Undergraduate Research School of Mathematics March 2017 This introduction to dynamical systems was a course given at the march 2017 edition of the France
More informationComplexity and Algorithms for Two-Stage Flexible Flowshop Scheduling with Availability Constraints
Complexity and Algorithms or Two-Stage Flexible Flowshop Scheduling with Availability Constraints Jinxing Xie, Xijun Wang Department o Mathematical Sciences, Tsinghua University, Beijing 100084, China
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini April 27, 2018 1 / 80 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationWiener Measure and Brownian Motion
Chapter 16 Wiener Measure and Brownian Motion Diffusion of particles is a product of their apparently random motion. The density u(t, x) of diffusing particles satisfies the diffusion equation (16.1) u
More informationDIFFERENTIAL POLYNOMIALS GENERATED BY SECOND ORDER LINEAR DIFFERENTIAL EQUATIONS
Journal o Applied Analysis Vol. 14, No. 2 2008, pp. 259 271 DIFFERENTIAL POLYNOMIALS GENERATED BY SECOND ORDER LINEAR DIFFERENTIAL EQUATIONS B. BELAÏDI and A. EL FARISSI Received December 5, 2007 and,
More informationConcentration Inequalities for Dependent Random Variables via the Martingale Method
Concentration Inequalities for Dependent Random Variables via the Martingale Method Leonid Kontorovich, and Kavita Ramanan Carnegie Mellon University L. Kontorovich School of Computer Science Carnegie
More informationGeneralization theory
Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1
More informationCentral Limit Theorems and Proofs
Math/Stat 394, Winter 0 F.W. Scholz Central Limit Theorems and Proos The ollowin ives a sel-contained treatment o the central limit theorem (CLT). It is based on Lindeber s (9) method. To state the CLT
More informationAn almost sure invariance principle for additive functionals of Markov chains
Statistics and Probability Letters 78 2008 854 860 www.elsevier.com/locate/stapro An almost sure invariance principle for additive functionals of Markov chains F. Rassoul-Agha a, T. Seppäläinen b, a Department
More informationStrong Lyapunov Functions for Systems Satisfying the Conditions of La Salle
06 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 49, NO. 6, JUNE 004 Strong Lyapunov Functions or Systems Satisying the Conditions o La Salle Frédéric Mazenc and Dragan Ne sić Abstract We present a construction
More information