Deviation inequalities for martingales with applications

Similar documents
A generalization of Cramér large deviations for martingales

Cramér large deviation expansions for martingales under Bernstein s condition

Hoeffding s inequality for supermartingales

arxiv: v1 [math.pr] 12 Jun 2012

Sharp large deviation results for sums of independent random variables

Self-normalized Cramér-Type Large Deviations for Independent Random Variables

Introduction to Self-normalized Limit Theory

A CLT FOR MULTI-DIMENSIONAL MARTINGALE DIFFERENCES IN A LEXICOGRAPHIC ORDER GUY COHEN. Dedicated to the memory of Mikhail Gordin

On large deviations for combinatorial sums

MOMENT CONVERGENCE RATES OF LIL FOR NEGATIVELY ASSOCIATED SEQUENCES

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535

1. Stochastic Processes and filtrations

4 Sums of Independent Random Variables

arxiv: v3 [math.pr] 14 Sep 2014

Mi-Hwa Ko. t=1 Z t is true. j=0

Nonparametric regression with martingale increment errors

4 Expectation & the Lebesgue Theorems

STAT 200C: High-dimensional Statistics

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek

Some functional (Hölderian) limit theorems and their applications (II)

Self-normalized deviation inequalities with application to t-statistic

A log-scale limit theorem for one-dimensional random walks in random environments

Lecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales.

Large deviations for random walks under subexponentiality: the big-jump domain

Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals

Entropy and Ergodic Theory Lecture 15: A first look at concentration

The Convergence Rate for the Normal Approximation of Extreme Sums

Probability and Measure

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

3. Probability inequalities

Lecture 22 Girsanov s Theorem

OPTIMAL SOLUTIONS TO STOCHASTIC DIFFERENTIAL INCLUSIONS

A Type of Shannon-McMillan Approximation Theorems for Second-Order Nonhomogeneous Markov Chains Indexed by a Double Rooted Tree

Bahadur representations for bootstrap quantiles 1

Total Expected Discounted Reward MDPs: Existence of Optimal Policies

Conditional independence, conditional mixing and conditional association

SUPPLEMENT TO PAPER CONVERGENCE OF ADAPTIVE AND INTERACTING MARKOV CHAIN MONTE CARLO ALGORITHMS

Estimation of the functional Weibull-tail coefficient

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

Spectral Gap and Concentration for Some Spherically Symmetric Probability Measures

On lower and upper bounds for probabilities of unions and the Borel Cantelli lemma

Limiting behaviour of moving average processes under ρ-mixing assumption

Wasserstein-2 bounds in normal approximation under local dependence

Practical approaches to the estimation of the ruin probability in a risk model with additional funds

Asymptotic Tail Probabilities of Sums of Dependent Subexponential Random Variables

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes

Complete q-moment Convergence of Moving Average Processes under ϕ-mixing Assumption

The exact asymptotics for hitting probability of a remote orthant by a multivariate Lévy process: the Cramér case

P (A G) dp G P (A G)

MODERATE DEVIATIONS FOR STATIONARY PROCESSES

Complete Moment Convergence for Sung s Type Weighted Sums of ρ -Mixing Random Variables

Random convolution of O-exponential distributions

Selected Exercises on Expectations and Some Probability Inequalities

Exponential martingales: uniform integrability results and applications to point processes

The Codimension of the Zeros of a Stable Process in Random Scenery

3 Integration and Expectation

SUMMARY OF RESULTS ON PATH SPACES AND CONVERGENCE IN DISTRIBUTION FOR STOCHASTIC PROCESSES

EXPONENTIAL INEQUALITIES FOR SELF-NORMALIZED MARTINGALES WITH APPLICATIONS

A Martingale Central Limit Theorem

ITERATED FUNCTION SYSTEMS WITH CONTINUOUS PLACE DEPENDENT PROBABILITIES

March 1, Florida State University. Concentration Inequalities: Martingale. Approach and Entropy Method. Lizhe Sun and Boning Yang.

Zeros of lacunary random polynomials

On an Effective Solution of the Optimal Stopping Problem for Random Walks

STA205 Probability: Week 8 R. Wolpert

Consistency of the maximum likelihood estimator for general hidden Markov models

A note on the growth rate in the Fazekas Klesov general law of large numbers and on the weak law of large numbers for tail series

Properties of an infinite dimensional EDS system : the Muller s ratchet

On the Bennett-Hoeffding inequality

Lecture 17 Brownian motion as a Markov process

Krzysztof Burdzy University of Washington. = X(Y (t)), t 0}

On Asymptotic Proximity of Distributions

LARGE DEVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILED DEPENDENT RANDOM VECTORS*

Exercises in Extreme value theory

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case

(2m)-TH MEAN BEHAVIOR OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS UNDER PARAMETRIC PERTURBATIONS

Mandelbrot s cascade in a Random Environment

An almost sure invariance principle for additive functionals of Markov chains

LAN property for ergodic jump-diffusion processes with discrete observations

On the asymptotic normality of frequency polygons for strongly mixing spatial processes. Mohamed EL MACHKOURI

Rare Events in Random Walks and Queueing Networks in the Presence of Heavy-Tailed Distributions

Research Article Exponential Inequalities for Positively Associated Random Variables and Applications

The Subexponential Product Convolution of Two Weibull-type Distributions

Mean convergence theorems and weak laws of large numbers for weighted sums of random variables under a condition of weighted integrability

arxiv: v1 [math.pr] 9 May 2014

ARTICLE IN PRESS. J. Math. Anal. Appl. ( ) Note. On pairwise sensitivity. Benoît Cadre, Pierre Jacob

Your first day at work MATH 806 (Fall 2015)

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Asymptotic behavior for sums of non-identically distributed random variables

On Optimal Stopping Problems with Power Function of Lévy Processes

SPECTRAL GAP FOR ZERO-RANGE DYNAMICS. By C. Landim, S. Sethuraman and S. Varadhan 1 IMPA and CNRS, Courant Institute and Courant Institute

Refining the Central Limit Theorem Approximation via Extreme Value Theory

Tail bound inequalities and empirical likelihood for the mean

IEOR 6711: Stochastic Models I Fall 2013, Professor Whitt Lecture Notes, Thursday, September 5 Modes of Convergence

Lecture 4 Lebesgue spaces and inequalities

On large deviations of sums of independent random variables

(A n + B n + 1) A n + B n

A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE

Complete moment convergence for moving average process generated by ρ -mixing random variables

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Probability and Measure

Transcription:

Deviation inequalities for martingales with applications Xiequan Fan,a,b, Ion Grama c, Quansheng Liu,c a Center for Applied Mathematics, Tianjin University, Tianjin 300072, China b Regularity Team, Inria, France c Université de Bretagne-Sud, LMBA, UMR CNRS 6205, Campus de Tohannic, 5607 Vannes, France Abstract Using changes of probability measure developed by Grama and Haeusler Stochastic Process. Appl., 2000, we extend the deviation inequalities of Lanzinger and Stadtmüller Stochastic Process. Appl., 2000 and Fuk and Nagaev Theory Probab. Appl., 97 to the case of martingales. Our inequalities recover the best possible decaying rate in the independent case. In particular, these inequalities improve the results of Lesigne and Volný Stochastic Process. Appl., 200 under a stronger condition that the martingale differences have bounded conditional moments. Applications to linear regressions with martingale difference innovations, weak invariance principles for martingales and self-normalized deviations are provided. In particular, we establish a type of self-normalized deviation bounds for parameter estimation of linear regressions. Such type bounds have the advantage that they do not depend on the distribution of the regression random variables. Keywords: Large deviations; Martingales; Deviation inequalities; Linear regressions; Weak invariance principles; Self-normalized deviations 2000 MSC: Primary 60G42, 60E5, 60F0; Secondary 62J05, 60F7.. Introduction Assume that ξ i i is a sequence of centered random variables. Denote by S n = n i= ξ i the partial sums of ξ i i. If ξ i i are independent and identically distributed i.i.d. and satisfy the following subexponential condition: for a constant α 0,, K α := E[ξ 2 expξ + α <,. where x + = maxx,0, Lanzinger and Stadtmüller [26 have obtained the following subexponential inequality: for any x,y > 0, P S n x exp x y α nk α + n E[expξ + 2xy α e yα α..2 Corresponding authors. E-mail: fanxiequan@hotmail.com X. Fan, quansheng.liu@univ-ubs.fr Q. Liu. ion.grama@univ-ubs.fr I. Grama, Preprint submitted to Elsevier November 5, 206

In particular, by taking y = x, inequality.2 implies that for any x > 0, lim sup n n α logp S n nx x α.3 and P S n n = O exp cn α, n,.4 wherec > 0doesnotdependonnandcanbeanyvaluesmallerthanc 0 = supt > 0 : E[exptξ + α < see [26, Remark 3. The last two results.3 and.4 are the best possible under the present condition, since a large deviation principle LDP with good rate function x α can be obtained in situations where some more information on the tail behavior of ξ is available; see Theorem 2.3. Under the subexponential condition., more precise estimations on tail probabilities, or large deviation expansions, can be found in Nagaev [32, 33, Saulis and Statulevičius [36 and Borovkov [3, 4. Recently, the generalizations of.4 have attracted certain interest. Doukhan and Neumann [0 have established a generalization of.4 under a new concept of weak dependence which extends usual mixing assumptions. This concept is particularly well suited for deriving estimates for the cumulants of sums of random variables. For a LDP for weighted sum of i.i.d. random variables satisfying conditions similar to. we refer to Kiesel and Stadtmüller [25 and Gantert, Ramanan and Rembart [7. Ourfirstaimistogiveageneralization of.4 formartingales. Letξ i,f i i beasequenceofmartingaledifferenceswithrespecttothefiltrationf i i. UndertheCramérconditionsup i E[exp ξ i <, Lesigne andvolný [27firstproved that.4 holdswithα = /3, and that thepower /3 is optimal even for the class of stationary and ergodic sequences of martingale differences. Later, Fan, Grama and Liu [2 generalized the result of Lesigne and Volný by proving that.4 holds under the more general exponential moment condition sup i E[exp ξ i 2α α <, α 0,, and that the power α in.4 is optimal for the class of stationary sequences of martingale differences. It is obvious that the condition sup i E[exp ξ i 2α α < is much stronger than condition.. Thus, the result in [2 does not imply.4 in the i.i.d. case. To fill this gap, we consider the case of the martingale differences having bounded conditional subexponential moments. Under this assumption, we can recover the inequalities.2,.3 and.4; see Theorem 2.. Denote by ξ the essential supremum of a random variable ξ. Our first result implies that if u n := max E[ξi 2 expξ i + α F i, <, then, for any x > 0, i= P max S k x k n 2exp x 2 2u n +x 2 α..5 To illustrate the bound.5, consider the simple case where ξ i i is a sequence of i.i.d. random variables: in this case u n = On as n. It is interesting to see that when 0 x = on /2 p, our bound.5 is sub-gaussian. When x = ny with y > 0 fixed, the bound.5 is subexponential exp c y n α, where c y > 0 does not depend on n, thus recovering.4. 2

For martingale differences ξ i,f i i satisfying u n = on 2 α, we find an improved version of the inequality.3; see Corollary 2.2. To show the best value of the constant in.3, for i.i.d. random variables we establish large deviation principles for Sn n = n n i= ξ i and n max k ns k in Theorem 2.3. For the methods, an approach for obtaining subexponential bound is to combine the method of the tower property of conditional expectation and uniform norm. This approach has been applied in Fuk [6, Chung and Lu [5, Liu and Watbled [30 and Dedecker and Fan [6. With this approach, one can obtain inequality.2 with nk α = E[ξi 2 expξ i + α F i..6 i= However, this result is not the best possible in some cases. It turns out that with the method based on the change of probability measure for martingales from Grama and Haeusler [8 one can obtain better bounds. Using this method, we show that the inequality.2 holds with nk α = E[ξi 2 expξ+ i α F i..7 i= Such a refinement has been first proved by Freedman [4 for improving Fuk s inequality [6 for martingales, and similar analogs were established by Haeusler [9, van de Geer [38, de la Peña [8 and [. Since the latter nk α is less than the former one, i.e., E[ξi 2 expξ+ i α F i i= E[ξi 2 expξ+ i α F i, our method has certain advantage. To illustrate it, consider the following example. Assume that ε i i is a sequence of independent and unbounded random variables, and that ε i i is independent of ξ i,f i i. Assume that E[ξi F 2 i and E[ξi 2 exp ξ i α F i D i= / for a constant D and all i. Denote by ξ i = ξ n iε i k= ε2 k and F i = σε j, j i,f i, ε k, k n. Then ξ i,f i i is also a sequence of martingale differences. It is easy to see that E[ξ i 2 expξ i + α F i ε 2 i n k= ke[ξ 2 ε2 i exp ξ i α F i D, i= i= and that, by the fact that ε i i are unbounded, E[ξ i 2 expξ i + α F i ε 2 i n i F k= ke[ξ 2 i ε2 i= i= 3 i= ε 2 i n k= ε2 k = n.

Hence and sup n sup n E[ξ i 2 expξ i + α F i D, i= E[ξ i 2 expξ i + α F i =. i= Thus our extension.7 in the right hand-side of.2 is nontrivial, while with.6 it is infinite. Using the same method, we also generalize the following Fuk inequality for martingales cf. Corollary 3 of Fuk [6; see also Nagaev [33 for independent case: assume that E[ ξ i p F i < for some p 2 and all i [,n, then for any x > 0, P max S k x k n exp x2 + C p 2Ṽ 2 x p,.8 where Ṽ 2 := 4 p+22 e p E[ξi F 2 i and Cp := i= + 2 p p E[ ξ i p F i. i= In Corollary 2.5, we prove that.8 holds true when Ṽ 2 and C p are replaced by the following two smaller values V 2 and C p respectively, where V 2 := 4 p+22 e p E[ξi F 2 i and C p := i= + 2 p p E[ ξ i p F i. To illustrate this improvement on Fuk s inequality.8, consider thefollowing comparison between C p and C p in the case of self-normalized deviations. Assume that ε i i=,...,n is a sequence of independent, unbounded and symmetric random variables. Denote by ξ i = ε i / n k= ε k p /p, and F i = σε j,j i, ε k, k n. Then ξ i,f i i=,...,n is a sequence of martingale differences. It is easy to see that E[ ξ i p F i = i= i= i= ε i p n k= ε k p =, and that, by the fact that ε i i=,...,n are unbounded, E[ ξ i p F i = ε i p n k= ε k p = n. i= i= Hence C p is of order O while C p is of order On, which implies a significant improvement on Fuk s inequality.8. Under the condition that ξ i,f i i satisfy sup i E[ ξ i p < for a p 2, Lesigne and Volný [27 proved that P S n n = O n p/2, n..9 4

Under a stronger condition on the conditional moments, we obtain an improvement on the inequality of Lesigne and Volný.9. Assume either or sup i E[ ξ i p+δ < and sup E[ξi F i < i sup E[ ξ i p F i < i for two positive constants δ and p 2 do not depend on n. Then we have for any α 2,, P max S k n α = O k n n αp, n..0 Since p p/2 for p 2, it follows that.0 is an improvement of.9. We refer to our Theorem 2.6 and Corollary 2.5 where we give more precise bounds. From Theorem 4. of Hao and Liu [22 it follows that.0 is close to the optimal. For necessary and sufficient conditions to have.0 we refer to [22. The paper is organized as follows. We present our main results in Section 2, and discuss the applications to linear regressions with martingale difference innovations, weak invariance principles for martingales and self-normalized deviations in Section 3. The proofs of theorems are given in Sections 4-9. 2. Main results Assume that we are given a sequence of real-valued martingale differences ξ i,f i i=0,...,n defined on some probability space Ω,F,P, where ξ 0 = 0 and,ω = F 0... F n F are increasing σ-fields. So, by definition, E[ξ i F i = 0, i =,...,n. Define the martingale S := S k,f k k=0,...,n by setting S 0 = 0 and S k = ξ i, k =,...,n. 2. i= Denote by S the quadratic characteristic of the martingale S : S 0 = 0 and S k = E[ξi F 2 i, k =,...,n. 2.2 i= For any α 0,, set ΥS k = E[ξi 2 expξ+ i α F i, k [,n. 2.3 i= Our first result is a subexponential inequality on tail probabilities for martingales. A similar inequality for separately Lipschitz functionals has been obtained recently by Dedecker and Fan [6. 5

Theorem 2.. Assume exp C n := for some α 0,. Then for all x,u > 0, P S k x and ΥS k u for some k [,n x2 x 2/ α +C n exp 2u u exp It is obvious that E[ξi 2 expξ i + α < 2.4 i= x α u 2x 2 α u α/ α x +C n x x 2 exp α C n = E[ΥS n ΥS n. if 0 x < u /2 α if x u /2 α. Hence, when u max ΥS n,, from 2.5 we get the following rough bounds 2exp x2 if 0 x < u /2 α P max S 2u k x k n 2exp 2 2.6 xα if x u /2 α x 2 2exp 2u+x 2 α. 2.7 Thus for moderate x 0,u /2 α, the bound 2.5 is sub-gaussian. For x u /2 α, the bound 2.5 is subexponential of the order exp 2 xα x. Moreover, when, by 2.5, this order u /2 α can be improved to exp εx α, for any given small ε > 0. For any α 0, and k [,n, set ΥS k = E[ξi 2 exp ξ i α F i. i= Since ΥS k ΥS k, it is obvious that 2.5 is also an upper bound on the tail probabilities P ±S k x and ΥS k u for some k [,n. Moreover, if ΥS n u, then 2.5 is an upper bound on tail probabilities of the maxima and minima of the partial sums Pmax k n S k x and P min k n S k x. When ξ i i are i.i.d. random variables, we have C n = ΥS n = c 0 n with c 0 = E[ξ 2expξ+ α. In this case, inequality 2.6 implies the following large deviation bound: for any x > 0 and n sufficiently large, P max S k nx k n 2.5 2exp c x n α 2.8 6

with c x = xα 2. In fact, the constant c x in 2.8 is close to x α, as shown by the following large deviation bound for martingales, which is a consequence of Theorem 2.. Corollary 2.2. Assume condition 2.4, for some α 0,. If ΥS n = on 2 α, n, 2.9 then for any x 0, lim sup n n α logp n max S k x k n x α. 2.0 The bound 2.0 cannot be improved under condition 2.9. This is shown by the following large deviation principles on Sn n = n n i= ξ i and n max k ns k which hold true for i.i.d. random variables ξ i i with subexponential tails. Theorem 2.3. Consider an i.i.d. sequence ξ i i with E[ξ = 0. i If E[ξ 2 < and for some constants α 0, and c > 0, then for all x > 0, lim n n α logp lim x n max S k x k n ii If for some constant α 0, and c > 0, then for all x > 0, lim n n α logp lim x n max S k x k n x α logp ξ x = c, 2. = lim n n α logp Sn n x = cx α. 2.2 x α logp ξ x = c, 2.3 = lim n n α logp Sn n x = cx α 2.4 and lim n n α logp n min S k x k n = lim n n α logp Sn n x = cx α. 2.5 In the special case where ξ has a density px satisfying px exp x α as x, the asymptotic 2.2 for P S nn x follows from Theorem 3 in Nagaev [32 we also refer to Gantert, Ramanan and Rembart [7 for related results. Notice that condition 2.3 is weaker than that used in [32. Since the rate function x cx α is continuous and strictly increasing on R + = [0,, by Lemma 4.4 in Huang and Liu [23 we can reformulate the results of the previous theorem as large deviation 7

principles the proof therein was given for two sided tails but it can be easily adapted to one sided tails. We give the details below. From part i of the previous theorem it follows that, if E[ξ 2 < and 2. holds, then Sn n and n max k ns k verify a LDP with norming factor /n α and rate function x x α on R + : for any Borel set B R +, we have inf = liminf x B 0cxα n n α logp Sn n B limsup n n α logp Sn n B = inf cx α, 2.6 x B where B 0 and B denote the interior and the closure of B respectively; moreover the result 2.6 remains true when S n is replaced by max k n S k. From part ii it follows that, if 2.3 holds, then Sn n and n max k ns k verify a LDP with norming factor /n α and rate function x x α on R =,, that is 2.6 holds for any Borel set B R. Now we consider the case when the martingale differences ξ i,f i i=0,...,n have absolute moments of order p 2. For any p > 2 denote ΞS k = E[ξ i + p F i, i= k [,n. We prove the following inequality, which is similar to the results of Haeusler [9 and [. Theorem 2.4. Let p 2. Assume E[ ξ i p < for all i [,n. Then for all x,y,v,w > 0, P S k x, S k v and ΞS k w for some k [,n exp α2 x 2 2e p +exp βx v y log + βxyp +P max w ξ i > y, 2.7 i n where α = 2 p+2 and β = α. 2.8 Setting y = βx, we obtain the following generalization of the Fuk-Nagaev inequality.8. Corollary 2.5. Let p 2. Assume E[ ξ i p F i < for all i [,n. Then for all x > 0, P max S k x exp x2 k n 2V 2 + C p x p, 2.9 where V 2 = 4 p+22 e p S n and C p = + 2 p E[ ξ i p F i p. 2.20 i= 8

It is worth noting that if sup i E[ ξ i p F i C for a constant C, then, by Jensen s inequality, it holds sup i E[ξi 2 F i C 2/p. Therefore, inequality 2.9 implies the following sub-gaussian bound: for any x = O nlogn β, n, with β satisfying β > 0 if p = 2 and β 0,/2 if p > 2, P max S k x k n = O exp C x2 n, 2.2 where C > 0 does not depend on x and n. Inequality 2.9 also implies that for any α 2, and any x > 0, P max S k > n α x k n cx = O n αp, n, 2.22 where c x > 0 does not depend on n. The asymptotic 2.22 was first obtained by Fuk [6 and it is the best possible under the stated condition even for the sums of independent random variables cf. Fuk and Nagaev [5. When the martingale differences ξ i,f i i satisfy sup i E[ ξ i p <, Lesigne and Volný [27 proved that for any x > 0, P S n > nx cx = O n p/2, n, 2.23 where c x > 0 does not depend on n, and that the order n p/2 is optimal for the class of stationary and ergodic sequences of martingale differences. When α =, equality 2.22 implies the following large deviation convergence rate: for any x > 0, P max S k > nx k n cx = O n p, n, 2.24 where c x > 0 does not depend on n. When p 2, it holds p p/2. Thus 2.24 refines the bound 2.23 under the stronger assumption that the p-th conditional moments are uniformly bounded sup i E[ ξ i p F i <. Moreover, the following proposition of Lesigne and Volný [27 shows that the estimate of 2.24 cannot be essentially improved even in the i.i.d. case. Proposition A. Let p and c n n be a real positive sequence approaching zero. There exists a sequence of i.i.d. random variables ξ i i such that E[ ξ i p <, E[ξ i = 0 and lim sup n n p c n P S n n =. When E[ ξ i 2 F i and E[ ξ i p, i =,...,n, are uniformly bounded for some p > 2, but for the same p the condition E[ ξ i p F i C may be violated for some i [,n, we have the following result. Theorem 2.6. Let p 2. Assume E[ ξ i p+δ < for a small δ > 0 and all i [,n. Then for all x,v > 0, P S k x and S k v 2 for some k [,n exp x 2 2 v 2 + 3 x2p+δ/p+δ + [ x p E ξ i p+δ ξi >x p/p+δ. 2.25 i= 9

In particular, as a consequence of 2.25, if sup i E[ ξ i p+δ < and sup i E[ξi 2 F i <, we obtain that for any α 2, and any x > 0, P max S k n α x exp c x min np+δ,n αδ 2α + C/xp k n n αp cx = O n αp, n, 2.26 where c x > 0 does not depend on n. Note that 2.22 and2.26 have the same convergence rate. To highlight the difference between the conditionsunderwhichweobtain2.22and2.26, noticethattheassumptionsup i E[ ξ i p F i < has been replaced by the two assumptions sup i E[ ξ i p+δ < and sup i E[ξ 2 i F i <. It is obvious that the two assumptions sup i E[ ξ i p F i < and sup i E[ ξ i p+δ < do not imply each other. Therefore Corollary 2.5 and Theorem 2.6 do not imply each other. Actually, Hao and Liu [22, Theorem 4. gave necessary and sufficient conditions for 2.26 to hold, in particular from their result it follows that 2.26 holds under the weaker conditions sup i E[ ξ i p < and sup i E[ξ 2 i F i < and the exponent in 2.26 is optimal. The following corollary of Theorem 2.6 is obvious. Corollary 2.7. Assume the condition of Theorem 2.6. Then for all x,v > 0, P max S x 2 k x exp k n 2nv 2 + 3 x2p+δ/p+δ + [ x p E ξ i p+δ ξi >x p/p+δ + [ S n v p+δe p+δ/2. 2.27 n Moreover, i= [ S n E p+δ/2 n n E[ ξ i p+δ. 2.28 Inequality 2.28 implies that if sup i E[ ξ i p < for a p 2, then E[ S n /n p/2 are uniformly bounded for all n. Compared to Theorem 2.6, Corollary 2.7 is simpler, since it only requires the moment of S n instead of bounding S n uniformly in n. Assume sup i E[ ξ i p+δ < for same p 2 without any condition on S n. Applying 2.28 to 2.27 with nv 2 = 2 3 x2p+δ/p+δ, we obtain for all x,v > 0, P max S k x k n exp The last inequality shows that for any x > 0, P max S k n k n i= 2 xδ/p+δ + nc 3n x p + 2 p+δ 2 C xp+δ/2. 2.29 = O n p/2, n. 2.30 Since δ > 0 is arbitrary small, the asymptotic 2.30 is close to the best possible large deviation convergence rate n p+δ/2 given by Lesigne and Volný [27 cf. 2.23. 0

3. Applications The exponential concentration inequalities for martingales have many applications. McDiarmid [3, Rio [35 and Dedecker and Fan [6 applied such type inequalities to estimate the concentration of separately Lipschhitz functions. Liu and Watbled [30 adopted these inequalities to deduce asymptotic properties of the free energy of directed polymers in a random environment. We refer to Bercu and Touati[ and[3 for more interesting applications of the concentration inequalities for martingales. In the sequel, we provide some applications of our results to linear regressions with martingale difference innovations, weak invariance principles for martingales and self-normalized deviations. a. Linear regressions Linear regressions can be used to investigate the impact of one variable on the other, or to predict the value of one variable based on the other. For instance, if one wants to see impact of footprint size on height, or predict height according to a certain given value of footprint size. The stochastic linear regression model is given by, for all k [,n, X k = θφ k +ε k, 3. where X k k=,...,n,φ k k=,...,n and ε k k=,...,n are the observations, the regression random variables and the driven noises, respectively. We assume that φ k k=,...,n is a sequence of independent random variables, and that ε k k=,...,n is a sequence of martingale differences with respect to the natural filtration. Moreover, we suppose that φ k k=,...,n and ε k k=,...,n are independent. Our interest is to estimate the unknown parameter θ. The well-known least-squares estimator θ n is given below θ n = n k= φ kx k n. 3.2 k= φ2 k Recently, Bercu and Touati [ have obtained some very precise exponential bounds on the tail probabilities P θ n θ x. However, their precise bounds depend on the distribution of the regression random variables φ k k=,...,n, which restricts the applications of these bounds when the distributions of the regression variables are unknown. When ε k k=,...,n are independent normal random variables with a common variation σ 2 > 0, Liptser and Spokoiny [29 have established the following estimation: for all x, P ±θ n θ Σ nk= φ2k x 2 π σ x exp x2 2σ 2. 3.3 When ε k k=,...,n are conditionally sub-gaussian, similar estimation is allowed to be obtained in Liptser and Spokoiny [29. An interesting feature of bound 3.3 is a type of self-normalized deviations and the bound does not depend on the distribution of the regression random variables. Thus the selfnormalized approximation θ n θ Σ n k= φ2 k has certain advantage on the usual approximation of θ n θ. Next, we would like to consider the case that ε k k=,...,n are martingale differences. When the regression random variables are constants, such case has been considered in Section 5 of Dedecker and Merlevède [7, where the authors gave rate of convergence for the normal approximation of θ n θ in

terms of minimal distance. In the following theorems, we will assume that the regression variables are random. More importantly, we will consider the self-normalized approximation θ n θ Σ n k= φ2 k instead of the usual approximation of θ n θ considered in Bercu and Touati [ and Dedecker and Merlevède [7. Theorem 3.. Assume for two constants α 0, and D, [ E ε 2 ie ε i α σεj,j i D for all i [,n. Then for any u maxd, and all x > 0, 2exp P ±θ n θ Σ x2 if 0 x < u /2 α nk= φ2k x 2u 2exp 2 3.4 xα if x u /2 α x 2 2exp 2u+x 2 α. 3.5 In particular, it holds for any x > 0, P ±θ n θ Σ n k= φ2 k nx = O exp c x n α/2, n, 3.6 where c x > 0 does not depend on n. If ε k k=,...,n have the Weibull distributions and the conditional variances are uniformly bounded, then we have the following inequality which has the same exponentially decaying rate of 3.6. Theorem 3.2. Assume for three constants α 0,, E and F, [ E σε j,j i E and E [exp ε i α α F ε 2 i for all i [,n. Then for all x > 0, P ±θ n θ Σ nk= φ2k x exp x 2 2E + +nf exp x α. 3.7 3 x2 α In particular, equality 3.6 holds. If ε k k=,...,n have finite conditional moments, by Corollary 2.5, then we have the following result. Theorem 3.3. Let p 2. Assume for a constant A, [ E ε i p σεj,j i A 2

for all i [,n. Then for all x > 0, P ±θ n θ Σ nk= φ2k x exp x2 2V 2 + C p x p, 3.8 where V 2 = 4 p+22 e p A 2/p and C p = + 2 p pa. 3.9 In particular, it holds for any x > 0, P ±θ n θ Σ n k= φ2 k nx cx = O n p/2, n, 3.0 where c x > 0 does not depend on n. A similar inequality can be obtained by applying the Fuk inequality.8 to the martingale difference sequence cf. 7. for the definition of ξ i,f i i=,...,n. The Fuk inequality implies that for all x > 0, P ±θ n θ Σ nk= φ2k x exp x2 2nV 2 + nc p x p, 3. where V 2 and C p are defined by 3.9. In particular, it implies that for any x > 0, P ±θ n θ Σ n k= φ2 k nx cx = O n p/2, n, 3.2 where c x > 0 does not depend on n. The order of 3.0 is much better than that of 3.2. Thus the refinement of 3.8 on 3. is significant. If ε k k=,...,n have finite moments and uniformly bounded conditional variances, by Theorem 2.6, we obtain the following result which has the same polynomially decaying rate of Theorem 3.3. Theorem 3.4. Let p 2. Assume for two constants A and B, [ [ E σε j,j i A and E ε i p+δ B ε 2 i for a small δ > 0 and all i [,n. Then for all x > 0, P ±θ n θ Σ nk= φ2k x exp x 2 2 A+ 3 x2p+δ/p+δ + B xp. 3.3 In particular, equality 3.0 holds. In the following theorem, we assume that ε i i=,...,n have only a moment of order p [,2. 3

Theorem 3.5. Let p [,2. Assume for a constant A, E[ ε i p A for all i [,n. Then for all x > 0, P ±θ n θ Σ nk= φ2k x 2A x p. 3.4 In particular, equality 3.0 holds. Theorems 3. and 3.5 focus on obtaining the large deviation inequalities. These inequalities do not depend on the distribution of input random variables φ k k=,...,n. Similar bounds are also expected to be obtained via the decoupling techniques of de la Peña [8 and de la Peña and Giné [9. In particular, if ε k k=,...,n are independent random variables instead of martingale differences, with the method of conditionally independent in de la Peña and Giné [9, more precise bounds, but depending on the distribution of input random variables, may be established. HaeuslerandJoos[20provedthatifthemartingaledifferencessatisfyE[ ξ i 2+δ < foraconstant δ > 0 and all i [,n, then there exists a constant C δ, depending only on δ, such that for all x R, PS [ n x Φx C δ E ξ i 2+δ +E [ S n +δ/2 /3+δ + x 2+δ, 3.5 i= where Φx = 2π x exp t2 /2dt is the standard normal distribution; see also Hall and Heyde [2 with the larger factor + x 4+δ/22 /3+δ replacing + x 2+δ. Using 3.5, we obtain the following nonuniform Berry-Esseen bound, which depends on the distribution of input random variables. Theorem 3.6. Let p > 2. Assume that ε i i=,...,n satisfy E[ε 2 i σε j,j i = σ 2 a.s. for a positive constant σ and all i [,n. Assume E[ ε i p A for a constant A and all i [,n. Then for all x R, θ P n θ Σ nk= φ2k xσ Φx C p E[ where C p is a constant depending only on A,σ and p. Notice that E[ i= φ i Σ n k= φ2 k p i= E[ i= φ i φ i Σ n k= φ2 k Σ n k= φ2 k p /+p + x p, 3.6 2 =. Thus 3.6 implies that the tail probability P θ n θ Σ nk= φ2k x has the decaying rate x p as x, which is coincident with the inequalities 3.8 and 3.3. 4

b. Weak invariance principles In this subsection, let ξ i,f i i be a sequence of martingale differences. The following rate of convergence in the central limit theorem CLT for martingale difference sequences is due to Ouchti cf. Corollary of [34. Assumethat there exists a constant M > 0 such that E[ ξ i 3 F i ME[ξi 2 F i a.s. for all i N. If the series i= E[ξ2 i F i diverges a.s. then there is a constant C M > 0, depending on M, such that sup S P vn x n Φx C M, 3.7 n/4 where x R vn = inf Define the process H n t,0 t by k N, S k n. H n t = n S vk for 0 t, for t k = k/n and k = 0,...,n and by interpolation on the interval t k,t k for k =,...,n. Then the following invariance principle holds. Theorem 3.7. Assume that there exists a constant M > 0 such that E[ ξ i 3 F i ME[ξi 2 F i a.s. for all i N. If the series i= E[ξ2 i F i diverges a.s., then the sequence of processes H n t,0 t converges in distribution to the standard Wiener process in the space D [0, endowed with the Srorokhod metric. The result can be obtained from Theorem 2 of Chapter 7 see also Theorem 2 of Chapter 5 of Liptser and Shiryaev [28. As an illustration of our Theorem 2.4, we will give a short proof of the tightness of the processes H n t,0 t which is much simpler than the proof in [28. c. Self-normalized deviation inequalities Consider the self-normalized deviations for independent random variables. Assume that ξ i i=,...,n is a sequence of independent and symmetric random variables. Denote by V n β = ξ i β /β for a constant β, 2. We have the following self-normalized deviation inequality. i= Theorem 3.8. Assume that ξ i i=,...,n is a sequence of independent and symmetric random variables. Let t 0 be a positive number such that e x t 0 x β +e x t 0 x β 3.8 2 for a constant β,2 and all x R. Then for all x > 0, S k P max k n V n β x exp 5 Cβ,t 0 x β β, 3.9

where Cβ,t 0 = t 0 Remark 3.9. Let us comment on Theorem 3.8. β β β β β.. Ideally we want to make bound 3.9 as small as possible. Since Cβ,t 0 is decreasing in t 0, it is enough to choose t 0 satisfying 3.8 as small as possible for a given β, i.e., t 0 β = inf t > 0, e x t x β +e x t x β for all x R. 2 It is obvious that t 0 β = for β > 2 since for β > 2, e x t 0 x β +e x t 0 x β = 2+x 2 +ox 2, x 0. Moreover, t 0 β as β is decreasing to, t 0 β /2 as β is increasing to 2, and t 0 β for all x R and β,2. For practical purposes one can use the following table: β..2.3.4.5.6.7.8.9 2.0 t 0 0.742 0.627 0.554 0.504 0.470 0.448 0.436 0.436 0.450 0.5 2. For certain special cases, Jing, Liang and Zhou [24 have obtained the following self-normalized moderate deviation principle MDP result see also Shao [37 for self-normalized LDP result. Assume that Pξ i x = Pξ i x c x αh ix, x, where α 0,2, c > 0 and h i x s are slowly varying at. Under certain regularity conditions on the tail probabilities of ξ i cf. Theorem 2.3 of [24 for details, the following limit holds for x n and x n = on β /β and β > max,α, β lim n x β n logps n /V n β x n = β C α β, 3.20 where C α β is a positive constant depending on α and β. For self-normalized sums of symmetric random variables, the constant C α β is the solution of 0 2 expβx x β /C β exp βx x β /C β x +α dx = 0. 3.2 According to the MDP result of Jing, Liang and Zhou [24, the power in the right hand sides of 3.9 is the best possible for moderate x s. 3. For β,2, inequality 3.9 implies the following upper bounds of LDP lim sup n n logp S k max k n V n β xnβ β Cβ,t 0 xβ, β x 0,. 3.22 According to the LDP result of Shao [37, the power the best possible for large x s. β β β β in the right hand side of 3.9 is also 6

4. Proof of Theorem 2. To prove Theorem 2., we need the following technical lemma based on a truncation argument. Lemma 4.. Assume E[ξ 2 i exp ξ i α < for a constant α 0,. Set η i = ξ i ξi y for y > 0. Then for all λ > 0, E[e λη i F i + λ2 2 E[η2 i expλy α η + i α F i. The proof of Lemma 4. can be found in the proof of Proposition 3.5 in Dedecker and Fan [6. However, instead of using the tower property of conditional expectation as in Dedecker and Fan [6, we use changes of probability measure in the proof of this theorem. Set η i = ξ i ξi y for some y > 0. The exact value of y is given later. Then η i,f i i=,...,n is a sequence of supermartingale differences, and it holds E[expλη i < for all λ 0, and all i. Define the exponential multiplicative martingale Zλ = Z k λ,f k k=0,...,n, where Z k λ = k i= expλη i E[expλη i F i, Z 0λ =. If T is a stopping time, then Z T k λ is also a martingale, where Z T k λ = T k i= expλη i E[expλη i F i, Z 0λ =. Thus, the random variable Z T k λ is a probability density on Ω,F,P, i.e. Z T k λdp = E[Z T k λ =. Define the conjugate probability measure dp λ = Z T n λdp, 4. and denote by E λ the expectation with respect to P λ. Since ξ i = η i +ξ i ξi >y, it follows that for any x,y,u > 0, P S k x and ΥS k u for some k [,n P η i x and ΥS k u for some k [,n i= + P ξ i ξi >y > 0 for some k [,n i= =: P +P max ξ i > y. 4.2 i n 7

For any x,u > 0, define the stopping time Tx,u = min k [,n : with the convention that min = 0. Then, i= η i x and ΥS k u, Sk x and ΥS k u for some k [,n = Tx,u=k. By the change of measure 9.2, we deduce that for any x,λ,u > 0, where k= P = E λ [Z T n λ Sk x and ΥS k u for some k [,n = k= E λ [exp λ Ψ k λ = i= η i +Ψ k λ Tx,u=k, 4.3 loge[expλη i F i. 4.4 i= Set λ = y α. By Lemma 4. and the inequality log+t t for all t 0, it is easy to see that for any x > 0, Ψ k λ i= i= log + λ2 2 E[η2 i expλy α η i + α F i λ 2 2 E[η2 i expλy α η + i α F i 2 y2α 2 ΥS k. Using the fact that k i= η i x and Ψ k λ 2 y2α 2 u on the set Tx,u = k, we find that for any x,u > 0, P exp λx+ [ 2 y2α 2 u E λ Tx,u=k exp y α x+ 2 y2α 2 u. From 4.2, it follows that P S k x and ΥS k u for some k [,n exp y α x+ 2 y2α 2 u +P max ξ i > y. 4.5 i n 8 k=

By the exponential Markov inequality, we have the following estimation: for any y > 0, P max ξ i > y i n Pξ i > y i= y 2 exp yα E[ξi 2 expξ+ i α i= C n y 2 exp yα. 4.6 Taking u / α if 0 x < u /2 α y = x x if x u /2 α, from 4.5 and 4.6, we obtain the desired inequality. This completes the proof of Theorem 2.. Proof of Corollary 2.2. Set u n = ΥS n. Then u n = on 2 α,n, by the assumptions of Theorem 2.2. For any x > 0, by Theorem 2., we have P max S k nx exp nx α u n k n 2nx 2 α + C n nx 2 exp nx α nx α. + C n nx 2 exp Since u n C n, we have C n = on 2 α,n. Hence it holds lim sup n n α logp max S k nx x α. k n u n 2nx 2 α This completes the proof of Corollary 2.2. Proof of Theorem 2.3. i We suppose that c = by considering c /α ξ i instead of ξ i. Let ε 0, be fixed and consider ξ i = ξ i ε β with β > /α. By the assumption of the theorem Pξ x e εxα for x > 0 large enough, so that Pξ x e ε βα x α for x > 0 large enough, and E[ξ + 2 expξ + α = 0 Pξ x 2x+αx +α e xα dx <. Together with E[ξ 2 < this implies that c 0 := E[ξ 2 expξ + α <. Thus ΥS n := n i= E[ξ 2 expξ + α = nc 0 = on 2 α as n. Hence, from Corollary 2.2 applied to ξ i i we have for any x > 0, lim sup n n α logp ε β max S k nx k n x α, 9

which implies that for any x > 0, lim sup n n α logp max S k nx k n ε βα x α. Letting ε 0 we obtain for any x > 0, lim sup n n α logp max S k nx k n x α. 4.7 We now consider the lower bound. By the independence of ξ i s, it is easy to see that for any x,ε > 0, we have P max k nx k n P S n nx P i=2 i=2 ξ i nε,ξ nx+ε = P ξ i nε P ξ nx+ε. The first probability on the right-hand side converges to as n due to the law of large numbers. By 2., the second term on the right-hand side has the following lower bound α+ε P ξ nx+ε exp nx+ε for all n large enough. Hence lim inf n n α logp Sn n x x+ε α +ε. Letting ε 0, we obtain lim inf n n α logp Sn n x x α. 4.8 Combining the lower bound 4.8 with the upper bound 4.7, we get 2.2. This ends the proof of part i. ii The proof of 2.4 is similar to that of part i. The assertion 2.5 follows from 2.4 applied to ξ i i. 5. Proof of Theorem 2.4 To prove Theorem 2.4, we need the following technical lemma. 20

Lemma 5.. Let p 2. Assume E[ ξ i p < for all i [,n. Set η i = ξ i ξi y for y > 0. Then for all λ > 0, where the function E[e λη i F i + 2 ep λ 2 E[ξ 2 i F i +fye[ξ + i p F i, fu = eλu λu u p, u > 0. 5. Proof. We argue as in Fuk and Nagaev [5 see also Fuk [6. Using a two term Taylor s expansion, we have for some θ [0,, e λη i +λη i + λ2 2 η2 i ληi pe λθη i +fη i η + i p ληi >p. Remark that the function f is positive and increasing for λu p. Since E[η i F i E[ξ i F i = 0 and η i y, it follows that E[e λη i F i + 2 ep λ 2 E[η 2 i F i +fye[η + i p F i + 2 ep λ 2 E[ξ 2 i F i +fye[ξ + i p F i, which gives the desired inequality. We make use of Lemma 5. to prove Theorem 2.4. Set η i = ξ i ξi y for y > 0. Define the conjugate probability measure dp λ by 9.2 and denote by E λ the expectation with respect to P λ. Since ξ i = η i +ξ i ξi >y, it follows that for any x,y,u,w > 0, PS k > x, S k v and ΞS k w for some k [,n P η i x, S k v and ΞS k w for some k [,n i= + P ξ i ξi >y > 0 for some k [,n i= =: P 2 +P max ξ i > y. 5.2 i n For any x,v,w > 0, define the stopping time T : Tx,v,w = min k [,n : S k x, S k v and ΞS k w, with the convention that min = 0. Then Sk >x, S k v and ΞS k w for some k [,n = T=k. 2 k=

By the change of measure 9.2, we deduce that for any x,y,λ,u,w > 0, P 2 = E λ [Z T n λ Sk >x, S k v and ΞS k w for some k [,n = k= E λ [exp λ i= η i +Ψ k λ T=k, where Ψ k λ is defined by 4.4. By Lemma 5. and the inequality log+t t for t 0, it is easy to see that for any x,y,λ,u,w > 0, Ψ k λ i= i= log + 2 ep λ 2 E[ξi 2 F i +fye[ξ i + p F i 2 ep λ 2 E[ξi 2 F i +fye[ξ i + p F i, where fy is defined by 5.. By the fact that k i= η i x and Ψ k λ 2 ep λ 2 v +fyw on the set T = k. we find that for any x,y,λ,u,w > 0, P 2 exp λx+ [ 2 ep λ 2 v +fyw E λ T=k exp λx+ 2 ep λ 2 v +fyw. Next we carry out an argument as in Fuk and Nagaev [5. Then P 2 exp α2 x 2 2e p +exp βx v y log k= + βxyp w, 5.3 where α and β are defined by 2.8. Combining the inequalities 5.2 and 5.3 together, we obtain the desired inequality. This completes the proof of Theorem 2.4 Proof of Corollary 2.5. When y = βx, from 2.7, it is easy to see that for all x > 0, P max ξ i > y i n i= P ξ i > βx β p x p i= E[ ξ i p C p x p and exp βx y log + βxyp w w βxy p +w w β p x p C p x p, where C p is defined by 2.20. Thus 2.7 implies 2.9. 22

6. Proofs of Theorem 2.6 and Corollary 2.7 To prove Theorem 2.6, we need the following inequality whose proof can be found in Fan, Grama and Liu [ cf. Corollary 2.3 and Remark 2. therein. Lemma 6.. Assume E[ξi 2 < for all i [,n. Then for all x,y,v > 0, P S k x and S k v 2 for some k [,n x 2 exp 2v 2 + 3 xy +P max ξ i > y. i n Proof of Theorem 2.6. By Lemma 6. and the Markov inequality, it follows that for all x,y,v > 0, P S k x and S k v 2 for some k [,n x 2 exp 2v 2 + 3 xy + P ξ i > y i= x 2 exp 2v 2 + 3 xy + [ y p+δ E ξ i p+δ ξi >y. Taking y = x p/p+δ in the last inequality, we obtain the desired inequality. This completes the proof of Theorem 2.6. Proof of Corollary 2.7. Notice that p+δ > 2. It is easy to see that for any x,v > 0, P max S k x P max S k x and S n nv 2 +P S n > nv 2 k n k n P S k x and S k nv 2 for some k [,n +P S n > nv 2 P S k x and S k nv 2 for some k [,n + E[ S n p+δ/2 n p+δ/2 v p+δ, which gives the first desired inequality. By the Hölder inequality, it follows that Hence Then we have a i n 2/p+δ 2/p+δ, a p+δ/2 i ai 0, i =,...,n. i= i= p+δ/2 a i n p 2+δ/2 i= i= E[ S n p+δ/2 n p 2+δ/2 23 i= a p+δ/2 i, a i 0, i =,...,n. i= E [E[ξi F 2 i p+δ/2

[ n p 2+δ/2 E E[ ξ i p+δ F i i= = n p 2+δ/2 E[ ξ i p+δ. i= This completes the proof of corollary. 7. Proofs of Theorems 3. - 3.6 From 3. and 3.2, it is easy to see that θ n θ = φ k ε k Σ n k= k= φ2 k. For any i =,...,n, set φ i ε i ξ i = Σ n k= φ2 k and F i = σ φ k,ε k, k i, φ 2 k, k n. 7. Then ξ i,f i i=,...,n is a sequence of martingale differences, and satisfies S n = i= ξ i = θ n θ Σ n k= φ2 k. Proof of Theorem 3.. Notice that ΥS n φ 2 i Σ n i= k= φ2 k E[ε 2 i exp ε i α F i φ 2 i D Σ n i= k= φ2 k = D. Applying Theorem 2. to ξ i,f i i=,...,n, we find that 2.6, with u maxd,, is an upper bound on the tail probabilities P θ n θ Σ nk= φ2k x. Similarly, applying Theorem 2. to ξ i,f i i=,...,n, we find that 2.6, with u maxd,, is also an upper bound on the tail probabilities P θ n θ Σ nk= φ2k x. This completes the proof of Theorem 3.. Proof of Theorem 3.2. By the fact E[ε 2 i F i = E[ε 2 i σε k, k i E, it follows that S n i= φ 2 i Σ n k= φ2 k E[ε2 i F i E. 24

Similarly, by the fact E[exp ε i α α F, it is easy to see that E[exp ξ i α α E[exp εi α α F. Applying Theorem 2.2 of Fan, Grama and Liu [2 to ±ξ i,f i i=,...,n, we obtain the desired inequality. Proof of Theorem 3.3. By the fact E[ ε i p F i = E[ ε i p σε k, k i A, it follows that and S n i= φ 2 i Σ n k= φ2 k E[ε2 i F i E[ ξ i p F i i= i= i= φ 2 2/p i Σ n k= φ2 k E[ ε i p F i = A 2/p φ 2 i Σ n k= φ2 k E[ ε i p F i A. Applying Corollary 2.5 to ±ξ i,f i i=,...,n, we obtain the desired inequality. Proof of Theorem 3.4. By the fact E[ε 2 i F i = E[ε 2 i σε k, k i A, it follows that S n i= Similarly, by the fact E[ ε i p+δ B, it follows that E[ ξ i p+δ = i= φ E[ 2 i i= Σ n k= φ2 k φ 2 i Σ n k= φ2 k E[ε2 i F i A. p+δ 2 E[ ε i p+δ E [ φ 2 i Σ n i= k= φ2 k Applying Theorem 2.6 to ±ξ i,f i i=,...,n, we obtain the desired inequality. B = B. Proof of Theorem 3.5. Let p [,2. By the inequality we have E[ ξ i p = i= α a i a α i, a i 0 and α 0,, i= i= [ E i= φ 2 i p/2 Σ n k= φ2 k p/2 E[ ε i p E [ n i= φ2 i p/2 Σ n k= φ2 k p/2 By the inequality of von Bahr and Esseen cf. Theorem 2 of [39, we get E[ S n p 2 E[ ξ i p 2A. i= 25 A = A.

Then for all x > 0, P ±θ n θ Σ nk= φ2k x = P ±S n x E[ S n p x p 2A x p. This completes the proof of theorem. Proof of Theorem 3.6. It is obvious that θ n θ σ Σ nk= φ2k = η i, where η i = ξ i /σ. Notice that E[ε 2 i F i = E[ε 2 i σε j,j i = σ 2 a.s.. Then we have i= i= E[η 2 i F i = S n σ 2 = i= φ 2 i Σ n k= φ2 k E[ε 2 i F i σ 2 = φ 2 i Σ n i= k= φ2 k = and E[ η i p F i i= i= [ φ i E Σ n k= φ2 k p E[ ε i p F i σ p A σ p i= [ φ i E Σ n k= φ2 k p. Applying inequality 3.5 to the martingale difference sequence η i,f i i=,...,n with δ = p 2, we obtain the desired inequality. 8. Proof of tightness in Theorem 3.7 By Theorem 8.4, Chapter 3, Section 8 of Billingsley [2 see also Chapter 3, Section 6 for the convergence in the space, we only need to show that for any ε > 0, there exist a λ > and an integer n 0 such that for every n n 0, P max S i S k λ n k i k+n ε λ2. 8. Since we deduce that E[ ξ i 3 F i ME[ξ 2 i F i, This, in turn, implies that E[ξ 2 i F i 3/2 E[ ξ i 3 F i ME[ξ 2 i F i. 8.2 E[ξ 2 i F i M 2, S k+n S k nm 2 and k+n E[ ξ i 3 F i nm 3. i=k 26

Applying 2.7 with p = 3,x = 2y = λ n, we obtain, by 8.2, P max S i S k λ n k i k+n 2exp 2λ2 25e 3 M 2 +2exp 65 log + 3λ3 n 20M 3 + P max ξ i > k i k+n 2 λ n +P max ξ i > k i k+n 2 λ n 2exp 4λ2 3λ 3 6/5 n 50e 3 M 2 +2 20M 3 + 6 k+n λ 3 n 3/2 E [ ξ i 3 ε λ 2, i=k provided that λ is sufficiently large, which proves 8.. 9. Proof of Theorem 3.8 For any i =,...,n, set η i = ξ i V n β, F 0 = σ ξ j, j n and F i = σ ξ k, k i, ξ j, j n. 9. Since ξ i i=,...,n are independent and symmetric, then E[ξ i > y F i = E[ξ i > y ξ i = E[ ξ i > y ξ i = E[ ξ i > y F i. Thus η i,f i i=,...,n is a sequence of conditionally symmetric martingale differences, i.e. E[η i > y F i = E[ η i > y F i. It is easy to see that S n V n β = η i i= is a sum of martingale differences, and that η i,f i i=,...,n satisfies E[ η i β F i = i= For any x > 0, define the stopping time T: η i β = i= i= ξ i β V n β β =. Tx = min k [,n : 27 i= η i x,

with the convention that min = 0. Then it follows that max k n S k /V nβ x = Tx=k. For any nonnegative number λ, define the martingale Mλ = M k λ,f k k=0,...,n, where M k λ = k i= k= expλη i E[expλη i F i, M 0λ =. SinceT is a stoppingtime, then M T k λ, λ > 0, is also a martingale. Define theconjugate probability measure P λ on Ω,F: Denote E λ the expectation with respect to P λ. Denote by Ψ k λ = i= dp λ = M T n λdp. 9.2 [ loge expλη i F i, k [,n. Using the change of probability measure 9.2, we have for all x > 0, S k P max k n V n β x [ = E λ M T n λ max k n S k /V nβ x = k= k= E λ [exp λ i= η i +Ψ k λ Tx=k E λ [exp λx+ψ k λ Tx=k, 9.3 where the last line follows by the fact that k i= η i x on the set Tx = k. Since η i,f i i=,...,n is conditionally symmetric, one has [ [ E expλη i F i = E exp λη i F i, and thus it holds [ [ E expλη i F i = E coshλη i F i. Notice that η 2 i is F i measurable, and that Thus coshx = Ψ k λ = i= k=0 2k! x2k. log coshλη i. 28

Let t 0 be a number such that e x t 0 x β +e x t 0 x β 2 for all x R. 9.4 Then we have and This completes the proof. Ψ k λ P max k n log e t 0 λη i β = t 0 λ β η i β t 0 λ β. i= S k V n β x inf = exp i= λx+t exp 0 λ β λ>0 Cβ,t 0 x β β. 9.5 Acknowledgements The authors would like to thank the anonymous referee for valuable comments and remarks. The work has been partially supported by the National Natural Science Foundation of China Grants no. 60375, no. 5046, no. 57052 and no. 40590. [ Bercu, B., Touati, A., 2008. Exponential inequalities for self-normalized martingales with applications. Ann. Appl. Probab. 85: 848 869. [2 Billingsley, P. Convergence of Probability Measures. John Wiley: New York, 968. [3 Borovkov, A. A., 2000. Estimates for the distribution of sums and maxima of sums of random variables when the Cramer condition is not satisfied. Siberian Math. J. 45: 8 848. [4 Borovkov, A. A., 2000. Probabilities of large deviations for random walks with semi-exponential distributions. Siberian Math. J. 46: 06 093. [5 Chung, F., Lu, L., 2006. Concentration inequalities and martingale inequalities: a survey. Internet Math. 3: 79 27. [6 Dedecker, J., Fan, X., 205. Deviation inequalities for separately Lipschitz functionals of iterated random functions, Stochastic Process. Appl. 25: 60 90. [7 Dedecker, J., Merlevède, F., 20. Rates of convergence in the central limit theorem for linear statistics of martingale differences. Stochastic Process. Appl. 25: 03 043. [8 de la Peña, V. H., 999. A general class of exponential inequalities for martingales and ratios. Ann. Probab. 27 : 537 564. [9 de la Peña, V. H., Giné, E., 999. Decoupling: from dependence to independence. Springer. [0 Doukhan, P., Neumann, M. H., 2007. Probability and moment inequalities for sums of weakly dependent random variables, with applications. Stochastic Process. Appl. 77: 878 903. 29

[ Fan, X., Grama, I., Liu, Q., 202. Hoeffding s inequality for supermartingales. Stochastic Process. Appl. 220: 3545 3559. [2 Fan, X., Grama, I., Liu, Q., 202. Large deviation exponential inequalities for supermartingales. Electron. Commun. Probab. 759: 8. [3 Fan, X., Grama, I. and Liu, Q., 205. Exponential inequalities for martingales with applications. Electron. J. Probab. 20: 22. [4 Freedman, D.A., 975. On tail probabilities for martingales. Ann. Probab. 3, No., 00 8. [5 Fuk, D. Kh., Nagaev, S. V., 97. Probabilistic inequalities for partial sums of independent variables. Theory Probab. Appl. 64: 660 675. [6 Fuk, D. Kh., 973. Some probabilistic inequalities for martingales. Siberian. Math. J. 4: 85 93. [7 Gantert, N., Ramanan, K., Rembart, F., 204. Large deviations for weighted sums of stretched exponential random variables. Electron. Commun. Probab. 94: 4. [8 Grama, I., Haeusler, E., 2000. Large deviations for martingales via Cramér s method. Stochastic Process. Appl. 852: 279 293. [9 Haeusler, E., 984. An exact rate of convergence in the functional central limit theorem for special martingale difference arrays. Probab. Theory Relat. Fields 654: 523 534. [20 Haeusler, E., Joos, K., 988. A nonuniform bound on the rate of convergence in the martingale central limit theorem. Ann. Probab. 64: 699 720. [2 Hall, P., Heyde, C. C., 980. Martingale Limit Theory and Its Application. Academic Press. [22 Hao, S., Liu, Q., 204. Convergence rates in the lax of large numbers for arrays of martingale differences. J. Math. Anal. Appl. 472: 733 773. [23 Huang, C., Liu, Q., 202. Moments, moderate and large deviations for a branching process in a random environment. Stoch. Proc. Appl. 22: 522 545. [24 Jing, B. Y., Liang, H. Y., Zhou, W., 202. Self-normalized moderate deviations for independent random variables. Sci. China Math. 55: 2297 235 [25 Kiesel, R., Stadtmüller, U., 2000. A large deviation principle for weighted sums of independent and identically distributed random variables. J. Math. Anal. Appl. 252: 929 939. [26 Lanzinger, H., Stadtmüller, U., 2000. Maxima of increments of partial sums for certain subexponential distributions. Stochastic Process. Appl. 862: 307 322. [27 Lesigne, E., Volný, D., 200. Large deviations for martingales. Stochastic Process. Appl. 96: 43 59. [28 Liptser, R., Shiryaev, A., 989. Theory of Martingales. Kluwer Academic Publishers. 30

[29 Liptser, R., Spokoiny, V., 2000. Deviation probability bound for martingales with applications to statistical estimation. Statist. Probab. Lett. 464: 347 357. [30 Liu, Q., Watbled, F., 2009. Exponential inequalities for martingales and asymptotic properties of the free energy of directed polymers in a random environment. Stochastic Process. Appl. 90: 30 332. [3 McDiarmid, C., 989. On the method of bounded differences. Surveys in combi. [32 Nagaev, A. V., 969. Integral limit theorems taking into account large deviations when Cramér s condition does not hold. I. Theory Probab. Appl. 4: 5 63. [33 Nagaev, S. V., 979. Large deviations of sums of independent random variables. Ann. Probab. 7: 745 789. [34 Ouchti, L., 2005. On the rate of convergence in the central limit theorem for martingale difference sequences. Ann. Inst. Henri Poincaré Probab. Stat. 4: 35 43. [35 Rio, E., 203. On McDiarmid s concentration inequality. Electron. Commun. Probab. 844:. [36 Saulis, L., Statulevičius, V. A., 978. Limit theorems for large deviations. Kluwer Academic Publishers. [37 Shao, Q. M., 997. Self-normalized large deviations. Ann. Probab. 25: 285 328. [38 van de Geer, S., 995. Exponential inequalities for martingales, with application to maximum likelihood estimation for counting process. Ann. Statist. 23, 779 80. [39 von Bahr, B., Esseen, C. G., 965. Inequalities for the rth absolute moment of a sum of random variables, r 2. Ann. Math. Statist. 36: 299 303. 3