REGULARIZED DISTRIBUTIONS AND ENTROPIC STABILITY OF CRAMÉR'S CHARACTERIZATION OF THE NORMAL LAW

S. G. BOBKOV 1,4, G. P. CHISTYAKOV 2,4, AND F. GÖTZE 3,4

Abstract. For regularized distributions we establish stability of the characterization of the normal law in Cramér's theorem with respect to the total variation norm and the entropic distance. As part of the argument, Sapogov-type theorems are refined for random variables with finite second moment.

1. Introduction

Let X and Y be independent random variables. A theorem of Cramér [Cr] states that, if the sum X + Y has a normal distribution, then both X and Y are normal. P. Lévy established stability of this characterization property with respect to the Lévy distance L, which may be formulated as follows. Given ε > 0 and distribution functions F, G,

L(F ∗ G, Φ) < ε  ⟹  L(F, Φ_{a₁,σ₁}) < δ, L(G, Φ_{a₂,σ₂}) < δ,

for some a₁, a₂ ∈ ℝ and σ₁, σ₂ > 0, where δ depends only on ε, and in such a way that δ → 0 as ε → 0. Here Φ_{a,σ} stands for the distribution function of the normal law N(a, σ²) with mean a and standard deviation σ, i.e., with density

ϕ_{a,σ}(x) = (1/(σ√(2π))) e^{−(x−a)²/(2σ²)}, x ∈ ℝ,

and we omit the indices in the standard case a = 0, σ = 1. As usual, F ∗ G denotes the convolution of the corresponding distributions.

The problem of quantitative versions of this stability property of the normal law has been intensively studied in many papers, starting with results by Sapogov [Sa1-3] and ending with results by Chistyakov and Golinskii [C-G], who found the correct asymptotics of the best possible error function δ(ε) for the Lévy distance. See also [Z1], [M], [L-O], [C], [Se], [Sh1-2]. As for stronger metrics, not much is known up to now. According to McKean [MC] (cf. also [C-S] for some related aspects of the problem), it was Kac who raised the question about the stability in Cramér's theorem with respect to the entropic distance to normality.
Let us recall that, if a random variable X with finite second moment has a density p(x), its entropy

h(X) = −∫ p(x) log p(x) dx

1991 Mathematics Subject Classification. Primary 60E.
Key words and phrases. Cramér's theorem, normal characterization, stability problems.
1) School of Mathematics, University of Minnesota, USA; Email: bobkov@math.umn.edu.
2) Faculty of Mathematics, University of Bielefeld, Germany; Email: chistyak@math.uni-bielefeld.de.
3) Faculty of Mathematics, University of Bielefeld, Germany; Email: goetze@math.uni-bielefeld.de.
4) Research partially supported by NSF grant DMS-1106530 and SFB 701.

is well-defined and is bounded from above by the entropy of the normal random variable Z having the same variance σ² = Var(Z) = Var(X). The entropic distance to the normal is given by the formula

D(X) = h(Z) − h(X) = ∫ p(x) log (p(x)/ϕ_{a,σ}(x)) dx,

where in the last formula it is assumed that a = EZ = EX. It represents the Kullback-Leibler distance from the distribution F of X to the family of all normal laws on the line. In general, 0 ≤ D(X) ≤ ∞, and an infinite value is possible. This quantity is affine invariant, and so it does not depend on the mean and variance of X. It is stronger than the total variation distance ‖F − Φ_{a,σ}‖_TV, as may be seen from the Pinsker inequality

D(X) ≥ (1/2) ‖F − Φ_{a,σ}‖²_TV.

Thus, Kac's question is whether one can bound the entropic distance D(X + Y) from below in terms of D(X) and D(Y) for independent random variables, i.e., to have an inequality

D(X + Y) ≥ α(D(X), D(Y))

with some non-negative function α, such that α(t, s) > 0 for t, s > 0. If so, Cramér's theorem would be an immediate consequence of this. Note that the reverse inequality does exist: in case Var(X + Y) = 1 we have

D(X + Y) ≤ Var(X) D(X) + Var(Y) D(Y),

which is due to the general entropy power inequality, cf. [D-C-T]. It turned out that Kac's question has a negative solution. More precisely, for any ε > 0 one can construct independent random variables X and Y with absolutely continuous symmetric distributions F, G, and with Var(X) = Var(Y) = 1, such that

a) D(X + Y) < ε;
b) ‖F − Φ_{a,σ}‖_TV > c and ‖G − Φ_{a,σ}‖_TV > c, for all a ∈ ℝ and σ > 0,

where c > 0 is an absolute constant, see [B-C-G1]. In particular, D(X) and D(Y) are bounded away from zero. Moreover, refined analytic tools show that the random variables may be chosen to be identically distributed, i.e., a)-b) hold with F = G, see [B-C-G2]. Nevertheless, Kac's problem remains of interest for subclasses of probability measures obtained by convolution with a smooth distribution.
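The two functionals above are easy to examine numerically. As a minimal sketch (using an assumed toy Gaussian mixture, not an example from the paper), the following snippet computes D(X) by direct integration and checks the Pinsker inequality against the total variation distance ∫|p − ϕ_{0,s}|:

```python
import math

def phi(x, a=0.0, s=1.0):
    """Normal density with mean a and standard deviation s."""
    return math.exp(-(x - a) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

# Assumed toy example: p = 0.5 N(-1, 0.75^2) + 0.5 N(1, 0.75^2), a centered mixture.
def p(x):
    return 0.5 * phi(x, -1.0, 0.75) + 0.5 * phi(x, 1.0, 0.75)

s = math.sqrt(1.0 + 0.75 ** 2)      # Var(X) = 1 + 0.75^2 for this symmetric mixture

# Riemann sums on a wide grid; the integrands decay like a Gaussian.
dx = 0.001
grid = [-10 + i * dx for i in range(int(20 / dx) + 1)]

# Entropic distance D(X) = h(Z) - h(X) = \int p log(p / phi_{0,s}).
D = sum(p(x) * math.log(p(x) / phi(x, 0.0, s)) * dx for x in grid)

# Total variation ||F - Phi_{0,s}||_TV = \int |p - phi_{0,s}|.
TV = sum(abs(p(x) - phi(x, 0.0, s)) * dx for x in grid)

print(D, TV)   # Pinsker: D >= TV^2 / 2
```

The mixture parameters are arbitrary; any density with matching mean and variance can be substituted.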
The main purpose of this note is to give an affirmative solution to the problem in the (rather typical) situation when independent Gaussian noise is added to the given random variables. That is, for a small parameter σ > 0, we consider the regularized random variables

X_σ = X + σZ₁, Y_σ = Y + σZ₂,

where Z₁ and Z₂ denote independent standard normal random variables, which are independent of X, Y. Note that the functionals D(X_σ), D(Y_σ), D(X_σ + Y_σ) are still translation invariant. As a main result, we prove:

Theorem 1.1. Let X, Y be independent random variables with Var(X + Y) = 1. Given 0 < σ ≤ 1, the regularized random variables X_σ and Y_σ satisfy

D(X_σ + Y_σ) ≥ exp{ −c (1/D²) log⁷(2 + 1/D) },

where c > 0 is an absolute constant, and

D = σ² (Var(X_σ) D(X_σ) + Var(Y_σ) D(Y_σ)).
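The regularization is easy to make concrete. As a rough numeric sketch (with an assumed two-point toy distribution, not taken from the paper), when X is uniform on {−1, +1} the regularized variable X_σ = X + σZ has the explicit mixture density 0.5 ϕ_{−1,σ} + 0.5 ϕ_{1,σ}, and D(X_σ) can be computed by direct integration; adding more noise moves X_σ closer to normality:

```python
import math

def phi(x, a=0.0, s=1.0):
    """Normal density with mean a and standard deviation s."""
    return math.exp(-(x - a) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def D_regularized(sigma):
    """D(X_sigma) for X uniform on {-1, +1} (an assumed toy choice):
    X_sigma = X + sigma*Z has density 0.5 phi(.,-1,sigma) + 0.5 phi(.,1,sigma)."""
    s = math.sqrt(1.0 + sigma * sigma)        # Var(X_sigma) = 1 + sigma^2
    p = lambda x: 0.5 * phi(x, -1.0, sigma) + 0.5 * phi(x, 1.0, sigma)
    dx = 0.001
    return sum(p(x) * math.log(p(x) / phi(x, 0.0, s)) * dx
               for x in [-12 + i * dx for i in range(int(24 / dx) + 1)])

D_half, D_one = D_regularized(0.5), D_regularized(1.0)
print(D_half, D_one)
```

Both values are strictly positive, since X_σ is a genuine mixture, while the larger σ gives the smaller entropic distance.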

Thus, if D(X_σ + Y_σ) is small, the entropic distances D(X_σ) and D(Y_σ) have to be small as well. In particular, Cramér's theorem is a consequence of this statement. However, it is not clear whether the above lower bound is optimal with respect to the couple (D(X_σ), D(Y_σ)), and perhaps the logarithmic term in the exponent may be removed. As we will see, a certain improvement of the bound can be achieved when X and Y have equal variances.

Beyond the realm of results around P. Lévy's theorem, there has recently been renewed interest in related stability problems in different areas of Analysis and Geometry. One can mention, for example, the problems of sharpness of the Brunn-Minkowski and Sobolev-type inequalities (cf. [F-M-P1-2], [Seg], [B-G-R-S]).

We start with the description and refinement of Sapogov-type theorems about the normal approximation in the Kolmogorov distance (Sections 2-3) and then turn to analogous results for the Lévy distance (Section 4). A version of Theorem 1.1 for the total variation distance is given in Section 5. Sections 6-7 deal with the problem of bounding the tail function E X² 1_{|X| ≥ T} in terms of the entropic distances D(X) and D(X + Y), which is an essential part of Kac's problem. A first application, namely to a variant of the Chistyakov-Golinskii theorem, is discussed in Section 8. In Section 9, we develop several estimates connecting the entropic distance D(X) and the uniform deviation of the density p from the corresponding normal density. In Section 10, an improved variant of Theorem 1.1 is derived in the case where X and Y have equal variances. The general case is treated in Section 11. Finally, some relations between different distances in the space of probability distributions on the line are postponed to an appendix (without proofs), and we refer to the extended version [B-C-G3] for more details.

2.
Sapogov-type theorems for the Kolmogorov distance

Throughout the paper we consider the following classical metrics in the space of probability distributions on the real line:

1) The Kolmogorov or L∞-distance ‖F − G‖ = sup_x |F(x) − G(x)|;

2) The Lévy distance L(F, G) = min{ h ≥ 0 : G(x − h) − h ≤ F(x) ≤ G(x + h) + h, for all x ∈ ℝ };

3) The Kantorovich or L¹-distance W(F, G) = ∫ |F(x) − G(x)| dx;

4) The total variation distance

‖F − G‖_TV = sup Σₖ [(F(xₖ) − G(xₖ)) − (F(yₖ) − G(yₖ))],

where the sup is taken over all finite collections of points y₁ < x₁ < ⋯ < yₙ < xₙ.

In these relations, F and G are arbitrary distribution functions. Note that the quantity W(F, G) is finite as long as both F and G have a finite first absolute moment. In the sequel, Φ_{a,v} (or N(a, v²)) denotes the normal distribution (function) with parameters (a, v²), a ∈ ℝ, v > 0. If a = 0, we write Φ_v, and write Φ in the standard case a = 0, v = 1.

Now, let X and Y be independent random variables with distribution functions F and G. Then the convolution F ∗ G represents the distribution of the sum X + Y. If both random variables have mean zero and unit variances, Sapogov's main stability result reads as follows:

Theorem 2.1. Let EX = EY = 0 and Var(X) = Var(Y) = 1. If ‖F ∗ G − Φ ∗ Φ‖ ≤ ε < 1, then with some absolute constant C

‖F − Φ‖ ≤ C/√(log(1/ε)) and ‖G − Φ‖ ≤ C/√(log(1/ε)).

In the general case (that is, when there are no finite moments), the conclusion is somewhat weaker. Namely, with ε ∈ (0, 1) we associate

a₁ = ∫_{−N}^{N} x dF(x), σ₁² = ∫_{−N}^{N} x² dF(x) − a₁² (σ₁ ≥ 0),

and similarly (a₂, σ₂²) for the distribution function G, where N = N(ε) = 1 + 2√(log(1/ε)). In the sequel, we also use the function

m(σ, ε) = min{ 1/σ, log log(e^e/ε) }, σ > 0, 0 < ε ≤ 1.

Theorem 2.2. Assume ‖F ∗ G − Φ‖ ≤ ε < 1. If F has median zero, and σ₁, σ₂ > 0, then with some absolute constant C

‖F − Φ_{a₁,σ₁}‖ ≤ C m(σ₁, ε) / (σ₁ √(log(1/ε))),

and similarly for G.

Originally, Sapogov derived a weaker bound in [Sa1-2] with worse behaviour with respect to both σ₁ and ε. In [Sa3] he gave an improvement, ‖F − Φ_{a₁,σ₁}‖ ≤ C/(σ₁³ √(log(1/ε))), with the correct asymptotics of the right-hand side with respect to ε, cf. also [L-O]. The correctness of the asymptotics with respect to ε was studied in [M], cf. also [C]. In 1976 Senatov [Se], using the ridge property of characteristic functions, improved the factor 1/σ₁³ to 1/σ₁^{3/2}, i.e.,

‖F − Φ_{a₁,σ₁}‖ ≤ C / (σ₁^{3/2} √(log(1/ε))).   (2.1)

He also emphasized that the presence of σ₁ in the bound is essential. A further improvement of the power of σ₁ is due to Shiganov [Sh1-2]. Moreover, at the expense of an additional ε-dependent factor, one can replace σ₁^{3/2} with σ₁. As shown in [C-G], see Remark on p. 286,

‖F − Φ_{a₁,σ₁}‖ ≤ C log log(e^e/ε) / (σ₁ √(log(1/ε))).   (2.2)

Therefore, Theorem 2.2 is just the combination of the two results (2.1) and (2.2). Let us emphasize that all proofs of these theorems use methods of Complex Analysis. Moreover, up to now there is no Real Analysis proof of Cramér's theorem or of its extensions in the form of Sapogov-type results. This, however, does not concern the case of identically distributed summands, cf. [B-C-G2]. We will discuss the bounds in the Lévy distance in the next sections.
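The metrics defined above can be compared numerically. Here is a small sketch (with an assumed toy pair of normal distribution functions, not an example from the paper) that evaluates the Kolmogorov and Kantorovich distances on a grid and finds the Lévy distance by binary search over its definition:

```python
import math

def Phi(x, a=0.0, s=1.0):
    """Normal distribution function with mean a, standard deviation s."""
    return 0.5 * (1.0 + math.erf((x - a) / (s * math.sqrt(2))))

F = lambda x: Phi(x)             # standard normal
G = lambda x: Phi(x, 0.3, 1.0)   # shifted normal (assumed toy pair)

xs = [-8 + i * 0.001 for i in range(16001)]

kolmogorov = max(abs(F(x) - G(x)) for x in xs)
kantorovich = sum(abs(F(x) - G(x)) * 0.001 for x in xs)   # here equals the mean shift

def is_levy(h):
    """Feasibility of h in the definition of L(F, G), checked on the grid."""
    return all(G(x - h) - h <= F(x) <= G(x + h) + h for x in xs)

lo, hi = 0.0, 1.0                 # h = 1 is always feasible
for _ in range(40):               # binary search for the smallest feasible h
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if is_levy(mid) else (mid, hi)
levy = hi

print(levy, kolmogorov, kantorovich)
```

For this pair one observes L(F, G) ≤ ‖F − G‖, while W(F, G) equals the shift 0.3, since F ≥ G pointwise.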
The assumption about the median in Theorem 2.2 may be weakened to the condition that the medians of X and Y, m(X) and m(Y), are bounded in absolute value by a constant. For example, if EX = EY = 0 and Var(X + Y) = 1, and if, for definiteness, Var(X) ≤ 1/2, then, by Chebyshev's inequality, |m(X)| ≤ 1, while |m(Y)| will be bounded by an absolute constant when ε is small enough, due to the main hypothesis ‖F ∗ G − Φ‖ ≤ ε.

Moreover, if the variances of X and Y are bounded away from zero, the statement of Theorem 2.2 holds with a₁ = 0, and the factor 1/σ₁ can be replaced with the standard deviation of X. In the next section, we recall some standard arguments in order to justify this conclusion and give a more general version of Theorem 2.2 involving variances:

Theorem 2.3. Let EX = EY = 0, Var(X + Y) = 1. If ‖F ∗ G − Φ‖ ≤ ε < 1, then with some absolute constant C

‖F − Φ_{v₁}‖ ≤ C m(v₁, ε) / (v₁ √(log(1/ε))) and ‖G − Φ_{v₂}‖ ≤ C m(v₂, ε) / (v₂ √(log(1/ε))),

where v₁² = Var(X), v₂² = Var(Y) (v₁, v₂ > 0). Under the stated assumptions, Theorem 2.3 is stronger than Theorem 2.2, since v₁ ≥ σ₁. Another advantage of this formulation is that v₁ does not depend on ε, while σ₁ does.

3. Proof of Theorem 2.3

Let X and Y be independent random variables with distribution functions F and G, respectively, with EX = EY = 0 and Var(X + Y) = 1. We assume that ‖F ∗ G − Φ‖ ≤ ε < 1, and keep the same notations as in Section 2. Recall that N = N(ε) = 1 + 2√(log(1/ε)). The proof of Theorem 2.3 is entirely based on Theorem 2.2. We will need:

Lemma 3.1. With some absolute constant C we have 0 ≤ 1 − (σ₁² + σ₂²) ≤ CN²ε.

A similar assertion, |1 − (σ₁² + σ₂²)| ≤ CN²ε, is known under the assumption that F has a median at zero (without moment assumptions). For the proof of Lemma 3.1, we use arguments from [Sa1] and [Se], cf. Lemma 1. It will be convenient to divide the proof into several steps.

Lemma 3.2. Let ε ≤ ε₀ = 1/4 − Φ(−1) = 0.0913... Then |m(X)| ≤ 2 and |m(Y)| ≤ 2.

Indeed, let Var(X) ≤ 1/2. Then |m(X)| ≤ 1, by Chebyshev's inequality. Hence

1/4 ≤ P{X ≤ 1, Y ≤ m(Y)} ≤ P{X + Y ≤ m(Y) + 1} ≤ Φ(m(Y) + 1) + ε,

which for ε < 1/4 implies that m(Y) + 1 ≥ Φ^{−1}(1/4 − ε). In particular, m(Y) ≥ −2, if ε ≤ ε₀. Similarly, m(Y) ≤ 2.

To continue, introduce truncated random variables at level N. Put X* = X in case |X| ≤ N, X* = 0 in case |X| > N, and similarly Y* for Y. Note that EX* = a₁, Var(X*) = σ₁², EY* = a₂, Var(Y*) = σ₂². By the construction, σ₁ ≤ v₁ and σ₂ ≤ v₂. In particular, σ₁² + σ₂² ≤ v₁² + v₂² = 1. Let F*, G* denote the distribution functions of X*, Y*, respectively.

Lemma 3.3.
With some absolute constant C we have

‖F − F*‖ ≤ Cε,  ‖G − G*‖ ≤ Cε,  ‖F* ∗ G* − Φ‖ ≤ Cε.

Proof. One may assume that N = N(ε) is a point of continuity of both F and G. Since the Kolmogorov distance is bounded by 1, one may also assume that ε is sufficiently small,

e.g., ε < min{ε₀, ε₁}, where ε₁ = exp{−1/(3 − 2√2)}. In this case (N − 2)² > (N − 1)²/2, so

Φ(−(N − 2)) = 1 − Φ(N − 2) ≤ (1/2) e^{−(N−2)²/2} ≤ (1/2) e^{−(N−1)²/4} = ε/2.

By Lemma 3.2 and the basic assumption on the convolution F ∗ G,

(1/2) P{Y ≤ −N} ≤ P{X ≤ 2, Y ≤ −N} ≤ P{X + Y ≤ −(N − 2)} = (F ∗ G)(−(N − 2)) ≤ Φ(−(N − 2)) + ε.

So, G(−N) ≤ 2Φ(−(N − 2)) + 2ε ≤ 3ε. Analogously, 1 − G(N) ≤ 3ε. Thus

∫_{|x| ≥ N} dG(x) ≤ 6ε, as well as ∫_{|x| ≥ N} dF(x) ≤ 6ε.

In particular, for x < −N, we have |F*(x) − F(x)| = F(x) ≤ 6ε, and similarly for x > N. If −N < x < 0, then F*(x) = F(x) − F(−N), and if 0 < x < N, we have F*(x) = F(x) + (1 − F(N)). In both cases, |F*(x) − F(x)| ≤ 6ε. Therefore, ‖F − F*‖ ≤ 6ε. Similarly, ‖G − G*‖ ≤ 6ε. From this, by the triangle inequality,

‖F ∗ G − F* ∗ G*‖ ≤ ‖F ∗ G − F* ∗ G‖ + ‖F* ∗ G − F* ∗ G*‖ ≤ ‖F − F*‖ + ‖G − G*‖ ≤ 12ε.

Finally,

‖F* ∗ G* − Φ‖ ≤ ‖F* ∗ G* − F ∗ G‖ + ‖F ∗ G − Φ‖ ≤ 12ε + ε ≤ 13ε.

Proof of Lemma 3.1. Since |X* + Y*| ≤ 2N and a₁ + a₂ = E(X* + Y*) = ∫ x d(F* ∗ G*)(x), we have, integrating by parts,

a₁ + a₂ = ∫_{−2N}^{2N} x d((F* ∗ G*)(x) − Φ(x)) = x((F* ∗ G*)(x) − Φ(x)) |_{x=−2N}^{x=2N} − ∫_{−2N}^{2N} ((F* ∗ G*)(x) − Φ(x)) dx.

Hence |a₁ + a₂| ≤ 8N ‖F* ∗ G* − Φ‖, which, by Lemma 3.3, is bounded by CNε. Similarly,

E(X* + Y*)² − ∫_{−2N}^{2N} x² dΦ(x) = ∫_{−2N}^{2N} x² d((F* ∗ G*)(x) − Φ(x)) = x²((F* ∗ G*)(x) − Φ(x)) |_{x=−2N}^{x=2N} − 2∫_{−2N}^{2N} x((F* ∗ G*)(x) − Φ(x)) dx.

Hence

|E(X* + Y*)² − 1| ≤ 24N² ‖F* ∗ G* − Φ‖ + ∫_{|x| > 2N} x² dΦ(x).

The last integral asymptotically behaves like 4Nϕ(2N) < N e^{−(2N−1)²/4} ≤ Nε⁴. Therefore, |E(X* + Y*)² − 1| is bounded by CN²ε. Finally, writing σ₁² + σ₂² = E(X* + Y*)² − (a₁ + a₂)², we get that

1 − (σ₁² + σ₂²) ≤ |1 − E(X* + Y*)²| + (a₁ + a₂)² ≤ CN²ε

with some absolute constant C. Lemma 3.1 follows.

Proof of Theorem 2.3. First note that, given a > 0, σ > 0, and x ∈ ℝ, the function

ψ(x) = Φ_{0,σ}(x) − Φ_{a,σ}(x) = Φ(x/σ) − Φ((x − a)/σ)

vanishes at infinity and has a unique extreme point

x₀ = a/2, with

ψ(x₀) = ∫_{−a/(2σ)}^{a/(2σ)} ϕ(y) dy ≤ a/(σ√(2π)).

Hence, including the case a ≤ 0 as well, we get

‖Φ_{a,σ} − Φ_{0,σ}‖ ≤ |a|/(σ√(2π)).

We apply this estimate with a = a₁ and σ = σ₁. Since EX = 0 and Var(X + Y) = 1, by the Cauchy and Chebyshev inequalities,

|a₁| = |E X 1_{|X| ≥ N}| ≤ (P{|X| ≥ N})^{1/2} ≤ 1/N < 1/√(log(e/ε)).

Hence

‖Φ_{a₁,σ₁} − Φ_{0,σ₁}‖ ≤ |a₁|/(σ₁√(2π)) ≤ C/(σ₁ √(log(1/ε))).

A similar inequality also holds for (a₂, σ₂). Now, define the non-negative numbers u₁ = v₁ − σ₁, u₂ = v₂ − σ₂. By Lemma 3.1,

CN²ε ≥ 1 − (σ₁² + σ₂²) = (v₁² − (v₁ − u₁)²) + (v₂² − (v₂ − u₂)²) = u₁(2v₁ − u₁) + u₂(2v₂ − u₂) ≥ u₁v₁ + u₂v₂.

Hence, u₁ ≤ CN²ε/v₁ and u₂ ≤ CN²ε/v₂. These relations can be used to estimate Δ = ‖Φ_{0,v₁} − Φ_{0,σ₁}‖. Given two parameters α > β > 0, consider a function of the form ψ(x) = Φ(αx) − Φ(βx). In case x > 0, by the mean value theorem, ψ(x) = (α − β) x ϕ(x₀) < (α − β) x ϕ(βx), for some x₀ ∈ (βx, αx). Here, the right-hand side is maximized at x = 1/β, which gives

ψ(x) < (α − β)/(β√(2πe)).

A similar bound also holds for x < 0. Using this bound with α = 1/σ₁ (σ₁ > 0) and β = 1/v₁, we obtain

Δ ≤ (1/√(2πe)) (v₁ − σ₁)/σ₁ = u₁/(√(2πe) σ₁) ≤ CN²ε/(√(2πe) σ₁ v₁) ≤ CN²ε/(√(2πe) σ₁²).

Thus, applying Theorem 2.2, we get with some universal constant C > 1 that

‖F − Φ_{0,v₁}‖ ≤ ‖F − Φ_{a₁,σ₁}‖ + ‖Φ_{a₁,σ₁} − Φ_{0,σ₁}‖ + ‖Φ_{0,σ₁} − Φ_{0,v₁}‖ ≤ C m(σ₁, ε)/(σ₁√(log(1/ε))) + C/(σ₁√(log(1/ε))) + CN²ε/σ₁² ≤ 2C m(σ₁, ε)/(σ₁√(log(1/ε))) + CN²ε/σ₁².   (3.1)

The obtained estimate remains valid when σ₁ = 0 as well. On the other hand,

σ₁ = v₁ − u₁ ≥ v₁ − CN²ε/v₁ ≥ v₁/2,

where the last inequality is fulfilled for the range v₁ ≥ v(ε) = (CN⁴ε)^{1/4}. Hence, from (3.1), and using m(σ₁, ε) ≤ 2m(v₁, ε) for this range,

‖F − Φ_{0,v₁}‖ ≤ 8C m(v₁, ε)/(v₁√(log(1/ε))) + 4CN²ε/v₁².

Here, since m(v₁, ε) ≥ 1, the first term on the right-hand side majorizes the second one, if v₁ ≥ ṽ(ε) = N²ε√(log(1/ε)). Therefore, when v₁ ≥ w(ε) = max{v(ε), ṽ(ε)}, with some absolute constant C we have

‖F − Φ_{0,v₁}‖ ≤ C m(v₁, ε)/(v₁√(log(1/ε))).

Thus, we arrive at the desired inequality for the range v₁ ≥ w(ε). But the function w behaves almost polynomially near zero and admits, for example, a bound of the form w(ε) ≤ C₀ ε^{1/6}, 0 < ε < ε₀, with some universal ε₀ ∈ (0, 1) and C₀ > 1. So, when v₁ ≤ w(ε) and 0 < ε < ε₀, we have

1/(v₁√(log(1/ε))) ≥ 1/(w(ε)√(log(1/ε))) ≥ 1/(C₀ ε^{1/6}√(log(1/ε))).

Here, the last expression is greater than 1, as long as ε is sufficiently small, say for all 0 < ε < ε₁, where ε₁ is determined by (C₀, ε₀). Hence, for all such ε, we have a better bound ‖F − Φ_{0,v₁}‖ ≤ 1 ≤ C m(v₁, ε)/(v₁√(log(1/ε))). It remains to increase the constant C in order to involve the remaining values of ε. A similar conclusion is true for G.

4. Stability in Cramér's theorem for the Lévy distance

Let X and Y be independent random variables with distribution functions F and G. It turns out that in the bound of Theorem 2.2 the parameter σ₁ can be completely removed, if we consider the stability problem for the Lévy distance. More precisely, the following theorem was established in [C-G].

Theorem 4.1. Assume that ‖F ∗ G − Φ‖ ≤ ε < 1. If F has median zero, then with some absolute constant C

L(F, Φ_{a₁,σ₁}) ≤ C (log log(4/ε))² / √(log(1/ε)).

Recall that a₁ = ∫_{−N}^{N} x dF(x) and σ₁² = ∫_{−N}^{N} x² dF(x) − a₁² (σ₁ ≥ 0), and similarly (a₂, σ₂²) for G, where N = 1 + 2√(log(1/ε)). As we have already discussed, the assumption about the median may be relaxed to the condition that the median is bounded (by a universal constant).

The first quantitative stability result for the Lévy distance, namely L(F, Φ_{a₁,σ₁}) ≤ C (log(1/ε))^{−1/8}, was obtained in 1968 by Zolotarev [Z1], who applied his famous Berry-Esseen-type bound. The power 1/8 was later improved to 1/4 by Senatov [Se], and even more by Shiganov [Sh1-2]. The stated asymptotics in Theorem 4.1 is unimprovable, which was also shown in [C-G]. Note that in the assumption of Theorem 4.1, the Kolmogorov distance can be replaced with the Lévy distance L(F ∗ G, Φ), in view of the general relations

L(F ∗ G, Φ) ≤ ‖F ∗ G − Φ‖ ≤ (1 + M) L(F ∗ G, Φ), M = ‖Φ‖_Lip = 1/√(2π).
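This sandwich between the Lévy and Kolmogorov distances is easy to check numerically. As a sketch (with an assumed toy pair, Φ against its translate by 1/2, rather than a convolution from the paper), both inequalities can be verified on a grid, the right one being nearly tight for translates:

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

F = lambda x: Phi(x - 0.5)    # assumed toy choice: Phi shifted by 0.5
G = Phi

xs = [-8 + i * 0.001 for i in range(16001)]
kol = max(abs(F(x) - G(x)) for x in xs)       # Kolmogorov distance

def is_levy(h):
    """Feasibility of h in the definition of the Levy distance, on the grid."""
    return all(G(x - h) - h <= F(x) <= G(x + h) + h for x in xs)

lo, hi = 0.0, 1.0
for _ in range(40):                            # binary search for L(F, G)
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if is_levy(mid) else (mid, hi)
levy = hi

M = 1 / math.sqrt(2 * math.pi)                 # Lipschitz seminorm of Phi
print(levy, kol, (1 + M) * levy)               # L <= ||F - G|| <= (1 + M) L
```

For this pair, (1 + M) L exceeds the Kolmogorov distance only by about half a percent, illustrating that the constant 1 + M cannot be improved much.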
However, in the conclusion such a replacement cannot be made at the expense of a universal constant, since we only have

‖F − Φ_{a₁,σ₁}‖ ≤ (1 + M₁) L(F, Φ_{a₁,σ₁}), M₁ = ‖Φ_{a₁,σ₁}‖_Lip = 1/(σ₁√(2π)).

Now, our aim is to replace in Theorem 4.1 the parameters (a₁, σ₁), which depend on ε, with (0, v₁), as in Theorem 2.3. That is, we have the following:

Question. Assume that EX = EY = 0, Var(X + Y) = 1, and L(F ∗ G, Φ) ≤ ε < 1. Is it true that

L(F, Φ_{v₁}) ≤ C (log log(4/ε))² / √(log(1/ε))

with some absolute constant C, where v₁² = Var(X)?

In a sense, this is a question about the closeness of σ₁ to v₁ in the situation where ε is small. Indeed, using the triangle inequality, one can write

L(F, Φ_{v₁}) ≤ L(F, Φ_{a₁,σ₁}) + L(Φ_{a₁,σ₁}, Φ_{0,σ₁}) + L(Φ_{σ₁}, Φ_{v₁}).

Here, the first term may be estimated according to Theorem 4.1. For the second one, we have a trivial bound L(Φ_{a₁,σ₁}, Φ_{0,σ₁}) ≤ |a₁|, which follows from the definition of the Lévy metric. In turn, the parameter a₁ admits the bound |a₁| < 1/√(log(e/ε)), which was already used in the proof of Theorem 2.3. This bound behaves better than the one in Theorem 4.1, so we obtain:

Lemma 4.2. If EX = EY = 0, Var(X + Y) = 1, and L(F ∗ G, Φ) ≤ ε < 1, then

L(F, Φ_{v₁}) ≤ C (log log(4/ε))² / √(log(1/ε)) + L(Φ_{σ₁}, Φ_{v₁}).

Thus, we are reduced to estimating the distance L(Φ_{σ₁}, Φ_{v₁}), which in fact should be done in terms of v₁² − σ₁². The proof of the following elementary bound can be found in [B-C-G3].

Lemma 4.3. If v ≥ σ ≥ 0 and v² − σ² ≤ 2, then

L(Φ_σ, Φ_v) ≤ 2 √((v² − σ²) log(2/(v² − σ²))).

Attempts to derive bounds on the distance L(Φ_σ, Φ_v) by virtue of standard general relations, such as Zolotarev's Berry-Esseen-type estimate [Z2], lead to worse dependences on α² = v² − σ². In view of Lemmas 4.2-4.3, to proceed, one needs to bound v₁² − σ₁² in terms of ε. However, this does not seem to be possible in general without stronger hypotheses. Note that

v₁² − σ₁² = ∫_{|x| > N} x² dF(x) + a₁².

Hence, we need to deal with the quadratic tail function

δ_X(T) = ∫_{|x| > T} x² dF(x), T ≥ 0,

whose behavior at infinity will play an important role in the sequel. Now, combining Lemmas 4.2 and 4.3, we obtain

L(F, Φ_{v₁}) ≤ C (log log(4/ε))² / √(log(1/ε)) + 2R(δ_X(N) + a₁²),

where R(t) = √(t log(2/t)). This function is non-negative and concave in the interval 0 ≤ t ≤ 2, with R(0) = 0. Hence, it is subadditive in the sense that R(ξ + η) ≤ R(ξ) + R(η), for all ξ, η ≥ 0 with ξ + η ≤ 2. Hence,

R(δ_X(N) + a₁²) ≤ R(δ_X(N)) + R(a₁²) = √(δ_X(N) log(2/δ_X(N))) + √(a₁² log(2/a₁²)).

As we have noticed, a₁² ≤ A = 1/log(e/ε), so a₁² log(2/a₁²) ≤ 2A log(e/A).
Since the function t ↦ t log(e/t) is increasing on [0, 1], we have

2A log(e/A) = (2/log(e/ε)) (1 + log log(e/ε)).

Taking the square root of the right-hand side, we obtain a function which is majorized and absorbed by the bound of Theorem 4.1. As a result, we arrive at the following consequence of this theorem.

Theorem 4.4. Assume independent random variables X and Y have distribution functions F and G with mean zero and with Var(X + Y) = 1. If L(F ∗ G, Φ) ≤ ε < 1, then with some

absolute constant C

L(F, Φ_{v₁}) ≤ C [ (log log(4/ε))² / √(log(1/ε)) + √(δ_X(N) log(2/δ_X(N))) ],

where v₁ = √(Var(X)), N = 1 + 2√(log(1/ε)), and δ_X(N) = ∫_{|x| > N} x² dF(x).

It seems that in general it is not enough to know that Var(X) ≤ 1 and L(F ∗ G, Φ) ≤ ε, in order to judge the decay of the quadratic tail function δ_X(T) as T → ∞. So, some additional properties should be involved. As we will see, the entropic distance perfectly suits this idea, so that one can start with the entropic assumption D(X + Y) ≤ ε.

5. Application of Sapogov-type results to Gaussian regularization

In this section we consider the stability problem in Cramér's theorem for the regularized distributions with respect to the total variation norm. As a basic tool, we use Theorem 2.3. Thus, let X and Y be independent random variables with distribution functions F and G, and with variances Var(X) = v₁², Var(Y) = v₂² (v₁, v₂ > 0, v₁² + v₂² = 1), so that X + Y has variance 1. Let both X and Y have mean zero (this is not important and is assumed only for simplicity of notation). As we know from Theorem 2.3, the main stability result asserts that, if ‖F ∗ G − Φ‖ ≤ ε < 1, then

‖F − Φ_{v₁}‖ ≤ C m(v₁, ε)/(v₁√(log(1/ε))), ‖G − Φ_{v₂}‖ ≤ C m(v₂, ε)/(v₂√(log(1/ε)))

for some absolute constant C. Here, as before, m(v, ε) = min{ 1/v, log log(e^e/ε) }, v > 0, 0 < ε ≤ 1.

On the other hand, such a statement, even in the case of equal variances, is no longer true for the total variation norm. So, it is natural to use the Gaussian regularizations X_σ = X + σZ₁, Y_σ = Y + σZ₂, where Z₁, Z₂ ∼ N(0, 1) are independent of X and Y, and where σ is a (small) positive parameter. For definiteness, we assume that 0 < σ ≤ 1. Note that

Var(X_σ) = v₁² + σ², Var(Y_σ) = v₂² + σ², Var(X_σ + Y_σ) = 1 + 2σ².

Denote by F_σ and G_σ the distributions of X_σ and Y_σ, respectively. Assume X_σ + Y_σ is almost normal in the sense of the total variation norm, and hence in the Kolmogorov distance, namely

‖F_σ ∗ G_σ − N(0, 1 + 2σ²)‖ ≤ (1/2) ‖F_σ ∗ G_σ − N(0, 1 + 2σ²)‖_TV ≤ ε.
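The effect of the regularization on the total variation distance can be seen directly. Here is a numeric sketch (with an assumed two-point toy X, not an example from the paper): for X = ±v₁ with probability 1/2, F_σ is an explicit two-component normal mixture, and its TV distance to the matching normal law shrinks as σ grows:

```python
import math

def phi(x, a=0.0, s=1.0):
    """Normal density with mean a and standard deviation s."""
    return math.exp(-(x - a) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def tv_to_normal(sigma, v1=0.6):
    """||F_sigma - N(0, v1^2 + sigma^2)||_TV for X = +-v1 with probability 1/2
    (an assumed toy distribution; X_sigma then has an explicit mixture density)."""
    s = math.sqrt(v1 * v1 + sigma * sigma)
    dx = 0.001
    return sum(abs(0.5 * phi(x, -v1, sigma) + 0.5 * phi(x, v1, sigma)
                   - phi(x, 0.0, s)) * dx
               for x in [-9 + i * dx for i in range(int(18 / dx) + 1)])

tv_small, tv_large = tv_to_normal(0.3), tv_to_normal(1.0)
print(tv_small, tv_large)   # more smoothing brings F_sigma closer in TV
```

Without the smoothing (σ = 0) the TV distance of this discrete X to every normal law equals 2, which is the phenomenon forcing the regularization in the first place.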
Note that X_σ + Y_σ = (X + Y) + σ√2 Z represents the Gaussian regularization of the sum X + Y with parameter σ√2, where Z ∼ N(0, 1) is independent of X + Y. One may also write X_σ + Y_σ = X + (Y + σ√2 Z), or equivalently,

(X_σ + Y_σ)/√(1 + 2σ²) = X̃ + Ỹ, where X̃ = X/√(1 + 2σ²), Ỹ = (Y + σ√2 Z)/√(1 + 2σ²).

Thus, we are in position to apply Theorem 2.3 to the distributions of the random variables X̃ and Ỹ with variances ṽ₁² = v₁²/(1 + 2σ²) and ṽ₂² = (v₂² + 2σ²)/(1 + 2σ²). Using 1 + 2σ² ≤ 3, it gives

‖F̃ − Φ_{ṽ₁}‖ ≤ C m(ṽ₁, ε)/(ṽ₁√(log(1/ε))) ≤ 3C m(v₁, ε)/(v₁√(log(1/ε))),

where F̃ denotes the distribution function of X̃.

Now, we apply Proposition A.2.2 b) to the distributions F̃ and G = Φ_{ṽ₁} with B = ṽ₁, and get

‖F_σ − N(0, v₁² + σ²)‖_TV ≤ (4v₁/σ) ‖F̃ − Φ_{ṽ₁}‖^{1/2} ≤ (4v₁/σ) (3C m(v₁, ε)/(v₁√(log(1/ε))))^{1/2}.

One may simplify this bound by using v₁ m(v₁, ε) ≤ 1, and then we may conclude:

Theorem 5.1. Let F and G be distribution functions with mean zero and variances v₁², v₂², respectively, such that v₁² + v₂² = 1, and let 0 < σ ≤ 1. If the regularized distributions satisfy

‖F_σ ∗ G_σ − N(0, 1 + 2σ²)‖_TV ≤ 2ε,

then with some absolute constant C

‖F_σ − N(0, v₁² + σ²)‖_TV ≤ (C/σ) (log(1/ε))^{−1/4}, ‖G_σ − N(0, v₂² + σ²)‖_TV ≤ (C/σ) (log(1/ε))^{−1/4}.

6. Control of tails and entropic Chebyshev-type inequality

One of our further aims is to find an entropic version of the Sapogov stability theorem for regularized distributions. As part of the problem, we need to bound the quadratic tail function δ_X(T) = E X² 1_{|X| ≥ T} quantitatively in terms of the entropic distance D(X). Thus, assume a random variable X has mean zero and variance Var(X) = 1, with a finite distance to the standard normal law

D(X) = h(Z) − h(X) = ∫ p(x) log(p(x)/ϕ(x)) dx,

where p is the density of X and ϕ is the density of N(0, 1). One can also write another representation, D(X) = Ent_γ(f) with f = p/ϕ, with respect to the standard Gaussian measure γ on the real line. Let us recall that the entropy functional

Ent_μ(f) = E_μ f log f − E_μ f log E_μ f

is well-defined for any measurable function f ≥ 0 on an abstract probability space (Ω, μ), where E_μ stands for the expectation (integral) with respect to μ. We are going to involve a variational formula for this functional (cf. e.g. [Le]): For all measurable functions f ≥ 0 and g on Ω, such that Ent_μ(f) and E_μ e^g are finite,

E_μ fg ≤ Ent_μ(f) + E_μ f · log E_μ e^g.

Applying it on Ω = ℝ with μ = γ and f = p/ϕ, we notice that E_μ f = 1 and get that

∫ p(x) g(x) dx ≤ D(X) + log ∫ e^{g(x)} ϕ(x) dx.

Take g(x) = (α/2) x² 1_{|x| ≥ T} with a parameter α ∈ (0, 1). Then

∫ e^{g(x)} ϕ(x) dx = γ[−T, T] + (2/√(1 − α)) (1 − Φ(T√(1 − α))).

Using γ[−T, T] < 1 and the inequality log(1 + t) ≤ t, we obtain that

(α/2) δ_X(T) ≤ D(X) + (2/√(1 − α)) (1 − Φ(T√(1 − α))).

S. G. Bobkov, G. P. Chistyakov and F. Götze

To further estimate the right-hand side, we apply the bound 1 − Φ(t) ≤ φ(t)/t, which leads to

(α/2) δ_X(T) ≤ D(X) + (2/(√(2π) T (1−α))) e^{−(1−α) T²/2}.   (6.1)

Choosing just α = 1/2, we get

(1/2) δ_X(T) ≤ 2 D(X) + (8/(T √(2π))) e^{−T²/4} ≤ 2 D(X) + 2 e^{−T²/4},

where the last bound is fulfilled for T ≥ 4/√(2π). For the remaining T the obtained inequality is fulfilled automatically, since then 2 e^{−T²/4} ≥ 2 e^{−4/(2π)} > 1, while (1/2) δ_X(T) ≤ (1/2) E X² = 1/2. Thus, we have proved the following:

Proposition 6.1. If X is a random variable with E X = 0 and Var(X) = 1, having density p(x), then for all T > 0,

∫_{|x| ≥ T} x² p(x) dx ≤ 4 D(X) + 4 e^{−T²/4}.

In particular, the above integral does not exceed 8 D(X) for T = 2 √(log(1 + 1/D(X))).

The choice α = 2/T² in (6.1) leads to a better asymptotic in T, and then we also have:

Proposition 6.2. If X is a random variable with E X = 0 and Var(X) = 1, having density p(x), then for all T ≥ 2,

∫_{|x| ≥ T} x² p(x) dx ≤ T² D(X) + 6 T e^{−T²/2}.

In the Gaussian case X = Z this gives an asymptotically correct bound for large T (up to a constant factor). Note as well that in the non-Gaussian case, from Proposition 6.1 we obtain an entropic Chebyshev-type inequality

P{ |X| ≥ 2 √(log(1/D(X))) } ≤ 2 D(X)/log(1/D(X))   (D(X) < 1).

Finally, let us give a more flexible variant of Proposition 6.1 with an arbitrary variance B² = Var(X) (B > 0), but still with mean zero. Applying the obtained statements to the random variable X/B and replacing the variable T with T/B, we then get that

B^{−2} ∫_{|x| ≥ T} x² p(x) dx ≤ 4 D(X) + 4 e^{−T²/(4B²)}.

7. Entropic control of tails for sums of independent summands

We apply Proposition 6.1 in the following situation. Assume we have two independent random variables X and Y with mean zero, but perhaps with different variances Var(X) and Var(Y). Assume they have densities. The question is: can we bound the tail functions δ_X and δ_Y in terms of D(X+Y), rather than in terms of D(X) and D(Y)? In the case Var(X+Y) = 1, by Proposition 6.1, applied to the sum X+Y,

δ_{X+Y}(T) = E (X+Y)² 1{|X+Y| ≥ T} ≤ 4 D(X+Y) + 4 e^{−T²/4}.   (7.1)
Hence, to answer the question, it would be sufficient to bound the tail function δ_{X+Y} from below in terms of δ_X and δ_Y.
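For the standard normal itself D(Z) = 0 and δ_Z(T) is available in closed form, so Propositions 6.1 and 6.2 can be checked directly (a numerical sketch of ours, not from the paper):

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def delta_normal(T):
    """delta_Z(T) = E Z^2 1{|Z| >= T} = 2 T phi(T) + erfc(T / sqrt(2))."""
    return 2 * T * phi(T) + math.erfc(T / math.sqrt(2))

# Proposition 6.1 with D(Z) = 0: delta_Z(T) <= 4 exp(-T^2/4) for all T > 0
for i in range(1, 801):
    T = i * 0.01  # T in (0, 8]
    assert delta_normal(T) <= 4 * math.exp(-T * T / 4)

# Proposition 6.2 with D(Z) = 0: delta_Z(T) <= 6 T exp(-T^2/2) for T >= 2,
# which captures the true order 2 T phi(T) up to a constant factor
for i in range(200, 801):
    T = i * 0.01
    assert delta_normal(T) <= 6 * T * math.exp(-T * T / 2)
```

The closed form for δ_Z follows from integration by parts: ∫_T^∞ x² φ(x) dx = T φ(T) + 1 − Φ(T).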

Assume for a while that Var(X+Y) = 1/2. In particular, Var(Y) ≤ 1/2, and according to the usual Chebyshev inequality, P{|Y| ≤ 1} ≥ 1/2. Hence, for all T ≥ 0,

E (X+Y)² 1{X+Y ≥ T} ≥ E (X+Y)² 1{X ≥ T+1, |Y| ≤ 1} ≥ E (X−1)² 1{X ≥ T+1, |Y| ≤ 1} ≥ (1/2) E (X−1)² 1{X ≥ T+1}.

If X ≥ T+1 ≥ 4, then clearly (X−1)² ≥ (1/2) X², hence

E (X−1)² 1{X ≥ T+1} ≥ (1/2) E X² 1{X ≥ T+1}.

With a similar bound for the range X ≤ −(T+1), we get

δ_{X+Y}(T) ≥ (1/4) δ_X(T+1),  T ≥ 3.   (7.2)

Now, change T+1 to T (assuming that T ≥ 4) and apply (7.1) to √2 (X+Y). Together with (7.2) it gives

(1/4) δ_{√2 X}(T) ≤ 4 D(√2 (X+Y)) + 4 e^{−(T−1)²/4}.

But the entropic distance to the normal is invariant under rescaling of coordinates, i.e., D(√2 (X+Y)) = D(X+Y). Since also δ_{√2 X}(T) = 2 δ_X(T/√2), we obtain that

δ_X(T/√2) ≤ 8 D(X+Y) + 8 e^{−(T−1)²/4},

provided that T ≥ 4. Simplifying by e^{−(T−1)²/4} ≤ e^{−T²/8} (valid for T ≥ 4), and then replacing T with T√2, we arrive at

δ_X(T) ≤ 8 D(X+Y) + 8 e^{−T²/4},  T ≥ 4/√2.

Finally, to involve the values 0 ≤ T ≤ 4/√2, just use e² < 8, so that the above inequality holds automatically in this range:

δ_X(T) ≤ Var(X) ≤ 1 < 8 e^{−T²/4}.

Moreover, in order to allow an arbitrary variance Var(X+Y) = B² (B > 0), the above estimate should be applied to X/(B√2) and Y/(B√2) with T replaced by T/(B√2). Then it takes the form

(1/(2B²)) δ_X(T) ≤ 8 D(X+Y) + 8 e^{−T²/(8B²)}.

We can summarize.

Proposition 7.1. Let X and Y be independent random variables with mean zero and with Var(X+Y) = B² (B > 0). Assume X has a density p. Then, for all T ≥ 0,

B^{−2} ∫_{|x| ≥ T} x² p(x) dx ≤ 16 D(X+Y) + 16 e^{−T²/(8B²)}.

8. Stability for the Lévy distance under an entropic hypothesis

Now we can return to the variant of the Chistyakov–Golinskii result, as in Theorem 4.4. Let the independent random variables X and Y have mean zero, with Var(X+Y) = 1, and denote by F and G their distribution functions. Also assume X has a density p. In order to control the term δ_X(N) in Theorem 4.4, we are going to impose the stronger condition D(X+Y) ≤ 2ε².
Using Pinsker's inequality, this yields bounds for the Kolmogorov and total variation distances:

‖F * G − Φ‖ ≤ (1/2) ‖F * G − Φ‖_TV ≤ (1/2) √(2 D(X+Y)) ≤ ε.
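In the normalization used throughout (total variation as the total mass of the difference, at most 2), Pinsker's inequality reads ‖P − Q‖_TV ≤ √(2 D(P‖Q)). A quick discrete check (our example):

```python
import math
import random

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tv(p, q):
    """Total variation in the 'total mass' normalization: sum |p_i - q_i| <= 2."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

random.seed(7)
for _ in range(1000):
    n = random.randint(2, 8)
    a = [random.random() + 1e-3 for _ in range(n)]
    b = [random.random() + 1e-3 for _ in range(n)]
    p = [x / sum(a) for x in a]
    q = [x / sum(b) for x in b]
    # Pinsker: ||p - q||_TV <= sqrt(2 KL(p || q))
    assert tv(p, q) <= math.sqrt(2 * kl(p, q)) + 1e-12
```

With D(X+Y) ≤ 2ε² this is exactly the step that produces the Kolmogorov bound ε above.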

Hence, the assumption of Theorem 4.4 is fulfilled whenever ε < 1. As for the conclusion, first apply Proposition 7.1 with B = 1, which gives

δ_X(T) = ∫_{|x| ≥ T} x² p(x) dx ≤ 16 D(X+Y) + 16 e^{−T²/8} ≤ 16 ε² + 16 e^{−T²/8}.

In our situation, N = 1 + √(2 log(1/√ε)) = 1 + √(log(1/ε)), so δ_X(N) ≤ 16ε² + 16 e^{−N²/8} ≤ C ε^{1/8}. Thus, we arrive at:

Proposition 8.1. Let the independent random variables X and Y have mean zero, with Var(X+Y) = 1, and assume that X has a density with distribution function F. If D(X+Y) ≤ 2ε² < 2, then

L(F, Φ_{v₁}) ≤ C (log log(4/ε))² / √(log(1/ε)),

where v₁² = Var(X) and C is an absolute constant.

In general, in the conclusion one cannot replace the Lévy distance L(F, Φ_{v₁}) with D(X). However, this is indeed possible for regularized distributions, as we will see in the next sections.

9. Entropic distance and uniform deviation of densities

Let X and Y be independent random variables with mean zero, finite variances, and assume X has a bounded density p. Our next aim is to estimate the entropic distance to the normal, D(X), in terms of D(X+Y) and the uniform deviation of p above the normal density

Δ(X) = ess sup_x (p(x) − φ_v(x)),

where v² = Var(X) and φ_v stands for the density of the normal law N(0, v²). For a while, assume that Var(X) = 1. Proposition A.3.2 gives the preliminary estimate

D(X) ≤ Δ(X) [√(2π) + 2T + 2T log(1 + Δ(X) √(2π) e^{T²/2})] + (1/2) δ_X(T),

involving the quadratic tail function δ_X(T). In the general situation one cannot say anything definite about the decay of this function. However, it can be bounded in terms of D(X+Y) by virtue of Proposition 7.1: we know that, for all T ≥ 0,

(1/(2B²)) δ_X(T) ≤ 8 D(X+Y) + 8 e^{−T²/(8B²)},

where B² = Var(X+Y) = 1 + Var(Y). So, combining the two estimates yields

D(X) ≤ 8B² D(X+Y) + 8B² e^{−T²/(8B²)} + Δ [√(2π) + 2T + 2T log(1 + Δ √(2π) e^{T²/2})],

where Δ = Δ(X). First assume Δ ≤ 1 and apply the above with T² = 8B² log(1/Δ).
Then 8B² e^{−T²/(8B²)} = 8B² Δ, and putting β = 4B² − 1 ≥ 3, we also have

log(1 + Δ √(2π) e^{T²/2}) = log(1 + √(2π) Δ^{−β}) = β log((1 + √(2π) Δ^{−β})^{1/β}) ≤ β log(1 + (2π)^{1/(2β)}/Δ) < β log(1 + 2/Δ).

Collecting all the terms and using B ≥ 1, we are led to an estimate of the form

D(X) ≤ 8B² D(X+Y) + C B³ Δ log^{3/2}(2 + 1/Δ),

where C > 0 is an absolute constant. It holds also in the case Δ > 1 in view of the logarithmic bound of Proposition A.3.1, D(X) ≤ log(1 + Δ√(2π)) + 1/2. Therefore, the obtained bound holds true without any restriction on Δ.

Now, to relax the variance assumption, assume Var(X) = v₁², Var(Y) = v₂² (v₁, v₂ > 0), and without loss of generality, let Var(X+Y) = v₁² + v₂² = 1. Apply the above to X₁ = X/v₁, Y₁ = Y/v₁. Then, B² = 1/v₁² and Δ(X₁) = v₁ Δ(X), so with some absolute constant c > 0,

c v₁² D(X) ≤ D(X+Y) + Δ(X) log^{3/2}(2 + 1/(v₁ Δ(X))).

As a result, we arrive at:

Proposition 9.1. Let X, Y be independent random variables with mean zero, Var(X+Y) = 1, and such that X has a bounded density. Then, with some absolute constant c > 0,

c Var(X) D(X) ≤ D(X+Y) + Δ(X) log^{3/2}(2 + 1/(√(Var(X)) Δ(X))).

Interchanging the roles of X and Y and adding the two inequalities, we also have as a corollary:

Proposition 9.2. Let X, Y be independent random variables with mean zero and positive variances v₁² = Var(X), v₂² = Var(Y), such that v₁² + v₂² = 1, and both with densities. Then, with some absolute constant c > 0,

c (v₁² D(X) + v₂² D(Y)) ≤ D(X+Y) + Δ(X) log^{3/2}(2 + 1/(v₁ Δ(X))) + Δ(Y) log^{3/2}(2 + 1/(v₂ Δ(Y))).

This inequality may be viewed as the inverse to the general property of the entropic distance, which we mentioned before, namely, v₁² D(X) + v₂² D(Y) ≤ D(X+Y), under the normalization assumption v₁² + v₂² = 1. Let us also state separately Proposition 9.1 in the particular case of equal unit variances, keeping the explicit constant 8B² = 16 in front of D(X+Y).

Proposition 9.3. Let X, Y be independent random variables with mean zero and variances Var(X) = Var(Y) = 1, and such that X has a density. Then, with some absolute constant C,

D(X) ≤ 16 D(X+Y) + C Δ(X) log^{3/2}(2 + 1/Δ(X)).
One may simplify the right-hand side for small values of Δ(X) and get a slightly weaker inequality

D(X) ≤ 16 D(X+Y) + C_α Δ(X)^α,  0 < α < 1,

where the constants C_α depend only on α. For large values of Δ(X), the above inequality holds as well, in view of the logarithmic bound of Proposition A.3.1.
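The quantities D(X) and Δ(X), and the logarithmic bound of Proposition A.3.1, can be illustrated on a concrete density: for the uniform law on [−√3, √3] (mean zero, variance 1) both are explicit (our example, not from the paper):

```python
import math

sqrt3 = math.sqrt(3.0)
p_unif = 1.0 / (2.0 * sqrt3)   # density height of Uniform(-sqrt(3), sqrt(3))

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# D(X) = h(Z) - h(X) = (1/2) log(2 pi e) - log(2 sqrt(3)) ~ 0.1765
D = 0.5 * math.log(2 * math.pi * math.e) - math.log(2 * sqrt3)

# Delta(X) = sup_x (p(x) - phi(x)); the difference is largest at |x| = sqrt(3)
Delta = max(p_unif - phi(k * sqrt3 / 1000.0) for k in range(1001))

# cross-check D(X) against the numerically integrated KL divergence
h = 2 * sqrt3 / 200000.0
kl = sum(p_unif * math.log(p_unif / phi(-sqrt3 + i * h)) * h
         for i in range(200000))
assert abs(kl - D) < 1e-3

# Proposition A.3.1 with v = 1: D(X) <= log(1 + Delta sqrt(2 pi)) + 1/2
assert D <= math.log(1 + Delta * math.sqrt(2 * math.pi)) + 0.5
```

Here the logarithmic bound (about 0.91) is far from tight, which is exactly why the refined Proposition A.3.2 is needed when Δ(X) is small.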

10. The case of equal variances

We are prepared to derive an entropic variant of the Sapogov-type stability theorem for regularized distributions. That is, we are going to estimate D(X_σ) and D(Y_σ) in terms of D(X_σ + Y_σ) for two independent random variables X and Y with distribution functions F and G, by involving a small smoothing parameter σ > 0. It will not be important whether or not they have densities. Since the means do not affect the final statements, let X and Y have mean zero. Recall that, given σ > 0, the regularized random variables are defined by X_σ = X + σZ, Y_σ = Y + σZ′, where Z, Z′ are standard normal random variables with density φ, independent of each other and of X, Y. The distributions of X_σ, Y_σ are denoted F_σ, G_σ, with densities p_σ, q_σ. In this section, we consider the case of equal variances, say, Var(X) = Var(Y) = 1. Put σ₁² = 1 + σ², σ₂² = 1 + 2σ². Since Var(X_σ) = Var(Y_σ) = σ₁², the corresponding entropic distances are given by

D(X_σ) = h(σ₁ Z) − h(X_σ) = ∫ p_σ(x) log(p_σ(x)/φ_{σ₁}(x)) dx,

and similarly for Y_σ, where, as before, φ_v represents the density of N(0, v²). Assume that D(X_σ + Y_σ) is small in the sense that D(X_σ + Y_σ) ≤ 2ε² < 2. According to Pinsker's inequality, this yields bounds for the Kolmogorov and total variation distances:

‖F_σ * G_σ − Φ_{σ₂}‖ ≤ (1/2) ‖F_σ * G_σ − Φ_{σ₂}‖_TV ≤ ε < 1.

In the sequel, let 0 < σ ≤ 1. This guarantees that the ratio of variances of the components in the convolution F_σ * G_σ = (F * G) * Φ_{σ√2} is bounded away from zero by an absolute constant, so that we can apply Theorem 2.3. Namely, it gives that

‖F − Φ‖ ≤ C/√(log(1/ε)),

and similarly for G. (Note that raising ε to any positive power does not change the form of the above estimate.) Applying Proposition A.2.1 a), when one of the distributions is normal, we get

Δ(X_σ) = sup_x (p_σ(x) − φ_{σ₁}(x)) ≤ (1/σ) ‖F − Φ‖ ≤ C/(σ √(log(1/ε))).

We are in position to apply Proposition 9.3 to the random variables X_σ/σ₁, Y_σ/σ₁.
It gives

D(X_σ) ≤ 16 D(X_σ + Y_σ) + C Δ(X_σ) log^{3/2}(2 + 1/Δ(X_σ)) ≤ 32 ε² + C log^{3/2}(2 + σ √(log(1/ε))) / (σ √(log(1/ε))),

where C is an absolute constant. In the last expression the second term dominates the first one, and at this point, the assumption on the means may be removed. We arrive at:

Proposition 10.1. Let X and Y be independent random variables with variance one. Given 0 < ε < 1 and 0 < σ ≤ 1, if the regularized random variables X_σ and Y_σ satisfy D(X_σ + Y_σ) ≤ 2ε², then

D(X_σ) + D(Y_σ) ≤ C log^{3/2}(2 + σ √(log(1/ε))) / (σ √(log(1/ε))),   (10.1)

where C is an absolute constant. Note that all entropic distances in (10.1) do not change when adding constants to X and Y, which allows us to remove the mean zero assumption.

This statement may be formulated equivalently by solving the above inequality with respect to ε. The function u(x) = x/log^{3/2}(2 + x)

Stability in Cramer s theorem 7 is increasing in x 0, and, for any a 0, ux) a x 8 a log 3/2 2 + a). Hence, assuming DX σ + Y σ ), we obtain from 0.) that σ log 8C D log3/2 2 + C/D) C D log3/2 2 + /D) with some absolute constant C, where D = DX σ ) + DY σ ). As a result, { DX σ + Y σ ) exp C 2 log 3 2 + /D) } σ 2 D 2. Note also that this inequality is fulfilled automatically, if DX σ + Y σ ). Thus, we get: Proposition 0.2. Let X, Y be independent random variables with VarX) = VarY ) =. Given 0 < σ, the regularized random variables X σ and Y σ satisfy { DX σ + Y σ ) exp C log3 2 + /D) } σ 2 D 2, where D = DX σ ) + DY σ ) and C > 0 is an absolute constant.. Proof of Theorem. Now let us consider the case of arbitrary variances VarX) = v 2, VarY ) = v2 2 v, v 2 0). For normalization reasons, let v 2 + v2 2 =. Then VarX σ ) = v 2 + σ 2, VarY σ ) = v 2 2 + σ 2, VarX σ + Y σ ) = σ 2 2, where σ 2 = + 2σ 2. As before, we assume that both X and Y have mean zero, although this will not be important for the final conclusion. Again, we start with the hypothesis DX σ + Y σ ) 2 < 2 and apply Pinsker s inequality: F σ G σ Φ σ2 2 F σ G σ Φ σ2 TV <. For 0 < σ, write F σ G σ = F G Φ σ 2 ). Now, the ratio of variances of the components v in the convolution, 2, may not be bounded away from zero, since v +2σ 2 is allowed to be small. Hence, the application of Theorem 2.3 will only give F Φ v Cmv,) and similarly for v log G. The appearance of v on the right is however not desirable. So, it is better to involve the Lévy distance, which is more appropriate in such a situation. Consider the random variables X = X + 2σ 2, Y = Y + σ 2Z + 2σ 2, so that VarX + Y ) =, and denote by F, G their distribution functions. Since the Kolmogorov distance does not change after rescaling of the coordinates, we still have LF G, Φ) F G Φ = F σ G σ Φ σ2 <. In this situation, we may apply Proposition 8. to the couple F, G ). 
It gives that

L(F₁, Φ_{v₁(σ)}) ≤ C (log log(4/ε))² (log(1/ε))^{−1/2}

with some absolute constant C, where v₁(σ)² = Var(X₁) = v₁²/(1 + 2σ²). Since v₁(σ) ≤ v₁ ≤ √3 v₁(σ), we have a similar conclusion about the original distribution functions, i.e. L(F, Φ_{v₁}) ≤

C (log log(4/ε))² (log(1/ε))^{−1/2}.

Now we use Proposition A.2.3 (applied when one of the distributions is normal), which for σ ≤ 1 gives

Δ(X_σ) ≤ (3/(2σ²)) L(F, Φ_{v₁}),

and similarly for Y. Hence,

Δ(X_σ) ≤ (C/σ²) (log log(4/ε))² (log(1/ε))^{−1/2},  Δ(Y_σ) ≤ (C/σ²) (log log(4/ε))² (log(1/ε))^{−1/2}.   (11.1)

We are now in a position to apply Proposition 9.2 to the random variables X̄_σ = X_σ/√(1 + 2σ²), Ȳ_σ = Y_σ/√(1 + 2σ²), which ensures that with some absolute constant c > 0

c (v₁(σ)² D(X̄_σ) + v₂(σ)² D(Ȳ_σ)) ≤ D(X_σ + Y_σ) + Δ(X̄_σ) log^{3/2}(2 + 1/(v₁(σ) Δ(X̄_σ))) + Δ(Ȳ_σ) log^{3/2}(2 + 1/(v₂(σ) Δ(Ȳ_σ))),

where v₁(σ)² = Var(X̄_σ) = (v₁² + σ²)/(1 + 2σ²) and v₂(σ)² = Var(Ȳ_σ) = (v₂² + σ²)/(1 + 2σ²). Note that v₁(σ), v₂(σ) ≥ σ/√3. Applying the bounds in (11.1), we obtain that

c (v₁(σ)² D(X̄_σ) + v₂(σ)² D(Ȳ_σ)) ≤ D(X_σ + Y_σ) + (C (log log(4/ε))²)/(σ² √(log(1/ε))) · log^{3/2}(2 + σ √(log(1/ε))/(log log(4/ε))²)

with some other absolute constants c, C > 0. Here, D(X_σ + Y_σ) ≤ 2ε², which is dominated by the last expression, and we arrive at:

Proposition 11.1. Let X, Y be independent random variables with Var(X+Y) = 1. Given 0 < σ ≤ 1, if the regularized random variables X_σ, Y_σ satisfy D(X_σ + Y_σ) ≤ 2ε² < 2, then with some absolute constant C

Var(X_σ) D(X_σ) + Var(Y_σ) D(Y_σ) ≤ (C (log log(4/ε))²)/(σ² √(log(1/ε))) · log^{3/2}(2 + σ √(log(1/ε))/(log log(4/ε))²).   (11.2)

It remains to solve this inequality with respect to ε. Denote by D the left-hand side of (11.2) and let D₁ = σ² D. Assuming that D(X_σ + Y_σ) ≤ 2ε² < 2 and arguing as in the proof of Proposition 10.2, we get

σ √(log(1/ε)) ≤ (8C/D₁) log^{3/2}(2 + C/D₁) (log log(4/ε))²,

hence log(1/ε) ≤ (C/D₁²) (log log(4/ε))⁴ log³(2 + 1/D₁) with some absolute constant C. The latter inequality implies, with some absolute constant C,

log(1/ε) ≤ (C/D₁²) log⁷(2 + 1/D₁),

and we arrive at the inequality of Theorem 1.1 (which holds automatically if D(X_σ + Y_σ) ≥ 1).

12. Appendix I: General bounds for distances between distribution functions

Here we collect a few elementary and basically known relations for the classical metrics introduced at the beginning of Section 2.
Let F and G be arbitrary distribution functions of some random variables X and Y. First of all, the Lévy, Kolmogorov, and total variation distances are connected by the chain of inequalities

0 ≤ L(F, G) ≤ ‖F − G‖ ≤ (1/2) ‖F − G‖_TV.

As for the Kantorovich–Rubinshtein distance, there is the following well-known bound.

Proposition A.1.1. We have L(F, G) ≤ W(F, G)^{1/2}.
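The chain L(F, G) ≤ ‖F − G‖ ≤ (1/2)‖F − G‖_TV can be checked on two explicit discrete distributions; the Lévy distance is estimated here by a grid search (our example):

```python
# two discrete distributions on the real line
xs_f, ps_f = [0.0, 1.0], [0.5, 0.5]
xs_g, ps_g = [0.0, 1.0, 2.0], [0.25, 0.5, 0.25]

def cdf(atoms, probs, x):
    """Right-continuous CDF of a discrete distribution."""
    return sum(p for a, p in zip(atoms, probs) if a <= x)

grid = [i * 0.001 - 1.0 for i in range(4001)]  # points in [-1, 3]

kolmogorov = max(abs(cdf(xs_f, ps_f, x) - cdf(xs_g, ps_g, x)) for x in grid)
tv = abs(0.5 - 0.25) + abs(0.5 - 0.5) + abs(0.0 - 0.25)  # total mass of |F - G|

def levy_ok(eps):
    """Levy condition: F(x - eps) - eps <= G(x) <= F(x + eps) + eps for all x."""
    return all(cdf(xs_f, ps_f, x - eps) - eps <= cdf(xs_g, ps_g, x)
               <= cdf(xs_f, ps_f, x + eps) + eps for x in grid)

# smallest eps on a coarse scan approximates the Levy distance from above
levy = next(e * 0.001 for e in range(1, 1001) if levy_ok(e * 0.001))

assert levy <= kolmogorov + 0.002      # L(F, G) <= ||F - G||  (up to grid error)
assert kolmogorov <= 0.5 * tv + 1e-9   # ||F - G|| <= (1/2) ||F - G||_TV
```

For this pair all three quantities come out close to 0.25, so both links of the chain are nearly tight.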

Stability in Cramer s theorem 9 We also have: Proposition A..2. If x2 df x) B 2 and x2 dgx) B 2 B 0), then a) W F, G) 2LF, G) + 4B LF, G) /2 ; b) W F, G) 4B F G /2. 3. Appendix II: Relations for distances between regularized distributions Now, let us turn to the regularized random variables X σ = X + σz, Y σ = Y + σz, where σ > 0 is a fixed parameter and Z N0, ) is a standard normal random variable independent of X and Y. They have distribution functions F σ and G σ with densities p σ x) = q σ x) = These identities easily imply: ϕ σ x y) df y) = σ 2 F x y) y ϕ σ y) dy, ϕ σ x y) dgy) = σ 2 Gx y) y ϕ σ y) dy. Proposition A.2.. a) sup x p σ x) q σ x) σ F G ; b) F σ G σ TV σ W F, G). Thus, if F is close to G in a weak sense, then the regularized distributions will be closed in a stronger sense, at least when σ is not very small. One may replace W in part b) and the Kolmogorov distance in part a) with other metrics: Proposition A.2.2. If x2 df x) B 2 and x2 dgx) B 2 B 0), then a) F σ G σ TV 2 σ [ LF, G) + 2B LF, G) /2 ] ; b) F σ G σ TV 4B σ F G /2. Proposition A.2.3. sup x p σ x) q σ x) LF,G) σ + 2σ ). 4. Appendix III: Special bounds for entropic distance to the normal Let X be a random variable with mean zero and variance VarX) = v 2 v > 0) and with a bounded density p. In this section we formulate bounds for the entropic distance DX) in terms of the quadratic tail function δ X T ) = { x T } x2 px) dx and another quantity, which is directly responsible for the closeness to the normal law, X) = ess sup x px) ϕ v x)). As before, ϕ v stands for the density of a normal random variable Z N0, v 2 ), and we write ϕ in the standard case v =. The functional X) is homogeneous with respect to X with power of homogeneity in the sense that λx) = X)/λ λ > 0). Hence, the functional VarX) X) is invariant under rescaling of the coordinates. To relate DX) and X), write px) ϕ v x) + v +, so px) v 2π + v 2π. This gives: 2π

Proposition A.3.1. Let X be a random variable with mean zero and variance Var(X) = v² (v > 0), having a bounded density. Then

D(X) ≤ log(1 + v Δ(X) √(2π)) + 1/2.

This estimate cannot, however, be used to see that X is almost normal. So, we need to refine Proposition A.3.1 for the case where Δ(X) is small.

Proposition A.3.2. Let X be a random variable with mean zero and variance Var(X) = 1, having a bounded density. For all T ≥ 0,

D(X) ≤ Δ(X) [√(2π) + 2T + 2T log(1 + Δ(X) √(2π) e^{T²/2})] + (1/2) δ_X(T).

Hence, if Δ(X) is small and T is large, but not too large, the right-hand side can be made small. When Δ(X) ≤ 1/2, one may take T = √(2 log(1/Δ(X))), which leads to the estimate

D(X) ≤ C Δ(X) √(log(1/Δ(X))) + (1/2) δ_X(T),

where C is an absolute constant. If X satisfies the tail condition P{|X| ≥ t} ≤ A e^{−t²/2} (t > 0), we have δ_X(T) ≤ cA (1 + T²) e^{−T²/2}, and then

D(X) ≤ C_A Δ(X) log(1/Δ(X)).

References

[B-C-G] Bobkov, S. G., Chistyakov, G. P., Götze, F. Entropic instability of Cramér's characterization of the normal law. Selected works of Willem van Zwet, 231–242, Sel. Works Probab. Stat., Springer, New York, 2012.

[B-C-G2] Bobkov, S. G., Chistyakov, G. P., Götze, F. Stability problems in Cramér-type characterization in case of i.i.d. summands. Theory Probab. Appl. 57 (2012), no. 4, 701–723.

[B-C-G3] Bobkov, S. G., Chistyakov, G. P., Götze, F. Regularized distributions and entropic stability of Cramér's characterization of the normal law. Extended version: arXiv:1504.02961v1 [math.PR], 12 Apr 2015.

[B-G-R-S] Bobkov, S. G., Gozlan, N., Roberto, C., Samson, P.-M. Bounds on the deficit in the logarithmic Sobolev inequality. J. Funct. Anal. 267 (2014), no. 11, 4110–4138.

[C-S] Carlen, E. A., Soffer, A. Entropy production by block variable summation and central limit theorems. Comm. Math. Phys. 140 (1991), no. 2, 339–371.

[C] Chistyakov, G. P. The sharpness of the estimates in theorems on the stability of decompositions of a normal distribution and a Poisson distribution. (Russian) Teor. Funkcii Funkcional. Anal. i Prilozhen. Vyp.
26 (1976), 119–128, iii.

[C-G] Chistyakov, G. P., Golinskii, L. B. Order-sharp estimates for the stability of decompositions of the normal distribution in the Lévy metric. (Russian) Stability problems for stochastic models (Moscow, 1991), 116–140, Vsesoyuz. Nauchno-Issled. Inst. Sistem. Issled., Moscow, 1991; translated in J. Math. Sci. 72 (1994), no. 1, 2848–2871.

[Cr] Cramér, H. Ueber eine Eigenschaft der normalen Verteilungsfunktion. Math. Zeitschrift, Bd. 41 (1936), 405–414.

[D-C-T] Dembo, A., Cover, T. M., Thomas, J. A. Information-theoretic inequalities. IEEE Trans. Inform. Theory 37 (1991), no. 6, 1501–1518.

[F-M-P] Figalli, A., Maggi, F., Pratelli, A. A refined Brunn–Minkowski inequality for convex sets. Ann. Inst. H. Poincaré Anal. Non Linéaire 26 (2009), no. 6, 2511–2519.

[F-M-P2] Figalli, A., Maggi, F., Pratelli, A. Sharp stability theorems for the anisotropic Sobolev and log-Sobolev inequalities on functions of bounded variation. Adv. Math. 242 (2013), 80–101.

[Le] Ledoux, M. Concentration of measure and logarithmic Sobolev inequalities. Séminaire de Probabilités, XXXIII, 120–216, Lecture Notes in Math., 1709, Springer, Berlin, 1999.

[L] Linnik, Yu. V. A remark on Cramér's theorem on the decomposition of the normal law. (Russian) Teor. Veroyatnost. i Primenen. 1 (1956), 479–480.

[L2] Linnik, Yu. V. General theorems on the factorization of infinitely divisible laws. III. Sufficient conditions (countable bounded Poisson spectrum; unbounded spectrum; stability). Theor. Probability Appl. 4 (1959), 142–163.

[L-O] Linnik, Yu. V., Ostrovskii, I. V. Decompositions of random variables and random vectors. (Russian) Izd. Nauka, Moscow, 1972, 479 pp. Translated from the Russian: Translations of Mathematical Monographs, Vol. 48. American Math. Society, Providence, R. I., 1977, ix+380 pp.

[M] Maloshevskii, S. G. Unimprovability of N. A. Sapogov's result in the stability problem of H. Cramér's theorem. (Russian) Teor. Verojatnost. i Primenen. 13 (1968), 522–525.

[MK] McKean, H. P., Jr. Speed of approach to equilibrium for Kac's caricature of a Maxwellian gas. Arch. Rational Mech. Anal. 21 (1966), 343–367.

[S] Sapogov, N. A. The stability problem for a theorem of Cramér. (Russian) Izvestiya Akad. Nauk SSSR. Ser. Mat. 15 (1951), 205–218.

[S2] Sapogov, N. A. The stability problem for a theorem of Cramér. (Russian) Izvestiya Akad. Nauk SSSR. Ser. Mat. 15 (1951), 205–218.

[S3] Sapogov, N. A. The problem of stability for a theorem of Cramér. (Russian) Vestnik Leningrad. Univ. 10 (1955), no. 11, 61–64.

[Seg] Segal, A.
Remark on stability of Brunn–Minkowski and isoperimetric inequalities for convex bodies. In: Geometric Aspects of Functional Analysis, Lecture Notes in Mathematics 2050 (2012), 381–391.

[Se] Senatov, V. V. Refinement of estimates of stability for a theorem of H. Cramér. (Russian) Continuity and stability in problems of probability theory and mathematical statistics. Zap. Naucn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) 61 (1976), 125–134.

[Sh] Shiganov, I. S. Some estimates connected with the stability of H. Cramér's theorem. (Russian) Studies in mathematical statistics, 3. Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) 87 (1979), 187–195.

[Sh2] Shiganov, I. S. On stability estimates of Cramér's theorem. Stability problems for stochastic models (Varna, 1985), 178–181, Lecture Notes in Math., 1233, Springer, Berlin, 1987.

[Z] Zolotarev, V. M. On the problem of the stability of the decomposition of the normal law into components. (Russian) Teor. Verojatnost. i Primenen. 13 (1968), 738–742.

[Z2] Zolotarev, V. M. Estimates of the difference between distributions in the Lévy metric. (Russian) Collection of articles dedicated to Academician Ivan Matveevich Vinogradov on his eightieth birthday, I. Trudy Mat. Inst. Steklov. 112 (1971), 224–231, 388.

Sergey G. Bobkov
School of Mathematics, University of Minnesota
127 Vincent Hall, 206 Church St. S.E., Minneapolis, MN 55455 USA

E-mail address: bobkov@math.umn.edu

Gennadiy P. Chistyakov
Fakultät für Mathematik, Universität Bielefeld
Postfach 100131, 33501 Bielefeld, Germany
E-mail address: chistyak@math.uni-bielefeld.de

Friedrich Götze
Fakultät für Mathematik, Universität Bielefeld
Postfach 100131, 33501 Bielefeld, Germany
E-mail address: goetze@mathematik.uni-bielefeld.de