A square bias transformation: properties and applications

Irina Shevtsova

arXiv:11.6775v4 [math.PR] 13 Dec 2013

Abstract

The properties of the square bias transformation are studied; in particular, a precise moment-type estimate for the L_1-metric between the transformed and the original distributions is proved, and a relation between their characteristic functions is found. As a corollary, some new moment-type estimates for the proximity of an arbitrary characteristic function with zero mean and finite third moment to the normal one with zero mean and the same variance are proved, involving double integrals of the square- and zero-bias transformations.

Key words and phrases: probability transformation, zero bias transformation, size bias transformation, square bias transformation, characteristic function, L_1-metric

AMS 2010 Mathematics Subject Classification: 60E10, 60E15

1 Introduction

Let X be a random variable (r.v.) with the distribution function (d.f.) F(x) = P(X < x), x ∈ R, and the characteristic function (ch.f.) f(t) = Ee^{itX} = ∫ e^{itx} dF(x), t ∈ R, which is the Fourier–Stieltjes transform of the d.f. F(x). As is well known, if X is nonnegative with 0 < EX < ∞, then

    f'(t)/f'(0),  t ∈ R,    (1)

is a ch.f., and if 0 < EX^2 < ∞, then

    f''(t)/f''(0),    (f'(t) − f'(0)) / (t f''(0))

Research supported by the Russian Foundation for Basic Research (projects 11-1-515a, 11-7-11a, 11-1-16-ofi-m) and by the grant of the President of Russia (MK 56.1.1). Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics, Leninskie Gory, GSP-1, Moscow, 119991, Russia; Institute for Informatics Problems of the Russian Academy of Sciences; e-mail: ishevtsova@cs.msu.su
are ch.f.'s as well (see, e.g., [15, Theorem 12.2.5]). The probability transformation given by (1) is called the X-size bias transformation. By a transformation of a random variable we mean that of its distribution. The X-size bias transformation was introduced by Goldstein and Rinott [7] for the purpose of estimating the accuracy of the multivariate normal approximation to nonnegative random vectors under conditions of local dependence by Stein's method. Namely, in [7] an almost surely nonnegative r.v. X^s, where 0 < EX < ∞, is said to have the X-size biased distribution if

    dP(X^s < x) = x dF(x) / EX,  x ∈ R.    (2)

It is easy to see that the distribution given by (2) has the ch.f. given by (1); hence, by virtue of the uniqueness theorem, definitions (1) and (2) are equivalent. In the same paper Goldstein and Rinott also noticed that the distribution of X^s may be characterized by the relation E X G(X) = EX · E G(X^s), which should hold for all functions G such that E|X G(X)| < ∞.

As regards the second transformation, if EX = 0, then the distribution given by the ch.f.

    (f'(t) − f'(0)) / (t f''(0)) = − f'(t) / (σ^2 t),  t ≠ 0,

where σ^2 = EX^2 > 0, is called the X-zero biased distribution. This definition was introduced by Goldstein and Reinert in [5] in an equivalent form, for the purpose of generalization of the size bias transformation to r.v.'s taking both positive and negative values, and was inspired by the characteristic property of the mean zero normal distribution as the unique fixed point of the zero bias transformation. Namely, in [5] a r.v. X^(z) is said to have the X-zero biased distribution if EX = 0 and

    E X G(X) = σ^2 E G'(X^(z))

for all absolutely continuous functions G for which E X G(X) exists. The zero bias transformation possesses the following elementary properties (most of them are noticed/proved in [5]):

1. The zero biased distribution is absolutely continuous and unimodal about zero with the probability density function p^(z)(x) = σ^{−2} E X I(X > x), x ∈ R, and the ch.f. Ee^{itX^(z)} = − f'(t) / (σ^2 t), t ≠ 0.
2. X^(z) =d X if and only if X has the normal distribution with zero mean [21, 5].

3. The zero bias transformation preserves symmetry.

4. σ^2 E(X^(z))^n = EX^{n+2} / (n+1) for n ∈ N; in particular, σ^2 EX^(z) = 0.5 EX^3.

5. If X = Y_1 + ... + Y_n, where Y_1, ..., Y_n are independent r.v.'s with zero means and EY_j^2 = σ_j^2 > 0 so that σ_1^2 + ... + σ_n^2 = σ^2, then X^(z) = X_I + Y_I^(z), where I is a random index independent of Y_1, ..., Y_n with the distribution P(I = i) = σ_i^2 / σ^2, i = 1, ..., n, and X_i = X − Y_i = Σ_{j≠i} Y_j.
6. The following non-trivial estimate was proved in 2009 independently by Goldstein [4] and Tyurin [23, 22]:

    L_1(X, X^(z)) ≤ E|X|^3 / (2σ^2),    (3)

L_1(X, Y) being the L_1-distance between the r.v.'s X and Y,

    L_1(X, Y) = inf { E|X' − Y'| : X' =d X, Y' =d Y },  E|X| < ∞, E|Y| < ∞.

As regards the third transformation, given by the characteristic function

    f''(t) / f''(0) = − f''(t) / σ^2,  t ∈ R,    (4)

where σ^2 = EX^2 ∈ (0, ∞), it is called the X-square bias transformation. It is easy to see that a r.v. X^□ has the ch.f. (4) if and only if

    E X^2 G(X) = σ^2 E G(X^□)    (5)

for all functions G such that E|X^2 G(X)| < ∞. In 2007 L. Goldstein [3] called the distribution of a r.v. X^□ satisfying (5) the X-square biased distribution. In 2011 L. Chen, L. Goldstein and Q.-M. Shao [1, Proposition 2.3] proved the following relation between the zero biased X^(z) and square biased X^□ distributions of a symmetric r.v. X:

    X^(z) =d U X^□,

where the r.v.'s U, X^□ are independent, U having the uniform distribution on [−1, 1]. Taking into account that the square bias transformation preserves symmetry (see below), the latter relation agrees with the symmetry of the zero biased distribution. Later in [19] it was noticed that, for a nonnegative r.v. X, X^□ =d (X^s)^s, and the distribution of X^□ obtained a second name: the X-double size bias distribution. In the same paper the following characterization of the modulus of a normally distributed r.v. was proved: X =d U_1 X^□, where U_1 has the uniform distribution on [0, 1] and is independent of X^□, if and only if X =d |Z|, where Z has the standard normal distribution.

It is easy to see that the square bias transformation possesses the following elementary properties.

1. A r.v. X^□ has the X-square biased distribution if and only if its d.f. F^□(x) satisfies

    dF^□(x) = x^2 dF(x) / σ^2,  x ∈ R.    (6)

2. X^□ =d X if and only if P(|X| = σ) = 1, i.e. any Bernoulli-type distribution with symmetric atoms ±σ is a fixed point of the square bias transformation.
This can be verified by noticing that the solution of the corresponding linear homogeneous differential equation of the second order f''(t) + σ^2 f(t) = 0 with the initial condition f(0) = 1 has the form f(t) = p e^{iσt} + (1 − p) e^{−iσt}, p ∈ R, being a ch.f. if and only if p ∈ [0, 1].

3. The square bias transformation preserves symmetry. Indeed, if the r.v. X has a symmetric distribution, then its ch.f. f(t) is even, and hence,

    f^□(−t) = f''(−t)/f''(0) = f''(t)/f''(0) = f^□(t),

i.e. the distribution of X^□ is symmetric as well.
4. (X^□)^2 =d (X^2)^s, where (X^2)^s has the X^2-size biased distribution.

5. (cX)^□ = c X^□ for any constant c ∈ R.

6. σ^2 E(X^□)^n = EX^{n+2}, n ∈ N, and σ^2 E|X^□|^r = E|X|^{r+2}, r > 0; in particular, σ^2 EX^□ = EX^3, σ^2 E|X^□| = E|X|^3.

Moreover, the following estimate for the L_1-distance between the distributions of X and X^□ will be proved in this paper.

Theorem 1. If EX = 0, EX^2 = 1, and E|X|^3 < ∞, then

    L_1(X, X^□) ≤ E|X|^3;

moreover, for any ε > 0 there exists a distribution of a r.v. X concentrated in two points such that EX = 0, EX^2 = 1, E|X|^3 < ∞, and

    L_1(X, X^□) > (1 − ε) E|X|^3.

The existence of the square bias transformation follows from the earlier result of [15] mentioned above. Moreover, in 2005 Goldstein and Reinert [6] proved the existence of a class of transformations of probability distributions that are characterized by equations like (5). Namely, the authors described a class of measurable functions T: R → R that provide the existence and uniqueness of the distribution of a random variable X^(T) such that

    E T(X) G(X) = (E X^m T(X) / m!) · E G^(m)(X^(T))

for all m times differentiable functions G: R → R with E|T(X) G(X)| < ∞. The authors of [6] also noticed that this class includes the zero- and size-bias transformations (respectively, with m = 1, T(x) = x and m = 0, T(x) = x_+). L. Goldstein [3] noticed that this class also includes the square bias transformation (with m = 0, T(x) = x^2). However, up till now the properties of the square bias transformation have not been studied; in particular, the characteristic function of the square biased distribution and the estimate for L_1(X, X^□) are established in this paper for the first time.

2 Motivation and applications

The zero bias transformation gives an opportunity to construct an integral estimate for the proximity of a ch.f. with zero mean to the normal one with the same variance in terms of the proximity of the corresponding zero biased distribution to the original one, which might be sharper than non-integral estimates based on the Taylor formula in the neighborhood of zero.
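Property 6 and the bound of Theorem 1 can be illustrated numerically. The sketch below is plain Python and is not part of the paper; a distribution is represented as a list of (atom, probability) pairs, and the hypothetical helpers square_bias and l1_discrete implement (6) and the representation of the L_1-distance as ∫|F(u) − F^□(u)| du used later in the proof.

```python
import math

def square_bias(atoms):
    # (6): dF#(x) = x^2 dF(x) / sigma^2, applied to a discrete law
    s2 = sum(p * x * x for x, p in atoms)
    return [(x, p * x * x / s2) for x, p in atoms]

def l1_discrete(law1, law2):
    # L1-distance between discrete laws as the integral of |F1(u) - F2(u)|
    xs = sorted({x for x, _ in law1} | {x for x, _ in law2})
    F1 = F2 = total = 0.0
    for a, b in zip(xs, xs[1:]):
        F1 += sum(p for x, p in law1 if x == a)
        F2 += sum(p for x, p in law2 if x == a)
        total += abs(F1 - F2) * (b - a)
    return total

# two-point law with EX = 0, EX^2 = 1 (small p makes the bound nearly sharp)
p = 0.01
q = 1.0 - p
X = [(math.sqrt(q / p), p), (-math.sqrt(p / q), q)]
Xsq = square_bias(X)

beta3 = sum(pr * abs(x) ** 3 for x, pr in X)
# property 6 with sigma^2 = 1: E X# = E X^3 and E|X#| = E|X|^3
assert abs(sum(pr * x for x, pr in Xsq) - sum(pr * x ** 3 for x, pr in X)) < 1e-12
assert abs(sum(pr * abs(x) for x, pr in Xsq) - beta3) < 1e-12
# Theorem 1: L1(X, X#) <= E|X|^3, and the ratio tends to 1 as p -> 0
L1 = l1_discrete(X, Xsq)
assert L1 <= beta3 + 1e-12
assert L1 / beta3 > 0.999
```

The two-point law chosen here is exactly the near-extremal example of Theorem 1: as p → 0 the ratio L_1(X, X^□)/E|X|^3 approaches 1.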
Namely, for the sake of convenience put σ^2 = 1, implying β_3 ≥ 1 by the Lyapounov inequality. Then using the elementary relations

    f(t) − e^{−t^2/2} = e^{−t^2/2} ( f(t) e^{t^2/2} − 1 ) = e^{−t^2/2} ∫_0^t ( f'(u) + u f(u) ) e^{u^2/2} du =

    = e^{−t^2/2} ∫_0^t ( Ee^{iuX} − Ee^{iuX^(z)} ) u e^{u^2/2} du,    (7)
and the estimate for the difference of arbitrary ch.f.'s with finite first moments due to Korolev and Shevtsova [10]:

    |Ee^{itX} − Ee^{itY}| ≤ 2 sin( (|t| L_1(X, Y) / 2) ∧ (π/2) ),  E|X|, E|Y| < ∞, t ∈ R,    (8)

where a ∧ b := min{a, b}, a, b ∈ R, it is not difficult to conclude that

    r(t) := |f(t) − e^{−t^2/2}| ≤ e^{−t^2/2} ∫_0^{|t|} |Ee^{iuX} − Ee^{iuX^(z)}| u e^{u^2/2} du ≤

    ≤ e^{−t^2/2} ∫_0^{|t|} 2 sin( (u L_1(X, X^(z)) / 2) ∧ (π/2) ) u e^{u^2/2} du,  t ∈ R.

Finally, applying inequality (3) to estimate L_1(X, X^(z)), one obtains

    r(t) ≤ e^{−t^2/2} ∫_0^{|t|} 2 sin( (β_3 u / 4) ∧ (π/2) ) u e^{u^2/2} du,  t ∈ R,    (9)

for any r.v. X with EX = 0, EX^2 = 1, E|X|^3 = β_3 < ∞. Estimate (9) is exact as t → 0, since, as is well known, r(t) ≤ β_3 |t|^3 / 6, and (9) implies that for all β_3 ≥ 1 and t such that β_3 |t| ≤ 2π we have

    r(t) ≤ ∫_0^{|t|} 2u sin( β_3 u / 4 ) du = (32/β_3^2) sin( β_3 |t| / 4 ) − (8|t|/β_3) cos( β_3 |t| / 4 ) < β_3 |t|^3 / 6,

with the least possible factor 1/6. However, estimate (9) is always sharper than the power-type estimate r(t) ≤ β_3 |t|^3 / 6, especially for moderate (separated from zero) values of β_3 |t|. Note that β_3 |t| can be separated from zero for large enough values of β_3 ≥ 1 even if t is small. Thus, estimates for r(t) of the integral form (9) play an important role in the construction of the least possible upper moment-type bounds for the accuracy of the normal approximation which should be uniform in some classes of distributions, especially if in these classes extremal distributions have large third absolute moments. This situation is typical, for example, for the problem of optimization of the absolute constants in the Berry–Esseen-type inequalities with an improved structure (see [11, 12, 10, 14, 20], where a smoothing inequality is applied with the subsequent estimation of the difference |f_n(t) − e^{−t^2/2}|, f_n(t) being the ch.f. of the normalized sum of independent random variables, in terms of the difference |f(t) − e^{−t^2/2}|, f(t) being the ch.f. of a single r.v.) and in its non-uniform analogues for sums of independent r.v.'s that use the Berry–Esseen inequality with an improved structure (see [2, 18, 8]), as well as in the moment-type estimates of the rate of convergence in limit theorems for compound and mixed compound Poisson distributions (where β_3 → ∞, see [13, 10, 17]) which use the Berry–Esseen inequality with an improved structure as well.

The above reasoning suggests that for moderate values of β_3 |t|, estimates for r(t) in the twice-integrated form might be even sharper than estimates in the once-integrated form like (9). Since the ch.f. f(t) is supposed to be differentiable at least twice, it is possible to continue (7) as

    f(t) − e^{−t^2/2} = e^{−t^2/2} ∫_0^t e^{u^2/2} ∫_0^u ( f'(s) + s f(s) )' ds du = e^{−t^2/2} ∫_0^t e^{u^2/2} ∫_0^u ( f''(s) + f(s) + s f'(s) ) ds du =

    = e^{−t^2/2} ∫_0^t e^{u^2/2} ∫_0^u ( Ee^{isX} − Ee^{isX^□} + s f'(s) ) ds du,
or as

    f(t) − e^{−t^2/2} = e^{−t^2/2} ∫_0^t ∫_0^u ( f(s) e^{s^2/2} )'' ds du =

    = e^{−t^2/2} ∫_0^t ∫_0^u ( f''(s) + f(s) + s f'(s) + s ( f'(s) + s f(s) ) ) e^{s^2/2} ds du =

    = e^{−t^2/2} ∫_0^t ∫_0^u ( Ee^{isX} − Ee^{isX^□} + s f'(s) + s^2 ( Ee^{isX} − Ee^{isX^(z)} ) ) e^{s^2/2} ds du.

Note that the second representation contains the additional term s^2 (Ee^{isX} − Ee^{isX^(z)}), but the factor e^{s^2/2} does not exceed the analogous factor e^{u^2/2} in the first one. However, for all |s| ≤ |t| this additional term satisfies

    g_3(s) := s^2 |Ee^{isX} − Ee^{isX^(z)}| ≤ 2 s^2 ≤ 2 t^2 = O(t^2),  t → 0

(actually, an even sharper estimate can be obtained if inequalities (8) and (3) are used). If EX = 0, EX^2 = 1, then for all |s| ≤ |t| we have

    g_2(s) := |s f'(s)| = |s| · |EXe^{isX}| = |s| · |EX(e^{isX} − 1)| ≤ s^2 EX^2 = s^2 ≤ t^2 = O(t^2),  t → 0,

so that

    sup_{|s| ≤ |t|} ( g_2(s) + g_3(s) ) = O(t^2),  t → 0,

while the first term

    g_1(s) := |Ee^{isX} − Ee^{isX^□}| = |f(s) + f''(s)|

should be equivalent to β_3 |s| as s → 0+ in order that the final integrated estimate have the exact order β_3 |t|^3 / 6 as t → 0. Thus, it is g_1(s) that determines the behavior of the final integral estimate for small values of s, and the problem of construction of the least possible bound for g_1(s) is very important. Theorem 1 gives an opportunity to construct such a bound. Namely, the following corollaries hold.

Corollary 1. Let X be a r.v. with the ch.f. f(t) and EX = 0, EX^2 = 1, β_3 := E|X|^3 < ∞. Then for all t ∈ R

    |Ee^{itX} − Ee^{itX^□}| = |f(t) + f''(t)| ≤ 2 sin( (β_3 |t| / 2) ∧ (π/2) ).

Corollary 2. Let X be a r.v. with the ch.f. f(t) and EX = 0, EX^2 = 1, β_3 := E|X|^3 < ∞. Then for all t ∈ R

    |f(t) − e^{−t^2/2}| ≤ e^{−t^2/2} ∫_0^{|t|} ( ∫_0^u 2 sin( (β_3 s / 2) ∧ (π/2) ) ds + u^3/3 ) e^{u^2/2} du

and

    |f(t) − e^{−t^2/2}| ≤ e^{−t^2/2} ∫_0^{|t|} ∫_0^u ( 2 sin( (β_3 s / 2) ∧ (π/2) ) + 2 s^2 sin( (β_3 s / 4) ∧ (π/2) ) + s^2 ) e^{s^2/2} ds du.
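Corollary 1 admits a direct numerical sanity check: for a discrete distribution both f(t) and f''(t) are finite sums, so |f(t) + f''(t)| can be compared with the right-hand side on a grid of t. A minimal sketch in plain Python (the two-atom test law with EX = 0, EX^2 = 1 below is an arbitrary choice, not taken from the paper):

```python
import cmath
import math

def f_and_f2(atoms, t):
    # ch.f. f(t) = sum p*exp(itx) and its second derivative f''(t) = -sum p*x^2*exp(itx)
    f = sum(p * cmath.exp(1j * t * x) for x, p in atoms)
    f2 = -sum(p * x * x * cmath.exp(1j * t * x) for x, p in atoms)
    return f, f2

# arbitrary two-atom test law with EX = 0, EX^2 = 1
atoms = [(-math.sqrt(2.0), 1.0 / 3.0), (1.0 / math.sqrt(2.0), 2.0 / 3.0)]
beta3 = sum(p * abs(x) ** 3 for x, p in atoms)

# Corollary 1: |f(t) + f''(t)| <= 2 sin(min(beta3*|t|/2, pi/2))
worst = 0.0
for k in range(1, 2001):
    t = 0.01 * k
    f, f2 = f_and_f2(atoms, t)
    bound = 2.0 * math.sin(min(beta3 * t / 2.0, math.pi / 2.0))
    worst = max(worst, abs(f + f2) - bound)
assert worst <= 1e-9
```

For this law f(t) + f''(t) = E(1 − X^2)e^{itX} reduces to (1/3)(e^{it/√2} − e^{−i√2 t}), so the left-hand side stays well below the bound for all t on the grid.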
Note that, as t → 0+, the right-hand sides of the inequalities presented in Corollary 2 are equivalent to

    ∫_0^t ∫_0^u 2 sin( β_3 s / 2 ) ds du = (8/β_3^2) ( β_3 t / 2 − sin( β_3 t / 2 ) ) <

    < (8/β_3^2) ( 4 sin( β_3 t / 4 ) − β_3 t cos( β_3 t / 4 ) ) = ∫_0^t 2u sin( β_3 u / 4 ) du,

provided that 0 < β_3 t ≤ 2π. Thus, the estimates including the square bias transformation which are presented in Corollary 2 in the twice-integrated form are sharper, as t → 0, than the estimates in the once-integrated form which include the zero bias transformation only. So, Corollary 2 plays an important role in the estimation of the rate of convergence in limit theorems for sums of independent random variables mentioned above. However, particular applications of Corollary 2 are the subject of a separate investigation and will be published elsewhere.

3 Proof of Theorem 1

As is known (see, e.g., [24, Theorem 1.3.1]), the L_1-metric can be represented in terms of the ζ_1-metric as

    L_1(X, Y) = ζ_1(X, Y) := sup { |Eg(X) − Eg(Y)| : g ∈ F_1 },  E|X| < ∞, E|Y| < ∞,

where F_1 is the set of all real-valued functions g on R such that sup_{x ≠ y} |g(x) − g(y)| / |x − y| ≤ 1. Since for any function g ∈ F_1 we also have (−g) ∈ F_1, we conclude that the modulus in the definition of ζ_1(X, Y) can be omitted:

    L_1(X, X^□) = ζ_1(X, X^□) = sup { Eg(X) − Eg(X^□) : g ∈ F_1 }.

Let X be a r.v. with the d.f. F(x) and EX = 0, EX^2 = 1, E|X|^3 < ∞, and let X^□ have the X-square biased distribution, i.e. the d.f. F^□(x) of the r.v. X^□ satisfy the relation dF^□(x) = x^2 dF(x), x ∈ R. Then

    L_1(X, X^□) = sup_{g ∈ F_1} ( Eg(X) − Eg(X^□) ) = sup_{g ∈ F_1} ∫ (1 − x^2) g(x) dF(x).

For g ∈ F_1 denote

    J(F, g) := E( (1 − X^2) g(X) − |X|^3 ) = ∫ ( (1 − x^2) g(x) − |x|^3 ) dF(x).

Then

    L_1(X, X^□) − E|X|^3 = sup_{g ∈ F_1} J(F, g),

and the statement of the theorem is equivalent to

    sup_F sup_{g ∈ F_1} J(F, g) = 0,

where the supremum sup_F is taken over all d.f.'s F of the r.v. X satisfying two moment-type conditions: EX = 0, EX^2 = 1. As it follows from the results of [9, 16], the supremum of a linear
(with respect to the d.f. F(x)) functional J(F, g) under two linear equality-type conditions EX = 0, EX^2 = 1 is attained at distributions concentrated in at most three points. Before passing to checking three- and two-point distributions, recall that for the L_1-metric the following representation in terms of the mean metric holds as well (see, e.g., [24, §1.3]):

    L_1(X, Y) = κ(X, Y) := ∫ |P(X < u) − P(Y < u)| du,  E|X| + E|Y| < ∞,

and hence,

    L_1(X, X^□) = ∫ |F(u) − F^□(u)| du = ∫ |F(u) − EX^2 I(X < u)| du.

Let the r.v. X take two values and satisfy the conditions EX = 0, EX^2 = 1. Then its distribution should necessarily have the form

    P( X = √(q/p) ) = p = 1 − P( X = −√(p/q) ),  q := 1 − p ∈ (0, 1).

It is easy to see that

    EX^3 = (q − p) / √(pq),  E|X|^3 = (p^2 + q^2) / √(pq).

Then

    EX^2 I(X < u) = 0 for u ≤ −√(p/q);  = p for −√(p/q) < u ≤ √(q/p);  = 1 for √(q/p) < u,

and hence

    EX^2 I(X < u) − F(u) = (p − q) I( −√(p/q) < u ≤ √(q/p) ),

    L_1(X, X^□) = ∫ |p − q| I( −√(p/q) < u ≤ √(q/p) ) du = |p − q| ( √(p/q) + √(q/p) ) = |p − q| / √(pq) = |EX^3| ≤ E|X|^3

by virtue of the Jensen inequality; thus, the statement of the theorem holds. Moreover, for any ε > 0

    L_1(X, X^□) / E|X|^3 = (1 − 2p) / (1 − 2p + 2p^2) > 1 − ε

for all 0 < p < ε/2.

Now consider a r.v. X taking exactly three values. Note that

    sup { L_1(X, X^□) / E|X|^3 : EX = 0, EX^2 = 1 } = sup { L_1(X, X^□) σ^2 / E|X|^3 : σ > 0, EX = 0, EX^2 = σ^2 },

where the suprema are taken over three-point distributions of the r.v. X. Let X take values x, y, z with probabilities p, q, r > 0 respectively, p + q + r = 1. Without loss of generality one can assume that x < y < z. From the conditions EX = 0, EX^2 = σ^2 we find that

    p = (σ^2 + yz) / ((z − x)(y − x)),  q = −(σ^2 + xz) / ((z − y)(y − x)),  r = (σ^2 + xy) / ((z − x)(z − y)),  −yz < σ^2 < −xz.
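Before proceeding to the three-point case, the two-point computations above and the three-point parametrization of (p, q, r) can be checked numerically. The following plain-Python sketch (not part of the original proof) verifies the closed-form moments, the value of L_1(X, X^□), and the fact that the displayed formulas for p, q, r solve the moment system identically in σ^2:

```python
import math
import random

# Two-point law with EX = 0, EX^2 = 1: P(X = sqrt(q/p)) = p, P(X = -sqrt(p/q)) = q.
p = 0.2
q = 1.0 - p
atoms = [(math.sqrt(q / p), p), (-math.sqrt(p / q), q)]
m3 = sum(pr * x ** 3 for x, pr in atoms)          # E X^3
b3 = sum(pr * abs(x) ** 3 for x, pr in atoms)     # E |X|^3
assert abs(sum(pr * x for x, pr in atoms)) < 1e-12            # E X = 0
assert abs(sum(pr * x * x for x, pr in atoms) - 1.0) < 1e-12  # E X^2 = 1
assert abs(m3 - (q - p) / math.sqrt(p * q)) < 1e-12
assert abs(b3 - (p * p + q * q) / math.sqrt(p * q)) < 1e-12

# L1(X, X#) = |p - q| * (sqrt(p/q) + sqrt(q/p)) = |p - q| / sqrt(pq) = |E X^3|
L1 = abs(p - q) * (math.sqrt(p / q) + math.sqrt(q / p))
assert abs(L1 - abs(m3)) < 1e-12
assert L1 <= b3

# Three-point parametrization: the formulas for p, q, r solve the linear system
# p+q+r = 1, EX = 0, EX^2 = s2 identically in s2 (an actual distribution
# additionally requires all three values to be positive).
random.seed(7)
for _ in range(100):
    x, y, z = sorted(random.uniform(-3.0, 3.0) for _ in range(3))
    if y - x < 1e-3 or z - y < 1e-3:
        continue
    s2 = random.uniform(0.1, 5.0)
    P = (s2 + y * z) / ((z - x) * (y - x))
    Q = -(s2 + x * z) / ((z - y) * (y - x))
    R = (s2 + x * y) / ((z - x) * (z - y))
    assert abs(P + Q + R - 1.0) < 1e-6
    assert abs(P * x + Q * y + R * z) < 1e-6
    assert abs(P * x * x + Q * y * y + R * z * z - s2) < 1e-6
```

The last loop checks only the algebraic identities; positivity of p, q, r is exactly the constraint −yz < σ^2 < −xz (together with σ^2 + xy > 0) discussed in the text.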
For all u ∈ R we have

    EX^2 I(X < u) = 0 for u ≤ x;  = p x^2 for x < u ≤ y;  = p x^2 + q y^2 for y < u ≤ z;  = σ^2 for z < u,

    σ^{−2} EX^2 I(X < u) − F(u) = 0 for u ≤ x;  = p (x^2/σ^2 − 1) for x < u ≤ y;  = (p x^2 + q y^2)/σ^2 − p − q for y < u ≤ z;  = 0 for z < u.

Noticing that (p x^2 + q y^2)/σ^2 − p − q = (σ^2 − r z^2)/σ^2 − 1 + r = r (1 − z^2/σ^2), we obtain

    L_1(X, X^□) = ∫ |F(u) − σ^{−2} EX^2 I(X < u)| du = p |1 − x^2/σ^2| (y − x) + r |1 − z^2/σ^2| (z − y).

Consider the function

    g(x, y, z, σ^2) := L_1(X, X^□) σ^2 − E|X|^3 = p (y − x) |x^2 − σ^2| + r (z − y) |z^2 − σ^2| + p x^3 + q y^3 − r z^3 =

    = ( |x^2 − σ^2| (σ^2 + yz) + |z^2 − σ^2| (σ^2 + xy) ) / (z − x) + p x^3 + q y^3 − r z^3.

The statement of the theorem is equivalent to sup g(x, y, z, σ^2) = 0, where the supremum is taken over all σ^2 > 0, x < y < z such that −yz < σ^2 < −xz. Note that it suffices to consider only σ^2 < max{x^2, z^2}, since the opposite inequality (with q > 0) implies that EX^2 < σ^2. So, there are only three possibilities: 1) 0 < σ^2 < min{x^2, z^2}; 2) x^2 ≤ σ^2 < z^2; 3) z^2 ≤ σ^2 < x^2. Opening the moduli, we notice that g(x, y, z, σ^2) is a parabola with respect to σ^2 on each of the intervals specified above. Consider the behavior of g(x, y, z, σ^2) on each of these intervals.

1. 0 < σ^2 < min{x^2, z^2}: then necessarily σ^2 < −xz and

    g(x, y, z, σ^2) = ( (x^2 − σ^2)(σ^2 + yz) + (z^2 − σ^2)(σ^2 + xy) ) / (z − x) + p x^3 + q y^3 − r z^3.

The coefficient −2/(z − x) at σ^4 is negative, thus the branches of this parabola (with respect to σ^2) look down, and the maximum value of the function g(x, y, z, σ^2) within the interval 0 < σ^2 < min{x^2, z^2} is attained either at the vertex σ_0^2 of the parabola, if σ_0^2 > −yz, or at the point σ^2 → −yz+, if σ_0^2 ≤ −yz. We have σ_0^2 + yz ≤ 0 since x < y < z, with the equality attained if and only if y = 0. Thus, the supremum is attained as σ^2 → −yz+, which implies p → 0 and reduces the problem to checking two-point distributions considered above.
. z σ < x, then by virtue of the conditions r >, z >. 3. x σ < z, then the function gx,y,z,σ ) = z3 σ +xy) z x)z y) = rz3 < gx,y,z,σ ) = σ x z y)+y z x)+xyz)+xyzxy xz yz) z x)z y) is linear and decreases monotonically in σ, since x z y)+y z x)+xyz >. Thus, if x yz, then the supremum of gx,y,z,σ ) is supplied by σ yz +, which reduces the problem to checking two-point distributions considered above. If x > yz, then the supremum of gx,y,z,σ ) is attained at σ = x. With this value of σ we have gx,y,z,x ) = xx+y)x z yx +yz ). z x)z y) y gx,y,z,x ) = xx+z)yzz y)+xz y) ) z x)z y) <, since x < z and x < y < z. Thus, the supremum of gx,y,z,x ) over all y such that x < y and yz < x = σ is supplied by y max{x, x /z}+ = x /z +, i.e., p, which reduces the problem to checking two-point distributions considered above. Thus, the theorem is completely proved. References [1] L. H. Y. Chen, L. Goldstein, Q.-M. Shao. Normal approximation by Stein s method. Springer, Berlin, Heidelberg, 11. [] S. V. Gavrilenko. An improvement of the nonuniform estimates of convergence rate of distributions of Poisson random sums to the normal law. Informatics and its Applications in Russian), 51):1 4, 11. [3] L. Goldstein. L 1 bounds in normal approximation. Ann. Probab, 355):1888 193, 7. [4] L. Goldstein. Bounds on the constant in the mean central limit theorem. Ann. Probab, 384):167 1689, 1. arxiv:91.76, 9. [5] L. Goldstein, G. Reinert. Stein s method and the zero bias transformation with application to simple random sampling. Ann. Appl. Probab., 74):935 95, 1997. [6] L. Goldstein, G. Reinert. Distributional transformations, orthogonal polynomials, and Stein characterizations. J. Theor. Probab., 181):37 6, 5. [7] L. Goldstein, Y. Rinott. On multivariate normal approximations by Stein s method and size bias couplings. J. Appl. Probab., 33:1 17, 1996. 1
[8] M. E. Grigorieva, S. V. Popov. An upper bound for the absolute constant in the nonuniform version of the Berry–Esseen inequalities for nonidentically distributed summands. Dokl. Math., 86(1):524–526, 2012.

[9] W. Hoeffding. The extrema of the expected value of a function of independent random variables. Ann. Math. Statist., 26(2):268–275, 1955.

[10] V. Korolev, I. Shevtsova. An improvement of the Berry–Esseen inequality with applications to Poisson and mixed Poisson random sums. Scand. Actuar. J., 2012(2):81–105, 2012. Available online since 04 June 2010.

[11] V. Yu. Korolev, I. G. Shevtsova. An improvement of the Berry–Esseen inequalities. Dokl. Math., 81(1):119–123, 2010.

[12] V. Yu. Korolev, I. G. Shevtsova. On the upper bound for the absolute constant in the Berry–Esseen inequality. Theory Probab. Appl., 54(4):638–658, 2010.

[13] V. Yu. Korolev, I. G. Shevtsova. Sharpened upper bounds for the absolute constant in the Berry–Esseen inequality for mixed Poisson random sums. Dokl. Math., 81(2):180–182, 2010.

[14] V. Yu. Korolev, I. G. Shevtsova. A new moment-type estimate of convergence rate in the Lyapunov theorem. Theory Probab. Appl., 55(3):505–509, 2011.

[15] E. Lukacs. Characteristic Functions. Griffin, London, 2nd edition, 1970.

[16] H. P. Mulholland, C. A. Rogers. Representation theorems for distribution functions. Proc. London Math. Soc., 8(2):177–223, 1958.

[17] Yu. S. Nefedova, I. G. Shevtsova. Structural improvement of nonuniform estimates for the rate of convergence in the central limit theorem with applications to Poisson random sums. Dokl. Math., 84(2):675–680, 2011.

[18] Yu. S. Nefedova, I. G. Shevtsova. On non-uniform convergence rate estimates in the central limit theorem. Theory Probab. Appl. (in Russian), 57(1):62–97, 2012.

[19] E. Peköz, A. Röllin, N. Ross. Degree asymptotics with rates for preferential attachment random graphs. Ann. Appl. Probab., 23(3):1188–1218, 2013.

[20] I. Shevtsova. On the absolute constants in the Berry–Esseen type inequalities for identically distributed summands. arXiv:1111.6554, 2011.
[21] C. Stein. Estimation of the mean of a multivariate normal distribution. Ann. Statist., 9(6):1135–1151, 1981.

[22] I. Tyurin. New estimates of the convergence rate in the Lyapunov theorem. arXiv:0912.0726, 2009.

[23] I. S. Tyurin. On the accuracy of the Gaussian approximation. Dokl. Math., 80(3):840–843, 2009.

[24] V. M. Zolotarev. Modern Theory of Summation of Random Variables. VSP, Utrecht, The Netherlands, 1997.