On large deviations for combinatorial sums

arXiv:1901.0444v1 [math.PR] 14 Jan 2019

Andrei N. Frolov
Dept. of Mathematics and Mechanics, St. Petersburg State University, St. Petersburg, Russia
E-mail address: Andrei.Frolov@pobox.spbu.ru
January 15, 2019

This investigation was supported by RFBR, research project No. 18-01-00393.

Abstract

We investigate the asymptotic behaviour of probabilities of large deviations for normalized combinatorial sums. We find a zone in which these probabilities are equivalent to the tail of the standard normal law. Our conditions are similar to the classical Bernstein condition. The range of the zone of normal convergence can be of power order.

AMS 2000 subject classification: 60F05
Key words: combinatorial central limit theorem, combinatorial sum, large deviations

1 Introduction

Let $\{\|X_{nij}\|,\ 1\le i,j\le n,\ n=2,3,\ldots\}$ be a sequence of matrices of independent random variables and $\{\pi_n = (\pi_n(1),\pi_n(2),\ldots,\pi_n(n)),\ n=2,3,\ldots\}$ be a sequence of random permutations of the numbers $1,2,\ldots,n$. Assume that $\pi_n$ has the uniform distribution on the set of permutations of $1,2,\ldots,n$ and is independent of $X_{nij}$ for all $n$. Define the combinatorial sum $S_n$ by the relation
$$S_n = \sum_{i=1}^n X_{ni\pi_n(i)}.$$
Under certain conditions, the sequence of distributions of (normalized) combinatorial sums converges weakly to the standard normal law. Every such result is called a combinatorial central limit theorem (CLT).
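The definition of the combinatorial sum can be illustrated by a minimal simulation sketch. It is not part of the paper: the function names and the user-supplied sampler `sample_entry` are hypothetical, and only the Python standard library is assumed.

```python
import random


def combinatorial_sum(matrix, perm):
    """Return S_n = sum_i matrix[i][perm[i]] for one fixed permutation."""
    return sum(row[perm[i]] for i, row in enumerate(matrix))


def sample_combinatorial_sum(sample_entry, n, rng):
    """Draw one value of the combinatorial sum S_n.

    sample_entry(i, j, rng) is a hypothetical sampler for the entry X_nij.
    """
    # Independent entries X_nij, drawn independently of the permutation.
    matrix = [[sample_entry(i, j, rng) for j in range(n)] for i in range(n)]
    perm = list(range(n))
    rng.shuffle(perm)  # uniformly distributed random permutation pi_n
    return combinatorial_sum(matrix, perm)
```

For a degenerate matrix whose entries all equal a constant $c$, every permutation gives $S_n = nc$, which is a convenient sanity check of the indexing.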

Investigations in this direction have a long history. Results on the combinatorial CLT can be found in Wald and Wolfowitz [1], Noether [2], Hoeffding [3], Motoo [4], Kolchin and Chistyakov [5]. Further, non-asymptotic Esseen type bounds have been derived for the accuracy of the normal approximation of distributions of combinatorial sums. Such results have been obtained in Bolthausen [6], von Bahr [7], Ho and Chen [8], Goldstein [9], Neammanee and Suntornchost [10], Neammanee and Rattanawong [11], Chen, Goldstein and Shao [12], Chen and Fang [13], Frolov [14, 15], and in Frolov [16] for random combinatorial sums.

Note that if $X_{nij}$ are identically distributed for all $1\le j\le n$ and $n$, then the combinatorial sum has the same distribution as a sum of independent random variables. This case is well investigated, but it has to be taken into account when assessing the optimality of the derived results.

Apart from some particular cases, combinatorial sums do not have independent increments. Hence, it is difficult to use the classical proofs of Esseen type inequalities, which are based on bounds for differences of characteristic functions (c.f.). One usually applies the Stein method instead. For combinatorial sums, it yields Esseen type inequalities for random variables with finite third moments. Applying truncation techniques, Frolov [14, 15] generalized these results to the case of finite moments of order $2+\delta$ and to infinite variances as well.

Every bound in the CLT similar to the Esseen inequality yields results on the asymptotic behaviour of large deviations that coincides with that of the tail of the normal law in a logarithmic zone. Such results are usually called moderate deviations. Moderate deviations for combinatorial sums have been investigated in Frolov [17].

In this paper, we derive new results on the asymptotic behaviour of large deviations of combinatorial sums in power zones. Note that the ranges of the power zones are powers of a characteristic similar to the Lyapunov ratio.
Indeed, we deal with non-identically distributed random variables. Even for sums of independent random variables, the ranges of the zones of normal convergence depend on the Lyapunov ratios. For identically distributed random variables, this means that the ranges are powers of the number of summands. But the latter case corresponds to the classical theory for sums of independent random variables and is therefore not new.

In our proofs, we use the method of conjugate distributions. Note that von Bahr [7] developed a method to bound distances between the c.f.s of normalized combinatorial sums and of the normal law. Assuming that the random variables are bounded or satisfy a certain analogue of the classical Bernstein condition, we conclude that the moment generating functions (m.g.f.) of normalized combinatorial sums are analytic in a circle of the complex plane. Adapting von Bahr's method, we bound the difference between the m.g.f.s in some circle. In view of analyticity, this also gives bounds for the derivatives of the m.g.f.s. Hence, we arrive at the desired asymptotics for the m.g.f.s and for their first and second logarithmic derivatives, which are the means and variances of the random variables conjugate to the normalized combinatorial sums. Then we estimate the closeness of the distributions of the conjugate random variables to the standard normal law. Using the relationship between distributions and their conjugates, we derive the asymptotics of the large deviations under consideration.
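The role of the logarithmic derivatives of the m.g.f. can be seen on a toy example. The sketch below is an illustration only, not the construction used in the proofs: it tilts a finite discrete law exponentially and lets one compare the mean and variance of the tilted (conjugate) law with numerical derivatives of the logarithm of the m.g.f. All function names are hypothetical.

```python
import math


def mgf(xs, ps, h):
    """Moment generating function E exp(h X) of a finite discrete law."""
    return sum(p * math.exp(h * x) for x, p in zip(xs, ps))


def conjugate_distribution(xs, ps, h):
    """Probabilities of the conjugate (exponentially tilted) law with parameter h."""
    m = mgf(xs, ps, h)
    return [p * math.exp(h * x) / m for x, p in zip(xs, ps)]


def mean_and_variance(xs, ps):
    """Mean and variance of a finite discrete law."""
    mean = sum(p * x for x, p in zip(xs, ps))
    var = sum(p * (x - mean) ** 2 for x, p in zip(xs, ps))
    return mean, var
```

With `tilted = conjugate_distribution(xs, ps, h)`, the pair returned by `mean_and_variance(xs, tilted)` should agree, up to discretization error, with finite-difference approximations of the first and second derivatives of `log(mgf(xs, ps, h))` in `h`.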

2 Results

Let $\{\|X_{nij}\|,\ 1\le i,j\le n,\ n=2,3,\ldots\}$ be a sequence of matrices of independent random variables such that
$$\sum_{i=1}^n EX_{nij} = \sum_{j=1}^n EX_{nij} = 0 \tag{1}$$
for all $n$ (the first sum for every $j$, the second for every $i$). Let $\{\pi_n = (\pi_n(1),\pi_n(2),\ldots,\pi_n(n)),\ n=2,3,\ldots\}$ be a sequence of random permutations of the numbers $1,2,\ldots,n$. Assume that $\pi_n$ has the uniform distribution on the set of permutations $P_n$ and is independent of $X_{nij}$ for all $n$. Put
$$S_n = \sum_{i=1}^n X_{ni\pi_n(i)}.$$
It is not difficult to check that
$$ES_n = 0,\qquad DS_n = ES_n^2-(ES_n)^2 = \frac{1}{n-1}\sum_{i,j=1}^n (EX_{nij})^2+\frac1n\sum_{i,j=1}^n DX_{nij}.$$
Hence, condition (1) yields that the combinatorial sums are centered at zero. Moreover,
$$DS_n = \frac{1}{n(n-1)}\sum_{i,j=1}^n (EX_{nij})^2+\frac1n\sum_{i,j=1}^n EX_{nij}^2.$$
If $DS_n\to\infty$ as $n\to\infty$, then the main part of the variance is the normalized sum of second moments
$$B_n = \frac1n\sum_{i,j=1}^n EX_{nij}^2.$$
Therefore, in the sequel, we use $\{B_n\}$ as the norming sequence for $S_n$.

Our main result is as follows.

Theorem 1. Let $\{M_n\}$ be a non-decreasing sequence of positive numbers such that for $s = 1,2,3$ the inequalities
$$|EX_{nij}^k|\le D\,k!\,M_n^{k-s}\,E|X_{nij}|^s \tag{2}$$
hold for all $k\ge s$, $1\le i,j\le n$ and $n$, where $D$ is an absolute positive constant. Put
$$\gamma_n = \max\left\{\sqrt n\,\max_{i,j}\frac{E|X_{nij}|}{\sqrt{B_n}},\ \max_i\frac{\sum_{j=1}^n EX_{nij}^2}{B_n},\ \max_j\frac{\sum_{i=1}^n EX_{nij}^2}{B_n},\ \frac{\sum_{i,j=1}^n E|X_{nij}|^3}{\sqrt n\,B_n^{3/2}}\right\}.$$
Then for every sequence of positive numbers $\{u_n\}$ with $u_n\to\infty$, $u_n^3 = o(\sqrt n/\gamma_n)$ and $u_n = o(\sqrt{B_n}/M_n)$ as $n\to\infty$, the relation
$$P\left(S_n\ge u_n\sqrt{B_n}\right)\sim 1-\Phi(u_n)\quad\text{as } n\to\infty \tag{3}$$
holds, where $\Phi(x)$ is the standard normal distribution function.
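The variance formula above can be checked by brute force for small $n$. The following sketch is an illustration under stated assumptions, not part of the paper: the entries are taken of the hypothetical form $X_{nij} = c_{ij} + s_{ij}\varepsilon_{ij}$ with independent signs $\varepsilon_{ij}=\pm1$ of probability $1/2$, so that $EX_{nij}=c_{ij}$ and $DX_{nij}=s_{ij}^2$, and all permutations and sign patterns are enumerated exactly with rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations, product


def variance_by_formula(means, variances):
    """DS_n = (1/(n-1)) * sum (E X_nij)^2 + (1/n) * sum D X_nij, assuming condition (1)."""
    n = len(means)
    sum_sq_means = sum(Fraction(m) ** 2 for row in means for m in row)
    sum_vars = sum(Fraction(v) for row in variances for v in row)
    return sum_sq_means / (n - 1) + sum_vars / n


def norming_constant(means, variances):
    """B_n = (1/n) * sum of the second moments E X_nij^2."""
    n = len(means)
    total = sum(Fraction(m) ** 2 + Fraction(v)
                for mrow, vrow in zip(means, variances)
                for m, v in zip(mrow, vrow))
    return total / n


def exact_mean_and_variance(means, scales):
    """Exact E S_n and D S_n for X_nij = means[i][j] + scales[i][j] * eps, eps = +-1.

    Only the n selected entries matter for a fixed permutation, so it suffices
    to enumerate n! permutations times 2^n sign patterns.
    """
    n = len(means)
    values = []
    for perm in permutations(range(n)):
        for signs in product((-1, 1), repeat=n):
            values.append(sum(Fraction(means[i][perm[i]])
                              + signs[i] * Fraction(scales[i][perm[i]])
                              for i in range(n)))
    count = len(values)
    mean = sum(values) / count
    second_moment = sum(v * v for v in values) / count
    return mean, second_moment - mean ** 2
```

For example, with the doubly centered means `[[1, -1, 0], [-1, 0, 1], [0, 1, -1]]` and scales `[[1, 2, 0], [0, 1, 1], [2, 0, 1]]` (variances are the squared scales), the enumeration and the formula can be compared directly, and `norming_constant` gives the main part $B_n$ of the variance.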

Note that $\gamma_n\ge1$. This follows from the inequality $\max_i\sum_{j=1}^n EX_{nij}^2\ge B_n$. Indeed, assuming that $\max_i\sum_{j=1}^n EX_{nij}^2<B_n$, we arrive at the incorrect inequality
$$\sum_{i,j=1}^n EX_{nij}^2<nB_n = \sum_{i,j=1}^n EX_{nij}^2.$$

von Bahr [7] proved the following Esseen type inequality:
$$\sup_x\left|P\left(S_n<x\sqrt{B_n}\right)-\Phi(x)\right|\le A\,\frac{\gamma_n}{\sqrt n},$$
where $A$ is an absolute positive constant. Hence the condition $u_n^3 = o(\sqrt n/\gamma_n)$ as $n\to\infty$ is natural for relation (3), which gives exact (non-logarithmic) asymptotics of large deviations. For identically distributed $X_{nij}$, this condition turns into $u_n = o(n^{1/6})$ as $n\to\infty$. Note that the conditions $u_n^3 = o(\sqrt n/\gamma_n)$ and $u_n\to\infty$ as $n\to\infty$ imply $\gamma_n/\sqrt n\to0$ as $n\to\infty$.

Theorem 1 is stronger than the results in Frolov [14] since the zone of normal convergence may be of power order while it is logarithmic in [14]. Of course, this requires stronger moment assumptions.

Condition (2) is an analogue of the Bernstein condition, which is a form of the existence of an exponential moment. In the classical theory, one mainly deals with centered random variables, and the Bernstein condition yields that the logarithm of the m.g.f. is asymptotically a quadratic function at zero. For the combinatorial CLT, it is of principal importance that the summands may be non-centered and sometimes even degenerate. In this case, the logarithm of the m.g.f. may be a linear function in a neighbourhood of zero provided the mean is not zero.

One can rewrite inequalities (2) for $k\ge3$ as follows:
$$|EX_{nij}^k|\le D\,k!\,M_n^k\min_{1\le s\le3}\frac{E|X_{nij}|^s}{M_n^s}.$$
Hence, the Lyapunov inequality implies that the following condition is sufficient for (2): the inequalities
$$EX_{nij}^2\le 2DM_nE|X_{nij}|\quad\text{and}\quad |EX_{nij}^k|\le D\,k!\,M_n^k\min\left\{\left(\frac{E|X_{nij}|}{M_n}\right)^2,\ \frac{E|X_{nij}|^3}{M_n^3}\right\}$$
hold for all $k\ge3$, $1\le i,j\le n$ and $n$.

Consider two important examples in which condition (2) is satisfied.

1. Bounded random variables. If there exists a non-decreasing sequence of positive constants $\{M_n\}$ such that $P(|X_{nij}|\le M_n) = 1$ for all $1\le i,j\le n$ and $n$, then condition (2) holds. In the degenerate case with $P(X_{nij} = c_{nij}) = 1$ for all $1\le i,j\le n$ and $n$, condition (2) is fulfilled with $M_n = \max_{i,j}|c_{nij}|$ for every $n$.

2. Exponential random variables. Let $\xi$ and $\eta$ be random variables having exponential distributions with parameters $\alpha$ and $\beta$, respectively. Assume that each random variable in every matrix $\|X_{nij}\|$ has one of the four distributions of the random variables $\xi$, $-\xi$, $\eta$ and $-\eta$. Since $E\xi^k = k!\,\alpha^{-k}$ and $E\eta^k = k!\,\beta^{-k}$ for all $k$, condition (2) holds with $M_n = 1/\min\{\alpha,\beta\}$. One can easily extend this example to a larger number of exponential distributions used to construct the matrices $\|X_{nij}\|$. The parameters of these distributions may depend on $n$. Moreover, one can easily replace the exponential distributions by Gamma distributions.

Note that $\gamma_n$ has the order of $\max\{\sqrt{n/B_n},\ (\sqrt{n/B_n})^3\}$ in the last example. It is also clear that the behaviour of $\gamma_n$ is similar when every random variable $X_{nij}$ has one of $k$ given distributions. In the last case, one speaks of $k$-sequences of matrices $\{\|X_{nij}\|\}$.

3 Proofs

For all $i$, $j$ and $n$, put
$$\varphi_{nij}(z) = Ee^{zX_{nij}},\qquad \varphi_n(z) = Ee^{zS_n/\sqrt{B_n}},\qquad z\in\mathbb C,$$
where $\mathbb C$ is the set of complex numbers. We have
$$e^{-z^2/2}\varphi_n(z) = \frac{1}{n!}\sum_{p_n\in P_n}\prod_{i=1}^n\left\{e^{-z^2/(2n)}\varphi_{nip_n(i)}\left(\frac{z}{\sqrt{B_n}}\right)\right\} = \frac1{n!}\sum_{p_n\in P_n}\prod_{i=1}^n\left\{1+b_{nip_n(i)}\right\}, \tag{4}$$
where $b_{nij} = e^{-z^2/(2n)}\varphi_{nij}(z/\sqrt{B_n})-1$. Note that the last sum is the permanent of the matrix $\|1+b_{nij}\|$. To investigate its behaviour we use the following result.

Lemma 1. Let $X$ be a random variable such that for $s = 1,2,3$ the inequalities
$$|EX^k|\le D\,k!\,M^{k-s}\,E|X|^s \tag{5}$$
hold for all $k\ge s$, where $D$ and $M$ are positive constants. Then $Ee^{uX}$ is an analytic function in the circle $|u|\le1/(4M)$ and for every $u,v\in\mathbb C$ with $|v|\le1/2$ and $|u|\le1/(8M)$, the inequalities
$$\left|Ee^{uX-v^2/2}-1\right|\le C_1\left(|u|E|X|+|v|^2\right),$$
$$\left|Ee^{uX-v^2/2}-1-uEX\right|\le C_2\left(|u|^2EX^2+|v|^2\right),$$
$$\left|Ee^{uX-v^2/2}-1-uEX-\frac{u^2}{2}EX^2+\frac{v^2}{2}\right|\le C_3\left(|u|^3E|X|^3+|v|^3\right)$$
hold, where the constants $C_i$ depend on $D$ and do not depend on $M$.

Proof. By inequality (5) and Stirling's formula, we have
$$E|X|^k\le\sqrt{EX^{2k}}\le\sqrt{D\,(2k)!\,M^{2k-1}E|X|}\le D_1\,k!\,(2M)^k$$

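for all $k\ge1$, where the constant $D_1$ depends on $D$, $M$ and $E|X|$. Hence, the series
$$\sum_{k=0}^\infty\frac{|u|^k}{k!}E|X|^k$$
converges in the circle $|u|\le1/(4M)$. Put
$$e_n(z) = \sum_{k=0}^n\frac{z^k}{k!}.$$
Then $e_n(|uX|)\uparrow e^{|uX|}$ a.s. The monotone convergence theorem yields that
$$Ee^{|uX|} = \lim_{n\to\infty}Ee_n(|uX|) = \sum_{k=0}^\infty\frac{|u|^k}{k!}E|X|^k$$
in the circle $|u|\le1/(4M)$. In view of $|e_n(uX)|\le e_n(|uX|)\le e^{|uX|}$, the Lebesgue dominated convergence theorem implies that
$$Ee^{uX} = \lim_{n\to\infty}Ee_n(uX) = \sum_{k=0}^\infty\frac{u^k}{k!}EX^k$$
in the circle $|u|\le1/(4M)$.

Put $W = uX-v^2/2$. For $s = 1,2,3$ and $k\ge s$, we have
$$EW^k = E\sum_{j=0}^kC_k^j(uX)^j\left(-\frac{v^2}{2}\right)^{k-j} = \sum_{j=0}^kC_k^j\left(-\frac{v^2}{2}\right)^{k-j}u^jEX^j = T'_{ks}+T''_{ks},$$
where
$$T'_{ks} = \sum_{j=0}^{s-1}C_k^j\left(-\frac{v^2}{2}\right)^{k-j}u^jEX^j,\qquad T''_{ks} = \sum_{j=s}^{k}C_k^j\left(-\frac{v^2}{2}\right)^{k-j}u^jEX^j.$$
By inequalities (5), we get
$$|T''_{ks}|\le D\,k!\,|u|^sE|X|^s\sum_{j=s}^kC_k^j\,|uM|^{j-s}\left|\frac{v^2}{2}\right|^{k-j} = D\,k!\,|u|^sE|X|^s\sum_{j=0}^{k-s}C_k^{j+s}\,|uM|^{j}\left|\frac{v^2}{2}\right|^{k-j-s}$$
$$\le D\,k!\,|u|^sE|X|^s\,k^s\sum_{j=0}^{k-s}C_{k-s}^{j}\,|uM|^{j}\left|\frac{v^2}{2}\right|^{k-s-j} = D\,k!\,|u|^sE|X|^s\,k^s\left(|uM|+\left|\frac{v^2}{2}\right|\right)^{k-s}\le D\,k!\,|u|^sE|X|^s\,k^s\,4^{-k+s}$$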

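for all $k\ge s$ and $s = 1,2,3$. Since $|v|\le1/2$, we have
$$|T'_{k1}| = \frac{|v|^{2k}}{2^k}\le4|v|^2\,8^{-k}$$
for all $k\ge1$. Hence,
$$\left|Ee^{uX-v^2/2}-1\right| = \left|\sum_{k=1}^\infty\frac1{k!}EW^k\right|\le\sum_{k=1}^\infty\frac1{k!}\left(|T'_{k1}|+|T''_{k1}|\right)\le4D|u|E|X|\sum_{k=1}^\infty k\,4^{-k}+4|v|^2\sum_{k=1}^\infty\frac{8^{-k}}{k!}\le C_1\left(|u|E|X|+|v|^2\right).$$
The first inequality follows.

Making use of the inequality $2a\le a^2+1$ for $a = |u||EX|$, the Lyapunov inequality and $|v|\le1/2$, we obtain
$$|T'_{k2}|\le k|u||EX|\frac{|v|^{2k-2}}{2^{k-1}}+\frac{|v|^{2k}}{2^k}\le k|u|^2(EX)^2\frac{|v|^{2k-2}}{2^{k}}+k\frac{|v|^{2k-2}}{2^{k}}+\frac{|v|^{2k}}{2^k}$$
$$\le4k\,8^{-k}|u|^2EX^2+16k|v|^2\,8^{-k}+4|v|^2\,8^{-k}\le4k\,8^{-k}|u|^2EX^2+20k|v|^2\,8^{-k}$$
for all $k\ge2$. It follows that
$$\left|Ee^{uX-v^2/2}-1-uEX\right| = \left|-\frac{v^2}{2}+\sum_{k=2}^\infty\frac1{k!}EW^k\right|\le\frac{|v|^2}{2}+\sum_{k=2}^\infty\frac1{k!}\left(|T'_{k2}|+|T''_{k2}|\right)$$
$$\le\frac{|v|^2}{2}+(16D+1)|u|^2EX^2\sum_{k=2}^\infty k^2\,4^{-k}+20|v|^2\sum_{k=2}^\infty\frac{k\,8^{-k}}{k!}\le C_2\left(|u|^2EX^2+|v|^2\right).$$
The second inequality is proved.

Applying the inequality $2a\le a^2+1$ for $a = |u||EX|$ and the Lyapunov inequality, we have
$$|T'_{k3}|\le\frac{k(k-1)}{2}|u|^2EX^2\frac{|v|^{2k-4}}{2^{k-2}}+k|u||EX|\frac{|v|^{2k-2}}{2^{k-1}}+\frac{|v|^{2k}}{2^k}\le k^2|u|^2EX^2\frac{|v|^{2k-4}}{2^{k-1}}+k\frac{|v|^{2k-2}}{2^{k}}+\frac{|v|^{2k}}{2^k}.$$
Using the inequality $2ab\le a^2+b^2$ for $a = |u|^2EX^2$ and $b = |v|$, the Lyapunov inequality, inequality (5) and $|v|\le1/2$, we further get
$$|T'_{k3}|\le k^2\left(|u|^4(EX^2)^2+|v|^2\right)\frac{|v|^{2k-5}}{2^{k}}+k\frac{|v|^{2k-2}}{2^{k}}+\frac{|v|^{2k}}{2^k}\le k^2|u|^4EX^4\frac{|v|^{2k-5}}{2^{k}}+76k^2|v|^3\,8^{-k}$$
$$\le k^2|u|^4D\,4!\,M\,E|X|^3\frac{|v|^{2k-5}}{2^{k}}+76k^2|v|^3\,8^{-k}\le3Dk^2|u|^3E|X|^3\frac{|v|^{2k-5}}{2^{k}}+76k^2|v|^3\,8^{-k}\le96Dk^2|u|^3E|X|^3\,8^{-k}+76k^2|v|^3\,8^{-k}$$
for all $k\ge3$. Since $W^2-u^2X^2 = -\frac{v^2}{2}(W+uX)$ and $|u||EX||v|^2\le|u|^3E|X|^3+|v|^3$, this yields that
$$\left|Ee^{uX-v^2/2}-1-uEX-\frac{u^2}{2}EX^2+\frac{v^2}{2}\right| = \left|-\frac{u^2}{2}EX^2+\frac12EW^2+\sum_{k=3}^\infty\frac1{k!}EW^k\right|$$
$$\le\frac12\left|EW^2-u^2EX^2\right|+\sum_{k=3}^\infty\frac1{k!}\left(|T'_{k3}|+|T''_{k3}|\right)\le\frac{|v|^2}{4}\left|E(W+uX)\right|+160D|u|^3E|X|^3\sum_{k=3}^\infty k^3\,4^{-k}+76|v|^3\sum_{k=3}^\infty\frac{k^2\,8^{-k}}{k!}\le C_3\left(|u|^3E|X|^3+|v|^3\right).$$
The lemma is proved.

Proof of Theorem 1. By Lemma 1 with $X = X_{nij}$, $u = z/\sqrt{B_n}$, $v = z/\sqrt n$ and $M = M_n$, for all $n$, $i$ and $j$, the inequalities
$$|b_{nij}|\le C_1\left(\frac{|z|}{\sqrt{B_n}}E|X_{nij}|+\frac{|z|^2}{n}\right), \tag{6}$$
$$\left|b_{nij}-\frac{z}{\sqrt{B_n}}EX_{nij}\right|\le C_2\left(\frac{|z|^2}{B_n}EX_{nij}^2+\frac{|z|^2}{n}\right), \tag{7}$$
$$\left|b_{nij}-\frac{z}{\sqrt{B_n}}EX_{nij}-\frac{z^2}{2B_n}EX_{nij}^2+\frac{z^2}{2n}\right|\le C_3\left(\frac{|z|^3}{B_n^{3/2}}E|X_{nij}|^3+\frac{|z|^3}{n^{3/2}}\right) \tag{8}$$
hold for every $z$ in the circle $|z|\le\min\{\sqrt n,\sqrt{B_n}/M_n\}/8$. Relations (7) and (1) imply that
$$|b_{n\cdot j}| = \left|\frac1n\sum_{i=1}^nb_{nij}\right| = \left|\frac1n\sum_{i=1}^n\left(b_{nij}-\frac{z}{\sqrt{B_n}}EX_{nij}\right)\right|\le C_2\frac{|z|^2}{n}\left(1+\frac1{B_n}\sum_{i=1}^nEX_{nij}^2\right) \tag{9}$$
and
$$|b_{ni\cdot}| = \left|\frac1n\sum_{j=1}^nb_{nij}\right|\le C_2\frac{|z|^2}{n}\left(1+\frac1{B_n}\sum_{j=1}^nEX_{nij}^2\right). \tag{10}$$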

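It follows from relations (8) and (1) and the definition of $B_n$ that
$$|b_{n\cdot\cdot}| = \left|\frac1n\sum_{i,j=1}^nb_{nij}\right| = \left|\frac1n\sum_{i,j=1}^n\left(b_{nij}-\frac{z}{\sqrt{B_n}}EX_{nij}-\frac{z^2}{2B_n}EX_{nij}^2+\frac{z^2}{2n}\right)\right|\le C_3\frac{|z|^3}{\sqrt n}\left(1+\frac{1}{\sqrt n\,B_n^{3/2}}\sum_{i,j=1}^nE|X_{nij}|^3\right). \tag{11}$$
From relations (6) and (9)-(11), the definition of $\gamma_n$ and $\gamma_n\ge1$, we have
$$|b_{nij}|\le C_1(\gamma_n+1)\frac{|z|}{\sqrt n}\le2C_1\gamma_n\frac{|z|}{\sqrt n}, \tag{12}$$
$$|b_{n\cdot j}|\le C_2(\gamma_n+1)\frac{|z|^2}{n}\le2C_2\gamma_n\frac{|z|^2}{n}, \tag{13}$$
$$|b_{ni\cdot}|\le2C_2\gamma_n\frac{|z|^2}{n}, \tag{14}$$
$$|b_{n\cdot\cdot}|\le C_3(\gamma_n+1)\frac{|z|^3}{\sqrt n}\le2C_3\gamma_n\frac{|z|^3}{\sqrt n}. \tag{15}$$

Note that the function $\varphi_n(it)$, $t\in\mathbb R$, is the c.f. of the normalized combinatorial sum. In von Bahr [7], relations (4) and (12)-(15) for $z = it$ and $t\ge0$ have been used to bound the distance between $\varphi_n(it)$ and the c.f. of the standard normal law. The bounds for $b_{nij}$ from there coincide with ours once $t$ is replaced by $|z|$. Hence, we borrow one further bound from [7] with a formal replacement of $t$ by $|z|$. We use the first formula on p. 137 in [7] with $2C_3\gamma_n$ instead of $\delta$. Then we have
$$\left|e^{-z^2/2}\varphi_n(z)-1\right|\le\sum_{k=1}^\infty\frac1{k!}\left(8eC_3\frac{\gamma_n|z|^3}{\sqrt n}\right)^k+\sum_{k=1}^\infty\left(4e^2C_4\frac{\gamma_n^2|z|^2}{n}\right)^k$$
for all $z$ in the circle $|z|\le\min\{\sqrt n,\sqrt{B_n}/M_n\}/8$, where $C_4$ is an absolute positive constant. Hence,
$$\left|e^{-z^2/2}\varphi_n(z)-1\right|\le\sum_{k=1}^\infty\frac1{k!}\left(C_5\frac{\gamma_n|z|^3}{\sqrt n}\right)^k+\sum_{k=1}^\infty\left(C_5\frac{\gamma_n^2|z|^2}{n}\right)^k,$$
where $C_5 = \max\{8eC_3,4e^2C_4\}$. If $|z|^2\le n/(2C_5\gamma_n^2)$, then
$$\left|e^{-z^2/2}\varphi_n(z)-1\right|\le C_5\frac{\gamma_n|z|^3}{\sqrt n}\sum_{k=1}^\infty\frac{1}{(k-1)!}\left(C_5\frac{\gamma_n|z|^3}{\sqrt n}\right)^{k-1}+C_5\frac{\gamma_n^2|z|^2}{n}\sum_{k=1}^\infty2^{-(k-1)}.$$
It follows that
$$\left|\varphi_n(z)e^{-z^2/2}-1\right|\le C_5\frac{\gamma_n|z|^3}{\sqrt n}\,e^{C_5\gamma_n|z|^3/\sqrt n}+2C_5\frac{\gamma_n^2|z|^2}{n} = g_n(|z|) \tag{16}$$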

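for all $z$ in the circle $|z|\le C_7\min\{\sqrt n/\gamma_n,\sqrt{B_n}/M_n\} = y_n$.

Let $\{x_n\}$ be a sequence of positive numbers that will be chosen later. Assume that $x_n\le y_n/16$. The function $f_n(z) = e^{-z^2/2}\varphi_n(z)-1$ is analytic in the circle $|z|\le16x_n$. Hence,
$$f_n(z) = \sum_{k=0}^\infty a_{nk}z^k,\qquad f_n'(z) = \sum_{k=1}^\infty a_{nk}kz^{k-1},\qquad f_n''(z) = \sum_{k=2}^\infty a_{nk}k(k-1)z^{k-2},$$
where, by the Cauchy inequalities, the coefficients $a_{nk}$ satisfy the relations
$$|a_{nk}|\le(8x_n)^{-k}\sup_{|z|=8x_n}|f_n(z)|\le(8x_n)^{-k}g_n(8x_n).$$
Put
$$m_n(z) = \frac{\varphi_n'(z)}{\varphi_n(z)},\qquad \sigma_n^2(z) = \frac{\varphi_n''(z)}{\varphi_n(z)}-\left(\frac{\varphi_n'(z)}{\varphi_n(z)}\right)^2,\qquad z\in\mathbb C.$$
Then, in the circle $|z|\le4x_n$, the inequalities
$$\left|(m_n(z)-z)e^{-z^2/2}\varphi_n(z)\right| = |f_n'(z)|\le\frac{g_n(8x_n)}{4x_n}\sum_{k=1}^\infty k\,2^{-k}\le C_8\frac{g_n(8x_n)}{x_n}, \tag{17}$$
$$\left|\left(\sigma_n^2(z)-1+(m_n(z)-z)^2\right)e^{-z^2/2}\varphi_n(z)\right| = |f_n''(z)|\le\frac{g_n(8x_n)}{(4x_n)^2}\sum_{k=2}^\infty k(k-1)\,2^{-k}\le C_9\frac{g_n(8x_n)}{x_n^2} \tag{18}$$
hold. This and inequality (16) yield that
$$|m_n(z)-z|\le\left|(m_n(z)-z)e^{-z^2/2}\varphi_n(z)\right|+\left|(m_n(z)-z)\left(e^{-z^2/2}\varphi_n(z)-1\right)\right|\le C_8\frac{g_n(8x_n)}{x_n}+|m_n(z)-z|\,g_n(8x_n)$$
for $|z|\le4x_n$. Hence,
$$|m_n(z)-z|\le C_8\frac{g_n(8x_n)}{x_n(1-g_n(8x_n))} \tag{19}$$
for $|z|\le4x_n$, provided $g_n(8x_n)<1$. Further, making use of relations (17)-(19), we get
$$|\sigma_n^2(z)-1|\le\left|(\sigma_n^2(z)-1)\left(e^{-z^2/2}\varphi_n(z)-1\right)\right|+\left|\left(\sigma_n^2(z)-1+(m_n(z)-z)^2\right)e^{-z^2/2}\varphi_n(z)\right|+\left|(m_n(z)-z)^2e^{-z^2/2}\varphi_n(z)\right|$$
$$\le|\sigma_n^2(z)-1|\,g_n(8x_n)+C_9\frac{g_n(8x_n)}{x_n^2}+C_8^2\frac{g_n^2(8x_n)}{x_n^2(1-g_n(8x_n))}$$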

for $|z|\le4x_n$. It follows that
$$|\sigma_n^2(z)-1|\le C_{10}\frac{g_n(8x_n)+g_n^2(8x_n)}{x_n^2(1-g_n(8x_n))^2} \tag{20}$$
for $|z|\le4x_n$.

Let $\{h_n\}$ be a sequence of positive numbers. Let $\overline S_n$ be a random variable conjugate to $S_n/\sqrt{B_n}$, i.e. $\overline S_n$ has the distribution function
$$P\left(\overline S_n<x\right) = \frac{1}{\varphi_n(h_n)}\int_{-\infty}^xe^{h_nu}\,dP\left(S_n<u\sqrt{B_n}\right),\qquad x\in\mathbb R.$$
Note that $E\overline S_n = m_n(h_n)$ and $D\overline S_n = \sigma_n^2(h_n)$. In the sequel, we take $h_n$ such that relations (19) and (20) yield $m_n(h_n) = h_n+o(1)$ and $\sigma_n^2(h_n) = 1+o(1)$. Hence, we investigate the distance between the standard normal law and the distribution of $\overline S_n$ centered at and normalized by the main terms of the mean and the variance. Denote
$$R_n(v) = P\left(\overline S_n-h_n<v\right)-\Phi(v),\qquad v\in\mathbb R,$$
and estimate
$$\Delta_n = \sup_{v\in\mathbb R}|R_n(v)|.$$
Put
$$\psi_n(z) = Ee^{z(\overline S_n-h_n)} = e^{-zh_n}\frac{\varphi_n(z+h_n)}{\varphi_n(h_n)},\qquad z\in\mathbb C.$$
It is clear that $\psi_n(it)$ is the c.f. of the random variable $\overline S_n-h_n$. We have
$$\left|e^{-z^2/2}\psi_n(z)-1\right| = \frac{e^{h_n^2/2}}{\varphi_n(h_n)}\left|e^{-(z+h_n)^2/2}\varphi_n(z+h_n)-e^{-h_n^2/2}\varphi_n(h_n)\right| = \frac{e^{h_n^2/2}}{\varphi_n(h_n)}\left|f_n(z+h_n)-f_n(h_n)\right|\le\frac{e^{h_n^2/2}}{\varphi_n(h_n)}\sum_{k=1}^\infty|a_{nk}|\left|(z+h_n)^k-h_n^k\right|$$
for $|z+h_n|\le4x_n$. Since
$$(z+h_n)^k-h_n^k = z(z+h_n)^{k-1}+h_n(z+h_n)^{k-1}-h_n^k = z(z+h_n)^{k-1}+zh_n(z+h_n)^{k-2}+h_n^2(z+h_n)^{k-2}-h_n^k = \ldots = z\sum_{j=1}^k(z+h_n)^{k-j}h_n^{j-1},$$
we obtain $|(z+h_n)^k-h_n^k|\le k|z|(4x_n)^{k-1}$ for $|z+h_n|\le4x_n$ and $h_n\le4x_n$. It follows from relations (16) and (17) that
$$\left|e^{-z^2/2}\psi_n(z)-1\right|\le\frac{e^{h_n^2/2}}{\varphi_n(h_n)}\,|z|\sum_{k=1}^\infty|a_{nk}|\,k(4x_n)^{k-1}\le\frac{1}{1-g_n(h_n)}\,|z|\,C_8\frac{g_n(8x_n)}{x_n}$$

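for $|z+h_n|\le4x_n$. Putting $z = it$, we get
$$\left|\psi_n(it)-e^{-t^2/2}\right|\le C_8|t|e^{-t^2/2}\frac{g_n(8x_n)}{x_n(1-g_n(h_n))}$$
for all $|t|\le x_n$ and $h_n\le x_n$. By the Esseen inequality, we have
$$\Delta_n = \sup_{v\in\mathbb R}|R_n(v)|\le\frac1\pi\int_{-x_n}^{x_n}\left|\frac{\psi_n(it)-e^{-t^2/2}}{t}\right|dt+\frac{24}{\pi\sqrt{2\pi}}\cdot\frac{1}{x_n}\le\frac{C_{11}}{x_n}, \tag{21}$$
where the last inequality holds as soon as $g_n(8x_n)\le1/2$. Furthermore,
$$P\left(S_n\ge m_n(h_n)\sqrt{B_n}\right) = \varphi_n(h_n)\int_{m_n(h_n)}^\infty e^{-h_nu}\,dP\left(\overline S_n<u\right) = \varphi_n(h_n)e^{-h_n^2}\int_{m_n(h_n)-h_n}^\infty e^{-h_nv}\,dP\left(\overline S_n-h_n<v\right)$$
$$= \varphi_n(h_n)e^{-h_n^2}\int_{m_n(h_n)-h_n}^\infty e^{-h_nv}\,d\left(\Phi(v)+R_n(v)\right). \tag{22}$$
We have
$$\int_{m_n(h_n)-h_n}^\infty e^{-h_nv}\,d\Phi(v) = \frac{e^{h_n^2/2}}{\sqrt{2\pi}}\int_{m_n(h_n)-h_n}^\infty e^{-(v+h_n)^2/2}\,dv = \frac{e^{h_n^2/2}}{\sqrt{2\pi}}\int_{m_n(h_n)}^\infty e^{-v^2/2}\,dv\sim\frac{e^{(h_n^2-m_n^2(h_n))/2}}{\sqrt{2\pi}\,m_n(h_n)}, \tag{23}$$
provided $m_n(h_n)\to\infty$. Moreover, integrating by parts,
$$\left|\int_{m_n(h_n)-h_n}^\infty e^{-h_nv}\,dR_n(v)\right| = \left|-R_n(m_n(h_n)-h_n)\,e^{-h_n(m_n(h_n)-h_n)}-\int_{m_n(h_n)-h_n}^\infty R_n(v)\,d\left(e^{-h_nv}\right)\right|\le2\Delta_n\,e^{-h_n(m_n(h_n)-h_n)}. \tag{24}$$

Put $x_n = u_n\tau_n$, where $\tau_n\to\infty$ slowly enough to satisfy $x_n\to\infty$, $x_n^3 = o(\sqrt n/\gamma_n)$ and $x_n = o(\sqrt{B_n}/M_n)$. Note that in view of relation (16), we have $g_n(8x_n) = o(1)$. Let $h_n$ be a solution of the equation
$$m_n(h_n) = u_n. \tag{25}$$
The function $m_n(h)$ is strictly increasing, $m_n(0) = 0$ and, by relations (19) and (16), the inequalities $m_n(4x_n) = 4x_n+o(x_n^{-1})\ge x_n>u_n$ hold for all sufficiently large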

$n$. It follows that the unique solution of equation (25) exists for all sufficiently large $n$. Moreover, relation (19) yields that $h_n = u_n+o(x_n^{-1})$ and
$$m_n^2(h_n)-h_n^2 = \left(h_n+o(x_n^{-1})\right)^2-h_n^2 = o(h_nx_n^{-1}) = o(1).$$
It follows from relations (21)-(24) and (16) that
$$P\left(S_n\ge u_n\sqrt{B_n}\right) = \varphi_n(h_n)e^{-h_n^2}\left(\frac{e^{(h_n^2-m_n^2(h_n))/2}}{\sqrt{2\pi}\,m_n(h_n)}(1+o(1))+O(\Delta_n)\right) = e^{-h_n^2/2}\left(\frac{1+o(1)}{\sqrt{2\pi}\,m_n(h_n)}+O(x_n^{-1})\right)(1+o(1))$$
$$= e^{-u_n^2/2+o(1)}\left(\frac{1+o(1)}{\sqrt{2\pi}\,u_n}+o(u_n^{-1})\right)(1+o(1)) = (1-\Phi(u_n))(1+o(1)).$$
Theorem 1 is proved.

Finally, we mention some unsolved problems. In Frolov, Martikainen and Steinebach [18], one can find more exact results on large deviations for sums of independent random variables in the scheme of series. There, the conditions are imposed on the logarithms of the m.g.f.s of the summands. At present we cannot adapt the techniques of [18] to combinatorial sums. We see from relation (4) that the m.g.f. of $S_n/\sqrt{B_n}$ is, up to the factor $1/n!$, the permanent of the matrix $\|Ee^{zX_{nij}/\sqrt{B_n}}\|$. Above, the method of investigating the behaviour of this permanent led to bounds with $\gamma_n/\sqrt n$ instead of analogues of the Lyapunov ratios. The second problem is that the proof in [18] involves some bounds in the CLT whose variants for combinatorial sums are unknown. Solutions of these problems could yield more exact results under weaker conditions.

References

[1] Wald A., Wolfowitz J., 1944. Statistical tests based on permutations of observations. Ann. Math. Statist. 15, 358-372.
[2] Noether G.E., 1949. On a theorem by Wald and Wolfowitz. Ann. Math. Statist. 20, 455-458.
[3] Hoeffding W., 1951. A combinatorial central limit theorem. Ann. Math. Statist. 22, 558-566.
[4] Motoo M., 1957. On Hoeffding's combinatorial central limit theorem. Ann. Inst. Statist. Math. 8, 145-154.
[5] Kolchin V.F., Chistyakov V.P., 1973. On a combinatorial limit theorem. Theor. Probab. Appl. 18, 728-739.
[6] Bolthausen E., 1984. An estimate of the remainder in a combinatorial central limit theorem. Z. Wahrsch. verw. Geb. 66, 379-386.
[7] von Bahr B., 1976. Remainder term estimate in a combinatorial central limit theorem. Z. Wahrsch. verw. Geb. 35, 131-139.

[8] Ho S.T., Chen L.H.Y., 1978. An $L_p$ bound for the remainder in a combinatorial central limit theorem. Ann. Probab. 6, 231-249.
[9] Goldstein L., 2005. Berry-Esseen bounds for combinatorial central limit theorems and pattern occurrences, using zero and size biasing. J. Appl. Probab. 42, 661-683.
[10] Neammanee K., Suntornchost J., 2005. A uniform bound on a combinatorial central limit theorem. Stoch. Anal. Appl. 23, 559-578.
[11] Neammanee K., Rattanawong P., 2009. A constant on a uniform bound of a combinatorial central limit theorem. J. Math. Research 1, 91-103.
[12] Chen L.H.Y., Goldstein L., Shao Q.M., 2011. Normal approximation by Stein's method. Springer.
[13] Chen L.H.Y., Fang X., 2015. On the error bound in a combinatorial central limit theorem. Bernoulli 21, No. 1, 335-359.
[14] Frolov A.N., 2014. Esseen type bounds of the remainder in a combinatorial CLT. J. Statist. Planning and Inference 149, 90-97.
[15] Frolov A.N., 2015a. Bounds of the remainder in a combinatorial central limit theorem. Statist. Probab. Letters 105, 37-46.
[16] Frolov A.N., 2015b. On the probabilities of moderate deviations for combinatorial sums. Vestnik St. Petersburg University. Mathematics 48, No. 1, 23-28. Allerton Press, Inc., 2015.
[17] Frolov A.N., 2017. On Esseen type inequalities for combinatorial random sums. Communications in Statistics - Theory and Methods 46 (12), 5932-5940.
[18] Frolov A.N., Martikainen A.I., Steinebach J., 1997. Erdős-Rényi-Shepp type laws in non-i.i.d. case. Studia Sci. Math. Hungar. 34, 165-181.