TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES


SVANTE JANSON

Abstract. We give explicit bounds for the tail probabilities for sums of independent geometric or exponential variables, possibly with different parameters.

Date: 28 June, 2014; typo corrected 24 September, 2017. Partly supported by the Knut and Alice Wallenberg Foundation.

1. Introduction and notation

Let $X = \sum_{i=1}^n X_i$, where $n \ge 1$ and $X_i$, $i = 1, \dots, n$, are independent geometric random variables with possibly different distributions: $X_i \sim \mathrm{Ge}(p_i)$ with $0 < p_i \le 1$, i.e.,

$$P(X_i = k) = p_i (1 - p_i)^{k-1}, \qquad k = 1, 2, \dots. \tag{1.1}$$

Our goal is to estimate the tail probabilities $P(X \ge x)$. (Since $X$ is integer-valued, it suffices to consider integer $x$. However, it is convenient to allow arbitrary real $x$, and we do so.)

We define

$$\mu := E X = \sum_{i=1}^n E X_i = \sum_{i=1}^n \frac{1}{p_i}, \tag{1.2}$$

$$p_* := \min_i p_i. \tag{1.3}$$

We shall see that $p_*$ plays an important role in our estimates, which roughly speaking show that the tail probabilities of $X$ decrease at about the same rate as the tail probabilities of $\mathrm{Ge}(p_*)$, i.e., as for the variable $X_i$ with smallest $p_i$ and thus fattest tail.

Recall the simple and well-known fact that (1.1) implies that, for any non-zero $z$ such that $|z|(1 - p_i) < 1$,

$$E z^{X_i} = \sum_{k=1}^{\infty} z^k P(X_i = k) = \frac{p_i z}{1 - (1 - p_i) z} = \frac{p_i}{z^{-1} - 1 + p_i}. \tag{1.4}$$

For future use, note that since $x \mapsto -\ln(1 - x)$ is convex on $(0, 1)$ and $0$ for $x = 0$,

$$-\ln(1 - x) \le -\frac{x}{y} \ln(1 - y), \qquad 0 < x \le y < 1. \tag{1.5}$$
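The generating function identity (1.4) and the mean (1.2) can be checked numerically. The following is a minimal sketch, not part of the paper; the function names are hypothetical and the series is simply truncated after a fixed number of terms, which is adequate only when $|z|(1 - p)$ is well below 1.

```python
import math


def geometric_pgf_closed_form(z, p):
    """Closed form in (1.4): p*z / (1 - (1-p)*z), valid when |z|*(1-p) < 1."""
    return p * z / (1.0 - (1.0 - p) * z)


def geometric_pgf_series(z, p, terms=5000):
    """Truncated series in (1.4): sum over k >= 1 of z^k * p * (1-p)^(k-1)."""
    total = 0.0
    term = p * z              # the k = 1 term
    ratio = (1.0 - p) * z     # each later term is the previous one times this
    for _ in range(terms):
        total += term
        term *= ratio
    return total


def mean_of_sum(ps):
    """mu from (1.2): the sum of 1/p_i."""
    return sum(1.0 / p for p in ps)


if __name__ == "__main__":
    z, p = 1.2, 0.3           # illustrative values with |z|*(1-p) = 0.84 < 1
    print(geometric_pgf_closed_form(z, p))   # approximately 2.25
    print(geometric_pgf_series(z, p))
    print(mean_of_sum([0.5, 0.25]))          # 2 + 4 = 6.0
```

The two generating function values should agree up to rounding, since the neglected tail of the series is of order $0.84^{5000}$ for these inputs.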

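Remark 1.1. The theorems and corollaries below hold also, with the same proofs, for infinite sums $X = \sum_{i=1}^{\infty} X_i$, provided $E X = \sum_i p_i^{-1} < \infty$.

Acknowledgement. This work was initiated during the 25th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms, AofA14, in Paris-Jussieu, June 2014, in response to a question by Donald Knuth. I thank Donald Knuth and Colin McDiarmid for helpful discussions.

2. Upper bounds for the upper tail

We begin with a simple upper bound obtained by the classical method of estimating the moment generating function (or probability generating function) and using the standard inequality (an instance of Markov's inequality)

$$P(X \ge x) \le z^{-x} E z^X, \qquad z \ge 1, \tag{2.1}$$

or equivalently

$$P(X \ge x) \le e^{-tx} E e^{tX}, \qquad t \ge 0. \tag{2.2}$$

Cf. the related Chernoff bounds for the binomial distribution that are proved by this method, see e.g. [3, Theorem 2.1], and see e.g. [1] for other applications of this method. See also e.g. [2, Chapter 2] or [4, Chapter 27] for more general large deviation theory.

Theorem 2.1. For any $p_1, \dots, p_n \in (0, 1]$ and any $\lambda \ge 1$,

$$P(X \ge \lambda\mu) \le e^{-p_* \mu (\lambda - 1 - \ln \lambda)}. \tag{2.3}$$

Proof. If $0 \le t < p_i$, then $e^{-t} - 1 + p_i \ge p_i - t > 0$, and thus by (1.4),

$$E e^{tX_i} = \frac{p_i}{e^{-t} - 1 + p_i} \le \frac{p_i}{p_i - t} = \Bigl(1 - \frac{t}{p_i}\Bigr)^{-1}. \tag{2.4}$$

Hence, if $0 \le t < p_* = \min_i p_i$, then

$$E e^{tX} = \prod_{i=1}^n E e^{tX_i} \le \prod_{i=1}^n \Bigl(1 - \frac{t}{p_i}\Bigr)^{-1} \tag{2.5}$$

and, by (2.2),

$$P(X \ge \lambda\mu) \le e^{-t\lambda\mu} E e^{tX} \le \exp\Bigl(-t\lambda\mu - \sum_{i=1}^n \ln\Bigl(1 - \frac{t}{p_i}\Bigr)\Bigr). \tag{2.6}$$

By (1.5) and $0 < p_*/p_i \le 1$, we have, for $0 \le t < p_*$,

$$-\ln\Bigl(1 - \frac{t}{p_i}\Bigr) \le -\frac{p_*}{p_i} \ln\Bigl(1 - \frac{t}{p_*}\Bigr). \tag{2.7}$$

Consequently, (2.6) yields

$$P(X \ge \lambda\mu) \le \exp\Bigl(-t\lambda\mu - \sum_{i=1}^n \frac{p_*}{p_i} \ln\Bigl(1 - \frac{t}{p_*}\Bigr)\Bigr) = \exp\Bigl(-t\lambda\mu - p_* \mu \ln\Bigl(1 - \frac{t}{p_*}\Bigr)\Bigr). \tag{2.8}$$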

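Choosing $t = (1 - \lambda^{-1}) p_*$ (which is optimal in (2.8)), we obtain (2.3).

As a corollary we obtain a bound that is generally much cruder, but has the advantage of not depending on the $p_i$'s at all.

Corollary 2.2. For any $p_1, \dots, p_n \in (0, 1]$ and any $\lambda \ge 1$,

$$P(X \ge \lambda\mu) \le \lambda e^{1 - \lambda} = e \lambda e^{-\lambda}. \tag{2.9}$$

Proof. Use $\mu \ge 1/p_i$ for each $i$, and thus $\mu p_* \ge 1$ in (2.3). (Alternatively, use $t = (1 - \lambda^{-1})/\mu$ in (2.8).)

The bound in Theorem 2.1 is rather sharp in many cases. Also the cruder (2.9) is almost sharp for $n = 1$ (a single $X_i$) and small $p_1$; in this case $\mu = 1/p_1$ and

$$P(X \ge \lambda\mu) = (1 - p_1)^{\lceil \lambda\mu \rceil - 1} = \exp\bigl(-\lambda + O(\lambda p_1)\bigr). \tag{2.10}$$

Nevertheless, we can improve (2.3) somewhat, in particular when $p_* = \min_i p_i$ is not small, by using more careful estimates.

Theorem 2.3. For any $p_1, \dots, p_n \in (0, 1]$ and any $\lambda \ge 1$,

$$P(X \ge \lambda\mu) \le \lambda^{-1} (1 - p_*)^{(\lambda - 1 - \ln \lambda)\mu}. \tag{2.11}$$

The proof is given below. We note that Theorem 2.3 implies a minor improvement of Corollary 2.2:

Corollary 2.4. For any $p_1, \dots, p_n \in (0, 1]$ and any $\lambda \ge 1$,

$$P(X \ge \lambda\mu) \le e^{1 - \lambda}. \tag{2.12}$$

Proof. Use (2.11) and $(1 - p_*)^{\mu} \le e^{-p_* \mu} \le e^{-1}$.

We begin the proof of Theorem 2.3 with two lemmas yielding a minor improvement of (2.1) using the fact that the variables are geometric. (The lemmas actually use only that one of the variables is geometric.)

Lemma 2.5. (i) For any integers $j$ and $k$ with $j \ge k$,

$$P(X \ge j) \ge (1 - p_*)^{j - k} P(X \ge k). \tag{2.13}$$

(ii) For any real numbers $x$ and $y$ with $x \ge y$,

$$P(X \ge x) \ge (1 - p_*)^{x - y + 1} P(X \ge y). \tag{2.14}$$

Proof. (i). We may without loss of generality assume that $p_* = p_1$. Then, for any integers $i$, $j$, $k$ with $j \ge k$,

$$P(X \ge j \mid X - X_1 = i) = P(X_1 \ge j - i) = (1 - p_1)^{(j - i - 1)_+}, \tag{2.15}$$

and similarly for $P(X \ge k \mid X - X_1 = i)$. Since $(j - i - 1)_+ \le j - k + (k - i - 1)_+$, it follows that

$$P(X \ge j \mid X - X_1 = i) \ge (1 - p_1)^{j - k} P(X \ge k \mid X - X_1 = i) \tag{2.16}$$

for every $i$, and thus (2.13) follows by taking the expectation.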

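(ii). For real $x$ and $y$ we obtain from (2.13)

$$P(X \ge x) = P(X \ge \lceil x \rceil) \ge (1 - p_*)^{\lceil x \rceil - \lceil y \rceil} P(X \ge \lceil y \rceil) \ge (1 - p_*)^{x - y + 1} P(X \ge y). \tag{2.17}$$

Lemma 2.6. For any $x \ge 0$ and $z \ge 1$ with $z(1 - p_*) < 1$,

$$P(X \ge x) \le \frac{1 - z(1 - p_*)}{p_*} z^{-x} E z^X. \tag{2.18}$$

Proof. Since $z \ge 1$, (2.13) implies that for every $k \ge 1$,

$$
\begin{aligned}
E z^X \ge E\bigl(z^X \mathbf{1}\{X \ge k\}\bigr)
&= E\Bigl(\Bigl(z^k + (z - 1) \sum_{j=k}^{X-1} z^j\Bigr) \mathbf{1}\{X \ge k\}\Bigr) \\
&= E\Bigl(z^k \mathbf{1}\{X \ge k\} + (z - 1) \sum_{j=k}^{\infty} z^j \mathbf{1}\{X \ge j + 1\}\Bigr) \\
&= z^k P(X \ge k) + (z - 1) \sum_{j=k}^{\infty} z^j P(X \ge j + 1) \\
&\ge z^k P(X \ge k) \Bigl(1 + (z - 1) \sum_{j=k}^{\infty} z^{j-k} (1 - p_*)^{j + 1 - k}\Bigr) \\
&= z^k P(X \ge k) \Bigl(1 + \frac{(z - 1)(1 - p_*)}{1 - z(1 - p_*)}\Bigr) \\
&= z^k P(X \ge k) \frac{p_*}{1 - z(1 - p_*)}.
\end{aligned}
\tag{2.19}
$$

The result (2.18) follows when $x = k$ is a positive integer. The general case follows by taking $k = \max(\lceil x \rceil, 1)$ since then $P(X \ge x) = P(X \ge k)$.

Proof of Theorem 2.3. We may assume that $p_* < 1$. (Otherwise every $p_i = 1$ and $X_i = 1$ a.s., so $X = n = \mu$ a.s. and the result is trivial.) We then choose

$$z := \frac{\lambda - p_*}{\lambda (1 - p_*)}, \tag{2.20}$$

i.e.,

$$z^{-1} = \frac{\lambda (1 - p_*)}{\lambda - p_*} = 1 - \frac{(\lambda - 1) p_*}{\lambda - p_*}; \tag{2.21}$$

note that $z^{-1} \le 1$ (so $z \ge 1$) and $z^{-1} > 1 - p_* \ge 1 - p_i$ for every $i$. Thus, by (1.4),

$$E z^X = \prod_{i=1}^n E z^{X_i} = \prod_{i=1}^n \frac{p_i}{z^{-1} - 1 + p_i} = \prod_{i=1}^n \Bigl(1 - \frac{1 - z^{-1}}{p_i}\Bigr)^{-1}. \tag{2.22}$$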

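By (2.22), (2.7) (with $t = 1 - z^{-1} < p_*$) and (2.21),

$$
\ln E z^X = \sum_{i=1}^n -\ln\Bigl(1 - \frac{1 - z^{-1}}{p_i}\Bigr)
\le -\sum_{i=1}^n \frac{p_*}{p_i} \ln\Bigl(1 - \frac{1 - z^{-1}}{p_*}\Bigr)
= -p_* \mu \ln\Bigl(1 - \frac{\lambda - 1}{\lambda - p_*}\Bigr)
= p_* \mu \ln \frac{\lambda - p_*}{1 - p_*}.
\tag{2.23}
$$

Furthermore, by (2.20),

$$\frac{1 - z(1 - p_*)}{p_*} = \frac{1 - (\lambda - p_*)/\lambda}{p_*} = \frac{1}{\lambda}. \tag{2.24}$$

Hence, Lemma 2.6, (2.20) and (2.23) yield

$$
\begin{aligned}
\ln P(X \ge \lambda\mu) &\le -\ln \lambda - \lambda\mu \ln z + \ln E z^X \\
&\le -\ln \lambda - \lambda\mu \ln \frac{\lambda - p_*}{\lambda (1 - p_*)} + p_* \mu \ln \frac{\lambda - p_*}{1 - p_*} \\
&= -\ln \lambda + \lambda\mu \ln(1 - p_*) + \mu f(\lambda),
\end{aligned}
\tag{2.25}
$$

where

$$
f(\lambda) := -\lambda \ln \frac{\lambda - p_*}{\lambda} + p_* \ln \frac{\lambda - p_*}{1 - p_*}
= -(\lambda - p_*) \ln(\lambda - p_*) + \lambda \ln \lambda - p_* \ln(1 - p_*).
\tag{2.26}
$$

We have $f(1) = -\ln(1 - p_*)$ and, for $\lambda \ge 1$, using (1.5),

$$f'(\lambda) = -\ln(\lambda - p_*) + \ln \lambda = -\ln\Bigl(1 - \frac{p_*}{\lambda}\Bigr) \le -\frac{1}{\lambda} \ln(1 - p_*). \tag{2.27}$$

Consequently, by integrating (2.27), for all $\lambda \ge 1$,

$$f(\lambda) \le -\ln(1 - p_*) - \ln \lambda \, \ln(1 - p_*), \tag{2.28}$$

and the result (2.11) follows by (2.25).

Remark 2.7. Note that for large $\lambda$, the exponents above are roughly linear in $\lambda$, while for $\lambda = 1 + o(1)$ we have $\lambda - 1 - \ln \lambda \sim \frac{1}{2}(\lambda - 1)^2$ so the exponents are quadratic in $\lambda - 1$. The latter is to be expected from the central limit theorem. However, if $\lambda = 1 + \varepsilon$ with $\varepsilon$ very small and the central limit theorem is applicable, then $P(X \ge (1 + \varepsilon)\mu)$ is roughly $\exp(-\varepsilon^2 \mu^2 / (2\sigma^2))$, where $\sigma^2 = \operatorname{Var} X = \sum_{i=1}^n \operatorname{Var} X_i = \sum_{i=1}^n \frac{1 - p_i}{p_i^2}$. Hence, in this case the exponents in (2.3) and (2.11) are asymptotically too small by a factor of roughly, for small $p_i$,

$$\frac{p_* \mu}{\mu^2/\sigma^2} \approx \frac{p_* \sum_{i=1}^n p_i^{-2}}{\sum_{i=1}^n p_i^{-1}}, \tag{2.29}$$

which may be much smaller than 1. (For example, if $p_2 = \dots = p_n$ and $p_1 = p_2/n^{1/3}$.)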
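To get a feeling for how the four upper bounds (2.3), (2.9), (2.11) and (2.12) compare with the true tail, one can compute $P(X \ge \lambda\mu)$ exactly for a small example by convolving the geometric distributions. The sketch below is an illustration only and not from the paper; the function names and the parameter values are assumptions chosen for the example, and the exact tail is obtained as one minus a finite sum, so it loses accuracy once the tail drops near machine precision.

```python
import math


def pmf_of_sum(ps, kmax):
    """Exact P(X = k) for k = 0..kmax, where X is a sum of independent Ge(p_i) on {1, 2, ...}."""
    dist = [1.0] + [0.0] * kmax                      # start from a point mass at 0
    for p in ps:
        geo = [0.0] + [p * (1.0 - p) ** (k - 1) for k in range(1, kmax + 1)]
        new = [0.0] * (kmax + 1)
        for i, mass in enumerate(dist):
            if mass == 0.0:
                continue
            for k in range(1, kmax + 1 - i):
                new[i + k] += mass * geo[k]          # discrete convolution step
        dist = new
    return dist


def upper_tail(ps, x):
    """Exact P(X >= x), computed as 1 - P(X <= ceil(x) - 1)."""
    m = math.ceil(x)
    dist = pmf_of_sum(ps, m)
    return 1.0 - sum(dist[:m])


def tail_bounds(ps, lam):
    """The upper bounds (2.3), (2.9), (2.11), (2.12) for P(X >= lam * mu), lam >= 1."""
    mu = sum(1.0 / p for p in ps)
    p_star = min(ps)
    c = lam - 1.0 - math.log(lam)
    return {
        "thm_2_1": math.exp(-p_star * mu * c),
        "cor_2_2": lam * math.exp(1.0 - lam),
        "thm_2_3": (1.0 - p_star) ** (c * mu) / lam,
        "cor_2_4": math.exp(1.0 - lam),
    }


if __name__ == "__main__":
    ps = [0.5, 0.3, 0.2]                             # illustrative parameters
    mu = sum(1.0 / p for p in ps)
    for lam in (1.5, 2.0, 3.0):
        print(lam, upper_tail(ps, lam * mu), tail_bounds(ps, lam))
```

For any admissible input the bound of Theorem 2.3 is at most that of Theorem 2.1, which in turn is at most that of Corollary 2.2, because $(1 - p_*)^{\mu} \le e^{-p_* \mu}$ and $p_* \mu \ge 1$.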

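3. Upper bounds for the lower tail

We can similarly bound the probability $P(X \le \lambda\mu)$ for $\lambda \le 1$. We give only a simple bound corresponding to Theorem 2.1. (Note that $\lambda - 1 - \ln \lambda > 0$ for both $\lambda \in (0, 1)$ and $\lambda \in (1, \infty)$.)

Theorem 3.1. For any $p_1, \dots, p_n \in (0, 1]$ and any $\lambda \le 1$,

$$P(X \le \lambda\mu) \le e^{-p_* \mu (\lambda - 1 - \ln \lambda)}. \tag{3.1}$$

Proof. We follow closely the proof of Theorem 2.1. If $t \ge 0$, then by (1.4),

$$E e^{-tX_i} = \frac{p_i}{e^{t} - 1 + p_i} \le \frac{p_i}{t + p_i} = \Bigl(1 + \frac{t}{p_i}\Bigr)^{-1}. \tag{3.2}$$

Hence

$$E e^{-tX} = \prod_{i=1}^n E e^{-tX_i} \le \prod_{i=1}^n \Bigl(1 + \frac{t}{p_i}\Bigr)^{-1} \tag{3.3}$$

and, in analogy to (2.2),

$$P(X \le \lambda\mu) \le e^{t\lambda\mu} E e^{-tX} \le \exp\Bigl(t\lambda\mu - \sum_{i=1}^n \ln\Bigl(1 + \frac{t}{p_i}\Bigr)\Bigr). \tag{3.4}$$

In analogy with (2.7), still by the convexity of $-\ln x$,

$$-\ln\Bigl(1 + \frac{t}{p_i}\Bigr) \le -\frac{p_*}{p_i} \ln\Bigl(1 + \frac{t}{p_*}\Bigr), \tag{3.5}$$

and (3.4) yields

$$P(X \le \lambda\mu) \le \exp\Bigl(t\lambda\mu - \sum_{i=1}^n \frac{p_*}{p_i} \ln\Bigl(1 + \frac{t}{p_*}\Bigr)\Bigr) = \exp\Bigl(t\lambda\mu - p_* \mu \ln\Bigl(1 + \frac{t}{p_*}\Bigr)\Bigr). \tag{3.6}$$

Choosing $t = (\lambda^{-1} - 1) p_*$, we obtain (3.1).

4. A lower bound

We show also a general lower bound for the upper tail probabilities, which shows that for constant $\lambda > 1$, the exponents in Theorems 2.1 and 2.3 are at most a constant factor away from best possible.

Theorem 4.1. For any $p_1, \dots, p_n \in (0, 1]$ and any $\lambda \ge 1$,

$$P(X \ge \lambda\mu) \ge \frac{(1 - p_*)^{1 + 1/p_*}}{2 p_* \mu} (1 - p_*)^{(\lambda - 1)\mu}. \tag{4.1}$$

Lemma 4.2. If $A \ge 1$ and $0 \le x \le 1/A$, then

$$-A\bigl(x + \ln(1 - x)\bigr) \ge -\ln\bigl(1 - A x^2/2\bigr). \tag{4.2}$$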
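Before reading the proof, the inequality (4.2) can be sanity-checked on a grid. This is only a numerical spot check under assumed sample values of $A$, not a substitute for the argument; the helper names are hypothetical, and the grid stops short of $x = 1/A$ so that $\ln(1 - x)$ stays finite when $A = 1$.

```python
import math


def lemma_4_2_gap(A, x):
    """Left side minus right side of (4.2): -A*(x + ln(1-x)) + ln(1 - A*x^2/2)."""
    return -A * (x + math.log1p(-x)) + math.log1p(-A * x * x / 2.0)


def min_gap_on_grid(A, points=1000):
    """Smallest gap over x = 0, h, 2h, ... with h = 1/(A*points), staying below 1/A."""
    xs = [(j / points) * (1.0 / A) for j in range(points)]
    return min(lemma_4_2_gap(A, x) for x in xs)


if __name__ == "__main__":
    for A in (1.0, 1.5, 3.0, 10.0, 100.0):   # illustrative values of A >= 1
        print(A, min_gap_on_grid(A))
```

Using `math.log1p` rather than `math.log(1 - x)` keeps the evaluation accurate for small $x$, where the two sides of (4.2) nearly cancel.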

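Proof. Let $f(x) := -A\bigl(x + \ln(1 - x)\bigr) + \ln\bigl(1 - A x^2/2\bigr)$. Then $f(0) = 0$ and

$$f'(x) = -A\Bigl(1 - \frac{1}{1 - x}\Bigr) - \frac{A x}{1 - A x^2/2} = \frac{A x}{1 - x} - \frac{A x}{1 - A x^2/2} \ge 0 \tag{4.3}$$

for $0 \le x < 1/A \le 1$, since then $0 < 1 - x \le 1 - A x^2/2$. Hence $f(x) \ge 0$ for $0 \le x \le 1/A$.

Proof of Theorem 4.1. Let $\varepsilon := 1/(p_* \mu)$. By Theorem 3.1 with $\lambda = 1 - \varepsilon$ and Lemma 4.2 with $A = p_* \mu \ge 1$,

$$P(X \le (1 - \varepsilon)\mu) \le \exp\bigl(-p_* \mu (-\varepsilon - \ln(1 - \varepsilon))\bigr) \le 1 - \frac{p_* \mu \varepsilon^2}{2} = 1 - \frac{1}{2 p_* \mu}. \tag{4.4}$$

Hence, $P(X \ge (1 - \varepsilon)\mu) \ge 1/(2 p_* \mu)$, and by Lemma 2.5(ii),

$$P(X \ge \lambda\mu) \ge (1 - p_*)^{(\lambda - 1 + \varepsilon)\mu + 1} P(X \ge (1 - \varepsilon)\mu) \ge (1 - p_*)^{(\lambda - 1 + \varepsilon)\mu + 1} \frac{1}{2 p_* \mu},$$

which completes the proof since $\varepsilon\mu = 1/p_*$.

5. Exponential distributions

In this section we assume that $X = \sum_{i=1}^n X_i$ where $X_i$, $i = 1, \dots, n$, are independent random variables with exponential distributions: $X_i \sim \mathrm{Exp}(a_i)$, with density function $a_i e^{-a_i x}$, $x > 0$, and expectation $E X_i = 1/a_i$. (Thus $a_i$ can be interpreted as a rate.) The exponential distribution is the continuous analogue of the geometric distribution, and the results above have (simpler) analogues for exponential distributions.

We now define

$$\mu := E X = \sum_{i=1}^n E X_i = \sum_{i=1}^n \frac{1}{a_i}, \tag{5.1}$$

$$a_* := \min_i a_i. \tag{5.2}$$

Theorem 5.1. Let $X = \sum_{i=1}^n X_i$ with $X_i \sim \mathrm{Exp}(a_i)$ independent.

(i) For any $\lambda \ge 1$,

$$P(X \ge \lambda\mu) \le \lambda^{-1} e^{-a_* \mu (\lambda - 1 - \ln \lambda)}. \tag{5.3}$$

(ii) For any $\lambda \ge 1$, we have also the simpler but weaker

$$P(X \ge \lambda\mu) \le e^{1 - \lambda}. \tag{5.4}$$

(iii) For any $\lambda \le 1$,

$$P(X \le \lambda\mu) \le e^{-a_* \mu (\lambda - 1 - \ln \lambda)}. \tag{5.5}$$

(iv) For any $\lambda \ge 1$,

$$P(X \ge \lambda\mu) \ge \frac{1}{2 e a_* \mu} e^{-a_* \mu (\lambda - 1)}. \tag{5.6}$$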
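In the special case of equal rates $a_1 = \dots = a_n = a$ the sum $X$ has an Erlang distribution with a closed-form tail, $P(X \ge x) = e^{-ax} \sum_{k=0}^{n-1} (ax)^k / k!$, so the four parts of Theorem 5.1 can be compared with exact values. The sketch below covers only this equal-rate case (where $a_* \mu = n$); the function names and the sample values of $n$, $a$ and $\lambda$ are assumptions made for the illustration.

```python
import math


def erlang_upper_tail(n, a, x):
    """Exact P(X >= x) for X a sum of n i.i.d. Exp(a): e^{-ax} * sum_{k<n} (ax)^k / k!."""
    ax = a * x
    term, total = 1.0, 0.0
    for k in range(n):
        total += term            # adds (ax)^k / k!
        term *= ax / (k + 1)
    return math.exp(-ax) * total


def upper_tail_bounds(n, a, lam):
    """(5.3), (5.4) and the lower bound (5.6) for lam >= 1, with all rates equal to a."""
    mu = n / a
    c = lam - 1.0 - math.log(lam)
    return {
        "upper_5_3": math.exp(-a * mu * c) / lam,
        "upper_5_4": math.exp(1.0 - lam),
        "lower_5_6": math.exp(-a * mu * (lam - 1.0)) / (2.0 * math.e * a * mu),
    }


def lower_tail_bound(n, a, lam):
    """(5.5): bound on P(X <= lam * mu) for 0 < lam <= 1, with all rates equal to a."""
    mu = n / a
    return math.exp(-a * mu * (lam - 1.0 - math.log(lam)))


if __name__ == "__main__":
    n, a = 5, 2.0                # illustrative values
    mu = n / a
    for lam in (1.5, 3.0):
        print(lam, erlang_upper_tail(n, a, lam * mu), upper_tail_bounds(n, a, lam))
    print(0.5, 1.0 - erlang_upper_tail(n, a, 0.5 * mu), lower_tail_bound(n, a, 0.5))
```

With unequal rates the exact distribution is hypoexponential and has no equally short formula, which is why this check restricts itself to the equal-rate case.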

Proof. Let $X_i^{(N)} \sim \mathrm{Ge}(a_i/N)$ be independent (for $N > \max_i a_i$). Then $X_i^{(N)}/N \overset{d}{\longrightarrow} X_i$ as $N \to \infty$, where $\overset{d}{\longrightarrow}$ denotes convergence in distribution, and thus $X^{(N)}/N \overset{d}{\longrightarrow} X$, where $X^{(N)} := \sum_{i=1}^n X_i^{(N)}$. Furthermore, $\mu^{(N)} := E X^{(N)} = N\mu$ and $p_*^{(N)} := \min_i (a_i/N) = a_*/N$. The results follow by taking the limit as $N \to \infty$ in (2.11), (2.12), (3.1) and (4.1).

Alternatively, we may imitate the proofs above, using $E e^{tX_i} = a_i/(a_i - t)$ for $t < a_i$.

References

[1] Stéphane Boucheron, Gábor Lugosi and Pascal Massart, Concentration Inequalities. Oxford Univ. Press, Oxford, 2013.
[2] Amir Dembo and Ofer Zeitouni, Large Deviations Techniques and Applications. 2nd ed., Springer, New York, 1998.
[3] Svante Janson, Tomasz Łuczak and Andrzej Ruciński, Random Graphs. Wiley, New York, 2000.
[4] Olav Kallenberg, Foundations of Modern Probability. 2nd ed., Springer, New York, 2002.

Department of Mathematics, Uppsala University, PO Box 480, SE-751 06 Uppsala, Sweden
E-mail address: svante.janson@math.uu.se
URL: http://www2.math.uu.se/~svante/