

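Chernoff Bounds

Wolfgang Mulzer

1 The General Bound

Let $P = (p_1, \dots, p_m)$ and $Q = (q_1, \dots, q_m)$ be two distributions on $m$ elements, i.e., $p_i, q_i \geq 0$, for $i = 1, \dots, m$, and $\sum_{i=1}^m p_i = \sum_{i=1}^m q_i = 1$. The Kullback-Leibler divergence or relative entropy of $P$ and $Q$ is defined as

$$D_{KL}(P \| Q) := \sum_{i=1}^m p_i \ln \frac{p_i}{q_i}.$$

If $m = 2$, i.e., $P = (p, 1-p)$ and $Q = (q, 1-q)$, we also write $D_{KL}(p \| q)$. The Kullback-Leibler divergence provides a measure of distance between the distributions $P$ and $Q$: it represents the expected loss of efficiency if we encode an $m$-letter alphabet with distribution $P$ with a code that is optimal for distribution $Q$. We can now state the general form of the Chernoff Bound:

Theorem 1.1. Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0, 1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^n X_i$. Then, for any $t \in [0, 1-p]$, we have

$$\Pr[X \geq (p+t)n] \leq e^{-D_{KL}(p+t \| p)\,n}.$$

2 Four Proofs

2.1 The Moment Method

The usual proof of Theorem 1.1 uses the exponential function $\exp$ and Markov's inequality. It is called the moment method because $\exp$ simultaneously encodes all moments of $X$, i.e., $X, X^2, X^3$, etc. The proof technique is very general and can be used to obtain several variants of Theorem 1.1. Let $\lambda > 0$ be a parameter to be determined later. We have

$$\Pr[X \geq (p+t)n] = \Pr[\lambda X \geq \lambda(p+t)n] = \Pr\big[e^{\lambda X} \geq e^{\lambda(p+t)n}\big].$$

From Markov's inequality, we obtain

$$\Pr\big[e^{\lambda X} \geq e^{\lambda(p+t)n}\big] \leq \frac{E[e^{\lambda X}]}{e^{\lambda(p+t)n}}.$$

Now, the independence of the $X_i$ yields

$$E[e^{\lambda X}] = E\Big[e^{\lambda \sum_{i=1}^n X_i}\Big] = E\Big[\prod_{i=1}^n e^{\lambda X_i}\Big] = \prod_{i=1}^n E\big[e^{\lambda X_i}\big] = \big(p e^{\lambda} + 1 - p\big)^n.$$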

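Thus,

$$\Pr[X \geq (p+t)n] \leq \left(\frac{p e^{\lambda} + 1 - p}{e^{\lambda(p+t)}}\right)^n, \qquad (1)$$

for every $\lambda > 0$. Optimizing for $\lambda$ using calculus, we get that the right hand side is minimized if

$$e^{\lambda} = \frac{(1-p)(p+t)}{p(1-p-t)}.$$

Plugging this into (1), we get

$$\Pr[X \geq (p+t)n] \leq \left[\left(\frac{p}{p+t}\right)^{p+t} \left(\frac{1-p}{1-p-t}\right)^{1-p-t}\right]^n = e^{-D_{KL}(p+t \| p)\,n},$$

as desired.

2.2 Chvátal's Method

Let $B(n, p)$ be the random variable that gives the number of heads in $n$ independent Bernoulli trials with success probability $p$. It is well known that

$$\Pr[B(n,p) = l] = \binom{n}{l} p^l (1-p)^{n-l},$$

for $l = 0, \dots, n$. Thus, for any $\tau \geq 1$ and $k \geq pn$, we get

$$\Pr[B(n,p) \geq k] = \sum_{i=k}^n \binom{n}{i} p^i (1-p)^{n-i} \leq \sum_{i=k}^n \binom{n}{i} p^i (1-p)^{n-i} \underbrace{\tau^{i-k}}_{\geq 1} + \sum_{i=0}^{k-1} \underbrace{\binom{n}{i} p^i (1-p)^{n-i} \tau^{i-k}}_{\geq 0} = \sum_{i=0}^n \binom{n}{i} p^i (1-p)^{n-i} \tau^{i-k}.$$

Using the Binomial theorem, we obtain

$$\Pr[B(n,p) \geq k] \leq \sum_{i=0}^n \binom{n}{i} (p\tau)^i (1-p)^{n-i} \tau^{-k} = \frac{(p\tau + 1 - p)^n}{\tau^k}.$$

If we write $k = (p+t)n$ and $\tau = e^{\lambda}$, we can conclude

$$\Pr[B(n,p) \geq (p+t)n] \leq \left(\frac{p e^{\lambda} + 1 - p}{e^{\lambda(p+t)}}\right)^n.$$

This is the same as (1), so we can complete the proof of Theorem 1.1 as in Section 2.1.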
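As a numerical sanity check of Theorem 1.1 and of the optimization step in Section 2.1, the following minimal sketch compares the exact binomial tail with the bound $e^{-D_{KL}(p+t \| p)n}$ and evaluates the right hand side of (1) at the optimal $\lambda$. The parameter values and the helper names `kl_bernoulli`, `binomial_upper_tail`, and `moment_bound` are illustrative assumptions, not part of the original text.

```python
import math

def kl_bernoulli(q, p):
    """D_KL(q || p) for two coins with heads probabilities q and p (natural log)."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total

def binomial_upper_tail(n, p, k):
    """Exact Pr[B(n, p) >= k], summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def moment_bound(n, p, t, lam):
    """Right hand side of (1) for a given lambda > 0."""
    return ((p * math.exp(lam) + 1 - p) / math.exp(lam * (p + t))) ** n

# Illustrative parameters (assumptions, not taken from the text).
n, p, t = 100, 0.3, 0.2
k = math.ceil((p + t) * n)

# Optimal lambda from Section 2.1: e^lambda = (1-p)(p+t) / (p(1-p-t)).
lam_opt = math.log((1 - p) * (p + t) / (p * (1 - p - t)))

exact_tail = binomial_upper_tail(n, p, k)
chernoff = math.exp(-kl_bernoulli(p + t, p) * n)
at_optimum = moment_bound(n, p, t, lam_opt)

print("exact tail      :", exact_tail)
print("Chernoff bound  :", chernoff)
print("(1) at optimum  :", at_optimum)
```

Under these assumptions, the value of (1) at the optimal $\lambda$ should agree with the Kullback-Leibler form up to rounding, and any other $\lambda > 0$ should give a weaker bound.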

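2.3 The Impagliazzo-Kabanets Method

Let $\lambda \in [0, 1]$ be a parameter to be chosen later. Let $I \subseteq \{1, \dots, n\}$ be a random index set obtained by including each element $i \in \{1, \dots, n\}$ with probability $\lambda$. We estimate $\Pr\big[\prod_{i \in I} X_i = 1\big]$ in two different ways, where the probability is over the random choice of $X_1, \dots, X_n$ and $I$. On the one hand, using the union bound and independence, we have

$$\Pr\Big[\prod_{i \in I} X_i = 1\Big] \leq \sum_{S \subseteq \{1,\dots,n\}} \Pr\Big[I = S \wedge \prod_{i \in S} X_i = 1\Big] = \sum_{S \subseteq \{1,\dots,n\}} \Pr[I = S] \cdot \Pr\Big[\prod_{i \in S} X_i = 1\Big]$$

$$= \sum_{S \subseteq \{1,\dots,n\}} \lambda^{|S|} (1-\lambda)^{n-|S|} p^{|S|} = \sum_{s=0}^n \binom{n}{s} (\lambda p)^s (1-\lambda)^{n-s} = (\lambda p + 1 - \lambda)^n, \qquad (2)$$

by the Binomial theorem. On the other hand, by the law of total probability,

$$\Pr\Big[\prod_{i \in I} X_i = 1\Big] \geq \Pr\Big[\prod_{i \in I} X_i = 1 \,\Big|\, X \geq (p+t)n\Big] \cdot \Pr[X \geq (p+t)n].$$

Now, fix $X_1, \dots, X_n$ with $X \geq (p+t)n$. For the fixed choice of $X_1 = x_1, \dots, X_n = x_n$, the probability $\Pr_I\big[\prod_{i \in I} x_i = 1\big]$ is exactly the probability that $I$ avoids all the $n - X$ indices $i$ where $x_i = 0$. Thus,

$$\Pr_I\Big[\prod_{i \in I} x_i = 1\Big] = (1-\lambda)^{n-X} \geq (1-\lambda)^{(1-p-t)n}.$$

Since the bound holds uniformly for every choice of $x_1, \dots, x_n$ with $X \geq (p+t)n$, we get

$$\Pr\Big[\prod_{i \in I} X_i = 1 \,\Big|\, X \geq (p+t)n\Big] \geq (1-\lambda)^{(1-p-t)n},$$

so

$$\Pr\Big[\prod_{i \in I} X_i = 1\Big] \geq (1-\lambda)^{(1-p-t)n} \Pr[X \geq (p+t)n].$$

Combining with (2),

$$\Pr[X \geq (p+t)n] \leq \left(\frac{\lambda p + 1 - \lambda}{(1-\lambda)^{1-p-t}}\right)^n. \qquad (3)$$

Using calculus, we get that the right hand side is minimized for $\lambda = t / ((1-p)(p+t))$ (note that $\lambda \leq 1$ for $t \leq 1-p$). Plugging this into (3),

$$\Pr[X \geq (p+t)n] \leq \left[\left(\frac{p}{p+t}\right)^{p+t} \left(\frac{1-p}{1-p-t}\right)^{1-p-t}\right]^n = e^{-D_{KL}(p+t \| p)\,n},$$

as desired.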
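The two estimates of Section 2.3 can be checked by brute force for small $n$. The sketch below enumerates every outcome $x$ and every index set $I$ to compute $\Pr[\prod_{i \in I} X_i = 1]$ exactly, compares it with the closed form in (2), and evaluates (3) at $\lambda = t/((1-p)(p+t))$. The parameter values and function names are assumptions chosen for illustration.

```python
import itertools
import math

def prob_all_selected_are_one(n, p, lam):
    """Exact Pr[prod_{i in I} X_i = 1] by enumerating every x and every index set I."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        pr_x = math.prod(p if bit else 1 - p for bit in x)
        for chosen in itertools.product((0, 1), repeat=n):
            pr_i = math.prod(lam if c else 1 - lam for c in chosen)
            # The product over I equals 1 iff every chosen position holds a 1.
            if all(bit == 1 for bit, c in zip(x, chosen) if c):
                total += pr_x * pr_i
    return total

def ik_bound(n, p, t, lam):
    """Right hand side of (3)."""
    return ((lam * p + 1 - lam) / (1 - lam) ** (1 - p - t)) ** n

# Small illustrative instance (assumption): 2^n * 2^n pairs are enumerated.
n, p, t = 6, 0.3, 0.2
lam_opt = t / ((1 - p) * (p + t))

enumerated = prob_all_selected_are_one(n, p, lam_opt)
closed_form = (lam_opt * p + 1 - lam_opt) ** n

kl = (p + t) * math.log((p + t) / p) + (1 - p - t) * math.log((1 - p - t) / (1 - p))

print("enumerated  :", enumerated)
print("closed form :", closed_form)
print("(3) at optimum:", ik_bound(n, p, t, lam_opt), " KL form:", math.exp(-kl * n))
```

Because the events $I = S$ are disjoint, the enumerated probability matches the closed form in (2) exactly in this independent setting, not merely as an upper bound.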

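2.4 The Coding Theoretic Argument

The next proof, due to Luc Devroye, Gábor Lugosi, and Pat Morin, is inspired by coding theory. Let $\{0,1\}^n$ be the set of all bit strings of length $n$, and let $w : \{0,1\}^n \to [0,1]$ be a weight function. We call $w$ valid if $\sum_{x \in \{0,1\}^n} w(x) \leq 1$. The following lemma says that for any probability distribution $p_x$ on $\{0,1\}^n$, a valid weight function is unlikely to be substantially larger than $p_x$.

Lemma 2.1. Let $D$ be a probability distribution on $\{0,1\}^n$ that assigns to each $x \in \{0,1\}^n$ a probability $p_x$, and let $w$ be a valid weight function. For any $s \geq 1$, we have

$$\Pr_{x \sim D}[w(x) \geq s\,p_x] \leq 1/s.$$

Proof. Let $Z_s = \{x \in \{0,1\}^n \mid w(x) \geq s\,p_x\}$. We have

$$\Pr_{x \sim D}[w(x) \geq s\,p_x] = \sum_{x \in Z_s,\, p_x > 0} p_x \leq \sum_{x \in Z_s,\, p_x > 0} p_x \frac{w(x)}{s\,p_x} \leq (1/s) \sum_{x \in Z_s} w(x) \leq 1/s,$$

since $w(x)/(s\,p_x) \geq 1$ for $x \in Z_s$, $p_x > 0$, and since $w$ is valid.

We now show that Lemma 2.1 implies Theorem 1.1. For this, we interpret the sequence $X_1, \dots, X_n$ as a bit string of length $n$. This induces a probability distribution $D$ that assigns to each $x \in \{0,1\}^n$ the probability $p_x = p^{k_x} (1-p)^{n-k_x}$, where $k_x$ denotes the number of 1-bits in $x$. We define a weight function $w : \{0,1\}^n \to [0,1]$ by $w(x) = (p+t)^{k_x} (1-p-t)^{n-k_x}$, for $x \in \{0,1\}^n$. Then $w$ is valid, since $w(x)$ is the probability that $x$ is generated by setting each bit to 1 independently with probability $p+t$. For $x \in \{0,1\}^n$, we have

$$\frac{w(x)}{p_x} = \left(\frac{p+t}{p}\right)^{k_x} \left(\frac{1-p-t}{1-p}\right)^{n-k_x}.$$

Since $((p+t)/p)\,((1-p)/(1-p-t)) \geq 1$, it follows that $w(x)/p_x$ is an increasing function of $k_x$. Hence, if $k_x \geq (p+t)n$, we have

$$\frac{w(x)}{p_x} \geq \left[\left(\frac{p+t}{p}\right)^{p+t} \left(\frac{1-p-t}{1-p}\right)^{1-p-t}\right]^n = e^{D_{KL}(p+t \| p)\,n}.$$

We now apply Lemma 2.1 to $D$ and $w$ to get

$$\Pr[X \geq (p+t)n] = \Pr_{x \sim D}[k_x \geq (p+t)n] \leq \Pr_{x \sim D}\left[\frac{w(x)}{p_x} \geq e^{D_{KL}(p+t \| p)\,n}\right] \leq e^{-D_{KL}(p+t \| p)\,n},$$

as claimed in Theorem 1.1.

We provide some coding-theoretic background to explain the intuition behind the proof. A code for $\{0,1\}^n$ is an injective function $C : \{0,1\}^n \to \{0,1\}^*$. The images of $C$ are called codewords. A code is called prefix-free if no codeword is the prefix of another codeword, i.e., for all $x, y \in \{0,1\}^n$ with $x \neq y$, we have that if $|C(x)| \leq |C(y)|$, then $C(x)$ and $C(y)$ differ in at least one of the first $|C(x)|$ bit positions. A prefix-free code has a natural representation as a rooted binary tree in which the leaves correspond to elements of $\{0,1\}^n$. Even though the codeword lengths in a prefix-free code may vary, this structure imposes a restriction on the allowed lengths. This is formalized in Kraft's inequality.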

Lemma 2.2 (Kraft's inequality). Let $C : \{0,1\}^n \to \{0,1\}^*$ be a prefix-free code. Then,

$$\sum_{x \in \{0,1\}^n} 2^{-|C(x)|} \leq 1.$$

Conversely, given a function $l : \{0,1\}^n \to \mathbb{N}$ with

$$\sum_{x \in \{0,1\}^n} 2^{-l(x)} \leq 1,$$

there exists a prefix-free code $C : \{0,1\}^n \to \{0,1\}^*$ with $|C(x)| = l(x)$ for all $x \in \{0,1\}^n$.

Proof. Let $m = \max_{x \in \{0,1\}^n} |C(x)|$, and let $y$ be a random element of $\{0,1\}^m$. Then, for each $x \in \{0,1\}^n$, the probability that $C(x)$ is a prefix of $y$ is exactly $2^{-|C(x)|}$. Furthermore, since $C$ is prefix-free, these events are mutually exclusive. Thus, $\sum_{x \in \{0,1\}^n} 2^{-|C(x)|} \leq 1$, as claimed.

Next, we prove the second part. Let $m = \max_{x \in \{0,1\}^n} l(x)$ and let $T$ be a complete binary tree of height $m$. We construct $C$ according to the following algorithm: we set $X = \{0,1\}^n$, and we pick $x^* \in X$ with $l(x^*) = \min_{x \in X} l(x)$. Then we select a node $v \in T$ with depth $l(x^*)$. We assign to $C(x^*)$ the codeword of length $l(x^*)$ that corresponds to $v$, and we remove $v$ and all its descendants from $T$. This deletes exactly $2^{m - l(x^*)}$ leaves from $T$. Next, we remove $x^*$ from $X$ and we repeat this procedure until $X$ is empty. While $X \neq \emptyset$, we have

$$\sum_{x \in \{0,1\}^n \setminus X} 2^{m - l(x)} < 2^m,$$

so $T$ contains in each iteration at least one leaf and thus also at least one node of depth $l(x^*)$. Since we assign the nodes by increasing depth, and since all descendants of an assigned node are deleted from the tree, the resulting code is prefix-free.

Kraft's inequality shows that a prefix-free code $C$ induces a valid weight function $w(x) = 2^{-|C(x)|}$. Thus, Lemma 2.1 implies that for any probability distribution $p_x$ on $\{0,1\}^n$ and for any prefix-free code, the probability mass of the strings $x$ with codeword length at most $\log(1/p_x) - s$ is at most $2^{-s}$, where $\log$ denotes the binary logarithm. Now, if we set $l(x) = \lceil -k_x \log(p+t) - (n-k_x) \log(1-p-t) \rceil$ for $x \in \{0,1\}^n$, the converse of Kraft's inequality shows that there exists a prefix-free code $C^*$ with $|C^*(x)| = l(x)$. The calculation above shows that $C^*$ saves roughly

$$n \left( (p+t) \log\frac{p+t}{p} + (1-p-t) \log\frac{1-p-t}{1-p} \right)$$

bits over $\log(1/p_x)$ for any $x$ with $k_x = (p+t)n$, which almost gives the desired result. We generalize to arbitrary valid weight functions to avoid the slack introduced by the ceiling function.
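The converse direction of Lemma 2.2 is constructive, and a compact version can be sketched in code. Instead of deleting subtrees from an explicit tree $T$, the sketch below processes the lengths in increasing order and always takes the leftmost free node at the required depth, which is one valid way of making the choice of $v$ in the proof. The function names, the restriction to lengths of at least 1, and the four-symbol example are assumptions made for illustration.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Exact value of sum_x 2^(-l(x)) for a dict symbol -> length."""
    return sum(Fraction(1, 2 ** l) for l in lengths.values())

def prefix_free_code(lengths):
    """Build a prefix-free code with the prescribed codeword lengths.

    Symbols are handled by increasing length; next_node is the index of the
    leftmost node at the current depth that is not below an assigned codeword.
    """
    if any(l < 1 for l in lengths.values()):
        raise ValueError("codeword lengths must be positive integers")
    if kraft_sum(lengths) > 1:
        raise ValueError("Kraft sum exceeds 1, so no prefix-free code exists")
    code = {}
    next_node = 0
    depth = 0
    for symbol, l in sorted(lengths.items(), key=lambda item: item[1]):
        next_node <<= l - depth   # descend to depth l
        depth = l
        code[symbol] = format(next_node, "0{}b".format(l))
        next_node += 1            # skip the subtree that was just assigned
    return code

def is_prefix_free(code):
    """True if no codeword is a prefix of a different symbol's codeword."""
    words = list(code.values())
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )

# Hypothetical example: four symbols with lengths 1, 2, 3, 3 (Kraft sum = 1).
example_lengths = {"a": 1, "b": 2, "c": 3, "d": 3}
example_code = prefix_free_code(example_lengths)
print(example_code, is_prefix_free(example_code))
```

Using exact fractions for the Kraft sum avoids rounding issues when the sum is exactly 1, which is the tight case of the lemma.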

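3 Useful Consequences

3.1 The Lower Tail

Corollary 3.1. Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0, 1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^n X_i$. Then, for any $t \in [0, p]$, we have

$$\Pr[X \leq (p-t)n] \leq e^{-D_{KL}(p-t \| p)\,n}.$$

Proof.

$$\Pr[X \leq (p-t)n] = \Pr[n - X \geq (1-p+t)n] = \Pr[X' \geq (1-p+t)n],$$

where $X' = \sum_{i=1}^n X'_i$ with independent random variables $X'_i \in \{0, 1\}$ such that $\Pr[X'_i = 1] = 1-p$. The result follows from $D_{KL}(1-p+t \| 1-p) = D_{KL}(p-t \| p)$.

3.2 Motwani-Raghavan version

Corollary 3.2. Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0, 1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^n X_i$ and $\mu = pn$. Then, for any $\delta \geq 0$, we have

$$\Pr[X \geq (1+\delta)\mu] \leq \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu},$$

and, for $\delta \in [0, 1)$,

$$\Pr[X \leq (1-\delta)\mu] \leq \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.$$

Proof. Both bounds are trivial for $\delta = 0$, so let $\delta > 0$ and write $a := \delta p/(1-p)$, so that $(1-p)/p = \delta/a$. For the first bound we may assume $(1+\delta)p < 1$, i.e., $a < 1$: if $(1+\delta)p > 1$ the probability is 0, and if $(1+\delta)p = 1$ it equals $p^n = (1+\delta)^{-(1+\delta)\mu}$. Setting $t = \delta\mu/n = \delta p$ in Theorem 1.1 yields

$$\Pr[X \geq (1+\delta)\mu] \leq \exp\left(-\mu\left[(1+\delta)\ln(1+\delta) + \left(\frac{1-p}{p} - \delta\right)\ln\left(1 - \frac{\delta p}{1-p}\right)\right]\right) = \left(\frac{(1-a)^{\delta - (1-p)/p}}{(1+\delta)^{1+\delta}}\right)^{\mu} \leq \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.$$

The last step holds because $\delta - (1-p)/p = -\delta(1-a)/a$ and $\ln(1-a) \geq -a/(1-a)$, so $(1-a)^{-\delta(1-a)/a} \leq e^{\delta}$.

Setting $t = \delta\mu/n = \delta p$ in Corollary 3.1 yields

$$\Pr[X \leq (1-\delta)\mu] \leq \exp\left(-\mu\left[(1-\delta)\ln(1-\delta) + \left(\frac{1-p}{p} + \delta\right)\ln\left(1 + \frac{\delta p}{1-p}\right)\right]\right) = \left(\frac{(1+a)^{-\delta - (1-p)/p}}{(1-\delta)^{1-\delta}}\right)^{\mu} \leq \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.$$

Here the last step holds because $-\delta - (1-p)/p = -\delta(1+a)/a$ and $\ln(1+a) \geq a/(1+a)$, so $(1+a)^{-\delta(1+a)/a} \leq e^{-\delta}$.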
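To illustrate how much is lost when passing from the Kullback-Leibler form to the bounds of Corollary 3.2, the following sketch evaluates both for a few values of $\delta$. The parameters $n = 200$, $p = 0.1$ and all function names are hypothetical choices made for this illustration.

```python
import math

def kl_bernoulli(q, p):
    """D_KL(q || p) for Bernoulli parameters, natural log, with 0 ln 0 = 0."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total

def kl_upper(n, p, delta):
    """Theorem 1.1 with t = delta * p."""
    return math.exp(-n * kl_bernoulli((1 + delta) * p, p))

def kl_lower(n, p, delta):
    """Corollary 3.1 with t = delta * p."""
    return math.exp(-n * kl_bernoulli((1 - delta) * p, p))

def mr_upper(mu, delta):
    """(e^delta / (1+delta)^(1+delta))^mu, computed in log space."""
    return math.exp(mu * (delta - (1 + delta) * math.log(1 + delta)))

def mr_lower(mu, delta):
    """(e^-delta / (1-delta)^(1-delta))^mu for 0 <= delta < 1."""
    return math.exp(mu * (-delta - (1 - delta) * math.log(1 - delta)))

# Illustrative parameters (assumptions).
n, p = 200, 0.1
mu = n * p
deltas = (0.1, 0.5, 0.9)

for delta in deltas:
    print(delta, kl_upper(n, p, delta), mr_upper(mu, delta),
          kl_lower(n, p, delta), mr_lower(mu, delta))
```

In each row the Kullback-Leibler value should be no larger than the corresponding Motwani-Raghavan value, reflecting that Corollary 3.2 is a relaxation of Theorem 1.1 and Corollary 3.1.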

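3.3 Handy Versions

Corollary 3.3. Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0, 1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^n X_i$ and $\mu = pn$. Then, for any $\delta \in (0, 1)$, we have

$$\Pr[X \leq (1-\delta)\mu] \leq e^{-\delta^2 \mu / 2}.$$

Proof. By Corollary 3.2,

$$\Pr[X \leq (1-\delta)\mu] \leq \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.$$

Using the power series expansion of $\ln(1-\delta)$, we get

$$(1-\delta)\ln(1-\delta) = -(1-\delta)\sum_{i=1}^{\infty} \frac{\delta^i}{i} = -\delta + \sum_{i=2}^{\infty} \frac{\delta^i}{i(i-1)} \geq -\delta + \delta^2/2.$$

Thus,

$$\Pr[X \leq (1-\delta)\mu] \leq e^{[-\delta + \delta - \delta^2/2]\mu} = e^{-\delta^2 \mu/2},$$

as claimed.

Corollary 3.4. Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0, 1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^n X_i$ and $\mu = pn$. Then, for any $\delta \geq 0$, we have

$$\Pr[X \geq (1+\delta)\mu] \leq e^{-\min\{\delta^2, \delta\}\mu/4}.$$

Proof. We may assume that $(1+\delta)p \leq 1$. Then Theorem 1.1 gives

$$\Pr[X \geq (1+\delta)pn] \leq e^{-D_{KL}((1+\delta)p \| p)\,n}.$$

Define $f(\delta) := D_{KL}((1+\delta)p \| p)$. Then

$$f'(\delta) = p\ln(1+\delta) - p\ln\big(1 - \delta p/(1-p)\big)$$

and

$$f''(\delta) = \frac{p}{(1+\delta)(1-p-\delta p)} \geq \frac{p}{1+\delta}.$$

By Taylor's theorem, we have

$$f(\delta) = f(0) + \delta f'(0) + \frac{\delta^2}{2} f''(\xi),$$

for some $\xi \in [0, \delta]$. Since $f(0) = f'(0) = 0$, it follows that

$$f(\delta) = \frac{\delta^2}{2} f''(\xi) \geq \frac{\delta^2 p}{2(1+\xi)} \geq \frac{\delta^2 p}{2(1+\delta)}.$$

For $\delta \geq 1$, we have $\delta/(1+\delta) \geq 1/2$; for $\delta < 1$, we have $1/(\delta+1) \geq 1/2$. This gives

$$f(\delta) \geq p\min\{\delta^2, \delta\}/4,$$

for all $\delta \geq 0$, and the claim follows.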
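The handy bounds of Corollaries 3.3 and 3.4 are easy to evaluate by hand, so it can be useful to see how far they are from the true tails. The sketch below computes exact binomial tails and the two bounds for one hypothetical parameter setting ($n = 200$, $p = 0.1$); the function names are assumptions introduced for this illustration.

```python
import math

def binomial_pmf(n, p, i):
    """Pr[B(n, p) = i]."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def upper_tail(n, p, threshold):
    """Exact Pr[X >= threshold] for X ~ B(n, p)."""
    k = math.ceil(threshold)
    return sum(binomial_pmf(n, p, i) for i in range(k, n + 1))

def lower_tail(n, p, threshold):
    """Exact Pr[X <= threshold] for X ~ B(n, p)."""
    k = math.floor(threshold)
    return sum(binomial_pmf(n, p, i) for i in range(0, k + 1))

def handy_lower(mu, delta):
    """Corollary 3.3: bound on Pr[X <= (1 - delta) mu] for delta in (0, 1)."""
    return math.exp(-delta**2 * mu / 2)

def handy_upper(mu, delta):
    """Corollary 3.4: bound on Pr[X >= (1 + delta) mu] for delta >= 0."""
    return math.exp(-min(delta**2, delta) * mu / 4)

# Illustrative parameters (assumptions).
n, p = 200, 0.1
mu = n * p

for delta in (0.25, 0.5, 0.75):
    print("lower", delta, lower_tail(n, p, (1 - delta) * mu), handy_lower(mu, delta))
for delta in (0.25, 0.5, 1.0, 2.0):
    print("upper", delta, upper_tail(n, p, (1 + delta) * mu), handy_upper(mu, delta))
```

The exact tails should sit below the handy bounds in every row; the gap indicates the price paid for the simpler closed form.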

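Corollary 3.5. Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0, 1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^n X_i$ and $\mu = pn$. Then, for any $\delta > 0$, we have

$$\Pr[|X - \mu| \geq \delta\mu] \leq 2e^{-\min\{\delta^2, \delta\}\mu/4}.$$

Proof. Combine Corollaries 3.3 and 3.4.

Corollary 3.6. Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0, 1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^n X_i$ and $\mu = pn$. For $t \geq 2e\mu$, we have

$$\Pr[X \geq t] \leq 2^{-t}.$$

Proof. By Corollary 3.2,

$$\Pr[X \geq (1+\delta)\mu] \leq \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu} \leq \left(\frac{e}{1+\delta}\right)^{(1+\delta)\mu}.$$

For $\delta \geq 2e - 1$, the denominator in the right hand side is at least $2e$, and the claim follows with $t = (1+\delta)\mu$.

4 Generalizations

We mention a few generalizations of the proof techniques from Section 2. Since the consequences from Section 3 are based on simple algebraic manipulation of the bounds, the same consequences also hold for the generalized settings.

4.1 Hoeffding-Extension

Theorem 4.1. Let $X_1, \dots, X_n$ be independent random variables with $X_i \in [0, 1]$ and $E[X_i] = p_i$. Set $X := \sum_{i=1}^n X_i$ and $p := (1/n)\sum_{i=1}^n p_i$. Then, for any $t \in [0, 1-p]$, we have

$$\Pr[X \geq (p+t)n] \leq e^{-D_{KL}(p+t \| p)\,n}.$$

Proof. The proof generalizes the moment method. Let $\lambda > 0$ be a parameter to be determined later. As before, Markov's inequality yields

$$\Pr\big[e^{\lambda X} \geq e^{\lambda(p+t)n}\big] \leq \frac{E[e^{\lambda X}]}{e^{\lambda(p+t)n}}.$$

Using independence, we get

$$E[e^{\lambda X}] = E\Big[e^{\lambda\sum_{i=1}^n X_i}\Big] = \prod_{i=1}^n E\big[e^{\lambda X_i}\big]. \qquad (4)$$

Now we need to estimate $E[e^{\lambda X_i}]$. The function $z \mapsto e^{\lambda z}$ is convex, so $e^{\lambda z} \leq (1-z)e^{0 \cdot \lambda} + z e^{1 \cdot \lambda}$ for $z \in [0, 1]$. Hence,

$$E\big[e^{\lambda X_i}\big] \leq E[1 - X_i + X_i e^{\lambda}] = 1 - p_i + p_i e^{\lambda}.$$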

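Going back to (4),

$$E[e^{\lambda X}] \leq \prod_{i=1}^n \big(1 - p_i + p_i e^{\lambda}\big).$$

Using the arithmetic-geometric mean inequality $\prod_{i=1}^n x_i \leq \big((1/n)\sum_{i=1}^n x_i\big)^n$, for $x_i \geq 0$, this is

$$E[e^{\lambda X}] \leq \big(1 - p + p e^{\lambda}\big)^n.$$

From here we continue as in Section 2.1.

4.2 Hypergeometric Distribution

Chvátal's proof generalizes to the hypergeometric distribution.

Theorem 4.2. Suppose we have an urn with $N$ balls, $P$ of which are red. We randomly draw $n$ balls from the urn without replacement. Let $H(N, P, n)$ denote the number of red balls in the sample. Set $p := P/N$. Then, for any $t \in [0, 1-p]$, we have

$$\Pr[H(N, P, n) \geq (p+t)n] \leq e^{-D_{KL}(p+t \| p)\,n}.$$

Proof. It is well known that

$$\Pr[H(N, P, n) = l] = \binom{N}{n}^{-1} \binom{P}{l} \binom{N-P}{n-l},$$

for $l = 0, \dots, n$.

Claim 4.3. For every $j \in \{0, \dots, n\}$, we have

$$\binom{N}{n}^{-1} \sum_{i=j}^n \binom{P}{i} \binom{N-P}{n-i} \binom{i}{j} \leq \binom{n}{j} p^j.$$

Proof. Consider the following random experiment: take a random permutation of the $N$ balls in the urn. Let $S$ be the sequence of the first $n$ elements in the permutation. Let $X$ be the number of $j$-subsets of $S$ that contain only red balls. We compute $E[X]$ in two different ways. On the one hand,

$$E[X] = \sum_{i=j}^n \binom{i}{j} \Pr[S \text{ contains } i \text{ red balls}] = \binom{N}{n}^{-1} \sum_{i=j}^n \binom{P}{i} \binom{N-P}{n-i} \binom{i}{j}. \qquad (5)$$

On the other hand, let $I \subseteq \{1, \dots, n\}$ with $|I| = j$. Then the probability that all the balls in the positions indexed by $I$ are red is

$$\frac{P}{N} \cdot \frac{P-1}{N-1} \cdots \frac{P-j+1}{N-j+1} \leq \left(\frac{P}{N}\right)^j = p^j.$$

Thus, by linearity of expectation, $E[X] \leq \binom{n}{j} p^j$. Together with (5), the claim follows.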
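Claim 4.3 can be verified exactly for a small urn using rational arithmetic. The sketch below evaluates both sides of the claim for every $j$; the urn parameters $N = 20$, $P = 8$, $n = 6$ and the function names are hypothetical choices made for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(N, P, n, l):
    """Pr[H(N, P, n) = l] as an exact fraction."""
    return Fraction(comb(P, l) * comb(N - P, n - l), comb(N, n))

def claim_lhs(N, P, n, j):
    """Left hand side of Claim 4.3: expected number of all-red j-subsets of the sample."""
    return sum(hypergeom_pmf(N, P, n, i) * comb(i, j) for i in range(j, n + 1))

def claim_rhs(N, P, n, j):
    """Right hand side of Claim 4.3: binom(n, j) * p^j with p = P / N."""
    return comb(n, j) * Fraction(P, N) ** j

# Hypothetical urn: 20 balls, 8 red, sample of size 6.
N, P, n = 20, 8, 6

for j in range(n + 1):
    print(j, claim_lhs(N, P, n, j), "<=", claim_rhs(N, P, n, j))
```

For $j = 0$ and $j = 1$ the two sides coincide (total probability and the mean $nP/N$ of the hypergeometric distribution); for larger $j$ the inequality becomes strict because sampling without replacement makes the red positions negatively correlated.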

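Claim 4.4. For every $\tau \geq 1$, we have

$$\binom{N}{n}^{-1} \sum_{i=0}^n \binom{P}{i} \binom{N-P}{n-i} \tau^i \leq \big(1 + (\tau-1)p\big)^n.$$

Proof. Using Claim 4.3 and the Binomial theorem (twice),

$$\binom{N}{n}^{-1} \sum_{i=0}^n \binom{P}{i} \binom{N-P}{n-i} \tau^i = \binom{N}{n}^{-1} \sum_{i=0}^n \binom{P}{i} \binom{N-P}{n-i} \big(1 + (\tau-1)\big)^i = \binom{N}{n}^{-1} \sum_{i=0}^n \binom{P}{i} \binom{N-P}{n-i} \sum_{j=0}^i \binom{i}{j} (\tau-1)^j$$

$$= \sum_{j=0}^n (\tau-1)^j \binom{N}{n}^{-1} \sum_{i=j}^n \binom{P}{i} \binom{N-P}{n-i} \binom{i}{j} \leq \sum_{j=0}^n \binom{n}{j} \big((\tau-1)p\big)^j = \big(1 + (\tau-1)p\big)^n,$$

as claimed.

Thus, for any $\tau \geq 1$ and $k \geq pn$, we get as before

$$\Pr[H(N, P, n) \geq k] = \binom{N}{n}^{-1} \sum_{i=k}^n \binom{P}{i} \binom{N-P}{n-i} \leq \binom{N}{n}^{-1} \sum_{i=0}^n \binom{P}{i} \binom{N-P}{n-i} \tau^{i-k} \leq \frac{(p\tau + 1 - p)^n}{\tau^k},$$

by Claim 4.4. From here the proof proceeds as in Section 2.2.

4.3 General Impagliazzo-Kabanets

Theorem 4.5. Let $X_1, \dots, X_n$ be random variables with $X_i \in \{0, 1\}$. Suppose there exist $p_i \in [0, 1]$, $i = 1, \dots, n$, such that for every index set $I \subseteq \{1, \dots, n\}$, we have $\Pr\big[\prod_{i \in I} X_i = 1\big] \leq \prod_{i \in I} p_i$. Set $X := \sum_{i=1}^n X_i$ and $p := (1/n)\sum_{i=1}^n p_i$. Then, for any $t \in [0, 1-p]$, we have

$$\Pr[X \geq (p+t)n] \leq e^{-D_{KL}(p+t \| p)\,n}.$$

Proof. Let $\lambda \in [0, 1]$ be a parameter to be chosen later. Let $I \subseteq \{1, \dots, n\}$ be a random index set obtained by including each element $i \in \{1, \dots, n\}$ with probability $\lambda$. As before, we estimate the probability $\Pr\big[\prod_{i \in I} X_i = 1\big]$ in two different ways, where the probability is over the random choice of $X_1, \dots, X_n$ and $I$. Similarly to before,

$$\Pr\Big[\prod_{i \in I} X_i = 1\Big] \leq \sum_{S \subseteq \{1,\dots,n\}} \Pr\Big[I = S \wedge \prod_{i \in S} X_i = 1\Big] = \sum_{S \subseteq \{1,\dots,n\}} \Pr[I = S] \cdot \Pr\Big[\prod_{i \in S} X_i = 1\Big] \leq \sum_{S \subseteq \{1,\dots,n\}} \lambda^{|S|} (1-\lambda)^{n-|S|} \prod_{i \in S} p_i. \qquad (6)$$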

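We define independent random variables $Z_1, \dots, Z_n$ as follows: for $i = 1, \dots, n$, with probability $1 - \lambda$, we set $Z_i = 1$, and with probability $\lambda$, we set $Z_i = p_i$. By (6), and using independence and the arithmetic-geometric mean inequality,

$$\Pr\Big[\prod_{i \in I} X_i = 1\Big] \leq E\Big[\prod_{i=1}^n Z_i\Big] = \prod_{i=1}^n E[Z_i] = \prod_{i=1}^n (1 - \lambda + p_i\lambda) \leq (1 - \lambda + p\lambda)^n. \qquad (7)$$

The proof of the lower bound remains unchanged and yields

$$\Pr\Big[\prod_{i \in I} X_i = 1\Big] \geq (1-\lambda)^{(1-p-t)n} \Pr[X \geq (p+t)n],$$

as before. Combining with (7) and optimizing for $\lambda$ finishes the proof, see Section 2.3.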
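The generalized bounds can be tried out on independent coins with unequal biases, which satisfy the hypothesis of Theorem 4.5 with equality and are also covered by Theorem 4.1. The sketch below computes the exact distribution of $X$ by dynamic programming and compares its upper tail with $e^{-D_{KL}(p+t \| p)n}$ for the average bias $p$. The list of biases, the value of $\lambda$ used to check the arithmetic-geometric mean step in (7), and the function names are all assumptions made for illustration.

```python
import math

def poisson_binomial_pmf(ps):
    """pmf[k] = Pr[X = k] for X a sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p_i in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p_i)   # the new coin shows 0
            nxt[k + 1] += mass * p_i     # the new coin shows 1
        pmf = nxt
    return pmf

def kl_bernoulli(q, p):
    """D_KL(q || p) for Bernoulli parameters, natural log, with 0 ln 0 = 0."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total

# Hypothetical biases: 50 coins with average bias 0.3.
ps = [0.1, 0.2, 0.3, 0.4, 0.5] * 10
n = len(ps)
p = sum(ps) / n
pmf = poisson_binomial_pmf(ps)

results = []
for t in (0.1, 0.2, 0.4):
    k = math.ceil((p + t) * n)
    tail = sum(pmf[k:])
    bound = math.exp(-kl_bernoulli(p + t, p) * n)
    results.append((t, tail, bound))
    print(t, tail, bound)

# Arithmetic-geometric mean step from (7), for one sample value of lambda.
lam = 0.5
product_form = math.prod(1 - lam + p_i * lam for p_i in ps)
mean_form = (1 - lam + p * lam) ** n
print(product_form, "<=", mean_form)
```

The exact tail should lie below the bound for every $t$, even though the bound only uses the average bias $p$ and not the individual $p_i$.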