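Hoeffding, Azuma, McDiarmid

Karl Stratos

1 Hoeffding (sum of independent RVs)

Hoeffding's lemma. If $X \in [a, b]$ and $E[X] = 0$, then for all $t > 0$:

$$E[e^{tX}] \le e^{t^2 (b-a)^2 / 8}$$

Proof. Since $e^{tx}$ is convex in $x$, for all $x \in [a, b]$:

$$e^{tx} \le \frac{b - x}{b - a} e^{ta} + \frac{x - a}{b - a} e^{tb}$$

Taking expectations and using $E[X] = 0$, this means:

$$E[e^{tX}] \le \frac{b}{b - a} e^{ta} - \frac{a}{b - a} e^{tb} = e^{\phi(t)}$$

where

$$\phi(t) := ta + \ln\left( \frac{b}{b - a} - \frac{a}{b - a} e^{t(b-a)} \right)$$

Look at the derivatives of $\phi$:

$$\phi'(t) = a - \frac{a e^{t(b-a)}}{\frac{b}{b-a} - \frac{a}{b-a} e^{t(b-a)}} = a - \frac{a}{\frac{b}{b-a} e^{-t(b-a)} - \frac{a}{b-a}}$$

The second step divides the numerator and denominator by $e^{t(b-a)}$ so that $t$ appears only in the denominator. Differentiating once more, with $\alpha := -a/(b-a)$ (so that $1 - \alpha = b/(b-a)$):

$$\begin{aligned}
\phi''(t) &= \frac{-ab \, e^{-t(b-a)}}{\left( \frac{b}{b-a} e^{-t(b-a)} - \frac{a}{b-a} \right)^2} \\
&= \frac{\alpha (1 - \alpha) e^{-t(b-a)} (b-a)^2}{\left( (1 - \alpha) e^{-t(b-a)} + \alpha \right)^2} \\
&= \underbrace{\frac{\alpha}{(1 - \alpha) e^{-t(b-a)} + \alpha}}_{u} \; \underbrace{\frac{(1 - \alpha) e^{-t(b-a)}}{(1 - \alpha) e^{-t(b-a)} + \alpha}}_{1 - u} \; (b-a)^2 \\
&\le \frac{(b-a)^2}{4}
\end{aligned}$$

We used the fact that the concave function $u(1 - u) = u - u^2$ achieves its maximum of $1/4$ at $u = 1/2$. Now we approximate $\phi(t)$ at $t = 0$ with the first-degree Taylor polynomial. Since $\phi(0) = 0$ and $\phi'(0) = 0$, the remainder theorem gives us that, for some $\theta \in [0, t]$,

$$\phi(t) = \phi(0) + t \phi'(0) + R_1(\theta) = \frac{t^2}{2} \phi''(\theta) \le \frac{t^2 (b-a)^2}{8}$$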
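As a sanity check on Hoeffding's lemma, the sketch below compares the exact moment generating function of a zero-mean two-point variable on $\{a, b\}$ with the bound $e^{t^2 (b-a)^2 / 8}$. The two-point distribution is an illustrative assumption (it is the case where the convexity step in the proof is tight), and the function names are hypothetical, not part of the notes.

```python
import math

def mgf_two_point(a, b, t):
    """Exact E[exp(tX)] for X supported on {a, b} with E[X] = 0 (needs a < 0 < b)."""
    p_b = -a / (b - a)  # P(X = b), chosen so that the mean is zero
    p_a = b / (b - a)   # P(X = a)
    return p_a * math.exp(t * a) + p_b * math.exp(t * b)

def hoeffding_lemma_bound(a, b, t):
    """Right-hand side of Hoeffding's lemma: exp(t^2 (b - a)^2 / 8)."""
    return math.exp(t * t * (b - a) ** 2 / 8.0)

def lemma_holds(a, b, ts):
    """True if the exact MGF is below the bound for every t in ts."""
    return all(mgf_two_point(a, b, t) <= hoeffding_lemma_bound(a, b, t) for t in ts)

if __name__ == "__main__":
    ts = [0.1 * k for k in range(1, 51)]
    # Illustrative intervals; any a < 0 < b would do.
    for a, b in [(-1.0, 1.0), (-0.2, 3.0), (-5.0, 0.5)]:
        print(a, b, lemma_holds(a, b, ts))
```

For the symmetric case $a = -1$, $b = 1$ the comparison reduces to the familiar inequality $\cosh(t) \le e^{t^2/2}$.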
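Hoeffding's inequality. Given independent random variables $X_1 \ldots X_m$ where $X_i \in [a_i, b_i]$, let $S_m := \sum_{i=1}^m X_i$. Then for any $\epsilon > 0$:

$$P(S_m - E[S_m] \ge \epsilon) \le e^{-2\epsilon^2 / \sum_{i=1}^m (b_i - a_i)^2}$$

Proof. Using the Chernoff bounding technique, we write for all $t > 0$:

$$\begin{aligned}
P(S_m - E[S_m] \ge \epsilon) &= P\left( e^{t(S_m - E[S_m])} \ge e^{t\epsilon} \right) \\
&\le E\left[ e^{t(S_m - E[S_m])} \right] e^{-t\epsilon} && \text{by Markov} \\
&= E\left[ \prod_{i=1}^m e^{t(X_i - E[X_i])} \right] e^{-t\epsilon} \\
&= \prod_{i=1}^m E\left[ e^{t(X_i - E[X_i])} \right] e^{-t\epsilon} && \text{by independence} \\
&\le \prod_{i=1}^m e^{t^2 (b_i - a_i)^2 / 8} \, e^{-t\epsilon} && \text{by Hoeffding's lemma} \\
&= e^{\frac{t^2}{8} \sum_{i=1}^m (b_i - a_i)^2 - t\epsilon}
\end{aligned}$$

Hoeffding's lemma applies because $X_i - E[X_i]$ has mean zero and lies in an interval of width $b_i - a_i$. Since $\frac{t^2}{8} \sum_{i=1}^m (b_i - a_i)^2 - t\epsilon$ is convex in $t$, we minimize it with $t = 4\epsilon / \sum_{i=1}^m (b_i - a_i)^2$, yielding the bound $e^{-2\epsilon^2 / \sum_{i=1}^m (b_i - a_i)^2}$. [1]

The proof suggests that the result can be generalized to variables that are not necessarily independent, since we just need the expectation to break over the product.

[1] Without Hoeffding's lemma, we could handle the case $X_i \in \{0, 1\}$ by explicitly bounding the non-centered quantity $E[e^{-tX_i}] = p_i e^{-t} + (1 - p_i) = 1 - p_i (1 - e^{-t}) \le \exp(-p_i (1 - e^{-t}))$ (here $p_i := E[X_i]$) and observing $\prod_{i=1}^m E[e^{-tX_i}] \le \exp(-E[S_m] (1 - e^{-t}))$.

2 Azuma (sum of martingale differences)

Conditional Hoeffding's lemma. If $V \in [f(Z), f(Z) + c]$ and $E[V \mid Z] = 0$, then for all $t > 0$:

$$E[e^{tV} \mid Z] \le e^{t^2 c^2 / 8}$$

Note that $E[e^{tV} \mid Z]$ is a random variable in $Z$.

Proof. Similar to the proof of Hoeffding's lemma. Use $a = f(Z)$, $b = f(Z) + c$ and use $E[\cdot \mid Z]$ instead of $E[\cdot]$.

$V_1, V_2, \ldots$ is called a martingale difference sequence wrt. $X_1, X_2, \ldots$ if

(1) $V_i$ is a function of $X_1 \ldots X_i$.
(2) $E[|V_i|] < \infty$
(3) $E[V_{i+1} \mid X_1 \ldots X_i] = 0$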
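Azuma's inequality. Given a martingale difference sequence $V_1, V_2, \ldots$ wrt. $X_1, X_2, \ldots$ where $V_i \in [f_i(X_1 \ldots X_{i-1}), f_i(X_1 \ldots X_{i-1}) + c_i]$ for some $f_i$ and $c_i \ge 0$, for all $\epsilon > 0$:

$$P\left( \sum_{i=1}^m V_i \ge \epsilon \right) \le e^{-2\epsilon^2 / \sum_{i=1}^m c_i^2}$$

Proof. For each $k \in [m]$, define $S_k := \sum_{i=1}^k V_i$. By the law of iterated expectations (LIE) $E_X[X] = E_Z[E_{X|Z}[X \mid Z]]$ (see the appendix):

$$E[e^{tS_k}] = E\left[ E[e^{tS_k} \mid X_1 \ldots X_{k-1}] \right]$$

where

$$\begin{aligned}
E[e^{tS_k} \mid X_1 \ldots X_{k-1}] &= E[e^{tS_{k-1}} e^{tV_k} \mid X_1 \ldots X_{k-1}] \\
&= E[e^{tS_{k-1}} \mid X_1 \ldots X_{k-1}] \, E[e^{tV_k} \mid X_1 \ldots X_{k-1}] \\
&\le E[e^{tS_{k-1}} \mid X_1 \ldots X_{k-1}] \, e^{t^2 c_k^2 / 8}
\end{aligned}$$

The second step holds because $S_{k-1}$ only depends on $X_1 \ldots X_{k-1}$. The third step holds by the conditional Hoeffding's lemma. Taking the outer expectation gives $E[e^{tS_k}] \le e^{t^2 c_k^2 / 8} E[e^{tS_{k-1}}]$, and applying this repeatedly:

$$E[e^{tS_m}] \le e^{t^2 c_m^2 / 8} E[e^{tS_{m-1}}] \le \cdots \le e^{\frac{t^2}{8} \sum_{i=1}^m c_i^2}$$

Use the Chernoff bounding technique on $S_m$:

$$\begin{aligned}
P(S_m \ge \epsilon) &= P(e^{tS_m} \ge e^{t\epsilon}) \\
&\le E[e^{tS_m}] e^{-t\epsilon} && \text{by Markov} \\
&\le e^{\frac{t^2}{8} \sum_{i=1}^m c_i^2 - t\epsilon} && \text{by the above argument}
\end{aligned}$$

By minimizing the convex function $\frac{t^2}{8} \sum_{i=1}^m c_i^2 - t\epsilon$ with $t = 4\epsilon / \sum_{i=1}^m c_i^2$, we get the bound $e^{-2\epsilon^2 / \sum_{i=1}^m c_i^2}$.

3 McDiarmid (Lipschitz function of independent RVs)

McDiarmid's inequality. Given independent random variables $X_1 \ldots X_m \in \mathcal{X}$, let $f : \mathcal{X}^m \to \mathbb{R}$ be a function bounded in a Lipschitz-like manner as follows: for each $i$ there is some $c_i \ge 0$ such that for all $x_1 \ldots x_m, x_i' \in \mathcal{X}$,

$$|f(x_1 \ldots x_i \ldots x_m) - f(x_1 \ldots x_i' \ldots x_m)| \le c_i$$

Let $f(S) := f(X_1 \ldots X_m)$. Then

$$P(f(S) - E[f(S)] \ge \epsilon) \le e^{-2\epsilon^2 / \sum_{i=1}^m c_i^2}$$

Proof. Define $V := f(S) - E[f(S)]$. We will show that $V = \sum_{i=1}^m V_i$ is a sum of bounded martingale differences $V_i \in [f_i(X_1 \ldots X_{i-1}), f_i(X_1 \ldots X_{i-1}) + c_i]$. Then Azuma's inequality gives the desired result. Define

$$V_i := E[V \mid X_1 \ldots X_i] - E[V \mid X_1 \ldots X_{i-1}]$$

Note that each $V_i$ is a function of $X_1 \ldots X_i$, and since $E[V] = 0$ the telescoping sum gives

$$\sum_{i=1}^m V_i = E[V \mid X_1 \ldots X_m] - E[V] = V$$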
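In addition, $E[E[V \mid X_1 \ldots X_i] \mid X_1 \ldots X_{i-1}] = E[V \mid X_1 \ldots X_{i-1}]$ (by LIE), so we have

$$E[V_i \mid X_1 \ldots X_{i-1}] = E\left[ E[V \mid X_1 \ldots X_i] \mid X_1 \ldots X_{i-1} \right] - E[V \mid X_1 \ldots X_{i-1}] = 0$$

Thus $V_1 \ldots V_m$ is a martingale difference sequence wrt. $X_1 \ldots X_m$. [2] Now bound $V_i$ in terms of $X_1 \ldots X_{i-1}$:

$$V_i \le \sup_{x \in \mathcal{X}} E[V \mid X_1 \ldots X_{i-1}, x] - E[V \mid X_1 \ldots X_{i-1}] =: W_i$$

$$V_i \ge \inf_{x \in \mathcal{X}} E[V \mid X_1 \ldots X_{i-1}, x] - E[V \mid X_1 \ldots X_{i-1}] =: U_i$$

Using the Lipschitz condition on $f$:

$$\begin{aligned}
W_i - U_i &= \sup_{x, x' \in \mathcal{X}} E[V \mid X_1 \ldots X_{i-1}, x] - E[V \mid X_1 \ldots X_{i-1}, x'] \\
&= \sup_{x, x' \in \mathcal{X}} E[f(S) \mid X_1 \ldots X_{i-1}, x] - E[f(S) \mid X_1 \ldots X_{i-1}, x'] \\
&\le c_i
\end{aligned}$$

The last step uses independence: conditioning on $X_i = x$ or $X_i = x'$ leaves the distribution of $X_{i+1} \ldots X_m$ unchanged, so the two conditional expectations average $f$ over the same completions and differ by at most $c_i$. Thus $W_i \le U_i + c_i$ and it follows that $V_i \in [U_i, U_i + c_i]$ where $U_i$ is a function of $X_1 \ldots X_{i-1}$.

[2] We've constructed a Doob martingale $Z_0, Z_1, \ldots, Z_m$ wrt. $X_0 = \text{constant}, X_1, \ldots, X_m$ for the target quantity $V$. That is, $Z_i := E[V \mid X_0 \ldots X_i]$, which gives $V_i = Z_i - Z_{i-1}$.

References. Appendix D of Foundations of Machine Learning (MRT); Chapter 12 of Probability and Computing (MU).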
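To see McDiarmid's inequality on a concrete function, the sketch below takes $f$ to be the sample mean of $m$ fair coin flips, so changing one coordinate moves $f$ by at most $c_i = 1/m$. It compares the exact upper tail (computed from binomial coefficients) with the bound $e^{-2\epsilon^2 / \sum_i c_i^2}$. The choice of $f$, the fair-coin distribution, and the function names are illustrative assumptions.

```python
import math

def exact_upper_tail(m, eps):
    """Exact P(mean - 1/2 >= eps) for the mean of m fair coin flips."""
    # Smallest head count k with k/m - 1/2 >= eps; the tolerance guards float rounding.
    k_min = math.ceil(m * (0.5 + eps) - 1e-9)
    favorable = sum(math.comb(m, k) for k in range(k_min, m + 1))
    return favorable / 2 ** m

def mcdiarmid_bound(m, eps):
    """exp(-2 eps^2 / sum_i c_i^2) with c_i = 1/m for the sample mean."""
    sum_c_sq = m * (1.0 / m) ** 2
    return math.exp(-2.0 * eps ** 2 / sum_c_sq)

if __name__ == "__main__":
    for m, eps in [(100, 0.1), (400, 0.05), (1000, 0.05)]:
        print(m, eps, exact_upper_tail(m, eps), mcdiarmid_bound(m, eps))
```

With $c_i = 1/m$ the bound simplifies to $e^{-2m\epsilon^2}$, which is the same value Hoeffding's inequality gives for the mean of variables in $[0, 1]$.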
4 Appendices

4.1 Crash Course on Conditional RVs

The proofs of Azuma's and McDiarmid's inequalities make heavy use of conditional expectations. Let's say $X$ is a random variable. Then $E_X[X]$ is a constant. However, $E_{X|Y}[X \mid Y]$ is a random variable (random over $Y$)! We can only compute a value for a specific $y \in \mathcal{Y}$:

$$E_{X|Y}[X \mid Y = y] = \int_x x \, P_{X|Y}(X = x \mid Y = y) \, dx$$

is a constant. The law of iterated expectations (LIE) [3] states that

$$E_Y[\underbrace{E_{X|Y}[X \mid Y]}_{\text{func of } Y}] = \underbrace{E_X[X]}_{\text{constant}}$$

Now that we know the definition, it's pretty easy to show:

$$\begin{aligned}
E_Y[E_{X|Y}[X \mid Y]] &= \int_y P_Y(Y = y) \, E_{X|Y}[X \mid Y = y] \, dy \\
&= \int_y P_Y(Y = y) \left( \int_x x \, P_{X|Y}(X = x \mid Y = y) \, dx \right) dy \\
&= \int_x x \left( \int_y P_Y(Y = y) \, P_{X|Y}(X = x \mid Y = y) \, dy \right) dx \\
&= \int_x x \, P_X(X = x) \, dx \\
&= E_X[X]
\end{aligned}$$

The same principle holds when we work with more than two variables:

$$E_{Y|Z}[\underbrace{E_{X|Y,Z}[X \mid Y, Z]}_{\text{func of } Y, Z} \mid Z] = \underbrace{E_{X|Z}[X \mid Z]}_{\text{func of } Z}$$

It basically says we are free to condition on anything as long as we eventually take the expectation over it.

[3] Also called the law of total expectation, the tower rule, the smoothing theorem, Adam's Law.

4.2 Martingales

A sequence $Z_0, Z_1, \ldots$ is a martingale wrt. $X_0, X_1, \ldots$ if

(1) $Z_i$ is a function of $X_0 \ldots X_i$.
(2) $E[|Z_i|] < \infty$
(3) $E[Z_{i+1} \mid X_0 \ldots X_i] = Z_i$
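The conditional form of the law of iterated expectations from Section 4.1, $E[E[X \mid Y, Z] \mid Z] = E[X \mid Z]$, is the identity that martingale arguments lean on, and it can be checked by brute force on a small discrete example. The sketch below does this for a hypothetical joint pmf on $\{0, 1\}^3$; the weights and function names are arbitrary illustrations, not taken from the notes.

```python
# Hypothetical joint pmf P(X=x, Y=y, Z=z) on {0,1}^3, keyed by (x, y, z).
# The integer weights are arbitrary and are normalized below.
WEIGHTS = {
    (0, 0, 0): 1, (0, 0, 1): 2, (0, 1, 0): 3, (0, 1, 1): 4,
    (1, 0, 0): 5, (1, 0, 1): 6, (1, 1, 0): 7, (1, 1, 1): 8,
}
TOTAL = sum(WEIGHTS.values())
PMF = {outcome: w / TOTAL for outcome, w in WEIGHTS.items()}

def cond_exp_x(given):
    """E[X | event]; `given` maps a coordinate index (1 for Y, 2 for Z) to its value."""
    mass = 0.0
    weighted = 0.0
    for outcome, p in PMF.items():
        if all(outcome[i] == v for i, v in given.items()):
            mass += p
            weighted += outcome[0] * p
    return weighted / mass

def tower_lhs(z):
    """E[ E[X | Y, Z] | Z = z ]: average the inner expectation over Y given Z = z."""
    p_z = sum(p for o, p in PMF.items() if o[2] == z)
    total = 0.0
    for y in (0, 1):
        p_yz = sum(p for o, p in PMF.items() if o[1] == y and o[2] == z)
        total += (p_yz / p_z) * cond_exp_x({1: y, 2: z})
    return total

if __name__ == "__main__":
    for z in (0, 1):
        print(z, tower_lhs(z), cond_exp_x({2: z}))
```

Both printed columns should agree for each value of $z$, up to floating-point error.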
A Doob martingale is a martingale constructed as follows. Let $X_0 \ldots X_n$ be any sequence. We are interested in $Y$ that depends on all of $X_0 \ldots X_n$; we assume $E[|Y|] < \infty$. We define $Z_i$ to be the expectation of $Y$ given $X_0 \ldots X_i$:

$$Z_i := E[Y \mid X_0 \ldots X_i]$$

To verify that $Z_0 \ldots Z_n$ is a martingale, we need to check the third condition:

$$\begin{aligned}
E[Z_{i+1} \mid X_0 \ldots X_i] &= E\left[ E[Y \mid X_0 \ldots X_{i+1}] \mid X_0 \ldots X_i \right] && \text{by def} \\
&= E[Y \mid X_0 \ldots X_i] && \text{by LIE} \\
&= Z_i
\end{aligned}$$

For instance, consider a sequence of rewards in $n$ independent fair gambles: $X_1 \ldots X_n$ where $E[X_i] = 0$. We are interested in the total reward $Y = \sum_{i=1}^n X_i$. Then our Doob martingale is given by

$$Z_i = \sum_{j=1}^n E[X_j \mid X_1 \ldots X_i] = \sum_{j=1}^i X_j$$

since $E[X_j \mid X_1 \ldots X_i] = E[X_j] = 0$ for $j > i$. I.e., the refined estimate of the total reward at time $i$ is simply the sum up to that time.

By construction, if $Z_0, Z_1, \ldots$ is a martingale wrt. $X_0, X_1, \ldots$ (with $X_0$ taken to be a constant), then $V_1, V_2, \ldots$ defined by $V_i := Z_i - Z_{i-1}$ is a martingale difference sequence as defined before, since

(1) $V_i = Z_i - Z_{i-1}$ is a function of $X_1 \ldots X_i$.
(2) $E[|V_i|] = E[|Z_i - Z_{i-1}|] < \infty$
(3) $E[V_{i+1} \mid X_1 \ldots X_i] = E[Z_{i+1} \mid X_1 \ldots X_i] - Z_i = 0$
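The fair-gamble example can be verified by exhaustive enumeration. The sketch below assumes each $X_j$ is uniform on $\{-1, +1\}$ (one illustrative choice of a fair gamble), computes $Z_i = E[Y \mid X_1 \ldots X_i]$ by averaging over all completions of a prefix, and checks both that $Z_i$ equals the partial sum and that $E[Z_{i+1} \mid X_1 \ldots X_i] = Z_i$. The function names are hypothetical.

```python
from itertools import product

def doob_value(prefix, n):
    """Z_i = E[Y | X_1..X_i = prefix] for Y = X_1 + ... + X_n, X_j uniform on {-1, +1}."""
    remaining = n - len(prefix)
    # All equally likely completions of the prefix (a single empty tuple if remaining == 0).
    completions = list(product((-1, 1), repeat=remaining))
    total = sum(sum(prefix) + sum(tail) for tail in completions)
    return total / len(completions)

def check_doob(n):
    """True if Z_i is the partial sum and the martingale property holds for every prefix."""
    for i in range(n):
        for prefix in product((-1, 1), repeat=i):
            z_i = doob_value(prefix, n)
            if abs(z_i - sum(prefix)) > 1e-12:
                return False
            # E[Z_{i+1} | prefix]: average over the two equally likely next outcomes.
            z_next = 0.5 * (doob_value(prefix + (-1,), n) + doob_value(prefix + (1,), n))
            if abs(z_next - z_i) > 1e-12:
                return False
    return True

if __name__ == "__main__":
    print(check_doob(6))
    print(doob_value((1, 1, -1), 5))
```

The enumeration is exponential in $n$, so this is only meant for small $n$; its purpose is to make the conditional expectations in the definition concrete rather than to be efficient.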