On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities

Sasha Rakhlin
Department of Statistics, The Wharton School, University of Pennsylvania
Dec 16, 2015
Joint work with K. Sridharan
arXiv:1510.03925

Outline
- Introduction
- Beyond Banach spaces
- Extras

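If $Z_1,\dots,Z_n$ are independent with $\mathbb{E}Z_t = 0$, then
$$\mathbb{E}\Big(\sum_{t=1}^n Z_t\Big)^2 = \sum_{t=1}^n \mathbb{E}Z_t^2.$$
Extends to Hilbert space:
$$\mathbb{E}\Big\|\sum_{t=1}^n Z_t\Big\|^2 = \sum_{t=1}^n \mathbb{E}\|Z_t\|^2.$$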
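The Hilbert-space identity can be checked exactly on a small example. The sketch below assumes a particular zero-mean independent sequence, $Z_t = \epsilon_t a_t$ with fixed vectors $a_t$ and independent random signs $\epsilon_t$, and enumerates all sign patterns. The vectors and function names are illustrative, not taken from the talk.

```python
import itertools

def second_moment_of_sum(vectors):
    """Exact E||sum_t eps_t a_t||^2, averaging over all sign patterns eps."""
    n = len(vectors)
    dim = len(vectors[0])
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        # Coordinates of the signed sum for this sign pattern.
        s = [sum(e * a[k] for e, a in zip(signs, vectors)) for k in range(dim)]
        total += sum(c * c for c in s)
    return total / 2 ** n

def sum_of_second_moments(vectors):
    """sum_t E||eps_t a_t||^2, which is just sum_t ||a_t||^2."""
    return sum(sum(c * c for c in a) for a in vectors)

# Illustrative vectors in R^2 (an assumption made for this example).
vectors = [(1.0, 0.0), (0.5, -2.0), (3.0, 1.0)]
lhs = second_moment_of_sum(vectors)
rhs = sum_of_second_moments(vectors)
print(lhs, rhs)
```

The cross terms $\langle a_s, a_t\rangle\,\mathbb{E}[\epsilon_s\epsilon_t]$ vanish for $s \ne t$, which is why the two numbers agree.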

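(Pinelis '94): Let $Z_1,\dots,Z_n$ be a martingale difference sequence in a separable 2-smooth Banach space $(\mathcal{B}, \|\cdot\|)$. For any $u > 0$,
$$P\Big(\sup_{n\ge 1}\Big\|\sum_{t=1}^n Z_t\Big\| \ge \sigma u\Big) \le 2\exp\Big\{-\frac{u^2}{2D^2}\Big\},$$
where $\sigma^2 \ge \sum_{t\ge 1}\|Z_t\|_\infty^2$ and $D$ is the smoothness constant of the space.
Questions:
- replace $\sigma$ with a sequence-dependent version?
- is it always possible?
- extend beyond the linear structure of Banach spaces?
Contributions:
- address these questions
- the actual technique: equivalence of tail bounds and deterministic pathwise regret inequalities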

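Baby version
Unit Euclidean ball $B$ in $\mathbb{R}^d$. Let $z_1,\dots,z_n \in B$ be arbitrary. Define
$$\hat y_{t+1} = \hat y_{t+1}(z_1,\dots,z_t) = \mathrm{Proj}_B\big(\hat y_t - n^{-1/2} z_t\big), \qquad \hat y_1 = 0.$$
Then, $\forall y, z_1,\dots,z_n \in B$,
$$\sum_{t=1}^n \langle \hat y_t - y, z_t\rangle \le \sqrt{n}.$$
Rewrite as
$$\exists (\hat y_t)\ \ \forall z_1,\dots,z_n \in B, \qquad \Big\|\sum_{t=1}^n z_t\Big\| - \sqrt{n} \le \sum_{t=1}^n \langle \hat y_t, -z_t\rangle.$$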
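The baby version is easy to run. The following is a minimal sketch of the projected gradient descent strategy with step size $n^{-1/2}$ started at $0$. It evaluates both sides of the rewritten inequality on one sequence; the helper names, the dimension and the random test sequence are assumptions made for illustration.

```python
import math
import random

def project_to_unit_ball(v):
    """Euclidean projection onto the unit ball."""
    norm = math.sqrt(sum(c * c for c in v))
    return v if norm <= 1.0 else [c / norm for c in v]

def gradient_descent_gap(zs):
    """Return (||sum z_t|| - sqrt(n), sum <y_hat_t, -z_t>) for the strategy above."""
    n, d = len(zs), len(zs[0])
    eta = 1.0 / math.sqrt(n)
    y_hat = [0.0] * d          # y_hat_1 = 0
    payoff = 0.0               # running sum of <y_hat_t, -z_t>
    total = [0.0] * d          # running sum of z_t
    for z in zs:
        # y_hat_t is fixed before z_t is used in the update.
        payoff += -sum(a * b for a, b in zip(y_hat, z))
        total = [s + c for s, c in zip(total, z)]
        y_hat = project_to_unit_ball([a - eta * c for a, c in zip(y_hat, z)])
    lhs = math.sqrt(sum(c * c for c in total)) - math.sqrt(n)
    return lhs, payoff

def random_point_in_ball(rng, d):
    """A random point of the unit ball (random direction, random radius)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    radius = rng.random()
    return [radius * c / norm for c in v]

rng = random.Random(0)
zs = [random_point_in_ball(rng, 3) for _ in range(50)]
lhs, rhs = gradient_descent_gap(zs)
print(lhs <= rhs)
```

Because the inequality is deterministic, the check should pass for any sequence in the ball, not only for random ones; a constant sequence $z_t = e_1$ is a useful adversarial test.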

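Deterministic inequality:
$$\exists (\hat y_t)\ \ \forall z_1,\dots,z_n \in B, \qquad \Big\|\sum_{t=1}^n z_t\Big\| - \sqrt{n} \le \sum_{t=1}^n \langle \hat y_t, -z_t\rangle. \qquad (1)$$
Apply to an MDS $Z_1,\dots,Z_n$ with values in $B$:
$$P\Big(\Big\|\sum_{t=1}^n Z_t\Big\| - \sqrt{n} > u\Big) \le P\Big(\sum_{t=1}^n \langle \hat y_t, -Z_t\rangle > u\Big) \le \exp\{-u^2/(2n)\} \qquad (2)$$
by Azuma-Hoeffding.
Integrate tails:
$$\mathbb{E}\Big\|\sum_{t=1}^n Z_t\Big\| \le c\sqrt{n}. \qquad (3)$$
Using the von Neumann minimax theorem, it is possible to show
$$\exists (\hat y_t)\ \ \forall y, z_1,\dots,z_n \in B, \qquad \sum_{t=1}^n \langle \hat y_t - y, z_t\rangle \le \sup_{\text{mds}} \mathbb{E}\Big\|\sum_{t=1}^n W_t\Big\|,$$
where the supremum is over martingale difference sequences $(W_t)$ with values in $B$.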
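To see steps (2) and (3) on a concrete case, the sketch below takes the Azuma-Hoeffding bound in the form $\exp\{-u^2/(2n)\}$, which is an assumption about the normalization. It uses an illustrative martingale difference sequence $Z_t = \epsilon_t a_t$ with unit vectors $a_t$, and computes exact tail probabilities by enumerating all sign patterns. Integrating the tail gives $\mathbb{E}\|\sum Z_t\| \le \sqrt{n} + \int_0^\infty e^{-u^2/(2n)}\,du = (1+\sqrt{\pi/2})\sqrt{n}$, so $c = 1+\sqrt{\pi/2}$ is one admissible constant under this assumption.

```python
import itertools
import math

def norm_distribution(vectors):
    """All 2^n equally likely values of ||sum_t eps_t a_t||."""
    dim = len(vectors[0])
    norms = []
    for signs in itertools.product((-1.0, 1.0), repeat=len(vectors)):
        s = [sum(e * a[k] for e, a in zip(signs, vectors)) for k in range(dim)]
        norms.append(math.sqrt(sum(c * c for c in s)))
    return norms

n = 10
# Illustrative unit vectors in R^2 (an assumption made for this example).
vectors = [(math.cos(t), math.sin(t)) for t in range(n)]
norms = norm_distribution(vectors)

def tail(u):
    """Exact P(||sum Z_t|| - sqrt(n) > u) for the sequence above."""
    return sum(1 for x in norms if x - math.sqrt(n) > u) / len(norms)

mean_norm = sum(norms) / len(norms)
c = 1.0 + math.sqrt(math.pi / 2.0)   # constant obtained by integrating the tail

for u in (0.0, 1.0, 2.0, 4.0):
    print(u, tail(u), math.exp(-u * u / (2 * n)))
print(mean_norm, c * math.sqrt(n))
```

For this simple sequence both bounds are loose; the point is only that the empirical tail sits below the sub-Gaussian envelope and that the mean sits below $c\sqrt{n}$.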

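Three statements, each implying the next, and the last implying the first (up to constants):
$$\exists (\hat y_t)\ \ \forall y, z_1,\dots,z_n \in B, \qquad \sum_{t=1}^n \langle \hat y_t - y, z_t\rangle \le \sqrt{n}$$
$$P\Big(\Big\|\sum_{t=1}^n Z_t\Big\| - \sqrt{n} > u\Big) \le \exp\{-u^2/(2n)\} \qquad (2)$$
$$\mathbb{E}\Big\|\sum_{t=1}^n Z_t\Big\| \le c\sqrt{n} \qquad (3)$$
Curiosities:
- in particular, (3) $\Rightarrow$ (2): amplifies an in-expectation bound to a high-probability bound
- improve tail bounds by taking a better gradient descent
- improve gradient descent by finding better tail bounds
- move beyond the linear structure of Banach space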

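Warmup: mirror descent with adaptive step size
$(\mathcal{B}, \|\cdot\|)$ is 2-smooth; $(\mathcal{B}^*, \|\cdot\|_*)$ denotes the dual. $D_R: \mathcal{B}^* \times \mathcal{B}^* \to \mathbb{R}$ is the Bregman divergence w.r.t. $R$, which is 1-strongly convex on the unit ball $B_*$ of $\mathcal{B}^*$. Denote $R_{\max}^2 \triangleq \sup_{f,g \in B_*} D_R(f,g)$. Here the $z_t$'s need not be in the unit ball.
Lemma. Let $\mathcal{F} \subseteq B_*$ be convex. Define
$$\hat y_{t+1} = \hat y_{t+1}(z_1,\dots,z_t) = \operatorname*{argmin}_{y \in \mathcal{F}} \big\{\eta_t \langle y, z_t\rangle + D_R(y, \hat y_t)\big\}$$
and
$$\eta_t \triangleq R_{\max} \min\Big\{1,\ \Big(\sqrt{\textstyle\sum_{s=1}^{t}\|z_s\|^2} + \sqrt{\textstyle\sum_{s=1}^{t-1}\|z_s\|^2}\Big)^{-1}\Big\}.$$
Then for any $y \in \mathcal{F}$ and any $z_1,\dots,z_n \in \mathcal{B}$,
$$\sum_{t=1}^n \langle \hat y_t - y, z_t\rangle \le 2.5\,R_{\max}\Big(\sqrt{\textstyle\sum_{t=1}^n \|z_t\|^2} + 1\Big).$$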
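A minimal sketch of the adaptive step size in the simplest setting may help. It assumes the Euclidean case, which is a specialization not stated on the slide: $R(y) = \tfrac12\|y\|_2^2$, so $D_R(y,\hat y) = \tfrac12\|y-\hat y\|_2^2$, the mirror descent step becomes a projected gradient step, $\mathcal{F}$ is the unit ball, and $R_{\max} = \sqrt{2}$. The function names and the random test sequence are illustrative.

```python
import math
import random

# Euclidean specialization (an assumption): sup over the unit ball of
# 0.5 * ||f - g||^2 equals 2, hence R_max = sqrt(2).
R_MAX = math.sqrt(2.0)

def project_to_unit_ball(v):
    norm = math.sqrt(sum(c * c for c in v))
    return v if norm <= 1.0 else [c / norm for c in v]

def adaptive_regret(zs):
    """Worst-case regret over y in the unit ball, and the lemma's bound."""
    d = len(zs[0])
    y_hat = [0.0] * d
    total = [0.0] * d
    cumulative_loss = 0.0
    v_prev = 0.0  # sum of ||z_s||^2 over s < t
    for z in zs:
        cumulative_loss += sum(a * b for a, b in zip(y_hat, z))
        total = [s + c for s, c in zip(total, z)]
        v_curr = v_prev + sum(c * c for c in z)
        denom = math.sqrt(v_curr) + math.sqrt(v_prev)
        # eta_t = R_max * min{1, 1 / denom}; denom may be 0 if all z so far vanish.
        eta = R_MAX * (1.0 if denom <= 1.0 else 1.0 / denom)
        y_hat = project_to_unit_ball([a - eta * c for a, c in zip(y_hat, z)])
        v_prev = v_curr
    # sup over ||y|| <= 1 of -<y, sum z_t> equals ||sum z_t||.
    regret = cumulative_loss + math.sqrt(sum(c * c for c in total))
    bound = 2.5 * R_MAX * (math.sqrt(v_prev) + 1.0)
    return regret, bound

rng = random.Random(1)
# The z_t are deliberately not confined to the unit ball.
zs = [[rng.uniform(-3.0, 3.0) for _ in range(4)] for _ in range(200)]
regret, bound = adaptive_regret(zs)
print(regret <= bound)
```

The step size uses $\|z_t\|$, which is available when $\hat y_{t+1}$ is computed, so the strategy needs neither the horizon $n$ nor a bound on the norms in advance.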

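Warmup: mirror descent with adaptive step size
Let $\mathbb{E}_t$ be conditional expectation.
Theorem. Let $Z_1,\dots,Z_n$ be a $\mathcal{B}$-valued MDS. For any $u > 0$,
$$P\left(\frac{\big\|\sum_{t=1}^n Z_t\big\| - 2.5\,R_{\max}\big(\sqrt{V_n}+1\big)}{\sqrt{V_n + W_n + \big(\mathbb{E}\sqrt{V_n+W_n}\big)^2}} > u\right) \le \sqrt{2}\,\exp\{-u^2/16\},$$
where $V_n = \sum_{t=1}^n \|Z_t\|^2$ and $W_n = \sum_{t=1}^n \mathbb{E}_{t-1}\|Z_t\|^2$.
- Holds with $W_n \equiv 0$ if the MDS is conditionally symmetric.
- $n$-independent, self-normalized, can be extended to $p$-smooth spaces.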

Summary so far: connection between first-order convex optimization methods and one-sided probabilistic tail bounds.

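Interpret $\|\sum Z_t\|$ as the supremum of a stochastic process:
$$\Big\|\sum_{t=1}^n Z_t\Big\| = \sup_{\|y\|_* \le 1} \sum_{t=1}^n \langle y, Z_t\rangle.$$
Generalization (after centering): take any stochastic process $(Z_t)$ and
$$\sup_{g \in \mathcal{G}} \sum_{t=1}^n \big(g(Z_t) - \mathbb{E}_{t-1}[g(Z_t)]\big).$$
Enough to consider the filtration $\mathcal{D}_t = \sigma(\epsilon_1,\dots,\epsilon_t)$ generated by i.i.d. Rademacher variables:
$$\sup_{f \in \mathcal{F}} \sum_{t=1}^n \epsilon_t f(x_t),$$
where $x_t$ is $\mathcal{D}_{t-1}$-measurable (extend Panchenko's symmetrization technique to martingales):
$$f(x_t(\epsilon_{1:t-1})) = g(z_t(\epsilon_{1:t-1}, +1)) - g(z_t(\epsilon_{1:t-1}, -1)).$$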
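The object $\mathbb{E}\sup_{f}\sum_t \epsilon_t f(x_t)$ with predictable $x_t$ can be computed exactly for small $n$ by walking over all sign paths. In the sketch below the predictable process is represented as a function from the tuple of past signs to the next point; the threshold class and the bisection tree are hypothetical examples, not objects from the talk.

```python
import itertools

def sequential_rademacher(functions, tree, n):
    """Exact E sup_f sum_t eps_t f(x_t) by enumerating all sign paths.

    `tree(prefix)` maps the tuple (eps_1, ..., eps_{t-1}) to the point x_t,
    so x_t depends only on past signs, i.e. it is predictable.
    """
    total = 0.0
    for eps in itertools.product((-1, 1), repeat=n):
        xs = [tree(eps[:t]) for t in range(n)]
        total += max(sum(e * f(x) for e, x in zip(eps, xs)) for f in functions)
    return total / 2 ** n

# Hypothetical toy class: threshold functions on [0, 1].
thresholds = [lambda x, c=c: 1.0 if x >= c else -1.0 for c in (0.25, 0.5, 0.75)]

def bisection_tree(prefix):
    """Hypothetical tree: the next point bisects an interval chosen by past signs."""
    lo, hi = 0.0, 1.0
    for e in prefix:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if e > 0 else (lo, mid)
    return (lo + hi) / 2

print(sequential_rademacher(thresholds, bisection_tree, 3))
```

When the tree is constant along every path, this reduces to the classical Rademacher average on a fixed sample; the dependence of $x_t$ on past signs is what makes the quantity sequential.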

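Deterministic regret inequalities
Let $y_1,\dots,y_n \in \{\pm 1\}$, $x_1,\dots,x_n \in \mathcal{X}$, $\mathcal{F} = \{f: \mathcal{X} \to \mathbb{R}\}$. For a given function $B: \mathcal{F} \times \mathcal{X}^n \to \mathbb{R}$, want a prediction strategy
$$\hat y_t = \hat y_t(x_1,\dots,x_t, y_1,\dots,y_{t-1})$$
such that
$$\forall (x_t, y_t)_{t=1}^n, \qquad \sum_{t=1}^n -\hat y_t y_t \le \inf_{f \in \mathcal{F}} \Big\{\sum_{t=1}^n -y_t f(x_t) + 2B(f; x_1,\dots,x_n)\Big\}.$$
If existence of $(\hat y_t)$ is certified, apply to $y_t = \epsilon_t$ and $x_t = x_t(\epsilon)$:
$$P\Big(\sup_{f \in \mathcal{F}} \Big\{\sum_{t=1}^n \epsilon_t f(x_t) - 2B(f; x_1,\dots,x_n)\Big\} > u\Big) \le P\Big(\sum_{t=1}^n \epsilon_t \hat y_t > u\Big) \le \exp\{\dots\}.$$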

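Lemma. If for any predictable process $x = (x_1,\dots,x_n)$,
$$\mathbb{E}\Big[\sup_{f \in \mathcal{F}} \Big\{\sum_{t=1}^n \epsilon_t f(x_t) - 2B(f; x_1,\dots,x_n)\Big\}\Big] \le 0,$$
then there exists a strategy $(\hat y_t)$ with values $|\hat y_t| \le \sup_{f \in \mathcal{F}} |f(x_t)|$ such that the deterministic inequality holds for all sequences.
- automatic amplification to high probability
- existential: no explicit prediction strategy $(\hat y_t)$
- an offset version of sequential Rademacher complexity $\mathcal{R}_n(\mathcal{F}; x) = \mathbb{E}\big[\sup_{f \in \mathcal{F}} \sum_{t=1}^n \epsilon_t f(x_t)\big]$
- $(\epsilon_1,\dots,\epsilon_n) \mapsto \sup_{f \in \mathcal{F}} \sum_{t=1}^n \epsilon_t f(x_t)$ is not Lipschitz; concentration methods fail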

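Definition. Let $r \in (1, 2]$. We say that sequential Rademacher complexity of $\mathcal{F}$ exhibits an $n^{1/r}$ growth if
$$\forall n \ge 1,\ \forall x, \qquad \mathcal{R}_n(\mathcal{F}; x) \le C\, n^{1/r} \sup_{f \in \mathcal{F},\ \epsilon \in \{\pm 1\}^n,\ t \le n} |f(x_t(\epsilon))|.$$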

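Using amplification and reverse Hölder (due to Burkholder/Pisier):
Lemma. Let $\mathcal{F} \subseteq \mathbb{R}^{\mathcal{X}}$. Suppose sequential Rademacher complexity exhibits $n^{1/r}$ growth, $r \in (1, 2]$. For any $p < r$,
$$\mathbb{E}\sup_{f \in \mathcal{F}} \sum_{t=1}^n \epsilon_t f(x_t) \le C_{r,p}\, \mathbb{E}\Big(\sum_{t=1}^n \sup_{f \in \mathcal{F}} |f(x_t)|^p\Big)^{1/p}.$$
Further, if $\mathcal{F} \subseteq [-1, 1]^{\mathcal{X}}$, then
$$\mathbb{E}\sup_{f \in \mathcal{F}} \sum_{t=1}^n \epsilon_t f(x_t) \le C \log n \cdot \mathbb{E}\Big(\sum_{t=1}^n \sup_{f \in \mathcal{F}} |f(x_t)|^r\Big)^{1/r}.$$
In the spirit of: if one can prove $\mathbb{E}\|\sum Z_t\| \lesssim \sqrt{n}$, then $\mathbb{E}\|\sum Z_t\| \lesssim \mathbb{E}\sqrt{\sum \|Z_t\|^2}$.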

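Definition. We say $\mathcal{G} \subseteq \mathbb{R}^{\mathcal{Z}}$ has martingale type $p$ if $\exists C$ such that
$$\mathbb{E}\Big[\sup_{g \in \mathcal{G}} \sum_{t=1}^n \big(g(Z_t) - \mathbb{E}_{t-1}[g(Z_t)]\big)\Big] \le C\, \mathbb{E}\Big(\sum_{t=1}^n \mathbb{E}_{t-1} \sup_{g \in \mathcal{G}} |g(Z_t) - g(Z'_t)|^p\Big)^{1/p},$$
with $(Z'_t)$ a tangent sequence (a conditionally independent copy of $Z_t$ given the past).
Theorem. For any $\mathcal{G} \subseteq \mathbb{R}^{\mathcal{Z}}$,
1. If sequential Rademacher complexity exhibits $n^{1/r}$ growth, $r \in (1, 2]$, then $\mathcal{G}$ has martingale type $p$ for every $p < r$.
2. If $\mathcal{G}$ has martingale type $p$, then sequential Rademacher complexity exhibits $n^{1/p}$ growth.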

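Finer analysis for type 2
Define
$$\mathrm{Var}_n = \sum_{t=1}^n \sup_{f \in \mathcal{F}} f(x_t)^2, \qquad \mathrm{Var}_n(f) = \sum_{t=1}^n f(x_t)^2.$$
Whenever $\log N_{\mathrm{seq}}(\alpha) \le \alpha^{-q}$, $q \in [0, 2]$,
$$\mathbb{E}\Big[\sup_{f \in \mathcal{F}} \Big\{\sum_{t=1}^n \epsilon_t f(x_t) - C\, \mathrm{Var}_n^{q/4}\, \mathrm{Var}_n(f)^{(2-q)/4}\Big\}\Big] \le 0.$$
High probability via amplification.
Compare to (Massart, Rossignol '13): weak variance improvement of the Nemirovski inequality: for i.i.d. zero mean $Z_1,\dots,Z_n \in \mathbb{R}^d$,
$$\mathbb{E}\Big[\max_{j \le d} \Big|\sum_{t=1}^n \epsilon_t Z_{t,j}\Big|\Big] \le \sqrt{2\ln(2d)\, \mathbb{E}\max_{j \le d} \sum_{t=1}^n Z_{t,j}^2}.$$
We match the index $j$ on both sides; extend to martingales beyond the finite case.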

Conclusions
- Equivalence of deterministic regret inequalities and martingale tail bounds
  - gives a way of proving tail bounds (for martingales or i.i.d.) by exhibiting a method or certifying its existence
  - amplification to high probability
- Use it to extend the notion of martingale type to general classes
- Not in this talk: data-dependent bounds for online learning

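Open questions
- What is behind the equivalence?
- Replace
$$\mathbb{E}\Big(\sum_{t=1}^n \mathbb{E}_{t-1} \sup_{g \in \mathcal{G}} |g(Z_t) - g(Z'_t)|^p\Big)^{1/p} \quad\text{with}\quad \mathbb{E}\Big(\sum_{t=1}^n \sup_{g \in \mathcal{G}} |g(Z_t) - \mathbb{E}_{t-1} g(Z_t)|^p\Big)^{1/p}?$$
- If sequential Rademacher complexity exhibits $n^{1/r}$ growth rate, then does $\mathcal{G}$ have martingale type $r$? We only prove martingale type $p$ for any $p < r$.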

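Reverse Hölder principle
For $p \in (0, \infty)$, define
$$\|Z\|_{p,\infty} = \Big(\sup_{t > 0} t^p\, P(|Z| > t)\Big)^{1/p}.$$
Lemma (Pisier). For any $\delta \in (0, 1)$ and any $R$ there exists $C_p(\delta, R)$ s.t. the following holds. For i.i.d. $(Z_i)_{i \ge 0}$, if
$$\sup_{N \ge 1} P\Big(\sup_{i \le N} N^{-1/p} |Z_i| > R\Big) \le \delta,$$
then
$$\|Z\|_{p,\infty} \le C_p(\delta, R).$$
Corollary: For any $0 < q < p < \infty$ there exists $C_{p,q}$ such that
$$\|Z\|_{p,\infty} \le C_{p,q} \sup_{N \ge 1} \Big\| N^{-1/p} \sup_{i \le N} |Z_i| \Big\|_q.$$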
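For intuition about the weak norm $\|Z\|_{p,\infty}$, here is a small sketch that evaluates it for a random variable with finitely many values, taking the definition with $P(|Z| > t)$. For such a variable the supremum over $t$ is approached just below each atom, so it suffices to scan the atoms. The function names and the example distribution are illustrative assumptions.

```python
def weak_lp_norm(values, probs, p):
    """(sup_{t>0} t^p P(|Z| > t))^(1/p) for a finitely supported Z."""
    atoms = sorted({abs(v) for v in values if v != 0})
    best = 0.0
    for a in atoms:
        # For t just below the atom a, P(|Z| > t) equals P(|Z| >= a).
        upper_mass = sum(q for v, q in zip(values, probs) if abs(v) >= a)
        best = max(best, a ** p * upper_mass)
    return best ** (1.0 / p)

def lp_norm(values, probs, p):
    """The usual (E|Z|^p)^(1/p), for comparison."""
    return sum(q * abs(v) ** p for v, q in zip(values, probs)) ** (1.0 / p)

values, probs = [1.0, 2.0], [0.5, 0.5]
print(weak_lp_norm(values, probs, 2.0), lp_norm(values, probs, 2.0))
```

By Markov's inequality $t^p P(|Z| > t) \le \mathbb{E}|Z|^p$, so the weak norm never exceeds the usual $L_p$ norm; the reverse Hölder principle goes in the harder direction.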