On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities
Proceedings of Machine Learning Research vol 65:1-19, 2017

Alexander Rakhlin, University of Pennsylvania, rakhlin@wharton.upenn.edu
Karthik Sridharan, Cornell University, sridharan@cs.cornell.edu

Abstract

We study an equivalence of (i) deterministic pathwise statements appearing in the online learning literature (termed regret bounds), (ii) high-probability tail bounds for the supremum of a collection of martingales (of a specific form arising from uniform laws of large numbers), and (iii) in-expectation bounds for the supremum. By virtue of the equivalence, we prove exponential tail bounds for norms of Banach space valued martingales via deterministic regret bounds for the online mirror descent algorithm with an adaptive step size. We show that the phenomenon extends beyond the setting of online linear optimization and present the equivalence for the supervised online learning setting.

Keywords: martingale inequalities; online learning

1. Introduction

The paper investigates equivalence of regret inequalities that hold for all sequences and probabilistic inequalities for martingales. In recent years, it was shown that existence of regret-minimization strategies can be certified non-algorithmically by studying certain stochastic processes. In this paper, we make the connection in the opposite direction and show a certain equivalence. We present several new deviation inequalities that follow with surprising ease from pathwise regret inequalities, while it is far from clear how to prove them with other methods.

Arguably the simplest example of the equivalence between prediction of individual sequences and probabilistic inequalities can be found in the work of Cover (1965). Consider the task of predicting a binary sequence $y = (y_1, \dots, y_n) \in \{\pm 1\}^n$ in an online manner. Let $\phi : \{\pm 1\}^n \to [0, 1]$ be $1/n$-Lipschitz with respect to the Hamming distance. Then there exists a randomized strategy such that

$$\forall y, \quad \mathbb{E}\left[\frac{1}{n}\sum_{t=1}^{n} \mathbf{1}\{\hat{y}_t \neq y_t\}\right] \le \phi(y) \tag{1}$$

if and only if $\mathbb{E}\phi(\varepsilon) \ge 1/2$.
The expectation in (1) is with respect to the randomized predictions $\hat{y}_t = \hat{y}_t(y_1, \dots, y_{t-1}) \in \{\pm 1\}$ made by the algorithm, $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$ is a sequence of independent Rademacher random variables, and $\mathbf{1}\{\cdot\}$ is the indicator loss function. While this result is not difficult to prove by backward induction (see e.g. (Rakhlin and Sridharan, 2016)), the message is rather intriguing: existence of a prediction strategy with a given
mistake bound $\phi$ is equivalent to a simple statement about the expected value of $\phi$ with respect to the uniform distribution. Furthermore, the Lipschitz condition on $\phi$ implies a high-probability bound for the deviation of $\phi$ from $\mathbb{E}\phi$ via McDiarmid's inequality.

Our second example of the equivalence is in the setting of online linear optimization. Consider the unit Euclidean ball $B$ in $\mathbb{R}^d$. Let $z_1, \dots, z_n \in B$ and define, recursively, the Euclidean projections

$$\hat{y}_{t+1} = \hat{y}_{t+1}(z_1, \dots, z_t) = \mathrm{Proj}_B\left(\hat{y}_t - n^{-1/2} z_t\right) \tag{2}$$

for each $t = 1, \dots, n$, with the initial value $\hat{y}_1 = 0$. Elementary algebra (footnote 1) shows that for any $f \in B$, the regret inequality $\sum_{t=1}^{n} \langle \hat{y}_t - f, z_t \rangle \le \sqrt{n}$ holds deterministically for any sequence $z_1, \dots, z_n \in B$. By optimally choosing $f$ in the direction of the sum, we re-write this statement equivalently as

$$\left\|\sum_{t=1}^{n} z_t\right\| - \sqrt{n} \le \sum_{t=1}^{n} \langle \hat{y}_t, -z_t \rangle. \tag{3}$$

Since the inequality holds pathwise, by applying it to a $B$-valued martingale difference sequence $Z_1, \dots, Z_n$, we conclude that

$$P\left(\left\|\sum_{t=1}^{n} Z_t\right\| - \sqrt{n} > u\right) \le P\left(\sum_{t=1}^{n} \langle \hat{y}_t, -Z_t \rangle > u\right) \le \exp\left\{-\frac{u^2}{2n}\right\}. \tag{4}$$

The latter upper bound is an application of the Azuma-Hoeffding inequality. Indeed, the process $(\hat{y}_t)$ is predictable with respect to $\sigma(Z_1, \dots, Z_{t-1})$, and thus $(\langle \hat{y}_t, -Z_t \rangle)$ is a $[-1, 1]$-valued martingale difference sequence. It is worth emphasizing the conclusion: one-sided deviation tail bounds for a norm of a vector-valued martingale can be deduced from tail bounds for real-valued martingales with the help of a deterministic regret inequality.

Next, integrating the tail bound in (4) yields a seemingly weaker in-expectation statement

$$\mathbb{E}\left\|\sum_{t=1}^{n} Z_t\right\| \le c\sqrt{n} \tag{5}$$

for an appropriate constant $c$. The twist in this uncomplicated story comes next: with the help of the minimax theorem, (Abernethy et al., 2009; Rakhlin et al., 2010) established existence of strategies $(\hat{y}_t)$ such that

$$\forall z_1, \dots, z_n, \ f \in B, \quad \sum_{t=1}^{n} \langle \hat{y}_t - f, z_t \rangle \le \sup \mathbb{E}\left\|\sum_{t=1}^{n} Z_t\right\|, \tag{6}$$

with the supremum taken over all $2B$-valued martingale difference sequences. In view of (5), this bound is $c\sqrt{n}$.

What have we achieved? Let us summarize. The deterministic inequality (3), which holds for all sequences, implies a tail bound (4).
The latter, in turn, implies an in-expectation bound (5), which implies (3) (with a worse constant) through a minimax argument, thus closing the loop. The equivalence studied in depth in this paper is informally stated below:

Footnote 1. See the two-line proof in the Appendix, Lemma 12.
Informal: The following bounds imply each other:
(a) an inequality that holds for all sequences;
(b) a deviation tail probability for the size of a martingale;
(c) an in-expectation bound on the size of a martingale.

The equivalence, in particular, allows us to amplify the in-expectation bounds to appropriate high-probability tail bounds.

While writing the paper, we learned of the trajectorial approach, extensively studied in recent years. In particular, it has been shown that Doob's maximal inequalities and Burkholder-Davis-Gundy inequalities have deterministic counterparts (Acciaio et al., 2013; Beiglböck and Nutz, 2014; Gushchin, 2014; Beiglböck and Siorpaes, 2015). The online learning literature contains a trove of pathwise inequalities, and further synthesis with the trajectorial approach (and the applications in mathematical finance) appears to be a promising direction.

This paper is organized as follows. In the next section, we extend the Euclidean result to martingales with values in Banach spaces and improve it by replacing $\sqrt{n}$ with the square root of the variation. In particular, we conclude a high-probability self-normalized tail bound, a statement that appears to be difficult to obtain with other methods (see (Bercu et al., 2015; de la Peña et al., 2008) for a survey of techniques in this area). Section 3 is devoted to the analysis of equivalence for supervised learning. Finally, Section 4 shows that it is enough to consider dyadic martingales if one is interested in general martingale inequalities of a certain form.

2. Adaptive Bounds and Probabilistic Inequalities in Banach Spaces

For the case of the Euclidean (or Hilbertian) norm, it is easy to see that the $\sqrt{n}$ bound of (5) can be improved to a distribution-dependent quantity $\left(\mathbb{E}\sum_{t=1}^{n} \|Z_t\|^2\right)^{1/2}$. Given the equivalence sketched earlier, one may wonder whether this upper bound is also equivalent to a gradient-descent-like online method with a sequence-dependent variation governing the rate of convergence.
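Below, we indeed present such an equivalence for 2-smooth Banach spaces. Furthermore, the probabilistic tail bounds obtained this way appear to be novel.

Suppose that we have a norm $\|\cdot\|$ on some vector space such that $\|\cdot\|^2$ is a smooth function:

$$\|x + y\|^2 \le \|x\|^2 + \langle \nabla \|x\|^2, y \rangle + C\|y\|^2 \tag{7}$$

for some $C > 0$. Repeatedly using smoothness of the norm, we conclude that

$$\mathbb{E}\left\|\sum_{t=1}^{n} Z_t\right\|^2 \le C \sum_{t=1}^{n} \mathbb{E}\|Z_t\|^2 \tag{8}$$

for any martingale difference sequence taking values in that vector space, since the cross-terms vanish. Instead of (8), we will work with the tighter inequality

$$\mathbb{E}\left\|\sum_{t=1}^{n} Z_t\right\| \le \sqrt{C}\, \mathbb{E}\sqrt{\sum_{t=1}^{n} \|Z_t\|^2}. \tag{9}$$

Let $(\mathcal{B}, \|\cdot\|)$ be a reflexive Banach space with dual space $(\mathcal{B}^*, \|\cdot\|_*)$. Assume that $(\mathcal{B}, \|\cdot\|)$ is 2-smooth (that is, $\rho(\delta) \triangleq \sup\left\{\tfrac{1}{2}\left(\|x + y\| + \|x - y\|\right) - 1 : \|x\| = 1, \|y\| = \delta\right\}$, the modulus of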
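smoothness, behaves as $c\delta^2$). Then there exists an equivalent norm $\|\cdot\|_{\mathcal{B}}$ (in the sense that $c_1\|\cdot\|_{\mathcal{B}} \le \|\cdot\| \le c_2\|\cdot\|_{\mathcal{B}}$ for some possibly dimension-dependent $c_1, c_2$) that is smooth. In this case, we can expect that (9) holds for martingale difference sequences taking values in $\mathcal{B}$. Let us now argue this more formally, and also show equivalence to the existence of deterministic prediction strategies.

2.1. From regret inequality to expected value and back

Lemma 1. Existence of a (deterministic) prediction strategy $(\hat{y}_t)$, with values $\hat{y}_t(z_1, \dots, z_{t-1})$ in the unit ball $B_*$ of $\mathcal{B}^*$, such that

$$\forall z_1, \dots, z_n \in \mathcal{B}, \ f \in B_*, \quad \sum_{t=1}^{n} \langle \hat{y}_t - f, z_t \rangle \le \sqrt{C}\sqrt{\sum_{t=1}^{n} \|z_t\|^2} \tag{10}$$

for some $C$ is equivalent to (9) (with a possibly different constant $C$) holding for all martingale difference sequences with values in $\mathcal{B}$.

Proof. Rearranging (10) as in (3), choosing a unit vector $f$, and taking an expectation on both sides yields (9) with the same constant $C$ as in (10).

We now argue the reverse direction: (9) implies existence of a strategy with a regret bound (10). First, consider an arbitrary collection $(X_1, \dots, X_n)$ of random variables taking values in an $R$-radius centered ball of $\mathcal{B}$ and define the conditional expectations $\mathbb{E}_{t-1}[\cdot] = \mathbb{E}[\cdot \mid X_1, \dots, X_{t-1}]$. Observe that the collection $(X_t - \mathbb{E}_{t-1}X_t)$, $t = 1, \dots, n$, is a martingale difference sequence. Hence, by the triangle inequality and our assumption,

$$\mathbb{E}\left\|\sum_{t=1}^{n} X_t\right\| - \mathbb{E}\sum_{t=1}^{n}\left\|\mathbb{E}_{t-1}X_t\right\| \le \mathbb{E}\left\|\sum_{t=1}^{n} (X_t - \mathbb{E}_{t-1}X_t)\right\| \le \sqrt{C}\, \mathbb{E}\sqrt{\sum_{t=1}^{n} \|X_t - \mathbb{E}_{t-1}X_t\|^2}. \tag{11}$$

The right-most expression in (11) can be upper bounded by

$$\sqrt{2C}\, \mathbb{E}\sqrt{\sum_{t=1}^{n} \|X_t\|^2 + \|\mathbb{E}_{t-1}X_t\|^2} \le \sqrt{8C}\, \mathbb{E}\sqrt{\sum_{t=1}^{n} \|X_t\|^2}. \tag{12}$$

To justify the last inequality, first observe that $\|\mathbb{E}_{t-1}X_t\| \le \mathbb{E}_{t-1}\|X_t\|$. Second, the function $x \mapsto \sqrt{A + x^2}$ is convex, and (12) follows by Jensen's inequality. Combining (11) and (12), for any finite $R$ and any collection $(X_1, \dots, X_n)$ with values in $R \cdot B$,

$$\mathbb{E}\left[\left\|\sum_{t=1}^{n} X_t\right\| - \sum_{t=1}^{n}\left\|\mathbb{E}_{t-1}X_t\right\| - \sqrt{C'}\sqrt{\sum_{t=1}^{n} \|X_t\|^2}\right] \le 0 \tag{13}$$

with $C' = 8C$. Writing

$$\left\|\mathbb{E}_{t-1}X_t\right\| = -\inf_{\|\hat{y}_t\|_* \le 1} \langle \hat{y}_t, \mathbb{E}_{t-1}X_t \rangle,$$

we conclude that

$$\sup \mathbb{E}\left[\sum_{t=1}^{n} \inf_{\|\hat{y}_t\|_* \le 1} \langle \hat{y}_t, \mathbb{E}_{t-1}X_t \rangle - \inf_{\|f\|_* \le 1} \left\langle f, \sum_{t=1}^{n} X_t \right\rangle - \sqrt{C'}\sqrt{\sum_{t=1}^{n} \|X_t\|^2}\right] \le 0 \tag{14}$$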
where the supremum is over the distributions of $(X_1, \dots, X_n)$ with values in $R \cdot B$. The rest of the argument can be seen as running the proof of (Abernethy et al., 2009) backwards. The minimax theorem holds because of finiteness of $R$, the radius of the support of the $X_t$'s, via arguments in (Rakhlin et al., 2014, Appendix A).

Thankfully, the strategy that guarantees (10) is already known: it is an adaptive version of Mirror Descent. For completeness, the proof is provided in the Appendix. To define the strategy, we need the following fact: if $\mathcal{B}$ is a 2-smooth Banach space, then there is a function $\mathcal{R}$ on the dual space $\mathcal{B}^*$ which is strongly convex with respect to the norm $\|\cdot\|_*$. In fact, one can take the squared dual norm corresponding to the smooth equivalent norm on $\mathcal{B}$ (Borwein et al., 2009). To avoid extra constants, let us simply assume that $\mathcal{R}$ is 1-strongly convex on the unit ball $B_*$ of $\mathcal{B}^*$. The function $\mathcal{R}$ induces the Bregman divergence $D_{\mathcal{R}} : \mathcal{B}^* \times \mathcal{B}^* \to \mathbb{R}$, defined as $D_{\mathcal{R}}(f, g) = \mathcal{R}(f) - \mathcal{R}(g) - \langle \nabla \mathcal{R}(g), f - g \rangle$.

Lemma 2. Let $\mathcal{F} \subset \mathcal{B}^*$ be a convex set. Define, recursively,

$$\hat{y}_{t+1} = \hat{y}_{t+1}(z_1, \dots, z_t) = \operatorname*{argmin}_{f \in \mathcal{F}} \ \eta_t \langle f, z_t \rangle + D_{\mathcal{R}}(f, \hat{y}_t) \tag{15}$$

with $\hat{y}_1 = 0$, $\eta_t \triangleq R_{\max}\left(\sum_{s=1}^{t} \|z_s\|^2\right)^{-1/2}$, and $R_{\max}^2 \triangleq \sup_{f, g \in \mathcal{F}} D_{\mathcal{R}}(f, g)$. Then for any $f \in \mathcal{F}$ and any $z_1, \dots, z_n \in \mathcal{B}$,

$$\sum_{t=1}^{n} \langle \hat{y}_t - f, z_t \rangle \le 2R_{\max}\sqrt{\sum_{t=1}^{n} \|z_t\|^2}.$$

Lemma 2 is complementary to Lemma 1, as it gives the algorithm whose existence was guaranteed by Lemma 1. In Section 3, we will not have the luxury of producing an explicit algorithm, yet the equivalence will still be established.

2.2. From regret inequalities to tail bounds and back

We now start from a regret-minimization strategy and deduce a new probabilistic inequality for martingales. We then conclude the in-expectation bound and use the equivalence of Lemma 1 to close the loop. The adaptive Mirror Descent algorithm of the previous section implies the following theorem:

Theorem 3. Let $Z_1, \dots, Z_n$ be a $\mathcal{B}$-valued martingale difference sequence, and let $\mathbb{E}_t$ stand for the conditional expectation given $Z_1, \dots, Z_t$.
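Define

$$V = 2\sum_{t=1}^{n} \|Z_t\|^2 \quad \text{and} \quad W = 2\sum_{t=1}^{n} \mathbb{E}_{t-1}\|Z_t\|^2, \tag{16}$$

which are assumed to have a finite expected value. For any $u > 0$, it holds that

$$P\left(\frac{\left\|\sum_{t=1}^{n} Z_t\right\| - 2R_{\max}\sqrt{V}}{\sqrt{V + W + \left(\mathbb{E}\sqrt{V + W}\right)^2}} > u\right) \le \sqrt{2}\exp\{-u^2/16\}, \tag{17}$$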
and for any $u \ge \sqrt{2}$, it holds that

$$P\left(\frac{\left\|\sum_{t=1}^{n} Z_t\right\| - 2R_{\max}\sqrt{V}}{\sqrt{(V + W + 1)\left(1 + \tfrac{1}{2}\log(V + W + 1)\right)}} \ge u\right) \le \exp\{-u^2/2\}. \tag{18}$$

Furthermore, both bounds also hold with $W \equiv 0$ and $V = \sum_{t=1}^{n} \|Z_t\|^2$ if the martingale differences are conditionally symmetric, that is, if the laws satisfy $\mathcal{L}(Z_t \mid Z_1, \dots, Z_{t-1}) = \mathcal{L}(-Z_t \mid Z_1, \dots, Z_{t-1})$.

In addition to extending the Euclidean result of the previous section to Banach spaces, (17) and (18) offer several advantages. First, the bounds are $n$-independent. The deviations in (17) and (18) are self-normalized (that is, scaled by root-variation terms) and all the terms are either distribution-dependent or data-dependent, as in the case of the Student's t-statistic (de la Peña et al., 2008). The advantage of (18), especially in the case of conditional symmetry, is that all the terms, modulo the additive constants 1, are data-dependent. We are not aware of similar bounds for norms of random vectors in the literature, and we wish to stress that the proof of the result is almost immediate, given the regret inequality. We would also like to stress that Theorem 3 holds without any assumption on the martingale difference sequence beyond square integrability.

Proof [Theorem 3]. We take $\mathcal{F}$ in Lemma 2 to be the unit ball in $\mathcal{B}^*$, ensuring $\|\hat{y}_t\|_* \le 1$. For any martingale difference sequence $(Z_t)$ with values in $\mathcal{B}$, the above lemma implies, by the definition of the norm,

$$\left\|\sum_{t=1}^{n} Z_t\right\| - 2R_{\max}\sqrt{V} \le \sum_{t=1}^{n} \langle \hat{y}_t, -Z_t \rangle \tag{19}$$

deterministically for all sample paths. Dividing both sides by $\sqrt{V + W + (\mathbb{E}\sqrt{V + W})^2}$, we conclude that the left-hand side in (17) is upper bounded by

$$P\left(\frac{\sum_{t=1}^{n} \langle \hat{y}_t, -Z_t \rangle}{\sqrt{V + W + \left(\mathbb{E}\sqrt{V + W}\right)^2}} > u\right). \tag{20}$$

To control this probability, we recall the following results (de la Peña et al., 2008, Theorem 12.4, Corollary 12.5):

Theorem 4 ((de la Peña et al., 2008)). For a pair of random variables $A, B$, with $B > 0$, such that

$$\mathbb{E}\exp\left\{\lambda A - \lambda^2 B^2/2\right\} \le 1 \quad \forall \lambda \in \mathbb{R}, \tag{21}$$

it holds that for any $u > 0$,

$$P\left(\frac{A}{\sqrt{B^2 + (\mathbb{E}B)^2}} > u\right) \le \sqrt{2}\exp\{-u^2/4\}$$
and for any $y > 0$ and $u \ge \sqrt{2}$,

$$P\left(\frac{A}{\sqrt{(B^2 + y)\left(1 + \tfrac{1}{2}\log(B^2/y + 1)\right)}} \ge u\right) \le \exp\{-u^2/2\}.$$

To apply this theorem, we verify assumption (21):

Lemma 5. The random variables $A = \sum_{t=1}^{n} \langle \hat{y}_t, -Z_t \rangle$ and $B^2 = 2\sum_{t=1}^{n} \left(\|Z_t\|^2 + \mathbb{E}_{t-1}\|Z_t\|^2\right)$ satisfy (21). Furthermore, if the $Z_t$'s are conditionally symmetric, then $A = \sum_{t=1}^{n} \langle \hat{y}_t, -Z_t \rangle$ and $B^2 = \sum_{t=1}^{n} \|Z_t\|^2$ satisfy (21).

The simple proof of the Lemma is postponed to the Appendix. Putting together (20) with Lemma 5 and Theorem 4 concludes the proof of Theorem 3.

To close the loop of equivalences, we need to deduce (9) from the tail bound inequality. Let us use the first part of Theorem 3. Denote the random variable in the numerator of the fraction in (17) as $Y$ and the denominator as a random variable $U$. Then (17) implies that $(Y/U)$ is a subgaussian random variable. Hence, its second moment is bounded by a constant: $\mathbb{E}(Y/U)^2 \le c$. Moreover, by the Cauchy-Schwarz inequality,

$$\mathbb{E}Y = \mathbb{E}\left(U \cdot \frac{Y}{U}\right) \le \left(\mathbb{E}U^2\right)^{1/2}\left(\mathbb{E}\frac{Y^2}{U^2}\right)^{1/2} \le \sqrt{c\, \mathbb{E}U^2},$$

implying

$$\mathbb{E}\left\|\sum_{t=1}^{n} Z_t\right\| \le 2R_{\max}\mathbb{E}\sqrt{V} + 2\sqrt{c\, \mathbb{E}V}. \tag{22}$$

This almost closes the loop, except the last term in (22) has the expectation inside the square root rather than outside, and thus presents a weaker upper bound (in the sense of (8) rather than (9)). We conjecture that there is a way to prove the upper bound with the expectation outside the square root. Nonetheless, to keep the promise of closing the loop, we observe that the upper bound of (8) implies that the Banach space has martingale type 2, which implies, via (Srebro et al., 2011), existence of a strongly convex function on the dual space, and, hence, existence of a strategy that guarantees (10) with a constant $C$ that may depend at most logarithmically on $n$.

2.3. Remarks

We compare our result to that of Pinelis (1994). Let $Z_1, \dots, Z_n$ be a martingale difference sequence taking values in a separable $(2, D)$-smooth Banach space $(\mathcal{B}, \|\cdot\|)$. Pinelis (1994) proved, through a significantly more difficult analysis, that for any $u > 0$,

$$P\left(\sup_{n \ge 1}\left\|\sum_{t=1}^{n} Z_t\right\| \ge \sigma u\right) \le 2\exp\left\{-\frac{u^2}{2D^2}\right\}, \tag{23}$$

where $\sigma$ is a constant satisfying $\sum_{t \ge 1} \|Z_t\|_\infty^2 \le \sigma^2$.
In comparison to Theorem 3, this result involves a distribution-independent variation $\sigma$ as a worst-case pointwise upper bound.
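To make the data-dependent scaling concrete, here is a minimal sketch of the adaptive update (15) in the simplest setting. It assumes the Euclidean case with the regularizer $\tfrac{1}{2}\|f\|^2$ (so the Bregman divergence is $\tfrac{1}{2}\|f - g\|^2$, the update is a projected gradient step, and $R_{\max}^2 = 2$ on the unit ball); these choices and all function names are illustrative assumptions, not the paper's construction for general Banach spaces. The sketch checks the regret bound of Lemma 2 on one heavy-tailed sequence.

```python
import numpy as np

# Assumption: Euclidean case, regularizer 0.5*||f||^2, so D(f, g) = 0.5*||f - g||^2
# and R_max^2 = sup over the unit ball of D(f, g) = 2.
R_MAX = np.sqrt(2.0)


def project_to_unit_ball(v):
    """Euclidean projection onto the unit ball."""
    norm = np.linalg.norm(v)
    return v if norm <= 1.0 else v / norm


def adaptive_predictions(z):
    """Predictions y_hat_1..y_hat_n with step eta_t = R_max / sqrt(sum_{s<=t} ||z_s||^2)."""
    n, d = z.shape
    y_hat = np.zeros((n, d))  # y_hat_1 = 0
    cumulative = 0.0
    for t in range(n - 1):
        cumulative += float(z[t] @ z[t])
        if cumulative > 0.0:
            eta = R_MAX / np.sqrt(cumulative)
            y_hat[t + 1] = project_to_unit_ball(y_hat[t] - eta * z[t])
        else:
            # All z so far are zero, so no step is taken.
            y_hat[t + 1] = y_hat[t]
    return y_hat


def worst_case_regret(z):
    """sup over f in the unit ball of sum_t <y_hat_t - f, z_t>."""
    y_hat = adaptive_predictions(z)
    return float(np.sum(y_hat * z) + np.linalg.norm(z.sum(axis=0)))


def regret_bound(z):
    """Right-hand side of Lemma 2: 2 * R_max * sqrt(sum_t ||z_t||^2)."""
    return float(2.0 * R_MAX * np.sqrt(np.sum(z * z)))


rng = np.random.default_rng(0)
z = rng.standard_t(df=3, size=(500, 4))  # heavy-tailed, unbounded increments
print(worst_case_regret(z), "<=", regret_bound(z))
```

Note that the sequence is not rescaled to the unit ball: the step size adapts to the observed norms, which is what removes any dependence on $n$ or on a worst-case bound such as $\sigma$.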
The reader will notice that the pathwise inequality (19) does not depend on $n$ and the construction of $\hat{y}_t$ is also oblivious to this value. A simple argument then allows us to lift the real-valued Burkholder-Davis-Gundy inequality (with the constant from (Burkholder, 2002)) to Banach space valued martingales:

Lemma 6. With the notation of Theorem 3,

$$\mathbb{E}\max_{s = 1, \dots, n}\left\|\sum_{t=1}^{s} Z_t\right\| \le (2R_{\max} + 3)\, \mathbb{E}\sqrt{V}.$$

Remarkably, the constant in the resulting BDG inequality is, up to an additive constant, proportional to $R_{\max}$. Once again, we have not seen such results in the literature, yet they follow with ease from regret inequalities.

We also remark that Theorem 3 can be naturally extended to $p$-smooth Banach spaces $\mathcal{B}$. This is accomplished in a straightforward manner by extending Lemma 2.

3. Probabilistic Inequalities and Supervised Learning

We now look beyond linear prediction and analyze supervised learning problems with side information. Here again we establish a strong connection between existence of prediction strategies, the in-expectation inequalities for martingales, and high-probability tail bounds. In contrast to Section 2, we will not present any algorithms. Note that the simplest example of the equivalence (for binary prediction and in the absence of side information) was already stated at the very beginning of this paper.

3.1. Supervised learning with side information

We let $y_1, \dots, y_n \in \{\pm 1\}$ and $x_1, \dots, x_n \in \mathcal{X}$ for some abstract measurable set $\mathcal{X}$. Let $\mathcal{F}$ be a class of $[-1, 1]$-valued functions on $\mathcal{X}$. Fix a cost function $\ell : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, convex in the first argument. For a given function $B : \mathcal{F} \times \mathcal{X}^n \to \mathbb{R}$, we aim to construct $\hat{y}_t = \hat{y}_t(x_1, \dots, x_t, y_1, \dots, y_{t-1}) \in [-1, 1]$ such that the following adaptive bound holds:

$$\forall (x_t, y_t)_{t=1}^{n}, \quad \sum_{t=1}^{n} \ell(\hat{y}_t, y_t) \le \inf_{f \in \mathcal{F}}\left\{\sum_{t=1}^{n} \ell(f(x_t), y_t) + B(f; x_1, \dots, x_n)\right\}. \tag{24}$$

We may view $\hat{y}_t$ as a prediction of the next value $y_t$ having observed $x_t$ and all the data thus far. In this paper, we focus on the linear loss $\ell(a, b) = -ab$ (equivalently, absolute loss $|a - b| = 1 - ab$ when $a \in [-1, 1]$ and $b \in \{\pm 1\}$) and the square loss $\ell(a, b) = (a - b)^2$.
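We write (24) for the linear cost function as

$$\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^{n} y_t f(x_t) - B(f; x_1, \dots, x_n)\right\} \le \sum_{t=1}^{n} y_t \hat{y}_t \tag{25}$$

while for the square loss it becomes

$$\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^{n} \left(2 y_t f(x_t) - f(x_t)^2\right) - B(f; x_1, \dots, x_n)\right\} \le \sum_{t=1}^{n} \left(2 y_t \hat{y}_t - \hat{y}_t^2\right). \tag{26}$$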
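For completeness, the rearrangement that takes (24) to (25) and (26) can be written out as follows. This is a routine worked step using only the definitions of the two losses; the key point for the square loss is that the terms $y_t^2$ appear on both sides and cancel.

```latex
\begin{align*}
\text{Linear loss, } \ell(a,b) = -ab: \quad
  & -\sum_{t=1}^{n} \hat{y}_t y_t
    \le \inf_{f \in \mathcal{F}} \Big\{ -\sum_{t=1}^{n} f(x_t) y_t + B(f; x_1,\dots,x_n) \Big\} \\
  \iff\ & \sup_{f \in \mathcal{F}} \Big\{ \sum_{t=1}^{n} y_t f(x_t) - B(f; x_1,\dots,x_n) \Big\}
    \le \sum_{t=1}^{n} y_t \hat{y}_t . \\[6pt]
\text{Square loss, } \ell(a,b) = (a-b)^2: \quad
  & \sum_{t=1}^{n} \big( \hat{y}_t^2 - 2 y_t \hat{y}_t + y_t^2 \big)
    \le \inf_{f \in \mathcal{F}} \Big\{ \sum_{t=1}^{n} \big( f(x_t)^2 - 2 y_t f(x_t) + y_t^2 \big) + B(f; x_1,\dots,x_n) \Big\} \\
  \iff\ & \sup_{f \in \mathcal{F}} \Big\{ \sum_{t=1}^{n} \big( 2 y_t f(x_t) - f(x_t)^2 \big) - B(f; x_1,\dots,x_n) \Big\}
    \le \sum_{t=1}^{n} \big( 2 y_t \hat{y}_t - \hat{y}_t^2 \big) .
\end{align*}
```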
Given a function $B$ and a class $\mathcal{F}$, there are two goals we may consider: (a) certify the existence of $(\hat{y}_t) = (\hat{y}_1, \dots, \hat{y}_n)$ satisfying the pathwise inequality (24) for all sequences $(x_t, y_t)_{t=1}^{n}$; or (b) give an explicit construction of $(\hat{y}_t)$. Both questions have been studied in the online learning literature, but the non-constructive approach will play an especially important role. Indeed, explicit constructions such as the simple gradient descent update (2) might not be available in more complex situations, yet it is the existence of $(\hat{y}_t)$ that yields the sought-after tail bounds.

To certify the existence of a strategy $(\hat{y}_t)$, consider the following object:

$$\mathcal{A}(\mathcal{F}, B) = \left\langle\!\!\left\langle \sup_{x_t} \inf_{\hat{y}_t} \max_{y_t} \right\rangle\!\!\right\rangle_{t=1}^{n} \left\{\sum_{t=1}^{n} \ell(\hat{y}_t, y_t) - \inf_{f \in \mathcal{F}}\left\{\sum_{t=1}^{n} \ell(f(x_t), y_t) + B(f; x_1, \dots, x_n)\right\}\right\} \tag{27}$$

where the notation $\langle\!\langle \cdot \rangle\!\rangle_{t=1}^{n}$ stands for the repeated application of the operators (the outer operators corresponding to $t = 1$). The variable $x_t$ ranges over $\mathcal{X}$, $y_t$ is in the set $\{\pm 1\}$, and $\hat{y}_t$ ranges in $[-1, 1]$. It follows that $\mathcal{A}(\mathcal{F}, B) \le 0$ is a necessary and sufficient condition for the existence of $(\hat{y}_t)$ such that (24) holds. Indeed, the optimal choice for $\hat{y}_1$ is made given $x_1$; the optimal choice for $\hat{y}_2$ is made given $x_1, y_1, x_2$, and so on. This choice defines the optimal strategy $(\hat{y}_t)$ (footnote 3). The other direction is immediate.

Suppose we can find an upper bound on $\mathcal{A}(\mathcal{F}, B)$ and then prove that this upper bound is non-positive. This would serve as a sufficient condition for the existence of $(\hat{y}_t)$. Next, we present such an upper bound for the case when the cost function is linear. More general results for convex Lipschitz cost functions can be found in (Foster et al., 2015).

3.2. Linear loss

As in the introduction, let $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$ be a sequence of independent Rademacher random variables. Let $\mathbf{x} = (x_1, \dots, x_n)$ and $\mathbf{y} = (y_1, \dots, y_n)$ be predictable processes with respect to the dyadic filtration $(\sigma(\varepsilon_1, \dots, \varepsilon_t))_{t=0}^{n}$, with values in $\mathcal{X}$ and $\{\pm 1\}$, respectively. In other words, $x_t = x_t(\varepsilon_1, \dots, \varepsilon_{t-1}) \in \mathcal{X}$ and $y_t = y_t(\varepsilon_1, \dots, \varepsilon_{t-1}) \in \{\pm 1\}$ for each $t = 1, \dots, n$.
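A predictable process with respect to the dyadic filtration can be stored as a lookup table from sign prefixes to values, which makes expectations over $\varepsilon$ computable by brute force for small $n$. The sketch below is only an illustration under that assumption: the dictionary representation, the function names, and the toy function class are hypothetical choices, and full enumeration of the $2^n$ sign paths is feasible only for tiny $n$. It evaluates the offset quantity $\mathbb{E}\left[\sup_f \sum_t \varepsilon_t f(x_t) - B\right]$ that the linear-loss analysis is built around.

```python
import itertools


def sign_prefixes(n):
    """All sign tuples of length 0..n-1: the nodes of a depth-n binary tree."""
    return [p for t in range(n) for p in itertools.product((-1, 1), repeat=t)]


def expected_offset_supremum(tree, functions, offset, n):
    """Average over all 2^n sign paths of max_f { sum_t eps_t f(x_t) - offset(f, xs) }.

    tree maps a prefix (eps_1, ..., eps_{t-1}) to the value x_t, so x_t never
    depends on eps_t or on later signs (predictability).
    """
    total = 0.0
    for eps in itertools.product((-1, 1), repeat=n):
        xs = [tree[eps[:t]] for t in range(n)]
        total += max(
            sum(e * f(x) for e, x in zip(eps, xs)) - offset(f, xs)
            for f in functions
        )
    return total / 2 ** n


n = 3
# Hypothetical tree: the label of a node is the parity of its depth.
tree = {prefix: len(prefix) % 2 for prefix in sign_prefixes(n)}
# Toy class of two constant functions, so sup_f sum_t eps_t f(x_t) = |sum_t eps_t|.
functions = [lambda x: 1.0, lambda x: -1.0]

value = expected_offset_supremum(tree, functions, lambda f, xs: 0.0, n)
print(value)  # E|eps_1 + eps_2 + eps_3| = (2*3 + 6*1) / 8 = 1.5
```

With a constant offset equal to 1.5 the same quantity is exactly zero for this class, so for this toy example a constant offset of 1.5 is the smallest one for which the expected offset supremum is non-positive.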
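One can think of the collections $(x_t)$ and $(y_t)$ as trees labeled, respectively, by elements of $\mathcal{X}$ and $\{\pm 1\}$.

Footnote 3. If the infima are not achieved, a limiting argument can be employed.

Lemma 7. For the case of the linear cost function,

$$\mathcal{A}(\mathcal{F}, B) = \sup_{\mathbf{x}} \mathbb{E}\left[\sup_{f \in \mathcal{F}} \sum_{t=1}^{n} \varepsilon_t f(x_t) - B(f; x_1, \dots, x_n)\right]. \tag{28}$$

Therefore, the following are equivalent:

• For any predictable process $\mathbf{x} = (x_1, \dots, x_n)$,

$$\mathbb{E}\left[\sup_{f \in \mathcal{F}} \sum_{t=1}^{n} \varepsilon_t f(x_t) - B(f; x_1, \dots, x_n)\right] \le 0, \tag{29}$$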
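• There exists a strategy $(\hat{y}_t)$ such that the pathwise inequality (25) holds.

Furthermore, the strategy can be assumed to satisfy

$$|\hat{y}_t| \le \sup_{f \in \mathcal{F}} |f(x_t)|. \tag{30}$$

The in-expectation bound of (29) is a necessary and sufficient condition for the existence of a strategy with the per-sequence bound (25). This latter bound, however, implies a high-probability statement, in the spirit of the other results in the paper. Below, we detail this amplification.

Take any $\mathcal{X}$-valued predictable process $\mathbf{x} = (x_1, \dots, x_n)$ with respect to the dyadic filtration. The deterministic inequality (25) applied to $x_t = x_t(\varepsilon_1, \dots, \varepsilon_{t-1})$ and $y_t = \varepsilon_t$ becomes

$$\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^{n} \varepsilon_t f(x_t) - B(f; x_1, \dots, x_n)\right\} \le \sum_{t=1}^{n} \varepsilon_t \hat{y}_t \tag{31}$$

for any sample path $(\varepsilon_1, \dots, \varepsilon_n)$, and thus we have the comparison of tails

$$P\left(\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^{n} \varepsilon_t f(x_t) - B(f; x_1, \dots, x_n)\right\} > u\right) \le P\left(\sum_{t=1}^{n} \varepsilon_t \hat{y}_t > u\right). \tag{32}$$

Given the boundedness of the increments $\varepsilon_t \hat{y}_t$, the tail bounds follow immediately from the Azuma-Hoeffding inequality or from Freedman's inequality (Freedman, 1975). More precisely, we use the fact that the martingale differences are bounded by $|\hat{y}_t| \le \sup_{f \in \mathcal{F}} |f(x_t)|$, and conclude:

Lemma 8. If there exists a prediction strategy $(\hat{y}_t)$ that satisfies (25) and (30), then for any predictable process $\mathbf{x}$, the Azuma-Hoeffding inequality implies that

$$P\left(\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^{n} \varepsilon_t f(x_t) - B(f; x_1, \dots, x_n)\right\} > u\right) \le \exp\left(-\frac{u^2}{4\max_{\varepsilon}\sum_{t=1}^{n} \sup_{f \in \mathcal{F}} f(x_t(\varepsilon))^2}\right), \tag{33}$$

Freedman's inequality implies

$$P\left(\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^{n} \varepsilon_t f(x_t) - B(f; x_1, \dots, x_n)\right\} > u, \ \sum_{t=1}^{n} \sup_{f \in \mathcal{F}} f(x_t)^2 \le \sigma^2\right) \le \exp\left(-\frac{u^2}{2\sigma^2 + 2uM/3}\right), \tag{34}$$

where $M = \sup_{f \in \mathcal{F},\, \varepsilon \in \{\pm 1\}^n,\, t \le n} |f(x_t(\varepsilon))|$, and we also have that for any $\alpha > 0$,

$$P\left(\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^{n} \varepsilon_t f(x_t) - B(f; x_1, \dots, x_n)\right\} - \alpha\sum_{t=1}^{n} \sup_{f \in \mathcal{F}} f(x_t)^2 > u\right) \le \exp(-2\alpha u). \tag{35}$$

In view of Lemma 7, a sufficient condition for these inequalities is that (29) holds for all $\mathbf{x}$.

Let us emphasize the conclusion of the above lemma: the non-positivity of the expected supremum of a collection of martingales, offset by a function $B$, implies existence of a regret-minimization strategy, which implies a high-probability tail bound. To close the loop,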
we integrate out the tails, obtaining an in-expectation bound of the form (29), but possibly with a somewhat larger $B$ function (this depends on the particular form of $B$).

In addition to describing the equivalence, let us capitalize on it and prove a new tail bound. The most basic $B$ is a constant that depends on the complexity of $\mathcal{F}$, but not on $f$ or the data. Define the worst-case sequential Rademacher averages as

$$\mathfrak{R}_n(\mathcal{F}) \triangleq \sup_{\mathbf{x}} \mathbb{E}\sup_{f \in \mathcal{F}} \sum_{t=1}^{n} \varepsilon_t f(x_t). \tag{36}$$

Clearly, $B = \mathfrak{R}_n(\mathcal{F})$ satisfies (29) and the following is immediate.

Corollary 9. For any $\mathcal{F} \subseteq \mathbb{R}^{\mathcal{X}}$ and an $\mathcal{X}$-valued predictable process $\mathbf{x}$ with respect to the dyadic filtration,

$$P\left(\sup_{f \in \mathcal{F}} \sum_{t=1}^{n} \varepsilon_t f(x_t) > \mathfrak{R}_n(\mathcal{F}) + u\right) \le \exp\left(-\frac{u^2}{4\max_{\varepsilon}\sum_{t=1}^{n} \sup_{f \in \mathcal{F}} f(x_t(\varepsilon))^2}\right). \tag{37}$$

Superficially, (37) looks like a one-sided version of a deviation bound for classical (i.i.d.) Rademacher averages (Boucheron et al., 2013). However, sequential Rademacher averages are not Lipschitz with respect to a flip of a sign, as all of the remaining path may change after a flip. It is unclear to the authors how to prove (37) through other existing methods.

3.3. Square loss

Due to limited space, we will not state the analogue of Lemma 7 and simply outline the implication from existence of regret minimization strategies to high-probability tail bounds. As for the case of the linear loss function, take any $\mathcal{X}$-valued predictable process $\mathbf{x} = (x_1, \dots, x_n)$ with respect to the dyadic filtration. Fix $\alpha > 0$. The deterministic inequality (26) for $x_t = x_t(\varepsilon_1, \dots, \varepsilon_{t-1})$ and $y_t = \frac{1}{\alpha}\varepsilon_t$ becomes

$$\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^{n} \left(\frac{2}{\alpha}\varepsilon_t f(x_t) - f^2(x_t)\right) - B(f; x_1, \dots, x_n)\right\} \le \sum_{t=1}^{n} \left(\frac{2}{\alpha}\varepsilon_t \hat{y}_t - \hat{y}_t^2\right). \tag{38}$$

As in the proof of (35), we obtain a tail comparison

$$P\left(\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^{n} \left(\frac{2}{\alpha}\varepsilon_t f(x_t) - f^2(x_t)\right) - B(f; x_1, \dots, x_n)\right\} > \frac{u}{\alpha}\right) \le P\left(\sum_{t=1}^{n} \left(\frac{2}{\alpha}\varepsilon_t \hat{y}_t - \hat{y}_t^2\right) > \frac{u}{\alpha}\right) \le \exp\left\{-\frac{\alpha u}{2}\right\} \tag{39}$$

where the last inequality follows via a standard analysis of the moment generating function.

As an example, consider the Azoury-Vovk-Warmuth forecaster for linear regression (see e.g. (Cesa-Bianchi and Lugosi, 2006, Sec. 11.8)).
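Take the class $\mathcal{F}$ to be the class of functions $\mathcal{F} = \{x \mapsto \langle f, x \rangle : f \in B_2^d\}$, where $B_2^d$ is the unit Euclidean ball in $\mathbb{R}^d$. Assuming $\mathcal{X} = B_2^d$, the regret bound for the forecaster is known to be

$$B(f; x_1, \dots, x_n) = \|f\|^2 + Y^2\sum_{t=1}^{n} x_t^{\mathsf{T}} A_t^{-1} x_t,$$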
where $A_t = I + \sum_{s=1}^{t} x_s x_s^{\mathsf{T}}$ and $Y = \max_t |y_t|$. However, when $\mathcal{F}$ is indexed by the unit ball, the supremum in (39) has a closed form expression, and the overall probability inequality takes on the form

$$P\left(\frac{1}{2}\left\|\sum_{t=1}^{n} \varepsilon_t x_t\right\|_{A_n^{-1}}^2 \ge \frac{1}{2}\sum_{t=1}^{n} x_t^{\mathsf{T}} A_t^{-1} x_t + u\right) \le \exp\{-u\}. \tag{40}$$

We point out that, being functions of Rademacher random variables, the $x_t$'s are random variables themselves, and the terms in the above expression are dependent in a non-trivial manner.

We would like to refer the reader to the full version of this paper (Rakhlin and Sridharan, 2015) which contains further implications of the equivalence between the existence of deterministic strategies and tail bounds. In particular, the amplification allows us to prove a characterization of a notion of martingale type beyond the linear case.

4. Symmetrization: dyadic filtration is enough

In Section 3, we presented connections between deterministic regret inequalities in the supervised setting and tail bounds for dyadic martingales. One may ask whether these tail bounds can be used for more general martingales indexed by some set. The purpose of this section is to prove that statements for the dyadic filtration can be lifted to general processes via sequential symmetrization.

Consider the martingale

$$M_n^g = \sum_{t=1}^{n} \left(g(Z_t) - \mathbb{E}[g(Z_t) \mid Z_1, \dots, Z_{t-1}]\right)$$

indexed by $g \in \mathcal{G}$. If $(Z_t)$ is adapted to a dyadic filtration $\mathcal{A}_t = \sigma(\varepsilon_1, \dots, \varepsilon_t)$, each increment $g(Z_t) - \mathbb{E}[g(Z_t) \mid Z_1, \dots, Z_{t-1}]$ takes on the value

$$f_g(x_t(\varepsilon_{1:t-1})) \triangleq \left(g(Z_t(\varepsilon_{1:t-1}, +1)) - g(Z_t(\varepsilon_{1:t-1}, -1))\right)/2$$

or its negation, where $x_t$ is a predictable process with values in $\mathcal{Z} \times \mathcal{Z}$ and $f_g \in \mathcal{F}$ is defined by $(z, z') \mapsto (g(z) - g(z'))/2$. In Section 3, we worked directly with martingales of the form $M_n^f = \sum_{t=1}^{n} \varepsilon_t f(x_t(\varepsilon))$, indexed by an abstract class $\mathcal{F} \subseteq \mathbb{R}^{\mathcal{X}}$ and an abstract $\mathcal{X}$-valued predictable process $\mathbf{x}$.

We extend the symmetrization approach of Panchenko (Panchenko, 2003) to sequential symmetrization for the case of martingales.
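The claim about the increments of a martingale adapted to the dyadic filtration follows from a one-line computation, spelled out here for convenience. Conditionally on $\varepsilon_{1:t-1}$, the variable $Z_t$ takes only two equally likely values, so the conditional mean is their average:

```latex
\begin{align*}
\mathbb{E}\big[g(Z_t) \mid \varepsilon_{1:t-1}\big]
  &= \tfrac{1}{2}\, g\big(Z_t(\varepsilon_{1:t-1}, +1)\big)
   + \tfrac{1}{2}\, g\big(Z_t(\varepsilon_{1:t-1}, -1)\big), \\
g(Z_t) - \mathbb{E}\big[g(Z_t) \mid \varepsilon_{1:t-1}\big]
  &= \varepsilon_t \cdot \tfrac{1}{2}\Big( g\big(Z_t(\varepsilon_{1:t-1}, +1)\big)
   - g\big(Z_t(\varepsilon_{1:t-1}, -1)\big) \Big).
\end{align*}
```

The second line is checked by substituting $\varepsilon_t = +1$ and $\varepsilon_t = -1$ separately, which is why every such martingale can be written as a sum of Rademacher signs multiplied by predictable coefficients.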
In contrast to the more frequently-used Giné-Zinn symmetrization proof (via Chebyshev's inequality) (Giné and Zinn, 1984; Van Der Vaart and Wellner, 1996) that allows a direct tail comparison of the symmetrized and the original processes, Panchenko's approach allows for an indirect comparison. The following immediate extension of (Panchenko, 2003, Lemma 1) will imply that any $\exp\{-\mu(u)\}$ type tail behavior of the symmetrized process yields the same behavior for the original process.

Lemma 10. Suppose $\xi$ and $\nu$ are random variables and for some $\Gamma \ge 1$ and for all $u \ge 0$

$$P(\nu \ge u) \le \Gamma\exp\{-\mu(u)\}.$$
Let $\mu : \mathbb{R}_+ \to \mathbb{R}_+$ be an increasing differentiable function with $\mu(0) = 0$ and $\mu(\infty) = \infty$. Suppose for all $a \in \mathbb{R}$ and $\phi(x) \triangleq \mu([x - a]_+)$ it holds that $\mathbb{E}\phi(\xi) \le \mathbb{E}\phi(\nu)$. Then for any $u \ge 0$,

$$P(\xi \ge u) \le \Gamma\exp\{-\mu(u - \mu^{-1}(1))\}.$$

In particular, if $\mu(b) = cb$, we have $P(\xi \ge u) \le \Gamma\exp\{1 - cu\}$; if $\mu(b) = cb^2$, then $P(\xi \ge u) \le \Gamma\exp\{1 - cu^2/4\}$.

As in (Panchenko, 2003), the lemma will be used with $\xi$ and $\nu$ as functions of a single sample and the double sample, respectively. The expression for the double sample will be symmetrized in order to pass to the dyadic filtration. However, unlike (Panchenko, 2003), we are dealing with a dependent sequence $Z_1, \dots, Z_n$, and the meaning ascribed to the second sample $Z'_1, \dots, Z'_n$ is that of a conditionally independent tangent sequence. That is, $Z_t, Z'_t$ are independent and have the same distribution conditionally on $Z_1, \dots, Z_{t-1}$. Let $\mathbb{E}_{t-1}$ stand for the conditional expectation given $Z_1, \dots, Z_{t-1}$.

Corollary 11. Let $B : \mathcal{G} \times \mathcal{Z}^{2n} \to \mathbb{R}$ be a function that is symmetric with respect to the swap of the $i$-th pair $z_i, z'_i$, for any $i \in [n]$:

$$B(g; z_1, z'_1, \dots, z_i, z'_i, \dots, z_n, z'_n) = B(g; z_1, z'_1, \dots, z'_i, z_i, \dots, z_n, z'_n) \tag{41}$$

for all $g \in \mathcal{G}$. Then, under the assumptions of Lemma 10 on $\mu$, a tail behavior

$$\forall (\mathbf{z}, \mathbf{z}'), \quad P\left(\sup_{g \in \mathcal{G}} \sum_{t=1}^{n} \varepsilon_t\left(g(z_t) - g(z'_t)\right) - B(g; (z_1, z'_1), \dots, (z_n, z'_n)) > u\right) \le \Gamma\exp\{-\mu(u)\}$$

for all $u > 0$ implies the tail bound

$$P\left(\sup_{g \in \mathcal{G}} \sum_{t=1}^{n} \left(g(Z_t) - \mathbb{E}_{t-1}g(Z_t)\right) - \mathbb{E}_{Z'_{1:n}} B(g; Z_1, Z'_1, \dots, Z_n, Z'_n) > u\right) \le \Gamma\exp\{-\mu(u - \mu^{-1}(1))\}$$

for any sequence of random variables $Z_1, \dots, Z_n$ and the corresponding tangent sequence $Z'_1, \dots, Z'_n$. The supremum is taken over a pair of predictable processes $\mathbf{z}, \mathbf{z}'$ with respect to the dyadic filtration. A direct comparison of the expected suprema also holds:

$$\mathbb{E}\sup_{g \in \mathcal{G}} \sum_{t=1}^{n} \left(g(Z_t) - \mathbb{E}_{t-1}g(Z_t)\right) - \mathbb{E}_{Z'_{1:n}} B(g; Z_1, Z'_1, \dots, Z_n, Z'_n) \le \sup_{\mathbf{z}, \mathbf{z}'} \mathbb{E}\sup_{g \in \mathcal{G}} \sum_{t=1}^{n} \varepsilon_t\left(g(z_t) - g(z'_t)\right) - B(g; (z_1, z'_1), \dots, (z_n, z'_n)). \tag{42}$$

We conclude that it is enough to prove tail bounds for a supremum

$$\sup_{f \in \mathcal{F}} \sum_{t=1}^{n} \varepsilon_t f(x_t) - B(f; x_1, \dots, x_n)$$

of a martingale with respect to the dyadic filtration, offset by a function $B(f; x_1, \dots, x_n)$, as done in Section 3.
Acknowledgements

Research is supported in part by the NSF under grants no. CDS&E-MSS and

References

J. Abernethy, A. Agarwal, P. Bartlett, and A. Rakhlin. A stochastic view of optimal regret through minimax duality. In Proceedings of the 22nd Annual Conference on Learning Theory, 2009.

B. Acciaio, M. Beiglböck, F. Penkner, W. Schachermayer, and J. Temme. A trajectorial interpretation of Doob's martingale inequalities. Ann. Appl. Probab., 23(4), 2013.

M. Beiglböck and M. Nutz. Martingale inequalities and deterministic counterparts. Electron. J. Probab., 19(95):1-15, 2014.

M. Beiglböck and P. Siorpaes. Pathwise versions of the Burkholder-Davis-Gundy inequality. Bernoulli, 21(1), 2015.

B. Bercu, B. Delyon, and E. Rio. Concentration inequalities for sums and martingales, 2015.

J. Borwein, A. Guirao, P. Hájek, and J. Vanderwerff. Uniformly convex functions on Banach spaces. Proceedings of the American Mathematical Society, 137(3), 2009.

S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, 2013.

D. Burkholder. The best constant in the Davis inequality for the expectation of the martingale square function. Transactions of the American Mathematical Society, 354(1):91-105, 2002.

N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

T. Cover. Behaviour of sequential predictors of binary sequences. In Proc. 4th Prague Conf. Inform. Theory, Statistical Decision Functions, Random Processes, 1965.

V. H. de la Peña, T. L. Lai, and Q.-M. Shao. Self-normalized processes: Limit theory and Statistical Applications. Springer, 2008.

D. Foster, A. Rakhlin, and K. Sridharan. Adaptive online learning, 2015. In submission.

D. A. Freedman. On tail probabilities for martingales. The Annals of Probability, 1975.

E. Giné and J. Zinn. Some limit theorems for empirical processes. Annals of Probability, 12(4), 1984.
A. Gushchin. On pathwise counterparts of Doob's maximal inequalities. Proceedings of the Steklov Institute of Mathematics, 1(287), 2014.

D. Panchenko. Symmetrization approach to concentration inequalities for empirical processes. Annals of Probability, 31(4), 2003.

I. Pinelis. Optimum bounds for the distributions of martingales in Banach spaces. The Annals of Probability, 22(4), 1994.

A. Rakhlin and K. Sridharan. On equivalence of martingale tail bounds and deterministic regret inequalities. arXiv preprint, 2015.

A. Rakhlin and K. Sridharan. A tutorial on online supervised learning with applications to node classification in social networks. CoRR, 2016.

A. Rakhlin, K. Sridharan, and A. Tewari. Online learning: Random averages, combinatorial parameters, and learnability. Advances in Neural Information Processing Systems 23, 2010.

A. Rakhlin, K. Sridharan, and A. Tewari. Online learning via sequential complexities. Journal of Machine Learning Research, 2014.

N. Srebro, K. Sridharan, and A. Tewari. On the universality of online mirror descent. In NIPS, 2011.

A. W. Van Der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series, March 1996.

Appendix A. Proofs

Lemma 12. The update in (2) satisfies

$$\forall z_1, \dots, z_n \in B, \quad \sum_{t=1}^{n} \langle \hat{y}_t - f, z_t \rangle \le \sqrt{n}.$$

Proof [Lemma 12]. The following two-line proof is standard. By the property of a projection,

$$\|\hat{y}_{t+1} - f\|^2 = \left\|\mathrm{Proj}_B\left(\hat{y}_t - n^{-1/2} z_t\right) - f\right\|^2 \le \left\|\left(\hat{y}_t - n^{-1/2} z_t\right) - f\right\|^2 \tag{43}$$

$$= \|\hat{y}_t - f\|^2 + \frac{1}{n}\|z_t\|^2 - 2n^{-1/2}\langle \hat{y}_t - f, z_t \rangle. \tag{44}$$

Rearranging,

$$2n^{-1/2}\langle \hat{y}_t - f, z_t \rangle \le \|\hat{y}_t - f\|^2 - \|\hat{y}_{t+1} - f\|^2 + \frac{1}{n}\|z_t\|^2.$$

Summing over $t = 1, \dots, n$ yields the desired statement.
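The recursion (2) and the pathwise inequality (3) are easy to check numerically. The following is a minimal sketch, assuming NumPy and an arbitrary test sequence in the unit Euclidean ball; the function names and the choice of test data are illustrative only. Since the inequality is deterministic, the computed gap should be non-negative for every input sequence, not merely on average.

```python
import numpy as np


def project_to_unit_ball(v):
    """Euclidean projection onto the unit ball B."""
    norm = np.linalg.norm(v)
    return v if norm <= 1.0 else v / norm


def ogd_predictions(z):
    """Run update (2): y_hat_{t+1} = Proj_B(y_hat_t - n^{-1/2} z_t), y_hat_1 = 0."""
    n, d = z.shape
    step = 1.0 / np.sqrt(n)
    y_hat = np.zeros((n, d))
    for t in range(n - 1):
        y_hat[t + 1] = project_to_unit_ball(y_hat[t] - step * z[t])
    return y_hat


def pathwise_gap(z):
    """Right side minus left side of (3); inequality (3) says this is >= 0."""
    n = z.shape[0]
    y_hat = ogd_predictions(z)
    left = np.linalg.norm(z.sum(axis=0)) - np.sqrt(n)
    right = -np.sum(y_hat * z)  # sum_t <y_hat_t, -z_t>
    return float(right - left)


rng = np.random.default_rng(0)
z = rng.normal(size=(200, 5))
z /= np.maximum(1.0, np.linalg.norm(z, axis=1, keepdims=True))  # force z_t into B
print(pathwise_gap(z))  # non-negative by (3)
```

A sequence that repeats the same unit vector is a useful stress test: the sum of the $z_t$ then has norm $n$, and the predictions must quickly move to the opposite boundary point for (3) to hold.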
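Proof [Lemma 6]. Because of the anytime property of the regret bound and the strategy definition, we can write (19) as

$$\max_{s = 1, \dots, n}\left\{\left\|\sum_{t=1}^{s} Z_t\right\| - \sum_{t=1}^{s} \langle \hat{y}_t, -Z_t \rangle\right\} \le 2R_{\max}\sqrt{V} \tag{45}$$

simply because the right-hand side is largest for $s = n$. Sub-additivity of max implies

$$\max_{s = 1, \dots, n}\left\|\sum_{t=1}^{s} Z_t\right\| - 2R_{\max}\sqrt{V} \le \max_{s = 1, \dots, n} \sum_{t=1}^{s} \langle \hat{y}_t, -Z_t \rangle. \tag{46}$$

By the Burkholder-Davis-Gundy inequality (with the constant from Burkholder (2002)),

$$\mathbb{E}\max_{s = 1, \dots, n} \sum_{t=1}^{s} \langle \hat{y}_t, -Z_t \rangle \le 3\mathbb{E}\left(\sum_{t=1}^{n} \langle \hat{y}_t, Z_t \rangle^2\right)^{1/2} \le 3\mathbb{E}\sqrt{V}. \tag{47}$$

Proof [Lemma 2]. Let $\tilde{y}_{t+1}$ be the unrestricted minimum of (15). Because of the update form, for all $f \in \mathcal{F}$,

$$\langle \hat{y}_{t+1} - f, z_t \rangle \le \frac{1}{\eta_t}\left(D_{\mathcal{R}}(f, \hat{y}_t) - D_{\mathcal{R}}(f, \hat{y}_{t+1}) - D_{\mathcal{R}}(\hat{y}_{t+1}, \hat{y}_t)\right).$$

Summing over $t = 1, \dots, n$,

$$\sum_{t=1}^{n} \langle \hat{y}_{t+1} - f, z_t \rangle \le \eta_1^{-1} D_{\mathcal{R}}(f, \hat{y}_1) + \sum_{t=2}^{n} \left(\eta_t^{-1} - \eta_{t-1}^{-1}\right) D_{\mathcal{R}}(f, \hat{y}_t) - \sum_{t=1}^{n} \eta_t^{-1} D_{\mathcal{R}}(\hat{y}_{t+1}, \hat{y}_t)$$

$$\le \eta_1^{-1} R_{\max}^2 + \sum_{t=2}^{n} \left(\eta_t^{-1} - \eta_{t-1}^{-1}\right) R_{\max}^2 - \sum_{t=1}^{n} \frac{\eta_t^{-1}}{2}\|\hat{y}_{t+1} - \hat{y}_t\|_*^2 = R_{\max}^2 \eta_n^{-1} - \sum_{t=1}^{n} \frac{\eta_t^{-1}}{2}\|\hat{y}_{t+1} - \hat{y}_t\|_*^2,$$

where we used strong convexity of $\mathcal{R}$ and the fact that $\eta_t$ is nonincreasing. Next, we write

$$\sum_{t=1}^{n} \langle \hat{y}_t - f, z_t \rangle = \sum_{t=1}^{n} \langle \hat{y}_{t+1} - f, z_t \rangle + \sum_{t=1}^{n} \langle \hat{y}_t - \hat{y}_{t+1}, z_t \rangle$$

and upper bound the second term by noting that

$$\langle \hat{y}_t - \hat{y}_{t+1}, z_t \rangle \le \|\hat{y}_t - \hat{y}_{t+1}\|_* \|z_t\| \le \frac{\eta_t^{-1}}{2}\|\hat{y}_t - \hat{y}_{t+1}\|_*^2 + \frac{\eta_t}{2}\|z_t\|^2.$$

Combining the bounds,

$$\sum_{t=1}^{n} \langle \hat{y}_t - f, z_t \rangle \le R_{\max}^2 \eta_n^{-1} + \sum_{t=1}^{n} \frac{\eta_t}{2}\|z_t\|^2. \tag{48}$$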
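Using the fact (Cesa-Bianchi and Lugosi, 2006, Lemma 11.8) that for nonnegative $(\alpha_t)$,
\[
\sum_{t=1}^{n} \frac{\alpha_t}{\sqrt{\sum_{s=1}^{t} \alpha_s}} \le 2 \sqrt{\sum_{t=1}^{n} \alpha_t},
\]
and the definition of $\eta_t$,
\[
\sum_{t=1}^{n} \langle \hat{y}_t - f, z_t \rangle \le 2 R_{\max} \sqrt{\sum_{t=1}^{n} \|z_t\|^2}. \tag{49}
\]

Proof [Lemma 5] Let $\mathbb{E}_{t-1}[\cdot] = \mathbb{E}[\,\cdot \mid Z_1^{t-1}]$ denote conditional expectation. We have
\[
\begin{aligned}
&\mathbb{E}_{t-1} \exp\big\{ \lambda \langle \hat{y}_t, Z_t - \mathbb{E}_{t-1} Z'_t \rangle - \lambda^2 ( \|Z_t\|^2 + \mathbb{E}_{t-1} \|Z'_t\|^2 ) \big\} \\
&\quad\le \mathbb{E}_{t-1} \exp\big\{ \lambda \langle \hat{y}_t, Z_t - Z'_t \rangle - \lambda^2 ( \|Z_t\|^2 + \|Z'_t\|^2 ) \big\} \\
&\quad= \mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\big\{ \lambda \epsilon \langle \hat{y}_t, Z_t - Z'_t \rangle - \lambda^2 ( \|Z_t\|^2 + \|Z'_t\|^2 ) \big\}.
\end{aligned}
\]
Since exp is a convex function, the expression is
\[
\begin{aligned}
&\mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\Big\{ \tfrac{1}{2} \big( 2\lambda\epsilon \langle \hat{y}_t, Z_t \rangle - 2\lambda^2 \|Z_t\|^2 \big) + \tfrac{1}{2} \big( -2\lambda\epsilon \langle \hat{y}_t, Z'_t \rangle - 2\lambda^2 \|Z'_t\|^2 \big) \Big\} \\
&\quad\le \tfrac{1}{2} \mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\big\{ 2\lambda\epsilon \langle \hat{y}_t, Z_t \rangle - 2\lambda^2 \|Z_t\|^2 \big\} + \tfrac{1}{2} \mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\big\{ -2\lambda\epsilon \langle \hat{y}_t, Z'_t \rangle - 2\lambda^2 \|Z'_t\|^2 \big\} \\
&\quad= \mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\big\{ 2\lambda\epsilon \langle \hat{y}_t, Z_t \rangle - 2\lambda^2 \|Z_t\|^2 \big\} \\
&\quad\le \mathbb{E}_{t-1} \exp\big\{ 2\lambda^2 \langle \hat{y}_t, Z_t \rangle^2 - 2\lambda^2 \|Z_t\|^2 \big\} \le 1
\end{aligned}
\]
since $\|\hat{y}_t\| \le 1$. Repeating this argument for $t = n$ to $t = 1$ yields the statement.

If $Z_t$ are conditionally symmetric, then $\langle \hat{y}_t, Z_t \rangle$ are also conditionally symmetric. Hence,
\[
\begin{aligned}
\mathbb{E}_{t-1} \exp\big\{ \lambda \langle \hat{y}_t, Z_t \rangle - \tfrac{\lambda^2}{2} \|Z_t\|^2 \big\}
&= \mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\big\{ \lambda\epsilon \langle \hat{y}_t, Z_t \rangle - \tfrac{\lambda^2}{2} \|Z_t\|^2 \big\} \\
&\le \mathbb{E}_{t-1} \exp\big\{ \tfrac{\lambda^2}{2} \langle \hat{y}_t, Z_t \rangle^2 - \tfrac{\lambda^2}{2} \|Z_t\|^2 \big\} \le 1.
\end{aligned}
\]

Proof [Lemma 7] For binary outcomes $y \in \{\pm 1\}$ and either absolute loss or linear loss,
\[
\mathcal{A}(\mathcal{F}, B) = \Big\langle\!\!\Big\langle \sup_{x_t} \inf_{\hat{y}_t} \max_{y_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left\{ \sum_{t=1}^{n} -y_t \hat{y}_t + \sup_{f \in \mathcal{F}} \Big\{ \sum_{t=1}^{n} y_t f(x_t) - B(f; x_1,\dots,x_n) \Big\} \right\},
\]
where we shall restrict $\hat{y}_t$ to range over the interval $|\hat{y}_t| \le \sup_{f \in \mathcal{F}} |f(x_t)|$ and $y_t$ in $\{\pm 1\}$. Consider the last step $t = n$. Given $x_1^n$, $\hat{y}_1^{n-1}$, and $y_1^{n-1}$, we solve
\[
\inf_{\hat{y}_n} \max_{y_n} \big\{ -\hat{y}_n y_n + \phi_n(x_1^n, y_1^n) \big\} \tag{50}
\]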
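where
\[
\phi_n(x_1^n, y_1^n) \triangleq \sup_{f \in \mathcal{F}} \Big\{ \sum_{t=1}^{n} y_t f(x_t) - B(f; x_1,\dots,x_n) \Big\}. \tag{51}
\]
Since there are two possibilities for $y_n$, the closed form solution for $\hat{y}_n$ is given by
\[
\hat{y}_n = \tfrac{1}{2} \big( \phi_n(x_1^n, y_1^{n-1}, +1) - \phi_n(x_1^n, y_1^{n-1}, -1) \big). \tag{52}
\]
Importantly, this value satisfies $|\hat{y}_n| \le \sup_{f \in \mathcal{F}} |f(x_n)|$. With this optimal choice, (50) is equal to $\mathbb{E}_{\epsilon_n} \phi_n(x_1^n, y_1^{n-1}, \epsilon_n)$. We now include the supremum over $x_n$ in the definition of $\phi_{n-1}$:
\[
\phi_{n-1}(x_1^{n-1}, y_1^{n-1}) \triangleq \sup_{x_n} \mathbb{E}_{\epsilon_n} \phi_n(x_1^n, y_1^{n-1}, \epsilon_n),
\]
and repeat the argument for $t = n - 1$. Since all the steps are equalities,
\[
\mathcal{A}(\mathcal{F}, B) = \phi_0(\emptyset) = \sup_{x_1} \mathbb{E}_{\epsilon_1} \cdots \sup_{x_n} \mathbb{E}_{\epsilon_n} \sup_{f \in \mathcal{F}} \Big\{ \sum_{t=1}^{n} \epsilon_t f(x_t) - B(f; x_1,\dots,x_n) \Big\},
\]
which can be written as (28).

Proof [Lemma 10] We have
\[
P(\xi \ge u) \le \frac{\mathbb{E}\phi(\xi)}{\phi(u)} \le \frac{\mathbb{E}\phi(\nu)}{\phi(u)} \le \frac{1}{\phi(u)} \Big( \phi(0) + \int_0^{\infty} \phi'(x) P(\nu \ge x)\, dx \Big).
\]
Choose $a = u - \mu^{-1}(1)$, where $\mu^{-1}$ is the inverse function. If $a < 0$, the conclusion of the lemma is true since $\Gamma \ge 1$. In the case of $a \ge 0$, we have $\phi(0) = 0$. The above upper bound becomes
\[
\begin{aligned}
P(\xi \ge u) &\le \frac{\Gamma}{\phi(u)} \int_0^{\infty} \phi'(x) \exp\{-\mu(x)\}\, dx = \frac{\Gamma}{\phi(u)} \int_a^{\infty} \mu'(x) \exp\{-\mu(x)\}\, dx \\
&= \frac{\Gamma}{\mu(u - a)} \big[ -\exp\{-\mu(x)\} \big]_a^{\infty} = \Gamma \exp\{-\mu(a)\} = \Gamma \exp\{-\mu(u - \mu^{-1}(1))\}.
\end{aligned}
\]
If $\mu(b) = cb$, we have
\[
P(\xi \ge u) \le \Gamma \exp\{-c(u - 1/c)\} = \Gamma \exp\{1 - cu\}.
\]
If $\mu(b) = cb^2$, we have
\[
P(\xi \ge u) \le \Gamma \exp\{-c(u - 1/\sqrt{c})^2\} \le \Gamma \exp\{-cu^2/4\}
\]
whenever $u \ge 2/\sqrt{c}$. If $u \le 2/\sqrt{c}$, the conclusion is valid since $\Gamma \ge 1$.

Proof [Corollary 11] Let
\[
\xi(Z_1,\dots,Z_n, Z'_1,\dots,Z'_n) = \sup_{g \in \mathcal{G}} \sum_{t=1}^{n} \big( g(Z_t) - g(Z'_t) \big) - B(g; Z_1, Z'_1,\dots,Z_n, Z'_n)
\]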
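and
\[
\nu(Z_1,\dots,Z_n) = \sup_{g \in \mathcal{G}} \sum_{t=1}^{n} \big( g(Z_t) - \mathbb{E}_{t-1} g(Z'_t) \big) - \mathbb{E}_{Z'_1,\dots,Z'_n} B(g; Z_1, Z'_1,\dots,Z_n, Z'_n).
\]
Then for any convex $\phi: \mathbb{R} \to \mathbb{R}$, $\mathbb{E}\phi(\nu) \le \mathbb{E}\phi(\xi)$ using convexity of the supremum. The problem is now reduced to obtaining tail bounds for
\[
P\Big( \sup_{g \in \mathcal{G}} \sum_{t=1}^{n} \big( g(Z_t) - g(Z'_t) \big) - B(g; Z_1, Z'_1,\dots,Z_n, Z'_n) > u \Big).
\]
Write the probability as
\[
\mathbb{E}\, \mathbf{1}\{ \xi(Z_1,\dots,Z_n, Z'_1,\dots,Z'_n) > u \}.
\]
We now proceed to replace the random variables from $n$ backwards with a dyadic filtration. Let us start with the last index. Renaming $Z_n$ and $Z'_n$ we see that
\[
\begin{aligned}
&\mathbb{E}\, \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^{n} \big( g(Z_t) - g(Z'_t) \big) - B(g; Z_1, Z'_1,\dots,Z_n, Z'_n) > u \Big\} \\
&\quad= \mathbb{E}\, \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^{n-1} \big( g(Z_t) - g(Z'_t) \big) + \big( g(Z_n) - g(Z'_n) \big) - B(g; Z_1, Z'_1,\dots,Z_n, Z'_n) > u \Big\} \\
&\quad= \mathbb{E}\, \mathbb{E}_{\epsilon_n} \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^{n-1} \big( g(Z_t) - g(Z'_t) \big) + \epsilon_n \big( g(Z_n) - g(Z'_n) \big) - B(g; Z_1, Z'_1,\dots,Z_n, Z'_n) > u \Big\} \\
&\quad\le \mathbb{E} \sup_{z_n, z'_n} \mathbb{E}_{\epsilon_n} \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^{n-1} \big( g(Z_t) - g(Z'_t) \big) + \epsilon_n \big( g(z_n) - g(z'_n) \big) - B(g; Z_1, Z'_1,\dots,Z_{n-1}, Z'_{n-1}, z_n, z'_n) > u \Big\}.
\end{aligned}
\]
Proceeding in this manner for step $n - 1$ and back to $t = 1$, we obtain an upper bound of
\[
\begin{aligned}
&\sup_{z_1, z'_1} \mathbb{E}_{\epsilon_1} \cdots \sup_{z_n, z'_n} \mathbb{E}_{\epsilon_n} \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^{n} \epsilon_t \big( g(z_t) - g(z'_t) \big) - B(g; z_1, z'_1,\dots,z_n, z'_n) > u \Big\} \\
&\quad= \sup_{\mathbf{x}} \mathbb{E}\, \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^{n} \epsilon_t f_g(x_t) - B(g; x_1,\dots,x_n) > u \Big\}.
\end{aligned}
\]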
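The sign-insertion and worst-case steps of this argument can be illustrated by exact enumeration on a toy example. The sketch below is only an illustration under strong simplifying assumptions that are not part of the paper: the $Z_t$ and $Z'_t$ are i.i.d. uniform on a three-point set (a special case of the martingale setting), the class $\mathcal{G}$ contains three hypothetical functions, and the penalty $B$ is taken to be zero. All names are illustrative. It computes the unsigned tail probability, the tail probability after inserting Rademacher signs, and the nested $\sup / \mathbb{E}_{\epsilon}$ upper bound, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy setup (not from the paper): finite support, small class G, B = 0.
SUPPORT = (0, 1, 2)
G = (lambda z: z, lambda z: (z - 1) ** 2, lambda z: -z)


def stat(pairs, signs):
    """sup_g sum_t signs[t] * (g(z_t) - g(z'_t)), with the penalty B set to 0."""
    return max(
        sum(s * (g(z) - g(zp)) for s, (z, zp) in zip(signs, pairs)) for g in G
    )


def tail_plain(n, u):
    """P(sup_g sum_t (g(Z_t) - g(Z'_t)) > u) for i.i.d. uniform Z_t, Z'_t."""
    outcomes = list(product(product(SUPPORT, repeat=2), repeat=n))
    hits = sum(stat(pairs, (1,) * n) > u for pairs in outcomes)
    return Fraction(hits, len(outcomes))


def tail_signed(n, u):
    """Same probability after inserting independent Rademacher signs eps_t."""
    outcomes = list(product(product(SUPPORT, repeat=2), repeat=n))
    sign_vectors = list(product((-1, 1), repeat=n))
    hits = sum(
        stat(pairs, signs) > u for pairs in outcomes for signs in sign_vectors
    )
    return Fraction(hits, len(outcomes) * len(sign_vectors))


def tail_worst_case(n, u, pairs=(), signs=()):
    """sup_{z_1,z'_1} E_{eps_1} ... sup_{z_n,z'_n} E_{eps_n} 1{stat > u}."""
    if len(pairs) == n:
        return Fraction(int(stat(pairs, signs) > u))
    return max(
        sum(tail_worst_case(n, u, pairs + (pair,), signs + (s,)) for s in (-1, 1)) / 2
        for pair in product(SUPPORT, repeat=2)
    )
```

In this toy setting `tail_plain(n, u)` and `tail_signed(n, u)` coincide exactly, because swapping $Z_t$ and $Z'_t$ does not change the joint distribution, and both are at most `tail_worst_case(n, u)`, which no longer depends on the distribution of the data.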