On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities

Proceedings of Machine Learning Research vol 65:1–19, 2017
© 2017 A. Rakhlin & K. Sridharan.

Alexander Rakhlin, University of Pennsylvania, rakhlin@wharton.upenn.edu
Karthik Sridharan, Cornell University, sridharan@cs.cornell.edu

Abstract

We study an equivalence of (i) deterministic pathwise statements appearing in the online learning literature (termed regret bounds), (ii) high-probability tail bounds for the supremum of a collection of martingales (of a specific form arising from uniform laws of large numbers), and (iii) in-expectation bounds for the supremum. By virtue of the equivalence, we prove exponential tail bounds for norms of Banach space valued martingales via deterministic regret bounds for the online mirror descent algorithm with an adaptive step size. We show that the phenomenon extends beyond the setting of online linear optimization and present the equivalence for the supervised online learning setting.

Keywords: martingale inequalities; online learning

1. Introduction

The paper investigates equivalence of regret inequalities that hold for all sequences and probabilistic inequalities for martingales. In recent years, it was shown that existence of regret-minimization strategies can be certified non-algorithmically by studying certain stochastic processes. In this paper, we make the connection in the opposite direction and show a certain equivalence. We present several new deviation inequalities that follow with surprising ease from pathwise regret inequalities, while it is far from clear how to prove them with other methods.

Arguably the simplest example of the equivalence between prediction of individual sequences and probabilistic inequalities can be found in the work of Cover (1965). Consider the task of predicting a binary sequence $y = (y_1, \ldots, y_n) \in \{\pm 1\}^n$ in an online manner. Let $\phi : \{\pm 1\}^n \to [0, 1]$ be $1/n$-Lipschitz with respect to the Hamming distance. Then there exists a randomized strategy such that
$$\forall y, \quad \mathbb{E}\left[\frac{1}{n}\sum_{t=1}^n \mathbf{1}\{\hat{y}_t \neq y_t\}\right] \le \phi(y) \tag{1}$$
if and only if $\mathbb{E}\phi(\epsilon) \ge 1/2$. The expectation in (1) is with respect to the randomized predictions $\hat{y}_t = \hat{y}_t(y_1, \ldots, y_{t-1}) \in \{\pm 1\}$ made by the algorithm, $\epsilon = (\epsilon_1, \ldots, \epsilon_n)$ is a sequence of independent Rademacher random variables, and $\mathbf{1}\{\cdot\}$ is the indicator loss function. While this result is not difficult to prove by backward induction (see e.g. (Rakhlin and Sridharan, 2016)), the message is rather intriguing: existence of a prediction strategy with a given mistake bound $\phi$ is equivalent to a simple statement about the expected value of $\phi$ with respect to the uniform distribution. Furthermore, the Lipschitz condition on $\phi$ implies a high-probability bound for the deviation of $\phi$ from $\mathbb{E}\phi$ via McDiarmid's inequality.

Our second example of the equivalence is in the setting of online linear optimization. Consider the unit Euclidean ball $B$ in $\mathbb{R}^d$. Let $z_1, \ldots, z_n \in B$ and define, recursively, the Euclidean projections
$$\hat{y}_{t+1} = \hat{y}_{t+1}(z_1, \ldots, z_t) = \mathrm{Proj}_B\left(\hat{y}_t - n^{-1/2} z_t\right) \tag{2}$$
for each $t = 1, \ldots, n$, with the initial value $\hat{y}_1 = 0$. Elementary algebra (see the two-line proof in the Appendix, Lemma 12) shows that for any $f \in B$, the regret inequality
$$\sum_{t=1}^n \langle \hat{y}_t - f, z_t \rangle \le \sqrt{n}$$
holds deterministically for any sequence $z_1, \ldots, z_n \in B$. By optimally choosing $f$ along the direction of the sum (namely $f = -\sum_t z_t / \|\sum_t z_t\|$), we re-write this statement equivalently as
$$\left\|\sum_{t=1}^n z_t\right\| - \sqrt{n} \le \sum_{t=1}^n \langle -\hat{y}_t, z_t \rangle. \tag{3}$$
Since the inequality holds pathwise, by applying it to a $B$-valued martingale difference sequence $Z_1, \ldots, Z_n$, we conclude that
$$P\left(\left\|\sum_{t=1}^n Z_t\right\| - \sqrt{n} > u\right) \le P\left(\sum_{t=1}^n \langle -\hat{y}_t, Z_t \rangle > u\right) \le \exp\left\{-\frac{u^2}{2n}\right\}. \tag{4}$$
The latter upper bound is an application of the Azuma-Hoeffding inequality. Indeed, the process $(\hat{y}_t)$ is predictable with respect to $\sigma(Z_1, \ldots, Z_{t-1})$, and thus $(\langle -\hat{y}_t, Z_t \rangle)$ is a $[-1, 1]$-valued martingale difference sequence. It is worth emphasizing the conclusion: one-sided deviation tail bounds for a norm of a vector-valued martingale can be deduced from tail bounds for real-valued martingales with the help of a deterministic regret inequality.

Next, integrating the tail bound in (4) yields a seemingly weaker in-expectation statement
$$\mathbb{E}\left\|\sum_{t=1}^n Z_t\right\| \le c\sqrt{n} \tag{5}$$
for an appropriate constant $c$. The twist in this uncomplicated story comes next: with the help of the minimax theorem, (Abernethy et al., 2009; Rakhlin et al., 2010) established existence of strategies $(\hat{y}_t)$ such that
$$\forall z_1, \ldots, z_n, f \in B, \quad \sum_{t=1}^n \langle \hat{y}_t - f, z_t \rangle \le \sup \mathbb{E}\left\|\sum_{t=1}^n Z_t\right\|, \tag{6}$$
with the supremum taken over all $2B$-valued martingale difference sequences. In view of (5), this bound is $c\sqrt{n}$.

What have we achieved? Let us summarize. The deterministic inequality (3), which holds for all sequences, implies a tail bound (4). The latter, in turn, implies an in-expectation bound (5), which implies (3) (with a worse constant) through a minimax argument, thus closing the loop. The equivalence studied in depth in this paper is informally stated below:

Informal: The following bounds imply each other: (a) an inequality that holds for all sequences; (b) a deviation tail probability for the size of a martingale; (c) an in-expectation bound on the size of a martingale.

The equivalence, in particular, allows us to amplify the in-expectation bounds to appropriate high-probability tail bounds.

While writing the paper, we learned of the trajectorial approach, extensively studied in recent years. In particular, it has been shown that Doob's maximal inequalities and Burkholder-Davis-Gundy inequalities have deterministic counterparts (Acciaio et al., 2013; Beiglböck and Nutz, 2014; Gushchin, 2014; Beiglböck and Siorpaes, 2015). The online learning literature contains a trove of pathwise inequalities, and further synthesis with the trajectorial approach (and the applications in mathematical finance) appears to be a promising direction.

This paper is organized as follows. In the next section, we extend the Euclidean result to martingales with values in Banach spaces and improve it by replacing $\sqrt{n}$ with the square root of the variation. In particular, we conclude a high-probability self-normalized tail bound, a statement that appears to be difficult to obtain with other methods (see (Bercu et al., 2015; de la Peña et al., 2008) for a survey of techniques in this area). Section 3 is devoted to the analysis of equivalence for supervised learning. Finally, Section 4 shows that it is enough to consider dyadic martingales if one is interested in general martingale inequalities of a certain form.

2. Adaptive Bounds and Probabilistic Inequalities in Banach Spaces

For the case of the Euclidean (or Hilbertian) norm, it is easy to see that the bound of (5) can be improved to a distribution-dependent quantity $\left(\sum_{t=1}^n \mathbb{E}\|Z_t\|^2\right)^{1/2}$. Given the equivalence sketched earlier, one may wonder whether this upper bound is also equivalent to a gradient-descent-like online method with a sequence-dependent variation governing the rate of convergence. Below, we indeed present such an equivalence for 2-smooth Banach spaces. Furthermore, the probabilistic tail bounds obtained this way appear to be novel.

Suppose that we have a norm $\|\cdot\|$ on some vector space such that $\|\cdot\|^2$ is a smooth function:
$$\|x + y\|^2 \le \|x\|^2 + \langle \nabla\|x\|^2, y \rangle + C\|y\|^2 \tag{7}$$
for some $C > 0$. Repeatedly using smoothness of the norm, we conclude that
$$\mathbb{E}\left\|\sum_{t=1}^n Z_t\right\|^2 \le C \sum_{t=1}^n \mathbb{E}\|Z_t\|^2 \tag{8}$$
for any martingale difference sequence taking values in that vector space, since the cross-terms vanish. Instead of (8), we will work with the tighter inequality
$$\mathbb{E}\left\|\sum_{t=1}^n Z_t\right\| \le C\, \mathbb{E}\sqrt{\sum_{t=1}^n \|Z_t\|^2}. \tag{9}$$

Let $(\mathcal{B}, \|\cdot\|)$ be a reflexive Banach space with dual space $(\mathcal{B}^*, \|\cdot\|_*)$. Assume that $(\mathcal{B}, \|\cdot\|)$ is 2-smooth (that is, $\rho(\delta) \triangleq \sup\left\{\tfrac{1}{2}(\|x + y\| + \|x - y\|) - 1 : \|x\| = 1, \|y\| = \delta\right\}$, the modulus of smoothness, behaves as $c\delta^2$). Then there exists an equivalent norm $\|\cdot\|_{\mathcal{B}}$ (in the sense that $c_1\|\cdot\|_{\mathcal{B}} \le \|\cdot\| \le c_2\|\cdot\|_{\mathcal{B}}$ for some possibly dimension-dependent $c_1, c_2$) that is smooth. In this case, we can expect that (9) holds for martingale difference sequences taking values in $\mathcal{B}$. Let us now argue this more formally, and also show equivalence to the existence of deterministic prediction strategies.

2.1. From regret inequality to expected value and back

Lemma 1. Existence of a (deterministic) prediction strategy $(\hat{y}_t)$, with values $\hat{y}_t(z_1, \ldots, z_{t-1})$ in the unit ball $B^*$ of $\mathcal{B}^*$, such that
$$\forall z_1, \ldots, z_n \in \mathcal{B},\ \forall f \in B^*, \quad \sum_{t=1}^n \langle \hat{y}_t - f, z_t \rangle \le C\sqrt{\sum_{t=1}^n \|z_t\|^2} \tag{10}$$
for some $C$ is equivalent to (9) (with a possibly different constant $C$) holding for all martingale difference sequences with values in $\mathcal{B}$.

Proof. Rearranging (10) as in (3), choosing a unit vector $f$, and taking expectations on both sides yields (9) with the same constant $C$ as in (10).

We now argue the reverse direction: (9) implies existence of a strategy with a regret bound (10). First, consider an arbitrary collection $(X_1, \ldots, X_n)$ of random variables taking values in an $R$-radius centered ball of $\mathcal{B}$ and define the conditional expectations $\mathbb{E}_{t-1}[\cdot] = \mathbb{E}[\cdot \mid X_1, \ldots, X_{t-1}]$. Observe that the collection $(X_t - \mathbb{E}_{t-1}X_t)$, $t = 1, \ldots, n$, is a martingale difference sequence. Hence, by the triangle inequality and our assumption,
$$\mathbb{E}\left\|\sum_{t=1}^n X_t\right\| - \mathbb{E}\sum_{t=1}^n \left\|\mathbb{E}_{t-1}X_t\right\| \le \mathbb{E}\left\|\sum_{t=1}^n (X_t - \mathbb{E}_{t-1}X_t)\right\| \le C\,\mathbb{E}\sqrt{\sum_{t=1}^n \|X_t - \mathbb{E}_{t-1}X_t\|^2}. \tag{11}$$
The right-most expression in (11) can be upper bounded by
$$\sqrt{2}\,C\,\mathbb{E}\sqrt{\sum_{t=1}^n \left(\|X_t\|^2 + \|\mathbb{E}_{t-1}X_t\|^2\right)} \le \sqrt{8}\,C\,\mathbb{E}\sqrt{\sum_{t=1}^n \|X_t\|^2}. \tag{12}$$
To justify the last inequality, first observe that $\|\mathbb{E}_{t-1}X_t\| \le \mathbb{E}_{t-1}\|X_t\|$. Second, the function $x \mapsto \sqrt{A + x^2}$ is convex, and (12) follows by Jensen's inequality. Combining (11) and (12), for any finite $R$ and any collection $(X_1, \ldots, X_n)$ with values in $R \cdot B$,
$$\mathbb{E}\left[\left\|\sum_{t=1}^n X_t\right\| - \sum_{t=1}^n \left\|\mathbb{E}_{t-1}X_t\right\| - C'\sqrt{\sum_{t=1}^n \|X_t\|^2}\right] \le 0 \tag{13}$$
with $C' = \sqrt{8}\,C$. Writing
$$\left\|\mathbb{E}_{t-1}X_t\right\| = -\inf_{\|\hat{y}_t\|_* \le 1} \langle \hat{y}_t, \mathbb{E}_{t-1}X_t \rangle,$$
we conclude that
$$\sup\ \mathbb{E}\left[\sum_{t=1}^n \inf_{\|\hat{y}_t\|_* \le 1} \langle \hat{y}_t, \mathbb{E}_{t-1}X_t \rangle - \inf_{\|f\|_* \le 1} \left\langle f, \sum_{t=1}^n X_t \right\rangle - C'\sqrt{\sum_{t=1}^n \|X_t\|^2}\right] \le 0 \tag{14}$$
where the supremum is over the distributions of $(X_1, \ldots, X_n)$ with values in $R \cdot B$. The rest of the argument can be seen as running the proof of (Abernethy et al., 2009) backwards. The minimax theorem holds because of finiteness of $R$, the radius of the support of the $X_t$'s, via arguments in (Rakhlin et al., 2014, Appendix A).

Thankfully, the strategy that guarantees (10) is already known: it is an adaptive version of Mirror Descent. For completeness, the proof is provided in the Appendix. To define the strategy, we need the following fact: if $\mathcal{B}$ is a 2-smooth Banach space, then there is a function $\mathcal{R}$ on the dual space $\mathcal{B}^*$ which is strongly convex with respect to the norm $\|\cdot\|_*$. In fact, one can take the squared dual norm corresponding to the smooth equivalent norm on $\mathcal{B}$ (Borwein et al., 2009). To avoid extra constants, let us simply assume that $\mathcal{R}$ is 1-strongly convex on the unit ball $B^*$ of $\mathcal{B}^*$. The function $\mathcal{R}$ induces the Bregman divergence $D_{\mathcal{R}} : \mathcal{B}^* \times \mathcal{B}^* \to \mathbb{R}$, defined as $D_{\mathcal{R}}(f, g) = \mathcal{R}(f) - \mathcal{R}(g) - \langle \nabla\mathcal{R}(g), f - g \rangle$.

Lemma 2. Let $\mathcal{F} \subseteq \mathcal{B}^*$ be a convex set. Define, recursively,
$$\hat{y}_{t+1} = \hat{y}_{t+1}(z_1, \ldots, z_t) = \operatorname*{argmin}_{f \in \mathcal{F}} \left\{\eta_t \langle f, z_t \rangle + D_{\mathcal{R}}(f, \hat{y}_t)\right\} \tag{15}$$
with $\hat{y}_1 = 0$, $\eta_t \triangleq R_{\max}\left(\sum_{s=1}^t \|z_s\|^2\right)^{-1/2}$, and $R_{\max}^2 \triangleq \sup_{f, g \in \mathcal{F}} D_{\mathcal{R}}(f, g)$. Then for any $f \in \mathcal{F}$ and any $z_1, \ldots, z_n \in \mathcal{B}$,
$$\sum_{t=1}^n \langle \hat{y}_t - f, z_t \rangle \le 2R_{\max}\sqrt{\sum_{t=1}^n \|z_t\|^2}.$$

Lemma 2 is complementary to Lemma 1, as it gives the algorithm whose existence was guaranteed by Lemma 1. In Section 3, we will not have the luxury of producing an explicit algorithm, yet the equivalence will still be established.

2.2. From regret inequalities to tail bounds and back

We now start from a regret-minimization strategy and deduce a new probabilistic inequality for martingales. We then conclude the in-expectation bound and use the equivalence of Lemma 1 to close the loop. The adaptive Mirror Descent algorithm of the previous section implies the following theorem:

Theorem 3. Let $Z_1, \ldots, Z_n$ be a $\mathcal{B}$-valued martingale difference sequence, and let $\mathbb{E}_t$ stand for the conditional expectation given $Z_1, \ldots, Z_t$. Define
$$V = 2\sum_{t=1}^n \|Z_t\|^2 \quad \text{and} \quad W = 2\sum_{t=1}^n \mathbb{E}_{t-1}\|Z_t\|^2, \tag{16}$$
which are assumed to have a finite expected value. For any $u > 0$, it holds that
$$P\left(\frac{\left\|\sum_{t=1}^n Z_t\right\| - 2R_{\max}\sqrt{V}}{\sqrt{V + W + \left(\mathbb{E}\sqrt{V + W}\right)^2}} > u\right) \le \sqrt{2}\exp\left\{-u^2/16\right\}, \tag{17}$$
and for any $u \ge \sqrt{2}$, it holds that
$$P\left(\frac{\left\|\sum_{t=1}^n Z_t\right\| - 2R_{\max}\sqrt{V}}{\sqrt{(V + W + 1)\left(1 + \tfrac{1}{2}\log(V + W + 1)\right)}} \ge u\right) \le \exp\left\{-u^2/2\right\}. \tag{18}$$
Furthermore, both bounds also hold with $W \equiv 0$ and $V = \sum_{t=1}^n \|Z_t\|^2$ if the martingale differences are conditionally symmetric. (A martingale difference sequence $Z_1, \ldots, Z_n$ is conditionally symmetric if the law $\mathcal{L}(Z_t \mid Z_1, \ldots, Z_{t-1}) = \mathcal{L}(-Z_t \mid Z_1, \ldots, Z_{t-1})$.)

In addition to extending the Euclidean result of the previous section to Banach spaces, (17) and (18) offer several advantages. First, the bounds are $n$-independent. The deviations in (17) and (18) are self-normalized (that is, scaled by root-variation terms) and all the terms are either distribution-dependent or data-dependent, as in the case of the Student's t-statistic (de la Peña et al., 2008). The advantage of (18), especially in the case of conditional symmetry, is that all the terms, modulo the additive constants 1, are data-dependent. We are not aware of similar bounds for norms of random vectors in the literature, and we wish to stress that the proof of the result is almost immediate, given the regret inequality. We would also like to stress that Theorem 3 holds without any assumption on the martingale difference sequence beyond square integrability.

Proof [Theorem 3]. We take $\mathcal{F}$ in Lemma 2 to be the unit ball in $\mathcal{B}^*$, ensuring $\|\hat{y}_t\|_* \le 1$. For any martingale difference sequence $(Z_t)$ with values in $\mathcal{B}$, the above lemma implies, by the definition of the norm,
$$\left\|\sum_{t=1}^n Z_t\right\| - 2R_{\max}\sqrt{V} \le \sum_{t=1}^n \langle -\hat{y}_t, Z_t \rangle \tag{19}$$
deterministically for all sample paths. Dividing both sides by $\sqrt{V + W + (\mathbb{E}\sqrt{V + W})^2}$, we conclude that the left-hand side in (17) is upper bounded by
$$P\left(\frac{\sum_{t=1}^n \langle -\hat{y}_t, Z_t \rangle}{\sqrt{V + W + \left(\mathbb{E}\sqrt{V + W}\right)^2}} > u\right). \tag{20}$$
To control this probability, we recall the following results (de la Peña et al., 2008, Theorem 12.4, Corollary 12.5):

Theorem 4 ((de la Peña et al., 2008)). For a pair of random variables $A, B$, with $B > 0$, such that
$$\mathbb{E}\exp\left\{\lambda A - \lambda^2 B^2/2\right\} \le 1 \quad \forall \lambda \in \mathbb{R}, \tag{21}$$
it holds that for any $u > 0$,
$$P\left(\frac{|A|}{\sqrt{B^2 + (\mathbb{E}B)^2}} > u\right) \le \sqrt{2}\exp\left\{-u^2/4\right\}$$
and for any $y > 0$ and $u \ge \sqrt{2}$,
$$P\left(\frac{|A|}{\sqrt{(B^2 + y)\left(1 + \tfrac{1}{2}\log(B^2/y + 1)\right)}} \ge u\right) \le \exp\left\{-u^2/2\right\}.$$

To apply this theorem, we verify assumption (21):

Lemma 5. The random variables $A = \sum_{t=1}^n \langle \hat{y}_t, Z_t \rangle$ and $B^2 = 2\sum_{t=1}^n \left(\|Z_t\|^2 + \mathbb{E}_{t-1}\|Z_t\|^2\right)$ satisfy (21). Furthermore, if the $Z_t$'s are conditionally symmetric, then $A = \sum_{t=1}^n \langle \hat{y}_t, Z_t \rangle$ and $B^2 = \sum_{t=1}^n \|Z_t\|^2$ satisfy (21).

The simple proof of the Lemma is postponed to the Appendix. Since (21) is required for all $\lambda \in \mathbb{R}$, it holds equally for $-A$, the numerator in (20). Putting together (20) with Lemma 5 and Theorem 4 concludes the proof of Theorem 3.

To close the loop of equivalences, we need to deduce (9) from the tail bound inequality. Let us use the first part of Theorem 3. Denote the random variable in the numerator of the fraction in (17) as $Y$ and the denominator as a random variable $U$. Then (17) implies that $Y/U$ is a subgaussian random variable. Hence, its second moment is bounded by a constant: $\mathbb{E}(Y/U)^2 \le c$. However, by the Cauchy-Schwarz inequality,
$$\mathbb{E}Y = \mathbb{E}\left(U \cdot \frac{Y}{U}\right) \le \left(\mathbb{E}U^2\right)^{1/2}\left(\mathbb{E}\frac{Y^2}{U^2}\right)^{1/2} \le \sqrt{c\,\mathbb{E}U^2},$$
implying
$$\mathbb{E}\left\|\sum_{t=1}^n Z_t\right\| \le 2R_{\max}\,\mathbb{E}\sqrt{V} + 2\sqrt{c\,\mathbb{E}V}. \tag{22}$$
This almost closes the loop, except the last term in (22) has the expectation inside the square root rather than outside, and thus presents a weaker upper bound (in the sense of (8) rather than (9)). We conjecture that there is a way to prove the upper bound with the expectation outside the square root. Nonetheless, to keep the promise of closing the loop, we observe that the upper bound of (8) implies that the Banach space has martingale type 2, which implies, via (Srebro et al., 2011), existence of a strongly convex function on the dual space, and, hence, existence of a strategy that guarantees (10) with a constant $C$ that may depend at most logarithmically on $n$.

2.3. Remarks

We compare our result to that of Pinelis (1994). Let $Z_1, \ldots, Z_n$ be a martingale difference sequence taking values in a separable $(2, D)$-smooth Banach space $(\mathcal{B}, \|\cdot\|)$. Pinelis (1994) proved, through a significantly more difficult analysis, that for any $u > 0$,
$$P\left(\sup_{n \ge 1}\left\|\sum_{t=1}^n Z_t\right\| \ge \sigma u\right) \le 2\exp\left\{-\frac{u^2}{2D^2}\right\}, \tag{23}$$
where $\sigma$ is a constant satisfying $\sum_{t}\|Z_t\|^2 \le \sigma^2$. In comparison to Theorem 3, this result involves a distribution-independent variation $\sigma$ as a worst-case pointwise upper bound.
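The adaptive strategy of Lemma 2 and the pathwise inequality behind (19) can be checked numerically in the simplest case. The sketch below is only an illustration under assumptions that are not in the paper: it takes the Euclidean setting with $\mathcal{R}(f) = \tfrac{1}{2}\|f\|^2$, so that $D_{\mathcal{R}}(f, g) = \tfrac{1}{2}\|f - g\|^2$, the update (15) reduces to a projected gradient step onto the unit ball, and $R_{\max} = \sqrt{2}$. The function names are hypothetical.

```python
import math
import random

# Assumption: Euclidean case with R(f) = ||f||^2 / 2 on the unit ball, so
# sup_{f,g} D_R(f, g) = sup ||f - g||^2 / 2 = 2 and hence R_max = sqrt(2).
R_MAX = math.sqrt(2.0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_to_unit_ball(v):
    norm = math.sqrt(dot(v, v))
    return list(v) if norm <= 1.0 else [x / norm for x in v]


def adaptive_run(zs):
    """Run the adaptive update on z_1..z_n.

    Returns, for every prefix length s, the triple
    (||sum_{t<=s} z_t||, sum_{t<=s} <yhat_t, z_t>, sum_{t<=s} ||z_t||^2).
    """
    dim = len(zs[0])
    yhat = [0.0] * dim          # yhat_1 = 0
    partial_sum = [0.0] * dim
    inner_sum = 0.0
    sq_sum = 0.0
    history = []
    for z in zs:
        inner_sum += dot(yhat, z)   # yhat_t depends on z_1..z_{t-1} only
        partial_sum = [p + x for p, x in zip(partial_sum, z)]
        sq_sum += dot(z, z)
        history.append((math.sqrt(dot(partial_sum, partial_sum)), inner_sum, sq_sum))
        if sq_sum > 0.0:
            eta = R_MAX / math.sqrt(sq_sum)   # step size eta_t of Lemma 2
            yhat = project_to_unit_ball([a - eta * b for a, b in zip(yhat, z)])
    return history


def worst_slack(zs):
    """Largest value over prefixes s of
    ||sum z_t|| + sum <yhat_t, z_t> - 2 R_max sqrt(sum ||z_t||^2).
    The regret bound of Lemma 2 predicts that this is never positive."""
    return max(
        norm + inner - 2.0 * R_MAX * math.sqrt(sq)
        for norm, inner, sq in adaptive_run(zs)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Unbounded, heavy-tailed increments: no boundedness is assumed.
    zs = [[rng.gauss(0, 1) * math.exp(rng.gauss(0, 1)) for _ in range(4)]
          for _ in range(200)]
    print("worst slack over all prefixes:", worst_slack(zs))
```

Because the step size uses only the increments seen so far, the same run certifies the inequality for every prefix at once, which is the anytime property used for Lemma 6.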

The reader will notice that the pathwise inequality (19) does not depend on $n$ and the construction of $\hat{y}_t$ is also oblivious to this value. A simple argument then allows us to lift the real-valued Burkholder-Davis-Gundy inequality (with the constant from (Burkholder, 2002)) to the Banach space valued martingales:

Lemma 6. With the notation of Theorem 3,
$$\mathbb{E}\max_{s = 1, \ldots, n}\left\|\sum_{t=1}^s Z_t\right\| \le \left(2R_{\max} + \sqrt{3}\right)\mathbb{E}\sqrt{V}.$$

Remarkably, the constant in the resulting BDG inequality is, up to an additive constant, proportional to $R_{\max}$. Once again, we have not seen such results in the literature, yet they follow with ease from regret inequalities.

We also remark that Theorem 3 can be naturally extended to $p$-smooth Banach spaces $\mathcal{B}$. This is accomplished in a straightforward manner by extending Lemma 2.

3. Probabilistic Inequalities and Supervised Learning

We now look beyond linear prediction and analyze supervised learning problems with side information. Here again we establish a strong connection between existence of prediction strategies, the in-expectation inequalities for martingales, and high-probability tail bounds. In contrast to Section 2, we will not present any algorithms. Note that the simplest example of the equivalence (for binary prediction and in the absence of side information) was already stated in the very beginning of this paper.

3.1. Supervised learning with side information

We let $y_1, \ldots, y_n \in \{\pm 1\}$ and $x_1, \ldots, x_n \in \mathcal{X}$ for some abstract measurable set $\mathcal{X}$. Let $\mathcal{F}$ be a class of $[-1, 1]$-valued functions on $\mathcal{X}$. Fix a cost function $\ell : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, convex in the first argument. For a given function $B : \mathcal{F} \times \mathcal{X}^n \to \mathbb{R}$, we aim to construct $\hat{y}_t = \hat{y}_t(x_1, \ldots, x_t, y_1, \ldots, y_{t-1}) \in [-1, 1]$ such that the following adaptive bound holds:
$$\forall (x_t, y_t)_{t=1}^n, \quad \sum_{t=1}^n \ell(\hat{y}_t, y_t) \le \inf_{f \in \mathcal{F}}\left\{\sum_{t=1}^n \ell(f(x_t), y_t) + B(f; x_1, \ldots, x_n)\right\}. \tag{24}$$
We may view $\hat{y}_t$ as a prediction of the next value $y_t$ having observed $x_t$ and all the data thus far. In this paper, we focus on the linear loss $\ell(a, b) = -ab$ (equivalently, absolute loss $|a - b| = 1 - ab$ when $a \in [-1, 1]$ and $b \in \{\pm 1\}$) and the square loss $\ell(a, b) = (a - b)^2$.

We write (24) for the linear cost function as
$$\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^n y_t f(x_t) - B(f; x_1, \ldots, x_n)\right\} \le \sum_{t=1}^n y_t \hat{y}_t, \tag{25}$$
while for the square loss it becomes
$$\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^n \left(2y_t f(x_t) - f(x_t)^2\right) - B(f; x_1, \ldots, x_n)\right\} \le \sum_{t=1}^n \left(2y_t \hat{y}_t - \hat{y}_t^2\right). \tag{26}$$
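As a quick check of how (25) and (26) arise, the following short derivation substitutes the two losses into (24). It uses only the definitions above; the shorthand $B(f)$ for $B(f; x_1, \ldots, x_n)$ is introduced here purely for brevity.

```latex
\begin{align*}
\text{Linear loss, } \ell(a,b) = -ab: \quad
  & -\sum_{t=1}^n \hat{y}_t y_t
    \le \inf_{f \in \mathcal{F}} \Big\{ -\sum_{t=1}^n f(x_t) y_t + B(f) \Big\} \\
  \iff\ & \sup_{f \in \mathcal{F}} \Big\{ \sum_{t=1}^n y_t f(x_t) - B(f) \Big\}
    \le \sum_{t=1}^n y_t \hat{y}_t . \\[4pt]
\text{Square loss, } \ell(a,b) = (a-b)^2: \quad
  & \sum_{t=1}^n \big( \hat{y}_t^2 - 2 \hat{y}_t y_t + y_t^2 \big)
    \le \inf_{f \in \mathcal{F}} \Big\{ \sum_{t=1}^n \big( f(x_t)^2 - 2 f(x_t) y_t + y_t^2 \big) + B(f) \Big\} \\
  \iff\ & \sup_{f \in \mathcal{F}} \Big\{ \sum_{t=1}^n \big( 2 y_t f(x_t) - f(x_t)^2 \big) - B(f) \Big\}
    \le \sum_{t=1}^n \big( 2 y_t \hat{y}_t - \hat{y}_t^2 \big) .
\end{align*}
```

In the square-loss case the terms $\sum_t y_t^2$ appear on both sides and cancel; negating the remaining inequality turns the infimum into a supremum.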

Given a function $B$ and a class $\mathcal{F}$, there are two goals we may consider: (a) certify the existence of $(\hat{y}_t) = (\hat{y}_1, \ldots, \hat{y}_n)$ satisfying the pathwise inequality (24) for all sequences $(x_t, y_t)_{t=1}^n$; or (b) give an explicit construction of $(\hat{y}_t)$. Both questions have been studied in the online learning literature, but the non-constructive approach will play an especially important role. Indeed, explicit constructions such as the simple gradient descent update (2) might not be available in more complex situations, yet it is the existence of $(\hat{y}_t)$ that yields the sought-after tail bounds.

To certify the existence of a strategy $(\hat{y}_t)$, consider the following object:
$$\mathcal{A}(\mathcal{F}, B) = \left\langle\!\!\left\langle \sup_{x_t}\inf_{\hat{y}_t}\max_{y_t} \right\rangle\!\!\right\rangle_{t=1}^n \left\{\sum_{t=1}^n \ell(\hat{y}_t, y_t) - \inf_{f \in \mathcal{F}}\left\{\sum_{t=1}^n \ell(f(x_t), y_t) + B(f; x_1, \ldots, x_n)\right\}\right\} \tag{27}$$
where the notation $\langle\!\langle \cdots \rangle\!\rangle_{t=1}^n$ stands for the repeated application of the operators (the outer operators corresponding to $t = 1$). The variable $x_t$ ranges over $\mathcal{X}$, $y_t$ is in the set $\{\pm 1\}$, and $\hat{y}_t$ ranges in $[-1, 1]$. It follows that $\mathcal{A}(\mathcal{F}, B) \le 0$ is a necessary and sufficient condition for the existence of $(\hat{y}_t)$ such that (24) holds. Indeed, the optimal choice for $\hat{y}_1$ is made given $x_1$; the optimal choice for $\hat{y}_2$ is made given $x_1, y_1, x_2$, and so on. This choice defines the optimal strategy $(\hat{y}_t)$. (If the infima are not achieved, a limiting argument can be employed.) The other direction is immediate.

Suppose we can find an upper bound on $\mathcal{A}(\mathcal{F}, B)$ and then prove that this upper bound is non-positive. This would serve as a sufficient condition for the existence of $(\hat{y}_t)$. Next, we present such an upper bound for the case when the cost function is linear. More general results for convex Lipschitz cost functions can be found in (Foster et al., 2015).

3.2. Linear loss

As in the introduction, let $\epsilon = (\epsilon_1, \ldots, \epsilon_n)$ be a sequence of independent Rademacher random variables. Let $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ be predictable processes with respect to the dyadic filtration $(\sigma(\epsilon_1, \ldots, \epsilon_t))_{t=0}^n$, with values in $\mathcal{X}$ and $\{\pm 1\}$, respectively. In other words, $x_t = x_t(\epsilon_1, \ldots, \epsilon_{t-1}) \in \mathcal{X}$ and $y_t = y_t(\epsilon_1, \ldots, \epsilon_{t-1}) \in \{\pm 1\}$ for each $t = 1, \ldots, n$. One can think of the collections $(x_t)$ and $(y_t)$ as trees labeled, respectively, by elements of $\mathcal{X}$ and $\{\pm 1\}$.

Lemma 7. For the case of the linear cost function,
$$\mathcal{A}(\mathcal{F}, B) = \sup_{x}\ \mathbb{E}\left[\sup_{f \in \mathcal{F}}\ \sum_{t=1}^n \epsilon_t f(x_t) - B(f; x_1, \ldots, x_n)\right]. \tag{28}$$
Therefore, the following are equivalent:

(i) For any predictable process $x = (x_1, \ldots, x_n)$,
$$\mathbb{E}\left[\sup_{f \in \mathcal{F}}\ \sum_{t=1}^n \epsilon_t f(x_t) - B(f; x_1, \ldots, x_n)\right] \le 0. \tag{29}$$

(ii) There exists a strategy $(\hat{y}_t)$ such that the pathwise inequality (25) holds.

Furthermore, the strategy can be assumed to satisfy
$$|\hat{y}_t| \le \sup_{f \in \mathcal{F}} |f(x_t)|. \tag{30}$$

The in-expectation bound of (29) is a necessary and sufficient condition for the existence of a strategy with the per-sequence bound (25). This latter bound, however, implies a high-probability statement, in the spirit of the other results in the paper. Below, we detail this amplification.

Take any $\mathcal{X}$-valued predictable process $x = (x_1, \ldots, x_n)$ with respect to the dyadic filtration. The deterministic inequality (25) applied to $x_t = x_t(\epsilon_1, \ldots, \epsilon_{t-1})$ and $y_t = \epsilon_t$ becomes
$$\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^n \epsilon_t f(x_t) - B(f; x_1, \ldots, x_n)\right\} \le \sum_{t=1}^n \epsilon_t \hat{y}_t \tag{31}$$
for any sample path $(\epsilon_1, \ldots, \epsilon_n)$, and thus we have the comparison of tails
$$P\left(\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^n \epsilon_t f(x_t) - B(f; x_1, \ldots, x_n)\right\} > u\right) \le P\left(\sum_{t=1}^n \epsilon_t \hat{y}_t > u\right). \tag{32}$$
Given the boundedness of the increments $\epsilon_t \hat{y}_t$, the tail bounds follow immediately from the Azuma-Hoeffding inequality or from Freedman's inequality (Freedman, 1975). More precisely, we use the fact that the martingale differences are bounded by $|\hat{y}_t| \le \sup_{f \in \mathcal{F}}|f(x_t)|$, and conclude:

Lemma 8. If there exists a prediction strategy $(\hat{y}_t)$ that satisfies (25) and (30), then for any predictable process $x$, the Azuma-Hoeffding inequality implies that
$$P\left(\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^n \epsilon_t f(x_t) - B(f; x_1, \ldots, x_n)\right\} > u\right) \le \exp\left(-\frac{u^2}{4\max_{\epsilon}\sum_{t=1}^n \sup_{f \in \mathcal{F}} f(x_t(\epsilon))^2}\right), \tag{33}$$
Freedman's inequality implies
$$P\left(\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^n \epsilon_t f(x_t) - B(f; x_1, \ldots, x_n)\right\} > u,\ \sum_{t=1}^n \sup_{f \in \mathcal{F}} f(x_t)^2 \le \sigma^2\right) \le \exp\left(-\frac{u^2}{2\sigma^2 + 2uM/3}\right), \tag{34}$$
where $M = \sup_{f \in \mathcal{F},\, \epsilon \in \{\pm 1\}^n,\, t \le n} |f(x_t)|$, and we also have that for any $\alpha > 0$,
$$P\left(\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^n \epsilon_t f(x_t) - B(f; x_1, \ldots, x_n)\right\} - \alpha\sum_{t=1}^n \sup_{f \in \mathcal{F}} f(x_t)^2 > u\right) \le \exp(-2\alpha u). \tag{35}$$
In view of Lemma 7, a sufficient condition for these inequalities is that (29) holds for all $x$.

Let us emphasize the conclusion of the above lemma: the non-positivity of the expected supremum of a collection of martingales, offset by a function $B$, implies existence of a regret-minimization strategy, which implies a high-probability tail bound. To close the loop,
we integrate out the tails, obtaining an in-expectation bound of the form (29), but possibly with a somewhat larger $B$ function (this depends on the particular form of $B$).

In addition to describing the equivalence, let us capitalize on it and prove a new tail bound. The most basic $B$ is a constant that depends on the complexity of $\mathcal{F}$, but not on $f$ or the data. Define the worst-case sequential Rademacher averages as
$$\mathcal{R}_n(\mathcal{F}) \triangleq \sup_{x}\ \mathbb{E}\sup_{f \in \mathcal{F}}\ \sum_{t=1}^n \epsilon_t f(x_t). \tag{36}$$
Clearly, $B = \mathcal{R}_n(\mathcal{F})$ satisfies (29) and the following is immediate.

Corollary 9. For any $\mathcal{F} \subseteq \mathbb{R}^{\mathcal{X}}$ and an $\mathcal{X}$-valued predictable process $x$ with respect to the dyadic filtration,
$$P\left(\sup_{f \in \mathcal{F}}\ \sum_{t=1}^n \epsilon_t f(x_t) > \mathcal{R}_n(\mathcal{F}) + u\right) \le \exp\left(-\frac{u^2}{4\max_{\epsilon}\sum_{t=1}^n \sup_{f \in \mathcal{F}} f(x_t(\epsilon))^2}\right). \tag{37}$$

Superficially, (37) looks like a one-sided version of a deviation bound for classical (i.i.d.) Rademacher averages (Boucheron et al., 2013). However, sequential Rademacher averages are not Lipschitz with respect to a flip of a sign, as all of the remaining path may change after a flip. It is unclear to the authors how to prove (37) through other existing methods.

3.3. Square loss

Due to limited space, we will not state the analogue of Lemma 7 and simply outline the implication from existence of regret minimization strategies to high-probability tail bounds. As for the case of the linear loss function, take any $\mathcal{X}$-valued predictable process $x = (x_1, \ldots, x_n)$ with respect to the dyadic filtration. Fix $\alpha > 0$. The deterministic inequality (26) for $x_t = x_t(\epsilon_1, \ldots, \epsilon_{t-1})$ and $y_t = \frac{1}{\alpha}\epsilon_t$ becomes
$$\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^n \left(\frac{2}{\alpha}\epsilon_t f(x_t) - f^2(x_t)\right) - B(f; x_1, \ldots, x_n)\right\} \le \sum_{t=1}^n \left(\frac{2}{\alpha}\epsilon_t \hat{y}_t - \hat{y}_t^2\right). \tag{38}$$
As in the proof of (35), we obtain a tail comparison
$$P\left(\sup_{f \in \mathcal{F}}\left\{\sum_{t=1}^n \left(\frac{2}{\alpha}\epsilon_t f(x_t) - f^2(x_t)\right) - B(f; x_1, \ldots, x_n)\right\} > \frac{u}{\alpha}\right) \le P\left(\sum_{t=1}^n \left(\frac{2}{\alpha}\epsilon_t \hat{y}_t - \hat{y}_t^2\right) > \frac{u}{\alpha}\right) \le \exp\left\{-\frac{\alpha u}{2}\right\} \tag{39}$$
where the last inequality follows via a standard analysis of the moment generating function.

As an example, consider the Azoury-Vovk-Warmuth forecaster for linear regression (see e.g. (Cesa-Bianchi and Lugosi, 2006, Sec. 11.8)). Take the class $\mathcal{F}$ to be the class of functions $\mathcal{F} = \{x \mapsto \langle f, x \rangle : f \in B_2^d\}$, where $B_2^d$ is the unit Euclidean ball in $\mathbb{R}^d$. Assuming $\mathcal{X} = B_2^d$, the regret bound for the forecaster is known to be
$$B(f; x_1, \ldots, x_n) = \|f\|^2 + Y^2\sum_{t=1}^n x_t^{T} A_t^{-1} x_t,$$
where $A_t = I + \sum_{s=1}^t x_s x_s^{T}$ and $Y = \max_t |y_t|$. However, when $\mathcal{F}$ is indexed by the unit ball, the supremum in (39) has a closed form expression, and the overall probability inequality takes on the form
$$P\left(\frac{1}{2}\left\|\sum_{t=1}^n \epsilon_t x_t\right\|^2_{A_n^{-1}} \ge \frac{1}{2}\sum_{t=1}^n x_t^{T} A_t^{-1} x_t + u\right) \le \exp\{-u\}. \tag{40}$$
We point out that, being functions of Rademacher random variables, the $x_t$'s are random variables themselves, and the terms in the above expression are dependent in a non-trivial manner.

We would like to refer the reader to the full version of this paper (Rakhlin and Sridharan, 2015) which contains further implications of the equivalence between the existence of deterministic strategies and tail bounds. In particular, the amplification allows us to prove a characterization of a notion of martingale type beyond the linear case.

4. Symmetrization: dyadic filtration is enough

In Section 3, we presented connections between deterministic regret inequalities in the supervised setting and tail bounds for dyadic martingales. One may ask whether these tail bounds can be used for more general martingales indexed by some set. The purpose of this section is to prove that statements for the dyadic filtration can be lifted to general processes via sequential symmetrization.

Consider the martingale
$$M^g = \sum_{t=1}^n g(Z_t) - \mathbb{E}[g(Z_t) \mid Z_1, \ldots, Z_{t-1}]$$
indexed by $g \in \mathcal{G}$. If $(Z_t)$ is adapted to a dyadic filtration $\mathcal{A}_t = \sigma(\epsilon_1, \ldots, \epsilon_t)$, each increment $g(Z_t) - \mathbb{E}[g(Z_t) \mid Z_1, \ldots, Z_{t-1}]$ takes on the value
$$f_g(x_t(\epsilon_{1:t-1})) \triangleq \left(g(Z_t(\epsilon_{1:t-1}, +1)) - g(Z_t(\epsilon_{1:t-1}, -1))\right)/2$$
or its negation, where $x_t$ is a predictable process with values in $\mathcal{Z} \times \mathcal{Z}$ and $f_g \in \mathcal{F}$ is defined by $(z, z') \mapsto g(z) - g(z')$. In Section 3, we worked directly with martingales of the form $M^f = \sum_{t=1}^n \epsilon_t f(x_t(\epsilon))$, indexed by an abstract class $\mathcal{F} \subseteq \mathbb{R}^{\mathcal{X}}$ and an abstract $\mathcal{X}$-valued predictable process $x$.

We extend the symmetrization approach of Panchenko (Panchenko, 2003) to sequential symmetrization for the case of martingales.
In contrast to the more frequently used Giné-Zinn symmetrization proof (via Chebyshev's inequality) (Giné and Zinn, 1984; Van Der Vaart and Wellner, 1996) that allows a direct tail comparison of the symmetrized and the original processes, Panchenko's approach allows for an indirect comparison. The following immediate extension of (Panchenko, 2003, Lemma 1) will imply that any $\exp\{-\mu(u)\}$ type tail behavior of the symmetrized process yields the same behavior for the original process.

Lemma 10. Suppose $\xi$ and $\nu$ are random variables and for some $\Gamma \ge 1$ and for all $u \ge 0$,
$$P(\nu \ge u) \le \Gamma\exp\{-\mu(u)\}.$$
Let $\mu : \mathbb{R}_+ \to \mathbb{R}_+$ be an increasing differentiable function with $\mu(0) = 0$ and $\mu(\infty) = \infty$. Suppose for all $a \in \mathbb{R}$ and $\phi(x) \triangleq \mu([x - a]_+)$ it holds that $\mathbb{E}\phi(\xi) \le \mathbb{E}\phi(\nu)$. Then for any $u \ge 0$,
$$P(\xi \ge u) \le \Gamma\exp\{-\mu(u - \mu^{-1}(1))\}.$$
In particular, if $\mu(b) = cb$, we have $P(\xi \ge u) \le \Gamma\exp\{1 - cu\}$; if $\mu(b) = cb^2$, then $P(\xi \ge u) \le \Gamma\exp\{1 - cu^2/4\}$.

As in (Panchenko, 2003), the lemma will be used with $\xi$ and $\nu$ as functions of a single sample and the double sample, respectively. The expression for the double sample will be symmetrized in order to pass to the dyadic filtration. However, unlike (Panchenko, 2003), we are dealing with a dependent sequence $Z_1, \ldots, Z_n$, and the meaning ascribed to the second sample $Z'_1, \ldots, Z'_n$ is that of a conditionally independent tangent sequence. That is, $Z_t, Z'_t$ are independent and have the same distribution conditionally on $Z_1, \ldots, Z_{t-1}$. Let $\mathbb{E}_{t-1}$ stand for the conditional expectation given $Z_1, \ldots, Z_{t-1}$.

Corollary 11. Let $B : \mathcal{G} \times \mathcal{Z}^{2n} \to \mathbb{R}$ be a function that is symmetric with respect to the swap of the $i$-th pair $z_i, z'_i$, for any $i \in [n]$:
$$B(g; z_1, z'_1, \ldots, z_i, z'_i, \ldots, z_n, z'_n) = B(g; z_1, z'_1, \ldots, z'_i, z_i, \ldots, z_n, z'_n) \tag{41}$$
for all $g \in \mathcal{G}$. Then, under the assumptions of Lemma 10 on $\mu$, a tail behavior
$$\forall (z, z'), \quad P\left(\sup_{g \in \mathcal{G}}\ \sum_{t=1}^n \epsilon_t\left(g(z_t) - g(z'_t)\right) - B(g; (z_1, z'_1), \ldots, (z_n, z'_n)) > u\right) \le \Gamma\exp\{-\mu(u)\}$$
for all $u > 0$ implies the tail bound
$$P\left(\sup_{g \in \mathcal{G}}\ \sum_{t=1}^n \left(g(Z_t) - \mathbb{E}_{t-1}g(Z_t)\right) - \mathbb{E}_{Z'_{1:n}}B(g; Z_1, Z'_1, \ldots, Z_n, Z'_n) > u\right) \le \Gamma\exp\{-\mu(u - \mu^{-1}(1))\}$$
for any sequence of random variables $Z_1, \ldots, Z_n$ and the corresponding tangent sequence $Z'_1, \ldots, Z'_n$. Here $(z, z')$ ranges over pairs of predictable processes with respect to the dyadic filtration. A direct comparison of the expected suprema also holds:
$$\mathbb{E}\sup_{g \in \mathcal{G}}\ \sum_{t=1}^n \left(g(Z_t) - \mathbb{E}_{t-1}g(Z_t)\right) - \mathbb{E}_{Z'_{1:n}}B(g; Z_1, Z'_1, \ldots, Z_n, Z'_n) \le \sup_{z, z'}\ \mathbb{E}\sup_{g \in \mathcal{G}}\ \sum_{t=1}^n \epsilon_t\left(g(z_t) - g(z'_t)\right) - B(g; (z_1, z'_1), \ldots, (z_n, z'_n)). \tag{42}$$

We conclude that it is enough to prove tail bounds for a supremum
$$\sup_{f \in \mathcal{F}}\ \sum_{t=1}^n \epsilon_t f(x_t) - B(f; x_1, \ldots, x_n)$$
of a martingale with respect to the dyadic filtration, offset by a function $B(f; x_1, \ldots, x_n)$, as done in Section 3.
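For very small problems, the dyadic objects of Section 3 can be evaluated by brute force. The sketch below is a toy check of the identity (28) in Lemma 7: it computes the minimax value (27) for the linear loss by backward recursion and compares it with the tree value on the right-hand side of (28). The two-point set `X_SET`, the three-function class `F_CLASS`, and the offset `offset_b` are hypothetical choices made only for this illustration.

```python
# Hypothetical toy instance (not from the paper): a two-point side-information
# set, three [-1, 1]-valued functions stored as tuples f[x], and an offset B.
X_SET = (0, 1)
F_CLASS = ((1.0, -1.0), (0.5, 0.5), (-1.0, 0.25))


def offset_b(f, xs):
    """Hypothetical offset B(f; x_1..x_n); any function of f and xs would do."""
    return 0.3 + 0.1 * sum(f[x] ** 2 for x in xs)


def phi_n(xs, ys):
    """sup_f { sum_t y_t f(x_t) - B(f; x_1..x_n) } for a full history."""
    return max(
        sum(y * f[x] for x, y in zip(xs, ys)) - offset_b(f, xs)
        for f in F_CLASS
    )


def minimax_value(n, xs=(), ys=()):
    """Value of (27) for linear loss: sup_x inf_yhat max_y, applied n times."""
    if len(xs) == n:
        return phi_n(xs, ys)
    best = float("-inf")
    for x in X_SET:
        up = minimax_value(n, xs + (x,), ys + (1,))
        down = minimax_value(n, xs + (x,), ys + (-1,))
        # yhat -> max(-yhat + up, yhat + down) is convex and piecewise linear,
        # so its minimum over [-1, 1] sits at the clipped crossing point.
        yhat = min(1.0, max(-1.0, (up - down) / 2.0))
        best = max(best, max(-yhat + up, yhat + down))
    return best


def tree_value(n, xs=(), eps=()):
    """Right-hand side of (28): sup over trees x of the Rademacher average."""
    if len(xs) == n:
        return phi_n(xs, eps)
    return max(
        0.5 * (tree_value(n, xs + (x,), eps + (1,))
               + tree_value(n, xs + (x,), eps + (-1,)))
        for x in X_SET
    )


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, minimax_value(n), tree_value(n))
```

The two recursions agree because the crossing point is never clipped when the functions take values in $[-1, 1]$; at that point the inner maximum equals the average of the two branches, which is the mechanism in the proof of Lemma 7. A non-positive value certifies, for this toy instance, that a strategy satisfying (25) exists.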

Acknowledgements

Research is supported in part by the NSF under grants no. CDS&E-MSS … and ….

References

J. Abernethy, A. Agarwal, P. Bartlett, and A. Rakhlin. A stochastic view of optimal regret through minimax duality. In Proceedings of the 22nd Annual Conference on Learning Theory, 2009.
B. Acciaio, M. Beiglböck, F. Penkner, W. Schachermayer, and J. Temme. A trajectorial interpretation of Doob's martingale inequalities. Ann. Appl. Probab., 23(4), 2013.
M. Beiglböck and M. Nutz. Martingale inequalities and deterministic counterparts. Electron. J. Probab., 19(95):1–15, 2014.
M. Beiglböck and P. Siorpaes. Pathwise versions of the Burkholder-Davis-Gundy inequality. Bernoulli, 21(1), 2015.
B. Bercu, B. Delyon, and E. Rio. Concentration inequalities for sums and martingales, 2015.
J. Borwein, A. Guirao, P. Hájek, and J. Vanderwerff. Uniformly convex functions on Banach spaces. Proceedings of the American Mathematical Society, 137(3), 2009.
S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, 2013.
D. Burkholder. The best constant in the Davis inequality for the expectation of the martingale square function. Transactions of the American Mathematical Society, 354(1):91–105, 2002.
N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
T. Cover. Behaviour of sequential predictors of binary sequences. In Proc. 4th Prague Conf. Inform. Theory, Statistical Decision Functions, Random Processes, 1965.
V. H. de la Peña, T. L. Lai, and Q.-M. Shao. Self-normalized processes: Limit theory and Statistical Applications. Springer, 2008.
D. Foster, A. Rakhlin, and K. Sridharan. Adaptive online learning, 2015. In submission.
D. A. Freedman. On tail probabilities for martingales. The Annals of Probability, 1975.
E. Giné and J. Zinn. Some limit theorems for empirical processes. Annals of Probability, 12(4), 1984.
A. Gushchin. On pathwise counterparts of Doob's maximal inequalities. Proceedings of the Steklov Institute of Mathematics, 1(287), 2014.
D. Panchenko. Symmetrization approach to concentration inequalities for empirical processes. Annals of Probability, 31(4), 2003.
I. Pinelis. Optimum bounds for the distributions of martingales in Banach spaces. The Annals of Probability, 22(4), 1994.
A. Rakhlin and K. Sridharan. On equivalence of martingale tail bounds and deterministic regret inequalities. arXiv preprint, 2015.
A. Rakhlin and K. Sridharan. A tutorial on online supervised learning with applications to node classification in social networks. CoRR, 2016.
A. Rakhlin, K. Sridharan, and A. Tewari. Online learning: Random averages, combinatorial parameters, and learnability. Advances in Neural Information Processing Systems 23, 2010.
A. Rakhlin, K. Sridharan, and A. Tewari. Online learning via sequential complexities. Journal of Machine Learning Research, 2014.
N. Srebro, K. Sridharan, and A. Tewari. On the universality of online mirror descent. In NIPS, 2011.
A. W. Van Der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series, March 1996.

Appendix A. Proofs

Lemma 12. The update in (2) satisfies
$$\forall z_1, \ldots, z_n \in B,\ \forall f \in B, \quad \sum_{t=1}^n \langle \hat{y}_t - f, z_t \rangle \le \sqrt{n}.$$

Proof [Lemma 12]. The following two-line proof is standard. By the property of a projection,
$$\|\hat{y}_{t+1} - f\|^2 = \left\|\mathrm{Proj}_B\left(\hat{y}_t - n^{-1/2}z_t\right) - f\right\|^2 \le \left\|\left(\hat{y}_t - n^{-1/2}z_t\right) - f\right\|^2 \tag{43}$$
$$= \|\hat{y}_t - f\|^2 + \frac{1}{n}\|z_t\|^2 - 2n^{-1/2}\langle \hat{y}_t - f, z_t \rangle. \tag{44}$$
Rearranging,
$$2n^{-1/2}\langle \hat{y}_t - f, z_t \rangle \le \|\hat{y}_t - f\|^2 - \|\hat{y}_{t+1} - f\|^2 + \frac{1}{n}\|z_t\|^2.$$
Summing over $t = 1, \ldots, n$, the first two terms telescope to at most $\|\hat{y}_1 - f\|^2 = \|f\|^2 \le 1$ and the last terms add up to at most 1, which yields the desired statement.
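The statement of Lemma 12, in the rearranged form (3), is easy to test numerically. The following is a minimal sketch, assuming plain Python lists for vectors in $\mathbb{R}^d$; the helper names are hypothetical and the random sequences are only one convenient way to exercise the inequality, which holds for every sequence in the unit ball.

```python
import math
import random


def project_to_unit_ball(v):
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm <= 1.0 else [x / norm for x in v]


def minus_inner_sum(zs):
    """Run update (2) on z_1..z_n and return sum_t <-yhat_t, z_t>."""
    n, dim = len(zs), len(zs[0])
    step = 1.0 / math.sqrt(n)       # the fixed step size n^{-1/2} of (2)
    yhat = [0.0] * dim              # yhat_1 = 0
    total = 0.0
    for z in zs:
        total -= sum(a * b for a, b in zip(yhat, z))
        yhat = project_to_unit_ball([a - step * b for a, b in zip(yhat, z)])
    return total


def norm_of_sum(zs):
    dim = len(zs[0])
    s = [sum(z[i] for z in zs) for i in range(dim)]
    return math.sqrt(sum(x * x for x in s))


def random_point_in_ball(rng, dim):
    """Uniform draw from the unit Euclidean ball."""
    v = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    radius = rng.random() ** (1.0 / dim)
    return [radius * x / norm for x in v]


def worst_gap(n=200, dim=5, trials=100, seed=0):
    """Largest observed value of ||sum z_t|| - sqrt(n) - sum <-yhat_t, z_t>.
    Inequality (3) predicts that this is never positive."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        zs = [random_point_in_ball(rng, dim) for _ in range(n)]
        worst = max(worst, norm_of_sum(zs) - math.sqrt(n) - minus_inner_sum(zs))
    return worst


if __name__ == "__main__":
    print("worst gap on random sequences:", worst_gap())
```

Random sequences leave a lot of slack in (3); a constant sequence such as $z_t = e_1$ for all $t$ is a more demanding input and can be passed to `minus_inner_sum` and `norm_of_sum` directly.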

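Proof [Lemma 6]. Because of the anytime property of the regret bound and the strategy definition, we can write (19) as
$$\max_{s = 1, \ldots, n}\left\{\left\|\sum_{t=1}^s Z_t\right\| - \sum_{t=1}^s \langle -\hat{y}_t, Z_t \rangle\right\} \le 2R_{\max}\sqrt{V} \tag{45}$$
simply because the right-hand side is largest for $s = n$. Sub-additivity of max implies
$$\max_{s = 1, \ldots, n}\left\|\sum_{t=1}^s Z_t\right\| - 2R_{\max}\sqrt{V} \le \max_{s = 1, \ldots, n}\sum_{t=1}^s \langle -\hat{y}_t, Z_t \rangle. \tag{46}$$
By the Burkholder-Davis-Gundy inequality (with the constant from Burkholder (2002)),
$$\mathbb{E}\max_{s = 1, \ldots, n}\sum_{t=1}^s \langle -\hat{y}_t, Z_t \rangle \le \sqrt{3}\,\mathbb{E}\left(\sum_{t=1}^n \langle \hat{y}_t, Z_t \rangle^2\right)^{1/2} \le \sqrt{3}\,\mathbb{E}\sqrt{V}. \tag{47}$$

Proof [Lemma 2]. Let $\hat{y}_{t+1}$ be the unrestricted minimum of (15). Because of the update form,
$$\forall f \in \mathcal{F}, \quad \langle \hat{y}_{t+1} - f, z_t \rangle \le \frac{1}{\eta_t}\left(D_{\mathcal{R}}(f, \hat{y}_t) - D_{\mathcal{R}}(f, \hat{y}_{t+1}) - D_{\mathcal{R}}(\hat{y}_{t+1}, \hat{y}_t)\right).$$
Summing over $t = 1, \ldots, n$,
$$\sum_{t=1}^n \langle \hat{y}_{t+1} - f, z_t \rangle \le \eta_1^{-1}D_{\mathcal{R}}(f, \hat{y}_1) + \sum_{t=2}^n \left(\eta_t^{-1} - \eta_{t-1}^{-1}\right)D_{\mathcal{R}}(f, \hat{y}_t) - \sum_{t=1}^n \eta_t^{-1}D_{\mathcal{R}}(\hat{y}_{t+1}, \hat{y}_t)$$
$$\le \eta_1^{-1}R_{\max}^2 + \sum_{t=2}^n \left(\eta_t^{-1} - \eta_{t-1}^{-1}\right)R_{\max}^2 - \sum_{t=1}^n \frac{\eta_t^{-1}}{2}\|\hat{y}_{t+1} - \hat{y}_t\|_*^2 = R_{\max}^2\eta_n^{-1} - \sum_{t=1}^n \frac{\eta_t^{-1}}{2}\|\hat{y}_{t+1} - \hat{y}_t\|_*^2,$$
where we used strong convexity of $\mathcal{R}$ and the fact that $\eta_t$ is nonincreasing. Next, we write
$$\sum_{t=1}^n \langle \hat{y}_t - f, z_t \rangle = \sum_{t=1}^n \langle \hat{y}_{t+1} - f, z_t \rangle + \sum_{t=1}^n \langle \hat{y}_t - \hat{y}_{t+1}, z_t \rangle$$
and upper bound the second term by noting that
$$\langle \hat{y}_t - \hat{y}_{t+1}, z_t \rangle \le \|\hat{y}_t - \hat{y}_{t+1}\|_*\,\|z_t\| \le \frac{\eta_t^{-1}}{2}\|\hat{y}_t - \hat{y}_{t+1}\|_*^2 + \frac{\eta_t}{2}\|z_t\|^2.$$
Combining the bounds,
$$\sum_{t=1}^n \langle \hat{y}_t - f, z_t \rangle \le R_{\max}^2\eta_n^{-1} + \sum_{t=1}^n \frac{\eta_t}{2}\|z_t\|^2. \tag{48}$$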
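Using the fact (Cesa-Bianchi and Lugosi, 2006, Lemma 11.8) that for nonnegative $(\alpha_t)$
$$\sum_{t=1}^n \frac{\alpha_t}{\sqrt{\sum_{s=1}^t \alpha_s}} \le 2 \sqrt{\sum_{t=1}^n \alpha_t}$$
and the definition of $\eta_t$,
$$\sum_{t=1}^n \langle \hat{y}_t - f, z_t \rangle \le 2 R_{\max} \sqrt{\sum_{t=1}^n \|z_t\|^2}. \tag{49}$$

Proof [Lemma 5] Let $\mathbb{E}_{t-1}[\cdot] = \mathbb{E}[\,\cdot \mid Z_1, \dots, Z_{t-1}]$ denote conditional expectation. We have
$$\begin{aligned}
&\mathbb{E}_{t-1} \exp\big\{ \lambda \langle \hat{y}_t, Z_t - \mathbb{E}_{t-1} Z'_t \rangle - \lambda^2 ( \|Z_t\|^2 + \mathbb{E}_{t-1} \|Z'_t\|^2 ) \big\} \\
&\quad \le \mathbb{E}_{t-1} \exp\big\{ \lambda \langle \hat{y}_t, Z_t - Z'_t \rangle - \lambda^2 ( \|Z_t\|^2 + \|Z'_t\|^2 ) \big\} \\
&\quad = \mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\big\{ \lambda \epsilon \langle \hat{y}_t, Z_t - Z'_t \rangle - \lambda^2 ( \|Z_t\|^2 + \|Z'_t\|^2 ) \big\}.
\end{aligned}$$
Since exp is a convex function, the expression is
$$\begin{aligned}
&\mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\Big\{ \tfrac{1}{2} \big( 2\lambda\epsilon \langle \hat{y}_t, Z_t \rangle - 2\lambda^2 \|Z_t\|^2 \big) + \tfrac{1}{2} \big( -2\lambda\epsilon \langle \hat{y}_t, Z'_t \rangle - 2\lambda^2 \|Z'_t\|^2 \big) \Big\} \\
&\quad \le \tfrac{1}{2} \mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\big\{ 2\lambda\epsilon \langle \hat{y}_t, Z_t \rangle - 2\lambda^2 \|Z_t\|^2 \big\} + \tfrac{1}{2} \mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\big\{ -2\lambda\epsilon \langle \hat{y}_t, Z'_t \rangle - 2\lambda^2 \|Z'_t\|^2 \big\} \\
&\quad = \mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\big\{ 2\lambda\epsilon \langle \hat{y}_t, Z_t \rangle - 2\lambda^2 \|Z_t\|^2 \big\} \\
&\quad \le \mathbb{E}_{t-1} \exp\big\{ 2\lambda^2 \langle \hat{y}_t, Z_t \rangle^2 - 2\lambda^2 \|Z_t\|^2 \big\} \le 1
\end{aligned}$$
since $\|\hat{y}_t\| \le 1$. Repeating this argument for $t = n$ to $t = 1$ yields the statement.

If $Z_t$ are conditionally symmetric, then $\langle \hat{y}_t, Z_t \rangle$ are also conditionally symmetric. Hence,
$$\mathbb{E}_{t-1} \exp\Big\{ \lambda \langle \hat{y}_t, Z_t \rangle - \frac{\lambda^2}{2} \|Z_t\|^2 \Big\} = \mathbb{E}_{t-1} \mathbb{E}_{\epsilon} \exp\Big\{ \lambda\epsilon \langle \hat{y}_t, Z_t \rangle - \frac{\lambda^2}{2} \|Z_t\|^2 \Big\} \le \mathbb{E}_{t-1} \exp\Big\{ \frac{\lambda^2}{2} \langle \hat{y}_t, Z_t \rangle^2 - \frac{\lambda^2}{2} \|Z_t\|^2 \Big\} \le 1.$$

Proof [Lemma 7] For binary outcomes $y \in \{\pm 1\}$ and either absolute loss or linear loss,
$$A(F, B) = \Big\langle\!\!\Big\langle \sup_{x_t} \inf_{\hat{y}_t} \max_{y_t} \Big\rangle\!\!\Big\rangle_{t=1}^n \Big\{ \sum_{t=1}^n -y_t \hat{y}_t + \sup_{f} \Big\{ \sum_{t=1}^n y_t f(x_t) - B(f; x_1, \dots, x_n) \Big\} \Big\},$$
where we shall restrict $\hat{y}_t$ to range over the interval $|\hat{y}_t| \le \sup_f |f(x_t)|$ and $y_t$ in $\{\pm 1\}$. Consider the last step $t = n$. Given $x_{1:n}$, $\hat{y}_{1:n-1}$, and $y_{1:n-1}$, we solve
$$\inf_{\hat{y}_n} \max_{y_n} \big\{ -\hat{y}_n y_n + \phi_n(x_{1:n}, y_{1:n}) \big\} \tag{50}$$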

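where
$$\phi_n(x_{1:n}, y_{1:n}) \triangleq \sup_{f} \Big\{ \sum_{t=1}^n y_t f(x_t) - B(f; x_1, \dots, x_n) \Big\}. \tag{51}$$
Since there are two possibilities for $y_n$, the closed form solution for $\hat{y}_n$ is given by
$$\hat{y}_n = \frac{1}{2} \big( \phi_n(x_{1:n}, y_{1:n-1}, +1) - \phi_n(x_{1:n}, y_{1:n-1}, -1) \big). \tag{52}$$
Importantly, this value satisfies $|\hat{y}_n| \le \sup_f |f(x_n)|$. With this optimal choice, (50) is equal to $\mathbb{E}_{\epsilon_n} \phi_n(x_{1:n}, y_{1:n-1}, \epsilon_n)$. We now include the supremum over $x_n$ in the definition of $\phi_{n-1}$,
$$\phi_{n-1}(x_{1:n-1}, y_{1:n-1}) \triangleq \sup_{x_n} \mathbb{E}_{\epsilon_n} \phi_n(x_{1:n}, y_{1:n-1}, \epsilon_n),$$
and repeat the argument for $t = n - 1$. Since all the steps are equalities,
$$A(F, B) = \phi_0(\emptyset) = \sup_{x_1} \mathbb{E}_{\epsilon_1} \dots \sup_{x_n} \mathbb{E}_{\epsilon_n} \sup_{f} \Big\{ \sum_{t=1}^n \epsilon_t f(x_t) - B(f; x_1, \dots, x_n) \Big\},$$
which can be written as (28).

Proof [Lemma 10] We have
$$P(\xi \ge u) \le \frac{\mathbb{E}\phi(\xi)}{\phi(u)} \le \frac{\mathbb{E}\phi(\nu)}{\phi(u)} \le \frac{1}{\phi(u)} \Big( \phi(0) + \int_0^\infty \phi'(x) P(\nu \ge x)\, dx \Big).$$
Choose $a = u - \mu^{-1}(1)$, where $\mu^{-1}$ is the inverse function. If $a < 0$, the conclusion of the lemma is true since $\Gamma \ge 1$. In the case of $a \ge 0$, we have $\phi(0) = 0$. The above upper bound becomes
$$\begin{aligned}
P(\xi \ge u) &\le \frac{\Gamma}{\phi(u)} \int_0^\infty \phi'(x) \exp\{-\mu(x)\}\, dx \\
&= \frac{\Gamma}{\phi(u)} \int_a^\infty \mu'(x) \exp\{-\mu(x)\}\, dx \\
&= \frac{\Gamma}{\mu(u - a)} \big[ -\exp\{-\mu(x)\} \big]_a^\infty = \Gamma \exp\{-\mu(a)\} = \Gamma \exp\{-\mu(u - \mu^{-1}(1))\}.
\end{aligned}$$
If $\mu(b) = cb$, we have
$$P(\xi \ge u) \le \Gamma \exp\{-c(u - 1/c)\} = \Gamma \exp\{1 - cu\}.$$
If $\mu(b) = cb^2$, we have
$$P(\xi \ge u) \le \Gamma \exp\{-c(u - 1/\sqrt{c})^2\} \le \Gamma \exp\{-cu^2/4\}$$
whenever $u \ge 2/\sqrt{c}$. If $u \le 2/\sqrt{c}$, the conclusion is valid since $\Gamma \ge 1$.

Proof [Corollary 11] Let
$$\xi(Z_1, \dots, Z_n, Z'_1, \dots, Z'_n) = \sup_{g} \sum_{t=1}^n \big( g(Z_t) - g(Z'_t) \big) - B(g; Z_1, Z'_1, \dots, Z_n, Z'_n)$$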

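and
$$\nu(Z_1, \dots, Z_n) = \sup_{g} \sum_{t=1}^n \big( g(Z_t) - \mathbb{E}_{t-1} g(Z'_t) \big) - \mathbb{E}_{Z'_{1:n}} B(g; Z_1, Z'_1, \dots, Z_n, Z'_n).$$
Then for any convex $\phi : \mathbb{R} \to \mathbb{R}$, $\mathbb{E}\phi(\nu) \le \mathbb{E}\phi(\xi)$ using convexity of the supremum. The problem is now reduced to obtaining tail bounds for
$$P\Big( \sup_{g} \sum_{t=1}^n \big( g(Z_t) - g(Z'_t) \big) - B(g; Z_1, Z'_1, \dots, Z_n, Z'_n) > u \Big).$$
Write the probability as
$$\mathbb{E}\, \mathbf{1}\{ \xi(Z_1, \dots, Z_n, Z'_1, \dots, Z'_n) > u \}.$$
We now proceed to replace the random variables from $n$ backwards with a dyadic filtration. Let us start with the last index. Renaming $Z_n$ and $Z'_n$ we see that
$$\begin{aligned}
&\mathbb{E}\, \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^n \big( g(Z_t) - g(Z'_t) \big) - B(g; Z_1, Z'_1, \dots, Z_n, Z'_n) > u \Big\} \\
&\quad = \mathbb{E}\, \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^{n-1} \big( g(Z_t) - g(Z'_t) \big) + \big( g(Z_n) - g(Z'_n) \big) - B(g; Z_1, Z'_1, \dots, Z_n, Z'_n) > u \Big\} \\
&\quad = \mathbb{E}\, \mathbb{E}_{\epsilon_n} \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^{n-1} \big( g(Z_t) - g(Z'_t) \big) + \epsilon_n \big( g(Z_n) - g(Z'_n) \big) - B(g; Z_1, Z'_1, \dots, Z_n, Z'_n) > u \Big\} \\
&\quad \le \mathbb{E} \sup_{z_n, z'_n} \mathbb{E}_{\epsilon_n} \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^{n-1} \big( g(Z_t) - g(Z'_t) \big) + \epsilon_n \big( g(z_n) - g(z'_n) \big) - B(g; Z_1, Z'_1, \dots, Z_{n-1}, Z'_{n-1}, z_n, z'_n) > u \Big\}.
\end{aligned}$$
Proceeding in this manner for step $n - 1$ and back to $t = 1$, we obtain an upper bound of
$$\begin{aligned}
&\sup_{z_1, z'_1} \mathbb{E}_{\epsilon_1} \dots \sup_{z_n, z'_n} \mathbb{E}_{\epsilon_n} \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^n \epsilon_t \big( g(z_t) - g(z'_t) \big) - B(g; z_1, z'_1, \dots, z_n, z'_n) > u \Big\} \\
&\quad = \sup_{\mathbf{x}} \mathbb{E}\, \mathbf{1}\Big\{ \sup_{g} \sum_{t=1}^n \epsilon_t f_g(x_t) - B(g; x_1, \dots, x_n) > u \Big\}.
\end{aligned}$$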
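The elementary inequality from Cesa-Bianchi and Lugosi (2006, Lemma 11.8) invoked in the proof of Lemma 2, namely $\sum_{t} \alpha_t / \sqrt{\sum_{s \le t} \alpha_s} \le 2\sqrt{\sum_t \alpha_t}$ for nonnegative $(\alpha_t)$, is easy to check numerically. The sketch below is an illustration only: the function names are hypothetical, and terms with a zero prefix sum are read as zero (an assumed convention for $0/0$).

```python
import math
import random


def adaptive_sum(alphas):
    """sum_t alpha_t / sqrt(alpha_1 + ... + alpha_t), reading 0/0 as 0."""
    total, prefix = 0.0, 0.0
    for a in alphas:
        prefix += a
        if prefix > 0.0:
            total += a / math.sqrt(prefix)
    return total


def adaptive_bound(alphas):
    """Right-hand side 2 * sqrt(alpha_1 + ... + alpha_n)."""
    return 2.0 * math.sqrt(sum(alphas))


if __name__ == "__main__":
    rng = random.Random(0)
    alphas = [rng.random() for _ in range(1000)]
    print(adaptive_sum(alphas), "<=", adaptive_bound(alphas))
```

With $\alpha_t = \|z_t\|^2$, this is the step that turns the second term of (48) into a multiple of $\sqrt{\sum_t \|z_t\|^2}$.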


More information

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as

More information

Math 451: Euclidean and Non-Euclidean Geometry MWF 3pm, Gasson 204 Homework 3 Solutions

Math 451: Euclidean and Non-Euclidean Geometry MWF 3pm, Gasson 204 Homework 3 Solutions Math 451: Euclidea ad No-Euclidea Geometry MWF 3pm, Gasso 204 Homework 3 Solutios Exercises from 1.4 ad 1.5 of the otes: 4.3, 4.10, 4.12, 4.14, 4.15, 5.3, 5.4, 5.5 Exercise 4.3. Explai why Hp, q) = {x

More information

The log-behavior of n p(n) and n p(n)/n

The log-behavior of n p(n) and n p(n)/n Ramauja J. 44 017, 81-99 The log-behavior of p ad p/ William Y.C. Che 1 ad Ke Y. Zheg 1 Ceter for Applied Mathematics Tiaji Uiversity Tiaji 0007, P. R. Chia Ceter for Combiatorics, LPMC Nakai Uivercity

More information

Preponderantly increasing/decreasing data in regression analysis

Preponderantly increasing/decreasing data in regression analysis Croatia Operatioal Research Review 269 CRORR 7(2016), 269 276 Prepoderatly icreasig/decreasig data i regressio aalysis Darija Marković 1, 1 Departmet of Mathematics, J. J. Strossmayer Uiversity of Osijek,

More information

Solutions to HW Assignment 1

Solutions to HW Assignment 1 Solutios to HW: 1 Course: Theory of Probability II Page: 1 of 6 Uiversity of Texas at Austi Solutios to HW Assigmet 1 Problem 1.1. Let Ω, F, {F } 0, P) be a filtered probability space ad T a stoppig time.

More information

Disjoint Systems. Abstract

Disjoint Systems. Abstract Disjoit Systems Noga Alo ad Bey Sudaov Departmet of Mathematics Raymod ad Beverly Sacler Faculty of Exact Scieces Tel Aviv Uiversity, Tel Aviv, Israel Abstract A disjoit system of type (,,, ) is a collectio

More information

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We

More information