On the Online Frank-Wolfe Algorithms for Convex and Non-convex Optimizations

Size: px
Start display at page:

Download "On the Online Frank-Wolfe Algorithms for Convex and Non-convex Optimizations"

Transcription

1 On he Online Frank-Wolfe Algorihms for Convex and Non-convex Opimizaions Jean Lafond, Hoi-To Wai, Eric Moulines Augus 16, 016 arxiv: v [sa.ml] 15 Aug 016 Absrac In his paper, he online varians of he classical Frank-Wolfe algorihm are considered. We consider minimizing he regre wih a sochasic cos. The online algorihms only require simple ieraive updaes and a non-adapive sep size rule, in conras o he hybrid schemes commonly considered in he lieraure. Several new resuls are derived for convex and non-convex losses. Wih a srongly convex sochasic cos and when he opimal soluion lies in he inerior of he consrain se or he consrain se is a polyope, he regre bound and anyime opimaliy are shown o be Olog 3 T/T and Olog T/T, respecively, where T is he number of rounds played. These resuls are based on an improved analysis on he sochasic Frank-Wolfe algorihms. Moreover, he online algorihms are shown o converge even when he loss is non-convex, i.e., he algorihms find a saionary poin o he ime-varying/sochasic loss a a rae of O 1/T. Numerical experimens on realisic daa ses are presened o suppor our heoreical claims. 1 Inroducion Recenly, Frank-Wolfe FW algorihm [FW56] has become popular for high-dimensional consrained opimizaion. Compared o he projeced gradien PG algorihm see [BT09, JN1a, JN1b, NJLS09], he FW algorihm a.k.a. condiional gradien mehod is appealing due o is projecion-free naure. The cosly projecion sep in PG is replaced by a linear opimizaion in FW. The laer admis a closed form soluion for many problems of ineress in machine learning. This work focuses on he online varians of he FW and he FW wih away sep AW algorihms. A each round, he proposed online FW/AW algorihms follow he same updae equaion applied in classical FW/AW and a sep size is aken according o a non-adapive rule. The only modificaion involved is ha we use an online-compued aggregaed gradien as a surrogae of he rue gradien of he expeced loss ha we aemp o minimize. We esablish fas convergence of he algorihms under various condiions. Fas convergence for projecion-free algorihms have been sudied in [LJJ13, LJJ15, GH15a, GH15b, LZ14, HL16]. However, many works have considered a hybrid approach ha involves solving a regularized linear opimizaion during he updaes [GH15b, LZ14]; or combining exising algorihms wih FW [HL16]. In paricular, he auhors in [GH15b] showed a regre bound of Olog T/T for heir online projecion-free algorihm, where T is he number of ieraions, under an adversarial seing. This maches he opimal bound for srongly convex loss. The drawback of hese algorihms lies on he exra complexiies in implemenaion and compuaion added o he classical FW algorihm. Our aim is o show ha simple online projecion-free mehods can achieve on-he-par convergence guaranees as he sophisicaed algorihms menioned above. In paricular, we presen a se of new resuls for online FW/AW algorihms under he full informaion seing, i.e., complee knowledge abou he loss Insiu Mines-Telecom, Telecom ParisTech, CNRS LTCI, Paris, France. jean.lafond@elecom-parisech.fr School of Elecrical, Compuer and Energy Engineering, Arizona Sae Universiy, AZ, USA. hwai@asu.edu J. Lafond and H.-T. Wai have conribued equally. CMAP, Ecole Polyechnique, Palaiseau, France. eric.moulines@polyechnique.edu 1

2 funcion is rerieved a each round [ADX10] see secion. Our online FW algorihm is similar o he online projecion-free mehod proposed in [HK1], while he online AW algorihm is new. For online FW algorihms, [HK1] has proven a regre of O log T/T for convex and smooh sochasic coss. We improve he regre bound o Olog 3 T/T under wo differen ses of assumpions: a he sochasic cos is srongly convex, he opimal soluions lie in he inerior of C cf. H1, for online FW; b C is a polyope cf. H, for online AW. An improved anyime opimaliy bound of Olog T/T compared o O log T/T in [HK1] is also proven. We compare our resuls o he sae-of-he-ar in Table 1. Seings Regre bound Anyime bound Garber & Hazan, 015 [GH15b] Hazan & Kale, 01 [HK1] This work Hybrid algo., Lipschiz cvx. loss Hybrid algo., srong cvx. loss Simple algo., Lipschiz cvx. loss Simple algo., srong cvx. loss Simple algo., srong cvx. loss, inerior poin online FW Simple algo., srong cvx. loss, polyope cons. online AW O 1/T O log T/T Olog T/T Olog T/T O log T/T O log T/T O log T/T O log T/T Olog 3 T/T Olog T/T Olog 3 T/T Olog T/T Table 1: Convergence rae comparison. Noe ha he regre bound for [GH15b] is given under an adversarial loss seing, while he bounds for [HK1] and our work are based on a sochasic cos. Depending on he applicaions see Secion 5 & Appendix I, our regre and anyime bounds can be improved o Olog T/T and Olog T/T, respecively. Anoher ineresing discovery is ha he online FW/AW algorihms converge o a saionary poin even when he loss is non-convex, a a rae of O1/ T. To he bes of our knowledge, his is he firs convergence rae resul for non-convex online opimizaion wih projecion-free mehods. To suppor our claims, we perform numerical experimens on online marix compleion using realisic daase. The proposed online schemes ouperform a simple projeced gradien mehod in erms of running ime. The algorihm also demonsraes excellen performance for robus binary classificaion. Relaed Works. In addiion o he references menioned above, his work is relaed o he sudy of sochasic opimizaion, e.g., [GL15,NJLS09]. [GL15] describes a FW algorihm using sochasic approximaion and proves ha he opimaliy gap converges o zero almos surely; [NJLS09] analyses he sochasic projeced gradien mehod and proves ha he convergence rae is Olog / under srong convexiy and ha he opimal soluion lies in he inerior of C. This is similar o assumpion H1 in his paper. Lasly, mos recen works on non-convex opimizaion are based on he sochasic projeced gradien descen mehod [AZH16, GHJY15]. Projecion-free non-convex opimizaion has only been addressed by a few auhors [GL15,EV76]. A he ime when we finished wih he wriing, we noice ha several auhors have published aricles peraining o offline, non-convex FW algorihm, e.g., [LJ16] achieves he same convergence rae as ours wih an adapive sep size, [JLMZ16] considers a differen assumpion on he smoohness of loss funcion, [YZS14] has a slower convergence rae han ours. Neverheless, none of he above has considered an online opimizaion seing wih ime varying objecive like ours. Noaion. For any n N, le [n] denoe he se {1,, n}. The inner produc on a n dimensional

3 real Euclidian space E is denoed by, and he associaed Euclidian norm by. The space E is also equipped wih a norm and is dual norm. Diameer of he se C w.r.. is denoed by ρ, ha is ρ := sup θ,θ C θ θ. In addiion, we denoe he diameer of C w.r.. he Euclidean norm as ρ, i.e., ρ := sup θ,θ C θ θ. The ih elemen in a vecor x is denoed by [x] i. Problem Seup and Algorihms We use he seing inroduced in [HK1]. The online learner wans o minimize a loss funcion f which is he expecaion of empirical loss funcions f θ = fθ; ω, where ω is drawn i.i.d. from a fixed disribuion D: fθ := E ω D [fθ; ω]. The regre of a sequence of acions {θ } T =1 is : R T := T 1 T =1 fθ min θ C fθ. 1 Here, C is a bounded convex se included in E and f is a coninuously differeniable funcion. Our proposed algorihms assume he full informaion seing [ADX10] such ha upon playing θ, we receive full knowledge abou he loss funcion θ f θ. The choice of θ +1 will be based on he previously observed loss {f s θ}. Le γ 0, 1] be a sequence of decreasing sep size see secion 3, F T θ = T =1 f θ he aggregaed loss and F T θ be he gradien of F θ evaluaed a θ, we sudy wo online algorihms. Online Frank-Wolfe O-FW. The online FW algorihm, inroduced in [HK1], is a direc generalizaion of he classical FW algorihm, as summarized in Algorihm 1. I differs from he classical FW algorihm only in he sense ha he aggregaed gradien F θ = 1 f sθ is used for he linear opimizaion in Sep 4. See he commen in Remark 3 for he complexiy of calculaing he aggregaed gradien. Algorihm 1 Online Frank-Wolfe O-FW. 1: Iniialize: θ 1 0 : for = 1,... do 3: Play θ and receive θ f θ. 4: Solve he linear opimizaion: a arg min a C 5: Compue θ +1 θ + γ a θ. 6: end for a, F θ. Online away-sep Frank-Wolfe O-AW. The online counerpar of he away sep algorihm is given in Algorihm. By consrucion, he ierae θ is a convex combinaion of exreme poins of C, referred o as acive aoms. We denoe by A he se of acive aoms and denoe by α a he posiive weigh of any acive aom a A a ime, ha is: θ = a A α a a wih α a > 0. 4 A each round, wo ypes of sep migh be aken. If he condiion of line 5 in Algorihm is saisfied, we call he ieraion a FW sep, oherwise we call i an AW sep. When a FW sep is aken, a new aom a FW is seleced 3, he curren ierae θ is moved owards a FW and he acive se is updaed accordingly lines 6 and 15. The seleced aom is he exreme poin of C which is maximally correlaed o he negaive aggregaed gradien. Noe ha his sep is idenical o a usual O-FW ieraion. When an AW sep is aken, a currenly acive aom a AW is seleced 3 and he curren ierae is moved away from a AW line 8 and 15. The aom a AW is he acive aom which is he mos correlaed o he curren gradien approximaion. The inuiion is ha aking he away sep prevens he algorihm from following a zig-zag pah when θ is close o he boundary of C [Wol70]. Lasly, we noe ha he O-AW algorihm is similar o a classical AW algorihm [Wol70]. The excepion is ha a fixed sep size rule is adoped due o he online opimizaion seing. Remark 1. As he linear opimizaion 3 enumeraes over he acive aoms A a round, he O-AW algorihm is suiable when C is an aomic or polyope se, oherwise A may become oo large. 3

4 Algorihm Online away sep Frank-Wolfe O-AW. 1: Iniialize: n 0 = 0, θ 1 = 0, A 1 = ; : for = 1,... do 3: Play θ and receive he loss funcion θ f θ. 4: Solve he linear opimizaions wih he aggregaed gradien: a FW arg min a C a, F θ, a AW arg max a A a, F θ 3 5: if a FW θ, F θ θ a AW, F θ or A = hen 6: FW sep: d a FW θ, n n 1 + 1, ˆγ γ n and A +1 A {a FW }. 7: else 8: d θ a AW, γ max = α aaw /1 α aaw, cf. 4 for definiion of α aaw. 9: if γ max γ n 1 hen 10: AW sep: n n and ˆγ γ n 11: else 1: Drop sep: ˆγ γ max, n n 1 and A +1 A \ {a AW } 13: end if 14: end if 15: Compue θ +1 θ + ˆγ d. 16: end for Remark Linear Opimizaion.. The run-ime complexiy of he O-FW and O-AW algorihms depends on finding efficien soluion o he linear opimizaion sep. In many cases, his is exremely efficien. For example, when C is he race-norm ball, hen he linear opimizaion amouns o finding he op singular vecors of he gradien; see [Jag13] for an overview. Remark 3 Complexiy per ieraion.. In addiion o he linear opimizaion, boh O-FW/O-AW algorihms require he aggregae gradien F θ o be compued a each round, and he complexiy involved grows wih he round number. In cases when he loss f is he negaed log-likelihood of an exponenial family disribuion, he gradien aggregaion can be replaced by an efficien on-he-fly updae, whose complexiy is a dimension-dependen consan over he ieraions. As demonsraed in Secion 5 and Appendix I, his se-up covers many problems of ineres, among ohers he online marix compleion and online LASSO. 3 Main Resuls This secion presens he main resuls for he convergence of O-FW/O-AW algorihms. Noice ha our resuls for convex losses are based on an improved analysis on he sochasic/inexac invarian of FW/AW algorihms see Anyime Analysis in subsecion 3.1, while he resuls for non-convex losses are derived from a novel observaion on he dualiy gap for FW algorihms. Due o space consrains, only he main resuls are displayed. Deailed proofs can be found in he appendices. Some consans are defined as follows. A funcion f is said o be µ-srongly convex if, for all θ, θ E, We also say f is L-smooh if for all θ, θ E we ge Lasly, f is said o be G-Lipschiz if for all θ, θ E, fθ f θ + fθ, θ θ µ/ θ θ. 5 f θ fθ + fθ, θ θ + L/ θ θ. 6 fθ f θ G θ θ. 7 4

5 3.1 Convex Loss We analyze firs Algorihm 1 and Algorihm when he expeced loss funcion f is convex. In paricular, our analysis will depend on he following geomeric condiion of he consrain se C. Denoe by C he boundary se of C. For Algorihm 1, we consider H1. There is a minimizer θ of f ha lies in he inerior of C, i.e., δ := inf s C s θ > 0. While H1 appears o be resricive, for Algorihm, we can work wih a relaxed condiion: H. C is a polyope. As argued in [LJJ15], H implies ha he pyramidal widh for C, δ AW := P dirw C, is posiive; see he definiion in 9 of he appendix. Regre Analysis. Our main resul is summarized as follows. For 0, 1, Theorem 1. Consider O-FW resp. O-AW. Assume H1 resp. H, fθ is µ-srongly convex, fθ; ω is L-smooh for all ω drawn from D and each elemen of f θ is sub-gaussian wih parameer σ D. Se γ = / + 1. Wih probabiliy a leas 1 and for all 1, he anyime loss bounds hold: O-FW O-AW fθ min θ C fθ 3/σ grd ρ + L ρ /δ µ log logn/ 1, fθ min θ C fθ 5/3σ grd ρ + L ρ /δ AW µ log logn/ 1, 8 where σ grd = Omax{σ D, ρl} n. Consequenly, summing up he wo sides of 8 from = 1 o = T gives he regre bound for boh O-FW and O-AW: T 1 T =1 fθ min θ C fθ = Olog 3 T/T, T 1. 9 Proof. To prove Theorem 1, we firs upper bound he gradien error of F θ, i.e., Proposiion. Assume ha fθ; ω is L-smooh for all ω from D and each elemen of he vecor f θ is sub-gaussian wih parameer σ D. Wih probabiliy a leas 1, F θ fθ = Omax{σ D, ρl} n log logn//, This shows ha F θ is an inexac gradien of he sochasic objecive fθ a θ. Our proof is achieved by applying Theorem 3 see below by plugging in he appropriae consans. We noice ha for O-FW, [HK1] has proven a regre bound of O log T/T, which is obained by applying a uniform approximaion bound on he objecive value and proving a O1/ bound for he insananeous loss F θ F θ. In conras, Theorem 1 yields an improved regre by conrolling he gradien error direcly using Proposiion and analyzing O-FW/O-AW as an FW/AW algorihm wih inexac gradien in he following. Anyime Analysis. The regre analysis is derived from he following general resul for FW/AW algorihms wih sochasic/inexac gradiens. Le ˆ fθ be an esimae of fθ which saisfies: H3. For some α 0, 1], σ 0 and K Z +. Wih probabiliy a leas 1, we have ˆ fθ fθ ση /{K + 1} α, 1, 11 where η 1 is an increasing sequence such ha he righ hand side decreases o 0. This is a more general seing han is required for he analysis of O-FW/O-AW as σ, α, η are arbirary. The O-FW or O-AW wih he above inexac gradien has he following convergence rae: 5

6 Theorem 3. Consider he sequence {θ } =1 generaed by O-FW resp. O-AW wih he aggregaed gradien F θ replaced by ˆ fθ saisfying H3 wih K =. Assume H1 resp. H and ha fθ is L-smooh, µ-srongly convex. Se γ = / + 1. Wih probabiliy a leas 1 and for all 1, we have O-FW O-AW fθ min θ C fθ max{3/ α, 1 + α/ α}σρ + L ρ /δ µ η / + 1 α, fθ min θ C fθ max{3/ α, 1 + α/ α}σρ + L ρ /δ AW µ η / + 1 α, 1 When α = 0.5, Theorem 3 improves he previous known bound of fθ min θ C fθ = O η / in [FG13, Jag13] under srong convexiy and H1 or H. I also maches he informaion-heoreical lower bound for srongly convex sochasic opimizaion in [RR11] up o a log facor. Moreover, for O-AW, he srong convexiy requiremen on f can be relaxed; see Appendix G. 3. Non-convex Loss Define respecively he dualiy gaps for O-FW and O-AW as g FW := F θ, θ a, g AW := F θ, a AW a FW, 13, a FW are defined in 3 of Algorihm. Using he = 0, hen θ is a saionary poin o he opimizaion problem min θ C F θ. Therefore, where a is defined in line 4 of Algorihm 1 and a AW definiion of a, if g FW g FW and similarly g AW can be seen as a measure o he saionariy of he poin θ o he online opimizaion problem. We analyze he convergence of O-FW/O-AW for general Lipschiz and smooh possibly non-convex loss funcion using he dualiy gaps defined above. To do so, we depar from he usual inducion based proof echnique e.g., in he previous secion or [Jag13, HK1]. Insead, our mehod of proof amouns o relae he dualiy gaps wih a learning rae conrolled by he sep size rule on γ. The main resul can be found below: Theorem 4. Consider O-FW and O-AW. Assume ha each of he loss funcion f is G-Lipschiz, L-smooh. Seing he sep size sequence as γ = α wih α [0.5, 1. We have min [T/+1,T ] gfw 1 α4gρ + L ρ /1 /3 1 α 1 T 1 α, T 6, min [T/+1,T ] gaw 1 α4gρ + L ρ 1 4/5 1 α 1 T 1 α, T Noice ha he above resul is deerminisic cf. he definiion of g FW, g AW and also works wih nonsochasic, non-convex losses. The above guaranees an O1/T 1 α rae for O-FW/O-AW a a cerain round wihin he inerval [T/ + 1, T ]. Unlike he regre/anyime analysis done previously, our bounds are saed wih respec o he bes dualiy gap aained wihin an inerval from = T/ + 1 o = T. This is a common arifac when analyzing he dualiy gap of FW [Jag13]. Furhermore, we can show ha: Proposiion 5. Consider O-FW or O-AW, assume ha each of f is G-Lipschiz, L-smooh and each of f θ is sub-gaussian wih parameer σ D. Se he sep size sequence as γ = α wih α [0.5, 1. Wih probabiliy a leas 1 and for T 0, here exiss [T/ + 1, T ] such ha max fθ, θ θ = O max { 1/T 1 α, log T/T }. 15 θ C The proposiion indicaes ha he ierae θ a round [T/ + 1, T ] is an O max { 1/T 1 α, log T/T } - saionary poin o he sochasic opimizaion min θ C fθ. Our proof relies on Theorem 4 and a uniform approximaion bound resul for F θ. 6

7 4 Skech of he Proof of Theorem 3 To provide some insighs, we presen he main ideas behind he proof of Theorem 3. To simplify he discussion we only consider O-FW, K = 1, η = 1 and α = 0.5 in H3. The full proof can be found in he supplemenary maerial. Since f is L-smooh and C has a diameer of ρ, we have fθ +1 fθ + γ fθ, a θ + γ L ρ / If we define := ˆ fθ fθ, and subrac fθ on boh sides, applying Cauchy Schwarz yields Observe ha as h, g FW zero. h +1 h γ g FW + γ L ρ / + γ ρ. 16 0, he dualiy gap erm g FW In fac, when f is convex, one can prove g FW leas 1, we have deermines he convergence rae of he sequence h o h ρ. By he assumpion H3, wih probabiliy a h +1 h γ h + γ L ρ / + γ ρσ/ = 1 γ h + O 1.5. Seing γ = 1/ and a simple inducion on he above inequaliy proves h = O1/. An imporan consequence of H1 is ha he laer leads o a igher lower bound on g FW. As we presen in Lemma 6 in Appendix B, under H1 and when f is µ-srongly convex, we can lower bound g FW as g FW max{0, δ µh ρ }. Noe ha h converges o zero and he above lower bound on g FW evenually will become igher han he previous one, i.e., g FW δ µh ρ h ρ. This leads o he acceleraed convergence of h. More formally, plugging he lower bound ino 16 gives h +1 h γ δ µh + γ L ρ / + γ ρσ/. Again, seing γ = 1/ and a carefully execued inducion argumen shows h = O1/. The same line of argumens is also used o prove he convergence rae of O-AW, where H will be required insead of H1 o provide a similarly igh lower bound o g AW. 5 Numerical Experimens We conduc numerical experimens o demonsrae he pracical performance of he online algorihms. An addiional experimen for online LASSO wih O-AW can be found in he appendix. 5.1 Example: Online marix compleion MC Consider he following seing: we are sequenially given observaions in he form k, l, Y, wih k, l [m 1 ] [m ] and Y R. The observaions are assumed o be i.i.d. To define he loss funcion, he condiional disribuion of Y w.r.. he sampling is paramerized by an unknown marix θ R m1 m and supposed o belong o he exponenial family, i.e., p θy k, l := my exp Y θk,l A θ k,l, 17 where m and A are he base measure and log-pariion funcions, respecively. A naural choice for he loss funcion a round is obained by aking he logarihm of he poserior, i.e., f θ := Aθ k,l Y θ k,l. 7

8 Primal opimaliy O-FW O-PG Bach FW Dualiy MSE O-FW O-PG Bach FW Dualiy gap MSE O-FW Bach FW Dualiy gap Ieraion number Ieraion number Ieraion number Primal opimaliy O-FW O-PG Bach FW Dualiy MSE O-FW O-PG Bach FW Acive ALT Dualiy gap MSE O-FW Bach FW Acive ALT Dualiy gap Time s Time in sec Time in sec. Figure 1: Online MC performance. Lef synheic wih bach size B = 1000; Middle movielens100k wih B = 80; Righ movielens0m wih B = Top objecive value/mse agains round number; Boom agains execuion ime. The dualiy gap g FW for O-FW is ploed in purple. Our goal is o minimize he regre wih a penaly favoring low rank soluions ] C := {θ R m1 m : θ σ,1 R}, and he sochasic cos associaed is fθ := E θ[ Aθk1,l 1 Y 1 θ k1,l 1. Noe ha he aggregaed gradien F θ = 1 f sθ can be expressed as: [ F θ ] = k,l 1 A [θ ] k,l [ e ] k s e 1[ ls k,l Y se ks e ls ]k,l, k, l [m 1] [m ], wih {e k } m1 k=1 resp. {e l }m l=1 he canonical basis of of resp. R Rm1 m. We observe ha he wo marices e k s e ls and Y se ks e ls can be compued on-he-fly as he running sum. The wo marices can also be sored efficienly in he memory as hey are a mos -sparse. The per ieraion complexiy is upper bounded by Omin{m 1 m, T }, where T is he oal number of observaions. We observe ha for online MC, a beer anyime/regre bound han he general case analyzed in Secion 3 can be achieved. In paricular, Appendix H shows ha F θ fθ σ, = O log /. As such, he online gradien saisfies H3 wih η = Olog and α = 0.5. Moreover, fθ is srongly convex if A θ µ. For example, his holds for square loss funcion. Now if H1 is also saisfied, repeaing he analysis in Secion 3 yields an anyime and regre bound of Olog / and Olog T/T, respecively. We es our online MC algorihm on a small synheically generaed daase, where θ is a rank-0, marix wih Gaussian singular vecors. There are 10 6 observaions wih Gaussian noise of variance 3. Also, we es wih wo daase movielens100k, movielens0m from [HK15], which conains 10 5, 10 7 movie raings from 943, users on 168, 6744 movies, respecively. We assume Gaussian observaion and he loss funcion f is designed as he square loss. Resuls. We compare O-FW o a simple online projeced-gradien O-PG mehod. The sep size for O-FW is se as γ = /1 +. For he movielens daases, he parameer θ is unknown, herefore we spli he daase ino raining 80% and esing 0% se and evaluae he mean square error on he es se. Radiuses of C R are se as R = 1.1 θ σ,1 synheic, R = movielens100k and R = movielens0m. Noe ha H1 is saisfied by he synheic case. The resuls are shown in Figure 1. For he synheic daa, we observe ha he sochasic objecive of O-FW decreases a a rae O1/, as prediced in our analysis. Significan complexiy reducion compared o O-PG for synheic and movielens100k daases are also observed. The running ime is faser han he bach FW wih line searched sep size on movielens0m, which we suspec is caused by he simpler linear opimizaion solved a he algorihm iniializaion by O-FW 1 ; and is also comparable o a sae-of-he-ar, 1 This operaion amouns o finding he op singular vecors of F θ, whose complexiy grows linearly wih he number of non-zeros in F θ. 8

9 Round number Dualiy gap Error rae Round number Tes Logisic loss Tes Sigmoid loss, sigma= Round number Tes Logisic loss Tes Sigmoid loss, sigma= Dualiy gap Tes Logisic loss Tes Sigmoid loss, sigma= Train Logisic loss Train Sigmoid loss, sigma= Error rae Round number Tes Logisic loss Tes Sigmoid loss, sigma= Train Logisic loss Train Sigmoid loss, sigma= Dualiy gap Dualiy gap Dualiy gap 0.04 Error rae Dualiy gap 10- Tes Logisic loss Tes Sigmoid loss, sigma= Train Logisic loss Train Sigmoid loss, sigma= Error rae Error rae Tes Logisic loss Tes Sigmoid loss, sigma= Train Logisic loss Train Sigmoid loss, sigma= Error rae Round number Round number Figure : Binary classificaion performance agains round number for: Lef synheic daa; Middle mnis class 1 ; Righ rcv1.binary. Top wih no flip Boom wih 5% flip in he raining labels. The dualiy gap gfw for O-FW wih sigmoid loss is ploed in purple. specialized bach algorihm for MC problems in [HO14] acive ALT and achieves he same MSE level, even hough he daa are acquired in an online fashion in O-FW. 5. Example: Robus Binary Classificaion wih Ouliers Consider he following online learning seing: he raining daa is given sequenially in he form of y, x, where y {±1} is a binary label and x Rn is a feaure vecor. Our goal is o rain a classifier θ Rn such ha for an arbirary feaure vecor x i assigns y = signhθ, x i. The daase may someimes be conaminaed by wrong labels. As a remedy, we design a sigmoid loss funcion f θ := 1 + exp10 y hθ, x i 1 ha approximaes he 0/1 loss funcion [SSSS11, EBG11]. Noe ha f θ is smooh and Lipschiz, bu no convex. For C, we consider he `1 ball C`1 = {θ Rn : kθk1 r} when a sparse classifier is preferred; or he race-norm ball Cσ = {θ Rm1 m : kθkσ,1 R}, where n = m1 m, when a low rank classifier is preferred. We evaluae he performance of our online classifier on synheic and real daa. For he synheic daa, he rue classifier θ is a rank-10, Gaussian marix. Each feaure x is a Gaussian marix. We have uples of daa for raining esing. We also es he classifier on he mnis classifying 1 from he res of he digis, rcv1.binary daase from LIBSVM [CL11]. The feaure dimensions are 784, 4736, and here are and daa uples for raining esing, respecively. We arificially and randomly flip 0%, 5% labels in he raining se. Resuls. As benchmark, we compare wih he logisic loss funcion, i.e., f θ = log1 + exp y hθ, x i. We apply O-FW wih a learning rae of α = 0.75 for boh loss funcions, i.e., γ = 1/0.75. For he synheic daa and mnis, he sigmoid logisic loss classifier is rained wih a race norm ball consrain of R = 1 R = 10. Each round is fed wih a bach of B = 10 uples of daa. For rcv1.binary, we rain he classifiers wih `1 -ball consrain of r = 100 r = 1000 for sigmoid logisic loss. Each round is fed wih a bach of B = 5 uples of daa. As seen in Figure, he logisic loss and sigmoid loss performs similarly when here are no flip in he labels; and he sigmoid loss demonsraes beer classificaion performance when some of he labels are flipped. Lasly, he dualiy gap of O-FW applied o he non-convex loss decays gradually wih, indicaing ha he algorihm converges o a saionary poin. 9

10 A Proof of Proposiion The following proof is an applicaion of a modified version of [SSSSS09, Theorem 5]. Le us define θ = F θ fθ = 1 fθ; ω s E ω D [ fθ; ω]. 18 From [Gau05], for some sufficienly small > 0, here exiss a Euclidean -ne, N, wih cardinaliy bounded by ρ n N = O n logn. 19 In paricular, for any θ C here is a poin p θ N /L such ha p θ θ /L. This implies: θ p θ + p θ θ p θ + F θ F p θ + fθ fp θ p θ + F θ F p θ + fθ fp θ p θ + L θ p θ p θ +, where we used he L-smoohness of F θ and fθ for he second las inequaliy. Applying he union bound and conrolling each poin p θ N /L using he sub-gaussian assumpion yields: P sup θ C θ > s P p θ N /L { } s p θ > s N /L n exp O n 3 logn n L ρ exp σ D s σ D. Seing s = 3 in he above, i can be verified ha he following holds wih probabiliy a leas 1 δ n log logn/δ θ = O max{l ρ, σ D }. 0 Applying anoher union bound over 1 e.g., by seing δ = / hen yields he desired resul. B Proof of Theorem 3 We define h := fθ min θ C fθ in he following. The analysis below is done by assuming a more general sep size rule γ = K/K + 1 wih some K Z +. Firs of all, we noice ha for boh Algorihm 1 and Algorihm wih he sep size rule γ = K/K + 1, we have γ 1 = 1 and hus h 1 = fa 1 fθ <. For, we have he following convergence resuls for FW/AW algorihms wih inexac gradiens. As explained in he proof skech, le us sae he following lemma which is borrowed from [LJJ13, LJJ15]. Lemma 6. [LJJ13, LJJ15] Assume H1 and ha f is L-smooh and µ-srongly convex, hen max θ C fθ, θ θ µδ h and L ρ µδ. 1 Consider Algorihm, assume H and ha f is L-smooh and µ-srongly convex, hen max θ A fθ, θ min θ C fθ, θ µδ AW h and L ρ µδ AW. Noe ha [SSSSS09, Theorem 5] assumed implicily ha fθ; ω s is bounded for all ω s, which can be generalized by our assumpion ha fθ; ω s is sub-gaussian. 10

11 The above lemma is a key resul ha leads o he linear convergence of he classical FW/AW algorihms wih adapive sep sizes, as sudied in [LJJ13, LJJ15]. Lemma 6 enables us o prove he heorems below for he FW/AW algorihms wih inexac gradien and fixed sep sizes, whose proof can be founded in Appendix E and F: Theorem 7. Consider Algorihm 1 wih he assumpions given in Theorem 3. The following holds wih probabiliy a leas 1 : fθ fθ η α D 1,, 3 + K 1 where β = 1 + α/k α and { K + 1 α, } D 1 = max 4 β ρσ + KL ρ / K δ. µ The anyime bound for Algorihm 1 is obvious from he above Theorem. Theorem 8. Consider Algorihm wih he assumpions given in Theorem 3. The following holds wih probabiliy a leas 1 : fθ fθ η α, D,, 4 n 1 + K where n is he number of non-drop seps see Algorihm up o ieraion, β = 1 + α/k α and D = max { K + 1 α, } β ρσ + KL ρ / K δ AW. µ In addiion, we have he following Lemma for Algorihm. Lemma 9. Consider Algorihm. We have n / for all, where n is he number of non-drop seps aken unil round. Proof. Excep a iniializaion, he acive se is never empy. Indeed, if here is only one acive aom lef, hen is weigh is 1. Therfore he condiion of line 9 is saisfied and he aom canno be dropped. Denoe by q he number of ieraions where an aom was dropped up o ime line 1. As noed above, n + q = holds. Since o be dropped, an aom needs o be added o he acive se A firs, q / also holds, yielding he resul. Combining Theorem 8 and he above lemma, we ge he desirable anyime bound for Algorihm. B.1 Proof of Lemma 6 We firs prove he firs par of he lemma, i.e., 1, peraining o he O-FW algorihm. Le s C be a poin on he boundary of C such ha i is co-linear wih θ and θ. Moreover, we defin g := max θ C fθ, θ θ. As θ inc, we can wrie From he µ-srong convexiy of f, we have θ = θ + γ s θ for some γ [0, 1. 5 µ θ θ fθ fθ fθ, θ θ = h + γ fθ, θ s h + γg, 6 where he las inequaliy is due o he definiion of g. Now, he lef hand side of he inequaliy above can be bounded as µ θ θ = γ µ s θ γ µ s θ γ δ µ 7 11

12 Combining he wo inequaliies above yields h γg γ δ µ g δ µ, 8 where he upper bound is achieved by seing γ = g /δ µ. Recalling he definiion of g concludes he proof of he firs par. Lasly, we noe by combining Eq., Remark 1 and Lemma in [LJJ13], we have L ρ µδ. Nex, we prove he second par of he lemma, i.e.,, peraining o he O-AW algorihm. Recall ha as C is a polyope, we can wrie C = conva where A is a finie se of aoms in R n, i.e., C is a convex hull of A. Noe ha A A for all in he O-AW algorihm. Le us define he pyramidal widh δ AW of C as: 1 δ AW := inf inf max d, y K facesc,θ K,d conec θ\{0} A A θ d min d, y, 9 y A {ak,d} y A {ak,d} where A θ := {A : A A such ha θ conva and θ is a proper convex combinaion of A } and ak, d := arg max v K v, d. Now, define he quaniies: γ A θ, θ := fθ, θ θ fθ, v f θ s f θ, 30 where v f θ := arg min a Aθ fθ, a and s f θ := arg min a A fθ, a. From [LJJ15, Theorem 6], i can be verified ha µ δaw inf inf fθ θ C θ C,s.. fθ,θ θ <0 γ A θ, θ fθ fθ, θ θ, 31 In he above, we have denoed Aθ := {v = v A θ : A A θ } where v A θ := arg max a A fθ, a. We remark ha Aθ A. Noe ha γ A θ, θ > 0 as long as fθ, θ θ < 0 is saisfied. Assume θ θ and observe ha we have fθ, θ θ < 0, Eq. 31 implies ha γ A θ, θ µδaw fθ fθ fθ, θ θ = h + γ A θ, θ fθ, v f θ s f θ, 3 where he equaliy is found using he definiion of γ A θ, θ. min θ C fθ, θ and observe ha Define g AW := max θ A fθ, θ fθ, s f θ = min θ C fθ, θ and fθ, v f θ max θ A fθ, θ. 33 Plugging he above ino 3 yields h γa θ, θ µδaw + γ A θ, θ g AW gaw δaw µ, 34 where we have se γ A θ, θ = g AW /δaw µ similar o he firs par of his proof. This concludes he proof for he lower bound on g AW. Lasly, i follows from Remark 7, Eq. 0 and Theorem 6 of [LJJ15] ha µδaw L ρ. C Proof of Theorem 4 In he following, we denoe he minimum loss acion a round as θ arg min θ C F θ. Noice ha F θ may be non-convex. Observe ha for O-FW: F θ +1 F θ + γ F θ, a θ + 1 γ L ρ = F θ γ g FW + 1 γ L ρ, 35 1

13 where he firs inequaliy is due o he fac ha f is L-smooh and C has a diameer of ρ. Define := F θ F θ o be he insananeous loss a round recall ha θ arg min θ C F θ. We have +1 = F θ +1 F θ f +1θ +1 f +1 θ+1 36 Noe ha he firs par of he righ hand side of 36 can be upper bounded as F θ +1 F θ +1 F θ +1 F θ γ g FW + 1 γ L ρ, 37 where he firs inequaliy is due o θ +1 C and he opimaliy of θ and he second inequaliy is due o he L-smoohness of F. Combining 36 and 37 gives γ g FW + 1 γ g FW γ L ρ / f +1θ +1 f +1 θ γ L ρ f +1θ +1 f +1 θ Using he definiion of +1, we noe ha +1 1 f +1 θ +1 f +1 θ = /+1F θ +1 F θ +1. Therefore, simplifying erms give Observe ha: T =T/+1 γ g FW F θ +1 F θ +1 + γ L ρ /. 38 F θ +1 F θ+1 = T =T/+1 F θ F θ +1 F θ F θ+1 = F T θ T +1 + F T/+1 θ T/+1 F T/+1 θt/+1 + F T θt +1 + T =T/+ f 1 θ F 1 θ f θ F 1 θ G θ T +1 θt +1 + θ T/+1 θt/+1 + T =T/+ 1 θ θ ρg 1 + T =T/+ 1 ρg 1 + log 4ρG. where we have used he fac ha F θ F 1 θ = 1 f θ F 1 θ in he firs equaliy and ha f, F are G-Lipschiz in he second inequaliy. We noice ha T =T/+1 γ = T =T/+1 α log 1 as α [0.5, 1]. Summing up boh side of he inequaliy 38 gives T =T/+1 γ T =T/+1 γ g 4ρG + L ρ /, 39 min [T/+1,T ] gfw where he inequaliy o he lef is due o γ, g FW 0. Observe ha for all T 6, T =T/+1 γ = T =T/+1 α T 1 α 1 α 1 1 α 3 = Ω T 1 α. We conclude ha min [T/+1,T ] gfw 1 α T 1 α 4ρG + L ρ / 1 For he O-AW algorihm, we observe ha 1 α 1 = O1/T 1 α F θ +1 F θ + ˆγ F θ, d + 1 ˆγ L ρ 41 13

14 Noe ha by consrucion, F θ, d = min{ F θ, a FW θ, F θ, θ a AW }. Using he inequaliy min{a, b} 1/a + b, we have F θ +1 F θ + ˆγ F θ, 1 a FW Proceeding in a similar manner o he proof for O-FW above, we ge a AW + 1 ˆγ L ρ = F θ 1 ˆγ g AW + 1 ˆγ L ρ. 4 1 ˆγ g AW F θ +1 F θ ˆγ L ρ. 43 The only difference from 38 in he O-FW analysis are he erms ha depend on he acual sep size ˆγ. Now, Lemma 9 implies ha a leas T/4 non-drop seps could have aken unil round T/, herefore we have ˆγ γ T/4 for all [T/ + 1, T ] since if a non-drop sep is aken, hen he sep size will decrease; or if a drop-sep sep is aken, we have ˆγ γ n 1 and n 1 T/4. Therefore, 1 T =T/+1 ˆγ L ρ T 4 L ρ T α L ρ. 4 Summing he righ hand side of 43 from = T/ + 1 o = T yields an upper bound of 4ρG + L ρ. On he oher hand, define T non-drop be a subse of [T/ + 1, T ] where a non-drop sep is aken. We have T =T/+1 ˆγ T non-drop γ n T =3T/4+1 γ T 1 α 4 1 α 1 = ΩT 1 α 5 1 α, where he second inequaliy is due o he fac ha T non-drop T/4 and he las inequaliy holds for all T 0. Finally, summing he lef hand side of 43 from = T/ + 1 o = T yields min [T/+1,T ] gaw T =T/+1 Therefore, we conclude ha min [T/+1,T ] g AW D Proof of Proposiion 5 ˆγ T =T/+1 ˆγ g AW 4ρG + L ρ. = O1/T 1 α for he O-AW algorihm. We firs look a he O-FW algorihm. Our goal is o bound he following inner produc max fθ, θ θ, θ C where [T/ + 1, T ] is he round index ha saisfies g FW = O1/T 1 α, which exiss due o Theorem 4. For all θ C, observe ha fθ, θ θ F θ, θ θ + fθ F θ, θ θ g FW + ρ fθ F θ. Following he same line of analysis as Proposiion, wih probabiliy a leas 1, i holds ha n log logn/ fθ F θ = O max{σ D, ρl}, 45 which is obained from 0. Noe ha compared o Proposiion, we save a facor of log inside he square roo as he ieraion insance is fixed. Using he fac ha T/ + 1, he following holds wih probabiliy a leas 1, fθ, θ θ = O max { 1/T 1 α, log T/T }, θ C

15 For he O-AW algorihm, we observe ha he inequaliy 4 in Appendix C can be replaced by F θ +1 F θ ˆγ F θ, θ a FW + 1 ˆγ L ρ. Furhermore, we can show ha he inner produc F θ, θ a FW decays a he rae of O1/T 1 α by replacing g AW in he proof in Appendix C wih his inner produc. Consequenly, 44 holds for he θ generaed by O-AW, i.e., Applying 45 yields our resul. E Proof of Theorem 7 fθ, θ θ F θ, θ a FW + ρ fθ F θ This secion esablishes a Oη / + K 1 α bound for h for Algorihm 1 wih inexac gradiens, i.e., replacing F θ by ˆ fθ saisfying H3, under he assumpion ha fθ is L-smooh, µ-srongly convex and γ = K/K + 1. Define = ˆ fθ fθ, g = max s C θ s, fθ as he dualiy gap a θ. Noice ha 1 in Lemma 6 implies: g µδ h, 46 Define s arg max s C θ s, fθ. We noe ha fθ, a θ ˆ fθ, s θ, a θ = fθ, s θ +, s a g + ρ δ µh + ρ, 47 where he las line follows from 46. Combining he L-smoohness of fθ and 47 yield he following wih probabiliy a leas 1 and for all 1, h +1 h h γ δ η α µ + γ ρσ K 1 γ L ρ. 48 Le us recall he definiion of D 1 D 1 = max{4k + 1/K α, β }ρσ + KL ρ / /δ µ wih β = 1 + α/k α, 49 and proceed by inducion. Suppose ha h D 1 η / + K 1 α for some 1. There are wo cases. Case 1 h γ δ µh 0: Then since γ = K/K + 1, 48 yields η +1 α η h +1 ρσk α K α + L ρ K K + 1 ρσk + L ρ K / K + 1 α K + 1 α η ρσk + L ρ K α / +1, K K + where we used ha η is increasing and larger han 1. To conclude, one jus needs o check ha K + 1 α ρσk + L ρ K / D1. 50 K Noe ha we have K + 1 αρσ D 1 + L ρ K/ 4 ρσ + L ρ K/ K + 1 αρσ K µδ + L ρ K/ K, 51 K 15

16 where he las inequaliy is due o L ρ δ µ from Lemma 6. Hence, Case h γ δ µh > 0: By inducion hypohesis and 48, we have h +1 D 1 η +1/K + α. α η h +1 D +1 1 K + η α η D 1 +1 α η + α K K + 1 K + K α ρσ + L ρ K/ δ µd 1 η α η [αd α K α 1 + Kρσ + K L ρ / δk ] µd 1 + K 1 η α η [αd α K α 1 + Kρσ + K L ρ /1 β] + K 1 5 where we used he fac ha i η is increasing and larger han 1, ii 1 and iii 1/K + 1 α 1/K + α α/k α in he second las inequaliy; and we have used he definiion of D 1 in he las inequaliy. Define η α 0 := inf{ 1 : αd 1 + Kρσ + K L ρ /1 β 0} K 1 Since η/k + 1 is monoonically decreasing o 0 and β > 1, 0 exiss. Clearly, for any > 0 he RHS is non-posiive. For 0, we have Kρσ + K L ρ η α /β 1 αd K 1 i.e., D 0 K αβ 1 αd 1 η + K 1 α 55 Hence by he definiion ha β = 1 + α/k α and applying Theorem 10 see Secion E.1 we ge: η α η h D 0 D 1 + K 1 + K 1 The iniializaion is easily verified as he firs inequaliy holds rue for all. E.1 Proof of Theorem 10 Theorem 10. Consider Algorihm 1 and assume H3 and ha fθ is convex and L-smooh. Then, he following holds wih probabiliy a leas 1 : fθ fθ η α D 0,, 56 + K 1 α where D 0 = K L ρ / + ρσk K α

17 Le us define h = fθ fθ, hen we ge On he oher hand, he following also hods: h +1 h + γ fθ, a θ + 1 γ L ρ. 58 fθ, a θ = ˆ fθ, a θ, a θ, ˆ fθ, θ θ, a θ = fθ, θ θ +, θ a h + ρ. 59 where he second line follows from he definiion of a and he las inequaliy is due o he convexiy of f and he definiion of he diameer. Plugging 59 ino 58 and using H3 yields he following wih probabiliy a leas 1 and for all 1 η h +1 1 γ h + γ ρσ K + 1 α + 1 γ L ρ. 60 We now proceed by inducion o prove he firs bound of he Theorem. Define D 0 = K L ρ / + ρσk/k α. The iniializaion is done by applying 60 wih = 1 and noing ha K 1. Assume ha h D 0 η/k + 1 α for some 1. Since γ = K/ + K 1, from 60 we ge: η α h +1 D K + η α η D 0 +1 α + K L ρ / + ρσkη α D 0 Kη α + K 1 + K + K 1 1+α η α D 0 + K 1 α D 0 + K α + K L ρ / + ρσk D 0 K + K 1 1+α η α + K 1 1+α α KD 0 + K L ρ / + ρσk 0, where we used he fac ha η is increasing and larger ha 1 for he second inequaliy and 1/ + K 1 α 1/ + K α α/ + K 1 1+α for he hird inequaliy. The inducion argumen is now compleed. F Proof of Theorem 8 This secion esablishes a Oη/n 1 + K α bound for h for Algorihm wih inexac gradiens, i.e., replacing F θ by ˆ fθ saisfying H3, under he assumpion ha fθ is L-smooh, µ-srongly convex and γ = K/K + 1. Ouline of he proof. Here, our sraegy parallels ha of Appendix E. We firs show ha he slow convergence rae of Oη/n 1 + K α holds for Algorihm Theorem 11. The fas convergence rae of Oη/n 1 + K α is hen esablished using inducion. We have o pay special aenion o he case when a drop sep is aken line 13 of Algorihm. In paricular, when a drop sep is aken, he inducion sep is done by Lemma 1; for oherwise, we apply similar argumens in Appendix E o proceed wih he inducion. To begin our proof, le us define = ˆ fθ fθ, b FW := arg min b C b, fθ, b AW := arg max b A b, fθ, ḡ AW := fθ, b AW b FW. We remark ha b AW a AW and b FW a FW as hey are evaluaed on he rue gradien fθ. 17

18 Recall ha in Algorihm, we choose d such ha ˆ fθ, d = min{ ˆ fθ, a FW θ, ˆ fθ, θ }. Therefore, for : a AW ˆ fθ, d ˆ fθ, afw = fθ, bfw a AW where he second inequaliy is due o he definiions of a FW As f is L-smooh, he following holds, ˆ fθ, bfw b AW +, bfw and a AW ˆ fθ ḡaw, d +, bfw b AW b AW in 3. Hence: b AW 6 fθ +1 fθ + ˆγ fθ, d + L ρ ˆγ 63 = fθ + ˆγ ˆ fθ, d, d + ˆγ L ρ ḡ AW fθ ˆγ + ˆγ, bfw b AW d + ˆγ L ρ where we used 6 for he las line. Subracing fθ on boh sides and applying H3 yield ḡ AW h +1 h ˆγ + ˆγ η α ρσ + ˆγ L ρ K + 1, 64 where we have used b FW b AW / d ρ. We firs esablish a slow convergence rae of O-AW algorihm. Define D = K K α KL ρ / + ρσ. 65 Theorem 11. Consider Algorihm. Assume H3 and ha fθ is convex and L-smooh, he following holds wih probabliy 1 : h := fθ fθ D η α, 66 n 1 + K for all. Here D is given in 65. Proof. See subsecion F.1. Le us recall he definiion of D D = max{k + 1/K α, β }ρσ + KL ρ / /δ AWµ wih β = 1 + α/k α. To prove Theorem 8, we proceed by inducion and assume ha for some, h D η/k + n 1 α holds. Noice ha in Lemma 6 gives: µδaw h, 67 ḡ AW Now, suppose ha h > 0 h = 0 is discussed a he end of he proof. Combining 64 and 67 gives: µh h +1 h ˆγ δ AW + ˆγ η α ρσ + ˆγ L ρ n 1 + K. 68 We have used he fac ha 1 n 1. Consider wo differen cases. If a drop sep is aken a ieraion + 1, he inducion sep can be done by he following: 18

19 Lemma 1. Suppose ha h D η/k + n 1 α and ha a drop sep is aken a ieraion + 1 see Algorihm line 1, hen η α h +1 D +1, 69 K + n noe ha n = n 1 when a drop sep is aken. Proof. See subsecion F.. The above lemma shows ha he objecive value does no increase when a drop sep is aken. On he oher hand, when a drop sep is no aken a ieraion + 1, hen from Algorihm, we have ˆγ = γ n = K/K + n 1 and n = n We consider he following wo cases: Case 1: If h ˆγ δ AW µh 0. Then, since ˆγ = K/K + n 1 and n, 68 yields η h +1 ρσk α K + n 1 1+α + L ρ K K + n 1 70 η +1 α ρσk + L ρ K / K + n 1 α K + 1 α η ρσk + L ρ K α / +1, K K + n where we used ha η is increasing and larger han 1. To conclude, one jus needs o check ha K + 1 α ρσk + L ρ K / D. 71 K Noe ha we have K + 1 αρσ D + L ρ K/ ρσ + L ρ K/ K + 1 αρσ + L ρ K/ K, K K µδ AW where he las inequaliy is due o L ρ δaw µ from Lemma 6. Hence, Case : Assume h ˆγ δ AW µh > 0. By inducion and 68, we have h +1 D η +1/K + n α. η α h +1 D +1 K + n η α η α D +1 η + α K µd K + n 1 K + n n + K 1 1+α ρσ + C f K/ δ AW η α [ η α αd K + n 1 1+α + Kρσ + K L ρ µd ] / δ AWK n + K 1 η α [ η α αd K + n 1 1+α + Kρσ + K L ρ /1 β ] 7 n + K 1 where we used he fac ha i η is increasing and larger han 1, ii 1 and iii 1/K + 1 α 1/K + α α/k α in he second las inequaliy; and we have used he definiion of D in he las inequaliy. Define η α 0 := inf{ 1 : αd + Kρσ + KL ρ /1 β 0}. 73 n + K 1 19

20 Since η/k + n 1 decreases o 0 see H3 and Lemma 9, 0 exiss. Clearly, for any > 0 he RHS is non-posiive. For 0, we have Kρσ + KL ρ η α /β 1 αd 74 n + K 1 implying D K αβ 1 αd η n + K 1 α 75 Since β = 1 + α/k α, he lef hand side of 75 equals αd and we conclude ha D D η/n + K 1 α. Applying Theorem 11 we ge: h D η α η α D n + K 1 n + K 1 The inducion sep is compleed by observing ha n 1 = n 1. The iniializaion is easily verified for =. If h = 0, hen by Lemma 6 yields g AW = 0 and he inducion is reaed as Case 1. F.1 Proof of Theorem 11 We proceed by inducion and assume for some > 0 ha h D η /n 1 + K α holds. Firs of all, observe ha from he L-smoohness of fθ, Moreover, we have: h +1 h + ˆγ fθ, d + 1 ˆγ L ρ. 76 fθ, d = ˆ fθ, d, d ˆ fθ, a FW θ, d ˆ fθ, θ θ, d = fθ, θ θ +, θ θ d h + ρ 77 where we used he condiion of line 5 Algorihm in he firs inequaliy and he fac θ θ d ρ in he las inequaliy. This gives η α 1 h +1 1 ˆγ h + ˆγ ρσ + K + n 1 ˆγ L ρ, 78 where we have used H3 and he fac ha n 1 1. Consider he wo cases: if a drop sep line 1 is aken a ieraion + 1, he following resul ha is analogous o Lemma 1 gives he inducion. Lemma 13. Suppose ha h D η /K + n 1 α for α 0, 1], and ha a drop sep is aken a ime + 1 see Algorihm line 1, hen η α Proof. See subsecion F.3. h +1 D K + n On he oher hand, if a drop sep is no aken, noice ha we will have ˆγ = γ n = K/K + n 1 and n = n Consequenly, he same inducion argumen in subsecion E.1 replacing by n and consider h +1 D η +1/K + n α shows: η h +1 D α K + n The iniializaion of he inducion is easily checked for =. 0

21 F. Proof of Lemma 1 Since ieraion + 1 is a drop sep, we have by consrucion Algorihm line 1 ˆγ = γ max K K + n and n = n 1. From 68 and he assumpion in he lemma, we consider wo cases: if h ˆγ µδ AW / 0, hen we have η α h +1 D +1 η ˆγ ρσ α 1 η + K + n n 1 + K L ρˆγ α D +1 n + K ρσ K η +1 α n + K 1+α + 1 K η α L ρ D n + K n + K η α +1 ρσk + K L ρ / D n + K The second inequaliy is due o n = n 1 and ˆγ = γ max K/K + n. The las inequaliy is due o α min{, 1 + α} for all α 0, 1] and η is an increasing sequence wih η 1. I can be verified ha he righ hand side is non-posiive using he definiion of D. On he oher hand, if h ˆγ µδ AW / > 0, we have from 68 h +1 D η +1 n + K α h h ˆγ µδ AW / + 1 L ρˆγ + ˆγ ρσ 1 L ρˆγ + ˆγ ρσ η n 1 + K 1 = ˆγ L ρˆγ η + ρσ n + K KL ρ / ˆγ n + K + ρσ η ˆγ n + K η α η +1 D n 1 + K n + K α α ˆγ D µδ AW / α D µδ AW / D µδ AW / α KL ρ / + ρσ η n + K D µδ AW /. η n 1 + K α η n + K α The las inequaliy is due o α 1. Similarly, by he definiion of D, we observe ha he RHS in he above inequaliy is non-posiive. α F.3 Proof of Lemma 13 Using 78 gives he following chain η α +1 η 1 ˆγ h + ˆγ ρσ +1 α + 1 η +1 α h +1 D K + n K + n L ρˆγ D K + n η 1 ˆγ D α η α + ˆγ ρσ η K + n K + n L ρˆγ D +1 K + n η ˆγ D α + ρσ L ρ + ˆγ K + n ˆγ D + ρσ + 1 KL ρ η α 0. K + n In he above, he second inequaliy is due o 1 ˆγ 0 and he inducion hypohesis; he hird inequaliy is due o η is increasing and; he las inequaliy is due o ˆγ < K/K + n. The proof is compleed. α 8 1

22 G Fas convergence of O-AW wihou srong convexiy The proof is based on a generalizaion of Lemma 6, and he following resul is borrowed from Theorem 11 in [LJJ15]. We focus on he anyime/regre bound sudied in Secion 3.1 below. In paricular, he relaxed condiions for a regre bound of Olog 3 T/T and anyime bound of Olog / are ha i C is a polyope and ii he loss funcion can be wrien as: fθ = gaθ + b, θ. 83 where g is µ g -srongly convex. For a general marix A, fθ may no be srongly convex. Define C o be he marix wih rows conaining he linear inequaliies defining C. Le c h be he Hoffman consan [LJJ15] for he marix [A; b ; C], G = max θ C gaθ be he maximal norm of gradien of g over AC, ρ A be he diameer of AC and we define he generalized srong convexiy consan: µ := 1 c h b M + 3Gρ A + /µ g G Under H and assuming ha h > 0 holds, applying he inequaliy 43 from [LJJ15] yields ḡ AW δ AW µ h. 85 Subsequenly, he Olog T/T anyime bound and Olog 3 T/T regre bound in Theorem 1 can be obained by repeaing he proof in Appendix F wih 85. H Improved gradien error bound for online MC Our goal is o show ha wih high probabiliy, F θ fθ σ, = O log /, sufficienly large. 86 To faciliae our proof, le us sae he following condiions on he observaion noise saisics: A1. The noise variance is finie, ha is here exiss a consan σ > 0 such ha for all ϑ R, 0 A ϑ σ, and he noise is sub-exponenial i.e., here exis a consan λ 1 such ha for all k, l [m 1 ] [m ]: exp λ 1 y A θ k,l p θy k, ldy e, 87 where p θ is defined as p θy k, l := my exp y θ k,l A θ k,l and e is he naural number. A. There exiss a finie consan κ > 0 such ha for all θ C, k [m 1 ], l [m ] κ max m A θ k,l, m1 A θ k,l. 88 Noice ha κ = O max{m 1, m }. l=1 We remark ha A1 and A are saisfied by all he exponenial family disribuions. We also need he following proposiion. k=1 Proposiion 14. Consider a finie sequence of independen random marices Z s 1 s R m1 m E[Z i ] = 0. For some U > 0, assume saisfying inf{λ > 0 : E[exp Z i σ, /λ] e} U i [n], 89

23 and here exiss σ Z s.. σz 1 max E[Z s Zs 1 ], σ, E[Zs Z s ] σ,. 90 Then for any ν > 0, wih probabiliy a leas 1 e ν 1 { } ν + logd Z i c U max σ Z, U log σ, U ν + logd σ Z i=1, 91 wih c U an increasing consan wih U. Proof. This resul is proved in Theorem 4 in [Kol13] for symmeric marices. Here we sae a slighly differen resul because σz is an upper bound of he variance and no he variance iself. However, i does no he aler he proof and he resul says valid. This concenraion is exended o recangular marices by dilaion, see Proposiion 11 in [Klo14] for deails. Our resul is saed as follows. Proposiion 15. Assume A1, A and ha he sampling disribuion is uniform. Define he approximaion error θ := F θ fθ. Wih probabiliy a leas 1, for any T := λ/ σ log λ/ σ logd+d/, and any θ C R : logd1 + θ σ, = O c λ κ + σ /, m 1 m wih σ, he operaor norm, c λ a consan which depends only on λ. The consans λ, σ and κ are defined in A1 and A. Proof. For a fixed θ, by he riangle inequaliy θ σ, 1 Y s e ks e ls E[Y s e ks e ls ] σ, + 1 A θ ks,l s e ks e ls E[A θ ks,l s e ks e ls ] σ, Define Z s := Y s e ks e ls E[Y s e ks e ls ], hen E[Z s Zs ] σ, E[Ys e ks e ls e l s e k s ] σ,, m m1 1 = diag E[Ys k, l] σ,, m 1 m = 1 m 1 m max k [m 1] l=1 m k=1 A θ k,l + A θ k,l, l=1 σ + κ σ + κ, m 1 m m 1 m m 1 m where we used he fac ha he disribuion belongs o he exponenial family for he second equaliy. Similarly one shows ha E[Zs Z s ] σ, saisfies he same upper bound. Hence by Proposiion 14 and A1, wih probabiliy a leas 1 e ν, i holds 1 σ Z s σ, c + κ ν + logd λ, 9 m 1 m 3

24 for larger han he hreshold given in he proposiion saemen. For he second erm, define P := 1/ e k s e ls m 1 m 1 11, we ge 1 A θ ks,l s e ks e ls E[A θ ks,l s e ks e ls ] σ, = P A θ k,l k,l σ, κ P σ,, 93 where denoes he Hardamard produc and we have used Theorem in [HJ94] for he las inequaliy. Define Z s := e k s e ls m 1 m Since by definiion, λ 1, one can again apply Proposiion 14 for U = λ and ge wih probabiliy a leas 1 e ν, ν + logd P σ, c λ m 1 m. 94 Hence, by a union bound argumen we find ha wih probabiliy a leas 1 e ν ν + logd σ, c λ κ + σ m 1 m. 95 Taking ν = log1 + / and applying a union bound argumen yields he resul. I Addiional resuls: Online LASSO Consider he seing where we are sequenially given i.i.d. observaions Y, A such ha Y R m is he response, A R m n is he random design and Y = A θ + w, 96 where he vecor w is i.i.d., [w ] i is independen of [w ] j for i j and [w ] i is zero-mean and sub-gaussian wih parameer σ w. We suppose ha he unknown parameer θ is sparse. Aemping o learn θ, a naural choice for he loss funcion a round is he square loss, i.e., f θ = 1/ Y A θ 97 and he sochasic cos associaed is fθ := 1 E θ[ Y A θ ]. As θ is sparse, he consrain se is designed o be he l 1 ball, i.e., C = {θ R n : θ 1 r}, where r > 0 is a regularizaion consan. Noe ha C is a polyope. The aggregaed gradien can be expressed as F θ = 1 A s A s θ 1 A s Y s. 98 Similar o he case of online marix compleion, he erms A s A s and A s Y s can be compued on-he-fly as running sums. Applying O-FW Algorihm 1 or O-AW Algorihm wih he above aggregaed gradien yields an online LASSO algorihm wih a consan complexiy dimension-dependen per ieraion. Noice ha as C is an l 1 ball consrain, he linear opimizaion in Line 4 of Algorihm 1 or 3 in Algorihm can be evaluaed simply as a = r sign[ F θ ] i e i, where i = arg max j [n] [ F θ ] j. Similar o he case of online MC, we derive he following O log / bound for he gradien error: Proposiion 16. Assume ha A A E[A A] max B 1 and A max B almos surely, wih max being he marix max norm. Define c := max θ C θ θ 1. Wih probabiliy a leas /nπ /6, he following holds for all θ C and all 1: F θ fθ cb 1 + logn mb σw log, 99 where is he infiniy norm and he dual norm of 1. 4

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence Supplemen for Sochasic Convex Opimizaion: Faser Local Growh Implies Faser Global Convergence Yi Xu Qihang Lin ianbao Yang Proof of heorem heorem Suppose Assumpion holds and F (w) obeys he LGC (6) Given

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Noes for EE7C Spring 018: Convex Opimizaion and Approximaion Insrucor: Moriz Hard Email: hard+ee7c@berkeley.edu Graduae Insrucor: Max Simchowiz Email: msimchow+ee7c@berkeley.edu Ocober 15, 018 3

More information

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs PROC. IEEE CONFERENCE ON DECISION AND CONTROL, 06 A Primal-Dual Type Algorihm wih he O(/) Convergence Rae for Large Scale Consrained Convex Programs Hao Yu and Michael J. Neely Absrac This paper considers

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

Notes on online convex optimization
