Stochastic Convex Optimization
Shai Shalev-Shwartz, TTI-Chicago
Ohad Shamir, The Hebrew University
Nathan Srebro, TTI-Chicago
Karthik Sridharan, TTI-Chicago

Abstract

For supervised classification problems, it is well known that learnability is equivalent to uniform convergence of the empirical risks, and thus to learnability by empirical minimization. Inspired by recent regret bounds for online convex optimization, we study stochastic convex optimization, and uncover a surprisingly different situation in the more general setting: although the stochastic convex optimization problem is learnable (e.g. using online-to-batch conversions), no uniform convergence holds in the general case, and empirical minimization might fail. Rather than being a difference between online methods and a global minimization approach, we show that the key ingredient is strong convexity and regularization. Our results demonstrate that the celebrated theorem of Alon et al on the equivalence of learnability and uniform convergence does not extend to Vapnik's General Setting of Learning, that in the General Setting considering only empirical minimization is not enough, and that despite Vapnik's result on the equivalence of strict consistency and uniform convergence, uniform convergence is only a sufficient, but not necessary, condition for meaningful non-trivial learnability.

1 Introduction

We consider the stochastic convex minimization problem

    argmin_{w ∈ W} F(w)    (1)

where F(w) = E_Z[f(w; Z)] is the expectation, with respect to Z, of a random objective that is convex in w. The optimization is based on an i.i.d. sample z_1, ..., z_n drawn from an unknown distribution. The goal is to choose w based on the sample and full knowledge of f(·;·) and W so as to minimize F(w). Alternatively, we can also think of an unknown distribution over convex functions, where we are given a sample of functions {w ↦ f(w; z_i)} and would like to optimize the expected function. A special case is the familiar prediction setting where z = (x, y) is an instance-label pair, W is a subset of a Hilbert space, and f(w; x, y) = ℓ(⟨w, φ(x)⟩, y) for some convex loss function ℓ and feature mapping φ. The situation in which the stochastic dependence on w is linear, as in the preceding example, is fairly well understood.
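As a concrete (and hypothetical) instance of the prediction special case just described, the sketch below takes ℓ to be the logistic loss and φ to be the identity; the function names and the toy data are our own illustration, not part of the paper.

```python
import numpy as np

def f(w, z):
    """Convex instantaneous objective f(w; z) for the prediction setting:
    z = (x, y) with y in {-1, +1}; here ell is the logistic loss and
    phi is the identity, so f(w; x, y) = log(1 + exp(-y <w, x>))."""
    x, y = z
    return np.log1p(np.exp(-y * np.dot(w, x)))

def empirical_objective(w, sample):
    # hat F(w) = (1/n) sum_i f(w; z_i), the empirical average objective
    return np.mean([f(w, z) for z in sample])

rng = np.random.default_rng(0)
sample = [(rng.normal(size=3), rng.choice([-1.0, 1.0])) for _ in range(100)]
print(empirical_objective(np.zeros(3), sample))  # log(2) exactly at w = 0
```

At w = 0 every term equals log 2 regardless of the data, which is a convenient sanity check on the implementation.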
When the domain W and the mapping φ are bounded, one can uniformly (over all w ∈ W) bound the deviation between the expected objective F(w) and the empirical average

    F̂(w) = Ê[f(w; z)] = (1/n) Σ_{i=1}^n f(w; z_i).    (2)

This uniform convergence of F̂(w) to F(w) justifies choosing the empirical minimizer

    ŵ = argmin_{w ∈ W} F̂(w),    (3)

and guarantees that the expected value of F(ŵ) converges to the optimal value F(w*) = inf_{w ∈ W} F(w). Furthermore, a similar guarantee can also be obtained for an approximate minimizer of the empirical objective. Our goal here is to consider the stochastic convex optimization problem more broadly, without assuming any metric or other structure on the parameter z or mappings of it, or any special structure of the objective function f(·;·). Viewed as optimization based on a sample of functions, we do not impose any constraints on the functions, or on the relationship between the functions, except that each function w ↦ f(w; z) separately is convex and Lipschitz-continuous.

An online analogue of this setting has recently received considerable attention. Online convex optimization concerns a sequence of convex functions f(·; z_1), ..., f(·; z_n), which can be chosen by an adversary, and a sequence of online predictors w_i, where w_i can depend only on z_1, ..., z_{i−1}. Online guarantees provide an upper bound on the online regret, (1/n) Σ_i f(w_i; z_i) − min_{w ∈ W} (1/n) Σ_i f(w; z_i). Note the difference versus the stochastic setting, where we seek a single predictor w̃ and would like to bound the population sub-optimality F(w̃) − F(w*). Zinkevich [Z03] showed that requiring f(w; z) to be Lipschitz-continuous w.r.t. w is enough for obtaining an online algorithm with online regret which diminishes as 1/√n. If f(w, z) is not merely convex w.r.t. w, but also strongly convex, the regret diminishes with a faster rate of 1/n [HKKA06]. These online results parallel known results in the stochastic setting, when the stochastic dependence on w is
linear. However, they apply also in a much broader setting, when the stochastic dependence on w is not linear, e.g. when f(w; z) = ‖w − z‖_p for p ≥ 1. The requirement that the functions w ↦ f(w; z) be Lipschitz-continuous is much more general than a specific requirement on the structure of the functions, and does not at all constrain the relationship between the functions. That is, we can think of z as parameterizing all possible Lipschitz-continuous convex functions w ↦ f(w; z). We note that this is quite different from the work of von Luxburg and Bousquet [vLB04], who studied learning with functions that are Lipschitz with respect to z.

The results for the online setting prompt us to ask whether similar results, requiring only Lipschitz continuity, can also be obtained for stochastic convex optimization. The answer we discover is surprisingly complex. Our first surprising observation is that requiring Lipschitz continuity is not enough for ensuring uniform convergence of F̂(w) to F(w), nor for the empirical minimizer ŵ to converge to an optimal solution. We present convex, bounded, Lipschitz-continuous examples where even as the sample size increases, the expected value of the empirical minimizer ŵ is bounded away from the population optimum: F(ŵ) = 1/2 > 0 = F(w*). In essentially all previously studied settings we are aware of where learning or stochastic optimization is possible, we have at least some form of locally uniform convergence, and an empirical minimization approach is appropriate. In fact, for common models of supervised learning, it is known that uniform convergence is equivalent to stochastic optimization being possible [ABCH97]. This might lead us to think that Lipschitz-continuity is not enough to make stochastic convex optimization possible, even though it is enough to ensure online convex optimization is possible. However, this gap between the online and stochastic setting cannot be, since it is possible to convert the online method of Zinkevich to a batch algorithm, with a matching guarantee on the population sub-optimality F(w̃) − F(w*). This guarantee holds for the specific output w̃ of the algorithm, which is not, in general, the empirical minimizer.
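The online-to-batch conversion of Zinkevich's method mentioned above can be sketched concretely: run projected online (sub)gradient descent on the sampled functions and return the average iterate. The sketch below is our own illustration, with our own names, step sizes and toy objective, not the paper's notation.

```python
import numpy as np

def ogd_to_batch(sample, subgrad, B, L, dim):
    """Online-to-batch sketch: run projected online subgradient descent on
    f(.; z_1), ..., f(.; z_n) over {w : ||w|| <= B} and return the average
    iterate w_bar. `subgrad(w, z)` returns a subgradient of f(w; z) in w;
    the standard step size eta_i = (B/L)/sqrt(i) is used."""
    w = np.zeros(dim)
    iterates = []
    for i, z in enumerate(sample, start=1):
        iterates.append(w.copy())          # w_i depends only on z_1..z_{i-1}
        w = w - (B / L) / np.sqrt(i) * subgrad(w, z)
        norm = np.linalg.norm(w)
        if norm > B:                       # project back onto the ball
            w *= B / norm
    return np.mean(iterates, axis=0)

# toy use: f(w; z) = |w - z| in one dimension, whose population minimizer
# is the median of the distribution of z
rng = np.random.default_rng(1)
zs = rng.normal(loc=2.0, size=500)
w_bar = ogd_to_batch(zs, lambda w, z: np.sign(w - z), B=5.0, L=1.0, dim=1)
print(w_bar)
```

With 500 samples the averaged iterate w_bar should land near the median (2.0 here), consistent with the O(BL/√n) population sub-optimality guarantee discussed below.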
It seems, then, that we are in a strange situation where stochastic optimization is possible, but only using a specific (online) algorithm, rather than the more natural empirical minimizer. We show that the magic can be understood not as a gap between online optimization and empirical minimization, but rather in terms of regularization. To do so, we first show that for a strongly convex stochastic optimization problem, even though we might still have no uniform convergence, the empirical minimizer is guaranteed to converge to the population optimum. This result seems to defy Vapnik's celebrated result on the equivalence of uniform convergence and strict consistency of the empirical minimizer [Vap95, Vap98]. We explain why there is no contradiction here: Vapnik's notion of strict consistency is too strict and does not capture all situations in which learning is non-trivial, yet still possible. Convergence of the empirical minimizer to the population optimum for strongly convex objectives justifies stochastic convex optimization of weakly convex Lipschitz-continuous functions using regularized empirical minimization. In fact, we discuss how Zinkevich's algorithm can also be understood in terms of minimizing an implicit regularized problem.

2 Setup and Background

A stochastic convex optimization problem is specified by a convex domain W, which in this paper we always take to be a closed and bounded subset of a Hilbert space H, and a function f : W × Z → R which is convex w.r.t. its first argument. We say that the problem is learnable (or solvable) iff there exists a rule for choosing w̃ based on an i.i.d. sample z_1, ..., z_n, and complete knowledge of W and f(·;·), such that for any δ > 0, any ε > 0, and large enough sample size n, for any distribution over z, with probability at least 1 − δ over a sample of size n, we have F(w̃) ≤ F(w*) + ε. We say that such a rule is uniformly consistent, or that it solves the stochastic optimization problem. We say that the problem is bounded by B iff for all w ∈ W we have ‖w‖ ≤ B. We say that the problem is L-Lipschitz if f(w; z) is L-Lipschitz w.r.t. w. That is, for any z ∈ Z and w_1, w_2 ∈ W we have |f(w_1; z) − f(w_2; z)| ≤ L‖w_1 − w_2‖.
We say that the problem is λ-strongly convex if for any z ∈ Z, w_1, w_2 ∈ W and α ∈ [0, 1] we have

    f(αw_1 + (1−α)w_2; z) ≤ αf(w_1; z) + (1−α)f(w_2; z) − (λ/2) α(1−α) ‖w_1 − w_2‖².

Note that this strengthens the convexity requirement, which corresponds to setting λ = 0.

2.1 Generalized Linear Stochastic Optimization

We say that a problem is a generalized linear problem if f(w; z) can be written as

    f(w; z) = g(⟨w, φ(z)⟩; z) + r(w)    (4)

where g : R × Z → R is convex w.r.t. its first argument, r : W → R is convex, and φ : Z → H. A special case is supervised learning of a linear predictor with a convex loss function, where g(·;·) encodes the loss function. Learnability results for linear predictors can in fact be stated more generally as guarantees on stochastic optimization of generalized linear problems:

Theorem 1. Consider a generalized linear stochastic convex optimization problem of the form (4), such that the domain W is bounded by B, the image of φ is bounded by R, and g(u; z) is L_g-Lipschitz in u. Then for any distribution over z and any δ > 0, with probability at least 1 − δ over a sample of size n:

    sup_{w ∈ W} |F(w) − F̂(w)| ≤ O(√(B² (RL_g)² log(1/δ) / n))
That is, the empirical values F̂(w) converge uniformly, for all w ∈ W, to their expectations F(w). This ensures that with probability at least 1 − δ, for all w ∈ W:

    F(w) − F(w*) ≤ (F̂(w) − F̂(ŵ)) + O(√(B² (RL_g)² log(1/δ) / n))    (5)

The empirical suboptimality term on the right-hand-side vanishes for the empirical minimizer ŵ, establishing that empirical minimization solves the stochastic optimization problem with a rate of 1/√n. Furthermore, (5) allows us to bound the population suboptimality in terms of the empirical suboptimality and obtain meaningful guarantees even for approximate empirical minimizers. The non-stochastic term r(w) does not play a role in the above bound, as it can always be canceled out. However, when this term is strongly convex (e.g. when it is a squared-norm regularization term, r(w) = (λ/2)‖w‖²), a faster convergence rate can be guaranteed:

Theorem 2. [SSS08] Consider a generalized linear stochastic convex optimization problem of the form (4), such that r(w) is λ-strongly convex, the image of φ is bounded by R, and g(u; z) is L_g-Lipschitz in u. Then for any distribution over z and any δ > 0, with probability at least 1 − δ over a sample of size n, for all w ∈ W:

    F(w) − F(w*) ≤ 2(F̂(w) − F̂(ŵ)) + O((RL_g)² log(1/δ) / (λn)).

2.2 Online Convex Optimization

Zinkevich [Z03] established that Lipschitz continuity and convexity of the objective functions with respect to the optimization argument are sufficient for online optimization¹:

Theorem 3. [Sha07, Corollary 1] Let f : W × Z → R be such that W is bounded by B and f(w, z) is convex and L-Lipschitz with respect to w. Then, there exists an online algorithm such that for any sequence z_1, ..., z_n the sequence of online vectors w_1, ..., w_n satisfies:

    (1/n) Σ_i f(w_i; z_i) ≤ min_{w ∈ W} (1/n) Σ_i f(w; z_i) + O(√(B²L² / n))    (6)

Subsequently, Hazan et al [HKKA06] showed that a faster rate can be obtained when the objective functions are not only convex, but also strongly convex:

Theorem 4. [HKKA06, Theorem 1] Let f : W × Z → R be such that the function f(w, z) is λ-strongly convex and L-Lipschitz with respect to w. Then, there exists an online algorithm such that for any sequence z_1, ..., z_n the sequence of online vectors w_1, ..., w_n satisfies:

    (1/n) Σ_i f(w_i; z_i) ≤ min_{w ∈ W} (1/n) Σ_i f(w; z_i) + O((L²/λ) log(n) / n)

¹ We present here slightly more general theorem statements than those found in the original papers [Z03, HKKA06].
We do not require differentiability, and instead of bounding the gradient and the Hessian we bound the Lipschitz constant and the parameter of strong convexity. The bound in Theorem 3 is also a bit tighter than that originally established by Zinkevich.

Online-to-batch conversions. In this paper, we are not interested in the online setting, but rather in the batch stochastic optimization setting, where we would like to obtain a single predictor w̃ with low expected value over future examples, F(w̃) = E_z[f(w̃; z)]. Using martingale inequalities, it is possible to convert an online algorithm to a batch algorithm with a stochastic guarantee. One simple way to do so is to run the online algorithm on the stochastic sequence of functions f(·, z_1), ..., f(·, z_n) and set the single predictor w̃ to be the average of the online choices w_1, ..., w_n. Assuming the conditions of Theorem 3, it is possible to show (e.g. [CCG04]) that with probability of at least 1 − δ we have

    F(w̃) ≤ F(w*) + O(√(B²L² log(1/δ) / n)).    (7)

It is also possible to derive a similar guarantee assuming the conditions of Theorem 4 [KT08]:

    F(w̃) ≤ F(w*) + O((L²/λ) log(n/δ) / n).    (8)

The conditions for Theorem 3 generalize those of Theorem 1 when r(w) = 0: if f(w; z) = g(⟨w, φ(z)⟩; z) satisfies the conditions of Theorem 1 then it also satisfies the conditions of Theorem 3 with L = L_g R, and the bound on the population sub-optimality of w̃ given in (7) matches the guarantee on ŵ using Theorem 1. Similarly, the conditions of Theorem 4 roughly generalize those of Theorem 2 with L = RL_g + L_r, and the guarantees are similar (except for a log-factor), as long as L_r = O(RL_g). It is important to note, however, that the guarantees (7) and (8) do not subsume Theorems 1 and 2, as the online-to-batch guarantees apply only to a specific choice w̃ which is defined in terms of the behavior of a specific algorithm. They do not provide guarantees on the empirical minimizer, and certainly not a uniform guarantee in terms of the empirical sub-optimality.

3 Warm-Up: Finite Dimensional Case

We begin by noting that in the finite dimensional case, Lipschitz continuity is enough to guarantee uniform convergence, hence also learnability via empirical minimization.

Theorem 5. Let W ⊂ R^d be bounded by B and let f(w, z) be L-Lipschitz w.r.t. w.
Then with probability of at least 1 − δ over a sample of size n, for all w ∈ W:

    |F(w) − F̂(w)| ≤ O(LB √((d log(n/d) + log(1/δ)) / n))

Proof. We will show uniform convergence by bounding the l_∞-covering number of the class of functions F = {z ↦ f(w; z) : w ∈ W}. To do so, we first note that as a subset of an l_2 ball, we can bound the covering number of W with respect to the Euclidean distance d_2(w_1, w_2) = ‖w_1 − w_2‖ [VG05] (for d > 3):

    N(ε, W, d_2) = O(√d (B/ε)^d)    (9)
We now turn to covering numbers of F with respect to the l_∞ distance d_∞(f(w_1;·), f(w_2;·)) = sup_z |f(w_1; z) − f(w_2; z)|. By Lipschitz continuity, for any w_1, w_2 ∈ W we have sup_z |f(w_1; z) − f(w_2; z)| ≤ L‖w_1 − w_2‖. An ε-covering of W w.r.t. d_2 therefore yields an Lε-covering of F w.r.t. d_∞ distances, and so:

    N(ε, F, d_∞) ≤ N(ε/L, W, d_2) = O(√d (LB/ε)^d)    (10)

Noting that the empirical l_1 covering number is bounded by the d_∞ covering number, and using a uniform bound in terms of empirical l_1 covering numbers, we get [Pol84]:

    Pr(sup_w |F(w) − F̂(w)| ≥ ε) ≤ 8 N(ε, F, d_∞) exp(−nε² / (128(LB)²)) ≤ O(√d (LB/ε)^d) exp(−nε² / (128(LB)²)).

Equating the right-hand-side to δ and bounding ε we get the bound in the Theorem.

We can therefore conclude that empirical minimization is uniformly consistent with the same rate as in Theorem 5:

    F(ŵ) ≤ F(w*) + O(LB √((d log(n/d) + log(1/δ)) / n))    (11)

with probability at least 1 − δ over a sample of size n. This is the standard approach for establishing learnability. We now turn to ask whether such an approach can also be taken in the infinite dimensional case, i.e. yielding a bound that does not depend on the dimensionality.

4 Learnable, but Not with Empirical Minimizer

The results of Section 2.2 suggest that perhaps Lipschitz continuity is enough for obtaining guarantees on stochastic convex optimization using a more direct approach, even in infinite dimensions. In particular, that perhaps Lipschitz continuity is enough for ensuring uniform convergence, which in turn would imply learnability using empirical minimization, as in the infinite dimensional linear case, the finite dimensional Lipschitz case, and essentially all studied scenarios of stochastic optimization that we are aware of. Ensuring uniform convergence would further enable us to use approximate empirical minimizers, and bound the stochastic sub-optimality of any vector w in terms of its empirical sub-optimality, rather than obtaining a guarantee on the stochastic sub-optimality of only one specific procedural choice (obtained from running the online learning algorithm). Unfortunately, this is not the case. Despite the fact that a bounded, Lipschitz-continuous, stochastic convex optimization problem is learnable even in infinite dimensions, as discussed in Section 2.2, we show here that uniform convergence does not hold and that it might not be learnable with empirical minimization.
4.1 Empirical Minimizer Far from Population Optimum

Consider a convex stochastic optimization problem given by:

    f^(12)(w; (x, α)) = ‖α ∗ (w − x)‖ = √(Σ_i α[i](w[i] − x[i])²)    (12)

where for now we will set the domain to the d-dimensional unit ball W = {w ∈ R^d : ‖w‖ ≤ 1}, and take z = (x, α) with α ∈ [0, 1]^d and x ∈ W, and where u ∗ v denotes an element-wise product. We will first consider a sequence of problems, where d = 2^n for any sample size n, and establish that we cannot expect a convergence rate which is independent of the dimensionality d. We then formalize this example in infinite dimensions. One can think of the problem (12) as that of finding the center of an unknown distribution over x ∈ R^d, where we also have stochastic per-coordinate confidence measures α[i]. We will actually focus on the case where some coordinates are missing, i.e. occasionally α[i] = 0. In any case the domain W is bounded by one, and for any z = (x, α) the function w ↦ f^(12)(w; z) is convex and 1-Lipschitz. Thus, the conditions of Theorem 3 hold, and the convex stochastic optimization problem is learnable by running Zinkevich's online algorithm and taking an average.

Consider the following distribution over Z = (X, α): X = 0 with probability one, and α is uniform over {0, 1}^d. That is, the α[i] are i.i.d. uniform Bernoulli. For a random sample (x_1, α_1), ..., (x_n, α_n) we have that with probability greater than 1 − e^{−1} > 0.63, there exists a coordinate j ∈ 1...d such that all confidence vectors α_i in the sample are zero on the coordinate j, i.e. α_i[j] = 0 for all i = 1...n. Let e_j ∈ W be the standard basis vector corresponding to this coordinate. Then

    F̂^(12)(e_j) = (1/n) Σ_i ‖α_i ∗ (e_j − 0)‖ = (1/n) Σ_i √(α_i[j]) = 0

but F^(12)(e_j) = E_{X,α}[‖α ∗ (e_j − 0)‖] = E_{X,α}[√(α[j])] = 1/2. We established that for any n, we can construct a convex Lipschitz-continuous objective in high enough dimension such that with probability at least 0.63 over the sample,

    sup_w (F^(12)(w) − F̂^(12)(w)) ≥ 1/2.
Furthermore, since f(·;·) is non-negative, we have that e_j is an empirical minimizer, but its expected value F^(12)(e_j) = 1/2 is far from the optimal expected value min_w F^(12)(w) = F^(12)(0) = 0.

4.2 In Infinite Dimensions: Empirical Minimizer Does Not Converge to Population Optimum

To formalize the example in a sample-size independent way, take W to be the unit ball of an infinite-dimensional Hilbert space with orthonormal basis e_1, e_2, ..., where for v ∈ W, we refer to its coordinates v[j] = ⟨v, e_j⟩ w.r.t. this basis. The confidences α are now a mapping of each coordinate to [0, 1], that is, an infinite sequence of reals in [0, 1]. The element-wise product operation α ∗ v is defined with respect to this basis, and the objective function f^(12) of equation (12) is well defined in this infinite-dimensional space.
We again take a distribution over Z = (X, α) where X = 0 and α is an i.i.d. sequence of uniform Bernoulli random variables. Now, for any finite sample there is almost surely a coordinate j with α_i[j] = 0 for all i, and so we a.s. have an empirical minimizer e_j with F̂^(12)(e_j) = 0 and F^(12)(e_j) = 1/2 > 0 = F^(12)(0). We see that although the stochastic convex optimization problem (12) is learnable (using Zinkevich's online algorithm), the empirical values F̂^(12)(w) do not converge uniformly to their expectations, and empirical minimization is not guaranteed to solve the problem!

4.3 Unique Empirical Minimizer Does Not Converge to Population Optimum

It is also possible to construct a sharper counterexample, in which the unique empirical minimizer ŵ is far from having optimal expected value. To do so, we augment f^(12) by a small term which ensures its empirical minimizer is unique, and far from the origin. Consider:

    f^(13)(w; (x, α)) = f^(12)(w; (x, α)) + ε Σ_i 2^{−i} (w[i] − 1)²    (13)

where ε is a small positive constant. The objective is still convex and (1 + ε)-Lipschitz. Furthermore, since the additional term is strictly convex, we have that f^(13)(w; z) is strictly convex w.r.t. w, and so the empirical minimizer is unique. Consider the same distribution over Z: X = 0 while the α[i] are i.i.d. uniform zero or one. The empirical minimizer is the minimizer of F̂^(13)(w) subject to the constraint ‖w‖ ≤ 1. Identifying the solution to this constrained optimization problem is tricky, but fortunately not necessary. It is enough to show that the optimum of the unconstrained optimization problem w_UC = argmin F̂^(13)(w) (without constraining w ∈ W) has norm ‖w_UC‖ ≥ 1. Notice that in the unconstrained problem, whenever α_i[j] = 0 for all i = 1..n, only the second term of f^(13) depends on w[j], and we have w_UC[j] = 1. Since this happens a.s. for some coordinate j, we can conclude that the solution to the constrained optimization problem lies on the boundary of W, i.e. has ‖ŵ‖ = 1. But for such a solution we have

    F^(13)(ŵ) ≥ E_α[√(Σ_i α[i] ŵ²[i])] ≥ E_α[Σ_i α[i] ŵ²[i]] = (1/2)‖ŵ‖² = 1/2,

while F(w*) ≤ F(0) = ε. In conclusion, no matter how big the sample size n is, the unique empirical minimizer ŵ of the stochastic convex optimization problem (13) is a.s. much worse than the population optimum, F(ŵ) ≥ 1/2 > ε ≥ F(w*), and certainly does not converge to it.
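The finite-dimensional example of Section 4.1 is easy to check numerically. The sketch below (function names and trial counts are our own) estimates how often a never-observed coordinate exists when d = 2^n, which is exactly the event that makes e_j an empirical minimizer with F̂^(12)(e_j) = 0 despite F^(12)(e_j) = 1/2.

```python
import numpy as np

def unseen_coordinate_exists(n, rng):
    """One draw of the Section 4.1 example with d = 2**n and X = 0:
    alpha_1, ..., alpha_n uniform on {0,1}^d. Returns True iff some
    coordinate j has alpha_i[j] = 0 for every i, in which case e_j is an
    empirical minimizer that is 1/2-suboptimal in population."""
    d = 2 ** n
    alphas = rng.integers(0, 2, size=(n, d))
    return bool((alphas.sum(axis=0) == 0).any())

rng = np.random.default_rng(0)
n = 10
freq = np.mean([unseen_coordinate_exists(n, rng) for _ in range(200)])
print(freq)  # empirically around 1 - 1/e ~ 0.63
```

The event probability is 1 − (1 − 2^{−n})^{2^n} > 1 − e^{−1} ≈ 0.632, matching the claim in the text.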
5 Empirical Minimization of a Strongly Convex Objective

We saw that empirical minimization is not adequate for stochastic convex optimization even if the objective is Lipschitz-continuous. We will now show that, if the objective f(w; z) is strongly convex w.r.t. w, then the empirical minimizer does converge to the optimum. This is despite the fact that even in the strongly convex case, we still might not have uniform convergence of F̂(w) to F(w).

5.1 Empirical Minimizer Converges to Population Optimum

Theorem 6. Consider a stochastic convex optimization problem such that f(w; z) is λ-strongly convex and L-Lipschitz with respect to w ∈ W. Let z_1, ..., z_n be an i.i.d. sample and let ŵ be the empirical minimizer. Then, with probability at least 1 − δ over the sample we have

    F(ŵ) − F(w*) ≤ 4L² / (δλn).    (14)

Proof. To prove the Theorem, we use a stability argument introduced by Bousquet and Elisseeff [BE02]. Denote by

    F̂^(i)(w) = (1/n)(f(w, z'_i) + Σ_{j ≠ i} f(w, z_j))

the empirical average with z_i replaced by an independently and identically drawn z'_i, and consider its minimizer: ŵ^(i) = argmin_w F̂^(i)(w). We first use strong convexity and Lipschitz-continuity to establish that empirical minimization is stable in the following sense:

    ∀ z_1, ..., z_n, z'_i, z ∈ Z:   |f(ŵ, z) − f(ŵ^(i), z)| ≤ β    (15)

with β = 4L²/(λn) (this is referred to as CV (Replacement) Stability in [RMP05] and is similar to uniform stability [BE02]). We then show that (15) implies convergence of F(ŵ) to F(w*).

Claim 6.1. Under the conditions of Theorem 6, the stability bound (15) holds with β = 4L²/(λn).

Proof of Claim 6.1: We first calculate:

    F̂(ŵ^(i)) − F̂(ŵ)
      = (f(ŵ^(i), z_i) − f(ŵ, z_i))/n + (f(ŵ, z'_i) − f(ŵ^(i), z'_i))/n + (F̂^(i)(ŵ^(i)) − F̂^(i)(ŵ))    (16)
      ≤ (f(ŵ^(i), z_i) − f(ŵ, z_i))/n + (f(ŵ, z'_i) − f(ŵ^(i), z'_i))/n
      ≤ 2L ‖ŵ^(i) − ŵ‖ / n    (17)

where the first inequality follows from the fact that ŵ^(i) is the minimizer of F̂^(i)(w), and for the second inequality we use Lipschitz continuity. But from strong convexity of F̂(w) and the fact that ŵ minimizes F̂(w) we also have that

    F̂(ŵ^(i)) ≥ F̂(ŵ) + (λ/2) ‖ŵ^(i) − ŵ‖².    (18)

Combining (18) with (17) we get ‖ŵ^(i) − ŵ‖ ≤ 4L/(λn). Finally, from Lipschitz continuity, for any z ∈ Z:

    |f(ŵ, z) − f(ŵ^(i), z)| ≤ L ‖ŵ − ŵ^(i)‖ ≤ 4L²/(λn).    (19)
Claim 6.2. If the stability bound (15) holds, then for any δ > 0, with probability 1 − δ over the sample,

    F(ŵ) − F(w*) ≤ β/δ.    (20)

A similar result that is not specific to ŵ, but yields only a β + O(1/√n) rate, appears in [RMP05, Theorem 4.4]. The faster rate is important for us here.

Proof of Claim 6.2: Since the sample with z_i and the sample with z'_i are identically distributed, and z'_i is independent of z_i, we have:

    E[F(ŵ)] = E[F(ŵ^(i))] = E[f(ŵ^(i); z_i)]

where the expectation is over z_1, ..., z_n, z'_i. This holds for all i, and so we can also write:

    E[F(ŵ)] = (1/n) Σ_{i=1}^n E[f(ŵ^(i); z_i)].    (21)

We also have:

    E[F̂(ŵ)] = E[(1/n) Σ_{i=1}^n f(ŵ; z_i)] = (1/n) Σ_{i=1}^n E[f(ŵ; z_i)].    (22)

Combining (21) and (22) and using (15) yields²:

    E[F(ŵ) − F̂(ŵ)] = (1/n) Σ_{i=1}^n E[f(ŵ^(i), z_i) − f(ŵ; z_i)] ≤ β.

We also have that E[F(w*)] = E[F̂(w*)] ≥ E[F̂(ŵ)], where the equality is just equating an expectation to an expectation of an average, and the inequality follows from optimality of ŵ. We can therefore conclude:

    E[F(ŵ) − F(w*)] ≤ E[F(ŵ) − F̂(ŵ)] ≤ β.    (23)

Using Markov's inequality yields (20).

We suspect that the dependence on δ in the above bound can be improved to log(1/δ), matching the dependence on δ in the online-to-batch guarantee (8) and the guarantees for the generalized linear case. For more details, see Appendix A.

5.2 But Without Uniform Convergence!

We now turn to ask whether the convergence of the empirical minimizer in this case is a result of uniform convergence. Consider augmenting the objective function f^(12) of Section 4 with a strongly convex term:

    f^(24)(w; x, α) = f^(12)(w; x, α) + (λ/2)‖w‖².    (24)

The modified objective f^(24)(·;·) is λ-strongly convex and (1 + λ)-Lipschitz over the domain W = {w : ‖w‖ ≤ 1}, and thus satisfies the conditions of Theorem 6. Consider the same distribution over Z = (X, α) used in Section 4: X = 0 and α is an i.i.d. sequence of uniform zero/one Bernoulli variables². Recall that almost surely we have a coordinate j that is never observed, i.e. such that α_i[j] = 0 for all i. Consider a vector te_j of magnitude 0 < t ≤ 1 in the direction of this coordinate. We have that F̂^(24)(te_j) = (λ/2)t², but F^(24)(te_j) = (1/2)t + (λ/2)t². Hence F^(24)(te_j) − F̂^(24)(te_j) = t/2.

² This is a modification of a derivation extracted from the proof of Theorem 12 in [BE02].
In particular, we can set t = 1 and establish

    sup_w (F^(24)(w) − F̂^(24)(w)) ≥ 1/2

regardless of the sample size n. We see then that the empirical averages F̂^(24)(w) do not converge uniformly to their expectations, even as the sample size increases.

5.3 Not Even Local Uniform Convergence

For any ε > 0, consider limiting our attention only to predictors that are close to being population optimal:

    W_ε = {w ∈ W : F^(24)(w) ≤ F^(24)(w*) + ε}.

Setting t = ε we have te_j ∈ W_ε (focusing for convenience on ε ≤ 1 and λ ≤ 1), and so:

    sup_{w ∈ W_ε} (F^(24)(w) − F̂^(24)(w)) ≥ ε/2    (25)

regardless of the sample size. And so, even in an arbitrarily small neighborhood of the optimum, the empirical values F̂^(24)(w) do not converge uniformly to their expected values, even as n → ∞. This is in sharp contrast to essentially all other results on stochastic optimization and learning that we are aware of.

5.4 Bounding Population Sub-Optimality in Terms of Empirical Sub-Optimality

A practical question related to uniform convergence is whether we can obtain a uniform bound on the population sub-optimality in terms of the empirical sub-optimality, as in Theorem 2. We first note that, merely due to the fact that the empirical objective F̂ is strongly convex, any approximate empirical minimizer must be close to ŵ, and due to the fact that the expected objective F is Lipschitz-continuous, any vector close to ŵ cannot have a much worse value than ŵ. We therefore have, under the conditions of Theorem 6, that with probability at least 1 − δ, for all w ∈ W:

    F(w) − F(w*) ≤ √((2L²/λ)(F̂(w) − F̂(ŵ))) + 4L²/(δλn)    (26)

It is important to emphasize that this is an immediate consequence of (14) and does not involve any further stochastic properties of F̂ or F. Although this uniform inequality does allow us to bound the population sub-optimality in terms of the empirical sub-optimality, the empirical sub-optimality must be quadratic in the desired population sub-optimality. Compare this dependence with the more favorable linear dependence of Theorem 2. Unfortunately, as we show next, this is the best that can be ensured. Consider the objective f^(24) and the same distribution over Z = (X, α) discussed above, and recall that te_j is a vector of magnitude t along a coordinate j s.t. α_i[j] = 0.
We have that F̂^(24)(te_j) − F̂^(24)(ŵ) ≤ (λ/2)t², and so setting t = √(2ε/λ), we get an ε-empirical-suboptimal vector with population sub-optimality

    F^(24)(te_j) − F^(24)(0) = (1/2)t + (λ/2)t² = √(ε/(2λ)) + ε.
This establishes that the dependence on ε in the first term of (26) is tight, and the situation is qualitatively different than in the generalized linear case.

5.5 Contradiction to Vapnik?

At this point, a reader familiar with Vapnik's work on necessary and sufficient conditions for consistency of empirical minimization (i.e. conditions for F(ŵ) → F(w*)) might be confused. In seeking such necessary and sufficient conditions [Vap98, Chapter 3], Vapnik excludes certain consistent settings in which the consistency is so-called trivial. The main example of an excluded setting is one in which there is one hypothesis w_0 that dominates all others, i.e. f(w_0; z) < f(w; z) for all w ∈ W and all z ∈ Z [Vap98, Figure 3.2]. When this is the case, empirical minimization will be consistent regardless of the behavior of F̂(w) for w ≠ w_0. In order to exclude such trivial cases, Vapnik defines strict (aka non-trivial) consistency of empirical minimization as (in our notation):

    inf_{w: F(w) ≥ c} F̂(w)  →_P  inf_{w: F(w) ≥ c} F(w)    (27)

for all c ∈ R, where the convergence is in probability. This condition indeed ensures that F(ŵ) →_P F(w*). Vapnik's Key Theorem on Learning Theory [Vap98, Theorem 3.1] then states that strict consistency of empirical minimization is equivalent to one-sided uniform convergence. One-sided meaning requiring only sup_w (F(w) − F̂(w)) →_P 0, rather than sup_w |F(w) − F̂(w)| →_P 0. Note that the analysis above shows the lack of such one-sided uniform convergence. In the example presented above, even though Theorem 6 establishes F(ŵ) →_P F(w*), the consistency isn't strict by the definition above. To see this, for any c > 0, consider the vector te_j (where α_i[j] = 0) with t = 2c. We have F^(24)(te_j) = (1/2)t + (λ/2)t² > c but F̂^(24)(te_j) = (λ/2)t² = 2λc². Focusing on λ = 1 we get:

    inf_{w: F(w) ≥ c} F̂(w) ≤ 2c²    (28)

almost surely for any sample size, violating the strict consistency requirement (27) for small c, since inf_{w: F(w) ≥ c} F(w) = c > 2c². The fact that the right-hand-side of (28) is strictly greater than F(w*) = 0 is enough for obtaining (non-strict) consistency of empirical minimization, but this is not enough for satisfying strict consistency. We emphasize that stochastic convex optimization is far from trivial in that there is no dominating hypothesis that will always be selected. Although for convenience of analysis we took X = 0, one should think of situations in which X is stochastic with an unknown distribution.
We see then that there is no mathematical contradiction here to Vapnik's Key Theorem. Rather, we see a demonstration that strict consistency is too strict a requirement, and that interesting, non-trivial learning problems might admit non-strict consistency which is not equivalent to one-sided uniform convergence. We see that uniform convergence is a sufficient, but not at all necessary, condition for consistency of empirical minimization in non-trivial settings.

6 Regularization

We now return to the case where f(w, z) is Lipschitz (and convex) w.r.t. w, but not strongly convex. As we saw, empirical minimization may fail in this case, despite the guaranteed success of an online approach. Our goal in this section is to underscore a more direct, non-procedural optimization criterion for stochastic optimization. To do so, we define a regularized empirical minimization problem

    ŵ_λ = argmin_w ((λ/2)‖w‖² + (1/n) Σ_{i=1}^n f(w, z_i)),    (29)

where λ is a parameter that will be determined later. The following theorem establishes that the minimizer of (29) is a good solution to the stochastic convex optimization problem:

Theorem 7. Let f : W × Z → R be such that W is bounded by B and f(w, z) is convex and L-Lipschitz with respect to w. Let z_1, ..., z_n be an i.i.d. sample and let ŵ_λ be the minimizer of (29) with λ = √(16L²/(δB²n)). Then, with probability at least 1 − δ we have

    F(ŵ_λ) ≤ F(w*) + O(√(L²B²/(δn))).

Proof. Let r(w; z) = (λ/2)‖w‖² + f(w; z) and let R(w) = E_z[r(w, z)]. Note that ŵ_λ is the empirical minimizer for the stochastic optimization problem defined by r(w; z). We apply Theorem 6 to r(w; z); to this end, note that since f is L-Lipschitz and for w ∈ W we have ‖w‖ ≤ B, we see that r is in fact (L + λB)-Lipschitz. Applying Theorem 6 now, we see that

    (λ/2)‖ŵ_λ‖² + F(ŵ_λ) = R(ŵ_λ) ≤ inf_w R(w) + 4(L + λB)²/(δλn) ≤ (λ/2)‖w*‖² + F(w*) + 4(L + λB)²/(δλn).

Now note that ‖w*‖ ≤ B, and so we get that

    F(ŵ_λ) ≤ F(w*) + (λ/2)B² + 4(L + λB)²/(δλn) ≤ F(w*) + (λ/2)B² + 8L²/(δλn) + 8λB²/(δn).

Plugging in the value of λ given in the theorem statement, we see that

    F(ŵ_λ) ≤ F(w*) + 4LB/√(δn) + 32LB/((δn)√(δn)).

This gives us the required bound.

From the above theorem and the discussion in Section 4, we see that regularization is essential for convex stochastic optimization. It is interesting to contrast this with the online learning algorithm of Zinkevich [Z03]. Seemingly, the online approach of Zinkevich does not rely on regularization.
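The regularized empirical minimization problem analyzed in Theorem 7 — minimize (λ/2)‖w‖² + F̂(w) — can be sketched numerically. The subgradient solver below, its names, and the toy data are our own illustration under the stated assumptions, not the paper's method; any solver for the λ-strongly convex regularized objective would do.

```python
import numpy as np

def regularized_erm(sample, subgrad, lam, dim, epochs=50):
    """Sketch of regularized empirical minimization:
        minimize (lam/2)||w||^2 + (1/n) sum_i f(w; z_i)
    via subgradient descent with the standard 1/(lam*t) step size for
    lam-strongly convex objectives. `subgrad(w, z)` returns a subgradient
    of f(w; z) in w."""
    w = np.zeros(dim)
    t = 0
    for _ in range(epochs):
        for z in sample:
            t += 1
            g = lam * w + subgrad(w, z)   # subgradient of regularized objective
            w -= g / (lam * t)
    return w

# toy use: f(w; z) = |w - z| in one dimension; the regularizer pulls the
# solution slightly toward the origin relative to the sample median
rng = np.random.default_rng(2)
zs = rng.normal(loc=1.0, size=200)
w_hat = regularized_erm(zs, lambda w, z: np.sign(w - z), lam=0.1, dim=1)
print(w_hat)
```

Note the design point: the regularization parameter λ here shrinks with n (as in Theorem 7), so the bias it introduces vanishes while the strong convexity it adds keeps the empirical minimizer stable.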
However, a more careful look reveals a uderlyg regularzato also the ole techque. Ideed, Shalev- Shwartz [Sha07] showed that Zkevch s ole learg
Figure 1: Lipschitz-continuous convex problems (triangle) are all learnable, but not necessarily using empirical minimization. Lipschitz-continuous strongly convex problems (dotted rectangle) are all learnable with empirical minimization, but uniform convergence might not hold. For bounded generalized linear problems (starred rectangle), uniform convergence always holds. Our two separating examples, f^(1) and f^(4), are also indicated.

[Figure 2 diagram: "uniform convergence: sup_w |F̂(w) − F(w)| → 0" implies "learnable by empirical minimization: F(ŵ) → F(w*)", which implies "learnable: F(w̃) → F(w*)"; the reverse implications hold "always in supervised learning", with f^(1) and f^(4) as the separating examples.]

Figure 2: Relationship between different properties of stochastic optimization problems.

algorithm can be viewed as approximate coordinate ascent optimization of the dual of the regularized problem (9). Furthermore, it is also possible to obtain the same online regret bound using a Follow-The-Regularized-Leader approach, which at each iteration directly solves the regularized minimization problem (9) on z_1, ..., z_{i−1}. The key, then, seems to be regularization, rather than a procedural online-versus-global-minimization approach.

6.1 Regularization vs. Constraints

The role of regularization here is very different than in familiar settings, such as ℓ2 regularization in SVMs and ℓ1 regularization in LASSO. In those settings, regularization serves to constrain our domain to a low-complexity domain (e.g. low-norm predictors), where we rely on uniform convergence. In fact, almost all learning guarantees for such settings that we are aware of can be expressed in terms of some sort of uniform convergence. And, as we mentioned, learnability (under the standard supervised learning model) is in fact equivalent to a uniform convergence property.

In our case, constraining the norm of w does not ensure uniform convergence. Consider the example f^(1) of Section 4. Even over a restricted domain W_r = {w : ‖w‖ ≤ r}, for arbitrarily small r > 0, the empirical averages F̂(w) do not uniformly converge to F(w), and

Pr( limsup_n sup_{‖w‖≤r} |F̂(w) − F(w)| > 0 ) = 1.

Furthermore, consider replacing the regularization term λ‖w‖² with a constraint on the norm of w, namely, solving the problem

w̃_r = argmin_{‖w‖≤r} F̂(w).   (30)

As we show below, we cannot solve the stochastic optimization problem by setting r in a distribution-independent way (i.e. without knowing the solution). To see this, note that when X = 0 a.s. we must have r → 0 to ensure F(w̃_r) → F(w*). However, if X = e_1 a.s., we must set r ≥ 1. No single constraint will work for all distributions over Z = (X, α)! This sharply contrasts with traditional uses of regularization, where learning guarantees are typically stated in terms of a constraint on the norm, rather than in terms of a parameter such as λ, and adding a regularization term of the form λ‖w‖² is viewed as a proxy for bounding the norm ‖w‖.

7 Summary

Following the work of Zinkevich [Z03], we expected to be able to generalize well-established results on stochastic optimization of linear functions also to the more general Lipschitz-convex case. We discovered a complex and unexpected situation, in which strong convexity and regularization play a key role, and ultimately did reach an understanding of stochastic convex optimization that does not rely on online techniques (Figure 1).

For stochastic objectives that arise from supervised prediction problems, it is well known that learnability, i.e. solvability of the stochastic optimization problem, is equivalent to uniform convergence, and so whenever the problem is learnable, it is learnable using empirical minimization [ABCH97]. Many might think that this principle, namely that a problem is learnable iff it is learnable using empirical minimization, extends also to the General Setting of Learning [Vap95], which includes the stochastic convex optimization problem studied here. However, we demonstrated stochastic optimization problems in which these equivalences do not hold. There is no contradiction, since stochastic optimization problems that arise from supervised learning have a restricted structure, and in particular the examples we study are not among such problems. In fact, for reasonable loss functions, in order to make f(w; x, y) = ℓ(pred(w, x), y) convex for both positive and negative labels, we must essentially make the prediction function pred(w, x) both convex and concave in w, i.e. linear. And so stochastic (or online) convex optimization problems that correspond to supervised problems are often generalized linear problems.

To summarize, although there is no contradiction to the work of Vapnik [Vap95] or of Alon et al. [ABCH97], we see that learning in the General Setting is more complex than we perhaps appreciated. Empirical minimization might be consistent without local uniform convergence, and, more surprisingly, learning might be possible, but not by empirical minimization (Figure 2).

Acknowledgments

We would like to thank Leon Bottou, Tong Zhang, and Vladimir Vapnik for helpful discussions.

References

[ABCH97] N. Alon, S. Ben-David, N. Cesa-Bianchi, and D. Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. J. ACM, 44(4), 1997.
[BE02] O. Bousquet and A. Elisseeff. Stability and generalization. J. Mach. Learn. Res., 2:499–526, 2002.
[CCG04] N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9), September 2004.
[HKKA06] E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization. In Proceedings of the Nineteenth Annual Conference on Computational Learning Theory, 2006.
[HKLW91] David Haussler, Michael Kearns, Nick Littlestone, and Manfred K. Warmuth. Equivalence of models for polynomial learnability. Information and Computation, 95(2):129–161, December 1991.
[KT08] S. M. Kakade and A. Tewari. On the generalization ability of online strongly convex programming algorithms. In NIPS, 2008.
[Pol84] D. Pollard. Convergence of Stochastic Processes. Springer, New York, 1984.
[RMP05] S. Rakhlin, S. Mukherjee, and T. Poggio. Stability results in learning theory. Analysis and Applications, 3(4), 2005.
[Sha07] S. Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, The Hebrew University, 2007.
[SSS08] K. Sridharan, N. Srebro, and S. Shalev-Shwartz. Fast rates for regularized objectives. In Advances in Neural Information Processing Systems, 2008.
[Vap95] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
[Vap98] V. N. Vapnik. Statistical Learning Theory.
Wiley, 1998.
[VG05] J. L. Verger-Gaugry. Covering a ball with smaller equal balls in R^n. Discrete Comput. Geom., 33(1), 2005.
[vLB04] U. von Luxburg and O. Bousquet. Distance-based classification with Lipschitz functions. J. Mach. Learn. Res., 5, 2004.
[Z03] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, 2003.

A High Confidence Bounds

The bounds in Theorems 6 and 7 have polynomial, rather than logarithmic, dependence on the confidence parameter δ. This leads to the question of whether these bounds can be improved to depend just on log(1/δ), matching the dependence on δ in the online-to-batch guarantees (7) and (8). While we suspect this might be the case, the question remains open. We emphasize that the question here pertains to the bound on the convergence of the empirical minimizer. The online-to-batch guarantees apply only to a specific, procedurally defined predictor, which is not the empirical minimizer. Another simple way to achieve a logarithmic dependence on 1/δ is to use empirical minimization combined with a generic boosting-the-confidence method [HKLW91]. Again, this leads to a high-confidence bound for a different learning rule, one that is based on the empirical minimizer but is not the empirical minimizer itself.

As for results regarding the empirical minimizer itself, we note that it is possible to get high-confidence bounds with only a logarithmic dependence on 1/δ. However, these bounds come at the price of a worse dependence on the other parameters of the learning problem. For instance, if F(w) is twice continuously differentiable, with a uniform upper bound μ_max on the eigenvalues of its Hessian, and the conditions of Theorem 6 hold, we get that with probability at least 1 − δ:

F(ŵ) − F(w*) ≤ O( L² μ_max log(1/δ) / (λ² n) ).   (31)

Also, under the conditions of Theorem 6 and without any additional assumption, Bousquet and Elisseeff [BE02] provide arguments for a bound of the form

F(ŵ) − F(w*) ≤ O( (L²/λ) √(log(1/δ)/n) ).   (32)

Unfortunately, neither of these two bounds is sufficient for obtaining a version of Theorem 7 which matches the online-to-batch guarantee (8), or the bound of Theorem 1 for the generalized linear case.
Optimizing for the value of λ as a function of the sample size n, we get that the bound on the unregularized objective function in Theorem 7 is replaced by

F(ŵ) − F(w*) ≤ O( ( B⁴ L² μ_max log(1/δ) / n )^{1/3} )

if we use (31), or

F(ŵ) − F(w*) ≤ O( ( B⁴ L⁴ log(1/δ) / n )^{1/4} )

if we use (32). In particular, the dependence on the sample size n is significantly worse than 1/√n.
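The powers of n in these two rates come from balancing the regularization penalty against the high-probability bounds. The following is our own reconstruction of the calculation for (31), with constants and lower-order terms suppressed, not a derivation taken verbatim from the text:

```latex
% Total suboptimality on the unregularized objective: regularization
% penalty plus the high-probability bound (31).
\[
  g(\lambda) \;=\; \lambda B^2 \;+\; \frac{L^2 \mu_{\max} \log(1/\delta)}{\lambda^2 n},
  \qquad
  g'(\lambda) \;=\; B^2 \;-\; \frac{2\,L^2 \mu_{\max} \log(1/\delta)}{\lambda^3 n} \;=\; 0
\]
\[
  \Longrightarrow\quad
  \lambda \;=\; \Theta\!\left(\Bigl(\tfrac{L^2 \mu_{\max} \log(1/\delta)}{B^2 n}\Bigr)^{1/3}\right),
  \qquad
  g(\lambda) \;=\; O\!\left(\Bigl(\tfrac{B^4 L^2 \mu_{\max} \log(1/\delta)}{n}\Bigr)^{1/3}\right).
\]
% The same balance with the bound (32), i.e. (L^2/\lambda)\sqrt{\log(1/\delta)/n},
% gives \lambda = \Theta\bigl((L/B)(\log(1/\delta)/n)^{1/4}\bigr) and the 1/4-power rate.
```

Plugging the optimal λ back in, both terms of g(λ) equal (B⁴ · L² μ_max log(1/δ)/n)^{1/3}, matching the first displayed rate.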
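The failure of norm-constrained empirical minimization discussed in Section 6.1 can also be seen numerically. The sketch below uses a construction in the spirit of the example f^(1) (our reading of it: f(w; α) = ‖α ⊙ w‖ with x = 0 a.s. and α uniform on {0,1}^d); the dimension, sample size, and radius are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 8, 4096, 1.0  # d >> 2**n, so the sample "misses" some coordinate w.h.p.

# alpha_i uniform on {0,1}^d, x = 0 a.s.; objective f(w; alpha) = ||alpha * w||.
alphas = rng.integers(0, 2, size=(n, d)).astype(float)

# With high probability there is a coordinate j equal to 0 in every sample.
unseen = np.flatnonzero(alphas.sum(axis=0) == 0)
assert unseen.size > 0, "increase d: no coordinate was missed by the sample"
j = unseen[0]

# This w lies in W_r (its norm is exactly r) and has zero empirical risk,
# yet its population risk is r/2, independent of the sample size.
w_bad = np.zeros(d)
w_bad[j] = r
empirical_risk = np.mean([np.linalg.norm(a * w_bad) for a in alphas])
population_risk = r * 0.5  # E||alpha * w_bad|| = r * P(alpha_j = 1) = r/2

print(empirical_risk, population_risk)  # 0.0 vs 0.5: no fixed r > 0 helps
```

Since F(0) = 0 is optimal here, the empirical minimizer over W_r can be suboptimal by r/2 no matter how small r is; and if instead x = e_1 a.s., any r < 1 excludes the true minimizer — the dichotomy the text describes.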