SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives


SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

Aaron Defazio, Francis Bach, Simon Lacoste-Julien

To cite this version: Aaron Defazio, Francis Bach, Simon Lacoste-Julien. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives. Advances In Neural Information Processing Systems, Nov 2014, Montreal, Canada. HAL Id: hal, version 3, submitted on 12 Nov 2014.

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

Aaron Defazio
Ambiata, Australian National University, Canberra

Francis Bach
INRIA - Sierra Project-Team, École Normale Supérieure, Paris, France

Simon Lacoste-Julien
INRIA - Sierra Project-Team, École Normale Supérieure, Paris, France

(The first author completed this work while under funding from NICTA. This work was partially supported by the MSR-Inria Joint Centre and a grant by the European Research Council (SIERRA project).)

Abstract

In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates. SAGA improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser. Unlike SDCA, SAGA supports non-strongly convex problems directly, and is adaptive to any inherent strong convexity of the problem. We give experimental results showing the effectiveness of our method.

1 Introduction

Remarkably, recent advances [1, 2] have shown that it is possible to minimise strongly convex finite sums provably faster in expectation than is possible without the finite sum structure. This is significant for machine learning problems as a finite sum structure is common in the empirical risk minimisation setting. The requirement of strong convexity is likewise satisfied in machine learning problems in the typical case where a quadratic regulariser is used.

In particular, we are interested in minimising functions of the form

    f(x) = (1/n) Σ_{i=1}^{n} f_i(x),

where x ∈ R^d, each f_i is convex and has Lipschitz continuous derivatives with constant L. We will also consider the case where each f_i is strongly convex with constant µ, and the composite (or proximal) case where an additional regularisation function is added:

    F(x) = f(x) + h(x),

where h: R^d → R is convex but potentially non-differentiable, and where the proximal operation of h is easy to compute; few incremental gradient methods are applicable in this setting [3, 4].

Our contributions are as follows. In Section 2 we describe the SAGA algorithm, a novel incremental gradient method. In Section 5 we prove theoretical convergence rates for SAGA in the strongly convex case better than those for SAG [1] and SVRG [5], and a factor of 2 from the SDCA [2] convergence rates. These rates also hold in the composite setting.

Additionally, we show that, like SAG but unlike SDCA, our method is applicable to non-strongly convex problems without modification. We establish theoretical convergence rates for this case also. In Section 3 we discuss the relation between each of the fast incremental gradient methods, showing that each stems from a very small modification of another.

2 SAGA Algorithm

We start with some known initial vector x^0 ∈ R^d and known derivatives f_i'(φ_i^0) ∈ R^d with φ_i^0 = x^0 for each i. These derivatives are stored in a table data-structure of length n, or alternatively an n × d matrix. For many problems of interest, such as binary classification and least-squares, only a single floating point value instead of a full gradient vector needs to be stored (see Section 4). SAGA is inspired both from SAG [1] and SVRG [5] (as we will discuss in Section 3). SAGA uses a step size of γ and makes the following updates, starting with k = 0:

SAGA Algorithm: Given the value of x^k and of each f_i'(φ_i^k) at the end of iteration k, the updates for iteration k+1 are as follows:

1. Pick a j uniformly at random.
2. Take φ_j^{k+1} = x^k, and store f_j'(φ_j^{k+1}) in the table. All other entries in the table remain unchanged. The quantity φ_j^{k+1} is not explicitly stored.
3. Update x using f_j'(φ_j^{k+1}), f_j'(φ_j^k) and the table average:

    w^{k+1} = x^k − γ [ f_j'(φ_j^{k+1}) − f_j'(φ_j^k) + (1/n) Σ_{i=1}^n f_i'(φ_i^k) ],    (1)

    x^{k+1} = prox_γ^h ( w^{k+1} ).    (2)

The proximal operator we use above is defined as

    prox_γ^h (y) := argmin_{x ∈ R^d} { h(x) + (1/(2γ)) ||x − y||^2 }.    (3)

In the strongly convex case, when a step size of γ = 1/(2(µn + L)) is chosen, we have the following convergence rate in the composite and hence also in the non-composite case:

    E||x^k − x^*||^2 ≤ ( 1 − µ/(2(µn + L)) )^k [ ||x^0 − x^*||^2 + (n/(µn + L)) ( f(x^0) − ⟨f'(x^*), x^0 − x^*⟩ − f(x^*) ) ].

We prove this result in Section 5. The requirement of strong convexity can be relaxed from needing to hold for each f_i to just holding on average, but at the expense of a worse geometric rate (1 − µ/(6(µn + L))), requiring a step size of γ = 1/(3(µn + L)).

In the non-strongly convex case, we have established the convergence rate in terms of the average iterate, excluding step 0: x̄^k = (1/k) Σ_{t=1}^k x^t. Using a step size of γ = 1/(3L) we have

    E[ F(x̄^k) ] − F(x^*) ≤ (4n/k) [ (2L/n) ||x^0 − x^*||^2 + f(x^0) − ⟨f'(x^*), x^0 − x^*⟩ − f(x^*) ].

This result is proved in the supplementary material. Importantly, when this step size γ = 1/(3L) is used, our algorithm automatically adapts to the level of strong convexity µ > 0 naturally present, giving a convergence rate of (see the comment at the end of the proof of Theorem 1):

    E||x^k − x^*||^2 ≤ ( 1 − min{ 1/(4n), µ/(3L) } )^k [ ||x^0 − x^*||^2 + (2n/(3L)) ( f(x^0) − ⟨f'(x^*), x^0 − x^*⟩ − f(x^*) ) ].

Although any incremental gradient method can be applied to non-strongly convex problems via the addition of a small quadratic regularisation, the amount of regularisation is an additional tunable parameter which our method avoids.
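To make the update concrete, the following is a minimal NumPy sketch of steps 1-3 and equations (1)-(2) for dense gradients. It is an illustrative reading of the algorithm rather than the authors' implementation, and the names grad_i, prox_h and n_steps are ours; the table average is maintained incrementally so each step costs O(d) plus one prox evaluation.

    import numpy as np

    def saga(grad_i, prox_h, x0, n, gamma, n_steps, rng=np.random.default_rng(0)):
        # grad_i(i, x) returns f_i'(x); prox_h(gamma, y) returns prox_gamma^h(y).
        x = x0.copy()
        table = np.array([grad_i(i, x0) for i in range(n)])   # stored f_i'(phi_i), phi_i^0 = x^0
        table_avg = table.mean(axis=0)
        for _ in range(n_steps):
            j = rng.integers(n)                               # step 1
            g_new = grad_i(j, x)                              # step 2: f_j'(phi_j^{k+1}), phi_j^{k+1} = x^k
            w = x - gamma * (g_new - table[j] + table_avg)    # equation (1)
            x = prox_h(gamma, w)                              # equation (2)
            table_avg += (g_new - table[j]) / n               # keep the table average current
            table[j] = g_new
        return x

For an L1 regulariser h(x) = lam * ||x||_1 (lam being a chosen constant of ours), prox_h is the usual soft-thresholding, e.g. prox_h = lambda gamma, y: np.sign(y) * np.maximum(np.abs(y) - gamma * lam, 0.0).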

3 Related Work

We explore the relationship between SAGA and the other fast incremental gradient methods in this section. By using SAGA as a midpoint, we are able to provide a more unified view than is available in the existing literature. A brief summary of the properties of each method considered in this section is given in Figure 1. The method from [3], which handles the non-composite setting, is not listed as its rate is of the slow type and can be up to n times smaller than the one for SAGA or SVRG [5].

[Figure 1: Basic summary of method properties. Columns: SAGA, SAG, SDCA, SVRG, FINITO. Rows: Strongly Convex (SC); Convex, Non-SC*; Prox Reg.; Non-smooth; Low Storage Cost; Simple(-ish) Proof; Adaptive to SC. Question marks denote unproven, but not experimentally ruled out, cases. (*) Note that any method can be applied to non-strongly convex problems by adding a small amount of L2 regularisation; this row describes methods that do not require this trick.]

SAGA: midpoint between SAG and SVRG/S2GD

In [5], the authors make the observation that the variance of the standard stochastic gradient (SGD) update direction can only go to zero if decreasing step sizes are used, thus preventing a linear convergence rate unlike for batch gradient descent. They thus propose to use a variance reduction approach (see [7] and references therein for example) on the SGD update in order to be able to use constant step sizes and get a linear convergence rate. We present the updates of their method called SVRG (Stochastic Variance Reduced Gradient) in (6) below, comparing it with the non-composite form of SAGA rewritten in (5). They also mention that SAG (Stochastic Average Gradient) [1] can be interpreted as reducing the variance, though they do not provide the specifics. Here, we make this connection clearer and relate it to SAGA.

We first review a slightly more generalized version of the variance reduction approach (we allow the updates to be biased). Suppose that we want to use Monte Carlo samples to estimate E[X] and that we can compute efficiently E[Y] for another random variable Y that is highly correlated with X. One variance reduction approach is to use the following estimator θ_α as an approximation to E[X]: θ_α := α(X − Y) + E[Y], for a step size α ∈ [0, 1]. We have that E[θ_α] is a convex combination of E[X] and E[Y]: E[θ_α] = αE[X] + (1 − α)E[Y]. The standard variance reduction approach uses α = 1 and the estimate is unbiased: E[θ_1] = E[X]. The variance of θ_α is:

    Var(θ_α) = α^2 [ Var(X) + Var(Y) − 2 Cov(X, Y) ],

and so if Cov(X, Y) is big enough, the variance of θ_α is reduced compared to X, giving the method its name. By varying α from 0 to 1, we increase the variance of θ_α towards its maximum value (which usually is still smaller than the one for X) while decreasing its bias towards zero.

Both SAGA and SAG can be derived from such a variance reduction viewpoint: here X is the SGD direction sample f_j'(x^k), whereas Y is a past stored gradient f_j'(φ_j^k). SAG is obtained by using α = 1/n (update rewritten in our notation in (4)), whereas SAGA is the unbiased version with α = 1 (see (5) below). For the same φ's, the variance of the SAG update is 1/n^2 times the one of SAGA, but at the expense of having a non-zero bias. This non-zero bias might explain the complexity of the convergence proof of SAG and why the theory has not yet been extended to proximal operators. By using an unbiased update in SAGA, we are able to obtain a simple and tight theory, with better constants than SAG, as well as theoretical rates for the use of proximal operators.

    (SAG)    x^{k+1} = x^k − γ [ (f_j'(x^k) − f_j'(φ_j^k))/n + (1/n) Σ_{i=1}^n f_i'(φ_i^k) ],    (4)
    (SAGA)   x^{k+1} = x^k − γ [ f_j'(x^k) − f_j'(φ_j^k) + (1/n) Σ_{i=1}^n f_i'(φ_i^k) ],    (5)
    (SVRG)   x^{k+1} = x^k − γ [ f_j'(x^k) − f_j'(x̃) + (1/n) Σ_{i=1}^n f_i'(x̃) ].    (6)

The SVRG update (6) is obtained by using Y = f_j'(x̃) with α = 1 (and is thus unbiased; we note that SAG is the only method that we present in the related work that has a biased update direction). The vector x̃ is not updated every step, but rather the loop over k appears inside an outer loop, where x̃ is updated at the start of each outer iteration. Essentially SAGA is at the midpoint between SVRG and SAG; it updates the φ_j value each time index j is picked, whereas SVRG updates all of the φ's as a batch. The S2GD method [8] has the same update as SVRG, just differing in how the number of inner loop iterations is chosen. We use SVRG henceforth to refer to both methods.
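As a quick illustration of the biased estimator θ_α described above, the following small Monte Carlo check (our own, not from the paper; the variables and constants are arbitrary) shows the bias growing and the variance shrinking as α decreases from 1.

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.normal(size=200_000)
    X = z + 1.0 + rng.normal(scale=0.3, size=z.size)   # E[X] = 1, Var(X) about 1.09
    Y = z                                              # highly correlated with X, E[Y] = 0 known exactly
    for alpha in (1.0, 0.5, 0.1):
        theta = alpha * (X - Y) + 0.0                  # theta_alpha = alpha*(X - Y) + E[Y]
        # the mean drifts from E[X] = 1 towards E[Y] = 0; the variance scales as alpha^2 * Var(X - Y)
        print(alpha, theta.mean(), theta.var())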

SVRG makes a trade-off between time and space. For the equivalent practical convergence rate it makes 2x-3x more gradient evaluations, but in doing so it does not need to store a table of gradients, only a single average gradient. The usage of SAG vs. SVRG is problem dependent. For example for linear predictors where gradients can be stored as a reduced vector of dimension p−1 for p classes, SAGA is preferred over SVRG both theoretically and in practice. For neural networks, where no theory is available for either method, the storage of gradients is generally more expensive than the additional backpropagations, but this is computer architecture dependent.

SVRG also has an additional parameter besides step size that needs to be set, namely the number of iterations per inner loop (m). This parameter can be set via the theory, or conservatively as m = n, however doing so does not give anywhere near the best practical performance. Having to tune one parameter instead of two is a practical advantage for SAGA.

Finito/MISOµ

To make the relationship with other prior methods more apparent, we can rewrite the SAGA algorithm (in the non-composite case) in terms of an additional intermediate quantity u^k, with u^0 := x^0 + γ Σ_{i=1}^n f_i'(x^0), in addition to the usual x^k iterate as described previously:

SAGA: Equivalent reformulation for non-composite case: Given the value of u^k and of each f_i'(φ_i^k) at the end of iteration k, the updates for iteration k+1 are as follows:

1. Calculate x^k:

    x^k = u^k − γ Σ_{i=1}^n f_i'(φ_i^k).    (7)

2. Update u with u^{k+1} = u^k + (1/n)(x^k − u^k).
3. Pick a j uniformly at random.
4. Take φ_j^{k+1} = x^k, and store f_j'(φ_j^{k+1}) in the table replacing f_j'(φ_j^k). All other entries in the table remain unchanged. The quantity φ_j^{k+1} is not explicitly stored.

Eliminating u^k recovers the update (5) for x^k. We now describe how the Finito [9] and MISOµ [10] methods are closely related to SAGA. Both Finito and MISOµ use updates of the following form, for a step length γ:

    x^{k+1} = (1/n) Σ_{i=1}^n φ_i^k − γ Σ_{i=1}^n f_i'(φ_i^k).    (8)

The step size used is of the order of 1/(µn). To simplify the discussion of this algorithm we will introduce the notation φ̄ = (1/n) Σ_i φ_i^k.

SAGA can be interpreted as Finito, but with the quantity φ̄ replaced with u, which is updated in the same way as φ̄, but in expectation. To see this, consider how φ̄ changes in expectation:

    E[ φ̄^{k+1} ] = E[ φ̄^k + (1/n)(x^k − φ_j^k) ] = φ̄^k + (1/n)(x^k − φ̄^k).

The update is identical in expectation to the update for u, u^{k+1} = u^k + (1/n)(x^k − u^k).

There are three advantages of SAGA over Finito/MISOµ. SAGA does not require strong convexity to work, it has support for proximal operators, and it does not require storing the φ values. MISO has proven support for proximal operators only in the case where impractically small step sizes are used [10]. The big advantage of Finito/MISOµ is that when using a per-pass re-permuted access ordering, empirical speed-ups of up to a factor of 2x have been observed. This access order can also be used with the other methods discussed, but with smaller empirical speed-ups. Finito/MISOµ is particularly useful when f_i is computationally expensive to compute compared to the extra storage costs required over the other methods.
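The reformulation above can be transcribed directly. The following is a small sketch (ours, with illustrative names) of steps 1-4 and equation (7); eliminating u recovers the direct update (5), as noted above. The full table sum is recomputed each step for clarity rather than efficiency.

    import numpy as np

    def saga_u_form(grad_i, x0, n, gamma, n_steps, rng=np.random.default_rng(0)):
        table = np.array([grad_i(i, x0) for i in range(n)])   # f_i'(phi_i^0) with phi_i^0 = x^0
        u = x0 + gamma * table.sum(axis=0)                    # u^0 := x^0 + gamma * sum_i f_i'(x^0)
        x = x0.copy()
        for _ in range(n_steps):
            x = u - gamma * table.sum(axis=0)                 # step 1, equation (7)
            u = u + (x - u) / n                               # step 2
            j = rng.integers(n)                               # step 3
            table[j] = grad_i(j, x)                           # step 4
        # x holds the iterate computed in the last completed iteration
        return x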

SDCA

The Stochastic Dual Coordinate Ascent (SDCA) [2] method on the surface appears quite different from the other methods considered. It works with the convex conjugates of the f_i functions. However, in this section we show a novel transformation of SDCA to an equivalent method that only works with primal quantities, and is closely related to the MISOµ method.

Consider the following algorithm:

SDCA algorithm in the primal. Step k+1:

1. Pick an index j uniformly at random.
2. Compute φ_j^{k+1} = prox_γ^{f_j}(z), where γ = 1/(µn) and z = −γ Σ_{i≠j} f_i'(φ_i^k).
3. Store the gradient f_j'(φ_j^{k+1}) = (1/γ)(z − φ_j^{k+1}) in the table at location j. For i ≠ j, the table entries are unchanged (f_i'(φ_i^{k+1}) = f_i'(φ_i^k)).

At completion, return x^k = −γ Σ_i f_i'(φ_i^k).

We claim that this algorithm is equivalent to the version of SDCA where exact block-coordinate maximisation is used on the dual (more precisely, to Option I of Prox-SDCA as described in [11, Figure 1]; we will simply refer to this method as SDCA in this paper for brevity). Firstly, note that while SDCA was originally described for one-dimensional outputs (binary classification or regression), it has been expanded to cover the multiclass predictor case [11] (called Prox-SDCA there). In this case, the primal objective has a separate strongly convex regulariser, and the functions f_i are restricted to the form f_i(x) := ψ_i(X_i^T x), where X_i is a d × p feature matrix, and ψ_i is the loss function that takes a p-dimensional input, for p classes. To stay in the same general setting as the other incremental gradient methods, we work directly with the f_i(x) functions rather than the more structured ψ_i(X_i^T x). The dual objective to maximise then becomes

    D(α) = − (µ/2) || (1/(µn)) Σ_{i=1}^n α_i ||^2 − (1/n) Σ_{i=1}^n f_i^*(−α_i),

where the α_i's are d-dimensional dual variables. Generalising the exact block-coordinate maximisation update that SDCA performs to this form, we get the dual update for block j (with x^k the current primal iterate):

    α_j^{k+1} = α_j^k + argmax_{Δα_j ∈ R^d} { − f_j^*( −(α_j^k + Δα_j) ) − (µn/2) || x^k + (1/(µn)) Δα_j ||^2 }.    (9)

In the special case where f_i(x) = ψ_i(X_i^T x), we can see that (9) gives exactly the same update as Option I of Prox-SDCA [11, Figure 1], which operates instead on the equivalent p-dimensional dual variables α̃_i, with the relationship that α_i = X_i α̃_i (this is because f_i^*(α_i) = inf over α̃_i such that α_i = X_i α̃_i of ψ_i^*(α̃_i)). As noted by Shalev-Shwartz & Zhang [11], the update (9) is actually an instance of the proximal operator of the convex conjugate of f_j. Our primal formulation exploits this fact by using a relation between the proximal operator of a function and its convex conjugate known as the Moreau decomposition:

    prox_{f^*}(v) = v − prox_f(v).

This decomposition allows us to compute the proximal operator of the conjugate via the primal proximal operator. As this is the only use in the basic SDCA method of the conjugate function, applying this decomposition allows us to completely eliminate the dual aspect of the algorithm, yielding the above primal form of SDCA. The dual variables are related to the primal representatives φ_i's through α_i = −f_i'(φ_i). The KKT conditions ensure that if the α_i values are dual optimal then x^k = γ Σ_i α_i as defined above is primal optimal. The same trick is commonly used to interpret Dijkstra's set intersection as a primal algorithm instead of a dual block coordinate descent algorithm [12].

The primal form of SDCA differs from the other incremental gradient methods described in this section in that it assumes strong convexity is induced by a separate strongly convex regulariser, rather than each f_i being strongly convex. In fact, SDCA can be modified to work without a separate regulariser, giving a method that is at the midpoint between Finito and SDCA. We detail such a method in the supplementary material.
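The primal form of SDCA above is straightforward to transcribe once a prox oracle for each f_j is available. The following is a minimal sketch (ours; prox_fj, g_table and the other names are illustrative), keeping the stored gradients f_i'(φ_i) and their running sum.

    import numpy as np

    def sdca_primal(prox_fj, g_table, mu, n, n_steps, rng=np.random.default_rng(0)):
        # g_table[i] holds f_i'(phi_i); prox_fj(gamma, z, j) returns prox_gamma^{f_j}(z).
        gamma = 1.0 / (mu * n)
        g_sum = g_table.sum(axis=0)
        for _ in range(n_steps):
            j = rng.integers(n)                    # step 1
            z = -gamma * (g_sum - g_table[j])      # z = -gamma * sum_{i != j} f_i'(phi_i^k)
            phi_j = prox_fj(gamma, z, j)           # step 2
            g_new = (z - phi_j) / gamma            # step 3: stored gradient f_j'(phi_j^{k+1})
            g_sum += g_new - g_table[j]
            g_table[j] = g_new
        return -gamma * g_sum                      # x^k = -gamma * sum_i f_i'(phi_i^k)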

SDCA variants

The SDCA theory has been expanded to cover a number of other methods of performing the coordinate step [11]. These variants replace the proximal operation in our primal interpretation in the previous section with an update where φ_j^{k+1} is chosen so that:

    f_j'(φ_j^{k+1}) = (1 − β) f_j'(φ_j^k) + β f_j'(x^k),    where x^k = −(1/(µn)) Σ_i f_i'(φ_i^k).

The variants differ in how β ∈ [0, 1] is chosen. Note that φ_j^{k+1} does not actually have to be explicitly known, just the gradient f_j'(φ_j^{k+1}), which is the result of the above interpolation. Variant 5 by Shalev-Shwartz & Zhang [11] does not require operations on the conjugate function, it simply uses β = µn/(L + µn). The most practical variant performs a line search involving the convex conjugate to determine β. As far as we are aware, there is no simple primal equivalent of this line search. So in cases where we cannot compute the proximal operator from the standard SDCA variant, we can either introduce a tuneable parameter into the algorithm (β), or use a dual line search, which requires an efficient way to evaluate the convex conjugates of each f_i.

4 Implementation

We briefly discuss some implementation concerns:

- For many problems each derivative f_i' is just a simple weighting of the i-th data vector. Logistic regression and least squares have this property. In such cases, instead of storing the full derivative f_i' for each i, we need only to store the weighting constants. This reduces the storage requirements to be the same as the SDCA method in practice. A similar trick can be applied to multi-class classifiers with p classes by storing p−1 values for each i (a sketch of the binary logistic regression case is given below).
- Our algorithm assumes that initial gradients are known for each f_i at the starting point x^0. Instead, a heuristic may be used where during the first pass, data-points are introduced one-by-one, in a non-randomized order, with averages computed in terms of those data-points processed so far. This procedure has been successfully used with SAG [1].
- The SAGA update as stated is slower than necessary when derivatives are sparse. A just-in-time updating of u or x may be performed just as is suggested for SAG [1], which ensures that only sparse updates are done at each iteration.
- We give the form of SAGA for the case where each f_i is strongly convex. However in practice we usually have only convex f_i, with strong convexity in f induced by the addition of a quadratic regulariser. This quadratic regulariser may be split amongst the f_i functions evenly, to satisfy our assumptions. It is perhaps easier to use a variant of SAGA where the regulariser (µ/2)||x||^2 is explicit, such as the following modification of Equation (5):

      x^{k+1} = (1 − γµ) x^k − γ [ f_j'(x^k) − f_j'(φ_j^k) + (1/n) Σ_i f_i'(φ_i^k) ].

  For sparse implementations, instead of scaling x^k at each step, a separate scaling constant β^k may be scaled instead, with β^k x^k being used in place of x^k. This is a standard trick used with stochastic gradient methods.

For sparse problems with a quadratic regulariser the just-in-time updating can be a little intricate. In the supplementary material we provide example python code showing a correct implementation that uses each of the above tricks.
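As an illustration of the first and last points above, the following sketch (ours; dense updates for brevity, illustrative names) runs SAGA on l2-regularised logistic regression with labels in {−1, +1}, storing one scalar per data point instead of a full gradient vector and using the explicit-regulariser form of the update.

    import numpy as np

    def saga_logistic(A, b, mu, gamma, n_epochs, rng=np.random.default_rng(0)):
        n, d = A.shape
        x = np.zeros(d)
        # f_i(x) = log(1 + exp(-b_i a_i^T x)), so f_i'(x) = s_i(x) * a_i with the scalar
        # weight s_i(x) = -b_i / (1 + exp(b_i a_i^T x)); only the scalars s_i are stored.
        s = -0.5 * b                                   # s_i at phi_i^0 = 0
        g_avg = A.T @ s / n                            # table average (1/n) sum_i s_i a_i
        for _ in range(n_epochs * n):
            j = rng.integers(n)
            s_new = -b[j] / (1.0 + np.exp(b[j] * (A[j] @ x)))
            direction = (s_new - s[j]) * A[j] + g_avg
            x = (1.0 - gamma * mu) * x - gamma * direction   # explicit-regulariser variant of (5)
            g_avg += (s_new - s[j]) * A[j] / n
            s[j] = s_new
        return x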

5 Theory

In this section, all expectations are taken with respect to the choice of j at iteration k+1 and conditioned on x^k and each f_i'(φ_i^k) unless stated otherwise. We start with two basic lemmas that just state properties of convex functions, followed by Lemma 3, which is specific to our algorithm. The proofs of each of these lemmas are in the supplementary material.

Lemma 1. Let f(x) = (1/n) Σ_{i=1}^n f_i(x). Suppose each f_i is µ-strongly convex and has Lipschitz continuous gradients with constant L. Then for all x and x^*:

    ⟨f'(x), x^* − x⟩ ≤ ((L − µ)/L) [ f(x^*) − f(x) ] − (µ/2) ||x^* − x||^2
                      − (1/(2Ln)) Σ_i ||f_i'(x^*) − f_i'(x)||^2 − (µ/L) ⟨f'(x^*), x − x^*⟩.

Lemma 2. We have that for all φ_i and x^*:

    (1/n) Σ_i ||f_i'(φ_i) − f_i'(x^*)||^2 ≤ 2L [ (1/n) Σ_i f_i(φ_i) − f(x^*) − (1/n) Σ_i ⟨f_i'(x^*), φ_i − x^*⟩ ].

Lemma 3. It holds that for any φ_i^k, x^*, x^k and β > 0, with w^{k+1} as defined in Equation (1):

    E||w^{k+1} − x^k + γ f'(x^*)||^2 ≤ γ^2 (1 + β^{-1}) E||f_j'(φ_j^k) − f_j'(x^*)||^2
        + γ^2 (1 + β) E||f_j'(x^k) − f_j'(x^*)||^2 − γ^2 β ||f'(x^k) − f'(x^*)||^2.

Theorem 1. With x^* the optimal solution, define the Lyapunov function T as:

    T^k := T(x^k, {φ_i^k}_{i=1}^n) := (1/n) Σ_i f_i(φ_i^k) − f(x^*) − (1/n) Σ_i ⟨f_i'(x^*), φ_i^k − x^*⟩ + c ||x^k − x^*||^2.

Then with γ = 1/(2(µn + L)), c = 1/(2γ(1 − γµ)n), and κ = 1/(γµ), we have the following expected change in the Lyapunov function between steps of the SAGA algorithm (conditional on T^k):

    E[T^{k+1}] ≤ (1 − 1/κ) T^k.

Proof. The first three terms in T^{k+1} are straightforward to simplify:

    E[ (1/n) Σ_i f_i(φ_i^{k+1}) ] = (1/n) f(x^k) + (1 − 1/n) (1/n) Σ_i f_i(φ_i^k),

    E[ − (1/n) Σ_i ⟨f_i'(x^*), φ_i^{k+1} − x^*⟩ ] = − (1/n) ⟨f'(x^*), x^k − x^*⟩ − (1 − 1/n) (1/n) Σ_i ⟨f_i'(x^*), φ_i^k − x^*⟩.

For the change in the last term of T^{k+1}, we apply the non-expansiveness of the proximal operator (the first inequality below is the only place in the proof where we use the fact that x^* is an optimality point):

    c ||x^{k+1} − x^*||^2 = c ||prox_γ(w^{k+1}) − prox_γ(x^* − γ f'(x^*))||^2 ≤ c ||w^{k+1} − x^* + γ f'(x^*)||^2.

We expand the quadratic and apply E[w^{k+1}] = x^k − γ f'(x^k) to simplify the inner product term:

    c E||w^{k+1} − x^* + γ f'(x^*)||^2
      = c E||x^k − x^* + w^{k+1} − x^k + γ f'(x^*)||^2
      = c ||x^k − x^*||^2 + 2c E⟨w^{k+1} − x^k + γ f'(x^*), x^k − x^*⟩ + c E||w^{k+1} − x^k + γ f'(x^*)||^2
      = c ||x^k − x^*||^2 − 2cγ ⟨f'(x^k) − f'(x^*), x^k − x^*⟩ + c E||w^{k+1} − x^k + γ f'(x^*)||^2
      ≤ c ||x^k − x^*||^2 − 2cγ ⟨f'(x^k), x^k − x^*⟩ + 2cγ ⟨f'(x^*), x^k − x^*⟩ − cγ^2 β ||f'(x^k) − f'(x^*)||^2
        + (1 + β^{-1}) cγ^2 E||f_j'(φ_j^k) − f_j'(x^*)||^2 + (1 + β) cγ^2 E||f_j'(x^k) − f_j'(x^*)||^2.    (Lemma 3)

The value of β shall be fixed later. Now we apply Lemma 1 to bound −2cγ ⟨f'(x^k), x^k − x^*⟩ and Lemma 2 to bound E||f_j'(φ_j^k) − f_j'(x^*)||^2:

    c E||x^{k+1} − x^*||^2 ≤ (c − cγµ) ||x^k − x^*||^2 + ( (1 + β) cγ^2 − cγ/L ) E||f_j'(x^k) − f_j'(x^*)||^2
        − (2cγ(L − µ)/L) [ f(x^k) − f(x^*) − ⟨f'(x^*), x^k − x^*⟩ ] − cγ^2 β ||f'(x^k) − f'(x^*)||^2
        + 2(1 + β^{-1}) cγ^2 L [ (1/n) Σ_i f_i(φ_i^k) − f(x^*) − (1/n) Σ_i ⟨f_i'(x^*), φ_i^k − x^*⟩ ].

We can now combine the bounds that we have derived for each term in T, and pull out a fraction 1/κ of T^k (for any κ at this point). Together with the inequality ||f'(x^k) − f'(x^*)||^2 ≥ 2µ [ f(x^k) − f(x^*) − ⟨f'(x^*), x^k − x^*⟩ ] [13], that yields:

    E[T^{k+1}] − T^k ≤ − (1/κ) T^k
        + ( 1/n − 2cγ(L − µ)/L − 2cγ^2 µβ ) [ f(x^k) − f(x^*) − ⟨f'(x^*), x^k − x^*⟩ ]
        + ( 1/κ + 2(1 + β^{-1}) cγ^2 L − 1/n ) [ (1/n) Σ_i f_i(φ_i^k) − f(x^*) − (1/n) Σ_i ⟨f_i'(x^*), φ_i^k − x^*⟩ ]
        + ( 1/κ − γµ ) c ||x^k − x^*||^2 + ( (1 + β)γ − 1/L ) cγ E||f_j'(x^k) − f_j'(x^*)||^2.    (10)

Note that each of the terms in square brackets is positive, and it can be readily verified that our assumed values for the constants (γ = 1/(2(µn + L)), c = 1/(2γ(1 − γµ)n), and κ = 1/(γµ)), together with β = (2µn + L)/L, ensure that each of the quantities in round brackets is non-positive (the constants were determined by setting all the round brackets to zero except the second one; see [14] for the details).

Adaptivity to strong convexity result: Note that when using the γ = 1/(3L) step size, the same c as above can be used with β = 2 and 1/κ = min{1/(4n), µ/(3L)} to ensure non-positive terms.

Corollary 1. Note that c ||x^k − x^*||^2 ≤ T^k, and therefore by chaining the expectations and plugging in the constants explicitly to simplify the expression, we get:

    E[ ||x^k − x^*||^2 ] ≤ ( 1 − µ/(2(µn + L)) )^k [ ||x^0 − x^*||^2 + (n/(µn + L)) ( f(x^0) − ⟨f'(x^*), x^0 − x^*⟩ − f(x^*) ) ].

Here the expectation is over all choices of index j_k up to step k.

[Figure 2: plots of function sub-optimality against gradient evaluations divided by n, for Finito (perm), Finito, SAGA, SVRG, SAG and SDCA. From left to right we have the MNIST, COVTYPE, IJCNN1 and MILLIONSONG datasets. Top row is the L2 regularised case, bottom row the L1 regularised case.]

6 Experiments

We performed a series of experiments to validate the effectiveness of SAGA. We tested a binary classifier on MNIST, COVTYPE, IJCNN1 and a least squares predictor on MILLIONSONG. Details of these datasets can be found in [9]. We used the same code base for each method, just changing the main update rule. SVRG was tested with the recalibration pass used every n iterations, as suggested in [8]. Each method had its step size parameter chosen so as to give the fastest convergence.

We tested with a L2 regulariser, which all methods support, and with a L1 regulariser on a subset of the methods. The results are shown in Figure 2. We can see that Finito (perm) performs the best on a per epoch equivalent basis, but it can be the most expensive method per step. SVRG is similarly fast on a per epoch basis, but when considering the number of gradient evaluations per epoch (which is double that of the other methods for this problem), it is middle of the pack. SAGA can be seen to perform similarly to the non-permuted Finito case, and to SDCA. Note that SAG is slower than the other methods at the beginning. To get the optimal results for SAG, an adaptive step size rule needs to be used rather than the constant step size we used. In general, these tests confirm that the choice of methods should be done based on their properties as discussed in Section 3, rather than their convergence rate.

References

[1] Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic average gradient. Technical report, INRIA.
[2] Shai Shalev-Shwartz and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. JMLR, 14.
[3] Paul Tseng and Sangwoo Yun. Incrementally updated gradient methods for constrained and regularized optimization. Journal of Optimization Theory and Applications, 160:832-853.
[4] Lin Xiao and Tong Zhang. A proximal stochastic gradient method with progressive variance reduction. Technical report, Microsoft Research, Redmond and Rutgers University, Piscataway, NJ.
[5] Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. NIPS.
[6] Taiji Suzuki. Stochastic dual coordinate ascent with alternating direction method of multipliers. Proceedings of the 31st International Conference on Machine Learning.
[7] Evan Greensmith, Peter L. Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. JMLR, 5.
[8] Jakub Konečný and Peter Richtárik. Semi-stochastic gradient descent methods. arXiv e-prints.
[9] Aaron Defazio, Tiberio Caetano, and Justin Domke. Finito: A faster, permutable incremental gradient method for big data problems. Proceedings of the 31st International Conference on Machine Learning.
[10] Julien Mairal. Incremental majorization-minimization optimization with application to large-scale machine learning. Technical report, INRIA Grenoble Rhône-Alpes / LJK Laboratoire Jean Kuntzmann.
[11] Shai Shalev-Shwartz and Tong Zhang. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. Technical report, The Hebrew University, Jerusalem and Rutgers University, NJ, USA.
[12] Patrick Combettes and Jean-Christophe Pesquet. Proximal splitting methods in signal processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer.
[13] Yu. Nesterov. Introductory Lectures on Convex Programming. Springer.
[14] Aaron Defazio. New Optimization Methods for Machine Learning. PhD thesis (draft under examination), Australian National University.

A The SDCA/Finito Midpoint Algorithm

Using Lagrangian duality theory, SDCA can be shown at step k as minimising the following lower bound:

    A^k(x) = (1/n) f_j(x) + (1/n) Σ_{i≠j} [ f_i(φ_i^k) + ⟨f_i'(φ_i^k), x − φ_i^k⟩ ] + (µ/2) ||x||^2.

Instead of directly including the regulariser in this bound, we can use the standard strong convexity lower bound for each f_i, by removing (µ/2)||x||^2 and changing the expression in the summation to f_i(φ_i^k) + ⟨f_i'(φ_i^k), x − φ_i^k⟩ + (µ/2)||x − φ_i||^2. The transformation to having strong convexity within the f_i functions yields the following simple modification to the algorithm:

    φ_j^{k+1} = prox_{1/(µ(n−1))}^{f_j} (z),    where: z = (1/(n−1)) Σ_{i≠j} φ_i^k − (1/(µ(n−1))) Σ_{i≠j} f_i'(φ_i^k).

It can be shown that after this update:

    x^{k+1} = φ_j^{k+1} = (1/n) Σ_i φ_i^{k+1} − (1/(µn)) Σ_i f_i'(φ_i^{k+1}).

Now the similarity to Finito is apparent if this equation is compared to Equation (8): x^{k+1} = (1/n) Σ_i φ_i^k − γ Σ_i f_i'(φ_i^k). The only difference is that the vectors on the right hand side of the equation are at their values at step k+1 instead of k. Note that there is a circular dependency here, as φ_j^{k+1} := x^{k+1} but φ_j^{k+1} appears in the definition of x^{k+1}. Solving the proximal operator is the resolution of the circular dependency.

This mid-point between Finito and SDCA is interesting in its own right, as it appears experimentally to have similar robustness to permuted orderings as Finito, but it has no tunable parameters, like SDCA.

When the proximal operator above is fast to compute, say on the same order as just evaluating f_j, then SDCA can be the best method among those discussed. It is a little slower than the other methods discussed here, but it has no tunable parameters at all. It is also the only choice when each f_i is not differentiable. The major disadvantage of SDCA is that it cannot handle non-strongly convex problems directly, although like most methods, adding a small amount of quadratic regularisation can be used to recover a convergence rate. It is also not adapted to use proximal operators for the regulariser in the composite objective case. The requirement of computing the proximal operator of each loss f_i initially appears to be a big disadvantage; however there are variants of SDCA that remove this requirement, but they introduce additional downsides.

B Lemmas

Lemma 4. Let f be µ-strongly convex and have Lipschitz continuous gradients with constant L. Then we have for all x and y:

    f(x) ≥ f(y) + ⟨f'(y), x − y⟩ + (1/(2(L − µ))) ||f'(x) − f'(y)||^2 + (µL/(2(L − µ))) ||y − x||^2
          + (µ/(L − µ)) ⟨f'(x) − f'(y), y − x⟩.

Proof. Define the function g as g(x) = f(x) − (µ/2)||x||^2. Then the gradient is g'(x) = f'(x) − µx. g has a Lipschitz gradient with constant L − µ. By convexity, we have [13]:

    g(x) ≥ g(y) + ⟨g'(y), x − y⟩ + (1/(2(L − µ))) ||g'(x) − g'(y)||^2.

Substituting the definition of g and g', and simplifying the terms gives the result.

Lemma 5. Let f(x) = (1/n) Σ_{i=1}^n f_i(x). Suppose each f_i is µ-strongly convex and has Lipschitz continuous gradients with constant L. Then for all x and x^*:

    ⟨f'(x), x^* − x⟩ ≤ ((L − µ)/L) [ f(x^*) − f(x) ] − (µ/2) ||x^* − x||^2
                      − (1/(2Ln)) Σ_i ||f_i'(x^*) − f_i'(x)||^2 − (µ/L) ⟨f'(x^*), x − x^*⟩.

Proof. This is a straightforward corollary of Lemma 4, using y = x^*, and averaging over the f_i functions.

Lemma 6. We have that for all φ_i and x^*:

    (1/n) Σ_i ||f_i'(φ_i) − f_i'(x^*)||^2 ≤ 2L [ (1/n) Σ_i f_i(φ_i) − f(x^*) − (1/n) Σ_i ⟨f_i'(x^*), φ_i − x^*⟩ ].

Proof. Apply the standard inequality f(y) ≥ f(x) + ⟨f'(x), y − x⟩ + (1/(2L)) ||f'(x) − f'(y)||^2, with y = φ_i and x = x^*, for each f_i, and sum.

Lemma 7. It holds that for any φ_i^k, x^*, x^k and β > 0, with w^{k+1} as defined in Equation (1):

    E||w^{k+1} − x^k + γ f'(x^*)||^2 ≤ γ^2 (1 + β^{-1}) E||f_j'(φ_j^k) − f_j'(x^*)||^2
        + γ^2 (1 + β) E||f_j'(x^k) − f_j'(x^*)||^2 − γ^2 β ||f'(x^k) − f'(x^*)||^2.

Proof. We follow a similar argument as occurs in the SVRG proof [5] for this term, but with a tighter argument. The tightening comes from using ||x + y||^2 ≤ (1 + β^{-1}) ||x||^2 + (1 + β) ||y||^2 instead of the simpler β = 1 case they use. The other key trick is the use of the standard variance decomposition E[||X − E[X]||^2] = E[||X||^2] − ||E[X]||^2 three times. Writing the update direction in terms of X := f_j'(φ_j^k) − f_j'(x^k), whose expectation is E[X] = (1/n) Σ_i f_i'(φ_i^k) − f'(x^k), we have:

    E||w^{k+1} − x^k + γ f'(x^*)||^2
      = E|| −(γ/n) Σ_i f_i'(φ_i^k) + γ f'(x^*) + γ ( f_j'(φ_j^k) − f_j'(x^k) ) ||^2
      = γ^2 E|| [ f_j'(φ_j^k) − f_j'(x^*) − ( (1/n) Σ_i f_i'(φ_i^k) − f'(x^*) ) ]
                 − [ f_j'(x^k) − f_j'(x^*) − ( f'(x^k) − f'(x^*) ) ] ||^2 + γ^2 ||f'(x^k) − f'(x^*)||^2
      ≤ γ^2 (1 + β^{-1}) E|| f_j'(φ_j^k) − f_j'(x^*) − ( (1/n) Σ_i f_i'(φ_i^k) − f'(x^*) ) ||^2
        + γ^2 (1 + β) E|| f_j'(x^k) − f_j'(x^*) − ( f'(x^k) − f'(x^*) ) ||^2 + γ^2 ||f'(x^k) − f'(x^*)||^2

(use the variance decomposition twice more):

      ≤ γ^2 (1 + β^{-1}) E||f_j'(φ_j^k) − f_j'(x^*)||^2 + γ^2 (1 + β) E||f_j'(x^k) − f_j'(x^*)||^2 − γ^2 β ||f'(x^k) − f'(x^*)||^2.

C Non-strongly-convex Problems

Theorem 2. When each f_i is convex, using γ = 1/(3L), we have for x̄^k = (1/k) Σ_{t=1}^k x^t that:

    E[ F(x̄^k) ] − F(x^*) ≤ (4n/k) [ (2L/n) ||x^0 − x^*||^2 + f(x^0) − ⟨f'(x^*), x^0 − x^*⟩ − f(x^*) ].

Here the expectation is over all choices of index j_k up to step k.

Proof. A more detailed version of this proof is available in [14]. We proceed by using a similar argument as in Theorem 1, but we add an additional α ||x^k − x^*||^2 together with the existing c ||x^k − x^*||^2 term in the Lyapunov function. We will bound α ||x^k − x^*||^2 in a different manner to c ||x^k − x^*||^2. Define δ^{k+1} = −(1/γ)(w^{k+1} − x^k) − f'(x^k), the difference between our approximation to the gradient at x^k and the

true gradient. Then instead of using the non-expansiveness property at the beginning, we use a result proved for prox-SVRG [4, 2nd eq. on p.12]:

    α E||x^{k+1} − x^*||^2 ≤ α ||x^k − x^*||^2 − 2αγ E[ F(x^{k+1}) − F(x^*) ] + 2αγ^2 E||δ^{k+1}||^2.

Although their quantity is different, they only use the property that E[δ^{k+1}] = 0 to prove the above equation. A full proof of this property for the SAGA algorithm that follows their argument appears in [14]. To bound the E||δ^{k+1}||^2 term, a small modification of the argument in Lemma 7 can be used, giving:

    E||δ^{k+1}||^2 ≤ (1 + β^{-1}) E||f_j'(φ_j^k) − f_j'(x^*)||^2 + (1 + β) E||f_j'(x^k) − f_j'(x^*)||^2.

Applying this gives:

    α E||x^{k+1} − x^*||^2 ≤ α ||x^k − x^*||^2 − 2αγ E[ F(x^{k+1}) − F(x^*) ]
        + 2(1 + β^{-1}) αγ^2 E||f_j'(φ_j^k) − f_j'(x^*)||^2 + 2(1 + β) αγ^2 E||f_j'(x^k) − f_j'(x^*)||^2.

As in Theorem 1, we then apply Lemma 6 to bound E||f_j'(φ_j^k) − f_j'(x^*)||^2. Combining with the rest of the Lyapunov function as was derived in Theorem 1 gives (we basically add the α terms to inequality (10) with µ = 0):

    E[T^{k+1}] − T^k ≤ ( 1/n − 2cγ ) [ f(x^k) − f(x^*) − ⟨f'(x^*), x^k − x^*⟩ ] − 2αγ E[ F(x^{k+1}) − F(x^*) ]
        + ( 4(1 + β^{-1}) αLγ^2 + 2(1 + β^{-1}) cLγ^2 − 1/n ) [ (1/n) Σ_i f_i(φ_i^k) − f(x^*) − (1/n) Σ_i ⟨f_i'(x^*), φ_i^k − x^*⟩ ]
        + ( (1 + β) cγ + 2(1 + β) αγ − c/L ) γ E||f_j'(x^k) − f_j'(x^*)||^2.

As before, the terms in square brackets are positive by convexity. Given that our choice of step size is γ = 1/(3L) (to match the adaptive to strong convexity step size), we can set the three round brackets to zero by using β = 1, c = 3L/(2n) and α = 3L/(8n). We thus obtain:

    E[T^{k+1}] − T^k ≤ − (1/(4n)) E[ F(x^{k+1}) − F(x^*) ].

These expectations are conditional on information from step k. We now take the expectation with respect to all previous steps, yielding

    E[T^{k+1}] − E[T^k] ≤ − (1/(4n)) E[ F(x^{k+1}) − F(x^*) ],

where all expectations are unconditional. Further negating and summing for k from 0 to k − 1 results in telescoping of the T terms, giving:

    (1/(4n)) Σ_{t=1}^k E[ F(x^t) − F(x^*) ] ≤ T^0 − E[T^k].

We can drop the E[T^k] term since T^k is always positive. Then we apply convexity to pull the summation inside F, and multiply through by 4n/k, giving:

    E[ F( (1/k) Σ_{t=1}^k x^t ) ] − F(x^*) ≤ (1/k) Σ_{t=1}^k E[ F(x^t) − F(x^*) ] ≤ (4n/k) T^0.

We get a (c + α) = 15L/(8n) ≤ 2L/n term that we upper bound by 2L/n in T^0 for simplicity.

D Example Code for Sparse Least Squares & Ridge Regression

The SAGA method is quite easy to implement for dense gradients, however the implementation for sparse gradient problems can be tricky. The main complication is the need for just-in-time updating of the elements of the iterate vector. This is needed to avoid having to do any full dense vector operations at each iteration. We provide below a simple implementation for the case of least-squares problems that illustrates how to correctly do this. The code is in the compiled Python (Cython) language.

import random
import numpy as np
cimport numpy as np
cimport cython
from cython.view cimport array as cvarray

# Performs the lagged update of x by g.
cdef inline lagged_update(long k, double[:] x, double[:] g, unsigned long[:] lag,
                          long[:] yindices, int ylen, double[:] lag_scaling, double a):
    cdef unsigned int i
    cdef long ind
    cdef unsigned long lagged_amount = 0
    for i in range(ylen):
        ind = yindices[i]
        lagged_amount = k - lag[ind]
        lag[ind] = k
        x[ind] += lag_scaling[lagged_amount] * (a * g[ind])

# Performs x += a*y, where x is dense and y is sparse.
cdef inline add_weighted(double[:] x, double[:] ydata, long[:] yindices, int ylen, double a):
    cdef unsigned int i
    for i in range(ylen):
        x[yindices[i]] += a * ydata[i]

# Dot product of a dense vector with a sparse vector
cdef inline double spdot(double[:] x, double[:] ydata, long[:] yindices, int ylen):
    cdef unsigned int i
    cdef double v = 0.0
    for i in range(ylen):
        v += ydata[i] * x[yindices[i]]
    return v

def saga_lstsq(A, double[:] b, unsigned int maxiter, props):
    # temporaries
    cdef double[:] ydata
    cdef long[:] yindices
    cdef unsigned int i, j, epoch, lagged_amount
    cdef long indstart, indend, ylen, ind
    cdef double cnew, Ax, cchange, gscaling

    # Data points are stored in columns in CSC format (indices/indptr assumed int64).
    cdef double[:] data = A.data
    cdef long[:] indices = A.indices
    cdef long[:] indptr = A.indptr

    cdef unsigned int m = A.shape[0]  # dimensions
    cdef unsigned int n = A.shape[1]  # datapoints

    cdef double[:] xk = np.zeros(m)
    cdef double[:] gk = np.zeros(m)

    cdef double eta = props['eta']  # Inverse step size = 1/gamma
    cdef double reg = props.get('reg', 0.0)  # Default 0
    cdef double betak = 1.0  # Scaling factor for xk.

    # Tracks for each entry of x, what iteration it was last updated at.
    cdef unsigned long[:] lag = np.zeros(m, dtype=np.uint64)

    # Initialize gradients
    cdef double gd = -1.0/n
    for i in range(n):
        indstart = indptr[i]
        indend = indptr[i+1]
        ydata = data[indstart:indend]
        yindices = indices[indstart:indend]
        ylen = indend - indstart
        add_weighted(gk, ydata, yindices, ylen, gd*b[i])

    # This is just a table of the sum of the geometric series (1 - reg/eta).
    # It is used to correctly do the just-in-time updating when
    # L2 regularisation is used.
    cdef double[:] lag_scaling = np.zeros(n*maxiter + 1)
    lag_scaling[0] = 0.0
    lag_scaling[1] = 1.0
    cdef double geosum = 1.0
    cdef double mult = 1.0 - reg/eta
    for i in range(2, n*maxiter + 1):
        geosum *= mult
        lag_scaling[i] = lag_scaling[i-1] + geosum

    # For least squares, we only need to store a single
    # double for each data point, rather than a full gradient vector.
    # The value stored is the A_i * betak * x product.
    cdef double[:] c = np.zeros(n)

    cdef unsigned long k = 0  # Current iteration number

    for epoch in range(maxiter):
        for j in range(n):
            if epoch == 0:
                i = j
            else:
                i = np.random.randint(0, n)

            # Selects the (sparse) column of the data matrix containing datapoint i.
            indstart = indptr[i]
            indend = indptr[i+1]
            ydata = data[indstart:indend]
            yindices = indices[indstart:indend]
            ylen = indend - indstart

            # Apply the missed updates to xk just in time
            lagged_update(k, xk, gk, lag, yindices, ylen, lag_scaling, -1.0/(eta*betak))

            Ax = betak * spdot(xk, ydata, yindices, ylen)
            cnew = Ax
            cchange = cnew - c[i]

            c[i] = cnew
            betak *= 1.0 - reg/eta

            # Update xk with the sparse part of the step (with betak scaling)
            add_weighted(xk, ydata, yindices, ylen, -cchange/(eta*betak))

            k += 1

            # Perform the gradient-average part of the step
            lagged_update(k, xk, gk, lag, yindices, ylen, lag_scaling, -1.0/(eta*betak))

            # update the gradient average
            add_weighted(gk, ydata, yindices, ylen, cchange/n)

    # Perform the just-in-time updates for the whole xk vector, so that all entries are up to date.
    gscaling = -1.0/(eta*betak)
    for ind in range(m):
        lagged_amount = k - lag[ind]
        lag[ind] = k
        xk[ind] += lag_scaling[lagged_amount] * gscaling * gk[ind]

    return betak * np.asarray(xk)
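A hypothetical driver for the routine above, assuming the Cython source has been compiled (for example with cythonize) into a module named saga_sparse; the module name, problem sizes and parameter values are all illustrative, and eta is the inverse step size 1/γ.

import numpy as np
import scipy.sparse as sp
from saga_sparse import saga_lstsq   # hypothetical compiled module name

rng = np.random.default_rng(0)
# Columns are data points: m = 1000 features, n = 5000 data points.
A = sp.random(1000, 5000, density=0.01, format='csc', random_state=0)
A.indices = A.indices.astype(np.int64)   # the routine indexes with C longs
A.indptr = A.indptr.astype(np.int64)
b = rng.normal(size=5000)
x = saga_lstsq(A, b, maxiter=20, props={'eta': 3.0, 'reg': 1e-4})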


More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Mache Learg Problem set Due Frday, September 9, rectato Please address all questos ad commets about ths problem set to 6.867-staff@a.mt.edu. You do ot eed to use MATLAB for ths problem set though

More information

1 Review and Overview

1 Review and Overview CS9T/STATS3: Statstcal Learg Teory Lecturer: Tegyu Ma Lecture #7 Scrbe: Bra Zag October 5, 08 Revew ad Overvew We wll frst gve a bref revew of wat as bee covered so far I te frst few lectures, we stated

More information

Simulation Output Analysis

Simulation Output Analysis Smulato Output Aalyss Summary Examples Parameter Estmato Sample Mea ad Varace Pot ad Iterval Estmato ermatg ad o-ermatg Smulato Mea Square Errors Example: Sgle Server Queueg System x(t) S 4 S 4 S 3 S 5

More information

Multivariate Transformation of Variables and Maximum Likelihood Estimation

Multivariate Transformation of Variables and Maximum Likelihood Estimation Marquette Uversty Multvarate Trasformato of Varables ad Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Assocate Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 03 by Marquette Uversty

More information

Lecture 02: Bounding tail distributions of a random variable

Lecture 02: Bounding tail distributions of a random variable CSCI-B609: A Theorst s Toolkt, Fall 206 Aug 25 Lecture 02: Boudg tal dstrbutos of a radom varable Lecturer: Yua Zhou Scrbe: Yua Xe & Yua Zhou Let us cosder the ubased co flps aga. I.e. let the outcome

More information

Generative classification models

Generative classification models CS 75 Mache Learg Lecture Geeratve classfcato models Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square Data: D { d, d,.., d} d, Classfcato represets a dscrete class value Goal: lear f : X Y Bar classfcato

More information

MATH 247/Winter Notes on the adjoint and on normal operators.

MATH 247/Winter Notes on the adjoint and on normal operators. MATH 47/Wter 00 Notes o the adjot ad o ormal operators I these otes, V s a fte dmesoal er product space over, wth gve er * product uv, T, S, T, are lear operators o V U, W are subspaces of V Whe we say

More information

NP!= P. By Liu Ran. Table of Contents. The P vs. NP problem is a major unsolved problem in computer

NP!= P. By Liu Ran. Table of Contents. The P vs. NP problem is a major unsolved problem in computer NP!= P By Lu Ra Table of Cotets. Itroduce 2. Strategy 3. Prelmary theorem 4. Proof 5. Expla 6. Cocluso. Itroduce The P vs. NP problem s a major usolved problem computer scece. Iformally, t asks whether

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Revew for the prevous lecture: Theorems ad Examples: How to obta the pmf (pdf) of U = g (, Y) ad V = g (, Y) Chapter 4 Multple Radom Varables Chapter 44 Herarchcal Models ad Mxture Dstrbutos Examples:

More information

Kernel-based Methods and Support Vector Machines

Kernel-based Methods and Support Vector Machines Kerel-based Methods ad Support Vector Maches Larr Holder CptS 570 Mache Learg School of Electrcal Egeerg ad Computer Scece Washgto State Uverst Refereces Muller et al. A Itroducto to Kerel-Based Learg

More information

Generalized Linear Regression with Regularization

Generalized Linear Regression with Regularization Geeralze Lear Regresso wth Regularzato Zoya Bylsk March 3, 05 BASIC REGRESSION PROBLEM Note: I the followg otes I wll make explct what s a vector a what s a scalar usg vec t or otato, to avo cofuso betwee

More information

NP!= P. By Liu Ran. Table of Contents. The P versus NP problem is a major unsolved problem in computer

NP!= P. By Liu Ran. Table of Contents. The P versus NP problem is a major unsolved problem in computer NP!= P By Lu Ra Table of Cotets. Itroduce 2. Prelmary theorem 3. Proof 4. Expla 5. Cocluso. Itroduce The P versus NP problem s a major usolved problem computer scece. Iformally, t asks whether a computer

More information

A conic cutting surface method for linear-quadraticsemidefinite

A conic cutting surface method for linear-quadraticsemidefinite A coc cuttg surface method for lear-quadratcsemdefte programmg Mohammad R. Osoorouch Calfora State Uversty Sa Marcos Sa Marcos, CA Jot wor wth Joh E. Mtchell RPI July 3, 2008 Outle: Secod-order coe: defto

More information

Strong Convergence of Weighted Averaged Approximants of Asymptotically Nonexpansive Mappings in Banach Spaces without Uniform Convexity

Strong Convergence of Weighted Averaged Approximants of Asymptotically Nonexpansive Mappings in Banach Spaces without Uniform Convexity BULLETIN of the MALAYSIAN MATHEMATICAL SCIENCES SOCIETY Bull. Malays. Math. Sc. Soc. () 7 (004), 5 35 Strog Covergece of Weghted Averaged Appromats of Asymptotcally Noepasve Mappgs Baach Spaces wthout

More information

Distributed Accelerated Proximal Coordinate Gradient Methods

Distributed Accelerated Proximal Coordinate Gradient Methods Dstrbuted Accelerated Proxmal Coordate Gradet Methods Yog Re, Ju Zhu Ceter for Bo-Ispred Computg Research State Key Lab for Itell. Tech. & Systems Dept. of Comp. Sc. & Tech., TNLst Lab, Tsghua Uversty

More information

Unimodality Tests for Global Optimization of Single Variable Functions Using Statistical Methods

Unimodality Tests for Global Optimization of Single Variable Functions Using Statistical Methods Malaysa Umodalty Joural Tests of Mathematcal for Global Optmzato Sceces (): of 05 Sgle - 5 Varable (007) Fuctos Usg Statstcal Methods Umodalty Tests for Global Optmzato of Sgle Varable Fuctos Usg Statstcal

More information

CHAPTER VI Statistical Analysis of Experimental Data

CHAPTER VI Statistical Analysis of Experimental Data Chapter VI Statstcal Aalyss of Expermetal Data CHAPTER VI Statstcal Aalyss of Expermetal Data Measuremets do ot lead to a uque value. Ths s a result of the multtude of errors (maly radom errors) that ca

More information

Newton s Power Flow algorithm

Newton s Power Flow algorithm Power Egeerg - Egll Beedt Hresso ewto s Power Flow algorthm Power Egeerg - Egll Beedt Hresso The ewto s Method of Power Flow 2 Calculatos. For the referece bus #, we set : V = p.u. ad δ = 0 For all other

More information

Simple Linear Regression

Simple Linear Regression Correlato ad Smple Lear Regresso Berl Che Departmet of Computer Scece & Iformato Egeerg Natoal Tawa Normal Uversty Referece:. W. Navd. Statstcs for Egeerg ad Scetsts. Chapter 7 (7.-7.3) & Teachg Materal

More information

Lecture Note to Rice Chapter 8

Lecture Note to Rice Chapter 8 ECON 430 HG revsed Nov 06 Lecture Note to Rce Chapter 8 Radom matrces Let Y, =,,, m, =,,, be radom varables (r.v. s). The matrx Y Y Y Y Y Y Y Y Y Y = m m m s called a radom matrx ( wth a ot m-dmesoal dstrbuto,

More information

Multiple Choice Test. Chapter Adequacy of Models for Regression

Multiple Choice Test. Chapter Adequacy of Models for Regression Multple Choce Test Chapter 06.0 Adequac of Models for Regresso. For a lear regresso model to be cosdered adequate, the percetage of scaled resduals that eed to be the rage [-,] s greater tha or equal to

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ  1 STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Recall Assumpto E(Y x) η 0 + η x (lear codtoal mea fucto) Data (x, y ), (x 2, y 2 ),, (x, y ) Least squares estmator ˆ E (Y x) ˆ " 0 + ˆ " x, where ˆ

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Marquette Uverst Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Coprght 08 b Marquette Uverst Maxmum Lkelhood Estmato We have bee sag that ~

More information

Taylor s Series and Interpolation. Interpolation & Curve-fitting. CIS Interpolation. Basic Scenario. Taylor Series interpolates at a specific

Taylor s Series and Interpolation. Interpolation & Curve-fitting. CIS Interpolation. Basic Scenario. Taylor Series interpolates at a specific CIS 54 - Iterpolato Roger Crawfs Basc Scearo We are able to prod some fucto, but do ot kow what t really s. Ths gves us a lst of data pots: [x,f ] f(x) f f + x x + August 2, 25 OSU/CIS 54 3 Taylor s Seres

More information

13. Parametric and Non-Parametric Uncertainties, Radial Basis Functions and Neural Network Approximations

13. Parametric and Non-Parametric Uncertainties, Radial Basis Functions and Neural Network Approximations Lecture 7 3. Parametrc ad No-Parametrc Ucertates, Radal Bass Fuctos ad Neural Network Approxmatos he parameter estmato algorthms descrbed prevous sectos were based o the assumpto that the system ucertates

More information

Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization

Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization Stochastc Prmal-Dual Coordate Method for Regularzed Emprcal Rsk Mmzato Yuche Zhag L Xao September 24 Abstract We cosder a geerc covex optmzato problem assocated wth regularzed emprcal rsk mmzato of lear

More information

EECE 301 Signals & Systems

EECE 301 Signals & Systems EECE 01 Sgals & Systems Prof. Mark Fowler Note Set #9 Computg D-T Covoluto Readg Assgmet: Secto. of Kame ad Heck 1/ Course Flow Dagram The arrows here show coceptual flow betwee deas. Note the parallel

More information