arXiv: v1 [math.OC] 7 Mar 2017

Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms

Jialei Wang, Lin Xiao

Jialei Wang: Department of Computer Science, The University of Chicago, Chicago, Illinois, USA. Lin Xiao: Microsoft Research, Redmond, Washington 98052, USA. Correspondence to: Jialei Wang <jialei@uchicago.edu>, Lin Xiao <lin.xiao@microsoft.com>.

Abstract

We consider empirical risk minimization of linear predictors with convex loss functions. Such problems can be reformulated as convex-concave saddle point problems, and thus are well suited to primal-dual first-order algorithms. However, primal-dual algorithms often require explicit strongly convex regularization in order to obtain fast linear convergence, and the required dual proximal mapping may not admit a closed-form or efficient solution. In this paper, we develop both batch and randomized primal-dual algorithms that can exploit strong convexity from data adaptively and are capable of achieving linear convergence even without regularization. We also present dual-free variants of the adaptive primal-dual algorithms that do not require computing the dual proximal mapping, which are especially suitable for logistic regression.

1. Introduction

We consider the problem of regularized empirical risk minimization (ERM) of linear predictors. Let $a_1, \ldots, a_n \in \mathbb{R}^d$ be the feature vectors of $n$ data samples, $\phi_i : \mathbb{R} \to \mathbb{R}$ be a convex loss function associated with the linear prediction $a_i^T x$, for $i = 1, \ldots, n$, and $g : \mathbb{R}^d \to \mathbb{R}$ be a convex regularization function for the predictor $x \in \mathbb{R}^d$. ERM amounts to solving the following convex optimization problem:

$$\min_{x \in \mathbb{R}^d} \Big\{ P(x) \stackrel{\mathrm{def}}{=} \frac{1}{n} \sum_{i=1}^n \phi_i(a_i^T x) + g(x) \Big\}. \qquad (1)$$

Examples of the above formulation include many well-known classification and regression problems. For binary classification, each feature vector $a_i$ is associated with a label $b_i \in \{\pm 1\}$. In particular, logistic regression is obtained by setting $\phi_i(z) = \log(1 + \exp(-b_i z))$. For linear regression problems, each feature vector $a_i$ is associated with a dependent variable $b_i \in \mathbb{R}$, and $\phi_i(z) = (1/2)(z - b_i)^2$. Then we get ridge regression with $g(x) = (\lambda/2)\|x\|_2^2$, and elastic net with $g(x) = \lambda_1 \|x\|_1 + (\lambda_2/2)\|x\|_2^2$.

Let $A = [a_1, \ldots, a_n]^T$ be the data matrix. Throughout this paper, we make the following assumptions:

Assumption 1.
The functions $\phi_i$, $g$ and matrix $A$ satisfy: each $\phi_i$ is $\delta$-strongly convex and $1/\gamma$-smooth, where $\gamma > 0$ and $\delta \ge 0$, and $\gamma\delta \le 1$; $g$ is $\lambda$-strongly convex, where $\lambda \ge 0$; and $\lambda + \delta\mu^2/n > 0$, where $\mu = \sqrt{\lambda_{\min}(A^T A)}$.

The strong convexity and smoothness mentioned above are with respect to the standard Euclidean norm, denoted as $\|x\| = \sqrt{x^T x}$. (See, e.g., Nesterov (2004, Sections 2.1.1 and 2.1.3) for the exact definitions.) Let $R = \max_i \{\|a_i\|\}$; assuming $\lambda > 0$, $R^2/(\gamma\lambda)$ is a popular definition of condition number for analyzing complexities of different algorithms. The last condition above means that the primal objective function $P(x)$ is strongly convex, even if $\lambda = 0$.

There has been extensive research activity in recent years on developing efficient algorithms for solving problem (1). A broad class of randomized algorithms that exploit the finite sum structure in the ERM problem have emerged as very competitive, both in terms of theoretical complexity and practical performance. They can be put into three categories: primal, dual, and primal-dual.

Primal randomized algorithms work with the ERM problem (1) directly. They are modern versions of randomized incremental gradient methods (e.g., Bertsekas, 2012; Nedić & Bertsekas, 2001) equipped with variance reduction techniques. Each iteration of such algorithms only processes one data point $a_i$, with complexity $O(d)$. They include SAG (Roux et al., 2012), SAGA (Defazio et al., 2014), and SVRG (Johnson & Zhang, 2013; Xiao & Zhang, 2014), which all achieve the iteration complexity $O\big((n + R^2/(\lambda\gamma)) \log(1/\epsilon)\big)$ to find an $\epsilon$-optimal solution. In fact, they are capable of exploiting the strong convexity from data, meaning that the condition number $R^2/(\lambda\gamma)$ in the complexity can be replaced by the more favorable one $R^2/\big(\gamma(\lambda + \delta\mu^2/n)\big)$. This improvement can be achieved without explicit knowledge of $\mu$ from data.

Dual algorithms solve the Fenchel dual of (1) by maximizing

$$D(y) \stackrel{\mathrm{def}}{=} -\frac{1}{n} \sum_{i=1}^n \phi_i^*(y_i) - g^*\Big(-\frac{1}{n} \sum_{i=1}^n y_i a_i\Big) \qquad (2)$$

using randomized coordinate ascent algorithms. (Here $\phi_i^*$ and $g^*$ denote the conjugate functions of $\phi_i$ and $g$.) They include SDCA (Shalev-Shwartz & Zhang, 2013), Nesterov (2012) and Richtárik & Takáč (2014). They have the same complexity $O\big((n + R^2/(\lambda\gamma)) \log(1/\epsilon)\big)$, but it is hard for them to exploit strong convexity from data.

Primal-dual algorithms solve the convex-concave saddle point problem $\min_x \max_y \mathcal{L}(x, y)$, where

$$\mathcal{L}(x, y) \stackrel{\mathrm{def}}{=} \frac{1}{n} \sum_{i=1}^n \big( y_i \langle a_i, x \rangle - \phi_i^*(y_i) \big) + g(x). \qquad (3)$$

In particular, SPDC (Zhang & Xiao, 2015) achieves an accelerated linear convergence rate with iteration complexity $O\big((n + \sqrt{n} R/\sqrt{\lambda\gamma}) \log(1/\epsilon)\big)$, which is better than the aforementioned non-accelerated complexity when $R^2/(\lambda\gamma) > n$. Lan & Zhou (2015) developed dual-free variants of accelerated primal-dual algorithms, but without considering the linear predictor structure in ERM. Balamurugan & Bach (2016) extended SVRG and SAGA to solving saddle point problems.

Accelerated primal and dual randomized algorithms have also been developed. Nesterov (2012), Fercoq & Richtárik (2015) and Lin et al. (2015b) developed accelerated coordinate gradient algorithms, which can be applied to solve the dual problem (2). Allen-Zhu (2016) developed an accelerated variant of SVRG. Acceleration can also be obtained using the Catalyst framework (Lin et al., 2015a). They all achieve the same $O\big((n + \sqrt{n} R/\sqrt{\lambda\gamma}) \log(1/\epsilon)\big)$ complexity. A common feature of accelerated algorithms is that they require a good estimate of the strong convexity parameter. This makes it hard for them to exploit strong convexity from data, because the minimum singular value $\mu$ of the data matrix $A$ is very hard to estimate in general.

In this paper, we show that primal-dual algorithms are capable of exploiting strong convexity from data if the algorithm parameters (such as step sizes) are set appropriately. While these optimal settings depend on the knowledge of the convexity parameter $\mu$ from the data, we develop adaptive variants of primal-dual algorithms that can tune the parameters automatically. Such adaptive schemes rely critically on the capability of evaluating the primal-dual optimality gaps by primal-dual algorithms.
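As a brief worked step (our own expansion, not part of the original text), the saddle-point function (3) recovers both the primal objective (1) and the dual objective (2). The only ingredients assumed are the standard conjugacy identities $\phi_i(z) = \max_{y_i} \{ y_i z - \phi_i^*(y_i) \}$ and $g^*(u) = \max_x \{ \langle u, x \rangle - g(x) \}$ for closed convex $\phi_i$ and $g$:

```latex
\begin{align*}
\max_{y} \mathcal{L}(x,y)
  &= \frac{1}{n}\sum_{i=1}^n \max_{y_i}\bigl\{ y_i \langle a_i, x\rangle - \phi_i^*(y_i) \bigr\} + g(x)
   = \frac{1}{n}\sum_{i=1}^n \phi_i(a_i^T x) + g(x) = P(x), \\
\min_{x} \mathcal{L}(x,y)
  &= -\frac{1}{n}\sum_{i=1}^n \phi_i^*(y_i)
     - \max_{x}\Bigl\{ \Bigl\langle -\frac{1}{n}\sum_{i=1}^n y_i a_i,\; x \Bigr\rangle - g(x) \Bigr\} \\
  &= -\frac{1}{n}\sum_{i=1}^n \phi_i^*(y_i) - g^*\Bigl(-\frac{1}{n}\sum_{i=1}^n y_i a_i\Bigr) = D(y).
\end{align*}
```

Consequently $D(y) \le \mathcal{L}(x^\star, y^\star) \le P(x)$ for all $x$ and $y$, which is why the gap $P(x) - D(y)$ is a computable certificate of progress.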
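A major disadvantage of primal-dual algorithms is that the required dual proximal mapping may not admit a closed-form or efficient solution. We follow the approach of Lan & Zhou (2015) to derive dual-free variants of the primal-dual algorithms customized for ERM problems with the linear predictor structure, and show that they can also exploit strong convexity from data with correct choices of parameters or using an adaptation scheme.

2. Batch primal-dual algorithms

Before diving into randomized primal-dual algorithms, we first consider batch primal-dual algorithms, which exhibit similar properties as their randomized variants. To this end, we consider a batch version of the ERM problem (1),

$$\min_{x \in \mathbb{R}^d} \big\{ P(x) \stackrel{\mathrm{def}}{=} f(Ax) + g(x) \big\}, \qquad (4)$$

where $A \in \mathbb{R}^{n \times d}$, and make the following assumption:

Assumption 2. The functions $f$, $g$ and matrix $A$ satisfy: $f$ is $\delta$-strongly convex and $1/\gamma$-smooth, where $\gamma > 0$ and $\delta \ge 0$, and $\gamma\delta \le 1$; $g$ is $\lambda$-strongly convex, where $\lambda \ge 0$; and $\lambda + \delta\mu^2 > 0$, where $\mu = \sqrt{\lambda_{\min}(A^T A)}$.

For exact correspondence with problem (1), we have $f(z) = \frac{1}{n} \sum_{i=1}^n \phi_i(z_i)$ with $z_i = a_i^T x$. Under Assumption 1, the function $f(z)$ is $\delta/n$-strongly convex and $1/(n\gamma)$-smooth, and $f(Ax)$ is $\delta\mu^2/n$-strongly convex and $R^2/\gamma$-smooth. However, such correspondences alone are not sufficient to exploit the structure of (1); i.e., substituting them into the batch algorithms of this section will not produce the efficient algorithms for solving problem (1) that we will present in Sections 3 and 4.2. So we do not make such correspondences explicit in this section. Rather, we treat them as independent assumptions with the same notation.

Using conjugate functions, we can derive the dual of (4) as

$$\max_{y \in \mathbb{R}^n} \big\{ D(y) \stackrel{\mathrm{def}}{=} -f^*(y) - g^*(-A^T y) \big\}, \qquad (5)$$

and the convex-concave saddle point formulation is

$$\min_{x \in \mathbb{R}^d} \max_{y \in \mathbb{R}^n} \big\{ \mathcal{L}(x, y) \stackrel{\mathrm{def}}{=} g(x) + y^T A x - f^*(y) \big\}. \qquad (6)$$

We consider the primal-dual first-order algorithm proposed by Chambolle & Pock (2011; 2016) for solving the saddle point problem (6), which is given as Algorithm 1. Here we call it the batch primal-dual (BPD) algorithm.

Algorithm 1 Batch Primal-Dual (BPD) Algorithm
  input: parameters $\tau$, $\sigma$, $\theta$, initial point $(\tilde{x}^{(0)} = x^{(0)}, y^{(0)})$
  for $t = 0, 1, 2, \ldots$ do
    $y^{(t+1)} = \mathrm{prox}_{\sigma f^*}\big(y^{(t)} + \sigma A \tilde{x}^{(t)}\big)$
    $x^{(t+1)} = \mathrm{prox}_{\tau g}\big(x^{(t)} - \tau A^T y^{(t+1)}\big)$
    $\tilde{x}^{(t+1)} = x^{(t+1)} + \theta\,(x^{(t+1)} - x^{(t)})$
  end for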
Assuming that $f$ is smooth and $g$ is strongly convex, Chambolle & Pock (2011; 2016) showed that Algorithm 1 achieves an accelerated linear convergence rate if $\lambda > 0$. However, they did not consider the case where an additional source, or the sole source, of strong convexity comes from $f(Ax)$.
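The BPD iteration can be sketched in a few lines for ridge regression. This is a minimal illustration, not the authors' implementation: it assumes the specific choices $f(z) = \frac{1}{2}\|z - b\|^2$ (so $f^*(y) = \frac{1}{2}\|y\|^2 + b^T y$) and $g(x) = \frac{\lambda}{2}\|x\|^2$, for which both proximal mappings have closed forms. The function names and the dense list-of-rows matrix format are our own, and the step sizes are passed in by the caller.

```python
def matvec(A, x):
    """Return A @ x for a dense matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Return A^T @ y for a dense matrix stored as a list of rows."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def bpd_ridge(A, b, lam, tau, sigma, theta, iters=200):
    """Batch primal-dual iteration for min_x 0.5*||Ax - b||^2 + (lam/2)*||x||^2.

    Assumed (illustrative) setting: f(z) = 0.5*||z - b||^2 and
    g(x) = (lam/2)*||x||^2, so both prox steps are closed-form.
    """
    n, d = len(A), len(A[0])
    x = [0.0] * d
    x_tilde = x[:]
    y = [0.0] * n
    for _ in range(iters):
        # Dual step: prox of sigma*f^* at y + sigma*A*x_tilde,
        # where f^*(y) = 0.5*||y||^2 + b^T y.
        Ax = matvec(A, x_tilde)
        y = [(y_i + sigma * (ax_i - b_i)) / (1.0 + sigma)
             for y_i, ax_i, b_i in zip(y, Ax, b)]
        # Primal step: prox of tau*g at x - tau*A^T*y.
        ATy = rmatvec(A, y)
        x_new = [(x_j - tau * g_j) / (1.0 + tau * lam) for x_j, g_j in zip(x, ATy)]
        # Extrapolation step.
        x_tilde = [xn + theta * (xn - xo) for xn, xo in zip(x_new, x)]
        x = x_new
    return x


# Illustrative 2x2 problem without regularization (lam = 0); the step sizes
# below are example values chosen by hand for this matrix.
x_hat = bpd_ridge([[2.0, 0.0], [0.0, 1.0]], [2.0, 3.0],
                  lam=0.0, tau=0.5, sigma=0.5, theta=0.875)
```

In this toy problem the only strong convexity comes from the data matrix (its smallest singular value is 1), which is exactly the situation not covered by the $\lambda > 0$ analysis.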

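In the following theorem, we show how to set the parameters $\tau$, $\sigma$, and $\theta$ to exploit both sources of strong convexity to achieve fast linear convergence.

Theorem 1. Suppose Assumption 2 holds and $(x^\star, y^\star)$ is the unique saddle point of $\mathcal{L}$ defined in (6). Let $L = \|A\| = \sqrt{\lambda_{\max}(A^T A)}$. If we set the parameters in Algorithm 1 as

$$\sigma = \frac{1}{L} \sqrt{\frac{\lambda + \delta\mu^2}{\gamma}}, \qquad \tau = \frac{1}{L} \sqrt{\frac{\gamma}{\lambda + \delta\mu^2}}, \qquad (7)$$

and $\theta = \max\{\theta_x, \theta_y\}$, where

$$\theta_x = \Big(1 - \frac{\delta}{\delta + 2\sigma} \cdot \frac{\mu^2}{L^2}\Big) \frac{1}{1 + \tau\lambda}, \qquad \theta_y = \frac{1}{1 + \sigma\gamma/2}, \qquad (8)$$

then we have

$$\Big(\frac{1}{2\tau} + \frac{\lambda}{2}\Big) \|x^{(t)} - x^\star\|^2 + \frac{\gamma}{4} \|y^{(t)} - y^\star\|^2 \le \theta^t C,$$
$$\mathcal{L}(x^{(t)}, y^\star) - \mathcal{L}(x^\star, y^{(t)}) \le \theta^t C,$$

where $C = \big(\frac{1}{2\tau} + \frac{\lambda}{2}\big) \|x^{(0)} - x^\star\|^2 + \big(\frac{1}{2\sigma} + \frac{\gamma}{4}\big) \|y^{(0)} - y^\star\|^2$.

The proof of Theorem 1 is given in Appendices B and C. Here we give a detailed analysis of the convergence rate. Substituting $\sigma$ and $\tau$ in (7) into the expressions for $\theta_y$ and $\theta_x$ in (8), and assuming $\gamma(\lambda + \delta\mu^2) \ll L^2$, we have

$$\theta_x \approx 1 - \frac{\gamma\delta\mu^2}{L^2} \Big( 2\frac{\sqrt{\gamma(\lambda + \delta\mu^2)}}{L} + \gamma\delta \Big)^{-1} - \frac{\lambda}{L} \sqrt{\frac{\gamma}{\lambda + \delta\mu^2}},$$
$$\theta_y = \frac{1}{1 + \sqrt{\gamma(\lambda + \delta\mu^2)}/(2L)} \approx 1 - \frac{\sqrt{\gamma(\lambda + \delta\mu^2)}}{2L}.$$

Since the overall condition number of the problem is $\frac{L^2}{\gamma(\lambda + \delta\mu^2)}$, it is clear that $\theta_y$ is an accelerated convergence rate. Next we examine $\theta_x$ in two special cases.

The case of $\delta\mu^2 = 0$ but $\lambda > 0$. In this case, we have $\tau = \frac{1}{L}\sqrt{\gamma/\lambda}$ and $\sigma = \frac{1}{L}\sqrt{\lambda/\gamma}$, and thus

$$\theta_x = \frac{1}{1 + \sqrt{\gamma\lambda}/L} \approx 1 - \frac{\sqrt{\gamma\lambda}}{L}, \qquad \theta_y = \frac{1}{1 + \sqrt{\gamma\lambda}/(2L)} \approx 1 - \frac{\sqrt{\gamma\lambda}}{2L}.$$

Therefore we have $\theta = \max\{\theta_x, \theta_y\} \approx 1 - \frac{\sqrt{\gamma\lambda}}{2L}$. This indeed is an accelerated convergence rate, recovering the result of Chambolle & Pock (2011; 2016).

The case of $\lambda = 0$ but $\delta\mu^2 > 0$. In this case, we have $\tau = \frac{1}{L\mu}\sqrt{\gamma/\delta}$ and $\sigma = \frac{\mu}{L}\sqrt{\delta/\gamma}$, and

$$\theta_x = 1 - \frac{\gamma\delta\mu^2}{L^2} \cdot \frac{1}{2\sqrt{\gamma\delta}\,\mu/L + \gamma\delta}, \qquad \theta_y = \frac{1}{1 + \sqrt{\gamma\delta}\,\mu/(2L)} \approx 1 - \frac{\sqrt{\gamma\delta}\,\mu}{2L}.$$

Notice that $\frac{1}{\gamma\delta} \frac{L^2}{\mu^2}$ is the condition number of $f(Ax)$. Next we assume $\mu \ll L$ and examine how $\theta_x$ varies with $\gamma\delta$.

If $\gamma\delta \approx \mu^2/L^2$, meaning $f$ is badly conditioned, then

$$\theta_x \approx 1 - \frac{\gamma\delta\mu^2}{L^2} \cdot \frac{1}{3\sqrt{\gamma\delta}\,\mu/L} = 1 - \frac{\sqrt{\gamma\delta}\,\mu}{3L}.$$

Because the overall condition number is $\frac{1}{\gamma\delta} \frac{L^2}{\mu^2}$, this is an accelerated linear rate, and so is $\theta = \max\{\theta_x, \theta_y\}$.

If $\gamma\delta \approx \mu/L$, meaning $f$ is mildly conditioned, then

$$\theta_x \approx 1 - \frac{\mu^3}{L^3} \cdot \frac{1}{2(\mu/L)^{3/2} + \mu/L} \approx 1 - \frac{\mu^2}{L^2}.$$

This represents a half-accelerated rate, because the overall condition number is $\frac{1}{\gamma\delta} \frac{L^2}{\mu^2} \approx \frac{L^3}{\mu^3}$.

If $\gamma\delta = 1$, i.e., $f$ is a simple quadratic function, then

$$\theta_x \approx 1 - \frac{\mu^2}{L^2} \cdot \frac{1}{2\mu/L + 1} \approx 1 - \frac{\mu^2}{L^2}.$$

This rate does not have acceleration, because the overall condition number is $\frac{1}{\gamma\delta} \frac{L^2}{\mu^2} \approx \frac{L^2}{\mu^2}$.

Algorithm 2 Adaptive Batch Primal-Dual (Ada-BPD)
  input: problem constants $\lambda$, $\gamma$, $\delta$, $L$ and $\hat\mu > 0$, initial point $(x^{(0)}, y^{(0)})$, and adaptation period $T$.
  Compute $\sigma$, $\tau$, and $\theta$ as in (7) and (8) using $\mu = \hat\mu$
  for $t = 0, 1, 2, \ldots$ do
    $y^{(t+1)} = \mathrm{prox}_{\sigma f^*}\big(y^{(t)} + \sigma A \tilde{x}^{(t)}\big)$
    $x^{(t+1)} = \mathrm{prox}_{\tau g}\big(x^{(t)} - \tau A^T y^{(t+1)}\big)$
    $\tilde{x}^{(t+1)} = x^{(t+1)} + \theta\,(x^{(t+1)} - x^{(t)})$
    if $\mathrm{mod}(t+1, T) == 0$ then
      $(\sigma, \tau, \theta) = \text{BPD-Adapt}\big(\{P^{(s)}, D^{(s)}\}_{s=t-T}^{t+1}\big)$
    end if
  end for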
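To make the three regimes of the $\lambda = 0$ case concrete, the parameter choices (7) and (8) can be evaluated numerically. The following is a small sketch under the notation of Theorem 1; the function name `bpd_parameters` is hypothetical, and the numeric values in the usage lines are illustrative only.

```python
import math


def bpd_parameters(lam, gamma, delta, mu, L):
    """Evaluate the step sizes (7) and the rate theta = max(theta_x, theta_y) in (8)."""
    s = lam + delta * mu ** 2              # overall strong convexity, must be > 0
    sigma = math.sqrt(s / gamma) / L
    tau = math.sqrt(gamma / s) / L
    theta_x = (1.0 - delta / (delta + 2.0 * sigma) * mu ** 2 / L ** 2) / (1.0 + tau * lam)
    theta_y = 1.0 / (1.0 + sigma * gamma / 2.0)
    return tau, sigma, max(theta_x, theta_y)


# Compare 1 - theta across the three regimes of gamma*delta for mu/L = 1e-2
# (gamma is fixed to 1, so delta plays the role of gamma*delta).
ratio = 1e-2
for name, delta in [("badly conditioned f", ratio ** 2),
                    ("mildly conditioned f", ratio),
                    ("quadratic f", 1.0)]:
    _, _, theta = bpd_parameters(lam=0.0, gamma=1.0, delta=delta, mu=ratio, L=1.0)
    kappa = 1.0 / (delta * ratio ** 2)     # overall condition number
    print(name, "1 - theta =", 1.0 - theta, "1/sqrt(kappa) =", 1.0 / math.sqrt(kappa))
```

Comparing the printed $1 - \theta$ against $1/\sqrt{\kappa}$ shows how far each regime is from a fully accelerated rate.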
In summary, the extent of acceleration in the dominating factor $\theta_x$ (which determines $\theta$) depends on the relative size of $\gamma\delta$ and $\mu^2/L^2$, i.e., the relative conditioning between the function $f$ and the matrix $A$. In general, we have full acceleration if $\gamma\delta \le \mu^2/L^2$. The theory predicts that the acceleration degrades as the function $f$ gets better conditioned. However, in our numerical experiments, we often observe acceleration even if $\gamma\delta$ gets closer to 1.

As explained in Chambolle & Pock (2011), Algorithm 1 is equivalent to a preconditioned ADMM. Deng & Yin (2016) characterized conditions for ADMM to obtain linear convergence without assuming both parts of the objective function being strongly convex, but they did not derive a convergence rate for this case.

2.1. Adaptive batch primal-dual algorithms

In practice, it is often very hard to obtain a good estimate of the problem-dependent constants, especially $\mu = \sqrt{\lambda_{\min}(A^T A)}$, in order to apply the algorithmic parameters specified in Theorem 1. Here we explore heuristics that can enable adaptive tuning of such parameters, which often lead to much improved performance in practice.

A key observation is that the convergence rate of the BPD algorithm changes monotonically with the overall strong convexity parameter $\lambda + \delta\mu^2$, regardless of the extent of

acceleration. In other words, the larger $\lambda + \delta\mu^2$ is, the faster the convergence. Therefore, if we can monitor the progress of the convergence and compare it with the predicted convergence rate in Theorem 1, then we can adjust the algorithmic parameters to exploit the fastest possible convergence. More specifically, if the observed convergence is slower than the predicted convergence rate, then we should reduce the estimate of $\mu$; if the observed convergence is better than the predicted rate, then we can try to increase $\mu$ for even faster convergence.

We formalize the above reasoning in an Adaptive BPD (Ada-BPD) algorithm described in Algorithm 2. This algorithm maintains an estimate $\hat\mu$ of the true constant $\mu$, and adjusts it every $T$ iterations. We use $P^{(t)}$ and $D^{(t)}$ to represent the primal and dual objective values at $P(x^{(t)})$ and $D(y^{(t)})$, respectively. We give two implementations of the tuning procedure BPD-Adapt. Algorithm 3 is a simple heuristic for tuning the estimate $\hat\mu$, where the increasing and decreasing factor $\sqrt{2}$ can be changed to other values larger than 1. Algorithm 4 is a more robust heuristic. It does not rely on the specific convergence rate $\theta$ established in Theorem 1. Instead, it simply compares the current estimate of the objective reduction rate $\hat\rho$ with the previous estimate $\rho$ ($\approx \theta^T$). It also specifies a non-tuning range of changes in $\rho$, specified by the interval $[\underline{c}, \overline{c}]$.

Algorithm 3 BPD-Adapt (simple heuristic)
  input: previous estimate $\hat\mu$, adaption period $T$, primal and dual objective values $\{P^{(s)}, D^{(s)}\}_{s=t-T}^{t}$
  if $P^{(t)} - D^{(t)} < \theta^T \big(P^{(t-T)} - D^{(t-T)}\big)$ then
    $\hat\mu := \sqrt{2}\,\hat\mu$
  else
    $\hat\mu := \hat\mu/\sqrt{2}$
  end if
  Compute $\sigma$, $\tau$, and $\theta$ as in (7) and (8) using $\mu = \hat\mu$
  output: new parameters $(\sigma, \tau, \theta)$

One can also devise more sophisticated schemes; e.g., if we estimate that $\delta\mu^2 < \lambda$, then no more tuning is necessary.

The capability of accessing both the primal and dual objective values allows primal-dual algorithms to have a good estimate of the convergence rate, which enables effective tuning heuristics. Automatic tuning of primal-dual algorithms has also been studied by, e.g., Malitsky & Pock (2016) and Goldstein et al. (2013), but with different goals.
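The comparison at the heart of the simple heuristic can be sketched as follows. This is only an illustration of the test "observed gap reduction versus predicted reduction $\theta^T$"; the function name and argument names are our own, and the multiplicative `factor` is left as a caller-supplied assumption (the text only requires a value larger than 1).

```python
def bpd_adapt_simple(mu_hat, theta, T, gap_now, gap_prev, factor):
    """Return an updated estimate of mu from two duality-gap measurements.

    gap_now  = P^(t)   - D^(t)     (current primal-dual gap)
    gap_prev = P^(t-T) - D^(t-T)   (gap T iterations earlier)
    factor   > 1 is the increase/decrease factor (a tunable assumption).
    """
    if factor <= 1.0:
        raise ValueError("factor must be larger than 1")
    if gap_now < theta ** T * gap_prev:
        # Converging faster than predicted: try a larger estimate of mu.
        return mu_hat * factor
    # Converging slower than predicted: the estimate of mu was too optimistic.
    return mu_hat / factor


# Illustrative call: predicted reduction over T = 10 steps is 0.9**10 (about 0.35);
# the observed reduction is 0.1, so the estimate is increased.
new_mu = bpd_adapt_simple(mu_hat=1.0, theta=0.9, T=10,
                          gap_now=0.1, gap_prev=1.0, factor=2.0)
```

After each call, the step sizes and $\theta$ would be recomputed from the new estimate before the main loop resumes.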
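Finally, we note that Theorem 1 only establishes the convergence rate for the distance to the optimal point and the quantity $\mathcal{L}(x^{(t)}, y^\star) - \mathcal{L}(x^\star, y^{(t)})$, which is not quite the duality gap $P(x^{(t)}) - D(y^{(t)})$. Nevertheless, the same convergence rate can also be established for the duality gap (see Zhang & Xiao, 2015, Section 2.2), which can be used to better justify the adaption procedure.

Algorithm 4 BPD-Adapt (robust heuristic)
  input: previous rate estimate $\rho > 0$, $\Delta = \delta\hat\mu^2$, period $T$, constants $\underline{c} < 1$ and $\overline{c} > 1$, and $\{P^{(s)}, D^{(s)}\}_{s=t-T}^{t}$
  Compute new rate estimate $\hat\rho = \dfrac{P^{(t)} - D^{(t)}}{P^{(t-T)} - D^{(t-T)}}$
  if $\hat\rho \le \underline{c}\,\rho$ then
    $\Delta := 2\Delta$, $\rho := \hat\rho$
  else if $\hat\rho \ge \overline{c}\,\rho$ then
    $\Delta := \Delta/2$, $\rho := \hat\rho$
  else
    $\Delta := \Delta$
  end if
  $\sigma = \frac{1}{L}\sqrt{\frac{\lambda + \Delta}{\gamma}}$, $\tau = \frac{1}{L}\sqrt{\frac{\gamma}{\lambda + \Delta}}$
  Compute $\theta$ using (8) or set $\theta = 1$
  output: new parameters $(\sigma, \tau, \theta)$

3. Randomized primal-dual algorithm

In this section, we come back to the ERM problem (1), which has a finite sum structure that allows the development of randomized primal-dual algorithms. In particular, we extend the stochastic primal-dual coordinate (SPDC) algorithm (Zhang & Xiao, 2015) to exploit the strong convexity from data in order to achieve a faster convergence rate.

First, we show that, by setting algorithmic parameters appropriately, the original SPDC algorithm may directly benefit from strong convexity from the loss function. We note that the SPDC algorithm is a special case of the Adaptive SPDC (Ada-SPDC) algorithm presented in Algorithm 5, obtained by setting the adaption period $T = \infty$ (not performing any adaption). The following theorem is proved in Appendix E.

Theorem 2. Suppose Assumption 1 holds. Let $(x^\star, y^\star)$ be the saddle point of the function $\mathcal{L}$ defined in (3), and $R = \max\{\|a_1\|, \ldots, \|a_n\|\}$. If we set $T = \infty$ in Algorithm 5 (no adaption) and let

$$\tau = \frac{1}{4R} \sqrt{\frac{\gamma}{n\lambda + \delta\mu^2}}, \qquad \sigma = \frac{1}{4R} \sqrt{\frac{n\lambda + \delta\mu^2}{\gamma}}, \qquad (9)$$

and $\theta = \max\{\theta_x, \theta_y\}$, where

$$\theta_x = \Big(1 - \frac{\tau\sigma\delta\mu^2}{2n(\sigma + 4\delta)}\Big) \frac{1}{1 + \tau\lambda}, \qquad \theta_y = \frac{1 + ((n-1)/n)\,\sigma\gamma/2}{1 + \sigma\gamma/2}, \qquad (10)$$

then we have

$$\Big(\frac{1}{2\tau} + \frac{\lambda}{2}\Big) \mathbb{E}\big[\|x^{(t)} - x^\star\|^2\big] + \frac{\gamma}{4} \mathbb{E}\big[\|y^{(t)} - y^\star\|^2\big] \le \theta^t C,$$
$$\mathbb{E}\big[\mathcal{L}(x^{(t)}, y^\star) - \mathcal{L}(x^\star, y^{(t)})\big] \le \theta^t C,$$

where $C = \big(\frac{1}{2\tau} + \frac{\lambda}{2}\big) \|x^{(0)} - x^\star\|^2 + \big(\frac{1}{2\sigma} + \frac{\gamma}{4}\big) \|y^{(0)} - y^\star\|^2$. The expectation $\mathbb{E}[\cdot]$ is taken with respect to the history of random indices drawn at each iteration.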

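Algorithm 5 Adaptive SPDC (Ada-SPDC)
  input: parameters $\sigma$, $\tau$, $\theta > 0$, initial point $(x^{(0)}, y^{(0)})$, and adaptation period $T$.
  Set $\tilde{x}^{(0)} = x^{(0)}$
  for $t = 0, 1, 2, \ldots$ do
    pick $k \in \{1, \ldots, n\}$ uniformly at random
    for $i \in \{1, \ldots, n\}$ do
      if $i == k$ then
        $y_k^{(t+1)} = \mathrm{prox}_{\sigma\phi_k^*}\big(y_k^{(t)} + \sigma a_k^T \tilde{x}^{(t)}\big)$
      else
        $y_i^{(t+1)} = y_i^{(t)}$
      end if
    end for
    $x^{(t+1)} = \mathrm{prox}_{\tau g}\Big(x^{(t)} - \tau\big(u^{(t)} + (y_k^{(t+1)} - y_k^{(t)})\,a_k\big)\Big)$
    $u^{(t+1)} = u^{(t)} + \frac{1}{n}(y_k^{(t+1)} - y_k^{(t)})\,a_k$
    $\tilde{x}^{(t+1)} = x^{(t+1)} + \theta\,(x^{(t+1)} - x^{(t)})$
    if $\mathrm{mod}(t+1, T \cdot n) = 0$ then
      $(\tau, \sigma, \theta) = \text{SPDC-Adapt}\big(\{P^{(t-sn)}, D^{(t-sn)}\}_{s=0}^{T}\big)$
    end if
  end for

Below we give a detailed discussion of the expected convergence rate established in Theorem 2.

The case of $\mu^2 = 0$ but $\lambda > 0$. In this case we have $\tau = \frac{1}{4R}\sqrt{\frac{\gamma}{n\lambda}}$ and $\sigma = \frac{1}{4R}\sqrt{\frac{n\lambda}{\gamma}}$, and

$$\theta_x = \frac{1}{1 + \tau\lambda} = 1 - \frac{1}{1 + 4R\sqrt{n/(\lambda\gamma)}}, \qquad \theta_y = \frac{1 + ((n-1)/n)\,\sigma\gamma/2}{1 + \sigma\gamma/2} = 1 - \frac{1}{n + 8R\sqrt{n/(\lambda\gamma)}}.$$

Hence $\theta = \theta_y$. These recover the parameters and convergence rate of the standard SPDC (Zhang & Xiao, 2015).

The case of $\mu^2 > 0$ but $\lambda = 0$. In this case we have $\tau = \frac{1}{4R\mu}\sqrt{\frac{\gamma}{\delta}}$ and $\sigma = \frac{\mu}{4R}\sqrt{\frac{\delta}{\gamma}}$, and

$$\theta_x = 1 - \frac{\tau\sigma\delta\mu^2}{2n(\sigma + 4\delta)} = 1 - \frac{\gamma\delta\mu^2}{32 n R^2} \cdot \frac{1}{\sqrt{\gamma\delta}\,\mu/(4R) + 4\gamma\delta},$$
$$\theta_y = 1 - \frac{1}{n + 8nR/(\mu\sqrt{\gamma\delta})} \approx 1 - \frac{\sqrt{\gamma\delta}\,\mu}{8nR}.$$

Since the objective is $R^2/\gamma$-smooth and $\delta\mu^2/n$-strongly convex, $\theta_y$ is an accelerated rate if $\frac{\sqrt{\gamma\delta}\,\mu}{8R} \ll 1$ (otherwise $\theta_y \approx 1 - \frac{1}{n}$). For $\theta_x$, we consider different situations:

If $\mu \ge R$, then we have $\theta_x \approx 1 - \frac{\sqrt{\gamma\delta}\,\mu}{nR}$, which is an accelerated rate. So is $\theta = \max\{\theta_x, \theta_y\}$.

If $\mu < R$ and $\gamma\delta \approx \frac{\mu^2}{R^2}$, then $\theta_x \approx 1 - \frac{\sqrt{\gamma\delta}\,\mu}{nR}$, which represents an accelerated rate. The iteration complexity of SPDC is $\tilde{O}\big(\frac{nR}{\mu\sqrt{\gamma\delta}}\big)$, which is better than that of SVRG in this case, which is $\tilde{O}\big(\frac{nR^2}{\gamma\delta\mu^2}\big)$.

If $\mu < R$ and $\gamma\delta \approx \frac{\mu}{R}$, then we get $\theta_x \approx 1 - \frac{\mu^2}{nR^2}$. This is a half-accelerated rate, because in this case SVRG would require $\tilde{O}\big(\frac{nR^3}{\mu^3}\big)$ iterations, while the iteration complexity here is $\tilde{O}\big(\frac{nR^2}{\mu^2}\big)$.

If $\mu < R$ and $\gamma\delta \approx 1$, meaning the $\phi_i$'s are well conditioned, then we get $\theta_x \approx 1 - \frac{\gamma\delta\mu^2}{nR^2} \approx 1 - \frac{\mu^2}{nR^2}$, which is a non-accelerated rate. The corresponding iteration complexity is the same as SVRG.

3.1. Parameter adaptation for SPDC

The SPDC-Adapt procedure called in Algorithm 5 follows the same logic as the batch adaption schemes in Algorithms 3 and 4, and we omit the details here. One thing we emphasize here is that the adaptation period $T$ is in terms of epochs, or number of passes over the data. In addition, we only compute the primal and dual objective values after each pass or every few passes, because computing them exactly usually requires a full pass over the data.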
Another important issue is that, unlike the batch case where the duality gap usually decreases monotonically, the duality gap for randomized algorithms can fluctuate wildly. So instead of using only the two end values $P^{(t-Tn)} - D^{(t-Tn)}$ and $P^{(t)} - D^{(t)}$, we can use more points to estimate the convergence rate through a linear regression. Suppose the primal-dual values at the end of each of the past $T+1$ passes are $\{P(0), D(0)\}, \{P(1), D(1)\}, \ldots, \{P(T), D(T)\}$, and we need to estimate $\rho$ (rate per pass) such that

$$P(t) - D(t) \approx \rho^t \big(P(0) - D(0)\big), \qquad t = 1, \ldots, T.$$

We can turn it into a linear regression problem after taking logarithms, and obtain the estimate $\hat\rho$ through

$$\log(\hat\rho) = \frac{1}{1^2 + 2^2 + \cdots + T^2} \sum_{t=1}^{T} t \log \frac{P(t) - D(t)}{P(0) - D(0)}.$$

The rest of the adaption procedure can follow the robust scheme in Algorithm 4. In practice, we can compute the primal-dual values more sporadically, say every few passes, and modify the regression accordingly.

4. Dual-free Primal-dual algorithms

Compared with primal algorithms, one major disadvantage of primal-dual algorithms is the requirement of computing the proximal mapping of the dual function $f^*$ or $\phi_i^*$, which may not admit a closed-form solution or efficient computation. This is especially the case for logistic regression, one of the most popular loss functions used in classification.

Lan & Zhou (2015) developed dual-free variants of primal-dual algorithms that avoid computing the dual proximal mapping. Their main technique is to replace the Euclidean distance in the dual proximal mapping with a Bregman divergence defined over the dual loss function itself.
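The log-linear rate estimate from Section 3.1 is a least-squares fit through the origin of $\log\frac{P(t) - D(t)}{P(0) - D(0)}$ against $t$. A minimal sketch is given below; the function name `estimate_rate` and the input format (a list of positive duality gaps, one per pass) are our own assumptions.

```python
import math


def estimate_rate(gaps):
    """Estimate the per-pass reduction rate rho from duality gaps.

    gaps[t] = P(t) - D(t) for t = 0, ..., T, all assumed strictly positive.
    Fits log(gaps[t] / gaps[0]) ~ t * log(rho) by least squares through the origin.
    """
    T = len(gaps) - 1
    if T < 1:
        raise ValueError("need at least two gap values")
    numerator = sum(t * math.log(gaps[t] / gaps[0]) for t in range(1, T + 1))
    denominator = sum(t * t for t in range(1, T + 1))
    return math.exp(numerator / denominator)


# Illustrative data: gaps that halve every pass give an estimated rate of 0.5.
rho_hat = estimate_rate([0.5 ** t for t in range(6)])
```

Because later passes carry weight $t$ in the sum, a single noisy early measurement influences the estimate less than it would in a two-point ratio.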

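We show how to apply this approach to solve the structured ERM problems considered in this paper. They can also exploit strong convexity from data if the algorithmic parameters are set appropriately or adapted automatically.

4.1. Dual-free BPD algorithm

First, we consider the batch setting. We replace the dual proximal mapping (computing $y^{(t+1)}$) in Algorithm 1 with

$$y^{(t+1)} = \arg\min_{y} \Big\{ f^*(y) - y^T A \tilde{x}^{(t)} + \frac{1}{\sigma} D(y, y^{(t)}) \Big\}, \qquad (11)$$

where $D$ is the Bregman divergence of a strictly convex kernel function $h$, defined as

$$D_h(y, y^{(t)}) = h(y) - h(y^{(t)}) - \langle \nabla h(y^{(t)}),\, y - y^{(t)} \rangle.$$

Algorithm 1 is obtained in the Euclidean setting with $h(y) = \frac{1}{2}\|y\|^2$ and $D(y, y^{(t)}) = \frac{1}{2}\|y - y^{(t)}\|^2$. While our convergence results would apply for an arbitrary Bregman divergence, we only focus on the case of using $f^*$ itself as the kernel, because this allows us to compute $y^{(t+1)}$ in (11) very efficiently. The following lemma explains the details (cf. Lan & Zhou, 2015, Lemma 1).

Lemma 1. Let the kernel $h \equiv f^*$ in the Bregman divergence $D$. If we construct a sequence of vectors $\{v^{(t)}\}$ such that $v^{(0)} = \nabla f^*(y^{(0)})$ and, for all $t \ge 0$,

$$v^{(t+1)} = \frac{v^{(t)} + \sigma A \tilde{x}^{(t)}}{1 + \sigma}, \qquad (12)$$

then the solution to problem (11) is $y^{(t+1)} = \nabla f(v^{(t+1)})$.

Proof. Suppose $v^{(t)} = \nabla f^*(y^{(t)})$ (true for $t = 0$); then

$$D(y, y^{(t)}) = f^*(y) - f^*(y^{(t)}) - (v^{(t)})^T (y - y^{(t)}).$$

The solution to (11) can be written as

$$y^{(t+1)} = \arg\min_y \Big\{ f^*(y) - y^T A \tilde{x}^{(t)} + \frac{1}{\sigma}\big( f^*(y) - (v^{(t)})^T y \big) \Big\}$$
$$= \arg\min_y \Big\{ \Big(1 + \frac{1}{\sigma}\Big) f^*(y) - \Big( A \tilde{x}^{(t)} + \frac{1}{\sigma} v^{(t)} \Big)^T y \Big\}$$
$$= \arg\max_y \Big\{ \Big( \frac{v^{(t)} + \sigma A \tilde{x}^{(t)}}{1 + \sigma} \Big)^T y - f^*(y) \Big\}$$
$$= \arg\max_y \big\{ (v^{(t+1)})^T y - f^*(y) \big\} = \nabla f(v^{(t+1)}),$$

where in the last equality we used the property of the conjugate function when $f$ is strongly convex and smooth. Moreover,

$$v^{(t+1)} = (\nabla f)^{-1}(y^{(t+1)}) = \nabla f^*(y^{(t+1)}),$$

which completes the proof.

According to Lemma 1, we only need to provide initial points such that $v^{(0)} = \nabla f^*(y^{(0)})$ is easy to compute. We do not need to compute $\nabla f^*(y^{(t)})$ directly for any $t > 0$, because it can be updated as $v^{(t)}$ in (12). Consequently, we can update $y^{(t)}$ in the BPD algorithm using the gradient $\nabla f(v^{(t)})$, without the need of a dual proximal mapping. The resulting dual-free algorithm is given in Algorithm 6.

Algorithm 6 Dual-Free BPD Algorithm
  input: parameters $\sigma$, $\tau$, $\theta > 0$, initial point $(x^{(0)}, y^{(0)})$
  Set $\tilde{x}^{(0)} = x^{(0)}$ and $v^{(0)} = \nabla f^*(y^{(0)})$
  for $t = 0, 1, 2, \ldots$ do
    $v^{(t+1)} = \dfrac{v^{(t)} + \sigma A \tilde{x}^{(t)}}{1 + \sigma}$
    $y^{(t+1)} = \nabla f(v^{(t+1)})$
    $x^{(t+1)} = \mathrm{prox}_{\tau g}\big(x^{(t)} - \tau A^T y^{(t+1)}\big)$
    $\tilde{x}^{(t+1)} = x^{(t+1)} + \theta\,(x^{(t+1)} - x^{(t)})$
  end for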
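Lemma 1 can be checked numerically on a single coordinate. The sketch below assumes the logistic loss $\phi(v) = \log(1 + e^{-bv})$ with label $b \in \{-1, +1\}$ as the illustrative $f$; its gradient and the gradient of its conjugate are standard closed forms. All function names are hypothetical. The check evaluates the stationarity condition of problem (11), $(1 + 1/\sigma)\,\nabla\phi^*(y) - a^T\tilde{x} - v^{(t)}/\sigma = 0$, at the point produced by the dual-free update.

```python
import math


def logistic_grad(v, b):
    """phi'(v) for phi(v) = log(1 + exp(-b*v)), with b in {-1, +1}."""
    return -b / (1.0 + math.exp(b * v))


def logistic_conj_grad(beta, b):
    """(phi^*)'(beta), valid for b*beta strictly inside (-1, 0)."""
    return b * (math.log(1.0 + b * beta) - math.log(-b * beta))


def dual_free_step(v, a_x, sigma, b):
    """Dual-free update (12): new v, then y = phi'(v). No proximal mapping needed."""
    v_next = (v + sigma * a_x) / (1.0 + sigma)
    return v_next, logistic_grad(v_next, b)


def bregman_prox_residual(y_next, v, a_x, sigma, b):
    """Derivative of phi^*(y) - y*a_x + (1/sigma)*D(y, y_t) at y_next,
    where v = (phi^*)'(y_t). Zero means y_next solves problem (11)."""
    return (1.0 + 1.0 / sigma) * logistic_conj_grad(y_next, b) - a_x - v / sigma


# Illustrative values: start from y = -b/2, for which (phi^*)'(y) = 0.
b, sigma, a_x = 1.0, 0.3, 0.7
v0 = logistic_conj_grad(-0.5 * b, b)
v1, y1 = dual_free_step(v0, a_x, sigma, b)
residual = bregman_prox_residual(y1, v0, a_x, sigma, b)
```

The residual vanishes up to floating-point error, so the cheap gradient evaluation lands exactly on the minimizer that the Bregman proximal step would otherwise have to search for.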
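Lan & Zhou (2015) considered a general setting which does not possess the linear predictor structure we focus on in this paper, and assumed that only the regularization $g$ is strongly convex. Our following result shows that dual-free primal-dual algorithms can also exploit strong convexity from data with appropriate algorithmic parameters.

Theorem 3. Suppose Assumption 2 holds and let $(x^\star, y^\star)$ be the unique saddle point of $\mathcal{L}$ defined in (6). If we set the parameters in Algorithm 6 as

$$\tau = \frac{1}{2L} \sqrt{\frac{\gamma}{\lambda + \delta\mu^2}}, \qquad \sigma = \frac{1}{2L} \sqrt{\gamma(\lambda + \delta\mu^2)}, \qquad (13)$$

and $\theta = \max\{\theta_x, \theta_y\}$, where

$$\theta_x = \Big(1 - \frac{\tau\sigma\delta\mu^2}{4 + 2\sigma}\Big) \frac{1}{1 + \tau\lambda}, \qquad \theta_y = \frac{1}{1 + \sigma/2}, \qquad (14)$$

then we have

$$\Big(\frac{1}{2\tau} + \frac{\lambda}{2}\Big) \|x^{(t)} - x^\star\|^2 + \frac{1}{2} D(y^\star, y^{(t)}) \le \theta^t C,$$
$$\mathcal{L}(x^{(t)}, y^\star) - \mathcal{L}(x^\star, y^{(t)}) \le \theta^t C,$$

where $C = \big(\frac{1}{2\tau} + \frac{\lambda}{2}\big) \|x^{(0)} - x^\star\|^2 + \big(\frac{1}{\sigma} + \frac{1}{2}\big) D(y^\star, y^{(0)})$.

Theorem 3 is proved in Appendices B and D. Assuming $\gamma(\lambda + \delta\mu^2) \ll L^2$, we have

$$\theta_x \approx 1 - \frac{\gamma\delta\mu^2}{16L^2} - \frac{\lambda}{2L} \sqrt{\frac{\gamma}{\lambda + \delta\mu^2}}, \qquad \theta_y \approx 1 - \frac{\sqrt{\gamma(\lambda + \delta\mu^2)}}{4L}.$$

Again, we gain insights by considering the special cases:

If $\delta\mu^2 = 0$ and $\lambda > 0$, then $\theta_y \approx 1 - \frac{\sqrt{\gamma\lambda}}{4L}$ and $\theta_x \approx 1 - \frac{\sqrt{\gamma\lambda}}{2L}$. So $\theta = \max\{\theta_x, \theta_y\}$ is an accelerated rate.

If $\delta\mu^2 > 0$ and $\lambda = 0$, then $\theta_y \approx 1 - \frac{\sqrt{\gamma\delta\mu^2}}{4L}$ and $\theta_x \approx 1 - \frac{\gamma\delta\mu^2}{16L^2}$. Thus $\theta = \max\{\theta_x, \theta_y\} \approx 1 - \frac{\gamma\delta\mu^2}{16L^2}$ is not accelerated. Notice that this conclusion does not depend on the relative size of $\gamma\delta$ and $\mu^2/L^2$, and this is the major difference from the Euclidean case discussed in Section 2.

If both $\delta\mu^2 > 0$ and $\lambda > 0$, then the extent of acceleration depends on their relative size. If $\lambda$ is on the same order as $\delta\mu^2$ or larger, then an accelerated rate is obtained. If $\lambda$ is much smaller than $\delta\mu^2$, then the theory predicts no acceleration.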

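4.2. Dual-free SPDC algorithm

The same approach can be applied to derive a dual-free SPDC algorithm, which is described in Algorithm 7. It also includes a parameter adaption procedure, so we call it the adaptive dual-free SPDC (ADF-SPDC) algorithm. On related work, Shalev-Shwartz & Zhang (2016) and Shalev-Shwartz (2016) introduced dual-free SDCA.

Algorithm 7 Adaptive Dual-Free SPDC (ADF-SPDC)
  input: parameters $\sigma$, $\tau$, $\theta > 0$, initial point $(x^{(0)}, y^{(0)})$, and adaptation period $T$.
  Set $\tilde{x}^{(0)} = x^{(0)}$ and $v_i^{(0)} = (\phi_i^*)'(y_i^{(0)})$ for $i = 1, \ldots, n$
  for $t = 0, 1, 2, \ldots$ do
    pick $k \in \{1, \ldots, n\}$ uniformly at random
    for $i \in \{1, \ldots, n\}$ do
      if $i == k$ then
        $v_k^{(t+1)} = \dfrac{v_k^{(t)} + \sigma a_k^T \tilde{x}^{(t)}}{1 + \sigma}$, $\quad y_k^{(t+1)} = \phi_k'(v_k^{(t+1)})$
      else
        $v_i^{(t+1)} = v_i^{(t)}$, $\quad y_i^{(t+1)} = y_i^{(t)}$
      end if
    end for
    $x^{(t+1)} = \mathrm{prox}_{\tau g}\Big(x^{(t)} - \tau\big(u^{(t)} + (y_k^{(t+1)} - y_k^{(t)})\,a_k\big)\Big)$
    $u^{(t+1)} = u^{(t)} + \frac{1}{n}(y_k^{(t+1)} - y_k^{(t)})\,a_k$
    $\tilde{x}^{(t+1)} = x^{(t+1)} + \theta\,(x^{(t+1)} - x^{(t)})$
    if $\mathrm{mod}(t+1, T \cdot n) = 0$ then
      $(\tau, \sigma, \theta) = \text{SPDC-Adapt}\big(\{P^{(t-sn)}, D^{(t-sn)}\}_{s=0}^{T}\big)$
    end if
  end for

The following theorem characterizes the choice of algorithmic parameters that can exploit strong convexity from data to achieve linear convergence (proof given in Appendix F).

Theorem 4. Suppose Assumption 1 holds. Let $(x^\star, y^\star)$ be the saddle point of $\mathcal{L}$ defined in (3), and $R = \max\{\|a_1\|, \ldots, \|a_n\|\}$. If we set $T = \infty$ in Algorithm 7 (no adaption) and let

$$\sigma = \frac{1}{4R} \sqrt{\gamma(n\lambda + \delta\mu^2)}, \qquad \tau = \frac{1}{4R} \sqrt{\frac{\gamma}{n\lambda + \delta\mu^2}}, \qquad (15)$$

and $\theta = \max\{\theta_x, \theta_y\}$, where

$$\theta_x = \Big(1 - \frac{\tau\sigma\delta\mu^2}{n(4 + 2\sigma)}\Big) \frac{1}{1 + \tau\lambda}, \qquad \theta_y = \frac{1 + ((n-1)/n)\,\sigma/2}{1 + \sigma/2}, \qquad (16)$$

then we have

$$\Big(\frac{1}{2\tau} + \frac{\lambda}{2}\Big) \mathbb{E}\big[\|x^{(t)} - x^\star\|^2\big] + \frac{\gamma}{4} \mathbb{E}\big[D(y^\star, y^{(t)})\big] \le \theta^t C,$$
$$\mathbb{E}\big[\mathcal{L}(x^{(t)}, y^\star) - \mathcal{L}(x^\star, y^{(t)})\big] \le \theta^t C,$$

where $C = \big(\frac{1}{2\tau} + \frac{\lambda}{2}\big) \|x^{(0)} - x^\star\|^2 + \big(\frac{1}{\sigma} + \frac{1}{2}\big) D(y^\star, y^{(0)})$.

Below we discuss the expected convergence rate established in Theorem 4 in two special cases.

The case of $\mu^2 = 0$ but $\lambda > 0$. In this case we have $\tau = \frac{1}{4R}\sqrt{\frac{\gamma}{n\lambda}}$ and $\sigma = \frac{1}{4R}\sqrt{n\gamma\lambda}$, and

$$\theta_x = \frac{1}{1 + \tau\lambda} = 1 - \frac{1}{1 + 4R\sqrt{n/(\lambda\gamma)}}, \qquad \theta_y = \frac{1 + ((n-1)/n)\,\sigma/2}{1 + \sigma/2} = 1 - \frac{1}{n + 8R\sqrt{n/(\lambda\gamma)}}.$$

These recover the convergence rate of the standard SPDC algorithm (Zhang & Xiao, 2015).

The case of $\mu^2 > 0$ but $\lambda = 0$. In this case we have $\tau = \frac{1}{4R\mu}\sqrt{\frac{\gamma}{\delta}}$, $\sigma = \frac{\mu}{4R}\sqrt{\gamma\delta}$, and

$$\theta_x = 1 - \frac{\tau\sigma\delta\mu^2}{n(4 + 2\sigma)} = 1 - \frac{\gamma\delta\mu^2}{32 n R^2} \cdot \frac{1}{\sqrt{\gamma\delta}\,\mu/(4R) + 2}, \qquad \theta_y = \frac{1 + ((n-1)/n)\,\sigma/2}{1 + \sigma/2} = 1 - \frac{1}{n + 8nR/(\mu\sqrt{\gamma\delta})}.$$

We note that the primal function now is $R^2/\gamma$-smooth and $\delta\mu^2/n$-strongly convex. We discuss the following cases:

If $\sqrt{\gamma\delta}\,\mu \gg R$, then we have $\theta_x \approx 1 - \frac{\sqrt{\gamma\delta}\,\mu}{8nR}$ and $\theta_y \approx 1 - \frac{1}{n}$. Therefore $\theta = \max\{\theta_x, \theta_y\} \approx 1 - \frac{1}{n}$.

Otherwise, we have $\theta_x \approx 1 - \frac{\gamma\delta\mu^2}{64 n R^2}$, and $\theta_y$ is of the same order. This is not an accelerated rate, and we have the same iteration complexity as SVRG.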
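Finally, we give concrete examples of how to compute the initial points $y^{(0)}$ and $v^{(0)}$ such that $v_i^{(0)} = (\phi_i^*)'(y_i^{(0)})$.

For squared loss, $\phi_i(\alpha) = \frac{1}{2}(\alpha - b_i)^2$ and $\phi_i^*(\beta) = \frac{1}{2}\beta^2 + b_i\beta$. So $v_i^{(0)} = (\phi_i^*)'(y_i^{(0)}) = y_i^{(0)} + b_i$.

For logistic regression, we have $b_i \in \{1, -1\}$ and $\phi_i(\alpha) = \log(1 + e^{-b_i\alpha})$. The conjugate function is $\phi_i^*(\beta) = (-b_i\beta)\log(-b_i\beta) + (1 + b_i\beta)\log(1 + b_i\beta)$ if $b_i\beta \in [-1, 0]$, and $+\infty$ otherwise. We can choose $y_i^{(0)} = -\frac{1}{2} b_i$ and $v_i^{(0)} = 0$ such that $v_i^{(0)} = (\phi_i^*)'(y_i^{(0)})$.

For logistic regression, we have $\delta = 0$ over the full domain of $\phi_i$. However, each $\phi_i$ is locally strongly convex in a bounded domain (Bach, 2014): if $z \in [-B, B]$, then we know $\delta = \min_z \phi_i''(z) \ge \exp(-B)/4$. Therefore it is well suited to an adaptation scheme similar to Algorithm 4, which does not require knowledge of either $\delta$ or $\mu$.

5. Preliminary experiments

We present preliminary experiments to demonstrate the effectiveness of our proposed algorithms. First, we consider batch primal-dual algorithms for ridge regression over a synthetic dataset. The data matrix $A$ has sizes $n = 5000$ and $d = 3000$, and its entries are sampled from a multivariate normal distribution with mean zero and covariance matrix $\Sigma_{ij} = 2^{-|i-j|/2}$. We normalize all datasets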

such that $a_i = a_i / (\max_j \|a_j\|)$, to ensure the maximum norm of the data points is 1. We use $\ell_2$-regularization $g(x) = (\lambda/2)\|x\|^2$ with three choices of parameter $\lambda$: $1/n$, $10^{-2}/n$ and $10^{-4}/n$, which represent the strong, medium, and weak levels of regularization, respectively.

[Figure 1: three panels (synthetic, $\lambda = 1/n$; synthetic, $\lambda = 10^{-2}/n$; synthetic, $\lambda = 10^{-4}/n$) plotting the primal optimality gap for Primal AG, BPD, Opt-BPD and Ada-BPD.]
Figure 1. Comparison of batch primal-dual algorithms for a ridge regression problem with $n = 5000$ and $d = 3000$.

Figure 1 shows the performance of four different algorithms: the accelerated gradient algorithm for solving the primal minimization problem (Primal AG) (Nesterov, 2004), using $\lambda$ as the strong convexity parameter; the BPD algorithm (Algorithm 1) that uses $\lambda$ as the strong convexity parameter (setting $\mu = 0$); the optimal BPD algorithm (Opt-BPD) that uses $\mu = \sqrt{\lambda_{\min}(A^T A)}$ explicitly computed from data; and the Ada-BPD algorithm (Algorithm 2) with the robust adaptation heuristic (Algorithm 4) with $T = 10$, $\underline{c} = 0.95$ and $\overline{c} = 1.5$. As expected, the performance of Primal-AG is very similar to BPD with the same strong convexity parameter. The Opt-BPD fully exploits strong convexity from data, and thus has the fastest convergence. The Ada-BPD algorithm can partially exploit strong convexity from data without knowledge of $\mu$.

Next we compare the DF-SPDC (Algorithm 5 without adaption) and ADF-SPDC (Algorithm 7 with adaption) against several state-of-the-art randomized algorithms for ERM: SVRG (Johnson & Zhang, 2013), SAGA (Defazio et al., 2014), Katyusha (Allen-Zhu, 2016) and the standard SPDC method (Zhang & Xiao, 2015). For SVRG and Katyusha (an accelerated variant of SVRG), we choose the variance reduction period as $m = 2n$. The step sizes of all algorithms are set as their original papers suggested. For Ada-SPDC and ADF-SPDC, we use the robust adaptation scheme with $T = 10$, $\underline{c} = 0.95$ and $\overline{c} = 1.5$.

We first compare these randomized algorithms for ridge regression over the same synthetic data described above and the cpuact data from the LibSVM website (footnote: cjlin/libsvm/). The results are shown in Figure 2. With relatively strong regularization $\lambda = 1/n$, all methods perform similarly, as predicted by theory.
For the synthetic dataset with $\lambda = 10^{-2}/n$, the regularization is weaker but still stronger than the hidden strong convexity from data, so the accelerated algorithms (all variants of SPDC and Katyusha) perform better than SVRG and SAGA. With $\lambda = 10^{-4}/n$, it appears that the strong convexity from data dominates the regularization. Since the non-accelerated algorithms (SVRG and SAGA) may automatically exploit strong convexity from data, they become faster than the non-adaptive accelerated methods (Katyusha, SPDC and DF-SPDC). The adaptive accelerated method, ADF-SPDC, has the fastest convergence. This shows that our theoretical results (which predict no acceleration in this case) can be further improved.

Finally, we compare these randomized algorithms for logistic regression on the rcv1 dataset (from the LibSVM website) and another synthetic dataset with $n = 5000$ and $d = 500$, generated similarly as before but with covariance matrix $\Sigma_{ij} = 2^{-|i-j|/100}$. For the standard SPDC, we solve the dual proximal mapping using a few steps of Newton's method to high precision. The dual-free SPDC algorithms only use gradients of the logistic function. The results are presented in Figure 3. For both datasets, the strong convexity from data is very weak (or absent), so the accelerated algorithms perform better.

6. Conclusions

We have shown that primal-dual first-order algorithms are capable of exploiting strong convexity from data, if the algorithmic parameters are chosen appropriately. While these parameters may depend on problem-dependent constants that are unknown, we developed heuristics for adapting the parameters on the fly and obtained improved performance in experiments. It appears that our theoretical characterization of the convergence rates can be further improved, as our experiments often demonstrate significant acceleration in cases where our theory does not predict acceleration.

[Figure 2: six panels plotting the primal optimality gap for SVRG, SAGA, Katyusha, SPDC, DF-SPDC and ADF-SPDC. Top row: synthetic with $\lambda = 1/n$, $10^{-2}/n$, $10^{-4}/n$. Bottom row: cpuact with $\lambda = 1/n$, $10^{-2}/n$, $10^{-4}/n$.]
Figure 2. Comparison of randomized algorithms for ridge regression problems.

[Figure 3: six panels plotting the primal optimality gap for SVRG, SAGA, Katyusha, SPDC, DF-SPDC and ADF-SPDC. Top row: synthetic with $\lambda = 1/n$, $10^{-2}/n$, $10^{-4}/n$. Bottom row: rcv1 with $\lambda = 1/n$, $10^{-2}/n$, $10^{-4}/n$.]
Figure 3. Comparison of randomized algorithms for logistic regression problems.

References

Allen-Zhu, Zeyuan. Katyusha: Accelerated variance reduction for faster SGD. ArXiv e-print, 2016.

Bach, Francis. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression. Journal of Machine Learning Research, 2014.

Balamurugan, Palaniappan and Bach, Francis. Stochastic variance reduction methods for saddle-point problems. In Advances in Neural Information Processing Systems (NIPS) 29, 2016.

Bertsekas, Dimitri P. Incremental gradient, subgradient, and proximal methods for convex optimization: A survey. In Sra, Suvrit, Nowozin, Sebastian, and Wright, Stephen J. (eds.), Optimization for Machine Learning, chapter 4. MIT Press, 2012.

Chambolle, Antonin and Pock, Thomas. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40, 2011.

Chambolle, Antonin and Pock, Thomas. On the ergodic convergence rates of a first-order primal-dual algorithm. Mathematical Programming, Series A, 2016.

Defazio, Aaron, Bach, Francis, and Lacoste-Julien, Simon. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems, 2014.

Deng, Wei and Yin, Wotao. On the global and linear convergence of the generalized alternating direction method of multipliers. Journal of Scientific Computing, 2016.

Fercoq, Olivier and Richtárik, Peter. Accelerated, parallel, and proximal coordinate descent. SIAM Journal on Optimization, 2015.

Goldstein, Tom, Li, Min, Yuan, Xiaoming, Esser, Ernie, and Baraniuk, Richard. Adaptive primal-dual hybrid gradient methods for saddle-point problems. arXiv preprint, 2013.

Johnson, Rie and Zhang, Tong. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, 2013.

Lan, Guanghui and Zhou, Yi. An optimal randomized incremental gradient method. arXiv preprint, 2015.

Lin, Hongzhou, Mairal, Julien, and Harchaoui, Zaid. A universal catalyst for first-order optimization. In Advances in Neural Information Processing Systems, 2015a.

Lin, Qihang, Lu, Zhaosong, and Xiao, Lin. An accelerated randomized proximal coordinate gradient method and its application to regularized empirical risk minimization. SIAM Journal on Optimization, 2015b.

Malitsky, Yura and Pock, Thomas. A first-order primal-dual algorithm with linesearch. arXiv preprint, 2016.

Nedić, Angelia and Bertsekas, Dimitri P. Incremental subgradient methods for nondifferentiable optimization. SIAM Journal on Optimization, 2001.

Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston, 2004.

Nesterov, Yu. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 2012.

Richtárik, Peter and Takáč, Martin. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 2014.

Roux, Nicolas L., Schmidt, Mark, and Bach, Francis. A stochastic gradient method with an exponential convergence rate for finite training sets. In Advances in Neural Information Processing Systems, 2012.

Shalev-Shwartz, Shai. SDCA without duality, regularization, and individual convexity. In Proceedings of The 33rd International Conference on Machine Learning, 2016.

Shalev-Shwartz, Shai and Zhang, Tong. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 2013.

Shalev-Shwartz, Shai and Zhang, Tong. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. Mathematical Programming, 2016.

Xiao, Lin and Zhang, Tong. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 2014.

Zhang, Yuchen and Xiao, Lin. Stochastic primal-dual coordinate method for regularized empirical risk minimization. In Proceedings of The 32nd International Conference on Machine Learning, 2015.

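In the following appendices, we provide detailed proofs of theorems stated in the main paper. In Section A we first prove a basic inequality which is useful throughout the rest of the convergence analysis. Section B contains a general analysis of the batch primal-dual algorithm that is common for proving both Theorem 1 and Theorem 3. Sections C, D, E and F give proofs for Theorem 1, Theorem 3, Theorem 2 and Theorem 4, respectively.

A. A basic lemma

Lemma 2. Let $h$ be a strictly convex function and $D_h$ be its Bregman divergence. Suppose $\psi$ is $\nu$-strongly convex with respect to $D_h$ and $1/\delta$-smooth (with respect to the Euclidean norm), and

$$\hat{y} = \arg\min_{y \in C} \big\{ \psi(y) + \eta D_h(y, \bar{y}) \big\},$$

where $C$ is a compact convex set that lies within the relative interior of the domains of $h$ and $\psi$ (i.e., both $h$ and $\psi$ are differentiable over $C$). Then for any $y \in C$ and $\rho \in [0, 1]$, we have

$$\psi(y) + \eta D_h(y, \bar{y}) \ge \psi(\hat{y}) + \eta D_h(\hat{y}, \bar{y}) + \big(\eta + (1 - \rho)\nu\big) D_h(y, \hat{y}) + \frac{\rho\delta}{2} \|\nabla\psi(y) - \nabla\psi(\hat{y})\|^2.$$

Proof. The minimizer $\hat{y}$ satisfies the following first-order optimality condition:

$$\langle \nabla\psi(\hat{y}) + \eta \nabla D_h(\hat{y}, \bar{y}),\, y - \hat{y} \rangle \ge 0, \qquad \forall\, y \in C.$$

Here $\nabla D$ denotes the partial gradient of the Bregman divergence with respect to its first argument, i.e., $\nabla D(\hat{y}, \bar{y}) = \nabla h(\hat{y}) - \nabla h(\bar{y})$. So the above optimality condition is the same as

$$\langle \nabla\psi(\hat{y}) + \eta(\nabla h(\hat{y}) - \nabla h(\bar{y})),\, y - \hat{y} \rangle \ge 0, \qquad \forall\, y \in C. \qquad (17)$$

Since $\psi$ is $\nu$-strongly convex with respect to $D_h$ and $1/\delta$-smooth, we have

$$\psi(y) \ge \psi(\hat{y}) + \langle \nabla\psi(\hat{y}),\, y - \hat{y} \rangle + \nu D_h(y, \hat{y}),$$
$$\psi(y) \ge \psi(\hat{y}) + \langle \nabla\psi(\hat{y}),\, y - \hat{y} \rangle + \frac{\delta}{2} \|\nabla\psi(y) - \nabla\psi(\hat{y})\|^2.$$

For the second inequality, see, e.g., Theorem 2.1.5 in Nesterov (2004). Multiplying the two inequalities above by $(1 - \rho)$ and $\rho$ respectively and adding them together, we have

$$\psi(y) \ge \psi(\hat{y}) + \langle \nabla\psi(\hat{y}),\, y - \hat{y} \rangle + (1 - \rho)\nu D_h(y, \hat{y}) + \frac{\rho\delta}{2} \|\nabla\psi(y) - \nabla\psi(\hat{y})\|^2.$$

The Bregman divergence $D_h$ satisfies the following equality:

$$D_h(y, \bar{y}) = D_h(y, \hat{y}) + D_h(\hat{y}, \bar{y}) + \langle \nabla h(\hat{y}) - \nabla h(\bar{y}),\, y - \hat{y} \rangle.$$

We multiply this equality by $\eta$ and add it to the last inequality to obtain

$$\psi(y) + \eta D_h(y, \bar{y}) \ge \psi(\hat{y}) + \eta D_h(\hat{y}, \bar{y}) + \big(\eta + (1 - \rho)\nu\big) D_h(y, \hat{y}) + \frac{\rho\delta}{2} \|\nabla\psi(y) - \nabla\psi(\hat{y})\|^2 + \langle \nabla\psi(\hat{y}) + \eta(\nabla h(\hat{y}) - \nabla h(\bar{y})),\, y - \hat{y} \rangle.$$

Using the optimality condition in (17), the last inner product term is nonnegative and thus can be dropped, which gives the desired inequality.

B. Common Analysis of Batch Primal-Dual Algorithms

We consider the general primal-dual update rule as: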

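Iteration: $(\hat x,\hat y) = \mathrm{PD}_{\tau,\sigma}(\bar x,\bar y,\tilde x,\tilde y)$
\[
\hat x = \arg\min_{x\in\mathbb{R}^d}\Bigl\{ g(x) + \tilde y^T A x + \frac{1}{2\tau}\|x-\bar x\|^2 \Bigr\}, \tag{18}
\]
\[
\hat y = \arg\min_{y\in\mathbb{R}^n}\Bigl\{ f^*(y) - y^T A\tilde x + \frac{1}{\sigma} D(y,\bar y) \Bigr\}. \tag{19}
\]
Each iteration of the batch primal-dual algorithm is equivalent to the following specification of $\mathrm{PD}_{\tau,\sigma}$:
\[
\hat x = x^{(t+1)},\quad \bar x = x^{(t)},\quad \tilde x = x^{(t)} + \theta\bigl(x^{(t)}-x^{(t-1)}\bigr),\qquad
\hat y = y^{(t+1)},\quad \bar y = y^{(t)},\quad \tilde y = y^{(t+1)}. \tag{20}
\]
Besides the assumptions of the main paper, we also assume that $f^*$ is $\nu$-strongly convex with respect to a kernel function $h$, i.e.,
\[
f^*(y') - f^*(y) - \langle\nabla f^*(y),\, y'-y\rangle \ge \nu D_h(y',y),
\]
where $D_h$ is the Bregman divergence defined as
\[
D_h(y',y) = h(y') - h(y) - \langle\nabla h(y),\, y'-y\rangle .
\]
We assume that $h$ is $\gamma'$-strongly convex and $1/\delta'$-smooth. Depending on the kernel function $h$, this assumption on $f^*$ may impose additional restrictions on $f$. In this paper, we are mostly interested in two special cases: $h(y) = \tfrac12\|y\|^2$ and $h(y) = f^*(y)$ (for the latter we always have $\nu = 1$). From now on, we omit the subscript $h$ and use $D$ to denote the Bregman divergence.

Under the above assumptions, any solution $(x^\star,y^\star)$ to the saddle-point problem (6) satisfies the optimality condition
\[
-A^T y^\star \in \partial g(x^\star), \qquad A x^\star = \nabla f^*(y^\star).
\]
The optimality conditions for the updates described in equations (18) and (19) are
\[
-A^T\tilde y + \frac{1}{\tau}(\bar x-\hat x) \in \partial g(\hat x), \tag{23}
\]
\[
A\tilde x - \frac{1}{\sigma}\bigl(\nabla h(\hat y)-\nabla h(\bar y)\bigr) = \nabla f^*(\hat y). \tag{24}
\]
Applying the lemma of Section A to the dual minimization step (19) with $\psi(y) = f^*(y) - y^T A\tilde x$, $\eta = 1/\sigma$, $y = y^\star$ and $\rho = 1/2$, we obtain
\[
f^*(y^\star) - (y^\star)^T A\tilde x + \frac{1}{\sigma}D(y^\star,\bar y) \ge f^*(\hat y) - \hat y^T A\tilde x + \frac{1}{\sigma}D(\hat y,\bar y) + \Bigl(\frac{1}{\sigma}+\frac{\nu}{2}\Bigr) D(y^\star,\hat y) + \frac{\delta}{4}\,\|\nabla f^*(y^\star)-\nabla f^*(\hat y)\|^2 . \tag{25}
\]
Similarly, for the primal minimization step (18), we have (setting $\rho = 0$)
\[
g(x^\star) + \tilde y^T A x^\star + \frac{1}{2\tau}\|x^\star-\bar x\|^2 \ge g(\hat x) + \tilde y^T A\hat x + \frac{1}{2\tau}\|\hat x-\bar x\|^2 + \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-\hat x\|^2 . \tag{26}
\]
Combining the two inequalities above with the definition $\mathcal{L}(x,y) = g(x) + y^T A x - f^*(y)$, we get
\[
\begin{aligned}
\mathcal{L}(\hat x,y^\star) - \mathcal{L}(x^\star,\hat y)
&= g(\hat x) + (y^\star)^T A\hat x - f^*(y^\star) - g(x^\star) - \hat y^T A x^\star + f^*(\hat y) \\
&\le \frac{1}{2\tau}\|x^\star-\bar x\|^2 + \frac{1}{\sigma}D(y^\star,\bar y) - \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-\hat x\|^2 - \Bigl(\frac{1}{\sigma}+\frac{\nu}{2}\Bigr)D(y^\star,\hat y) \\
&\quad - \frac{1}{2\tau}\|\hat x-\bar x\|^2 - \frac{1}{\sigma}D(\hat y,\bar y) - \frac{\delta}{4}\,\|\nabla f^*(y^\star)-\nabla f^*(\hat y)\|^2 \\
&\quad + (y^\star)^T A\hat x - \hat y^T A x^\star + \tilde y^T A x^\star - \tilde y^T A\hat x - (y^\star)^T A\tilde x + \hat y^T A\tilde x .
\end{aligned}
\]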

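We can simplify the inner product terms as
\[
(y^\star)^T A\hat x - \hat y^T A x^\star + \tilde y^T A x^\star - \tilde y^T A\hat x - (y^\star)^T A\tilde x + \hat y^T A\tilde x
= (\hat y-\tilde y)^T A(\hat x-x^\star) - (\hat y-y^\star)^T A(\hat x-\tilde x).
\]
Rearranging terms on the two sides of the inequality, we have
\[
\begin{aligned}
\frac{1}{2\tau}\|x^\star-\bar x\|^2 + \frac{1}{\sigma}D(y^\star,\bar y)
&\ge \mathcal{L}(\hat x,y^\star) - \mathcal{L}(x^\star,\hat y) + \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-\hat x\|^2 + \Bigl(\frac{1}{\sigma}+\frac{\nu}{2}\Bigr)D(y^\star,\hat y) \\
&\quad + \frac{1}{2\tau}\|\hat x-\bar x\|^2 + \frac{1}{\sigma}D(\hat y,\bar y) + \frac{\delta}{4}\,\|\nabla f^*(y^\star)-\nabla f^*(\hat y)\|^2 \\
&\quad + (\hat y-y^\star)^T A(\hat x-\tilde x) - (\hat y-\tilde y)^T A(\hat x-x^\star).
\end{aligned}
\]
Applying the substitutions in (20) (note that $\tilde y = \hat y$, so the last term vanishes) yields
\[
\begin{aligned}
\frac{1}{2\tau}\|x^\star-x^{(t)}\|^2 + \frac{1}{\sigma}D(y^\star,y^{(t)})
&\ge \mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^{(t+1)}) + \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t+1)}\|^2 + \Bigl(\frac{1}{\sigma}+\frac{\nu}{2}\Bigr)D(y^\star,y^{(t+1)}) \\
&\quad + \frac{1}{2\tau}\|x^{(t+1)}-x^{(t)}\|^2 + \frac{1}{\sigma}D(y^{(t+1)},y^{(t)}) + \frac{\delta}{4}\,\|\nabla f^*(y^\star)-\nabla f^*(y^{(t+1)})\|^2 \\
&\quad + (y^{(t+1)}-y^\star)^T A\bigl(x^{(t+1)}-x^{(t)}-\theta(x^{(t)}-x^{(t-1)})\bigr).
\end{aligned} \tag{27}
\]
We can rearrange the inner product term in (27) as
\[
\begin{aligned}
(y^{(t+1)}-y^\star)^T A\bigl(x^{(t+1)}-x^{(t)}-\theta(x^{(t)}-x^{(t-1)})\bigr)
&= (y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)}) - \theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) \\
&\quad - \theta(y^{(t+1)}-y^{(t)})^T A(x^{(t)}-x^{(t-1)}).
\end{aligned}
\]
Using the saddle-point optimality condition and (24), we can also bound $\|\nabla f^*(y^\star)-\nabla f^*(y^{(t+1)})\|^2$:
\[
\begin{aligned}
\|\nabla f^*(y^\star)-\nabla f^*(y^{(t+1)})\|^2
&= \Bigl\| A x^\star - A\bigl(x^{(t)}+\theta(x^{(t)}-x^{(t-1)})\bigr) + \frac{1}{\sigma}\bigl(\nabla h(y^{(t+1)})-\nabla h(y^{(t)})\bigr)\Bigr\|^2 \\
&\ge \Bigl(1-\frac{1}{\alpha}\Bigr)\|A(x^\star-x^{(t)})\|^2 - (\alpha-1)\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{1}{\sigma}\bigl(\nabla h(y^{(t+1)})-\nabla h(y^{(t)})\bigr)\Bigr\|^2 ,
\end{aligned}
\]
where $\alpha > 1$. With the definition $\mu = \sqrt{\lambda_{\min}(A^T A)}$, we also have $\|A(x^\star-x^{(t)})\|^2 \ge \mu^2\|x^\star-x^{(t)}\|^2$. Combining them with the inequality (27) leads to
\[
\begin{aligned}
&\frac{1}{2\tau}\|x^\star-x^{(t)}\|^2 + \frac{1}{\sigma}D(y^\star,y^{(t)}) + \theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) \\
&\quad\ge \mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^{(t+1)}) + \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t+1)}\|^2 + \Bigl(\frac{1}{\sigma}+\frac{\nu}{2}\Bigr)D(y^\star,y^{(t+1)}) + (y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)}) \\
&\qquad + \frac{1}{2\tau}\|x^{(t+1)}-x^{(t)}\|^2 + \frac{1}{\sigma}D(y^{(t+1)},y^{(t)}) - \theta(y^{(t+1)}-y^{(t)})^T A(x^{(t)}-x^{(t-1)}) \\
&\qquad + \Bigl(1-\frac{1}{\alpha}\Bigr)\frac{\delta\mu^2}{4}\|x^\star-x^{(t)}\|^2 - (\alpha-1)\frac{\delta}{4}\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{1}{\sigma}\bigl(\nabla h(y^{(t+1)})-\nabla h(y^{(t)})\bigr)\Bigr\|^2 .
\end{aligned} \tag{28}
\]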
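Two purely algebraic facts carry the analysis of Section B: the simplification of the six cross terms into two inner products, and the inequality $\|a-b\|^2 \ge (1-1/\alpha)\|a\|^2 - (\alpha-1)\|b\|^2$ for $\alpha>1$. The following is a minimal numerical sanity check of both, not part of the original proof; the random vectors and the names `identity_gap` and `young_gap` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4
A = rng.standard_normal((n, d))

# Hypothetical stand-ins for the primal and dual points in the cross terms.
x_hat, x_star, x_tilde = (rng.standard_normal(d) for _ in range(3))
y_hat, y_star, y_tilde = (rng.standard_normal(n) for _ in range(3))

# Six cross terms as they appear after combining the primal and dual bounds.
lhs = (y_star @ A @ x_hat - y_hat @ A @ x_star + y_tilde @ A @ x_star
       - y_tilde @ A @ x_hat - y_star @ A @ x_tilde + y_hat @ A @ x_tilde)
# Their simplified form.
rhs = ((y_hat - y_tilde) @ A @ (x_hat - x_star)
       - (y_hat - y_star) @ A @ (x_hat - x_tilde))
identity_gap = abs(lhs - rhs)


def young_gap(a, b, alpha):
    """||a-b||^2 - [(1-1/alpha)||a||^2 - (alpha-1)||b||^2]; nonnegative for alpha > 1."""
    return np.sum((a - b) ** 2) - ((1 - 1 / alpha) * np.sum(a ** 2)
                                   - (alpha - 1) * np.sum(b ** 2))


# Smallest gap over a few random draws and values of alpha.
min_young_gap = min(
    young_gap(rng.standard_normal(n), rng.standard_normal(n), alpha)
    for alpha in (1.01, 1.5, 3.0, 10.0)
    for _ in range(100)
)
```

The second gap equals $\|a/\sqrt{\alpha} - \sqrt{\alpha}\,b\|^2$, which is why it can never be negative.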

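C. Proof of Theorem 1

Let the kernel function be $h(y) = \tfrac12\|y\|^2$. In this case, we have $D(y',y) = \tfrac12\|y'-y\|^2$ and $\nabla h(y) = y$. Moreover, $\gamma' = \delta' = 1$ and $\nu = \gamma$. Therefore, the inequality (28) becomes
\[
\begin{aligned}
&\Bigl(\frac{1}{2\tau} - \Bigl(1-\frac{1}{\alpha}\Bigr)\frac{\delta\mu^2}{4}\Bigr)\|x^\star-x^{(t)}\|^2 + \frac{1}{2\sigma}\|y^\star-y^{(t)}\|^2 + \theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) \\
&\quad\ge \mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^{(t+1)}) + \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t+1)}\|^2 + \frac12\Bigl(\frac{1}{\sigma}+\frac{\gamma}{2}\Bigr)\|y^\star-y^{(t+1)}\|^2 + (y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)}) \\
&\qquad + \frac{1}{2\tau}\|x^{(t+1)}-x^{(t)}\|^2 + \underline{\frac{1}{2\sigma}\|y^{(t+1)}-y^{(t)}\|^2 - \theta(y^{(t+1)}-y^{(t)})^T A(x^{(t)}-x^{(t-1)})} \\
&\qquad - (\alpha-1)\frac{\delta}{4}\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{1}{\sigma}(y^{(t+1)}-y^{(t)})\Bigr\|^2 .
\end{aligned} \tag{29}
\]
Next we derive another form of the underlined terms above:
\[
\begin{aligned}
\frac{1}{2\sigma}\|y^{(t+1)}-y^{(t)}\|^2 - \theta(y^{(t+1)}-y^{(t)})^T A(x^{(t)}-x^{(t-1)})
&= \frac{\sigma}{2}\Bigl(\frac{1}{\sigma^2}\|y^{(t+1)}-y^{(t)}\|^2 - \frac{2\theta}{\sigma}(y^{(t+1)}-y^{(t)})^T A(x^{(t)}-x^{(t-1)})\Bigr) \\
&= \frac{\sigma}{2}\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{1}{\sigma}(y^{(t+1)}-y^{(t)})\Bigr\|^2 - \frac{\sigma\theta^2}{2}\|A(x^{(t)}-x^{(t-1)})\|^2 \\
&\ge \frac{\sigma}{2}\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{1}{\sigma}(y^{(t+1)}-y^{(t)})\Bigr\|^2 - \frac{\sigma\theta^2L^2}{2}\|x^{(t)}-x^{(t-1)}\|^2 ,
\end{aligned}
\]
where in the last inequality we used $\|A\| \le L$ and hence $\|A(x^{(t)}-x^{(t-1)})\|^2 \le L^2\|x^{(t)}-x^{(t-1)}\|^2$. Combining with inequality (29), we have
\[
\begin{aligned}
&\Bigl(\frac{1}{2\tau} - \Bigl(1-\frac{1}{\alpha}\Bigr)\frac{\delta\mu^2}{4}\Bigr)\|x^{(t)}-x^\star\|^2 + \frac{1}{2\sigma}\|y^{(t)}-y^\star\|^2 + \theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) + \frac{\sigma\theta^2L^2}{2}\|x^{(t)}-x^{(t-1)}\|^2 \\
&\quad\ge \mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^{(t+1)}) + \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^{(t+1)}-x^\star\|^2 + \frac12\Bigl(\frac{1}{\sigma}+\frac{\gamma}{2}\Bigr)\|y^{(t+1)}-y^\star\|^2 + (y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)}) \\
&\qquad + \frac{1}{2\tau}\|x^{(t+1)}-x^{(t)}\|^2 + \Bigl(\frac{\sigma}{2} - (\alpha-1)\frac{\delta}{4}\Bigr)\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{1}{\sigma}(y^{(t+1)}-y^{(t)})\Bigr\|^2 .
\end{aligned} \tag{30}
\]
We can remove the last term in the above inequality as long as its coefficient is nonnegative, i.e.,
\[
\frac{\sigma}{2} - (\alpha-1)\frac{\delta}{4} \ge 0 .
\]
In order to maximize $1-1/\alpha$, we take the equality and solve for the largest value of $\alpha$ allowed, which results in
\[
\alpha = 1 + \frac{2\sigma}{\delta}, \qquad 1-\frac{1}{\alpha} = \frac{2\sigma}{2\sigma+\delta}.
\]
Applying these values in (30) gives
\[
\begin{aligned}
&\Bigl(\frac{1}{2\tau} - \frac{\sigma\delta\mu^2}{4\sigma+2\delta}\Bigr)\|x^{(t)}-x^\star\|^2 + \frac{1}{2\sigma}\|y^{(t)}-y^\star\|^2 + \theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) + \frac{\sigma\theta^2L^2}{2}\|x^{(t)}-x^{(t-1)}\|^2 \\
&\quad\ge \mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^{(t+1)}) \\
&\qquad + \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^{(t+1)}-x^\star\|^2 + \frac12\Bigl(\frac{1}{\sigma}+\frac{\gamma}{2}\Bigr)\|y^{(t+1)}-y^\star\|^2 + (y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)}) + \frac{1}{2\tau}\|x^{(t+1)}-x^{(t)}\|^2 .
\end{aligned} \tag{31}
\]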

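We use $\Delta^{(t+1)}$ to denote the last row in (31). Equivalently, we define
\[
\begin{aligned}
\Delta^{(t)} &= \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t)}\|^2 + \frac12\Bigl(\frac{1}{\sigma}+\frac{\gamma}{2}\Bigr)\|y^\star-y^{(t)}\|^2 + (y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) + \frac{1}{2\tau}\|x^{(t)}-x^{(t-1)}\|^2 \\
&= \frac{\lambda}{2}\|x^\star-x^{(t)}\|^2 + \frac{1}{2\tau}\|x^\star-x^{(t)}\|^2 + \frac{\gamma}{4}\|y^\star-y^{(t)}\|^2
 + \frac12\begin{bmatrix} x^{(t)}-x^{(t-1)} \\ y^\star-y^{(t)} \end{bmatrix}^T
 \begin{bmatrix} \frac{1}{\tau}I & -A^T \\ -A & \frac{1}{\sigma}I \end{bmatrix}
 \begin{bmatrix} x^{(t)}-x^{(t-1)} \\ y^\star-y^{(t)} \end{bmatrix}.
\end{aligned}
\]
The quadratic form in the last term is nonnegative if the matrix
\[
M = \begin{bmatrix} \frac{1}{\tau}I & -A^T \\ -A & \frac{1}{\sigma}I \end{bmatrix}
\]
is positive semidefinite, for which a sufficient condition is $\tau\sigma \le 1/L^2$. Under this condition,
\[
\Delta^{(t)} \ge \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t)}\|^2 + \frac{\gamma}{4}\|y^\star-y^{(t)}\|^2 \ge 0 . \tag{32}
\]
If we can choose $\tau$ and $\sigma$ so that
\[
\frac{1}{2\tau} - \frac{\sigma\delta\mu^2}{4\sigma+2\delta} \le \frac{\theta}{2}\Bigl(\frac{1}{\tau}+\lambda\Bigr), \qquad
\frac{1}{2\sigma} \le \frac{\theta}{2}\Bigl(\frac{1}{\sigma}+\frac{\gamma}{2}\Bigr), \qquad
\frac{\sigma\theta^2L^2}{2} \le \frac{\theta}{2\tau}, \tag{33}
\]
then, according to (31), we have
\[
\Delta^{(t+1)} + \mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^{(t+1)}) \le \theta\,\Delta^{(t)} .
\]
Because $\Delta^{(t)} \ge 0$ and $\mathcal{L}(x^{(t)},y^\star) - \mathcal{L}(x^\star,y^{(t)}) \ge 0$ for any $t \ge 0$, we have
\[
\Delta^{(t+1)} \le \theta\,\Delta^{(t)},
\]
which implies
\[
\Delta^{(t)} \le \theta^t\Delta^{(0)} \qquad\text{and}\qquad \mathcal{L}(x^{(t)},y^\star) - \mathcal{L}(x^\star,y^{(t)}) \le \theta^t\Delta^{(0)} .
\]
Let $\theta_x$ and $\theta_y$ be the two contraction factors determined by the first two inequalities in (33), i.e.,
\[
\theta_x = \Bigl(\frac{1}{2\tau} - \frac{\sigma\delta\mu^2}{4\sigma+2\delta}\Bigr)\Big/\,\frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr) = \Bigl(1 - \frac{\tau\sigma\delta\mu^2}{2\sigma+\delta}\Bigr)\frac{1}{1+\tau\lambda},
\qquad
\theta_y = \frac{1}{2\sigma}\Big/\,\frac12\Bigl(\frac{1}{\sigma}+\frac{\gamma}{2}\Bigr) = \frac{1}{1+\sigma\gamma/2}.
\]
Then we can let $\theta = \max\{\theta_x,\theta_y\}$. We note that any $\theta < 1$ would satisfy the last condition in (33) provided that $\tau\sigma = 1/L^2$, which also makes the matrix $M$ positive semidefinite and thus ensures the inequality (32). Among all possible pairs $\tau,\sigma$ that satisfy $\tau\sigma = 1/L^2$, we choose
\[
\tau = \frac{1}{L}\sqrt{\frac{\gamma}{\lambda+\delta\mu^2}}, \qquad \sigma = \frac{1}{L}\sqrt{\frac{\lambda+\delta\mu^2}{\gamma}}, \tag{34}
\]
which give the desired results of Theorem 1.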
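To make the batch analysis concrete, the following is a minimal sketch of the Euclidean batch primal-dual iteration on a ridge-regression instance. It is an illustration under stated assumptions rather than the authors' implementation: it assumes $f(z) = \tfrac12\|z-b\|^2$ (so $f^*(y) = \tfrac12\|y\|^2 + b^T y$ and $\gamma=\delta=1$), $g(x) = \tfrac{\lambda}{2}\|x\|^2$, step sizes $\tau = \frac1L\sqrt{\gamma/(\lambda+\delta\mu^2)}$ and $\sigma = \frac1L\sqrt{(\lambda+\delta\mu^2)/\gamma}$, and $\theta = \max\{\theta_x,\theta_y\}$ with $\theta_x = (1-\tau\sigma\delta\mu^2/(2\sigma+\delta))/(1+\tau\lambda)$ and $\theta_y = 1/(1+\sigma\gamma/2)$. The function name `batch_primal_dual_ridge` is hypothetical.

```python
import numpy as np


def batch_primal_dual_ridge(A, b, lam, iters=500):
    """Sketch: minimize 0.5*||Ax - b||^2 + 0.5*lam*||x||^2 via its saddle-point form.

    Assumes f(z) = 0.5*||z - b||^2, so gamma = delta = 1 and both proximal
    steps have closed forms.
    """
    n, d = A.shape
    gamma = delta = 1.0
    L = np.linalg.norm(A, 2)                          # spectral norm ||A||
    mu2 = max(np.linalg.eigvalsh(A.T @ A)[0], 0.0)    # mu^2 = lambda_min(A^T A)

    tau = np.sqrt(gamma / (lam + delta * mu2)) / L
    sigma = np.sqrt((lam + delta * mu2) / gamma) / L
    theta_x = (1 - tau * sigma * delta * mu2 / (2 * sigma + delta)) / (1 + tau * lam)
    theta_y = 1 / (1 + sigma * gamma / 2)
    theta = max(theta_x, theta_y)

    x_prev = x = np.zeros(d)
    y = np.zeros(n)
    for _ in range(iters):
        x_tilde = x + theta * (x - x_prev)            # extrapolated primal point
        # Dual step: argmin_y f*(y) - y^T A x_tilde + ||y - y_t||^2 / (2 sigma)
        y = (y / sigma + A @ x_tilde - b) / (1 + 1 / sigma)
        # Primal step: argmin_x g(x) + y^T A x + ||x - x_t||^2 / (2 tau)
        x_prev, x = x, (x / tau - A.T @ y) / (lam + 1 / tau)
    return x
```

A quick way to use the sketch is to compare its output with the closed-form ridge solution $(A^TA+\lambda I)^{-1}A^Tb$ on a small random problem.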

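D. Proof of Theorem 3

If we choose $h = f^*$, then $h$ is $\gamma$-strongly convex and $1/\delta$-smooth, i.e., $\gamma' = \gamma$ and $\delta' = \delta$; and $f^*$ is $1$-strongly convex with respect to $h$, i.e., $\nu = 1$.

For convenience, we repeat inequality (28) here:
\[
\begin{aligned}
&\frac{1}{2\tau}\|x^\star-x^{(t)}\|^2 + \frac{1}{\sigma}D(y^\star,y^{(t)}) + \theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) \\
&\quad\ge \mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^{(t+1)}) + \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t+1)}\|^2 + \Bigl(\frac{1}{\sigma}+\frac{\nu}{2}\Bigr)D(y^\star,y^{(t+1)}) + (y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)}) \\
&\qquad + \frac{1}{2\tau}\|x^{(t+1)}-x^{(t)}\|^2 + \frac{1}{\sigma}D(y^{(t+1)},y^{(t)}) - \theta(y^{(t+1)}-y^{(t)})^T A(x^{(t)}-x^{(t-1)}) \\
&\qquad + \Bigl(1-\frac{1}{\alpha}\Bigr)\frac{\delta\mu^2}{4}\|x^\star-x^{(t)}\|^2 - (\alpha-1)\frac{\delta}{4}\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{1}{\sigma}\bigl(\nabla h(y^{(t+1)})-\nabla h(y^{(t)})\bigr)\Bigr\|^2 .
\end{aligned} \tag{35}
\]
We first bound the Bregman divergence $D(y^{(t+1)},y^{(t)})$ using the assumption that the kernel $h$ is $\gamma$-strongly convex and $1/\delta$-smooth. Using similar arguments as in the proof of the lemma of Section A, we have for any $\rho\in[0,1]$,
\[
D(y^{(t+1)},y^{(t)}) = h(y^{(t+1)}) - h(y^{(t)}) - \langle\nabla h(y^{(t)}),\, y^{(t+1)}-y^{(t)}\rangle
\ge (1-\rho)\frac{\gamma}{2}\|y^{(t+1)}-y^{(t)}\|^2 + \rho\,\frac{\delta}{2}\|\nabla h(y^{(t+1)})-\nabla h(y^{(t)})\|^2 . \tag{36}
\]
For any $\beta > 0$, we can lower bound the inner product term
\[
-\theta(y^{(t+1)}-y^{(t)})^T A(x^{(t)}-x^{(t-1)}) \ge -\frac{\beta}{2}\|y^{(t+1)}-y^{(t)}\|^2 - \frac{\theta^2L^2}{2\beta}\|x^{(t)}-x^{(t-1)}\|^2 .
\]
In addition, we have
\[
\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{1}{\sigma}\bigl(\nabla h(y^{(t+1)})-\nabla h(y^{(t)})\bigr)\Bigr\|^2 \le 2\theta^2L^2\|x^{(t)}-x^{(t-1)}\|^2 + \frac{2}{\sigma^2}\|\nabla h(y^{(t+1)})-\nabla h(y^{(t)})\|^2 .
\]
Combining these bounds with (35) and (36) with $\rho = 1/2$, we arrive at
\[
\begin{aligned}
&\Bigl(\frac{1}{2\tau} - \Bigl(1-\frac{1}{\alpha}\Bigr)\frac{\delta\mu^2}{4}\Bigr)\|x^\star-x^{(t)}\|^2 + \frac{1}{\sigma}D(y^\star,y^{(t)}) + \theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) \\
&\quad + \Bigl(\frac{\theta^2L^2}{2\beta} + \frac{(\alpha-1)\delta\theta^2L^2}{2}\Bigr)\|x^{(t)}-x^{(t-1)}\|^2 \\
&\quad\ge \mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^{(t+1)}) + \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t+1)}\|^2 + \Bigl(\frac{1}{\sigma}+\frac12\Bigr)D(y^\star,y^{(t+1)}) + (y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)}) \\
&\qquad + \Bigl(\frac{\gamma}{4\sigma} - \frac{\beta}{2}\Bigr)\|y^{(t+1)}-y^{(t)}\|^2 + \Bigl(\frac{\delta}{4\sigma} - \frac{(\alpha-1)\delta}{2\sigma^2}\Bigr)\|\nabla h(y^{(t+1)})-\nabla h(y^{(t)})\|^2 + \frac{1}{2\tau}\|x^{(t+1)}-x^{(t)}\|^2 .
\end{aligned} \tag{37}
\]
We choose $\alpha$ and $\beta$ in (37) to zero out the coefficients of $\|y^{(t+1)}-y^{(t)}\|^2$ and $\|\nabla h(y^{(t+1)})-\nabla h(y^{(t)})\|^2$:
\[
\alpha = 1 + \frac{\sigma}{2}, \qquad \beta = \frac{\gamma}{2\sigma}.
\]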

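Then the inequality (37) becomes
\[
\begin{aligned}
&\Bigl(\frac{1}{2\tau} - \frac{\sigma\delta\mu^2}{4(2+\sigma)}\Bigr)\|x^\star-x^{(t)}\|^2 + \frac{1}{\sigma}D(y^\star,y^{(t)}) + \theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) + \Bigl(\frac{\sigma\theta^2L^2}{\gamma} + \frac{\delta\sigma\theta^2L^2}{4}\Bigr)\|x^{(t)}-x^{(t-1)}\|^2 \\
&\quad\ge \mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^{(t+1)}) + \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t+1)}\|^2 + \Bigl(\frac{1}{\sigma}+\frac12\Bigr)D(y^\star,y^{(t+1)}) \\
&\qquad + (y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)}) + \frac{1}{2\tau}\|x^{(t+1)}-x^{(t)}\|^2 .
\end{aligned}
\]
The coefficient of $\|x^{(t)}-x^{(t-1)}\|^2$ can be bounded as
\[
\frac{\sigma\theta^2L^2}{\gamma} + \frac{\delta\sigma\theta^2L^2}{4} = \Bigl(\frac{1}{\gamma}+\frac{\delta}{4}\Bigr)\sigma\theta^2L^2 = \frac{4+\gamma\delta}{4\gamma}\,\sigma\theta^2L^2 < \frac{2\sigma\theta^2L^2}{\gamma},
\]
where in the inequality we used $\gamma\delta \le 1$. Therefore we have
\[
\begin{aligned}
&\Bigl(\frac{1}{2\tau} - \frac{\sigma\delta\mu^2}{4(2+\sigma)}\Bigr)\|x^\star-x^{(t)}\|^2 + \frac{1}{\sigma}D(y^\star,y^{(t)}) + \theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) + \frac{2\sigma\theta^2L^2}{\gamma}\|x^{(t)}-x^{(t-1)}\|^2 \\
&\quad\ge \mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^{(t+1)}) \\
&\qquad + \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t+1)}\|^2 + \Bigl(\frac{1}{\sigma}+\frac12\Bigr)D(y^\star,y^{(t+1)}) + (y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)}) + \frac{1}{2\tau}\|x^{(t+1)}-x^{(t)}\|^2 .
\end{aligned}
\]
We use $\Delta^{(t+1)}$ to denote the last row of the above inequality. Equivalently, we define
\[
\Delta^{(t)} = \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t)}\|^2 + \Bigl(\frac{1}{\sigma}+\frac12\Bigr)D(y^\star,y^{(t)}) + (y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) + \frac{1}{2\tau}\|x^{(t)}-x^{(t-1)}\|^2 .
\]
Since $h$ is $\gamma$-strongly convex, we have $D(y^\star,y^{(t)}) \ge \frac{\gamma}{2}\|y^\star-y^{(t)}\|^2$, and thus
\[
\begin{aligned}
\Delta^{(t)} &\ge \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t)}\|^2 + \frac12 D(y^\star,y^{(t)}) + \frac{\gamma}{2\sigma}\|y^{(t)}-y^\star\|^2 + (y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)}) + \frac{1}{2\tau}\|x^{(t)}-x^{(t-1)}\|^2 \\
&= \frac{\lambda}{2}\|x^\star-x^{(t)}\|^2 + \frac{1}{2\tau}\|x^\star-x^{(t)}\|^2 + \frac12 D(y^\star,y^{(t)})
 + \frac12\begin{bmatrix} x^{(t)}-x^{(t-1)} \\ y^\star-y^{(t)} \end{bmatrix}^T
 \begin{bmatrix} \frac{1}{\tau}I & -A^T \\ -A & \frac{\gamma}{\sigma}I \end{bmatrix}
 \begin{bmatrix} x^{(t)}-x^{(t-1)} \\ y^\star-y^{(t)} \end{bmatrix}.
\end{aligned}
\]
The quadratic form in the last term is nonnegative if $\tau\sigma \le \gamma/L^2$. Under this condition,
\[
\Delta^{(t)} \ge \frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)\|x^\star-x^{(t)}\|^2 + \frac12 D(y^\star,y^{(t)}) \ge 0 . \tag{38}
\]
If we can choose $\tau$ and $\sigma$ so that
\[
\frac{1}{2\tau} - \frac{\sigma\delta\mu^2}{4(2+\sigma)} \le \frac{\theta}{2}\Bigl(\frac{1}{\tau}+\lambda\Bigr), \qquad
\frac{1}{\sigma} \le \theta\Bigl(\frac{1}{\sigma}+\frac12\Bigr), \qquad
\frac{2\sigma\theta^2L^2}{\gamma} \le \frac{\theta}{2\tau}, \tag{39}
\]
then we have
\[
\Delta^{(t+1)} + \mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^{(t+1)}) \le \theta\,\Delta^{(t)} .
\]
Because $\Delta^{(t)} \ge 0$ and $\mathcal{L}(x^{(t)},y^\star) - \mathcal{L}(x^\star,y^{(t)}) \ge 0$ for any $t \ge 0$, we have
\[
\Delta^{(t+1)} \le \theta\,\Delta^{(t)},
\]
which implies
\[
\Delta^{(t)} \le \theta^t\Delta^{(0)}
\]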

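and
\[
\mathcal{L}(x^{(t)},y^\star) - \mathcal{L}(x^\star,y^{(t)}) \le \theta^t\Delta^{(0)} .
\]
To satisfy the last condition in (39) and also ensure the inequality (38), it suffices to have $\tau\sigma \le \gamma/(4L^2)$. We choose
\[
\tau = \frac{1}{2L}\sqrt{\frac{\gamma}{\lambda+\delta\mu^2}}, \qquad \sigma = \frac{1}{2L}\sqrt{\gamma(\lambda+\delta\mu^2)} .
\]
With the above choice and assuming $\gamma(\lambda+\delta\mu^2) \le L^2$, we have
\[
\theta_y = \frac{1}{\sigma}\Big/\Bigl(\frac{1}{\sigma}+\frac12\Bigr) = \frac{1}{1+\sigma/2} = \frac{1}{1+\sqrt{\gamma(\lambda+\delta\mu^2)}/(4L)} .
\]
For the contraction factor over the primal variables, we have
\[
\theta_x = \Bigl(\frac{1}{2\tau} - \frac{\sigma\delta\mu^2}{4(2+\sigma)}\Bigr)\Big/\,\frac12\Bigl(\frac{1}{\tau}+\lambda\Bigr)
= \frac{1-\dfrac{\tau\sigma\delta\mu^2}{4+2\sigma}}{1+\tau\lambda}
= \frac{1-\dfrac{\gamma\delta\mu^2}{4(4+2\sigma)L^2}}{1+\tau\lambda},
\qquad \tau\lambda = \frac{\lambda}{2L}\sqrt{\frac{\gamma}{\lambda+\delta\mu^2}} .
\]
This finishes the proof of Theorem 3.

E. Proof of Theorem 2

We consider the SPDC algorithm in the Euclidean case with $h(x) = \tfrac12\|x\|^2$. The corresponding batch case analysis is given in Section C. For each $i = 1,\dots,n$, let $\tilde y_i$ be
\[
\tilde y_i = \arg\min_{y}\Bigl\{ \phi_i^*(y) + \frac{(y-y_i^{(t)})^2}{2\sigma} - y\,\langle a_i,\tilde x^{(t)}\rangle \Bigr\}.
\]
Based on the first-order optimality condition, we have
\[
\langle a_i,\tilde x^{(t)}\rangle - \frac{\tilde y_i-y_i^{(t)}}{\sigma} = (\phi_i^*)'(\tilde y_i).
\]
Also, since $y_i^\star$ minimizes $\phi_i^*(y) - y\,\langle a_i,x^\star\rangle$, we have
\[
\langle a_i,x^\star\rangle = (\phi_i^*)'(y_i^\star).
\]
By the lemma of Section A with $\rho = 1/2$, we have
\[
-y_i^\star\langle a_i,\tilde x^{(t)}\rangle + \phi_i^*(y_i^\star) + \frac{(y_i^{(t)}-y_i^\star)^2}{2\sigma}
\ge \Bigl(\frac{1}{2\sigma}+\frac{\gamma}{4}\Bigr)(\tilde y_i-y_i^\star)^2 + \phi_i^*(\tilde y_i) - \tilde y_i\langle a_i,\tilde x^{(t)}\rangle + \frac{(\tilde y_i-y_i^{(t)})^2}{2\sigma} + \frac{\delta}{4}\bigl((\phi_i^*)'(\tilde y_i)-(\phi_i^*)'(y_i^\star)\bigr)^2 ,
\]
and rearranging terms, we get
\[
\frac{(y_i^{(t)}-y_i^\star)^2}{2\sigma} \ge \Bigl(\frac{1}{2\sigma}+\frac{\gamma}{4}\Bigr)(\tilde y_i-y_i^\star)^2 + \frac{(\tilde y_i-y_i^{(t)})^2}{2\sigma} - (\tilde y_i-y_i^\star)\langle a_i,\tilde x^{(t)}\rangle + \bigl(\phi_i^*(\tilde y_i)-\phi_i^*(y_i^\star)\bigr) + \frac{\delta}{4}\bigl((\phi_i^*)'(\tilde y_i)-(\phi_i^*)'(y_i^\star)\bigr)^2 . \tag{40}
\]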

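Notice that
\[
\begin{aligned}
\mathbb{E}[y_i^{(t+1)}] &= \frac{1}{n}\,\tilde y_i + \frac{n-1}{n}\,y_i^{(t)}, \\
\mathbb{E}[(y_i^{(t+1)}-y_i^\star)^2] &= \frac{(\tilde y_i-y_i^\star)^2}{n} + \frac{(n-1)(y_i^{(t)}-y_i^\star)^2}{n}, \\
\mathbb{E}[(y_i^{(t+1)}-y_i^{(t)})^2] &= \frac{(\tilde y_i-y_i^{(t)})^2}{n}, \\
\mathbb{E}[\phi_i^*(y_i^{(t+1)})] &= \frac{1}{n}\,\phi_i^*(\tilde y_i) + \frac{n-1}{n}\,\phi_i^*(y_i^{(t)}).
\end{aligned}
\]
Plugging the above relations into (40) and dividing both sides by $n$, we have
\[
\begin{aligned}
\Bigl(\frac{1}{2\sigma}+\frac{(n-1)\gamma}{4n}\Bigr)(y_i^{(t)}-y_i^\star)^2
&\ge \Bigl(\frac{1}{2\sigma}+\frac{\gamma}{4}\Bigr)\mathbb{E}[(y_i^{(t+1)}-y_i^\star)^2] + \frac{1}{2\sigma}\mathbb{E}[(y_i^{(t+1)}-y_i^{(t)})^2] \\
&\quad - \Bigl(\mathbb{E}[y_i^{(t+1)}] - y_i^{(t)} + \frac{1}{n}(y_i^{(t)}-y_i^\star)\Bigr)\langle a_i,\tilde x^{(t)}\rangle \\
&\quad + \mathbb{E}[\phi_i^*(y_i^{(t+1)})] - \phi_i^*(y_i^{(t)}) + \frac{1}{n}\bigl(\phi_i^*(y_i^{(t)})-\phi_i^*(y_i^\star)\bigr) \\
&\quad + \frac{\delta}{4n}\Bigl(\langle a_i,x^\star-\tilde x^{(t)}\rangle + \frac{\tilde y_i-y_i^{(t)}}{\sigma}\Bigr)^2 ,
\end{aligned}
\]
and summing over $i = 1,\dots,n$, we get
\[
\begin{aligned}
\Bigl(\frac{1}{2\sigma}+\frac{(n-1)\gamma}{4n}\Bigr)\|y^{(t)}-y^\star\|^2
&\ge \Bigl(\frac{1}{2\sigma}+\frac{\gamma}{4}\Bigr)\mathbb{E}[\|y^{(t+1)}-y^\star\|^2] + \frac{\mathbb{E}[\|y^{(t+1)}-y^{(t)}\|^2]}{2\sigma} \\
&\quad + \mathbb{E}\bigl[\phi_k^*(y_k^{(t+1)})-\phi_k^*(y_k^{(t)})\bigr] + \frac{1}{n}\sum_{i=1}^n\bigl(\phi_i^*(y_i^{(t)})-\phi_i^*(y_i^\star)\bigr) \\
&\quad - \mathbb{E}\bigl[\langle n(u^{(t+1)}-u^{(t)}) + (u^{(t)}-u^\star),\, \tilde x^{(t)}\rangle\bigr] + \frac{\delta}{4n}\Bigl\|A(x^\star-\tilde x^{(t)}) + \frac{\tilde y-y^{(t)}}{\sigma}\Bigr\|^2 ,
\end{aligned}
\]
where
\[
u^{(t)} = \frac{1}{n}\sum_{i=1}^n y_i^{(t)}a_i, \qquad u^{(t+1)} = \frac{1}{n}\sum_{i=1}^n y_i^{(t+1)}a_i, \qquad u^\star = \frac{1}{n}\sum_{i=1}^n y_i^\star a_i .
\]
On the other hand, since $x^{(t+1)}$ minimizes the $(\tfrac{1}{\tau}+\lambda)$-strongly convex objective
\[
g(x) + \langle u^{(t)} + n(u^{(t+1)}-u^{(t)}),\, x\rangle + \frac{\|x-x^{(t)}\|^2}{2\tau},
\]
we can apply the lemma of Section A with $\rho = 0$ to obtain
\[
g(x^\star) + \langle u^{(t)} + n(u^{(t+1)}-u^{(t)}),\, x^\star\rangle + \frac{\|x^{(t)}-x^\star\|^2}{2\tau}
\ge g(x^{(t+1)}) + \langle u^{(t)} + n(u^{(t+1)}-u^{(t)}),\, x^{(t+1)}\rangle + \frac{\|x^{(t+1)}-x^{(t)}\|^2}{2\tau} + \Bigl(\frac{1}{2\tau}+\frac{\lambda}{2}\Bigr)\|x^{(t+1)}-x^\star\|^2 ,
\]
and rearranging terms (and taking expectations), we get
\[
\frac{\|x^{(t)}-x^\star\|^2}{2\tau} \ge \Bigl(\frac{1}{2\tau}+\frac{\lambda}{2}\Bigr)\mathbb{E}[\|x^{(t+1)}-x^\star\|^2] + \frac{\mathbb{E}[\|x^{(t+1)}-x^{(t)}\|^2]}{2\tau} + \mathbb{E}[g(x^{(t+1)})-g(x^\star)] + \mathbb{E}\bigl[\langle u^{(t)} + n(u^{(t+1)}-u^{(t)}),\, x^{(t+1)}-x^\star\rangle\bigr].
\]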

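Also notice that
\[
\begin{aligned}
&\mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^\star) + n\bigl(\mathcal{L}(x^\star,y^\star) - \mathcal{L}(x^\star,y^{(t+1)})\bigr) - (n-1)\bigl(\mathcal{L}(x^\star,y^\star) - \mathcal{L}(x^\star,y^{(t)})\bigr) \\
&\quad= \frac{1}{n}\sum_{i=1}^n\bigl(\phi_i^*(y_i^{(t)})-\phi_i^*(y_i^\star)\bigr) + \bigl(\phi_k^*(y_k^{(t+1)})-\phi_k^*(y_k^{(t)})\bigr) + g(x^{(t+1)}) - g(x^\star) \\
&\qquad + \langle u^\star, x^{(t+1)}\rangle - \langle u^{(t)}, x^\star\rangle - n\langle u^{(t+1)}-u^{(t)},\, x^\star\rangle .
\end{aligned}
\]
Combining everything together, we have
\[
\begin{aligned}
&\frac{\|x^{(t)}-x^\star\|^2}{2\tau} + \Bigl(\frac{1}{2\sigma}+\frac{(n-1)\gamma}{4n}\Bigr)\|y^{(t)}-y^\star\|^2 + (n-1)\bigl(\mathcal{L}(x^\star,y^\star) - \mathcal{L}(x^\star,y^{(t)})\bigr) \\
&\quad\ge \Bigl(\frac{1}{2\tau}+\frac{\lambda}{2}\Bigr)\mathbb{E}[\|x^{(t+1)}-x^\star\|^2] + \Bigl(\frac{1}{2\sigma}+\frac{\gamma}{4}\Bigr)\mathbb{E}[\|y^{(t+1)}-y^\star\|^2] + \frac{\mathbb{E}[\|x^{(t+1)}-x^{(t)}\|^2]}{2\tau} + \frac{\mathbb{E}[\|y^{(t+1)}-y^{(t)}\|^2]}{2\sigma} \\
&\qquad + \mathbb{E}\bigl[\mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^\star) + n\bigl(\mathcal{L}(x^\star,y^\star) - \mathcal{L}(x^\star,y^{(t+1)})\bigr)\bigr] \\
&\qquad + \mathbb{E}\bigl[\langle u^{(t)}-u^\star + n(u^{(t+1)}-u^{(t)}),\, x^{(t+1)}-\tilde x^{(t)}\rangle\bigr] + \frac{\delta}{4n}\Bigl\|A(x^\star-\tilde x^{(t)}) + \frac{\tilde y-y^{(t)}}{\sigma}\Bigr\|^2 .
\end{aligned}
\]
Next, with $\tilde x^{(t)} = x^{(t)} + \theta(x^{(t)}-x^{(t-1)})$, we notice that
\[
\begin{aligned}
\frac{\delta}{4n}\Bigl\|A(x^\star-\tilde x^{(t)}) + \frac{\tilde y-y^{(t)}}{\sigma}\Bigr\|^2
&= \frac{\delta}{4n}\Bigl\|A(x^\star-x^{(t)}) - \theta A(x^{(t)}-x^{(t-1)}) + \frac{\tilde y-y^{(t)}}{\sigma}\Bigr\|^2 \\
&\ge \Bigl(1-\frac{1}{\alpha}\Bigr)\frac{\delta}{4n}\|A(x^\star-x^{(t)})\|^2 - (\alpha-1)\frac{\delta}{4n}\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{\tilde y-y^{(t)}}{\sigma}\Bigr\|^2
\end{aligned}
\]
for some $\alpha > 1$, and
\[
\|A(x^\star-x^{(t)})\|^2 \ge \mu^2\|x^\star-x^{(t)}\|^2 ,
\]
and
\[
\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{\tilde y-y^{(t)}}{\sigma}\Bigr\|^2 \le 2\theta^2\|A(x^{(t)}-x^{(t-1)})\|^2 + \frac{2}{\sigma^2}\|\tilde y-y^{(t)}\|^2
\le 2\theta^2L^2\|x^{(t)}-x^{(t-1)}\|^2 + \frac{2n}{\sigma^2}\mathbb{E}[\|y^{(t+1)}-y^{(t)}\|^2].
\]
We follow the same reasoning as in the standard SPDC analysis:
\[
\begin{aligned}
\langle u^{(t)}-u^\star + n(u^{(t+1)}-u^{(t)}),\, x^{(t+1)}-\tilde x^{(t)}\rangle
&= \frac{(y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)})}{n} - \frac{\theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)})}{n} \\
&\quad + \frac{n-1}{n}(y^{(t+1)}-y^{(t)})^T A(x^{(t+1)}-x^{(t)}) - \theta(y^{(t+1)}-y^{(t)})^T A(x^{(t)}-x^{(t-1)}),
\end{aligned}
\]
and using the Cauchy-Schwarz inequality, we have
\[
|(y^{(t+1)}-y^{(t)})^T A(x^{(t)}-x^{(t-1)})| \le \frac{\|(y^{(t+1)}-y^{(t)})^T A\|^2}{1/(2\tau)} + \frac{\|x^{(t)}-x^{(t-1)}\|^2}{8\tau} \le \frac{\|y^{(t+1)}-y^{(t)}\|^2}{1/(2\tau R^2)} + \frac{\|x^{(t)}-x^{(t-1)}\|^2}{8\tau},
\]
and
\[
|(y^{(t+1)}-y^{(t)})^T A(x^{(t+1)}-x^{(t)})| \le \frac{\|(y^{(t+1)}-y^{(t)})^T A\|^2}{1/(2\tau)} + \frac{\|x^{(t+1)}-x^{(t)}\|^2}{8\tau} \le \frac{\|y^{(t+1)}-y^{(t)}\|^2}{1/(2\tau R^2)} + \frac{\|x^{(t+1)}-x^{(t)}\|^2}{8\tau}.
\]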

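Thus we get
\[
\begin{aligned}
\langle u^{(t)}-u^\star + n(u^{(t+1)}-u^{(t)}),\, x^{(t+1)}-\tilde x^{(t)}\rangle
&\ge \frac{(y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)})}{n} - \frac{\theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)})}{n} \\
&\quad - \frac{\|y^{(t+1)}-y^{(t)}\|^2}{1/(4\tau R^2)} - \frac{\|x^{(t+1)}-x^{(t)}\|^2}{8\tau} - \frac{\theta\|x^{(t)}-x^{(t-1)}\|^2}{8\tau}.
\end{aligned}
\]
Putting everything together, we have
\[
\begin{aligned}
&\Bigl(\frac{1}{2\tau} - \frac{(1-1/\alpha)\delta\mu^2}{4n}\Bigr)\|x^{(t)}-x^\star\|^2 + \Bigl(\frac{1}{2\sigma}+\frac{(n-1)\gamma}{4n}\Bigr)\|y^{(t)}-y^\star\|^2 + \theta\bigl(\mathcal{L}(x^{(t)},y^\star)-\mathcal{L}(x^\star,y^\star)\bigr) \\
&\quad + (n-1)\bigl(\mathcal{L}(x^\star,y^\star)-\mathcal{L}(x^\star,y^{(t)})\bigr) + \theta\Bigl(\frac{1}{8\tau} + \frac{(\alpha-1)\theta\delta L^2}{2n}\Bigr)\|x^{(t)}-x^{(t-1)}\|^2 + \frac{\theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)})}{n} \\
&\quad\ge \Bigl(\frac{1}{2\tau}+\frac{\lambda}{2}\Bigr)\mathbb{E}[\|x^{(t+1)}-x^\star\|^2] + \Bigl(\frac{1}{2\sigma}+\frac{\gamma}{4}\Bigr)\mathbb{E}[\|y^{(t+1)}-y^\star\|^2] + \frac{\mathbb{E}[(y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)})]}{n} \\
&\qquad + \mathbb{E}\bigl[\mathcal{L}(x^{(t+1)},y^\star)-\mathcal{L}(x^\star,y^\star) + n\bigl(\mathcal{L}(x^\star,y^\star)-\mathcal{L}(x^\star,y^{(t+1)})\bigr)\bigr] \\
&\qquad + \Bigl(\frac{1}{2\tau}-\frac{1}{8\tau}\Bigr)\mathbb{E}[\|x^{(t+1)}-x^{(t)}\|^2] + \Bigl(\frac{1}{2\sigma} - 4R^2\tau - \frac{(\alpha-1)\delta}{2\sigma^2}\Bigr)\mathbb{E}[\|y^{(t+1)}-y^{(t)}\|^2].
\end{aligned}
\]
If we choose the parameters as
\[
\alpha = \frac{\sigma}{4\delta} + 1, \qquad \sigma\tau = \frac{1}{16R^2},
\]
then we know
\[
\frac{1}{2\sigma} - 4R^2\tau - \frac{(\alpha-1)\delta}{2\sigma^2} = \frac{1}{2\sigma} - \frac{1}{4\sigma} - \frac{1}{8\sigma} > 0,
\]
and
\[
\frac{(\alpha-1)\theta\delta L^2}{2n} \le \frac{\sigma L^2}{8n} \le \frac{\sigma R^2}{8} = \frac{1}{128\tau},
\]
thus
\[
\frac{1}{8\tau} + \frac{(\alpha-1)\theta\delta L^2}{2n} \le \frac{3}{8\tau}.
\]
In addition, we have
\[
1 - \frac{1}{\alpha} = \frac{\sigma}{\sigma+4\delta}.
\]
Finally we obtain
\[
\begin{aligned}
&\Bigl(\frac{1}{2\tau} - \frac{\sigma\delta\mu^2}{4n(\sigma+4\delta)}\Bigr)\|x^{(t)}-x^\star\|^2 + \Bigl(\frac{1}{2\sigma}+\frac{(n-1)\gamma}{4n}\Bigr)\|y^{(t)}-y^\star\|^2 + \theta\bigl(\mathcal{L}(x^{(t)},y^\star)-\mathcal{L}(x^\star,y^\star)\bigr) \\
&\quad + (n-1)\bigl(\mathcal{L}(x^\star,y^\star)-\mathcal{L}(x^\star,y^{(t)})\bigr) + \theta\cdot\frac{3}{8\tau}\|x^{(t)}-x^{(t-1)}\|^2 + \frac{\theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)})}{n} \\
&\quad\ge \Bigl(\frac{1}{2\tau}+\frac{\lambda}{2}\Bigr)\mathbb{E}[\|x^{(t+1)}-x^\star\|^2] + \Bigl(\frac{1}{2\sigma}+\frac{\gamma}{4}\Bigr)\mathbb{E}[\|y^{(t+1)}-y^\star\|^2] + \frac{\mathbb{E}[(y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)})]}{n} \\
&\qquad + \mathbb{E}\bigl[\mathcal{L}(x^{(t+1)},y^\star)-\mathcal{L}(x^\star,y^\star) + n\bigl(\mathcal{L}(x^\star,y^\star)-\mathcal{L}(x^\star,y^{(t+1)})\bigr)\bigr] + \frac{3}{8\tau}\mathbb{E}[\|x^{(t+1)}-x^{(t)}\|^2].
\end{aligned}
\]
Now we can define $\theta_x$ and $\theta_y$ as the ratios between the coefficients in the $x$-distance and $y$-distance terms, and let $\theta = \max\{\theta_x,\theta_y\}$ as before. Choosing the step-size parameters as
\[
\tau = \frac{1}{4R}\sqrt{\frac{\gamma}{n\lambda+\delta\mu^2}}, \qquad \sigma = \frac{1}{4R}\sqrt{\frac{n\lambda+\delta\mu^2}{\gamma}}
\]
gives the desired result.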
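The randomized analysis can be illustrated with a minimal sketch of one possible SPDC-style loop on ridge regression. It is an illustration under stated assumptions, not the authors' code: it assumes $\phi_i(z) = \tfrac12(z-b_i)^2$ (so $\phi_i^*(y) = \tfrac12 y^2 + b_i y$ and $\gamma=\delta=1$), $g(x) = \tfrac{\lambda}{2}\|x\|^2$, $R = \max_i\|a_i\|$, step sizes $\tau = \frac{1}{4R}\sqrt{\gamma/(n\lambda+\delta\mu^2)}$ and $\sigma = \frac{1}{4R}\sqrt{(n\lambda+\delta\mu^2)/\gamma}$, and $\theta$ taken as the larger of the two coefficient ratios from the final inequality of the proof. The function name `spdc_ridge` is hypothetical.

```python
import numpy as np


def spdc_ridge(A, b, lam, iters, seed=0):
    """Sketch: minimize (1/n) sum_i 0.5*(a_i^T x - b_i)^2 + 0.5*lam*||x||^2.

    One uniformly sampled dual coordinate is updated per iteration, followed
    by a primal proximal step and an extrapolation step.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    gamma = delta = 1.0
    R = np.max(np.linalg.norm(A, axis=1))             # largest row norm
    mu2 = max(np.linalg.eigvalsh(A.T @ A)[0], 0.0)    # mu^2 = lambda_min(A^T A)

    tau = np.sqrt(gamma / (n * lam + delta * mu2)) / (4 * R)
    sigma = np.sqrt((n * lam + delta * mu2) / gamma) / (4 * R)
    theta_x = (1 - tau * sigma * delta * mu2 / (2 * n * (sigma + 4 * delta))) / (1 + tau * lam)
    theta_y = (1 + (n - 1) * sigma * gamma / (2 * n)) / (1 + sigma * gamma / 2)
    theta = max(theta_x, theta_y)

    x = np.zeros(d)
    x_tilde = x.copy()
    y = np.zeros(n)
    u = np.zeros(d)                                   # u = (1/n) * A^T y
    for _ in range(iters):
        k = rng.integers(n)
        # Dual coordinate step: closed-form prox of phi_k^* at the extrapolated point.
        y_new = (y[k] / sigma + A[k] @ x_tilde - b[k]) / (1 + 1 / sigma)
        du = (y_new - y[k]) * A[k] / n                # u^{t+1} - u^{t}
        y[k] = y_new
        # Primal step with the direction u^{t} + n * (u^{t+1} - u^{t}).
        x_new = (x / tau - (u + n * du)) / (lam + 1 / tau)
        u = u + du
        x_tilde = x_new + theta * (x_new - x)
        x = x_new
    return x
```

Because the contraction factor here is close to $1 - O(1/n)$ per iteration, a run needs on the order of several passes' worth of coordinate updates before the iterate is close to the closed-form solution $(A^TA/n + \lambda I)^{-1}A^Tb/n$.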

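F. Proof of Theorem 4

In this setting, for the $i$-th coordinate of the dual variables $y$ we choose $h = \phi_i^*$, let
\[
D_i(y_i,y_i') = \phi_i^*(y_i) - \phi_i^*(y_i') - \langle(\phi_i^*)'(y_i'),\, y_i-y_i'\rangle,
\]
and define
\[
D(y,y') = \sum_{i=1}^n D_i(y_i,y_i').
\]
For $i = 1,\dots,n$, let $\tilde y_i$ be
\[
\tilde y_i = \arg\min_{y}\Bigl\{ \phi_i^*(y) + \frac{D_i(y,y_i^{(t)})}{\sigma} - y\,\langle a_i,\tilde x^{(t)}\rangle \Bigr\}.
\]
Based on the first-order optimality condition, we have
\[
\langle a_i,\tilde x^{(t)}\rangle - \frac{(\phi_i^*)'(\tilde y_i)-(\phi_i^*)'(y_i^{(t)})}{\sigma} = (\phi_i^*)'(\tilde y_i).
\]
Also, since $y_i^\star$ minimizes $\phi_i^*(y) - y\,\langle a_i,x^\star\rangle$, we have
\[
\langle a_i,x^\star\rangle = (\phi_i^*)'(y_i^\star).
\]
Using the lemma of Section A with $\rho = 1/2$, we obtain
\[
-y_i^\star\langle a_i,\tilde x^{(t)}\rangle + \phi_i^*(y_i^\star) + \frac{D_i(y_i^\star,y_i^{(t)})}{\sigma}
\ge \Bigl(\frac{1}{\sigma}+\frac12\Bigr)D_i(y_i^\star,\tilde y_i) + \phi_i^*(\tilde y_i) - \tilde y_i\langle a_i,\tilde x^{(t)}\rangle + \frac{D_i(\tilde y_i,y_i^{(t)})}{\sigma} + \frac{\delta}{4}\bigl((\phi_i^*)'(\tilde y_i)-(\phi_i^*)'(y_i^\star)\bigr)^2 ,
\]
and rearranging terms, we get
\[
\frac{D_i(y_i^\star,y_i^{(t)})}{\sigma} \ge \Bigl(\frac{1}{\sigma}+\frac12\Bigr)D_i(y_i^\star,\tilde y_i) + \frac{D_i(\tilde y_i,y_i^{(t)})}{\sigma} - (\tilde y_i-y_i^\star)\langle a_i,\tilde x^{(t)}\rangle + \bigl(\phi_i^*(\tilde y_i)-\phi_i^*(y_i^\star)\bigr) + \frac{\delta}{4}\bigl((\phi_i^*)'(\tilde y_i)-(\phi_i^*)'(y_i^\star)\bigr)^2 .
\]
With i.i.d. random sampling at each iteration, we have the following relations:
\[
\begin{aligned}
\mathbb{E}[y_i^{(t+1)}] &= \frac{1}{n}\,\tilde y_i + \frac{n-1}{n}\,y_i^{(t)}, \\
\mathbb{E}[D_i(y_i^\star,y_i^{(t+1)})] &= \frac{D_i(y_i^\star,\tilde y_i)}{n} + \frac{(n-1)D_i(y_i^\star,y_i^{(t)})}{n}, \\
\mathbb{E}[D_i(y_i^{(t+1)},y_i^{(t)})] &= \frac{D_i(\tilde y_i,y_i^{(t)})}{n}, \\
\mathbb{E}[\phi_i^*(y_i^{(t+1)})] &= \frac{1}{n}\,\phi_i^*(\tilde y_i) + \frac{n-1}{n}\,\phi_i^*(y_i^{(t)}).
\end{aligned}
\]
Plugging the above relations into the last inequality and dividing both sides by $n$, we have
\[
\begin{aligned}
\Bigl(\frac{1}{\sigma}+\frac{n-1}{2n}\Bigr)D_i(y_i^\star,y_i^{(t)})
&\ge \Bigl(\frac{1}{\sigma}+\frac12\Bigr)\mathbb{E}[D_i(y_i^\star,y_i^{(t+1)})] + \frac{1}{\sigma}\mathbb{E}[D_i(y_i^{(t+1)},y_i^{(t)})] \\
&\quad - \Bigl(\mathbb{E}[y_i^{(t+1)}] - y_i^{(t)} + \frac{1}{n}(y_i^{(t)}-y_i^\star)\Bigr)\langle a_i,\tilde x^{(t)}\rangle \\
&\quad + \mathbb{E}[\phi_i^*(y_i^{(t+1)})] - \phi_i^*(y_i^{(t)}) + \frac{1}{n}\bigl(\phi_i^*(y_i^{(t)})-\phi_i^*(y_i^\star)\bigr) \\
&\quad + \frac{\delta}{4n}\Bigl(\langle a_i,x^\star-\tilde x^{(t)}\rangle + \frac{(\phi_i^*)'(\tilde y_i)-(\phi_i^*)'(y_i^{(t)})}{\sigma}\Bigr)^2 ,
\end{aligned}
\]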

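and summing over $i = 1,\dots,n$, we get
\[
\begin{aligned}
\Bigl(\frac{1}{\sigma}+\frac{n-1}{2n}\Bigr)D(y^\star,y^{(t)})
&\ge \Bigl(\frac{1}{\sigma}+\frac12\Bigr)\mathbb{E}[D(y^\star,y^{(t+1)})] + \frac{\mathbb{E}[D(y^{(t+1)},y^{(t)})]}{\sigma} \\
&\quad + \mathbb{E}\bigl[\phi_k^*(y_k^{(t+1)})-\phi_k^*(y_k^{(t)})\bigr] + \frac{1}{n}\sum_{i=1}^n\bigl(\phi_i^*(y_i^{(t)})-\phi_i^*(y_i^\star)\bigr) \\
&\quad - \mathbb{E}\bigl[\langle n(u^{(t+1)}-u^{(t)}) + (u^{(t)}-u^\star),\, \tilde x^{(t)}\rangle\bigr] + \frac{\delta}{4n}\Bigl\|A(x^\star-\tilde x^{(t)}) + \frac{\phi^{*\prime}(\tilde y)-\phi^{*\prime}(y^{(t)})}{\sigma}\Bigr\|^2 ,
\end{aligned}
\]
where $\phi^{*\prime}(y^{(t)})$ is an $n$-dimensional vector such that the $i$-th coordinate is
\[
[\phi^{*\prime}(y^{(t)})]_i = (\phi_i^*)'(y_i^{(t)}),
\]
and
\[
u^{(t)} = \frac{1}{n}\sum_{i=1}^n y_i^{(t)}a_i, \qquad u^{(t+1)} = \frac{1}{n}\sum_{i=1}^n y_i^{(t+1)}a_i, \qquad u^\star = \frac{1}{n}\sum_{i=1}^n y_i^\star a_i .
\]
On the other hand, since $x^{(t+1)}$ minimizes a $(\tfrac{1}{\tau}+\lambda)$-strongly convex objective
\[
g(x) + \langle u^{(t)} + n(u^{(t+1)}-u^{(t)}),\, x\rangle + \frac{\|x-x^{(t)}\|^2}{2\tau},
\]
we can apply the lemma of Section A with $\rho = 0$ to obtain
\[
g(x^\star) + \langle u^{(t)} + n(u^{(t+1)}-u^{(t)}),\, x^\star\rangle + \frac{\|x^{(t)}-x^\star\|^2}{2\tau}
\ge g(x^{(t+1)}) + \langle u^{(t)} + n(u^{(t+1)}-u^{(t)}),\, x^{(t+1)}\rangle + \frac{\|x^{(t+1)}-x^{(t)}\|^2}{2\tau} + \Bigl(\frac{1}{2\tau}+\frac{\lambda}{2}\Bigr)\|x^{(t+1)}-x^\star\|^2 ,
\]
and rearranging terms (and taking expectations), we get
\[
\frac{\|x^{(t)}-x^\star\|^2}{2\tau} \ge \Bigl(\frac{1}{2\tau}+\frac{\lambda}{2}\Bigr)\mathbb{E}[\|x^{(t+1)}-x^\star\|^2] + \frac{\mathbb{E}[\|x^{(t+1)}-x^{(t)}\|^2]}{2\tau} + \mathbb{E}[g(x^{(t+1)})-g(x^\star)] + \mathbb{E}\bigl[\langle u^{(t)} + n(u^{(t+1)}-u^{(t)}),\, x^{(t+1)}-x^\star\rangle\bigr].
\]
Notice that
\[
\begin{aligned}
&\mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^\star) + n\bigl(\mathcal{L}(x^\star,y^\star) - \mathcal{L}(x^\star,y^{(t+1)})\bigr) - (n-1)\bigl(\mathcal{L}(x^\star,y^\star) - \mathcal{L}(x^\star,y^{(t)})\bigr) \\
&\quad= \frac{1}{n}\sum_{i=1}^n\bigl(\phi_i^*(y_i^{(t)})-\phi_i^*(y_i^\star)\bigr) + \bigl(\phi_k^*(y_k^{(t+1)})-\phi_k^*(y_k^{(t)})\bigr) + g(x^{(t+1)}) - g(x^\star) \\
&\qquad + \langle u^\star, x^{(t+1)}\rangle - \langle u^{(t)}, x^\star\rangle - n\langle u^{(t+1)}-u^{(t)},\, x^\star\rangle ,
\end{aligned}
\]
so
\[
\begin{aligned}
&\frac{\|x^{(t)}-x^\star\|^2}{2\tau} + \Bigl(\frac{1}{\sigma}+\frac{n-1}{2n}\Bigr)D(y^\star,y^{(t)}) + (n-1)\bigl(\mathcal{L}(x^\star,y^\star) - \mathcal{L}(x^\star,y^{(t)})\bigr) \\
&\quad\ge \Bigl(\frac{1}{2\tau}+\frac{\lambda}{2}\Bigr)\mathbb{E}[\|x^{(t+1)}-x^\star\|^2] + \Bigl(\frac{1}{\sigma}+\frac12\Bigr)\mathbb{E}[D(y^\star,y^{(t+1)})] + \frac{\mathbb{E}[\|x^{(t+1)}-x^{(t)}\|^2]}{2\tau} + \frac{\mathbb{E}[D(y^{(t+1)},y^{(t)})]}{\sigma} \\
&\qquad + \mathbb{E}\bigl[\mathcal{L}(x^{(t+1)},y^\star) - \mathcal{L}(x^\star,y^\star) + n\bigl(\mathcal{L}(x^\star,y^\star) - \mathcal{L}(x^\star,y^{(t+1)})\bigr)\bigr] \\
&\qquad + \mathbb{E}\bigl[\langle u^{(t)}-u^\star + n(u^{(t+1)}-u^{(t)}),\, x^{(t+1)}-\tilde x^{(t)}\rangle\bigr] + \frac{\delta}{4n}\Bigl\|A(x^\star-\tilde x^{(t)}) + \frac{\phi^{*\prime}(\tilde y)-\phi^{*\prime}(y^{(t)})}{\sigma}\Bigr\|^2 .
\end{aligned}
\]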

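Next, with $\tilde x^{(t)} = x^{(t)} + \theta(x^{(t)}-x^{(t-1)})$, we have
\[
\begin{aligned}
\frac{\delta}{4n}\Bigl\|A(x^\star-\tilde x^{(t)}) + \frac{\phi^{*\prime}(\tilde y)-\phi^{*\prime}(y^{(t)})}{\sigma}\Bigr\|^2
&= \frac{\delta}{4n}\Bigl\|A(x^\star-x^{(t)}) - \theta A(x^{(t)}-x^{(t-1)}) + \frac{\phi^{*\prime}(\tilde y)-\phi^{*\prime}(y^{(t)})}{\sigma}\Bigr\|^2 \\
&\ge \Bigl(1-\frac{1}{\alpha}\Bigr)\frac{\delta}{4n}\|A(x^\star-x^{(t)})\|^2 - (\alpha-1)\frac{\delta}{4n}\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{\phi^{*\prime}(\tilde y)-\phi^{*\prime}(y^{(t)})}{\sigma}\Bigr\|^2
\end{aligned}
\]
for any $\alpha > 1$, and
\[
\|A(x^\star-x^{(t)})\|^2 \ge \mu^2\|x^\star-x^{(t)}\|^2 ,
\]
and
\[
\Bigl\|\theta A(x^{(t)}-x^{(t-1)}) - \frac{\phi^{*\prime}(\tilde y)-\phi^{*\prime}(y^{(t)})}{\sigma}\Bigr\|^2 \le 2\theta^2\|A(x^{(t)}-x^{(t-1)})\|^2 + \frac{2}{\sigma^2}\|\phi^{*\prime}(\tilde y)-\phi^{*\prime}(y^{(t)})\|^2
\le 2\theta^2L^2\|x^{(t)}-x^{(t-1)}\|^2 + \frac{2n}{\sigma^2}\mathbb{E}[\|\phi^{*\prime}(y^{(t+1)})-\phi^{*\prime}(y^{(t)})\|^2].
\]
Following the same reasoning as in the standard SPDC analysis, we have
\[
\begin{aligned}
\langle u^{(t)}-u^\star + n(u^{(t+1)}-u^{(t)}),\, x^{(t+1)}-\tilde x^{(t)}\rangle
&= \frac{(y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)})}{n} - \frac{\theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)})}{n} \\
&\quad + \frac{n-1}{n}(y^{(t+1)}-y^{(t)})^T A(x^{(t+1)}-x^{(t)}) - \theta(y^{(t+1)}-y^{(t)})^T A(x^{(t)}-x^{(t-1)}),
\end{aligned}
\]
and using the Cauchy-Schwarz inequality, we have
\[
|(y^{(t+1)}-y^{(t)})^T A(x^{(t)}-x^{(t-1)})| \le \frac{\|(y^{(t+1)}-y^{(t)})^T A\|^2}{1/(2\tau)} + \frac{\|x^{(t)}-x^{(t-1)}\|^2}{8\tau} \le \frac{\|y^{(t+1)}-y^{(t)}\|^2}{1/(2\tau R^2)} + \frac{\|x^{(t)}-x^{(t-1)}\|^2}{8\tau},
\]
and
\[
|(y^{(t+1)}-y^{(t)})^T A(x^{(t+1)}-x^{(t)})| \le \frac{\|(y^{(t+1)}-y^{(t)})^T A\|^2}{1/(2\tau)} + \frac{\|x^{(t+1)}-x^{(t)}\|^2}{8\tau} \le \frac{\|y^{(t+1)}-y^{(t)}\|^2}{1/(2\tau R^2)} + \frac{\|x^{(t+1)}-x^{(t)}\|^2}{8\tau}.
\]
Thus we get
\[
\begin{aligned}
\langle u^{(t)}-u^\star + n(u^{(t+1)}-u^{(t)}),\, x^{(t+1)}-\tilde x^{(t)}\rangle
&\ge \frac{(y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)})}{n} - \frac{\theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)})}{n} \\
&\quad - \frac{\|y^{(t+1)}-y^{(t)}\|^2}{1/(4\tau R^2)} - \frac{\|x^{(t+1)}-x^{(t)}\|^2}{8\tau} - \frac{\theta\|x^{(t)}-x^{(t-1)}\|^2}{8\tau}.
\end{aligned}
\]
Also, we can lower bound the term $D(y^{(t+1)},y^{(t)})$ using the lemma of Section A with $\rho = 1/2$:
\[
\begin{aligned}
D(y^{(t+1)},y^{(t)}) &= \sum_{i=1}^n\Bigl(\phi_i^*(y_i^{(t+1)}) - \phi_i^*(y_i^{(t)}) - \langle(\phi_i^*)'(y_i^{(t)}),\, y_i^{(t+1)}-y_i^{(t)}\rangle\Bigr) \\
&\ge \sum_{i=1}^n\Bigl(\frac{\gamma}{4}(y_i^{(t+1)}-y_i^{(t)})^2 + \frac{\delta}{4}\bigl((\phi_i^*)'(y_i^{(t+1)})-(\phi_i^*)'(y_i^{(t)})\bigr)^2\Bigr) \\
&= \frac{\gamma}{4}\|y^{(t+1)}-y^{(t)}\|^2 + \frac{\delta}{4}\|\phi^{*\prime}(y^{(t+1)})-\phi^{*\prime}(y^{(t)})\|^2 .
\end{aligned}
\]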

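Combining everything above together, we have
\[
\begin{aligned}
&\Bigl(\frac{1}{2\tau} - \frac{(1-1/\alpha)\delta\mu^2}{4n}\Bigr)\|x^{(t)}-x^\star\|^2 + \Bigl(\frac{1}{\sigma}+\frac{n-1}{2n}\Bigr)D(y^\star,y^{(t)}) + \theta\bigl(\mathcal{L}(x^{(t)},y^\star)-\mathcal{L}(x^\star,y^\star)\bigr) \\
&\quad + (n-1)\bigl(\mathcal{L}(x^\star,y^\star)-\mathcal{L}(x^\star,y^{(t)})\bigr) + \theta\Bigl(\frac{1}{8\tau} + \frac{(\alpha-1)\theta\delta L^2}{2n}\Bigr)\|x^{(t)}-x^{(t-1)}\|^2 + \frac{\theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)})}{n} \\
&\quad\ge \Bigl(\frac{1}{2\tau}+\frac{\lambda}{2}\Bigr)\mathbb{E}[\|x^{(t+1)}-x^\star\|^2] + \Bigl(\frac{1}{\sigma}+\frac12\Bigr)\mathbb{E}[D(y^\star,y^{(t+1)})] + \frac{\mathbb{E}[(y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)})]}{n} \\
&\qquad + \mathbb{E}\bigl[\mathcal{L}(x^{(t+1)},y^\star)-\mathcal{L}(x^\star,y^\star) + n\bigl(\mathcal{L}(x^\star,y^\star)-\mathcal{L}(x^\star,y^{(t+1)})\bigr)\bigr] + \Bigl(\frac{1}{2\tau}-\frac{1}{8\tau}\Bigr)\mathbb{E}[\|x^{(t+1)}-x^{(t)}\|^2] \\
&\qquad + \Bigl(\frac{\gamma}{4\sigma} - 4R^2\tau\Bigr)\mathbb{E}[\|y^{(t+1)}-y^{(t)}\|^2] + \Bigl(\frac{\delta}{4\sigma} - \frac{(\alpha-1)\delta}{2\sigma^2}\Bigr)\mathbb{E}[\|\phi^{*\prime}(y^{(t+1)})-\phi^{*\prime}(y^{(t)})\|^2].
\end{aligned}
\]
If we choose the parameters as
\[
\alpha = \frac{\sigma}{4} + 1, \qquad \sigma\tau = \frac{\gamma}{16R^2},
\]
then we know
\[
\frac{\gamma}{4\sigma} - 4R^2\tau = \frac{\gamma}{4\sigma} - \frac{\gamma}{4\sigma} \ge 0
\]
and
\[
\frac{\delta}{4\sigma} - \frac{(\alpha-1)\delta}{2\sigma^2} = \frac{\delta}{4\sigma} - \frac{\delta}{8\sigma} \ge 0,
\]
and
\[
\frac{(\alpha-1)\theta\delta L^2}{2n} \le \frac{\sigma\delta L^2}{8n} \le \frac{\sigma\delta R^2}{8} = \frac{\delta\gamma}{128\tau} \le \frac{1}{128\tau},
\]
thus
\[
\frac{1}{8\tau} + \frac{(\alpha-1)\theta\delta L^2}{2n} \le \frac{3}{8\tau}.
\]
In addition, we have
\[
1 - \frac{1}{\alpha} = \frac{\sigma}{\sigma+4}.
\]
Finally we obtain
\[
\begin{aligned}
&\Bigl(\frac{1}{2\tau} - \frac{\sigma\delta\mu^2}{4n(4+\sigma)}\Bigr)\|x^{(t)}-x^\star\|^2 + \Bigl(\frac{1}{\sigma}+\frac{n-1}{2n}\Bigr)D(y^\star,y^{(t)}) + \theta\bigl(\mathcal{L}(x^{(t)},y^\star)-\mathcal{L}(x^\star,y^\star)\bigr) \\
&\quad + (n-1)\bigl(\mathcal{L}(x^\star,y^\star)-\mathcal{L}(x^\star,y^{(t)})\bigr) + \theta\cdot\frac{3}{8\tau}\|x^{(t)}-x^{(t-1)}\|^2 + \frac{\theta(y^{(t)}-y^\star)^T A(x^{(t)}-x^{(t-1)})}{n} \\
&\quad\ge \Bigl(\frac{1}{2\tau}+\frac{\lambda}{2}\Bigr)\mathbb{E}[\|x^{(t+1)}-x^\star\|^2] + \Bigl(\frac{1}{\sigma}+\frac12\Bigr)\mathbb{E}[D(y^\star,y^{(t+1)})] + \frac{\mathbb{E}[(y^{(t+1)}-y^\star)^T A(x^{(t+1)}-x^{(t)})]}{n} \\
&\qquad + \mathbb{E}\bigl[\mathcal{L}(x^{(t+1)},y^\star)-\mathcal{L}(x^\star,y^\star) + n\bigl(\mathcal{L}(x^\star,y^\star)-\mathcal{L}(x^\star,y^{(t+1)})\bigr)\bigr] + \frac{3}{8\tau}\mathbb{E}[\|x^{(t+1)}-x^{(t)}\|^2].
\end{aligned}
\]
As before, we can define $\theta_x$ and $\theta_y$ as the ratios between the coefficients in the $x$-distance and $y$-distance terms, and let $\theta = \max\{\theta_x,\theta_y\}$. Then choosing the step-size parameters as
\[
\tau = \frac{1}{4R}\sqrt{\frac{\gamma}{n\lambda+\delta\mu^2}}, \qquad \sigma = \frac{1}{4R}\sqrt{\gamma(n\lambda+\delta\mu^2)}
\]
gives the desired result.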
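The first-order optimality condition at the start of the proof of Theorem 4 suggests why no dual proximal mapping is needed when $h = \phi_i^*$. Writing $\beta = (\phi_i^*)'(y_i^{(t)})$ and $\tilde\beta = (\phi_i^*)'(\tilde y_i)$, the condition $\langle a_i,\tilde x^{(t)}\rangle - (\tilde\beta-\beta)/\sigma = \tilde\beta$ gives $\tilde\beta = (\beta + \sigma\langle a_i,\tilde x^{(t)}\rangle)/(1+\sigma)$, and $\tilde y_i = \phi_i'(\tilde\beta)$ because $(\phi_i^*)'$ is the inverse of $\phi_i'$. The sketch below illustrates this for the logistic loss $\phi(z) = \log(1+\exp(-bz))$; it is an assumption-laden illustration, and the function names are hypothetical rather than taken from the paper.

```python
import math


def logistic_grad(z, b):
    """phi'(z) for phi(z) = log(1 + exp(-b*z)), with label b in {-1, +1}."""
    return -b / (1.0 + math.exp(b * z))


def logistic_conj_grad(y, b):
    """(phi*)'(y), the inverse map of phi'; requires b*y in (-1, 0)."""
    return math.log(-b / y - 1.0) / b


def dual_free_coordinate_update(beta, inner, sigma, b):
    """One coordinate update expressed in the variable beta = (phi*)'(y).

    `inner` stands for <a_i, x_tilde>. Returns the new beta and the matching
    dual value y = phi'(beta); only the gradient of phi is evaluated.
    """
    beta_new = (beta + sigma * inner) / (1.0 + sigma)
    y_new = logistic_grad(beta_new, b)
    return beta_new, y_new


# Illustrative values (assumptions, not taken from the paper).
beta0, inner0, sigma0, label = 0.3, -1.2, 0.5, 1
beta1, y1 = dual_free_coordinate_update(beta0, inner0, sigma0, label)
# Residual of the optimality condition <a, x~> - (beta1 - beta0)/sigma = beta1.
residual = inner0 - (beta1 - beta0) / sigma0 - beta1
```

The update is a convex combination of the previous $\beta$ and the current linear prediction, followed by a single evaluation of $\phi'$.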


Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements Aoucemets No-Parametrc Desty Estmato Techques HW assged Most of ths lecture was o the blacboard. These sldes cover the same materal as preseted DHS Bometrcs CSE 90-a Lecture 7 CSE90a Fall 06 CSE90a Fall

More information

Comparison of Dual to Ratio-Cum-Product Estimators of Population Mean

Comparison of Dual to Ratio-Cum-Product Estimators of Population Mean Research Joural of Mathematcal ad Statstcal Sceces ISS 30 6047 Vol. 1(), 5-1, ovember (013) Res. J. Mathematcal ad Statstcal Sc. Comparso of Dual to Rato-Cum-Product Estmators of Populato Mea Abstract

More information

Class 13,14 June 17, 19, 2015

Class 13,14 June 17, 19, 2015 Class 3,4 Jue 7, 9, 05 Pla for Class3,4:. Samplg dstrbuto of sample mea. The Cetral Lmt Theorem (CLT). Cofdece terval for ukow mea.. Samplg Dstrbuto for Sample mea. Methods used are based o CLT ( Cetral

More information

Simulation Output Analysis

Simulation Output Analysis Smulato Output Aalyss Summary Examples Parameter Estmato Sample Mea ad Varace Pot ad Iterval Estmato ermatg ad o-ermatg Smulato Mea Square Errors Example: Sgle Server Queueg System x(t) S 4 S 4 S 3 S 5

More information

Discrete Mathematics and Probability Theory Fall 2016 Seshia and Walrand DIS 10b

Discrete Mathematics and Probability Theory Fall 2016 Seshia and Walrand DIS 10b CS 70 Dscrete Mathematcs ad Probablty Theory Fall 206 Sesha ad Walrad DIS 0b. Wll I Get My Package? Seaky delvery guy of some compay s out delverg packages to customers. Not oly does he had a radom package

More information

Parallel Multi-splitting Proximal Method for Star Networks

Parallel Multi-splitting Proximal Method for Star Networks Parallel Mult-splttg Proxmal Method for Star Networks Erm We Departmet of Electrcal Egeerg ad Computer Scece Northwester Uversty Evasto, IL 600 erm.we@orthwester.edu Abstract We develop a parallel algorthm

More information

An Introduction to. Support Vector Machine

An Introduction to. Support Vector Machine A Itroducto to Support Vector Mache Support Vector Mache (SVM) A classfer derved from statstcal learg theory by Vapk, et al. 99 SVM became famous whe, usg mages as put, t gave accuracy comparable to eural-etwork

More information

CS 2750 Machine Learning. Lecture 8. Linear regression. CS 2750 Machine Learning. Linear regression. is a linear combination of input components x

CS 2750 Machine Learning. Lecture 8. Linear regression. CS 2750 Machine Learning. Linear regression. is a linear combination of input components x CS 75 Mache Learg Lecture 8 Lear regresso Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square CS 75 Mache Learg Lear regresso Fucto f : X Y s a lear combato of put compoets f + + + K d d K k - parameters

More information

The number of observed cases The number of parameters. ith case of the dichotomous dependent variable. the ith case of the jth parameter

The number of observed cases The number of parameters. ith case of the dichotomous dependent variable. the ith case of the jth parameter LOGISTIC REGRESSION Notato Model Logstc regresso regresses a dchotomous depedet varable o a set of depedet varables. Several methods are mplemeted for selectg the depedet varables. The followg otato s

More information

X ε ) = 0, or equivalently, lim

X ε ) = 0, or equivalently, lim Revew for the prevous lecture Cocepts: order statstcs Theorems: Dstrbutos of order statstcs Examples: How to get the dstrbuto of order statstcs Chapter 5 Propertes of a Radom Sample Secto 55 Covergece

More information

Linear Regression Linear Regression with Shrinkage. Some slides are due to Tommi Jaakkola, MIT AI Lab

Linear Regression Linear Regression with Shrinkage. Some slides are due to Tommi Jaakkola, MIT AI Lab Lear Regresso Lear Regresso th Shrkage Some sldes are due to Tomm Jaakkola, MIT AI Lab Itroducto The goal of regresso s to make quattatve real valued predctos o the bass of a vector of features or attrbutes.

More information

TESTS BASED ON MAXIMUM LIKELIHOOD

TESTS BASED ON MAXIMUM LIKELIHOOD ESE 5 Toy E. Smth. The Basc Example. TESTS BASED ON MAXIMUM LIKELIHOOD To llustrate the propertes of maxmum lkelhood estmates ad tests, we cosder the smplest possble case of estmatg the mea of the ormal

More information

MULTIDIMENSIONAL HETEROGENEOUS VARIABLE PREDICTION BASED ON EXPERTS STATEMENTS. Gennadiy Lbov, Maxim Gerasimov

MULTIDIMENSIONAL HETEROGENEOUS VARIABLE PREDICTION BASED ON EXPERTS STATEMENTS. Gennadiy Lbov, Maxim Gerasimov Iteratoal Boo Seres "Iformato Scece ad Computg" 97 MULTIIMNSIONAL HTROGNOUS VARIABL PRICTION BAS ON PRTS STATMNTS Geady Lbov Maxm Gerasmov Abstract: I the wors [ ] we proposed a approach of formg a cosesus

More information

Analysis of Variance with Weibull Data

Analysis of Variance with Weibull Data Aalyss of Varace wth Webull Data Lahaa Watthaacheewaul Abstract I statstcal data aalyss by aalyss of varace, the usual basc assumptos are that the model s addtve ad the errors are radomly, depedetly, ad

More information

Arithmetic Mean and Geometric Mean

Arithmetic Mean and Geometric Mean Acta Mathematca Ntresa Vol, No, p 43 48 ISSN 453-6083 Arthmetc Mea ad Geometrc Mea Mare Varga a * Peter Mchalča b a Departmet of Mathematcs, Faculty of Natural Sceces, Costate the Phlosopher Uversty Ntra,

More information

Non-uniform Turán-type problems

Non-uniform Turán-type problems Joural of Combatoral Theory, Seres A 111 2005 106 110 wwwelsevercomlocatecta No-uform Turá-type problems DhruvMubay 1, Y Zhao 2 Departmet of Mathematcs, Statstcs, ad Computer Scece, Uversty of Illos at

More information

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015 Fall 05 Homework : Solutos Problem : (Practce wth Asymptotc Notato) A essetal requremet for uderstadg scalg behavor s comfort wth asymptotc (or bg-o ) otato. I ths problem, you wll prove some basc facts

More information

ENGI 3423 Simple Linear Regression Page 12-01

ENGI 3423 Simple Linear Regression Page 12-01 ENGI 343 mple Lear Regresso Page - mple Lear Regresso ometmes a expermet s set up where the expermeter has cotrol over the values of oe or more varables X ad measures the resultg values of aother varable

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON430 Statstcs Date of exam: Frday, December 8, 07 Grades are gve: Jauary 4, 08 Tme for exam: 0900 am 00 oo The problem set covers 5 pages Resources allowed:

More information

A NEW LOG-NORMAL DISTRIBUTION

A NEW LOG-NORMAL DISTRIBUTION Joural of Statstcs: Advaces Theory ad Applcatos Volume 6, Number, 06, Pages 93-04 Avalable at http://scetfcadvaces.co. DOI: http://dx.do.org/0.864/jsata_700705 A NEW LOG-NORMAL DISTRIBUTION Departmet of

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Revew for the prevous lecture: Theorems ad Examples: How to obta the pmf (pdf) of U = g (, Y) ad V = g (, Y) Chapter 4 Multple Radom Varables Chapter 44 Herarchcal Models ad Mxture Dstrbutos Examples:

More information

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades STAT 101 Dr. Kar Lock Morga 11/20/12 Exam 2 Grades Multple Regresso SECTIONS 9.2, 10.1, 10.2 Multple explaatory varables (10.1) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (10.2) Trasformatos

More information

Derivation of 3-Point Block Method Formula for Solving First Order Stiff Ordinary Differential Equations

Derivation of 3-Point Block Method Formula for Solving First Order Stiff Ordinary Differential Equations Dervato of -Pot Block Method Formula for Solvg Frst Order Stff Ordary Dfferetal Equatos Kharul Hamd Kharul Auar, Kharl Iskadar Othma, Zara Bb Ibrahm Abstract Dervato of pot block method formula wth costat

More information

Strong Convergence of Weighted Averaged Approximants of Asymptotically Nonexpansive Mappings in Banach Spaces without Uniform Convexity

Strong Convergence of Weighted Averaged Approximants of Asymptotically Nonexpansive Mappings in Banach Spaces without Uniform Convexity BULLETIN of the MALAYSIAN MATHEMATICAL SCIENCES SOCIETY Bull. Malays. Math. Sc. Soc. () 7 (004), 5 35 Strog Covergece of Weghted Averaged Appromats of Asymptotcally Noepasve Mappgs Baach Spaces wthout

More information

Chapter 9 Jordan Block Matrices

Chapter 9 Jordan Block Matrices Chapter 9 Jorda Block atrces I ths chapter we wll solve the followg problem. Gve a lear operator T fd a bass R of F such that the matrx R (T) s as smple as possble. f course smple s a matter of taste.

More information

A New Family of Transformations for Lifetime Data

A New Family of Transformations for Lifetime Data Proceedgs of the World Cogress o Egeerg 4 Vol I, WCE 4, July - 4, 4, Lodo, U.K. A New Famly of Trasformatos for Lfetme Data Lakhaa Watthaacheewakul Abstract A famly of trasformatos s the oe of several

More information

CS 1675 Introduction to Machine Learning Lecture 12 Support vector machines

CS 1675 Introduction to Machine Learning Lecture 12 Support vector machines CS 675 Itroducto to Mache Learg Lecture Support vector maches Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square Mdterm eam October 9, 7 I-class eam Closed book Stud materal: Lecture otes Correspodg chapters

More information

Module 7: Probability and Statistics

Module 7: Probability and Statistics Lecture 4: Goodess of ft tests. Itroducto Module 7: Probablty ad Statstcs I the prevous two lectures, the cocepts, steps ad applcatos of Hypotheses testg were dscussed. Hypotheses testg may be used to

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Marquette Uverst Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Coprght 08 b Marquette Uverst Maxmum Lkelhood Estmato We have bee sag that ~

More information

STK4011 and STK9011 Autumn 2016

STK4011 and STK9011 Autumn 2016 STK4 ad STK9 Autum 6 Pot estmato Covers (most of the followg materal from chapter 7: Secto 7.: pages 3-3 Secto 7..: pages 3-33 Secto 7..: pages 35-3 Secto 7..3: pages 34-35 Secto 7.3.: pages 33-33 Secto

More information

Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms

Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms Exploitig Strog Covexity from Data with Primal-Dual First-Order Algorithms Jialei Wag Li Xiao Abstract We cosider empirical risk miimizatio of liear predictors with covex loss fuctios. Such problems ca

More information

A Remark on the Uniform Convergence of Some Sequences of Functions

A Remark on the Uniform Convergence of Some Sequences of Functions Advaces Pure Mathematcs 05 5 57-533 Publshed Ole July 05 ScRes. http://www.scrp.org/joural/apm http://dx.do.org/0.436/apm.05.59048 A Remark o the Uform Covergece of Some Sequeces of Fuctos Guy Degla Isttut

More information

Support vector machines

Support vector machines CS 75 Mache Learg Lecture Support vector maches Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square CS 75 Mache Learg Outle Outle: Algorthms for lear decso boudary Support vector maches Mamum marg hyperplae.

More information

5 Short Proofs of Simplified Stirling s Approximation

5 Short Proofs of Simplified Stirling s Approximation 5 Short Proofs of Smplfed Strlg s Approxmato Ofr Gorodetsky, drtymaths.wordpress.com Jue, 20 0 Itroducto Strlg s approxmato s the followg (somewhat surprsg) approxmato of the factoral,, usg elemetary fuctos:

More information

C-1: Aerodynamics of Airfoils 1 C-2: Aerodynamics of Airfoils 2 C-3: Panel Methods C-4: Thin Airfoil Theory

C-1: Aerodynamics of Airfoils 1 C-2: Aerodynamics of Airfoils 2 C-3: Panel Methods C-4: Thin Airfoil Theory ROAD MAP... AE301 Aerodyamcs I UNIT C: 2-D Arfols C-1: Aerodyamcs of Arfols 1 C-2: Aerodyamcs of Arfols 2 C-3: Pael Methods C-4: Th Arfol Theory AE301 Aerodyamcs I Ut C-3: Lst of Subects Problem Solutos?

More information

Lecture Note to Rice Chapter 8

Lecture Note to Rice Chapter 8 ECON 430 HG revsed Nov 06 Lecture Note to Rce Chapter 8 Radom matrces Let Y, =,,, m, =,,, be radom varables (r.v. s). The matrx Y Y Y Y Y Y Y Y Y Y = m m m s called a radom matrx ( wth a ot m-dmesoal dstrbuto,

More information

Multivariate Transformation of Variables and Maximum Likelihood Estimation

Multivariate Transformation of Variables and Maximum Likelihood Estimation Marquette Uversty Multvarate Trasformato of Varables ad Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Assocate Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 03 by Marquette Uversty

More information

PTAS for Bin-Packing

PTAS for Bin-Packing CS 663: Patter Matchg Algorthms Scrbe: Che Jag /9/00. Itroducto PTAS for B-Packg The B-Packg problem s NP-hard. If we use approxmato algorthms, the B-Packg problem could be solved polyomal tme. For example,

More information

Complete Convergence and Some Maximal Inequalities for Weighted Sums of Random Variables

Complete Convergence and Some Maximal Inequalities for Weighted Sums of Random Variables Joural of Sceces, Islamc Republc of Ira 8(4): -6 (007) Uversty of Tehra, ISSN 06-04 http://sceces.ut.ac.r Complete Covergece ad Some Maxmal Iequaltes for Weghted Sums of Radom Varables M. Am,,* H.R. Nl

More information

Overview. Basic concepts of Bayesian learning. Most probable model given data Coin tosses Linear regression Logistic regression

Overview. Basic concepts of Bayesian learning. Most probable model given data Coin tosses Linear regression Logistic regression Overvew Basc cocepts of Bayesa learg Most probable model gve data Co tosses Lear regresso Logstc regresso Bayesa predctos Co tosses Lear regresso 30 Recap: regresso problems Iput to learg problem: trag

More information

CHAPTER 3 POSTERIOR DISTRIBUTIONS

CHAPTER 3 POSTERIOR DISTRIBUTIONS CHAPTER 3 POSTERIOR DISTRIBUTIONS If scece caot measure the degree of probablt volved, so much the worse for scece. The practcal ma wll stck to hs apprecatve methods utl t does, or wll accept the results

More information