arxiv: v1 [math.oc] 7 Mar 2017


Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms

Jialei Wang¹, Lin Xiao²

¹Department of Computer Science, The University of Chicago, Chicago, Illinois, USA. ²Microsoft Research, Redmond, Washington 98052, USA. Correspondence to: Jialei Wang, Lin Xiao.

Abstract

We consider empirical risk minimization of linear predictors with convex loss functions. Such problems can be reformulated as convex-concave saddle point problems, and thus are well suited for primal-dual first-order algorithms. However, primal-dual algorithms often require explicit strongly convex regularization in order to obtain fast linear convergence, and the required dual proximal mapping may not admit a closed-form or efficient solution. In this paper, we develop both batch and randomized primal-dual algorithms that can exploit strong convexity from data adaptively and are capable of achieving linear convergence even without regularization. We also present dual-free variants of the adaptive primal-dual algorithms that do not require computing the dual proximal mapping, which are especially suitable for logistic regression.

1. Introduction

We consider the problem of regularized empirical risk minimization (ERM) of linear predictors. Let $a_1, \ldots, a_n \in \mathbb{R}^d$ be the feature vectors of $n$ data samples, $\phi_i : \mathbb{R} \to \mathbb{R}$ be a convex loss function associated with the linear prediction $a_i^T x$, for $i = 1, \ldots, n$, and $g : \mathbb{R}^d \to \mathbb{R}$ be a convex regularization function for the predictor $x \in \mathbb{R}^d$. ERM amounts to solving the following convex optimization problem:

$$\min_{x \in \mathbb{R}^d} \Big\{ P(x) \stackrel{\text{def}}{=} \frac{1}{n} \sum_{i=1}^n \phi_i(a_i^T x) + g(x) \Big\}. \tag{1}$$

Examples of the above formulation include many well-known classification and regression problems. For binary classification, each feature vector $a_i$ is associated with a label $b_i \in \{\pm 1\}$. In particular, logistic regression is obtained by setting $\phi_i(z) = \log(1 + \exp(-b_i z))$. For linear regression problems, each feature vector $a_i$ is associated with a dependent variable $b_i \in \mathbb{R}$, and $\phi_i(z) = (1/2)(z - b_i)^2$. Then we get ridge regression with $g(x) = (\lambda/2)\|x\|_2^2$, and elastic net with $g(x) = \lambda_1\|x\|_1 + (\lambda_2/2)\|x\|_2^2$.

Let $A = [a_1, \ldots, a_n]^T$ be the data matrix. Throughout this paper, we make the following assumptions:

Assumption 1.
The functions $\phi_i$, $g$ and matrix $A$ satisfy:
• Each $\phi_i$ is $\delta$-strongly convex and $1/\gamma$-smooth, where $\gamma > 0$, $\delta \ge 0$, and $\gamma\delta \le 1$;
• $g$ is $\lambda$-strongly convex, where $\lambda \ge 0$;
• $\lambda + \delta\mu^2/n > 0$, where $\mu = \sqrt{\lambda_{\min}(A^T A)}$.

The strong convexity and smoothness mentioned above are with respect to the standard Euclidean norm, denoted as $\|x\| = \sqrt{x^T x}$. (See, e.g., Nesterov (2004), Sections 2.1.1 and 2.1.3 for the exact definitions.) Let $R = \max_i \|a_i\|$; assuming $\lambda > 0$, $R^2/(\gamma\lambda)$ is a popular definition of condition number for analyzing the complexities of different algorithms. The last condition above means that the primal objective function $P(x)$ is strongly convex, even if $\lambda = 0$.

There have been extensive research activities in recent years on developing efficient algorithms for solving problem (1). A broad class of randomized algorithms that exploit the finite-sum structure in the ERM problem have emerged as very competitive, both in terms of theoretical complexity and practical performance. They can be put into three categories: primal, dual, and primal-dual.

Primal randomized algorithms work with the ERM problem (1) directly. They are modern versions of randomized incremental gradient methods (e.g., Bertsekas, 2011; Nedić & Bertsekas, 2001) equipped with variance reduction techniques. Each iteration of such algorithms only processes one data point $a_i$, with complexity $O(d)$. They include SAG (Roux et al., 2012), SAGA (Defazio et al., 2014), and SVRG (Johnson & Zhang, 2013; Xiao & Zhang, 2014), which all achieve the iteration complexity $O\big((n + R^2/(\gamma\lambda))\log(1/\epsilon)\big)$ to find an $\epsilon$-optimal solution. In fact, they are capable of exploiting the strong convexity from data, meaning that the condition number $R^2/(\gamma\lambda)$ in the complexity can be replaced by the more favorable one $R^2/\big(\gamma(\lambda + \delta\mu^2/n)\big)$. This improvement can be achieved without explicit knowledge of $\mu$ from data.
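To make the two condition numbers concrete, here is a small sketch (Python with NumPy; the helper name `condition_numbers` is ours, not the paper's) that computes $R$, $\mu^2 = \lambda_{\min}(A^T A)$, and both $R^2/(\gamma\lambda)$ and $R^2/(\gamma(\lambda + \delta\mu^2/n))$ from a data matrix. It forms $A^T A$ explicitly, which is only reasonable for modest $d$.

```python
import numpy as np

def condition_numbers(A, lam, gamma, delta):
    """Return (R^2/(lam*gamma), R^2/((lam + delta*mu^2/n)*gamma)) for data matrix A.

    Rows of A are the a_i (shape n x d); R = max_i ||a_i||, mu^2 = lambda_min(A^T A).
    Assumes every loss phi_i is delta-strongly convex and (1/gamma)-smooth.
    """
    n = A.shape[0]
    R = np.max(np.linalg.norm(A, axis=1))
    mu_sq = np.linalg.eigvalsh(A.T @ A)[0]          # eigvalsh sorts ascending
    kappa_reg = R**2 / (lam * gamma) if lam > 0 else np.inf
    kappa_data = R**2 / ((lam + delta * mu_sq / n) * gamma)
    return kappa_reg, kappa_data
```

For ridge regression ($\gamma = \delta = 1$), a tiny $\lambda$ makes the first number blow up, while the second stays bounded whenever $A$ has full column rank.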

Dual algorithms solve the Fenchel dual of (1) by maximizing

$$D(y) \stackrel{\text{def}}{=} \frac{1}{n}\sum_{i=1}^n -\phi_i^*(y_i) - g^*\Big(-\frac{1}{n}\sum_{i=1}^n y_i a_i\Big) \tag{2}$$

using randomized coordinate ascent algorithms. Here $\phi_i^*$ and $g^*$ denote the conjugate functions of $\phi_i$ and $g$. They include SDCA (Shalev-Shwartz & Zhang, 2013), Nesterov (2012) and Richtárik & Takáč (2014). They have the same complexity $O\big((n + R^2/(\gamma\lambda))\log(1/\epsilon)\big)$, but it is hard for them to exploit strong convexity from data.

Primal-dual algorithms solve the convex-concave saddle point problem $\min_x \max_y \mathcal{L}(x, y)$, where

$$\mathcal{L}(x, y) \stackrel{\text{def}}{=} \frac{1}{n}\sum_{i=1}^n \big(y_i \langle a_i, x\rangle - \phi_i^*(y_i)\big) + g(x). \tag{3}$$

In particular, SPDC (Zhang & Xiao, 2015) achieves an accelerated linear convergence rate with iteration complexity $O\big((n + R\sqrt{n/(\gamma\lambda)})\log(1/\epsilon)\big)$, which is better than the aforementioned non-accelerated complexity when $R^2/(\gamma\lambda) > n$. Lan & Zhou (2015) developed dual-free variants of accelerated primal-dual algorithms, but without considering the linear predictor structure in ERM. Balamurugan & Bach (2016) extended SVRG and SAGA to solving saddle point problems.

Accelerated primal and dual randomized algorithms have also been developed. Nesterov (2012), Fercoq & Richtárik (2015) and Lin et al. (2015b) developed accelerated coordinate gradient algorithms, which can be applied to solve the dual problem (2). Allen-Zhu (2016) developed an accelerated variant of SVRG. Acceleration can also be obtained using the Catalyst framework (Lin et al., 2015a). They all achieve the same $O\big((n + R\sqrt{n/(\gamma\lambda)})\log(1/\epsilon)\big)$ complexity. A common feature of accelerated algorithms is that they require a good estimate of the strong convexity parameter. This makes it hard for them to exploit strong convexity from data, because the minimum singular value $\mu$ of the data matrix $A$ is very hard to estimate in general.

In this paper, we show that primal-dual algorithms are capable of exploiting strong convexity from data if the algorithm parameters (such as step sizes) are set appropriately. While these optimal settings depend on knowledge of the convexity parameter $\mu$ from the data, we develop adaptive variants of primal-dual algorithms that can tune the parameter automatically. Such adaptive schemes rely critically on the capability of primal-dual algorithms to evaluate the primal-dual optimality gap.
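Because the adaptive schemes hinge on evaluating the duality gap, it may help to see $P(x) - D(y)$ written out for ridge regression, where $\phi_i^*(\beta) = \beta^2/2 + b_i\beta$ and $g^*(u) = \|u\|^2/(2\lambda)$. The following is a minimal sketch under those assumptions; the function names are illustrative.

```python
import numpy as np

def ridge_primal(x, A, b, lam):
    """P(x) in (1) with phi_i(z) = (z - b_i)^2 / 2 and g(x) = (lam/2)||x||^2."""
    n = A.shape[0]
    r = A @ x - b
    return 0.5 * np.dot(r, r) / n + 0.5 * lam * np.dot(x, x)

def ridge_dual(y, A, b, lam):
    """D(y) in (2), using phi_i^*(beta) = beta^2/2 + b_i*beta and g^*(u) = ||u||^2/(2 lam)."""
    n = A.shape[0]
    u = A.T @ y / n                                  # (1/n) sum_i y_i a_i
    conj_loss = 0.5 * np.dot(y, y) + np.dot(b, y)    # sum_i phi_i^*(y_i)
    return -conj_loss / n - np.dot(u, u) / (2.0 * lam)

def duality_gap(x, y, A, b, lam):
    """Nonnegative by weak duality; zero exactly at the saddle point."""
    return ridge_primal(x, A, b, lam) - ridge_dual(y, A, b, lam)
```

At the optimum the dual variables are $y_i^\star = a_i^T x^\star - b_i$, so the gap can be driven to zero and monitored along the iterations.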
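A major disadvantage of primal-dual algorithms is that the required dual proximal mapping may not admit a closed-form or efficient solution. We follow the approach of Lan & Zhou (2015) to derive dual-free variants of the primal-dual algorithms customized for ERM problems with the linear predictor structure, and show that they can also exploit strong convexity from data with correct choices of parameters or by using an adaptation scheme.

2. Batch primal-dual algorithms

Before diving into randomized primal-dual algorithms, we first consider batch primal-dual algorithms, which exhibit similar properties as their randomized variants. To this end, we consider a batch version of the ERM problem (1),

$$\min_{x\in\mathbb{R}^d} \big\{ P(x) \stackrel{\text{def}}{=} f(Ax) + g(x) \big\}, \tag{4}$$

where $A \in \mathbb{R}^{n\times d}$, and make the following assumption:

Assumption 2. The functions $f$, $g$ and matrix $A$ satisfy:
• $f$ is $\delta$-strongly convex and $1/\gamma$-smooth, where $\gamma > 0$, $\delta \ge 0$, and $\gamma\delta \le 1$;
• $g$ is $\lambda$-strongly convex, where $\lambda \ge 0$;
• $\lambda + \delta\mu^2 > 0$, where $\mu = \sqrt{\lambda_{\min}(A^T A)}$.

For exact correspondence with problem (1), we have $f(z) = \frac{1}{n}\sum_{i=1}^n \phi_i(z_i)$ with $z_i = a_i^T x$. Under Assumption 1, the function $f(z)$ is $\delta/n$-strongly convex and $1/(n\gamma)$-smooth, and $f(Ax)$ is $\delta\mu^2/n$-strongly convex and $R^2/\gamma$-smooth. However, such correspondences alone are not sufficient to exploit the structure of (1): substituting them into the batch algorithms of this section will not produce the efficient algorithms for solving problem (1) that we present in Sections 3 and 4.2. So we do not make such correspondences explicit in this section. Rather, we treat them as independent assumptions with the same notation.

Using conjugate functions, we can derive the dual of (4) as

$$\max_{y\in\mathbb{R}^n} \big\{ D(y) \stackrel{\text{def}}{=} -f^*(y) - g^*(-A^T y) \big\}, \tag{5}$$

and the convex-concave saddle point formulation is

$$\min_{x\in\mathbb{R}^d}\max_{y\in\mathbb{R}^n} \big\{ \mathcal{L}(x, y) \stackrel{\text{def}}{=} g(x) + y^T A x - f^*(y) \big\}. \tag{6}$$

We consider the primal-dual first-order algorithm proposed by Chambolle & Pock (2011; 2016) for solving the saddle point problem (6), which is given as Algorithm 1. Here we call it the batch primal-dual (BPD) algorithm.

Algorithm 1 Batch Primal-Dual (BPD) Algorithm
input: parameters $\tau, \sigma, \theta$, initial point $(\tilde{x}^{(0)} = x^{(0)}, y^{(0)})$
for $t = 0, 1, 2, \ldots$ do
  $y^{(t+1)} = \mathrm{prox}_{\sigma f^*}\big(y^{(t)} + \sigma A\tilde{x}^{(t)}\big)$
  $x^{(t+1)} = \mathrm{prox}_{\tau g}\big(x^{(t)} - \tau A^T y^{(t+1)}\big)$
  $\tilde{x}^{(t+1)} = x^{(t+1)} + \theta\big(x^{(t+1)} - x^{(t)}\big)$
end for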
Assuming that $f$ is smooth and $g$ is strongly convex, Chambolle & Pock (2011; 2016) showed that Algorithm 1 achieves an accelerated linear convergence rate if $\lambda > 0$. However, they did not consider the case where an additional, or the sole, source of strong convexity comes from $f(Ax)$.
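A minimal sketch of Algorithm 1 for ridge regression in the batch form (4), assuming $f(z) = \|z - b\|^2/(2n)$ and $g(x) = (\lambda/2)\|x\|^2$. Both proximal mappings then have closed forms, $\mathrm{prox}_{\sigma f^*}(w) = (w - \sigma b)/(1 + \sigma n)$ and $\mathrm{prox}_{\tau g}(v) = v/(1 + \tau\lambda)$, which we derived for this illustration; the parameters $\tau, \sigma, \theta$ are left to the caller.

```python
import numpy as np

def bpd_ridge(A, b, lam, tau, sigma, theta, iters):
    """Algorithm 1 (BPD) for f(z) = ||z - b||^2 / (2n) and g(x) = (lam/2) ||x||^2."""
    n, d = A.shape
    x = np.zeros(d)
    x_tilde = x.copy()
    y = np.zeros(n)
    for _ in range(iters):
        # Dual step: prox of sigma * f^*, where f^*(y) = (n/2)||y||^2 + b^T y.
        y = (y + sigma * (A @ x_tilde) - sigma * b) / (1.0 + sigma * n)
        # Primal step: prox of tau * g is a uniform shrinkage for the ridge penalty.
        x_new = (x - tau * (A.T @ y)) / (1.0 + tau * lam)
        # Extrapolation.
        x_tilde = x_new + theta * (x_new - x)
        x = x_new
    return x, y
```

With $\theta = 1$ and $\tau\sigma L^2 < 1$ this is the classical Chambolle-Pock setting; the point of Theorem 1 is to choose $\tau, \sigma, \theta$ so that the same loop also exploits $\delta\mu^2$.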

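In the following theorem, we show how to set the parameters $\tau$, $\sigma$ and $\theta$ to exploit both sources of strong convexity to achieve fast linear convergence.

Theorem 1. Suppose Assumption 2 holds and $(x^\star, y^\star)$ is the unique saddle point of $\mathcal{L}$ defined in (6). Let $L = \|A\| = \sqrt{\lambda_{\max}(A^T A)}$. If we set the parameters in Algorithm 1 as

$$\sigma = \frac{1}{L}\sqrt{\frac{\lambda + \delta\mu^2}{\gamma}}, \qquad \tau = \frac{1}{L}\sqrt{\frac{\gamma}{\lambda + \delta\mu^2}}, \tag{7}$$

and $\theta = \max\{\theta_x, \theta_y\}$, where

$$\theta_x = \Big(1 - \frac{\delta}{\delta + 2\sigma}\frac{\mu^2}{L^2}\Big)\frac{1}{1 + \tau\lambda}, \qquad \theta_y = \frac{1}{1 + \sigma\gamma/2}, \tag{8}$$

then we have

$$\Big(\frac{1}{2\tau} + \frac{\lambda}{2}\Big)\|x^{(t)} - x^\star\|^2 + \frac{\gamma}{4}\|y^{(t)} - y^\star\|^2 \le \theta^t C,$$
$$\mathcal{L}(x^{(t)}, y^\star) - \mathcal{L}(x^\star, y^{(t)}) \le \theta^t C,$$

where $C = \big(\frac{1}{2\tau} + \frac{\lambda}{2}\big)\|x^{(0)} - x^\star\|^2 + \big(\frac{1}{2\sigma} + \frac{\gamma}{4}\big)\|y^{(0)} - y^\star\|^2$.

The proof of Theorem 1 is given in Appendices B and C. Here we give a detailed analysis of the convergence rate. Substituting $\sigma$ and $\tau$ in (7) into the expressions for $\theta_y$ and $\theta_x$ in (8), and assuming $\gamma(\lambda + \delta\mu^2) \ll L^2$, we have

$$\theta_x \approx \Big(1 - \frac{\gamma\delta\mu^2/L^2}{2\sqrt{\gamma(\lambda+\delta\mu^2)}/L + \gamma\delta}\Big)\Big(1 - \frac{\lambda}{L}\sqrt{\frac{\gamma}{\lambda+\delta\mu^2}}\Big),$$
$$\theta_y = \frac{1}{1 + \sqrt{\gamma(\lambda+\delta\mu^2)}/(2L)} \approx 1 - \frac{\sqrt{\gamma(\lambda+\delta\mu^2)}}{2L}.$$

Since the overall condition number of the problem is $L^2/\big(\gamma(\lambda+\delta\mu^2)\big)$, it is clear that $\theta_y$ is an accelerated convergence rate. Next we examine $\theta_x$ in two special cases.

The case of $\delta\mu^2 = 0$ but $\lambda > 0$. In this case, we have $\tau = \frac{1}{L}\sqrt{\gamma/\lambda}$ and $\sigma = \frac{1}{L}\sqrt{\lambda/\gamma}$, and thus

$$\theta_x = \frac{1}{1 + \sqrt{\gamma\lambda}/L} \approx 1 - \frac{\sqrt{\gamma\lambda}}{L}, \qquad \theta_y = \frac{1}{1 + \sqrt{\gamma\lambda}/(2L)} \approx 1 - \frac{\sqrt{\gamma\lambda}}{2L}.$$

Therefore we have $\theta = \max\{\theta_x, \theta_y\} \approx 1 - \frac{\sqrt{\gamma\lambda}}{2L}$. This indeed is an accelerated convergence rate, recovering the result of Chambolle & Pock (2011; 2016).

The case of $\lambda = 0$ but $\delta\mu^2 > 0$. In this case, we have $\tau = \frac{\sqrt{\gamma}}{L\mu\sqrt{\delta}}$ and $\sigma = \frac{\mu\sqrt{\delta}}{L\sqrt{\gamma}}$, and

$$\theta_x = 1 - \frac{\gamma\delta\mu^2/L^2}{2\sqrt{\gamma\delta}\,\mu/L + \gamma\delta}, \qquad \theta_y \approx 1 - \frac{\sqrt{\gamma\delta}\,\mu}{2L}.$$

Notice that $L^2/(\gamma\delta\mu^2)$ is the condition number of $f(Ax)$. Next we assume $\mu \ll L$ and examine how $\theta_x$ varies with $\gamma\delta$.
• If $\gamma\delta \approx \mu^2/L^2$, meaning $f$ is badly conditioned, then
$$\theta_x \approx 1 - \frac{\gamma\delta\mu^2/L^2}{3\sqrt{\gamma\delta}\,\mu/L} = 1 - \frac{\sqrt{\gamma\delta}\,\mu}{3L}.$$
Because the overall condition number is $L^2/(\gamma\delta\mu^2)$, this is an accelerated linear rate, and so is $\theta = \max\{\theta_x, \theta_y\}$.
• If $\gamma\delta \approx \mu/L$, meaning $f$ is mildly conditioned, then
$$\theta_x \approx 1 - \frac{\mu^3/L^3}{2(\mu/L)^{3/2} + \mu/L} \approx 1 - \frac{\mu^2}{L^2}.$$
This represents a half-accelerated rate, because the overall condition number is $L^2/(\gamma\delta\mu^2) \approx L^3/\mu^3$.
• If $\gamma\delta = 1$, i.e., $f$ is a simple quadratic function, then
$$\theta_x \approx 1 - \frac{\mu^2/L^2}{2\mu/L + 1} \approx 1 - \frac{\mu^2}{L^2}.$$
This rate does not have acceleration, because the overall condition number is $L^2/(\gamma\delta\mu^2) \approx L^2/\mu^2$.

Algorithm 2 Adaptive Batch Primal-Dual (Ada-BPD)
input: problem constants $\lambda, \gamma, \delta, L$ and $\hat{\mu} > 0$, initial point $(x^{(0)}, y^{(0)})$, and adaptation period $T$.
Compute $\sigma$, $\tau$ and $\theta$ as in (7) and (8) using $\mu = \hat{\mu}$
for $t = 0, 1, 2, \ldots$ do
  $y^{(t+1)} = \mathrm{prox}_{\sigma f^*}\big(y^{(t)} + \sigma A\tilde{x}^{(t)}\big)$
  $x^{(t+1)} = \mathrm{prox}_{\tau g}\big(x^{(t)} - \tau A^T y^{(t+1)}\big)$
  $\tilde{x}^{(t+1)} = x^{(t+1)} + \theta\big(x^{(t+1)} - x^{(t)}\big)$
  if mod$(t+1, T) == 0$ then
    $(\sigma, \tau, \theta) = $ BPD-Adapt$\big(\{P^{(s)}, D^{(s)}\}_{s=t-T}^{t}\big)$
  end if
end for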
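Formulas (7) and (8) translate directly into code. The sketch below (hypothetical helper `bpd_parameters`) evaluates them and can be used to reproduce the regimes discussed above; for instance, with $\lambda = 0$, $\gamma\delta = \mu^2/L^2$ and $\mu \ll L$, the computed $1 - \theta$ matches $\sqrt{\gamma\delta}\,\mu/(3L)$.

```python
import math

def bpd_parameters(lam, gamma, delta, mu, L):
    """sigma, tau from (7) and theta = max(theta_x, theta_y) from (8), as in Theorem 1."""
    s = lam + delta * mu**2                       # overall strong convexity lambda + delta*mu^2
    sigma = math.sqrt(s / gamma) / L
    tau = math.sqrt(gamma / s) / L
    theta_x = (1.0 - delta / (delta + 2.0 * sigma) * mu**2 / L**2) / (1.0 + tau * lam)
    theta_y = 1.0 / (1.0 + sigma * gamma / 2.0)
    return sigma, tau, max(theta_x, theta_y)
```

For example, `bpd_parameters(lam=0.0, gamma=1.0, delta=1e-4, mu=1.0, L=100.0)` is the badly conditioned case, and the returned $\theta$ is governed by $\theta_x$.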
In summary, the extent of acceleration in the dominating factor $\theta_x$ (which determines $\theta$) depends on the relative size of $\gamma\delta$ and $\mu^2/L^2$, i.e., the relative conditioning between the function $f$ and the matrix $A$. In general, we have full acceleration if $\gamma\delta \le \mu^2/L^2$. The theory predicts that the acceleration degrades as the function $f$ gets better conditioned. However, in our numerical experiments, we often observe acceleration even if $\gamma\delta$ gets closer to 1.

As explained in Chambolle & Pock (2011), Algorithm 1 is equivalent to a preconditioned ADMM. Deng & Yin (2016) characterized conditions for ADMM to obtain linear convergence without assuming both parts of the objective function being strongly convex, but they did not derive the convergence rate for this case.

2.1. Adaptive batch primal-dual algorithms

In practice, it is often very hard to obtain a good estimate of the problem-dependent constants, especially $\mu = \sqrt{\lambda_{\min}(A^T A)}$, in order to apply the algorithmic parameters specified in Theorem 1. Here we explore heuristics that can enable adaptive tuning of such parameters, which often lead to much improved performance in practice.

A key observation is that the convergence rate of the BPD algorithm changes monotonically with the overall strong convexity parameter $\lambda + \delta\mu^2$, regardless of the extent of acceleration. In other words, the larger $\lambda + \delta\mu^2$ is, the faster the convergence. Therefore, if we can monitor the progress of the convergence and compare it with the predicted convergence rate in Theorem 1, then we can adjust the algorithmic parameters to exploit the fastest possible convergence. More specifically, if the observed convergence is slower than the predicted convergence rate, then we should reduce the estimate of $\mu$; if the observed convergence is better than the predicted rate, then we can try to increase $\mu$ for even faster convergence.

We formalize the above reasoning in an Adaptive BPD (Ada-BPD) algorithm described in Algorithm 2. This algorithm maintains an estimate $\hat{\mu}$ of the true constant $\mu$, and adjusts it every $T$ iterations. We use $P^{(t)}$ and $D^{(t)}$ to represent the primal and dual objective values at $x^{(t)}$ and $y^{(t)}$, respectively. We give two implementations of the tuning procedure BPD-Adapt:
• Algorithm 3 is a simple heuristic for tuning the estimate $\hat{\mu}$, where the increasing and decreasing factor 2 can be changed to other values larger than 1;
• Algorithm 4 is a more robust heuristic. It does not rely on the specific convergence rate $\theta$ established in Theorem 1. Instead, it simply compares the current estimate of the objective reduction rate $\hat{\rho}$ with the previous estimate $\rho$ ($\approx \theta^T$). It also specifies a non-tuning range of changes in $\rho$, given by the interval $[c, \bar{c}]$.

One can also devise more sophisticated schemes; e.g., if we estimate that $\delta\mu^2 < \lambda$, then no more tuning is necessary.

The capability of accessing both the primal and dual objective values allows primal-dual algorithms to have a good estimate of the convergence rate, which enables effective tuning heuristics. Automatic tuning of primal-dual algorithms has also been studied by, e.g., Malitsky & Pock (2016) and Goldstein et al. (2013), but with different goals.

Algorithm 3 BPD-Adapt (simple heuristic)
input: previous estimate $\hat{\mu}$, adaption period $T$, primal and dual objective values $\{P^{(s)}, D^{(s)}\}_{s=t-T}^{t}$
if $P^{(t)} - D^{(t)} < \theta^T\big(P^{(t-T)} - D^{(t-T)}\big)$ then
  $\hat{\mu} := 2\hat{\mu}$
else
  $\hat{\mu} := \hat{\mu}/2$
end if
Compute $\sigma$, $\tau$ and $\theta$ as in (7) and (8) using $\mu = \hat{\mu}$
output: new parameters $(\sigma, \tau, \theta)$
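Both tuning procedures only manipulate a few scalars. The sketch below is one possible reading of Algorithms 3 and 4 (the function names are ours); after `bpd_adapt_simple` the caller is assumed to recompute $\sigma, \tau, \theta$ from (7) and (8) with $\mu = \hat{\mu}$, and the defaults $c = 0.95$, $\bar{c} = 1.5$ are the values used in the experiments of Section 5.

```python
import math

def bpd_adapt_simple(mu_hat, theta, T, gap_old, gap_new):
    """Algorithm 3: double mu_hat if the gap shrank faster than theta**T predicts, else halve it."""
    return 2.0 * mu_hat if gap_new < theta**T * gap_old else mu_hat / 2.0

def bpd_adapt_robust(rho, Delta, gap_old, gap_new, lam, gamma, L, c=0.95, c_bar=1.5):
    """Algorithm 4: compare the observed reduction rate with the previous estimate rho.

    Delta plays the role of delta * mu_hat^2.
    """
    rho_hat = gap_new / gap_old               # observed reduction over the last T iterations
    if rho_hat <= c * rho:                    # clearly faster than before: assume more strong convexity
        Delta, rho = 2.0 * Delta, rho_hat
    elif rho_hat >= c_bar * rho:              # clearly slower than before: back off
        Delta, rho = Delta / 2.0, rho_hat
    # otherwise rho_hat is inside the no-tuning band and Delta, rho are kept
    sigma = math.sqrt((lam + Delta) / gamma) / L
    tau = math.sqrt(gamma / (lam + Delta)) / L
    theta = 1.0                               # Algorithm 4 allows theta = 1 instead of (8)
    return rho, Delta, sigma, tau, theta
```

Doubling and halving make the estimate move geometrically, so a badly wrong initial $\hat{\mu}$ is corrected after a logarithmic number of adaptation periods.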
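Finally, we note that Theorem 1 only establishes the convergence rate for the distance to the optimal point and for the quantity $\mathcal{L}(x^{(t)}, y^\star) - \mathcal{L}(x^\star, y^{(t)})$, which is not quite the duality gap $P(x^{(t)}) - D(y^{(t)})$. Nevertheless, the same convergence rate can also be established for the duality gap (see Zhang & Xiao, 2015, Section 2.2), which can be used to better justify the adaption procedure.

Algorithm 4 BPD-Adapt (robust heuristic)
input: previous rate estimate $\rho > 0$, $\Delta = \delta\hat{\mu}^2$, period $T$, constants $c < 1$ and $\bar{c} > 1$, and $\{P^{(s)}, D^{(s)}\}_{s=t-T}^{t}$
Compute new rate estimate $\hat{\rho} = \dfrac{P^{(t)} - D^{(t)}}{P^{(t-T)} - D^{(t-T)}}$
if $\hat{\rho} \le c\rho$ then
  $\Delta := 2\Delta$, $\rho := \hat{\rho}$
else if $\hat{\rho} \ge \bar{c}\rho$ then
  $\Delta := \Delta/2$, $\rho := \hat{\rho}$
else
  $\Delta := \Delta$
end if
$\sigma = \frac{1}{L}\sqrt{\frac{\lambda + \Delta}{\gamma}}$, $\tau = \frac{1}{L}\sqrt{\frac{\gamma}{\lambda + \Delta}}$
Compute $\theta$ using (8) or set $\theta = 1$
output: new parameters $(\sigma, \tau, \theta)$

3. Randomized primal-dual algorithm

In this section, we come back to the ERM problem (1), which has a finite-sum structure that allows the development of randomized primal-dual algorithms. In particular, we extend the stochastic primal-dual coordinate (SPDC) algorithm (Zhang & Xiao, 2015) to exploit the strong convexity from data in order to achieve a faster convergence rate.

First, we show that, by setting algorithmic parameters appropriately, the original SPDC algorithm may directly benefit from strong convexity from the loss function. We note that the SPDC algorithm is a special case of the Adaptive SPDC (Ada-SPDC) algorithm presented in Algorithm 5, obtained by setting the adaption period $T = \infty$ (not performing any adaption). The following theorem is proved in Appendix E.

Theorem 2. Suppose Assumption 1 holds. Let $(x^\star, y^\star)$ be the saddle point of the function $\mathcal{L}$ defined in (3), and $R = \max\{\|a_1\|, \ldots, \|a_n\|\}$. If we set $T = \infty$ in Algorithm 5 (no adaption) and let

$$\tau = \frac{1}{4R}\sqrt{\frac{\gamma}{n\lambda + \delta\mu^2}}, \qquad \sigma = \frac{1}{4R}\sqrt{\frac{n\lambda + \delta\mu^2}{\gamma}},$$

and $\theta = \max\{\theta_x, \theta_y\}$, where

$$\theta_x = \Big(1 - \frac{\tau\sigma\delta\mu^2}{2n(\sigma + 4\delta)}\Big)\frac{1}{1 + \tau\lambda}, \tag{9}$$
$$\theta_y = \frac{1 + \big((n-1)/n\big)\sigma\gamma/2}{1 + \sigma\gamma/2}, \tag{10}$$

then we have

$$\Big(\frac{1}{2\tau} + \frac{\lambda}{2}\Big)\mathbb{E}\big[\|x^{(t)} - x^\star\|^2\big] + \frac{\gamma}{4n}\mathbb{E}\big[\|y^{(t)} - y^\star\|^2\big] \le \theta^t C,$$
$$\mathbb{E}\big[\mathcal{L}(x^{(t)}, y^\star) - \mathcal{L}(x^\star, y^{(t)})\big] \le \theta^t C,$$

where $C = \big(\frac{1}{2\tau} + \frac{\lambda}{2}\big)\|x^{(0)} - x^\star\|^2 + \frac{1}{n}\big(\frac{1}{2\sigma} + \frac{\gamma}{4}\big)\|y^{(0)} - y^\star\|^2$. The expectation $\mathbb{E}[\cdot]$ is taken with respect to the history of random indices drawn at each iteration.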
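The parameter choice of Theorem 2 can be evaluated numerically as well. In this sketch (hypothetical helper `spdc_parameters`, following (9) and (10) as stated above), setting $\mu = 0$ lets one check that $1/(1-\theta_y) = n + 8R\sqrt{n/(\lambda\gamma)}$ and $1/(1-\theta_x) = 1 + 4R\sqrt{n/(\lambda\gamma)}$, i.e., the standard SPDC rate.

```python
import math

def spdc_parameters(n, R, lam, gamma, delta, mu):
    """tau, sigma and theta = max(theta_x, theta_y) from Theorem 2, eqs. (9) and (10)."""
    s = n * lam + delta * mu**2
    tau = math.sqrt(gamma / s) / (4 * R)
    sigma = math.sqrt(s / gamma) / (4 * R)
    theta_x = (1 - tau * sigma * delta * mu**2 / (2 * n * (sigma + 4 * delta))) / (1 + tau * lam)
    theta_y = (1 + (1 - 1 / n) * sigma * gamma / 2) / (1 + sigma * gamma / 2)
    return tau, sigma, max(theta_x, theta_y), theta_x, theta_y
```

Returning $\theta_x$ and $\theta_y$ separately makes it easy to see which factor dominates in each regime discussed next.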

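Algorithm 5 Adaptive SPDC (Ada-SPDC)
input: parameters $\sigma, \tau, \theta > 0$, initial point $(x^{(0)}, y^{(0)})$, and adaptation period $T$.
Set $\tilde{x}^{(0)} = x^{(0)}$ and $u^{(0)} = \frac{1}{n}\sum_{i=1}^n y_i^{(0)} a_i$
for $t = 0, 1, 2, \ldots$ do
  pick $k \in \{1, \ldots, n\}$ uniformly at random
  for $i \in \{1, \ldots, n\}$ do
    if $i == k$ then
      $y_k^{(t+1)} = \mathrm{prox}_{\sigma\phi_k^*}\big(y_k^{(t)} + \sigma a_k^T\tilde{x}^{(t)}\big)$
    else
      $y_i^{(t+1)} = y_i^{(t)}$
    end if
  end for
  $x^{(t+1)} = \mathrm{prox}_{\tau g}\big(x^{(t)} - \tau\big(u^{(t)} + (y_k^{(t+1)} - y_k^{(t)})a_k\big)\big)$
  $u^{(t+1)} = u^{(t)} + \frac{1}{n}\big(y_k^{(t+1)} - y_k^{(t)}\big)a_k$
  $\tilde{x}^{(t+1)} = x^{(t+1)} + \theta\big(x^{(t+1)} - x^{(t)}\big)$
  if mod$(t+1, T\cdot n) = 0$ then
    $(\tau, \sigma, \theta) = $ SPDC-Adapt$\big(\{P^{(t-sn)}, D^{(t-sn)}\}_{s=0}^{T}\big)$
  end if
end for

Below we give a detailed discussion of the expected convergence rate established in Theorem 2.

The case of $\mu = 0$ but $\lambda > 0$. In this case, we have $\tau = \frac{1}{4R}\sqrt{\frac{\gamma}{n\lambda}}$ and $\sigma = \frac{1}{4R}\sqrt{\frac{n\lambda}{\gamma}}$, and

$$\theta_x = \frac{1}{1 + \tau\lambda} = 1 - \frac{1}{1 + 4R\sqrt{n/(\lambda\gamma)}}, \qquad \theta_y = \frac{1 + ((n-1)/n)\sigma\gamma/2}{1 + \sigma\gamma/2} = 1 - \frac{1}{n + 8R\sqrt{n/(\lambda\gamma)}}.$$

Hence $\theta = \theta_y$. These recover the parameters and convergence rate of the standard SPDC (Zhang & Xiao, 2015).

The case of $\mu > 0$ but $\lambda = 0$. In this case, we have $\tau = \frac{\sqrt{\gamma}}{4R\mu\sqrt{\delta}}$ and $\sigma = \frac{\mu\sqrt{\delta}}{4R\sqrt{\gamma}}$, and

$$\theta_x = 1 - \frac{\tau\sigma\delta\mu^2}{2n(\sigma + 4\delta)} = 1 - \frac{\delta\mu^2/(32nR^2)}{\mu\sqrt{\delta/\gamma}/(4R) + 4\delta},$$
$$\theta_y = 1 - \frac{1}{n + 8nR/(\mu\sqrt{\gamma\delta})} \approx 1 - \frac{\mu\sqrt{\gamma\delta}}{8nR}.$$

Since the objective is $R^2/\gamma$-smooth and $\delta\mu^2/n$-strongly convex, $\theta_y$ is an accelerated rate if $\sqrt{\gamma\delta}\,\mu \le 8R$; otherwise $\theta_y \approx 1 - 1/n$. For $\theta_x$, we consider different situations:
• If $\mu \ge R$, then we have $\theta_x \approx 1 - \frac{\sqrt{\gamma\delta}\,\mu}{nR}$, which is an accelerated rate. So is $\theta = \max\{\theta_x, \theta_y\}$.
• If $\mu < R$ and $\gamma\delta \approx \mu^2/R^2$, then $\theta_x \approx 1 - \frac{\sqrt{\gamma\delta}\,\mu}{nR}$, which represents an accelerated rate. The iteration complexity of SPDC is $\tilde{O}\big(\frac{nR}{\mu\sqrt{\gamma\delta}}\big)$, which is better than that of SVRG in this case, namely $\tilde{O}\big(\frac{nR^2}{\gamma\delta\mu^2}\big)$.
• If $\mu < R$ and $\gamma\delta \approx \mu/R$, then we get $\theta_x \approx 1 - \frac{\mu^2}{nR^2}$. This is a half-accelerated rate, because in this case SVRG would require $\tilde{O}\big(\frac{nR^3}{\mu^3}\big)$ iterations, while the iteration complexity here is $\tilde{O}\big(\frac{nR^2}{\mu^2}\big)$.
• If $\mu < R$ and $\gamma\delta \approx 1$, meaning the $\phi_i$'s are well conditioned, then we get $\theta_x \approx 1 - \frac{\gamma\delta\mu^2}{nR^2} \approx 1 - \frac{\mu^2}{nR^2}$, which is a non-accelerated rate. The corresponding iteration complexity is the same as that of SVRG.

3.1. Parameter adaptation for SPDC

The SPDC-Adapt procedure called in Algorithm 5 follows the same logic as the batch adaption schemes in Algorithms 3 and 4, and we omit the details here. One thing we emphasize is that the adaptation period $T$ is in terms of epochs, i.e., number of passes over the data. In addition, we only compute the primal and dual objective values after each pass or every few passes, because computing them exactly usually requires a full pass over the data.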
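Putting Algorithm 5 (with $T = \infty$) together with the parameters of Theorem 2 gives the following sketch for ridge regression. It assumes the squared loss $\phi_i(z) = (z - b_i)^2/2$, so $\gamma = \delta = 1$ and $\mathrm{prox}_{\sigma\phi_k^*}(w) = (w - \sigma b_k)/(1 + \sigma)$, and $g(x) = (\lambda/2)\|x\|^2$. It keeps $u = \frac{1}{n}\sum_i y_i a_i$ up to date so that each iteration costs $O(d)$; the function name and dense NumPy storage are illustrative choices, not the authors' implementation.

```python
import numpy as np

def spdc_ridge(A, b, lam, epochs, seed=0):
    """SPDC (Algorithm 5 with T = infinity) for phi_i(z) = (z - b_i)^2 / 2, g(x) = (lam/2)||x||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    gamma = delta = 1.0                                # squared loss: 1-strongly convex, 1-smooth
    R = np.max(np.linalg.norm(A, axis=1))
    mu_sq = np.linalg.eigvalsh(A.T @ A)[0]             # mu^2 = lambda_min(A^T A)
    s = n * lam + delta * mu_sq
    tau = np.sqrt(gamma / s) / (4 * R)
    sigma = np.sqrt(s / gamma) / (4 * R)
    theta_x = (1 - tau * sigma * delta * mu_sq / (2 * n * (sigma + 4 * delta))) / (1 + tau * lam)
    theta_y = (1 + (1 - 1 / n) * sigma * gamma / 2) / (1 + sigma * gamma / 2)
    theta = max(theta_x, theta_y)

    x = np.zeros(d)
    x_tilde = x.copy()
    y = np.zeros(n)
    u = A.T @ y / n                                    # u = (1/n) sum_i y_i a_i
    for _ in range(epochs * n):
        k = rng.integers(n)                            # sample one dual coordinate uniformly
        # prox of sigma * phi_k^*, with phi_k^*(beta) = beta^2/2 + b_k * beta
        y_k = (y[k] + sigma * (A[k] @ x_tilde) - sigma * b[k]) / (1 + sigma)
        dy = y_k - y[k]
        y[k] = y_k
        x_new = (x - tau * (u + dy * A[k])) / (1 + tau * lam)   # prox of tau * g
        u = u + dy * A[k] / n
        x_tilde = x_new + theta * (x_new - x)
        x = x_new
    return x, y
```

At the solution, $y_i^\star = a_i^T x^\star - b_i$ and $x^\star$ is the ridge estimator, which gives an easy correctness check.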
Another important issue is that, unlike the batch case where the duality gap usually decreases monotonically, the duality gap for randomized algorithms can fluctuate wildly. So instead of using only the two end values $P^{(t-T)} - D^{(t-T)}$ and $P^{(t)} - D^{(t)}$, we can use more points to estimate the convergence rate through a linear regression. Suppose the primal-dual values at the end of each of the past $T+1$ passes are $\{P(0), D(0)\}, \{P(1), D(1)\}, \ldots, \{P(T), D(T)\}$, and we need to estimate $\rho$ (rate per pass) such that

$$P(t) - D(t) \approx \rho^t\big(P(0) - D(0)\big), \qquad t = 1, \ldots, T.$$

We can turn this into a linear regression problem after taking logarithms, and obtain the estimate $\hat{\rho}$ through

$$\log(\hat{\rho}) = \frac{1}{1^2 + 2^2 + \cdots + T^2}\sum_{t=1}^T t\log\frac{P(t) - D(t)}{P(0) - D(0)}.$$

The rest of the adaption procedure can follow the robust scheme in Algorithm 4. In practice, we can compute the primal-dual values more sporadically, say every few passes, and modify the regression accordingly.

4. Dual-free primal-dual algorithms

Compared with primal algorithms, one major disadvantage of primal-dual algorithms is the requirement of computing the proximal mapping of the dual function $f^*$ or $\phi_i^*$, which may not admit a closed-form solution or efficient computation. This is especially the case for logistic regression, one of the most popular loss functions used in classification.

Lan & Zhou (2015) developed "dual-free" variants of primal-dual algorithms that avoid computing the dual proximal mapping. Their main technique is to replace the Euclidean distance in the dual proximal mapping with a Bregman divergence defined over the dual loss function itself.
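The regression estimate of $\hat{\rho}$ in Section 3.1 is a least-squares fit through the origin in log space. A minimal sketch, assuming the gaps $P(t) - D(t)$ are positive and stored in a list with the reference value $P(0) - D(0)$ first (the helper name is ours):

```python
import numpy as np

def estimate_rate(gaps):
    """Least-squares rho with gaps[t] ~ rho**t * gaps[0], t = 1..T (regression through the origin)."""
    gaps = np.asarray(gaps, dtype=float)
    t = np.arange(1, len(gaps))
    logs = np.log(gaps[1:] / gaps[0])
    return float(np.exp(np.dot(t, logs) / np.dot(t, t)))
```

The returned $\hat{\rho}$ can then be fed into the robust comparison of Algorithm 4 in place of the two-point ratio.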

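We show how to apply this approach to solve the structured ERM problems considered in this paper. The resulting algorithms can also exploit strong convexity from data if the algorithmic parameters are set appropriately or adapted automatically.

4.1. Dual-free BPD algorithm

First, we consider the batch setting. We replace the dual proximal mapping (computing $y^{(t+1)}$) in Algorithm 1 with

$$y^{(t+1)} = \arg\min_y \Big\{ f^*(y) - y^T A\tilde{x}^{(t)} + \frac{1}{\sigma}\mathcal{D}(y, y^{(t)}) \Big\}, \tag{11}$$

where $\mathcal{D}$ is the Bregman divergence of a strictly convex kernel function $h$, defined as

$$\mathcal{D}_h(y, y^{(t)}) = h(y) - h(y^{(t)}) - \langle\nabla h(y^{(t)}), y - y^{(t)}\rangle.$$

Algorithm 1 is obtained in the Euclidean setting with $h(y) = \frac{1}{2}\|y\|^2$ and $\mathcal{D}(y, y^{(t)}) = \frac{1}{2}\|y - y^{(t)}\|^2$. While our convergence results would apply for arbitrary Bregman divergences, we only focus on the case of using $f^*$ itself as the kernel, because this allows us to compute $y^{(t+1)}$ in (11) very efficiently. The following lemma explains the details (cf. Lan & Zhou, 2015, Lemma 1).

Lemma 1. Let the kernel $h \equiv f^*$ in the Bregman divergence $\mathcal{D}$. If we construct a sequence of vectors $\{v^{(t)}\}$ such that $v^{(0)} = \nabla f^*(y^{(0)})$ and for all $t \ge 0$,

$$v^{(t+1)} = \frac{v^{(t)} + \sigma A\tilde{x}^{(t)}}{1 + \sigma}, \tag{12}$$

then the solution to problem (11) is $y^{(t+1)} = \nabla f(v^{(t+1)})$.

Proof. Suppose $v^{(t)} = \nabla f^*(y^{(t)})$ (true for $t = 0$); then

$$\mathcal{D}(y, y^{(t)}) = f^*(y) - f^*(y^{(t)}) - v^{(t)T}(y - y^{(t)}).$$

The solution to (11) can be written as

$$\begin{aligned} y^{(t+1)} &= \arg\min_y \Big\{ f^*(y) - y^T A\tilde{x}^{(t)} + \frac{1}{\sigma}\big(f^*(y) - v^{(t)T}y\big) \Big\} \\ &= \arg\min_y \Big\{ \Big(1 + \frac{1}{\sigma}\Big)f^*(y) - \Big(A\tilde{x}^{(t)} + \frac{1}{\sigma}v^{(t)}\Big)^T y \Big\} \\ &= \arg\max_y \Big\{ \Big(\frac{v^{(t)} + \sigma A\tilde{x}^{(t)}}{1 + \sigma}\Big)^T y - f^*(y) \Big\} \\ &= \arg\max_y \big\{ v^{(t+1)T} y - f^*(y) \big\} = \nabla f(v^{(t+1)}), \end{aligned}$$

where in the last equality we used the property of the conjugate function when $f$ is strongly convex and smooth. Moreover,

$$v^{(t+1)} = (\nabla f)^{-1}(y^{(t+1)}) = \nabla f^*(y^{(t+1)}),$$

which completes the proof.

According to Lemma 1, we only need to provide initial points such that $v^{(0)} = \nabla f^*(y^{(0)})$ is easy to compute. We do not need to compute $\nabla f^*(y^{(t)})$ directly for any $t > 0$, because it can be updated as $v^{(t)}$ in (12). Consequently, we can update $y^{(t)}$ in the BPD algorithm using the gradient $\nabla f(v^{(t)})$, without the need of the dual proximal mapping. The resulting dual-free algorithm is given in Algorithm 6.

Algorithm 6 Dual-Free BPD Algorithm
input: parameters $\sigma, \tau, \theta > 0$, initial point $(x^{(0)}, y^{(0)})$
Set $\tilde{x}^{(0)} = x^{(0)}$ and $v^{(0)} = \nabla f^*(y^{(0)})$
for $t = 0, 1, 2, \ldots$ do
  $v^{(t+1)} = \dfrac{v^{(t)} + \sigma A\tilde{x}^{(t)}}{1 + \sigma}$, $\quad y^{(t+1)} = \nabla f(v^{(t+1)})$
  $x^{(t+1)} = \mathrm{prox}_{\tau g}\big(x^{(t)} - \tau A^T y^{(t+1)}\big)$
  $\tilde{x}^{(t+1)} = x^{(t+1)} + \theta\big(x^{(t+1)} - x^{(t)}\big)$
end for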
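Lemma 1 can be checked numerically on a case where everything is explicit. Assuming the ridge-type $f(z) = \|z - b\|^2/(2n)$, so that $\nabla f(v) = (v - b)/n$ and $\nabla f^*(y) = ny + b$, the sketch below performs update (12), sets $y = \nabla f(v)$, and lets one verify that the result satisfies the first-order optimality condition $\nabla f^*(y) - A\tilde{x} + \frac{1}{\sigma}(\nabla f^*(y) - v^{(t)}) = 0$ of (11). The helper names are hypothetical.

```python
import numpy as np

def grad_f(v, b, n):
    """Gradient of f(z) = ||z - b||^2 / (2n)."""
    return (v - b) / n

def grad_f_conj(y, b, n):
    """Gradient of f^*(y) = (n/2)||y||^2 + b^T y."""
    return n * y + b

def dual_free_step(v, Ax_tilde, sigma, b, n):
    """Update (12) for v, then y = grad f(v): no dual proximal mapping is evaluated."""
    v_new = (v + sigma * Ax_tilde) / (1.0 + sigma)
    return v_new, grad_f(v_new, b, n)
```

The check is exact here because $\nabla f^*$ and $\nabla f$ are inverse affine maps; for losses such as the logistic loss, only $\nabla f$ is needed in the iteration.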
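Lan & Zhou (2015) considered a general setting which does not possess the linear predictor structure we focus on in this paper, and assumed that only the regularization $g$ is strongly convex. Our following result shows that dual-free primal-dual algorithms can also exploit strong convexity from data with appropriate algorithmic parameters.

Theorem 3. Suppose Assumption 2 holds and let $(x^\star, y^\star)$ be the unique saddle point of $\mathcal{L}$ defined in (6). If we set the parameters in Algorithm 6 as

$$\tau = \frac{1}{2L}\sqrt{\frac{\gamma}{\lambda + \delta\mu^2}}, \qquad \sigma = \frac{1}{2L}\sqrt{\gamma(\lambda + \delta\mu^2)}, \tag{13}$$

and $\theta = \max\{\theta_x, \theta_y\}$, where

$$\theta_x = \Big(1 - \frac{\tau\sigma\delta\mu^2}{4 + 2\sigma}\Big)\frac{1}{1 + \tau\lambda}, \qquad \theta_y = \frac{1}{1 + \sigma/2}, \tag{14}$$

then we have

$$\Big(\frac{1}{2\tau} + \frac{\lambda}{2}\Big)\|x^{(t)} - x^\star\|^2 + \frac{1}{2}\mathcal{D}(y^\star, y^{(t)}) \le \theta^t C,$$
$$\mathcal{L}(x^{(t)}, y^\star) - \mathcal{L}(x^\star, y^{(t)}) \le \theta^t C,$$

where $C = \big(\frac{1}{2\tau} + \frac{\lambda}{2}\big)\|x^{(0)} - x^\star\|^2 + \big(\frac{1}{\sigma} + \frac{1}{2}\big)\mathcal{D}(y^\star, y^{(0)})$.

Theorem 3 is proved in Appendices B and D. Assuming $\gamma(\lambda + \delta\mu^2) \ll L^2$, we have

$$\theta_x \approx 1 - \frac{\gamma\delta\mu^2}{16L^2} - \frac{\lambda}{2L}\sqrt{\frac{\gamma}{\lambda + \delta\mu^2}}, \qquad \theta_y \approx 1 - \frac{\sqrt{\gamma(\lambda + \delta\mu^2)}}{4L}.$$

Again, we gain insight by considering special cases:
• If $\delta\mu^2 = 0$ and $\lambda > 0$, then $\theta_y \approx 1 - \frac{\sqrt{\gamma\lambda}}{4L}$ and $\theta_x \approx 1 - \frac{\sqrt{\gamma\lambda}}{2L}$. So $\theta = \max\{\theta_x, \theta_y\}$ is an accelerated rate.
• If $\delta\mu^2 > 0$ and $\lambda = 0$, then $\theta_y \approx 1 - \frac{\sqrt{\gamma\delta\mu^2}}{4L}$ and $\theta_x \approx 1 - \frac{\gamma\delta\mu^2}{16L^2}$. Thus $\theta = \max\{\theta_x, \theta_y\} \approx 1 - \frac{\gamma\delta\mu^2}{16L^2}$ is not accelerated. Notice that this conclusion does not depend on the relative size of $\gamma\delta$ and $\mu^2/L^2$, and this is the major difference from the Euclidean case discussed in Section 2.
• If both $\delta\mu^2 > 0$ and $\lambda > 0$, then the extent of acceleration depends on their relative size. If $\lambda$ is on the same order as $\delta\mu^2$ or larger, then an accelerated rate is obtained. If $\lambda$ is much smaller than $\delta\mu^2$, then the theory predicts no acceleration.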
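For the same ridge-type $f(z) = \|z - b\|^2/(2n)$, Algorithm 6 with the parameters (13) and (14) of Theorem 3 can be sketched as follows. Note that this $f$ is $(1/n)$-strongly convex and $(1/n)$-smooth, so $\delta = 1/n$ and $\gamma = n$; the choice of problem, the function name, and the zero initialization are assumptions of this illustration, not the authors' code.

```python
import numpy as np

def dual_free_bpd_ridge(A, b, lam, iters):
    """Algorithm 6 with parameters (13) and (14), for f(z) = ||z - b||^2/(2n), g(x) = (lam/2)||x||^2."""
    n, d = A.shape
    gamma, delta = float(n), 1.0 / n                 # f is (1/n)-strongly convex and (1/n)-smooth
    L = np.linalg.norm(A, 2)                         # spectral norm ||A||
    mu_sq = np.linalg.eigvalsh(A.T @ A)[0]           # mu^2 = lambda_min(A^T A)
    s = lam + delta * mu_sq
    tau = np.sqrt(gamma / s) / (2 * L)
    sigma = np.sqrt(gamma * s) / (2 * L)
    theta_x = (1 - tau * sigma * delta * mu_sq / (4 + 2 * sigma)) / (1 + tau * lam)
    theta_y = 1 / (1 + sigma / 2)
    theta = max(theta_x, theta_y)

    x = np.zeros(d)
    x_tilde = x.copy()
    y = np.zeros(n)
    v = n * y + b                                    # v^(0) = grad f^*(y^(0))
    for _ in range(iters):
        v = (v + sigma * (A @ x_tilde)) / (1 + sigma)    # update (12)
        y = (v - b) / n                                  # y = grad f(v): no dual prox needed
        x_new = (x - tau * (A.T @ y)) / (1 + tau * lam)  # prox of tau * g
        x_tilde = x_new + theta * (x_new - x)
        x = x_new
    return x, y
```

For a quadratic $f$ this loop coincides with Algorithm 1 run with dual step $\sigma/n$, which is a convenient way to cross-check the two sketches.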

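4.2. Dual-free SPDC algorithm

The same approach can be applied to derive a dual-free SPDC algorithm, which is described in Algorithm 7. It also includes a parameter adaption procedure, so we call it the adaptive dual-free SPDC (ADF-SPDC) algorithm. On related work, Shalev-Shwartz & Zhang (2016) and Shalev-Shwartz (2016) introduced dual-free SDCA.

Algorithm 7 Adaptive Dual-Free SPDC (ADF-SPDC)
input: parameters $\sigma, \tau, \theta > 0$, initial point $(x^{(0)}, y^{(0)})$, and adaptation period $T$.
Set $\tilde{x}^{(0)} = x^{(0)}$, $u^{(0)} = \frac{1}{n}\sum_{i=1}^n y_i^{(0)} a_i$, and $v_i^{(0)} = (\phi_i^*)'(y_i^{(0)})$ for $i = 1, \ldots, n$
for $t = 0, 1, 2, \ldots$ do
  pick $k \in \{1, \ldots, n\}$ uniformly at random
  for $i \in \{1, \ldots, n\}$ do
    if $i == k$ then
      $v_k^{(t+1)} = \dfrac{v_k^{(t)} + \sigma a_k^T\tilde{x}^{(t)}}{1 + \sigma}$, $\quad y_k^{(t+1)} = \phi_k'(v_k^{(t+1)})$
    else
      $v_i^{(t+1)} = v_i^{(t)}$, $\quad y_i^{(t+1)} = y_i^{(t)}$
    end if
  end for
  $x^{(t+1)} = \mathrm{prox}_{\tau g}\big(x^{(t)} - \tau\big(u^{(t)} + (y_k^{(t+1)} - y_k^{(t)})a_k\big)\big)$
  $u^{(t+1)} = u^{(t)} + \frac{1}{n}\big(y_k^{(t+1)} - y_k^{(t)}\big)a_k$
  $\tilde{x}^{(t+1)} = x^{(t+1)} + \theta\big(x^{(t+1)} - x^{(t)}\big)$
  if mod$(t+1, T\cdot n) = 0$ then
    $(\tau, \sigma, \theta) = $ SPDC-Adapt$\big(\{P^{(t-sn)}, D^{(t-sn)}\}_{s=0}^{T}\big)$
  end if
end for

The following theorem characterizes the choice of algorithmic parameters that can exploit strong convexity from data to achieve linear convergence (proof given in Appendix F).

Theorem 4. Suppose Assumption 1 holds. Let $(x^\star, y^\star)$ be the saddle point of $\mathcal{L}$ defined in (3) and $R = \max\{\|a_1\|, \ldots, \|a_n\|\}$. If we set $T = \infty$ in Algorithm 7 (no adaption) and let

$$\sigma = \frac{1}{4R}\sqrt{\gamma(n\lambda + \delta\mu^2)}, \qquad \tau = \frac{1}{4R}\sqrt{\frac{\gamma}{n\lambda + \delta\mu^2}},$$

and $\theta = \max\{\theta_x, \theta_y\}$, where

$$\theta_x = \Big(1 - \frac{\tau\sigma\delta\mu^2}{2n(\sigma + 4)}\Big)\frac{1}{1 + \tau\lambda}, \tag{15}$$
$$\theta_y = \frac{1 + \big((n-1)/n\big)\sigma/2}{1 + \sigma/2}, \tag{16}$$

then we have

$$\Big(\frac{1}{2\tau} + \frac{\lambda}{2}\Big)\mathbb{E}\big[\|x^{(t)} - x^\star\|^2\big] + \frac{1}{4n}\mathbb{E}\big[\mathcal{D}(y^\star, y^{(t)})\big] \le \theta^t C,$$
$$\mathbb{E}\big[\mathcal{L}(x^{(t)}, y^\star) - \mathcal{L}(x^\star, y^{(t)})\big] \le \theta^t C,$$

where $C = \big(\frac{1}{2\tau} + \frac{\lambda}{2}\big)\|x^{(0)} - x^\star\|^2 + \frac{1}{n}\big(\frac{1}{\sigma} + \frac{1}{2}\big)\mathcal{D}(y^\star, y^{(0)})$.

Below we discuss the expected convergence rate established in Theorem 4 in two special cases.

The case of $\mu = 0$ but $\lambda > 0$. In this case, we have $\tau = \frac{1}{4R}\sqrt{\frac{\gamma}{n\lambda}}$ and $\sigma = \frac{1}{4R}\sqrt{n\gamma\lambda}$, and

$$\theta_x = \frac{1}{1 + \tau\lambda} = 1 - \frac{1}{1 + 4R\sqrt{n/(\lambda\gamma)}}, \qquad \theta_y = \frac{1 + ((n-1)/n)\sigma/2}{1 + \sigma/2} = 1 - \frac{1}{n + 8R\sqrt{n/(\lambda\gamma)}}.$$

These recover the convergence rate of the standard SPDC algorithm (Zhang & Xiao, 2015).

The case of $\mu > 0$ but $\lambda = 0$. In this case, we have $\tau = \frac{\sqrt{\gamma}}{4R\mu\sqrt{\delta}}$ and $\sigma = \frac{\mu\sqrt{\gamma\delta}}{4R}$, and

$$\theta_x = 1 - \frac{\tau\sigma\delta\mu^2}{2n(\sigma + 4)} = 1 - \frac{\gamma\delta\mu^2/(32nR^2)}{\sqrt{\gamma\delta}\,\mu/(4R) + 4}, \qquad \theta_y = \frac{1 + ((n-1)/n)\sigma/2}{1 + \sigma/2} = 1 - \frac{1}{n + 8nR/(\mu\sqrt{\gamma\delta})}.$$

We note that the primal function now is $R^2/\gamma$-smooth and $\delta\mu^2/n$-strongly convex. We discuss the following cases:
• If $\sqrt{\gamma\delta}\,\mu > R$, then we have $\theta_x \approx 1 - \frac{\sqrt{\gamma\delta}\,\mu}{8nR}$ and $\theta_y \approx 1 - \frac{1}{n}$. Therefore $\theta = \max\{\theta_x, \theta_y\} \approx 1 - \frac{1}{n}$.
• Otherwise, we have $\theta_x \approx 1 - \frac{\gamma\delta\mu^2}{64nR^2}$, and $\theta_y$ is of the same order. This is not an accelerated rate, and we have the same iteration complexity as SVRG.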
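The point of Algorithm 7 is that the dual coordinate update needs only $\phi_k'$. For the logistic loss $\phi(z) = \log(1 + e^{-bz})$, one dual step can be sketched as follows. The inverse map $(\phi^*)'(y) = -b\log\big(s/(1-s)\big)$ with $s = -by \in (0, 1)$ is our own derivation from the logistic conjugate and is used only to check the invariant $v_k = (\phi_k^*)'(y_k)$; the pair $y = -b/2$, $v = 0$ satisfies that invariant and is a convenient starting point. Function names are hypothetical.

```python
import math

def phi_prime(v, b):
    """Derivative of the logistic loss log(1 + exp(-b v)); the only loss oracle ADF-SPDC needs."""
    return -b / (1.0 + math.exp(b * v))

def phi_conj_prime(y, b):
    """Derivative of the conjugate phi^*, valid for s = -b*y in (0, 1)."""
    s = -b * y
    return -b * math.log(s / (1.0 - s))

def adf_dual_step(v_k, a_dot_x_tilde, sigma, b_k):
    """One dual-coordinate update of Algorithm 7: averaging step on v_k, then y_k = phi_k'(v_k)."""
    v_new = (v_k + sigma * a_dot_x_tilde) / (1.0 + sigma)
    return v_new, phi_prime(v_new, b_k)
```

Since $y_k = \phi_k'(v_k)$ always lies strictly inside the domain of $\phi_k^*$, no projection or Newton solve is required, in contrast to the proximal step of standard SPDC.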
Finally, we give concrete examples of how to compute the initial points $y^{(0)}$ and $v^{(0)}$ such that $v_i^{(0)} = (\phi_i^*)'(y_i^{(0)})$.
• For squared loss, $\phi_i(\alpha) = \frac{1}{2}(\alpha - b_i)^2$ and $\phi_i^*(\beta) = \frac{1}{2}\beta^2 + b_i\beta$. So $v_i^{(0)} = (\phi_i^*)'(y_i^{(0)}) = y_i^{(0)} + b_i$.
• For logistic regression, we have $b_i \in \{1, -1\}$ and $\phi_i(\alpha) = \log(1 + e^{-b_i\alpha})$. The conjugate function is $\phi_i^*(\beta) = (-b_i\beta)\log(-b_i\beta) + (1 + b_i\beta)\log(1 + b_i\beta)$ if $b_i\beta \in [-1, 0]$ and $+\infty$ otherwise. We can choose $y_i^{(0)} = -\frac{1}{2}b_i$ and $v_i^{(0)} = 0$ such that $v_i^{(0)} = (\phi_i^*)'(y_i^{(0)})$.

For logistic regression, we have $\delta = 0$ over the full domain of $\phi_i$. However, each $\phi_i$ is locally strongly convex in a bounded domain (Bach, 2014): if $z \in [-B, B]$, then we know $\delta = \min_z \phi_i''(z) \ge \exp(-B)/4$. Therefore it is well suited for an adaptation scheme similar to Algorithm 4 that does not require knowledge of either $\delta$ or $\mu$.

5. Preliminary experiments

We present preliminary experiments to demonstrate the effectiveness of our proposed algorithms. First, we consider batch primal-dual algorithms for ridge regression over a synthetic dataset. The data matrix $A$ has sizes $n = 5000$ and $d = 3000$, and its entries are sampled from a multivariate normal distribution with mean zero and covariance matrix $\Sigma_{ij} = 2^{-|i-j|/2}$. We normalize all datasets such that $a_i = a_i/(\max_j\|a_j\|)$, to ensure the maximum norm of the data points is 1. We use $\ell_2$-regularization $g(x) = (\lambda/2)\|x\|^2$ with three choices of the parameter $\lambda$: $1/n$, $10^{-2}/n$ and $10^{-4}/n$, which represent the strong, medium, and weak levels of regularization, respectively.

Figure 1 shows the performance of four different algorithms: the accelerated gradient algorithm for solving the primal minimization problem (Primal AG) (Nesterov, 2004) using $\lambda$ as the strong convexity parameter; the BPD algorithm (Algorithm 1) that uses $\lambda$ as the strong convexity parameter (setting $\mu = 0$); the optimal BPD algorithm (Opt-BPD) that uses $\mu = \sqrt{\lambda_{\min}(A^T A)}$ explicitly computed from data; and the Ada-BPD algorithm (Algorithm 2) with the robust adaptation heuristic (Algorithm 4) with $T = 10$, $c = 0.95$ and $\bar{c} = 1.5$. As expected, the performance of Primal AG is very similar to BPD with the same strong convexity parameter. Opt-BPD fully exploits strong convexity from data, and thus has the fastest convergence. The Ada-BPD algorithm can partially exploit strong convexity from data without knowledge of $\mu$.

[Figure 1. Comparison of batch primal-dual algorithms for a ridge regression problem with n = 5000 and d = 3000. Panels: synthetic data with λ = 1/n, 10^-2/n and 10^-4/n; vertical axis: primal optimality gap; methods: Primal AG, BPD, Opt-BPD, Ada-BPD.]

Next, we compare DF-SPDC (Algorithm 7 without adaption) and ADF-SPDC (Algorithm 7 with adaption) against several state-of-the-art randomized algorithms for ERM: SVRG (Johnson & Zhang, 2013), SAGA (Defazio et al., 2014), Katyusha (Allen-Zhu, 2016) and the standard SPDC method (Zhang & Xiao, 2015). For SVRG and Katyusha (an accelerated variant of SVRG), we choose the variance reduction period as $m = 2n$. The step sizes of all algorithms are set as suggested in their original papers. For Ada-SPDC and ADF-SPDC, we use the robust adaptation scheme with $T = 10$, $c = 0.95$ and $\bar{c} = 1.5$.

We first compare these randomized algorithms for ridge regression over the same synthetic data described above and the cpuact data from the LibSVM website¹. The results are shown in Figure 2. With relatively strong regularization $\lambda = 1/n$, all methods perform similarly, as predicted by theory.
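The synthetic data described above can be generated along the following lines. This is a scaled-down sketch: the defaults `base=2.0, scale=2.0` encode $\Sigma_{ij} = 2^{-|i-j|/2}$ (use `scale=100.0` for the logistic-regression dataset), the function name and seed are ours, and since the text does not specify how the responses $b$ were drawn, none are generated here.

```python
import numpy as np

def synthetic_data(n, d, base=2.0, scale=2.0, seed=0):
    """Rows a_i ~ N(0, Sigma) with Sigma_ij = base**(-|i-j|/scale), rescaled so max_i ||a_i|| = 1."""
    rng = np.random.default_rng(seed)
    idx = np.arange(d)
    Sigma = base ** (-np.abs(idx[:, None] - idx[None, :]) / scale)
    A = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    A /= np.max(np.linalg.norm(A, axis=1))          # normalization a_i = a_i / max_j ||a_j||
    return A
```

After this normalization $R = 1$, so the regularization levels $\lambda = 1/n$, $10^{-2}/n$, $10^{-4}/n$ translate directly into condition numbers $R^2/(\gamma\lambda)$ of $n$, $100n$ and $10^4 n$ for the squared loss.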
For the synthetic dataset with $\lambda = 10^{-2}/n$, the regularization is weaker but still stronger than the hidden strong convexity from data, so the accelerated algorithms (all variants of SPDC and Katyusha) perform better than SVRG and SAGA. With $\lambda = 10^{-4}/n$, the strong convexity from data appears to dominate the regularization. Since the non-accelerated algorithms (SVRG and SAGA) may automatically exploit strong convexity from data, they become faster than the non-adaptive accelerated methods (Katyusha, SPDC and DF-SPDC). The adaptive accelerated method, ADF-SPDC, has the fastest convergence. This shows that our theoretical results (which predict no acceleration in this case) can be further improved.

Finally, we compare these randomized algorithms for logistic regression on the rcv1 dataset (from the LibSVM website) and another synthetic dataset with $n = 5000$ and $d = 500$, generated similarly as before but with covariance matrix $\Sigma_{ij} = 2^{-|i-j|/100}$. For the standard SPDC, we solve the dual proximal mapping using a few steps of Newton's method to high precision. The dual-free SPDC algorithms only use gradients of the logistic function. The results are presented in Figure 3. For both datasets, the strong convexity from data is very weak (or none), so the accelerated algorithms perform better.

6. Conclusions

We have shown that primal-dual first-order algorithms are capable of exploiting strong convexity from data, if the algorithmic parameters are chosen appropriately. While these parameters may depend on problem-dependent constants that are unknown, we developed heuristics for adapting them on the fly and obtained improved performance in experiments. It appears that our theoretical characterization of the convergence rates can be further improved, as our experiments often demonstrate significant acceleration in cases where our theory does not predict acceleration.

¹ …cjlin/libsvm/

[Figure 2. Comparison of randomized algorithms for ridge regression problems. Panels: synthetic and cpuact data with λ = 1/n, 10^-2/n and 10^-4/n; vertical axis: primal optimality gap; methods: SVRG, SAGA, Katyusha, SPDC, DF-SPDC, ADF-SPDC.]

[Figure 3. Comparison of randomized algorithms for logistic regression problems. Panels: synthetic and rcv1 data with λ = 1/n, 10^-2/n and 10^-4/n; vertical axis: primal optimality gap; methods: SVRG, SAGA, Katyusha, SPDC, DF-SPDC, ADF-SPDC.]

References

Allen-Zhu, Zeyuan. Katyusha: Accelerated variance reduction for faster SGD. ArXiv e-print, 2016.
Bach, Francis. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression. Journal of Machine Learning Research, 15, 2014.
Balamurugan, Palaniappan and Bach, Francis. Stochastic variance reduction methods for saddle-point problems. In Advances in Neural Information Processing Systems (NIPS) 29, 2016.
Bertsekas, Dimitri P. Incremental gradient, subgradient, and proximal methods for convex optimization: A survey. In Sra, Suvrit, Nowozin, Sebastian, and Wright, Stephen J. (eds.), Optimization for Machine Learning, chapter 4. MIT Press, 2011.
Chambolle, Antonin and Pock, Thomas. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1), 2011.
Chambolle, Antonin and Pock, Thomas. On the ergodic convergence rates of a first-order primal-dual algorithm. Mathematical Programming, Series A, 159, 2016.
Defazio, Aaron, Bach, Francis, and Lacoste-Julien, Simon. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems, 2014.
Deng, Wei and Yin, Wotao. On the global and linear convergence of the generalized alternating direction method of multipliers. Journal of Scientific Computing, 66(3), 2016.
Fercoq, Olivier and Richtárik, Peter. Accelerated, parallel, and proximal coordinate descent. SIAM Journal on Optimization, 25(4), 2015.
Goldstein, Tom, Li, Min, Yuan, Xiaoming, Esser, Ernie, and Baraniuk, Richard. Adaptive primal-dual hybrid gradient methods for saddle-point problems. arXiv preprint, 2013.
Johnson, Rie and Zhang, Tong. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, 2013.
Lan, Guanghui and Zhou, Yi. An optimal randomized incremental gradient method. arXiv preprint, 2015.
Lin, Hongzhou, Mairal, Julien, and Harchaoui, Zaid. A universal catalyst for first-order optimization. In Advances in Neural Information Processing Systems, 2015a.
Lin, Qihang, Lu, Zhaosong, and Xiao, Lin. An accelerated randomized proximal coordinate gradient method and its application to regularized empirical risk minimization. SIAM Journal on Optimization, 25(4), 2015b.
Malitsky, Yura and Pock, Thomas. A first-order primal-dual algorithm with linesearch. arXiv preprint, 2016.
Nedić, Angelia and Bertsekas, Dimitri P. Incremental subgradient methods for nondifferentiable optimization. SIAM Journal on Optimization, 12(1), 2001.
Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston, 2004.
Nesterov, Yu. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2), 2012.
Richtárik, Peter and Takáč, Martin. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 144(1-2), 2014.
Roux, Nicolas L., Schmidt, Mark, and Bach, Francis. A stochastic gradient method with an exponential convergence rate for finite training sets. In Advances in Neural Information Processing Systems, 2012.
Shalev-Shwartz, Shai. SDCA without duality, regularization, and individual convexity. In Proceedings of the 33rd International Conference on Machine Learning, 2016.
Shalev-Shwartz, Shai and Zhang, Tong. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 14(Feb), 2013.
Shalev-Shwartz, Shai and Zhang, Tong. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. Mathematical Programming, 155(1-2), 2016.
Xiao, Lin and Zhang, Tong. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4), 2014.
Zhang, Yuchen and Xiao, Lin. Stochastic primal-dual coordinate method for regularized empirical risk minimization. In Proceedings of the 32nd International Conference on Machine Learning, 2015.

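In the following appendices, we provide detailed proofs of the theorems stated in the main paper. In Section A we first prove a basic inequality which is useful throughout the rest of the convergence analysis. Section B contains the general analysis of the batch primal-dual algorithm that is common to the proofs of both Theorem 1 and Theorem 3. Sections C, D, E and F give proofs for Theorem 1, Theorem 3, Theorem 2 and Theorem 4, respectively.

A. A basic lemma

Lemma 2. Let $h$ be a strictly convex function and $\mathcal{D}_h$ be its Bregman divergence. Suppose $\psi$ is $\nu$-strongly convex with respect to $\mathcal{D}_h$ and $1/\delta$-smooth (with respect to the Euclidean norm), and

$$\hat{y} = \arg\min_{y\in C}\big\{\psi(y) + \eta\mathcal{D}_h(y, \bar{y})\big\},$$

where $C$ is a compact convex set that lies within the relative interior of the domains of $h$ and $\psi$ (i.e., both $h$ and $\psi$ are differentiable over $C$). Then for any $y \in C$ and $\rho \in [0, 1]$, we have

$$\psi(y) + \eta\mathcal{D}_h(y, \bar{y}) \ge \psi(\hat{y}) + \eta\mathcal{D}_h(\hat{y}, \bar{y}) + \big(\eta + (1-\rho)\nu\big)\mathcal{D}_h(y, \hat{y}) + \frac{\rho\delta}{2}\|\nabla\psi(y) - \nabla\psi(\hat{y})\|^2.$$

Proof. The minimizer $\hat{y}$ satisfies the following first-order optimality condition:

$$\langle\nabla\psi(\hat{y}) + \eta\nabla\mathcal{D}_h(\hat{y}, \bar{y}), y - \hat{y}\rangle \ge 0, \qquad \forall y \in C.$$

Here $\nabla\mathcal{D}$ denotes the partial gradient of the Bregman divergence with respect to its first argument, i.e., $\nabla\mathcal{D}(\hat{y}, \bar{y}) = \nabla h(\hat{y}) - \nabla h(\bar{y})$. So the above optimality condition is the same as

$$\big\langle\nabla\psi(\hat{y}) + \eta\big(\nabla h(\hat{y}) - \nabla h(\bar{y})\big), y - \hat{y}\big\rangle \ge 0, \qquad \forall y \in C. \tag{17}$$

Since $\psi$ is $\nu$-strongly convex with respect to $\mathcal{D}_h$ and $1/\delta$-smooth, we have

$$\psi(y) \ge \psi(\hat{y}) + \langle\nabla\psi(\hat{y}), y - \hat{y}\rangle + \nu\mathcal{D}_h(y, \hat{y}),$$
$$\psi(y) \ge \psi(\hat{y}) + \langle\nabla\psi(\hat{y}), y - \hat{y}\rangle + \frac{\delta}{2}\|\nabla\psi(y) - \nabla\psi(\hat{y})\|^2.$$

For the second inequality, see, e.g., Theorem 2.1.5 in Nesterov (2004). Multiplying the two inequalities above by $(1-\rho)$ and $\rho$ respectively and adding them together, we have

$$\psi(y) \ge \psi(\hat{y}) + \langle\nabla\psi(\hat{y}), y - \hat{y}\rangle + (1-\rho)\nu\mathcal{D}_h(y, \hat{y}) + \frac{\rho\delta}{2}\|\nabla\psi(y) - \nabla\psi(\hat{y})\|^2.$$

The Bregman divergence $\mathcal{D}_h$ satisfies the following equality:

$$\mathcal{D}_h(y, \bar{y}) = \mathcal{D}_h(y, \hat{y}) + \mathcal{D}_h(\hat{y}, \bar{y}) + \langle\nabla h(\hat{y}) - \nabla h(\bar{y}), y - \hat{y}\rangle.$$

We multiply this equality by $\eta$ and add it to the last inequality to obtain

$$\psi(y) + \eta\mathcal{D}_h(y, \bar{y}) \ge \psi(\hat{y}) + \eta\mathcal{D}_h(\hat{y}, \bar{y}) + \big(\eta + (1-\rho)\nu\big)\mathcal{D}_h(y, \hat{y}) + \frac{\rho\delta}{2}\|\nabla\psi(y) - \nabla\psi(\hat{y})\|^2 + \big\langle\nabla\psi(\hat{y}) + \eta\big(\nabla h(\hat{y}) - \nabla h(\bar{y})\big), y - \hat{y}\big\rangle.$$

Using the optimality condition (17), the last inner product term is nonnegative and thus can be dropped, which gives the desired inequality.

B. Common Analysis of Batch Primal-Dual Algorithms

We consider the general primal-dual update rule as: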

12 Iterato: ˆxŷ = PD τ xȳ xỹ Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms ˆx = arg m x R d ŷ = arg m y R { gxỹ T Ax τ Each terato of Algorthm s equvalet to the followg specfcato ofpd τ : } x x 8 {f y y T A x Dyȳ }. 9 ˆx = x t x = x t x = x t θx t x t ŷ = y t ȳ = y t ỹ = y t. 0 Besdes Assumpto we also assume that f sν-strogly covex wth respect to a kerel fuctoh.e. whered h s the Bregma dvergece defed as f y f y f yy y νd h y y D h y y = hy hy hyy y. We assume thaths -strogly covex ad/δ -smooth. Depedg o the kerel fuctoh ths assumpto of may mpose addtoal restrctos o f. I ths paper we are mostly terested two specal cases: hy = / y ad hy = f y for the latter we always have ν =. From ow o we wll omt the subscrpt h ad use D deote the Bregma dvergece. Uder the above assumptos ay solutox y to the saddle-pot problem 6 satsfes the optmalty codto: The optmalty codtos for the updates descrbed equatos 8 ad 9 are A T y gx Ax = f y. A T ỹ x ˆx gˆx 3 τ A x hŷ hȳ = f ŷ. 4 Applyg Lemma to the dual mmzato step 9 wth ψy = f y y T A x η = / y = y ad ρ = / we obta f y y T A x Dy ȳ f ŷ ŷ T A x Dŷȳ ν Dy ŷ δ f y f ŷ. 5 4 Smlarly for the prmal mmzato step 8 we have settgρ = 0 gx ỹ T Ax τ x x gˆxỹ T Aˆx τ ˆx x τ λ x ˆx. 6 Combg the two equaltes above wth the deftolxy = gxy T Ax f y we get Lˆxy Lx ŷ = gˆxy T Aˆx f y gx ŷ T Ax f ŷ τ x x Dy ȳ τ λ x ˆx ν Dy ŷ τ ˆx x Dŷȳ δ f y f ŷ 4 y T Aˆx ŷ T Ax ỹ T Ax ỹ T Aˆx y T A xŷ T A x.

13 Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms We ca smplfy the er product terms as y T Aˆx ŷ T Ax ỹ T Ax ỹ T Aˆx y T A xŷ T A x = ŷ ỹ T Aˆx x ŷ y T Aˆx x. Rearragg terms o the two sdes of the equalty we have τ x x Dy ȳ Lˆxy Lx ŷ τ λ x ˆx ν Dy ŷ τ ˆx x Dŷȳ δ f y f ŷ 4 ŷ y T Aˆx x ŷ ỹ T Aˆx x. Applyg the substtutos 0 yelds τ x x t Dy y t Lx t y Lx y t τ λ x x t ν Dy y t τ xt x t Dyt y t δ f y f y t 4 y t y T A x t x t θx t x t. 7 We ca rearrage the er product term 7 as y t y T A x t x t θx t x t = y t y T Ax t x t θy t y T Ax t x t θy t y t T Ax t x t. Usg the optmalty codtos ad 4 we ca also boud f y f y t : = f y f y t Ax A x t θx t x t hy t hy t α Ax x t α θax t x t hy t hy t whereα >. Wth the deftoµ = λ m A T A we also have Ax x t µ x x t. Combg them wth the equalty 7 leads to τ x x t Dy y t θy t y T Ax t x t Lx t y Lx y t τ λ x x t ν Dy y t y t y T Ax t x t τ xt x t Dyt y t θy t y t T Ax t x t δµ α 4 x x t α δ θax t x t hy 4 t hy t. 8 3

14 C. Proof of Theorem Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms Let the kerel fucto behy = / y. I ths case we havedy y = / y y ad hy = y. Moreover = δ = adν =. Therefore the equalty 8 becomes τ δµ x x t α y y t θy t y T Ax t x t Lx t y Lx y t τ λ x x t y y t y t y T Ax t x t τ xt x t yt y t θy t y t T Ax t x t α δ θax t x t 4 yt y t. 9 Next we derve aother form of the uderled tems above: yt y t θy t y t T Ax t x t = yt y t θ yt y t T Ax t x t = θax t x t yt y t θ Ax t x t θax t x t yt y t θ L x t x t where the last equalty we used A L ad hece Ax t x t L x t x t. Combg wth equalty 9 we have τ δµ x t x α yt y θy t y T Ax t x t θ L x t x t Lx t y Lx y t τ λ x t x y t y y t y T Ax t x t τ xt x t θax α δ t x t 4 yt y t. 30 We ca remove the last term the above equalty as log as ts coeffcet s oegatve.e. α δ 4 0. I order to maxmze /α we take the equalty ad solve for the largest value ofαallowed whch results α = δ α = δ. Applyg these values 30 gves τ δµ x t x δ yt y θy t y T Ax t x t θ L x t x t Lx t y Lx y t τ λ x t x y t y y t y T Ax t x t τ xt x t. 3 4

15 Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms We use t to deote the last row 3. Equvaletly we defe t = τ λ x x t y y t y t y T Ax t x t τ xt x t = τ λ x x t 4 y y t [ ] x t x t T [ y y t τ I ][ ] AT x t x t A y y t. The quadratc form the last term s oegatve f the matrx M = [ τ I AT A ] s postve semdefte for whch a suffcet codto sτ /L. Uder ths codto t τ λ x x t 4 y y t 0. 3 If we ca to chooseτ ad so that τ δµ δ θ τ λ θ θ L θ τ 33 the accordg to 3 we have t Lx t y Lx y t θ t. Because t 0 adlx t y Lx y t 0 for ayt 0 we have t θ t whch mples ad t θ t 0 Lx t y Lx y t θ t 0. Letθ x adθ y be two cotracto factors determed by the frst two equaltes 33.e. / θ x = τ δµ δ τ λ = θ y = / = /. τδµ δ τλ The we ca let θ = max{θ x θ y }. We ote that ayθ < would satsfy the last codto 33 provded that τ = L whch also makes the matrxm postve semdefte ad thus esures the equalty 3. Amog all possble parsτ that satsfy τ = /L we choose whch gve the desred results of Theorem. τ = L λδµ = λδµ 34 L 5

16 D. Proof of Theorem 3 If we chooseh = f the Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms h s-strogly covex ad/δ-smooth.e. = adδ = δ; f s-strogly covex wth respect toh.e.ν =. For coveece we repeat equalty 8 here: τ x x t Dy y t θy t y T Ax t x t Lx t y Lx y t τ λ x x t ν Dy y t y t y T Ax t x t τ xt x t Dyt y t θy t y t T Ax t x t δµ α 4 x x t α δ θax t x t hy 4 t hy t. 35 We frst boud the Bregma dvergece Dy t y t usg the assumpto that the kerel h s -strogly covex ad /δ-smooth. Usg smlar argumets as the proof of Lemma we have for ayρ [0] Dy t y t = hy t hy t hy t y t y t ρ yt y t ρ δ hy t hy t. 36 For ayβ > 0 we ca lower boud the er product term I addto we have θy t y t T Ax t x t β yt y t θ L β xt x t. θax t x t hy t hy t θ L x t x t hy t hy t. Combg these bouds wth 35 ad 36 wth ρ = / we arrve at τ δµ α θ L L β α δθ x x t Dy y t θy t y T Ax t x t x t x t Lx t y Lx y t τ λ x x t Dy y t y t y T Ax t x t 4 β δ y t y t 4 α δ hy t hy t τ xt x t. 37 We chooseαadβ 37 to zero out the coeffcets of y t y t ad hy t hy t : α = β =. 6

17 The the equalty 37 becomes τ δµ 4 θ L Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms x x t Dy y t θy t y T Ax t x t δθ L 4 x t x t Lx t y Lx y t τ λ x x t Dy y t y t y T Ax t x t τ xt x t. The coeffcet of x t x t ca be bouded as θ L δθ L 4 = 4 δ θ L = 4δ 4 θ L < θ L where the equalty we used δ. Therefore we have x τ δµ x t 4 Dy y t θy t y T Ax t x t θ L x t x t Lx t y Lx y t τ λ x x t Dy y t y t y T Ax t x t τ xt x t. We use t to deote the last row of the above equalty. Equvaletly we defe t = τ λ x x t Dy y t y t y T Ax t x t τ xt x t. Scehs-strogly covex we havedy y t y y t ad thus t = τ λ x x t Dy y t τ λ x x t Dy y t The quadratc form the last term s oegatve fτ /L. Uder ths codto t yt y y t y T Ax t x t τ xt x t [ ] x t x t T [ y y t τ I ][ ] AT x t x t A y y t. τ λ x x t Dy y t If we ca to chooseτ ad so that τ δµ 4 θ τ λ θ θ L θ τ 39 the we have t Lx t y Lx y t θ t. Because t 0 adlx t y Lx y t 0 for ayt 0 we have t θ t whch mples t θ t 0 7

18 Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms ad Lx t y Lx y t θ t 0. To satsfy the last codto 39 ad also esure the equalty 38 t suffces to have τ 4L. We choose τ = L λδµ = λδµ. L Wth the above choce ad assumgλδµ L we have θ y = For the cotracto factor over the prmal varables we have = / = λδµ /4L λδµ. 4L θ x = τ δµ 4 τδµ 4 δµ 44L τ λ = τλ = τλ δµ 6L λ L λδµ. Ths fshes the proof of Theorem 3. E. Proof of Theorem We cosder the SPDC algorthm the Eucldea case wthhx = / x. The correspodg batch case aalyss s gve Secto C. For each=... let ỹ be ỹ = argm y Based o the frst-order optmalty codto we have Also sce y mmzesφ y y a x we have By Lemma wth ρ = / we have y a x t φ y ad re-arragg terms we get { φ y } y yt y a x t. a x t ỹ y t φ ỹ. yt y y t y ỹ y a x φ y. ỹ y φ ỹ ỹ a x t ỹ y t δ 4 φ ỹ φ y ỹ y t ỹ y a x t φ ỹ φ y δ 4 φ ỹ φ y. 40 8

19 Notce that Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms E[y t ] = ỹ y t E[y t y ] = ỹ y y t ] = ỹ y t E[y t yt y E[φ y t ] = φ ỹ φ y t. Plug the above relatos to 40 ad dvde both sdes by we have y t y 4 4 ad summg over =... we get 4 where u t = y t y = E[y t y ] E[y t y t ] yt y E[φ y t δ 4 y t a u t = E[yt ] φ y t φ y t a x t x ỹ y t a x t y t ] φ y E[ y t y ] E[ yt y t ] 4 φ k yt k φ k yt k = u t u t u t u x t δ 4 Ax x t ỹ yt = y t a ad u = O the other had scex t mmzes the τ λ-strogly covex objectve gx u t u t u t x x xt τ we ca apply Lemma wth ρ = 0 to obta gx u t u t u t x xt x gx t u t u t u t x t xt x t ad re-arragg terms we get x t x τ τ λ τ τ φ yt φ y y a. = τ λ x t x E[ x t x ] E[ xt x t ] E[gx t gx ] τ E[ u t u t u t x t x ]. 9

20 Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms Also otce that Lx t y Lx y Lx y Lx y t Lx y Lx y t = φ yt φ y φ k yt k φ k yt k gxt gx = u x t u t x u t u t x. Combg everythg together we have x t x τ 4 τ λ E[ x t x ] y t y Lx y Lx y t E[ y t y ] E[ xt x t ] E[ yt y t ] 4 τ E[Lx t y Lx y Lx y Lx y t ] E[ u t u u t u t x t x t ] δ 4 Ax x t ỹ yt. Next we otce that δ 4 Ax x t E[yt ] y t for someα > ad Ax x t µ x x t ad θaxt x t ỹ yt = δ 4 Ax x t θax t x t ỹ yt δ Ax x t α 4 α δ 4 θaxt x t ỹ yt θ Ax t x t ỹ yt θ L x t x t E[ yt y t ]. We follow the same reasog as the stadard SPDC aalyss u t u u t u t x t x t = yt y T Ax t x t y t y t T Ax t x t θy t y t T Ax t x t ad usg Cauchy-Schwartz equalty we have ad y t y t T Ax t x t yt y t T A /τ yt y t /τr y t y t T Ax t x t yt y t T A /τ yt y t /τr. θyt y T Ax t x t xt x t 8τ xt x t 8τ 0

21 Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms Thus we get u t u u t u t x t x t yt y T Ax t x t yt y t /4τR xt x t 8τ Puttg everythg together we have τ /αδµ x t x 4 Lx y Lx y t θ τ λ E[ x t x ] 4 θ xt x t. 8τ 4 8τ α θδl E[Lx t y Lx y Lx y Lx y t ] τ E[ x t x t ] 8τ 4R τ α δ E[ y t y t ]. θyt y T Ax t x t y t y θlx t y Lx y x t x t θyt y T Ax t x t E[ y t y ] E[yt y T Ax t x t ] If we choose the parameters as α = τ = 4δ 6R the we kow 4R τ α δ = 4 8 > 0 ad α θδl L 8 R 8 56τ thus 8τ α θδl 3 8τ. I addto we have α = 4δ. Fally we obta τ δµ x t x 4 4δ y t y θlx t y Lx y 4 Lx y Lx y t 3 θ 8τ xt x t θyt y T Ax t x t τ λ E[ x t x ] E[ y t y ] E[yt y T Ax t x t ] 4 E[Lx t y Lx y Lx y Lx y t ] 3 8τ E[ xt x t ]. Now we ca defe θ x ad θ y as the ratos betwee the coeffcets the x-dstace ad y-dstace terms ad let θ = max{θ x θ y } as before. Choosg the step-sze parameters as λδµ gves the desred result. τ = 4R λδµ = 4R

22 F. Proof of Theorem 4 Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms I ths settg for-th coordate of the dual varablesy we chooseh = φ let ad defe For =... let ỹ be D y y = φ y φ y φ y y y ỹ = argm y Dyy = Based o the frst-order optmalty codto we have Also scey mmzesφ y y a x we have D y y. = { } φ y D yy t y a x t. a x t φ ỹ φ y t φ ỹ. a x φ y. Usg Lemma wthρ = / we obta y a x t φ y D y yt D y ỹ φ ỹ ỹ a x t ad rearragg terms we get D y yt D ỹ y t δ 4 φ ỹ φ y D y ỹ D ỹ y t ỹ y a x t φ ỹ φ y δ 4 φ ỹ φ y. 4 Wth..d. radom samplg at each terato we have the followg relatos: E[y t ] = ỹ y t E[D y t y ] =D ỹ y Dy t y E[D y t y t ] = D ỹ y t E[φ yt ] = φ ỹ φ yt. Pluggg the above relatos to 4 ad dvdg both sdes by we have D y t y D y t y E[D y t E[y t y t ] yt y y t ] a x t E[φ y t ] φ y t φ y t φ y δ a x t x φ ỹ φ y t 4

23 ad summg over =... we get Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms Dy t y E[Dy t y ] E[Dyt y t ] φ ky t k φ ky t k = u t u t u t u x t δ 4 Ax x t φ whereφ y t s a -dmesoal vector such that the-th coordate s φ y t φ y ỹ φ y t ad u t = = y t a u t = [φ y t ] = φ y t = y t a ad u = y a. = O the other had scex t mmzes a τ λ-strogly covex objectve gx u t u t u t x x xt τ we ca apply Lemma wth ρ = 0 to obta gx u t u t u t x xt x gx t u t u t u t x t xt x t ad rearragg terms we get Notce that x t x τ τ λ τ τ τ λ x t x E[ x t x ] E[ xt x t ] E[gx t gx ] τ E[ u t u t u t x t x ]. Lx t y Lx y Lx y Lx y t Lx y Lx y t = φ yt φ y φ k yt k φ k yt k gxt gx = u x t u t x u t u t x so x t x τ τ λ E[ x t x ] Dy t y Lx y Lx y t E[Dy t y ] E[ xt x t ] E[Dyt y t ] τ E[Lx t y Lx y Lx y Lx y t ] E[ u t u u t u t x t x t ] δ 4 Ax x t φ ỹ φ y t. 3

24 Next we have δ 4 Ax x t φ for ayα > ad ad θaxt x t φ Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms ỹ φ y t ỹ φ y t = δ 4 Ax x t θax t x t φ δ Ax x t α 4 α δ 4 θaxt x t φ Ax x t µ x x t Followg the same reasog as the stadard SPDC aalyss we have ỹ φ y t ỹ φ y t θ Ax t x t φ ỹ φ y t ] u t u u t u t x t x t = yt y T Ax t x t θ L x t x t E[ φ y t φ y t ]. y t y t T Ax t x t θy t y t T Ax t x t ad usg Cauchy-Schwartz equalty we have ad Thus we get y t y t T Ax t x t yt y t T A /τ yt y t /τr y t y t T Ax t x t yt y t T A /τ yt y t /τr. u t u u t u t x t x t yt y T Ax t x t yt y t /4τR xt x t 8τ θ xt x t. 8τ Also we ca lower boud the termdy t y t usg Lemma wthρ = /: Dy t y t = = = φ yt φ yt φ y t θyt y T Ax t x t xt x t 8τ xt x t 8τ y t θyt y T Ax t x t y t yt y t δ φ y t φ y t = yt y t δ φ y t φ y t. 4

25 Explotg Strog Covexty from Data wth Prmal-Dual Frst-Order Algorthms Combg everythg above together we have τ /αδµ x t x 4 Lx y Lx y t θ 8τ α θδl τ λ E[ x t x ] Dy t y θlx t y Lx y x t x t θyt y T Ax t x t E[Dy t y ] E[yt y T Ax t x t ] E[Lx t y Lx y Lx y Lx y t ] τ E[ x t x t ] 8τ 4R τ E[ y t y t ] δ α δ E[ φ y t φ y t ]. If we choose the parameters as the we kow ad ad thus I addto we have α θδl α = 4 τ = 6R 4R τ = 4 > 0 δ α δ = δ δ 8 > 0 δl 8 δr δ 8 56τ 56τ 8τ α θδl 3 8τ. α = 4. Fally we obta τ δµ x t x 44 Dy t y θlx t y Lx y Lx y Lx y t 3 θ 8τ xt x t θyt y T Ax t x t τ λ E[ x t x ] E[ y t y ] E[yt y T Ax t x t ] E[Lx t y Lx y Lx y Lx y t ] 3 8τ E[ xt x t ]. As before we ca defe θ x ad θ y as the ratos betwee the coeffcets the x-dstace ad y-dstace terms ad let θ = max{θ x θ y }. The choosg the step-sze parameters as gves the desred result. τ = 4R λδµ = λδµ 4R 5

Econometric Methods. Review of Estimation

Econometric Methods. Review of Estimation Ecoometrc Methods Revew of Estmato Estmatg the populato mea Radom samplg Pot ad terval estmators Lear estmators Ubased estmators Lear Ubased Estmators (LUEs) Effcecy (mmum varace) ad Best Lear Ubased Estmators

More information

Functions of Random Variables

Functions of Random Variables Fuctos of Radom Varables Chapter Fve Fuctos of Radom Varables 5. Itroducto A geeral egeerg aalyss model s show Fg. 5.. The model output (respose) cotas the performaces of a system or product, such as weght,

More information

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model Lecture 7. Cofdece Itervals ad Hypothess Tests the Smple CLR Model I lecture 6 we troduced the Classcal Lear Regresso (CLR) model that s the radom expermet of whch the data Y,,, K, are the outcomes. The

More information

Lecture 02: Bounding tail distributions of a random variable

Lecture 02: Bounding tail distributions of a random variable CSCI-B609: A Theorst s Toolkt, Fall 206 Aug 25 Lecture 02: Boudg tal dstrbutos of a radom varable Lecturer: Yua Zhou Scrbe: Yua Xe & Yua Zhou Let us cosder the ubased co flps aga. I.e. let the outcome

More information

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions. Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos

More information

CHAPTER 4 RADICAL EXPRESSIONS

CHAPTER 4 RADICAL EXPRESSIONS 6 CHAPTER RADICAL EXPRESSIONS. The th Root of a Real Number A real umber a s called the th root of a real umber b f Thus, for example: s a square root of sce. s also a square root of sce ( ). s a cube

More information

PROJECTION PROBLEM FOR REGULAR POLYGONS

PROJECTION PROBLEM FOR REGULAR POLYGONS Joural of Mathematcal Sceces: Advaces ad Applcatos Volume, Number, 008, Pages 95-50 PROJECTION PROBLEM FOR REGULAR POLYGONS College of Scece Bejg Forestry Uversty Bejg 0008 P. R. Cha e-mal: sl@bjfu.edu.c

More information

Kernel-based Methods and Support Vector Machines

Kernel-based Methods and Support Vector Machines Kerel-based Methods ad Support Vector Maches Larr Holder CptS 570 Mache Learg School of Electrcal Egeerg ad Computer Scece Washgto State Uverst Refereces Muller et al. A Itroducto to Kerel-Based Learg

More information

Unsupervised Learning and Other Neural Networks

Unsupervised Learning and Other Neural Networks CSE 53 Soft Computg NOT PART OF THE FINAL Usupervsed Learg ad Other Neural Networs Itroducto Mture Destes ad Idetfablty ML Estmates Applcato to Normal Mtures Other Neural Networs Itroducto Prevously, all

More information

Lecture 3. Sampling, sampling distributions, and parameter estimation

Lecture 3. Sampling, sampling distributions, and parameter estimation Lecture 3 Samplg, samplg dstrbutos, ad parameter estmato Samplg Defto Populato s defed as the collecto of all the possble observatos of terest. The collecto of observatos we take from the populato s called

More information

MATH 247/Winter Notes on the adjoint and on normal operators.

MATH 247/Winter Notes on the adjoint and on normal operators. MATH 47/Wter 00 Notes o the adjot ad o ormal operators I these otes, V s a fte dmesoal er product space over, wth gve er * product uv, T, S, T, are lear operators o V U, W are subspaces of V Whe we say

More information

Discrete Mathematics and Probability Theory Fall 2016 Seshia and Walrand DIS 10b

Discrete Mathematics and Probability Theory Fall 2016 Seshia and Walrand DIS 10b CS 70 Dscrete Mathematcs ad Probablty Theory Fall 206 Sesha ad Walrad DIS 0b. Wll I Get My Package? Seaky delvery guy of some compay s out delverg packages to customers. Not oly does he had a radom package

More information

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015 Fall 05 Homework : Solutos Problem : (Practce wth Asymptotc Notato) A essetal requremet for uderstadg scalg behavor s comfort wth asymptotc (or bg-o ) otato. I ths problem, you wll prove some basc facts

More information

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements Aoucemets No-Parametrc Desty Estmato Techques HW assged Most of ths lecture was o the blacboard. These sldes cover the same materal as preseted DHS Bometrcs CSE 90-a Lecture 7 CSE90a Fall 06 CSE90a Fall

More information

Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms

Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms Exploitig Strog Covexity from Data with Primal-Dual First-Order Algorithms Jialei Wag Li Xiao Abstract We cosider empirical risk miimizatio of liear predictors with covex loss fuctios. Such problems ca

More information

Analysis of Variance with Weibull Data

Analysis of Variance with Weibull Data Aalyss of Varace wth Webull Data Lahaa Watthaacheewaul Abstract I statstcal data aalyss by aalyss of varace, the usual basc assumptos are that the model s addtve ad the errors are radomly, depedetly, ad

More information

Linear Regression Linear Regression with Shrinkage. Some slides are due to Tommi Jaakkola, MIT AI Lab

Linear Regression Linear Regression with Shrinkage. Some slides are due to Tommi Jaakkola, MIT AI Lab Lear Regresso Lear Regresso th Shrkage Some sldes are due to Tomm Jaakkola, MIT AI Lab Itroducto The goal of regresso s to make quattatve real valued predctos o the bass of a vector of features or attrbutes.

More information

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades STAT 101 Dr. Kar Lock Morga 11/20/12 Exam 2 Grades Multple Regresso SECTIONS 9.2, 10.1, 10.2 Multple explaatory varables (10.1) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (10.2) Trasformatos

More information

Arithmetic Mean and Geometric Mean

Arithmetic Mean and Geometric Mean Acta Mathematca Ntresa Vol, No, p 43 48 ISSN 453-6083 Arthmetc Mea ad Geometrc Mea Mare Varga a * Peter Mchalča b a Departmet of Mathematcs, Faculty of Natural Sceces, Costate the Phlosopher Uversty Ntra,

More information

A Remark on the Uniform Convergence of Some Sequences of Functions

A Remark on the Uniform Convergence of Some Sequences of Functions Advaces Pure Mathematcs 05 5 57-533 Publshed Ole July 05 ScRes. http://www.scrp.org/joural/apm http://dx.do.org/0.436/apm.05.59048 A Remark o the Uform Covergece of Some Sequeces of Fuctos Guy Degla Isttut

More information

PTAS for Bin-Packing

PTAS for Bin-Packing CS 663: Patter Matchg Algorthms Scrbe: Che Jag /9/00. Itroducto PTAS for B-Packg The B-Packg problem s NP-hard. If we use approxmato algorthms, the B-Packg problem could be solved polyomal tme. For example,

More information

Qualifying Exam Statistical Theory Problem Solutions August 2005

Qualifying Exam Statistical Theory Problem Solutions August 2005 Qualfyg Exam Statstcal Theory Problem Solutos August 5. Let X, X,..., X be d uform U(,),

More information

2006 Jamie Trahan, Autar Kaw, Kevin Martin University of South Florida United States of America

2006 Jamie Trahan, Autar Kaw, Kevin Martin University of South Florida United States of America SOLUTION OF SYSTEMS OF SIMULTANEOUS LINEAR EQUATIONS Gauss-Sedel Method 006 Jame Traha, Autar Kaw, Kev Mart Uversty of South Florda Uted States of Amerca kaw@eg.usf.edu Itroducto Ths worksheet demostrates

More information

4 Inner Product Spaces

4 Inner Product Spaces 11.MH1 LINEAR ALGEBRA Summary Notes 4 Ier Product Spaces Ier product s the abstracto to geeral vector spaces of the famlar dea of the scalar product of two vectors or 3. I what follows, keep these key

More information

New Optimisation Methods for Machine Learning Aaron Defazio

New Optimisation Methods for Machine Learning Aaron Defazio New Optmsato Methods for Mache Learg Aaro Defazo A thess submtted for the degree of Doctor of Phlosophy of The Australa Natoal Uversty October 205 c Aaro Defazo 204 Except where otherwse dcated, ths thess

More information

9.1 Introduction to the probit and logit models

9.1 Introduction to the probit and logit models EC3000 Ecoometrcs Lecture 9 Probt & Logt Aalss 9. Itroducto to the probt ad logt models 9. The logt model 9.3 The probt model Appedx 9. Itroducto to the probt ad logt models These models are used regressos

More information

New Optimisation Methods for Machine Learning

New Optimisation Methods for Machine Learning New Optmsato Methods for Mache Learg Aaro Defazo (Uder Examato) A thess submtted for the degree of Doctor of Phlosophy of The Australa Natoal Uversty November 204 c Aaro Defazo 204 Except where otherwse

More information

THE EFFICIENCY OF EMPIRICAL LIKELIHOOD WITH NUISANCE PARAMETERS

THE EFFICIENCY OF EMPIRICAL LIKELIHOOD WITH NUISANCE PARAMETERS Joural of Mathematcs ad Statstcs (: 5-9, 4 ISSN: 549-3644 4 Scece Publcatos do:.3844/jmssp.4.5.9 Publshed Ole ( 4 (http://www.thescpub.com/jmss.toc THE EFFICIENCY OF EMPIRICAL LIKELIHOOD WITH NUISANCE

More information

Solution of General Dual Fuzzy Linear Systems. Using ABS Algorithm

Solution of General Dual Fuzzy Linear Systems. Using ABS Algorithm Appled Mathematcal Sceces, Vol 6, 0, o 4, 63-7 Soluto of Geeral Dual Fuzzy Lear Systems Usg ABS Algorthm M A Farborz Aragh * ad M M ossezadeh Departmet of Mathematcs, Islamc Azad Uversty Cetral ehra Brach,

More information

Chapter Business Statistics: A First Course Fifth Edition. Learning Objectives. Correlation vs. Regression. In this chapter, you learn:

Chapter Business Statistics: A First Course Fifth Edition. Learning Objectives. Correlation vs. Regression. In this chapter, you learn: Chapter 3 3- Busess Statstcs: A Frst Course Ffth Edto Chapter 2 Correlato ad Smple Lear Regresso Busess Statstcs: A Frst Course, 5e 29 Pretce-Hall, Ic. Chap 2- Learg Objectves I ths chapter, you lear:

More information

Logistic regression (continued)

Logistic regression (continued) STAT562 page 138 Logstc regresso (cotued) Suppose we ow cosder more complex models to descrbe the relatoshp betwee a categorcal respose varable (Y) that takes o two (2) possble outcomes ad a set of p explaatory

More information

QR Factorization and Singular Value Decomposition COS 323

QR Factorization and Singular Value Decomposition COS 323 QR Factorzato ad Sgular Value Decomposto COS 33 Why Yet Aother Method? How do we solve least-squares wthout currg codto-squarg effect of ormal equatos (A T A A T b) whe A s sgular, fat, or otherwse poorly-specfed?

More information

Lecture Notes Types of economic variables

Lecture Notes Types of economic variables Lecture Notes 3 1. Types of ecoomc varables () Cotuous varable takes o a cotuum the sample space, such as all pots o a le or all real umbers Example: GDP, Polluto cocetrato, etc. () Dscrete varables fte

More information

8.1 Hashing Algorithms

8.1 Hashing Algorithms CS787: Advaced Algorthms Scrbe: Mayak Maheshwar, Chrs Hrchs Lecturer: Shuch Chawla Topc: Hashg ad NP-Completeess Date: September 21 2007 Prevously we looked at applcatos of radomzed algorthms, ad bega

More information

D KL (P Q) := p i ln p i q i

D KL (P Q) := p i ln p i q i Cheroff-Bouds 1 The Geeral Boud Let P 1,, m ) ad Q q 1,, q m ) be two dstrbutos o m elemets, e,, q 0, for 1,, m, ad m 1 m 1 q 1 The Kullback-Lebler dvergece or relatve etroy of P ad Q s defed as m D KL

More information

Recall MLR 5 Homskedasticity error u has the same variance given any values of the explanatory variables Var(u x1,...,xk) = 2 or E(UU ) = 2 I

Recall MLR 5 Homskedasticity error u has the same variance given any values of the explanatory variables Var(u x1,...,xk) = 2 or E(UU ) = 2 I Chapter 8 Heterosedastcty Recall MLR 5 Homsedastcty error u has the same varace gve ay values of the eplaatory varables Varu,..., = or EUU = I Suppose other GM assumptos hold but have heterosedastcty.

More information

LINEAR REGRESSION ANALYSIS

LINEAR REGRESSION ANALYSIS LINEAR REGRESSION ANALYSIS MODULE V Lecture - Correctg Model Iadequaces Through Trasformato ad Weghtg Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur Aalytcal methods for

More information

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best Error Aalyss Preamble Wheever a measuremet s made, the result followg from that measuremet s always subject to ucertaty The ucertaty ca be reduced by makg several measuremets of the same quatty or by mprovg

More information

Statistics MINITAB - Lab 5

Statistics MINITAB - Lab 5 Statstcs 10010 MINITAB - Lab 5 PART I: The Correlato Coeffcet Qute ofte statstcs we are preseted wth data that suggests that a lear relatoshp exsts betwee two varables. For example the plot below s of

More information

COMPROMISE HYPERSPHERE FOR STOCHASTIC DOMINANCE MODEL

COMPROMISE HYPERSPHERE FOR STOCHASTIC DOMINANCE MODEL Sebasta Starz COMPROMISE HYPERSPHERE FOR STOCHASTIC DOMINANCE MODEL Abstract The am of the work s to preset a method of rakg a fte set of dscrete radom varables. The proposed method s based o two approaches:

More information

10.1 Approximation Algorithms

10.1 Approximation Algorithms 290 0. Approxmato Algorthms Let us exame a problem, where we are gve A groud set U wth m elemets A collecto of subsets of the groud set = {,, } s.t. t s a cover of U: = U The am s to fd a subcover, = U,

More information

Introduction to Matrices and Matrix Approach to Simple Linear Regression

Introduction to Matrices and Matrix Approach to Simple Linear Regression Itroducto to Matrces ad Matrx Approach to Smple Lear Regresso Matrces Defto: A matrx s a rectagular array of umbers or symbolc elemets I may applcatos, the rows of a matrx wll represet dvduals cases (people,

More information

Convergence of the Desroziers scheme and its relation to the lag innovation diagnostic

Convergence of the Desroziers scheme and its relation to the lag innovation diagnostic Covergece of the Desrozers scheme ad ts relato to the lag ovato dagostc chard Méard Evromet Caada, Ar Qualty esearch Dvso World Weather Ope Scece Coferece Motreal, August 9, 04 o t t O x x x y x y Oservato

More information

Chapter 10 Two Stage Sampling (Subsampling)

Chapter 10 Two Stage Sampling (Subsampling) Chapter 0 To tage amplg (usamplg) I cluster samplg, all the elemets the selected clusters are surveyed oreover, the effcecy cluster samplg depeds o sze of the cluster As the sze creases, the effcecy decreases

More information

Bootstrap Method for Testing of Equality of Several Coefficients of Variation

Bootstrap Method for Testing of Equality of Several Coefficients of Variation Cloud Publcatos Iteratoal Joural of Advaced Mathematcs ad Statstcs Volume, pp. -6, Artcle ID Sc- Research Artcle Ope Access Bootstrap Method for Testg of Equalty of Several Coeffcets of Varato Dr. Navee

More information

1 Mixed Quantum State. 2 Density Matrix. CS Density Matrices, von Neumann Entropy 3/7/07 Spring 2007 Lecture 13. ψ = α x x. ρ = p i ψ i ψ i.

1 Mixed Quantum State. 2 Density Matrix. CS Density Matrices, von Neumann Entropy 3/7/07 Spring 2007 Lecture 13. ψ = α x x. ρ = p i ψ i ψ i. CS 94- Desty Matrces, vo Neuma Etropy 3/7/07 Sprg 007 Lecture 3 I ths lecture, we wll dscuss the bascs of quatum formato theory I partcular, we wll dscuss mxed quatum states, desty matrces, vo Neuma etropy

More information

7.0 Equality Contraints: Lagrange Multipliers

7.0 Equality Contraints: Lagrange Multipliers Systes Optzato 7.0 Equalty Cotrats: Lagrage Multplers Cosder the zato of a o-lear fucto subject to equalty costrats: g f() R ( ) 0 ( ) (7.) where the g ( ) are possbly also olear fuctos, ad < otherwse

More information

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model
