A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization


Journal of Machine Learning Research (Submitted 9/16; Revised 1/17; Published 1/17)

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

Shun Zheng (zhengs14@mails.tsinghua.edu.cn), Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
Jiale Wang (jiale@uchicago.edu), Department of Computer Science, The University of Chicago, Chicago, Illinois
Fen Xia (xiafen@ebrain.ai), Beijing Wisdom Uranium Technology Co., Ltd., Beijing, China
Wei Xu (weixu@tsinghua.edu.cn), Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
Tong Zhang (tongzhang@tongzhang-ml.org), Tencent AI Lab, Shenzhen, China

Editor: Sathiya Keerthi

Abstract

In modern large-scale machine learning applications, the training data are often partitioned and stored on multiple machines. It is customary to employ the data parallelism approach, where the aggregated training loss is minimized without moving data across machines. In this paper, we introduce a novel distributed dual formulation for regularized loss minimization problems that can directly handle data parallelism in the distributed setting. This formulation allows us to systematically derive dual coordinate optimization procedures, which we refer to as Distributed Alternating Dual Maximization (DADM). The framework extends earlier studies described in (Boyd et al., 2011; Ma et al., 2017; Jaggi et al., 2014; Yang, 2013) and has rigorous theoretical analyses. Moreover, with the help of the new formulation, we develop the accelerated version of DADM (Acc-DADM) by generalizing the acceleration technique from (Shalev-Shwartz and Zhang, 2014) to the distributed setting. We also provide theoretical results for the proposed accelerated version, and the new result improves previous ones (Yang, 2013; Ma et al., 2017) whose iteration complexities grow linearly on the condition number. Our empirical studies validate our theory and show that our accelerated approach significantly improves the previous state-of-the-art distributed dual coordinate optimization algorithms.

*Most of the work was done during the internship of Shun Zheng at Baidu Big Data Lab in Beijing.

©2017 Shun Zheng, Jiale Wang, Fen Xia, Wei Xu, and Tong Zhang.
License: CC-BY 4.0; attribution requirements are provided at

Zheng, Wang, Xia, Xu, and Zhang

Keywords: distributed optimization, stochastic dual coordinate ascent, acceleration, regularized loss minimization, computational complexity

1. Introduction

In large-scale machine learning applications for big data analysis, it has become a common practice to partition the training data and store them on multiple machines connected via a commodity network. A typical setting of distributed machine learning is to allow these machines to train in parallel, with each machine processing its local data with no data communication. This paradigm is often referred to as data parallelism. To reduce the overall training time, it is often necessary to increase the number of machines and to minimize the communication overhead. A significant challenge is to reduce the training time as much as possible when we increase the number of machines. A practical solution requires two research directions: one is to improve the underlying system design, making it suitable for machine learning algorithms (Dean and Ghemawat, 2008; Zaharia et al., 2010; Dean et al., 2012; Li et al., 2014); the other is to adapt traditional single-machine optimization methods to handle data parallelism (Boyd et al., 2011; Yang, 2013; Mahajan et al., 2013; Shamir et al., 2014; Jaggi et al., 2014; Mahajan et al., 2017; Ma et al., 2017; Takáč et al., 2015; Zhang and Lin, 2015). This paper focuses on the latter.

For big data machine learning on a single machine, there are two types of algorithms: batch algorithms such as gradient descent or L-BFGS (Liu and Nocedal, 1989), and stochastic optimization algorithms such as stochastic gradient descent and their modern variance-reduced versions (Defazio et al., 2014; Johnson and Zhang, 2013). It is known that batch algorithms are relatively easy to parallelize. However, on a single machine, they converge more slowly than the modern stochastic optimization algorithms due to their high per-iteration computation costs. Specifically, it has been shown that the modern stochastic optimization algorithms converge faster than the traditional batch algorithms for convex regularized loss minimization problems. The faster convergence can be guaranteed in theory and observed in practice.
The fast convergence of modern stochastic optimization methods has led to studies extending these methods to the distributed computing setting. Specifically, this paper considers the generalization of the Stochastic Dual Coordinate Ascent (SDCA) method (Hsieh et al., 2008; Shalev-Shwartz and Zhang, 2013) and its proximal variant (Shalev-Shwartz and Zhang, 2014) to handle distributed training using data parallelism. Although this problem has been considered previously (Yang, 2013; Jaggi et al., 2014; Ma et al., 2017), these earlier approaches work with a dual formulation that is the same as the traditional single-machine dual formulation, where dual variables are coupled, and hence they run into difficulties when they try to motivate and analyze the derived methods in the distributed environment. One contribution of this work is to introduce a new dual formulation specifically for distributed regularized loss minimization problems when data are distributed to multiple machines. In our new formulation, we decouple the local dual variables by introducing another dual variable β. This unique dual formulation allows us to naturally extend the proximal SDCA algorithm (ProxSDCA) of Shalev-Shwartz and Zhang (2014) to the setting of multi-machine distributed optimization that can benefit from data parallelism. Moreover, the analysis of the original ProxSDCA can be easily adapted to the new formulation, leading to new theoretical results. This new dual formulation can also be combined with the acceleration technique of Shalev-Shwartz and Zhang (2014) to further improve convergence.

In the proposed formulation, each iteration of the distributed dual coordinate ascent optimization is naturally decomposed into a local step and a global step. In the local step, we allow the use of any local procedure to optimize a local dual objective function using local parameters and local data on each machine. This flexibility is similar to those of (Ma et al., 2017; Jaggi et al., 2014). For example, we may apply ProxSDCA as the local procedure. In the local step, each computer node can perform the optimization independently, without communicating with the others. In the global step, nodes communicate with each other to synchronize the local parameters and jointly update the global primal solution. Only this global step requires communication among nodes. We summarize our main contributions as follows:

New distributed dual formulation. This new formulation naturally leads to a two-step local-global dual alternating optimization procedure for distributed machine learning. We thus call the resulting procedure Distributed Alternating Dual Maximization (DADM). Note that DADM directly generalizes ProxSDCA, which can handle complex regularizations such as L2-L1 regularization.

New convergence analysis. The new formulation allows us to directly generalize the analysis of ProxSDCA in (Shalev-Shwartz and Zhang, 2014) to the distributed setting. This analysis is in contrast to that of CoCoA+ in (Ma et al., 2017), which employs a different approach based on the Θ-approximate solution assumption on the local solver. Our analysis can lead to simplified results in the commonly used mini-batch setup.

Acceleration with theoretical guarantees. Based on the new distributed dual formulation, we can naturally derive a distributed version of the accelerated proximal SDCA method (AccProxSDCA) of Shalev-Shwartz and Zhang (2014), which has been shown to be effective on a single machine. We call the resulting procedure Accelerated Distributed Alternating Dual Maximization (Acc-DADM).
The main idea is to modify the original formulation using a sequence of approximations that have stronger regularizations. Moreover, we directly adapt the theoretical analyses of AccProxSDCA to the distributed setting and provide guarantees for Acc-DADM. Our theorems guarantee that we can always obtain a computation speedup compared with the single-machine AccProxSDCA. These guarantees improve the theoretical results of DADM and previous methods (Yang, 2013; Ma et al., 2017), whose iteration complexities grow linearly on the condition number; the latter methods possibly fail to provide a computation time improvement over the single-machine ProxSDCA when the condition number is large.

Extensive empirical studies. We perform extensive experiments to compare the convergence and the scalability of the accelerated approach with those of previous state-of-the-art distributed dual coordinate ascent methods. Our empirical studies show that Acc-DADM can achieve faster convergence and better scalability than the previous state-of-the-art, in particular when the condition number is large. This phenomenon is consistent with our theory.

We organize the rest of the paper as follows. Section 2 discusses related work. Section 3 provides preliminary definitions. Sections 4 to 6 present the distributed primal formulation, the distributed dual formulation, and our DADM method, respectively. Section 7 then provides theorems for DADM. Section 8 introduces the accelerated version and provides the corresponding theoretical guarantees. Section 9 includes all proofs of this paper. Section 10 provides extensive empirical studies of our novel method. Finally, Section 11 concludes the whole paper.

2. Related Work

Several generalizations of SDCA to the distributed setting have been proposed in the literature, including DisDCA (Yang, 2013), CoCoA (Jaggi et al., 2014), and CoCoA+ (Ma et al., 2017). DisDCA was the first attempt to study distributed SDCA, and it provided a basic theoretical analysis and a practical variant that behaves well empirically. Nevertheless, their theoretical result only applies to a few specially chosen mini-batch local dual updates that differ from the practical method used in their experiments. In particular, they did not show that optimizing each local dual problem leads to convergence. This limitation makes the methods they analyzed inflexible. CoCoA was proposed to fix the above gap between theory and practice, and it was claimed to be a framework for distributed dual coordinate ascent in that it allows any local dual solver to be used for the local dual problem, rather than the impractical choices of DisDCA. However, the actual performance of CoCoA is inferior to the practical variant proposed in DisDCA with an aggressive local update. We note that the practical variant of DisDCA did not have a solid theoretical guarantee at that time. CoCoA+ fixed this situation and may be regarded as a generalization of CoCoA. The most effective choice of the aggregation parameter leads to a version which is similar to DisDCA, but allows exact optimization of each dual problem in their theory. According to the studies in (Ma et al., 2017), the resulting CoCoA+ algorithm performs significantly better than the original CoCoA both theoretically and empirically.
The original CoCoA+ (Ma et al., 2015) can only handle problems with the L2 regularizer, and it was generalized to general strongly convex regularizers in the long version (Ma et al., 2017). Besides, Smith et al. (2016) extended the framework to solve the primal problem of regularized loss minimization and cover general non-strongly convex regularizers such as the L1 regularizer, and Hsieh et al. (2015) studied parallel SDCA with asynchronous updates. Although CoCoA+ has the advantage of allowing arbitrary local solvers and flexible approximate solutions of the local dual problems, its theoretical analyses do not explicitly capture the contribution of the number of machines and the mini-batch size to the iteration complexity. Moreover, the iteration complexities of both CoCoA+ and DisDCA grow linearly with the condition number. Thus they probably cannot provide a computation time improvement over the single-machine SDCA when the condition number is large. This paper will remedy these unsatisfied aspects by providing a different analysis based on a new distributed dual formulation. Using this formulation, we can analyze procedures that can take an arbitrary local dual solver, which is like CoCoA+; moreover, we allow the dual updates to be a mini-batch, which is like DisDCA. Besides, this formulation also allows

us to naturally generalize AccProxSDCA and the relevant theoretical results to the distributed setting. Our empirical results also validate the superiority of the accelerated approach.

While we focus on extending SDCA in this paper, we note that there are other approaches for parallel optimization. For example, there are direct attempts to parallelize stochastic gradient descent (Niu et al., 2011; Zinkevich et al., 2010). Some of these procedures only consider the multi-core shared memory situation, which is very different from the distributed computing environment investigated in this paper. In the setting of distributed computing, data are partitioned onto multiple machines, and one often needs to study communication-efficient algorithms. In such cases, one extreme is to allow exact optimization of subproblems on each local machine, as considered in (Shamir et al., 2014; Zhang and Lin, 2015). Although this approach minimizes communication, the computation cost of each local solver can dominate the overall training. Therefore in practice, it is necessary to make a trade-off by using the mini-batch update approach (Takáč et al., 2013, 2015). However, it is difficult for traditional mini-batch methods to design reasonable aggregation strategies to achieve fast convergence. Takáč et al. (2015) studied how the step size can be reduced when the mini-batch size grows in the distributed setting. Lee and Roth (2015) derived an analytical solution of the optimal step size for dual linear support vector machine problems. Besides, Mahajan et al. (2013) presented a general framework for distributed optimization based on local functional approximation, which includes several first-order and second-order methods as special cases. Mahajan et al. (2017) considered each machine to handle a block of coordinates and proposed distributed block coordinate descent methods for solving L1 regularized loss minimization problems. Different from those methods, the Distributed Alternating Dual Maximization (DADM) proposed in this work handles the trade-off between computation and communication by developing bounds for mini-batch dual updates, which is similar to (Yang, 2013).
Moreover, DADM allows other, better local solvers to achieve faster convergence in practice.

3. Preliminaries

In this section, we introduce some notations used later. All functions that we consider in this paper are proper convex functions over a Euclidean space. Given a function f : ℝ^d → ℝ, we denote its conjugate function as

f*(b) = sup_a [ b^⊤ a − f(a) ].

A function f : ℝ^d → ℝ is L-Lipschitz with respect to ‖·‖ if for all a, b ∈ ℝ^d, we have

|f(a) − f(b)| ≤ L ‖a − b‖.

A function f : ℝ^d → ℝ is (1/γ)-smooth with respect to ‖·‖ if it is differentiable and its gradient is (1/γ)-Lipschitz with respect to ‖·‖. An equivalent definition is that for all a, b ∈ ℝ^d, we have

f(b) ≤ f(a) + ∇f(a)^⊤ (b − a) + (1/(2γ)) ‖b − a‖².

A function f : ℝ^d → ℝ is γ-strongly convex with respect to ‖·‖ if for any a, b ∈ ℝ^d, we have

f(b) ≥ f(a) + ∇f(a)^⊤ (b − a) + (γ/2) ‖b − a‖²,

where ∇f(a) is any subgradient of f at a. It is well known that a function f is γ-strongly convex with respect to ‖·‖ if and only if its conjugate function f* is (1/γ)-smooth with respect to the dual norm.

4. Distributed Primal Formulation

In this paper, we consider the following generic regularized loss minimization problem:

min_{w ∈ ℝ^d} P(w) := Σ_{i=1}^n φ_i(X_i^⊤ w) + λ n g(w) + λ h(w),   (1)

which is often encountered in practical machine learning problems. Here we assume each X_i ∈ ℝ^{d×q} is a d × q matrix, w ∈ ℝ^d is the model parameter vector, φ_i(u) is a convex loss function defined on ℝ^q, which is associated with the i-th data point, λ > 0 is the regularization parameter, g(w) is a strongly convex regularizer, and h(w) is another convex regularizer. A special case is to simply set h(w) = 0. Here we allow the more general formulation, which can be used to derive different distributed dual forms that may be useful for special purposes.

The above optimization formulation can be specialized to a variety of machine learning problems. As an example, we may consider the L2-L1 regularized least squares problem, where φ_i(x_i^⊤ w) = ½ (w^⊤ x_i − y_i)² for vector input data x_i ∈ ℝ^d and real-valued output y_i ∈ ℝ, g(w) = ½ ‖w‖² + a ‖w‖₁, and h(w) = b ‖w‖₁ for some a, b ≥ 0.

If we set h(w) = 0, then it is well known (see, for example, Shalev-Shwartz and Zhang, 2014) that the primal problem (1) has an equivalent single-machine dual form of

max_α D(α) := − Σ_{i=1}^n φ_i*(−α_i) − λ n g*( Σ_{i=1}^n X_i α_i / (λ n) ),   (2)

where α = [α₁, …, α_n], α_i ∈ ℝ^q (i = 1, …, n) are dual variables, φ_i* is the convex conjugate function of φ_i, and similarly, g* is the convex conjugate function of g. The stochastic dual coordinate ascent method, referred to as SDCA in (Shalev-Shwartz and Zhang, 2014), maximizes the dual formulation by optimizing one randomly chosen dual variable at each iteration. Throughout the algorithm, the following primal-dual relationship is maintained:

w(α) = ∇g*( Σ_{i=1}^n X_i α_i / (λ n) ),   (3)

for some subgradient ∇g*(v). It is known that w(α*) = w*, where w* and α* are optimal solutions of the primal problem and the dual problem, respectively. It was shown in (Shalev-Shwartz and Zhang, 2014) that the duality gap, defined as P(w(α)) − D(α), which is an upper bound of the primal sub-optimality P(w(α)) − P(w*), converges to zero.
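As a concrete sanity check of the dual (2), the relationship (3), and the vanishing duality gap, the following is a minimal single-machine SDCA sketch for ridge regression (squared loss φ_i(z) = ½(z − y_i)², g(w) = ½‖w‖², h = 0, q = 1); the data, sizes, and the closed-form coordinate update are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def primal(w):                       # P(w) = Σ ½(x_i·w − y_i)² + λn·½‖w‖²
    return 0.5 * np.sum((X @ w - y) ** 2) + lam * n * 0.5 * w @ w

def dual(alpha):                     # D(α) = −Σ φ*(−α_i) − λn g*(Σ x_i α_i/(λn))
    w = X.T @ alpha / (lam * n)      # primal-dual relationship (3): w(α)
    return alpha @ y - 0.5 * alpha @ alpha - lam * n * 0.5 * w @ w

alpha = np.zeros(n)
v = np.zeros(d)                      # maintains v = Σ x_i α_i/(λn), so w(α) = v here
for epoch in range(30):
    for i in rng.permutation(n):
        # closed-form maximization of the dual over the single coordinate α_i
        delta = (y[i] - X[i] @ v - alpha[i]) / (1.0 + X[i] @ X[i] / (lam * n))
        alpha[i] += delta
        v += X[i] * delta / (lam * n)

gap = primal(v) - dual(alpha)        # non-negative by weak duality, shrinks to zero
```

After a few epochs the gap is tiny; by weak duality it also upper-bounds the primal sub-optimality P(w(α)) − P(w*).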
Moreover, a convergence rate can be established. In particular, for smooth loss functions, the convergence rate is linear. We note that SDCA is suitable for optimization on a single machine because it works with a dual formulation designed for a single machine. In the following, we will

generalize the single-machine dual formulation to the distributed setting, and study the corresponding distributed version of SDCA.

In the distributed setting, we assume that the training data are partitioned and distributed to m machines. In other words, the index set S = {1, …, n} of the training data is divided into m non-overlapping partitions, where each machine l ∈ {1, …, m} contains its own partition S_l ⊂ S. We assume that ∪_l S_l = S, and we use n_l := |S_l| to denote the size of the training data on machine l. Next, we can rewrite the primal problem (1) as the following constrained minimization problem that is suitable for the multi-machine distributed setting:

min_{w; {w_l}} Σ_{l=1}^m P_l(w_l) + λ h(w)
s.t. w_l = w, for all l ∈ {1, …, m},
where P_l(w_l) := Σ_{i∈S_l} φ_i(X_i^⊤ w_l) + λ n_l g(w_l),   (4)

where w_l represents the local primal variable on each machine l, P_l is the corresponding local primal objective, and the constraints w_l = w are imposed to synchronize the local primal variables. Obviously this multi-machine distributed primal formulation (4) is equivalent to the original primal problem (1). We note that the idea of objective splitting in (4) is similar to the global variable consensus formulation described in (Boyd et al., 2011). Instead of using the commonly used ADMM (Alternating Direction Method of Multipliers) method, which is not a generalization of (2), in this paper we derive a distributed dual formulation based on (4) that directly generalizes (2). We further propose a framework called Distributed Alternating Dual Maximization (DADM) to solve the distributed dual formulation. One advantage of DADM over ADMM is that DADM does not need to solve the subproblems to high accuracy, and thus it can naturally enjoy the trade-off between computation and communication, similar to related methods such as DisDCA, CoCoA and CoCoA+.

5. Distributed Dual Formulation

The optimization problem (4) can be further rewritten as:

min_{w; {w_l}; {u_i}} Σ_{l=1}^m [ Σ_{i∈S_l} φ_i(u_i) + λ n_l g(w_l) ] + λ h(w)
s.t. u_i = X_i^⊤ w_l, for all i ∈ S_l,
w_l = w, for all l ∈ {1, …, m}.   (5)
Here we introduce n dual variables α := {α_i}_{i=1}^n, where each α_i is the Lagrange multiplier for the constraint u_i − X_i^⊤ w_l = 0, and m dual variables β := {β_l}_{l=1}^m, where each β_l is the Lagrange multiplier for the constraint w_l − w = 0. We can now introduce the primal-dual

objective function with Lagrange multipliers as follows:

J(w; {w_l}; {u_i}; {α_i}; {β_l}) := Σ_{l=1}^m [ Σ_{i∈S_l} ( φ_i(u_i) + α_i^⊤ (u_i − X_i^⊤ w_l) ) + λ n_l g(w_l) + β_l^⊤ (w_l − w) ] + λ h(w).

Proposition 1 Define the dual objective as

D(α, β) := − Σ_{l=1}^m [ Σ_{i∈S_l} φ_i*(−α_i) + λ n_l g*( ( Σ_{i∈S_l} X_i α_i − β_l ) / (λ n_l) ) ] − λ h*( Σ_{l=1}^m β_l / λ ).

Then we have

D(α, β) = min_{w; {w_l}; {u_i}} J(w; {w_l}; {u_i}; {α_i}; {β_l}),

where the minimizers are achieved when the following equations are satisfied:

∇φ_i(u_i) + α_i = 0,
− Σ_{i∈S_l} X_i α_i + β_l + λ n_l ∇g(w_l) = 0,   (6)
− Σ_{l=1}^m β_l + λ ∇h(w) = 0,

for some subgradients ∇φ_i(u_i), ∇g(w_l), and ∇h(w).

When β = {β_l} are fixed, we may define the local single-machine dual formulation on each machine l with respect to α_l as

D_l(α_l | β_l) := − Σ_{i∈S_l} φ_i*(−α_i) − λ n_l g*( ( Σ_{i∈S_l} X_i α_i − β_l ) / (λ n_l) ),   (7)

where α_l represents the local dual variables {α_i; i ∈ S_l} on machine l, and β_l ∈ ℝ^d serves as a carrier for the synchronization of machine l. Based on Proposition 1, we obtain the following multi-machine distributed dual formulation for the corresponding primal problem (4):

D(α, β) = Σ_{l=1}^m D_l(α_l | β_l) − λ h*( Σ_{l=1}^m β_l / λ ).   (8)

Moreover, we have a non-negative duality gap, and zero duality gap can be achieved when w is the minimizer of P(w) and (α, β) maximizes the dual D(α, β).

Proposition 2 Given any (w, α, β), the following duality gap is non-negative:

P(w) − D(α, β) ≥ 0.

Moreover, zero duality gap is achieved at (w*, α*, β*), where w* is the minimizer of P(w) and (α*, β*) is a maximizer of D(α, β).

We note that the parameters {β_l}_{l=1}^m pass the global information across multiple machines. When β_l is fixed, D_l(α_l | β_l) with respect to α_l corresponds to the dual of the adjusted local primal problem:

P_l(w | β_l) := Σ_{i∈S_l} φ_i(X_i^⊤ w) + λ n_l g̃_l(w),   (9)

where the original regularizer λ n_l g(w) in P_l(w) is replaced by the adjusted regularizer

λ n_l g̃_l(w) := λ n_l g(w) + β_l^⊤ w.

Similar to the single-machine primal-dual relationship (3), we have the following local primal-dual relationship on each machine l:

w_l(α_l, β_l) = ∇g̃_l*(v_l) = ∇g*(ṽ_l),   (10)

where

v_l = Σ_{i∈S_l} X_i α_i / (λ n_l),   ṽ_l = v_l − β_l / (λ n_l).

Moreover, we can define the global primal-dual relationship as

w(α, β) = ∇g̃*(v) = ∇g*(ṽ),   (11)

where

v = Σ_{i=1}^n X_i α_i / (λ n),   ṽ = v − Σ_{l=1}^m β_l / (λ n).

We can also establish the relationship of global-local duality in Proposition 3.

Proposition 3 Given (w, α, β) and {w_l} such that w_1 = ⋯ = w_m = w, we have the following decomposition of the global duality gap as the sum of local duality gaps:

P(w) − D(α, β) ≥ Σ_{l=1}^m [ P_l(w_l | β_l) − D_l(α_l | β_l) ],

and the equality holds when λ ∇h(w) = Σ_l β_l for some subgradient ∇h(w).

Although we allow arbitrary h(w), the case of h(w) = 0 is of special interest. This corresponds to the conjugate function

h*(β) = +∞ if β ≠ 0; h*(β) = 0 if β = 0.

That is, the term −λ h*( Σ_{l=1}^m β_l / λ ) is equivalent to imposing the constraint Σ_{l=1}^m β_l = 0.
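Proposition 3 (together with the β_l choice of Proposition 5 below) can be checked numerically. The sketch uses a toy ridge setup (g(w) = ½‖w‖², h = 0, so Σ_l β_l = 0 and w = ∇g*(ṽ) = ṽ) with made-up data, and verifies that the global duality gap equals the sum of the local gaps; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n_per, d, lam = 3, 10, 4, 0.2
n = m * n_per
Xs = [rng.normal(size=(n_per, d)) for _ in range(m)]
ys = [rng.normal(size=n_per) for _ in range(m)]
alphas = [rng.normal(size=n_per) for _ in range(m)]    # an arbitrary dual point

v_loc = [Xl.T @ a / (lam * n_per) for Xl, a in zip(Xs, alphas)]
v = sum((n_per / n) * vl for vl in v_loc)              # global v(α)
betas = [lam * n_per * (vl - v) for vl in v_loc]       # Proposition 5's choice
w = v                                                  # h = 0, g = ½‖·‖² ⇒ w = ∇g*(v) = v

def P_l(Xl, yl, beta, wv):                             # adjusted local primal (9)
    return 0.5 * np.sum((Xl @ wv - yl) ** 2) + lam * n_per * 0.5 * wv @ wv + beta @ wv

def D_l(Xl, yl, a, beta):                              # local dual (7)
    vt = (Xl.T @ a - beta) / (lam * n_per)             # ṽ_l = v_l − β_l/(λ n_l)
    return a @ yl - 0.5 * a @ a - lam * n_per * 0.5 * vt @ vt

P_glob = sum(0.5 * np.sum((Xl @ w - yl) ** 2) for Xl, yl in zip(Xs, ys)) \
         + lam * n * 0.5 * w @ w
D_glob = sum(D_l(Xl, yl, a, b) for Xl, yl, a, b in zip(Xs, ys, alphas, betas))
local_gap_sum = sum(P_l(Xl, yl, b, w) - D_l(Xl, yl, a, b)
                    for Xl, yl, a, b in zip(Xs, ys, alphas, betas))
```

With these β_l the constraint Σ_l β_l = 0 holds and the decomposition is exact, matching the equality case of Proposition 3.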

Algorithm 1 Local Dual Update (machine l)
  Retrieve local parameters α_l^{t−1}, ṽ_l^{t−1}
  Randomly pick a mini-batch Q ⊂ S_l
  Approximately maximize (12) w.r.t. Δα_Q
  Update α_l^t as α_i^t = α_i^{t−1} + Δα_i for all i ∈ Q
  return Δv_l^t = Σ_{i∈Q} X_i Δα_i / (λ n_l)

6. Distributed Alternating Dual Maximization

Minimizing the primal formulation (4) is equivalent to maximizing the dual formulation (8), and the latter can be achieved by repeatedly applying the following alternating optimization strategy, which we refer to as Distributed Alternating Dual Maximization (DADM):

Local step: fix β and let each machine l approximately optimize D_l(α_l | β_l) w.r.t. α_l in parallel.

Global step: maximize the global dual objective w.r.t. β, and set the global primal parameter w accordingly.

The above steps are applied in iterations t = 1, 2, …, T. At the beginning of each iteration t, we assume that the local primal and dual variables on each local machine l are α_l^{t−1}, β_l^{t−1}, v_l^{t−1}; then we seek to update α_l^{t−1} to α_l^t and v_l^{t−1} to v_l^t in the local step, and seek to update β_l^{t−1} to β_l^t in the global step. We note that the local step can be executed in parallel w.r.t. the dual variables {α_l}_{l=1}^m. In practice, it is often useful to optimize (7) approximately by using a randomly selected mini-batch Q ⊂ S_l of size |Q| = M_l. That is, we want to find Δα_i^t with i ∈ Q to approximately maximize the local dual objective as follows:

D_Q^t(Δα_Q) := − Σ_{i∈Q} φ_i*( −(α_i^{t−1} + Δα_i) ) − λ n_l g*( ṽ_l^{t−1} + Σ_{i∈Q} X_i Δα_i / (λ n_l) ).   (12)

This step is described in Algorithm 1. We can use any solver for this approximate optimization; in our experiments, we choose ProxSDCA. The global step is to synchronize all local solutions, which requires communication among the machines. This is achieved by optimizing the following dual objective with respect to all β = {β_l}:

β^t ∈ arg max_β D(α^t, β).   (13)

Proposition 4 Given v, let w(v) be the unique solution of the following optimization problem

w(v) = arg min_w [ −λ n w^⊤ v + λ n g(w) + λ h(w) ]   (14)

that satisfies

λ n ∇g(w) + λ ∇h(w) = λ n v

for some subgradients ∇g(w) and ∇h(w) = ρ at w = w(v). Then β(v) = λ ρ is a solution of

max_b [ −λ n g*( v − b/(λ n) ) − λ h*( b/λ ) ],

and

w(v) = ∇g*( v − β(v)/(λ n) ).

Proposition 5 Given α, a solution of

max_β D(α, β)

can be obtained by setting

β_l = λ n_l ( v_l(α_l) − v(α) ) + (n_l / n) β(v(α)),

where β(v(α)) is defined in Proposition 4, and

v(α) = Σ_{i=1}^n X_i α_i / (λ n),   v_l(α_l) = Σ_{i∈S_l} X_i α_i / (λ n_l).

Moreover, if we let

w = w(α, β) = w(v(α)) = ∇g*( v(α) − β(v(α))/(λ n) ),

where w(v) is defined in Proposition 4, and

w_l = w_l(α_l, β_l) = ∇g*( v_l − β_l/(λ n_l) ),

then w_l = w for all l, and

P(w) − D(α, β) = Σ_{l=1}^m [ P_l(w_l | β_l) − D_l(α_l | β_l) ].

According to Proposition 5, the solution of (13) is given by

β_l^t = λ n_l ( v_l^t − v^t ) + (n_l / n) λ ρ^t,

where

v^t = Σ_{l=1}^m (n_l / n) v_l^t = v^{t−1} + Σ_{l=1}^m (n_l / n) Δv_l^t,

Algorithm 2 Distributed Alternating Dual Maximization (DADM)
  Input: objective P(w), target duality gap ε, warm-start variables w_init, α_init, β_init, v_init; if not specified, set w_init = 0, α_init = 0, β_init = 0, v_init = 0.
  Initialize: let w^0 = w_init, α^0 = α_init, β^0 = β_init, v^0 = v_init.
  for t = 1, 2, … do
    (Local step)
    for all machines l = 1, 2, …, m in parallel do
      call an arbitrary local procedure, such as Algorithm 1
    end for
    (Global step)
    Aggregate v^t = v^{t−1} + Σ_{l=1}^m (n_l/n) Δv_l^t
    Compute ṽ^t according to (15)
    Let Δṽ^t = ṽ^t − ṽ^{t−1}
    for all machines l = 1, 2, …, m in parallel do
      update the local parameter ṽ_l^t = ṽ_l^{t−1} + Δṽ^t
    end for
    Stopping condition: stop if P(w^t) − D(α^t, β^t) ≤ ε.
  end for
  return w^t = ∇g*(ṽ^t), α^t, β^t, v^t, and the duality gap P(w^t) − D(α^t, β^t).

and ρ^t = ∇h(w^t) is a subgradient of h at the solution w^t of

w^t = arg min_w [ −λ n w^⊤ v^t + λ n g(w) + λ h(w) ],

which achieves the first-order optimality condition

−λ n v^t + λ n ∇g(w^t) + λ ρ^t = 0

for some subgradient ∇g(w^t). The definition of ṽ implies that after each global update, we have

ṽ_l^t = ṽ^t = v^t − ρ^t / n = ∇g(w^t), for all l = 1, …, m.   (15)

Since the objective (12) for the local step on each machine l only depends on the mini-batch Q sampled from S_l and the vector ṽ_l^t, which needs to be synchronized at each global step, we know from (15) that at each time t, we can pass the same vector ṽ^t as ṽ_l^t to all nodes. In practice, it may be beneficial to pass Δṽ^t instead, especially when Δṽ^t is sparse but ṽ^t is dense. Putting things together, the local-global DADM iterations can be summarized in Algorithm 2. If we consider the special case of h(w) = 0, the solution of (15) is simply ṽ_l^t = ṽ^t = v^t, and the global step in Algorithm 2 can be simplified as first aggregating updates by

Δṽ^t = Δv^t = Σ_{l=1}^m (n_l / n) Δv_l^t,

and then updating the local parameters in parallel. Further, if h(w) = 0 and the data partition is balanced, that is, the n_l are identical for all l = 1, …, m, it can be verified that the DADM procedure (ignoring the mini-batch variation) is equivalent to CoCoA+. Therefore the framework presented here may be regarded as an alternative interpretation.

Moreover, when the added regularization in (1) is complex and might involve more than one non-smooth term, considering the splitting of g(w) and h(w) can bring computational advantages. For example, to promote both sparsity and group sparsity in the predictor we often use the sparse group lasso regularization (Friedman et al., 2010), where a combination of the L1 norm and the mixed L2/L1 (group sparse) norm is introduced:

λ₁ Σ_G ‖w_G‖₂ + λ₂ ‖w‖₁ + (λ/2) ‖w‖²,

where we add a slight L2 regularization to make it strongly convex, as was done in (Shalev-Shwartz and Zhang, 2014). The proximal mapping with respect to the sparse group lasso regularization function does not have a closed-form solution, and thus often relies on iterative minimization steps; but there are closed-form proximal mappings with respect to either the L2-L1 norm or the group norm. Thus if we simply set h(w) = 0 and let g(w) carry the whole regularizer, then neither the local optimization update (12) nor the global synchronization step (14) has a closed-form solution. However, if we assign the group norm to h(w), so that h(w) carries λ₁ Σ_G ‖w_G‖₂ and g(w) carries the remaining L2-L1 part, the local update steps (12) enjoy closed-form updates, which makes the implementation much easier, and we only need to use iterative minimization in the rare global synchronization step (14).

7. Convergence Analysis

Let w* be the optimal solution of the primal problem P(w), and let (α*, β*) be the optimal solution of the dual problem D(α, β), respectively. For the primal solution w^t and the dual solution (α^t, β^t) at iteration t, we define the primal sub-optimality as

ε_P^t := P(w^t) − P(w*),

and the dual sub-optimality as

ε_D^t := D(α*, β*) − D(α^t, β^t).

Due to the close relationship between the distributed dual formulation and the single-machine dual formulation, an analysis of DADM can be obtained by directly generalizing that of SDCA.
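The local step (Algorithm 1) and the simplified global step of Algorithm 2 for the case h(w) = 0 can be sketched end-to-end on a toy distributed ridge problem (g(w) = ½‖w‖², squared loss, balanced partitions); the data, λ, batch sizes, and the closed-form local coordinate update are illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_l, d, lam = 4, 25, 5, 0.5
n = m * n_l
Xs = [rng.normal(size=(n_l, d)) for _ in range(m)]
ys = [rng.normal(size=n_l) for _ in range(m)]

def local_step(Xl, yl, alpha_l, v_tilde, batch):
    """Algorithm 1: mini-batch coordinate maximization of the local dual (12)."""
    dv = np.zeros(d)
    for i in batch:
        w = v_tilde + dv                          # local primal estimate ∇g*(ṽ_l) = ṽ_l
        delta = (yl[i] - Xl[i] @ w - alpha_l[i]) / (1 + Xl[i] @ Xl[i] / (lam * n_l))
        alpha_l[i] += delta
        dv += Xl[i] * delta / (lam * n_l)         # Δv_l = Σ_{i∈Q} x_i Δα_i / (λ n_l)
    return dv

alphas = [np.zeros(n_l) for _ in range(m)]
v = np.zeros(d)                                   # with h = 0: ṽ = v and w = v
for t in range(200):                              # DADM iterations
    batches = [rng.choice(n_l, size=5, replace=False) for _ in range(m)]
    dvs = [local_step(Xs[l], ys[l], alphas[l], v, batches[l]) for l in range(m)]
    v = v + sum((n_l / n) * dv for dv in dvs)     # global aggregation: v += Σ (n_l/n) Δv_l

primal = sum(0.5 * np.sum((Xl @ v - yl) ** 2) for Xl, yl in zip(Xs, ys)) \
         + lam * n * 0.5 * v @ v
dual = sum(a @ yl - 0.5 * a @ a for a, yl in zip(alphas, ys)) - lam * n * 0.5 * v @ v
gap = primal - dual
```

Only the global aggregation line would require communication in a real deployment; the local steps touch only each machine's own shard.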
We consider two kinds of loss functions: smooth loss functions, which imply fast linear convergence, and general L-Lipschitz loss functions. For the following two theorems, we always assume that g is 1-strongly convex w.r.t. ‖·‖, ‖X_i‖ ≤ R for all i, M_l = |Q| is fixed on each machine l, and our local procedure optimizes D_Q^t sufficiently well on each machine such that D_Q^t(Δα_Q) ≥ D_Q^t(Δα_Q*), where Δα_Q* is given by a special choice in each theorem.

Theorem 6 Assume that each φ_i is (1/γ)-smooth w.r.t. ‖·‖ and Δα_Q* is given by

Δα_i := s_l ( u_i^{t−1} − α_i^{t−1} ), for all i ∈ Q,

where u_i^{t−1} := −∇φ_i(X_i^⊤ w^{t−1}) and s_l := λγn_l / (λγn_l + M_l R²) ∈ [0, 1]. To reach an expected duality gap of E[P(w^T) − D(α^T, β^T)] ≤ ε, every T satisfying the following condition is sufficient:

T ≥ ( R²/(λγ) + max_l n_l/M_l ) log( ( R²/(λγ) + max_l n_l/M_l ) · ε_D^{(0)} / ε ).   (16)

Theorem 7 Assume that each φ_i is L-Lipschitz w.r.t. ‖·‖, and Δα_Q* is given by

Δα_i := (q n_l / M_l) ( u_i^{t−1} − α_i^{t−1} ), for all i ∈ Q,

where u_i^{t−1} ∈ −∂φ_i(X_i^⊤ w^{t−1}) and q ∈ [0, min_l M_l/n_l]. To reach an expected normalized duality gap of E[ (P(w̄) − D(ᾱ, β̄)) / n ] ≤ ε, every T satisfying the following condition is sufficient:

T ≥ max{ 0, ñ log( λ ñ ε_D^{(0)} / (n G²) ) } + ñ + 5G²/(λ ε),   (17)

where T₀ ≥ max{ t₀, 4G²/(λ ε) − ñ + t₀ }, t₀ = max{ 0, ñ log( λ ñ ε_D^{(0)} / (n G²) ) }, ñ = max_l n_l/M_l, G² = 4R²L², and w̄, ᾱ, β̄ represent either the average vectors or randomly chosen vectors of w^{t−1}, α^{t−1}, β^{t−1} over t ∈ {T₀+1, …, T}, respectively, such as

ᾱ = (1/(T − T₀)) Σ_{t=T₀+1}^{T} α^{t−1},  β̄ = (1/(T − T₀)) Σ_{t=T₀+1}^{T} β^{t−1},  w̄ = (1/(T − T₀)) Σ_{t=T₀+1}^{T} w^{t−1}.

Remark 8 Both Theorem 6 and Theorem 7 incorporate two key components: the term max_l n_l/M_l and the condition-number term (R²/(λγ) or G²/(λε)). When the term max_l n_l/M_l dominates the iteration complexity, we can speed up convergence and reduce the number of communications by increasing the number of machines m or the local mini-batch size M_l. However, in some circumstances, when the condition number is large, it becomes the leading factor, and increasing m or M_l will not contribute to the computation speedup. To tackle this problem, we develop the accelerated version of DADM in Section 8.

Remark 9 Our method is closely related to previous distributed extensions of SDCA. Theorems 6 and 7, which provide theoretical guarantees for more general local updates, achieve the same iteration complexity as the ones in DisDCA, which only allows some special choices of local mini-batch updates. Compared with the theoretical results of CoCoA+, which are based on the Θ-approximate solution of the local dual subproblem, although the derived bounds are within the same scale, Õ(1/ε) for Lipschitz losses and Õ(log(1/ε)) for smooth losses, our bounds are different and complementary. The analysis of CoCoA+ can provide better insights for more accurate solutions of the local sub-problems.
While our analysis is based on the mini-batch setup, it can capture the contributions of the mini-batch size and the number of machines more explicitly.

Remark 10 Since the bounds are derived with a special choice of Δα_Q*, the actual performance of the algorithm can be significantly better than what is indicated by the bounds when the local duals are better optimized. For example, we can choose ProxSDCA in (Shalev-Shwartz and Zhang, 2014) as the local procedure and adopt the sequential update strategy, as the local solver of CoCoA+ does. This is also the one used in our experiments.
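To make Remarks 8-10 concrete, the toy calculation below plugs purely illustrative (made-up) values into a Theorem 6-style bound T ≈ κ_eff · log(κ_eff · ε_D⁰/ε) with κ_eff = R²/(λγ) + max_l n_l/M_l, and compares the well-conditioned and ill-conditioned regimes.

```python
import math

def dadm_iteration_bound(cond, n_over_M, eps_ratio):
    """Theorem 6-style bound T ≥ κ_eff·log(κ_eff·ε_D⁰/ε),
    with κ_eff = R²/(λγ) + max_l n_l/M_l (illustrative numbers only)."""
    k_eff = cond + n_over_M
    return k_eff * math.log(k_eff * eps_ratio)

# Well-conditioned: the mini-batch term dominates, so doubling the machines
# (halving max_l n_l/M_l) roughly halves the number of rounds.
easy_1x = dadm_iteration_bound(cond=10, n_over_M=1000, eps_ratio=1e6)
easy_2x = dadm_iteration_bound(cond=10, n_over_M=500, eps_ratio=1e6)

# Ill-conditioned: R²/(λγ) dominates and more machines barely help,
# which is the regime Acc-DADM (Section 8) is designed for.
hard_1x = dadm_iteration_bound(cond=1e5, n_over_M=1000, eps_ratio=1e6)
hard_2x = dadm_iteration_bound(cond=1e5, n_over_M=500, eps_ratio=1e6)
```

The ratio easy_2x/easy_1x is close to ½, while hard_2x/hard_1x stays close to 1, mirroring the dichotomy described in Remark 8.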

Algorithm 3 Accelerated Distributed Alternating Dual Maximization (Acc-DADM)
  Parameters: κ, η = λ/(λ + κ), ν = (1 − √η)/(1 + √η).
  Initialize: v^0 = y^0 = w^0 = 0, α^0 = 0, ξ^0 = (1 + η)(P(w^0) − D(α^0, β^0)).
  for t = 1, 2, …, T_outer do
    1. Construct the new objective:
       P_t(w) = Σ_{i=1}^n φ_i(X_i^⊤ w) + λ n g(w) + λ h(w) + (κ n / 2) ‖w − y^{t−1}‖².
    2. Call the DADM solver:
       (w^t, α^t, β^t, v^t, ε^t) = DADM(P_t, η ξ^{t−1}/(2(1 + η)), w^{t−1}, α^{t−1}, β^{t−1}, v^{t−1}).
    3. Update: y^t = w^t + ν (w^t − w^{t−1}).
    4. Update: ξ^t = (1 − η/2) ξ^{t−1}.
  end for
  Return w^{T_outer}.

8. Acceleration

Theorems 6 and 7 all imply that when the condition number (R²/(λγ) or L²R²/(λε)) is relatively small, DADM converges fast. However, the convergence may be slow when the condition number is large and dominates the iteration complexity. In fact, we observe empirically that the basic DADM method converges slowly when the regularization parameter λ is small. This phenomenon is also consistent with that of SDCA in the single-machine case. In this section, we introduce the Accelerated Distributed Alternating Dual Maximization (Acc-DADM) method that can alleviate the problem.

The procedure is motivated by (Shalev-Shwartz and Zhang, 2014), which employs an inner-outer iteration: at every iteration t, we solve a slightly modified objective, which adds a regularization term centered around the vector

y^{t−1} = w^{t−1} + ν ( w^{t−1} − w^{t−2} ),   (18)

where ν ∈ [0, 1] is called the momentum parameter. The accelerated DADM procedure described in Algorithm 3 can be similarly viewed as an inner-outer algorithm, where DADM serves as the inner iteration. In the outer iteration, we adjust the regularization vector y^{t−1}. That is, at each outer iteration t, we define a modified local primal objective on each machine l, which has the same form as the original local primal objective (9), except that g̃_l(w) is modified to g̃_l^t(w), defined by

λ n_l g̃_l^t(w) = λ n_l g_t(w) + β_l^⊤ w,   g_t(w) = g(w) + (κ/(2λ)) ‖w − y^{t−1}‖².
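The inner-outer structure of Algorithm 3 can be illustrated on a toy ridge problem in which the inner DADM stage is replaced by an exact solve of the κ-regularized objective P_t (a deliberate simplification); the data and constants are made up, and the momentum constant follows the ν = (1 − √η)/(1 + √η) choice above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam, kappa = 40, 6, 0.01, 0.5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
# reference minimizer of P(w) = ½‖Xw − y‖² + (λn/2)‖w‖²
w_star = np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

eta = lam / (lam + kappa)
nu = (1 - np.sqrt(eta)) / (1 + np.sqrt(eta))      # momentum parameter

A = X.T @ X + (lam + kappa) * n * np.eye(d)       # Hessian of the modified P_t
w = np.zeros(d)
y_c = np.zeros(d)                                  # the center y^{t-1}
for t in range(200):
    # inner stage: minimize P_t(w) = P(w) + (κn/2)‖w − y^{t-1}‖² (here: exactly)
    w_prev, w = w, np.linalg.solve(A, X.T @ y + kappa * n * y_c)
    y_c = w + nu * (w - w_prev)                    # outer momentum step (18)
err = float(np.linalg.norm(w - w_star))
```

Setting ν = 0 turns the same loop into a plain proximal-point scheme; the momentum step is what recovers the accelerated dependence on √((λ+κ)/λ) reflected in Theorem 11.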

It follows that we will need to solve a modified dual at each local step, with g replaced by g_t in the local dual problem (12). Therefore, compared to the basic DADM procedure, nothing changes other than g being replaced by g_t at each iteration. Specifically, when the number of machines m equals 1, this algorithm reduces to AccProxSDCA described in (Shalev-Shwartz and Zhang, 2014). Thus Acc-DADM can be naturally regarded as the distributed generalization of the single-machine AccProxSDCA. Moreover, Acc-DADM also allows arbitrary local procedures, as DADM does.

Our empirical studies show that Acc-DADM significantly outperforms DADM in many cases. There are probably two reasons. One reason is the use of a modified regularizer g_t(w) that is more strongly convex than the original regularizer g(w) when κ is much larger than λ. The other reason is closely related to the distributed setting considered in this paper. Observe that in the modified local primal objective

P_l^t(w | β_l) := P_l(w | β_l) + (κ n_l / 2) ‖w − y^{t−1}‖²,

the first term corresponds to the original local primal objective and the second term is an extra regularization due to acceleration that constrains w to be close to y^{t−1}. The effect is that different local problems become more similar to each other, which stabilizes the overall system.

8.1 Theoretical Results of Acc-DADM for Smooth Losses

The following theorem establishes the computational efficiency guarantees for Acc-DADM.

Theorem 11 Assume that each φ_i is (1/γ)-smooth, g is 1-strongly convex w.r.t. ‖·‖, ‖X_i‖ ≤ R for all i, and M_l = |Q| is fixed on each machine l. To obtain expected ε primal sub-optimality, E[P(w^t)] − P(w*) ≤ ε, it is sufficient to have the following number of stages in Algorithm 3:

T_outer ≥ ( 1 + √((λ + κ)/λ) ) log( 4 (λ + κ) ( P(w^0) − D(α^0, β^0) ) / (λ ε) ),

and the following number of inner iterations in DADM at each stage:

T_inner ≥ ( R²/((λ + κ)γ) + max_l n_l/M_l ) log( ( R²/((λ + κ)γ) + max_l n_l/M_l ) · (λ + κ)/λ ).

In particular, suppose we assume n₁ = n₂ = … = n_m and M₁ = M₂ = … = M_m = b; then the total vector computations for each machine are bounded by

Õ( T_outer T_inner b ) = Õ( √(1 + κ/λ) ( R²/((λ + κ)γ) + n/(m b) ) b ).

Remark 12 When κ = 0, the guarantees reduce to those of DADM. However, DADM only enjoys a linear speedup over ProxSDCA when the number of machines satisfies m ≤ n γ λ / R²,
However, DADM only enjoys a linear speedup over ProxSDCA when the number of machines satisfies $m \lesssim n\lambda\gamma/R^2$,

and it can only obtain a sub-linear speedup when $R^2/(\lambda\gamma) = \Omega(n/m)$. Besides enjoying the properties of DADM described above, if we choose $\kappa$ in Algorithm 3 as
$$\kappa = \frac{mR^2}{\gamma n}$$
and $b = 1$, then the total number of vector computations on each machine is bounded by
$$\tilde{O}\Big(\sqrt{\frac{mR^2}{\lambda\gamma n}}\cdot\frac{n}{m}\Big) = \tilde{O}\Big(\sqrt{\frac{R^2\,n}{\lambda\gamma\,m}}\Big),$$
which means Acc-DADM can be much faster than DADM when the condition number is large, and it always obtains a reduction in computation over the single-machine AccProxSDCA by a factor of $\tilde{O}(\sqrt{1/m})$.

8.2 Acceleration for Non-smooth, Lipschitz Losses

Theorem 11 establishes the rate of convergence for smooth loss functions, but the acceleration framework can also be used on non-smooth, Lipschitz loss functions. The main idea is to use Nesterov's smoothing technique (Nesterov, 2005) to construct a smooth approximation of the non-smooth function $\phi_i$, by adding a strongly convex regularization term to the conjugate of $\phi_i$:
$$\tilde{\phi}_i^*(\alpha) := \phi_i^*(\alpha) + \frac{\gamma}{2}\,\alpha^2.$$
By the properties of conjugate functions (e.g., Lemma 2 in Shalev-Shwartz and Zhang, 2014), we know that $\tilde{\phi}_i$, the conjugate function of $\tilde{\phi}_i^*$, is $1/\gamma$-smooth, and
$$\tilde{\phi}_i(u) \le \phi_i(u) \le \tilde{\phi}_i(u) + \frac{\gamma L^2}{2}.$$
Then, instead of the original objective with non-smooth losses, we minimize the smoothed objective:
$$\min_{w\in\mathbb{R}^d}\ \hat{P}(w) := \sum_{i=1}^n \tilde{\phi}_i(X_i^\top w) + n\lambda\,g(w) + h(w). \qquad (19)$$
The following corollary establishes the computational efficiency guarantees of Acc-DADM for non-smooth, Lipschitz loss functions.

Corollary 13 Assume that each $\phi_i$ is $L$-Lipschitz, $g$ is 1-strongly convex w.r.t. $\|\cdot\|$, $\|X_i\| \le R$ for all $i$, and $M_l = |Q_l|$ is fixed on each machine. To obtain an expected $\epsilon$ normalized primal sub-optimality,
$$\mathbb{E}\Big[\frac{P(w^{(t)})}{n}\Big] - \frac{P(w^*)}{n} \le \epsilon,$$
it is sufficient to run Algorithm 3 on the smoothed objective (19) with
$$\gamma = \frac{\epsilon}{L^2}$$
and the following number of stages
$$T_{\text{outer}} \ge \Big(1+\frac{2}{\sqrt{\eta}}\Big)\log\frac{4\,\xi_0}{\epsilon} = \tilde{O}\Big(\sqrt{\frac{\lambda+\kappa}{\lambda}}\,\Big(\log\frac{\lambda+\kappa}{\lambda} + \log\frac{P(0)-D(0,0)}{\epsilon}\Big)\Big),$$

and the following number of inner DADM iterations at each stage:
$$T_{\text{inner}} \ge \Big(\frac{L^2R^2}{\epsilon(\lambda+\kappa)} + \max_l\frac{n_l}{M_l}\Big)\,\log\Big(\Big(\frac{L^2R^2}{\epsilon(\lambda+\kappa)} + \max_l\frac{n_l}{M_l}\Big)\cdot\frac{\kappa(\lambda+\kappa)}{\lambda^2}\Big).$$
In particular, if we assume $n_1 = n_2 = \cdots = n_m$ and $M_1 = M_2 = \cdots = M_m = b$, then the total number of vector computations on each machine is bounded by
$$\tilde{O}(T_{\text{outer}}\,T_{\text{inner}}\,b) = \tilde{O}\Big(\sqrt{1+\frac{\kappa}{\lambda}}\,\Big(\frac{L^2R^2}{\epsilon(\lambda+\kappa)} + \frac{n}{mb}\Big)\,b\Big).$$

Remark 14 When $\kappa = 0$, the guarantees reduce to those of DADM for Lipschitz losses. Moreover, when $L^2R^2m/(n\epsilon) \gtrsim \lambda$, if we choose $\kappa$ in Algorithm 3 as
$$\kappa = \frac{mL^2R^2}{n\epsilon}$$
and $b = 1$, then the total number of vector computations on each machine is bounded by
$$\tilde{O}\Big(\sqrt{\frac{mL^2R^2}{\lambda n\epsilon}}\cdot\frac{n}{m}\Big) = \tilde{O}\Big(\sqrt{\frac{L^2R^2\,n}{\lambda\,\epsilon\,m}}\Big),$$
which means Acc-DADM can be much faster than DADM when $\epsilon$ is small, and it always obtains a reduction in computation over the single-machine AccProxSDCA by a factor of $\tilde{O}(\sqrt{1/m})$.

9. Proofs

In this section, we first present the proofs of the earlier propositions, in order to establish our framework solidly. Then, based on our new distributed dual formulation, we directly generalize the analysis of SDCA and adapt it to DADM in the commonly used mini-batch setup. Finally, we describe the proof of the theoretical guarantees of Acc-DADM.

9.1 Proof of Proposition 1

Proof Given any set of parameters $(w; \{w_l\}; \{u_i\}; \{\alpha_i\}; \{\beta_l\})$, we have
$$\min_{w;\{w_l\};\{u_i\}} J(w;\{w_l\};\{u_i\};\{\alpha_i\};\{\beta_l\}) = \min_{w;\{w_l\}}\ \underbrace{\sum_{l=1}^m\Big[\min_{u_i}\sum_{i\in S_l}\big(\phi_i(u_i) + \alpha_i\,(u_i - X_i^\top w_l)\big) + n_l\lambda\,g(w_l) + \beta_l^\top(w_l - w)\Big] + h(w)}_{A},$$

where the minimum is achieved at $\{u_i\}$ such that $\phi_i'(u_i) + \alpha_i = 0$. Eliminating $u_i$, we obtain
$$A = \min_{w;\{w_l\}}\ \sum_{l=1}^m\Big[\sum_{i\in S_l}\big(-\phi_i^*(-\alpha_i) - \alpha_i\,X_i^\top w_l\big) + n_l\lambda\,g(w_l) + \beta_l^\top(w_l - w)\Big] + h(w)$$
$$= \min_{w}\ \underbrace{\sum_{l=1}^m\min_{w_l}\Big[-\sum_{i\in S_l}\phi_i^*(-\alpha_i) - \Big(\sum_{i\in S_l}X_i\alpha_i - \beta_l\Big)^{\!\top}w_l + n_l\lambda\,g(w_l)\Big] - \Big(\sum_{l=1}^m\beta_l\Big)^{\!\top}w + h(w)}_{B},$$
where the minimum is achieved at $\{w_l\}$ such that $-\sum_{i\in S_l}X_i\alpha_i + \beta_l + n_l\lambda\,\nabla g(w_l) = 0$. Eliminating $w_l$, we obtain
$$B = \min_w\ \sum_{l=1}^m\Big[-\sum_{i\in S_l}\phi_i^*(-\alpha_i) - n_l\lambda\,g^*\Big(\frac{\sum_{i\in S_l}X_i\alpha_i - \beta_l}{n_l\lambda}\Big)\Big] - \Big(\sum_{l=1}^m\beta_l\Big)^{\!\top}w + h(w)$$
$$= \underbrace{\sum_{l=1}^m\Big[-\sum_{i\in S_l}\phi_i^*(-\alpha_i) - n_l\lambda\,g^*\Big(\frac{\sum_{i\in S_l}X_i\alpha_i - \beta_l}{n_l\lambda}\Big)\Big] - h^*\Big(\sum_{l=1}^m\beta_l\Big)}_{D(\alpha,\beta)},$$
where the minimizer is achieved at $w$ such that $-\sum_l\beta_l + \nabla h(w) = 0$. This completes the proof.

9.2 Proof of Proposition 2

Proof Given any $w$, if we take $u_i = X_i^\top w$ and $w_l = w$ for all $i$ and $l$, then $P(w) = J(w;\{w_l\};\{u_i\};\{\alpha_i\};\{\beta_l\})$ for arbitrary $\{\alpha_i\};\{\beta_l\}$. It follows from Proposition 1 that
$$P(w) = J(w;\{w_l\};\{u_i\};\{\alpha_i\};\{\beta_l\}) \ge D(\alpha,\beta).$$
Let $w^*$ be the minimizer of $P(w)$. When $w = w^*$, we may set $u_i = u_i^* = X_i^\top w^*$ and $w_l = w_l^* = w^*$. From the first-order optimality condition, we obtain
$$\sum_i X_i\,\phi_i'(X_i^\top w^*) + n\lambda\,\nabla g(w^*) + \nabla h(w^*) = 0.$$
If we take $\alpha_i^* = -\phi_i'(u_i^*)$ and $\beta_l^* = \sum_{i\in S_l}X_i\alpha_i^* - n_l\lambda\,\nabla g(w^*)$ for some subgradients, then it is not difficult to check that all equations in (6) are satisfied. It follows that we can achieve equality in Proposition 1:
$$P(w^*) = J(w^*;\{w_l^*\};\{u_i^*\};\{\alpha_i^*\};\{\beta_l^*\}) = D(\alpha^*,\beta^*).$$

This means that zero duality gap can be achieved at $w^*$. It is easy to verify that $(\alpha^*,\beta^*)$ maximizes $D(\alpha,\beta)$, since for any $(\alpha,\beta)$ we have
$$D(\alpha,\beta) \le J(w^*;\{w_l^*\};\{u_i^*\};\{\alpha_i\};\{\beta_l\}) = P(w^*) = D(\alpha^*,\beta^*).$$

9.3 Proof of Proposition 3

Proof We have the decompositions
$$D(\alpha,\beta) = \sum_{l=1}^m D_l(\alpha_{S_l}\mid\beta_l) - h^*\Big(\sum_{l=1}^m\beta_l\Big)$$
and
$$P(w) = \sum_{l=1}^m\big[P_l(w\mid\beta_l) - \beta_l^\top w\big] + h(w).$$
It follows that the duality gap is
$$P(w) - D(\alpha,\beta) = \sum_{l=1}^m\big[P_l(w\mid\beta_l) - D_l(\alpha_{S_l}\mid\beta_l)\big] + h^*\Big(\sum_l\beta_l\Big) + h(w) - \Big(\sum_l\beta_l\Big)^{\!\top}w.$$
Note that the definition of the convex conjugate implies
$$h^*\Big(\sum_l\beta_l\Big) + h(w) - \Big(\sum_l\beta_l\Big)^{\!\top}w \ge 0,$$
and equality holds when $\nabla h(w) = \sum_l\beta_l$. This implies the desired result.

9.4 Proof of Proposition 4

Proof It is easy to check, using duality, that for any $b$ and $w$:
$$-n\lambda\,g^*\Big(\frac{v-b}{n\lambda}\Big) - h^*(b) \le \Big[n\lambda\,g(w) - w^\top(v-b)\Big] + \Big[h(w) - b^\top w\Big] = -w^\top v + n\lambda\,g(w) + h(w),$$
and equality holds if $b = \nabla h(w)$ and $(v-b)/(n\lambda) = \nabla g(w)$ for some subgradients. Based on the assumptions, equality can be achieved at $b = \beta(v) = \nabla h(w(v))$ and $w = w(v)$. This proves the desired result, by noticing that $(v-b)/(n\lambda) = \nabla g(w)$ implies $w = \nabla g^*\big((v-b)/(n\lambda)\big)$.
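The Fenchel-Young argument in the proof of Proposition 4 can be checked numerically on a small instance. The scalar quadratics below are hypothetical choices ($g(w) = w^2/2$, $h(w) = c\,w^2/2$, so $g^*(z) = z^2/2$ and $h^*(b) = b^2/(2c)$), used only to verify that the maximum of the dual expression equals the minimum of the primal one, attained at $b = \nabla h(w(v))$.

```python
import numpy as np

# illustrative scalar instance (hypothetical g, h)
n_lam, c, v = 2.0, 3.0, 1.5   # n*lambda, curvature of h, fixed v

def lhs(b):
    # -n*lam * g*((v - b)/(n*lam)) - h*(b)
    return -(v - b) ** 2 / (2.0 * n_lam) - b ** 2 / (2.0 * c)

def rhs(w):
    # -w*v + n*lam * g(w) + h(w)
    return -w * v + n_lam * w ** 2 / 2.0 + c * w ** 2 / 2.0

bs = np.linspace(-5.0, 5.0, 100001)
ws = np.linspace(-5.0, 5.0, 100001)
best_b = bs[np.argmax(lhs(bs))]   # maximizer of the dual expression
best_w = ws[np.argmin(rhs(ws))]   # minimizer of the primal expression
```

On this instance the maximizing $b$ equals $c \cdot w(v) = \nabla h(w(v))$, matching the equality condition in the proposition.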

9.5 Proof of Proposition 5

Proof Since $\alpha$ is fixed, the problem $\max_\beta D(\alpha,\beta)$ is equivalent to
$$\max_\beta\ \sum_{l=1}^m\Big[-n_l\lambda\,g^*\Big(\frac{v_l(\alpha) - \beta_l}{n_l\lambda}\Big)\Big] - h^*\Big(\sum_{l=1}^m\beta_l\Big).$$
Now, using Jensen's inequality, we obtain for any $\beta$:
$$\sum_{l=1}^m n_l\lambda\,g^*\Big(\frac{v_l(\alpha)-\beta_l}{n_l\lambda}\Big) \ge n\lambda\,g^*\Big(\frac{\sum_l\big(v_l(\alpha)-\beta_l\big)}{n\lambda}\Big) = n\lambda\,g^*\Big(\frac{v(\alpha) - \sum_l\beta_l}{n\lambda}\Big),$$
and therefore
$$-\sum_{l=1}^m n_l\lambda\,g^*\Big(\frac{v_l(\alpha)-\beta_l}{n_l\lambda}\Big) - h^*\Big(\sum_l\beta_l\Big) \le -n\lambda\,g^*\Big(\frac{v(\alpha)-\beta(v(\alpha))}{n\lambda}\Big) - h^*\big(\beta(v(\alpha))\big).$$
In the above derivation, the last inequality uses Proposition 4. Here the equalities can be achieved when
$$\frac{v_l(\alpha)-\beta_l}{n_l} = \frac{v(\alpha)-\beta(v(\alpha))}{n}\quad\text{for all } l,$$
which can be obtained with the choice of $\{\beta_l\}$ given in the statement of the proposition.

9.6 Proof of Theorem 6

The following result is the mini-batch version of a related result in the analysis of ProxSDCA, which we apply to each local machine. The proof is included for completeness.

Lemma 15 Assume that $\phi_i^*$ is $\gamma$-strongly convex (where $\gamma$ can be zero) and $g^*$ is 1-smooth w.r.t. $\|\cdot\|$. At every local step, we randomly pick a mini-batch $Q_l \subseteq S_l$ of size $M_l := |Q_l|$ and optimize w.r.t. the dual variables $\alpha_i$, $i\in Q_l$. Then, using the simplified notation
$$P_l(w^{t-1}) = P_l(w^{t-1}\mid\beta_l^{t-1}), \qquad D_l(\alpha^{t-1}) = D_l(\alpha^{t-1}\mid\beta_l^{t-1}),$$
we have
$$\mathbb{E}\big[D_l(\alpha^{t}) - D_l(\alpha^{t-1})\big] \ge \frac{sM_l}{n_l}\,\mathbb{E}\big[P_l(w^{t-1}) - D_l(\alpha^{t-1})\big] - \frac{sM_l}{n_l}\,G_l^{(t)},$$
where
$$G_l^{(t)} := \frac{sM_l}{2n_l\lambda}\sum_{i\in S_l}\Big(\|X_i\|^2 - \frac{\lambda\gamma n_l(1-s)}{M_l s}\Big)\,\mathbb{E}\big[(u_i^{t-1}-\alpha_i^{t-1})^2\big],$$
$$\Delta\alpha_i := \alpha_i^{t} - \alpha_i^{t-1} = s\,\big(u_i^{t-1} - \alpha_i^{t-1}\big)\quad\text{for all } i\in Q_l, \qquad (21)$$

and $u_i^{t-1} = -\phi_i'(X_i^\top w^{t-1})$, $s\in[0,1]$.

Proof Since only the elements in $Q_l$ are updated, the improvement in the dual objective can be written as
$$D_l(\alpha^{t}) - D_l(\alpha^{t-1}) = \Big[-\sum_{i\in Q_l}\phi_i^*(-\alpha_i^{t}) - n_l\lambda\,g^*\Big(v_l^{t-1} + \frac{1}{n_l\lambda}\sum_{i\in Q_l}X_i\,\Delta\alpha_i\Big)\Big] - \Big[-\sum_{i\in Q_l}\phi_i^*(-\alpha_i^{t-1}) - n_l\lambda\,g^*(v_l^{t-1})\Big]$$
$$\ge \underbrace{-\sum_{i\in Q_l}\phi_i^*(-\alpha_i^{t}) - \nabla g^*(v_l^{t-1})^\top\sum_{i\in Q_l}X_i\,\Delta\alpha_i - \frac{1}{2n_l\lambda}\Big\|\sum_{i\in Q_l}X_i\,\Delta\alpha_i\Big\|^2}_{A_l} - \underbrace{\Big(-\sum_{i\in Q_l}\phi_i^*(-\alpha_i^{t-1})\Big)}_{B_l},$$
where we have used the fact that $g^*$ is 1-smooth in the derivation of the inequality. By the definition of the update in the algorithm, and the definition $\Delta\alpha_i = s\,(u_i^{t-1}-\alpha_i^{t-1})$, $s\in[0,1]$, we have
$$A_l = -\sum_{i\in Q_l}\phi_i^*\big(-\alpha_i^{t-1} - s(u_i^{t-1}-\alpha_i^{t-1})\big) - s\,\nabla g^*(v_l^{t-1})^\top\sum_{i\in Q_l}X_i(u_i^{t-1}-\alpha_i^{t-1}) - \frac{s^2}{2n_l\lambda}\Big\|\sum_{i\in Q_l}X_i(u_i^{t-1}-\alpha_i^{t-1})\Big\|^2. \qquad (21)$$
From now on, we omit the superscript $t-1$. Since $\phi_i^*$ is $\gamma$-strongly convex, we have
$$\phi_i^*\big(-\alpha_i - s(u_i-\alpha_i)\big) = \phi_i^*\big(-(s\,u_i + (1-s)\,\alpha_i)\big) \le s\,\phi_i^*(-u_i) + (1-s)\,\phi_i^*(-\alpha_i) - \frac{\gamma}{2}\,s(1-s)\,(u_i-\alpha_i)^2. \qquad (22)$$

Bringing Eq. (22) into Eq. (21), we get
$$A_l - B_l \ge \sum_{i\in Q_l}\Big[s\,\phi_i^*(-\alpha_i) - s\,\phi_i^*(-u_i) + \frac{\gamma}{2}s(1-s)(u_i-\alpha_i)^2 - s\,w^\top X_i(u_i-\alpha_i)\Big] - \frac{s^2 M_l}{2n_l\lambda}\sum_{i\in Q_l}\|X_i\|^2(u_i-\alpha_i)^2,$$
where $w = \nabla g^*(v_l^{t-1})$, and the second inequality uses the fact that $\|\sum_{i\in Q_l}a_i\|^2 \le M_l\sum_{i\in Q_l}\|a_i\|^2$. Since we choose $u_i = -\phi_i'(X_i^\top w)$ for some subgradient $\phi_i'(X_i^\top w)$, Fenchel-Young equality yields
$$-w^\top X_i\,u_i - \phi_i^*(-u_i) = \phi_i(X_i^\top w),$$
and we obtain
$$A_l - B_l \ge s\sum_{i\in Q_l}\Big[\phi_i(X_i^\top w) + \phi_i^*(-\alpha_i) + w^\top X_i\alpha_i\Big] - \frac{s^2 M_l}{2n_l\lambda}\sum_{i\in Q_l}\Big(\|X_i\|^2 - \frac{\lambda\gamma n_l(1-s)}{M_l s}\Big)(u_i-\alpha_i)^2. \qquad (23)$$
Recall that, with $w = \nabla g^*(\tilde v)$, we have $g(w) + g^*(\tilde v) = w^\top\tilde v$. We can then write the local duality gap as
$$P_l(w\mid\beta_l) - D_l(\alpha\mid\beta_l) = \sum_{i\in S_l}\phi_i(X_i^\top w) + n_l\lambda\,g(w) + \beta_l^\top w + \sum_{i\in S_l}\phi_i^*(-\alpha_i) + n_l\lambda\,g^*\Big(\frac{\sum_{i\in S_l}X_i\alpha_i - \beta_l}{n_l\lambda}\Big)$$
$$= \sum_{i\in S_l}\Big[\phi_i(X_i^\top w) + \phi_i^*(-\alpha_i) + w^\top X_i\alpha_i\Big].$$

Then, taking the expectation of Eq. (23) w.r.t. the random choice of the mini-batch $Q_l$ at round $t$ (each $i\in S_l$ belongs to $Q_l$ with probability $M_l/n_l$), we obtain
$$\mathbb{E}_t[A_l - B_l] \ge \frac{sM_l}{n_l}\sum_{i\in S_l}\Big[\phi_i(X_i^\top w) + \phi_i^*(-\alpha_i) + w^\top X_i\alpha_i\Big] - \frac{s^2M_l^2}{2n_l^2\lambda}\sum_{i\in S_l}\Big(\|X_i\|^2 - \frac{\lambda\gamma n_l(1-s)}{M_l s}\Big)(u_i-\alpha_i)^2.$$
Taking the expectation of both sides w.r.t. the randomness in the previous iterations, we have
$$\mathbb{E}[A_l - B_l] \ge \frac{sM_l}{n_l}\,\mathbb{E}\big[P_l(w)-D_l(\alpha)\big] - \frac{sM_l}{n_l}\,G_l^{(t)},\qquad G_l^{(t)} := \frac{sM_l}{2n_l\lambda}\sum_{i\in S_l}\Big(\|X_i\|^2 - \frac{\lambda\gamma n_l(1-s)}{M_l s}\Big)\,\mathbb{E}\big[(u_i^{t-1}-\alpha_i^{t-1})^2\big].$$

Proof of Theorem 6. Proof We apply Lemma 15 with
$$s = \Big(1 + \frac{R^2M_l}{\lambda\gamma n_l}\Big)^{-1} = \frac{\lambda\gamma n_l}{\lambda\gamma n_l + M_lR^2}\ \in [0,1], \qquad l = 1,\ldots,m.$$
Recall that $\|X_i\| \le R$ for all $i\in S_l$; then we have
$$\|X_i\|^2 - \frac{\lambda\gamma n_l(1-s)}{M_l s} = \|X_i\|^2 - R^2 \le 0 \quad\text{for all } i\in S_l,$$
which implies $G_l^{(t)} \le 0$ for all $l$. It follows that, after the local update step, for all $l$ we have
$$\mathbb{E}\big[D_l(\alpha^{t}\mid\beta_l^{t-1})\big] - D_l(\alpha^{t-1}\mid\beta_l^{t-1}) \ge \frac{s M_l}{n_l}\,\mathbb{E}\big[P_l(w^{t-1}\mid\beta_l^{t-1}) - D_l(\alpha^{t-1}\mid\beta_l^{t-1})\big]. \qquad (24)$$
Now, note that after the global step at iteration $t-1$, the choice of $w^{t-1}$ and $\beta^{t-1}$ in DADM is made according to Proposition 4 and Proposition 5. It follows from

Proposition 5 that the following relationship between the global and local duality gaps holds at the beginning of the $t$-th iteration:
$$P(w^{t-1}) - D(\alpha^{t-1},\beta^{t-1}) = \sum_l\Big[P_l(w^{t-1}\mid\beta_l^{t-1}) - D_l(\alpha^{t-1}\mid\beta_l^{t-1})\Big].$$
Using this decomposition and summing over $l$ in (24), we obtain
$$\mathbb{E}\big[D(\alpha^{t},\beta^{t-1}) - D(\alpha^{t-1},\beta^{t-1})\big] \ge q\,\mathbb{E}\big[P(w^{t-1}) - D(\alpha^{t-1},\beta^{t-1})\big],$$
where
$$q = \min_l\frac{s M_l}{n_l} = \min_l\frac{\lambda\gamma M_l}{\lambda\gamma n_l + M_lR^2}.$$
Since $D(\alpha^{t},\beta^{t}) \ge D(\alpha^{t},\beta^{t-1})$, we obtain
$$\mathbb{E}\big[D(\alpha^{t},\beta^{t}) - D(\alpha^{t-1},\beta^{t-1})\big] \ge q\,\mathbb{E}\big[P(w^{t-1}) - D(\alpha^{t-1},\beta^{t-1})\big].$$
Let $(\alpha^*,\beta^*)$ be an optimal solution of the dual problem, and define the dual sub-optimality as $\epsilon_D^{t} := D(\alpha^*,\beta^*) - D(\alpha^{t},\beta^{t})$. Let $\epsilon_G^{t-1} = P(w^{t-1}) - D(\alpha^{t-1},\beta^{t-1})$, and note that $\epsilon_D^{t-1} \le \epsilon_G^{t-1}$. It follows that
$$\mathbb{E}[\epsilon_D^{t-1}] \ge \mathbb{E}[\epsilon_D^{t-1} - \epsilon_D^{t}] \ge q\,\mathbb{E}[\epsilon_G^{t-1}] \ge q\,\mathbb{E}[\epsilon_D^{t-1}].$$
Therefore we have
$$q\,\mathbb{E}[\epsilon_G^{t}] \le \mathbb{E}[\epsilon_D^{t}] \le (1-q)\,\mathbb{E}[\epsilon_D^{t-1}] \le (1-q)^{t}\,\epsilon_D^{0} \le e^{-qt}\,\epsilon_D^{0}.$$
To obtain an expected duality gap of $\mathbb{E}[\epsilon_G^{T}] \le \epsilon$, every $T$ satisfying
$$T \ge \frac{1}{q}\,\log\frac{\epsilon_D^{0}}{q\,\epsilon}$$
is sufficient. This proves the desired bound.

9.7 Proof of Theorem 7

Now we consider $L$-Lipschitz loss functions and use the following basic lemma for $L$-Lipschitz losses, taken from (Shalev-Shwartz and Zhang, 2013, 2014).

Lemma 16 Let $\phi:\mathbb{R}\to\mathbb{R}$ be an $L$-Lipschitz function. Then $\phi^*(\alpha) = \infty$ for any $\alpha$ such that $|\alpha| > L$.

Proof of Theorem 7. Proof Applying Lemma 15 with $\gamma = 0$, we have
$$G_l^{(t)} = \frac{sM_l}{2n_l\lambda}\sum_{i\in S_l}\|X_i\|^2\,\mathbb{E}\big[(u_i^{t-1}-\alpha_i^{t-1})^2\big].$$

According to Lemma 16, we know that $|u_i^{t-1}| \le L$ and $|\alpha_i^{t-1}| \le L$, and thus
$$(u_i^{t-1}-\alpha_i^{t-1})^2 \le \big(|u_i^{t-1}| + |\alpha_i^{t-1}|\big)^2 \le 4L^2.$$
Recall that $\|X_i\| \le R$; then $G_l^{(t)} \le \frac{sM_l}{2n_l\lambda}\cdot 4n_lL^2R^2 = \frac{2sM_lL^2R^2}{\lambda}$. Combining this with Lemma 15, we have
$$\mathbb{E}\big[D_l(\alpha^{t}\mid\beta_l^{t-1})\big] - D_l(\alpha^{t-1}\mid\beta_l^{t-1}) \ge \frac{sM_l}{n_l}\,\mathbb{E}\big[P_l(w^{t-1}\mid\beta_l^{t-1}) - D_l(\alpha^{t-1}\mid\beta_l^{t-1})\big] - \frac{sM_l}{n_l}\cdot\frac{2sM_lL^2R^2}{\lambda}. \qquad (25)$$
We also note that, after the global step at iteration $t-1$, the choices of $w^{t-1}$ and $\beta^{t-1}$ in DADM follow Proposition 4 and Proposition 5. It follows from Proposition 5 that the global duality gap decomposes into the local duality gaps at the beginning of the $t$-th iteration:
$$P(w^{t-1}) - D(\alpha^{t-1},\beta^{t-1}) = \sum_l\Big[P_l(w^{t-1}\mid\beta_l^{t-1}) - D_l(\alpha^{t-1}\mid\beta_l^{t-1})\Big].$$
Summing the inequality (25) over $l$, combining with the above decomposition, and using $D(\alpha^{t},\beta^{t}) \ge D(\alpha^{t},\beta^{t-1})$, we get
$$\mathbb{E}\big[D(\alpha^{t},\beta^{t}) - D(\alpha^{t-1},\beta^{t-1})\big] \ge q\,\mathbb{E}\big[P(w^{t-1}) - D(\alpha^{t-1},\beta^{t-1})\big] - \frac{q^2 n\bar G}{2\lambda}, \qquad (26)$$
where $q = s_lM_l/n_l \in [0,\min_l M_l/n_l]$, with $s_l\in[0,1]$ chosen so that all the $s_lM_l/n_l$, $l=1,\ldots,m$, are equal, and $\bar G := 4R^2L^2$.

Let $(\alpha^*,\beta^*)$ be an optimal solution of the dual problem $D(\alpha,\beta)$, and define the dual sub-optimality $\epsilon_D^{t} := D(\alpha^*,\beta^*) - D(\alpha^{t},\beta^{t})$. Since the duality gap is an upper bound of the dual sub-optimality, $P(w^{t-1}) - D(\alpha^{t-1},\beta^{t-1}) \ge \epsilon_D^{t-1}$, (26) implies
$$\mathbb{E}[\epsilon_D^{t}] \le (1-q)\,\mathbb{E}[\epsilon_D^{t-1}] + \frac{q^2 n\bar G}{2\lambda}.$$
Starting from this recursion, we can apply the same analysis as for $L$-Lipschitz loss functions in single-machine SDCA (Shalev-Shwartz and Zhang, 2013) to obtain the desired inequality
$$\mathbb{E}[\epsilon_D^{t}] \le \frac{2n\bar G}{\lambda\,(2\tilde n + t - t_0)}\quad\text{for all } t \ge t_0 = \max\Big(0,\ \Big\lceil\tilde n\,\log\frac{2\lambda\,\epsilon_D^{0}}{n\bar G}\Big\rceil\Big), \qquad (27)$$
where $\tilde n = \max_l n_l/M_l$. Further applying the same strategies as in (Shalev-Shwartz and Zhang, 2013), based on (27), proves the desired bound.
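The smoothing construction used for Lipschitz losses is fully explicit for the hinge loss $\phi(u) = \max(0, 1-u)$ (so $L = 1$): its conjugate is finite only on $[-1, 0]$, consistent with Lemma 16, and adding $(\gamma/2)\alpha^2$ to the conjugate and conjugating back gives the standard smoothed hinge in closed form. The sketch below (function names are ours) checks the sandwich bound $\tilde\phi \le \phi \le \tilde\phi + \gamma L^2/2$.

```python
import numpy as np

def hinge(u):
    # phi(u) = max(0, 1 - u), an L-Lipschitz loss with L = 1
    return np.maximum(0.0, 1.0 - u)

def smoothed_hinge(u, gamma):
    # Conjugate of phi*(alpha) + (gamma/2)*alpha^2 on the domain alpha in [-1, 0]:
    # piecewise closed form of the Nesterov-smoothed hinge.
    u = np.asarray(u, dtype=float)
    return np.where(u >= 1.0, 0.0,
           np.where(u <= 1.0 - gamma, 1.0 - u - gamma / 2.0,
                    (1.0 - u) ** 2 / (2.0 * gamma)))

gamma = 0.5
u = np.linspace(-3.0, 3.0, 1201)
gap = hinge(u) - smoothed_hinge(u, gamma)   # should lie in [0, gamma * L^2 / 2]
```

The gap equals exactly $\gamma/2$ on the linear part ($u \le 1-\gamma$) and shrinks to 0 for $u \ge 1$, matching the sandwich bound used in the proof of Corollary 13.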

9.8 Proof of Theorem 11

Our proof strategy follows (Shalev-Shwartz and Zhang, 2014) and (Frostig et al., 2015), both of which use the acceleration technique of (Nesterov, 2004) on top of approximate proximal point steps. The main differences are that here we warm-start with two groups of dual variables, $\alpha$ and $\beta$, whereas (Shalev-Shwartz and Zhang, 2014) warm-starts only with $\alpha$, as it considers the single-machine setting, and (Frostig et al., 2015) warm-starts from the primal variable $w$.

Proof The proof consists of the following steps. In Lemma 17 we show that one can construct a quadratic lower bound of the original objective $P(w)$ from an approximate minimizer of the proximal objective $P_t(w)$. Using the quadratic lower bound, we construct an estimate sequence, based on which, in Lemma 18, we prove the accelerated convergence rate of the outer loop. We show in Lemma 19 that, by warm-starting the iterates from the last stage, the dual sub-optimality at the next stage is small. Based on Lemma 19, the contraction factor between the initial dual sub-optimality and the target primal-dual gap at stage $t$ can be upper bounded by
$$\frac{D_t(\alpha^{t}_{\mathrm{opt}},\beta^{t}_{\mathrm{opt}}) - D_t(\alpha^{t-1},\beta^{t-1})}{\eta\,\xi_{t-1}/(2(1+\eta))} \le \frac{\tilde\epsilon_{t-1}}{\eta\,\xi_{t-1}/(2(1+\eta))} \le \frac{36\,\kappa\,\xi_{t-3}}{\lambda}\cdot\frac{2(1+\eta)}{\eta\,\xi_{t-1}} = O\Big(\frac{\kappa(\lambda+\kappa)}{\lambda^2}\Big),$$
where the last step uses $\xi_{t-1} \ge (1-\sqrt\eta/2)^2\,\xi_{t-3} \ge \xi_{t-3}/4$ and $\eta = \lambda/(\lambda+\kappa)$. Thus, using the results for plain DADM (Theorem 6), the number of inner iterations in each stage is upper bounded by
$$\chi\,\log\Big(\chi\cdot\frac{D_t(\alpha^{t}_{\mathrm{opt}},\beta^{t}_{\mathrm{opt}}) - D_t(\alpha^{t-1},\beta^{t-1})}{\eta\,\xi_{t-1}/(2(1+\eta))}\Big) = O\Big(\chi\,\log\Big(\chi\cdot\frac{\kappa(\lambda+\kappa)}{\lambda^2}\Big)\Big),$$
where $\chi = \dfrac{R^2}{\gamma(\lambda+\kappa)} + \max_l\dfrac{n_l}{M_l}$.

9.9 Proof of Corollary 13

By the property of $\tilde\phi_i(u)$, for every $w$ we have
$$\frac{\hat P(w)}{n} \le \frac{P(w)}{n} \le \frac{\hat P(w)}{n} + \frac{\gamma L^2}{2};$$
thus, if we find a predictor $w^{t}$ that is $\epsilon/2$-suboptimal with respect to $\hat P(w)/n$, i.e.,
$$\frac{\hat P(w^{t})}{n} - \min_w\frac{\hat P(w)}{n} \le \frac{\epsilon}{2},$$
and we choose $\gamma = \epsilon/L^2$, then $w^t$ must be $\epsilon$-suboptimal with respect to $P(w)/n$, because
$$\frac{P(w^{t})}{n} - \frac{P(w^*)}{n} \le \frac{\hat P(w^{t})}{n} + \frac{\gamma L^2}{2} - \min_w\frac{\hat P(w)}{n} \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
The rest of the proof follows the smooth case, as proved in Theorem 11.

Dual subproblems in Acc-DADM Define $\tilde\lambda = \lambda + \kappa$ and $\tilde\lambda\,f(w) = \lambda\,g(w) + \frac{\kappa}{2}\|w\|^2$. Let
$$P_t(w) = \sum_{i=1}^n\phi_i(X_i^\top w) + n\lambda\,g(w) + h(w) + \frac{\kappa n}{2}\,\|w - y^{(t-1)}\|^2$$
$$= \sum_{i=1}^n\phi_i(X_i^\top w) + n\tilde\lambda\,f(w) - \kappa n\,w^\top y^{(t-1)} + h(w) + \frac{\kappa n}{2}\,\|y^{(t-1)}\|^2$$
be the global primal problem to solve, and
$$P_{l,t}(w) = \sum_{i\in S_l}\phi_i(X_i^\top w) + n_l\lambda\,g(w) + \frac{\kappa n_l}{2}\,\|w - y^{(t-1)}\|^2$$
be the separated local problem. Given each dual variable $\beta_l$, we also define the adjusted local primal problem as
$$P_{l,t}(w\mid\beta_l) = \sum_{i\in S_l}\phi_i(X_i^\top w) + n_l\lambda\,g(w) + \beta_l^\top w + \frac{\kappa n_l}{2}\,\|w - y^{(t-1)}\|^2.$$
It is not hard to see that the adjusted local dual problem is
$$D_{l,t}(\alpha\mid\beta_l) = -\sum_{i\in S_l}\phi_i^*(-\alpha_i) - n_l\tilde\lambda\,f^*\Big(\frac{\sum_{i\in S_l}X_i\alpha_i - \beta_l + \kappa n_l\,y^{(t-1)}}{n_l\tilde\lambda}\Big) + \frac{\kappa n_l}{2}\,\|y^{(t-1)}\|^2,$$
and the global dual objective can be written as
$$D_t(\alpha,\beta) = \sum_{l=1}^m D_{l,t}(\alpha_{S_l}\mid\beta_l) - h^*\Big(\sum_{l=1}^m\beta_l\Big).$$

Quadratic lower bound for $P(w)$ based on the approximate proximal point algorithm Since $P_t(w) = P(w) + \frac{\kappa n}{2}\|w - y^{(t-1)}\|^2$, let $w_{\mathrm{opt}}^{t} = \arg\min_w P_t(w)$. The following lemma shows that we can construct a lower bound of $P(w)$ from an approximate minimizer of $P_t(w)$.

Lemma 17 Let $w^+$ be an $\epsilon$-approximate minimizer of $P_t(w)$, i.e., $P_t(w^+) \le P_t(w_{\mathrm{opt}}^{t}) + \epsilon$. Then we can construct the following quadratic lower bound for $P(w)$: for all $w$,
$$P(w) \ge P(w^+) + Q(w;\,w^+,\,y^{(t-1)},\,\epsilon), \qquad (28)$$
where
$$Q(w;\,w^+,\,y^{(t-1)},\,\epsilon) = \frac{\lambda n}{4}\,\Big\|w - w^+ - \frac{2\kappa}{\lambda}\,\big(w^+ - y^{(t-1)}\big)\Big\|^2 - \frac{\kappa^2 n}{\lambda}\,\|w^+ - y^{(t-1)}\|^2 - 2\Big(1+\frac{\kappa}{\lambda}\Big)\,\epsilon.$$

Proof Since $w_{\mathrm{opt}}^{t}$ is the minimizer of the $(\lambda+\kappa)n$-strongly convex objective $P_t(w)$, we have, for all $w$,
$$P_t(w) \ge P_t(w_{\mathrm{opt}}^{t}) + \frac{(\lambda+\kappa)n}{2}\,\|w - w_{\mathrm{opt}}^{t}\|^2 \ge P_t(w^+) - \epsilon + \frac{(\lambda+\kappa)n}{2}\,\|w - w_{\mathrm{opt}}^{t}\|^2,$$
which is equivalent to
$$P(w) \ge P(w^+) + \frac{\kappa n}{2}\,\|w^+ - y^{(t-1)}\|^2 - \frac{\kappa n}{2}\,\|w - y^{(t-1)}\|^2 - \epsilon + \frac{(\lambda+\kappa)n}{2}\,\|w - w_{\mathrm{opt}}^{t}\|^2.$$
Moreover, strong convexity also gives $\frac{(\lambda+\kappa)n}{2}\,\|w^+ - w_{\mathrm{opt}}^{t}\|^2 \le P_t(w^+) - P_t(w_{\mathrm{opt}}^{t}) \le \epsilon$. For $\theta = \frac{\lambda}{2(\lambda+\kappa)} \in (0,1)$ we have
$$\|w - w_{\mathrm{opt}}^{t}\|^2 \ge (1-\theta)\,\|w - w^+\|^2 - \Big(\frac{1}{\theta}-1\Big)\,\|w^+ - w_{\mathrm{opt}}^{t}\|^2,$$
and re-organizing terms we get
$$P(w) \ge P(w^+) + \Big(\frac{\lambda n}{4} + \frac{\kappa n}{2}\Big)\|w - w^+\|^2 - \frac{\kappa n}{2}\,\|w - y^{(t-1)}\|^2 + \frac{\kappa n}{2}\,\|w^+ - y^{(t-1)}\|^2 - 2\Big(1+\frac{\kappa}{\lambda}\Big)\,\epsilon.$$

Also, note that the right-hand side of the above inequality is a quadratic function with respect to $w$: collecting the two quadratic terms,
$$\Big(\frac{\lambda n}{4}+\frac{\kappa n}{2}\Big)\|w-w^+\|^2 - \frac{\kappa n}{2}\,\|w-y^{(t-1)}\|^2 = \frac{\lambda n}{4}\,\Big\|w - w^+ - \frac{2\kappa}{\lambda}\big(w^+ - y^{(t-1)}\big)\Big\|^2 - \frac{\kappa n}{2}\Big(1+\frac{2\kappa}{\lambda}\Big)\|w^+ - y^{(t-1)}\|^2,$$
whose minimum over $w$ is achieved at $w = w^+ + \frac{2\kappa}{\lambda}\,(w^+ - y^{(t-1)})$. Since
$$\frac{\kappa n}{2}\,\|w^+ - y^{(t-1)}\|^2 - \frac{\kappa n}{2}\Big(1+\frac{2\kappa}{\lambda}\Big)\|w^+ - y^{(t-1)}\|^2 = -\frac{\kappa^2 n}{\lambda}\,\|w^+ - y^{(t-1)}\|^2,$$
with the above we have finished the proof of Lemma 17.

Convergence proof Define the following sequence of quadratic functions:
$$\psi_0(w) = P(w^{(0)}) + \frac{(\lambda+\kappa)n}{4}\,\|w\|^2 - \big(P(0) - D(0,0)\big),$$
and, for $t \ge 1$,
$$\psi_t(w) = \Big(1-\frac{\sqrt\eta}{2}\Big)\,\psi_{t-1}(w) + \frac{\sqrt\eta}{2}\,\Big[P(w^{(t)}) + Q\big(w;\,w^{(t)},\,y^{(t-1)},\,\tilde\epsilon_t\big)\Big],$$
where $\eta = \lambda/(\lambda+\kappa)$. We first calculate the explicit form of the quadratic function $\psi_t(w)$ and its minimizer $v_t = \arg\min_w\psi_t(w)$. Clearly $v_0 = 0$, and since each $\psi_t$ is a convex quadratic whose Hessian is at least $\frac{\lambda n}{2}I$, it can be written in the form
$$\psi_t(w) = \psi_t(v_t) + \frac{c_t}{2}\,\|w - v_t\|^2, \qquad c_t \ge \frac{\lambda n}{2}.$$

Based on the definition of $\psi_{t+1}(w)$ and the fact that $v_{t+1}$ minimizes $\psi_{t+1}(w)$, the first-order optimality condition gives
$$\Big(1-\frac{\sqrt\eta}{2}\Big)\,c_t\,(v_{t+1}-v_t) + \frac{\sqrt\eta}{2}\cdot\frac{\lambda n}{2}\,\Big(v_{t+1} - y^{(t)} - \Big(1+\frac{2\kappa}{\lambda}\Big)\big(w^{(t+1)} - y^{(t)}\big)\Big) = 0;$$
rearranging, $v_{t+1}$ is a convex combination
$$v_{t+1} = (1-\tau_t)\,v_t + \tau_t\,\Big[y^{(t)} + \Big(1+\frac{2\kappa}{\lambda}\Big)\big(w^{(t+1)} - y^{(t)}\big)\Big]$$
for an appropriate $\tau_t \in (0,1]$. The following lemma proves the convergence of $w^{(t)}$ to the minimizer $w^*$.

Lemma 18 Let $\tilde\epsilon_t \le \frac{\eta}{2(1+\eta)}\,\xi_t$ and $\xi_t = \big(1-\frac{\sqrt\eta}{2}\big)^{t}\,\xi_0$. Then we have the following convergence guarantee:
$$P(w^{(t)}) - P(w^*) \le \xi_t.$$
Proof It is sufficient to prove
$$P(w^{(t)}) - \min_w\psi_t(w) \le \xi_t, \qquad (29)$$
since $\psi_t(w) \le P(w)$ for all $w$ by construction (each term entering the convex combination is a lower bound of $P$), and then
$$P(w^{(t)}) - P(w^*) \le P(w^{(t)}) - \psi_t(w^*) \le P(w^{(t)}) - \min_w\psi_t(w) \le \xi_t.$$
We prove Eq. (29) by induction. When $t = 0$, we have
$$P(w^{(0)}) - \psi_0(v_0) = P(0) - D(0,0) \le \xi_0,$$

which verifies that (29) is true for $t = 0$. Suppose the claim holds for some $t \ge 0$; for stage $t+1$, writing $\theta = \sqrt\eta/2$, we have
$$\psi_{t+1}(v_{t+1}) = (1-\theta)\,\Big[\psi_t(v_t) + \frac{c_t}{2}\,\|v_{t+1}-v_t\|^2\Big] + \theta\,\Big[P(w^{(t+1)}) + Q\big(v_{t+1};\,w^{(t+1)},\,y^{(t)},\,\tilde\epsilon_{t+1}\big)\Big]$$
$$= (1-\theta)\,\psi_t(v_t) + \theta\,P(w^{(t+1)}) + (1-\theta)\,\frac{c_t}{2}\,\|v_{t+1}-v_t\|^2 + \theta\,\frac{\lambda n}{4}\,\Big\|v_{t+1} - y^{(t)} - \Big(1+\frac{2\kappa}{\lambda}\Big)\big(w^{(t+1)}-y^{(t)}\big)\Big\|^2$$
$$\qquad - \theta\,\frac{\kappa^2 n}{\lambda}\,\|w^{(t+1)} - y^{(t)}\|^2 - 2\theta\,\Big(1+\frac{\kappa}{\lambda}\Big)\,\tilde\epsilon_{t+1}.$$
Substituting the expression of $v_{t+1}$ as a convex combination of $v_t$ and $y^{(t)} + (1+2\kappa/\lambda)(w^{(t+1)}-y^{(t)})$, expanding the two squared norms, and using $2\langle a,b\rangle \le \|a\|^2 + \|b\|^2$ to absorb the cross terms, we obtain
$$\psi_{t+1}(v_{t+1}) \ge (1-\theta)\,\psi_t(v_t) + \theta\,P(w^{(t+1)}) - \theta^2\,\frac{\kappa^2 n}{\lambda}\,\|y^{(t)} - w^{(t+1)}\|^2 + \theta(1-\theta)\,\frac{\lambda n}{2}\Big(1+\frac{2\kappa}{\lambda}\Big)\,\big\langle v_t - y^{(t)},\,y^{(t)} - w^{(t+1)}\big\rangle - 2\theta\,\Big(1+\frac{\kappa}{\lambda}\Big)\,\tilde\epsilon_{t+1}.$$


Multispectral Remote Sensing Image Classification Algorithm Based on Rough Set Theory Proceedngs of the 2009 IEEE Internatona Conference on Systems Man and Cybernetcs San Antono TX USA - October 2009 Mutspectra Remote Sensng Image Cassfcaton Agorthm Based on Rough Set Theory Yng Wang Xaoyun

More information

Cyclic Codes BCH Codes

Cyclic Codes BCH Codes Cycc Codes BCH Codes Gaos Feds GF m A Gaos fed of m eements can be obtaned usng the symbos 0,, á, and the eements beng 0,, á, á, á 3 m,... so that fed F* s cosed under mutpcaton wth m eements. The operator

More information

Chapter 6. Rotations and Tensors

Chapter 6. Rotations and Tensors Vector Spaces n Physcs 8/6/5 Chapter 6. Rotatons and ensors here s a speca knd of near transformaton whch s used to transforms coordnates from one set of axes to another set of axes (wth the same orgn).

More information

Price Competition under Linear Demand and Finite Inventories: Contraction and Approximate Equilibria

Price Competition under Linear Demand and Finite Inventories: Contraction and Approximate Equilibria Prce Competton under Lnear Demand and Fnte Inventores: Contracton and Approxmate Equbra Jayang Gao, Krshnamurthy Iyer, Huseyn Topaogu 1 Abstract We consder a compettve prcng probem where there are mutpe

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

Key words. corner singularities, energy-corrected finite element methods, optimal convergence rates, pollution effect, re-entrant corners

Key words. corner singularities, energy-corrected finite element methods, optimal convergence rates, pollution effect, re-entrant corners NESTED NEWTON STRATEGIES FOR ENERGY-CORRECTED FINITE ELEMENT METHODS U. RÜDE1, C. WALUGA 2, AND B. WOHLMUTH 2 Abstract. Energy-corrected fnte eement methods provde an attractve technque to dea wth eptc

More information

Xin Li Department of Information Systems, College of Business, City University of Hong Kong, Hong Kong, CHINA

Xin Li Department of Information Systems, College of Business, City University of Hong Kong, Hong Kong, CHINA RESEARCH ARTICLE MOELING FIXE OS BETTING FOR FUTURE EVENT PREICTION Weyun Chen eartment of Educatona Informaton Technoogy, Facuty of Educaton, East Chna Norma Unversty, Shangha, CHINA {weyun.chen@qq.com}

More information

QUARTERLY OF APPLIED MATHEMATICS

QUARTERLY OF APPLIED MATHEMATICS QUARTERLY OF APPLIED MATHEMATICS Voume XLI October 983 Number 3 DIAKOPTICS OR TEARING-A MATHEMATICAL APPROACH* By P. W. AITCHISON Unversty of Mantoba Abstract. The method of dakoptcs or tearng was ntroduced

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

COXREG. Estimation (1)

COXREG. Estimation (1) COXREG Cox (972) frst suggested the modes n whch factors reated to fetme have a mutpcatve effect on the hazard functon. These modes are caed proportona hazards (PH) modes. Under the proportona hazards

More information

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012 MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:

More information

A General Column Generation Algorithm Applied to System Reliability Optimization Problems

A General Column Generation Algorithm Applied to System Reliability Optimization Problems A Genera Coumn Generaton Agorthm Apped to System Reabty Optmzaton Probems Lea Za, Davd W. Cot, Department of Industra and Systems Engneerng, Rutgers Unversty, Pscataway, J 08854, USA Abstract A genera

More information

Application of Particle Swarm Optimization to Economic Dispatch Problem: Advantages and Disadvantages

Application of Particle Swarm Optimization to Economic Dispatch Problem: Advantages and Disadvantages Appcaton of Partce Swarm Optmzaton to Economc Dspatch Probem: Advantages and Dsadvantages Kwang Y. Lee, Feow, IEEE, and Jong-Bae Par, Member, IEEE Abstract--Ths paper summarzes the state-of-art partce

More information

Monica Purcaru and Nicoleta Aldea. Abstract

Monica Purcaru and Nicoleta Aldea. Abstract FILOMAT (Nš) 16 (22), 7 17 GENERAL CONFORMAL ALMOST SYMPLECTIC N-LINEAR CONNECTIONS IN THE BUNDLE OF ACCELERATIONS Monca Purcaru and Ncoeta Adea Abstract The am of ths paper 1 s to fnd the transformaton

More information

An Augmented Lagrangian Coordination-Decomposition Algorithm for Solving Distributed Non-Convex Programs

An Augmented Lagrangian Coordination-Decomposition Algorithm for Solving Distributed Non-Convex Programs An Augmented Lagrangan Coordnaton-Decomposton Agorthm for Sovng Dstrbuted Non-Convex Programs Jean-Hubert Hours and Con N. Jones Abstract A nove augmented Lagrangan method for sovng non-convex programs

More information

Correspondence. Performance Evaluation for MAP State Estimate Fusion I. INTRODUCTION

Correspondence. Performance Evaluation for MAP State Estimate Fusion I. INTRODUCTION Correspondence Performance Evauaton for MAP State Estmate Fuson Ths paper presents a quanttatve performance evauaton method for the maxmum a posteror (MAP) state estmate fuson agorthm. Under dea condtons

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

CSC 411 / CSC D11 / CSC C11

CSC 411 / CSC D11 / CSC C11 18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t

More information

MAXIMUM NORM STABILITY OF DIFFERENCE SCHEMES FOR PARABOLIC EQUATIONS ON OVERSET NONMATCHING SPACE-TIME GRIDS

MAXIMUM NORM STABILITY OF DIFFERENCE SCHEMES FOR PARABOLIC EQUATIONS ON OVERSET NONMATCHING SPACE-TIME GRIDS MATHEMATICS OF COMPUTATION Voume 72 Number 242 Pages 619 656 S 0025-57180201462-X Artce eectroncay pubshed on November 4 2002 MAXIMUM NORM STABILITY OF DIFFERENCE SCHEMES FOR PARABOLIC EQUATIONS ON OVERSET

More information

The Application of BP Neural Network principal component analysis in the Forecasting the Road Traffic Accident

The Application of BP Neural Network principal component analysis in the Forecasting the Road Traffic Accident ICTCT Extra Workshop, Bejng Proceedngs The Appcaton of BP Neura Network prncpa component anayss n Forecastng Road Traffc Accdent He Mng, GuoXucheng &LuGuangmng Transportaton Coege of Souast Unversty 07

More information

Predicting Model of Traffic Volume Based on Grey-Markov

Predicting Model of Traffic Volume Based on Grey-Markov Vo. No. Modern Apped Scence Predctng Mode of Traffc Voume Based on Grey-Marov Ynpeng Zhang Zhengzhou Muncpa Engneerng Desgn & Research Insttute Zhengzhou 5005 Chna Abstract Grey-marov forecastng mode of

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Short-Term Load Forecasting for Electric Power Systems Using the PSO-SVR and FCM Clustering Techniques

Short-Term Load Forecasting for Electric Power Systems Using the PSO-SVR and FCM Clustering Techniques Energes 20, 4, 73-84; do:0.3390/en40073 Artce OPEN ACCESS energes ISSN 996-073 www.mdp.com/journa/energes Short-Term Load Forecastng for Eectrc Power Systems Usng the PSO-SVR and FCM Custerng Technques

More information

DISTRIBUTED PROCESSING OVER ADAPTIVE NETWORKS. Cassio G. Lopes and Ali H. Sayed

DISTRIBUTED PROCESSING OVER ADAPTIVE NETWORKS. Cassio G. Lopes and Ali H. Sayed DISTRIBUTED PROCESSIG OVER ADAPTIVE ETWORKS Casso G Lopes and A H Sayed Department of Eectrca Engneerng Unversty of Caforna Los Angees, CA, 995 Ema: {casso, sayed@eeucaedu ABSTRACT Dstrbuted adaptve agorthms

More information

Achieving Optimal Throughput Utility and Low Delay with CSMA-like Algorithms: A Virtual Multi-Channel Approach

Achieving Optimal Throughput Utility and Low Delay with CSMA-like Algorithms: A Virtual Multi-Channel Approach Achevng Optma Throughput Utty and Low Deay wth SMA-ke Agorthms: A Vrtua Mut-hanne Approach Po-Ka Huang, Student Member, IEEE, and Xaojun Ln, Senor Member, IEEE Abstract SMA agorthms have recenty receved

More information

Inexact Newton Methods for Inverse Eigenvalue Problems

Inexact Newton Methods for Inverse Eigenvalue Problems Inexact Newton Methods for Inverse Egenvalue Problems Zheng-jan Ba Abstract In ths paper, we survey some of the latest development n usng nexact Newton-lke methods for solvng nverse egenvalue problems.

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

Lecture 4. Instructor: Haipeng Luo

Lecture 4. Instructor: Haipeng Luo Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would

More information

SIMULTANEOUS wireless information and power transfer. Joint Optimization of Power and Data Transfer in Multiuser MIMO Systems

SIMULTANEOUS wireless information and power transfer. Joint Optimization of Power and Data Transfer in Multiuser MIMO Systems Jont Optmzaton of Power and Data ransfer n Mutuser MIMO Systems Javer Rubo, Antono Pascua-Iserte, Dane P. Paomar, and Andrea Godsmth Unverstat Potècnca de Cataunya UPC, Barceona, Span ong Kong Unversty

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

NP-Completeness : Proofs

NP-Completeness : Proofs NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem

More information

Finding low error clusterings

Finding low error clusterings Fndng ow error custerngs Mara-Forna Bacan Mcrosoft Research, New Engand One Memora Drve, Cambrdge, MA mabacan@mcrosoft.com Mark Braverman Mcrosoft Research, New Engand One Memora Drve, Cambrdge, MA markbrav@mcrosoft.com

More information

[WAVES] 1. Waves and wave forces. Definition of waves

[WAVES] 1. Waves and wave forces. Definition of waves 1. Waves and forces Defnton of s In the smuatons on ong-crested s are consdered. The drecton of these s (μ) s defned as sketched beow n the goba co-ordnate sstem: North West East South The eevaton can

More information

Which Separator? Spring 1

Which Separator? Spring 1 Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016 U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and

More information

Research Article H Estimates for Discrete-Time Markovian Jump Linear Systems

Research Article H Estimates for Discrete-Time Markovian Jump Linear Systems Mathematca Probems n Engneerng Voume 213 Artce ID 945342 7 pages http://dxdoorg/11155/213/945342 Research Artce H Estmates for Dscrete-Tme Markovan Jump Lnear Systems Marco H Terra 1 Gdson Jesus 2 and

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

The Entire Solution Path for Support Vector Machine in Positive and Unlabeled Classification 1

The Entire Solution Path for Support Vector Machine in Positive and Unlabeled Classification 1 Abstract The Entre Souton Path for Support Vector Machne n Postve and Unabeed Cassfcaton 1 Yao Lmn, Tang Je, and L Juanz Department of Computer Scence, Tsnghua Unversty 1-308, FIT, Tsnghua Unversty, Bejng,

More information

Downlink Power Allocation for CoMP-NOMA in Multi-Cell Networks

Downlink Power Allocation for CoMP-NOMA in Multi-Cell Networks Downn Power Aocaton for CoMP-NOMA n Mut-Ce Networs Md Shpon A, Eram Hossan, Arafat A-Dwe, and Dong In Km arxv:80.0498v [eess.sp] 6 Dec 207 Abstract Ths wor consders the probem of dynamc power aocaton n

More information

arxiv: v1 [cs.gt] 28 Mar 2017

arxiv: v1 [cs.gt] 28 Mar 2017 A Dstrbuted Nash qubrum Seekng n Networked Graphca Games Farzad Saehsadaghan, and Lacra Pave arxv:7009765v csgt 8 Mar 07 Abstract Ths paper consders a dstrbuted gossp approach for fndng a Nash equbrum

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper Games of Threats Elon Kohlberg Abraham Neyman Workng Paper 18-023 Games of Threats Elon Kohlberg Harvard Busness School Abraham Neyman The Hebrew Unversty of Jerusalem Workng Paper 18-023 Copyrght 2017

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

Solutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution.

Solutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution. Solutons HW #2 Dual of general LP. Fnd the dual functon of the LP mnmze subject to c T x Gx h Ax = b. Gve the dual problem, and make the mplct equalty constrants explct. Soluton. 1. The Lagrangan s L(x,

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

Greyworld White Balancing with Low Computation Cost for On- Board Video Capturing

Greyworld White Balancing with Low Computation Cost for On- Board Video Capturing reyword Whte aancng wth Low Computaton Cost for On- oard Vdeo Capturng Peng Wu Yuxn Zoe) Lu Hewett-Packard Laboratores Hewett-Packard Co. Pao Ato CA 94304 USA Abstract Whte baancng s a process commony

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

Analysis of Bipartite Graph Codes on the Binary Erasure Channel

Analysis of Bipartite Graph Codes on the Binary Erasure Channel Anayss of Bpartte Graph Codes on the Bnary Erasure Channe Arya Mazumdar Department of ECE Unversty of Maryand, Coege Par ema: arya@umdedu Abstract We derve densty evouton equatons for codes on bpartte

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

Delay tomography for large scale networks

Delay tomography for large scale networks Deay tomography for arge scae networks MENG-FU SHIH ALFRED O. HERO III Communcatons and Sgna Processng Laboratory Eectrca Engneerng and Computer Scence Department Unversty of Mchgan, 30 Bea. Ave., Ann

More information