A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization
Journal of Machine Learning Research. Submitted 9/16; Revised 1/17; Published 1/17.

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

Shun Zheng, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China (zhengs14@mails.tsinghua.edu.cn)
Jialei Wang, Department of Computer Science, The University of Chicago, Chicago, Illinois (jialei@uchicago.edu)
Fen Xia, Beijing Wisdom Uranium Technology Co., Ltd., Beijing, China (xiafen@ebrain.ai)
Wei Xu, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China (weixu@tsinghua.edu.cn)
Tong Zhang, Tencent AI Lab, Shenzhen, China (tongzhang@tongzhang-ml.org)

Editor: Sathiya Keerthi

Abstract

In modern large-scale machine learning applications, the training data are often partitioned and stored on multiple machines. It is customary to employ the data parallelism approach, where the aggregated training loss is minimized without moving data across machines. In this paper, we introduce a novel distributed dual formulation for regularized loss minimization problems that can directly handle data parallelism in the distributed setting. This formulation allows us to systematically derive dual coordinate optimization procedures, which we refer to as Distributed Alternating Dual Maximization (DADM). The framework extends earlier studies described in (Boyd et al., 2011; Ma et al., 2017; Jaggi et al., 2014; Yang, 2013) and has rigorous theoretical analyses. Moreover, with the help of the new formulation, we develop the accelerated version of DADM (Acc-DADM) by generalizing the acceleration technique of Shalev-Shwartz and Zhang (2014) to the distributed setting. We also provide theoretical results for the proposed accelerated version, and the new result improves previous ones (Yang, 2013; Ma et al., 2017) whose iteration complexities grow linearly in the condition number. Our empirical studies validate our theory and show that our accelerated approach significantly improves the previous state-of-the-art distributed dual coordinate optimization algorithms.

* Most of the work was done during the internship of Shun Zheng at Baidu Big Data Lab in Beijing.

©2017 Shun Zheng, Jialei Wang, Fen Xia, Wei Xu, and Tong Zhang.
License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/ for attribution requirements.
Keywords: distributed optimization, stochastic dual coordinate ascent, acceleration, regularized loss minimization, computational complexity

1. Introduction

In large-scale machine learning applications for big data analysis, it has become common practice to partition the training data and store them on multiple machines connected via a commodity network. A typical setting of distributed machine learning is to allow these machines to train in parallel, with each machine processing its local data with no data communication. This paradigm is often referred to as data parallelism. To reduce the overall training time, it is often necessary to increase the number of machines and to minimize the communication overhead. A significant challenge is to reduce the training time as much as possible when we increase the number of machines. A practical solution requires two research directions: one is to improve the underlying system design, making it suitable for machine learning algorithms (Dean and Ghemawat, 2008; Zaharia et al., 2010; Dean et al., 2012; Li et al., 2014); the other is to adapt traditional single-machine optimization methods to handle data parallelism (Boyd et al., 2011; Yang, 2013; Mahajan et al., 2013; Shamir et al., 2014; Jaggi et al., 2014; Mahajan et al., 2017; Ma et al., 2017; Takáč et al., 2015; Zhang and Lin, 2015). This paper focuses on the latter.

For big data machine learning on a single machine, there are two types of algorithms: batch algorithms such as gradient descent or L-BFGS (Liu and Nocedal, 1989), and stochastic optimization algorithms such as stochastic gradient descent and their modern variance-reduced versions (Defazio et al., 2014; Johnson and Zhang, 2013). It is known that batch algorithms are relatively easy to parallelize. However, on a single machine, they converge more slowly than the modern stochastic optimization algorithms due to their high per-iteration computation costs. Specifically, it has been shown that the modern stochastic optimization algorithms converge faster than the traditional batch algorithms for convex regularized loss minimization problems. The faster convergence can be guaranteed in theory and observed in practice.
The fast convergence of modern stochastic optimization methods has led to studies that extend these methods to the distributed computing setting. Specifically, this paper considers the generalization of the Stochastic Dual Coordinate Ascent (SDCA) method (Hsieh et al., 2008; Shalev-Shwartz and Zhang, 2013) and its proximal variant (Shalev-Shwartz and Zhang, 2014) to handle distributed training using data parallelism. Although this problem has been considered previously (Yang, 2013; Jaggi et al., 2014; Ma et al., 2017), these earlier approaches work with a dual formulation that is the same as the traditional single-machine dual formulation, where the dual variables are coupled, and hence they run into difficulties when they try to motivate and analyze the derived methods in the distributed environment. One contribution of this work is to introduce a new dual formulation specifically for distributed regularized loss minimization problems when data are distributed over multiple machines. In our new formulation, we decouple the local dual variables by introducing another dual variable β. This unique dual formulation allows us to naturally extend the proximal SDCA algorithm (ProxSDCA) of Shalev-Shwartz and Zhang (2014) to the setting of multi-machine distributed optimization that can benefit from data parallelism. Moreover, the analysis of the original ProxSDCA can be easily adapted to the new formulation, leading to new theoretical results. This new dual formulation can also be combined with the acceleration technique of Shalev-Shwartz and Zhang (2014) to further improve convergence.

In the proposed formulation, each iteration of the distributed dual coordinate ascent optimization is naturally decomposed into a local step and a global step. In the local step, we allow the use of any local procedure to optimize a local dual objective function using local parameters and local data on each machine. This flexibility is similar to that of (Ma et al., 2017; Jaggi et al., 2014). For example, we may apply ProxSDCA as the local procedure. In the local step, a computer node can perform the optimization independently, without communicating with the other nodes, while in the global step, the nodes communicate with each other to synchronize the local parameters and jointly update the global primal solution. Only this global step requires communication among the nodes.

We summarize our main contributions as follows:

New distributed dual formulation. This new formulation naturally leads to a two-step local-global dual alternating optimization procedure for distributed machine learning. We thus call the resulting procedure Distributed Alternating Dual Maximization (DADM). Note that DADM directly generalizes ProxSDCA, which can handle complex regularizations such as L2-L1 regularization.

New convergence analysis. The new formulation allows us to directly generalize the analysis of ProxSDCA in (Shalev-Shwartz and Zhang, 2014) to the distributed setting. This analysis is in contrast to that of CoCoA+ in (Ma et al., 2017), which employs a different approach based on the Θ-approximate solution assumption on the local solver. Our analysis can lead to simplified results in the commonly used mini-batch setup.

Acceleration with theoretical guarantees. Based on the new distributed dual formulation, we can naturally derive a distributed version of the accelerated proximal SDCA method (AccProxSDCA) of Shalev-Shwartz and Zhang (2014), which has been shown to be effective on a single machine. We call the resulting procedure Accelerated Distributed Alternating Dual Maximization (Acc-DADM).
The main idea is to modify the original formulation using a sequence of approximations that have stronger regularizations. Moreover, we directly adapt the theoretical analyses of AccProxSDCA to the distributed setting and provide guarantees for Acc-DADM. Our theorems guarantee that we can always obtain a computation speedup compared with the single-machine AccProxSDCA. These guarantees improve the theoretical results of DADM and of previous methods (Yang, 2013; Ma et al., 2017) whose iteration complexities grow linearly in the condition number; the latter methods possibly fail to provide computation time improvement over the single-machine ProxSDCA when the condition number is large.

Extensive empirical studies. We perform extensive experiments to compare the convergence and the scalability of the accelerated approach with those of previous state-of-the-art distributed dual coordinate ascent methods. Our empirical studies show that Acc-DADM can achieve faster convergence and better scalability than the previous state-of-the-art, in particular when the condition number is large. This phenomenon is consistent with our theory.
We organize the rest of the paper as follows. Section 2 discusses related work. Section 3 provides preliminary definitions. Sections 4 to 6 present the distributed primal formulation, the distributed dual formulation, and our DADM method, respectively. Section 7 then provides theorems for DADM. Section 8 introduces the accelerated version and provides the corresponding theoretical guarantees. Section 9 includes all proofs of this paper. Section 10 provides extensive empirical studies of our novel method. Finally, Section 11 concludes the paper.

2. Related Work

Several generalizations of SDCA to the distributed setting have been proposed in the literature, including DisDCA (Yang, 2013), CoCoA (Jaggi et al., 2014), and CoCoA+ (Ma et al., 2017). DisDCA was the first attempt to study distributed SDCA, and it provided a basic theoretical analysis and a practical variant that behaves well empirically. Nevertheless, its theoretical result only applies to a few specially chosen mini-batch local dual updates that differ from the practical method used in its experiments. In particular, it did not show that optimizing each local dual problem leads to convergence. This limitation makes the methods analyzed there inflexible. CoCoA was proposed to fix the above gap between theory and practice, and it was claimed to be a framework for distributed dual coordinate ascent in that it allows any local dual solver to be used for the local dual problem, rather than the impractical choices of DisDCA. However, the actual performance of CoCoA is inferior to the practical variant proposed in DisDCA with an aggressive local update. We note that the practical variant of DisDCA did not have a solid theoretical guarantee at that time. CoCoA+ fixed this situation and may be regarded as a generalization of CoCoA. The most effective choice of the aggregation parameter leads to a version that is similar to DisDCA, but allows exact optimization of each dual problem in the theory. According to the studies in (Ma et al., 2017), the resulting CoCoA+ algorithm performs significantly better than the original CoCoA both theoretically and empirically.
The original CoCoA+ (Ma et al., 2015) can only handle problems with the L2 regularizer, and it was generalized to general strongly convex regularizers in the long version (Ma et al., 2017). Besides, Smith et al. (2016) extended the framework to solve the primal problem of regularized loss minimization and to cover general non-strongly convex regularizers such as the L1 regularizer, and Hsieh et al. (2015) studied parallel SDCA with asynchronous updates. Although CoCoA+ has the advantage of allowing arbitrary local solvers and flexible approximate solutions of the local dual problems, its theoretical analyses do not capture the contribution of the number of machines and the mini-batch size to the iteration complexity explicitly. Moreover, the iteration complexities of both CoCoA+ and DisDCA grow linearly with the condition number. Thus they probably cannot provide computation time improvement over the single-machine SDCA when the condition number is large. This paper remedies these unsatisfactory aspects by providing a different analysis based on a new distributed dual formulation. Using this formulation, we can analyze procedures that take an arbitrary local dual solver, as in CoCoA+; moreover, we allow the dual updates to be a mini-batch, as in DisDCA. Besides, this formulation also allows
us to naturally generalize AccProxSDCA and the relevant theoretical results to the distributed setting. Our empirical results also validate the superiority of the accelerated approach.

While we focus on extending SDCA in this paper, we note that there are other approaches to parallel optimization. For example, there are direct attempts to parallelize stochastic gradient descent (Niu et al., 2011; Zinkevich et al., 2010). Some of these procedures only consider the multi-core shared-memory situation, which is very different from the distributed computing environment investigated in this paper. In the setting of distributed computing, data are partitioned across multiple machines, and one often needs to study communication-efficient algorithms. In such cases, one extreme is to allow exact optimization of the subproblems on each local machine, as considered in (Shamir et al., 2014; Zhang and Lin, 2015). Although this approach minimizes communication, the computation cost for each local solver can dominate the overall training. Therefore in practice, it is necessary to make a trade-off by using the mini-batch update approach (Takáč et al., 2013, 2015). However, it is difficult for traditional mini-batch methods to design reasonable aggregation strategies to achieve fast convergence. Takáč et al. (2015) studied how the step size can be reduced when the mini-batch size grows in the distributed setting. Lee and Roth (2015) derived an analytical solution of the optimal step size for dual linear support vector machine problems. Besides, Mahajan et al. (2013) presented a general framework for distributed optimization based on local functional approximation, which includes several first-order and second-order methods as special cases. Mahajan et al. (2017) considered each machine to handle a block of coordinates and proposed distributed block coordinate descent methods for solving L1-regularized loss minimization problems. Different from those methods, the Distributed Alternating Dual Maximization (DADM) proposed in this work handles the trade-off between computation and communication by developing bounds for mini-batch dual updates, which is similar to (Yang, 2013).
Moreover, DADM allows other, better local solvers to achieve faster convergence in practice.

3. Preliminaries

In this section, we introduce some notation used later. All functions that we consider in this paper are proper convex functions over a Euclidean space. Given a function $f: \mathbb{R}^d \to \mathbb{R}$, we denote its conjugate function as
$$f^*(b) = \sup_a \left[ b^\top a - f(a) \right].$$
A function $f: \mathbb{R}^d \to \mathbb{R}$ is $L$-Lipschitz with respect to $\|\cdot\|$ if for all $a, b \in \mathbb{R}^d$, we have $|f(a) - f(b)| \le L \|a - b\|$. A function $f: \mathbb{R}^d \to \mathbb{R}$ is $(1/\gamma)$-smooth with respect to $\|\cdot\|$ if it is differentiable and its gradient is $(1/\gamma)$-Lipschitz with respect to $\|\cdot\|$. An equivalent definition is that for all $a, b \in \mathbb{R}^d$, we have
$$f(b) \le f(a) + \nabla f(a)^\top (b - a) + \frac{1}{2\gamma} \|b - a\|^2.$$
A function $f: \mathbb{R}^d \to \mathbb{R}$ is $\gamma$-strongly convex with respect to $\|\cdot\|$ if for any $a, b \in \mathbb{R}^d$, we have
$$f(b) \ge f(a) + \nabla f(a)^\top (b - a) + \frac{\gamma}{2} \|b - a\|^2,$$
where $\nabla f(a)$ is any subgradient of $f$ at $a$. It is well known that a function $f$ is $\gamma$-strongly convex with respect to $\|\cdot\|$ if and only if its conjugate function $f^*$ is $(1/\gamma)$-smooth with respect to $\|\cdot\|$.

4. Distributed Primal Formulation

In this paper, we consider the following generic regularized loss minimization problem, which is often encountered in practical machine learning problems:
$$\min_{w \in \mathbb{R}^d} \left[ P(w) := \sum_{i=1}^n \phi_i(X_i^\top w) + \lambda n\, g(w) + h(w) \right]. \qquad (1)$$
Here we assume each $X_i \in \mathbb{R}^{d \times q}$ is a $d \times q$ matrix, $w \in \mathbb{R}^d$ is the model parameter vector, $\phi_i(u)$ is a convex loss function defined on $\mathbb{R}^q$ that is associated with the $i$-th data point, $\lambda > 0$ is the regularization parameter, $g(w)$ is a strongly convex regularizer, and $h(w)$ is another convex regularizer. A special case is to simply set $h(w) = 0$. Here we allow the more general formulation, which can be used to derive different distributed dual forms that may be useful for special purposes.

The above optimization formulation can be specialized to a variety of machine learning problems. As an example, we may consider the L2-L1 regularized least squares problem, where $\phi_i(x_i^\top w) = (w^\top x_i - y_i)^2$ for vector input data $x_i \in \mathbb{R}^d$ and real-valued output $y_i \in \mathbb{R}$, $g(w) = \frac{1}{2}\|w\|^2 + a\|w\|_1$, and $h(w) = b\|w\|_1$ for some $a, b \ge 0$.

If we set $h(w) = 0$, then it is well known (see, for example, Shalev-Shwartz and Zhang, 2014) that the primal problem (1) has an equivalent single-machine dual form of
$$\max_{\alpha} \left[ D(\alpha) := -\sum_{i=1}^n \phi_i^*(-\alpha_i) - \lambda n\, g^*\!\left( \frac{\sum_{i=1}^n X_i \alpha_i}{\lambda n} \right) \right], \qquad (2)$$
where $\alpha = [\alpha_1, \ldots, \alpha_n]$, $\alpha_i \in \mathbb{R}^q$ ($i = 1, \ldots, n$) are dual variables, $\phi_i^*$ is the convex conjugate function of $\phi_i$, and similarly, $g^*$ is the convex conjugate function of $g$. The stochastic dual coordinate ascent method, referred to as SDCA in (Shalev-Shwartz and Zhang, 2014), maximizes the dual formulation by optimizing one randomly chosen dual variable at each iteration. Throughout the algorithm, the following primal-dual relationship is maintained:
$$w(\alpha) = \nabla g^*\!\left( \frac{\sum_{i=1}^n X_i \alpha_i}{\lambda n} \right) \qquad (3)$$
for some subgradient $\nabla g^*(v)$. It is known that $w(\alpha^*) = w^*$, where $w^*$ and $\alpha^*$ are optimal solutions of the primal problem and the dual problem, respectively. It was shown in (Shalev-Shwartz and Zhang, 2014) that the duality gap, defined as $P(w(\alpha)) - D(\alpha)$, which is an upper bound on the primal sub-optimality $P(w(\alpha)) - P(w^*)$, converges to zero.
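The primal (1), the dual (2), and the primal-dual map (3) can be made concrete on a small instance. The following sketch (our own toy example, not code from the paper) instantiates them for ridge regression, i.e. $\phi_i(u) = \frac{1}{2}(u - y_i)^2$, $g(w) = \frac{1}{2}\|w\|^2$, $h = 0$, so that $\phi_i^*(-a) = \frac{1}{2}a^2 - y_i a$ and $\nabla g^*(v) = v$:

```python
import numpy as np

# Illustrative sketch (not from the paper): primal (1), dual (2), and the
# primal-dual map (3) for ridge regression with phi_i(u) = (u - y_i)^2 / 2,
# g(w) = ||w||^2 / 2, h = 0. All helper names are our own.
rng = np.random.default_rng(0)
n, d, lam = 20, 5, 0.1
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def P(w):  # primal objective (1)
    return 0.5 * np.sum((X @ w - y) ** 2) + 0.5 * lam * n * (w @ w)

def D(alpha):  # dual objective (2); phi_i^*(-a) = a^2/2 - y_i*a, g^* = ||.||^2/2
    v = X.T @ alpha / (lam * n)
    return -np.sum(0.5 * alpha ** 2 - y * alpha) - 0.5 * lam * n * (v @ v)

def w_of(alpha):  # primal-dual relationship (3): w(alpha) = grad g^*(v) = v
    return X.T @ alpha / (lam * n)

alpha = rng.normal(size=n)                 # an arbitrary dual point
assert P(w_of(alpha)) - D(alpha) >= 0      # the duality gap is non-negative

# At the optimum, alpha_i = -phi_i'(x_i^T w*) and the gap vanishes.
w_star = np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)
alpha_star = y - X @ w_star
assert abs(P(w_of(alpha_star)) - D(alpha_star)) < 1e-8
```

The last two lines verify the zero-gap property at the optimum: plugging $\alpha_i^* = y_i - x_i^\top w^*$ into (3) recovers $w^*$ via the normal equations.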
Moreover, a convergence rate can be established. In particular, for smooth loss functions, the convergence rate is linear.

We note that SDCA is suitable for optimization on a single machine because it works with a dual formulation that is suitable for a single machine. In the following, we will
generalize the single-machine dual formulation to the distributed setting, and study the corresponding distributed version of SDCA.

In the distributed setting, we assume that the training data are partitioned and distributed to $m$ machines. In other words, the index set $S = \{1, \ldots, n\}$ of the training data is divided into $m$ non-overlapping partitions, where each machine $l \in \{1, \ldots, m\}$ contains its own partition $S_l \subset S$. We assume that $\cup_l S_l = S$, and we use $n_l := |S_l|$ to denote the size of the training data on machine $l$. Next, we can rewrite the primal problem (1) as the following constrained minimization problem that is suitable for the multi-machine distributed setting:
$$\min_{w; \{w_l\}} \sum_{l=1}^m P_l(w_l) + h(w) \quad \text{s.t. } w_l = w \text{ for all } l \in \{1, \ldots, m\}, \quad \text{where } P_l(w_l) := \sum_{i \in S_l} \phi_i(X_i^\top w_l) + \lambda n_l\, g(w_l), \qquad (4)$$
where $w_l$ represents the local primal variable on each machine, $P_l$ is the corresponding local primal problem, and the constraints $w_l = w$ are imposed to synchronize the local primal variables. Obviously this multi-machine distributed primal formulation (4) is equivalent to the original primal problem (1).

We note that the idea of objective splitting in (4) is similar to the global variable consensus formulation described in (Boyd et al., 2011). Instead of using the commonly used ADMM (Alternating Direction Method of Multipliers) method, which is not a generalization of (2), in this paper we derive a distributed dual formulation based on (4) that directly generalizes (2). We further propose a framework called Distributed Alternating Dual Maximization (DADM) to solve the distributed dual formulation. One advantage of DADM over ADMM is that DADM does not need to solve the subproblems to high accuracy, and thus it can naturally enjoy the trade-off between computation and communication, similar to related methods such as DisDCA, CoCoA and CoCoA+.

5. Distributed Dual Formulation

The optimization problem (4) can be further rewritten as:
$$\min_{w; \{w_l\}; \{u_i\}} \sum_{l=1}^m \left[ \sum_{i \in S_l} \phi_i(u_i) + \lambda n_l\, g(w_l) \right] + h(w) \quad \text{s.t. } u_i = X_i^\top w_l \text{ for all } i \in S_l; \quad w_l = w \text{ for all } l \in \{1, \ldots, m\}.$$
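The equivalence of the split formulation (4) to the original primal (1) when all local variables agree is easy to sanity-check numerically, since $\sum_l \lambda n_l\, g(w) = \lambda n\, g(w)$. A quick check on a toy instance of our own (squared loss, $g(w) = \frac{1}{2}\|w\|^2$, $h(w) = b\|w\|_1$):

```python
import numpy as np

# Numerical check (our own toy instance) that the objective splitting in (4)
# recovers the original primal (1) whenever w_1 = ... = w_m = w, because the
# partitions S_1..S_m are disjoint and sum_l n_l = n.
rng = np.random.default_rng(3)
n, d, m, lam, b = 30, 4, 3, 0.2, 0.05
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)

g = lambda w: 0.5 * (w @ w)
h = lambda w: b * np.abs(w).sum()

P_full = 0.5 * np.sum((X @ w - y) ** 2) + lam * n * g(w) + h(w)
parts = np.array_split(np.arange(n), m)          # disjoint partitions S_1..S_m
P_split = sum(0.5 * np.sum((X[S] @ w - y[S]) ** 2) + lam * len(S) * g(w)
              for S in parts) + h(w)
assert np.isclose(P_full, P_split)
```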
(5)

Here we introduce $n$ dual variables $\alpha := \{\alpha_i\}_{i=1}^n$, where each $\alpha_i$ is the Lagrange multiplier for the constraint $u_i - X_i^\top w_l = 0$, and $m$ dual variables $\beta := \{\beta_l\}_{l=1}^m$, where each $\beta_l$ is the Lagrange multiplier for the constraint $w_l - w = 0$. We can now introduce the primal-dual
objective function with the Lagrange multipliers as follows:
$$J(w; \{w_l\}; \{u_i\}; \{\alpha_i\}; \{\beta_l\}) := \sum_{l=1}^m \left[ \sum_{i \in S_l} \left( \phi_i(u_i) + \alpha_i^\top (u_i - X_i^\top w_l) \right) + \lambda n_l\, g(w_l) + \beta_l^\top (w_l - w) \right] + h(w).$$

Proposition 1 Define the dual objective as
$$D(\alpha, \beta) := -\sum_{l=1}^m \left[ \sum_{i \in S_l} \phi_i^*(-\alpha_i) + \lambda n_l\, g^*\!\left( \frac{\sum_{i \in S_l} X_i \alpha_i - \beta_l}{\lambda n_l} \right) \right] - h^*\!\left( \sum_{l=1}^m \beta_l \right).$$
Then we have
$$D(\alpha, \beta) = \min_{w; \{w_l\}; \{u_i\}} J(w; \{w_l\}; \{u_i\}; \{\alpha_i\}; \{\beta_l\}),$$
where the minimizers are achieved when the following equations are satisfied:
$$\nabla \phi_i(u_i) + \alpha_i = 0, \qquad \sum_{i \in S_l} X_i \alpha_i - \beta_l - \lambda n_l \nabla g(w_l) = 0, \qquad \sum_{l=1}^m \beta_l - \nabla h(w) = 0, \qquad (6)$$
for some subgradients $\nabla \phi_i(u_i)$, $\nabla g(w_l)$, and $\nabla h(w)$.

When $\beta = \{\beta_l\}$ are fixed, we may define the local single-machine dual formulation on each machine $l$ with respect to $\alpha_l$ as
$$D_l(\alpha_l \mid \beta_l) := -\sum_{i \in S_l} \phi_i^*(-\alpha_i) - \lambda n_l\, g^*\!\left( \frac{\sum_{i \in S_l} X_i \alpha_i - \beta_l}{\lambda n_l} \right), \qquad (7)$$
where $\alpha_l$ represents the local dual variables $\{\alpha_i;\ i \in S_l\}$ on machine $l$, and $\beta_l \in \mathbb{R}^d$ serves as a carrier for the synchronization of machine $l$. Based on Proposition 1, we obtain the following multi-machine distributed dual formulation for the corresponding primal problem (4):
$$D(\alpha, \beta) = \sum_{l=1}^m D_l(\alpha_l \mid \beta_l) - h^*\!\left( \sum_{l=1}^m \beta_l \right). \qquad (8)$$
Moreover, we have a non-negative duality gap, and zero duality gap can be achieved when $w$ is the minimizer of $P(w)$ and $(\alpha, \beta)$ maximizes the dual $D(\alpha, \beta)$.

Proposition 2 Given any $w, \alpha, \beta$, the following duality gap is non-negative:
$$P(w) - D(\alpha, \beta) \ge 0.$$
Moreover, zero duality gap can be achieved at $(w^*, \alpha^*, \beta^*)$, where $w^*$ is the minimizer of $P(w)$ and $(\alpha^*, \beta^*)$ is a maximizer of $D(\alpha, \beta)$.
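Weak duality in Proposition 2 is easy to check numerically. The sketch below (our own toy instance, not the paper's code) evaluates the distributed dual (8) for ridge regression with $h = 0$, where $-h^*(\sum_l \beta_l)$ forces $\sum_l \beta_l = 0$, and confirms that the gap stays non-negative for arbitrary feasible points:

```python
import numpy as np

# Numeric illustration (our own) of Propositions 1-2: for ridge regression
# (phi_i(u) = (u - y_i)^2/2, g = ||.||^2/2, h = 0), the distributed dual (8)
# never exceeds the primal, for any w, alpha, and any beta summing to zero.
rng = np.random.default_rng(4)
n, d, m, lam = 24, 5, 3, 0.2
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
parts = np.array_split(np.arange(n), m)

def P(w):
    return 0.5 * np.sum((X @ w - y) ** 2) + 0.5 * lam * n * (w @ w)

def D(alpha, beta):  # formula (8) with h = 0 (requires sum_l beta_l = 0)
    total = 0.0
    for S, b in zip(parts, beta):
        n_l = len(S)
        v_tilde = (X[S].T @ alpha[S] - b) / (lam * n_l)
        total -= np.sum(0.5 * alpha[S] ** 2 - y[S] * alpha[S])  # phi_i^*(-a)
        total -= 0.5 * lam * n_l * (v_tilde @ v_tilde)          # lam*n_l*g^*
    return total

for _ in range(10):
    w = rng.normal(size=d)
    alpha = rng.normal(size=n)
    beta = rng.normal(size=(m, d))
    beta -= beta.mean(axis=0)            # enforce sum_l beta_l = 0
    assert P(w) - D(alpha, beta) >= 0    # Proposition 2: non-negative gap
```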
We note that the parameters $\{\beta_l\}_{l=1}^m$ pass the global information across the multiple machines. When $\beta_l$ is fixed, $D_l(\alpha_l \mid \beta_l)$ with respect to $\alpha_l$ corresponds to the dual of the adjusted local primal problem:
$$P_l(w_l \mid \beta_l) := \sum_{i \in S_l} \phi_i(X_i^\top w_l) + \lambda n_l\, \tilde{g}_l(w_l), \qquad (9)$$
where the original regularizer $\lambda n_l\, g(w_l)$ in $P_l(w_l)$ is replaced by the adjusted regularizer
$$\lambda n_l\, \tilde{g}_l(w_l) := \lambda n_l\, g(w_l) + \beta_l^\top w_l.$$
Similar to the single-machine primal-dual relationship (3), we have the following local primal-dual relationship on each machine:
$$w_l(\alpha_l, \beta_l) = \nabla \tilde{g}_l^*(v_l) = \nabla g^*(\tilde{v}_l), \qquad (10)$$
where
$$v_l = \frac{\sum_{i \in S_l} X_i \alpha_i}{\lambda n_l}, \qquad \tilde{v}_l = v_l - \frac{\beta_l}{\lambda n_l}.$$
Moreover, we can define the global primal-dual relationship as
$$w(\alpha, \beta) = \nabla \tilde{g}^*(v) = \nabla g^*(\tilde{v}), \qquad (11)$$
where
$$v = \frac{\sum_{i=1}^n X_i \alpha_i}{\lambda n}, \qquad \tilde{v} = v - \frac{\sum_{l=1}^m \beta_l}{\lambda n}.$$
We can also establish the relationship of global-local duality in Proposition 3.

Proposition 3 Given $w, \alpha, \beta$ and $\{w_l\}$ such that $w_1 = \cdots = w_m = w$, we have the following decomposition of the global duality gap as the sum of the local duality gaps:
$$P(w) - D(\alpha, \beta) \ge \sum_{l=1}^m \left[ P_l(w_l \mid \beta_l) - D_l(\alpha_l \mid \beta_l) \right],$$
and the equality holds when $\nabla h(w) = \sum_{l=1}^m \beta_l$ for some subgradient $\nabla h(w)$.

Although we allow arbitrary $h(w)$, the case of $h(w) = 0$ is of special interest. This corresponds to the conjugate function
$$h^*(\beta) = \begin{cases} +\infty & \text{if } \beta \ne 0, \\ 0 & \text{if } \beta = 0. \end{cases}$$
That is, the term $-h^*\!\left( \sum_{l=1}^m \beta_l \right)$ is equivalent to imposing the constraint $\sum_{l=1}^m \beta_l = 0$.
Algorithm 1 Local Dual Update (on machine $l$)
  Retrieve the local parameters $\alpha_l^{t-1}$, $\tilde{v}_l^{t-1}$
  Randomly pick a mini-batch $Q_l \subset S_l$
  Approximately maximize (12) w.r.t. $\Delta\alpha_{Q_l}$
  Update $\alpha_i^t = \alpha_i^{t-1} + \Delta\alpha_i$ for all $i \in Q_l$
  return $\Delta v_l^t = \frac{1}{\lambda n_l} \sum_{i \in Q_l} X_i \Delta\alpha_i$

6. Distributed Alternating Dual Maximization

Minimizing the primal formulation (4) is equivalent to maximizing the dual formulation (8), and the latter can be achieved by repeatedly applying the following alternating optimization strategy, which we refer to as Distributed Alternating Dual Maximization (DADM):

Local step: fix $\beta$ and let each machine approximately optimize $D_l(\alpha_l \mid \beta_l)$ w.r.t. $\alpha_l$ in parallel.

Global step: maximize the global dual objective w.r.t. $\beta$, and set the global primal parameter $w$ accordingly.

The above steps are applied in iterations $t = 1, 2, \ldots, T$. At the beginning of each iteration $t$, we assume that the local primal and dual variables on each local machine are $\alpha_l^{t-1}, \beta_l^{t-1}, v_l^{t-1}$; we then seek to update $\alpha_l^{t-1}$ to $\alpha_l^t$ and $v_l^{t-1}$ to $v_l^t$ in the local step, and to update $\beta_l^{t-1}$ to $\beta_l^t$ in the global step. We note that the local step can be executed in parallel w.r.t. the dual variables $\{\alpha_l\}_{l=1}^m$. In practice, it is often useful to optimize (7) approximately by using a randomly selected mini-batch $Q_l \subset S_l$ of size $|Q_l| = M_l$. That is, we want to find $\Delta\alpha_{Q_l}^t$ with $Q_l$ to approximately maximize the local dual objective as follows:
$$D_{Q_l}^t(\Delta\alpha_{Q_l}) := -\sum_{i \in Q_l} \phi_i^*\!\left( -\alpha_i^{t-1} - \Delta\alpha_i \right) - \lambda n_l\, g^*\!\left( \tilde{v}_l^{t-1} + \frac{\sum_{i \in Q_l} X_i \Delta\alpha_i}{\lambda n_l} \right). \qquad (12)$$
This step is described in Algorithm 1. We can use any solver for this approximate optimization; in our experiments, we choose ProxSDCA.

The global step is to synchronize all local solutions, which requires communication among the machines. This is achieved by optimizing the following dual objective with respect to all $\beta = \{\beta_l\}$:
$$\beta^t \in \arg\max_\beta D(\alpha^t, \beta). \qquad (13)$$

Proposition 4 Given $v$, let $w(v)$ be the unique solution of the following optimization problem
$$w(v) = \arg\min_w \left[ -\lambda n\, w^\top v + \lambda n\, g(w) + h(w) \right] \qquad (14)$$
that satisfies
$$\lambda n\, \nabla g(w) + \nabla h(w) = \lambda n\, v$$
for some subgradients $\nabla g(w)$ and $\nabla h(w) = \rho$ at $w = w(v)$. Then $\beta(v) = \rho$ is a solution of
$$\max_b \left[ -\lambda n\, g^*\!\left( v - \frac{b}{\lambda n} \right) - h^*(b) \right],$$
and
$$w(v) = \nabla g^*\!\left( v - \frac{\beta(v)}{\lambda n} \right).$$

Proposition 5 Given $\alpha$, a solution of
$$\max_\beta D(\alpha, \beta)$$
can be obtained by setting
$$\beta_l = \lambda n_l \left( v_l(\alpha_l) - v(\alpha) \right) + \frac{n_l}{n}\, \beta(v(\alpha)),$$
where $\beta(v(\alpha))$ is defined in Proposition 4, and
$$v(\alpha) = \frac{\sum_{i=1}^n X_i \alpha_i}{\lambda n}, \qquad v_l(\alpha_l) = \frac{\sum_{i \in S_l} X_i \alpha_i}{\lambda n_l}.$$
Moreover, if we let
$$w = w(\alpha, \beta) = w(v(\alpha)) = \nabla g^*\!\left( v(\alpha) - \frac{\beta(v(\alpha))}{\lambda n} \right),$$
where $w(v)$ is defined in Proposition 4, and
$$w_l = w_l(\alpha_l, \beta_l) = \nabla g^*\!\left( v_l - \frac{\beta_l}{\lambda n_l} \right),$$
then $w_l = w$ for all $l$, and
$$P(w) - D(\alpha, \beta) = \sum_{l=1}^m \left[ P_l(w_l \mid \beta_l) - D_l(\alpha_l \mid \beta_l) \right].$$

According to Proposition 5, the solution of (13) is given by
$$\beta_l^t = \lambda n_l \left( v_l^t - v^t \right) + \frac{n_l}{n}\, \rho^t,$$
where
$$v^t = \sum_{l=1}^m \frac{n_l}{n}\, v_l^t = v^{t-1} + \sum_{l=1}^m \frac{n_l}{n}\, \Delta v_l^t,$$
Algorithm 2 Distributed Alternating Dual Maximization (DADM)
  Input: objective $P(w)$, target duality gap $\epsilon$, warm-start variables $w^{\text{init}}, \alpha^{\text{init}}, \beta^{\text{init}}, v^{\text{init}}$; if not specified, set $w^{\text{init}} = 0$, $\alpha^{\text{init}} = 0$, $\beta^{\text{init}} = 0$, $v^{\text{init}} = 0$.
  Initialize: let $w^0 = w^{\text{init}}$, $\alpha^0 = \alpha^{\text{init}}$, $\beta^0 = \beta^{\text{init}}$, $v^0 = v^{\text{init}}$.
  for $t = 1, 2, \ldots$ do
    (Local step)
    for all machines $l = 1, 2, \ldots, m$ in parallel do
      call an arbitrary local procedure, such as Algorithm 1
    end for
    (Global step)
    Aggregate $v^t = v^{t-1} + \sum_{l=1}^m \frac{n_l}{n} \Delta v_l^t$
    Compute $\tilde{v}^t$ according to (15)
    Let $\Delta\tilde{v}^t = \tilde{v}^t - \tilde{v}^{t-1}$
    for all machines $l = 1, 2, \ldots, m$ in parallel do
      update the local parameter $\tilde{v}_l^t = \tilde{v}_l^{t-1} + \Delta\tilde{v}^t$
    end for
    Stopping condition: stop if $P(w^t) - D(\alpha^t, \beta^t) \le \epsilon$.
  end for
  return $w^t = \nabla g^*(\tilde{v}^t)$, $\alpha^t$, $\beta^t$, $v^t$, and the duality gap $P(w^t) - D(\alpha^t, \beta^t)$.

and $\rho^t = \nabla h(w^t)$ is a subgradient of $h$ at the solution $w^t$ of
$$w^t = \arg\min_w \left[ -\lambda n\, w^\top v^t + \lambda n\, g(w) + h(w) \right],$$
which achieves the first-order optimality condition $-\lambda n\, v^t + \lambda n\, \nabla g(w^t) + \rho^t = 0$ for some subgradient $\nabla g(w^t)$. The definition of $\tilde{v}$ implies that after each global update, we have
$$\tilde{v}_l^t = \tilde{v}^t = v^t - \frac{\rho^t}{\lambda n} = \nabla g(w^t), \quad \text{for all } l = 1, \ldots, m. \qquad (15)$$
Since the objective (12) for the local step on each machine $l$ only depends on the mini-batch $Q_l$ sampled from $S_l$ and the vector $\tilde{v}_l^t$, which needs to be synchronized at each global step, we know from (15) that at each time $t$ we can pass the same vector $\tilde{v}^t$ (as $\tilde{v}_l^t$) to all nodes. In practice, it may be beneficial to pass $\Delta\tilde{v}^t$ instead, especially when $\Delta\tilde{v}^t$ is sparse but $\tilde{v}^t$ is dense. Putting things together, the local-global DADM iterations are summarized in Algorithm 2.

If we consider the special case of $h(w) = 0$, the solution of (15) is simply $\tilde{v}_l^t = \tilde{v}^t = v^t$, and the global step in Algorithm 2 can be simplified as first aggregating the updates by
$$\tilde{v}^t = v^t = v^{t-1} + \sum_{l=1}^m \frac{n_l}{n}\, \Delta v_l^t,$$
and then updating the local parameters in parallel. Further, if $h(w) = 0$ and the data partition is balanced, that is, the $n_l$ are identical for all $l = 1, \ldots, m$, it can be verified that the DADM procedure (ignoring the mini-batch variation) is equivalent to CoCoA+. Therefore the framework presented here may be regarded as an alternative interpretation.

Moreover, when the added regularization in (1) is complex and might involve more than one non-smooth term, considering the splitting between $g(w)$ and $h(w)$ can bring computational advantages. For example, to promote both sparsity and group sparsity in the predictor we often use the sparse group lasso regularization (Friedman et al., 2010), where a combination of the $L_1$ norm and the mixed $L_2/L_1$ (group sparse) norm is introduced:
$$\lambda_1 \sum_G \|w_G\| + \lambda_2 \|w\|_1 + \frac{\lambda}{2} \|w\|^2,$$
where we add a slight $L_2$ regularization to make it strongly convex, as was done in (Shalev-Shwartz and Zhang, 2014). The proximal mapping with respect to the full sparse group lasso regularization does not have a closed-form solution and thus often relies on iterative minimization steps, but there are closed-form proximal mappings with respect to either the $L_2$-$L_1$ norm or the group norm alone. Thus if we simply set $h(w) = 0$ and let $\lambda n\, g(w)$ absorb the whole term $\lambda_1 \sum_G \|w_G\| + \lambda_2 \|w\|_1 + \frac{\lambda}{2} \|w\|^2$, then neither the local optimization update (12) nor the global synchronization step (14) has a closed-form solution. However, if we assign the group norm to $h(w)$, so that $h(w) = \lambda_1 \sum_G \|w_G\|$ and hence $\lambda n\, g(w) = \lambda_2 \|w\|_1 + \frac{\lambda}{2} \|w\|^2$, the local update steps (12) enjoy closed-form updates, which makes the implementation much easier, and we only need to use iterative minimization in the rare global synchronization step.

7. Convergence Analysis

Let $w^*$ be the optimal solution of the primal problem $P(w)$ and $(\alpha^*, \beta^*)$ be the optimal solution of the dual problem $D(\alpha, \beta)$, respectively. For the primal solution $w^t$ and the dual solution $(\alpha^t, \beta^t)$ at iteration $t$, we define the primal sub-optimality as
$$\epsilon_P^t := P(w^t) - P(w^*),$$
and the dual sub-optimality as
$$\epsilon_D^t := D(\alpha^*, \beta^*) - D(\alpha^t, \beta^t).$$
Due to the close relationship between the distributed dual formulation and the single-machine dual formulation, an analysis of DADM can be obtained by directly generalizing that of SDCA.
We consider two kinds of loss functions: smooth loss functions, which imply fast linear convergence, and general $L$-Lipschitz loss functions. For the following two theorems we always assume that $g$ is 1-strongly convex w.r.t. $\|\cdot\|$, $\|X_i\| \le R$ for all $i$, $M_l = |Q_l^t|$ is fixed on each machine, and our local procedure optimizes $D_{Q_l}^t$ sufficiently well on each machine, in the sense that $D_{Q_l}^t(\Delta\alpha_{Q_l}) \ge D_{Q_l}^t(\Delta\alpha_{Q_l}^\diamond)$, where $\Delta\alpha_{Q_l}^\diamond$ is given by a special choice in each theorem.

Theorem 6 Assume that each $\phi_i$ is $(1/\gamma)$-smooth w.r.t. $\|\cdot\|$ and $\Delta\alpha_{Q_l}^\diamond$ is given by
$$\Delta\alpha_i := s_l \left( u_i^{t-1} - \alpha_i^{t-1} \right), \quad \text{for all } i \in Q_l,$$
where $u_i^{t-1} := -\nabla\phi_i(X_i^\top w^{t-1})$ and $s_l := \frac{\lambda\gamma n_l}{\lambda\gamma n_l + M_l R^2} \in [0, 1]$. To reach an expected duality gap of $\mathbb{E}\left[ P(w^T) - D(\alpha^T, \beta^T) \right] \le \epsilon$, every $T$ satisfying the following condition is sufficient:
$$T \ge \left( \frac{R^2}{\lambda\gamma} + \max_l \frac{n_l}{M_l} \right) \log\left( \left( \frac{R^2}{\lambda\gamma} + \max_l \frac{n_l}{M_l} \right) \frac{\epsilon_D^0}{\epsilon} \right). \qquad (16)$$

Theorem 7 Assume that each $\phi_i$ is $L$-Lipschitz w.r.t. $\|\cdot\|$, and $\Delta\alpha_{Q_l}^\diamond$ is given by
$$\Delta\alpha_i := \frac{q n_l}{M_l} \left( u_i^{t-1} - \alpha_i^{t-1} \right), \quad \text{for all } i \in Q_l,$$
where $u_i^{t-1} \in -\partial\phi_i(X_i^\top w^{t-1})$ and $q \in [0, \min_l M_l/n_l]$. To reach an expected normalized duality gap of $\mathbb{E}\left[ \frac{P(\bar{w}) - D(\bar{\alpha}, \bar{\beta})}{n} \right] \le \epsilon$, every $T$ satisfying the following condition is sufficient:
$$T \ge \max\left\{ 0, \left\lceil \tilde{n} \log\left( \frac{2\lambda \tilde{n}\, \epsilon_D^0}{n G} \right) \right\rceil \right\} + \tilde{n} + \frac{5G}{\lambda n \epsilon}, \qquad (17)$$
where $T_0 \ge \max\left\{ t_0,\ t_0 + \frac{4G}{\lambda n \epsilon} - \tilde{n} \right\}$, $t_0 = \max\left\{ 0, \left\lceil \tilde{n} \log\left( \frac{2\lambda \tilde{n}\, \epsilon_D^0}{n G} \right) \right\rceil \right\}$, $\tilde{n} = \max_l n_l/M_l$, $G = 4R^2L^2$, and $\bar{w}, \bar{\alpha}, \bar{\beta}$ represent either the average vectors or randomly chosen vectors of $w^{t-1}, \alpha^{t-1}, \beta^{t-1}$ over $t \in \{T_0 + 1, \ldots, T\}$, respectively; for instance, $\bar{\alpha} = \frac{1}{T - T_0} \sum_{t=T_0+1}^T \alpha^{t-1}$, $\bar{\beta} = \frac{1}{T - T_0} \sum_{t=T_0+1}^T \beta^{t-1}$, $\bar{w} = \frac{1}{T - T_0} \sum_{t=T_0+1}^T w^{t-1}$.

Remark 8 Both Theorem 6 and Theorem 7 incorporate two key components: the term $\max_l \frac{n_l}{M_l}$ and the condition-number term $\frac{R^2}{\lambda\gamma}$ (or $\frac{G}{\lambda n \epsilon}$). When the term $\max_l \frac{n_l}{M_l}$ dominates the iteration complexity, we can speed up convergence and reduce the number of communications by increasing the number of machines $m$ or the local mini-batch size $M_l$. However, in some circumstances when the condition number is large, it becomes the leading factor, and increasing $m$ or $M_l$ does not contribute to the computation speedup. To tackle this problem, we develop the accelerated version of DADM in Section 8.

Remark 9 Our method is closely related to previous distributed extensions of SDCA. Theorems 6 and 7, which provide theoretical guarantees for more general local updates, achieve the same iteration complexity as the ones in DisDCA, which only allows some special choices of local mini-batch updates. Compared with the theoretical results of CoCoA+, which are based on the Θ-approximate solution of the local dual subproblem, although the derived bounds are within the same scale, $\tilde{O}(1/\epsilon)$ for Lipschitz losses and $\tilde{O}(\log(1/\epsilon))$ for smooth losses, our bounds are different and complementary. The analysis of CoCoA+ can provide better insights for more accurate solutions of the local sub-problems.
While our analysis is based on the mini-batch setup, it can capture the contributions of the mini-batch size and the number of machines more explicitly.

Remark 10 Since the bounds are derived with a special choice of $\Delta\alpha_{Q_l}$, the actual performance of the algorithm can be significantly better than what is indicated by the bounds when the local duals are better optimized. For example, we can choose ProxSDCA of (Shalev-Shwartz and Zhang, 2014) as the local procedure and adopt the sequential update strategy, as the local solver of CoCoA+ does. This is also the one used in our experiments.
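The local-global loop of Algorithm 2, with the safe mini-batch update of Theorem 6 as the local procedure, can be simulated in a single process. The following sketch (our own assumptions: ridge regression, $h = 0$, balanced partitions; not the paper's implementation) drives the primal objective to its optimum:

```python
import numpy as np

# Single-process simulation (illustrative sketch, not the paper's code) of
# DADM with h = 0 for ridge regression: phi_i(u) = (u - y_i)^2/2 and
# g(w) = ||w||^2/2, so grad g^*(v) = v and w^{t-1} = v_tilde. Each round,
# every "machine" applies Theorem 6's safe mini-batch update with
# s_l = lam*gamma*n_l / (lam*gamma*n_l + M*R^2) (gamma = 1 here), and the
# global step averages the local increments as in Algorithm 2.
rng = np.random.default_rng(1)
n, d, m, lam, M = 40, 6, 4, 0.1, 5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
parts = np.array_split(np.arange(n), m)      # balanced partitions S_1..S_m
n_l = n // m
R2 = np.max(np.sum(X ** 2, axis=1))          # R^2 >= ||x_i||^2 for all i
s = lam * n_l / (lam * n_l + M * R2)

def primal(w):
    return 0.5 * np.sum((X @ w - y) ** 2) + 0.5 * lam * n * (w @ w)

alpha = np.zeros(n)
v_tilde = np.zeros(d)                        # shared vector; w^{t-1} = v_tilde
for t in range(5000):
    deltas = []
    for S in parts:                          # local steps run independently
        Q = rng.choice(S, size=M, replace=False)
        u = y[Q] - X[Q] @ v_tilde            # u_i = -phi_i'(x_i^T w^{t-1})
        d_alpha = s * (u - alpha[Q])         # Theorem 6's safe update
        alpha[Q] += d_alpha
        deltas.append(X[Q].T @ d_alpha / (lam * n_l))
    v_tilde = v_tilde + sum(n_l / n * dv for dv in deltas)  # global step

w_star = np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)
assert abs(primal(v_tilde) - primal(w_star)) < 1e-6
```

The shared vector `v_tilde` plays the role of $\tilde{v}^t$ in (15); in a real deployment, only the increments $\Delta v_l^t$ would cross the network at each global step.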
Algorithm 3 Accelerated Distributed Alternating Dual Maximization (Acc-DADM)
  Parameters: $\kappa$; $\eta = \sqrt{\lambda/(\lambda+\kappa)}$; $\nu = (1-\eta)/(1+\eta)$.
  Initialize: $v^0 = y^0 = w^0 = 0$, $\alpha^0 = 0$, $\xi^0 = (1+\eta)\left( P(w^0) - D(\alpha^0, \beta^0) \right)$.
  for $t = 1, 2, \ldots, T_{\text{outer}}$ do
    1. Construct the new objective:
       $$P^t(w) = \sum_{i=1}^n \phi_i(X_i^\top w) + \lambda n\, g(w) + h(w) + \frac{\kappa n}{2} \|w - y^{t-1}\|^2.$$
    2. Call the DADM solver:
       $$(w^t, \alpha^t, \beta^t, v^t, \epsilon^t) = \text{DADM}\left( P^t,\ \eta\xi^{t-1}/(2(1+\eta)),\ w^{t-1}, \alpha^{t-1}, \beta^{t-1}, v^{t-1} \right).$$
    3. Update: $y^t = w^t + \nu\left( w^t - w^{t-1} \right)$.
    4. Update: $\xi^t = (1 - \eta/2)\, \xi^{t-1}$.
  end for
  Return $w^{T_{\text{outer}}}$.

8. Acceleration

Theorems 6 and 7 all imply that when the condition number ($\frac{R^2}{\lambda\gamma}$ or $\frac{G}{\lambda n\epsilon}$) is relatively small, DADM converges fast. However, the convergence may be slow when the condition number is large and dominates the iteration complexity. In fact, we observe empirically that the basic DADM method converges slowly when the regularization parameter $\lambda$ is small. This phenomenon is also consistent with that of SDCA in the single-machine case. In this section, we introduce the Accelerated Distributed Alternating Dual Maximization (Acc-DADM) method, which can alleviate the problem.

The procedure is motivated by (Shalev-Shwartz and Zhang, 2014), which employs an inner-outer iteration: at every iteration $t$, we solve a slightly modified objective, which adds a regularization term centered around the vector
$$y^{t-1} = w^{t-1} + \nu\left( w^{t-1} - w^{t-2} \right), \qquad (18)$$
where $\nu \in [0, 1]$ is called the momentum parameter. The accelerated DADM procedure described in Algorithm 3 can be similarly viewed as an inner-outer algorithm, where DADM serves as the inner iteration. In the outer iteration, we adjust the regularization vector $y^{t-1}$. That is, at each outer iteration $t$, we define a modified local primal objective on each machine, which has the same form as the original local primal objective (9), except that $\tilde{g}_l(w)$ is modified to $\tilde{g}_l^t(w)$, defined by
$$(\lambda+\kappa)\, n_l\, \tilde{g}_l^t(w) = (\lambda+\kappa)\, n_l\, g^t(w) + \beta_l^\top w, \qquad (\lambda+\kappa)\, g^t(w) = \lambda\, g(w) + \frac{\kappa}{2} \|w - y^{t-1}\|^2.$$
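The inner-outer structure of Algorithm 3 can be illustrated on a toy problem. In the sketch below (our own construction; the inner DADM call is replaced by an exact minimizer of the $\kappa$-regularized objective, so it is the outer loop only), the momentum recursion $y^t = w^t + \nu(w^t - w^{t-1})$ drives $w^t$ to the minimizer of the original objective:

```python
import numpy as np

# Toy sketch (our own) of Algorithm 3's outer loop for ridge regression, with
# the inner DADM call replaced by an exact solve of
# P_t(w) = P(w) + (kappa*n/2) * ||w - y^{t-1}||^2. The iterates converge to
# the minimizer of the *original* objective P, not of P_t.
rng = np.random.default_rng(2)
n, d, lam, kappa = 50, 8, 1e-3, 0.1
X = rng.normal(size=(n, d))
y_vec = rng.normal(size=n)
A = X.T @ X + lam * n * np.eye(d)
w_star = np.linalg.solve(A, X.T @ y_vec)     # minimizer of the original P

eta = np.sqrt(lam / (lam + kappa))
nu = (1 - eta) / (1 + eta)
w = np.zeros(d)
y_mom = np.zeros(d)
for t in range(1000):
    # "inner solve": argmin_w P(w) + (kappa*n/2) * ||w - y_mom||^2
    w_new = np.linalg.solve(A + kappa * n * np.eye(d),
                            X.T @ y_vec + kappa * n * y_mom)
    y_mom = w_new + nu * (w_new - w)         # momentum step of Algorithm 3
    w = w_new

assert np.linalg.norm(w - w_star) < 1e-8
```

In the actual Acc-DADM, the exact solve is replaced by a DADM run to accuracy $\eta\xi^{t-1}/(2(1+\eta))$, which is where the distributed local-global iterations enter.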
It follows that we need to solve a modified dual at each local step, with $g$ replaced by $g^t$ in the local dual problem (12). Therefore, compared to the basic DADM procedure, nothing changes other than $g$ being replaced by $g^t$ at each iteration. Specifically, when the number of machines $m$ equals 1, this algorithm reduces to AccProxSDCA described in (Shalev-Shwartz and Zhang, 2014). Thus Acc-DADM can be naturally regarded as the distributed generalization of the single-machine AccProxSDCA. Moreover, Acc-DADM also allows arbitrary local procedures, as DADM does.

Our empirical studies show that Acc-DADM significantly outperforms DADM in many cases. There are probably two reasons. One reason is the use of a modified regularizer $g^t(w)$ that is more strongly convex than the original regularizer $g(w)$ when $\kappa$ is much larger than $\lambda$. The other reason is closely related to the distributed setting considered in this paper. Observe that in the modified local primal objective
$$P_l^t(w \mid \beta_l) := P_l(w \mid \beta_l) + \frac{\kappa n_l}{2} \|w - y^{t-1}\|^2,$$
the first term corresponds to the original local primal objective and the second term is an extra regularization due to acceleration that constrains $w$ to be close to $y^{t-1}$. The effect is that the different local problems become more similar to each other, which stabilizes the overall system.

8.1 Theoretical Results of Acc-DADM for Smooth Losses

The following theorem establishes the computational efficiency guarantees for Acc-DADM.

Theorem 11 Assume that each $\phi_i$ is $(1/\gamma)$-smooth, $g$ is 1-strongly convex w.r.t. $\|\cdot\|$, $\|X_i\| \le R$ for all $i$, and $M_l = |Q_l|$ is fixed on each machine. To obtain an expected $\epsilon$ primal sub-optimality, $\mathbb{E}[P(w^t)] - P(w^*) \le \epsilon$, it is sufficient to have the following number of stages in Algorithm 3:
$$T_{\text{outer}} \ge \left( 1 + \frac{1}{\eta} \right) \log\frac{\xi^0}{\epsilon} = \left( 1 + \sqrt{\frac{\lambda+\kappa}{\lambda}} \right) \log\frac{(1+\eta)\left( P(w^0) - D(\alpha^0, \beta^0) \right)}{\epsilon},$$
and the following number of inner iterations in DADM at each stage:
$$T_{\text{inner}} \ge \left( \frac{R^2}{(\lambda+\kappa)\gamma} + \max_l \frac{n_l}{M_l} \right) \log\left( \left( \frac{R^2}{(\lambda+\kappa)\gamma} + \max_l \frac{n_l}{M_l} \right) \frac{2(\lambda+\kappa)}{\kappa} \right).$$
In particular, if we assume $n_1 = n_2 = \cdots = n_m$ and $M_1 = M_2 = \cdots = M_m = b$, then the total number of vector computations for each machine is bounded by
$$\tilde{O}\left( T_{\text{outer}}\, T_{\text{inner}}\, b \right) = \tilde{O}\left( \sqrt{1 + \frac{\kappa}{\lambda}} \left( \frac{R^2}{(\lambda+\kappa)\gamma} + \frac{n}{mb} \right) b \right).$$

Remark 12 When $\kappa = 0$, the guarantees reduce to those of DADM.
However, DADM only enjoys linear speedup over ProxSDCA when the number of machines satisfies

$$m \lesssim \frac{n\lambda\gamma}{R^2},$$

and it obtains only sub-linear speedup when $R^2/(\lambda\gamma)$ is on the order of $n$. Besides enjoying the same properties as DADM described above, if we choose $\kappa$ in Algorithm 3 as

$$\kappa = \frac{m R^2}{\gamma n},$$

and $b = 1$, then the total number of vector computations on each machine is bounded by

$$\tilde{O}\!\left(\sqrt{\frac{m R^2}{\lambda\gamma n}}\cdot\frac{n}{m}\right) = \tilde{O}\!\left(R\,\sqrt{\frac{n}{\lambda\gamma m}}\right),$$

which means Acc-DADM can be much faster than DADM when the condition number is large, and it always obtains a reduction in computation over the single-machine AccProxSDCA by a factor of $\tilde{O}(1/\sqrt{m})$.

8.2 Acceleration for Non-smooth, Lipschitz Losses

Theorem 11 establishes the rate of convergence for smooth loss functions, but the acceleration framework can also be applied to non-smooth, Lipschitz loss functions. The main idea is to use Nesterov's smoothing technique (Nesterov, 2005) to construct a smooth approximation of the non-smooth function $\phi_i$ by adding a strongly convex regularization term to the conjugate of $\phi_i$:

$$\tilde{\phi}_i^\ast(\alpha) := \phi_i^\ast(\alpha) + \frac{\gamma}{2}\,\|\alpha\|^2.$$

By the properties of conjugate functions (e.g., Lemma 2 in Shalev-Shwartz and Zhang, 2014), $\tilde{\phi}_i$, the conjugate function of $\tilde{\phi}_i^\ast$, is $(1/\gamma)$-smooth and satisfies $\tilde{\phi}_i(u) \leq \phi_i(u) \leq \tilde{\phi}_i(u) + \frac{\gamma L^2}{2}$. Then, instead of the original objective with non-smooth losses, we minimize the smoothed objective

$$\min_{w \in \mathbb{R}^d}\ \hat{P}(w) := \sum_{i=1}^{n} \tilde{\phi}_i(X_i^\top w) + \lambda n\, g(w) + h(w). \tag{19}$$

The following corollary establishes the computational efficiency guarantees of Acc-DADM for non-smooth, Lipschitz loss functions.

Corollary 13. Assume that each $\phi_i$ is $L$-Lipschitz, $g$ is 1-strongly convex w.r.t. $\|\cdot\|$, $\|X_i\| \leq R$ for all $i$, and $M_l = |Q_l|$ is fixed on each machine. To obtain expected $\epsilon$ normalized primal sub-optimality,

$$\mathbb{E}\Bigl[\frac{P(w^{(t)})}{n}\Bigr] - \frac{P(w^\ast)}{n} \leq \epsilon,$$

it is sufficient to run Algorithm 3 on the smoothed objective (19) with

$$\gamma = \frac{\epsilon}{L^2}$$

and the following number of stages,

$$T_{\mathrm{outer}} \geq \Bigl(1 + \frac{2}{\sqrt{\eta}}\Bigr)\log\frac{4\,\xi^{(0)}}{\epsilon} = O\!\left(\sqrt{1 + \frac{\kappa}{\lambda}}\,\Bigl(\log\frac{\lambda+\kappa}{\lambda} + \log\frac{P(0) - D(0,0)}{\epsilon}\Bigr)\right),$$
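To make the smoothing step concrete, the snippet below applies it to the hinge loss $\phi(u) = \max(0, 1-u)$, which is 1-Lipschitz with conjugate $\phi^\ast(a) = a$ on $[-1, 0]$ (and $+\infty$ elsewhere). This is our own worked example (the paper does not prescribe a specific loss here): we approximate the smoothed loss by brute-force maximization over the conjugate's domain and check the uniform gap bound $\gamma L^2/2$.

```python
import numpy as np

# Nesterov smoothing of the hinge loss (toy example): add (gamma/2)*a^2 to the
# conjugate phi*(a) = a on [-1, 0], then conjugate back.  The smoothed loss
# satisfies  phi_smooth(u) <= phi(u) <= phi_smooth(u) + gamma*L^2/2  with L = 1.

gamma = 0.5
a_grid = np.linspace(-1.0, 0.0, 2001)   # domain of the hinge conjugate

def phi(u):
    return max(0.0, 1.0 - u)

def phi_smooth(u):
    # conjugate of phi*(a) + (gamma/2)*a^2, by brute force over the grid
    return np.max(u * a_grid - a_grid - 0.5 * gamma * a_grid ** 2)

u_grid = np.linspace(-3.0, 3.0, 601)
gaps = np.array([phi(u) - phi_smooth(u) for u in u_grid])
max_gap, min_gap = gaps.max(), gaps.min()
```

The closed form recovered here is the familiar "Huberized" hinge: $0$ for $u \geq 1$, $(1-u)^2/(2\gamma)$ for $1-\gamma \leq u < 1$, and $1 - u - \gamma/2$ for $u < 1-\gamma$, so the gap to the original hinge is exactly $\gamma/2$ in the linear region.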
and the following number of inner iterations in DADM at each stage,

$$T_{\mathrm{inner}} \geq \Bigl(\frac{L^2 R^2}{\epsilon(\lambda+\kappa)} + \max_l \frac{n_l}{M_l}\Bigr)\,\log\Bigl(\Bigl(\frac{L^2 R^2}{\epsilon(\lambda+\kappa)} + \max_l \frac{n_l}{M_l}\Bigr)\frac{\kappa}{\lambda}\Bigr).$$

In particular, if we assume $n_1 = n_2 = \cdots = n_m$ and $M_1 = M_2 = \cdots = M_m = b$, then the total number of vector computations on each machine is bounded by

$$\tilde{O}(T_{\mathrm{outer}}\, T_{\mathrm{inner}}\, b) = \tilde{O}\!\left(\sqrt{1 + \frac{\kappa}{\lambda}}\,\Bigl(\frac{L^2 R^2}{\epsilon(\lambda+\kappa)} + \frac{n}{mb}\Bigr)\, b\right).$$

Remark 14. When $\kappa = 0$, the guarantees reduce to those of DADM for Lipschitz losses. Moreover, when $\frac{m L^2 R^2}{n\epsilon} \geq \lambda$, if we choose $\kappa$ in Algorithm 3 as

$$\kappa = \frac{m L^2 R^2}{n \epsilon}$$

and $b = 1$, then the total number of vector computations on each machine is bounded by

$$\tilde{O}\!\left(\sqrt{\frac{m L^2 R^2}{\lambda n \epsilon}}\cdot\frac{n}{m}\right) = \tilde{O}\!\left(L R\,\sqrt{\frac{n}{\lambda\epsilon m}}\right),$$

which means Acc-DADM can be much faster than DADM when $\epsilon$ is small, and it always obtains a reduction in computation over the single-machine AccProxSDCA by a factor of $\tilde{O}(1/\sqrt{m})$.

9. Proofs

In this section, we first present the proofs of several earlier propositions to establish our framework solidly. Then, based on our new distributed dual formulation, we directly generalize the analysis of SDCA and adapt it to DADM in the commonly used mini-batch setup. Finally, we give the proof of the theoretical guarantees of Acc-DADM.

9.1 Proof of Proposition 1

Proof. Given any set of parameters $(w; \{w_l\}; \{u_i\}; \{\alpha_i\}; \{\beta_l\})$, we have

$$\min_{w;\{w_l\};\{u_i\}} J(w; \{w_l\}; \{u_i\}; \{\alpha_i\}; \{\beta_l\}) = \min_{w;\{w_l\}} \underbrace{\sum_{l=1}^{m}\Bigl[\sum_{i\in S_l}\min_{u_i}\bigl(\phi_i(u_i) + \alpha_i^\top(u_i - X_i^\top w_l)\bigr) + \lambda n_l\, g(w_l) + \beta_l^\top(w_l - w)\Bigr] + h(w)}_{A},$$
where the minimum is achieved at $\{u_i\}$ such that $\partial\phi_i(u_i) + \alpha_i \ni 0$. By eliminating $u_i$ (using $\min_{u_i}\{\phi_i(u_i) + \alpha_i^\top u_i\} = -\phi_i^\ast(-\alpha_i)$) we obtain

$$A = \min_{w;\{w_l\}} \sum_{l=1}^{m}\Bigl[\sum_{i\in S_l}\bigl(-\phi_i^\ast(-\alpha_i) - \alpha_i^\top X_i^\top w_l\bigr) + \lambda n_l\, g(w_l) + \beta_l^\top(w_l - w)\Bigr] + h(w)$$
$$= \min_{w}\ \underbrace{\Bigl\{\sum_{l=1}^{m}\min_{w_l}\Bigl[-\sum_{i\in S_l}\phi_i^\ast(-\alpha_i) + \lambda n_l\, g(w_l) - \Bigl(\sum_{i\in S_l}X_i\alpha_i - \beta_l\Bigr)^{\!\top} w_l\Bigr] - \Bigl(\sum_{l=1}^{m}\beta_l\Bigr)^{\!\top} w + h(w)\Bigr\}}_{B},$$

where the minimum is achieved at $\{w_l\}$ such that $-\bigl(\sum_{i\in S_l}X_i\alpha_i - \beta_l\bigr) + \lambda n_l\,\nabla g(w_l) = 0$. By eliminating $w_l$ we obtain

$$B = \min_{w}\ \Bigl\{-\sum_{l=1}^{m}\Bigl[\sum_{i\in S_l}\phi_i^\ast(-\alpha_i) + \lambda n_l\, g^\ast\Bigl(\frac{\sum_{i\in S_l}X_i\alpha_i - \beta_l}{\lambda n_l}\Bigr)\Bigr] - \beta^\top w + h(w)\Bigr\}$$
$$= \underbrace{-\sum_{l=1}^{m}\Bigl[\sum_{i\in S_l}\phi_i^\ast(-\alpha_i) + \lambda n_l\, g^\ast\Bigl(\frac{\sum_{i\in S_l}X_i\alpha_i - \beta_l}{\lambda n_l}\Bigr)\Bigr] - h^\ast(\beta)}_{D(\alpha,\beta)}, \qquad \beta = \sum_{l=1}^{m}\beta_l,$$

where the minimizer is achieved at $w$ such that $-\beta + \nabla h(w) = 0$. This completes the proof.

9.2 Proof of Proposition 2

Proof. Given any $w$, if we take $u_i = X_i^\top w$ and $w_l = w$ for all $i$ and $l$, then $P(w) = J(w; \{w_l\}; \{u_i\}; \{\alpha_i\}; \{\beta_l\})$ for arbitrary $\{\alpha_i\}; \{\beta_l\}$. It follows from Proposition 1 that

$$P(w) = J(w; \{w_l\}; \{u_i\}; \{\alpha_i\}; \{\beta_l\}) \geq D(\alpha, \beta).$$

Let $w^\ast$ be the minimizer of $P(w)$. When $w = w^\ast$, we may set $u_i = u_i^\ast = X_i^\top w^\ast$ and $w_l = w_l^\ast = w^\ast$. From the first-order optimality condition, we can obtain

$$\sum_i X_i\,\phi_i'(u_i^\ast) + \lambda n\,\nabla g(w^\ast) + \nabla h(w^\ast) = 0.$$

If we take $\alpha_i^\ast = -\phi_i'(u_i^\ast)$ and $\beta_l^\ast = \sum_{i\in S_l}X_i\alpha_i^\ast - \lambda n_l\,\nabla g(w^\ast)$ for some subgradients, then it is not difficult to check that all the equations in (6) are satisfied. It follows that we can achieve equality in Proposition 1:

$$P(w^\ast) = J(w^\ast; \{w_l^\ast\}; \{u_i^\ast\}; \{\alpha_i^\ast\}; \{\beta_l^\ast\}) = D(\alpha^\ast, \beta^\ast).$$
This means that zero duality gap can be achieved with $w^\ast$. It is easy to verify that $(\alpha^\ast, \beta^\ast)$ maximizes $D(\alpha, \beta)$, since for any $(\alpha, \beta)$ we have

$$D(\alpha, \beta) \leq J(w^\ast; \{w_l^\ast\}; \{u_i^\ast\}; \{\alpha_i\}; \{\beta_l\}) = P(w^\ast) = D(\alpha^\ast, \beta^\ast).$$

9.3 Proof of Proposition 3

Proof. We have the decompositions

$$D(\alpha, \beta) = \sum_{l=1}^{m} D_l(\alpha \mid \beta_l) - h^\ast(\beta)$$

and

$$P(w) = \sum_{l=1}^{m}\bigl[P_l(w \mid \beta_l) - \beta_l^\top w\bigr] + h(w).$$

It follows that the duality gap is

$$P(w) - D(\alpha, \beta) = \sum_{l=1}^{m}\bigl[P_l(w \mid \beta_l) - D_l(\alpha \mid \beta_l)\bigr] + h^\ast(\beta) + h(w) - \beta^\top w.$$

Note that the definition of the convex conjugate implies that $h^\ast(\beta) + h(w) - \beta^\top w \geq 0$, with equality when $\nabla h(w) = \beta$. This implies the desired result.

9.4 Proof of Proposition 4

Proof. It is easy to check, using duality, that for any $b$ and $w$:

$$-\lambda n\, g^\ast\Bigl(\frac{v - b}{\lambda n}\Bigr) - h^\ast(b) \leq \bigl[\lambda n\, g(w) - w^\top(v - b)\bigr] + \bigl[h(w) - b^\top w\bigr] = -w^\top v + \lambda n\, g(w) + h(w),$$

and equality holds if $b = \nabla h(w)$ and $\frac{v - b}{\lambda n} = \nabla g(w)$ for some subgradients. Based on the assumptions, equality can be achieved at $b = \beta(v) = \nabla h(w(v))$ and $w = w(v)$. This proves the desired result, noticing that $\frac{v - b}{\lambda n} = \nabla g(w)$ implies $w = \nabla g^\ast\bigl(\frac{v - b}{\lambda n}\bigr)$.
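The conjugate inequality invoked at the end of the proof of Proposition 3 is simply the Fenchel-Young inequality, and it is easy to sanity-check numerically. The snippet below is a toy check with a quadratic $h$ (our own choice; the proposition holds for general convex $h$).

```python
import numpy as np

# Toy check of the Fenchel-Young inequality used in the proof of Proposition 3:
#   h(w) + h*(z) - z'w >= 0,  with equality iff z = grad h(w).
# Illustrative choice: h(w) = (rho/2)*||w||^2, whose conjugate is
# h*(z) = ||z||^2 / (2*rho).

rho = 2.0
h = lambda w: 0.5 * rho * w @ w
h_conj = lambda z: (z @ z) / (2.0 * rho)

rng = np.random.default_rng(2)
w = rng.standard_normal(4)
z = rng.standard_normal(4)

slack = h(w) + h_conj(z) - z @ w                      # nonnegative in general
slack_opt = h(w) + h_conj(rho * w) - (rho * w) @ w    # zero at z = grad h(w)
```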
9.5 Proof of Proposition 5

Proof. Since $\alpha$ is fixed, the problem $\max_\beta D(\alpha, \beta)$ is equivalent to

$$\max_{\beta}\ \Bigl[-\sum_{l=1}^{m} \lambda n_l\, g^\ast\Bigl(v_l(\alpha) - \frac{\beta_l}{\lambda n_l}\Bigr)\Bigr] - h^\ast\Bigl(\sum_{l=1}^{m}\beta_l\Bigr),$$

where $v_l(\alpha) = \frac{1}{\lambda n_l}\sum_{i\in S_l}X_i\alpha_i$. Now, by using Jensen's inequality, we obtain for any $\{\beta_l\}$:

$$\sum_{l=1}^{m}\frac{n_l}{n}\, g^\ast\Bigl(v_l(\alpha) - \frac{\beta_l}{\lambda n_l}\Bigr) \geq g^\ast\Bigl(\sum_{l=1}^{m}\frac{n_l}{n}\Bigl(v_l(\alpha) - \frac{\beta_l}{\lambda n_l}\Bigr)\Bigr) = g^\ast\Bigl(v(\alpha) - \frac{\beta}{\lambda n}\Bigr),$$

with $v(\alpha) = \frac{1}{\lambda n}\sum_{i=1}^{n}X_i\alpha_i$ and $\beta = \sum_l \beta_l$, so that

$$-\sum_{l=1}^{m}\lambda n_l\, g^\ast\Bigl(v_l(\alpha) - \frac{\beta_l}{\lambda n_l}\Bigr) - h^\ast(\beta) \leq -\lambda n\, g^\ast\Bigl(v(\alpha) - \frac{\beta}{\lambda n}\Bigr) - h^\ast(\beta) \leq -\lambda n\, g^\ast\Bigl(v(\alpha) - \frac{\beta(v(\alpha))}{\lambda n}\Bigr) - h^\ast\bigl(\beta(v(\alpha))\bigr).$$

In the above derivation, the last inequality uses Proposition 4. Here the equalities can be achieved when

$$v_l(\alpha) - \frac{\beta_l}{\lambda n_l} = v(\alpha) - \frac{\beta(v(\alpha))}{\lambda n} \quad \text{for all } l,$$

which can be obtained with the choice $\{\beta_l\} = \{\beta_l^\ast\}$ given in the statement of the proposition.

9.6 Proof of Theorem 6

The following result is the mini-batch version of a related result in the analysis of ProxSDCA, which we apply to each local machine. The proof is included for completeness.

Lemma 15. Assume that each $\phi_i^\ast$ is $\gamma$-strongly convex w.r.t. $\|\cdot\|$ (where $\gamma$ can be zero) and $g^\ast$ is 1-smooth. At every local step, we randomly pick a mini-batch $Q_l \subseteq S_l$ of size $M_l := |Q_l|$ and optimize w.r.t. the dual variables $\alpha_i$, $i \in Q_l$. Then, using the simplified notation

$$P_l(w^{(t-1)}) = P_l(w^{(t-1)} \mid \beta_l^{(t-1)}), \qquad D_l(\alpha^{(t-1)}) = D_l(\alpha^{(t-1)} \mid \beta_l^{(t-1)}),$$

we have

$$\mathbb{E}\bigl[D_l(\alpha^{(t)}) - D_l(\alpha^{(t-1)})\bigr] \geq \frac{s_l M_l}{n_l}\,\mathbb{E}\bigl[P_l(w^{(t-1)}) - D_l(\alpha^{(t-1)})\bigr] - \Bigl(\frac{s_l M_l}{n_l}\Bigr)^2\,\frac{G_l^{(t)}}{2\lambda},$$

where

$$G_l^{(t)} := \sum_{i\in S_l}\Bigl(\|X_i\|^2 - \frac{\lambda\gamma n_l}{M_l}\cdot\frac{1-s_l}{s_l}\Bigr)\,\mathbb{E}\bigl\|u_i^{(t-1)} - \alpha_i^{(t-1)}\bigr\|^2,$$

$$\Delta\alpha_i := \alpha_i^{(t)} - \alpha_i^{(t-1)} = s_l\,\bigl(u_i^{(t-1)} - \alpha_i^{(t-1)}\bigr) \quad \text{for all } i \in Q_l, \tag{21}$$
and $u_i^{(t-1)} = -\nabla\phi_i(X_i^\top w^{(t-1)})$ for some subgradient, with $s_l \in [0, 1]$.

Proof. Since only the elements in $Q_l$ are updated, the improvement in the dual objective can be written as

$$D_l(\alpha^{(t)}) - D_l(\alpha^{(t-1)}) = \Bigl[-\sum_{i\in Q_l}\phi_i^\ast(-\alpha_i^{(t)}) - \lambda n_l\, g^\ast\Bigl(v_l^{(t-1)} + \frac{1}{\lambda n_l}\sum_{i\in Q_l}X_i\,\Delta\alpha_i\Bigr)\Bigr] - \Bigl[-\sum_{i\in Q_l}\phi_i^\ast(-\alpha_i^{(t-1)}) - \lambda n_l\, g^\ast\bigl(v_l^{(t-1)}\bigr)\Bigr]$$

$$\geq \underbrace{\Bigl[-\sum_{i\in Q_l}\phi_i^\ast(-\alpha_i^{(t)}) - \nabla g^\ast\bigl(v_l^{(t-1)}\bigr)^{\!\top}\sum_{i\in Q_l}X_i\,\Delta\alpha_i - \frac{1}{2\lambda n_l}\Bigl\|\sum_{i\in Q_l}X_i\,\Delta\alpha_i\Bigr\|^2\Bigr]}_{A_l} - \underbrace{\Bigl[-\sum_{i\in Q_l}\phi_i^\ast(-\alpha_i^{(t-1)})\Bigr]}_{B_l},$$

where we have used the fact that $g^\ast$ is 1-smooth in the derivation of the inequality. By the definition of the update in the algorithm, and the definition of $\Delta\alpha_i = s_l\,(u_i^{(t-1)} - \alpha_i^{(t-1)})$, $s_l \in [0, 1]$, we have

$$A_l \geq -\sum_{i\in Q_l}\phi_i^\ast\bigl(-\alpha_i^{(t-1)} - s_l(u_i^{(t-1)} - \alpha_i^{(t-1)})\bigr) - w^\top\sum_{i\in Q_l}X_i\, s_l\,(u_i^{(t-1)} - \alpha_i^{(t-1)}) - \frac{s_l^2}{2\lambda n_l}\Bigl\|\sum_{i\in Q_l}X_i\,(u_i^{(t-1)} - \alpha_i^{(t-1)})\Bigr\|^2, \tag{22}$$

where $w = \nabla g^\ast(v_l^{(t-1)})$. From now on, we omit the superscript $t-1$. Since $\phi_i^\ast$ is $\gamma$-strongly convex, we have

$$\phi_i^\ast\bigl(-\alpha_i - s_l(u_i - \alpha_i)\bigr) = \phi_i^\ast\bigl(s_l(-u_i) + (1-s_l)(-\alpha_i)\bigr) \leq s_l\,\phi_i^\ast(-u_i) + (1-s_l)\,\phi_i^\ast(-\alpha_i) - \frac{\gamma}{2}\, s_l(1-s_l)\,\|u_i - \alpha_i\|^2. \tag{23}$$
Bringing Eq. (23) into Eq. (22), we get

$$A_l - B_l \geq \sum_{i\in Q_l}\Bigl[-s_l\,\phi_i^\ast(-u_i) + s_l\,\phi_i^\ast(-\alpha_i) + \frac{\gamma}{2}s_l(1-s_l)\|u_i - \alpha_i\|^2 - s_l\, w^\top X_i(u_i - \alpha_i)\Bigr] - \frac{s_l^2}{2\lambda n_l}\Bigl\|\sum_{i\in Q_l}X_i(u_i - \alpha_i)\Bigr\|^2$$

$$\geq \sum_{i\in Q_l}s_l\Bigl[-w^\top X_i u_i - \phi_i^\ast(-u_i) + \phi_i^\ast(-\alpha_i) + w^\top X_i\alpha_i\Bigr] + \sum_{i\in Q_l}\frac{s_l}{2}\Bigl[\gamma(1-s_l) - \frac{s_l M_l\|X_i\|^2}{\lambda n_l}\Bigr]\|u_i - \alpha_i\|^2,$$

where we get the second inequality from the fact that $\|\sum_{i\in Q_l}a_i\|^2 \leq M_l\sum_{i\in Q_l}\|a_i\|^2$. Since we choose $u_i = -\nabla\phi_i(X_i^\top w)$ for some subgradient $\nabla\phi_i(X_i^\top w)$, the Fenchel equality yields $-w^\top X_i u_i - \phi_i^\ast(-u_i) = \phi_i(X_i^\top w)$, and we obtain

$$A_l - B_l \geq \sum_{i\in Q_l}s_l\bigl[\phi_i(X_i^\top w) + \phi_i^\ast(-\alpha_i) + w^\top X_i\alpha_i\bigr] - \frac{s_l^2 M_l}{2\lambda n_l}\sum_{i\in Q_l}\Bigl(\|X_i\|^2 - \frac{\lambda\gamma n_l}{M_l}\cdot\frac{1-s_l}{s_l}\Bigr)\|u_i - \alpha_i\|^2. \tag{24}$$

Recall that with $w = \nabla g^\ast(\tilde{v})$, we have $g(w) + g^\ast(\tilde{v}) = w^\top\tilde{v}$. Then we derive the local duality gap as

$$P_l(w) - D_l(\alpha) = \sum_{i\in S_l}\phi_i(X_i^\top w) + \lambda n_l\, g(w) + \beta_l^\top w + \sum_{i\in S_l}\phi_i^\ast(-\alpha_i) + \lambda n_l\, g^\ast\Bigl(\frac{\sum_{i\in S_l}X_i\alpha_i - \beta_l}{\lambda n_l}\Bigr) = \sum_{i\in S_l}\bigl[\phi_i(X_i^\top w) + \phi_i^\ast(-\alpha_i) + w^\top X_i\alpha_i\bigr].$$
Then, taking the expectation of Eq. (24) w.r.t. the random choice of the mini-batch set $Q_l$ at round $t$ (each $i \in S_l$ appears with probability $M_l/n_l$), we obtain

$$\mathbb{E}_t[A_l - B_l] \geq \frac{s_l M_l}{n_l}\sum_{i\in S_l}\bigl[\phi_i(X_i^\top w) + \phi_i^\ast(-\alpha_i) + w^\top X_i\alpha_i\bigr] - \Bigl(\frac{s_l M_l}{n_l}\Bigr)^2\frac{1}{2\lambda}\sum_{i\in S_l}\Bigl(\|X_i\|^2 - \frac{\lambda\gamma n_l}{M_l}\cdot\frac{1-s_l}{s_l}\Bigr)\|u_i - \alpha_i\|^2$$

$$= \frac{s_l M_l}{n_l}\bigl[P_l(w) - D_l(\alpha)\bigr] - \Bigl(\frac{s_l M_l}{n_l}\Bigr)^2\frac{1}{2\lambda}\sum_{i\in S_l}\Bigl(\|X_i\|^2 - \frac{\lambda\gamma n_l}{M_l}\cdot\frac{1-s_l}{s_l}\Bigr)\|u_i - \alpha_i\|^2.$$

Taking the expectation of both sides w.r.t. the randomness in the previous iterations, we have

$$\mathbb{E}[A_l - B_l] \geq \frac{s_l M_l}{n_l}\,\mathbb{E}\bigl[P_l(w) - D_l(\alpha)\bigr] - \Bigl(\frac{s_l M_l}{n_l}\Bigr)^2\,\frac{G_l^{(t)}}{2\lambda},$$

where

$$G_l^{(t)} := \sum_{i\in S_l}\Bigl(\|X_i\|^2 - \frac{\lambda\gamma n_l}{M_l}\cdot\frac{1-s_l}{s_l}\Bigr)\,\mathbb{E}\bigl\|u_i - \alpha_i\bigr\|^2.$$
Proof of Theorem 6. Proof. We apply Lemma 15 with

$$s_l = \frac{1}{1 + \frac{R^2 M_l}{\lambda\gamma n_l}} = \frac{\lambda\gamma n_l}{\lambda\gamma n_l + M_l R^2} \in [0, 1], \qquad l = 1, \ldots, m.$$

Recall that $\|X_i\| \leq R$ for all $i \in S_l$; then

$$\|X_i\|^2 - \frac{\lambda\gamma n_l}{M_l}\cdot\frac{1-s_l}{s_l} \leq 0 \quad \text{for all } i \in S_l,$$

which implies that $G_l^{(t)} \leq 0$ for all $l$. It follows that for all $l$, after the local update step, we have

$$\mathbb{E}\bigl[D_l(\alpha^{(t)} \mid \beta_l^{(t-1)}) - D_l(\alpha^{(t-1)} \mid \beta_l^{(t-1)})\bigr] \geq \frac{s_l M_l}{n_l}\,\mathbb{E}\bigl[P_l(w^{(t-1)} \mid \beta_l^{(t-1)}) - D_l(\alpha^{(t-1)} \mid \beta_l^{(t-1)})\bigr]. \tag{25}$$

Now we note that after the global step at iteration $t-1$, the choices of $w^{(t-1)}$ and $\beta^{(t-1)}$ in DADM follow Proposition 4 and Proposition 5. It follows from Proposition 5 that the global and local duality gaps at the beginning of the $t$-th iteration satisfy

$$P(w^{(t-1)}) - D(\alpha^{(t-1)}, \beta^{(t-1)}) = \sum_{l=1}^{m}\bigl[P_l(w^{(t-1)} \mid \beta_l^{(t-1)}) - D_l(\alpha^{(t-1)} \mid \beta_l^{(t-1)})\bigr].$$

Using this decomposition and summing (25) over $l$, we obtain

$$\mathbb{E}\bigl[D(\alpha^{(t)}, \beta^{(t-1)}) - D(\alpha^{(t-1)}, \beta^{(t-1)})\bigr] \geq q\,\mathbb{E}\bigl[P(w^{(t-1)}) - D(\alpha^{(t-1)}, \beta^{(t-1)})\bigr],$$

where

$$q = \min_l \frac{s_l M_l}{n_l} = \min_l \frac{\lambda\gamma M_l}{\lambda\gamma n_l + M_l R^2}.$$

Since $D(\alpha^{(t)}, \beta^{(t)}) \geq D(\alpha^{(t)}, \beta^{(t-1)})$, we obtain

$$\mathbb{E}\bigl[D(\alpha^{(t)}, \beta^{(t)}) - D(\alpha^{(t-1)}, \beta^{(t-1)})\bigr] \geq q\,\mathbb{E}\bigl[P(w^{(t-1)}) - D(\alpha^{(t-1)}, \beta^{(t-1)})\bigr].$$

Let $(\alpha^\ast, \beta^\ast)$ be the optimal solution of the dual problem, and define the dual sub-optimality $\epsilon_D^{(t)} := D(\alpha^\ast, \beta^\ast) - D(\alpha^{(t)}, \beta^{(t)})$ and the duality gap $\epsilon_G^{(t-1)} := P(w^{(t-1)}) - D(\alpha^{(t-1)}, \beta^{(t-1)})$, so that $\epsilon_D^{(t-1)} \leq \epsilon_G^{(t-1)}$. It follows that

$$\mathbb{E}[\epsilon_D^{(t-1)}] \geq \mathbb{E}[\epsilon_D^{(t-1)} - \epsilon_D^{(t)}] \geq q\,\mathbb{E}[\epsilon_G^{(t-1)}] \geq q\,\mathbb{E}[\epsilon_D^{(t-1)}].$$

Therefore we have

$$q\,\mathbb{E}[\epsilon_G^{(t)}] \leq \mathbb{E}[\epsilon_D^{(t)}] \leq (1 - q)\,\mathbb{E}[\epsilon_D^{(t-1)}] \leq (1 - q)^t\,\epsilon_D^{(0)} \leq e^{-qt}\,\epsilon_D^{(0)}.$$

To obtain an expected duality gap of $\mathbb{E}[\epsilon_G^{(T)}] \leq \epsilon$, every $T$ satisfying

$$T \geq \frac{1}{q}\,\log\frac{\epsilon_D^{(0)}}{q\,\epsilon}$$

is sufficient. This proves the desired bound.

9.7 Proof of Theorem 7

Now, we consider $L$-Lipschitz loss functions and use the following basic lemma for $L$-Lipschitz losses, taken from Shalev-Shwartz and Zhang (2013, 2014).

Lemma 16. Let $\phi: \mathbb{R}^q \to \mathbb{R}$ be an $L$-Lipschitz function w.r.t. $\|\cdot\|$. Then $\phi^\ast(\alpha) = \infty$ for any $\alpha \in \mathbb{R}^q$ with $\|\alpha\| > L$.

Proof of Theorem 7. Proof. Applying Lemma 15 with $\gamma = 0$, we have

$$G_l^{(t)} = \sum_{i\in S_l}\|X_i\|^2\,\mathbb{E}\bigl\|u_i^{(t-1)} - \alpha_i^{(t-1)}\bigr\|^2.$$
According to Lemma 16, we know that $\|u_i^{(t-1)}\| \leq L$ and $\|\alpha_i^{(t-1)}\| \leq L$; thus we have

$$\bigl\|u_i^{(t-1)} - \alpha_i^{(t-1)}\bigr\|^2 \leq \bigl(\|u_i^{(t-1)}\| + \|\alpha_i^{(t-1)}\|\bigr)^2 \leq 4L^2.$$

Recall that $\|X_i\| \leq R$; then $G_l^{(t)} \leq G_l$, where $G_l = 4 n_l R^2 L^2$. Combining this with Lemma 15, we have

$$\mathbb{E}\bigl[D_l(\alpha^{(t)} \mid \beta_l^{(t-1)}) - D_l(\alpha^{(t-1)} \mid \beta_l^{(t-1)})\bigr] \geq \frac{s_l M_l}{n_l}\,\mathbb{E}\bigl[P_l(w^{(t-1)} \mid \beta_l^{(t-1)}) - D_l(\alpha^{(t-1)} \mid \beta_l^{(t-1)})\bigr] - \Bigl(\frac{s_l M_l}{n_l}\Bigr)^2\frac{G_l}{2\lambda}. \tag{26}$$

Now we also note that after the global step at iteration $t-1$, the choices of $w^{(t-1)}$ and $\beta^{(t-1)}$ in DADM follow Proposition 4 and Proposition 5. It follows from Proposition 5 that the global and local duality gaps at the beginning of the $t$-th iteration satisfy

$$P(w^{(t-1)}) - D(\alpha^{(t-1)}, \beta^{(t-1)}) = \sum_{l=1}^{m}\bigl[P_l(w^{(t-1)} \mid \beta_l^{(t-1)}) - D_l(\alpha^{(t-1)} \mid \beta_l^{(t-1)})\bigr].$$

Summing inequality (26) over $l$, combining it with the above decomposition, and using $D(\alpha^{(t)}, \beta^{(t)}) \geq D(\alpha^{(t)}, \beta^{(t-1)})$, we get

$$\mathbb{E}\bigl[D(\alpha^{(t)}, \beta^{(t)}) - D(\alpha^{(t-1)}, \beta^{(t-1)})\bigr] \geq q\,\mathbb{E}\bigl[P(w^{(t-1)}) - D(\alpha^{(t-1)}, \beta^{(t-1)})\bigr] - \sum_{l=1}^{m}\frac{q^2\, G_l}{2\lambda}, \tag{27}$$

where $q \in [0, \min_l \frac{M_l}{n_l}]$, $q = \frac{s_l M_l}{n_l}$, and the $s_l \in [0, 1]$ are chosen so that all $\frac{s_l M_l}{n_l}$, $l = 1, \ldots, m$, are equal.

Let $(\alpha^\ast, \beta^\ast)$ be the optimal solution of the dual problem $D(\alpha, \beta)$, and recall the dual sub-optimality $\epsilon_D^{(t)} := D(\alpha^\ast, \beta^\ast) - D(\alpha^{(t)}, \beta^{(t)})$. Note that the duality gap is an upper bound of the dual sub-optimality, $P(w^{(t-1)}) - D(\alpha^{(t-1)}, \beta^{(t-1)}) \geq \epsilon_D^{(t-1)}$. Then (27) implies

$$\mathbb{E}\bigl[\epsilon_D^{(t)}\bigr] \leq (1 - q)\,\mathbb{E}\bigl[\epsilon_D^{(t-1)}\bigr] + \frac{q^2\, n\, G}{2\lambda}, \qquad G = \frac{1}{n}\sum_{l=1}^{m} G_l = 4R^2L^2.$$

Starting from this recursion, we can now apply the same analysis as for $L$-Lipschitz losses with single-machine SDCA in Shalev-Shwartz and Zhang (2013) to obtain the desired inequality

$$\mathbb{E}\Bigl[\frac{\epsilon_D^{(t)}}{n}\Bigr] \leq \frac{2G}{\lambda\,(\tilde{n} + t - t_0)} \quad \text{for all } t \geq t_0 = \max\Bigl(0,\ \Bigl\lceil \tilde{n}\,\log\frac{2\lambda\,\epsilon_D^{(0)}}{n\, G}\Bigr\rceil\Bigr), \qquad \tilde{n} = \max_l \frac{n_l}{M_l}. \tag{28}$$

Further applying the same strategies as in Shalev-Shwartz and Zhang (2013), based on (28), proves the desired bound.
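Lemma 16, which drives the bound $\|u_i - \alpha_i\| \leq 2L$ above, can be illustrated numerically. The snippet below is our own toy example: it approximates the conjugate of the 1-Lipschitz hinge loss on a wide grid; the approximation stays bounded for $a$ inside the conjugate's domain $[-1, 0]$ and grows with the grid radius outside it, reflecting $\phi^\ast(a) = \infty$ for $\|a\| > L$.

```python
import numpy as np

# Toy illustration of Lemma 16 with the 1-Lipschitz hinge phi(u) = max(0, 1-u):
# phi*(a) = sup_u (a*u - phi(u)) is finite only on [-1, 0]; a grid sup stays
# bounded there but scales with the grid radius outside.

u = np.linspace(-1e4, 1e4, 200001)
phi_vals = np.maximum(0.0, 1.0 - u)

def conj_approx(a):
    return np.max(a * u - phi_vals)

inside = conj_approx(-0.5)    # true value: phi*(-0.5) = -0.5
outside = conj_approx(1.5)    # diverges as the grid widens
```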
9.8 Proof of Theorem 11

Our proof strategy follows Shalev-Shwartz and Zhang (2014) and Frostig et al. (2015), both of which apply the acceleration technique of Nesterov (2004) on top of approximate proximal point steps. The main differences compared with Shalev-Shwartz and Zhang (2014) and Frostig et al. (2015) are that here we warm-start with two groups of dual variables, $\alpha$ and $\beta$, whereas Shalev-Shwartz and Zhang (2014) warm-start only with $\alpha$ (as they consider the single-machine setting) and Frostig et al. (2015) warm-start from the primal variables $w$.

Proof. The proof consists of the following steps:
1. In Lemma 17, we show that one can construct a quadratic lower bound of the original objective $P(w)$ from an approximate minimizer of the proximal objective $P_t(w)$.
2. Using the quadratic lower bound, we construct an estimation sequence, based on which Lemma 18 proves the accelerated convergence rate of the outer loop.
3. We show in Lemma 19 that by warm-starting the iterates from the last stage, the dual sub-optimality for the next stage is small.

Based on Lemma 19, the contraction factor between the initial dual sub-optimality and the target primal-dual gap at stage $t$ can be upper bounded as

$$\frac{D_t(\alpha^t_{\mathrm{opt}}, \beta^t_{\mathrm{opt}}) - D_t(\alpha^{t-1}, \beta^{t-1})}{\sqrt{\eta}\,\xi^{t-1}/(2+\sqrt{\eta})} \leq \frac{36\,\kappa}{\lambda},$$

where the intermediate steps bound the numerator by a constant multiple of $(\kappa/\lambda)\,\xi^{t-1}$ and the last step uses $1/\eta = (\lambda+\kappa)/\lambda$. Thus, using the results for plain DADM (Theorem 6), the number of inner iterations in each stage is upper bounded by

$$\chi\,\log\Bigl(\chi\cdot\frac{D_t(\alpha^t_{\mathrm{opt}}, \beta^t_{\mathrm{opt}}) - D_t(\alpha^{t-1}, \beta^{t-1})}{\sqrt{\eta}\,\xi^{t-1}/(2+\sqrt{\eta})}\Bigr) \leq \chi\,\log\Bigl(\chi\cdot\frac{36\,\kappa}{\lambda}\Bigr), \qquad \chi = \frac{R^2}{\gamma(\lambda+\kappa)} + \max_l \frac{n_l}{M_l}.$$

9.9 Proof of Corollary 13

By the property of $\tilde{\phi}_i(u)$, for every $w$ we have

$$\frac{\hat{P}(w)}{n} \leq \frac{P(w)}{n} \leq \frac{\hat{P}(w)}{n} + \frac{\gamma L^2}{2};$$
thus if we have found a predictor $w^t$ that is $\frac{\epsilon}{2}$-suboptimal with respect to $\hat{P}(w)/n$,

$$\frac{\hat{P}(w^t)}{n} - \min_w \frac{\hat{P}(w)}{n} \leq \frac{\epsilon}{2},$$

and we choose $\gamma = \epsilon/L^2$, then we know $w^t$ must be $\epsilon$-suboptimal with respect to $P(w)/n$, because

$$\frac{P(w^t)}{n} - \frac{P(w^\ast)}{n} \leq \frac{\hat{P}(w^t)}{n} + \frac{\gamma L^2}{2} - \frac{\hat{P}(w^\ast)}{n} \leq \min_w \frac{\hat{P}(w)}{n} + \frac{\epsilon}{2} + \frac{\epsilon}{2} - \min_w \frac{\hat{P}(w)}{n} = \epsilon.$$

The rest of the proof follows the smooth case as proved in Theorem 11.

Dual subproblems in Acc-DADM. Define $\tilde{\lambda} = \lambda + \kappa$ and $f(w) = \frac{1}{\tilde{\lambda}}\bigl(\lambda\, g(w) + \frac{\kappa}{2}\|w\|^2\bigr)$, which collects the strongly convex part. Let

$$P_t(w) = \sum_{i=1}^{n}\phi_i(X_i^\top w) + \lambda n\, g(w) + h(w) + \frac{\kappa n}{2}\,\|w - y^{t-1}\|^2 = \sum_{i=1}^{n}\phi_i(X_i^\top w) + \tilde{\lambda} n\, f(w) - \kappa n\,\langle y^{t-1}, w\rangle + h(w) + \frac{\kappa n}{2}\,\|y^{t-1}\|^2$$

be the global primal problem to solve, and let

$$P_{t,l}(w) = \sum_{i\in S_l}\phi_i(X_i^\top w) + \lambda n_l\, g(w) + \frac{\kappa n_l}{2}\,\|w - y^{t-1}\|^2$$

be the separated local problem. Given each dual variable $\beta_l$, we also define the adjusted local primal problem as

$$P_{t,l}(w \mid \beta_l) = \sum_{i\in S_l}\phi_i(X_i^\top w) + \lambda n_l\, g(w) + \beta_l^\top w + \frac{\kappa n_l}{2}\,\|w - y^{t-1}\|^2.$$

It is not hard to see that the adjusted local dual problem is

$$D_{t,l}(\alpha \mid \beta_l) = -\sum_{i\in S_l}\phi_i^\ast(-\alpha_i) - \tilde{\lambda} n_l\, f^\ast\Bigl(\frac{\sum_{i\in S_l}X_i\alpha_i - \beta_l + \kappa n_l\, y^{t-1}}{\tilde{\lambda} n_l}\Bigr) + \frac{\kappa n_l}{2}\,\|y^{t-1}\|^2,$$

and the global dual objective can be written as

$$D_t(\alpha, \beta) = \sum_{l=1}^{m} D_{t,l}(\alpha \mid \beta_l) - h^\ast\Bigl(\sum_{l=1}^{m}\beta_l\Bigr).$$

Quadratic lower bound for $P(w)$ based on the approximate proximal point algorithm. Since $P_t(w) = P(w) + \frac{\kappa n}{2}\|w - y^{t-1}\|^2$, let $w^t_{\mathrm{opt}} = \arg\min_w P_t(w)$. The following lemma shows that we can construct a lower bound of $P(w)$ from an approximate minimizer of $P_t(w)$.
Lemma 17. Let $w^+$ be an $\epsilon$-approximate minimizer of $P_t(w)$, i.e., $P_t(w^+) \leq P_t(w^t_{\mathrm{opt}}) + \epsilon$. Then we can construct the following quadratic lower bound for $P(w)$: for all $w$,

$$P(w) \geq P(w^+) + Q(w;\ w^+, y^{t-1}, \epsilon), \tag{29}$$

where

$$Q(w;\ w^+, y^{t-1}, \epsilon) = \frac{\lambda n}{4}\,\Bigl\|w - y^{t-1} + \frac{\kappa}{\lambda}\,(y^{t-1} - w^+)\Bigr\|^2 - \frac{\kappa n}{2}\Bigl(1 + \frac{\kappa}{\lambda}\Bigr)\,\|w^+ - y^{t-1}\|^2 - \frac{\kappa + \lambda}{\lambda}\,\epsilon.$$

Proof. Since $w^t_{\mathrm{opt}}$ is the minimizer of the $(\kappa + \lambda)n$-strongly convex objective $P_t(w)$, for all $w$ we have

$$P_t(w) \geq P_t(w^t_{\mathrm{opt}}) + \frac{(\kappa+\lambda)n}{2}\,\|w - w^t_{\mathrm{opt}}\|^2 \geq P_t(w^+) - \epsilon + \frac{(\kappa+\lambda)n}{2}\,\|w - w^t_{\mathrm{opt}}\|^2,$$

which, using $P_t(w) = P(w) + \frac{\kappa n}{2}\|w - y^{t-1}\|^2$, is equivalent to

$$P(w) \geq P(w^+) + \frac{\kappa n}{2}\,\|w^+ - y^{t-1}\|^2 - \frac{\kappa n}{2}\,\|w - y^{t-1}\|^2 + \frac{(\kappa+\lambda)n}{2}\,\|w - w^t_{\mathrm{opt}}\|^2 - \epsilon.$$

Expanding $\|w - w^t_{\mathrm{opt}}\|^2$ around $w^+$, controlling the cross term $\langle w - w^+,\ w^+ - w^t_{\mathrm{opt}}\rangle$ via the Cauchy-Schwarz and Young inequalities, and noting that strong convexity also gives $\frac{(\kappa+\lambda)n}{2}\,\|w^+ - w^t_{\mathrm{opt}}\|^2 \leq \epsilon$, the right-hand side becomes, after reorganizing terms and decomposing $w - w^+ = (w - y^{t-1}) + (y^{t-1} - w^+)$, a quadratic function of $w$. Completing the square, the quadratic is minimized at

$$w = y^{t-1} - \frac{\kappa}{\lambda}\,(y^{t-1} - w^+),$$

with minimum value

$$-\frac{\kappa n}{2}\Bigl(1 + \frac{\kappa}{\lambda}\Bigr)\,\|w^+ - y^{t-1}\|^2 - \frac{\kappa+\lambda}{\lambda}\,\epsilon,$$

which matches $\min_w Q(w;\ w^+, y^{t-1}, \epsilon)$ above. This finishes the proof of Lemma 17.
Convergence proof. Define the following sequence of quadratic functions: $\psi_0(w) = P(0) + \frac{\lambda n}{4}\,\|w\|^2 - \bigl(P(0) - D(0,0)\bigr)$, and for $t \geq 1$,

$$\psi_t(w) = (1 - \sqrt{\eta})\,\psi_{t-1}(w) + \sqrt{\eta}\,\bigl[P(w^t) + Q(w;\ w^t, y^{t-1}, \epsilon_t)\bigr],$$

where $\eta = \lambda/(\lambda + \kappa)$. We first calculate the explicit form of the quadratic function $\psi_t(w)$ and its minimizer $v^t = \arg\min_w \psi_t(w)$. Clearly $v^0 = 0$, and since $\psi_t(w)$ is always a $\frac{\lambda n}{2}$-strongly convex quadratic, $\psi_t(w)$ has the form

$$\psi_t(w) = \psi_t(v^t) + \frac{\lambda n}{4}\,\|w - v^t\|^2.$$

Based on the definition of $\psi_{t+1}(w)$ and the fact that $v^{t+1}$ minimizes $\psi_{t+1}(w)$, the first-order optimality condition gives

$$(1 - \sqrt{\eta})\,\frac{\lambda n}{2}\,(v^{t+1} - v^t) + \sqrt{\eta}\,\frac{\lambda n}{2}\,\Bigl(v^{t+1} - y^t + \frac{\kappa}{\lambda}\,(y^t - w^{t+1})\Bigr) = 0,$$

and rearranging we get

$$v^{t+1} = (1 - \sqrt{\eta})\,v^t + \sqrt{\eta}\,\Bigl(y^t - \frac{\kappa}{\lambda}\,(y^t - w^{t+1})\Bigr).$$

The following lemma proves the convergence rate of $w^t$ to the minimizer $w^\ast$.

Lemma 18. Let $\epsilon_t \leq \frac{\sqrt{\eta}}{2(1+\sqrt{\eta})}\,\xi_t$, with $\xi_t = (1 - \sqrt{\eta}/2)^t\,\xi_0$. Then we have the convergence guarantee

$$P(w^t) - P(w^\ast) \leq \xi_t.$$

Proof. It is sufficient to prove

$$P(w^t) - \min_w \psi_t(w) \leq \xi_t, \tag{30}$$

since then we get

$$P(w^t) - P(w^\ast) \leq P(w^t) - \psi_t(w^\ast) \leq P(w^t) - \min_w \psi_t(w) \leq \xi_t.$$

We prove equation (30) by induction. When $t = 0$, we have

$$P(w^0) - \min_w \psi_0(w) = P(0) - D(0, 0) \leq \xi_0,$$
which verifies that (30) holds for $t = 0$. Suppose the claim holds for some $t \geq 0$; for the stage $t + 1$, we have

$$\psi_{t+1}(v^{t+1}) = (1 - \sqrt{\eta})\Bigl[\psi_t(v^t) + \frac{\lambda n}{4}\,\|v^{t+1} - v^t\|^2\Bigr] + \sqrt{\eta}\,\bigl[P(w^{t+1}) + Q(v^{t+1};\ w^{t+1}, y^t, \epsilon_t)\bigr].$$

Substituting $v^{t+1} = (1 - \sqrt{\eta})\,v^t + \sqrt{\eta}\,\bigl(y^t - \frac{\kappa}{\lambda}(y^t - w^{t+1})\bigr)$ and expanding the squared norms, the terms $\bigl\|v^t - y^t + \frac{\kappa}{\lambda}(y^t - w^{t+1})\bigr\|^2$ produced by the two parts can be combined; collecting the remaining squared and inner-product terms then yields

$$\psi_{t+1}(v^{t+1}) \geq (1 - \sqrt{\eta})\,\psi_t(v^t) + \sqrt{\eta}\,P(w^{t+1}) - \sqrt{\eta}\Bigl(\sqrt{\eta} + \frac{\kappa}{\lambda}\Bigr)\frac{\kappa n}{2}\,\|y^t - w^{t+1}\|^2 + \sqrt{\eta}\,(1 - \sqrt{\eta})\,\frac{\lambda n}{2}\Bigl(1 + \frac{\kappa}{\lambda}\Bigr)\,\langle v^t - y^t,\ y^t - w^{t+1}\rangle - \sqrt{\eta}\,\frac{\kappa + \lambda}{\lambda}\,\epsilon_t.$$
More informationApplication of Particle Swarm Optimization to Economic Dispatch Problem: Advantages and Disadvantages
Appcaton of Partce Swarm Optmzaton to Economc Dspatch Probem: Advantages and Dsadvantages Kwang Y. Lee, Feow, IEEE, and Jong-Bae Par, Member, IEEE Abstract--Ths paper summarzes the state-of-art partce
More informationMonica Purcaru and Nicoleta Aldea. Abstract
FILOMAT (Nš) 16 (22), 7 17 GENERAL CONFORMAL ALMOST SYMPLECTIC N-LINEAR CONNECTIONS IN THE BUNDLE OF ACCELERATIONS Monca Purcaru and Ncoeta Adea Abstract The am of ths paper 1 s to fnd the transformaton
More informationAn Augmented Lagrangian Coordination-Decomposition Algorithm for Solving Distributed Non-Convex Programs
An Augmented Lagrangan Coordnaton-Decomposton Agorthm for Sovng Dstrbuted Non-Convex Programs Jean-Hubert Hours and Con N. Jones Abstract A nove augmented Lagrangan method for sovng non-convex programs
More informationCorrespondence. Performance Evaluation for MAP State Estimate Fusion I. INTRODUCTION
Correspondence Performance Evauaton for MAP State Estmate Fuson Ths paper presents a quanttatve performance evauaton method for the maxmum a posteror (MAP) state estmate fuson agorthm. Under dea condtons
More information4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA
4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationCSC 411 / CSC D11 / CSC C11
18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t
More informationMAXIMUM NORM STABILITY OF DIFFERENCE SCHEMES FOR PARABOLIC EQUATIONS ON OVERSET NONMATCHING SPACE-TIME GRIDS
MATHEMATICS OF COMPUTATION Voume 72 Number 242 Pages 619 656 S 0025-57180201462-X Artce eectroncay pubshed on November 4 2002 MAXIMUM NORM STABILITY OF DIFFERENCE SCHEMES FOR PARABOLIC EQUATIONS ON OVERSET
More informationThe Application of BP Neural Network principal component analysis in the Forecasting the Road Traffic Accident
ICTCT Extra Workshop, Bejng Proceedngs The Appcaton of BP Neura Network prncpa component anayss n Forecastng Road Traffc Accdent He Mng, GuoXucheng &LuGuangmng Transportaton Coege of Souast Unversty 07
More informationPredicting Model of Traffic Volume Based on Grey-Markov
Vo. No. Modern Apped Scence Predctng Mode of Traffc Voume Based on Grey-Marov Ynpeng Zhang Zhengzhou Muncpa Engneerng Desgn & Research Insttute Zhengzhou 5005 Chna Abstract Grey-marov forecastng mode of
More informationProblem Set 9 Solutions
Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem
More informationShort-Term Load Forecasting for Electric Power Systems Using the PSO-SVR and FCM Clustering Techniques
Energes 20, 4, 73-84; do:0.3390/en40073 Artce OPEN ACCESS energes ISSN 996-073 www.mdp.com/journa/energes Short-Term Load Forecastng for Eectrc Power Systems Usng the PSO-SVR and FCM Custerng Technques
More informationDISTRIBUTED PROCESSING OVER ADAPTIVE NETWORKS. Cassio G. Lopes and Ali H. Sayed
DISTRIBUTED PROCESSIG OVER ADAPTIVE ETWORKS Casso G Lopes and A H Sayed Department of Eectrca Engneerng Unversty of Caforna Los Angees, CA, 995 Ema: {casso, sayed@eeucaedu ABSTRACT Dstrbuted adaptve agorthms
More informationAchieving Optimal Throughput Utility and Low Delay with CSMA-like Algorithms: A Virtual Multi-Channel Approach
Achevng Optma Throughput Utty and Low Deay wth SMA-ke Agorthms: A Vrtua Mut-hanne Approach Po-Ka Huang, Student Member, IEEE, and Xaojun Ln, Senor Member, IEEE Abstract SMA agorthms have recenty receved
More informationInexact Newton Methods for Inverse Eigenvalue Problems
Inexact Newton Methods for Inverse Egenvalue Problems Zheng-jan Ba Abstract In ths paper, we survey some of the latest development n usng nexact Newton-lke methods for solvng nverse egenvalue problems.
More informationAssortment Optimization under MNL
Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.
More informationLecture 4. Instructor: Haipeng Luo
Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would
More informationSIMULTANEOUS wireless information and power transfer. Joint Optimization of Power and Data Transfer in Multiuser MIMO Systems
Jont Optmzaton of Power and Data ransfer n Mutuser MIMO Systems Javer Rubo, Antono Pascua-Iserte, Dane P. Paomar, and Andrea Godsmth Unverstat Potècnca de Cataunya UPC, Barceona, Span ong Kong Unversty
More informationSupporting Information
Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to
More informationNP-Completeness : Proofs
NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem
More informationFinding low error clusterings
Fndng ow error custerngs Mara-Forna Bacan Mcrosoft Research, New Engand One Memora Drve, Cambrdge, MA mabacan@mcrosoft.com Mark Braverman Mcrosoft Research, New Engand One Memora Drve, Cambrdge, MA markbrav@mcrosoft.com
More information[WAVES] 1. Waves and wave forces. Definition of waves
1. Waves and forces Defnton of s In the smuatons on ong-crested s are consdered. The drecton of these s (μ) s defned as sketched beow n the goba co-ordnate sstem: North West East South The eevaton can
More informationWhich Separator? Spring 1
Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal
More informationErrors for Linear Systems
Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016
U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and
More informationResearch Article H Estimates for Discrete-Time Markovian Jump Linear Systems
Mathematca Probems n Engneerng Voume 213 Artce ID 945342 7 pages http://dxdoorg/11155/213/945342 Research Artce H Estmates for Dscrete-Tme Markovan Jump Lnear Systems Marco H Terra 1 Gdson Jesus 2 and
More informationSingular Value Decomposition: Theory and Applications
Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real
More informationThe Entire Solution Path for Support Vector Machine in Positive and Unlabeled Classification 1
Abstract The Entre Souton Path for Support Vector Machne n Postve and Unabeed Cassfcaton 1 Yao Lmn, Tang Je, and L Juanz Department of Computer Scence, Tsnghua Unversty 1-308, FIT, Tsnghua Unversty, Bejng,
More informationDownlink Power Allocation for CoMP-NOMA in Multi-Cell Networks
Downn Power Aocaton for CoMP-NOMA n Mut-Ce Networs Md Shpon A, Eram Hossan, Arafat A-Dwe, and Dong In Km arxv:80.0498v [eess.sp] 6 Dec 207 Abstract Ths wor consders the probem of dynamc power aocaton n
More informationarxiv: v1 [cs.gt] 28 Mar 2017
A Dstrbuted Nash qubrum Seekng n Networked Graphca Games Farzad Saehsadaghan, and Lacra Pave arxv:7009765v csgt 8 Mar 07 Abstract Ths paper consders a dstrbuted gossp approach for fndng a Nash equbrum
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationInner Product. Euclidean Space. Orthonormal Basis. Orthogonal
Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,
More informationGames of Threats. Elon Kohlberg Abraham Neyman. Working Paper
Games of Threats Elon Kohlberg Abraham Neyman Workng Paper 18-023 Games of Threats Elon Kohlberg Harvard Busness School Abraham Neyman The Hebrew Unversty of Jerusalem Workng Paper 18-023 Copyrght 2017
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationSolutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution.
Solutons HW #2 Dual of general LP. Fnd the dual functon of the LP mnmze subject to c T x Gx h Ax = b. Gve the dual problem, and make the mplct equalty constrants explct. Soluton. 1. The Lagrangan s L(x,
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationGreyworld White Balancing with Low Computation Cost for On- Board Video Capturing
reyword Whte aancng wth Low Computaton Cost for On- oard Vdeo Capturng Peng Wu Yuxn Zoe) Lu Hewett-Packard Laboratores Hewett-Packard Co. Pao Ato CA 94304 USA Abstract Whte baancng s a process commony
More informationSome Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)
Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998
More informationAnalysis of Bipartite Graph Codes on the Binary Erasure Channel
Anayss of Bpartte Graph Codes on the Bnary Erasure Channe Arya Mazumdar Department of ECE Unversty of Maryand, Coege Par ema: arya@umdedu Abstract We derve densty evouton equatons for codes on bpartte
More informationThe Order Relation and Trace Inequalities for. Hermitian Operators
Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence
More informationDelay tomography for large scale networks
Deay tomography for arge scae networks MENG-FU SHIH ALFRED O. HERO III Communcatons and Sgna Processng Laboratory Eectrca Engneerng and Computer Scence Department Unversty of Mchgan, 30 Bea. Ave., Ann
More information