Randomized Dual Coordinate Ascent with Arbitrary Sampling


Zheng Qu*    Peter Richtárik*    Tong Zhang†

November 21, 2014

Abstract

We study the problem of minimizing the average of a large number of smooth convex functions penalized with a strongly convex regularizer. We propose and analyze a novel primal-dual method (Quartz) which at every iteration samples and updates a random subset of the dual variables, chosen according to an arbitrary distribution. In contrast to typical analysis, we directly bound the decrease of the primal-dual error in expectation, without the need to first analyze the dual error. Depending on the choice of the sampling, we obtain efficient serial, parallel and distributed variants of the method. In the serial case, our bounds match the best known bounds for SDCA (both with uniform and importance sampling). With standard mini-batching, our bounds predict initial data-independent speedup as well as additional data-driven speedup which depends on spectral and sparsity properties of the data. We calculate theoretical speedup factors and find that they are excellent predictors of actual speedup in practice. Moreover, we illustrate that it is possible to design an efficient mini-batch importance sampling. The distributed variant of Quartz is the first distributed SDCA-like method with an analysis for non-separable data.

1 Introduction

In this paper we consider a primal-dual pair of structured convex optimization problems which has (in several variants of varying degrees of generality) attracted a lot of attention in the past few years in the machine learning and optimization communities [8, 9, 29, 27, 30, 28, 37].

1.1 The problem

Let A_1, ..., A_n be a collection of d-by-m real matrices and φ_1, ..., φ_n be 1/γ-smooth convex functions from R^m to R, where γ > 0. Further, let g : R^d → R be a 1-strongly convex function and λ > 0 a regularization parameter. We are interested in solving the following primal problem:

    min_{w = (w_1, ..., w_d) ∈ R^d}  P(w) := (1/n) Σ_{i=1}^n φ_i(A_i^⊤ w) + λ g(w).    (1)

In the machine learning context, the matrices {A_i} are interpreted as examples/samples, w is a linear predictor, the function φ_i is the loss incurred by the predictor on example A_i, g is a regularizer,

* School of Mathematics, The University of Edinburgh, United Kingdom.
† School of Mathematics, The University of Edinburgh, United Kingdom. Department of Statistics, Rutgers University, New Jersey, USA and Big Data Lab, Baidu Inc, China. Acknowledgments: The first two authors would like to acknowledge support from the EPSRC Grant EP/K02325X/1, "Accelerated Coordinate Descent Methods for Big Data Optimization".

λ is a regularization parameter and (1) is the regularized empirical risk minimization problem. However, the above problem has many other applications outside machine learning. In this paper we are especially interested in problems where n is very big (millions, billions), and much larger than d. This is often the case in big data applications.

Let g* : R^d → R be the convex conjugate¹ of g and, for each i, let φ_i* : R^m → R be the convex conjugate of φ_i. Associated with the primal problem (1) is the Fenchel dual problem:

    max_{α = (α_1, ..., α_n) ∈ R^N = R^{nm}}  D(α) := −f(α) − ψ(α),    (2)

where α = (α_1, ..., α_n) ∈ R^N = R^{nm} is obtained by stacking the dual variables (blocks) α_i ∈ R^m, i = 1, ..., n, on top of each other, and the functions f and ψ are defined by

    f(α) := λ g*( (1/(λn)) Σ_{i=1}^n A_i α_i ),    (3)

    ψ(α) := (1/n) Σ_{i=1}^n φ_i*(−α_i).    (4)

Note that f is convex and smooth and ψ is strongly convex and block separable.

1.2 Contributions

We now briefly list the main contributions of this work.

Quartz. We propose a new algorithm, which we call Quartz², for simultaneously solving the primal (1) and dual (2) problems. On the dual side, at each iteration our method selects and updates a random subset (sampling) Ŝ ⊆ {1, ..., n} of the dual variables/blocks. We assume that these sets are i.i.d. throughout the iterations. However, we do not impose any additional assumptions on the distribution, apart from the necessary requirement that each block i ∈ [n] needs to be chosen with a positive probability: p_i := P(i ∈ Ŝ) > 0. Quartz is the first SDCA-like method analyzed for an arbitrary sampling. The dual updates are then used to perform an update to the primal variable w and the process is repeated. Our primal updates are different (less aggressive) from those used in SDCA [29] and Prox-SDCA [27].

Main result. We prove that starting from an initial pair (w^0, α^0), Quartz finds a pair (w, α) for which P(w) − D(α) ≤ ε (in expectation) in at most

    max_i ( 1/p_i + v_i/(p_i λγn) ) log( (P(w^0) − D(α^0))/ε )    (5)

¹ In this paper, the convex (Fenchel) conjugate of a function ξ : R^k → R is the function ξ* : R^k → R defined by ξ*(u) = sup_{s ∈ R^k} { ⟨s, u⟩ − ξ(s) }, where ∥·∥ is the L2 norm.

² Strange as it may seem, this algorithm name appeared to one of the authors of this paper in a dream. According to Wikipedia: Quartz is the second most abundant mineral in the Earth's continental crust. There are many different varieties of quartz, several of which are semi-precious gemstones. Our method also comes in many variants. It later came as a surprise to the authors that the name could be interpreted as QU And Richtárik and Tong Zhang. Whether the subconscious mind of the sleeping coauthor who dreamed up the name knew about this connection or not is not known.

iterations. The parameters v_1, ..., v_n are assumed to satisfy the following ESO (expected separable overapproximation) inequality:

    E_Ŝ [ ∥ Σ_{i ∈ Ŝ} A_i h_i ∥² ] ≤ Σ_{i=1}^n p_i v_i ∥h_i∥².    (6)

Moreover, the parameters are needed to run the method (they determine the stepsizes), and hence it is critical that they can be cheaply computed before the method starts. As we will show, for many samplings of interest this can be done in time required to read the data {A_i}. We wish to point out that (6) always holds for some parameters {v_i}. Indeed, the left hand side is a quadratic function of h and hence the inequality holds for large-enough v_i. Having said that, the size of these parameters directly influences the complexity, and hence one would want to obtain as tight bounds as possible.

Arbitrary sampling. As described above, Quartz uses an arbitrary sampling for picking the dual variables to be updated in each iteration. To the best of our knowledge, only a single paper exists in the literature where a stochastic method using an arbitrary sampling was analyzed: the NSync method of Richtárik and Takáč [22] for unconstrained minimization of a strongly convex function. Assumption (6) was for the first time introduced there, in a more general form; we are using it here in the special case of a quadratic function. However, NSync is not a primal-dual method. Besides NSync, the closest works to ours in terms of the generality of the sampling are the PCDM algorithm of Richtárik and Takáč [23], the SPCDM method of Fercoq and Richtárik [7] and the APPROX method of Fercoq and Richtárik [6]. All these are randomized coordinate descent methods, and all were analyzed for arbitrary uniform samplings (i.e., samplings satisfying P(i ∈ Ŝ) = P(i′ ∈ Ŝ) for all i, i′ ∈ [n]). Again, none of these methods were analyzed in a primal-dual framework.

Direct primal-dual analysis. Virtually all methods for solving (1) by performing stochastic steps in the dual (2), such as SDCA [29], SDCA for the SVM dual [30], ProxSDCA [27], ASDCA [28] and APCG [15], are analyzed by first establishing dual convergence and then proving that the duality gap is bounded by the dual residual. The SPDC method of Zhang and Xiao [36], which is a stochastic coordinate update variant of the Chambolle-Pock method [3], is an exception. Our analysis is novel, and directly primal-dual in nature. As a result, our proof is more direct, and the logarithmic term in our bound has a simpler form.

Flexibility: many important variants. Our method is very flexible: by specializing it to specific samplings, we obtain numerous variants, some similar (but not identical) to existing methods in the literature, and some very new and of significance to big data optimization.

Serial uniform sampling. If Ŝ always picks a single block, uniformly at random (p_i = 1/n), then the dual updates of Quartz are similar to those of SDCA [29] and Prox-SDCA [27]. The leading term in the complexity bound (5) becomes n + max_i λ_max(A_i^⊤ A_i)/(λγ), which matches the bounds obtained in these papers. However, our logarithmic term is simpler.

Serial optimal sampling (importance sampling). If Ŝ always picks a single block, with p_i chosen so as to minimize the complexity bound (5), we obtain the same importance sampling as that recently used in the IProx-SDCA method [37]. Our bound becomes n + (1/n) Σ_i λ_max(A_i^⊤ A_i)/(λγ), which matches the bound in [37]. Again, our logarithmic term is better.

τ-nice sampling. If we now let Ŝ be a random subset of [n] of size τ chosen uniformly at random (this sampling is called τ-nice [23]), we obtain a mini-batch (parallel) variant of Quartz. There are only a handful of primal-dual stochastic methods which use mini-batching. The first such method was a mini-batch version of SDCA specialized to training L2-regularized linear SVMs with hinge loss [30]. Besides this, two accelerated mini-batch methods have been recently proposed: ASDCA of Shalev-Shwartz and Zhang [28] and SPDC of Zhang and Xiao [36]. The complexity bound of Quartz specialized to the τ-nice sampling is different, and despite Quartz not being an accelerated method, it can be better in certain regimes (we will do a detailed comparison in Section 4).

Distributed sampling. To the best of our knowledge, no other samplings than those described above were used in stochastic primal-dual methods. However, there are many additional interesting samplings proposed for randomized coordinate descent, but never applied to the primal-dual framework. For instance, we can use the distributed sampling which led to the development of the Hydra algorithm [21] (distributed coordinate descent) and its accelerated variant Hydra² (Hydra squared) [5]. Using this sampling, Quartz can be efficiently implemented in a distributed environment: partition the examples across the nodes of a cluster, and let each node in each iteration update a random subset of the variables corresponding to the examples it owns.

Product sampling. We describe a novel sampling, which we call product sampling, that can be both non-serial and non-uniform. This is the first time such a sampling has been described, and an SDCA-like method using it analyzed. For suitable data (if the examples can be partitioned into several groups no two of which share a feature), this sampling can lead to linear or nearly linear speedup when compared to the serial uniform sampling.

Other samplings. While we develop the analysis of Quartz for an arbitrary sampling, we do not compute the ESO parameters {v_i} for any other samplings in this paper. However, there are several other interesting choices. We refer the reader to [23] and [22] for further examples of uniform and non-uniform samplings, respectively. All that must be done for any new Ŝ is to find parameters v_i for which (6) holds, and the complexity of the new variant of Quartz is given by (5).

Further data-driven speedup. Existing mini-batch stochastic primal-dual methods achieve linear speedup up to a certain mini-batch size which depends on n, λ and γ. Quartz obtains this data-independent speedup, but also obtains further data-driven speedup. This is caused by the fact that Quartz uses more aggressive dual stepsizes, informed by the data through the ESO parameters {v_i}. The smaller these constants, the better the speedup. For instance, we will show that higher data sparsity leads to smaller {v_i} and hence to better speedup. To illustrate this, consider the τ-nice sampling (hence, p_i = τ/n for all i) and the extreme case of perfectly sparse data (each feature j ∈ [d] appearing in a single example A_i). Then (6) holds with v_i = λ_max(A_i^⊤ A_i) for all i, and hence the leading term in (5) becomes n/τ + max_i λ_max(A_i^⊤ A_i)/(γλτ), predicting perfect speedup in the mini-batch size τ. We derive theoretical speedup factors and show that these are excellent predictors of the actual behavior of the method in an implementation. This was previously observed for the PCDM method [23] (which is not primal-dual).

Quartz vs purely primal and purely dual methods. In the special case when Ŝ is the serial uniform sampling, the complexity of Quartz is similar to the bounds recently obtained by several purely primal stochastic and semi-stochastic gradient methods (all having reduced variance of the gradient estimate) such as SAG [25], SVRG [11], S2GD [14], SAGA [4], mS2GD [12] and MISO [16]. In the case of serial optimal sampling, relevant purely primal methods with similar guarantees are ProxSVRG [33] and S2CD [13]. A mini-batch primal method, mS2GD, was analyzed in [12], achieving a similar bound to Quartz specialized to the τ-nice sampling. Purely dual stochastic coordinate descent methods with similar bounds to Quartz (for both the serial uniform and serial optimal sampling, for problems of varying similarity and generality when compared to (2)) include SCD [26], RCDM [19], UCDC/RCDC [24], ICD [32] and RCD [18]. These methods were then generalized to the τ-nice sampling (SHOTGUN [2]), further generalized to arbitrary uniform samplings (PCDM [23], SPCDM [7], and APPROX [6], which is an accelerated method) and to arbitrary, even nonuniform, samplings (NSync [22]). Another accelerated method, BOOM, was proposed in [17]. Distributed randomized coordinate descent methods with a purely dual analysis include Hydra [21] and Hydra² [5] (an accelerated variant of Hydra). Quartz specialized to the distributed sampling achieves the same rate as Hydra, but for both the primal and dual problems simultaneously.

General problem. We consider the problem (1), and consequently the associated dual, in a rather general form; most existing primal-dual methods focus on the case when g is a quadratic (e.g., [29, 28]) or m = 1 (e.g., [36]). Lower bounds for a variant of problem (1) were recently established by Agarwal and Bottou [1].

1.3 Outline

In Section 2 we describe the algorithm and show that it admits a natural interpretation in terms of Fenchel duality. We also outline the similarities and differences of the primal and dual update steps with SDCA-like methods. In Section 3 we show how parameters {v_i} satisfying the ESO inequality (6) can be computed for several selected samplings. We then proceed to Section 4, where we state the main result and specialize it to some of the samplings discussed in Section 3. Sections 5 and 6 deal with Quartz specialized to the τ-nice and distributed sampling, respectively. We also give a detailed comparison of our results with existing results for related primal-dual stochastic methods in the literature, and analyze theoretical speedup factors. We then provide the proof of the main complexity result in Section 7. In Section 8 we perform numerical experiments on the problem of training an L2-regularized linear support vector machine with square and smoothed hinge loss on real datasets. Finally, in Section 9 we conclude.

2 The Quartz Algorithm

In this section we describe our method (Algorithm 1).

2.1 Preliminaries

The most important parameter of Quartz is a random sampling Ŝ of the dual variables [n] = {1, 2, ..., n}. That is, Ŝ is a random subset of [n], or more precisely, a random set-valued mapping with values being the subsets of [n]. In order to guarantee that each block (dual variable) has a chance to get updated by the method, we necessarily need to make the following assumption.

Assumption 1 (Proper sampling) Ŝ is a proper sampling. That is,

    p_i := P(i ∈ Ŝ) > 0,  i ∈ [n].    (7)

However, we shall not make any other assumption on Ŝ. Prior to running the algorithm, we compute positive constants v_1, ..., v_n satisfying (6) (such constants always exist), as these are used to define the stepsize parameter θ used throughout:

    θ = min_i  p_i λγn / (v_i + λγn).    (8)

We shall show how this parameter can be computed for various samplings in Section 3. Let us now formalize the notions of 1/γ-smoothness and strong convexity.

Assumption 2 (Loss) For each i ∈ [n], the loss function φ_i : R^m → R is convex, differentiable and has 1/γ-Lipschitz continuous gradient with respect to the L2 norm, where γ is a positive constant:

    ∥∇φ_i(x) − ∇φ_i(y)∥ ≤ (1/γ) ∥x − y∥,  ∀x, y ∈ R^m.

For brevity, the last property is often called 1/γ-smoothness. It follows that φ_i* is γ-strongly convex.

Assumption 3 (Regularizer) The regularizer g : R^d → R is 1-strongly convex. That is,

    g(w) ≥ g(w′) + ⟨∇g(w′), w − w′⟩ + (1/2) ∥w − w′∥²,  ∀w, w′ ∈ R^d,

where ∇g(w′) is a subgradient of g at w′. It follows that g* is 1-smooth.

2.2 Description of the method

Quartz starts with an initial pair of primal and dual vectors (w^0, α^0). Given w^{t−1} and α^{t−1}, the method maintains the vector

    ᾱ^{t−1} = (1/(λn)) Σ_{i=1}^n A_i α_i^{t−1}.    (9)

Initially this is computed from scratch, and subsequently it is maintained in an efficient manner at the end of each iteration. Let us now describe how the vectors w^t and α^t are computed. Quartz first updates the primal vector w^t by setting it to a convex combination of the previous value w^{t−1} and ∇g*(ᾱ^{t−1}):

    w^t = (1 − θ) w^{t−1} + θ ∇g*(ᾱ^{t−1}).    (10)

We then proceed to select, and subsequently update, a random subset S_t ⊆ [n] of the dual variables, independently from the sets drawn in previous iterations, and following the distribution of Ŝ. Clearly, there are many ways in which the distribution of Ŝ can be chosen, leading to the numerous variants of Quartz. We shall describe some of them in Section 3. We allow two options for the actual computation of the dual updates. Once the dual variables are updated, the vector ᾱ^t is updated in an efficient manner so that (9) holds. The entire process is repeated.

Algorithm 1 Quartz

    Parameters: proper random sampling Ŝ and a positive vector v ∈ R^n
    Initialization: choose α^0 ∈ R^N and w^0 ∈ R^d; set p_i = P(i ∈ Ŝ), θ = min_i p_i λγn/(v_i + λγn) and ᾱ^0 = (1/(λn)) Σ_i A_i α_i^0
    for t ≥ 1 do
        w^t = (1 − θ) w^{t−1} + θ ∇g*(ᾱ^{t−1})
        α^t = α^{t−1}
        Generate a random set S_t ⊆ [n], following the distribution of Ŝ
        for i ∈ S_t do
            Calculate Δα_i^t using one of the following options:
                Option I:  Δα_i^t = argmax_{Δ ∈ R^m} { −φ_i*(−(α_i^{t−1} + Δ)) − ⟨∇g*(ᾱ^{t−1}), A_i Δ⟩ − v_i ∥Δ∥²/(2λn) }
                Option II: Δα_i^t = θ p_i^{−1} ( −∇φ_i(A_i^⊤ w^t) − α_i^{t−1} )
            α_i^t = α_i^{t−1} + Δα_i^t
        end for
        ᾱ^t = ᾱ^{t−1} + (1/(λn)) Σ_{i ∈ S_t} A_i Δα_i^t
    end for
    Output: (w^t, α^t)

Fenchel duality interpretation. Quartz has a natural interpretation in terms of Fenchel duality. Fix a primal-dual pair of vectors (w, α) ∈ R^d × R^N and define ᾱ = (1/(λn)) Σ_i A_i α_i. Using (1) and (2), the duality gap for the pair (w, α) can be decomposed as follows:

    P(w) − D(α) = λ GAP_g(w, α) + (1/n) Σ_{i=1}^n GAP_{φ_i}(w, α),

where

    GAP_g(w, α) := g(w) + g*(ᾱ) − ⟨w, ᾱ⟩,
    GAP_{φ_i}(w, α) := φ_i(A_i^⊤ w) + φ_i*(−α_i) + ⟨A_i^⊤ w, α_i⟩.

By the Fenchel-Young inequality, GAP_g(w, α) ≥ 0 and GAP_{φ_i}(w, α) ≥ 0 for all i, which proves weak duality for the problems (1) and (2), i.e., P(w) ≥ D(α). The pair (w, α) is optimal when both GAP_g and GAP_{φ_i} for all i are zero. It is known that this happens precisely when the following optimality conditions hold:

    w = ∇g*(ᾱ),    (11)

    α_i = −∇φ_i(A_i^⊤ w),  i ∈ [n].    (12)

We will now interpret the primal and dual steps of Quartz in terms of the above discussion. At iteration t we first set the primal variable w^t to a convex combination of its current value w^{t−1} and a value that would set GAP_g to zero: see (10). Hence, our primal update is not as aggressive as that of Prox-SDCA. This is followed by adjusting the dual variables corresponding to a randomly chosen set of examples S_t. Under Option II, for each example i ∈ S_t, the i-th dual variable α_i^t is set to a convex combination of its current value α_i^{t−1} and the value that would set GAP_{φ_i} to zero:

    α_i^t = (1 − θ/p_i) α_i^{t−1} + (θ/p_i) ( −∇φ_i(A_i^⊤ w^t) ).
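The loop above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: Quartz with serial uniform sampling and Option II, specialized to m = 1, squared loss φ_i(z) = (1/2)(z − b_i)² (so γ = 1) and g(w) = (1/2)∥w∥² (so ∇g*(ᾱ) = ᾱ). The function name and data layout are our own choices.

```python
import numpy as np

def quartz_option2(A, b, lam, gamma=1.0, iters=2000, seed=0):
    """Sketch of Quartz (Algorithm 1) with serial uniform sampling and Option II.
    A has shape (d, n): column i is the example A_i (m = 1 case).
    Loss: phi_i(z) = 0.5*(z - b_i)^2, regularizer: g(w) = 0.5*||w||^2."""
    rng = np.random.default_rng(seed)
    d, n = A.shape
    v = (A ** 2).sum(axis=0)              # v_i = ||A_i||^2, cf. Lemma 5 / eq (16)
    p = np.full(n, 1.0 / n)               # serial uniform sampling
    theta = np.min(p * lam * gamma * n / (v + lam * gamma * n))   # eq (8)
    alpha = np.zeros(n)
    w = np.zeros(d)
    abar = A @ alpha / (lam * n)          # eq (9)
    for _ in range(iters):
        w = (1 - theta) * w + theta * abar        # eq (10); grad g* = identity here
        i = rng.integers(n)                        # draw S_t = {i}
        grad_phi = A[:, i] @ w - b[i]              # phi_i'(A_i^T w^t)
        delta = (theta / p[i]) * (-grad_phi - alpha[i])   # Option II
        alpha[i] += delta
        abar += A[:, i] * delta / (lam * n)        # maintain eq (9) cheaply
    return w
```

For this quadratic instance the primal optimum solves (A Aᵀ/n + λI) w = A b/n, which gives a simple correctness check for the sketch.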

Quartz vs Prox-SDCA. In the special case when Ŝ is the serial uniform sampling (i.e., p_i = 1/n for all i ∈ [n]), Quartz can be compared to Proximal Stochastic Dual Coordinate Ascent (Prox-SDCA) [28, 29]. Indeed, if Option I is always used in Quartz, then the dual update of α^t in Quartz is exactly the same as the dual update of Prox-SDCA (using its Option I). In this case, the difference between our method and Prox-SDCA lies in the update of the primal variable w^t: while Quartz performs the update (10), Prox-SDCA (see also [34, 10]) performs the more aggressive update w^t = ∇g*(ᾱ^{t−1}).

3 Expected Separable Overapproximation

For the sake of brevity, it will be convenient to establish some notation. Let A = [A_1, ..., A_n] ∈ R^{d×N} = R^{d×nm} be the block matrix with blocks A_i ∈ R^{d×m}. Further, let A_{j:} be the j-th row of A. Likewise, for h ∈ R^N we will write h = (h_1, ..., h_n), where h_i ∈ R^m, so that Ah = Σ_i A_i h_i. For a vector of positive weights w ∈ R^n, we define a weighted Euclidean norm on R^N by

    ∥h∥²_w := Σ_{i=1}^n w_i ∥h_i∥²,    (13)

where ∥·∥ is the standard Euclidean norm on R^m. For S ⊆ [n] := {1, ..., n} and h ∈ R^N we use the notation h_{[S]} to denote the vector in R^N coinciding with h for blocks i ∈ S and zero elsewhere:

    (h_{[S]})_i = h_i, if i ∈ S;  0, otherwise.

With this notation, we have

    A h_{[S]} = Σ_{i ∈ S} A_i h_i.    (14)

As mentioned before, in our analysis we require that the random sampling Ŝ and the positive vector v ∈ R^n used in Quartz satisfy inequality (6). We shall now formalize this as an assumption, using the compact notation established above.

Assumption 4 (ESO) The following inequality holds for all h ∈ R^N:

    E[ ∥A h_{[Ŝ]}∥² ] ≤ ∥h∥²_{p∘v},    (15)

where p = (p_1, ..., p_n) is defined in (7), v = (v_1, ..., v_n) > 0 and p∘v = (p_1 v_1, ..., p_n v_n) ∈ R^n.

Note that for any proper sampling Ŝ, there must exist a vector v > 0 satisfying Assumption 4. Hence, this is merely an assumption that such a vector v is readily available. Indeed, the term on the left is a finite average of convex quadratic functions of h, and hence is a convex quadratic. Moreover, we can write

    E[ ∥A h_{[Ŝ]}∥² ] = E[ h_{[Ŝ]}^⊤ A^⊤ A h_{[Ŝ]} ] = h^⊤ ( P ∘ (A^⊤ A) ) h,

where ∘ denotes the Hadamard (component-wise) product of matrices and P ∈ R^{N×N} is an n-by-n block matrix with block (i, j) equal to P(i ∈ Ŝ, j ∈ Ŝ) 1_m, with 1_m being the m-by-m matrix of all ones. Hence, (15) merely means to upper bound the matrix P ∘ (A^⊤ A) by an n-by-n block diagonal

matrix D = D(p, v), the i-th block of which is equal to p_i v_i I_m, with I_m being the m-by-m identity matrix. There is an infinite number of ways in which this can be done in theory. Indeed, for any proper sampling Ŝ and any positive w ∈ R^n, (15) holds with v = t w, where

    t = λ_max( D(p, w)^{−1/2} ( P ∘ (A^⊤ A) ) D(p, w)^{−1/2} ),

since then P ∘ (A^⊤ A) ⪯ t D(p, w) = D(p, v). In practice, and especially in the big data setting when n is very large, computing v by solving an eigenvalue problem with an N × N matrix (recall that N = nm) will be either inefficient or impossible. It is therefore important that a good (i.e., small), albeit perhaps suboptimal, v can be identified cheaply. In all the cases we consider in this paper, the identification of v can be done during the time the data is being read, or in time roughly equal to a single pass through the data matrix A.

In the special case of uniform³ samplings (but for arbitrary smooth functions and not just quadratics, which is all we need here⁴), inequality (15) was introduced and studied by Richtárik and Takáč [23], in the context of the complexity analysis of (non primal-dual) parallel block coordinate descent methods. A variant of ESO for arbitrary (possibly nonuniform) samplings was introduced in [22]; and to the best of our knowledge that is the only work analyzing a stochastic coordinate descent method which uses an arbitrary sampling. However, NSync is not a primal-dual method and applies to a different problem (unconstrained minimization of a smooth strongly convex function). Besides [23, 22], ESO inequalities were further studied in [30, 31, 7, 21, 6, 5, 12].

3.1 Serial samplings

The most studied sampling in the literature on stochastic optimization is the serial sampling, which corresponds to the selection of a single block i ∈ [n]. That is, |Ŝ| = 1 with probability 1. The name serial is pointing to the fact that a method using such a sampling will typically be a serial (as opposed to parallel) method, updating a single block (dual variable) at a time. A serial sampling is uniquely characterized by the vector of probabilities p = (p_1, ..., p_n), where p_i is defined by (7). It turns out that we can find a vector v > 0 for which (15) holds for any serial sampling, independently of its distribution given by p.

Lemma 5 If Ŝ is a serial sampling (i.e., if |Ŝ| = 1 with probability 1), then Assumption 4 is satisfied for

    v_i = λ_max(A_i^⊤ A_i),  i ∈ [n].    (16)

Proof Note that for any h ∈ R^N,

    E[ ∥A h_{[Ŝ]}∥² ] = Σ_i p_i ∥A h_{[{i}]}∥² =(14)= Σ_i p_i h_i^⊤ A_i^⊤ A_i h_i ≤ Σ_i p_i λ_max(A_i^⊤ A_i) ∥h_i∥² =(13)= ∥h∥²_{p∘v}.

³ A sampling Ŝ is uniform if p_i = p_j for all i, j. It is easy to see that then, necessarily, p_i = E[|Ŝ|]/n for all i.

⁴ The ESO inequality studied in [23] is of the form E[ξ(x + h_{[Ŝ]})] ≤ ξ(x) + (E[|Ŝ|]/n) ( ⟨∇ξ(x), h⟩ + (1/2) ∥h∥²_v ). In the case of a uniform sampling, x = 0 and ξ(h) = (1/2)∥Ah∥², we recover (15).
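As a concrete check of Lemma 5 (the function name, dimensions and data are our own choices), the following sketch computes v_i = λ_max(A_i^⊤ A_i) for a few random d-by-m blocks and evaluates both sides of (15) exactly for an arbitrary serial sampling, by summing over the n possible outcomes Ŝ = {i}.

```python
import numpy as np

def serial_eso_params(blocks):
    """v_i = largest eigenvalue of A_i^T A_i (eq 16); blocks[i] is d-by-m."""
    return np.array([np.linalg.eigvalsh(Ai.T @ Ai)[-1] for Ai in blocks])

rng = np.random.default_rng(0)
n, d, m = 6, 5, 2
blocks = [rng.standard_normal((d, m)) for _ in range(n)]
h = [rng.standard_normal(m) for _ in range(n)]
p = rng.random(n)
p /= p.sum()                               # arbitrary serial sampling probabilities

v = serial_eso_params(blocks)
# Left-hand side of (15): exact expectation over the n outcomes S-hat = {i}.
lhs = sum(p[i] * np.linalg.norm(blocks[i] @ h[i]) ** 2 for i in range(n))
# Right-hand side of (15): the weighted norm ||h||^2_{p o v} of eq (13).
rhs = sum(p[i] * v[i] * np.linalg.norm(h[i]) ** 2 for i in range(n))
```

Here `lhs <= rhs` holds for any choice of h and p, as the lemma asserts.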

Note that v_i is the largest eigenvalue of an m-by-m matrix. If m is relatively small (and in many machine learning applications one has m = 1, as examples are usually vectors and not matrices), then the cost of computing v_i is small. If m = 1, then v_i is simply the squared Euclidean norm of the vector A_i, and hence one can compute all of these parameters in one pass through the data (e.g., during loading to memory).

3.2 Parallel (τ-nice) sampling

We now consider Ŝ which selects subsets of [n] of cardinality τ, uniformly at random. In the terminology established in [23], such Ŝ is called τ-nice. This sampling satisfies p_i = p_j for all i, j ∈ [n], and hence it is uniform. This sampling is well suited for parallel computing. Indeed, Quartz could be implemented as follows. If we have τ processors available, then at the beginning of iteration t we can assign each block (dual variable) i ∈ S_t to a dedicated processor. The processor assigned to i would then compute Δα_i^t and apply the update. If all processors have fast access to the memory where all the data is stored, as is the case in a shared-memory multicore workstation, then this way of assigning workload to the individual processors does not cause any major problems. Depending on the particular computer architecture and the size m of the blocks (which will influence processing time), it may be more efficient to choose τ to be a multiple of the number of processors available, in which case in each iteration every processor updates more than one block. The following lemma gives a closed-form formula for parameters {v_i} for which the ESO inequality holds.

Lemma 6 (compare with [6]) If Ŝ is a τ-nice sampling, then Assumption 4 is satisfied for

    v_i = λ_max( Σ_{j=1}^d ( 1 + ((ω_j − 1)(τ − 1))/(n − 1) ) A_{ji}^⊤ A_{ji} ),  i ∈ [n],    (17)

where A_{ji} denotes the j-th row of A_i and, for each j ∈ [d], ω_j is the number of nonzero blocks in the j-th row of the matrix A, i.e.,

    ω_j := |{ i ∈ [n] : A_{ji} ≠ 0 }|,  j ∈ [d].    (18)

Proof In the m = 1 case the result follows from Theorem 1 in [6]. The extension to the m > 1 case is straightforward.

Note that v_i is the largest eigenvalue of an m-by-m matrix which is formed as the sum of d rank-one matrices. The formation of all of these matrices takes time proportional to the number of nonzeros in A if the data is stored in a sparse format. The constants {ω_j} can be computed by scanning the data once (e.g., during the loading-to-memory phase). Finally, one must compute n eigenvalue problems for matrices of size m × m. In most applications, m = 1, so there is no more work to be done. If m > 1, the cost of computing these eigenvalues would be small. While for τ = 1 it was easy to find parameters {v_i} for any sampling (and hence, as we will see, it will be easy to find an optimal sampling), this is not the case when τ > 1. The task is in general a difficult optimization problem. For some work in this direction we refer the reader to [22].
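For m = 1 the formula (17) can be evaluated in one pass over the data, as described above. A minimal sketch (the function name is ours):

```python
import numpy as np

def tau_nice_eso_params(A, tau):
    """ESO parameters of Lemma 6 for m = 1: A is d-by-n, column i = example A_i.
    v_i = sum_j (1 + (omega_j - 1)(tau - 1)/(n - 1)) * A_ji^2   (eq 17)."""
    d, n = A.shape
    omega = (A != 0).sum(axis=1)                      # eq (18): nonzeros per row of A
    weight = 1.0 + (omega - 1.0) * (tau - 1.0) / (n - 1.0)
    return (weight[:, None] * A ** 2).sum(axis=0)     # one pass over the data
```

For τ = 1 this reduces to the squared column norms of Lemma 5, and the parameters grow with τ at a rate governed by the row sparsity {ω_j}.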

3.3 Product sampling

In this section we give an example of a sampling Ŝ which can be both non-uniform and non-serial (i.e., for which P(|Ŝ| = 1) ≠ 1). We make the following group separability assumption: there is a partition X_1, ..., X_τ of [n] according to which the examples {A_i} can be partitioned into τ groups such that no feature is shared by any two examples belonging to different groups. Consider the following example with m = 1, n = 5 and d = 4:

    A = [A_1, A_2, A_3, A_4, A_5].

If we choose τ = 2 and X_1 = {1, 2}, X_2 = {3, 4, 5}, then no row of A has a nonzero both in a column belonging to X_1 and in a column belonging to X_2. With each i ∈ [n] we now associate l_i ∈ [τ] such that i ∈ X_{l_i}, and define

    S := X_1 × ⋯ × X_τ.

The product sampling Ŝ is obtained by choosing S ∈ S, uniformly at random; that is, via

    P(Ŝ = S) = 1/|S| = 1/( Π_{l=1}^τ |X_l| ),  S ∈ S.    (19)

Then Ŝ is proper and

    p_i = P(i ∈ Ŝ) = ( Π_{l ≠ l_i} |X_l| ) / |S| =(19)= 1/|X_{l_i}|,  i ∈ [n].    (20)

Hence the sampling is nonuniform as long as not all of the sets X_l have the same cardinality. We next show that the product sampling Ŝ defined as above allows the same stepsize parameter v as the serial uniform sampling.

Lemma 7 Under the group separability assumption, Assumption 4 is satisfied for the product sampling Ŝ and

    v_i = λ_max(A_i^⊤ A_i),  i ∈ [n].

Proof For each j ∈ [d], denote by A_{j:} the j-th row of the matrix A and by Ω_j the column index set of nonzero blocks in A_{j:}: Ω_j := { i ∈ [n] : A_{ji} ≠ 0 }. For each l ∈ [τ], define

    J_l := { j ∈ [d] : Ω_j ⊆ X_l }.    (21)

In words, J_l is the set of features associated with the examples in X_l. By the group separability assumption, J_1, ..., J_τ forms a partition of [d], namely,

    ∪_{l=1}^τ J_l = [d];  J_k ∩ J_l = ∅,  k ≠ l ∈ [τ].    (22)

Thus,

    A^⊤ A = Σ_{j=1}^d A_{j:}^⊤ A_{j:} =(22)= Σ_{l=1}^τ Σ_{j ∈ J_l} A_{j:}^⊤ A_{j:}.    (23)

Now fix l ∈ [τ] and j ∈ J_l. For any h ∈ R^N we have:

    E[ h_{[Ŝ]}^⊤ A_{j:}^⊤ A_{j:} h_{[Ŝ]} ] = Σ_{i, i′ ∈ [n]} h_i^⊤ A_{ji}^⊤ A_{ji′} h_{i′} P(i ∈ Ŝ, i′ ∈ Ŝ) = Σ_{i, i′ ∈ Ω_j} h_i^⊤ A_{ji}^⊤ A_{ji′} h_{i′} P(i ∈ Ŝ, i′ ∈ Ŝ).

Since X_1, ..., X_τ forms a partition of [n], any two distinct indexes belonging to the same subset X_l will never be selected simultaneously in Ŝ, i.e.,

    P(i ∈ Ŝ, i′ ∈ Ŝ) = p_i if i = i′;  0 if i ≠ i′,  i, i′ ∈ X_l.

Therefore, since Ω_j ⊆ X_l,

    E[ h_{[Ŝ]}^⊤ A_{j:}^⊤ A_{j:} h_{[Ŝ]} ] = Σ_{i ∈ Ω_j} h_i^⊤ A_{ji}^⊤ A_{ji} h_i p_i.    (24)

It follows from (23) and (24) that:

    E[ ∥A h_{[Ŝ]}∥² ] = E[ h_{[Ŝ]}^⊤ A^⊤ A h_{[Ŝ]} ] = Σ_{l=1}^τ Σ_{j ∈ J_l} E[ h_{[Ŝ]}^⊤ A_{j:}^⊤ A_{j:} h_{[Ŝ]} ] = Σ_{l=1}^τ Σ_{j ∈ J_l} Σ_i h_i^⊤ A_{ji}^⊤ A_{ji} h_i p_i.    (25)

Hence,

    E[ ∥A h_{[Ŝ]}∥² ] =(22)= Σ_{j=1}^d Σ_i h_i^⊤ A_{ji}^⊤ A_{ji} h_i p_i = Σ_i p_i h_i^⊤ A_i^⊤ A_i h_i ≤ Σ_i p_i λ_max(A_i^⊤ A_i) ∥h_i∥² = ∥h∥²_{p∘v}.

3.4 Distributed sampling

We now describe a sampling which is particularly suitable for a distributed implementation of Quartz. This sampling was first proposed in [21] and later used in [5], where the distributed coordinate descent algorithm Hydra and its accelerated variant Hydra² were proposed and analyzed, respectively. Both methods were shown to be able to scale up to huge problem sizes (tests were performed on problems of several TB in size, and up to 50 billion dual variables).

Consider a distributed computing environment with c nodes/computers. For simplicity, assume that n is an integer multiple of c and let the blocks {1, 2, ..., n} be partitioned into c sets of equal size: P_1, P_2, ..., P_c. We assign partition P_l to node l. The data A_1, ..., A_n and the dual variables (blocks) α_1, ..., α_n are partitioned accordingly and stored on the respective nodes. At each iteration, all nodes l ∈ {1, ..., c} in parallel pick a subset Ŝ_l of τ dual variables from those they own, i.e., from P_l, uniformly at random. That is, each node locally performs a τ-nice sampling, independently from the other nodes. Node l computes the updates to the dual variables α_i corresponding to i ∈ Ŝ_l, and locally stores them. Hence, in a single distributed iteration, Quartz updates the dual variables belonging to the set Ŝ := ∪_{l=1}^c Ŝ_l. This defines a sampling, which we will call (c, τ)-distributed sampling.
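One draw of the (c, τ)-distributed sampling can be sketched as follows (a toy illustration with our own naming and a contiguous partition, not the Hydra implementation):

```python
import numpy as np

def distributed_sampling(n, c, tau, rng):
    """One draw of the (c, tau)-distributed sampling: blocks [n] are split into
    c equal contiguous partitions P_1, ..., P_c; node l picks tau of its n/c
    blocks uniformly at random (a local tau-nice sampling), and the sampled set
    S-hat is the union of the c local choices."""
    assert n % c == 0 and tau <= n // c
    size = n // c
    S = []
    for l in range(c):
        part = np.arange(l * size, (l + 1) * size)          # partition P_l
        S.extend(rng.choice(part, size=tau, replace=False)) # local tau-nice draw
    return sorted(S)

rng = np.random.default_rng(0)
S = distributed_sampling(n=12, c=3, tau=2, rng=rng)
```

Every draw has cardinality cτ, with exactly τ indices taken from each partition.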

Of course, there are other important considerations pertaining to the distributed implementation of Quartz, but we do not discuss them here as the focus of this section is on the sampling. However, it is possible to design a distributed communication protocol for the update of the primal variable. The following result gives a formula for admissible parameters {v_i}.

Lemma 8 (compare with [5]) If Ŝ is a (c, τ)-distributed sampling, then Assumption 4 is satisfied for

    v_i = λ_max( Σ_{j=1}^d ( 1 + ((τ − 1)(ω_j − 1))/(max{n/c − 1, 1}) + ( (τc)/n − (τ − 1)/(max{n/c − 1, 1}) ) · ((ω_j′ − 1)/(max{c − 1, 1})) · (ω_j/ω_j′) ) A_{ji}^⊤ A_{ji} ),  i ∈ [n],    (26)

where ω_j is the number of nonzero blocks in the j-th row of the matrix A, as defined previously in (18), and ω_j′ is the number of partitions active at row j of A; more precisely,

    ω_j′ := |{ l ∈ [c] : ∃ i ∈ P_l such that A_{ji} ≠ 0 }|,  j ∈ [d].    (27)

Proof When m = 1, the result is equivalent to Theorem 4.1 in [5]. The extension to blocks (m > 1) is straightforward.

Lemma 6 is a special case of Lemma 8 when only a single node (c = 1) is used, in which case ω_j′ = 1 for all j ∈ [d]. Lemma 8 also improves on the constants {v_i} derived in [21], where instead of ω_j and ω_j′ in (26) one has max_j ω_j and max_j ω_j′. Lemma 8 is expressed in terms of certain sparsity parameters associated with the data ({ω_j}) and the partitioning ({ω_j′}). However, it is possible to derive alternative ESO results for the (c, τ)-distributed sampling. For instance, one can instead express the parameters {v_i} without any sparsity assumptions, using only spectral properties of the data. We have not included these results here, but in the m = 1 case such results have been derived in [5]. It is possible to adapt them to the m > 1 case as we have done with Lemma 8.

4 Main Result

The complexity of our method is given by the following theorem.

Theorem 9 (Main Result) Let Assumption 2 (the φ_i are 1/γ-smooth) and Assumption 3 (g is 1-strongly convex) be satisfied. Let Ŝ be a proper sampling (Assumption 1) and v_1, ..., v_n be positive scalars satisfying Assumption 4. Then the sequence of primal and dual variables {(w^t, α^t)}_{t≥0} of Quartz (Algorithm 1) satisfies:

    E[P(w^t) − D(α^t)] ≤ (1 − θ)^t ( P(w^0) − D(α^0) ),    (28)

where

    θ = min_i  p_i λγn / (v_i + λγn).    (29)

In particular, if we fix ε ≤ P(w^0) − D(α^0), then for

    T ≥ max_i ( 1/p_i + v_i/(p_i λγn) ) log( (P(w^0) − D(α^0))/ε ),    (30)

we are guaranteed that E[P(w^T) − D(α^T)] ≤ ε.

A result of a similar flavour (but for a different problem and not in a primal-dual setting) has been established in [22], where the authors analyze a parallel coordinate descent method, NSync, also with an arbitrary sampling, for minimizing a strongly convex function under an ESO assumption. In the rest of this section we will specialize the above result to a few selected samplings. We then devote two separate sections to Quartz specialized to the τ-nice sampling (Section 5) and to the (c, τ)-distributed sampling (Section 6), as we do a more detailed analysis of the results in these two cases.

4.1 Quartz with uniform serial sampling

We first look at the special case when Ŝ is the uniform serial sampling, i.e., when p_i = 1/n for all i ∈ [n].

Corollary 10 Assume that at each iteration of Quartz we update only one dual variable, chosen uniformly at random, and use v_i = λ_max(A_i^⊤ A_i) for all i ∈ [n]. If we let ε ≤ P(w^0) − D(α^0) and

    T ≥ ( n + max_i λ_max(A_i^⊤ A_i)/(λγ) ) log( (P(w^0) − D(α^0))/ε ),    (31)

then E[P(w^T) − D(α^T)] ≤ ε.

Proof The result follows by combining Lemma 5 and Theorem 9.

Corollary 10 should be compared with Theorem 5 in [29] (covering the L2-regularized case) and Theorem 1 in [28] (covering the case of general g). They obtain the rate

    ( n + max_i λ_max(A_i^⊤ A_i)/(λγ) ) log( ( n + max_i λ_max(A_i^⊤ A_i)/(λγ) ) · (D(α*) − D(α^0))/ε ),

where α* is the dual optimal solution. Notice that the dominant terms in the two rates exactly match, although our logarithmic term is better and simpler.

4.2 Quartz with optimal serial sampling (importance sampling)

According to Lemma 5, the parameter v for a serial sampling Ŝ is determined by (16) and is independent of the distribution of Ŝ. We can then seek to maximize the quantity θ in (29) to obtain the best bound. A simple calculation reveals that the optimal probability is given by:

    P(Ŝ = {i}) = p_i* := ( nλγ + λ_max(A_i^⊤ A_i) ) / ( Σ_{i′=1}^n ( nλγ + λ_max(A_{i′}^⊤ A_{i′}) ) ).    (32)

Using this sampling, we obtain the following iteration complexity bound, which is an improvement on the bound (31) for uniform probabilities.
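For m = 1, the optimal probabilities (32) are cheap to compute from the squared column norms. A small sketch (the function name is ours):

```python
import numpy as np

def importance_probabilities(A, lam, gamma):
    """Optimal serial sampling probabilities of eq (32) for m = 1:
    p_i is proportional to n*lam*gamma + ||A_i||^2, where A is d-by-n
    and lambda_max(A_i^T A_i) = ||A_i||^2 for vector examples."""
    n = A.shape[1]
    scores = n * lam * gamma + (A ** 2).sum(axis=0)
    return scores / scores.sum()
```

Examples with larger norm (hence larger v_i) are sampled more often; when all columns have equal norm, this recovers the uniform distribution p_i = 1/n.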

Corollary 11 Assume that at each iteration of Quartz we update only one dual variable, chosen at random according to the probability p* defined in (32), and use v_i = λ_max(A_i^⊤ A_i) for all i ∈ [n]. If we let ε ≤ P(w^0) − D(α^0) and

    T ≥ ( n + (1/n) Σ_i λ_max(A_i^⊤ A_i)/(λγ) ) log( (P(w^0) − D(α^0))/ε ),    (33)

then E[P(w^T) − D(α^T)] ≤ ε.

Note that, in contrast with the serial uniform sampling, we now have dependence on the average of the eigenvalues. The above result should be compared with the complexity result of Iprox-SDCA [37]:

    ( n + (1/n) Σ_i λ_max(A_i^⊤ A_i)/(λγ) ) log( ( n + (1/n) Σ_i λ_max(A_i^⊤ A_i)/(λγ) ) · (D(α*) − D(α^0))/ε ),

where α* is the dual optimal solution. Again, the dominant terms in the two rates exactly match, although our logarithmic term is better and simpler.

4.3 Quartz with product sampling

In this section we apply Theorem 9 to the case when Ŝ is the product sampling (see the description in Section 3.3). All the notation we use here was established there.

Corollary 12 Under the group separability assumption, let Ŝ be the product sampling and let v_i = λ_max(A_i^⊤ A_i) for all i ∈ [n]. If we fix ε ≤ P(w^0) − D(α^0) and

    T ≥ max_i ( |X_{l_i}| + |X_{l_i}| λ_max(A_i^⊤ A_i)/(nλγ) ) log( (P(w^0) − D(α^0))/ε ),

then E[P(w^T) − D(α^T)] ≤ ε.

Proof The proof follows directly from Theorem 9, Lemma 7 and (20).

Recall from Section 3.3 that the product sampling Ŝ has cardinality τ ≥ 1 and is non-uniform as long as the sets {X_1, ..., X_τ} do not all have the same cardinality. To the best of our knowledge, Corollary 12 is the first explicit complexity bound for a stochastic algorithm using a non-serial and nonuniform sampling for a composite convex optimization problem (the paper [22] only deals with smooth functions and the method is not primal-dual), albeit under the group separability assumption. Let us compare the complexity bound with the serial uniform case (Corollary 10):

    max_i ( |X_{l_i}| + |X_{l_i}| λ_max(A_i^⊤ A_i)/(nλγ) )  ≤  ( n + max_i λ_max(A_i^⊤ A_i)/(λγ) ) · max_l |X_l|/n.

Hence the iteration bound of Quartz specialized to product sampling is at most a max_l |X_l|/n fraction of that of Quartz specialized to serial uniform sampling. The factor max_l |X_l|/n varies from 1/τ to 1, depending on the degree to which the partition X_1, ..., X_τ is balanced. A perfect,

linear speedup (max_l |X_l|/n = 1/τ) occurs only when the partition X_1,..., X_τ is perfectly balanced, i.e., when the sets X_l all have the same cardinality, in which case the product sampling is uniform (recall the definition of uniformity we use in this paper: P(i ∈ Ŝ) = P(i′ ∈ Ŝ) for all i, i′ ∈ [n]). Note that if the partition is not perfectly balanced, but sufficiently so, then the factor max_l |X_l|/n will be close to the perfect linear speedup factor 1/τ.

5 Quartz with τ-nice Sampling (standard mini-batching)

We now specialize Theorem 9 to the case of the τ-nice sampling.

Corollary 13. Assume Ŝ is the τ-nice sampling and v is chosen as in (17). If we let ε ≤ P(w⁰) − D(α⁰) and

T ≥ (n/τ + max_i λ_max(Σ_{j=1}^d (1 + (ω_j − 1)(τ − 1)/(n − 1)) A_{ji}^⊤ A_{ji})/(λγτ)) · log((P(w⁰) − D(α⁰))/ε),    (34)

then E[P(w^T) − D(α^T)] ≤ ε.

Proof. The result follows by combining Lemma 6 and Theorem 9.

Let us now have a detailed look at the above result, especially in terms of how it compares with the serial uniform case (Corollary 10). We do this comparison in Table 1. For fully sparse data, we get perfect linear speedup: the bound in the second line of Table 1 is a 1/τ fraction of the bound in the first line. For fully dense data, the condition number κ := max_i λ_max(A_i^⊤ A_i)/(γλ) is unaffected by mini-batching/parallelization. Hence, linear speedup is obtained if κ = O(n/τ). For general data, the behaviour of Quartz with τ-nice sampling interpolates between these two extreme cases. That is, κ gets multiplied by a quantity between 1/τ (fully sparse case) and 1 (fully dense case). It is convenient to write this factor in the form (1 + (ω̃ − 1)(τ − 1)/(n − 1))/τ, where ω̃ ∈ [1, n] is a measure of the average sparsity of the data, using which we can write:

T(τ) := (n/τ + (1 + (ω̃ − 1)(τ − 1)/(n − 1)) · max_i λ_max(A_i^⊤ A_i)/(λγτ)) · log((P(w⁰) − D(α⁰))/ε).    (35)

5.1 Theoretical speedup factor

For simplicity of exposition, let us now assume that max_i λ_max(A_i^⊤ A_i) = 1. We will now study the theoretical speedup factor, defined as:

T(1)/T(τ) = τ(1 + nλγ)/(1 + nλγ + (τ − 1)(ω̃ − 1)/(n − 1)) = τ / (1 + (τ − 1)(ω̃ − 1)/((n − 1)(1 + nλγ))).    (36)
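The quantity (36) is straightforward to evaluate numerically. The sketch below (a hypothetical helper, assuming max_i λ_max(A_i^⊤ A_i) = 1 as above) implements it and checks it against the fully sparse case and the factor-of-2 lower bound discussed next.

```python
def speedup(tau, n, lam, gamma, omega):
    """Theoretical speedup factor (36), assuming max_i lambda_max(A_i^T A_i) = 1:
    T(1)/T(tau) = tau / (1 + (tau-1)(omega-1) / ((n-1)(1 + n*lam*gamma)))."""
    return tau / (1.0 + (tau - 1) * (omega - 1) / ((n - 1) * (1 + n * lam * gamma)))

n = 10**6
# Fully sparse data (omega = 1): perfect linear speedup.
assert speedup(64, n, 1e-3, 1.0, omega=1) == 64
# Even fully dense data (omega = n) loses at most a factor of 2
# as long as tau <= 2 + n*lam*gamma.
tau = 100
assert tau <= 2 + n * 1e-3 * 1.0
assert speedup(tau, n, 1e-3, 1.0, omega=n) >= tau / 2
```

The snippet makes the interpolation explicit: the denominator grows from 1 (ω̃ = 1) toward 1 + (τ − 1)/(1 + nλγ) (ω̃ = n), which is where the threshold τ = 2 + nλγ comes from.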

Sampling Ŝ | Data | Complexity of Quartz (Theorem 9) | Result
Serial uniform | Any data | n + max_i λ_max(A_i^⊤ A_i)/(λγ) | Corollary 10
τ-nice | Fully sparse data (ω_j = 1 for all j) | n/τ + max_i λ_max(A_i^⊤ A_i)/(λγτ) | Corollary 13
τ-nice | Fully dense data (ω_j = n for all j) | n/τ + max_i λ_max(A_i^⊤ A_i)/(λγ) | Corollary 13
τ-nice | Any data | n/τ + (1 + (ω̃ − 1)(τ − 1)/(n − 1)) max_i λ_max(A_i^⊤ A_i)/(λγτ) | Corollary 13

Table 1: Comparison of the complexity of Quartz with serial uniform sampling and with τ-nice sampling.

That is, the speedup factor measures how much better Quartz is with the τ-nice sampling than in the serial uniform case (i.e., with the 1-nice sampling). Note that the speedup factor is a concave and increasing function of the number of threads τ. Its value depends on two quantities: the relative sparsity level of the data matrix A, expressed through (ω̃ − 1)/(n − 1), and the condition number of the problem, expressed through nλγ. We provide below two lower bounds for the speedup factor:

T(1)/T(τ) ≥ τ/(1 + (τ − 1)(ω̃ − 1)/(n − 1)),    T(1)/T(τ) ≥ τ/2 if τ ≤ 2 + nλγ.    (37)

Note that the last bound does not involve ω̃. In other words, linear speedup (modulo a factor of 2) is achieved at least until τ = 2 + nλγ (of course, we also require τ ≤ n), regardless of the data matrix A. For instance, if λγ = 1/√n, which is a frequently used setting for the regularizer, then we get data-independent linear speedup up to mini-batch size τ = 2 + √n. Moreover, from the first inequality in (37) we see that there is further data-dependent speedup, depending on the average sparsity measure ω̃. We give an illustration of this phenomenon in Figure 1, where we plot the theoretical speedup factor (36) as a function of the number of threads τ, for n = 10⁶, γ = 1 and three values of ω̃ and λ. Looking at the plots from right to left, we see that for fixed λ the speedup factor increases as ω̃ decreases, as described by (36). Moreover, as the regularization parameter λ gets smaller and reaches the value 1/n, the speedup factor is healthy for sparse data only. However, for λ = 1/√n = 10⁻³, we observe linear speedup up to τ = √n = 1000, regardless

of ω̃ (the sparsity of the data), as predicted. There is additional data-driven speedup beyond this point, which is better for sparser data.

Figure 1: The speedup factor (36) as a function of τ for n = 10⁶, γ = 1, three regularization parameters (λ = 10⁻³, 10⁻⁴, 10⁻⁶) and data of various sparsity levels: (a) ω̃ = 10², (b) ω̃ = 10⁴, (c) ω̃ = 10⁶.

5.2 Quartz vs existing primal-dual mini-batch methods

We now compare the above result with existing mini-batch stochastic dual coordinate ascent methods. A mini-batch variant of SDCA, to which Quartz with τ-nice sampling can be naturally compared, has been proposed and analyzed previously in [30], [28] and [36]. In [30], the authors proposed to use a so-called safe mini-batching, which is precisely equivalent to finding the stepsize parameter v satisfying Assumption 4 in the special case of the τ-nice sampling. However, they only analyzed the case where the functions φ_i are non-smooth. In [28], the authors studied accelerated mini-batch SDCA (ASDCA), specialized to the case when the regularizer g is the squared L2 norm. They showed that the complexity of ASDCA interpolates between that of SDCA and that of accelerated gradient descent (AGD) [20] as the mini-batch size τ varies. In [36], the authors proposed a mini-batch extension of their stochastic primal-dual coordinate algorithm (SPDC). Both ASDCA and SPDC reach the same complexity as AGD when the mini-batch size equals n, and thus should be considered accelerated algorithms. The complexity bounds for all these algorithms are summarized in Table 2. To facilitate the comparison, we assume that max_i λ_max(A_i^⊤ A_i) = 1, since the analysis of ASDCA assumes this. In Table 3 we compare the complexities of SDCA, ASDCA, SPDC and Quartz in several regimes. We have used Lemma 14 to simplify the bounds for Quartz.

Lemma 14. For any ω̃ ∈ [1, n] and τ ∈ [1, n] we have

(ω̃ − 1)(τ − 1)/(n − 1) ≤ ω̃τ/n ≤ 1 + (ω̃ − 1)(τ − 1)/(n − 1) ≤ 1 + ω̃τ/n.

Proof. The second inequality follows by showing that the function φ₁(x) = x + (ω̃ − x)(τ − x)/(n − x) is increasing, and the first and third follow by showing that φ₂(x) = (ω̃ − x)(τ − x)/(n − x) is decreasing on [0, 1].
The monotonicity claims follow from the fact that

φ₁′(x) = (n − ω̃)(n − τ)/(n − x)² ≥ 0

and

φ₂′(x) = φ₁′(x) − 1 = (n − ω̃)(n − τ)/(n − x)² − 1 ≤ (n − 1)²/(n − x)² − 1 ≤ 0 for all x ∈ [0, 1].
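The chain of inequalities in Lemma 14 can also be sanity-checked numerically. The snippet below (illustrative only, not part of the analysis) samples random triples (n, ω̃, τ) and verifies the chain up to floating-point tolerance.

```python
import random

def middle(omega, tau, n):
    """The middle quantity in Lemma 14: 1 + (omega-1)(tau-1)/(n-1)."""
    return 1 + (omega - 1) * (tau - 1) / (n - 1)

random.seed(0)
for _ in range(1000):
    n = random.randint(2, 10**6)
    omega = random.uniform(1, n)
    tau = random.uniform(1, n)
    lhs = (omega - 1) * (tau - 1) / (n - 1)
    ub = omega * tau / n
    # Lemma 14: lhs <= omega*tau/n <= 1 + lhs <= 1 + omega*tau/n
    assert lhs <= ub + 1e-9
    assert ub <= middle(omega, tau, n) + 1e-9
    assert middle(omega, tau, n) <= 1 + ub + 1e-9
```

This is exactly the statement that the three quantities are within an additive constant 1 of each other, which is why each plus in Table 3 can be replaced by a max.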

Algorithm | Iteration complexity | g
SDCA [29] | n + 1/(λγ) | quadratic
ASDCA [28] | 4 · max{ n/τ, √(n/(λγτ)), 1/(λγτ), n^{1/3}/(λγτ)^{2/3} } | quadratic
SPDC [36] | n/τ + √(n/(λγτ)) | general
Quartz with τ-nice sampling | n/τ + (1 + (ω̃ − 1)(τ − 1)/(n − 1))/(λγτ) | general

Table 2: Comparison of the iteration complexity of several primal-dual algorithms performing stochastic coordinate ascent steps in the dual using a mini-batch of examples of size τ (with the exception of SDCA, which is a serial method using τ = 1). To facilitate the comparison, we assume that λ_max(A_i^⊤ A_i) = 1 for all i, since this assumption has been implicitly made in [28].

Algorithm | γλ = Θ(1/n^{3/2}), κ = n^{3/2} | γλ = Θ(1/(n√τ)), κ = n√τ | γλ = Θ(1/n), κ = n | γλ = Θ(τ/n), κ = n/τ | γλ = Θ(1/√n), κ = √n
SDCA [29] | n^{3/2} | n√τ | n | n | n
ASDCA [28] | n^{3/2}/τ + n^{5/4}/√τ + n^{4/3}/τ^{2/3} | n/τ^{1/4} | n/√τ | n/τ | n/τ + n^{3/4}/√τ
SPDC [36] | n/τ + n^{5/4}/√τ | n/τ + n/τ^{1/4} | n/τ + n/√τ | n/τ | n/τ + n^{3/4}/√τ
Quartz (τ-nice) | n/τ + ω̃√n | n/τ + ω̃√τ | n/τ + ω̃ | n/τ + ω̃/τ | n/τ + ω̃/√n

Table 3: Comparison of the leading factors in the complexity bounds of several methods in 5 regimes, where κ = 1/(γλ) is the condition number. We ignore constant terms, and hence one can replace each plus by a max.

Looking at Table 3, we see that in the γλ = Θ(τ/n) regime (i.e., if the condition number is κ = Θ(n/τ)), Quartz matches the linear speedup (when compared to SDCA) of ASDCA and SPDC. When the condition number is roughly equal to the sample size (κ = Θ(n)), then Quartz does better than both ASDCA and SPDC as long as n/τ + ω̃ ≤ n/√τ. In particular, this is the case when the data is sparse: ω̃ ≤ n/√τ. If the data is even more sparse (in many big data applications one has ω̃ = O(1)) and we have ω̃ ≤ n/τ, then Quartz significantly outperforms both ASDCA and SPDC. Note that Quartz can be better than both ASDCA and SPDC even in the domain of accelerated methods, that is, when the condition number is larger than the number of examples:

κ = 1/(γλ) ≥ n.    (38)

Indeed, we have the following result, which can be interpreted as follows: if κ ≥ nτ/4 (that is, nλγτ ≤ 4), then there are sparse-enough problems for which Quartz is better than both ASDCA and SPDC.

Proposition 15. Assume that (38) holds and that max_i λ_max(A_i^⊤ A_i) = 1. Then if the data is sufficiently sparse so that

1 + (ω̃ − 1)(τ − 1)/(n − 1) ≤ √(nλγτ),    (39)

then the iteration complexity (in the Õ order) of Quartz is better than that of ASDCA and SPDC.

Proof. As long as nλγτ ≤ 4, which holds under our assumption, the iteration complexity of ASDCA is

Õ(max{ n/τ, √(n/(λγτ)), 1/(λγτ), n^{1/3}/(λγτ)^{2/3} }) = Õ(1/(λγτ)),

which is already no less than that of SPDC. Moreover, by (39),

(1 + (ω̃ − 1)(τ − 1)/(n − 1))/(λγτ) ≤ √(nλγτ)/(λγτ) = √(n/(λγτ)),

so the dominant term in the bound of Quartz is at most the dominant term √(n/(λγτ)) in the bound of SPDC.

6 Quartz with Distributed Sampling

In this section we apply Theorem 9 to the case when Ŝ is the (c, τ)-distributed sampling; see the description of this sampling in Section 3.4.

Corollary 16. Assume that Ŝ is a (c, τ)-distributed sampling and v is chosen as in (26). If we let ε ≤ P(w⁰) − D(α⁰) and

T ≥ T(c, τ) · log((P(w⁰) − D(α⁰))/ε),    (40)

where

T(c, τ) := n/(cτ) + max_i λ_max( Σ_{j=1}^d (1 + (τ − 1)(ω_j − 1)/max{n/c − 1, 1} + (τc/n − τ/max{n/c − 1, 1}) · (ω_j′ − 1)ω_j/ω_j′) A_{ji}^⊤ A_{ji} ) / (λγcτ),    (41)

then E[P(w^T) − D(α^T)] ≤ ε.

Proof. If Ŝ is a (c, τ)-distributed sampling, then p_i = cτ/n for all i ∈ [n]. It now only remains to combine Theorem 9 and Lemma 8.

The expression (41) involves ω_j′, which depends on the partitioning {P_1, P_2,..., P_c} of the dual variables and on the data. The following lemma says that the effect of the partition is negligible, and in fact vanishes as τ increases. It was proved in [5, Lemma 5.2].

Lemma 17 ([5]). If n/c ≥ 2 and τ ≥ 2, then for all j ∈ [d] we have

(τc/n − (τ − 1)/(n/c − 1)) · (ω_j′ − 1)ω_j/ω_j′ ≤ (τ − 1)(ω_j − 1)/(n/c − 1).

According to this result, when each node owns at least two dual examples (n/c ≥ 2) and picks and updates at least two examples in each iteration (τ ≥ 2), then

T(c, τ) ≤ n/(cτ) + max_i λ_max( Σ_{j=1}^d (1 + 2(τ − 1)(ω_j − 1)/(n/c − 1)) A_{ji}^⊤ A_{ji} ) / (λγcτ) = n/(cτ) + (1 + 2(τ − 1)(ω̂ − 1)/(n/c − 1)) · max_i λ_max(A_i^⊤ A_i) / (λγcτ),    (42)

where ω̂ ∈ [1, n] is an average sparsity measure similar to the one we introduced in the study of the τ-nice sampling. This bound is similar to the one we obtained for the τ-nice sampling, and can be interpreted in an analogous way. Note that the first term receives perfect mini-batch scaling (it is divided by cτ), while the condition number max_i λ_max(A_i^⊤ A_i)/(λγ) is divided by cτ but also multiplied by 1 + 2(τ − 1)(ω̂ − 1)/(n/c − 1). Hence, if this extra factor is small, the condition number also receives a nearly perfect mini-batch scaling.

6.1 Quartz vs DisDCA

A distributed variant of SDCA, named DisDCA, has been proposed in [34] and analyzed in [35]. The authors of [34] proposed a basic DisDCA variant (which was analyzed) and a practical DisDCA variant (which was not analyzed). The complexity of basic DisDCA was shown to be:

(n/(cτ) + max_i λ_max(A_i^⊤ A_i)/(λγ)) · log((n/(cτ) + max_i λ_max(A_i^⊤ A_i)/(λγ)) · (D(α*) − D(α⁰))/ε),    (43)

where α* is an optimal dual solution. Note that this rate is much worse than our rate. Ignoring the logarithmic terms, the first expression n/(cτ) is the same in both results; moreover, if we replace all ω_j by the upper bound n and all ω_j′ by the upper bound c in (41), we obtain T(c, τ) ≤ n/(cτ) + max_i λ_max(A_i^⊤ A_i)/(λγ). Therefore, the dominant term in (40) is a strict lower bound on that in (43). Moreover, it is clear that the gap between (40) and (43) is large when the data is sparse. For instance, in the perfectly sparse case (ω̂ = 1), the bound (42) for Quartz becomes

n/(cτ) + max_i λ_max(A_i^⊤ A_i)/(λγcτ),

which is much better than (43).
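To make the comparison with DisDCA concrete, the sketch below (hypothetical helpers, not from the paper) evaluates the two leading factors. It uses the simplified bound (42), so it assumes n/c ≥ 2, τ ≥ 2 and, for simplicity, max_i λ_max(A_i^⊤ A_i) = 1.

```python
def quartz_T(n, c, tau, lam, gamma, omega_hat, lmax=1.0):
    """Simplified distributed Quartz bound (42); assumes n/c >= 2 and tau >= 2."""
    factor = 1 + 2 * (tau - 1) * (omega_hat - 1) / (n / c - 1)
    return n / (c * tau) + factor * lmax / (lam * gamma * c * tau)

def disdca_T(n, c, tau, lam, gamma, lmax=1.0):
    """Leading factor of the basic DisDCA rate (43)."""
    return n / (c * tau) + lmax / (lam * gamma)

n, c, tau, lam, gamma = 10**6, 16, 32, 1e-3, 1.0
# Perfectly sparse data (omega_hat = 1): the condition-number term of
# Quartz is divided by c*tau, while DisDCA's is not.
assert quartz_T(n, c, tau, lam, gamma, omega_hat=1) < disdca_T(n, c, tau, lam, gamma)
```

In this illustrative regime the Quartz factor is roughly n/(cτ) + 1/(λγcτ), against n/(cτ) + 1/(λγ) for basic DisDCA, which is the cτ-fold gap described above.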

6.2 Theoretical speedup factor

In analogy with the discussion in Section 5.1, we shall now analyze the theoretical speedup factor T(1, 1)/T(c, τ), measuring the multiplicative amount by which Quartz specialized to the (c, τ)-distributed sampling is better than Quartz specialized to the serial uniform sampling. In Section 5 we have seen how the speedup factor increases with τ when a mini-batch of examples is used at each iteration, following the τ-nice sampling. As we have discussed before, this sampling is not particularly suitable for a distributed implementation unless τ = n, which in the big data setting (where n is very large) may be asking for many more cores/threads than are available. This is because the implementation of updates using this sampling would either result in frequently idle nodes, or in increased data transfer. Often the data matrix A is too large to be stored on a single node, or only a limited number of threads/cores is available per node. We then want to implement Quartz in a distributed way (c > 1). It is therefore necessary to understand how the speedup factor compares to the hypothetical situation in which we would have a single large machine where all the data could be stored (we ignore communication costs here), and hence a cτ-nice sampling could be implemented. That is, we are interested in comparing T(c, τ) (distributed implementation) and T(1, cτ) (hypothetical single computer). If for simplicity of exposition we assume that λ_max(A_i^⊤ A_i) = 1 for all i, it is possible to argue that if cτ ≤ n, then

T(1, 1)/T(c, τ) ≈ T(1, 1)/T(1, cτ).    (44)

In Figure 2 we plot the contour lines of the theoretical speedup factor in a log-log plot with axes corresponding to τ and c. The contours are nearly perfect straight lines, which means that the speedup factor is approximately constant for those pairs (c, τ) for which cτ is the same. In particular, this means that (44) holds. Note that better speedup is obtained for sparse data than for dense data. However, in all plots we have chosen γ = 1 and λ = 1/√n, and hence we expect data-independent linear speedup up to cτ = Θ(√n) (a special line is depicted in all three plots which defines this contour).
Figure 2: Contour line plots of the speedup factor T(1, 1)/T(c, τ) for n = 10⁶, γ = 1, λ = 10⁻³, with ω̃ = 10² (Figure 2a), ω̃ = 10⁴ (Figure 2b) and ω̃ = 10⁶ (Figure 2c). Here, ω̃ ∈ [1, n] is a degree of average sparsity of the data. The contour cτ = √n is highlighted in each plot.

7 Proof of the Main Result

In this section we prove our main result, Theorem 9. In order to make the analysis more transparent, we first establish three auxiliary results.

7.1 Three lemmas

Lemma 18. The function f : R^N → R defined in (3) satisfies the following inequality:

f(α + h) ≤ f(α) + ⟨∇f(α), h⟩ + (1/(2λn²)) h^⊤ A^⊤ A h,  for all α, h ∈ R^N.    (45)

Proof. Since g is 1-strongly convex, g* is 1-smooth. Pick α, h ∈ R^N. Since f(α) = λ g*((1/(λn)) Aα), we have

f(α + h) = λ g*((1/(λn)) Aα + (1/(λn)) Ah)
≤ λ ( g*((1/(λn)) Aα) + ⟨∇g*((1/(λn)) Aα), (1/(λn)) Ah⟩ + (1/2) ‖(1/(λn)) Ah‖² )
= f(α) + ⟨∇f(α), h⟩ + (1/(2λn²)) h^⊤ A^⊤ A h.

For s = (s₁,..., s_n) ∈ R^N and h = (h₁,..., h_n) ∈ R^N, where s_i, h_i ∈ R^m for all i, we will for convenience write ⟨s, h⟩_p := Σ_i p_i ⟨s_i, h_i⟩, where p = (p₁,..., p_n) and p_i = P(i ∈ Ŝ) for i ∈ [n]; we also write ‖h‖²_{p∘v} := Σ_i p_i v_i ‖h_i‖². In the next lemma we give an expected separable overapproximation of the convex function D.

Lemma 19. If Ŝ and v ∈ R^n satisfy Assumption 4, then for all α, h ∈ R^N the following holds:

E[D(α + h_[Ŝ])] ≥ −f(α) − ⟨∇f(α), h⟩_p − (1/(2λn²)) ‖h‖²_{p∘v} − (1/n) Σ_i ((1 − p_i) φ_i*(−α_i) + p_i φ_i*(−α_i − h_i)).    (46)

Proof. By definition of D, we have −D(α + h_[Ŝ]) = f(α + h_[Ŝ]) + ψ(α + h_[Ŝ]), where f and ψ are defined in (3) and (4). We now apply Lemma 18 and (15) to bound the first term:

E[f(α + h_[Ŝ])] ≤ E[f(α) + ⟨∇f(α), h_[Ŝ]⟩ + (1/(2λn²)) h_[Ŝ]^⊤ A^⊤ A h_[Ŝ]]  (by (45))
≤ f(α) + E[⟨∇f(α), h_[Ŝ]⟩] + (1/(2λn²)) ‖h‖²_{p∘v}  (by (15))
= f(α) + ⟨∇f(α), h⟩_p + (1/(2λn²)) ‖h‖²_{p∘v}.

Moreover, since ψ is block separable, we can write

E[ψ(α + h_[Ŝ])] = (1/n) Σ_i ( P(i ∉ Ŝ) φ_i*(−α_i) + P(i ∈ Ŝ) φ_i*(−α_i − h_i) ) = (1/n) Σ_i ( (1 − p_i) φ_i*(−α_i) + p_i φ_i*(−α_i − h_i) ).

Our last auxiliary result is a technical lemma for further bounding the right-hand side in Lemma 19.

Lemma 20. Suppose that Ŝ and v ∈ R^n satisfy Assumption 4. Fixing α ∈ R^N and w ∈ R^d, let h ∈ R^N be defined by

h_i = −θ p_i⁻¹ (α_i + ∇φ_i(A_i^⊤ w)),  i ∈ [n],

where θ is as in (29). Then

−f(α) − ⟨∇f(α), h⟩_p − (1/(2λn²)) ‖h‖²_{p∘v} − (1/n) Σ_i ((1 − p_i) φ_i*(−α_i) + p_i φ_i*(−α_i − h_i))
≥ (1 − θ) D(α) + θλ g(∇g*(ᾱ)) + θ ⟨∇g*(ᾱ), (1/n) Σ_i A_i ∇φ_i(A_i^⊤ w)⟩ − (θ/n) Σ_i φ_i*(∇φ_i(A_i^⊤ w)),    (47)

where ᾱ = (1/(λn)) Aα.

Proof. Recall from (3) that f(α) = λ g*(ᾱ) and hence ∇f(α) = (1/n) A^⊤ ∇g*(ᾱ). Thus,

−f(α) − ⟨∇f(α), h⟩_p − (1/(2λn²)) ‖h‖²_{p∘v}
= −λ g*(ᾱ) + Σ_i p_i ⟨(1/n) A_i^⊤ ∇g*(ᾱ), θ p_i⁻¹ (α_i + ∇φ_i(A_i^⊤ w))⟩ − (1/(2λn²)) ‖h‖²_{p∘v}
= −(1 − θ) λ g*(ᾱ) + θλ (⟨∇g*(ᾱ), ᾱ⟩ − g*(ᾱ)) + θ ⟨∇g*(ᾱ), (1/n) Σ_i A_i ∇φ_i(A_i^⊤ w)⟩ − (1/(2λn²)) ‖h‖²_{p∘v}.    (48)

Since the functions φ_i are 1/γ-smooth, the conjugate functions φ_i* must be γ-strongly convex. Therefore,

φ_i*(−α_i − h_i) = φ_i*((1 − θ p_i⁻¹)(−α_i) + θ p_i⁻¹ ∇φ_i(A_i^⊤ w))
≤ (1 − θ p_i⁻¹) φ_i*(−α_i) + θ p_i⁻¹ φ_i*(∇φ_i(A_i^⊤ w)) − (γ/2) θ p_i⁻¹ (1 − θ p_i⁻¹) ‖α_i + ∇φ_i(A_i^⊤ w)‖²
= (1 − θ p_i⁻¹) φ_i*(−α_i) + θ p_i⁻¹ φ_i*(∇φ_i(A_i^⊤ w)) − (γ p_i (1 − θ p_i⁻¹)/(2θ)) ‖h_i‖²,    (49)

and we can write

−(1/n) Σ_i ((1 − p_i) φ_i*(−α_i) + p_i φ_i*(−α_i − h_i))
≥ −(1 − θ) ψ(α) − (θ/n) Σ_i φ_i*(∇φ_i(A_i^⊤ w)) + (1/(2λn²)) Σ_i (λγn p_i² (1 − θ p_i⁻¹)/θ) ‖h_i‖².    (50)

Then, by combining (48) and (50) and using the Fenchel identity g(∇g*(ᾱ)) = ⟨∇g*(ᾱ), ᾱ⟩ − g*(ᾱ), we get:

−f(α) − ⟨∇f(α), h⟩_p − (1/(2λn²)) ‖h‖²_{p∘v} − (1/n) Σ_i ((1 − p_i) φ_i*(−α_i) + p_i φ_i*(−α_i − h_i))
≥ (1 − θ) D(α) + θλ g(∇g*(ᾱ)) + θ ⟨∇g*(ᾱ), (1/n) Σ_i A_i ∇φ_i(A_i^⊤ w)⟩ − (θ/n) Σ_i φ_i*(∇φ_i(A_i^⊤ w)) − (1/(2λn²)) Σ_i (p_i v_i − λγn p_i² (1 − θ p_i⁻¹)/θ) ‖h_i‖².

It remains to notice that for θ defined in (29) we have

p_i v_i ≤ λγn p_i² (1 − θ p_i⁻¹)/θ,  i ∈ [n].

7.2 Proof of Theorem 9

Let t ≥ 1. Define h^t = (h₁^t,..., h_n^t) ∈ R^N by

h_i^t = −θ p_i⁻¹ (α_i^{t−1} + ∇φ_i(A_i^⊤ w^t)),  i ∈ [n],

and κ^t = (κ₁^t,..., κ_n^t) by

κ_i^t = argmax_{κ ∈ R^m} { −φ_i*(−(α_i^{t−1} + κ)) − ⟨∇g*(ᾱ^{t−1}), A_i κ⟩ − (v_i/(2λn)) ‖κ‖² },  i ∈ [n].

If we use Option I in Algorithm 1, then α^t = α^{t−1} + κ^t_[Ŝ]. If we use Option II in Algorithm 1, then α^t = α^{t−1} + h^t_[Ŝ]. In both cases, by Lemma 19:

E_t[D(α^t)] ≥ −f(α^{t−1}) − ⟨∇f(α^{t−1}), h^t⟩_p − (1/(2λn²)) ‖h^t‖²_{p∘v} − (1/n) Σ_i ((1 − p_i) φ_i*(−α_i^{t−1}) + p_i φ_i*(−α_i^{t−1} − h_i^t)).

We now apply Lemma 20 to further bound the right-hand side and obtain:

E_t[D(α^t)] ≥ (1 − θ) D(α^{t−1}) + θλ g(∇g*(ᾱ^{t−1})) + θ ⟨∇g*(ᾱ^{t−1}), (1/n) Σ_i A_i ∇φ_i(A_i^⊤ w^t)⟩ − (θ/n) Σ_i φ_i*(∇φ_i(A_i^⊤ w^t)).    (51)

By convexity of g, and since w^t = (1 − θ) w^{t−1} + θ ∇g*(ᾱ^{t−1}),

P(w^t) = (1/n) Σ_i φ_i(A_i^⊤ w^t) + λ g((1 − θ) w^{t−1} + θ ∇g*(ᾱ^{t−1})) ≤ (1/n) Σ_i φ_i(A_i^⊤ w^t) + (1 − θ) λ g(w^{t−1}) + θλ g(∇g*(ᾱ^{t−1})).    (52)

By combining (51) and (52) we get:

E_t[P(w^t) − D(α^t)] ≤ (1/n) Σ_i φ_i(A_i^⊤ w^t) + (1 − θ) λ g(w^{t−1}) − (1 − θ) D(α^{t−1}) − θ ⟨∇g*(ᾱ^{t−1}), (1/n) Σ_i A_i ∇φ_i(A_i^⊤ w^t)⟩ + (θ/n) Σ_i φ_i*(∇φ_i(A_i^⊤ w^t))
= (1 − θ)(P(w^{t−1}) − D(α^{t−1})) + (1/n) Σ_i (φ_i(A_i^⊤ w^t) − (1 − θ) φ_i(A_i^⊤ w^{t−1})) − θ ⟨∇g*(ᾱ^{t−1}), (1/n) Σ_i A_i ∇φ_i(A_i^⊤ w^t)⟩ + (θ/n) Σ_i φ_i*(∇φ_i(A_i^⊤ w^t)).    (53)

Note that θ ∇g*(ᾱ^{t−1}) = w^t − (1 − θ) w^{t−1} and φ_i*(∇φ_i(A_i^⊤ w^t)) = ⟨∇φ_i(A_i^⊤ w^t), A_i^⊤ w^t⟩ − φ_i(A_i^⊤ w^t). Finally, we plug these two equalities into (53) and obtain:

E_t[P(w^t) − D(α^t)]
≤ (1 − θ)(P(w^{t−1}) − D(α^{t−1})) + (1/n) Σ_i (φ_i(A_i^⊤ w^t) − (1 − θ) φ_i(A_i^⊤ w^{t−1})) − (1/n) Σ_i ⟨A_i^⊤ w^t − (1 − θ) A_i^⊤ w^{t−1}, ∇φ_i(A_i^⊤ w^t)⟩ + (θ/n) Σ_i (⟨∇φ_i(A_i^⊤ w^t), A_i^⊤ w^t⟩ − φ_i(A_i^⊤ w^t))
= (1 − θ)(P(w^{t−1}) − D(α^{t−1})) − (1 − θ)(1/n) Σ_i ( φ_i(A_i^⊤ w^{t−1}) − φ_i(A_i^⊤ w^t) − ⟨A_i^⊤ w^{t−1} − A_i^⊤ w^t, ∇φ_i(A_i^⊤ w^t)⟩ )
≤ (1 − θ)(P(w^{t−1}) − D(α^{t−1})),

where the last inequality follows from the convexity of φ_i.

8 Experimental Results

In [29] and [28], the reader can find an extensive list of popular machine learning problems to which Prox-SDCA can be applied. Sharing the same primal-dual formulation, our algorithm can also be specialized and applied to those applications, including ridge regression, SVM, Lasso, logistic regression and multiclass prediction. We focus our numerical experiments on the L2-regularized linear SVM problem with smoothed hinge loss (or squared hinge loss). These problems are described in detail in Section 8.1. The three main messages that we draw from the numerical experiments are:

- importance sampling does improve the convergence for certain datasets;
- Quartz specialized to serial samplings is comparable to Prox-SDCA in practice;
- the theoretical speedup factor is an almost exact predictor of the actual speedup in terms of iteration complexity.

We performed the experiments on several real-world large datasets, of various dimensions n, d and sparsity. The details of the dataset characteristics are provided in Table 4. In all our experiments we used Option I, which we found to be better in practice.

Dataset | # Training size n | # features d | Sparsity (# nonzeros / (n·d))
astro-ph | 29,882 | 99,… | …%
CCAT | 781,265 | 47,… | …%
cov1 | 522,… | … | …%
w8a | 49,… | … | …%
jc1 | 49,… | … | …%
webspam | 350,… | … | …%

Table 4: Datasets used in our experiments.

8.1 Applications

Smoothed hinge loss with L2 regularizer. We specialize Quartz to the linear Support Vector Machine (SVM) problem with smoothed hinge loss and L2 regularizer:

min_{w∈R^d} P(w) := (1/n) Σ_i φ_i(y_i A_i^⊤ w) + λ g(w),

where

φ_i(a) = 0 if a ≥ 1;  1 − a − γ/2 if a ≤ 1 − γ;  (1 − a)²/(2γ) otherwise,  a ∈ R,    (54)

and

g(w) = (1/2) ‖w‖²,  w ∈ R^d.
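The piecewise definition (54) can be implemented directly; the sketch below is an illustrative NumPy version (with γ as a parameter), not code from the paper's experimental setup.

```python
import numpy as np

def smoothed_hinge(a, gamma=1.0):
    """Smoothed hinge loss (54): zero for a >= 1, linear for a <= 1 - gamma,
    quadratic in between. The function is (1/gamma)-smooth."""
    a = np.asarray(a, dtype=float)
    return np.where(a >= 1, 0.0,
                    np.where(a <= 1 - gamma, 1 - a - gamma / 2,
                             (1 - a) ** 2 / (2 * gamma)))

assert smoothed_hinge(2.0) == 0.0
assert smoothed_hinge(-1.0, gamma=1.0) == 1.5   # linear branch: 1-(-1)-1/2
assert smoothed_hinge(0.5, gamma=1.0) == 0.125  # quadratic branch: (0.5)^2/2
```

As γ → 0 this recovers the ordinary (non-smooth) hinge loss, which is why γ enters the complexity bounds only through the smoothness constant 1/γ.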


More information

Bounds on the expected entropy and KL-divergence of sampled multinomial distributions. Brandon C. Roy

Bounds on the expected entropy and KL-divergence of sampled multinomial distributions. Brandon C. Roy Bouds o the expected etropy ad KL-dvergece of sampled multomal dstrbutos Brado C. Roy bcroy@meda.mt.edu Orgal: May 18, 2011 Revsed: Jue 6, 2011 Abstract Iformato theoretc quattes calculated from a sampled

More information

Lecture Note to Rice Chapter 8

Lecture Note to Rice Chapter 8 ECON 430 HG revsed Nov 06 Lecture Note to Rce Chapter 8 Radom matrces Let Y, =,,, m, =,,, be radom varables (r.v. s). The matrx Y Y Y Y Y Y Y Y Y Y = m m m s called a radom matrx ( wth a ot m-dmesoal dstrbuto,

More information

Point Estimation: definition of estimators

Point Estimation: definition of estimators Pot Estmato: defto of estmators Pot estmator: ay fucto W (X,..., X ) of a data sample. The exercse of pot estmato s to use partcular fuctos of the data order to estmate certa ukow populato parameters.

More information

The number of observed cases The number of parameters. ith case of the dichotomous dependent variable. the ith case of the jth parameter

The number of observed cases The number of parameters. ith case of the dichotomous dependent variable. the ith case of the jth parameter LOGISTIC REGRESSION Notato Model Logstc regresso regresses a dchotomous depedet varable o a set of depedet varables. Several methods are mplemeted for selectg the depedet varables. The followg otato s

More information

MATH 247/Winter Notes on the adjoint and on normal operators.

MATH 247/Winter Notes on the adjoint and on normal operators. MATH 47/Wter 00 Notes o the adjot ad o ormal operators I these otes, V s a fte dmesoal er product space over, wth gve er * product uv, T, S, T, are lear operators o V U, W are subspaces of V Whe we say

More information

To use adaptive cluster sampling we must first make some definitions of the sampling universe:

To use adaptive cluster sampling we must first make some definitions of the sampling universe: 8.3 ADAPTIVE SAMPLING Most of the methods dscussed samplg theory are lmted to samplg desgs hch the selecto of the samples ca be doe before the survey, so that oe of the decsos about samplg deped ay ay

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statstc ad Radom Samples A parameter s a umber that descrbes the populato. It s a fxed umber, but practce we do ot kow ts value. A statstc s a fucto of the sample data,.e., t s a quatty whose

More information

Lecture 8: Linear Regression

Lecture 8: Linear Regression Lecture 8: Lear egresso May 4, GENOME 56, Sprg Goals Develop basc cocepts of lear regresso from a probablstc framework Estmatg parameters ad hypothess testg wth lear models Lear regresso Su I Lee, CSE

More information

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best Error Aalyss Preamble Wheever a measuremet s made, the result followg from that measuremet s always subject to ucertaty The ucertaty ca be reduced by makg several measuremets of the same quatty or by mprovg

More information

Investigation of Partially Conditional RP Model with Response Error. Ed Stanek

Investigation of Partially Conditional RP Model with Response Error. Ed Stanek Partally Codtoal Radom Permutato Model 7- vestgato of Partally Codtoal RP Model wth Respose Error TRODUCTO Ed Staek We explore the predctor that wll result a smple radom sample wth respose error whe a

More information

Non-uniform Turán-type problems

Non-uniform Turán-type problems Joural of Combatoral Theory, Seres A 111 2005 106 110 wwwelsevercomlocatecta No-uform Turá-type problems DhruvMubay 1, Y Zhao 2 Departmet of Mathematcs, Statstcs, ad Computer Scece, Uversty of Illos at

More information

Simulation Output Analysis

Simulation Output Analysis Smulato Output Aalyss Summary Examples Parameter Estmato Sample Mea ad Varace Pot ad Iterval Estmato ermatg ad o-ermatg Smulato Mea Square Errors Example: Sgle Server Queueg System x(t) S 4 S 4 S 3 S 5

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Revew for the prevous lecture: Theorems ad Examples: How to obta the pmf (pdf) of U = g (, Y) ad V = g (, Y) Chapter 4 Multple Radom Varables Chapter 44 Herarchcal Models ad Mxture Dstrbutos Examples:

More information

Unsupervised Learning and Other Neural Networks

Unsupervised Learning and Other Neural Networks CSE 53 Soft Computg NOT PART OF THE FINAL Usupervsed Learg ad Other Neural Networs Itroducto Mture Destes ad Idetfablty ML Estmates Applcato to Normal Mtures Other Neural Networs Itroducto Prevously, all

More information

Lecture 3. Sampling, sampling distributions, and parameter estimation

Lecture 3. Sampling, sampling distributions, and parameter estimation Lecture 3 Samplg, samplg dstrbutos, ad parameter estmato Samplg Defto Populato s defed as the collecto of all the possble observatos of terest. The collecto of observatos we take from the populato s called

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematcs of Mache Learg Lecturer: Phlppe Rgollet Lecture 3 Scrbe: James Hrst Sep. 6, 205.5 Learg wth a fte dctoary Recall from the ed of last lecture our setup: We are workg wth a fte dctoary

More information

9 U-STATISTICS. Eh =(m!) 1 Eh(X (1),..., X (m ) ) i.i.d

9 U-STATISTICS. Eh =(m!) 1 Eh(X (1),..., X (m ) ) i.i.d 9 U-STATISTICS Suppose,,..., are P P..d. wth CDF F. Our goal s to estmate the expectato t (P)=Eh(,,..., m ). Note that ths expectato requres more tha oe cotrast to E, E, or Eh( ). Oe example s E or P((,

More information

CS 2750 Machine Learning. Lecture 8. Linear regression. CS 2750 Machine Learning. Linear regression. is a linear combination of input components x

CS 2750 Machine Learning. Lecture 8. Linear regression. CS 2750 Machine Learning. Linear regression. is a linear combination of input components x CS 75 Mache Learg Lecture 8 Lear regresso Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square CS 75 Mache Learg Lear regresso Fucto f : X Y s a lear combato of put compoets f + + + K d d K k - parameters

More information

å 1 13 Practice Final Examination Solutions - = CS109 Dec 5, 2018

å 1 13 Practice Final Examination Solutions - = CS109 Dec 5, 2018 Chrs Pech Fal Practce CS09 Dec 5, 08 Practce Fal Examato Solutos. Aswer: 4/5 8/7. There are multle ways to obta ths aswer; here are two: The frst commo method s to sum over all ossbltes for the rak of

More information

Algorithms Design & Analysis. Hash Tables

Algorithms Design & Analysis. Hash Tables Algorthms Desg & Aalyss Hash Tables Recap Lower boud Order statstcs 2 Today s topcs Drect-accessble table Hash tables Hash fuctos Uversal hashg Perfect Hashg Ope addressg 3 Symbol-table problem Symbol

More information

Chapter 2 - Free Vibration of Multi-Degree-of-Freedom Systems - II

Chapter 2 - Free Vibration of Multi-Degree-of-Freedom Systems - II CEE49b Chapter - Free Vbrato of Mult-Degree-of-Freedom Systems - II We ca obta a approxmate soluto to the fudametal atural frequecy through a approxmate formula developed usg eergy prcples by Lord Raylegh

More information

Generative classification models

Generative classification models CS 75 Mache Learg Lecture Geeratve classfcato models Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square Data: D { d, d,.., d} d, Classfcato represets a dscrete class value Goal: lear f : X Y Bar classfcato

More information

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model Chapter 3 Asmptotc Theor ad Stochastc Regressors The ature of eplaator varable s assumed to be o-stochastc or fed repeated samples a regresso aalss Such a assumpto s approprate for those epermets whch

More information

Entropy ISSN by MDPI

Entropy ISSN by MDPI Etropy 2003, 5, 233-238 Etropy ISSN 1099-4300 2003 by MDPI www.mdp.org/etropy O the Measure Etropy of Addtve Cellular Automata Hasa Aı Arts ad Sceces Faculty, Departmet of Mathematcs, Harra Uversty; 63100,

More information

Assignment 5/MATH 247/Winter Due: Friday, February 19 in class (!) (answers will be posted right after class)

Assignment 5/MATH 247/Winter Due: Friday, February 19 in class (!) (answers will be posted right after class) Assgmet 5/MATH 7/Wter 00 Due: Frday, February 9 class (!) (aswers wll be posted rght after class) As usual, there are peces of text, before the questos [], [], themselves. Recall: For the quadratc form

More information

22 Nonparametric Methods.

22 Nonparametric Methods. 22 oparametrc Methods. I parametrc models oe assumes apror that the dstrbutos have a specfc form wth oe or more ukow parameters ad oe tres to fd the best or atleast reasoably effcet procedures that aswer

More information

Distributed Accelerated Proximal Coordinate Gradient Methods

Distributed Accelerated Proximal Coordinate Gradient Methods Dstrbuted Accelerated Proxmal Coordate Gradet Methods Yog Re, Ju Zhu Ceter for Bo-Ispred Computg Research State Key Lab for Itell. Tech. & Systems Dept. of Comp. Sc. & Tech., TNLst Lab, Tsghua Uversty

More information

1 Lyapunov Stability Theory

1 Lyapunov Stability Theory Lyapuov Stablty heory I ths secto we cosder proofs of stablty of equlbra of autoomous systems. hs s stadard theory for olear systems, ad oe of the most mportat tools the aalyss of olear systems. It may

More information

8.1 Hashing Algorithms

8.1 Hashing Algorithms CS787: Advaced Algorthms Scrbe: Mayak Maheshwar, Chrs Hrchs Lecturer: Shuch Chawla Topc: Hashg ad NP-Completeess Date: September 21 2007 Prevously we looked at applcatos of radomzed algorthms, ad bega

More information

LECTURE - 4 SIMPLE RANDOM SAMPLING DR. SHALABH DEPARTMENT OF MATHEMATICS AND STATISTICS INDIAN INSTITUTE OF TECHNOLOGY KANPUR

LECTURE - 4 SIMPLE RANDOM SAMPLING DR. SHALABH DEPARTMENT OF MATHEMATICS AND STATISTICS INDIAN INSTITUTE OF TECHNOLOGY KANPUR amplg Theory MODULE II LECTURE - 4 IMPLE RADOM AMPLIG DR. HALABH DEPARTMET OF MATHEMATIC AD TATITIC IDIA ITITUTE OF TECHOLOGY KAPUR Estmato of populato mea ad populato varace Oe of the ma objectves after

More information

Investigating Cellular Automata

Investigating Cellular Automata Researcher: Taylor Dupuy Advsor: Aaro Wootto Semester: Fall 4 Ivestgatg Cellular Automata A Overvew of Cellular Automata: Cellular Automata are smple computer programs that geerate rows of black ad whte

More information

Chapter 13, Part A Analysis of Variance and Experimental Design. Introduction to Analysis of Variance. Introduction to Analysis of Variance

Chapter 13, Part A Analysis of Variance and Experimental Design. Introduction to Analysis of Variance. Introduction to Analysis of Variance Chapter, Part A Aalyss of Varace ad Epermetal Desg Itroducto to Aalyss of Varace Aalyss of Varace: Testg for the Equalty of Populato Meas Multple Comparso Procedures Itroducto to Aalyss of Varace Aalyss

More information

The Occupancy and Coupon Collector problems

The Occupancy and Coupon Collector problems Chapter 4 The Occupacy ad Coupo Collector problems By Sarel Har-Peled, Jauary 9, 08 4 Prelmares [ Defto 4 Varace ad Stadard Devato For a radom varable X, let V E [ X [ µ X deote the varace of X, where

More information

C-1: Aerodynamics of Airfoils 1 C-2: Aerodynamics of Airfoils 2 C-3: Panel Methods C-4: Thin Airfoil Theory

C-1: Aerodynamics of Airfoils 1 C-2: Aerodynamics of Airfoils 2 C-3: Panel Methods C-4: Thin Airfoil Theory ROAD MAP... AE301 Aerodyamcs I UNIT C: 2-D Arfols C-1: Aerodyamcs of Arfols 1 C-2: Aerodyamcs of Arfols 2 C-3: Pael Methods C-4: Th Arfol Theory AE301 Aerodyamcs I Ut C-3: Lst of Subects Problem Solutos?

More information

Lecture 02: Bounding tail distributions of a random variable

Lecture 02: Bounding tail distributions of a random variable CSCI-B609: A Theorst s Toolkt, Fall 206 Aug 25 Lecture 02: Boudg tal dstrbutos of a radom varable Lecturer: Yua Zhou Scrbe: Yua Xe & Yua Zhou Let us cosder the ubased co flps aga. I.e. let the outcome

More information

hp calculators HP 30S Statistics Averages and Standard Deviations Average and Standard Deviation Practice Finding Averages and Standard Deviations

hp calculators HP 30S Statistics Averages and Standard Deviations Average and Standard Deviation Practice Finding Averages and Standard Deviations HP 30S Statstcs Averages ad Stadard Devatos Average ad Stadard Devato Practce Fdg Averages ad Stadard Devatos HP 30S Statstcs Averages ad Stadard Devatos Average ad stadard devato The HP 30S provdes several

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Mache Learg Problem set Due Frday, September 9, rectato Please address all questos ad commets about ths problem set to 6.867-staff@a.mt.edu. You do ot eed to use MATLAB for ths problem set though

More information

1 Review and Overview

1 Review and Overview CS9T/STATS3: Statstcal Learg Teory Lecturer: Tegyu Ma Lecture #7 Scrbe: Bra Zag October 5, 08 Revew ad Overvew We wll frst gve a bref revew of wat as bee covered so far I te frst few lectures, we stated

More information

Cubic Nonpolynomial Spline Approach to the Solution of a Second Order Two-Point Boundary Value Problem

Cubic Nonpolynomial Spline Approach to the Solution of a Second Order Two-Point Boundary Value Problem Joural of Amerca Scece ;6( Cubc Nopolyomal Sple Approach to the Soluto of a Secod Order Two-Pot Boudary Value Problem W.K. Zahra, F.A. Abd El-Salam, A.A. El-Sabbagh ad Z.A. ZAk * Departmet of Egeerg athematcs

More information

1 Onto functions and bijections Applications to Counting

1 Onto functions and bijections Applications to Counting 1 Oto fuctos ad bectos Applcatos to Coutg Now we move o to a ew topc. Defto 1.1 (Surecto. A fucto f : A B s sad to be surectve or oto f for each b B there s some a A so that f(a B. What are examples of

More information

ρ < 1 be five real numbers. The

ρ < 1 be five real numbers. The Lecture o BST 63: Statstcal Theory I Ku Zhag, /0/006 Revew for the prevous lecture Deftos: covarace, correlato Examples: How to calculate covarace ad correlato Theorems: propertes of correlato ad covarace

More information

1 Convergence of the Arnoldi method for eigenvalue problems

1 Convergence of the Arnoldi method for eigenvalue problems Lecture otes umercal lear algebra Arold method covergece Covergece of the Arold method for egevalue problems Recall that, uless t breaks dow, k steps of the Arold method geerates a orthogoal bass of a

More information

X ε ) = 0, or equivalently, lim

X ε ) = 0, or equivalently, lim Revew for the prevous lecture Cocepts: order statstcs Theorems: Dstrbutos of order statstcs Examples: How to get the dstrbuto of order statstcs Chapter 5 Propertes of a Radom Sample Secto 55 Covergece

More information

n -dimensional vectors follow naturally from the one

n -dimensional vectors follow naturally from the one B. Vectors ad sets B. Vectors Ecoomsts study ecoomc pheomea by buldg hghly stylzed models. Uderstadg ad makg use of almost all such models requres a hgh comfort level wth some key mathematcal sklls. I

More information

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections ENGI 441 Jot Probablty Dstrbutos Page 7-01 Jot Probablty Dstrbutos [Navd sectos.5 ad.6; Devore sectos 5.1-5.] The jot probablty mass fucto of two dscrete radom quattes, s, P ad p x y x y The margal probablty

More information

Derivation of 3-Point Block Method Formula for Solving First Order Stiff Ordinary Differential Equations

Derivation of 3-Point Block Method Formula for Solving First Order Stiff Ordinary Differential Equations Dervato of -Pot Block Method Formula for Solvg Frst Order Stff Ordary Dfferetal Equatos Kharul Hamd Kharul Auar, Kharl Iskadar Othma, Zara Bb Ibrahm Abstract Dervato of pot block method formula wth costat

More information

PTAS for Bin-Packing

PTAS for Bin-Packing CS 663: Patter Matchg Algorthms Scrbe: Che Jag /9/00. Itroducto PTAS for B-Packg The B-Packg problem s NP-hard. If we use approxmato algorthms, the B-Packg problem could be solved polyomal tme. For example,

More information

Special Instructions / Useful Data

Special Instructions / Useful Data JAM 6 Set of all real umbers P A..d. B, p Posso Specal Istructos / Useful Data x,, :,,, x x Probablty of a evet A Idepedetly ad detcally dstrbuted Bomal dstrbuto wth parameters ad p Posso dstrbuto wth

More information

5 Short Proofs of Simplified Stirling s Approximation

5 Short Proofs of Simplified Stirling s Approximation 5 Short Proofs of Smplfed Strlg s Approxmato Ofr Gorodetsky, drtymaths.wordpress.com Jue, 20 0 Itroducto Strlg s approxmato s the followg (somewhat surprsg) approxmato of the factoral,, usg elemetary fuctos:

More information

Complete Convergence and Some Maximal Inequalities for Weighted Sums of Random Variables

Complete Convergence and Some Maximal Inequalities for Weighted Sums of Random Variables Joural of Sceces, Islamc Republc of Ira 8(4): -6 (007) Uversty of Tehra, ISSN 06-04 http://sceces.ut.ac.r Complete Covergece ad Some Maxmal Iequaltes for Weghted Sums of Radom Varables M. Am,,* H.R. Nl

More information

Convergence of the Desroziers scheme and its relation to the lag innovation diagnostic

Convergence of the Desroziers scheme and its relation to the lag innovation diagnostic Covergece of the Desrozers scheme ad ts relato to the lag ovato dagostc chard Méard Evromet Caada, Ar Qualty esearch Dvso World Weather Ope Scece Coferece Motreal, August 9, 04 o t t O x x x y x y Oservato

More information

Statistics Descriptive and Inferential Statistics. Instructor: Daisuke Nagakura

Statistics Descriptive and Inferential Statistics. Instructor: Daisuke Nagakura Statstcs Descrptve ad Iferetal Statstcs Istructor: Dasuke Nagakura (agakura@z7.keo.jp) 1 Today s topc Today, I talk about two categores of statstcal aalyses, descrptve statstcs ad feretal statstcs, ad

More information

Maps on Triangular Matrix Algebras

Maps on Triangular Matrix Algebras Maps o ragular Matrx lgebras HMED RMZI SOUROUR Departmet of Mathematcs ad Statstcs Uversty of Vctora Vctora, BC V8W 3P4 CND sourour@mathuvcca bstract We surveys results about somorphsms, Jorda somorphsms,

More information