Distributed Strongly Convex Optimization

Konstantinos I. Tsianos
Department of Electrical and Computer Engineering
McGill University
Montreal, Quebec H3A 0E9

and

Michael G. Rabbat
Department of Electrical and Computer Engineering
McGill University
Montreal, Quebec H3A 0E9
michael.rabbat@mcgill.ca

arXiv:1207.3031v1 [cs.DC]

Abstract—A lot of effort has been invested into characterizing the convergence rates of gradient-based algorithms for non-linear convex optimization. Recently, motivated by large datasets and problems in machine learning, the interest has shifted towards distributed optimization. In this work we present a distributed algorithm for strongly convex constrained optimization. Each node in a network of n computers converges to the optimum of a strongly convex, L-Lipschitz continuous, separable objective at a rate O(log(√n T) / T), where T is the number of iterations. This rate is achieved in the online setting, where the data is revealed one point at a time to the nodes, and in the batch setting, where each node has access to its full local dataset from the start. The same convergence rate is achieved in expectation when the subgradients used at each node are corrupted with additive zero-mean noise.

I. INTRODUCTION

In this work we focus on solving optimization problems of the form

    minimize_{w ∈ W}  F(w) = (1/T) Σ_{t=1}^{T} f^t(w),

where each function f^1(w), f^2(w), ... is convex over a convex set W ⊆ R^d. This formulation applies widely in machine learning scenarios, where f^t(w) measures the loss of model w with respect to data point t, and F(w) is the average loss over T data points. In particular, we are interested in the behavior of online distributed optimization algorithms for this sort of problem as the number of data points T tends to infinity. We describe a distributed algorithm which, for strongly convex functions f^t, converges at a rate O(log T / T). To the best of our knowledge, this is the first distributed algorithm to achieve this convergence rate for constrained optimization without relying on smoothness assumptions on the objective or on non-trivial communication mechanisms between the nodes. The result is true both in the online and the batch optimization setting.

When faced with a non-linear convex optimization problem, gradient-based methods can be applied to find the solution. The behavior of these algorithms is well understood in the single-processor, centralized setting. Under the assumption that the objective is L-Lipschitz continuous, projected gradient descent-type algorithms converge at a rate O(1/√T) [1], [2]. This rate is achieved both in an online setting, where the f^t are revealed to the algorithm sequentially, and in the batch setting, where all f^t are known in advance. If the cost functions are also strongly convex, then gradient algorithms can achieve linear rates, O(1/T), in the batch setting [3] and nearly-linear rates, O(log T / T), in the online setting [4]. Under additional smoothness assumptions, such as Lipschitz continuous gradients, the same rate of convergence can also be achieved by second-order methods in the online setting [5], [6], while accelerated methods can achieve a quadratic rate, O(1/T²), in the batch setting; see [7] and references therein.

The aim of this work is to extend the aforementioned results to the distributed setting, where a network of n processors jointly optimizes a similar objective. Assuming the network is arranged as an expander graph with constant spectral gap, for general convex cost functions that are only L-Lipschitz continuous, the rate at which existing algorithms on a network of n processors will all reach the optimum value is O(log T / √T), i.e., similar to the optimal single-processor algorithms up to a logarithmic factor [8], [9]. This is true both in a batch setting and in an online setting, even when the gradients are corrupted by noise. The technique proposed in [10] makes use of mini-batches to obtain asymptotically optimal rates for online optimization of smooth cost functions that have Lipschitz continuous gradients corrupted by bounded-variance noise, and correspondingly faster rates for smooth strongly convex functions. However, this technique requires that each node exchange messages with every other node at the end of each iteration. Finally, if the objective function is strongly convex and three times differentiable, a distributed version of Nesterov's accelerated method [11] achieves a fast rate for unconstrained problems in the batch setting, but the dependence on n is not characterized.

The algorithm presented in this paper achieves a rate O(log(√n T) / T) for strongly convex functions. Our formulation allows for convex constraints in the problem and assumes the objective function is Lipschitz continuous and strongly convex; no higher-order smoothness assumptions are made. Our algorithm works in both the online and the batch setting, and it scales nearly linearly in the number of iterations for network topologies with fast information diffusion. In addition, at each iteration nodes are only required to exchange messages with a subset of the other nodes in the network (their neighbors).

The rest of the paper is organized as follows. Section II introduces notation and formalizes the problem. Section III describes the proposed algorithm and states our main results. These results are proven in Section IV, and Section V extends the analysis to the case where the gradients are noisy. Section VI presents the results of numerical experiments illustrating the performance of the algorithm, and the paper concludes in Section VII.

II. ONLINE CONVEX OPTIMIZATION

Consider the problem of minimizing a convex function F(w) over a convex set W ⊆ R^d. Of particular interest is the setting where the algorithm sequentially receives noisy samples of the subgradients of F(w). This setting arises in online loss minimization for machine learning when the data arrives as a stream and the subgradient is evaluated using an individual data point at each step [1]. Suppose the t-th data point x(t) ∈ X ⊆ R^d is drawn i.i.d. from an unknown distribution D, and let f^t(w) = f(w, x(t)) denote the loss of this data point with respect to a particular model w. In this setting one would like to find the model w that minimizes the expected loss E_D[f(w, x)], possibly with the constraint that w be restricted to a model space W. Clearly, as T → ∞, the objective F(w) = (1/T) Σ_{t=1}^{T} f^t(w) → E_D[f(w, x)], and so if the data stream is finite this motivates minimizing the empirical loss F(w).

An online convex optimization algorithm observes a data stream x(1), x(2), ..., and sequentially chooses a sequence of models w(1), w(2), ..., after each observation. Upon choosing w(t), the algorithm receives a subgradient g(t) ∈ ∂f^t(w(t)). The goal is for the sequence w(1), w(2), ... to converge to a minimizer w* of F(w). The performance of an online optimization algorithm is measured in terms of the regret:

    R_T = Σ_{t=1}^{T} f^t(w(t)) − min_{w ∈ W} Σ_{t=1}^{T} f^t(w).

The regret measures the gap between the cost accumulated by the online optimization algorithm over T steps and that of a single model chosen to simultaneously minimize the total cost over all T cost terms. If the costs f^t are allowed to be arbitrary convex functions, then it can be shown that the best achievable rate for any online optimization algorithm is R_T = Ω(√T), and this bound is also achievable [1]. The rate can be significantly improved if the cost functions have more favourable properties.

A. Assumptions

Assumption 1: We assume for the rest of the paper that each cost function f^t(w) = f(w, x(t)) is σ-strongly convex for all x(t) ∈ X; i.e., there is a σ > 0 such that for all θ ∈ [0, 1] and all u, w ∈ W,

    f^t(θu + (1 − θ)w) ≤ θ f^t(u) + (1 − θ) f^t(w) − (σ/2) θ(1 − θ) ‖u − w‖².

If each f^t(w) is σ-strongly convex, it follows that F(w) is also σ-strongly convex. Moreover, if F(w) is strongly convex then it is also strictly convex, and so F(w) has a unique minimizer, which we denote by w*.

Assumption 2: We also assume that the subgradients g(t) of each cost function f^t are bounded by a known constant L > 0; i.e., ‖g(t)‖ ≤ L, where ‖·‖ is the ℓ2 (Euclidean) norm.

B. Example: Training a Classifier

For a specific example of this setup, consider the problem of training an SVM classifier using a hinge loss with ℓ2 regularization [4]. In this case, the data stream consists of pairs {x(t), y(t)} such that x(t) ∈ X and y(t) ∈ {−1, +1}. The goal is to minimize the misclassification error as measured by the ℓ2-regularized hinge loss. Formally, we wish to find the w ∈ W ⊆ R^d that solves

    minimize_{w ∈ W}  (σ/2) ‖w‖² + (1/m) Σ_{t=1}^{m} max{ 0, 1 − y(t) ⟨w, x(t)⟩ },

which is σ-strongly convex. (Although the hinge loss itself is not strongly convex, adding a strongly convex regularizer makes the overall cost function strongly convex.)
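To make the example concrete, the following minimal Python sketch evaluates the ℓ2-regularized hinge loss above and one of its subgradients for a single data point; the function names and the NumPy implementation are illustrative assumptions and not taken from the paper.

```python
import numpy as np

def regularized_hinge_loss(w, x, y, sigma):
    """l2-regularized hinge loss for one example (x, y) with y in {-1, +1}."""
    return 0.5 * sigma * np.dot(w, w) + max(0.0, 1.0 - y * np.dot(w, x))

def regularized_hinge_subgradient(w, x, y, sigma):
    """One subgradient of the loss above at w."""
    g = sigma * w
    if 1.0 - y * np.dot(w, x) > 0.0:  # margin violated, so the hinge term contributes
        g = g - y * x
    return g
```

The (σ/2)‖w‖² term is what makes each per-example cost strongly convex, matching Assumption 1, while boundedness of the data and of W keeps the subgradient norm below a constant L as in Assumption 2.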

For these types of problems, using a single-processor stochastic gradient descent algorithm, one can achieve R_T = O(log T) [4] or R_T / T = O(1/T) [12] by using different update schemes.

C. Distributed Online Convex Optimization

In this paper, we are interested in solving online convex optimization problems with a network of computers. The computers are organized in a network G = (V, E) with |V| = n nodes, and messages are only exchanged between nodes connected by an edge in E.

Assumption 3: In this work we assume that G is connected and undirected.

Each node i receives a stream of data x_i(1), x_i(2), ..., similar to the serial case, and the nodes must collaborate to minimize the network-wide objective

    F(w) = (1/(nT)) Σ_{t=1}^{T} Σ_{i=1}^{n} f_i^t(w),

where f_i^t(w) = f(w, x_i(t)) is the cost incurred at processor i at time t. In the distributed setting, the definition of regret is naturally extended to

    R_T = Σ_{t=1}^{T} Σ_{i=1}^{n} f(w_i(t), x_i(t)) − min_{w ∈ W} Σ_{t=1}^{T} Σ_{i=1}^{n} f(w, x_i(t)).

For general convex cost functions, the distributed algorithm proposed in [8] has been proven to have an average regret that decreases at a rate O(1/√T), similar to the serial case, and this result holds even when the algorithm receives noisy, unbiased observations of the true subgradients at each step. In the next section, we present a distributed algorithm that achieves a nearly-linear rate of decrease of the average regret (up to a logarithmic factor) when the cost functions are strongly convex.

III. ALGORITHM

Nodes must collaborate to solve the distributed online convex optimization problem described in the previous section. To that end, the network is endowed with a consensus matrix P which respects the structure of G, in the sense that [P]_{ji} = 0 if (i, j) ∉ E. We assume that P is doubly stochastic, although generalizations to the case where P is row stochastic or column stochastic (but not both) are also possible [13], [14].
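The paper simply assumes such a doubly stochastic P is available. One common way to construct one for a given undirected graph is with Metropolis–Hastings weights, sketched below purely as an illustration; this particular construction is an assumption of the sketch, not a prescription of the paper.

```python
import numpy as np

def metropolis_hastings_weights(adjacency):
    """Return a symmetric (hence doubly stochastic) consensus matrix P with
    [P]_{ij} = 0 whenever (i, j) is not an edge of the undirected graph."""
    n = adjacency.shape[0]
    deg = adjacency.sum(axis=1)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i, j]:
                P[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        P[i, i] = 1.0 - P[i, :].sum()  # put the leftover mass on the self-loop
    return P
```

Because the weights are symmetric and each row sums to one, P is doubly stochastic, and its second-largest eigenvalue λ2 controls how quickly the network averages information, which is exactly the quantity that appears in the analysis below.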

A detailed description of the proposed algorithm, distributed online gradient descent (DOGD), is given in Algorithm 1. In the algorithm, each node performs a total of T updates. One update involves processing a single data point x_i(t) at each processor. The T updates are performed over K rounds, and n_k updates are performed in round k.

Algorithm 1 DOGD
 1: Initialize: k = 1, a_1 = 2/σ, n_1 = 1, z_i(1) = w_i(1) = 0
 2: while Σ_{s=1}^{k} n_s ≤ T do   (each node i repeats)
 3:   for t = 1 to n_k do
 4:     Send/receive z_i(t) and z_j(t) to/from neighbors j
 5:     Obtain the next subgradient g_i(t) ∈ ∂_w f_i^t(w_i(t))
 6:     z_i(t+1) = Σ_{j=1}^{n} p_{ij} z_j(t) − a_k g_i(t)
 7:     w_i(t+1) = Π_W[z_i(t+1)]
 8:   end for
 9:   w_i(1) ← w_i(n_k + 1)
10:   z_i(1) ← w_i(n_k + 1)
11:   ŵ_i(k) = (1/n_k) Σ_{t=1}^{n_k} w_i(t)
12:   a_{k+1} = a_k / 2
13:   n_{k+1} = 2 n_k
14:   k ← k + 1
15: end while

The main steps within each round (lines 3–8) involve updating an accumulated-gradient variable, z_i(t), by simultaneously incorporating the information received from neighboring nodes and taking a local gradient-descent-like step. The accumulated gradient is projected onto the constraint set to obtain w_i(t), where

    Π_W[z] = argmin_{w ∈ W} ‖w − z‖

denotes the Euclidean projection of z onto W, and this projected value is then merged into a running average ŵ_i(k). The step size parameter a_k remains constant within each round, and the step size is reduced by half at the end of each round. The number of updates per round doubles from one round to the next. Note that the algorithm proposed here differs from the distributed dual averaging algorithm described in [8], where a proximal projection is used rather than the Euclidean projection. Also, in contrast to the distributed subgradient algorithms described in [15], DOGD maintains an accumulated-gradient variable z_i(t) which is updated using {z_j(t)}, as opposed to the primal feasible variables {w_j(t)}. Finally, key to achieving fast convergence is the exponential decrease of the learning rate after performing an exponentially increasing number of gradient steps, together with a proper initialization of the learning rate. The next section provides theoretical guarantees on the performance of DOGD.
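The following is a minimal, synchronous simulation of Algorithm 1 in NumPy, intended only to illustrate the flow of the updates. The constraint set (an ℓ2 ball), the subgradient oracle interface, and the initial step size are assumptions of this sketch rather than details fixed by the paper.

```python
import numpy as np

def project_l2_ball(z, radius=10.0):
    """Euclidean projection onto W = {w : ||w|| <= radius} (example constraint set)."""
    nrm = np.linalg.norm(z)
    return z if nrm <= radius else (radius / nrm) * z

def dogd(P, subgrad, T, d, sigma):
    """Run a synchronous simulation of DOGD on n nodes for at most T updates per node.
    P: n-by-n doubly stochastic consensus matrix.
    subgrad(i, w): returns a subgradient of node i's current cost at w.
    Returns the per-node running averages from the last completed round."""
    n = P.shape[0]
    z = np.zeros((n, d))
    w = np.zeros((n, d))
    a, n_k, steps = 2.0 / sigma, 1, 0            # initial learning rate and round length
    w_hat = np.zeros((n, d))
    while steps + n_k <= T:
        w_sum = np.zeros((n, d))
        for _ in range(n_k):
            g = np.vstack([subgrad(i, w[i]) for i in range(n)])
            z = P @ z - a * g                     # mix neighbors' z and take a gradient step
            w = np.vstack([project_l2_ball(zi) for zi in z])
            w_sum += w
        w_hat = w_sum / n_k                       # running average for this round
        z = w.copy()                              # start the next round from the feasible point
        steps += n_k
        a, n_k = a / 2.0, 2 * n_k                 # halve the step size, double the round length
    return w_hat
```

In a real deployment each node would hold only its own rows of these arrays, and the product P @ z would be realized by exchanging z_i(t) with neighbors, as in lines 4–6 of Algorithm 1.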

IV. CONVERGENCE ANALYSIS

Our main convergence result, stated below, guarantees that the average regret decreases at a rate which is nearly linear.

Theorem 1: Let Assumptions 1–3 hold and suppose that the consensus matrix P is doubly stochastic with constant second-largest eigenvalue λ2. Let w* be the minimizer of F(w). Then the sequence {ŵ_i(K)} produced by the nodes running DOGD to minimize F(w) obeys

    F(ŵ_i(K)) − F(w*) = O( log(√n T) / T ),

where K = log2(T/n_1 + 1) is the number of rounds executed during a total of T gradient steps per node, and ŵ_i(K) is the running average maintained locally at each node.

Remark 1: We state the result for the case where λ2 is constant. This is the case when G is, e.g., a complete graph or an expander graph [16]. For other graph topologies, where λ2 shrinks with n and consensus does not converge as fast, the dependence of the convergence rate on n is going to be worse, due to a factor 1 − λ2 in the denominator; see the proof of Theorem 1 below for the precise dependence on the spectral gap 1 − λ2.

Remark 2: The theorem characterizes the performance of the online algorithm DOGD, where the data and cost functions f_i^t are processed sequentially at each node in order to minimize an objective of the form

    F(w) = (1/(nT)) Σ_{t=1}^{T} Σ_{i=1}^{n} f_i^t(w).

However, as pointed out in [14], if the entire dataset is available in advance, we can use the same scheme to do batch minimization by effectively setting f_i^t(w) = f_i(w), where f_i(w) is the objective function accounting for the entire dataset available to node i. Thus, the same result holds immediately for a batch version of DOGD.

The remainder of this section is devoted to the proof of Theorem 1. Our analysis follows arguments that can be found in [1], [8], [12] and references therein. We first state and prove some intermediate results.

A. Properties of Strongly Convex Functions

Recall the definition of σ-strong convexity given in Assumption 1. A direct consequence of this definition is that if F(w) is σ-strongly convex, then

    F(w) − F(w*) ≥ (σ/2) ‖w − w*‖².

Strong convexity can be combined with the assumptions above to upper bound the difference F(w) − F(w*) for an arbitrary point w ∈ W.

Lemma 1: Let w* be the minimizer of F(w). For all w ∈ W, we have F(w) − F(w*) ≤ 2L²/σ.

Proof: For any subgradient g of F at w, by convexity we know that F(w) − F(w*) ≤ ⟨g, w − w*⟩. It follows from Assumption 2 that F(w) − F(w*) ≤ L ‖w − w*‖. Furthermore, from Assumption 1 we obtain that (σ/2) ‖w − w*‖² ≤ L ‖w − w*‖, or ‖w − w*‖ ≤ 2L/σ. As a result, F(w) − F(w*) ≤ 2L²/σ.

B. The Lazy Projection Algorithm

The analysis of DOGD below involves showing that the average state, (1/n) Σ_{i=1}^{n} w_i(t), evolves according to the so-called single-processor lazy projection algorithm [1], which we discuss next. The lazy projection algorithm is an online convex optimization scheme for the serial problem discussed at the beginning of Section II. A single processor sequentially chooses a new variable w(t) and receives a subgradient g(t) of f(w(t), x(t)). The algorithm chooses w(t+1) by repeating the steps

    z(t+1) = z(t) − a g(t),
    w(t+1) = Π_W[z(t+1)].

By unwrapping the recursive form of this update, we get

    z(t+1) = −a Σ_{s=1}^{t} g(s) + z(1).

The following is a typical result for subgradient descent-style algorithms, and it is useful towards eventually characterizing how the regret accumulates. Its proof can be found in the appendix of the extended version of [1].

Theorem 2 (Zinkevich [1]): Let w* ∈ W, let a > 0, and set z(1) = w(1). After T rounds of the serial lazy projection algorithm, we have

    Σ_{t=1}^{T} ⟨g(t), w(t) − w*⟩ ≤ ‖w(1) − w*‖² / (2a) + (a L² T)/2.

Theorem 2 immediately yields the same bound for the regret of lazy projection [1].
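A direct transcription of these two steps, with the subgradient oracle and the projection left as arguments, might look as follows; this is a sketch for illustration, not code from the paper.

```python
import numpy as np

def lazy_projection(subgrad, project, T, a, d):
    """Serial lazy projection: keep an unprojected accumulated-gradient state z,
    and play w = projection of z at every step."""
    z = np.zeros(d)
    w = project(z)
    iterates = [w]
    for t in range(T):
        g = subgrad(t, w)      # subgradient of the t-th cost at the current model
        z = z - a * g          # gradient step on the unprojected state
        w = project(z)         # model used at the next step
        iterates.append(w)
    return iterates
```

The point of the next subsection is that the network-wide averages z̄(t) and w̄(t) generated by DOGD follow exactly this recursion with learning rate a_k and gradient ḡ(t) = (1/n) Σ_i g_i(t), so Theorem 2 can be applied to them.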

C. Evolution of Network-Average Quantities in DOGD

We turn our attention to Algorithm 1. A standard approach to studying the convergence of distributed optimization algorithms such as DOGD is to keep track of the discrepancy between every node's state and an average state sequence, defined as

    z̄(t) = (1/n) Σ_{i=1}^{n} z_i(t)   and   w̄(t) = Π_W[z̄(t)].

Observe that z̄(t) evolves in a simple recursive manner:

    z̄(t+1) = (1/n) Σ_{i=1}^{n} z_i(t+1)
            = (1/n) Σ_{i=1}^{n} [ Σ_{j=1}^{n} p_{ij} z_j(t) − a_k g_i(t) ]
            = (1/n) Σ_{j=1}^{n} z_j(t) Σ_{i=1}^{n} p_{ij} − (a_k/n) Σ_{i=1}^{n} g_i(t)
            = z̄(t) − (a_k/n) Σ_{i=1}^{n} g_i(t)
            = −a_k Σ_{s=1}^{t} (1/n) Σ_{i=1}^{n} g_i(s) + (1/n) Σ_{i=1}^{n} z_i(1),

where the fourth equality holds since P is doubly stochastic (each column sums to one). Notice that the states {z̄(t), w̄(t)} therefore evolve according to the lazy projection algorithm with gradients ḡ(t) = (1/n) Σ_{i=1}^{n} g_i(t) and learning rate a_k. In the sequel, we will also use an analytic expression for z_i(t), derived by back-substituting in its recursive update equation. After some algebraic manipulation, we obtain

    z_i(t+1) = −a_k Σ_{s=1}^{t−1} Σ_{j=1}^{n} [P^{t−s}]_{ij} g_j(s) − a_k g_i(t) + Σ_{j=1}^{n} [P^{t}]_{ij} z_j(1),

and since the projection is non-expansive and z_i(1) = 0 for all i at the very first round, the state held by node i at the beginning of round k+1 satisfies

    ‖z_i(1)‖ = ‖w_i(1)‖ = ‖Π_W[z_i(n_k + 1)]‖ ≤ L Σ_{s=1}^{k} a_s n_s.

D. Analysis of One Round of DOGD

Next, we focus on bounding the amount of regret accumulated during the k-th round of DOGD (the inner loop, lines 3–8 of Algorithm 1), during which the learning rate remains fixed at a_k.

Using Assumptions 1 and 2 and the triangle inequality, we have that

    Σ_{t=1}^{n_k} [F(w_i(t)) − F(w*)]
      = Σ_{t=1}^{n_k} [F(w̄(t)) − F(w*)] + Σ_{t=1}^{n_k} [F(w_i(t)) − F(w̄(t))]
      ≤ Σ_{t=1}^{n_k} [F(w̄(t)) − F(w*)] + L Σ_{t=1}^{n_k} ‖w_i(t) − w̄(t)‖.

For the first summand we have

    Σ_{t=1}^{n_k} [F(w̄(t)) − F(w*)]
      = (1/n) Σ_{t=1}^{n_k} Σ_{i=1}^{n} [f_i^t(w_i(t)) − f_i^t(w*)] + (1/n) Σ_{t=1}^{n_k} Σ_{i=1}^{n} [f_i^t(w̄(t)) − f_i^t(w_i(t))]
      ≤ (1/n) Σ_{t=1}^{n_k} Σ_{i=1}^{n} ⟨g_i(t), w_i(t) − w*⟩   (call this term A1)
        + (L/n) Σ_{t=1}^{n_k} Σ_{i=1}^{n} ‖w̄(t) − w_i(t)‖.

For term A1,

    A1 = (1/n) Σ_{t=1}^{n_k} Σ_{i=1}^{n} ⟨g_i(t), w_i(t) − w*⟩
       = (1/n) Σ_{t=1}^{n_k} Σ_{i=1}^{n} ⟨g_i(t), w̄(t) − w*⟩ + (1/n) Σ_{t=1}^{n_k} Σ_{i=1}^{n} ⟨g_i(t), w_i(t) − w̄(t)⟩
       ≤ Σ_{t=1}^{n_k} ⟨ (1/n) Σ_{i=1}^{n} g_i(t), w̄(t) − w* ⟩   (call this term A2)
         + (L/n) Σ_{t=1}^{n_k} Σ_{i=1}^{n} ‖w_i(t) − w̄(t)‖.

To bound term A2 we invoke Theorem 2 for the average sequences {w̄(t)} and {z̄(t)}. Writing ḡ(t) = (1/n) Σ_{i=1}^{n} g_i(t) and noting that ‖ḡ(t)‖ ≤ L,

    A2 = Σ_{t=1}^{n_k} ⟨ ḡ(t), Π_W[z̄(t)] − w* ⟩ ≤ ‖w̄(1) − w*‖² / (2 a_k) + (a_k n_k L²)/2.

Collecting now all the partial results and bounds, so far we have shown that

    Σ_{t=1}^{n_k} [F(w_i(t)) − F(w*)]
      ≤ ‖w̄(1) − w*‖² / (2 a_k) + (a_k n_k L²)/2
        + (2L/n) Σ_{t=1}^{n_k} Σ_{i=1}^{n} ‖w_i(t) − w̄(t)‖ + L Σ_{t=1}^{n_k} ‖w_i(t) − w̄(t)‖,

and since the projection operator is non-expansive, ‖w_i(t) − w̄(t)‖ ≤ ‖z_i(t) − z̄(t)‖, so that

    Σ_{t=1}^{n_k} [F(w_i(t)) − F(w*)]
      ≤ ‖w̄(1) − w*‖² / (2 a_k) + (a_k n_k L²)/2
        + (2L/n) Σ_{t=1}^{n_k} Σ_{i=1}^{n} ‖z_i(t) − z̄(t)‖ + L Σ_{t=1}^{n_k} ‖z_i(t) − z̄(t)‖.

The first two terms are standard for subgradient algorithms using a constant step size. The last two terms depend on the error between each node's iterate z_i(t) and the network-wide average z̄(t), which we bound next.

E. Bounding the Network Error

What remains is to bound the term ‖z_i(t) − z̄(t)‖, which describes an error induced by the network, since the different nodes do not agree on the direction towards the optimum. By recalling that P is doubly stochastic and manipulating the recursive expressions for z_i(t) and z̄(t), using arguments similar to those in [8], [14], we obtain the bound

    ‖z_i(t) − z̄(t)‖ ≤ a_k L Σ_{s=1}^{t−1} ‖ [P^{t−s}]_{i,:} − (1/n) 1ᵀ ‖₁ + 2 a_k L + ‖ [P^{t−1}]_{i,:} − (1/n) 1ᵀ ‖₁ · max_j ‖z_j(1)‖.

The ℓ1 norms can be bounded using Lemma 2, which is stated and proven in the Appendix, and using the bound on ‖z_j(1)‖ from the previous subsection we arrive at

    ‖z_i(t) − z̄(t)‖ ≤ a_k L ( 3 + 2 log(T√n)/(1 − λ2) ) + √n λ2^{t−1} L Σ_{s=1}^{k−1} a_s n_s,

where λ2 is the second largest eigenvalue of P. Using this bound in the regret bound of the previous subsection, along with the fact that F(w) is convex, we conclude that

    F(ŵ_i(k)) − F(w*) = F( (1/n_k) Σ_{t=1}^{n_k} w_i(t) ) − F(w*)
      ≤ (1/n_k) Σ_{t=1}^{n_k} [F(w_i(t)) − F(w*)]
      ≤ ‖w̄(1) − w*‖² / (2 a_k n_k) + a_k L² ( 9 + 6 log(T√n)/(1 − λ2) ) + (3 L² √n / ((1 − λ2) n_k)) Σ_{s=1}^{k−1} a_s n_s,

where w̄(1) = Π_W[ (1/n) Σ_{i=1}^{n} z_i(1) ].

F. Analysis of DOGD over Multiple Rounds

As our last intermediate step, we must control the learning rate and the update of n_k from round to round to ensure fast convergence of the error. From the strong convexity of F we have

    ‖w̄(1) − w*‖² ≤ (2/σ) [ F(w̄(1)) − F(w*) ],

and thus

    F(ŵ_i(k)) − F(w*) ≤ [ F(w̄(1)) − F(w*) ] / (σ a_k n_k) + a_k L² ( 9 + 6 log(T√n)/(1 − λ2) ) + (3 L² √n / ((1 − λ2) n_k)) Σ_{s=1}^{k−1} a_s n_s.

Now, applying the regret bound of Theorem 2 to the average sequence {w̄(t)}, viewed as a single-processor lazy projection algorithm [1], after executing the n_k gradient steps of round k we have

    F(w̄_{k+1}(1)) − F(w*) ≤ ‖w̄_k(1) − w*‖² / (2 a_k n_k) + (a_k L²)/2,

and by repeatedly using strong convexity and Theorem 2 we see that

    F(w̄_{k+1}(1)) − F(w*) ≤ [ F(w̄_k(1)) − F(w*) ] / (σ a_k n_k) + (a_k L²)/2
      ≤ [ F(w̄_1(1)) − F(w*) ] Π_{j=1}^{k} 1/(σ a_j n_j) + (L²/2) Σ_{j=1}^{k} a_j Π_{s=j+1}^{k} 1/(σ a_s n_s).

Now, let us fix positive integers b and c, and suppose we use the following rules to determine the step size and the number of updates performed within each round:

    a_k = a_1 / b^{k−1},    n_k = c^{k−1} n_1.

Combining the per-round bound of the previous subsection with the recursion above, and invoking Lemma 1 to bound F(w̄_1(1)) − F(w*) ≤ 2L²/σ, every term in the resulting upper bound on F(ŵ_i(k)) − F(w*) is proportional either to (σ a_1 n_1)^{−(k−1)} or to a_1 L² (b/c)^{k−1}, up to the factor 9 + 6 log(T√n)/(1 − λ2). To ensure convergence to zero, we need c ≥ b and σ a_1 n_1 > 1, i.e., a_1 > 1/(σ n_1). Given these restrictions, let us make the choices

    a_1 = 2/σ,    n_1 = 1,    c = b = 2.

To simplify the exposition, let us assume that the number of rounds K defined below is an integer. Using the selected values, we have σ a_k n_k = 2 for every round k, so each application of the recursion above halves the accumulated error, while a_k = (2/σ) 2^{1−k}. Substituting into the bounds above and telescoping over the rounds, we obtain

    F(ŵ_i(k)) − F(w*) ≤ 2^{−(k−1)} (L²/σ) [ 2 + 6 ( 1 + log(T√n)/(1 − λ2) ) ].

Finally, we have all we need to complete the analysis of Algorithm 1.

G. Proof of Theorem 1

Suppose we run Algorithm 1 for T total steps at each node. This allows for K rounds, where K is determined by solving

    T = Σ_{k=1}^{K} n_k = Σ_{k=1}^{K} 2^{k−1} n_1 = (2^{K} − 1) n_1   ⟹   K = log2( T/n_1 + 1 ).

Using this value for K, we see that

    F(ŵ_i(K)) − F(w*) ≤ ( 2 n_1 / (T + n_1) ) (L²/σ) [ 2 + 6 ( 1 + log(T√n)/(1 − λ2) ) ]
      = O( log(√n T) / ((1 − λ2) T) )
      = O( log(√n T) / T )

when λ2 is constant and does not scale with n, and this concludes the proof of Theorem 1.

V. EXTENSION TO STOCHASTIC OPTIMIZATION

The proof presented in the previous section can easily be extended to the case where each node receives a random estimate ĝ_i(t) of the gradient, satisfying E[ĝ_i(t)] = g_i(t), instead of receiving g_i(t) directly. We assume that the noisy gradients still have bounded second moments, i.e., E[‖ĝ_i(t)‖²] ≤ L². In this setting, instead of the bound on term A2 derived above, we have

    A2 = Σ_{t=1}^{n_k} ⟨ (1/n) Σ_{i=1}^{n} g_i(t), w̄(t) − w* ⟩
       = Σ_{t=1}^{n_k} ⟨ (1/n) Σ_{i=1}^{n} ĝ_i(t), w̄(t) − w* ⟩ + Σ_{t=1}^{n_k} ⟨ (1/n) Σ_{i=1}^{n} [ g_i(t) − ĝ_i(t) ], w̄(t) − w* ⟩.

However, the proof of Theorem 2 does not depend on the gradients being correct; rather, it holds for the noisy gradients ĝ_i(t) as well. Moreover, we have E[‖ĝ_i(t)‖²] ≤ L², and by Hölder's inequality E[‖ĝ_i(t)‖ ‖ĝ_j(t)‖] ≤ L². Thus,

    E[ ‖ (1/n) Σ_{i=1}^{n} ĝ_i(t) ‖² ] ≤ (1/n²) Σ_{i,j=1}^{n} E[ ‖ĝ_i(t)‖ ‖ĝ_j(t)‖ ] ≤ L².

Thus, invoking Theorem 2, if the new data (and thus the subgradients) are independent of the past, and since E[ĝ_i(t)] = g_i(t), we have

    E[A2] ≤ ‖w̄(1) − w*‖² / (2 a_k) + (a_k n_k L²)/2 + Σ_{t=1}^{n_k} ⟨ (1/n) Σ_{i=1}^{n} E[ g_i(t) − ĝ_i(t) ], w̄(t) − w* ⟩
          = ‖w̄(1) − w*‖² / (2 a_k) + (a_k n_k L²)/2.

Furthermore, the network error bound holds in expectation as well, i.e.,

    E[ ‖w̄(t) − w_i(t)‖ ] ≤ E[ ‖z̄(t) − z_i(t)‖ ] ≤ a_k L ( 3 + 2 log(T√n)/(1 − λ2) ) + √n λ2^{t−1} L Σ_{s=1}^{k−1} a_s n_s.

Collecting all these observations, we have shown that, in expectation,

    E[ F(ŵ_i(k)) − F(w*) ] ≤ ‖w̄(1) − w*‖² / (2 a_k n_k) + a_k L² ( 9 + 6 log(T√n)/(1 − λ2) ) + (3 L² √n / ((1 − λ2) n_k)) Σ_{s=1}^{k−1} a_s n_s,

which, after using the update rules for a_k and n_k, yields exactly the same rate as before. We note, however, that there may still be room for improvement in the distributed stochastic optimization setting, since [12] describes a single-processor algorithm that converges at a rate O(1/T).
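As an illustration of the setting analyzed here, the sketch below wraps an exact subgradient oracle with additive zero-mean noise so that the same DOGD loop can be run unchanged; the Gaussian noise model and its scale are assumptions made for the example, not choices made in the paper.

```python
import numpy as np

def noisy_oracle(subgrad, noise_std, rng=None):
    """Turn an exact subgradient oracle subgrad(i, w) into an unbiased noisy one."""
    rng = rng if rng is not None else np.random.default_rng(0)
    def stochastic_subgrad(i, w):
        g = subgrad(i, w)
        return g + noise_std * rng.standard_normal(g.shape)  # added noise has zero mean
    return stochastic_subgrad
```

Any zero-mean perturbation with suitably bounded second moment fits the assumption E[‖ĝ_i(t)‖²] ≤ L² used in the analysis above.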

VI. SIMULATION

To illustrate the performance of DOGD, we simulate the online training of a classifier by solving the SVM problem of Section II-B using a network of 10 nodes arranged as a random geometric graph. Each node is given T = 600 data points, and the input dimension is d = 100. We set σ = 0.1, generate the data from a standard normal distribution, and classify each point as +1 or −1 depending on its position relative to a randomly drawn hyperplane in R^d. As we see in Figure 1, DOGD minimizes the objective much faster than Distributed Dual Averaging (DDA) [8], which has a convergence rate of O(log T / √T). DDA is simulated using the learning rate that is suggested in [8]. We have observed that boosting this learning rate may yield faster convergence, but still not as fast as DOGD. Figure 1 also shows the performance of a version of Fast Distributed Gradient Descent (FDGD) [11]. As we can see, FDGD fails to converge in an online or stochastic setting and ends up oscillating.
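A rough sketch of how such an experiment can be set up is shown below; the graph construction, the data generation, and the reuse of the dogd and metropolis_hastings_weights sketches from earlier sections are illustrative assumptions, not the authors' actual simulation code.

```python
import numpy as np

def random_geometric_graph(n, radius, rng):
    """Adjacency matrix of a random geometric graph on the unit square.
    Choose radius large enough that the resulting graph is connected."""
    pts = rng.random((n, 2))
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return ((dist < radius) & (dist > 0)).astype(float)

def make_svm_stream(n, T, d, rng):
    """Per-node data streams labeled by a random hyperplane, as in the experiment."""
    true_w = rng.standard_normal(d)
    X = rng.standard_normal((n, T, d))
    Y = np.sign(X @ true_w)
    return X, Y
```

With a doubly stochastic P built from the adjacency matrix and a subgradient oracle based on the regularized hinge loss of Section II-B, the dogd sketch above can then be run and its objective value tracked over the iterations for comparison against other methods.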

[Figure 1: Optimization of a d = 100 dimensional problem of the form of Section II-B with a random network of 10 nodes. The vertical axis shows the network-average objective (1/n) Σ_i F(w_i(t)) and the horizontal axis the number of iterations. Our proposed algorithm DOGD (red) converges faster than DDA (green), as expected from the T instead of √T in the denominator of the convergence rate bound. FDGD (black) is unable to converge in the online problem.]

VII. FUTURE WORK

In this paper we have proposed and analyzed a novel distributed optimization algorithm which we call Distributed Online Gradient Descent (DOGD). Our analysis shows that DOGD converges at a rate O(log T / T) when solving online, stochastic, or batch constrained convex optimization problems if the objective function is strongly convex. This rate is optimal in the number of iterations for the online and batch settings, and slower than a serial algorithm only by a logarithmic factor in the stochastic optimization setting. In its current form, DOGD requires the nodes in the network to exchange gradient information at every iteration. Our preliminary investigation suggests that gradually performing more and more updates between communications can speed up distributed optimization algorithms in the batch setting when one explicitly accounts for the time required to communicate data. Our future work will carry out a similar analysis for online and stochastic optimization algorithms.

APPENDIX

Lemma 2: If P is a doubly stochastic matrix defined over a strongly connected graph G = (V, E) with |V| = n nodes, so that p_{ji} = 0 if (i, j) ∉ E, then for any t ≤ T,

    Σ_{s=1}^{t} ‖ [P^{t−s}]_{i,:} − (1/n) 1ᵀ ‖₁ ≤ 1 + 2 log(T√n)/(1 − λ2),

where λ2 is the second largest eigenvalue of P.

Proof: If the consensus matrix P is doubly stochastic, it is straightforward to show that P^t → (1/n) 1 1ᵀ as t → ∞. Moreover, from standard Perron–Frobenius theory it is easy to show (see, e.g., [17]) that

    ‖ [P^t]_{i,:} − (1/n) 1ᵀ ‖₁ ≤ √n λ2^t,

so in our case ‖ [P^{t−s}]_{i,:} − (1/n) 1ᵀ ‖₁ ≤ √n λ2^{t−s}. Next, demand that the right-hand side of this bound be less than δ, with δ to be determined:

    √n λ2^{t−s} ≤ δ   ⟺   t − s ≥ log(√n/δ) / log(1/λ2).

So with the choice δ = 1/T, we have ‖ [P^{t−s}]_{i,:} − (1/n) 1ᵀ ‖₁ ≤ 1/T whenever t − s ≥ log(T√n)/log(1/λ2) =: t̂. When s is large and t − s < t̂, we use the trivial bound ‖ [P^{t−s}]_{i,:} − (1/n) 1ᵀ ‖₁ ≤ 2. The desired bound is now obtained as follows:

    Σ_{s=1}^{t} ‖ [P^{t−s}]_{i,:} − (1/n) 1ᵀ ‖₁
      = Σ_{s=1}^{t−t̂} ‖ [P^{t−s}]_{i,:} − (1/n) 1ᵀ ‖₁ + Σ_{s=t−t̂+1}^{t} ‖ [P^{t−s}]_{i,:} − (1/n) 1ᵀ ‖₁
      ≤ (t − t̂)/T + 2 t̂.

Since t ≤ T, we know that (t − t̂)/T < 1. Moreover, log(1/λ2) ≥ 1 − λ2. Using these two facts, we arrive at the result. The same bound is true for any individual entry of P^t approaching 1/n.
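The quantity controlled by Lemma 2 is easy to check numerically for a given consensus matrix; the helper below computes both sides of the bound for a concrete P, purely as an illustrative sanity check (it is not part of the paper).

```python
import numpy as np

def network_error_terms(P, T, i=0):
    """Compare the accumulated l1 deviation of row i of the powers of P from uniform
    against the 1 + 2*log(T*sqrt(n))/(1 - lambda2) bound of Lemma 2."""
    n = P.shape[0]
    lam2 = np.sort(np.abs(np.linalg.eigvals(P)))[-2]   # second largest eigenvalue modulus
    lhs = 0.0
    Pt = np.eye(n)
    for _ in range(T):                                  # powers P^0, P^1, ..., P^(T-1)
        lhs += np.abs(Pt[i] - 1.0 / n).sum()
        Pt = Pt @ P
    rhs = 1.0 + 2.0 * np.log(T * np.sqrt(n)) / (1.0 - lam2)
    return lhs, rhs
```

For well-connected graphs such as expanders, λ2 stays bounded away from 1 as n grows, which is exactly the constant-spectral-gap regime assumed in Theorem 1.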

REFERENCES

[1] M. A. Zinkevich, "Online convex programming and generalized infinitesimal gradient ascent," in 20th International Conference on Machine Learning (ICML), 2003.
[2] Y. Nesterov, "Primal-dual subgradient methods for convex problems," Mathematical Programming, Series B, vol. 120, pp. 221–259, 2009.
[3] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT), Y. Lechevallier and G. Saporta, Eds., Paris, France, August 2010, pp. 177–186.
[4] S. Shalev-Shwartz and Y. Singer, "Logarithmic regret algorithms for strongly convex repeated games," technical report, The Hebrew University, 2007.
[5] E. Hazan, A. Kalai, S. Kale, and A. Agarwal, "Logarithmic regret algorithms for online convex optimization," in 19th Conference on Learning Theory (COLT), 2006.
[6] P. L. Bartlett, E. Hazan, and A. Rakhlin, "Adaptive online gradient descent," in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. Roweis, Eds. MIT Press, 2008.
[7] P. Tseng, "On accelerated proximal gradient methods for convex-concave optimization," SIAM Journal on Optimization, submitted, 2008.
[8] J. Duchi, A. Agarwal, and M. Wainwright, "Dual averaging for distributed optimization: Convergence analysis and network scaling," IEEE Transactions on Automatic Control, vol. 57, no. 3, pp. 592–606, 2012.
[9] A. Nedic, A. Ozdaglar, and P. A. Parrilo, "Constrained consensus and optimization in multi-agent networks," IEEE Transactions on Automatic Control, vol. 55, no. 4, pp. 922–938, 2010.
[10] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao, "Optimal distributed online prediction using mini-batches," Journal of Machine Learning Research, vol. 13, pp. 165–202, 2012.
[11] D. Jakovetic, J. Xavier, and J. M. Moura, "Fast distributed gradient methods," arXiv:1112.2972, 2011.

[12] E. Hazan and S. Kale, "Beyond the regret minimization barrier: An optimal algorithm for stochastic strongly-convex optimization," in 24th Annual Conference on Learning Theory (COLT), 2011.
[13] K. I. Tsianos, S. Lawlor, and M. G. Rabbat, "Push-sum distributed dual averaging for convex optimization," in 51st IEEE Conference on Decision and Control, 2012.
[14] K. I. Tsianos and M. G. Rabbat, "Distributed dual averaging for convex optimization under communication delays," in American Control Conference (ACC), 2012.
[15] S. S. Ram, A. Nedic, and V. V. Veeravalli, "Distributed stochastic subgradient projection algorithms for convex optimization," Journal of Optimization Theory and Applications, vol. 147, no. 3, pp. 516–545, 2010.
[16] O. Reingold, S. Vadhan, and A. Wigderson, "Entropy waves, the zig-zag graph product, and new constant-degree expanders," Annals of Mathematics, vol. 155, no. 1, pp. 157–187, 2002.
[17] P. Diaconis and D. Stroock, "Geometric bounds for eigenvalues of Markov chains," The Annals of Applied Probability, vol. 1, no. 1, pp. 36–61, 1991.
