Primal Method for ERM with Flexible Mini-batching Schemes and Non-convex Losses


Dominik Csiba (School of Mathematics, The University of Edinburgh, United Kingdom; cdominik@gmail.com)
Peter Richtárik (School of Mathematics, The University of Edinburgh, United Kingdom; peter.richtarik@ed.ac.uk)

June 17, 2015

The authors acknowledge support from the EPSRC Grant EP/K02325X/1, "Accelerated Coordinate Descent Methods for Big Data Optimization".

Abstract

In this work we develop a new algorithm for regularized empirical risk minimization. Our method extends recent techniques of Shalev-Shwartz [02/2015], which enable a dual-free analysis of SDCA, to arbitrary mini-batching schemes. Moreover, our method is able to better utilize the information in the data defining the ERM problem. For convex loss functions, our complexity results match those of QUARTZ, which is a primal-dual method also allowing for arbitrary mini-batching schemes. The advantage of a dual-free analysis comes from the fact that it guarantees convergence even for non-convex loss functions, as long as the average loss is convex. We illustrate through experiments the utility of being able to design arbitrary mini-batching schemes.

1 Introduction

Empirical risk minimization (ERM) is a very successful and immensely popular paradigm in machine learning, used to train a variety of prediction and classification models. Given examples A_1, ..., A_n ∈ R^{d×m}, loss functions φ_1, ..., φ_n : R^m → R and a regularization parameter λ > 0, the L2-regularized ERM problem is an optimization problem of the form

    min_{w ∈ R^d}  [ P(w) := (1/n) Σ_{i=1}^n φ_i(A_i^⊤ w) + (λ/2) ‖w‖² ].    (1)

Throughout the paper we shall assume that for each i, the loss function φ_i is l_i-smooth with l_i > 0. That is, for all x, y ∈ R^m and all i ∈ [n] := {1, 2, ..., n}, we have

    ‖φ_i'(x) − φ_i'(y)‖ ≤ l_i ‖x − y‖.    (2)

Further, let L_1, ..., L_n > 0 be constants for which the inequality

    ‖φ_i'(A_i^⊤ w) − φ_i'(A_i^⊤ z)‖ ≤ L_i ‖w − z‖    (3)

holds for all w, z ∈ R^d and all i, and let L := max_i L_i. Note that we can always bound L_i ≤ l_i ‖A_i‖. However, L_i can be better (smaller) than l_i ‖A_i‖.
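To make the notation concrete, the following is a small Python/NumPy sketch (ours, not part of the paper) of the objective (1) and of one per-example gradient A_i φ_i'(A_i^⊤ w) in the special case m = 1, using the logistic loss that appears later in the experiments; the function names and the dense row-matrix representation of the examples are illustrative assumptions.

    import numpy as np

    def primal_objective(A, y, w, lam):
        # P(w) = (1/n) * sum_i phi_i(A_i^T w) + (lam/2) * ||w||^2, as in (1) with m = 1,
        # where phi_i(s) = log(1 + exp(-y_i * s)) is the (1/4)-smooth logistic loss.
        margins = A @ w                       # vector of A_i^T w; rows of A are the examples
        return np.mean(np.logaddexp(0.0, -y * margins)) + 0.5 * lam * (w @ w)

    def example_gradient(A, y, w, i):
        # A_i * phi_i'(A_i^T w): the per-example gradient whose computation is one "step"
        # of the methods discussed below; it costs O(nnz(A_i)) for a sparse example.
        s = A[i] @ w
        return A[i] * (-y[i] / (1.0 + np.exp(y[i] * s)))

Any other l_i-smooth loss can be substituted by replacing the two logistic expressions.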

1.1 Background

In the last few years, a lot of research effort was put into designing new efficient algorithms for solving this problem (and some of its modifications). The frenzy of activity was motivated by the realization that SGD [1], not so long ago considered the state-of-the-art method for ERM, was far from being optimal, and that new ideas can lead to algorithms which are far superior to SGD in both theory and practice. The methods that belong to this category include SAG [2], SDCA [3], SVRG [4], S2GD [5], mS2GD [6], SAGA [7], S2CD [8], QUARTZ [9], ASDCA [10], prox-SDCA [11], IPROX-SDCA [12], A-PROX-SDCA [13], AdaSDCA [14] and SDNA [15]. Methods analyzed for arbitrary mini-batching schemes include NSync [16], ALPHA [17] and QUARTZ [9].

In order to find an ε-solution in expectation, state-of-the-art (non-accelerated) methods for solving (1) only need O((n + κ) log(1/ε)) steps, where each step involves the computation of the gradient φ_i'(A_i^⊤ w) for some randomly selected example i. The quantity κ is the condition number. Typically one has κ = max_i l_i ‖A_i‖² / λ for methods picking i uniformly at random, and κ = (1/(nλ)) Σ_i l_i ‖A_i‖² for methods picking i using a carefully designed data-dependent importance sampling. Computation of such a gradient typically involves work which is equivalent to reading the example A_i, that is, O(nnz(A_i)) ≤ O(dm) arithmetic operations.

1.2 Contributions

In this work we develop a new algorithm for the L2-regularized ERM problem (1). Our method extends a technique recently introduced by Shalev-Shwartz [18], which enables a dual-free analysis of SDCA, to arbitrary mini-batching schemes. That is, our method works at each iteration with a random subset of examples, chosen in an i.i.d. fashion from an arbitrary distribution. Such flexible schemes are useful for various reasons, including (i) the development of distributed or robust variants of the method, (ii) the design of importance sampling for improving the complexity rate, (iii) the design of a sampling which is aimed at obtaining efficiencies elsewhere, such as utilizing NUMA (non-uniform memory access) architectures, and (iv) streamlining and speeding up the processing of each mini-batch by means of assigning to each processor an approximately even workload so as to reduce idle time (we do experiments with the latter setup).

In comparison with [18], our method is able to better utilize the information in the data examples A_1, ..., A_n, leading to a better data-dependent bound. For convex loss functions, our complexity results match those of QUARTZ [9] in terms of the rate (the logarithmic factors differ). QUARTZ is a primal-dual method also allowing for arbitrary mini-batching schemes. However, while [9] only characterize the decay of expected risk, we also give bounds for the sequence of iterates. In particular, we show that for convex loss functions, our method enjoys the rate (Theorem 2)

    max_i ( 1/p_i + l_i v_i / (λ p_i n) ) log( (L + λ) E^(0) / (λ ε) ),

where p_i is the probability that coordinate i is updated in an iteration, v_1, ..., v_n > 0 are certain stepsize parameters of the method associated with the sampling and data (see (6)), and E^(0) is a constant depending on the starting point. For instance, in the special case of picking a single example at a time uniformly at random, we have p_i = 1/n and v_i = ‖A_i‖², whereby we obtain one of the O((n + κ) log(1/ε)) rates mentioned above. The other rate can be recovered using importance sampling.
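For illustration, the two condition numbers mentioned above can be computed as follows in the serial (single-example) setting; this is our own sketch, assuming the rows of A are the examples and l[i] is the smoothness constant of φ_i.

    import numpy as np

    def condition_numbers(A, l, lam):
        # q_i = l_i * ||A_i||^2. Uniform sampling: kappa = max_i q_i / lam.
        # Importance sampling:                     kappa = (1/n) * sum_i q_i / lam.
        q = l * np.sum(A * A, axis=1)
        return q.max() / lam, q.mean() / lam

The gap between the two values indicates how much a data-dependent sampling can help on a given dataset.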

The advantage of a dual-free analysis comes from the fact that it guarantees convergence even for non-convex loss functions, as long as the average loss is convex. This is a step toward understanding non-convex models. In particular, we show that for non-convex loss functions, our method enjoys the rate (Theorem 1)

    max_i ( 1/p_i + L_i² v_i / (λ² p_i n) ) log( (L + λ) D^(0) / (λ ε) ),

where D^(0) is a constant depending on the starting point.

Finally, we illustrate through experiments with chunking (a simple load balancing technique) the utility of being able to design arbitrary mini-batching schemes.

2 Algorithm

We shall now describe the method (Algorithm 1).

Algorithm 1  dfSDCA: Dual-Free SDCA with Arbitrary Sampling
    Parameters: sampling Ŝ, stepsize θ
    Initialization: α_1^(0), ..., α_n^(0) ∈ R^m; set w^(0) = (1/(λn)) Σ_{i=1}^n A_i α_i^(0) and p_i = Prob(i ∈ Ŝ)
    for t ≥ 1 do
        Sample a set S_t according to Ŝ
        for i ∈ S_t do
            α_i^(t) = α_i^(t−1) − θ p_i^{−1} ( φ_i'(A_i^⊤ w^(t−1)) + α_i^(t−1) )
        w^(t) = w^(t−1) − Σ_{i ∈ S_t} θ (nλ p_i)^{−1} A_i ( φ_i'(A_i^⊤ w^(t−1)) + α_i^(t−1) )

The method encodes a family of algorithms, depending on the choice of the sampling Ŝ, which encodes a particular mini-batching scheme. Formally, a sampling Ŝ is a set-valued random variable with values being the subsets of [n], i.e., subsets of examples. In this paper, we use the terms mini-batching scheme and sampling interchangeably. A sampling is defined by the collection of probabilities Prob(Ŝ = S) assigned to every subset S ⊆ [n] of the examples.

The method maintains n vectors α_i ∈ R^m and a vector w ∈ R^d. At the beginning of step t, we have α_i^(t−1) for all i and w^(t−1) computed and stored in memory. We then pick a random subset S_t of the examples, according to the mini-batching scheme, and update the variables α_i for i ∈ S_t, based on the computation of the gradients φ_i'(A_i^⊤ w^(t−1)) for i ∈ S_t. This is followed by an update of the vector w, which is performed so as to maintain the relation

    w^(t) = (1/(λn)) Σ_{i=1}^n A_i α_i^(t).    (4)
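For concreteness, the following is a minimal Python/NumPy sketch of Algorithm 1 (ours, not the authors' code) for the case m = 1, with a dense data matrix whose rows are the examples A_i and with the logistic-loss derivative as the default φ_i'; the sampling is supplied as a function returning one random subset S_t per call, and all names are illustrative assumptions.

    import numpy as np

    def dfsdca(A, y, lam, theta, p, sample, T, grad=None):
        # A: (n, d) data matrix, y: labels, lam: regularization parameter lambda,
        # theta: stepsize, p[i] = Prob(i in S-hat), sample(): one random subset S_t,
        # grad(s, y_i): derivative phi_i'(s); defaults to the logistic-loss derivative.
        if grad is None:
            grad = lambda s, yi: -yi / (1.0 + np.exp(yi * s))
        n = A.shape[0]
        alpha = np.zeros(n)                    # alpha_i^(0) = 0 for all i
        w = (A.T @ alpha) / (lam * n)          # maintains w = (1/(lam*n)) sum_i A_i alpha_i, cf. (4)
        for _ in range(T):
            S = sample()
            w_old = w.copy()                   # all gradients in this iteration use w^(t-1)
            for i in S:
                kappa = grad(A[i] @ w_old, y[i]) + alpha[i]   # phi_i'(A_i^T w^(t-1)) + alpha_i^(t-1)
                alpha[i] -= (theta / p[i]) * kappa
                w -= theta / (n * lam * p[i]) * kappa * A[i]
        return w, alpha

For serial uniform sampling one would pass p = np.full(n, 1.0/n) and sample = lambda: [np.random.randint(n)], with θ chosen according to Theorem 1 or Theorem 2 below.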

This relation is maintained for the following reason. If w* is the optimal solution to (1), then

    0 = ∇P(w*) = (1/n) Σ_{i=1}^n A_i φ_i'(A_i^⊤ w*) + λ w*,    (5)

and hence w* = (1/(λn)) Σ_{i=1}^n A_i α_i*, where α_i* := −φ_i'(A_i^⊤ w*). So, if we believe that the variables α_i converge to α_i* = −φ_i'(A_i^⊤ w*), it indeed does make sense to maintain (4). Why should we believe this? This is where the specific update of the dual variables α_i comes from: α_i is set to a convex combination of its previous value and our best estimate so far of −φ_i'(A_i^⊤ w*), namely, −φ_i'(A_i^⊤ w^(t−1)). Indeed, the update can be written as

    α_i^(t) = (1 − θ p_i^{−1}) α_i^(t−1) + θ p_i^{−1} ( −φ_i'(A_i^⊤ w^(t−1)) ).

Why does this make sense? Because we believe that w^(t) converges to w*. Admittedly, this reasoning is somewhat circular. However, a better word to describe this reasoning would be: iterative.

3 Main Results

Let p_i := P(i ∈ Ŝ). We assume the knowledge of parameters v_1, ..., v_n > 0 for which

    E ‖ Σ_{i ∈ Ŝ} A_i h_i ‖² ≤ Σ_{i=1}^n p_i v_i ‖h_i‖²  for all h_1, ..., h_n ∈ R^m.    (6)

Tight and easily computable formulas for such parameters can be found in [19]. For instance, whenever Prob(|Ŝ| ≤ τ) = 1, inequality (6) holds with v_i = τ ‖A_i‖². To simplify the exposition, we will write

    B^(t) := ‖w^(t) − w*‖²,    C_i^(t) := ‖α_i^(t) − α_i*‖²,    i = 1, 2, ..., n.    (7)

3.1 Non-convex loss functions

Our result will be expressed in terms of the decay of the potential

    D^(t) := (λ/2) B^(t) + (λ/(2n)) Σ_{i=1}^n C_i^(t) / L_i²,

where B^(t) and C_i^(t) are defined in (7).

Theorem 1. Assume that the average loss function, (1/n) Σ_{i=1}^n φ_i, is convex. If (3) holds and we let

    θ ≤ min_i  p_i n λ² / (L_i² v_i + n λ²),    (8)

then for t ≥ 0 the potential D^(t) decays exponentially to zero as

    E[ D^(t) ] ≤ e^{−θt} D^(0).    (9)

Moreover, if we set θ equal to the upper bound in (8), then

    T ≥ max_i ( 1/p_i + L_i² v_i / (λ² p_i n) ) log( (L + λ) D^(0) / (λ ε) )    implies    E[ P(w^(T)) − P(w*) ] ≤ ε.
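The bound (8) and the iteration count of Theorem 1 are straightforward to evaluate numerically. The sketch below (ours) uses the simple choice v_i = τ ‖A_i‖², valid whenever |Ŝ| ≤ τ almost surely as noted after (6); the arguments p, L, D0 and Lmax stand for the probabilities p_i, the constants L_i, the initial potential D^(0) and L = max_i L_i, and are illustrative names.

    import numpy as np

    def theta_nonconvex(A, p, L, lam, tau):
        # v_i = tau * ||A_i||^2 and the stepsize bound (8):
        # theta <= min_i p_i * n * lam^2 / (L_i^2 * v_i + n * lam^2).
        n = A.shape[0]
        v = tau * np.sum(A * A, axis=1)
        return np.min(p * n * lam**2 / (L**2 * v + n * lam**2)), v

    def iterations_nonconvex(p, v, L, lam, eps, D0, Lmax):
        # Iteration bound of Theorem 1:
        # T >= max_i (1/p_i + L_i^2 v_i / (lam^2 p_i n)) * log((Lmax + lam) * D0 / (lam * eps)).
        n = len(p)
        rate = np.max(1.0 / p + (L**2 * v) / (lam**2 * p * n))
        return rate * np.log((Lmax + lam) * D0 / (lam * eps))

The analogous quantities for the convex case of Theorem 2 below are obtained by replacing L_i² with l_i and λ² with λ.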

3.2 Convex loss functions

Our result will be expressed in terms of the decay of the potential

    E^(t) := (λ/2) B^(t) + (1/(2n)) Σ_{i=1}^n C_i^(t) / l_i,

where B^(t) and C_i^(t) are defined in (7).

Theorem 2. Assume that all loss functions {φ_i} are convex and satisfy (2). If we run Algorithm 1 with parameter θ satisfying the inequality

    θ ≤ min_i  p_i n λ / (l_i v_i + n λ),    (10)

then for t ≥ 0 the potential E^(t) decays exponentially to zero as

    E[ E^(t) ] ≤ e^{−θt} E^(0).    (11)

Moreover, if we set θ equal to the upper bound in (10), then

    T ≥ max_i ( 1/p_i + l_i v_i / (λ p_i n) ) log( (L + λ) E^(0) / (λ ε) )    implies    E[ P(w^(T)) − P(w*) ] ≤ ε.

The rate, 1/θ, precisely matches that of the QUARTZ algorithm [9]. QUARTZ is the only other method for ERM which has been analyzed for an arbitrary mini-batching scheme. Our algorithm is dual-free, and as we have seen above, allows for an analysis covering the case of non-convex loss functions.

4 Chunking

In this section we illustrate one use of the ability of our method to work with an arbitrary mini-batching scheme. Further examples include the ability to design distributed variants of the method [20], or the use of importance/adaptive sampling to lower the number of iterations [2, 12, 9, 14].

One marked disadvantage of standard mini-batching (choose a subset of examples uniformly at random) used in the context of parallel processing on multicore processors is the fact that in a synchronous implementation there is a loss of efficiency: the computation time of φ_i'(A_i^⊤ w) may differ substantially across examples i. This is caused by the data examples having varying degrees of sparsity. We hence introduce a new sampling which mitigates this issue.

Chunks: Choose sets G_1, ..., G_k ⊆ [n] such that ∪_{i=1}^k G_i = [n] and G_i ∩ G_j = ∅ for i ≠ j, and such that ψ(i) := Σ_{j ∈ G_i} nnz(A_j) is similar for every i, i.e. ψ(1) ≈ ... ≈ ψ(k). Instead of sampling τ coordinates we propose a new sampling, which on each iteration t samples τ sets G_{(1)}^(t), ..., G_{(τ)}^(t) out of G_1, ..., G_k and uses the coordinates ∪_{i=1}^τ G_{(i)}^(t) as the sampled set. We assign each core one of the sets G_{(i)}^(t) for parallel computation. The advantage of this sampling lies in the fact that the load of computing φ_i'(A_i^⊤ w) for all i ∈ G_j is similar for all j ∈ [k]. Hence, using this sampling we minimize the waiting time of the processors.
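As an illustration (ours), the chunk-based sampling just described can be generated as follows; groups is a list of index arrays G_1, ..., G_k forming a partition of {0, ..., n−1}, and under this sampling every coordinate i has p_i = τ/k.

    import numpy as np

    def make_chunk_sampling(groups, tau, seed=None):
        # Pick tau of the k chunks uniformly at random without replacement and
        # return the union of their coordinates as the sampled set S_t.
        rng = np.random.default_rng(seed)
        k = len(groups)
        def sample():
            chosen = rng.choice(k, size=tau, replace=False)
            return np.concatenate([groups[j] for j in chosen])
        return sample

Such a sample() function can be plugged directly into the dfSDCA sketch given earlier.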

How to choose G_1, ..., G_k: We introduce the following algorithm.

Algorithm 2  Naive Chunks
    Parameters: vector u of nnz counts
    Initialization: n = length(u); empty vectors g and s of length n; m = max(u); g[1] = 1, s[1] = u[1], i = 1
    for t = 2 : n do
        if s[i] + u[t] ≤ m then
            g[i] = g[i] + 1,  s[i] = s[i] + u[t]
        else
            i = i + 1,  g[i] = 1,  s[i] = u[t]

The algorithm returns the partition of [n] into G_1, ..., G_k in the sense that the first g[1] coordinates belong to G_1, the next g[2] coordinates belong to G_2, and so on. The main advantage of this approach is that it makes a preprocessing step on the dataset which takes just one pass through the data.

In Figure 1a through Figure 1f we show the impact of Algorithm 2 on the distribution of the waiting time of a single core, which we measure by the differences

    max_{i ∈ S_t} {nnz(A_i)} − (1/τ) Σ_{i ∈ S_t} nnz(A_i)    and    max_{i ∈ [τ]} {nnz(G_{(i)}^(t))} − (1/τ) Σ_{i=1}^τ nnz(G_{(i)}^(t))

for the initial and the preprocessed dataset, respectively. We can observe that the waiting time is smaller using the preprocessing.

5 Experiments

In all our experiments we used logistic regression. We normalized the datasets so that max_i ‖A_i‖ = 1, and fixed λ = 1/n. The datasets used for the experiments are summarized in Table 1.

    Dataset     #samples    #features    sparsity
    w8a         49,749      300          -
    dorothea    800         100,000      -
    protein     17,766      357          -
    rcv1        20,242      47,236       -
    cov         581,012     54           -

    Table 1: Datasets used in the experiments.

Experiment 1. In Figure 2a we compare the performance of Algorithm 1 with uniform serial sampling against state-of-the-art algorithms such as SGD [1], SAG [2] and S2GD [5] in the number of epochs. The real running time of the algorithms was 0.46s for S2GD, 0.79s for SAG, 0.47s for SDCA and 0.58s for SGD. In Figure 2b we show the convergence rate for different regularization parameters λ. In Figure 2c we show convergence rates for different serial samplings: uniform, importance [12] and also 4 different randomly generated serial samplings. These samplings were generated in a controlled manner, such that "random c" has (max_i p_i)/(min_i p_i) < c. All of these samplings exhibit linear convergence, as predicted by the theory.

Experiment 2: New sampling vs. old sampling. In Figure 3a through Figure 3l we compare the performance of a standard parallel sampling against the sampling of blocks G_1, ..., G_k output by Algorithm 2. In each iteration we measure the time by

    max_{i ∈ S_t} {nnz(A_i)}    and    max_{i ∈ [τ]} {nnz(G_{(i)})}

for the standard and the new sampling, respectively. This way we measure only the computations done by the core which is going to finish last in each iteration, and consider the number of multiplications with nonzero entries of the data matrix as a proxy for time.
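Experiment 2 relies on the blocks produced by Algorithm 2. A minimal Python sketch (ours) of that one-pass preprocessing step follows: examples are grouped greedily, in their given order, into chunks whose total nnz stays below the budget m = max_t u[t]; it returns the chunk sizes g and the per-chunk nnz totals s.

    import numpy as np

    def naive_chunks(u):
        # u[t] = nnz(A_t); a single pass through the data, as noted in Section 4.
        u = np.asarray(u)
        m = u.max()
        g, s = [1], [int(u[0])]
        for t in range(1, len(u)):
            if s[-1] + u[t] <= m:        # current chunk still fits under the budget m
                g[-1] += 1
                s[-1] += int(u[t])
            else:                        # otherwise start a new chunk
                g.append(1)
                s.append(int(u[t]))
        return g, s

The chunk sizes can be turned into index groups via np.split(np.arange(len(u)), np.cumsum(g)[:-1]) and then passed to make_chunk_sampling above.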

[Figure 1 shows six panels of histograms: (a) w8a initially, (b) dorothea initially, (c) protein initially, (d) w8a chunked, (e) dorothea chunked, (f) protein chunked. Each panel plots Probability against Max(nnz) − Mean(nnz), with curves for τ = 5, 10, 20, 50.]

Figure 1: Distribution of the difference between the maximum number of nonzeros processed by a single core and the mean of all nonzeros processed by each core. This difference shows us how much time is wasted per core waiting on the slowest core to finish its task, therefore smaller numbers are better. The first row corresponds to the initial distribution while the second row shows the distribution after using Algorithm 2.

[Figure 2 shows three panels plotting objective minus optimum against passes through the data: (a) rcv1, state of the art (SGD, S2GD, SAG, SDCA); (b) rcv1, different values of λ; (c) cov, various samplings (uniform, importance, random 2 to random 5).]

Figure 2: LEFT: Comparison of SDCA with other state-of-the-art methods. MIDDLE: SDCA for various values of λ. RIGHT: SDCA run with various samplings Ŝ.

[Figure 3 shows twelve panels plotting test point minus optimum for the new sampling versus the standard sampling: (a)-(d) w8a with τ = 5, 10, 20, 50; (e)-(h) dorothea with τ = 5, 10, 20, 50; (i)-(l) protein with τ = 5, 10, 20, 50.]

Figure 3: Logistic regression with λ = 1/n. Comparison between the new and standard sampling with fine-tuned stepsizes for different values of τ.

6 Proofs

As a first approximation, our proof is an extension of the proof of Shalev-Shwartz [18] to accommodate an arbitrary sampling [16, 17, 9, 15]. For all i and t we let u_i^(t) = −φ_i'(A_i^⊤ w^(t)) and z_i^(t) = α_i^(t) − u_i^(t). We will use the following lemma.

Lemma 3 (Evolution of C_i^(t) and B^(t)). For a fixed iteration t and all i we have:

    E_Ŝ[ C_i^(t) − C_i^(t−1) ] = −θ ( ‖α_i^(t−1) − α_i*‖² − ‖u_i^(t−1) − α_i*‖² + (1 − θ p_i^{−1}) ‖z_i^(t−1)‖² ),    (12)

    E_Ŝ[ B^(t) − B^(t−1) ] ≤ −(2θ/λ) (w^(t−1) − w*)^⊤ ∇P(w^(t−1)) + (θ²/(n²λ²)) Σ_{i=1}^n (v_i/p_i) ‖z_i^(t−1)‖².    (13)

Proof. It follows that for i ∈ S_t, using the definition (7), we have

    C_i^(t) − C_i^(t−1) = ‖α_i^(t) − α_i*‖² − ‖α_i^(t−1) − α_i*‖²
      = ‖(1 − θ p_i^{−1})(α_i^(t−1) − α_i*) + θ p_i^{−1}(u_i^(t−1) − α_i*)‖² − ‖α_i^(t−1) − α_i*‖²
      = (1 − θ p_i^{−1})‖α_i^(t−1) − α_i*‖² + θ p_i^{−1}‖u_i^(t−1) − α_i*‖² − θ p_i^{−1}(1 − θ p_i^{−1})‖z_i^(t−1)‖² − ‖α_i^(t−1) − α_i*‖²
      = −θ p_i^{−1} ( ‖α_i^(t−1) − α_i*‖² − ‖u_i^(t−1) − α_i*‖² + (1 − θ p_i^{−1})‖z_i^(t−1)‖² ),

and for i ∉ S_t we have C_i^(t) − C_i^(t−1) = 0. Taking the expectation over S_t we get the result.

For the second potential we get

    B^(t) − B^(t−1) = ‖w^(t) − w*‖² − ‖w^(t−1) − w*‖²
      = −(2θ/(nλ)) Σ_{i ∈ S_t} p_i^{−1} (w^(t−1) − w*)^⊤ A_i z_i^(t−1) + (θ²/(n²λ²)) ‖ Σ_{i ∈ S_t} p_i^{−1} A_i z_i^(t−1) ‖².

Taking the expectation over S_t, using inequality (6), and noting that

    (1/n) Σ_{i=1}^n A_i z_i^(t−1) = (1/n) Σ_{i=1}^n A_i φ_i'(A_i^⊤ w^(t−1)) + λ w^(t−1) = ∇P(w^(t−1)),    (14)

we get

    E[ B^(t) − B^(t−1) ] = −(2θ/(nλ)) Σ_{i=1}^n (w^(t−1) − w*)^⊤ A_i z_i^(t−1) + (θ²/(n²λ²)) E ‖ Σ_{i ∈ Ŝ} p_i^{−1} A_i z_i^(t−1) ‖²
      ≤ −(2θ/(nλ)) Σ_{i=1}^n (w^(t−1) − w*)^⊤ A_i z_i^(t−1) + (θ²/(n²λ²)) Σ_{i=1}^n (v_i/p_i) ‖z_i^(t−1)‖²
      = −(2θ/λ) (w^(t−1) − w*)^⊤ ∇P(w^(t−1)) + (θ²/(n²λ²)) Σ_{i=1}^n (v_i/p_i) ‖z_i^(t−1)‖².

6.1 Proof of Theorem 1 (nonconvex case)

Combining (12) and (13), we obtain

    E[ D^(t) − D^(t−1) ] ≤ −(θλ/(2n)) Σ_{i=1}^n (1/L_i²) ( C_i^(t−1) − ‖u_i^(t−1) − α_i*‖² + (1 − θ p_i^{−1}) ‖z_i^(t−1)‖² )
        − θ (w^(t−1) − w*)^⊤ ∇P(w^(t−1)) + (θ²/(2n²λ)) Σ_{i=1}^n (v_i/p_i) ‖z_i^(t−1)‖²
      = −(θλ/(2n)) Σ_{i=1}^n (1/L_i²) ( C_i^(t−1) − ‖u_i^(t−1) − α_i*‖² ) − θ (w^(t−1) − w*)^⊤ ∇P(w^(t−1))
        − (θ/(2n)) Σ_{i=1}^n ( λ(1 − θ p_i^{−1})/L_i² − θ v_i/(nλ p_i) ) ‖z_i^(t−1)‖²
      ≤ −(θλ/(2n)) Σ_{i=1}^n (1/L_i²) ( C_i^(t−1) − ‖u_i^(t−1) − α_i*‖² ) − θ (w^(t−1) − w*)^⊤ ∇P(w^(t−1)),

where the last inequality holds since, by (8), λ(1 − θ p_i^{−1})/L_i² − θ v_i/(nλ p_i) ≥ 0 for all i. Using (3) we have

    ‖u_i^(t−1) − α_i*‖² = ‖φ_i'(A_i^⊤ w^(t−1)) − φ_i'(A_i^⊤ w*)‖² ≤ L_i² ‖w^(t−1) − w*‖².

By strong convexity of P,

    (w^(t−1) − w*)^⊤ ∇P(w^(t−1)) ≥ P(w^(t−1)) − P(w*) + (λ/2)‖w^(t−1) − w*‖²    and    P(w^(t−1)) − P(w*) ≥ (λ/2)‖w^(t−1) − w*‖²,

which together yields

    (w^(t−1) − w*)^⊤ ∇P(w^(t−1)) ≥ λ ‖w^(t−1) − w*‖².

Therefore,

    E[ D^(t) − D^(t−1) ] ≤ −θ [ (λ/(2n)) Σ_{i=1}^n C_i^(t−1)/L_i² + (−λ/2 + λ) B^(t−1) ] = −θ D^(t−1).

It follows that E[D^(t)] ≤ (1 − θ) D^(t−1), and repeating this recursively we end up with E[D^(t)] ≤ (1 − θ)^t D^(0) ≤ e^{−θt} D^(0). This concludes the proof of the first part of Theorem 1. The second part of the proof follows by observing that P is (L + λ)-smooth, which gives P(w) − P(w*) ≤ ((L + λ)/2) ‖w − w*‖².

6.2 Convex case

For the next theorem we need an additional lemma:

Lemma 4. Assume that the φ_i are l_i-smooth and convex. Then, for every w,

    (1/n) Σ_{i=1}^n (1/l_i) ‖φ_i'(A_i^⊤ w) − φ_i'(A_i^⊤ w*)‖² ≤ 2 ( P(w) − P(w*) − (λ/2) ‖w − w*‖² ).    (15)

Proof. Let g_i(x) = φ_i(x) − φ_i(A_i^⊤ w*) − φ_i'(A_i^⊤ w*)^⊤ (x − A_i^⊤ w*). Clearly, g_i is also l_i-smooth. By convexity of φ_i we have g_i(x) ≥ 0 for all x. It follows that g_i satisfies ‖g_i'(x)‖² ≤ 2 l_i g_i(x). Using the definition of g_i, we obtain

    ‖φ_i'(A_i^⊤ w) − φ_i'(A_i^⊤ w*)‖² = ‖g_i'(A_i^⊤ w)‖² ≤ 2 l_i [ φ_i(A_i^⊤ w) − φ_i(A_i^⊤ w*) − φ_i'(A_i^⊤ w*)^⊤ (A_i^⊤ w − A_i^⊤ w*) ].    (16)

Summing these terms up weighted by 1/l_i and using (5) we get

    (1/n) Σ_{i=1}^n (1/l_i) ‖φ_i'(A_i^⊤ w) − φ_i'(A_i^⊤ w*)‖²
      ≤ (2/n) Σ_{i=1}^n [ φ_i(A_i^⊤ w) − φ_i(A_i^⊤ w*) − ( A_i φ_i'(A_i^⊤ w*) )^⊤ (w − w*) ]
      = 2 [ P(w) − (λ/2)‖w‖² − P(w*) + (λ/2)‖w*‖² + λ (w*)^⊤ (w − w*) ]
      = 2 [ P(w) − P(w*) − (λ/2) ‖w − w*‖² ].

6.3 Proof of Theorem 2

Combining (12) and (13), we obtain

    E[ E^(t) − E^(t−1) ] ≤ −(θ/(2n)) Σ_{i=1}^n (1/l_i) ( C_i^(t−1) − ‖u_i^(t−1) − α_i*‖² + (1 − θ p_i^{−1}) ‖z_i^(t−1)‖² )
        − θ (w^(t−1) − w*)^⊤ ∇P(w^(t−1)) + (θ²/(2n²λ)) Σ_{i=1}^n (v_i/p_i) ‖z_i^(t−1)‖²
      = −(θ/(2n)) Σ_{i=1}^n (1/l_i) ( C_i^(t−1) − ‖u_i^(t−1) − α_i*‖² ) − θ (w^(t−1) − w*)^⊤ ∇P(w^(t−1))
        − (θ/n) Σ_{i=1}^n ( (1 − θ p_i^{−1})/(2 l_i) − θ v_i/(2 p_i λ n) ) ‖z_i^(t−1)‖²
      ≤ −(θ/(2n)) Σ_{i=1}^n (1/l_i) ( C_i^(t−1) − ‖u_i^(t−1) − α_i*‖² ) − θ (w^(t−1) − w*)^⊤ ∇P(w^(t−1)),

where the last inequality holds since, by (10), (1 − θ p_i^{−1})/(2 l_i) − θ v_i/(2 p_i λ n) ≥ 0 for all i. Using the convexity of P we have P(w*) − P(w^(t−1)) ≥ −(w^(t−1) − w*)^⊤ ∇P(w^(t−1)), and using Lemma 4 we have

    E[ E^(t) − E^(t−1) ] ≤ −(θ/(2n)) Σ_{i=1}^n (1/l_i) C_i^(t−1) + θ ( P(w^(t−1)) − P(w*) − (λ/2)‖w^(t−1) − w*‖² ) − θ (w^(t−1) − w*)^⊤ ∇P(w^(t−1))
      ≤ −θ [ (1/(2n)) Σ_{i=1}^n C_i^(t−1)/l_i + (λ/2) B^(t−1) ]
      = −θ E^(t−1).

This gives E[E^(t)] ≤ (1 − θ) E^(t−1), which concludes the first part of Theorem 2. The second part follows by observing that P is (L + λ)-smooth, which gives P(w) − P(w*) ≤ ((L + λ)/2) ‖w − w*‖².

References

[1] Herbert Robbins and Sutton Monro. A stochastic approximation method. Ann. Math. Statist., 22(3):400-407, 1951.

[2] Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic average gradient. arXiv preprint, 2013.

[3] Shai Shalev-Shwartz and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss. Journal of Machine Learning Research, 14(1):567-599, 2013.

[4] Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In NIPS, 2013.

[5] Jakub Konečný and Peter Richtárik. S2GD: Semi-stochastic gradient descent methods. arXiv:1312.1666, 2013.

[6] Jakub Konečný, Jie Liu, Peter Richtárik, and Martin Takáč. mS2GD: Mini-batch semi-stochastic gradient descent in the proximal setting. arXiv preprint, 2014.

[7] Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems 27 (NIPS 2014), 2014.

[8] Jakub Konečný, Zheng Qu, and Peter Richtárik. Semi-stochastic coordinate descent. arXiv preprint, 2014.

[9] Zheng Qu, Peter Richtárik, and Tong Zhang. Randomized Dual Coordinate Ascent with Arbitrary Sampling. arXiv:1411.5873, 2014.

[10] Shai Shalev-Shwartz and Tong Zhang. Accelerated mini-batch stochastic dual coordinate ascent. In Advances in Neural Information Processing Systems 26, 2013.

[11] Shai Shalev-Shwartz and Tong Zhang. Proximal stochastic dual coordinate ascent. arXiv:1211.2717, 2012.

[12] Peilin Zhao and Tong Zhang. Stochastic optimization with importance sampling. ICML, 2015.

[13] Shai Shalev-Shwartz and Tong Zhang. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. To appear in Mathematical Programming, 2014.

[14] Dominik Csiba, Zheng Qu, and Peter Richtárik. Stochastic dual coordinate ascent with adaptive probabilities. ICML, 2015.

[15] Zheng Qu, Peter Richtárik, Martin Takáč, and Olivier Fercoq. Stochastic Dual Newton Ascent for empirical risk minimization. arXiv preprint, 2015.

[16] Peter Richtárik and Martin Takáč. On optimal probabilities in stochastic coordinate descent methods. arXiv preprint, 2013.

[17] Zheng Qu and Peter Richtárik. Coordinate descent methods with arbitrary sampling I: Algorithms and complexity. arXiv preprint, 2014.

[18] Shai Shalev-Shwartz. SDCA without duality. CoRR, abs/1502.06177, 2015.

[19] Zheng Qu and Peter Richtárik. Coordinate Descent with Arbitrary Sampling II: Expected Separable Overapproximation. arXiv preprint, 2014.

[20] Peter Richtárik and Martin Takáč. Distributed coordinate descent method for learning with big data. arXiv preprint, 2013.

[21] Peter Richtárik and Martin Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 144(2):1-38, 2014.
