Primal Method for ERM with Flexible Mini-batching Schemes and Non-convex Losses
Dominik Csiba and Peter Richtárik

June 17, 2015

Abstract

In this work we develop a new algorithm for regularized empirical risk minimization. Our method extends recent techniques of Shalev-Shwartz [02/2015], which enable a dual-free analysis of SDCA, to arbitrary mini-batching schemes. Moreover, our method is able to better utilize the information in the data defining the ERM problem. For convex loss functions, our complexity results match those of QUARTZ, which is a primal-dual method also allowing for arbitrary mini-batching schemes. The advantage of a dual-free analysis comes from the fact that it guarantees convergence even for non-convex loss functions, as long as the average loss is convex. We illustrate through experiments the utility of being able to design arbitrary mini-batching schemes.

1 Introduction

Empirical risk minimization (ERM) is a very successful and immensely popular paradigm in machine learning, used to train a variety of prediction and classification models. Given examples A_1, ..., A_n ∈ R^{d×m}, loss functions φ_1, ..., φ_n : R^m → R and a regularization parameter λ > 0, the L2-regularized ERM problem is an optimization problem of the form

    min_{w ∈ R^d} [ P(w) := (1/n) Σ_{i=1}^n φ_i(A_i^T w) + (λ/2) ||w||^2 ].    (1)

Throughout the paper we shall assume that for each i, the loss function φ_i is l_i-smooth with l_i > 0. That is, for all x, y ∈ R^m and all i ∈ [n] := {1, 2, ..., n}, we have

    ||∇φ_i(x) − ∇φ_i(y)|| ≤ l_i ||x − y||.    (2)

Further, let L_1, ..., L_n > 0 be constants for which the inequality

    ||∇φ_i(A_i^T w) − ∇φ_i(A_i^T z)|| ≤ L_i ||w − z||    (3)

holds for all w, z ∈ R^d and all i, and let L := max_i L_i. Note that we can always bound L_i ≤ l_i ||A_i||. However, L_i can be better (smaller) than l_i ||A_i||.

The authors acknowledge support from the EPSRC Grant EP/K02325X/1, Accelerated Coordinate Descent Methods for Big Data Optimization. School of Mathematics, The University of Edinburgh, United Kingdom (e-mail: cdominik@gmail.com, peter.richtarik@ed.ac.uk).
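To make the setting concrete, here is a minimal numerical sketch (ours, not part of the paper) of objective (1) for the logistic loss with m = 1, where the rows of a matrix A play the role of the examples A_i; the names `A`, `y` and `erm_objective` are our own.

```python
import numpy as np

def erm_objective(w, A, y, lam):
    """Objective (1), sketched for the logistic loss phi_i(z) = log(1 + exp(-y_i z))
    with m = 1; rows of A are the examples A_i, y holds +-1 labels."""
    z = A @ w                              # A_i^T w for every example i
    losses = np.log1p(np.exp(-y * z))      # phi_i(A_i^T w)
    return losses.mean() + 0.5 * lam * (w @ w)
```

At w = 0 every logistic loss equals log 2, so P(0) = log 2; the logistic loss is l_i-smooth with l_i = 1/4, hence L_i ≤ ||A_i|| / 4 in this example.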
1.1 Background

In the last few years, a lot of research effort was put into designing new efficient algorithms for solving this problem (and some of its modifications). The frenzy of activity was motivated by the realization that SGD [1], not so long ago considered the state-of-the-art method for ERM, was far from being optimal, and that new ideas can lead to algorithms which are far superior to SGD in both theory and practice. The methods that belong to this category include SAG [2], SDCA [3], SVRG [4], S2GD [5], mS2GD [6], SAGA [7], S2CD [8], QUARTZ [9], ASDCA [10], prox-SDCA [11], IPROX-SDCA [12], A-PROX-SDCA [13], AdaSDCA [14], SDNA [15]. Methods analyzed for arbitrary mini-batching schemes include NSync [16], ALPHA [17] and QUARTZ [9].

In order to find an ε-solution in expectation, state-of-the-art (non-accelerated) methods for solving (1) only need O((n + κ) log(1/ε)) steps, where each step involves the computation of the gradient ∇φ_i(A_i^T w) for some randomly selected example i. The quantity κ is the condition number. Typically one has κ = max_i l_i ||A_i||^2 / λ for methods picking i uniformly at random, and κ = (1/(nλ)) Σ_i l_i ||A_i||^2 for methods picking i using a carefully designed data-dependent importance sampling. Computation of such a gradient typically involves work which is equivalent to reading the example A_i, that is, O(nnz(A_i)) ≤ O(dm) arithmetic operations.

1.2 Contributions

In this work we develop a new algorithm for the L2-regularized ERM problem (1). Our method extends a technique recently introduced by Shalev-Shwartz [18], which enables a dual-free analysis of SDCA, to arbitrary mini-batching schemes. That is, our method works at each iteration with a random subset of examples, chosen in an i.i.d. fashion from an arbitrary distribution.
Such flexible schemes are useful for various reasons, including (i) the development of distributed or robust variants of the method, (ii) the design of importance sampling for improving the complexity rate, (iii) the design of a sampling which is aimed at obtaining efficiencies elsewhere, such as utilizing NUMA (non-uniform memory access) architectures, and (iv) streamlining and speeding up the processing of each mini-batch by means of assigning to each processor an approximately even workload so as to reduce idle time (we do experiments with the latter setup).

In comparison with [18], our method is able to better utilize the information in the data examples A_1, ..., A_n, leading to a better data-dependent bound. For convex loss functions, our complexity results match those of QUARTZ [9] in terms of the rate (the logarithmic factors differ). QUARTZ is a primal-dual method also allowing for arbitrary mini-batching schemes. However, while [9] only characterize the decay of expected risk, we also give bounds for the sequence of iterates. In particular, we show that for convex loss functions, our method enjoys the rate (Theorem 2)

    max_i ( 1/p_i + l_i v_i / (λ p_i n) ) log( (L + λ) E^(0) / (λ ε) ),

where p_i is the probability that coordinate i is updated in an iteration, v_1, ..., v_n > 0 are certain stepsize parameters of the method associated with the sampling and the data (see (6)), and E^(0) is a constant depending on the starting point. For instance, in the special case of picking a single example at a time uniformly at random, we have p_i = 1/n and v_i = ||A_i||^2, whereby we obtain one of the O((n + κ) log(1/ε)) rates mentioned above. The other rate can be recovered using importance sampling.

The advantage of a dual-free analysis comes from the fact that it guarantees convergence even for non-convex loss functions, as long as the average loss is convex. This is a step toward understanding non-convex models. In particular, we show that for non-convex loss functions, our method enjoys the rate (Theorem 1)

    max_i ( 1/p_i + L_i^2 v_i / (λ^2 p_i n) ) log( (L + λ) D^(0) / (λ ε) ),

where D^(0) is a constant depending on the starting point.

Finally, we illustrate through experiments with chunking (a simple load balancing technique) the utility of being able to design arbitrary mini-batching schemes.

2 Algorithm

We shall now describe the method (Algorithm 1).

Algorithm 1 dfSDCA: Dual-Free SDCA with Arbitrary Sampling
    Parameters: sampling Ŝ, stepsize θ
    Initialization: α_1^(0), ..., α_n^(0) ∈ R^m; set w^(0) = (1/(λn)) Σ_{i=1}^n A_i α_i^(0); p_i = Prob(i ∈ Ŝ)
    for t ≥ 1 do
        Sample a set S_t according to Ŝ
        for i ∈ S_t do
            α_i^(t) = α_i^(t−1) − θ p_i^{−1} ( ∇φ_i(A_i^T w^(t−1)) + α_i^(t−1) )
        w^(t) = w^(t−1) − Σ_{i ∈ S_t} θ (nλ p_i)^{−1} A_i ( ∇φ_i(A_i^T w^(t−1)) + α_i^(t−1) )

The method encodes a family of algorithms, depending on the choice of the sampling Ŝ, which encodes a particular mini-batching scheme. Formally, a sampling Ŝ is a set-valued random variable with values being the subsets of [n], i.e., subsets of examples. In this paper, we use the terms mini-batching scheme and sampling interchangeably. A sampling is defined by the collection of probabilities Prob(S) assigned to every subset S ⊆ [n] of the examples.

The method maintains n vectors α_i ∈ R^m and a vector w ∈ R^d. At the beginning of step t, we have α_i^(t−1) for all i and w^(t−1) computed and stored in memory. We then pick a random subset S_t of the examples, according to the mini-batching scheme, and update the variables α_i for i ∈ S_t, based on the computation of the gradients ∇φ_i(A_i^T w^(t−1)) for i ∈ S_t. This is followed by an update of the vector w, which is performed so as to maintain the relation

    w^(t) = (1/(λn)) Σ_{i=1}^n A_i α_i^(t).    (4)

This relation is maintained for the following reason. If w* is the optimal solution to (1), then

    0 = ∇P(w*) = (1/n) Σ_{i=1}^n A_i ∇φ_i(A_i^T w*) + λ w*,    (5)
and hence w* = (1/(λn)) Σ_{i=1}^n A_i α_i*, where α_i* := −∇φ_i(A_i^T w*). So, if we believe that the variables α_i converge to −∇φ_i(A_i^T w*), it indeed does make sense to maintain (4). Why should we believe this? This is where the specific update of the dual variables α_i comes from: α_i is set to a convex combination of its previous value and our best estimate so far of −∇φ_i(A_i^T w*), namely, −∇φ_i(A_i^T w^(t−1)). Indeed, the update can be written as

    α_i^(t) = (1 − θ p_i^{−1}) α_i^(t−1) + θ p_i^{−1} ( −∇φ_i(A_i^T w^(t−1)) ).

Why does this make sense? Because we believe that w^(t) converges to w*. Admittedly, this reasoning is somewhat circular. However, a better word to describe this reasoning would be: iterative.

3 Main Results

Let p_i := P(i ∈ Ŝ). We assume the knowledge of parameters v_1, ..., v_n > 0 for which

    E || Σ_{i ∈ Ŝ} A_i h_i ||^2 ≤ Σ_{i=1}^n p_i v_i ||h_i||^2.    (6)

Tight and easily computable formulas for such parameters can be found in [19]. For instance, whenever Prob(|Ŝ| ≤ τ) = 1, inequality (6) holds with v_i = τ ||A_i||^2. To simplify the exposition, we will write

    B^(t) := ||w^(t) − w*||^2,    C_i^(t) := ||α_i^(t) − α_i*||^2,    i = 1, 2, ..., n.    (7)

3.1 Non-convex loss functions

Our result will be expressed in terms of the decay of the potential

    D^(t) := (λ/2) B^(t) + (λ/(2n)) Σ_{i=1}^n C_i^(t) / L_i^2,

where B^(t) and C_i^(t) are defined in (7).

Theorem 1. Assume that the average loss function, (1/n) Σ_{i=1}^n φ_i, is convex. If (3) holds and we let

    θ ≤ min_i p_i n λ^2 / ( L_i^2 v_i + n λ^2 ),    (8)

then for t ≥ 0 the potential D^(t) decays exponentially to zero as

    E[D^(t)] ≤ e^{−θt} D^(0).    (9)

Moreover, if we set θ equal to the upper bound in (8), then

    T ≥ max_i ( 1/p_i + L_i^2 v_i / (λ^2 p_i n) ) log( (L + λ) D^(0) / (λ ε) )   ⇒   E[P(w^(T)) − P(w*)] ≤ ε.
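For concreteness, one pass of Algorithm 1 in its simplest special case (serial uniform sampling, |S_t| = 1, p_i = 1/n, m = 1) can be sketched as follows. This is our own illustrative rendering, not the authors' implementation; `grad_phi(i, s)` is a hypothetical callback returning φ_i'(s).

```python
import numpy as np

def dfsdca_epoch(A, grad_phi, w, alpha, lam, theta, rng):
    """One pass of Algorithm 1 with serial uniform sampling (|S_t| = 1, p_i = 1/n,
    m = 1); rows of A are the examples A_i. The two updates together preserve
    invariant (4): w = (1/(lam*n)) * sum_i alpha_i * A_i."""
    n = A.shape[0]
    for _ in range(n):
        i = rng.integers(n)                    # S_t = {i}, uniform
        g = grad_phi(i, A[i] @ w) + alpha[i]   # grad phi_i(A_i^T w) + alpha_i
        alpha[i] -= theta * n * g              # alpha step: theta / p_i = theta * n
        w -= theta / lam * g * A[i]            # w step: theta / (n * lam * p_i)
    return w, alpha
```

Since the change in Σ_i A_i α_i is exactly nλ times the change in w, relation (4) holds after every iteration whenever it holds at initialization.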
3.2 Convex loss functions

Our result will be expressed in terms of the decay of the potential

    E^(t) := (λ/2) B^(t) + (1/(2n)) Σ_{i=1}^n C_i^(t) / l_i,

where B^(t) and C_i^(t) are defined in (7).

Theorem 2. Assume that all loss functions {φ_i} are convex and satisfy (2). If we run Algorithm 1 with parameter θ satisfying the inequality

    θ ≤ min_i p_i n λ / ( l_i v_i + n λ ),    (10)

then for t ≥ 0 the potential E^(t) decays exponentially to zero as

    E[E^(t)] ≤ e^{−θt} E^(0).    (11)

Moreover, if we set θ equal to the upper bound in (10), then

    T ≥ max_i ( 1/p_i + l_i v_i / (λ p_i n) ) log( (L + λ) E^(0) / (λ ε) )   ⇒   E[P(w^(T)) − P(w*)] ≤ ε.

The rate, 1/θ, precisely matches that of the QUARTZ algorithm [9]. QUARTZ is the only other method for ERM which has been analyzed for an arbitrary mini-batching scheme. Our algorithm is dual-free, and as we have seen above, allows for an analysis covering the case of non-convex loss functions.

4 Chunking

In this section we illustrate one use of the ability of our method to work with an arbitrary mini-batching scheme. Further examples include the ability to design distributed variants of the method [20], or the use of importance/adaptive sampling to lower the number of iterations [21, 12, 9, 14].

One marked disadvantage of standard mini-batching ("choose a subset of τ examples, uniformly at random") used in the context of parallel processing on multicore processors is the fact that in a synchronous implementation there is a loss of efficiency due to the fact that the computation time of ∇φ_i(A_i^T w) may differ through i. This is caused by the data examples having varying degrees of sparsity. We hence introduce a new sampling which mitigates this issue.

Chunks: Choose sets G_1, ..., G_k ⊆ [n] such that ∪_{i=1}^k G_i = [n] and G_i ∩ G_j = ∅ for i ≠ j, and such that ψ(i) := Σ_{j ∈ G_i} nnz(A_j) is similar for every i, i.e., ψ(1) ≈ ... ≈ ψ(k). Instead of sampling τ coordinates we propose a new sampling, which in each iteration t samples τ sets G^(t)(1), ..., G^(t)(τ) out of G_1, ..., G_k and uses the coordinates ∪_{i=1}^τ G^(t)(i) as the sampled set. We assign each core one of the sets G^(t)(i) for parallel computation. The advantage of this sampling lies in the fact that the load of computing ∇φ_i(A_i^T w) for all i ∈ G_j is similar for all j ∈ [k].
Hence, using this sampling we minimize the waiting time of the processors.
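The chunk construction described next (Algorithm 2) amounts to a single greedy pass over the vector u of per-example nnz counts, packing consecutive coordinates into a chunk while the running nnz sum stays within m = max(u). A runnable sketch of that pass, in our own Python rendering:

```python
def naive_chunks(u):
    """Greedy pass of Algorithm 2 (Naive Chunks): split coordinates 1..n, in
    order, into chunks whose total nnz does not exceed m = max(u). Returns g,
    where the first g[0] coordinates form G_1, the next g[1] form G_2, etc."""
    m = max(u)
    g, s = [1], [u[0]]            # sizes and nnz sums of the chunks built so far
    for nnz in u[1:]:
        if s[-1] + nnz <= m:      # coordinate still fits into the current chunk
            g[-1] += 1
            s[-1] += nnz
        else:                     # open a new chunk
            g.append(1)
            s.append(nnz)
    return g
```

For example, u = [3, 1, 1, 1, 2, 3] yields chunk sizes [1, 3, 1, 1]: no chunk's nnz sum exceeds max(u) = 3, and every coordinate lands in exactly one chunk.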
Algorithm 2 Naive Chunks
    Parameters: vector u of nnz counts
    Initialization: n = length(u); empty vectors g and s of length n; m = max(u); g[1] = 1, s[1] = u[1], i = 1
    for t = 2 : n do
        if s[i] + u[t] ≤ m then
            g[i] = g[i] + 1, s[i] = s[i] + u[t]
        else
            i = i + 1, g[i] = 1, s[i] = u[t]

How to choose G_1, ..., G_k: We introduce the algorithm above. The algorithm returns the partition of [n] into G_1, ..., G_k in the sense that the first g[1] coordinates belong to G_1, the next g[2] coordinates belong to G_2, and so on. The main advantage of this approach is that it is a preprocessing step on the dataset which takes just one pass through the data.

In Figure 1a through Figure 1f we show the impact of Algorithm 2 on the probability of the waiting time of a single core, which we measure by the differences

    max_{i ∈ S_t} nnz(A_i) − (1/τ) Σ_{i ∈ S_t} nnz(A_i)   and   max_{i ∈ [τ]} nnz(G^(t)(i)) − (1/τ) Σ_{i=1}^τ nnz(G^(t)(i))

for the initial and the preprocessed dataset, respectively. We can observe that the waiting time is smaller using the preprocessing.

5 Experiments

In all our experiments we used logistic regression. We normalized the datasets so that max_i ||A_i|| = 1, and fixed λ = 1/n. The datasets used for the experiments are summarized in Table 1.

    Dataset     #samples    #features    sparsity
    w8a         49,749      300          3.9%
    dorothea    800         100,000      0.9%
    protein     17,766      357          29%
    rcv1        20,242      47,236       0.2%
    cov         581,012     54           22%

    Table 1: Datasets used in the experiments.

Experiment 1. In Figure 2a we compare the performance of Algorithm 1 with uniform serial sampling against state-of-the-art algorithms such as SGD [1], SAG [2] and S2GD [5] in the number of epochs. The real running time of the algorithms was 0.46s for S2GD, 0.79s for SAG, 0.47s for SDCA and 0.58s for SGD. In Figure 2b we show the convergence rate for different regularization parameters λ. In Figure 2c we show convergence rates for different serial samplings: uniform,
importance [12], and also 4 different randomly generated serial samplings. These samplings were generated in a controlled manner, such that "random c" has (max_i p_i)/(min_i p_i) < c. All of these samplings exhibit linear convergence, as predicted by the theory.

[Figure 1: six panels plotting probability against max(nnz) − mean(nnz) for τ = 5, 10, 20, 50; panels (a)–(c) show w8a, dorothea and protein initially, panels (d)–(f) show the same datasets chunked.]

Figure 1: Distribution of the difference between the maximum number of nonzeros processed by a single core and the mean number of nonzeros processed by each core. This difference shows how much time is wasted per core waiting on the slowest core to finish its task; therefore smaller numbers are better. The first row corresponds to the initial distribution, while the second row shows the distribution after using Algorithm 2.

Experiment 2: New sampling vs. old sampling. In Figure 3a through Figure 3l we compare the performance of a standard parallel sampling against the sampling of the blocks G_1, ..., G_k output by Algorithm 2. In each iteration we measure the time by

    max_{i ∈ S_t} nnz(A_i)   and   max_{i ∈ [τ]} nnz(G^(t)(i))

for the standard and the new sampling, respectively. This way we measure only the computations done by the core which is going to finish last in each iteration, and consider the number of multiplications with nonzero entries of the data matrix as a proxy for time.
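The idle-time measure used in Figure 1 can be computed with a small helper of our own (not from the paper); `nnz` holds the per-unit nonzero counts and `batch` the indices (examples or chunks) assigned to the cores in one iteration, one unit per core.

```python
import numpy as np

def waiting_time_proxy(nnz, batch):
    """Idle-time proxy from Section 4: nnz processed by the slowest core minus
    the mean nnz processed per core, for one mini-batch with one unit per core."""
    work = np.asarray([nnz[j] for j in batch], dtype=float)
    return work.max() - work.mean()
```

A perfectly balanced batch gives 0; the more skewed the nnz counts in the batch, the larger the proxy, matching the histograms in Figure 1.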
[Figure 2: three panels plotting objective minus optimum against passes through the data: (a) rcv1, state of the art; (b) rcv1, different λ; (c) cov, various samplings.]

Figure 2: LEFT: Comparison of SDCA with other state-of-the-art methods (SGD, S2GD, SAG). MIDDLE: SDCA for various values of λ. RIGHT: SDCA run with various samplings Ŝ (uniform, importance, random 2, random 3, random 4, random 5).

[Figure 3: twelve panels (a)–(l) plotting test point minus optimum for the new sampling against the standard sampling, on w8a, dorothea and protein, for τ = 5, 10, 20, 50.]

Figure 3: Logistic regression with λ = 1/n. Comparison between the new and the standard sampling with fine-tuned stepsizes for different values of τ.
6 Proofs

As a first approximation, our proof is an extension of the proof of Shalev-Shwartz [18] to accommodate an arbitrary sampling [16, 17, 19, 15]. For all i and t we let u_i^(t) = −∇φ_i(A_i^T w^(t)) and z_i^(t) = α_i^(t) − u_i^(t). We will use the following lemma.

Lemma 3 (Evolution of C_i^(t) and B^(t)). For a fixed iteration t and all i we have:

    E_Ŝ[ C_i^(t) − C_i^(t−1) ] = −θ ( ||α_i^(t−1) − α_i*||^2 − ||u_i^(t−1) − α_i*||^2 + (1 − θ p_i^{−1}) ||z_i^(t−1)||^2 ),    (12)

    E_Ŝ[ B^(t) − B^(t−1) ] ≤ −(2θ/λ) (w^(t−1) − w*)^T ∇P(w^(t−1)) + (θ^2/(n^2 λ^2)) Σ_{i=1}^n (v_i/p_i) ||z_i^(t−1)||^2.    (13)

Proof. For i ∈ S_t, using the definition (7), we have

    C_i^(t) − C_i^(t−1) = ||α_i^(t) − α_i*||^2 − ||α_i^(t−1) − α_i*||^2
                        = ||(1 − θ p_i^{−1})(α_i^(t−1) − α_i*) + θ p_i^{−1}(u_i^(t−1) − α_i*)||^2 − ||α_i^(t−1) − α_i*||^2
                        = −θ p_i^{−1} [ ||α_i^(t−1) − α_i*||^2 − ||u_i^(t−1) − α_i*||^2 + (1 − θ p_i^{−1}) ||z_i^(t−1)||^2 ],

and for i ∉ S_t we have C_i^(t) − C_i^(t−1) = 0. Taking the expectation over S_t we get (12).

For the second potential we get

    B^(t) − B^(t−1) = ||w^(t) − w*||^2 − ||w^(t−1) − w*||^2
                    = −(2θ/(nλ)) (w^(t−1) − w*)^T Σ_{i ∈ S_t} p_i^{−1} A_i z_i^(t−1) + (θ^2/(n^2 λ^2)) || Σ_{i ∈ S_t} p_i^{−1} A_i z_i^(t−1) ||^2.

Taking the expectation over S_t, using inequality (6), and noting that

    (1/n) Σ_{i=1}^n A_i z_i^(t−1) = (1/n) Σ_{i=1}^n A_i ∇φ_i(A_i^T w^(t−1)) + λ w^(t−1) = ∇P(w^(t−1)),    (14)

we get (13).
6.1 Proof of Theorem 1 (non-convex case)

Combining (12) and (13), we obtain

    E[D^(t) − D^(t−1)] ≤ (θλ/(2n)) Σ_{i=1}^n L_i^{−2} ( −C_i^(t−1) + ||u_i^(t−1) − α_i*||^2 ) − θ (w^(t−1) − w*)^T ∇P(w^(t−1))
                         − Σ_{i=1}^n [ (θλ/(2n)) L_i^{−2} (1 − θ p_i^{−1}) − θ^2 v_i / (2 n^2 λ p_i) ] ||z_i^(t−1)||^2.

By (8) we have λ(1 − θ p_i^{−1})/L_i^2 ≥ θ v_i / (nλ p_i), so the ||z_i^(t−1)||^2 terms can be dropped:

    E[D^(t) − D^(t−1)] ≤ (θλ/(2n)) Σ_{i=1}^n L_i^{−2} ( −C_i^(t−1) + ||u_i^(t−1) − α_i*||^2 ) − θ (w^(t−1) − w*)^T ∇P(w^(t−1)).

Using (3) we have

    ||u_i^(t−1) − α_i*||^2 = ||∇φ_i(A_i^T w^(t−1)) − ∇φ_i(A_i^T w*)||^2 ≤ L_i^2 ||w^(t−1) − w*||^2.

By strong convexity of P,

    (w^(t−1) − w*)^T ∇P(w^(t−1)) ≥ P(w^(t−1)) − P(w*) + (λ/2) ||w^(t−1) − w*||^2   and   P(w^(t−1)) − P(w*) ≥ (λ/2) ||w^(t−1) − w*||^2,

which together yield (w^(t−1) − w*)^T ∇P(w^(t−1)) ≥ λ ||w^(t−1) − w*||^2. Therefore,

    E[D^(t) − D^(t−1)] ≤ −θ [ (λ/(2n)) Σ_{i=1}^n C_i^(t−1)/L_i^2 + (−λ/2 + λ) B^(t−1) ] = −θ D^(t−1).

It follows that E[D^(t)] ≤ (1 − θ) D^(t−1), and repeating this recursively we end up with E[D^(t)] ≤ (1 − θ)^t D^(0) ≤ e^{−θt} D^(0). This concludes the proof of the first part of Theorem 1. The second part follows by observing that P is (L+λ)-smooth, which gives P(w) − P(w*) ≤ ((L+λ)/2) ||w − w*||^2.

6.2 Convex case

For the next theorem we need an additional lemma:

Lemma 4. Assume that the φ_i are l_i-smooth and convex. Then, for every w,

    (1/n) Σ_{i=1}^n (1/l_i) ||∇φ_i(A_i^T w) − ∇φ_i(A_i^T w*)||^2 ≤ 2 ( P(w) − P(w*) − (λ/2) ||w − w*||^2 ).    (15)
Proof. Let g_i(x) = φ_i(x) − φ_i(A_i^T w*) − ∇φ_i(A_i^T w*)^T (x − A_i^T w*). Clearly, g_i is also l_i-smooth. By convexity of φ_i we have g_i(x) ≥ 0 for all x. It follows that g_i satisfies ||∇g_i(x)||^2 ≤ 2 l_i g_i(x). Using the definition of g_i, we obtain

    ||∇φ_i(A_i^T w) − ∇φ_i(A_i^T w*)||^2 = ||∇g_i(A_i^T w)||^2
        ≤ 2 l_i [ φ_i(A_i^T w) − φ_i(A_i^T w*) − ∇φ_i(A_i^T w*)^T (A_i^T w − A_i^T w*) ].    (16)

Summing these terms up weighted by 1/l_i and using (5), we get

    (1/n) Σ_{i=1}^n (1/l_i) ||∇φ_i(A_i^T w) − ∇φ_i(A_i^T w*)||^2
        ≤ (2/n) Σ_{i=1}^n [ φ_i(A_i^T w) − φ_i(A_i^T w*) − (A_i ∇φ_i(A_i^T w*))^T (w − w*) ]
        = 2 [ P(w) − (λ/2)||w||^2 − P(w*) + (λ/2)||w*||^2 + λ (w*)^T (w − w*) ]
        = 2 [ P(w) − P(w*) − (λ/2) ||w − w*||^2 ].

6.3 Proof of Theorem 2

Combining (12) and (13), we obtain

    E[E^(t) − E^(t−1)] ≤ (θ/(2n)) Σ_{i=1}^n l_i^{−1} ( −C_i^(t−1) + ||u_i^(t−1) − α_i*||^2 ) − θ (w^(t−1) − w*)^T ∇P(w^(t−1))
                         − Σ_{i=1}^n [ (θ/(2n l_i)) (1 − θ p_i^{−1}) − θ^2 v_i / (2 n^2 λ p_i) ] ||z_i^(t−1)||^2.

By (10) we have (1 − θ p_i^{−1}) / l_i ≥ θ v_i / (nλ p_i), so the ||z_i^(t−1)||^2 terms can be dropped:

    E[E^(t) − E^(t−1)] ≤ (θ/(2n)) Σ_{i=1}^n l_i^{−1} ( −C_i^(t−1) + ||u_i^(t−1) − α_i*||^2 ) − θ (w^(t−1) − w*)^T ∇P(w^(t−1)).

Using the convexity of P we have P(w*) − P(w^(t−1)) ≥ (w* − w^(t−1))^T ∇P(w^(t−1)), and using Lemma 4 we have

    E[E^(t) − E^(t−1)] ≤ −(θ/(2n)) Σ_{i=1}^n C_i^(t−1)/l_i + θ ( P(w^(t−1)) − P(w*) − (λ/2) ||w^(t−1) − w*||^2 ) − θ ( P(w^(t−1)) − P(w*) )
                        = −θ [ (1/(2n)) Σ_{i=1}^n C_i^(t−1)/l_i + (λ/2) B^(t−1) ] = −θ E^(t−1).

This gives E[E^(t)] ≤ (1 − θ) E^(t−1), which concludes the first part of Theorem 2. The second part follows by observing that P is (L+λ)-smooth, which gives P(w) − P(w*) ≤ ((L+λ)/2) ||w − w*||^2.
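As an illustrative sanity check of Theorem 2 (ours, not the authors'), one can run the serial-uniform special case of Algorithm 1 with the stepsize bound (10) on a small synthetic ridge-regression instance, φ_i(s) = (s − y_i)^2 / 2 (so l_i = 1), and observe convergence to the minimizer; all names and constants below are our own.

```python
import numpy as np

# Small synthetic instance: phi_i(s) = (s - y_i)^2 / 2, l_i = 1, m = 1.
rng = np.random.default_rng(1)
n, d, lam = 20, 5, 0.1
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def P(w):
    return 0.5 * np.mean((A @ w - y) ** 2) + 0.5 * lam * (w @ w)

p = 1.0 / n                                    # uniform serial sampling
v = np.sum(A * A, axis=1)                      # v_i = ||A_i||^2
theta = np.min(p * n * lam / (v + n * lam))    # stepsize bound (10) with l_i = 1

alpha = np.zeros(n)                            # alpha_i^(0) = 0  =>  w^(0) = 0
w = np.zeros(d)
for _ in range(300 * n):                       # 300 passes over the data
    i = rng.integers(n)
    g = (A[i] @ w - y[i]) + alpha[i]           # grad phi_i(A_i^T w) + alpha_i
    alpha[i] -= theta / p * g
    w -= theta / (n * lam * p) * g * A[i]

# Closed-form minimizer of the ridge objective, for comparison.
w_star = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ y / n)
```

With these constants the theoretical rate e^{−θt} drives the potential down by tens of orders of magnitude over 300 passes, so w should essentially coincide with w_star.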
References

[1] Herbert Robbins and Sutton Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22(3):400–407, 1951.
[2] Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic average gradient. arXiv:1309.2388, 2013.
[3] Shai Shalev-Shwartz and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss. Journal of Machine Learning Research, 14(1):567–599, 2013.
[4] Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In NIPS, 2013.
[5] Jakub Konečný and Peter Richtárik. S2GD: Semi-stochastic gradient descent methods. arXiv:1312.1666, 2013.
[6] Jakub Konečný, Jie Liu, Peter Richtárik, and Martin Takáč. mS2GD: Mini-batch semi-stochastic gradient descent in the proximal setting. arXiv:1410.4744, 2014.
[7] Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems 27 (NIPS 2014), 2014.
[8] Jakub Konečný, Zheng Qu, and Peter Richtárik. Semi-stochastic coordinate descent. arXiv preprint, 2014.
[9] Zheng Qu, Peter Richtárik, and Tong Zhang. Randomized dual coordinate ascent with arbitrary sampling. arXiv:1411.5873, 2014.
[10] Shai Shalev-Shwartz and Tong Zhang. Accelerated mini-batch stochastic dual coordinate ascent. In Advances in Neural Information Processing Systems 26, 2013.
[11] Shai Shalev-Shwartz and Tong Zhang. Proximal stochastic dual coordinate ascent. arXiv:1211.2717, 2012.
[12] Peilin Zhao and Tong Zhang. Stochastic optimization with importance sampling. ICML, 2015.
[13] Shai Shalev-Shwartz and Tong Zhang. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. To appear in Mathematical Programming.
[14] Dominik Csiba, Zheng Qu, and Peter Richtárik. Stochastic dual coordinate ascent with adaptive probabilities. ICML, 2015.
[15] Zheng Qu, Peter Richtárik, Martin Takáč, and Olivier Fercoq. Stochastic dual Newton ascent for empirical risk minimization. arXiv:1502.02268, 2015.
[16] Peter Richtárik and Martin Takáč. On optimal probabilities in stochastic coordinate descent methods. arXiv:1310.3438, 2013.
[17] Zheng Qu and Peter Richtárik. Coordinate descent methods with arbitrary sampling I: Algorithms and complexity. arXiv:1412.8060, 2014.
[18] Shai Shalev-Shwartz. SDCA without duality. CoRR, abs/1502.06177, 2015.
[19] Zheng Qu and Peter Richtárik. Coordinate descent with arbitrary sampling II: Expected separable overapproximation. arXiv:1412.8063, 2014.
[20] Peter Richtárik and Martin Takáč. Distributed coordinate descent method for learning with big data. arXiv:1310.2059, 2013.
[21] Peter Richtárik and Martin Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 144(1–2):1–38, 2014.
he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s
More informationCIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M
CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016
U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and
More informationCS : Algorithms and Uncertainty Lecture 14 Date: October 17, 2016
CS 294-128: Algorthms and Uncertanty Lecture 14 Date: October 17, 2016 Instructor: Nkhl Bansal Scrbe: Antares Chen 1 Introducton In ths lecture, we revew results regardng follow the regularzed leader (FTRL.
More informationOn a direct solver for linear least squares problems
ISSN 2066-6594 Ann. Acad. Rom. Sc. Ser. Math. Appl. Vol. 8, No. 2/2016 On a drect solver for lnear least squares problems Constantn Popa Abstract The Null Space (NS) algorthm s a drect solver for lnear
More informationLossy Compression. Compromise accuracy of reconstruction for increased compression.
Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationOn an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1
On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool
More informationSome modelling aspects for the Matlab implementation of MMA
Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton
More informationA PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS
HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,
More informationWinter 2008 CS567 Stochastic Linear/Integer Programming Guest Lecturer: Xu, Huan
Wnter 2008 CS567 Stochastc Lnear/Integer Programmng Guest Lecturer: Xu, Huan Class 2: More Modelng Examples 1 Capacty Expanson Capacty expanson models optmal choces of the tmng and levels of nvestments
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationSome Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)
Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998
More informationMACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression
11 MACHINE APPLIED MACHINE LEARNING LEARNING MACHINE LEARNING Gaussan Mture Regresson 22 MACHINE APPLIED MACHINE LEARNING LEARNING Bref summary of last week s lecture 33 MACHINE APPLIED MACHINE LEARNING
More informationSolutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution.
Solutons HW #2 Dual of general LP. Fnd the dual functon of the LP mnmze subject to c T x Gx h Ax = b. Gve the dual problem, and make the mplct equalty constrants explct. Soluton. 1. The Lagrangan s L(x,
More informationLOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin
Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence
More informationComputing Correlated Equilibria in Multi-Player Games
Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationRandomness and Computation
Randomness and Computaton or, Randomzed Algorthms Mary Cryan School of Informatcs Unversty of Ednburgh RC 208/9) Lecture 0 slde Balls n Bns m balls, n bns, and balls thrown unformly at random nto bns usually
More informationIV. Performance Optimization
IV. Performance Optmzaton A. Steepest descent algorthm defnton how to set up bounds on learnng rate mnmzaton n a lne (varyng learnng rate) momentum learnng examples B. Newton s method defnton Gauss-Newton
More informationA Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach
A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland
More informationOutline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique
Outlne and Readng Dynamc Programmng The General Technque ( 5.3.2) -1 Knapsac Problem ( 5.3.3) Matrx Chan-Product ( 5.3.1) Dynamc Programmng verson 1.4 1 Dynamc Programmng verson 1.4 2 Dynamc Programmng
More informationSolutions to exam in SF1811 Optimization, Jan 14, 2015
Solutons to exam n SF8 Optmzaton, Jan 4, 25 3 3 O------O -4 \ / \ / The network: \/ where all lnks go from left to rght. /\ / \ / \ 6 O------O -5 2 4.(a) Let x = ( x 3, x 4, x 23, x 24 ) T, where the varable
More informationLecture 20: November 7
0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:
More informationMLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012
MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:
More informationChapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems
Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons
More informationOnline Classification: Perceptron and Winnow
E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng
More informationVARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES
VARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES BÂRZĂ, Slvu Faculty of Mathematcs-Informatcs Spru Haret Unversty barza_slvu@yahoo.com Abstract Ths paper wants to contnue
More informationVector Norms. Chapter 7 Iterative Techniques in Matrix Algebra. Cauchy-Bunyakovsky-Schwarz Inequality for Sums. Distances. Convergence.
Vector Norms Chapter 7 Iteratve Technques n Matrx Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematcs Unversty of Calforna, Berkeley Math 128B Numercal Analyss Defnton A vector norm
More information1 The Mistake Bound Model
5-850: Advanced Algorthms CMU, Sprng 07 Lecture #: Onlne Learnng and Multplcatve Weghts February 7, 07 Lecturer: Anupam Gupta Scrbe: Bryan Lee,Albert Gu, Eugene Cho he Mstake Bound Model Suppose there
More information3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X
Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number
More informationRandomized block proximal damped Newton method for composite self-concordant minimization
Randomzed block proxmal damped Newton method for composte self-concordant mnmzaton Zhaosong Lu June 30, 2016 Revsed: March 28, 2017 Abstract In ths paper we consder the composte self-concordant CSC mnmzaton
More informationMATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)
1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons
More informationAn efficient algorithm for multivariate Maclaurin Newton transformation
Annales UMCS Informatca AI VIII, 2 2008) 5 14 DOI: 10.2478/v10065-008-0020-6 An effcent algorthm for multvarate Maclaurn Newton transformaton Joanna Kapusta Insttute of Mathematcs and Computer Scence,
More informationBoostrapaggregating (Bagging)
Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod
More informationConvergence of random processes
DS-GA 12 Lecture notes 6 Fall 216 Convergence of random processes 1 Introducton In these notes we study convergence of dscrete random processes. Ths allows to characterze phenomena such as the law of large
More informationprinceton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora
prnceton unv. F 13 cos 521: Advanced Algorthm Desgn Lecture 3: Large devatons bounds and applcatons Lecturer: Sanjeev Arora Scrbe: Today s topc s devaton bounds: what s the probablty that a random varable
More informationNeural networks. Nuno Vasconcelos ECE Department, UCSD
Neural networs Nuno Vasconcelos ECE Department, UCSD Classfcaton a classfcaton problem has two types of varables e.g. X - vector of observatons (features) n the world Y - state (class) of the world x X
More informationApplication of B-Spline to Numerical Solution of a System of Singularly Perturbed Problems
Mathematca Aeterna, Vol. 1, 011, no. 06, 405 415 Applcaton of B-Splne to Numercal Soluton of a System of Sngularly Perturbed Problems Yogesh Gupta Department of Mathematcs Unted College of Engneerng &
More informationCSE 546 Midterm Exam, Fall 2014(with Solution)
CSE 546 Mdterm Exam, Fall 014(wth Soluton) 1. Personal nfo: Name: UW NetID: Student ID:. There should be 14 numbered pages n ths exam (ncludng ths cover sheet). 3. You can use any materal you brought:
More informationThe Expectation-Maximization Algorithm
The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.
More informationLinear Approximation with Regularization and Moving Least Squares
Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...
More informationLecture 11. minimize. c j x j. j=1. 1 x j 0 j. +, b R m + and c R n +
Topcs n Theoretcal Computer Scence May 4, 2015 Lecturer: Ola Svensson Lecture 11 Scrbes: Vncent Eggerlng, Smon Rodrguez 1 Introducton In the last lecture we covered the ellpsod method and ts applcaton
More informationLecture 3. Ax x i a i. i i
18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest
More informationNotes on Frequency Estimation in Data Streams
Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informationChapter - 2. Distribution System Power Flow Analysis
Chapter - 2 Dstrbuton System Power Flow Analyss CHAPTER - 2 Radal Dstrbuton System Load Flow 2.1 Introducton Load flow s an mportant tool [66] for analyzng electrcal power system network performance. Load
More informationThe Order Relation and Trace Inequalities for. Hermitian Operators
Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence
More informationInexact Newton Methods for Inverse Eigenvalue Problems
Inexact Newton Methods for Inverse Egenvalue Problems Zheng-jan Ba Abstract In ths paper, we survey some of the latest development n usng nexact Newton-lke methods for solvng nverse egenvalue problems.
More informationA new Approach for Solving Linear Ordinary Differential Equations
, ISSN 974-57X (Onlne), ISSN 974-5718 (Prnt), Vol. ; Issue No. 1; Year 14, Copyrght 13-14 by CESER PUBLICATIONS A new Approach for Solvng Lnear Ordnary Dfferental Equatons Fawz Abdelwahd Department of
More informationHidden Markov Models
CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte
More informationThe Feynman path integral
The Feynman path ntegral Aprl 3, 205 Hesenberg and Schrödnger pctures The Schrödnger wave functon places the tme dependence of a physcal system n the state, ψ, t, where the state s a vector n Hlbert space
More information