On Optimal Probabilities in Stochastic Coordinate Descent Methods
Peter Richtárik and Martin Takáč
University of Edinburgh, United Kingdom
October 2013

Abstract

We propose and analyze a new parallel coordinate descent method, NSync, in which at each iteration a random subset of coordinates is updated, in parallel, allowing for the subsets to be chosen non-uniformly. We derive convergence rates under a strong convexity assumption, and comment on how to assign probabilities to the sets to optimize the bound. The complexity and practical performance of the method can outperform its uniform variant by an order of magnitude. Surprisingly, the strategy of updating a single randomly selected coordinate per iteration with optimal probabilities may require fewer iterations, both in theory and in practice, than the strategy of updating all coordinates at every iteration.

1 Introduction

In this work we consider the optimization problem

    $\min_{x \in \mathbb{R}^n} \varphi(x)$,    (1)

where $\varphi$ is strongly convex and smooth. We propose a new algorithm, and call it NSync (Nonuniform SYNchronous coordinate descent).

Algorithm 1 (NSync)
Input: initial point $x^0 \in \mathbb{R}^n$, subset probabilities $\{p_S\}$ and stepsize parameters $w_1, \dots, w_n > 0$
for $k = 0, 1, 2, \dots$ do
    Select a random set of coordinates $\hat{S} \subseteq \{1, \dots, n\}$ such that $\mathrm{Prob}(\hat{S} = S) = p_S$
    Update the selected coordinates: $x^{k+1} = x^k - \sum_{i \in \hat{S}} \frac{1}{w_i} \nabla_i \varphi(x^k) e_i$
end for

In NSync, we first assign a probability $p_S \geq 0$ to every subset $S$ of $[n] := \{1, \dots, n\}$, with $\sum_S p_S = 1$, and pick stepsize parameters $w_i > 0$, $i = 1, 2, \dots, n$. At every iteration, a random set $\hat{S}$ is generated, independently from previous iterations, following the law $\mathrm{Prob}(\hat{S} = S) = p_S$, and then the coordinates $i \in \hat{S}$ are updated in parallel by moving in the direction of the negative partial derivative with stepsize $1/w_i$. The updates are synchronized: no processor/thread is allowed to proceed before all updates are applied, generating the new iterate $x^{k+1}$. We specifically study samplings $\hat{S}$ which are non-uniform in the sense that $p_i := \mathrm{Prob}(i \in \hat{S}) = \sum_{S : i \in S} p_S$ is allowed to vary with $i$. By $\nabla_i \varphi(x)$ we mean $\langle \nabla \varphi(x), e_i \rangle$, where $e_i \in \mathbb{R}^n$ is the $i$-th unit coordinate vector.
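To make the update rule concrete, here is a minimal sketch of Algorithm 1 in Python on a toy separable quadratic. The instance, the family of subsets, and the stepsizes are our own illustrative choices, not part of the paper; for this quadratic the natural choice $w_i = L_i$ makes each sampled coordinate jump straight to its minimizer.

```python
import random

# Hypothetical toy instance: phi(x) = 0.5 * sum(a_i * x_i^2), a separable
# strongly convex quadratic, so grad_i phi(x) = a_i * x_i.
a = [4.0, 1.0, 0.25]
n = len(a)

def grad_i(x, i):
    return a[i] * x[i]

def nsync(x0, subsets, probs, w, iters, seed=0):
    """One reading of Algorithm 1: draw a subset S with Prob(S) = probs[S],
    then update the coordinates in S 'in parallel' (sequentially here, which
    is equivalent since the updates touch disjoint coordinates)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        S = rng.choices(subsets, weights=probs)[0]
        g = {i: grad_i(x, i) for i in S}   # read all partials first (synchronous)
        for i in S:
            x[i] -= g[i] / w[i]            # stepsize 1/w_i per coordinate
    return x

# Uniform sampling over singletons, with stepsize parameters w_i = a_i.
subsets = [(0,), (1,), (2,)]
x = nsync([1.0, 1.0, 1.0], subsets, [1 / 3, 1 / 3, 1 / 3], w=a, iters=200)
phi = 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))
```

After enough iterations every coordinate has almost surely been sampled at least once, so $\varphi(x^k)$ reaches the minimum value $0$.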
Literature. Serial stochastic coordinate descent methods were proposed and analyzed in [6, 13, 15, 18], and more recently in various settings in [12, 7, 8, 9, 21, 19, 24, 3]. Parallel methods were considered in [2, 16, 14], and more recently in [22, 5, 23, 4, 11, 20, 10, 1]. A memory distributed method scaling to big data problems was recently developed in [17]. A nonuniform coordinate
descent method updating a single coordinate at a time was proposed in [15], and one updating two coordinates at a time in [12]. To the best of our knowledge, NSync is the first nonuniform parallel coordinate descent method.

2 Analysis

Our analysis of NSync is based on two assumptions. The first assumption generalizes the ESO concept introduced in [16], and later used in [22, 23, 5, 4, 17], to nonuniform samplings. The second assumption requires that $\varphi$ be strongly convex.

Notation: For $x, y, u \in \mathbb{R}^n$ we write $\|x\|_u^2 := \sum_i u_i x_i^2$, $\langle x, y \rangle_u := \sum_{i=1}^n u_i x_i y_i$, $x \circ y := (x_1 y_1, \dots, x_n y_n)$ and $u^{-1} := (1/u_1, \dots, 1/u_n)$. For $S \subseteq [n]$ and $h \in \mathbb{R}^n$, let $h_{[S]} := \sum_{i \in S} h_i e_i$.

Assumption 1 (Nonuniform ESO: Expected Separable Overapproximation). Assume $p = (p_1, \dots, p_n)^T > 0$ and that for some positive vector $w \in \mathbb{R}^n$ and all $x, h \in \mathbb{R}^n$,

    $\mathrm{E}[\varphi(x + h_{[\hat{S}]})] \leq \varphi(x) + \langle \nabla \varphi(x), h \rangle_p + \tfrac{1}{2} \|h\|_{p \circ w}^2$.    (2)

Inequalities of type (2), in the uniform case ($p_i = p_j$ for all $i, j$), were studied in [16, 22, 5, 17].

Assumption 2 (Strong convexity). We assume that $\varphi$ is $\gamma$-strongly convex with respect to the norm $\|\cdot\|_v$, where $v = (v_1, \dots, v_n)^T > 0$ and $\gamma > 0$. That is, we require that for all $x, h \in \mathbb{R}^n$,

    $\varphi(x + h) \geq \varphi(x) + \langle \nabla \varphi(x), h \rangle + \tfrac{\gamma}{2} \|h\|_v^2$.    (3)

We can now establish a bound on the number of iterations sufficient for NSync to approximately solve (1) with high probability.

Theorem 3. Let Assumptions 1 and 2 be satisfied. Choose $x^0 \in \mathbb{R}^n$, $0 < \epsilon < \varphi(x^0) - \varphi^*$ and $0 < \rho < 1$, where $\varphi^* := \min_x \varphi(x)$. Let

    $\Lambda := \max_i \frac{w_i}{p_i v_i}$.    (4)

If $\{x^k\}$ are the random iterates generated by NSync, then

    $K \geq \frac{\Lambda}{\gamma} \log\left(\frac{\varphi(x^0) - \varphi^*}{\epsilon \rho}\right) \quad \Rightarrow \quad \mathrm{Prob}(\varphi(x^K) - \varphi^* \leq \epsilon) \geq 1 - \rho$.    (5)

Moreover, we have the lower bound $\Lambda \geq (\sum_i w_i / v_i) / \mathrm{E}[|\hat{S}|]$.

Proof. We first claim that $\varphi$ is $\mu$-strongly convex with respect to the norm $\|\cdot\|_{w \circ p^{-1}}$, i.e.,

    $\varphi(x + h) \geq \varphi(x) + \langle \nabla \varphi(x), h \rangle + \tfrac{\mu}{2} \|h\|_{w \circ p^{-1}}^2$,    (6)

where $\mu := \gamma / \Lambda$. Indeed, this follows by comparing (3) and (6) in the light of (4). Let $x^*$ be such that $\varphi(x^*) = \varphi^*$. Using (6) with $h = x^* - x$,

    $\varphi^* - \varphi(x) \overset{(6)}{\geq} \min_{h \in \mathbb{R}^n} \langle \nabla \varphi(x), h \rangle + \tfrac{\mu}{2} \|h\|_{w \circ p^{-1}}^2 = -\tfrac{1}{2\mu} \|\nabla \varphi(x)\|_{p \circ w^{-1}}^2$.    (7)

Let $h^k := -(\mathrm{Diag}(w))^{-1} \nabla \varphi(x^k)$.
Then $x^{k+1} = x^k + (h^k)_{[\hat{S}]}$, and utilizing Assumption 1, we get

    $\mathrm{E}[\varphi(x^{k+1}) \mid x^k] = \mathrm{E}[\varphi(x^k + (h^k)_{[\hat{S}]})] \overset{(2)}{\leq} \varphi(x^k) + \langle \nabla \varphi(x^k), h^k \rangle_p + \tfrac{1}{2} \|h^k\|_{p \circ w}^2$    (8)
    $= \varphi(x^k) - \tfrac{1}{2} \|\nabla \varphi(x^k)\|_{p \circ w^{-1}}^2 \overset{(7)}{\leq} \varphi(x^k) - \mu (\varphi(x^k) - \varphi^*)$.    (9)

Taking expectations in the last inequality and rearranging the terms, we obtain $\mathrm{E}[\varphi(x^{k+1}) - \varphi^*] \leq (1 - \mu) \mathrm{E}[\varphi(x^k) - \varphi^*] \leq (1 - \mu)^{k+1} (\varphi(x^0) - \varphi^*)$. Using this, the Markov inequality, and the definition of $K$, we finally get

    $\mathrm{Prob}(\varphi(x^K) - \varphi^* > \epsilon) \leq \mathrm{E}[\varphi(x^K) - \varphi^*] / \epsilon \leq (1 - \mu)^K (\varphi(x^0) - \varphi^*) / \epsilon \leq \rho$.

Let us now establish the last claim. First, note that (see [16, Sec 3.2] for more results of this type)

    $\sum_i p_i = \sum_i \sum_{S : i \in S} p_S = \sum_S p_S |S| = \mathrm{E}[|\hat{S}|]$.    (10)

Letting $\Delta := \{p \in \mathbb{R}^n : p \geq 0, \ \sum_i p_i = \mathrm{E}[|\hat{S}|]\}$, we have

    $\Lambda \overset{(4)+(10)}{\geq} \min_{p \in \Delta} \max_i \frac{w_i}{p_i v_i} = \frac{\sum_i w_i / v_i}{\mathrm{E}[|\hat{S}|]}$,

where the last equality follows since the optimal $p_i$ is proportional to $w_i / v_i$.
Theorem 3 is generic in the sense that we do not say when Assumptions 1 and 2 are satisfied, nor how one should go about choosing the stepsizes $w_i$ and probabilities $\{p_S\}$. In the next section we address these issues. On the other hand, this abstract setting allowed us to write a brief complexity proof.

Change of variables. Consider the change of variables $y = \mathrm{Diag}(d) x$, where $d > 0$. Defining $\varphi^d(y) := \varphi(x)$, we get $\nabla \varphi^d(y) = (\mathrm{Diag}(d))^{-1} \nabla \varphi(x)$. It can be seen that (2), (3) can equivalently be written in terms of $\varphi^d$, with $w_i$ replaced by $w_i^d := w_i / d_i^2$ and $v_i$ replaced by $v_i^d := v_i / d_i^2$. By choosing $d_i = \sqrt{v_i}$, we obtain $v_i^d = 1$ for all $i$, recovering standard strong convexity.

3 Nonuniform samplings and ESO

Consider now problem (1) with $\varphi$ of the form

    $\varphi(x) := f(x) + \tfrac{\gamma}{2} \|x\|_v^2$,    (11)

where $v > 0$. Note that Assumption 2 is satisfied. We further make the following two assumptions.

Assumption 4 (Smoothness). $f$ has Lipschitz gradient with respect to the coordinates, with positive constants $L_1, \dots, L_n$. That is, $|\nabla_i f(x + t e_i) - \nabla_i f(x)| \leq L_i |t|$ for all $x \in \mathbb{R}^n$ and $t \in \mathbb{R}$.

Assumption 5 (Partial separability). $f(x) = \sum_{J \in \mathcal{J}} f_J(x)$, where $\mathcal{J}$ is a finite collection of nonempty subsets of $[n]$ and the $f_J$ are differentiable convex functions such that $f_J$ depends on the coordinates $i \in J$ only. Let $\omega := \max_J |J|$. We say that $f$ is separable of degree $\omega$.

Uniform parallel coordinate descent methods for regularized problems with $f$ of the above structure were analyzed in [16].

Example 6. Let $f(x) = \tfrac{1}{2} \|Ax - b\|_2^2$, where $A \in \mathbb{R}^{m \times n}$. Then $L_i = \|A_{:i}\|_2^2$ and $f(x) = \tfrac{1}{2} \sum_{j=1}^m (A_{j:} x - b_j)^2$, whence $\omega$ is the maximum number of nonzeros in a row of $A$.

Nonuniform sampling. Instead of considering the general case of arbitrary probabilities $p_S$ assigned to all subsets of $[n]$, here we consider a special kind of sampling having two advantages: (i) the sets can be generated easily, (ii) it leads to larger stepsizes $1/w_i$ and hence an improved convergence rate. Fix $\tau \in [n]$ and $c$, and let $S_1, \dots, S_c$ be a collection of (possibly overlapping) subsets of $[n]$ such that $|S_j| \geq \tau$ for all $j$ and $\cup_{j=1}^c S_j = [n]$. Moreover, let $q = (q_1, \dots, q_c) > 0$ be a probability vector.
Let $\hat{S}_j$ be a $\tau$-nice sampling from $S_j$; that is, $\hat{S}_j$ picks subsets of $S_j$ having cardinality $\tau$, uniformly at random. We assume these samplings are independent. Now, $\hat{S}$ is generated as follows. We first pick $j \in \{1, \dots, c\}$ with probability $q_j$, and then draw $\hat{S}_j$. Note that we do not need to compute the quantities $p_S$, $S \subseteq [n]$, to execute NSync. In fact, it is much easier to implement the sampling via the two-tier procedure explained above. The sampling $\hat{S}$ is a nonuniform variant of the $\tau$-nice sampling studied in [16], which here arises as a special case for $c = 1$. Note that

    $p_i = \sum_{j=1}^c q_j \frac{\tau}{|S_j|} \delta_{ij} > 0, \quad i \in [n]$,    (12)

where $\delta_{ij} = 1$ if $i \in S_j$, and $\delta_{ij} = 0$ otherwise.

Theorem 7. Let Assumptions 4 and 5 be satisfied, and let $\hat{S}$ be the sampling described above. Then Assumption 1 is satisfied with $p$ given by (12) and any $w = (w_1, \dots, w_n)^T$ for which $w_i \geq w_i^*$, where

    $w_i^* := \frac{L_i + v_i}{p_i} \sum_{j=1}^c q_j \frac{\tau}{|S_j|} \delta_{ij} \left(1 + \frac{(\tau - 1)(\omega_j - 1)}{\max\{1, |S_j| - 1\}}\right), \quad i \in [n]$,    (13)

and $\omega_j := \max_{J \in \mathcal{J}} |J \cap S_j| \leq \omega$.

Proof. Since $f$ is separable of degree $\omega$, so is $\varphi$ (because $\tfrac{1}{2} \|x\|_v^2$ is separable). Now,

    $\mathrm{E}[\varphi(x + h_{[\hat{S}]})] = \mathrm{E}[\mathrm{E}[\varphi(x + h_{[\hat{S}_j]}) \mid j]] = \sum_{j=1}^c q_j \mathrm{E}[\varphi(x + h_{[\hat{S}_j]})]$    (14)
    $\leq \sum_{j=1}^c q_j \left\{ \varphi(x) + \frac{\tau}{|S_j|} \left( \langle \nabla \varphi(x), h_{[S_j]} \rangle + \frac{1}{2} \left(1 + \frac{(\tau - 1)(\omega_j - 1)}{\max\{1, |S_j| - 1\}}\right) \|h_{[S_j]}\|_{L + v}^2 \right) \right\}$,    (15)

where the last inequality follows from the ESO for $\tau$-nice samplings established in [16]. The claim now follows by comparing the above expression with (2).
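The two-tier sampling and the marginal probabilities (12) can be sketched as follows; the sets $S_j$, the vector $q$, and the trial count are illustrative assumptions of ours, and the empirical frequencies are compared against formula (12).

```python
import random

# Two-tier sampling: the sets S_j may overlap and must cover [n]; Shat is
# drawn by first picking j ~ q, then a tau-nice (uniformly random
# cardinality-tau) subset of S_j.
n, tau = 4, 2
S = [(0, 1, 2), (2, 3)]   # S_1, S_2: cover {0,...,3}, |S_j| >= tau
q = [0.6, 0.4]            # probability vector over the groups

def draw(rng):
    j = rng.choices(range(len(S)), weights=q)[0]
    return tuple(sorted(rng.sample(S[j], tau)))

# Marginal probabilities from formula (12): p_i = sum_j q_j * (tau/|S_j|) * delta_ij.
p = [sum(qj * tau / len(Sj) for qj, Sj in zip(q, S) if i in Sj)
     for i in range(n)]

# Empirical check that the drawn marginals match (12).
rng = random.Random(1)
trials = 200_000
counts = [0] * n
for _ in range(trials):
    for i in draw(rng):
        counts[i] += 1
emp = [c / trials for c in counts]
```

Note that $\sum_i p_i = \tau$, consistent with identity (10), since every draw updates exactly $\tau$ coordinates.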
4 Optimal probabilities

Observe that formula (13) can be used to design a sampling (i.e., to choose the sets $S_j$ and the probabilities $q_j$) that minimizes $\Lambda$, which in view of Theorem 3 optimizes the convergence rate of the method.

Serial setting. Consider the serial version of NSync ($\mathrm{Prob}(|\hat{S}| = 1) = 1$). We can model this via $c = n$, with $S_i = \{i\}$ and $p_i = q_i$ for all $i \in [n]$. In this case, using (12) and (13), we get $w_i = w_i^* = L_i + v_i$. Minimizing $\Lambda$ in (4) over the probability vector $p$ gives the optimal probabilities (we refer to this as the optimal serial method) and optimal complexity

    $p_i^* = \frac{(L_i + v_i)/v_i}{\sum_j (L_j + v_j)/v_j}, \quad i \in [n], \qquad \Lambda_{OS} = \sum_i \frac{L_i + v_i}{v_i} = n + \sum_i \frac{L_i}{v_i}$,    (16)

respectively. Note that uniform sampling, $p_i = 1/n$ for all $i$, leads to $\Lambda_{US} := n + n \max_j L_j / v_j$ (we call this the uniform serial method), which can be much larger than $\Lambda_{OS}$. Moreover, under the change of variables $y = \mathrm{Diag}(d) x$, the gradient of $f^d(y) := f(\mathrm{Diag}(d^{-1}) y)$ has coordinate Lipschitz constants $L_i^d = L_i / d_i^2$, while the weights in (11) change to $v_i^d = v_i / d_i^2$. Hence, the condition numbers $L_i / v_i$ cannot be improved via such a change of variables.

The optimal serial method can be faster than the fully parallel method. To model the fully parallel setting (i.e., the variant of NSync updating all coordinates at every iteration), we can set $c = 1$ and $\tau = n$, which yields $\Lambda_{FP} = \omega + \omega \max_j L_j / v_j$. Since $\omega \leq n$, it is clear that $\Lambda_{FP} \leq \Lambda_{US}$. However, for large enough $\omega$ it will be the case that $\Lambda_{OS} \leq \Lambda_{FP}$, implying, surprisingly, that the optimal serial method can be faster than the fully parallel method.

Parallel setting. Fix $\tau$ and the sets $S_j$, $j = 1, 2, \dots, c$, and define $\theta := \max_j \left(1 + \frac{(\tau - 1)(\omega_j - 1)}{\max\{1, |S_j| - 1\}}\right)$. Consider running NSync with stepsizes $w_i = \theta (L_i + v_i)$ (note that $w_i \geq w_i^*$, so we are fine). From (4), (12) and (13) we see that the complexity of NSync is determined by

    $\Lambda = \max_i \frac{w_i}{p_i v_i} = \frac{\theta}{\tau} \max_i \frac{1 + L_i / v_i}{\sum_{j=1}^c q_j \delta_{ij} / |S_j|}$.
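A quick numerical illustration of the serial constants (the problem sizes, $L$, $v$, and $\omega$ below are our own assumptions): with one badly conditioned coordinate, $\Lambda_{OS}$ from (16) can sit well below both $\Lambda_{US}$ and $\Lambda_{FP}$.

```python
# Compare Lambda_OS = n + sum_i L_i/v_i, Lambda_US = n + n*max_i L_i/v_i,
# and Lambda_FP = omega + omega*max_i L_i/v_i on assumed toy data.
n = 100
L = [1.0] * (n - 1) + [100.0]  # one coordinate with a much larger Lipschitz constant
v = [1.0] * n
omega = 50                     # assumed separability degree of f

ratios = [Li / vi for Li, vi in zip(L, v)]
lam_os = n + sum(ratios)               # 100 + 199 = 299
lam_us = n + n * max(ratios)           # 100 + 100*100 = 10100
lam_fp = omega + omega * max(ratios)   # 50 + 50*100 = 5050

# Optimal serial probabilities, formula (16).
weights = [(Li + vi) / vi for Li, vi in zip(L, v)]
total = sum(weights)
p_opt = [wi / total for wi in weights]
```

Here $\Lambda_{OS} < \Lambda_{FP} < \Lambda_{US}$, matching the claim that the optimal serial method can beat the fully parallel one when $\omega$ is large and the ratios $L_i/v_i$ are unbalanced.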
The probability vector $q$ minimizing this quantity can be computed by solving a linear program with $c + 1$ variables $(q_1, \dots, q_c, \alpha)$, $2n$ linear inequality constraints and a single linear equality constraint:

    $\max_{\alpha, q} \left\{ \alpha \ \text{ subject to } \ \alpha \leq (b^i)^T q \text{ for all } i, \ q \geq 0, \ \sum_j q_j = 1 \right\}$,

where the vectors $b^i \in \mathbb{R}^c$, $i \in [n]$, are given by $b_j^i = \frac{v_i \delta_{ij}}{(L_i + v_i) |S_j|}$.

5 Experiments

We now conduct two preliminary small-scale experiments to illustrate the theory; the results are depicted below. All experiments are with problems of the form (1), with $f$ chosen as in Example 6.

[Figure: left plot compares the uniform serial and optimal serial methods, plotting $\varphi(x^k) - \varphi^*$ against the iteration counter $k$; right plot compares the fully parallel and serial nonuniform methods, plotting $\varphi(x^k) - \varphi^*$ against epochs, for three values of $\omega$.]

In the left plot we chose $A \in \mathbb{R}^{2 \times 30}$, $\gamma = 1$, $v_1 = 0.05$, $v_i = 1$ for $i > 1$, and $L_i = 1$ for all $i$. We compare the US method ($p_i = 1/n$, blue) with the OS method ($p_i$ given by (16), red). The dashed lines show 95% confidence intervals (we ran the methods 100 times; the line in the middle is the average behavior). While OS can be faster, it is sensitive to over/under-estimation of the constants $L_i, v_i$. In the right plot we show that a nonuniform serial (NS) method can be faster than the fully parallel (FP) variant (we have chosen $m = 8$, $n = 10$ and three values of $\omega$). On the horizontal axis we display the number of epochs, where one epoch corresponds to updating $n$ coordinates (for FP this is a single iteration, whereas for NS it corresponds to $n$ iterations).
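Returning to the choice of $q$ in Section 4: for small $c$ the max-min linear program can be approximated by a brute-force sweep over the simplex, which is enough to see the effect. The data below ($n$, $\tau$, the sets $S_j$, $L$, $v$) are our own toy assumptions, and we drop the constant factor $\theta/\tau$, which does not depend on $q$.

```python
# Brute-force sketch of minimizing Lambda over q for c = 2, so q = (t, 1 - t).
n, tau = 4, 2
S = [(0, 1, 2), (2, 3)]
L = [1.0, 1.0, 1.0, 10.0]  # the hard coordinate (i = 3) lives only in S_2
v = [1.0] * n

def lam(q):
    """max_i (1 + L_i/v_i) / (sum_j q_j * delta_ij / |S_j|),
    i.e. Lambda up to the q-independent factor theta/tau."""
    worst = 0.0
    for i in range(n):
        denom = sum(qj / len(Sj) for qj, Sj in zip(q, S) if i in Sj)
        worst = max(worst, (1 + L[i] / v[i]) / denom)
    return worst

grid = [k / 1000 for k in range(1, 1000)]
best_t = min(grid, key=lambda t: lam((t, 1 - t)))
best = lam((best_t, 1 - best_t))
uniform = lam((0.5, 0.5))
```

For this instance the sweep pushes mass toward the group containing the badly conditioned coordinate: the optimum balances $6/q_1 = 22/q_2$, giving $q \approx (3/14, 11/14)$ and a constant of about $28$, versus $44$ for the uniform choice $q = (0.5, 0.5)$.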
References

[1] Y. Bian, X. Li, and Y. Liu. Parallel coordinate descent Newton for large-scale L1-regularized minimization. arXiv:1306.4080.
[2] J. Bradley, A. Kyrola, D. Bickson, and C. Guestrin. Parallel coordinate descent for L1-regularized loss minimization. In ICML, 2011.
[3] C. D. Dang and G. Lan. Stochastic block mirror descent methods for nonsmooth and stochastic optimization. Technical report, Georgia Institute of Technology, 2013.
[4] O. Fercoq. Parallel coordinate descent for the AdaBoost problem. In ICMLA, 2013.
[5] O. Fercoq and P. Richtárik. Smooth minimization of nonsmooth functions with parallel coordinate descent methods. arXiv preprint, 2013.
[6] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In ICML, 2008.
[7] S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher. Block-coordinate Frank-Wolfe optimization for structural SVMs. In ICML, 2013.
[8] Z. Lu and L. Xiao. On the complexity analysis of randomized block-coordinate descent methods. arXiv preprint, 2013.
[9] Z. Lu and L. Xiao. Randomized block coordinate non-monotone gradient methods for a class of nonlinear programming. arXiv preprint, 2013.
[10] I. Mukherjee, Y. Singer, R. Frongillo, and K. Canini. Parallel boosting with momentum. In ECML, 2013.
[11] I. Necoara and D. Clipici. Efficient parallel coordinate descent algorithm for convex optimization problems with separable constraints: application to distributed MPC. Journal of Process Control, 23, 2013.
[12] I. Necoara, Yu. Nesterov, and F. Glineur. Efficiency of randomized coordinate descent methods on optimization problems with linearly coupled constraints. Technical report, 2012.
[13] Yu. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341-362, 2012.
[14] P. Richtárik and M. Takáč. Efficient serial and parallel coordinate descent methods for huge-scale truss topology design. In Operations Research Proceedings. Springer, 2012.
[15] P. Richtárik and M. Takáč.
Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 2012.
[16] P. Richtárik and M. Takáč. Parallel coordinate descent methods for big data optimization. arXiv preprint, 2012.
[17] P. Richtárik and M. Takáč. Distributed coordinate descent method for learning with big data. arXiv preprint, 2013.
[18] S. Shalev-Shwartz and A. Tewari. Stochastic methods for L1-regularized loss minimization. JMLR, 12, 2011.
[19] S. Shalev-Shwartz and T. Zhang. Proximal stochastic dual coordinate ascent. arXiv:1211.2717, 2012.
[20] S. Shalev-Shwartz and T. Zhang. Accelerated mini-batch stochastic dual coordinate ascent. arXiv preprint, May 2013.
[21] S. Shalev-Shwartz and T. Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. JMLR, 14, 2013.
[22] M. Takáč, A. Bijral, P. Richtárik, and N. Srebro. Mini-batch primal and dual methods for SVMs. In ICML, 2013.
[23] R. Tappenden, P. Richtárik, and B. Büke. Separable approximations and decomposition methods for the augmented Lagrangian. arXiv preprint, 2013.
[24] R. Tappenden, P. Richtárik, and J. Gondzio. Inexact coordinate descent: complexity and preconditioning. arXiv preprint, 2013.
More informationAnother converse of Jensen s inequality
Another converse of Jensen s nequalty Slavko Smc Abstract. We gve the best possble global bounds for a form of dscrete Jensen s nequalty. By some examples ts frutfulness s shown. 1. Introducton Throughout
More information6.854J / J Advanced Algorithms Fall 2008
MIT OpenCourseWare http://ocw.mt.edu 6.854J / 18.415J Advanced Algorthms Fall 2008 For nformaton about ctng these materals or our Terms of Use, vst: http://ocw.mt.edu/terms. 18.415/6.854 Advanced Algorthms
More informationIV. Performance Optimization
IV. Performance Optmzaton A. Steepest descent algorthm defnton how to set up bounds on learnng rate mnmzaton n a lne (varyng learnng rate) momentum learnng examples B. Newton s method defnton Gauss-Newton
More informationMaximizing Overlap of Large Primary Sampling Units in Repeated Sampling: A comparison of Ernst s Method with Ohlsson s Method
Maxmzng Overlap of Large Prmary Samplng Unts n Repeated Samplng: A comparson of Ernst s Method wth Ohlsson s Method Red Rottach and Padrac Murphy 1 U.S. Census Bureau 4600 Slver Hll Road, Washngton DC
More informationRandomness and Computation
Randomness and Computaton or, Randomzed Algorthms Mary Cryan School of Informatcs Unversty of Ednburgh RC 208/9) Lecture 0 slde Balls n Bns m balls, n bns, and balls thrown unformly at random nto bns usually
More informationInexact Newton Methods for Inverse Eigenvalue Problems
Inexact Newton Methods for Inverse Egenvalue Problems Zheng-jan Ba Abstract In ths paper, we survey some of the latest development n usng nexact Newton-lke methods for solvng nverse egenvalue problems.
More informationConvergence rates of proximal gradient methods via the convex conjugate
Convergence rates of proxmal gradent methods va the convex conjugate Davd H Gutman Javer F Peña January 8, 018 Abstract We gve a novel proof of the O(1/ and O(1/ convergence rates of the proxmal gradent
More information18.1 Introduction and Recap
CS787: Advanced Algorthms Scrbe: Pryananda Shenoy and Shjn Kong Lecturer: Shuch Chawla Topc: Streamng Algorthmscontnued) Date: 0/26/2007 We contnue talng about streamng algorthms n ths lecture, ncludng
More informationDesign and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm
Desgn and Optmzaton of Fuzzy Controller for Inverse Pendulum System Usng Genetc Algorthm H. Mehraban A. Ashoor Unversty of Tehran Unversty of Tehran h.mehraban@ece.ut.ac.r a.ashoor@ece.ut.ac.r Abstract:
More informationImportance Sampling for Minibatches
Importance Samplng for Mnbatches Domnk Csba and Peter Rchtárk School of Mathematcs Unversty of Ednburgh Unted Kngdom arxv:602.02283v [cs.lg] 6 Feb 206 February 9, 206 Abstract Mnbatchng s a very well studed
More informationOn a direct solver for linear least squares problems
ISSN 2066-6594 Ann. Acad. Rom. Sc. Ser. Math. Appl. Vol. 8, No. 2/2016 On a drect solver for lnear least squares problems Constantn Popa Abstract The Null Space (NS) algorthm s a drect solver for lnear
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture # 15 Scribe: Jieming Mao April 1, 2013
COS 511: heoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 15 Scrbe: Jemng Mao Aprl 1, 013 1 Bref revew 1.1 Learnng wth expert advce Last tme, we started to talk about learnng wth expert advce.
More informationOnline Classification: Perceptron and Winnow
E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng
More informationMATH 5707 HOMEWORK 4 SOLUTIONS 2. 2 i 2p i E(X i ) + E(Xi 2 ) ä i=1. i=1
MATH 5707 HOMEWORK 4 SOLUTIONS CİHAN BAHRAN 1. Let v 1,..., v n R m, all lengths v are not larger than 1. Let p 1,..., p n [0, 1] be arbtrary and set w = p 1 v 1 + + p n v n. Then there exst ε 1,..., ε
More informationInexact Variable Metric Stochastic Block-Coordinate Descent for Regularized Optimization
Inexact Varable Metrc Stochastc Block-Coordnate Descent for Regularzed Optmzaton LEE Chng-pe Department of Computer Scences Unversty of Wsconsn-Madson Madson, WI 53706, USA chng-pe@cs.wsc.edu Stephen J.
More informationInner Product. Euclidean Space. Orthonormal Basis. Orthogonal
Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationCHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE
CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng
More informationP R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /
Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons
More informationRandom Projection Algorithms for Convex Set Intersection Problems
Random Projecton Algorthms for Convex Set Intersecton Problems A. Nedć Department of Industral and Enterprse Systems Engneerng Unversty of Illnos, Urbana, IL 61801 angela@llnos.edu Abstract The focus of
More informationGeneral viscosity iterative method for a sequence of quasi-nonexpansive mappings
Avalable onlne at www.tjnsa.com J. Nonlnear Sc. Appl. 9 (2016), 5672 5682 Research Artcle General vscosty teratve method for a sequence of quas-nonexpansve mappngs Cuje Zhang, Ynan Wang College of Scence,
More informationA note on almost sure behavior of randomly weighted sums of φ-mixing random variables with φ-mixing weights
ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 7, Number 2, December 203 Avalable onlne at http://acutm.math.ut.ee A note on almost sure behavor of randomly weghted sums of φ-mxng
More informationP exp(tx) = 1 + t 2k M 2k. k N
1. Subgaussan tals Defnton. Say that a random varable X has a subgaussan dstrbuton wth scale factor σ< f P exp(tx) exp(σ 2 t 2 /2) for all real t. For example, f X s dstrbuted N(,σ 2 ) then t s subgaussan.
More informationLecture 3. Ax x i a i. i i
18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest
More information3.1 ML and Empirical Distribution
67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum
More informationStanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7
Stanford Unversty CS54: Computatonal Complexty Notes 7 Luca Trevsan January 9, 014 Notes for Lecture 7 1 Approxmate Countng wt an N oracle We complete te proof of te followng result: Teorem 1 For every
More information1 The Mistake Bound Model
5-850: Advanced Algorthms CMU, Sprng 07 Lecture #: Onlne Learnng and Multplcatve Weghts February 7, 07 Lecturer: Anupam Gupta Scrbe: Bryan Lee,Albert Gu, Eugene Cho he Mstake Bound Model Suppose there
More informationSolving Nonlinear Differential Equations by a Neural Network Method
Solvng Nonlnear Dfferental Equatons by a Neural Network Method Luce P. Aarts and Peter Van der Veer Delft Unversty of Technology, Faculty of Cvlengneerng and Geoscences, Secton of Cvlengneerng Informatcs,
More informationKernel Methods and SVMs Extension
Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general
More informationNumerical Heat and Mass Transfer
Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and
More informationLecture Space-Bounded Derandomization
Notes on Complexty Theory Last updated: October, 2008 Jonathan Katz Lecture Space-Bounded Derandomzaton 1 Space-Bounded Derandomzaton We now dscuss derandomzaton of space-bounded algorthms. Here non-trval
More information