A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning
1 A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning
Konstantin Mishchenko (KAUST), Franck Iutzeler (Univ. Grenoble Alpes), Jérôme Malick (CNRS and Univ. Grenoble Alpes), Massih Amini (Univ. Grenoble Alpes)
ICML 2018
2 >>> Distributed Learning | CONTEXT
Global objective (empirical risk minimization), with m examples, individual losses (l_j), and a regularizer g:
    min_{x ∈ R^d}  (1/m) Σ_{j=1}^m l_j(x) + g(x)
Local data: the data is split into M blocks S_1, ..., S_M stored locally (data S_i on machine i), giving
    min_{x ∈ R^d}  Σ_{i=1}^M π_i f_i(x) + g(x)
with local functions f_i(x) = (1/|S_i|) Σ_{j ∈ S_i} l_j(x) and proportions π_i = |S_i| / m.
Problem: optimization of a large sum. Variance-reduced stochastic gradient vs. mid-sized distributed optimization (this presentation).
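The decomposition above is exact: the π_i-weighted sum of the local empirical risks recovers the global one. A minimal numerical sketch (the squared losses, data, and block split are illustrative assumptions, not from the talk):

```python
import random

# Assumed toy setup: m squared losses l_j(x) = (x - a_j)^2 split into
# M = 3 blocks S_i of different sizes, with weights pi_i = |S_i| / m.
random.seed(0)
m = 10
a = [random.uniform(-1.0, 1.0) for _ in range(m)]

def l(j, x):
    """Individual loss l_j."""
    return (x - a[j]) ** 2

blocks = [list(range(0, 3)), list(range(3, 7)), list(range(7, 10))]  # S_1..S_3
pi = [len(S) / m for S in blocks]                  # proportions pi_i = |S_i|/m

def f(i, x):
    """Local function f_i(x) = (1/|S_i|) sum_{j in S_i} l_j(x)."""
    return sum(l(j, x) for j in blocks[i]) / len(blocks[i])

x = 0.4
global_risk = sum(l(j, x) for j in range(m)) / m   # (1/m) sum_j l_j(x)
weighted_local = sum(pi[i] * f(i, x) for i in range(len(blocks)))
# the two objectives agree, so minimizing one minimizes the other
```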
3 DISTRIBUTED OPTIMIZATION | ASYNCHRONISM | SCARCE COMMUNICATIONS | CONCLUSION
4 >>> Distributed Proximal Gradient | DISTRIBUTED OPTIMIZATION
Problem: min_x Σ_{i=1}^M π_i f_i(x) + g(x)
Direct extension of the proximal gradient in a master/worker setting (workers 1, ..., M hold f_1, ..., f_M):
- Worker update on the local variable (Map): x_i^{k+1/2} = x^k − γ ∇f_i(x^k) for all i = 1, .., M
- Master gathering of the local variables (Reduce): x̄^{k+1} = Σ_{i=1}^M π_i x_i^{k+1/2}
- Proximity operation: x_1^{k+1} = ... = x_M^{k+1} = prox_{γg}(x̄^{k+1}), with each worker applying prox_{γg} to the broadcast x̄^{k+1}.

Master:
    Initialize x̄ = x^0
    while not converged do
        when all workers have finished:
            Receive (x_i) from each of them
            x̄ ← Σ_{i=1}^M π_i x_i
            Broadcast x̄ to all agents
            k ← k + 1
    Interrupt all slaves
    Output x̄

Worker i:
    Initialize x_i = x̄ = x^0
    while not interrupted by master do
        Receive the most recent x̄
        z ← prox_{γg}(x̄)
        x_i ← z − γ ∇f_i(z)
        Send x_i to the master
    with f_i(x) = (1/|S_i|) Σ_{j ∈ S_i} l_j(x)
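The Map/Reduce/prox pattern above can be simulated serially on a toy instance. Everything below is an assumed illustration (scalar losses f_i(x) = ½(x − b_i)², g(x) = λ|x| so that prox_{γg} is soft-thresholding), not the authors' implementation:

```python
def soft_threshold(v, t):
    """prox of t*|.| : shrink v toward 0 by t."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

b = [1.0, 2.0, 4.0]     # hypothetical local data: f_i(x) = 0.5*(x - b_i)^2
pi = [0.2, 0.3, 0.5]    # data proportions, sum to 1
gamma, lam = 0.5, 0.1   # stepsize and l1 weight

x = 0.0                 # broadcast variable
for k in range(200):
    # Map: each worker takes one gradient step from the broadcast point
    x_half = [x - gamma * (x - b[i]) for i in range(3)]
    # Reduce: weighted average of the local variables
    x_bar = sum(p * xi for p, xi in zip(pi, x_half))
    # proximity operation on the regularizer
    x = soft_threshold(x_bar, gamma * lam)

# optimality: 0 in x - 2.8 + 0.1*sign(x), so the minimizer is x* = 2.7
```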
5 >>> Convergence and rate | DISTRIBUTED OPTIMIZATION
(Recall: the Distributed Proximal Gradient algorithm of the previous slide.)
Define time k as the number of master updates; x^k is the value of variable x at time k.
Theorem. Let each f_i be L-smooth and μ-strongly convex. Then, for γ ∈ (0, 2/(μ + L)],
    ‖x^k − x*‖² ≤ (1 − α)^k ‖x^0 − x*‖²
where x* is the unique minimizer of min_x Σ_{i=1}^M π_i f_i(x) + g(x) and α = 2γμL/(μ + L) ∈ (0, 1].
Proof: it is exactly proximal gradient descent.
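The rate can be checked numerically on a smooth instance (g = 0, so the prox is the identity and the scheme reduces to plain gradient descent); the quadratic below is an assumed toy example:

```python
# f(x) = 0.5*mu*x1^2 + 0.5*L*x2^2 is mu-strongly convex and L-smooth.
# With gamma = 2/(mu + L), the theorem's factor is 1 - alpha = ((L-mu)/(L+mu))^2.
mu, L = 1.0, 10.0
gamma = 2.0 / (mu + L)
alpha = 2.0 * gamma * mu * L / (mu + L)

x = [1.0, 1.0]                       # the minimizer is x* = (0, 0)
d0 = x[0] ** 2 + x[1] ** 2
for k in range(1, 30):
    x = [x[0] - gamma * mu * x[0], x[1] - gamma * L * x[1]]  # gradient step
    dist_sq = x[0] ** 2 + x[1] ** 2
    assert dist_sq <= (1.0 - alpha) ** k * d0 + 1e-12        # theorem's bound
```

On this instance the bound is tight: both coordinates contract by exactly (L − μ)/(L + μ) per step.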
6 >>> Two Limitations | DISTRIBUTED OPTIMIZATION
Synchronism: the master waits for all workers at each time step (image: W. Yin). Local updates may be fast or not (depending on |S_i|), late or not (depending on machine state), and costly (often).
Communications: sending may be more costly than computing a gradient. Context: Federated Learning over distributed data (image: Google AI).
We provide an efficient Distributed Proximal Gradient algorithm that is: asynchronous (delay-tolerant) and communicates more scarcely (computation/communication tradeoff).
7 DISTRIBUTED OPTIMIZATION | ASYNCHRONISM | SCARCE COMMUNICATIONS | CONCLUSION
8 >>> Asynchronous Master-Slave Framework | ASYNCHRONISM
Master update: x̄^k = x̄^{k−1} + Δ_i for i = i(k), the worker that finishes at time k. Workers 1, ..., M each hold (∇f_i, prox_{γg}).
An iteration = receive from one worker + master update + send back:
- time k = number of iterations;
- delay d_i^k = time since the last exchange with worker i (d_i^k = 0 iff i updates at time k, d_i^k = d_i^{k−1} + 1 otherwise);
- second delay D_i^k = time since the penultimate exchange with worker i.
An algorithm = a global communication scheme (what is x̄) + a local optimization method (what is Δ_i).
9 >>> Communication scheme | ASYNCHRONISM
DAve communication scheme: the master variable is a combination of the workers' last contributions (x_i^{k−d_i^k}). One update per time = one worker contribution, but all workers are always involved at the master:
    x̄^k = x̄^{k−1} + Δ_i  with  Δ_i = π_i (x_i^{k−d_i^k} − x_i^{k−D_i^k})  for i = i(k),
i.e.  x̄^k = Σ_{i=1}^M π_i x_i^{k−d_i^k}.
PG optimization method: one step of proximal gradient on the regularizer g and the local loss f_i = (1/|S_i|) Σ_{j ∈ S_i} l_j:
    z ← prox_{γg}(x̄);  x_i ← z − γ ∇f_i(z);  Δ ← π_i (x_i − x_i^prev).
10 >>> DAve-PG | ASYNCHRONISM
Master:
    Initialize x̄
    while not converged do
        when a worker finishes:
            Receive the adjustment Δ from it
            x̄ ← x̄ + Δ
            Send x̄ to the agent in return
            k ← k + 1
    Interrupt all slaves
    Output x = prox_{γg}(x̄)

Worker i:
    Initialize x_i = x = x̄
    while not interrupted by master do
        Receive the most recent x̄
        z ← prox_{γg}(x̄)
        x_i ← z − γ ∇f_i(z)
        Δ ← π_i (x_i − x_i^prev)
        Send the adjustment Δ to the master
    with f_i(x) = (1/|S_i|) Σ_{j ∈ S_i} l_j(x)

In practice: MPI blocking Send and Receive; no computation/storage at the master; x = prox_{γg}(x̄) is the converging variable.
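A serial simulation of this update pattern on the same kind of toy instance (assumed scalar losses f_i(x) = ½(x − b_i)², g = λ|x|; a random worker "finishes" at each time, standing in for real asynchrony) illustrates that the master only ever adds adjustments:

```python
import random

def soft_threshold(v, t):
    """prox of t*|.|"""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

b = [1.0, 2.0, 4.0]          # hypothetical local data: f_i(x) = 0.5*(x - b_i)^2
pi = [0.2, 0.3, 0.5]
gamma, lam = 0.5, 0.1

x_local = [0.0, 0.0, 0.0]    # last contribution x_i of each worker
x_bar = 0.0                  # master variable: always sum_i pi_i * x_local[i]

random.seed(1)
for k in range(2000):
    i = random.randrange(3)                  # the worker that finishes now
    # worker side (it reads the current x_bar here; a real run may use a stale copy)
    z = soft_threshold(x_bar, gamma * lam)   # z = prox_{gamma g}(x_bar)
    x_new = z - gamma * (z - b[i])           # one gradient step on f_i
    delta = pi[i] * (x_new - x_local[i])     # adjustment
    x_local[i] = x_new
    # master side: just add the adjustment, no other computation
    x_bar += delta

x_out = soft_threshold(x_bar, gamma * lam)   # converging variable prox(x_bar)
# same minimizer as the synchronous method on this instance: x* = 2.7
```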
11 >>> Comparison with other combinations | ASYNCHRONISM
DAve-PG combines iterates:   x^k = prox_{γg}( Σ_{i=1}^M π_i x^{k−D_i^k} − γ Σ_{i=1}^M π_i ∇f_i(x^{k−D_i^k}) )
PIAG combines gradients:     x^k = prox_{γg}( x^{k−1} − γ Σ_{i=1}^M π_i ∇f_i(x^{k−D_i^k}) )
Combining iterates is more stable than combining gradients. Example: 2-D quadratic functions on 5 workers, with one worker 10x slower than the others. The stepsize γ of PIAG must be 10x smaller due to delays, while the one for DAve-PG stays the same (to be detailed later). DAve-PG is less chaotic and faster than PIAG.
- A. Aytekin, H. Feyzmahdavian, and M. Johansson, "Analysis and implementation of an asynchronous optimization algorithm for the parameter server", arXiv preprint.
- N. Vanli, M. Gurbuzbalaban, and A. Ozdaglar, "A stronger convergence result on the proximal incremental aggregated gradient method", arXiv preprint.
12 >>> Analysis | ASYNCHRONISM
Revisiting the clock: the epoch sequence (k_m) is defined recursively by k_0 = 0 and
    k_{m+1} = min{ k : each worker made at least 2 updates on the interval [k_m, k] }
            = min{ k : k − D_i^k ≥ k_m for all i = 1, .., M }.
Epoch time m = number of epochs. Intuition: k_{m+1} is the first moment when x̄^k no longer depends directly on information prior to k_m:
    x̄^k = Σ_{i=1}^M π_i x^{k−D_i^k} − γ Σ_{i=1}^M π_i ∇f_i(x^{k−D_i^k}).
Theorem. Let each f_i be L-smooth and μ-strongly convex. Then, for γ ∈ (0, 2/(μ + L)] and all k ≥ k_m,
    ‖x^k − x*‖² ≤ (1 − α)^m ‖x^0 − x*‖²
where x* is the unique minimizer of min_x Σ_{i=1}^M π_i f_i(x) + g(x) and α = 2γμL/(μ + L).
Exact same result as in the synchronous case, but over the epoch time m, not k.
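The epoch sequence is easy to compute from a log of which worker updated at each time; a small sketch under an assumed update log (the helper and the log are illustrative, not from the talk):

```python
def epochs(updates, M):
    """Epoch sequence: k_0 = 0 and
    k_{m+1} = min{ k : every one of the M workers made at least 2 updates
                   on the interval [k_m, k] }."""
    ks = [0]
    while True:
        k_m = ks[-1]
        nxt = None
        for k in range(k_m, len(updates)):
            window = updates[k_m:k + 1]
            if all(window.count(i) >= 2 for i in range(M)):
                nxt = k
                break
        if nxt is None:
            return ks
        ks.append(nxt)

# hypothetical log: updates[k] = index of the worker updating at time k
seq = [0, 1, 0, 1, 1, 0, 0, 1]
print(epochs(seq, 2))   # -> [0, 3, 6]: both workers update twice on [0,3] and [3,6]
```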
13 >>> Performances | ASYNCHRONISM
Logistic regression with elastic net: (1/m) Σ_{j=1}^m log(1 + exp(−y_j z_j^T x)) + λ_1 ‖x‖_1 + (λ_2/2) ‖x‖².
Setup: machines (1 CPU, 1 GB each) in a cluster; 10% of the data on machine one, the rest spread evenly.
[Figure: suboptimality vs. wallclock time (s) for DAve-PG, Synchronous PG, and PIAG on the RCV1 and URL datasets.]
14 DISTRIBUTED OPTIMIZATION | ASYNCHRONISM | SCARCE COMMUNICATIONS | CONCLUSION
15 >>> More local computation | SCARCE COMMUNICATIONS
To exchange less, a solution is to compute more.
DAve-RPG.
Master:
    Initialize x̄
    while not converged do
        when a worker finishes:
            Receive the adjustment Δ from it
            x̄ ← x̄ + Δ
            Send x̄ to the agent in return
            k ← k + 1
    Interrupt all slaves
    Output x = prox_{γg}(x̄)

Worker i:
    Initialize x_i = x = x̄
    while not interrupted by master do
        Receive the most recent x̄
        Select a number of repetitions p
        Initialize Δ = 0
        for q = 1 to p do
            z ← prox_{γg}(x̄ + Δ)
            x_i ← z − γ ∇f_i(z)
            Δ ← Δ + π_i (x_i − x_i^prev)
            x_i^prev ← x_i
        Send the adjustment Δ to the master
    with f_i(x) = (1/|S_i|) Σ_{j ∈ S_i} l_j(x)

Difference with before: at each local step, the worker performs p proximal gradient steps. This gives a controlled rate improvement governed by the maximal number of repetitions p in the epoch, but the epochs become longer.
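The worker's repetition loop can be simulated the same way as before (assumed toy instance: f_i(x) = ½(x − b_i)², g = λ|x|, serial stand-in for asynchrony); note how the accumulated Δ shifts the prox argument at each repetition:

```python
import random

def soft_threshold(v, t):
    """prox of t*|.|"""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

b = [1.0, 2.0, 4.0]           # hypothetical local data: f_i(x) = 0.5*(x - b_i)^2
pi = [0.2, 0.3, 0.5]
gamma, lam, p = 0.5, 0.1, 3   # p local repetitions per communication

x_local = [0.0, 0.0, 0.0]     # last contribution of each worker
x_bar = 0.0                   # master variable

random.seed(2)
for k in range(1500):
    i = random.randrange(3)          # the worker that finishes now
    x_recv, delta = x_bar, 0.0       # worker's received copy and its adjustment
    for q in range(p):               # p proximal gradient repetitions
        z = soft_threshold(x_recv + delta, gamma * lam)
        x_new = z - gamma * (z - b[i])
        delta += pi[i] * (x_new - x_local[i])
        x_local[i] = x_new
    x_bar += delta                   # master adds one single adjustment

# converging variable; same minimizer as with p = 1 on this instance: x* = 2.7
x_out = soft_threshold(x_bar, gamma * lam)
```

The worker still sends one adjustment per communication round, so communication cost is unchanged while each round carries p steps of local progress.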
16 >>> Performance | SCARCE COMMUNICATIONS
Logistic regression with elastic net: (1/m) Σ_{j=1}^m log(1 + exp(−y_j z_j^T x)) + λ_1 ‖x‖_1 + (λ_2/2) ‖x‖².
Setup: machines (1 CPU, 1 GB each) in a cluster; 10% of the data on machine one, the rest spread evenly.
[Figure: suboptimality vs. wallclock time (s) for p = 1, 4, 7, 10.]
There is a compromise to find, but p can be changed without restrictions.
17 DISTRIBUTED OPTIMIZATION | ASYNCHRONISM | SCARCE COMMUNICATIONS | CONCLUSION
18 >>> Summary & Perspectives | CONCLUSION
Distributed Delay-Tolerant Proximal Gradient Algorithm:
- Simple to implement
- Adaptable to the performance/computation compromise
- General, adaptable epoch analysis
Poster # 155
Future works: sparse communications; using identification to control the communications.
Thank you! Franck IUTZELER
More informationMultilayer Perceptrons and Backpropagation. Perceptrons. Recap: Perceptrons. Informatics 1 CG: Lecture 6. Mirella Lapata
Multlayer Perceptrons and Informatcs CG: Lecture 6 Mrella Lapata School of Informatcs Unversty of Ednburgh mlap@nf.ed.ac.uk Readng: Kevn Gurney s Introducton to Neural Networks, Chapters 5 6.5 January,
More informationCoordinate friendly structures, algorithms and applications arxiv: v3 [math.oc] 14 Aug 2016
Coordnate frendly structures, algorthms and applcatons arxv:1601.00863v3 [math.oc] 14 Aug 2016 Zhmn Peng, Tanyu Wu, Yangyang Xu, Mng Yan, and Wotao Yn Ths paper focuses on coordnate update methods, whch
More informationTHE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE
THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for
More informationMaximal Margin Classifier
CS81B/Stat41B: Advanced Topcs n Learnng & Decson Makng Mamal Margn Classfer Lecturer: Mchael Jordan Scrbes: Jana van Greunen Corrected verson - /1/004 1 References/Recommended Readng 1.1 Webstes www.kernel-machnes.org
More informationA Robust Method for Calculating the Correlation Coefficient
A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal
More informationBezier curves. Michael S. Floater. August 25, These notes provide an introduction to Bezier curves. i=0
Bezer curves Mchael S. Floater August 25, 211 These notes provde an ntroducton to Bezer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of the
More informationIntroduction to the R Statistical Computing Environment R Programming
Introducton to the R Statstcal Computng Envronment R Programmng John Fox McMaster Unversty ICPSR 2018 John Fox (McMaster Unversty) R Programmng ICPSR 2018 1 / 14 Programmng Bascs Topcs Functon defnton
More informationSome modelling aspects for the Matlab implementation of MMA
Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton
More information