A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning


1 A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning
Konstantin Mishchenko (KAUST), Franck Iutzeler (Univ. Grenoble Alpes), Jérôme Malick (CNRS and Univ. Grenoble Alpes), Massih Amini (Univ. Grenoble Alpes)
ICML 2018

2 >>> Distributed Learning (CONTEXT)

Global objective (empirical risk minimization over m examples, with individual losses l_j and a regularizer g):

    min_{x in R^d}  (1/m) Σ_{j=1}^{m} l_j(x) + g(x)

Local data: the examples are split into M data blocks S_1, ..., S_M stored locally, so the problem rewrites as

    min_{x in R^d}  Σ_{i=1}^{M} π_i f_i(x) + g(x)

with local functions f_i(x) = (1/|S_i|) Σ_{j in S_i} l_j(x) and proportions π_i = |S_i| / m.

Problem: large-sum minimization vs. mid-sized distributed optimization.
Optimization: variance-reduced stochastic gradient vs. this presentation.
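For concreteness, here is a minimal NumPy sketch of this setup with logistic losses and an elastic-net regularizer (the objective used in the experiments later); the data layout, function names, and parameters are illustrative, not taken from the talk.

    import numpy as np

    def local_loss_grad(x, Z, y):
        # Gradient of f_i(x) = (1/|S_i|) sum_j log(1 + exp(-y_j z_j^T x)) over one data block (Z, y).
        sigma = 1.0 / (1.0 + np.exp(y * (Z @ x)))      # sigma_j = 1 / (1 + exp(y_j z_j^T x))
        return -(Z.T @ (y * sigma)) / len(y)

    def prox_elastic_net(v, gamma, lam1, lam2):
        # prox_{gamma g}(v) for g(x) = lam1 ||x||_1 + (lam2 / 2) ||x||_2^2:
        # soft-threshold by gamma*lam1, then shrink by 1 / (1 + gamma*lam2).
        u = np.sign(v) * np.maximum(np.abs(v) - gamma * lam1, 0.0)
        return u / (1.0 + gamma * lam2)

    def global_objective(x, blocks, lam1, lam2):
        # sum_i pi_i f_i(x) + g(x), with pi_i = |S_i| / m and blocks = [(Z_1, y_1), ..., (Z_M, y_M)].
        m = sum(len(y) for _, y in blocks)
        risk = sum(np.log1p(np.exp(-y * (Z @ x))).sum() for Z, y in blocks) / m
        return risk + lam1 * np.abs(x).sum() + 0.5 * lam2 * (x @ x)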

3 DISTRIBUTED OPTIMIZATION | ASYNCHRONISM | SCARCE COMMUNICATIONS | CONCLUSION

4 >>> Distributed Proximal Gradient (DISTRIBUTED OPTIMIZATION)

Problem: min_x Σ_{i=1}^{M} π_i f_i(x) + g(x)

Direct extension of the proximal gradient in a master/worker setup over f_1, ..., f_M:
- Map: each worker i updates its local variable x_i^{k+1/2} = x^k - γ ∇f_i(x^k), for all i = 1, ..., M.
- Reduce: the master gathers the local variables, Σ_{i=1}^{M} π_i x_i^{k+1/2}, performs a proximity operation, and broadcasts the result, so that x_1^{k+1} = ... = x_M^{k+1} = prox_{γg}(Σ_{i=1}^{M} π_i x_i^{k+1/2}).

Master:
    Initialize x̄ = x^0
    while not converged do
        when all workers have finished:
            receive x_i from each of them
            x̄ ← Σ_{i=1}^{M} π_i x_i
            broadcast x̄ to all agents
            k ← k + 1
    Interrupt all slaves
    Output x̄

Worker i:
    Initialize x_i = x̄
    while not interrupted by master do
        receive the most recent x̄
        z ← prox_{γg}(x̄)
        x_i ← z - γ ∇f_i(z)
        send x_i to the master

with f_i(x) = (1/|S_i|) Σ_{j in S_i} l_j(x).
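As a sanity check of this synchronous scheme, a short sequential simulation (reusing the helper functions sketched above; the step size, iteration count, and data layout are again illustrative):

    def synchronous_prox_grad(blocks, x0, gamma, lam1, lam2, n_iters=200):
        # Simulate the synchronous distributed proximal gradient on blocks = [(Z_i, y_i)].
        m = sum(len(y) for _, y in blocks)
        pis = [len(y) / m for _, y in blocks]              # pi_i = |S_i| / m
        x_bar = x0.copy()
        for _ in range(n_iters):
            # Map: every worker computes x_i = z - gamma * grad f_i(z), with z = prox_{gamma g}(x_bar)
            locals_ = []
            for Z, y in blocks:
                z = prox_elastic_net(x_bar, gamma, lam1, lam2)
                locals_.append(z - gamma * local_loss_grad(z, Z, y))
            # Reduce: the master averages with weights pi_i and broadcasts the result
            x_bar = sum(pi * x_i for pi, x_i in zip(pis, locals_))
        return prox_elastic_net(x_bar, gamma, lam1, lam2)   # converging variable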

5 >>> Convergence and rate (DISTRIBUTED OPTIMIZATION)

(Same Distributed Proximal Gradient as on the previous slide: the master waits for all workers, averages their x_i, and broadcasts x̄; each worker i computes z = prox_{γg}(x̄) and x_i = z - γ ∇f_i(z), with f_i(x) = (1/|S_i|) Σ_{j in S_i} l_j(x).)

Define time k as the number of master updates; x^k is the value of variable x at time k.

Theorem. Let each f_i be L-smooth and μ-strongly convex. Then, for γ in (0, 2/(μ + L)],

    ||x^k - x*||^2 ≤ (1 - α)^k ||x^0 - x*||^2

where x* is the unique minimizer of min_x Σ_{i=1}^{M} π_i f_i(x) + g(x) and α = 2γμL/(μ + L) in (0, 1].

Proof: it is exactly proximal gradient descent.
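As a worked instance of this rate (the numbers are illustrative, not from the talk): with μ = 1, L = 10 and the largest admissible step size γ = 2/(μ + L) = 2/11,

    \alpha = \frac{2\gamma\mu L}{\mu + L} = \frac{2 \cdot \tfrac{2}{11} \cdot 1 \cdot 10}{11} = \frac{40}{121} \approx 0.33,
    \qquad
    \|x^k - x^\star\|^2 \le \left(\tfrac{81}{121}\right)^{k} \|x^0 - x^\star\|^2,

which matches the classical proximal-gradient contraction \left(\frac{L-\mu}{L+\mu}\right)^2 = \left(\frac{9}{11}\right)^2 = \frac{81}{121} for this step size.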

6 >>> Two Limitations (DISTRIBUTED OPTIMIZATION)

Synchronism: the master waits for all workers at each time. (image: W. Yin)
Local updates may be: fast (or not, depending on |S_i|), late (or not, depending on the machine's state), costly (often).

Communications: sending may be more costly than computing a gradient.
Context: Federated Learning, distributed data. (image: Google AI)

We provide an efficient Distributed Proximal Gradient algorithm:
- asynchronous, delay-tolerant
- scarcer communications, with a computation/communication tradeoff

7 DISTRIBUTED OPTIMIZATION | ASYNCHRONISM | SCARCE COMMUNICATIONS | CONCLUSION

8 >>> Asynchronous Master-Slave Framework (ASYNCHRONISM)

Master update: x̄^k = x̄^{k-1} + Δ^k, where the contribution Δ^k comes from the single worker i = i(k) that finishes at time k; each worker i runs (∇f_i, prox_{γg}) on its local data.

Iteration = receive from a worker + master update + send back; time k = number of iterations.
Delay d_i^k = time since the last exchange with worker i: d_i^k = 0 iff i updates at time k, and d_i^k = d_i^{k-1} + 1 otherwise.
Second delay D_i^k = time since the penultimate exchange with worker i.
(The slide's timelines show, from the viewpoints of i(k) and of another worker j, the times k - d_i^k and k - D_i^k.)

Algorithm = global communication scheme (what is Δ^k?) + local optimization method (what is x_i?).
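To make the two delays concrete, here is a small illustrative bookkeeping helper (names and conventions are mine): given which worker updates at each iteration, it returns, for every k and i, the time since the last and since the penultimate exchange with worker i.

    def delays(update_trace, M):
        # update_trace[k] = index of the worker whose contribution arrives at iteration k.
        last, penultimate = [None] * M, [None] * M
        d, D = [], []
        for k, i in enumerate(update_trace):
            penultimate[i], last[i] = last[i], k
            d.append([k - last[j] if last[j] is not None else None for j in range(M)])
            D.append([k - penultimate[j] if penultimate[j] is not None else None for j in range(M)])
        return d, D

    # Example with 3 workers, worker 2 updating rarely:
    # d, D = delays([0, 1, 0, 1, 2, 0, 1, 0, 2], M=3)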

9 >>> Communication scheme (ASYNCHRONISM)

DAve communication scheme: the master variable x̄^k is a combination of the workers' last contributions x_i^{k-d_i^k}. One update per time = one worker contribution, but all workers are always involved at the master:

    x̄^k = x̄^{k-1} + Δ^k   with   Δ^k = π_i (x_i^k - x_i^{k-D_i^k})   for i = i(k),

i.e.

    x̄^k = Σ_{i=1}^{M} π_i x_i^{k-d_i^k},

each contribution x_i^{k-d_i^k} having been computed by worker i from the master point x̄^{k-D_i^k} received at its previous exchange.

PG proximal gradient optimization method: one step of proximal gradient on the regularizer g and the local loss f_i = (1/|S_i|) Σ_{j in S_i} l_j:

    z ← prox_{γg}(x̄)
    x_i ← z - γ ∇f_i(z)
    Δ ← π_i (x_i - x_i^prev)
    x_i^prev ← x_i
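A small sketch of the master side of this scheme (illustrative; the class and method names are mine): the master never stores the individual x_i, only the running combination, and each incoming adjustment keeps it equal to the weighted sum of the last contributions.

    import numpy as np

    class DAveMaster:
        def __init__(self, x0):
            self.x_bar = x0.copy()            # x_bar^0 = sum_i pi_i x_i^0 when all workers start at x0
        def apply_adjustment(self, delta):
            self.x_bar = self.x_bar + delta   # x_bar^k = x_bar^{k-1} + pi_i (new x_i - previous x_i)
            return self.x_bar                 # sent back to the worker that just contributed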

10 >>> DAve-PG (ASYNCHRONISM)

Master:
    Initialize x̄
    while not converged do
        when a worker finishes:
            receive the adjustment Δ from it
            x̄ ← x̄ + Δ
            send x̄ to that agent in return
            k ← k + 1
    Interrupt all slaves
    Output x = prox_{γg}(x̄)

Worker i:
    Initialize x_i = x_i^prev = x̄
    while not interrupted by master do
        receive the most recent x̄
        z ← prox_{γg}(x̄)
        x_i ← z - γ ∇f_i(z)
        Δ ← π_i (x_i - x_i^prev)
        x_i^prev ← x_i
        send the adjustment Δ to the master

with f_i(x) = (1/|S_i|) Σ_{j in S_i} l_j(x).

In practice: MPI blocking Send and Receive; no computation or storage at the master; x = prox_{γg}(x̄) is the converging variable.
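The asynchronous behaviour can be emulated sequentially by fixing which worker finishes at each iteration; a minimal sketch reusing the helpers above (the schedule, step size, and initialization are illustrative, and a real deployment would use blocking MPI messages instead):

    def dave_pg(blocks, x0, gamma, lam1, lam2, schedule):
        # schedule[k] = worker whose adjustment reaches the master at iteration k.
        m = sum(len(y) for _, y in blocks)
        pis = [len(y) / m for _, y in blocks]
        x_bar = x0.copy()
        received = [x0.copy() for _ in blocks]     # last master point seen by each worker
        x_prev = [x0.copy() for _ in blocks]       # last contribution of each worker
        for i in schedule:
            Z, y = blocks[i]
            z = prox_elastic_net(received[i], gamma, lam1, lam2)
            x_i = z - gamma * local_loss_grad(z, Z, y)
            x_bar = x_bar + pis[i] * (x_i - x_prev[i])   # master applies the adjustment
            x_prev[i] = x_i
            received[i] = x_bar.copy()                   # master sends the new x_bar back to worker i
        return prox_elastic_net(x_bar, gamma, lam1, lam2)

    # e.g. schedule = [0, 1, 0, 1, 2, 0, 1, 0, 1, 2] mimics a worker 2 that is much slower than the others.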

11 >>> Comparison with other combinations (ASYNCHRONISM)

DAve-PG combines iterates:   x^k = prox_{γg}(Σ_{i=1}^{M} π_i x^{k-D_i^k} - γ Σ_{i=1}^{M} π_i ∇f_i(x^{k-D_i^k}))
PIAG combines gradients:     x^k = prox_{γg}(x^{k-1} - γ Σ_{i=1}^{M} π_i ∇f_i(x^{k-D_i^k}))

Combining iterates is more stable than combining gradients.

Example: 2D quadratic functions on 5 workers, but one worker 10x slower than the others. The step size γ of PIAG is 10x smaller due to the delays, while the one of DAve-PG stays the same (to be detailed later). DAve-PG is less chaotic and faster than PIAG.

- A. Aytekin, H. Feyzmahdavian, and M. Johansson, Analysis and implementation of an asynchronous optimization algorithm for the parameter server, arXiv preprint.
- N. Vanli, M. Gurbuzbalaban, and A. Ozdaglar, A stronger convergence result on the proximal incremental aggregated gradient method, arXiv preprint.

12 >>> Analysis (ASYNCHRONISM)

Revisiting the clock: the epoch sequence (k_m) is defined recursively by k_0 = 0 and

    k_{m+1} = min{k : each worker made at least 2 updates on the interval [k_m, k]}
            = min{k : k - D_i^k ≥ k_m for all i = 1, ..., M}.

Epoch time m = number of epochs.

Intuition: k_{m+1} is the first moment when x^k no longer depends directly on information prior to k_m, since

    x^k = Σ_{i=1}^{M} π_i x^{k-D_i^k} - γ Σ_{i=1}^{M} π_i ∇f_i(x^{k-D_i^k}).

Theorem. Let each f_i be L-smooth and μ-strongly convex. Then, for γ in (0, 2/(μ + L)] and all k ≥ k_m,

    ||x^k - x*||^2 ≤ (1 - α)^m ||x^0 - x*||^2

where x* is the unique minimizer of min_x Σ_{i=1}^{M} π_i f_i(x) + g(x) and α = 2γμL/(μ + L).

Exact same result as the synchronous case, but over the epoch time m, not the iteration time k.
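The epoch clock is straightforward to compute from an update trace; the following illustrative helper (names and the inclusive-interval convention are mine) returns the k_m using the characterization "every worker made at least 2 updates since the previous epoch boundary":

    def epoch_boundaries(update_trace, M):
        boundaries = [0]
        counts = [0] * M
        for k, i in enumerate(update_trace):
            counts[i] += 1
            if all(c >= 2 for c in counts):
                boundaries.append(k)      # k_{m+1} = first k with two updates from every worker
                counts = [0] * M
                counts[i] = 1             # the update at k also opens the next interval [k, ...]
        return boundaries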

13 >>> Performances (ASYNCHRONISM)

Logistic regression with elastic net:
    (1/m) Σ_{j=1}^{m} log(1 + exp(-y_j z_j^T x)) + λ_1 ||x||_1 + (λ_2/2) ||x||_2^2

Setup: machines (1 CPU, 1 GB each) in a cluster; 10% of the data on machine one, the rest spread evenly.

[Plots: suboptimality vs. wall-clock time (s) on the RCV1 and URL datasets, comparing DAve-PG, synchronous PG, and PIAG.]

14 DISTRIBUTED OPTIMIZATION | ASYNCHRONISM | SCARCE COMMUNICATIONS | CONCLUSION

15 >>> More local computation (SCARCE COMMUNICATIONS)

To exchange less, a solution is to compute more.

DAve-RPG

Master:
    Initialize x̄
    while not converged do
        when a worker finishes:
            receive the adjustment Δ from it
            x̄ ← x̄ + Δ
            send x̄ to that agent in return
            k ← k + 1
    Interrupt all slaves
    Output x = prox_{γg}(x̄)

Worker i:
    Initialize x_i = x_i^prev = x̄
    while not interrupted by master do
        receive the most recent x̄
        select a number of repetitions p
        initialize Δ = 0
        for q = 1 to p do
            z ← prox_{γg}(x̄ + Δ)
            x_i ← z - γ ∇f_i(z)
            Δ ← Δ + π_i (x_i - x_i^prev)
            x_i^prev ← x_i
        send the adjustment Δ to the master

with f_i(x) = (1/|S_i|) Σ_{j in S_i} l_j(x).

Difference with before: at each local step, the worker performs p proximal gradient steps. This gives a controlled rate improvement governed by the maximal number of repetitions p in the epoch [rate factor involving Σ_{q=1}^{p} (1 - γμ)^{q-1} and min_i π_i], but the epochs become longer.
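A sketch of the corresponding worker-side local loop (reusing the helpers above; names are illustrative):

    import numpy as np

    def rpg_worker_step(x_bar, x_prev, pi_i, Z, y, gamma, lam1, lam2, p):
        # p repeated proximal-gradient passes on the local loss, accumulating the adjustment.
        delta = np.zeros_like(x_bar)
        for _ in range(p):
            z = prox_elastic_net(x_bar + delta, gamma, lam1, lam2)
            x_i = z - gamma * local_loss_grad(z, Z, y)
            delta += pi_i * (x_i - x_prev)
            x_prev = x_i
        return delta, x_prev     # delta is sent to the master; x_prev is kept locally

Only one message per p local steps is exchanged with the master, which is the communication saving this slide is after.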

16 >>> Performance (SCARCE COMMUNICATIONS)

Logistic regression with elastic net:
    (1/m) Σ_{j=1}^{m} log(1 + exp(-y_j z_j^T x)) + λ_1 ||x||_1 + (λ_2/2) ||x||_2^2

Setup: machines (1 CPU, 1 GB each) in a cluster; 10% of the data on machine one, the rest spread evenly.

[Plot: suboptimality vs. wall-clock time (s) for p = 1, 4, 7, 10.]

There is a compromise to find, but p can be changed without restrictions.

17 DISTRIBUTED OPTIMIZATION | ASYNCHRONISM | SCARCE COMMUNICATIONS | CONCLUSION

18 >>> Summary & Perspectives (CONCLUSION)

Distributed Delay-Tolerant Proximal Gradient Algorithm:
- simple to implement
- adaptable to the performance/computation compromise
- general, adaptable epoch analysis

Poster #155.

Future works:
- sparse communications
- using identification to control the communications

Thank you! Franck Iutzeler
