Network Newton. Aryan Mokhtari, Qing Ling and Alejandro Ribeiro. University of Pennsylvania, University of Science and Technology of China
1 Network Newton. Aryan Mokhtari, Qing Ling and Alejandro Ribeiro. University of Pennsylvania, University of Science and Technology of China. Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, November 4, 2014. Mokhtari, Ling, Ribeiro, Network Newton, 1
2 Distributed optimization
Network with n nodes, where each node i has access to a local function f_i(x). The nodes collaborate to minimize the global objective
f(x) = Σ_{i=1}^n f_i(x)
e.g., sampling subsets of a data set to train a classifier.
[Figure: 10-node network, node i holding f_i(x)]
Nodes can operate (train or estimate) locally but would benefit from sharing. The cost of aggregating all functions at a central location is large, in both communication and computation. Instead, recursive exchanges with neighbors j ∈ N_i aggregate global information.
3 Methods for distributed optimization
Replicate the common variable at each node,
f(x_1, ..., x_n) = Σ_{i=1}^n f_i(x_i)
and enforce equality between neighbors, x_i = x_j (and thus between all nodes).
[Figure: 10-node network, node i holding f_i(x_i)]
Methods operate recursively to enforce equality asymptotically; they differ in how:
- Distributed gradient descent, recursive averaging [Nedic, Ozdaglar 09]
- Distributed dual descent, prices [Rabbat et al 05]
- Distributed ADMM, prices [Schizas et al 08]
All are first-order methods, so convergence times are not always reasonable.
5 (Approximate) Network Newton (NN)
Reinterpret distributed gradient descent (DGD) as a penalty method. The Newton step for the objective plus penalty requires global coordination, so approximate it with local operations by truncating the Taylor series of the Hessian inverse. The Hessian is neighbor sparse, and the kth term of the series is k-hop neighbor sparse, so NN-K aggregates information from the K-hop neighborhood.
[Figure: the 1-, 2-, and 3-hop neighborhoods (NN-1, NN-2, NN-3) of a node in the 10-node example network]
NN-K always converges linearly and exhibits a quadratic phase in a given range.
9 Decentralized Gradient Descent (DGD)
Problem in distributed form:
min_{x_1,...,x_n} Σ_{i=1}^n f_i(x_i)   s.t. x_i = x_j for j ∈ N_i
With nonnegative doubly stochastic weights W = [w_ij], the DGD update at node i is
x_{i,t+1} = Σ_{j ∈ N_i ∪ {i}} w_ij x_{j,t} − α ∇f_i(x_{i,t})
i.e., an average of the local and neighboring variables plus a local gradient descent step.
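As a concrete illustration, here is a minimal DGD sketch in Python; the ring graph, quadratic local costs, and all names are assumptions made for the example, not part of the slides:

```python
import numpy as np

# Minimal DGD sketch (illustrative). Each node i holds a scalar variable
# x_i and a quadratic local cost f_i(x) = (a_i/2) x^2 - b_i x, so the
# local gradient is grad f_i(x) = a_i x - b_i.

rng = np.random.default_rng(0)
n = 10
a = rng.uniform(1.0, 2.0, n)          # local curvatures
b = rng.uniform(-1.0, 1.0, n)         # local linear terms
x_star = b.sum() / a.sum()            # minimizer of sum_i f_i(x)

# Doubly stochastic weights on a ring: average with the two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

alpha = 0.01
x = np.zeros(n)
for t in range(2000):
    grad = a * x - b                  # local gradients, computed in parallel
    x = W @ x - alpha * grad          # DGD: weighted average + gradient step

# For small alpha the iterates cluster near (not exactly at) the optimum.
print(np.max(np.abs(x - x_star)))
```

The residual error illustrates the penalty-method bias discussed two slides ahead: DGD converges to a point in an O(α) neighborhood of the optimum, not to the optimum itself.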
10 Decentralized Gradient Descent (DGD)
Rewrite DGD in vector form with the aggregate variable y := [x_1; ...; x_n]:
y_{t+1} = W y_t − α h(y_t)
where the block weight matrix is W := W ⊗ I (each weight w_ij scales an identity block) and the stacked gradient is h(y) := [∇f_1(x_1); ...; ∇f_n(x_n)]. Reordering terms, the vector form of DGD is
y_{t+1} = y_t − [(I − W) y_t + α h(y_t)]
i.e., gradient descent (with unit stepsize) on the function
F(y) := (1/2) yᵀ (I − W) y + α Σ_{i=1}^n f_i(x_i)
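The reordering above is a one-line identity; a small numerical check (with hypothetical scalar quadratic costs and a path-graph W, so W ⊗ I reduces to W) confirms that the DGD update coincides with a unit-step gradient step on F:

```python
import numpy as np

# Sketch (illustrative): verify that y+ = W y - alpha*h(y) equals the
# unit-step gradient descent y+ = y - [(I - W) y + alpha*h(y)] on
# F(y) = 0.5 * y^T (I - W) y + alpha * sum_i f_i(x_i), scalar case.

rng = np.random.default_rng(1)
n = 5
# Symmetric doubly stochastic weights on a path graph (hypothetical example).
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 0.25
W += np.diag(1.0 - W.sum(axis=1))

a = rng.uniform(1.0, 2.0, n)                    # local curvatures
b = rng.uniform(-1.0, 1.0, n)                   # local linear terms
alpha = 0.1
y = rng.standard_normal(n)

h = a * y - b                                   # stacked local gradients
dgd_step = W @ y - alpha * h                    # DGD form of the update
grad_F = (np.eye(n) - W) @ y + alpha * h        # gradient of the penalty function F
gd_step = y - grad_F                            # unit-step gradient descent on F

print(np.allclose(dgd_step, gd_step))           # the two updates coincide
```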
11 DGD as a penalty method (reconsidering the mystery of DGD)
Why do gradient descent on F(y) := (1/2) yᵀ (I − W) y + α Σ_{i=1}^n f_i(x_i)?
The weight matrix W is constructed so that null(I − W) = span(1). Thus null(I − W ⊗ I) = span(1 ⊗ I), and (I − W ⊗ I) y = 0 if and only if x_i = x_j for all i, j. The same is true of (I − W ⊗ I)^{1/2}, so the problem in distributed form is equivalent to
min_y Σ_{i=1}^n f_i(x_i)   s.t. (I − W ⊗ I)^{1/2} y = 0
DGD is a penalty method for solving this (equivalent) problem, with squared norm penalty (1/2) ‖(I − W ⊗ I)^{1/2} y‖² and penalty coefficient 1/α. It converges to the wrong solution, though not far from the right one if α is small. Since gradient descent on F(y) works, why not use Newton steps on F(y)?
12 Newton method for the penalized objective function
Penalized objective function: F(y) = (1/2) yᵀ (I − W) y + α Σ_{i=1}^n f_i(x_i)
To implement Newton's method on F(y) we need the Hessians
H_t = I − W + α G_t
where G_t is block diagonal with blocks G_{ii,t} = ∇²f_i(x_{i,t}). The Hessian H_t has the sparsity pattern of W, which is the sparsity pattern of the graph, so it can be computed with local information plus exchanges with neighbors. The Newton step, however, depends on the Hessian inverse, d_t := −H_t⁻¹ g_t, and the inverse of H_t is, in general, neither block sparse nor locally computable.
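The contrast between a sparse Hessian and a dense inverse can be checked numerically. The sketch below (the path graph and local Hessian values are illustrative assumptions) builds H = I − W + αG and compares the sparsity patterns of H and H⁻¹:

```python
import numpy as np

# Sketch (illustrative): the penalized Hessian H = I - W + alpha*G has
# the sparsity pattern of the graph, but its inverse is dense in general.

n = 6
# Path graph weights (hypothetical): neighbors get 0.3, rest on the diagonal.
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 0.3
W += np.diag(1.0 - W.sum(axis=1))

alpha = 0.1
G = np.diag(np.linspace(1.0, 2.0, n))   # local Hessians (scalar case)
H = np.eye(n) - W + alpha * G

sparsity_H = np.count_nonzero(np.abs(H) > 1e-12)
sparsity_Hinv = np.count_nonzero(np.abs(np.linalg.inv(H)) > 1e-12)
print(sparsity_H, sparsity_Hinv)        # H is tridiagonal; inv(H) has no zero entries
```

On a path graph H is tridiagonal (3n − 2 nonzeros), while H⁻¹ is fully dense, which is why the exact Newton step cannot be computed by local exchanges.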
13 Network Newton Hessian approximation
Define the block diagonal matrix D_t := α G_t + 2(I − diag(W)) and the block graph-sparse matrix B := I − 2 diag(W) + W. Then the Hessian splits as
H_t = D_t − B = D_t^{1/2} ( I − D_t^{−1/2} B D_t^{−1/2} ) D_t^{1/2}
Using the Taylor series (I − X)⁻¹ = Σ_{k=0}^∞ Xᵏ, write the Hessian inverse as
H_t⁻¹ = D_t^{−1/2} Σ_{k=0}^∞ ( D_t^{−1/2} B D_t^{−1/2} )ᵏ D_t^{−1/2}
Define the NN-K step d_t^{(K)} := −Ĥ_t^{(K)−1} g_t by truncating the sum at the Kth term:
Ĥ_t^{(K)−1} := D_t^{−1/2} Σ_{k=0}^K ( D_t^{−1/2} B D_t^{−1/2} )ᵏ D_t^{−1/2}
Since D_t^{−1/2} B D_t^{−1/2} is graph sparse, its kth power is k-hop neighborhood sparse.
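The split and the truncated series can be sketched directly; the scalar variables, ring graph, and random local Hessians below are illustrative assumptions. The approximate NN-K direction approaches the exact Newton direction as K grows:

```python
import numpy as np

# Sketch (illustrative). Split H = D - B with D = alpha*G + 2*(I - diag(W))
# and B = I - 2*diag(W) + W, then approximate the Newton step with
# Hhat_K^{-1} = D^{-1/2} * sum_{k=0}^K (D^{-1/2} B D^{-1/2})^k * D^{-1/2}.

rng = np.random.default_rng(2)
n = 8
W = np.zeros((n, n))
for i in range(n):                      # doubly stochastic ring weights
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

alpha = 0.1
G = np.diag(rng.uniform(1.0, 2.0, n))   # local Hessians (scalar case)
H = np.eye(n) - W + alpha * G
D = alpha * G + 2.0 * (np.eye(n) - np.diag(np.diag(W)))
B = np.eye(n) - 2.0 * np.diag(np.diag(W)) + W
assert np.allclose(H, D - B)            # the split recovers H exactly

g = rng.standard_normal(n)              # stacked gradient
d_exact = -np.linalg.solve(H, g)        # exact Newton step

Dq = np.diag(np.diag(D) ** -0.5)        # D^{-1/2}
X = Dq @ B @ Dq                         # graph sparse, spectral radius < 1
errs = {}
for K in (0, 1, 5, 20):
    S = sum(np.linalg.matrix_power(X, k) for k in range(K + 1))
    d_K = -(Dq @ S @ Dq) @ g            # NN-K step: series truncated at k = K
    errs[K] = np.linalg.norm(d_K - d_exact)
print(errs)                             # error shrinks as K grows
```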
14 Distributed computation of the NN-K step
Recursion for the NN-k steps: define d_t^{(0)} := −D_t⁻¹ g_t, and for all other k
d_t^{(k+1)} = D_t⁻¹ B d_t^{(k)} − D_t⁻¹ g_t
Given that D_t is block diagonal, the recursion can be rewritten componentwise:
d_{i,t}^{(k+1)} = D_{ii,t}⁻¹ Σ_{j=1}^n B_{ij} d_{j,t}^{(k)} − D_{ii,t}⁻¹ g_{i,t}
But B is graph sparse, so B_{ij} = 0 unless j = i or i and j are neighbors:
d_{i,t}^{(k+1)} = D_{ii,t}⁻¹ Σ_{j ∈ N_i ∪ {i}} B_{ij} d_{j,t}^{(k)} − D_{ii,t}⁻¹ g_{i,t}
The local piece of the NN-(k+1) step is thus computed as a function of local matrices, local gradient components, the local piece of the NN-k step, and the pieces of the NN-k step at neighboring nodes, which can be exchanged.
15 NN-K Algorithm at node i
(0) Initialize at x_{i,0}. Repeat for times t = 0, 1, ...
(1) Exchange local iterates x_{i,t} with neighboring nodes j ∈ N_i.
(2) Compute the local gradient component
g_{i,t} = (1 − w_ii) x_{i,t} − Σ_{j ∈ N_i} w_ij x_{j,t} + α ∇f_i(x_{i,t})
(3) Initialize the NN step computation with the NN-0 step d_{i,t}^{(0)} = −D_{ii,t}⁻¹ g_{i,t}
(4) Repeat for k = 0, 1, ..., K − 1:
(5) Exchange the local elements d_{i,t}^{(k)} of the NN-k step with neighbors j ∈ N_i.
(6) Compute the local component of the NN-(k+1) step
d_{i,t}^{(k+1)} = D_{ii,t}⁻¹ Σ_{j ∈ N_i ∪ {i}} B_{ij} d_{j,t}^{(k)} − D_{ii,t}⁻¹ g_{i,t}
(7) Update the local iterate: x_{i,t+1} = x_{i,t} + ε d_{i,t}^{(K)}
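The steps above can be sketched as follows; the ring graph, quadratic local costs, and parameter values are illustrative assumptions, and each node uses only its own data plus values exchanged with its neighbors:

```python
import numpy as np

# Sketch of the per-node NN-K iteration (illustrative): scalar variables,
# quadratic local costs f_i(x) = (a_i/2) x^2 - b_i x, ring graph.

rng = np.random.default_rng(3)
n, K, alpha, eps = 10, 2, 0.01, 1.0
a = rng.uniform(1.0, 2.0, n)                             # local curvatures
b = rng.uniform(-1.0, 1.0, n)                            # local linear terms
nbrs = [((i - 1) % n, (i + 1) % n) for i in range(n)]    # ring neighborhoods
w_self, w_nbr = 0.5, 0.25                                # doubly stochastic weights

x = np.zeros(n)
for t in range(200):
    # (2) local gradient components of the penalized objective
    g = np.array([(1 - w_self) * x[i]
                  - sum(w_nbr * x[j] for j in nbrs[i])
                  + alpha * (a[i] * x[i] - b[i]) for i in range(n)])
    # D_ii = alpha * f_i'' + 2 (1 - w_ii);  B_ii = 1 - w_ii, B_ij = w_ij
    D = alpha * a + 2 * (1 - w_self)
    # (3) NN-0 step, then (4)-(6) K rounds of neighbor exchanges
    d = -g / D
    for _ in range(K):
        d = np.array([((1 - w_self) * d[i]
                       + sum(w_nbr * d[j] for j in nbrs[i])) / D[i]
                      - g[i] / D[i] for i in range(n)])
    # (7) local update
    x = x + eps * d

x_star = b.sum() / a.sum()          # minimizer of the original problem
print(np.max(np.abs(x - x_star)))   # small but nonzero: penalty-method bias
```

As with DGD, the iterates settle near the minimizer of the penalized objective, so a small residual bias of order α remains; only K extra exchange rounds per iteration are needed beyond DGD's one.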
16 Assumptions
Assumption 1: The local objective functions f_i(x) are twice differentiable, and the Hessians ∇²f_i(x) have bounded eigenvalues, mI ⪯ ∇²f_i(x) ⪯ MI.
Assumption 2: The local Hessians are Lipschitz continuous, ‖∇²f_i(x) − ∇²f_i(x̂)‖ ≤ L ‖x − x̂‖.
Assumption 3: The local weights w_ii are bounded, 0 < δ ≤ w_ii < 1 for i = 1, ..., n. The upper bound is implied by the connectivity condition.
17 Linear convergence of NN-K
Theorem: For a specific choice of stepsize ε, the sequence F(y_t) converges to the optimal value F(y*) at least linearly with constant 0 < 1 − ζ_α < 1, i.e.,
F(y_t) − F(y*) ≤ (1 − ζ_α)^t (F(y_0) − F(y*))
where ε is the minimum of 1 and a constant depending on problem parameters.
There is a trade-off between convergence rate and accuracy: larger α yields faster convergence, while smaller choices of α yield more accurate convergence.
18 Superlinear convergence lemma
Lemma: For specific values of Γ_1 and Γ_2, the sequence of weighted gradient norms satisfies
‖D_t^{1/2} g_{t+1}‖ ≤ (1 − ε + ε ρ^{K+1}) [1 + Γ_1 (1 − ζ_α)^{(t−1)/4}] ‖D_{t−1}^{1/2} g_t‖ + ε² Γ_2 ‖D_{t−1}^{1/2} g_t‖²
where ρ < 1. So ‖D_t^{1/2} g_{t+1}‖ is upper bounded by linear and quadratic terms of ‖D_{t−1}^{1/2} g_t‖, similar to the convergence analysis of Newton's method with constant stepsize. For t large enough, Γ_1 (1 − ζ_α)^{(t−1)/4} ≈ 0, so there must be intervals in which the quadratic term dominates the linear term; the rate of convergence is quadratic in those intervals.
19 Quadratic phase of NN-K convergence
Theorem: For η_t := (1 − ε + ε ρ^{K+1}) [1 + Γ_1 (1 − ζ)^{(t−1)/4}] and t_0 := argmin_t {t : η_t < 1}, we have for all t ≥ t_0 that if
√(η_t (1 − η_t)) / (ε² Γ_2) < ‖D_{t−1}^{1/2} g_t‖ < (1 − η_t) / (ε² Γ_2)
then
‖D_t^{1/2} g_{t+1}‖ ≤ (ε² Γ_2 / (1 − η_t)) ‖D_{t−1}^{1/2} g_t‖²
i.e., ‖D_{t−1}^{1/2} g_t‖ converges quadratically in the specified interval.
20 Numerical results
Convergence path for f(x) := Σ_{i=1}^{100} xᵀ A_i x / 2 + b_iᵀ x, with condition number 10³, α = 10⁻², a d-regular graph with d = 4, and ε = 1. The error is the average relative distance to the optimum, e_t = (1/n) Σ_{i=1}^n ‖x_{i,t} − x*‖ / ‖x*‖.
[Figure: error versus number of iterations for DGD, NN-0, NN-1, and NN-2]
DGD is slower than all versions of NN-K, and NN-K with larger K converges faster in terms of the number of iterations.
21 Numerical results
Number of information exchanges required to achieve accuracy e = 10⁻², for n = 100, d ∈ {4, ..., 10}, and condition numbers in {10², 10³, 10⁴}.
[Figure: empirical distributions of the number of information exchanges for DGD, NN-0, NN-1, and NN-2]
The different versions of NN-K perform almost identically, while DGD is slower than all versions of NN-K by an order of magnitude.
22 Numerical results
Convergence of NN-K and DGD with decreasing α: divide α by 10 each time the algorithm has converged.
[Figure: error versus number of iterations for DGD, NN-0, NN-1, and NN-2, for two initial values of the penalty coefficient; panel (b) uses α_0 = 10⁻¹]
Exact convergence is achieved by decreasing α, and a larger initial value of α leads to faster convergence for both DGD and NN-K.
23 Conclusions
We introduced a network optimization formulation in which each agent has a local cost function f_i and the global cost is f = Σ_{i=1}^n f_i. Network Newton was proposed as a second-order distributed method that approximates the Newton step by truncating the Taylor series of the Hessian inverse. Linear convergence was established, and quadratic convergence in a specific interval was shown. According to the numerical results, NN converges faster than DGD.
More informationConvex Optimization. 9. Unconstrained minimization. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University
Convex Optimization 9. Unconstrained minimization Prof. Ying Cui Department of Electrical Engineering Shanghai Jiao Tong University 2017 Autumn Semester SJTU Ying Cui 1 / 40 Outline Unconstrained minimization
More informationShiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 3 Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 2 3.1. Gradient method Classical gradient method: to minimize a differentiable convex
More informationCoordinate Descent and Ascent Methods
Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:
More informationAccelerated Block-Coordinate Relaxation for Regularized Optimization
Accelerated Block-Coordinate Relaxation for Regularized Optimization Stephen J. Wright Computer Sciences University of Wisconsin, Madison October 09, 2012 Problem descriptions Consider where f is smooth
More informationA Second-Order Method for Strongly Convex l 1 -Regularization Problems
Noname manuscript No. (will be inserted by the editor) A Second-Order Method for Strongly Convex l 1 -Regularization Problems Kimon Fountoulakis and Jacek Gondzio Technical Report ERGO-13-11 June, 13 Abstract
More informationarxiv: v2 [cs.lg] 8 Nov 2018
An Exact Quantized Decentralized Gradient Descent Algorithm Amirhossein Reisizadeh, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani arxiv:806.536v cs.lg] 8 Nov 08 Abstract We consider the problem of decentralized
More informationLECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION
15-382 COLLECTIVE INTELLIGENCE - S19 LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION TEACHER: GIANNI A. DI CARO WHAT IF WE HAVE ONE SINGLE AGENT PSO leverages the presence of a swarm: the outcome
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationInverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 2007 Technische Universiteit Eindh ove n University of Technology
Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 27 Introduction Fredholm first kind integral equation of convolution type in one space dimension: g(x) = 1 k(x x )f(x
More informationDistributed Optimization over Networks Gossip-Based Algorithms
Distributed Optimization over Networks Gossip-Based Algorithms Angelia Nedić angelia@illinois.edu ISE Department and Coordinated Science Laboratory University of Illinois at Urbana-Champaign Outline Random
More informationOn the Linear Convergence of Distributed Optimization over Directed Graphs
1 On the Linear Convergence of Distributed Optimization over Directed Graphs Chenguang Xi, and Usman A. Khan arxiv:1510.0149v1 [math.oc] 7 Oct 015 Abstract This paper develops a fast distributed algorithm,
More informationSimple Iteration, cont d
Jim Lambers MAT 772 Fall Semester 2010-11 Lecture 2 Notes These notes correspond to Section 1.2 in the text. Simple Iteration, cont d In general, nonlinear equations cannot be solved in a finite sequence
More information5 Quasi-Newton Methods
Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min
More informationParallel Coordinate Optimization
1 / 38 Parallel Coordinate Optimization Julie Nutini MLRG - Spring Term March 6 th, 2018 2 / 38 Contours of a function F : IR 2 IR. Goal: Find the minimizer of F. Coordinate Descent in 2D Contours of a
More informationarxiv: v3 [math.oc] 1 Jul 2015
On the Convergence of Decentralized Gradient Descent Kun Yuan Qing Ling Wotao Yin arxiv:1310.7063v3 [math.oc] 1 Jul 015 Abstract Consider the consensus problem of minimizing f(x) = n fi(x), where x Rp
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationTrade-Offs in Distributed Learning and Optimization
Trade-Offs in Distributed Learning and Optimization Ohad Shamir Weizmann Institute of Science Includes joint works with Yossi Arjevani, Nathan Srebro and Tong Zhang IHES Workshop March 2016 Distributed
More informationAdaptive Piecewise Polynomial Estimation via Trend Filtering
Adaptive Piecewise Polynomial Estimation via Trend Filtering Liubo Li, ShanShan Tu The Ohio State University li.2201@osu.edu, tu.162@osu.edu October 1, 2015 Liubo Li, ShanShan Tu (OSU) Trend Filtering
More informationSparse Gaussian conditional random fields
Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationLecture 17: October 27
0-725/36-725: Convex Optimiation Fall 205 Lecturer: Ryan Tibshirani Lecture 7: October 27 Scribes: Brandon Amos, Gines Hidalgo Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These
More informationOptimization Methods for Machine Learning
Optimization Methods for Machine Learning Sathiya Keerthi Microsoft Talks given at UC Santa Cruz February 21-23, 2017 The slides for the talks will be made available at: http://www.keerthis.com/ Introduction
More informationIterative Methods. Splitting Methods
Iterative Methods Splitting Methods 1 Direct Methods Solving Ax = b using direct methods. Gaussian elimination (using LU decomposition) Variants of LU, including Crout and Doolittle Other decomposition
More information