Asynchronous Non-Convex Optimization For Separable Problem


1 Asynchronous Non-Convex Optimization For Separable Problem
Sandeep Kumar and Ketan Rajawat
Dept. of Electrical Engineering, IIT Kanpur, Uttar Pradesh, India

2 Distributed Optimization

A general multi-agent cooperative distributed optimization problem:

$\min f(x) := \sum_{k=1}^{K} g_k(x) + h(x)$ s.t. $x \in \mathcal{X}$  (1)

where $x \in \mathbb{R}^{N \times 1}$, $g_k : \mathbb{R}^N \to \mathbb{R}$ for $k = 1, \ldots, K$, and $h : \mathbb{R}^N \to \mathbb{R}$.

[Figure: centralized (master node) and decentralized architectures for distributed optimization.]

Applications: machine learning, robotics, economics, big data analytics, network optimization, signal processing. Each agent could be a sensor node, a processor, a robot, etc. Heterogeneity of the nodes, together with resource, spatial, and temporal constraints, motivates distributed and asynchronous decision making.

3 Distributed and Asynchronous Optimization Literature

$g_k(x)$ is convex: huge literature for both architectures for (1): consensus-based, diffusion, gossip-based, incremental (sub)gradient, distributed (sub)gradient, distributed ADMM, dual averaging, proximal dual, block coordinate, mirror descent, alternating minimization [1]-[10].

$g_k(x)$ is non-convex: provably convergent solutions were proposed only very recently, and only for the centralized architecture: non-convex ADMM, stochastic subgradient, proximal dual [11]-[14]. Applicable to many applications, including machine learning.

Limitations of the centralized architecture: it needs a master node, or assumes sharing of a global database/variable among all nodes, and is thus not suited for communication and sensor network applications.

4 Partially Separable Form [15]

Partition the variables $\{x_n\}_{n=1}^{N}$ among nodes: $\{\mathcal{S}_k\}_{k=1}^{K}$ denote disjoint subsets, and $\{x_n \mid n \in \mathcal{S}_k\}$ are local to node $k$; the function $g_k(\cdot)$ at node $k$ depends on the neighborhood $\mathcal{N}_k$.

$P^\star = \min \sum_{k=1}^{K} g_k(\{x_n\}_{n \in \mathcal{N}_k}) + h_k(\{x_n\}_{n \in \mathcal{S}_k})$  (2)
s.t. $\{x_n\}_{n \in \mathcal{S}_k} \in \mathcal{X}_k$, $k = 1, 2, \ldots, K$

Figure 1: Factor graph representation for the objective function of (2).

5 Decentralized Consensus Problem Formulation

Introduce copies $x_j^k$ of the variable $x_j$, $j \in \mathcal{N}_k$. The consensus variable is $z = \{z_k\}$, $k = 1, \ldots, K$, with $x_k = \{x_j^k\}_{j \in \mathcal{N}_k}$, $z_k = \{z_j\}_{j \in \mathcal{N}_k}$, and $y_k = \{y_j^k\}_{j \in \mathcal{N}_k}$.

$\min_{\{x_k\}, z} \sum_{k=1}^{K} g_k(\{x_j^k\}_{j \in \mathcal{N}_k}) + h_k(z_k)$  (3)
s.t. $x_j^k = z_j$, $j \in \mathcal{N}_k$  (4)
$z_k \in \mathcal{X}_k$, $k = 1, \ldots, K$  (5)

[Figure: factor graph with local copies $x_j^k$ at the function nodes $g_1, \ldots, g_4$ and consensus variables $z_1, \ldots, z_4$.]

The augmented Lagrangian is

$L(\{x_k\}, z, \{y_k\}) = \sum_{k=1}^{K} \Big( g_k(x_k) + h_k(z_k) + \sum_{j \in \mathcal{N}_k} \langle y_j^k, x_j^k - z_j \rangle + \frac{\rho}{2} \sum_{j \in \mathcal{N}_k} \| x_j^k - z_j \|^2 \Big)$  (6)

6 A Wireless Sensor Network Example

7 Fundamentals of ADMM

The alternating direction method of multipliers (ADMM) blends the decomposability of dual ascent with the superior convergence properties of the method of multipliers. The algorithm solves problems of the form

$\min f(x) + g(z)$ s.t. $Ax + Bz = c$  (7)

with variables $x \in \mathbb{R}^n$ and $z \in \mathbb{R}^m$, where $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, and $c \in \mathbb{R}^p$. The optimal value is $p^\star = \inf \{ f(x) + g(z) \mid Ax + Bz = c \}$. With the augmented Lagrangian

$L_\rho(x, z, y) = f(x) + g(z) + y^T (Ax + Bz - c) + (\rho/2) \| Ax + Bz - c \|_2^2$,

the updates are

$z^{k+1} := \arg\min_z L_\rho(x^k, z, y^k)$  ($z$-minimization step)
$x^{k+1} := \arg\min_x L_\rho(x, z^{k+1}, y^k)$  ($x$-minimization step)
$y^{k+1} := y^k + \rho (Ax^{k+1} + Bz^{k+1} - c)$  (dual variable update, $\rho > 0$)  (8)
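A minimal sketch of the two-block updates in (8) on a toy instance (not the talk's algorithm): $f(x) = \|x - a\|^2$, $g(z) = \lambda \|z\|_1$, $A = I$, $B = -I$, $c = 0$; all names and parameters here are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_toy(a, lam=0.5, rho=1.0, iters=100):
    x = np.zeros_like(a); z = np.zeros_like(a); y = np.zeros_like(a)
    for _ in range(iters):
        # z-step: argmin_z lam*||z||_1 + (rho/2)*||x - z + y/rho||^2
        z = soft_threshold(x + y / rho, lam / rho)
        # x-step: argmin_x ||x - a||^2 + (rho/2)*||x - z + y/rho||^2
        x = (2 * a + rho * z - y) / (2 + rho)
        # dual step: y^{k+1} = y^k + rho*(Ax + Bz - c) = y^k + rho*(x - z)
        y = y + rho * (x - z)
    return x, z

x, z = admm_toy(np.array([1.5, -0.2, 0.8]))
```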

8 Consensus update

Starting with arbitrary $\{x_k^1\}$ and $\{y_j^{k,1}\}$, the updates for $\{z_j^{t+1}\}$ are

$z_j^{t+1} = \arg\min_{z_j \in \mathcal{X}_j} h_j(z_j) + \sum_{k \in \mathcal{N}_j} \Big( \langle y_j^{k,t}, x_j^{k,t} - z_j \rangle + \frac{\rho}{2} \| x_j^{k,t} - z_j \|^2 \Big) = \mathrm{prox}_j \Big( \sum_{k \in \mathcal{N}_j} \frac{\rho x_j^{k,t} + y_j^{k,t}}{N_j \rho} \Big)$  (9)

where the proximal point function $\mathrm{prox}_j(\cdot)$ is defined as

$\mathrm{prox}_j(x) := \arg\min_{u \in \mathcal{X}_j} h_j(u) + \frac{N_j \rho}{2} \| x - u \|^2$.  (10)
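A minimal sketch of (9) for one coordinate $j$, assuming $h_j \equiv 0$ and $\mathcal{X}_j = \mathbb{R}$ so that $\mathrm{prox}_j$ reduces to the identity; the array layout is illustrative.

```python
import numpy as np

def z_update(x_j, y_j, rho):
    """x_j, y_j: arrays holding the copies x_j^{k,t} and duals y_j^{k,t}, k in N_j.
    Returns z_j^{t+1} = prox_j( sum_k (rho*x_j^k + y_j^k) / (N_j*rho) )."""
    N_j = len(x_j)
    v = np.sum(rho * x_j + y_j) / (N_j * rho)
    # With a nontrivial h_j / X_j, return argmin_{u in X_j} h_j(u) + (N_j*rho/2)*(v-u)^2.
    return v
```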

9 Primal and Dual updates

$x_k^{t+1} = \arg\min_{x_k} g_k(x_k) + \sum_{j \in \mathcal{N}_k} \langle y_j^{k,t}, x_j^k - z_j^{t+1} \rangle + \frac{\rho}{2} \sum_{j \in \mathcal{N}_k} \| x_j^k - z_j^{t+1} \|^2$  (11)

By linearizing $g_k(x_k)$ at $z_k^{t+1}$, an approximate yet accurate update can be obtained as

$x_k^{t+1} \approx \arg\min_{x_k} g_k(z_k^{t+1}) + \langle \nabla g_k(z_k^{t+1}), x_k - z_k^{t+1} \rangle + \sum_{j \in \mathcal{N}_k} \langle y_j^{k,t}, x_j^k - z_j^{t+1} \rangle + \frac{\rho}{2} \sum_{j \in \mathcal{N}_k} \| x_j^k - z_j^{t+1} \|^2$  (12)

where the vector $[z_k]_j := z_j$ for all $j \in \mathcal{N}_k$ and is zero otherwise. The approximate update of $x_j^{k,t+1}$ thus becomes

$x_j^{k,t+1} = \begin{cases} z_j^{t+1} - \frac{1}{\rho} \big( [\nabla g_k(z_k^{t+1})]_j + y_j^{k,t} \big) & j \in \mathcal{N}_k \\ 0 & j \notin \mathcal{N}_k \end{cases}$

The dual updates are

$y_j^{k,t+1} = y_j^{k,t} + \rho \big( x_j^{k,t+1} - z_j^{t+1} \big), \quad j \in \mathcal{N}_k$  (13)
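A minimal sketch of the closed-form linearized primal update and the dual update (13) at one node $k$, restricted to the coordinates $j \in \mathcal{N}_k$; `grad_gk` is an assumed callable returning $\nabla g_k$ on those coordinates.

```python
import numpy as np

def primal_dual_update(z_new, y, grad_gk, rho):
    """z_new, y: arrays indexed by j in N_k; grad_gk(z) -> [grad g_k(z)]_j, j in N_k."""
    x_new = z_new - (grad_gk(z_new) + y) / rho   # x_j = z_j - ([grad]_j + y_j)/rho
    y_new = y + rho * (x_new - z_new)            # (13); this equals -grad_gk(z_new)
    return x_new, y_new
```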

10 Asynchronous updates

Skipping the $g_k(\cdot)$ or $\mathrm{prox}_k(\cdot)$ calculations, and the communications, is allowed for some iterations. Let $\mathcal{S}^t$ be the set of nodes that carry out the update at time $t$; then the update can be written as

$z_j^{t+1} = \begin{cases} \mathrm{prox}_j \Big( \sum_{k \in \mathcal{N}_j} \frac{\rho x_j^{k,t} + y_j^{k,t}}{N_j \rho} \Big) & j \in \mathcal{S}^t \\ z_j^t & j \notin \mathcal{S}^t \end{cases}$  (14)

with $\mathrm{prox}_j(x) := \arg\min_{u \in \mathcal{X}_j} h_j(u) + \frac{N_j \rho}{2} \| x - u \|^2$.

Use the latest available gradient $\nabla g_k(z_k^{[t+1]})$ for the $x$ update, where $t + 1 - T \le [t+1] \le t + 1$ for some $T < \infty$:

$x_j^{k,t+1} = \begin{cases} z_j^{t+1} - \frac{1}{\rho} \big( [\nabla g_k(z_k^{[t+1]})]_j + y_j^{k,t} \big) & j \in \mathcal{N}_k \\ 0 & j \notin \mathcal{N}_k \end{cases}$  (15)

$y_j^{k,t+1} = y_j^{k,t} + \rho \big( x_j^{k,t+1} - z_j^{t+1} \big), \quad j \in \mathcal{N}_k$  (16)
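A minimal sketch of the partial update (14): only coordinates in $\mathcal{S}^t$ refresh their consensus value, the rest carry the stale value forward. The dictionary layout is illustrative, and $h_j \equiv 0$ is assumed as before.

```python
import numpy as np

def async_z_update(z, x, y, S_t, members, rho):
    """z: dict j -> z_j^t; x, y: dicts (k, j) -> x_j^{k,t}, y_j^{k,t};
    S_t: set of updating coordinates; members: dict j -> nodes k with j in N_k."""
    z_new = dict(z)                          # j not in S_t keeps the stale z_j^t
    for j in S_t:
        N_j = members[j]
        z_new[j] = sum(rho * x[(k, j)] + y[(k, j)] for k in N_j) / (len(N_j) * rho)
    return z_new
```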

11 Async-ADMM Algorithm

Set $t = 1$; initialize $\{x_j^{k,1}, y_j^{k,1}, z_j^1\}$ for all $j \in \mathcal{N}_k$.
for $t = 1, 2, \ldots$
  (Optional) Send $\{\rho x_j^{k,t} + y_j^{k,t}\}$ to neighbors $j \in \mathcal{N}_k$
  if $\{\rho x_j^{k,t} + y_j^{k,t}\}$ received from all $j \in \mathcal{N}_k$ then
    (Optional) Update $z^{t+1}$ as in (14) and transmit it to each $j \in \mathcal{N}_k$
  end if
  if $z_j^{t+1}$ not received from some $j \in \mathcal{N}_k$ then
    set $z_j^{t+1} = z_j^t$
  end if
  (Optional) Calculate the gradient $\nabla g_k(z_k^{t+1})$
  Update the primal variable $x_k^{t+1}$ as in (15)
  if $\| x_k^{t+1} - x_k^t \| \le \delta$ then terminate loop
end for
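A minimal end-to-end simulation sketch of updates (14)-(16) under stated simplifications: scalar variables with $\mathcal{N}_k = \{k\}$ (each node owns one coordinate), $g_k(x) = (x - a_k)^2$, $h_j \equiv 0$, and random update sets $\mathcal{S}^t$ to mimic asynchrony. The data $a_k$ and all parameters are illustrative; $\rho$ is taken comfortably large, since a small $\rho$ can diverge here.

```python
import numpy as np

rng = np.random.default_rng(0)
K, iters, rho = 5, 300, 10.0                # rho large relative to L_k = 2
a = rng.normal(size=K)                      # local data a_k (hypothetical)
x = np.zeros(K); y = np.zeros(K); z = np.zeros(K)

for t in range(iters):
    S_t = rng.random(K) < 0.7               # each coordinate updates w.p. 0.7
    z = np.where(S_t, x + y / rho, z)       # (14): N_j = {j}, h_j = 0
    x = z - (2 * (z - a) + y) / rho         # (15): grad g_k(z_k) = 2*(z_k - a_k)
    y = y + rho * (x - z)                   # (16)

print(np.allclose(x, a, atol=1e-3))         # iterates approach the minimizers a_k
```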

12 Assumptions

Assumption 1. For each node $k$, the component function gradient $\nabla g_k(x)$ is Lipschitz continuous; that is, there exists $L_k > 0$ such that for all $x, x' \in \mathrm{dom}\, g_k$,

$\| \nabla g_k(x) - \nabla g_k(x') \| \le L_k \| x - x' \|$.  (17)

Assumption 2. The set $\mathcal{X}$ is closed, convex, and compact. Each function $g_k(x)$ is bounded from below over $\mathcal{X}$.

Assumption 3. For each node $k$, the step size $\rho_k$ is chosen large enough that $\alpha_k > 0$ and $\beta_k > 0$, where

$\alpha_k := \rho_k - (7L_k + 1) - \frac{N_k L_k^2 (T+1)^2}{2 \rho_k} - \frac{N_k L_k T^2}{2}, \qquad \beta_k := \frac{\rho_k}{2} - \frac{7 L_k^2}{2 \rho_k}$  (18)

13 Lemma

(a) Starting from any time $t = t_0$, there exists $T < \infty$ such that

$L(\{x_k^{T+t_0}\}, z^{T+t_0}, \{y_k^{T+t_0}\}) \le L(\{x_k^{t_0}\}, z^{t_0}, \{y_k^{t_0}\}) - \sum_{i=t_0}^{T+t_0-1} \sum_{k=1}^{K} \frac{\beta_k}{2} \sum_{j \in \mathcal{N}_k} \| x_j^{k,i+1} - x_j^{k,i} \|^2 - \sum_{i=t_0}^{T+t_0} \sum_{k=1}^{K} \alpha_k \sum_{j \in \mathcal{N}_k} \| z_j^{i+1} - z_j^{i} \|^2$.  (19)

(b) The augmented Lagrangian values in (6) are bounded from below; i.e., for any time $t \ge 1$,

$L(\{x_k^t\}, z^t, \{y_k^t\}) \ge P^\star - \sum_{k} \frac{L_k}{2} \sum_{j \in \mathcal{N}_k} \mathrm{diam}^2(\mathcal{X}_j) > -\infty$.

14 Theorem

(a) The iterates generated by the asynchronous algorithm converge in the following sense:

$\lim_{t \to \infty} \| z^{t+1} - z^t \| = 0$,  (20a)
$\lim_{t \to \infty} \| x_j^{k,t+1} - x_j^{k,t} \| = 0, \quad j \in \mathcal{N}_k$,  (20b)
$\lim_{t \to \infty} \| y_j^{k,t+1} - y_j^{k,t} \| = 0, \quad j \in \mathcal{N}_k$.  (20c)

(b) For each $k \le K$ and $j \in \mathcal{N}_k$, denote the limit points of the sequences $\{z^t\}$, $\{x_j^{k,t}\}$, and $\{y_j^{k,t}\}$ by $z^\star$, $x_j^{k\star}$, and $y_j^{k\star}$, respectively. Then $\{\{z^\star\}, \{x_j^{k\star}\}, \{y_j^{k\star}\}\}$ is a stationary point of (3) and satisfies

$\nabla g_k(x_k^\star) + y_k^\star = 0, \quad k = 1, \ldots, K$  (21a)
$\sum_{j \in \mathcal{N}_k} y_j^{k\star} \in \partial h_k(z) \big|_{z = z^\star}, \quad k = 1, \ldots, K$  (21b)
$x_j^{k\star} = z_j^\star \in \mathcal{X}_j, \quad j \in \mathcal{N}_k, \; k = 1, \ldots, K$  (21c)

15 Practical examples of the partially separable form

Distributed cooperative localization over networks:

$\hat{X} = \arg\min_{X \in \mathcal{B}} \sum_{k=1}^{K} g_k(\{x_j\}_{j \in \mathcal{N}_k})$  (22)

$g_k(\{x_j\}_{j \in \mathcal{N}_k}) = \sum_{j \in \mathcal{N}_k} w_{kj} \big( \delta_{kj} - \sqrt{\| x_k - x_j \|^2 + \epsilon} \big)^2$  (23)

Figure 2: Cooperative Localization Example
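A minimal sketch of one node's cost in (23) as reconstructed above: $\delta_{kj}$ are range measurements to neighbors, $w_{kj}$ their weights, and $\epsilon$ smooths the square root at zero. All names are illustrative.

```python
import numpy as np

def g_k(x_k, neighbor_pos, delta, w, eps=1e-6):
    """x_k: (2,) position of node k; neighbor_pos: (m, 2); delta, w: (m,)."""
    dists = np.sqrt(np.sum((neighbor_pos - x_k) ** 2, axis=1) + eps)
    return np.sum(w * (delta - dists) ** 2)  # weighted squared range residuals
```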

16 Thank You

17 References

[1] Chang, Tsung-Hui, Angelia Nedić, and Anna Scaglione. Distributed constrained optimization by consensus-based primal-dual perturbation method. IEEE Transactions on Automatic Control 59.6 (2014).
[2] Lobel, Ilan, and Asuman Ozdaglar. Distributed subgradient methods for convex optimization over random networks. IEEE Transactions on Automatic Control 56.6 (2011).
[3] Nedić, Angelia, Dimitri P. Bertsekas, and Vivek S. Borkar. Distributed asynchronous incremental subgradient methods. (2000).
[4] Duchi, John C., Alekh Agarwal, and Martin J. Wainwright. Dual averaging for distributed optimization: convergence analysis and network scaling. IEEE Transactions on Automatic Control 57.3 (2012).
[5] Chen, Jianshu, and Ali H. Sayed. Diffusion adaptation strategies for distributed optimization and learning over networks. IEEE Transactions on Signal Processing 60.8 (2012).

18 References

[6] Zhang, Ruiliang, and James T. Kwok. Asynchronous Distributed ADMM for Consensus Optimization. ICML 2014.
[7] Richtárik, Peter, and Martin Takáč. Distributed coordinate descent method for learning with big data. (2013).
[8] Dekel, Ofer, et al. Optimal distributed online prediction using mini-batches. Journal of Machine Learning Research 13.Jan (2012).
[9] Boyd, Stephen, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3.1 (2011).
[10] Boyd, Stephen, et al. Gossip algorithms: Design, analysis and applications. Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies, Vol. 3. IEEE, 2005.

19 References

[11] Hong, Mingyi. A distributed, asynchronous and incremental algorithm for nonconvex optimization: An ADMM based approach. arXiv preprint (2014).
[12] Davis, Damek. The Asynchronous PALM Algorithm for Nonsmooth Nonconvex Problems. arXiv preprint (2016).
[13] Ghadimi, Saeed, and Guanghui Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization 23.4 (2013).
[14] Hong, Mingyi. Decomposing linearly constrained nonconvex problems by a proximal primal dual approach: Algorithms, convergence, and applications. arXiv preprint (2016).
[15] Kumar, Sandeep, Rahul Jain, and Ketan Rajawat. Asynchronous Optimization Over Heterogeneous Networks via Consensus ADMM. arXiv preprint (2016).
