ADMM and Fast Gradient Methods for Distributed Optimization


1 ADMM and Fast Gradient Methods for Distributed Optimization
João Xavier, Instituto Sistemas e Robótica (ISR), Instituto Superior Técnico (IST)
European Control Conference, ECC'13, July 16, 2013

2 Joint work
Fast gradient methods: Dušan Jakovetić (IST-CMU), José M. F. Moura (CMU)
ADMM: João Mota (IST-CMU), Markus Püschel (ETH), Pedro Aguiar (IST)

3 Multi-agent optimization

minimize    f(x) := f_1(x) + f_2(x) + ... + f_n(x)
subject to  x ∈ X

- f_i is convex, private to agent i
- X ⊆ R^d is closed and convex (hereafter, d = 1)
- f* = inf_{x ∈ X} f(x) is attained at x*
- network is connected and static
- applications: distributed learning, cognitive radio, consensus, ...

4 Distributed subgradient method

Update at each agent i (with constant step size):

x_i(t) = P_X( Σ_{j ∈ N_i} W_ij x_j(t−1) − α ∇f_i(x_i(t−1)) )

- N_i is the neighborhood of node i (including i)
- W_ij are weights
- α > 0 is the stepsize
- ∇f_i(x) is a subgradient of f_i at x
- P_X is the projector onto X

5 Several variations exist and time-varying networks are supported

Small sample of representative work:
- J. Tsitsiklis et al., Distributed asynchronous deterministic and stochastic gradient optimization algorithms, IEEE TAC, 31(9), 1986
- A. Nedić and A. Ozdaglar, Distributed subgradient methods for multi-agent optimization, IEEE TAC, 54(1), 2009
- B. Johansson, A randomized incremental subgradient method for distributed optimization in networked systems, SIAM J. Optimization, 20(3), 2009
- A. Nedić et al., Constrained consensus and optimization in multi-agent networks, IEEE TAC, 55(4), 2010
- J. Duchi et al., Dual averaging for distributed optimization: convergence and network scaling, IEEE TAC, 57(3), 2012

6 Convergence analysis: under appropriate conditions,

f(x_i(t)) − f* = O( α + 1/(αt) )

For optimized α, O(1/ε²) iterations suffice to reach ε-suboptimality.

Distributed subgradient method in matrix form:

x(t) = P( W x(t−1) − α ∇F(x(t−1)) )

- x(t) := (x_1(t), ..., x_n(t)) is the network state
- F(x_1, ..., x_n) := f_1(x_1) + ... + f_n(x_n)
- ∇F(x_1, ..., x_n) = (∇f_1(x_1), ..., ∇f_n(x_n))
- P is the projector onto X^n
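
For concreteness, the matrix-form update translates into a few lines of code. The sketch below is illustrative only, not the implementation behind the talk's experiments; the quadratic local costs, the interval constraint, and the path-graph weight matrix are made-up placeholders.

```python
import numpy as np

def distributed_subgradient(subgrads, proj, W, x0, alpha, iters):
    """Sketch of the matrix-form update x(t) = P(W x(t-1) - alpha * grad F(x(t-1)))
    for scalar local variables (d = 1).

    subgrads : list of callables, subgrads[i](x_i) returns a subgradient of f_i at x_i
    proj     : callable applied entry-wise, i.e. the projector onto X^n
    W        : (n, n) symmetric stochastic weight matrix matching the network
    x0       : (n,) initial network state
    alpha    : constant step size
    """
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        g = np.array([sg(xi) for sg, xi in zip(subgrads, x)])  # grad F(x(t-1))
        x = proj(W @ x - alpha * g)                            # mix with neighbors, step, project
    return x

# Toy run: n agents minimizing sum_i (x - theta_i)^2 over X = [-10, 10],
# on a path graph with Metropolis-like weights (all of this is made up).
n = 5
theta = np.linspace(1.0, 5.0, n)
subgrads = [lambda x, th=th: 2.0 * (x - th) for th in theta]
proj = lambda v: np.clip(v, -10.0, 10.0)
W = 0.5 * np.eye(n)
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 0.25
W[0, 0] = W[-1, -1] = 0.75
x = distributed_subgradient(subgrads, proj, W, np.zeros(n), alpha=0.005, iters=5000)
print(x)  # all entries end up near the average of theta; the residual error shrinks with alpha
```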

7 Interpretation: when W = I − αρL (L = network Laplacian, ρ > 0),

x(t) = P( x(t−1) − α ∇Ψ_ρ(x(t−1)) )

i.e., the classical subgradient method applied to the penalized objective

Ψ_ρ(x) = F(x) + (ρ/2) xᵀLx = Σ_{i=1}^n f_i(x_i) + (ρ/2) Σ_{i∼j} (x_i − x_j)²

Key idea: apply instead Nesterov's fast gradient method

x(t) = P( y(t−1) − α ∇Ψ_ρ(y(t−1)) )
y(t) = x(t) + ((t−1)/(t+2)) (x(t) − x(t−1))

y(t) = (y_1(t), ..., y_n(t)) is an auxiliary variable

8 Distributed Nesterov gradient method (D-NG) with constant stepsize

x(t) = P( W y(t−1) − α ∇F(y(t−1)) )
y(t) = x(t) + ((t−1)/(t+2)) (x(t) − x(t−1))

Convergence analysis: if the f_i are differentiable, the ∇f_i are Lipschitz continuous, and X is compact,

f(x_i(t)) − f* = O( 1/ρ + 1/t² + ρ/t² )

For optimized ρ, O(1/ε) iterations suffice to reach ε-suboptimality.

D. Jakovetić et al., Distributed Nesterov-like gradient algorithms, IEEE 51st Annual Conference on Decision and Control (CDC), 2012
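
A minimal sketch of this D-NG iteration (again illustrative only; the local gradients, weight matrix W, and projector are assumed to be supplied by the caller):

```python
import numpy as np

def d_ng_constant(grads, proj, W, x0, alpha, iters):
    """Sketch of D-NG with constant stepsize:
        x(t) = P( W y(t-1) - alpha * grad F(y(t-1)) )
        y(t) = x(t) + (t-1)/(t+2) * (x(t) - x(t-1))
    grads[i](y_i) should return the gradient of f_i at y_i (scalar case, d = 1)."""
    x_prev = np.array(x0, dtype=float)
    y = x_prev.copy()
    for t in range(1, iters + 1):
        g = np.array([gi(yi) for gi, yi in zip(grads, y)])  # grad F(y(t-1))
        x = proj(W @ y - alpha * g)                         # projected consensus + gradient step
        y = x + (t - 1) / (t + 2) * (x - x_prev)            # Nesterov extrapolation
        x_prev = x
    return x_prev
```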

9 Proof

Step 1: plug in Nesterov's classical analysis

‖∇f_i(x) − ∇f_i(y)‖ ≤ L‖x − y‖ implies ‖∇Ψ_ρ(x) − ∇Ψ_ρ(y)‖ ≤ L_ρ‖x − y‖ for L_ρ = L + ρλ_max(L)

notation: Ψ_ρ* := inf_{x ∈ X^n} Ψ_ρ(x) is attained at x_ρ*

with α := 1/L_ρ, the classical analysis yields

Ψ_ρ(x(t)) − Ψ_ρ* ≤ (L_ρ/t²) ‖x(0) − x_ρ*‖² ≤ L_ρ B / t²    (1)

for some B ≥ 0 (since X is compact)

10 Step 2: relate f(x_i) to Ψ_ρ(x), x = (x_1, ..., x_n)

f(x_i) = Σ_{j=1}^n f_j(x_i)
       = [ Σ_{j=1}^n f_j(x_j) + (ρ/2) xᵀLx ] + [ Σ_{j=1}^n ( f_j(x_i) − f_j(x_j) ) − (ρ/2) xᵀLx ]
       =              Ψ_ρ(x)                 +                    Δ(x)                            (2)

Step 3: upper bound Δ(x) ≤ C/ρ for some C ≥ 0

use Lipschitz continuity of the f_j to obtain

Σ_{j=1}^n ( f_j(x_i) − f_j(x_j) ) ≤ G Σ_{j=1}^n |x_i − x_j|    (3)

for some G ≥ 0

11 since L1 = 0,

xᵀLx = (x − x_i 1)ᵀ L (x − x_i 1)    (4)

combine (3) and (4) to obtain

Δ(x) ≤ G ‖x − x_i 1‖_1 − (ρ/2) (x − x_i 1)ᵀ L (x − x_i 1) = G ‖x̃‖_1 − (ρ/2) x̃ᵀ L̃ x̃

- x̃ is x − x_i 1 with the i-th entry removed
- L̃ is L with the i-th row and i-th column removed
- it is easy to see that L̃ is positive definite (the network is connected)

12 it follows that

Δ(x) ≤ max_y ( G‖y‖_1 − (ρ/2) yᵀL̃y )
     = max_y max_{‖z‖_∞ ≤ 1} ( G zᵀy − (ρ/2) yᵀL̃y )
     = max_{‖z‖_∞ ≤ 1} max_y ( G zᵀy − (ρ/2) yᵀL̃y )
     = (1/ρ) (G²/2) max_{‖z‖_∞ ≤ 1} zᵀ L̃⁻¹ z  =: C/ρ    (5)

use f* ≥ Ψ_ρ*, and combine (1), (2) with (5) to conclude

f(x_i(t)) − f* ≤ (L + ρλ_max(L)) B / t² + C/ρ = O( 1/ρ + 1/t² + ρ/t² )

13 Numerical example: Distributed logistic regression

minimize_{s,r}  Σ_{i=1}^n f_i(s,r),    f_i(s,r) := Σ_{j=1}^5 φ( −b_ij (sᵀa_ij + r) )
subject to      ‖s‖ ≤ R

- {(a_ij, b_ij) ∈ R^3 × R : j = 1, ..., 5}: training data for agent i
- φ(t) = log(1 + e^t)
- geometric graph, n = 20 nodes and 86 edges
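
For reference, this is what one agent's local objective and gradient look like in code; the data layout (one feature matrix and label vector per agent, labels in {−1, +1}) is an assumption of this sketch, not taken from the talk:

```python
import numpy as np

def local_logistic(s, r, A_i, b_i):
    """One agent's cost f_i(s, r) = sum_j log(1 + exp(-b_ij (a_ij' s + r))) and its gradient.
    A_i: (m, d) feature matrix, b_i: (m,) labels assumed in {-1, +1}."""
    margins = -b_i * (A_i @ s + r)
    loss = np.sum(np.logaddexp(0.0, margins))       # log(1 + e^t), numerically stable
    sigma = 1.0 / (1.0 + np.exp(-margins))          # derivative of log(1 + e^t)
    grad_s = A_i.T @ (-b_i * sigma)
    grad_r = np.sum(-b_i * sigma)
    return loss, grad_s, grad_r
```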

14 Constant stepsize

e_f(t) := (1/n) Σ_{i=1}^n ( f(x_i(t)) − f* ) / f*

[Figure: e_f vs. iteration number k. Red: D-NG¹, Blue: subgradient², Black: dual averaging³]

1 D. Jakovetić et al., Distributed Nesterov-like gradient algorithms, IEEE 51st Annual Conference on Decision and Control (CDC), 2012
2 A. Nedić and A. Ozdaglar, Distributed subgradient methods for multi-agent optimization, IEEE TAC, 54(1), 2009
3 J. Duchi et al., Dual averaging for distributed optimization: convergence and network scaling, IEEE TAC, 57(3), 2012

15 Diminishing stepsize

e_f(t) := (1/n) Σ_{i=1}^n ( f(x_i(t)) − f* ) / f*

[Figure: e_f vs. iteration number k — Nesterov (D-NG), subgradient, dual averaging]

D-NG: α(t) = 1/t
Subgradient, dual averaging: α(t) = 1/√t

16 Distributed Nesterov gradient method (D-NG) with diminishing stepsize

Unconstrained problem:

minimize    f(x) := f_1(x) + f_2(x) + ... + f_n(x)
subject to  x ∈ R^d

Diminishing stepsize α(t) = c/(t+1):

x(t) = W y(t−1) − α(t−1) ∇F(y(t−1))    (6)
y(t) = x(t) + ((t−1)/(t+2)) (x(t) − x(t−1))    (7)

D. Jakovetić et al., Fast cooperative distributed learning, 46th Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2012
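
Relative to the constant-stepsize sketch above, the only changes are the schedule α(t) = c/(t+1) and the absence of a projection; a minimal sketch:

```python
import numpy as np

def d_ng_diminishing(grads, W, x0, c, iters):
    """Sketch of D-NG with diminishing stepsize alpha(t) = c/(t+1), equations (6)-(7);
    no projection, since the problem is unconstrained."""
    x_prev = np.array(x0, dtype=float)
    y = x_prev.copy()
    for t in range(1, iters + 1):
        alpha = c / t                                        # alpha(t-1) = c / ((t-1) + 1)
        g = np.array([gi(yi) for gi, yi in zip(grads, y)])   # grad F(y(t-1))
        x = W @ y - alpha * g                                # eq. (6)
        y = x + (t - 1) / (t + 2) * (x - x_prev)             # eq. (7)
        x_prev = x
    return x_prev
```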

17 Convergence analysis: if the f_i are differentiable, the ∇f_i are bounded and L-Lipschitz continuous, and W is symmetric, stochastic and positive definite,

f(x_i(t)) − f* = O( log t / t )

- dependence on the network spectral gap 1 − λ_2(W) also known
- agents may ignore L and λ_2(W)

Sample of related work (different assumptions):
- K. Tsianos and M. Rabbat, Distributed strongly convex optimization, 50th Allerton Conference on Communication, Control and Computing, 2012
- E. Ghadimi et al., Accelerated gradient methods for networked optimization, arXiv preprint, 2012

18 Sketch of proof

Step 1: look at the network averages

x̄(t) := (1/n) Σ_{i=1}^n x_i(t),    ȳ(t) := (1/n) Σ_{i=1}^n y_i(t)

from (6) and (7):

x̄(t) = ȳ(t−1) − (α(t−1)/n) Σ_{i=1}^n ∇f_i(y_i(t−1))
       [ α(t−1)/n plays the role of an inexact stepsize 1/L(t−1); the sum plays the role of ∇f(ȳ(t−1)) ]
ȳ(t) = x̄(t) + ((t−1)/(t+2)) (x̄(t) − x̄(t−1))

interpretation: inexact Nesterov gradient method

19 using ideas from optimization with inexact oracles⁴:

f(x̄(t)) − f* ≤ (n/(c t)) ‖x̄(0) − x*‖² + (L/t²) Σ_{s=0}^{t−1} ((s+2)²/(s+1)) ‖δ_y(s)‖    (8)

where δ_y(t) := y(t) − ȳ(t)1 = (y_1(t) − ȳ(t), ..., y_n(t) − ȳ(t))

4 O. Devolder et al., First-order methods of smooth convex optimization with inexact oracle, submitted, Mathematical Programming, 2011

20 Step 2: show δ_y(t) = O(1/t)

rewrite (6) and (7) as the time-varying linear system

[ δ_x(t) ; δ_x(t−1) ] = A(t) [ δ_x(t−1) ; δ_x(t−2) ] + u(t−1),    (9)

where

A(t) := [ ((2t−1)/(t+1)) W̃    −((t−2)/(t+1)) W̃ ;  I   0 ],    u(t) := [ −α(t) (I − J) ∇F(y(t)) ;  0 ]

- δ_x(t) := x(t) − x̄(t)1
- J := (1/n) 11ᵀ
- W̃ := W − J

u(t) = O(1/t) due to α(t) = c/(t+1) and the bounded gradient assumption

21 Fact: if x(t) = λ x(t−1) + O(1/t) with |λ| < 1, then x(t) = O(1/t)    (10)

hand-waving argument: upon approximating

A(t) ≈ [ 2W̃   −W̃ ;  I   0 ]

and diagonalizing, system (9) reduces to (10)

since δ_x(t) = O(1/t),

δ_y(t) = δ_x(t) + ((t−1)/(t+2)) (δ_x(t) − δ_x(t−1)) = O(1/t)

22 Step 3: relate f(x_i) to f(x̄)

f(x_i) = Σ_{j=1}^n f_j(x_i) = Σ_{j=1}^n f_j(x̄) + Σ_{j=1}^n ( f_j(x_i) − f_j(x̄) ) = f(x̄) + Δ(x)    (11)

by the bounded gradient assumption

Δ(x) = Σ_{j=1}^n ( f_j(x_i) − f_j(x̄) ) ≤ G n |x_i − x̄| ≤ G n ‖δ_x‖    (12)

combine (8), (11) and (12) to conclude

f(x_i(t)) − f* = O( log t / t )

23 Numerical example: Acoustic source localization in sensor networks

agent i measures y_i = 1/‖x − r_i‖² + noise

- r_i = position of agent i
- goal: determine the source position x

convex approach⁵:

minimize_x  Σ_{i=1}^n dist²(x, C_i)    with  C_i = { x : ‖x − r_i‖² ≤ 1/y_i }

geometric graph, n = 70 nodes and 99 edges

5 A. O. Hero and D. Blatt, Sensor network source localization via projection onto convex sets (POCS), IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2005
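
A small sketch of the summand dist²(x, C_i) under the reading of C_i used above (a disk around r_i whose radius is implied by the range measurement); treat that reading as an assumption of this sketch:

```python
import numpy as np

def dist2_to_disk(x, r_i, y_i):
    """Squared distance from x to C_i = {z : ||z - r_i|| <= 1/sqrt(y_i)}, the disk around
    sensor i whose radius is implied by the measurement y_i."""
    gap = np.linalg.norm(np.asarray(x) - np.asarray(r_i)) - 1.0 / np.sqrt(y_i)
    return max(gap, 0.0) ** 2                        # zero inside the disk
```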

24 e_f(t) := (1/n) Σ_{i=1}^n ( f(x_i(t)) − f* ) / f*

[Figure: average relative error (log₁₀) vs. iteration number k (log₁₀). Red: D-NG⁶ with α(t) = 1/(t+1); others: distributed subgradient⁷ with α_k = 1/(k+1)^{1/2}, 1/(k+1)^{1/3}, 1/(k+1)^{1/10}, 1/(k+1)]

6 D. Jakovetić et al., Fast cooperative distributed learning, 46th Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2012
7 A. Nedić and A. Ozdaglar, Distributed subgradient methods for multi-agent optimization, IEEE TAC, 54(1), 2009

25 Numerical example: Distributed regularized logistic regression

minimize_{s,r}  Σ_{i=1}^n [ φ( −b_i (sᵀa_i + r) ) + β‖s‖² ]    [ bracketed term = f_i(s,r) ]

- (a_i, b_i): training data for agent i
- φ(t) = log(1 + e^t)
- geometric graph, n = 20 nodes and 67 edges

26 e_f(t) := (1/n) Σ_{i=1}^n ( f(x_i(t)) − f* ) / f*

[Figure: average relative error (log₁₀) vs. iteration number k (log₁₀). Red: D-NG with α(t) = 1/(t+1); others: distributed subgradient with diminishing stepsizes α_k = 1/(k+1)^{1/2}, 1/(k+1)^{1/3}, ..., 1/(k+1)]

27 Distributed Nesterov gradient method with consensus iterations (D-NC)

x(t) = W^{a(t)} ( y(t−1) − α ∇F(y(t−1)) )
y(t) = W^{b(t)} ( x(t) + ((t−1)/(t+2)) (x(t) − x(t−1)) )

a(t) = ⌈ 2 log t / (−log λ_2(W)) ⌉  and  b(t) = ⌈ 3 log t / (−log λ_2(W)) ⌉;  λ_2(W) must be known by all agents

D. Jakovetić et al., Fast distributed gradient methods, arXiv preprint, 2011
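
A minimal sketch of D-NC; the integer rounding of a(t) and b(t) and the log(t+1) argument are choices of this sketch, not prescribed by the slide:

```python
import numpy as np

def d_nc(grads, W, lam2, x0, alpha, iters):
    """Sketch of D-NC: each outer iteration wraps a Nesterov step in a(t) and b(t)
    rounds of consensus (powers of W).  lam2 is lambda_2(W), known to all agents."""
    x_prev = np.array(x0, dtype=float)
    y = x_prev.copy()
    for t in range(1, iters + 1):
        a_t = int(np.ceil(2.0 * np.log(t + 1) / -np.log(lam2)))   # consensus rounds after the gradient step
        b_t = int(np.ceil(3.0 * np.log(t + 1) / -np.log(lam2)))   # consensus rounds after the extrapolation
        g = np.array([gi(yi) for gi, yi in zip(grads, y)])
        x = np.linalg.matrix_power(W, a_t) @ (y - alpha * g)
        y = np.linalg.matrix_power(W, b_t) @ (x + (t - 1) / (t + 2) * (x - x_prev))
        x_prev = x
    return x_prev
```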

28 Convergence analysis: if the f_i are differentiable, the ∇f_i are bounded and L-Lipschitz continuous, W is symmetric and stochastic, and α = 1/(2L),

f(x_i(t)) − f* = O( 1/t² )

- t iterations involve O(t log t) communication rounds
- dependence on the network spectral gap also known

Similar rate guarantees in:
A. Chen and A. Ozdaglar, A fast distributed proximal gradient method, 50th Allerton Conference on Communication, Control and Computing, 2012

29 Distributed optimization via ADMM (D-ADMM)

ADMM = Alternating Direction Method of Multipliers

Recent review: S. Boyd et al., Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, 2011

[Figure: star network]

30 ADMM, distributed optimization:
- I. Schizas et al., Consensus in ad hoc WSNs with noisy links - part I: distributed estimation of deterministic signals, IEEE TSP, 56(1), 2008
- many others

This talk: generic network
- J. Mota et al., D-ADMM: a communication-efficient distributed algorithm for separable optimization, IEEE TSP, 61(10), 2013
- J. Mota et al., Distributed optimization with local domains: applications in MPC and network flows, arXiv preprint, 2013

31 Illustrative example:

[Graph: node 1 with f_1(x_1, x_2), node 2 with f_2(x_2), node 3 with f_3(x_1, x_2)]

minimize_{x_1, x_2}  f_1(x_1, x_2) + f_2(x_2) + f_3(x_1, x_2)

- the f_i are proper, closed, convex functions with range R ∪ {+∞}
- the f_i may depend on different subsets of variables

32 minimize_{x_1, x_2}  f_1(x_1, x_2) + f_2(x_2) + f_3(x_1, x_2)

Roadmap to obtain a distributed algorithm

Step 1: split each variable across the associated nodes
- node 1: x_1^(1), x_2^(1)
- node 2: x_2^(2)
- node 3: x_1^(3), x_2^(3)

Step 2: introduce consistency constraints

minimize    f_1( x_1^(1), x_2^(1) ) + f_2( x_2^(2) ) + f_3( x_1^(3), x_2^(3) )
subject to  x_1^(1) = x_1^(3)
            x_2^(1) = x_2^(3)
            x_2^(1) = x_2^(2)

33 Step 3: pass to the augmented Lagrangian dual

maximize_{λ_1^(1,3), λ_2^(1,3), λ_2^(1,2)}  L_ρ( λ_1^(1,3), λ_2^(1,3), λ_2^(1,2) )

with

L_ρ( λ_1^(1,3), λ_2^(1,3), λ_2^(1,2) )
  = f_1( x_1^(1), x_2^(1) ) + f_2( x_2^(2) ) + f_3( x_1^(3), x_2^(3) )
    + ⟨ λ_1^(1,3), x_1^(1) − x_1^(3) ⟩ + (ρ/2) ‖x_1^(1) − x_1^(3)‖²
    + ⟨ λ_2^(1,3), x_2^(1) − x_2^(3) ⟩ + (ρ/2) ‖x_2^(1) − x_2^(3)‖²
    + ⟨ λ_2^(1,2), x_2^(1) − x_2^(2) ⟩ + (ρ/2) ‖x_2^(1) − x_2^(2)‖²

34 Step 4: color the network

[Graph: node 1 in one color; nodes 2 and 3 in another color]

L_ρ( λ_1^(1,3), λ_2^(1,3), λ_2^(1,2) ) is the same expression as on the previous slide, with its terms grouped by node color.

35 Step 5: apply extended ADMM

Primal update at node 1:

( x_1^(1), x_2^(1) )(t+1) = argmin_{x_1, x_2}  f_1(x_1, x_2)
    + ⟨ λ_1^(1,3)(t), x_1 ⟩ + (ρ/2) ‖x_1 − x_1^(3)(t)‖²
    + ⟨ λ_2^(1,3)(t), x_2 ⟩ + (ρ/2) ‖x_2 − x_2^(3)(t)‖²
    + ⟨ λ_2^(1,2)(t), x_2 ⟩ + (ρ/2) ‖x_2 − x_2^(2)(t)‖²

36 Primal update at node 2:

x_2^(2)(t+1) = argmin_{x_2}  f_2(x_2) − ⟨ λ_2^(1,2)(t), x_2 ⟩ + (ρ/2) ‖x_2 − x_2^(1)(t+1)‖²

Primal update at node 3:

( x_1^(3), x_2^(3) )(t+1) = argmin_{x_1, x_2}  f_3(x_1, x_2)
    − ⟨ λ_1^(1,3)(t), x_1 ⟩ + (ρ/2) ‖x_1 − x_1^(1)(t+1)‖²
    − ⟨ λ_2^(1,3)(t), x_2 ⟩ + (ρ/2) ‖x_2 − x_2^(1)(t+1)‖²

Key point: nodes 2 and 3 work in parallel (same color)

37 Step 6: dual update at all relevant nodes

λ_1^(1,3)(t+1) = λ_1^(1,3)(t) + ρ ( x_1^(1)(t+1) − x_1^(3)(t+1) )
λ_2^(1,3)(t+1) = λ_2^(1,3)(t) + ρ ( x_2^(1)(t+1) − x_2^(3)(t+1) )
λ_2^(1,2)(t+1) = λ_2^(1,2)(t) + ρ ( x_2^(1)(t+1) − x_2^(2)(t+1) )

Convergence analysis:
- for 2 colors (bipartite network): classical ADMM results apply
- for 3 or more colors⁸: convergence for strongly convex functions and suitable ρ

8 D. Han and X. Yuan, A note on the alternating direction method of multipliers, JOTA, 155(1), 2012
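
Putting Steps 5 and 6 together for the three-node example, here is a minimal numerical sketch with made-up quadratic local costs; each argmin is solved by a generic solver rather than in closed form:

```python
import numpy as np
from scipy.optimize import minimize

# Steps 5-6 for the three-node example, with made-up quadratic local costs.
# Copies: node 1 holds (x1_1, x2_1), node 2 holds x2_2, node 3 holds (x1_3, x2_3).
rho = 1.0
f1 = lambda x1, x2: (x1 - 1.0) ** 2 + (x2 - 2.0) ** 2
f2 = lambda x2: (x2 - 3.0) ** 2
f3 = lambda x1, x2: (x1 - 4.0) ** 2 + (x2 - 5.0) ** 2

x1_1 = x2_1 = x2_2 = x1_3 = x2_3 = 0.0
lam1_13 = lam2_13 = lam2_12 = 0.0   # multipliers for x1_1 = x1_3, x2_1 = x2_3, x2_1 = x2_2

for t in range(200):
    # color 1: node 1
    obj1 = lambda z: (f1(z[0], z[1]) + lam1_13 * z[0] + lam2_13 * z[1] + lam2_12 * z[1]
                      + rho / 2 * ((z[0] - x1_3) ** 2 + (z[1] - x2_3) ** 2 + (z[1] - x2_2) ** 2))
    x1_1, x2_1 = minimize(obj1, np.array([x1_1, x2_1])).x
    # color 2: nodes 2 and 3 (same color, so they could run in parallel)
    obj2 = lambda z: f2(z[0]) - lam2_12 * z[0] + rho / 2 * (z[0] - x2_1) ** 2
    x2_2 = minimize(obj2, np.array([x2_2])).x[0]
    obj3 = lambda z: (f3(z[0], z[1]) - lam1_13 * z[0] - lam2_13 * z[1]
                      + rho / 2 * ((z[0] - x1_1) ** 2 + (z[1] - x2_1) ** 2))
    x1_3, x2_3 = minimize(obj3, np.array([x1_3, x2_3])).x
    # dual updates
    lam1_13 += rho * (x1_1 - x1_3)
    lam2_13 += rho * (x2_1 - x2_3)
    lam2_12 += rho * (x2_1 - x2_2)

print(x1_1, x2_1)   # copies agree: x1 -> 2.5, x2 -> 10/3 (the minimizers of f1 + f2 + f3)
```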

38 Numerical example: Consensus

minimize_x  Σ_{i=1}^n (x − θ_i)²    [ f_i(x) = (x − θ_i)² ]

θ_i = measurement of agent i (θ_i i.i.d. N(10, 10^4))

Network   Model (parameters)        # Colors
1         Erdős-Rényi (0.1)         5
2         Watts-Strogatz (4, 0.4)   4
3         Barabási (2)              3
4         Geometric (0.3)           10
5         Lattice (5 × 10)          2
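
D-ADMM needs a proper node coloring before it runs; the sketch below is one simple greedy way to produce one (any valid coloring works — the color counts in the table come from the networks used in the talk, not from this routine):

```python
def greedy_coloring(adj):
    """Greedy node coloring: no two neighbors share a color, so nodes of the same color
    can run their D-ADMM primal updates in parallel.  adj maps node -> set of neighbors."""
    colors = {}
    for node in sorted(adj, key=lambda v: -len(adj[v])):     # high-degree nodes first
        taken = {colors[nb] for nb in adj[node] if nb in colors}
        colors[node] = next(c for c in range(len(adj)) if c not in taken)
    return colors

# The 3-node illustrative example, assuming edges 1-2 and 1-3:
print(greedy_coloring({1: {2, 3}, 2: {1}, 3: {1}}))          # e.g. {1: 0, 2: 1, 3: 1}
```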

39 [Figure: communication steps vs. network number — D-ADMM⁹ vs. methods A¹⁰, B¹¹, C¹²]

9 J. Mota et al., D-ADMM: a communication-efficient distributed algorithm for separable optimization, IEEE TSP, 61(10), 2013
10 I. Schizas et al., Consensus in ad hoc WSNs with noisy links - part I: distributed estimation of deterministic signals, IEEE TSP, 56(1), 2008
11 H. Zhu et al., Distributed in-network channel decoding, IEEE TSP, 57(10), 2009
12 B. Oreshkin et al., Optimization and analysis of distributed averaging with short node memory, IEEE TSP, 58(5), 2010

40 Numerical example: LASSO

minimize_x   ‖x‖_1
subject to   ‖Ax − b‖ ≤ σ

Column partition: node i holds A_i ∈ R^{200×20}

After regularization and dualization:

minimize_λ  Σ_{i=1}^n [ (1/n)( bᵀλ + σ‖λ‖ ) + inf_{x_i} { ‖x_i‖_1 + λᵀ A_i x_i + (δ/2)‖x_i‖² } ]    [ bracketed term = f_i(λ) ]

41 [Figure: communication steps vs. network number — D-ADMM¹³ vs. methods A¹⁴, B¹⁵]

13 J. Mota et al., D-ADMM: a communication-efficient distributed algorithm for separable optimization, IEEE TSP, 61(10), 2013
14 I. Schizas et al., Consensus in ad hoc WSNs with noisy links - part I: distributed estimation of deterministic signals, IEEE TSP, 56(1), 2008
15 H. Zhu et al., Distributed in-network channel decoding, IEEE TSP, 57(10), 2009

42 What if variables are not connected?

Example: variable x_2 is not connected

[Graph: node 1 with f_1(x_1), node 2 with f_2(x_2), node 3 with f_3(x_1, x_2)]

Propagate the variable across a Steiner tree

J. Mota et al., Distributed optimization with local domains: applications in MPC and network flows, arXiv preprint, 2013

43 Thank you!
