Dual Decomposition.


http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html
Acknowledgement: these slides are based on Prof. Lieven Vandenberghe's lecture notes.

Outline
1. Conjugate function
2. Introduction: dual methods
3. Gradient and subgradient of conjugate
4. Dual decomposition
5. Network utility maximization
6. Network flow optimization

Conjugate function
the conjugate of a function f is
    f*(y) = sup_{x ∈ dom f} (y^T x − f(x))
f* is closed and convex even if f is not
Fenchel's inequality:
    f(x) + f*(y) ≥ x^T y   for all x, y
(extends the inequality x^T x/2 + y^T y/2 ≥ x^T y to non-quadratic convex f)
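A standard worked example (added for illustration, not from the slides): for the strongly convex quadratic f(x) = (1/2) x^T Q x with Q ≻ 0,
    f*(y) = sup_x (y^T x − (1/2) x^T Q x) = (1/2) y^T Q^{-1} y,
attained at x = Q^{-1} y; with Q = I this recovers the quadratic case of Fenchel's inequality quoted above.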

The second conjugate
    f**(x) = sup_{y ∈ dom f*} (x^T y − f*(y))
f** is closed and convex
from Fenchel's inequality (x^T y − f*(y) ≤ f(x) for all y and x):
    f**(x) ≤ f(x) for all x;   equivalently, epi f ⊆ epi f**   (for any f)
if f is closed and convex, then
    f**(x) = f(x) for all x;   equivalently, epi f = epi f**   (if f is closed and convex); proof on next page

Conjugates and subgradients
if f is closed and convex, then
    y ∈ ∂f(x)  ⟺  x ∈ ∂f*(y)  ⟺  x^T y = f(x) + f*(y)
proof: if y ∈ ∂f(x), then f*(y) = sup_u (y^T u − f(u)) = y^T x − f(x); hence, for all v,
    f*(v) = sup_u (v^T u − f(u))
          ≥ v^T x − f(x)
          = x^T (v − y) − f(x) + y^T x
          = f*(y) + x^T (v − y)
therefore, x is a subgradient of f* at y (x ∈ ∂f*(y))
the reverse implication x ∈ ∂f*(y) ⟹ y ∈ ∂f(x) follows from f** = f

Moreau decomposition
    prox_f(x) = argmin_u ( f(u) + (1/2) ‖u − x‖_2^2 )
    x = prox_f(x) + prox_{f*}(x)   for all x
follows from the properties of conjugates and subgradients:
    u = prox_f(x)  ⟺  x − u ∈ ∂f(u)  ⟺  u ∈ ∂f*(x − u)  ⟺  x − u = prox_{f*}(x)
generalizes the decomposition by orthogonal projection on subspaces:
    x = P_L(x) + P_{L⊥}(x)
if L is a subspace and L⊥ its orthogonal complement (this is the Moreau decomposition with f = I_L, f* = I_{L⊥})
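A minimal numerical check of the decomposition (added for illustration, not from the slides; assumes NumPy and uses f = ‖·‖_1, whose conjugate is the indicator of the unit ℓ∞-ball):

```python
import numpy as np

def prox_l1(x):
    # prox of f(u) = ||u||_1 (soft-thresholding with parameter 1)
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def prox_conj(x):
    # prox of f*(u) = indicator of {u : ||u||_inf <= 1}: projection by clipping
    return np.clip(x, -1.0, 1.0)

x = np.random.randn(5)
# Moreau decomposition: x = prox_f(x) + prox_{f*}(x)
print(np.allclose(x, prox_l1(x) + prox_conj(x)))   # expected: True
```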


Duality and conjugates
primal problem (A ∈ R^{m×n}, f and g convex)
    min f(x) + g(Ax)
Lagrangian (after introducing a new variable y = Ax)
    L(x, y, z) = f(x) + g(y) + z^T (Ax − y)
dual function
    inf_x (f(x) + z^T Ax) + inf_y (g(y) − z^T y) = −f*(−A^T z) − g*(z)
dual problem
    max −f*(−A^T z) − g*(z)

Examples
equality constraints: g is the indicator of {b}
    primal:  min f(x)  s.t. Ax = b        dual:  max −b^T z − f*(−A^T z)
linear inequality constraints: g is the indicator of {y : y ≤ b}
    primal:  min f(x)  s.t. Ax ≤ b        dual:  max −b^T z − f*(−A^T z)  s.t. z ≥ 0
norm regularization: g(y) = ‖y − b‖
    primal:  min f(x) + ‖Ax − b‖          dual:  max −b^T z − f*(−A^T z)  s.t. ‖z‖_* ≤ 1
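A short check (added here, not on the original slide): in the first pair g is the indicator of {b}, so g*(z) = sup_y (z^T y − g(y)) = b^T z, and the general dual max −f*(−A^T z) − g*(z) becomes max −b^T z − f*(−A^T z); the other two pairs follow in the same way from g*(z) = b^T z + I_{z ≥ 0}(z) and g*(z) = b^T z + I_{‖z‖_* ≤ 1}(z), respectively.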

Dual methods
apply a first-order method to the dual problem
    max −f*(−A^T z) − g*(z)
reasons why the dual problem may be easier for a first-order method:
- the dual problem is unconstrained or has simple constraints
- the dual objective is differentiable or has a simple nondifferentiable term
- decomposition: exploit separable structure


(Sub-)gradients of conjugate function
assume f : R^n → R is closed and convex, with conjugate
    f*(y) = sup_x (y^T x − f(x))
subgradient: f* is subdifferentiable on (at least) int dom f*; the maximizers in the definition of f*(y) are subgradients at y:
    y ∈ ∂f(x)  ⟺  y^T x − f(x) = f*(y)  ⟹  x ∈ ∂f*(y)
gradient: for strictly convex f, the maximizer in the definition is unique if it exists:
    ∇f*(y) = argmax_x (y^T x − f(x))   (if the maximum is attained)

Conjugate of strongly convex function
assume f is closed and strongly convex with parameter µ > 0
- f* is defined for all y (i.e., dom f* = R^n)
- f* is differentiable everywhere, with gradient
    ∇f*(y) = argmax_x (y^T x − f(x))
- ∇f* is Lipschitz continuous with constant 1/µ:
    ‖∇f*(y) − ∇f*(y')‖_2 ≤ (1/µ) ‖y − y'‖_2   for all y, y'

proof: if f is strongly convex and closed
- y^T x − f(x) has a unique maximizer x for every y
- x maximizes y^T x − f(x) if and only if y ∈ ∂f(x); moreover y ∈ ∂f(x) ⟺ x ∈ ∂f*(y) = {∇f*(y)}; hence ∇f*(y) = argmax_x (y^T x − f(x))
- from convexity of f(x) − (µ/2) x^T x:
    (y − y')^T (x − x') ≥ µ ‖x − x'‖_2^2   if y ∈ ∂f(x), y' ∈ ∂f(x')
  this is co-coercivity of ∇f* (which implies Lipschitz continuity):
    (y − y')^T (∇f*(y) − ∇f*(y')) ≥ µ ‖∇f*(y) − ∇f*(y')‖_2^2


Equality constraints
    P:  min f(x)  s.t. Ax = b        D:  min f*(−A^T z) + b^T z
dual gradient ascent (assuming dom f* = R^n):
    x̂ = argmin_x (f(x) + z^T Ax),    z⁺ = z + t (A x̂ − b)
- x̂ is a subgradient of f* at −A^T z (i.e., x̂ ∈ ∂f*(−A^T z))
- b − A x̂ is a subgradient of f*(−A^T z) + b^T z at z
of interest when the calculation of x̂ is inexpensive (for example, when f is separable)
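A minimal sketch of dual gradient ascent for f(x) = (1/2)‖x‖_2^2 (not from the slides; the data, step size, and iteration count are made up for illustration and assume NumPy):

```python
import numpy as np

# minimize (1/2)||x||^2  subject to  Ax = b, via gradient ascent on the dual
np.random.seed(0)
m, n = 5, 20
A, b = np.random.randn(m, n), np.random.randn(m)

# f is 1-strongly convex, so the dual gradient is Lipschitz with constant ||A A^T||_2
t = 1.0 / np.linalg.eigvalsh(A @ A.T).max()
z = np.zeros(m)                        # dual variable
for _ in range(500):
    x_hat = -A.T @ z                   # argmin_x ((1/2)||x||^2 + z^T A x)
    z = z + t * (A @ x_hat - b)        # ascent step along the dual gradient A x_hat - b

print(np.linalg.norm(A @ x_hat - b))   # primal residual; should be small
```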

Dual decomposition
convex problem with separable objective
    min f_1(x_1) + f_2(x_2)   s.t. A_1 x_1 + A_2 x_2 ≤ b
the constraint is the complicating or coupling constraint
dual problem
    max −f_1*(−A_1^T z) − f_2*(−A_2^T z) − b^T z   s.t. z ≥ 0
can be solved by (sub-)gradient projection if z ≥ 0 is the only constraint

Dual subgradient projection
subproblems: to calculate f_j*(−A_j^T z) and a (sub-)gradient for it, solve
    min_{x_j} f_j(x_j) + z^T A_j x_j
the optimal value is −f_j*(−A_j^T z); the minimizer x̂_j is in ∂f_j*(−A_j^T z)
dual subgradient projection method
    x̂_j = argmin_{x_j} (f_j(x_j) + z^T A_j x_j),   j = 1, 2
    z⁺ = (z + t (A_1 x̂_1 + A_2 x̂_2 − b))_+
- the minimization problems over x_1 and x_2 are independent
- the z-update is a projected subgradient step ((u)_+ = max{u, 0} elementwise)
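A minimal sketch of this method with two quadratic blocks f_j(x_j) = (1/2)‖x_j‖_2^2 (not from the slides; data, step size, and iteration count are illustrative and assume NumPy):

```python
import numpy as np

np.random.seed(0)
m, n1, n2 = 3, 5, 4
A1, A2 = np.random.randn(m, n1), np.random.randn(m, n2)
b = np.random.randn(m)

# with f_j = (1/2)||.||^2 the dual gradient is Lipschitz with constant ||A1 A1^T + A2 A2^T||_2
t = 1.0 / np.linalg.eigvalsh(A1 @ A1.T + A2 @ A2.T).max()
z = np.zeros(m)
for _ in range(2000):
    x1 = -A1.T @ z                     # subproblem 1 (closed form for this f_1)
    x2 = -A2.T @ z                     # subproblem 2, solved independently of subproblem 1
    z = np.maximum(z + t * (A1 @ x1 + A2 @ x2 - b), 0.0)   # projected (sub-)gradient step

print(np.maximum(A1 @ x1 + A2 @ x2 - b, 0.0).max())  # coupling-constraint violation
```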

Quadratic programming example
    min   Σ_{j=1}^r (x_j^T P_j x_j + q_j^T x_j)
    s.t.  B_j x_j ≤ d_j,  j = 1, ..., r
          Σ_{j=1}^r A_j x_j ≤ b
- r = 10, variables x_j ∈ R^100, 10 coupling constraints (A_j ∈ R^{10×100})
- P_j ≻ 0; this implies the dual function has a Lipschitz continuous gradient
subproblems: each iteration requires solving 10 decoupled QPs
    min_{x_j}  x_j^T P_j x_j + (q_j + A_j^T z)^T x_j
    s.t.       B_j x_j ≤ d_j

Figure: dual objective gap versus iteration for gradient projection and fast gradient projection, with the same fixed step size in both methods.


Network utility maximization
network flows
- n flows, with fixed routes, in a network with m links
- variable x_j ≥ 0 denotes the rate of flow j
- the flow utility is U_j : R → R, concave and increasing
capacity constraints
- the traffic y_i on link i is the sum of the flows passing through it
- y = Rx, where R is the routing matrix: R_ij = 1 if flow j passes over link i, and R_ij = 0 otherwise
- link capacity constraint: y ≤ c

Dual network utility maximization problem
    max  Σ_{j=1}^n U_j(x_j)   s.t.  Rx ≤ c
a convex problem; dual decomposition gives a decentralized method
dual problem
    min  c^T z + Σ_{j=1}^n (−U_j)*(−r_j^T z)   s.t.  z ≥ 0
- z_i is the price (per unit flow) for using link i
- r_j^T z is the sum of the prices along route j (r_j is the jth column of R)

(Sub-)gradients of dual function
dual objective
    f(z) = c^T z + Σ_{j=1}^n (−U_j)*(−r_j^T z)
         = c^T z + Σ_{j=1}^n sup_{x_j} (U_j(x_j) − (r_j^T z) x_j)
subgradient
    c − R x̂ ∈ ∂f(z),   where x̂_j = argmax_{x_j} (U_j(x_j) − (r_j^T z) x_j)
if U_j is strictly concave, this is a gradient
- r_j^T z is the sum of the link prices along route j
- c − R x̂ is the vector of link capacity margins for the flow x̂

Dual decomposition algorithm
given an initial link price vector z > 0 (e.g., z = 1), repeat:
1. sum the link prices along each route: calculate λ_j = r_j^T z for j = 1, ..., n
2. optimize the flows (separately) using the flow prices: x̂_j = argmax_{x_j} (U_j(x_j) − λ_j x_j), j = 1, ..., n
3. calculate the link capacity margins s = c − R x̂
4. update the link prices using a projected (sub-)gradient step with step size t: z := (z − t s)_+
decentralized:
- to find λ_j and x̂_j, source j only needs to know the prices on its route
- to update s_i and z_i, link i only needs to know the flows that pass through it
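A small sketch of this loop for log utilities U_j(x_j) = w_j log x_j (not from the slides; the routing matrix, weights, step size, and iteration count below are made up for illustration and assume NumPy):

```python
import numpy as np

R = np.array([[1, 1, 0, 0, 1, 0],    # routing matrix: R[i, j] = 1 if flow j uses link i
              [0, 1, 1, 0, 0, 1],
              [1, 0, 1, 1, 0, 0],
              [0, 0, 0, 1, 1, 1]], dtype=float)
c = np.ones(4)                                  # link capacities
w = np.array([1.0, 0.8, 1.2, 0.6, 1.4, 1.0])    # utility weights

z = np.ones(4)                                  # link prices
t = 0.01
for _ in range(5000):
    lam = R.T @ z                               # 1. price of each route
    x_hat = w / np.maximum(lam, 1e-8)           # 2. argmax_x (w log x - lam x) = w / lam
    s = c - R @ x_hat                           # 3. link capacity margins
    z = np.maximum(z - t * s, 0.0)              # 4. projected (sub-)gradient price update

print(np.max(R @ x_hat - c))                    # capacity violation; should be near zero
```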


Single commodity network flow
network
- connected, directed graph with n links/arcs and m nodes
- node-arc incidence matrix A ∈ R^{m×n}: A_ij = 1 if arc j enters node i, A_ij = −1 if arc j leaves node i, and A_ij = 0 otherwise
flow vector and external sources
- variable x_j denotes the flow (traffic) on arc j
- b_i is the external demand (or supply) of flow at node i (satisfies 1^T b = 0)
- flow conservation: Ax = b

Network flow optimization problem
    min  φ(x) = Σ_{j=1}^n φ_j(x_j)   s.t.  Ax = b
φ is a separable sum of convex functions; dual decomposition yields a decentralized solution method
dual problem (a_j is the jth column of A)
    max  −b^T z − Σ_{j=1}^n φ_j*(−a_j^T z)
- the dual variable z_i can be interpreted as a potential at node i
- y_j = −a_j^T z is the potential difference across arc j (potential at the start node minus potential at the end node)

(Sub-)gradients of dual function
negative dual objective
    f(z) = b^T z + Σ_{j=1}^n φ_j*(−a_j^T z)
subgradient
    b − A x̂ ∈ ∂f(z),   where x̂_j = argmin_{x_j} (φ_j(x_j) + (a_j^T z) x_j)
this is a gradient if the functions φ_j are strictly convex
if φ_j is differentiable, φ'_j(x̂_j) = −a_j^T z

Dual decomposition network flow algorithm
given an initial potential vector z, repeat:
1. determine the link flows from the potential differences y = −A^T z:
    x̂_j = argmin_{x_j} (φ_j(x_j) − y_j x_j),   j = 1, ..., n
2. compute the flow residual at each node: s := b − A x̂
3. update the node potentials using a (sub-)gradient step with step size t: z := z − t s
decentralized:
- the flow on each arc is computed from the potential difference across that arc
- each node potential is updated from that node's own flow surplus
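A minimal sketch of this algorithm with quadratic arc costs φ_j(x_j) = R_j x_j^2 / 2, the linear-resistor case of the next page (not from the slides; the small graph and data below are made up for illustration and assume NumPy):

```python
import numpy as np

# node-arc incidence matrix of a 4-node, 4-arc graph (+1 = arc enters node, -1 = arc leaves node)
A = np.array([[-1, -1,  0,  0],
              [ 1,  0, -1,  0],
              [ 0,  1,  1, -1],
              [ 0,  0,  0,  1]], dtype=float)
b = np.array([-1.0, 0.0, 0.0, 1.0])   # one unit injected at node 1, extracted at node 4
R = np.array([1.0, 2.0, 1.0, 1.0])    # "resistances" R_j in phi_j(x_j) = R_j x_j^2 / 2

z = np.zeros(4)                       # node potentials (dual variables)
t = 0.2
for _ in range(1000):
    y = -A.T @ z                      # 1. potential differences across the arcs
    x_hat = y / R                     #    argmin_x (R_j x^2/2 - y_j x) = y_j / R_j
    s = b - A @ x_hat                 # 2. flow residual at each node
    z = z - t * s                     # 3. gradient step on the negative dual

print(np.abs(A @ x_hat - b).max())    # flow-conservation residual; should be near zero
```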

Electrical network interpretation
network flow optimality conditions (with differentiable φ_j):
    Ax = b,   y + A^T z = 0,   y_j = φ'_j(x_j),  j = 1, ..., n
interpretation: a network with node incidence matrix A and nonlinear resistors in the branches
- Kirchhoff current law (KCL): Ax = b; x_j is the current in branch j, b_i is the external current extracted at node i
- Kirchhoff voltage law (KVL): y + A^T z = 0; z_i is the node potential, y_j = −a_j^T z is the jth branch voltage
- current-voltage characteristics: y_j = φ'_j(x_j); for example, φ_j(x_j) = R_j x_j^2/2 for a linear resistor R_j
the currents and potentials in the circuit are the optimal flows and dual variables

Example: minimum queueing delay
flow cost function and conjugate (the c_j > 0 are link capacities):
    φ_j(x_j) = x_j / (c_j − x_j)   (with dom φ_j = [0, c_j))
    φ_j*(y_j) = (√(c_j y_j) − 1)^2   if y_j > 1/c_j,      φ_j*(y_j) = 0   if y_j ≤ 1/c_j
φ_j is differentiable except at x_j = 0:
    ∂φ_j(0) = (−∞, 1/c_j],    φ'_j(x_j) = c_j / (c_j − x_j)^2   (0 < x_j < c_j)
φ_j* is differentiable:
    (φ_j*)'(y_j) = 0   if y_j ≤ 1/c_j,      (φ_j*)'(y_j) = c_j − √(c_j / y_j)   if y_j > 1/c_j
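A quick numerical sanity check of the conjugate formula above (added for illustration; assumes NumPy and takes c_j = 1):

```python
import numpy as np

c = 1.0
x = np.linspace(0.0, c - 1e-6, 200000)         # grid on dom phi = [0, c)
phi = x / (c - x)                              # queueing-delay cost
for y in [0.5, 1.0, 2.0, 5.0]:
    numeric = np.max(y * x - phi)              # brute-force sup_x (y*x - phi(x))
    closed = (np.sqrt(c * y) - 1.0) ** 2 if y > 1.0 / c else 0.0
    print(y, numeric, closed)                  # the two values should agree closely
```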

Figure: the flow cost function φ_j and its conjugate φ_j* (for c_j = 1), and their derivatives.

References
- S. Boyd, course notes for EE364b: lectures and notes on decomposition.
- M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, "Layering as optimization decomposition: A mathematical theory of network architectures," Proceedings of the IEEE (2007).
- D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods (1989).
- D. P. Bertsekas, Network Optimization: Continuous and Discrete Models (1998).
- L. S. Lasdon, Optimization Theory for Large Systems (1970).