Primal-dual Subgradient Method for Convex Problems with Functional Constraints


Primal-dual Subgradient Method for Convex Problems with Functional Constraints
Yurii Nesterov, CORE/INMA (UCL)
Workshop on Embedded Optimization EMBOPT2014, September 9, 2014 (Lucca)

Outline
1. Constrained optimization problem
2. Lagrange multipliers
3. Dual function and dual problem
4. Augmented Lagrangian
5. Switching subgradient method
6. Finding the dual multipliers
7. Complexity analysis

Optimization problem: simple constraints

Consider the problem
\[ \min_{x \in Q} f(x), \]
where $Q$ is a closed convex set ($x, y \in Q \Rightarrow [x, y] \subseteq Q$) and $f$ is a convex function, subdifferentiable on $Q$:
\[ f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle, \quad x, y \in Q, \ \nabla f(x) \in \partial f(x). \]

Optimality condition: a point $x^* \in Q$ is optimal iff
\[ \langle \nabla f(x^*), x - x^* \rangle \geq 0 \quad \forall x \in Q. \]

Interpretation: the function increases along any feasible direction.

Examples

1. Interior solution. Let $x^* \in \operatorname{int} Q$. Then $\langle \nabla f(x^*), x - x^* \rangle \geq 0$ for all $x \in Q$ implies $\nabla f(x^*) = 0$.

2. Optimization over the positive orthant. Let $Q = \mathbb{R}^n_+ = \{ x \in \mathbb{R}^n : x^{(i)} \geq 0, \ i = 1, \dots, n \}$.
Optimality condition: $\langle \nabla f(x^*), x - x^* \rangle \geq 0$ for all $x \in \mathbb{R}^n_+$.
Coordinate form: $\nabla_i f(x^*)\,\big(x^{(i)} - x_*^{(i)}\big) \geq 0$ for all $x^{(i)} \geq 0$. This means that
\[ \nabla_i f(x^*) \geq 0, \quad i = 1, \dots, n \quad (\text{let } x^{(i)} \to \infty), \]
\[ x_*^{(i)} \nabla_i f(x^*) = 0, \quad i = 1, \dots, n \quad (\text{set } x^{(i)} = 0). \]
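To make the orthant conditions concrete, here is a tiny numeric check (a sketch of my own, assuming numpy; the function $f(x) = \tfrac{1}{2}\|x - z\|^2$ with $z = (1, -1)$ is a made-up example whose minimizer over $\mathbb{R}^2_+$ is $x^* = (1, 0)$):

```python
import numpy as np

# Hypothetical example: f(x) = 0.5*||x - z||^2 over R^2_+ with z = (1, -1).
# The minimizer is x* = (1, 0); we verify the two coordinate-wise conditions above.
z = np.array([1.0, -1.0])
x_star = np.array([1.0, 0.0])
grad = x_star - z                        # gradient of f at x*

print(np.all(grad >= 0))                 # True: every partial derivative is nonnegative
print(np.allclose(x_star * grad, 0.0))   # True: complementarity x*^(i) * grad_i f(x*) = 0
```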

Optimization problem: functional constraints

Problem:
\[ \min_{x \in Q} \{ f_0(x) : f_i(x) \leq 0, \ i = 1, \dots, m \}, \]
where $Q$ is a closed convex set and all $f_i$ are convex and subdifferentiable on $Q$, $i = 0, \dots, m$:
\[ f_i(y) \geq f_i(x) + \langle \nabla f_i(x), y - x \rangle, \quad x, y \in Q, \ \nabla f_i(x) \in \partial f_i(x). \]

Optimality condition (KKT, 1951): a point $x^* \in Q$ is optimal iff there exist Lagrange multipliers $\lambda_*^{(i)} \geq 0$, $i = 1, \dots, m$, such that
\[ (1): \ \Big\langle \nabla f_0(x^*) + \sum_{i=1}^m \lambda_*^{(i)} \nabla f_i(x^*), \ x - x^* \Big\rangle \geq 0, \quad x \in Q, \]
\[ (2): \ f_i(x^*) \leq 0, \quad i = 1, \dots, m \quad (\text{feasibility}), \]
\[ (3): \ \lambda_*^{(i)} f_i(x^*) = 0, \quad i = 1, \dots, m \quad (\text{complementary slackness}). \]
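A one-dimensional illustration of the three conditions (my own example, with $Q = \mathbb{R}$ so that condition (1) reduces to a stationarity equation):

```latex
\min_{x \in \mathbb{R}}\; f_0(x) = x^2
\quad \text{s.t.} \quad f_1(x) = 1 - x \le 0 .
\qquad
\text{Take } x^* = 1,\ \lambda^{(1)}_* = 2:\quad
(1)\ \nabla f_0(x^*) + \lambda^{(1)}_* \nabla f_1(x^*) = 2 - 2 = 0,\quad
(2)\ f_1(x^*) = 0 \le 0,\quad
(3)\ \lambda^{(1)}_* f_1(x^*) = 0 .
```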

Lagrange multipliers: interpretation

Let $I \subseteq \{1, \dots, m\}$ be an arbitrary set of indexes. Denote
\[ f_I(x) = f_0(x) + \sum_{i \notin I} \lambda_*^{(i)} f_i(x). \]
Consider the problem
\[ P_I: \quad \min_{x \in Q} \{ f_I(x) : f_i(x) \leq 0, \ i \in I \}. \]

Observation: in any case, $x^*$ is the optimal solution of problem $P_I$.

Interpretation: the $\lambda_*^{(i)}$ are the shadow prices for resources (Kantorovich, 1939).

Application examples:
- Traffic congestion: car flows on roads ↔ sizes of queues.
- Electrical networks: currents in the wires ↔ voltage potentials, etc.

Main question: How to compute $(x^*, \lambda^*)$?

Algebraic interpretation

Consider the Lagrangian
\[ L(x, \lambda) = f_0(x) + \sum_{i=1}^m \lambda^{(i)} f_i(x). \]
Condition KKT(1),
\[ \Big\langle \nabla f_0(x^*) + \sum_{i=1}^m \lambda_*^{(i)} \nabla f_i(x^*), \ x - x^* \Big\rangle \geq 0, \quad x \in Q, \]
implies $x^* \in \operatorname{Arg\,min}_{x \in Q} L(x, \lambda^*)$.

Define the dual function
\[ \varphi(\lambda) = \min_{x \in Q} L(x, \lambda), \quad \lambda \geq 0. \]
It is concave! By Danskin's theorem,
\[ \nabla \varphi(\lambda) = \big( f_1(x(\lambda)), \dots, f_m(x(\lambda)) \big), \quad \text{with } x(\lambda) \in \operatorname{Arg\,min}_{x \in Q} L(x, \lambda). \]

Conditions KKT(2,3), $f_i(x^*) \leq 0$ and $\lambda_*^{(i)} f_i(x^*) = 0$, $i = 1, \dots, m$, imply (with $x^* = x(\lambda^*)$)
\[ \lambda^* \in \operatorname{Arg\,max}_{\lambda \geq 0} \varphi(\lambda). \]
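As an illustration of the dual function and of Danskin's formula (a small example of my own, not from the slides): take $Q = [0,1]^n$, $f_0(x) = \langle c, x \rangle$ and a single constraint $f_1(x) = \langle a, x \rangle - b$. Then

```latex
\varphi(\lambda)
= \min_{x \in [0,1]^n} \big\{ \langle c + \lambda a, x \rangle - \lambda b \big\}
= \sum_{i=1}^n \min\{0,\ c^{(i)} + \lambda a^{(i)}\} - \lambda b ,
\qquad
x(\lambda)^{(i)} =
\begin{cases}
0, & c^{(i)} + \lambda a^{(i)} > 0,\\[2pt]
1, & c^{(i)} + \lambda a^{(i)} < 0,
\end{cases}
```

and, by Danskin's formula, $\varphi'(\lambda) = \langle a, x(\lambda) \rangle - b = f_1(x(\lambda))$; here $\varphi$ is concave and piecewise linear.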

Algorithmic aspects

Main idea: solve the dual problem
\[ \max_{\lambda \geq 0} \varphi(\lambda) \]
by the subgradient method:
1. Compute $x(\lambda_k)$ and define $\nabla \varphi(\lambda_k) = \big( f_1(x(\lambda_k)), \dots, f_m(x(\lambda_k)) \big)$.
2. Update $\lambda_{k+1} = \text{Project}_{\mathbb{R}^m_+} \big( \lambda_k + h_k \nabla \varphi(\lambda_k) \big)$.
The stepsizes $h_k > 0$ are defined in the usual way.

Main difficulties:
- Each iteration is time consuming.
- Unclear termination criterion.
- Low rate of convergence ($O\big(\frac{1}{\epsilon^2}\big)$ upper-level iterations).
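A minimal numerical sketch of this dual scheme (my own toy instance: a box-constrained quadratic objective with affine constraints, chosen so that $x(\lambda)$ is available in closed form; the names `A`, `b`, `z`, `h_k` are mine and not from the slides):

```python
import numpy as np

# Toy instance (my own choice): min 0.5*||x - z||^2  s.t.  A x <= b,  x in Q = [0, 1]^n,
# picked so that x(lambda) = argmin_x L(x, lambda) has a closed form.
rng = np.random.default_rng(0)
n, m = 5, 3
A, b, z = rng.standard_normal((m, n)), rng.standard_normal(m), rng.standard_normal(n)

def x_of_lambda(lam):
    """Minimizer of the Lagrangian over the box Q = [0,1]^n (coordinate-wise clip)."""
    return np.clip(z - A.T @ lam, 0.0, 1.0)

lam = np.zeros(m)
for k in range(1000):
    x = x_of_lambda(lam)
    g = A @ x - b                       # supergradient of phi: (f_1(x(lam)), ..., f_m(x(lam)))
    h_k = 1.0 / np.sqrt(k + 1)          # "usual" diminishing stepsize
    lam = np.maximum(lam + h_k * g, 0)  # projection onto R^m_+

print("multipliers:", lam, "max violation:", max(0.0, (A @ x_of_lambda(lam) - b).max()))
```

For a general problem the inner minimization producing $x(\lambda_k)$ would itself require an iterative solver, which is exactly the "time consuming" iteration mentioned above.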

Augmented Lagrangian (1970s) [Hestenes, Powell, Rockafellar, Polyak, Bertsekas, ...]

Define the Augmented Lagrangian
\[ L_K(x, \lambda) = f_0(x) + \frac{1}{2K} \sum_{i=1}^m \big( \lambda^{(i)} + K f_i(x) \big)_+^2 - \frac{1}{2K} \|\lambda\|_2^2, \quad \lambda \in \mathbb{R}^m, \]
where $K > 0$ is a penalty parameter. Consider the dual function $\hat\varphi(\lambda) = \min_{x \in Q} L_K(x, \lambda)$.

Main properties:
- The function $\hat\varphi$ is concave.
- Its gradient is Lipschitz continuous with constant $\frac{1}{K}$.
- Its unconstrained maximum is attained at the optimal dual solution. The corresponding point $\hat x(\lambda^*)$ is the optimal primal solution.

Hint: check that the equation $\big( \lambda^{(i)} + K f_i(x) \big)_+ = \lambda^{(i)}$ is equivalent to KKT(2,3).
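A small sketch of these formulas in code (wrapper names of my own; the inner minimization that produces $\hat x(\lambda)$ is deliberately left abstract):

```python
import numpy as np

def augmented_lagrangian(f0, fs, K):
    """Rockafellar-type augmented Lagrangian for constraints f_i(x) <= 0.
    f0: callable x -> float; fs: callable x -> array of constraint values; K > 0."""
    def L_K(x, lam):
        r = np.maximum(lam + K * fs(x), 0.0)
        return f0(x) + (np.sum(r ** 2) - np.sum(lam ** 2)) / (2.0 * K)

    def dual_grad(x_hat, lam):
        # gradient of phi_hat at lam, evaluated at x_hat = argmin_x L_K(x, lam)
        return (np.maximum(lam + K * fs(x_hat), 0.0) - lam) / K

    def multiplier_update(x_hat, lam):
        # gradient step lam + K * dual_grad(x_hat, lam), i.e. (lam + K f(x_hat))_+
        return np.maximum(lam + K * fs(x_hat), 0.0)

    return L_K, dual_grad, multiplier_update
```

The `multiplier_update` closure is exactly the update of the next slide; all the difficulty is hidden in computing $\hat x(\lambda)$.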

Method of Augmented Lagrangians

Note that
\[ \nabla \hat\varphi(\lambda)^{(i)} = \frac{1}{K} \big( \lambda^{(i)} + K f_i(\hat x(\lambda)) \big)_+ - \frac{1}{K} \lambda^{(i)}, \quad i = 1, \dots, m. \]
Therefore, the usual gradient method $\lambda_{k+1} = \lambda_k + K \nabla \hat\varphi(\lambda_k)$ is exactly as follows:

Method: $\lambda_{k+1} = \big( \lambda_k + K f(\hat x(\lambda_k)) \big)_+$.

Advantage: fast convergence of the dual process.
Disadvantages: difficult iteration, unclear termination, no global complexity analysis.

Do we have an alternative?

Problem formulation

Problem:
\[ f^* = \inf_{x \in Q} \{ f_0(x) : f_i(x) \leq 0, \ i = 1, \dots, m \}, \]
where the $f_i(x)$, $i = 0, \dots, m$, are closed convex functions on $Q$ endowed with first-order black-box oracles, and $Q \subseteq E$ is a bounded, simple, closed convex set. (We can solve some auxiliary optimization problems over $Q$.)

Defining the Lagrangian
\[ L(x, \lambda) = f_0(x) + \sum_{i=1}^m \lambda^{(i)} f_i(x), \quad x \in Q, \ \lambda \in \mathbb{R}^m_+, \]
we can introduce the Lagrangian dual problem
\[ f_* \stackrel{\mathrm{def}}{=} \sup_{\lambda \in \mathbb{R}^m_+} \varphi(\lambda), \quad \text{where } \varphi(\lambda) \stackrel{\mathrm{def}}{=} \inf_{x \in Q} L(x, \lambda). \]
Clearly, $f_* \leq f^*$. Later, we will show $f_* = f^*$ algorithmically.

Bregman distances

Prox-function: $d(\cdot)$ is strongly convex on $Q$ with parameter one:
\[ d(y) \geq d(x) + \langle \nabla d(x), y - x \rangle + \tfrac{1}{2} \|y - x\|^2, \quad x, y \in Q. \]
Denote by $x_0$ the prox-center of the set $Q$: $x_0 = \arg\min_{x \in Q} d(x)$. Assume $d(x_0) = 0$.

Bregman distance:
\[ \beta(x, y) = d(y) - d(x) - \langle \nabla d(x), y - x \rangle, \quad x, y \in Q. \]
Clearly, $\beta(x, y) \geq \tfrac{1}{2} \|x - y\|^2$ for all $x, y \in Q$.

Bregman mapping: for $x \in Q$, $g \in E^*$ and $h > 0$ define
\[ B_h(x, g) = \arg\min_{y \in Q} \{ h \langle g, y - x \rangle + \beta(x, y) \}. \]
The first-order condition for the point $x_+ \stackrel{\mathrm{def}}{=} B_h(x, g)$ is as follows:
\[ \langle h g + \nabla d(x_+) - \nabla d(x), \ y - x_+ \rangle \geq 0, \quad y \in Q. \]

Examples

1. Euclidean distance. We choose $\|x\| = \big[ \sum_{i=1}^n (x^{(i)})^2 \big]^{1/2}$ and $d(x) = \tfrac{1}{2} \|x\|^2$. Then $\beta(x, y) = \tfrac{1}{2} \|x - y\|^2$, and we have
\[ B_h(x, g) = \text{Projection}_Q(x - hg). \]

2. Entropy distance. We choose $\|x\| = \sum_{i=1}^n |x^{(i)}|$ and
\[ d(x) = \ln n + \sum_{i=1}^n x^{(i)} \ln x^{(i)}. \]
Then
\[ \beta(x, y) = \sum_{i=1}^n y^{(i)} \big[ \ln y^{(i)} - \ln x^{(i)} \big]. \]
If $Q = \{ x \in \mathbb{R}^n_+ : \sum_{i=1}^n x^{(i)} = 1 \}$, then
\[ B_h^{(i)}(x, g) = x^{(i)} e^{-h g^{(i)}} \Big/ \sum_{j=1}^n x^{(j)} e^{-h g^{(j)}}, \quad i = 1, \dots, n. \]
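A short sketch of the two mappings of this slide (assuming numpy; for the Euclidean case I take $Q = \mathbb{R}^n_+$, so the projection is a coordinate-wise clip, and the shift inside the exponential is only for numerical stability):

```python
import numpy as np

def bregman_step_euclidean(x, g, h, project=lambda y: np.maximum(y, 0.0)):
    """B_h(x, g) for d(x) = 0.5*||x||_2^2: a projected (sub)gradient-type step.
    Here Q = R^n_+ is assumed; other simple sets need their own projection."""
    return project(x - h * g)

def bregman_step_entropy(x, g, h):
    """B_h(x, g) for the entropy prox-function on the standard simplex."""
    z = -h * g
    w = x * np.exp(z - z.max())   # shift for numerical stability; ratios are unchanged
    return w / w.sum()

# usage: one step from the prox-center of the simplex
x0 = np.full(4, 0.25)                 # prox-center for the entropy distance
g = np.array([1.0, -0.5, 0.0, 2.0])   # some (sub)gradient
print(bregman_step_entropy(x0, g, h=0.1))
```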

Switching subgradient method

Input parameter: the step size $h > 0$.

Initialization: compute the prox-center $x_0$.

Iteration $k \geq 0$:
a) Define $I_k = \{ i \in \{1, \dots, m\} : f_i(x_k) > h \, \|\nabla f_i(x_k)\| \}$.
b) If $I_k = \emptyset$, then compute $x_{k+1} = B_h\big( x_k, \ \nabla f_0(x_k) / \|\nabla f_0(x_k)\| \big)$.
c) If $I_k \neq \emptyset$, then choose an arbitrary $i_k \in I_k$, define $h_k = f_{i_k}(x_k) / \|\nabla f_{i_k}(x_k)\|^2$, and compute $x_{k+1} = B_{h_k}(x_k, \nabla f_{i_k}(x_k))$.

After $t \geq 0$ iterations, define $F_t = \{ k \in \{0, \dots, t\} : I_k = \emptyset \}$. Denote $N(t) = |F_t|$. It is possible that $N(t) = 0$.
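A minimal Euclidean-prox sketch of this scheme on a toy instance (the problem data, the box $Q = [0,1]^n$ and all variable names are my own illustration, not part of the slides):

```python
import numpy as np

# Toy instance: min f0(x) s.t. f_i(x) <= 0, x in Q = [0,1]^n,
# with f0(x) = ||x - z||^2 and affine constraints f_i(x) = <a_i, x> - b_i.
rng = np.random.default_rng(1)
n, m, h = 6, 3, 1e-2
z, A, b = rng.standard_normal(n), rng.standard_normal((m, n)), rng.random(m)

f0      = lambda x: float(np.sum((x - z) ** 2))
grad_f0 = lambda x: 2.0 * (x - z)
f       = lambda x: A @ x - b              # vector (f_1(x), ..., f_m(x))

def bregman_step(x, g, step):              # Euclidean B_h: project x - step*g onto the box
    return np.clip(x - step * g, 0.0, 1.0)

x, F = np.zeros(n), []                     # prox-center of Q for d(x) = 0.5*||x||^2
for k in range(20000):
    viol = f(x)
    I_k = [i for i in range(m) if viol[i] > h * np.linalg.norm(A[i])]
    if not I_k:                            # case b): step along the normalized objective subgradient
        g0 = grad_f0(x)
        x = bregman_step(x, g0 / max(np.linalg.norm(g0), 1e-12), h)
        F.append(k)
    else:                                  # case c): step along a violated constraint
        i = I_k[0]
        x = bregman_step(x, A[i], viol[i] / np.linalg.norm(A[i]) ** 2)

print("productive steps:", len(F), " f0:", f0(x), " max violation:", float(f(x).max()))
```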

Finding the dual multipliers

If $N(t) > 0$, define the dual multipliers as follows:
\[ \lambda_t^{(0)} = h \sum_{k \in F_t} \frac{1}{\|\nabla f_0(x_k)\|}, \qquad \lambda_t^{(i)} = \frac{1}{\lambda_t^{(0)}} \sum_{k \in A_i(t)} h_k, \quad i = 1, \dots, m, \]
where $A_i(t) = \{ k \in \{0, \dots, t\} : i_k = i \}$, $0 \leq i \leq m$.

Denote $S_t = \sum_{k \in F_t} \frac{1}{\|\nabla f_0(x_k)\|}$. If $F_t = \emptyset$, then we define $S_t = 0$.

For proving convergence of the switching strategy, we find an upper bound for the gap
\[ \delta_t = \frac{1}{S_t} \sum_{k \in F_t} \frac{f_0(x_k)}{\|\nabla f_0(x_k)\|} - \varphi(\lambda_t), \]
assuming that $N(t) > 0$.
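A small helper (names of my own) showing how these multipliers could be recovered from quantities recorded while running the switching method sketched above:

```python
import numpy as np

def dual_multipliers(h, obj_grad_norms, obj_values, h_by_constraint):
    """Multipliers of this slide from a recorded run of the switching method.
    obj_grad_norms[j], obj_values[j]: ||grad f0(x_k)|| and f0(x_k) over productive steps k in F_t.
    h_by_constraint[i]: list of step sizes h_k used on constraint i+1 (k in A_{i+1}(t))."""
    S_t = sum(1.0 / g for g in obj_grad_norms)
    if S_t == 0.0:                           # N(t) = 0: multipliers not defined yet
        return None
    lam0 = h * S_t
    lam = np.array([sum(hs) for hs in h_by_constraint]) / lam0
    primal_term = sum(v / g for v, g in zip(obj_values, obj_grad_norms)) / S_t
    return lam0, lam, primal_term            # primal_term is the first term of the gap delta_t
```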

Convergence analysis

Note that $\lambda_t^{(0)} = h S_t$. Therefore
\begin{align*}
\lambda_t^{(0)} \delta_t
&= \sup_{x \in Q} \Big\{ h \sum_{k \in F_t} \tfrac{f_0(x_k)}{\|\nabla f_0(x_k)\|} - \lambda_t^{(0)} f_0(x) - \sum_{i=1}^m \sum_{k \in A_i(t)} h_k f_i(x) \Big\} \\
&= \sup_{x \in Q} \Big\{ h \sum_{k \in F_t} \tfrac{f_0(x_k) - f_0(x)}{\|\nabla f_0(x_k)\|} - \sum_{k \notin F_t} h_k f_{i_k}(x) \Big\} \\
&\leq \sup_{x \in Q} \Big\{ h \sum_{k \in F_t} \tfrac{\langle \nabla f_0(x_k), x_k - x \rangle}{\|\nabla f_0(x_k)\|} + \sum_{k \notin F_t} h_k \big[ \langle \nabla f_{i_k}(x_k), x_k - x \rangle - f_{i_k}(x_k) \big] \Big\}.
\end{align*}
Let us estimate the right-hand side of this inequality from above.

Feasible step

For arbitrary $x \in Q$, denote $r_t(x) = \beta(x_t, x)$. Then
\begin{align*}
r_{t+1}(x) - r_t(x)
&= \big[ d(x) - d(x_{t+1}) - \langle \nabla d(x_{t+1}), x - x_{t+1} \rangle \big] - \big[ d(x) - d(x_t) - \langle \nabla d(x_t), x - x_t \rangle \big] \\
&= \langle \nabla d(x_t) - \nabla d(x_{t+1}), x - x_{t+1} \rangle - \big[ d(x_{t+1}) - d(x_t) - \langle \nabla d(x_t), x_{t+1} - x_t \rangle \big] \\
&\leq \langle \nabla d(x_t) - \nabla d(x_{t+1}), x - x_{t+1} \rangle - \tfrac{1}{2} \|x_t - x_{t+1}\|^2.
\end{align*}
In view of the optimality condition, for all $x \in Q$ and $k \in F_t$ we have
\[ \tfrac{h}{\|\nabla f_0(x_k)\|} \langle \nabla f_0(x_k), x_{k+1} - x \rangle \leq \langle \nabla d(x_{k+1}) - \nabla d(x_k), x - x_{k+1} \rangle. \]
Assume that $k \in F_t$. In this case,
\begin{align*}
r_{k+1}(x) - r_k(x)
&\leq -\tfrac{h}{\|\nabla f_0(x_k)\|} \langle \nabla f_0(x_k), x_{k+1} - x \rangle - \tfrac{1}{2} \|x_k - x_{k+1}\|^2 \\
&\leq -\tfrac{h}{\|\nabla f_0(x_k)\|} \langle \nabla f_0(x_k), x_k - x \rangle + \tfrac{1}{2} h^2.
\end{align*}

Infeasible step

If $k \notin F_t$, then the optimality condition defining the point $x_{k+1}$ looks as follows:
\[ h_k \langle \nabla f_{i_k}(x_k), x_{k+1} - x \rangle \leq \langle \nabla d(x_{k+1}) - \nabla d(x_k), x - x_{k+1} \rangle. \]
Therefore,
\begin{align*}
r_{k+1}(x) - r_k(x)
&\leq -h_k \langle \nabla f_{i_k}(x_k), x_{k+1} - x \rangle - \tfrac{1}{2} \|x_k - x_{k+1}\|^2 \\
&\leq -h_k \langle \nabla f_{i_k}(x_k), x_k - x \rangle + \tfrac{1}{2} h_k^2 \|\nabla f_{i_k}(x_k)\|^2.
\end{align*}
Hence,
\begin{align*}
h_k \big[ \langle \nabla f_{i_k}(x_k), x_k - x \rangle - f_{i_k}(x_k) \big]
&\leq r_k(x) - r_{k+1}(x) - \frac{f_{i_k}^2(x_k)}{2 \|\nabla f_{i_k}(x_k)\|^2} \\
&\leq r_k(x) - r_{k+1}(x) - \tfrac{1}{2} h^2.
\end{align*}

Convergence result

Summing up all inequalities for $k = 0, \dots, t$, and taking into account that $r_{t+1}(x) \geq 0$, we obtain
\[ \lambda_t^{(0)} \delta_t \leq r_0(x) + \tfrac{1}{2} N(t) h^2 - \tfrac{1}{2} \big( t - N(t) \big) h^2 = r_0(x) - \tfrac{1}{2} t h^2 + N(t) h^2. \]
Denote $D = \max_{x \in Q} r_0(x)$.

Theorem. If the number $t \geq \frac{2}{h^2} D$, then $F_t \neq \emptyset$. In this case
\[ \delta_t \leq M h \quad \text{and} \quad \max_{k \in F_t} \max_{1 \leq i \leq m} f_i(x_k) \leq M h, \]
where $M = \max_{0 \leq i \leq m} \max_{0 \leq k \leq t} \|\nabla f_i(x_k)\|$.

Proof: If $F_t = \emptyset$, then $N(t) = 0$ and, consequently, $\lambda_t^{(0)} = 0$. This is impossible for $t$ big enough. Finally, $\lambda_t^{(0)} \geq \frac{h}{M} N(t)$. Therefore, if $t$ is big enough, then $\delta_t \leq \frac{N(t) h^2}{\lambda_t^{(0)}} \leq M h$.
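To make the accuracy explicit (a short consequence of the theorem, written out by me rather than taken from the slides): choosing the step size $h = \varepsilon / M$ gives

```latex
t \;\ge\; \frac{2D}{h^2} \;=\; \frac{2 D M^2}{\varepsilon^2}
\quad\Longrightarrow\quad
\delta_t \le M h = \varepsilon,
\qquad
\max_{k \in F_t} \max_{1 \le i \le m} f_i(x_k) \le M h = \varepsilon ,
```

which matches the $O(1/\varepsilon^2)$ iteration bound mentioned earlier for the plain dual subgradient scheme.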

Conclusion

1. The optimal primal-dual solution can be approximated by a simple switching subgradient scheme.
2. The dual process looks like a coordinate-descent method.
3. The approximations of the dual multipliers have a natural interpretation: the relative importance of the corresponding constraints during the adjustment process.
4. However, the method has an optimal worst-case efficiency estimate even if the dual optimal solution does not exist.
5. Many interesting questions remain (influence of smoothness, strong convexity, etc.).

Thank you for your attention!