Primal-dual Subgradient Method for Convex Problems with Functional Constraints
Yurii Nesterov, CORE/INMA (UCL)
Workshop on embedded optimization EMBOPT2014
September 9, 2014 (Lucca)
Yu. Nesterov, Primal-dual method for functional constraints, 1/20
Outline
1. Constrained optimization problem
2. Lagrange multipliers
3. Dual function and dual problem
4. Augmented Lagrangian
5. Switching subgradient method
6. Finding the dual multipliers
7. Complexity analysis
Optimization problem: simple constraints

Consider the problem
    min_{x∈Q} f(x),
where Q is a closed convex set (x, y ∈ Q ⇒ [x, y] ⊆ Q) and f is a convex function, subdifferentiable on Q:
    f(y) ≥ f(x) + ⟨f′(x), y − x⟩,  x, y ∈ Q,  f′(x) ∈ ∂f(x).

Optimality condition: a point x* ∈ Q is optimal iff
    ⟨f′(x*), x − x*⟩ ≥ 0  for all x ∈ Q.
Interpretation: the function increases along any feasible direction.
Examples

1. Interior solution. Let x* ∈ int Q. Then ⟨f′(x*), x − x*⟩ ≥ 0 for all x ∈ Q implies f′(x*) = 0.

2. Optimization over the positive orthant. Let Q = ℝ^n_+ = { x ∈ ℝ^n : x^(i) ≥ 0, i = 1, …, n }.
Optimality condition: ⟨f′(x*), x − x*⟩ ≥ 0 for all x ∈ ℝ^n_+.  (*)
Coordinate form: Σ_i ∇_i f(x*) (x^(i) − x*^(i)) ≥ 0 for all x^(i) ≥ 0.
This means that
    ∇_i f(x*) ≥ 0, i = 1, …, n  (let x^(i) → ∞),
    x*^(i) ∇_i f(x*) = 0, i = 1, …, n  (set x^(i) = 0).
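A minimal numerical sketch of the coordinate-wise conditions, on an assumed toy instance f(x) = ½‖x − c‖² over ℝ^n_+ (not from the slides): the minimizer is x* = max(c, 0), so both conditions can be checked directly.

```python
import numpy as np

# Assumed toy instance: f(x) = 0.5*||x - c||^2 over Q = R^n_+.
# Its minimizer is x* = max(c, 0); the gradient at x* is max(-c, 0) >= 0,
# and x*_i * grad_i f(x*) = 0 coordinate-wise (complementary slackness).
c = np.array([1.0, -2.0, 3.0])
x_star = np.maximum(c, 0.0)   # closed-form minimizer over the orthant
grad = x_star - c             # gradient of f at x_star
comp = x_star * grad          # complementarity products: all zero
```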
Optimization problem: functional constraints

Problem:
    min_{x∈Q} { f_0(x) : f_i(x) ≤ 0, i = 1, …, m },
where Q is a closed convex set and all f_i are convex and subdifferentiable on Q, i = 0, …, m:
    f_i(y) ≥ f_i(x) + ⟨f_i′(x), y − x⟩,  x, y ∈ Q,  f_i′(x) ∈ ∂f_i(x).

Optimality condition (KKT, 1951): a point x* ∈ Q is optimal iff there exist Lagrange multipliers λ*^(i) ≥ 0, i = 1, …, m, such that
(1) ⟨f_0′(x*) + Σ_{i=1}^m λ*^(i) f_i′(x*), x − x*⟩ ≥ 0 for all x ∈ Q,
(2) f_i(x*) ≤ 0, i = 1, …, m  (feasibility),
(3) λ*^(i) f_i(x*) = 0, i = 1, …, m  (complementary slackness).
Lagrange multipliers: interpretation

Let I ⊆ {1, …, m} be an arbitrary set of indexes. Denote
    f_I(x) = f_0(x) + Σ_{i∉I} λ*^(i) f_i(x),
and consider the problem
    (P_I):  min_{x∈Q} { f_I(x) : f_i(x) ≤ 0, i ∈ I }.
Observation: in any case, x* is the optimal solution of problem (P_I).

Interpretation: the λ*^(i) are the shadow prices for resources (Kantorovich, 1939).
Application examples:
- Traffic congestion: car flows on roads ↔ sizes of queues.
- Electrical networks: currents in the wires ↔ voltage potentials, etc.

Main question: how to compute (x*, λ*)?
Algebraic interpretation

Consider the Lagrangian
    L(x, λ) = f_0(x) + Σ_{i=1}^m λ^(i) f_i(x).
Condition KKT(1), ⟨f_0′(x*) + Σ_{i=1}^m λ*^(i) f_i′(x*), x − x*⟩ ≥ 0 for all x ∈ Q, implies
    x* ∈ Argmin_{x∈Q} L(x, λ*).

Define the dual function
    φ(λ) = min_{x∈Q} L(x, λ),  λ ≥ 0.
It is concave! By Danskin's theorem,
    ∇φ(λ) = ( f_1(x(λ)), …, f_m(x(λ)) ),  with x(λ) ∈ Argmin_{x∈Q} L(x, λ).

Conditions KKT(2,3), f_i(x*) ≤ 0 and λ*^(i) f_i(x*) = 0, i = 1, …, m, imply (with x* = x(λ*))
    λ* ∈ Argmax_{λ≥0} φ(λ).
Algorithmic aspects

Main idea: solve the dual problem max_{λ≥0} φ(λ) by the subgradient method:
1. Compute x(λ_k) and define ∇φ(λ_k) = ( f_1(x(λ_k)), …, f_m(x(λ_k)) ).
2. Update λ_{k+1} = Proj_{ℝ^m_+}( λ_k + h_k ∇φ(λ_k) ).
Step sizes h_k > 0 are defined in the usual way.

Main difficulties:
- Each iteration is time-consuming.
- Unclear termination criterion.
- Low rate of convergence: O(1/ε²) upper-level iterations.
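The two-step dual scheme can be sketched on an assumed one-dimensional toy problem, min x² subject to 1 − x ≤ 0 over Q = ℝ, where the inner minimizer x(λ) = λ/2 is available in closed form and the optimum is λ* = 2, x* = 1:

```python
import numpy as np

# Assumed toy problem: min x^2 s.t. f1(x) = 1 - x <= 0, Q = R.
# L(x, lam) = x^2 + lam*(1 - x), so x(lam) = lam/2, and by Danskin's theorem
# phi'(lam) = f1(x(lam)) = 1 - lam/2. The dual optimum is lam* = 2, x* = 1.
lam = 0.0
for k in range(1, 2001):
    x = lam / 2.0                         # x(lam_k) = argmin_x L(x, lam_k)
    g = 1.0 - x                           # subgradient of phi at lam_k
    lam = max(lam + g / np.sqrt(k), 0.0)  # step h_k = 1/sqrt(k), project onto R_+
```

The O(1/ε²) behavior of such diminishing-step schemes is exactly the "low rate of convergence" noted above.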
Augmented Lagrangian (1970s) [Hestenes, Powell, Rockafellar, Polyak, Bertsekas, …]

Define the augmented Lagrangian
    L_K(x, λ) = f_0(x) + (1/2K) Σ_{i=1}^m ( λ^(i) + K f_i(x) )_+² − (1/2K) ‖λ‖₂²,  λ ∈ ℝ^m,
where K > 0 is a penalty parameter, and consider the dual function
    φ̂(λ) = min_{x∈Q} L_K(x, λ).

Main properties:
- The function φ̂ is concave.
- Its gradient is Lipschitz continuous with constant 1/K.
- Its unconstrained maximum is attained at the optimal dual solution λ*, and the corresponding point x̂(λ*) is the optimal primal solution.
Hint: check that the equation ( λ^(i) + K f_i(x) )_+ = λ^(i) is equivalent to KKT(2,3).
Method of Augmented Lagrangians

Note that
    ∇φ̂(λ) = (1/K) ( λ + K f(x̂(λ)) )_+ − (1/K) λ.
Therefore, the usual gradient method λ_{k+1} = λ_k + K ∇φ̂(λ_k) is exactly as follows:
    Method:  λ_{k+1} = ( λ_k + K f(x̂(λ_k)) )_+.

Advantage: fast convergence of the dual process.
Disadvantages:
- Difficult iteration.
- Unclear termination.
- No global complexity analysis.

Do we have an alternative?
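A minimal sketch of the multiplier update on an assumed toy problem, min x² subject to 1 − x ≤ 0 (so λ* = 2, x* = 1), where the inner minimization of L_K happens to have a closed form:

```python
# Assumed toy problem: min x^2 s.t. f1(x) = 1 - x <= 0 (lam* = 2, x* = 1).
# Minimizing L_K(x, lam) over x here gives x_hat(lam) = (lam + K) / (2 + K),
# since lam + K*(1 - x_hat) = 2*(lam + K)/(2 + K) > 0 always holds.
K = 1.0
lam = 0.0
x = 0.0
for _ in range(60):
    x = (lam + K) / (2.0 + K)              # x_hat(lam) = argmin_x L_K(x, lam)
    lam = max(lam + K * (1.0 - x), 0.0)    # multiplier update (lam + K*f1(x))_+
```

For K = 1 the dual error contracts by a factor 2/3 per iteration, illustrating the "fast convergence of the dual process" (each iteration, however, requires an exact inner minimization, which is the "difficult iteration" in general).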
Problem formulation

Problem:
    f* = inf_{x∈Q} { f_0(x) : f_i(x) ≤ 0, i = 1, …, m },
where the f_i, i = 0, …, m, are closed convex functions on Q endowed with first-order black-box oracles, and Q ⊆ E is a bounded, simple, closed convex set ("simple" meaning that we can solve some auxiliary optimization problems over Q).

Defining the Lagrangian
    L(x, λ) = f_0(x) + Σ_{i=1}^m λ^(i) f_i(x),  x ∈ Q, λ ∈ ℝ^m_+,
we can introduce the Lagrangian dual problem
    f_* ≝ sup_{λ∈ℝ^m_+} φ(λ),  where φ(λ) ≝ inf_{x∈Q} L(x, λ).
Clearly, f_* ≤ f*. Later, we will show f_* = f* algorithmically.
Bregman distances

Prox-function: d(·) is strongly convex on Q with parameter one:
    d(y) ≥ d(x) + ⟨∇d(x), y − x⟩ + ½ ‖y − x‖²,  x, y ∈ Q.
Denote by x_0 the prox-center of the set Q: x_0 = argmin_{x∈Q} d(x). Assume d(x_0) = 0.

Bregman distance:
    β(x, y) = d(y) − d(x) − ⟨∇d(x), y − x⟩,  x, y ∈ Q.
Clearly, β(x, y) ≥ ½ ‖x − y‖² for all x, y ∈ Q.

Bregman mapping: for x ∈ Q, g ∈ E* and h > 0, define
    B_h(x, g) = argmin_{y∈Q} { h ⟨g, y − x⟩ + β(x, y) }.
The first-order condition for the point x⁺ ≝ B_h(x, g) is as follows:
    ⟨h g + ∇d(x⁺) − ∇d(x), y − x⁺⟩ ≥ 0,  y ∈ Q.
Examples

1. Euclidean distance. We choose ‖x‖ = [ Σ_{i=1}^n (x^(i))² ]^{1/2} and d(x) = ½ ‖x‖². Then β(x, y) = ½ ‖x − y‖², and we have
    B_h(x, g) = Proj_Q(x − h g).

2. Entropy distance. We choose ‖x‖ = Σ_{i=1}^n |x^(i)| and
    d(x) = ln n + Σ_{i=1}^n x^(i) ln x^(i).
Then
    β(x, y) = Σ_{i=1}^n y^(i) [ ln y^(i) − ln x^(i) ].
If Q = { x ∈ ℝ^n_+ : Σ_{i=1}^n x^(i) = 1 }, then
    B_h^(i)(x, g) = x^(i) e^{−h g^(i)} / Σ_{j=1}^n x^(j) e^{−h g^(j)},  i = 1, …, n.
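A small sketch of both Bregman mappings; the box Q = [0, 1]^n in the Euclidean helper is an assumption for illustration, while the entropy case on the simplex is the familiar multiplicative-weights update:

```python
import numpy as np

def bregman_euclidean(x, g, h):
    """Euclidean setup on Q = [0,1]^n (assumed box): B_h(x, g) = Proj_Q(x - h*g)."""
    return np.clip(x - h * g, 0.0, 1.0)

def bregman_entropy(x, g, h):
    """Entropy setup on the simplex: B_h(x, g)_i = x_i*exp(-h*g_i) / sum_j x_j*exp(-h*g_j)."""
    w = x * np.exp(-h * g)
    return w / w.sum()

x = np.array([0.5, 0.3, 0.2])
g = np.array([1.0, -1.0, 0.0])
y = bregman_entropy(x, g, 0.1)   # stays on the simplex; mass moves toward small g^(i)
```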
Switching subgradient method

Input parameter: the step size h > 0.
Initialization: compute the prox-center x_0.
Iteration k ≥ 0:
a) Define I_k = { i ∈ {1, …, m} : f_i(x_k) > h ‖f_i′(x_k)‖ }.
b) If I_k = ∅, then compute x_{k+1} = B_h( x_k, f_0′(x_k)/‖f_0′(x_k)‖ ).
c) If I_k ≠ ∅, then choose an arbitrary i_k ∈ I_k, define h_k = f_{i_k}(x_k)/‖f_{i_k}′(x_k)‖², and compute x_{k+1} = B_{h_k}( x_k, f_{i_k}′(x_k) ).

After t ≥ 0 iterations, define F_t = { k ∈ {0, …, t} : I_k = ∅ }. Denote N(t) = |F_t|. It is possible that N(t) = 0.
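Steps a)-c) can be sketched in the Euclidean setup (where the Bregman step is a projected gradient step) on an assumed toy instance: min x₁² + x₂² subject to x₁ + x₂ ≥ 1 over Q = [0, 1]², whose solution is x* = (0.5, 0.5). This is a sketch under those assumptions, not the author's reference code:

```python
import numpy as np

# Assumed toy instance: f0(x) = x1^2 + x2^2, f1(x) = 1 - x1 - x2 <= 0,
# Q = [0, 1]^2, Euclidean prox-setup (B_h = projection of a gradient step).
h = 0.01
x = np.zeros(2)                            # prox-center of Q = [0,1]^2
productive = []                            # iterates x_k with k in F_t
for k in range(40000):
    f1 = 1.0 - x[0] - x[1]
    g1 = np.array([-1.0, -1.0])            # subgradient of f1
    if f1 > h * np.linalg.norm(g1):        # step c): I_k nonempty, constraint step
        hk = f1 / np.linalg.norm(g1) ** 2
        x = np.clip(x - hk * g1, 0.0, 1.0)
    else:                                  # step b): I_k empty, objective step
        productive.append(x.copy())
        g0 = 2.0 * x
        x = np.clip(x - h * g0 / np.linalg.norm(g0), 0.0, 1.0)

x_avg = np.mean(productive, axis=0)        # approximate primal solution
```

Averaging the productive iterates gives a nearly feasible point with objective value close to f* = 0.5, in line with the bounds of the convergence result below.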
Finding the dual multipliers

If N(t) > 0, define the dual multipliers as follows:
    λ_t^(0) = h Σ_{k∈F_t} 1/‖f_0′(x_k)‖,
    λ_t^(i) = (1/λ_t^(0)) Σ_{k∈A_i(t)} h_k,  i = 1, …, m,
where A_i(t) = { k ∈ {0, …, t} : i_k = i }, 1 ≤ i ≤ m.
Denote S_t = Σ_{k∈F_t} 1/‖f_0′(x_k)‖. If F_t = ∅, then we define S_t = 0.

For proving convergence of the switching strategy, we find an upper bound for the gap
    δ_t = (1/S_t) Σ_{k∈F_t} f_0(x_k)/‖f_0′(x_k)‖ − φ(λ_t),
assuming that N(t) > 0.
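These accumulation formulas can be sketched on an assumed toy instance (min x₁² + x₂² subject to x₁ + x₂ ≥ 1 over Q = [0, 1]², Euclidean setup), where KKT gives the true multiplier λ* = 1, since 2x* = λ*·(1, 1) at x* = (0.5, 0.5). A sketch under those assumptions:

```python
import numpy as np

# Assumed toy instance: f0(x) = x1^2 + x2^2, f1(x) = 1 - x1 - x2 <= 0,
# Q = [0, 1]^2. KKT at x* = (0.5, 0.5) gives lambda* = 1.
h = 0.01
x = np.zeros(2)
S_t = 0.0        # S_t = sum over F_t of 1/||f0'(x_k)||
H_1 = 0.0        # sum of the step sizes h_k over A_1(t) (single constraint)
for k in range(40000):
    f1 = 1.0 - x[0] - x[1]
    g1 = np.array([-1.0, -1.0])
    if f1 > h * np.linalg.norm(g1):        # constraint step: k in A_1(t)
        hk = f1 / np.linalg.norm(g1) ** 2
        H_1 += hk
        x = np.clip(x - hk * g1, 0.0, 1.0)
    else:                                  # productive step: k in F_t
        g0 = 2.0 * x
        S_t += 1.0 / np.linalg.norm(g0)
        x = np.clip(x - h * g0 / np.linalg.norm(g0), 0.0, 1.0)

lam_0 = h * S_t          # lambda_t^(0) = h * S_t
lam_1 = H_1 / lam_0      # lambda_t^(1): estimate of lambda* = 1
```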
Convergence analysis

Note that λ_t^(0) = h S_t. Therefore
    λ_t^(0) δ_t = sup_{x∈Q} { h Σ_{k∈F_t} f_0(x_k)/‖f_0′(x_k)‖ − λ_t^(0) f_0(x) − Σ_{i=1}^m Σ_{k∈A_i(t)} h_k f_i(x) }
        = sup_{x∈Q} { h Σ_{k∈F_t} [ f_0(x_k) − f_0(x) ]/‖f_0′(x_k)‖ − Σ_{k∉F_t} h_k f_{i_k}(x) }
        ≤ sup_{x∈Q} { h Σ_{k∈F_t} ⟨f_0′(x_k), x_k − x⟩/‖f_0′(x_k)‖ + Σ_{k∉F_t} h_k [ ⟨f_{i_k}′(x_k), x_k − x⟩ − f_{i_k}(x_k) ] }.
Let us estimate the right-hand side of this inequality from above.
Feasible step

For arbitrary x ∈ Q, denote r_t(x) = β(x_t, x). Then
    r_{t+1}(x) − r_t(x) = [ d(x) − d(x_{t+1}) − ⟨∇d(x_{t+1}), x − x_{t+1}⟩ ] − [ d(x) − d(x_t) − ⟨∇d(x_t), x − x_t⟩ ]
        = ⟨∇d(x_t) − ∇d(x_{t+1}), x − x_{t+1}⟩ − [ d(x_{t+1}) − d(x_t) − ⟨∇d(x_t), x_{t+1} − x_t⟩ ]
        ≤ ⟨∇d(x_t) − ∇d(x_{t+1}), x − x_{t+1}⟩ − ½ ‖x_t − x_{t+1}‖².

In view of the optimality condition, for all x ∈ Q and k ∈ F_t we have
    (h/‖f_0′(x_k)‖) ⟨f_0′(x_k), x_{k+1} − x⟩ ≤ ⟨∇d(x_{k+1}) − ∇d(x_k), x − x_{k+1}⟩.
Assume that k ∈ F_t. In this case,
    r_{k+1}(x) − r_k(x) ≤ −(h/‖f_0′(x_k)‖) ⟨f_0′(x_k), x_{k+1} − x⟩ − ½ ‖x_k − x_{k+1}‖²
        ≤ −(h/‖f_0′(x_k)‖) ⟨f_0′(x_k), x_k − x⟩ + ½ h².
Infeasible step

If k ∉ F_t, then the optimality condition defining the point x_{k+1} looks as follows:
    h_k ⟨f_{i_k}′(x_k), x_{k+1} − x⟩ ≤ ⟨∇d(x_{k+1}) − ∇d(x_k), x − x_{k+1}⟩.
Therefore,
    r_{k+1}(x) − r_k(x) ≤ −h_k ⟨f_{i_k}′(x_k), x_{k+1} − x⟩ − ½ ‖x_k − x_{k+1}‖²
        ≤ −h_k ⟨f_{i_k}′(x_k), x_k − x⟩ + ½ h_k² ‖f_{i_k}′(x_k)‖².
Hence,
    h_k [ ⟨f_{i_k}′(x_k), x_k − x⟩ − f_{i_k}(x_k) ] ≤ r_k(x) − r_{k+1}(x) − f_{i_k}²(x_k)/(2 ‖f_{i_k}′(x_k)‖²)
        ≤ r_k(x) − r_{k+1}(x) − ½ h².
Convergence result

Summing up all inequalities for k = 0, …, t, and taking into account that r_{t+1}(x) ≥ 0, we obtain
    λ_t^(0) δ_t ≤ r_0(x) + ½ N(t) h² − ½ (t − N(t)) h² = r_0(x) − ½ t h² + N(t) h².
Denote D = max_{x∈Q} r_0(x).

Theorem. If t ≥ (2/h²) D, then F_t ≠ ∅. In this case,
    δ_t ≤ M h  and  max_{k∈F_t} max_{1≤i≤m} f_i(x_k) ≤ M h,
where M = max_{0≤i≤m} max_{0≤k≤t} ‖f_i′(x_k)‖.

Proof: If F_t = ∅, then N(t) = 0 and, consequently, λ_t^(0) = 0. This is impossible for t big enough. Finally, λ_t^(0) ≥ (h/M) N(t). Therefore, if t is big enough, then
    δ_t ≤ N(t) h²/λ_t^(0) ≤ M h.
Conclusion

1. The optimal primal-dual solution can be approximated by a simple switching subgradient scheme.
2. The dual process looks like a coordinate-descent method.
3. The approximations of the dual multipliers have a natural interpretation: the relative importance of the corresponding constraints during the adjustment process.
4. However, the method has the optimal worst-case efficiency estimate even if the dual optimal solution does not exist.
5. Many interesting questions remain (influence of smoothness, strong convexity, etc.).

Thank you for your attention!