AUGMENTED LAGRANGIAN METHODS FOR CONVEX OPTIMIZATION


New Horizon of Mathematical Sciences: RIKEN iTHES - Tohoku AIMR - Tokyo IIS Joint Symposium, April 28, 2016

T. Takeuchi, Aihara Lab.
Laboratories for Mathematics, Lifesciences, and Informatics
Institute of Industrial Science, the University of Tokyo

Optimization Problems: Types and Algorithms¹

Types of optimization: Linear Programming, Quadratic Programming, Semidefinite Programming, MPEC, Nonlinear Least Squares, Nonlinear Equations, Nondifferentiable Optimization, Global Optimization, Derivative-Free Optimization, Network Optimization.

Algorithms: Line Search, Trust-Region, (Nonlinear) Conjugate Gradient, (Quasi-)Newton, Truncated Newton, Gauss-Newton, Levenberg-Marquardt, Homotopy, Gradient Projection, Simplex Method, Interior Point Methods, Active Set Methods, Primal-Dual Active Set Methods, Augmented Lagrangian Methods, Reduced Gradient, Sequential Quadratic Programming.

¹ NEOS Optimization Guide: http://www.neos-guide.org

Problem and Objective

Problem: a class of nonsmooth convex optimization. Let X, H be real Hilbert spaces and consider

    min_x f(x) + φ(Ex)    (1)

where f : X → R is differentiable with ∇²f(x) ⪰ mI, φ : H → R is convex and nonsmooth (nondifferentiable), and E : X → H is linear.

Optimality system:

    0 ∈ ∇_x f(x) + E* ∂φ(Ex)    (2)

Objective: to develop Newton(-like) methods that are
- applicable to a wide range of applications,
- fast and accurate,
- easy to implement.

Tool: the augmented Lagrangian in the sense of Fortin.

Motivating Examples (1/4)

Optimal control problem:

    min_u (1/2) ∫₀ᵀ [ x(t)ᵀQ x(t) + u(t)ᵀR u(t) ] dt
    s.t. ẋ(t) = A x(t) + B u(t),  x(0) = x₀,  |u(t)| ≤ 1.

Inverse source problem:

    (ŷ, û) = arg min_{(y,u) ∈ H¹₀(Ω)×L²(Ω)} (1/2)‖y − z‖²_{L²} + (α/2)‖u‖²_{L²} + β‖u‖_{L¹}
    s.t. −Δy = u in Ω,  y = 0 on ∂Ω.

(u: ground truth, z: observation, û: estimation.)

Motivating Examples (2/4)

lasso (least absolute shrinkage and selection operator) [R. Tibshirani, '96]:

    min_{β ∈ R^p} ‖Xβ − y‖²₂  subject to  ‖β‖₁ ≤ t,      or      min_{β ∈ R^p} ‖Xβ − y‖²₂ + λ‖β‖₁,

where X = [x₁, ..., x_N]ᵀ, x_i = (x_{i1}, ..., x_{ip}) are the predictor variables and y_i the responses, i = 1, 2, ..., N.

Envelope construction:

    x̂ = arg min_{x ∈ Rⁿ} (1/2)‖x − s‖²₂ + α‖(DᵀD)²x‖₁  subject to  s ≤ x,

where s is the noisy signal, D is a finite difference operator, and the term ‖(DᵀD)²x‖₁ controls the smoothness of the envelope x.

Motivating Examples (3/4)

Total variation image reconstruction:

    min_x (1/p)‖Kx − y‖^p_{L^p} + α‖x‖_TV + (β/2)‖x‖²_{H¹},   for p = 1 or p = 2,

where K is the blurring kernel and

    ‖x‖_TV = Σ_{i,j} √( ([D₁x]_{i,j})² + ([D₂x]_{i,j})² ).

Motivating Examples (4/4)

Quadratic programming:

    min_x (1/2)(x, Ax) − (a, x)   s.t.  Ex = b,  Gx ≤ g.

Group LASSO:

    min_β (1/2)‖Xβ − y‖² + α‖β‖_G,

e.g. for G = {{1, 2}, {3, 4, 5}}, ‖β‖_G = √(β₁² + β₂²) + √(β₃² + β₄² + β₅²).

SVM.

Outline
1. Augmented Lagrangian
2. Newton Methods
3. Numerical Test

Augmented Lagrangian

Method of multipliers for equality constraints

Equality constrained optimization problem:

    (P)  min_x f(x)  subject to  g(x) = 0,

where f : Rⁿ → R and g : Rⁿ → R^m are differentiable.

Augmented Lagrangian (Hestenes, Powell '69), c > 0:

    L_c(x, λ) = f(x) + λᵀg(x) + (c/2)‖g(x)‖²_{R^m}
              = f(x) + (c/2)‖g(x) + λ/c‖²_{R^m} − 1/(2c)‖λ‖²_{R^m}.

Method of multipliers¹:  λ^{k+1} = λ^k + c[∇_λ d_c](λ^k).

The method of multipliers is composed of two steps:

    x^k     = arg min_x L_c(x, λ^k)
    λ^{k+1} = λ^k + c ∇_λ L_c(x^k, λ^k) = λ^k + c g(x^k)

¹ def.: d_c(λ) := min_x L_c(x, λ).  fact: ∇d_c(λ) = [∇_λ L_c](x(λ), λ) = g(x(λ)).
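To make the two-step iteration concrete, here is a minimal Python sketch of the method of multipliers; the toy problem (min x₁² + x₂² subject to x₁ + x₂ − 1 = 0), the penalty c, and the use of SciPy's BFGS for the inner minimization are illustrative choices, not part of the slides.

```python
# Minimal sketch of the method of multipliers for
#     min f(x)  subject to  g(x) = 0,
# on the toy problem f(x) = x1^2 + x2^2, g(x) = x1 + x2 - 1
# (solution x = (0.5, 0.5), multiplier lambda = -1).
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0] ** 2 + x[1] ** 2

def g(x):
    return np.array([x[0] + x[1] - 1.0])

def L_c(x, lam, c):
    # augmented Lagrangian: f(x) + lam' g(x) + (c/2) ||g(x)||^2
    gx = g(x)
    return f(x) + lam @ gx + 0.5 * c * (gx @ gx)

x, lam, c = np.zeros(2), np.zeros(1), 10.0
for k in range(20):
    # step 1: x^k = argmin_x L_c(x, lam^k)
    x = minimize(L_c, x, args=(lam, c)).x
    # step 2: lam^{k+1} = lam^k + c * g(x^k)  (gradient ascent on the dual function)
    lam = lam + c * g(x)
    if np.linalg.norm(g(x)) < 1e-10:
        break

print(x, lam)  # expect approximately [0.5, 0.5] and [-1.0]
```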

A brief history of augmented Lagrangian methods (1/2)

Equality constrained optimization:
    min_{x ∈ Rⁿ} f(x)  s.t.  h(x) = 0
    1969 Hestenes [7], Powell [14]

Inequality constrained optimization:
    min_{x ∈ Rⁿ} f(x)  s.t.  g(x) ≤ 0
    1973 Rockafellar [15, 16]

Equality and inequality constrained optimization:
    min_{x ∈ Rⁿ} f(x)  s.t.  h(x) = 0, g(x) ≤ 0
    1982 Bertsekas [2]

Nonsmooth convex optimization:
    min_{x ∈ Rⁿ} f(x) + φ(Ex)
    1975 Glowinski-Marroco [6], 1975 Fortin [4], 1976 Gabay-Mercier [5], 2000 Ito-Kunisch [9], 2009 Tomioka-Sugiyama [18]

A brief history of augmented Lagrangian methods (2/2)

Augmented Lagrangian Methods (Method of Multipliers)

    year  authors                        problem                               algorithm
    1969  Hestenes [7], Powell [14]      equality constraints
    1973  Rockafellar [15, 16]           inequality constraints
    1975  Glowinski-Marroco [6]          nonsmooth convex (FEM)                ALG1
    1975  Fortin [4]                     nonsmooth convex (FEM)
    1976  Gabay-Mercier [5]              nonsmooth convex (FEM)                ADMM¹
    1982  Bertsekas [2]                  equality and inequality constraints
    2000  Ito-Kunisch [9]                nonsmooth convex (PDE Opt.)
    2009  Tomioka-Sugiyama [18]          nonsmooth convex (Machine Learning)   DAL²
    2013  Patrinos-Bemporad [12]         convex composite                      Proximal Newton
    2014  Patrinos-Stella-Bemporad [13]  convex composite (Optimal Control)    FBN³

¹ Alternating Direction Method of Multipliers
² Dual-Augmented Lagrangian method
³ Forward-Backward-Newton

Augmented Lagrangian Methods [Glowinski+, '75]

Nonsmooth convex optimization problem:

    min_{x,v} f(x) + φ(v),  subject to  v = Ex.

Augmented Lagrangian:

    L_c(x, v, λ) = f(x) + φ(v) + (λ, Ex − v) + (c/2)‖Ex − v‖²_{R^m}

Dual function:  d_c(λ) := min_{x,v} L_c(x, v, λ).

Method of multipliers (ALG1):  λ^{k+1} = λ^k + c ∇_λ d_c(λ^k).

The method of multipliers is composed of two steps:

    step 1:  (x^{k+1}, v^{k+1}) = arg min_{x,v} L_c(x, v, λ^k)
    step 2:  λ^{k+1} = λ^k + c(Ex^{k+1} − v^{k+1})

Fact (gradient): ∇_λ d_c(λ) = [∇_λ L_c](x(λ), v(λ), λ).

Augmented Lagrangian Methods [Gabay+, '76]

Nonsmooth convex optimization problem:

    min_{x,v} f(x) + φ(v),  subject to  v = Ex.

Augmented Lagrangian:

    L_c(x, v, λ) = f(x) + φ(v) + (λ, Ex − v) + (c/2)‖Ex − v‖²_{R^m}

ADMM, alternating direction method of multipliers:

    step 1:  x^{k+1} = arg min_x L_c(x, v^k, λ^k)
    step 2:  v^{k+1} = arg min_v L_c(x^{k+1}, v, λ^k) = prox_{φ/c}(Ex^{k+1} + λ^k/c)
    step 3:  λ^{k+1} = λ^k + c(Ex^{k+1} − v^{k+1})

Assumption: prox_{φ/c}(z) := arg min_v (φ(v) + (c/2)‖v − z‖²) has a closed-form representation (is explicitly computable).
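For concreteness, here is a minimal Python sketch of the three ADMM steps applied to the lasso from the motivating examples, with f(x) = (1/2)‖Ax − y‖², φ = γ‖·‖₁ and E = I; the random data and the values of c and γ are illustrative assumptions.

```python
# Minimal sketch of ADMM (steps 1-3 above) for the lasso
#     min_x (1/2)||A x - y||^2 + gamma ||x||_1,
# i.e. f(x) = (1/2)||A x - y||^2, phi = gamma ||.||_1, E = I.
import numpy as np

rng = np.random.default_rng(0)
m, n = 40, 80
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
gamma, c = 0.1, 1.0

def prox_l1(z, t):
    # prox_{t ||.||_1}(z): componentwise soft-thresholding
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

x, v, lam = np.zeros(n), np.zeros(n), np.zeros(n)
M = A.T @ A + c * np.eye(n)                     # x-update is a linear solve
for k in range(500):
    # step 1: x^{k+1} = argmin_x L_c(x, v^k, lam^k)
    x = np.linalg.solve(M, A.T @ y + c * v - lam)
    # step 2: v^{k+1} = prox_{phi/c}(E x^{k+1} + lam^k / c)
    v = prox_l1(x + lam / c, gamma / c)
    # step 3: lam^{k+1} = lam^k + c (E x^{k+1} - v^{k+1})
    lam = lam + c * (x - v)
    if np.linalg.norm(x - v) < 1e-10:
        break
```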

Augmented Lagrangian Methods [Fortin, '75]

Nonsmooth convex optimization problem:

    min_x f(x) + φ(Ex)

(Fortin's) augmented Lagrangian:

    L_c(x, λ) = min_v L_c(x, v, λ) = f(x) + φ_c(Ex + λ/c) − 1/(2c)‖λ‖²_{R^m},

where φ_c(z) is Moreau's envelope.

Method of multipliers:  λ^{k+1} = λ^k + c ∇_λ d_c(λ^k).

The method of multipliers is composed of two steps:

    x^k     = arg min_x L_c(x, λ^k)
    λ^{k+1} = λ^k + c[Ex^k − prox_{φ/c}(Ex^k + λ^k/c)]

def. (dual function): d_c(λ) := min_x L_c(x, λ).  fact (gradient): ∇_λ d_c(λ) = [∇_λ L_c](x(λ), λ).

Assumption: prox_{φ/c}(z) := arg min_v (φ(v) + (c/2)‖v − z‖²) has a closed-form representation (is explicitly computable).
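A corresponding sketch of Fortin's two-step method on the same illustrative lasso instance: the inner problem is smooth because the Moreau envelope φ_c is C¹ (its gradient is the formula stated on the differentiability slide below), and the choice of SciPy's BFGS as the inner solver is again an assumption made only for illustration.

```python
# Sketch of Fortin's two-step method for
#     min_x (1/2)||A x - y||^2 + gamma ||x||_1     (E = I).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n = 20, 40
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
gamma, c = 0.1, 1.0

def prox_l1(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def L_c(x, lam):
    # Fortin's augmented Lagrangian with E = I:
    #   f(x) + phi_c(x + lam/c) - ||lam||^2 / (2c)
    z = x + lam / c
    p = prox_l1(z, gamma / c)
    phi_c = gamma * np.abs(p).sum() + 0.5 * c * np.sum((p - z) ** 2)
    return 0.5 * np.sum((A @ x - y) ** 2) + phi_c - np.sum(lam ** 2) / (2 * c)

def grad_L_c(x, lam):
    z = x + lam / c
    return A.T @ (A @ x - y) + c * (z - prox_l1(z, gamma / c))

x, lam = np.zeros(n), np.zeros(n)
for k in range(30):
    # step 1: x^k = argmin_x L_c(x, lam^k)   (smooth unconstrained problem)
    x = minimize(L_c, x, jac=grad_L_c, args=(lam,)).x
    # step 2: lam^{k+1} = lam^k + c [ x^k - prox_{phi/c}(x^k + lam^k/c) ]
    lam = lam + c * (x - prox_l1(x + lam / c, gamma / c))
```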

Comparison

Augmented Lagrangian for ALG1 [Glowinski+, '75] and ADMM [Gabay+, '76]:

    L_c(x, v, λ) = f(x) + φ(v) + (λ, Ex − v) + (c/2)‖Ex − v‖²_{R^m}

(Fortin's) augmented Lagrangian for ALM [Fortin, '75]:

    L_c(x, λ) = min_v L_c(x, v, λ) = f(x) + φ_c(Ex + λ/c) − 1/(2c)‖λ‖²_{R^m}

Fortin's augmented Lagrangian method was largely ignored in the subsequent literature. The method was rediscovered by [Ito-Kunisch, 2000] and by [Tomioka-Sugiyama, 2009].

Moreau's envelope, proximal operator (1/3)

Moreau's envelope (Moreau-Yosida regularization) of a closed proper convex function φ:

    φ_c(z) := min_v (φ(v) + (c/2)‖v − z‖²)

The proximal operator prox_φ(z) is defined by

    prox_φ(z) := arg min_v (φ(v) + (1/2)‖v − z‖²),
    prox_{φ/c}(z) = arg min_v (φ(v) + (c/2)‖v − z‖²) = arg min_v (φ(v)/c + (1/2)‖v − z‖²).

Moreau's envelope is expressed in terms of the proximal operator:

    φ_c(z) = φ(prox_{φ/c}(z)) + (c/2)‖prox_{φ/c}(z) − z‖²

Moreau's envelope, proximal operator (2/3)

ℓ₁ norm, φ(x) = ‖x‖₁, x ∈ Rⁿ:

    prox_φ(x)_i = max(0, |x_i| − 1) sign(x_i),
    G ∈ ∂_B(prox_φ)(x) is diagonal with  G_ii = 1 if |x_i| > 1,  G_ii = 0 if |x_i| < 1,  G_ii ∈ {0, 1} if |x_i| = 1.

Euclidean norm, φ(x) = ‖x‖, x ∈ Rⁿ:

    prox_φ(x) = max(0, ‖x‖ − 1) x/‖x‖,
    ∂_B(prox_φ)(x) = I − (1/‖x‖)(I − xxᵀ/‖x‖²)            if ‖x‖ > 1,
                   = 0                                      if ‖x‖ < 1,
                   ∈ {0, I − (1/‖x‖)(I − xxᵀ/‖x‖²)}        if ‖x‖ = 1.

Ref.: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer Optimization and Its Applications, Chapter 10.
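The two proximal operators and their B-subdifferential elements translate directly into code; the following sketch (function names are mine) implements the formulas as displayed above.

```python
# Sketch of the two proximal operators above and one element of each
# B-subdifferential, implementing the displayed formulas.
import numpy as np

def prox_l1(x):
    # phi(x) = ||x||_1: componentwise soft-thresholding with threshold 1
    return np.maximum(np.abs(x) - 1.0, 0.0) * np.sign(x)

def jac_prox_l1(x):
    # G in d_B(prox_phi)(x): diagonal, G_ii = 1 where |x_i| > 1, 0 where |x_i| < 1
    return np.diag((np.abs(x) > 1.0).astype(float))

def prox_l2(x):
    # phi(x) = ||x||: radial shrinkage toward the origin
    nx = np.linalg.norm(x)
    return np.zeros_like(x) if nx == 0.0 else max(0.0, nx - 1.0) * x / nx

def jac_prox_l2(x):
    # an element of d_B(prox_phi)(x) for the Euclidean norm
    nx = np.linalg.norm(x)
    n = x.size
    if nx > 1.0:
        return np.eye(n) - (np.eye(n) - np.outer(x, x) / nx ** 2) / nx
    return np.zeros((n, n))

x = np.array([2.0, -0.5, 1.5])
print(prox_l1(x), np.diag(jac_prox_l1(x)))
print(prox_l2(x), jac_prox_l2(x))
```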

Moreau's envelope, proximal operator (3/3)

Properties¹:
- φ_c(z) ≤ φ(z),  lim_{c→∞} φ_c(z) = φ(z)
- φ_c(z) + (φ*)_{1/c}(cz) = (c/2)‖z‖²
- ∇_z φ_c(z) = c(z − prox_{φ/c}(z))
- ‖prox_{φ/c}(z) − prox_{φ/c}(w)‖² ≤ (prox_{φ/c}(z) − prox_{φ/c}(w), z − w)
- prox_{φ/c}(z) + (1/c) prox_{cφ*}(cz) = z

def. (convex conjugate): φ*(x*) := sup_x (⟨x*, x⟩ − φ(x)).

prox_{φ/c} is globally Lipschitz and almost everywhere differentiable. Every P ∈ ∂_B(prox_{φ/c})(x) is a symmetric positive semidefinite matrix that satisfies ‖P‖ ≤ 1, and

    ∂_B(prox_{cφ*})(x) = {I − P : P ∈ ∂_B(prox_{φ/c})(x/c)}.

¹ [17, Bauschke+ '11], [13, Patrinos+ '14]
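Two of the listed properties can be checked numerically for φ = ‖·‖₁, for which φ* is the indicator of the ℓ∞ unit ball, so prox_{cφ*} is the projection onto [−1, 1]ⁿ; the sketch below (test vector and c are arbitrary choices) compares the envelope gradient with a finite-difference gradient and verifies the Moreau decomposition.

```python
# Numerical check of two properties above for phi = ||.||_1:
#   (i)  grad phi_c(z) = c (z - prox_{phi/c}(z)),
#   (ii) prox_{phi/c}(z) + (1/c) prox_{c phi*}(c z) = z,
# where prox_{c phi*} is the projection onto [-1, 1]^n
# (scaling an indicator function by c > 0 does not change it).
import numpy as np

def prox_l1(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def env_l1(z, c):
    # phi_c(z) = phi(prox_{phi/c}(z)) + (c/2) ||prox_{phi/c}(z) - z||^2
    p = prox_l1(z, 1.0 / c)
    return np.abs(p).sum() + 0.5 * c * np.sum((p - z) ** 2)

rng = np.random.default_rng(1)
z, c = rng.standard_normal(6), 2.0

# (i) closed-form envelope gradient vs. central finite differences
g = c * (z - prox_l1(z, 1.0 / c))
eps = 1e-6
g_fd = np.array([(env_l1(z + eps * e, c) - env_l1(z - eps * e, c)) / (2 * eps)
                 for e in np.eye(6)])
print(np.max(np.abs(g - g_fd)))                                  # small

# (ii) Moreau decomposition
print(np.max(np.abs(prox_l1(z, 1.0 / c) + np.clip(c * z, -1, 1) / c - z)))  # ~0
```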

Differentiability, optimality condition

Fortin's augmented Lagrangian:

    L_c(x, λ) = f(x) + φ_c(Ex + λ/c) − 1/(2c)‖λ‖²_{R^m}

Fortin's augmented Lagrangian L_c is convex and continuously differentiable with respect to x, and is concave and continuously differentiable with respect to λ:

    ∇_x L_c(x, λ) = ∇_x f(x) + cE*(Ex + λ/c − prox_{φ/c}(Ex + λ/c)),
    ∇_λ L_c(x, λ) = Ex − prox_{φ/c}(Ex + λ/c).

The following conditions on a pair (x*, λ*) are equivalent.
(a) ∇_x f(x*) + E*λ* = 0 and λ* ∈ ∂φ(Ex*).
(b) ∇_x f(x*) + E*λ* = 0 and Ex* − prox_{φ/c}(Ex* + λ*/c) = 0.
(c) ∇_x L_c(x*, λ*) = 0 and ∇_λ L_c(x*, λ*) = 0.
(d) L_c(x*, λ) ≤ L_c(x*, λ*) ≤ L_c(x, λ*) for all x ∈ X, λ ∈ H.

If there exists a pair (x*, λ*) that satisfies (d), then x* is the minimizer of (1).

Newton Methods

Newton methods

Problem:

    min_x f(x) + φ(Ex)

Approaches:

(1-1) Lagrange-Newton: solve the optimality system by the Newton method,

    ∇_x L_c(x, λ) = 0  and  ∇_λ L_c(x, λ) = 0.

(1-2) A special case (E = I): solve the optimality system by the Newton method,

    x = prox_{φ/c}(x − ∇_x f(x)/c).

(2) Apply the Newton method within Fortin's augmented Lagrangian method:

    step 1: solve for x^k:  ∇_x L_c(x^k, λ^k) = 0
    step 2: λ^{k+1} = λ^k + c_k(Ex^k − prox_{φ/c_k}(Ex^k + λ^k/c_k)).

For fixed λ^k,  x^k = arg min_x L_c(x, λ^k)  ⟺  ∇_x L_c(x^k, λ^k) = 0.

Lagrange-Newton method

Lagrange-Newton method: solve the optimality system by the Newton method¹,

    ∇_x L_c(x, λ) = 0  and  ∇_λ L_c(x, λ) = 0.

Newton system:

    [ ∇²_x f(x) + cE*(I − G)E   ((I − G)E)* ] [ d_x ]       [ ∇_x L_c(x, λ) ]
    [ (I − G)E                  −c^{-1} G   ] [ d_λ ]  = −  [ ∇_λ L_c(x, λ) ]

where G ∈ ∂(prox_{φ/c})(Ex + λ/c), or equivalently

    [ ∇²_x f(x)   E*        ] [ x⁺ ]     [ ∇²_x f(x) x − ∇_x f(x) ]
    [ (I − G)E    −c^{-1} G ] [ λ⁺ ]  =  [ prox_{φ/c}(z) − G z    ]

where z = Ex + λ/c, d_x = x⁺ − x, d_λ = λ⁺ − λ.

¹ When E is not surjective, the Newton system can become inconsistent, i.e., there may be no solution to the Newton system. In this case a proximal algorithm is employed to guarantee the solvability of the Newton system.

Newton method for the composite convex problem

Problem:

    min_x f(x) + φ(x)

Apply the Newton method to the optimality condition

    ∇_x f(x) + λ = 0,   x − prox_{φ/c}(x + λ/c) = 0
    ⟺  x − prox_{φ/c}(x − ∇_x f(x)/c) = 0.

Define F_c(x) = L_c(x, −∇_x f(x)).

Algorithm:

    step 1: G ∈ ∂(prox_{φ/c})(x − ∇_x f(x)/c).
    step 2: solve  (I − G(I − c^{-1}∇²f(x))) d = −(x − prox_{φ/c}(x − ∇_x f(x)/c)).
    step 3: Armijo's rule: find the smallest nonnegative integer j such that
            F_c(x + 2^{-j} d) ≤ F_c(x) + µ 2^{-j} ⟨∇_x F_c(x), d⟩   (0 < µ < 1),
            and set x⁺ = x + 2^{-j} d.

Step 2 is a variable metric gradient method on F_c(x):

    c(I − c^{-1}∇²_x f(x)) (I − G(I − c^{-1}∇²_x f(x))) d = −∇_x F_c(x).

The algorithm is proposed in [12, Patrinos-Bemporad].
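A minimal sketch of steps 1-3 for ℓ₁-regularized least squares (E = I), with F_c evaluated through Fortin's augmented Lagrangian as above; the data, the choices c > ‖AᵀA‖₂ and γ, and the dense linear algebra are illustrative assumptions.

```python
# Sketch of steps 1-3 above for
#     min_x (1/2)||A x - y||^2 + gamma ||x||_1     (E = I),
# with F_c(x) = L_c(x, -grad f(x)).
import numpy as np

rng = np.random.default_rng(0)
m, n = 80, 40
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
H = A.T @ A                                   # Hessian of f
gamma = 0.1 * np.max(np.abs(A.T @ y))
c = 1.01 * np.linalg.norm(H, 2)               # c larger than the Lipschitz constant of grad f

def grad_f(x):
    return A.T @ (A @ x - y)

def prox(z):                                  # prox_{phi/c}: soft-threshold at gamma/c
    return np.sign(z) * np.maximum(np.abs(z) - gamma / c, 0.0)

def F_c(x):                                   # F_c(x) = L_c(x, -grad f(x))
    g = grad_f(x)
    z = x - g / c
    p = prox(z)
    phi_c = gamma * np.abs(p).sum() + 0.5 * c * np.sum((p - z) ** 2)
    return 0.5 * np.sum((A @ x - y) ** 2) + phi_c - np.sum(g ** 2) / (2 * c)

x, mu = np.zeros(n), 1e-4
for k in range(100):
    z = x - grad_f(x) / c
    r = x - prox(z)                           # residual x - prox_{phi/c}(x - grad f(x)/c)
    if np.linalg.norm(r) < 1e-10:
        break
    # step 1: G in d_B(prox_{phi/c})(z) is diagonal 0/1 for the l1 prox
    Gd = (np.abs(z) > gamma / c).astype(float)
    # step 2: solve (I - G (I - H/c)) d = -r
    M = np.eye(n) - Gd[:, None] * (np.eye(n) - H / c)
    d = np.linalg.solve(M, -r)
    # step 3: Armijo backtracking on F_c along d, with grad F_c(x) = c (I - H/c) r
    slope = (c * (np.eye(n) - H / c) @ r) @ d
    t, Fx = 1.0, F_c(x)
    while F_c(x + t * d) > Fx + mu * t * slope and t > 1e-12:
        t *= 0.5
    x = x + t * d
```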

Newton method applied to the primal update

Apply the Newton method to solve the nonlinear equation arising from Fortin's augmented Lagrangian method:

    step 1: ∇_x L_{c_k}(x, λ^k) = 0
    step 2: λ^{k+1} = λ^k + c_k(Ex^k − prox_{φ/c_k}(Ex^k + λ^k/c_k)).

Newton system:

    (∇²_x f(x) + cE*(I − G)E) d_x = −∇_x L_{c_k}(x, λ).

Numerical Test

Numerical Test

ℓ₁ least squares problems:

    min_x (1/2)‖Ax − y‖² + γ‖x‖₁.

Data sets (A, y, x₀): L1-Testset (mat files), available at http://wwwopt.mathematik.tu-darmstadt.de/spear/. 200 data sets were used (dimension of x: 1,024 to 49,152).

Hardware: Windows Surface Pro 3, Intel i5 / 8 GB RAM.

Comparison: MOSEK (free academic license available at https://mosek.com/), a software package for solving large scale sparse problems; algorithm: the interior-point method.

Parameters: c = 0.99/σ_max(AᵀA), γ = 0.01‖Aᵀy‖_∞. Iterate until ‖∇_x L_c(x^k, λ^k)‖² + ‖∇_λ L_c(x^k, λ^k)‖² < 10^{-12} is met.

Newton method for ℓ₁ regularized least squares

Newton system:

    [ AᵀA     I   ] [ d_x ]       [ Aᵀ(Ax − y) + λ          ]
    [ I − G   −cG ] [ d_λ ]  = −  [ x − prox_{φ/c}(x + cλ)  ],

where G = ∂(prox_{φ/c})(x + cλ) is diagonal with

    G_ii = 0 for i ∈ o   := {j : |x_j + cλ_j| ≤ γc, j = 1, 2, ..., n},
    G_ii = 1 for i ∈ o^c := {j : |x_j + cλ_j| > γc, j = 1, 2, ..., n}.

Simplify the system:

    d_x(o)   = −x(o),
    A(:, o^c)ᵀ A(:, o^c) d_x(o^c) = −[ γ sign(x(o^c) + cλ(o^c)) + A(:, o^c)ᵀ(A(:, o^c) x(o^c) − y) ],
    d_λ = −AᵀA(x + d_x) + Aᵀy − λ.
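A sketch of the resulting active-set Newton iteration for the ℓ₁ least-squares problem; the random overdetermined data (so that A(:, o^c)ᵀA(:, o^c) is invertible) and the values of c and γ are illustrative and only loosely follow the test-setup slide.

```python
# Sketch of the reduced Newton step above for
#     min_x (1/2)||A x - y||^2 + gamma ||x||_1.
import numpy as np

rng = np.random.default_rng(0)
m, n = 80, 40
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
gamma = 0.01 * np.max(np.abs(A.T @ y))
c = 0.99 / np.linalg.norm(A.T @ A, 2)

def prox(z):
    # soft-thresholding at gamma*c, as used on this slide
    return np.sign(z) * np.maximum(np.abs(z) - gamma * c, 0.0)

x, lam = np.zeros(n), np.zeros(n)
for k in range(50):
    z = x + c * lam
    o = np.abs(z) <= gamma * c                # inactive set (G_ii = 0)
    oc = ~o                                   # active set   (G_ii = 1)
    d_x = np.zeros(n)
    d_x[o] = -x[o]                            # d_x(o) = -x(o)
    if oc.any():
        Ac = A[:, oc]
        rhs = -(gamma * np.sign(z[oc]) + Ac.T @ (Ac @ x[oc] - y))
        d_x[oc] = np.linalg.solve(Ac.T @ Ac, rhs)
    d_lam = -A.T @ (A @ (x + d_x) - y) - lam  # from the first block row
    x, lam = x + d_x, lam + d_lam
    # stopping test in the spirit of the test-setup slide
    res = np.linalg.norm(A.T @ (A @ x - y) + lam) ** 2 \
        + np.linalg.norm(x - prox(x + c * lam)) ** 2
    if res < 1e-12:
        break
```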

Results

QUESTIONS? COMMENTS? SUGGESTIONS?

References I

[1] Bergounioux, M., Ito, K., Kunisch, K.: Primal-dual strategy for constrained optimal control problems. SIAM J. Control Optim. 37, 1176-1194 (1999)
[2] Bertsekas, D.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, Inc. (1982)
[3] Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, Vol. II. Springer-Verlag, New York (2003)
[4] Fortin, M.: Minimization of some non-differentiable functionals by the augmented Lagrangian method of Hestenes and Powell. Appl. Math. Optim. 2, 236-250 (1975)
[5] Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comp. Math. Appl. 9, 17-40 (1976)
[6] Glowinski, R., Marroco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. ESAIM: Math. Model. Numer. Anal. 9, 41-76 (1975)
[7] Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 303-320 (1969)
[8] Ito, K., Kunisch, K.: An active set strategy based on the augmented Lagrangian formulation for image restoration. ESAIM: Math. Model. Numer. Anal. 33, 1-21 (1999)
[9] Ito, K., Kunisch, K.: Augmented Lagrangian methods for nonsmooth, convex optimization in Hilbert spaces. Nonlin. Anal. Ser. A Theory Methods 41, 591-616 (2000)

References II

[10] Ito, K., Kunisch, K.: Lagrange Multiplier Approach to Variational Problems and Applications. SIAM, Philadelphia, PA (2008)
[11] Jin, B., Takeuchi, T.: Lagrange optimality system for a class of nonsmooth convex optimization. Optimization 65, 1151-1166 (2016)
[12] Patrinos, P., Bemporad, A.: Proximal Newton methods for convex composite optimization. 52nd IEEE Conference on Decision and Control, 2358-2363 (2013)
[13] Patrinos, P., Stella, L., Bemporad, A.: Forward-backward truncated Newton methods for convex composite optimization. Preprint, arXiv:1402.6655v2 (2014)
[14] Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Optimization (Sympos., Univ. Keele, Keele, 1968), pp. 283-298. Academic Press, London (1969)
[15] Rockafellar, R.: A dual approach to solving nonlinear programming problems by unconstrained optimization. Math. Programming 5, 354-373 (1973)
[16] Rockafellar, R.: The multiplier method of Hestenes and Powell applied to convex programming. J. Optim. Theory Appl. 12, 555-562 (1973)
[17] Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
[18] Tomioka, R., Sugiyama, M.: Dual-augmented Lagrangian method for efficient sparse reconstruction. IEEE Signal Processing Letters 16, 1067-1070 (2009)