SOR- and Jacobi-type Iterative Methods for Solving l1-l2 Problems by Way of Fenchel Duality^1

Masao Fukushima^2

July 17, 2010; revised February 4, 2011

^1 This work was supported in part by a Grant-in-Aid for Scientific Research from Japan Society for the Promotion of Science.
^2 Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan. E-mail: fuku@i.kyoto-u.ac.jp

Abstract

We present an SOR-type algorithm and a Jacobi-type algorithm that can effectively be applied to the l1-l2 problem by exploiting its special structure. The algorithms are globally convergent and can be implemented in a particularly simple manner. Relations with coordinate minimization methods are discussed.

Key words. l1-l2 problem, Fenchel dual, SOR method, Jacobi method.

1 Introduction

The purpose of this short article is to draw the reader's attention to the fact that the so-called l1-l2 problem can effectively be solved by iterative methods of SOR- or Jacobi-type by way of Fenchel duality. The l1-l2 problem is to find a vector x ∈ R^n that solves the following nonsmooth convex optimization problem:

    min_{x ∈ R^n}  f(Ax) + g(x),                                          (1)

where f : R^m → R and g : R^n → R are defined by

    f(s) = (1/2) ‖s - b‖_H^2,    g(x) = τ ‖x‖_1,

respectively, and A ∈ R^{m×n}, b ∈ R^m, τ > 0. Moreover, ‖·‖_1 denotes the l1 norm in R^n, and ‖·‖_H is the norm in R^m defined by ‖s‖_H = √(s^T H s) with a symmetric positive definite matrix H ∈ R^{m×m}. When H = I, the norm ‖·‖_H reduces to the l2, or Euclidean, norm ‖·‖_2. This problem has recently drawn much attention in various application areas such as signal and image reconstruction and restoration [14].
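As a small illustration added in this version (it is not part of the original paper, and the function name is arbitrary), the objective in (1) can be evaluated as follows, assuming numpy arrays A, b, H and a scalar tau:

```python
import numpy as np

def l1_l2_objective(x, A, b, H, tau):
    # f(Ax) + g(x) = (1/2)||Ax - b||_H^2 + tau * ||x||_1
    r = A @ x - b
    return 0.5 * r @ (H @ r) + tau * np.abs(x).sum()
```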

The Fenchel dual [11] of problem (1) is stated as

    min_{y ∈ R^m}  f*(-y) + g*(A^T y),                                    (2)

where f* : R^m → R and g* : R^n → R ∪ {+∞} are the conjugate functions of f and g, respectively, and are given by

    f*(y) = (1/2) ‖y + Hb‖_{H^{-1}}^2 - (1/2) ‖Hb‖_{H^{-1}}^2,

    g*(t) = 0 if t ∈ S := {t ∈ R^n : ‖t‖_∞ ≤ τ},  +∞ otherwise.

Therefore, ignoring the constant term, we may rewrite the dual problem (2) as follows:

    min   (1/2) ‖y - Hb‖_{H^{-1}}^2
    s.t.  -τe ≤ A^T y ≤ τe,                                               (3)

where e = (1, 1, ..., 1)^T ∈ R^n. In the following we denote the columns of matrix A by a^i ∈ R^m, i = 1, ..., n, which will be assumed to be nonzero throughout the paper. Then the constraints of problem (3) can be represented as

    -τ ≤ (a^i)^T y ≤ τ,   i = 1, ..., n.                                  (4)

It may be worth mentioning that the above pair of dual problems can be derived in another way. First note that the KKT conditions for problem (3) can be written as

    H^{-1} y - b + Aξ - Aη = 0,                                           (5)
    0 ≤ ξ ⊥ -A^T y + τe ≥ 0,                                              (6)
    0 ≤ η ⊥  A^T y + τe ≥ 0,                                              (7)

where ξ and η denote the Lagrange multipliers associated with the right-hand and the left-hand inequality constraints in (3), respectively, and a ⊥ b means vectors a and b are orthogonal. By (5), we have

    y = -HA(ξ - η) + Hb.                                                  (8)

This, along with (6) and (7), yields

    0 ≤ ξ ⊥  A^T HA(ξ - η) - A^T Hb + τe ≥ 0,                             (9)
    0 ≤ η ⊥ -A^T HA(ξ - η) + A^T Hb + τe ≥ 0.                             (10)

It is then easy to observe that (9) and (10) comprise the KKT conditions for the following convex quadratic program:

    min   (1/2) (A(ξ - η) - b)^T H (A(ξ - η) - b) + τ e^T (ξ + η)
    s.t.  ξ ≥ 0,  η ≥ 0.                                                  (11)

It is not difficult to verify that any optimal solution of this problem satisfies ξ_i η_i = 0, i = 1, ..., n. Therefore, by letting

    x = ξ - η,                                                            (12)

problem (11) can be rewritten as

    min_{x ∈ R^n}  (1/2) (Ax - b)^T H (Ax - b) + τ e^T |x|,

where |x| := (|x_1|, ..., |x_n|)^T. This is precisely problem (1).

Note that the dual problem (3) has a unique optimal solution, whereas the primal problem (1) has an optimal solution but it is not necessarily unique. From (8) and (12), optimal solutions of problems (1) and (3) are related by

    y* = H(b - Ax*).                                                      (13)

A few words about notation: We let e^i denote the ith unit vector in R^n, i.e., the ith column of the n × n identity matrix. The median of three real numbers α, β, γ is denoted by mid{α, β, γ}.
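Relation (13), combined with constraint (4) and the complementarity conditions (6)-(7), gives a simple optimality test for problem (1). The following numpy sketch is an illustration added here, not part of the paper; the function name and tolerance are arbitrary:

```python
import numpy as np

def check_optimality(x, A, b, H, tau, tol=1e-8):
    # Candidate dual point from relation (13).
    y = H @ (b - A @ x)
    g = A.T @ y
    # Constraint (4): |(a^i)^T y| <= tau for every i.
    dual_feasible = np.all(np.abs(g) <= tau + tol)
    # Complementarity from (6)-(7): (a^i)^T y = tau * sign(x_i) whenever x_i != 0.
    supp = np.abs(x) > tol
    complementary = np.all(np.abs(g[supp] - tau * np.sign(x[supp])) <= tol)
    return dual_feasible and complementary
```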

2 Algorithms

Hildreth's algorithm [6] and its successive over-relaxation (SOR) modification [8] are classical iterative methods for solving strictly convex quadratic programming problems with inequality constraints. These methods use the rows of the constraint matrix just one at a time and act upon the problem data directly, without modifying the original matrix in the course of the iterations; hence the name row-action methods [2]. This type of algorithm can be viewed as a particular realization of matrix splitting algorithms for linear complementarity problems, and their convergence properties have been studied extensively under general settings; see, e.g., [4, 7, 9, 10].

The quadratic program (3) has particular constraints that consist of pairs of linear inequalities, which we call interval constraints. The above-mentioned methods [6, 8] may naturally be applied to problems with interval constraints by treating each pair of inequalities as two separate inequalities. However, this is by no means the best strategy. By exploiting the special feature of interval constraints, Herman and Lent [5] developed an extension of Hildreth's algorithm that deals with a pair of inequalities directly; see also [3]. Subsequently, an SOR version of the row-action method for interval constraints was presented in [12]. Moreover, a parallel Jacobi-type modification of the row-action method for interval constraints was proposed in [13].

In the following two subsections we describe the SOR-type algorithm [12] and the Jacobi-type method [13], both of which fully exploit the special feature of the problem with interval constraints.

2.1 SOR-type algorithm

The SOR-type algorithm for solving (3) is stated as follows [12]:

Algorithm 1.

Initialization: Let (y^(0), x^(0)) := (Hb, 0) ∈ R^m × R^n and choose a relaxation parameter ω ∈ (0, 2).

Iteration k: Choose an index i_k ∈ {1, ..., n} according to some rule and let

    α_{i_k} := (a^{i_k})^T H a^{i_k},
    Δ^(k)   := (τ - (a^{i_k})^T y^(k)) / α_{i_k},
    Γ^(k)   := (-τ - (a^{i_k})^T y^(k)) / α_{i_k},
    c^(k)   := mid{ x^(k)_{i_k}, ω Δ^(k), ω Γ^(k) },
    x^(k+1) := x^(k) - c^(k) e^{i_k},
    y^(k+1) := y^(k) + c^(k) H a^{i_k}.

Note that the α_{i_k} are all positive, since H is positive definite and a^{i_k} ≠ 0 by assumption. Moreover, it is easily seen that Γ^(k) < Δ^(k) for all k.
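For concreteness, here is a minimal numpy sketch of Algorithm 1 with a cyclic index rule. It is an illustration added in this version, not code from the paper; the function name, the fixed number of sweeps, and the precomputation of the vectors H a^i are arbitrary choices:

```python
import numpy as np

def sor_type_algorithm(A, b, H, tau, omega=1.0, sweeps=100):
    """Sketch of Algorithm 1 with the cyclic rule i_k = 1, ..., n, 1, ..., n, ..."""
    n = A.shape[1]
    y = H @ b                                    # y^(0) := Hb
    x = np.zeros(n)                              # x^(0) := 0
    HA = H @ A                                   # columns H a^i
    alpha = np.array([A[:, i] @ HA[:, i] for i in range(n)])   # (a^i)^T H a^i
    for _ in range(sweeps):
        for i in range(n):
            s = A[:, i] @ y                      # (a^{i_k})^T y^(k)
            delta = (tau - s) / alpha[i]
            gamma = (-tau - s) / alpha[i]
            c = np.median([x[i], omega * delta, omega * gamma])   # mid{., ., .}
            x[i] -= c                            # x^(k+1) := x^(k) - c^(k) e^{i_k}
            y += c * HA[:, i]                    # y^(k+1) := y^(k) + c^(k) H a^{i_k}
    return x, y
```

With ω = 1, each inner update coincides with the exact coordinate minimization step discussed in Section 3.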

The algorithm generates two sequences, {y^(k)} ⊂ R^m and {x^(k)} ⊂ R^n. It can easily be shown that they are related by the formula

    y^(k) = H(b - A x^(k)),   k = 0, 1, 2, ...                            (14)

Under the assumption that the selection of indices {i_k} follows the almost cyclic rule, i.e., there exists an integer N > 0 such that {1, 2, ..., n} ⊆ {i_k, i_{k+1}, ..., i_{k+N}} for all k, it is shown that the whole sequence {y^(k)} converges to the unique solution y* of problem (3) [12, Theorem 4.3], and the rate of convergence is N-step linear in the sense that, for some constant ρ ∈ (0, 1), the inequality

    ‖y^(k+N) - y*‖_{H^{-1}} ≤ ρ ‖y^(k) - y*‖_{H^{-1}}

holds for all k large enough [12, Theorem 4.4]. In view of the relations (13) and (14), we can deduce that any accumulation point of the sequence {x^(k)} is an optimal solution of problem (1).

2.2 Jacobi-type algorithm

The Jacobi-type algorithm for solving (3) is stated as follows [13]:

Algorithm 2.

Initialization: Let (y^(0), x^(0)) := (Hb, 0) ∈ R^m × R^n and choose a relaxation parameter ω > 0.

Iteration k:

(i) For i = 1, ..., n, let

    α_i     := (a^i)^T H a^i,
    Δ_i^(k) := (τ - (a^i)^T y^(k)) / α_i,
    Γ_i^(k) := (-τ - (a^i)^T y^(k)) / α_i,
    c_i^(k) := mid{ x_i^(k), ω Δ_i^(k), ω Γ_i^(k) }.

(ii) Let

    x_i^(k+1) := x_i^(k) - c_i^(k),   i = 1, ..., n,
    y^(k+1)   := y^(k) + H Σ_{i=1}^n c_i^(k) a^i.

Note that we always have α_i > 0 and Γ_i^(k) < Δ_i^(k), i = 1, ..., n. Moreover, step (i) can be implemented in parallel for i = 1, ..., n.
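A corresponding numpy sketch of Algorithm 2 follows (again an illustration added here, not code from the paper; the function name and iteration cap are arbitrary, and ω should be chosen as described below):

```python
import numpy as np

def jacobi_type_algorithm(A, b, H, tau, omega, iters=500):
    """Sketch of Algorithm 2: all n coordinates are updated simultaneously."""
    n = A.shape[1]
    y = H @ b                                    # y^(0) := Hb
    x = np.zeros(n)                              # x^(0) := 0
    HA = H @ A                                   # columns H a^i
    alpha = np.einsum('ij,ij->j', A, HA)         # alpha_i = (a^i)^T H a^i
    for _ in range(iters):
        s = A.T @ y                              # (a^i)^T y^(k) for all i at once
        delta = (tau - s) / alpha
        gamma = (-tau - s) / alpha
        # c_i^(k) = mid{ x_i^(k), omega*Delta_i^(k), omega*Gamma_i^(k) }; Gamma_i < Delta_i
        c = np.clip(x, omega * gamma, omega * delta)
        x = x - c                                # step (ii)
        y = y + HA @ c                           # y^(k+1) := y^(k) + H * sum_i c_i^(k) a^i
    return x, y
```

Because every coordinate uses the same y^(k), the loop over i has been replaced by vectorized operations; on a parallel machine the n updates in step (i) could be carried out concurrently, as noted above.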

Like Algorithm 1, this algorithm generates two sequences {y^(k)} ⊂ R^m and {x^(k)} ⊂ R^n that satisfy

    y^(k) = H(b - A x^(k)),   k = 0, 1, 2, ...                            (15)

Let

    α_{ij} := (a^i)^T H a^j,   i, j = 1, ..., n (i ≠ j),

    θ_i := (2/α_i) Σ_{j=1, j≠i}^n |α_{ij}|,   i = 1, ..., n,

and define

    ω_i := min{ 1/θ_i, 3/(2 + θ_i) },   i = 1, ..., n,      ω̄ := min_{1≤i≤n} ω_i.

It is shown [13, Theorem 3.8] that if the relaxation parameter ω is chosen to satisfy the condition ω ∈ (0, ω̄), then the sequence {y^(k)} generated by the Jacobi-type algorithm converges to the unique solution y* of problem (3). Moreover, from the relations (13) and (15), any accumulation point of the sequence {x^(k)} is an optimal solution of problem (1).

3 Discussion

It is well known that the so-called row-action methods are dual to the coordinate minimization methods, which search for a next iterate along some coordinate selected, possibly in an almost cyclic manner. In fact, for a given x^(k), the exact minimizer of the objective function f(Ax) + g(x) of problem (1) along the ith coordinate can explicitly be computed as x^(k) - c^(k) e^i, where

    c^(k) = mid{ x_i^(k),
                 (τ - (a^i)^T H(b - A x^(k))) / ((a^i)^T H a^i),
                 (-τ - (a^i)^T H(b - A x^(k))) / ((a^i)^T H a^i) }.

In view of the relation (14), we find that the (exact) coordinate minimization method for problem (1) is equivalent to Algorithm 1 with relaxation parameter ω = 1.
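In code, one such exact coordinate minimization step may be sketched as follows (an illustration added here, not from the paper; maintaining y^(k) = H(b - A x^(k)) as in Algorithm 1, it reproduces that algorithm's update with ω = 1):

```python
import numpy as np

def exact_coordinate_step(x, i, A, b, H, tau):
    """Exact minimization of f(Ax) + g(x) along the ith coordinate."""
    ai = A[:, i]
    alpha = ai @ (H @ ai)                        # (a^i)^T H a^i
    s = ai @ (H @ (b - A @ x))                   # (a^i)^T H (b - A x^(k))
    c = np.median([x[i], (tau - s) / alpha, (-tau - s) / alpha])
    x_new = x.copy()
    x_new[i] -= c                                # x^(k) - c^(k) e^i
    return x_new
```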

Coordinate minimization methods for nonsmooth optimization problems have been studied by a number of authors. Auslender [1, Chapter VI, Section 1] established convergence of the method with relaxation parameter ω ∈ (0, 2) under the assumption that the objective function is strongly convex. Note, however, that the latter assumption is not satisfied by problem (1). Tseng [15] studied a (block) coordinate descent method for solving a nonsmooth optimization problem of the form

    min_{x ∈ R^n}  f(x) + Σ_{j=1}^J f_j(x_j),

where the x_j ∈ R^{n_j} are sub-vectors that compose the vector x ∈ R^n, i.e., n = n_1 + ... + n_J, f is a smooth function, and the f_j are nonsmooth convex functions. Tseng and Yun [17] considered the problem

    min_{x ∈ R^n}  f(x) + P(x),

where f is a smooth function and P is a nonsmooth convex function, and proposed (block) coordinate gradient descent methods with a stepsize rule based on a descent condition. The above two problems contain problem (1) as a special case. In [15, 17], it is shown, without assuming the function f to be convex, that the (block) coordinate gradient descent methods generate a sequence {x^(k)} any accumulation point of which is a stationary point of the corresponding minimization problem. For related results, see [16, 18].

4 Conclusion

We have presented an SOR-type algorithm and a Jacobi-type algorithm for solving the l1-l2 problem. These algorithms exploit the special structure of the problem and can be implemented in a very simple manner. The algorithms generate two sequences; one converges to the unique optimal solution of the dual problem, while any accumulation point of the other sequence is an optimal solution of the original l1-l2 problem. Although no numerical results are given here, the behavior of the algorithms can be estimated from the extensive computational experience reported in [12, 13] for convex quadratic programs with interval constraints, since the algorithms in [12, 13] are essentially the same as those considered in this paper. In particular, the numerical results in [12, 13] show that the algorithms typically exhibit a linear rate of convergence.

References

[1] A. Auslender, Optimisation: Méthodes Numériques, Masson, Paris, 1976.

[2] Y. Censor, Row-action methods for huge and sparse systems and their applications, SIAM Review, 23 (1981), pp. 444-466.

[3] Y. Censor and A. Lent, An iterative row-action method for interval convex programming, Journal of Optimization Theory and Applications, 34 (1981), pp. 321-353.

[4] R.W. Cottle, J.-S. Pang and R.E. Stone, The Linear Complementarity Problem, Academic Press, San Diego, CA, 1992.

[5] G.T. Herman and A. Lent, A family of iterative quadratic optimization algorithms for pairs of inequalities with application in diagnostic radiology, Mathematical Programming Study, 9 (1979), pp. 15-29.

[6] C. Hildreth, A quadratic programming procedure, Naval Research Logistics Quarterly, 4 (1957), pp. 79-85.

[7] A.N. Iusem, On the convergence of iterative methods for symmetric linear complementarity problems, Mathematical Programming, 59 (1993), pp. 33-48.

[8] A. Lent and Y. Censor, Extensions of Hildreth's row-action method for quadratic programming, SIAM Journal on Control and Optimization, 18 (1980), pp. 444-454.

[9] Z.-Q. Luo and P. Tseng, Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem, SIAM Journal on Optimization, 2 (1992), pp. 43-54.

[10] O.L. Mangasarian, Solution of symmetric linear complementarity problems by iterative methods, Journal of Optimization Theory and Applications, 22 (1977), pp. 465-485.

[11] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.

[12] Y. Shimazu, M. Fukushima and T. Ibaraki, A successive over-relaxation method for quadratic programming problems with interval constraints, Journal of the Operations Research Society of Japan, 36 (1993), pp. 73-89.

[13] T. Sugimoto, M. Fukushima and T. Ibaraki, A parallel relaxation method for quadratic programming problems with interval constraints, Journal of Computational and Applied Mathematics, 60 (1995), pp. 219-236.

[14] J.A. Tropp and S.J. Wright, Computational methods for sparse solution of linear inverse problems, Proceedings of the IEEE, Special Issue on Applications of Sparse Representation and Compressive Sensing, 98 (2010), pp. 948-958.

[15] P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, Journal of Optimization Theory and Applications, 109 (2001), pp. 475-494.

[16] P. Tseng, Approximation accuracy, gradient methods, and error bound for structured convex optimization, Mathematical Programming, 125 (2010), pp. 263-295.

[17] P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Mathematical Programming, 117 (2009), pp. 387-423.

[18] S. Yun and K.-C. Toh, A coordinate gradient descent method for l1-regularized convex minimization, Singapore-MIT Alliance, Singapore, April 2008 (revised January 2009); Computational Optimization and Applications, to appear.