A MAJORIZED ADMM WITH INDEFINITE PROXIMAL TERMS FOR LINEARLY CONSTRAINED CONVEX COMPOSITE OPTIMIZATION

MIN LI, DEFENG SUN, AND KIM-CHUAN TOH

Abstract. This paper presents a majorized alternating direction method of multipliers (ADMM) with indefinite proximal terms for solving linearly constrained 2-block convex composite optimization problems, with each block in the objective being the sum of a non-smooth convex function (p(x) or q(y)) and a smooth convex function (f(x) or g(y)), i.e., min_{x∈X, y∈Y} {p(x) + f(x) + q(y) + g(y) | A*x + B*y = c}. By choosing the indefinite proximal terms properly, we establish the global convergence, and the iteration-complexity in the non-ergodic sense, of the proposed method for the step-length τ ∈ (0, (1+√5)/2). The computational benefit of using indefinite proximal terms within the ADMM framework, instead of the current requirement of positive semidefinite ones, is also demonstrated numerically. This opens up a new way to improve the practical performance of the ADMM and related methods.

Key words. alternating direction method of multipliers, convex composite optimization, indefinite proximal terms, majorization, iteration-complexity

AMS subject classifications. 90C25, 90C33, 65K05

1. Introduction. We consider the following 2-block convex composite optimization problem

(1)    min { p(x) + f(x) + q(y) + g(y) | A*x + B*y = c, x ∈ X, y ∈ Y },

where X, Y and Z are three real finite dimensional Euclidean spaces, each equipped with an inner product ⟨·,·⟩ and its induced norm ‖·‖; p : X → (−∞, +∞] and q : Y → (−∞, +∞] are two closed proper convex (not necessarily smooth) functions; f : X → (−∞, +∞) and g : Y → (−∞, +∞) are two convex functions with Lipschitz continuous gradients on X and Y, respectively; A* : X → Z and B* : Y → Z are the adjoints of the linear operators A : Z → X and B : Z → Y, respectively; and c ∈ Z. The solution set of (1) is assumed to be nonempty throughout this paper.

Let σ ∈ (0, +∞) be a given parameter. Define the augmented Lagrangian function as

    L_σ(x, y; z) := p(x) + f(x) + q(y) + g(y) + ⟨z, A*x + B*y − c⟩ + (σ/2)‖A*x + B*y − c‖²

for any (x, y, z) ∈ X × Y × Z. One may attempt to solve (1) by using the classical augmented Lagrangian method (ALM), which consists of the following iterations:

(2)    (x^{k+1}, y^{k+1}) := argmin_{x∈X, y∈Y} L_σ(x, y; z^k),
       z^{k+1} := z^k + τσ(A*x^{k+1} + B*y^{k+1} − c),

School of Economics and Management, Southeast University, Nanjing, 210096, China (limin@seu.edu.cn). The research of this author was supported in part by the National Natural Science Foundation of China (Grant Nos. 11001053 and 71390335), the Program for New Century Excellent Talents in University (Grant No. NCET-12-0111) and the Qing Lan Project.

Department of Mathematics and Risk Management Institute, National University of Singapore, 10 Lower Kent Ridge Road, Singapore (matsundf@nus.edu.sg). The research of this author was supported in part by the Academic Research Fund (Grant No. R-146-000-207-112).

Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore (mattohkc@nus.edu.sg). The research of this author was supported in part by the Ministry of Education, Singapore, Academic Research Fund (Grant No. R-146-000-194-112).

where τ ∈ (0, 2) guarantees the convergence. Due to the non-separability of the quadratic penalty term in L_σ, it is generally a challenging task to solve the joint minimization problem (2) exactly, or approximately with a high accuracy which may not even be necessary at the early stage of the ALM. To overcome this difficulty, one may consider the following popular 2-block alternating direction method of multipliers (ADMM) to solve (1):

(3)    x^{k+1} := argmin_{x∈X} L_σ(x, y^k; z^k),
       y^{k+1} := argmin_{y∈Y} L_σ(x^{k+1}, y; z^k),
       z^{k+1} := z^k + τσ(A*x^{k+1} + B*y^{k+1} − c),

where τ ∈ (0, (1+√5)/2). The convergence of the 2-block ADMM has long been established under various conditions, and the classical literature includes [8, 4, 7, 2, 3, 9, 8]. For a recent survey, see [0].

By noting the fact that the subproblems in (3) may be difficult to solve and that in many applications f or g is a convex quadratic function, Fazel et al. [] advocated the use of the following semi-proximal ADMM scheme:

(4)    x^{k+1} := argmin_{x∈X} L_σ(x, y^k; z^k) + (1/2)‖x − x^k‖²_S,
       y^{k+1} := argmin_{y∈Y} L_σ(x^{k+1}, y; z^k) + (1/2)‖y − y^k‖²_T,
       z^{k+1} := z^k + τσ(A*x^{k+1} + B*y^{k+1} − c),

where τ ∈ (0, (1+√5)/2), and S ⪰ 0 and T ⪰ 0 are two self-adjoint and positive semidefinite (not necessarily positive definite) linear operators. When S and T are two self-adjoint and positive definite linear operators, the above semi-proximal ADMM with τ = 1 reduces to the proximal ADMM essentially proposed by Eckstein [8]. He et al. [20] extended the work of Eckstein [8] to monotone variational inequalities to allow σ, S, and T to vary at different iterations. We refer the readers to [] as well as [7] for a brief history of the development of the semi-proximal ADMM scheme (4).
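To make the scheme (4) concrete, the following MATLAB sketch (ours, not from the paper; all identifiers are illustrative) carries out one iteration of (4) in the special case where p ≡ 0, q ≡ 0 and f, g are convex quadratics, so that both subproblems reduce to linear systems:

function [x, y, z] = spadmm_step(x, y, z, Qf, bf, Qg, bg, At, Bt, c, S, T, sigma, tau)
% One iteration of the semi-proximal ADMM (4), assuming
%   f(x) = 0.5*x'*Qf*x - bf'*x,  g(y) = 0.5*y'*Qg*y - bg'*y,  p = q = 0,
% with the constraint At*x + Bt*y = c (the matrices At, Bt play the roles
% of A*, B*).  The inputs x, y, z are the current iterates x^k, y^k, z^k.
% For well-posed subproblems one needs Qf + sigma*(At'*At) + S > 0 and
% Qg + sigma*(Bt'*Bt) + T > 0, matching the requirement on P and Q below.
  x = (Qf + sigma*(At'*At) + S) \ (bf - At'*(z + sigma*(Bt*y - c)) + S*x);
  y = (Qg + sigma*(Bt'*Bt) + T) \ (bg - Bt'*(z + sigma*(At*x - c)) + T*y);
  z = z + tau*sigma*(At*x + Bt*y - c);   % multiplier update with step-length tau
end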

The successful applications of the 2-block ADMM in solving various problems to acceptable levels of moderate accuracy have inevitably inspired many researchers' interest in extending the scheme to the general m-block (m ≥ 3) case. However, it has been shown very recently by Chen et al. [] via simple counterexamples that the direct extension of the ADMM to the simplest 3-block case can be divergent even if the step-length τ is chosen to be as small as 10⁻⁸. This seems to suggest that one has to give up the direct extension of the m-block (m ≥ 3) ADMM, unless one is willing to take a sufficiently small step-length τ, as was shown by Hong and Luo in [22], or to take a small penalty parameter σ if at least m − 2 blocks in the objective are strongly convex [9, 2, 28, 27, 23]. On the other hand, despite the potential divergence, the directly extended m-block ADMM with τ = 1 and an appropriate choice of σ often works very well in practice. Recently, there has been exciting progress in designing convergent and efficient ADMM-type methods for solving multi-block linear and convex quadratic semidefinite programming problems [34, 25]. The convergence proof of the methods presented in [34] and [25] proceeds by establishing their equivalence to particular cases of the general 2-block semi-proximal ADMM considered in []. It is this important fact that inspires us to extend the 2-block semi-proximal ADMM in [] to a majorized ADMM with indefinite proximal terms, which we name the Majorized ipADMM in this paper.

Our new algorithm has two important aspects. Firstly, we introduce a majorization technique to deal with the case where f and g in (1) may not be quadratic or linear functions. The purpose of the majorization is to make the corresponding subproblems in (4) more amenable to efficient computations. We note that a similar majorization technique has also been used by Wang and Banerjee [36] under the more general setting of Bregman distance functions. The drawback of the Bregman distance function based ADMM discussed in [36] is that the parameter τ must be small to ensure global convergence. For example, if we choose the Euclidean distance as the Bregman divergence, then the corresponding parameter τ should be smaller than 1. By focusing on the Euclidean divergence instead of the more general Bregman divergence, we allow τ to stay in the larger interval (0, (1+√5)/2). Related works using majorization techniques can also be found in [5, 26].

Secondly, and more importantly, we allow the added proximal terms to be indefinite for better practical performance. The introduction of indefinite proximal terms, instead of the commonly used positive semidefinite or positive definite ones, is motivated by numerical evidence showing that the former can outperform the latter in the majorized penalty approach for solving rank constrained matrix optimization problems in [6], and in solving linear semidefinite programming problems with a large number of inequality constraints in [34]. Here, we conduct a rigorous study of the conditions under which indefinite proximal terms are allowed within the 2-block ADMM while the convergence of the algorithm is still guaranteed. We have thus provided the necessary theoretical support for the numerical observations just mentioned. Interestingly, Deng and Yin [7] mentioned that the operator T in the ADMM scheme (4) may be indefinite if τ ∈ (0, 1), though no further developments were given. As far as we are aware, this is the first paper proving that indefinite proximal terms can be employed within the ADMM framework with a convergence guarantee, while not making restrictive assumptions on the step-length parameter τ or the penalty parameter σ.

Besides proving the global convergence of our proposed majorized indefinite-proximal ADMM, we also establish its non-ergodic iteration-complexity; the ergodic iteration-complexity is omitted for the sake of saving space. The study of the ergodic iteration-complexity of the classical ADMM is inspired by Nemirovski [3], who proposed a prox-method with O(1/k) iteration-complexity for variational inequalities. Monteiro and Svaiter [29] analyzed the iteration-complexity of a hybrid proximal extragradient (HPE) method. They also considered the ergodic iteration-complexity of block-decomposition algorithms and the ADMM in [30]. He and Yuan [2] provided a simple O(1/k) ergodic iteration-complexity result, defined on compact sets, for a special semi-proximal ADMM scheme where the x-part uses a semi-proximal term while the y-part does not. Tao and Yuan [35] proved the O(1/k) ergodic iteration-complexity, in the sense of [2], of the ADMM with a logarithmic-quadratic proximal regularization, even for τ ∈ (0, (1+√5)/2). Ouyang et al. [32] provided an ergodic iteration-complexity for an accelerated linearized ADMM.
Wang and Banerjee [36] generalized the ADMM to a Bregman function based ADMM, which allows the choice of different Bregman divergences and still has the O(1/k) iteration-complexity.

The remaining parts of this paper are organized as follows. In section 2, we summarize some useful results for further analysis. Then, we present our majorized indefinite-proximal ADMM in section 3, followed by some basic properties of the generated sequence. In section 4, we present the global convergence and the choices

of proximal terms. The analysis of the non-ergodic iteration-complexity is provided in section 5. In section 6, we provide some illustrative examples to show the potential numerical efficiency that one can gain from the new scheme when using an indefinite proximal term versus the standard choice of a positive semidefinite proximal term.

Notation. The effective domain of a function h : X → (−∞, +∞] is defined as dom h := {x ∈ X | h(x) < +∞}. The set of all relative interior points of a convex set C is denoted by ri(C). For convenience, we use ‖x‖²_S to denote ⟨x, Sx⟩ even if S is only a self-adjoint linear operator which may be indefinite. If M : X → X is a self-adjoint and positive semidefinite linear operator, we use M^{1/2} to denote the unique self-adjoint and positive semidefinite square root of M.

2. Preliminaries. In this section, we first introduce some notation to be used in our analysis and then summarize some useful preliminaries known in the literature. Throughout this paper, we assume that the following assumption holds.

Assumption 1. Both f and g are smooth convex functions with Lipschitz continuous gradients.

Under Assumption 1, we know that there exist two self-adjoint and positive semidefinite linear operators Σ_f and Σ_g such that for any x, x′ ∈ X and any y, y′ ∈ Y,

(5)    f(x) ≥ f(x′) + ⟨x − x′, ∇f(x′)⟩ + (1/2)‖x − x′‖²_{Σ_f},
(6)    g(y) ≥ g(y′) + ⟨y − y′, ∇g(y′)⟩ + (1/2)‖y − y′‖²_{Σ_g};

moreover, there exist self-adjoint and positive semidefinite linear operators Σ̂_f ⪰ Σ_f and Σ̂_g ⪰ Σ_g such that for any x, x′ ∈ X and any y, y′ ∈ Y,

(7)    f(x) ≤ f̂(x; x′) := f(x′) + ⟨x − x′, ∇f(x′)⟩ + (1/2)‖x − x′‖²_{Σ̂_f},
(8)    g(y) ≤ ĝ(y; y′) := g(y′) + ⟨y − y′, ∇g(y′)⟩ + (1/2)‖y − y′‖²_{Σ̂_g}.

The two functions f̂ and ĝ are called the majorized convex functions of f and g, respectively.

For any given y ∈ Y, let ∂²g(y) be Clarke's generalized Jacobian of ∇g at y, i.e.,

    ∂²g(y) = conv{ lim_{y^k → y} ∇²g(y^k) : ∇²g(y^k) exists },

where conv denotes the convex hull. Then for any given y ∈ Y, each W ∈ ∂²g(y) is a self-adjoint and positive semidefinite linear operator satisfying

(9)    Σ̂_g ⪰ W ⪰ Σ_g ⪰ 0.
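As a simple illustration of (5)-(8) (ours, not from the paper): if f is the convex quadratic f(x) = (1/2)⟨x, Qx⟩ − ⟨b, x⟩ with Q ⪰ 0, then ∇f(x′) = Qx′ − b and

    f(x) = f(x′) + ⟨x − x′, ∇f(x′)⟩ + (1/2)‖x − x′‖²_Q,

so (5) and (7) hold with equality when Σ_f = Σ̂_f = Q. For a general f whose gradient is Lipschitz continuous with modulus L, one may always take Σ_f = 0 and Σ̂_f = L I, although sharper choices typically give better practical performance.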

For further discussions, we need the following constraint qualification.

Assumption 2. There exists (x⁰, y⁰) ∈ ri(dom p × dom q) ∩ P, where P := {(x, y) ∈ X × Y | A*x + B*y = c}.

Under Assumption 2, it follows from [33, Corollary 28.2.2] and [33, Corollary 28.3.1] that (x̄, ȳ) ∈ dom p × dom q is an optimal solution to problem (1) if and only if there exists a Lagrange multiplier z̄ ∈ Z such that (x̄, ȳ, z̄) satisfies the following Karush-Kuhn-Tucker (KKT) system:

(10)    0 ∈ ∂p(x̄) + ∇f(x̄) + Az̄,    0 ∈ ∂q(ȳ) + ∇g(ȳ) + Bz̄,    c − A*x̄ − B*ȳ = 0,

where ∂p and ∂q are the subdifferential mappings of p and q, respectively. Moreover, any z̄ ∈ Z satisfying (10) is an optimal solution to the dual of problem (1). Therefore, we call (x̂, ŷ, ẑ) ∈ dom p × dom q × Z an ε-approximate KKT point of (1) if it satisfies

(11)    d²(0, ∂p(x̂) + ∇f(x̂) + Aẑ) + d²(0, ∂q(ŷ) + ∇g(ŷ) + Bẑ) + ‖A*x̂ + B*ŷ − c‖² ≤ ε,

where d(w, S) denotes the Euclidean distance from a given point w to a set S. By the assumption that p and q are convex functions, (10) is equivalent to finding a vector (x̄, ȳ, z̄) ∈ W := X × Y × Z such that for any (x, y) ∈ X × Y, we have

    p(x) − p(x̄) + ⟨x − x̄, ∇f(x̄) + Az̄⟩ ≥ 0,    q(y) − q(ȳ) + ⟨y − ȳ, ∇g(ȳ) + Bz̄⟩ ≥ 0,    c − A*x̄ − B*ȳ = 0,

or, equivalently,

(12)    p(x) + f(x) − p(x̄) − f(x̄) + ⟨x − x̄, Az̄⟩ ≥ 0,    q(y) + g(y) − q(ȳ) − g(ȳ) + ⟨y − ȳ, Bz̄⟩ ≥ 0,    c − A*x̄ − B*ȳ = 0,

which is obtained by using the assumption that f and g are smooth convex functions. We denote by W̄ the solution set of (12), which is nonempty under Assumption 2 and the standing assumption that the solution set of problem (1) is nonempty.

The following lemma, motivated by [6, Lemma 1.2], is convenient for discussing the non-ergodic iteration-complexity.

Lemma 3. If a sequence {a_i} ⊆ R obeys: (i) a_i ≥ 0; (ii) Σ_{i=1}^{∞} a_i < +∞, then we have min_{1≤i≤k} {a_i} = o(1/k).

Proof. Since min_{1≤i≤2k} {a_i} ≤ a_j for any k + 1 ≤ j ≤ 2k, we get

    0 ≤ k · min_{1≤i≤2k} {a_i} ≤ Σ_{i=k+1}^{2k} a_i → 0  as k → ∞.

Therefore, we get min_{1≤i≤k} {a_i} = o(1/k). The proof is completed.

3. A majorized ADMM with indefinite proximal terms. Let z ∈ Z be the Lagrange multiplier associated with the linear equality constraint in (1) and let the Lagrangian function of (1) be

(13)    L(x, y; z) := p(x) + f(x) + q(y) + g(y) + ⟨z, A*x + B*y − c⟩,

defined on X × Y × Z. Similarly, for given (x′, y′) ∈ X × Y, σ ∈ (0, +∞) and any (x, y, z) ∈ X × Y × Z, define the majorized augmented Lagrangian function as follows:

    L̂_σ(x, y; (z, x′, y′)) := p(x) + f̂(x; x′) + q(y) + ĝ(y; y′) + ⟨z, A*x + B*y − c⟩ + (σ/2)‖A*x + B*y − c‖²,

where the two majorized convex functions f̂ and ĝ are defined by (7) and (8), respectively. Our promised majorized ADMM with indefinite proximal terms for solving problem (1) can then be described as follows.

Majorized ipADMM: A majorized ADMM with indefinite proximal terms for solving problem (1).

Let σ ∈ (0, +∞) and τ ∈ (0, +∞) be given parameters. Let S and T be given self-adjoint, possibly indefinite, linear operators defined on X and Y, respectively, such that

    P := Σ̂_f + S + σAA* ≻ 0  and  Q := Σ̂_g + T + σBB* ≻ 0.

Choose (x⁰, y⁰, z⁰) ∈ dom p × dom q × Z. Set k = 0 and denote r̃⁰ := A*x⁰ + B*y⁰ − c + σ⁻¹z⁰.

Step 1. Compute

(14)    x^{k+1} := argmin_{x∈X} L̂_σ(x, y^k; (z^k, x^k, y^k)) + (1/2)‖x − x^k‖²_S
                 = argmin_{x∈X} p(x) + (1/2)⟨x, Px⟩ + ⟨∇f(x^k) + σA r̃^k − Px^k, x⟩,
        y^{k+1} := argmin_{y∈Y} L̂_σ(x^{k+1}, y; (z^k, x^k, y^k)) + (1/2)‖y − y^k‖²_T
                 = argmin_{y∈Y} q(y) + (1/2)⟨y, Qy⟩ + ⟨∇g(y^k) + σB(r̃^k + A*(x^{k+1} − x^k)) − Qy^k, y⟩,
        z^{k+1} := z^k + τσ(A*x^{k+1} + B*y^{k+1} − c).

Step 2. If a termination criterion is not met, denote r̃^{k+1} := A*x^{k+1} + B*y^{k+1} − c + σ⁻¹z^{k+1}. Set k := k + 1 and go to Step 1.

Remark 4. In the above Majorized ipADMM for solving problem (1), the presence of the two self-adjoint operators S and T first helps to guarantee the existence of solutions for the subproblems in (14). Secondly, they play an important role in ensuring the boundedness of the two generated sequences {x^{k+1}} and {y^{k+1}}. Thirdly, as demonstrated in [25], the introduction of S and T is the key for dealing with an arbitrary number of additional convex quadratic and linear functions. Hence, these two proximal terms are preferred, although the choices of S and T are very much problem dependent. The general principle is that both S and T should be chosen such that x^{k+1} and y^{k+1} take larger step-lengths while they remain relatively easy to compute. From a numerical point of view, it is therefore advantageous to pick an indefinite S or T whenever possible. The issue of how to choose S and T will be discussed in the later sections.
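As an illustration of how (14) is used in practice (our sketch, not part of the paper): when S is chosen so that P reduces to ρI for some ρ > 0 (as in the choices of section 4.2 below) and p(x) = ϱ‖x‖₁, the x-subproblem in (14) is an exact soft-thresholding step. In MATLAB, with gradf = ∇f(x^k) and Ark = A r̃^k assumed to be available:

% x-update of (14) when P = rho*I and p = varrho*||.||_1, i.e.
%   argmin_x varrho*||x||_1 + (rho/2)*||x - u||^2,
% where u collects the linear term of the subproblem.
u    = xk - (gradf + sigma*Ark)/rho;
xkp1 = sign(u) .* max(abs(u) - varrho/rho, 0);   % componentwise soft-thresholding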

For notational convenience, for given α ∈ (0, 1] and τ ∈ (0, +∞), denote

(15)    H_f := (1/2)Σ_f + S + ((1−α)/2)σAA*,    M_g := (1/2)Σ_g + T + min(τ, 1+τ−τ²)ασBB*;

for (x, y, z) ∈ W, (x̄, ȳ, z̄) ∈ W̄ and k = 0, 1, ..., define

(16)    φ_k(x, y, z) := (1/(τσ))‖z^k − z‖² + ‖x^k − x‖²_{Σ̂_f+S} + ‖y^k − y‖²_{Σ̂_g+T+σBB*},
        φ_k := φ_k(x̄, ȳ, z̄) = (1/(τσ))‖z^k − z̄‖² + ‖x^k − x̄‖²_{Σ̂_f+S} + ‖y^k − ȳ‖²_{Σ̂_g+T+σBB*},

(17)    ξ_{k+1} := ‖y^{k+1} − y^k‖²_{Σ̂_g+T},
        s_{k+1} := ‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S} + ‖y^{k+1} − y^k‖²_{(1/2)Σ_g+T},
        t_{k+1} := ‖x^{k+1} − x^k‖²_{H_f} + ‖y^{k+1} − y^k‖²_{M_g},

and

(18)    r^k := A*x^k + B*y^k − c,    z̃^{k+1} := z^k + σ r^{k+1}.

We also recall the following elementary identities, which will be used later.

Lemma 5. (a) For any vectors u₁, u₂, v₁, v₂ in the same Euclidean vector space X, we have the identity

(19)    ⟨u₁ − u₂, v₁ − v₂⟩ = (1/2)(‖v₂ − u₁‖² − ‖v₁ − u₁‖²) + (1/2)(‖v₁ − u₂‖² − ‖v₂ − u₂‖²).

(b) For any vectors u, v in the same Euclidean vector space X and any self-adjoint linear operator G : X → X, we have the identity

(20)    ⟨u, Gv⟩ = (1/2)(‖u‖²_G + ‖v‖²_G − ‖u − v‖²_G) = (1/2)(‖u + v‖²_G − ‖u‖²_G − ‖v‖²_G).

To prove the global convergence of the Majorized ipADMM, we first present some useful lemmas.

Lemma 6. Suppose that Assumption 1 holds. Let {z^{k+1}} be generated by (14) and let {z̃^{k+1}} be defined by (18). Then for any z ∈ Z and k ≥ 0, we have

(21)    (1/σ)⟨z − z̃^{k+1}, z̃^{k+1} − z^k⟩ + (1/(2σ))‖z^k − z̃^{k+1}‖²
        = (1/(2τσ))(‖z^k − z‖² − ‖z^{k+1} − z‖²) + ((τ−1)/(2σ))‖z^k − z̃^{k+1}‖².

Proof. From (14) and (18), we get

(22)    z^{k+1} − z^k = τσ r^{k+1}  and  z̃^{k+1} − z^k = σ r^{k+1}.

It follows from (22) that

(23)    z̃^{k+1} − z^k = (1/τ)(z^{k+1} − z^k)  and  z^{k+1} − z̃^{k+1} = (1 − 1/τ)(z^{k+1} − z^k).

By using the first equation in (23), we obtain

(24)    (1/σ)⟨z − z̃^{k+1}, z̃^{k+1} − z^k⟩ + (1/(2σ))‖z^k − z̃^{k+1}‖²
        = (1/(τσ))⟨z − z̃^{k+1}, z^{k+1} − z^k⟩ + (1/(2σ))‖z^k − z̃^{k+1}‖².

Now, by taking u₁ = z, u₂ = z̃^{k+1}, v₁ = z^{k+1} and v₂ = z^k, and applying the identity (19) to the first term on the right-hand side of (24), we obtain

(25)    (1/σ)⟨z − z̃^{k+1}, z̃^{k+1} − z^k⟩ + (1/(2σ))‖z^k − z̃^{k+1}‖²
        = (1/(2τσ))(‖z^k − z‖² − ‖z^{k+1} − z‖² + ‖z^{k+1} − z̃^{k+1}‖² − ‖z^k − z̃^{k+1}‖²) + (1/(2σ))‖z^k − z̃^{k+1}‖².

By using the second equation in (23), we have

    ‖z^{k+1} − z̃^{k+1}‖² − ‖z^k − z̃^{k+1}‖² + τ‖z^k − z̃^{k+1}‖²
    = (τ−1)²‖z^k − z̃^{k+1}‖² − ‖z^k − z̃^{k+1}‖² + τ‖z^k − z̃^{k+1}‖² = τ(τ−1)‖z^k − z̃^{k+1}‖²,

which, together with (25), proves the assertion (21).

Lemma 7. Suppose that Assumption 1 holds. Assume that (1/2)Σ̂_g + T ⪰ 0. Let {(x^k, y^k, z^k)} be generated by the Majorized ipADMM, and for each k ≥ 1 let ξ_k and r^k be defined as in (17) and (18), respectively. Then for any k ≥ 1, we have

(26)    (1−τ)σ‖r^{k+1}‖² + σ‖A*x^{k+1} + B*y^k − c‖²
        ≥ max(1−τ, 1−τ⁻¹)σ(‖r^{k+1}‖² − ‖r^k‖²) + (ξ_{k+1} − ξ_k)
          + min(τ, 1+τ−τ²)σ(τ⁻¹‖r^{k+1}‖² + ‖B*(y^{k+1} − y^k)‖²).

Proof. Note that

(27)    (1−τ)σ‖r^{k+1}‖² + σ‖A*x^{k+1} + B*y^k − c‖²
        = (1−τ)σ‖r^{k+1}‖² + σ‖A*x^{k+1} + B*y^{k+1} − c + B*(y^k − y^{k+1})‖²
        = (2−τ)σ‖r^{k+1}‖² + σ‖B*(y^{k+1} − y^k)‖² + 2σ⟨B*(y^k − y^{k+1}), r^{k+1}⟩.

First, we shall estimate the term 2σ⟨B*(y^k − y^{k+1}), r^{k+1}⟩ in (27). From the first-order optimality conditions of (14) and the notation z̃^{k+1} defined in (18), we have

(28)    −∇g(y^k) − Bz̃^{k+1} − (Σ̂_g + T)(y^{k+1} − y^k) ∈ ∂q(y^{k+1}),
        −∇g(y^{k−1}) − Bz̃^k − (Σ̂_g + T)(y^k − y^{k−1}) ∈ ∂q(y^k).

From Clarke's Mean Value Theorem [3, Proposition 2.6.5], we know that

    ∇g(y^k) − ∇g(y^{k−1}) ∈ conv{ ∂²g([y^{k−1}, y^k])(y^k − y^{k−1}) }.

Thus, there exists a self-adjoint and positive semidefinite linear operator W^k ∈ conv{∂²g([y^{k−1}, y^k])} such that

(29)    ∇g(y^k) − ∇g(y^{k−1}) = W^k(y^k − y^{k−1}).

From (28) and the maximal monotonicity of ∂q, it follows that

    ⟨y^k − y^{k+1}, [−∇g(y^{k−1}) − Bz̃^k − (Σ̂_g + T)(y^k − y^{k−1})] − [−∇g(y^k) − Bz̃^{k+1} − (Σ̂_g + T)(y^{k+1} − y^k)]⟩ ≥ 0,

which, together with (29), gives rise to

(30)    2⟨y^k − y^{k+1}, Bz̃^{k+1} − Bz̃^k⟩
        ≥ 2⟨∇g(y^k) − ∇g(y^{k−1}), y^{k+1} − y^k⟩ − 2⟨(Σ̂_g + T)(y^k − y^{k−1}), y^{k+1} − y^k⟩ + 2‖y^{k+1} − y^k‖²_{Σ̂_g+T}
        = 2⟨W^k(y^k − y^{k−1}), y^{k+1} − y^k⟩ − 2⟨(Σ̂_g + T)(y^k − y^{k−1}), y^{k+1} − y^k⟩ + 2‖y^{k+1} − y^k‖²_{Σ̂_g+T}
        = −2⟨(Σ̂_g − W^k + T)(y^k − y^{k−1}), y^{k+1} − y^k⟩ + 2‖y^{k+1} − y^k‖²_{Σ̂_g+T}.

By using the first elementary identity in (20) and W^k ⪰ 0, we have

    −2⟨(Σ̂_g − W^k + T)(y^k − y^{k−1}), y^{k+1} − y^k⟩
    = ‖y^{k+1} − y^k‖²_{Σ̂_g−W^k+T} + ‖y^k − y^{k−1}‖²_{Σ̂_g−W^k+T} − ‖y^{k+1} − y^{k−1}‖²_{Σ̂_g−W^k+T}
    ≥ ‖y^{k+1} − y^k‖²_{Σ̂_g−W^k+T} + ‖y^k − y^{k−1}‖²_{Σ̂_g−W^k+T} − ‖y^{k+1} − y^{k−1}‖²_{Σ̂_g−(1/2)W^k+T}.

From (9), we know that Σ̂_g − (1/2)W^k + T = (1/2)Σ̂_g + (1/2)(Σ̂_g − W^k) + T ⪰ (1/2)Σ̂_g + T ⪰ 0. Then, by using the elementary inequality ‖u + v‖²_G ≤ 2‖u‖²_G + 2‖v‖²_G for any self-adjoint and positive semidefinite linear operator G, we get

    ‖y^{k+1} − y^{k−1}‖²_{Σ̂_g−(1/2)W^k+T} = ‖(y^{k+1} − y^k) + (y^k − y^{k−1})‖²_{Σ̂_g−(1/2)W^k+T}
    ≤ 2‖y^{k+1} − y^k‖²_{Σ̂_g−(1/2)W^k+T} + 2‖y^k − y^{k−1}‖²_{Σ̂_g−(1/2)W^k+T}.

Substituting the above inequalities into (30), we obtain

(31)    2⟨y^k − y^{k+1}, B(z̃^{k+1} − z̃^k)⟩ ≥ ‖y^{k+1} − y^k‖²_{Σ̂_g+T} − ‖y^k − y^{k−1}‖²_{Σ̂_g+T} = ξ_{k+1} − ξ_k.

Thus, by letting µ_{k+1} := (1−τ)σ⟨B*(y^k − y^{k+1}), r^k⟩, and using σr^{k+1} = (1−τ)σr^k + (z̃^{k+1} − z̃^k) (see (14) and (18)) together with (31), we have

(32)    2σ⟨B*(y^k − y^{k+1}), r^{k+1}⟩ = 2µ_{k+1} + 2⟨y^k − y^{k+1}, B(z̃^{k+1} − z̃^k)⟩
        ≥ 2µ_{k+1} + ‖y^{k+1} − y^k‖²_{Σ̂_g+T} − ‖y^k − y^{k−1}‖²_{Σ̂_g+T}.

Since τ ∈ (0, +∞), from the definition of µ_{k+1} we obtain

    2µ_{k+1} ≥ −(1−τ)σ(‖B*(y^{k+1} − y^k)‖² + ‖r^k‖²)  if τ ∈ (0, 1],
    2µ_{k+1} ≥ (1−τ)σ(τ‖B*(y^{k+1} − y^k)‖² + τ⁻¹‖r^k‖²)  if τ ∈ (1, +∞),

which, together with (27), (32) and the notation ξ_k, shows that (26) holds. This completes the proof.

Proposition 8. Suppose that Assumption 1 holds. Let {(x^k, y^k, z^k)} be generated by the Majorized ipADMM. For each k ≥ 0 and (x̄, ȳ, z̄) ∈ W̄, let φ_k, ξ_{k+1}, s_{k+1}, t_{k+1}, r^k and z̃^{k+1} be defined as in (16), (17) and (18). Then, for any τ ∈ (0, +∞) and k ≥ 0, we have

(33)    φ_k − φ_{k+1} ≥ s_{k+1} + (1−τ)σ‖r^{k+1}‖² + σ‖A*x^{k+1} + B*y^k − c‖².

Proof. By setting x = x^{k+1} and x′ = x^k in (7), we have

    f(x^{k+1}) ≤ f(x^k) + ⟨x^{k+1} − x^k, ∇f(x^k)⟩ + (1/2)‖x^{k+1} − x^k‖²_{Σ̂_f}.

Setting x′ = x^k in (5), we get

    f(x) ≥ f(x^k) + ⟨x − x^k, ∇f(x^k)⟩ + (1/2)‖x^k − x‖²_{Σ_f}    ∀x ∈ X.

Combining the above two inequalities, we obtain

(34)    f(x) − f(x^{k+1}) ≥ (1/2)‖x^k − x‖²_{Σ_f} − (1/2)‖x^{k+1} − x^k‖²_{Σ̂_f} + ⟨x − x^{k+1}, ∇f(x^k)⟩    ∀x ∈ X.

From the first-order optimality condition of (14), for any x ∈ X we have

(35)    p(x) − p(x^{k+1}) + ⟨x − x^{k+1}, ∇f(x^k) + A[z^k + σ(A*x^{k+1} + B*y^k − c)] + (Σ̂_f + S)(x^{k+1} − x^k)⟩ ≥ 0.

Adding up (34) and (35), for any x ∈ X we get

(36)    p(x) + f(x) − p(x^{k+1}) − f(x^{k+1}) + ⟨x − x^{k+1}, A[z^k + σ(A*x^{k+1} + B*y^k − c)] + (Σ̂_f + S)(x^{k+1} − x^k)⟩
        ≥ (1/2)‖x^k − x‖²_{Σ_f} − (1/2)‖x^{k+1} − x^k‖²_{Σ̂_f}.

Using a derivation similar to that for (36), for any y ∈ Y we have

(37)    q(y) + g(y) − q(y^{k+1}) − g(y^{k+1}) + ⟨y − y^{k+1}, B[z^k + σ(A*x^{k+1} + B*y^{k+1} − c)] + (Σ̂_g + T)(y^{k+1} − y^k)⟩
        ≥ (1/2)‖y^k − y‖²_{Σ_g} − (1/2)‖y^{k+1} − y^k‖²_{Σ̂_g}.

Note that z̃^{k+1} = z^k + σr^{k+1}, where r^{k+1} = A*x^{k+1} + B*y^{k+1} − c. Then we have

    z^k + σ(A*x^{k+1} + B*y^k − c) = z̃^{k+1} + σB*(y^k − y^{k+1}).

Adding up (36) and (37), setting x = x̄ and y = ȳ, and using the above equation, we have

(38)    p(x̄) + f(x̄) + q(ȳ) + g(ȳ) − [p(x^{k+1}) + f(x^{k+1}) + q(y^{k+1}) + g(y^{k+1})]
        + ⟨x̄ − x^{k+1}, Az̃^{k+1} + σAB*(y^k − y^{k+1}) + (Σ̂_f + S)(x^{k+1} − x^k)⟩
        + ⟨ȳ − y^{k+1}, Bz̃^{k+1} + (Σ̂_g + T)(y^{k+1} − y^k)⟩
        ≥ (1/2)‖x^k − x̄‖²_{Σ_f} − (1/2)‖x^{k+1} − x^k‖²_{Σ̂_f} + (1/2)‖y^k − ȳ‖²_{Σ_g} − (1/2)‖y^{k+1} − y^k‖²_{Σ̂_g}.

Now setting x = x^{k+1}, x′ = x̄ in (5), and y = y^{k+1}, y′ = ȳ in (6), we get

(39)    f(x^{k+1}) − f(x̄) + ⟨x̄ − x^{k+1}, ∇f(x̄)⟩ ≥ (1/2)‖x^{k+1} − x̄‖²_{Σ_f},
(40)    g(y^{k+1}) − g(ȳ) + ⟨ȳ − y^{k+1}, ∇g(ȳ)⟩ ≥ (1/2)‖y^{k+1} − ȳ‖²_{Σ_g}.

Let

    Δ_k := ⟨x̄ − x^{k+1}, A(z̃^{k+1} − z̄) + σAB*(y^k − y^{k+1})⟩ + ⟨ȳ − y^{k+1}, B(z̃^{k+1} − z̄)⟩.

Adding up (38), (39) and (40), and using the elementary inequality (1/2)‖u‖²_G + (1/2)‖v‖²_G ≥ (1/4)‖u − v‖²_G for any self-adjoint and positive semidefinite linear operator G, we have

(41)    p(x̄) + q(ȳ) − p(x^{k+1}) − q(y^{k+1}) + ⟨x̄ − x^{k+1}, ∇f(x̄) + Az̄⟩ + ⟨ȳ − y^{k+1}, ∇g(ȳ) + Bz̄⟩ + Δ_k
        + ⟨x̄ − x^{k+1}, (Σ̂_f + S)(x^{k+1} − x^k)⟩ + ⟨ȳ − y^{k+1}, (Σ̂_g + T)(y^{k+1} − y^k)⟩
        ≥ (1/2)‖x^k − x̄‖²_{Σ_f} + (1/2)‖x^{k+1} − x̄‖²_{Σ_f} + (1/2)‖y^k − ȳ‖²_{Σ_g} + (1/2)‖y^{k+1} − ȳ‖²_{Σ_g}
          − (1/2)‖x^{k+1} − x^k‖²_{Σ̂_f} − (1/2)‖y^{k+1} − y^k‖²_{Σ̂_g}
        ≥ (1/4)‖x^{k+1} − x^k‖²_{Σ_f} + (1/4)‖y^{k+1} − y^k‖²_{Σ_g} − (1/2)‖x^{k+1} − x^k‖²_{Σ̂_f} − (1/2)‖y^{k+1} − y^k‖²_{Σ̂_g}.

Recall that for any (x̄, ȳ, z̄) ∈ W̄, we have

(42)    p(x^{k+1}) + q(y^{k+1}) − p(x̄) − q(ȳ) + ⟨x^{k+1} − x̄, ∇f(x̄) + Az̄⟩ + ⟨y^{k+1} − ȳ, ∇g(ȳ) + Bz̄⟩ ≥ 0.

Adding up (41) and (42), we get

(43)    Δ_k + ⟨x̄ − x^{k+1}, (Σ̂_f + S)(x^{k+1} − x^k)⟩ + ⟨ȳ − y^{k+1}, (Σ̂_g + T)(y^{k+1} − y^k)⟩
        ≥ (1/4)‖x^{k+1} − x^k‖²_{Σ_f} + (1/4)‖y^{k+1} − y^k‖²_{Σ_g} − (1/2)‖x^{k+1} − x^k‖²_{Σ̂_f} − (1/2)‖y^{k+1} − y^k‖²_{Σ̂_g}.

By simple manipulations, and using the definition of Δ_k and A*x̄ + B*ȳ − c = 0, we have

(44)    Δ_k = ⟨A*(x^{k+1} − x̄), z̄ − z̃^{k+1}⟩ + σ⟨A*(x^{k+1} − x̄), B*(y^{k+1} − y^k)⟩ + ⟨B*(y^{k+1} − ȳ), z̄ − z̃^{k+1}⟩
            = ⟨A*x̄ + B*ȳ − c, z̃^{k+1} − z̄⟩ + ⟨r^{k+1}, z̄ − z̃^{k+1}⟩ + σ⟨A*(x^{k+1} − x̄), B*(y^{k+1} − y^k)⟩
            = ⟨r^{k+1}, z̄ − z̃^{k+1}⟩ + σ⟨A*(x^{k+1} − x̄), B*(y^{k+1} − y^k)⟩

and

(45)    σ⟨A*(x^{k+1} − x̄), B*(y^{k+1} − y^k)⟩ = σ⟨B*ȳ − (c − A*x^{k+1}), B*y^{k+1} − B*y^k⟩.

Now, by taking u₁ = B*ȳ, u₂ = c − A*x^{k+1}, v₁ = B*y^{k+1} and v₂ = B*y^k, and applying the identity (19) to the right-hand side of (45), we obtain

    σ⟨A*(x^{k+1} − x̄), B*(y^{k+1} − y^k)⟩ = (σ/2)(‖B*(y^k − ȳ)‖² − ‖B*(y^{k+1} − ȳ)‖²)
        + (σ/2)(‖A*x^{k+1} + B*y^{k+1} − c‖² − ‖A*x^{k+1} + B*y^k − c‖²).

Substituting this into (44) and using the definition of z̃^{k+1}, we have

(46)    Δ_k = (1/σ)⟨z̄ − z̃^{k+1}, z̃^{k+1} − z^k⟩ + (1/(2σ))‖z^k − z̃^{k+1}‖² − (σ/2)‖A*x^{k+1} + B*y^k − c‖²
              + (σ/2)(‖B*(y^k − ȳ)‖² − ‖B*(y^{k+1} − ȳ)‖²).

Setting z = z̄ in (21) and applying it to (46), we get

(47)    Δ_k = (1/(2τσ))(‖z^k − z̄‖² − ‖z^{k+1} − z̄‖²) + ((τ−1)/(2σ))‖z^k − z̃^{k+1}‖²
              − (σ/2)‖A*x^{k+1} + B*y^k − c‖² + (σ/2)(‖B*(y^k − ȳ)‖² − ‖B*(y^{k+1} − ȳ)‖²).

Using the second elementary identity in (20), we obtain that

    ⟨x̄ − x^{k+1}, (Σ̂_f + S)(x^{k+1} − x^k)⟩ + ⟨ȳ − y^{k+1}, (Σ̂_g + T)(y^{k+1} − y^k)⟩
    = (1/2)(‖x^k − x̄‖²_{Σ̂_f+S} − ‖x^{k+1} − x̄‖²_{Σ̂_f+S} − ‖x^{k+1} − x^k‖²_{Σ̂_f+S})
      + (1/2)(‖y^k − ȳ‖²_{Σ̂_g+T} − ‖y^{k+1} − ȳ‖²_{Σ̂_g+T} − ‖y^{k+1} − y^k‖²_{Σ̂_g+T}).

Substituting this and (47) into (43), and using the definitions of z̃^{k+1} and r^{k+1}, we have

(48)    (1/(2τσ))(‖z^k − z̄‖² − ‖z^{k+1} − z̄‖²) + (1/2)(‖x^k − x̄‖²_{Σ̂_f+S} − ‖x^{k+1} − x̄‖²_{Σ̂_f+S})
        + (1/2)(‖y^k − ȳ‖²_{Σ̂_g+T+σBB*} − ‖y^{k+1} − ȳ‖²_{Σ̂_g+T+σBB*})
        ≥ (1/2)[(1−τ)σ‖r^{k+1}‖² + σ‖A*x^{k+1} + B*y^k − c‖²]
          + (1/2)(‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S} + ‖y^{k+1} − y^k‖²_{(1/2)Σ_g+T}).

Using the notation in (16) and (17), we get (33) from (48) immediately. The proof is completed.

Remark 9. Suppose that B is vacuous, q ≡ 0 and g ≡ 0. Then for any τ ∈ (0, +∞) and k ≥ 0, we have y^{k+1} = y⁰ = ȳ. Since B is vacuous, by using the definition of r^{k+1}, we have

    (1−τ)σ‖r^{k+1}‖² + σ‖A*x^{k+1} + B*y^k − c‖² = (2−τ)σ‖r^{k+1}‖².

By observing that the terms concerning y in (48) cancel out, we can easily get from (48) and the above equation that

(49)    (1/(τσ))(‖z^k − z̄‖² − ‖z^{k+1} − z̄‖²) + ‖x^k − x̄‖²_{Σ̂_f+S} − ‖x^{k+1} − x̄‖²_{Σ̂_f+S}
        ≥ (2−τ)σ‖r^{k+1}‖² + ‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S}.

4. Convergence analysis. In this section, we analyze the convergence of the Majorized ipADMM for solving problem (1). We first prove its global convergence and then give some choices of proximal terms.

4.1. The global convergence. Now we are ready to establish the convergence results for the Majorized ipADMM for solving (1).

Theorem 10. Suppose that Assumptions 1 and 2 hold. Let H_f and M_g be defined by (15). Let {(x^k, y^k, z^k)} be generated by the Majorized ipADMM. For each k and (x̄, ȳ, z̄) ∈ W̄, let φ_k, ξ_{k+1}, s_{k+1}, t_{k+1}, r^k and z̃^{k+1} be defined as in (16), (17) and (18). Then the following results hold:

(a) For any η ∈ (0, 1/2) and k ≥ 0, we have

(50)    (φ_k + βσ‖r^k‖²) − (φ_{k+1} + βσ‖r^{k+1}‖²)
        ≥ [ (1/(τ²σ))‖z^{k+1} − z^k‖² + ‖x^{k+1} − x^k‖²_{Σ̂_f+S+ησAA*} + ‖y^{k+1} − y^k‖²_{Σ̂_g+T+ησBB*} ]
          − [ ((τ+β)/(τ²σ))‖z^{k+1} − z^k‖² + ‖x^{k+1} − x^k‖²_{Σ̂_f−(1/2)Σ_f} + ‖y^{k+1} − y^k‖²_{Σ̂_g−(1/2)Σ_g+ησBB*} ],

where

(51)    β := η/(1 − 2η).

In addition, assume that for some η ∈ (0, 1/2),

(52)    Σ̂_f + S + ησAA* ≻ 0  and  Σ̂_g + T + ησBB* ≻ 0,

and that the following condition holds:

(53)    Σ_{k=0}^{∞} ( ‖x^{k+1} − x^k‖²_{Σ̂_f} + ‖y^{k+1} − y^k‖²_{Σ̂_g+σBB*} + ‖r^{k+1}‖² ) < +∞.

Then the sequence {(x^k, y^k)} converges to an optimal solution of problem (1) and {z^k} converges to an optimal solution of the dual of problem (1).

(b) Assume that

(54)    (1/2)Σ̂_g + T ⪰ 0.

Then, for any α ∈ (0, 1] and k ≥ 1, we have

(55)    [φ_k + (1 − α·min(τ, τ⁻¹))σ‖r^k‖² + αξ_k] − [φ_{k+1} + (1 − α·min(τ, τ⁻¹))σ‖r^{k+1}‖² + αξ_{k+1}]
        ≥ t_{k+1} + (−τ + α·min(1+τ, 1+τ⁻¹))σ‖r^{k+1}‖².

In addition, assume that τ ∈ (0, (1+√5)/2) and that for some α ∈ (τ/min(1+τ, 1+τ⁻¹), 1],

(56)    Σ̂_f + S ⪰ 0,  H_f ≻ 0,  (1/2)Σ_f + S + σAA* ≻ 0  and  M_g ≻ 0.

Then, the sequence {(x^k, y^k)} converges to an optimal solution of problem (1) and {z^k} converges to an optimal solution of the dual of problem (1).

Proof. In the following, we consider Part (a) and Part (b) separately.

Proof of Part (a). Using the definition of r^k and the Cauchy-Schwarz inequality, for the β > 0 given by (51), we get

    σ‖A*x^{k+1} + B*y^k − c‖² = σ‖r^k + A*(x^{k+1} − x^k)‖²
    = σ‖r^k‖² + σ‖A*(x^{k+1} − x^k)‖² + 2σ⟨A*(x^{k+1} − x^k), r^k⟩
    ≥ (1 − (1+β))σ‖r^k‖² + (1 − 1/(1+β))σ‖A*(x^{k+1} − x^k)‖²
    = −βσ‖r^k‖² + (β/(1+β))σ‖A*(x^{k+1} − x^k)‖².

By using the definition of s_{k+1}, r^{k+1} = (1/(τσ))(z^{k+1} − z^k) and the above formula, we obtain

(57)    s_{k+1} + (1−τ)σ‖r^{k+1}‖² + σ‖A*x^{k+1} + B*y^k − c‖²
        ≥ ((1−τ−β)/(τ²σ))‖z^{k+1} − z^k‖² + ‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S+(β/(1+β))σAA*}
          + ‖y^{k+1} − y^k‖²_{(1/2)Σ_g+T} + βσ(‖r^{k+1}‖² − ‖r^k‖²).

Recall that

    β/(1+β) = [η/(1−2η)] / [(1−η)/(1−2η)] = η/(1−η).

By simple manipulations, we get η/(1−η) = η + η²/(1−η) > η. Hence

    ((1−τ−β)/(τ²σ))‖z^{k+1} − z^k‖² + ‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S+(β/(1+β))σAA*} + ‖y^{k+1} − y^k‖²_{(1/2)Σ_g+T}
    ≥ [ (1/(τ²σ))‖z^{k+1} − z^k‖² + ‖x^{k+1} − x^k‖²_{Σ̂_f+S+ησAA*} + ‖y^{k+1} − y^k‖²_{Σ̂_g+T+ησBB*} ]
      − [ ((τ+β)/(τ²σ))‖z^{k+1} − z^k‖² + ‖x^{k+1} − x^k‖²_{Σ̂_f−(1/2)Σ_f} + ‖y^{k+1} − y^k‖²_{Σ̂_g−(1/2)Σ_g+ησBB*} ].

Substituting this and (57) into (33), we get (50).

Now assume that (52) and (53) hold. For any given η ∈ (0, 1/2), using the definitions of φ_{k+1}, r^{k+1} and β, A*x̄ + B*ȳ = c and the Cauchy-Schwarz inequality, we get

(58)    φ_{k+1} + βσ‖r^{k+1}‖²
        = (1/(τσ))‖z^{k+1} − z̄‖² + ‖x^{k+1} − x̄‖²_{Σ̂_f+S} + ‖y^{k+1} − ȳ‖²_{Σ̂_g+T+σBB*}
          + βσ‖A*(x^{k+1} − x̄) + B*(y^{k+1} − ȳ)‖²
        ≥ (1/(τσ))‖z^{k+1} − z̄‖² + ‖x^{k+1} − x̄‖²_{Σ̂_f+S} + ‖y^{k+1} − ȳ‖²_{Σ̂_g+T+σBB*}
          + ησ‖A*(x^{k+1} − x̄)‖² − (1/2)σ‖B*(y^{k+1} − ȳ)‖²
        ≥ (1/(τσ))‖z^{k+1} − z̄‖² + ‖x^{k+1} − x̄‖²_{Σ̂_f+S+ησAA*} + ‖y^{k+1} − ȳ‖²_{Σ̂_g+T+ησBB*}.

Set

    ζ_k := ((τ+β)/(τ²σ))‖z^{k+1} − z^k‖² + ‖x^{k+1} − x^k‖²_{Σ̂_f−(1/2)Σ_f} + ‖y^{k+1} − y^k‖²_{Σ̂_g−(1/2)Σ_g+ησBB*}

and

    υ_k := (1/(τ²σ))‖z^{k+1} − z^k‖² + ‖x^{k+1} − x^k‖²_{Σ̂_f+S+ησAA*} + ‖y^{k+1} − y^k‖²_{Σ̂_g+T+ησBB*}.

From (53) and z^{k+1} = z^k + τσr^{k+1}, we get Σ_{k=0}^{∞} ζ_k < +∞. Since Σ̂_f + S + ησAA* ≻ 0 and Σ̂_g + T + ησBB* ≻ 0 for some η ∈ (0, 1/2), it follows from (58) and (50) that for any k ≥ 0,

(59)    φ_{k+1} + βσ‖r^{k+1}‖² ≤ φ_k + βσ‖r^k‖² + ζ_k ≤ φ_0 + βσ‖r^0‖² + Σ_{j=0}^{k} ζ_j.

Thus the sequence {φ_{k+1} + βσ‖r^{k+1}‖²} is bounded. From (58), we see that the three sequences {‖z^{k+1} − z̄‖}, {‖x^{k+1} − x̄‖_{Σ̂_f+S+ησAA*}} and {‖y^{k+1} − ȳ‖_{Σ̂_g+T+ησBB*}} are all bounded. Since Σ̂_f + S + ησAA* ≻ 0 and Σ̂_g + T + ησBB* ≻ 0, the sequence {(x^k, y^k, z^k)} is also bounded. Using (50), we get for any k ≥ 0,

    υ_k ≤ φ_k + βσ‖r^k‖² − (φ_{k+1} + βσ‖r^{k+1}‖²) + ζ_k,

and hence

    Σ_{j=0}^{k} υ_j ≤ φ_0 + βσ‖r^0‖² − (φ_{k+1} + βσ‖r^{k+1}‖²) + Σ_{j=0}^{k} ζ_j < +∞.

Again, since Σ̂_f + S + ησAA* ≻ 0 and Σ̂_g + T + ησBB* ≻ 0, from the definition of υ_k we get

(60)    lim_{k→∞} ‖r^{k+1}‖ = lim_{k→∞} (1/(τσ))‖z^{k+1} − z^k‖ = 0,
(61)    lim_{k→∞} ‖x^{k+1} − x^k‖ = 0  and  lim_{k→∞} ‖y^{k+1} − y^k‖ = 0.

Recall that the sequence {(x^k, y^k, z^k)} is bounded. There is a subsequence {(x^{k_i}, y^{k_i}, z^{k_i})} which converges to a cluster point, say (x^∞, y^∞, z^∞). We next show that (x^∞, y^∞) is an optimal solution to problem (1) and z^∞ is a corresponding Lagrange multiplier. Taking limits on both sides of (36) and (37) along the subsequence {(x^{k_i}, y^{k_i}, z^{k_i})}, and using (60) and (61), we obtain that

    p(x) + f(x) − p(x^∞) − f(x^∞) + ⟨x − x^∞, Az^∞⟩ ≥ 0  ∀x ∈ X,
    q(y) + g(y) − q(y^∞) − g(y^∞) + ⟨y − y^∞, Bz^∞⟩ ≥ 0  ∀y ∈ Y,
    c − A*x^∞ − B*y^∞ = 0,

i.e., (x^∞, y^∞, z^∞) satisfies (12). Hence (x^∞, y^∞) is an optimal solution to problem (1) and z^∞ is a corresponding Lagrange multiplier.

To complete the proof of Part (a), we show that (x^∞, y^∞, z^∞) is actually the unique limit of {(x^k, y^k, z^k)}. Replacing (x̄, ȳ, z̄) by (x^∞, y^∞, z^∞) in (59), for any k ≥ k_i we have

(62)    φ_{k+1}(x^∞, y^∞, z^∞) + βσ‖r^{k+1}‖² ≤ φ_{k_i}(x^∞, y^∞, z^∞) + βσ‖r^{k_i}‖² + Σ_{j=k_i}^{k} ζ_j.

Note that

    lim_{i→∞} ( φ_{k_i}(x^∞, y^∞, z^∞) + βσ‖r^{k_i}‖² ) = 0  and  Σ_{j=0}^{∞} ζ_j < +∞.

Therefore, we get

    lim_{k→∞} ( φ_{k+1}(x^∞, y^∞, z^∞) + βσ‖r^{k+1}‖² ) = 0.

Then from (58), we obtain

    lim_{k→∞} [ (1/(τσ))‖z^{k+1} − z^∞‖² + ‖x^{k+1} − x^∞‖²_{Σ̂_f+S+ησAA*} + ‖y^{k+1} − y^∞‖²_{Σ̂_g+T+ησBB*} ] = 0.

Since Σ̂_f + S + ησAA* ≻ 0 and Σ̂_g + T + ησBB* ≻ 0, we also have lim_{k→∞} x^k = x^∞ and lim_{k→∞} y^k = y^∞. Therefore, we have shown that the whole sequence {(x^k, y^k, z^k)} converges to (x^∞, y^∞, z^∞) for any τ ∈ (0, +∞).

Proof of Part (b). To prove Part (b), assume that (1/2)Σ̂_g + T ⪰ 0. Using the definition of r^k and the Cauchy-Schwarz inequality, we get

    σ‖A*x^{k+1} + B*y^k − c‖² = σ‖r^k + A*(x^{k+1} − x^k)‖²
    = σ‖r^k‖² + σ‖A*(x^{k+1} − x^k)‖² + 2σ⟨A*(x^{k+1} − x^k), r^k⟩
    ≥ σ‖r^k‖² + σ‖A*(x^{k+1} − x^k)‖² − 2σ‖r^k‖² − (1/2)σ‖A*(x^{k+1} − x^k)‖²
    = −σ‖r^k‖² + (1/2)σ‖A*(x^{k+1} − x^k)‖².

By using the definition of s_{k+1}, r^{k+1} = (1/(τσ))(z^{k+1} − z^k) and the above formula, for any α ∈ (0, 1], we get

(63)    (1−α)[ s_{k+1} + (1−τ)σ‖r^{k+1}‖² + σ‖A*x^{k+1} + B*y^k − c‖² ]
        ≥ −(1−α)τσ‖r^{k+1}‖² + (1−α)‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S+(1/2)σAA*}
          + (1−α)‖y^{k+1} − y^k‖²_{(1/2)Σ_g+T} + (1−α)σ(‖r^{k+1}‖² − ‖r^k‖²).

By using the definition of s_{k+1} and (26) in Lemma 7, for any α ∈ (0, 1], we have

(64)    α[ s_{k+1} + (1−τ)σ‖r^{k+1}‖² + σ‖A*x^{k+1} + B*y^k − c‖² ]
        ≥ α‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S} + α‖y^{k+1} − y^k‖²_{(1/2)Σ_g+T}
          + max(1−τ, 1−τ⁻¹)σα(‖r^{k+1}‖² − ‖r^k‖²) + α(ξ_{k+1} − ξ_k)
          + min(τ, 1+τ−τ²)σα(τ⁻¹‖r^{k+1}‖² + ‖B*(y^{k+1} − y^k)‖²).

Adding up (63) and (64), we obtain for any α ∈ (0, 1] that

(65)    s_{k+1} + (1−τ)σ‖r^{k+1}‖² + σ‖A*x^{k+1} + B*y^k − c‖²
        ≥ ‖x^{k+1} − x^k‖²_{H_f} + ‖y^{k+1} − y^k‖²_{M_g} + (1 − α·min(τ, τ⁻¹))σ(‖r^{k+1}‖² − ‖r^k‖²)
          + α(ξ_{k+1} − ξ_k) + (−τ + α·min(1+τ, 1+τ⁻¹))σ‖r^{k+1}‖².

Using the notation in (16) and (17), we know from (33) and (65) that

(66)    [φ_k + (1 − α·min(τ, τ⁻¹))σ‖r^k‖² + αξ_k] − [φ_{k+1} + (1 − α·min(τ, τ⁻¹))σ‖r^{k+1}‖² + αξ_{k+1}]
        ≥ t_{k+1} + (−τ + α·min(1+τ, 1+τ⁻¹))σ‖r^{k+1}‖²

holds. We get (55) from (66) immediately.

Assume that τ ∈ (0, (1+√5)/2) and α ∈ (τ/min(1+τ, 1+τ⁻¹), 1]. Then we have 1 − α·min(τ, τ⁻¹) ≥ 0 and −τ + α·min(1+τ, 1+τ⁻¹) > 0. Since Σ̂_g ⪰ Σ_g, M_g = (1/2)Σ_g + T + min(τ, 1+τ−τ²)σαBB* ≻ 0 and min(τ, 1+τ−τ²) ≤ 1,

we have Σ̂_g + T ⪰ (1/2)Σ̂_g + T ⪰ 0 and Σ̂_g + T + σBB* ⪰ (1/2)Σ_g + T + min(τ, 1+τ−τ²)σαBB* = M_g ≻ 0. Note that H_f ≻ 0 and M_g ≻ 0. Then we obtain that φ_{k+1} ≥ 0, t_{k+1} ≥ 0 and ξ_{k+1} ≥ 0. From (22) and (55), we see immediately that the sequence {φ_{k+1} + ξ_{k+1}} is bounded,

(67)    lim_{k→∞} t_{k+1} = 0  and  lim_{k→∞} ‖z^{k+1} − z^k‖ = lim_{k→∞} τσ‖r^{k+1}‖ = 0,

which, together with (17) and (56), imply that

(68)    lim_{k→∞} ‖x^{k+1} − x^k‖_{H_f} = 0,  lim_{k→∞} ‖y^{k+1} − y^k‖ = 0,

and

(69)    ‖A*(x^{k+1} − x^k)‖ ≤ ‖r^{k+1}‖ + ‖r^k‖ + ‖B*(y^{k+1} − y^k)‖ → 0,  k → ∞.

Thus, from (68) and (69) we obtain that

    lim_{k→∞} ‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S+σAA*}
    = lim_{k→∞} ( ‖x^{k+1} − x^k‖²_{H_f} + ((1+α)/2)σ‖A*(x^{k+1} − x^k)‖² ) = 0.

Since (1/2)Σ_f + S + σAA* ≻ 0, we also get

(70)    lim_{k→∞} ‖x^{k+1} − x^k‖ = 0.

By the definition of φ_{k+1}, we see that the three sequences {‖z^{k+1} − z̄‖}, {‖x^{k+1} − x̄‖_{Σ̂_f+S}} and {‖y^{k+1} − ȳ‖_{Σ̂_g+T+σBB*}} are all bounded. Since Σ̂_g + T + σBB* ≻ 0, the sequence {y^{k+1}} is bounded. Note that A*x̄ + B*ȳ = c. Furthermore, by using

    ‖A*(x^{k+1} − x̄)‖ ≤ ‖A*x^{k+1} + B*y^{k+1} − (A*x̄ + B*ȳ)‖ + ‖B*(y^{k+1} − ȳ)‖ = ‖r^{k+1}‖ + ‖B*(y^{k+1} − ȳ)‖,

we also know that the sequence {‖A*(x^{k+1} − x̄)‖} is bounded, and so is the sequence {‖x^{k+1} − x̄‖_{Σ̂_f+S+σAA*}}. This shows that the sequence {x^{k+1}} is also bounded, since Σ̂_f + S + σAA* ⪰ (1/2)Σ_f + S + σAA* ≻ 0. Thus, the sequence {(x^k, y^k, z^k)} is bounded.

Since the sequence {(x^k, y^k, z^k)} is bounded, there is a subsequence {(x^{k_i}, y^{k_i}, z^{k_i})} which converges to a cluster point, say (x^∞, y^∞, z^∞). We next show that (x^∞, y^∞) is an optimal solution to problem (1) and z^∞ is a corresponding Lagrange multiplier. Taking limits on both sides of (36) and (37) along the subsequence {(x^{k_i}, y^{k_i}, z^{k_i})}, and using (67), (68) and (70), we obtain that

    p(x) + f(x) − p(x^∞) − f(x^∞) + ⟨x − x^∞, Az^∞⟩ ≥ 0  ∀x ∈ X,
    q(y) + g(y) − q(y^∞) − g(y^∞) + ⟨y − y^∞, Bz^∞⟩ ≥ 0  ∀y ∈ Y,
    c − A*x^∞ − B*y^∞ = 0,

i.e., (x^∞, y^∞, z^∞) satisfies (12). Thus (x^∞, y^∞) is an optimal solution to problem (1) and z^∞ is a corresponding Lagrange multiplier.

To complete the proof of Part (b), we show that (x^∞, y^∞, z^∞) is actually the unique limit of {(x^k, y^k, z^k)}. As in the proof of (62) in Part (a), we can apply the inequality (55) with (x̄, ȳ, z̄) = (x^∞, y^∞, z^∞) to show that

    lim_{k→∞} φ_{k+1}(x^∞, y^∞, z^∞) = 0  and  lim_{k→∞} ξ_{k+1} = 0.

Hence

    lim_{k→∞} [ (1/(τσ))‖z^{k+1} − z^∞‖² + ‖x^{k+1} − x^∞‖²_{Σ̂_f+S} + ‖y^{k+1} − y^∞‖²_{Σ̂_g+T+σBB*} ] = 0,

    ‖A*(x^{k+1} − x^∞)‖ ≤ ‖A*x^{k+1} + B*y^{k+1} − (A*x^∞ + B*y^∞)‖ + ‖B*(y^{k+1} − y^∞)‖
                        = ‖r^{k+1}‖ + ‖B*(y^{k+1} − y^∞)‖ → 0,  k → ∞,

and

    lim_{k→∞} ‖x^{k+1} − x^∞‖²_{Σ̂_f+S+σAA*} = 0.

Using the fact that Σ̂_f + S + σAA* and Σ̂_g + T + σBB* are both positive definite, we have lim_{k→∞} x^k = x^∞ and lim_{k→∞} y^k = y^∞. Therefore, we have shown that the whole sequence {(x^k, y^k, z^k)} converges to (x^∞, y^∞, z^∞) if τ ∈ (0, (1+√5)/2). The proof is completed.

Remark 11. In practice, Part (a) of Theorem 10 can be applied in a more heuristic way by using any sufficient condition that guarantees (53). If such a sufficient condition does not hold, then one can just use the conditions in Part (b). The conditions on S and T in Part (b) for the case τ = 1 can be written as: for some α ∈ (1/2, 1],

    Σ̂_f + S ⪰ 0,  (1/2)Σ_f + S + ((1−α)/2)σAA* ≻ 0,  (1/2)Σ̂_g + T ⪰ 0,  (1/2)Σ_g + T + ασBB* ≻ 0,

and

    (1/2)Σ_f + S + σAA* ≻ 0;

these conditions for the case τ = 1.618 can be replaced by: for some α ∈ [0.99998, 1],

    Σ̂_f + S ⪰ 0,  (1/2)Σ_f + S + ((1−α)/2)σAA* ≻ 0,  (1/2)Σ̂_g + T ⪰ 0,  (1/2)Σ_f + S + σAA* ≻ 0,

and

    (1/2)Σ_g + T + 0.000075ασBB* ≻ 0.

Remark 12. Suppose that B is vacuous, q ≡ 0 and g ≡ 0. Then for any τ ∈ (0, +∞) and k ≥ 0, we have y^{k+1} = y⁰ = ȳ. Similarly to (63), for any α ∈ (0, 1], we have

    (1−α)[ (2−τ)σ‖r^{k+1}‖² + ‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S} ]
    ≥ −(1−α)τσ‖r^{k+1}‖² + (1−α)‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S+(1/2)σAA*} + (1−α)σ(‖r^{k+1}‖² − ‖r^k‖²).

Adding α[(2−τ)σ‖r^{k+1}‖² + ‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S}] to both sides of the above inequality, we get

    (2−τ)σ‖r^{k+1}‖² + ‖x^{k+1} − x^k‖²_{(1/2)Σ_f+S}
    ≥ ‖x^{k+1} − x^k‖²_{H_f} + (1−α)σ(‖r^{k+1}‖² − ‖r^k‖²) + (2α−τ)σ‖r^{k+1}‖².

Substituting this into (49), we obtain

    [ (1/(τσ))‖z^k − z̄‖² + ‖x^k − x̄‖²_{Σ̂_f+S} + (1−α)σ‖r^k‖² ]
    − [ (1/(τσ))‖z^{k+1} − z̄‖² + ‖x^{k+1} − x̄‖²_{Σ̂_f+S} + (1−α)σ‖r^{k+1}‖² ]
    ≥ ‖x^{k+1} − x^k‖²_{H_f} + (2α−τ)σ‖r^{k+1}‖².

In addition, assume that τ ∈ (0, 2) and that for some α ∈ (τ/2, 1],

    Σ̂_f + S ⪰ 0,  H_f = (1/2)Σ_f + S + ((1−α)/2)σAA* ≻ 0,  (1/2)Σ_f + S + σAA* ≻ 0.

Then, the sequence {x^k} converges to an optimal solution of problem (1) and {z^k} converges to an optimal solution of the dual of problem (1).

4.2. Choices of proximal terms. Let G : X → X be any given self-adjoint linear operator, with dim(X) = n, the dimension of X. We shall first introduce a majorization technique to find a self-adjoint positive definite linear operator M such that M ⪰ G and M⁻¹ is easy to compute. Suppose that G has the following spectral decomposition:

    G = Σ_{i=1}^{n} λ_i u_i u_i*,

where λ₁ ≥ λ₂ ≥ ... ≥ λ_n, with λ_l > 0 for some l ≤ n, are the eigenvalues of G and u_i, i = 1, ..., n, are the corresponding mutually orthogonal unit eigenvectors. Then, for a small l, we can design a practically useful majorization for G as follows:

(71)    G ⪯ M := Σ_{i=1}^{l} λ_i u_i u_i* + λ_l Σ_{i=l+1}^{n} u_i u_i* = λ_l I + Σ_{i=1}^{l} (λ_i − λ_l) u_i u_i*.

Note that M⁻¹ can be easily obtained as follows:

(72)    M⁻¹ = Σ_{i=1}^{l} λ_i⁻¹ u_i u_i* + λ_l⁻¹ Σ_{i=l+1}^{n} u_i u_i* = λ_l⁻¹ I + Σ_{i=1}^{l} (λ_i⁻¹ − λ_l⁻¹) u_i u_i*.

Thus, we only need to compute the first l eigen-pairs (λ_i, u_i), i = 1, ..., l, of G for computing M and M⁻¹.

In the following, we only discuss how to choose proximal terms for the x-part. The discussions for the y-part are similar, and are thus omitted here. It follows from (14) that

    x^{k+1} = argmin_{x∈X} p(x) + (1/2)⟨x, Px⟩ + ⟨∇f(x^k) + σA r̃^k − Px^k, x⟩.
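Before turning to the concrete examples, here is a minimal MATLAB sketch of the construction (71)-(72) (ours; it assumes G is available as a symmetric matrix, that its l largest eigenvalues are positive, and that eigs is applicable):

% Build the majorization M = lambda_l*I + sum_{i<=l} (lambda_i - lambda_l)*u_i*u_i'
% of G in (71), and apply M and M^{-1} using only l leading eigenpairs, cf. (72).
l = 6;                                       % a small number of leading eigenpairs
[U, Lam] = eigs(G, l, 'largestreal');        % U is n-by-l, Lam is l-by-l diagonal
[lam, idx] = sort(diag(Lam), 'descend');     % lambda_1 >= ... >= lambda_l (> 0 assumed)
U = U(:, idx);
applyM    = @(v) lam(l)*v + U*((lam - lam(l))     .* (U'*v));   % v -> M*v
applyMinv = @(v) v/lam(l) + U*((1./lam - 1/lam(l)).* (U'*v));   % v -> M^{-1}*v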

Example 4.1: p ≢ 0 and A ≠ 0. Choose α ∈ (τ/min(1+τ, 1+τ⁻¹), 1] such that (1/2)Σ_f + (1−α)σAA* ⪯ Σ̂_f. Define

(73)    ρ₀ := λ_max( Σ̂_f − (1/2)Σ_f + ((1+α)/2)σAA* ).

Note that ρ₀ ≥ λ_max(Σ̂_f − (1/2)Σ_f). Let ρ₁ be any positive number such that ρ₁ ≥ ρ₀ if AA* ≻ 0, and ρ₁ ≥ ρ₀ and ρ₁ > λ_max(Σ̂_f − (1/2)Σ_f) otherwise. A particular choice, which we will consider later in the numerical experiments, is

(74)    ρ₁ := 1.01 ρ₀.

Choose

(75)    S := −(1/2)[Σ_f + (1−α)σAA*] − [ (Σ̂_f − (1/2)Σ_f + ((1+α)/2)σAA*) − ρ₁ I ] = ρ₁ I − Σ̂_f − σAA*.

Then S, which obviously may be indefinite, satisfies (56), and P = Σ̂_f + S + σAA* = ρ₁ I ≻ 0. One interesting special case is Σ_f = Σ̂_f = Q for some self-adjoint linear operator Q ⪰ 0. By taking α = 1 and ρ₁ = (1/2)λ_max(Q) + σλ_max(AA*), we have

    S = (1/2)λ_max(Q) I − Q + σ[λ_max(AA*) I − AA*].

Example 4.2: p ≢ 0. Let α ∈ (τ/min(1+τ, 1+τ⁻¹), 1] be such that (1/2)Σ_f + (1−α)σAA* ⪯ Σ̂_f. Choose G such that

    G = Σ̂_f − (1/2)Σ_f + ((1+α)/2)σAA*  if AA* ≻ 0;
    G ⪰ Σ̂_f − (1/2)Σ_f + ((1+α)/2)σAA*  and  G ≻ Σ̂_f − (1/2)Σ_f  otherwise.

Let M ≻ 0 be the majorization of G as in (71). Choose

    S := −(1/2)[Σ_f + (1−α)σAA*] + [ M − (Σ̂_f − (1/2)Σ_f + ((1+α)/2)σAA*) ].

Certainly S, which may be indefinite, satisfies (56), and P = S + Σ̂_f + σAA* = M ≻ 0. By using (71) and (72), one can compute P and P⁻¹ at a low cost if l is a small integer, for example l ≤ 6. One special case is AA* ≻ 0 and Σ_f = Σ̂_f = Q for some self-adjoint linear operator Q ⪰ 0. By taking α = 1, G = (1/2)Q + σAA*, and M to be a majorization of G as defined in (71), we have S = M − (Q + σAA*).
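As a quick numerical illustration of the special case in Example 4.1 (our check, with randomly generated data; all names are illustrative), the following MATLAB fragment confirms that S = ρ₁I − Q − σAA* can be indefinite while P = Σ̂_f + S + σAA* = ρ₁I remains positive definite:

% Example 4.1, special case Sigma_f = Sigma_f_hat = Q with alpha = 1:
% S = 0.5*lmax(Q)*I - Q + sigma*(lmax(AAt)*I - AAt) may be indefinite,
% yet P = Q + S + sigma*AAt = rho1*I is positive definite by construction.
n = 50; sigma = 0.01;
R  = randn(n);     Q   = (R'*R)/n;            % a symmetric positive semidefinite Q
Am = randn(10, n); AAt = Am'*Am;              % AA* as an n-by-n matrix
rho1 = 0.5*max(eig(Q)) + sigma*max(eig(AAt)); % the rho_1 of the special case
S = rho1*eye(n) - Q - sigma*AAt;              % the choice (75)
disp(min(eig(S)));                            % usually negative here: S indefinite
disp(norm(Q + S + sigma*AAt - rho1*eye(n)));  % ~0: P = rho1*I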

Example 4.3: p can be decomposed into two separate parts, p(x) := p₁(x₁) + p₂(x₂) with x := (x₁, x₂). For simplicity, we assume

    Σ_f = Q := [ Q₁₁  Q₁₂ ; Q₁₂*  Q₂₂ ]  and  Σ̂_f = Q + Diag(D₁, D₂),

where D₁ and D₂ are two self-adjoint and positive semidefinite linear operators. Define M := Diag(M₁, M₂), where

    M₁ := D₁ + Q₁₁ + (Q₁₂Q₁₂*)^{1/2} + σ( A₁A₁* + (A₁A₂*A₂A₁*)^{1/2} )

and

    M₂ := D₂ + Q₂₂ + (Q₁₂*Q₁₂)^{1/2} + σ( A₂A₂* + (A₂A₁*A₁A₂*)^{1/2} ).

If A₁A₁* + (A₁A₂*A₂A₁*)^{1/2} ≻ 0 and A₂A₂* + (A₂A₁*A₁A₂*)^{1/2} ≻ 0, we can choose

    S := M − Q − Diag(D₁, D₂) − σAA*;

otherwise, we can add a block diagonal self-adjoint positive definite linear operator to S. Then one can see that S, which again may be indefinite, satisfies (56) for α = 1, by using the fact that for any given linear operator X from X to another finite dimensional real Euclidean space, it holds that

    [ 0  X ; X*  0 ] ⪯ Diag( (XX*)^{1/2}, (X*X)^{1/2} ).

Thus, from the block diagonal structure of P = M, we can see that solving the subproblem for x can be split into solving two separate subproblems for the x₁-part and the x₂-part, respectively. If the subproblem for either the x₁- or the x₂-part is still difficult to solve, one may add a self-adjoint positive semidefinite linear operator to M₁ or M₂, respectively, to make the subproblem easier to solve. We refer the readers to Examples 4.1 and 4.2 for possible choices of such linear operators in different scenarios.

In Examples 4.1-4.3, we list various choices of proximal terms in different situations, where in order for S to be indefinite, neither does f need to be strongly convex nor does A need to be surjective. Nevertheless, these examples are far from exhaustive. For instance, if p(x) = p₁(x₁) with x := (x₁, ..., x_m) and f(x) is a convex quadratic function, one may construct Schur complement based, or more general symmetric Gauss-Seidel based, proximal terms to derive convergent ADMMs for solving some interesting multi-block conic optimization problems [25, 24].

5. The analysis of non-ergodic iteration-complexity. In this section, we present the non-ergodic iteration-complexity of the Majorized ipADMM for computing an ε-approximate KKT point. For related results, see the work of Davis and Yin [5] on operator-splitting schemes with separable objective functions and the work of Cui et al. [4] on the majorized ADMM with coupled objective functions.

Theorem 13. Assume that Assumptions 1 and 2 hold. Let {(x^i, y^i, z^i)} be generated by the Majorized ipADMM. Assume that τ ∈ (0, (1+√5)/2) and that for some α ∈ (τ/min(1+τ, 1+τ⁻¹), 1],

    Σ̂_f + S ⪰ 0,  H_f ≻ 0,  (1/2)Σ̂_g + T ⪰ 0  and  M_g ≻ 0,

where H_f and M_g are defined in (15). Then, we have

(76)    min_{1≤i≤k} { d²(0, ∂p(x^{i+1}) + ∇f(x^{i+1}) + Az^{i+1}) + d²(0, ∂q(y^{i+1}) + ∇g(y^{i+1}) + Bz^{i+1})
                      + ‖A*x^{i+1} + B*y^{i+1} − c‖² } = o(1/k)

and

(77)    min_{1≤i≤k} { p(x^i) + f(x^i) + q(y^i) + g(y^i) − [p(x̄) + f(x̄) + q(ȳ) + g(ȳ)] } = o(1/√k),

where (x̄, ȳ, z̄) ∈ W̄.

Proof. For each i ≥ 1, let φ_i be defined by (16) and

    a_i := ((−τ + α·min(1+τ, 1+τ⁻¹))/(τ²σ))‖z^{i+1} − z^i‖² + ‖x^{i+1} − x^i‖²_{H_f} + ‖y^{i+1} − y^i‖²_{M_g}.

Since τ ∈ (0, (1+√5)/2), α ∈ (τ/min(1+τ, 1+τ⁻¹), 1], Σ̂_f + S ⪰ 0, H_f ≻ 0, Σ̂_g + T ⪰ (1/2)Σ̂_g + T ⪰ 0 and M_g ≻ 0, we have

    −τ + α·min(1+τ, 1+τ⁻¹) > 0,  a_i ≥ 0  and  φ_i + (1 − α·min(τ, τ⁻¹))σ‖r^i‖² + αξ_i ≥ 0

for any i ≥ 1. It follows from (55) and the definitions of t_{i+1} and r^{i+1} that for any i ≥ 1 we have

    a_i = t_{i+1} + (−τ + α·min(1+τ, 1+τ⁻¹))σ‖r^{i+1}‖²
        ≤ [φ_i + (1 − α·min(τ, τ⁻¹))σ‖r^i‖² + αξ_i] − [φ_{i+1} + (1 − α·min(τ, τ⁻¹))σ‖r^{i+1}‖² + αξ_{i+1}].

For any k ≥ 1, summing the above inequality over i = 1, ..., k, we obtain

    Σ_{i=1}^{k} a_i ≤ φ_1 + (1 − α·min(τ, τ⁻¹))σ‖r^1‖² + αξ_1.

From the above inequality, we have Σ_{i=1}^{∞} a_i < +∞. Then, by Lemma 3 and using H_f ≻ 0 and M_g ≻ 0, we get min_{1≤i≤k}{a_i} = o(1/k), that is,

(78)    min_{1≤i≤k} { ‖z^{i+1} − z^i‖² + ‖x^{i+1} − x^i‖² + ‖y^{i+1} − y^i‖² } = o(1/k).

It follows from the first-order optimality condition of (14) that

    −∇f(x^i) − A[z^i + σ(A*x^{i+1} + B*y^i − c)] − (Σ̂_f + S)(x^{i+1} − x^i) ∈ ∂p(x^{i+1}).

And then, from the definition of z^{i+1} in (14), we have

(79)    ∇f(x^{i+1}) − ∇f(x^i) + (1 − 1/τ)A(z^{i+1} − z^i) + σAB*(y^{i+1} − y^i) − (Σ̂_f + S)(x^{i+1} − x^i)
        ∈ ∂p(x^{i+1}) + ∇f(x^{i+1}) + Az^{i+1}.

Similarly, we get

(80)    ∇g(y^{i+1}) − ∇g(y^i) + (1 − 1/τ)B(z^{i+1} − z^i) − (Σ̂_g + T)(y^{i+1} − y^i)
        ∈ ∂q(y^{i+1}) + ∇g(y^{i+1}) + Bz^{i+1}.

It follows from (14) that

(81)    ‖A*x^{i+1} + B*y^{i+1} − c‖² = (1/(τσ)²)‖z^{i+1} − z^i‖².

By using the Cauchy-Schwarz inequality, (79), (80) and (81), we have

    d²(0, ∂p(x^{i+1}) + ∇f(x^{i+1}) + Az^{i+1}) + d²(0, ∂q(y^{i+1}) + ∇g(y^{i+1}) + Bz^{i+1}) + ‖A*x^{i+1} + B*y^{i+1} − c‖²
    ≤ 4‖∇f(x^{i+1}) − ∇f(x^i)‖² + 4(1 − 1/τ)²‖A(z^{i+1} − z^i)‖² + 4σ²‖AB*(y^{i+1} − y^i)‖²
      + 4‖(Σ̂_f + S)(x^{i+1} − x^i)‖² + 3‖∇g(y^{i+1}) − ∇g(y^i)‖² + 3(1 − 1/τ)²‖B(z^{i+1} − z^i)‖²
      + 3‖(Σ̂_g + T)(y^{i+1} − y^i)‖² + (1/(τσ)²)‖z^{i+1} − z^i‖².

It follows from the above inequality, Assumption 1 and (78) that the assertion (76) is proved.

By using (12), for any (x, y) ∈ X × Y, we obtain

    p(x) + f(x) + q(y) + g(y) − [p(x̄) + f(x̄) + q(ȳ) + g(ȳ)] + ⟨z̄, A*x + B*y − c⟩ ≥ 0.

Setting x = x^i and y = y^i in the above inequality, we get

(82)    p(x^i) + f(x^i) + q(y^i) + g(y^i) − [p(x̄) + f(x̄) + q(ȳ) + g(ȳ)] ≥ −⟨z̄, A*x^i + B*y^i − c⟩.

Note that p, f, q and g are convex functions and the sequence {(x^i, y^i, z^i)} generated by the Majorized ipADMM is bounded. For any u ∈ ∂p(x^i) and v ∈ ∂q(y^i), using A*x̄ + B*ȳ = c, we obtain

    p(x̄) + f(x̄) + q(ȳ) + g(ȳ) − [p(x^i) + f(x^i) + q(y^i) + g(y^i)]
    ≥ ⟨u + ∇f(x^i), x̄ − x^i⟩ + ⟨v + ∇g(y^i), ȳ − y^i⟩
    = ⟨u + ∇f(x^i) + Az^i, x̄ − x^i⟩ + ⟨z^i, A*x^i − A*x̄⟩ + ⟨v + ∇g(y^i) + Bz^i, ȳ − y^i⟩ + ⟨z^i, B*y^i − B*ȳ⟩
    = ⟨u + ∇f(x^i) + Az^i, x̄ − x^i⟩ + ⟨v + ∇g(y^i) + Bz^i, ȳ − y^i⟩ + ⟨z^i, A*x^i + B*y^i − c⟩,

which, together with (76) and (82), implies (77). The proof is complete.

6. Numerical experiments. We consider the following problem to illustrate the benefit that can be brought about by using an indefinite proximal term, instead of the standard requirement of a positive semidefinite proximal term, when applying the semi-proximal ADMM:

(83)    min_{x∈R^n, y∈R^m} { (1/2)⟨x, Qx⟩ − ⟨b, x⟩ + (χ/2)‖Π_{R^m_+}(D(d − Hx))‖² + ϱ‖x‖₁ + δ_{R^m_+}(y) | Hx + y = c },

where ‖x‖₁ := Σ_{i=1}^{n} |x_i|, δ_{R^m_+} is the indicator function of R^m_+, Q is an n × n symmetric and positive semidefinite matrix (which may not be positive definite), H ∈ R^{m×n}, b ∈ R^n, c ∈ R^m and ϱ > 0 are given data, and D is a positive definite diagonal matrix chosen to normalize each row of H to have unit norm. In addition, d ≤ c are given vectors, χ is a nonnegative penalty parameter, and Π_{R^m_+} denotes the projection onto R^m_+. Observe that when the parameter χ is chosen to be positive, one may view the term (χ/2)‖Π_{R^m_+}(D(d − Hx))‖² as the penalty for failing to satisfy the soft constraint Hx − d ≥ 0.

Problem (83) can be expressed in the form of (1) by taking

    f(x) = (1/2)⟨x, Qx⟩ − ⟨b, x⟩ + (χ/2)‖Π_{R^m_+}(D(d − Hx))‖²,  p(x) = ϱ‖x‖₁,
    g(y) ≡ 0,  q(y) = δ_{R^m_+}(y),

with A* = H and B* = I. The KKT system for problem (83) is given by

(84)    Hx + y − c = 0,  ∇f(x) + Hᵀξ + v = 0,  y ≥ 0,  ξ ≥ 0,  y ∘ ξ = 0,  v ∈ ϱ∂‖x‖₁,

where ∘ denotes the elementwise product. In our numerical experiments, we apply the Majorized ipADMM to solve problem (83) using both the step-length parameters τ = 1.618 and τ = 1. We stop the Majorized ipADMM based on the following relative residual for the KKT system:

(85)    max{ ‖Hx^{k+1} + y^{k+1} − c‖ / (1 + ‖c‖),  ‖∇f(x^{k+1}) + Hᵀξ^{k+1} + v^{k+1}‖ / (1 + ‖b‖) } ≤ 10⁻⁶.

Note that in the process of computing the iterates x^{k+1} and y^{k+1} in the Majorized ipADMM, we can generate the corresponding dual variables ξ^{k+1} and v^{k+1} which satisfy the complementarity conditions in (84). Thus we need not check the complementarity conditions in (84), since they are satisfied exactly.

In our numerical experiments, for a given pair (m, n), we generate the data for (83) randomly as follows:

    Q = sprandn(floor(0.1*n), n, 0.1); Q = Q'*Q;
    H = sprandn(m, n, 0.2);
    xx = randn(n,1);
    c = H*xx + max(randn(m,1), 0);
    b = Q*xx;

and we set the parameter ϱ = 5 n. We take d = c − 5e, where e is the vector of all ones. As we can see, Q is positive semidefinite but not positive definite. Note that for the data generated from the above scheme, the optimal solution (x̄, ȳ) of (83) has the property that both x̄ and ȳ are nonzero vectors, but each has a significant portion of zero components. We have tested other schemes for generating the data, but the corresponding optimal solutions are not interesting enough in that ȳ is usually the zero vector. In the next two subsections, we consider two separate cases for evaluating the performance of the Majorized ipADMM in solving (83).
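For concreteness, the stopping test (85) can be computed as in the following MATLAB fragment (ours; it assumes the iterates x, y and the dual variables xi, v from (84), together with gradf = ∇f(x), are available):

% Relative KKT residual (85):
res_prim = norm(H*x + y - c) / (1 + norm(c));         % primal feasibility
res_dual = norm(gradf + H'*xi + v) / (1 + norm(b));   % dual feasibility
stop = max(res_prim, res_dual) <= 1e-6;               % termination test (85)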

6.1. The case where the parameter χ in (83) is zero. In the first case, we set the penalty parameter χ = 0. Hence f(x) is simply a convex quadratic function, and we can omit the word "Majorized" since there is no majorization on f or g. For this case, we have Σ_f = Q = Σ̂_f and Σ_g = 0 = Σ̂_g. We consider the following two choices of the proximal terms for the ipADMM.

(a) The proximal terms in the ipADMM are chosen according to (75) with

(86)    S := ρ₁ I − Σ̂_f − σAA*,  T := 0,

where ρ₁ is the constant given in (74), i.e.,

(87)    ρ₁ := 1.01 λ_max( Σ̂_f − (1/2)Σ_f + ((1+α)/2)σAA* ).

In the notation of (75), we fix α = 1 when we choose τ = 1.618. In this case, we get Σ̂_f + S = ρ₁ I − σAA* ⪰ Σ̂_f − (1/2)Σ_f ⪰ 0. When we choose τ = 1, we need to restrict α to the interval (0.5, 1]. In this case, we also pick α = 1, and the condition Σ̂_f + S ⪰ Σ̂_f − (1/2)Σ_f − ((1−α)/2)σAA* ⪰ 0 is satisfied. With the above choices of the proximal terms, we have

    Σ̂_f + S + σAA* = ρ₁ I,  Σ̂_g + T + σBB* = σI.

Furthermore, the conditions (54) and (56) in Theorem 10 are satisfied. Therefore the convergence of the ipADMM is ensured even though S is an indefinite matrix.

(b) The proximal terms in the ipADMM are chosen based on Part (a) of Theorem 10 as follows:

(88)    S := ρ₂ I − Σ̂_f − σAA*,  T := 0,

where, for a chosen parameter η ∈ (0, 1/2), say η := 0.49, and γ₂ := 1.1,

(89)    ρ₂ := λ_max( (1/2)Q + γ₂(1−η)σAA* ).

Observe that ρ₂ is a smaller constant than ρ₁, and hence the proximal term S in (88) is more indefinite than the one in (86). In this case, we can easily check that the conditions in (52) for S and T are satisfied. In particular,

    Σ̂_f + S + ησAA* ⪰ (1−η)σ(λ_max(AA*) I − AA*) + (γ₂ − 1)(1−η)σλ_max(AA*) I ≻ 0.

However, in order to ensure that the ipADMM with the proximal terms chosen in (88) is convergent, we need to monitor the residual

(90)    R_{k+1} := ‖x^{k+1} − x^k‖²_{Σ̂_f} + ‖y^{k+1} − y^k‖²_{Σ̂_g+σBB*} + ‖r^{k+1}‖²

in condition (53) as follows. At the kth iteration, if Σ_{j=1}^{k+1} R_j ≥ 50 and R_{k+1} ≥ 10/(k+1)^{1.1}, we restart the ipADMM using the best iterate generated so far as the initial point, with a new S as in (88) but with the parameter γ₂ in (89) increased by a factor of 1.1. Obviously, the number of such restarts is finite, since eventually ρ₂ will be larger than ρ₁ in (87). In our numerical runs, we only encountered such a restart once when testing all the instances.

Table 1 reports the comparison between the ipADMM, whose proximal term S is the indefinite matrix given in (86), and the semi-proximal ADMM in [] (denoted spADMM), where the proximal term S is replaced by the positive semidefinite matrix λ_max(Q + σAA*)I − (Q + σAA*). Table 2 is the same as Table 1, except that the comparison is between the ipADMM whose indefinite proximal term S is given by (88) and the spADMM. We can see from the results in both tables that the ipADMM can sometimes bring about a 40-50% reduction in the number of iterations needed to solve problem (83) as compared to the spADMM. In addition, the ipADMM using the more aggressive indefinite proximal term S in (88) sometimes takes substantially fewer iterations to solve the problems than with the more conservative choice in (86).

Table 1. Comparison between the number of iterations (time in seconds) taken by the ipADMM, for S given by (86), and the spADMM, for S = λ_max(Q + σAA*)I − (Q + σAA*), to solve problem (83), with χ = 0, to the required accuracy stated in (85). The "ratio %" columns report the ipADMM/spADMM ratios of iterations / times.

  m     n     | spADMM τ=1.618 | ipADMM τ=1.618 | ratio %     | spADMM τ=1   | ipADMM τ=1   | ratio %
  2000  1000  | 8977 (34.9)    | 857 (33.)      | 95.5 / 94.8 | 077 (43.4)   | 9382 (36.)   | 84.7 / 83.
  2000  2000  | 807 (8.9)      | 406 (6.)       | 77.8 / 85.2 | 970 (20.0)   | 398 (5.6)    | 7.0 / 77.9
  2000  4000  | 224 (43.9)     | 74 (29.5)      | 58.3 / 67.2 | 254 (43.9)   | 784 (3.7)    | 62.5 / 72.
  2000  8000  | 00 (3.9)       | 63 (83.)       | 55.7 / 73.0 | 03 (5.4)     | 647 (84.4)   | 58.7 / 73.2
  4000  2000  | 9863 (8.4)     | 9222 (68.3)    | 93.5 / 92.8 | 99 (29.2)    | 043 (88.4)   | 87.4 / 86.0
  4000  4000  | 2004 (04.2)    | 273 (7.9)      | 63.5 / 69.  | 240 (0.8)    | 490 (80.7)   | 69.6 / 72.9
  4000  8000  | 408 (9.5)      | 829 (33.8)     | 58.9 / 69.9 | 396 (93.)    | 865 (38.0)   | 62.0 / 7.5
  4000  16000 | 745 (589.9)    | 95 (397.)      | 52.4 / 67.3 | 9 (648.6)    | 982 (407.0)  | 5.4 / 62.8
  8000  4000  | 068 (902.0)    | 0288 (87.6)    | 96.3 / 96.6 | 387 (4.7)    | 458 (969.8)  | 86.9 / 87.0
  8000  8000  | 2037 (49.2)    | 220 (266.7)    | 59.9 / 63.6 | 2085 (426.8) | 296 (277.4)  | 62.2 / 65.0
  8000  16000 | 594 (798.)     | 97 (549.7)     | 57.5 / 68.9 | 696 (842.6)  | 994 (569.0)  | 58.6 / 67.5

Table 2. Comparison between the number of iterations (time in seconds) taken by the ipADMM, for S given by (88), and the spADMM, for S = λ_max(Q + σAA*)I − (Q + σAA*), to solve problem (83), with χ = 0, to the required accuracy stated in (85).

  m     n     | spADMM τ=1.618 | ipADMM τ=1.618 | ratio %     | spADMM τ=1   | ipADMM τ=1   | ratio %
  2000  1000  | 8977 (36.0)    | 626 (25.3)     | 69.2 / 70.2 | 077 (4.0)    | 8094 (30.0)  | 73. / 73.2
  2000  2000  | 807 (20.)      | 00 (2.6)       | 55.4 / 62.6 | 970 (20.7)   | 9 (2.8)      | 56.8 / 6.9
  2000  4000  | 224 (44.0)     | 659 (29.3)     | 53.8 / 66.5 | 254 (43.9)   | 734 (3.0)    | 58.5 / 70.7
  2000  8000  | 00 (5.7)       | 593 (80.8)     | 53.9 / 69.9 | 03 (4.)      | 626 (85.4)   | 56.8 / 74.9
  4000  2000  | 9863 (8.7)     | 750 (3.3)      | 72.5 / 72.3 | 99 (28.0)    | 8762 (58.3)  | 73.5 / 72.6
  4000  4000  | 2004 (04.4)    | 48 (65.6)      | 57.3 / 62.8 | 240 (0.)     | 280 (7.0)    | 59.8 / 64.5
  4000  8000  | 408 (92.2)     | 805 (34.4)     | 57.2 / 69.9 | 396 (9.3)    | 846 (36.6)   | 60.6 / 7.4
  4000  16000 | 747 (588.0)    | 900 (392.0)    | 5.5 / 66.7  | 9 (658.0)    | 930 (403.9)  | 48.7 / 6.4
  8000  4000  | 068 (900.8)    | 8 (689.5)      | 75.9 / 76.5 | 387.         | 9587 (82.0)  | 72.7 / 73.9
  8000  8000  | 2037 (420.2)   | 09 (248.8)     | 54.4 / 59.2 | 2085 (423.8) | 22 (269.2)   | 58.6 / 63.5
  8000  16000 | 594 (802.2)    | 902 (539.5)    | 56.6 / 67.3 | 696 (845.0)  | 942 (55.5)   | 55.5 / 65.3

6.2. The case where the parameter χ in (83) is positive. In the second case, we consider problem (83) where the parameter χ is set to χ = 2ϱ. In this case, a majorization of f is necessary in order for the corresponding subproblems in the ipADMM or spADMM to be solved efficiently. Here we have

    Σ̂_f = Q + χHᵀD²H,  Σ_f = Q,  Σ̂_g = 0 = Σ_g.

For problem (83) with χ = 2ϱ, the indefinite proximal term S given in (86) is too conservative, due to the fact that Σ̂_f is substantially larger than Σ_f.

In order to realize the full potential of allowing an indefinite proximal term, we make use of condition (53) in Part (a) of Theorem 10 by choosing S and T as follows:

(91)    S := ρ₃ I − Σ̂_f − σAA*,  T := 0,

where, for a chosen parameter η ∈ (0, 1/2), say η := 0.49, and γ₃ := 0.25,

(92)    ρ₃ := λ_max( (1/2)Q + ((1−η)σ + γ₃χ)AA* ).

In this case, we can easily check that the conditions in (52) for S and T are satisfied. In particular,

    Σ̂_f + S + ησAA* ⪰ (1−η)σ(λ_max(AA*) I − AA*) + γ₃χλ_max(AA*) I ≻ 0.

Again, in order to ensure that the Majorized ipADMM with the proximal terms chosen in (91) is convergent, we need to monitor the residual, defined similarly as in (90),