Inexact Alternating-Direction-Based Contraction Methods for Separable Linearly Constrained Convex Optimization
J Optim Theory Appl (2014) 163:105–129
DOI 10.1007/s…z

Inexact Alternating-Direction-Based Contraction Methods for Separable Linearly Constrained Convex Optimization

Guoyong Gu · Bingsheng He · Junfeng Yang

Received: 9 February 2013 / Accepted: 7 November 2013 / Published online: 3 December 2013
© Springer Science+Business Media New York 2013

Abstract  The alternating direction method of multipliers has been well studied in the context of linearly constrained convex optimization. In the last few years, we have witnessed a number of novel applications arising from image processing, compressive sensing, and statistics, etc., where the approach is surprisingly efficient. In the early applications, the objective function of the linearly constrained convex optimization problem is separable into two parts. Recently, the alternating direction method of multipliers has been extended to the case where the number of the separable parts in the objective function is finite. However, in each iteration, the subproblems are required to be solved exactly. In this paper, by introducing some reasonable inexactness criteria, we propose two inexact alternating-direction-based contraction methods, which substantially broaden the applicable scope of the approach. The convergence and complexity results for both methods are derived in the framework of variational inequalities.

Keywords  Alternating direction method of multipliers · Separable linearly constrained convex optimization · Contraction method · Complexity

Communicated by Panos M. Pardalos.

G. Gu (B) · B. He · J. Yang
Department of Mathematics, Nanjing University, Nanjing 210093, China
e-mail: ggu@nju.edu.cn (G. Gu); hebma@nju.edu.cn (B. He); jfyang@nju.edu.cn (J. Yang)
1 Introduction

Because of its significant efficiency and easy implementation, the alternating direction method of multipliers (ADMM) has attracted wide attention in various areas [1, 2]. In particular, some novel and attractive applications of the ADMM have been discovered very recently; e.g., total-variation regularization problems in image processing [3], $\ell_1$-norm minimization in compressive sensing [4], semidefinite optimization problems [5], the covariance selection problem and semidefinite least squares problem in statistics [6, 7], the sparse and low-rank recovery problem in engineering [8], etc.

The paper is organized as follows. As a preparation, in the next section, we give a brief review of the ADMM, which serves as a motivation of our inexact alternating-direction-based contraction methods. Some basic properties of the projection mappings and variational inequalities are recalled as well. Then, in Sect. 3 we present two contraction methods, which are based on two descent directions generated from an inexact alternating minimization of $\mathcal{L}_A$ defined in (4). The rationale of the two descent search directions follows in Sect. 4, and the convergence and complexity results are proved in Sects. 5 and 6, respectively. Finally, some conclusions are drawn in Sect. 7.

2 Preliminaries

Before introducing the ADMM, we briefly review the augmented Lagrangian method (ALM), for which we consider the following linearly constrained convex optimization problem:

$$\min\{\theta(x) \mid Ax = b,\ x \in \mathcal{X}\}. \tag{1}$$

Here $\theta(x): \mathbb{R}^n \to \mathbb{R}$ is a convex function (not necessarily smooth), $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, and $\mathcal{X} \subseteq \mathbb{R}^n$ is a closed and convex set. For solving problem (1), the classical ALM generates a sequence of iterates via the following scheme:

$$\begin{cases} x^{k+1} = \arg\min\{\theta(x) - \langle \lambda^k, Ax - b\rangle + \tfrac{1}{2}\|Ax - b\|_H^2 \mid x \in \mathcal{X}\},\\[2pt] \lambda^{k+1} = \lambda^k - H(Ax^{k+1} - b), \end{cases}$$

where $H \in \mathbb{R}^{m\times m}$ is a positive definite scaling matrix penalizing the violation of the linear constraints, and $\lambda \in \mathbb{R}^m$ is the associated Lagrange multiplier; see, e.g., [9, 10] for more details. An important special
scenario of (1), which captures concrete applications in many fields [6, 3], is the following case, where the objective function is separable into two parts:

$$\min\{\theta_1(x_1) + \theta_2(x_2) \mid A_1x_1 + A_2x_2 = b,\ x_i \in \mathcal{X}_i,\ i = 1, 2\}. \tag{2}$$

Here $n_1 + n_2 = n$, and, for $i = 1, 2$, $\theta_i: \mathbb{R}^{n_i} \to \mathbb{R}$ are convex functions (not necessarily smooth), $A_i \in \mathbb{R}^{m\times n_i}$, and $\mathcal{X}_i \subseteq \mathbb{R}^{n_i}$ are closed and convex sets. For solving (2), the ADMM, which dates back to [14] and is closely related to the Douglas–Rachford
operator splitting method [15], is perhaps one of the most popular methods. Given $(x_2^k, \lambda^k)$, the ADMM generates $(x_1^{k+1}, x_2^{k+1}, \lambda^{k+1})$ via the following scheme:

$$\begin{cases} x_1^{k+1} = \arg\min\{\theta_1(x_1) - \langle\lambda^k, A_1x_1 + A_2x_2^k - b\rangle + \tfrac12\|A_1x_1 + A_2x_2^k - b\|_H^2 \mid x_1\in\mathcal{X}_1\},\\[2pt] x_2^{k+1} = \arg\min\{\theta_2(x_2) - \langle\lambda^k, A_1x_1^{k+1} + A_2x_2 - b\rangle + \tfrac12\|A_1x_1^{k+1} + A_2x_2 - b\|_H^2 \mid x_2\in\mathcal{X}_2\},\\[2pt] \lambda^{k+1} = \lambda^k - H(A_1x_1^{k+1} + A_2x_2^{k+1} - b). \end{cases}$$

Therefore, the ADMM can be viewed as a practical and structure-exploiting variant (in a split or relaxed form) of the classical ALM for solving the separable problem (2), with the adaptation of minimizing the involved separable variables separately in an alternating order.

In this paper, we consider a more general separable case of (2) in the sense that the objective function is separable into finitely many parts:

$$\min\Big\{\sum_{i=1}^N \theta_i(x_i)\ \Big|\ \sum_{i=1}^N A_ix_i = b,\ x_i\in\mathcal{X}_i,\ i=1,\ldots,N\Big\}, \tag{3}$$

where $\sum_{i=1}^N n_i = n$, and, for $i=1,\ldots,N$, $\theta_i:\mathbb{R}^{n_i}\to\mathbb{R}$ are convex functions (not necessarily smooth), $A_i\in\mathbb{R}^{m\times n_i}$, $b\in\mathbb{R}^m$, and $\mathcal{X}_i\subseteq\mathbb{R}^{n_i}$ are closed and convex sets. Without loss of generality, we assume that the solution set of (3) is nonempty. Recently, the ADMM was extended to handle (3) by He et al. [16]. For convenience, we denote the augmented Lagrangian function of (3) by

$$\mathcal{L}_A(x_1,\ldots,x_N,\lambda) := \sum_{i=1}^N \theta_i(x_i) - \Big\langle\lambda, \sum_{i=1}^N A_ix_i - b\Big\rangle + \frac12\Big\|\sum_{i=1}^N A_ix_i - b\Big\|_H^2, \tag{4}$$

where $H\in\mathbb{R}^{m\times m}$ is a symmetric positive definite scaling matrix, and $\lambda\in\mathbb{R}^m$ is the Lagrange multiplier. Given $(x_1^k,\ldots,x_N^k,\lambda^k)$, the algorithm in [16] generates the next iterate $(x_1^{k+1},\ldots,x_N^{k+1},\lambda^{k+1})$ in two steps. First, the algorithm produces a trial point $(\bar x_1^k,\ldots,\bar x_N^k,\bar\lambda^k)$ by the following alternating minimization scheme:

$$\begin{cases} \bar x_i^k = \arg\min\{\mathcal{L}_A(\bar x_1^k,\ldots,\bar x_{i-1}^k, x_i, x_{i+1}^k,\ldots,x_N^k,\lambda^k) \mid x_i\in\mathcal{X}_i\},\quad i=1,\ldots,N,\\[2pt] \bar\lambda^k = \lambda^k - H\big(\sum_{i=1}^N A_i\bar x_i^k - b\big). \end{cases} \tag{5}$$

Then, the next iterate $(x_1^{k+1},\ldots,x_N^{k+1},\lambda^{k+1})$ is obtained from $(x_1^k,\ldots,x_N^k,\lambda^k)$ and the trial point $(\bar x_1^k,\ldots,\bar x_N^k,\bar\lambda^k)$. To the best of our knowledge, it is the first time that tractable algorithms for (3) based on the full utilization of their separable structure have been developed. As demonstrated in [16], the resulting method falls into the frameworks of both the descent-like methods (in the sense that the
iterates generated by the extended ADMM scheme (5) can be used to construct descent directions) and the contraction-type methods (according to the definition in [17]), as the distance between the iterates and the solution set of (3) is monotonically decreasing. Therefore, the method is called the alternating-direction-based contraction method.
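As an illustration (our own, not part of the paper), the two-block ADMM scheme above admits closed-form subproblems when each $\theta_i$ is a simple quadratic; the toy instance below uses $\theta_i(x_i) = \tfrac12\|x_i - c_i\|^2$, $\mathcal{X}_i = \mathbb{R}^{n_i}$, and $H = \beta I$:

```python
import numpy as np

# Toy two-block ADMM sketch (our own illustration, not from the paper):
# minimize 0.5||x1 - c1||^2 + 0.5||x2 - c2||^2  s.t.  A1 x1 + A2 x2 = b,
# with X1 = R^{n1}, X2 = R^{n2} and H = beta * I, so every subproblem
# has a closed-form solution.
rng = np.random.default_rng(1)
m, n1, n2, beta = 4, 3, 3, 1.0
A1, A2 = rng.standard_normal((m, n1)), rng.standard_normal((m, n2))
b = rng.standard_normal(m)
c1, c2 = rng.standard_normal(n1), rng.standard_normal(n2)

x1, x2, lam = np.zeros(n1), np.zeros(n2), np.zeros(m)
for k in range(2000):
    # x1-subproblem: (I + beta*A1^T A1) x1 = c1 + A1^T lam + beta*A1^T (b - A2 x2)
    x1 = np.linalg.solve(np.eye(n1) + beta * A1.T @ A1,
                         c1 + A1.T @ lam + beta * A1.T @ (b - A2 @ x2))
    # x2-subproblem uses the freshly updated x1 (alternating order)
    x2 = np.linalg.solve(np.eye(n2) + beta * A2.T @ A2,
                         c2 + A2.T @ lam + beta * A2.T @ (b - A1 @ x1))
    # multiplier update: lam <- lam - H (A1 x1 + A2 x2 - b)
    lam -= beta * (A1 @ x1 + A2 @ x2 - b)

print(np.linalg.norm(A1 @ x1 + A2 @ x2 - b))  # primal residual, tends to 0
```

The alternating order is essential: the $x_2$-subproblem already uses the updated $x_1$, which is exactly what distinguishes ADMM from a joint augmented Lagrangian minimization.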
From a practical point of view, however, in many cases, solving each subproblem in (5) accurately is either expensive or even impossible. On the other hand, there seems to be little justification of the effort required to calculate the accurate solutions at each iteration. In this paper, we develop two inexact methods for solving problem (3), in which the subproblems in (5) are solved inexactly. So, the methods presented in this paper are named inexact alternating-direction-based contraction methods.

In the sequel, we briefly review some basic properties and related definitions that will be used in the forthcoming analysis. We consider the generally separable linearly constrained convex optimization problem (3). For $i=1,\ldots,N$, we let $f_i(x_i)\in\partial\theta_i(x_i)$, where $\partial\theta_i(x_i)$ denotes the subdifferential of $\theta_i$ at $x_i$. Moreover, we let $\mathcal{W} := \mathcal{X}_1\times\cdots\times\mathcal{X}_N\times\mathbb{R}^m$. Then, it is evident that the first-order optimality condition of (3) is equivalent to the following variational inequality of finding $(x_1^*,\ldots,x_N^*,\lambda^*)\in\mathcal{W}$:

$$\begin{cases}\langle x_i - x_i^*,\ f_i(x_i^*) - A_i^T\lambda^*\rangle \ge 0,\quad i=1,\ldots,N,\\[2pt] \big\langle\lambda-\lambda^*,\ \sum_{i=1}^N A_ix_i^* - b\big\rangle \ge 0,\end{cases}\qquad\forall (x_1,\ldots,x_N,\lambda)\in\mathcal{W}. \tag{6}$$

By defining

$$w := \begin{pmatrix}x_1\\\vdots\\x_N\\\lambda\end{pmatrix}\qquad\text{and}\qquad F(w) := \begin{pmatrix}f_1(x_1)-A_1^T\lambda\\\vdots\\f_N(x_N)-A_N^T\lambda\\\sum_{i=1}^N A_ix_i - b\end{pmatrix},$$

the problem (6) can be rewritten in a more compact form as

$$\langle w - w^*,\ F(w^*)\rangle \ge 0\quad\forall w\in\mathcal{W},$$

which we denote by VI$(\mathcal{W},F)$. Recall that $F(w)$ is said to be monotone iff

$$\langle u-v,\ F(u)-F(v)\rangle \ge 0\quad\forall u, v\in\mathcal{W}.$$

One can easily verify that $F(w)$ is monotone whenever all $f_i$, $i=1,\ldots,N$, are monotone. Under the nonempty assumption on the solution set of (3), the solution set $\mathcal{W}^*$ of VI$(\mathcal{W},F)$ is nonempty and convex.

Given a positive definite matrix $G$ of size $(n+m)\times(n+m)$, we define the $G$-norm of $u\in\mathbb{R}^{n+m}$ as $\|u\|_G = \sqrt{\langle u, Gu\rangle}$. The projection onto $\mathcal{W}$ under the $G$-norm is defined as

$$P_{\mathcal{W},G}(v) := \arg\min\{\|v-w\|_G \mid w\in\mathcal{W}\}.$$

From the above definition it follows that

$$\langle v - P_{\mathcal{W},G}(v),\ G(w - P_{\mathcal{W},G}(v))\rangle \le 0\quad\forall v\in\mathbb{R}^{n+m},\ \forall w\in\mathcal{W}. \tag{7}$$

Consequently, we have

$$\|P_{\mathcal{W},G}(u) - P_{\mathcal{W},G}(v)\|_G \le \|u-v\|_G\quad\forall u, v\in\mathbb{R}^{n+m}$$
and

$$\|P_{\mathcal{W},G}(v) - w\|_G^2 \le \|v-w\|_G^2 - \|v - P_{\mathcal{W},G}(v)\|_G^2\quad\forall v\in\mathbb{R}^{n+m},\ \forall w\in\mathcal{W}. \tag{8}$$

An important property of the projection is contained in the following lemma, for which the omitted proof can be found in [18].

Lemma 2.1  Let $\mathcal{W}$ be a closed convex set in $\mathbb{R}^{n+m}$, and let $G$ be any $(n+m)\times(n+m)$ positive definite matrix. Then $w^*$ is a solution of VI$(\mathcal{W},F)$ if and only if

$$w^* = P_{\mathcal{W},G}\big[w^* - \alpha G^{-1}F(w^*)\big]\quad\forall\alpha>0.$$

In other words, we have

$$w^* = P_{\mathcal{W},G}\big[w^* - \alpha G^{-1}F(w^*)\big]\ \Longleftrightarrow\ w^*\in\mathcal{W},\ \langle w-w^*, F(w^*)\rangle\ge 0\quad\forall w\in\mathcal{W}. \tag{9}$$

3 Two Contraction Methods

In this section, we present two algorithms, each of which consists of a search direction and a step length. The search directions are based on the inexact alternating minimization of $\mathcal{L}_A$, while both algorithms use the same step length.

3.1 Search Directions

For given $w^k = (x_1^k,\ldots,x_N^k,\lambda^k)$, the alternating direction scheme of the $k$th iteration generates a trial point $\bar w^k = (\bar x_1^k,\ldots,\bar x_N^k,\bar\lambda^k)$ via the following procedure. For $i=1,\ldots,N$, $\bar x_i^k$ is computed as

$$\bar x_i^k = P_{\mathcal{X}_i}\Big\{\bar x_i^k - \Big[f_i(\bar x_i^k) - A_i^T\lambda^k + A_i^TH\Big(\sum_{j=1}^{i}A_j\bar x_j^k + \sum_{j=i+1}^{N}A_jx_j^k - b\Big)\Big] + \xi_i^k\Big\}, \tag{10}$$

where

$$\|\xi_i^k\| \le \|A_i(x_i^k-\bar x_i^k)\|_H\qquad\text{and}\qquad \langle x_i^k-\bar x_i^k,\ \xi_i^k\rangle \le \tfrac14\|A_i(x_i^k-\bar x_i^k)\|_H^2. \tag{11}$$

Then, we set

$$\bar\lambda^k = \lambda^k - H\Big(\sum_{j=1}^N A_j\bar x_j^k - b\Big). \tag{12}$$

We claim that the above approximation scheme is appropriate for inexactly solving the minimization subproblems, i.e.,

$$\min\big\{\mathcal{L}_A(\bar x_1^k,\ldots,\bar x_{i-1}^k, x_i, x_{i+1}^k,\ldots,x_N^k,\lambda^k) \mid x_i\in\mathcal{X}_i\big\}. \tag{13}$$
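As a side illustration (ours, not the paper's): when the feasible set is a box and $G$ is diagonal, the $G$-norm projection used throughout reduces to coordinatewise clipping, and the characterization (7) can be checked numerically:

```python
import numpy as np

# Box set W = [lo, hi]^n and diagonal positive definite G: the G-projection
# argmin ||v - w||_G over W decouples coordinatewise and is plain clipping.
rng = np.random.default_rng(2)
n = 6
g = rng.uniform(0.5, 2.0, n)            # diagonal of G
lo, hi = -np.ones(n), np.ones(n)

def proj_box_G(v):
    # the clipping rule is independent of g, as long as G is diagonal
    # and positive definite (each coordinate is minimized separately)
    return np.clip(v, lo, hi)

v = 3.0 * rng.standard_normal(n)
p = proj_box_G(v)
w = np.clip(3.0 * rng.standard_normal(n), lo, hi)   # an arbitrary point of W
# property (7): <v - P(v), G (w - P(v))> <= 0 for every w in W
print((v - p) @ (g * (w - p)))
```

For a general (non-diagonal) $G$ or a coupled set, the projection is itself an optimization problem, which is why the update form using $P_{\mathcal{W},G}$ is recommended only when this projection is cheap.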
Assume that $\tilde x_i$ is an optimal solution of (13). By the definition of $\mathcal{L}_A$ in (4), the first-order optimality condition of (13) reduces to finding $\tilde x_i\in\mathcal{X}_i$ such that

$$\Big\langle x_i - \tilde x_i,\ f_i(\tilde x_i) - A_i^T\lambda^k + A_i^TH\Big(\sum_{j=1}^{i-1}A_j\bar x_j^k + A_i\tilde x_i + \sum_{j=i+1}^N A_jx_j^k - b\Big)\Big\rangle \ge 0\quad\forall x_i\in\mathcal{X}_i,$$

where $f_i(\tilde x_i)$ is a subgradient of $\theta_i(x_i)$ at $\tilde x_i$, i.e., $f_i(\tilde x_i)\in\partial\theta_i(\tilde x_i)$. According to Lemma 2.1, the above variational inequality is equivalent to

$$\tilde x_i = P_{\mathcal{X}_i}\Big\{\tilde x_i - \Big[f_i(\tilde x_i) - A_i^T\lambda^k + A_i^TH\Big(\sum_{j=1}^{i-1}A_j\bar x_j^k + A_i\tilde x_i + \sum_{j=i+1}^N A_jx_j^k - b\Big)\Big]\Big\}.$$

If $\tilde x_i$ is an optimal solution of (13), then by setting $\xi_i^k = 0$ in (10) the approximation conditions in (11) are satisfied. In fact, if $\|A_i(x_i^k - \bar x_i^k)\|_H = 0$, then the approximation conditions in (11) require that the corresponding subproblem (10) be solved exactly. Thus, without loss of generality, we may assume that $\|A_i(x_i^k-\bar x_i^k)\|_H \ne 0$. Suppose that $\hat x_i$ is an approximate solution of (13), i.e.,

$$\hat x_i \approx P_{\mathcal{X}_i}\Big\{\hat x_i - \Big[f_i(\hat x_i) - A_i^T\lambda^k + A_i^TH\Big(\sum_{j=1}^{i-1}A_j\bar x_j^k + A_i\hat x_i + \sum_{j=i+1}^N A_jx_j^k - b\Big)\Big]\Big\}.$$

By setting

$$\bar x_i^k = P_{\mathcal{X}_i}\Big\{\hat x_i - \Big[f_i(\hat x_i) - A_i^T\lambda^k + A_i^TH\Big(\sum_{j=1}^{i-1}A_j\bar x_j^k + A_i\hat x_i + \sum_{j=i+1}^N A_jx_j^k - b\Big)\Big]\Big\},$$

simple manipulation shows that (10) holds with

$$\xi_i^k = (\hat x_i - \bar x_i^k) + \big(f_i(\bar x_i^k) - f_i(\hat x_i)\big) + A_i^THA_i(\bar x_i^k - \hat x_i).$$

When $\hat x_i$ is sufficiently close to the exact solution $\tilde x_i$, the relations between $\bar x_i^k$, $\hat x_i$, and $\tilde x_i$ ensure that $\hat x_i\approx\bar x_i^k$ and $\|A_i(x_i^k-\bar x_i^k)\|_H \approx \|A_i(x_i^k-\tilde x_i)\|_H$. Therefore, for a suitable approximate solution $\hat x_i$, we can define an $\bar x_i^k$ such that the inexactness conditions (10) and (11) are satisfied.
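To make the acceptance of an inexact subproblem solution concrete, here is a small sketch (our own; it assumes $H = I$ and the two-part acceptance test as reconstructed in (11) above, so both thresholds are hypothetical in form) that screens a candidate error vector against the residual $\|A_i(x_i^k - \bar x_i^k)\|$:

```python
import numpy as np

# Hypothetical acceptance test for the inexactness criteria, one block i,
# with H = I; xi_err plays the role of xi_i^k in (10)-(11).
rng = np.random.default_rng(3)
m, ni = 4, 3
Ai = rng.standard_normal((m, ni))
xi_cur = rng.standard_normal(ni)     # current iterate x_i^k
xi_bar = rng.standard_normal(ni)     # trial point from the inexact solve
resid = np.linalg.norm(Ai @ (xi_cur - xi_bar))

def acceptable(xi_err):
    # accept only if the error is small relative to ||A_i (x_i^k - xbar_i^k)||,
    # in both norm and inner-product senses
    return (np.linalg.norm(xi_err) <= resid and
            (xi_cur - xi_bar) @ xi_err <= 0.25 * resid ** 2)

print(acceptable(np.zeros(ni)))      # an exact solve (xi = 0) always passes
```

In practice one would run an inner solver on (13) until the induced $\xi_i^k$ passes such a test, which is exactly the point of the inexactness criteria: the inner accuracy is tied to the (computable) outer residual rather than to a fixed tolerance.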
Instead of accepting $\bar w^k$ as a new iterate, we use it to generate descent search directions. In the sequel, we define the matrix $M$ and the vector $\xi^k$ as

$$M := \begin{pmatrix} A_1^THA_1 & 0 & \cdots & 0 & 0\\ A_2^THA_1 & A_2^THA_2 & \ddots & \vdots & \vdots\\ \vdots & \vdots & \ddots & 0 & 0\\ A_N^THA_1 & A_N^THA_2 & \cdots & A_N^THA_N & 0\\ 0 & 0 & \cdots & 0 & H^{-1}\end{pmatrix}\qquad\text{and}\qquad \xi^k := \begin{pmatrix}\xi_1^k\\\vdots\\\xi_N^k\\0\end{pmatrix}. \tag{14}$$

Then, our two search directions are, respectively, given by

$$d_1(w^k,\bar w^k,\xi^k) = M(w^k - \bar w^k) - \xi^k \tag{15}$$

and

$$d_2(w^k,\bar w^k,\xi^k) = F(\bar w^k) + \begin{pmatrix}A_1^T\\\vdots\\A_N^T\\0\end{pmatrix} H\sum_{j=1}^N A_j(x_j^k - \bar x_j^k). \tag{16}$$

Based on $w^k$, $\bar w^k$, and $\xi^k$, we define

$$\varphi(w^k,\bar w^k,\xi^k) = \langle w^k - \bar w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle + \Big\langle\lambda^k - \bar\lambda^k,\ \sum_{j=1}^N A_j(x_j^k - \bar x_j^k)\Big\rangle. \tag{17}$$

The function $\varphi(w^k,\bar w^k,\xi^k)$ is a key component in analyzing the proposed methods. In the subsequent section, we prove (in Theorem 4.1) that

$$\begin{cases}\varphi(w^k,\bar w^k,\xi^k) \ge \dfrac14\Big(\sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\|_H^2 + \|\lambda^k-\bar\lambda^k\|_{H^{-1}}^2\Big),\\[4pt] \varphi(w^k,\bar w^k,\xi^k) = 0\ \Longrightarrow\ \bar w^k\in\mathcal{W}^*.\end{cases} \tag{18}$$

Hence, $\varphi(w^k,\bar w^k,\xi^k)$ can be viewed as an error measuring function, which measures how much $\bar w^k$ fails to be in $\mathcal{W}^*$. Furthermore, by utilizing $\varphi(w^k,\bar w^k,\xi^k)$ we will prove in Theorems 4.2 and 4.3 that, for any given positive definite matrix $G\in\mathbb{R}^{(n+m)\times(n+m)}$, both $G^{-1}d_1(w^k,\bar w^k,\xi^k)$ and $G^{-1}d_2(w^k,\bar w^k,\xi^k)$ are descent directions with respect to the unknown distance function $\|w-w^*\|_G^2$.

3.2 The Step Length

In this subsection, we provide a step length for the search directions $d_1(w^k,\bar w^k,\xi^k)$ and $d_2(w^k,\bar w^k,\xi^k)$. Later, we will justify the choice of the step length to be defined and show that the step length has a positive lower bound.
Given a positive definite matrix $G\in\mathbb{R}^{(n+m)\times(n+m)}$, the new iterate $w^{k+1}$ is generated as

$$w^{k+1} = w^k - \alpha_k G^{-1}d_1(w^k,\bar w^k,\xi^k) \tag{19}$$

or

$$w^{k+1} = P_{\mathcal{W},G}\big[w^k - \alpha_k G^{-1}d_2(w^k,\bar w^k,\xi^k)\big], \tag{20}$$

where

$$\alpha_k = \gamma\alpha_k^*,\qquad \alpha_k^* = \frac{\varphi(w^k,\bar w^k,\xi^k)}{\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2},\qquad \gamma\in(0,2). \tag{21}$$

The sequence $\{w^k\}$ generated by (19) is not necessarily contained in $\mathcal{W}$, while the sequence produced by (20) lies in $\mathcal{W}$. Note that the step length $\alpha_k$ in both (19) and (20) depends merely on $\varphi(w^k,\bar w^k,\xi^k)$, $d_1(w^k,\bar w^k,\xi^k)$, and $\gamma$. The proposed methods utilize different search directions but the same step length. According to our numerical experiments in [19], the update form (20) usually outperforms (19), provided that the projection onto $\mathcal{W}$ can easily be carried out. We mention that the proposed inexact ADMMs (19) and (20) are different from those in the literature [19, 20]. In fact, the inexact methods proposed in [19, 20] belong to the proximal-type methods [21–24], while the ADMM subproblems in this paper do not include proximal terms of any form.

4 Rationale of the Two Directions

To derive the convergence of the proposed methods, we use similar arguments as those in the general framework proposed in [25]. By Lemma 2.1 the equality in (10) is equivalent to

$$\Big\langle x_i - \bar x_i^k,\ f_i(\bar x_i^k) - A_i^T\lambda^k + A_i^TH\Big(\sum_{j=1}^{i}A_j\bar x_j^k + \sum_{j=i+1}^N A_jx_j^k - b\Big) - \xi_i^k\Big\rangle \ge 0\quad\forall x_i\in\mathcal{X}_i.$$

By substituting $\bar\lambda^k$ given in (12), the above inequality can be rewritten as

$$\Big\langle x_i - \bar x_i^k,\ f_i(\bar x_i^k) - A_i^T\bar\lambda^k + A_i^TH\sum_{j=i+1}^N A_j(x_j^k - \bar x_j^k) - \xi_i^k\Big\rangle \ge 0\quad\forall x_i\in\mathcal{X}_i. \tag{22}$$

According to the general framework [25], for the pair $(w^k,\bar w^k)$, the following conditions are required to guarantee the convergence:

$$\bar w^k = P_{\mathcal{W}}\big\{\bar w^k - \big[d_2(w^k,\bar w^k,\xi^k) - d_1(w^k,\bar w^k,\xi^k)\big]\big\}, \tag{23}$$

$$\langle \bar w^k - w^*,\ d_2(w^k,\bar w^k,\xi^k)\rangle \ge \varphi(w^k,\bar w^k,\xi^k) - \langle w^k - \bar w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle, \tag{24}$$

for which we give proofs in Lemmas 4.1 and 4.2, respectively.
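A minimal sketch (ours) of the step-length rule above for the special case $G = I$, in which $\|G^{-1}d_1\|_G^2 = \|d_1\|^2$; the inputs `phi` and `d1` stand for $\varphi(w^k,\bar w^k,\xi^k)$ and $d_1(w^k,\bar w^k,\xi^k)$, which would come from (17) and (15):

```python
import numpy as np

def new_iterate(w, d1, phi, gamma=1.9):
    # alpha* = phi / ||d1||^2  (case G = I), then w+ = w - gamma*alpha* * d1;
    # contraction requires the relaxation factor gamma to lie in (0, 2)
    alpha_star = phi / np.dot(d1, d1)
    return w - gamma * alpha_star * d1

w = np.array([1.0, 2.0, 3.0])
d1 = np.array([0.5, 0.0, 0.5])
print(new_iterate(w, d1, phi=0.25))
```

Note that the whole rule costs only one inner product once $\varphi$ and $d_1$ are available, which is why the step length adds essentially no overhead per iteration.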
Lemma 4.1  Let $\bar w^k = (\bar x_1^k,\ldots,\bar x_N^k,\bar\lambda^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k,\ldots,x_N^k,\lambda^k)$. Then, we have $\bar w^k\in\mathcal{W}$ and

$$\langle w - \bar w^k,\ d_2(w^k,\bar w^k,\xi^k) - d_1(w^k,\bar w^k,\xi^k)\rangle \ge 0\quad\forall w\in\mathcal{W},$$

where $d_1(w^k,\bar w^k,\xi^k)$ and $d_2(w^k,\bar w^k,\xi^k)$ are defined in (15) and (16), respectively.

Proof  Denote $\bar x^k := (\bar x_1^k,\ldots,\bar x_N^k)$ and $\mathcal{X} := \mathcal{X}_1\times\cdots\times\mathcal{X}_N$. It is clear from (10) that $\bar x^k\in\mathcal{X}$. From (22), for all $x\in\mathcal{X}$, we have

$$\begin{pmatrix}x_1-\bar x_1^k\\ x_2-\bar x_2^k\\ \vdots\\ x_N-\bar x_N^k\end{pmatrix}^T \left[\begin{pmatrix} f_1(\bar x_1^k) - A_1^T\bar\lambda^k + A_1^TH\sum_{j=2}^N A_j(x_j^k-\bar x_j^k)\\ f_2(\bar x_2^k) - A_2^T\bar\lambda^k + A_2^TH\sum_{j=3}^N A_j(x_j^k-\bar x_j^k)\\ \vdots\\ f_N(\bar x_N^k) - A_N^T\bar\lambda^k\end{pmatrix} - \begin{pmatrix}\xi_1^k\\\xi_2^k\\\vdots\\\xi_N^k\end{pmatrix}\right] \ge 0. \tag{25}$$

By adding

$$\begin{pmatrix}x_1-\bar x_1^k\\ x_2-\bar x_2^k\\ \vdots\\ x_N-\bar x_N^k\end{pmatrix}^T \begin{pmatrix} A_1^TH\sum_{j=1}^{1}A_j(x_j^k-\bar x_j^k)\\ A_2^TH\sum_{j=1}^{2}A_j(x_j^k-\bar x_j^k)\\ \vdots\\ A_N^TH\sum_{j=1}^{N}A_j(x_j^k-\bar x_j^k)\end{pmatrix}$$

to both sides of (25), we get

$$\begin{pmatrix}x_1-\bar x_1^k\\\vdots\\x_N-\bar x_N^k\end{pmatrix}^T \left[\begin{pmatrix} f_1(\bar x_1^k)-A_1^T\bar\lambda^k + A_1^TH\sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\\ \vdots\\ f_N(\bar x_N^k)-A_N^T\bar\lambda^k + A_N^TH\sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\end{pmatrix} - \begin{pmatrix}\xi_1^k\\\vdots\\\xi_N^k\end{pmatrix}\right] \ge \begin{pmatrix}x_1-\bar x_1^k\\\vdots\\x_N-\bar x_N^k\end{pmatrix}^T \begin{pmatrix}A_1^THA_1 & 0 & \cdots & 0\\ A_2^THA_1 & A_2^THA_2 & \ddots & \vdots\\ \vdots & \vdots & \ddots & 0\\ A_N^THA_1 & A_N^THA_2 & \cdots & A_N^THA_N\end{pmatrix} \begin{pmatrix}x_1^k-\bar x_1^k\\\vdots\\x_N^k-\bar x_N^k\end{pmatrix} \tag{26}$$
for all $x\in\mathcal{X}$. Since $\sum_{i=1}^N A_i\bar x_i^k - b = H^{-1}(\lambda^k - \bar\lambda^k)$, by embedding the equality

$$\Big\langle\lambda - \bar\lambda^k,\ \sum_{i=1}^N A_i\bar x_i^k - b\Big\rangle = \big\langle\lambda-\bar\lambda^k,\ H^{-1}(\lambda^k-\bar\lambda^k)\big\rangle$$

into (26) we obtain $\bar w^k\in\mathcal{W}$ and

$$\begin{pmatrix}x_1-\bar x_1^k\\\vdots\\x_N-\bar x_N^k\\\lambda-\bar\lambda^k\end{pmatrix}^T \left[\begin{pmatrix} f_1(\bar x_1^k)-A_1^T\bar\lambda^k+A_1^TH\sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\\ \vdots\\ f_N(\bar x_N^k)-A_N^T\bar\lambda^k+A_N^TH\sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\\ \sum_{i=1}^N A_i\bar x_i^k - b\end{pmatrix} - \begin{pmatrix}\xi_1^k\\\vdots\\\xi_N^k\\0\end{pmatrix}\right] \ge \begin{pmatrix}x_1-\bar x_1^k\\\vdots\\x_N-\bar x_N^k\\\lambda-\bar\lambda^k\end{pmatrix}^T M \begin{pmatrix}x_1^k-\bar x_1^k\\\vdots\\x_N^k-\bar x_N^k\\\lambda^k-\bar\lambda^k\end{pmatrix}$$

for all $w\in\mathcal{W}$. Recalling the definitions of $d_2(w^k,\bar w^k,\xi^k)$ and $M$ (see (16) and (14)), the last inequality can be rewritten in the following compact form: $\bar w^k\in\mathcal{W}$ and

$$\langle w-\bar w^k,\ d_2(w^k,\bar w^k,\xi^k) + \xi^k\rangle \ge \langle w-\bar w^k,\ M(w^k-\bar w^k)\rangle\quad\forall w\in\mathcal{W}.$$

The assertion of this lemma follows directly from the above inequality and the definition of $d_1(w^k,\bar w^k,\xi^k)$ in (15). □

According to (9), the assertion in Lemma 4.1 is equivalent to (23).

Lemma 4.2  Let $\bar w^k = (\bar x_1^k,\ldots,\bar x_N^k,\bar\lambda^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k,\ldots,x_N^k,\lambda^k)$. Then, we have

$$\langle\bar w^k - w^*,\ d_2(w^k,\bar w^k,\xi^k)\rangle \ge \varphi(w^k,\bar w^k,\xi^k) - \langle w^k-\bar w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle\quad\forall w^*\in\mathcal{W}^*, \tag{27}$$

where $d_1(w^k,\bar w^k,\xi^k)$, $d_2(w^k,\bar w^k,\xi^k)$, and $\varphi(w^k,\bar w^k,\xi^k)$ are defined in (15), (16), and (17).

Proof  According to (16), we have

$$\langle\bar w^k - w^*,\ d_2(w^k,\bar w^k,\xi^k)\rangle = \langle\bar w^k-w^*,\ F(\bar w^k)\rangle + \Big\langle\sum_{j=1}^N A_j(\bar x_j^k - x_j^*),\ H\sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\Big\rangle. \tag{28}$$
From the monotonicity of $F(w)$ and the fact that $w^*\in\mathcal{W}^*$ it follows that

$$\langle\bar w^k - w^*,\ F(\bar w^k)\rangle \ge \langle\bar w^k - w^*,\ F(w^*)\rangle \ge 0.$$

Substituting the above inequality into (28), we obtain

$$\langle\bar w^k-w^*,\ d_2(w^k,\bar w^k,\xi^k)\rangle \ge \Big\langle\sum_{j=1}^N A_j(\bar x_j^k-x_j^*),\ H\sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\Big\rangle.$$

Since $\sum_{j=1}^N A_jx_j^* = b$ and $H(\sum_{j=1}^N A_j\bar x_j^k - b) = \lambda^k - \bar\lambda^k$, the above inequality becomes

$$\langle\bar w^k - w^*,\ d_2(w^k,\bar w^k,\xi^k)\rangle \ge \Big\langle\lambda^k-\bar\lambda^k,\ \sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\Big\rangle\quad\forall w^*\in\mathcal{W}^*.$$

Inequality (27) follows immediately by further considering the definition of $\varphi(w^k,\bar w^k,\xi^k)$. □

Lemma 4.3  Let $\bar w^k = (\bar x_1^k,\ldots,\bar x_N^k,\bar\lambda^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k,\ldots,x_N^k,\lambda^k)$. Then, we have

$$\langle w^k-\bar w^k,\ M(w^k-\bar w^k)\rangle + \Big\langle\lambda^k-\bar\lambda^k,\ \sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\Big\rangle = \frac12\Big(\sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\|_H^2 + \|\lambda^k-\bar\lambda^k\|_{H^{-1}}^2\Big) + \frac12\Big\|\sum_{j=1}^N A_jx_j^k - b\Big\|_H^2.$$

Proof  From the definition of the matrix $M$ in (14) we have

$$M(w^k-\bar w^k) = \begin{pmatrix} A_1^TH\,A_1(x_1^k-\bar x_1^k)\\ A_2^TH\sum_{j=1}^2 A_j(x_j^k-\bar x_j^k)\\ \vdots\\ A_N^TH\sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\\ H^{-1}(\lambda^k-\bar\lambda^k)\end{pmatrix},$$

from which we can easily verify the following equality:
$$\langle w^k-\bar w^k,\ M(w^k-\bar w^k)\rangle = \frac12\Big(\sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\|_H^2 + \|\lambda^k-\bar\lambda^k\|_{H^{-1}}^2\Big) + \frac12\Big(\Big\|\sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\Big\|_H^2 + \|\lambda^k-\bar\lambda^k\|_{H^{-1}}^2\Big). \tag{29}$$

Simple calculations show that

$$\Big\langle\lambda^k-\bar\lambda^k,\ \sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\Big\rangle = \Big\langle H\sum_{j=1}^N A_j(x_j^k-\bar x_j^k),\ H^{-1}(\lambda^k-\bar\lambda^k)\Big\rangle. \tag{30}$$

The addition of (29) and (30) yields

$$\langle w^k-\bar w^k,\ M(w^k-\bar w^k)\rangle + \Big\langle\lambda^k-\bar\lambda^k,\ \sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\Big\rangle$$
$$= \frac12\Big(\sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\|_H^2 + \|\lambda^k-\bar\lambda^k\|_{H^{-1}}^2\Big) + \frac12\Big\|\sum_{j=1}^N A_j(x_j^k-\bar x_j^k) + H^{-1}(\lambda^k-\bar\lambda^k)\Big\|_H^2$$
$$= \frac12\Big(\sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\|_H^2 + \|\lambda^k-\bar\lambda^k\|_{H^{-1}}^2\Big) + \frac12\Big\|\sum_{j=1}^N A_jx_j^k - b\Big\|_H^2,$$

where the last equality follows from (12). The lemma is proved. □
Now, we prove (18) and the descent properties of $d_1(w^k,\bar w^k,\xi^k)$ and $d_2(w^k,\bar w^k,\xi^k)$.

Theorem 4.1  Let $\bar w^k = (\bar x_1^k,\ldots,\bar x_N^k,\bar\lambda^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k,\ldots,x_N^k,\lambda^k)$. If the inexactness criteria (11) are satisfied, then

$$\varphi(w^k,\bar w^k,\xi^k) \ge \frac14\Big(\sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\|_H^2 + \|\lambda^k-\bar\lambda^k\|_{H^{-1}}^2\Big). \tag{31}$$

In addition, if $\varphi(w^k,\bar w^k,\xi^k) = 0$, then $\bar w^k\in\mathcal{W}$ is a solution of VI$(\mathcal{W},F)$.

Proof  First, it follows from (15), (17), and Lemma 4.3 that

$$\varphi(w^k,\bar w^k,\xi^k) = \frac12\Big(\sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\|_H^2 + \|\lambda^k-\bar\lambda^k\|_{H^{-1}}^2\Big) + \frac12\Big\|\sum_{j=1}^N A_jx_j^k - b\Big\|_H^2 - \langle w^k-\bar w^k,\ \xi^k\rangle. \tag{32}$$

From the inexactness criteria (11) we have

$$\langle w^k-\bar w^k,\ \xi^k\rangle \le \frac14\sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\|_H^2.$$

Substituting the above inequality into (32), the first part of the theorem follows immediately. Consequently, if $\varphi(w^k,\bar w^k,\xi^k)=0$, it follows that

$$A_j(x_j^k-\bar x_j^k) = 0,\quad j=1,\ldots,N,\qquad\text{and}\qquad \lambda^k = \bar\lambda^k.$$

Moreover, it follows from (11) that $\xi_j^k = 0$, $j=1,\ldots,N$. By substituting the above equations into (22) we get

$$\bar x_i^k\in\mathcal{X}_i,\qquad \langle x_i - \bar x_i^k,\ f_i(\bar x_i^k) - A_i^T\bar\lambda^k\rangle \ge 0\quad\forall x_i\in\mathcal{X}_i,\ i=1,\ldots,N, \tag{33}$$

and

$$\sum_{j=1}^N A_j\bar x_j^k - b = H^{-1}(\lambda^k - \bar\lambda^k) = 0. \tag{34}$$

Combining (33) and (34), we get

$$\bar w^k\in\mathcal{W},\qquad \langle w - \bar w^k,\ F(\bar w^k)\rangle \ge 0\quad\forall w\in\mathcal{W},$$

and thus $\bar w^k$ is a solution of VI$(\mathcal{W},F)$. □
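For exact subproblem solves ($\xi^k = 0$), the lower bound on $\varphi$ just established is a purely algebraic consequence of Lemma 4.3 and holds for any pair $(w^k, \bar w^k)$; a numerical sanity check (our own, with $N = 2$ and $H = I$, so $M$ carries blocks $A_i^TA_j$ and a final identity block):

```python
import numpy as np

# Assemble M, d1 and phi for N = 2 blocks with H = I and xi = 0, and verify
# that phi dominates a quarter of the squared residuals (the paper's bound).
rng = np.random.default_rng(5)
m, n1, n2 = 3, 2, 2
A1, A2 = rng.standard_normal((m, n1)), rng.standard_normal((m, n2))

M = np.block([
    [A1.T @ A1,         np.zeros((n1, n2)), np.zeros((n1, m))],
    [A2.T @ A1,         A2.T @ A2,          np.zeros((n2, m))],
    [np.zeros((m, n1)), np.zeros((m, n2)),  np.eye(m)],   # H^{-1} = I
])

w    = rng.standard_normal(n1 + n2 + m)   # stand-in for w^k = (x1; x2; lam)
wbar = rng.standard_normal(n1 + n2 + m)   # stand-in for the trial point
d1 = M @ (w - wbar)                       # exact solves: xi = 0

dx1, dx2 = (w - wbar)[:n1], (w - wbar)[n1:n1 + n2]
dlam = (w - wbar)[n1 + n2:]
phi = d1 @ (w - wbar) + dlam @ (A1 @ dx1 + A2 @ dx2)   # definition of phi
bound = 0.25 * (np.linalg.norm(A1 @ dx1) ** 2
                + np.linalg.norm(A2 @ dx2) ** 2
                + np.linalg.norm(dlam) ** 2)
print(phi >= bound)
```

The check reflects the proof's mechanism: with $\xi^k = 0$, $\varphi$ equals one half of the squared residuals plus a nonnegative extra term, so the factor $\tfrac14$ leaves room precisely for errors $\xi^k$ obeying the inexactness criteria.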
Theorem 4.2  Let $\bar w^k = (\bar x_1^k,\ldots,\bar x_N^k,\bar\lambda^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k,\ldots,x_N^k,\lambda^k)$. Then, we have

$$\langle w^k - w^*,\ d_1(w^k,\bar w^k,\xi^k)\rangle \ge \varphi(w^k,\bar w^k,\xi^k)\quad\forall w^*\in\mathcal{W}^*, \tag{35}$$

where $d_1(w^k,\bar w^k,\xi^k)$ and $\varphi(w^k,\bar w^k,\xi^k)$ are defined in (15) and (17), respectively.

Proof  First, it follows from Lemma 4.1 that

$$\langle\bar w^k - w^*,\ d_1(w^k,\bar w^k,\xi^k)\rangle \ge \langle\bar w^k - w^*,\ d_2(w^k,\bar w^k,\xi^k)\rangle\quad\forall w^*\in\mathcal{W}^*.$$

Combining with Lemma 4.2, we have

$$\langle\bar w^k-w^*,\ d_1(w^k,\bar w^k,\xi^k)\rangle \ge \varphi(w^k,\bar w^k,\xi^k) - \langle w^k-\bar w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle\quad\forall w^*\in\mathcal{W}^*,$$

which implies (35). □

Recall that the sequence $\{w^k\}$ generated by (19) is not necessarily contained in $\mathcal{W}$. Therefore, the $w^k$ in Theorem 4.2 must be allowed to be any point in $\mathbb{R}^{n+m}$. In contrast, the sequence produced by (20) lies in $\mathcal{W}$, and thus it is required in the next theorem that $w^k$ belongs to $\mathcal{W}$.

Theorem 4.3  Let $\bar w^k = (\bar x_1^k,\ldots,\bar x_N^k,\bar\lambda^k)$ be generated by the inexact alternating direction scheme (10)–(12) from the given vector $w^k = (x_1^k,\ldots,x_N^k,\lambda^k)$. If $w^k\in\mathcal{W}$, then

$$\langle w^k - w^*,\ d_2(w^k,\bar w^k,\xi^k)\rangle \ge \varphi(w^k,\bar w^k,\xi^k)\quad\forall w^*\in\mathcal{W}^*, \tag{36}$$

where $d_2(w^k,\bar w^k,\xi^k)$ and $\varphi(w^k,\bar w^k,\xi^k)$ are defined in (16) and (17), respectively.

Proof  Since $w^k\in\mathcal{W}$, it follows from Lemma 4.1 that

$$\langle w^k - \bar w^k,\ d_2(w^k,\bar w^k,\xi^k)\rangle \ge \langle w^k-\bar w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle. \tag{37}$$

Then, the addition of (27) to both sides of (37) yields (36). □

5 Convergence

Using the directions $d_1(w^k,\bar w^k,\xi^k)$ and $d_2(w^k,\bar w^k,\xi^k)$ offered by (15) and (16), the new iterate $w^{k+1}$ is determined by the positive definite matrix $G$ and the step length $\alpha_k$; see (19) and (20). In order to explain how to determine the step length, we define the new step-length-dependent iterate by

$$w_1^{k+1}(\alpha) = w^k - \alpha G^{-1}d_1(w^k,\bar w^k,\xi^k) \tag{38}$$

and

$$w_2^{k+1}(\alpha) = P_{\mathcal{W},G}\big[w^k - \alpha G^{-1}d_2(w^k,\bar w^k,\xi^k)\big]. \tag{39}$$
In this way,

$$\vartheta_1^k(\alpha) = \|w^k - w^*\|_G^2 - \|w_1^{k+1}(\alpha) - w^*\|_G^2 \tag{40}$$

and

$$\vartheta_2^k(\alpha) = \|w^k-w^*\|_G^2 - \|w_2^{k+1}(\alpha) - w^*\|_G^2 \tag{41}$$

measure the improvement in the $k$th iteration by using updating forms (38) and (39), respectively. Since the optimal solution $w^*\in\mathcal{W}^*$ is unknown, it is generally infeasible to maximize the improvement directly. The following theorem introduces a tight lower bound on $\vartheta_1^k(\alpha)$ and $\vartheta_2^k(\alpha)$, which does not depend on the unknown vector $w^*$.

Theorem 5.1  For any $w^*\in\mathcal{W}^*$ and $\alpha\ge 0$, we have $\vartheta_1^k(\alpha)\ge q^k(\alpha)$ and $\vartheta_2^k(\alpha)\ge q^k(\alpha)$, where

$$q^k(\alpha) = 2\alpha\varphi(w^k,\bar w^k,\xi^k) - \alpha^2\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2. \tag{42}$$

Proof  From (38) and (40) we have

$$\vartheta_1^k(\alpha) = \|w^k-w^*\|_G^2 - \|w^k - w^* - \alpha G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2 = 2\langle w^k-w^*,\ \alpha d_1(w^k,\bar w^k,\xi^k)\rangle - \alpha^2\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2 \ge q^k(\alpha),$$

where the last inequality follows from (35). Hence, the first assertion of this theorem is proved. By setting $v = w^k - \alpha G^{-1}d_2(w^k,\bar w^k,\xi^k)$ and $w = w^*$ in (8) we get

$$\|w_2^{k+1}(\alpha) - w^*\|_G^2 \le \|w^k - \alpha G^{-1}d_2(w^k,\bar w^k,\xi^k) - w^*\|_G^2 - \|w^k - \alpha G^{-1}d_2(w^k,\bar w^k,\xi^k) - w_2^{k+1}(\alpha)\|_G^2.$$

Substituting the above inequality into (41), we obtain

$$\vartheta_2^k(\alpha) \ge \|w^k-w^*\|_G^2 - \|w^k - w^* - \alpha G^{-1}d_2(w^k,\bar w^k,\xi^k)\|_G^2 + \|w^k - w_2^{k+1}(\alpha) - \alpha G^{-1}d_2(w^k,\bar w^k,\xi^k)\|_G^2$$
$$= \|w^k - w_2^{k+1}(\alpha)\|_G^2 + 2\langle w_2^{k+1}(\alpha) - w^*,\ \alpha d_2(w^k,\bar w^k,\xi^k)\rangle. \tag{43}$$

Since $w_2^{k+1}(\alpha)\in\mathcal{W}$, it follows from Lemma 4.1 that

$$\langle w_2^{k+1}(\alpha) - \bar w^k,\ d_2(w^k,\bar w^k,\xi^k)\rangle \ge \langle w_2^{k+1}(\alpha) - \bar w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle. \tag{44}$$

The addition of (27) to both sides of (44) yields

$$\langle w_2^{k+1}(\alpha) - w^*,\ d_2(w^k,\bar w^k,\xi^k)\rangle \ge \varphi(w^k,\bar w^k,\xi^k) + \langle w_2^{k+1}(\alpha) - w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle. \tag{45}$$
Substituting (45) into the right-hand side of (43), we obtain

$$\vartheta_2^k(\alpha) \ge \|w^k - w_2^{k+1}(\alpha)\|_G^2 + 2\alpha\varphi(w^k,\bar w^k,\xi^k) + 2\alpha\langle w_2^{k+1}(\alpha) - w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle$$
$$= \|w^k - w_2^{k+1}(\alpha) - \alpha G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2 - \alpha^2\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2 + 2\alpha\varphi(w^k,\bar w^k,\xi^k)$$
$$\ge 2\alpha\varphi(w^k,\bar w^k,\xi^k) - \alpha^2\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2 = q^k(\alpha).$$

Hence, the proof is completed. □

Denote

$$\vartheta^k(\alpha) := \min\{\vartheta_1^k(\alpha),\ \vartheta_2^k(\alpha)\} \ge q^k(\alpha). \tag{46}$$

Note that $q^k(\alpha)$ is a quadratic function of $\alpha$, and it reaches its maximum at

$$\alpha_k^* = \frac{\varphi(w^k,\bar w^k,\xi^k)}{\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2}, \tag{47}$$

which is just the same as defined in (21). Consequently, it follows from Theorem 5.1 that

$$\vartheta^k(\alpha_k^*) \ge q^k(\alpha_k^*) = \alpha_k^*\varphi(w^k,\bar w^k,\xi^k).$$

Since inequality (35) is used in the proof of Theorem 5.1, in practice, multiplication of the optimal step length $\alpha_k^*$ by a factor $\gamma>1$ may result in faster convergence. By using (42) and (47) we have

$$q^k(\gamma\alpha_k^*) = 2\gamma\alpha_k^*\varphi(w^k,\bar w^k,\xi^k) - (\gamma\alpha_k^*)^2\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2 = \gamma(2-\gamma)\alpha_k^*\varphi(w^k,\bar w^k,\xi^k). \tag{48}$$

In order to guarantee that the right-hand side of (48) is positive, we choose $\gamma\in[1,2)$. The following theorem shows that the sequence $\{w^k\}$ generated by the proposed methods is Fejér monotone with respect to $\mathcal{W}^*$.

Theorem 5.2  For any $w^*\in\mathcal{W}^*$, the sequence $\{w^k\}$ generated by each of the proposed methods (with update form (19) or (20)) satisfies

$$\|w^{k+1}-w^*\|_G^2 \le \|w^k-w^*\|_G^2 - \gamma(2-\gamma)\alpha_k^*\varphi(w^k,\bar w^k,\xi^k)\quad\forall w^*\in\mathcal{W}^*. \tag{49}$$

Proof  It follows from Theorem 5.1 that $\vartheta^k(\gamma\alpha_k^*)\ge q^k(\gamma\alpha_k^*)$, which is equivalent to

$$\|w^{k+1}-w^*\|_G^2 \le \|w^k-w^*\|_G^2 - q^k(\gamma\alpha_k^*).$$

Then, the result of this theorem directly follows from (48). □
Theorem 5.2 indicates that the sequence $\{w^k\}$ converges to the solution set monotonically in the Fejér sense. Thus, according to [16, 17], the proposed methods belong to the class of contraction methods. In the following, we show that $\alpha_k^*$ has a positive lower bound.

Lemma 5.1  For any given (but fixed) positive definite matrix $G$, there exists a constant $c_0>0$ such that $\alpha_k^* \ge c_0$ for all $k\ge 0$.

Proof  From the definition of $M$ in (14) it is easy to show that

$$M(w^k-\bar w^k) = \begin{pmatrix} A_1^TH^{1/2} & 0 & \cdots & 0 & 0\\ A_2^TH^{1/2} & A_2^TH^{1/2} & \ddots & \vdots & \vdots\\ \vdots & \vdots & \ddots & 0 & 0\\ A_N^TH^{1/2} & A_N^TH^{1/2} & \cdots & A_N^TH^{1/2} & 0\\ 0 & 0 & \cdots & 0 & H^{-1/2}\end{pmatrix} \begin{pmatrix} H^{1/2}A_1(x_1^k-\bar x_1^k)\\ H^{1/2}A_2(x_2^k-\bar x_2^k)\\ \vdots\\ H^{1/2}A_N(x_N^k-\bar x_N^k)\\ H^{-1/2}(\lambda^k-\bar\lambda^k)\end{pmatrix}.$$

Therefore, there exists a constant $K>0$ such that

$$\|M(w^k-\bar w^k)\|^2 \le K\Big(\sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\|_H^2 + \|\lambda^k-\bar\lambda^k\|_{H^{-1}}^2\Big).$$

Moreover, according to the inexactness criteria (11), we have

$$\|\xi^k\|^2 \le \sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\|_H^2.$$

Since $d_1(w^k,\bar w^k,\xi^k) = M(w^k-\bar w^k) - \xi^k$, we have

$$\|d_1(w^k,\bar w^k,\xi^k)\|^2 \le 2\|M(w^k-\bar w^k)\|^2 + 2\|\xi^k\|^2 \le 2(K+1)\Big(\sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\|_H^2 + \|\lambda^k-\bar\lambda^k\|_{H^{-1}}^2\Big).$$

Using (31) and recalling that any two norms in a finite-dimensional space are equivalent, we have

$$\alpha_k^* = \frac{\varphi(w^k,\bar w^k,\xi^k)}{\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2} \ge c_1\frac{\varphi(w^k,\bar w^k,\xi^k)}{\|d_1(w^k,\bar w^k,\xi^k)\|^2} \ge c_0 := \frac{c_1}{8(K+1)} > 0, \tag{50}$$

where $c_1 > 0$ is a constant. This completes the proof of the lemma. □
Theorem 5.3  Let $\{w^k\}$ and $\{\bar w^k\}$ be the sequences generated by the proposed alternating-direction-based contraction methods (19) and (20) for problem (3). Then, we have:
(i) The sequence $\{w^k\}$ is bounded.
(ii) $\lim_{k\to\infty}\big\{\sum_{j=1}^N\|A_j(x_j^k-\bar x_j^k)\| + \|\lambda^k-\bar\lambda^k\|\big\} = 0$.
(iii) Any accumulation point of $\{\bar w^k\}$ is a solution of (3).
(iv) When $A_i$, $i=1,\ldots,N$, are full column rank matrices, the sequence $\{\bar w^k\}$ converges to a unique $w^\infty\in\mathcal{W}^*$.

Proof  The first assertion follows from (49). Moreover, by combining the recursion of (49) and the fact that $\alpha_k^*\ge c_0>0$ it is easy to show that

$$\lim_{k\to\infty}\varphi(w^k,\bar w^k,\xi^k) = 0.$$

Consequently, it follows from Theorem 4.1 that

$$\lim_{k\to\infty}\|A_i(x_i^k-\bar x_i^k)\| = 0,\quad i=1,\ldots,N,\qquad\text{and}\qquad \lim_{k\to\infty}\|\lambda^k-\bar\lambda^k\| = 0, \tag{51}$$

and the second assertion is proved. Using similar arguments as in the proof of Theorem 4.1, we obtain

$$\bar w^k\in\mathcal{W},\qquad \lim_{k\to\infty}\langle w-\bar w^k,\ F(\bar w^k)\rangle \ge 0\quad\forall w\in\mathcal{W}, \tag{52}$$

and thus any accumulation point of $\{\bar w^k\}$ is a solution of VI$(\mathcal{W},F)$, i.e., a solution of (3). If all $A_i$ are full column rank matrices, it follows from the first assertion and (51) that $\{\bar w^k\}$ is also bounded. Let $w^\infty$ be an accumulation point of $\{\bar w^k\}$. Then, there exists a subsequence $\{\bar w^{k_j}\}$ that converges to $w^\infty$. It follows from (52) that

$$\bar w^{k_j}\in\mathcal{W},\qquad \lim_{j\to\infty}\langle w - \bar w^{k_j},\ F(\bar w^{k_j})\rangle \ge 0\quad\forall w\in\mathcal{W},$$

and consequently, we have

$$w^\infty\in\mathcal{W},\qquad \langle w-w^\infty,\ F(w^\infty)\rangle \ge 0\quad\forall w\in\mathcal{W},$$

which implies that $w^\infty\in\mathcal{W}^*$. Since $\{w^k\}$ is Fejér monotone and $\lim_{k\to\infty}\|w^k-\bar w^k\| = 0$, the sequence $\{\bar w^k\}$ cannot have any other accumulation point and thus must converge to $w^\infty$. □

6 Complexity

The analysis in this section is inspired by [26]. It is based on a key inequality (see Lemmas 6.1 and 6.2) that is similar to that in [26]. In the current framework of variational inequalities, the analysis becomes much simpler and more elegant. As a preparation for the proof of the complexity result, we first give an alternative characterization of the optimal solution set $\mathcal{W}^*$, namely,

$$\mathcal{W}^* = \bigcap_{w\in\mathcal{W}}\big\{\hat w\in\mathcal{W} : \langle w-\hat w,\ F(w)\rangle \ge 0\big\}.$$

For a proof of this characterization, we refer to Theorem 2.3.5 in the book [27]. According to the above alternative characterization, we have that $\hat w\in\mathcal{W}$ is an $\epsilon$-optimal solution of VI$(\mathcal{W},F)$ if it satisfies

$$\hat w\in\mathcal{W}\qquad\text{and}\qquad \sup_{w\in\mathcal{W}}\langle\hat w - w,\ F(w)\rangle \le \epsilon. \tag{53}$$

In general, our complexity analysis follows the lines of [28], but instead of using $\bar w^k$ directly, we need to introduce an auxiliary vector, namely,

$$\hat w^k = \begin{pmatrix}\bar x_1^k\\\vdots\\\bar x_N^k\\\hat\lambda^k\end{pmatrix},\qquad\text{where}\qquad \hat\lambda^k = \bar\lambda^k - H\sum_{j=1}^N A_j(x_j^k - \bar x_j^k).$$

Since $\bar\lambda^k = \lambda^k - H(\sum_{j=1}^N A_j\bar x_j^k - b)$ (see (12)), we have $\hat\lambda^k = \lambda^k - H(\sum_{j=1}^N A_jx_j^k - b)$ or, equivalently,

$$H^{-1}(\lambda^k - \hat\lambda^k) = \sum_{j=1}^N A_jx_j^k - b = \sum_{j=1}^N A_j(x_j^k-\bar x_j^k) + H^{-1}(\lambda^k-\bar\lambda^k).$$

As a consequence, we may rewrite the two descent directions separately as

$$d_1(w^k,\bar w^k,\xi^k) = M(w^k-\bar w^k) - \xi^k = \hat M(w^k - \hat w^k) - \xi^k =: \hat d_1(w^k,\hat w^k,\xi^k), \tag{54}$$

where

$$\hat M := \begin{pmatrix} A_1^THA_1 & 0 & \cdots & 0 & 0\\ A_2^THA_1 & A_2^THA_2 & \ddots & \vdots & \vdots\\ \vdots & \vdots & \ddots & 0 & 0\\ A_N^THA_1 & A_N^THA_2 & \cdots & A_N^THA_N & 0\\ -A_1 & -A_2 & \cdots & -A_N & H^{-1}\end{pmatrix},$$

and

$$d_2(w^k,\bar w^k,\xi^k) = F(\bar w^k) + \begin{pmatrix}A_1^T\\\vdots\\A_N^T\\0\end{pmatrix}H\sum_{j=1}^N A_j(x_j^k-\bar x_j^k) = F(\hat w^k). \tag{55}$$
In fact, we constructed $\hat w^k$ and $\hat M$ such that $\hat d_1(w^k,\hat w^k,\xi^k)$ is exactly the direction $d_1(w^k,\bar w^k,\xi^k)$ and $F(\hat w^k)$ is exactly the direction $d_2(w^k,\bar w^k,\xi^k)$. Moreover, the assertion in Lemma 4.1 can be rewritten accordingly as

$$\langle w - \hat w^k,\ F(\hat w^k) - \hat d_1(w^k,\hat w^k,\xi^k)\rangle \ge 0\quad\forall w\in\mathcal{W}. \tag{56}$$

Now we are ready to prove the key inequality for both algorithms, which is given in the following two lemmas.

Lemma 6.1  If the new iterate $w^{k+1}$ is updated by (19), then we have

$$\langle w-\hat w^k,\ \gamma\alpha_k^*F(\hat w^k)\rangle + \frac12\big(\|w-w^k\|_G^2 - \|w-w^{k+1}\|_G^2\big) \ge 0\quad\forall w\in\mathcal{W}.$$

Proof  Due to (56), we have

$$\langle w-\hat w^k,\ \gamma\alpha_k^*F(\hat w^k)\rangle \ge \langle w-\hat w^k,\ \gamma\alpha_k^*\hat d_1(w^k,\hat w^k,\xi^k)\rangle\quad\forall w\in\mathcal{W}. \tag{57}$$

In addition, we have, by using (54) and (19),

$$\gamma\alpha_k^*\hat d_1(w^k,\hat w^k,\xi^k) = \gamma\alpha_k^*d_1(w^k,\bar w^k,\xi^k) = G(w^k - w^{k+1}).$$

Substitution of the last equation into (57) yields

$$\langle w-\hat w^k,\ \gamma\alpha_k^*F(\hat w^k)\rangle \ge \langle w-\hat w^k,\ G(w^k-w^{k+1})\rangle.$$

Thus, it suffices to show that

$$\langle w-\hat w^k,\ G(w^k-w^{k+1})\rangle + \frac12\big(\|w-w^k\|_G^2 - \|w-w^{k+1}\|_G^2\big) \ge 0\quad\forall w\in\mathcal{W}. \tag{58}$$

Applying the equality

$$\langle a-b,\ G(c-d)\rangle + \frac12\big(\|a-c\|_G^2 - \|a-d\|_G^2\big) = \frac12\big(\|c-b\|_G^2 - \|d-b\|_G^2\big),$$

we derive that

$$\langle w-\hat w^k,\ G(w^k-w^{k+1})\rangle + \frac12\big(\|w-w^k\|_G^2 - \|w-w^{k+1}\|_G^2\big) = \frac12\big(\|w^k-\hat w^k\|_G^2 - \|w^{k+1}-\hat w^k\|_G^2\big). \tag{59}$$

In view of (19), we have

$$\|w^k-\hat w^k\|_G^2 - \|w^{k+1}-\hat w^k\|_G^2 = \|w^k-\hat w^k\|_G^2 - \|w^k - \hat w^k - \gamma\alpha_k^*G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2$$
$$= 2\langle w^k-\hat w^k,\ \gamma\alpha_k^*d_1(w^k,\bar w^k,\xi^k)\rangle - (\gamma\alpha_k^*)^2\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2.$$
Moreover, we have

$$\langle w^k-\hat w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle = \langle w^k-\bar w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle + \langle\bar w^k-\hat w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle$$
$$= \langle w^k-\bar w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle + \Big\langle H\sum_{j=1}^N A_j(x_j^k-\bar x_j^k),\ H^{-1}(\lambda^k-\bar\lambda^k)\Big\rangle$$
$$= \langle w^k-\bar w^k,\ d_1(w^k,\bar w^k,\xi^k)\rangle + \Big\langle\lambda^k-\bar\lambda^k,\ \sum_{j=1}^N A_j(x_j^k-\bar x_j^k)\Big\rangle = \varphi(w^k,\bar w^k,\xi^k), \tag{60}$$

where the second equality follows from the definitions of $\bar w^k$, $\hat w^k$, and $d_1(w^k,\bar w^k,\xi^k)$. By combining the last two equations and using (21) we obtain

$$\|w^k-\hat w^k\|_G^2 - \|w^{k+1}-\hat w^k\|_G^2 = 2\gamma\alpha_k^*\varphi(w^k,\bar w^k,\xi^k) - (\gamma\alpha_k^*)^2\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2 = \gamma(2-\gamma)\alpha_k^*\varphi(w^k,\bar w^k,\xi^k).$$

Substituting the last equation into (59) yields

$$\langle w-\hat w^k,\ G(w^k-w^{k+1})\rangle + \frac12\big(\|w-w^k\|_G^2 - \|w-w^{k+1}\|_G^2\big) = \frac{\gamma(2-\gamma)}{2}\alpha_k^*\varphi(w^k,\bar w^k,\xi^k) \ge 0,$$

which is just (58). Thus, the proof is complete. □

Lemma 6.2  If the new iterate $w^{k+1}$ is updated by (20), then we have

$$\langle w-\hat w^k,\ \gamma\alpha_k^*F(\hat w^k)\rangle + \frac12\big(\|w-w^k\|_G^2 - \|w-w^{k+1}\|_G^2\big) \ge 0\quad\forall w\in\mathcal{W}.$$

Proof  To begin with, we separate the term $\langle w-\hat w^k,\ \gamma\alpha_k^*F(\hat w^k)\rangle$ into two as

$$\langle w-\hat w^k,\ \gamma\alpha_k^*F(\hat w^k)\rangle = \langle w^{k+1}-\hat w^k,\ \gamma\alpha_k^*F(\hat w^k)\rangle + \langle w-w^{k+1},\ \gamma\alpha_k^*F(\hat w^k)\rangle. \tag{61}$$

In the sequel, we will deal with the above two terms separately. Since $w^{k+1}\in\mathcal{W}$, we have, by substituting $w$ with $w^{k+1}$ in (56),

$$\langle w^{k+1}-\hat w^k,\ \gamma\alpha_k^*F(\hat w^k)\rangle \ge \langle w^{k+1}-\hat w^k,\ \gamma\alpha_k^*\hat d_1(w^k,\hat w^k,\xi^k)\rangle = \langle w^{k+1}-\hat w^k,\ \gamma\alpha_k^*d_1(w^k,\bar w^k,\xi^k)\rangle$$
$$= \langle w^k-\hat w^k,\ \gamma\alpha_k^*d_1(w^k,\bar w^k,\xi^k)\rangle - \langle w^k-w^{k+1},\ \gamma\alpha_k^*d_1(w^k,\bar w^k,\xi^k)\rangle. \tag{62}$$
Recall that the first term on the right-hand side of (62) was calculated in (60). As to the second term, we have

$$\langle w^k-w^{k+1},\ \gamma\alpha_k^*d_1(w^k,\bar w^k,\xi^k)\rangle = \langle w^k-w^{k+1},\ G(\gamma\alpha_k^*G^{-1}d_1(w^k,\bar w^k,\xi^k))\rangle \le \frac12\|w^k-w^{k+1}\|_G^2 + \frac{(\gamma\alpha_k^*)^2}{2}\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2.$$

Thus, we obtain

$$\langle w^{k+1}-\hat w^k,\ \gamma\alpha_k^*F(\hat w^k)\rangle \ge \gamma\alpha_k^*\varphi(w^k,\bar w^k,\xi^k) - \frac{(\gamma\alpha_k^*)^2}{2}\|G^{-1}d_1(w^k,\bar w^k,\xi^k)\|_G^2 - \frac12\|w^k-w^{k+1}\|_G^2$$
$$= \frac{\gamma(2-\gamma)}{2}\alpha_k^*\varphi(w^k,\bar w^k,\xi^k) - \frac12\|w^k-w^{k+1}\|_G^2, \tag{63}$$

where the equality follows from (21). Now, we turn to consider the second term $\langle w-w^{k+1},\ \gamma\alpha_k^*F(\hat w^k)\rangle$ in (61). Since $w^{k+1}$ is updated by (20), $w^{k+1}$ is the projection of $[w^k - \gamma\alpha_k^*G^{-1}F(\hat w^k)]$ onto $\mathcal{W}$ under the $G$-norm. It follows from (7) that

$$\big\langle [w^k-\gamma\alpha_k^*G^{-1}F(\hat w^k)] - w^{k+1},\ G(w-w^{k+1})\big\rangle \le 0\quad\forall w\in\mathcal{W}.$$

As a consequence, we have

$$\langle w-w^{k+1},\ \gamma\alpha_k^*F(\hat w^k)\rangle \ge \langle w-w^{k+1},\ G(w^k-w^{k+1})\rangle.$$

By applying the formula

$$\langle a,\ Gb\rangle = \frac12\big(\|a\|_G^2 - \|a-b\|_G^2 + \|b\|_G^2\big)$$

to the right-hand side of the last inequality we derive that

$$\langle w-w^{k+1},\ \gamma\alpha_k^*F(\hat w^k)\rangle \ge \frac12\big(\|w-w^{k+1}\|_G^2 - \|w-w^k\|_G^2\big) + \frac12\|w^k-w^{k+1}\|_G^2. \tag{64}$$

By incorporating inequalities (63) and (64) into Eq. (61), the assertion of Lemma 6.2 follows. □

Having the same key inequality for both methods, the $O(1/t)$ rate of convergence (in an ergodic sense) can be obtained easily.

Theorem 6.1  For any integer $t>0$, we have a $\hat w_t\in\mathcal{W}$ that satisfies

$$\langle\hat w_t - w,\ F(w)\rangle \le \frac{\|w-w^0\|_G^2}{2\gamma\Upsilon_t}\quad\forall w\in\mathcal{W},$$

where

$$\hat w_t = \frac{1}{\Upsilon_t}\sum_{k=0}^t \alpha_k^*\hat w^k\qquad\text{and}\qquad \Upsilon_t = \sum_{k=0}^t\alpha_k^*.$$
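The ergodic output $\hat w_t$ is simply a step-length-weighted average of the auxiliary points; a small sketch (ours, with random stand-ins for $\alpha_k^*$ and $\hat w^k$):

```python
import numpy as np

# Weighted ergodic average: w_erg = (1/Upsilon) * sum_k alpha_k * w_hat^k,
# with Upsilon = sum_k alpha_k, as in the theorem above.
rng = np.random.default_rng(4)
t = 10
alphas = rng.uniform(0.5, 1.5, t + 1)                    # stand-ins for alpha_k^*
w_hats = [rng.standard_normal(3) for _ in range(t + 1)]  # stand-ins for w-hat^k

Upsilon = alphas.sum()
w_erg = sum(a * wh for a, wh in zip(alphas, w_hats)) / Upsilon
# a convex combination: w_erg lies in the convex hull of the w-hat^k,
# hence in W whenever all w-hat^k do (W is convex)
print(w_erg)
```

Since the weights $\alpha_k^*/\Upsilon_t$ are nonnegative and sum to one, $\hat w_t$ inherits feasibility from the $\hat w^k$, which is exactly why the convexity of $\mathcal{W}$ matters for the ergodic rate statement.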
Proof  In Lemmas 6.1 and 6.2, we have proved the same key inequality for both methods, namely,

$$\langle w-\hat w^k,\ \gamma\alpha_k^*F(\hat w^k)\rangle + \frac12\big(\|w-w^k\|_G^2 - \|w-w^{k+1}\|_G^2\big) \ge 0\quad\forall w\in\mathcal{W}.$$

Since $F$ is monotone, we have

$$\langle w-\hat w^k,\ \gamma\alpha_k^*F(w)\rangle + \frac12\big(\|w-w^k\|_G^2 - \|w-w^{k+1}\|_G^2\big) \ge 0\quad\forall w\in\mathcal{W},$$

or, equivalently,

$$\langle\hat w^k - w,\ \gamma\alpha_k^*F(w)\rangle + \frac12\big(\|w-w^{k+1}\|_G^2 - \|w-w^k\|_G^2\big) \le 0\quad\forall w\in\mathcal{W}.$$

When taking the sum of the above inequalities over $k=0,\ldots,t$, we obtain

$$\Big\langle\sum_{k=0}^t\alpha_k^*\hat w^k - \Upsilon_t w,\ \gamma F(w)\Big\rangle + \frac12\big(\|w-w^{t+1}\|_G^2 - \|w-w^0\|_G^2\big) \le 0\quad\forall w\in\mathcal{W}.$$

By dropping the term $\|w-w^{t+1}\|_G^2$ and incorporating $\Upsilon_t$ and $\hat w_t$ into the above inequality, we have

$$\langle\hat w_t - w,\ F(w)\rangle \le \frac{\|w-w^0\|_G^2}{2\gamma\Upsilon_t}\quad\forall w\in\mathcal{W}.$$

Hence, the proof is complete. □

Since it follows from (50) that

$$\Upsilon_t = \sum_{k=0}^t\alpha_k^* \ge (t+1)c_0,$$

we have, by Theorem 6.1,

$$\langle\hat w_t - w,\ F(w)\rangle \le \frac{\|w-w^0\|_G^2}{2\gamma\Upsilon_t} \le \frac{\|w-w^0\|_G^2}{2\gamma c_0(t+1)}\quad\forall w\in\mathcal{W}.$$

According to (53), the above inequality implies the $O(1/t)$ rate of convergence immediately. We emphasize that our convergence rate is in the ergodic sense. From a theoretical point of view, this suggests using a larger parameter $\gamma\in(0,2)$ in implementations.

7 Conclusions

Attracted by the practical efficiency of the alternating direction method of multipliers, an alternating-direction-based contraction method was proposed in [16]. The
new method deals with the general separable and linearly constrained convex optimization problem, where the objective function is separable into finitely many parts. However, the new method requires the exact solution of ADMM subproblems, which limits its applicability. To overcome this limitation, this paper presents two inexact alternating-direction-based contraction methods. These methods are practically more viable since the subproblems are solved inexactly. The convergence properties and complexity results ($O(1/t)$ rate of convergence) of the proposed methods were derived. We emphasize that even for the simplest case, where the objective function is separable into two parts, our methods are different from the common inexact methods in the literature [19, 20], as our subproblems for computing search directions do not include proximal terms of any form. In addition, the complexity results, which are proved in the framework of variational inequalities, are new for this kind of inexact ADMs.

Acknowledgements  The authors wish to thank Prof. Panos M. Pardalos (the associate editor) and Prof. Cornelis Roos (Delft University of Technology) for useful comments and suggestions. G. Gu was supported by the NSFC grant 004. B. He was supported by the NSFC grant. J. Yang was supported by the NSFC grant 379.

References

1. Floudas, C.A., Pardalos, P.M. (eds.): Encyclopedia of Optimization, 2nd edn. Springer, Berlin (2009)
2. Pardalos, P.M., Resende, M.G.C. (eds.): Handbook of Applied Optimization. Oxford University Press, Oxford (2002)
3. Ng, M.K., Weiss, P., Yuan, X.: Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods. SIAM J. Sci. Comput. (2010)
4. Yang, J., Zhang, Y.: Alternating direction algorithms for $\ell_1$-problems in compressive sensing. SIAM J. Sci. Comput. 33(1), 250–278 (2011)
5. Wen, Z., Goldfarb, D., Yin, W.: Alternating direction augmented Lagrangian methods for semidefinite programming. Math. Program. Comput. 2(3–4), 203–230 (2010)
6. He, B., Xu, M., Yuan, X.: Solving large-scale
least squares semidefinite programming by alternating direction methods. SIAM J. Matrix Anal. Appl. 32(1) (2011)
7. Yuan, X.: Alternating direction method for covariance selection models. J. Sci. Comput. 51(2) (2012)
8. Yuan, X., Yang, J.: Sparse and low-rank matrix decomposition via alternating direction method. Pac. J. Optim. 9(1) (2013)
9. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Computer Science and Applied Mathematics. Academic Press/Harcourt Brace Jovanovich, New York (1982)
10. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer, New York (2006)
11. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2) (2006)
12. Chan, R.H., Yang, J., Yuan, X.: Alternating direction method for image inpainting in wavelet domains. SIAM J. Imaging Sci. 4(3) (2011)
13. Goldstein, T., Osher, S.: The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci. 2(2) (2009)
14. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1) (1976)
15. Douglas, J. Jr., Rachford, H.H. Jr.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82 (1956)
16. He, B., Tao, M., Xu, M., Yuan, X.: An alternating direction-based contraction method for linearly constrained separable convex programming problems. Optimization 62(4) (2013)
17. Blum, E., Oettli, W.: Mathematische Optimierung. Grundlagen und Verfahren. Springer, Berlin (1975)
18. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Upper Saddle River (1989)
19. He, B., Liao, L., Qian, M.: Alternating projection based prediction-correction methods for structured variational inequalities. J. Comput. Math. 24(6) (2006)
20. He, B., Liao, L., Han, D., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Program., Ser. A 92(1) (2002)
21. Chen, G., Teboulle, M.: A proximal-based decomposition method for convex minimization problems. Math. Program., Ser. A 64(1) (1994)
22. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5) (1976)
23. Teboulle, M.: Convergence of proximal-like algorithms. SIAM J. Optim. 7(4) (1997)
24. Tseng, P.: Alternating projection-proximal methods for convex programming and variational inequalities. SIAM J. Optim. 7(4) (1997)
25. He, B., Xu, M.: A general framework of contraction methods for monotone variational inequalities. Pac. J. Optim. 4(2) (2008)
26. Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1) (2004)
27. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. I. Springer Series in Operations Research. Springer, New York (2003)
28. Cai, X., Gu, G., He, B.: On the O(1/t) convergence rate of the projection and contraction methods for variational inequalities with Lipschitz continuous monotone operators. Comput. Optim. Appl. (2013)
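As a concrete, minimal illustration of the classical two-block ADMM whose practical efficiency motivates the paper (a toy sketch only, not the inexact contraction methods proposed above; the problem, the penalty parameter beta, and the helper names soft and admm_l1 are all our own choices), consider min ||x||_1 + (1/2)||y - c||^2 subject to x - y = 0, where both subproblems admit closed-form solutions:

```python
import numpy as np

def soft(v, tau):
    # componentwise soft-thresholding: the proximal mapping of tau*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def admm_l1(c, beta=1.0, iters=1000):
    # classical two-block ADMM for  min ||x||_1 + 0.5*||y - c||^2  s.t.  x - y = 0
    # (here A = I, B = -I, b = 0); u is the scaled Lagrange multiplier
    x = np.zeros_like(c)
    y = np.zeros_like(c)
    u = np.zeros_like(c)
    for _ in range(iters):
        x = soft(y - u, 1.0 / beta)              # x-subproblem: l1 proximal step
        y = (c + beta * (x + u)) / (1.0 + beta)  # y-subproblem: simple quadratic
        u = u + x - y                            # multiplier update
    return x

c = np.array([3.0, -0.5, 2.0])
x = admm_l1(c)
print(x)  # approaches soft(c, 1) = [2, 0, 1]
```

Since the constraint forces x = y, the minimizer is the soft-thresholding of c with threshold 1, which the iterates approach geometrically; the inexact methods of this paper generalize such schemes by allowing the subproblems to be solved only approximately.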