Self Equivalence of the Alternating Direction Method of Multipliers

Size: px
Start display at page:

Download "Self Equivalence of the Alternating Direction Method of Multipliers"

Transcription

1 Self Equivalence of the Alternating Direction Method of Multipliers Ming Yan and Wotao Yin Abstract The alternating direction method of multipliers (ADM or ADMM) breaks a comple optimization problem into much simpler subproblems. The ADM algorithms are tpicall short and eas to implement et ehibit (nearl) state-of-the-art performance for large-scale optimization problems. To appl ADM, we first formulate a given problem into the ADM-read form, so the final algorithm depends on the formulation. A problem like minimize u() + v(c) has si different ADM-read formulations. The can be in the primal or dual forms, and the differ b how dumm variables are introduced. To each ADMread formulation, ADM can be applied in two different orders depending on how the primal variables are updated. Finall, we get twelve different ADM algorithms! How do the compare to each other? Which algorithm should one choose? In this chapter, we show that man of the different was of appling ADM are equivalent. Specificall, we show that ADM applied to a primal formulation is equivalent to ADM applied to its Lagrange dual; ADM is equivalent to a primal-dual algorithm applied to the saddle-point formulation of the same problem. These results are surprising since the primal and dual variables in ADM are seemingl treated ver differentl, and some previous work ehibit preferences in one over the other on specific problems. In addition, when one of the two objective functions is quadratic, possibl subject to an affine constraint, we show that swapping the update order of the two primal variables in ADM gives the same algorithm. These results identif the few trul different ADM algorithms for a problem, which generall have different forms of subproblems from which it is eas to pick one with the most computationall friendl subproblems. M. Yan Department of Computational Mathematics, Science and Engineering, Department of Mathematics, Michigan State Universit, East Lansing, MI 4884, USA. anm@math.msu.edu W. Yin Department of Mathematics, Universit of California, Los Angeles, CA 9009, USA. wotaoin@math.ucla.edu

2 Ming Yan and Wotao Yin Kewords: alternating direction method of multipliers, ADM, ADMM, Douglas- Rachford splitting (DRS), Peaceman-Rachford splitting (PRS), primal-dual algorithm Introduction The alternating direction method of multipliers (ADM or ADMM) is a ver popular algorithm with a wide range of applications in signal and image processing, machine learning, statistics, compressive sensing, and operations research. Combined with problem reformulation tricks, the method can reduce a complicated problem into much simpler subproblems. The vanilla ADM applies to a linearl-constrained problem with a separable conve objective function in the following ADM-read form: { minimize f () + g(), subject to A + B = b, (P) where functions f, g are proper, closed (i.e., lower semi-continuous), conve but not necessaril differentiable. ADM reduces (P) into two simpler subproblems and then iterativel updates,, as well as a multiplier (dual) variable z. Given ( k, k,z k ), ADM generates ( k+, k+,z k+ ) as follows. k+ argmin f () + (λ/) A + B k b + λ z k,. k+ argming() + (λ/) A k+ + B b + λ z k,. z k+ = z k + λ(a k+ + B k+ b), where λ > 0 is a fied parameter. We use since the subproblems do not necessaril have unique solutions. Since { f,a,} and {g,b,} are in smmetric positions in (P), swapping them does not change the problem. This corresponds to switching the order that and are updated in each iteration. But, since the variable updated first is used in the updating of the other variable, this swap leads to a different sequence of variables and thus a different algorithm. Note that the order switch does not change the per-iteration cost of ADM. Also note that one, however, cannot mi the two update orders at different iterations because it will generall cause divergence, even when the primal-dual solution to (P) is unique. For eample, let us appl ADMM with mied update orders of and and parameter λ = to the problem minimize, 0 + subject to = 0, which has the unique primal-dual solution (,,z ) = (0,0,). Set initial values ( 0, 0,z 0 ) = (,,). At odd iterations, we appl the update order:,, and z; at

3 Self Equivalence of the Alternating Direction Method of Multipliers even iterations, we appl the update order:,, and z. Then we obtain ( k, k,z k ) = (,,) for odd k and ( k, k,z k ) = (,,) for even k.. ADM works in man different was In spite of its popularit and vast literature, there are still simple unanswered questions about ADM: how man was can ADM be applied? and which was work better? Before answering these questions, let us eamine the following problem, to which we can find twelve different was to appl ADM: minimize u() + v(c), () where u and v are proper, closed, conve functions and C is a linear mapping. Problem () generalizes a large number of signal and image processing problems, inverse problems, and machine learning models. We shall reformulate () into the form of (P). B introducing dumm variables in two different was, we obtain two ADM-read formulations of problem (): { minimize u() + v(), subject to C = 0 and { minimize u() + v(cȳ),ȳ subject to ȳ = 0. () If C = I, these two formulations are eactl the same. In addition, we can derive the dual problem of (): minimize u ( C v) + v (v), () v where u,v are the conve conjugates (i.e., Legendre transforms) of functions u,v, respectivel, C is the adjoint of C, and v is the dual variable. (The steps to derive () from () are standard and thus omitted.) Then, we also reformulate () into two ADM-read forms, which use different dumm variables: { minimize u (u) + v (v) u,v subject to u + C v = 0 and { minimize u (C ū) + v (v) ū,v subject to ū + v = 0. (4) Clearl, ADM can be applied to all of the four formulations in () and (4), and including the update order swaps, there are eight different was to appl ADM. Under some technical conditions such as the eistence of saddle-point solutions, all the eight ADM will converge to a saddle-point solution or solution for problem (). In short, the all work. It is worth noting that b the Moreau identit, the subproblems involving u and v can be easil reduced to subproblems involving u and v, respectivel. No significant computing is required. The two formulations in (), however, lead to significantl different ADM subproblems. In the ADM applied to the left formulation, u and C will appear in one

4 4 Ming Yan and Wotao Yin subproblem and v in the other subproblem. To the right formulation, u will be alone while v and C will appear in the same subproblem. This difference applies to the two formulations in (4) as well. It depends on the structures of u,v,c to determine the better choices. Therefore, out of the eight, four will have (more) difficult subproblems than the rest. There are another four was to appl ADM to problem (). Ever one of them will have three subproblems that separatel involve u, v, C, so the are all different from the above eight. To get the first two, let us take the left formulation in () and introduce a dumm variable s, obtaining a new equivalent formulation minimize,,s u(s) + v() subject to C = 0, s = 0. It turns out that the same dumm variable trick applied to the right formulation in () also gives (), up to a change of variable names. Although there are three variables, we can group (,s) and treat and (,s) as the two variables. Then problem () has the form (P). Hence, we have two was to appl ADM to () with two different update orders. Note that and s do not appear together in an equation or function, so the ADM subproblem that updates (, s) will further decouple to two separable subproblems of and s; in other words, the resulting ADM has three subproblems involving {, C},{, v}, {s, u} separatel. The other two was are results of the same dumm variable trick applied to either formulation in (4). Again, since now C has its own subproblem, these four was are distinct from the previous eight was. As demonstrated through an eample, there are quite man was to formulate the same optimization problem into ADM-read forms and obtain different ADM algorithms. While most ADM users choose just one wa without paing much attention to the other choices, some show preferences toward a specific formulation. For eample, some prefer () over those in () and (4) since C, u, v all end up in separate subproblems. When appling ADM to certain l minimization problems, the authors of [4, ] emphasize on the dual formulations, and later the authors of [] show a preference over the primal formulations. When ADM was proposed to solve a traffic equilibrium problem, it was first applied to the dual formulation in [] and, ears later, to the primal formulation in []. Regarding which one of the two variables should be updated first in ADM, neither a rule nor an equivalence claim is found in the literature. Other than giving preferences to ADM with simpler subproblems, there is no results that compare the different formulations. (). Contributions This chapter shows that, applied to certain pairs of different formulations of the same problem, ADM will generate equivalent sequences of variables that can be

5 Self Equivalence of the Alternating Direction Method of Multipliers mapped eactl from one to another at ever iteration. Specificall, between the sequence of an ADM algorithm on a primal formulation and that on the corresponding dual formulation, such maps eist. For a special class of problems, this mapping is provided in [9]. We also show that whenever at least one of f and g is a quadratic function (including affine function as a special case), possibl subject to an affine constraint, the sequence of an ADM algorithm can be mapped to that of the ADM algorithm using the opposite order for updating their variables. Abusing the word equivalence, we sa that ADM has primal-dual equivalence and update-order equivalence (with a quadratic objective function). Equivalent ADM algorithms take the same number of iterations to reach the same accurac. (However, it is possible that one algorithm is slightl better than the other in terms of numerical stabilit, for eample, against round-off errors.) Equipped with these equivalence results, the first eight was to appl ADM to problem () that were discussed in Section. are reduced to four was in light of primal-dual equivalence, and the four will further reduce to two whenever u or v, or both, is a quadratic function. The last four was to appl ADM on problem () discussed in Section., which ield three subproblems that separatel involve u, v, and C, are all equivalent and reduce to just one due to primal-dual equivalence and one variable in them is associated with 0 objective (for eample, variable has 0 objective in problem ()). Take the l p -regularization problem, p [, ], minimize p + f (C) (6) as an eample, which is a special case of problem () with a quadratic function u when p =. We list its three different formulations, whose ADM algorithms are trul different, as follows. When p and f is non-quadratic, each of the first two formulations leads to a pair of different ADM algorithms with different orders of variable update; otherwise, each pair of algorithms is equivalent.. Left formulation of (): { minimize p + f (), subject to C = 0. The subproblem for involves l p -norm and C. The other one for involves f.. Right formulation of (): { minimize p + f (C), subject to = 0. The subproblem for involves l p -norm and, for p = and, has a closed-form solution. The other subproblem for involves f (C ).. Formulation (): for an µ > 0,

6 6 Ming Yan and Wotao Yin minimize s p + f (),,s subject to C = 0, µ( s) = 0. The subproblem for is a quadratic program involving C C+ µi. The subproblem for s involves l p -norm. The subproblem for involves f. The subproblems for s and are independent. The best choice depends on which has the simplest subproblems. The result of ADM s primal-dual equivalence is surprising for three reasons. Firstl, ADM iteration updates two primal variable, k and k in (P) and one dual variable, all in different manners. The updates to the primal variables are done in a Gauss-Seidel manner and involve minimizing functions f and g, but the update to the dual variable is eplicit and linear. Surprisingl, ADM actuall treats one of the two primal variables and the dual variable equall as we will later show. Secondl, most literature describes ADM as an ineact version of the augmented Lagrangian method (ALM) [7], which updates (, ) together rather than one after another. Although ALM maintains the primal variables, under the hood ALM is the dual-onl proimal-point algorithm that iterates the dual variable. It is commonl believed that ADM is an ineact dual algorithm. Thirdl, primal and dual problems tpicall have different sizes and regularit properties, causing the same algorithm, even if it is applicable to both, to ehibit different performance. For eample, the primal and dual variables ma have different dimensions. If the primal function f is Lipschitz differentiable, the dual function f is strongl conve but can be non-differentiable, and vice versa. Such primal-dual differences often mean that it is numericall advantageous to solve one rather than the other, et our result means that there is no such primal-dual difference on ADM. Our maps between equivalent ADM sequences have ver simple forms, as the reader will see below. Besides the technical proofs that establish the maps, it is interesting to mention the operator-theoretic perspective of our results. It is shown in [] that the dual-variable sequence of ADM coincides with a sequence of the Douglas- Rachford splitting (DRS) algorithm [7, 8]. Our ADM s primal-dual equivalence can be obtained through the above ADM DRS relation and the Moreau identit: pro h +pro h = I, applied to the proimal maps of f and f and those of g and g. The details are omitted in this chapter. Here, pro h () := argmin s h(s) + s. Our results of primal-dual and update-order equivalence for ADM etends to the Peaceman-Rachford splitting (PRS) algorithm. Let the PRS operator [9] be denoted as T PRS = (pro f I) (pro g I). The DRS operator is the average of the identit map and the PRS operator: T DRS = I + T PRS, and the relaed PRS (RPRS) operator is a weighted-average: T RPRS = ( α)i + αt PRS, where α (0,]. The DRS and PRS algorithms that iterativel appl their operators to find a fied point were originall proposed for evolving PDEs with two spatial dimensions in the 90s and then etended to finding a root of the sum of two maimal monotone (set-valued) mappings b Lions and Mercier [8]. Eckstein showed, in [8, Chapter.], that DRS/PRS applied to the primal problem () is equivalent

7 Self Equivalence of the Alternating Direction Method of Multipliers 7 to DRS/PRS applied to the dual problem (4) when C = I. We will show that RPRS applied to () is equivalent to RPRS applied to () for all C. In addition to the aforementioned primal-dual and update-order equivalence, we obtain a primal-dual algorithm for the saddle-point formulation of (P) that is also equivalent to the ADM. This primal-dual algorithm is generall different from the primal-dual algorithm proposed b Chambolle and Pock [], while the become the same in a special case. The connection between these two algorithms will be eplained. Even when using the same number of dumm variables, trul different ADM algorithms can have different iteration compleities (do not confuse them with the difficulties of their subproblems). The convergence analsis of ADM, such as conditions for sublinear or linear convergence, involves man different scenarios [4 6]. The discussion of convergence rates of ADM algorithms is beond the scope of this chapter. Our focus is on the equivalence.. Organization This chapter is organized as follows. Section specifies our notation, definitions, and basic assumptions. The three equivalence results for ADM are shown in Sections 4,, and 6: The primal-dual equivalence of ADM is discussed in Sections 4; ADM is shown to be equivalent to a primal-dual algorithm applied to the saddlepoint formulation in Section ; In Section 6, we show the update-order equivalence of ADM if f or g is a quadratic function, possibl subject to an affine constraint. Sections 4-6 do not require an knowledge of monotone operators. The primal-dual and update-order equivalence of RPRS is shown in Section 7 based on monotone operator properties. We conclude this chapter with the application of our results on total variation image denoising in Section 8. Notation, definitions, and assumptions Let H, H, and G be (possibl infinite dimensional) Hilbert spaces. Bold lowercase letters such as,, u, and v are used for points in the Hilbert spaces. In the eample of (P), we have H, H, and b G. When the Hilbert space a point belongs to is clear from the contet, we do not specif it for the sake of simplicit. The inner product between points and is denoted b,, and :=, is the corresponding norm; and denote the l and l norms, respectivel. Bold uppercase letters such as A and B are used for both continuous linear mappings and matrices. A denotes the adjoint of A. I denotes the identit mapping. If C is a conve and nonempt set, the indicator function ι C is defined as follows:

8 8 Ming Yan and Wotao Yin { 0, if C, ι C () =, if / C. Both lower and upper case letters such as f, g, F, and G are used for functions. Let f () be the subdifferential of function f at. The proimal operator pro f ( ) of function f is defined as pro f ( ) () = argmin f () +, where the minimization problem has a unique solution. The conve conjugate f of function f is defined as f (v) = sup{ v, f ()}. Let L : H G, the infimal postcomposition [, Def..] of f : H (,+ ] b L is given b with dom(l f ) = L(dom( f )). L f : s inf f (L (s)) = inf f (), :L=s Lemma. If f is conve and L is affine and epressed as L( ) = A +b, then L f is conve and the conve conjugate of L f can be found as follows: (L f ) ( ) = f (A ) +,b. Proof. Following from the definitions of conve conjugate and infimal postcomposition, we have (L f ) (v) = sup v, L f () = sup v,a + b f () = sup A v, f () + v,b = f (A v) + v,b. Definition. We sa that an algorithm I applied to a problem is equivalent to an algorithm II applied to either the same or an equivalent problem if, given the set of parameters and a sequence of iterates {ξ k } k 0 of algorithm II, i.e., ξ k+ = A (ξ k k,ξ,,ξ k ) with 0, there eist a set of parameters and a sequence of iterates {ξ k } k 0 of algorithm I such that ξ k = T (ξ k k,ξ,,ξ k ) for some transformation T and 0. Definition. An optimization algorithm is called primal-dual equivalent if this algorithm applied to the primal formulation is equivalent to the same algorithm applied to its Lagrange dual. It is important to note that most algorithms are not primal-dual equivalent. ALM applied to the primal problem is equivalent to proimal point method applied to

9 Self Equivalence of the Alternating Direction Method of Multipliers 9 the dual problem [0], but both algorithms are not primal-dual equivalent. In this chapter, we will show that ADM and RPRS are primal-dual equivalent. We make the following assumptions throughout the chapter: Assumption All the functions in this chapter are assumed to be proper, closed, and conve. Assumption The saddle-point solutions to all the optimization problems in this chapter are assumed to eist. Equivalent problems A primal formulation equivalent to (P) is { minimize F(s) + G(t) s,t subject to s + t = 0, (P) where s,t G and F(s) := min f () + ι {:A=s} (), (7a) G(t) := ming() + ι {:B b=t} (). (7b) Remark. If we define L f and L g as L f () = A and L g () = B b, respectivel, then F = L f f, G = L g g. The Lagrange dual of (P) is minimize v f ( A v) + g ( B v) + v,b, (8) ( ) which can be derived from minimize v min, with the Lagrangian defined as follows: An ADM-read formulation of (8) is L(,,v) = f () + g() + v,a + B b. { minimize f ( A u) + g ( B v) + v,b u,v subject to u v = 0. (D) When ADM is applied to an ADM-read formulation of the Lagrange dual problem, we call it Dual ADM. The original ADM is called Primal ADM.

10 0 Ming Yan and Wotao Yin is Following similar steps, the ADM read formulation of the Lagrange dual of (P) { minimize F ( u) + G ( v) u,v subject to u v = 0. (D) The equivalence between (D) and (D) is trivial since F (u) = f (A u), G (v) = g (B v) v,b, which follows from Lemma. Although there can be multiple equivalent formulations of the same problem (e.g., (P), (P), (8), and (D)/(D) are equivalent), an algorithm ma or ma not be applicable to some of them. Even when the are, on different formulations, their behaviors such as convergence and speed of convergence are different. In particular, most algorithms have different behaviors on primal and dual formulations of the same problem. An algorithm applied to a primal formulation does not dictate the behaviors of the same algorithm applied to the related dual formulation. The simple method in linear programming has different performance when applied to both the primal and dual problems, i.e., the primal simple method starts with a primal basic feasible solution (dual infeasible) until the dual feasibilit conditions are satisfied, while the dual simple method starts with a dual basic feasible solution (primal infeasible) until the primal feasibilit conditions are satisfied. The ALM also has different performance when applied to the primal and dual problems, i.e., ALM applied to the primal problem is equivalent to proimal point method applied to the related dual problem, and proimal point method is, in general, different from ALM on the same problem. 4 Primal-dual equivalence of ADM In this section we show the primal-dual equivalence of ADM. Algorithms - describe how ADM is applied to (P), (P), and (D)/ (D) [4, ]. Algorithm ADM on (P) initialize 0, z0, λ > 0 for k = 0,, do k+ argming() + (λ) A k + B b + λzk k+ argmin f () + (λ) A + B k+ b + λz k z k+ = z k + λ (A k+ + B k+ b) end for

11 Self Equivalence of the Alternating Direction Method of Multipliers Algorithm ADM on (P) initialize s 0, z0, λ > 0 for k = 0,, do t k+ = argming(t) + (λ) s k + t + λzk s k+ = argmin t s z k+ = z k + λ (s k+ + t k+ end for F(s) + (λ) s + t k+ + λz k ) Algorithm ADM on (D)/(D) initialize u 0, z0, λ > 0 for k = 0,, do v k+ = argming ( v) + λ uk v + λ z k u k+ v = argmin u z k+ = z k + λ(uk+ v k+ end for F ( u) + λ u vk+ + λ z k ) The k and k in Algorithm ma not be unique because of the matrices A and B, while A k and Bk are unique. In addition, Ak and Bk are calculated for twice and thus stored in the implementation of Algorithm to save the second calculation. Following the equivalence of Algorithms and in Part of the following Theorem, we can view problem (P) as the master problem of (P). We can sa that ADM is essentiall an algorithm applied onl to the master problem (P), which is Algorithm ; this fact has been obscured b the often-seen Algorithm, which integrates ADM on the master problem with the independent subproblems in (7). Theorem (Equivalence of Algorithms -). Suppose A 0 = s0 = z0 and z0 = z 0 = u0 and that the same parameter λ is used in Algorithms -. Then, their equivalence can be established as follows:. From k, k, zk of Algorithm, we obtain tk, sk, zk of Algorithm through: t k = Bk b, s k = Ak, z k = zk. (9a) (9b) (9c) From t k, sk, zk of Algorithm, we obtain k, k, zk of Algorithm through: k = argmin {g() : B b = t k }, (0a) k = argmin { f () : A = s k }, (0b) z k = zk. (0c)

12 Ming Yan and Wotao Yin. We can recover the iterates of Algorithms and from each other through u k = zk, zk = sk. () Proof. Part. Proof b induction. We argue that under (9b) and (9c), Algorithms and have essentiall identical subproblems in their first steps at the kth iteration. Consider the following problem, which is obtained b plugging the definition of G( ) into the t k+ -subproblem of Algorithm : ( k+,t k+ ) = argming() + ι {(,t):b b=t} (,t) + (λ) s k + t + λzk. (),t If one minimizes over first while keeping t as a variable, one eliminates and recovers the t k+ -subproblem of Algorithm. If one minimizes over t first while keeping as a variable, then after plugging in (9b) and (9c), problem () reduces to the k+ -subproblem of Algorithm. In addition, ( k+,t k+ ) obes t k+ = B k+ b, () which is (9a) at k +. Plugging t = t k+ into () ields problem (0a) for k+, which must be equivalent to the k+ -subproblem of Algorithm. Therefore, the k+ -subproblem of Algorithm and the t k+ -subproblem of Algorithm are equivalent through (9a) and (0a) at k +, respectivel. Similarl, under () and (9c), we can show that the k+ -subproblem of Algorithm and the s k+ -subproblem of Algorithm are equivalent through the formulas for (9b) and (0b) at k +, respectivel. Finall, under (9a) and (9b) at k + and z k = zk, the formulas for zk+ and z k+ in Algorithms and are identical, and the return z k+ = z k+, which is (9c) and (0c) at k +. Part. Proof b induction. Suppose that () holds. We shall show that () holds at k +. Starting from the optimalit condition of the t k+ -subproblem of Algorithm, we derive 0 G(t k+ ) + λ (s k + tk+ + λz k ) t k+ G ( λ (s k + tk+ + λz k )) [ ] λ λ (s k + tk+ + λz k ) (λz k + sk ) G ( λ (s k + tk+ + λz k )) [ ] λ (s k + tk+ + λz k ) + (λu k + zk ) G ( λ (s k + tk+ λ + λz k )) [ ] 0 G ( λ (s k + tk+ + λz k )) λ u k λ (s k + tk+ + λz k ) + λ z k v k+ = λ (s k + tk+ + λz k ) = λ (z k + tk+ + λz k ), where the last equivalence follows from the optimalit condition for the v k+ - subproblem of Algorithm.

13 Self Equivalence of the Alternating Direction Method of Multipliers Starting from the optimalit condition of the s k+ -subproblem of Algorithm, and appling the update, z k+ = z k +λ (s k+ +t k+ ), in Algorithm and the identit of t k+ obtained above, we derive 0 F(s k+ ) + λ (s k+ + t k+ + λz k ) 0 F(s k+ ) + z k+ 0 s k+ F ( z k+ ) 0 λ(z k+ z k ) tk+ F ( z k+ ) 0 λ(z k+ z k ) + zk + λ(zk vk+ ) F ( z k+ ) 0 F ( z k+ ) + λ(z k+ v k+ + λ z k ) z k+ = u k+. where the last equivalence follows from the optimalit condition for the u k+ - subproblem of Algorithm. Finall, combining the update formulas of z k+ and z k+ in Algorithms and, respectivel, as well as the identities for u k+ and v k+ obtained above, we obtain z k+ = z k + λ(uk+ v k+ ) = s k + λ(z k+ z k λ (s k + tk+ )) = λ(z k+ z k ) tk+ = s k+. Remark. Part of the theorem (ADM s primal-dual equivalence) can also be derived b combining the following two equivalence results: (i) the equivalence between ADM on the primal problem and the Douglas-Rachford splitting (DRS) algorithm [7, 8] on the dual problem [], and (ii) the equivalence result between DRS algorithms applied to the master problem (P) and its dual problem (cf. [8, Chapter.] [9]). In this chapter, however, we provide an elementar algebraic proof in order to derive the formulas in Theorem that recover the iterates of one algorithm from another. Part of the theorem shows that ADM is a smmetric primal-dual algorithm. The reciprocal positions of parameter λ indicates its function to balance the primal and dual progresses. Part of the theorem also shows that Algorithms and have no difference, in terms of per-iteration compleit and the number of iterations needed to reach an accurac. However, Algorithms and have difference in terms of per-iteration compleit. In fact, Algorithm is implemented for Algorithm because Algorithm has smaller compleit than Algorithm. See the eamples in Sections 4. and 4..

14 4 Ming Yan and Wotao Yin 4. Primal-dual equivalence of ADM on () with three subproblems In Section., we introduced four different was to appl ADM on () with three subproblems. The ADM-read formulation for the primal problem is (), and the ADM applied to this formulation is k+ = argmin s k + λz k s + C k + λz k, s k+ = argminu(s) + (λ) k+ s + λz k s, s k+ = argminv() + (λ) C k+ + λz k, (4a) (4b) (4c) z k+ s = z k s + λ ( k+ s k+ ), (4d) z k+ = z k + λ (C k+ k+ ). (4e) Similarl, we can introduce a dumm variable t into the left formulation in (4) and obtain a new equivalent formulation { minimize u (u) + v (t) u,v,t subject to C v + u = 0, v t = 0. () The ADM applied to () is v k+ = argmin C v + u k + λ z k u + v t k + λ z k t, v u k+ = argminu (u) + λ u C v k+ + u + λ z k u, (6a) (6b) t k+ = argminv (t) + λ t vk+ t + λ z k t, (6c) z k+ u = z k u + λ(c v k+ + u k+ ), (6d) z k+ t = z k t + λ(v k+ t k+ ). (6e) Interestingl, as shown in the following theorem, ADM algorithms (4) and (6) applied to () and () are equivalent. Theorem. If the initialization for algorithms (4) and (6) satisfies z 0 = t 0, z 0 s = u 0, s 0 = z 0 u, and 0 = z 0 t. Then for k, we have the following equivalence results between the iterations of the two algorithms: z k = t k, z k s = u k, s k = z k u, k = z k t. The proof is similar to the proof of Theorem and is omitted here.

15 Self Equivalence of the Alternating Direction Method of Multipliers 4. Eample: basis pursuit The basis pursuit problem seeks for the minimal l solution to a set of linear equations: Its Lagrange dual is minimize u subject to Au = b. (7) u minimize b T subject to A. (8) The YALL algorithms [4] implement ADMs on a set of primal and dual formulations for basis pursuit and LASSO, et ADM for (7) is not given (however, a linearized ADM is given for (7)). Although seemingl awkward, problem (7) can be turned equivalentl into the ADM-read form minimize v + ι {u:au=b} (u) subject to u v = 0. (9) u,v Similarl, problem (8) can be turned equivalentl into the ADM-read form minimize b T + ι B, () subject to A = 0, (0) where B = { : }. For simplicit, let us suppose that A has full row rank so the inverse of AA eists. (Otherwise, Au = b are redundant whenever the are consistent; and (AA ) shall be replaced b the pseudo-inverse below.) ADM for problem (9) can be simplified to the iteration: v k+ =argmin v + λ uk v + λ zk, v (a) u k+ =v k+ λ zk A (AA ) (A(v k+ λ zk ) b), (b) z k+ =z k + λ(uk+ v k+ ). (c) And ADM for problem (0) can be simplified to the iteration: k+ =P B (A k + λzk ), (a) k+ =(AA ) (A k+ λ(az k b)), (b) z k+ =z k + λ (A k+ k+ ), (c) where P B is the projection onto B. Looking into the iteration in (), we can find that A k is used in both the kth and k + st iterations. To save the computation, we can store A k as sk. In addition, let tk = k and zk = zk, we have

16 6 Ming Yan and Wotao Yin t k+ =P B (s k + λzk ), (a) s k+ =A (AA ) (A(t k+ λaz k ) + λb)), (b) z k+ =z k + λ (s k+ t k+ ), (c) which is eactl Algorithm for (0). Thus, Algorithm has smaller compleit than Algorithm, i.e., one matri vector multiplication A k is saved from Algorithm. The corollar below follows directl from Theorem b associating (0) and (9) as (P) and (D), and () and () with the iterations of Algorithms and, respectivel. Corollar. Suppose that Au = b are consistent. Consider ADM iterations () and (). Let u 0 = z0 and z0 = A 0. Then, for k, iterations () and () are equivalent. In particular, From k, zk in (), we obtain uk, zk in () through: u k = zk, zk = A k. From u k, zk in (), we obtain k, zk in () through: k = (AA ) Az k, zk = uk. 4. Eample: basis pursuit denoising The basis pursuit denoising problem is and its Lagrange dual, in the ADM-read form, is minimize u + u α Au b (4) minimize b, + α, + ι B () subject to A = 0. () The iteration of ADM for () is k+ =P B (A k + λzk ), (6a) k+ =(AA + αλi) (A k+ λ(az k b)), (6b) z k+ =z k + λ (A k+ k+ ). (6c) Looking into the iteration in (6), we can find that A k is used in both the kth and k + st iterations. To save the computation, we can store A k as sk. In addition, let t k = k and zk = zk, we have

17 Self Equivalence of the Alternating Direction Method of Multipliers 7 t k+ =P B (s k + λzk ), (7a) s k+ =A (AA + αλi) (A(t k+ λz k ) + λb)), (7b) z k+ =z k + λ (s k+ t k+ ), (7c) which is eactl Algorithm for (). Thus, Algorithm has a lower per iteration compleit than Algorithm, i.e., one matri vector multiplication A k is saved from Algorithm. In addition, if A A = I, (7b) becomes s k+ =(αλ + ) (t k+ λz k + λa b), (8) and no matri vector multiplications is needed during the iteration because λa b can be precalculated. The ADM-read form of the original problem (4) is whose ADM iteration is minimize v + u,v α Au b subject to u v = 0, (9) v k+ =argmin v + λ uk v + λ zk, v (0a) u k+ =(A A + αλi) (A b + αλv k+ αz k ), (0b) z k+ =z k + λ(uk+ v k+ ). (0c) The corollar below follows directl from Theorem. Corollar. Consider ADM iterations (6) and (0). Let u 0 = z0 and z0 = A 0. For k, ADM on the dual and primal problems (6) and (0) are equivalent in the following wa: From k, zk in (6), we recover uk, zk in (0) through: u k = zk, zk = A k. From u k, zk in (0), we recover k, zk in (6) through: k = (Auk b)/α, zk = uk. Remark. Iteration (0) is different from that of ADM for another ADM-read form of (4) minimize u,v u + α v subject to Au v = b, ()

18 8 Ming Yan and Wotao Yin which is used in [4]. In general, there are different ADM-read forms and their ADM algorithms ield different iterates. ADM on one ADM-read form is equivalent to it on the corresponding dual ADM-read form. ADM as a primal-dual algorithm on the saddle-point problem As shown in Section 4, ADM on a pair of conve primal and dual problems are equivalent, and there is a connection between z k in Algorithm and dual variable u k in Algorithm. This primal-dual equivalence naturall suggests that ADM is also equivalent to a primal-dual algorithm involving both primal and dual variables. We derive problem (P) into an equivalent primal-dual saddle-point problem () as follows: min, g() + f () + ι {(,):A=b B}(,) =ming() + F(b B) =minma g() + u,b B u F ( u) () =minma g() + u,b b f ( A u). () u A primal-dual algorithm for solving () is described in Algorithm 4. Theorem establishes the equivalence between Algorithms and 4. Algorithm 4 Primal-dual formulation of ADM on problem () initialize u 0 4, u 4, 0 4, λ > 0 for k = 0,, do ū k 4 = uk 4 uk 4 k+ 4 = argming() + (λ) B B k 4 + λūk 4 u k+ end for 4 = argmin u f ( A u) u,b k+ 4 b + λ/ u u k 4 Remark 4. Publication [] proposed a primal-dual algorithm for () and obtained its connection to ADM [0]: When B = I, ADM is equivalent to the primal-dual algorithm in []; When B I, the primal-dual algorithm is a preconditioned ADM as an additional proimal term δ/ k 4 (λ) B B k 4 is added to the subproblem for k+ 4. This is also a special case of ineact ADM in [6]. Our Algorithm 4 is a primal-dual algorithm that is equivalent to ADM in the general case. Theorem (Equivalence between Algorithms and 4). Suppose that A 0 = λ(u 0 4 u 4 ) + b B0 4 and z0 = u0 4. Then, Algorithms and 4 are equivalent with

19 Self Equivalence of the Alternating Direction Method of Multipliers 9 the identities: for all k > 0. A k = λ(uk 4 uk 4 ) + b B k 4, zk = uk 4, (4) Proof. B assumption, (4) holds at iteration k = 0. Proof b induction. Suppose that (4) holds at iteration k 0. We shall establish (4) at iteration k +. From the first step of Algorithm, we have k+ =argming() + (λ) A k + B b + λzk =argming() + (λ) λ(u k 4 uk 4 ) + B B k 4 + λuk 4, which is the same as the first step in Algorithm 4. Thus we have k+ = k+ Combing the second and third steps of Algorithm, we have 4. 0 f ( k+ ) + λ A (A k+ + B k+ b + λz k ) = f (k+ ) + A z k+. Therefore, k+ f ( A z k+ ) = A k+ F ( z k+ ) λ(z k+ z k ) + b Bk+ F ( z k+ ) z k+ = argminf ( z) z,b k+ b + λ/ z z k z z k+ = argmin f ( A z) z,b k+ 4 b + λ/ z u k 4, z where the last line is the second step of Algorithm 4. Therefore, we have z k+ = u k+ 4 and A k+ = λ(z k+ z k ) + b Bk+ = λ(u k+ 4 u k 4 ) + b Bk Equivalence of ADM for different orders In both problem (P) and Algorithm, we can swap and and obtain Algorithm, which is still an ADM algorithm. In general, the two algorithms are different. In this section, we show that for a certain tpe of functions f (or g), Algorithms and become equivalent.

20 0 Ming Yan and Wotao Yin Algorithm ADM on (P) initialize 0, z0, λ > 0 for k = 0,, do k+ = argmin f () + (λ) A + B k b + λzk k+ = argmin g() + (λ) A k+ + B b + λz k z k+ = z k + λ (A k+ + B k+ b) end for The assumption that we need is that either pro F( ) or pro G( ) is affine (cf. (7) for the definitions of F and G). The definition of affine mapping is given in Definition. Definition. A mapping T is affine if T (r) T (0) is linear in r, i.e., T (αr + βr ) T (0) = α[t (r ) T (0)] + β[t (r ) T (0)], α,β R. () A mapping T is affine if and onl if it can be written as a linear mapping plus a constant, and the following proposition provides several equivalent statements for pro G( ) being affine. Proposition. Let λ > 0. The following statements are equivalent:. pro G( ) is affine;. pro λg( ) is affine;. apro G( ) bi + ci is affine for an scalars a, b and c; 4. pro G ( ) is affine;. G is conve quadratic (or, affine or constant) and its domain dom(g) is either G or the intersection of hperplanes in G. In addition, if function g is conve quadratic and its domain is the intersection of hperplanes, then function G defined in (7b) satisfies Part above. Proposition. If pro G( ) is affine, then the following holds for an r and r : pro G( ) (r r ) = pro G( ) r pro G( ) r. (6) Proof. Equation (6) is obtained b letting α = and β = in (). Theorem 4 (Equivalence of Algorithms and ).. Assume that pro λg( ) is affine. Given the sequences k, zk, and k of Algorithm, if 0 and z0 satisf z0 G(B0 b), then we can initialize Algorithm with 0 = and z0 = z0 + λ (A + B0 b), and recover the sequences k and zk of Algorithm through k = k+, (7a) z k = zk + λ (A k+ + B k b). (7b)

21 Self Equivalence of the Alternating Direction Method of Multipliers. Assume that pro λf( ) is affine. Given the sequences k, zk, and k of Algorithm, if 0 and z0 satisf z0 F(A0 ), then we can initialize Algorithm with 0 = and z0 = z0 + λ (A 0 + B b), and recover the sequences k and z k of Algorithm through k = k+, (8a) z k = zk + λ (A k + Bk+ b). (8b) Proof. We prove Part onl b induction. (The proof for the other part is similar.) The initialization of Algorithm clearl follows (7) at k = 0. Suppose that (7) holds at k 0. We shall show that (7) holds at k +. We first show from the affine propert of pro λg( ) that B k+ = B k+ B k. (9) The optimization subproblems for and in Algorithms and, respectivel, are as follows: k+ = argming() + (λ) A k + B b + λzk, k+ = argmin Following the definition of G in (7), we have g() + (λ) A k+ + B b + λz k. B k+ b = pro λg( ) ( A k λzk ), (40a) B k+ b = pro λg( ) ( A k+ λz k ), (40b) B k The third step of Algorithm is b = pro λg( ) ( A k λzk ). (40c) z k = zk + λ (A k + Bk b). (4) (Note that for k = 0, the assumption z 0 G(B0 b) ensures the eistence of z in (40c) and (4).) Then, (7) and (4) give us A k + λzk (7) = A k+ + λz k + Ak+ + B k b = (A k+ + λz k ) (λzk Bk + b) (4) = (A k+ + λz k ) (Ak + λzk ). Since pro λg( ) is affine, we have (6). Once we plug in (6): r = A k+ λz k, r = A k λzk, and r r = A k λzk and then appl (40), we obtain (9). Net, the third step of Algorithm and (9) give us

22 Ming Yan and Wotao Yin B k+ b + λz k (9) = (B k+ b) (B k b) + λzk + (Ak+ + B k b) = (B k+ b) + λz k + (Ak+ + B k+ b) = (B k+ b) + λz k+. This identit shows that the updates of k+ and k+ in Algorithms and, respectivel, have identical data, and therefore, we recover k+ = k+. Lastl, from the third step of Algorithm and the identities above, it follows that z k+ = z k + λ (A k+ + B k+ ( b) ) = z k + λ A k+ + (B k+ b + λz k+ λz k ) = z k+ + λ (A k+ + B k+ b). Therefore, we obtain (7) at k +. Remark. We can avoid the technical condition z 0 G(B0 b) on Algorithm in Part of Theorem 4. When it does not hold, we can use the alwas-true relation z G(B b) instead; correspondingl, we shall add one iteration to the iterates of Algorithm, namel, initialize Algorithm with 0 = and z 0 = z + λ (A + B b) and recover the sequences k and zk of Algorithm through k = k+, (4a) z k = zk+ + λ (A k+ + B k+ b). (4b) Similar arguments appl to the other part of Theorem 4. 7 Equivalence results of relaed PRS In this section, we consider the following conve problem: minimize and its corresponding Lagrangian dual f () + g(a), (P) minimize v f (A v) + g ( v). (D) In addition, we introduce another primal-dual pair equivalent to (P)-(D): minimize minimize u ( f A ) () + g(), (P4) f (u) + (g A) ( u). (D4)

23 Self Equivalence of the Alternating Direction Method of Multipliers Here (P4) is obtained as the dual of (D) b reformulating (D) as minimize v, v f (A v) + g ( v) subject to v = v, and (D4) is obtained as the dual of (P) in a similar wa. Lemma below will establish the equivalence between the two primal-dual pairs. Remark 6. When A = I, we have ( f A ) = f, and problem (P) is eactl the same as problem (P4). Similarl, problem (D) is eactl the same as problem (D4). Lemma. Problems (P) and (P4) are equivalent in the following sense: Given an solution to (P), = A is a solution to (P4), Given an solution to (P4), argmin :A= f () is a solution to (P). The equivalence between problems (D) and (D4) is similar: Given an solution v to (D), A v is a solution to (D4), Given an solution u to (D4), v argmin v:a v=u g ( v) is a solution to (D). Proof. We prove onl the equivalence of (P) and (P4), the proof for the equivalence of (D) and (D4) is similar. Part : If is a solution to (P), we have 0 f ( )+A g(a ). Assume that there eists q such that q g(a ) and A q f ( ). Then we have Therefore, A q f ( ) f (A q) = A A f (A q) = ( f A )(q) q ( f A ) (A ). 0 ( f A ) (A ) + g(a ) and A is a solution to (P4). Part : If is a solution to (P4), the optimalit condition gives us 0 ( f A ) ( ) + g( ). Assume that there eists q such that q g( ) and q ( f A ) ( ). Then we have q ( f A ) ( ) ( f A )(q). (4) Consider the following optimization problem for finding from minimize and the corresponding dual problem f () subject to A =,

24 4 Ming Yan and Wotao Yin maimize f (A v) + v,. v It is eas to obtain from (4) that q is a solution of the dual problem. The optimal dualit gap is zero and the strong dualit gives us f ( ) = f ( ) q,a = f (A q) + q,. (44) Thus is a solution of minimize Because q g( ) = g(a ), f () A q, and A q f ( ) 0 f ( ) A q. (4) 0 f ( ) + A g(a ) = f ( ) + (g A)( ) (46) Therefore is a solution of (P). Net we will show the equivalence between the RPRS to the primal and dual problems: RPRS on (P) RPRS on (D4) RPRS on (P4) RPRS on (D) We describe the RPRS on (P) in Algorithm 6, and the RPRS on other problems can be obtained in the same wa. Algorithm 6 RPRS on (P) initialize w 0, λ > 0, 0 < α. for k = 0,, do k+ = pro λ f ( ) w k w k+ = ( α)w k + α(pro λg A( ) I)( k+ w k ) end for Theorem. [Primal-dual equivalence of RPRS] RPRS on (P) is equivalent to RPRS on (D4). RPRS on (P4) is equivalent to RPRS on (D). Before proving this theorem, we introduce a lemma, which was also given in [8, Proposition.4]. Here, we prove it in a different wa using the generalized Moreau decomposition. Lemma. For λ > 0, we have λ (pro λf( ) I)w = (I pro λ F ( ) )(w/λ) = (pro λ F ( ) I)( w/λ). (47)

25 Self Equivalence of the Alternating Direction Method of Multipliers Proof. We prove it using the generalized Moreau decomposition [, Theorem..] w = pro λf( ) (w) + λpro λ F ( )(w/λ). (48) Using the generalized Moreau decomposition, we have λ (pro λf( ) I)w = λ pro λf( ) (w) w/λ The last equalit of (47) comes from (48) = λ (w λpro λ F ( )(w/λ)) w/λ = w/λ pro λ F ( ) (w/λ) = (I pro λ F ( ) )(w/λ). pro λ F ( ) ( w/λ) = pro λ F ( ) (w/λ). Proof (Proof of Theorem ). We will prove onl the equivalence of RPRS on (P) and (D4). The proof for the other equivalence is the same. The RPRS on (P) and (D4) can be formulated as and w k+ = ( α)w k + α(pro λg A( ) I)(pro λ f ( ) I)w k, (49) w k+ = ( α)w k + α(pro λ (g A) ( ) I)(pro λ f ( ) I)wk, (0) respectivel. In addition, we can recover the variables k (or v k ) from w k (or wk ) using the following: k+ = pro λ f ( ) w k, () v k+ = pro λ f ( ) wk. () Proof b induction. Suppose w k = wk /λ holds. We net show that wk+ = w k+ /λ. w k+ =( α)w k /λ + α(pro λ (g A) ( ) I)(pro λ f ( ) I)(wk /λ) (47) = ( α)w k /λ + α(pro λ (g A) ( ) I)( λ (pro λ f ( ) I)w k ) (47) = ( α)w k /λ + αλ (pro λ(g A)( ) I)(pro λ f ( ) I)w k =λ [( α)w k + α(pro λ(g A)( ) I)(pro λ f ( ) I)w k ] =w k+ /λ. In addition we have

26 6 Ming Yan and Wotao Yin k+ + λv k+ =pro λ f ( ) w k + λpro λ f ( ) wk =pro λ f ( ) w k + λpro λ f ( ) (wk /λ) = wk. Remark 7. Eckstein showed in [8, Chapter.] that DRS/PRS on (P) is equivalent to DRS/PRS on (D) when A = I. This special case can be obtained from this theorem immediatel because when A = I, (D) is eactl the same as (D4) and we have DRS/PRS on (P) DRS/PRS on (D4) DRS/PRS on (D) DRS/PRS on (P4). Remark 8. In order to make sure that RPRS on the primal and dual problems are equivalent, the initial conditions and parameters have to satisf conditions described in the proof of Theorem. We need the initial condition to satisf w 0 = w0 /λ and the parameter for RPRS on the dual problem has to be chosen as λ, see the differences in (49) and (0). Similarl to the ADM, we can swap f and g A and obtain a new RPRS. The iteration in Algorithm 6 can be written as w k+ = ( α)w k + α(pro λg A( ) I)(pro λ f ( ) I)w k, () and the RPRS after the swapping is w k+ = ( α)w k + α(pro λ f ( ) I)(pro λg A( ) I)w k. (4) We show below that for a certain tpe of function f (or g), () and (4) are equivalent. Theorem 6.. Assume that pro λ f ( ) is affine. If () and (4) initiall satisf w 0 = (pro λ f ( ) I)w 0, then wk = (pro λ f ( ) I)w k for all k.. Assume that pro λg A( ) is affine. If () and (4) initiall satisf w 0 = (pro λg A( ) I)w 0, then wk = (pro λg A( ) I)w k for all k. Proof. We onl prove the first statement, as the second one can be proved in a similar wa. We appl proof b induction. Suppose w k = (pro λ f ( ) I)w k holds. From (4), we have w k+ = ( α)(pro λ f ( ) I)w k + α(pro λ f ( ) I)(pro λg A( ) I)(pro λ f ( ) I)w k [ ] = (pro λ f ( ) I) ( α)w k + α(pro λg A( ) I)(pro λ f ( ) I)w k = (pro λ f ( ) I)w k+.

27 Self Equivalence of the Alternating Direction Method of Multipliers 7 The first equalit holds because pro λ f ( ) I is affine, which comes from the assumption that pro λ f ( ) is affine and Lemma. The second equalit comes from (). 8 Application: total variation image denoising ADM (or split Bregman [6]) has been applied on man image processing applications, and we appl the previous equivalence results of ADM to derive several equivalent algorithms for total variation denoising. The total variation (ROF model []) applied on image denoising is minimize + α BV (Ω) Ω b where stands for an image, and BV (Ω) is the set of all bounded variation functions on Ω. The first term is known as the total variation of, minimizing which tends to ield a piece-wise constant solution. The discrete version is as follows: minimize, + α b, where is a finite difference approimation of the gradient, which can be epressed as a linear operator. Without loss of generalit, we consider the twodimensional image, and the discrete total variation, of image is defined as, = ( ) i j, i j where is the -norm of a vector. The equivalent ADM-read form [6, Equation (.)] is minimize,, + α b subject to = 0, () and its dual problem in ADM-read form [, Equation (8)] is minimize v,u α div u + αb + ι {v: v, }(v) subject to u v = 0, (6) where v, = ma (v) i j and div u is the finite difference approimation of divergence that satisfies,div u =,u for an and u. In addition, the equivalent i j saddle-point problem is minimizemaimize v α b + v, ι {v: v, }(v). (7)

28 8 Ming Yan and Wotao Yin We list the following equivalent algorithms for solving the total variation image denoising problem. The equivalence result stated in Corollar can be obtained from Theorems -4.. Algorithm (primal ADM) on () is k+ =argmin k+ =argmin α b + (λ) k + λzk, (8a), + (λ) k+ + λz k, (8b) z k+ =z k + λ ( k+ k+ ). (8c). Algorithm (dual ADM) on (6) is =argmin α div u + αb + λ vk u + λ z k, u k+ u (9a) v k+ =argmin v ι {v: v, }(v) + λ v uk+ + λ z k, (9b) z k+ =z k + λ(vk+ u k+ ). (9c). Algorithm 4 (primal-dual) on (7) is v k 4 =vk 4 vk 4, (60a) α 4 =argmin b + (λ) k 4 + λ vk 4, (60b) k+ v k+ 4 =argmin v ι {v: v, }(v) v, k+ 4 + λ v vk. (60c) 4. Algorithm (primal ADM with order swapped) on () is k+ =argmin, + (λ) k + λzk, k+ =argmin (6a) α b + (λ) k+ + λz k, (6b) z k+ =z k + λ ( k+ k+ ). (6c) Corollar. Let 0 = b+α div z 0. If the initialization for all algorithms (8)-(6) satisfies 0 = z0 = 0 4 λ(v0 4 v 4 ) = and z0 = v0 = v0 4 = z0 + λ ( 0 ). Then for k, we have the following equivalence results between the iterations of the four algorithms: k = zk = k 4 λ(vk 4 vk 4 ) = k+, z k = vk = v k 4 = z k + λ ( k k+ ). Remark 9. In an of the four algorithms, the or div operator is separated in a different subproblem from the term, or its dual norm,. The or div op-

29 Self Equivalence of the Alternating Direction Method of Multipliers 9 erator is translation invariant, so their subproblems can be solved b a diagonalization trick []. The subproblems involving the term, or the indicator function ι {v: v, } have closed-form solutions. Therefore, in addition to the equivalence results, all the four algorithms have essentiall the same per-iteration costs. Acknowledgments This work is supported b NSF Grants DMS-498 and DMS-760 and ARO MURI Grant W9NF We thank Jonathan Eckstein for bringing his earl work [8, Chapter.] and [9] to our attention and anonmous reviewers for their helpful comments and suggestions. References. Bauschke, H.H., Combettes, P.L.: Conve Analsis and Monotone Operator Theor in Hilbert Spaces. Springer (0). Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 0(-), (004). Chambolle, A., Pock, T.: A first-order primal-dual algorithm for conve problems with applications to imaging. Journal of Mathematical Imaging and Vision 40(), 0 4 (0) 4. Davis, D., Yin, W.: Faster convergence rates of relaed Peaceman-Rachford and ADMM under regularit assumptions. arxiv preprint arxiv:407.0 (04). Davis, D., Yin, W.: Convergence rate analsis of several splitting schemes. In: R. Glowinski, S. Osher, W. Yin (eds.) Splitting Methods in Communication and Imaging, Science and Engineering, Chapter 4. Springer (06) 6. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. Journal of Scientific Computing 66(), (0) 7. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Transactions of the American Mathematical Societ 8(), 4 49 (96) 8. Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. thesis, Massachusetts Institute of Technolog (989) 9. Eckstein, J., Fukushima, M.: Some reformulations and applications of the alternating direction method of multipliers. In: Large Scale Optimization, pp. 4. Springer (994) 0. Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal-dual algorithms for conve optimization in imaging science. SIAM Journal on Imaging Sciences (4), (00). Esser, J.: Primal dual algorithms for conve models and applications to image restoration, registration and nonlocal inpainting. Ph.D. thesis, Universit of California, Los Angeles (00). Fukushima, M.: The primal Douglas-Rachford splitting algorithm for a class of monotone mappings with application to the traffic equilibrium problem. Mathematical Programming 7(), (996). Gaba, D.: Applications of the method of multipliers to variational inequalities. In: M. Fortin, R. Glowinski (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundar-Value Problems. North-Holland: Amsterdam, Amsterdam (98) 4. Gaba, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approimation. Computers & Mathematics with Applications (), 7 40 (976)

Self Equivalence of the Alternating Direction Method of Multipliers

Self Equivalence of the Alternating Direction Method of Multipliers Self Equivalence of the Alternating Direction Method of Multipliers Ming Yan Wotao Yin March 9, 2015 Abstract The alternating direction method of multipliers (ADM or ADMM) breaks a comple optimization

More information

Self Equivalence of the Alternating Direction Method of Multipliers

Self Equivalence of the Alternating Direction Method of Multipliers Self Equivalence of the Alternating Direction Method of Multipliers Ming Yan Wotao Yin August 11, 2014 Abstract In this paper, we show interesting self equivalence results for the alternating direction

More information

In applications, we encounter many constrained optimization problems. Examples Basis pursuit: exact sparse recovery problem

In applications, we encounter many constrained optimization problems. Examples Basis pursuit: exact sparse recovery problem 1 Conve Analsis Main references: Vandenberghe UCLA): EECS236C - Optimiation methods for large scale sstems, http://www.seas.ucla.edu/ vandenbe/ee236c.html Parikh and Bod, Proimal algorithms, slides and

More information

Linear programming: Theory
