Self Equivalence of the Alternating Direction Method of Multipliers

Size: px
Start display at page:

Download "Self Equivalence of the Alternating Direction Method of Multipliers"

Transcription

1 Self Equivalence of the Alternating Direction Method of Multipliers Ming Yan and Wotao Yin Abstract The alternating direction method of multipliers (ADM or ADMM) breaks a comple optimization problem into much simpler subproblems. The ADM algorithms are tpicall short and eas to implement et ehibit (nearl) state-of-the-art performance for large-scale optimization problems. To appl ADM, we first formulate a given problem into the ADM-read form, so the final algorithm depends on the formulation. A problem like minimize u() + v(c) has si different ADM-read formulations. The can be in the primal or dual forms, and the differ b how dumm variables are introduced. To each ADMread formulation, ADM can be applied in two different orders depending on how the primal variables are updated. Finall, we get twelve different ADM algorithms! How do the compare to each other? Which algorithm should one choose? In this chapter, we show that man of the different was of appling ADM are equivalent. Specificall, we show that ADM applied to a primal formulation is equivalent to ADM applied to its Lagrange dual; ADM is equivalent to a primal-dual algorithm applied to the saddle-point formulation of the same problem. These results are surprising since the primal and dual variables in ADM are seemingl treated ver differentl, and some previous work ehibit preferences in one over the other on specific problems. In addition, when one of the two objective functions is quadratic, possibl subject to an affine constraint, we show that swapping the update order of the two primal variables in ADM gives the same algorithm. These results identif the few trul different ADM algorithms for a problem, which generall have different forms of subproblems from which it is eas to pick one with the most computationall friendl subproblems. M. Yan Department of Computational Mathematics, Science and Engineering, Department of Mathematics, Michigan State Universit, East Lansing, MI 4884, USA. anm@math.msu.edu W. Yin Department of Mathematics, Universit of California, Los Angeles, CA 9009, USA. wotaoin@math.ucla.edu

2 Ming Yan and Wotao Yin Kewords: alternating direction method of multipliers, ADM, ADMM, Douglas- Rachford splitting (DRS), Peaceman-Rachford splitting (PRS), primal-dual algorithm Introduction The alternating direction method of multipliers (ADM or ADMM) is a ver popular algorithm with a wide range of applications in signal and image processing, machine learning, statistics, compressive sensing, and operations research. Combined with problem reformulation tricks, the method can reduce a complicated problem into much simpler subproblems. The vanilla ADM applies to a linearl-constrained problem with a separable conve objective function in the following ADM-read form: { minimize f () + g(), subject to A + B = b, (P) where functions f, g are proper, closed (i.e., lower semi-continuous), conve but not necessaril differentiable. ADM reduces (P) into two simpler subproblems and then iterativel updates,, as well as a multiplier (dual) variable z. Given ( k, k,z k ), ADM generates ( k+, k+,z k+ ) as follows. k+ argmin f () + (λ/) A + B k b + λ z k,. k+ argming() + (λ/) A k+ + B b + λ z k,. z k+ = z k + λ(a k+ + B k+ b), where λ > 0 is a fied parameter. We use since the subproblems do not necessaril have unique solutions. Since { f,a,} and {g,b,} are in smmetric positions in (P), swapping them does not change the problem. This corresponds to switching the order that and are updated in each iteration. But, since the variable updated first is used in the updating of the other variable, this swap leads to a different sequence of variables and thus a different algorithm. Note that the order switch does not change the per-iteration cost of ADM. Also note that one, however, cannot mi the two update orders at different iterations because it will generall cause divergence, even when the primal-dual solution to (P) is unique. For eample, let us appl ADMM with mied update orders of and and parameter λ = to the problem minimize, 0 + subject to = 0, which has the unique primal-dual solution (,,z ) = (0,0,). Set initial values ( 0, 0,z 0 ) = (,,). At odd iterations, we appl the update order:,, and z; at

3 Self Equivalence of the Alternating Direction Method of Multipliers even iterations, we appl the update order:,, and z. Then we obtain ( k, k,z k ) = (,,) for odd k and ( k, k,z k ) = (,,) for even k.. ADM works in man different was In spite of its popularit and vast literature, there are still simple unanswered questions about ADM: how man was can ADM be applied? and which was work better? Before answering these questions, let us eamine the following problem, to which we can find twelve different was to appl ADM: minimize u() + v(c), () where u and v are proper, closed, conve functions and C is a linear mapping. Problem () generalizes a large number of signal and image processing problems, inverse problems, and machine learning models. We shall reformulate () into the form of (P). B introducing dumm variables in two different was, we obtain two ADM-read formulations of problem (): { minimize u() + v(), subject to C = 0 and { minimize u() + v(cȳ),ȳ subject to ȳ = 0. () If C = I, these two formulations are eactl the same. In addition, we can derive the dual problem of (): minimize u ( C v) + v (v), () v where u,v are the conve conjugates (i.e., Legendre transforms) of functions u,v, respectivel, C is the adjoint of C, and v is the dual variable. (The steps to derive () from () are standard and thus omitted.) Then, we also reformulate () into two ADM-read forms, which use different dumm variables: { minimize u (u) + v (v) u,v subject to u + C v = 0 and { minimize u (C ū) + v (v) ū,v subject to ū + v = 0. (4) Clearl, ADM can be applied to all of the four formulations in () and (4), and including the update order swaps, there are eight different was to appl ADM. Under some technical conditions such as the eistence of saddle-point solutions, all the eight ADM will converge to a saddle-point solution or solution for problem (). In short, the all work. It is worth noting that b the Moreau identit, the subproblems involving u and v can be easil reduced to subproblems involving u and v, respectivel. No significant computing is required. The two formulations in (), however, lead to significantl different ADM subproblems. In the ADM applied to the left formulation, u and C will appear in one

4 4 Ming Yan and Wotao Yin subproblem and v in the other subproblem. To the right formulation, u will be alone while v and C will appear in the same subproblem. This difference applies to the two formulations in (4) as well. It depends on the structures of u,v,c to determine the better choices. Therefore, out of the eight, four will have (more) difficult subproblems than the rest. There are another four was to appl ADM to problem (). Ever one of them will have three subproblems that separatel involve u, v, C, so the are all different from the above eight. To get the first two, let us take the left formulation in () and introduce a dumm variable s, obtaining a new equivalent formulation minimize,,s u(s) + v() subject to C = 0, s = 0. It turns out that the same dumm variable trick applied to the right formulation in () also gives (), up to a change of variable names. Although there are three variables, we can group (,s) and treat and (,s) as the two variables. Then problem () has the form (P). Hence, we have two was to appl ADM to () with two different update orders. Note that and s do not appear together in an equation or function, so the ADM subproblem that updates (, s) will further decouple to two separable subproblems of and s; in other words, the resulting ADM has three subproblems involving {, C},{, v}, {s, u} separatel. The other two was are results of the same dumm variable trick applied to either formulation in (4). Again, since now C has its own subproblem, these four was are distinct from the previous eight was. As demonstrated through an eample, there are quite man was to formulate the same optimization problem into ADM-read forms and obtain different ADM algorithms. While most ADM users choose just one wa without paing much attention to the other choices, some show preferences toward a specific formulation. For eample, some prefer () over those in () and (4) since C, u, v all end up in separate subproblems. When appling ADM to certain l minimization problems, the authors of [4, ] emphasize on the dual formulations, and later the authors of [] show a preference over the primal formulations. When ADM was proposed to solve a traffic equilibrium problem, it was first applied to the dual formulation in [] and, ears later, to the primal formulation in []. Regarding which one of the two variables should be updated first in ADM, neither a rule nor an equivalence claim is found in the literature. Other than giving preferences to ADM with simpler subproblems, there is no results that compare the different formulations. (). Contributions This chapter shows that, applied to certain pairs of different formulations of the same problem, ADM will generate equivalent sequences of variables that can be

5 Self Equivalence of the Alternating Direction Method of Multipliers mapped eactl from one to another at ever iteration. Specificall, between the sequence of an ADM algorithm on a primal formulation and that on the corresponding dual formulation, such maps eist. For a special class of problems, this mapping is provided in [9]. We also show that whenever at least one of f and g is a quadratic function (including affine function as a special case), possibl subject to an affine constraint, the sequence of an ADM algorithm can be mapped to that of the ADM algorithm using the opposite order for updating their variables. Abusing the word equivalence, we sa that ADM has primal-dual equivalence and update-order equivalence (with a quadratic objective function). Equivalent ADM algorithms take the same number of iterations to reach the same accurac. (However, it is possible that one algorithm is slightl better than the other in terms of numerical stabilit, for eample, against round-off errors.) Equipped with these equivalence results, the first eight was to appl ADM to problem () that were discussed in Section. are reduced to four was in light of primal-dual equivalence, and the four will further reduce to two whenever u or v, or both, is a quadratic function. The last four was to appl ADM on problem () discussed in Section., which ield three subproblems that separatel involve u, v, and C, are all equivalent and reduce to just one due to primal-dual equivalence and one variable in them is associated with 0 objective (for eample, variable has 0 objective in problem ()). Take the l p -regularization problem, p [, ], minimize p + f (C) (6) as an eample, which is a special case of problem () with a quadratic function u when p =. We list its three different formulations, whose ADM algorithms are trul different, as follows. When p and f is non-quadratic, each of the first two formulations leads to a pair of different ADM algorithms with different orders of variable update; otherwise, each pair of algorithms is equivalent.. Left formulation of (): { minimize p + f (), subject to C = 0. The subproblem for involves l p -norm and C. The other one for involves f.. Right formulation of (): { minimize p + f (C), subject to = 0. The subproblem for involves l p -norm and, for p = and, has a closed-form solution. The other subproblem for involves f (C ).. Formulation (): for an µ > 0,

6 6 Ming Yan and Wotao Yin minimize s p + f (),,s subject to C = 0, µ( s) = 0. The subproblem for is a quadratic program involving C C+ µi. The subproblem for s involves l p -norm. The subproblem for involves f. The subproblems for s and are independent. The best choice depends on which has the simplest subproblems. The result of ADM s primal-dual equivalence is surprising for three reasons. Firstl, ADM iteration updates two primal variable, k and k in (P) and one dual variable, all in different manners. The updates to the primal variables are done in a Gauss-Seidel manner and involve minimizing functions f and g, but the update to the dual variable is eplicit and linear. Surprisingl, ADM actuall treats one of the two primal variables and the dual variable equall as we will later show. Secondl, most literature describes ADM as an ineact version of the augmented Lagrangian method (ALM) [7], which updates (, ) together rather than one after another. Although ALM maintains the primal variables, under the hood ALM is the dual-onl proimal-point algorithm that iterates the dual variable. It is commonl believed that ADM is an ineact dual algorithm. Thirdl, primal and dual problems tpicall have different sizes and regularit properties, causing the same algorithm, even if it is applicable to both, to ehibit different performance. For eample, the primal and dual variables ma have different dimensions. If the primal function f is Lipschitz differentiable, the dual function f is strongl conve but can be non-differentiable, and vice versa. Such primal-dual differences often mean that it is numericall advantageous to solve one rather than the other, et our result means that there is no such primal-dual difference on ADM. Our maps between equivalent ADM sequences have ver simple forms, as the reader will see below. Besides the technical proofs that establish the maps, it is interesting to mention the operator-theoretic perspective of our results. It is shown in [] that the dual-variable sequence of ADM coincides with a sequence of the Douglas- Rachford splitting (DRS) algorithm [7, 8]. Our ADM s primal-dual equivalence can be obtained through the above ADM DRS relation and the Moreau identit: pro h +pro h = I, applied to the proimal maps of f and f and those of g and g. The details are omitted in this chapter. Here, pro h () := argmin s h(s) + s. Our results of primal-dual and update-order equivalence for ADM etends to the Peaceman-Rachford splitting (PRS) algorithm. Let the PRS operator [9] be denoted as T PRS = (pro f I) (pro g I). The DRS operator is the average of the identit map and the PRS operator: T DRS = I + T PRS, and the relaed PRS (RPRS) operator is a weighted-average: T RPRS = ( α)i + αt PRS, where α (0,]. The DRS and PRS algorithms that iterativel appl their operators to find a fied point were originall proposed for evolving PDEs with two spatial dimensions in the 90s and then etended to finding a root of the sum of two maimal monotone (set-valued) mappings b Lions and Mercier [8]. Eckstein showed, in [8, Chapter.], that DRS/PRS applied to the primal problem () is equivalent

7 Self Equivalence of the Alternating Direction Method of Multipliers 7 to DRS/PRS applied to the dual problem (4) when C = I. We will show that RPRS applied to () is equivalent to RPRS applied to () for all C. In addition to the aforementioned primal-dual and update-order equivalence, we obtain a primal-dual algorithm for the saddle-point formulation of (P) that is also equivalent to the ADM. This primal-dual algorithm is generall different from the primal-dual algorithm proposed b Chambolle and Pock [], while the become the same in a special case. The connection between these two algorithms will be eplained. Even when using the same number of dumm variables, trul different ADM algorithms can have different iteration compleities (do not confuse them with the difficulties of their subproblems). The convergence analsis of ADM, such as conditions for sublinear or linear convergence, involves man different scenarios [4 6]. The discussion of convergence rates of ADM algorithms is beond the scope of this chapter. Our focus is on the equivalence.. Organization This chapter is organized as follows. Section specifies our notation, definitions, and basic assumptions. The three equivalence results for ADM are shown in Sections 4,, and 6: The primal-dual equivalence of ADM is discussed in Sections 4; ADM is shown to be equivalent to a primal-dual algorithm applied to the saddlepoint formulation in Section ; In Section 6, we show the update-order equivalence of ADM if f or g is a quadratic function, possibl subject to an affine constraint. Sections 4-6 do not require an knowledge of monotone operators. The primal-dual and update-order equivalence of RPRS is shown in Section 7 based on monotone operator properties. We conclude this chapter with the application of our results on total variation image denoising in Section 8. Notation, definitions, and assumptions Let H, H, and G be (possibl infinite dimensional) Hilbert spaces. Bold lowercase letters such as,, u, and v are used for points in the Hilbert spaces. In the eample of (P), we have H, H, and b G. When the Hilbert space a point belongs to is clear from the contet, we do not specif it for the sake of simplicit. The inner product between points and is denoted b,, and :=, is the corresponding norm; and denote the l and l norms, respectivel. Bold uppercase letters such as A and B are used for both continuous linear mappings and matrices. A denotes the adjoint of A. I denotes the identit mapping. If C is a conve and nonempt set, the indicator function ι C is defined as follows:

8 8 Ming Yan and Wotao Yin { 0, if C, ι C () =, if / C. Both lower and upper case letters such as f, g, F, and G are used for functions. Let f () be the subdifferential of function f at. The proimal operator pro f ( ) of function f is defined as pro f ( ) () = argmin f () +, where the minimization problem has a unique solution. The conve conjugate f of function f is defined as f (v) = sup{ v, f ()}. Let L : H G, the infimal postcomposition [, Def..] of f : H (,+ ] b L is given b with dom(l f ) = L(dom( f )). L f : s inf f (L (s)) = inf f (), :L=s Lemma. If f is conve and L is affine and epressed as L( ) = A +b, then L f is conve and the conve conjugate of L f can be found as follows: (L f ) ( ) = f (A ) +,b. Proof. Following from the definitions of conve conjugate and infimal postcomposition, we have (L f ) (v) = sup v, L f () = sup v,a + b f () = sup A v, f () + v,b = f (A v) + v,b. Definition. We sa that an algorithm I applied to a problem is equivalent to an algorithm II applied to either the same or an equivalent problem if, given the set of parameters and a sequence of iterates {ξ k } k 0 of algorithm II, i.e., ξ k+ = A (ξ k k,ξ,,ξ k ) with 0, there eist a set of parameters and a sequence of iterates {ξ k } k 0 of algorithm I such that ξ k = T (ξ k k,ξ,,ξ k ) for some transformation T and 0. Definition. An optimization algorithm is called primal-dual equivalent if this algorithm applied to the primal formulation is equivalent to the same algorithm applied to its Lagrange dual. It is important to note that most algorithms are not primal-dual equivalent. ALM applied to the primal problem is equivalent to proimal point method applied to

9 Self Equivalence of the Alternating Direction Method of Multipliers 9 the dual problem [0], but both algorithms are not primal-dual equivalent. In this chapter, we will show that ADM and RPRS are primal-dual equivalent. We make the following assumptions throughout the chapter: Assumption All the functions in this chapter are assumed to be proper, closed, and conve. Assumption The saddle-point solutions to all the optimization problems in this chapter are assumed to eist. Equivalent problems A primal formulation equivalent to (P) is { minimize F(s) + G(t) s,t subject to s + t = 0, (P) where s,t G and F(s) := min f () + ι {:A=s} (), (7a) G(t) := ming() + ι {:B b=t} (). (7b) Remark. If we define L f and L g as L f () = A and L g () = B b, respectivel, then F = L f f, G = L g g. The Lagrange dual of (P) is minimize v f ( A v) + g ( B v) + v,b, (8) ( ) which can be derived from minimize v min, with the Lagrangian defined as follows: An ADM-read formulation of (8) is L(,,v) = f () + g() + v,a + B b. { minimize f ( A u) + g ( B v) + v,b u,v subject to u v = 0. (D) When ADM is applied to an ADM-read formulation of the Lagrange dual problem, we call it Dual ADM. The original ADM is called Primal ADM.

10 0 Ming Yan and Wotao Yin is Following similar steps, the ADM read formulation of the Lagrange dual of (P) { minimize F ( u) + G ( v) u,v subject to u v = 0. (D) The equivalence between (D) and (D) is trivial since F (u) = f (A u), G (v) = g (B v) v,b, which follows from Lemma. Although there can be multiple equivalent formulations of the same problem (e.g., (P), (P), (8), and (D)/(D) are equivalent), an algorithm ma or ma not be applicable to some of them. Even when the are, on different formulations, their behaviors such as convergence and speed of convergence are different. In particular, most algorithms have different behaviors on primal and dual formulations of the same problem. An algorithm applied to a primal formulation does not dictate the behaviors of the same algorithm applied to the related dual formulation. The simple method in linear programming has different performance when applied to both the primal and dual problems, i.e., the primal simple method starts with a primal basic feasible solution (dual infeasible) until the dual feasibilit conditions are satisfied, while the dual simple method starts with a dual basic feasible solution (primal infeasible) until the primal feasibilit conditions are satisfied. The ALM also has different performance when applied to the primal and dual problems, i.e., ALM applied to the primal problem is equivalent to proimal point method applied to the related dual problem, and proimal point method is, in general, different from ALM on the same problem. 4 Primal-dual equivalence of ADM In this section we show the primal-dual equivalence of ADM. Algorithms - describe how ADM is applied to (P), (P), and (D)/ (D) [4, ]. Algorithm ADM on (P) initialize 0, z0, λ > 0 for k = 0,, do k+ argming() + (λ) A k + B b + λzk k+ argmin f () + (λ) A + B k+ b + λz k z k+ = z k + λ (A k+ + B k+ b) end for

11 Self Equivalence of the Alternating Direction Method of Multipliers Algorithm ADM on (P) initialize s 0, z0, λ > 0 for k = 0,, do t k+ = argming(t) + (λ) s k + t + λzk s k+ = argmin t s z k+ = z k + λ (s k+ + t k+ end for F(s) + (λ) s + t k+ + λz k ) Algorithm ADM on (D)/(D) initialize u 0, z0, λ > 0 for k = 0,, do v k+ = argming ( v) + λ uk v + λ z k u k+ v = argmin u z k+ = z k + λ(uk+ v k+ end for F ( u) + λ u vk+ + λ z k ) The k and k in Algorithm ma not be unique because of the matrices A and B, while A k and Bk are unique. In addition, Ak and Bk are calculated for twice and thus stored in the implementation of Algorithm to save the second calculation. Following the equivalence of Algorithms and in Part of the following Theorem, we can view problem (P) as the master problem of (P). We can sa that ADM is essentiall an algorithm applied onl to the master problem (P), which is Algorithm ; this fact has been obscured b the often-seen Algorithm, which integrates ADM on the master problem with the independent subproblems in (7). Theorem (Equivalence of Algorithms -). Suppose A 0 = s0 = z0 and z0 = z 0 = u0 and that the same parameter λ is used in Algorithms -. Then, their equivalence can be established as follows:. From k, k, zk of Algorithm, we obtain tk, sk, zk of Algorithm through: t k = Bk b, s k = Ak, z k = zk. (9a) (9b) (9c) From t k, sk, zk of Algorithm, we obtain k, k, zk of Algorithm through: k = argmin {g() : B b = t k }, (0a) k = argmin { f () : A = s k }, (0b) z k = zk. (0c)

12 Ming Yan and Wotao Yin. We can recover the iterates of Algorithms and from each other through u k = zk, zk = sk. () Proof. Part. Proof b induction. We argue that under (9b) and (9c), Algorithms and have essentiall identical subproblems in their first steps at the kth iteration. Consider the following problem, which is obtained b plugging the definition of G( ) into the t k+ -subproblem of Algorithm : ( k+,t k+ ) = argming() + ι {(,t):b b=t} (,t) + (λ) s k + t + λzk. (),t If one minimizes over first while keeping t as a variable, one eliminates and recovers the t k+ -subproblem of Algorithm. If one minimizes over t first while keeping as a variable, then after plugging in (9b) and (9c), problem () reduces to the k+ -subproblem of Algorithm. In addition, ( k+,t k+ ) obes t k+ = B k+ b, () which is (9a) at k +. Plugging t = t k+ into () ields problem (0a) for k+, which must be equivalent to the k+ -subproblem of Algorithm. Therefore, the k+ -subproblem of Algorithm and the t k+ -subproblem of Algorithm are equivalent through (9a) and (0a) at k +, respectivel. Similarl, under () and (9c), we can show that the k+ -subproblem of Algorithm and the s k+ -subproblem of Algorithm are equivalent through the formulas for (9b) and (0b) at k +, respectivel. Finall, under (9a) and (9b) at k + and z k = zk, the formulas for zk+ and z k+ in Algorithms and are identical, and the return z k+ = z k+, which is (9c) and (0c) at k +. Part. Proof b induction. Suppose that () holds. We shall show that () holds at k +. Starting from the optimalit condition of the t k+ -subproblem of Algorithm, we derive 0 G(t k+ ) + λ (s k + tk+ + λz k ) t k+ G ( λ (s k + tk+ + λz k )) [ ] λ λ (s k + tk+ + λz k ) (λz k + sk ) G ( λ (s k + tk+ + λz k )) [ ] λ (s k + tk+ + λz k ) + (λu k + zk ) G ( λ (s k + tk+ λ + λz k )) [ ] 0 G ( λ (s k + tk+ + λz k )) λ u k λ (s k + tk+ + λz k ) + λ z k v k+ = λ (s k + tk+ + λz k ) = λ (z k + tk+ + λz k ), where the last equivalence follows from the optimalit condition for the v k+ - subproblem of Algorithm.

13 Self Equivalence of the Alternating Direction Method of Multipliers Starting from the optimalit condition of the s k+ -subproblem of Algorithm, and appling the update, z k+ = z k +λ (s k+ +t k+ ), in Algorithm and the identit of t k+ obtained above, we derive 0 F(s k+ ) + λ (s k+ + t k+ + λz k ) 0 F(s k+ ) + z k+ 0 s k+ F ( z k+ ) 0 λ(z k+ z k ) tk+ F ( z k+ ) 0 λ(z k+ z k ) + zk + λ(zk vk+ ) F ( z k+ ) 0 F ( z k+ ) + λ(z k+ v k+ + λ z k ) z k+ = u k+. where the last equivalence follows from the optimalit condition for the u k+ - subproblem of Algorithm. Finall, combining the update formulas of z k+ and z k+ in Algorithms and, respectivel, as well as the identities for u k+ and v k+ obtained above, we obtain z k+ = z k + λ(uk+ v k+ ) = s k + λ(z k+ z k λ (s k + tk+ )) = λ(z k+ z k ) tk+ = s k+. Remark. Part of the theorem (ADM s primal-dual equivalence) can also be derived b combining the following two equivalence results: (i) the equivalence between ADM on the primal problem and the Douglas-Rachford splitting (DRS) algorithm [7, 8] on the dual problem [], and (ii) the equivalence result between DRS algorithms applied to the master problem (P) and its dual problem (cf. [8, Chapter.] [9]). In this chapter, however, we provide an elementar algebraic proof in order to derive the formulas in Theorem that recover the iterates of one algorithm from another. Part of the theorem shows that ADM is a smmetric primal-dual algorithm. The reciprocal positions of parameter λ indicates its function to balance the primal and dual progresses. Part of the theorem also shows that Algorithms and have no difference, in terms of per-iteration compleit and the number of iterations needed to reach an accurac. However, Algorithms and have difference in terms of per-iteration compleit. In fact, Algorithm is implemented for Algorithm because Algorithm has smaller compleit than Algorithm. See the eamples in Sections 4. and 4..

14 4 Ming Yan and Wotao Yin 4. Primal-dual equivalence of ADM on () with three subproblems In Section., we introduced four different was to appl ADM on () with three subproblems. The ADM-read formulation for the primal problem is (), and the ADM applied to this formulation is k+ = argmin s k + λz k s + C k + λz k, s k+ = argminu(s) + (λ) k+ s + λz k s, s k+ = argminv() + (λ) C k+ + λz k, (4a) (4b) (4c) z k+ s = z k s + λ ( k+ s k+ ), (4d) z k+ = z k + λ (C k+ k+ ). (4e) Similarl, we can introduce a dumm variable t into the left formulation in (4) and obtain a new equivalent formulation { minimize u (u) + v (t) u,v,t subject to C v + u = 0, v t = 0. () The ADM applied to () is v k+ = argmin C v + u k + λ z k u + v t k + λ z k t, v u k+ = argminu (u) + λ u C v k+ + u + λ z k u, (6a) (6b) t k+ = argminv (t) + λ t vk+ t + λ z k t, (6c) z k+ u = z k u + λ(c v k+ + u k+ ), (6d) z k+ t = z k t + λ(v k+ t k+ ). (6e) Interestingl, as shown in the following theorem, ADM algorithms (4) and (6) applied to () and () are equivalent. Theorem. If the initialization for algorithms (4) and (6) satisfies z 0 = t 0, z 0 s = u 0, s 0 = z 0 u, and 0 = z 0 t. Then for k, we have the following equivalence results between the iterations of the two algorithms: z k = t k, z k s = u k, s k = z k u, k = z k t. The proof is similar to the proof of Theorem and is omitted here.

15 Self Equivalence of the Alternating Direction Method of Multipliers 4. Eample: basis pursuit The basis pursuit problem seeks for the minimal l solution to a set of linear equations: Its Lagrange dual is minimize u subject to Au = b. (7) u minimize b T subject to A. (8) The YALL algorithms [4] implement ADMs on a set of primal and dual formulations for basis pursuit and LASSO, et ADM for (7) is not given (however, a linearized ADM is given for (7)). Although seemingl awkward, problem (7) can be turned equivalentl into the ADM-read form minimize v + ι {u:au=b} (u) subject to u v = 0. (9) u,v Similarl, problem (8) can be turned equivalentl into the ADM-read form minimize b T + ι B, () subject to A = 0, (0) where B = { : }. For simplicit, let us suppose that A has full row rank so the inverse of AA eists. (Otherwise, Au = b are redundant whenever the are consistent; and (AA ) shall be replaced b the pseudo-inverse below.) ADM for problem (9) can be simplified to the iteration: v k+ =argmin v + λ uk v + λ zk, v (a) u k+ =v k+ λ zk A (AA ) (A(v k+ λ zk ) b), (b) z k+ =z k + λ(uk+ v k+ ). (c) And ADM for problem (0) can be simplified to the iteration: k+ =P B (A k + λzk ), (a) k+ =(AA ) (A k+ λ(az k b)), (b) z k+ =z k + λ (A k+ k+ ), (c) where P B is the projection onto B. Looking into the iteration in (), we can find that A k is used in both the kth and k + st iterations. To save the computation, we can store A k as sk. In addition, let tk = k and zk = zk, we have

16 6 Ming Yan and Wotao Yin t k+ =P B (s k + λzk ), (a) s k+ =A (AA ) (A(t k+ λaz k ) + λb)), (b) z k+ =z k + λ (s k+ t k+ ), (c) which is eactl Algorithm for (0). Thus, Algorithm has smaller compleit than Algorithm, i.e., one matri vector multiplication A k is saved from Algorithm. The corollar below follows directl from Theorem b associating (0) and (9) as (P) and (D), and () and () with the iterations of Algorithms and, respectivel. Corollar. Suppose that Au = b are consistent. Consider ADM iterations () and (). Let u 0 = z0 and z0 = A 0. Then, for k, iterations () and () are equivalent. In particular, From k, zk in (), we obtain uk, zk in () through: u k = zk, zk = A k. From u k, zk in (), we obtain k, zk in () through: k = (AA ) Az k, zk = uk. 4. Eample: basis pursuit denoising The basis pursuit denoising problem is and its Lagrange dual, in the ADM-read form, is minimize u + u α Au b (4) minimize b, + α, + ι B () subject to A = 0. () The iteration of ADM for () is k+ =P B (A k + λzk ), (6a) k+ =(AA + αλi) (A k+ λ(az k b)), (6b) z k+ =z k + λ (A k+ k+ ). (6c) Looking into the iteration in (6), we can find that A k is used in both the kth and k + st iterations. To save the computation, we can store A k as sk. In addition, let t k = k and zk = zk, we have

17 Self Equivalence of the Alternating Direction Method of Multipliers 7 t k+ =P B (s k + λzk ), (7a) s k+ =A (AA + αλi) (A(t k+ λz k ) + λb)), (7b) z k+ =z k + λ (s k+ t k+ ), (7c) which is eactl Algorithm for (). Thus, Algorithm has a lower per iteration compleit than Algorithm, i.e., one matri vector multiplication A k is saved from Algorithm. In addition, if A A = I, (7b) becomes s k+ =(αλ + ) (t k+ λz k + λa b), (8) and no matri vector multiplications is needed during the iteration because λa b can be precalculated. The ADM-read form of the original problem (4) is whose ADM iteration is minimize v + u,v α Au b subject to u v = 0, (9) v k+ =argmin v + λ uk v + λ zk, v (0a) u k+ =(A A + αλi) (A b + αλv k+ αz k ), (0b) z k+ =z k + λ(uk+ v k+ ). (0c) The corollar below follows directl from Theorem. Corollar. Consider ADM iterations (6) and (0). Let u 0 = z0 and z0 = A 0. For k, ADM on the dual and primal problems (6) and (0) are equivalent in the following wa: From k, zk in (6), we recover uk, zk in (0) through: u k = zk, zk = A k. From u k, zk in (0), we recover k, zk in (6) through: k = (Auk b)/α, zk = uk. Remark. Iteration (0) is different from that of ADM for another ADM-read form of (4) minimize u,v u + α v subject to Au v = b, ()

18 8 Ming Yan and Wotao Yin which is used in [4]. In general, there are different ADM-read forms and their ADM algorithms ield different iterates. ADM on one ADM-read form is equivalent to it on the corresponding dual ADM-read form. ADM as a primal-dual algorithm on the saddle-point problem As shown in Section 4, ADM on a pair of conve primal and dual problems are equivalent, and there is a connection between z k in Algorithm and dual variable u k in Algorithm. This primal-dual equivalence naturall suggests that ADM is also equivalent to a primal-dual algorithm involving both primal and dual variables. We derive problem (P) into an equivalent primal-dual saddle-point problem () as follows: min, g() + f () + ι {(,):A=b B}(,) =ming() + F(b B) =minma g() + u,b B u F ( u) () =minma g() + u,b b f ( A u). () u A primal-dual algorithm for solving () is described in Algorithm 4. Theorem establishes the equivalence between Algorithms and 4. Algorithm 4 Primal-dual formulation of ADM on problem () initialize u 0 4, u 4, 0 4, λ > 0 for k = 0,, do ū k 4 = uk 4 uk 4 k+ 4 = argming() + (λ) B B k 4 + λūk 4 u k+ end for 4 = argmin u f ( A u) u,b k+ 4 b + λ/ u u k 4 Remark 4. Publication [] proposed a primal-dual algorithm for () and obtained its connection to ADM [0]: When B = I, ADM is equivalent to the primal-dual algorithm in []; When B I, the primal-dual algorithm is a preconditioned ADM as an additional proimal term δ/ k 4 (λ) B B k 4 is added to the subproblem for k+ 4. This is also a special case of ineact ADM in [6]. Our Algorithm 4 is a primal-dual algorithm that is equivalent to ADM in the general case. Theorem (Equivalence between Algorithms and 4). Suppose that A 0 = λ(u 0 4 u 4 ) + b B0 4 and z0 = u0 4. Then, Algorithms and 4 are equivalent with

19 Self Equivalence of the Alternating Direction Method of Multipliers 9 the identities: for all k > 0. A k = λ(uk 4 uk 4 ) + b B k 4, zk = uk 4, (4) Proof. B assumption, (4) holds at iteration k = 0. Proof b induction. Suppose that (4) holds at iteration k 0. We shall establish (4) at iteration k +. From the first step of Algorithm, we have k+ =argming() + (λ) A k + B b + λzk =argming() + (λ) λ(u k 4 uk 4 ) + B B k 4 + λuk 4, which is the same as the first step in Algorithm 4. Thus we have k+ = k+ Combing the second and third steps of Algorithm, we have 4. 0 f ( k+ ) + λ A (A k+ + B k+ b + λz k ) = f (k+ ) + A z k+. Therefore, k+ f ( A z k+ ) = A k+ F ( z k+ ) λ(z k+ z k ) + b Bk+ F ( z k+ ) z k+ = argminf ( z) z,b k+ b + λ/ z z k z z k+ = argmin f ( A z) z,b k+ 4 b + λ/ z u k 4, z where the last line is the second step of Algorithm 4. Therefore, we have z k+ = u k+ 4 and A k+ = λ(z k+ z k ) + b Bk+ = λ(u k+ 4 u k 4 ) + b Bk Equivalence of ADM for different orders In both problem (P) and Algorithm, we can swap and and obtain Algorithm, which is still an ADM algorithm. In general, the two algorithms are different. In this section, we show that for a certain tpe of functions f (or g), Algorithms and become equivalent.

20 0 Ming Yan and Wotao Yin Algorithm ADM on (P) initialize 0, z0, λ > 0 for k = 0,, do k+ = argmin f () + (λ) A + B k b + λzk k+ = argmin g() + (λ) A k+ + B b + λz k z k+ = z k + λ (A k+ + B k+ b) end for The assumption that we need is that either pro F( ) or pro G( ) is affine (cf. (7) for the definitions of F and G). The definition of affine mapping is given in Definition. Definition. A mapping T is affine if T (r) T (0) is linear in r, i.e., T (αr + βr ) T (0) = α[t (r ) T (0)] + β[t (r ) T (0)], α,β R. () A mapping T is affine if and onl if it can be written as a linear mapping plus a constant, and the following proposition provides several equivalent statements for pro G( ) being affine. Proposition. Let λ > 0. The following statements are equivalent:. pro G( ) is affine;. pro λg( ) is affine;. apro G( ) bi + ci is affine for an scalars a, b and c; 4. pro G ( ) is affine;. G is conve quadratic (or, affine or constant) and its domain dom(g) is either G or the intersection of hperplanes in G. In addition, if function g is conve quadratic and its domain is the intersection of hperplanes, then function G defined in (7b) satisfies Part above. Proposition. If pro G( ) is affine, then the following holds for an r and r : pro G( ) (r r ) = pro G( ) r pro G( ) r. (6) Proof. Equation (6) is obtained b letting α = and β = in (). Theorem 4 (Equivalence of Algorithms and ).. Assume that pro λg( ) is affine. Given the sequences k, zk, and k of Algorithm, if 0 and z0 satisf z0 G(B0 b), then we can initialize Algorithm with 0 = and z0 = z0 + λ (A + B0 b), and recover the sequences k and zk of Algorithm through k = k+, (7a) z k = zk + λ (A k+ + B k b). (7b)

21 Self Equivalence of the Alternating Direction Method of Multipliers. Assume that pro λf( ) is affine. Given the sequences k, zk, and k of Algorithm, if 0 and z0 satisf z0 F(A0 ), then we can initialize Algorithm with 0 = and z0 = z0 + λ (A 0 + B b), and recover the sequences k and z k of Algorithm through k = k+, (8a) z k = zk + λ (A k + Bk+ b). (8b) Proof. We prove Part onl b induction. (The proof for the other part is similar.) The initialization of Algorithm clearl follows (7) at k = 0. Suppose that (7) holds at k 0. We shall show that (7) holds at k +. We first show from the affine propert of pro λg( ) that B k+ = B k+ B k. (9) The optimization subproblems for and in Algorithms and, respectivel, are as follows: k+ = argming() + (λ) A k + B b + λzk, k+ = argmin Following the definition of G in (7), we have g() + (λ) A k+ + B b + λz k. B k+ b = pro λg( ) ( A k λzk ), (40a) B k+ b = pro λg( ) ( A k+ λz k ), (40b) B k The third step of Algorithm is b = pro λg( ) ( A k λzk ). (40c) z k = zk + λ (A k + Bk b). (4) (Note that for k = 0, the assumption z 0 G(B0 b) ensures the eistence of z in (40c) and (4).) Then, (7) and (4) give us A k + λzk (7) = A k+ + λz k + Ak+ + B k b = (A k+ + λz k ) (λzk Bk + b) (4) = (A k+ + λz k ) (Ak + λzk ). Since pro λg( ) is affine, we have (6). Once we plug in (6): r = A k+ λz k, r = A k λzk, and r r = A k λzk and then appl (40), we obtain (9). Net, the third step of Algorithm and (9) give us

22 Ming Yan and Wotao Yin B k+ b + λz k (9) = (B k+ b) (B k b) + λzk + (Ak+ + B k b) = (B k+ b) + λz k + (Ak+ + B k+ b) = (B k+ b) + λz k+. This identit shows that the updates of k+ and k+ in Algorithms and, respectivel, have identical data, and therefore, we recover k+ = k+. Lastl, from the third step of Algorithm and the identities above, it follows that z k+ = z k + λ (A k+ + B k+ ( b) ) = z k + λ A k+ + (B k+ b + λz k+ λz k ) = z k+ + λ (A k+ + B k+ b). Therefore, we obtain (7) at k +. Remark. We can avoid the technical condition z 0 G(B0 b) on Algorithm in Part of Theorem 4. When it does not hold, we can use the alwas-true relation z G(B b) instead; correspondingl, we shall add one iteration to the iterates of Algorithm, namel, initialize Algorithm with 0 = and z 0 = z + λ (A + B b) and recover the sequences k and zk of Algorithm through k = k+, (4a) z k = zk+ + λ (A k+ + B k+ b). (4b) Similar arguments appl to the other part of Theorem 4. 7 Equivalence results of relaed PRS In this section, we consider the following conve problem: minimize and its corresponding Lagrangian dual f () + g(a), (P) minimize v f (A v) + g ( v). (D) In addition, we introduce another primal-dual pair equivalent to (P)-(D): minimize minimize u ( f A ) () + g(), (P4) f (u) + (g A) ( u). (D4)

23 Self Equivalence of the Alternating Direction Method of Multipliers Here (P4) is obtained as the dual of (D) b reformulating (D) as minimize v, v f (A v) + g ( v) subject to v = v, and (D4) is obtained as the dual of (P) in a similar wa. Lemma below will establish the equivalence between the two primal-dual pairs. Remark 6. When A = I, we have ( f A ) = f, and problem (P) is eactl the same as problem (P4). Similarl, problem (D) is eactl the same as problem (D4). Lemma. Problems (P) and (P4) are equivalent in the following sense: Given an solution to (P), = A is a solution to (P4), Given an solution to (P4), argmin :A= f () is a solution to (P). The equivalence between problems (D) and (D4) is similar: Given an solution v to (D), A v is a solution to (D4), Given an solution u to (D4), v argmin v:a v=u g ( v) is a solution to (D). Proof. We prove onl the equivalence of (P) and (P4), the proof for the equivalence of (D) and (D4) is similar. Part : If is a solution to (P), we have 0 f ( )+A g(a ). Assume that there eists q such that q g(a ) and A q f ( ). Then we have Therefore, A q f ( ) f (A q) = A A f (A q) = ( f A )(q) q ( f A ) (A ). 0 ( f A ) (A ) + g(a ) and A is a solution to (P4). Part : If is a solution to (P4), the optimalit condition gives us 0 ( f A ) ( ) + g( ). Assume that there eists q such that q g( ) and q ( f A ) ( ). Then we have q ( f A ) ( ) ( f A )(q). (4) Consider the following optimization problem for finding from minimize and the corresponding dual problem f () subject to A =,

24 4 Ming Yan and Wotao Yin maimize f (A v) + v,. v It is eas to obtain from (4) that q is a solution of the dual problem. The optimal dualit gap is zero and the strong dualit gives us f ( ) = f ( ) q,a = f (A q) + q,. (44) Thus is a solution of minimize Because q g( ) = g(a ), f () A q, and A q f ( ) 0 f ( ) A q. (4) 0 f ( ) + A g(a ) = f ( ) + (g A)( ) (46) Therefore is a solution of (P). Net we will show the equivalence between the RPRS to the primal and dual problems: RPRS on (P) RPRS on (D4) RPRS on (P4) RPRS on (D) We describe the RPRS on (P) in Algorithm 6, and the RPRS on other problems can be obtained in the same wa. Algorithm 6 RPRS on (P) initialize w 0, λ > 0, 0 < α. for k = 0,, do k+ = pro λ f ( ) w k w k+ = ( α)w k + α(pro λg A( ) I)( k+ w k ) end for Theorem. [Primal-dual equivalence of RPRS] RPRS on (P) is equivalent to RPRS on (D4). RPRS on (P4) is equivalent to RPRS on (D). Before proving this theorem, we introduce a lemma, which was also given in [8, Proposition.4]. Here, we prove it in a different wa using the generalized Moreau decomposition. Lemma. For λ > 0, we have λ (pro λf( ) I)w = (I pro λ F ( ) )(w/λ) = (pro λ F ( ) I)( w/λ). (47)

25 Self Equivalence of the Alternating Direction Method of Multipliers Proof. We prove it using the generalized Moreau decomposition [, Theorem..] w = pro λf( ) (w) + λpro λ F ( )(w/λ). (48) Using the generalized Moreau decomposition, we have λ (pro λf( ) I)w = λ pro λf( ) (w) w/λ The last equalit of (47) comes from (48) = λ (w λpro λ F ( )(w/λ)) w/λ = w/λ pro λ F ( ) (w/λ) = (I pro λ F ( ) )(w/λ). pro λ F ( ) ( w/λ) = pro λ F ( ) (w/λ). Proof (Proof of Theorem ). We will prove onl the equivalence of RPRS on (P) and (D4). The proof for the other equivalence is the same. The RPRS on (P) and (D4) can be formulated as and w k+ = ( α)w k + α(pro λg A( ) I)(pro λ f ( ) I)w k, (49) w k+ = ( α)w k + α(pro λ (g A) ( ) I)(pro λ f ( ) I)wk, (0) respectivel. In addition, we can recover the variables k (or v k ) from w k (or wk ) using the following: k+ = pro λ f ( ) w k, () v k+ = pro λ f ( ) wk. () Proof b induction. Suppose w k = wk /λ holds. We net show that wk+ = w k+ /λ. w k+ =( α)w k /λ + α(pro λ (g A) ( ) I)(pro λ f ( ) I)(wk /λ) (47) = ( α)w k /λ + α(pro λ (g A) ( ) I)( λ (pro λ f ( ) I)w k ) (47) = ( α)w k /λ + αλ (pro λ(g A)( ) I)(pro λ f ( ) I)w k =λ [( α)w k + α(pro λ(g A)( ) I)(pro λ f ( ) I)w k ] =w k+ /λ. In addition we have

26 6 Ming Yan and Wotao Yin k+ + λv k+ =pro λ f ( ) w k + λpro λ f ( ) wk =pro λ f ( ) w k + λpro λ f ( ) (wk /λ) = wk. Remark 7. Eckstein showed in [8, Chapter.] that DRS/PRS on (P) is equivalent to DRS/PRS on (D) when A = I. This special case can be obtained from this theorem immediatel because when A = I, (D) is eactl the same as (D4) and we have DRS/PRS on (P) DRS/PRS on (D4) DRS/PRS on (D) DRS/PRS on (P4). Remark 8. In order to make sure that RPRS on the primal and dual problems are equivalent, the initial conditions and parameters have to satisf conditions described in the proof of Theorem. We need the initial condition to satisf w 0 = w0 /λ and the parameter for RPRS on the dual problem has to be chosen as λ, see the differences in (49) and (0). Similarl to the ADM, we can swap f and g A and obtain a new RPRS. The iteration in Algorithm 6 can be written as w k+ = ( α)w k + α(pro λg A( ) I)(pro λ f ( ) I)w k, () and the RPRS after the swapping is w k+ = ( α)w k + α(pro λ f ( ) I)(pro λg A( ) I)w k. (4) We show below that for a certain tpe of function f (or g), () and (4) are equivalent. Theorem 6.. Assume that pro λ f ( ) is affine. If () and (4) initiall satisf w 0 = (pro λ f ( ) I)w 0, then wk = (pro λ f ( ) I)w k for all k.. Assume that pro λg A( ) is affine. If () and (4) initiall satisf w 0 = (pro λg A( ) I)w 0, then wk = (pro λg A( ) I)w k for all k. Proof. We onl prove the first statement, as the second one can be proved in a similar wa. We appl proof b induction. Suppose w k = (pro λ f ( ) I)w k holds. From (4), we have w k+ = ( α)(pro λ f ( ) I)w k + α(pro λ f ( ) I)(pro λg A( ) I)(pro λ f ( ) I)w k [ ] = (pro λ f ( ) I) ( α)w k + α(pro λg A( ) I)(pro λ f ( ) I)w k = (pro λ f ( ) I)w k+.

27 Self Equivalence of the Alternating Direction Method of Multipliers 7 The first equalit holds because pro λ f ( ) I is affine, which comes from the assumption that pro λ f ( ) is affine and Lemma. The second equalit comes from (). 8 Application: total variation image denoising ADM (or split Bregman [6]) has been applied on man image processing applications, and we appl the previous equivalence results of ADM to derive several equivalent algorithms for total variation denoising. The total variation (ROF model []) applied on image denoising is minimize + α BV (Ω) Ω b where stands for an image, and BV (Ω) is the set of all bounded variation functions on Ω. The first term is known as the total variation of, minimizing which tends to ield a piece-wise constant solution. The discrete version is as follows: minimize, + α b, where is a finite difference approimation of the gradient, which can be epressed as a linear operator. Without loss of generalit, we consider the twodimensional image, and the discrete total variation, of image is defined as, = ( ) i j, i j where is the -norm of a vector. The equivalent ADM-read form [6, Equation (.)] is minimize,, + α b subject to = 0, () and its dual problem in ADM-read form [, Equation (8)] is minimize v,u α div u + αb + ι {v: v, }(v) subject to u v = 0, (6) where v, = ma (v) i j and div u is the finite difference approimation of divergence that satisfies,div u =,u for an and u. In addition, the equivalent i j saddle-point problem is minimizemaimize v α b + v, ι {v: v, }(v). (7)

28 8 Ming Yan and Wotao Yin We list the following equivalent algorithms for solving the total variation image denoising problem. The equivalence result stated in Corollar can be obtained from Theorems -4.. Algorithm (primal ADM) on () is k+ =argmin k+ =argmin α b + (λ) k + λzk, (8a), + (λ) k+ + λz k, (8b) z k+ =z k + λ ( k+ k+ ). (8c). Algorithm (dual ADM) on (6) is =argmin α div u + αb + λ vk u + λ z k, u k+ u (9a) v k+ =argmin v ι {v: v, }(v) + λ v uk+ + λ z k, (9b) z k+ =z k + λ(vk+ u k+ ). (9c). Algorithm 4 (primal-dual) on (7) is v k 4 =vk 4 vk 4, (60a) α 4 =argmin b + (λ) k 4 + λ vk 4, (60b) k+ v k+ 4 =argmin v ι {v: v, }(v) v, k+ 4 + λ v vk. (60c) 4. Algorithm (primal ADM with order swapped) on () is k+ =argmin, + (λ) k + λzk, k+ =argmin (6a) α b + (λ) k+ + λz k, (6b) z k+ =z k + λ ( k+ k+ ). (6c) Corollar. Let 0 = b+α div z 0. If the initialization for all algorithms (8)-(6) satisfies 0 = z0 = 0 4 λ(v0 4 v 4 ) = and z0 = v0 = v0 4 = z0 + λ ( 0 ). Then for k, we have the following equivalence results between the iterations of the four algorithms: k = zk = k 4 λ(vk 4 vk 4 ) = k+, z k = vk = v k 4 = z k + λ ( k k+ ). Remark 9. In an of the four algorithms, the or div operator is separated in a different subproblem from the term, or its dual norm,. The or div op-

29 Self Equivalence of the Alternating Direction Method of Multipliers 9 erator is translation invariant, so their subproblems can be solved b a diagonalization trick []. The subproblems involving the term, or the indicator function ι {v: v, } have closed-form solutions. Therefore, in addition to the equivalence results, all the four algorithms have essentiall the same per-iteration costs. Acknowledgments This work is supported b NSF Grants DMS-498 and DMS-760 and ARO MURI Grant W9NF We thank Jonathan Eckstein for bringing his earl work [8, Chapter.] and [9] to our attention and anonmous reviewers for their helpful comments and suggestions. References. Bauschke, H.H., Combettes, P.L.: Conve Analsis and Monotone Operator Theor in Hilbert Spaces. Springer (0). Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 0(-), (004). Chambolle, A., Pock, T.: A first-order primal-dual algorithm for conve problems with applications to imaging. Journal of Mathematical Imaging and Vision 40(), 0 4 (0) 4. Davis, D., Yin, W.: Faster convergence rates of relaed Peaceman-Rachford and ADMM under regularit assumptions. arxiv preprint arxiv:407.0 (04). Davis, D., Yin, W.: Convergence rate analsis of several splitting schemes. In: R. Glowinski, S. Osher, W. Yin (eds.) Splitting Methods in Communication and Imaging, Science and Engineering, Chapter 4. Springer (06) 6. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. Journal of Scientific Computing 66(), (0) 7. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Transactions of the American Mathematical Societ 8(), 4 49 (96) 8. Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. thesis, Massachusetts Institute of Technolog (989) 9. Eckstein, J., Fukushima, M.: Some reformulations and applications of the alternating direction method of multipliers. In: Large Scale Optimization, pp. 4. Springer (994) 0. Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal-dual algorithms for conve optimization in imaging science. SIAM Journal on Imaging Sciences (4), (00). Esser, J.: Primal dual algorithms for conve models and applications to image restoration, registration and nonlocal inpainting. Ph.D. thesis, Universit of California, Los Angeles (00). Fukushima, M.: The primal Douglas-Rachford splitting algorithm for a class of monotone mappings with application to the traffic equilibrium problem. Mathematical Programming 7(), (996). Gaba, D.: Applications of the method of multipliers to variational inequalities. In: M. Fortin, R. Glowinski (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundar-Value Problems. North-Holland: Amsterdam, Amsterdam (98) 4. Gaba, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approimation. Computers & Mathematics with Applications (), 7 40 (976)

Self Equivalence of the Alternating Direction Method of Multipliers

Self Equivalence of the Alternating Direction Method of Multipliers Self Equivalence of the Alternating Direction Method of Multipliers Ming Yan Wotao Yin March 9, 2015 Abstract The alternating direction method of multipliers (ADM or ADMM) breaks a comple optimization

More information

Self Equivalence of the Alternating Direction Method of Multipliers

Self Equivalence of the Alternating Direction Method of Multipliers Self Equivalence of the Alternating Direction Method of Multipliers Ming Yan Wotao Yin August 11, 2014 Abstract In this paper, we show interesting self equivalence results for the alternating direction

More information

In applications, we encounter many constrained optimization problems. Examples Basis pursuit: exact sparse recovery problem

In applications, we encounter many constrained optimization problems. Examples Basis pursuit: exact sparse recovery problem 1 Conve Analsis Main references: Vandenberghe UCLA): EECS236C - Optimiation methods for large scale sstems, http://www.seas.ucla.edu/ vandenbe/ee236c.html Parikh and Bod, Proimal algorithms, slides and

More information

Linear programming: Theory
