arxiv: v1 [math.oc] 21 Apr 2016

Size: px
Start display at page:

Download "arxiv: v1 [math.oc] 21 Apr 2016"

Transcription

1 Accelerated Douglas Rachford methods for the solution of convex-concave saddle-point problems Kristian Bredies Hongpeng Sun April, 06 arxiv: v [math.oc] Apr 06 Abstract We study acceleration and preconditioning strategies for a class of Douglas Rachford methods aiming at the solution of convex-concave saddle-point problems associated with Fenchel Rockafellar duality. While the basic iteration converges weakly in Hilbert space with O(/k) ergodic convergence of restricted primal-dual gaps, acceleration can be achieved under strong-convexity assumptions. Namely, if either the primal or dual functional in the saddle-point formulation is strongly convex, then the method can be modified to yield O(/k ) ergodic convergence. In case of both functionals being strongly convex, similar modifications lead to an asymptotic convergence of O(ϑ k ) for some 0 < ϑ <. All methods allow in particular for preconditioning, i.e., the inexact solution of the implicit linear step in terms of linear splitting methods with all convergence rates being maintained. The efficiency of the proposed methods is verified and compared numerically, especially showing competitiveness with respect to state-of-the-art accelerated algorithms. Key words. Saddle-point problems, Douglas Rachford splitting, acceleration strategies, linear preconditioners, convergence analysis, primal-dual gap. AMS subject classifications. 65K0, 49K35, 90C5, 65F08. Introduction In this work we are concerned with Douglas Rachford-type algorithms for solving the saddle-point problem min max Kx, y + F(x) G(y) () x dom F y dom G where F : X ], ], G : Y ], ] are proper, convex and lower semi-continuous functionals on the real Hilbert spaces X and Y, respectively, and K : X Y is a linear and continuous mapping. Problems of this type commonly occur as primal-dual formulations in the context of Fenchel Rockafellar duality. In case the latter holds, solutions of () are exactly solutions of the primal-dual problem min F(x) + x X G (Kx), max F ( K y) G(y) () y Y with F and G denoting the Fenchel conjugate functionals of F and G, respectively. We propose and study a class of Douglas Rachford iterative algorithms for the solution of () with a focus Institute for Mathematics and Scientific Computing, University of Graz, Heinrichstraße 36, A-800 Graz, Austria. kristian.bredies@uni-graz.at. The Institute for Mathematics and Scientific Computing is a member of NAWI Graz ( Institute for Mathematical Sciences, Renmin University of China, No. 59, Zhongguancun Street, Haidian District, Beijing, 0087 Beijing, People s Republic of China. hpsun@amss.ac.cn.

2 on acceleration and preconditioning strategies. The main aspects are threefold: First, the algorithms only utilize specific implicit steps, the proximal mappings of F and G as well as the solution of linear equations. In particular, they converge independently from the choice of step-sizes associated with these operations. Here, convergence of the iterates is in the weak sense and we have a O(/k) rate for restricted primal-dual gaps evaluated at associated ergodic sequences. Second, in case of strong convexity of F or G, the iteration can be modified to yield accelerated convergence. Specifically, in case of strongly convex F, the associated primal sequence {x k } converges strongly at rate O(/k ) in terms of the quadratic norm distance while there is weak convergence for the associated dual sequence {y k }. The rate O(/k ) also follows for restricted primal-dual gaps evaluated at associated ergodic sequences. Moreover, in case of F and G both being strongly convex, another accelerating modification yields strongly convergent primal and dual sequences with a rate of o(ϑ k ) in terms of squared norms for some 0 < ϑ <. Analogously, the full primal-dual gap converges with the rate O(ϑ k ) when evaluated at associated ergodic sequences. Third, both the basic and accelerated versions of the Douglas Rachford iteration may be amended by preconditioning allowing for inexact solution of the implicit linear step. Here, preconditioning is incorporated by performing one or finitely many steps of linear operator splitting method instead of solving the full implicit equation in each iteration step. For appropriate preconditioners, the Douglas Rachford method as well as its accelerated variants converge with the same asymptotic rates. To the best knowledge of the authors, this is the first time that accelerated Douglas Rachfordtype methods for the solution of () are proposed for which the optimal rate of O(/k ) can be proven. Nevertheless, a lot of related research has already been carried out regarding the solution of () or (), the design and convergence analysis of Douglas Rachford-type methods as well as acceleration and preconditioning strategies. For the solution of (), first-order primal-dual methods are commonly used, such as the popular primal-dual iteration of [9] which performs explicit steps involving the operator K and its adjoint as well as proximal steps with respect to F and G. Likewise, a commonly-used approach for solving the primal problem in () is the alternating direction method of multipliers (ADMM) in which the arguments of F and G are treated independently and coupled by linear constraints involving the operator K, see [3]. The latter is often identified as a Douglas Rachford iteration on the dual problem in (), see [3, 4], however, as this iteration works, in its most general form, for maximally monotone operators (see [3]), it may also be applied directly on the optimality conditions for (), see [7], as it is also done in this paper. Besides this application, Douglas Rachford-type algorithms are applied for the minimization of sums of convex functionals (see [], for instance), potentially with additional structure (see [, ], for instance). Regarding acceleration strategies, the work [7] firstly introduced approaches to minimize a class of possibly non-smooth convex functional with optimal complexity O(/k ) on the functional error. Since then, several accelerated algorithms have been designed, most notably FISTA [] which is applicable for the primal problem in () when G is smooth, and again, the acceleration strategy presented in [9, 0], which is applicable when F is strongly convex and yields, as in this paper, optimal rates for restricted primal-dual gaps evaluated at ergodic sequences. Recently, an accelerated Douglas Rachford method for the minimization of the primal problem in () was proposed in [0], which, however, requires G to be quadratic and involves step-size constraints. In contrast, the acceleration strategies for the solution of () in the present paper only require strong convexity of the primal (or dual) functional, which are the same assumptions as in [9]. We also refer to [6, 9] for accelerated ADMM-type algorithms as well as to [8, ] for a further overview on acceleration strategies. Finally, preconditioning techniques for the solution of () or () can, for instance, be found in [] and base on modifying the Hilbert-space norms leading to different proximal mappings. In particular, for non-linear proximal mappings, this approach can be quite limiting with respect to the choice of preconditioners and often, only diagonal preconditioners are applicable in practice.

3 The situation is different for linear mappings where a variety of preconditioners is applicable. This circumstance has been exploited in [4, 5] where Douglas Rachford methods for the solution () have been introduced that realize one or finitely iteration steps of classical linear operator splitting methods, as it is also done in the present work. The latter allows in particular for the inexact solution of the implicit linear step without losing convergence properties or introducing any kind of error control. Besides these approaches, the Douglas Rachford iteration and related methods may be preconditioned by so-called metric selection, see [5]. This paper focuses, however, on obtaining linear convergence only and needs restrictive strong convexity and smoothness assumptions on the involved functionals. In contrast, the preconditioning approaches in the present work are applicable for the proposed Douglas Rachford methods for the solution of () and, in particular, for the accelerated variants without any additional assumptions. The outline of the paper is as follows. First, in Section, the Douglas Rachford-type algorithm is derived from the optimality conditions for () which serves as a basis for acceleration. We prove weak convergence to saddle-points, an asymptotic ergodic rate of O(/k) for restricted primal-dual gaps and shortly discuss preconditioning. This iteration scheme is then modified, in Section 3, to yield accelerated iterations in case of strong convexity of the involved functionals. In particular, we obtain the ergodic rates O(/k ) and O(ϑ k ), respectively, in case that one or both functionals are strongly convex. For these accelerated schemes, preconditioning is again discussed. Numerical experiments and comparisons are then shown in Section 4 for the accelerated schemes and variational image denoising with quadratic discrepancy and total variation (TV) as well as Huber-regularized total variation. We conclude with a summary of the convergence results and an outlook in Section 5. The basic Douglas Rachford iteration To set up for the accelerated Douglas Rachford methods, let us first discuss the basic solution strategy for problem (). We proceed by writing the iteration scheme as a proximal point algorithm associated with as degenerate metric, similar to [4, 5]. Here, however, the reformulation differs in the way how the linear and non-linear terms are treated and by the fact that step-size operators are introduced. The latter allow, in particular, for different primal and dual step-sizes, a feature that is crucial for acceleration, as it will turn out in Section 3. The proposed Douglas Rachford iteration is derived as follows. First observe that in terms of subdifferentials, saddle-point pairs (x, y) X Y can be characterized by the inclusion relation ( ) 0 0 ( 0 K K 0 ) ( ) ( ) ( ) x F 0 x +. (3) y 0 G y This may, in turn, be written as the finding the root of the sum of two monotone operators: Denoting, Z = X Y, z = (x, y), we find the equivalent representation with the data ( 0 K Az = K 0 ) ( x y 0 Az + Bz ), Bz = ( F 0 0 G ) ( ) x. (4) y Both A and B are maximally monotone operators on Z. Since A has full domain, also A + B is maximally monotone [6]. Among the many possibilities for solving the root-finding problem 0 Az + Bz, the Douglas Rachford method is fully implicit, i.e., a reformulation in terms of the proximal point iteration suggests itself. This can indeed be achieved with a degenerate metric on Z Z. Introduce a step-size operator Σ : Z Z which is assumed to be a continuous, symmetric, positive operator. 3

4 Then, setting Σ z Az, we see that { 0 Bz + Σ z, 0 Az + Bz Σ z Az { 0 Bz + Σ z, 0 Σ z + Σ A Σ z. Hence, one arrives at the equivalent problem ( ) ( ) z B Σ 0 A, A = z Σ Σ A Σ. (5) We employ the proximal point algorithm with respect to the degenerate metric ( Σ Σ M = ), Σ Σ which corresponds to ( z 0 M k+ z k ) z k+ z k + A ( z k+ z k+ ), (6) or, equivalently, { z k+ = (id +ΣB) (z k z k ), z k+ = (id +A Σ ) (z k+ z k + z k ). Using Moreau s identity and substituting z k = z k z k, one indeed arrives at the Douglas Rachford iteration { z k+ = (id +ΣB) ( z k ), z k+ = z k + (id +ΣA) (z k+ z k ) z k+ (7). Plugging in the data (4) as well as setting, for σ, τ > 0, ( ) σ id 0 Σ =, (8) 0 τ id the iterative scheme (7) turns out to be equivalent to x k+ = (id +σ F) ( x k ), y k+ = (id +τ G) (ȳ k ), x k+ = x k+ σk (y k+ + ȳ k+ ȳ k ), ȳ k+ = y k+ + τk(x k+ + x k+ x k ). (9) The latter two equations are coupled but can easily be decoupled, resulting, e.g., with denoting d k+ = x k+ + x k+ x k, in an update scheme as shown in Table. With the knowledge of the resolvents (id +σ F), (id +τ G), (id +στk K), the iteration can be computed.. Convergence analysis Regarding convergence of the iteration (9), introduce the Lagrangian L : dom F dom G R associated with the saddle-point problem (): L(x, y) = Kx, y + F(x) G(y). If Fenchel Rockafellar duality holds, then we may consider restricted primal-dual gaps G X0 Y 0 : X Y ], ] associated with X 0 Y 0 dom F dom G given by G X0 Y 0 (x, y) = sup L(x, y ) L(x, y). (x,y ) X 0 Y 0 4

5 DR Objective: Solve min x dom F max y dom G Kx, y + F(x) G(y) Initialization: ( x 0, ȳ 0 ) X Y initial guess, σ > 0, τ > 0 step sizes Iteration: x k+ = (id +σ F) ( x k ) y k+ = (id +τ G) (ȳ k ) b k+ = (x k+ x k ) σk (y k+ ȳ k ) d k+ = (id +στk K) b k+ x k+ = x k x k+ + d k+ ȳ k+ = y k+ + τkd k+ Output: {(x k, y k )} primal-dual sequence (DR) Table : The basic Douglas Rachford iteration for the solution of convex-concave saddle-point problems of type (). Letting (x, y ) be a primal-dual solution pair of (), we furthermore consider the following restricted primal and dual error functionals E p Y 0 (x) = [ F(x) + sup y Y 0 Kx, y G(y ) ] [ F(x ) + sup y Y 0 Kx, y G(y ) ], E d X 0 (y) = [ G(y) + sup x X 0 K y, x F(x ) ] [ G(y ) + sup x X 0 K y, x F(x ) ]. The special case X 0 = dom F, Y 0 = dom G gives the well-known primal-dual gap as well as the primal and dual error, i.e., G(x, y) = F(x) + G (Kx) + G(y) + F ( K y), E p (x) = F(x) + G (Kx) [ F(x ) + G (Kx ) ], E d (y) = G(y) + F ( K y) [ G(y ) + F ( K y ) ], where F and G are again the Fenchel conjugates. For this reason, the study of the difference of Lagrangian is appropriate for convergence analysis. We will later find situations where G X0 Y 0, E p Y 0 and E d X 0 coincide, for the iterates of the proposed methods, with G, E p and E d, respectively, for X 0 and Y 0 not corresponding to the full domains of F and G. However, the restricted variants are already suitable for measuring optimality, as the following proposition shows. Proposition. Let X 0 Y 0 dom F dom G contain a saddle-point (x, y ) of (). Then, G X0 Y 0, E p Y 0 and E d X 0 are non-negative and we have G X0 Y 0 (x, y) = E p X 0 (x) + E d Y 0 (y), E p Y 0 (x) G {x } Y 0 (x, y), E d X 0 (y) G X0 {y }(x, y) for each (x, y) X Y. Proof. We first show the statements for G X0 Y 0. As (x, y ) is a saddle-point, L(x, y ) L(x, y) 0 for any (x, y) X Y, hence G X0 Y 0 (x, y) 0. By optimality of saddle-points, it follows that G (Kx ) = Kx, y G(y ) sup y Y 0 Kx, y G(y ) Kx, y G(y ), and analogously, F ( K y ) = sup x X 0 K y, x F(x ). By Fenchel Rockafellar duality, F(x ) + G (Kx ) + G(y ) + F ( K y ) = 0, hence E p Y 0 (x) + E d X 0 (y) = F(x) + sup y Y 0 Kx, y G(y ) + G(y) + sup x X 0 K y, x F(x ) = sup (x,y ) X 0 Y 0 L(x, y ) L(x, y) = G X0 Y 0 (x, y). 5

6 Next, observe that optimality of (x, y ) and the subgradient inequality implies E p Y 0 (x) F(x) + Kx, y G(y ) F(x ) Kx, y + G(y ) = F(x) F(x ) K y, x x 0. Analogously, one sees that E d X 0 0. Using the non-negativity and the already-proven identity for G X0 Y 0 yields E p Y 0 (x) E p Y 0 (x) + E d {x } (y) = G {x } Y 0 (x, y) and the analogous estimate E d X 0 (x) G X0 {y }(x, y). We are particularly interested in the restricted error functionals associated with bounded sets containing a saddle-point. These can, however, coincide with the full functionals under mild assumptions. Lemma. Suppose a saddle point (x, y ) of L exists.. If F is strongly coercive, i.e., F(x)/ x if x, then for each bounded Y 0 dom G there is a bounded X 0 dom F such that E d = E d X 0 on Y 0.. If G is strongly coercive, then for each bounded X 0 dom F there is a bounded Y 0 dom G such that E p = E p Y 0 on X If both F and G are strongly coercive, then for bounded X 0 Y 0 dom F dom G there exist bounded X 0 Y 0 dom F dom G such that G = G X 0 Y 0 on X 0 Y 0. Proof. Clearly, it suffices to show the first statement as the second follows by analogy and the third by combining the first and the second. For this purpose, let Y 0 dom G be bounded, i.e., y C for each y Y 0 {y } and some C > 0 independent of y. Pick x 0 dom F and choose M x 0 such that F(x ) C K ( x + x 0 ) + F(x 0 ) for each x > M which is possible by strong coercivity of F. Then, for all y Y 0 {y } and x > M, K y, x F(x ) K x y C K x C K x 0 F(x 0 ) K y, x 0 F(x 0 ), hence the supremum F ( K y) = sup x dom F K y, x F(x ) is attained on the bounded set X 0 = { x M}. This shows E d = E d X 0 on Y 0. As the next step preparing the convergence analysis, let us denote the symmetric bilinear form associated with the positive semi-definite operator M: [ ] [ ] z z (z, z ), (z, z ) M = M, = Σ (z z z z ), z z. With the choice (8) and the substitution z = z z = ( x, ȳ) for (z, z) = (x, y, x, ỹ), this becomes, with a slight abuse of notation, (z, z ), (z, z ) M = z, z M = σ x, x + τ ȳ, ȳ. Naturally, the induced squared norm will be denoted by (z, z) M = z M = σ x + τ ȳ. Lemma. For each k N and z = (x, y) dom F dom G, iteration (9) satisfies L(x k+, y) L(x, y k+ ) zk (id ΣA)z M zk+ (id ΣA)z M zk z k+ M = ( x k x + σk y xk+ x + σk y xk x k+ ) σ + ( ȳ k y τkx ȳk+ y τkx ȳk ȳ k+ ). τ (0) 6

7 Proof. The iteration (9) tells that σ ( xk x k+ ) F(x k+ ) and τ (ȳk y k+ ) G(y k+ ), meaning, in particular, that F(x k+ ) F(x) σ xk x k+, x k+ x, G(y k+ ) G(y) τ ȳk y k+, y k+ y. With the identity σ xk x k+, x k+ x = σ xk x k+, x k+ x + σ xk+ x k+, x k+ + x k+ x k x and the update rule for x k+ in (9) in the difference x k+ x k+, we get σ xk x k+, x k+ x Kx, y k+ = σ xk x k+, x k+ x K(x k+ + x k+ x k x), y k+ + ȳ k+ ȳ k + Kx, y k+ = σ xk x k+, x k+ x Kx, ȳ k ȳ k+ K(x k+ + x k+ x k ), y k+ + ȳ k+ ȳ k. Analogously, we may arrive at τ ȳk y k+, y k+ y + Kx k+, y = τ ȳk ȳ k+, ȳ k+ y + x k x k+, K y + K(x k+ + x k+ x k ), y k+ + ȳ k+ ȳ k. Consequently, the estimate L(x k+, y) L(x, y k+ ) σ xk x k+, x k+ x Kx, y k+ + τ ȳk y k+, y k+ y + Kx k+, y = σ xk x k+, x k+ x + σk y + τ ȳk ȳ k+, ȳ k+ y τkx = z k z k+, z k+ (id ΣA)z M, () remembering the definitions (4) and (8). Employing the Hilbert space identity u, v = u + v u v leads us to (0). Lemma 3. Iteration (9) possesses the following properties:. A point (x, y, x, ȳ ) is a fixed point if and only if (x, y ) is a saddle point for () and x σk y = x, y + τkx = ȳ.. If w-lim i ( x k i, ȳ k i ) = ( x, ȳ ) and lim i ( x k i x k i+, ȳ k i ȳ k i+ ) = (0, 0), then (x k i+, y k i+, x k i+, ȳ k i+ ) converges weakly to a fixed point. Proof. Regarding the first point, observe that as (9) is equivalent to (6) with the choice (8) and z k = z k z k for all k, fixed-points (z, z ) are exactly the solutions to (5) of the form (z, z ), z = z z. With A and B according to (4), fixed-points are equivalent to solutions of (5) which obey 0 Az + Bz and z ΣAz = z which means that z = (x, y ) is a solution of () and z = ( x, ȳ ) satisfies x = x σk y and ȳ = y + τkx. Next, suppose that some subsequence ( x k i, ȳ k i ) converges weakly to ( x, ȳ ) and lim i ( x ki+ x k i, ȳ ki+ ȳ k i ) = (0, 0) as i. The iteration (9) then gives [ ] [ x k i + id σk = τk id y k i+ ] [ x k i + + σk (ȳ k i+ ȳ k i ) ȳ k i+ τk( x k i+ x k i ) ], 7

8 so the sequence (x ki+, y ki+ ) converges weakly to (x, y ) with x σky = x and y +τkx = ȳ by weak sequential continuity of continuous linear operators. Now, iteration (9) can also be written as σ ( xk i x ki+ ) K (ȳ ki+ ȳ k i ) K y ki+ + F(x ki+ ) τ (ȳk i ȳ ki+ ) + K( x ki+ x k i ) Kx ki+ + G(y ki+ ) where the left-hand side converges strongly to (0, 0). As (x, y) ( K y + F(x), Kx + G(y) ) is maximally monotone, it is in particular weak-strong closed. Consequently, (0, 0) ( K y + F(x ), Kx + G(y ) ), so (x, y ) is a saddle-point of (). In particular, x σky = x and y + τkx = ȳ, hence, the weak limit of {(x ki+, y ki+, x ki+, ȳ ki+ )} is a fixed-point of the iteration. Proposition. If () possesses a solution, then iteration (DR) converges weakly to a (x, y, x, ȳ ) for which (x, y ) is a solution of () and ( x, ȳ ) = (x σk y, y + τkx ). Proof. Let (x, y ) be a solution of (). Then, 0 L(x k+, y ) L(x, y k+ ) for each k, so by (0) and recursion we have for k 0 k that x k x + σk y + ȳk y τkx [ k + xk x k + + ȳk ȳ k + ] σ τ σ τ k =k 0 xk 0 x + σk y σ + ȳk 0 y τkx. () τ With x = x σk y and y = y + τkx and d k = σ xk x + τ ȳk y, this implies that {d k } is a non-increasing sequence with limit d. Furthermore, x k x k+ 0 as well as ȳ k ȳ k+ 0 as k. Now, () says that {( x k, ȳ k )} is bounded whenever () has a solution, hence there exists a weak accumulation point ( x, ȳ ). Setting (x, y ) Z as the unique solution to the equation { x σk y = x, y + τkx = ȳ (3), gives, by virtue of Lemma 3, that (x, y, x, ȳ ) is a fixed-point of the iteration with (x, y ) being a solution of () and an accumulation point of {(x k, y k )}. Suppose that ( x, ȳ ) is another weak accumulation point of the same sequence and choose the corresponding (x, y ) according to (3). Then, σ xk, x x + τ ȳk, ȳ ȳ = xk x + σk y + ȳk y τkx σ τ xk x + σk y ȳk y τkx x σ τ σ ȳ τ + x σ + ȳ. τ As both (x, y ) and (x, y ) are solutions to (), the right-hand side converges as a consequence of (). Plugging in the subsequences weakly converging to ( x, ȳ ) and ( x, ȳ ), respectively, on the left-hand side, we see that both limits must coincide: σ x, x x + τ ȳ, ȳ ȳ = σ x, x x + τ ȳ, ȳ ȳ, hence x = x and ȳ = ȳ, implying that the whole sequence {( x k, ȳ k )} converges weakly to ( x, ȳ ). Inspecting the iteration (9) we see that w-lim k (x k σk y k, y k +τkx k ) = ( x, ȳ ) and as solving for (x k, y k ) constitutes a continuous linear operator, also w-lim k (x k, y k ) = (x, y ), what remained to show. 8

9 In particular, restricted primal-dual gaps vanish for the iteration and can thus be used as a stopping criterion. Corollary. In the situation of Proposition, for X 0 Y 0 dom F dom G bounded and containing a saddle-point, the associated restricted primal-dual gap obeys G X0 Y 0 (x k, y k ) 0 for each k, lim k G X 0 Y 0 (x k, y k ) = 0. Proof. The convergence follows from applying the Cauchy Schwarz inequality to (), taking the supremum and noting that z k z k+ M 0 as k as shown in Proposition. This convergence comes, however, without a rate, making it necessary to switch to ergodic sequences. Here, the usual O(/k) convergence speed follows almost immediately. Theorem. Let X 0 dom F, Y 0 dom G be bounded and such that X 0 Y 0 contains a saddle-point, i.e., a solution of (). Then, in addition to w-lim k (x k, y k ) = (x, y ) the ergodic sequences x k erg = k x k, yerg k = k y k (4) k k k = converge weakly to x and y, respectively, and the restricted primal-dual gap obeys G X0 Y 0 (x k erg, y k erg) k [ x 0 x + σk y sup (x,y) X 0 Y 0 σ k = + ȳ0 y τkx ] = O(/k). (5) τ Proof. First of all, the assumptions state the existence of a saddle-point, so by Proposition, {(x k, y k )} converges weakly to a solution (x, y ) of (). To obtain the weak convergence of {x k erg}, test with an x and utilize the Stolz Cesàro theorem to obtain k lim k xk k erg, x = lim = xk, x k k k = = lim k xk, x = x, x. The property w-lim k y k erg = y can be proven analogously. Finally, in order to show the estimate on G X0 Y 0 (x k erg, y k erg), observe that for each (x, y) X 0 Y 0, the function (x, y ) L(x, y) L(x, y ) is convex. Thus, using (0) gives L(x k erg, y) L(x, y k erg) k k k k =0 k k =0 L(x k +, y) L(x, y k + ) ( x k x + σk y σ xk + x + σk y ) + ( ȳ k y τkx ȳk + y τkx ) τ ( x 0 x + σk y + ȳ0 y τkx ). k σ τ Taking the supremum over all (x, y) X 0 Y 0 gives G X0 Y 0 (x k erg, y k erg) on the left-hand side and its estimate from above in (5). 9

10 pdr Objective: Solve min x dom F max y dom G Kx, y + F(x) G(y) Initialization: Iteration: Output: ( x 0, ȳ 0, d 0 ) X Y X initial guess, σ > 0, τ > 0 step sizes, M feasible preconditioner for T = id +στk K x k+ = (id +σ F) ( x k ) y k+ = (id +τ G) (ȳ k ) b k+ = (x k+ x k ) σk (y k+ ȳ k ) d k+ = d k + M (b k+ T d k ) x k+ = x k x k+ + d k+ ȳ k+ = y k+ + τkd k+ {(x k, y k )} primal-dual sequence (pdr) Table : The preconditioned Douglas Rachford iteration for the solution of convex-concave saddle-point problems of type ().. Preconditioning The iteration (DR) can now be preconditioned as follows. We amend the dual variable y by a preconditioning variable x p X and the linear operator K by H : X X linear, continuous and consider the saddle-point problem min x dom F max Kx, y + Hx, x p + F(x) G(y) I {0} (x p ) (6) y dom G, x p=0 whose solutions have exactly the form (x, y, 0) with (x, y ) being a saddle-point of (). Plugging this into the Douglas Rachford iteration for (6) yields by construction that x k+ p = 0 as well as x k+ p = τhd k+ for each k, so no new quantities have to be introduced into the iteration. Introducing the notation T = id +στk K and M = id +στ(k K + H H), we see that the step d k+ = T b k+ in (DR) is just replaced by d k+ = d k + M (b k+ T d k ), which can be interpreted as one iteration step of an operator splitting method for the solution of T d k+ = b k+ with respect to the splitting T = M (M T ). The full iteration is shown in Table (the term feasible preconditioner is defined after the next paragraph). Given T = id +στk K, one usually aims at choosing M such that it corresponds to wellknown solution techniques such as the Gauss Seidel or successive over-relaxation procedure in the finite-dimensional case, for instance. For this purpose, one has to ensure that a H : X X linear and continuous can be found for given M. The following definition from [5] gives a necessary and sufficient criterion. Definition. Let M, T : X X linear, self-adjoint, continuous and positive semi-definite. Then, M is called a feasible preconditioner for T if M is boundedly invertible and M T is positive semi-definite. Indeed, given H : X X linear and continuous, it is clear from the definition of M and T that M is a feasible preconditioner. The other way around, if M is a feasible preconditioner for T, then one can take the square root of the operator M T and consequently, there is linear, continuous and self-adjoint H : X X such that H = στ (M T ) which implies M = id +στ(k K + H H). Summarized, the preconditioned Douglas Rachford iterations based on (6) exactly corresponds to (pdr) with M being a feasible preconditioner. Its convergence then follows from Proposition and Theorem, with weak convergence and asymptotic rate being maintained. 0

11 Preconditioner T λ id (λ + )D M SGS M SSOR Conditions λ T λ λ max (T D) ω (0, ) Douglas Symmetric Symmetric Iteration type Richardson Damped Jacobi Rachford Gauss Seidel SOR D = diag(t ), T = D E E, E lower triangular, M SGS = (D E)D (D E ), M SSOR = ( ω ω D E)( ω D) ( ω D E ) Table 3: Summary of common preconditioners and possible conditions for feasibility. Theorem. Let X 0 dom F, Y 0 dom G be bounded and such that X 0 Y 0 contains a saddlepoint, i.e., a solution of (). Further, let M be a feasible preconditioner for T = id +στk K. Then, for the iteration generated by (pdr), it holds that w-lim k (x k, y k ) = (x, y ) with (x, y ) being a saddle-point of (). Additionally, the ergodic sequence {(x k erg, y k erg)} given by (4) converges weakly to (x, y ) and the restricted primal-dual gap obeys G X0 Y 0 (x k erg, y k erg) k where x M T = (M T )x, x. [ x 0 x + σk y sup + d0 x M T (x,y) X 0 Y 0 σ σ + ȳ0 y τkx ] τ (7) Proof. The above considerations show that one can find a H : X X such that the iteration (pdr) corresponds to the Douglas Rachford iteration for the solution of (6). Proposition then already yields weak convergence and Theorem gives the estimate (5) on the restricted primal-dual gap associated with (6) (note that we apply it for X 0 (Y 0 {0})). Now, observe that as x p = 0 must hold, the latter coincides with the restricted primal-dual gap associated with (). Plugging in x 0 p = τhd 0 and M T = στh H into (5) finally yields (7). The condition that M should be feasible for T allows for a great flexibility in preconditioning. Basically, classical splitting methods such as symmetric Gauss Seidel or symmetric successive over-relaxation are feasible as a n-fold application thereof. We summarize the most important statements regarding feasibility of classical operator splittings in Table 3 and some general results in the following propositions and refer to [4, 5] for further details. Proposition 3. Let T : X X linear, continuous, symmetric and positive definite be given. Then, the following update schemes correspond to feasible preconditioners.. For M 0 : X X linear, continuous such that M 0 T is positive definite, { d k+/ = d k + M0 (bk+ T d k ), d k+ = d k+/ + M 0 (bk+ T d k+/ ).. For M : X X feasible for T and n, { d k+(i+)/n = d k+i/n + M (b k+ T d k+i/n ), i = 0,..., n. (8) (9) 3. For T = T + T, with T, T : X X linear, continuous, symmetric, T positive definite, T positive semi-definite and M : X X feasible for T, M T boundedly invertible, { d k+/ = d k + M ( (b k+ T d k ) T d k), d k+ = d k + M ( (b k+ T d k+/ ) T d k) (0).

12 The update scheme (8) is useful if one has a non-symmetric feasible preconditioner M 0 for T as then, the concatenation with the adjoint preconditioner will be symmetric and feasible. Likewise, (9) corresponds to the n-fold application of a feasible preconditioner which is again feasible. Finally, (0) is useful if T can be split into T + T for which T can be easily preconditioned by M. Then, one obtains a feasible preconditioner for T only using forward evaluation of T. 3 Acceleration strategies We now turn to the main results of the paper. As already mentioned in the introduction, in the case that the functionals F and G possess strong convexity properties, the iteration (DR) can be modified such that the asymptotic ergodic rates for the restricted primal-dual gap can further be improved. Definition. A convex functional F : X R admits the modulus of strong convexity γ 0 if for each x, x dom F and λ [0, ] we have F ( λx + ( λ) x ) λf(x) + ( λ)f( x) γ λ( λ) x x. Likewise, a concave functional F admits the modulus of strong concavity γ if F admits the modulus of strong convexity γ. Note that for x dom F, ξ F(x) and x dom F it follows that F(x) + ξ, x x + γ x x F( x). The basic idea for acceleration is now, in case the modulus of strong convexity is positive, to use the additional quadratic terms to obtain estimates of the type ρ ( L(x k+, y) L(x, y k+ ) ) + res(ˆx k, ŷ k, x k+, y k+, x ρ, y ρ ) ˆxk x ρ + σk y ρ + ŷk y ρ τkx ρ σ τ [ ˆx k+ x ρ + σ K y ρ ϑ σ + ŷk+ y ρ τ Kx ρ ] τ for a modified iteration based on (DR). Here, the convex combination (x ρ, y ρ ) = ( ρ)(x, y ) + ρ(x, y) of (x, y) dom F dom G and a saddle-point (x, y ) is taken for ρ [0, ], (ˆx k, ŷ k ) denotes an additional primal-dual pair that is updated during the iteration, res a non-negative residual functional that will become apparent later, 0 < ϑ < an acceleration factor and σ > 0, τ > 0 step-sizes that are possibly different from σ and τ. In order to achieve this, we consider a pair (ˆx k, ŷ k ) X Y and set { x k+ = (id +σ F) (ˆx k ), y k+ = (id +τ G) (ŷ k () ), as well as, with ϑ p, ϑ d > 0, { x k+ = σk [y k+ + ϑ p (ȳ k+ ŷ k )], ỹ k+ = τk[x k+ + ϑ d ( x k+ ˆx k )], () where, throughout this section, we always denote x k+ = x k+ x k+ and ỹ k+ = y k+ ȳ k+. Observe that for ϑ p = ϑ d = and (ˆx k+, ŷ k+ ) = ( x k+, ȳ k+ ), the iteration (9) is recovered which corresponds to (DR). In the following, however, we will choose these parameters differently depending on the type of strong convexity of the F and G. To analyze the step () and (), let us first provide an estimate analogous to (0) in Lemma.

13 Lemma 4. Let ρ [0, ], denote by γ, γ 0 the moduli of strong convexity for F and G, respectively, and let (x, y ) dom F dom G be a saddle-point of L. For (ˆx k, ŷ k ) X Y as well as (x, y) dom F dom G, the equations () and () imply ρ ( L(x k+, y) L(x, y k+ ) ) + σ xk+ x ρ + σk y ρ + γ + ρ xk+ x ρ + σ ˆxk x k+ + ( ϑ p ) ŷ k ȳ k+, K[x k+ + x k+ ˆx k x ρ ] + τ ȳk+ y ρ τkx ρ + γ + ρ yk+ y ρ + τ ŷk ȳ k+ ( ϑ d ) y k+ + ȳ k+ ŷ k y ρ, K[ˆx k x k+ ] σ ˆxk x ρ + σk y ρ + τ ŷk y ρ τkx ρ (3) where x ρ = ( ρ)x + ρx and y ρ = ( ρ)y + ρy. Proof. First, optimality of (x, y ) according to (3), strong convexity and the subgradient inequality give γ xk+ x + γ yk+ y L(x k+, y ) L(x, y k+ ). Combined with strong concavity of x L(x, y k+ ) and y L(x k+, y) with moduli γ and γ, respectively, this yields for the convex combinations x ρ and y ρ that ( ρ) γ [ x k+ x + ρ x x ] + ( ρ) γ [ y k+ y + ρ y y ] + ρ ( L(x k+, y) L(x, y k+ ) ) L(x k+, y ρ ) L(x ρ, y k+ ). Employing convexity once more, we arrive at ρ γ + ρ xk+ x ρ ( ρ) γ [ x k+ x + ρ x x ] and the analogous statement for the dual variables. Putting things together yields ρ [ γ + ρ xk+ x ρ + γ yk+ y ρ ] + ρ ( L(x k+, y) L(x, y k+ ) ) L(x k+, y ρ ) L(x ρ, y k+ ). (4) The next steps aim at estimating the right-hand side of (4). Using again the strong convexity of F and G and the subgradient inequality for (), we obtain γ xk+ x ρ + F(x k+ ) F(x ρ ) as well as σ ˆxk x k+, x k+ x ρ = σ ˆxk x k+ + x k+, x k+ x k+ x ρ + σ ˆxk + x k+ x k+ + x ρ, x k+ = σ ˆxk x k+, x k+ x ρ σ xk+ + x k+ ˆx k x ρ, x k+ γ yk+ y ρ + G(y k+ ) G(y ρ ) τ ŷk ȳ k+, ȳ k+ y ρ τ yk+ + ȳ k+ ŷ k y ρ, ỹ k+. 3

14 Collecting the terms involving x k+ and ỹ k+, adding the primal-dual coupling terms Kx k+, y ρ Kx ρ, y k+ as well as employing () gives σ xk+ + x k+ ˆx k x ρ, x k+ τ yk+ + ȳ k+ ŷ k y ρ, ỹ k+ It then follows that + Kx k+, y ρ Kx ρ, y k+ = y k+ + ȳ k+ ŷ k y ρ, K[x k+ + x k+ ˆx k ] y k+ + ȳ k+ ŷ k, K[x k+ + x k+ ˆx k x ρ ] + ( ϑ p ) ȳ k+ ŷ k, K[x k+ + x k+ ˆx k x ρ ] ( ϑ d ) y k+ + ȳ k+ ŷ k y ρ, K[ x k+ ˆx k ] + Kx k+, y ρ Kx ρ, y k+ = ( ϑ p ) ȳ k+ ŷ k, K[x k+ + x k+ ˆx k x ρ ] ( ϑ d ) y k+ + ȳ k+ ŷ k y ρ, K[ x k+ ˆx k ] + y ρ, K[ˆx k x k+ ] ŷ k ȳ k+, Kx ρ. F(x k+ ) F(x ρ ) + G(y k+ ) G(y ρ ) + Kx k+, y ρ Kx ρ, y k+ σ ˆxk x k+, x k+ x ρ + σk y ρ + τ ŷk ȳ k+, ȳ k+ y ρ τkx ρ γ xk+ x ρ + ( ϑ p ) ȳ k+ ŷ k, K[x k+ + x k+ ˆx k x ρ ] γ yk+ y ρ ( ϑ d ) y k+ + ȳ k+ ŷ k y ρ, K[ x k+ ˆx k ]. (5) As before, employing the Hilbert-space identity u, v = u + v u v, plugging the result into (4) and rearranging finally yields (3). 3. Strong convexity of F Now, assume that γ > 0 while for γ, no restrictions are made, such that we set γ = 0. In this case, we can rewrite the quadratic terms involving x k+ and ȳ k+ in (3) as follows. Lemma 5. In the situation of Lemma 4 and for γ > 0 we have where σ xk+ x ρ +σk y ρ + γ xk+ x ρ + τ ȳk+ y ρ τkx ρ = [ ϑ σ ˆxk+ x ρ + σ K y ρ + τ ŷk+ y ρ τ Kx ρ ] + ϑ [ ϑ τ yk+ y ρ, τkx ρ + ỹ k+ σ xk+ x ρ, σk y ρ x ] k+ σγ τ τkx ρ + ỹ k+ (6) ϑ = + σγ, { σ = ϑσ, τ = ϑ τ, { ˆx k+ = x k+ ϑ x k+, ŷ k+ = y k+ ϑ ỹ k+. Proof. This is a result of straightforward computations which we present for completeness and (7) 4

15 reader s convenience. Computing σ xk+ x ρ + σk y ρ + γ xk+ x ρ = + σγ σ xk+ x ρ + + σγ x k+ x ρ, σ + + σγ σ + σγ [σk y ρ x k+ ] + σγ [σk y ρ x k+ ] σ ( + σγ ) x k+ x ρ, σk y ρ x k+ = + σγ + σγ x k+ x ρ + [σk y ρ x k+ ] σ + σγ σ ( + σγ ) x k+ x ρ, σk y ρ x k+ as well as τ ȳk+ y ρ τkx ρ = τ yk+ y ρ τ yk+ y ρ, + σγ[τkx ρ + ỹ k+ ] + τ + σγ[τkx ρ + ỹ k+ ] + τ ( + σγ ) y k+ y ρ, τkx ρ + ỹ k+ σγ τ τkx ρ + ỹ k+ + σγ = τ + σγ yk+ y ρ + σγ[τkx ρ + ỹ k+ ] + τ ( + σγ ) y k+ y ρ, τkx ρ + ỹ k+ σγ τ τkx ρ + ỹ k+ and plugging in the definitions (7) already gives (6). As the next step, we aim at combining the error terms on the right-hand side of (6) and left-hand side of (3) such that they become non-negative. One important point here is to choose ϑ p and ϑ d such that possibly negative terms involving y k+ y ρ cancel out. Lemma 6. In the situation of Lemma 5, with ϑ p = ϑ and ϑ d = ϑ, we have ( ϑ p ) ŷ k ȳ k+, K[x k+ + x k+ ˆx k x ρ ] ( ϑ d ) y k+ + ȳ k+ ŷ k y ρ, K[ˆx k x k+ ] + ϑ [ ϑ τ yk+ y ρ, τkx ρ + ỹ k+ σ xk+ x ρ, σk y ρ x ] k+ σγ τ τkx ρ + ỹ k+ c σ ˆxk x k+ c τ ŷk ȳ k+ whenever σ γτ K < c < holds for some c > 0. c( + σγ)σγτ K (c σ γτ K ) xk+ x ρ (8) Proof. We start with rewriting the scalar-product terms from the right-hand side of (6) by plugging in () as follows τ yk+ y ρ,τkx ρ + ỹ k+ σ xk+ x ρ, σk y ρ x k+ = y k+ y ρ, K[x ρ x k+ + x k+ ˆx k ] y k+ y ρ + ȳ k+ ŷ k, K[x ρ x k+ ] ( + ϑ d ) y k+ y ρ, K[ x k+ ˆx k ] + ( ϑ p ) ȳ k+ ŷ k, K[x ρ x k+ ] = ŷ k ȳ k+, K[x ρ x k+ + x k+ ˆx k ] y k+ y ρ + ȳ k+ ŷ k, K[ˆx k x k+ ] + ( + ϑ d ) y k+ y ρ, K[ˆx k x k+ ] + ( ϑ p ) ȳ k+ ŷ k, K[x ρ x k+ ] = ŷ k ȳ k+, K[ϑ p (x ρ x k+ ) + x k+ ˆx k ] ϑ d (y k+ y ρ ) + ȳ k+ ŷ k, K[ˆx k x k+ ]. 5

16 Incorporating the scalar-product terms from the left-hand side of (3), the choice of ϑ p and ϑ d yields ( ϑ p ) ŷ k ȳ k+, K[x k+ + x k+ ˆx k x ρ ] ( ϑ d ) y k+ + ȳ k+ ŷ k y ρ, K[ˆx k x k+ ] ϑ ϑ ϑ d(y k+ y ρ ) + ȳ k+ ŷ k, K[ˆx k x k+ ] + ϑ ϑ ŷk ȳ k+, K[ϑ p (x ρ x k+ ) + x k+ ˆx k ] = ŷ k ȳ k+, K[( ϑ p)(x k+ x ρ ) + (ϑ d ϑ p )( x k+ ˆx k )] = ( ϑ p) ŷ k ȳ k+, K[(x k+ x ρ ) + ϑ d ( x k+ ˆx k )]. Using Young s inequality, this allows to estimate, as c > 0, ( ϑ p) ŷ k ȳ k+, K[(x k+ x ρ ) + ϑ d ( x k+ ˆx k )] c τ ŷk ȳ k+ ( ϑ p) τ K[(x k+ x ρ ) + ϑ d ( x k+ ˆx k )]. c Taking the missing quadratic term into account, using that c < and once again the definitions of ϑ, ϑ p and ϑ d, gives, for ε > 0, the estimate ( ϑ p) τ K[(x k+ x ρ ) + ϑ d ( x k+ ˆx k )] + σγ c τ τkx ρ + ỹ k+ ( (ϑ p ) = + ϑ p ) τ K[(x k+ x ρ ) + ϑ d ( x k+ ˆx k )] c We would like to set which is equivalent to ϑ4 p ϑ p τ K[(x k+ x ρ ) + ϑ d ( x k+ ˆx k )] c ( + ε) ϑ p τ K x k+ ˆx k + c ( + ε ( + ε) ϑ p τ K = c c σ, ε = c σ γτ K σ γτ K ) ϑ 4 p ϑ p τ K x k+ x ρ. c and leading to ε > 0 since σ γτ K < c by assumption. Plugged into the last term in the above estimate, we get ( + ) ϑ 4 p ϑ p τ K x k+ x ρ = ε c Putting all estimates together then yields (8). = c ϑ p c σ γτ K c ϑ pτ K x k+ x ρ c( + σγ)σγτ K (c σ γτ K ) xk+ x ρ. In view of combining (3), (6) and (8), the factor in front of x k+ x ρ in (8) should not be too large. Choosing γ > 0 small enough, one can indeed control this quantity. Doing so, one arrives at the following result. Proposition 4. With the definitions in (7) and γ chosen such that 0 < γ < γ + ρ + ( + ρ + σγ )στ K (9) 6

17 there is a 0 < c < and a c > 0 such that the following estimate holds: ρ ( L(x k+, y) L(x, y k+ ) ) + [ ϑ σ ˆxk+ x ρ + σ K y ρ + τ ŷk+ y ρ τ Kx ρ ] [ + ( c) σ ˆxk x k+ + τ ŷk ȳ k+ ] + c xk+ x ρ Proof. Note that (9) implies that σ γτ K < and σ ˆxk x ρ + σk y ρ + τ ŷk y ρ τkx ρ. (30) ( + σγ)σγτ K ( σ γτ K ) < γ + ρ γ, so by continuity of c c( + σγ)σγτ K / ( (c σ γτ K ) ) in c =, one can find a c < with c > σ γτ K such that c = γ + ρ γ c( + σγ)σγτ K (c σ γτ K ) > 0. The prerequisites for Lemma 6 are satisfied, hence one can combine (3), (6) and (8) in order to get (30). Remark. From the proof it is also immediate that if (9) holds for a σ 0 > 0 instead of σ, the estimate (30) will still hold for all 0 < σ σ 0 and τ > 0 such that στ = σ 0 τ 0 with c and c independent from σ. The estimate (30) suggests to adapt the step-sizes (σ, τ) (σ, τ ) in each iteration step. This yields the accelerated Douglas Rachford iteration which obeys the following recursion: ϑ k = + σk γ, x k+ = (id +σ k F) (ˆx k ), y k+ = (id +τ k G) (ŷ k ), x k+ = σ k K [y k+ + ϑ k (ȳk+ ŷ k )], ỹ k+ = τ k K[x k+ + ϑ k ( x k+ ˆx k )], ˆx k+ = x k+ ϑ k x k+, ŷ k+ = y k+ ϑ k ỹk+, σ k+ = ϑ k σ k, τ k+ = ϑ k τ k. As before, the iteration can be written down explicitly and in a simplified manner, as, for instance, the product satisfies σ k τ k = σ 0 τ 0 and some auxiliary variables can be omitted. Such a version is summarized in Table 4 along with conditions we will need in the following convergence analysis. One sees in particular that (adr) requires only negligibly more computational and implementational effort than (DR). Remark. Of course, the iteration (adr) can easily be adapted to involve (id +σ 0 τ 0 KK ) instead of (id +σ 0 τ 0 K K), in case the former can be computed more efficiently, for instance. A straightforward application of Woodbury s formula, however, would introduce an additional evaluation of K and K, respectively, and hence, potentially higher computational effort. As in this situation, the issue cannot be resolved by simply interchanging primal and dual variable, we explicitly state the necessary modifications in order to maintain one evaluation of K and K in each iteration step: { b k+ = ( ( + ϑ k ˆx k+ = x k+ ϑ k σ k K d k+, )yk+ ϑ k (3) ŷk) + τ k K ( ( + ϑ k )xk+ ˆx k), d k+ = (id +σ 0 τ 0 KK ) b k+ ŷ k+ = ϑ k (ŷk y k+ ) + d k+. 7

18 adr Objective: Solve min x dom F max y dom G Kx, y + F(x) G(y) Prerequisites: F is strongly convex with modulus γ > 0 Initialization: Iteration: (ˆx 0, ŷ 0 ) X Y initial guess, σ 0 > 0, τ 0 > 0 initial step sizes, γ 0 < γ < +σ 0 τ 0 acceleration factor, ϑ K 0 = +σ0 γ x k+ = (id +σ k F) (ˆx k ) y k+ = (id +τ k G) (ŷ k ) b k+ = (( + ϑ k )x k+ ϑ k ˆx k ) σ k K (( + ϑ k )y k+ ŷ k ) d k+ = (id +σ 0 τ 0 K K) b k+ ˆx k+ = ϑ k (ˆx k x k+ ) + d k+ ŷ k+ = y k+ + ϑ k τ kkd k+ σ k+ = ϑ k σ k, τ k+ = ϑ k τ k, ϑ k+ = +σk+ γ (adr) Output: {(x k, y k )} primal-dual sequence Table 4: The accelerated Douglas Rachford iteration for the solution of convex-concave saddlepoint problems of type (). Now, as F is strongly convex, we might hope for improved convergence properties for {x k } compared to (DR). This is indeed the case. For the convergence analysis, we introduce the following quantities. λ k = k ϑ k k =0, ν k = [ k k =0 Lemma 7. The sequences {λ k } and {ν k } obey, for all k, + λ k ] (3) kσ 0 γ + σ0 γ + λ k + kσ [ 0γ, ν k k + (k )kσ ] 0γ ( = O(/k ). (33) + σ 0 γ + ) Proof. Observing that λ k+ = ϑ k λ k as well as σ k = σ 0 λ k λ k+ λ k = λ k ( + σk γ ) = we find that λ k and λ k + σk γ + σ kγ = σ 0 γ + σk γ +. for all k 0. The sequence {σ k } is monotonically decreasing and positive, so estimating the denominator accordingly gives the bounds on λ k. Summing up yields the estimate on ν k, in particular, for k, we have ν k 4( +σ 0 γ+) σ 0 γ, i.e., ν k k = O(/k ). Proposition 5. If () possesses a solution, then the iteration (adr) converges to a saddle-point (x, y ) of () in the following sense: lim k xk = x with x k x = O(/k ), w-lim k yk = y. In particular, each saddle-point (x, y ) of () satisfies x = x. Proof. Observe that since (adr) requires γ < γ /( + σ 0 τ 0 K ), and since σ k = σ 0 /λ k as well as Lemma 7 implies lim k σ k = 0, there is a k 0 0 such that (9) is satisfied for ρ = 0 and σ k 8

19 for all k k 0. Letting (x, y ) be a saddle-point, applying (30) recursively gives λ k [ σ k ˆx k x + σ k K y + τ k ŷ k y τ k Kx ] + k k =k 0 λ k [ c σ k ˆx k x k + + c ŷ k ȳ k + + c + τ k xk x ] λ k0 [ σ k0 ˆx k 0 x + σ k0 K y + τ k0 ŷ k 0 y τ k0 Kx ]. (34) This implies, on the one hand, the convergence of the series [ λ k ˆx k x k+ + ŷ k ȳ k+ + σ k τ k xk+ x ] <. k=0 As λ k = σ 0 /σ k = τ k /τ 0, we have in particular that lim k σ k (ˆx k x k+ ) = 0 and lim k (ŷ k ȳ k+ ) = 0. Furthermore, defining d k (x, y ) = λ k [ σ k ˆx k x + σ k K y + τ k ŷ k y τ k Kx ], the limit lim k d k (x, y ) = d (x, y ) has to exist: On the one hand, the sequence is bounded from below and admits a finite limes inferior. On the other hand, traversing with k 0 a subsequence that converges to the limes inferior, the estimate (34) yields that limes superior and limes inferior have to coincide, hence, the sequence is convergent. Denoting by ξ k = ˆx k x + σ k K y, ζ k = ŷ k y τ k Kx, we further conclude that { σ k ξ k } as well as {ζ k } are bounded. Plugging in the iteration (3) gives the identities { x k x σ k K (y k y ) = ξ k + ϑ k σ kk (ȳ k ŷ k ), y k y + τ k K(x k x ) = ζ k ϑ k τ k K( x k ˆx k (35) ), which can be solved with respect to x k x and y k y yielding { x k x = (id +σ 0 τ 0 K K) [ ξ k + σ k K ( ζ k + ϑ k (ȳk ŷ k ) ) σ 0 τ 0 ϑ k K K( x k ˆx k ) ], y k y = (id +σ 0 τ 0 KK ) [ ζ k τ k K ( ξ k + ϑ k ( x k ˆx k ) ) σ 0 τ 0 ϑ k KK (ȳ k ŷ k ) ]. Regarding the norm of x k x, the boundedness and convergence properties of the involved terms allow to conclude that x k x Cσ k = O(/k ) and, in particular, lim k x k = x as well as coincidence of the primal part for each saddle-point. Further, choosing a subsequence associated with the indices {k i } such that w-lim i σ ki ξ k i = ξ and w-lim i ζ k i = ζ (such a subsequence must exist), we see with τ k = σ 0 τ 0 /σ k that w-lim i σ ki (x k i x ) = (id +σ 0 τ 0 K K) (ξ + K ζ ), w-lim i (yk i y ) = (id +σ 0 τ 0 KK ) (ζ σ 0 τ 0 Kξ ). Weakening the statements, we conclude that w-lim i (x k i, y k i ) = (x, y ) for some y Y. Looking at the iteration (3) again yields σ ki τ ki (ˆx k i x k i ) ϑ k i K (ȳ k i ŷ k i ) K y k i + F(x k i ), (ŷ k i ȳ k i ) + ϑ ki K( x k i ˆx k i ) Kx k i + G(y k i ), 9

20 so by weak-strong closedness of maximally monotone operators, it follows that (0, 0) ( K y + F(x ), Kx + G(y ) ), i.e., (x, y ) is a saddle-point. As the subsequence was arbitrary, each weak accumulation point of {(x k, y k )} is a saddle-point. Suppose that (q, y ) and (q, y ) are both weak accumulation points of { ( σ k (x k x ), y k) }, ) = (q, y ) and w-lim i ( σ ki (x k i x ), y k i) = (q, y ) for some ( i.e., w-lim i σ ki (x k i x ), y k i {k i } and {k i }, respectively. Denote by (ξk, ζ k ) as above but with y replaced by y. Without loss of generality, we may assume that w-lim ( i σ ki ξ k i, ζ k i ) = (ξ, ζ ), w-lim ( i σ ξ k i k, ζ k i ) = (ξ, ζ ) i for some (ξ, ζ ), (ξ, ζ ) X Y. Dividing the first identity in (35) by σ k and passing both identities to the respective weak subsequential limits yields ξ ξ = q q K (y y ), ζ ζ = y y + σ 0 τ 0 K(q q ). Now, plugging in the definitions, we arrive at d k (x, y ) d k (x, y ) = σ 0 σ k ξ k + K (y y ), K (y y ) + τ 0 ζ k + y y, y y so that rearranging implies σ 0 σ k ξ k, K (y y ) + τ 0 ζ k, y y = d k (x, y ) d k (x, y ) + σ 0 K (y y ) + τ 0 y y. The right-hand side converges for the whole sequence, so passing the left-hand side to the respective subsequential weak limits and using the identities for ξ ξ and ζ ζ gives 0 = σ 0 ξ ξ, K (y y ) + τ 0 ζ ζ, y y = σ 0 K (y y ) + τ 0 y y, and hence, y = y. Consequently, y k y as k what was left to show. In addition to the convergence speed O(/k ) for x k x, restricted primal-dual gaps also converge and a restricted primal error of energy also admits a rate. Corollary. Let X 0 Y 0 dom F dom G be bounded and contain a saddle-point. Then, for the sequence generated by (adr), it holds that G X0 Y 0 (x k, y k ) 0 as k, E p Y 0 (x k ) = o(/k). Proof. From the estimate (5) in the proof of Lemma 4 with ρ = as well as the estimates ϑ k σ k γ and /ϑ k σ k γ we infer for (x, y) X 0 Y 0 that L(x k+, y) L(x, y k+ ) σ k σ k (ˆx k x k+ ) + σ k (ˆx k x k+ ) ˆx k x + σ k K y + σ k σ 0 τ 0 [ ŷ k ȳ k+ + ŷ k ȳ k+ ŷ k y τ k Kx ] + σ k γ K [ ȳ k+ ŷ k x k+ + x k+ ˆx k x + x k+ ˆx k y k+ + ȳ k+ ŷ k y ]. (36) 0

21 In the proof of Proposition 5 we have seen that σ k ˆx k x k+ 0 and ŷ k ȳ k+ 0 as k. Moreover, denoting by (x, y ) X 0 Y 0 a saddle-point of () and using the notation and results from the proof of Proposition 5, it follows that ˆx k x + σ k K y ξ k + sup x X 0 x x + σ 0 K sup y Y 0 y y C, σ k ŷ k y τ k Kx σ 0 ζ k + σ 0 sup y Y 0 y y + σ 0 τ 0 K sup x X 0 x x C for a suitable C > 0, so we see together with the boundedness of {(x k, y k )} that the right-hand side of (36) tends to zero as k, giving the estimate on G X0 Y 0. Regarding the rate on E p Y 0 (x k ), choosing X 0 = {x } and employing the boundedness of { σ k ξ k }, the refined estimates ˆx k x + σ k K y σ k [ C + K sup y Y 0 y y ], σ k ŷ k y τ k Kx σ k [ C + sup y Y 0 y y ] follow for some C > 0. Plugged into (36), one obtains with σ k = O(/k) the rate G {x } Y 0 (x k+, y k+ ) = sup y Y 0 Proposition then yields the desired estimate for E p Y 0. L(x k+, y) L(x, y k+ ) = o(/k). Now, switching again to ergodic sequences, the optimal rate O(/k ) can be obtained with the accelerated iteration. Theorem 3. For X 0 Y 0 dom F dom G bounded and containing a saddle-point (x, y ), the ergodic sequences according to k x k erg = ν k k =0 λ k x k +, k yerg k = ν k k =0 λ k y k +, where {(x k, y k )} is generated by (adr) and λ k, ν k are given by (3), converge to the same limit (x, y ) as {(x k, y k )}, in the sense that x k erg x = O(/k ) and w-lim k yerg k = y. The associated restricted primal-dual gap satisfies, for some k 0 0 and ρ > 0, the estimate G X0 Y 0 (x k erg, y k erg) ν k sup (x,y) X 0 Y 0 [ σ0 ρσ k 0 ˆx k 0 x ρ + σ k0 K y ρ + ρτ 0 ŷ k 0 y ρ τ k0 Kx ρ λ k ( L(x k +, y) L(x, y k + ) )] = O(/k ), k 0 + k =0 where again x ρ = ρx + ( ρ)x and y ρ = ρy + ( ρ)y. Proof. As each x k erg is a convex combination, the convergence x k x = O(/k) implies, together with the estimates of Lemma 7, that k x k erg x ν k k =0 k λ k x k + x Cν k k =0 λ k k + C ν k k = O(/k) for appropriate C, C > 0. The weak convergence of {yerg} k follows again by the Stolz Cesàro theorem: Indeed, for y Y, we have k lim k yk k erg, y = lim =0 λ k yk +, y λ k y k, y k k k =0 λ = lim = y, y. k k λ k (37)

22 For proving the estimate on the restricted primal-dual gap, first observe that as (adr) requires γ < γ /( + σ 0 τ 0 K ) we also have γ < γ /( + ρ + ( + ρ)σ 0 τ 0 K ) for some ρ > 0 and we can choose again k 0 such that Proposition 4 can be applied for k k 0. For (x, y) X 0 Y 0, convexity of (x, y ) L(x, y) L(x, y ) and estimate (30) yield ρ ( L(x k erg, y) L(x, yerg) k ) k ν k k 0 ν k k =0 k + ν k k 0 = ν k k =0 k =0 λ k ρ ( L(x k +, y) L(x, y k + ) ) k =k 0 λ k [ σ k [ λ k + λ k ρ ( L(x k +, y) L(x, y k + ) ) ˆx k x ρ + σ k K y ρ + ŷ k y ρ τ k Kx ρ ] τ k σ k + ˆx k + x ρ + σ k +K y ρ + ŷ k + y ρ τ k τ +Kx ρ ] k + λ k ρ ( L(x k +, y) L(x, y k + ) ) + ν k λ k0 [ σ k0 ˆx k 0 x ρ + σ k0 K y ρ + τ k0 ŷ k 0 y ρ τ k0 Kx ρ ] ν k λ k [ σ k ˆx k x ρ + σ k K y ρ + τ k ŷ k y ρ τ k Kx ρ ]. The estimate from above for the restricted primal-dual gap in (34) is thus valid. Moreover, the rate follows by Lemma 7 as ν k = O(/k ) and the fact that the supremum on the right-hand side of (37) is bounded. Indeed, the latter follows, from applying (5) for k = 0,..., k 0 as well as taking the boundedness of X 0 and Y 0 into account. Corollary 3. The full dual error obeys E d (y k erg) = O(/k ). If, moreover, G is strongly coercive, then additionally, the primal error satisfies E p (x k erg) = O(/k ). Proof. Employing Lemma on a bounded Y 0 dom G containing y k erg for all k and, subsequently, Proposition, it is sufficient for the estimate on E d that F is strongly coercive. This follows immediately from the strong convexity. The analogous argument can be used in order to obtain the rate for E p, however, we have to assume strong coercivity of G. In particular, we have an optimization scheme for the dual problem with the optimal rate as well as weak convergence, comparable to the modified FISTA method presented in [8]. Moreover, in many situations, even the primal problem obeys the optimal rate. Remark 3. If (9) is already satisfied for σ 0, τ 0 and ρ > 0, which is possible for fixed σ 0 τ 0 by choosing σ 0 and ρ > 0 small enough, then one may choose k 0 = 0 such that the estimate on the restricted primal-dual gap reads as G X0 Y 0 (x k erg, y k erg) ν k sup (x,y) X 0 Y 0 [ ˆx 0 x ρ + σ 0 K y ρ ρσ 0 + ŷ0 y ρ τ 0 Kx ρ ρτ 0 ], (38) which has the same structure as (5), the corresponding estimate for the basic Douglas Rachford iteration. However, for small σ 0 γ, see (33), the smallest constant C > 0 such that ν k C/k for all k might become large. This potentially leads to (38) being worse than (37) for a greater σ 0. Thus, choosing σ 0 sufficiently small such (38) holds does not offer a clear advantage.

23 3. Strong convexity of F and G Assume in the following that both F and G are strongly convex, i.e., γ > 0 and γ > 0. We obtain an analogous identity to (6), however, with different ϑ and without changing the step-sizes σ and τ. Lemma 8. Let, in the situation of Lemma 4, the values γ > 0 and γ > 0 satisfy σγ = τγ. Then, for σ xk+ x ρ + σk y ρ + γ xk+ x ρ + τ ȳk+ y ρ τkx ρ + γ yk+ y ρ = [ ϑ σ xk+ x ρ + σk y ρ + τ ȳk+ y ρ τkx ρ ] + ϑ [ ϑ τ yk+ y ρ, τkx ρ + ỹ k+ σ xk+ x ρ, σk y ρ x ] k+ γ σk y ρ x k+ γ τkx ρ + ỹ k+ (39) ϑ = + σγ = + τγ. Proof. This follows immediately from the identities σ xk+ x ρ + σk y ρ + γ xk+ x ρ as well as = + σγ σ xk+ x ρ + σk y ρ γ x k+ x ρ, σk y ρ x k+ γ σk y ρ x k+ τ ȳk+ y ρ τkx ρ + γ yk+ y ρ = + τγ ȳ k+ y ρ τkx ρ + γ y k+ y ρ, τkx ρ + ỹ k+ γ τ τkx ρ + ỹ k+. Proposition 6. In the situation of Lemma 8, with ϑ p = ϑ d = ϑ, ρ [0, ] and γ > 0, γ > 0 chosen such that γ γ + ρ + ( + ρ + σγ )στ K, γ = γγ (40) γ there is a 0 < c < and a c > 0 such that the following estimate holds: ρ ( L(x k+, y) L(x, y k+ ) ) + ϑ [ + ( c) [ σ xk+ x ρ + σk y ρ + τ ȳk+ y ρ τkx ρ ] σ ˆxk x k+ + τ ŷk ȳ k+ ] + c xk+ x ρ + c yk+ y ρ σ ˆxk x ρ + σk y ρ + τ ŷk y ρ τkx ρ. (4) Proof. Plugging in the definitions (), the scalar-product terms in (39) can be written as ϑ [ ϑ τ yk+ y ρ, τkx ρ + ỹ k+ σ xk+ x ρ, σk y ρ x ] k+ = ( ϑ) [ K[x k+ x ρ ], ȳ k+ ŷ k y k+ y ρ, K[ x k+ ˆx k ] ]. 3

24 Adding the scalar-product terms in (3), we arrive at ( ϑ) ŷ k ȳ k+, K[x k+ + x k+ ˆx k x ρ ] ( ϑ) y k+ + ȳ k+ ŷ k y ρ, K[ˆx k x k+ ] + ( ϑ) K[x k+ x ρ ], ȳ k+ ŷ k ( ϑ) y k+ y ρ, K[ x k+ ˆx k ] = ( ϑ) ŷ k ȳ k+, K[ x k+ ˆx k ] ( ϑ) ȳ k+ ŷ k, K[ˆx k x k+ ] = 0, so one only needs to estimate the squared-norm terms on the right-hand side of (39). Doing so, employing the restriction on γ yields σγστ K <, so choosing c according to allows to obtain as well as ( max σγστ K ), ( + σγ) < c < γ σk y ρ x k+ = σ γ K [y k+ y ρ + ϑ(ȳ k+ ŷ k )] c τ ȳk+ ŷ k c( + σγ) σ γ K + (c( + σγ) σ τγ K ) yk+ y ρ, γ τkx ρ + ỹ k+ c σ xk+ ˆx k c( + τγ ) τ γ K + (c( + τγ ) στ γ K ) xk+ x ρ. As before, we would like to have that the factors in front of x k+ x ρ and y k+ y ρ are strictly below γ +ρ γ and γ +ρ γ, respectively. Indeed, with the restriction on γ, γ = σ τ γ, γ = σ τ γ, strict monotonicity as well as c( + σγ) >, one sees that σ γ K σγστ K + ρ γ γ and c( + σγ) σ γ K (c( + σγ) σ τγ K ) γ + ρ γ c for a c > 0. Analogously, again with γ = σ τ γ, it also follows that τ γ K τγ στ K + ρ γ γ and c( + τγ ) τ γ K (c( + τγ ) στ γ K ) γ + ρ γ c for a c > 0. Choosing c = min(c, c ) and combining (3), (39) with the above estimates thus yields (4). Letting ˆx k+ = x k+ and ŷ k+ = ȳ k+, we arrive at the accelerated Douglas Rachford method for strongly convex-concave saddle-point problems x k+ = (id +σ F) ( x k ), y k+ = (id +τ G) (ȳ k ), x k+ = x k+ σk [y k+ + ϑ(ȳ k+ ȳ k (4) )], ȳ k+ = y k+ + τk[x k+ + ϑ( x k+ x k )], which, together with the restrictions σγ = τγ and (40), is also summarized in Table 5. As we will see in the following, the estimate (4) implies R-linear convergence of the sequences {(x k, y k )} and {( x k, ȳ k )}, i.e., an exponential rate. 4

25 adr sc Objective: Solve min x dom F max y dom G Kx, y + F(x) G(y) Prerequisites: F, G are strongly convex with respective moduli γ, γ > 0 Initialization: ( x 0, ȳ 0 ) X Y initial guess, σ, τ > 0 step sizes with σγ = τγ, γ γ +(+σγ acceleration factor, ϑ = )στ K +σγ Iteration: x k+ = (id +σ F) ( x k ) y k+ = (id +τ G) (ȳ k ) b k+ = ( +ϑ ϑ xk+ x k ) ϑσk ( +ϑ ϑ yk+ ȳ k ) d k+ = (id +ϑ στk K) b k+ x k+ = x k ϑ x k+ + d k+ ȳ k+ = y k+ + ϑτkd k+ (adr sc ) Output: {(x k, y k )} primal-dual sequence Table 5: The accelerated Douglas Rachford iteration for the solution of strongly convex-concave saddle-point problems of type (). Theorem 4. The iteration (adr sc ) converges R-linearly to the unique saddle-point (x, y ) in the following sense: lim k (xk, y k ) = (x, y ), x k x = o(ϑ k ), y k y = o(ϑ k ). Furthermore, denoting x = x σk y and ȳ = y + τkx, we have lim k ( xk, ȳ k ) = ( x, ȳ ), x k x = o(ϑ k ), ȳ k ȳ = o(ϑ k ). Proof. As F and G are strongly convex, a unique saddle-point (x, y ) must exist. Letting (x, y ) = (x, y ), the choice of γ in (adr sc ) leads to Proposition 6 being applicable for ρ = 0, such that (4) yields [ ϑ k σ xk x + σk y + + k ϑ k k =0 [ ( c) τ ȳk y τkx ] [ x k x k + σ + ȳk ȳ k + τ σ x0 x + σk y + τ ȳ0 y τkx ] + c xk + x + y k + y ] for each k. Consequently, the sequences {ϑ k x k x k+ }, {ϑ k ȳ k ȳ k+ } as well as {ϑ k x k x }, {ϑ k y k y } are summable. This implies the stated rates for {x k } and {y k } as well as x k+ x k = o(ϑ k ) and ȳ k+ ȳ k = o(ϑ k ). The convergence statements and rates for { x k } and {ȳ k } are then a consequence of (4). Corollary 4. The primal-dual gap obeys G(x k, y k ) = o(ϑ k/ ). Proof. As F and G are strongly convex, these functionals are also strongly coercive and Lemma yields that G(x k, y k ) = G X0 Y 0 (x k, y k ) for some bounded X 0 Y 0 dom F dom G containing the saddle-point (x, y ) and all k. For (x, y) X 0 Y 0, the estimate (5) gives, for ρ =, ] L(x k+, y) L(x, y k+ ) x k x [ k+ σ xk+ x + σk y + K (y k+ + ȳ k+ ȳ k y) ] + ȳ k ȳ [ k+ τ ȳk+ y τkx + K(x k+ + x k+ x k x), 5

26 so by boundedness, G X0 Y 0 (x k+, y k+ ) C( x k x k+ + ȳ k ȳ k+ ) = o(ϑ k/ ). Remark 4. Considering the ergodic sequences k x k erg = ν k k =0 ϑ k x k +, k yerg k = ν k k =0 ϑ k y k +, ν k = ( [ ϑ)ϑk k ϑ k = ϑ k ], k =0 gives the slightly worse rates x k erg x = O(ϑ k ) and yerg k y = O(ϑ k ). Moreover, if the restriction on γ in (adr sc γ ) is not sharp, i.e., γ < +(+σγ, then (40) holds for some )στ K ρ > 0 and the estimate (4) also implies ρ ( L(x k erg, y) L(x, y k erg) ) ν k [ σ x0 x ρ + σk y ρ + τ ȳ0 y ρ τkx ρ ] ν k ϑ k[ σ xk x ρ + σk y ρ + τ ȳk y ρ τkx ρ ]. This leads to the following estimate on the restricted primal-dual gap [ G X0 Y 0 (x k erg, yerg) k ν k sup X 0 Y 0 ρσ x0 x ρ + σk y ρ + ρτ ȳ0 y ρ τkx ρ ], and, choosing a bounded X 0 Y 0 such that G(x k erg, y k erg) = G X0 Y 0 (x k erg, y k erg) for all k, to the rate G(x k erg, y k erg) = O(ϑ k ). Thus, while having roughly the same qualitative convergence behavior, the ergodic sequences admit a better rate in the exponential decay. Remark 5. With the choice σγ = τγ, the best ϑ with respect to the restrictions (40) for fixed σ, τ > 0 is given by ϑ = + σγ, γ = γ + ( + σγ )στ K. For varying σ and τ, this choice of ϑ becomes minimal when σγ becomes maximal. Performing the maximization in the non-trivial case K = 0 leads to the problem of finding the positive root of the third-order polynomial σ γ γ K σ 4 γ γ K σ 3. A very rough approximation of this root can be obtained by dropping the cubic term, leading to the easily computable parameters γ σ = γ K, τ = γ γ K, γ = γ K K +, γ γ K = γ γ K + γ γ which, in turn, can be expected to give reasonable convergence properties. 3.3 Preconditioning The iterations (adr) and (adr sc ) can now be preconditioned with the same technique as for the basic iteration (see Subsection.), i.e., based on the amended saddle-point problem (6). Let us first discuss preconditioning of (adr): With F being strongly convex, the primal functional in (6) is strongly convex, so we may indeed employ the accelerated Douglas Rachford iteration to this problem and obtain, in particular, an ergodic convergence rate of O(/k ). Taking the explicit structure into account similar to Subsection., one arrives at the preconditioned accelerated Douglas Rachford method according to Table 6. In particular, the update step for d k+ again corresponds to an iteration step for the solution of T d k+ = b k+ with respect to the matrix splitting T = M + (T M). In order to obtain convergence, one needs that M is a feasible preconditioner for T = id +σ 0 τ 0 K K and has choose the parameter γ according to γ < γ /( + σ 0 τ 0 K K + H H ) = γ / M. 6

27 padr Objective: Solve min x dom F max y dom G Kx, y + F(x) G(y) Prerequisites: F is strongly convex with modulus γ > 0 Initialization: Iteration: (ˆx 0, ŷ 0 ) X Y initial guess, σ 0 > 0, τ 0 > 0 initial step sizes, M feasible preconditioner for T = id +σ 0 τ 0 K K, 0 < γ < γ M acceleration factor, ϑ 0 = +σ0 γ x k+ = (id +σ k F) (ˆx k ) y k+ = (id +τ k G) (ŷ k ) b k+ = (( + ϑ k )x k+ ϑ k ˆx k ) σ k K (( + ϑ k )y k+ ŷ k ) d k+ = d k + M (b k+ T d k ) ˆx k+ = ϑ k (ˆx k x k+ ) + d k+ ŷ k+ = y k+ + ϑ k τ kkd k+ σ k+ = ϑ k σ k, τ k+ = ϑ k τ k, ϑ k+ = +σk+ γ (padr) Output: {(x k, y k )} primal-dual sequence Table 6: The preconditioned accelerated Douglas Rachford iteration for the solution of convexconcave saddle-point problems of type (). The latter condition, however, requires an estimate on M which usually has to be obtained case by case. One observes, nevertheless, the following properties for the feasible preconditioner M n corresponding to the n-fold application of the feasible preconditioner M. In particular, it will turn out that applying a preconditioner multiple times only requires an estimate on M. Furthermore, the norms M n approach T as n. Proposition 7. Let M be a feasible preconditioner for T. For each n, denote by M n the n-fold application of M, i.e., such that d k+ = d k + Mn (b k+ T d k ) holds for d k+ in (9) of Proposition 3. Then, the norm of the M n is monotonically decreasing for increasing n, i.e., M n M n... M = M. Furthermore, if T is invertible, the spectral radius of id M T obeys ρ(id M T ) < and M n whenever the expression on the right-hand side is non-negative. T M ρ(id M T ) n (43) Proof. The proof is adapted from the proof of Proposition.5 of [5] from which we need the following two conclusions. One is that for any linear, self-adjoint and positive definite pair of operators B and S in the Hilbert space X, the following three statements are equivalent: B S 0, id B / SB / 0 and σ(id B S) [0, [ where σ denotes the spectrum. Here, B / is the square root of B, and B / is its inverse. Actually, we also have σ(id B S) = σ ( B(id B S)B ) = σ(id SB ) [0, [, while σ(id SB ) [0, [ also means S B. The other conclusion from Proposition.5 of [5] is the recursion relation for M n according to Mn+ = M n + (id M T ) n M = Mn + M / (id M / T M / ) n M / 7

28 with id M / T M / being a positive semidefinite operator. Actually, since M / is also self-adjoint, we have that id M / T M / is also self-adjoint, and it follows that M / (id M / T M / ) n M / 0 M n+ M n. By the above equivalence, this leads to M n+ M n, and M n+ M n from which the monotonicity behavior follows by induction. For the estimate on M n in case T is invertible, introduce an equivalent norm on X associated with the M-scalar product x, x M = Mx, x. Then, id Mn T is self-adjoint with respect to the M-scalar product: (id M n T )x, x M = M(id M T ) n x, x = (id T M ) n Mx, x = Mx, (id M T ) n x = x, (id Mn T )x M for x, x X. Let 0 q and suppose that x 0 is chosen such that (id M T )x M q x M. Then, ( q ) Mx, x = ( q ) x M T x, x + (id T / M T / )T / x, T / x T x, x, as id T / M T / is positive semi-definite due to σ(id T / M T / ) = σ(id M T ) [0, [. Consequently, since T is continuously invertible, T x ( q ) M x q T M. Hence, q must be bounded away from and, consequently, ρ(id M T ) = id M T M < as the contrary would result in a contradiction. ρ(id M T ) n as well as Now, id M n T M id M T n M = id M n Eventually, estimating T = M / (id Mn T )M / M M / Mρ(id M T ) n = M ρ(id M T ) n. M n T + M n id Mn T T + M n M ρ(id M T ) n gives, if M ρ(id M T ) n <, the desired estimate (43). Remark 6. Let us shortly discuss possible ways to estimate M for the classical preconditioners in Table 3. For the Richardson preconditioner, it is clear that M = λ with λ T being an estimate on T. For the damped Jacobi preconditioner, one has to choose λ λ max (T D) where λ max denotes the greatest eigenvalue, which results in M = (λ + ) D. The norm of the diagonal matrix D is easily obtainable as the maximum of the entries. 8

29 For symmetric SOR preconditioners, which include the symmetric Gauss Seidel case, we can express M T as M T = ω ( ω ω ω D + E) D ( ω ω D + E ) (44) for T = D E E, D diagonal and E lower triangular matrix of T, respectively, as well as ω ]0, [. Hence, M T can be estimated by estimating the squared norm of ω ω D/ + D / E. This can, for instance, be done as follows. Denoting by t ij the entries of T and performing some computational steps, we get for fixed i that ( ω ( ω ω D/ x + D / E x) i = ii x i ) t ij x j leading to ( ω ω ω t/ t / ii j>i [ ω ( t ii + t ) ij ω t ii j>i ] max t ij x j>i D/ + D / E ) ω ω D + diag(d E ) max t ij = K. j i where is the maximum row-sum norm and symmetry of T has been used. This results in M T + ω ω K. In particular, in case that T is weakly diagonally dominant, then D E such that for the Gauss Seidel preconditioner one obtains the easily computable estimate M T + max j i t ij. Let us next discuss preconditioning of (adr sc ). It is immediate that for strongly convex F and G, the acceleration strategy in (4) may be employed for the modified saddle-point problem (6). As before, this results in a preconditioned version of (adr sc ) with M = id +ϑ στ(k K + H H) being a feasible preconditioner for T = id +ϑ στk K. However, as one usually constructs M for given T, it will in this case depend on ϑ and, consequently, on the restriction on γ which reads in this context as γ γ + ( + σγ )ϑ ( M ). (45) As ϑ = +σγ, the condition on γ is implicit and it is not clear whether it will be satisfied for a given M. Nevertheless, if the latter is the case, preconditioning yields the iteration scheme (padr sc ) in Table 7 which inherits all convergence properties of the unpreconditioned iteration. So, in order to satisfy the conditions on M and γ, we introduce and discuss a notion that leads to sufficient conditions. Definition 3. Let (M ϑ ) ϑ, ϑ [0, ] be a family of feasible preconditioners for T ϑ = id +ϑ T where T 0. We call this family ϑ-norm-monotone with respect to T, if ϑ ϑ implies ϑ ( M ϑ ) (ϑ ) ( M ϑ ) for ϑ, ϑ [0, ]. Lemma 9. If (M ϑ ) ϑ is ϑ-norm-monotone with respect to στk K and γ 0 > 0 satisfies, for some ϑ [0, ], γ 0 < γ 0 + ( + σγ )(ϑ ) ( M ϑ ), ϑ, + σγ 0 then, γ = γ 0 satisfies (45) for M = M ϑ with ϑ = implication γ 0 γ γ + ( + σγ )ϑ ( M ϑ ) +σγ 0 and we have, for each γ > 0, the + σγ ϑ. 9

30 padr sc Objective: Solve min x dom F max y dom G Kx, y + F(x) G(y) Prerequisites: F, G are strongly convex with respective moduli γ, γ, > 0 Initialization: ( x 0, ȳ 0 ) X Y initial guess, σ, τ > 0 step sizes with σγ = τγ, γ acceleration factor, M feasible preconditioner for T = id +ϑ στk K γ and such that γ +(+σγ )ϑ ( M ) where ϑ = +σγ Iteration: x k+ = (id +σ F) ( x k ) Output: y k+ = (id +τ G) (ȳ k ) b k+ = ( +ϑ ϑ xk+ x k ) ϑσk ( +ϑ ϑ yk+ ȳ k ) d k+ = d k + M (b k+ T d k ) x k+ = x k ϑ x k+ + d k+ ȳ k+ = y k+ + ϑτkd k+ {(x k, y k )} primal-dual sequence (padr sc ) Table 7: The preconditioned accelerated Douglas Rachford iteration for the solution of strongly convex-concave saddle-point problems of type (). Proof. By the choice of γ 0 and ϑ-norm-monotonicity we immediately see that γ 0 γ / ( + ( + σγ )ϑ ( M ϑ ) ) for ϑ = ( + σγ 0 ), giving (45) as stated. The remaining conclusion is just a consequence of the monotonicity of γ ( + σγ). Remark 7. The result of Lemma 9 can be used as follows. Choose a family of feasible preconditioners (M ϑ ) ϑ for id +ϑ στk K which is ϑ-norm-monotone with respect to στk K. Starting with ϑ = and M, choosing γ 0 according to the lemma already establishes (45) for γ = γ 0. However, the corresponding M ϑ yields an improved bound for γ 0 and a greater γ 0 can be used instead without violating (45). This procedure may be iterated in order to obtain a good acceleration factor γ. Observe that the argumentation remains valid if the norm M ϑ is replaced by some estimate K ϑ M ϑ in Definition 3 and Lemma 9. Remark 8. Let us discuss feasible preconditioners for T ϑ = id +ϑ T, T 0 that are ϑ-normmonotone with respect to T. For the Richardson preconditioner, choose λ : [0, ] R for which ϑ ϑ λ(ϑ) is monotonically increasing and λ(ϑ) ϑ T for each ϑ [0, ]. Then, M ϑ = (λ(ϑ) + ) id defines a family of feasible preconditioners for T ϑ which is ϑ-norm-monotone as ϑ ( M ϑ ) = ϑ λ(ϑ) for each ϑ ]0, ]. For the damped Jacobi preconditioner, we see that for D ϑ the diagonal matrix of T ϑ we have D ϑ = + ϑ D where D is the diagonal matrix of T. Furthermore, T ϑ D ϑ = ϑ (T D ), so choosing λ : [0, ] R such that ϑ ϑ λ(ϑ) is monotonically increasing and λ(ϑ) ϑ T D yields feasible M ϑ = (λ(ϑ)+)d ϑ. This family is ϑ-norm-monotone with respect to T as ϑ ( M ϑ ) = ϑ λ(ϑ)+(λ(ϑ)+) D is monotonically increasing. For the symmetric Gauss Seidel preconditioner, we have M ϑ = (D ϑ E ϑ )D ϑ (D ϑ Eϑ ) = T ϑ +E ϑ D ϑ E ϑ where T ϑ = D ϑ E ϑ Eϑ, D ϑ is the diagonal and E ϑ lower diagonal matrix of T ϑ, respectively. Obviously, D ϑ and E ϑ admit the form D ϑ = id +ϑ D and E ϑ = ϑ E for respective D and E. Observe that ϑ ϑ /( + ϑ d) is monotonically increasing for each d 0. Consequently, it follows from ϑ ϑ that ϑ D ϑ (ϑ ) D ϑ and, further that ϑ E ϑ D ϑ E ϑ = E ϑ D ϑ (E ) E (ϑ ) D ϑ (E ) = (ϑ ) E ϑ D ϑ E ϑ. 30

31 As ϑ (T ϑ id) is independent from ϑ, we get ϑ (M ϑ id) (ϑ ) (M ϑ id). In particular, ϑ ( M ϑ ) (ϑ ) ( M ϑ ) which shows that the symmetric Gauss Seidel preconditioners (M ϑ ) ϑ constitute a ϑ-norm-monotone family. The SSOR methods are in general not ϑ-norm-monotone, except for the symmetric Gauss Seidel case ω =, as discussed above. 4 Numerical experiments In order to assess the performance of the discussed Douglas Rachford algorithms, we performed numerical experiments for two specific convex optimization problems in imaging: image denoising with L -discrepancy and TV regularization as well as TV-Huber approximation, respectively. 4. L -TV denoising We tested the basic and the accelerated Douglas Rachford iteration for a discrete L -TV-denoising problem (also called ROF model) according to min u X u f + α u, (46) with X the space of functions on a D regular rectangular grid and : X Y a forward finite-difference gradient operator with homogeneous Neumann conditions. The minimization problem may be written as the saddle-point problem () with primal variable x = u X, dual variable y = p Y, F(u) = u f /, G(p) = I { p α}(p) and K =. The associated dual problem then reads as min p Y div p + f f + I { p α}(p). (47) such that the primal-dual gap, which is used as the stopping criterion, becomes G L - TV(u, p) = u f + α u + div p + f f + I { p α}(p). (48) As F is strongly convex, the accelerated methods (adr) and (padr) proposed in this paper are applicable for solving L -TV denoising problems, in addition to the basic methods (DR) and (pdr). The computational building blocks for the respective implementations are well-known, see [4, 5, 9], for instance. We compare with first-order methods which also achieve O(/k ) convergence rate in some sense, namely ALG from [9], FISTA [] and the fast Douglas Rachford iteration in [0]. The parameter settings for the O(/k )-methods are as follows: ALG: Accelerated primal-dual algorithm introduced in [9] with parameters τ 0 = /L, τ k σ k L =, γ = 0.35 and L = 8. FISTA: Fast iterative shrinkage thresholding algorithm [] on the dual problem (47). The Lipschitz constant estimate L as in [] here is chosen as L = 8. FDR: Fast Douglas Rachford splitting method proposed in [0] on the dual problem (47) with parameters L f = 8, γ =, λ k = λ = γl f +γl f, β 0 = 0, and β k = k k+ for k > 0. L f adr: Accelerated Douglas Rachford method as in Table 4. Here, the discrete cosine transform (DCT) is used for solving the discrete elliptic equation with operator id σ 0 τ 0. The parameters read as L = 8, σ 0 =, τ 0 = 5/σ 0, γ = /( + σ 0 τ 0 L ). 3

32 (a) Original image (b) Noise level: σ = 0. (c) α = 0. (d) α = 0.5 Figure : Results for variational L -TV denoising. All denoised images are obtained with the padr algorithm which was stopped once the primal-dual gap normalized by the number of pixels became less than 0 7. (a) is the original image: Taj Mahal ( pixels, gray). (b) shows a noise-perturbed version of (a) (additive Gaussian noise, standard deviation 0.), (c) and (d) are the denoised images with α = 0. and α = 0.5, respectively. padr: Preconditioned accelerated Douglas Rachford method as in Table 6 with two steps of the symmetric Red-Black Gauss Seidel iteration as preconditioner and parameters σ0 =, τ0 = 5/σ0, γ =, L = 8. The norm km k is estimated as km k < kt kest + km T kest with kt kest = + σ0 τ0 L and km T kest = 4(σ0 τ0 ) /( + 4σ0 τ0 ). The parameter γ is set as γ = γ /(kt kest + km T kest ). We also tested the O(/k) basic Douglas Rachford algorithms using the following parameters. DR: Douglas-Rachford method as in Table. Again, the discrete cosine transform (DCT) is used for applying the inverse of the elliptic operator id στ. Here, σ =, τ = 5/σ. pdr: Preconditioned Douglas-Rachford method as in Table. Again, the preconditioner is given by two steps of the symmetric Red-Black Gauss Seidel iteration. The step sizes are chosen as σ =, τ = 5/σ. Computations were performed for the image Taj Mahal (size pixels), additive Gaussian noise (noise level 0.) and different regularization parameters α in (46) using an Intel Xeon CPU (E5-690v3,.6 GHz, cores). Figure includes the original image, the noisy image and denoised images with different regularization parameters. It can be seen from Table 8 as well as Figure that the proposed algorithms pdr and padr are competitive both in terms of iteration number and computation time, especially padr. While in terms of iteration numbers, DR and pdr as well as adr and padr perform roughly equally well, the preconditioned variants massively benefit from the preconditioner in terms of computation time with each iteration taking only a fraction compared to the respective unpreconditioned variants. 3

33 α = 0. ε= ALG FISTA FDR adr padr DR pdr 0 5 α = 0.5 ε= 0 7 ε= 0 5 ε = (.5s) (.80s) 730 (74.39s) 68 (6.00s) 75 (.30s) 80 (8.77s) 659 (0.75s) 497 (494.95s) 357 (3.s) 369 (6.5s) 368 (3.99s) 8 (0.4s) 80 (.76s) 04 (9.4s) 8 (.5s) 74 (3.7s) 4534 (56.7s) 879 (45.9s) 757 (66.s) 83 (3.70s) 57 (5.4s) 65 (.07s) 596 (54.45s) 638 (0.06s) 33 (.50s) 4 (.6s) 3067 (88.7s) 307 (47.87s) Table 8: Numerical results for the L -TV image denoising (ROF) problem (46) with noise level 0. and regularization parameters α = 0., α = 0.5. For all algorithms, we use the pair k(t) to denote the iteration number k and CPU time cost t. The iteration is performed until the primal-dual gap (48) normalized by the number of pixels is below ε ALG FISTA FDR adr padr Normalized Primal-Dual Gap 0 0 O(/k ) 3 0 ALG FISTA FDR adr padr 0 Normalized Primal-Dual Gap Iterations CPU Time (s) (a) Convergence with respect to iteration number. (b) Convergence with respect to computation time. Figure : Numerical convergence rate compared with iteration number and iteration time for the ROF model with regularization parameter α = 0.5. The plot in (a) utilizes a double-logarithmic scale while the plot in (b) is semi-logarithmic. 4. L -Huber-TV denoising There are several ways to approximate the non-smooth -norm term in (46) by a smooth functional. One possibility is employing the so-called Huber loss function instead of the Euclidean vector norm, resulting in ( pi X if pi αλ, λ kpkα,λ = pi α,λ, pi α,λ = λα α pi if pi > αλ, i pixels and the associated L -Huber-TV minimization problem min u X ku f k + k ukα,λ. (49) An appropriate saddle-point formulation can be obtained from the saddle-point formulation for the ROF model by replacing G by the functional G(p) = I{kpk α} (p) + λ kpk. The primal-dual 33

34 (a) Original image (b) Noise level: σ = 0.5 (c) ALG3 (d) padrsc Figure 3: Results for variational L -Huber-TV denoising. (a) is the original image: Man (04 04 pixels, gray). (b) shows a noise-perturbed version of (a) (additive Gaussian noise, standard deviation 0.5), (c) and (d) are the denoised images with α =.0 and λ = 0.05 obtained with algorithm ALG3 and padrsc, respectively. The iteration was stopped as soon as the primal-dual gap normalized by the number of pixels was less than 0 4. gap consequently reads as GL -Huber-TV (u, p) = ku f k kdiv p + f k kf k λ + k ukα,λ + + I{kpk α} + kpk. (50) Since both F and G are strongly convex functions with respective moduli γ = and γ = λ, we can use (adrsc ) and (padrsc ) to solve the saddle-point problem. Additionally, we compare to ALG3 from [9], a solution algorithm for the same class of strongly convex saddle-point problems. All algorithms admit an O(ϑk )-convergence rate for the primal-dual gap. ALG3: Accelerated primal-dual algorithm ALG3 for strongly convex saddle-point problems as in [9], using parameters γ =, δ = λ, µ = γδ/l, L = 8, ϑ = /( + µ). adrsc : Accelerated Douglas Rachford method for strongly convex saddle-point problems as in Table 5, using parameters γ =, γ = λ, σ = 0., τ = σγ /γ, γ = γ / + ( + σγ )στ L with L = 8. padrsc : Preconditioned accelerated Douglas Rachford method for strongly convex saddlepoint problems as in Table 7. Computing three steps of the symmetric Red-Black Gauss Seidel method is employed as preconditioner. The parameters are chosen as: γ =, γ = λ, σ = 0.5, τ = σγ /γ, L = 8. The values ϑ and γ are obtained by the procedure outlined in Remark 7: Setting ϑ = first, then performing 0 times the update kt kest +ϑ στ L, km T kest 4ϑ4 (στ ) /(+4ϑ στ ), γ γ / +(+σγ )ϑ (kt kest +km T kest ), ϑ /( + σγ). Numerical experiments have been performed for the image Man (04 04 pixels, gray) which has been contaminated with additive Gaussian noise (noise level 0.5) using the same CPU as for the L -TV-denoising experiments. The results for the parameters α =.0 and λ = 0.05 are depicted in Figure 3. The table (a) in Figure 4 indicates that padrsc is competitive in comparison to a well-established fast algorithm suitable for the same class of saddle-point problems with speed benefits again coming from the preconditioner. Figure 4 (b) moreover shows that the observed geometric reduction factor for the primal-dual gap is lower for adrsc and padrsc than for ALG3, with the former being roughly equal. 34

Preconditioned Alternating Direction Method of Multipliers for the Minimization of Quadratic plus Non-Smooth Convex Functionals

Preconditioned Alternating Direction Method of Multipliers for the Minimization of Quadratic plus Non-Smooth Convex Functionals SpezialForschungsBereich F 32 Karl Franzens Universita t Graz Technische Universita t Graz Medizinische Universita t Graz Preconditioned Alternating Direction Method of Multipliers for the Minimization

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean Renato D.C. Monteiro B. F. Svaiter March 17, 2009 Abstract In this paper we analyze the iteration-complexity

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

Regularization of linear inverse problems with total generalized variation

Regularization of linear inverse problems with total generalized variation Regularization of linear inverse problems with total generalized variation Kristian Bredies Martin Holler January 27, 2014 Abstract The regularization properties of the total generalized variation (TGV)

More information

M. Marques Alves Marina Geremia. November 30, 2017

M. Marques Alves Marina Geremia. November 30, 2017 Iteration complexity of an inexact Douglas-Rachford method and of a Douglas-Rachford-Tseng s F-B four-operator splitting method for solving monotone inclusions M. Marques Alves Marina Geremia November

More information

Convex Optimization Notes

Convex Optimization Notes Convex Optimization Notes Jonathan Siegel January 2017 1 Convex Analysis This section is devoted to the study of convex functions f : B R {+ } and convex sets U B, for B a Banach space. The case of B =

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

arxiv: v2 [math.oc] 21 Nov 2017

arxiv: v2 [math.oc] 21 Nov 2017 Unifying abstract inexact convergence theorems and block coordinate variable metric ipiano arxiv:1602.07283v2 [math.oc] 21 Nov 2017 Peter Ochs Mathematical Optimization Group Saarland University Germany

More information

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Second order forward-backward dynamical systems for monotone inclusion problems

Second order forward-backward dynamical systems for monotone inclusion problems Second order forward-backward dynamical systems for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek March 6, 25 Abstract. We begin by considering second order dynamical systems of the from

More information

Convex Optimization Conjugate, Subdifferential, Proximation

Convex Optimization Conjugate, Subdifferential, Proximation 1 Lecture Notes, HCI, 3.11.211 Chapter 6 Convex Optimization Conjugate, Subdifferential, Proximation Bastian Goldlücke Computer Vision Group Technical University of Munich 2 Bastian Goldlücke Overview

More information

BASICS OF CONVEX ANALYSIS

BASICS OF CONVEX ANALYSIS BASICS OF CONVEX ANALYSIS MARKUS GRASMAIR 1. Main Definitions We start with providing the central definitions of convex functions and convex sets. Definition 1. A function f : R n R + } is called convex,

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

Maximal Monotone Inclusions and Fitzpatrick Functions

Maximal Monotone Inclusions and Fitzpatrick Functions JOTA manuscript No. (will be inserted by the editor) Maximal Monotone Inclusions and Fitzpatrick Functions J. M. Borwein J. Dutta Communicated by Michel Thera. Abstract In this paper, we study maximal

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

A Tutorial on Primal-Dual Algorithm

A Tutorial on Primal-Dual Algorithm A Tutorial on Primal-Dual Algorithm Shenlong Wang University of Toronto March 31, 2016 1 / 34 Energy minimization MAP Inference for MRFs Typical energies consist of a regularization term and a data term.

More information

A first-order primal-dual algorithm with linesearch

A first-order primal-dual algorithm with linesearch A first-order primal-dual algorithm with linesearch Yura Malitsky Thomas Pock arxiv:608.08883v2 [math.oc] 23 Mar 208 Abstract The paper proposes a linesearch for a primal-dual method. Each iteration of

More information

Complexity of the relaxed Peaceman-Rachford splitting method for the sum of two maximal strongly monotone operators

Complexity of the relaxed Peaceman-Rachford splitting method for the sum of two maximal strongly monotone operators Complexity of the relaxed Peaceman-Rachford splitting method for the sum of two maximal strongly monotone operators Renato D.C. Monteiro, Chee-Khian Sim November 3, 206 Abstract This paper considers the

More information

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework

More information

Proximal splitting methods on convex problems with a quadratic term: Relax!

Proximal splitting methods on convex problems with a quadratic term: Relax! Proximal splitting methods on convex problems with a quadratic term: Relax! The slides I presented with added comments Laurent Condat GIPSA-lab, Univ. Grenoble Alpes, France Workshop BASP Frontiers, Jan.

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Accelerated primal-dual methods for linearly constrained convex problems

Accelerated primal-dual methods for linearly constrained convex problems Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize

More information

Key words. alternating direction method of multipliers, convex composite optimization, indefinite proximal terms, majorization, iteration-complexity

Key words. alternating direction method of multipliers, convex composite optimization, indefinite proximal terms, majorization, iteration-complexity A MAJORIZED ADMM WITH INDEFINITE PROXIMAL TERMS FOR LINEARLY CONSTRAINED CONVEX COMPOSITE OPTIMIZATION MIN LI, DEFENG SUN, AND KIM-CHUAN TOH Abstract. This paper presents a majorized alternating direction

More information

Lagrange duality. The Lagrangian. We consider an optimization program of the form

Lagrange duality. The Lagrangian. We consider an optimization program of the form Lagrange duality Another way to arrive at the KKT conditions, and one which gives us some insight on solving constrained optimization problems, is through the Lagrange dual. The dual is a maximization

More information

EXAMPLES OF CLASSICAL ITERATIVE METHODS

EXAMPLES OF CLASSICAL ITERATIVE METHODS EXAMPLES OF CLASSICAL ITERATIVE METHODS In these lecture notes we revisit a few classical fixpoint iterations for the solution of the linear systems of equations. We focus on the algebraic and algorithmic

More information

Levenberg-Marquardt method in Banach spaces with general convex regularization terms

Levenberg-Marquardt method in Banach spaces with general convex regularization terms Levenberg-Marquardt method in Banach spaces with general convex regularization terms Qinian Jin Hongqi Yang Abstract We propose a Levenberg-Marquardt method with general uniformly convex regularization

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

A Globally Linearly Convergent Method for Pointwise Quadratically Supportable Convex-Concave Saddle Point Problems

A Globally Linearly Convergent Method for Pointwise Quadratically Supportable Convex-Concave Saddle Point Problems A Globally Linearly Convergent Method for Pointwise Quadratically Supportable Convex-Concave Saddle Point Problems D. Russell Luke and Ron Shefi February 28, 2017 Abstract We study the Proximal Alternating

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Adaptive discretization and first-order methods for nonsmooth inverse problems for PDEs

Adaptive discretization and first-order methods for nonsmooth inverse problems for PDEs Adaptive discretization and first-order methods for nonsmooth inverse problems for PDEs Christian Clason Faculty of Mathematics, Universität Duisburg-Essen joint work with Barbara Kaltenbacher, Tuomo Valkonen,

More information

Splitting methods for decomposing separable convex programs

Splitting methods for decomposing separable convex programs Splitting methods for decomposing separable convex programs Philippe Mahey LIMOS - ISIMA - Université Blaise Pascal PGMO, ENSTA 2013 October 4, 2013 1 / 30 Plan 1 Max Monotone Operators Proximal techniques

More information

Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms

Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms Peter Ochs, Jalal Fadili, and Thomas Brox Saarland University, Saarbrücken, Germany Normandie Univ, ENSICAEN, CNRS, GREYC, France

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06

A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06 A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06 CHRISTIAN LÉONARD Contents Preliminaries 1 1. Convexity without topology 1 2. Convexity

More information

An inexact subgradient algorithm for Equilibrium Problems

An inexact subgradient algorithm for Equilibrium Problems Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,

More information

arxiv: v1 [math.oc] 23 May 2017

arxiv: v1 [math.oc] 23 May 2017 A DERANDOMIZED ALGORITHM FOR RP-ADMM WITH SYMMETRIC GAUSS-SEIDEL METHOD JINCHAO XU, KAILAI XU, AND YINYU YE arxiv:1705.08389v1 [math.oc] 23 May 2017 Abstract. For multi-block alternating direction method

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

Convergence rate of inexact proximal point methods with relative error criteria for convex optimization

Convergence rate of inexact proximal point methods with relative error criteria for convex optimization Convergence rate of inexact proximal point methods with relative error criteria for convex optimization Renato D. C. Monteiro B. F. Svaiter August, 010 Revised: December 1, 011) Abstract In this paper,

More information

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order

More information

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems O. Kolossoski R. D. C. Monteiro September 18, 2015 (Revised: September 28, 2016) Abstract

More information

On duality theory of conic linear problems

On duality theory of conic linear problems On duality theory of conic linear problems Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 3332-25, USA e-mail: ashapiro@isye.gatech.edu

More information

A Unified Approach to Proximal Algorithms using Bregman Distance

A Unified Approach to Proximal Algorithms using Bregman Distance A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department

More information

Tight Rates and Equivalence Results of Operator Splitting Schemes

Tight Rates and Equivalence Results of Operator Splitting Schemes Tight Rates and Equivalence Results of Operator Splitting Schemes Wotao Yin (UCLA Math) Workshop on Optimization for Modern Computing Joint w Damek Davis and Ming Yan UCLA CAM 14-51, 14-58, and 14-59 1

More information

Convex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE

Convex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE Convex Analysis Notes Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE These are notes from ORIE 6328, Convex Analysis, as taught by Prof. Adrian Lewis at Cornell University in the

More information

Dedicated to Michel Théra in honor of his 70th birthday

Dedicated to Michel Théra in honor of his 70th birthday VARIATIONAL GEOMETRIC APPROACH TO GENERALIZED DIFFERENTIAL AND CONJUGATE CALCULI IN CONVEX ANALYSIS B. S. MORDUKHOVICH 1, N. M. NAM 2, R. B. RECTOR 3 and T. TRAN 4. Dedicated to Michel Théra in honor of

More information

Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control

Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control RTyrrell Rockafellar and Peter R Wolenski Abstract This paper describes some recent results in Hamilton- Jacobi theory

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

More information

arxiv: v7 [math.oc] 22 Feb 2018

arxiv: v7 [math.oc] 22 Feb 2018 A SMOOTH PRIMAL-DUAL OPTIMIZATION FRAMEWORK FOR NONSMOOTH COMPOSITE CONVEX MINIMIZATION QUOC TRAN-DINH, OLIVIER FERCOQ, AND VOLKAN CEVHER arxiv:1507.06243v7 [math.oc] 22 Feb 2018 Abstract. We propose a

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal

More information

Linear Programming Redux

Linear Programming Redux Linear Programming Redux Jim Bremer May 12, 2008 The purpose of these notes is to review the basics of linear programming and the simplex method in a clear, concise, and comprehensive way. The book contains

More information

CAAM 454/554: Stationary Iterative Methods

CAAM 454/554: Stationary Iterative Methods CAAM 454/554: Stationary Iterative Methods Yin Zhang (draft) CAAM, Rice University, Houston, TX 77005 2007, Revised 2010 Abstract Stationary iterative methods for solving systems of linear equations are

More information

Dual methods for the minimization of the total variation

Dual methods for the minimization of the total variation 1 / 30 Dual methods for the minimization of the total variation Rémy Abergel supervisor Lionel Moisan MAP5 - CNRS UMR 8145 Different Learning Seminar, LTCI Thursday 21st April 2016 2 / 30 Plan 1 Introduction

More information

Convex Optimization and Modeling

Convex Optimization and Modeling Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

A Solution Method for Semidefinite Variational Inequality with Coupled Constraints

A Solution Method for Semidefinite Variational Inequality with Coupled Constraints Communications in Mathematics and Applications Volume 4 (2013), Number 1, pp. 39 48 RGN Publications http://www.rgnpublications.com A Solution Method for Semidefinite Variational Inequality with Coupled

More information

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Primal-dual subgradient methods for convex problems

Primal-dual subgradient methods for convex problems Primal-dual subgradient methods for convex problems Yu. Nesterov March 2002, September 2005 (after revision) Abstract In this paper we present a new approach for constructing subgradient schemes for different

More information

Signal Processing and Networks Optimization Part VI: Duality

Signal Processing and Networks Optimization Part VI: Duality Signal Processing and Networks Optimization Part VI: Duality Pierre Borgnat 1, Jean-Christophe Pesquet 2, Nelly Pustelnik 1 1 ENS Lyon Laboratoire de Physique CNRS UMR 5672 pierre.borgnat@ens-lyon.fr,

More information

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems Radu Ioan Boţ Ernö Robert Csetnek August 5, 014 Abstract. In this paper we analyze the

More information

arxiv: v2 [math.oc] 28 Jan 2019

arxiv: v2 [math.oc] 28 Jan 2019 On the Equivalence of Inexact Proximal ALM and ADMM for a Class of Convex Composite Programming Liang Chen Xudong Li Defeng Sun and Kim-Chuan Toh March 28, 2018; Revised on Jan 28, 2019 arxiv:1803.10803v2

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

Your first day at work MATH 806 (Fall 2015)

Your first day at work MATH 806 (Fall 2015) Your first day at work MATH 806 (Fall 2015) 1. Let X be a set (with no particular algebraic structure). A function d : X X R is called a metric on X (and then X is called a metric space) when d satisfies

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

A primal-dual hybrid gradient method for non-linear operators with applications

A primal-dual hybrid gradient method for non-linear operators with applications A primal-dual hybrid gradient method for non-linear operators with applications to MRI Tuomo Valkonen Abstract We study the solution of minimax problems min x max y G(x) + K(x), y F (y) in finite-dimensional

More information

c 2013 Society for Industrial and Applied Mathematics

c 2013 Society for Industrial and Applied Mathematics SIAM J. OPTIM. Vol. 3, No., pp. 109 115 c 013 Society for Industrial and Applied Mathematics AN ACCELERATED HYBRID PROXIMAL EXTRAGRADIENT METHOD FOR CONVEX OPTIMIZATION AND ITS IMPLICATIONS TO SECOND-ORDER

More information

Sparse Optimization Lecture: Dual Methods, Part I

Sparse Optimization Lecture: Dual Methods, Part I Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration

More information

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 36 Why proximal? Newton s method: for C 2 -smooth, unconstrained problems allow

More information

The Subdifferential of Convex Deviation Measures and Risk Functions

The Subdifferential of Convex Deviation Measures and Risk Functions The Subdifferential of Convex Deviation Measures and Risk Functions Nicole Lorenz Gert Wanka In this paper we give subdifferential formulas of some convex deviation measures using their conjugate functions

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Abstract This paper presents an accelerated

More information

Linear convergence of iterative soft-thresholding

Linear convergence of iterative soft-thresholding arxiv:0709.1598v3 [math.fa] 11 Dec 007 Linear convergence of iterative soft-thresholding Kristian Bredies and Dirk A. Lorenz ABSTRACT. In this article, the convergence of the often used iterative softthresholding

More information

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions Angelia Nedić and Asuman Ozdaglar April 15, 2006 Abstract We provide a unifying geometric framework for the

More information

An Efficient Inexact Symmetric Gauss-Seidel Based Majorized ADMM for High-Dimensional Convex Composite Conic Programming

An Efficient Inexact Symmetric Gauss-Seidel Based Majorized ADMM for High-Dimensional Convex Composite Conic Programming Journal manuscript No. (will be inserted by the editor) An Efficient Inexact Symmetric Gauss-Seidel Based Majorized ADMM for High-Dimensional Convex Composite Conic Programming Liang Chen Defeng Sun Kim-Chuan

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

More information

An Infeasible Interior-Point Algorithm with full-newton Step for Linear Optimization

An Infeasible Interior-Point Algorithm with full-newton Step for Linear Optimization An Infeasible Interior-Point Algorithm with full-newton Step for Linear Optimization H. Mansouri M. Zangiabadi Y. Bai C. Roos Department of Mathematical Science, Shahrekord University, P.O. Box 115, Shahrekord,

More information

Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions

Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions Olivier Fercoq and Pascal Bianchi Problem Minimize the convex function

More information

Distributed Optimization via Alternating Direction Method of Multipliers

Distributed Optimization via Alternating Direction Method of Multipliers Distributed Optimization via Alternating Direction Method of Multipliers Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato Stanford University ITMANET, Stanford, January 2011 Outline precursors dual decomposition

More information

Radius Theorems for Monotone Mappings

Radius Theorems for Monotone Mappings Radius Theorems for Monotone Mappings A. L. Dontchev, A. Eberhard and R. T. Rockafellar Abstract. For a Hilbert space X and a mapping F : X X (potentially set-valued) that is maximal monotone locally around

More information

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction J. Korean Math. Soc. 38 (2001), No. 3, pp. 683 695 ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE Sangho Kum and Gue Myung Lee Abstract. In this paper we are concerned with theoretical properties

More information

An inexact block-decomposition method for extra large-scale conic semidefinite programming

An inexact block-decomposition method for extra large-scale conic semidefinite programming An inexact block-decomposition method for extra large-scale conic semidefinite programming Renato D.C Monteiro Camilo Ortiz Benar F. Svaiter December 9, 2013 Abstract In this paper, we present an inexact

More information

Optimization for Communications and Networks. Poompat Saengudomlert. Session 4 Duality and Lagrange Multipliers

Optimization for Communications and Networks. Poompat Saengudomlert. Session 4 Duality and Lagrange Multipliers Optimization for Communications and Networks Poompat Saengudomlert Session 4 Duality and Lagrange Multipliers P Saengudomlert (2015) Optimization Session 4 1 / 14 24 Dual Problems Consider a primal convex

More information

On proximal-like methods for equilibrium programming

On proximal-like methods for equilibrium programming On proximal-lie methods for equilibrium programming Nils Langenberg Department of Mathematics, University of Trier 54286 Trier, Germany, langenberg@uni-trier.de Abstract In [?] Flam and Antipin discussed

More information

Convergence Theorems of Approximate Proximal Point Algorithm for Zeroes of Maximal Monotone Operators in Hilbert Spaces 1

Convergence Theorems of Approximate Proximal Point Algorithm for Zeroes of Maximal Monotone Operators in Hilbert Spaces 1 Int. Journal of Math. Analysis, Vol. 1, 2007, no. 4, 175-186 Convergence Theorems of Approximate Proximal Point Algorithm for Zeroes of Maximal Monotone Operators in Hilbert Spaces 1 Haiyun Zhou Institute

More information

1 Introduction and preliminaries

1 Introduction and preliminaries Proximal Methods for a Class of Relaxed Nonlinear Variational Inclusions Abdellatif Moudafi Université des Antilles et de la Guyane, Grimaag B.P. 7209, 97275 Schoelcher, Martinique abdellatif.moudafi@martinique.univ-ag.fr

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

More information

COURSE Iterative methods for solving linear systems

COURSE Iterative methods for solving linear systems COURSE 0 4.3. Iterative methods for solving linear systems Because of round-off errors, direct methods become less efficient than iterative methods for large systems (>00 000 variables). An iterative scheme

More information

FIXED POINT ITERATIONS

FIXED POINT ITERATIONS FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

More information

In applications, we encounter many constrained optimization problems. Examples Basis pursuit: exact sparse recovery problem

In applications, we encounter many constrained optimization problems. Examples Basis pursuit: exact sparse recovery problem 1 Conve Analsis Main references: Vandenberghe UCLA): EECS236C - Optimiation methods for large scale sstems, http://www.seas.ucla.edu/ vandenbe/ee236c.html Parikh and Bod, Proimal algorithms, slides and

More information

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems Optimization Methods and Software ISSN: 1055-6788 (Print) 1029-4937 (Online) Journal homepage: http://www.tandfonline.com/loi/goms20 An accelerated non-euclidean hybrid proximal extragradient-type algorithm

More information

ADMM for monotone operators: convergence analysis and rates

ADMM for monotone operators: convergence analysis and rates ADMM for monotone operators: convergence analysis and rates Radu Ioan Boţ Ernö Robert Csetne May 4, 07 Abstract. We propose in this paper a unifying scheme for several algorithms from the literature dedicated

More information

On the acceleration of the double smoothing technique for unconstrained convex optimization problems

On the acceleration of the double smoothing technique for unconstrained convex optimization problems On the acceleration of the double smoothing technique for unconstrained convex optimization problems Radu Ioan Boţ Christopher Hendrich October 10, 01 Abstract. In this article we investigate the possibilities

More information

Convex and Nonconvex Optimization Techniques for the Constrained Fermat-Torricelli Problem

Convex and Nonconvex Optimization Techniques for the Constrained Fermat-Torricelli Problem Portland State University PDXScholar University Honors Theses University Honors College 2016 Convex and Nonconvex Optimization Techniques for the Constrained Fermat-Torricelli Problem Nathan Lawrence Portland

More information

This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization

This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization x = prox_f(x)+prox_{f^*}(x) use to get prox of norms! PROXIMAL METHODS WHY PROXIMAL METHODS Smooth

More information