A first-order primal-dual algorithm with linesearch


A first-order primal-dual algorithm with linesearch

Yura Malitsky, Thomas Pock

arXiv v2 [math.OC] 23 Mar 2018

Abstract. The paper proposes a linesearch for a primal-dual method. Each iteration of the linesearch requires an update of only the dual (or primal) variable. For many problems, in particular for regularized least squares, the linesearch does not require any additional matrix-vector multiplications. We prove convergence of the proposed method under standard assumptions. We also show an ergodic O(1/N) rate of convergence for our method. In case one or both of the prox-functions are strongly convex, we modify our basic method to obtain a better convergence rate. Finally, we propose a linesearch for a saddle-point problem with an additional smooth term. Several numerical experiments confirm the efficiency of the proposed methods.

2010 Mathematics Subject Classification: 49M29, 65K10, 65Y20, 90C25

Keywords: saddle-point problems, first-order algorithms, primal-dual algorithms, linesearch, convergence rates, backtracking

In this work we propose a linesearch procedure for the primal-dual algorithm (PDA) that was introduced in [4]. It is a simple first-order method that is widely used for solving nonsmooth composite minimization problems. Recently, connections of PDA with proximal point algorithms [12] and with ADMM [6] were established. Some generalizations of the method were considered in [5, 7, 10]. A survey of possible applications of the algorithm can be found in [6, 13].

The basic form of PDA uses fixed step sizes during all iterations. This has several drawbacks. First, we have to compute the norm of the operator, which may be quite expensive for large-scale dense matrices. Second, even if we know this norm, one can often use larger steps, which usually yields faster convergence. As a remedy for the first issue one can use a diagonal preconditioner [16], but there is still no strong evidence that such a preconditioner improves, or at least does not worsen, the speed of convergence of PDA. Regarding the second issue, as we will see in our experiments, the speed improvement gained by using the linesearch can sometimes be significant.

Our proposed analysis of PDA exploits the idea of the recent works [14, 15], where several algorithms are proposed for solving a monotone variational inequality. Those algorithms are different from PDA; however, they also use a similar extrapolation step. Although our analysis of the primal-dual method is not as elegant as, for example, that in [12], it gives a simple and cheap way to incorporate a linesearch for defining the step sizes. Each inner iteration of the linesearch requires updating only the dual variables. Moreover,

Institute for Computer Graphics and Vision, Graz University of Technology, 8010 Graz, Austria. y.malitsky@gmail.com, pock@icg.tugraz.at

the step sizes may increase from iteration to iteration. We prove the convergence of the algorithm under quite general assumptions. We also show that in many important cases PDAL (the primal-dual algorithm with linesearch) preserves the per-iteration complexity of PDA. In particular, our method, applied to regularized least-squares problems, uses the same number of matrix-vector multiplications per iteration as the forward-backward method or FISTA [3] (both with fixed step size), but it does not require knowledge of the matrix norm and, in addition, uses adaptive steps.

For the case when the primal or the dual objective is strongly convex, we modify our linesearch procedure in order to construct accelerated versions of PDA. This is done in a similar way as in [4]. The obtained algorithms share the same per-iteration complexity as PDAL, but in many cases they substantially outperform PDA and PDAL. We also consider a more general primal-dual problem which involves an additional smooth function with Lipschitz-continuous gradient (see [7]). For this case we generalize our linesearch so that knowledge of that Lipschitz constant is not needed.

The authors in [9, 10] also proposed a linesearch for the primal-dual method, with the goal of varying the ratio between the primal and dual steps such that the primal and dual residuals remain roughly of the same size. The same idea was used in [11] for the ADMM method. However, we should highlight that this principle is just a heuristic, as it is not clear that it in fact improves the speed of convergence. The linesearch proposed in [9, 10] requires an update of both primal and dual variables, which may make the algorithm much more expensive than the basic PDA. Also, the authors proved convergence of the iterates only in the case when one of the sequences (x^k) or (y^k) is bounded. Although this is often the case, there are many problems which cannot be encompassed by this assumption. Finally, it is not clear how to derive accelerated versions of that algorithm. As a byproduct of our analysis, we show how one can use a variable ratio between the primal and dual steps and under which circumstances we can guarantee convergence. However, it was not the purpose of this paper to develop new strategies for varying such a ratio during the iterations.

The paper is organized as follows. In the next section we introduce the notation and recall some useful facts. Section 2 presents our basic primal-dual algorithm with linesearch. We prove its convergence, establish its ergodic convergence rate, and consider some particular examples of how the linesearch works. In Section 3 we propose accelerated versions of PDAL under the assumption that the primal or the dual problem is strongly convex. Section 4 deals with more general saddle-point problems which involve an additional smooth function. In Section 5 we illustrate the efficiency of our methods on several typical problems.

1 Preliminaries

Let X, Y be two finite-dimensional real vector spaces equipped with an inner product ⟨·,·⟩ and a norm ‖·‖ = √⟨·,·⟩. We are focusing on the following problem:

    min_{x ∈ X} max_{y ∈ Y} ⟨Kx, y⟩ + g(x) − f*(y),   (1)

where

K : X → Y is a bounded linear operator with operator norm L = ‖K‖; g : X → (−∞, +∞] and f* : Y → (−∞, +∞] are proper lower semicontinuous (l.s.c.) convex functions; and problem (1) has a saddle point. Note that f* denotes the Legendre–Fenchel conjugate of a convex l.s.c. function f. By a slight (but common) abuse of notation, we write K* to denote the adjoint of the operator K.

Under the above assumptions, (1) is equivalent to the primal problem

    min_{x ∈ X} f(Kx) + g(x)   (2)

and the dual problem

    min_{y ∈ Y} f*(y) + g*(−K*y).   (3)

Recall that for a proper l.s.c. convex function h : X → (−∞, +∞] the proximal operator prox_h is defined as

    prox_h : X → X : x ↦ argmin_z { h(z) + ½‖z − x‖² }.

The following important characteristic property of the proximal operator is well known:

    x̄ = prox_h x  ⟺  ⟨x̄ − x, y − x̄⟩ ≥ h(x̄) − h(y)  for all y ∈ X.   (4)

We will often use the following identity (cosine rule):

    2⟨a − b, c − a⟩ = ‖b − c‖² − ‖a − b‖² − ‖a − c‖²  for all a, b, c ∈ X.   (5)

Let (x̂, ŷ) be a saddle point of problem (1). Then, by the definition of a saddle point, we have

    P_{x̂,ŷ}(x) := g(x) − g(x̂) + ⟨K*ŷ, x − x̂⟩ ≥ 0  for all x ∈ X,   (6)
    D_{x̂,ŷ}(y) := f*(y) − f*(ŷ) − ⟨Kx̂, y − ŷ⟩ ≥ 0  for all y ∈ Y.   (7)

The expression G_{x̂,ŷ}(x, y) = P_{x̂,ŷ}(x) + D_{x̂,ŷ}(y) is known as a primal-dual gap. In cases where it is clear which saddle point is considered, we will omit the subscripts in P, D, and G. It is also important to highlight that for a fixed (x̂, ŷ) the functions P(·), D(·), and G(·,·) are convex.

Consider the original primal-dual method:

    y^{k+1} = prox_{σ f*}(y^k + σ K x̄^k),
    x^{k+1} = prox_{τ g}(x^k − τ K* y^{k+1}),
    x̄^{k+1} = x^{k+1} + θ (x^{k+1} − x^k).

In [4] its convergence was proved under the assumptions θ = 1, τ, σ > 0, and τσL² < 1. In the next section we will show how to incorporate a linesearch into this method.
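To make the scheme above concrete, here is a short Python sketch of the fixed-step PDA for problem (1). It is only a sketch under our own naming: prox_g(v, t) and prox_fs(v, s) stand for user-supplied proximal operators of g and f*, K is a matrix, and the step sizes are assumed to satisfy τσ‖K‖² < 1.

import numpy as np

def pda(K, prox_g, prox_fs, x0, y0, tau, sigma, theta=1.0, iters=100):
    # Fixed-step primal-dual algorithm for min_x max_y <Kx, y> + g(x) - f*(y).
    x, y = x0.copy(), y0.copy()
    x_bar = x.copy()
    for _ in range(iters):
        y = prox_fs(y + sigma * (K @ x_bar), sigma)   # dual update
        x_new = prox_g(x - tau * (K.T @ y), tau)      # primal update
        x_bar = x_new + theta * (x_new - x)           # extrapolation
        x = x_new
    return x, y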

Algorithm 1 Primal-dual algorithm with linesearch

Initialization: choose x^0 ∈ X, y^1 ∈ Y, τ_0 > 0, μ ∈ (0, 1), δ ∈ (0, 1), and β > 0. Set θ_0 = 1.
Main iteration:
1. Compute
       x^k = prox_{τ_{k−1} g}(x^{k−1} − τ_{k−1} K* y^k).
2. Choose any τ_k ∈ [τ_{k−1}, τ_{k−1} √(1 + θ_{k−1})] and run
   Linesearch:
   2.a. Compute
        θ_k = τ_k / τ_{k−1},
        x̄^k = x^k + θ_k (x^k − x^{k−1}),
        y^{k+1} = prox_{β τ_k f*}(y^k + β τ_k K x̄^k).
   2.b. Break the linesearch if
        √β τ_k ‖K* y^{k+1} − K* y^k‖ ≤ δ ‖y^{k+1} − y^k‖.   (8)
        Otherwise, set τ_k := τ_k μ and go to 2.a.
   End of linesearch

2 Linesearch

The primal-dual algorithm with linesearch (PDAL) is summarized in Algorithm 1. Given all information from the current iterate, x^k, y^k, τ_{k−1}, and θ_{k−1}, we first choose some trial step τ_k ∈ [τ_{k−1}, τ_{k−1} √(1 + θ_{k−1})]; then during every iteration of the linesearch it is decreased by the factor μ ∈ (0, 1). At the end of the linesearch we obtain a new iterate y^{k+1} and a step size τ_k which will be used to compute the next iterate x^{k+1}. In Algorithm 1 there are two opposite options: always start the linesearch from the largest possible step τ_k = τ_{k−1} √(1 + θ_{k−1}) or, on the contrary, never increase τ_k. Step 2 also allows us to choose any compromise between them.

Note that in PDAL the parameter β plays the role of the ratio σ/τ between the fixed steps in PDA. Thus, we can rewrite the stopping criterion (8) as

    τ_k σ_k ‖K* y^{k+1} − K* y^k‖² ≤ δ² ‖y^{k+1} − y^k‖²,  where σ_k = β τ_k.

Of course, in PDAL we can always choose fixed steps τ_k, σ_k with τ_k σ_k ≤ δ²/L² and set θ_k = 1. In this case PDAL coincides with PDA, though our proposed analysis seems to be new. The parameter δ is mainly used for theoretical purposes: in order to be able to control ‖y^{k+1} − y^k‖², we need δ < 1. In the experiments δ should be chosen very close to 1.

Remark 1. It is clear that each iteration of the linesearch requires a computation of prox_{β τ_k f*}(·) and K* y^{k+1}. Since in problem (1) we can always exchange the primal and dual variables, it makes sense to choose as the dual variable in PDAL the one for which the respective prox-operator is simpler to compute. Note that during the linesearch we need to compute Kx^k only once and then use that K x̄^k = (1 + θ_k) Kx^k − θ_k Kx^{k−1}.
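As an illustration of how Algorithm 1 can be organized in code, here is a minimal Python sketch of one PDAL iteration. It is a sketch under our own naming: prox_g(v, t) and prox_fs(v, s) are user-supplied proximal operators of g and f*, and the caching of matrix-vector products mentioned in Remark 1 is only indicated in the comments.

import numpy as np

def pdal_step(K, prox_g, prox_fs, x, y, tau_prev, theta_prev,
              beta=1.0, mu=0.7, delta=0.99):
    # One iteration of PDAL: x plays the role of x^{k-1}, y of y^k.
    # Returns (x^k, y^{k+1}, tau_k, theta_k).
    # Step 1: primal update with the step size of the previous iteration.
    x_new = prox_g(x - tau_prev * (K.T @ y), tau_prev)
    # Step 2: trial step; here we always try the largest allowed value.
    tau = tau_prev * np.sqrt(1.0 + theta_prev)
    Kx, Kx_new = K @ x, K @ x_new   # K x^{k-1} could be cached from the previous iteration
    Kty = K.T @ y                   # likewise reusable
    while True:
        theta = tau / tau_prev
        Kx_bar = (1.0 + theta) * Kx_new - theta * Kx   # = K(x^k + theta*(x^k - x^{k-1}))
        y_new = prox_fs(y + beta * tau * Kx_bar, beta * tau)
        Kty_new = K.T @ y_new
        # breaking condition (8)
        if np.sqrt(beta) * tau * np.linalg.norm(Kty_new - Kty) <= delta * np.linalg.norm(y_new - y):
            return x_new, y_new, tau, theta
        tau *= mu   # shrink the step and redo only the dual update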

Remark 2. Note that when prox_{σ f*} is a linear (or, more generally, affine) operator, the linesearch becomes extremely simple: it does not require any additional matrix-vector multiplications. We itemize some examples below.

1. f*(y) = ⟨c, y⟩. Then it is easy to verify that prox_{σ f*} u = u − σc. Thus, we have
       y^{k+1} = prox_{σ_k f*}(y^k + σ_k K x̄^k) = y^k + σ_k K x̄^k − σ_k c,
       K* y^{k+1} = K* y^k + σ_k (K*K x̄^k − K*c).

2. f*(y) = ½‖y − b‖². Then prox_{σ f*} u = (u + σb)/(1 + σ), and we obtain
       y^{k+1} = prox_{σ_k f*}(y^k + σ_k K x̄^k) = (y^k + σ_k (K x̄^k + b))/(1 + σ_k),
       K* y^{k+1} = (K* y^k + σ_k (K*K x̄^k + K*b))/(1 + σ_k).

3. f*(y) = δ_H(y), the indicator function of the hyperplane H = {u : ⟨u, a⟩ = b}. Then prox_{σ f*} u = P_H u = u + ((b − ⟨u, a⟩)/‖a‖²) a, and hence
       y^{k+1} = prox_{σ_k f*}(y^k + σ_k K x̄^k) = y^k + σ_k K x̄^k + ((b − ⟨a, y^k + σ_k K x̄^k⟩)/‖a‖²) a,
       K* y^{k+1} = K* y^k + σ_k K*K x̄^k + ((b − ⟨a, y^k + σ_k K x̄^k⟩)/‖a‖²) K*a.

Evidently, each iteration of PDAL in all of the cases above requires only two matrix-vector multiplications. Indeed, before the linesearch starts we have to compute Kx^k and K*Kx^k, and then during the linesearch we should use the relations

    K x̄^k = (1 + θ_k) Kx^k − θ_k Kx^{k−1},
    K*K x̄^k = (1 + θ_k) K*Kx^k − θ_k K*Kx^{k−1},

where Kx^{k−1} and K*Kx^{k−1} should be reused from the previous iteration. All other operations are comparably cheap and, hence, the costs per iteration of PDA and PDAL are almost the same. However, this might not be true if the matrix K is very sparse. For example, for a finite-differences operator the cost of a matrix-vector multiplication is comparable to the cost of a few vector-vector additions.

One simple implication of these facts is that for the regularized least-squares problem

    min_x ½‖Ax − b‖² + g(x),   (9)

our method does not require knowledge of ‖A‖, but at the same time it does not require any additional matrix-vector multiplications, as would be the case for standard first-order methods with backtracking (e.g., the proximal gradient method or FISTA). In fact, we can rewrite (9) as

    min_x max_y g(x) + ⟨Ax, y⟩ − ½‖y + b‖².   (10)

Here f*(y) = ½‖y + b‖², and hence we are in the situation where the operator prox_{f*} is affine.
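For the regularized least-squares setting (9)-(10) the remark can be turned into code directly. The sketch below (our own naming; prox_g(v, t) is the user-supplied prox of g) performs one PDAL iteration using only the two matrix-vector products A x^k and A^T(A x^k) computed before the inner loop; inside the loop every quantity is obtained by cheap vector operations, because the prox of f*(y) = ½‖y + b‖² is affine.

import numpy as np

def pdal_ls_step(A, b, prox_g, x, y, ATy, cache, tau_prev, theta_prev,
                 beta=1.0, mu=0.7, delta=0.99):
    # One PDAL iteration for min_x 0.5*||Ax - b||^2 + g(x) (sketch).
    # cache = (A @ x, A.T @ (A @ x)) from the previous iteration, ATy = A.T @ y.
    ATb = A.T @ b                                  # could also be precomputed once
    x_new = prox_g(x - tau_prev * ATy, tau_prev)   # primal step
    Ax_new = A @ x_new                             # 1st matrix-vector product
    ATAx_new = A.T @ Ax_new                        # 2nd matrix-vector product
    Ax_prev, ATAx_prev = cache
    tau = tau_prev * np.sqrt(1.0 + theta_prev)
    while True:
        theta = tau / tau_prev
        sigma = beta * tau
        Ax_bar = (1 + theta) * Ax_new - theta * Ax_prev
        ATAx_bar = (1 + theta) * ATAx_new - theta * ATAx_prev
        y_new = (y + sigma * (Ax_bar - b)) / (1 + sigma)          # affine prox of f*
        ATy_new = (ATy + sigma * (ATAx_bar - ATb)) / (1 + sigma)  # no extra matvec needed
        if np.sqrt(beta) * tau * np.linalg.norm(ATy_new - ATy) <= delta * np.linalg.norm(y_new - y):
            return x_new, y_new, ATy_new, (Ax_new, ATAx_new), tau, theta
        tau *= mu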

By construction of the algorithm we simply have the following.

Lemma 1. (i) The linesearch in PDAL always terminates.
(ii) There exists τ > 0 such that τ_k > τ for all k ≥ 0.
(iii) There exists θ > 0 such that θ_k ≤ θ for all k ≥ 0.

Proof. (i) In each iteration of the linesearch, τ_k is multiplied by the factor μ < 1. Since (8) is satisfied for any τ_k ≤ δ/(√β L), the inner loop cannot run indefinitely.

(ii) Without loss of generality, assume that τ_0 > δμ/(√β L). Our goal is to show that τ_{k−1} > δμ/(√β L) implies τ_k > δμ/(√β L). Suppose that τ_k = τ_{k−1} √(1 + θ_{k−1}) μ^i for some i ∈ Z_+. If i = 0, then τ_k ≥ τ_{k−1} > δμ/(√β L). If i > 0, then τ_k/μ = τ_{k−1} √(1 + θ_{k−1}) μ^{i−1} does not satisfy (8). Thus, τ_k/μ > δ/(√β L) and hence τ_k > δμ/(√β L).

(iii) From τ_k ≤ τ_{k−1} √(1 + θ_{k−1}) and θ_k = τ_k/τ_{k−1} it follows that θ_k ≤ √(1 + θ_{k−1}). From this it can easily be concluded that θ_k ≤ (1 + √5)/2 for all k ∈ Z_+. In fact, assume the contrary, and let r be the smallest index such that θ_r > (1 + √5)/2. Since θ_0 = 1, we have r ≥ 1, and hence θ_{r−1} ≥ θ_r² − 1 > (1 + √5)/2. This yields a contradiction.

One might be interested in how many iterations the linesearch needs in order to terminate. Of course, we can give only a priori upper bounds. Assume that (τ_k) is bounded from above by τ_max. Consider in step 2 of Algorithm 1 the two opposite ways of choosing the trial step τ_k: maximally increase it, that is, always set τ_k = τ_{k−1} √(1 + θ_{k−1}), or never increase it and set τ_k = τ_{k−1}. In the former case, after i iterations of the linesearch we have τ_k ≤ τ_{k−1} √(1 + θ_{k−1}) μ^i ≤ τ_max (1 + √5)/2 · μ^i, where we have used the bound from Lemma 1(iii). Hence, if i ≥ log_μ (2δ/(√β L (1 + √5) τ_max)), then τ_k ≤ δ/(√β L) and the linesearch procedure must terminate. Similarly, in the latter case, after at most 1 + log_μ (δ/(√β L τ_max)) iterations the linesearch stops. Moreover, because in this case (τ_k) is nonincreasing, this number is an upper bound for the total number of linesearch iterations.

The following is our main convergence result.

Theorem 1. Let (x^k, y^k) be a sequence generated by PDAL. Then it is a bounded sequence in X × Y and all its cluster points are solutions of (1). Moreover, if g restricted to dom g is continuous and (τ_k) is bounded from above, then the whole sequence (x^k, y^k) converges to a solution of (1).

The condition that g restricted to dom g be continuous is not restrictive: it holds for any g with open dom g (this includes all finite-valued functions) or for the indicator δ_C of any closed convex set C. It also holds for any separable convex l.s.c. function (Corollary 9.5, [2]). The boundedness of (τ_k) from above is rather a theoretical requirement: clearly, we can easily enforce such a bound in PDAL.

7 Proof. Let (ˆx, ŷ) be any saddle point of (). From (4) it follows that x k+ x k + τ k K y k+, ˆx x k+ τ k (g(x k+ ) g(ˆx)) () β (yk+ y k ) τ k K x k, ŷ y k+ τ k (f (y k+ ) f (ŷ)). (2) Since x k = prox τk g(x k τ k K y k ), we have again by (4) that for all x X x k x k + τ k K y k, x x k τ k (g(x k ) g(x)). After substitution in the last inequality x = x k+ and x = x k we get x k x k + τ k K y k, x k+ x k τ k (g(x k ) g(x k+ )), (3) x k x k + τ k K y k, x k x k τ k (g(x k ) g(x k )). (4) Adding (3), multiplied by θ k = τ k τ k, and (4), multiplied by θ 2 k, we obtain x k x k + τ k K y k, x k+ x k τ k (( + θ k )g(x k ) g(x k+ ) θ k g(x k )), (5) where we have used that x k = x k + θ k (x k x k ). Consider the following identity: τ k K y k+ K ŷ, x k ˆx τ k K x k K ˆx, y k+ ŷ = 0. (6) Summing (), (2), (5), and (6), we get x k+ x k, ˆx x k+ + β yk+ y k, ŷ y k+ + x k x k, x k+ x k + τ k K y k+ K y k, x k x k+ τ k K y, x k ˆx + τ k K ˆx, y k+ y τ k ( f (y k+ ) f (ŷ) + ( + θ k )g(x k ) θ k g(x k ) g(ˆx) ). (7) Using that and f (y k+ ) f (ŷ) K ˆx, y k+ ŷ = Dˆx,ŷ (y k+ ) (8) (+θ k )g(x k ) θ k g(x k ) g(ˆx) + K y, x k ˆx = ( + θ k ) ( g(x k ) g(ˆx) + K ŷ, x k ˆx ) θ k ( g(x k ) g(ˆx) + K ŷ, x k ˆx ) = ( + θ k )Pˆx,ŷ (x k ) θ k Pˆx,ŷ (x k ), (9) we can rewrite (7) as (we will not henceforth write the subscripts for P and D unless it is unclear) x k+ x k, ˆx x k+ + β yk+ y k, ŷ y k+ + x k x k, x k+ x k + τ k K y k+ K y k, x k x k+ τ k (( + θ k )P (x k ) θ k P (x k ) + D(y k+ )). (20) 7

8 Let ε k denotes the right-hand side of (20). Using the cosine rule for every item in the first line of (20), we obtain 2 ( xk ˆx 2 x k+ ˆx 2 x k+ x k 2 ) + 2β ( yk ŷ 2 y k+ ŷ 2 y k+ y k 2 ) + 2 ( xk+ x k 2 x k x k 2 x k+ x k 2 ) By (8), Cauchy Schwarz, and Cauchy s inequalities we get τ k K y k+ K y k, x k x k+ from which we derive that + τ k K y k+ K y k, x k x k+ ε k. (2) δ β x k+ x k y k+ y k 2 xk+ x k 2 + δ2 2β yk+ y k 2, 2 ( xk ˆx 2 x k+ ˆx 2 ) + 2β ( yk ŷ 2 y k+ ŷ 2 ) 2 xk x k 2 δ2 2β yk+ y k 2 ε k. (22) Since (ˆx, ŷ) is a saddle point, D(y k ) 0 and P (x k ) 0 and hence (22) yields 2 ( xk ˆx 2 x k+ ˆx 2 ) + 2β ( yk ŷ 2 y k+ ŷ 2 ) 2 xk x k 2 δ2 2β yk+ y k 2 τ k (( + θ k )P (x k ) θ k P (x k )) (23) or, taking into account θ k τ k ( + θ k )τ k, 2 xk+ ˆx 2 + 2β yk+ ŷ 2 + τ k ( + θ k )P (x k ) 2 xk ˆx 2 + 2β yk ŷ 2 + τ k ( + θ k )P (x k ) 2 xk x k 2 δ2 2β yk+ y k 2 (24) From this we deduce that (x k ), (y k ) are bounded sequences and lim k x k x k = 0, lim k y k y k = 0. Also notice that x k+ x k τ k = xk+ x k+ τ k+ 0 as k, 8

9 where the latter holds because τ k is separated from 0 by Lemma. Let (x k i, y k i ) be a subsequence that converges to some cluster point (x, y ). Passing to the limit in (x ki+ x k i ) + K y k i, x x ki+ g(x ki+ ) g(x) τ ki x X, (y ki+ y k i ) K x k i, y y ki+ f (y ki+ ) f (y) βτ ki y Y, we obtain that (x, y ) is a saddle point of (). When g dom g is continuous, g(x k i ) g(x ), and hence, P x,y (xk i ) 0. From (24) it follows that the sequence a k = 2 xk+ x 2 + 2β yk+ y 2 + τ k ( + θ k )P x,y (xk ) is monotone. Taking into account boundedness of (τ k ) and (θ k ), we obtain which means that x k x, y k y. lim a k = lim a ki = 0, k i Theorem 2 (ergodic convergence). Let (x k, y k ) be a sequence generated by PDAL and (ˆx, ŷ) be any saddle point of (). Then for the ergodic sequence (X N, Y N ) it holds that Gˆx,ŷ (X N, Y N ) s N ( 2 x ˆx 2 + ) 2β y ŷ 2 + τ θ Pˆx,ŷ (x 0 ), where s N = N k= τ k, X N = τ θ x 0 + N k= τ k x k τ θ + s N, Y N = Proof. Summing up (22) from k = to N, we get Nk= τ k y k s N. 2 ( x ˆx 2 x N+ ˆx 2 ) + 2β ( y ŷ 2 y N+ ŷ 2 ) The right-hand side in (25) can be expressed as By convexity of P, ε k = τ N ( + θ N )P (x N ) + [( + θ k )τ k θ k τ k )]P (x k ) k= k=2 θ τ P (x 0 ) + τ k D(y k+ ) k= ε k (25) k= τ N ( + θ N )P (x N )+ [( + θ k )τ k θ k τ k )]P (x k ) (26) k=2 (τ θ + s N )P ( τ ( + θ )x + N k=2 τ k x k ) (27) τ θ + s N = (τ θ + s N )P ( τ θ x 0 + N k= τ k x k τ θ + s N ) s N P (X N ), (28) 9

10 where s N = N k= τ k. Similarly, Hence, and we conclude k= G(X N, Y N ) = P (X N ) + D(Y N ) s N Nk= τ k D(y k τ k y k ) s N D( ) = s N D(Y N ). (29) s N ε k s N (P (X N ) + D(Y N )) τ θ P (x 0 ) k= ( 2 x ˆx 2 + ) 2β y ŷ 2 + τ θ P (x 0 ). Clearly, we have the same O(/N) rate of convergence as in [4, 5, 9], though with the ergodic sequence (X N, Y N ) defined in a different way. Analysing the proof of Theorems and 2, the reader may find out that our proof does not rely on the proximal interpretation of the PDA [2]. An obvious shortcoming of our approach is that deriving new extensions of the proposed method, like inertial or relaxed versions [5], still requires some nontrivial efforts. It would be interesting to obtain a general approach for the proposed method and its possible extensions. It is well known that in many cases the speed of convergence of PDA crucially depends on the ratio between primal and dual steps β = σ. Motivated by this, paper [0] proposed τ an adaptive strategy for how to choose β in every iteration. Although it is not the goal of this work to study the strategies for defining β, we show that the analysis of PDAL allows us to incorporate such strategies in a very natural way. Theorem 3. Let (β k ) (β min, β max ) be a monotone sequence with β min, β max > 0 and (x k, y k ) be a sequence generated by PDAL with variable (β k ). Then the statement of Theorem holds. Proof. Let (β k ) be nondecreasing. Then using β k β k, we get from (24) that 2 xk+ ˆx 2 + y k+ ŷ 2 + τ k ( + θ k )P (x k ) 2β k 2 xk ˆx 2 + y k ŷ 2 + τ k ( + θ k )P (x k ) 2β k 2 xk x k 2 δ2 2β k y k+ y k 2. (30) Since (β k ) is bounded from above, the conclusion in Theorem simply follows. If (β k ) is decreasing, then the above arguments should be modified. Consider (23), multiplied by β k, β k 2 ( xk ˆx 2 x k+ ˆx 2 ) + 2 ( yk ŷ 2 y k+ ŷ 2 ) β k 2 xk x k 2 δ2 y k+ y k 2 β k τ k (( + θ k )P (x k ) θ k P (x k )). (3) 2 0

11 As β k < β k, we have θ k β k τ k ( + θ k )β k τ k, which in turn implies β k 2 xk+ ˆx yk+ ŷ 2 + τ k β k ( + θ k )P (x k ) β k 2 xk ˆx yk ŷ 2 + τ k β k ( + θ k )P (x k ) Due to the given properties of (β k ), the rest is trivial. β k 2 xk x k 2 δ2 y k+ y k 2. (32) 2 It is natural to ask if it is possible to use nonmonotone (β k ). To answer this question, we can easily use the strategies from [], where an ADMM with variable steps was proposed. One way is to use any β k during a finite number of iterations and then switch to monotone (β k ). Another way is to relax the monotonicity of (β k ) to the following: there exists (ρ k ) R + such that: k ρ k < and β k β k ( + ρ k ) k N or β k β k ( + ρ k ) k N. We believe that it should be quite straightforward to prove convergence of PDAL with the latter strategy. 3 Acceleration It has been shown [4] that in the case when g or f is strongly convex, one can modify the primal-dual algorithm and derive a better convergence rate. We show that the same holds for PDAL. The main difference of the accelerated variant APDAL from the basic PDAL is that now we have to vary β in every iteration. Of course due to the symmetry of the primal and dual variables in (), we can always assume that the primal objective is strongly convex. However, from the computational point of view it might not be desirable to exchange g and f in the PDAL because of Remark 2. Therefore, we discuss the two cases separately. Also notice that both accelerated algorithms below coincide with PDAL when the parameter of strong convexity γ = g is strongly convex Assume that g is γ strongly convex, i.e., g(x 2 ) g(x ) u, x 2 x + γ 2 x 2 x 2 x, x 2 X, u g(x ). Below we assume that the parameter γ is known. The following algorithm (APDAL) exploits the strong convexity of g: Note that in contrast to PDAL, we set δ =, as in any case we will not be able to prove convergence of (y k ).

12 Algorithm 2 Accelerated primal-dual algorithm with linesearch: g is strongly convex Initialization: Choose x 0 X, y Y, µ (0, ), τ 0 > 0, β 0 > 0. Set θ 0 =. Main iteration:. Compute 2. Choose any τ k [τ βk k β k Linesearch: 2.a. Compute x k = prox τk g(x k τ k K y k ) β k = β k ( + γτ k ), τ βk k β k ( + θ k )] and run τ k θ k = τ k x k = x k + θ k (x k x k ) y k+ = prox βk τ k f (yk + β k τ k K x k ) 2.b. Break linesearch if β k τ k K y k+ K y k y k+ y k (33) Otherwise, set τ k := τ k µ and go to 2.a. End of linesearch Instead of (), now one can use the stronger inequality x k+ x k + τ k K y k+, ˆx x k+ τ k (g(x k+ ) g(ˆx) + γ 2 xk+ ˆx 2 ). (34) In turn, (34) yields a stronger version of (22) (also with β k instead of β): 2 ( xk ˆx 2 x k+ ˆx 2 ) + ( y k ŷ 2 y k+ ŷ 2 ) 2β k or, alternatively, 2 xk ˆx 2 + y k ŷ 2 2β k 2 xk x k 2 Note that the algorithm provides that β k+ β k Then from (36) follows 2 xk x k 2 ε k + γτ k 2 xk+ ˆx 2 (35) ε k + + γτ k x k+ ˆx 2 + β k+ y k+ ŷ 2. (36) 2 β k 2β k+ = + γτ k. For brevity let A k = 2 xk ˆx 2 + 2β k y k ŷ 2. β k+ A k+ + ε k A k β k 2

13 or β k+ A k+ + β k ε k β k A k. Thus, summing the above from k = to N, we get β N+ A N+ + β k ε k β A. (37) k= Using the convexity in the same way as in (26), (29), we obtain β k ε k = β N τ N ( + θ N )P (x N ) + [( + θ k )β k τ k θ k β k τ k )]P (x k ) k= k=2 θ β τ P (x 0 ) + β k τ k D(y k+ ) s N (P (X N ) + D(Y N )) θ σ P (x 0 ), (38) k= where Hence, σ k = β k τ k s N = σ k k= X N = σ θ x 0 + N k= σ k x k Nk= Y N σ k y k+ = σ θ + s N s N β N+ A N+ + s N G(X N, Y N ) β A + θ σ P (x 0 ). (39) From this we deduce that the sequence ( y k ŷ ) is bounded and G(X N, Y N ) s N (β A + θ σ P (x 0 )) x N+ ˆx 2 β N+ (β A + θ σ P (x 0 )). Our next goal is to derive asymptotics for β N and s N. Obviously, (33) holds for any τ k, β k such that τ k β k L. Since in each iteration of the linesearch we decrease τ k by factor of µ, τ k can not be less than µ. Hence, we have β k L µ β k+ = β k ( + γτ k ) β k ( + γ L ) = β k + γµ β k L β k. (40) By induction, one can show that there exists C > 0 such that β k Ck 2 for all k > 0. Then for some constant C > 0 we have x N+ ˆx 2 C (N + ) 2. From (40) it follows that β k+ β k γµ L Ck. As σk = β k+ β k γ, we obtain σ k µ L Ck and thus s N = N k= σ k = O(N 2 ). This means that for some constant C 2 > 0 G(X N, Y N ) C 2 N 2. We have shown the following result: Theorem 4. Let (x k, y k ) be a sequence generated by Algorithm 2. Then x N ˆx = O(/N) and G(X N, Y N ) = O(/N 2 ). 3
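Reading the update rules of Algorithm 2 off the statement above, a Python sketch of one of its iterations could look as follows. This is only our reading (prox_g and prox_fs are user-supplied prox operators of g and f*; the names are ours): β_k grows as β_{k−1}(1 + γτ_{k−1}), the interval for the trial step is rescaled by √(β_{k−1}/β_k), and the breaking condition is used with δ = 1.

import numpy as np

def apdal_step(K, prox_g, prox_fs, x, y, tau_prev, theta_prev, beta_prev,
               gamma, mu=0.7):
    # One iteration of the accelerated PDAL (sketch), g assumed gamma-strongly convex.
    x_new = prox_g(x - tau_prev * (K.T @ y), tau_prev)
    beta = beta_prev * (1.0 + gamma * tau_prev)          # beta_k grows with k
    scale = np.sqrt(beta_prev / beta)
    tau = tau_prev * scale * np.sqrt(1.0 + theta_prev)   # largest allowed trial step
    Kx, Kx_new, Kty = K @ x, K @ x_new, K.T @ y
    while True:
        theta = tau / tau_prev
        Kx_bar = (1.0 + theta) * Kx_new - theta * Kx
        y_new = prox_fs(y + beta * tau * Kx_bar, beta * tau)
        Kty_new = K.T @ y_new
        # breaking condition (33), with delta = 1
        if np.sqrt(beta) * tau * np.linalg.norm(Kty_new - Kty) <= np.linalg.norm(y_new - y):
            return x_new, y_new, tau, theta, beta
        tau *= mu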

14 3.2 f is strongly convex The case when f is γ strongly convex can be treated in a similar way. Algorithm 3 Accelerated primal-dual algorithm with linesearch: f is strongly convex Initialization: Choose x 0 X, y Y, µ (0, ), τ 0 > 0, β 0 > 0. Set θ 0 =. Main iteration:. Compute x k = prox τk g(x k τ k K y k ) β k β k = + γβ k τ k 2. Choose any τ k [τ k, τ k + θk ] and run Linesearch: 2.a. Compute θ k = τ k, σ k = β k τ k τ k x k = x k + θ k (x k x k ) y k+ = prox σk f (yk + σ k K x k ) 2.b. Break linesearch if β k τ k K y k+ K y k y k+ y k Otherwise, set τ k := τ k µ and go to 2.a. End of linesearch Note that again we set δ =. Dividing (22) over τ k and taking into account the strong convexity of f, which has to be used in (2), we deduce ( x k ˆx 2 x k+ ˆx 2 ) + ( y k ŷ 2 y k+ ŷ 2 ) 2τ k 2σ k which can be rewritten as 2τ k x k ˆx 2 + 2σ k y k ŷ 2 2τ k x k x k 2 2τ k x k x k 2 ε k τ k + γ 2 yk+ ŷ 2, (4) ε k + τ k+ x k+ ˆx 2 + σ k+ ( + γσ k ) y k+ ŷ 2, (42) τ k τ k 2τ k+ σ k 2σ k+ Note that by construction of (β k ) in Algorithm 3, we have τ k+ = σ k+ ( + γσ k ). τ k σ k Let A k = 2τ k x k ˆx 2 + 2σ k y k ŷ 2. Then (42) is equivalent to τ k+ A k+ + ε k A k x k x k 2 τ k τ k 2τ k 4

15 or τ k+ A k+ + ε k τ k A k 2 xk x k 2. Finally, summing the above from k = to N, we get τ N+ A N+ + ε k τ A k= 2 x k x k 2 (43) k= From this we conclude that the sequence (x k ) is bounded, lim k x k x k = 0, and G(X N, Y N ) s N (τ A + θ τ P (x 0 )), y N+ ŷ 2 σ N+ τ N+ (τ A + θ τ P (x 0 )) = β N+ (τ A + θ τ P (x 0 )), where X N, Y N, s N are the same as in Theorem 2. Let us turn to the derivation of the asymptotics of (τ N ) and (s N ). Analogously, we have that τ k β µ and thus k L β k+ = β k + γβ k τ k β k + γ µ. βk L Again it is not difficult to show by induction that β k C for some constant C > 0. In k 2 β fact, as φ(β) = is increasing, it is sufficient to show that +γ µ L β β k+ β k + γ µ βk L C k 2 + γ µ L C k 2 C (k + ) 2. The latter inequality is equivalent to C (2 + k ) L, which obviously holds for C large γµ enough (of course C must also satisfy the induction basis). The obtained asymptotics for (β k ) yields τ k µ βk L µk CL, from which we deduce s N = N k= τ k µ CL Nk= k. Finally, we obtain the following result: Theorem 5. Let (x k, y k ) be a sequence generated by Algorithm 3. Then y N ŷ = O(/N) and G(X N, Y N ) = O(/N 2 ). Remark 3. In the case when both g and f are strongly convex, one can derive a new algorithm, combining the ideas of Algorithms 2 and 3 (see [4] for more details). 5

16 4 A more general problem In this section we show how to apply the linesearch procedure to the more general problem min max Kx, y + g(x) f (y) h(y), (44) x X y Y where in addition to the previous assumptions, we suppose that h: Y R is a smooth convex function with L h Lipschitz continuous gradient h. Using the idea of [2], Condat and Vũ in [7, 8] proposed an extension of the primal-dual method to solve (44): y k+ = prox σf (y k + σ(k x k h(y k ))) x k+ = prox τg (x k τk y k+ ) x k+ = 2x k+ x k. This scheme was proved to converge under the condition τσ K 2 σl h. Originally the smooth function was added to the primal part and not to the dual as in (44). For us it is more convenient to consider precisely that form due to the nonsymmetry of the proposed linesearch procedure. However, simply exchanging max and min in (44), we recover the form which was considered in [7]. In addition to the issues related to the operator norm of K which motivated us to derive the PDAL, here we also have to know the Lipschitz constant L h of h. This has several drawbacks. First, its computation might be expensive. Second, our estimation of L h might be very conservative and will result in smaller steps. Third, using local information about h instead of global L h often allows the use of larger steps. Therefore, the introduction of a linesearch to the algorithm above is of great practical interest. The algorithm below exploits the same idea as Algorithm does. However, its stopping criterion is more involved. The interested reader may identify it as a combination of the stopping criterion (8) and the descent lemma for smooth functions h. Note that in the case h 0, Algorithm 4 corresponds exactly to Algorithm. We briefly sketch the proof of convergence. Let (ˆx, ŷ) be any saddle point of (). Similarly to (), (2), and (5), we get x k+ x k + τ k K y k+, ˆx x k+ τ k (g(x k+ ) g(ˆx)) β (yk+ y k ) τ k K x k + τ k h(y k ), ŷ y k+ τ k (f (y k+ ) f (ŷ)). θ k (x k x k ) + τ k K y k, x k+ x k τ k (( + θ k )g(x k ) g(x k+ ) θ k g(x k )). Summation of the three inequalities above and identity (6) yields x k+ x k, ˆx x k+ + β yk+ y k, ŷ y k+ + θ k x k x k, x k+ x k + τ k K y k+ K y k, x k x k+ τ k K y, x k ˆx + τ k K ˆx, y k+ y + τ k h(y k ), ŷ y k+ τ k ( f (y k+ ) f (ŷ) + ( + θ k )g(x k ) θ k g(x k ) g(ˆx) ). (46) 6

17 Algorithm 4 General primal-dual algorithm with linesearch Initialization: Choose x 0 X, y Y, τ 0 > 0, µ (0, ), δ (0, ) and β 0 > 0. Set θ 0 =. Main iteration:. Compute x k = prox τk g(x k τ k K y k ). 2. Choose any τ k [τ k, τ k + θk ] and run Linesearch: 2.a. Compute θ k = τ k, σ k = βτ k τ k 2.b. Break linesearch if x k = x k + θ k (x k x k ) y k+ = prox σk f (yk + σ k (K x k h(y k ))) τ k σ k K y k+ K y k 2 + 2σ k [h(y k+ ) h(y k ) h(y k ), y k+ y k ] Otherwise, set τ k := τ k µ and go to 2.a. End of linesearch δ y k+ y k 2 (45) By convexity of h, we have τ k (h(ŷ) h(y k ) h(y k ), ŷ y k ) 0. Combining it with inequality (45), divided over 2β, we get δ 2β yk+ y k 2 τ 2 k 2 K y k+ K y k 2 τ k [h(y k+ ) h(ŷ) h(y k ), y k+ ŷ ]. Adding the above inequality to (46) gives us x k+ x k, ˆx x k+ + β yk+ y k, ŷ y k+ + θ k x k x k, x k+ x k + τ k K y k+ K y k, x k x k+ τ k K y, x k ˆx + τ k K ˆx, y k+ y + δ 2β yk+ y k 2 τ 2 k 2 K y k+ K y k 2 τ k ( (f + h)(y k+ ) (f + h)(ŷ) + ( + θ k )g(x k ) θ k g(x k ) g(ˆx) ). (47) Note that for problem (44) instead of (7) we have to use Dˆx,ŷ (y) := f (y) + h(y) f (ŷ) h(ŷ) K ˆx, y ŷ 0 y Y, (48) 7

18 which is true by definition of (ˆx, ŷ). Now we can rewrite (47) as x k+ x k, ˆx x k+ + β yk+ y k, ŷ y k+ + θ k x k x k, x k+ x k + τ k K y k+ K y k, x k x k+ + δ 2β yk+ y k 2 τ 2 k 2 K y k+ K y k 2 τ k ( ( + θk )P (x k ) θ k P (x k ) + D(y k+ ) ). (49) Applying cosine rules for all inner products in the first line in (49) and using that θ k (x k x k ) = x k x k, we obtain 2 ( xk ˆx 2 x k+ ˆx 2 x k+ x k 2 )+ 2β ( yk ŷ 2 y k+ ŷ 2 y k+ y k 2 ) + 2 ( xk+ x k 2 x k x k 2 x k+ x k 2 ) + τ k K y k+ K y k x k+ x k + δ 2β yk+ y k 2 τ 2 k 2 K y k+ K y k 2 τ k ( ( + θk )P (x k ) θ k P (x k ) + D(y k+ ) ). (50) Finally, applying Cauchy s inequality in (50) and using that τ k θ k τ k ( + θ k ), we get 2 xk+ ˆx 2 + 2β yk+ ŷ 2 + τ k ( + θ k )P (x k ) 2 xk ˆx 2 + 2β yk ŷ 2 + τ k ( + θ k )P (x k ) 2 xk x k 2 δ 2β yk+ y k 2, (5) from which the convergence of (x k ) and (y k ) to a saddle point of (44) can be derived in a similar way as in Theorem. 5 Numerical Experiments This section collects several numerical tests that will illustrate the performance of the proposed methods. Computations were performed using Python 3 on an Intel Core i3-2350m CPU 2.30GHz running 64-bit Debian Linux 8.7. For PDAL and APDAL we initialize the input data as µ = 0.7, δ = 0.99, τ 0 = min {m,n} A F. The latter is easy to compute and it is an upper bound of. The parameter A β for PDAL is always taken as σ in PDA with fixed steps σ and τ. A trial step τ τ k in Step 2 is always chosen as τ k = τ k + θk. Codes can be found on 8
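One natural reading of the initialization above is τ_0 = √(min{m, n})/‖A‖_F, which is indeed an upper bound of 1/‖A‖ because ‖A‖_F ≤ √(min{m, n}) ‖A‖. A quick Python check of this cheap starting step (our own snippet):

import numpy as np

def initial_step(A):
    # tau_0 = sqrt(min(m, n)) / ||A||_F, a cheap upper bound of 1 / ||A||_2.
    m, n = A.shape
    return np.sqrt(min(m, n)) / np.linalg.norm(A, 'fro')

A = np.random.randn(200, 1000)
assert initial_step(A) >= 1.0 / np.linalg.norm(A, 2)   # always holds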

5.1 Matrix game

We are interested in the following min-max matrix game:

    min_{x ∈ Δ_n} max_{y ∈ Δ_m} ⟨Ax, y⟩,   (52)

where x ∈ R^n, y ∈ R^m, A ∈ R^{m×n}, and Δ_n, Δ_m denote the standard unit simplices in R^n and R^m, respectively. For this problem we study the performance of PDA, PDAL (Algorithm 1), Tseng's FBF method [17], and PEGM [15]. For comparison we use the primal-dual gap G(x, y), which can be easily computed for a feasible pair (x, y) as

    G(x, y) = max_i (Ax)_i − min_j (A^T y)_j.

Since iterates obtained by Tseng's method may be infeasible, we used an auxiliary point (see [17]) to compute the primal-dual gap. The initial point in all cases was chosen as x^0 = (1/n, ..., 1/n) and y^0 = (1/m, ..., 1/m). In order to compute the projection onto the unit simplex we used the algorithm from [8]. For PDA we use τ = σ = 1/‖A‖ = 1/√(λ_max(A^T A)), which we compute in advance. The input data for FBF and PEGM are the same as in [15]. Note that these methods also use a linesearch.

We consider four differently generated samples of the matrix A ∈ R^{m×n}:

1. m = n = 100. All entries of A are generated independently from the uniform distribution in [−1, 1].
2. m = n = 100. All entries of A are generated independently from the normal distribution N(0, 1).
3. m = 500, n = 100. All entries of A are generated independently from the normal distribution N(0, 1).
4. m = 1000, n = 2000. The matrix A is sparse with 10% nonzero elements generated independently from the uniform distribution in [0, 1].

For every case we report the primal-dual gap G(x^k, y^k) computed in every iteration vs. CPU time. The results are presented in Figure 1.
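For a feasible pair the primal-dual gap of the matrix game is a one-line computation, and the simplex projection can be done by a standard sorting-based routine in the spirit of the method referenced in [8]. The Python sketch below is our own code, not necessarily the exact routines used in the experiments.

import numpy as np

def pd_gap(A, x, y):
    # G(x, y) = max_i (Ax)_i - min_j (A^T y)_j for feasible x, y.
    return np.max(A @ x) - np.min(A.T @ y)

def project_simplex(v):
    # Euclidean projection of v onto the unit simplex (sorting-based, O(n log n)).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)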

Figure 1: Convergence plots for problem (52): primal-dual gap G(x^k, y^k) vs. CPU time (in seconds) for Examples 1-4 and the methods PDA, PDAL, FBF, and PEGM.

5.2 l1-regularized least squares

We study the following l1-regularized problem:

    min_x φ(x) := ½‖Ax − b‖² + λ‖x‖_1,   (53)

where A ∈ R^{m×n}, x ∈ R^n, b ∈ R^m. Let g(x) = λ‖x‖_1 and f(p) = ½‖p − b‖². Analogously to (9) and (10), we can rewrite (53) as

    min_x max_y g(x) + ⟨Ax, y⟩ − f*(y),

where f*(y) = ½‖y‖² + ⟨b, y⟩ = ½‖y + b‖² − ½‖b‖². Clearly, the last term does not have any impact on the prox-term, and we can conclude that prox_{σ f*}(y) = (y − σb)/(1 + σ). This means that the linesearch in Algorithm 1 does not require any additional matrix-vector multiplications (see Remark 2).

We generate four instances of problem (53), on which we compare the performance of PDA, PDAL, APDA (the accelerated primal-dual algorithm), APDAL, FISTA [3], and SpaRSA [19]. The latter method is a variant of the proximal gradient method with an adaptive linesearch. All methods except PDAL and SpaRSA require predefined step sizes; for this we compute in advance ‖A‖ = √(λ_max(A^T A)). For all instances below we use the following parameters:

PDA: σ = 1/(20‖A‖), τ = 20/‖A‖;
PDAL: β = 1/400;
APDA, APDAL: β = 1, γ = 0.1;
FISTA: α = 1/‖A‖²;
SpaRSA: in the first iteration we run a standard Armijo linesearch procedure to define α_0, and then we run SpaRSA with the parameters described in [19]: M = 5, σ = 0.01, α_max = 1/α_min.
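The two prox operators needed to run PDAL on (53) are short enough to state in code: soft-thresholding for g(x) = λ‖x‖_1 and the affine prox of f*(y) = ½‖y‖² + ⟨b, y⟩. This is a sketch with our own naming; it plugs directly into a PDAL routine of the kind sketched after Algorithm 1.

import numpy as np

lam = 0.1   # regularization parameter, as in the experiments

def prox_g(v, t):
    # prox of t * lam * ||.||_1: componentwise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

def make_prox_fs(b):
    # prox of s * f* with f*(y) = 0.5 * ||y||^2 + <b, y>, which is affine in y.
    def prox_fs(v, s):
        return (v - s * b) / (1.0 + s)
    return prox_fs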

For all cases below we generate a random w ∈ R^n in which s random coordinates are drawn from the uniform distribution in [−10, 10] and the rest are zeros. Then we generate ν ∈ R^m with entries drawn from N(0, 0.1) and set b = Aw + ν. The parameter λ = 0.1 for all examples, and the initial points are x^0 = (0, ..., 0), y^0 = Ax^0 − b. The matrix A ∈ R^{m×n} is constructed in one of the following ways:

1. n = 1000, m = 200, s = 10. All entries of A are generated independently from N(0, 1).
2. n = 2000, m = 1000, s = 100. All entries of A are generated independently from N(0, 1).
3. n = 5000, m = 1000, s = 50. First, we generate a matrix B with entries from N(0, 1). Then for a given p ∈ (0, 1) we construct the matrix A by columns A_j, j = 1, ..., n, as follows: A_1 = B_1/√(1 − p²), A_j = p A_{j−1} + B_j for j = 2, ..., n. As p increases, A becomes more ill-conditioned (see [1], where this example was considered).
4. The same as the previous example, but with p = 0.9.

Figure 2 collects the convergence results with φ(x^k) − φ_* vs. CPU time. Since the value φ_* is in fact unknown, we instead run our algorithms for sufficiently many iterations to obtain a ground-truth solution x_*. Then we simply set φ_* = φ(x_*).

5.3 Nonnegative least squares

Next, we consider another regularized least-squares problem:

    min_{x ≥ 0} φ(x) := ½‖Ax − b‖²,   (54)

where A ∈ R^{m×n}, x ∈ R^n, b ∈ R^m. Similarly as before, we can express it as

    min_x max_y g(x) + ⟨Ax, y⟩ − f*(y),   (55)

where g(x) = δ_{R^n_+}(x) and f*(y) = ½‖y + b‖². For all cases below we generate a random matrix A ∈ R^{m×n} with density d ∈ (0, 1]. In order to make the optimal value φ_* = 0, we always generate w as a sparse vector in R^n whose s nonzero entries are drawn from the uniform distribution in [0, 100]. Then we set b = Aw.

We test the performance of the same algorithms as in the previous example. For FISTA and SpaRSA we use the same parameters as before. For every instance of the problem, PDA and PDAL use the same β. For APDA and APDAL we always set β = 1 and γ = 0.1. The initial points are x^0 = (0, ..., 0) and y^0 = Ax^0 − b = −b. We generate our data as follows:

1. m = 2000, n = 4000, d = 1, s = 1000; the entries of A are generated independently from the uniform distribution in [−1, 1]; β = 25.

Figure 2: Convergence plots for problem (53): φ(x^k) − φ_* vs. CPU time (in seconds) for Examples 1-4 and the methods PDA, PDAL, APDA, APDAL, FISTA, and SpaRSA.

2. m = 1000, n = 2000, d = 0.5, s = 100; the nonzero entries of A are generated independently from the uniform distribution in [0, 1].
3. m = 3000, n = 5000, d = 0.1, s = 100; the nonzero entries of A are generated independently from the uniform distribution in [0, 1].
4. m = 10000, n = 20000, d = 0.01, s = 500; the nonzero entries of A are generated independently from the normal distribution N(0, 1).

As (54) is again just a regularized least-squares problem, the linesearch does not require any additional matrix-vector multiplications. The results with φ(x^k) − φ_* vs. CPU time are presented in Figure 3. Although the primal-dual methods may converge faster, they require tuning the ratio β = σ/τ. It is also interesting to highlight that sometimes non-accelerated methods with a properly chosen ratio σ/τ can be faster than their accelerated variants.

6 Conclusion

In this work we have presented several primal-dual algorithms with linesearch. On the one hand, this allows us to avoid the evaluation of the operator norm; on the other hand, it allows us to make larger steps.

The proposed linesearch is very simple, and in many important cases it does not require any additional expensive operations (such as matrix-vector multiplications or prox-operators). For all methods we have proved convergence. Our experiments confirm the numerical efficiency of the proposed methods.

Figure 3: Convergence plots for problem (54): φ(x^k) − φ_* vs. CPU time (in seconds) for Examples 1-4 and the methods PDA, PDAL, APDA, APDAL, FISTA, and SpaRSA.

Acknowledgements: The work is supported by the Austrian Science Fund (FWF) under the project "Efficient Algorithms for Nonsmooth Optimization in Imaging" (EANOI) No. I1148. The authors would also like to thank the referees and the SIOPT editor Wotao Yin for their careful reading of the manuscript and their numerous helpful comments.

References

[1] A. Agarwal, S. Negahban, and M. J. Wainwright. Fast global convergence rates of gradient methods for high-dimensional statistical recovery. In Advances in Neural Information Processing Systems, pages 37-45, 2010.

[2] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York, 2011.

[3] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2(1):183-202, 2009.

[4] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis., 40(1):120-145, 2011.

[5] A. Chambolle and T. Pock. On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program., pages 1-35, 2015.

[6] A. Chambolle and T. Pock. An introduction to continuous optimization for imaging. Acta Numerica, 25:161-319, 2016.

[7] L. Condat. A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl., 158(2):460-479, 2013.

[8] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, 2008.

[9] T. Goldstein, M. Li, and X. Yuan. Adaptive primal-dual splitting methods for statistical learning and image processing. In Advances in Neural Information Processing Systems, 2015.

[10] T. Goldstein, M. Li, X. Yuan, E. Esser, and R. Baraniuk. Adaptive primal-dual hybrid gradient methods for saddle-point problems. arXiv preprint, 2013.

[11] B. He, H. Yang, and S. Wang. Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. J. Optim. Theory Appl., 106(2), 2000.

[12] B. He and X. Yuan. Convergence analysis of primal-dual algorithms for a saddle-point problem: From contraction perspective. SIAM J. Imaging Sci., 5(1):119-149, 2012.

[13] N. Komodakis and J.-C. Pesquet. Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems. IEEE Signal Processing Magazine, 32(6):31-54, 2015.

[14] Y. Malitsky. Reflected projected gradient method for solving monotone variational inequalities. SIAM J. Optim., 25(1):502-520, 2015.

[15] Y. Malitsky. Proximal extrapolated gradient methods for variational inequalities. Optimization Methods and Software, 33(1):140-164, 2018.

[16] T. Pock and A. Chambolle. Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.

[17] P. Tseng. A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control Optim., 38:431-446, 2000.

[18] B. C. Vũ. A splitting algorithm for dual monotone inclusions involving cocoercive operators. Advances in Computational Mathematics, 38(3):667-681, 2013.

[19] S. J. Wright, R. D. Nowak, and M. A. Figueiredo. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7):2479-2493, 2009.


Second order forward-backward dynamical systems for monotone inclusion problems Second order forward-backward dynamical systems for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek March 6, 25 Abstract. We begin by considering second order dynamical systems of the from

More information

On the equivalence of the primal-dual hybrid gradient method and Douglas Rachford splitting

On the equivalence of the primal-dual hybrid gradient method and Douglas Rachford splitting Mathematical Programming manuscript No. (will be inserted by the editor) On the equivalence of the primal-dual hybrid gradient method and Douglas Rachford splitting Daniel O Connor Lieven Vandenberghe

More information

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS Igor V. Konnov Department of Applied Mathematics, Kazan University Kazan 420008, Russia Preprint, March 2002 ISBN 951-42-6687-0 AMS classification:

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

A Parallel Block-Coordinate Approach for Primal-Dual Splitting with Arbitrary Random Block Selection

A Parallel Block-Coordinate Approach for Primal-Dual Splitting with Arbitrary Random Block Selection EUSIPCO 2015 1/19 A Parallel Block-Coordinate Approach for Primal-Dual Splitting with Arbitrary Random Block Selection Jean-Christophe Pesquet Laboratoire d Informatique Gaspard Monge - CNRS Univ. Paris-Est

More information

Primal-dual algorithms for the sum of two and three functions 1

Primal-dual algorithms for the sum of two and three functions 1 Primal-dual algorithms for the sum of two and three functions 1 Ming Yan Michigan State University, CMSE/Mathematics 1 This works is partially supported by NSF. optimization problems for primal-dual algorithms

More information

Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms

Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms Peter Ochs, Jalal Fadili, and Thomas Brox Saarland University, Saarbrücken, Germany Normandie Univ, ENSICAEN, CNRS, GREYC, France

More information

On the order of the operators in the Douglas Rachford algorithm

On the order of the operators in the Douglas Rachford algorithm On the order of the operators in the Douglas Rachford algorithm Heinz H. Bauschke and Walaa M. Moursi June 11, 2015 Abstract The Douglas Rachford algorithm is a popular method for finding zeros of sums

More information

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems O. Kolossoski R. D. C. Monteiro September 18, 2015 (Revised: September 28, 2016) Abstract

More information

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research

More information

Sparse Approximation via Penalty Decomposition Methods

Sparse Approximation via Penalty Decomposition Methods Sparse Approximation via Penalty Decomposition Methods Zhaosong Lu Yong Zhang February 19, 2012 Abstract In this paper we consider sparse approximation problems, that is, general l 0 minimization problems

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

arxiv: v2 [math.oc] 21 Nov 2017

arxiv: v2 [math.oc] 21 Nov 2017 Proximal Gradient Method with Extrapolation and Line Search for a Class of Nonconvex and Nonsmooth Problems arxiv:1711.06831v [math.oc] 1 Nov 017 Lei Yang Abstract In this paper, we consider a class of

More information

Proximal methods. S. Villa. October 7, 2014

Proximal methods. S. Villa. October 7, 2014 Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem

More information

Compressed Sensing: a Subgradient Descent Method for Missing Data Problems

Compressed Sensing: a Subgradient Descent Method for Missing Data Problems Compressed Sensing: a Subgradient Descent Method for Missing Data Problems ANZIAM, Jan 30 Feb 3, 2011 Jonathan M. Borwein Jointly with D. Russell Luke, University of Goettingen FRSC FAAAS FBAS FAA Director,

More information

On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities

On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities Caihua Chen Xiaoling Fu Bingsheng He Xiaoming Yuan January 13, 2015 Abstract. Projection type methods

More information

Convergence rate estimates for the gradient differential inclusion

Convergence rate estimates for the gradient differential inclusion Convergence rate estimates for the gradient differential inclusion Osman Güler November 23 Abstract Let f : H R { } be a proper, lower semi continuous, convex function in a Hilbert space H. The gradient

More information

arxiv: v7 [math.oc] 22 Feb 2018

arxiv: v7 [math.oc] 22 Feb 2018 A SMOOTH PRIMAL-DUAL OPTIMIZATION FRAMEWORK FOR NONSMOOTH COMPOSITE CONVEX MINIMIZATION QUOC TRAN-DINH, OLIVIER FERCOQ, AND VOLKAN CEVHER arxiv:1507.06243v7 [math.oc] 22 Feb 2018 Abstract. We propose a

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems Optimization Methods and Software ISSN: 1055-6788 (Print) 1029-4937 (Online) Journal homepage: http://www.tandfonline.com/loi/goms20 An accelerated non-euclidean hybrid proximal extragradient-type algorithm

More information

A New Primal Dual Algorithm for Minimizing the Sum of Three Functions with a Linear Operator

A New Primal Dual Algorithm for Minimizing the Sum of Three Functions with a Linear Operator https://doi.org/10.1007/s10915-018-0680-3 A New Primal Dual Algorithm for Minimizing the Sum of Three Functions with a Linear Operator Ming Yan 1,2 Received: 22 January 2018 / Accepted: 22 February 2018

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

ADMM for monotone operators: convergence analysis and rates

ADMM for monotone operators: convergence analysis and rates ADMM for monotone operators: convergence analysis and rates Radu Ioan Boţ Ernö Robert Csetne May 4, 07 Abstract. We propose in this paper a unifying scheme for several algorithms from the literature dedicated

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

Lecture 8: February 9

Lecture 8: February 9 0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we

More information

Lecture: Algorithms for Compressed Sensing

Lecture: Algorithms for Compressed Sensing 1/56 Lecture: Algorithms for Compressed Sensing Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:

More information

In collaboration with J.-C. Pesquet A. Repetti EC (UPE) IFPEN 16 Dec / 29

In collaboration with J.-C. Pesquet A. Repetti EC (UPE) IFPEN 16 Dec / 29 A Random block-coordinate primal-dual proximal algorithm with application to 3D mesh denoising Emilie CHOUZENOUX Laboratoire d Informatique Gaspard Monge - CNRS Univ. Paris-Est, France Horizon Maths 2014

More information

Lasso: Algorithms and Extensions

Lasso: Algorithms and Extensions ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions

More information

Primal and Dual Predicted Decrease Approximation Methods

Primal and Dual Predicted Decrease Approximation Methods Primal and Dual Predicted Decrease Approximation Methods Amir Beck Edouard Pauwels Shoham Sabach March 22, 2017 Abstract We introduce the notion of predicted decrease approximation (PDA) for constrained

More information

Linear Programming Redux

Linear Programming Redux Linear Programming Redux Jim Bremer May 12, 2008 The purpose of these notes is to review the basics of linear programming and the simplex method in a clear, concise, and comprehensive way. The book contains

More information

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.

More information

Universal Gradient Methods for Convex Optimization Problems

Universal Gradient Methods for Convex Optimization Problems CORE DISCUSSION PAPER 203/26 Universal Gradient Methods for Convex Optimization Problems Yu. Nesterov April 8, 203; revised June 2, 203 Abstract In this paper, we present new methods for black-box convex

More information

Convex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE

Convex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE Convex Analysis Notes Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE These are notes from ORIE 6328, Convex Analysis, as taught by Prof. Adrian Lewis at Cornell University in the

More information

Convergence rate of inexact proximal point methods with relative error criteria for convex optimization

Convergence rate of inexact proximal point methods with relative error criteria for convex optimization Convergence rate of inexact proximal point methods with relative error criteria for convex optimization Renato D. C. Monteiro B. F. Svaiter August, 010 Revised: December 1, 011) Abstract In this paper,

More information

Positive semidefinite matrix approximation with a trace constraint

Positive semidefinite matrix approximation with a trace constraint Positive semidefinite matrix approximation with a trace constraint Kouhei Harada August 8, 208 We propose an efficient algorithm to solve positive a semidefinite matrix approximation problem with a trace

More information

M. Marques Alves Marina Geremia. November 30, 2017

M. Marques Alves Marina Geremia. November 30, 2017 Iteration complexity of an inexact Douglas-Rachford method and of a Douglas-Rachford-Tseng s F-B four-operator splitting method for solving monotone inclusions M. Marques Alves Marina Geremia November

More information

An inexact subgradient algorithm for Equilibrium Problems

An inexact subgradient algorithm for Equilibrium Problems Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,

More information

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose

More information

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22

More information

Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent

Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent Haihao Lu August 3, 08 Abstract The usual approach to developing and analyzing first-order

More information

A double projection method for solving variational inequalities without monotonicity

A double projection method for solving variational inequalities without monotonicity A double projection method for solving variational inequalities without monotonicity Minglu Ye Yiran He Accepted by Computational Optimization and Applications, DOI: 10.1007/s10589-014-9659-7,Apr 05, 2014

More information

Coordinate Descent and Ascent Methods

Coordinate Descent and Ascent Methods Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:

More information

INERTIAL ACCELERATED ALGORITHMS FOR SOLVING SPLIT FEASIBILITY PROBLEMS. Yazheng Dang. Jie Sun. Honglei Xu

INERTIAL ACCELERATED ALGORITHMS FOR SOLVING SPLIT FEASIBILITY PROBLEMS. Yazheng Dang. Jie Sun. Honglei Xu Manuscript submitted to AIMS Journals Volume X, Number 0X, XX 200X doi:10.3934/xx.xx.xx.xx pp. X XX INERTIAL ACCELERATED ALGORITHMS FOR SOLVING SPLIT FEASIBILITY PROBLEMS Yazheng Dang School of Management

More information

Inertial Douglas-Rachford splitting for monotone inclusion problems

Inertial Douglas-Rachford splitting for monotone inclusion problems Inertial Douglas-Rachford splitting for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek Christopher Hendrich January 5, 2015 Abstract. We propose an inertial Douglas-Rachford splitting algorithm

More information

Adaptive Restarting for First Order Optimization Methods

Adaptive Restarting for First Order Optimization Methods Adaptive Restarting for First Order Optimization Methods Nesterov method for smooth convex optimization adpative restarting schemes step-size insensitivity extension to non-smooth optimization continuation

More information

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725 Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h

More information

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS WEI DENG AND WOTAO YIN Abstract. The formulation min x,y f(x) + g(y) subject to Ax + By = b arises in

More information

An Infeasible Interior-Point Algorithm with full-newton Step for Linear Optimization

An Infeasible Interior-Point Algorithm with full-newton Step for Linear Optimization An Infeasible Interior-Point Algorithm with full-newton Step for Linear Optimization H. Mansouri M. Zangiabadi Y. Bai C. Roos Department of Mathematical Science, Shahrekord University, P.O. Box 115, Shahrekord,

More information

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms JOTA manuscript No. (will be inserted by the editor) Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms Peter Ochs Jalal Fadili Thomas Brox Received: date / Accepted: date Abstract

More information