Local Linear Convergence Analysis of Primal Dual Splitting Methods


Jingwei Liang (a), Jalal Fadili (b) and Gabriel Peyré (c)

(a) DAMTP, University of Cambridge, UK; (b) Normandie Université, ENSICAEN, CNRS, GREYC, France; (c) CNRS, DMA, ENS Paris, France

ARTICLE HISTORY
Compiled January 8, 2018

ABSTRACT
In this paper, we study the local linear convergence properties of a versatile class of Primal-Dual splitting methods for minimizing composite non-smooth convex optimization problems. Under the assumption that the non-smooth components of the problem are partly smooth relative to smooth manifolds, we present a unified local convergence analysis framework for these methods. More precisely, in our framework we first show that (i) the sequences generated by Primal-Dual splitting methods identify a pair of primal and dual smooth manifolds in a finite number of iterations, and then (ii) enter a local linear convergence regime, which is characterized based on the structure of the underlying active smooth manifolds. We also show how our results for Primal-Dual splitting can be specialized to cover existing ones on Forward-Backward splitting and Douglas-Rachford splitting/ADMM (alternating direction method of multipliers). Moreover, based on the obtained local convergence analysis, several practical acceleration techniques are discussed. To exemplify the usefulness of the obtained results, we consider several concrete numerical experiments arising from fields including signal/image processing, inverse problems and machine learning. The experiments not only verify the local linear convergence behaviour of Primal-Dual splitting methods, but also illustrate how to accelerate them in practice.

KEYWORDS
Primal-Dual splitting, Forward-Backward splitting, Douglas-Rachford/ADMM, Partial Smoothness, Local Linear Convergence.

AMS CLASSIFICATION
49J52, 65K05, 65K

1. Introduction

1.1. Composite optimization problem

In various fields such as inverse problems, signal and image processing, statistics and machine learning, many problems are (eventually) formulated as structured optimization problems (see Section 6 for some specific examples).

CONTACT Jingwei Liang. jl993@cam.ac.uk. Most of this work was done while Jingwei Liang was at Normandie Université, ENSICAEN, France.

A typical example of these optimization problems, given in its primal form, reads
$$ \min_{x \in \mathbb{R}^n} \; R(x) + F(x) + (J \,\square\, G)(Lx), \tag{P_P} $$
where $(J \,\square\, G)(\cdot) \overset{\text{def}}{=} \inf_{v \in \mathbb{R}^m} J(v) + G(\cdot - v)$ denotes the infimal convolution of $J$ and $G$. Throughout, we assume the following:

(A.1) $R, F \in \Gamma_0(\mathbb{R}^n)$, with $\Gamma_0(\mathbb{R}^n)$ being the class of proper convex and lower semi-continuous functions on $\mathbb{R}^n$, and $\nabla F$ is $(1/\beta_F)$-Lipschitz continuous for some $\beta_F > 0$.
(A.2) $J, G \in \Gamma_0(\mathbb{R}^m)$, and $G$ is $\beta_G$-strongly convex for some $\beta_G > 0$.
(A.3) $L : \mathbb{R}^n \to \mathbb{R}^m$ is a linear mapping.
(A.4) $0 \in \operatorname{ran}\big(\partial R + \nabla F + L^*(\partial J \,\square\, \partial G)L\big)$, where $\partial J \,\square\, \partial G \overset{\text{def}}{=} (\partial J^{-1} + \partial G^{-1})^{-1}$ is the parallel sum of the subdifferentials $\partial J$ and $\partial G$, and $\operatorname{ran}(\cdot)$ denotes the range of a set-valued operator. See Remark 3 for the reasoning behind this condition.

The main difficulties in solving such a problem are the non-smoothness of the objective, the presence of the linear operator $L$ and the infimal convolution. Consider also the Fenchel-Rockafellar dual problem [41] of (P_P),
$$ \min_{v \in \mathbb{R}^m} \; J^*(v) + G^*(v) + (R^* \,\square\, F^*)(-L^* v). \tag{P_D} $$
The classical Kuhn-Tucker theory asserts that a pair $(x^\star, v^\star) \in \mathbb{R}^n \times \mathbb{R}^m$ solves (P_P)-(P_D) if it satisfies the monotone inclusion
$$ 0 \in \begin{bmatrix} \partial R & L^* \\ -L & \partial J^* \end{bmatrix} \begin{pmatrix} x^\star \\ v^\star \end{pmatrix} + \begin{bmatrix} \nabla F & 0 \\ 0 & \nabla G^* \end{bmatrix} \begin{pmatrix} x^\star \\ v^\star \end{pmatrix}. \tag{1} $$
One can observe that in (1), the composition with the linear operator and the infimal convolution are decoupled, hence providing the possibility to achieve full splitting. This is a key property used by all Primal-Dual algorithms we are about to review. In turn, solving (1) provides a pair of points that are solutions to (P_P) and (P_D) respectively. More complex forms of (P_P) involving for instance a sum of infimal convolutions can be tackled in a similar way using a product space trick, as we will see in Section 5.

1.2. Primal-Dual splitting methods

Primal-Dual splitting methods to solve more or less complex variants of (P_P)-(P_D) have witnessed a recent wave of interest in the literature [12,13,16,18,21,28,48]. All these methods achieve full splitting: they involve the resolvents of $\partial R$ and $\partial J^*$, the gradients of $F$ and $G^*$ and the linear operator $L$, all separately at various points in the course of the iteration. For instance, building on the work of [3], the now-popular scheme proposed in [13] solves (P_P)-(P_D) with $F = G = 0$. The authors in [28] have shown that the Primal-Dual splitting method of [13] can be seen as a proximal point algorithm (PPA) in $\mathbb{R}^n \times \mathbb{R}^m$ endowed with a suitable norm. Exploiting the same idea, the author in [21] considered (P_P) with $G = 0$, and proposed an iterative scheme which can be interpreted as a Forward-Backward (FB) splitting, again in an appropriately renormed space. This idea is further extended in [48] to solve more complex problems such as (P_P). A variable metric version was proposed in [19].

Motivated by the structure of (1), [12] and [18] proposed a Forward-Backward-Forward scheme [46] to solve it. In this paper, we focus on the unrelaxed Primal-Dual splitting method summarized in Algorithm 1, which covers [13,16,21,28,48]. Though we omit the details here for brevity, our analysis and conclusions carry through to the method proposed in [12,18].

Algorithm 1: A Primal-Dual splitting method
Initial: Choose $\gamma_R, \gamma_J > 0$ and $\theta \in [-1, +\infty[$. For $k = 0$, $x_0 \in \mathbb{R}^n$, $v_0 \in \mathbb{R}^m$;
repeat
$$ \begin{aligned} x_{k+1} &= \operatorname{prox}_{\gamma_R R}\big(x_k - \gamma_R \nabla F(x_k) - \gamma_R L^* v_k\big), \\ \bar{x}_{k+1} &= x_{k+1} + \theta (x_{k+1} - x_k), \\ v_{k+1} &= \operatorname{prox}_{\gamma_J J^*}\big(v_k - \gamma_J \nabla G^*(v_k) + \gamma_J L \bar{x}_{k+1}\big), \end{aligned} \tag{2} $$
$k = k+1$;
until convergence;

Remark 1.
(i) Algorithm 1 is somewhat an interesting extension to the literature given the choice of $\theta$ that we advocate. Indeed, the range $[-1, +\infty[$ is larger than the one proposed in [28], which is $[-1, 1]$. It encompasses the Primal-Dual splitting method in [48] when $\theta = 1$, and the one in [21] when moreover $G = 0$. When both $F = G = 0$, it reduces to the Primal-Dual splitting method proposed in [13,28].
(ii) It can also be verified that Algorithm 1 covers the Forward-Backward (FB) splitting [37] ($J = G = 0$) and Douglas-Rachford (DR) splitting [24] (if $F = G = 0$, $L = \mathrm{Id}$ and $\gamma_J = 1/\gamma_R$) as special cases; see Section 4 for a discussion, or [13] and references therein for more details. Exploiting this relation, in Section 4 we build connections with the results provided in [34,35] for FB-type methods and DR/ADMM. It should also be noted that DR splitting is the limiting case of the Primal-Dual splitting [13], and the global convergence result of Primal-Dual splitting does not apply to DR.
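To make the scheme concrete, the following is a small numerical sketch (ours, not part of the paper) of the iteration (2) on a hypothetical toy problem: anisotropic total-variation denoising $\min_x \tfrac{1}{2}\|x - y\|^2 + \lambda \|Dx\|_1$, i.e. $R = 0$, $F(x) = \tfrac{1}{2}\|x-y\|^2$, $J = \lambda\|\cdot\|_1$, $G = 0$ and $L = D$ a finite-difference operator. All variable and function names are ours, and the step sizes are chosen conservatively for illustration only.

```python
import numpy as np

# Sketch of Algorithm 1 (iteration (2)) on a toy TV-denoising problem:
#   min_x  0.5*||x - y||^2 + lam*||D x||_1
# R = 0, F(x) = 0.5*||x - y||^2, J = lam*||.||_1, G = 0, L = D.
# prox_{gJ J*} is the projection onto the l_inf ball of radius lam.

rng = np.random.default_rng(0)
n = 200
y = np.cumsum(rng.standard_normal(n)) + 5.0 * (np.arange(n) > n // 2)  # noisy piecewise signal
lam = 2.0

D = np.diff(np.eye(n), axis=0)          # (n-1) x n forward differences, plays the role of L
L_norm = np.linalg.norm(D, 2)

theta = 1.0
gR = 0.9                                 # gamma_R (grad F is 1-Lipschitz here)
gJ = 0.4 / L_norm**2                     # gamma_J, so gR*gJ*||L||^2 < 1

def grad_F(x):                           # gradient of F(x) = 0.5*||x - y||^2
    return x - y

def prox_gR_R(x):                        # R = 0, so prox is the identity
    return x

def prox_gJ_Jstar(v):                    # J* = indicator of {||.||_inf <= lam}: projection
    return np.clip(v, -lam, lam)

x = np.zeros(n)
v = np.zeros(n - 1)
for k in range(2000):
    x_new = prox_gR_R(x - gR * grad_F(x) - gR * (D.T @ v))
    x_bar = x_new + theta * (x_new - x)                  # extrapolation step
    v = prox_gJ_Jstar(v + gJ * (D @ x_bar))              # grad G* = 0 in this instance
    x = x_new

print("objective:", 0.5 * np.sum((x - y) ** 2) + lam * np.sum(np.abs(D @ x)))
```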

1.3. Contributions

In the literature, most studies on the convergence rate of Primal-Dual splitting methods focus on the global behaviour [9,10,13,14,22,33]. For instance, it is now known that the (partial) duality gap decreases sublinearly (pointwise or in the ergodic sense) at the rate $O(1/k)$ [9,13]. This can be accelerated to $O(1/k^2)$ on the iterate sequence under strong convexity of either the primal or the dual problem [10,13]. Linear convergence is achieved if both $R$ and $J$ are strongly convex [10,13]. However, in practice, local linear convergence of the sequence generated by Algorithm 1 has been observed for many problems in the absence of strong convexity (as confirmed by our numerical experiments in Section 6). None of the existing theoretical analyses was able to explain this behaviour so far. Providing the theoretical underpinnings of this local behaviour is the main goal pursued in this paper. Our main findings can be summarized as follows.

Finite time activity identification. For Algorithm 1, let $(x^\star, v^\star)$ be a Kuhn-Tucker pair, i.e. a solution of (1). Under a non-degeneracy condition, and provided that both $R$ and $J^*$ are partly smooth relative to $C^2$-smooth manifolds, respectively $\mathcal{M}^R_{x^\star}$ and $\mathcal{M}^{J^*}_{v^\star}$ near $x^\star$ and $v^\star$ (see Definition 2.8), we show that the generated primal-dual sequence $\{(x_k, v_k)\}_{k \in \mathbb{N}}$, which converges to $(x^\star, v^\star)$, identifies the manifold $\mathcal{M}^R_{x^\star} \times \mathcal{M}^{J^*}_{v^\star}$ in finite time (see Theorem 3.2). In plain words, this means that after a finite number of iterations, say $K$, we have $x_k \in \mathcal{M}^R_{x^\star}$ and $v_k \in \mathcal{M}^{J^*}_{v^\star}$ for all $k \geq K$.

Local linear convergence. Capitalizing on this finite identification result, we first show in Proposition 3.4 that the globally non-linear iteration (2) locally linearizes along the identified smooth manifolds, and then we deduce that the convergence of the sequence becomes locally linear (see Theorem 3.7). The rate of linear convergence is characterized precisely based on the properties of the identified partly smooth manifolds and the involved linear operator $L$. Moreover, when $F = G = 0$, $L = \mathrm{Id}$ and $R, J$ are locally polyhedral around $(x^\star, v^\star)$, we show that the convergence rate is parameterized by the cosine of the largest principal angle (see Definition 2.5) between the tangent spaces of the two manifolds at $(x^\star, v^\star)$ (see Lemma 3.5). This builds a clear connection between the results in this paper and those we drew in our previous works on DR and ADMM [35,36].

Related work. For the past few years, increasing attention has been paid to the local linear convergence of first-order proximal splitting methods in the absence of strong convexity. This has been done for instance for FB-type splitting [2,11,26,29,45] and DR/ADMM [4,8,23] for special classes of objective functions. In our previous work [32,34-36], based on the powerful framework provided by partial smoothness, we unified all the above-mentioned work and provided even stronger claims. To the best of our knowledge, we are aware of only one recent paper [44] which investigated finite identification and local linear convergence of a Primal-Dual splitting method to solve a very special instance of (P_P). More precisely, they assumed $R$ to be a gauge, $F$ to be strongly convex (hence strong convexity of the primal problem), $G = 0$ and $J$ the indicator function of a point. Our work goes much beyond this limited case. It also deepens our current understanding of the local behaviour of proximal splitting algorithms by complementing the picture we started in [34,35] for the FB and DR methods.

Paper organization. The rest of the paper is organized as follows. Some useful prerequisites, including partial smoothness, are collected in Section 2. The main contributions of this paper, i.e. finite time activity identification and local linear convergence of Primal-Dual splitting under partial smoothness, are the core of Section 3. Several discussions on the obtained results are delivered in Section 4. Section 5 extends the results to the case of more than one infimal convolution. In Section 6, we report various numerical experiments to support our theoretical findings.

2. Preliminaries

Throughout the paper, $\mathbb{N}$ is the set of non-negative integers, $\mathbb{R}^n$ is an $n$-dimensional real Euclidean space equipped with scalar product $\langle \cdot, \cdot \rangle$ and associated norm $\|\cdot\|$. $\mathrm{Id}_n$ denotes the identity operator on $\mathbb{R}^n$, where $n$ will be dropped if the dimension is clear from the context.

Sets. For a nonempty convex set $C \subset \mathbb{R}^n$, denote $\operatorname{aff}(C)$ its affine hull, and $\operatorname{par}(C)$ the smallest subspace parallel to $\operatorname{aff}(C)$. Denote $\iota_C$ the indicator function of $C$, and $P_C$ the orthogonal projection operator onto the set.

Functions. The subdifferential of a function $R \in \Gamma_0(\mathbb{R}^n)$ is the set-valued operator
$$ \partial R : \mathbb{R}^n \rightrightarrows \mathbb{R}^n, \quad x \mapsto \big\{ g \in \mathbb{R}^n \mid R(x') \geq R(x) + \langle g, x' - x \rangle, \ \forall x' \in \mathbb{R}^n \big\}, \tag{3} $$
which is maximal monotone (see Definition 2.2). For $R \in \Gamma_0(\mathbb{R}^n)$, the proximity operator of $R$ is
$$ \operatorname{prox}_{R}(x) \overset{\text{def}}{=} \operatorname*{argmin}_{z \in \mathbb{R}^n} \tfrac{1}{2} \|z - x\|^2 + R(z). \tag{4} $$
Given a function $J \in \Gamma_0(\mathbb{R}^m)$, its Legendre-Fenchel conjugate is defined as
$$ J^*(y) = \sup_{v \in \mathbb{R}^m} \langle v, y \rangle - J(v). \tag{5} $$

Lemma 2.1 (Moreau identity [39]). Let $J \in \Gamma_0(\mathbb{R}^m)$; then for any $v \in \mathbb{R}^m$ and $\gamma > 0$,
$$ v = \operatorname{prox}_{\gamma J^*}(v) + \gamma \operatorname{prox}_{J/\gamma}(v/\gamma). \tag{6} $$
Using the Moreau identity, it is straightforward to see that the update of $v_k$ in Algorithm 1 can also be obtained from $\operatorname{prox}_{J/\gamma_J}$.

Operators. Given a set-valued mapping $A : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$, its range is $\operatorname{ran}(A) = \{ y \in \mathbb{R}^n : \exists x \in \mathbb{R}^n \ \text{s.t.} \ y \in A(x) \}$, and its graph is $\operatorname{gph}(A) \overset{\text{def}}{=} \{ (x, u) \in \mathbb{R}^n \times \mathbb{R}^n \mid u \in A(x) \}$.

Definition 2.2 (Monotone operator). A set-valued mapping $A : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is called monotone if
$$ \langle x_1 - x_2, v_1 - v_2 \rangle \geq 0, \quad \forall (x_1, v_1) \in \operatorname{gph}(A) \ \text{and} \ (x_2, v_2) \in \operatorname{gph}(A). \tag{7} $$
It is moreover maximal monotone if $\operatorname{gph}(A)$ cannot be properly contained in the graph of any other monotone operator. Let $\beta \in ]0, +\infty[$ and $B : \mathbb{R}^n \to \mathbb{R}^n$; then $B$ is $\beta$-cocoercive if
$$ \langle B(x_1) - B(x_2), x_1 - x_2 \rangle \geq \beta \|B(x_1) - B(x_2)\|^2, \quad \forall x_1, x_2 \in \mathbb{R}^n, \tag{8} $$
which implies that $B$ is $\beta^{-1}$-Lipschitz continuous. For a maximal monotone operator $A$, $(\mathrm{Id} + A)^{-1}$ is called its resolvent. It is known that for $R \in \Gamma_0(\mathbb{R}^n)$, $\operatorname{prox}_{R} = (\mathrm{Id} + \partial R)^{-1}$ [6, Example 23.3].

Definition 2.3 (Non-expansive operator). An operator $\mathcal{F} : \mathbb{R}^n \to \mathbb{R}^n$ is non-expansive if
$$ \|\mathcal{F}(x_1) - \mathcal{F}(x_2)\| \leq \|x_1 - x_2\|, \quad \forall x_1, x_2 \in \mathbb{R}^n. $$
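As a small illustration of (4)-(6) (ours, not from the original text), the snippet below evaluates the proximity operator of the $\ell_1$ norm in closed form (soft-thresholding) and numerically checks the Moreau identity (6); the helper names are ours.

```python
import numpy as np

# Illustration of (4)-(6): prox of J = ||.||_1 (soft-thresholding) and a
# numerical check of the Moreau identity v = prox_{g J*}(v) + g * prox_{J/g}(v/g).

def prox_l1(x, t):
    """prox_{t ||.||_1}(x): soft-thresholding with threshold t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_l1_conj(v):
    """prox_{g J*}(v) with J = ||.||_1: J* is the indicator of {||.||_inf <= 1},
    so the prox is the projection onto the l_inf unit ball (g plays no role)."""
    return np.clip(v, -1.0, 1.0)

rng = np.random.default_rng(1)
v = rng.standard_normal(8)
g = 0.7

lhs = v
rhs = prox_l1_conj(v) + g * prox_l1(v / g, 1.0 / g)     # Moreau identity (6)
print("Moreau identity residual:", np.linalg.norm(lhs - rhs))   # numerically zero
```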

For any $\alpha \in ]0, 1[$, $\mathcal{F}$ is called $\alpha$-averaged if there exists a non-expansive operator $\mathcal{F}'$ such that $\mathcal{F} = \alpha \mathcal{F}' + (1 - \alpha)\mathrm{Id}$. The class of $\alpha$-averaged operators is closed under relaxation, convex combination and composition [6,20]. In particular, when $\alpha = \tfrac{1}{2}$, $\mathcal{F}$ is called firmly non-expansive.

Let $\mathcal{S}(\mathbb{R}^n) = \{ V \in \mathbb{R}^{n \times n} \mid V^T = V \}$ be the set of symmetric matrices acting on $\mathbb{R}^n$. The Loewner partial ordering on $\mathcal{S}(\mathbb{R}^n)$ is defined by: for $V_1, V_2 \in \mathcal{S}(\mathbb{R}^n)$, $V_1 \succeq V_2$ if and only if $\langle (V_1 - V_2)x, x \rangle \geq 0$ for all $x \in \mathbb{R}^n$, that is, $V_1 - V_2$ is positive semi-definite. Given any positive constant $\nu > 0$, define
$$ \mathcal{S}_\nu \overset{\text{def}}{=} \{ V \in \mathcal{S}(\mathbb{R}^n) : V \succeq \nu \mathrm{Id} \}, \tag{9} $$
i.e. the set of symmetric positive definite matrices whose eigenvalues are bounded below by $\nu$. For any $V \in \mathcal{S}_\nu$, define the induced scalar product and norm
$$ \langle x, x' \rangle_V = \langle x, V x' \rangle, \qquad \|x\|_V = \sqrt{\langle x, V x \rangle}, \qquad \forall x, x' \in \mathbb{R}^n. $$
By endowing the Euclidean space $\mathbb{R}^n$ with the above scalar product and norm, we obtain a Hilbert space which is denoted by $\mathbb{R}^n_V$.

Lemma 2.4. Let the operator $A : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ be maximal monotone, $B : \mathbb{R}^n \to \mathbb{R}^n$ be $\beta$-cocoercive, and $V \in \mathcal{S}_\nu$. Then for $\gamma \in ]0, 2\beta\nu[$,
(i) $(\mathrm{Id} + \gamma V^{-1} A)^{-1} : \mathbb{R}^n_V \to \mathbb{R}^n_V$ is firmly non-expansive;
(ii) $\mathrm{Id} - \gamma V^{-1} B : \mathbb{R}^n_V \to \mathbb{R}^n_V$ is $\tfrac{\gamma}{2\beta\nu}$-averaged non-expansive;
(iii) $(\mathrm{Id} + \gamma V^{-1} A)^{-1} (\mathrm{Id} - \gamma V^{-1} B) : \mathbb{R}^n_V \to \mathbb{R}^n_V$ is $\tfrac{2\beta\nu}{4\beta\nu - \gamma}$-averaged non-expansive.

Proof. (i) See [19, Lemma 3.7(ii)].
(ii) Since $B : \mathbb{R}^n \to \mathbb{R}^n$ is $\beta$-cocoercive, given any $x, x' \in \mathbb{R}^n$ we have
$$ \langle x - x', V^{-1} B(x) - V^{-1} B(x') \rangle_V \geq \beta \|B(x) - B(x')\|^2 = \beta \big\|V\big(V^{-1} B(x) - V^{-1} B(x')\big)\big\|^2 \geq \beta \nu \big\|V^{-1} B(x) - V^{-1} B(x')\big\|_V^2, $$
which means that $V^{-1} B : \mathbb{R}^n_V \to \mathbb{R}^n_V$ is $(\beta\nu)$-cocoercive. The rest of the proof follows [6, Proposition 4.33].
(iii) See [40, Theorem 3].

2.1. Angles between subspaces

Let $T_1$ and $T_2$ be two subspaces of $\mathbb{R}^n$. Without loss of generality, we assume $1 \leq p \overset{\text{def}}{=} \dim(T_1) \leq q \overset{\text{def}}{=} \dim(T_2) \leq n - 1$.

Definition 2.5 (Principal angles). The principal angles $\theta_k \in [0, \tfrac{\pi}{2}]$, $k = 1, \dots, p$, between the subspaces $T_1$ and $T_2$ are defined by, with $u_0 = v_0 = 0$,
$$ \cos(\theta_k) \overset{\text{def}}{=} \langle u_k, v_k \rangle = \max \ \langle u, v \rangle \quad \text{s.t.} \quad u \in T_1, \ v \in T_2, \ \|u\| = 1, \ \|v\| = 1, \ \langle u, u_i \rangle = \langle v, v_i \rangle = 0, \ i = 0, \dots, k-1. $$
The principal angles $\theta_k$ are unique with $0 \leq \theta_1 \leq \theta_2 \leq \dots \leq \theta_p \leq \pi/2$.

Definition 2.6 (Friedrichs angle). The Friedrichs angle $\theta_F \in ]0, \tfrac{\pi}{2}]$ between the two subspaces $T_1$ and $T_2$ is given by
$$ \cos\big(\theta_F(T_1, T_2)\big) \overset{\text{def}}{=} \max \ \langle u, v \rangle \quad \text{s.t.} \quad u \in T_1 \cap (T_1 \cap T_2)^\perp, \ \|u\| = 1, \quad v \in T_2 \cap (T_1 \cap T_2)^\perp, \ \|v\| = 1. $$

The following lemma states the relation between the Friedrichs and principal angles; its proof can be found in [5, Proposition 3.3].

Lemma 2.7. The Friedrichs angle is the $(d+1)$-th principal angle, where $d \overset{\text{def}}{=} \dim(T_1 \cap T_2)$. Moreover, $\theta_F(T_1, T_2) > 0$.

Remark 2. The principal angles can be obtained via the singular value decomposition (SVD). For instance, let $X \in \mathbb{R}^{n \times p}$ and $Y \in \mathbb{R}^{n \times q}$ form orthonormal bases of the subspaces $T_1$ and $T_2$ respectively. Let $U \Sigma V^T$ be the SVD of $X^T Y \in \mathbb{R}^{p \times q}$; then $\cos(\theta_k) = \sigma_k$, $k = 1, 2, \dots, p$, where $\sigma_k$ is the $k$-th largest singular value in $\Sigma$.

2.2. Partial smoothness

The concept of partial smoothness was first introduced in [31]. This concept, as well as that of identifiable surfaces [49], captures the essential features of the geometry of non-smoothness along the so-called active/identifiable manifold. Loosely speaking, a partly smooth function behaves smoothly as we move along the identifiable submanifold, and sharply if we move transversally to it.

Let $\mathcal{M}$ be a $C^2$-smooth embedded submanifold of $\mathbb{R}^n$ around a point $x$. To lighten the notation, henceforth we simply write $C^2$-manifold. The natural embedding of a submanifold $\mathcal{M}$ into $\mathbb{R}^n$ permits to define a Riemannian structure on $\mathcal{M}$, and we simply say that $\mathcal{M}$ is a Riemannian manifold. $T_{\mathcal{M}}(x)$ denotes the tangent space to $\mathcal{M}$ at any point near $x$ in $\mathcal{M}$. More material on manifolds is given in Section A. Below is the definition of partial smoothness for functions in $\Gamma_0(\mathbb{R}^n)$.

Definition 2.8 (Partly smooth function). Let $R \in \Gamma_0(\mathbb{R}^n)$, and $x \in \mathbb{R}^n$ such that $\partial R(x) \neq \emptyset$. $R$ is then said to be partly smooth at $x$ relative to a set $\mathcal{M}$ containing $x$ if
(i) Smoothness: $\mathcal{M}$ is a $C^2$-manifold around $x$, and $R$ restricted to $\mathcal{M}$ is $C^2$ around $x$;
(ii) Sharpness: the tangent space $T_{\mathcal{M}}(x)$ coincides with $T_x \overset{\text{def}}{=} \operatorname{par}\big(\partial R(x)\big)^\perp$;
(iii) Continuity: the set-valued mapping $\partial R$ is continuous at $x$ relative to $\mathcal{M}$.
The class of functions that are partly smooth at $x$ relative to $\mathcal{M}$ is denoted $\operatorname{PSF}_x(\mathcal{M})$. Owing to the results of [31], it can be shown that under transversality assumptions, the set of partly smooth functions is closed under addition and pre-composition by a linear operator. Popular examples of partly smooth functions are summarized in Section 6; their details can be found in [34].
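Returning to Remark 2, the SVD-based computation of principal angles is easy to implement; the short sketch below (ours, not from the paper) computes the principal angles between two randomly generated subspaces and, per Lemma 2.7, reads off the Friedrichs angle in the generic case $T_1 \cap T_2 = \{0\}$.

```python
import numpy as np

# Remark 2 in practice: principal angles between two subspaces T1, T2 of R^n
# from the SVD of X^T Y, where X, Y are orthonormal bases of T1, T2.

def principal_angles(A1, A2):
    """Columns of A1, A2 span T1, T2. Returns theta_1 <= ... <= theta_p (radians)."""
    X, _ = np.linalg.qr(A1)                    # orthonormal basis of T1
    Y, _ = np.linalg.qr(A2)                    # orthonormal basis of T2
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)                  # guard against round-off
    return np.arccos(s)                        # cos(theta_k) = sigma_k

rng = np.random.default_rng(2)
n = 10
T1 = rng.standard_normal((n, 3))
T2 = rng.standard_normal((n, 4))
angles = principal_angles(T1, T2)
print("principal angles (rad):", angles)

# Friedrichs angle (Lemma 2.7): the (d+1)-th principal angle with d = dim(T1 ∩ T2).
# For generic random subspaces the intersection is trivial (d = 0):
print("Friedrichs angle (generic case):", angles[0])
```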

The next lemma gives expressions of the Riemannian gradient and Hessian (see Section A for definitions) of a partly smooth function.

Lemma 2.9. If $R \in \operatorname{PSF}_x(\mathcal{M})$, then for any $x' \in \mathcal{M}$ near $x$,
$$ \nabla_{\mathcal{M}} R(x') = P_{T_{x'}}\big(\partial R(x')\big). $$
In turn, for all $h \in T_{x'}$,
$$ \nabla^2_{\mathcal{M}} R(x') h = P_{T_{x'}} \nabla^2 \tilde{R}(x') h + \mathfrak{W}_{x'}\big(h, \, P_{T_{x'}^\perp} \nabla \tilde{R}(x')\big), $$
where $\tilde{R}$ is any smooth extension (representative) of $R$ on $\mathcal{M}$, and $\mathfrak{W}_{x'}(\cdot, \cdot) : T_{x'} \times T_{x'}^\perp \to T_{x'}$ is the Weingarten map of $\mathcal{M}$ at $x'$.

Proof. See [34, Fact 3.3].

3. Local linear convergence of Primal-Dual splitting methods

In this section, we present the main result of the paper, the local linear convergence analysis of Primal-Dual splitting methods. We start with the finite activity identification of the sequence $(x_k, v_k)$ generated by the methods, from which we further show that the fixed-point iteration of Primal-Dual splitting methods can be locally linearized, and local linear convergence follows naturally.

3.1. Finite activity identification

Let us first recall from [17,48] that, under a proper renorming, Algorithm 1 can be written as a Forward-Backward iteration. Let $\theta = 1$. From the definition of the proximity operator (4), we have that (2) is equivalent to
$$ -\begin{bmatrix} \nabla F & 0 \\ 0 & \nabla G^* \end{bmatrix} \begin{pmatrix} x_k \\ v_k \end{pmatrix} \in \begin{bmatrix} \partial R & L^* \\ -L & \partial J^* \end{bmatrix} \begin{pmatrix} x_{k+1} \\ v_{k+1} \end{pmatrix} + \begin{bmatrix} \mathrm{Id}_n/\gamma_R & -L^* \\ -L & \mathrm{Id}_m/\gamma_J \end{bmatrix} \begin{pmatrix} x_{k+1} - x_k \\ v_{k+1} - v_k \end{pmatrix}. \tag{10} $$
Let $\mathcal{K} = \mathbb{R}^n \times \mathbb{R}^m$ be a product space, $\mathrm{Id}$ the identity operator on $\mathcal{K}$, and define the following variable and operators
$$ z_k \overset{\text{def}}{=} \begin{pmatrix} x_k \\ v_k \end{pmatrix}, \quad A \overset{\text{def}}{=} \begin{bmatrix} \partial R & L^* \\ -L & \partial J^* \end{bmatrix}, \quad B \overset{\text{def}}{=} \begin{bmatrix} \nabla F & 0 \\ 0 & \nabla G^* \end{bmatrix}, \quad V \overset{\text{def}}{=} \begin{bmatrix} \mathrm{Id}_n/\gamma_R & -L^* \\ -L & \mathrm{Id}_m/\gamma_J \end{bmatrix}. \tag{11} $$
It is easy to verify that $A$ is maximal monotone [12] and $B$ is $\min\{\beta_F, \beta_G\}$-cocoercive. For $V$, denote $\nu = \big(1 - \sqrt{\gamma_R \gamma_J \|L\|^2}\big) \min\big\{\tfrac{1}{\gamma_R}, \tfrac{1}{\gamma_J}\big\}$; then $V$ is symmetric and $\nu$-positive definite [19,48]. Define $\mathcal{K}_V$ the Hilbert space induced by $V$. Now (10) can be reformulated as
$$ z_{k+1} = (V + A)^{-1}(V - B)(z_k) = (\mathrm{Id} + V^{-1} A)^{-1}(\mathrm{Id} - V^{-1} B)(z_k). \tag{12} $$
Clearly, (12) is the FB splitting on $\mathcal{K}_V$ [48]. When $F = G = 0$, it reduces to the metric PPA discussed in [13,28].

Before presenting the finite time activity identification under partial smoothness, we first recall the global convergence of Algorithm 1.

Lemma 3.1 (Convergence of Algorithm 1). Consider Algorithm 1 under assumptions (A.1)-(A.4). Let $\theta = 1$ and choose $\gamma_R, \gamma_J$ such that
$$ 2 \min\{\beta_F, \beta_G\} \, \min\big\{\tfrac{1}{\gamma_R}, \tfrac{1}{\gamma_J}\big\} \big(1 - \sqrt{\gamma_R \gamma_J \|L\|^2}\big) > 1. \tag{13} $$
Then there exists a Kuhn-Tucker pair $(x^\star, v^\star)$ such that $x^\star$ solves (P_P), $v^\star$ solves (P_D), and $(x_k, v_k) \to (x^\star, v^\star)$.

Proof. See [48, Corollary 4.2].

Remark 3.
(i) Assumption (A.4) is important to ensure the existence of Kuhn-Tucker pairs. There are sufficient conditions which ensure that (A.4) is satisfied. For instance, assuming that (P_P) has at least one solution and that some classical domain qualification condition holds (see e.g. [18, Proposition 4.3]), assumption (A.4) can be shown to be in force.
(ii) It is obvious from (13) that $\gamma_R \gamma_J \|L\|^2 < 1$, which is also the condition needed in [13] for convergence in the special case $F = G = 0$. The convergence condition in [21] differs from (13); however, $\gamma_R \gamma_J \|L\|^2 < 1$ is still a key condition. The values of $\gamma_R, \gamma_J$ can also be made to vary along the iterations, and convergence of the iteration remains under the rule provided in [19]. However, for the sake of brevity, we omit the details of this case here.
(iii) Lemma 3.1 addresses global convergence of the iterates provided by Algorithm 1 only for the case $\theta = 1$. For the choices $\theta \in [-1, 1[ \,\cup\, ]1, +\infty[$, so far the corresponding convergence of the iteration cannot be obtained directly, and a correction step as proposed in [28] for $\theta \in [-1, 1[$ is needed so that the iteration is a contraction. Unfortunately, such a correction step leads to a new iterative scheme, different from (2); see [28] for more details. In a very recent paper [50], the authors also proved the convergence of the Primal-Dual splitting method of [48] for the case $\theta \in [-1, 1]$ with a proper modification of the iterates. Since the main focus of this work is to investigate local convergence behaviour, the analysis of global convergence of Algorithm 1 for any $\theta \in [-1, +\infty[$ is beyond the scope of this paper. Thus, we will mainly consider the case $\theta = 1$ in our analysis. Nevertheless, as we will see later, locally $\theta > 1$ can give a faster convergence rate than the choice $\theta \in [-1, 1]$ for certain problems. This points to a future direction of research for designing new Primal-Dual splitting methods.

Given a solution pair $(x^\star, v^\star)$ of (P_P)-(P_D), we denote $\mathcal{M}^R_{x^\star}$ and $\mathcal{M}^{J^*}_{v^\star}$ the $C^2$-smooth manifolds that $x^\star$ and $v^\star$ live in respectively, and denote $T^R_{x^\star}$, $T^{J^*}_{v^\star}$ the tangent spaces of $\mathcal{M}^R_{x^\star}$, $\mathcal{M}^{J^*}_{v^\star}$ at $x^\star$ and $v^\star$ respectively.
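Condition (13) is easy to check numerically for candidate step sizes. The small helper below is ours (a sketch under the reconstruction of (13) given above, with hypothetical problem constants), not part of the paper.

```python
import numpy as np

# Checking condition (13) of Lemma 3.1 for candidate step-size pairs:
#   2*min(beta_F, beta_G)*min(1/gR, 1/gJ)*(1 - sqrt(gR*gJ)*||L||) > 1

def satisfies_condition_13(gR, gJ, beta_F, beta_G, L_norm):
    return 2.0 * min(beta_F, beta_G) * min(1.0 / gR, 1.0 / gJ) \
           * (1.0 - np.sqrt(gR * gJ) * L_norm) > 1.0

# Hypothetical example: ||L|| = 2, beta_F = beta_G = 1.
L_norm, beta_F, beta_G = 2.0, 1.0, 1.0
for gR, gJ in [(0.9, 0.2), (0.2, 0.2), (0.05, 0.5)]:
    print((gR, gJ), "->", satisfies_condition_13(gR, gJ, beta_F, beta_G, L_norm))
```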

Theorem 3.2 (Finite activity identification). Consider Algorithm 1 under assumptions (A.1)-(A.4). Let $\theta = 1$ and choose $\gamma_R, \gamma_J$ as in Lemma 3.1, so that $(x_k, v_k) \to (x^\star, v^\star)$, where $(x^\star, v^\star)$ is a Kuhn-Tucker pair that solves (P_P)-(P_D). If moreover $R \in \operatorname{PSF}_{x^\star}(\mathcal{M}^R_{x^\star})$ and $J^* \in \operatorname{PSF}_{v^\star}(\mathcal{M}^{J^*}_{v^\star})$, and the non-degeneracy condition
$$ -L^* v^\star - \nabla F(x^\star) \in \operatorname{ri}\big(\partial R(x^\star)\big), \qquad L x^\star - \nabla G^*(v^\star) \in \operatorname{ri}\big(\partial J^*(v^\star)\big) \tag{ND} $$
holds, then there exists a large enough $K \in \mathbb{N}$ such that for all $k \geq K$, $(x_k, v_k) \in \mathcal{M}^R_{x^\star} \times \mathcal{M}^{J^*}_{v^\star}$. Moreover,
(i) if $\mathcal{M}^R_{x^\star} = x^\star + T^R_{x^\star}$, then $T^R_{x_k} = T^R_{x^\star}$ and $x_k \in \mathcal{M}^R_{x^\star}$ hold for $k > K$;
(ii) if $\mathcal{M}^{J^*}_{v^\star} = v^\star + T^{J^*}_{v^\star}$, then $T^{J^*}_{v_k} = T^{J^*}_{v^\star}$ holds for $k > K$;
(iii) if $R$ is locally polyhedral around $x^\star$, then $\forall k \geq K$, $x_k \in \mathcal{M}^R_{x^\star} = x^\star + T^R_{x^\star}$, $T^R_{x_k} = T^R_{x^\star}$, $\nabla_{\mathcal{M}^R_{x^\star}} R(x_k) = \nabla_{\mathcal{M}^R_{x^\star}} R(x^\star)$, and $\nabla^2_{\mathcal{M}^R_{x^\star}} R(x_k) = 0$;
(iv) if $J^*$ is locally polyhedral around $v^\star$, then $\forall k \geq K$, $v_k \in \mathcal{M}^{J^*}_{v^\star} = v^\star + T^{J^*}_{v^\star}$, $T^{J^*}_{v_k} = T^{J^*}_{v^\star}$, $\nabla_{\mathcal{M}^{J^*}_{v^\star}} J^*(v_k) = \nabla_{\mathcal{M}^{J^*}_{v^\star}} J^*(v^\star)$, and $\nabla^2_{\mathcal{M}^{J^*}_{v^\star}} J^*(v_k) = 0$.

Remark 4.
(i) The non-degeneracy condition (ND) is a strengthened version of (1).
(ii) In general, we have no identification guarantees for $x_k$ and $v_k$ if the proximity operators are computed approximately, even if the approximation errors are summable, in which case one can still prove global convergence. The reason behind this is that in the exact case, under condition (ND), the proximal mapping of the partly smooth function and that of its restriction to $\mathcal{M}^R_{x^\star}$ locally agree near $x^\star$ (and similarly for $J^*$ and $v^\star$). This property can easily be violated if approximate proximal mappings are involved; see [34] for an example.
(iii) Theorem 3.2 only states the existence of $K$ after which identification of the sequences happens, but no bounds are provided. In [34,35], lower bounds on $K$ for the FB and DR methods are delivered. Though similar lower bounds can be obtained for the Primal-Dual splitting method, the proof is not a simple adaptation of that in [34], even if the Primal-Dual splitting method can be cast as a FB splitting in an appropriate metric. Indeed, the estimate in [34] is intimately tied to the fact that FB was applied to an optimization problem, while in the context of Primal-Dual splitting, FB is applied to a monotone inclusion. Since, moreover, these lower bounds are only of theoretical interest, we forgo the corresponding details here for the sake of conciseness.

Proof of Theorem 3.2. From the definition of the proximity operator (4) and the update of $x_{k+1}$ in (2), we have
$$ \tfrac{1}{\gamma_R}\big(x_k - \gamma_R \nabla F(x_k) - \gamma_R L^* v_k - x_{k+1}\big) \in \partial R(x_{k+1}), $$
which yields
$$ \operatorname{dist}\big({-L^* v^\star} - \nabla F(x^\star), \, \partial R(x_{k+1})\big) \leq \big\| \tfrac{1}{\gamma_R}(x_{k+1} - x_k) + \nabla F(x_k) - \nabla F(x^\star) + L^*(v_k - v^\star) \big\| \leq \tfrac{1}{\gamma_R}\|x_{k+1} - x_k\| + \tfrac{1}{\beta_F}\|x_k - x^\star\| + \|L\| \, \|v_k - v^\star\| \to 0. $$

Then, similarly for $v_{k+1}$, we have
$$ \tfrac{1}{\gamma_J}\big(v_k - \gamma_J \nabla G^*(v_k) + \gamma_J L \bar{x}_{k+1} - v_{k+1}\big) \in \partial J^*(v_{k+1}), $$
and
$$ \operatorname{dist}\big(L x^\star - \nabla G^*(v^\star), \, \partial J^*(v_{k+1})\big) \leq \big\| \tfrac{1}{\gamma_J}(v_{k+1} - v_k) + \nabla G^*(v_k) - \nabla G^*(v^\star) - L(\bar{x}_{k+1} - x^\star) \big\| \leq \tfrac{1}{\gamma_J}\|v_{k+1} - v_k\| + \tfrac{1}{\beta_G}\|v_k - v^\star\| + \|L\| \big((1+\theta)\|x_{k+1} - x^\star\| + \theta \|x_k - x^\star\|\big) \to 0. $$
By assumption, $J^* \in \Gamma_0(\mathbb{R}^m)$ and $R \in \Gamma_0(\mathbb{R}^n)$, hence they are sub-differentially continuous at every point of their respective domains [42, Example 13.30], and in particular at $v^\star$ and $x^\star$. It then follows that $J^*(v_k) \to J^*(v^\star)$ and $R(x_k) \to R(x^\star)$. Altogether with the non-degeneracy condition (ND), this shows that the conditions of [27, Theorem 5.3] are fulfilled for the pairs $\big(L x^\star - \nabla G^*(v^\star), \, J^*\big)$ and $\big({-L^* v^\star} - \nabla F(x^\star), \, R\big)$, and the finite identification claim follows.
(i) In this case, $\mathcal{M}^R_{x^\star}$ is an affine subspace, so $x_k \in \mathcal{M}^R_{x^\star}$ follows directly. Then, as $R$ is partly smooth at $x^\star$ relative to $\mathcal{M}^R_{x^\star}$, the sharpness property holds for all nearby points in $\mathcal{M}^R_{x^\star}$ [31, Proposition 2.10]. Thus, for $k$ large enough, we indeed have $T_{x_k}(\mathcal{M}^R_{x^\star}) = T^R_{x_k} = T^R_{x^\star}$ as claimed.
(ii) Similar to (i).
(iii) It is immediate to verify that a locally polyhedral function around $x^\star$ is indeed partly smooth relative to the affine subspace $x^\star + T^R_{x^\star}$, and thus the first claim follows from (i). For the rest, it is sufficient to observe that, by polyhedrality, for any $x \in \mathcal{M}^R_{x^\star}$ near $x^\star$, $\partial R(x) = \partial R(x^\star)$. Therefore, combining local normal sharpness [31, Proposition 2.10] and Lemma A.2 yields the second conclusion.
(iv) Similar to (iii).

3.2. Locally linearized iteration

Relying on the identification result, we are now able to show that the globally non-linear fixed-point iteration (12) can be locally linearized along the manifolds $\mathcal{M}^R_{x^\star} \times \mathcal{M}^{J^*}_{v^\star}$. As a result, the convergence rate of the iteration essentially boils down to analyzing the spectral properties of the matrix obtained in the linearization.

Given a Kuhn-Tucker pair $(x^\star, v^\star)$, define the following two functions
$$ \bar{R}(x) \overset{\text{def}}{=} R(x) + \langle x, \, L^* v^\star + \nabla F(x^\star) \rangle, \qquad \bar{J}^*(y) \overset{\text{def}}{=} J^*(y) - \langle y, \, L x^\star - \nabla G^*(v^\star) \rangle. \tag{14} $$
We have the following lemma.

Lemma 3.3. Let $(x^\star, v^\star)$ be a Kuhn-Tucker pair such that $R \in \operatorname{PSF}_{x^\star}(\mathcal{M}^R_{x^\star})$ and $J^* \in \operatorname{PSF}_{v^\star}(\mathcal{M}^{J^*}_{v^\star})$. Denote the Riemannian Hessians of $\bar{R}$ and $\bar{J}^*$ as
$$ H_R \overset{\text{def}}{=} \gamma_R P_{T^R_{x^\star}} \nabla^2_{\mathcal{M}^R_{x^\star}} \bar{R}(x^\star) P_{T^R_{x^\star}} \quad \text{and} \quad H_{J^*} \overset{\text{def}}{=} \gamma_J P_{T^{J^*}_{v^\star}} \nabla^2_{\mathcal{M}^{J^*}_{v^\star}} \bar{J}^*(v^\star) P_{T^{J^*}_{v^\star}}. \tag{15} $$
Then $H_R$ and $H_{J^*}$ are symmetric positive semi-definite under either of the following conditions:
(i) (ND) holds;

(ii) $\mathcal{M}^R_{x^\star}$ and $\mathcal{M}^{J^*}_{v^\star}$ are affine subspaces.
Define
$$ W_R \overset{\text{def}}{=} (\mathrm{Id}_n + H_R)^{-1} \quad \text{and} \quad W_{J^*} \overset{\text{def}}{=} (\mathrm{Id}_m + H_{J^*})^{-1}; \tag{16} $$
then both $W_R$ and $W_{J^*}$ are firmly non-expansive.

Proof. See [35, Lemma 6.1].

For the smooth functions $F$ and $G^*$, in addition to (A.1) and (A.2), we assume for the rest of the paper that
(A.5) $F$ and $G^*$ are locally $C^2$-smooth around $x^\star$ and $v^\star$ respectively.
Now define the restricted Hessians of $F$ and $G^*$,
$$ H_F \overset{\text{def}}{=} P_{T^R_{x^\star}} \nabla^2 F(x^\star) P_{T^R_{x^\star}} \quad \text{and} \quad H_{G^*} \overset{\text{def}}{=} P_{T^{J^*}_{v^\star}} \nabla^2 G^*(v^\star) P_{T^{J^*}_{v^\star}}. \tag{17} $$
Denote $\overline{H}_F \overset{\text{def}}{=} \mathrm{Id}_n - \gamma_R H_F$, $\overline{H}_{G^*} \overset{\text{def}}{=} \mathrm{Id}_m - \gamma_J H_{G^*}$, $\overline{L} \overset{\text{def}}{=} P_{T^{J^*}_{v^\star}} L P_{T^R_{x^\star}}$, and
$$ M_{PD} \overset{\text{def}}{=} \begin{bmatrix} W_R \overline{H}_F & -\gamma_R W_R \overline{L}^* \\ (1+\theta)\gamma_J W_{J^*} \overline{L} W_R \overline{H}_F - \theta \gamma_J W_{J^*} \overline{L} & \ W_{J^*} \overline{H}_{G^*} - (1+\theta)\gamma_R \gamma_J W_{J^*} \overline{L} W_R \overline{L}^* \end{bmatrix}. \tag{18} $$
We have the following proposition.

Proposition 3.4 (Local linearization). Suppose that Algorithm 1 is run under the identification conditions of Theorem 3.2, and moreover that assumption (A.5) holds. Then for all $k$ large enough,
$$ z_{k+1} - z^\star = M_{PD}(z_k - z^\star) + o(\|z_k - z^\star\|). \tag{19} $$

Remark 5.
(i) Consider the case of $(\gamma_R, \gamma_J)$ varying along the iterations, i.e. $\{(\gamma_{R,k}, \gamma_{J,k})\}_k$. According to the result of [34], (19) remains true if these parameters converge to constants such that condition (13) still holds.
(ii) Taking $\overline{H}_{G^*} = \mathrm{Id}_m$ (i.e. $G^* = 0$) in (18), one gets the linearized iteration associated to the Primal-Dual splitting method of [21]. If we further let $\overline{H}_F = \mathrm{Id}_n$, this corresponds to the linearized version of the method of [13].

Proof of Proposition 3.4. From the update of $x_k$ in (2), we have
$$ x_k - \gamma_R \nabla F(x_k) - \gamma_R L^* v_k - x_{k+1} \in \gamma_R \partial R(x_{k+1}), \qquad -\gamma_R \nabla F(x^\star) - \gamma_R L^* v^\star \in \gamma_R \partial R(x^\star). $$
Denote $\tau_k^R$ the parallel translation from $T^R_{x_{k+1}}$ to $T^R_{x^\star}$. Projecting onto the corresponding tangent spaces and applying the parallel translation gives
$$ \begin{aligned} \gamma_R \tau_k^R \nabla_{\mathcal{M}^R_{x^\star}} R(x_{k+1}) &= \tau_k^R P_{T^R_{x_{k+1}}}\big(x_k - \gamma_R \nabla F(x_k) - \gamma_R L^* v_k - x_{k+1}\big) \\ &= P_{T^R_{x^\star}}\big(x_k - \gamma_R \nabla F(x_k) - \gamma_R L^* v_k - x_{k+1}\big) + \big(\tau_k^R P_{T^R_{x_{k+1}}} - P_{T^R_{x^\star}}\big)\big(x_k - \gamma_R \nabla F(x_k) - \gamma_R L^* v_k - x_{k+1}\big), \\ \gamma_R \nabla_{\mathcal{M}^R_{x^\star}} R(x^\star) &= P_{T^R_{x^\star}}\big({-\gamma_R} \nabla F(x^\star) - \gamma_R L^* v^\star\big), \end{aligned} $$

which leads to
$$ \begin{aligned} &\gamma_R \tau_k^R \nabla_{\mathcal{M}^R_{x^\star}} R(x_{k+1}) - \gamma_R \nabla_{\mathcal{M}^R_{x^\star}} R(x^\star) \\ &= P_{T^R_{x^\star}}\Big(\big(x_k - \gamma_R \nabla F(x_k) - \gamma_R L^* v_k - x_{k+1}\big) - \big(x^\star - \gamma_R \nabla F(x^\star) - \gamma_R L^* v^\star - x^\star\big)\Big) \\ &\quad + \underbrace{\big(\tau_k^R P_{T^R_{x_{k+1}}} - P_{T^R_{x^\star}}\big)\big({-\gamma_R} \nabla F(x^\star) - \gamma_R L^* v^\star\big)}_{\text{Term 1}} \\ &\quad + \underbrace{\big(\tau_k^R P_{T^R_{x_{k+1}}} - P_{T^R_{x^\star}}\big)\Big(\big(x_k - \gamma_R \nabla F(x_k) - \gamma_R L^* v_k - x_{k+1}\big) + \big(\gamma_R \nabla F(x^\star) + \gamma_R L^* v^\star\big)\Big)}_{\text{Term 2}}. \end{aligned} \tag{20} $$
Moving Term 1 to the other side leads to
$$ \begin{aligned} &\gamma_R \tau_k^R \nabla_{\mathcal{M}^R_{x^\star}} R(x_{k+1}) - \gamma_R \nabla_{\mathcal{M}^R_{x^\star}} R(x^\star) - \big(\tau_k^R P_{T^R_{x_{k+1}}} - P_{T^R_{x^\star}}\big)\big({-\gamma_R} \nabla F(x^\star) - \gamma_R L^* v^\star\big) \\ &= \gamma_R \tau_k^R \nabla_{\mathcal{M}^R_{x^\star}} \bar{R}(x_{k+1}) - \gamma_R \nabla_{\mathcal{M}^R_{x^\star}} \bar{R}(x^\star) = \gamma_R P_{T^R_{x^\star}} \nabla^2_{\mathcal{M}^R_{x^\star}} \bar{R}(x^\star) P_{T^R_{x^\star}}(x_{k+1} - x^\star) + o(\|x_{k+1} - x^\star\|), \end{aligned} $$
where Lemma A.2 is applied. Since $x_{k+1} = \operatorname{prox}_{\gamma_R R}(x_k - \gamma_R \nabla F(x_k) - \gamma_R L^* v_k)$, $\operatorname{prox}_{\gamma_R R}$ is firmly non-expansive and $\mathrm{Id}_n - \gamma_R \nabla F$ is non-expansive (under the parameter setting), we have
$$ \big\| \big(x_k - \gamma_R \nabla F(x_k) - \gamma_R L^* v_k - x_{k+1}\big) - \big(x^\star - \gamma_R \nabla F(x^\star) - \gamma_R L^* v^\star - x^\star\big) \big\| \leq \big\| (\mathrm{Id}_n - \gamma_R \nabla F)(x_k) - (\mathrm{Id}_n - \gamma_R \nabla F)(x^\star) - \gamma_R(L^* v_k - L^* v^\star) \big\| \leq \|x_k - x^\star\| + \gamma_R \|L\| \|v_k - v^\star\|. \tag{21} $$
Therefore, for Term 2, owing to Lemma A.1, we have
$$ \big(\tau_k^R P_{T^R_{x_{k+1}}} - P_{T^R_{x^\star}}\big)\Big(\big(x_k - \gamma_R \nabla F(x_k) - \gamma_R L^* v_k - x_{k+1}\big) - \big(x^\star - \gamma_R \nabla F(x^\star) - \gamma_R L^* v^\star - x^\star\big)\Big) = o\big(\|x_k - x^\star\| + \gamma_R \|L\| \|v_k - v^\star\|\big). $$
Therefore, from (20), and applying $x_k - x^\star = P_{T^R_{x^\star}}(x_k - x^\star) + o(\|x_k - x^\star\|)$ [32, Lemma 5.1] to $(x_{k+1} - x^\star)$ and $(x_k - x^\star)$, we get
$$ (\mathrm{Id}_n + H_R)(x_{k+1} - x^\star) + o(\|x_{k+1} - x^\star\|) = (x_k - x^\star) - \gamma_R P_{T^R_{x^\star}}\big(\nabla F(x_k) - \nabla F(x^\star)\big) - \gamma_R P_{T^R_{x^\star}} L^*(v_k - v^\star) + o\big(\|x_k - x^\star\| + \gamma_R \|L\| \|v_k - v^\star\|\big). $$
Then, applying a Taylor expansion to $\nabla F$ and [32, Lemma 5.1] to $(v_k - v^\star)$,
$$ (\mathrm{Id}_n + H_R)(x_{k+1} - x^\star) = (\mathrm{Id}_n - \gamma_R H_F)(x_k - x^\star) - \gamma_R \overline{L}^*(v_k - v^\star) + o\big(\|x_k - x^\star\| + \gamma_R \|L\| \|v_k - v^\star\|\big). \tag{22} $$
Inverting $(\mathrm{Id}_n + H_R)$ and applying [32, Lemma 5.1] again, we get
$$ x_{k+1} - x^\star = W_R \overline{H}_F(x_k - x^\star) - \gamma_R W_R \overline{L}^*(v_k - v^\star) + o\big(\|x_k - x^\star\| + \gamma_R \|L\| \|v_k - v^\star\|\big). \tag{23} $$

Now, from the update of $v_{k+1}$,
$$ v_k - \gamma_J \nabla G^*(v_k) + \gamma_J L \bar{x}_{k+1} - v_{k+1} \in \gamma_J \partial J^*(v_{k+1}), \qquad -\gamma_J \nabla G^*(v^\star) + \gamma_J L x^\star \in \gamma_J \partial J^*(v^\star). $$
Denote $\tau_{k+1}^{J^*}$ the parallel translation from $T^{J^*}_{v_{k+1}}$ to $T^{J^*}_{v^\star}$. Then
$$ \begin{aligned} \gamma_J \tau_{k+1}^{J^*} \nabla_{\mathcal{M}^{J^*}_{v^\star}} J^*(v_{k+1}) &= \tau_{k+1}^{J^*} P_{T^{J^*}_{v_{k+1}}}\big(v_k - \gamma_J \nabla G^*(v_k) + \gamma_J L \bar{x}_{k+1} - v_{k+1}\big) \\ &= P_{T^{J^*}_{v^\star}}\big(v_k - \gamma_J \nabla G^*(v_k) + \gamma_J L \bar{x}_{k+1} - v_{k+1}\big) + \big(\tau_{k+1}^{J^*} P_{T^{J^*}_{v_{k+1}}} - P_{T^{J^*}_{v^\star}}\big)\big(v_k - \gamma_J \nabla G^*(v_k) + \gamma_J L \bar{x}_{k+1} - v_{k+1}\big), \\ \gamma_J \nabla_{\mathcal{M}^{J^*}_{v^\star}} J^*(v^\star) &= P_{T^{J^*}_{v^\star}}\big({-\gamma_J} \nabla G^*(v^\star) + \gamma_J L x^\star\big), \end{aligned} $$
which leads to
$$ \begin{aligned} &\gamma_J \tau_{k+1}^{J^*} \nabla_{\mathcal{M}^{J^*}_{v^\star}} J^*(v_{k+1}) - \gamma_J \nabla_{\mathcal{M}^{J^*}_{v^\star}} J^*(v^\star) \\ &= P_{T^{J^*}_{v^\star}}\Big(\big(v_k - \gamma_J \nabla G^*(v_k) + \gamma_J L \bar{x}_{k+1} - v_{k+1}\big) - \big(v^\star - \gamma_J \nabla G^*(v^\star) + \gamma_J L x^\star - v^\star\big)\Big) \\ &\quad + \underbrace{\big(\tau_{k+1}^{J^*} P_{T^{J^*}_{v_{k+1}}} - P_{T^{J^*}_{v^\star}}\big)\big(\gamma_J L x^\star - \gamma_J \nabla G^*(v^\star)\big)}_{\text{Term 3}} \\ &\quad + \underbrace{\big(\tau_{k+1}^{J^*} P_{T^{J^*}_{v_{k+1}}} - P_{T^{J^*}_{v^\star}}\big)\Big(\big(v_k - \gamma_J \nabla G^*(v_k) + \gamma_J L \bar{x}_{k+1} - v_{k+1}\big) + \big(\gamma_J \nabla G^*(v^\star) - \gamma_J L x^\star\big)\Big)}_{\text{Term 4}}. \end{aligned} \tag{24} $$
Similarly to the previous analysis, for Term 3, move it to the left-hand side and apply Lemma A.2:
$$ \gamma_J \tau_{k+1}^{J^*} \nabla_{\mathcal{M}^{J^*}_{v^\star}} J^*(v_{k+1}) - \gamma_J \nabla_{\mathcal{M}^{J^*}_{v^\star}} J^*(v^\star) - \big(\tau_{k+1}^{J^*} P_{T^{J^*}_{v_{k+1}}} - P_{T^{J^*}_{v^\star}}\big)\big(\gamma_J L x^\star - \gamma_J \nabla G^*(v^\star)\big) = \gamma_J \tau_{k+1}^{J^*} \nabla_{\mathcal{M}^{J^*}_{v^\star}} \bar{J}^*(v_{k+1}) - \gamma_J \nabla_{\mathcal{M}^{J^*}_{v^\star}} \bar{J}^*(v^\star) = H_{J^*}(v_{k+1} - v^\star) + o(\|v_{k+1} - v^\star\|). $$
Since $\theta \leq 1$, we have
$$ \|\bar{x}_{k+1} - x^\star\| \leq (1+\theta)\|x_{k+1} - x^\star\| + \theta\|x_k - x^\star\| \leq 2\big(\|x_k - x^\star\| + \gamma_R\|L\|\|v_k - v^\star\|\big) + \|x_k - x^\star\| = 3\|x_k - x^\star\| + 2\gamma_R\|L\|\|v_k - v^\star\|. $$
Then, for Term 4, since $\gamma_R \gamma_J \|L\|^2 < 1$, $\operatorname{prox}_{\gamma_J J^*}$ is firmly non-expansive and $\mathrm{Id}_m - \gamma_J \nabla G^*$ is non-expansive, we have
$$ \big(\tau_{k+1}^{J^*} P_{T^{J^*}_{v_{k+1}}} - P_{T^{J^*}_{v^\star}}\big)\Big(\big(v_k - \gamma_J \nabla G^*(v_k) + \gamma_J L \bar{x}_{k+1} - v_{k+1}\big) - \big(v^\star - \gamma_J \nabla G^*(v^\star) + \gamma_J L x^\star - v^\star\big)\Big) = o\big(\|v_k - v^\star\| + \gamma_J \|L\| \|x_k - x^\star\|\big). $$

Therefore, from (24), applying [32, Lemma 5.1] to $(v_{k+1} - v^\star)$ and $(v_k - v^\star)$, we get
$$ (\mathrm{Id}_m + H_{J^*})(v_{k+1} - v^\star) = (\mathrm{Id}_m - \gamma_J H_{G^*})(v_k - v^\star) + \gamma_J \overline{L}(\bar{x}_{k+1} - x^\star) + o\big(\|v_k - v^\star\| + \gamma_J \|L\| \|x_k - x^\star\|\big). \tag{25} $$
Then, similarly to (23), we get from (25)
$$ \begin{aligned} v_{k+1} - v^\star &= W_{J^*} \overline{H}_{G^*}(v_k - v^\star) + \gamma_J W_{J^*} \overline{L}(\bar{x}_{k+1} - x^\star) + o\big(\|v_k - v^\star\| + \gamma_J \|L\| \|x_k - x^\star\|\big) \\ &= W_{J^*} \overline{H}_{G^*}(v_k - v^\star) + (1+\theta)\gamma_J W_{J^*} \overline{L}(x_{k+1} - x^\star) - \theta \gamma_J W_{J^*} \overline{L}(x_k - x^\star) + o\big(\|v_k - v^\star\| + \gamma_J \|L\| \|x_k - x^\star\|\big) \\ &= W_{J^*} \overline{H}_{G^*}(v_k - v^\star) - \theta \gamma_J W_{J^*} \overline{L}(x_k - x^\star) + (1+\theta)\gamma_J W_{J^*} \overline{L}\big(W_R \overline{H}_F(x_k - x^\star) - \gamma_R W_R \overline{L}^*(v_k - v^\star)\big) \\ &\qquad + o\big(\|x_k - x^\star\| + \gamma_R \|L\| \|v_k - v^\star\|\big) + o\big(\|v_k - v^\star\| + \gamma_J \|L\| \|x_k - x^\star\|\big) \\ &= \big(W_{J^*} \overline{H}_{G^*} - (1+\theta)\gamma_R \gamma_J W_{J^*} \overline{L} W_R \overline{L}^*\big)(v_k - v^\star) + \big((1+\theta)\gamma_J W_{J^*} \overline{L} W_R \overline{H}_F - \theta \gamma_J W_{J^*} \overline{L}\big)(x_k - x^\star) \\ &\qquad + o\big(\|x_k - x^\star\| + \gamma_R \|L\| \|v_k - v^\star\|\big) + o\big(\|v_k - v^\star\| + \gamma_J \|L\| \|x_k - x^\star\|\big). \end{aligned} \tag{26} $$
Now we consider the small-$o$ terms in (22) and (25). First, for two constants $a_1, a_2$ we have
$$ a_1 + a_2 = \sqrt{(a_1 + a_2)^2} \leq \sqrt{2(a_1^2 + a_2^2)} = \sqrt{2}\, \Big\| \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \Big\|. $$
Denote $b = \max\{1, \gamma_R\|L\|, \gamma_J\|L\|\}$; then
$$ \big(\|v_k - v^\star\| + \gamma_J\|L\|\|x_k - x^\star\|\big) + \big(\|x_k - x^\star\| + \gamma_R\|L\|\|v_k - v^\star\|\big) \leq 2b\big(\|x_k - x^\star\| + \|v_k - v^\star\|\big) \leq 2\sqrt{2}\, b\, \Big\| \begin{pmatrix} x_k - x^\star \\ v_k - v^\star \end{pmatrix} \Big\|. $$
Combining this with (23) and (26), and absorbing the constants into the small-$o$ term, leads to the claimed result.

Now we study the spectral properties of $M_{PD}$. Let $p \overset{\text{def}}{=} \dim(T^R_{x^\star})$ and $q \overset{\text{def}}{=} \dim(T^{J^*}_{v^\star})$ be the dimensions of the tangent spaces $T^R_{x^\star}$ and $T^{J^*}_{v^\star}$ respectively, and define $S^R_{x^\star} \overset{\text{def}}{=} (T^R_{x^\star})^\perp$ and $S^{J^*}_{v^\star} \overset{\text{def}}{=} (T^{J^*}_{v^\star})^\perp$. Assume that $q \leq p$ (alternative situations are discussed in Remark 6). Let $\overline{L} = X \Sigma_{\overline{L}} Y^*$ be the singular value decomposition of $\overline{L}$, and define its rank $l \overset{\text{def}}{=} \operatorname{rank}(\overline{L})$. Clearly, we have $l \leq p$. Denote $M_{PD}^k$ the $k$-th power of $M_{PD}$.

Lemma 3.5 (Convergence property of $M_{PD}$). The following holds for the matrix $M_{PD}$:
(i) If $\theta = 1$, then there exists a finite matrix $M_{PD}^\infty$ to which $M_{PD}^k$ converges, i.e.
$$ M_{PD}^\infty \overset{\text{def}}{=} \lim_{k \to \infty} M_{PD}^k. \tag{27} $$

Moreover, $\forall k \in \mathbb{N}$, $M_{PD}^k - M_{PD}^\infty = (M_{PD} - M_{PD}^\infty)^k$ and $\rho(M_{PD} - M_{PD}^\infty) < 1$. Given any $\rho \in ]\rho(M_{PD} - M_{PD}^\infty), 1[$, there is $K$ large enough such that for all $k \geq K$,
$$ \|M_{PD}^k - M_{PD}^\infty\| = O(\rho^k). $$
(ii) If $F = G = 0$, and $R, J$ are locally polyhedral around $(x^\star, v^\star)$, then given any $\theta \in ]0, 1]$, $M_{PD}$ is convergent with
$$ M_{PD}^\infty = \begin{bmatrix} Y & \\ & X \end{bmatrix} \begin{bmatrix} 0_l & & & \\ & \mathrm{Id}_{n-l} & & \\ & & 0_l & \\ & & & \mathrm{Id}_{m-l} \end{bmatrix} \begin{bmatrix} Y^* & \\ & X^* \end{bmatrix}. \tag{28} $$
Moreover, all the eigenvalues of $M_{PD} - M_{PD}^\infty$ are complex, with spectral radius
$$ \rho(M_{PD} - M_{PD}^\infty) = \sqrt{1 - \theta \gamma_R \gamma_J \sigma_{\min}^2} < 1, \tag{29} $$
where $\sigma_{\min}$ is the smallest non-zero singular value of $\overline{L}$.

Remark 6. We briefly discuss several possible cases of (28) when $F = G = 0$ and $R, J$ are locally polyhedral around $(x^\star, v^\star)$.
(i) If $L = \mathrm{Id}$, then $\overline{L} = P_{T^{J^*}_{v^\star}} P_{T^R_{x^\star}}$ and $\sigma_{\min}$ is the cosine of the largest principal angle (yet strictly smaller than $\pi/2$) between the tangent spaces $T^R_{x^\star}$ and $T^{J^*}_{v^\star}$.
(ii) Regarding the spectral radius formula (29), let us consider the Arrow-Hurwicz scheme [3], i.e. $\theta = 0$. Let $R, J$ be locally polyhedral, and $\Sigma_{\overline{L}} = (\sigma_j)_{j=1,\dots,l}$ be the singular values of $\overline{L}$; then the eigenvalues of $M_{PD}$ are
$$ \rho_j = \tfrac{1}{2}\Big( \big(2 - \gamma_R \gamma_J \sigma_j^2\big) \pm \sqrt{\gamma_R \gamma_J \sigma_j^2 \big(\gamma_R \gamma_J \sigma_j^2 - 4\big)} \Big), \quad j \in \{1, \dots, l\}, \tag{30} $$
which are complex since $\gamma_R \gamma_J \sigma_j^2 \leq \gamma_R \gamma_J \|L\|^2 < 1$. Moreover,
$$ |\rho_j| = \tfrac{1}{2}\sqrt{\big(2 - \gamma_R \gamma_J \sigma_j^2\big)^2 - \gamma_R \gamma_J \sigma_j^2\big(\gamma_R \gamma_J \sigma_j^2 - 4\big)} = 1. $$
This implies that $M_{PD}$ has multiple eigenvalues with absolute value equal to 1; then, owing to the result of [5], $M_{PD}$ is not convergent. Furthermore, for $\theta \in [-1, 0[$ we have $\sqrt{1 - \theta \gamma_R \gamma_J \sigma_{\min}^2} > 1$, meaning that $M_{PD}$ is not convergent either. This implies that the correction step proposed in [28] is necessary for $\theta \in [-1, 0]$. The discussion of $\theta > 1$ is left to Section 4.

Proof of Lemma 3.5. (i) When $\theta = 1$, $M_{PD}$ becomes
$$ M_{PD} = \begin{bmatrix} W_R \overline{H}_F & -\gamma_R W_R \overline{L}^* \\ 2\gamma_J W_{J^*} \overline{L} W_R \overline{H}_F - \gamma_J W_{J^*} \overline{L} & W_{J^*} \overline{H}_{G^*} - 2\gamma_R \gamma_J W_{J^*} \overline{L} W_R \overline{L}^* \end{bmatrix}. \tag{31} $$
Next we show that $M_{PD}$ is averaged non-expansive.

First, define the following matrices
$$ A = \begin{bmatrix} H_R/\gamma_R & \overline{L}^* \\ -\overline{L} & H_{J^*}/\gamma_J \end{bmatrix}, \qquad B = \begin{bmatrix} H_F & 0 \\ 0 & H_{G^*} \end{bmatrix}, \qquad V = \begin{bmatrix} \mathrm{Id}_n/\gamma_R & -\overline{L}^* \\ -\overline{L} & \mathrm{Id}_m/\gamma_J \end{bmatrix}, \tag{32} $$
where $A$ is maximal monotone [12], $B$ is $\min\{\beta_F, \beta_G\}$-cocoercive, and $V$ is $\nu$-positive definite with $\nu = \big(1 - \sqrt{\gamma_R \gamma_J \|L\|^2}\big)\min\{\tfrac{1}{\gamma_R}, \tfrac{1}{\gamma_J}\}$. Now we have
$$ V + A = \begin{bmatrix} \tfrac{\mathrm{Id}_n + H_R}{\gamma_R} & 0 \\ -2\overline{L} & \tfrac{\mathrm{Id}_m + H_{J^*}}{\gamma_J} \end{bmatrix}, \qquad (V + A)^{-1} = \begin{bmatrix} \gamma_R W_R & 0 \\ 2\gamma_R \gamma_J W_{J^*} \overline{L} W_R & \gamma_J W_{J^*} \end{bmatrix}, \qquad V - B = \begin{bmatrix} \tfrac{1}{\gamma_R}\overline{H}_F & -\overline{L}^* \\ -\overline{L} & \tfrac{1}{\gamma_J}\overline{H}_{G^*} \end{bmatrix}. $$
As a result, we get
$$ (V + A)^{-1}(V - B) = \begin{bmatrix} \gamma_R W_R & 0 \\ 2\gamma_R \gamma_J W_{J^*} \overline{L} W_R & \gamma_J W_{J^*} \end{bmatrix} \begin{bmatrix} \tfrac{1}{\gamma_R}\overline{H}_F & -\overline{L}^* \\ -\overline{L} & \tfrac{1}{\gamma_J}\overline{H}_{G^*} \end{bmatrix} = \begin{bmatrix} W_R \overline{H}_F & -\gamma_R W_R \overline{L}^* \\ 2\gamma_J W_{J^*} \overline{L} W_R \overline{H}_F - \gamma_J W_{J^*} \overline{L} & W_{J^*} \overline{H}_{G^*} - 2\gamma_R \gamma_J W_{J^*} \overline{L} W_R \overline{L}^* \end{bmatrix}, $$
which is exactly (31). From Lemma 2.4 we know that $M_{PD} : \mathcal{K}_V \to \mathcal{K}_V$ is averaged non-expansive, hence it is convergent [6]. Then, in the induced matrix norm,
$$ \lim_{k \to \infty} \|M_{PD}^k - M_{PD}^\infty\|_V = \lim_{k \to \infty} \|(M_{PD} - M_{PD}^\infty)^k\|_V = 0. $$
Since we are in a finite dimensional space and $V$ is an isomorphism, the above limit implies that $\lim_{k \to \infty} \|(M_{PD} - M_{PD}^\infty)^k\| = 0$, which means that $\rho(M_{PD} - M_{PD}^\infty) < 1$. The rest of the proof is classical, using the spectral radius formula, see e.g. [5, Theorem 2.12(i)].
(ii) When $R$ and $J$ are locally polyhedral, then $W_R = \mathrm{Id}_n$ and $W_{J^*} = \mathrm{Id}_m$; together with $F = 0$, $G = 0$, for any $\theta \in [0, 1]$ we have
$$ M_{PD} = \begin{bmatrix} \mathrm{Id}_n & -\gamma_R \overline{L}^* \\ \gamma_J \overline{L} & \mathrm{Id}_m - (1+\theta)\gamma_R \gamma_J \overline{L}\,\overline{L}^* \end{bmatrix}. \tag{33} $$
With the SVD $\overline{L} = X \Sigma_{\overline{L}} Y^*$, for $M_{PD}$ we have
$$ M_{PD} = \begin{bmatrix} Y & \\ & X \end{bmatrix} \underbrace{\begin{bmatrix} \mathrm{Id}_n & -\gamma_R \Sigma_{\overline{L}}^* \\ \gamma_J \Sigma_{\overline{L}} & \mathrm{Id}_m - (1+\theta)\gamma_R \gamma_J \Sigma_{\overline{L}} \Sigma_{\overline{L}}^* \end{bmatrix}}_{M_\Sigma} \begin{bmatrix} Y^* & \\ & X^* \end{bmatrix}. $$

18 Since we assume that rank(l) = l p, then Σ L can be represented as Σ L = [ ] Σl 0 l,n l, 0 m l,l 0 m l,n l where Σ l = (σ j ) j=1,...,l. Back to M Σ, we have Id l 0 l,n l Σ l 0 l,m l 0 M Σ = n l,l Id n l 0 n l,l 0 n l,m l Σ l 0 l,n l Id l (1 + θ) Σ 2 l 0. l,m l 0 m l,l 0 m l,n l 0 m l,l Id m l Let s study the eigenvalues of M Σ, M Σ ρid m+n (1 ρ)id l 0 l,n l Σ l 0 l,m l 0 = n l,l (1 ρ)id n l 0 n l,l 0 n l,m l γ J Σ l 0 l,n l (1 ρ)id l (1 + θ) Σ 2 l 0 l,m l 0 m l,l 0 m l,n l 0 m l,l (1 ρ)id m l [ ] = (1 ρ) m+n 2l (1 ρ)idl Σ l Σ l (1 ρ)id l (1 + θ) Σ 2. l Since ( Σ l )((1 ρ)id l ) = ((1 ρ)id l )( Σ l ), then by [43, Theorem 3], we have M Σ ρid a+b = (1 ρ) m+n 2l [ (1 ρ)idl Σ l Σ l (1 ρ)id l (1 + θ) Σ 2 l ] = (1 ρ) m+n 2l [ (1 ρ)((1 ρ)id l (1 + θ) Σ 2 l ) + Σ lσ l ] = (1 ρ) m+n 2l [ (1 ρ) 2 Id l (1 ρ)(1 + θ) Σ 2 l + Σ lσ l ] = (1 ρ) m+n 2l l j=1 (ρ2 (2 (1 + θ) σ 2 j )ρ + (1 θ σ 2 j )). For the eigenvalues ρ, clearly, except the 1 s, we have for j = 1,..., l ρ j = (2 (1 + θ) σ2 j ) ± (1 + θ) 2 γ 2 J γ2 σ4 j 4 σ2 j 2 Since σ 2 j L 2 < 1, then ρ j are complex and ρ j = 1 (2 ) (1 + θ)γj γ 2 σj 2 2 ( (1 + θ) 2 γ 2γ2σ4 J j 4γ ) γ J σj 2 = 1 θ σj 2 < 1.. As a result, we also obtain the M, which reads PD M PD = [ Y X ] 0 l Id n l 0 l Id m l [ Y X ]. 18

Corollary 3.6. Suppose that Algorithm 1 is run under the identification conditions of Theorem 3.2, and moreover that assumption (A.5) holds. Then the following holds:
(i) the linearized iteration (19) is equivalent to
$$ (\mathrm{Id} - M_{PD}^\infty)(z_{k+1} - z^\star) = (M_{PD} - M_{PD}^\infty)(\mathrm{Id} - M_{PD}^\infty)(z_k - z^\star) + o\big(\|(\mathrm{Id} - M_{PD}^\infty)(z_k - z^\star)\|\big). \tag{34} $$
(ii) If moreover $R, J$ are locally polyhedral around $(x^\star, v^\star)$, and $F, G$ are quadratic, then $M_{PD}^\infty(z_k - z^\star) = 0$ for all $k$ large enough, and (34) becomes
$$ z_{k+1} - z^\star = (M_{PD} - M_{PD}^\infty)(z_k - z^\star). \tag{35} $$

Proof. See [35, Corollary 5.1].

3.3. Local linear convergence

Finally, we are able to present the local linear convergence result.

Theorem 3.7 (Local linear convergence). Suppose that Algorithm 1 is run under the identification conditions of Theorem 3.2, and moreover that assumption (A.5) holds. Then:
(i) given any $\rho \in ]\rho(M_{PD} - M_{PD}^\infty), 1[$, there exists a $K$ large enough such that for all $k \geq K$,
$$ \|(\mathrm{Id} - M_{PD}^\infty)(z_k - z^\star)\| = O(\rho^{k-K}). \tag{36} $$
(ii) If moreover $R, J$ are locally polyhedral around $(x^\star, v^\star)$, and $F, G$ are quadratic, then there exists a $K$ large enough such that for all $k \geq K$, we have directly
$$ \|z_k - z^\star\| = O(\rho^{k-K}) \tag{37} $$
for $\rho \in [\rho(M_{PD} - M_{PD}^\infty), 1[$.

Proof. See [35, Theorem 5.1].

Remark 7.
(i) Similarly to Proposition 3.4 and Remark 5, the above result remains valid if $(\gamma_R, \gamma_J)$ are varying yet convergent. However, the local convergence rate of $\|z_k - z^\star\|$ will depend on how fast $\{(\gamma_{R,k}, \gamma_{J,k})\}_k$ converges: if the parameters converge at a sublinear rate, then the convergence rate of $\|z_k - z^\star\|$ will eventually become sublinear. See [35, Section 8.3] for the case of the Douglas-Rachford splitting method.
(ii) When $F = G = 0$ and both $R$ and $J$ are locally polyhedral around $(x^\star, v^\star)$, the convergence rate of the Primal-Dual splitting method is controlled by $\theta$ and $\gamma_R \gamma_J$, as shown in (29); see the upcoming section for a detailed discussion. For general situations (i.e. $F, G$ are nontrivial and $R, J$ are general partly smooth functions), the factors that contribute to the local convergence rate are much more involved, such as the Riemannian Hessians of $R$ and $J^*$.
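In experiments, the local rate predicted by Theorem 3.7 can be estimated from the iterates themselves. The helper below is ours (a sketch, not from the paper): it fits the slope of $\log\|z_k - z^\star\|$ over the tail of a run, which can then be compared with $\rho(M_{PD} - M_{PD}^\infty)$.

```python
import numpy as np

# Estimate an empirical local linear rate from errors e_k = ||z_k - z*||:
# if e_k ~ C * rho^k, then log(e_k) is affine in k and the slope gives log(rho).

def estimate_rate(errors, tail=100):
    e = np.asarray(errors, dtype=float)
    e = e[-tail:]
    e = e[e > 1e-14]                        # drop values at numerical precision
    k = np.arange(len(e))
    slope, _ = np.polyfit(k, np.log(e), 1)  # least-squares fit of log(e_k)
    return np.exp(slope)

# Synthetic sanity check with a known rate:
true_rho = 0.97
errs = 5.0 * true_rho ** np.arange(400)
print("estimated rate:", estimate_rate(errs))   # ~0.97
```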

4. Discussions

In this part, we present several discussions on the obtained local linear convergence result, including the effect of $\theta \geq 1$, local oscillation, and connections with the FB and DR methods. To make the discussion easier to deliver, for the rest of this section we focus on the case where $F = G = 0$, i.e. the Primal-Dual splitting method of [13], and moreover $R, J$ are locally polyhedral around the Kuhn-Tucker pair $(x^\star, v^\star)$. Under such a setting, the matrix defined in (18) becomes
$$ M_{PD} \overset{\text{def}}{=} \begin{bmatrix} \mathrm{Id}_n & -\gamma_R \overline{L}^* \\ \gamma_J \overline{L} & \mathrm{Id}_m - (1+\theta)\gamma_R \gamma_J \overline{L}\,\overline{L}^* \end{bmatrix}. \tag{38} $$

4.1. Choice of θ

Owing to Lemma 3.5, the matrix $M_{PD}$ in (38) is convergent for $\theta \in ]0, 1]$, see Eq. (28), with spectral radius
$$ \rho(M_{PD} - M_{PD}^\infty) = \sqrt{1 - \theta \gamma_R \gamma_J \sigma_{\min}^2} < 1, \tag{39} $$
with $\sigma_{\min}$ being the smallest non-zero singular value of $\overline{L}$. In general, given a solution pair $(x^\star, v^\star)$, $\sigma_{\min}$ is fixed, hence the spectral radius $\rho(M_{PD} - M_{PD}^\infty)$ is simply controlled by $\theta$ and the product $\gamma_R \gamma_J$. To make the local convergence rate as fast as possible, it is obvious that we need to make the value of $\theta$ as big as possible. Recall from the global convergence of the Primal-Dual splitting method, or the result of [13], that $\gamma_R \gamma_J \|L\|^2 < 1$. Denote $\sigma_{\max}$ the biggest singular value of $\overline{L}$. It is then straightforward that $\gamma_R \gamma_J \sigma_{\max}^2 \leq \gamma_R \gamma_J \|L\|^2 < 1$ and moreover
$$ \rho(M_{PD} - M_{PD}^\infty) = \sqrt{1 - \theta \gamma_R \gamma_J \sigma_{\min}^2} > \sqrt{1 - \theta (\sigma_{\min}/\|L\|)^2} \geq \sqrt{1 - \theta (\sigma_{\min}/\sigma_{\max})^2}. \tag{40} $$
If we define $\mathrm{cnd} \overset{\text{def}}{=} \sigma_{\max}/\sigma_{\min}$, the condition number of $\overline{L}$, then we have $\rho(M_{PD} - M_{PD}^\infty) > \sqrt{1 - \theta (1/\mathrm{cnd})^2}$. To this end, it is clear that $\theta = 1$ gives the best convergence rate among $\theta \in [-1, 1]$.

Next, let us look at what happens locally if we choose $\theta > 1$. The spectral radius formula (39) implies that a bigger value of $\theta$ yields a smaller spectral radius $\rho(M_{PD} - M_{PD}^\infty)$. Therefore, locally we should choose $\theta$ as big as possible. However, there is an upper bound on $\theta$, which is discussed below. Following Remark 6, let $\Sigma_{\overline{L}} = (\sigma_j)_{j=1,\dots,l}$ be the singular values of $\overline{L}$, and let $\rho_j$ be the eigenvalues of $M_{PD} - M_{PD}^\infty$; we have seen that $\rho_j$ is complex with
$$ \rho_j = \tfrac{1}{2}\Big( \big(2 - (1+\theta)\gamma_R\gamma_J\sigma_j^2\big) \pm \sqrt{(1+\theta)^2\gamma_R^2\gamma_J^2\sigma_j^4 - 4\gamma_R\gamma_J\sigma_j^2} \Big), \qquad |\rho_j| = \sqrt{1 - \theta\gamma_R\gamma_J\sigma_j^2}. $$
Now let $\theta > 1$. To ensure that $|\rho_j|$ makes sense for all $j \in \{1, \dots, l\}$, it must hold that
$$ 1 - \theta\gamma_R\gamma_J\sigma_{\max}^2 \geq 0 \iff \theta \leq \frac{1}{\gamma_R\gamma_J\sigma_{\max}^2}, $$
which means that $\theta$ is indeed bounded from above.
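The trade-off above is easy to visualize numerically. The sketch below is ours and uses hypothetical singular values of $\overline{L}$ and step sizes; it evaluates the rate formula (39) over a grid of $\theta$ up to the upper bound $1/(\gamma_R\gamma_J\sigma_{\max}^2)$ derived above.

```python
import numpy as np

# Effect of theta on the local rate (39) in the polyhedral case F = G = 0.
# Toy data (assumed): singular values of Lbar and step sizes with gR*gJ*sigma_max^2 < 1.
sig = np.array([1.6, 1.0, 0.4])                  # hypothetical singular values of Lbar
gR, gJ = 0.5, 0.7
assert gR * gJ * sig.max() ** 2 < 1.0

theta_max = 1.0 / (gR * gJ * sig.max() ** 2)     # upper bound on theta derived above
print("theta upper bound:", theta_max)

for theta in np.linspace(0.25, theta_max, 6):
    rho = np.sqrt(1.0 - theta * gR * gJ * sig.min() ** 2)   # formula (39)
    print(f"theta = {theta:5.2f}  ->  rho = {rho:.4f}")      # rho decreases as theta grows
```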

Unfortunately, since $\overline{L} = P_{T^{J^*}_{v^\star}} L P_{T^R_{x^\star}}$, this upper bound can only be computed if we had the solution pair $(x^\star, v^\star)$. However, in practice one can use backtracking or an Armijo-Goldstein-type rule to find a proper $\theta$. See Section 6.4 for an illustration of the online search of $\theta$. It should be noted that such an updating rule can also be applied to $\gamma_R \gamma_J$, since $\|\overline{L}\| \leq \|L\|$. Moreover, in practice one can choose to enlarge either $\theta$ or $\gamma_R \gamma_J$, as they have a very similar acceleration outcome.

Remark 8. It should be noted that the above discussion on the effect of $\theta > 1$ may only be valid for the case $F = 0$, $G = 0$, i.e. the Primal-Dual splitting method of [13]. If $F$ and/or $G$ do not vanish, then locally $\theta < 1$ may give a faster convergence rate.

4.2. Oscillations

For the inertial Forward-Backward and FISTA [7] methods, it is shown in [34] that they locally oscillate when the inertial momentum is too high (see [34, Section 4.4] for more details). When solving certain types of problems (i.e. $F = G = 0$ and $R, J$ locally polyhedral around the solution pair $(x^\star, v^\star)$), the Primal-Dual splitting method also locally oscillates (see Figure 6 for an illustration). As revealed in the proof of Lemma 3.5, all the eigenvalues of $M_{PD} - M_{PD}^\infty$ in (38) are complex. This means that, locally, the sequences generated by the Primal-Dual splitting iteration may oscillate. For $\sigma_{\min}$, the smallest non-zero singular value of $\overline{L}$, one of its corresponding eigenvalues of $M_{PD}$ reads
$$ \rho_{\sigma_{\min}} = \tfrac{1}{2}\Big( \big(2 - (1+\theta)\gamma_R\gamma_J\sigma_{\min}^2\big) + \sqrt{(1+\theta)^2\gamma_R^2\gamma_J^2\sigma_{\min}^4 - 4\gamma_R\gamma_J\sigma_{\min}^2} \Big), \quad \text{with} \quad (1+\theta)^2\gamma_R^2\gamma_J^2\sigma_{\min}^4 - 4\gamma_R\gamma_J\sigma_{\min}^2 < 0. $$
Denote $\omega$ the argument of $\rho_{\sigma_{\min}}$; then
$$ \cos(\omega) = \frac{2 - (1+\theta)\gamma_R\gamma_J\sigma_{\min}^2}{2\sqrt{1 - \theta\gamma_R\gamma_J\sigma_{\min}^2}}. \tag{41} $$
The oscillation period of the sequence $\|z_k - z^\star\|$ is then exactly $\pi/\omega$. See Figure 6 for an illustration.
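The predicted oscillation period $\pi/\omega$ from (41) is straightforward to evaluate; the snippet below is ours and uses hypothetical toy values for $\sigma_{\min}$ and the step sizes.

```python
import numpy as np

# Oscillation period pi/omega predicted by (41), polyhedral case with F = G = 0.
def oscillation_period(theta, gR, gJ, sigma_min):
    a = gR * gJ * sigma_min ** 2
    cos_w = (2.0 - (1.0 + theta) * a) / (2.0 * np.sqrt(1.0 - theta * a))   # (41)
    omega = np.arccos(np.clip(cos_w, -1.0, 1.0))
    return np.pi / omega

# Hypothetical values: sigma_min of Lbar and admissible step sizes.
print("oscillation period:",
      oscillation_period(theta=1.0, gR=0.5, gJ=0.7, sigma_min=0.4))
```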

4.3. Relations with FB and DR/ADMM

In this part, we discuss the relation between the obtained results and our previous work on the local linear convergence of Forward-Backward [32,34] and Douglas-Rachford/ADMM [35,36].

4.3.1. Forward-Backward splitting

For problem (P_P), when $J = G = 0$, Algorithm 1 reduces to, denoting $\gamma = \gamma_R$ and $\beta = \beta_F$,
$$ x_{k+1} = \operatorname{prox}_{\gamma R}\big(x_k - \gamma \nabla F(x_k)\big), \quad \gamma \in ]0, 2\beta[, \tag{42} $$
which is the non-relaxed FB splitting [37] with constant step-size. Let $x^\star \in \operatorname{Argmin}(R + F)$ be a global minimizer to which $\{x_k\}_{k \in \mathbb{N}}$ of (42) converges; the non-degeneracy condition (ND) for identification then becomes $-\nabla F(x^\star) \in \operatorname{ri}\big(\partial R(x^\star)\big)$, which recovers the condition of [34, Theorem 4.11]. Following the notation of Section 3, define $M_{FB} \overset{\text{def}}{=} W_R(\mathrm{Id}_n - \gamma H_F)$; we have for all $k$ large enough
$$ x_{k+1} - x^\star = M_{FB}(x_k - x^\star) + o(\|x_k - x^\star\|). $$
From Theorem 3.7, we obtain the following result for the FB splitting method, for the case of $\gamma$ being fixed. Denote $\mathcal{M}^R_{x^\star}$ the manifold in which $x^\star$ lives.

Corollary 4.1. For problem (P_P), let $J = G = 0$ and suppose that (A.1) holds, $\operatorname{Argmin}(R + F) \neq \emptyset$, and the FB iteration (42) generates a sequence $x_k \to x^\star \in \operatorname{Argmin}(R + F)$ such that $R \in \operatorname{PSF}_{x^\star}(\mathcal{M}^R_{x^\star})$, $F$ is $C^2$ near $x^\star$, and the condition $-\nabla F(x^\star) \in \operatorname{ri}\big(\partial R(x^\star)\big)$ holds. Then
(i) given any $\rho \in ]\rho(M_{FB} - M_{FB}^\infty), 1[$, there exists a $K$ large enough such that for all $k \geq K$,
$$ \|(\mathrm{Id} - M_{FB}^\infty)(x_k - x^\star)\| = O(\rho^{k-K}). \tag{43} $$
(ii) If moreover $R$ is locally polyhedral around $x^\star$, there exists a $K$ large enough such that for all $k \geq K$, we have directly
$$ \|x_k - x^\star\| = O(\rho^{k-K}) \tag{44} $$
for $\rho \in [\rho(M_{FB} - M_{FB}^\infty), 1[$.

Proof. Owing to [6], $M_{FB}$ is $\tfrac{2\beta}{4\beta - \gamma}$-averaged non-expansive, hence convergent. The convergence rates in (43) and (44) then follow straightforwardly from Theorem 3.7.

The results in [32,34] required a so-called restricted injectivity (RI) condition, which means that $H_F$ should be positive definite. Moreover, this RI condition was removed only when $R$ is locally polyhedral around $x^\star$ (see e.g. [34, Theorem 4.9]). Here, we show that neither the RI condition nor polyhedrality is needed to establish local linear convergence. As such, the analysis of this paper generalizes that of [34] even for FB splitting. However, the price of removing those conditions is that the obtained convergence rate is on a different criterion (i.e. $\|(\mathrm{Id} - M_{FB}^\infty)(x_k - x^\star)\|$) rather than directly on the sequence itself (i.e. $\|x_k - x^\star\|$).

4.3.2. Douglas-Rachford splitting and ADMM

Let $F = G = 0$ and $L = \mathrm{Id}$; then problem (P_P) becomes
$$ \min_{x \in \mathbb{R}^n} \; R(x) + J(x). $$
For the above problem, we briefly show below that DR splitting is the limiting case of Primal-Dual splitting obtained by letting $\gamma_R \gamma_J = 1$. First, for the Primal-Dual splitting scheme of Algorithm 1, let $\theta = 1$ and change the order of updating the variables; we obtain the following iteration
$$ \begin{aligned} v_{k+1} &= \operatorname{prox}_{\gamma_J J^*}(v_k + \gamma_J \bar{x}_k), \\ x_{k+1} &= \operatorname{prox}_{\gamma_R R}(x_k - \gamma_R v_{k+1}), \\ \bar{x}_{k+1} &= 2x_{k+1} - x_k. \end{aligned} \tag{45} $$

Then, applying the Moreau identity (6) to $\operatorname{prox}_{\gamma_J J^*}$, letting $\gamma_J = 1/\gamma_R$ and defining $z_{k+1} = x_k - \gamma_R v_{k+1}$, iteration (45) becomes
$$ \begin{aligned} u_{k+1} &= \operatorname{prox}_{\gamma_R J}(2x_k - z_k), \\ z_{k+1} &= z_k + u_{k+1} - x_k, \\ x_{k+1} &= \operatorname{prox}_{\gamma_R R}(z_{k+1}), \end{aligned} \tag{46} $$
which is the non-relaxed DR splitting method [24]. At convergence, we have $u_k, x_k \to x^\star = \operatorname{prox}_{\gamma_R R}(z^\star)$, where $z^\star$ is a fixed point of the iteration. See also [13, Section 4.2].

Specializing the derivation of (18) to (45), we obtain the linearized fixed-point operator
$$ M_{PD} = \begin{bmatrix} \mathrm{Id}_n & -\gamma_R\, P_{T^R_{x^\star}} P_{T^{J^*}_{v^\star}} \\ \tfrac{1}{\gamma_R}\, P_{T^{J^*}_{v^\star}} P_{T^R_{x^\star}} & \ \mathrm{Id}_n - 2\, P_{T^{J^*}_{v^\star}} P_{T^R_{x^\star}} P_{T^{J^*}_{v^\star}} \end{bmatrix}, $$
and similarly for (46) a linearized fixed-point operator $M_{DR}$, which coincides with the one analyzed in [35]. Owing to (ii) of Lemma 3.5, $M_{PD}$ and $M_{DR}$ are convergent. Let $\omega$ be the largest principal angle (yet smaller than $\pi/2$) between the tangent spaces $T^R_{x^\star}$ and $T^{J^*}_{v^\star}$; then the spectral radius of $M_{PD} - M_{PD}^\infty$ reads (cf. (i) of Remark 6)
$$ \rho(M_{PD} - M_{PD}^\infty) = \sqrt{1 - \cos^2(\omega)} = \sin(\omega) = \cos(\pi/2 - \omega). \tag{47} $$
Suppose that the Kuhn-Tucker pair $(x^\star, v^\star)$ is unique, and moreover that $R$ and $J$ are polyhedral. If $J^*$ is locally polyhedral near $v^\star$ along $v^\star + T^{J^*}_{v^\star}$, then $J$ is locally polyhedral near $x^\star$ along $x^\star + T^J_{x^\star}$, and moreover $T^J_{x^\star} = (T^{J^*}_{v^\star})^\perp$. As a result, the principal angles between $T^R_{x^\star}$ and $T^{J^*}_{v^\star}$ and the ones between $T^R_{x^\star}$ and $T^J_{x^\star}$ are complementary, which means that $\pi/2 - \omega$ is the Friedrichs angle between the tangent spaces $T^R_{x^\star}$ and $T^J_{x^\star}$. Thus, following (47), we have
$$ \rho(M_{PD} - M_{PD}^\infty) = \sqrt{1 - \cos^2(\omega)} = \cos(\pi/2 - \omega) = \rho(M_{DR} - M_{DR}^\infty). $$
We emphasise that such a connection can be drawn only in the polyhedral case, which justifies the different analysis carried out in [35,36]. In addition, for DR we were able to characterize situations where finite convergence provably occurs, while this is not (yet) the case for Primal-Dual splitting, even for $R$ and $J$ locally polyhedral around $(x^\star, v^\star)$ and $F = G = 0$, as long as $L$ is non-trivial.
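The equivalence between the Primal-Dual form (45) and the DR form (46) can also be checked numerically. The sketch below is ours, run on a hypothetical toy problem $\min_x R(x) + J(x)$ with $R = \iota_{\{x \geq 0\}}$ and $J = \|x - b\|_1$; both iterations are started from consistent initial points and should produce identical primal iterates.

```python
import numpy as np

# Equivalence of the Primal-Dual iteration (45) and the DR iteration (46),
# illustrated on min_x R(x) + J(x) with R = i_{x >= 0}, J = ||x - b||_1.
rng = np.random.default_rng(4)
n = 30
b = rng.standard_normal(n)
gR = 0.8
gJ = 1.0 / gR                                   # the DR regime gamma_J = 1/gamma_R

def prox_R(x):                                   # projection onto {x >= 0}
    return np.maximum(x, 0.0)

def prox_tJ(x, t):                               # prox_{t J}(x) with J = ||. - b||_1
    return b + np.sign(x - b) * np.maximum(np.abs(x - b) - t, 0.0)

def prox_gJstar(v, g):                           # prox_{g J*} via the Moreau identity (6)
    return v - g * prox_tJ(v / g, 1.0 / g)

# ---- Primal-Dual iteration (45), with x_bar_0 = x_0 ----
x_pd = np.zeros(n); v = np.zeros(n); x_bar = x_pd.copy()
xs_pd = []
for k in range(200):
    v = prox_gJstar(v + gJ * x_bar, gJ)
    x_new = prox_R(x_pd - gR * v)
    x_bar = 2 * x_new - x_pd
    x_pd = x_new
    xs_pd.append(x_pd.copy())

# ---- DR iteration (46), with z_0 = x_0 - gR*v_0 ----
x_dr = np.zeros(n); z = x_dr - gR * np.zeros(n)
xs_dr = []
for k in range(200):
    u = prox_tJ(2 * x_dr - z, gR)
    z = z + u - x_dr
    x_dr = prox_R(z)
    xs_dr.append(x_dr.copy())

print("max deviation between the two forms:",
      max(np.linalg.norm(a - c) for a, c in zip(xs_pd, xs_dr)))   # numerically zero
```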

5. Multiple infimal convolutions

In this section, we consider problem (P_P) with more than one infimal convolution. Let $m \geq 1$ be a positive integer. Consider the problem of solving
$$ \min_{x \in \mathbb{R}^n} \; R(x) + F(x) + \sum_{i=1}^{m} (J_i \,\square\, G_i)(L_i x), \tag{P_P^m} $$
where (A.1) holds for $R$ and $F$, and for every $i = 1, \dots, m$ the following hold:
(A.2) $J_i, G_i \in \Gamma_0(\mathbb{R}^{m_i})$, with $G_i$ being differentiable and $\beta_{G,i}$-strongly convex for some $\beta_{G,i} > 0$.
(A.3) $L_i : \mathbb{R}^n \to \mathbb{R}^{m_i}$ is a linear operator.
(A.4) The condition $0 \in \operatorname{ran}\big(\partial R + \nabla F + \sum_{i=1}^m L_i^*(\partial J_i \,\square\, \partial G_i)L_i\big)$ holds.
The dual problem of (P_P^m) reads
$$ \min_{v_1 \in \mathbb{R}^{m_1}, \dots, v_m \in \mathbb{R}^{m_m}} \; (R^* \,\square\, F^*)\Big({-\sum}_{i=1}^m L_i^* v_i\Big) + \sum_{i=1}^m \big( J_i^*(v_i) + G_i^*(v_i) \big). \tag{P_D^m} $$
Problem (P_P^m) is considered in [19,48], where a Primal-Dual splitting algorithm is proposed which extends Algorithm 1 using a product space trick; see Algorithm 2 hereafter for details. In both schemes of [19,48], $\theta$ is set to 1.

Algorithm 2: A Primal-Dual splitting method
Initial: Choose $\gamma_R, (\gamma_{J_i})_i > 0$. For $k = 0$, $x_0 \in \mathbb{R}^n$, $v_{i,0} \in \mathbb{R}^{m_i}$, $i \in \{1, \dots, m\}$;
repeat
$$ \begin{aligned} x_{k+1} &= \operatorname{prox}_{\gamma_R R}\Big(x_k - \gamma_R \nabla F(x_k) - \gamma_R \textstyle\sum_i L_i^* v_{i,k}\Big), \\ \bar{x}_{k+1} &= 2x_{k+1} - x_k, \\ \text{For } i &= 1, \dots, m: \quad v_{i,k+1} = \operatorname{prox}_{\gamma_{J_i} J_i^*}\big(v_{i,k} - \gamma_{J_i} \nabla G_i^*(v_{i,k}) + \gamma_{J_i} L_i \bar{x}_{k+1}\big), \end{aligned} \tag{48} $$
$k = k+1$;
until convergence;

5.1. Product space

The following result is taken from [19]. Define the product space $\mathcal{K} = \mathbb{R}^n \times \mathbb{R}^{m_1} \times \dots \times \mathbb{R}^{m_m}$, and let $\mathrm{Id}$ be the identity operator on $\mathcal{K}$. Define the following operators
$$ A \overset{\text{def}}{=} \begin{bmatrix} \partial R & L_1^* & \cdots & L_m^* \\ -L_1 & \partial J_1^* & & \\ \vdots & & \ddots & \\ -L_m & & & \partial J_m^* \end{bmatrix}, \quad B \overset{\text{def}}{=} \begin{bmatrix} \nabla F & & & \\ & \nabla G_1^* & & \\ & & \ddots & \\ & & & \nabla G_m^* \end{bmatrix}, \quad V \overset{\text{def}}{=} \begin{bmatrix} \mathrm{Id}_n/\gamma_R & -L_1^* & \cdots & -L_m^* \\ -L_1 & \mathrm{Id}_{m_1}/\gamma_{J_1} & & \\ \vdots & & \ddots & \\ -L_m & & & \mathrm{Id}_{m_m}/\gamma_{J_m} \end{bmatrix}. \tag{49} $$
Then $A$ is maximal monotone, $B$ is $\min\{\beta_F, \beta_{G_1}, \dots, \beta_{G_m}\}$-cocoercive, and $V$ is symmetric and $\nu$-positive definite with $\nu = \big(1 - \sqrt{\sum_i \gamma_R \gamma_{J_i} \|L_i\|^2}\big) \min\big\{\tfrac{1}{\gamma_R}, \tfrac{1}{\gamma_{J_1}}, \dots, \tfrac{1}{\gamma_{J_m}}\big\}$. Define $z_k = (x_k, v_{1,k}, \dots, v_{m,k})^T$; then it can be shown that (48) is equivalent to
$$ z_{k+1} = (V + A)^{-1}(V - B) z_k = (\mathrm{Id} + V^{-1} A)^{-1}(\mathrm{Id} - V^{-1} B) z_k. \tag{50} $$

5.2. Local convergence analysis

Let $(x^\star, v_1^\star, \dots, v_m^\star)$ be a Kuhn-Tucker pair. Define the following functions
$$ \bar{J}_i^*(v_i) \overset{\text{def}}{=} J_i^*(v_i) - \langle v_i, \, L_i x^\star - \nabla G_i^*(v_i^\star) \rangle, \quad v_i \in \mathbb{R}^{m_i}, \ i \in \{1, \dots, m\}, \tag{51} $$
and the Riemannian Hessian of each $\bar{J}_i^*$,
$$ H_{J_i^*} \overset{\text{def}}{=} \gamma_{J_i} P_{T^{J_i^*}_{v_i^\star}} \nabla^2_{\mathcal{M}^{J_i^*}_{v_i^\star}} \bar{J}_i^*(v_i^\star) P_{T^{J_i^*}_{v_i^\star}} \quad \text{and} \quad W_{J_i^*} \overset{\text{def}}{=} (\mathrm{Id}_{m_i} + H_{J_i^*})^{-1}, \quad i \in \{1, \dots, m\}. \tag{52} $$
For each $i \in \{1, \dots, m\}$, owing to Lemma 3.3, $W_{J_i^*}$ is firmly non-expansive if the non-degeneracy condition (ND_m) below holds. Now suppose that
(A.5) $F$ is locally $C^2$-smooth around $x^\star$ and each $G_i^*$ is locally $C^2$-smooth around $v_i^\star$.
Define the restricted Hessians $H_{G_i^*} \overset{\text{def}}{=} P_{T^{J_i^*}_{v_i^\star}} \nabla^2 G_i^*(v_i^\star) P_{T^{J_i^*}_{v_i^\star}}$, $\overline{H}_{G_i^*} \overset{\text{def}}{=} \mathrm{Id}_{m_i} - \gamma_{J_i} H_{G_i^*}$, $\overline{L}_i \overset{\text{def}}{=} P_{T^{J_i^*}_{v_i^\star}} L_i P_{T^R_{x^\star}}$, and the matrix $M_{PD}$ whose blocks are given by
$$ \begin{aligned} (M_{PD})_{1,1} &= W_R \overline{H}_F, \qquad (M_{PD})_{1,j+1} = -\gamma_R W_R \overline{L}_j^*, \\ (M_{PD})_{i+1,1} &= \gamma_{J_i} W_{J_i^*} \overline{L}_i \big(2 W_R \overline{H}_F - \mathrm{Id}_n\big), \qquad (M_{PD})_{i+1,j+1} = W_{J_i^*}\big(\delta_{ij}\overline{H}_{G_i^*} - 2\gamma_R \gamma_{J_i} \overline{L}_i W_R \overline{L}_j^*\big), \qquad i, j \in \{1, \dots, m\}. \end{aligned} \tag{53} $$
Using the same strategy as in the proof of Lemma 3.5, one can show that $M_{PD}$ is convergent; its limit is again denoted $M_{PD}^\infty$, and $\rho(M_{PD} - M_{PD}^\infty) < 1$.

Corollary 5.1. Consider Algorithm 2 under assumptions (A.1) and (A.2)-(A.5). Choose $\gamma_R, (\gamma_{J_i})_i > 0$ such that
$$ 2 \min\{\beta_F, \beta_{G_1}, \dots, \beta_{G_m}\} \, \min\big\{\tfrac{1}{\gamma_R}, \tfrac{1}{\gamma_{J_1}}, \dots, \tfrac{1}{\gamma_{J_m}}\big\} \Big(1 - \sqrt{\textstyle\sum_i \gamma_R \gamma_{J_i} \|L_i\|^2}\Big) > 1. \tag{54} $$
Then $(x_k, v_{1,k}, \dots, v_{m,k}) \to (x^\star, v_1^\star, \dots, v_m^\star)$, where $x^\star$ solves (P_P^m) and $(v_1^\star, \dots, v_m^\star)$ solves (P_D^m). If moreover $R \in \operatorname{PSF}_{x^\star}(\mathcal{M}^R_{x^\star})$ and $J_i^* \in \operatorname{PSF}_{v_i^\star}(\mathcal{M}^{J_i^*}_{v_i^\star})$, $i \in \{1, \dots, m\}$, and the non-degeneracy condition
$$ -\textstyle\sum_i L_i^* v_i^\star - \nabla F(x^\star) \in \operatorname{ri}\big(\partial R(x^\star)\big), \qquad L_i x^\star - \nabla G_i^*(v_i^\star) \in \operatorname{ri}\big(\partial J_i^*(v_i^\star)\big), \quad i \in \{1, \dots, m\}, \tag{ND_m} $$
holds, then:
(i) there exists $K > 0$ such that for all $k \geq K$, $(x_k, v_{1,k}, \dots, v_{m,k}) \in \mathcal{M}^R_{x^\star} \times \mathcal{M}^{J_1^*}_{v_1^\star} \times \dots \times \mathcal{M}^{J_m^*}_{v_m^\star}$;
(ii) given any $\rho \in ]\rho(M_{PD} - M_{PD}^\infty), 1[$, there exists a $K$ large enough such that for all $k \geq K$,
$$ \|(\mathrm{Id} - M_{PD}^\infty)(z_k - z^\star)\| = O(\rho^{k-K}). \tag{55} $$

Activity Identification and Local Linear Convergence of Forward Backward-type methods

Activity Identification and Local Linear Convergence of Forward Backward-type methods Activity Identification and Local Linear Convergence of Forward Backward-type methods Jingwei Liang Jalal M. Fadili Gabriel Peyré Abstract. In this paper, we consider a class of Forward Backward FB splitting

More information

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008 Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

M. Marques Alves Marina Geremia. November 30, 2017

M. Marques Alves Marina Geremia. November 30, 2017 Iteration complexity of an inexact Douglas-Rachford method and of a Douglas-Rachford-Tseng s F-B four-operator splitting method for solving monotone inclusions M. Marques Alves Marina Geremia November

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

Second order forward-backward dynamical systems for monotone inclusion problems

Second order forward-backward dynamical systems for monotone inclusion problems Second order forward-backward dynamical systems for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek March 6, 25 Abstract. We begin by considering second order dynamical systems of the from

More information

ADMM for monotone operators: convergence analysis and rates

ADMM for monotone operators: convergence analysis and rates ADMM for monotone operators: convergence analysis and rates Radu Ioan Boţ Ernö Robert Csetne May 4, 07 Abstract. We propose in this paper a unifying scheme for several algorithms from the literature dedicated

More information

Proximal splitting methods on convex problems with a quadratic term: Relax!

Proximal splitting methods on convex problems with a quadratic term: Relax! Proximal splitting methods on convex problems with a quadratic term: Relax! The slides I presented with added comments Laurent Condat GIPSA-lab, Univ. Grenoble Alpes, France Workshop BASP Frontiers, Jan.

More information

Proximal methods. S. Villa. October 7, 2014

Proximal methods. S. Villa. October 7, 2014 Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem

More information

A Unified Approach to Proximal Algorithms using Bregman Distance

A Unified Approach to Proximal Algorithms using Bregman Distance A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department

More information

Convex Optimization Notes

Convex Optimization Notes Convex Optimization Notes Jonathan Siegel January 2017 1 Convex Analysis This section is devoted to the study of convex functions f : B R {+ } and convex sets U B, for B a Banach space. The case of B =

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms

Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms Peter Ochs, Jalal Fadili, and Thomas Brox Saarland University, Saarbrücken, Germany Normandie Univ, ENSICAEN, CNRS, GREYC, France

More information

1 Introduction and preliminaries

1 Introduction and preliminaries Proximal Methods for a Class of Relaxed Nonlinear Variational Inclusions Abdellatif Moudafi Université des Antilles et de la Guyane, Grimaag B.P. 7209, 97275 Schoelcher, Martinique abdellatif.moudafi@martinique.univ-ag.fr

More information

Optimality, identifiability, and sensitivity

Optimality, identifiability, and sensitivity Noname manuscript No. (will be inserted by the editor) Optimality, identifiability, and sensitivity D. Drusvyatskiy A. S. Lewis Received: date / Accepted: date Abstract Around a solution of an optimization

More information

arxiv: v1 [math.oc] 21 Apr 2016

arxiv: v1 [math.oc] 21 Apr 2016 Accelerated Douglas Rachford methods for the solution of convex-concave saddle-point problems Kristian Bredies Hongpeng Sun April, 06 arxiv:604.068v [math.oc] Apr 06 Abstract We study acceleration and

More information

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms JOTA manuscript No. (will be inserted by the editor) Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms Peter Ochs Jalal Fadili Thomas Brox Received: date / Accepted: date Abstract

More information

Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches

Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Patrick L. Combettes joint work with J.-C. Pesquet) Laboratoire Jacques-Louis Lions Faculté de Mathématiques

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Splitting methods for decomposing separable convex programs

Splitting methods for decomposing separable convex programs Splitting methods for decomposing separable convex programs Philippe Mahey LIMOS - ISIMA - Université Blaise Pascal PGMO, ENSTA 2013 October 4, 2013 1 / 30 Plan 1 Max Monotone Operators Proximal techniques

More information

10. Smooth Varieties. 82 Andreas Gathmann

10. Smooth Varieties. 82 Andreas Gathmann 82 Andreas Gathmann 10. Smooth Varieties Let a be a point on a variety X. In the last chapter we have introduced the tangent cone C a X as a way to study X locally around a (see Construction 9.20). It

More information

The resolvent average of monotone operators: dominant and recessive properties

The resolvent average of monotone operators: dominant and recessive properties The resolvent average of monotone operators: dominant and recessive properties Sedi Bartz, Heinz H. Bauschke, Sarah M. Moffat, and Xianfu Wang September 30, 2015 (first revision) December 22, 2015 (second

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems Radu Ioan Boţ Ernö Robert Csetnek August 5, 014 Abstract. In this paper we analyze the

More information

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 36 Why proximal? Newton s method: for C 2 -smooth, unconstrained problems allow

More information

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean Renato D.C. Monteiro B. F. Svaiter March 17, 2009 Abstract In this paper we analyze the iteration-complexity

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

Signal Processing and Networks Optimization Part VI: Duality

Signal Processing and Networks Optimization Part VI: Duality Signal Processing and Networks Optimization Part VI: Duality Pierre Borgnat 1, Jean-Christophe Pesquet 2, Nelly Pustelnik 1 1 ENS Lyon Laboratoire de Physique CNRS UMR 5672 pierre.borgnat@ens-lyon.fr,

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

COMP 558 lecture 18 Nov. 15, 2010

COMP 558 lecture 18 Nov. 15, 2010 Least squares We have seen several least squares problems thus far, and we will see more in the upcoming lectures. For this reason it is good to have a more general picture of these problems and how to

More information

On nonexpansive and accretive operators in Banach spaces

On nonexpansive and accretive operators in Banach spaces Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 3437 3446 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa On nonexpansive and accretive

More information

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS RALPH HOWARD DEPARTMENT OF MATHEMATICS UNIVERSITY OF SOUTH CAROLINA COLUMBIA, S.C. 29208, USA HOWARD@MATH.SC.EDU Abstract. This is an edited version of a

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Carlos Humes Jr. a, Benar F. Svaiter b, Paulo J. S. Silva a, a Dept. of Computer Science, University of São Paulo, Brazil Email: {humes,rsilva}@ime.usp.br

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

About Split Proximal Algorithms for the Q-Lasso

About Split Proximal Algorithms for the Q-Lasso Thai Journal of Mathematics Volume 5 (207) Number : 7 http://thaijmath.in.cmu.ac.th ISSN 686-0209 About Split Proximal Algorithms for the Q-Lasso Abdellatif Moudafi Aix Marseille Université, CNRS-L.S.I.S

More information

Coordinate Update Algorithm Short Course Operator Splitting

Coordinate Update Algorithm Short Course Operator Splitting Coordinate Update Algorithm Short Course Operator Splitting Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 25 Operator splitting pipeline 1. Formulate a problem as 0 A(x) + B(x) with monotone operators

More information

AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION

AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION JOEL A. TROPP Abstract. Matrix approximation problems with non-negativity constraints arise during the analysis of high-dimensional

More information

The Complexity of Primal-Dual Fixed Point Methods for Ridge Regression

The Complexity of Primal-Dual Fixed Point Methods for Ridge Regression The Complexity of Primal-Dual Fixed Point Methods for Ridge Regression Ademir Alves Ribeiro Peter Richtárik August 28, 27 Abstract We study the ridge regression L 2 regularized least squares problem and

More information

Homework 3. Convex Optimization /36-725

Homework 3. Convex Optimization /36-725 Homework 3 Convex Optimization 10-725/36-725 Due Friday October 14 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

SPRING 2006 PRELIMINARY EXAMINATION SOLUTIONS

SPRING 2006 PRELIMINARY EXAMINATION SOLUTIONS SPRING 006 PRELIMINARY EXAMINATION SOLUTIONS 1A. Let G be the subgroup of the free abelian group Z 4 consisting of all integer vectors (x, y, z, w) such that x + 3y + 5z + 7w = 0. (a) Determine a linearly

More information

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences

More information

Convex Functions. Pontus Giselsson

Convex Functions. Pontus Giselsson Convex Functions Pontus Giselsson 1 Today s lecture lower semicontinuity, closure, convex hull convexity preserving operations precomposition with affine mapping infimal convolution image function supremum

More information

arxiv: v1 [math.oc] 12 Mar 2013

arxiv: v1 [math.oc] 12 Mar 2013 On the convergence rate improvement of a primal-dual splitting algorithm for solving monotone inclusion problems arxiv:303.875v [math.oc] Mar 03 Radu Ioan Boţ Ernö Robert Csetnek André Heinrich February

More information

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological

More information

A GENERAL INERTIAL PROXIMAL POINT METHOD FOR MIXED VARIATIONAL INEQUALITY PROBLEM

A GENERAL INERTIAL PROXIMAL POINT METHOD FOR MIXED VARIATIONAL INEQUALITY PROBLEM A GENERAL INERTIAL PROXIMAL POINT METHOD FOR MIXED VARIATIONAL INEQUALITY PROBLEM CAIHUA CHEN, SHIQIAN MA, AND JUNFENG YANG Abstract. In this paper, we first propose a general inertial proximal point method

More information

On the order of the operators in the Douglas Rachford algorithm

On the order of the operators in the Douglas Rachford algorithm On the order of the operators in the Douglas Rachford algorithm Heinz H. Bauschke and Walaa M. Moursi June 11, 2015 Abstract The Douglas Rachford algorithm is a popular method for finding zeros of sums

More information

Optimality, identifiability, and sensitivity

Optimality, identifiability, and sensitivity Noname manuscript No. (will be inserted by the editor) Optimality, identifiability, and sensitivity D. Drusvyatskiy A. S. Lewis Received: date / Accepted: date Abstract Around a solution of an optimization

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities

On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities Caihua Chen Xiaoling Fu Bingsheng He Xiaoming Yuan January 13, 2015 Abstract. Projection type methods

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

A Dykstra-like algorithm for two monotone operators

A Dykstra-like algorithm for two monotone operators A Dykstra-like algorithm for two monotone operators Heinz H. Bauschke and Patrick L. Combettes Abstract Dykstra s algorithm employs the projectors onto two closed convex sets in a Hilbert space to construct

More information

FIXED POINT ITERATIONS

FIXED POINT ITERATIONS FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

More information

Date: July 5, Contents

Date: July 5, Contents 2 Lagrange Multipliers Date: July 5, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 14 2.3. Informative Lagrange Multipliers...........

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information

Convergence analysis for a primal-dual monotone + skew splitting algorithm with applications to total variation minimization

Convergence analysis for a primal-dual monotone + skew splitting algorithm with applications to total variation minimization Convergence analysis for a primal-dual monotone + skew splitting algorithm with applications to total variation minimization Radu Ioan Boţ Christopher Hendrich November 7, 202 Abstract. In this paper we

More information

Convergence rate analysis for averaged fixed point iterations in the presence of Hölder regularity

Convergence rate analysis for averaged fixed point iterations in the presence of Hölder regularity Convergence rate analysis for averaged fixed point iterations in the presence of Hölder regularity Jonathan M. Borwein Guoyin Li Matthew K. Tam October 23, 205 Abstract In this paper, we establish sublinear

More information

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION

PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION ALESSIO FIGALLI AND YOUNG-HEON KIM Abstract. Given Ω, Λ R n two bounded open sets, and f and g two probability densities concentrated

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

On Gap Functions for Equilibrium Problems via Fenchel Duality

On Gap Functions for Equilibrium Problems via Fenchel Duality On Gap Functions for Equilibrium Problems via Fenchel Duality Lkhamsuren Altangerel 1 Radu Ioan Boţ 2 Gert Wanka 3 Abstract: In this paper we deal with the construction of gap functions for equilibrium

More information

Monotone Operator Splitting Methods in Signal and Image Recovery

Monotone Operator Splitting Methods in Signal and Image Recovery Monotone Operator Splitting Methods in Signal and Image Recovery P.L. Combettes 1, J.-C. Pesquet 2, and N. Pustelnik 3 2 Univ. Pierre et Marie Curie, Paris 6 LJLL CNRS UMR 7598 2 Univ. Paris-Est LIGM CNRS

More information

Approaching monotone inclusion problems via second order dynamical systems with linear and anisotropic damping

Approaching monotone inclusion problems via second order dynamical systems with linear and anisotropic damping March 0, 206 3:4 WSPC Proceedings - 9in x 6in secondorderanisotropicdamping206030 page Approaching monotone inclusion problems via second order dynamical systems with linear and anisotropic damping Radu

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Analysis II: The Implicit and Inverse Function Theorems

Analysis II: The Implicit and Inverse Function Theorems Analysis II: The Implicit and Inverse Function Theorems Jesse Ratzkin November 17, 2009 Let f : R n R m be C 1. When is the zero set Z = {x R n : f(x) = 0} the graph of another function? When is Z nicely

More information

Sequential Unconstrained Minimization: A Survey

Sequential Unconstrained Minimization: A Survey Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

MET Workshop: Exercises

MET Workshop: Exercises MET Workshop: Exercises Alex Blumenthal and Anthony Quas May 7, 206 Notation. R d is endowed with the standard inner product (, ) and Euclidean norm. M d d (R) denotes the space of n n real matrices. When

More information

Radius Theorems for Monotone Mappings

Radius Theorems for Monotone Mappings Radius Theorems for Monotone Mappings A. L. Dontchev, A. Eberhard and R. T. Rockafellar Abstract. For a Hilbert space X and a mapping F : X X (potentially set-valued) that is maximal monotone locally around

More information

Semidefinite Programming

Semidefinite Programming Semidefinite Programming Notes by Bernd Sturmfels for the lecture on June 26, 208, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra The transition from linear algebra to nonlinear algebra has

More information

On convergence rate of the Douglas-Rachford operator splitting method

On convergence rate of the Douglas-Rachford operator splitting method On convergence rate of the Douglas-Rachford operator splitting method Bingsheng He and Xiaoming Yuan 2 Abstract. This note provides a simple proof on a O(/k) convergence rate for the Douglas- Rachford

More information

SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS

SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS XIANTAO XIAO, YONGFENG LI, ZAIWEN WEN, AND LIWEI ZHANG Abstract. The goal of this paper is to study approaches to bridge the gap between

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for

More information

ZERO DUALITY GAP FOR CONVEX PROGRAMS: A GENERAL RESULT

ZERO DUALITY GAP FOR CONVEX PROGRAMS: A GENERAL RESULT ZERO DUALITY GAP FOR CONVEX PROGRAMS: A GENERAL RESULT EMIL ERNST AND MICHEL VOLLE Abstract. This article addresses a general criterion providing a zero duality gap for convex programs in the setting of

More information

Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems

Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Naohiko Arima, Sunyoung Kim, Masakazu Kojima, and Kim-Chuan Toh Abstract. In Part I of

More information

Part III. 10 Topological Space Basics. Topological Spaces

Part III. 10 Topological Space Basics. Topological Spaces Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.

More information

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS TSOGTGEREL GANTUMUR Abstract. After establishing discrete spectra for a large class of elliptic operators, we present some fundamental spectral properties

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

More information

SMSTC (2017/18) Geometry and Topology 2.

SMSTC (2017/18) Geometry and Topology 2. SMSTC (2017/18) Geometry and Topology 2 Lecture 1: Differentiable Functions and Manifolds in R n Lecturer: Diletta Martinelli (Notes by Bernd Schroers) a wwwsmstcacuk 11 General remarks In this lecture

More information

Math 210B. Artin Rees and completions

Math 210B. Artin Rees and completions Math 210B. Artin Rees and completions 1. Definitions and an example Let A be a ring, I an ideal, and M an A-module. In class we defined the I-adic completion of M to be M = lim M/I n M. We will soon show

More information

SYMPLECTIC GEOMETRY: LECTURE 5

SYMPLECTIC GEOMETRY: LECTURE 5 SYMPLECTIC GEOMETRY: LECTURE 5 LIAT KESSLER Let (M, ω) be a connected compact symplectic manifold, T a torus, T M M a Hamiltonian action of T on M, and Φ: M t the assoaciated moment map. Theorem 0.1 (The

More information

Operator Splitting for Parallel and Distributed Optimization

Operator Splitting for Parallel and Distributed Optimization Operator Splitting for Parallel and Distributed Optimization Wotao Yin (UCLA Math) Shanghai Tech, SSDS 15 June 23, 2015 URL: alturl.com/2z7tv 1 / 60 What is splitting? Sun-Tzu: (400 BC) Caesar: divide-n-conquer

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

More information

Convex functions, subdifferentials, and the L.a.s.s.o.

Convex functions, subdifferentials, and the L.a.s.s.o. Convex functions, subdifferentials, and the L.a.s.s.o. MATH 697 AM:ST September 26, 2017 Warmup Let us consider the absolute value function in R p u(x) = (x, x) = x 2 1 +... + x2 p Evidently, u is differentiable

More information

Monotone operators and bigger conjugate functions

Monotone operators and bigger conjugate functions Monotone operators and bigger conjugate functions Heinz H. Bauschke, Jonathan M. Borwein, Xianfu Wang, and Liangjin Yao August 12, 2011 Abstract We study a question posed by Stephen Simons in his 2008

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information