Proximal splitting methods on convex problems with a quadratic term: Relax!


Proximal splitting methods on convex problems with a quadratic term: Relax!
The slides I presented, with added comments.
Laurent Condat, GIPSA-lab, Univ. Grenoble Alpes, France. Workshop BASP Frontiers, Jan. 2017

Proximal splitting algorithms. A zoo of methods in the literature: ADMM, ISTA, Douglas-Rachford, PDFB, Chambolle-Pock. PDFB is the primal-dual forward-backward algorithm I proposed, sometimes called the Condat-Vũ algorithm. 1 / 39

Goal. We consider the problem: Find $\bar x \in \operatorname{Arg\,min}_x \Phi(x) = \sum_{m=1}^{M} g_m(L_m x)$, where the $U_m$ and $X$ are real Hilbert spaces, the functions $g_m$ are convex from $U_m$ to $\mathbb{R} \cup \{+\infty\}$, and the $L_m$ are linear operators from $X$ to $U_m$. 2 / 39

Goal. Find $\bar x \in \operatorname{Arg\,min}_x \Phi(x) = \sum_{m=1}^{M} g_m(L_m x)$. We want full splitting, with individual activation of $L_m$, $L_m^*$, and the gradient or proximity operator of $g_m$. 3 / 39

Goal. Find $\bar x \in \operatorname{Arg\,min}_x \Phi(x) = \sum_{m=1}^{M} g_m(L_m x)$. We want full splitting, with individual activation of $L_m$, $L_m^*$, and the gradient or proximity operator of $g_m$: no implicit operation (no inner loop or linear system to solve). 3 / 39

The subdifferential $\partial f : X \to 2^X$. $\partial f(x)$ is the set of gradients of the affine minorants of $f$ at $x$. $f$ smooth at $x$ $\Rightarrow$ $\partial f(x) = \{\nabla f(x)\}$. 4 / 39

Fermat's rule. $\bar x \in \operatorname{Arg\,min}_x \Phi(x) = \sum_{m=1}^{M} g_m(L_m x)$ $\Leftrightarrow$ $0 \in \partial\Phi(\bar x)$. Pierre de Fermat, 1601-1665. 5 / 39

Fermat's rule. $\bar x \in \operatorname{Arg\,min}_x \Phi(x) = \sum_{m=1}^{M} g_m(L_m x)$ $\Leftrightarrow$ $0 \in \partial\Phi(\bar x)$ $\Leftrightarrow$ $0 \in \sum_{m=1}^{M} L_m^* \partial g_m(L_m \bar x)$. Pierre de Fermat, 1601-1665. Here some qualification constraints are hidden. 5 / 39

The proximity operator. $\operatorname{prox}_f : X \to X$, $x \mapsto \arg\min_{z \in X} f(z) + \tfrac{1}{2}\|z - x\|^2$. 6 / 39

The proximity operator. $\operatorname{prox}_f : X \to X$, $x \mapsto \arg\min_{z \in X} f(z) + \tfrac{1}{2}\|z - x\|^2$. Explicit forms of $\operatorname{prox}_f$ exist for a large class of functions. Example: $f(x) = |x|$ $\Rightarrow$ $\operatorname{prox}_f(x) = \operatorname{sgn}(x)\max(|x| - 1, 0)$. 6 / 39

The proximity operator. $\operatorname{prox}_f : X \to X$, $x \mapsto \arg\min_{z \in X} f(z) + \tfrac{1}{2}\|z - x\|^2$. $\forall \gamma > 0$, $\operatorname{prox}_{\gamma f} = (\operatorname{Id} + \gamma\,\partial f)^{-1}$. 7 / 39
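As a small numerical illustration of the explicit prox above (the soft-thresholding formula $\operatorname{sgn}(x)\max(|x|-\lambda,0)$ for $f = \lambda|\cdot|$), here is a minimal NumPy sketch; the function name `prox_l1` is my own and not from the slides.

```python
import numpy as np

def prox_l1(x, lam=1.0):
    """prox of f = lam*|.| applied componentwise (soft-thresholding):
    arg min_z lam*|z| + 0.5*(z - x)^2  =  sgn(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

print(prox_l1(np.array([-2.0, -0.3, 0.0, 0.7, 3.0])))  # [-1. -0.  0.  0.  2.]
```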

Iterative algorithms. Principle: design an algorithm which iterates $x^{(i+1)} = T(x^{(i)})$ for some operator $T : X \to X$. 8 / 39

Iterative algorithms. Principle: design an algorithm which iterates $x^{(i+1)} = T(x^{(i)})$ for some operator $T : X \to X$. Convergence: $\|x^{(i)} - \bar x\| \to 0$, for some solution $\bar x$. 8 / 39

Iterative algorithms. Principle: design an algorithm which iterates $x^{(i+1)} = T(x^{(i)})$ for some nonexpansive operator $T : X \to X$, i.e. $\forall x, y$, $\|T(x) - T(y)\| \le \|x - y\|$, and such that the set $S$ of solutions is the set of fixed points of $T$, i.e. $T(\bar x) = \bar x$. Here, replace $y$ by a fixed point and you get Fejér monotonicity: at every iteration you get closer to the solution set. 9 / 39

Iterative algorithms. But nonexpansiveness is not sufficient: for instance, $X = \mathbb{R}^2$ and $T$ is a rotation. [Figure: iterates $x^{(1)}, x^{(2)}, x^{(3)}$ circling around $\bar x$.] 10 / 39

Iterative algorithms. But nonexpansiveness is not sufficient: for instance, $X = \mathbb{R}^2$ and $T$ is a rotation. We need a bit more than nonexpansiveness. "Firmly nonexpansive" = 1/2-averaged. 10 / 39

Iterative algorithms. Definition: $T$ is $\alpha$-averaged if $T = \alpha T_0 + (1 - \alpha)\operatorname{Id}$, for some $0 < \alpha < 1$ and some nonexpansive operator $T_0$. 11 / 39

Iterative algorithms. Theorem (Krasnosel'skii-Mann): if $T$ is $\alpha$-averaged, for some $0 < \alpha < 1$, and a fixed point of $T$ exists, then the iteration $x^{(i+1)} = T(x^{(i)})$ converges to some fixed point $\bar x$ of $T$. 12 / 39

Over-relaxation. An algorithm: $w^{(i+1)} = T(w^{(i)})$, with $T$ $\alpha$-averaged. [Figure: $w^{(i)} \mapsto w^{(i+1)}$; the solution is around here!] Its over-relaxed variant: $w^{(i+1/2)} = T(w^{(i)})$, $w^{(i+1)} = w^{(i)} + \rho(w^{(i+1/2)} - w^{(i)})$, with $1 \le \rho < 1/\alpha$. 13 / 39

Over-relaxation. An algorithm: $w^{(i+1)} = T(w^{(i)})$, with $T$ $\alpha$-averaged. A firmly nonexpansive operator can be over-relaxed with $1 \le \rho < 2$. 14 / 39

Goal. Find $\bar x \in \operatorname{Arg\,min}_x \Phi(x) = \sum_{m=1}^{M} g_m(L_m x)$ by iterating $x^{(i+1)} = T(x^{(i)})$ for an $\alpha$-averaged operator $T$. 15 / 39

Goal. Find $\bar x \in \operatorname{Arg\,min}_x \Phi(x) = \sum_{m=1}^{M} g_m(L_m x)$ by iterating $x^{(i+1)} = T(x^{(i)})$ for an $\alpha$-averaged operator $T$. How to build $T$ from the $L_m$, $L_m^*$, and the gradient or proximity operators of the $g_m$? 15 / 39

Forward-backward splitting. Find $\bar x \in \operatorname{Arg\,min}_x \{h(x) + f(x)\}$, where $h$ is smooth with $\beta$-Lipschitz continuous gradient. The forward-backward iteration $x^{(i+1)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau \nabla h(x^{(i)})\big)$ converges to a solution $\bar x$ if $0 < \tau < 2/\beta$. 16 / 39

Forward-backward splitting. Find $\bar x \in \operatorname{Arg\,min}_x \{h(x) + f(x)\}$, where $h$ is smooth with $\beta$-Lipschitz continuous gradient. The over-relaxed forward-backward iteration $x^{(i+1/2)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau \nabla h(x^{(i)})\big)$, $x^{(i+1)} = x^{(i)} + \rho(x^{(i+1/2)} - x^{(i)})$, converges to a solution $\bar x$ if $0 < \tau < 2/\beta$ and $1 \le \rho < 2 - \tau\beta/2$. 17 / 39

Forward-backward splitting. Find $\bar x \in \operatorname{Arg\,min}_x \{h(x) + f(x)\}$, where $h$ is smooth with $\beta$-Lipschitz continuous gradient. The over-relaxed forward-backward iteration $x^{(i+1/2)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau \nabla h(x^{(i)})\big)$, $x^{(i+1)} = x^{(i)} + \rho(x^{(i+1/2)} - x^{(i)})$, converges to a solution $\bar x$ if $0 < \tau < 2/\beta$ and $1 \le \rho < 2 - \tau\beta/2$. For instance, $\tau = 1/\beta$, $\rho = 1.49$. 17 / 39
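To make the over-relaxed forward-backward step above concrete, here is a short NumPy sketch on the lasso problem $\tfrac12\|Ax-y\|^2 + \lambda\|x\|_1$; this test problem and all names in it are my own choices, not from the slides (here $\beta = \|A\|^2$ and the prox of $\tau\lambda|\cdot|$ is soft-thresholding).

```python
import numpy as np

def forward_backward_lasso(A, y, lam, n_iter=500, rho=1.49):
    """Over-relaxed forward-backward sketch for 0.5*||Ax - y||^2 + lam*||x||_1.
    h(x) = 0.5*||Ax - y||^2 has beta-Lipschitz gradient with beta = ||A||^2."""
    beta = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad h
    tau = 1.0 / beta                          # step size, 0 < tau < 2/beta
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        v = x - tau * (A.T @ (A @ x - y))     # forward (gradient) step on h
        x_half = np.sign(v) * np.maximum(np.abs(v) - tau * lam, 0.0)  # prox of tau*lam*|.|
        x = x + rho * (x_half - x)            # over-relaxation, 1 <= rho < 2 - tau*beta/2
    return x

# usage on synthetic data
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
y = rng.standard_normal(40)
x_hat = forward_backward_lasso(A, y, lam=0.1)
```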

Minimization of 2 functions. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(x)\}$, where $f$ and $g$ have simple prox $\to$ calls to $\operatorname{prox}_f$ and $\operatorname{prox}_g$. 18 / 39

Minimization of 2 functions. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(x)\}$ $\Leftrightarrow$ find $x \in X$ such that $0 \in \partial f(x) + \partial g(x)$. 19 / 39

Minimization of 2 functions. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(x)\}$ $\Leftrightarrow$ find $x \in X$ such that $0 \in \partial f(x) + \partial g(x)$ $\Leftrightarrow$ find $(x, u) \in X \times X$ such that $-u \in \partial f(x)$ and $u \in \partial g(x)$. The dual variable $u$ can just be viewed as an auxiliary variable. 19 / 39

Minimization of 2 functions. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(x)\}$ $\Leftrightarrow$ find $x \in X$ such that $0 \in \partial f(x) + \partial g(x)$ $\Leftrightarrow$ find $(x, u) \in X \times X$ such that $-u \in \partial f(x)$ and $x \in (\partial g)^{-1}(u)$. 20 / 39

Minimization of 2 functions. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(x)\}$ $\Leftrightarrow$ find $x \in X$ such that $0 \in \partial f(x) + \partial g(x)$ $\Leftrightarrow$ find $(x, u) \in X \times X$ such that $0 \in \partial f(x) + u$ and $0 \in (\partial g)^{-1}(u) - x$. 21 / 39

Minimization of 2 functions. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(x)\}$ $\Leftrightarrow$ find $x \in X$ such that $0 \in \partial f(x) + \partial g(x)$ $\Leftrightarrow$ find $(x, u) \in X \times X$ such that $\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial f(x) + u \\ -x + (\partial g)^{-1}(u) \end{pmatrix}$. 22 / 39

Primal-dual monotone inclusion. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(x)\}$ $\Leftrightarrow$ find $x \in X$ such that $0 \in \partial f(x) + \partial g(x)$ $\Leftrightarrow$ find $(x, u) \in X \times X$ such that $\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial f(x) + u \\ -x + (\partial g)^{-1}(u) \end{pmatrix}$, where the operator on the right-hand side is maximally monotone. 22 / 39

The proximal point algorithm. To solve the monotone inclusion $0 \in M(\bar w)$, the proximal point algorithm is: $w^{(i+1)} = (P^{-1}M + \operatorname{Id})^{-1}(w^{(i)})$, i.e. $0 \in M(w^{(i+1)}) + P(w^{(i+1)} - w^{(i)})$. Convergence holds for any linear, self-adjoint, strongly positive $P$. With $w = (x, u)$, be careful that convergence is on the pair $(x, u)$ and with respect to the distorted norm induced by $\langle \cdot, P\,\cdot \rangle$. So, the variable $x$ alone does not get closer to the solution at every iteration; it can oscillate. 23 / 39

Douglas-Rachford splitting. Design an algorithm such that, $\forall i \in \mathbb{N}$, $\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial f(x^{(i+1)}) + u^{(i+1)} \\ -x^{(i+1)} + (\partial g)^{-1}(u^{(i+1)}) \end{pmatrix} + \begin{pmatrix} ?? & ?? \\ ?? & ?? \end{pmatrix} \begin{pmatrix} x^{(i+1)} - x^{(i)} \\ u^{(i+1)} - u^{(i)} \end{pmatrix}$. We need to find this linear operator $P$ to have a primal-dual PPA like on the previous slide. 24 / 39

Douglas-Rachford algorithm. Design an algorithm such that, $\forall i \in \mathbb{N}$, $\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial f(x^{(i+1)}) + u^{(i+1)} \\ -x^{(i+1)} + (\partial g)^{-1}(u^{(i+1)}) \end{pmatrix} + \begin{pmatrix} \tfrac{1}{\tau}\operatorname{Id} & -\operatorname{Id} \\ -\operatorname{Id} & \tau\operatorname{Id} \end{pmatrix} \begin{pmatrix} x^{(i+1)} - x^{(i)} \\ u^{(i+1)} - u^{(i)} \end{pmatrix}$. Douglas-Rachford iteration: $x^{(i+1)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau u^{(i)}\big)$; $u^{(i+1)} = \operatorname{prox}_{g^*/\tau}\big(u^{(i)} + \tfrac{1}{\tau}(2x^{(i+1)} - x^{(i)})\big)$. The linear operator $P$ here is positive, not strongly positive, but the convergence proofs can be extended to this case. 25 / 39

Douglas-Rachford algorithm. Same primal-dual inclusion as above. Douglas-Rachford iteration (classical form): $x^{(i+1)} = \operatorname{prox}_{\tau f}\big(s^{(i)}\big)$; $z^{(i+1)} = \operatorname{prox}_{\tau g}\big(2x^{(i+1)} - s^{(i)}\big)$; $s^{(i+1)} = s^{(i)} + z^{(i+1)} - x^{(i+1)}$. 26 / 39

Douglas-Rachford algorithm. Over-relaxed Douglas-Rachford iteration: $x^{(i+1)} = \operatorname{prox}_{\tau f}\big(s^{(i)}\big)$; $z^{(i+1)} = \operatorname{prox}_{\tau g}\big(2x^{(i+1)} - s^{(i)}\big)$; $s^{(i+1)} = s^{(i)} + \rho(z^{(i+1)} - x^{(i+1)})$. Convergence with $1 \le \rho < 2$. Take $\rho = 1.9$! Over-relaxation does not cost anything here! 27 / 39
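A generic Python sketch of the over-relaxed Douglas-Rachford iteration just above; the interface (two-argument prox callables, names `prox_f`, `prox_g`) is my own convention, not from the slides.

```python
def douglas_rachford(prox_f, prox_g, s0, tau=1.0, rho=1.9, n_iter=500):
    """Over-relaxed Douglas-Rachford: prox_f(v, t) and prox_g(v, t) must
    return prox_{t f}(v) and prox_{t g}(v); s0 is the initial auxiliary variable."""
    s = s0
    for _ in range(n_iter):
        x = prox_f(s, tau)                 # x^(i+1) = prox_{tau f}(s^(i))
        z = prox_g(2 * x - s, tau)         # z^(i+1) = prox_{tau g}(2 x^(i+1) - s^(i))
        s = s + rho * (z - x)              # over-relaxed update, 1 <= rho < 2
    return x
```

For example, when f and g are indicator functions of two convex sets, prox_f and prox_g are just the projections onto those sets (independent of the step), and the iteration finds a point in their intersection.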

Douglas-Rachford algorithm, version 2. Switching the update order $\to$ a different iteration: $\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial f(x^{(i+1)}) + u^{(i+1)} \\ -x^{(i+1)} + (\partial g)^{-1}(u^{(i+1)}) \end{pmatrix} + \begin{pmatrix} \tfrac{1}{\tau}\operatorname{Id} & +\operatorname{Id} \\ +\operatorname{Id} & \tau\operatorname{Id} \end{pmatrix} \begin{pmatrix} x^{(i+1)} - x^{(i)} \\ u^{(i+1)} - u^{(i)} \end{pmatrix}$. 28 / 39

Douglas-Rachford algorithm, version 2. Switching the update order $\to$ a different iteration, with the primal-dual inclusion above: this amounts to switching the roles of the primal and dual variables / problems. 28 / 39

Douglas-Rachford algorithm, version 2. Switching the update order $\to$ a different iteration: $u^{(i+1)} = \operatorname{prox}_{g^*/\tau}\big(u^{(i)} + \tfrac{1}{\tau}x^{(i)}\big)$; $x^{(i+1)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau(2u^{(i+1)} - u^{(i)})\big)$. 29 / 39

ADMM. Switching the update order $\to$ a different iteration: $u^{(i+1)} = \operatorname{prox}_{g^*/\tau}\big(u^{(i)} + \tfrac{1}{\tau}x^{(i)}\big)$; $x^{(i+1)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau(2u^{(i+1)} - u^{(i)})\big)$; equivalently: $x^{(i)} = \operatorname{prox}_{\tau f}\big(z^{(i)} - \tau u^{(i)}\big)$; $z^{(i+1)} = \operatorname{prox}_{\tau g}\big(x^{(i)} + \tau u^{(i)}\big)$; $u^{(i+1)} = u^{(i)} + \tfrac{1}{\tau}(x^{(i)} - z^{(i+1)})$. This is ADMM. 29 / 39

ADMM. In Douglas-Rachford form: $z^{(i+1)} = \operatorname{prox}_{\tau g}\big(s^{(i)}\big)$; $x^{(i+1)} = \operatorname{prox}_{\tau f}\big(2z^{(i+1)} - s^{(i)}\big)$; $s^{(i+1)} = s^{(i)} + x^{(i+1)} - z^{(i+1)}$. This is Douglas-Rachford like on slide 26, just with $f$ and $g$ exchanged: switching the roles of $f$ and $g$ in Douglas-Rachford. 30 / 39
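Here is a minimal Python sketch of the ADMM form written on the slide just above (the proximal form with a single step $\tau$); the two-argument prox interface is my convention and the variable names are mine.

```python
def admm(prox_f, prox_g, z0, u0, tau=1.0, n_iter=500):
    """ADMM as on the slide: prox_f(v, t) = prox_{t f}(v), prox_g(v, t) = prox_{t g}(v).
    Equivalent to Douglas-Rachford with the roles of f and g exchanged."""
    z, u = z0, u0
    for _ in range(n_iter):
        x = prox_f(z - tau * u, tau)       # x^(i)   = prox_{tau f}(z^(i) - tau u^(i))
        z = prox_g(x + tau * u, tau)       # z^(i+1) = prox_{tau g}(x^(i) + tau u^(i))
        u = u + (x - z) / tau              # u^(i+1) = u^(i) + (x^(i) - z^(i+1)) / tau
    return x, z, u
```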

Generalized Douglas-Rachford. Find $(\bar x, \bar z) \in \operatorname{Arg\,min}_{(x,z)} \{f(x) + g(z)\}$ s.t. $Lx + Kz = 0$. Douglas-Rachford iteration: $x^{(i+1)} \in \arg\min_x f(x) + \tfrac{1}{2\tau}\|Lx - s^{(i)}\|^2$; $z^{(i+1)} \in \arg\min_z g(z) + \tfrac{1}{2\tau}\|Kz + (2Lx^{(i+1)} - s^{(i)})\|^2$; $s^{(i+1)} = s^{(i)} - \rho(Lx^{(i+1)} + Kz^{(i+1)})$. Take $\rho = 1.9$! ADMM is usually expressed with 1 or 2 linear operators, $L$ and $K$ here. We can write it in this completely equivalent Douglas-Rachford form. The advantage is that we can easily over-relax! 31 / 39

Chambolle-Pock algorithm. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(Lx)\}$. Design an algorithm such that, $\forall i \in \mathbb{N}$, $\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial f(x^{(i+1)}) + L^* u^{(i+1)} \\ -Lx^{(i+1)} + (\partial g)^{-1}(u^{(i+1)}) \end{pmatrix} + \begin{pmatrix} \tfrac{1}{\tau}\operatorname{Id} & -L^* \\ -L & \tfrac{1}{\sigma}\operatorname{Id} \end{pmatrix} \begin{pmatrix} x^{(i+1)} - x^{(i)} \\ u^{(i+1)} - u^{(i)} \end{pmatrix}$. Once we understand that Douglas-Rachford is just a primal-dual PPA, there is no price to pay for introducing a linear operator $L$ in the problem. It is the same as the $L$ on the previous slide (with $K = -\operatorname{Id}$), but this time we split. In other words, the Chambolle-Pock algorithm is a preconditioned ADMM/Douglas-Rachford. 32 / 39

Chambolle-Pock algorithm. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(Lx)\}$. Chambolle-Pock algorithm: $x^{(i+1)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau L^* u^{(i)}\big)$; $u^{(i+1)} = \operatorname{prox}_{\sigma g^*}\big(u^{(i)} + \sigma L(2x^{(i+1)} - x^{(i)})\big)$. 33 / 39

Chambolle-Pock algorithm. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(Lx)\}$. Chambolle-Pock algorithm: $x^{(i+1)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau L^* u^{(i)}\big)$; $u^{(i+1)} = \operatorname{prox}_{\sigma g^*}\big(u^{(i)} + \sigma L(2x^{(i+1)} - x^{(i)})\big)$. Convergence if $\tau\sigma\|L\|^2 \le 1$; set $\sigma = 1/(\tau\|L\|^2)$. This condition is proved in my paper (JOTA 2013). This way, there is only one parameter, $\tau$, to tune, and we recover Douglas-Rachford exactly when $L = \operatorname{Id}$ (slide 25). 33 / 39

Chambolle-Pock algorithm. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(Lx)\}$. Over-relaxed Chambolle-Pock algorithm: $x^{(i+1/2)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau L^* u^{(i)}\big)$; $u^{(i+1/2)} = \operatorname{prox}_{\sigma g^*}\big(u^{(i)} + \sigma L(2x^{(i+1/2)} - x^{(i)})\big)$; $x^{(i+1)} = x^{(i)} + \rho(x^{(i+1/2)} - x^{(i)})$; $u^{(i+1)} = u^{(i)} + \rho(u^{(i+1/2)} - u^{(i)})$. 34 / 39

Chambolle-Pock algorithm. Find $\bar x \in \operatorname{Arg\,min}_x \{f(x) + g(Lx)\}$. Over-relaxed Chambolle-Pock algorithm: $x^{(i+1/2)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau L^* u^{(i)}\big)$; $u^{(i+1/2)} = \operatorname{prox}_{\sigma g^*}\big(u^{(i)} + \sigma L(2x^{(i+1/2)} - x^{(i)})\big)$; $x^{(i+1)} = x^{(i)} + \rho(x^{(i+1/2)} - x^{(i)})$; $u^{(i+1)} = u^{(i)} + \rho(u^{(i+1/2)} - u^{(i)})$. Convergence with $1 \le \rho < 2$. Take $\rho = 1.9$! 34 / 39
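Below is a small Python sketch of the over-relaxed Chambolle-Pock iteration written above, with $\sigma = 1/(\tau\|L\|^2)$ so that $\tau$ is the only parameter to tune; the interface (callables for $L$, $L^*$ and two-argument prox functions) is my own convention and not from the slides.

```python
def chambolle_pock(prox_f, prox_gstar, L, Lt, x0, u0, tau, L_norm,
                   rho=1.9, n_iter=500):
    """Over-relaxed Chambolle-Pock: prox_f(v, t) = prox_{t f}(v),
    prox_gstar(v, s) = prox_{s g*}(v); L and Lt apply the operator and its adjoint.
    sigma is set to 1/(tau * ||L||^2), so that tau * sigma * ||L||^2 = 1."""
    sigma = 1.0 / (tau * L_norm ** 2)
    x, u = x0, u0
    for _ in range(n_iter):
        x_half = prox_f(x - tau * Lt(u), tau)
        u_half = prox_gstar(u + sigma * L(2 * x_half - x), sigma)
        x = x + rho * (x_half - x)     # over-relax the primal variable
        u = u + rho * (u_half - u)     # and the dual variable, 1 <= rho < 2
    return x, u
```

If the prox of g itself is what is available rather than that of g*, Moreau's identity $\operatorname{prox}_{\sigma g^*}(v) = v - \sigma\,\operatorname{prox}_{g/\sigma}(v/\sigma)$ supplies prox_gstar.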

Preconditioned Chambolle-Pock algorithm. Find $\bar x \in \operatorname{Arg\,min}_x \{\tfrac{1}{2}\|Ax - y\|^2 + f(x) + g(Lx)\}$. Primal-dual PPA: $\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} A^*Ax^{(i+1)} - A^*y + \partial f(x^{(i+1)}) + L^* u^{(i+1)} \\ -Lx^{(i+1)} + (\partial g)^{-1}(u^{(i+1)}) \end{pmatrix} + \begin{pmatrix} \tfrac{1}{\tau}\operatorname{Id} - A^*A & -L^* \\ -L & \tfrac{1}{\sigma}\operatorname{Id} \end{pmatrix} \begin{pmatrix} x^{(i+1)} - x^{(i)} \\ u^{(i+1)} - u^{(i)} \end{pmatrix}$. 35 / 39

Preconditioned Chambolle-Pock algorithm. Find $\bar x \in \operatorname{Arg\,min}_x \{\tfrac{1}{2}\|Ax - y\|^2 + f(x) + g(Lx)\}$, with the primal-dual PPA above. Convergence if $\tau < \dfrac{1}{\|A\|^2}$ and $\sigma = \dfrac{1/\tau - \|A\|^2}{\|L\|^2}$. 35 / 39

Preconditioned Chambolle-Pock algorithm. Find $\bar x \in \operatorname{Arg\,min}_x \{\tfrac{1}{2}\|Ax - y\|^2 + f(x) + g(Lx)\}$. Preconditioned Chambolle-Pock algorithm: $x^{(i+1/2)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau L^* u^{(i)} - \tau(A^*Ax^{(i)} - A^*y)\big)$; $u^{(i+1/2)} = \operatorname{prox}_{\sigma g^*}\big(u^{(i)} + \sigma L(2x^{(i+1/2)} - x^{(i)})\big)$; $x^{(i+1)} = x^{(i)} + \rho(x^{(i+1/2)} - x^{(i)})$; $u^{(i+1)} = u^{(i)} + \rho(u^{(i+1/2)} - u^{(i)})$. This is exactly the algorithm I proposed (JOTA 2013), and exactly to solve this kind of regularized inverse problems. But if you view it as a preconditioned Chambolle-Pock algorithm instead of a primal-dual forward-backward, you can over-relax more! 36 / 39

Preconditioned Chambolle-Pock algorithm. Find $\bar x \in \operatorname{Arg\,min}_x \{\tfrac{1}{2}\|Ax - y\|^2 + f(x) + g(Lx)\}$, with the preconditioned Chambolle-Pock algorithm above. Convergence with $1 \le \rho < 2$. Take $\rho = 1.9$! With the conditions in my paper you cannot choose 1.9. This is specific to the smooth term $\tfrac{1}{2}\|Ax - y\|^2$ being quadratic. 36 / 39
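A Python sketch of the preconditioned Chambolle-Pock iteration above for $\tfrac12\|Ax - y\|^2 + f(x) + g(Lx)$; the parameter choice follows the slide ($\tau < 1/\|A\|^2$, $\sigma = (1/\tau - \|A\|^2)/\|L\|^2$), but the interface and defaults are my own assumptions, not the author's reference implementation.

```python
import numpy as np

def preconditioned_cp(A, y, prox_f, prox_gstar, L, Lt, x0, u0, L_norm,
                      rho=1.9, n_iter=500):
    """Preconditioned Chambolle-Pock sketch for 0.5*||Ax - y||^2 + f(x) + g(Lx).
    A is a dense matrix; L, Lt are callables for the operator and its adjoint;
    prox_f(v, t) = prox_{t f}(v), prox_gstar(v, s) = prox_{s g*}(v);
    L_norm is an upper bound on ||L||."""
    normA2 = np.linalg.norm(A, 2) ** 2           # ||A||^2
    tau = 0.99 / normA2                          # tau < 1/||A||^2
    sigma = (1.0 / tau - normA2) / L_norm ** 2   # as on the slide
    x, u = x0, u0
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                 # explicit gradient of the quadratic term
        x_half = prox_f(x - tau * Lt(u) - tau * grad, tau)
        u_half = prox_gstar(u + sigma * L(2 * x_half - x), sigma)
        x = x + rho * (x_half - x)               # over-relaxation, 1 <= rho < 2
        u = u + rho * (u_half - u)
    return x, u
```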

Preconditioned PPA. Find $\bar x \in \operatorname{Arg\,min}_x \{\tfrac{1}{2}\|Ax - y\|^2 + f(x)\}$: $x^{(i+1/2)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau(A^*Ax^{(i)} - A^*y)\big)$; $x^{(i+1)} = x^{(i)} + \rho(x^{(i+1/2)} - x^{(i)})$. Convergence if $\tau < \dfrac{1}{\|A\|^2}$ and $1 \le \rho < 2$. Instead of $\tau = \dfrac{1.99}{\|A\|^2}$, $\rho = 1$, try $\tau = \dfrac{0.99}{\|A\|^2}$, $\rho = 1.9$! Here also, the forward-backward can be viewed as just preconditioning the proximity operator. 37 / 39

Preconditioned PPA. Find $\bar x \in \operatorname{Arg\,min}_x \{\tfrac{1}{2}\|Ax - y\|^2 + f(x)\}$: $x^{(i+1/2)} = \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau(A^*Ax^{(i)} - A^*y)\big)$; $x^{(i+1)} = x^{(i)} + \rho(x^{(i+1/2)} - x^{(i)})$. Convergence if $\tau < \dfrac{1}{\|A\|^2}$ and $1 \le \rho < 2$. Instead of $\tau = \dfrac{1.99}{\|A\|^2}$, $\rho = 1$, try $\tau = \dfrac{0.99}{\|A\|^2}$, $\rho = 1.9$! There is a continuum between these two regimes (proof unfinished). 37 / 39
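The two parameter regimes on the slide (large step $\tau = 1.99/\|A\|^2$ with $\rho = 1$, versus $\tau = 0.99/\|A\|^2$ with $\rho = 1.9$) can be compared in a few lines of Python; this toy experiment, using an l1 regularizer, is my own setup and not from the talk.

```python
import numpy as np

def relaxed_prox_gradient(A, y, prox_f, tau, rho, n_iter=500):
    """Relaxed proximal-gradient iteration for 0.5*||Ax - y||^2 + f(x):
    prox_f(v, t) must return prox_{t f}(v)."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x_half = prox_f(x - tau * (A.T @ (A @ x - y)), tau)
        x = x + rho * (x_half - x)
    return x

# Comparing the two regimes on a toy lasso problem (lam*||x||_1 regularizer):
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 80)); y = rng.standard_normal(30); lam = 0.1
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)
normA2 = np.linalg.norm(A, 2) ** 2
x_classic = relaxed_prox_gradient(A, y, soft, tau=1.99 / normA2, rho=1.0)
x_relaxed = relaxed_prox_gradient(A, y, soft, tau=0.99 / normA2, rho=1.9)
```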

Conclusion. You can and should over-relax the algorithms, even more when the smooth function is quadratic. RELAX! 38 / 39

Conclusion. You can and should over-relax the algorithms, even more when the smooth function is quadratic. No time to explain: the product-space trick to generalize $g \circ L$ to $\sum_{m=1}^{M} g_m \circ L_m$; applications to other splitting methods; designing an algorithm = playing LEGO... See the two bonus slides for the Chambolle-Pock algorithm (one of the two forms; the other one is in my paper JOTA 2013, algorithm 5.2). 39 / 39

The product-space trick. Minimize $\sum_{m=1}^{M} g_m(L_m x)$ $\Leftrightarrow$ minimize $g(Lx)$, with $g : U_1 \times \cdots \times U_M \to \mathbb{R} \cup \{+\infty\}$, $(z_1, \ldots, z_M) \mapsto \sum_{m=1}^{M} g_m(z_m)$, and $L : X \to U_1 \times \cdots \times U_M$, $x \mapsto (L_1 x, \ldots, L_M x)$. 41 / 39
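A tiny Python sketch of this product-space construction: the stacked operator keeps the $L_m x$ as a list of blocks, its adjoint sums the blockwise adjoints, and the prox of the separable $g$ acts block by block. All names here are my own illustration, not from the slides.

```python
def stack_operators(Ls, Lts, prox_gs):
    """Product-space trick: from lists of L_m, L_m^* and blockwise prox_{t g_m},
    build L, L^* and prox_{t g} for g(z_1, ..., z_M) = sum_m g_m(z_m)."""
    def L(x):
        return [Lm(x) for Lm in Ls]                       # x -> (L_1 x, ..., L_M x)
    def Lt(z):
        return sum(Ltm(zm) for Ltm, zm in zip(Lts, z))    # adjoint: sum of L_m^* z_m
    def prox_g(z, t):
        return [pm(zm, t) for pm, zm in zip(prox_gs, z)]  # separable prox, blockwise
    return L, Lt, prox_g
```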

Chambolle-Pock algorithm. Minimize $f(x) + \sum_{m=1}^{M} g_m(L_m x)$. Over-relaxed Chambolle-Pock algorithm: iterate, for $i \in \mathbb{N}$: $x^{(i+1/2)} := \operatorname{prox}_{\tau f}\big(x^{(i)} - \tau \sum_{m=1}^{M} L_m^* u_m^{(i)}\big)$; $x^{(i+1)} := x^{(i)} + \rho(x^{(i+1/2)} - x^{(i)})$; for $m = 1, \ldots, M$: $u_m^{(i+1/2)} := \operatorname{prox}_{\sigma g_m^*}\big(u_m^{(i)} + \sigma L_m(2x^{(i+1/2)} - x^{(i)})\big)$; $u_m^{(i+1)} := u_m^{(i)} + \rho(u_m^{(i+1/2)} - u_m^{(i)})$. 42 / 39